from:"Ryota Ozaki"

Re: 9.99.104: panic in tcp_shutdown_wrapper

2022-10-30 Thread Ryota Ozaki

On Sun, Oct 30, 2022 at 5:12 PM J. Hannken-Illjes  wrote:
>
> > On 30. Oct 2022, at 06:52, Michael van Elst  wrote:
> >
> > ozak...@netbsd.org (Ryota Ozaki) writes:
> >
> >> I've committed a possible fix.  Could you try it?
> >
> >> Thanks,
> >> ozaki-r
> >
> >
> > I just got a NULL pointer dereference in tcp_ctloutput where
> > the previous check for inp == NULL is also missing.
> >
> > [ 24837.756043] fp c0016794db70 tcp_ctloutput() at c02ec4b4 
> > netbsd:tcp_ctloutput+0x94
> > [ 24837.756043] fp c0016794dcc0 tcp_ctloutput_wrapper() at 
> > c02d2680 netbsd:tcp_ctloutput_wrapper+-0x31150
> > [ 24837.756043] fp c0016794dcf0 sosetopt() at c0603cbc 
> > netbsd:sosetopt+0x78
> > [ 24837.756043] fp c0016794ddb0 sys_setsockopt() at c060b0fc 
> > netbsd:sys_setsockopt+0x7c
> > [ 24837.766041] fp c0016794de20 syscall() at c00b30fc 
> > netbsd:syscall+0x19c
> >
> > That's:
> >
> > int
> > tcp_ctloutput(int op, struct socket *so, struct sockopt *sopt)
> > {
> > ...
> > s = splsoftnet();
> > inp = sotoinpcb(so);
> > ...
> > }
> > tp = intotcpcb(inp); <-
> >
> > switch (op) {
>
> ... and syzkaller (https://syzkaller.appspot.com/netbsd) has a
> bunch of new TCP-related crashes starting ~2 days before ...

It seems that all of the failures stem from the missing NULL checks.
So they should be fixed now.

  ozaki-r
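
As an illustration of the missing check discussed above, here is a minimal
userland sketch.  It is not the committed fix: `struct sock`, `struct pcb`,
`struct tcb` and `get_option()` are hypothetical stand-ins for the kernel
structures, and the choice of ECONNRESET is only illustrative.  The point is
that the PCB pointer has to be tested before anything is read through it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the trace. */
struct tcb { int t_flags; };
struct pcb { struct tcb *p_tcb; };
struct sock { struct pcb *so_pcb; };	/* NULL once the PCB is detached */

/*
 * Return 0 and store the flags on success, or an errno value when the
 * socket no longer has a PCB.  Without the NULL check, the
 * inp->p_tcb access below is the kind of dereference that faults.
 */
int
get_option(const struct sock *so, int *flags)
{
	const struct pcb *inp = so->so_pcb;

	if (inp == NULL)
		return ECONNRESET;	/* illustrative error value */

	*flags = inp->p_tcb->t_flags;
	return 0;
}
```

A caller holding a socket whose PCB is already gone gets an error back
instead of a page fault.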

Re: ECONNREFUSED no longer works

2022-10-30 Thread Ryota Ozaki

On Mon, Oct 31, 2022 at 6:12 AM Michael van Elst  wrote:
>
> t...@netbsd.org (Tobias Nygren) writes:
>
> >$ nc -n -v 127.0.0.1 1234
> ># hangs forever in connect(2) instead of exiting w/ connection refused.
>
> The logic in tcp_drop() got reversed:
>
> @@ -1042,17 +1017,12 @@ tcp_newtcpcb(int family, void *aux)
>  struct tcpcb *
>  tcp_drop(struct tcpcb *tp, int errno)
>  {
> -   struct socket *so = NULL;
> +   struct socket *so;
>
> -   KASSERT(!(tp->t_inpcb && tp->t_in6pcb));
> +   KASSERT(tp->t_inpcb != NULL);
>
> -   if (tp->t_inpcb)
> -   so = tp->t_inpcb->inp_socket;
> -#ifdef INET6
> -   if (tp->t_in6pcb)
> -   so = tp->t_in6pcb->in6p_socket;
> -#endif
> -   if (!so)
> +   so = tp->t_inpcb->inp_socket;
> +   if (so != NULL)<-
> return NULL;
>
> if (TCPS_HAVERCVDSYN(tp->t_state)) {
>

Thank you for pointing this out.  I've committed a fix.

  ozaki-r
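
The reversed test in the quoted diff is easier to see in a reduced form.
The sketch below is a userland illustration, not the committed fix;
`struct sock`, `struct conn` and `drop()` are hypothetical names.  It uses
the `== NULL` guard that the removed `if (!so)` expressed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins; not the kernel's definitions. */
struct sock { int so_error; };
struct conn { struct sock *c_sock; };

/*
 * Record `error` on the socket behind `c`.  Returns 0 on success and -1
 * when no socket is attached.
 *
 * If the guard were written as `so != NULL` (as in the quoted diff),
 * every connection that has a socket would return early and so_error
 * would never be set, so the caller would never see ECONNREFUSED.
 */
int
drop(struct conn *c, int error)
{
	struct sock *so = c->c_sock;

	if (so == NULL)		/* nothing to notify */
		return -1;

	so->so_error = error;
	return 0;
}
```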

Re: 9.99.104: panic in tcp_shutdown_wrapper

2022-10-30 Thread Ryota Ozaki

On Sun, Oct 30, 2022 at 2:52 PM Michael van Elst  wrote:
>
> ozak...@netbsd.org (Ryota Ozaki) writes:
>
> >I've committed a possible fix.  Could you try it?
>
> >Thanks,
> >  ozaki-r
>
>
> I just got a NULL pointer dereference in tcp_ctloutput where
> the previous check for inp == NULL is also missing.
>
> [ 24837.756043] fp c0016794db70 tcp_ctloutput() at c02ec4b4 
> netbsd:tcp_ctloutput+0x94
> [ 24837.756043] fp c0016794dcc0 tcp_ctloutput_wrapper() at 
> c02d2680 netbsd:tcp_ctloutput_wrapper+-0x31150
> [ 24837.756043] fp c0016794dcf0 sosetopt() at c0603cbc 
> netbsd:sosetopt+0x78
> [ 24837.756043] fp c0016794ddb0 sys_setsockopt() at c060b0fc 
> netbsd:sys_setsockopt+0x7c
> [ 24837.766041] fp c0016794de20 syscall() at c00b30fc 
> netbsd:syscall+0x19c
>
> That's:
>
> int
> tcp_ctloutput(int op, struct socket *so, struct sockopt *sopt)
> {
> ...
> s = splsoftnet();
> inp = sotoinpcb(so);
> ...
> }
> tp = intotcpcb(inp); <-
>
> switch (op) {
>

Thank you for the report.  I've fixed the panic too.

  ozaki-r

Re: 9.99.104: panic in tcp_shutdown_wrapper

2022-10-29 Thread Ryota Ozaki

Hi,

I've committed a possible fix.  Could you try it?

Thanks,
  ozaki-r

On Sun, Oct 30, 2022 at 12:17 AM Thomas Klausner  wrote:
>
> Hi!
>
> A couple hours later, my shell was in an NFS mounted directory (probably idle 
> for some time) and I tried tab-completing an entry, and it panicked again.
> Same location as below.
>
> Hand copied:
> tcp_shutdown_wrapper+0x20
> nfs_disconnect+0x69
> nfs_reconnect+0x1a
> nfs_request+0x7fb
> nfs_access+0x1ed
> VOP_ACCESS+0x61
> nfs_lookup+0x52f
> VOP_LOOKUP+0x8a
> lookup_once+0x1a6
> namei_tryemulroot+0xb00
> namei+0x29
> vn_open+0x133
> do_open+0xc3
> do_sys_openat+0x74
> sys_open+0x24
> syscall+0x196
>
>  Thomas
>
> > On 29.10.2022, at 11:53, Thomas Klausner  wrote:
> >
> > Hi!
> >
> > I’ve upgraded from 9.99.100 (stable) to 9.99.104 this morning (kernel + 
> > user land, but packages still the old ones built on 9.99.100 in case it 
> > matters).
> > A couple hours later I started transmission-gtk and the machine immediately 
> > panicked.
> >
> > Hand copied:
> >
> > uvm_fault(0xf8b04ab6d8f0, 0x0, 1) -> e
> > Fatal page fault in supervisor mode
> > Trap type 6 code 0 rip 0x80b06b82 cs 0x8 rflags 0x10246 cr2 0x38 
> > ilevel 0 rsp 0xfc62191caaaf0
> > Curlwp 0xff8b08ac6d040 pid 6904.22757 lowest kstack 0xfc62191ca62c0
> > Kernel: page fault trap, code = 0
> > Stopped in pid 6904.22757 (transmission-gtk) at 
> > netbsd:tcp_shutdown_wrapper+0x20
> > : movq 38(%rax), %r14
> > tcp_shutdown_wrapper() at netbsd:tcp_shutdown_wrapper+0x20
> > nfs_disconnect() at netbsd:nfs_disconnect+0x69
> > nfs_reconnect() at netbsd:nfs_reconnect+0x1a
> > nfs_request() at netbsd:nfs_request+0x7fb
> > nfs_statvfs() at netbsd:nfs_statvfs+0x173
> > VFS_STATVFS() at netbsd:VFS_STATVFS+0x22
> > dostatvfs() at netbsd:dostatvfs+0x132
> > do_sys_getvfsstat() at netbsd:do_sys_getvfsstat+0x9f
> > sys___getvfsstat90() at netbsd:sys___getvfsstat90+0x2b
> > syscall() at netbsd:syscall+0x196
> >
> > I have nfs mounted some shares from a Synology station.
> >
> > Ideas? Perhaps the pcb merge changes from this week?
> > Thomas
>

Re: Any TCP VTW users?

2022-09-15 Thread Ryota Ozaki

On Fri, Sep 16, 2022 at 10:21 AM Simon Burge  wrote:
>
> Ryota Ozaki wrote:
>
> > Hi,
> >
> > Are there any users of TCP Vestigial Time-Wait (VTW)?
> > The feature is disabled by default and we need to explicitly
> > enable via sysctl to use it.
> >
> > I just want to know if we should still maintain it.
>
> I wouldn't be unhappy if it just disappeared.  It's totally
> undocumented, and will also cause a panic on archs with larger cache
> line size because it does some really funky incorrect math that I
> stared at for a while then gave up on.
>
> erlite# sysctl -w net.inet.tcp.vtw.enable=1
> [ 15293.7939168] panic: kernel diagnostic assertion "n <= FATP_MAX / 2" failed: 
> file "../../../../netinet/tcp_vtw.c", line 218

Oh, that's bad...  Thank you for telling me.

  ozaki-r

Re: Any TCP VTW users?

2022-09-15 Thread Ryota Ozaki

On Fri, Sep 16, 2022 at 2:03 AM Brad Spencer  wrote:
>
> Ryota Ozaki  writes:
>
> > Hi,
> >
> > Are there any users of TCP Vestigial Time-Wait (VTW)?
> > The feature is disabled by default and we need to explicitly
> > enable via sysctl to use it.
> >
> > I just want to know if we should still maintain it.
> >
> >   ozaki-r
>
>
> I do use it on one system, but it is not likely to be critical to
> anything.   The system in question creates a whole ton of short lived
> connections and I think I was trying to get them to expire quicker than
> normal.

Thank you for the report!

Just curious: does it improve performance (or reduce CPU/memory usage)?

  ozaki-r

Re: Any TCP VTW users?

2022-09-15 Thread Ryota Ozaki

On Thu, Sep 15, 2022 at 9:33 PM Andy Ruhl  wrote:
>
> On Thu, Sep 15, 2022 at 12:34 AM Ryota Ozaki  wrote:
> >
> > Hi,
> >
> > Are there any users of TCP Vestigial Time-Wait (VTW)?
> > The feature is disabled by default and we need to explicitly
> > enable via sysctl to use it.
> >
> > I just want to know if we should still maintain it.
> >
> >   ozaki-r
>
> I wasn't even aware of it. I read the comments in
> sys/netinet/tcp_vtw.c. Seems useful for systems that handle a lot of
> sockets. Pretty neat.
>
> Is there some reason why this is obsolete or something?
>
> Andy

When we MP-ify TCP, VTW will require extra effort;
the code does not look MP-ification friendly.

If nobody uses the feature, we can defer its MP-ification
or ignore it completely.  That is why I asked.

  ozaki-r

Any TCP VTW users?

2022-09-15 Thread Ryota Ozaki

Hi,

Are there any users of TCP Vestigial Time-Wait (VTW)?
The feature is disabled by default and we need to explicitly
enable via sysctl to use it.

I just want to know if we should still maintain it.

  ozaki-r

Re: State of NET_MPSAFE in -9 and/or -current?

2022-03-11 Thread Ryota Ozaki

On Thu, Mar 10, 2022 at 6:51 PM Michael van Elst  wrote:
>
> ozak...@netbsd.org (Ryota Ozaki) writes:
>
> >Note that as you can see from the list above, Layer 4 protocols
> >including TCP and UDP are
> >not MP-safe yet.
>
> Is anyone working on it ?

No one, AFAIK.

  ozaki-r

Re: State of NET_MPSAFE in -9 and/or -current?

2022-03-09 Thread Ryota Ozaki

On Thu, Mar 10, 2022 at 3:19 AM Brian Buhrow  wrote:
>
> hello.  I'm wondering if someone could comment on the state of using 
> NET_MPSAFE kernels?
> Is it ready for production use yet?
> -thanks
> -Brian
>

It depends on your machine(s) and usage.

doc/TODO.smpnet[1] lists the MP-safe components.  If you need components
listed in the "Unprotected ones" (sub)section, we can clearly say NET_MPSAFE
kernels are not ready for production.  Otherwise, you may be able to use
NET_MPSAFE kernels in production.  In fact, we have used NET_MPSAFE kernels
based on NetBSD 8 as routers in production for a couple of years.

[1] 
http://cvsweb.netbsd.org/bsdweb.cgi/src/doc/TODO.smpnet?rev=1.45&content-type=text/x-cvsweb-markup&only_with_tag=MAIN

Note that, as you can see from the list above, Layer 4 protocols
including TCP and UDP are not MP-safe yet.  So using NET_MPSAFE kernels
as servers or clients is probably not performant.

  ozaki-r

Re: File sharing over virtio-9p

2019-10-28 Thread Ryota Ozaki

On Fri, Oct 25, 2019 at 11:19 PM Mouse  wrote:
>
> > [W]hich of the following is more readable to the user:
>
> > $ ls foo
> > ls: foo: No such file or directory
>
> > or
>
> > $ ls foo
> > ls: stat(foo): No such file or directory
>
> It depends entirely on the user.
>
> As I recently wrote on a non-NetBSD mailing list, there is no such
> thing as a good or bad user interface; there is only a good or bad user
> interface for a particular user (or class of sufficiently-similar
> users).
>
> I've lost track of the number of times I've had to resort to a
> sledgehammer such as ktrace to find out what's really going wrong
> because an error message doesn't report enough information.

I've had similar experiences with KASSERT: when a KASSERT fails because of
memory corruption, I want to know not only that it failed but also the
values used in the KASSERT.

Anyway, thank you for the suggestions.  I committed the patches, changing
the error message for open while keeping the one for setsockopt.

It may be good to have guidelines on writing error messages somewhere.

  ozaki-r
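
To illustrate the wish above for an assertion that also reports its
operands, here is a small userland sketch.  In the kernel, KASSERTMSG()
serves this purpose; the helper below, `check_le()`, is a hypothetical name
made up for this example and is not part of any NetBSD API.

```c
#include <stdio.h>

/*
 * Check `value <= limit`.  On failure, write a diagnostic that includes
 * both operands into `buf` and return 0; on success leave `buf` alone
 * and return 1.
 */
int
check_le(char *buf, size_t len, const char *expr, long value, long limit)
{
	if (value <= limit)
		return 1;
	snprintf(buf, len, "assertion \"%s\" failed: %ld > %ld",
	    expr, value, limit);
	return 0;
}
```

With the operands in the message, a corrupted value is visible in the
report itself rather than only in a crash dump.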

Re: File sharing over virtio-9p

2019-10-25 Thread Ryota Ozaki

On Fri, Oct 25, 2019 at 3:04 PM Michael van Elst  wrote:
>
> ozaki.ry...@gmail.com (Ryota Ozaki) writes:
>
> >> @@ -72,7 +74,7 @@ serverconnect(const char *addr, unsigned short port)
> >> +   err(1, "setsockopt(SO_NOSIGPIPE)");
> >> +err(1, "open(%s)", path);
>
> >I prefer more informative messages.  Why do you want to trim them?
>
>
> Usually the error gives enough context, e.g. SO_NOSIGPIPE is a socket option
> and telling that it's setsockopt failing is redundant and printing
> an input file name is enough when the error identifies the operation
> or the specific operation doesn't matter.
>
> But there is no rule for this, in particular when embedding filenames
> where multiple operations are possible. Many people seem to prefer even
> more verbose phrases like "Cannot open `%s'". Our code base has lots
> of variants.

I think I was influenced by ping6 or something (though it's just one of the variants).

>
> I personally would prefer error messages without special characters
> so that you can grep them easily. :)

Indeed.

One annoying type of message is a phrase split across two (or more)
lines to stay within the 80-character limit.  That's quite anti-grep :-/

  ozaki-r

Re: File sharing over virtio-9p

2019-10-25 Thread Ryota Ozaki

On Fri, Oct 25, 2019 at 2:38 PM Valery Ushakov  wrote:
>
> On Fri, Oct 25, 2019 at 12:56:43 +0900, Ryota Ozaki wrote:
>
> > > @@ -72,7 +74,7 @@ serverconnect(const char *addr, unsigned short port)
> > > [...]
> > > +   err(1, "setsockopt(SO_NOSIGPIPE)");
> > >
> > > I'd just trim it down to "SO_NOSIGPIPE".
> > >
> > > +err(1, "open(%s)", path);
> > >
> > > Ditto.  Just make it "%s".
> >
> > I prefer more informative messages.  Why do you want to trim them?
>
> Consider that from the user perspective.  As a developer it's tempting
> to dump the implementation details, but which of the following is more
> readable to the user:
>
> $ ls foo
> ls: foo: No such file or directory
>
> or
>
> $ ls foo
> ls: stat(foo): No such file or directory

Hm, the example makes sense to me (so I'll fix the one for open),
but it doesn't for setsockopt:
  mount_9p: SO_NOSIGPIPE: Cannot allocate memory
or
  mount_9p: setsockopt(SO_NOSIGPIPE): Cannot allocate memory

I think the latter is more readable/understandable to users.

  ozaki-r

Re: File sharing over virtio-9p

2019-10-23 Thread Ryota Ozaki

On Tue, May 21, 2019 at 1:39 PM Ryota Ozaki  wrote:
>
> Hi,
>
> The following two patches enable a NetBSD guest running
> on a Linux KVM to share files with its host over virtio-9p.
>
>   https://www.netbsd.org/~ozaki-r/vio9p.diff
>   https://www.netbsd.org/~ozaki-r/mount_9p-cdev.diff
>
> The first patch exports a 9p endpoint of virtio-9p via a character
> device (e.g., /dev/vio9p0) and the second one extends mount_9p
> (puffs) to talk with a 9p server inside qemu via the character device.
>
> [Usage]
>
> Export a directory of a host via virtio-9p.  The XML below is part of
> a libvirt domain configuration.  It exports the "/var/lib/libvirt/export"
> directory with the tag "test".  If it is the first entry, it will have
> minor number 0 and is typically assigned /dev/vio9p0.
>
> 
>   
>   
>function='0x0'/>
> 
>
> A boot log of vio9p looks like this:
>
> virtio0 at pci0 dev 3 function 0
> virtio0: Virtio 9P Transport Device (rev. 0x00)
> vio9p0 at virtio0: Features: 0x1000
> virtio0: allocated 12288 byte for virtqueue 0 for vio9p, size 128
> virtio0: using 4096 byte (256 entries) indirect descriptors
> vio9p0: tagged as test
> virtio0: interrupting at ioapic0 pin 11
>
> A NetBSD guest can mount the exported directory with mount_9p.
>
> mount_9p -cu /dev/vio9p0 /mnt/9p
>
> -c tells mount_9p to interpret the first argument as a character
> device file to talk with.
>
>
> Have fun,
>   ozaki-r

I've prepared complete patches ready to commit:
  https://www.netbsd.org/~ozaki-r/tweak-MAKEDEV.diff
  https://www.netbsd.org/~ozaki-r/vio9p.diff
  https://www.netbsd.org/~ozaki-r/vio9p-configs.diff
  https://www.netbsd.org/~ozaki-r/mount_9p-cdev.diff

Any comments?

I would like to commit them in a few days if there are
no objections.

Regards,
  ozaki-r

Re: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself

2019-05-25 Thread Ryota Ozaki

On Fri, May 24, 2019 at 11:45 PM Paul Ripke  wrote:
>
> On Fri, May 24, 2019 at 12:15:58PM +0900, Ryota Ozaki wrote:
> > On Thu, May 23, 2019 at 10:00 PM Paul Ripke  wrote:
> > >
> > > Old Sun sparc 5, 32MiB RAM, netbooted with nfs root & swap. Has been
> > > running fine with an old kernel built from netbsd-8:
> > >
> > > NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018
> > >
> > > Booting a kernel from netbsd-8 from the last few days:
> > >
> > > NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
> > >
> > > panics either during or shortly after boot, with the following console
> > > log:
> > >
> > > ---
> > > Starting sshd.
> > > Mutex error: mutex_vector_enter,552: locking against myself
> > >
> > > lock address : 0xf04aafc0
> > > current cpu  :  0
> > > current lwp  : 0xf0604680
> > > owner field  : 0xf0604680 wait/spin:0/0
> > >
> > > panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: 
> > > lock 0xf04aafc0 cpu 0 lwp 0xf0604680
> > > cpu0: Begin traceback...
> > > 0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at 
> > > netbsd:panic+0x20
> > > panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) 
> > > at netbsd:lockdebug_abort+0x9c
> > > lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, 
> > > 0xf0002000) at netbsd:mutex_enter+0x1cc
> > > mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, 
> > > 0xf0604680) at netbsd:sosend+0x44
> > > sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at 
> > > netbsd:nfs_send+0x90
> > > nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) 
> > > at netbsd:nfs_request+0x2f4
> > > nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at 
> > > netbsd:nfs_readrpc+0x1dc
> > > nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, 
> > > 0xf04f6a40) at netbsd:nfs_doio+0x6bc
> > > nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at 
> > > netbsd:VOP_STRATEGY+0x3c
> > > VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, 
> > > 0xf0744000) at netbsd:sw_reg_start.part.0+0x20
> > > sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x10, 
> > > 0xf06d8cb8) at netbsd:swstrategy+0x3fc
> > > swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 0xf07c0fc0, 
> > > 0xf0518008) at netbsd:bdev_strategy+0x50
> > > bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at 
> > > netbsd:spec_strategy+0x88
> > > spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at 
> > > netbsd:VOP_STRATEGY+0x3c
> > > VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, 
> > > 0xf04fb000) at netbsd:uvm_swap_io+0x10c
> > > uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x10, 0x10, 0xf05b6480) at 
> > > netbsd:uvm_swap_get+0x3c
> > > uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at 
> > > netbsd:uvmfault_anonget+0x2c4
> > > uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, 
> > > 0xf044a530) at netbsd:uvm_fault_internal+0xbbc
> > > uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at 
> > > netbsd:mem_access_fault4m+0x514
> > > mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) 
> > > at netbsd:memfault_sun4m+0xe8
> > > memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, 
> > > 0x2) at netbsd:copyout+0x28
> > > copyout(0x0, 0xf3454d88, 0xedb05000, 0xe400, 0x0, 0xf0644e60) at 
> > > netbsd:rt_walktree_visitor+0xc
> > > rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xe400, 0x0, 
> > > 0x0) at netbsd:rn_walktree+0xbc
> > > rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, 
> > > 0xf05a4870) at netbsd:rtbl_walktree+0x30
> > > rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) 
> > > at netbsd:sysctl_rtable+0x114
> > > sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at 
> > > netbsd:sysctl_dispatch+0x94
> > > sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at 
> > > netbsd:sys___sysctl+0xc4
> > > sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xe404, 0x1b54, 
> > > 0xe400) a

Re: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself

2019-05-24 Thread Ryota Ozaki

On Thu, May 23, 2019 at 10:00 PM Paul Ripke  wrote:
>
> Old Sun sparc 5, 32MiB RAM, netbooted with nfs root & swap. Has been
> running fine with an old kernel built from netbsd-8:
>
> NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018
>
> Booting a kernel from netbsd-8 from the last few days:
>
> NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
>
> panics either during or shortly after boot, with the following console
> log:
>
> ---
> Starting sshd.
> Mutex error: mutex_vector_enter,552: locking against myself
>
> lock address : 0xf04aafc0
> current cpu  :  0
> current lwp  : 0xf0604680
> owner field  : 0xf0604680 wait/spin:0/0
>
> panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: 
> lock 0xf04aafc0 cpu 0 lwp 0xf0604680
> cpu0: Begin traceback...
> 0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at 
> netbsd:panic+0x20
> panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) at 
> netbsd:lockdebug_abort+0x9c
> lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, 
> 0xf0002000) at netbsd:mutex_enter+0x1cc
> mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, 0xf0604680) 
> at netbsd:sosend+0x44
> sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at 
> netbsd:nfs_send+0x90
> nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) at 
> netbsd:nfs_request+0x2f4
> nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at 
> netbsd:nfs_readrpc+0x1dc
> nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, 0xf04f6a40) 
> at netbsd:nfs_doio+0x6bc
> nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at 
> netbsd:VOP_STRATEGY+0x3c
> VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, 0xf0744000) 
> at netbsd:sw_reg_start.part.0+0x20
> sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x10, 
> 0xf06d8cb8) at netbsd:swstrategy+0x3fc
> swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 0xf07c0fc0, 0xf0518008) at 
> netbsd:bdev_strategy+0x50
> bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at 
> netbsd:spec_strategy+0x88
> spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at 
> netbsd:VOP_STRATEGY+0x3c
> VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, 
> 0xf04fb000) at netbsd:uvm_swap_io+0x10c
> uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x10, 0x10, 0xf05b6480) at 
> netbsd:uvm_swap_get+0x3c
> uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at 
> netbsd:uvmfault_anonget+0x2c4
> uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, 
> 0xf044a530) at netbsd:uvm_fault_internal+0xbbc
> uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at 
> netbsd:mem_access_fault4m+0x514
> mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) at 
> netbsd:memfault_sun4m+0xe8
> memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, 0x2) 
> at netbsd:copyout+0x28
> copyout(0x0, 0xf3454d88, 0xedb05000, 0xe400, 0x0, 0xf0644e60) at 
> netbsd:rt_walktree_visitor+0xc
> rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xe400, 0x0, 0x0) 
> at netbsd:rn_walktree+0xbc
> rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, 0xf05a4870) 
> at netbsd:rtbl_walktree+0x30
> rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) at 
> netbsd:sysctl_rtable+0x114
> sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at 
> netbsd:sysctl_dispatch+0x94
> sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at 
> netbsd:sys___sysctl+0xc4
> sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xe404, 0x1b54, 
> 0xe400) at netbsd:syscall+0x248
> syscall(0xcca, 0xf3454fb0, 0xede028d0, 0xca, 0x4e, 0xf0604680) at 
> netbsd:memfault_sun4m+0x3f4
> cpu0: End traceback...
> Frame pointer is at 0xf3453f20
> Call traceback:
>   pc = 0xf0024fec  args = (0xf02be550, 0x0, 0xffe2, 0xf02aca38, 0xf01dcfc8, 
> 0xf0002000) fp = 0xf3453f90
>   pc = 0xf01dd358  args = (0x104, 0x0, 0xf02cbb88, 0xf0002000, 0xf0321000, 
> 0xf0344c00) fp = 0xf3453ff8
>   pc = 0xf01dd3e4  args = (0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 
> 0xf0349648, 0x104) fp = 0xf3454058
> rebooting

Could you try the below patch?

Thanks,
  ozaki-r

---
diff --git a/sys/net/rtsock.c b/sys/net/rtsock.c
index 399b2049130..4f17e716e29 100644
--- a/sys/net/rtsock.c
+++ b/sys/net/rtsock.c
@@ -1873,7 +1873,7 @@ again:
w.w_needed = 0 - w.w_given;
w.w_where = where;

-   SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
+   KERNEL_LOCK_UNLESS_NET_MPSAFE();
s = splsoftnet();
switch (w.w_op) {

@@ -1932,7 +1932,7 @@ again:
break;
}
splx(s);
-   SOFTNET_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
+   KERNEL_UNLOCK_UNLESS_NET_MPSAFE();

/* check to see if we couldn't allocate memory with
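
Reading the trace, the same LWP appears to reach mutex_enter() in sosend()
while it is still inside sysctl_rtable().  Assuming the lock taken in
sysctl_rtable() is the one sosend() then wants (an inference from the
trace, not something stated in this thread), the panic is the lock
debugger catching a self-relock.  The sketch below shows that detection in
isolation; `struct dbg_lock`, `dbg_lock_enter()` and `dbg_lock_exit()` are
hypothetical names, not the kernel's mutex(9) implementation.

```c
#include <errno.h>

/* Hypothetical, minimal lock with owner tracking; 0 means "unowned". */
struct dbg_lock { int owner; };

/*
 * Acquire `l` on behalf of thread `self` (a non-zero id).  Returns
 * EDEADLK instead of waiting forever when `self` already holds it.
 */
int
dbg_lock_enter(struct dbg_lock *l, int self)
{
	if (l->owner == self)
		return EDEADLK;		/* "locking against myself" */
	if (l->owner != 0)
		return EBUSY;		/* held by someone else */
	l->owner = self;
	return 0;
}

/* Release `l`; the caller is expected to be the current owner. */
void
dbg_lock_exit(struct dbg_lock *l)
{
	l->owner = 0;
}
```

Under that assumption, not holding the lock across a copyout() that may
fault and recurse into the network stack avoids the second acquisition.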

File sharing over virtio-9p

2019-05-21 Thread Ryota Ozaki

Hi,

The following two patches enable a NetBSD guest running
on a Linux KVM to share files with its host over virtio-9p.

  https://www.netbsd.org/~ozaki-r/vio9p.diff
  https://www.netbsd.org/~ozaki-r/mount_9p-cdev.diff

The first patch exports a 9p endpoint of virtio-9p via a character
device (e.g., /dev/vio9p0) and the second one extends mount_9p
(puffs) to talk with a 9p server inside qemu via the character device.

[Usage]

Export a directory of a host via virtio-9p.  The XML below is part of
a libvirt domain configuration.  It exports the "/var/lib/libvirt/export"
directory with the tag "test".  If it is the first entry, it will have
minor number 0 and is typically assigned /dev/vio9p0.


  
  
  


A boot log of vio9p looks like this:

virtio0 at pci0 dev 3 function 0
virtio0: Virtio 9P Transport Device (rev. 0x00)
vio9p0 at virtio0: Features: 0x1000
virtio0: allocated 12288 byte for virtqueue 0 for vio9p, size 128
virtio0: using 4096 byte (256 entries) indirect descriptors
vio9p0: tagged as test
virtio0: interrupting at ioapic0 pin 11

A NetBSD guest can mount the exported directory with mount_9p.

mount_9p -cu /dev/vio9p0 /mnt/9p

-c tells mount_9p to interpret the first argument as a character
device file to talk with.


Have fun,
  ozaki-r

Re: iwm driver leads to kernel crash

2019-04-01 Thread Ryota Ozaki

On Mon, Apr 1, 2019 at 6:53 AM  wrote:
>
> Hi,
>
> would further dmesg outputs from the last 10 or so kernel crashes
> still be useful?

Yes, and if you have crashdumps, could you please provide detailed
information from them?  (see https://wiki.netbsd.org/panic/ for the
instructions).

> This still keeps happening (workaround so far is to use ethernet).
>
> Or maybe I'm looking at the wrong kind of bug and there is something
> to track / being worked on already?

I guess not.

Thanks,
  ozaki-r

Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes

2018-11-05 Thread Ryota Ozaki

On Mon, Nov 5, 2018 at 12:38 PM Ryota Ozaki  wrote:
>
(snip)
>
> I can reproduce the panic easily with this small program:
>
> // start--
> #include <sys/socket.h>
> #include <netinet/in.h>
> #include <err.h>
>
> int
> main(void)
> {
> 	char buf[64];
> 	struct sockaddr_storage ss = {0};
> 	int s, e;
>
> 	ss.ss_family = AF_INET6;
> 	ss.ss_len = sizeof(struct sockaddr_in6);

Oops. sin6_addr and sin6_port (of ss cast to sockaddr_in6)
must not be zero, so they should be set to something non-zero such as 1.

  ozaki-r

> 	s = socket(AF_INET6, SOCK_DGRAM, 0);
> 	e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len);
> 	if (e == -1)
> 		warn("sendto");
> 	ss.ss_len = sizeof(ss);
> 	e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len);
> 	if (e == -1)
> 		warn("sendto");
> }
> // --end

Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes

2018-11-05 Thread Ryota Ozaki

On Tue, Nov 6, 2018 at 10:41 AM Paul Ripke  wrote:
>
> On Mon, Nov 05, 2018 at 05:28:23PM +0900, Ryota Ozaki wrote:
> > On Mon, Nov 5, 2018 at 4:40 PM Michael van Elst  wrote:
> > >
> > > ozak...@netbsd.org (Ryota Ozaki) writes:
> > >
> > > >diff --git a/sys/netinet6/udp6_usrreq.c b/sys/netinet6/udp6_usrreq.c
> > > >index ee4fc6fdfb3..a4a74c8009e 100644
> > > >--- a/sys/netinet6/udp6_usrreq.c
> > > >+++ b/sys/netinet6/udp6_usrreq.c
> > > >@@ -668,10 +668,18 @@ udp6_output(struct in6pcb * const in6p, struct 
> > > >mbuf *m,
> > >
> > > >if (addr6) {
> > > >sin6 = addr6;
> > > >+   if (sin6->sin6_len != sizeof(*sin6)) {
> > > >+   error = EINVAL;
> > > >+   goto release;
> > > >+   }
> > > >if (sin6->sin6_family != AF_INET6) {
> > > >error = EAFNOSUPPORT;
> > > >goto release;
> > > >}
> > > >+   if (sin6->sin6_port == 0) {
> > > >+   error = EADDRNOTAVAIL;
> > > >+   goto release;
> > > >+   }
> > >
> > > The port validation is already done a few lines below,
> >
> > Thanks, that's right.
> >
> > > but the comment when using the port is a bit strange:
> > >
> > > fport = sin6->sin6_port; /* allow 0 port */
> > >
> > > Apparently that comment (and the port check) already
> > > existed when the initial version was imported.
> >
> > Well... I think the comment is just a leftover to be removed :-/
> >
> >   ozaki-r
>
> Thanks! Patched into netbsd-8, running with it now. I do wonder
> which process was responsible for doing the op. It's been too long
> since I've tried grokking gdb on kvm cores...

Thank you for testing! I hope the patch fixes the panic you encountered.
Anyway I'll commit and pull up the fix soon because it certainly fixes a panic.

  ozaki-r

Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes

2018-11-05 Thread Ryota Ozaki

On Mon, Nov 5, 2018 at 4:40 PM Michael van Elst  wrote:
>
> ozak...@netbsd.org (Ryota Ozaki) writes:
>
> >diff --git a/sys/netinet6/udp6_usrreq.c b/sys/netinet6/udp6_usrreq.c
> >index ee4fc6fdfb3..a4a74c8009e 100644
> >--- a/sys/netinet6/udp6_usrreq.c
> >+++ b/sys/netinet6/udp6_usrreq.c
> >@@ -668,10 +668,18 @@ udp6_output(struct in6pcb * const in6p, struct mbuf *m,
>
> >if (addr6) {
> >sin6 = addr6;
> >+   if (sin6->sin6_len != sizeof(*sin6)) {
> >+   error = EINVAL;
> >+   goto release;
> >+   }
> >if (sin6->sin6_family != AF_INET6) {
> >error = EAFNOSUPPORT;
> >goto release;
> >}
> >+   if (sin6->sin6_port == 0) {
> >+   error = EADDRNOTAVAIL;
> >+   goto release;
> >+   }
>
> The port validation is already done a few lines below,

Thanks, that's right.

> but the comment when using the port is a bit strange:
>
> fport = sin6->sin6_port; /* allow 0 port */
>
> Apparently that comment (and the port check) already
> existed when the initial version was imported.

Well... I think the comment is just a leftover to be removed :-/

  ozaki-r

Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes

2018-11-04 Thread Ryota Ozaki

On Mon, Nov 5, 2018 at 10:38 AM Paul Ripke  wrote:
>
> I'm running netbsd-8 synced as of ~2018-10-30, and am continuing to
> get occasional panics, about once a week on the prior kernel, and
> another just now. Is this familiar to anybody? Core on hand, pointers
> on what to look at appreciated.
>
> NetBSD slave 8.0_STABLE NetBSD 8.0_STABLE (SLAVE) #1: Tue Oct 30 07:40:01 
> AEDT 2018  
> stix@slave:/home/netbsd/netbsd-8/obj.amd64/home/netbsd/netbsd-8/src/sys/arch/amd64/compile/SLAVE
>  amd64
>
> Nov  5 12:09:59 slave /netbsd: panic: sockaddr_copy: source too long, 28 < 
> 128 bytes
> Nov  5 12:09:59 slave /netbsd: cpu0: Begin traceback...
> Nov  5 12:09:59 slave /netbsd: vpanic() at netbsd:vpanic+0x15d
> Nov  5 12:09:59 slave /netbsd: snprintf() at netbsd:snprintf
> Nov  5 12:09:59 slave /netbsd: sockaddr_copy() at netbsd:sockaddr_copy+0x7f
> Nov  5 12:09:59 slave /netbsd: rtcache_setdst() at netbsd:rtcache_setdst+0x5d
> Nov  5 12:09:59 slave /netbsd: rtcache_lookup2() at 
> netbsd:rtcache_lookup2+0x5e
> Nov  5 12:09:59 slave /netbsd: in6_selectroute() at 
> netbsd:in6_selectroute+0x15a
> Nov  5 12:09:59 slave /netbsd: in6_selectsrc() at netbsd:in6_selectsrc+0x100
> Nov  5 12:09:59 slave /netbsd: udp6_output() at netbsd:udp6_output+0x246
> Nov  5 12:09:59 slave /netbsd: udp6_send_wrapper() at 
> netbsd:udp6_send_wrapper+0x51
> Nov  5 12:09:59 slave /netbsd: sosend() at netbsd:sosend+0x77f
> Nov  5 12:09:59 slave /netbsd: do_sys_sendmsg_so() at 
> netbsd:do_sys_sendmsg_so+0x26d
> Nov  5 12:09:59 slave /netbsd: do_sys_sendmsg() at netbsd:do_sys_sendmsg+0x85
> Nov  5 12:09:59 slave /netbsd: sys_sendto() at netbsd:sys_sendto+0x5c
> Nov  5 12:09:59 slave /netbsd: syscall() at netbsd:syscall+0x1ec
> Nov  5 12:09:59 slave /netbsd: --- syscall (number 133) ---
> Nov  5 12:09:59 slave /netbsd: 7db02a8eea4a:
> Nov  5 12:09:59 slave /netbsd: cpu0: End traceback...
>
> (gdb) p *src
> $3 = {
>   sa_len = 128 '\200',
>   sa_family = 24 '\030',
>   sa_data = "\254\005\000\000\000\000 \001A\320\000\n^\207"
> }
>
> --
> Paul Ripke
> "Great minds discuss ideas, average minds discuss events, small minds
>  discuss people."
> -- Disputed: Often attributed to Eleanor Roosevelt. 1948.

I can easily reproduce the panic with this small program:

// start--
#include <sys/socket.h>
#include <netinet/in.h>
#include <err.h>

int
main(void)
{
	char buf[64];
	struct sockaddr_storage ss = {0};
	int s, e;

	ss.ss_family = AF_INET6;
	ss.ss_len = sizeof(struct sockaddr_in6);
	s = socket(AF_INET6, SOCK_DGRAM, 0);
	/* First send: ss_len matches a sockaddr_in6. */
	e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len);
	if (e == -1)
		warn("sendto");
	/* Second send: ss_len is now sizeof(ss), larger than a sockaddr_in6. */
	ss.ss_len = sizeof(ss);
	e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len);
	if (e == -1)
		warn("sendto");
}
// --end

It seems that, on UDP/IPv6, a sockaddr passed to sendto isn't validated
properly and is handed to the lower layers as is, which then triggers the
panic.

The length check of the sockaddr used to be performed implicitly in
udp6_output, but the check was accidentally removed between NetBSD 7 and 8 (*).

(*) 
http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/sys/netinet6/Attic/udp6_output.c.diff?r1=1.48=1.49=h

So the following patch fixes the issue. (There may be better fixes.)

Thanks,
  ozaki-r

diff --git a/sys/netinet6/udp6_usrreq.c b/sys/netinet6/udp6_usrreq.c
index ee4fc6fdfb3..a4a74c8009e 100644
--- a/sys/netinet6/udp6_usrreq.c
+++ b/sys/netinet6/udp6_usrreq.c
@@ -668,10 +668,18 @@ udp6_output(struct in6pcb * const in6p, struct mbuf *m,

if (addr6) {
sin6 = addr6;
+   if (sin6->sin6_len != sizeof(*sin6)) {
+   error = EINVAL;
+   goto release;
+   }
if (sin6->sin6_family != AF_INET6) {
error = EAFNOSUPPORT;
goto release;
}
+   if (sin6->sin6_port == 0) {
+   error = EADDRNOTAVAIL;
+   goto release;
+   }

/* protect *sin6 from overwrites */
tmp = *sin6;

Re: Running out of buffers?

2018-04-26 Thread Ryota Ozaki

On Fri, Apr 27, 2018 at 1:18 PM Roy Marples  wrote:

> On 27/04/2018 04:09, Paul Goyette wrote:
> > I've got lots of memory, so I don't understand what buffers are not
> > available.  Ever since upgrading to my current system (sources dated
> > 2018-03-20 11:25:00 UTC), I've been seeing these messages at random
> > intervals:
> >
> > Apr 23 05:51:33 speedy ntpd[526]: routing socket reports: No buffer
> > space available

> This may come as some surprise, but the only change is that the error is
> now logged. Previously it was silently discarded.

> No-one has yet weighed in on how this should be resolved.


> > I never saw them with a previous kernel (from March 3rd), so it
> > would seem that something changed between the 3rd and 20th.
> >
> > Is anyone else seeing similar?
> >
> > Any clues on what changed?
> >
> > The situation doesn't seem fatal (at least, not yet), but I'd like
> > to mitigate the condition before it gets worse.  :)

> Ideas welcome!
> The only one stop solution I can think of is increasing the default
> buffer size, but this might adversely affect small memory systems.

> > Thanks in advance for any suggestions.

> Looking forward to hearing some!

One option would be to add a new socket option and send up an error
only if it's set, which avoids surprising unaware apps.
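
As a rough illustration of that idea, the toy model below reports an
overflow to the reader only when an opt-in flag is set. Everything in it is
hypothetical (struct toy_rtsock, report_overflow, toy_rtsock_enqueue); it
is not an existing socket option or kernel structure, only a sketch of the
behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a routing-socket receive buffer; all names are made up. */
struct toy_rtsock {
	size_t used;		/* bytes currently queued */
	size_t limit;		/* receive buffer size */
	bool report_overflow;	/* the proposed opt-in option */
	unsigned long dropped;	/* messages lost so far */
};

/*
 * Try to queue a message of len bytes.  On overflow the message is
 * always dropped, but ENOBUFS is returned only to sockets that opted in;
 * unaware applications keep seeing success, as before.
 */
static int
toy_rtsock_enqueue(struct toy_rtsock *rs, size_t len)
{
	assert(rs->used <= rs->limit);

	if (len > rs->limit - rs->used) {
		rs->dropped++;
		return rs->report_overflow ? ENOBUFS : 0;
	}
	rs->used += len;
	return 0;
}
```

An application that can resynchronize its state after a lost message would
set the flag; others would keep the old silent behaviour.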

   ozaki-r

Re: netbsd-8 hang on tstile

2018-03-08 Thread Ryota Ozaki

On Thu, Mar 8, 2018 at 5:04 PM, Manuel Bouyer <bou...@antioche.eu.org> wrote:
> On Thu, Mar 08, 2018 at 11:04:00AM +0900, Ryota Ozaki wrote:
>> On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer <bou...@antioche.eu.org> wrote:
>> > On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
>> >> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
>> >> > >[...]
>> >> > > This is reproducible, restarting my automatic test script hangs the
>> >> > > same way. This is plain ffs, no wapbl.
>> >> > >
>> >> > > Any idea ?
>> >> >
>> >> > pserialize_perform can get stuck if any softints get stuck. Can you
>> >> > check if such softints exist?
>> >>
>> >> I'll look the next time I can see this.
>> >
>> > I had a console log of a previous hang. No softint appears to be waiting
>> > in the ps/a output
>>
>> Thanks. Hm, does ps/a show softints (say softnet/0)?
>> I use just ps for the purpose.
>
> I used plain ps too. Not sure why I added this /a in my mail (probably
> related to tr/a :)

Okay, thanks :)

So my concern probably proved unfounded.

  ozaki-r

Re: netbsd-8 hang on tstile

2018-03-07 Thread Ryota Ozaki

On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer <bou...@antioche.eu.org> wrote:
> On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
>> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
>> > >[...]
>> > > This is reproducible, restarting my automatic test script hangs the same
>> > > way. This is plain ffs, no wapbl.
>> > >
>> > > Any idea ?
>> >
>> > pserialize_perform can get stuck if any softints get stuck. Can you check
>> > if such softints exist?
>>
>> I'll look the next time I can see this.
>
> I had a console log of a previous hang. No softint appears to be waiting
> in the ps/a output

Thanks. Hm, does ps/a show softints (say softnet/0)?
I use just ps for the purpose.

  ozaki-r

Re: netbsd-8 hang on tstile

2018-03-06 Thread Ryota Ozaki

On Wed, Mar 7, 2018 at 7:33 AM, Manuel Bouyer  wrote:
> Hello
> on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
> Hung processes shows the same pattern, they sleep in fstrans_start():
> sleepq_block(0,0,c0596900,c0639b6c,c534e802,40,c03dbcfe,75,c5356340,c534e800) 
> at netbsd:sleepq_block+0xe6
> turnstile_block(c59847f0,1,c078b6c0,c0639b6c,de8b9b70,c59847f0,c5db6020,c59a2008,0,c59a2008)
>  at netbsd:turnstile_block+0x29d
> mutex_vector_enter(c078b6c0,c0cb8fe0,c055570c,de8b9b84,c05349d7,c575ed04,1,c59a2008,de8b9ba4,c0493ff8)
>  at netbsd:mutex_vector_enter+0x28c
> fstrans_start(c59a2008,0,de8b9ba8,10,c055cf2c,c575ed04,20002,20002,c575ed04,0)
>  at netbsd:fstrans_start+0x3b6
> VOP_LOCK(c575ed04,20002,22000,0,c03dc2e7,c0746e1a,de8b9d0c,20002,c575ed04,de8b9c88)
>  at netbsd:VOP_LOCK+0x48
> vn_lock(c575ed04,20002,4,0,de8b9c14,c03af32d,de8b9edc,3,0,5) at 
> netbsd:vn_lock+0x7f
> namei_tryemulroot(0,c012a96a,c532c240,de8b9ce8,c041f909,c532c2b4,de8b9d0c,de8b9d34,0,0)
>  at netbsd:namei_tryemulroot+0x14f
> namei(de8b9d0c,1,3,de8b9d10,c042266a,0,c532c240,de8b9d20,c04211d6,0) at 
> netbsd:namei+0x27
> check_exec(c5db6020,de8b9dc8,c5f558a8,de8b9da4,9cf21000,de8b9dac,c0114f1a,bad054d0,de8b9ddc,bf7fef58)
>  at netbsd:check_exec+0x40
> execve_loadvm(bf7feab8,c03c95a0,de8b9dc8,c5a6f000,c6460c00,c700b008,404,0,0,0)
>  at netbsd:execve_loadvm+0x233
> execve1(c5db6020,bf7fef58,bad054d0,bf7feab8,c03c95a0,de8b9f9c,c0113572,c5db6020,
>  de8b9f68,de8b9f60) at netbsd:execve1+0x3c
> sys_execve(c5db6020,de8b9f68,de8b9f60,c63d8290,0,c0636ddc,de8b9f68,0,0,bf7fef58)
>  at netbsd:sys_execve+0x31
> syscall() at netbsd:syscall+0x82
>
> I guess the culprit is:
> 2650 1 3   2 0   c5f92a80  python2.7 psrlz
> (an anita process, actually)
> sleepq_block(1,0,c059af17,c0639f80,c0640804,c5356340,c5358a80,c063f93c,6406c2,c059af17)
>  at netbsd:sleepq_block+0x1cd
> kpause(c059af17,0,1,0,c5448590,c5cec008,de5e7b5c,c048355a,c5330a28,de5e7b6c)
>  at netbsd:kpause+0xf2
> pserialize_perform(c5330a28,de5e7b6c,c0484a6f,c638e8c0,0,c055cf2c,c5cec008,504,0,de5e7b7c)
>  at netbsd:pserialize_perform+0x10a
> fstrans_setstate(c5cec008,0,fffe,c0115914,c5cec008,c5cec008,de5e7b94,c047a2c0,c5cec008,2)
>  at netbsd:fstrans_setstate+0x3a
> genfs_suspendctl(c5cec008,2,de5e7bb0,de5e7bd4,de5e7bb0,c0483fc4,c5cec008,2,de5e7bd4,de5e7bd4)
>  at netbsd:genfs_suspendctl+0x3a
> VFS_SUSPENDCTL(c5cec008,2,de5e7bd4,de5e7bd4,504,de5e7be4,c0486fd6,c5cec008,504,0)
>  at netbsd:VFS_SUSPENDCTL+0x20
> vfs_resume(c5cec008,504,0,de5e7bd4,c011599f,4,c5cec008,c5e978e4,c5e978e4,c5f92a80)
>  at netbsd:vfs_resume+0x74
> vrevoke(c5e978e4,de5e7c14,c049370a,de5e7c04,0,0,c055d160,c5e978e4,1,0) at 
> netbsd:vrevoke+0x96
> genfs_revoke(de5e7c04,0,0,c055d160,c5e978e4,1,0,de5e7cc4,c044dea9,c5e978e4) 
> at netbsd:genfs_revoke+0x1a
> VOP_REVOKE(c5e978e4,1,c5353f00,504,0,74,c5e978e4,0,190,) at 
> netbsd:VOP_REVOKE+0x4a
> pty_grant_slave(c5f92a80,504,0,c5cec008,0,10,c055cf2c,c5c271bc,20002,20002) 
> at netbsd:pty_grant_slave+0xc9
> ptmioctl(a501,0,48087446,c6922008,3,c5f92a80,c5f92a80,48087446,c055c260,3) at 
> netbsd:ptmioctl+0xdd
> cdev_ioctl(a501,0,48087446,c6922008,3,c5f92a80,a501,c5c271bc,c5c271bc,c5fd22c0)
>  at netbsd:cdev_ioctl+0xd0
> spec_ioctl(de5e7da0,c0115914,c5df9bc0,c055d1f0,c5c271bc,48087446,c6922008,3,c5eac9c0,48087446)
>  at netbsd:spec_ioctl+0x90
> VOP_IOCTL(c5c271bc,48087446,c6922008,3,c5eac9c0,c5e74380,c6fbe790,fffe,c0115914,0)
>  at netbsd:VOP_IOCTL+0x3e
> vn_ioctl(c5fd22c0,48087446,c6922008,c5f8eb74,c5fd22c0,,,0,0,c6922008)
>  at netbsd:vn_ioctl+0x9f
> sys_ioctl(c5f92a80,de5e7f68,de5e7f60,c63d8788,0,c0636d78,de5e7f68,0,0,7) at 
> netbsd:sys_ioctl+0x10a
> syscall() at netbsd:syscall+0x82
>
> This is reproducible, restarting my automatic test script hangs the same
> way. This is plain ffs, no wapbl.
>
> Any idea ?

pserialize_perform can get stuck if any softints get stuck. Can you check
if such softints exist?

  ozaki-r

Re: Status of 8.99.12

2018-02-11 Thread Ryota Ozaki

On Mon, Feb 12, 2018 at 9:48 AM, Paul Goyette  wrote:
> After an extended period of build breaks, I finally got a new release built
> from sources updated on 2018-02-10 at 04:02:43 UTC
>
> I'm seeing several problems with this release that were not seen with my
> previous installation (from last November).
>
> 1. Starting the gnucash program (from pkgsrc finance/gnucash) now takes
>about 3 times as long as before.  Even after successfully loading
>the image (to get libraries etc into the file system cache) it take
>more than three full minutes for the program to initialize.
>
> 2. Whenever I try to shutdown the system, I get a networking-related
>panic.  The following is manually transcribed:
>
> trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
>   cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
> curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack
>   0x80090a7e0c20
> kernel: protection fault trap, code = 0
> stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq
>   360(%rax),%rdi
> traceback:
> ip_setmoptions + 0x237
> ip_rtloutput + 0x218
> udp_ctloutput + 0x82
> udp_ctloutput_wrapper + 0x2c
> sosetopt + 0x67
> sys_setsockopt + 0x91
> syscall + 0x1ed (syscall #105)

Is the panic fixed by the following patch?

Thanks,
  ozaki-r


diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index 44d8032f387..2e5e346af91 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -1927,9 +1927,13 @@ ip_drop_membership(struct ip_moptions *imo,
const struct sockopt *sopt)
 * Give up the multicast address record to which the
 * membership points.
 */
-   IFNET_LOCK(imo->imo_membership[i]->inm_ifp);
+{
+   struct ifnet *inm_ifp = imo->imo_membership[i]->inm_ifp;
+   IFNET_LOCK(inm_ifp);
in_delmulti(imo->imo_membership[i]);
-   IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp);
+   /* ifp should not leave thanks to solock */
+   IFNET_UNLOCK(inm_ifp);
+}

/*
 * Remove the gap in the membership array.

Re: 8.99.12 panic [Re: two 8.99.9 panics]

2018-01-11 Thread Ryota Ozaki

On Thu, Jan 11, 2018 at 6:01 PM, Thomas Klausner  wrote:
> On Wed, Jan 10, 2018 at 01:40:52PM -0500, Christos Zoulas wrote:
>> On Jan 10,  7:32pm, t...@giga.or.at (Thomas Klausner) wrote:
>> -- Subject: Re: 8.99.12 panic [Re: two 8.99.9 panics]
>>
>> | On Wed, Jan 10, 2018 at 12:36:24PM -0500, Christos Zoulas wrote:
>> | > On Jan 10,  3:45pm, t...@giga.or.at (Thomas Klausner) wrote:
>> | > -- Subject: 8.99.12 panic [Re: two 8.99.9 panics]
>> | >
>> | > | I was told to try 8.99.12 (head as of ~20 minutes ago) and I quickly
>> | > | saw the second panic again.
>> | >
>> | > CVS update.
>> |
>> | I used your version, not ozaki-r's, rebooted and went away.
>> | 20 minutes later:
>>
>> Ozaki's version is probably going to work better :-)
>
> The machine survived half a day including a bulk build, so "yes".

Thanks!

(I'll send a pullup request of the fix to netbsd-8.)

  ozaki-r

Re: 8.99.12 panic [Re: two 8.99.9 panics]

2018-01-10 Thread Ryota Ozaki

On Wed, Jan 10, 2018 at 11:45 PM, Thomas Klausner  wrote:
> I was told to try 8.99.12 (head as of ~20 minutes ago) and I quickly
> saw the second panic again.
>
> Jan 10 15:42:23 yt /netbsd: panic: prevented null pointer dereference at 
> 0x360 (SMAP)
> Jan 10 15:42:23 yt /netbsd: cpu8: Begin traceback...
> Jan 10 15:42:23 yt /netbsd: vpanic() at netbsd:vpanic+0x140
> Jan 10 15:42:23 yt /netbsd: snprintf() at netbsd:snprintf
> Jan 10 15:42:23 yt /netbsd: trap() at netbsd:trap+0xc15
> Jan 10 15:42:23 yt /netbsd: --- trap (number 6) ---
> Jan 10 15:42:23 yt /netbsd: ip_setmoptions() at netbsd:ip_setmoptions+0x1ff
> Jan 10 15:42:23 yt /netbsd: ip_ctloutput() at netbsd:ip_ctloutput+0x260
> Jan 10 15:42:23 yt /netbsd: udp_ctloutput() at netbsd:udp_ctloutput+0x82
> Jan 10 15:42:23 yt /netbsd: udp_ctloutput_wrapper() at 
> netbsd:udp_ctloutput_wrapper+0x2c
> Jan 10 15:42:23 yt /netbsd: sosetopt() at netbsd:sosetopt+0x67
> Jan 10 15:42:23 yt /netbsd: sys_setsockopt() at netbsd:sys_setsockopt+0x91
> Jan 10 15:42:23 yt /netbsd: syscall() at netbsd:syscall+0x1d8
> Jan 10 15:42:23 yt /netbsd: --- syscall (number 105) ---
> Jan 10 15:42:23 yt /netbsd: 765ca20cde5a:
> Jan 10 15:42:23 yt /netbsd: cpu8: End traceback...
>
> Reduced workload: just syncthing and mercurial self tests.
>  Thomas

Does the following patch help you?

Thanks,
  ozaki-r


diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index d643aef71c8..4e70cc7ad94 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -1905,9 +1905,9 @@ ip_drop_membership(struct ip_moptions *imo,
const struct sockopt *sopt)
 * Give up the multicast address record to which the
 * membership points.
 */
-   IFNET_LOCK(ifp);
+   IFNET_LOCK(imo->imo_membership[i]->inm_ifp);
in_delmulti(imo->imo_membership[i]);
-   IFNET_UNLOCK(ifp);
+   IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp);

/*
 * Remove the gap in the membership array.

Re: Strange new test failures

2018-01-08 Thread Ryota Ozaki

On Sun, Jan 7, 2018 at 6:29 AM, Martin Husemann  wrote:
> We should have 0 unexpected failures, but the latest run got:
>
> Failed test cases:
> lib/librumphijack/t_config:fdoff, lib/librumphijack/t_tcpip:http, 
> lib/librumphijack/t_tcpip:nfs, lib/librumphijack/t_tcpip:ssh, 
> lib/librumphijack/t_vfs:cpcopy, lib/librumphijack/t_vfs:mv_x, 
> lib/librumphijack/t_vfs:paxcopy, 
> net/net/t_forwarding:ipforwarding_fastforward_v4, 
> net/net/t_forwarding:ipforwarding_fastforward_v6, 
> net/net/t_forwarding:ipforwarding_fragment_v4, 
> net/net/t_forwarding:ipforwarding_misc, net/net/t_mtudisc6:mtudisc6_basic, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_aesxcbcmac, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacmd5, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacripemd160, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha1, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha256, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha384, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha512, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_keyedmd5, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_keyedsha1, 
> net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_null, 
> net/npf/t_npf:npf_state, fs/nfs/t_rquotad:get_nfs_be_1_both, 
> fs/nfs/t_rquotad:get_nfs_be_1_group, fs/nfs/t_rquotad:get_nfs_be_1_user, 
> fs/nfs/t_rquotad:get_nfs_le_1_both, fs/nfs/t_rquotad:get_nfs_le_1_group, 
> fs/nfs/t_rquotad:get_nfs_le_1_user
>
>
> See http://www.netbsd.org/~martin/sparc64-atf for details.
>
> Anyone recognize their latest changes as potential culprit?

My fixes for the mbuf use-after-free should fix the ipsec failures, but
I don't know about the other failures. You may need to clean-build the
rump libraries (I needed to).

  ozaki-r

Re: Automated report: NetBSD-current/i386 build failure

2017-12-27 Thread Ryota Ozaki

On Thu, Dec 28, 2017 at 4:39 PM, NetBSD Test Fixture  wrote:
> This is an automatically generated notice of a NetBSD-current/i386
> build failure.
>
> The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
> using sources from CVS date 2017.12.28.06.13.50.
>
> An extract from the build.sh output follows:
>
> --- side.o ---
> #   compile  diff/side.o
> /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/i486--netbsdelf-gcc 
> -O2   -std=gnu99-Werror   -fPIE   -DLOCALEDIR=\"/usr/share/locale\" 
> -DHAVE_CONFIG_H  
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/external/gpl2/diffutils/dist/../include
>  
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/external/gpl2/diffutils/dist/lib
>  --sysroot=/tmp/bracket/build/2017.12.28.06.13.50-i386/destdir  -c
> /tmp/bracket/build/2017.12.28.06.13.50-i386/src/external/gpl2/diffutils/dist/src/side.c
> --- dependall-gettext ---
> /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/nbctfmerge -t -g -L 
> VERSION -o msgexec msgexec.o
> --- dependall-msgfmt ---
> dependall ===> external/gpl2/gettext/bin/msgfmt
> --- dependall-usr.bin ---
> /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/nbctfconvert -g -L 
> VERSION lstIsEmpty.o
> --- lstLast.o ---
> #   compile  make/lstLast.o
> /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/i486--netbsdelf-gcc 
> -O2 -fPIE-std=gnu99-Wall -Wstrict-prototypes -Wmissing-prototypes 
> -Wpointer-arith -Wno-sign-compare  -Wsystem-headers   -Wno-traditional   
> -Wa,--fatal-warnings  -Wreturn-type -Wswitch -Wshadow -Wcast-qual 
> -Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare 
> -Wold-style-definition -Wsign-compare -Wformat=2  -Wno-format-zero-length  
> -Werror-DUSE_META 
> --sysroot=/tmp/bracket/build/2017.12.28.06.13.50-i386/destdir  -DMAKE_NATIVE 
> -DUSE_EMALLOC -c
> /tmp/bracket/build/2017.12.28.06.13.50-i386/src/usr.bin/make/lst.lib/lstLast.c
> --- dependall-usr.sbin ---
> /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/nbctfconvert -g -L 
> VERSION msdosfs_vnops.o
> --- msdosfs_unicode.o ---
> #   compile  makefs/msdosfs_unicode.o
> /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/i486--netbsdelf-gcc 
> -O2 -fPIE-std=gnu99-Wall -Wstrict-prototypes -Wmissing-prototypes 
> -Wpointer-arith -Wno-sign-compare  -Wsystem-headers   -Wno-traditional   
> -Wa,--fatal-warnings  -Wreturn-type -Wswitch -Wshadow -Wcast-qual 
> -Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare 
> -Wold-style-definition -Wsign-compare -Wformat=2  -Wno-format-zero-length  
> -Werror--sysroot=/tmp/bracket/build/2017.12.28.06.13.50-i386/destdir 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/usr.sbin/makefs 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/mknod 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/usr.sbin/mtree -DMAKEFS 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/cd9660 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/ufs/chfs -DV7FS_EI 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/v7fs 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/newfs_v7fs 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/fsck -DMSDOS_EI 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/msdosfs 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/newfs_msdos 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/udf 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/newfs_udf 
> -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/fsck -D_KERNTYPES  -c  
>   
> /tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/msdosfs/msdosfs_unicode.c
> --- dependall-tests ---
> --- dependall-rump ---
> cc1: all warnings being treated as errors
> *** [workqueue.o] Error code 1
> nbmake[8]: stopped in 
> /tmp/bracket/build/2017.12.28.06.13.50-i386/src/tests/rump/kernspace
> 1 error
>
> The following commits were made between the last successful build and
> the failed build:
>
> 2017.12.28.04.36.15 ozaki-r src/tests/rump/kernspace/workqueue.c,v 1.2
> 2017.12.28.04.38.02 ozaki-r src/tests/rump/kernspace/workqueue.c,v 1.3
> 2017.12.28.05.43.42 msaitoh src/sys/dev/pci/xhci_pci.c,v 1.11
> 2017.12.28.06.10.01 msaitoh src/sys/dev/pci/ixgbe/ixgbe.c,v 1.119
> 2017.12.28.06.13.50 msaitoh src/sys/dev/pci/if_wm.c,v 1.550

Fixed.

  ozaki-r

Re: Automated report: NetBSD-current/i386 test failure

2017-12-13 Thread Ryota Ozaki

On Wed, Dec 13, 2017 at 9:27 PM, Andreas Gustafsson  wrote:
> NetBSD Test Fixture wrote:
>> The newly failing test cases are:
>>
>> fs/vfs/t_mtime_otrunc:puffs_otrunc_mtime_update
>> net/route/t_change:route_change_ifp
>> net/route/t_change:route_change_ifp_ifa
>>
>> The above tests failed in each of the last 3 test runs, and passed in
>> at least 27 consecutive runs before that.
>>
>> The following commits were made between the last successful test and
>> the failed test:
>>
>> 2017.12.11.02.33.17 knakahara src/sys/kern/subr_psref.c,v 1.8
>> 2017.12.11.02.33.17 knakahara src/sys/sys/psref.h,v 1.3
>> 2017.12.11.03.25.45 ozaki-r src/sys/net/if.c,v 1.412
>> 2017.12.11.03.25.45 ozaki-r src/sys/net/if.h,v 1.252
>> 2017.12.11.03.25.46 ozaki-r src/sys/net/npf/npf_ifaddr.c,v 1.3
>> 2017.12.11.03.25.46 ozaki-r src/sys/net/npf/npf_os.c,v 1.9
>> 2017.12.11.03.29.20 ozaki-r src/sys/net/if.c,v 1.413
>> 2017.12.11.03.29.20 ozaki-r src/sys/net/if.h,v 1.253
>> 2017.12.11.03.29.20 ozaki-r src/sys/net/if_bridge.c,v 1.145
>> 2017.12.11.03.29.20 ozaki-r src/sys/net/if_spppsubr.c,v 1.177
>> 2017.12.11.03.29.20 ozaki-r src/sys/net/if_vlan.c,v 1.119
>> 2017.12.11.03.29.20 ozaki-r 
>> src/sys/rump/net/lib/libnetinet/netinet_component.c,v 1.10
>
> Also, the tests now leave two rump_server processes looping in the
> background:
>
>   UID   PID PPIDCPU PRI NI   VSZ   RSS WCHAN   STAT TTYTIME 
> COMMAND
> 0  44331 310129  27  0 74696  4024 -   Rsl  ? 194:32.70 
> rump_server -lrumpdev -lrumpnet -lrumpnet_net -lrumpnet_netin
> 0 203561 666580  27  0 73884  4024 -   Rsl  ? 216:32.72 
> rump_server -lrumpdev -lrumpnet -lrumpnet_net -lrumpnet_netin
>
> This is slowing down the test VMs enough to make some of the test runs
> time out.
> --
> Andreas Gustafsson, g...@gson.org

Fixed in -current.

The cause of the failures was a bug in rtsock.c that called psref_release
on an ifa twice; it was not caused by the change to psref itself.

It seems that changing from LIST to SLIST revealed the bug. LIST could
tolerate the bug (LIST_REMOVE can be called twice on an item without errors
if the list isn't modified between the removals) while SLIST couldn't.
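
The LIST half of that can be shown in userland with a small sketch. It
assumes a <sys/queue.h> built without queue debugging (debug variants of
the macros may poison the entry on removal), and struct item,
item_count and double_remove_demo are names made up for this example;
this is not the rtsock.c or psref code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct item {
	int id;
	LIST_ENTRY(item) link;
};
LIST_HEAD(item_list, item);

/* Count the items currently linked on the list. */
static int
item_count(struct item_list *head)
{
	struct item *it;
	int n = 0;

	LIST_FOREACH(it, head, link)
		n++;
	return n;
}

/*
 * Build the list a -> b -> c, remove b twice without touching the list
 * in between, and return how many items are still linked.
 */
static int
double_remove_demo(void)
{
	struct item_list head = LIST_HEAD_INITIALIZER(head);
	struct item a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	LIST_INSERT_HEAD(&head, &c, link);
	LIST_INSERT_HEAD(&head, &b, link);
	LIST_INSERT_HEAD(&head, &a, link);
	assert(item_count(&head) == 3);

	LIST_REMOVE(&b, link);
	/* Buggy second removal: b still holds its old neighbour pointers,
	 * so this only rewrites the same values into a and c. */
	LIST_REMOVE(&b, link);

	return item_count(&head);
}
```

SLIST_REMOVE, by contrast, has to walk from the head to find the
predecessor of the item, so a second removal of an item that is no longer
on the list runs off the end of the list. That case is deliberately not
executed here.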

  ozaki-r

Re: panic with bpf-using tool

2017-12-11 Thread Ryota Ozaki

On Tue, Dec 12, 2017 at 1:03 AM, Thomas Klausner <t...@giga.or.at> wrote:
> On Thu, Dec 07, 2017 at 06:57:43PM +0900, Ryota Ozaki wrote:
>> On Thu, Dec 7, 2017 at 6:54 PM, Thomas Klausner <t...@giga.or.at> wrote:
>> > I just started net/trafshow for fun and shortly afterwards the machine
>> > paniced (NetBSD 8.99.7/amd64):
>> >
>> > WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7
>> > WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7
>> > vpanic() at netbsd:vpanic+0x140
>> > snprintf() at netbsd:snprintf
>> > lockdebug_abort() at netbsd:lockdebug_abort+0x6e
>> > mutex_vector_exit() at netbsd:mutex_vector_exit+0xe4
>> > callout_halt() at netbsd:callout_halt+0xe6
>> > bpf_read() at netbsd:bpf_read+0x199
>> > dofileread() at netbsd:dofileread+0x8f
>> > sys_read() at netbsd:sys_read+0x5f
>> > syscall() at netbsd:syscall+0x1d8
>> > --- syscall (number 3) ---
>> > 7bb169e3e1fa:
>> > cpu9: End traceback...
>> >
>> > Any ideas?
>>
>> Oops. Could you try the below patch?
>>
>> Thanks,
>>   ozaki-r
>>
>> diff --git a/sys/net/bpf.c b/sys/net/bpf.c
>> index c4bd8306042..64c9d4900bd 100644
>> --- a/sys/net/bpf.c
>> +++ b/sys/net/bpf.c
>> @@ -662,7 +662,7 @@ bpf_read(struct file *fp, off_t *offp, struct uio *uio,
>>
>> mutex_enter(d->bd_mtx);
>> if (d->bd_state == BPF_WAITING)
>> -   callout_halt(&d->bd_callout, d->bd_buf_mtx);
>> +   callout_halt(&d->bd_callout, d->bd_mtx);
>> timed_out = (d->bd_state == BPF_TIMED_OUT);
>> d->bd_state = BPF_IDLE;
>> mutex_exit(d->bd_mtx);
>>
>
> With this patch applied, trafshow runs for minutes without problems.
>
> Thank you!
>  Thomas

Good :) I've committed the patch.

Thanks!
  ozaki-r

Fails to build "ALL" kernel of amd64

2017-12-10 Thread Ryota Ozaki

Hi,

The "ALL" kernel of amd64 (and maybe i386 too) has failed to build
for some days (or months?). Unfortunately few people have
noticed the failure because http://build.tastylime.net/builders/ ,
which had been building kernels including ALL on every commit,
is now out of service.

Of course we should fix the build, but I think we should also
build the ALL kernel regularly somehow, for example as part of
build.sh release (just build, not install).

Thoughts?

  ozaki-r

Re: panic with bpf-using tool

2017-12-07 Thread Ryota Ozaki

On Thu, Dec 7, 2017 at 6:54 PM, Thomas Klausner  wrote:
> I just started net/trafshow for fun and shortly afterwards the machine
> paniced (NetBSD 8.99.7/amd64):
>
> WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7
> WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> lockdebug_abort() at netbsd:lockdebug_abort+0x6e
> mutex_vector_exit() at netbsd:mutex_vector_exit+0xe4
> callout_halt() at netbsd:callout_halt+0xe6
> bpf_read() at netbsd:bpf_read+0x199
> dofileread() at netbsd:dofileread+0x8f
> sys_read() at netbsd:sys_read+0x5f
> syscall() at netbsd:syscall+0x1d8
> --- syscall (number 3) ---
> 7bb169e3e1fa:
> cpu9: End traceback...
>
> Any ideas?

Oops. Could you try the below patch?

Thanks,
  ozaki-r

diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index c4bd8306042..64c9d4900bd 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -662,7 +662,7 @@ bpf_read(struct file *fp, off_t *offp, struct uio *uio,

mutex_enter(d->bd_mtx);
if (d->bd_state == BPF_WAITING)
-   callout_halt(&d->bd_callout, d->bd_buf_mtx);
+   callout_halt(&d->bd_callout, d->bd_mtx);
timed_out = (d->bd_state == BPF_TIMED_OUT);
d->bd_state = BPF_IDLE;
mutex_exit(d->bd_mtx);

Re: Automated report: NetBSD-current/i386 build failure

2017-09-27 Thread Ryota Ozaki

On Wed, Sep 27, 2017 at 6:47 PM, NetBSD Test Fixture  wrote:
> This is an automatically generated notice of a NetBSD-current/i386
> build failure.
>
> The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
> using sources from CVS date 2017.09.27.08.14.18.
>
> An extract from the build.sh output follows:
>
> --- ieee8023ad_lacp_sm_tx.pico ---
> #   compile  libagr/ieee8023ad_lacp_sm_tx.pico
> /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/i486--netbsdelf-gcc 
> -O2 -ffreestanding -fno-strict-aliasing -msoft-float -mno-mmx -mno-sse 
> -mno-avx -msoft-float -mno-mmx -mno-sse -mno-avx   -std=gnu99-Wall 
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-sign-compare  
> -Wsystem-headers   -Wno-traditional   -Wa,--fatal-warnings  -Wreturn-type 
> -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra -Wno-unused-parameter 
> -Wno-sign-compare -Werror -Wno-format-zero-length -Wno-pointer-sign   -fPIE 
> -fstack-protector -Wstack-protector   --param ssp-buffer-size=1   
> --sysroot=/tmp/bracket/build/2017.09.27.08.14.18-i386/destdir -DCOMPAT_50 
> -DCOMPAT_60 -DCOMPAT_70 -nostdinc -imacros 
> /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../include/opt/opt_rumpkernel.h
>  -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr 
> -I. 
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../../../common/include
>  -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../include
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../include/opt
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../../arch
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../..
>  -DDIAGNOSTIC -DKTRACE   -D_FORTIFY_SOURCE=2 -c-fPIC 
> /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../../net/agr/ieee8023ad_lacp_sm_tx.c
>  -o ieee8023ad_lacp_sm_tx.pico
> --- dependall-libnetipsec ---
> --- key.pico ---
> cc1: all warnings being treated as errors
> --- dependall-libnpf ---
> 
> uild/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../include 
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../include/opt
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../../arch
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../..
>  -DDIAGNOSTIC -DKTRACE   -D_FORTIFY_SOURCE=2 -c -DGPROF -DPROF-pg 
> /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../../net/npf/npf_tableset.c
>  -o npf_tableset.po
> --- dependall-libnetipsec ---
> --- keysock.po ---
> /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/nbctfconvert -g -L 
> VERSION keysock.po
> 
> /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/i486--netbsdelf-objcopy 
> -X  keysock.po
> --- dependall-libnetbt ---
> --- rfcomm_socket.po ---
> #   compile  libnetbt/rfcomm_socket.po
> /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/i486--netbsdelf-gcc 
> -O2 -ffreestanding -fno-strict-aliasing -msoft-float -mno-mmx -mno-sse 
> -mno-avx -msoft-float -mno-mmx -mno-sse -mno-avx   -std=gnu99-Wall 
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-sign-compare  
> -Wsystem-headers   -Wno-traditional   -Wa,--fatal-warnings  -Wreturn-type 
> -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra -Wno-unused-parameter 
> -Wno-sign-compare -Werror -Wno-format-zero-length -Wno-pointer-sign   -fPIE 
> -fstack-protector -Wstack-protector   --param ssp-buffer-size=1   
> --sysroot=/tmp/bracket/build/2017.09.27.08.14.18-i386/destdir -DCOMPAT_50 
> -DCOMPAT_60 -DCOMPAT_70 -nostdinc -imacros 
> /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../include/opt/opt_rumpkernel.h
>  -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt 
> -I. 
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../../../common/include
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../include
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../include/opt
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../../arch
>  
> -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../..
>  -DDIAGNOSTIC -DKTRACE   -D_FORTIFY_SOURCE=2 -c -DGPROF -DPROF-pg 
> /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../../netbt/rfcomm_socket.c
>  -o rfcomm_socket.po
> --- dependall-libnpf ---
> --- npf_conndb.po ---
> --- dependall-libnetipsec ---
> --- key.pico ---
> *** [key.pico] Error code 1
> nbmake[8]: stopped in 
>

Re: Using NET_MPSAFE

2017-08-14 Thread Ryota Ozaki

I'm sorry for the late reply.

On Wed, Aug 9, 2017 at 8:53 AM, Brian Buhrow  wrote:
> hello Ryota San.  Thank you for your detailed response.  Yes, I'm
> interested in using NetBSD as a router.  We use NetBSD-5 as router
> devices and find it quite reliable, but for higher speed applications,
> we're running into the cpu0 takes all interrupts bottleneck.  I can't
> promise to deliver anything in a timely manner, but I can look at the
> agr(4) driver and see if I can MP-ify it.

My co-worker would also work on the MP-ification of agr in a few months.
Of course your help is still welcome :)

> Are there any notes in English
> describing the basic procedures for MP-ifying a driver? I've done a bit of
> it for drivers under NetBSD-5, so it's not completely foreign to me.

My presentation at BSDCan 2017 includes rough procedures:

http://www.netbsd.org/gallery/presentations/ozaki-r/2017_BSDCan/BSDCan2017-ozaki-nakahara.pdf

There is no detailed documentation for MP-ification. You still need to
refer to the source code and change logs of existing MP-safe components
such as gif(4) and l2tp(4) to learn how to MP-ify a component.

> We also use pf(4) extensively.  Unfortunately, npf(4) doesn't have all
> the functionality we need to implement the configurations we use.
> Consequently, it may be necessary to MP-ify pf(4) as well, as I suspect
> that's easier than implementing its functionality in npf(4).

Personally I recommend extending npf because it's already MP-safe
and better maintained than pf and ipf, but MP-ifying pf is welcome
anyway.

  ozaki-r

Re: Using NET_MPSAFE

2017-08-07 Thread Ryota Ozaki

Hi,

On Sat, Aug 5, 2017 at 3:25 AM, Brian Buhrow  wrote:
> hello.  I'm excited to see the development of the MP-safe network
> stack in NetBSD.  Now that some progress has been made in that regard and
> there are MP-safe drivers and  stack components to use, I have some
> questions.  I'm interested in using options NET_MPSAFE in NetBSD-8.0_BETA
> and the eventual netbsd-8 release.  Here are my questions.  I apologize if
> some of them seem obvious, but I don't want to make any assumptions when
> trying this new stuff.

First of all, the primary target of the work is routers. So if you use
NetBSD as a client or a server, you may not benefit from the work.
For such users, we're looking for someone to work on MP-safe Layer 4 :)

>
> 1.  If I enable NET_MPSAFE in the kernel, will non-MP-ify'd components work
> in that kernel using the kernel lock?  In other words, if I enable
> NET_MPSAFE and use the wm(4) driver, I'll get MP performance out of the
> network stack.  However, what if I try to use a non-MP-ify'd component on
> that same machine, i.e. agr(4) or pf(4)?  It looks to me like things should
> work, but traffic through the non-MP-ify'd components will be single
> threaded.  Is this correct?

Nope, unfortunately. Non-MP-safe components need to be protected somehow
(probably by adding KERNEL_LOCK to the entry points of the component)
if NET_MPSAFE is enabled. That's why NET_MPSAFE is not enabled by default.
We're looking for someone to work on these tasks too.
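
To illustrate the idea of protecting a non-MP-safe component at its entry
points, here is a minimal userland sketch in C. It is only an analogy: a
pthread mutex (big_lock) stands in for KERNEL_LOCK, and legacy_input(),
legacy_input_locked() and run_workers() are hypothetical names, not NetBSD
functions.

```c
#include <pthread.h>

#define MAX_WORKERS 16

/* Hypothetical non-MP-safe component: it updates shared state unlocked. */
static unsigned long legacy_packets;

static void
legacy_input(void)
{
	legacy_packets++;	/* racy unless the caller serializes access */
}

/* Userland stand-in for the kernel big lock (assumption for this sketch). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Entry-point wrapper: every caller is serialized by the big lock. */
static void
legacy_input_locked(void)
{
	pthread_mutex_lock(&big_lock);
	legacy_input();
	pthread_mutex_unlock(&big_lock);
}

static void *
worker(void *arg)
{
	int calls = *(int *)arg;

	for (int i = 0; i < calls; i++)
		legacy_input_locked();
	return NULL;
}

/* Run several threads through the wrapper and return the final counter. */
static unsigned long
run_workers(int nthreads, int calls_per_thread)
{
	pthread_t threads[MAX_WORKERS];
	int started = 0;

	if (nthreads > MAX_WORKERS)
		nthreads = MAX_WORKERS;
	legacy_packets = 0;
	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&threads[started], NULL, worker,
		    &calls_per_thread) == 0)
			started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return legacy_packets;
}
```

Calling run_workers(4, 10000) sends four threads through the wrapper. The
component stays correct, but its work is serialized, which is the price of
this kind of protection.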

Nonetheless, some non-MP-safe components luckily work even if NET_MPSAFE
is enabled. For example, CARP isn't MP-safe yet, but it works quite
stably with NET_MPSAFE thanks to the big lock for the network stack
(softnet_lock). My dogfooding router has been running with it for several
months without any issues.

FYI: you can check the lists of MP-safe/non-MP-safe components at:
  https://nxr.netbsd.org/xref/src/doc/TODO.smpnet

>
> 2.  Am I correct that when NET_MPSAFE is turned on, the network stack is
> running as an LWP inside the kernel?

No.

> And, am I correct that this means that
> even if a particular network component is single-threaded, it's able to
> execute on any CPU, thus reducing CPU congestion on CPU0 as happens on the
> stock NetBSD kernels?

NetBSD doesn't have dedicated threads for network components (except for
timers). For transmissions from a userland program, the network stack runs
in an LWP of the program. For receptions, the network stack runs in
software interrupt contexts. In either case, the big locks (KERNEL_LOCK and
softnet_lock) prevent such contexts from running in parallel. The NET_MPSAFE
option gets rid of (some of) the big locks and thus the network stack can
run in parallel on multiple CPUs.

NET_MPSAFE doesn't remove the big locks for transmissions from userland,
so packet sending doesn't run in parallel. For packet reception and
forwarding, NET_MPSAFE removes the big locks and packet processing runs
in parallel. If you use one of the MP-safe network device drivers such as
wm(4), NET_MPSAFE enables the hardware multi-queue feature and incoming
packets are delivered to multiple CPUs. If you use non-MP-safe drivers,
all packets are delivered to CPU0 and no packet processing runs in
parallel even if NET_MPSAFE is enabled.

>
> 3.  How stable  is the NET_MPSAFE stack?  Is anyone using it in any sort of
> production environment?
> the BSDCAN paper I read suggests it's pretty stable, but I'm wondering if
> anyone can report their experience.

We (IIJ) are working on making the network stack with NET_MPSAFE stable
enough for production use.

What I can say now is that if you use only MP-safe network components,
it should be stable.

Regards,
  ozaki-r

Re: panic in wqinput_input

2017-05-21 Thread Ryota Ozaki

On Thu, May 4, 2017 at 6:31 AM, Thomas Klausner <t...@giga.or.at> wrote:
> On Wed, May 03, 2017 at 05:11:12PM +0900, Ryota Ozaki wrote:
>> On Wed, May 3, 2017 at 4:16 PM, Thomas Klausner <t...@giga.or.at> wrote:
>> > Hi!
>> >
>> > Last night my 7.99.67/amd64 rebooted after this panic:
>> >
>> > fatal page fault in supervisor mode
>> > trap type 6 code 0x2 rip 0x80a815b6 cs 0x8 rflags 0x10286 cr2 0 
>> > ilevel 0x4 rsp 0xfe813a414dc0
>> > curlwp 0xfe882df26420 pid 0.3 lowest kstack 0xfe813a4112c0
>> > panic: trap
>> > cpu0: Begin traceback...
>> > vpanic() at netbsd:vpanic+0x140
>> > snprintf() at netbsd:snprintf
>> > trap() at netbsd:trap+0xc6b
>> > --- trap (number 6) ---
>> > wqinput_input() at netbsd:wqinput_input+0x43
>> > icmp6_input() at netbsd:icmp6_input+0x17
>> > ip6_input() at netbsd:ip6_input+0x6cb
>> > ip6intr() at netbsd:ip6intr+0x71
>> > softint_dispatch() at netbsd:softint_dispatch+0xd3
>> > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe813a414ff0
>> > Xsoftintr() at netbsd:Xsoftintr+0x4f
>> > --- interrupt ---
>> > 0:
>> > cpu0: End traceback...
>> >
>> > It probably was under high memory load, so it could be an
>> > out-of-memory situation.
>> >
>> > Crash dump is available.
>> >  Thomas
>>
>> I think the below diff fixes the panic.
>>
>> Could you confirm that the variable "work" was NULL by
>> inspecting the crash dump?
>
> Thank you for your reply and patch.

I'm sorry for not replying to this mail.

>
> I'm sorry, it seems something went wrong with the crash dump, it looks
> like this:

Hmm, I'm not sure why the crash dump is broken. Anyway,
my patch should be necessary and is likely to fix the panic,
so I committed it.

If the same panic happens again, please let me know.

Thanks,
  ozaki-r

>
> (gdb) target kvm netbsd.107.core
> 0x80219f55 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:674
> 674 dumpsys();
> (gdb) bt
> #0  0x80219f55 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:674
> #1  0x809a867c in vpanic (fmt=fmt@entry=0x810d5762 "trap", 
> ap=ap@entry=0xfe813b390888) at /usr/src/sys/kern/subr_prf.c:342
> #2  0x809a8730 in panic (fmt=fmt@entry=0x810d5762 "trap") at 
> /usr/src/sys/kern/subr_prf.c:258
> #3  0x8021bb86 in trap (frame=0xfe813b3909c0) at 
> /usr/src/sys/arch/amd64/amd64/trap.c:297
> #4  0x8020113e in alltraps ()
> #5  0x8021ba0f in trap (frame=0xfe813b390b90) at 
> /usr/src/sys/arch/amd64/amd64/trap.c:345
> #6  0x8020113e in alltraps ()
> #7  0x81dc7033 in ?? ()
> #8  0xfe81046a in ?? ()
> #9  0xfe870023 in ?? ()
> #10 0xfe813b390cf0 in ?? ()
> #11 0x80445829 in usbd_fill_deviceinfo (dev=0xfe839b9bdab0, 
> di=0xfe86c3e7b020, usedev=0) at /usr/src/sys/dev/usb/usb_subr.c:1507
> Backtrace stopped: frame did not save the PC
> (gdb)
>
> "thread apply all bt" makes gdb crash, as does "thread 2.1", so I
> don't know how to find backtraces for other active threads.
>
> (gdb) thread apply all bt
>
> Thread 2.1 ():
> /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gdbarch.c:4884: 
> internal-error: gdbarch_addressable_memory_unit_size: Assertion `gdbarch != 
> NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n) y
>
> This is a bug, please report it.  For instructions, see:
> <http://www.gnu.org/software/gdb/bugs/>.
>
> /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gdbarch.c:4884: 
> internal-error: gdbarch_addressable_memory_unit_size: Assertion `gdbarch != 
> NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Create a core file of GDB? (y or n) y
> Abort (core dumped)
>
> (gdb) thread 2.1
> [Switching to thread 2.1 ()]
> /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gdbarch.c:4884: 
> internal-error: gdbarch_addressable_memory_unit_size: Assertion `gdbarch != 
> NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n)
> ...
>
>
> gdb backtrace for the first case:
>
> (gdb) bt
> #0  0x7f61c731cd8a in _lwp_kill () from /usr/lib

Re: re(4) trouble

2017-03-10 Thread Ryota Ozaki

On Wed, Mar 8, 2017 at 6:42 PM, Patrick Welche  wrote:
> On Fri, Mar 03, 2017 at 05:31:30PM +, Patrick Welche wrote:
>> Netbooted a new box with this morning's -current/amd64, so its
>> network interface is successfully configured. ftping the sets
>> failed. I just typed dhcpcd to check, and:
> ...
>> mounted the disks, tried ftp again - 100% packet loss. Tried dhcpcd -k, and
>> it didn't return - hang.
>>
>> re1 at pci4 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 
>> 0x07)
>> re1: interrupting at msi4 vec 0
>> re1: Ethernet address 00:01:2e:67:bc:68
>> re1: using 256 tx descriptors
>> rgephy1 at re1 phy 7: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 5
>> rgephy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
>> 1000baseT-FDX, auto
>>
>> is the interface involved...
>
> Guessing fixed by ozaki-r
>
> http://mail-index.netbsd.org/current-users/2017/03/05/msg031261.html

Oh, I'm sorry for not responding to the report. I agree with your guess.

Thank you for the report,
  ozaki-r

Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-07 Thread Ryota Ozaki

On Wed, Mar 8, 2017 at 12:32 AM, Tom Ivar Helbekkmo
 wrote:
> Tom Ivar Helbekkmo  writes:
>
>> It seems rather unwilling to crash again.  It's possible, I guess,
>> that the last one was actually due to the corrupted file system, which
>> I fixed.  I'll keep trying, anyway.
>
> It seems I didn't entirely fix it, and that was, indeed, it.  I still
> haven't been able to fix it completely, but with the troublesome inode
> sitting as an empty file in /lost+found, things seem stable.  I guess
> I'll have to boot from CD, and dump, newfs and restore the file system.
> (I used fsdb to fake a link count of 1 on the inode, so that fsck put it
> in lost+found.  Just clearing it made the problem come back next boot.)
>
> Heads up to those others who've experienced fs crashes, in other words.

The issue of ffs has been fixed in PR kern/52045.

>
> However, with the fixes committed by ozaki-r, it seems the networking
> troubles with re devices have been solved, which is very nice.  :)

Good :)

  ozaki-r

Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-06 Thread Ryota Ozaki

On Tue, Mar 7, 2017 at 3:05 AM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> 
wrote:
> Ryota Ozaki <ozak...@netbsd.org> writes:
>> <t...@hamartun.priv.no> wrote:
>>> No idea, I'm afraid.  I'll just have to provoke another one.
>
> It seems rather unwilling to crash again.  It's possible, I guess, that
> the last one was actually due to the corrupted file system, which I
> fixed.  I'll keep trying, anyway.

Thanks.

I also encountered the ffs panic. It seems to happen on bootup after
a kernel panicked and a filesystem became unclean. It's related to
neither NFS nor re(4). I guess it's because of a recent ffs/vfs change.

>
>> BTW sysctl -w ddb.onpanic=1 may help to avoid losing the first crash.
>
> I've been wondering about that: will it do the right thing if the system
> crashes while I'm using X on the console?  I assume it has to switch to
> a text-based virtual console, and then run the debugger on that?

Oh, I have no idea how it behaves under X because I don't use X daily. I
guess you need to switch to a text console beforehand to see DDB on a
kernel panic.

  ozaki-r

Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Ryota Ozaki

On Mon, Mar 6, 2017 at 12:24 PM, Tom Ivar Helbekkmo
<t...@hamartun.priv.no> wrote:
> Ryota Ozaki <ozak...@netbsd.org> writes:
>
>> Hmm. Where did the first crash happen? In re(4) or NFS or ffs?
>
> No idea, I'm afraid.  I'll just have to provoke another one.

NP, thanks.

BTW sysctl -w ddb.onpanic=1 may help to avoid losing the first crash.

  ozaki-r

Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Ryota Ozaki

On Mon, Mar 6, 2017 at 4:31 AM, Tom Ivar Helbekkmo  
wrote:
> Tom Ivar Helbekkmo  writes:
>
>> Have done so, and have been building stuff for two or three hours with
>> no accidents.  Looking good so far, in other words.
>
> It crashed just now, though.  Unfortunately, it crashed again during
> boot, and ended up overwriting the old core dump before saving it.  :(

Hmm. Where did the first crash happen? In re(4) or NFS or ffs?

  ozaki-r

Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Ryota Ozaki

On Sun, Mar 5, 2017 at 8:18 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> Hi Tom
>
> Thank you for the reports.
>
>
> On Sun, Mar 5, 2017 at 6:59 PM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> 
> wrote:
>> I updated again yesterday, and it seems at least one stability issue
>> has been introduced since 7.99.59, which I was running before this.
>>
>> The first crash came when I was trying to shut down to single user after
>> booting the new kernel with the existing userland.  I *think* it was
>> triggered by the kernel missing the correct module directory; I caught a
>> glimpse of it trying to access a module to connect to the console, and I
>> later discovered that my ttys file had console enabled instead of ttyE0:
>>
>> panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() 
>> || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file 
>> "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, 
>> but preemption is enabled and the caller is not in a softint or CPU-bound LWP
>> cpu1: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
>> psref_release() at netbsd:psref_release+0xf8
>> ip_setmoptions() at netbsd:ip_setmoptions+0x269
>> ip_ctloutput() at netbsd:ip_ctloutput+0x1ee
>> rip_ctloutput() at netbsd:rip_ctloutput+0xee
>> rip_ctloutput_wrapper() at netbsd:rip_ctloutput_wrapper+0x2c
>> sosetopt() at netbsd:sosetopt+0x67
>> sys_setsockopt() at netbsd:sys_setsockopt+0x91
>> syscall() at netbsd:syscall+0x1d8
>> --- syscall (number 105) ---
>> 7eb0dacdb16a:
>> cpu1: End traceback...
>
> I fixed the panic in -current.
>
>>
>> Then it crashed during boot, seemingly related to fsck:
>>
>> panic: ffs_sync: rofs mod, fs=/
>> cpu0: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> snprintf() at netbsd:snprintf
>> ffs_sync() at netbsd:ffs_sync+0x26b
>> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
>> sched_sync() at netbsd:sched_sync+0x27b
>> cpu0: End traceback...
>>
>> Anyway, I installed the complete updated userland on the machine, and
>> started updating a bunch of packages from source, with all disk activity
>> over NFS over UDP over IPv6.  After about three hours:
>>
>> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file 
>> "/usr/src/sys/dev/ic/rtl8169.c", line 1380
>> cpu0: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
>> re_txeof() at netbsd:re_txeof+0x250
>> re_intr() at netbsd:re_intr+0x11b
>> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
>> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
>> --- interrupt ---
>> x86_mwait() at netbsd:x86_mwait+0xd
>> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
>> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
>> idle_loop() at netbsd:idle_loop+0x18c
>> cpu0: End traceback...
>> uvm_fault(0xfe80cbca48c0, 0x0, 2) -> e
>> fatal page fault in supervisor mode
>> trap type 6 code 2 rip 8095500b cs 8 rflags 10282 cr2 84 ilevel 8 
>> rsp fe8040afea80
>> curlwp 0xfe804dedaa20 pid 20873.1 lowest kstack 0xfe8040afb2c0
>>
>> Once more, it crashed during boot, just like after the first crash:
>>
>> panic: ffs_sync: rofs mod, fs=/
>> cpu1: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> snprintf() at netbsd:snprintf
>> ffs_sync() at netbsd:ffs_sync+0x26b
>> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
>> sched_sync() at netbsd:sched_sync+0x27b
>> cpu1: End traceback...
>>
>> I tried to continue building packages over NFS, but this happened again:
>>
>> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file 
>> "/usr/src/sys/dev/ic/rtl8169.c", line 1380
>> cpu0: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
>> re_txeof() at netbsd:re_txeof+0x250
>> re_intr() at netbsd:re_intr+0x11b
>> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
>> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
>> --- interrupt ---
>> x86_mwait() at netbsd:x86_mwait+0xd
>> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
>> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
>> idle_loop() at netbsd:idle_loop+0x18c
>> cpu0: End traceback...
>>
>> This is when I p

Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Ryota Ozaki

Hi Tom

Thank you for the reports.


On Sun, Mar 5, 2017 at 6:59 PM, Tom Ivar Helbekkmo  
wrote:
> I updated again yesterday, and it seems at least one stability issue
> has been introduced since 7.99.59, which I was running before this.
>
> The first crash came when I was trying to shut down to single user after
> booting the new kernel with the existing userland.  I *think* it was
> triggered by the kernel missing the correct module directory; I caught a
> glimpse of it trying to access a module to connect to the console, and I
> later discovered that my ttys file had console enabled instead of ttyE0:
>
> panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() 
> || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file 
> "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, 
> but preemption is enabled and the caller is not in a softint or CPU-bound LWP
> cpu1: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
> psref_release() at netbsd:psref_release+0xf8
> ip_setmoptions() at netbsd:ip_setmoptions+0x269
> ip_ctloutput() at netbsd:ip_ctloutput+0x1ee
> rip_ctloutput() at netbsd:rip_ctloutput+0xee
> rip_ctloutput_wrapper() at netbsd:rip_ctloutput_wrapper+0x2c
> sosetopt() at netbsd:sosetopt+0x67
> sys_setsockopt() at netbsd:sys_setsockopt+0x91
> syscall() at netbsd:syscall+0x1d8
> --- syscall (number 105) ---
> 7eb0dacdb16a:
> cpu1: End traceback...

I fixed the panic in -current.

>
> Then it crashed during boot, seemingly related to fsck:
>
> panic: ffs_sync: rofs mod, fs=/
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ffs_sync() at netbsd:ffs_sync+0x26b
> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
> sched_sync() at netbsd:sched_sync+0x27b
> cpu0: End traceback...
>
> Anyway, I installed the complete updated userland on the machine, and
> started updating a bunch of packages from source, with all disk activity
> over NFS over UDP over IPv6.  After about three hours:
>
> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file 
> "/usr/src/sys/dev/ic/rtl8169.c", line 1380
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
> re_txeof() at netbsd:re_txeof+0x250
> re_intr() at netbsd:re_intr+0x11b
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
> --- interrupt ---
> x86_mwait() at netbsd:x86_mwait+0xd
> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
> idle_loop() at netbsd:idle_loop+0x18c
> cpu0: End traceback...
> uvm_fault(0xfe80cbca48c0, 0x0, 2) -> e
> fatal page fault in supervisor mode
> trap type 6 code 2 rip 8095500b cs 8 rflags 10282 cr2 84 ilevel 8 rsp 
> fe8040afea80
> curlwp 0xfe804dedaa20 pid 20873.1 lowest kstack 0xfe8040afb2c0
>
> Once more, it crashed during boot, just like after the first crash:
>
> panic: ffs_sync: rofs mod, fs=/
> cpu1: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ffs_sync() at netbsd:ffs_sync+0x26b
> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
> sched_sync() at netbsd:sched_sync+0x27b
> cpu1: End traceback...
>
> I tried to continue building packages over NFS, but this happened again:
>
> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file 
> "/usr/src/sys/dev/ic/rtl8169.c", line 1380
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
> re_txeof() at netbsd:re_txeof+0x250
> re_intr() at netbsd:re_intr+0x11b
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
> --- interrupt ---
> x86_mwait() at netbsd:x86_mwait+0xd
> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
> idle_loop() at netbsd:idle_loop+0x18c
> cpu0: End traceback...
>
> This is when I pointed WRKOBJDIR to a local scratch directory in
> /etc/mk.conf, thus reducing the amount of network traffic severely.
> It's now building happily.  :)
>
> I've noticed quite a few IPv6 changes, lately.  Might these mbuf related
> assertions have something to do with that?

I rather suspect the change in rtl8169.c,v 1.149; it applied the deferred
if_start mechanism to re(4).

Could you apply the following patch and try again? If the patch doesn't
help, could you revert rtl8169.c,v 1.149 and try again?

Thanks,
  ozaki-r

diff --git a/sys/net/if.c b/sys/net/if.c
index 482bcbe..61c1b50 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -1008,8 +1008,11 @@ if_deferred_start_softint(void *arg)
 static void
 if_deferred_start_common(struct ifnet *ifp)
 {
+   int s;

+   s = splnet();
    if_start_lock(ifp);
+   splx(s);
 }

 static inline bool

Re: re(4) bpf-related assert

2017-02-19 Thread Ryota Ozaki

On Sun, Feb 19, 2017 at 9:17 PM,   wrote:
> It holds up fine (although I'm not confident I know
> how to end up in that code).
> Thanks!

Thank you for testing! That code is executed when there is
a certain amount of Tx/Rx traffic on the NIC.

  ozaki-r

Re: re(4) bpf-related assert

2017-02-17 Thread Ryota Ozaki

Hi,

On Fri, Feb 17, 2017 at 7:40 PM,   wrote:
> Hi,
>
> I'm using:
> re0 at pci8 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 
> 0x06)
>
> while using -current I got:
> System panicked: kernel diagnostic assertion "!cpu_intr_p()" failed: file 
> "/usr/src/sys/net/bpf.c", line 1577
>
> _KERNEL_OPT_NARCNET() at 0
> ?() at 80001d563000
> vpanic() at vpanic+0x149
> ch_voltag_convert_in() at ch_voltag_convert_in
> _bpf_mtap() at _bpf_mtap+0x48f
> re_start() at re_start+0x3d8
> re_intr() at re_intr+0x176
> intr_biglock_wrapper() at intr_biglock_wrapper+0x1d
> Xintr_ioapic_edge18() at Xintr_ioapic_edge18+0xee
> --- interrupt ---
> Xspllower() at Xspllower+0xe
> callout_softclock() at callout_softclock+0x41c
> softint_dispatch() at softint_dispatch+0xda
>
> how should bpf_mtap callers be adjusted in this case?
>
> thanks.

Sorry, I forgot to make re use the deferred if_start mechanism.

Could you try the patch?
  ozaki-r

diff --git a/sys/dev/ic/rtl8169.c b/sys/dev/ic/rtl8169.c
index 691afa4..d262af1 100644
--- a/sys/dev/ic/rtl8169.c
+++ b/sys/dev/ic/rtl8169.c
@@ -869,6 +869,7 @@ re_attach(struct rtk_softc *sc)
     * Call MI attach routine.
     */
    if_attach(ifp);
+   if_deferred_start_init(ifp, NULL);
    ether_ifattach(ifp, eaddr);

    rnd_attach_source(&sc->rnd_source, device_xname(sc->sc_dev),
@@ -1496,8 +1497,8 @@ re_intr(void *arg)
        }
    }

-   if (handled && !IFQ_IS_EMPTY(&ifp->if_snd))
-       re_start(ifp);
+   if (handled)
+       if_schedule_deferred_start(ifp);

    rnd_add_uint32(&sc->rnd_source, status);
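
The effect of the patch above can be sketched in plain C: the hard interrupt
handler no longer calls the start routine itself, it only records a request,
and the start routine runs later from a softint-like context. This is a
simplified model under stated assumptions, not the kernel implementation:
struct fake_ifnet and all fake_* functions are hypothetical, and the
coalescing of repeated requests into one run is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an interface with a deferred-start hook. */
struct fake_ifnet {
	bool start_pending;	/* a deferred start has been requested */
	bool in_hardintr;	/* true while the fake hard interrupt runs */
	int  starts_run;	/* how many times the start routine ran */
	int  starts_in_intr;	/* how many of those ran in hard interrupt */
};

/* Start routine: in the report, this path must not run in hard interrupt. */
static void
fake_start(struct fake_ifnet *ifp)
{
	if (ifp->in_hardintr)
		ifp->starts_in_intr++;
	ifp->starts_run++;
}

/* Called from the interrupt handler: only records the request. */
static void
fake_schedule_deferred_start(struct fake_ifnet *ifp)
{
	ifp->start_pending = true;
}

/* Fake hard interrupt handler: schedules instead of calling fake_start(). */
static void
fake_intr(struct fake_ifnet *ifp)
{
	ifp->in_hardintr = true;
	fake_schedule_deferred_start(ifp);
	ifp->in_hardintr = false;
}

/* Fake software interrupt: runs the start routine if one was requested. */
static void
fake_softint(struct fake_ifnet *ifp)
{
	if (!ifp->start_pending)
		return;
	ifp->start_pending = false;
	fake_start(ifp);
}

/* Two interrupts followed by two softint runs; returns the final state. */
static struct fake_ifnet
demo_deferred_start(void)
{
	struct fake_ifnet ifp = { false, false, 0, 0 };

	fake_intr(&ifp);
	fake_intr(&ifp);
	fake_softint(&ifp);
	fake_softint(&ifp);	/* nothing pending, so nothing runs */
	return ifp;
}
```

In this model the start routine never runs while in_hardintr is set, which
corresponds to what the "!cpu_intr_p()" assertion in the report requires of
the bpf_mtap caller.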

Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Ryota Ozaki

On Tue, Jan 24, 2017 at 12:53 AM, Tom Ivar Helbekkmo
<t...@hamartun.priv.no> wrote:
> Ryota Ozaki <ozak...@netbsd.org> writes:
>
>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>
> I'll give it a go tonight, and report back.

Thanks.

>
> Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
> consequences for NFS?  There's a problem that's been around for at least
> a couple of months, but that I only discovered the other day -- I was
> running with kernels from late October then, and the problem I observed
> is still there after upgrading.

I'm not sure. I don't know much about NFS, how it works and how it involves
the network stack.

>
> Reading NFS file systems is no problem, which is why I didn't notice it
> before, but writing hangs.  Here's an example: I started compiling a C
> source file directly to an executable on an NFS mounted file system
> (server and client both amd64 running fresh -current).  The compile pass
> is fine, but when the ld end of the pipeline wants to write the
> executable, it hangs.  So I try to do a 'df' in another terminal, and it
> hangs.  Finally, I simply attempt to make 'ls -l [target executable]'
> show me if it's written anything yet, and that hangs, too: after an
> attempt to write has hung the communication up, reads no longer work,
> either:
>
>  UID   PID  PPID  CPU PRI NI   VSZ  RSS WCHAN  STAT TTY     TIME    COMMAND
>    0 22179 22678    0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
>  501 21370 21006  516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
>  501 21710     1    0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
> Once I have something with "tstile" in the "WCHAN" column, I know that
> I can't just reboot the machine: it's going to take a hard reset.

Can you get into DDB? If so, you can find out where the processes hang:
  db> ps # you can get LWP addresses of ld and ls
  db> bt/a  # you can get their stack traces

I also guess that ps will show some other LWPs stuck on tstile, for example
softnet/N. Stack traces of those LWPs would explain how the hang happens,
or at least give hints for investigation.

>
> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again.  On
> the server, the output file got created, but is zero bytes.  The error
> logged on the client when it gets stuck is this console output:
>
> nfs send error 64 for barsoom:/usr/local
>
> ...and then the normal "nfs server not responding" messages in syslog
> after that, of course.

I tried an NFS client running -current against an NFS server running
netbsd-7, but writing didn't hang (I compiled a C program and ran
cp -r /etc/ /mnt/nfs). The hang may depend on the NIC. Which NIC do you use?

Also, could you let me know the NFS options of the client and the server?

  ozaki-r

Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Ryota Ozaki

On Sun, Jan 22, 2017 at 8:05 PM, Tom Ivar Helbekkmo
 wrote:
> Martin Husemann  writes:
>
>> Could you try backing out this change and see if it helps?
>>
>> http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html
>
> That did the trick.  I've rebooted a few times, now, and the system
> comes up as it should, with no incident, every time.  Thanks!  :)

The latest pfil.c (v1.34) should fix the panic. Could you try it?

Thanks,
  ozaki-r

Re: strange observations on network configuration (ifconfig)

2016-12-13 Thread Ryota Ozaki

Hi

Thank you for testing!

I'll commit the patch unless further investigation turns up a problem.

  ozaki-r

On Tue, Dec 13, 2016 at 2:29 AM, Frank Kardel <kar...@netbsd.org> wrote:
> Hi,
>
> thanks for that.
>
> It looks fine to me now.
>
> Frank
>
>
> On 12/12/16 14:13, Ryota Ozaki wrote:
>>
>> On Mon, Dec 12, 2016 at 5:22 PM, Frank Kardel <kar...@netbsd.org> wrote:
>>>
>>> Hi,
>>>
>>> I did try that before and verified it again. Now routed attempts to install
>>> a
>>> local route
>>> for the lo0 interface and fill the log with the EEXIST messages.
>>>
>>> That's why I went for LLDATA in order to avoid to analyse routed's inner
>>> workings completely.
>>
>> RTF_LLDATA is introduced for userland programs to get L2 routes
>> (ARP/NDP caches), so we don't use it for this case.
>>
>>> Maybe we need a different test for ignoring kernel routing messages.
>>
>> Yes. I'll investigate more though, prepared another patch:
>>http://www.netbsd.org/~ozaki-r/fix-routed2.diff
>>
>> It mimics checks to ignore RTF_STATIC routes. It works for me.
>>
>> Thanks,
>>ozaki-r
>
>

Re: strange observations on network configuration (ifconfig)

2016-12-12 Thread Ryota Ozaki

On Mon, Dec 12, 2016 at 5:22 PM, Frank Kardel  wrote:
> Hi,
>
> I did try that before and verified it again. Now routed attempts to install a
> local route
> for the lo0 interface and fill the log with the EEXIST messages.
>
> That's why I went for LLDATA in order to avoid to analyse routed's inner
> workings completely.

RTF_LLDATA was introduced for userland programs to get L2 routes
(ARP/NDP caches), so we don't use it for this case.

> Maybe we need a different test for ignoring kernel routing messages.

Yes. I'll investigate more, but I've prepared another patch:
  http://www.netbsd.org/~ozaki-r/fix-routed2.diff

It mimics checks to ignore RTF_STATIC routes. It works for me.
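
The patch itself is not reproduced here, but the general shape of such a
check can be sketched as follows. This is an illustration under assumptions:
it supposes that routes carrying a "local" flag are skipped the same way as
static ones, and the DEMO_* flag values, struct demo_route and both functions
are hypothetical, not the definitions from <net/route.h> or routed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; the real RTF_* values live in <net/route.h>. */
#define DEMO_RTF_STATIC	0x1
#define DEMO_RTF_LOCAL	0x2

/* Hypothetical, simplified view of a route read from the kernel. */
struct demo_route {
	const char *dst;
	int flags;
};

/* A route is left alone if it is static or (by assumption) local. */
static bool
demo_should_ignore(const struct demo_route *rt)
{
	return (rt->flags & (DEMO_RTF_STATIC | DEMO_RTF_LOCAL)) != 0;
}

/* Count the routes that the daemon would consider its own to manage. */
static size_t
demo_count_managed(const struct demo_route *rts, size_t n)
{
	size_t managed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!demo_should_ignore(&rts[i]))
			managed++;
	}
	return managed;
}
```

With a table holding one local host route, one network route and one static
route, only the network route would be treated as managed, so the local host
route would no longer be deleted.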

Thanks,
  ozaki-r

Re: strange observations on network configuration (ifconfig)

2016-12-11 Thread Ryota Ozaki

Hi,

Thank you for the investigation.

On Sun, Dec 11, 2016 at 9:08 PM, Frank Kardel  wrote:
> Hi !
>
> Reverting that change (1.24->1.25)  and using RTF_LLDATA instead of
> RTF_LLINFO seems to solve the problem.
> Is this correct or am I overlooking something?

Local routes aren't actually link-layer routes; RTF_LLDATA remains in them
for backward compatibility, IIRC. So if, as you said, an old routed works on
a new kernel, I think it is better to fix routed as I proposed in my earlier
mail.

Could you try the patch?
  http://www.netbsd.org/~ozaki-r/fix-routed.diff

Thanks,
  ozaki-r

Re: strange observations on network configuration (ifconfig)

2016-12-11 Thread Ryota Ozaki

Hi,

Thank you for the report.

On Mon, Dec 5, 2016 at 11:47 PM, Frank Kardel  wrote:
> Hi !
>
> when trying out a -current from 20161127 (7.99.42) I see issues with routed.
>
> On configuration of an interface address A.B.C.D/m the local network address
> A.B.C.D is correctly entered with a loopback host route for the local
> address
> in the routing table.
> Also the network route via the interface is correctly entered in the table.
>
> As soon as routed detects the new interface it seems to miss the loopback
> host route for the local address and consequently decides to remove the
> loopback host route from the kernel routing table,
>
> route monitor output:
> got message of size 160 on Mon Dec  5 15:10:49 2016
> RTM_CHANGE: Change Metrics, Flags or Gateway: len 160, pid 25290, seq 1,
> errno 0, flags: 
> locks: none inits: 
> sockaddrs: 
>  default 10.200.1.1 0.0.0.0
> got message of size 96 on Mon Dec  5 15:10:52 2016
> RTM_ONEWADDR: address being added to iface: len 96, pid 2, seq 0, errno 528,
> flags: 
> locks:  inits: none
> got message of size 104 on Mon Dec  5 15:10:52 2016
> RTM_NEWADDR: address being added to iface: len 104, metric 0, flags:
> 
> sockaddrs: 
>  255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 default
> ### new address (tentative)
> got message of size 160 on Mon Dec  5 15:10:52 2016
> RTM_ADD: Add Route: len 160, pid 4878, seq 0, errno 0, flags:
> 
> locks: none inits: none
> sockaddrs: 
>  A.B.C.38 link#2
> ### local address loopback link
> got message of size 208 on Mon Dec  5 15:10:52 2016
> RTM_ADD: Add Route: len 208, pid 4878, seq 0, errno 0, flags:
> 
> locks: none inits: none
> sockaddrs: 
>  A.B.C.32 link#2 255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38
> ### net route via interface
> got message of size 160 on Mon Dec  5 15:10:52 2016
> RTM_DELETE: Delete Route: len 160, pid 25290, seq 2, errno 0, flags:
> 
> locks: none inits: none
> sockaddrs: 
>  A.B.C.38 link#2
> ### routed deletes local address loopback link
> got message of size 88 on Mon Dec  5 15:10:57 2016
> RTM_ONEWADDR: address being added to iface: len 88, pid 2, seq 0, errno 520,
> flags: 
> locks:  inits:
> 
> got message of size 96 on Mon Dec  5 15:10:57 2016
> RTM_NEWADDR: address being added to iface: len 96, metric 0, flags:
> 
> sockaddrs: 
>  255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 A.B.C.39
> ### address finally valid
>
> [BTW: routed/table.c contains an out of date RTM_* number to string table -
> fixed in output below]
>
> Trace from routed:
> Tracing actions started
> Tracing packets started
> Tracing packet contents started
> Tracing kernel changes started
> Add interface lo0  127.0.0.1  -->127.0.0.1/32 
> RCVBUF=61440
> Add interface wm1  10.200.1.2 -->10.200.1.0/24   
> turn on RIP
> Add 10.200.1.0/24   -->10.200.1.2   metric=0  wm1 
> Add 127.0.0.1/32    -->127.0.0.1    metric=0  lo0 
> ### initial interface state
> Send mcast RIPv2 REQUEST to 224.0.0.9.520 via wm1
> QUERY
> -- 15:10:46 --
> Recv RIPv2 REQUEST from 10.200.1.2.520 via wm1
> QUERY
> discard our own RIP request
> -- 15:10:46 --
> Recv RIPv2 RESPONSE from 10.200.1.1.520 via wm1
> 0.0.0.0        metric=9
> 10.0.0.0       metric=1
> 10.0.0.128/32  metric=2
> ...
> Add 0.0.0.0         -->10.200.1.1   metric=9  wm1 15:10:46
> Add 10.0.0.0        -->10.200.1.1   metric=1  wm1 15:10:46
> Add 10.0.0.128/32   -->10.200.1.1   metric=2  wm1 15:10:46
> ...
> ### received routing information
>
> -- 15:10:47 --
> Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0
> -- 15:10:48 --
> write kernel RTM_CHANGE 0.0.0.0 -->10.200.1.1  metric=9
> flags=0x2
> -- 15:10:50 --
> Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0
> -- 15:10:50 --
> ignore RTM_ONEWADDR without dst
> ### old routing messages are not properly skipped?
>
> Add interface wm0  A.B.C.38   -->A.B.C.32/29 
> Add A.B.C.32/29 -->A.B.C.38 metric=0  wm0 
> ### new interface due to ifconfig wm0 A.B.C.D/29
>
> note RTM_NEWADDR with flags 0x100 for unknown interface index #180
> ### RTM_NEWADDR not properly handled/skipped
>
> RTM_ADD from pid 4878: A.B.C.38/32 --> A.B.C.38
> RTM_ADD from pid 4878: A.B.C.32/29 --> A.B.C.38
> -- 15:10:51 --
> write kernel RTM_DELETE A.B.C.38/32 -->A.B.C.38 metric=0 flags=0
> ### routed does not seem to consider the A.B.C.38/32 -->A.B.C.38 (if=lo0,
> gw=link#2) as being valid
>
> -- 15:10:53 --
> Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0
> -- 15:10:53 --
> ignore RTM_ONEWADDR without dst
> note RTM_NEWADDR with flags 0x101 for unknown interface

Re: Why so many packet filters?

2016-08-15 Thread Ryota Ozaki

On Mon, Aug 15, 2016 at 4:13 PM, Joerg Sonnenberger <jo...@bec.de> wrote:
> On Mon, Aug 15, 2016 at 01:51:38PM +0900, Ryota Ozaki wrote:
>> BTW should we mark pf and ipf deprecated in netbsd-8 as they aren't
>> well maintained nowadays?
>
> While they are not maintained, PF works quite well for the feature set
> our version has. There are still quite a few issues with NPF, primarily
> documentation issues, but also some functional ones. It seems a bit
> premature to deprecate IPF and PF in the current situation. That said,
> they should certainly be considered legacy functionality.

Hmm, I thought npf is mature. I think it's better to have TODOs of npf
somewhere to clarify what we need to do to make it mature enough.

  ozaki-r

Re: Why so many packet filters?

2016-08-14 Thread Ryota Ozaki

On Mon, Aug 15, 2016 at 12:10 PM, Paul Goyette  wrote:
> Taking a quick look, it seems that we have at least four (maybe five)
> different packet filters available.
>
> pf
> ipf
> bpf (and bpfjit)
> npf
>
> Is there a concise description of each, and when to use one vs the
> other?

(I'm not so familiar with filters, so please someone correct me
 if I'm wrong.)

First of all, bpf (bpfjit) is different from the others. bpf sniffs
raw packets on rx/tx in network device drivers (grep bpf_mtap) and
also allows sending raw packets directly via ifp->if_output
(e.g., ether_output). It doesn't provide the pass/block filters that
the others provide.

bpfjit is just an optimization option of bpf. So we don't need to
treat it individually.

pf, ipf and npf provide pass/block functionality (and more) at
hook points (grep pfil_run_hooks) in the network stack via pfil(9),
which is how, say, a firewall and NAT/NAPT are implemented. They
provide similar functions, but unfortunately those functions aren't
compatible and one filter cannot easily replace another, IIUC.
(Someone else may be able to explain the differences in detail.)
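
As a rough illustration of the hook-point idea, here is a toy sketch in
userland C. It is not the pfil(9) API; every toy_* name is made up for
this example. Each registered hook inspects a packet and returns a
verdict, and the first "block" verdict stops the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict a filter hook returns for one packet (hypothetical names). */
enum toy_verdict { TOY_PASS = 0, TOY_BLOCK = 1 };

/* Minimal stand-in for a packet: only the field the example rule needs. */
struct toy_packet {
	unsigned short dst_port;
};

/* A filter hook: looks at a packet and decides pass or block. */
typedef enum toy_verdict (*toy_hook_fn)(const struct toy_packet *);

/*
 * Run the hooks in registration order. The first hook that blocks
 * ends processing; if none blocks, the packet passes.
 */
enum toy_verdict
toy_run_hooks(const toy_hook_fn *hooks, size_t nhooks,
    const struct toy_packet *p)
{
	for (size_t i = 0; i < nhooks; i++) {
		if (hooks[i](p) == TOY_BLOCK)
			return TOY_BLOCK;
	}
	return TOY_PASS;
}

/* Example rule: block anything destined for port 23 (telnet). */
enum toy_verdict
toy_block_telnet(const struct toy_packet *p)
{
	return p->dst_port == 23 ? TOY_BLOCK : TOY_PASS;
}
```

A real filter keeps state (e.g., for NAT) and can modify the packet, which
this sketch leaves out on purpose.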

npf is newer than the others and is designed for multi-core
systems. So basically we recommend npf to anyone who is starting
to use one of them.

BTW should we mark pf and ipf deprecated in netbsd-8 as they aren't
well maintained nowadays?

  ozaki-r

HEADS UP: netstat needs to be updated

2016-08-01 Thread Ryota Ozaki

Hello -current users,

Running netstat -ia (before 2016-07-14) on a 7.99.35
(or newer) kernel causes a problem (same as PR kern/51325).
You can solve the problem by updating netstat with the
latest source code (or get a recent binary at
http://nyftp.netbsd.org/).

Technical details: netstat -ia used kvm(3) to get address
information from the kernel by reading lists of struct
in_ifaddr/in6_ifaddr, which embed struct ifaddr. Since
2016-07-14, netstat -ia has used sysctl instead of
kvm(3) to get the information. Relying on that change,
the kernel changed struct ifaddr (which was needed for
the MP-safe network stack work) at 7.99.35, which broke
the old netstat -ia.

The breakage is expected. We may not keep backcompat of
kvm(3) if the work for backcompat needs dirty workarounds
and/or extra overheads in the kernel.

Regards,
  ozaki-r

Re: kernel panic

2016-06-19 Thread Ryota Ozaki

On Sun, Jun 19, 2016 at 9:23 PM, Michael van Elst  wrote:
> brad.har...@gmail.com (bch) writes:
>
>>kernel (adjusted from GENERIC to allow dtrace support) from latest src
>>panics:
>
>>(transcription):
>
>>reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
>>struct ieee80211_node *) == NULL)" failed: file
>>"/usr/src/sys/80211/ieee80211_output.c", line 1347
>
>
> That assertion seems to be bogus. It checks a field in an mbuf
> that was just allocated in ieee80211_getmgtframe using m_getcl
> and that may contain random data in the ctx pointer.

Indeed.

>
> Another similar assertion in the same file is #ifdef __FreeBSD__.
>
> Looking at the current FreeBSD code, it still abuses the rcvif
> pointer for local data. But there are no such assertions, which
> would be bogus in FreeBSD either.

Thanks. I think we can remove the assertion(s) safely.

(I'm not sure why the assertion never failed before. I guess my changes
broke some implicit zeroing of rcvif somewhere.)
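
To illustrate why such an assertion is fragile, here is a toy userland
sketch. It is not the mbuf code; toy_mbuf and its helpers are
hypothetical, and the 0xa5 fill only stands in for whatever stale data a
non-zeroing allocator may hand back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an mbuf with a context pointer (hypothetical). */
struct toy_mbuf {
	void *ctx;
	int len;
};

/*
 * Allocate without zeroing, as pool allocators commonly do. The
 * buffer is filled with a poison pattern so that the stale contents
 * are deterministic in this sketch.
 */
struct toy_mbuf *
toy_mbuf_get(void)
{
	struct toy_mbuf *m = malloc(sizeof(*m));

	if (m == NULL)
		return NULL;
	memset(m, 0xa5, sizeof(*m));
	m->len = 0;	/* the allocator sets up only what it owns */
	return m;
}

/* A caller that wants ctx == NULL has to clear it explicitly. */
void
toy_mbuf_clear_ctx(struct toy_mbuf *m)
{
	m->ctx = NULL;
}
```

Asserting ctx == NULL right after toy_mbuf_get() would fire here; it is
only safe after toy_mbuf_clear_ctx().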

  ozaki-r

Re: Test failure for amd64 -current

2016-06-16 Thread Ryota Ozaki

On Fri, Jun 17, 2016 at 11:32 AM, Paul Goyette  wrote:
> I just noticed that the automated test bed for amd64 -current has been
> crashing during one of the test cases:
>
> sbin/sysctl/t_perm (439/654): 8 test cases
> sysctl_ddb: [3.708810s] Passed.
> sysctl_hw: [32.423393s] Passed.
> sysctl_kern: [39.811971s] Passed.
> sysctl_machdep: [8.952583s] Passed.
> sysctl_net: uvm_fault(0xfe8007a9b2e8, 0x0, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip 80862581 cs 8 rflags 286 cr2 398 ilevel 0 rsp
> fe80079facf0
> curlwp 0xfe80033376c0 pid 22351.1 lowest kstack 0xfe80079f72c0
> panic: trap
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> trap() at netbsd:trap+0xc4b
> --- trap (number 6) ---
> psref_release() at netbsd:psref_release+0x23
> if_sdl_sysctl() at netbsd:if_sdl_sysctl+0xc1
> sysctl_dispatch() at netbsd:sysctl_dispatch+0xc1
> sys___sysctl() at netbsd:sys___sysctl+0xd8
> syscall() at netbsd:syscall+0x15b
> --- syscall (number 202) ---
> 78e62090bfaa:
> cpu0: End traceback...
>
>
> Is anyone looking into this?

riastradh@ already fixed it (sys/net/if.c 1.341).

  ozaki-r

Re: kernel panic

2016-06-16 Thread Ryota Ozaki

On Thu, Jun 16, 2016 at 3:04 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> On Thu, Jun 16, 2016 at 1:56 PM, bch <brad.har...@gmail.com> wrote:
>>
>> On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA" <k-nakah...@iij.ad.jp> wrote:
>>>
>>> Hi,
>>>
>>> On 2016/06/16 8:15, bch wrote:
>>> > I am now at 1.414, and it seems stable.
>>>
>>> Thank you for your checking and reporting.
>>
>> My pleasure. Question, were my wm(4) and iwm(4) faults related (maybe some
>> luck macro-ization, rejigging)?
>
> Not related.
>
>> Can anybody point me to the commits that
>> apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?
>
> For iwm:
>   http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h
>
> Commit 1.164 broke iwm (and I guess all other wifi drivers)
> and commit 1.165 fixed it.
>
> For wm:
>   http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c
>
> Commit 1.413 broke wm and commit 1.414 fixed it.
>
>>
>>> If it seems there is still
>>> problems, please tell us.
>>
>> Will do. I'd like to have the commit(s) identified and
>> re-witness/characterize the issue. Otherwise, things currently seem stable.
>> Thanks.
>
> [Timeline]
>
> - Jun 10 13:31:45: mbuf.h r1.164
> - Jun 11 ??:??:??: you encountered the first panic
> - Jun 12 10:14:12: mbuf.h r1.165

oops

> - Jun 14 09:07:22: if_wm.c r1.164
 ^^
 r1.413
> - Jun 14 ??:??:??: you encountered the second panic
> - Jun 14 17:09:20: if_wm.c r1.165
 ^^
 r1.414

> - Jun 16 ??:??:??: you are here
>
> And I noticed that I forgot to bump the kernel version; my mbuf.h
> change required it. (I already bumped.) If you run a kernel between
> my mbuf.h change and the bump with network device driver modules
> of 7.99.30, something bad will happen. (I guess the issues you saw
> aren't related to this though.)
>
> Thanks,
>   ozaki-r

Re: kernel panic

2016-06-16 Thread Ryota Ozaki

On Thu, Jun 16, 2016 at 1:56 PM, bch  wrote:
>
> On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA"  wrote:
>>
>> Hi,
>>
>> On 2016/06/16 8:15, bch wrote:
>> > I am now at 1.414, and it seems stable.
>>
>> Thank you for your checking and reporting.
>
> My pleasure. Question, were my wm(4) and iwm(4) faults related (maybe some
> luck macro-ization, rejigging)?

Not related.

> Can anybody point me to the commits that
> apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?

For iwm:
  http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h

Commit 1.164 broke iwm (and I guess all other wifi drivers)
and commit 1.165 fixed it.

For wm:
  http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c

Commit 1.413 broke wm and commit 1.414 fixed it.

>
>> If it seems there is still
>> problems, please tell us.
>
> Will do. I'd like to have the commit(s) identified and
> re-witness/characterize the issue. Otherwise, things currently seem stable.
> Thanks.

[Timeline]

- Jun 10 13:31:45: mbuf.h r1.164
- Jun 11 ??:??:??: you encountered the first panic
- Jun 12 10:14:12: mbuf.h r1.165
- Jun 14 09:07:22: if_wm.c r1.413
- Jun 14 ??:??:??: you encountered the second panic
- Jun 14 17:09:20: if_wm.c r1.414
- Jun 16 ??:??:??: you are here

And I noticed that I forgot to bump the kernel version; my mbuf.h
change required it. (I already bumped.) If you run a kernel between
my mbuf.h change and the bump with network device driver modules
of 7.99.30, something bad will happen. (I guess the issues you saw
aren't related to this though.)

Thanks,
  ozaki-r

Re: kernel panic

2016-06-11 Thread Ryota Ozaki

Hi,

On Sat, Jun 11, 2016 at 3:58 AM, bch  wrote:
> kernel (adjusted from GENERIC to allow dtrace support) from latest src
> panics:
>
> (transcription):
>
> reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
> struct ieee80211_node *) == NULL)" failed: file
> "/usr/src/sys/80211/ieee80211_output.c", line 1347

Can you show me a backtrace?

And let me know the latest version (date) of the kernel that worked for you.

  ozaki-r

Re: Crash during rc bootup (amd64) with new networking stuff

2016-04-17 Thread Ryota Ozaki

On Sat, Apr 16, 2016 at 10:49 AM, Geoff Wing <g...@pobox.com> wrote:
> On Friday 2016-04-15 16:48 +1000, Ryota Ozaki output:
> :> :panic: kernel  "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", line 1220
> :> :It also deletes and adds in a static arp address:
> :> :   "arp -d 1.2.3.4; arp -s 1.2.3.4 xx:xx:xx:xx:xx:xx"
> :> Taking out the static arp commands and it boots up OK.
> :Thanks. I could reproduce the panic on my machine with the latest kernel.
> :
> :A quick fix is like this:
> [...]
> :Does this patch help you?
> :If so, I'll commit it (with tweaks maybe) after more validations.
>
> Great, boots up OK.

Thanks. I committed the fix and a regression test for the panic.

  ozaki-r

Re: Crash during rc bootup (amd64) with new networking stuff

2016-04-15 Thread Ryota Ozaki

2016/04/15 4:34 "Geoff Wing" <g...@pobox.com>:
>
> On Friday 2016-04-15 16:20 +1000, Ryota Ozaki output:
> :> panic: kernel  "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", line 1220
> :The source code of your kernel looks a bit old:
> :  http://nxr.netbsd.org/xref/src/sys/netinet/if_arp.c#1220
> :
> :You can see the version of the file by:
> :  $ ident /netbsd |grep if_arp.c
> :  $NetBSD: if_arp.c,v 1.205 2016/04/07 03:22:15 christos Exp $
> :And this is the latest version.
>
> Mine says:
> $NetBSD: if_arp.c,v 1.206 2016/04/13 00:47:01 ozaki-r Exp $

Heh. Oh mine IS old! Sorry for confusing you.

>
> I'll try the patch in the other post in a couple of hours.

Thanks!
  ozaki-r

Re: Crash during rc bootup (amd64) with new networking stuff

2016-04-15 Thread Ryota Ozaki

On Fri, Apr 15, 2016 at 3:10 PM, Geoff Wing  wrote:
> On Friday 2016-04-15 13:20 +1000, Geoff Wing output:
> :panic: kernel  "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", line 1220
> :It also deletes and adds in a static arp address:
> :   "arp -d 1.2.3.4; arp -s 1.2.3.4 xx:xx:xx:xx:xx:xx"
>
> Taking out the static arp commands and it boots up OK.

Thanks. I could reproduce the panic on my machine with the latest kernel.

A quick fix is like this:

--- a/sys/netinet/if_arp.c
+++ b/sys/netinet/if_arp.c
@@ -1223,10 +1223,11 @@ in_arpinput(struct mbuf *m)
KASSERT(sizeof(la->ll_addr) >= ifp->if_addrlen);
(void)memcpy(&la->ll_addr, ar_sha(ah), ifp->if_addrlen);
la->la_flags |= LLE_VALID;
-   la->la_expire = time_uptime + arpt_keep;
+   if ((la->la_flags & LLE_STATIC) == 0) {
+   la->la_expire = time_uptime + arpt_keep;
+   arp_settimer(la, arpt_keep);
+   }
la->la_asked = 0;
-   KASSERT((la->la_flags & LLE_STATIC) == 0);
-   arp_settimer(la, arpt_keep);
/* rt->rt_flags &= ~RTF_REJECT; */

Does this patch help you?

If so, I'll commit it (with tweaks maybe) after more validations.
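
The intent of the patch above can be sketched in plain userland C. This
is a toy model, not if_arp.c: the toy_* names and flag values are
hypothetical, and a boolean stands in for arp_settimer. The point is
that a static entry is still marked valid but never gets an expiry or a
timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LLE_VALID	0x1	/* hypothetical flag values for this sketch */
#define TOY_LLE_STATIC	0x2

/* Toy stand-in for a link-layer (ARP) cache entry. */
struct toy_llentry {
	unsigned int la_flags;
	int64_t la_expire;	/* 0 means "never expires" in this sketch */
	bool timer_armed;	/* stands in for an armed expiry timer */
};

/*
 * Update an entry after learning the peer's link-layer address.
 * Only dynamic entries get an expiry time and a timer; static
 * entries are left alone instead of tripping an assertion.
 */
void
toy_arp_update(struct toy_llentry *la, int64_t now, int64_t keep)
{
	la->la_flags |= TOY_LLE_VALID;
	if ((la->la_flags & TOY_LLE_STATIC) == 0) {
		la->la_expire = now + keep;
		la->timer_armed = true;
	}
}
```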

Thanks,
  ozaki-r

Re: Crash during rc bootup (amd64) with new networking stuff

2016-04-15 Thread Ryota Ozaki

On Fri, Apr 15, 2016 at 12:20 PM, Geoff Wing  wrote:
> Hi,
> with the new networking setup, I'm getting a crash using clean amd64
> build (GENERIC kernel) during rc script processing.
>
> After getting past netstart.local, I'll get
>
>  interface address is missing from cache = 0x0  in delete
> arp: writing to routing socket: No such file or directory
> Building databases: 
> Starting syslogd.
> Starting named.
> Setting date via ntp.
> then
>
> panic: kernel  "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", line 1220

The source code of your kernel looks a bit old:
  http://nxr.netbsd.org/xref/src/sys/netinet/if_arp.c#1220

You can see the version of the file by:
  $ ident /netbsd |grep if_arp.c
  $NetBSD: if_arp.c,v 1.205 2016/04/07 03:22:15 christos Exp $
And this is the latest version.

Can you rebuild a kernel with the latest snapshot of -current
and try it again (if the source code is actually old)?

Thanks,
  ozaki-r

>
> My netstart.local adds static routes and blackhole routes.
> It also deletes and adds in a static arp address:
> "arp -d 1.2.3.4; arp -s 1.2.3.4 xx:xx:xx:xx:xx:xx"
>
> It's a bit hard to diagnose on my computer, but I can try if others
> cannot reproduce.
>
> Regards,
> Geoff

BuildBot failures

2016-03-22 Thread Ryota Ozaki

Hi,

BuildBot (http://build.tastylime.net/waterfall) has stopped
for several weeks it seems because of cvs update failure.
Do we have to remove local files and checkout cleanly?

Regards,
  ozaki-r

Re: ATF test failures

2016-03-10 Thread Ryota Ozaki

On Thu, Mar 10, 2016 at 4:02 PM, Martin Husemann <mar...@duskware.de> wrote:
> On Thu, Mar 10, 2016 at 03:05:36PM +0900, Ryota Ozaki wrote:
>> We're seeing many test failures (> 1000) on
>> amd64 and i386, and installation failures on
>> sparc.
>
> We should back out the gcc change that causes the x86 failures.

Please someone do it (or fix the problem) :)

>
> Sparc bootblocks are broken, everything else works fine - I am looking
> at it.

Thanks!

  ozaki-r

ATF test failures

2016-03-09 Thread Ryota Ozaki

Hi,

http://releng.netbsd.org/test-results.html

We're seeing many test failures (> 1000) on
amd64 and i386, and installation failures on
sparc.

What is happening with them? Does anyone have
ideas for fixing them?

Thanks,
  ozaki-r

Re: recurring panics

2016-03-09 Thread Ryota Ozaki

On Wed, Mar 9, 2016 at 10:59 PM, Thomas Klausner <w...@netbsd.org> wrote:
> On Wed, Mar 09, 2016 at 10:22:56PM +0900, Ryota Ozaki wrote:
>> On Wed, Mar 9, 2016 at 8:45 PM, Thomas Klausner <t...@giga.or.at> wrote:
>> > Hi!
>> >
>> > I have had this kind of reboot about 5 times in the last couple of days:
>> >
>> > Mar  8 16:26:14 yt savecore: reboot after panic: panic: kernel diagnostic 
>> > assertion "l->l_nopreempt == 0" failed: file 
>> > "/archive/foreign/src/sys/sys/userret.h", line 116  WARNING: SPL NOT 
>> > LOWERED ON SYSCALL 16384 24 EXIT ef20f930 6 WARNING: SPL NOT LOWERED ON 
>> > TRAP EXIT 6 0
>> >
>> > For some of them, I even have crash dumps:
>> >
>> > (gdb) target kvm netbsd.94.core
>> > 0x801195a5 in cpu_reboot (howto=howto@entry=260, 
>> > bootstr=bootstr@entry=0x0) at 
>> > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
>> > 671 dumpsys();
>> > (gdb) bt
>> > #0  0x801195a5 in cpu_reboot (howto=howto@entry=260, 
>> > bootstr=bootstr@entry=0x0) at 
>> > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
>> > #1  0x8081c704 in vpanic (fmt=0x80e13840 "kernel 
>> > %sassertion \"%s\" failed: file \"%s\", line %d ", 
>> > ap=ap@entry=0xfe8154d51e58)
>> > at /archive/foreign/src/sys/kern/subr_prf.c:342
>> > #2  0x80b3edb3 in kern_assert (fmt=fmt@entry=0x80e13840 
>> > "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
>> > at /archive/foreign/src/sys/lib/libkern/kern_assert.c:51
>> > #3  0x8013d0e7 in mi_userret (l=0xfe84092b3460) at 
>> > /archive/foreign/src/sys/sys/userret.h:116
>> > #4  userret (l=0xfe84092b3460) at ./machine/userret.h:82
>> > #5  syscall (frame=0xfe8154d51f00) at 
>> > /archive/foreign/src/sys/arch/x86/x86/syscall.c:184
>> > #6  0x80100661 in Xsyscall ()
>> > (gdb) fr 3
>> > #3  0x8013d0e7 in mi_userret (l=0xfe84092b3460) at 
>> > /archive/foreign/src/sys/sys/userret.h:116
>> > 116 KASSERT(l->l_nopreempt == 0);
>> > (gdb)
>> >
>> > I upgraded from a Jan 28 kernel to a March 3 kernel, and I think it
>> > only started afterwards.
>> >
>> > Any ideas?
>> >  Thomas
>>
>> I saw similar panics and my commit at March 7 (if.c,v 1.326) fixed them.
>> So updating your kernel may solve the problem.
>
> This one?

Yes.

>
> Index: src/sys/net/if.c
> diff -u src/sys/net/if.c:1.325 src/sys/net/if.c:1.326
> --- src/sys/net/if.c:1.325  Fri Feb 19 20:05:43 2016
> +++ src/sys/net/if.cMon Mar  7 01:41:55 2016
> @@ -770,6 +770,7 @@
> ifq = percpu_getref(ipq->ipq_ifqs);
> if (IF_QFULL(ifq)) {
> IF_DROP(ifq);
> +   percpu_putref(ipq->ipq_ifqs);
> m_freem(m);
> goto out;
> }
>
> What would trigger that case?

percpu_putref amounts to kpreempt_enable, which decrements l_nopreempt.
If we forget to call it, other places that check l_nopreempt
can be affected. In your case, mi_userret is the one that suffers.
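
The imbalance can be modeled in userland C. This is only a toy sketch:
toy_lwp, toy_ifq, toy_getref, toy_putref and toy_enqueue are
hypothetical names, and a plain counter stands in for l_nopreempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct lwp: only the preemption-disable counter. */
struct toy_lwp {
	int l_nopreempt;
};

/* Toy stand-in for a per-CPU packet queue with a fixed capacity. */
struct toy_ifq {
	int len;
	int maxlen;
};

/* Models percpu_getref(): preemption is disabled, counter goes up. */
struct toy_ifq *
toy_getref(struct toy_lwp *l, struct toy_ifq *q)
{
	l->l_nopreempt++;
	return q;
}

/* Models percpu_putref(): preemption is re-enabled, counter goes down. */
void
toy_putref(struct toy_lwp *l)
{
	assert(l->l_nopreempt > 0);
	l->l_nopreempt--;
}

/*
 * Enqueue one packet; returns true if queued, false if dropped.
 * With release_on_drop == false the drop path skips the putref,
 * which is the bug: the counter stays raised after returning.
 */
bool
toy_enqueue(struct toy_lwp *l, struct toy_ifq *q, bool release_on_drop)
{
	struct toy_ifq *ifq = toy_getref(l, q);

	if (ifq->len >= ifq->maxlen) {
		if (release_on_drop)
			toy_putref(l);
		return false;
	}
	ifq->len++;
	toy_putref(l);
	return true;
}
```

With a full queue and release_on_drop == false, the counter is left at 1,
which is the state a later check like the one in mi_userret would trip on.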

  ozaki-r

Re: recurring panics

2016-03-09 Thread Ryota Ozaki

On Wed, Mar 9, 2016 at 8:45 PM, Thomas Klausner  wrote:
> Hi!
>
> I have had this kind of reboot about 5 times in the last couple of days:
>
> Mar  8 16:26:14 yt savecore: reboot after panic: panic: kernel diagnostic 
> assertion "l->l_nopreempt == 0" failed: file 
> "/archive/foreign/src/sys/sys/userret.h", line 116  WARNING: SPL NOT LOWERED 
> ON SYSCALL 16384 24 EXIT ef20f930 6 WARNING: SPL NOT LOWERED ON TRAP EXIT 6 0
>
> For some of them, I even have crash dumps:
>
> (gdb) target kvm netbsd.94.core
> 0x801195a5 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at 
> /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> 671 dumpsys();
> (gdb) bt
> #0  0x801195a5 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at 
> /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> #1  0x8081c704 in vpanic (fmt=0x80e13840 "kernel %sassertion 
> \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xfe8154d51e58)
> at /archive/foreign/src/sys/kern/subr_prf.c:342
> #2  0x80b3edb3 in kern_assert (fmt=fmt@entry=0x80e13840 
> "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
> at /archive/foreign/src/sys/lib/libkern/kern_assert.c:51
> #3  0x8013d0e7 in mi_userret (l=0xfe84092b3460) at 
> /archive/foreign/src/sys/sys/userret.h:116
> #4  userret (l=0xfe84092b3460) at ./machine/userret.h:82
> #5  syscall (frame=0xfe8154d51f00) at 
> /archive/foreign/src/sys/arch/x86/x86/syscall.c:184
> #6  0x80100661 in Xsyscall ()
> (gdb) fr 3
> #3  0x8013d0e7 in mi_userret (l=0xfe84092b3460) at 
> /archive/foreign/src/sys/sys/userret.h:116
> 116 KASSERT(l->l_nopreempt == 0);
> (gdb)
>
> I upgraded from a Jan 28 kernel to a March 3 kernel, and I think it
> only started afterwards.
>
> Any ideas?
>  Thomas

I saw similar panics and my commit at March 7 (if.c,v 1.326) fixed them.
So updating your kernel may solve the problem.

  ozaki-r

Re: Disk full on anita/amd64?

2016-02-01 Thread Ryota Ozaki

On Tue, Feb 2, 2016 at 4:28 PM, Andreas Gustafsson <g...@gson.org> wrote:
> Ryota Ozaki wrote:
>> I've noticed increase of test failures on anita/amd64:
>> http://releng.netbsd.org/b5reports/amd64/commits-2016.01.html#2016.01.30.03.38.39
>>
>> A test failure indicates disk full of the system:
>> http://releng.netbsd.org/b5reports/amd64/build/2016.01.30.03.38.39/test.html#atf_atf-c_detail_fs_test_mkdtemp_err
>>
>> A sanity check of ATF also indicates that:
>>   df-pre-test Filesystem  1K-blocks  UsedAvail %Cap Mounted on
>>   df-pre-test /dev/wd0a 862463 810271 9069  98% /
>>
>> I guess it happens due to dtrace stuffs added to the base
>> recently and keeping debugging symbols of kernel modules
>> for dtrace (this is the just previous commit where test
>> failures increased).
>
> I just increased the disk size in the configuration of the amd64 tests
> on babylon5, and will also increase anita's default disk size in the
> next release (assuming the increase in disk use is permanent).

Thanks!
  ozaki-r

Disk full on anita/amd64?

2016-02-01 Thread Ryota Ozaki

Hi,

I've noticed an increase in test failures on anita/amd64:
http://releng.netbsd.org/b5reports/amd64/commits-2016.01.html#2016.01.30.03.38.39

One test failure indicates that the system's disk is full:
http://releng.netbsd.org/b5reports/amd64/build/2016.01.30.03.38.39/test.html#atf_atf-c_detail_fs_test_mkdtemp_err

A sanity check of ATF also indicates that:
  df-pre-test Filesystem  1K-blocks  UsedAvail %Cap Mounted on
  df-pre-test /dev/wd0a 862463 810271 9069  98% /

I guess it happens due to the dtrace stuff added to the base
recently and to keeping debugging symbols of kernel modules
for dtrace (that is the commit just before the point where
test failures increased).

I think we should increase the size of the disk image
of qemu.

Regards,
  ozaki-r

Re: IPv6 init order and RPI

2015-11-10 Thread Ryota Ozaki

On Mon, Nov 9, 2015 at 5:24 AM, Joerg Sonnenberger
 wrote:
> Hi all,
> while trying to update my first generation RPI, I hit a kernel panic
> during first boot. Looking a bit further, for unknown reasons the
> address setup sometimes fails with usmsc. When it does, the ip6_input
> case can trigger the timeout handling on the incompletely set up
> address. The attached patch seems to fix that problem at least.
> Comments?

LGTM; in6m (in6m_timer_ch) should be initialized before being
published via ia6_multiaddrs.
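
The ordering rule can be sketched as a toy userland model. This is not
the in6 code; all toy_* names are hypothetical, and a boolean stands in
for an initialized callout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a callout/timer embedded in an object. */
struct toy_timer {
	bool initialized;
};

/* Toy stand-in for a multicast membership record. */
struct toy_multi {
	struct toy_timer timer;
	struct toy_multi *next;
};

/* Toy stand-in for the per-address list that other code walks. */
struct toy_list {
	struct toy_multi *head;
};

void
toy_timer_init(struct toy_timer *t)
{
	t->initialized = true;
}

/* Publish m: from here on, anyone walking the list can reach it. */
void
toy_publish(struct toy_list *l, struct toy_multi *m)
{
	m->next = l->head;
	l->head = m;
}

/* Safe ordering: finish initialization first, publish last. */
void
toy_add(struct toy_list *l, struct toy_multi *m)
{
	toy_timer_init(&m->timer);
	toy_publish(l, m);
}

/* What a timeout handler relies on: every reachable timer is ready. */
bool
toy_all_timers_ready(const struct toy_list *l)
{
	for (const struct toy_multi *m = l->head; m != NULL; m = m->next) {
		if (!m->timer.initialized)
			return false;
	}
	return true;
}
```

Calling toy_publish() before toy_timer_init() leaves a window in which a
walker finds an entry whose timer is not yet usable.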

Thanks,
  ozaki-r

Re: panic in arptimer

2015-10-20 Thread Ryota Ozaki

On Mon, Oct 19, 2015 at 6:33 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> Hi,
>
> I've reproduced the panic on my machine and I'm investigating
> the problem.

A possible fix has been committed. Could you try the latest kernel?
(A kernel binary will be built in several hours.)

Tip: by running sysctl -w net.inet.arp.keep=30, you don't need to
wait for 1200 seconds.

>
> Thank you for the report,
>   ozaki-r
>
> On Mon, Oct 19, 2015 at 4:10 PM, Takahiro Hayashi <t.hash...@gmail.com> wrote:
>> Hello,
>>
>> Kernel panics in arptimer after detaching network interface.
>> See dmesg below please.
>> It happened on NetBSD/amd64 on GENERIC.201510182130Z from nyftp.
>> I think this problem looks like kern/50186.

BTW I think this problem is different from PR kern/50186.

Regards,
  ozaki-r

>>
>> How-To-Repeat:
>> 1. Boot kernel into single user mode with "boot netbsd -s".
>> 2. sysctl -w net.inet6.ip6.auto_linklocal=0
>> 3. ifconfig interface .
>> 4. Send one ping to other host.
>> 5. Detach the interface with "drvctl -d".
>> 6. Wait about 1200 seconds (actually 1200 sec after ping is sent).
>>
>>
>> I saw following panic after detaching "re0".
>>
>> fatal page fault in supervisor mode
>> trap type 6 code 0 rip 808cbf2f cs 8 rflags 10246 cr2
>> 873c7368 ilevel 2 rsp fe80dabd1f08
>> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0
>> kernel: page fault trap, code=0
>> Stopped in pid 0.22 (system) at netbsd:arptimer+0xc4:   movq
>> 360(%r15),%rdi
>> db{1}> bt
>> arptimer() at netbsd:arptimer+0xc4
>> callout_softclock() at netbsd:callout_softclock+0x1d0
>> softint_dispatch() at netbsd:softint_dispatch+0xd3
>> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0
>> Xsoftintr() at netbsd:Xsoftintr+0x4f
>> --- interrupt ---
>> 0:
>> db{1}> show reg
>> ds  1ee8
>> es  4040
>> fs  600
>> gs  3d5e
>> rdi fe81078c8a00
>> rsi f800
>> rbp fe80dabd1f38
>> rbx fe81078c8908
>> rdx 0
>> rcx 7
>> rax fe81071a30c4
>> r8  fe8107184040
>> r9  7d
>> r10 fe811eb244f4
>> r11 246
>> r12 fe81078c8a00
>> r13 fe81078c89b0
>> r14 0
>> r15 873c7008
>> rip 808cbf2f arptimer+0xc4
>> cs  8
>> rflags  10246
>> rsp fe80dabd1f08
>> ss  10
>> netbsd:arptimer+0xc4:   movq    360(%r15),%rdi
>> db{1}>
>>
>>
>> I met the following panic after "drvctl -d axe0".
>>
>> fatal page fault in supervisor mode
>> trap type 6 code 2 rip 8011bcdd cs 8 rflags 10282 cr2 0 ilevel 2 rsp
>> fe80dabd1f00
>> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0
>> kernel: page fault trap, code=0
>> Stopped in pid 0.22 (system) at netbsd:rw_enter+0x2d:   lock cmpxchgq
>> %rcx,0(%
>> rdi)
>> db{1}> bt
>> rw_enter() at netbsd:rw_enter+0x2d
>> callout_softclock() at netbsd:callout_softclock+0x1d0
>> softint_dispatch() at netbsd:softint_dispatch+0xd3
>> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0
>> Xsoftintr() at netbsd:Xsoftintr+0x4f
>> --- interrupt ---
>> 0:
>> db{1}> show reg
>> ds  1ee8
>> es  0
>> fs  fc00
>> gs  3d5e
>> rdi 0
>> rsi 1
>> rbp fe80dabd1f38
>> rbx fe811dbc8188
>> rdx 0
>> rcx fe81071a30c4
>> rax 0
>> r8  fe8107184040
>> r9  7d
>> r10 fe811eb244f4
>> r11 246
>> r12 fe811dbc8280
>> r13 fe811dbc8230
>> r14 0
>> r15 fe811de0b010
>> rip 8011bcdd rw_enter+0x2d
>> cs  8
>> rflags  10282
>> rsp fe80dabd1f00
>> ss  10
>> netbsd:rw_enter+0x2d:   lock cmpxchgq   %rcx,0(%rdi)
>> db{1}>
>>
>>
>> --
>> t-hash

Re: panic in arptimer

2015-10-20 Thread Ryota Ozaki

On Tue, Oct 20, 2015 at 7:31 PM, Takahiro Hayashi <t.hash...@gmail.com> wrote:
> Hello,
>
> On 2015/10/20 16:59, Ryota Ozaki wrote:
>>
>> On Mon, Oct 19, 2015 at 6:33 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>>>
>>> Hi,
>>>
>>> I've reproduced the panic on my machine and I'm investigating
>>> the problem.
>>
>>
>> A possible fix has been committed. Could you try a latest kernel?
>> (a kernel binary will be built in several hours.)
>
>
> I have updated my local tree and confirmed the problem is fixed.

Good to hear. Thank you for testing.

  ozaki-r

> Thank you for working on this prob.
>
>> tips: by doing sysctl -w net.inet.arp.keep=30, you don't need to
>> wait for 1200 seconds.
>
>
> Thank you for nice tip.
>
>
>>>
>>> Thank you for the report,
>>>ozaki-r
>>>
>>> On Mon, Oct 19, 2015 at 4:10 PM, Takahiro Hayashi <t.hash...@gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Kernel panics in arptimer after detaching network interface.
>>>> See dmesg below please.
>>>> It happened on NetBSD/amd64 on GENERIC.201510182130Z from nyftp.
>>>> I think this problem looks like kern/50186.
>>
>>
>> BTW I think this problem is different from PR kern/50186.
>>
>> Regards,
>>ozaki-r
>>
>>>>
>>>> How-To-Repeat:
>>>> 1. Boot kernel into single user mode with "boot netbsd -s".
>>>> 2. sysctl -w net.inet6.ip6.auto_linklocal=0
>>>> 3. ifconfig interface .
>>>> 4. Send one ping to other host.
>>>> 5. Detach the interface with "drvctl -d".
>>>> 6. Wait about 1200 seconds (actually 1200 sec after ping is sent).
>>>>
>>>>
>>>> I saw following panic after detaching "re0".
>>>>
>>>> fatal page fault in supervisor mode
>>>> trap type 6 code 0 rip 808cbf2f cs 8 rflags 10246 cr2
>>>> 873c7368 ilevel 2 rsp fe80dabd1f08
>>>> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0
>>>> kernel: page fault trap, code=0
>>>> Stopped in pid 0.22 (system) at netbsd:arptimer+0xc4:   movq
>>>> 360(%r15),%rdi
>>>> db{1}> bt
>>>> arptimer() at netbsd:arptimer+0xc4
>>>> callout_softclock() at netbsd:callout_softclock+0x1d0
>>>> softint_dispatch() at netbsd:softint_dispatch+0xd3
>>>> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0
>>>> Xsoftintr() at netbsd:Xsoftintr+0x4f
>>>> --- interrupt ---
>>>> 0:
>>>> db{1}> show reg
>>>> ds  1ee8
>>>> es  4040
>>>> fs  600
>>>> gs  3d5e
>>>> rdi fe81078c8a00
>>>> rsi f800
>>>> rbp fe80dabd1f38
>>>> rbx fe81078c8908
>>>> rdx 0
>>>> rcx 7
>>>> rax fe81071a30c4
>>>> r8  fe8107184040
>>>> r9  7d
>>>> r10 fe811eb244f4
>>>> r11 246
>>>> r12 fe81078c8a00
>>>> r13 fe81078c89b0
>>>> r14 0
>>>> r15 873c7008
>>>> rip 808cbf2f arptimer+0xc4
>>>> cs  8
>>>> rflags  10246
>>>> rsp fe80dabd1f08
>>>> ss  10
>>>> netbsd:arptimer+0xc4:   movq    360(%r15),%rdi
>>>> db{1}>
>>>>
>>>>
>>>> I met the following panic after "drvctl -d axe0".
>>>>
>>>> fatal page fault in supervisor mode
>>>> trap type 6 code 2 rip 8011bcdd cs 8 rflags 10282 cr2 0 ilevel 2
>>>> rsp
>>>> fe80dabd1f00
>>>> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0
>>>> kernel: page fault trap, code=0
>>>> Stopped in pid 0.22 (system) at netbsd:rw_enter+0x2d:   lock cmpxchgq
>>>> %rcx,0(%
>>>> rdi)
>>>> db{1}> bt
>>>> rw_enter() at netbsd:rw_enter+0x2d
>>>> callout_softclock() at netbsd:callout_softclock+0x1d0
>>>> softint_dispatch() at netbsd:softint_dispatch+0xd3
>>>> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0
>>>> Xsoftintr() at netbsd:Xsoftintr+0x4f
>>>> --- interrupt ---
>>>> 0:
>>>> db{1}> show reg
>>>> ds  1ee8
>>>> es  0
>>>> fs  fc00
>>>> gs  3d5e
>>>> rdi 0
>>>> rsi 1
>>>> rbp fe80dabd1f38
>>>> rbx fe811dbc8188
>>>> rdx 0
>>>> rcx fe81071a30c4
>>>> rax 0
>>>> r8  fe8107184040
>>>> r9  7d
>>>> r10 fe811eb244f4
>>>> r11 246
>>>> r12 fe811dbc8280
>>>> r13 fe811dbc8230
>>>> r14 0
>>>> r15 fe811de0b010
>>>> rip 8011bcdd rw_enter+0x2d
>>>> cs  8
>>>> rflags  10282
>>>> rsp fe80dabd1f00
>>>> ss  10
>>>> netbsd:rw_enter+0x2d:   lock cmpxchgq   %rcx,0(%rdi)
>>>> db{1}>
>>>>
>>>>
>>>> --
>>>> t-hash
>>
>>
>
> --
> t-hash

Re: panic in arptimer

2015-10-19 Thread Ryota Ozaki

Hi,

I've reproduced the panic on my machine and I'm investigating
the problem.

Thank you for the report,
  ozaki-r

On Mon, Oct 19, 2015 at 4:10 PM, Takahiro Hayashi  wrote:
> Hello,
>
> Kernel panics in arptimer after detaching network interface.
> See dmesg below please.
> It happened on NetBSD/amd64 on GENERIC.201510182130Z from nyftp.
> I think this problem looks like kern/50186.
>
> How-To-Repeat:
> 1. Boot kernel into single user mode with "boot netbsd -s".
> 2. sysctl -w net.inet6.ip6.auto_linklocal=0
> 3. ifconfig interface .
> 4. Send one ping to other host.
> 5. Detach the interface with "drvctl -d".
> 6. Wait about 1200 seconds (actually 1200 sec after ping is sent).
>
>
> I saw following panic after detaching "re0".
>
> fatal page fault in supervisor mode
> trap type 6 code 0 rip 808cbf2f cs 8 rflags 10246 cr2
> 873c7368 ilevel 2 rsp fe80dabd1f08
> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0
> kernel: page fault trap, code=0
> Stopped in pid 0.22 (system) at netbsd:arptimer+0xc4:   movq
> 360(%r15),%rdi
> db{1}> bt
> arptimer() at netbsd:arptimer+0xc4
> callout_softclock() at netbsd:callout_softclock+0x1d0
> softint_dispatch() at netbsd:softint_dispatch+0xd3
> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0
> Xsoftintr() at netbsd:Xsoftintr+0x4f
> --- interrupt ---
> 0:
> db{1}> show reg
> ds  1ee8
> es  4040
> fs  600
> gs  3d5e
> rdi fe81078c8a00
> rsi f800
> rbp fe80dabd1f38
> rbx fe81078c8908
> rdx 0
> rcx 7
> rax fe81071a30c4
> r8  fe8107184040
> r9  7d
> r10 fe811eb244f4
> r11 246
> r12 fe81078c8a00
> r13 fe81078c89b0
> r14 0
> r15 873c7008
> rip 808cbf2f arptimer+0xc4
> cs  8
> rflags  10246
> rsp fe80dabd1f08
> ss  10
> netbsd:arptimer+0xc4:   movq    360(%r15),%rdi
> db{1}>
>
>
> I met the following panic after "drvctl -d axe0".
>
> fatal page fault in supervisor mode
> trap type 6 code 2 rip 8011bcdd cs 8 rflags 10282 cr2 0 ilevel 2 rsp
> fe80dabd1f00
> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0
> kernel: page fault trap, code=0
> Stopped in pid 0.22 (system) at netbsd:rw_enter+0x2d:   lock cmpxchgq
> %rcx,0(%
> rdi)
> db{1}> bt
> rw_enter() at netbsd:rw_enter+0x2d
> callout_softclock() at netbsd:callout_softclock+0x1d0
> softint_dispatch() at netbsd:softint_dispatch+0xd3
> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0
> Xsoftintr() at netbsd:Xsoftintr+0x4f
> --- interrupt ---
> 0:
> db{1}> show reg
> ds  1ee8
> es  0
> fs  fc00
> gs  3d5e
> rdi 0
> rsi 1
> rbp fe80dabd1f38
> rbx fe811dbc8188
> rdx 0
> rcx fe81071a30c4
> rax 0
> r8  fe8107184040
> r9  7d
> r10 fe811eb244f4
> r11 246
> r12 fe811dbc8280
> r13 fe811dbc8230
> r14 0
> r15 fe811de0b010
> rip 8011bcdd rw_enter+0x2d
> cs  8
> rflags  10282
> rsp fe80dabd1f00
> ss  10
> netbsd:rw_enter+0x2d:   lock cmpxchgq   %rcx,0(%rdi)
> db{1}>
>
>
> --
> t-hash

Re: panic in if_arp.c

2015-10-14 Thread Ryota Ozaki

Hi,

roy@ and I have been working on the issue and now have a fix for it.
The fix will be committed by roy@ (or me) soon.

I'm sorry for the inconvenience,
  ozaki-r

On Tue, Oct 13, 2015 at 4:04 PM, Thomas Klausner  wrote:
> Hi!
>
> I've upgraded my kernel from a version of end of September (28th or
> 30th, not quite sure) to one from yesterday, Oct 12.
>
> Since then I've had two panics, both in:
>
> panic: kernel diagnostic assertion "rw_write_held(&(la)->lle_lock)" failed: 
> file "...sys/netinet/if_arp.c", line 931
>
> The second was this morning. The machine had been idle after the
> previous panic, I started vncserver, and in that filezilla and
> transmission, and it immediately panicked.
>
> The backtrace is:
>
> (gdb) target kvm netbsd.88.core
> 0x80119755 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at 
> /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> 671 dumpsys();
> (gdb) bt
> #0  0x80119755 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at 
> /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> #1  0x80811da4 in vpanic (fmt=0x80d38b08 "kernel %sassertion 
> \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xfe813bf7b6f8)
> at /archive/foreign/src/sys/kern/subr_prf.c:342
> #2  0x80a86303 in kern_assert (fmt=fmt@entry=0x80d38b08 
> "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
> at /archive/foreign/src/sys/lib/libkern/kern_assert.c:51
> #3  0x808cf369 in arpresolve (ifp=ifp@entry=0x80003b2ec058, 
> rt=rt@entry=0xfe8826534118, m=0xfe88174e0e00, 
> dst=dst@entry=0xfe8827d2d8b0,
> desten=desten@entry=0xfe813bf7b7c2 
> "��\377\377\377\377\030AS&�\376\377\377r\253\202\245\347\063\"�\030AS&�\376\377\377��\322'�\376\377\377\030AS&�\376\377\377\060�\367;�\376\377\377�6T�\377\377\377\377")
>  at /archive/foreign/src/sys/netinet/if_arp.c:931
> #4  0x8089c80c in ether_output (ifp0=0x80003b2ec058, 
> m0=, dst=0xfe8827d2d8b0, rt=0xfe8826534118)
> at /archive/foreign/src/sys/net/if_ethersubr.c:245
> #5  0x805436b3 in klock_if_output (rt=0xfe8826534118, 
> dst=0xfe8827d2d8b0, m=0xfe88174e0e00, ifp=0x80003b2ec058)
> at /archive/foreign/src/sys/netinet/ip_output.c:189
> #6  ip_hresolv_output (ifp0=, m=0xfe88174e0e00, 
> dst=dst@entry=0xfe8827d2d8b0, rt00=rt00@entry=0xfe8826534118)
> at /archive/foreign/src/sys/netinet/ip_output.c:314
> #7  0x80544e1e in ip_output (m0=m0@entry=0xfe88174e0e00) at 
> /archive/foreign/src/sys/netinet/ip_output.c:749
> #8  0x8054f323 in tcp_output (tp=0xfe882578c000) at 
> /archive/foreign/src/sys/netinet/tcp_output.c:1627
> #9  0x8055863b in tcp_shutdown (so=0xfe882582f498) at 
> /archive/foreign/src/sys/netinet/tcp_usrreq.c:931
> #10 tcp_shutdown_wrapper (a=0xfe882582f498) at 
> /archive/foreign/src/sys/netinet/tcp_usrreq.c:2471
> #11 0x8071484b in nfs_disconnect (nmp=0xfe882554e008) at 
> /archive/foreign/src/sys/nfs/nfs_socket.c:410
> #12 0x807156d1 in nfs_reconnect (rep=rep@entry=0xfe880d6e92d8) at 
> /archive/foreign/src/sys/nfs/nfs_socket.c:355
> #13 0x806fe473 in nfs_receive (l=0xfe880aab3ba0, 
> mp=0xfe813bf7bc28, aname=0xfe813bf7bc30, rep=0xfe880d6e92d8)
> at /archive/foreign/src/sys/nfs/nfs_clntsocket.c:278
> #14 nfs_reply (lwp=0xfe880aab3ba0, myrep=0xfe880d6e92d8) at 
> /archive/foreign/src/sys/nfs/nfs_clntsocket.c:352
> #15 nfs_request (np=np@entry=0xfe8825d92e38, mrest=0xfe8826c9e800, 
> procnum=procnum@entry=18, lwp=lwp@entry=0xfe880aab3ba0, 
> cred=cred@entry=0xfe880d70d180,
> mrp=mrp@entry=0xfe813bf7bd50, mdp=mdp@entry=0xfe813bf7bd58, 
> dposp=dposp@entry=0xfe813bf7bd40, rexmitp=rexmitp@entry=0x0)
> at /archive/foreign/src/sys/nfs/nfs_clntsocket.c:688
> #16 0x8071d74a in nfs_statvfs (mp=0xfe8825e03008, 
> sbp=0xfe880d472008) at /archive/foreign/src/sys/nfs/nfs_vfsops.c:202
> #17 0x80859948 in VFS_STATVFS (mp=mp@entry=0xfe8825e03008, 
> a=a@entry=0xfe880d472008) at /archive/foreign/src/sys/kern/vfs_subr.c:1339
> #18 0x8085d123 in dostatvfs (mp=mp@entry=0xfe8825e03008, 
> sp=sp@entry=0xfe880d472008, l=l@entry=0xfe880aab3ba0, 
> flags=flags@entry=1, root=root@entry=0)
> at /archive/foreign/src/sys/kern/vfs_syscalls.c:1101
> #19 0x8085d4cf in do_sys_getvfsstat (l=0xfe880aab3ba0, 
> sfsp=0x7f7ff5d41290, bufsize=, flags=1, 
> copyfn=0x80113c40 ,
> entry_sz=entry_sz@entry=2256, retval=0xfe813bf7beb8) at 
> /archive/foreign/src/sys/kern/vfs_syscalls.c:1254
> #20 0x8085d60b in sys_getvfsstat (l=, uap= out>, retval=) at 
> /archive/foreign/src/sys/kern/vfs_syscalls.c:1306
> #21 0x8013cdbe in sy_call (rval=0xfe813bf7beb8, 
> uap=0xfe813bf7bf00, l=0xfe880aab3ba0,

cd sys/rump; $TOOLDIR/bin/nbmake-$MACHINE doesn't work

2015-09-27 Thread Ryota Ozaki

Hi,

I sometimes build only rump-related binaries
by cd sys/rump; $TOOLDIR/bin/nbmake-$MACHINE.
It worked fine, but since one or two weeks ago,
it doesn't work for me; it fails with the
following errors:


cd sys/rump; ~/git/seil6/work.tools/bin/nbmake-amd64 -j9 &&
~/git/seil6/work.tools/bin/nbmake-amd64 -j9 install; cd -
all ===> include
all ===> librump
all ===> dev
all ===> fs
all ===> kern
all ===> net
all ===> share
all ===> dev/lib
all ===> include/rump
all ===> librump/rumpkern
all ===> librump/rumpdev
all ===> kern/lib
all ===> librump/rumpnet
all ===> net/lib
all ===> share/man
nbmake: "/home/ozaki-r/git/netbsd-src/sys/rump/include/rump/Makefile"
line 7: Malformed conditional ((${MKRUMP} != "no"))
nbmake: Fatal errors encountered -- cannot continue
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/include/rump
--- all-rump ---
*** [all-rump] Error code 1

nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/include
1 error

nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/include
--- all-include ---
*** [all-include] Error code 2

nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump
nbmake: 
"/home/ozaki-r/git/netbsd-src/sys/rump/kern/lib/../Makefile.rumpkerncomp"
line 8: Malformed conditional (${MKSLJIT} != "no")
nbmake: "/home/ozaki-r/git/netbsd-src/sys/rump/net/lib/../Makefile.rumpnetcomp"
line 9: Malformed conditional (${MKSLJIT} != "no")
nbmake: Fatal errors encountered -- cannot continue
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/kern/lib
A failure has been detected in another branch of the parallel make

nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/share/man


After the build failure, some symlinks are created
in the source code tree:
sys/rump/librump/rumpdev/amd64
sys/rump/librump/rumpdev/i386
sys/rump/librump/rumpdev/machine
sys/rump/librump/rumpdev/x86
sys/rump/librump/rumpkern/amd64
sys/rump/librump/rumpkern/i386
sys/rump/librump/rumpkern/machine
sys/rump/librump/rumpkern/x86
sys/rump/librump/rumpnet/amd64
sys/rump/librump/rumpnet/i386
sys/rump/librump/rumpnet/machine
sys/rump/librump/rumpnet/x86

Something is wrong here, because I run build.sh with
-O work.amd64, so nothing should be created in
the source code tree.

Does anyone know how to fix it?

Thanks,
  ozaki-r

Re: arplookup: unable to enter address - host is not on local network

2015-09-18 Thread Ryota Ozaki

On Fri, Sep 18, 2015 at 5:53 AM, Benjamin Lorenz <b...@pocketservices.de> wrote:
> On 17 Sep 2015, at 03:34, Ryota Ozaki <ozak...@netbsd.org> wrote:
>
>> We discussed a similar issue before and introduced a sysctl to suppress
>> the messages in -current:
>> http://mail-index.netbsd.org/tech-kern/2014/11/13/msg017981.html
>>
>> So I think we can pull up the sysctl to netbsd-6 and netbsd-7
>> if you want. (It may take a while though.)
>
> I consider this to be a great idea — at least for netbsd-7. I have my own 
> patch
> which basically comments out the log in all instances, so I am fine. But your 
> solution
> would help other users as well.

Okay. I'll request pull-ups of the commit to -7 and -6.

  ozaki-r

Re: arplookup: unable to enter address - host is not on local network

2015-09-16 Thread Ryota Ozaki

Hi,

On Mon, Sep 14, 2015 at 4:49 PM, Benjamin Lorenz  wrote:
>
> Talking about
> http://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/sys/netinet/if_arp.c: I
> have a hoster (www.vultr.com) with some strange(?) network setup. All
> kernels (6.1 and 7.0 RC) complain with plenty of log messages highlighted
> below. They screw up dmesg output so I had to comment out the log statement
> and compile my own kernel.
>
> Question: Is this setup something vultr should change, or is our debug
> logging a little bit too pedantic/paranoid so we should consider removing
> it?

We discussed a similar issue before and introduced a sysctl to suppress
the messages in -current:
http://mail-index.netbsd.org/tech-kern/2014/11/13/msg017981.html

So I think we can pull up the sysctl to netbsd-6 and netbsd-7
if you want. (It may take a while though.)

  ozaki-r

>
> Thanks for insight,
> Benjamin
>
>
>
>
> if (create) {
> if (rt->rt_flags & RTF_GATEWAY) {
> if (log_unknown_network)
> why = "host is not on local network";
> } else if ((rt->rt_flags & RTF_LLINFO) == 0) {
> ARP_STATINC(ARP_STAT_ALLOCFAIL);
> why = "could not allocate llinfo";
> } else
> why = "gateway route is not ours";
> if (why) {
> log(LOG_DEBUG, "arplookup: unable to enter address"
>" for %s@%s on %s (%s)\n", in_fmtaddr(*addr),
>lla_snprintf(ar_sha(ah), ah->ar_hln),
>(ifp) ? ifp->if_xname : "null", why);
> }
> if ((rt->rt_flags & RTF_CLONED) != 0) {
> rtrequest(RTM_DELETE, rt_getkey(rt),
>rt->rt_gateway, rt_mask(rt), rt->rt_flags, NULL);
> }
> }
>
>
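As a rough illustration of how the quoted fragment decides whether to
log, here is a minimal standalone sketch. It is not the kernel code:
the TOY_RTF_* values and the function name toy_arplookup_reason are
made up for this example, and only the branch structure and the
message strings are taken from the fragment quoted above. I am also
assuming that "why" starts out as NULL.

```c
#include <stddef.h>

/* Toy flag values for illustration only; not taken from the kernel headers. */
#define TOY_RTF_GATEWAY	0x2
#define TOY_RTF_LLINFO	0x400

/*
 * Return the reason string that would be logged, or NULL when nothing
 * is logged.  Mirrors the if/else-if/else chain in the quoted fragment.
 */
const char *
toy_arplookup_reason(int rt_flags, int log_unknown_network)
{
	const char *why = NULL;		/* assumption: starts as NULL */

	if (rt_flags & TOY_RTF_GATEWAY) {
		/* Only this message is gated by the flag. */
		if (log_unknown_network)
			why = "host is not on local network";
	} else if ((rt_flags & TOY_RTF_LLINFO) == 0) {
		why = "could not allocate llinfo";
	} else {
		why = "gateway route is not ours";
	}
	return why;
}
```

With log_unknown_network set to 0, the gateway case returns NULL, so
the "if (why)" check in the fragment skips the log() call; the other
two messages are not affected by the flag.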

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-09-03 Thread Ryota Ozaki

On Fri, Sep 4, 2015 at 10:28 AM, Jun Ebihara  wrote:
> From: Jun Ebihara 
> Subject: Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
> Date: Thu, 03 Sep 2015 15:41:28 +0900 (JST)
>
>>>> I'll re-update my kernel and try.
>>> The fix is already committed, so you can try it by cvs update.
>>> If the panic still happens, could you please try a kernel with
>>> DEBUG? It will provide a more useful backtrace.
>> Thanx.
>> Now I've run cvs update and built GENERIC_DEBUG, and I'm waiting for the
>> panic to come.
>
> After over 10 hours, my system still works without the headache-inducing panic!
> Many thanx!

Good :) Thank you for your help!

  ozaki-r

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-09-02 Thread Ryota Ozaki

On Wed, Sep 2, 2015 at 4:14 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
> On Wed, 2 Sep 2015, Ryota Ozaki wrote:
>
>> On Wed, Sep 2, 2015 at 3:16 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>>>
>>> Hi Paul and Jun,
>>>
>>> Thank you for your reports!
>>>
>>> Now I can reproduce the issue quickly using openvpn.
>>> So I should be able to provide a fix soon (hopefully).
>>
>>
>> Oops. The tested kernel was built on 8/24. A kernel built today
>> doesn't reproduce the issue...
>
>
> Hmmm.
>
> I will update my sources and check with an up-to-the-minute kernel.
>
> I should be able to provide results within the next 60 to 90 minutes.

Thank you very much for your kind support.

  ozaki-r

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-09-02 Thread Ryota Ozaki

Hi,

Thank you for rechecking.

I found you're right and I messed up. I've been using a kernel
with a config different from GENERIC, which is tuned for
KVM to reduce build time. So I'm now able to reproduce the
issue again with the latest GENERIC kernel. I've really
started debugging now.

Thank you so much!
  ozaki-r

On Wed, Sep 2, 2015 at 5:06 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
> On Wed, 2 Sep 2015, Ryota Ozaki wrote:
>
>> On Wed, Sep 2, 2015 at 3:16 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>>>
>>> Hi Paul and Jun,
>>>
>>> Thank you for your reports!
>>>
>>> Now I can reproduce the issue quickly using openvpn.
>>> So I should be able to provide a fix soon (hopefully).
>>
>>
>> Oops. The tested kernel was built on 8/24. A kernel built today
>> doesn't reproduce the issue...
>
>
>
> Hmmm, I don't know what kernel you have from Aug 24.  But a kernel that was
> built from up-to-date sources less than one hour ago (and with no subsequent
> CVS commits!) still fails.
>
> This kernel identifies itself as
>
> %uname -a
> NetBSD pokey.whooppee.com 7.99.21 NetBSD 7.99.21 (GENERIC) #0: Wed Sep  2
> 15:49:03 PHT 2015
> p...@pokey.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/GENERIC
> amd64
>
> And I have attached the Xterm log from my gdb session (after running
>
> tr -d '\r'
>
> to remove trailing ^M characters!)
>
>
>
>
>
>
> -
> | Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
> | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
> -

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-09-02 Thread Ryota Ozaki

Hi Paul and Jun,

Thank you for your reports!

Now I can reproduce the issue quickly using openvpn.
So I should be able to provide a fix soon (hopefully).

  ozaki-r

On Tue, Sep 1, 2015 at 8:14 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
> On Mon, 31 Aug 2015, Ryota Ozaki wrote:
>
>> Hi,
>>
>> I've committed a fix for rt_refcnt. Could you try again
>> with -current? (though I'm not sure the fix is related to
>> the issue...)
>
>
> The fix is not helping in my situation, either!
>
> Here is the latest info, based on a kernel from today's sources...
>
> (gdb) target kvm netbsd.0.core
> 0x8069eda5 in cpu_reboot (howto=howto@entry=260,
> bootstr=bootstr@entry=0x0)
> at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
> 671 dumpsys();
> #0  0x8069eda5 in cpu_reboot (howto=howto@entry=260,
> bootstr=bootstr@entry=0x0)
> at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
> #1  0x808e4cc4 in vpanic (
> fmt=0x80d2d8b8 "kernel %sassertion \"%s\" failed: file \"%s\",
> line %d ", ap=ap@entry=0xfe810f5a48a8)
> at /build/netbsd-local/src/sys/kern/subr_prf.c:342
> #2  0x80a7d763 in kern_assert (
> fmt=fmt@entry=0x80d2d8b8 "kernel %sassertion \"%s\" failed: file
> \"%s\", line %d ") at
> /build/netbsd-local/src/sys/lib/libkern/kern_assert.c:51
> #3  0x80a8421a in rtcache_invariants (ro=0xfe821c830060)
> at /build/netbsd-local/src/sys/net/route.h:441
> #4  0x8082b2a5 in rtcache_invariants (ro=0xfe821c830060)
> at /build/netbsd-local/src/sys/net/route.h:441
> #5  rtcache_getdst (ro=0xfe821c830060)
> at /build/netbsd-local/src/sys/net/route.h:467
> #6  rtcache_lookup2 (ro=0xfe821c830060,
> dst=dst@entry=0xfe810f5a495c,
> clone=clone@entry=1, hitp=hitp@entry=0xfe810f5a4958)
> at /build/netbsd-local/src/sys/net/route.c:1493
> #7  0x805070f7 in rtcache_lookup1 (clone=1, dst=0xfe810f5a495c,
> ro=) at /build/netbsd-local/src/sys/net/route.h:449
> #8  selectroute (dstsock=dstsock@entry=0xfe810f5a4bc4,
> opts=opts@entry=0xfe81f7af67d0, mopts=,
> ro=ro@entry=0x0, retifp=retifp@entry=0xfe810f5a4a10,
> retrt=retrt@entry=0xfe810f5a4a18, clone=1,
> norouteok=norouteok@entry=1)
> at /build/netbsd-local/src/sys/netinet6/in6_src.c:665
> #9  0x8050723a in in6_selectif (retifp=0xfe810f5a4a10, ro=0x0,
> mopts=, opts=0xfe81f7af67d0,
> dstsock=0xfe810f5a4bc4)
> at /build/netbsd-local/src/sys/netinet6/in6_src.c:724
> #10 in6_selectsrc (dstsock=dstsock@entry=0xfe810f5a4bc4,
> opts=opts@entry=0xfe81f7af67d0, mopts=,
> ro=ro@entry=0xfe821c830060, laddr=laddr@entry=0xfe821c8300a0,
> ifpp=ifpp@entry=0xfe810f5a4ae0,
> errorp=errorp@entry=0xfe810f5a4adc)
> at /build/netbsd-local/src/sys/netinet6/in6_src.c:204
> #11 0x80800869 in rip6_output (m=m@entry=0xfe821c249400,
> so=so@entry=0xfe8219325db0,
> dstsock=dstsock@entry=0xfe810f5a4bc4,
> control=control@entry=0x0)
> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:447
> #12 0x80800e08 in rip6_send (l=, control=0x0,
> nam=0xfe81effcd638, m=0xfe821c249400, so=0xfe8219325db0)
> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:893
> #13 rip6_send_wrapper (a=0xfe8219325db0, b=0xfe821c249400,
> c=0xfe81effcd638, d=0x0, e=)
> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:966
> #14 0x80999251 in sosend (so=0xfe8219325db0,
> addr=0xfe81effcd638, uio=0xfe810f5a4d10, top=0xfe821c249400,
> control=0x0, flags=, l=0xfe81f455c4c0)
> at /build/netbsd-local/src/sys/kern/uipc_socket.c:1064
> #15 0x809a0785 in do_sys_sendmsg_so (l=l@entry=0xfe81f455c4c0,
> s=s@entry=4, so=, fp=0xfe81f1ac6380,
> mp=mp@entry=0xfe810f5a4e58, flags=flags@entry=0,
> retsize=retsize@entry=0xfe810f5a4eb8)
> at /build/netbsd-local/src/sys/kern/uipc_syscalls.c:622
> #16 0x809a0ad2 in do_sys_sendmsg (l=l@entry=0xfe81f455c4c0, s=4,
> mp=mp@entry=0xfe810f5a4e58, flags=0,
> retsize=retsize@entry=0xfe810f5a4eb8)
> at /build/netbsd-local/src/sys/kern/uipc_syscalls.c:672
> #17 0x809a0b9b in sys_sendmsg (l=0xfe81f455c4c0,
> uap=0xfe810f5a4f00, retval=0xfe810f5a4eb8)
> at /build/netbsd-local/src/sys/kern/uipc_syscalls.c:528
> #18 0x80901f6c in sy_call (rval=0xfe810f5a4eb8,
> uap=0xfe810f5a4f00, l=0xfe81f455c4c0,
> sy=0x810ef240 <sysent+672>)
> at /bu

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-09-02 Thread Ryota Ozaki

On Wed, Sep 2, 2015 at 3:16 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> Hi Paul and Jun,
>
> Thank you for your reports!
>
> Now I can reproduce the issue quickly using openvpn.
> So I should be able to provide a fix soon (hopefully).

Oops. The tested kernel was built on 8/24. A kernel built today
doesn't reproduce the issue...

  ozaki-r

>
>   ozaki-r
>
> On Tue, Sep 1, 2015 at 8:14 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
>> On Mon, 31 Aug 2015, Ryota Ozaki wrote:
>>
>>> Hi,
>>>
>>> I've committed a fix for rt_refcnt. Could you try again
>>> with -current? (though I'm not sure the fix is related to
>>> the issue...)
>>
>>
>> The fix is not helping in my situation, either!
>>
>> Here is the latest info, based on a kernel from today's sources...
>>
>> (gdb) target kvm netbsd.0.core
>> 0x8069eda5 in cpu_reboot (howto=howto@entry=260,
>> bootstr=bootstr@entry=0x0)
>> at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
>> 671 dumpsys();
>> #0  0x8069eda5 in cpu_reboot (howto=howto@entry=260,
>> bootstr=bootstr@entry=0x0)
>> at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
>> #1  0x808e4cc4 in vpanic (
>> fmt=0x80d2d8b8 "kernel %sassertion \"%s\" failed: file \"%s\",
>> line %d ", ap=ap@entry=0xfe810f5a48a8)
>> at /build/netbsd-local/src/sys/kern/subr_prf.c:342
>> #2  0x80a7d763 in kern_assert (
>> fmt=fmt@entry=0x80d2d8b8 "kernel %sassertion \"%s\" failed: file
>> \"%s\", line %d ") at
>> /build/netbsd-local/src/sys/lib/libkern/kern_assert.c:51
>> #3  0x80a8421a in rtcache_invariants (ro=0xfe821c830060)
>> at /build/netbsd-local/src/sys/net/route.h:441
>> #4  0x8082b2a5 in rtcache_invariants (ro=0xfe821c830060)
>> at /build/netbsd-local/src/sys/net/route.h:441
>> #5  rtcache_getdst (ro=0xfe821c830060)
>> at /build/netbsd-local/src/sys/net/route.h:467
>> #6  rtcache_lookup2 (ro=0xfe821c830060,
>> dst=dst@entry=0xfe810f5a495c,
>> clone=clone@entry=1, hitp=hitp@entry=0xfe810f5a4958)
>> at /build/netbsd-local/src/sys/net/route.c:1493
>> #7  0x805070f7 in rtcache_lookup1 (clone=1, dst=0xfe810f5a495c,
>> ro=) at /build/netbsd-local/src/sys/net/route.h:449
>> #8  selectroute (dstsock=dstsock@entry=0xfe810f5a4bc4,
>> opts=opts@entry=0xfe81f7af67d0, mopts=,
>> ro=ro@entry=0x0, retifp=retifp@entry=0xfe810f5a4a10,
>> retrt=retrt@entry=0xfe810f5a4a18, clone=1,
>> norouteok=norouteok@entry=1)
>> at /build/netbsd-local/src/sys/netinet6/in6_src.c:665
>> #9  0x8050723a in in6_selectif (retifp=0xfe810f5a4a10, ro=0x0,
>> mopts=, opts=0xfe81f7af67d0,
>> dstsock=0xfe810f5a4bc4)
>> at /build/netbsd-local/src/sys/netinet6/in6_src.c:724
>> #10 in6_selectsrc (dstsock=dstsock@entry=0xfe810f5a4bc4,
>> opts=opts@entry=0xfe81f7af67d0, mopts=,
>> ro=ro@entry=0xfe821c830060, laddr=laddr@entry=0xfe821c8300a0,
>> ifpp=ifpp@entry=0xfe810f5a4ae0,
>> errorp=errorp@entry=0xfe810f5a4adc)
>> at /build/netbsd-local/src/sys/netinet6/in6_src.c:204
>> #11 0x80800869 in rip6_output (m=m@entry=0xfe821c249400,
>> so=so@entry=0xfe8219325db0,
>> dstsock=dstsock@entry=0xfe810f5a4bc4,
>> control=control@entry=0x0)
>> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:447
>> #12 0x80800e08 in rip6_send (l=, control=0x0,
>> nam=0xfe81effcd638, m=0xfe821c249400, so=0xfe8219325db0)
>> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:893
>> #13 rip6_send_wrapper (a=0xfe8219325db0, b=0xfe821c249400,
>> c=0xfe81effcd638, d=0x0, e=)
>> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:966
>> #14 0x80999251 in sosend (so=0xfe8219325db0,
>> addr=0xfe81effcd638, uio=0xfe810f5a4d10, top=0xfe821c249400,
>> control=0x0, flags=, l=0xfe81f455c4c0)
>> at /build/netbsd-local/src/sys/kern/uipc_socket.c:1064
>> #15 0x809a0785 in do_sys_sendmsg_so (l=l@entry=0xfe81f455c4c0,
>> s=s@entry=4, so=, fp=0xfe81f1ac6380,
>> mp=mp@entry=0xfe810f5a4e58, flags=flags@entry=0,
>> retsize=retsize@entry=0xfe810f5a4eb8)
>> at /build/netbsd-local/src/sys/kern/uipc_syscalls.c:622
>> #16 0x809a0ad2 in do_sys_sendmsg (l=l@entry=0xfe81f455c4c0, s=4,
>>

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-09-02 Thread Ryota Ozaki

On Wed, Sep 2, 2015 at 9:11 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
> On Wed, 2 Sep 2015, Ryota Ozaki wrote:
>
>> On Wed, Sep 2, 2015 at 5:45 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>>>
>>> Hi,
>>>
>>> Thank you for rechecking.
>>>
>>> I found you're right and I messed up. I've been using a kernel
>>> with a config different from GENERIC, which is tuned for
>>> KVM to reduce build time. So I'm now able to reproduce the
>>> issue again with the latest GENERIC kernel. I've really
>>> started debugging now.
>>
>>
>> I found nd6_lookup is broken. Here is a patch to fix the issue:
>> http://www.netbsd.org/~ozaki-r/fix-ipv6-refcnt.diff
>> It works for me. I hope it works for you too.
>>
>> The change looks big because it is based on my local change that
>> cleans up the complicated nd6_lookup, but the point is to replace
>> rtfree with RTFREE_IF_NEEDED.
>
>
> This seems to work for me, too!

Yay!

>
> (I still have other routing issues with my vpn tunnel, but at least the
> machine no longer crashes.)

Hmm, feel free to report the issues.

Anyway, thanks for your help!
  ozaki-r

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-09-02 Thread Ryota Ozaki

On Wed, Sep 2, 2015 at 5:45 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> Hi,
>
> Thank you for rechecking.
>
> I found you're right and I messed up. I've been using a kernel
> with a config different from GENERIC, which is tuned for
> KVM to reduce build time. So I'm now able to reproduce the
> issue again with the latest GENERIC kernel. I've really
> started debugging now.

I found nd6_lookup is broken. Here is a patch to fix the issue:
http://www.netbsd.org/~ozaki-r/fix-ipv6-refcnt.diff
It works for me. I hope it works for you too.

The change looks big because it is based on my local change that
cleans up the complicated nd6_lookup, but the point is to replace
rtfree with RTFREE_IF_NEEDED.
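
For readers following along, here is a small standalone toy model of
the invariant from the failed assertion ("ro->_ro_rt == NULL ||
ro->_ro_rt->rt_refcnt > 0"). It is only a sketch: the toy_* types and
functions are invented for this example and are not the kernel's
struct rtentry, struct route, rtfree or RTFREE_IF_NEEDED, and it does
not model which kernel path actually dropped the extra reference. It
just shows that one release without a matching hold is enough to
leave a cache pointing at a route with a zero count.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are not the kernel's structures. */
struct toy_rtentry {
	int rt_refcnt;			/* number of holders of this route */
};

struct toy_route {
	struct toy_rtentry *ro_rt;	/* cached route, or NULL */
};

/* Same shape as the assertion: cache is empty or points at a held route. */
static int
toy_rtcache_ok(const struct toy_route *ro)
{
	return ro->ro_rt == NULL || ro->ro_rt->rt_refcnt > 0;
}

static void toy_hold(struct toy_rtentry *rt) { rt->rt_refcnt++; }
static void toy_release(struct toy_rtentry *rt) { rt->rt_refcnt--; }

/*
 * Run the scenario and report whether the invariant still holds.
 * unbalanced != 0 adds one release that has no matching hold.
 */
int
toy_demo(int unbalanced)
{
	struct toy_rtentry rt = { 0 };
	struct toy_route ro = { NULL };

	/* The cache takes its own reference when it stores the pointer. */
	toy_hold(&rt);
	ro.ro_rt = &rt;

	/* A balanced lookup: take a reference, use the route, drop it. */
	toy_hold(&rt);
	toy_release(&rt);
	assert(toy_rtcache_ok(&ro));	/* count is back to 1 */

	/* The extra release drops the count to 0 under the cache. */
	if (unbalanced)
		toy_release(&rt);

	return toy_rtcache_ok(&ro);
}
```

Here toy_demo(0) returns 1 (invariant holds) and toy_demo(1) returns
0, which is the state the kernel assertion complains about.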

Thanks,
  ozaki-r

>
> Thank you so much!
>   ozaki-r
>
> On Wed, Sep 2, 2015 at 5:06 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
>> On Wed, 2 Sep 2015, Ryota Ozaki wrote:
>>
>>> On Wed, Sep 2, 2015 at 3:16 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>>>>
>>>> Hi Paul and Jun,
>>>>
>>>> Thank you for your reports!
>>>>
>>>> Now I can reproduce the issue quickly using openvpn.
>>>> So I should be able to provide a fix soon (hopefully).
>>>
>>>
>>> Oops. The tested kernel was built on 8/24. A kernel built today
>>> doesn't reproduce the issue...
>>
>>
>>
>> Hmmm, I don't know what kernel you have from Aug 24.  But a kernel that was
>> built from up-to-date sources less than one hour ago (and with no subsequent
>> CVS commits!) still fails.
>>
>> This kernel identifies itself as
>>
>> %uname -a
>> NetBSD pokey.whooppee.com 7.99.21 NetBSD 7.99.21 (GENERIC) #0: Wed Sep  2
>> 15:49:03 PHT 2015
>> p...@pokey.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/GENERIC
>> amd64
>>
>> And I have attached the Xterm log from my gdb session (after running
>>
>> tr -d '\r'
>>
>> to remove trailing ^M characters!)
>>
>>
>>
>>
>>
>>
>> -
>> | Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
>> | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
>> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
>> -

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-08-31 Thread Ryota Ozaki

Hi,

I've committed a fix for rt_refcnt. Could you try again
with -current? (though I'm not sure the fix is related to
the issue...)

Thanks,
  ozaki-r

On Fri, Aug 28, 2015 at 3:55 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> Hi,
>
> Thank you for sending backtraces. I'm investigating now...
>
>   ozaki-r
>
>
> On Fri, Aug 28, 2015 at 1:46 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
>> On Fri, 28 Aug 2015, Paul Goyette wrote:
>>
>>> On Fri, 28 Aug 2015, Paul Goyette wrote:
>>>
>>>> On Fri, 28 Aug 2015, Jun Ebihara wrote:
>>>>
>>>>> On: i386 kernel from nyftp.
>>>>> 7.99.21 NetBSD 7.99.21 (GENERIC.201508271450Z) #0: Thu Aug 27 17:23:37
>>>>> UTC 2015
>>>>> bui...@b47.netbsd.org:/home/builds/ab/HEAD/i386/201508271450Z-obj/home/source/ab/HEAD/src/sys/arch/i386/compile/GENERIC
>>>>> i386
>>>>>
>>>>> sometimes panic around route.h.
>>>>>
>>>>> savecore: reboot after panic: panic: kernel diagnostic assertion
>>>>> "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed: file
>>>>> "/home/source/ab/HEAD/src/sys/net/route.h", line 433
>>>>>
>>>>
>>>> I had one of these crashes yesterday.  It happened when I was stopping a
>>>> pkgsrc/net/openvpn tunnel via "/etc/rc.d/openvpn onestop"
>>>>
>>>>
>>>> I don't bring my tunnel up very often, so I'm not sure if it is
>>>> reproducible...
>>>
>>>
>>> I don't know if this is the same crash as yesterday, but I just got
>>> another one!
>>>
>>> I started openvpn, then stopped it.  No crash.  so I repeated this a
>>> couple more times.  Still no crash.
>>>
>>> Then I brought the tunnel up, and did a 'ping6 ftp.netbsd.org' and it
>>> crashed almost immediately.
>>
>>
>> Here's the backtrace again, this time with symbol table loaded!
>>
>> (gdb) bt
>> #0  0x802f5ae5 in cpu_reboot (howto=howto@entry=260,
>> bootstr=bootstr@entry=0x0)
>> at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
>> #1  0x803742d4 in vpanic (
>> fmt=0x804b6e38 "kernel %sassertion \"%s\" failed: file \"%s\",
>> line %d ", ap=ap@entry=0xfe817d50eb70)
>> at /build/netbsd-local/src/sys/kern/subr_prf.c:340
>> #2  0x80467583 in kern_assert (
>> fmt=fmt@entry=0x804b6e38 "kernel %sassertion \"%s\" failed: file
>> \"%s\", line %d ")
>> at /build/netbsd-local/src/sys/lib/libkern/kern_assert.c:51
>> #3  0x8033a0e3 in rtfree (rt=0xfe815325aab0)
>> at /build/netbsd-local/src/sys/net/route.c:417
>> #4  0x8033a5fd in rtcache_clear (ro=ro@entry=0xfe821db78060)
>> at /build/netbsd-local/src/sys/net/route.c:1473
>> #5  0x8033a681 in rtcache_free (ro=ro@entry=0xfe821db78060)
>> at /build/netbsd-local/src/sys/net/route.c:1518
>> #6  0x8020a65a in in6_pcbdetach (in6p=0xfe821db78000)
>> at /build/netbsd-local/src/sys/netinet6/in6_pcb.c:618
>> #7  0x80332bec in rip6_detach (so=<optimized out>)
>> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:644
>> #8  0x80332ccc in rip6_detach_wrapper (a=0xfe817c792498)
>> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:964
>> #9  0x803cea10 in soclose (so=0xfe817c792498)
>> at /build/netbsd-local/src/sys/kern/uipc_socket.c:762
>> #10 0x803860a1 in soo_close (fp=0xfe81ef653980)
>> at /build/netbsd-local/src/sys/kern/sys_socket.c:255
>> #11 0x802b0de4 in closef (fp=0xfe81ef653980)
>> at /build/netbsd-local/src/sys/kern/kern_descrip.c:831
>> #12 0x802b39e3 in fd_free ()
>> at /build/netbsd-local/src/sys/kern/kern_descrip.c:1561
>> #13 0x802badd5 in exit1 (l=l@entry=0xfe815a2249a0,
>> rv=rv@entry=2)
>> at /build/netbsd-local/src/sys/kern/kern_exit.c:275
>> #14 0x802db146 in sigexit (l=l@entry=0xfe815a2249a0,
>> signo=signo@entry=2)
>> at /build/netbsd-local/src/sys/kern/kern_sig.c:2048
>> #15 0x802db46b in postsig (signo=2)
>> at /build/netbsd-local/src/sys/kern/kern_sig.c:1848
>> #16 0x802c4999 in lwp_userret (l=l@entry=0xfe815a2249a0)
>> at /build/netbsd-local/src/sys/kern/kern_lwp.c:1530
>> #17 0x803866b4 in mi_userret (l=0xfe815a2249a0)
>> at /build/netbs

Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed

2015-08-28 Thread Ryota Ozaki

Hi,

Thank you for sending backtraces. I'm investigating now...

  ozaki-r


On Fri, Aug 28, 2015 at 1:46 PM, Paul Goyette p...@vps1.whooppee.com wrote:
 On Fri, 28 Aug 2015, Paul Goyette wrote:

 On Fri, 28 Aug 2015, Paul Goyette wrote:

 On Fri, 28 Aug 2015, Jun Ebihara wrote:

 On: i386 kernel from nyftp.
 7.99.21 NetBSD 7.99.21 (GENERIC.201508271450Z) #0: Thu Aug 27 17:23:37
 UTC 2015
 bui...@b47.netbsd.org:/home/builds/ab/HEAD/i386/201508271450Z-obj/home/source/ab/HEAD/src/sys/arch/i386/compile/GENERIC
 i386

 sometimes panic around route.h.

 savecore: reboot after panic: panic: kernel diagnostic assertion
 ro-_ro_rt ==NULL || ro-_ro_rt-rt_refcnt  0 failed: file
 /home/source/ab/HEAD/src/sys/net/route.h, line 433


 I had one of these crashes yesterday.  It happened when I was stopping a
 pkgsrc/net/openvpn tunnel via /etc/rc.d/openvpn onestop


 I don't bring my tunnel up very often, so I'm not sure if it is
 reproducible...


 I don't know if this is the same crash as yesterday, but I just got
 another one!

 I started openvpn, then stopped it.  No crash.  so I repeated this a
 couple more times.  Still no crash.

 Then I brought the tunnel up, and did a 'ping6 ftp.netbsd.org' and it
 crashed almost immediately.


 Here's the backtrace again, this time with symbol table loaded!

 (gdb) bt
 #0  0x802f5ae5 in cpu_reboot (howto=howto@entry=260,
 bootstr=bootstr@entry=0x0)
 at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
 #1  0x803742d4 in vpanic (
 fmt=0x804b6e38 "kernel %sassertion \"%s\" failed: file \"%s\",
 line %d ", ap=ap@entry=0xfe817d50eb70)
 at /build/netbsd-local/src/sys/kern/subr_prf.c:340
 #2  0x80467583 in kern_assert (
 fmt=fmt@entry=0x804b6e38 "kernel %sassertion \"%s\" failed: file
 \"%s\", line %d ")
 at /build/netbsd-local/src/sys/lib/libkern/kern_assert.c:51
 #3  0x8033a0e3 in rtfree (rt=0xfe815325aab0)
 at /build/netbsd-local/src/sys/net/route.c:417
 #4  0x8033a5fd in rtcache_clear (ro=ro@entry=0xfe821db78060)
 at /build/netbsd-local/src/sys/net/route.c:1473
 #5  0x8033a681 in rtcache_free (ro=ro@entry=0xfe821db78060)
 at /build/netbsd-local/src/sys/net/route.c:1518
 #6  0x8020a65a in in6_pcbdetach (in6p=0xfe821db78000)
 at /build/netbsd-local/src/sys/netinet6/in6_pcb.c:618
 #7  0x80332bec in rip6_detach (so=<optimized out>)
 at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:644
 #8  0x80332ccc in rip6_detach_wrapper (a=0xfe817c792498)
 at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:964
 #9  0x803cea10 in soclose (so=0xfe817c792498)
 at /build/netbsd-local/src/sys/kern/uipc_socket.c:762
 #10 0x803860a1 in soo_close (fp=0xfe81ef653980)
 at /build/netbsd-local/src/sys/kern/sys_socket.c:255
 #11 0x802b0de4 in closef (fp=0xfe81ef653980)
 at /build/netbsd-local/src/sys/kern/kern_descrip.c:831
 #12 0x802b39e3 in fd_free ()
 at /build/netbsd-local/src/sys/kern/kern_descrip.c:1561
 #13 0x802badd5 in exit1 (l=l@entry=0xfe815a2249a0,
 rv=rv@entry=2)
 at /build/netbsd-local/src/sys/kern/kern_exit.c:275
 #14 0x802db146 in sigexit (l=l@entry=0xfe815a2249a0,
 signo=signo@entry=2)
 at /build/netbsd-local/src/sys/kern/kern_sig.c:2048
 #15 0x802db46b in postsig (signo=2)
 at /build/netbsd-local/src/sys/kern/kern_sig.c:1848
 #16 0x802c4999 in lwp_userret (l=l@entry=0xfe815a2249a0)
 at /build/netbsd-local/src/sys/kern/kern_lwp.c:1530
 #17 0x803866b4 in mi_userret (l=0xfe815a2249a0)
 at /build/netbsd-local/src/sys/sys/userret.h:94
 #18 userret (l=0xfe815a2249a0) at ./machine/userret.h:82
 #19 syscall (frame=0xfe817d50ef00)
 at /build/netbsd-local/src/sys/arch/x86/x86/syscall.c:184
 #20 0x80100691 in Xsyscall ()
 (gdb) frame 3
 #3  0x8033a0e3 in rtfree (rt=0xfe815325aab0)
 at /build/netbsd-local/src/sys/net/route.c:417
 warning: Source file is more recent than executable.
 (gdb) print rt
 $1 = (struct rtentry *) 0xfe815325aab0
 (gdb) print *rt
 $2 = {rt_nodes = {{rn_mklist = 0xfe81f18b9558, rn_p = 0xfe821dbb4bb8,
   rn_b = -1, rn_bmask = 0 '\000', rn_flags = 4 '\004', rn_u = {rn_leaf = {
   rn_Key = 0xfe821ddcc048 "\034\030",
   rn_Mask = 0xfe810e9882b8 "", rn_Dupedkey = 0x0}, rn_node = {
   rn_Off = 501006408, rn_L = 0xfe810e9882b8, rn_R = 0x0}}}, {
   rn_mklist = 0x0, rn_p = 0x0, rn_b = 0, rn_bmask = 0 '\000',
   rn_flags = 0 '\000', rn_u = {rn_leaf = {rn_Key = 0x0, rn_Mask = 0x0,
   rn_Dupedkey = 0x0}, rn_node = {rn_Off = 0, rn_L = 0x0,
   rn_R = 0x0}}}}, rt_gateway = 0xfe821e1cdeb8, rt_flags = 2051,
   rt_refcnt = 0, rt_use = 17, rt_ifp = 0xfe8138031810,
   ^^
   rt_ifa = 0xfe81ee677010, rt_ifa_seqno = 0, rt_llinfo = 0x0, rt_rmx = {
 rmx_locks = 0, rmx_mtu = 0,

Re: -current kernel on KVM with virtio disk fails to boot

2015-08-24 Thread Ryota Ozaki

On Mon, Aug 24, 2015 at 2:12 PM, Michael van Elst mlel...@serpens.de wrote:
 ozak...@netbsd.org (Ryota Ozaki) writes:

Hi,

I got the following panic on bootup. It seems the recent
IPL_VM => IPL_NONE change in dk_attach causes it.

 Yes. Unfortunately the ld drivers differ very much in
 what context the start and iodone routines are called.

Should we apply it?

 It would only help virtio. Either all drivers must
 be adjusted or a common solution must be found.

I thought the panic happens only on the virtio disk driver.
Feel free to apply the patch if we choose the former.

  ozaki-r



 --
 --
 Michael van Elst
 Internet: mlel...@serpens.de
 A potential Snark may lurk in every tree.

-current kernel on KVM with virtio disk fails to boot

2015-08-23 Thread Ryota Ozaki

Hi,

I got the following panic on bootup. It seems the recent
IPL_VM => IPL_NONE change in dk_attach causes it.



Mutex error: lockdebug_wantlock: acquiring sleep lock from interrupt context

lock address : 0xfe800386bce0 type : sleep/adaptive
initialized  : 0x801b3a65
shared holds :  0 exclusive:  0
shares wanted:  0 exclusive:  0
current cpu  :  0 last held:  0
current lwp  : 0xfe80035771a0 last held: 00
last locked  : 0x801b3bfc unlocked*: 0x80489f53
owner field  : 00 wait/spin:0/0

Turnstile chain at 0x809bca40.
=> No active turnstile for this lock.

panic: LOCKDEBUG: Mutex error: lockdebug_wantlock: acquiring sleep
lock from interrupt context
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 80197c75 cs 8 rflags 246 cr2 0 ilevel 6
rsp fe8003997b00
curlwp 0xfe80035771a0 pid 0.43 lowest kstack 0xfe80039942c0
Stopped in pid 0.43 (system) at netbsd:breakpoint+0x5:  leave
db{0} bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
mutex_enter() at netbsd:mutex_enter+0x43f
dk_done() at netbsd:dk_done+0x62
lddone() at netbsd:lddone+0xf
ld_virtio_vq_done() at netbsd:ld_virtio_vq_done+0x37
virtio_vq_intr() at netbsd:virtio_vq_intr+0x70
virtio_intr() at netbsd:virtio_intr+0x70
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x19
Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
--- interrupt ---
bus_space_read_4() at netbsd:bus_space_read_4+0xa
lwp_exit_switchaway() at netbsd:lwp_exit_switchaway+0x79
lwp_exit() at netbsd:lwp_exit+0x317
kthread_exit() at netbsd:kthread_exit+0x4e
config_finalize_register() at netbsd:config_finalize_register



We can fix it by running ld_virtio's interrupt
handler in softint; this can be done simply with the
patch below.



diff --git a/sys/dev/pci/ld_virtio.c b/sys/dev/pci/ld_virtio.c
index f578a73..404edfd 100644
--- a/sys/dev/pci/ld_virtio.c
+++ b/sys/dev/pci/ld_virtio.c
@@ -249,7 +249,7 @@ ld_virtio_attach(device_t parent, device_t self, void *aux)
        vsc->sc_nvqs = 1;
        vsc->sc_config_change = NULL;
        vsc->sc_intrhand = virtio_vq_intr;
-       vsc->sc_flags = 0;
+       vsc->sc_flags = VIRTIO_F_PCI_INTR_SOFTINT;
 
        features = virtio_negotiate_features(vsc,
                                             (VIRTIO_BLK_F_SIZE_MAX |



Should we apply it?

  ozaki-r

Re: Kernel panic from network traffic

2015-07-24 Thread Ryota Ozaki

Hi,

It's probably due to my recent change to refcnt. I'm investigating
that defect.

Thanks,
  ozaki-r

On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
 Being a moron, I plugged ports of my switch together. The big surprise
 is two ports away is my -current box and it kept panicking.

 This is all I got so far.

 Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion
 "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
 Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
 Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
 Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
 Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
 Jul 23 21:46:11 mara /netbsd: rtcache_clear() at
 netbsd:rtcache_clear+0x41
 Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
 Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at
 netbsd:in6_pcbdetach+0xcb
 Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at
 netbsd:udp6_detach_wrapper+0x3f
 Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
 Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
 Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
 Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
 Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
 Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
 Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
 Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
 Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

 --
 Hisashi T Fujinaka - ht...@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee

Re: Kernel panic from network traffic

2015-07-24 Thread Ryota Ozaki

Hi,

I just fixed one bug related to refcnt. The fix may shut up the panic.
Could you try again with a latest kernel?

Thanks,
  ozaki-r

On Fri, Jul 24, 2015 at 3:38 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 Hi,

 It's probably due to my recent change to refcnt. I'm investigating
 that defect.

 Hmm, I cannot reproduce it. Could you tell me the kernel config,
 network setups and apps running on the box?

 Thanks,
   ozaki-r


 Thanks,
   ozaki-r

 On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com 
 wrote:
 Being a moron, I plugged ports of my switch together. The big surprise
 is two ports away is my -current box and it kept panicking.

 This is all I got so far.

 Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion
 "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
 Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
 Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
 Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
 Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
 Jul 23 21:46:11 mara /netbsd: rtcache_clear() at
 netbsd:rtcache_clear+0x41
 Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
 Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at
 netbsd:in6_pcbdetach+0xcb
 Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at
 netbsd:udp6_detach_wrapper+0x3f
 Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
 Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
 Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
 Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
 Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
 Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
 Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
 Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
 Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

 --
 Hisashi T Fujinaka - ht...@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee

Re: Kernel panic from network traffic

2015-07-24 Thread Ryota Ozaki

On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 Hi,

 It's probably due to my recent change to refcnt. I'm investigating
 that defect.

Hmm, I cannot reproduce it. Could you tell me the kernel config,
network setups and apps running on the box?

Thanks,
  ozaki-r


 Thanks,
   ozaki-r

 On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com 
 wrote:
 Being a moron, I plugged ports of my switch together. The big surprise
 is two ports away is my -current box and it kept panicking.

 This is all I got so far.

 Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion
 "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
 Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
 Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
 Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
 Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
 Jul 23 21:46:11 mara /netbsd: rtcache_clear() at
 netbsd:rtcache_clear+0x41
 Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
 Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at
 netbsd:in6_pcbdetach+0xcb
 Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at
 netbsd:udp6_detach_wrapper+0x3f
 Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
 Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
 Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
 Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
 Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
 Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
 Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
 Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
 Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

 --
 Hisashi T Fujinaka - ht...@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee

Re: Regression on rtadvd (was: Re: CVS commit: src/usr.sbin/rtadvd)

2015-06-14 Thread Ryota Ozaki

On Mon, Jun 15, 2015 at 5:58 AM, Timo Buhrmester fstd.l...@gmail.com wrote:
 Module Name:  src
 Committed By: roy
 Date: Fri Jun  5 14:15:41 UTC 2015

 Modified Files:
   src/usr.sbin/rtadvd: rtadvd.c

 Log Message:
 Set the hoplimit of 255 as specified in RFC 4861 section 4.2
 using the IPV6_MULTICAST_HOPS socket option rather than using CMSG
 when constructing each message.

 This commit broke rtadvd for me on i386: the sendmsg() call that is
 supposed to generate RAs now fails with "Invalid argument".

 Here's two gdb transcripts:

 This is what happens AFTER the offending commit:

 | # gdb -q rtadvd
 | (gdb) break rtadvd.c:1699
 | (gdb) run -df vr1
 | [...]
 | Breakpoint 1, ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1699
 | 1699  i = sendmsg(sock, &sndmhdr, 0);
 | (gdb) bt
 | #0  ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1699
 | #1  0x0804cdbf in ra_timeout (data=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1779
 | #2  0x08052610 in rtadvd_check_timer () at /usr/src.head/usr.sbin/rtadvd/timer.c:130
 | #3  0x080499c0 in main (argc=-1, argv=0xbfbfec64) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:315
 | (gdb) x/28xb &sndmhdr
 | 0x8058b2c <sndmhdr>:     0x40  0x7e  0x05  0x08  0x1c  0x00  0x00  0x00
 | 0x8058b34 <sndmhdr+8>:   0xc4  0x8a  0x05  0x08  0x01  0x00  0x00  0x00
 | 0x8058b3c <sndmhdr+16>:  0xd0  0xa0  0x90  0xbb  0x30  0x00  0x00  0x00
 | 0x8058b44 <sndmhdr+24>:  0x00  0x00  0x00  0x00
 | (gdb) n
 | 1701  if (i < 0 || (size_t)i != rai->ra_datalen)  {
 | (gdb) print i
 | $1 = -1
 | (gdb) n
 | [...]
 | rtadvd[3794]: ra_output sendmsg on vr1: Invalid argument



 The following is what it looked like BEFORE the offending commit:

 | # gdb -q rtadvd
 | (gdb) break rtadvd.c:1702
 | (gdb) run -df vr1
 | [...]
 | Breakpoint 1, ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1702
 | 1702  i = sendmsg(sock, &sndmhdr, 0);
 | (gdb) bt
 | #0  ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1702
 | #1  0x0804cdcf in ra_timeout (data=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1782
 | #2  0x08052620 in rtadvd_check_timer () at /usr/src.head/usr.sbin/rtadvd/timer.c:130
 | #3  0x080499c0 in main (argc=-1, argv=0xbfbfec64) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:315
 | (gdb) x/28xb &sndmhdr
 | 0x8058b0c <sndmhdr>:     0x20  0x7e  0x05  0x08  0x1c  0x00  0x00  0x00
 | 0x8058b14 <sndmhdr+8>:   0xa4  0x8a  0x05  0x08  0x01  0x00  0x00  0x00
 | 0x8058b1c <sndmhdr+16>:  0xd0  0xa0  0x90  0xbb  0x30  0x00  0x00  0x00
 | 0x8058b24 <sndmhdr+24>:  0x00  0x00  0x00  0x00
 | (gdb) n
 | 1704  if (i < 0 || (size_t)i != rai->ra_datalen)  {
 | (gdb) print i
 | $2 = 56



 The differences between the two dumps of `sndmhdr` are in the 1st and 9th
 bytes, which on i386 correspond to the 1st and 3rd members of struct msghdr:
 | void *msg_name;          /* optional address */
 | and
 | struct iovec *msg_iov;   /* scatter/gather array */
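
 To make the byte offsets concrete, here is a minimal sketch that decodes
 the 28 bytes of the "after" dump. The struct below is a hypothetical model
 of how struct msghdr is laid out on i386 (4-byte pointers, little-endian);
 the names msghdr_i386, le32 and dump_after are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of struct msghdr on i386: every member is 4 bytes. */
struct msghdr_i386 {
	uint32_t msg_name;       /* void *: optional address */
	uint32_t msg_namelen;    /* size of address */
	uint32_t msg_iov;        /* struct iovec *: scatter/gather array */
	uint32_t msg_iovlen;     /* number of elements in msg_iov */
	uint32_t msg_control;    /* void *: ancillary data */
	uint32_t msg_controllen; /* ancillary data buffer length */
	uint32_t msg_flags;      /* flags on received message */
};

/* The 28 bytes shown by gdb after the offending commit. */
static const uint8_t dump_after[28] = {
	0x40, 0x7e, 0x05, 0x08, 0x1c, 0x00, 0x00, 0x00,
	0xc4, 0x8a, 0x05, 0x08, 0x01, 0x00, 0x00, 0x00,
	0xd0, 0xa0, 0x90, 0xbb, 0x30, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t
le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

 Under this model, le32(dump_after + offsetof(struct msghdr_i386, msg_iov))
 is the msg_iov pointer at byte offset 8, and the msg_controllen field at
 offset 20 decodes to 0x30 (48) in both dumps.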


 I've manually tracked the failing code path down through the sendmsg system
 call:
 sys/kern/uipc_syscalls.c:620 do_sys_sendmsg_so()
     ``error = (*so->so_send)(so, sa, &auio, NULL, control, flags, l);''
 sys/kern/uipc_socket.c:1602 sosend()
     ``error = (*so->so_proto->pr_usrreqs->pr_send)(so, top, addr, control, l);''
 sys/netinet6/raw_ip6.c:891 rip6_send()
     ``error = rip6_output(m, so, dst, control);''
 sys/netinet6/raw_ip6.c:391 rip6_output()
     ``if ((error = ip6_setpktopts(control, &opt, [...]''
 sys/netinet6/ip6_output.c:2705 ip6_setpktopts()
     ``if (cm->cmsg_len == 0 || cm->cmsg_len > control->m_len)''

 The last function is where the EINVAL is produced, due to cm->cmsg_len being
 zero (in the 2nd iteration of the enclosing loop).
 Unfortunately, I couldn't figure out what's supposed to happen there.


 Any ideas?

It seems that the kernel attempts to read non-existent cmsg
which was removed by the commit. Can you try the below patch?
It reduces cmsg buffer length as hoplimit cmsg is removed.

  ozaki-r

diff --git a/usr.sbin/rtadvd/rtadvd.c b/usr.sbin/rtadvd/rtadvd.c
index 66885c4..f2016eb 100644
--- a/usr.sbin/rtadvd/rtadvd.c
+++ b/usr.sbin/rtadvd/rtadvd.c
@@ -1503,8 +1503,7 @@ sock_open(void)
exit(1);
}

-   sndcmsgbuflen = CMSG_SPACE(sizeof(struct in6_pktinfo)) +
-   CMSG_SPACE(sizeof(int));
+   sndcmsgbuflen = CMSG_SPACE(sizeof(struct in6_pktinfo));
sndcmsgbuf = malloc(sndcmsgbuflen);
if (sndcmsgbuf == NULL) {
syslog(LOG_ERR, %s malloc: %m, __func__);
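
To illustrate why the oversized buffer matters, here is a minimal userland
sketch of the kind of check quoted above from ip6_setpktopts(). It is not
the kernel code: count_cmsgs, fill_pktinfo_only and struct pktinfo_like are
hypothetical names, and pktinfo_like merely stands in for struct
in6_pktinfo. A buffer sized for two control messages but filled with only
one leaves a zeroed second header, so its cmsg_len reads as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for struct in6_pktinfo (address + interface index). */
struct pktinfo_like {
	unsigned char addr[16];
	unsigned int  ifindex;
};

/*
 * Walk a control buffer and validate each header the way the quoted check
 * does. Returns the number of messages, or -1 where the kernel would
 * presumably return EINVAL.
 */
static int
count_cmsgs(const unsigned char *buf, size_t len)
{
	int n = 0;

	while (len >= sizeof(struct cmsghdr)) {
		struct cmsghdr cm;
		size_t step;

		memcpy(&cm, buf, sizeof(cm));	/* avoid unaligned reads */
		if (cm.cmsg_len == 0 || (size_t)cm.cmsg_len > len ||
		    (size_t)cm.cmsg_len < CMSG_LEN(0))
			return -1;
		n++;
		/* Distance to the next header, including padding. */
		step = CMSG_SPACE((size_t)cm.cmsg_len - CMSG_LEN(0));
		if (step >= len)
			break;
		buf += step;
		len -= step;
	}
	return n;
}

/* Zero the buffer and fill in only the first (pktinfo-sized) message. */
static void
fill_pktinfo_only(unsigned char *buf, size_t buflen)
{
	struct cmsghdr cm;

	assert(buflen >= CMSG_SPACE(sizeof(struct pktinfo_like)));
	memset(buf, 0, buflen);
	memset(&cm, 0, sizeof(cm));
	cm.cmsg_len = CMSG_LEN(sizeof(struct pktinfo_like));
	cm.cmsg_level = IPPROTO_IPV6;	/* type is irrelevant to the length check */
	memcpy(buf, &cm, sizeof(cm));
}
```

With a length of CMSG_SPACE(sizeof(struct pktinfo_like)) +
CMSG_SPACE(sizeof(int)) the walk hits the empty second header and fails;
with only CMSG_SPACE(sizeof(struct pktinfo_like)), as in the patch, it
finds exactly one message.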

fatal integer divide fault in dk(4)

2015-05-05 Thread Ryota Ozaki

Hi,

For the past few days I have been getting the following fault
with a recent -current kernel on KVM.


fatal integer divide fault in supervisor mode
trap type 8 code 0 rip 801b38fa cs 8 rflags 10246 cr2 0 ilevel 0 rsp fe800398cb48
curlwp 0xfe80035345c0 pid 0.42 lowest kstack 0xfe80039892c0
kernel: integer divide fault trap, code=0
Stopped in pid 0.42 (system) at netbsd:dk_strategy+0x41:   divl    %esi,%eax
db{0} bt
dk_strategy() at netbsd:dk_strategy+0x41
disk_read_sectors() at netbsd:disk_read_sectors+0x3b
read_sector() at netbsd:read_sector+0x1d
scan_mbr() at netbsd:scan_mbr+0x32
readdisklabel() at netbsd:readdisklabel+0x150
dk_getdisklabel() at netbsd:dk_getdisklabel+0xbf
dk_open() at netbsd:dk_open+0xf4
cdev_open() at netbsd:cdev_open+0xb2
spec_open() at netbsd:spec_open+0x25d
VOP_OPEN() at netbsd:VOP_OPEN+0x33
dkwedge_discover() at netbsd:dkwedge_discover+0xb4
config_interrupts_thread() at netbsd:config_interrupts_thread+0x2c
db{0}


The faulting rip is in the code below.


/*
 * The transfer must be a whole number of blocks and the offset must
 * not be negative.
 */
if ((bp->b_bcount % secsize) != 0 || bp->b_blkno < 0) {
801b38f6:   89 c8                   mov    %ecx,%eax
801b38f8:   31 d2                   xor    %edx,%edx
801b38fa:   f7 f6                   div    %esi
801b38fc:   85 d2                   test   %edx,%edx
801b38fe:   0f 85 8c 00 00 00       jne    801b3990 <dk_strategy+0xd7>
801b3904:   48 83 7b 48 00          cmpq   $0x0,0x48(%rbx)
801b3909:   0f 88 81 00 00 00       js     801b3990 <dk_strategy+0xd7>
biodone(bp);
return;
}


It is easy to see what happens, but I don't know
how to fix it. Can anyone fix it?

Thanks,
  ozaki-r
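
A divide fault on `div %esi` with %edx cleared suggests that the divisor,
presumably secsize, is zero at this point. As a minimal sketch only (a
hypothetical helper, not the dk(4) code and not necessarily the right
fix), the same validity test can be written so that a zero sector size is
rejected before the modulo is evaluated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a transfer of bcount bytes starting
 * at block blkno is acceptable for a device with secsize-byte sectors.
 */
static bool
transfer_is_valid(uint32_t bcount, int64_t blkno, uint32_t secsize)
{
	/* Geometry not known yet: refuse instead of dividing by zero. */
	if (secsize == 0)
		return false;
	/*
	 * The transfer must be a whole number of blocks and the offset
	 * must not be negative.
	 */
	if ((bcount % secsize) != 0 || blkno < 0)
		return false;
	return true;
}
```

Whether rejecting the request is appropriate, or whether the geometry
should instead be initialized before the first read, is a separate
question that this sketch does not answer.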

Re: Build error spdmem.

2015-04-20 Thread Ryota Ozaki

Applied. Thanks!

  ozaki-r

On Mon, Apr 20, 2015 at 2:52 PM, henning petersen
henning.peter...@t-online.de wrote:
 In spdmemvar.h there is a space after a backslash, and one semicolon is
 missing.

Re: Revisiting DTrace syscall provider

2015-03-07 Thread Ryota Ozaki

On Sat, Mar 7, 2015 at 11:34 PM, Christos Zoulas chris...@zoulas.com wrote:
 On Mar 7,  6:11pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
 -- Subject: Re: Revisiting DTrace syscall provider

 | I first did so but I changed back to systrace because FreeBSD named it
 | systrace and named modules for syscall emulations as systrace_linux32
 | and systrace_freebsd32. If we follow FreeBSD, shouldn't we keep the
 | systrace name as is?

 Seeing how much confusion systrace is causing I renamed it again:

 dtrace_syscall
 dtrace_syscall_netbsd32
 dtrace_syscall_linux
 ...

Yea, they're better than dtrace_systrace* :)

BTW please don't forget to rename its internal name.
(See 
http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/external/cddl/osnet/dev/fbt/fbt.c.diff?r1=1.17&r2=1.18&f=u
)


 | NP, please do it :)

 I am making one more change to make things even simpler than what FreeBSD
 did and save more space.

Thanks!
  ozaki-r


 Thanks,

 christos

Re: Revisiting DTrace syscall provider

2015-03-07 Thread Ryota Ozaki

On Sat, Mar 7, 2015 at 6:10 AM, Christos Zoulas chris...@astron.com wrote:
 In article 
 cakryomg6z8scuvmzqmfk+n09n+1xfniznc4fac59zxoaljo...@mail.gmail.com,
 Ryota Ozaki  ozak...@netbsd.org wrote:
Anyway I updated my patches; they're based on latest -current.

Changes since the previous are:
- Remove an unexpected contribution comment from kern_dtrace.c
  (thanks riastradh!)
- Don't unload systrace.kmod when there are users using dtrace
- Add created from line to *_systrace_args.c

http://www.netbsd.org/~ozaki-r/systrace.diff
http://www.netbsd.org/~ozaki-r/systrace-full.diff
https://github.com/ozaki-r/netbsd-src/commits/dtrace-syscall-provider2

 Thanks for working on this... I had done mostly the same, but with a few
 differences:

 1. I named the module dtrace_systrace to match with the other
dtrace modules.

I first did so but I changed back to systrace because FreeBSD named it
systrace and named modules for syscall emulations as systrace_linux32
and systrace_freebsd32. If we follow FreeBSD, shouldn't we keep the
systrace name as is?

 2. I did not create the args union to deal with signed/unsigned; I
chose to mimic what FreeBSD does. Do you prefer your way?

I don't mind if it works. It was just as riz did.

 3. I fixed the entry/exit probe functions to deal with the return
value of the syscall, as well as the entry and exit argument
descriptions.
 4. I am not sure if creating systrace_foo files in the emulations
should be done right now, since we don't really load/use them.
This should be a todo item, perhaps these functions should be
the emulation structure and loaded with the emulation. But
that should be phase II...

Yes I agree.

 5. I bumped the kernel because of struct sysent changes.

Sure.


 Anyway it works nicely, and I'd like to commit it if you don't mind
 (or if you have any changes).

NP, please do it :)

  ozaki-r


 Thanks,

 christos

 Index: kern/Makefile
 ===
 RCS file: /cvsroot/src/sys/kern/Makefile,v
 retrieving revision 1.17
 diff -u -u -r1.17 Makefile
 --- kern/Makefile   16 Jan 2014 01:15:34 -  1.17
 +++ kern/Makefile   6 Mar 2015 20:55:27 -
 @@ -11,7 +11,7 @@
 @false

  SYSCALLSRC = makesyscalls.sh syscalls.conf syscalls.master
 -init_sysent.c syscalls.c ../sys/syscall.h ../sys/syscallargs.h: ${SYSCALLSRC}
 +init_sysent.c syscalls.c systrace_args.c ../sys/syscall.h 
 ../sys/syscallargs.h: ${SYSCALLSRC}
 ${HOST_SH} makesyscalls.sh syscalls.conf syscalls.master

  VNODEIFSRC = vnode_if.sh vnode_if.src
 Index: kern/makesyscalls.sh
 ===
 RCS file: /cvsroot/src/sys/kern/makesyscalls.sh,v
 retrieving revision 1.145
 diff -u -u -r1.145 makesyscalls.sh
 --- kern/makesyscalls.sh24 Jul 2014 11:58:45 -  1.145
 +++ kern/makesyscalls.sh6 Mar 2015 20:55:28 -
 @@ -61,6 +61,7 @@
  # source the config file.
  sys_nosys=sys_nosys  # default is sys_nosys(), if not specified otherwise
  maxsysargs=8   # default limit is 8 (32bit) arguments
 +systrace=/dev/null
  rumpcalls=/dev/null
  rumpcallshdr=/dev/null
  rumpsysmap=/dev/null
 @@ -75,15 +76,17 @@
  sysnamesbottom=sysnames.bottom
  rumptypes=rumphdr.types
  rumpprotos=rumphdr.protos
 +systracetmp=systrace.$$
 +systraceret=systraceret.$$

 -trap "rm $sysdcl $sysprotos $sysent $sysnamesbottom $rumpsysent $rumptypes $rumpprotos" 0
 +trap "rm $sysdcl $sysprotos $sysent $sysnamesbottom $rumpsysent $rumptypes $rumpprotos $systracetmp $systraceret" 0

  # Awk program (must support nawk extensions)
  # Use awk at Berkeley, nawk or gawk elsewhere.
  awk=${AWK:-awk}

  # Does this awk have a toupper function?
 -have_toupper=`$awk 'BEGIN { print toupper("true"); exit; }' 2>/dev/null`
 +have_toupper=$($awk 'BEGIN { print toupper("true"); exit; }' 2>/dev/null)

  # If this awk does not define toupper then define our own.
  if [ $have_toupper = TRUE ] ; then
 @@ -137,6 +140,9 @@
 sysnumhdr = \$sysnumhdr\
 sysarghdr = \$sysarghdr\
 sysarghdrextra = \$sysarghdrextra\
 +   systrace = \$systrace\
 +   systracetmp = \$systracetmp\
 +   systraceret = \$systraceret\
 rumpcalls = \$rumpcalls\
 rumpcallshdr = \$rumpcallshdr\
 rumpsysent = \$rumpsysent\
 @@ -211,6 +217,10 @@
         printf "/* %s */\n\n", tag > rumpcallshdr
         printf "/*\n * System call protos in rump namespace.\n *\n" > rumpcallshdr
         printf " * DO NOT EDIT-- this file is automatically generated.\n" > rumpcallshdr
 +
 +       printf "/* %s */\n\n", tag > systrace
 +       printf "/*\n * System call argument to DTrace register array converstion.\n *\n" > systrace
 +       printf " * DO NOT EDIT-- this file is automatically generated.\n" > systrace
  }
  NR == 1 {
 sub(/ $/, )
 @@ -324,6 +334,17 @@
 \t\t
