Re: OpenVPN causes fresh -current to crash

2017-01-24 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo  writes:

> Hm. Maybe I should change to a TCP mount, and see what happens...

...and with NFS over TCP, writing works without hanging.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: OpenVPN causes fresh -current to crash

2017-01-24 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

>>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>>
>> I'll give it a go tonight, and report back.

I re-introduced the change that I previously rolled back to get things
working, and then upgraded pfil.c to 1.34 and built a new kernel.  This
worked fine -- you've obviously corrected the problem.  :)

About the NFS hang:

> Can you get into DDB? If so, you can find out where the processes hang:
>   db> ps # you can get LWP addresses of ld and ls
>   db> bt/a  # you can get their stack traces

Noted - but I haven't been able to get into DDB.  I thought Ctrl-Alt-Esc
in the first console (the Ctrl-Alt-F1 one) should do it, but it doesn't.

> The hang may depend on the NIC. Which NIC do you use?

re0 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet

> And could you let me know the NFS options of the client and the server?

Not much.  Server:

nfs_server=YES and nfsd_flags="-n 16" in rc.conf.

Client:

nfs_client=YES in rc.conf, and "rw,bg,intr" as mount options.

Hm. Maybe I should change to a TCP mount, and see what happens...

-tih


Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Ryota Ozaki
On Tue, Jan 24, 2017 at 12:53 AM, Tom Ivar Helbekkmo
 wrote:
> Ryota Ozaki  writes:
>
>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>
> I'll give it a go tonight, and report back.

Thanks.

>
> Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
> consequences for NFS?  There's a problem that's been around for at least
> a couple of months, but that I only discovered the other day -- I was
> running with kernels from late October then, and the problem I observed
> is still there after upgrading.

I'm not sure. I don't know much about NFS: how it works, or how it
interacts with the network stack.

>
> Reading NFS file systems is no problem, which is why I didn't notice it
> before, but writing hangs.  Here's an example: I started compiling a C
> source file directly to an executable on an NFS mounted file system
> (server and client both amd64 running fresh -current).  The compile pass
> is fine, but when the ld end of the pipeline wants to write the
> executable, it hangs.  So I try to do a 'df' in another terminal, and it
> hangs.  Finally, I simply attempt to make 'ls -l [target executable]'
> show me if it's written anything yet, and that hangs, too: after an
> attempt to write has hung the communication up, reads no longer work,
> either:
>
>  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
>    0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
>  501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
>  501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
>
> Once I have something with "tstile" in the "WCHAN" column, I know that
> I can't just reboot the machine: it's going to take a hard reset.

Can you get into DDB? If so, you can find out where the processes hang:
  db> ps # you can get LWP addresses of ld and ls
  db> bt/a  # you can get their stack traces

And I guess ps will also show some other LWPs stuck on tstile, for example
softnet/N. Stack traces of those LWPs would explain how the hang happens,
or at least give hints for further investigation.

>
> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again.  On
> the server, the output file got created, but is zero bytes.  The error
> logged on the client when it gets stuck is this console output:
>
> nfs send error 64 for barsoom:/usr/local
>
> ...and then the normal "nfs server not responding" messages in syslog
> after that, of course.

I tried an NFS client with -current and an NFS server with netbsd-7, but
writing didn't hang (I compiled a C program and ran cp -r /etc/ /mnt/nfs).
The hang may depend on the NIC. Which NIC do you use?

And could you let me know the NFS options of the client and the server?

  ozaki-r


Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Jarle Greipsland
Tom Ivar Helbekkmo  writes:
[ ... ]
> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again.  On
> the server, the output file got created, but is zero bytes.  The error
> logged on the client when it gets stuck is this console output:
[ ... ]
Could this be another manifestation of PR/50432?

-jarle
--
"A firewall that lets NFS through is like a seatbelt that is designed to let
 your face reach the dashboard."
-- m...@tis.com (Marcus J Ranum)


Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

> The latest pfil.c (v1.34) should fix the panic. Could you try it?

I'll give it a go tonight, and report back.

Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
consequences for NFS?  There's a problem that's been around for at least
a couple of months, but that I only discovered the other day -- I was
running with kernels from late October then, and the problem I observed
is still there after upgrading.

Reading NFS file systems is no problem, which is why I didn't notice it
before, but writing hangs.  Here's an example: I started compiling a C
source file directly to an executable on an NFS mounted file system
(server and client both amd64 running fresh -current).  The compile pass
is fine, but when the ld end of the pipeline wants to write the
executable, it hangs.  So I try to do a 'df' in another terminal, and it
hangs.  Finally, I simply attempt to make 'ls -l [target executable]'
show me if it's written anything yet, and that hangs, too: after an
attempt to write has hung the communication up, reads no longer work,
either:

 UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
   0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
 501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
 501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]

Once I have something with "tstile" in the "WCHAN" column, I know that
I can't just reboot the machine: it's going to take a hard reset.
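
As a rough illustration of the check described here, the sketch below scans
ps-style output for rows whose WCHAN column is "tstile".  The function name
stuck_on and the SAMPLE text are made up for this example.  SAMPLE is the
listing from this message re-spaced by hand, so the column boundaries in it
are an assumption.  The parser also assumes every column is non-empty and
that only COMMAND may contain spaces.

```python
def stuck_on(ps_text, wchan="tstile"):
    """Return (pid, command) pairs for rows whose WCHAN equals `wchan`."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_col = header.index("PID")
    wchan_col = header.index("WCHAN")
    hits = []
    for line in lines[1:]:
        # Split into at most len(header) fields so COMMAND keeps its spaces.
        fields = line.strip().split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip rows that do not match the header layout
        if fields[wchan_col] == wchan:
            hits.append((int(fields[pid_col]), fields[-1]))
    return hits


# Hand-spaced copy of the listing in this message (column split assumed).
SAMPLE = """\
 UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
   0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
 501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
 501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
"""

if __name__ == "__main__":
    for pid, command in stuck_on(SAMPLE):
        print(pid, command)
```

Passing another wait channel name, such as "netio" or "nfsrcv", as the
second argument picks out the other two processes instead.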

Oh, and it's the client that hangs; the server seems to be just fine,
and a reboot of the client makes NFS reads behave normally again.  On
the server, the output file got created, but is zero bytes.  The error
logged on the client when it gets stuck is this console output:

nfs send error 64 for barsoom:/usr/local

...and then the normal "nfs server not responding" messages in syslog
after that, of course.
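
The number in that console message appears to be a plain errno value, and
on NetBSD errno 64 is EHOSTDOWN ("Host is down").  The sketch below decodes
such a line.  The table is a small hand-copied subset of NetBSD's
<sys/errno.h>, and the names NETBSD_ERRNO and decode_nfs_send_error are
invented for this example.  The table is hard-coded because Python's own
errno module follows the host OS, and other systems number these
differently (on Linux, 64 is ENONET).

```python
import re

# Hand-copied subset of NetBSD <sys/errno.h>; not a complete table.
NETBSD_ERRNO = {
    55: ("ENOBUFS", "No buffer space available"),
    60: ("ETIMEDOUT", "Operation timed out"),
    61: ("ECONNREFUSED", "Connection refused"),
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}

# Message shape taken from the console line quoted in this thread.
NFS_SEND_ERROR = re.compile(r"nfs send error (\d+) for (\S+)")


def decode_nfs_send_error(line):
    """Return (errno, name, description, mount) or None if no match."""
    match = NFS_SEND_ERROR.search(line)
    if match is None:
        return None
    number = int(match.group(1))
    name, text = NETBSD_ERRNO.get(number, ("UNKNOWN", "not in this subset"))
    return number, name, text, match.group(2)


if __name__ == "__main__":
    print(decode_nfs_send_error("nfs send error 64 for barsoom:/usr/local"))
```

A "Host is down" result on the send path points at the client's view of the
network rather than at the server, which fits the observation that the
server itself stays healthy.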

-tih


Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Ryota Ozaki
On Sun, Jan 22, 2017 at 8:05 PM, Tom Ivar Helbekkmo
 wrote:
> Martin Husemann  writes:
>
>> Could you try backing out this change and see if it helps?
>>
>> http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html
>
> That did the trick.  I've rebooted a few times, now, and the system
> comes up as it should, with no incident, every time.  Thanks!  :)

The latest pfil.c (v1.34) should fix the panic. Could you try it?

Thanks,
  ozaki-r


Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Tom Ivar Helbekkmo
Martin Husemann  writes:

> Could you try backing out this change and see if it helps?
>
> http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html

That did the trick.  I've rebooted a few times, now, and the system
comes up as it should, with no incident, every time.  Thanks!  :)

-tih


Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Martin Husemann
On Sun, Jan 22, 2017 at 10:43:27AM +0100, Tom Ivar Helbekkmo wrote:
> panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() 
> || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file 
> "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, 
> but preemption is enabled and the caller is not in a softint or CPU-bound LWP
> 
> Backtrace:
> 
> vpanic()
> ch_voltag_convert_in()
> psref_release()
> pfil_run_arg.isra.0()
> if_initialize()
> if_attach()
> tun_clone_create()
> tunopen()
> cdev_open()
> spec_open()
> VOP_OPEN()
> vn_open()
> do_open()
> do_sys_openat()
> sys_open()
> syscall()

Could you try backing out this change and see if it helps?

http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html

Martin


OpenVPN causes fresh -current to crash

2017-01-22 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo  writes:

> Didn't go so well.  My main machine does routing between several VLANs,
> using Quagga to manage the routing, NPF and ALTQ for traffic management,
> and OpenVPN for tunnels from remote devices, all the while offering a
> number of network services internally.
>
> After updating to a fresh current, attempting to enable NPF will crash
> the machine, as will starting OpenVPN.  The latter causes a crash the
> moment it tries to create a tun interface.

It's a little more complex than that.

With NPF enabled, the machine will sometimes boot, sometimes not.  It
may hang just after enabling NPF, or it may get hung later in the boot
process -- seemingly mostly while doing stuff with USB.  Turning the
machine fully off and on again before a reboot attempt seems to increase
the chance of a successful boot, but it's still about fifty/fifty.  If
it does boot completely, it seems to be stable after that.

OpenVPN, on the other hand, will reliably crash the system.  I'm running
openvpn-2.3.6nb2 from pkgsrc, compiled about a year ago.  It's set up to
create three tunnels, and to (like the rest of the system) route IPv4
and IPv6 over them.  When it starts, the kernel immediately panics while
handling syscall number 5 for the openvpn process.  The following was
copied by hand, because a recursive panic causes the attempt to dump
core to disk to fail:

panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() || 
ISSET(curlwp->l_pflag, LP_BOUND))" failed: file 
"/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, 
but preemption is enabled and the caller is not in a softint or CPU-bound LWP

Backtrace:

vpanic()
ch_voltag_convert_in()
psref_release()
pfil_run_arg.isra.0()
if_initialize()
if_attach()
tun_clone_create()
tunopen()
cdev_open()
spec_open()
VOP_OPEN()
vn_open()
do_open()
do_sys_openat()
sys_open()
syscall()

This is with a NetBSD/amd64-current, updated from cvs yesterday.

-tih