Re: OpenVPN causes fresh -current to crash

2017-01-24 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo  writes:

> Hm. Maybe I should change to a TCP mount, and see what happens...

...and with NFS over TCP, writing works without hanging.  :)
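
For anyone wanting to try the same workaround, here is a minimal sketch of switching an fstab entry to TCP. It assumes the `tcp` mount option described in mount_nfs(8) (the equivalent of `-T`; check the man page on your system). The remote path barsoom:/usr/local comes from the error message quoted in this thread; the mount point /mnt/local and the helper name nfs_fstab_line are hypothetical.

```shell
#!/bin/sh
# Build an fstab(5) line for an NFS mount, appending "tcp" to the
# option list unless it is already there.
# NOTE: nfs_fstab_line is an illustrative helper, not part of NetBSD.
nfs_fstab_line() {
    remote=$1
    mountpoint=$2
    opts=$3
    case ",$opts," in
        *,tcp,*) ;;                       # already TCP; leave options alone
        *) opts="${opts:+$opts,}tcp" ;;   # append, coping with an empty list
    esac
    printf '%s\t%s\tnfs\t%s\t0\t0\n' "$remote" "$mountpoint" "$opts"
}

# Example: the client options mentioned in this thread, plus tcp.
nfs_fstab_line barsoom:/usr/local /mnt/local rw,bg,intr
```

A one-off test without touching fstab would be something like `mount -t nfs -o rw,bg,intr,tcp barsoom:/usr/local /mnt/local`, again with the mount point being an assumption.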

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: OpenVPN causes fresh -current to crash

2017-01-24 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

>>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>>
>> I'll give it a go tonight, and report back.

I re-introduced the change that I previously rolled back to get things
working, and then upgraded pfil.c to 1.34 and built a new kernel.  This
worked fine -- you've obviously corrected the problem.  :)

About the NFS hang:

> Can you get into DDB? If so, you can see where the processes hang:
>   db> ps # you can get LWP addresses of ld and ls
>   db> bt/a  # you can get their stack traces

Noted - but I haven't been able to get into DDB.  I thought Ctrl-Alt-Esc
in the first console (the Ctrl-Alt-F1 one) should do it, but it doesn't.

> The hang may depend on the NIC. Which NIC do you use?

re0 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet

> Also, could you tell me the NFS options used on the client and the server?

Not much.  Server:

nfs_server=YES and nfsd_flags="-n 16" in rc.conf.

Client:

nfs_client=YES in rc.conf, and "rw,bg,intr" as mount options.

Hm. Maybe I should change to a TCP mount, and see what happens...

-tih


Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Ryota Ozaki
On Tue, Jan 24, 2017 at 12:53 AM, Tom Ivar Helbekkmo wrote:
> Ryota Ozaki  writes:
>
>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>
> I'll give it a go tonight, and report back.

Thanks.

>
> Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
> consequences for NFS?  There's a problem that's been around for at least
> a couple of months, but that I only discovered the other day -- I was
> running with kernels from late October then, and the problem I observed
> is still there after upgrading.

I'm not sure. I don't know much about NFS: how it works, or how it
interacts with the network stack.

>
> Reading NFS file systems is no problem, which is why I didn't notice it
> before, but writing hangs.  Here's an example: I started compiling a C
> source file directly to an executable on an NFS mounted file system
> (server and client both amd64 running fresh -current).  The compile pass
> is fine, but when the ld end of the pipeline wants to write the
> executable, it hangs.  So I try to do a 'df' in another terminal, and it
> hangs.  Finally, I simply attempt to make 'ls -l [target executable]'
> show me if it's written anything yet, and that hangs, too: after an
> attempt to write has hung the communication up, reads no longer work,
> either:
>
>  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
>    0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
>  501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
>  501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
>
> Once I have something with "tstile" in the "WCHAN" column, I know that
> I can't just reboot the machine: it's going to take a hard reset.

Can you get into DDB? If so, you can see where the processes hang:
  db> ps # you can get LWP addresses of ld and ls
  db> bt/a  # you can get their stack traces

I'd guess ps will also show some other LWPs stuck on tstile, for example
softnet/N. Stack traces of those LWPs would explain how the hang happens,
or at least give hints for further investigation.

>
> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again.  On
> the server, the output file got created, but is zero bytes.  The error
> logged on the client when it gets stuck is this console output:
>
> nfs send error 64 for barsoom:/usr/local
>
> ...and then the normal "nfs server not responding" messages in syslog
> after that, of course.

I tried an NFS client running -current against an NFS server running
netbsd-7, but writing didn't hang (I compiled a C program and ran
cp -r /etc/ /mnt/nfs). The hang may depend on the NIC. Which NIC do you use?

Also, could you tell me the NFS options used on the client and the server?

  ozaki-r


Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Jarle Greipsland
Tom Ivar Helbekkmo  writes:
[ ... ]
> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again.  On
> the server, the output file got created, but is zero bytes.  The error
> logged on the client when it gets stuck is this console output:
[ ... ]
Could this be another manifestation of PR/50432?

-jarle
--
"A firewall that lets NFS through is like a seatbelt that is designed to let
 your face reach the dashboard."
-- m...@tis.com (Marcus J Ranum)


Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

> The latest pfil.c (v1.34) should fix the panic. Could you try it?

I'll give it a go tonight, and report back.

Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
consequences for NFS?  There's a problem that's been around for at least
a couple of months, but that I only discovered the other day -- I was
running with kernels from late October then, and the problem I observed
is still there after upgrading.

Reading NFS file systems is no problem, which is why I didn't notice it
before, but writing hangs.  Here's an example: I started compiling a C
source file directly to an executable on an NFS mounted file system
(server and client both amd64 running fresh -current).  The compile pass
is fine, but when the ld end of the pipeline wants to write the
executable, it hangs.  So I try to do a 'df' in another terminal, and it
hangs.  Finally, I simply attempt to make 'ls -l [target executable]'
show me if it's written anything yet, and that hangs, too: after an
attempt to write has hung the communication up, reads no longer work,
either:

 UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
   0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
 501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
 501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]

Once I have something with "tstile" in the "WCHAN" column, I know that
I can't just reboot the machine: it's going to take a hard reset.
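
The check described above (looking for "tstile" in the WCHAN column) can be scripted. The following is a rough sketch, not a NetBSD tool: wchan_procs is a hypothetical helper that reads `ps -axl`-style output on stdin and assumes the header line names the PID, WCHAN and COMMAND columns, as in the listing above. The sample input is the process listing from this message with its columns separated.

```shell
#!/bin/sh
# Print "PID COMMAND..." for every process whose wait channel equals
# the given name (default: tstile).  Column positions are looked up in
# the header line rather than hard-coded.
# NOTE: wchan_procs is an illustrative helper name.
wchan_procs() {
    awk -v want="${1:-tstile}" '
        NR == 1 {
            # Locate the columns of interest by name.
            for (i = 1; i <= NF; i++) {
                if ($i == "PID")     pid = i
                if ($i == "WCHAN")   wchan = i
                if ($i == "COMMAND") cmd = i
            }
            next
        }
        pid && wchan && cmd && $wchan == want {
            line = $pid
            # COMMAND may contain spaces; keep everything to end of line.
            for (i = cmd; i <= NF; i++) line = line " " $i
            print line
        }
    '
}

# Sample run on the listing from this message.
wchan_procs tstile <<'EOF'
 UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
   0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
 501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
 501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
EOF
```

On a live system the intended use would be `ps -axl | wchan_procs tstile`. Since the parsing is whitespace-based, it will misread rows where wide values run two columns together.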

Oh, and it's the client that hangs; the server seems to be just fine,
and a reboot of the client makes NFS reads behave normally again.  On
the server, the output file got created, but is zero bytes.  The error
logged on the client when it gets stuck is this console output:

nfs send error 64 for barsoom:/usr/local

...and then the normal "nfs server not responding" messages in syslog
after that, of course.

-tih


Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Ryota Ozaki
On Sun, Jan 22, 2017 at 8:05 PM, Tom Ivar Helbekkmo wrote:
> Martin Husemann  writes:
>
>> Could you try backing out this change and see if it helps?
>>
>> http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html
>
> That did the trick.  I've rebooted a few times, now, and the system
> comes up as it should, with no incident, every time.  Thanks!  :)

The latest pfil.c (v1.34) should fix the panic. Could you try it?

Thanks,
  ozaki-r


Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Tom Ivar Helbekkmo
Martin Husemann  writes:

> Could you try backing out this change and see if it helps?
>
> http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html

That did the trick.  I've rebooted a few times, now, and the system
comes up as it should, with no incident, every time.  Thanks!  :)

-tih


Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Martin Husemann
On Sun, Jan 22, 2017 at 10:43:27AM +0100, Tom Ivar Helbekkmo wrote:
> panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() 
> || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file 
> "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, 
> but preemption is enabled and the caller is not in a softint or CPU-bound LWP
> 
> Backtrace:
> 
> vpanic()
> ch_voltag_convert_in()
> psref_release()
> pfil_run_arg.isra.0()
> if_initialize()
> if_attach()
> tun_clone_create()
> tunopen()
> cdev_open()
> spec_open()
> VOP_OPEN()
> vn_open()
> do_open()
> do_sys_openat()
> sys_open()
> syscall()

Could you try backing out this change and see if it helps?

http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html

Martin