Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-22 Thread Rick Macklem
Konstantin Belousov wrote:
>On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >
>> > Hi,
>> >
>> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> > to bisect this, but r358252 seems to be the culprit.
>> >
>> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> > If I revert to r358251, I cannot reproduce this.
>> >
>> > Any ideas?
>> >
>> > I can easily test any change you might suggest to see if it fixes the
>> > problem.
>> >
>> > If you want more debug info, let me know, since I can easily
>> > reproduce it.
>> >
>> > Thanks, rick
>>
>> Nothing obvious to me.  I can maybe try a repro on a VM...
>>
>> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>>
>> "btalloc" is "We're either out of address space or lost a fill race."
>
>Yes, I would not be surprised to be out of something on a 1G i386 machine.
>Please also add 'show alllocks'.
Ok, I used an up to date head kernel and it took longer to reproduce a hang.
This time, none of the processes are stuck on "btalloc".
I'll try and give you most of the above, but since I have to type it in by hand
from the screen, I might not get it all. (I'm no real typist;-)
> show alllocks
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
exclusive sleep mutex kernel arena domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
Process 12 (intr) thread 0x.. (108)
exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152

> ps
- Not going to list them all, but here are the ones that seem interesting...
18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
16 0 0 0 DL (threaded)   [bufdaemon]
100069  D  qsleep  [bufdaemon]
100074  D  -   [bufspacedaemon-0]
100084  D  sdflush  0x11923284 [/ worker]
- and more of these for the other UFS file systems
9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
8 0 0 0   DL (threaded)   [pagedaemon]
100067  D   psleep 0x1e2e95c   [dom0]
100072  D   launds 0x1e2e968   [laundry: dom0]
100073  D   umarcl 0x12cc720   [uma]
… a bunch of usb and cam ones
100025  D   -   0x1b2ee40  [doneq0]
…
12 0 0 0 RL  (threaded)   [intr]
17  I [swi6: task queue]
18  Run   CPU 0   [swi6: Giant taskq]
…
10  D   swapin 0x1d96dfc  [swapper]
- and a bunch more in D state.
Does this mean the swapper was trying to swap in?

> acttrace
- just shows the keyboard
kdb_enter() at kdb_enter+0x35/frame
vt_kbdevent() at vt_kbdevent+0x329/frame
kbdmux_intr() at kbdmux_intr+0x19/frame
taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
taskqueue_run() at taskqueue_run+0x44/frame
taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
ithread_loop() at ithread_loop+0x237/frame
fork_exit() at fork_exit+0x6c/frame
fork_trampoline() at 0x../frame

> show all vmem
vmem 0x.. 'transient arena'
  quantum: 4096
  size:  23592960
  inuse: 0
  free: 23592960
  busy tags:   0
  free tags: 2
                inuse  size        free  size
  16777216      0      0           1     23592960
vmem 0x.. 'buffer arena'
  quantum:  4096
  size:   94683136
  inuse: 94502912
  free: 180224
  busy tags:1463
  free tags:  3
                inuse  size        free  size
  16384         2      32768       1     16384
  32768         39     1277952     1     32768
  65536         1422   93192192    0     0
  131072        0      0           1     131072
vmem 0x.. 'i386trampoline'
  quantum:  1
  size:   24576
  inuse: 20860
  free:   3716
  busy tags: 9
  free tags:  3
                inuse  size        free  size
  32            1      48          1     52
  64            2      208         0     0
  128           2      280         0     0
  2048          1      2048        1     3664
  4096          2      8192        0     0
  8192          1      10084       0     0
vmem 0x.. 'kernel rwx arena'
  quantum:4096
  size: 0
  inuse: 0
  free:   0
  busy tags: 0
  free tags:  0
vmem 0x.. 'kernel arena dom'
  quantum:  4096
  size: 56623104
  inuse: 56582144
  free: 40960
  busy tags: 11224
  free tags: 3
                inuse  size        free  size
  4096          11091  45428736    0
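
The arena figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: it assumes each per-order row reads as (order size, inuse tags, inuse bytes, free tags, free bytes), and the names ARENAS, totals_consistent and utilization are made up for the illustration. The per-order rows of the last arena are left out because that table is cut off in the message.

```python
# Totals and per-order rows transcribed from the "show all vmem" output above.
# Assumed row layout: (order size, inuse tags, inuse bytes, free tags, free bytes).
ARENAS = {
    "transient arena": {
        "size": 23592960, "inuse": 0, "free": 23592960, "busy_tags": 0,
        "rows": [(16777216, 0, 0, 1, 23592960)],
    },
    "buffer arena": {
        "size": 94683136, "inuse": 94502912, "free": 180224, "busy_tags": 1463,
        "rows": [
            (16384, 2, 32768, 1, 16384),
            (32768, 39, 1277952, 1, 32768),
            (65536, 1422, 93192192, 0, 0),
            (131072, 0, 0, 1, 131072),
        ],
    },
    "i386trampoline": {
        "size": 24576, "inuse": 20860, "free": 3716, "busy_tags": 9,
        "rows": [
            (32, 1, 48, 1, 52),
            (64, 2, 208, 0, 0),
            (128, 2, 280, 0, 0),
            (2048, 1, 2048, 1, 3664),
            (4096, 2, 8192, 0, 0),
            (8192, 1, 10084, 0, 0),
        ],
    },
    "kernel rwx arena": {
        "size": 0, "inuse": 0, "free": 0, "busy_tags": 0,
        "rows": [],
    },
    "kernel arena dom": {
        "size": 56623104, "inuse": 56582144, "free": 40960, "busy_tags": 11224,
        "rows": None,  # per-order table is truncated in the message
    },
}


def totals_consistent(arena):
    """Check size == inuse + free and, when rows are present, that they sum to the totals."""
    ok = arena["size"] == arena["inuse"] + arena["free"]
    rows = arena["rows"]
    if rows is not None:
        ok = ok and sum(r[1] for r in rows) == arena["busy_tags"]
        ok = ok and sum(r[2] for r in rows) == arena["inuse"]
        ok = ok and sum(r[4] for r in rows) == arena["free"]
    return ok


def utilization(arena):
    """Fraction of the arena in use, or None for an empty arena."""
    if arena["size"] == 0:
        return None
    return arena["inuse"] / arena["size"]


if __name__ == "__main__":
    for name, arena in ARENAS.items():
        used = utilization(arena)
        pct = "n/a" if used is None else f"{used:.2%}"
        print(f"{name:18s} consistent={totals_consistent(arena)} in use={pct}")
```

On these figures the buffer arena and the kernel arena are each more than 99.8% in use, and the kernel arena has 40960 bytes (ten 4096-byte quanta) free. That is only a reading of the numbers and does not by itself identify the cause of the hang.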

Re: RFC: merging nfs-over-tls changes into head/sys

2020-05-22 Thread Rick Macklem
John Baldwin wrote:
>On 5/21/20 2:01 PM, Rick Macklem wrote:
>> Hi,
>>
>> I have now completed changes to the code in projects/nfs-over-tls, which
>> implements TLS encryption of NFS RPC messages. (This roughly conforms
>> to the internet draft "Towards Remote Procedure Call Encryption By Default",
>> which should soon become an RFC. For now, TLS1.2 is used instead of TLS1.3,
>> since FreeBSD's KERN_TLS does not yet implement TLS1.3.)
>>
>> I'd like to start merging some of the kernel changes into head/sys.
>>
>> The first of these would be creation of the syscall used by the daemons.
>> (The code in projects/nfs-over-tls cheats and uses the syscall for the gssd,
>>  but it needs to have its own syscall so that the gssd daemon can run
>>  concurrently with it. I didn't want testers to need to build userland just
>>  to get a syscall stub in libc.)
>>
>> After this, there are a bunch of changes to the NFS code to add support for
>> ext_pgs mbufs (these are significant patches, but should not affect the
>> non-ext_pgs mbuf case, since they'll be conditional on ND_EXTPGS/M_EXTPGS).
>>
>> Does this sound ok to do?
>>
>> Please let me know if you see problems with me doing this?
>
>I don't see any problems, per se, but I still need to do some changes on my
>end for software KTLS RX before it's ready to merge (I'm hoping to kill
>the iovecs in the kthreads entirely).
Sure. My plan is to merge bits and pieces, because some of it involves parts
of the system like mount exports or changes to soreceive_generic(),
that will require reviews.

To be honest, most of the changes are not specifically nfs-over-tls (or
krpc-over-tls, although NFS is currently the only consumer).
They are things like generating ext_pgs mbuf lists (which can be used for
non-TLS connections, although I'm not sure they are useful for other cases?)
or a better way of handling the krpc client side receive.

I think it will be quite a while before all the kernel bits are in head, but
having the syscall in head (mainly the syscall stub in libc) will make it
easier for testers to set systems up. They may not be FreeBSD types.

No rush on the TLS changes from my perspective. (It would be nice to get
the kernel bits in FreeBSD13. The userland stuff could probably become a
package/port, I think?)

Thanks yet again, for your help with this, rick


--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: RFC: merging nfs-over-tls changes into head/sys

2020-05-22 Thread John Baldwin
On 5/21/20 2:01 PM, Rick Macklem wrote:
> Hi,
> 
> I have now completed changes to the code in projects/nfs-over-tls, which
> implements TLS encryption of NFS RPC messages. (This roughly conforms
> to the internet draft "Towards Remote Procedure Call Encryption By Default",
> which should soon become an RFC. For now, TLS1.2 is used instead of TLS1.3,
> since FreeBSD's KERN_TLS does not yet implement TLS1.3.)
> 
> I'd like to start merging some of the kernel changes into head/sys.
> 
> The first of these would be creation of the syscall used by the daemons.
> (The code in projects/nfs-over-tls cheats and uses the syscall for the gssd,
>  but it needs to have its own syscall so that the gssd daemon can run
>  concurrently with it. I didn't want testers to need to build userland just
>  to get a syscall stub in libc.)
> 
> After this, there are a bunch of changes to the NFS code to add support for
> ext_pgs mbufs (these are significant patches, but should not affect the
> non-ext_pgs mbuf case, since they'll be conditional on ND_EXTPGS/M_EXTPGS).
> 
> Does this sound ok to do?
> 
> Please let me know if you see problems with me doing this?

I don't see any problems, per se, but I still need to do some changes on my
end for software KTLS RX before it's ready to merge (I'm hoping to kill
the iovecs in the kthreads entirely).

-- 
John Baldwin