Re: recurring tstile hangs on -current

2019-07-05 Thread maya
On Fri, Jul 05, 2019 at 10:29:24PM +0200, Thomas Klausner wrote: > From some debugging so far, the cause for the hang seems to be that > the nvme driver is waiting for an interrupt that doesn't come. > > At least once I got it to get unstuck by calling "nvme_intr()" on the > softc address from ddb.

Re: recurring tstile hangs on -current

2019-07-05 Thread Thomas Klausner
On Mon, Jul 01, 2019 at 07:02:06AM -, Michael van Elst wrote: > t...@giga.or.at (Thomas Klausner) writes: > > >So it looks like it could be a very extreme slowness instead of a > >complete deadlock. > > When it stops, try to reduce kern.maxvnodes to something low (like 100), > you can

Re: recurring tstile hangs on -current

2019-07-01 Thread Brian Buhrow
Hello. If I were looking at this issue, I'd be looking at the perl process stuck in bioloc, to see what it's doing. As I understand it, processes stuck in tstile are a symptom, rather than a cause. That is, any process that is waiting for access to some subsystem in an indirect manner

Re: recurring tstile hangs on -current

2019-07-01 Thread Manuel Bouyer
On Fri, Jun 28, 2019 at 09:42:08PM +0200, Thomas Klausner wrote: > To reduce the bug surface, I've disconnected the wd0 device which was > attached at ahcisata. This also removed the swap device, but the > machine is far from needing to swap. > > After ~5 hours the machine is currently hanging in

Re: recurring tstile hangs on -current

2019-07-01 Thread Michael van Elst
t...@giga.or.at (Thomas Klausner) writes: >So it looks like it could be a very extreme slowness instead of a >complete deadlock. When it stops, try to reduce kern.maxvnodes to something low (like 100), you can restore it, if the machine wakes up. If this is a memory shortage instead of a

Re: recurring tstile hangs on -current

2019-06-30 Thread Thomas Klausner
To reduce the bug surface, I've disconnected the wd0 device which was attached at ahcisata. This also removed the swap device, but the machine is far from needing to swap. After ~5 hours the machine is currently hanging in tstile again. I noticed the bulk build wasn't progressing (in a perl

Re: recurring tstile hangs on -current

2019-06-30 Thread Thomas Klausner
This time it recovered! It took an hour or so, but the tstile blocked processes are now gone (finished) and I got my console shell (with the rm) back. Nothing in dmesg or /var/log/messages. So it looks like it could be a very extreme slowness instead of a complete deadlock. Thomas On Fri, Jun

Re: recurring tstile hangs on -current

2019-06-30 Thread Thomas Klausner
With dmesg this time. On Fri, Jun 28, 2019 at 11:39:05AM +0200, Thomas Klausner wrote: > Hi Frank! > > I checked some process states in ddb. > > "master", the 2 "bjam" and at least one "cp" hanging in tstile have: > sleepq_block() > turnstile_block() > rw_vector_enter() > genfs_lock() >

Re: recurring tstile hangs on -current

2019-06-28 Thread Thomas Klausner
When I tried to get a core, I saw: > reboot 0x104 dumping to dev 168,2 (offset=73677660, size=33524130) dump ahcisata0 port 5: clearing WDCTL_RST failed for drive 0 wddump: device timed out i/o error rebooting... Thomas On Fri, Jun 28, 2019 at 11:39:05AM +0200, Thomas Klausner wrote: > Hi

Re: recurring tstile hangs on -current

2019-06-28 Thread Thomas Klausner
On Fri, Jun 28, 2019 at 11:44:37AM +0100, Robert Swindells wrote: > > Thomas Klausner wrote: > >I've set up a new machine for bulk building. I have tried various > >things, but in the end it always hangs in tstile. > > > >First try was what I currently use: tmpfs sandboxes with nullfs > >mounted

Re: recurring tstile hangs on -current

2019-06-28 Thread Robert Swindells
Thomas Klausner wrote: >I've set up a new machine for bulk building. I have tried various >things, but in the end it always hangs in tstile. > >First try was what I currently use: tmpfs sandboxes with nullfs >mounted /bin, /lib, ... When it hung, the suspicion was that it's >nullfs' fault. (The

Re: recurring tstile hangs on -current

2019-06-28 Thread Thomas Klausner
Hi Frank! I checked some process states in ddb. "master", the 2 "bjam" and at least one "cp" hanging in tstile have: sleepq_block() turnstile_block() rw_vector_enter() genfs_lock() VOP_LOCK() vn_lock() namei_tryemulroot() namei() check_exec() execve_loadvm() execve1() syscall() These look quite

Re: recurring tstile hangs on -current

2019-06-28 Thread Frank Kardel
Hi Thomas, glad that this is observed elsewhere. Maybe the following bugs could resonate with your observations: kern/54207 [serious/high]: -current locks up solidly when pkgsrc building adapta-gtk-theme-3.95.0.11 looks like locking issue in layerfs* (nullfs). (AMD 1800X, 64GB)