Re: netbsd-8 hang on tstile
On Wed, Mar 07, 2018 at 10:20:45AM +0100, J. Hannken-Illjes wrote:
> > On 6. Mar 2018, at 23:33, Manuel Bouyer wrote:
> >
> > on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
> > Hung processes show the same pattern, they sleep in fstrans_start():
> >
> > This is reproducible, restarting my automatic test script hangs the
> > same way. This is plain ffs, no wapbl.
> >
> > Any idea ?
>
> Please enter DDB and "call fstrans_dump(0)" to see which thread blocks
> the transition (it will have "... shared N ..." with N > 0).

a pbulk build stalled out on a NetBSD-7 VM and I remembered this. :)

db{0}> call fstrans_dump(0)
Fstrans locks by lwp:
11537.1 (/) shared 2 cow 0
Fstrans state by mount: 0
db{0}> ps
PIDLID S CPU FLAGS STRUCT LWP * NAME WAIT
[...]
115371 3 0 0 a00069efea20 find tstile
[... and a whole bunch of processes stuck in tstile ...]

-- 
Aaron J. Grier | "Not your ordinary poofy goof." | agr...@poofygoof.com
"The price of reliability is the pursuit of the utmost simplicity. It is
a price which the very rich find most hard to pay." -- Tony Hoare
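
Editorial aside, not part of the original mail: when a saved console log contains many "Fstrans locks by lwp" entries, the check suggested above (look for "shared N" with N > 0) can be scripted. The sketch below is only a reading aid. The entry layout is assumed from the single sample line in this message ("11537.1 (/) shared 2 cow 0"), and the names ENTRY and blocking_lwps are hypothetical helpers, not anything provided by NetBSD or DDB.

```python
import re

# One per-LWP entry, modelled on the sample "11537.1 (/) shared 2 cow 0".
# ASSUMPTION: the layout is inferred from that single line of output.
ENTRY = re.compile(r"(\d+)\.(\d+)\s+\(([^)]*)\)\s+shared\s+(\d+)\s+cow\s+(\d+)")


def blocking_lwps(dump_text):
    """Return (pid, lid, mountpoint, shared) for entries with shared > 0."""
    found = []
    for match in ENTRY.finditer(dump_text):
        pid, lid, mount, shared, _cow = match.groups()
        if int(shared) > 0:
            found.append((int(pid), int(lid), mount, int(shared)))
    return found


if __name__ == "__main__":
    sample = "Fstrans locks by lwp:\n11537.1 (/) shared 2 cow 0\n"
    for pid, lid, mount, shared in blocking_lwps(sample):
        print(f"pid {pid} lwp {lid}: shared {shared} on {mount}")
```

Each reported pid/lwp pair can then be looked up in the DDB ps listing to see what that thread is waiting on.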
Re: netbsd-8 hang on tstile
On Thu, Mar 8, 2018 at 5:04 PM, Manuel Bouyer wrote:
> On Thu, Mar 08, 2018 at 11:04:00AM +0900, Ryota Ozaki wrote:
>> On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer wrote:
>> > On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
>> >> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
>> >> > >[...]
>> >> > > This is reproducible, restarting my automatic test script hangs the
>> >> > > same way. This is plain ffs, no wapbl.
>> >> > >
>> >> > > Any idea ?
>> >> >
>> >> > pserialize_perform can get stuck if any softints get stuck. Can you
>> >> > check if such softints exist?
>> >>
>> >> I'll look the next time I can see this.
>> >
>> > I had a console log of a previous hang. No softint appears to be waiting
>> > in the ps/a output
>>
>> Thanks. Hm, does ps/a show softints (say softnet/0)?
>> I use just ps for the purpose.
>
> I used plain ps too. Not sure why I added this /a in my mail (probably
> related to tr/a :)

Okay, thanks :) So my concern probably proved unfounded.

ozaki-r
Re: netbsd-8 hang on tstile
On Thu, Mar 08, 2018 at 11:04:00AM +0900, Ryota Ozaki wrote:
> On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer wrote:
> > On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
> >> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
> >> > >[...]
> >> > > This is reproducible, restarting my automatic test script hangs the
> >> > > same way. This is plain ffs, no wapbl.
> >> > >
> >> > > Any idea ?
> >> >
> >> > pserialize_perform can get stuck if any softints get stuck. Can you check
> >> > if such softints exist?
> >>
> >> I'll look the next time I can see this.
> >
> > I had a console log of a previous hang. No softint appears to be waiting
> > in the ps/a output
>
> Thanks. Hm, does ps/a show softints (say softnet/0)?
> I use just ps for the purpose.

I used plain ps too. Not sure why I added this /a in my mail (probably
related to tr/a :)

-- 
Manuel Bouyer
NetBSD: 26 ans d'experience feront toujours la difference
--
Re: netbsd-8 hang on tstile
On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer wrote:
> On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
>> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
>> > >[...]
>> > > This is reproducible, restarting my automatic test script hangs the same
>> > > way. This is plain ffs, no wapbl.
>> > >
>> > > Any idea ?
>> >
>> > pserialize_perform can get stuck if any softints get stuck. Can you check
>> > if such softints exist?
>>
>> I'll look the next time I can see this.
>
> I had a console log of a previous hang. No softint appears to be waiting
> in the ps/a output

Thanks. Hm, does ps/a show softints (say softnet/0)?
I use just ps for the purpose.

ozaki-r
Re: netbsd-8 hang on tstile
On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
> > >[...]
> > > This is reproducible, restarting my automatic test script hangs the same
> > > way. This is plain ffs, no wapbl.
> > >
> > > Any idea ?
> >
> > pserialize_perform can get stuck if any softints get stuck. Can you check
> > if such softints exist?
>
> I'll look the next time I can see this.

I had a console log of a previous hang. No softint appears to be waiting
in the ps/a output

-- 
Manuel Bouyer
NetBSD: 26 ans d'experience feront toujours la difference
--
Re: netbsd-8 hang on tstile
> On 6. Mar 2018, at 23:33, Manuel Bouyer wrote:
>
> Hello
> on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
> Hung processes show the same pattern, they sleep in fstrans_start():
>
> This is reproducible, restarting my automatic test script hangs the same
> way. This is plain ffs, no wapbl.
>
> Any idea ?

Please enter DDB and "call fstrans_dump(0)" to see which thread blocks
the transition (it will have "... shared N ..." with N > 0).

-- 
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: netbsd-8 hang on tstile
On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
> >[...]
> > This is reproducible, restarting my automatic test script hangs the same
> > way. This is plain ffs, no wapbl.
> >
> > Any idea ?
>
> pserialize_perform can get stuck if any softints get stuck. Can you check
> if such softints exist?

I'll look the next time I can see this.

Interestingly, I started the automatic tests again before going to bed,
and as expected the system got stuck, I assume in the same way.
But it got out of this state about 4 hours later (if I can trust nagios).
This morning, the tests were still making progress.

-- 
Manuel Bouyer
NetBSD: 26 ans d'experience feront toujours la difference
--
Re: netbsd-8 hang on tstile
On Wed, Mar 7, 2018 at 7:33 AM, Manuel Bouyer wrote:
> Hello
> on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
> Hung processes show the same pattern, they sleep in fstrans_start():
> sleepq_block(0,0,c0596900,c0639b6c,c534e802,40,c03dbcfe,75,c5356340,c534e800) at netbsd:sleepq_block+0xe6
> turnstile_block(c59847f0,1,c078b6c0,c0639b6c,de8b9b70,c59847f0,c5db6020,c59a2008,0,c59a2008) at netbsd:turnstile_block+0x29d
> mutex_vector_enter(c078b6c0,c0cb8fe0,c055570c,de8b9b84,c05349d7,c575ed04,1,c59a2008,de8b9ba4,c0493ff8) at netbsd:mutex_vector_enter+0x28c
> fstrans_start(c59a2008,0,de8b9ba8,10,c055cf2c,c575ed04,20002,20002,c575ed04,0) at netbsd:fstrans_start+0x3b6
> VOP_LOCK(c575ed04,20002,22000,0,c03dc2e7,c0746e1a,de8b9d0c,20002,c575ed04,de8b9c88) at netbsd:VOP_LOCK+0x48
> vn_lock(c575ed04,20002,4,0,de8b9c14,c03af32d,de8b9edc,3,0,5) at netbsd:vn_lock+0x7f
> namei_tryemulroot(0,c012a96a,c532c240,de8b9ce8,c041f909,c532c2b4,de8b9d0c,de8b9d34,0,0) at netbsd:namei_tryemulroot+0x14f
> namei(de8b9d0c,1,3,de8b9d10,c042266a,0,c532c240,de8b9d20,c04211d6,0) at netbsd:namei+0x27
> check_exec(c5db6020,de8b9dc8,c5f558a8,de8b9da4,9cf21000,de8b9dac,c0114f1a,bad054d0,de8b9ddc,bf7fef58) at netbsd:check_exec+0x40
> execve_loadvm(bf7feab8,c03c95a0,de8b9dc8,c5a6f000,c6460c00,c700b008,404,0,0,0) at netbsd:execve_loadvm+0x233
> execve1(c5db6020,bf7fef58,bad054d0,bf7feab8,c03c95a0,de8b9f9c,c0113572,c5db6020,de8b9f68,de8b9f60) at netbsd:execve1+0x3c
> sys_execve(c5db6020,de8b9f68,de8b9f60,c63d8290,0,c0636ddc,de8b9f68,0,0,bf7fef58) at netbsd:sys_execve+0x31
> syscall() at netbsd:syscall+0x82
>
> I guess the culprit is:
> 2650 1 3 2 0 c5f92a80 python2.7 psrlz
> (an anita process, actually)
> sleepq_block(1,0,c059af17,c0639f80,c0640804,c5356340,c5358a80,c063f93c,6406c2,c059af17) at netbsd:sleepq_block+0x1cd
> kpause(c059af17,0,1,0,c5448590,c5cec008,de5e7b5c,c048355a,c5330a28,de5e7b6c) at netbsd:kpause+0xf2
> pserialize_perform(c5330a28,de5e7b6c,c0484a6f,c638e8c0,0,c055cf2c,c5cec008,504,0,de5e7b7c) at netbsd:pserialize_perform+0x10a
> fstrans_setstate(c5cec008,0,fffe,c0115914,c5cec008,c5cec008,de5e7b94,c047a2c0,c5cec008,2) at netbsd:fstrans_setstate+0x3a
> genfs_suspendctl(c5cec008,2,de5e7bb0,de5e7bd4,de5e7bb0,c0483fc4,c5cec008,2,de5e7bd4,de5e7bd4) at netbsd:genfs_suspendctl+0x3a
> VFS_SUSPENDCTL(c5cec008,2,de5e7bd4,de5e7bd4,504,de5e7be4,c0486fd6,c5cec008,504,0) at netbsd:VFS_SUSPENDCTL+0x20
> vfs_resume(c5cec008,504,0,de5e7bd4,c011599f,4,c5cec008,c5e978e4,c5e978e4,c5f92a80) at netbsd:vfs_resume+0x74
> vrevoke(c5e978e4,de5e7c14,c049370a,de5e7c04,0,0,c055d160,c5e978e4,1,0) at netbsd:vrevoke+0x96
> genfs_revoke(de5e7c04,0,0,c055d160,c5e978e4,1,0,de5e7cc4,c044dea9,c5e978e4) at netbsd:genfs_revoke+0x1a
> VOP_REVOKE(c5e978e4,1,c5353f00,504,0,74,c5e978e4,0,190,) at netbsd:VOP_REVOKE+0x4a
> pty_grant_slave(c5f92a80,504,0,c5cec008,0,10,c055cf2c,c5c271bc,20002,20002) at netbsd:pty_grant_slave+0xc9
> ptmioctl(a501,0,48087446,c6922008,3,c5f92a80,c5f92a80,48087446,c055c260,3) at netbsd:ptmioctl+0xdd
> cdev_ioctl(a501,0,48087446,c6922008,3,c5f92a80,a501,c5c271bc,c5c271bc,c5fd22c0) at netbsd:cdev_ioctl+0xd0
> spec_ioctl(de5e7da0,c0115914,c5df9bc0,c055d1f0,c5c271bc,48087446,c6922008,3,c5eac9c0,48087446) at netbsd:spec_ioctl+0x90
> VOP_IOCTL(c5c271bc,48087446,c6922008,3,c5eac9c0,c5e74380,c6fbe790,fffe,c0115914,0) at netbsd:VOP_IOCTL+0x3e
> vn_ioctl(c5fd22c0,48087446,c6922008,c5f8eb74,c5fd22c0,,,0,0,c6922008) at netbsd:vn_ioctl+0x9f
> sys_ioctl(c5f92a80,de5e7f68,de5e7f60,c63d8788,0,c0636d78,de5e7f68,0,0,7) at netbsd:sys_ioctl+0x10a
> syscall() at netbsd:syscall+0x82
>
> This is reproducible, restarting my automatic test script hangs the same
> way. This is plain ffs, no wapbl.
>
> Any idea ?

pserialize_perform can get stuck if any softints get stuck. Can you check
if such softints exist?

ozaki-r