Re: netbsd-8 hang on tstile

2019-08-31 Thread Aaron J. Grier
On Wed, Mar 07, 2018 at 10:20:45AM +0100, J. Hannken-Illjes wrote:
> > On 6. Mar 2018, at 23:33, Manuel Bouyer  wrote:
> > 
> > on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
> > Hung processes show the same pattern, they sleep in fstrans_start():
> 
> > 
> > This is reproducible: restarting my automatic test script hangs the
> > same way. This is plain ffs, no wapbl.
> > 
> > Any idea ?
> 
> Please enter DDB and "call fstrans_dump(0)" to see which thread blocks
> the transition (it will have "... shared N ..." with N > 0).

a pbulk build stalled out on a NetBSD-7 VM and I remembered this.  :)

db{0}> call fstrans_dump(0)
Fstrans locks by lwp:
11537.1  (/) shared 2 cow 0
Fstrans state by mount:
0
db{0}> ps
PID LID S CPU FLAGS   STRUCT LWP *   NAME WAIT
[...]
11537 1 3   0 0   a00069efea20   find tstile
[... and a whole bunch of processes stuck in tstile ...]
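
For longer dumps, the check for a "shared N" count with N > 0 can be
automated. The following is a minimal Python sketch, assuming the line shape
seen in the output above ("<pid>.<lid>  (<mount>) shared <N> cow <M>"). The
function name blocking_lwps is made up for illustration, and the dump format
may differ between NetBSD versions.

```python
import re

# Sample copied from the "Fstrans locks by lwp" section shown above.
SAMPLE = """\
Fstrans locks by lwp:
11537.1  (/) shared 2 cow 0
Fstrans state by mount:
"""

# Assumed line shape: "<pid>.<lid>  (<mount>) shared <N> cow <M>".
LOCK_RE = re.compile(
    r"^(\d+)\.(\d+)\s+\((.*?)\)\s+shared\s+(\d+)\s+cow\s+(\d+)"
)


def blocking_lwps(dump_text):
    """Return (pid, lid, mount, shared, cow) for every lwp with shared > 0."""
    holders = []
    for line in dump_text.splitlines():
        m = LOCK_RE.match(line.strip())
        if m is None:
            continue  # section headers and unrelated lines
        pid, lid, mount, shared, cow = m.groups()
        if int(shared) > 0:
            holders.append((int(pid), int(lid), mount, int(shared), int(cow)))
    return holders


if __name__ == "__main__":
    for pid, lid, mount, shared, cow in blocking_lwps(SAMPLE):
        print(f"lwp {pid}.{lid} holds {shared} shared reference(s) on {mount}")
```

The pid.lid pairs it reports can then be looked up in the ps listing to find
the matching process.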

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agr...@poofygoof.com
  "The price of reliability is the pursuit of the utmost simplicity.  It
   is a price which the very rich find most hard to pay."  -- Tony Hoare


Re: netbsd-8 hang on tstile

2018-03-08 Thread Ryota Ozaki
On Thu, Mar 8, 2018 at 5:04 PM, Manuel Bouyer  wrote:
> On Thu, Mar 08, 2018 at 11:04:00AM +0900, Ryota Ozaki wrote:
>> On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer  wrote:
>> > On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
>> >> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
>> >> > >[...]
>> >> > > This is reproducible: restarting my automatic test script hangs the same
>> >> > > way. This is plain ffs, no wapbl.
>> >> > >
>> >> > > Any idea ?
>> >> >
>> >> > pserialize_perform can get stuck if any softints get stuck. Can you check
>> >> > if such softints exist?
>> >>
>> >> I'll look the next time I can see this.
>> >
>> > I had a console log of a previous hang. No softint appears to be waiting
>> > in the ps/a output
>>
>> Thanks. Hm, does ps/a show softints (say softnet/0)?
>> I use just ps for the purpose.
>
> I used plain ps too. Not sure why I added this /a in my mail (probably
> related to tr/a :)

Okay, thanks :)

So my concern probably proved unfounded.

  ozaki-r


Re: netbsd-8 hang on tstile

2018-03-08 Thread Manuel Bouyer
On Thu, Mar 08, 2018 at 11:04:00AM +0900, Ryota Ozaki wrote:
> On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer  wrote:
> > On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
> >> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
> >> > >[...]
> >> > > This is reproducible: restarting my automatic test script hangs the same
> >> > > way. This is plain ffs, no wapbl.
> >> > >
> >> > > Any idea ?
> >> >
> >> > pserialize_perform can get stuck if any softints get stuck. Can you check
> >> > if such softints exist?
> >>
> >> I'll look the next time I can see this.
> >
> > I had a console log of a previous hang. No softint appears to be waiting
> > in the ps/a output
> 
> Thanks. Hm, does ps/a show softints (say softnet/0)?
> I use just ps for the purpose.

I used plain ps too. Not sure why I added this /a in my mail (probably
related to tr/a :)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: netbsd-8 hang on tstile

2018-03-07 Thread Ryota Ozaki
On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer  wrote:
> On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
>> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
>> > >[...]
>> > > This is reproducible: restarting my automatic test script hangs the same
>> > > way. This is plain ffs, no wapbl.
>> > >
>> > > Any idea ?
>> >
>> > pserialize_perform can get stuck if any softints get stuck. Can you check
>> > if such softints exist?
>>
>> I'll look the next time I can see this.
>
> I had a console log of a previous hang. No softint appears to be waiting
> in the ps/a output

Thanks. Hm, does ps/a show softints (say softnet/0)?
I use just ps for the purpose.

  ozaki-r


Re: netbsd-8 hang on tstile

2018-03-07 Thread Manuel Bouyer
On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote:
> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
> > >[...]
> > > This is reproducible: restarting my automatic test script hangs the same
> > > way. This is plain ffs, no wapbl.
> > >
> > > Any idea ?
> > 
> > pserialize_perform can get stuck if any softints get stuck. Can you check
> > if such softints exist?
> 
> I'll look the next time I can see this.

I had a console log of a previous hang. No softint appears to be waiting
in the ps/a output

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: netbsd-8 hang on tstile

2018-03-07 Thread J. Hannken-Illjes

> On 6. Mar 2018, at 23:33, Manuel Bouyer  wrote:
> 
> Hello
> on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
> Hung processes show the same pattern, they sleep in fstrans_start():

> 
> This is reproducible: restarting my automatic test script hangs the same
> way. This is plain ffs, no wapbl.
> 
> Any idea ?

Please enter DDB and "call fstrans_dump(0)" to see which thread blocks
the transition (it will have "... shared N ..." with N > 0).

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: netbsd-8 hang on tstile

2018-03-07 Thread Manuel Bouyer
On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote:
> >[...]
> > This is reproducible: restarting my automatic test script hangs the same
> > way. This is plain ffs, no wapbl.
> >
> > Any idea ?
> 
> pserialize_perform can get stuck if any softints get stuck. Can you check
> if such softints exist?

I'll look the next time I can see this.

Interestingly, I started the automatic tests again before going to bed,
and as expected the system got stuck, I assume in the same way. But it
got out of this state about 4 hours later (if I can trust nagios).
This morning, the tests were still making progress.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: netbsd-8 hang on tstile

2018-03-06 Thread Ryota Ozaki
On Wed, Mar 7, 2018 at 7:33 AM, Manuel Bouyer  wrote:
> Hello
> on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
> Hung processes show the same pattern, they sleep in fstrans_start():
> sleepq_block(0,0,c0596900,c0639b6c,c534e802,40,c03dbcfe,75,c5356340,c534e800) 
> at netbsd:sleepq_block+0xe6
> turnstile_block(c59847f0,1,c078b6c0,c0639b6c,de8b9b70,c59847f0,c5db6020,c59a2008,0,c59a2008)
>  at netbsd:turnstile_block+0x29d
> mutex_vector_enter(c078b6c0,c0cb8fe0,c055570c,de8b9b84,c05349d7,c575ed04,1,c59a2008,de8b9ba4,c0493ff8)
>  at netbsd:mutex_vector_enter+0x28c
> fstrans_start(c59a2008,0,de8b9ba8,10,c055cf2c,c575ed04,20002,20002,c575ed04,0)
>  at netbsd:fstrans_start+0x3b6
> VOP_LOCK(c575ed04,20002,22000,0,c03dc2e7,c0746e1a,de8b9d0c,20002,c575ed04,de8b9c88)
>  at netbsd:VOP_LOCK+0x48
> vn_lock(c575ed04,20002,4,0,de8b9c14,c03af32d,de8b9edc,3,0,5) at 
> netbsd:vn_lock+0x7f
> namei_tryemulroot(0,c012a96a,c532c240,de8b9ce8,c041f909,c532c2b4,de8b9d0c,de8b9d34,0,0)
>  at netbsd:namei_tryemulroot+0x14f
> namei(de8b9d0c,1,3,de8b9d10,c042266a,0,c532c240,de8b9d20,c04211d6,0) at 
> netbsd:namei+0x27
> check_exec(c5db6020,de8b9dc8,c5f558a8,de8b9da4,9cf21000,de8b9dac,c0114f1a,bad054d0,de8b9ddc,bf7fef58)
>  at netbsd:check_exec+0x40
> execve_loadvm(bf7feab8,c03c95a0,de8b9dc8,c5a6f000,c6460c00,c700b008,404,0,0,0)
>  at netbsd:execve_loadvm+0x233
> execve1(c5db6020,bf7fef58,bad054d0,bf7feab8,c03c95a0,de8b9f9c,c0113572,c5db6020,de8b9f68,de8b9f60)
>  at netbsd:execve1+0x3c
> sys_execve(c5db6020,de8b9f68,de8b9f60,c63d8290,0,c0636ddc,de8b9f68,0,0,bf7fef58)
>  at netbsd:sys_execve+0x31
> syscall() at netbsd:syscall+0x82
>
> I guess the culprit is:
> 2650 1 3   2 0   c5f92a80  python2.7 psrlz
> (an anita process, actually)
> sleepq_block(1,0,c059af17,c0639f80,c0640804,c5356340,c5358a80,c063f93c,6406c2,c059af17)
>  at netbsd:sleepq_block+0x1cd
> kpause(c059af17,0,1,0,c5448590,c5cec008,de5e7b5c,c048355a,c5330a28,de5e7b6c)
>  at netbsd:kpause+0xf2
> pserialize_perform(c5330a28,de5e7b6c,c0484a6f,c638e8c0,0,c055cf2c,c5cec008,504,0,de5e7b7c)
>  at netbsd:pserialize_perform+0x10a
> fstrans_setstate(c5cec008,0,fffe,c0115914,c5cec008,c5cec008,de5e7b94,c047a2c0,c5cec008,2)
>  at netbsd:fstrans_setstate+0x3a
> genfs_suspendctl(c5cec008,2,de5e7bb0,de5e7bd4,de5e7bb0,c0483fc4,c5cec008,2,de5e7bd4,de5e7bd4)
>  at netbsd:genfs_suspendctl+0x3a
> VFS_SUSPENDCTL(c5cec008,2,de5e7bd4,de5e7bd4,504,de5e7be4,c0486fd6,c5cec008,504,0)
>  at netbsd:VFS_SUSPENDCTL+0x20
> vfs_resume(c5cec008,504,0,de5e7bd4,c011599f,4,c5cec008,c5e978e4,c5e978e4,c5f92a80)
>  at netbsd:vfs_resume+0x74
> vrevoke(c5e978e4,de5e7c14,c049370a,de5e7c04,0,0,c055d160,c5e978e4,1,0) at 
> netbsd:vrevoke+0x96
> genfs_revoke(de5e7c04,0,0,c055d160,c5e978e4,1,0,de5e7cc4,c044dea9,c5e978e4) 
> at netbsd:genfs_revoke+0x1a
> VOP_REVOKE(c5e978e4,1,c5353f00,504,0,74,c5e978e4,0,190,) at 
> netbsd:VOP_REVOKE+0x4a
> pty_grant_slave(c5f92a80,504,0,c5cec008,0,10,c055cf2c,c5c271bc,20002,20002) 
> at netbsd:pty_grant_slave+0xc9
> ptmioctl(a501,0,48087446,c6922008,3,c5f92a80,c5f92a80,48087446,c055c260,3) at 
> netbsd:ptmioctl+0xdd
> cdev_ioctl(a501,0,48087446,c6922008,3,c5f92a80,a501,c5c271bc,c5c271bc,c5fd22c0)
>  at netbsd:cdev_ioctl+0xd0
> spec_ioctl(de5e7da0,c0115914,c5df9bc0,c055d1f0,c5c271bc,48087446,c6922008,3,c5eac9c0,48087446)
>  at netbsd:spec_ioctl+0x90
> VOP_IOCTL(c5c271bc,48087446,c6922008,3,c5eac9c0,c5e74380,c6fbe790,fffe,c0115914,0)
>  at netbsd:VOP_IOCTL+0x3e
> vn_ioctl(c5fd22c0,48087446,c6922008,c5f8eb74,c5fd22c0,,,0,0,c6922008)
>  at netbsd:vn_ioctl+0x9f
> sys_ioctl(c5f92a80,de5e7f68,de5e7f60,c63d8788,0,c0636d78,de5e7f68,0,0,7) at 
> netbsd:sys_ioctl+0x10a
> syscall() at netbsd:syscall+0x82
>
> This is reproducible: restarting my automatic test script hangs the same
> way. This is plain ffs, no wapbl.
>
> Any idea ?

pserialize_perform can get stuck if any softints get stuck. Can you check
if such softints exist?

  ozaki-r


netbsd-8 hang on tstile

2018-03-06 Thread Manuel Bouyer
Hello
on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile.
Hung processes show the same pattern, they sleep in fstrans_start():
sleepq_block(0,0,c0596900,c0639b6c,c534e802,40,c03dbcfe,75,c5356340,c534e800) 
at netbsd:sleepq_block+0xe6
turnstile_block(c59847f0,1,c078b6c0,c0639b6c,de8b9b70,c59847f0,c5db6020,c59a2008,0,c59a2008)
 at netbsd:turnstile_block+0x29d
mutex_vector_enter(c078b6c0,c0cb8fe0,c055570c,de8b9b84,c05349d7,c575ed04,1,c59a2008,de8b9ba4,c0493ff8)
 at netbsd:mutex_vector_enter+0x28c
fstrans_start(c59a2008,0,de8b9ba8,10,c055cf2c,c575ed04,20002,20002,c575ed04,0) 
at netbsd:fstrans_start+0x3b6
VOP_LOCK(c575ed04,20002,22000,0,c03dc2e7,c0746e1a,de8b9d0c,20002,c575ed04,de8b9c88)
 at netbsd:VOP_LOCK+0x48
vn_lock(c575ed04,20002,4,0,de8b9c14,c03af32d,de8b9edc,3,0,5) at 
netbsd:vn_lock+0x7f
namei_tryemulroot(0,c012a96a,c532c240,de8b9ce8,c041f909,c532c2b4,de8b9d0c,de8b9d34,0,0)
 at netbsd:namei_tryemulroot+0x14f
namei(de8b9d0c,1,3,de8b9d10,c042266a,0,c532c240,de8b9d20,c04211d6,0) at 
netbsd:namei+0x27
check_exec(c5db6020,de8b9dc8,c5f558a8,de8b9da4,9cf21000,de8b9dac,c0114f1a,bad054d0,de8b9ddc,bf7fef58)
 at netbsd:check_exec+0x40
execve_loadvm(bf7feab8,c03c95a0,de8b9dc8,c5a6f000,c6460c00,c700b008,404,0,0,0) 
at netbsd:execve_loadvm+0x233
execve1(c5db6020,bf7fef58,bad054d0,bf7feab8,c03c95a0,de8b9f9c,c0113572,c5db6020,de8b9f68,de8b9f60)
 at netbsd:execve1+0x3c
sys_execve(c5db6020,de8b9f68,de8b9f60,c63d8290,0,c0636ddc,de8b9f68,0,0,bf7fef58)
 at netbsd:sys_execve+0x31
syscall() at netbsd:syscall+0x82

I guess the culprit is:
2650 1 3   2 0   c5f92a80  python2.7 psrlz
(an anita process, actually)
sleepq_block(1,0,c059af17,c0639f80,c0640804,c5356340,c5358a80,c063f93c,6406c2,c059af17)
 at netbsd:sleepq_block+0x1cd
kpause(c059af17,0,1,0,c5448590,c5cec008,de5e7b5c,c048355a,c5330a28,de5e7b6c)
 at netbsd:kpause+0xf2
pserialize_perform(c5330a28,de5e7b6c,c0484a6f,c638e8c0,0,c055cf2c,c5cec008,504,0,de5e7b7c)
 at netbsd:pserialize_perform+0x10a
fstrans_setstate(c5cec008,0,fffe,c0115914,c5cec008,c5cec008,de5e7b94,c047a2c0,c5cec008,2)
 at netbsd:fstrans_setstate+0x3a
genfs_suspendctl(c5cec008,2,de5e7bb0,de5e7bd4,de5e7bb0,c0483fc4,c5cec008,2,de5e7bd4,de5e7bd4)
 at netbsd:genfs_suspendctl+0x3a
VFS_SUSPENDCTL(c5cec008,2,de5e7bd4,de5e7bd4,504,de5e7be4,c0486fd6,c5cec008,504,0)
 at netbsd:VFS_SUSPENDCTL+0x20
vfs_resume(c5cec008,504,0,de5e7bd4,c011599f,4,c5cec008,c5e978e4,c5e978e4,c5f92a80)
 at netbsd:vfs_resume+0x74
vrevoke(c5e978e4,de5e7c14,c049370a,de5e7c04,0,0,c055d160,c5e978e4,1,0) at 
netbsd:vrevoke+0x96
genfs_revoke(de5e7c04,0,0,c055d160,c5e978e4,1,0,de5e7cc4,c044dea9,c5e978e4) at 
netbsd:genfs_revoke+0x1a
VOP_REVOKE(c5e978e4,1,c5353f00,504,0,74,c5e978e4,0,190,) at 
netbsd:VOP_REVOKE+0x4a
pty_grant_slave(c5f92a80,504,0,c5cec008,0,10,c055cf2c,c5c271bc,20002,20002) at 
netbsd:pty_grant_slave+0xc9
ptmioctl(a501,0,48087446,c6922008,3,c5f92a80,c5f92a80,48087446,c055c260,3) at 
netbsd:ptmioctl+0xdd
cdev_ioctl(a501,0,48087446,c6922008,3,c5f92a80,a501,c5c271bc,c5c271bc,c5fd22c0) 
at netbsd:cdev_ioctl+0xd0
spec_ioctl(de5e7da0,c0115914,c5df9bc0,c055d1f0,c5c271bc,48087446,c6922008,3,c5eac9c0,48087446)
 at netbsd:spec_ioctl+0x90
VOP_IOCTL(c5c271bc,48087446,c6922008,3,c5eac9c0,c5e74380,c6fbe790,fffe,c0115914,0)
 at netbsd:VOP_IOCTL+0x3e
vn_ioctl(c5fd22c0,48087446,c6922008,c5f8eb74,c5fd22c0,,,0,0,c6922008)
 at netbsd:vn_ioctl+0x9f
sys_ioctl(c5f92a80,de5e7f68,de5e7f60,c63d8788,0,c0636d78,de5e7f68,0,0,7) at 
netbsd:sys_ioctl+0x10a
syscall() at netbsd:syscall+0x82
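
To compare traces like the two above frame by frame, the symbol names can be
pulled out of the "at netbsd:<function>+0x<offset>" markers. This is only a
rough Python sketch: the helper name frames is hypothetical, and it assumes
the function name itself is not split across lines by mail wrapping.

```python
import re

# Two frames copied verbatim from the second trace above.
SAMPLE = """\
pserialize_perform(c5330a28,de5e7b6c,c0484a6f,c638e8c0,0,c055cf2c,c5cec008,504,0,de5e7b7c)
 at netbsd:pserialize_perform+0x10a
fstrans_setstate(c5cec008,0,fffe,c0115914,c5cec008,c5cec008,de5e7b94,c047a2c0,c5cec008,2)
 at netbsd:fstrans_setstate+0x3a
"""

# ")" closes the argument list; "\s+" tolerates the line wrap before or after "at".
FRAME_RE = re.compile(r"\)\s+at\s+netbsd:(\w+)\+0x([0-9a-f]+)\b")


def frames(trace_text):
    """Return (function, offset) pairs in the order they appear in the trace."""
    return [(name, int(off, 16)) for name, off in FRAME_RE.findall(trace_text)]


if __name__ == "__main__":
    for name, off in frames(SAMPLE):
        print(f"{name}+{off:#x}")
```

Reading only the function column makes the two chains easier to set side by
side: one thread waits in fstrans_start(), the other sits in
pserialize_perform() under fstrans_setstate().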

This is reproducible: restarting my automatic test script hangs the same
way. This is plain ffs, no wapbl.

Any idea ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--