Re: r244036 kernel hangs under load.
Konstantin Belousov wrote:
On Wed, Dec 12, 2012 at 10:01:39PM -0500, Rick Macklem wrote:
Konstantin Belousov wrote:
On Tue, Dec 11, 2012 at 08:58:47PM -0500, Rick Macklem wrote:

Ok, I'll test r243598 and then r243599 and r243835, just to see if it really is this. I'll email when I have done this.

If you test only r243598, I am sure that you would experience corruption. The r243599 should cause the deadlocks.

I think you meant r243599 will result in corruptions and r243835 deadlocks. I have run r243598 for a while without a hang. (r243599 doesn't build.) I haven't tried r243835 yet.

Also, do you use the post-r244095 kernel?

Before and after. The most recent tests were post-r244095. (If anything, the more recent kernels hang more easily.)

Is your machine SMP?

Old, slow single-core i386.

Try this. Please note that this is mostly a debugging facility.

It seemed to help, but didn't stop the hangs completely. r244125 without the patch would hang somewhere in a kernel build. r244125 plus this patch ran almost 2 kernel builds before it hung.

Can you try to look into the system state on the hang (on the kernel with the "if (1 ||" patch applied), using the ddb recipe from the web page? Basically, watch for a thread looping in the mnt_active iterator and for threads owning vnode interlocks.

Ok, there is only one process in the mnt_active iterator and its trace is as follows (syncer):

    dle+0x12d/frame 0xdfe33adc   (I suspect the screen lost an 'I')
    intr_execute_handlers(c5e3d064,dfe33b20,0,dfe33b64,c0ec2115,...) at intr_execute_handlers+0x49/frame 0xdfe33afc
    lapic_handle_intr(3c,dfe33b20) at lapic_handle_intr+0x36/frame 0xdfe33b10
    Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe33b10
    --- interrupt, eip = 0xc0eca8db, esp = 0xdfe33b60, ebp = 0xdfe33b64 ---
    spinlock_exit(c128be90,4,c10b5017,130,1710,...) at spinlock_exit+0x2b/frame 0xdfe33b64
    __mtx_unlock_spin_flags(c128be90,0,c10b80be,25d,0,...) at __mtx_unlock_spin_flags+0x112/frame 0xdfe33b90
    kern_yield(,0,c10c75c9,127b,c8b05238,...) at kern_yield+0x125/frame 0xdfe33bbc
    __mnt_vnode_next_active(dfe33c08,c857ba80,c10c75c9,dac,5d7,...) at __mnt_vnode_next_active+0xda/frame 0xdfe33be0
    vfs_msync(c857ba80,2,2,e6b,c857ba80,...) at vfs_msync+0x175/frame 0xdfe33c18
    sync_fsync(dfe33ca8,c85cf470,80400,c10c75c9,6f4,...) at sync_fsync+0x141/frame 0xdfe33c64
    VOP_FSYNC_APV(c12008a0,dfe33ca8,c10c75c9,6f4,4e20,...) at VOP_FSYNC_APV+0xb4/frame 0xdfe33c64
    sched_sync(0,dfe33d08,c10b0e10,3db,c85395a0,...) at sched_sync+0x399/frame 0xdfe33cc8
    fork_exit(c0b79420,0,dfe33d08) at fork_exit+0xc0/frame 0xdfe33cf4
    fork_trampoline() at fork_trampoline+0x8/frame 0xdfe33cf4
    --- trap 0, eip = 0, esp = 0xdfe33d40, ebp = 0 ---

This process holds:
exclusive lockmgr syncer (syncer) r = 0 (0xc85cf4c8) locked @ kern/vfs_subr.c:1780

The only other process that is doing anything in the VFS subsystem holds the vnode interlock. Its trace is:

    dle+0x12d/frame 0xdfe6a850
    intr_execute_handlers(c5f721c0,dfe6a894,0,dfe6a908,c0ec2115,...) at intr_execute_handlers+0x49/frame 0xdfe6a870
    lapic_handle_intr(31,dfe6a894) at lapic_handle_intr+0x36/frame 0xdfe6a884
    Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe6a884
    --- interrupt, eip = 0xc0b2206a, esp = 0xdfe6a8d4, ebp = 0xdfe6a908 ---
    witness_unlock(c8972a74,8,c10c75c9,965,0,...) at witness_unlock+0x3a/frame 0xdfe6a908
    __mtx_unlock_flags(c8972a84,0,c10c75c9,965,c89729fc,...) at __mtx_unlock_flags+0x9f/frame 0xdfe6a938
    vdropl(c89729fc,dfe6a974,c10c75c9,8e7,c1238020,...) at vdropl+0x63/frame 0xdfe6a95c
    vputx(dfe6aa04,c0b67acc,c89729fc,dfe6a9e4,dfe6abc4,...) at vputx+0x300/frame 0xdfe6a994
    vput(c89729fc,dfe6a9e4,dfe6abc4,31d,dfe6a9e4,...) at vput+0x10/frame 0xdfe6a99c
    lookup(dfe6ab84,c857e000,0,ce,c13c83c8,...) at lookup+0x9bc/frame 0xdfe6aa04
    namei(dfe6ab84,0,0,246,0,...) at namei+0x4fe/frame 0xdfe6aa80
    vn_open_cred(dfe6ab84,dfe6ac24,1a4,0,c5dd4580,...) at vn_open_cred+0x2c0/frame 0xdfe6ab40
    vn_open(dfe6ab84,dfe6ac24,1a4,c85922a0,c853a2d0,...) at vn_open+0x3b/frame 0xdfe6ab60
    kern_openat(c85c55e0,ff9c,2882dcc0,0,8001,...) at kern_openat+0x1e2/frame 0xdfe6ac0c
    kern_open(c85c55e0,2882dcc0,0,8000,1b6,...) at kern_open+0x35/frame 0xdfe6ac2c
    sys_open(c85c55e0,dfe6accc,c02acde7,7307f55d,5e5b00,...) at sys_open+0x30/frame 0xdfe6ac48
    syscall(dfe6ad08) at syscall+0x2e5/frame 0xdfe6acfc
    Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xdfe6acfc
    --- syscall (5, FreeBSD ELF32, sys_open), eip = 0x84a1667, esp = 0xbfbfcffc, ebp = 0xbfbfd018 ---

The locks this process holds are:
exclusive sleep mutex vnode interlock (vnode interlock) r = 0 (0xc8972a74) locked @ kern/vfs_subr.c:2513
shared lockmgr ufs (ufs) r = 0 (0xc8bd181c) locked @ kern/vfs_subr.c:2161
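The two traces above are the whole livelock: the syncer spins in the active-vnode iterator trying to take a vnode interlock that the open()ing thread owns, and on a single-CPU machine the owner only runs again if the spinner genuinely yields. A simplified sketch of the pre-patch iterator loop (paraphrased from the removed lines of sys/kern/vfs_subr.c shown in the patch later in this thread, not the literal kernel code) shows the trap:

    /*
     * Sketch of the old __mnt_vnode_next_active() inner loop.
     * VI_TRYLOCK() never sleeps; it fails immediately when another
     * thread owns the vnode interlock.
     */
    while (vp != NULL) {
            if (!VI_TRYLOCK(vp)) {
                    if (should_yield()) {
                            /* Give the interlock owner a chance to run. */
                            mtx_unlock(&vnode_free_list_mtx);
                            kern_yield(PRI_UNCHANGED);
                            mtx_lock(&vnode_free_list_mtx);
                    }
                    /*
                     * Retry.  On a uniprocessor, if should_yield() stays
                     * false (or the yield does not actually deschedule
                     * us), the interlock owner never runs, the trylock
                     * never succeeds, and this thread spins here forever.
                     */
                    goto restart;
            }
            /* ... interlock acquired, examine vp ... */
    }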
Re: r244036 kernel hangs under load.
On Thu, Dec 13, 2012 at 10:14:29PM -0500, Rick Macklem wrote:

Good work. This patch seems to have done the trick. I've run quite a few kernel build cycles without a hang. I'll keep running them, but I would have expected to see a hang by now. Maybe Tim can test the patch as well? (I needed to add #include <sys/smp.h>, btw.)

Below is the current version of the patch; it is being tested by Peter Holm. Compared to what I sent you earlier, it contains a bug fix only relevant if you use quotas.

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 67e078d..64d75fb 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -69,6 +69,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/reboot.h>
 #include <sys/sched.h>
 #include <sys/sleepqueue.h>
+#include <sys/smp.h>
 #include <sys/stat.h>
 #include <sys/sysctl.h>
 #include <sys/syslog.h>
@@ -4710,32 +4711,54 @@ __mnt_vnode_markerfree_all(struct vnode **mvp, struct mount *mp)
  * These are helper functions for filesystems to traverse their
  * active vnodes.  See MNT_VNODE_FOREACH_ACTIVE() in sys/mount.h
  */
-struct vnode *
-__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
+static void
+mnt_vnode_markerfree_active(struct vnode **mvp, struct mount *mp)
+{
+
+	KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
+
+	MNT_ILOCK(mp);
+	MNT_REL(mp);
+	MNT_IUNLOCK(mp);
+	free(*mvp, M_VNODE_MARKER);
+	*mvp = NULL;
+}
+
+#ifdef SMP
+#define	ALWAYS_YIELD	(mp_ncpus == 1)
+#else
+#define	ALWAYS_YIELD	1
+#endif
+
+static struct vnode *
+mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
 {
 	struct vnode *vp, *nvp;
 
-	if (should_yield())
-		kern_yield(PRI_UNCHANGED);
-	mtx_lock(&vnode_free_list_mtx);
-restart:
+	mtx_assert(&vnode_free_list_mtx, MA_OWNED);
 	KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
+restart:
 	vp = TAILQ_NEXT(*mvp, v_actfreelist);
+	TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
 	while (vp != NULL) {
 		if (vp->v_type == VMARKER) {
 			vp = TAILQ_NEXT(vp, v_actfreelist);
 			continue;
 		}
 		if (!VI_TRYLOCK(vp)) {
-			if (should_yield()) {
+			if (ALWAYS_YIELD || should_yield()) {
+				TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
 				mtx_unlock(&vnode_free_list_mtx);
-				kern_yield(PRI_UNCHANGED);
+				kern_yield(PRI_USER);
 				mtx_lock(&vnode_free_list_mtx);
+				goto restart;
 			}
-			goto restart;
+			continue;
 		}
-		if (vp->v_mount == mp && vp->v_type != VMARKER &&
-		    (vp->v_iflag & VI_DOOMED) == 0)
+		KASSERT(vp->v_type != VMARKER, ("locked marker %p", vp));
+		KASSERT(vp->v_mount == mp || vp->v_mount == NULL,
+		    ("alien vnode on the active list %p %p", vp, mp));
+		if (vp->v_mount == mp && (vp->v_iflag & VI_DOOMED) == 0)
 			break;
 		nvp = TAILQ_NEXT(vp, v_actfreelist);
 		VI_UNLOCK(vp);
@@ -4745,22 +4768,31 @@ restart:
 	/* Check if we are done */
 	if (vp == NULL) {
 		mtx_unlock(&vnode_free_list_mtx);
-		__mnt_vnode_markerfree_active(mvp, mp);
-		mtx_assert(MNT_MTX(mp), MA_NOTOWNED);
+		mnt_vnode_markerfree_active(mvp, mp);
 		return (NULL);
 	}
-	TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
 	TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, v_actfreelist);
 	mtx_unlock(&vnode_free_list_mtx);
 	ASSERT_VI_LOCKED(vp, "active iter");
 	KASSERT((vp->v_iflag & VI_ACTIVE) != 0, ("Non-active vp %p", vp));
 	return (vp);
 }
+#undef ALWAYS_YIELD
+
+struct vnode *
+__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
+{
+
+	if (should_yield())
+		kern_yield(PRI_UNCHANGED);
+	mtx_lock(&vnode_free_list_mtx);
+	return (mnt_vnode_next_active(mvp, mp));
+}
 
 struct vnode *
 __mnt_vnode_first_active(struct vnode **mvp, struct mount *mp)
 {
-	struct vnode *vp, *nvp;
+	struct vnode *vp;
 
 	*mvp = malloc(sizeof(struct vnode), M_VNODE_MARKER,
 	    M_WAITOK | M_ZERO);
 	MNT_ILOCK(mp);
@@ -4770,44 +4802,14 @@ __mnt_vnode_first_active(struct vnode **mvp, struct mount *mp)
 	(*mvp)->v_mount = mp;
 	mtx_lock(&vnode_free_list_mtx);
-restart:
 	vp = TAILQ_FIRST(&mp->mnt_activevnodelist);
-	while (vp != NULL) {
-		if (vp->v_type == VMARKER) {
-			vp = TAILQ_NEXT(vp, v_actfreelist);
-			continue;
-		}
-		if (!VI_TRYLOCK(vp)) {
-
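For context, the functions being reworked here sit underneath the MNT_VNODE_FOREACH_ACTIVE() macro in sys/mount.h that the comment block mentions; vfs_msync(), visible in the syncer's backtrace earlier in the thread, is one of its callers. A rough sketch of the calling convention (the loop body and the skip_this_vnode() predicate are illustrative, not actual vfs_msync() code):

    struct vnode *vp, *mvp;	/* mvp: marker vnode managed by the iterator */

    MNT_VNODE_FOREACH_ACTIVE(vp, mp, mvp) {
            /* Each vp comes back with its vnode interlock held. */
            if (skip_this_vnode(vp)) {	/* illustrative predicate */
                    VI_UNLOCK(vp);
                    continue;
            }
            /* ... per-vnode work; drop the interlock when done ... */
            VI_UNLOCK(vp);
    }

The marker vnode is what lets the iterator drop vnode_free_list_mtx (to yield, as above) and later resume at the right place in the per-mount active list.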
Re: r244036 kernel hangs under load.
Konstantin Belousov wrote:
On Tue, Dec 11, 2012 at 08:58:47PM -0500, Rick Macklem wrote:

[... same exchange about r243598/r243599/r243835, the post-r244095 kernel, the single-core i386, and the debugging patch, quoted in full in the first message above ...]

Can you try to look into the system state on the hang (on the kernel with the "if (1 ||" patch applied), using the ddb recipe from the web page? Basically, watch for a thread looping in the mnt_active iterator and for threads owning vnode interlocks.

Ok, there is only one process in the mnt_active iterator and its trace is as follows (syncer): [... same syncer trace through vfs_msync() / __mnt_vnode_next_active() / kern_yield() as shown above ...]

This process holds:
exclusive lockmgr syncer (syncer) r = 0 (0xc85cf4c8) locked @ kern/vfs_subr.c:1780

The only other process that is doing anything in the VFS subsystem holds the vnode interlock. Its trace is: [... same open() syscall trace through namei()/lookup()/vput()/vdropl() as shown above ...]

The locks this process holds are:
exclusive sleep mutex vnode interlock (vnode interlock) r = 0 (0xc8972a74) locked @ kern/vfs_subr.c:2513
shared lockmgr ufs (ufs) r = 0 (0xc8bd181c) locked @ kern/vfs_subr.c:2161

The only other lock held by any thread/process is:
Process 12 (intr) thread 0xc5dfc5e0 (100012)
exclusive sleep mutex Giant (Giant) r = 1 (0xc127b690) locked @ dev/syscons/syscons.c:724

The only 2 locked vnodes are the ufs one and the syncer one, as shown above. The rest of the processes/threads
Re: r244036 kernel hangs under load.
On Wed, Dec 12, 2012 at 10:01:39PM -0500, Rick Macklem wrote:

[... quotes the same exchange and the same syncer / vnode-interlock traces and lock lists reproduced in full in the messages above ...]
Re: r244036 kernel hangs under load.
Konstantin Belousov wrote:
On Mon, Dec 10, 2012 at 07:11:59PM -0500, Rick Macklem wrote:

[... quoted context trimmed; the db> ps output, the handbook link, and the show alllocks findings appear in full in the messages below ...]

There must be a thread blocked for the mount interlock for the loop in the mnt_vnode_next_active to cause livelock.

Yes. I am getting hangs with the -current kernel and they seem easier for me to reproduce. For the one I just did, the syncer seems to be blocked at VI_TRYLOCK() in _mnt_vnode_next_active(). The vnode interlock mutex is exclusively locked by a sh process (11627). Now, here is where it gets weird...

When I do a db> trace 11627, I get the following:

    witness_unlock+0x1f3 (subr_witness.c:1563)
    mtx_unlock_flags+0x9f (kern_mutex.c:250)
    vdropl+0x63 (vfs_subr.c:2405)
    vputx+0x130 (vfs_subr.c:2116)
    vput+0x10 (vfs_subr.c:2319)
    vm_mmap+0x52e (vm_mmap.c:1341)
    sys_mmap

So, it seems this process is stuck while trying to unlock the mutex, if that makes any sense... When I look at subr_witness.c around line 1563, I can't see why anything would block there? (Here's the code snippet:)

	if ((instance->li_flags & LI_EXCLUSIVE) == 0 && witness_watch > 0 &&
	    (flags & LOP_EXCLUSIVE) != 0) {
		printf("exclusive unlock of (%s) %s @ %s:%d\n", class->lc_name,
		    lock->lo_name, fixup_filename(file), line);
		printf("while share locked from %s:%d\n",
		    fixup_filename(instance->li_file), instance->li_line);
		panic("share->uexcl");
	}
	/* If we are recursed, unrecurse. */
*1563*	if ((instance->li_flags & LI_RECURSEMASK) > 0) {
		CTR4(KTR_WITNESS,
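Why the trace lands in witness_unlock() at all: with WITNESS compiled in, the unlock path does its per-thread lock-list bookkeeping before the mutex is actually released, so a thread sampled at this point still owns the lock. A simplified sketch of the shape of mtx_unlock_flags() in kern_mutex.c (not the literal code of that era):

    /* Bookkeeping happens first, while the lock is still held. */
    WITNESS_UNLOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line);
    LOCK_LOG_LOCK("UNLOCK", &m->lock_object, opts, m->mtx_recurse,
        file, line);
    __mtx_unlock(m, curthread, opts, file, line);	/* actual release */

This is consistent with the lock listings above showing the sh process as the interlock owner even though its program counter is inside the unlock path.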
Re: r244036 kernel hangs under load.
On Tue, Dec 11, 2012 at 9:55 PM, Rick Macklem rmack...@uoguelph.ca wrote:

[... quoted context trimmed; see Rick's message above ...]

Yes. I am getting hangs with the -current kernel and they seem easier for me to reproduce.

Can you report the svn rev number the kernel is built from?

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: r244036 kernel hangs under load.
On Tue, Dec 11, 2012 at 04:55:52PM -0500, Rick Macklem wrote:

[... quoted context trimmed; see Rick's message above ...]

Yes. I am getting hangs with the -current kernel and they seem easier for me to reproduce. For the one I just did, the syncer seems to be blocked at VI_TRYLOCK() in _mnt_vnode_next_active().

trylock cannot block.

The vnode interlock mutex is exclusively locked by a sh process (11627). Now, here is where it gets weird... When I do a db> trace 11627 I get the following: [... same trace of pid 11627 as shown above ...] So, it seems this process is stuck while trying to unlock the mutex, if that makes any sense...

It is probably not stuck; you just caught it at that moment. The issue sounds more like a livelock. Can you obtain _all_ the information listed in the deadlock debugging page I sent earlier, and provide it to me? Also, do you use the post-r244095 kernel? Is your machine SMP?
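A side note on the "trylock cannot block" point, for readers following the thread: a try-lock returns immediately instead of sleeping, so a thread can only spin around a failed VI_TRYLOCK(), never block inside it. A minimal sketch of the semantics (illustrative, not the actual sys/vnode.h implementation):

    if (VI_TRYLOCK(vp)) {
            /* Nonzero return: we now own the vnode interlock. */
            /* ... use vp ... */
            VI_UNLOCK(vp);
    } else {
            /*
             * Zero return: the interlock was busy.  The call came back
             * immediately; the caller decides whether to retry, yield,
             * or skip this vnode.
             */
    }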
Re: r244036 kernel hangs under load.
Konstantin Belousov wrote:
On Tue, Dec 11, 2012 at 04:55:52PM -0500, Rick Macklem wrote:

[... quoted context trimmed; see the exchange above ...]

It is probably not stuck; you just caught it at that moment. The issue sounds more like a livelock. Can you obtain _all_ the information listed in the deadlock debugging page I sent earlier, and provide it to me?

Well, this is a laptop, and when it hangs (doesn't do anything, except sometimes echo characters on the console screen) I Ctrl-Alt-Esc to get
Re: r244036 kernel hangs under load.
On Tue, Dec 11, 2012 at 05:30:24PM -0500, Rick Macklem wrote:

[... quotes the same exchange shown in the messages above ...]
Re: r244036 kernel hangs under load.
Konstantin Belousov wrote:
On Tue, Dec 11, 2012 at 05:30:24PM -0500, Rick Macklem wrote:

[... quotes the same exchange shown in the messages above ...]
Re: r244036 kernel hangs under load.
On Tue, Dec 11, 2012 at 08:58:47PM -0500, Rick Macklem wrote:

[... the r243598/r243599/r243835 exchange and the debugging-patch results are quoted here in full; see the first message above ...]

It seemed to help, but didn't stop the hangs completely. r244125 without the patch would hang somewhere in a kernel build. r244125 plus this patch ran almost 2 kernel builds before it hung.

Can you try to look into the system state on the hang (on the kernel with the "if (1 ||" patch applied), using the ddb recipe from the web page? Basically, watch for a thread looping in the mnt_active iterator and for threads owning vnode interlocks.
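The "recipe from the web page" is the deadlock-debugging chapter of the developers' handbook linked earlier in the thread. For a hang like this one, the DDB commands involved are roughly the following; the PID is illustrative, and all the command names are standard DDB:

    db> ps                  <- thread states and wait channels
    db> show alllocks       <- every held lock, with owner and file:line (needs WITNESS)
    db> show lockedvnods    <- locked vnodes and which threads own them
    db> trace 11627         <- backtrace of one pid (e.g. the interlock owner)
    db> alltrace            <- backtraces of all threads

That combination is what produced the "this process holds ... locked @ kern/vfs_subr.c:..." listings seen elsewhere in this thread.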
Re: r244036 kernel hangs under load.
.. what was the previous kernel version?

adrian

On 9 December 2012 22:08, Tim Kientzle kient...@freebsd.org wrote:

I haven't found any useful clues yet, but thought I'd ask if anyone else was seeing hangs in a recent kernel.

I just upgraded to r244036 using a straight GENERIC i386 kernel. (Straight buildworld/buildkernel, no local changes, /etc/src.conf doesn't exist, /etc/make.conf just has PERL_VERSION defined.)

When I try to cross-build an ARM world on the resulting system, the entire system hangs hard after about 30 minutes: no network, no keyboard response, no nothing.

Don't know if it's relevant, but the system is using NFS pretty heavily (Parallels VM mounting NFS from a Mac OS 10.7 host).

I'll try to get some more details ...

Tim
Re: r244036 kernel hangs under load.
Adrian Chadd wrote:
.. what was the previous kernel version?

Hopefully Tim has it narrowed down more, but I don't see the hangs on a Sept. 7 kernel from head and I do see them on a Dec. 3 kernel from head. (Don't know the exact rNN.) It seems to predate my commit (r244008), which was my first concern.

I use old single-core i386 hardware and can fairly reliably reproduce it by doing a kernel build and a svn checkout concurrently. No NFS activity. These are running on a local disk (UFS/FFS). (The kernel I reproduce it on is built via GENERIC for i386.) If you want me to start a binary search for which rNN, I can do that, but it will take a while. :-)

I can get out into DDB, but I'll admit I don't know enough about it to know where to look. ;-) Here are some lines from db> ps, in case they give someone useful information. (I can leave this box sitting in DDB for the rest of to-day, in case someone can suggest what I should look for on it.) Just snippets...

    Ss   pause     adjkerntz
    DL   sdflush   [sofdepflush]
    RL             [syncer]
    DL   vlruwt    [vnlru]
    DL   psleep    [bufdaemon]
    RL             [pagezero]
    DL   psleep    [vmdaemon]
    DL   psleep    [pagedaemon]
    DL   ccb_scan  [xpt_thrd]
    DL   waiting_  [sctp_iterator]
    DL   ctl_work  [ctl_thrd]
    DL   cooling   [acpi_cooling0]
    DL   tzpoll    [acpi_thermal]
    DL   (threaded)  [usb]
    ...
    DL   -         [yarrow]
    DL   (threaded)  [geom]
    D    -         [g_down]
    D    -         [g_up]
    D    -         [g_event]
    RL   (threaded)  [intr]
    I              [irq15: ata1]
    ...
    Run  CPU0      [swi6: Giant taskq]  <-- does this one indicate the CPU is actually running this? (After a db> cont, waiting a while, then Ctrl-Alt-Esc and db> ps, it is still the same.)
    I              [swi4: clock]
    I              [swi1: netisr 0]
    I              [swi3: vm]
    RL             [idle: cpu0]
    SLs  wait      [init]
    DL   audit_wo  [audit]
    DLs  (threaded)  [kernel]
    D    -         [deadlkres]
    ...
    D    sched     [swapper]

I have no idea if this ps output helps, unless it indicates that it is looping on the Giant taskq? As I said, I can leave it in DDB for to-day, if anyone wants me to do anything in the debugger, and I can probably reproduce it if someone wants stuff tried later.

rick

On 9 December 2012 22:08, Tim Kientzle kient...@freebsd.org wrote:
[... Tim's original report, quoted in full in the message above ...]
Re: r244036 kernel hangs under load.
On Mon, Dec 10, 2012 at 01:38:21PM -0500, Rick Macklem wrote:

[... quoted context trimmed; Rick's message with the db> ps output is directly above ...]

I have no idea if this ps output helps, unless it indicates that it is looping on the Giant taskq?

Might be. You could do 'bt pid' for the process to see where it loops. Another good set of hints is at
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
Re: r244036 kernel hangs under load.
Konstantin Belousov wrote: Might be. You could do 'bt pid' for the process to see where it loops. Another good set of hints is at http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Kostik, you must be clairvoyant. ;-) When I did show alllocks, I found that the syncer process held:

- exclusive sleep mutex mount mtx locked @ kern/vfs_subr.c:4720
- exclusive lockmgr syncer locked @ kern/vfs_subr.c:1780

The trace for this process goes like: spinlock_exit -> mtx_unlock_spin_flags -> kern_yield -> __mnt_vnode_next_active -> vfs_msync()

So, it seems like your r244095 commit might have fixed this? (I'm not good at this stuff, but from your description, it looks like it did the kern_yield() with the mutex held and maybe got into trouble trying to acquire Giant?) Anyhow, I'm going to test a kernel with r244095 in it and see if I can still reproduce the hang. (There wasn't much else in the show alllocks output, except a process that held the exclusive vnode interlock mutex plus a ufs vnode lock, but it's just doing a witness_unlock.)

I'll email if/when I know more, rick

ps: Fingers/toes crossed that you've already fixed it.
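[For readers without vfs_subr.c in front of them, here is a heavily simplified sketch of the iterator pattern that trace points at. The names are modelled on the real __mnt_vnode_next_active(), but this is an illustration of the suspected bug and of a fix along the lines Rick describes, not the committed code:

    /*
     * Simplified sketch of the active-vnode iterator discussed above.
     * It walks the mount's active-vnode list with the mount interlock
     * held and try-locks each vnode's interlock; *mvp is a marker
     * vnode recording the scan position.
     */
    static struct vnode *
    mnt_vnode_next_active_sketch(struct vnode **mvp, struct mount *mp)
    {
            struct vnode *vp;

            mtx_assert(MNT_MTX(mp), MA_OWNED);
            vp = TAILQ_NEXT(*mvp, v_actfreelist);
            while (vp != NULL) {
                    if (vp->v_type == VMARKER) {    /* skip other iterators */
                            vp = TAILQ_NEXT(vp, v_actfreelist);
                            continue;
                    }
                    if (VI_TRYLOCK(vp))             /* got the vnode interlock */
                            break;
                    /*
                     * Try-lock failed: another thread holds this vnode's
                     * interlock.  If that thread is itself blocked on the
                     * mount interlock (vdropl() in the other trace), then
                     * yielding *without* dropping the mount interlock lets
                     * nobody make progress -- on a single-CPU box this loop
                     * spins forever.  Dropping the interlock across the
                     * yield, roughly what r244095 is said to rework,
                     * breaks the cycle:
                     */
                    MNT_IUNLOCK(mp);
                    kern_yield(PRI_UNCHANGED);
                    MNT_ILOCK(mp);
                    vp = TAILQ_NEXT(*mvp, v_actfreelist);   /* rescan from marker */
            }
            if (vp == NULL) {
                    MNT_IUNLOCK(mp);
                    return (NULL);
            }
            /* Move the marker past vp so the next call resumes here. */
            TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
            TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, v_actfreelist);
            MNT_IUNLOCK(mp);
            return (vp);    /* vnode interlock is held on return */
    }

The sketch also shows why the single-core case matters: with one CPU, kern_yield() returns straight back to the spinning thread unless the interlock owner is actually allowed to run.]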
Re: r244036 kernel hangs under load.
On Mon, Dec 10, 2012 at 07:11:59PM -0500, Rick Macklem wrote: So, it seems like your r244095 commit might have fixed this? (I'm not good at this stuff, but from your description, it looks like it did the kern_yield() with the mutex held and maybe got into trouble trying to acquire Giant?) Anyhow, I'm going to test a kernel with r244095 in it and see if I can still reproduce the hang. (There wasn't much else in the show alllocks output, except a process that held the exclusive vnode interlock mutex plus a ufs vnode lock, but it's just doing a witness_unlock.)

There must be a thread blocked on the mount interlock for the loop in mnt_vnode_next_active to cause a livelock.
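[Kostik's point can be reproduced in miniature in userland. The following is a self-contained demo, hypothetical code written for this note rather than anything from the thread, that stands in the mount interlock and a vnode interlock with two pthread mutexes. Built with cc -pthread, it spins forever in thread_a exactly the way the syncer does:

    /*
     * Livelock demo: thread A holds M and spins try-locking V while
     * yielding; thread B holds V and blocks waiting for M.  Neither
     * ever makes progress.  Kill it with ^C.
     */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER; /* "mount interlock" */
    static pthread_mutex_t V = PTHREAD_MUTEX_INITIALIZER; /* "vnode interlock" */

    static void *
    thread_a(void *arg)
    {
            pthread_mutex_lock(&M);                 /* like MNT_ILOCK() */
            fprintf(stderr, "A: holding M, try-locking V\n");
            for (;;) {
                    if (pthread_mutex_trylock(&V) == 0)  /* like VI_TRYLOCK() */
                            break;                  /* never happens */
                    sched_yield();                  /* kern_yield() with M held */
            }
            return (NULL);
    }

    static void *
    thread_b(void *arg)
    {
            pthread_mutex_lock(&V);
            fprintf(stderr, "B: holding V, will wait for M\n");
            sleep(2);                               /* let A take M and start spinning */
            pthread_mutex_lock(&M);                 /* blocks forever: A never drops M */
            pthread_mutex_unlock(&M);
            pthread_mutex_unlock(&V);
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t a, b;

            pthread_create(&b, NULL, thread_b, NULL);
            sleep(1);                               /* ensure B owns V first */
            pthread_create(&a, NULL, thread_a, NULL);
            pthread_join(a, NULL);                  /* never returns */
            return (0);
    }

Note the demo wedges on any number of CPUs, because B is fully blocked; in the kernel case the single core is what prevents the vdropl() thread from ever running while the iterator yields.]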
Re: r244036 kernel hangs under load.
On Dec 10, 2012, at 10:38 AM, Rick Macklem wrote: Adrian Chadd wrote: .. what was the previous kernel version? Hopefully Tim has it narrowed down more, but I don't see the hangs on a Sept. 7 kernel from head and I do see them on a Dec. 3 kernel from head. (Don't know the exact rNN.)

I haven't had a chance to narrow it down any yet, but I think my previous kernel was from around Aug 25. Tim