Re: Why does printf(9) hang network?
Robert writes: Why would doing a printf(9) in a device driver (usb, firewire, probably others) cause an obscenely long lockout on /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) ? Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause the problem, but printfs from device drivers do. Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The device drivers and printf don't even know that sb_sx exists. I can't speak to the details of your situation, but one possible explanation might be: printf runs at the speed of the console, which for serious consoles can be extremely slowly. But shouldn't the RS-232 driver just fill up the UART's FIFO and then sleep? Why can't the network code run while the RS-232 driver sleeps? Actually this must be happening in the call-printf-from-chown case. There must be something different when printf is called from a device driver. Device driver interrupt threads can preempt other threads, possibly while those threads hold locks. That causes them to hold the locks for much longer, as the threads may not get rescheduled for some period (for example, until the device driver is done doing a printf), If the CPU is mostly idle, shouldn't the network thread get scheduled right away? leading other threads waiting for that lock to wait significantly longer. Especially the case if the other thread was spinning adaptively, in which case it will then yield since the holder of the lock effectively yielded. My head is spinning attempting to understand this... You might try forcing all the various threads to run on different CPUs using cpuset and see if the variance goes down. Uniprocessor You can also use KTR + schedgraph to explore the specific scheduling going on, although be aware that KTR can also noticeably perturb schediling itself. Scheduling? The CPU can be mostly idle and the problem still happens. In general, things shouldn't call kernel printf in steady state operation; if they need to log something, they should use log(9) or similar. printf is primarily a tool for printing out device probe information, and for debugging purposes: it is not intended to be fast. Sounds fine to me. Is there a consensus on this? If so, does this need to go into some developer's handbook? How do we get developers to fix the existing code? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
witness Re: Why does printf(9) hang network?
I received a suggestion to try witness, so I build a kernel with WITNESS, WITNESS_KDB, KDB, DDB, KDB_TRACE, and DDB_NUMSYM. This is my first attempt to use witness, so if I got something wrong let me know. Didn't quite make it all the way up to a multiuser prompt: Starting syslogd. Starting rpcbind. lock order reversal: 1st 0xff8029549320 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2559 2nd 0xff000498b000 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:2 85 KDB: stack backtrace: db_trace_self_wrapper() at 0x801dab0a = db_trace_self_wrapper+0x2a _witness_debugger() at 0x805a144c = _witness_debugger+0x2c witness_checkorder() at 0x805a24af = witness_checkorder+0x66f _sx_xlock() at 0x8056b6d4 = _sx_xlock+0x34 ufsdirhash_acquire() at 0x8076f833 = ufsdirhash_acquire+0x33 ufsdirhash_add() at 0x8076fd99 = ufsdirhash_add+0x19 ufs_direnter() at 0x80772498 = ufs_direnter+0x848 ufs_mkdir() at 0x807783d6 = ufs_mkdir+0x5e6 VOP_MKDIR_APV() at 0x808650d4 = VOP_MKDIR_APV+0x34 kern_mkdirat() at 0x805eb740 = kern_mkdirat+0x270 syscall() at 0x8081ec5e = syscall+0x19e Xfast_syscall() at 0x80806ab1 = Xfast_syscall+0xe1 --- syscall (136, FreeBSD ELF64, mkdir), rip = 0x80072c53c, rsp = 0x7fffec88 , rbp = 0x7fffef66 --- KDB: enter: witness_checkorder [thread pid 1255 tid 100076 ] Stopped at 0x8059083d = kdb_enter+0x3d:movq $0,0x6508a0(%rip ) db Managed to reboot to single user mode, changed /boot/kernel back to my production kernel, and got another lock order reversal rebooting: # sync # reboot Waiting (max 60 seconds) for system process `vnlru' to stop...done Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining...0 done All buffers synced. lock order reversal: 1st 0xff0004831448 ufs (ufs) @ /usr/src/sys/kern/vfs_mount.c:1200 2nd 0xff0004831d80 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2083 KDB: stack backtrace: db_trace_self_wrapper() at 0x801dab0a = db_trace_self_wrapper+0x2a _witness_debugger() at 0x805a144c = _witness_debugger+0x2c witness_checkorder() at 0x805a24af = witness_checkorder+0x66f __lockmgr_args() at 0x80552054 = __lockmgr_args+0xd04 vop_stdlock() at 0x805d9239 = vop_stdlock+0x39 VOP_LOCK1_APV() at 0x80864f56 = VOP_LOCK1_APV+0x46 _vn_lock() at 0x805f3cc7 = _vn_lock+0x47 vget() at 0x805e8856 = vget+0x56 devfs_allocv() at 0x804fa993 = devfs_allocv+0x103 devfs_root() at 0x804f9268 = devfs_root+0x48 dounmount() at 0x805e3369 = dounmount+0x419 vfs_unmountall() at 0x805e82a2 = vfs_unmountall+0x42 boot() at 0x80564bd3 = boot+0x683 reboot() at 0x80564ef8 = reboot+0x68 syscall() at 0x8081ec5e = syscall+0x19e Xfast_syscall() at 0x80806ab1 = Xfast_syscall+0xe1 --- syscall (55, FreeBSD ELF64, reboot), rip = 0x80078f83c, rsp = 0x7fffece8 , rbp = 0 --- KDB: enter: witness_checkorder [thread pid 35 tid 100073 ] Stopped at 0x8059083d = kdb_enter+0x3d:movq $0,0x6508a0(%rip ) db lock order reversal: 1st 0xff0004831da8 vnode interlock (vnode interlock) @ /usr/src/sys/fs/devf s/devfs_vnops.c:349 2nd 0xff8000248858 firewire (firewire) @ /usr/src/sys/dev/firewire/fwohci.c :2227 KDB: stack backtrace: db_trace_self_wrapper() at 0x801dab0a = db_trace_self_wrapper+0x2a _witness_debugger() at 0x805a144c = _witness_debugger+0x2c witness_checkorder() at 0x805a24af = witness_checkorder+0x66f _mtx_lock_flags() at 0x80557b52 = _mtx_lock_flags+0x32 fwohci_poll() at 0x8035feb1 = fwohci_poll+0x31 dcons_cngetc() at 0x80303d69 = dcons_cngetc+0x59 cncheckc() at 0x8052d425 = cncheckc+0x65 cngetc() at 0x8052d44c = cngetc+0x1c db_readline() at 0x801d9ef7 = db_readline+0x77 db_read_line() at 0x801da975 = db_read_line+0x15 db_command_loop() at 0x801d8ad8 = db_command_loop+0x38 db_trap() at 0x801daa49 = db_trap+0x89 kdb_trap() at 0x80590665 = kdb_trap+0x95 trap() at 0x8081f200 = trap+0x170 calltrap() at 0x808067d3 = calltrap+0x8 --- trap 0x3, rip = 0x8059083d, rsp = 0xff80405d9710, rbp = 0xff 80405d9730 --- kdb_enter() at 0x8059083d = kdb_enter+0x3d witness_checkorder() at 0x805a24af = witness_checkorder+0x66f __lockmgr_args() at 0x80552054 = __lockmgr_args+0xd04 vop_stdlock() at 0x805d9239 = vop_stdlock+0x39 VOP_LOCK1_APV() at 0x80864f56 = VOP_LOCK1_APV+0x46 _vn_lock() at 0x805f3cc7 = _vn_lock+0x47 vget() at 0x805e8856 = vget+0x56 devfs_allocv() at 0x804fa993 = devfs_allocv+0x103 devfs_root() at 0x804f9268 = devfs_root+0x48 dounmount() at 0x805e3369 = dounmount+0x419 vfs_unmountall() at
Why does printf(9) hang network?
Why would doing a printf(9) in a device driver (usb, firewire, probably others) cause an obscenely long lockout on /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) ? Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause the problem, but printfs from device drivers do. Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The device drivers and printf don't even know that sb_sx exists. 135 int 136 sblock(struct sockbuf *sb, int flags) 137 { 138 139 KASSERT((flags SBL_VALID) == flags, 140 (sblock: flags invalid (0x%x), flags)); 141 142 if (flags SBL_WAIT) { 143 if ((sb-sb_flags SB_NOINTR) || 144 (flags SBL_NOINTR)) { 145 sx_xlock(sb-sb_sx); 146 return (0); 147 } 148 return (sx_xlock_sig(sb-sb_sx)); 149 } else { 150 if (sx_try_xlock(sb-sb_sx) == 0) 151 return (EWOULDBLOCK); 152 return (0); 153 } 154 } More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093 ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Why does printf(9) hang network?
On Sat, 5 Feb 2011, dieter...@engineer.com wrote: Why would doing a printf(9) in a device driver (usb, firewire, probably others) cause an obscenely long lockout on /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) ? Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause the problem, but printfs from device drivers do. Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The device drivers and printf don't even know that sb_sx exists. I can't speak to the details of your situation, but one possible explanation might be: printf runs at the speed of the console, which for serious consoles can be extremely slowly. Device driver interrupt threads can preempt other threads, possibly while those threads hold locks. That causes them to hold the locks for much longer, as the threads may not get rescheduled for some period (for example, until the device driver is done doing a printf), leading other threads waiting for that lock to wait significantly longer. Especially the case if the other thread was spinning adaptively, in which case it will then yield since the holder of the lock effectively yielded. You might try forcing all the various threads to run on different CPUs using cpuset and see if the variance goes down. You can also use KTR + schedgraph to explore the specific scheduling going on, although be aware that KTR can also noticeably perturb schediling itself. In general, things shouldn't call kernel printf in steady state operation; if they need to log something, they should use log(9) or similar. printf is primarily a tool for printing out device probe information, and for debugging purposes: it is not intended to be fast. Robert 135 int 136 sblock(struct sockbuf *sb, int flags) 137 { 138 139 KASSERT((flags SBL_VALID) == flags, 140 (sblock: flags invalid (0x%x), flags)); 141 142 if (flags SBL_WAIT) { 143 if ((sb-sb_flags SB_NOINTR) || 144 (flags SBL_NOINTR)) { 145 sx_xlock(sb-sb_sx); 146 return (0); 147 } 148 return (sx_xlock_sig(sb-sb_sx)); 149 } else { 150 if (sx_try_xlock(sb-sb_sx) == 0) 151 return (EWOULDBLOCK); 152 return (0); 153 } 154 } More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093 ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org