Re: Why does printf(9) hang network?

2011-02-07 Thread dieterbsd

Robert writes:
Why would doing a printf(9) in a device driver (usb, firewire, 

probably

others) cause an obscenely long lockout on
/usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)  ?

Printf(9) alone isn't the problem, adding printfs to chown(2) does 

not

cause the problem, but printfs from device drivers do.

Grep says that uipc_sockbuf.c is the only file that locks/unlocks 

sb_sx.

The device drivers and printf don't even know that sb_sx exists.


I can't speak to the details of your situation, but one possible
explanation might be: printf runs at the speed of the console, which 

for

serious consoles can be extremely slowly.


But shouldn't the RS-232 driver just fill up the UART's FIFO and then 
sleep?

Why can't the network code run while the RS-232 driver sleeps?  Actually
this must be happening in the call-printf-from-chown case.  There must 
be

something different when printf is called from a device driver.


Device driver interrupt threads can preempt other threads,
possibly while those threads hold locks.  That causes them to hold
the locks for much longer, as the threads may not get rescheduled for 

some

period (for example, until the device driver is done doing a printf),


If the CPU is mostly idle, shouldn't the network thread get scheduled 
right

away?


leading
other threads waiting for that lock to wait significantly longer.
Especially the case if the other thread was spinning adaptively, in 

which
case it will then yield since the holder of the lock effectively 

yielded.

My head is spinning attempting to understand this...


You might try forcing all the various threads to run on different CPUs
using cpuset and see if the variance goes down.


Uniprocessor


You can also use KTR + schedgraph
to explore the specific scheduling going on, although be aware that 

KTR

can also noticeably perturb schediling itself.


Scheduling?  The CPU can be mostly idle and the problem still happens.

In general, things shouldn't call kernel printf in steady state 

operation;
if  they need to log something, they should use log(9) or similar.  

printf

is primarily a tool for printing out device probe information, and for
debugging purposes: it is not intended to be fast.


Sounds fine to me.  Is there a consensus on this?  If so, does this 
need to

go into some developer's handbook?  How do we get developers to fix the
existing code?


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


witness Re: Why does printf(9) hang network?

2011-02-07 Thread dieterbsd

I received a suggestion to try witness, so I build a kernel with
WITNESS, WITNESS_KDB, KDB, DDB, KDB_TRACE, and DDB_NUMSYM.
This is my first attempt to use witness, so if I got something wrong
let me know.  Didn't quite make it all the way up to a multiuser prompt:

Starting syslogd.
Starting rpcbind.
lock order reversal:
 1st 0xff8029549320 bufwait (bufwait) @ 
/usr/src/sys/kern/vfs_bio.c:2559
 2nd 0xff000498b000 dirhash (dirhash) @ 
/usr/src/sys/ufs/ufs/ufs_dirhash.c:2

85
KDB: stack backtrace:
db_trace_self_wrapper() at 0x801dab0a = 
db_trace_self_wrapper+0x2a

_witness_debugger() at 0x805a144c = _witness_debugger+0x2c
witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
_sx_xlock() at 0x8056b6d4 = _sx_xlock+0x34
ufsdirhash_acquire() at 0x8076f833 = ufsdirhash_acquire+0x33
ufsdirhash_add() at 0x8076fd99 = ufsdirhash_add+0x19
ufs_direnter() at 0x80772498 = ufs_direnter+0x848
ufs_mkdir() at 0x807783d6 = ufs_mkdir+0x5e6
VOP_MKDIR_APV() at 0x808650d4 = VOP_MKDIR_APV+0x34
kern_mkdirat() at 0x805eb740 = kern_mkdirat+0x270
syscall() at 0x8081ec5e = syscall+0x19e
Xfast_syscall() at 0x80806ab1 = Xfast_syscall+0xe1
--- syscall (136, FreeBSD ELF64, mkdir), rip = 0x80072c53c, rsp = 
0x7fffec88

, rbp = 0x7fffef66 ---
KDB: enter: witness_checkorder
[thread pid 1255 tid 100076 ]
Stopped at  0x8059083d = kdb_enter+0x3d:movq
$0,0x6508a0(%rip

)
db


Managed to reboot to single user mode, changed /boot/kernel
back to my production kernel, and got another lock order reversal
rebooting:

# sync
# reboot
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...0 done
All buffers synced.
lock order reversal:
1st 0xff0004831448 ufs (ufs) @ /usr/src/sys/kern/vfs_mount.c:1200
 2nd 0xff0004831d80 devfs (devfs) @ 
/usr/src/sys/kern/vfs_subr.c:2083

KDB: stack backtrace:
db_trace_self_wrapper() at 0x801dab0a = 
db_trace_self_wrapper+0x2a

_witness_debugger() at 0x805a144c = _witness_debugger+0x2c
witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
__lockmgr_args() at 0x80552054 = __lockmgr_args+0xd04
vop_stdlock() at 0x805d9239 = vop_stdlock+0x39
VOP_LOCK1_APV() at 0x80864f56 = VOP_LOCK1_APV+0x46
_vn_lock() at 0x805f3cc7 = _vn_lock+0x47
vget() at 0x805e8856 = vget+0x56
devfs_allocv() at 0x804fa993 = devfs_allocv+0x103
devfs_root() at 0x804f9268 = devfs_root+0x48
dounmount() at 0x805e3369 = dounmount+0x419
vfs_unmountall() at 0x805e82a2 = vfs_unmountall+0x42
boot() at 0x80564bd3 = boot+0x683
reboot() at 0x80564ef8 = reboot+0x68
syscall() at 0x8081ec5e = syscall+0x19e
Xfast_syscall() at 0x80806ab1 = Xfast_syscall+0xe1
--- syscall (55, FreeBSD ELF64, reboot), rip = 0x80078f83c, rsp = 
0x7fffece8

, rbp = 0 ---
KDB: enter: witness_checkorder
[thread pid 35 tid 100073 ]
Stopped at  0x8059083d = kdb_enter+0x3d:movq
$0,0x6508a0(%rip

)
db lock order reversal:
 1st 0xff0004831da8 vnode interlock (vnode interlock) @ 
/usr/src/sys/fs/devf

s/devfs_vnops.c:349
 2nd 0xff8000248858 firewire (firewire) @ 
/usr/src/sys/dev/firewire/fwohci.c

:2227
KDB: stack backtrace:
db_trace_self_wrapper() at 0x801dab0a = 
db_trace_self_wrapper+0x2a

_witness_debugger() at 0x805a144c = _witness_debugger+0x2c
witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
_mtx_lock_flags() at 0x80557b52 = _mtx_lock_flags+0x32
fwohci_poll() at 0x8035feb1 = fwohci_poll+0x31
dcons_cngetc() at 0x80303d69 = dcons_cngetc+0x59
cncheckc() at 0x8052d425 = cncheckc+0x65
cngetc() at 0x8052d44c = cngetc+0x1c
db_readline() at 0x801d9ef7 = db_readline+0x77
db_read_line() at 0x801da975 = db_read_line+0x15
db_command_loop() at 0x801d8ad8 = db_command_loop+0x38
db_trap() at 0x801daa49 = db_trap+0x89
kdb_trap() at 0x80590665 = kdb_trap+0x95
trap() at 0x8081f200 = trap+0x170
calltrap() at 0x808067d3 = calltrap+0x8
--- trap 0x3, rip = 0x8059083d, rsp = 0xff80405d9710, rbp = 
0xff

80405d9730 ---
kdb_enter() at 0x8059083d = kdb_enter+0x3d
witness_checkorder() at 0x805a24af = witness_checkorder+0x66f
__lockmgr_args() at 0x80552054 = __lockmgr_args+0xd04
vop_stdlock() at 0x805d9239 = vop_stdlock+0x39
VOP_LOCK1_APV() at 0x80864f56 = VOP_LOCK1_APV+0x46
_vn_lock() at 0x805f3cc7 = _vn_lock+0x47
vget() at 0x805e8856 = vget+0x56
devfs_allocv() at 0x804fa993 = devfs_allocv+0x103
devfs_root() at 0x804f9268 = devfs_root+0x48
dounmount() at 0x805e3369 = dounmount+0x419
vfs_unmountall() at 

Why does printf(9) hang network?

2011-02-05 Thread dieterbsd
Why would doing a printf(9) in a device driver (usb, firewire, probably
others) cause an obscenely long lockout on
/usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)  ?

Printf(9) alone isn't the problem, adding printfs to chown(2) does not
cause the problem, but printfs from device drivers do.

Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx.
The device drivers and printf don't even know that sb_sx exists.

135  int
136  sblock(struct sockbuf *sb, int flags)
137  {
138
139  KASSERT((flags  SBL_VALID) == flags,
140  (sblock: flags invalid (0x%x), flags));
141
142  if (flags  SBL_WAIT) {
143  if ((sb-sb_flags  SB_NOINTR) ||
144  (flags  SBL_NOINTR)) {
145  sx_xlock(sb-sb_sx);
146  return (0);
147  }
148  return (sx_xlock_sig(sb-sb_sx));
149  } else {
150  if (sx_try_xlock(sb-sb_sx) == 0)
151  return (EWOULDBLOCK);
152  return (0);
153  }
154  }

More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Why does printf(9) hang network?

2011-02-05 Thread Robert Watson


On Sat, 5 Feb 2011, dieter...@engineer.com wrote:

Why would doing a printf(9) in a device driver (usb, firewire, probably 
others) cause an obscenely long lockout on 
/usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)  ?


Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause 
the problem, but printfs from device drivers do.


Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The 
device drivers and printf don't even know that sb_sx exists.


I can't speak to the details of your situation, but one possible explanation 
might be: printf runs at the speed of the console, which for serious consoles 
can be extremely slowly.  Device driver interrupt threads can preempt other 
threads, possibly while those threads hold locks.  That causes them to hold 
the locks for much longer, as the threads may not get rescheduled for some 
period (for example, until the device driver is done doing a printf), leading 
other threads waiting for that lock to wait significantly longer.  Especially 
the case if the other thread was spinning adaptively, in which case it will 
then yield since the holder of the lock effectively yielded.


You might try forcing all the various threads to run on different CPUs using 
cpuset and see if the variance goes down.  You can also use KTR + schedgraph 
to explore the specific scheduling going on, although be aware that KTR 
can also noticeably perturb schediling itself.


In general, things shouldn't call kernel printf in steady state operation; if 
they need to log something, they should use log(9) or similar.  printf is 
primarily a tool for printing out device probe information, and for debugging 
purposes: it is not intended to be fast.


Robert



135  int
136  sblock(struct sockbuf *sb, int flags)
137  {
138
139  KASSERT((flags  SBL_VALID) == flags,
140  (sblock: flags invalid (0x%x), flags));
141
142  if (flags  SBL_WAIT) {
143  if ((sb-sb_flags  SB_NOINTR) ||
144  (flags  SBL_NOINTR)) {
145  sx_xlock(sb-sb_sx);
146  return (0);
147  }
148  return (sx_xlock_sig(sb-sb_sx));
149  } else {
150  if (sx_try_xlock(sb-sb_sx) == 0)
151  return (EWOULDBLOCK);
152  return (0);
153  }
154  }

More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org