On Wed, Jul 27, 2011 at 06:32:33PM +0200, Manuel Bouyer wrote:
> Hello,
> I'm testing a current amd64 kernel:
> NetBSD borneo 5.99.55 NetBSD 5.99.55 (GENERIC) #0: Tue Jul 26 23:38:21 UTC 2011
> bui...@b7.netbsd.org:/home/builds/ab/HEAD/amd64/201107262140Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
>
> with a 5.0_STABLE userland, and I noticed that an rsync client is really slow
> (rsync -avH --delete --delete-excluded --delete-after --delay-updates --force --stats --partial rsync://rsync.fr.netbsd.org/NetBSD/NetBSD-release-4-0/src .
> or
> rsync -avH --delete --delete-excluded --delete-after --delay-updates --force --stats --partial rsync://rsync.fr.netbsd.org/NetBSD/NetBSD-release-4-0/src).
>
> Some investigation makes me suspect that select(2) is not working
> properly; in particular, it doesn't wake up when there's data ready in the
> socket buffer. When the rsync process is idle it's waiting in select,
> and while it's idle netstat shows that the receive socket queue is full (I
> tried with net.inet.tcp.recvbuf_auto set to both 1 and 0), and
> ktrace shows:
>   4102      1 rsync    1311780519.324063908 CALL  select(4,0x7f7fffff83b0,0x7f7fffff8390,0,0x7f7fffff83d0)
>   4102      1 rsync    1311780579.483436279 RET   select 0
>   4102      1 rsync    1311780579.483440327 CALL  select(4,0x7f7fffff83b0,0x7f7fffff8390,0,0x7f7fffff83d0)
>   4102      1 rsync    1311780579.483442445 RET   select 1
>   4102      1 rsync    1311780579.483443326 CALL  read(3,0x7f7ff7a36de2,0x21a)
>   4102      1 rsync    1311780579.483451341 GIO   fd 3 read 538 bytes
>
> So select blocks (maybe because there's effectively nothing to read at that
> time), but instead of waking up when data becomes ready it wakes up only
> when the timeout expires (about 60 seconds later in the trace above).
> The next select call returns immediately.
> Does this ring a bell for anyone? Has there been any recent change in this
> area (either in select(2) or in tcp)?

Disabling DIRECT_SELECT (with #define NO_DIRECT_SELECT in sys_select.c)
"fixes" the problem. I opened PR kern/45187 for this.
I don't know what's wrong with DIRECT_SELECT at this point, or whether
it's just a timing change that makes select behave as expected.

-- 
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference