Re: devd in r329188M don't start
On 13/02/2018 23:50, Hans Petter Selasky wrote:
> On 02/13/18 10:47, Jakob Alvermark wrote:
>> +1
>>
>> My USB mouse was working fine before the switch to devmatch. Now I
>> have to 'kldload ums' manually.
>>
>> Same for USB audio, snd_uaudio.ko was loaded by devd before.
>>
>
> Hi,
>
> This is a known issue.
>
> Can you try the attached patch?
>
> Rebuild devmatch(8) and reinstall /etc/devd/devmatch.conf and
> /etc/rc.d/devmatch only.

+1 for ums mouse breakage on recent upgrade from head 20180201 (Git 9e57d147a97) to head 20180215 (Git 81891e10182).

Will build and test patch on the weekend, but given the 2 success reports already, I presume it will fix things for me too. I won't follow up to this thread again unless the patch doesn't work for me.

Thanks Hans.

Cheers,
Lawrence
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
[SOLVED] Re: Deterministic rescue buildworld error with custom make.conf/src.conf/MAKEOBJDIRPREFIX
On 12/03/2017 13:37, Ian Lepore wrote: > On Sun, 2017-03-12 at 13:27 +1100, Lawrence Stewart wrote: >> Hi Ian, >> >> On 12/03/2017 10:29, Ian Lepore wrote: >>> >>> On Sun, 2017-03-12 at 10:22 +1100, Lawrence Stewart wrote: >>>> >>>> Hi all, >>>> >>>> I'm unable to complete buildworld with 2 recent svn revs I've >>>> tried >>>> (r314838 and r315059). I'm building for a slightly resource >>>> constrained >>>> production system so am specifying custom settings and a >>>> different >>>> obj >>>> tree location so I can copy it to the target system. The error >>>> persists >>>> after an "rm -rf /usr/obj/*", and if parallel building is >>>> disabled. >>>> >>>> The underlying build system built from r314838 via simple "make >>>> -C >>>> /usr/src -s -j6 buildworld buildkernel" built and installed fine, >>>> so >>>> the >>>> problem seems to be around the use of the build customisations. >>>> >>>> Any clues? >>>> >>>> Cheers, >>>> Lawrence >>>> >>>> >>>> root@builder-head-amd64:/usr/src # cat cust_make.conf >>>> KERNCONF=GENERIC-NODEBUG >>>> MALLOC_PRODUCTION=YES >>>> >>>> root@builder-head-amd64:/usr/src # cat cust_src.conf >>>> WITHOUT_PROFILE=1 >>>> >>>> root@builder-head-amd64:/usr/src # make >>>> __MAKE_CONF=/usr/src/cust_make.conf >>>> SRCCONF=/usr/src/cust_src.conf >>>> MAKEOBJDIRPREFIX=/usr/obj/cust buildworld buildkernel >>>> [...] 
>>>> MK_AUTO_OBJ=no >>>> MK_TESTS=no UPDATE_DEPENDFILE=no _RECURSING_CRUNCH=1 >>>> CC="cc -target x86_64-unknown-freebsd12.0 >>>> --sysroot=/usr/obj/cust/usr/src/tmp >>>> -B/usr/obj/cust/usr/src/tmp/usr/bin >>>> -O2 -pipe -std=gnu99-Qunused-arguments " CXX="c++ - >>>> target >>>> x86_64-unknown-freebsd12.0 --sysroot=/usr/obj/cust/usr/src/tmp >>>> -B/usr/obj/cust/usr/src/tmp/usr/bin -O2 -pipe -Qunused-arguments >>>> -Wno-c++11-extensions " make .MAKE.MODE="normal curdirOk=yes" >>>> .MAKE.META.IGNORE_PATHS="" -f rescue.mk exe >>>> cc -target x86_64-unknown-freebsd12.0 >>>> --sysroot=/usr/obj/cust/usr/src/tmp >>>> -B/usr/obj/cust/usr/src/tmp/usr/bin >>>> -O2 -pipe -std=gnu99-Qunused-arguments -nostdlib -Wl,-dc >>>> -r >>>> -o >>>> cat.lo cat_stub.o >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o >>>> cc: error: no such file or directory: >>>> '/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o' >>>> *** Error code 1 >>>> >>>> There appear to be a lot of missing .o files under the rescue obj >>>> tree: >>>> >>>> root@builder-head-amd64:/usr/src # find >>>> /usr/obj/cust/usr/src/rescue/rescue//usr -type f >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax.o >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes.o >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/sh.err.h >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/tc.const.h >>>> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/gethost >>>> >>>> compared with an obj tree on a different head system: >>>> >>>> find /usr/obj/usr/src/rescue/rescue/usr/ -type f | wc -l >>>> 1552 >>>> ___ >>>> freebsd-current@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-current >>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@fre >>>> ebsd >>>> .org" >>> The MAKEOBJDIRPREFIX variable 
must be set in the environment, not >>> in >>> make.conf or on the make command line (documented in build(7)). >> Your assertion seems at odds with my past experience and my reading >> of >> the man page... from build(7): >> >> The build may be controlled by defining make(1) variables >> described in the ENVIRONMENT section below, and by the >> varia
Re: Deterministic rescue buildworld error with custom make.conf/src.conf/MAKEOBJDIRPREFIX
Hi Ian, On 12/03/2017 10:29, Ian Lepore wrote: > On Sun, 2017-03-12 at 10:22 +1100, Lawrence Stewart wrote: >> Hi all, >> >> I'm unable to complete buildworld with 2 recent svn revs I've tried >> (r314838 and r315059). I'm building for a slightly resource >> constrained >> production system so am specifying custom settings and a different >> obj >> tree location so I can copy it to the target system. The error >> persists >> after an "rm -rf /usr/obj/*", and if parallel building is disabled. >> >> The underlying build system built from r314838 via simple "make -C >> /usr/src -s -j6 buildworld buildkernel" built and installed fine, so >> the >> problem seems to be around the use of the build customisations. >> >> Any clues? >> >> Cheers, >> Lawrence >> >> >> root@builder-head-amd64:/usr/src # cat cust_make.conf >> KERNCONF=GENERIC-NODEBUG >> MALLOC_PRODUCTION=YES >> >> root@builder-head-amd64:/usr/src # cat cust_src.conf >> WITHOUT_PROFILE=1 >> >> root@builder-head-amd64:/usr/src # make >> __MAKE_CONF=/usr/src/cust_make.conf SRCCONF=/usr/src/cust_src.conf >> MAKEOBJDIRPREFIX=/usr/obj/cust buildworld buildkernel >> [...] 
>> MK_AUTO_OBJ=no MK_TESTS=no UPDATE_DEPENDFILE=no _RECURSING_CRUNCH=1 >> CC="cc -target x86_64-unknown-freebsd12.0 >> --sysroot=/usr/obj/cust/usr/src/tmp >> -B/usr/obj/cust/usr/src/tmp/usr/bin >> -O2 -pipe -std=gnu99-Qunused-arguments " CXX="c++ -target >> x86_64-unknown-freebsd12.0 --sysroot=/usr/obj/cust/usr/src/tmp >> -B/usr/obj/cust/usr/src/tmp/usr/bin -O2 -pipe -Qunused-arguments >> -Wno-c++11-extensions " make .MAKE.MODE="normal curdirOk=yes" >> .MAKE.META.IGNORE_PATHS="" -f rescue.mk exe >> cc -target x86_64-unknown-freebsd12.0 >> --sysroot=/usr/obj/cust/usr/src/tmp >> -B/usr/obj/cust/usr/src/tmp/usr/bin >> -O2 -pipe -std=gnu99-Qunused-arguments -nostdlib -Wl,-dc -r >> -o >> cat.lo cat_stub.o >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o >> cc: error: no such file or directory: >> '/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o' >> *** Error code 1 >> >> There appear to be a lot of missing .o files under the rescue obj >> tree: >> >> root@builder-head-amd64:/usr/src # find >> /usr/obj/cust/usr/src/rescue/rescue//usr -type f >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax.o >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes.o >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/sh.err.h >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/tc.const.h >> /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/gethost >> >> compared with an obj tree on a different head system: >> >> find /usr/obj/usr/src/rescue/rescue/usr/ -type f | wc -l >> 1552 >> ___ >> freebsd-current@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-current >> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd >> .org" > > The MAKEOBJDIRPREFIX variable must be set in the environment, not in > make.conf or on the make command line (documented in build(7)). 
Your assertion seems at odds with my past experience and my reading of the man page... from build(7):

    The build may be controlled by defining make(1) variables described in
    the ENVIRONMENT section below, and by the variables documented in
    make.conf(5).

... which indicates they are make variables, not environment variables specifically. As a concrete example, TARGET and DESTDIR are listed under the "ENVIRONMENT" section of the man page, yet "EXAMPLES" shows:

    make TARGET=sparc64 buildworld
    make TARGET=sparc64 DESTDIR=/clients/sparc64 installworld

I've certainly always set build vars documented in the "ENVIRONMENT" section of the man page on the make command line without issue. Pretty sure I've set MAKEOBJDIRPREFIX from the make command line also in the past, though perhaps it has been working for me "by accident" and a documentation tweak is in order if the distinction you make is in fact relevant...

Cheers,
Lawrence
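For comparison, here is a minimal sketch of the environment-based form Ian describes: MAKEOBJDIRPREFIX is placed in make's environment via env(1), while __MAKE_CONF and SRCCONF stay on the make command line as in the original report. The run helper, the OBJPREFIX name and the DRY_RUN switch are illustrative assumptions, not something from the thread; by default the script only prints the command it would execute.

```shell
#!/bin/sh
# Sketch only: OBJPREFIX, DRY_RUN and run() are hypothetical names.
OBJPREFIX=${OBJPREFIX:-/usr/obj/cust}

# Print the command by default; set DRY_RUN=no to really execute it.
run() {
    if [ "${DRY_RUN:-yes}" = "yes" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# MAKEOBJDIRPREFIX goes into the environment of make(1) via env(1);
# the other two settings remain ordinary make command-line variables.
run env MAKEOBJDIRPREFIX="$OBJPREFIX" \
    make -C /usr/src \
    __MAKE_CONF=/usr/src/cust_make.conf \
    SRCCONF=/usr/src/cust_src.conf \
    buildworld buildkernel
```

Running it with DRY_RUN=no on the build host would start the actual build; whether that alone clears the rescue error is not something the messages above confirm.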
Deterministic rescue buildworld error with custom make.conf/src.conf/MAKEOBJDIRPREFIX
Hi all,

I'm unable to complete buildworld with 2 recent svn revs I've tried (r314838 and r315059). I'm building for a slightly resource constrained production system so am specifying custom settings and a different obj tree location so I can copy it to the target system. The error persists after an "rm -rf /usr/obj/*", and if parallel building is disabled.

The underlying build system built from r314838 via simple "make -C /usr/src -s -j6 buildworld buildkernel" built and installed fine, so the problem seems to be around the use of the build customisations.

Any clues?

Cheers,
Lawrence


root@builder-head-amd64:/usr/src # cat cust_make.conf
KERNCONF=GENERIC-NODEBUG
MALLOC_PRODUCTION=YES

root@builder-head-amd64:/usr/src # cat cust_src.conf
WITHOUT_PROFILE=1

root@builder-head-amd64:/usr/src # make __MAKE_CONF=/usr/src/cust_make.conf SRCCONF=/usr/src/cust_src.conf MAKEOBJDIRPREFIX=/usr/obj/cust buildworld buildkernel
[...]
MK_AUTO_OBJ=no MK_TESTS=no UPDATE_DEPENDFILE=no _RECURSING_CRUNCH=1 CC="cc -target x86_64-unknown-freebsd12.0 --sysroot=/usr/obj/cust/usr/src/tmp -B/usr/obj/cust/usr/src/tmp/usr/bin -O2 -pipe -std=gnu99-Qunused-arguments " CXX="c++ -target x86_64-unknown-freebsd12.0 --sysroot=/usr/obj/cust/usr/src/tmp -B/usr/obj/cust/usr/src/tmp/usr/bin -O2 -pipe -Qunused-arguments -Wno-c++11-extensions " make .MAKE.MODE="normal curdirOk=yes" .MAKE.META.IGNORE_PATHS="" -f rescue.mk exe
cc -target x86_64-unknown-freebsd12.0 --sysroot=/usr/obj/cust/usr/src/tmp -B/usr/obj/cust/usr/src/tmp/usr/bin -O2 -pipe -std=gnu99-Qunused-arguments -nostdlib -Wl,-dc -r -o cat.lo cat_stub.o /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o
cc: error: no such file or directory: '/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o'
*** Error code 1

There appear to be a lot of missing .o files under the rescue obj tree:

root@builder-head-amd64:/usr/src # find /usr/obj/cust/usr/src/rescue/rescue//usr -type f
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax.o
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes.o
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/sh.err.h
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/tc.const.h
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/gethost

compared with an obj tree on a different head system:

find /usr/obj/usr/src/rescue/rescue/usr/ -type f | wc -l
    1552
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/27/15 17:15, Don Lewis wrote: On 27 Aug, Don Lewis wrote: On 27 Aug, Lawrence Stewart wrote: On 08/27/15 09:36, John-Mark Gurney wrote: Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300: On 12/08/2015 17:11, Lawrence Stewart wrote: On 08/07/15 07:33, Pawel Pekala wrote: Hi K., On 2015-08-06 12:33 -0700, K. Macy km...@freebsd.org wrote: Is this still happening? Still crashes: +1 for me running r286617 Here is another +1 with r286922. I can add a couple of bits of debugging data: (kgdb) fr 8 #8 0x80639d60 in knote (list=0xf8019a733ea0, hint=2147483648, lockflags=value optimized out) at /usr/src/sys/kern/kern_event.c:1964 1964} else if ((lockflags KNF_NOKQLOCK) != 0) { (kgdb) p *list $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0 We should/cannot get here w/ an empty list. If we do, then there is something seriously wrong... The current kn (which we must have as we are here) MUST be on the list, but as you just showed, there are no knotes on the list. Can you get me a print of the knote? That way I can see what flags are on it? 
I quickly tried to get this info for you by building my kernel with -O0 and reproducing, but I get an insta-panic on boot with the new kernel: Fatal double fault rip = 0x8218c794 rsp = 0xfe044cdc9fe0 rbp = 0xfe044cdca110 cpuid = 2; apic id = 02 panic: double fault cpuid = 2 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe03dcfffe30 vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0 panic() at panic+0x43/frame 0xfe03dc10 dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30 Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30 --- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp = 0xfe044cdca110 --- vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110 vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame 0xfe044cdca560 vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0 zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0 zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730 zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760 vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame 0xfe044cdca800 zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930 zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980 zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0 spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50 traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60 traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0 traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0 traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0 traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300 traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510 traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720 traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb930 traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40 traverse_dnode() at traverse_dnode+0x98/frame 
0xfe044cdcbbb0 traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0 traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0 traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040 traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140 spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0 spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610 spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0 spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0 spa_load() at spa_load+0x320/frame 0xfe044cdccbb0 spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50 spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40 spa_open() at spa_open+0x35/frame 0xfe044cdccd70 dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0 dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30 zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050 zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0 zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390 vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660 kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0 parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810 vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d0 start_init() at start_init+0x62/frame 0xfe044cdcda70 fork_exit() at fork_exit+0x84/frame 0xfe044cdcdab0 fork_trampoline() at fork_trampoline+0xe/frame 0xfe044cdcdab0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic Didn't get a core because it panics before dumpdev is set. Is anyone
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/23/15 22:54, Konstantin Belousov wrote:
> On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
>> On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>> Hi K.,
>>>>>
>>>>> On 2015-08-06 12:33 -0700, K. Macy km...@freebsd.org wrote:
>>>>>> Is this still happening?
>>>>>
>>>>> Still crashes:
>>>>
>>>> +1 for me running r286617
>>>
>>> Here is another +1 with r286922.
>>> I can add a couple of bits of debugging data:
>>>
>>> (kgdb) fr 8
>>> #8  0x80639d60 in knote (list=0xf8019a733ea0, hint=2147483648, lockflags=<value optimized out>) at /usr/src/sys/kern/kern_event.c:1964
>>> 1964            } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>> (kgdb) p *list
>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0 <knlist_mtx_lock>, kl_unlock = 0x8063a200 <knlist_mtx_unlock>, kl_assert_locked = 0x8063a220 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0x8063a240 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xf8019a733bb0}
>>> (kgdb) disassemble
>>> Dump of assembler code for function knote:
>>> 0x80639d00 <knote+0>:   push   %rbp
>>> 0x80639d01 <knote+1>:   mov    %rsp,%rbp
>>> 0x80639d04 <knote+4>:   push   %r15
>>> 0x80639d06 <knote+6>:   push   %r14
>>> 0x80639d08 <knote+8>:   push   %r13
>>> 0x80639d0a <knote+10>:  push   %r12
>>> 0x80639d0c <knote+12>:  push   %rbx
>>> 0x80639d0d <knote+13>:  sub    $0x18,%rsp
>>> 0x80639d11 <knote+17>:  mov    %edx,%r12d
>>> 0x80639d14 <knote+20>:  mov    %rsi,-0x30(%rbp)
>>> 0x80639d18 <knote+24>:  mov    %rdi,%rbx
>>> 0x80639d1b <knote+27>:  test   %rbx,%rbx
>>> 0x80639d1e <knote+30>:  je     0x80639ef6 <knote+502>
>>> 0x80639d24 <knote+36>:  mov    %r12d,%eax
>>> 0x80639d27 <knote+39>:  and    $0x1,%eax
>>> 0x80639d2a <knote+42>:  mov    %eax,-0x3c(%rbp)
>>> 0x80639d2d <knote+45>:  mov    0x28(%rbx),%rdi
>>> 0x80639d31 <knote+49>:  je     0x80639d38 <knote+56>
>>> 0x80639d33 <knote+51>:  callq  *0x18(%rbx)
>>> 0x80639d36 <knote+54>:  jmp    0x80639d42 <knote+66>
>>> 0x80639d38 <knote+56>:  callq  *0x20(%rbx)
>>> 0x80639d3b <knote+59>:  mov    0x28(%rbx),%rdi
>>> 0x80639d3f <knote+63>:  callq  *0x8(%rbx)
>>> 0x80639d42 <knote+66>:  mov    %rbx,-0x38(%rbp)
>>> 0x80639d46 <knote+70>:  mov    (%rbx),%rbx
>>> 0x80639d49 <knote+73>:  test   %rbx,%rbx
>>> 0x80639d4c <knote+76>:  je     0x80639ee5 <knote+485>
>>> 0x80639d52 <knote+82>:  and    $0x2,%r12d
>>> 0x80639d56 <knote+86>:  nopw   %cs:0x0(%rax,%rax,1)
>>> 0x80639d60 <knote+96>:  mov    0x28(%rbx),%r14
>>>
>>> Panic is in the last quoted instruction.
>>> And:
>>> (kgdb) i reg
>>> rax            0x246    582
>>> rbx            0xdeadc0dedeadc0de       -2401050962867404578
>>> rcx            0x0      0
>>> rdx            0x12e    302
>>> rsi            0x80a26a5a       -2136839590
>>> rdi            0x80e81b80       -2132272256
>>> rbp            0xfe02b7efea20   0xfe02b7efea20
>>> rsp            0xfe02b7efe9e0   0xfe02b7efe9e0
>>> r8             0x80a269ce       -2136839730
>>> r9             0x80e82838       -2132269000
>>> r10            0x10000  65536
>>> r11            0x80fabd10       -2131051248
>>> r12            0x0      0
>>> r13            0xf801ff84a818   -8787511171048
>>> r14            0xf801ff84a800   -8787511171072
>>> r15            0xf8019a6974f0   -8789207452432
>>> rip            0x80639d60       0x80639d60 <knote+96>
>>> eflags         0x10286  66182
>>>
>>> I think that $rbx stands out here (this is a kernel with INVARIANTS).
>>>
>>> Looking at the code, is it possible that one of the calls from within the loop's body modifies the list? If that is so and provided that is a valid behavior, then maybe using SLIST_FOREACH_SAFE would help.
>>
>> This is first time a useful debugging data was posted.
>>
>> The 0x28 offset may indicate either kn_kq member access of the struct knote, or kq_list of the struct kqueue. kl_list.slh_first of the list parameter is NULL, how would a list iteration loop even start ?
>>
>> Can you look up the list argument value from the previous frame (%rdi is overwritten, so debugger might be confused) ?
>
> After looking at your data closely, I think you are right. The panic occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the only case in the tree where filter uses knlist_remove_inevent() to detach processed note, so indeed the slist is modified under the iterator.
>
> Below is the patch with the suggested change and unrelated cleanup of the uma(9) KPI use. Please test, everybody who has a panic with the backtrace pointing to the sys_exit().

Fixes the panic for me too, thanks Kostik.

Cheers,
Lawrence
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/27/15 09:36, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>> Hi K.,
>>>>
>>>> On 2015-08-06 12:33 -0700, K. Macy km...@freebsd.org wrote:
>>>>> Is this still happening?
>>>>
>>>> Still crashes:
>>>
>>> +1 for me running r286617
>>
>> Here is another +1 with r286922.
>> I can add a couple of bits of debugging data:
>>
>> (kgdb) fr 8
>> #8  0x80639d60 in knote (list=0xf8019a733ea0, hint=2147483648, lockflags=<value optimized out>) at /usr/src/sys/kern/kern_event.c:1964
>> 1964            } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>> (kgdb) p *list
>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
>
> We should/cannot get here w/ an empty list. If we do, then there is something seriously wrong... The current kn (which we must have as we are here) MUST be on the list, but as you just showed, there are no knotes on the list.
>
> Can you get me a print of the knote? That way I can see what flags are on it?

I quickly tried to get this info for you by building my kernel with -O0 and reproducing, but I get an insta-panic on boot with the new kernel:

Fatal double fault
rip = 0x8218c794
rsp = 0xfe044cdc9fe0
rbp = 0xfe044cdca110
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe03dcfffe30
vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0
panic() at panic+0x43/frame 0xfe03dc10
dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30
Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30
--- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp = 0xfe044cdca110 ---
vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110
vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame 0xfe044cdca560
vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0
zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0
zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730
zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760
vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame 0xfe044cdca800
zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930
zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980
zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0
spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50
traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60
traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0
traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb930
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40
traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcbbb0
traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0
traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0
traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040
traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140
spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0
spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610
spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0
spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0
spa_load() at spa_load+0x320/frame 0xfe044cdccbb0
spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50
spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40
spa_open() at spa_open+0x35/frame 0xfe044cdccd70
dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0
dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30
zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050
zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0
zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390
vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660
kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0
parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810
vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d0
start_init() at start_init+0x62/frame 0xfe044cdcda70
fork_exit() at fork_exit+0x84/frame 0xfe044cdcdab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe044cdcdab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Didn't get a core because it panics before dumpdev is set.

Is anyone else able to run -O0 kernels or do I have something set to evil?

Cheers,
Lawrence
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/07/15 07:33, Pawel Pekala wrote:
> Hi K.,
>
> On 2015-08-06 12:33 -0700, K. Macy km...@freebsd.org wrote:
>> Is this still happening?
>
> Still crashes:

+1 for me running r286617

Cheers,
Lawrence
Re: Panic @r251745; i386, early in boot sequence
On 06/15/13 02:35, David Wolfskill wrote:
> Here's a hand-transcribed copy of the backtrace:
>
> ...
> Timecounters tick every 1.000 msec
> panic: curvnet is NULL
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper(c1034c40,c102a482,c11b646c,c2020cbc,c1037f21,...) at 0xc051283d = db_trace_self_wrapper+0x2d/frame 0xc2020be8
> kdb_backtrace(c10863e7,0,c102a482,c2020cbc,c102a482,...) at 0xc0aa9800 = kdb_backtrace+0x30/frame 0xc2020c50
> vpanic(c11963a2,100,c102a482,c2020cbc,c2020cbc,...) at 0xc0a71e0f = vpanic+0x11f/frame 0xc2020c50
> kassert_panic(c102a482,0,c102a450,10b,c1143100,...) at 0xc0a71cea = kassert_panic+0xea/frame 0xc2020cb0
> hhook_head_register(1,0,c1358904,102,c116030c,...) at 0xc0a40132 = hhook_head_register+0x102/frame 0xc2020cd4
> tcp_init(0,c103cac6,ce03d448,c112654c,c2020d58,...) at 0xc0c1fecc = tcp_init+0x2c/frame 0xc0c0d20
> domain_init(c114305c,0,ce03d530,201e000,2025000,...) at 0xc0adf357 = domain_init+0x27/frame 0xc2020d38
> mi_startup() at 0xc0a1deb7 = mi_startup+0xf7/frame 0xc2020d58
> begin() at 0xc0a1c07 = begin+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 10 ]
> Stopped at 0xc0aa95fd = kdb_enter+0x3d: movl $0,0xc11b21c4 = kdb_why
> db>
>
> Previous working head was @r251684 (from yesterday).
>
> My build machine (where I have a serial console) is still building; above is from laptop (which lacks serial console, unfortunately).
>
> I update build machine & laptop to same GRNs; here is build machine's uname-a output (showing yesterday's update, as today's is still in progress):
>
> FreeBSD freebeast.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT #1191 r251684M/251684:135: Thu Jun 13 09:46:05 PDT 2013 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC i386
>
> Any suggestions for what to hack? :-}

My apologies for the brain fart. Committing a fix shortly.

Cheers,
Lawrence
Read-triggered corruption of swap backed MD devices
Hi all,

I tracked the cause of a colleague's nanobsd image creation problem to what appears to be some nasty behaviour with swap-backed MD devices. I've verified the behaviour exists on three separate systems running 10-CURRENT r250260, 9-STABLE r250824 and 9-STABLE r250925.

The following minimal reproduction recipe (run as root) deterministically triggers the behaviour for me on the 3 systems I've tested:

env MD_DEV=`mdconfig -an -t swap -s 1m -x 63 -y 16` sh -c '(fdisk -I md${MD_DEV} ; bsdlabel -w -B md${MD_DEV}s1 ; bsdlabel md${MD_DEV}s1 ; dd if=/dev/md${MD_DEV} of=/dev/null bs=64k ; bsdlabel md${MD_DEV}s1 ; mdconfig -d -u ${MD_DEV})'

By changing the mdconfig -t swap argument to -t malloc, the bsdlabel remains intact after the dd command completes.

I've included command line recipe runs from my 10-CURRENT r250260 laptop with both -t swap and -t malloc at the end of this email for reference.

Smells like a VM related problem to me, but ENOCLUE so I would appreciate some help.

Cheers,
Lawrence


root@lstewart-laptop:~ # uname -a
FreeBSD lstewart-laptop 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r250260: Wed May 22 15:57:40 EST 2013 root@lstewart-laptop:/usr/obj/usr/src/sys/GENERIC amd64

root@lstewart-laptop:~ # env MD_DEV=`mdconfig -an -t swap -s 1m -x 63 -y 16` sh -c '(fdisk -I md${MD_DEV} ; bsdlabel -w -B md${MD_DEV}s1 ; bsdlabel md${MD_DEV}s1 ; dd if=/dev/md${MD_DEV} of=/dev/null bs=64k ; bsdlabel md${MD_DEV}s1 ; mdconfig -d -u ${MD_DEV})'
*** Working on device /dev/md0 ***
fdisk: invalid fdisk partition table found
# /dev/md0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:     1937       16    unused        0     0
  c:     1953        0    unused        0     0         # raw part, don't edit
16+0 records in
16+0 records out
1048576 bytes transferred in 0.001728 secs (606794497 bytes/sec)
bsdlabel: /dev/md0s1: no valid label found

root@lstewart-laptop:~ # env MD_DEV=`mdconfig -an -t malloc -s 1m -x 63 -y 16` sh -c '(fdisk -I md${MD_DEV} ; bsdlabel -w -B md${MD_DEV}s1 ; bsdlabel md${MD_DEV}s1 ; dd if=/dev/md${MD_DEV} of=/dev/null bs=64k ; bsdlabel md${MD_DEV}s1 ; mdconfig -d -u ${MD_DEV})'
*** Working on device /dev/md0 ***
fdisk: invalid fdisk partition table found
# /dev/md0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:     1937       16    unused        0     0
  c:     1953        0    unused        0     0         # raw part, don't edit
16+0 records in
16+0 records out
1048576 bytes transferred in 0.001251 secs (838202118 bytes/sec)
# /dev/md0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:     1937       16    unused        0     0
  c:     1953        0    unused        0     0         # raw part, don't edit
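The one-liner in the report is dense, so here is a more readable sketch of the same sequence as a small sh script. It assumes FreeBSD with mdconfig(8), fdisk(8) and bsdlabel(8) available and root privileges; the function names repro and labels_match are illustrative and not part of the original recipe. It runs the swap-backed and malloc-backed cases in turn and reports whether the label text read back after the dd pass still matches what was read before it.

```shell
#!/bin/sh
# Sketch only: repro() and labels_match() are hypothetical helper names.

# Succeeds when both captured label dumps are non-empty and identical.
labels_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# $1 is the md backing type to test: "swap" or "malloc".
repro() {
    backing=$1
    # -n makes mdconfig print only the unit number (e.g. "0").
    unit=$(mdconfig -an -t "$backing" -s 1m -x 63 -y 16) || return 2
    fdisk -I "md${unit}"
    bsdlabel -w -B "md${unit}s1"
    before=$(bsdlabel "md${unit}s1" 2>&1)
    # Read-only pass over the whole device; nothing is written here.
    dd if="/dev/md${unit}" of=/dev/null bs=64k
    after=$(bsdlabel "md${unit}s1" 2>&1)
    mdconfig -d -u "$unit"
    if labels_match "$before" "$after"; then
        echo "$backing: label intact after read"
    else
        echo "$backing: label changed after read"
        return 1
    fi
}

# Only attempt the run where mdconfig exists and we are root.
if command -v mdconfig >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    repro swap
    repro malloc
fi
```

On the systems described above one would expect the swap case to report a changed label and the malloc case to report it intact, but that expectation comes from the report, not from running this sketch.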
Re: Enhancing the user experience with tcsh
On 02/10/12 11:52, Eitan Adler wrote:
> In conf/160689 (http://www.freebsd.org/cgi/query-pr.cgi?pr=160689) there has been some discussion about changing the default cshrc file.
>
> I'd like to commit something like the following based on Chris's patch at the end of the thread. This post is an attempt to open the change to wider discussion.

I like the proposed changes, although I don't see why you set the prompt twice? I've also inserted the changes I commonly run with inline below.

> commit dbe6cb730686dd53af7d06cc9b69b60e6e55549c
> diff --git a/etc/root/dot.cshrc b/etc/root/dot.cshrc
> --- a/etc/root/dot.cshrc
> +++ b/etc/root/dot.cshrc
> @@ -7,9 +7,10 @@
>  alias h history 25
>  alias j jobs -l
> -alias la ls -a
> +alias la ls -aF
>  alias lf ls -FA
> -alias ll ls -lA
> +alias ll ls -lAF
> +alias ls ls -F
>
>  # A righteous umask
>  umask 22
> @@ -17,19 +18,24 @@ umask 22
>  set path = (/sbin /bin /usr/sbin /usr/bin /usr/games /usr/local/sbin /usr/local/bin $HOME/bin)
>
>  setenv EDITOR vi
> -setenv PAGER more
> +setenv PAGER less
>  setenv BLOCKSIZE K

# Sets SSH_AUTH_SOCK to the user's ssh-agent socket path if running
if (${?SSH_AUTH_SOCK} != 1) then
	setenv SSH_AUTH_SOCK `sockstat | grep ${USER} | grep ssh-agent | awk '{print $6}'`
endif

>  if ($?prompt) then
>  	# An interactive shell -- set some stuff up
>  	set prompt = `/bin/hostname -s`#

# Useful for root's .cshrc, although I run with it in all my .cshrc
if (`id -g` == 0) then
	set prompt=root@%m#
endif

>  	set filec
> -	set history = 100
> -	set savehist = 100
> +	set history = 1
> +	set savehist = 1
> +	set autolist
>  	set autologout = 0
> +	# Use history to aid expansion
> +	set autoexpand
>  	set mail = (/var/mail/$USER)
>  	if ( $?tcsh ) then
>  		bindkey ^W backward-delete-word
>  		bindkey -k up history-search-backward
>  		bindkey -k down history-search-forward

# This maps the Delete key to do the right thing
# Pressing CTRL-v followed by the key of interest will print the shell's mapping for the key
bindkey ^[[3~ delete-char-or-list-or-eof

>  	endif
> +	set prompt = [%n@%m]%c04%#
> +	set promptchars = %#
>  endif

Cheers,
Lawrence
Re: Removal of sysinstall from HEAD and lack of a post-install configuration tool
On 12/27/11 16:13, Ron McDowell wrote:
> Doug Barton wrote:
>> The story so far ... sysinstall was removed from HEAD in October. I (and others) objected on the basis that at this time there is no replacement for the post-install configuration role that sysinstall played. More sysinstall components were then removed. Then the old version of libdialog (which sysinstall used) was removed. Thus at this point it's not possible to easily restore sysinstall.
>>
>> So my question is, how much do you care? Is lack of that functionality in HEAD something that we care about?
>>
>> Doug
>
> We have around 90 web servers running 8.2p5 right now [and yes, I did update the lot on Christmas Eve but that's a different story] and they will not be upgraded to 9.0 until/unless the post-install functionality that was lost by the removal of sysinstall is reintegrated in some way.
>
> I also complained about it and was told, in effect, "too bad". Everyone who commented said sysinstall caused more problems than it solved, although I've been using it for any system changes I needed that it was capable of doing for as long back as I can remember, and my first FreeBSD box was v2.2.
>
> I think removing any functionality that was in a previous release without providing an equal-or-better alternative is a bad idea, and that needs to be considered more carefully in the future.
>
> So this is not just a +1 vote, it's a +90.

Sysinstall is in 9 and will not be removed from the 9 branch. The installer used on the release media has changed, but as far as I understand, there is nothing stopping you from running sysinstall from an installer shell or using it for post installation configuration.

Doug is only referring to the head branch (which will eventually, in ~18-24 months, become the 10 branch), so you should be able to have the best of both worlds with 9, i.e. try bsdinstall, fall back to sysinstall when you find bugs or missing features (don't forget to lodge bug reports for problems you find so that bsdinstall can be improved).

On the topic of Doug's actual question, I see minimal sense in resurrecting sysinstall in head now. I would suggest it be done much closer to (say, 6 months before) the 10.0 release cycle, if no suitable post-installation configuration tool has materialised.

In the meantime, cajole everyone who pops up saying "I really want post installation configuration support" to get involved with writing a bsdinstaller-like script (I think it should be completely separate to bsdinstaller, but perhaps use the same backend shell script functions/infrastructure) to do the job.

Cheers,
Lawrence
Re: Removal of sysinstall from HEAD and lack of a post-install configuration tool
On 12/28/11 06:29, Doug Barton wrote: On 12/27/2011 03:48, Lawrence Stewart wrote: On the topic of Doug's actual question, I see minimal sense in resurrecting sysinstall in head now. I would suggest it be done much closer to (say, 6 months before) the 10.0 release cycle, if no suitable post-installation configuration tool has materialised. My concern about that approach is that 9.0 hasn't even been released yet and we've already seen changes that are going to make it hard to resurrect sysinstall if that's the decision we come to. Waiting another year or 2 would make it impossible. Which changes are you referring to? I would have thought a reverse merge to undo the deletion of the sysinstall and old libdialog sources would be very minimal work. We'd also probably need a few extra build system changes to make sure old libdialog is perhaps statically compiled into sysinstall as it would be the only in-tree consumer, but that's not hard either. I may be lacking some imagination, but don't really see why it would become harder the longer we wait. Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?
On 12/08/11 05:08, Luigi Rizzo wrote: On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote: On 06.12.2011 22:06, Luigi Rizzo wrote: ... Even in my experiments there is a lot of instability in the results. I don't know exactly where the problem is, but the high number of read syscalls, and the huge impact of setting interrupt_rate=0 (defaults at 16us on the ixgbe) makes me think that there is something that needs investigation in the protocol stack. Of course we don't want to optimize specifically for the one-flow-at-10G case, but devising something that makes the system less affected by short timing variations, and can pass upstream interrupt mitigation delays would help. I'm not sure the variance is only coming from the network card and driver side of things. The TCP processing and interactions with scheduler and locking probably play a big role as well. There have been many changes to TCP recently and maybe an inefficiency that affects high-speed single sessions throughput has crept in. That's difficult to debug though. I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which seems slightly faster than HEAD) using MTU=1500 and various combinations of card capabilities (hwcsum,tso,lro), different window sizes and interrupt mitigation configurations. default latency is 16us, l=0 means no interrupt mitigation. lro is the software implementation of lro (tcp_lro.c) hwlro is the hardware one (on 82599). Using a window of 100 Kbytes seems to give the best results. Summary: [snip] - enabling software lro on the transmit side actually slows down the throughput (4-5Gbit/s instead of 8.0). I am not sure why (perhaps acks are delayed too much) ? Adding a couple of lines in tcp_lro to reject pure acks seems to have much better effect. The tcp_lro patch below might actually be useful also for other cards. 
--- tcp_lro.c (revision 228284)
+++ tcp_lro.c (working copy)
@@ -245,6 +250,8 @@
 ip_len = ntohs(ip->ip_len);
 tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
+ if (tcp_data_len == 0)
+ 	return -1; /* not on ack */
 /*

There is a bug with our LRO implementation (first noticed by Jeff Roberson) that I started fixing some time back but dropped the ball on. The crux of the problem is that we currently only send an ACK for the entire LRO chunk instead of all the segments contained therein. Given that most stacks rely on the ACK clock to keep things ticking over, the current behaviour kills performance. It may well be the cause of the performance loss you have observed.

WIP patch is at: http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO capable hardware setup locally to figure out what I've missed. Most of the machines in my lab are running em(4) NICs which don't support LRO, but I'll see if I can find something which does and perhaps resurrect this patch. If anyone has any ideas what I'm missing in the patch to make it work, please let me know.

Cheers,
Lawrence
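To make the check in Luigi's diff a little more concrete, here is a small userland sketch of the same payload-length arithmetic. It is not the tcp_lro.c code: the names tcp_payload_len() and is_pure_ack() and the IP_HDR_LEN constant are made up for illustration, and the fixed 20-byte IP header is an assumption that mirrors the sizeof (*ip) term in the diff (i.e. no IP options).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: IPv4 header without options, matching the sizeof (*ip)
 * term in the diff. Hypothetical constant, not from tcp_lro.c.
 */
#define IP_HDR_LEN	20

/*
 * Payload bytes carried by a TCP segment.
 * ip_len is the IP total length in host byte order, th_off is the TCP
 * data offset in 32-bit words, so (th_off << 2) is the TCP header size
 * in bytes.
 */
static int
tcp_payload_len(uint16_t ip_len, uint8_t th_off)
{
	assert(th_off >= 5);	/* minimum legal TCP header is 5 words */
	return ((int)ip_len - ((int)th_off << 2) - IP_HDR_LEN);
}

/* A segment with no payload is a pure ACK (or other bare control segment). */
static int
is_pure_ack(uint16_t ip_len, uint8_t th_off)
{
	return (tcp_payload_len(ip_len, th_off) == 0);
}
```

For a 40-byte packet with th_off == 5 the payload length comes out as zero, which is the case the two added lines in the diff refuse to aggregate.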
Re: 9.0-RC1 panic in tcp_input: negative window.
On 10/26/11 22:53, John Baldwin wrote:
On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote:
On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote:
On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:

My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover.

Shipped kernels have all assertions turned off.

Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of e.g. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible.

No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'.

In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and its ACK is delayed, but the next packet to arrive then trips over this assumption.

Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for:

tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));

I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled.

I added debug to the places where rcv_adv and rcv_nxt are modified. Here is what happens before the panic occurs:

tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221
panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221

I can send you the full log if you want, I've plenty of messages where rcv_adv < rcv_nxt, not all of them trigger this assertion.

The assertion would be triggered when the next packet arrives (as I said above). Try modifying your debugging output to also log if the ACK is delayed. I suspect it is not delayed until the last one. (Pushing out an ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case of an immediate ACK, rcv_nxt > rcv_adv is only a transient condition all under a single lock invocation so never visible to other consumers of the protocol control block.)

If that is what you see, then that confirms what I guessed above and I will likely just remove the assertion in tcp_input() and patch the timewait code to handle this case. Pawel, have you been able to confirm John's hypothesis? What I don't quite get is why we haven't had a lot more reports of this issue... Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
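As an aside for anyone following the arithmetic in this thread, the window calculation John quotes can be tried out in userland with the numbers from Pawel's final log line. This is only a sketch under stated assumptions: seq_diff() and recv_window() are hypothetical names, imax() here is a local stand-in for the kernel helper of the same name, and the cast to a signed 32-bit value relies on the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's imax(); defined here for illustration. */
static int
imax(int a, int b)
{
	return (a > b ? a : b);
}

/*
 * Signed distance between two 32-bit TCP sequence numbers. The unsigned
 * subtraction wraps modulo 2^32, and the cast reinterprets the result as
 * a signed value, so the answer stays correct across sequence wrap.
 */
static int
seq_diff(uint32_t a, uint32_t b)
{
	return ((int)(int32_t)(a - b));
}

/*
 * Sketch of: rcv_wnd = imax(win, (int)(rcv_adv - rcv_nxt)).
 * A negative (rcv_adv - rcv_nxt) simply loses to the non-negative win.
 */
static int
recv_window(int win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
	assert(win >= 0);	/* the thread states win is never negative */
	return (imax(win, seq_diff(rcv_adv, rcv_nxt)));
}
```

With rcv_adv 4022394342, rcv_nxt 4022394563 and win 0, seq_diff() gives -221 (matching the "diff -221" in the log) and recv_window() gives 0, which is the behaviour the "we already handle this case ok" remark relies on.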
Re: 9.0-RC1 panic in tcp_input: negative window.
On 10/22/11 19:49, Pawel Jakub Dawidek wrote: The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg ewww that is not good. Can you give us any more information about the machine and what it's doing? Is it terminating TCP connections from the internet at large or only local LAN (i.e. is there likely to be packet loss happening)? Are you doing TSO or LRO? Do you have any non-default tuning in place? Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Sense fetching [Was: cdrtools /devel ...]
On 11/13/10 20:34, Alexander Motin wrote: Brandon Gooch wrote: 2010/11/5 Alexander Motin m...@freebsd.org: Hi. I've reviewed tests that scgcheck does to SCSI subsystem. It shown combination of several issues in both CAM, ahci(4) and cdrtools itself. Several small patches allow us to pass most of that tests: http://people.freebsd.org/~mav/sense/ ahci_resid.patch: Add support for reporting residual length on data underrun. SCSI commands often returns results shorter then expected. Returned value allows application to know/check how much data it really has. It is also important for sense fetching, as ATAPI and USB devices return sense as data in response to REQUEST_SENSE command. sense_resid.patch: When manually requesting sense data (ATAPI or USB), request only as much data as user requested (not the fixed structure size), and return respective sense residual length. pass_autosence.patch: Unless CAM_DIS_AUTOSENSE is set, always fetch sense if not done by SIM, independently of CAM_PASS_ERR_RECOVER. As soon as device freeze released before returning to user-level, user-level application by definition can't reliably fetch sense data if some other application (like hald) tries to access device same time. cdrtools.patch: Make libscg (part of cdrtools) on FreeBSD to submit wanted sense length to CAM and do not clear sense return buffer. It is mostly cosmetics, important probably only for scgcheck. Testers and reviewers welcome. I am especially interested in opinion about pass_autosence.patch -- may be we should lower sense fetching even deeper, to make it work for all cam_periph_runccb() consumers. Hey mav, sorry to chime in after so long here, but have some of these patches been committed (as of r215179)? Which patches are still applicable for testing? I assume the cdrtools patch for sure... Now uncommitted pass_autosence.patch and possibly cdrtools.patch. 
To add another data point, I just applied the pass_autosence.patch to my ahci enabled 8.2-STABLE r220153 kernel and I can now burn successfully with cdrecord. The same kernel without the patch was unable to burn (though it could erase disks ok). Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [HEADS UP] Significant TCP work committed to head - CUBIC H-TCP committed
On 11/12/10 20:35, Lawrence Stewart wrote: Hi All, A quick note that this evening, I made the first in a series of upcoming commits to head that modify the TCP stack fairly significantly. I have no reason to believe you'll notice any issues, but TCP is a complex beast and it's possible things might crop up. The changes are mostly related to congestion control, so the sorts of issues that are likely to crop up if any will most probably be subtle and difficult to even detect. The first svn revision in question is r215166. The next few commits I plan to make will be basically zero impact and then another significant patch will follow in a few weeks. If you bump into an issue that you think might be related to this work, please roll back r215166 from your tree and attempt to reporoduce before reporting the problem. Please CC me directly with your problem report and post to freebsd-current@ or freebsd-net@ as well. Lots more information about what all this does and how to use it will be following in the coming weeks, but in the meantime, just keep this note in the back of your mind. For the curious, some information about the project is available at [1,2]. Cheers, Lawrence [1] http://caia.swin.edu.au/freebsd/5cc/ [2] http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD After a rather arduous couple of weeks grappling with VIMAGE related bugs, intermittently failing testbed hardware and various algorithm ambiguities, the next chunk of work has finally landed in head. Kernel modules implementing the CUBIC and H-TCP congestion control algorithms are now built/installed during a make kernel. I should stress that everything other than NewReno is considered experimental at this stage in an IRTF/IETF specification sense, and as such I would strongly advise against setting the system default algorithm to anything other than NewReno. The TCP_CONGESTION setsockopt call (used by e.g. 
iperf -Z) is the appropriate way to test an algorithm on an individual connection.

For those interested in taking the algorithms for a spin, the easiest way is probably to use benchmarks/iperf from ports on a source/sink machine and do the following:

- On the data sink (receiver)

  cd /usr/ports/benchmarks/iperf
  fetch http://caia.swin.edu.au/urp/newtcp/tools/caia_iperf204_1.1.patch
  mv caia_iperf204_1.1.patch files/patch-caiaiperf
  make install clean
  sysctl kern.ipc.maxsockbuf=1048576
  iperf -s -j 256k -k 256k

- On the data source (sender)

  cd /usr/ports/benchmarks/iperf
  fetch http://caia.swin.edu.au/urp/newtcp/tools/caia_iperf204_1.1.patch
  mv caia_iperf204_1.1.patch files/patch-caiaiperf
  make install clean
  kldload cc_cubic cc_htcp
  sysctl kern.ipc.maxsockbuf=1048576
  iperf -c data_sink_ip -j 256k -k 256k -Z algo

  (where algo is one from the list reported by sysctl net.inet.tcp.cc.available)

You may need to fiddle with the above parameters a bit depending on your setup. You will want decent bandwidth (5+Mbps should be ok) and a moderate to large RTT (50+ms) between both hosts if you want to see these algorithms really shine. You can use dummynet on the data source machine to easily introduce artificial bw/delay/queuing e.g.

  ipfw pipe 1 config noerror bw 10Mbps delay 20ms queue 100Kbytes
  ipfw add 10 pipe 1 ip from me to data_sink_ip dst-port 5001

Be careful to do the above via console access or stick "options IPFIREWALL" and "options IPFIREWALL_DEFAULT_TO_ACCEPT" in your kernel config to avoid locking yourself out (dummynet needs IPFW to work).

For the really interested (by now I suspect my audience is down to 0, but still), you might want to load siftr and enable/disable it during each test run and make your very own plot of cwnd vs time to see what's really going on behind the scenes.

Ok that's enough for now, but much more is on the way. Please let me know if you have any feedback or run into any problems related to this work.

Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
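For readers who would rather poke at the TCP_CONGESTION socket option directly than go through the patched iperf, a minimal sketch follows. The helper names set_cc_algo() and get_cc_algo() and the CC_NAME_BUFSZ constant are assumptions made up for this example; only setsockopt(2)/getsockopt(2) with IPPROTO_TCP and TCP_CONGESTION come from the system headers. The algorithm name passed in should be one the running kernel actually reports as available.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical buffer size for this sketch; comfortably larger than any name. */
#define CC_NAME_BUFSZ	32

/*
 * Ask the stack to use the named congestion control algorithm on socket s.
 * Returns 0 on success, -1 (with errno set) if the name is not accepted.
 */
static int
set_cc_algo(int s, const char *algo)
{
	assert(algo != NULL);
	return (setsockopt(s, IPPROTO_TCP, TCP_CONGESTION, algo,
	    (socklen_t)strlen(algo)));
}

/*
 * Read back the algorithm currently attached to socket s.
 * The buffer is zeroed first and one byte is held back, so the result is
 * always NUL-terminated regardless of how the kernel pads the name.
 */
static int
get_cc_algo(int s, char *buf, size_t buflen)
{
	socklen_t len;

	assert(buf != NULL && buflen > 1);
	memset(buf, 0, buflen);
	len = (socklen_t)(buflen - 1);
	return (getsockopt(s, IPPROTO_TCP, TCP_CONGESTION, buf, &len));
}
```

A typical use would be to create a TCP socket, call set_cc_algo(s, "cubic") after loading the module, and then confirm with get_cc_algo() before connecting; the system-wide default is left alone, which is the point of testing per connection.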
Re: [HEADS UP] Significant TCP work committed to head - CUBIC H-TCP committed
Hi Ivan,

On 12/03/10 00:07, Ivan Voras wrote:
> On 12/02/10 12:53, Lawrence Stewart wrote:
>> For the really interested (by now I suspect my audience is down to 0, but still), you might want to load siftr and enable/disable it during each test run and make your very own plot of cwnd vs time to see what's really going on behind the scenes.
>>
>> Ok that's enough for now, but much more is on the way. Please let me know if you have any feedback or run into any problems related to this work.
>
> Hi,
>
> My question isn't very constructive but I'd like to know more about this topic. Have you seen this:
>
> http://blog.benstrong.com/2010/11/google-and-microsoft-cheat-on-slow.html
> http://developers.slashdot.org/story/10/11/26/1729218/Google-Microsoft-Cheat-On-Slow-Start-mdash-Should-You

Yes I'd seen the first one and just skimmed the slashdot thread now.

> ?
>
> In short: is the existence of slow-start a property of (New)Reno and

No, mostly unrelated. Slow start is one of 4 separate but related algorithms which control a TCP flow's behaviour during startup and general operation. See RFC5681 for useful discussion of the algorithms.

NewReno unfortunately is an overloaded term. In congestion control circles, NewReno is used to refer to the congestion avoidance behaviour of "increase cwnd by 1 max seg size per RTT" and "backoff cwnd by half when congestion (3 dup ACKs) is detected" (which is the same basic behaviour as Reno BTW). NewReno also refers to a set of tweaks (RFC3782) to TCP's fast recovery algorithm (helps recover from multiple losses in a window when SACK isn't available).

> will some of the new algorithms make it less cautious, i.e. faster? I don't think it's critical but I'm often noticing it, especially on bulk transfers over LAN.

With respect to slow start, no. Congestion control algorithms tend to focus on the increase/decrease of cwnd during congestion avoidance mode, which is transitioned to after slow start completes. Slow start is left untouched.

There are proposals to modify/replace slow start e.g. RFC4782 and 'JumpStart' [1]. The reason Google and Microsoft are fiddling with things is that they typically only need to push a small amount of data, so waiting for slow start to complete eats up unnecessary RTTs. Google are pushing in the IETF at the moment to have the initial window bumped to 10 segments (see the tcpm, iccrg and tmrg IRTF/IETF mailing lists if interested). There is some push back happening though and the discussions are interesting.

Cheers,
Lawrence

[1] http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf
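The distinction drawn above between slow start and the Reno-style congestion avoidance behaviour can be sketched in a few lines of C. This is a deliberately simplified userland model in the spirit of RFC5681, not FreeBSD's implementation: struct cc_state, on_ack() and on_congestion() are hypothetical names, cwnd is used where the RFC uses FlightSize, and fast recovery is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state for this sketch; all values in bytes. */
struct cc_state {
	uint32_t cwnd;		/* congestion window */
	uint32_t ssthresh;	/* slow start threshold */
	uint32_t mss;		/* maximum segment size */
};

/* Called once for each ACK that acknowledges new data. */
static void
on_ack(struct cc_state *cc)
{
	uint64_t incr;

	assert(cc->mss > 0 && cc->cwnd > 0);
	if (cc->cwnd < cc->ssthresh) {
		/* Slow start: +1 MSS per ACK, roughly doubling cwnd per RTT. */
		cc->cwnd += cc->mss;
	} else {
		/*
		 * Congestion avoidance: mss*mss/cwnd per ACK adds up to
		 * about 1 MSS per RTT.
		 */
		incr = (uint64_t)cc->mss * cc->mss / cc->cwnd;
		if (incr == 0)
			incr = 1;
		cc->cwnd += (uint32_t)incr;
	}
}

/* Called when congestion is detected (e.g. 3 duplicate ACKs). */
static void
on_congestion(struct cc_state *cc)
{
	uint32_t half = cc->cwnd / 2;
	uint32_t min_ssthresh = 2 * cc->mss;

	/* Back off by half, but never below two segments. */
	cc->ssthresh = (half > min_ssthresh ? half : min_ssthresh);
	cc->cwnd = cc->ssthresh;
}
```

Algorithms such as CUBIC and H-TCP change what happens in the congestion avoidance branch and in the backoff; the slow start branch is the part they leave alone.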
Re: [HEADS UP] Significant TCP work committed to head - VIMAGE users
On 11/12/10 20:35, Lawrence Stewart wrote: Hi All, A quick note that this evening, I made the first in a series of upcoming commits to head that modify the TCP stack fairly significantly. I have no reason to believe you'll notice any issues, but TCP is a complex beast and it's possible things might crop up. The changes are mostly related to congestion control, so the sorts of issues that are likely to crop up if any will most probably be subtle and difficult to even detect. The first svn revision in question is r215166. The next few commits I plan to make will be basically zero impact and then another significant patch will follow in a few weeks. If you bump into an issue that you think might be related to this work, please roll back r215166 from your tree and attempt to reporoduce before reporting the problem. Please CC me directly with your problem report and post to freebsd-current@ or freebsd-net@ as well. Lots more information about what all this does and how to use it will be following in the coming weeks, but in the meantime, just keep this note in the back of your mind. For the curious, some information about the project is available at [1,2]. Cheers, Lawrence [1] http://caia.swin.edu.au/freebsd/5cc/ [2] http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD For any VIMAGE users running head, please note that r215166 overlooked some important VIMAGE related issues and actually triggers a kernel panic when the first vnet is brought up (see [3] for details). Please ensure you update to r215395 or later to ensure you have all the patches I committed to address the VIMAGE deficiencies in the original r215166 commit. Cheers, Lawrence [3] http://lists.freebsd.org/pipermail/svn-src-head/2010-November/022381.html ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
[HEADS UP] Significant TCP work committed to head
Hi All,

A quick note that this evening, I made the first in a series of upcoming commits to head that modify the TCP stack fairly significantly. I have no reason to believe you'll notice any issues, but TCP is a complex beast and it's possible things might crop up. The changes are mostly related to congestion control, so the sorts of issues that are likely to crop up, if any, will most probably be subtle and difficult to even detect.

The first svn revision in question is r215166. The next few commits I plan to make will be basically zero impact and then another significant patch will follow in a few weeks.

If you bump into an issue that you think might be related to this work, please roll back r215166 from your tree and attempt to reproduce before reporting the problem. Please CC me directly with your problem report and post to freebsd-current@ or freebsd-net@ as well.

Lots more information about what all this does and how to use it will be following in the coming weeks, but in the meantime, just keep this note in the back of your mind. For the curious, some information about the project is available at [1,2].

Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/5cc/
[2] http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD
Re: [HEADS UP] Significant TCP work committed to head
On 11/13/10 04:58, Kevin Oberman wrote: Date: Fri, 12 Nov 2010 20:35:45 +1100 From: Lawrence Stewart lstew...@freebsd.org Sender: owner-freebsd-curr...@freebsd.org Hi All, A quick note that this evening, I made the first in a series of upcoming commits to head that modify the TCP stack fairly significantly. I have no reason to believe you'll notice any issues, but TCP is a complex beast and it's possible things might crop up. The changes are mostly related to congestion control, so the sorts of issues that are likely to crop up if any will most probably be subtle and difficult to even detect. The first svn revision in question is r215166. The next few commits I plan to make will be basically zero impact and then another significant patch will follow in a few weeks. If you bump into an issue that you think might be related to this work, please roll back r215166 from your tree and attempt to reporoduce before reporting the problem. Please CC me directly with your problem report and post to freebsd-current@ or freebsd-net@ as well. Lots more information about what all this does and how to use it will be following in the coming weeks, but in the meantime, just keep this note in the back of your mind. For the curious, some information about the project is available at [1,2]. Cheers, Lawrence [1] http://caia.swin.edu.au/freebsd/5cc/ [2] http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD Lawrence, Great news! I've been looking forward to having these congestion algorithms for a while and this is clearly a big step to getting there. HTCP and CUBIC should become available next week (the zero impact commits I mentioned in my email above) and then a chunk of additional infrastructure is needed in order to add our delay based algorithm modules to the tree. We anticipate having all the code in head by the time Christmas rolls around assuming no major issues crop up. Do you intend to MFC this for 8.2? No. 
I wouldn't feel comfortable unleashing this on people with only 2 weeks of soak time in head. The current MFC schedule is 3 months from yesterday, so it will be in stable/8 and hopefully stable/7 as well shortly after 8.2 and 7.4 are released. Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sysctl -a is slow
On 09/21/10 02:21, David Xu wrote: jhell wrote: On Mon, 20 Sep 2010 10:26, David Xu wrote: In Message-Id: 4c976f14.8000...@freebsd.org jhell wrote: On 09/19/2010 09:28, David Xu wrote: just typed sysctl -a on keyboard, and found it is slow, sometimes it has been stuck for a few seconds, further studied,I found it is stucked at sysctl kern.geom: %/usr/bin/time sysctl -a kern.geom kern.geom.collectstats: 1 kern.geom.debugflags: 0 kern.geom.label.debug: 0 kern.geom.label.ext2fs.enable: 1 kern.geom.label.iso9660.enable: 1 kern.geom.label.msdosfs.enable: 1 kern.geom.label.ntfs.enable: 1 kern.geom.label.reiserfs.enable: 1 kern.geom.label.ufs.enable: 1 kern.geom.label.ufsid.enable: 1 kern.geom.label.gptid.enable: 1 kern.geom.label.gpt.enable: 1 2.01 real 0.00 user 0.00 sys it seems it needs more than 2 seconds to complete. A ktrace(1) and a kdump(1) of the resulting ktrace.out file would probably help here along with uname -a. Ive seen this happen once before but do not recall what caused it. Regards good luck, Result is dumped here. http://people.freebsd.org/~davidxu/sysctl_slow.txt I think the culprit is sysctl kern.geom.confdot, which does not appear in normal output, until I check the kdump result. I tried five times, and it was blocked three times. Inspecting the output of sysctl -b kern.geom.confdot will give you what you currently have configured in the system as disks and what not through geom. If this seems to be bailing at that point, which is an opaque MIB/OID which doesn't come up other than when you use the -o switch to sysctl(1) then could you check your labels for your disks for any weird characters in the labels ? ( sysctl -bo kern.geom ) Also does this have the same effect when run in a xterm, cons25 terminal ? And same for the above but with the C, *_COUNTRY.UTF-8 or your normal locale ? ( env LANG=C sysctl kern.geom ) Looking at the output from mine there are quite a few unprintable characters present. 
Maybe these are having an impact with one of your labels.

I redirect all output to a disk file, and it still needs 1 second to complete, this machine is dual-core pentium E5500, faster than previous one which is a dual-core AMD 5000+ machine, the 5000+ needs 2 seconds to complete.

$/usr/bin/time sysctl -b kern.geom.confdot > sysctl_geom_confdot.txt
1.00 real 0.00 user 0.00 sys

the file is here: http://people.freebsd.org/~davidxu/sysctl_geom_confdot.txt

As an extra data point, running /usr/bin/time sysctl -b kern.geom.confdot repeatedly on my amd64 8.1-STABLE desktop varies between 0s and 2s. It reports 0 the majority of the time but every 5 or so runs it'll stall for 1 or 2 seconds. So the problem isn't isolated to head.

Cheers,
Lawrence
Re: sysctl -a is slow
On 09/21/10 00:01, David Xu wrote: Lawrence Stewart wrote: On 09/21/10 02:21, David Xu wrote: jhell wrote: On Mon, 20 Sep 2010 10:26, David Xu wrote: In Message-Id: 4c976f14.8000...@freebsd.org jhell wrote: On 09/19/2010 09:28, David Xu wrote: just typed sysctl -a on keyboard, and found it is slow, sometimes it has been stuck for a few seconds, further studied,I found it is stucked at sysctl kern.geom: %/usr/bin/time sysctl -a kern.geom kern.geom.collectstats: 1 kern.geom.debugflags: 0 kern.geom.label.debug: 0 kern.geom.label.ext2fs.enable: 1 kern.geom.label.iso9660.enable: 1 kern.geom.label.msdosfs.enable: 1 kern.geom.label.ntfs.enable: 1 kern.geom.label.reiserfs.enable: 1 kern.geom.label.ufs.enable: 1 kern.geom.label.ufsid.enable: 1 kern.geom.label.gptid.enable: 1 kern.geom.label.gpt.enable: 1 2.01 real 0.00 user 0.00 sys it seems it needs more than 2 seconds to complete. A ktrace(1) and a kdump(1) of the resulting ktrace.out file would probably help here along with uname -a. Ive seen this happen once before but do not recall what caused it. Regards good luck, Result is dumped here. http://people.freebsd.org/~davidxu/sysctl_slow.txt I think the culprit is sysctl kern.geom.confdot, which does not appear in normal output, until I check the kdump result. I tried five times, and it was blocked three times. Inspecting the output of sysctl -b kern.geom.confdot will give you what you currently have configured in the system as disks and what not through geom. If this seems to be bailing at that point, which is an opaque MIB/OID which doesn't come up other than when you use the -o switch to sysctl(1) then could you check your labels for your disks for any weird characters in the labels ? ( sysctl -bo kern.geom ) Also does this have the same effect when run in a xterm, cons25 terminal ? And same for the above but with the C, *_COUNTRY.UTF-8 or your normal locale ? 
( env LANG=C sysctl kern.geom )

Looking at the output from mine there are quite a few unprintable characters present. Maybe these are having an impact with one of your labels.

I redirected all output to a disk file, and it still needs 1 second to complete. This machine is a dual-core Pentium E5500, faster than the previous one, a dual-core AMD 5000+, which needs 2 seconds to complete.

$ /usr/bin/time sysctl -b kern.geom.confdot > sysctl_geom_confdot.txt
1.00 real 0.00 user 0.00 sys

The file is here: http://people.freebsd.org/~davidxu/sysctl_geom_confdot.txt

As an extra data point, running /usr/bin/time sysctl -b kern.geom.confdot repeatedly on my amd64 8.1-STABLE desktop varies between 0s and 2s. It reports 0 the majority of the time, but every 5 or so runs it'll stall for 1 or 2 seconds. So the problem isn't isolated to head.

Cheers,
Lawrence

I happened to set kern.sched.preempt_thresh=200, so the kernel is more aggressive than the default on thread preemption, which makes the problem easier to reproduce. My desktop machine is idle, but it still stalls 1 or 2 seconds on the sysctl.

Heh, from /etc/sysctl.conf on the machine I tested with:

# 4/9/2010
# should give more responsiveness on desktop suggested
# by David Xu davi...@freebsd.org on freebsd-stable@
kern.sched.preempt_thresh=220

This machine is my primary kde4 desktop at home.

Cheers,
Lawrence
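The "every 5 or so runs it stalls" observation above is easier to compare between machines if the repeated timings are summarised the same way each time. A minimal sketch in C, assuming the per-run "real" times have already been collected by hand (for example from repeated /usr/bin/time output); the struct and function names are made up for illustration and are not part of any FreeBSD tool:

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated wall-clock timings of one command. */
struct stall_summary {
	size_t runs;    /* number of samples examined */
	size_t stalls;  /* samples at or above the threshold */
	double worst;   /* largest sample seen, in seconds */
};

/*
 * Classify each sample (seconds of "real" time from one run) as a stall
 * if it is >= threshold. The samples are supplied by the caller; this
 * function does not run or time anything itself.
 */
struct stall_summary
summarise_stalls(const double *samples, size_t n, double threshold)
{
	struct stall_summary s = { n, 0, 0.0 };
	size_t i;

	assert(samples != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (samples[i] >= threshold)
			s.stalls++;
		if (samples[i] > s.worst)
			s.worst = samples[i];
	}
	return s;
}
```

Feeding it ten samples where two runs took 1 and 2 seconds and using a 1.0 second threshold would report 2 stalls out of 10 with a worst case of 2.0, which is roughly the pattern described in this thread.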
Re: Crash during boot of current (rev 212885)
Hiya Randall!

On 09/20/10 08:56, Randall Stewart wrote:

Hey all: I am now seeing a crash when I boot my Intel (in 64 bit mode)... It's very early in the boot process.. and thus no crash dump ;-0

It's in netisr_start_swi(). When it initializes netisr_mtx with a mtx_init() it crashes saying that netisr_mtx is unaligned... (the address ddb shows for netisr_mtx ends with c ... so it definitely is unaligned...)

Looking at the netisr_workstream structure (where netisr_mtx is) it appears to be in theory aligned right (follows 2 pointers)... so did something change the DP_CPU Define stuff to cause us to get unaligned access?

Just curious... If I don't hear from anyone I will start backing things out 1 rev at a time until I find what did it I guess ;-)

My guess would be r212647. Try backing that rev out and if it fixes things, hopefully Andriy will have some thoughts on how to fix the problem. Apologies if my guess is a red herring.

Cheers,
Lawrence
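For anyone following along, the reasoning above (an address ending in c cannot be 8-byte aligned, even though the member's offset within its structure looks fine) can be checked with a few lines of C. This is only a sketch: struct demo_workstream is a hypothetical stand-in with two pointers followed by a lock word, not the real netisr structure, and the addresses used with it are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of align (align must be a power of two). */
int
is_aligned(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical stand-in for a structure whose lock follows two pointers. */
struct demo_workstream {
	void *first;
	void *second;
	long  lock_word;  /* stands in for the mutex; not the real struct mtx */
};
```

With this layout, offsetof(struct demo_workstream, lock_word) is a multiple of the pointer size, so if the member's final address still ends in c, the misalignment would have to come from where the whole structure was placed rather than from the structure definition itself.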
Re: amd64 panic snd_hda - hdac_get_capabilities: Invalid corb size (0)
On 07/27/10 02:07, Anton Shterenlikht wrote:

On Mon, Jul 26, 2010 at 02:24:52PM +0100, Anton Shterenlikht wrote:

On amd64 r210496 I get this panic when booting a kernel with snd_hda(4). I haven't used this driver before, so can't say if this is a regression.

(copied by hand)

hdac0: ATI SB600 High Definition Audio Controller irq 16 at device 20.2 on pci0
hdac0: HDA Driver Revision: 20100226_0142
hdac0: [ITHREAD]
hdac0: hdac_get_capabilities: Invalid corb size (0)
device_attach: hdac0 attach returned 6
Slab at 0xff000261eb18, freei 3 = 0
panic: Duplicate free of item 0xff0002661c00 from zone 0xff00b7f9a500(1024)
cpuid = 0
KDB: enter: panic
[ thread pid 0 tid 10 ]
Stopped at kdb_enter+0x3d: movq $0,0x74f360(%rip)
db> bt
(very long output.. ending in)
mi_startup() at mi_startup+0x59
btext() at btext+0x2c

I moved back as far as r204000, still the same panic. Please advise

I get this same panic on my Toshiba Portege R600 laptop when I boot it into Windows and then reboot it into FreeBSD. My guess is that the Windows drivers leave the hardware in a state which the FreeBSD code doesn't know how to deal with. I don't run Windows often so haven't hit this panic in a while, but the trick that always worked for me was to go into the BIOS and "Reset to defaults", then boot into FreeBSD.

Cheers,
Lawrence
Re: amd64 panic snd_hda - hdac_get_capabilities: Invalid corb size (0)
On 07/27/10 18:09, Anton Shterenlikht wrote:

On Tue, Jul 27, 2010 at 05:37:49PM +1000, Lawrence Stewart wrote:

On 07/27/10 02:07, Anton Shterenlikht wrote:

On Mon, Jul 26, 2010 at 02:24:52PM +0100, Anton Shterenlikht wrote:

On amd64 r210496 I get this panic when booting a kernel with snd_hda(4). I haven't used this driver before, so can't say if this is a regression.

(copied by hand)

hdac0: ATI SB600 High Definition Audio Controller irq 16 at device 20.2 on pci0
hdac0: HDA Driver Revision: 20100226_0142
hdac0: [ITHREAD]
hdac0: hdac_get_capabilities: Invalid corb size (0)
device_attach: hdac0 attach returned 6
Slab at 0xff000261eb18, freei 3 = 0
panic: Duplicate free of item 0xff0002661c00 from zone 0xff00b7f9a500(1024)
cpuid = 0
KDB: enter: panic
[ thread pid 0 tid 10 ]
Stopped at kdb_enter+0x3d: movq $0,0x74f360(%rip)
db> bt
(very long output.. ending in)
mi_startup() at mi_startup+0x59
btext() at btext+0x2c

I moved back as far as r204000, still the same panic. Please advise

I get this same panic on my Toshiba Portege R600 laptop when I boot it into Windows and then reboot it into FreeBSD. My guess is that the Windows drivers leave the hardware in a state which the FreeBSD code doesn't know how to deal with. I don't run Windows often so haven't hit this panic in a while, but the trick that always worked for me was to go into the BIOS and "Reset to defaults", then boot into FreeBSD.

no, that doesn't help, still the same panic. Also, I've only FBSD installed on this laptop (HP Compaq 6715s), no other OS.

Hmm, I'll have to try the patch and see if it resolves the issue for me. I guess in my case resetting the BIOS was causing a different code path to be taken and thus the panic never triggered. Good to hear it's resolved for you though.

Cheers,
Lawrence
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
On 06/28/10 18:56, Lawrence Stewart wrote:

Hi again,

After my most recent appeal for testers, I received some excellent feedback and thank everyone that has tried the patch. I've ironed out a couple of bugs and have what I hope is the import-ready candidate patch available for a final round of testing. Please read on if you are able and willing to (re)test the code.

[snip]

I've committed SIFTR to head as r209662, with r209665 as a minor follow-up fix to include the man page in the build. Sincere thanks to everyone that pitched in with review/testing and if you haven't already tried it, give it a spin next time you update your sources to r209665 or later - "man siftr" will get you going.

Please CC me explicitly on any mail regarding problems with SIFTR.

On the off chance anyone is looking for some self-contained, small projects/patches to work on, I have plenty of additional ideas for improvements to SIFTR. I'd be very happy to collaborate with anyone that was interested enough to work on the code.

Enjoy!

Cheers,
Lawrence
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
Hi again, After my most recent appeal for testers, I received some excellent feedback and thank everyone that has tried the patch. I've ironed out a couple of bugs and have what I hope is the import-ready candidate patch available for a final round of testing. Please read on if you are able and willing to (re)test the code. On 06/19/10 13:27, Lawrence Stewart wrote: Amount of feedback received thus far: nichts, nil, nada *sings I'm so ronery in his best Kim Jong-il voice* [4] Just like Uncle Sam [5], Uncle Lawrence needs you too - yes, I'm pointing at YOU! More specifically, people out there running current with 10-15 mins to spare for some testing, please read on. On 06/13/10 18:12, Lawrence Stewart wrote: Hi all, The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered. SIFTR is a kernel module that logs a range of statistics on active TCP connections to a log file. It provides the ability to make highly granular measurements of TCP connection state, aimed at system administrators, developers and researchers. You can use the data to find bugs in the stack, understand why connections are performing badly and test new code to name a few uses. Development has been made possible in part by grants from the Cisco University Research Program Fund at Community Foundation Silicon Valley, and the FreeBSD Foundation. Bringing it into FreeBSD proper is being carried out under the auspices of the Enhancing the FreeBSD TCP Implementation FreeBSD Foundation project. More details are available at [1,2,3]. If you can help out, please read on! 
[snip]

Latest patch, which fixes 2 bugs reported by testers and adds a bit more discussion to the man page, is available here:

http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209558.patch

Fixed bugs:

- Running SIFTR on an INVARIANTS enabled kernel with a large number of TCP flows terminating on the machine would lead to a KASSERT triggering in the ALQ framework when SIFTR was disabled.
- The SACK enabled data log message field was not being set correctly.

If you would like to test on a kernel revision older than r209558, make sure you have my r209325 diff to sys/pcpu.h applied. It is safe to apply r209325 stand alone as it is self contained and not used by any code in the tree other than SIFTR.

Please adapt the following instructions as appropriate based on the patch version you're testing.

Copy it to the root of your source tree and run the following:

patch -p1 < siftr_9.x.r209119.patch

It's a loadable kernel module so you can build it for testing like so:

cd path/to/src/sys/modules/siftr
make
kldload ./siftr.ko

(don't forget to make cleandir to remove cruft when finished testing)

It turns out that the above instructions to build the module can produce a .ko that is out of sync with your kernel in such a way that the module can load, but may blow up unexpectedly. This was observed when KTR was enabled in the running kernel. To be safe, please use the following procedure instead:

- Ensure path/to/src is the source tree that the kernel you are currently running was built from.

cd path/to/src
make buildkernel
cp /usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko /tmp
kldload /tmp/siftr.ko

Alternatively for the last 2 steps, you can "make installkernel ; shutdown -r now" after the kernel build completes and then simply "kldload siftr" as the module will be installed to /boot/kernel/ as per usual.
After applying the patch, you can read the man page by running:

man -M path/to/src/share/man siftr

If I've done a decent job, all the info you need to understand what it does and how to use it should be in the man page. I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind.

That should be enough to get the ball rolling. Thanks and I look forward to hearing from you!

Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/etcp09/
[2] http://www.freebsdfoundation.org/projects.shtml#Swinburne
[3] http://caia.swin.edu.au/urp/newtcp/
[4] http://www.youtube.com/watch?v=xh_9QhRzJEs (language warning)
[5] http://www.sonofthesouth.net/uncle-sam/images/uncle-sam-wants-you.jpg

Cheers,
Lawrence
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
On 06/22/10 04:52, Fabian Keil wrote:

Lawrence Stewart lstew...@freebsd.org wrote:

On 06/21/10 05:44, Rui Paulo wrote:

On 20 Jun 2010, at 20:36, Fabian Keil wrote:

Fabian Keil freebsd-lis...@fabiankeil.de wrote:

Fabian Keil freebsd-lis...@fabiankeil.de wrote:

My custom kernel normally doesn't have INVARIANTS and WITNESS enabled, so I'll try to enable them next.

The culprit seems to be non-default KTR settings in the kernel while loading alq as a module.

Actually whether or not alq is loaded as a module doesn't seem to matter, with:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)
options KTR_CPUMASK=0x3
options ALQ
options KTR_ALQ

enabling siftr panics the system, too.

That's probably because your module was built with different compile time options than the ones used in the kernel. These options may change structure sizes, function parameters, etc. and that easily causes panics.

Hmm, I wonder if my instructions to build SIFTR manually are causing your problems. Fabian, is the siftr.ko module you're loading built as part of a make buildkernel, or did you follow my instructions and "cd /path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko"?

The latter.

If the latter is true, perhaps try and explicitly build SIFTR as part of make buildkernel and see if loading the module built that way still triggers the panic when enabled (the module will be in /usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko or if you make installkernel it'll be in /boot/kernel/siftr.ko).

That seems to work.

Damn, well this is the first time I've encountered a problem like this whilst using SIFTR compiled standalone and I've been using it like that for almost 3 years. I guess the lack of KTR in the module build subtly influences the module in a way that allows it to load, but in a precarious way. How irritating. Rui, you were right on the money!
I will revise my testing instructions to build the module as part of a buildkernel to avoid potential problems like this. Thanks for helping get to the bottom of this and for the test feedback.

Cheers,
Lawrence
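To illustrate Rui's point that compile-time options can change structure sizes: here is a small, generic C sketch of how the same field can end up at different offsets in two builds. The struct names and the trace_cookie field are invented for the example; the thread does not say which kernel structure the KTR options actually affect.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as one build might see it with a hypothetical tracing option on. */
struct stats_with_trace {
	long trace_cookie;  /* extra field present only with the option */
	long n_in;
	long n_out;
};

/* Layout as a separately built module might see it without the option. */
struct stats_without_trace {
	long n_in;
	long n_out;
};

/* Byte distance between where each build believes n_out lives. */
size_t
n_out_offset_skew(void)
{
	return offsetof(struct stats_with_trace, n_out) -
	    offsetof(struct stats_without_trace, n_out);
}
```

A module compiled against the second layout but handed memory laid out like the first would read and write one field off, and nothing at load time would complain about it. Building the module as part of buildkernel keeps both sides on the same options.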
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
Hi Fabian,

On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart lstew...@freebsd.org wrote:

On 06/13/10 18:12, Lawrence Stewart wrote:

The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered.

I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind.

I got the following hand-transcribed panic maybe a second after sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at siftr_chkpkt+0xd0: addq $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpkt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() at tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---

So I've tracked down the line of code where the page fault is occurring:

	if (dir == PFIL_IN)
		ss->n_in++;
	else
		ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats per-cpu and is initialised at the start of the function like so:

	ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your machine. I know very little about the inner workings of the DPCPU_* macros, but I'm pretty sure the way I use them in SIFTR is correct or at least as intended.

Could you please go ahead and retest using a GENERIC kernel and see if you can reproduce? There could be something in your custom kernel causing the offsets or linker set magic used by the DPCPU bits to break which in turn is triggering this panic in SIFTR.
Whether it's your custom changes breaking DPCPU or DPCPU being fragile remains to be seen, but the good news for me is that it looks like SIFTR is off the hook :)

Cheers,
Lawrence
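For readers not familiar with the per-CPU stats pattern discussed above, here is a rough userland analogue in C. It is only a sketch: a fixed array indexed by CPU number stands in for the kernel's dynamic per-CPU storage, and DEMO_NCPU, DEMO_DIR_IN/DEMO_DIR_OUT and the demo_* functions are made-up names, not the DPCPU_* macros or the pfil constants.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NCPU 4  /* assumed CPU count for the sketch */

enum { DEMO_DIR_IN = 1, DEMO_DIR_OUT = 2 };  /* stand-ins for the pfil directions */

struct demo_stats {
	unsigned long n_in;
	unsigned long n_out;
};

/* One private copy per CPU, so counting needs no shared lock. */
static struct demo_stats demo_pcpu[DEMO_NCPU];

/* Rough analogue of fetching a pointer to this CPU's private copy. */
struct demo_stats *
demo_stats_ptr(unsigned int cpu)
{
	assert(cpu < DEMO_NCPU);
	return &demo_pcpu[cpu];
}

/* Count one packet against the given CPU's copy. */
void
demo_count_packet(unsigned int cpu, int dir)
{
	struct demo_stats *ss = demo_stats_ptr(cpu);

	if (dir == DEMO_DIR_IN)
		ss->n_in++;
	else
		ss->n_out++;
}

/* A reader sums the per-CPU copies to get the overall figure. */
unsigned long
demo_total_in(void)
{
	unsigned long total = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < DEMO_NCPU; cpu++)
		total += demo_pcpu[cpu].n_in;
	return total;
}
```

The increment only works if the pointer handed back really refers to that CPU's copy, which is why a broken offset calculation would show up as a fault on exactly this kind of statement.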
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
On 06/20/10 21:15, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/20/10 03:58, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/13/10 18:12, Lawrence Stewart wrote: The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered. I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind. I got the following hand-transcribed panic maybe a second after sysctl net.inet.siftr.enabled=1 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 [...] current process = 12 (swi4: clock) [ thread pid 12 tid 16 ] Stopped at siftr_chkpkt+0xd0: addq$0x1,0x8(%r14) db where Tracing pid 12 tid 16 td 0xff00034037e0 siftr_chkpt() at siftr_chkpkt+0xd0 pfil_run_hooks() at pfil_run_hooks+0xb4 ip_output() at ip_output+0x382 tcp_output() tcp_output+0xa41 tcp_timer_rexmt() at tcp_timer_rexmt+0x251 softclock() at softclock+0x291 intr_event_execute_handlers() at intr_event_execute_handlers+0x66 ithread_loop at ithread_loop+0x8e fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 --- So I've tracked down the line of code where the page fault is occurring: if (dir == PFIL_IN) ss-n_in++; else ss-n_out++; ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats per-cpu and is initialised at the start of the function like so: ss = DPCPU_PTR(ss); So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your machine. I know very little about the inner workings of the DPCPU_* macros, but I'm pretty sure the way I use them in SIFTR is correct or at least as intended. siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing it itself. I think if ss was NULL, the panic should already occur in siftr_chkreinject(). 
Yes but siftr_chkreinject() only dereferences ss in the exceptional case of a malloc failure or duplicate pkt. It's unlikely either case happens for you and so wouldn't trigger the panic.

To be sure I added:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index 8bc3498..b9fdfe4 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
 	if (siftr_chkreinject(*m, dir, ss))
 		goto ret;
 
+	if (ss == NULL) {
+		printf("ss is NULL");
+		ss = DPCPU_PTR(ss);
+		if (ss == NULL) {
+			printf("ss is still NULL");
+			goto ret;
+		}
+}
+
+
 	if (dir == PFIL_IN)
 		ss->n_in++;
 	else

which doesn't seem to affect the problem.

As in it still panics and the "ss is NULL" message is not printed? I would have expected to at least see "ss is NULL" printed if my hypothesis was correct... hmm.

Perhaps the way I discovered the line number at which the panic occurred was wrong. I compiled SIFTR on my amd64 dev server with CFLAGS+=-g in the SIFTR Makefile to get debug symbols, ran "objdump -Sd siftr.ko | vim -", searched for the instruction reported in the panic message i.e. "addq $0x1,0x8(%r14)" and then with a bit of trial and error, recompiled SIFTR with the line of code "volatile int blah = 0; blah = 2;" at various points in the function and looked at the change in the objdump output to pinpoint which line of C code corresponded with the addq instruction.

The "volatile int blah = 0; blah = 2;" compiles to "movl $0x0,0xffd4(%rbp)" followed immediately by "movl $0x2,0xffd4(%rbp)". When I put that code above the "if (dir == PFIL_IN)" statement I see the objdump output show the assembly code before the addq instruction and when I move it after the if statement the assembly code moves after the addq instruction.

Perhaps you could reproduce the above procedure and see if you identify the same point in the siftr_chkpkt function I did for the instruction referenced by the panic message?
Could you please go ahead and retest using a GENERIC kernel and see if you can reproduce? There could be something in your custom kernel causing the offsets or linker set magic used by the DPCPU bits to break which in turn is triggering this panic in SIFTR.

I'll retry without pf first, and with GENERIC afterwards.

Sounds good, thanks.

Cheers,
Lawrence
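For anyone wanting to try the marker trick described in this message on their own code, a minimal standalone sketch in C follows. The function and struct are invented for the example (this is not siftr_chkpkt); the only part taken from the thread is the "volatile int blah = 0; blah = 2;" marker itself.

```c
#include <assert.h>

struct marker_stats {
	unsigned long n_in;
	unsigned long n_out;
};

/*
 * Same shape as the statement under suspicion, with a marker placed
 * immediately before it. Because the marker variable is volatile, the
 * compiler is expected to keep both stores, which then show up in a
 * disassembly as two adjacent immediate stores to the same stack slot.
 */
void
marked_count(struct marker_stats *ss, int is_inbound)
{
	volatile int blah = 0;  /* marker: store of 0 ... */
	blah = 2;               /* ... followed by store of 2 */

	if (is_inbound)
		ss->n_in++;
	else
		ss->n_out++;
}
```

Compile with debug symbols, disassemble, and look for the pair of stores; moving the two marker lines above or below a statement and rebuilding shows on which side of the marker the instruction of interest lands. The exact instructions and stack offsets will differ by compiler and flags.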
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
On 06/20/10 22:28, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/20/10 21:15, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/20/10 03:58, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.orgwrote: On 06/13/10 18:12, Lawrence Stewart wrote: The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered. I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind. I got the following hand-transcribed panic maybe a second after sysctl net.inet.siftr.enabled=1 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 [...] current process = 12 (swi4: clock) [ thread pid 12 tid 16 ] Stopped at siftr_chkpkt+0xd0: addq$0x1,0x8(%r14) dbwhere Tracing pid 12 tid 16 td 0xff00034037e0 siftr_chkpt() at siftr_chkpkt+0xd0 pfil_run_hooks() at pfil_run_hooks+0xb4 ip_output() at ip_output+0x382 tcp_output() tcp_output+0xa41 tcp_timer_rexmt() at tcp_timer_rexmt+0x251 softclock() at softclock+0x291 intr_event_execute_handlers() at intr_event_execute_handlers+0x66 ithread_loop at ithread_loop+0x8e fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 --- So I've tracked down the line of code where the page fault is occurring: if (dir == PFIL_IN) ss-n_in++; else ss-n_out++; ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats per-cpu and is initialised at the start of the function like so: ss = DPCPU_PTR(ss); So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your machine. I know very little about the inner workings of the DPCPU_* macros, but I'm pretty sure the way I use them in SIFTR is correct or at least as intended. siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing it itself. 
I think if ss was NULL, the panic should already occur in siftr_chkreinject(). Yes but siftr_chkreinject() only dereferences ss in the exceptional case of a malloc failure or duplicate pkt. It's unlikely either case happens for you and so wouldn't trigger the panic. To be sure I added: diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c index 8bc3498..b9fdfe4 100644 --- a/sys/netinet/siftr.c +++ b/sys/netinet/siftr.c @@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir, if (siftr_chkreinject(*m, dir, ss)) goto ret; + if (ss == NULL) { + printf(ss is NULL); + ss = DPCPU_PTR(ss); + if (ss == NULL) { + printf(ss is still NULL); + goto ret; + } +} + + if (dir == PFIL_IN) ss-n_in++; else which doesn't seem to affect the problem. As in it still panics and the ss is NULL message is not printed? I would have expected to at least see ss is NULL printed if my hypothesis was correct... hmm. Yes, it still panics, but no message is printed. It was just pointed out to me that ss doesn't have to be NULL in order to cause the page fault (duh). It could also just be a garbage ptr which is why your print statement isn't firing. Can you trigger the panic again and look for some information along the lines of fault virtual address = ... as part of the panic info. Knowing the faulting address would be useful and may help further diagnosis. Perhaps the way I discovered the line number at which the panic occurred was wrong. I compiled SIFTR on my amd64 dev server with CFLAGS+=-g in the SIFTR Makefile to get debug symbols, ran objdump -Sd siftr.ko | vim -, searched for the instruction reported in the panic message i.e. addq $0x1,0x8(%r14) and then with a bit of trial and error, recompiled SIFTR with the line of code volatile int blah = 0; blah = 2; at various points in the function and looking at the change in the objdump output to pinpoint which line of C code corresponded with the addq instruction. 
The volatile int blah = 0; blah = 2; compiles to movl $0x0,0xffd4(%rbp) followed immediately by movl $0x2,0xffd4(%rbp). When I put that code above the if (dir == PFIL_IN) statement I see the objdump output show the assembly code before the addq instruction and when I move it after the if statement the assembly code moves after the addq instruction. That's a neat trick. Indeed, and I thank phk@ for suggesting it to me. Perhaps you could reproduce the above procedure and see if you identify the same point in the siftr_chkpkt function I did for the instruction referenced by the panic message? I do. Using: diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c index b9fdfe4..fc6bd9a 100644 --- a/sys/netinet/siftr.c +++ b/sys/netinet/siftr.c @@ -797,12 +797,15
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
On 06/20/10 23:15, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/20/10 22:28, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/20/10 21:15, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.orgwrote: On 06/20/10 03:58, Fabian Keil wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/13/10 18:12, Lawrence Stewart wrote: The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered. I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind. I got the following hand-transcribed panic maybe a second after sysctl net.inet.siftr.enabled=1 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 [...] current process = 12 (swi4: clock) [ thread pid 12 tid 16 ] Stopped at siftr_chkpkt+0xd0: addq$0x1,0x8(%r14) db where Tracing pid 12 tid 16 td 0xff00034037e0 siftr_chkpt() at siftr_chkpkt+0xd0 pfil_run_hooks() at pfil_run_hooks+0xb4 ip_output() at ip_output+0x382 tcp_output() tcp_output+0xa41 tcp_timer_rexmt() at tcp_timer_rexmt+0x251 softclock() at softclock+0x291 intr_event_execute_handlers() at intr_event_execute_handlers+0x66 ithread_loop at ithread_loop+0x8e fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 --- So I've tracked down the line of code where the page fault is occurring: if (dir == PFIL_IN) ss-n_in++; else ss-n_out++; ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats per-cpu and is initialised at the start of the function like so: ss = DPCPU_PTR(ss); So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your machine. I know very little about the inner workings of the DPCPU_* macros, but I'm pretty sure the way I use them in SIFTR is correct or at least as intended. 
siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing it itself. I think if ss was NULL, the panic should already occur in siftr_chkreinject(). Yes but siftr_chkreinject() only dereferences ss in the exceptional case of a malloc failure or duplicate pkt. It's unlikely either case happens for you and so wouldn't trigger the panic. To be sure I added: diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c index 8bc3498..b9fdfe4 100644 --- a/sys/netinet/siftr.c +++ b/sys/netinet/siftr.c @@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir, if (siftr_chkreinject(*m, dir, ss)) goto ret; + if (ss == NULL) { + printf(ss is NULL); + ss = DPCPU_PTR(ss); + if (ss == NULL) { + printf(ss is still NULL); + goto ret; + } +} + + if (dir == PFIL_IN) ss-n_in++; else which doesn't seem to affect the problem. As in it still panics and the ss is NULL message is not printed? I would have expected to at least see ss is NULL printed if my hypothesis was correct... hmm. Yes, it still panics, but no message is printed. It was just pointed out to me that ss doesn't have to be NULL in order to cause the page fault (duh). It could also just be a garbage ptr which is why your print statement isn't firing. Can you trigger the panic again and look for some information along the lines of fault virtual address = ... as part of the panic info. Knowing the faulting address would be useful and may help further diagnosis. 
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xff7f808f9de8
fault code = supervisor write data, page not present
instruction pointer = 0x20:0x8241f800
stack pointer = 0x28:0xff83a7d0
frame pointer = 0x28:0xff83a840
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0

None of this looks too crazy, but at least one person I've been chatting to about this thinks the faulting address doesn't look quite right for a DPCPU variable. Can you please get the following additional info from DDB:

show reg
show dpcpu_offset
p/x pcpu_entry_modspace

And can you also please identify the upstream FreeBSD revision number your kernel source is based on (as opposed to the GIT rev) so we can make sure we're looking at the same base sources you're running.

current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at siftr_chkpkt+0xd0: addq $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
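As a side note on reading the numbers above: for an instruction of the form disp(base), the faulting address is base plus the displacement, so the value the base register held can be worked back from the panic output. A small C sketch of that arithmetic follows, using the fault address exactly as printed in this thread (it may have been shortened by the list archive) and the 0x8 displacement from the addq instruction; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For an access to disp(base), the faulting address is base + disp,
 * so the base register held fault_addr - disp.
 */
uint64_t
base_from_fault(uint64_t fault_addr, uint64_t disp)
{
	return fault_addr - disp;
}

/* A NULL test only catches one bad value; any other bad address passes it. */
int
null_check_would_fire(uint64_t ptr_value)
{
	return ptr_value == 0;
}
```

With the values quoted here, the base works out to 0xff7f808f9de0, which is not zero. That is consistent with the added "ss is NULL" printf never firing even though the dereference still faults.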
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
On 06/21/10 00:12, Fabian Keil wrote: Fabian Keilfreebsd-lis...@fabiankeil.de wrote: Lawrence Stewartlstew...@freebsd.org wrote: On 06/20/10 22:28, Fabian Keil wrote: Taking pf (and altq) out of the picture doesn't seem to make a difference. Wouldn't have expected it to. Will be very curious to know if the panic is triggered in GENERIC. It's not. I, too, get pfil.c related LORs though: lock order reversal: 1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:77 2nd 0x80e5dd68 udp (udp) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:3035 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x81e _rw_rlock() at _rw_rlock+0x5f pf_socket_lookup() at pf_socket_lookup+0x1c5 pf_test_udp() at pf_test_udp+0x8b0 pf_test() at pf_test+0x1089 pf_check_in() at pf_check_in+0x39 pfil_run_hooks() at pfil_run_hooks+0xcf ip_input() at ip_input+0x2ae swi_net() at swi_net+0x151 intr_event_execute_handlers() at intr_event_execute_handlers+0x66 ithread_loop() at ithread_loop+0xb2 fork_exit() at fork_exit+0x12a fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff844d30, rbp = 0 --- lock order reversal: 1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:77 2nd 0x80e5d788 tcp (tcp) @ /usr/src/sys/modules/siftr/../../netinet/siftr.c:698 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x81e _rw_rlock() at _rw_rlock+0x5f siftr_chkpkt() at siftr_chkpkt+0x3c4 pfil_run_hooks() at pfil_run_hooks+0xcf ip_input() at ip_input+0x2ae swi_net() at swi_net+0x151 intr_event_execute_handlers() at intr_event_execute_handlers+0x66 ithread_loop() at ithread_loop+0xb2 fork_exit() at fork_exit+0x12a fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 
0xff844d30, rbp = 0 --- My custom kernel normally doesn't have INVARIANTS and WITNESS enabled, so I'll try to enable them next. The culprit seems to be non-default KTR settings in the kernel while loading alq as a module. With the following change siftr works with my non-GENERIC kernel, too: commit f43b8b5171c858df7b419f6a695e9e3b53531a8e Author: Fabian Keil <f...@fabiankeil.de> Date: Sun Jun 20 15:43:01 2010 +0200 Disable KTR changes. diff --git a/sys/amd64/conf/ZOEY b/sys/amd64/conf/ZOEY index 6fb3480..c584317 100644 --- a/sys/amd64/conf/ZOEY +++ b/sys/amd64/conf/ZOEY @@ -16,11 +16,11 @@ options ATA_CAM device atapicam options SC_KERNEL_CONS_ATTR=(FG_GREEN|BG_BLACK) -options KTR -options KTR_ENTRIES=262144 -options KTR_COMPILE=(KTR_SCHED) -options KTR_MASK=(KTR_SCHED) -options KTR_CPUMASK=0x3 +#options KTR +#options KTR_ENTRIES=262144 +#options KTR_COMPILE=(KTR_SCHED) +#options KTR_MASK=(KTR_SCHED) +#options KTR_CPUMASK=0x3 options ACCEPT_FILTER_HTTP makeoptions WITH_CTF=yes This smells very fishy. Without options KTR_ALQ, KTR shouldn't even care if ALQ exists or not. Not only that, but ALQ isn't even used in siftr_chkpkt and you clearly manage to successfully use ALQ to write the module load message to the log file. H... Thanks for taking the time to find the culprit though - I'll see if I can reproduce here. Could you try another thing for me and see if reducing options KTR_ENTRIES=262144 down to a smaller number (maybe 4096?) and leaving all the other KTR options as they are above (but uncommented) makes any difference? The ktr(4) man page indicates the default is 8192 entries and I'm curious whether your allocation of so many additional entries is making something unhappy. Thanks again for your time helping with this, I really appreciate it. Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
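The experiment requested above (KTR options re-enabled, KTR_ENTRIES lowered to 4096) could be prepared with a small sed filter. This is only a sketch: the function name ktr_experiment is made up for illustration, and it assumes the options lines were disabled with a single leading '#', as in the ZOEY diff.

```sh
# Hypothetical helper, not part of FreeBSD: reads a kernel config on stdin,
# uncomments "#options KTR..." lines and sets KTR_ENTRIES to the value in $1.
ktr_experiment() {
    sed -E \
        -e 's/^#(options[[:space:]]+KTR)/\1/' \
        -e "s/^(options[[:space:]]+KTR_ENTRIES=)[0-9]+/\1$1/"
}

# Example (path taken from the diff above): preview the edited config.
# ktr_experiment 4096 < /usr/src/sys/amd64/conf/ZOEY
```

The filter only writes to stdout, so the result can be reviewed before it replaces the real config. Note that it would also uncomment a disabled KTR_ALQ line if one were present.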
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
On 06/21/10 05:44, Rui Paulo wrote: On 20 Jun 2010, at 20:36, Fabian Keil wrote: Fabian Keil <freebsd-lis...@fabiankeil.de> wrote: Fabian Keil <freebsd-lis...@fabiankeil.de> wrote: My custom kernel normally doesn't have INVARIANTS and WITNESS enabled, so I'll try to enable them next. The culprit seems to be non-default KTR settings in the kernel while loading alq as a module. Actually whether or not alq is loaded as a module doesn't seem to matter, with: options KTR options KTR_ENTRIES=262144 options KTR_COMPILE=(KTR_SCHED) options KTR_MASK=(KTR_SCHED) options KTR_CPUMASK=0x3 options ALQ options KTR_ALQ enabling siftr panics the system, too. That's probably because your module was built with different compile time options than the ones used in the kernel. These options may change structure sizes, function parameters, etc. and that easily causes panics. Hmm, I wonder if my instructions to build SIFTR manually are causing your problems. Fabian, is the siftr.ko module you're loading built as part of a make buildkernel, or did you follow my instructions and cd /path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko? If the latter is true, perhaps try to explicitly build SIFTR as part of make buildkernel and see if loading the module built that way still triggers the panic when enabled (the module will be in /usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko or if you make installkernel it'll be in /boot/kernel/siftr.ko). Cheers, Lawrence
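To make the object-tree location mentioned above concrete, here is a minimal sketch that expands the path template for a given source tree and kernel config. The helper name siftr_obj_path is hypothetical, and the layout is simply the one described in the message; other FreeBSD versions may arrange /usr/obj differently.

```sh
# Hypothetical helper: print where buildkernel is expected to leave siftr.ko,
# following the /usr/obj layout described in the message above.
siftr_obj_path() {
    src="$1"                 # absolute path to the source tree
    kernconf="$2"            # kernel config name, e.g. ZOEY
    objdir="${3:-/usr/obj}"  # object tree prefix (assumed default)
    echo "${objdir}${src}/sys/${kernconf}/modules${src}/sys/modules/siftr/siftr.ko"
}

siftr_obj_path /usr/src ZOEY
```

Loading the file at that path with kldload, rather than the hand-built ./siftr.ko, is the comparison being asked for.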
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
Hi Lev, On 06/19/10 16:26, Lev Serebryakov wrote: Hello, Lawrence. You wrote on 19 June 2010, 07:27:30: Amount of feedback received thus far: nichts, nil, nada I wanted to help you, but there is one problem: I don't have any traffic-loaded 9-CURRENT machines. I have some not-so-critical 7.x and 8.x machines with noticeable traffic (for example, my torrent box still runs 7-STABLE), but no 9-CURRENT except VMware on my desktop :( I think this is a common case: 9-CURRENT machines are developers' machines, without a noticeable amount of network traffic, and all traffic-loaded machines run more stable versions. Right now the traffic load of the test machine is not really all that important to the testing. As long as the module loads, logs some coherent looking data whilst enabled and unloads across a range of different hardware and kernel archs, I'll be happy. SIFTR will be backported to 8 and possibly 7 also, so there will be plenty of time to get people with more heavily loaded systems running stable branches to join in testing. This is the first real push I've made to get the code widely tested, so I wouldn't feel comfortable asking people to run it on (semi-)production, stable branch systems yet. If you're really keen to help test it and you wouldn't be worried about running the code on such a system, I would be happy to create a 7 and/or 8 backport of the required bits. Otherwise, I'm happy to get the initial round of 9-CURRENT only testing feedback, commit it to head and then revisit once it's settled and time to merge it back to the stable branches. Cheers, Lawrence
Re: [CFT] SIFTR - Statistical Information For TCP Research
Hi Pluknet, On 06/19/10 18:48, pluknet wrote: [snip] Hi. I'm seeing this right after enabling siftr via sysctl and changing ppl. Sorry, if that was already discussed, known or unrelated (since em is in locking chain). lock order reversal: 1st 0x80e51568 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:77 2nd 0x80e52788 tcp (tcp) @ /usr/src/sys/modules/siftr/../../netinet/siftr.c:698 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x81e _rw_rlock() at _rw_rlock+0x5f siftr_chkpkt() at siftr_chkpkt+0x374 pfil_run_hooks() at pfil_run_hooks+0xcf ip_input() at ip_input+0x2ae netisr_dispatch_src() at netisr_dispatch_src+0xb8 ether_demux() at ether_demux+0x17d ether_input() at ether_input+0x175 em_rxeof() at em_rxeof+0x193 em_handle_que() at em_handle_que+0x4a taskqueue_run() at taskqueue_run+0x91 taskqueue_thread_loop() at taskqueue_thread_loop+0x3f fork_exit() at fork_exit+0x12a fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff8bed30, rbp = 0 --- I believe I discussed this LOR with Robert Watson some time back and we came to the conclusion it is a false positive witness report and is safe to ignore. I should document it in the man page and figure out if there's some way to tell witness to not report it. Thanks for reminding me and for testing. Did everything else behave sanely and work ok? Cheers, Lawrence
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
Hi Fabian, Thank you for the report. This is indeed an issue I've never seen before and exactly the sort of thing I wanted to uncover. On 06/20/10 03:58, Fabian Keil wrote: Lawrence Stewart <lstew...@freebsd.org> wrote: On 06/13/10 18:12, Lawrence Stewart wrote: The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered. I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind. I got the following hand-transcribed panic maybe a second after sysctl net.inet.siftr.enabled=1 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 [...] current process = 12 (swi4: clock) [ thread pid 12 tid 16 ] Stopped at siftr_chkpkt+0xd0: addq $0x1,0x8(%r14) db> where Tracing pid 12 tid 16 td 0xff00034037e0 siftr_chkpt() at siftr_chkpkt+0xd0 pfil_run_hooks() at pfil_run_hooks+0xb4 ip_output() at ip_output+0x382 tcp_output() tcp_output+0xa41 tcp_timer_rexmt() at tcp_timer_rexmt+0x251 softclock() at softclock+0x291 intr_event_execute_handlers() at intr_event_execute_handlers+0x66 ithread_loop at ithread_loop+0x8e fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 --- Hmm, I'd love to know which line of code siftr_chkpkt+0xd0 maps to. Let me read through the function carefully and see if I can spot an obvious null ptr deref. The hook function has received some major rototilling of late to get it ready for the import so I must have missed something. This is from the third attempt, the second time I got a different backtrace that also contained some *_iwn_* functions, the first time I had X running, so I didn't get anything. Unfortunately at that point the system seems to be too busted to dump core. 
Typically, packets are direct dispatched into the stack from the driver so it is normal to see driver functions in a thread's stack trace when it's executing in the siftr pfil hook. I'm using: FreeBSD 9.0-CURRENT #99 r+b768fe1: Sat Jun 19 15:01:37 CEST 2010 f...@r500.local:/usr/obj/usr/src/sys/ZOEY amd64 Timecounter i8254 frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz (1995.01-MHz K8-class CPU) Origin = GenuineIntel Id = 0x6fd Family = 6 Model = f Stepping = 13 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0xe39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM> AMD Features=0x20100800<SYSCALL,NX,LM> AMD Features2=0x1<LAHF> TSC: P-state invariant real memory = 2147483648 (2048 MB) avail memory = 1976610816 (1885 MB) ACPI APIC Table: <LENOVO TP-7Y> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs FreeBSD/SMP: 1 package(s) x 2 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Changing APIC ID to 1 ioapic0 <Version 2.0> irqs 0-23 on motherboard I'm not using vanilla sources, but none of the modifications should matter here. Yes, this does not look like an issue with your sources but with the siftr code itself. Don't bother testing with GENERIC yet as I'm confident you've given me enough info to track this down. I have powerd running and did not yet try without it. The system has bge0 and iwn0, but bge0 is mainly down. pf is compiled into the kernel, siftr is loaded as a module. 
The panic seems to occur without logging a single packet first: f...@r500 ~ $ cat /var/log/siftr.log enable_time_secs=1276966161 enable_time_usecs=945080 siftrver=1.2.3 hz=100 tcp_rtt_scale=32 sysname=FreeBSD sysver=900014 ipmode=4 enable_time_secs=1276966586 enable_time_usecs=314023 siftrver=1.2.3 hz=100 tcp_rtt_scale=32 sysname=FreeBSD sysver=900014 ipmode=4 I get the impression that this is reproducible, but only tried three times (the last time with everything mounted read-only). Thanks again for the report and I'll be in touch as soon as I get a chance to look at it some more (hopefully later today). Cheers, Lawrence
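For anyone comparing enable records like the two quoted above, a single field can be pulled out with standard tools. This is a rough sketch, not something shipped with SIFTR: the helper name siftr_field is invented here, and it assumes each record is a run of key=value pairs separated by tabs or spaces.

```sh
# Hypothetical helper: print the value of key $1 from SIFTR enable records
# read on stdin. Assumes key=value pairs separated by tabs or spaces.
siftr_field() {
    tr '\t ' '\n\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Example: list the sysver recorded each time logging was enabled.
# siftr_field sysver < /var/log/siftr.log
```

Lines that are not key=value records simply produce no output, so the filter can be run over a whole log file.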
Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
Amount of feedback received thus far: nichts, nil, nada *sings I'm so ronery in his best Kim Jong-il voice* [4] Just like Uncle Sam [5], Uncle Lawrence needs you too - yes, I'm pointing at YOU! More specifically, people out there running current with 10-15 mins to spare for some testing, please read on. On 06/13/10 18:12, Lawrence Stewart wrote: Hi all, The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered. SIFTR is a kernel module that logs a range of statistics on active TCP connections to a log file. It provides the ability to make highly granular measurements of TCP connection state, aimed at system administrators, developers and researchers. You can use the data to find bugs in the stack, understand why connections are performing badly and test new code to name a few uses. Development has been made possible in part by grants from the Cisco University Research Program Fund at Community Foundation Silicon Valley, and the FreeBSD Foundation. Bringing it into FreeBSD proper is being carried out under the auspices of the Enhancing the FreeBSD TCP Implementation FreeBSD Foundation project. More details are available at [1,2,3]. If you can help out, please read on! Before continuing, make sure you're running with at least svn revision 209119 (my commit to sys/pcpu.h), or you can manually apply the r209119 diff to your earlier rev source tree. The SIFTR patch is here: http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209119.patch An updated version of the patch against svn head revision 209325 is available from: http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209325.patch There was a backwards incompatible change in the external DPCPU_SUM() macro in sys/pcpu.h in r209325 of head so SIFTR also had to be updated. Please adapt the following instructions as appropriate based on the patch version you're testing. 
Copy it to the root of your source tree and run the following: patch -p1 < siftr_9.x.r209119.patch It's a loadable kernel module so you can build it for testing like so: cd path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko (don't forget to make cleandir to remove cruft when finished testing) After applying the patch, you can read the man page by running: man -M path/to/src/share/man siftr If I've done a decent job, all the info you need to understand what it does and how to use it should be in the man page. I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind. That should be enough to get the ball rolling. Thanks and I look forward to hearing from you! Cheers, Lawrence [1] http://caia.swin.edu.au/freebsd/etcp09/ [2] http://www.freebsdfoundation.org/projects.shtml#Swinburne [3] http://caia.swin.edu.au/urp/newtcp/ [4] http://www.youtube.com/watch?v=xh_9QhRzJEs (language warning) [5] http://www.sonofthesouth.net/uncle-sam/images/uncle-sam-wants-you.jpg
[CFT] SIFTR - Statistical Information For TCP Research
Hi all, The time has come to solicit some external testing for my SIFTR tool. I'm hoping to commit it within a week or so unless problems are discovered. SIFTR is a kernel module that logs a range of statistics on active TCP connections to a log file. It provides the ability to make highly granular measurements of TCP connection state, aimed at system administrators, developers and researchers. You can use the data to find bugs in the stack, understand why connections are performing badly and test new code to name a few uses. Development has been made possible in part by grants from the Cisco University Research Program Fund at Community Foundation Silicon Valley, and the FreeBSD Foundation. Bringing it into FreeBSD proper is being carried out under the auspices of the Enhancing the FreeBSD TCP Implementation FreeBSD Foundation project. More details are available at [1,2,3]. If you can help out, please read on! Before continuing, make sure you're running with at least svn revision 209119 (my commit to sys/pcpu.h), or you can manually apply the r209119 diff to your earlier rev source tree. The SIFTR patch is here: http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209119.patch Copy it to the root of your source tree and run the following: patch -p1 < siftr_9.x.r209119.patch It's a loadable kernel module so you can build it for testing like so: cd path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko (don't forget to make cleandir to remove cruft when finished testing) After applying the patch, you can read the man page by running: man -M path/to/src/share/man siftr If I've done a decent job, all the info you need to understand what it does and how to use it should be in the man page. I'm interested in all feedback and reports of success/failure, along with details of the architecture tested and number of CPUs if you would be so kind. That should be enough to get the ball rolling. Thanks and I look forward to hearing from you! 
Cheers, Lawrence [1] http://caia.swin.edu.au/freebsd/etcp09/ [2] http://www.freebsdfoundation.org/projects.shtml#Swinburne [3] http://caia.swin.edu.au/urp/newtcp/
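The test procedure in the message above can be written out as one command sequence. The sketch below only prints the commands so they can be reviewed first. The function name siftr_test_plan and its default arguments are assumptions made for illustration; the enable/disable sysctl is the net.inet.siftr.enabled knob mentioned elsewhere in this thread, and the kldunload step is an addition for tidying up after the test.

```sh
# Hypothetical helper: print the CFT test steps for a given source tree ($1)
# and patch file name ($2). Nothing is executed; review, then run by hand.
siftr_test_plan() {
    src="${1:-/usr/src}"
    patch="${2:-siftr_9.x.r209119.patch}"
    cat <<EOF
cd $src
patch -p1 < $patch
cd $src/sys/modules/siftr
make
kldload ./siftr.ko
sysctl net.inet.siftr.enabled=1
sysctl net.inet.siftr.enabled=0
kldunload siftr
make cleandir
EOF
}

# Example: siftr_test_plan /usr/src siftr_9.x.r209325.patch
```

Between the two sysctl calls, generate some TCP traffic and check that the log file contains coherent-looking data.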
Re: [RFC] Macro to sum DPCPU vars
On 06/10/10 22:23, John Baldwin wrote: On Wednesday 09 June 2010 11:54:53 pm Lawrence Stewart wrote: Does anyone have objections to or feedback on the following patch? The macro simplifies the act of calculating an aggregate from DPCPU counters. http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/dpcpu_sum_9.x.r208900.patch If anyone is curious how you would use it, take a look at: I think this is fine, though I'm about to make it smaller. At Robert's request I've come up with some macros to iterate over CPUs to abstract out the CPU_ABSENT(), etc. bits. It is at www.freebsd.org/~jhb/patches/cpu_iter.patch Using CPU_FOREACH() should trim your macro down slightly. Nice, I'll rework my patch and commit once your new bits hit the tree. Cheers, Lawrence
Re: RFC: etcupdate tool in base?
On 06/11/10 03:46, John Baldwin wrote: I've had several folks ask me recently about importing etcupdate (http://www.FreeBSD.org/~jhb/etcupdate) into the base system as an alternate tool for updating /etc during upgrades. Do folks have any strong objections to doing so? More details about how it works and an HTML version of the manpage can be found at the URL above. +1 for adding to base (and updating handbook chapters makeworld.html and small-lan.html, plus maybe /usr/src/Makefile and an UPDATING entry). Cheers, Lawrence
[RFC] Macro to sum DPCPU vars
Does anyone have objections to or feedback on the following patch? The macro simplifies the act of calculating an aggregate from DPCPU counters. http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/dpcpu_sum_9.x.r208900.patch If anyone is curious how you would use it, take a look at: http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r208900_v2.patch and search for code that references the siftr_stats struct or DPCPU. I intend to commit the DPCPU patch in the next day or two. Cheers, Lawrence
Re: [TESTING]: ClangBSD branch needs testing before the import to HEAD
On 06/01/10 09:25, James R. Van Artsdalen wrote: [snip interesting history] I do suggest modifying the FreeBSD build process so that uname -a shows the compiler and its version for both the kernel and userland. Reading through this discussion, I wanted to draw attention to this footnote in James' email. It sounds like a sensible and useful suggestion that would go some way to addressing Kostik's concerns about knowing whether a kernel bug report was related to a gcc or clang built kernel. Cheers, Lawrence