Re: devd in r329188M don't start

2018-02-15 Thread Lawrence Stewart
On 13/02/2018 23:50, Hans Petter Selasky wrote:
> On 02/13/18 10:47, Jakob Alvermark wrote:
>> +1
>>
>> My USB mouse was working fine before the switch to devmatch. Now I
>> have to 'kldload ums' manually.
>>
>> Same for USB audio, snd_uaudio.ko was loaded by devd before.
>>
> 
> Hi,
> 
> This is a known issue.
> 
> Can you try the attached patch?
> 
> Rebuild devmatch(8) and reinstall /etc/devd/devmatch.conf and
> /etc/rc.d/devmatch only.

+1 for ums mouse breakage on a recent upgrade from head 20180201 (Git
9e57d147a97) to head 20180215 (Git 81891e10182). I will build and test
the patch on the weekend, but given the two success reports already, I
presume it will fix things for me too. I won't follow up to this thread
again unless the patch doesn't work for me. Thanks Hans.

Cheers,
Lawrence
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


[SOLVED] Re: Deterministic rescue buildworld error with custom make.conf/src.conf/MAKEOBJDIRPREFIX

2017-03-11 Thread Lawrence Stewart
On 12/03/2017 13:37, Ian Lepore wrote:
> On Sun, 2017-03-12 at 13:27 +1100, Lawrence Stewart wrote:
>> Hi Ian,
>>
>> On 12/03/2017 10:29, Ian Lepore wrote:
>>>
>>> On Sun, 2017-03-12 at 10:22 +1100, Lawrence Stewart wrote:
>>>> [...]
>>> The MAKEOBJDIRPREFIX variable must be set in the environment, not in
>>> make.conf or on the make command line (documented in build(7)).
>> Your assertion seems at odds with my past experience and my reading of
>> the man page... from build(7):
>>
>>  The build may be controlled by defining make(1) variables
>>  described in the ENVIRONMENT section below, and by the
>>  varia

Re: Deterministic rescue buildworld error with custom make.conf/src.conf/MAKEOBJDIRPREFIX

2017-03-11 Thread Lawrence Stewart
Hi Ian,

On 12/03/2017 10:29, Ian Lepore wrote:
> On Sun, 2017-03-12 at 10:22 +1100, Lawrence Stewart wrote:
>> [...]
> 
> The MAKEOBJDIRPREFIX variable must be set in the environment, not in
> make.conf or on the make command line (documented in build(7)).

Your assertion seems at odds with my past experience and my reading of
the man page... from build(7):

   The build may be controlled by defining make(1) variables
   described in the ENVIRONMENT section below, and by the
   variables documented in make.conf(5).

... which indicates they are make variables, not environment variables
specifically. As a concrete example, TARGET and DESTDIR are listed under
the "ENVIRONMENT" section of the man page, yet "EXAMPLES" shows:

   make TARGET=sparc64 buildworld
   make TARGET=sparc64 DESTDIR=/clients/sparc64 installworld

I've certainly always set build vars documented in the "ENVIRONMENT"
section of the man page on the make command line without issue. I'm
pretty sure I've also set MAKEOBJDIRPREFIX on the make command line in
the past, though perhaps it has been working for me "by accident", and a
documentation tweak is in order if the distinction you make is in fact
relevant...

Cheers,
Lawrence


Deterministic rescue buildworld error with custom make.conf/src.conf/MAKEOBJDIRPREFIX

2017-03-11 Thread Lawrence Stewart
Hi all,

I'm unable to complete buildworld with two recent svn revisions I've
tried (r314838 and r315059). I'm building for a slightly
resource-constrained production system, so I'm specifying custom
settings and a different obj tree location so that I can copy it to the
target system. The error persists after an "rm -rf /usr/obj/*" and when
parallel building is disabled.

The underlying build system, built from r314838 via a simple "make -C
/usr/src -s -j6 buildworld buildkernel", built and installed fine, so
the problem seems to lie in the use of the build customisations.

Any clues?

Cheers,
Lawrence


root@builder-head-amd64:/usr/src # cat cust_make.conf
KERNCONF=GENERIC-NODEBUG
MALLOC_PRODUCTION=YES

root@builder-head-amd64:/usr/src # cat cust_src.conf
WITHOUT_PROFILE=1

root@builder-head-amd64:/usr/src # make __MAKE_CONF=/usr/src/cust_make.conf SRCCONF=/usr/src/cust_src.conf MAKEOBJDIRPREFIX=/usr/obj/cust buildworld buildkernel
[...]
MK_AUTO_OBJ=no MK_TESTS=no  UPDATE_DEPENDFILE=no  _RECURSING_CRUNCH=1
CC="cc -target x86_64-unknown-freebsd12.0
--sysroot=/usr/obj/cust/usr/src/tmp -B/usr/obj/cust/usr/src/tmp/usr/bin
-O2 -pipe   -std=gnu99 -Qunused-arguments  "  CXX="c++  -target
x86_64-unknown-freebsd12.0 --sysroot=/usr/obj/cust/usr/src/tmp
-B/usr/obj/cust/usr/src/tmp/usr/bin -O2 -pipe -Qunused-arguments
-Wno-c++11-extensions  "  make .MAKE.MODE="normal curdirOk=yes"
.MAKE.META.IGNORE_PATHS=""  -f rescue.mk exe
cc -target x86_64-unknown-freebsd12.0
--sysroot=/usr/obj/cust/usr/src/tmp -B/usr/obj/cust/usr/src/tmp/usr/bin
-O2 -pipe   -std=gnu99 -Qunused-arguments   -nostdlib -Wl,-dc -r -o
cat.lo cat_stub.o /usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o
cc: error: no such file or directory:
'/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/cat/cat.o'
*** Error code 1

There appear to be a lot of missing .o files under the rescue obj tree:

root@builder-head-amd64:/usr/src # find /usr/obj/cust/usr/src/rescue/rescue//usr -type f
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax.o
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mksyntax
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes.o
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/sh/mknodes
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/sh.err.h
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/tc.const.h
/usr/obj/cust/usr/src/rescue/rescue//usr/src/bin/csh/gethost

compared with an obj tree on a different head system:

find /usr/obj/usr/src/rescue/rescue/usr/ -type f | wc -l
1552


Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Lawrence Stewart
On 08/27/15 17:15, Don Lewis wrote:
 On 27 Aug, Don Lewis wrote:
 On 27 Aug, Lawrence Stewart wrote:
 [...]

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Lawrence Stewart
On 08/23/15 22:54, Konstantin Belousov wrote:
 On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
 On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
 On 12/08/2015 17:11, Lawrence Stewart wrote:
 On 08/07/15 07:33, Pawel Pekala wrote:
 Hi K.,

 On 2015-08-06 12:33 -0700, K. Macy km...@freebsd.org wrote:
 Is this still happening?

 Still crashes:

 +1 for me running r286617

 Here is another +1 with r286922.
 I can add a couple of bits of debugging data:

 (kgdb) fr 8
 #8  0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
 hint=2147483648, lockflags=<value optimized out>) at
 /usr/src/sys/kern/kern_event.c:1964
 1964            } else if ((lockflags & KNF_NOKQLOCK) != 0) {
 (kgdb) p *list
 $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
 <knlist_mtx_lock>, kl_unlock = 0xffffffff8063a200 <knlist_mtx_unlock>,
   kl_assert_locked = 0xffffffff8063a220 <knlist_mtx_assert_locked>,
 kl_assert_unlocked = 0xffffffff8063a240 <knlist_mtx_assert_unlocked>,
   kl_lockarg = 0xfffff8019a733bb0}
 (kgdb) disassemble
 Dump of assembler code for function knote:
 0xffffffff80639d00 <knote+0>:   push   %rbp
 0xffffffff80639d01 <knote+1>:   mov    %rsp,%rbp
 0xffffffff80639d04 <knote+4>:   push   %r15
 0xffffffff80639d06 <knote+6>:   push   %r14
 0xffffffff80639d08 <knote+8>:   push   %r13
 0xffffffff80639d0a <knote+10>:  push   %r12
 0xffffffff80639d0c <knote+12>:  push   %rbx
 0xffffffff80639d0d <knote+13>:  sub    $0x18,%rsp
 0xffffffff80639d11 <knote+17>:  mov    %edx,%r12d
 0xffffffff80639d14 <knote+20>:  mov    %rsi,-0x30(%rbp)
 0xffffffff80639d18 <knote+24>:  mov    %rdi,%rbx
 0xffffffff80639d1b <knote+27>:  test   %rbx,%rbx
 0xffffffff80639d1e <knote+30>:  je     0xffffffff80639ef6 <knote+502>
 0xffffffff80639d24 <knote+36>:  mov    %r12d,%eax
 0xffffffff80639d27 <knote+39>:  and    $0x1,%eax
 0xffffffff80639d2a <knote+42>:  mov    %eax,-0x3c(%rbp)
 0xffffffff80639d2d <knote+45>:  mov    0x28(%rbx),%rdi
 0xffffffff80639d31 <knote+49>:  je     0xffffffff80639d38 <knote+56>
 0xffffffff80639d33 <knote+51>:  callq  *0x18(%rbx)
 0xffffffff80639d36 <knote+54>:  jmp    0xffffffff80639d42 <knote+66>
 0xffffffff80639d38 <knote+56>:  callq  *0x20(%rbx)
 0xffffffff80639d3b <knote+59>:  mov    0x28(%rbx),%rdi
 0xffffffff80639d3f <knote+63>:  callq  *0x8(%rbx)
 0xffffffff80639d42 <knote+66>:  mov    %rbx,-0x38(%rbp)
 0xffffffff80639d46 <knote+70>:  mov    (%rbx),%rbx
 0xffffffff80639d49 <knote+73>:  test   %rbx,%rbx
 0xffffffff80639d4c <knote+76>:  je     0xffffffff80639ee5 <knote+485>
 0xffffffff80639d52 <knote+82>:  and    $0x2,%r12d
 0xffffffff80639d56 <knote+86>:  nopw   %cs:0x0(%rax,%rax,1)
 0xffffffff80639d60 <knote+96>:  mov    0x28(%rbx),%r14

 Panic is in the last quoted instruction.
 And:
 (kgdb) i reg
 rax            0x246                582
 rbx            0xdeadc0dedeadc0de   -2401050962867404578
 rcx            0x0                  0
 rdx            0x12e                302
 rsi            0xffffffff80a26a5a   -2136839590
 rdi            0xffffffff80e81b80   -2132272256
 rbp            0xfffffe02b7efea20   0xfffffe02b7efea20
 rsp            0xfffffe02b7efe9e0   0xfffffe02b7efe9e0
 r8             0xffffffff80a269ce   -2136839730
 r9             0xffffffff80e82838   -2132269000
 r10            0x10000              65536
 r11            0xffffffff80fabd10   -2131051248
 r12            0x0                  0
 r13            0xfffff801ff84a818   -8787511171048
 r14            0xfffff801ff84a800   -8787511171072
 r15            0xfffff8019a6974f0   -8789207452432
 rip            0xffffffff80639d60   0xffffffff80639d60 <knote+96>
 eflags         0x10286              66182

 I think that $rbx stands out here (this is a kernel with INVARIANTS).

 Looking at the code, is it possible that one of the calls from within
 the loop's body modifies the list?  If that is so and provided that is a
 valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

 This is the first time useful debugging data has been posted.

 The 0x28 offset may indicate either kn_kq member access of the struct
 knote, or kq_list of the struct kqueue.

 kl_list.slh_first of the list parameter is NULL, how would a list
 iteration loop even start?  Can you look up the list argument value
 from the previous frame (%rdi is overwritten, so debugger might be
 confused) ?
 
 After looking at your data closely, I think you are right.  The panic
 occurs when exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
 only case in the tree where filter uses knlist_remove_inevent() to detach
 processed note, so indeed the slist is modified under the iterator.
 
 Below is the patch with the suggested change and unrelated cleanup of
 the uma(9) KPI use.  Please test, everybody who has a panic with the
 backtrace pointing to the sys_exit().
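The failure mode Andriy and Kostik describe can be illustrated with a small user-space C sketch. Here, a callback detaches the element the loop is currently visiting, and the loop then reads that element's stale next pointer. This is not the committed kern_event.c patch, which is not reproduced here. `struct note`, `note_detach()`, `NOTE_FOREACH_SAFE` and the other names are hypothetical stand-ins. The sketch assumes, as the 0xdeadc0dedeadc0de value in %rbx suggests, that an INVARIANTS kernel overwrites freed memory with the 32-bit word 0xdeadc0de. It mimics that by trashing detached nodes in place rather than freeing them. The loop macro copies the shape of SLIST_FOREACH_SAFE() from FreeBSD's <sys/queue.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed INVARIANTS fill word for freed items (cf. %rbx above). */
#define TRASH_WORD 0xdeadc0deU

/* Hypothetical stand-in for struct knote: only the linkage matters. */
struct note {
	struct note *next;
	int id;
};

/* Same loop shape as SLIST_FOREACH_SAFE(var, head, field, tvar):
 * the successor is cached in tvar before the body runs. */
#define NOTE_FOREACH_SAFE(var, first, tvar)			\
	for ((var) = (first);					\
	    (var) != NULL && ((tvar) = (var)->next, 1);		\
	    (var) = (tvar))

/* Link pool[0..count-1] into a list and return its head. */
static struct note *
note_list_build(struct note *pool, int count)
{
	assert(count > 0);
	for (int i = 0; i < count; i++) {
		pool[i].id = i;
		pool[i].next = (i + 1 < count) ? &pool[i + 1] : NULL;
	}
	return (&pool[0]);
}

/* Overwrite a buffer with the trash word, as a debug allocator would. */
static void
trash_fill(void *buf, size_t len)
{
	uint32_t w = TRASH_WORD;

	for (size_t off = 0; off + sizeof(w) <= len; off += sizeof(w))
		memcpy((char *)buf + off, &w, sizeof(w));
}

/* Value a trashed pointer field holds (0xdeadc0dedeadc0de on LP64). */
static uintptr_t
trash_pointer(void)
{
	uintptr_t p = 0;

	trash_fill(&p, sizeof(p));
	return (p);
}

/* Hypothetical detaching callback: unlink n, then trash it as if freed.
 * The storage is not really released, so later reads stay defined. */
static void
note_detach(struct note **head, struct note *n)
{
	struct note **pp = head;

	while (*pp != n)
		pp = &(*pp)->next;
	*pp = n->next;
	trash_fill(n, sizeof(*n));
}

/* Unsafe pattern: after the callback, "n = n->next" would load this
 * trashed value, and the next dereference through it would fault. */
static uintptr_t
unsafe_next_after_detach(struct note **head)
{
	struct note *n = *head;
	uintptr_t v;

	assert(n != NULL);
	note_detach(head, n);
	memcpy(&v, &n->next, sizeof(v));
	return (v);
}

/* Safe pattern: every note may detach itself; returns notes visited. */
static int
walk_safe_detach_all(struct note **head)
{
	struct note *n, *tmp;
	int visited = 0;

	NOTE_FOREACH_SAFE(n, *head, tmp) {
		visited++;
		note_detach(head, n);	/* tmp already holds the successor */
	}
	return (visited);
}
```

With a plain SLIST_FOREACH, the step after the callback would dereference the trashed pointer. That is consistent with the fault on `mov 0x28(%rbx),%r14` at knote+96. Caching the successor first means the detached node is never touched again. The usual caveat applies: the _SAFE variant only tolerates removal of the current element, not of its successor.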

Fixes the panic for me too, thanks Kostik.

Cheers,
Lawrence


Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Lawrence Stewart
On 08/27/15 09:36, John-Mark Gurney wrote:
 Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
 On 12/08/2015 17:11, Lawrence Stewart wrote:
 On 08/07/15 07:33, Pawel Pekala wrote:
 Hi K.,

 On 2015-08-06 12:33 -0700, K. Macy km...@freebsd.org wrote:
 Is this still happening?

 Still crashes:

 +1 for me running r286617

 Here is another +1 with r286922.
 I can add a couple of bits of debugging data:

 (kgdb) fr 8
 #8  0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
 hint=2147483648, lockflags=<value optimized out>) at
 /usr/src/sys/kern/kern_event.c:1964
 1964            } else if ((lockflags & KNF_NOKQLOCK) != 0) {
 (kgdb) p *list
 $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
 
 We should/cannot get here w/ an empty list.  If we do, then there is
 something seriously wrong...  The current kn (which we must have as we
 are here) MUST be on the list, but as you just showed, there are no
 knotes on the list.
 
 Can you get me a print of the knote?  That way I can see what flags
 are on it?

I quickly tried to get this info for you by building my kernel with -O0
and reproducing, but I get an insta-panic on boot with the new kernel:

Fatal double fault
rip = 0xffffffff8218c794
rsp = 0xfffffe044cdc9fe0
rbp = 0xfffffe044cdca110
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe03dcfffe30
vpanic() at vpanic+0x189/frame 0xfffffe03dcfffeb0
panic() at panic+0x43/frame 0xfffffe03dcffff10
dblfault_handler() at dblfault_handler+0xa2/frame 0xfffffe03dcffff30
Xdblfault() at Xdblfault+0xac/frame 0xfffffe03dcffff30
--- trap 0x17, rip = 0xffffffff8218c794, rsp = 0xfffffe044cdc9fe0, rbp = 0xfffffe044cdca110 ---
vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfffffe044cdca110
vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame 0xfffffe044cdca560
vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfffffe044cdca5b0
zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfffffe044cdca6e0
zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca730
zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca760
vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame 0xfffffe044cdca800
zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfffffe044cdca930
zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca980
zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca9b0
spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfffffe044cdcaa50
traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfffffe044cdcac60
traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcacd0
traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfffffe044cdcaee0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb0f0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb300
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb510
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb720
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb930
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcbb40
traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcbbb0
traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfffffe044cdcbdc0
traverse_impl() at traverse_impl+0x79d/frame 0xfffffe044cdcbfd0
traverse_dataset() at traverse_dataset+0x93/frame 0xfffffe044cdcc040
traverse_pool() at traverse_pool+0x1f2/frame 0xfffffe044cdcc140
spa_load_verify() at spa_load_verify+0xf3/frame 0xfffffe044cdcc1f0
spa_load_impl() at spa_load_impl+0x2069/frame 0xfffffe044cdcc610
spa_load() at spa_load+0x320/frame 0xfffffe044cdcc6d0
spa_load_impl() at spa_load_impl+0x150b/frame 0xfffffe044cdccaf0
spa_load() at spa_load+0x320/frame 0xfffffe044cdccbb0
spa_load_best() at spa_load_best+0xc6/frame 0xfffffe044cdccc50
spa_open_common() at spa_open_common+0x246/frame 0xfffffe044cdccd40
spa_open() at spa_open+0x35/frame 0xfffffe044cdccd70
dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfffffe044cdccdb0
dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfffffe044cdcce30
zfsvfs_create() at zfsvfs_create+0x100/frame 0xfffffe044cdcd050
zfs_domount() at zfs_domount+0xa7/frame 0xfffffe044cdcd0e0
zfs_mount() at zfs_mount+0x6c3/frame 0xfffffe044cdcd390
vfs_donmount() at vfs_donmount+0x1330/frame 0xfffffe044cdcd660
kernel_mount() at kernel_mount+0x62/frame 0xfffffe044cdcd6c0
parse_mount() at parse_mount+0x668/frame 0xfffffe044cdcd810
vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfffffe044cdcd9d0
start_init() at start_init+0x62/frame 0xfffffe044cdcda70
fork_exit() at fork_exit+0x84/frame 0xfffffe044cdcdab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe044cdcdab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Didn't get a core because it panics before dumpdev is set.

Is anyone else able to run -O0 kernels or do I have something set to evil?

Cheers,
Lawrence

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-12 Thread Lawrence Stewart
On 08/07/15 07:33, Pawel Pekala wrote:
 Hi K.,
 
 On 2015-08-06 12:33 -0700, K. Macy km...@freebsd.org wrote:
 Is this still happening?
 
 Still crashes:

+1 for me running r286617

Cheers,
Lawrence


Re: Panic @r251745; i386, early in boot sequence

2013-06-14 Thread Lawrence Stewart
On 06/15/13 02:35, David Wolfskill wrote:
 Here's a hand-transcribed copy of the backtrace:
 
 ... Timecounters tick every 1.000 msec
 panic: curvnet is NULL
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper(c1034c40,c102a482,c11b646c,c2020cbc,c1037f21,...) at 0xc051283d = db_trace_self_wrapper+0x2d/frame 0xc2020be8
 kdb_backtrace(c10863e7,0,c102a482,c2020cbc,c102a482,...) at 0xc0aa9800 = kdb_backtrace+0x30/frame 0xc2020c50
 vpanic(c11963a2,100,c102a482,c2020cbc,c2020cbc,...) at 0xc0a71e0f = vpanic+0x11f/frame 0xc2020c50
 kassert_panic(c102a482,0,c102a450,10b,c1143100,...) at 0xc0a71cea = kassert_panic+0xea/frame 0xc2020cb0
 hhook_head_register(1,0,c1358904,102,c116030c,...) at 0xc0a40132 = hhook_head_register+0x102/frame 0xc2020cd4
 tcp_init(0,c103cac6,ce03d448,c112654c,c2020d58,...) at 0xc0c1fecc = tcp_init+0x2c/frame 0xc0c0d20
 domain_init(c114305c,0,ce03d530,201e000,2025000,...) at 0xc0adf357 = domain_init+0x27/frame 0xc2020d38
 mi_startup() at 0xc0a1deb7 = mi_startup+0xf7/frame 0xc2020d58
 begin() at 0xc0a1c07 = begin+0x2c
 KDB: enter: panic
 [ thread pid 0 tid 100000 ]
 Stopped at      0xc0aa95fd = kdb_enter+0x3d: movl    $0,0xc11b21c4 = kdb_why
 db>
 
 Previous working head was @r251684 (from yesterday).  My build
 machine (where I have a serial console) is still building; above is
 from laptop (which lacks serial console, unfortunately).
 
 I update build machine & laptop to same GRNs; here is build
 machine's uname -a output (showing yesterday's update, as today's
 is still in progress):
 
 FreeBSD freebeast.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT
 #1191  r251684M/251684:1000035: Thu Jun 13 09:46:05 PDT 2013
 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC
 i386
 
 Any suggestions for what to hack?  :-}

My apologies for the brain fart. Committing a fix shortly.

Cheers,
Lawrence


Read-triggered corruption of swap backed MD devices

2013-05-23 Thread Lawrence Stewart
Hi all,

I tracked the cause of a colleague's nanobsd image creation problem to
what appears to be some nasty behaviour with swap-backed MD devices.
I've verified the behaviour exists on three separate systems running
10-CURRENT r250260, 9-STABLE r250824 and 9-STABLE r250925.

The following minimal reproduction recipe (run as root)
deterministically triggers the behaviour for me on the 3 systems I've
tested:

env MD_DEV=`mdconfig -an -t swap -s 1m -x 63 -y 16` sh -c '(fdisk -I md${MD_DEV} ; bsdlabel -w -B md${MD_DEV}s1 ; bsdlabel md${MD_DEV}s1 ; dd if=/dev/md${MD_DEV} of=/dev/null bs=64k ; bsdlabel md${MD_DEV}s1 ; mdconfig -d -u ${MD_DEV})'

If the mdconfig -t swap argument is changed to -t malloc, the bsdlabel
remains intact after the dd command completes.

I've included runs of the recipe on my 10-CURRENT r250260 laptop, with
both -t swap and -t malloc, at the end of this email for reference.

Smells like a VM related problem to me, but ENOCLUE so I would
appreciate some help.

Cheers,
Lawrence



root@lstewart-laptop:~ # uname -a
FreeBSD lstewart-laptop 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r250260:
Wed May 22 15:57:40 EST 2013
root@lstewart-laptop:/usr/obj/usr/src/sys/GENERIC  amd64



root@lstewart-laptop:~ # env MD_DEV=`mdconfig -an -t swap -s 1m -x 63 -y 16` sh -c '(fdisk -I md${MD_DEV} ; bsdlabel -w -B md${MD_DEV}s1 ; bsdlabel md${MD_DEV}s1 ; dd if=/dev/md${MD_DEV} of=/dev/null bs=64k ; bsdlabel md${MD_DEV}s1 ; mdconfig -d -u ${MD_DEV})'
*** Working on device /dev/md0 ***
fdisk: invalid fdisk partition table found
# /dev/md0s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:       1937         16    unused        0     0
  c:       1953          0    unused        0     0         # "raw" part, don't edit
16+0 records in
16+0 records out
1048576 bytes transferred in 0.001728 secs (606794497 bytes/sec)
bsdlabel: /dev/md0s1: no valid label found



root@lstewart-laptop:~ # env MD_DEV=`mdconfig -an -t malloc -s 1m -x 63 -y 16` sh -c '(fdisk -I md${MD_DEV} ; bsdlabel -w -B md${MD_DEV}s1 ; bsdlabel md${MD_DEV}s1 ; dd if=/dev/md${MD_DEV} of=/dev/null bs=64k ; bsdlabel md${MD_DEV}s1 ; mdconfig -d -u ${MD_DEV})'
*** Working on device /dev/md0 ***
fdisk: invalid fdisk partition table found
# /dev/md0s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:       1937         16    unused        0     0
  c:       1953          0    unused        0     0         # "raw" part, don't edit
16+0 records in
16+0 records out
1048576 bytes transferred in 0.001251 secs (838202118 bytes/sec)
# /dev/md0s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:       1937         16    unused        0     0
  c:       1953          0    unused        0     0         # "raw" part, don't edit


Re: Enhancing the user experience with tcsh

2012-02-09 Thread Lawrence Stewart
On 02/10/12 11:52, Eitan Adler wrote:
 In conf/160689 (http://www.freebsd.org/cgi/query-pr.cgi?pr=160689)
 there has been some discussion about changing the default cshrc file.
 
 I'd like to commit something like the following based on Chris's patch
 at the end of the thread. This post is an attempt to open the change
 to wider discussion.

I like the proposed changes, although I don't see why you set the prompt
twice. I've also inserted the changes I commonly run with inline below.

 commit dbe6cb730686dd53af7d06cc9b69b60e6e55549c
 diff --git a/etc/root/dot.cshrc b/etc/root/dot.cshrc
 --- a/etc/root/dot.cshrc
 +++ b/etc/root/dot.cshrc
 @@ -7,9 +7,10 @@
 
  alias h  history 25
  alias j  jobs -l
 -alias la ls -a
 +alias la ls -aF
  alias lf ls -FA
 -alias ll ls -lA
 +alias ll ls -lAF
 +alias ls ls -F
 
  # A righteous umask
  umask 22
 @@ -17,19 +18,24 @@ umask 22
  set path = (/sbin /bin /usr/sbin /usr/bin /usr/games /usr/local/sbin
 /usr/local/bin $HOME/bin)
 
  setenv   EDITOR  vi
 -setenv   PAGER   more
 +setenv   PAGER   less
  setenv   BLOCKSIZE   K

# Sets SSH_AUTH_SOCK to the user's ssh-agent socket path if running
if (${?SSH_AUTH_SOCK} != 1) then
setenv  SSH_AUTH_SOCK   `sockstat | grep ${USER} | grep ssh-agent | awk '{print $6}'`
endif

  if ($?prompt) then
   # An interactive shell -- set some stuff up
   set prompt = "`/bin/hostname -s`# "
# Useful for root's .cshrc, although I run with it in all my .cshrc
if (`id -g` == 0) then
set prompt="root@%m# "
endif
   set filec
 - set history = 100
 - set savehist = 100
 + set history = 1
 + set savehist = 1
 + set autolist
set autologout = 0
 + # Use history to aid expansion
 + set autoexpand
   set mail = (/var/mail/$USER)
   if ( $?tcsh ) then
   bindkey ^W backward-delete-word
   bindkey -k up history-search-backward
   bindkey -k down history-search-forward
# This maps the Delete key to do the right thing
# Pressing CTRL-v followed by the key of interest will print the escape
# sequence the terminal sends for that key
bindkey "^[[3~" delete-char-or-list-or-eof
   endif
 + set prompt = "[%n@%m]%c04%# "
 + set promptchars = "%#"
  endif
 

Cheers,
Lawrence


Re: Removal of sysinstall from HEAD and lack of a post-install configuration tool

2011-12-27 Thread Lawrence Stewart

On 12/27/11 16:13, Ron McDowell wrote:

Doug Barton wrote:

The story so far ...

sysinstall was removed from HEAD in October. I (and others) objected on
the basis that at this time there is no replacement for the post-install
configuration role that sysinstall played. More sysinstall components
were then removed. Then the old version of libdialog (which sysinstall
used) was removed. Thus at this point it's not possible to easily
restore sysinstall.

So my question is, how much do you care? Is lack of that functionality
in HEAD something that we care about?


Doug


We have around 90 web servers running 8.2p5 right now [and yes, I did
update the lot on Christmas Eve but that's a different story] and they
will not be upgraded to 9.0 until/unless the post-install functionality
that was lost by the removal of sysinstall is reintegrated in some way.
I also complained about it and was told, in effect, "too bad". Everyone
who commented said sysinstall caused more problems than it solved,
although I've been using it for any system changes I needed that it was
capable of doing for as long back as I can remember, and my first
FreeBSD box was v2.2.

I think removing any functionality that was in a previous release
without providing an equal-or-better alternative is a bad idea, and that
needs to be considered more carefully in the future.

So this is not just a +1 vote, it's a +90.


Sysinstall is in 9 and will not be removed from the 9 branch. The
installer used on the release media has changed, but as far as I
understand, there is nothing stopping you from running sysinstall from an
installer shell or using it for post-installation configuration.


Doug is only referring to the head branch (which will eventually, in
~18-24 months, become the 10 branch), so you should be able to have the
best of both worlds with 9, i.e. try bsdinstall and fall back to sysinstall
when you find bugs or missing features (don't forget to lodge bug
reports for problems you find so that bsdinstall can be improved).


On the topic of Doug's actual question, I see minimal sense in 
resurrecting sysinstall in head now. I would suggest it be done much 
closer to (say, 6 months before) the 10.0 release cycle, if no suitable 
post-installation configuration tool has materialised.


In the meantime, cajole everyone who pops up saying "I really want
post-installation configuration support" to get involved with writing a
bsdinstall-like script (I think it should be completely separate from
bsdinstall, but perhaps use the same backend shell script
functions/infrastructure) to do the job.


Cheers,
Lawrence


Re: Removal of sysinstall from HEAD and lack of a post-install configuration tool

2011-12-27 Thread Lawrence Stewart

On 12/28/11 06:29, Doug Barton wrote:

On 12/27/2011 03:48, Lawrence Stewart wrote:

On the topic of Doug's actual question, I see minimal sense in
resurrecting sysinstall in head now. I would suggest it be done much
closer to (say, 6 months before) the 10.0 release cycle, if no suitable
post-installation configuration tool has materialised.


My concern about that approach is that 9.0 hasn't even been released yet
and we've already seen changes that are going to make it hard to
resurrect sysinstall if that's the decision we come to. Waiting another
year or 2 would make it impossible.


Which changes are you referring to? I would have thought a reverse merge 
to undo the deletion of the sysinstall and old libdialog sources would 
be very minimal work. We'd also probably need a few extra build system 
changes to make sure old libdialog is perhaps statically compiled into 
sysinstall as it would be the only in-tree consumer, but that's not hard 
either. I may be lacking some imagination, but I don't really see why it
would become harder the longer we wait.


Cheers,
Lawrence


Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Lawrence Stewart

On 12/08/11 05:08, Luigi Rizzo wrote:

On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote:

On 06.12.2011 22:06, Luigi Rizzo wrote:

...

Even in my experiments there is a lot of instability in the results.
I don't know exactly where the problem is, but the high number of
read syscalls, and the huge impact of setting interrupt_rate=0
(defaults at 16us on the ixgbe) makes me think that there is something
that needs investigation in the protocol stack.

Of course we don't want to optimize specifically for the one-flow-at-10G
case, but devising something that makes the system less affected
by short timing variations, and can pass upstream interrupt mitigation
delays would help.


I'm not sure the variance is only coming from the network card and
driver side of things.  The TCP processing and interactions with
scheduler and locking probably play a big role as well.  There have
been many changes to TCP recently and maybe an inefficiency that
affects high-speed single sessions throughput has crept in.  That's
difficult to debug though.


I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which
seems slightly faster than HEAD) using MTU=1500 and various
combinations of card capabilities (hwcsum,tso,lro), different window
sizes and interrupt mitigation configurations.

default latency is 16us, l=0 means no interrupt mitigation.
lro is the software implementation of lro (tcp_lro.c)
hwlro is the hardware one (on 82599). Using a window of 100 Kbytes
seems to give the best results.

Summary:


[snip]


- enabling software lro on the transmit side actually slows
   down the throughput (4-5Gbit/s instead of 8.0).
   I am not sure why (perhaps acks are delayed too much) ?
   Adding a couple of lines in tcp_lro to reject
   pure acks seems to have much better effect.

The tcp_lro patch below might actually be useful also for
other cards.

--- tcp_lro.c   (revision 228284)
+++ tcp_lro.c   (working copy)
@@ -245,6 +250,8 @@

 ip_len = ntohs(ip->ip_len);
 tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
+       if (tcp_data_len == 0)
+               return -1;      /* not on ack */


 /*
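The length arithmetic in the patch can be checked in isolation. Below is a minimal sketch, not the tcp_lro.c code itself: lro_tcp_payload_len() is a hypothetical name, and it assumes an IPv4 header without options (the sizeof (*ip) term in the patch makes the same assumption) and a TCP data offset counted in 32-bit words, as th_off is.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's arithmetic.
 * ip_total_len: IPv4 total length in host byte order (what ntohs() yields).
 * th_off: TCP data offset in 32-bit words.
 * Assumes a 20-byte IPv4 header (no IP options).
 * Returns the TCP payload length, or -1 if the segment should not be
 * aggregated (a pure ACK carrying no data, or inconsistent lengths).
 */
static int
lro_tcp_payload_len(uint16_t ip_total_len, uint8_t th_off)
{
	const int ip_hlen = 20;			/* sizeof(struct ip) */
	int tcp_hlen, data_len;

	assert(th_off >= 5);			/* minimum TCP header is 20 bytes */
	tcp_hlen = th_off << 2;			/* words -> bytes */
	data_len = (int)ip_total_len - tcp_hlen - ip_hlen;
	if (data_len <= 0)
		return (-1);			/* pure ACK (or bogus): skip LRO */
	return (data_len);
}
```

For a 1500-byte IP packet with a 20-byte TCP header this yields 1460; a 52-byte packet with a 32-byte TCP header (e.g. with timestamps) is a pure ACK and is rejected. Rejecting negative lengths as well as zero is a defensive choice of this sketch; the patch only tests for zero.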


There is a bug with our LRO implementation (first noticed by Jeff 
Roberson) that I started fixing some time back but dropped the ball on. 
The crux of the problem is that we currently send only a single ACK for the
entire LRO chunk instead of one for each of the segments contained therein. Given
that most stacks rely on the ACK clock to keep things ticking over, the 
current behaviour kills performance. It may well be the cause of the 
performance loss you have observed. WIP patch is at:


http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have 
LRO capable hardware setup locally to figure out what I've missed. Most 
of the machines in my lab are running em(4) NICs which don't support 
LRO, but I'll see if I can find something which does and perhaps 
resurrect this patch.
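To make the ACK clock problem concrete: when LRO coalesces several in-order segments, the stack sees one large segment and may answer with one cumulative ACK, whereas without LRO the sender would have received an ACK per segment (or per two segments with delayed ACKs). The sketch below only computes the ACK numbers such a receiver would have sent for a chunk; it is not the WIP patch above, and lro_chunk_acks() and its ack_every parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the starting sequence number of an LRO chunk and the payload
 * lengths of the segments it coalesced, write into acks[] the cumulative
 * ACK numbers that would have been sent had the segments arrived
 * individually.  ack_every = 1 ACKs every segment; ack_every = 2 models
 * delayed ACKs (one ACK per two segments), always ACKing the last one.
 * Unsigned 32-bit arithmetic wraps exactly like TCP sequence space.
 * Returns the number of ACKs written (at most nsegs).
 */
static size_t
lro_chunk_acks(uint32_t start_seq, const uint16_t *seg_len, size_t nsegs,
    unsigned ack_every, uint32_t *acks)
{
	uint32_t next = start_seq;
	size_t i, n = 0;

	assert(ack_every >= 1);
	for (i = 0; i < nsegs; i++) {
		next += seg_len[i];		/* seq wraps modulo 2^32 */
		if ((i + 1) % ack_every == 0 || i == nsegs - 1)
			acks[n++] = next;
	}
	return (n);
}
```

For three 1448-byte segments starting at sequence number 1000, per-segment ACKing gives 2448, 3896 and 5344, while ack_every = 2 gives 3896 and 5344. A single ACK of 5344 for the whole chunk gives the sender a third (or half) as many clock ticks.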


If anyone has any ideas what I'm missing in the patch to make it work, 
please let me know.


Cheers,
Lawrence


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-10-27 Thread Lawrence Stewart

On 10/26/11 22:53, John Baldwin wrote:

On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote:

On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote:

On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:

On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:

On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:

My suggestion would be that if we won't be able to fix it before 9.0,
we should turn this assertion off, as the system seems to be able to
recover.


Shipped kernels have all assertions turned off.


Yes, I'm aware of that, but many people compile their production kernels
with INVARIANTS/INVARIANT_SUPPORT to fail early instead of, e.g.,
corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
it into a printf, so it will be visible.


No, the kernel is corrupting things in other places when this is true, so
if you are running with INVARIANTS, we want to know about it.   Specifically,
in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on
being able to do 'rcv_adv - rcv_nxt'.

In this case, it looks like the difference is consistently less than one
frame.  I suspect the other end of the connection is sending just beyond the
end of the advertised window (it probably assumes it is better to send a full
frame if it has that much pending data even though part of it is beyond the
window edge vs sending a truncated packet that just fills the window) and that
that frame is accepted ok in the header prediction case and its ACK is
delayed, but the next packet to arrive then trips over this assumption.

Since 'win' is guaranteed to be non-negative, and the line immediately
after the assert explicitly casts 'rcv_adv - rcv_nxt' to (int):

tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));

I think we already handle this case ok and perhaps the assertion can just be
removed?  Not sure if others feel that it warrants a comment to note that this
is the case being handled.
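John's reasoning rests on TCP sequence numbers being unsigned 32-bit values: once rcv_nxt passes rcv_adv, the unsigned difference wraps to a huge number, the (int) cast turns it back into a small negative one, and imax() then picks the non-negative win instead. A standalone sketch of that logic follows; the helpers are hypothetical (max_int() stands in for the kernel's imax()), and it assumes the usual two's-complement behaviour when an out-of-range uint32_t is converted to int32_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Signed distance a - b in 32-bit sequence space.  Converting an
 * out-of-range uint32_t to int32_t is implementation-defined in C, but
 * wraps (two's complement) on common compilers.
 */
static int32_t
seq_diff(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b));
}

static int
max_int(int a, int b)
{
	return (a > b ? a : b);
}

/*
 * Mirrors tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt)):
 * if rcv_nxt has overrun rcv_adv the difference is negative, so the
 * non-negative 'win' wins and rcv_wnd never goes negative.
 */
static int
update_rcv_wnd(int win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
	assert(win >= 0);
	return (max_int(win, seq_diff(rcv_adv, rcv_nxt)));
}
```

With the values from the panic message in Pawel's follow-up (rcv_nxt 4022394563, rcv_adv 4022394342) the difference is -221 and, with win = 0, the window stays at 0 rather than going negative.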


I added debug to the places where rcv_adv and rcv_nxt are modified. Here
is what happens before the panic occurs:

tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221
panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221

I can send you the full log if you want, I've plenty of messages where
rcv_adv < rcv_nxt, but not all of them trigger this assertion.


The assertion would be triggered when the next packet arrives (as I said
above).  Try modifying your debugging output to also log if the ACK is
delayed.  I suspect it is not delayed until the last one.  (Pushing out an
ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case
of an immediate ACK, rcv_nxt > rcv_adv is only a transient condition all
under a single lock invocation so never visible to other consumers of the
protocol control block.)  If that is what you see, then that confirms what
I guessed above and I will likely just remove the assertion in tcp_input()
and patch the timewait code to handle this case.



Pawel, have you been able to confirm John's hypothesis? What I don't 
quite get is why we haven't had a lot more reports of this issue...


Cheers,
Lawrence


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-10-22 Thread Lawrence Stewart

On 10/22/11 19:49, Pawel Jakub Dawidek wrote:

The panic message says:

panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291

I only have picture of the backtrace:

http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg



ewww that is not good. Can you give us any more information about the 
machine and what it's doing? Is it terminating TCP connections from the 
internet at large or only local LAN (i.e. is there likely to be packet 
loss happening)? Are you doing TSO or LRO? Do you have any non-default 
tuning in place?


Cheers,
Lawrence


Re: Sense fetching [Was: cdrtools /devel ...]

2011-03-30 Thread Lawrence Stewart
On 11/13/10 20:34, Alexander Motin wrote:
 Brandon Gooch wrote:
 2010/11/5 Alexander Motin m...@freebsd.org:
 Hi.

 I've reviewed the tests that scgcheck performs on the SCSI subsystem. They
 showed a combination of several issues in CAM, ahci(4) and cdrtools itself.
 Several small patches allow us to pass most of those tests:
 http://people.freebsd.org/~mav/sense/

 ahci_resid.patch: Add support for reporting residual length on data
 underrun. SCSI commands often return results shorter than expected.
 The returned value allows the application to know/check how much data it really
 has. It is also important for sense fetching, as ATAPI and USB devices
 return sense as data in response to REQUEST_SENSE command.

 sense_resid.patch: When manually requesting sense data (ATAPI or USB),
 request only as much data as user requested (not the fixed structure
 size), and return respective sense residual length.

 pass_autosence.patch: Unless CAM_DIS_AUTOSENSE is set, always fetch
 sense if not done by the SIM, independently of CAM_PASS_ERR_RECOVER. Since
 the device freeze is released before returning to user level, a user-level
 application by definition can't reliably fetch sense data if some other
 application (like hald) tries to access the device at the same time.

 cdrtools.patch: Make libscg (part of cdrtools) on FreeBSD submit the
 wanted sense length to CAM and not clear the sense return buffer. It is
 mostly cosmetic, important probably only for scgcheck.

 Testers and reviewers welcome. I am especially interested in opinion
 about pass_autosence.patch -- maybe we should lower sense fetching even
 deeper, to make it work for all cam_periph_runccb() consumers.

 Hey mav, sorry to chime in after so long here, but have some of these
 patches been committed (as of r215179)?

 Which patches are still applicable for testing? I assume the cdrtools
 patch for sure...
 
 Still uncommitted: pass_autosence.patch and possibly cdrtools.patch.
 

To add another data point, I just applied the pass_autosence.patch to my
ahci enabled 8.2-STABLE r220153 kernel and I can now burn successfully
with cdrecord. The same kernel without the patch was unable to burn
(though it could erase disks ok).
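The residual-length idea in ahci_resid.patch and sense_resid.patch boils down to telling the caller how much of the buffer it asked for was not filled. The sketch below shows just that arithmetic with a hypothetical struct xfer_result, not CAM's real structures: copy the smaller of what was requested and what the device returned, and report the difference as the residual.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transfer result; not a real CAM structure. */
struct xfer_result {
	uint32_t copied;	/* bytes placed in the caller's buffer */
	uint32_t resid;		/* requested - copied */
};

/*
 * Copy at most 'requested' of the 'returned' bytes the device produced
 * into dst and report the residual, so the caller can tell a short
 * (underrun) transfer from a full one.
 */
static struct xfer_result
copy_with_resid(void *dst, uint32_t requested, const void *src,
    uint32_t returned)
{
	struct xfer_result r;

	assert(dst != NULL || requested == 0);
	r.copied = returned < requested ? returned : requested;
	r.resid = requested - r.copied;
	if (r.copied > 0)
		memcpy(dst, src, r.copied);
	return (r);
}
```

For example, asking for 32 bytes of sense data from a device that returns 18 copies 18 bytes and reports a residual of 14, so the application knows only the first 18 bytes are valid.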

Cheers,
Lawrence


Re: [HEADS UP] Significant TCP work committed to head - CUBIC H-TCP committed

2010-12-02 Thread Lawrence Stewart
On 11/12/10 20:35, Lawrence Stewart wrote:
 Hi All,
 
 A quick note that this evening, I made the first in a series of upcoming
 commits to head that modify the TCP stack fairly significantly. I have
 no reason to believe you'll notice any issues, but TCP is a complex
 beast and it's possible things might crop up. The changes are mostly
 related to congestion control, so the sorts of issues that are likely to
 crop up if any will most probably be subtle and difficult to even
 detect. The first svn revision in question is r215166. The next few
 commits I plan to make will be basically zero impact and then another
 significant patch will follow in a few weeks.
 
 If you bump into an issue that you think might be related to this work,
 please roll back r215166 from your tree and attempt to reproduce before
 reporting the problem. Please CC me directly with your problem report
 and post to freebsd-current@ or freebsd-net@ as well.
 
 Lots more information about what all this does and how to use it will be
 following in the coming weeks, but in the meantime, just keep this note
 in the back of your mind. For the curious, some information about the
 project is available at [1,2].
 
 Cheers,
 Lawrence
 
 [1] http://caia.swin.edu.au/freebsd/5cc/
 [2]
 http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD

After a rather arduous couple of weeks grappling with VIMAGE related
bugs, intermittently failing testbed hardware and various algorithm
ambiguities, the next chunk of work has finally landed in head. Kernel
modules implementing the CUBIC and H-TCP congestion control algorithms
are now built/installed as part of a "make kernel".

I should stress that everything other than NewReno is considered
experimental at this stage in an IRTF/IETF specification sense, and as
such I would strongly advise against setting the system default
algorithm to anything other than NewReno. The TCP_CONGESTION setsockopt
call (used by e.g. iperf -Z) is the appropriate way to test an algorithm
on an individual connection.
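For testing from your own program rather than iperf, selecting an algorithm per connection is a single setsockopt() call. A minimal sketch, assuming a system whose <netinet/tcp.h> defines TCP_CONGESTION (FreeBSD with this work, and Linux, both do); set_cc_algo() is a hypothetical helper, and the algorithm names differ between systems, so check sysctl net.inet.tcp.cc.available on FreeBSD.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Select the congestion control algorithm 'name' for TCP socket fd.
 * Returns 0 on success, -1 (errno set) on failure, e.g. if the
 * algorithm is unknown or its module has not been loaded.
 */
static int
set_cc_algo(int fd, const char *name)
{
	assert(name != NULL);
#ifdef TCP_CONGESTION
	return (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
	    (socklen_t)strlen(name)));
#else
	(void)fd;
	(void)name;
	errno = ENOPROTOOPT;	/* option not available on this system */
	return (-1);
#endif
}
```

Typically you would call set_cc_algo(fd, "cubic") right after socket(), having done kldload cc_cubic on FreeBSD; a -1 return means the option or the requested algorithm is unavailable.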

For those interested in taking the algorithms for a spin, the easiest
way is probably to use benchmarks/iperf from ports on a source/sink
machine and do the following:

- On the data sink (receiver)
cd /usr/ports/benchmarks/iperf
fetch http://caia.swin.edu.au/urp/newtcp/tools/caia_iperf204_1.1.patch
mv caia_iperf204_1.1.patch files/patch-caiaiperf
make install clean
sysctl kern.ipc.maxsockbuf=1048576
iperf -s -j 256k -k 256k

- On the data source (sender)
cd /usr/ports/benchmarks/iperf
fetch http://caia.swin.edu.au/urp/newtcp/tools/caia_iperf204_1.1.patch
mv caia_iperf204_1.1.patch files/patch-caiaiperf
make install clean
kldload cc_cubic cc_htcp
sysctl kern.ipc.maxsockbuf=1048576
iperf -c data_sink_ip -j 256k -k 256k -Z algo (where algo is one
from the list reported by sysctl net.inet.tcp.cc.available)

You may need to fiddle with the above parameters a bit depending on your
setup. You will want decent bandwidth (5+Mbps should be ok) and a
moderate to large RTT (50+ms) between both hosts if you want to see
these algorithms really shine. You can use dummynet on the data source
machine to easily introduce artificial bw/delay/queuing e.g.

ipfw pipe 1 config noerror bw 10Mbps delay 20ms queue 100Kbytes
ipfw add 10 pipe 1 ip from me to data_sink_ip dst-port 5001

Be careful to do the above via console access, or put "options IPFIREWALL"
and "options IPFIREWALL_DEFAULT_TO_ACCEPT" in your kernel config to avoid
locking yourself out (dummynet needs IPFW to work).
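The bandwidth and RTT advice above is about the bandwidth-delay product (BDP): to keep a path busy, roughly bandwidth x RTT worth of data must be in flight, and TCP cannot keep more in flight than the socket buffers allow. A small worked sketch, with a hypothetical helper name, ignoring header overhead and any dummynet queueing:

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes for bw_bps bits/s and an RTT of rtt_ms. */
static uint64_t
bdp_bytes(uint64_t bw_bps, uint32_t rtt_ms)
{
	assert(bw_bps > 0);
	/* bits/s * ms -> bits*ms/s; /8 -> bytes; /1000 -> ms to s */
	return (bw_bps * rtt_ms / 8 / 1000);
}
```

At 10 Mbit/s and 50 ms this is 62500 bytes, comfortably within the 1048576-byte kern.ipc.maxsockbuf set above; 100 Mbit/s at the same RTT already needs 625000 bytes, and a 1 Gbit/s path would exceed that limit.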

For the really interested (by now I suspect my audience is down to 0,
but still), you might want to load siftr and enable/disable it during
each test run and make your very own plot of cwnd vs time to see what's
really going on behind the scenes.

Ok that's enough for now, but much more is on the way. Please let me
know if you have any feedback or run into any problems related to this work.

Cheers,
Lawrence


Re: [HEADS UP] Significant TCP work committed to head - CUBIC H-TCP committed

2010-12-02 Thread Lawrence Stewart
Hi Ivan,

On 12/03/10 00:07, Ivan Voras wrote:
 On 12/02/10 12:53, Lawrence Stewart wrote:
 
 For the really interested (by now I suspect my audience is down to 0,
 but still), you might want to load siftr and enable/disable it during
 each test run and make your very own plot of cwnd vs time to see what's
 really going on behind the scenes.

 Ok that's enough for now, but much more is on the way. Please let me
 know if you have any feedback or run into any problems related to this
 work.
 
 Hi,
 
 My question isn't very constructive but I'd like to know more about this
 topic. Have you seen this:
 
 http://blog.benstrong.com/2010/11/google-and-microsoft-cheat-on-slow.html
 http://developers.slashdot.org/story/10/11/26/1729218/Google-Microsoft-Cheat-On-Slow-Start-mdash-Should-You

Yes I'd seen the first one and just skimmed the slashdot thread now.

 ? In short: is the existence of slow-start a property of (New)Reno and

No, mostly unrelated. Slow start is one of 4 separate but related
algorithms which control a TCP flow's behaviour during startup and
general operation. See RFC5681 for useful discussion of the algorithms.

NewReno unfortunately is an overloaded term. In congestion control
circles, NewReno is used to refer to the congestion avoidance behaviour
of increasing cwnd by 1 max segment size per RTT and backing cwnd off by half
when congestion (3 dup ACKs) is detected (which is the same basic
behaviour as Reno BTW). NewReno also refers to a set of tweaks (RFC3782)
to TCP's fast recovery algorithm (helps recover from multiple losses in
a window when SACK isn't available).
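The per-ACK behaviour described above (slow start, then roughly one MSS of growth per RTT, then halving on three duplicate ACKs) can be sketched in bytes using the RFC 5681 formulas. This is a simplified model, not FreeBSD's NewReno module: it omits fast recovery's window inflation, the RFC 3782 partial-ACK rules and retransmission timeouts, and struct cc_state and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct cc_state {
	uint32_t cwnd;		/* congestion window, bytes */
	uint32_t ssthresh;	/* slow start threshold, bytes */
	uint32_t mss;		/* sender maximum segment size, bytes */
};

/* Called for each ACK that acknowledges new data. */
static void
cc_ack_received(struct cc_state *cc, uint32_t bytes_acked)
{
	uint32_t incr;

	assert(cc->mss > 0 && cc->cwnd > 0);
	if (cc->cwnd < cc->ssthresh) {
		/* Slow start: up to one MSS per ACK, so cwnd doubles per RTT. */
		cc->cwnd += bytes_acked < cc->mss ? bytes_acked : cc->mss;
	} else {
		/* Congestion avoidance: MSS*MSS/cwnd per ACK, at least 1 byte. */
		incr = (uint32_t)((uint64_t)cc->mss * cc->mss / cc->cwnd);
		cc->cwnd += incr > 0 ? incr : 1;
	}
}

/* Third duplicate ACK: halve the flight size, but keep at least 2 MSS. */
static void
cc_three_dupacks(struct cc_state *cc, uint32_t flight_size)
{
	uint32_t half = flight_size / 2;

	cc->ssthresh = half > 2 * cc->mss ? half : 2 * cc->mss;
	cc->cwnd = cc->ssthresh;	/* simplified: no fast recovery inflation */
}
```

Because each ACK adds MSS*MSS/cwnd bytes in congestion avoidance, a full window's worth of ACKs (about cwnd/MSS of them) adds about one MSS per RTT, which is the additive-increase half of the behaviour described above.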

 will some of the new algorithms make it less cautious, i.e. faster? I
 don't think it's critical but I'm often noticing it, especially on bulk
 transfers over LAN.

With respect to slow start, no. Congestion control algorithms tend to
focus on the increase/decrease of cwnd during congestion avoidance mode,
which is transitioned to after slow start completes. Slow start is left
untouched. There are proposals to modify/replace slow start e.g. RFC4782
and 'JumpStart' [1].

The reason Google and Microsoft are fiddling with things is that
they typically only need to push a small amount of data, so waiting for
slow start to complete eats up unnecessary RTTs. Google are pushing in
the IETF at the moment to have the initial window bumped to 10 segments
(see the tcpm, iccrg and tmrg IRTF/IETF mailing lists if interested).
There is some push back happening though and the discussions are
interesting.
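To see why the initial window matters for short transfers, count the round trips slow start needs to push a small object. This is an idealised sketch: it assumes no loss and that cwnd doubles every RTT (i.e. every segment is ACKed), ignores the handshake, and slow_start_rounds() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of RTT rounds needed to send nsegs segments when slow start
 * begins with an initial window of iw segments and doubles every round.
 */
static unsigned
slow_start_rounds(uint32_t nsegs, uint32_t iw)
{
	uint64_t sent = 0, cwnd = iw;
	unsigned rounds = 0;

	assert(iw > 0);
	while (sent < nsegs) {
		sent += cwnd;	/* one window's worth per RTT */
		cwnd *= 2;	/* slow start doubling */
		rounds++;
	}
	return (rounds);
}
```

For a 102400-byte object with a 1460-byte MSS (71 segments) the model gives 5 round trips with an initial window of 3 segments and 4 with 10; a 10-segment response drops from 3 round trips to 1.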

Cheers,
Lawrence

[1] http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf


Re: [HEADS UP] Significant TCP work committed to head - VIMAGE users

2010-11-16 Thread Lawrence Stewart
On 11/12/10 20:35, Lawrence Stewart wrote:
 Hi All,
 
 A quick note that this evening, I made the first in a series of upcoming
 commits to head that modify the TCP stack fairly significantly. I have
 no reason to believe you'll notice any issues, but TCP is a complex
 beast and it's possible things might crop up. The changes are mostly
 related to congestion control, so the sorts of issues that are likely to
 crop up if any will most probably be subtle and difficult to even
 detect. The first svn revision in question is r215166. The next few
 commits I plan to make will be basically zero impact and then another
 significant patch will follow in a few weeks.
 
 If you bump into an issue that you think might be related to this work,
 please roll back r215166 from your tree and attempt to reproduce before
 reporting the problem. Please CC me directly with your problem report
 and post to freebsd-current@ or freebsd-net@ as well.
 
 Lots more information about what all this does and how to use it will be
 following in the coming weeks, but in the meantime, just keep this note
 in the back of your mind. For the curious, some information about the
 project is available at [1,2].
 
 Cheers,
 Lawrence
 
 [1] http://caia.swin.edu.au/freebsd/5cc/
 [2]
 http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD

For any VIMAGE users running head, please note that r215166 overlooked
some important VIMAGE related issues and actually triggers a kernel
panic when the first vnet is brought up (see [3] for details). Please
update to r215395 or later to ensure you have all the patches
I committed to address the VIMAGE deficiencies in the original r215166
commit.

Cheers,
Lawrence

[3]
http://lists.freebsd.org/pipermail/svn-src-head/2010-November/022381.html


[HEADS UP] Significant TCP work committed to head

2010-11-12 Thread Lawrence Stewart
Hi All,

A quick note that this evening, I made the first in a series of upcoming
commits to head that modify the TCP stack fairly significantly. I have
no reason to believe you'll notice any issues, but TCP is a complex
beast and it's possible things might crop up. The changes are mostly
related to congestion control, so the sorts of issues that are likely to
crop up if any will most probably be subtle and difficult to even
detect. The first svn revision in question is r215166. The next few
commits I plan to make will be basically zero impact and then another
significant patch will follow in a few weeks.

If you bump into an issue that you think might be related to this work,
please roll back r215166 from your tree and attempt to reproduce before
reporting the problem. Please CC me directly with your problem report
and post to freebsd-current@ or freebsd-net@ as well.

Lots more information about what all this does and how to use it will be
following in the coming weeks, but in the meantime, just keep this note
in the back of your mind. For the curious, some information about the
project is available at [1,2].

Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/5cc/
[2]
http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD


Re: [HEADS UP] Significant TCP work committed to head

2010-11-12 Thread Lawrence Stewart
On 11/13/10 04:58, Kevin Oberman wrote:
 Date: Fri, 12 Nov 2010 20:35:45 +1100
 From: Lawrence Stewart lstew...@freebsd.org
 Sender: owner-freebsd-curr...@freebsd.org

 Hi All,

 A quick note that this evening, I made the first in a series of upcoming
 commits to head that modify the TCP stack fairly significantly. I have
 no reason to believe you'll notice any issues, but TCP is a complex
 beast and it's possible things might crop up. The changes are mostly
 related to congestion control, so the sorts of issues that are likely to
 crop up if any will most probably be subtle and difficult to even
 detect. The first svn revision in question is r215166. The next few
 commits I plan to make will be basically zero impact and then another
 significant patch will follow in a few weeks.

 If you bump into an issue that you think might be related to this work,
 please roll back r215166 from your tree and attempt to reproduce before
 reporting the problem. Please CC me directly with your problem report
 and post to freebsd-current@ or freebsd-net@ as well.

 Lots more information about what all this does and how to use it will be
 following in the coming weeks, but in the meantime, just keep this note
 in the back of your mind. For the curious, some information about the
 project is available at [1,2].

 Cheers,
 Lawrence

 [1] http://caia.swin.edu.au/freebsd/5cc/
 [2]
 http://www.freebsd.org/news/status/report-2010-07-2010-09.html#Five-New-TCP-Congestion-Control-Algorithms-for-FreeBSD
 
 Lawrence,
 
 Great news! I've been looking forward to having these congestion
 algorithms for a while and this is clearly a big step to getting there.

HTCP and CUBIC should become available next week (the zero impact
commits I mentioned in my email above) and then a chunk of additional
infrastructure is needed in order to add our delay based algorithm
modules to the tree. We anticipate having all the code in head by the
time Christmas rolls around assuming no major issues crop up.

 Do you intend to MFC this for 8.2?

No. I wouldn't feel comfortable unleashing this on people with only 2
weeks of soak time in head. The current MFC schedule is 3 months from
yesterday, so it will be in stable/8 and hopefully stable/7 as well
shortly after 8.2 and 7.4 are released.

Cheers,
Lawrence


Re: sysctl -a is slow

2010-09-20 Thread Lawrence Stewart
On 09/21/10 02:21, David Xu wrote:
 jhell wrote:

 On Mon, 20 Sep 2010 10:26, David Xu wrote:
 In Message-Id: 4c976f14.8000...@freebsd.org

 jhell wrote:
 On 09/19/2010 09:28, David Xu wrote:
 I just typed sysctl -a at the keyboard and found it is slow; sometimes
 it gets stuck for a few seconds. On further study, I found it is
 stuck at sysctl kern.geom:

 %/usr/bin/time sysctl -a kern.geom
 kern.geom.collectstats: 1
 kern.geom.debugflags: 0
 kern.geom.label.debug: 0
 kern.geom.label.ext2fs.enable: 1
 kern.geom.label.iso9660.enable: 1
 kern.geom.label.msdosfs.enable: 1
 kern.geom.label.ntfs.enable: 1
 kern.geom.label.reiserfs.enable: 1
 kern.geom.label.ufs.enable: 1
 kern.geom.label.ufsid.enable: 1
 kern.geom.label.gptid.enable: 1
 kern.geom.label.gpt.enable: 1
2.01 real 0.00 user 0.00 sys

 it seems it needs more than 2 seconds to complete.


 A ktrace(1) and a kdump(1) of the resulting ktrace.out file would
 probably help here along with uname -a. I've seen this happen once
 before
 but do not recall what caused it.


 Regards & good luck,


 Result is dumped here.
 http://people.freebsd.org/~davidxu/sysctl_slow.txt
 I think the culprit is sysctl kern.geom.confdot,
 which does not appear in normal output, until I check the kdump result.
 I tried five times, and it was blocked three times.



 Inspecting the output of sysctl -b kern.geom.confdot will show you
 what you currently have configured in the system as disks and so on
 through geom. If it seems to be stalling at that point (it is an
 opaque MIB/OID which doesn't show up unless you use the -o
 switch to sysctl(1)), could you check the labels on your disks
 for any weird characters?

 ( sysctl -bo kern.geom )

 Also, does this have the same effect when run in an xterm or a cons25
 terminal?

 And the same for the above, but with the C, *_COUNTRY.UTF-8 or your normal
 locale?

 ( env LANG=C sysctl kern.geom )

 Looking at the output from mine there are quite a few unprintable
 characters present. Maybe these are having an impact with one of your
 labels.


 
 I redirected all output to a disk file, and it still needs 1 second to
 complete. This machine is a dual-core Pentium E5500, faster than the previous
 one, which is a dual-core AMD 5000+ machine; the 5000+ needs 2
 seconds to complete.
 
 $/usr/bin/time sysctl -b kern.geom.confdot > sysctl_geom_confdot.txt
 1.00 real 0.00 user 0.00 sys
 
 the file is here:
 http://people.freebsd.org/~davidxu/sysctl_geom_confdot.txt

As an extra data point, running /usr/bin/time sysctl -b
kern.geom.confdot repeatedly on my amd64 8.1-STABLE desktop takes
between 0s and 2s. It reports 0s the majority of the time, but every 5 or so
runs it'll stall for 1 or 2 seconds. So the problem isn't isolated to head.

Cheers,
Lawrence


Re: sysctl -a is slow

2010-09-20 Thread Lawrence Stewart
On 09/21/10 00:01, David Xu wrote:
 Lawrence Stewart wrote:
 On 09/21/10 02:21, David Xu wrote:
  
 jhell wrote:

 On Mon, 20 Sep 2010 10:26, David Xu wrote:
 In Message-Id: 4c976f14.8000...@freebsd.org

  
 jhell wrote:

 On 09/19/2010 09:28, David Xu wrote:
  
 I just typed sysctl -a on the keyboard and found it is slow; sometimes
 it gets stuck for a few seconds. On further study, I found it is
 stuck at sysctl kern.geom:

 %/usr/bin/time sysctl -a kern.geom
 kern.geom.collectstats: 1
 kern.geom.debugflags: 0
 kern.geom.label.debug: 0
 kern.geom.label.ext2fs.enable: 1
 kern.geom.label.iso9660.enable: 1
 kern.geom.label.msdosfs.enable: 1
 kern.geom.label.ntfs.enable: 1
 kern.geom.label.reiserfs.enable: 1
 kern.geom.label.ufs.enable: 1
 kern.geom.label.ufsid.enable: 1
 kern.geom.label.gptid.enable: 1
 kern.geom.label.gpt.enable: 1
2.01 real 0.00 user 0.00 sys

 it seems it needs more than 2 seconds to complete.

 
 A ktrace(1) and a kdump(1) of the resulting ktrace.out file would
 probably help here, along with uname -a. I've seen this happen once
 before but do not recall what caused it.


 Regards & good luck,

   
 Result is dumped here.
 http://people.freebsd.org/~davidxu/sysctl_slow.txt
 I think the culprit is sysctl kern.geom.confdot,
 which does not appear in the normal output; I only noticed it when I
 checked the kdump result.
 I tried five times, and it was blocked three times.

 
 Inspecting the output of sysctl -b kern.geom.confdot will show you
 what you currently have configured in the system as disks and so on
 through geom. If it seems to be stalling at that point (it is an
 opaque MIB/OID which doesn't show up unless you use the -o
 switch to sysctl(1)), could you check the labels on your disks
 for any weird characters?

 ( sysctl -bo kern.geom )

 Also, does this have the same effect when run in an xterm or a cons25
 terminal?

 And the same for the above, but with the C, *_COUNTRY.UTF-8 or your normal
 locale?

 ( env LANG=C sysctl kern.geom )

 Looking at the output from mine there are quite a few unprintable
 characters present. Maybe these are having an impact with one of your
 labels.


   
 I redirected all output to a disk file, and it still needs 1 second to
 complete. This machine is a dual-core Pentium E5500, faster than the previous
 one, which is a dual-core AMD 5000+ machine; the 5000+ needs 2
 seconds to complete.

 $/usr/bin/time sysctl -b kern.geom.confdot > sysctl_geom_confdot.txt
 1.00 real 0.00 user 0.00 sys

 the file is here:
 http://people.freebsd.org/~davidxu/sysctl_geom_confdot.txt
 

 As an extra data point, running /usr/bin/time sysctl -b
 kern.geom.confdot repeatedly on my amd64 8.1-STABLE desktop takes
 between 0s and 2s. It reports 0s the majority of the time, but every 5 or so
 runs it'll stall for 1 or 2 seconds. So the problem isn't isolated to
 head.

 Cheers,
 Lawrence
   
 
 I happened to set kern.sched.preempt_thresh=200, so the kernel is more
 aggressive than default about thread preemption. That makes the problem
 easier to reproduce than with the default setting; my desktop machine is
 idle, but it still stalls 1 or 2 seconds on the sysctl.


heh, from /etc/sysctl.conf on the machine I tested with:

# 4/9/2010
# should give more responsiveness on desktop suggested
# by David Xu davi...@freebsd.org on freebsd-stable@
kern.sched.preempt_thresh=220

This machine is my primary kde4 desktop at home.

Cheers,
Lawrence


Re: Crash during boot of current (rev 212885)

2010-09-19 Thread Lawrence Stewart
Hiya Randall!

On 09/20/10 08:56, Randall Stewart wrote:
 Hey all:
 
 I am now seeing a crash when I boot my Intel (in 64-bit mode)...
 
 It's very early in the boot process, and thus there's no crash dump ;-0
 
 It's in
 
 netisr_start_swi()
 
 When it initializes netisr_mtx with mtx_init() it crashes, saying
 that netisr_mtx is unaligned (the address ddb shows for netisr_mtx ends
 with c, so it definitely is unaligned).
 
 Looking at the netisr_workstream structure (where netisr_mtx lives), it
 appears in theory to be aligned correctly (it follows 2 pointers), so
 did something change the DPCPU define stuff to cause us to get unaligned
 access?
 
 Just curious... If I don't hear from anyone I guess I will start backing
 things out 1 rev at a time until I find what did it ;-)

My guess would be r212647. Try backing that rev out and if it fixes
things, hopefully Andriy will have some thoughts on how to fix the
problem. Apologies if my guess is a red herring.

Cheers,
Lawrence


Re: amd64 panic snd_hda - hdac_get_capabilities: Invalid corb size (0)

2010-07-27 Thread Lawrence Stewart
On 07/27/10 02:07, Anton Shterenlikht wrote:
 On Mon, Jul 26, 2010 at 02:24:52PM +0100, Anton Shterenlikht wrote:
 On amd64 r210496 I get this panic when booting a kernel
 with snd_hda(4). I haven't used this driver before, so
 can't say if this is a regression.

 (copied by hand)

 hdac0: <ATI SB600 High Definition Audio Controller> irq 16 at device 20.2 on pci0
 hdac0: HDA Driver Revision: 20100226_0142
 hdac0: [ITHREAD]
 hdac0: hdac_get_capabilities: Invalid corb size (0)
 device_attach: hdac0 attach returned 6
 Slab at 0xff000261eb18, freei 3 = 0
 panic: Duplicate free of item 0xff0002661c00 from zone 0xff00b7f9a500(1024)

 cpuid = 0
 KDB: enter: panic
 [ thread pid 0 tid 10 ]
 Stopped at kdb_enter+0x3d: movq  $0,0x74f360(%rip)
 db> bt

 (very long output.. ending in)

 mi_startup() at mi_startup+0x59
 btext() at btext+0x2c
 
 I moved back as far as r204000, still the same panic.
 
 Please advise

I get this same panic on my Toshiba Portege R600 laptop when I boot it
into Windows and then reboot it into FreeBSD. My guess is that the
Windows drivers leave the hardware in a state which the FreeBSD code
doesn't know how to deal with. I don't run Windows often so haven't hit
this panic in a while, but the trick that always worked for me was to go
into the BIOS and Reset to defaults, then boot into FreeBSD.

Cheers,
Lawrence


Re: amd64 panic snd_hda - hdac_get_capabilities: Invalid corb size (0)

2010-07-27 Thread Lawrence Stewart
On 07/27/10 18:09, Anton Shterenlikht wrote:
 On Tue, Jul 27, 2010 at 05:37:49PM +1000, Lawrence Stewart wrote:
 On 07/27/10 02:07, Anton Shterenlikht wrote:
 On Mon, Jul 26, 2010 at 02:24:52PM +0100, Anton Shterenlikht wrote:
 On amd64 r210496 I get this panic when booting a kernel
 with snd_hda(4). I haven't used this driver before, so
 can't say if this is a regression.

 (copied by hand)

 hdac0: <ATI SB600 High Definition Audio Controller> irq 16 at device 20.2 on pci0
 hdac0: HDA Driver Revision: 20100226_0142
 hdac0: [ITHREAD]
 hdac0: hdac_get_capabilities: Invalid corb size (0)
 device_attach: hdac0 attach returned 6
 Slab at 0xff000261eb18, freei 3 = 0
 panic: Duplicate free of item 0xff0002661c00 from zone 0xff00b7f9a500(1024)

 cpuid = 0
 KDB: enter: panic
 [ thread pid 0 tid 10 ]
 Stopped at kdb_enter+0x3d: movq  $0,0x74f360(%rip)
 db> bt

 (very long output.. ending in)

 mi_startup() at mi_startup+0x59
 btext() at btext+0x2c

 I moved back as far as r204000, still the same panic.

 Please advise

 I get this same panic on my Toshiba Portege R600 laptop when I boot it
 into Windows and then reboot it into FreeBSD. My guess is that the
 Windows drivers leave the hardware in a state which the FreeBSD code
 doesn't know how to deal with. I don't run Windows often so haven't hit
 this panic in a while, but the trick that always worked for me was to go
 into the BIOS and Reset to defaults, then boot into FreeBSD.
 
 no, that doesn't help, still the same panic
 
 Also, I've only FBSD installed on this laptop (HP Compaq 6715s),
 no other OS.

Hmm, I'll have to try the patch and see if it resolves the issue for me.
I guess in my case resetting the BIOS was causing a different code path
to be taken and thus the panic never triggered. Good to hear it's
resolved for you though.

Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-07-03 Thread Lawrence Stewart

On 06/28/10 18:56, Lawrence Stewart wrote:

Hi again,

After my most recent appeal for testers, I received some excellent
feedback and thank everyone that has tried the patch. I've ironed out a
couple of bugs and have what I hope is the import-ready candidate patch
available for a final round of testing.

Please read on if you are able and willing to (re)test the code.


[snip]

I've committed SIFTR to head as r209662, with r209665 as a minor follow-up 
fix to include the man page in the build.


Sincere thanks to everyone that pitched in with review/testing and if 
you haven't already tried it, give it a spin next time you update your 
sources to r209665 or later - man siftr will get you going. Please CC 
me explicitly on any mail regarding problems with SIFTR.


On the off chance anyone is looking for some self-contained, small 
projects/patches to work on, I have plenty of additional ideas for 
improvements to SIFTR. I'd be very happy to collaborate with anyone who 
is interested enough to work on the code.


Enjoy!

Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-28 Thread Lawrence Stewart

Hi again,

After my most recent appeal for testers, I received some excellent 
feedback and thank everyone that has tried the patch. I've ironed out a 
couple of bugs and have what I hope is the import-ready candidate patch 
available for a final round of testing.


Please read on if you are able and willing to (re)test the code.

On 06/19/10 13:27, Lawrence Stewart wrote:

Amount of feedback received thus far: nichts, nil, nada

*sings I'm so ronery in his best Kim Jong-il voice* [4]

Just like Uncle Sam [5], Uncle Lawrence needs you too - yes, I'm
pointing at YOU!

More specifically, people out there running current with 10-15 mins to
spare for some testing, please read on.

On 06/13/10 18:12, Lawrence Stewart wrote:

Hi all,

The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are
discovered.

SIFTR is a kernel module that logs a range of statistics on active TCP
connections to a log file. It provides the ability to make highly
granular measurements of TCP connection state, aimed at system
administrators, developers and researchers. You can use the data to find
bugs in the stack, understand why connections are performing badly and
test new code to name a few uses.

Development has been made possible in part by grants from the Cisco
University Research Program Fund at Community Foundation Silicon Valley,
and the FreeBSD Foundation. Bringing it into FreeBSD proper is being
carried out under the auspices of the Enhancing the FreeBSD TCP
Implementation FreeBSD Foundation project. More details are available
at [1,2,3].

If you can help out, please read on!


[snip]

Latest patch which fixes 2 bugs reported by testers and adds a bit more 
discussion to the man page is available here:


http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209558.patch

Fixed bugs:
- Running SIFTR on an INVARIANTS enabled kernel with a large number of 
TCP flows terminating on the machine would lead to a KASSERT triggering 
in the ALQ framework when SIFTR was disabled.

- The SACK enabled data log message field was not being set correctly.

If you would like to test on a kernel revision older than r209558, make 
sure you have my r209325 diff to sys/pcpu.h applied. It is safe to 
apply r209325 stand alone as it is self contained and not used by any 
code in the tree other than SIFTR.


Please adapt the following instructions as appropriate based on the 
patch version you're testing.



Copy it to the root of your source tree and run the following:

patch -p1 < siftr_9.x.r209119.patch

It's a loadable kernel module so you can build it for testing like so:

cd path/to/src/sys/modules/siftr
make
kldload ./siftr.ko
(don't forget to make cleandir to remove cruft when finished testing)


It turns out that the above instructions to build the module can produce 
a .ko that is out of sync with your kernel in such a way that the module 
can load, but may blow up unexpectedly. This was observed when KTR was 
enabled in the running kernel.


To be safe, please use the following procedure instead:

- Ensure path/to/src is the source tree that the kernel you are 
currently running was built from.


cd path/to/src
make buildkernel
cp /usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko /tmp

kldload /tmp/siftr.ko

Alternatively for the last 2 steps, you can make installkernel ; 
shutdown -r now after the kernel build completes and then simply 
kldload siftr as the module will be installed to /boot/kernel/ as per 
usual.



After applying the patch, you can read the man page by running:

man -M path/to/src/share/man siftr

If I've done a decent job, all the info you need to understand what it
does and how to use it should be in the man page.

I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.

That should be enough to get the ball rolling. Thanks and I look forward
to hearing from you!

Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/etcp09/

[2] http://www.freebsdfoundation.org/projects.shtml#Swinburne

[3] http://caia.swin.edu.au/urp/newtcp/


[4] http://www.youtube.com/watch?v=xh_9QhRzJEs (language warning)

[5] http://www.sonofthesouth.net/uncle-sam/images/uncle-sam-wants-you.jpg


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-21 Thread Lawrence Stewart

On 06/22/10 04:52, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/21/10 05:44, Rui Paulo wrote:


On 20 Jun 2010, at 20:36, Fabian Keil wrote:


Fabian Keil <freebsd-lis...@fabiankeil.de> wrote:


Fabian Keil <freebsd-lis...@fabiankeil.de> wrote:



My custom kernel normally doesn't have INVARIANTS and WITNESS
enabled, so I'll try to enable them next.


The culprit seems to be non-default KTR settings in the kernel
while loading alq as a module.


Actually whether or not alq is loaded as a module doesn't
seem to matter, with:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)
options KTR_CPUMASK=0x3
options ALQ
options KTR_ALQ

enabling siftr panics the system, too.


That's probably because your module was built with different compile time 
options than the ones used in the kernel. These options may change structure 
sizes, function parameters, etc. and that easily causes panics.


hmm I wonder if my instructions to build SIFTR manually are causing your
problems. Fabian, is the siftr.ko module you're loading built as part of
a make buildkernel, or did you follow my instructions and cd
/path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko?


The latter.


If the latter is true, perhaps try and explicitly build SIFTR as part of
make buildkernel and see if loading the module built that way still
triggers the panic when enabled (the module will be in
/usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko
or if you make installkernel it'll be in /boot/kernel/siftr.ko).


That seems to work.


Damn, well this is the first time I've encountered a problem like this 
whilst using SIFTR compiled standalone and I've been using it like that 
for almost 3 years. I guess the lack of KTR in the module build subtly 
influences the module in a way that allows it to load, but in a precarious 
way. How irritating. Rui you were right on the money!


I will revise my testing instructions to build the module as part of a 
buildkernel to avoid potential problems like this.


Thanks for helping get to the bottom of this and for the test feedback.

Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

Hi Fabian,

On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpkt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() at tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


So I've tracked down the line of code where the page fault is occurring:

	if (dir == PFIL_IN)
		ss->n_in++;
	else
		ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats 
per-cpu and is initialised at the start of the function like so:


ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your 
machine. I know very little about the inner workings of the DPCPU_* 
macros, but I'm pretty sure the way I use them in SIFTR is correct or at 
least as intended.


Could you please go ahead and retest using a GENERIC kernel and see if 
you can reproduce? There could be something in your custom kernel 
causing the offsets or linker set magic used by the DPCPU bits to break 
which in turn is triggering this panic in SIFTR.


Whether it's your custom changes breaking DPCPU or DPCPU being fragile 
remains to be seen, but the good news for me is that it looks like SIFTR 
is off the hook :)


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/20/10 21:15, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpkt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() at tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


So I've tracked down the line of code where the page fault is occurring:

	if (dir == PFIL_IN)
		ss->n_in++;
	else
		ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats
per-cpu and is initialised at the start of the function like so:

  ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your
machine. I know very little about the inner workings of the DPCPU_*
macros, but I'm pretty sure the way I use them in SIFTR is correct or at
least as intended.


siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
it itself. I think if ss was NULL, the panic should already occur in
siftr_chkreinject().


Yes but siftr_chkreinject() only dereferences ss in the exceptional case 
of a malloc failure or duplicate pkt. It's unlikely either case happens 
for you and so wouldn't trigger the panic.



To be sure I added:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index 8bc3498..b9fdfe4 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
 	if (siftr_chkreinject(*m, dir, ss))
 		goto ret;
 
+	if (ss == NULL) {
+		printf("ss is NULL");
+		ss = DPCPU_PTR(ss);
+		if (ss == NULL) {
+			printf("ss is still NULL");
+			goto ret;
+		}
+	}
+
+
 	if (dir == PFIL_IN)
 		ss->n_in++;
 	else

which doesn't seem to affect the problem.


As in it still panics and the "ss is NULL" message is not printed? I 
would have expected to at least see "ss is NULL" printed if my 
hypothesis was correct... hmm.


Perhaps the way I discovered the line number at which the panic occurred 
was wrong. I compiled SIFTR on my amd64 dev server with CFLAGS+=-g in 
the SIFTR Makefile to get debug symbols, ran "objdump -Sd siftr.ko | vim -", 
searched for the instruction reported in the panic message, i.e. 
"addq $0x1,0x8(%r14)", and then, with a bit of trial and error, recompiled 
SIFTR with the line of code "volatile int blah = 0; blah = 2;" at 
various points in the function, looking at the change in the objdump 
output to pinpoint which line of C code corresponded with the addq 
instruction.


The volatile int blah = 0; blah = 2; compiles to movl 
$0x0,0xffd4(%rbp) followed immediately by movl 
$0x2,0xffd4(%rbp). When I put that code above the if (dir 
== PFIL_IN) statement I see the objdump output show the assembly code 
before the addq instruction and when I move it after the if statement 
the assembly code moves after the addq instruction.


Perhaps you could reproduce the above procedure and see if you identify 
the same point in the siftr_chkpkt function I did for the instruction 
referenced by the panic message?



Could you please go ahead and retest using a GENERIC kernel and see if
you can reproduce? There could be something in your custom kernel
causing the offsets or linker set magic used by the DPCPU bits to break
which in turn is triggering this panic in SIFTR.


I'll retry without pf first, and with GENERIC afterwards.


Sounds good, thanks.

Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/20/10 22:28, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/20/10 21:15, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpkt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() at tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


So I've tracked down the line of code where the page fault is occurring:

	if (dir == PFIL_IN)
		ss->n_in++;
	else
		ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats
per-cpu and is initialised at the start of the function like so:

   ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your
machine. I know very little about the inner workings of the DPCPU_*
macros, but I'm pretty sure the way I use them in SIFTR is correct or at
least as intended.


siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
it itself. I think if ss was NULL, the panic should already occur in
siftr_chkreinject().


Yes but siftr_chkreinject() only dereferences ss in the exceptional case
of a malloc failure or duplicate pkt. It's unlikely either case happens
for you and so wouldn't trigger the panic.


To be sure I added:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index 8bc3498..b9fdfe4 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
 	if (siftr_chkreinject(*m, dir, ss))
 		goto ret;
 
+	if (ss == NULL) {
+		printf("ss is NULL");
+		ss = DPCPU_PTR(ss);
+		if (ss == NULL) {
+			printf("ss is still NULL");
+			goto ret;
+		}
+	}
+
+
 	if (dir == PFIL_IN)
 		ss->n_in++;
 	else

which doesn't seem to affect the problem.


As in it still panics and the "ss is NULL" message is not printed? I
would have expected to at least see "ss is NULL" printed if my
hypothesis was correct... hmm.


Yes, it still panics, but no message is printed.


It was just pointed out to me that ss doesn't have to be NULL in order 
to cause the page fault (duh). It could also just be a garbage ptr which 
is why your print statement isn't firing.


Can you trigger the panic again and look for some information along the 
lines of fault virtual address = ... as part of the panic info. 
Knowing the faulting address would be useful and may help further diagnosis.



Perhaps the way I discovered the line number at which the panic occurred
was wrong. I compiled SIFTR on my amd64 dev server with CFLAGS+=-g in
the SIFTR Makefile to get debug symbols, ran "objdump -Sd siftr.ko | vim -",
searched for the instruction reported in the panic message, i.e.
"addq $0x1,0x8(%r14)", and then, with a bit of trial and error, recompiled
SIFTR with the line of code "volatile int blah = 0; blah = 2;" at
various points in the function, looking at the change in the objdump
output to pinpoint which line of C code corresponded with the addq
instruction.

The volatile int blah = 0; blah = 2; compiles to movl
$0x0,0xffd4(%rbp) followed immediately by movl
$0x2,0xffd4(%rbp). When I put that code above the if (dir
== PFIL_IN) statement I see the objdump output show the assembly code
before the addq instruction and when I move it after the if statement
the assembly code moves after the addq instruction.


That's a neat trick.


Indeed, and I thank phk@ for suggesting it to me.


Perhaps you could reproduce the above procedure and see if you identify
the same point in the siftr_chkpkt function I did for the instruction
referenced by the panic message?


I do. Using:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index b9fdfe4..fc6bd9a 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -797,12 +797,15

Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/20/10 23:15, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/20/10 22:28, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/20/10 21:15, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpkt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() at tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


So I've tracked down the line of code where the page fault is occurring:

if (dir == PFIL_IN)
	ss->n_in++;
else
	ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats
per-cpu and is initialised at the start of the function like so:

ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your
machine. I know very little about the inner workings of the DPCPU_*
macros, but I'm pretty sure the way I use them in SIFTR is correct or at
least as intended.


siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
it itself. I think if ss was NULL, the panic should already occur in
siftr_chkreinject().


Yes but siftr_chkreinject() only dereferences ss in the exceptional case
of a malloc failure or duplicate pkt. It's unlikely either case happens
for you and so wouldn't trigger the panic.


To be sure I added:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index 8bc3498..b9fdfe4 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
 	if (siftr_chkreinject(*m, dir, ss))
 		goto ret;
 
+	if (ss == NULL) {
+		printf("ss is NULL");
+		ss = DPCPU_PTR(ss);
+		if (ss == NULL) {
+			printf("ss is still NULL");
+			goto ret;
+		}
+	}
+
+
 	if (dir == PFIL_IN)
 		ss->n_in++;
 	else

which doesn't seem to affect the problem.


As in, it still panics and the "ss is NULL" message is not printed? I
would have expected to at least see "ss is NULL" printed if my
hypothesis was correct... hmm.


Yes, it still panics, but no message is printed.


It was just pointed out to me that ss doesn't have to be NULL in order
to cause the page fault (duh). It could also just be a garbage ptr which
is why your print statement isn't firing.

Can you trigger the panic again and look for some information along the
lines of "fault virtual address = ..." as part of the panic info.
Knowing the faulting address would be useful and may help further diagnosis.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xff7f808f9de8
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x8241f800
stack pointer   = 0x28:0xff83a7d0
frame pointer   = 0x28:0xff83a840
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0


None of this looks too crazy, but at least one person I've been chatting 
to about this thinks the faulting address doesn't look quite right for a 
DPCPU variable.


Can you please get the following additional info from DDB:

show reg
show dpcpu_offset
p/x pcpu_entry_modspace

And can you also please identify the upstream FreeBSD revision number 
your kernel source is based on (as opposed to the GIT rev) so we can 
make sure we're looking at the same base sources you're running.




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/21/10 00:12, Fabian Keil wrote:

Fabian Keil <freebsd-lis...@fabiankeil.de> wrote:


Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/20/10 22:28, Fabian Keil wrote:



Taking pf (and altq) out of the picture doesn't seem to make
a difference.


Wouldn't have expected it to. Will be very curious to know if the panic
is triggered in GENERIC.


It's not. I, too, get pfil.c related LORs though:

lock order reversal:
  1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e5dd68 udp (udp) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:3035
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_rw_rlock() at _rw_rlock+0x5f
pf_socket_lookup() at pf_socket_lookup+0x1c5
pf_test_udp() at pf_test_udp+0x8b0
pf_test() at pf_test+0x1089
pf_check_in() at pf_check_in+0x39
pfil_run_hooks() at pfil_run_hooks+0xcf
ip_input() at ip_input+0x2ae
swi_net() at swi_net+0x151
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff844d30, rbp = 0 ---
lock order reversal:
  1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e5d788 tcp (tcp) @ /usr/src/sys/modules/siftr/../../netinet/siftr.c:698
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_rw_rlock() at _rw_rlock+0x5f
siftr_chkpkt() at siftr_chkpkt+0x3c4
pfil_run_hooks() at pfil_run_hooks+0xcf
ip_input() at ip_input+0x2ae
swi_net() at swi_net+0x151
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff844d30, rbp = 0 ---

My custom kernel normally doesn't have INVARIANTS and WITNESS
enabled, so I'll try to enable them next.


The culprit seems to be non-default KTR settings in the kernel
while loading alq as a module. With the following change siftr
works with my non-GENERIC kernel, too:

commit f43b8b5171c858df7b419f6a695e9e3b53531a8e
Author: Fabian Keil <f...@fabiankeil.de>
Date:   Sun Jun 20 15:43:01 2010 +0200

 Disable KTR changes.

diff --git a/sys/amd64/conf/ZOEY b/sys/amd64/conf/ZOEY
index 6fb3480..c584317 100644
--- a/sys/amd64/conf/ZOEY
+++ b/sys/amd64/conf/ZOEY
@@ -16,11 +16,11 @@ options ATA_CAM
  device  atapicam
  options SC_KERNEL_CONS_ATTR=(FG_GREEN|BG_BLACK)

-options KTR
-options KTR_ENTRIES=262144
-options KTR_COMPILE=(KTR_SCHED)
-options KTR_MASK=(KTR_SCHED)
-options KTR_CPUMASK=0x3
+#options KTR
+#options KTR_ENTRIES=262144
+#options KTR_COMPILE=(KTR_SCHED)
+#options KTR_MASK=(KTR_SCHED)
+#options KTR_CPUMASK=0x3

  options ACCEPT_FILTER_HTTP
  makeoptions WITH_CTF=yes


This smells very fishy. Without options KTR_ALQ, KTR shouldn't even 
care if ALQ exists or not. Not only that, but ALQ isn't even used in 
siftr_chkpkt and you clearly manage to successfully use ALQ to write the 
module load message to the log file. H...


Thanks for taking the time to find the culprit though - I'll see if I 
can reproduce here. Could you try another thing for me and see if 
reducing options KTR_ENTRIES=262144 down to a smaller number (maybe 
4096?) and leaving all the other KTR options as they are above (but 
uncommented) makes any difference? The ktr(4) man page indicates the 
default is 8192 entries and I'm curious whether your allocation of so 
many additional entries is making something unhappy.


Thanks again for your time helping with this, I really appreciate it.

Cheers,
Lawrence
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/21/10 05:44, Rui Paulo wrote:


On 20 Jun 2010, at 20:36, Fabian Keil wrote:


Fabian Keil <freebsd-lis...@fabiankeil.de> wrote:


Fabian Keil <freebsd-lis...@fabiankeil.de> wrote:



My custom kernel normally doesn't have INVARIANTS and WITNESS
enabled, so I'll try to enable them next.


The culprit seems to be non-default KTR settings in the kernel
while loading alq as a module.


Actually whether or not alq is loaded as a module doesn't
seem to matter, with:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)
options KTR_CPUMASK=0x3
options ALQ
options KTR_ALQ

enabling siftr panics the system, too.


That's probably because your module was built with different compile time 
options than the ones used in the kernel. These options may change structure 
sizes, function parameters, etc. and that easily causes panics.


hmm I wonder if my instructions to build SIFTR manually are causing your 
problems. Fabian, is the siftr.ko module you're loading built as part of 
a make buildkernel, or did you follow my instructions and cd 
/path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko?


If the latter is true, perhaps try to explicitly build SIFTR as part of 
make buildkernel and see if loading the module built that way still 
triggers the panic when enabled (the module will be in 
/usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko 
or, if you run make installkernel, it'll be in /boot/kernel/siftr.ko).


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-19 Thread Lawrence Stewart

Hi Lev,

On 06/19/10 16:26, Lev Serebryakov wrote:

Hello, Lawrence.
You wrote 19 June 2010, 07:27:30:


Amount of feedback received thus far: nichts, nil, nada

   I wanted to help you, but there is one problem: I don't have any
traffic-loaded 9-CURRENT machines. I have some not-so-critical 7.x and
8.x machines with noticeable traffic (for example, my torrent box
still runs 7-STABLE), but no 9-CURRENT except VMware on my desktop :(
   I think it is a common case: 9-CURRENT machines are developer ones,
without a noticeable amount of network traffic, and all traffic-loaded
machines run more stable versions.


Right now the traffic load of the test machine is not really all that 
important to the testing. As long as the module loads, logs some 
coherent looking data whilst enabled and unloads across a range of 
different hardware and kernel archs, I'll be happy. SIFTR will be 
backported to 8 and possibly 7 also, so there will be plenty of time to 
get people with more heavily loaded systems running stable branches to 
join in testing.


This is the first real push I've made to get the code widely tested, so 
I wouldn't feel comfortable asking people to run it on 
(semi-)production, stable branch systems yet. If you're really keen to 
help test it and you wouldn't be worried about running the code on such 
a system, I would be happy to create a 7 and/or 8 backport of the 
required bits. Otherwise, I'm happy to get the initial round of 
9-CURRENT only testing feedback, commit it to head and then revisit once 
it's settled and time to merge it back to the stable branches.


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research

2010-06-19 Thread Lawrence Stewart

Hi Pluknet,

On 06/19/10 18:48, pluknet wrote:
[snip]

Hi.

I'm seeing this right after enabling siftr via sysctl and changing ppl.
Sorry if that was already discussed, known or unrelated (since em is
in the locking chain).

lock order reversal:
  1st 0x80e51568 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e52788 tcp (tcp) @ /usr/src/sys/modules/siftr/../../netinet/siftr.c:698
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_rw_rlock() at _rw_rlock+0x5f
siftr_chkpkt() at siftr_chkpkt+0x374
pfil_run_hooks() at pfil_run_hooks+0xcf
ip_input() at ip_input+0x2ae
netisr_dispatch_src() at netisr_dispatch_src+0xb8
ether_demux() at ether_demux+0x17d
ether_input() at ether_input+0x175
em_rxeof() at em_rxeof+0x193
em_handle_que() at em_handle_que+0x4a
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8bed30, rbp = 0 ---


I believe I discussed this LOR with Robert Watson some time back and we 
came to the conclusion it is a false positive witness report and is safe 
to ignore. I should document it in the man page and figure out if 
there's some way to tell witness to not report it. Thanks for reminding 
me and for testing. Did everything else behave sanely and work ok?


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-19 Thread Lawrence Stewart

Hi Fabian,

Thank you for the report. This is indeed an issue I've never seen 
before and exactly the sort of thing I wanted to uncover.


On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpkt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() at tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


hmm I'd love to know which line of code siftr_chkpkt+0xd0 maps to. Let 
me read through the function carefully and see if I can spot an obvious 
null ptr deref. The hook function has received some major rototilling of 
late to get it ready for the import so I must have missed something.



This is from the third attempt. The second time I got a different
backtrace that also contained some *_iwn_* functions; the first
time I had X running, so I didn't get anything. Unfortunately
at that point the system seems to be too busted to dump core.


Typically, packets are direct dispatched into the stack from the driver 
so it is normal to see driver functions in a thread's stack trace when 
it's executing in the siftr pfil hook.



I'm using:
FreeBSD 9.0-CURRENT #99 r+b768fe1: Sat Jun 19 15:01:37 CEST 2010
 f...@r500.local:/usr/obj/usr/src/sys/ZOEY amd64
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU T5870  @ 2.00GHz (1995.01-MHz K8-class CPU)
   Origin = GenuineIntel  Id = 0x6fd  Family = 6  Model = f  Stepping = 13
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0xe39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM>
   AMD Features=0x20100800<SYSCALL,NX,LM>
   AMD Features2=0x1<LAHF>
   TSC: P-state invariant
real memory  = 2147483648 (2048 MB)
avail memory = 1976610816 (1885 MB)
ACPI APIC Table: <LENOVO TP-7Y>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard

I'm not using vanilla sources, but none of the modifications
should matter here.


Yes this does not look like an issue with your sources but with the 
siftr code itself. Don't bother testing with GENERIC yet as I'm 
confident you've given me enough info to track this down.



I have powerd running and did not yet try without it.

The system has bge0 and iwn0, but bge0 is mainly down.

pf is compiled into the kernel, siftr is loaded as a module.

The panic seems to occur without logging a single packet first:
f...@r500 ~ $ cat /var/log/siftr.log
enable_time_secs=1276966161	enable_time_usecs=945080	siftrver=1.2.3	hz=100	tcp_rtt_scale=32	sysname=FreeBSD	sysver=900014	ipmode=4
enable_time_secs=1276966586	enable_time_usecs=314023	siftrver=1.2.3	hz=100	tcp_rtt_scale=32	sysname=FreeBSD	sysver=900014	ipmode=4

I get the impression that this is reproducible, but only tried
three times (the last time with everything mounted read-only).


Thanks again for the report and I'll be in touch as soon as I get a 
chance to look at it some more (hopefully later today).


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-18 Thread Lawrence Stewart

Amount of feedback received thus far: nichts, nil, nada

*sings I'm so ronery in his best Kim Jong-il voice* [4]

Just like Uncle Sam [5], Uncle Lawrence needs you too - yes, I'm 
pointing at YOU!


More specifically, people out there running current with 10-15 mins to 
spare for some testing, please read on.


On 06/13/10 18:12, Lawrence Stewart wrote:

Hi all,

The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.

SIFTR is a kernel module that logs a range of statistics on active TCP
connections to a log file. It provides the ability to make highly
granular measurements of TCP connection state, aimed at system
administrators, developers and researchers. You can use the data to find
bugs in the stack, understand why connections are performing badly and
test new code to name a few uses.

Development has been made possible in part by grants from the Cisco
University Research Program Fund at Community Foundation Silicon Valley,
and the FreeBSD Foundation. Bringing it into FreeBSD proper is being
carried out under the auspices of the Enhancing the FreeBSD TCP
Implementation FreeBSD Foundation project. More details are available
at [1,2,3].

If you can help out, please read on!

Before continuing, make sure you're running with at least svn revision
209119 (my commit to sys/pcpu.h), or you can manually apply the
r209119 diff to your earlier rev source tree.

The SIFTR patch is here:

http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209119.patch


An updated version of the patch against svn head revision 209325 is 
available from:


http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209325.patch

There was a backwards incompatible change in the external DPCPU_SUM() 
macro in sys/pcpu.h in r209325 of head so SIFTR also had to be 
updated. Please adapt the following instructions as appropriate based on 
the patch version you're testing.



Copy it to the root of your source tree and run the following:

patch -p1 < siftr_9.x.r209119.patch

It's a loadable kernel module so you can build it for testing like so:

cd path/to/src/sys/modules/siftr
make
kldload ./siftr.ko
(don't forget to make cleandir to remove cruft when finished testing)

After applying the patch, you can read the man page by running:

man -M path/to/src/share/man siftr

If I've done a decent job, all the info you need to understand what it
does and how to use it should be in the man page.

I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.

That should be enough to get the ball rolling. Thanks and I look forward
to hearing from you!

Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/etcp09/

[2] http://www.freebsdfoundation.org/projects.shtml#Swinburne

[3] http://caia.swin.edu.au/urp/newtcp/


[4] http://www.youtube.com/watch?v=xh_9QhRzJEs (language warning)

[5] http://www.sonofthesouth.net/uncle-sam/images/uncle-sam-wants-you.jpg


[CFT] SIFTR - Statistical Information For TCP Research

2010-06-13 Thread Lawrence Stewart

Hi all,

The time has come to solicit some external testing for my SIFTR tool. 
I'm hoping to commit it within a week or so unless problems are discovered.


SIFTR is a kernel module that logs a range of statistics on active TCP 
connections to a log file. It provides the ability to make highly 
granular measurements of TCP connection state, aimed at system 
administrators, developers and researchers. You can use the data to find 
bugs in the stack, understand why connections are performing badly and 
test new code to name a few uses.


Development has been made possible in part by grants from the Cisco 
University Research Program Fund at Community Foundation Silicon Valley, 
and the FreeBSD Foundation. Bringing it into FreeBSD proper is being 
carried out under the auspices of the Enhancing the FreeBSD TCP 
Implementation FreeBSD Foundation project. More details are available 
at [1,2,3].


If you can help out, please read on!

Before continuing, make sure you're running with at least svn revision 
209119 (my commit to sys/pcpu.h), or you can manually apply the 
r209119 diff to your earlier rev source tree.


The SIFTR patch is here:

http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209119.patch

Copy it to the root of your source tree and run the following:

patch -p1 < siftr_9.x.r209119.patch

It's a loadable kernel module so you can build it for testing like so:

cd path/to/src/sys/modules/siftr
make
kldload ./siftr.ko
(don't forget to make cleandir to remove cruft when finished testing)

After applying the patch, you can read the man page by running:

man -M path/to/src/share/man siftr

If I've done a decent job, all the info you need to understand what it 
does and how to use it should be in the man page.


I'm interested in all feedback and reports of success/failure, along 
with details of the architecture tested and number of CPUs if you would 
be so kind.


That should be enough to get the ball rolling. Thanks and I look forward 
to hearing from you!


Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/etcp09/

[2] http://www.freebsdfoundation.org/projects.shtml#Swinburne

[3] http://caia.swin.edu.au/urp/newtcp/


Re: [RFC] Macro to sum DPCPU vars

2010-06-10 Thread Lawrence Stewart

On 06/10/10 22:23, John Baldwin wrote:

On Wednesday 09 June 2010 11:54:53 pm Lawrence Stewart wrote:

Does anyone have objections to or feedback on the following patch? The
macro simplifies the act of calculating an aggregate from DPCPU counters.



http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/dpcpu_sum_9.x.r208900.patch


If anyone is curious how you would use it, take a look at:


I think this is fine, though I'm about to make it smaller.  At Robert's
request I've come up with some macros to iterate over CPUs to abstract out the
CPU_ABSENT(), etc. bits.  It is at www.freebsd.org/~jhb/patches/cpu_iter.patch
Using CPU_FOREACH() should trim your macro down slightly.


Nice, I'll rework my patch and commit once your new bits hit the tree.
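The CPU_FOREACH() idea can be modelled in userland to show why it shrinks callers. This is a sketch under stated assumptions, not the macros from cpu_iter.patch: MAXCPU_MODEL, cpu_absent_map[], CPU_ABSENT_MODEL(), CPU_FOREACH_MODEL() and pcpu_counter[] are hypothetical names, and a fixed array stands in for the kernel's knowledge of which CPU ids are present.

```c
#define MAXCPU_MODEL	8

/* Hypothetical map of CPU ids: nonzero means no CPU behind that id. */
static const int cpu_absent_map[MAXCPU_MODEL] = { 0, 0, 1, 0, 1, 1, 1, 1 };

#define CPU_ABSENT_MODEL(i)	(cpu_absent_map[(i)] != 0)

/* Visit present CPU ids only; the absent check lives here, once. */
#define CPU_FOREACH_MODEL(i)					\
	for ((i) = 0; (i) < MAXCPU_MODEL; (i)++)		\
		if (!CPU_ABSENT_MODEL(i))

/* Hypothetical per-CPU counter values; absent slots are never read. */
static const unsigned long pcpu_counter[MAXCPU_MODEL] =
    { 5, 7, 100, 3, 100, 100, 100, 100 };

/* A caller no longer repeats the absent-CPU test itself. */
static unsigned long
sum_present_counters(void)
{
	unsigned long sum = 0;
	int i;

	CPU_FOREACH_MODEL(i)
		sum += pcpu_counter[i];
	return (sum);
}
```

Here sum_present_counters() returns 15 (5 + 7 + 3 from CPU ids 0, 1 and 3); the 100s in absent slots are skipped by the loop macro rather than by the caller.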

Cheers,
Lawrence


Re: RFC: etcupdate tool in base?

2010-06-10 Thread Lawrence Stewart

On 06/11/10 03:46, John Baldwin wrote:

I've had several folks ask me recently about importing etcupdate
(http://www.FreeBSD.org/~jhb/etcupdate) into the base system as an alternate
tool for updating /etc during upgrades.  Do folks have any strong objections
to doing so?  More details about how it works and an HTML version of the
manpage can be found at the URL above.



+1 for adding to base (and updating handbook chapters makeworld.html 
and small-lan.html, plus maybe /usr/src/Makefile and an UPDATING entry).


Cheers,
Lawrence


[RFC] Macro to sum DPCPU vars

2010-06-09 Thread Lawrence Stewart
Does anyone have objections to or feedback on the following patch? The 
macro simplifies the act of calculating an aggregate from DPCPU counters.


http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/dpcpu_sum_9.x.r208900.patch

If anyone is curious how you would use it, take a look at:

http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r208900_v2.patch

and search for code that references the siftr_stats struct or DPCPU.

I intend to commit the DPCPU patch in the next day or two.
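As a rough illustration of what summing DPCPU counters amounts to, the following userland model gives each CPU its own copy of a stats struct and adds up one member across all copies. It is a sketch under stated assumptions, not the DPCPU_SUM() from the patch: struct siftr_stats_model, pcpu_stats[], NCPU_MODEL and PCPU_SUM_MODEL() are hypothetical names, and a plain array indexed by CPU id stands in for the kernel's per-CPU storage.

```c
#include <stddef.h>

#define NCPU_MODEL	4

/* Hypothetical per-CPU stats record, loosely modelled on siftr's counters. */
struct siftr_stats_model {
	unsigned long n_in;
	unsigned long n_out;
};

/* One copy per CPU; each CPU only ever updates its own slot. */
static struct siftr_stats_model pcpu_stats[NCPU_MODEL];

/*
 * Sum one struct member across every CPU's copy into "result".
 * Wrapped in do { } while (0) so it behaves like a single statement.
 */
#define PCPU_SUM_MODEL(member, result)				\
	do {							\
		size_t _i;					\
		(result) = 0;					\
		for (_i = 0; _i < NCPU_MODEL; _i++)		\
			(result) += pcpu_stats[_i].member;	\
	} while (0)
```

Letting each CPU update only its own slot keeps the hot path free of shared-counter contention; the cost moves to read time, when the copies have to be added up, and in this model the total is only a snapshot if other threads are still updating their slots.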

Cheers,
Lawrence


Re: [TESTING]: ClangBSD branch needs testing before the import to HEAD

2010-05-31 Thread Lawrence Stewart

On 06/01/10 09:25, James R. Van Artsdalen wrote:
[snip interesting history]


I do suggest modifying the FreeBSD build process so that uname -a shows
the compiler and its version for both the kernel and userland.


Reading through this discussion, I wanted to draw attention to this 
footnote in James' email. It sounds like a sensible and useful 
suggestion that would go some way to addressing Kostik's concerns about 
knowing whether a kernel bug report was related to a gcc or clang built 
kernel.
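One way a build could capture the compiler identity James mentions is to record the compiler's predefined macros at compile time. This is a minimal sketch, assuming a hypothetical compiler_ident() helper; it relies only on __clang__ and __clang_version__ (Clang) and __GNUC__ and __VERSION__ (GCC), and how such a string would be plumbed into uname -a output is left open.

```c
/*
 * Hypothetical helper: return a string naming the compiler that built
 * this translation unit. Clang also defines __GNUC__ for compatibility,
 * so __clang__ has to be tested first.
 */
const char *
compiler_ident(void)
{
#if defined(__clang__)
	return ("clang " __clang_version__);
#elif defined(__GNUC__)
	return ("gcc " __VERSION__);
#else
	return ("unknown compiler");
#endif
}
```

Because the string is fixed when each object file is compiled, a kernel and a userland built by different compilers would each report their own.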


Cheers,
Lawrence