Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-07-03 Thread Lawrence Stewart

On 06/28/10 18:56, Lawrence Stewart wrote:

Hi again,

After my most recent appeal for testers, I received some excellent
feedback and thank everyone that has tried the patch. I've ironed out a
couple of bugs and have what I hope is the import-ready candidate patch
available for a final round of testing.

Please read on if you are able and willing to (re)test the code.


[snip]

I've committed SIFTR to head as r209662, with r209665 as a minor
follow-up fix to include the man page in the build.

Sincere thanks to everyone who pitched in with review and testing. If
you haven't already tried it, give it a spin next time you update your
sources to r209665 or later; "man siftr" will get you going. Please CC
me explicitly on any mail regarding problems with SIFTR.

On the off chance anyone is looking for some small, self-contained
projects or patches to work on, I have plenty of additional ideas for
improvements to SIFTR. I'd be very happy to collaborate with anyone who
is interested enough to work on the code.


Enjoy!

Cheers,
Lawrence
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-28 Thread Lawrence Stewart

Hi again,

After my most recent appeal for testers, I received some excellent 
feedback and thank everyone that has tried the patch. I've ironed out a 
couple of bugs and have what I hope is the import-ready candidate patch 
available for a final round of testing.


Please read on if you are able and willing to (re)test the code.

On 06/19/10 13:27, Lawrence Stewart wrote:

Amount of feedback received thus far: nichts, nil, nada

*sings I'm so ronery in his best Kim Jong-il voice* [4]

Just like Uncle Sam [5], Uncle Lawrence needs you too - yes, I'm
pointing at YOU!

More specifically, people out there running current with 10-15 mins to
spare for some testing, please read on.

On 06/13/10 18:12, Lawrence Stewart wrote:

Hi all,

The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are
discovered.

SIFTR is a kernel module that logs a range of statistics on active TCP
connections to a log file. It provides the ability to make highly
granular measurements of TCP connection state, and is aimed at system
administrators, developers and researchers. You can use the data to find
bugs in the stack, understand why connections are performing badly and
test new code, to name a few uses.

Development has been made possible in part by grants from the Cisco
University Research Program Fund at Community Foundation Silicon Valley,
and the FreeBSD Foundation. Bringing it into FreeBSD proper is being
carried out under the auspices of the "Enhancing the FreeBSD TCP
Implementation" FreeBSD Foundation project. More details are available
at [1,2,3].

If you can help out, please read on!


[snip]

The latest patch, which fixes 2 bugs reported by testers and adds a bit
more discussion to the man page, is available here:

http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209558.patch

Fixed bugs:
- Running SIFTR on an INVARIANTS-enabled kernel with a large number of
TCP flows terminating on the machine would trigger a KASSERT in the ALQ
framework when SIFTR was disabled.

- The SACK-enabled field of the data log message was not being set
correctly.

If you would like to test on a kernel revision older than r209558, make
sure you have my r209325 diff to sys/pcpu.h applied. It is safe to
apply r209325 standalone, as it is self-contained and not used by any
code in the tree other than SIFTR.

Please adapt the following instructions as appropriate based on the
patch version you're testing.



Copy it to the root of your source tree and run the following:

patch -p1 < siftr_9.x.r209119.patch

It's a loadable kernel module so you can build it for testing like so:

cd path/to/src/sys/modules/siftr
make
kldload ./siftr.ko
(don't forget to make cleandir to remove cruft when finished testing)


It turns out that the above instructions to build the module can produce 
a .ko that is out of sync with your kernel in such a way that the module 
can load, but may blow up unexpectedly. This was observed when KTR was 
enabled in the running kernel.


To be safe, please use the following procedure instead:

- Ensure path/to/src is the source tree that the kernel you are 
currently running was built from.


cd path/to/src
make buildkernel
cp /usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko /tmp
kldload /tmp/siftr.ko

Alternatively, for the last 2 steps, you can run "make installkernel ;
shutdown -r now" after the kernel build completes and then simply
"kldload siftr", as the module will be installed to /boot/kernel/ as per
usual.



After applying the patch, you can read the man page by running:

man -M path/to/src/share/man siftr

If I've done a decent job, all the info you need to understand what it
does and how to use it should be in the man page.

I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.

That should be enough to get the ball rolling. Thanks and I look forward
to hearing from you!

Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/etcp09/

[2] http://www.freebsdfoundation.org/projects.shtml#Swinburne

[3] http://caia.swin.edu.au/urp/newtcp/


[4] http://www.youtube.com/watch?v=xh_9QhRzJEs (language warning)

[5] http://www.sonofthesouth.net/uncle-sam/images/uncle-sam-wants-you.jpg


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-21 Thread Fabian Keil
Lawrence Stewart <lstew...@freebsd.org> wrote:

 On 06/21/10 05:44, Rui Paulo wrote:
 
  On 20 Jun 2010, at 20:36, Fabian Keil wrote:
 
  Fabian Keil <freebsd-lis...@fabiankeil.de> wrote:
 
  Fabian Keil <freebsd-lis...@fabiankeil.de> wrote:
 
  My custom kernel normally doesn't have INVARIANTS and WITNESS
  enabled, so I'll try to enable them next.
 
  The culprit seems to be non-default KTR settings in the kernel
  while loading alq as a module.
 
  Actually whether or not alq is loaded as a module doesn't
  seem to matter, with:
 
  options KTR
  options KTR_ENTRIES=262144
  options KTR_COMPILE=(KTR_SCHED)
  options KTR_MASK=(KTR_SCHED)
  options KTR_CPUMASK=0x3
  options ALQ
  options KTR_ALQ
 
  enabling siftr panics the system, too.
 
  That's probably because your module was built with different compile time 
  options than the ones used in the kernel. These options may change 
  structure sizes, function parameters, etc. and that easily causes panics.
 
 hmm I wonder if my instructions to build SIFTR manually are causing your 
 problems. Fabian, is the siftr.ko module you're loading built as part of 
 a make buildkernel, or did you follow my instructions and cd 
 /path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko?

The latter.

 If the latter is true, perhaps try to explicitly build SIFTR as part of 
 make buildkernel and see if loading the module built that way still 
 triggers the panic when enabled (the module will be in 
 /usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko
 or, if you make installkernel, it'll be in /boot/kernel/siftr.ko).

That seems to work.

Fabian




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-21 Thread Lawrence Stewart

On 06/22/10 04:52, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/21/10 05:44, Rui Paulo wrote:


[snip]

That's probably because your module was built with different compile time 
options than the ones used in the kernel. These options may change structure 
sizes, function parameters, etc. and that easily causes panics.


hmm I wonder if my instructions to build SIFTR manually are causing your
problems. Fabian, is the siftr.ko module you're loading built as part of
a make buildkernel, or did you follow my instructions and cd
/path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko?


The latter.


If the latter is true, perhaps try to explicitly build SIFTR as part of
make buildkernel and see if loading the module built that way still
triggers the panic when enabled (the module will be in
/usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko
or, if you make installkernel, it'll be in /boot/kernel/siftr.ko).


That seems to work.


Damn, well this is the first time I've encountered a problem like this
whilst using SIFTR compiled standalone, and I've been using it like that
for almost 3 years. I guess the lack of KTR in the module build subtly
influences the module in a way that allows it to load, but in a
precarious state. How irritating. Rui, you were right on the money!


I will revise my testing instructions to build the module as part of a 
buildkernel to avoid potential problems like this.


Thanks for helping get to the bottom of this and for the test feedback.

Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

Hi Fabian,

On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq  $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpkt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() at tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


So I've tracked down the line of code where the page fault is occurring:

	if (dir == PFIL_IN)
		ss->n_in++;
	else
		ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats 
per-cpu and is initialised at the start of the function like so:


ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your 
machine. I know very little about the inner workings of the DPCPU_* 
macros, but I'm pretty sure the way I use them in SIFTR is correct or at 
least as intended.


Could you please go ahead and retest using a GENERIC kernel and see if 
you can reproduce? There could be something in your custom kernel 
causing the offsets or linker set magic used by the DPCPU bits to break 
which in turn is triggering this panic in SIFTR.


Whether it's your custom changes breaking DPCPU or DPCPU being fragile
remains to be seen, but the good news for me is that it looks like SIFTR
is off the hook :)


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Fabian Keil
Lawrence Stewart <lstew...@freebsd.org> wrote:

 On 06/20/10 03:58, Fabian Keil wrote:
  Lawrence Stewart <lstew...@freebsd.org> wrote:
 
  [snip]
 
 So I've tracked down the line of code where the page fault is occurring:
 
  	if (dir == PFIL_IN)
  		ss->n_in++;
  	else
  		ss->n_out++;
 
 ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats 
 per-cpu and is initialised at the start of the function like so:
 
  ss = DPCPU_PTR(ss);
 
 So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your 
 machine. I know very little about the inner workings of the DPCPU_* 
 macros, but I'm pretty sure the way I use them in SIFTR is correct or at 
 least as intended.

siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
it itself. I think if ss was NULL, the panic should already occur in
siftr_chkreinject().

To be sure I added:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index 8bc3498..b9fdfe4 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
 	if (siftr_chkreinject(*m, dir, ss))
 		goto ret;
 
+	if (ss == NULL) {
+		printf("ss is NULL");
+		ss = DPCPU_PTR(ss);
+		if (ss == NULL) {
+			printf("ss is still NULL");
+			goto ret;
+		}
+	}
+
+
 	if (dir == PFIL_IN)
 		ss->n_in++;
 	else

which doesn't seem to affect the problem.

 Could you please go ahead and retest using a GENERIC kernel and see if 
 you can reproduce? There could be something in your custom kernel 
 causing the offsets or linker set magic used by the DPCPU bits to break 
 which in turn is triggering this panic in SIFTR.

I'll retry without pf first, and with GENERIC afterwards.

Fabian




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/20/10 21:15, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


[snip]


siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
it itself. I think if ss was NULL, the panic should already occur in
siftr_chkreinject().


Yes but siftr_chkreinject() only dereferences ss in the exceptional case 
of a malloc failure or duplicate pkt. It's unlikely either case happens 
for you and so wouldn't trigger the panic.



To be sure I added:

[snip]

which doesn't seem to affect the problem.


As in, it still panics and the "ss is NULL" message is not printed? I
would have expected to at least see "ss is NULL" printed if my
hypothesis was correct... hmm.


Perhaps the way I discovered the line number at which the panic occurred
was wrong. I compiled SIFTR on my amd64 dev server with "CFLAGS+=-g" in
the SIFTR Makefile to get debug symbols, ran
"objdump -Sd siftr.ko | vim -", searched for the instruction reported in
the panic message, i.e. "addq $0x1,0x8(%r14)", and then, with a bit of
trial and error, recompiled SIFTR with the line of code
"volatile int blah = 0; blah = 2;" at various points in the function,
looking at the change in the objdump output to pinpoint which line of C
code corresponded to the addq instruction.


The "volatile int blah = 0; blah = 2;" compiles to
"movl $0x0,0xffd4(%rbp)" followed immediately by
"movl $0x2,0xffd4(%rbp)". When I put that code above the
"if (dir == PFIL_IN)" statement, the objdump output shows the marker
assembly before the addq instruction, and when I move it after the if
statement, the marker assembly moves after the addq instruction.


Perhaps you could reproduce the above procedure and see if you identify 
the same point in the siftr_chkpkt function I did for the instruction 
referenced by the panic message?



Could you please go ahead and retest using a GENERIC kernel and see if
you can reproduce? There could be something in your custom kernel
causing the offsets or linker set magic used by the DPCPU bits to break
which in turn is triggering this panic in SIFTR.


I'll retry without pf first, and with GENERIC afterwards.


Sounds good, thanks.

Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Fabian Keil
Lawrence Stewart <lstew...@freebsd.org> wrote:

 On 06/20/10 21:15, Fabian Keil wrote:
  Lawrence Stewart <lstew...@freebsd.org> wrote:
 
  [snip]
 
 As in, it still panics and the "ss is NULL" message is not printed? I 
 would have expected to at least see "ss is NULL" printed if my 
 hypothesis was correct... hmm.

Yes, it still panics, but no message is printed.

 Perhaps the way I discovered the line number at which the panic occurred
 was wrong. I compiled SIFTR on my amd64 dev server with "CFLAGS+=-g" in
 the SIFTR Makefile to get debug symbols, ran
 "objdump -Sd siftr.ko | vim -", searched for the instruction reported in
 the panic message, i.e. "addq $0x1,0x8(%r14)", and then, with a bit of
 trial and error, recompiled SIFTR with the line of code
 "volatile int blah = 0; blah = 2;" at various points in the function,
 looking at the change in the objdump output to pinpoint which line of C
 code corresponded to the addq instruction.
 
 The "volatile int blah = 0; blah = 2;" compiles to
 "movl $0x0,0xffd4(%rbp)" followed immediately by
 "movl $0x2,0xffd4(%rbp)". When I put that code above the
 "if (dir == PFIL_IN)" statement, the objdump output shows the marker
 assembly before the addq instruction, and when I move it after the if
 statement, the marker assembly moves after the addq instruction.

That's a neat trick.
 
 Perhaps you could reproduce the above procedure and see if you identify 
 the same point in the siftr_chkpkt function I did for the instruction 
 referenced by the panic message?

I do. Using:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index b9fdfe4..fc6bd9a 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -797,12 +797,15 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
 		}
 	}
 
+volatile int blah = 0; blah = 2;
 
 	if (dir == PFIL_IN)
 		ss->n_in++;
 	else
 		ss->n_out++;
 
+volatile int foo = 0; foo = 3;
+
 	/*
 	 * Create a tcphdr 

Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/20/10 22:28, Fabian Keil wrote:

Lawrence Stewart <lstew...@freebsd.org> wrote:


[snip]


As in, it still panics and the "ss is NULL" message is not printed? I
would have expected to at least see "ss is NULL" printed if my
hypothesis was correct... hmm.


Yes, it still panics, but no message is printed.


It was just pointed out to me that ss doesn't have to be NULL in order
to cause the page fault (duh). It could also just be a garbage pointer,
which would explain why your print statement isn't firing.


Can you trigger the panic again and look for some information along the
lines of "fault virtual address = ..." as part of the panic info?
Knowing the faulting address would be useful and may help further diagnosis.



Perhaps the way I discovered the line number at which the panic occurred
was wrong. I compiled SIFTR on my amd64 dev server with "CFLAGS+=-g" in
the SIFTR Makefile to get debug symbols, ran
"objdump -Sd siftr.ko | vim -", searched for the instruction reported in
the panic message, i.e. "addq $0x1,0x8(%r14)", and then, with a bit of
trial and error, recompiled SIFTR with the line of code
"volatile int blah = 0; blah = 2;" at various points in the function,
looking at the change in the objdump output to pinpoint which line of C
code corresponded to the addq instruction.

The "volatile int blah = 0; blah = 2;" compiles to
"movl $0x0,0xffd4(%rbp)" followed immediately by
"movl $0x2,0xffd4(%rbp)". When I put that code above the
"if (dir == PFIL_IN)" statement, the objdump output shows the marker
assembly before the addq instruction, and when I move it after the if
statement, the marker assembly moves after the addq instruction.


That's a neat trick.


Indeed, and I thank phk@ for suggesting it to me.


Perhaps you could reproduce the above procedure and see if you identify
the same point in the siftr_chkpkt function I did for the instruction
referenced by the panic message?


I do. Using:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index b9fdfe4..fc6bd9a 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -797,12 +797,15 

Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Fabian Keil
Lawrence Stewart <lstew...@freebsd.org> wrote:

 On 06/20/10 22:28, Fabian Keil wrote:
  Lawrence Stewart <lstew...@freebsd.org> wrote:
 
  On 06/20/10 21:15, Fabian Keil wrote:
  Lawrence Stewartlstew...@freebsd.org   wrote:
 
  On 06/20/10 03:58, Fabian Keil wrote:
  Lawrence Stewartlstew...@freebsd.orgwrote:
 
  On 06/13/10 18:12, Lawrence Stewart wrote:
 
  The time has come to solicit some external testing for my SIFTR tool.
  I'm hoping to commit it within a week or so unless problems are 
  discovered.
 
  I'm interested in all feedback and reports of success/failure, along
  with details of the architecture tested and number of CPUs if you 
  would
  be so kind.
 
  I got the following hand-transcribed panic maybe a second after
  sysctl net.inet.siftr.enabled=1
 
  Fatal trap 12: page fault while in kernel mode
  cpuid = 1; apic id = 01
  [...]
  current process = 12 (swi4: clock)
  [ thread pid 12 tid 16 ]
  Stopped at  siftr_chkpkt+0xd0:  addq$0x1,0x8(%r14)
  dbwhere
  Tracing pid 12 tid 16 td 0xff00034037e0
  siftr_chkpt() at siftr_chkpkt+0xd0
  pfil_run_hooks() at pfil_run_hooks+0xb4
  ip_output() at ip_output+0x382
  tcp_output() tcp_output+0xa41
  tcp_timer_rexmt() at tcp_timer_rexmt+0x251
  softclock() at softclock+0x291
  intr_event_execute_handlers() at intr_event_execute_handlers+0x66
  ithread_loop at ithread_loop+0x8e
  fork_exit() at fork_exit+0x112
  fork_trampoline() at fork_trampoline+0xe
  --- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---
 
  So I've tracked down the line of code where the page fault is occurring:
 
 if (dir == PFIL_IN)
 ss-n_in++;
 else
 ss-n_out++;
 
  ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats
  per-cpu and is initialised at the start of the function like so:
 
 ss = DPCPU_PTR(ss);
 
  So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your
  machine. I know very little about the inner workings of the DPCPU_*
  macros, but I'm pretty sure the way I use them in SIFTR is correct or at
  least as intended.
 
  siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
  it itself. I think if ss was NULL, the panic should already occur in
  siftr_chkreinject().
 
  Yes but siftr_chkreinject() only dereferences ss in the exceptional case
  of a malloc failure or duplicate pkt. It's unlikely either case happens
  for you and so wouldn't trigger the panic.
 
  To be sure I added:
 
  diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
  index 8bc3498..b9fdfe4 100644
  --- a/sys/netinet/siftr.c
  +++ b/sys/netinet/siftr.c
  @@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
          if (siftr_chkreinject(*m, dir, ss))
                  goto ret;
  
  +       if (ss == NULL) {
  +               printf("ss is NULL");
  +               ss = DPCPU_PTR(ss);
  +               if (ss == NULL) {
  +                       printf("ss is still NULL");
  +                       goto ret;
  +               }
  +       }
  +
  +
          if (dir == PFIL_IN)
                  ss->n_in++;
          else
 
  which doesn't seem to affect the problem.
 
  As in it still panics and the "ss is NULL" message is not printed? I
  would have expected to at least see "ss is NULL" printed if my
  hypothesis was correct... hmm.
 
  Yes, it still panics, but no message is printed.
 
 It was just pointed out to me that ss doesn't have to be NULL in order 
 to cause the page fault (duh). It could also just be a garbage ptr which 
 is why your print statement isn't firing.
 
 Can you trigger the panic again and look for some information along the 
 lines of "fault virtual address = ..." as part of the panic info? 
 Knowing the faulting address would be useful and may help further diagnosis.

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xff7f808f9de8
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x8241f800
stack pointer   = 0x28:0xff83a7d0
frame pointer   = 0x28:0xff83a840
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq    $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---

  Could 

Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/20/10 23:15, Fabian Keil wrote:

Lawrence Stewart lstew...@freebsd.org wrote:


On 06/20/10 22:28, Fabian Keil wrote:

Lawrence Stewart lstew...@freebsd.org wrote:


On 06/20/10 21:15, Fabian Keil wrote:

Lawrence Stewart lstew...@freebsd.org wrote:


On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewartlstew...@freebsd.org wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq    $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


So I've tracked down the line of code where the page fault is occurring:

        if (dir == PFIL_IN)
                ss->n_in++;
        else
                ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats
per-cpu and is initialised at the start of the function like so:

ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your
machine. I know very little about the inner workings of the DPCPU_*
macros, but I'm pretty sure the way I use them in SIFTR is correct or at
least as intended.
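
For readers unfamiliar with the per-CPU pattern described above, here is a
minimal userland sketch of the idea. It is only an analogy: NCPU, struct
pkt_stats, stats_ptr(), count_pkt() and the explicit cpu argument are
assumptions made for illustration. The kernel's DPCPU_* macros work out the
current CPU's copy themselves and are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPU    2               /* hypothetical CPU count for the sketch */
#define DIR_IN  1               /* stand-in for PFIL_IN */
#define DIR_OUT 2               /* stand-in for PFIL_OUT */

/* Hypothetical per-CPU counters, loosely modelled on n_in/n_out above. */
struct pkt_stats {
	uint64_t n_in;
	uint64_t n_out;
};

/* One private copy per CPU, so the hot path needs no lock. */
static struct pkt_stats percpu_stats[NCPU];

/* Userland stand-in for "give me this CPU's copy"; NULL if cpu is invalid. */
struct pkt_stats *
stats_ptr(int cpu)
{
	if (cpu < 0 || cpu >= NCPU)
		return (NULL);
	return (&percpu_stats[cpu]);
}

/* Count one packet on the given CPU; returns 0 on success, -1 on bad cpu. */
int
count_pkt(int cpu, int dir)
{
	struct pkt_stats *ss = stats_ptr(cpu);

	if (ss == NULL)
		return (-1);
	if (dir == DIR_IN)
		ss->n_in++;
	else
		ss->n_out++;
	return (0);
}

/* Readers sum the per-CPU copies to get a system-wide total. */
uint64_t
total_in(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NCPU; cpu++)
		sum += percpu_stats[cpu].n_in;
	return (sum);
}
```

The point of the layout is that each CPU only ever touches its own counters,
so the increment in the packet path stays cheap, and totals are computed only
when someone asks for them.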


siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
it itself. I think if ss was NULL, the panic should already occur in
siftr_chkreinject().


Yes but siftr_chkreinject() only dereferences ss in the exceptional case
of a malloc failure or duplicate pkt. It's unlikely either case happens
for you and so wouldn't trigger the panic.


To be sure I added:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index 8bc3498..b9fdfe4 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
        if (siftr_chkreinject(*m, dir, ss))
                goto ret;

+       if (ss == NULL) {
+               printf("ss is NULL");
+               ss = DPCPU_PTR(ss);
+               if (ss == NULL) {
+                       printf("ss is still NULL");
+                       goto ret;
+               }
+       }
+
+
        if (dir == PFIL_IN)
                ss->n_in++;
        else

which doesn't seem to affect the problem.


As in it still panics and the "ss is NULL" message is not printed? I
would have expected to at least see "ss is NULL" printed if my
hypothesis was correct... hmm.


Yes, it still panics, but no message is printed.


It was just pointed out to me that ss doesn't have to be NULL in order
to cause the page fault (duh). It could also just be a garbage ptr which
is why your print statement isn't firing.

Can you trigger the panic again and look for some information along the
lines of "fault virtual address = ..." as part of the panic info?
Knowing the faulting address would be useful and may help further diagnosis.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xff7f808f9de8
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x8241f800
stack pointer   = 0x28:0xff83a7d0
frame pointer   = 0x28:0xff83a840
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0


None of this looks too crazy, but at least one person I've been chatting 
to about this thinks the faulting address doesn't look quite right for a 
DPCPU variable.
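
To illustrate why the NULL check in the debugging patch never fired: a
pointer can be non-NULL and still point outside any valid storage. The
sketch below is a userland analogy under stated assumptions; region_base,
REGION_SIZE, ptr_in_region() and passes_null_check() are hypothetical names
invented for this example and are not kernel interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_SIZE 4096        /* hypothetical size of the valid area */

/* Hypothetical backing store standing in for a per-CPU data region. */
static unsigned char region_base[REGION_SIZE];

/* Return 1 if [p, p + len) lies entirely inside the region, else 0. */
int
ptr_in_region(const void *p, size_t len)
{
	uintptr_t start = (uintptr_t)region_base;
	uintptr_t addr = (uintptr_t)p;

	if (len > REGION_SIZE)
		return (0);
	if (addr < start)
		return (0);
	return (addr - start <= REGION_SIZE - len);
}

/* A NULL test alone: passes for any non-NULL value, valid or not. */
int
passes_null_check(const void *p)
{
	return (p != NULL);
}
```

A garbage pointer sails through passes_null_check() but fails a range check
like ptr_in_region(), which is why comparing the faulting address against the
expected location of the data is more telling than a NULL test.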


Can you please get the following additional info from DDB:

show reg
show dpcpu_offset
p/x pcpu_entry_modspace

And can you also please identify the upstream FreeBSD revision number 
your kernel source is based on (as opposed to the GIT rev) so we can 
make sure we're looking at the same base sources you're running.



current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq    $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4

Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Fabian Keil
Fabian Keil freebsd-lis...@fabiankeil.de wrote:

 Lawrence Stewart lstew...@freebsd.org wrote:
 
  On 06/20/10 22:28, Fabian Keil wrote:

   Taking pf (and altq) out of the picture doesn't seem to make
   a difference.
  
  Wouldn't have expected it to. Will be very curious to know if the panic 
  is triggered in GENERIC.
 
 It's not. I, too, get pfil.c related LORs though:
 
 lock order reversal:
  1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write 
 mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e5dd68 udp (udp) @ 
 /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:3035
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 _witness_debugger() at _witness_debugger+0x2e
 witness_checkorder() at witness_checkorder+0x81e
 _rw_rlock() at _rw_rlock+0x5f
 pf_socket_lookup() at pf_socket_lookup+0x1c5
 pf_test_udp() at pf_test_udp+0x8b0
 pf_test() at pf_test+0x1089
 pf_check_in() at pf_check_in+0x39
 pfil_run_hooks() at pfil_run_hooks+0xcf
 ip_input() at ip_input+0x2ae
 swi_net() at swi_net+0x151
 intr_event_execute_handlers() at intr_event_execute_handlers+0x66
 ithread_loop() at ithread_loop+0xb2
 fork_exit() at fork_exit+0x12a
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xff844d30, rbp = 0 ---
 lock order reversal:
  1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write 
 mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e5d788 tcp (tcp) @ 
 /usr/src/sys/modules/siftr/../../netinet/siftr.c:698
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 _witness_debugger() at _witness_debugger+0x2e
 witness_checkorder() at witness_checkorder+0x81e
 _rw_rlock() at _rw_rlock+0x5f
 siftr_chkpkt() at siftr_chkpkt+0x3c4
 pfil_run_hooks() at pfil_run_hooks+0xcf
 ip_input() at ip_input+0x2ae
 swi_net() at swi_net+0x151
 intr_event_execute_handlers() at intr_event_execute_handlers+0x66
 ithread_loop() at ithread_loop+0xb2
 fork_exit() at fork_exit+0x12a
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xff844d30, rbp = 0 ---
 
 My custom kernel normally doesn't have INVARIANTS and WITNESS
 enabled, so I'll try to enable them next.

The culprit seems to be non-default KTR settings in the kernel
while loading alq as a module. With the following change siftr
works with my non-GENERIC kernel, too:

commit f43b8b5171c858df7b419f6a695e9e3b53531a8e
Author: Fabian Keil f...@fabiankeil.de
Date:   Sun Jun 20 15:43:01 2010 +0200

Disable KTR changes.

diff --git a/sys/amd64/conf/ZOEY b/sys/amd64/conf/ZOEY
index 6fb3480..c584317 100644
--- a/sys/amd64/conf/ZOEY
+++ b/sys/amd64/conf/ZOEY
@@ -16,11 +16,11 @@ options ATA_CAM
 device  atapicam
 options SC_KERNEL_CONS_ATTR=(FG_GREEN|BG_BLACK)
 
-options KTR
-options KTR_ENTRIES=262144
-options KTR_COMPILE=(KTR_SCHED)
-options KTR_MASK=(KTR_SCHED)
-options KTR_CPUMASK=0x3
+#options KTR
+#options KTR_ENTRIES=262144
+#options KTR_COMPILE=(KTR_SCHED)
+#options KTR_MASK=(KTR_SCHED)
+#options KTR_CPUMASK=0x3

 options ACCEPT_FILTER_HTTP 
 makeoptions WITH_CTF=yes

Fabian




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Fabian Keil
Fabian Keil freebsd-lis...@fabiankeil.de wrote:

 Fabian Keil freebsd-lis...@fabiankeil.de wrote:

  My custom kernel normally doesn't have INVARIANTS and WITNESS
  enabled, so I'll try to enable them next.
 
 The culprit seems to be non-default KTR settings in the kernel
 while loading alq as a module.

Actually whether or not alq is loaded as a module doesn't
seem to matter, with:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)
options KTR_CPUMASK=0x3
options ALQ
options KTR_ALQ

enabling siftr panics the system, too.

Fabian




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Rui Paulo

On 20 Jun 2010, at 20:36, Fabian Keil wrote:

 Fabian Keil freebsd-lis...@fabiankeil.de wrote:
 
 Fabian Keil freebsd-lis...@fabiankeil.de wrote:
 
 My custom kernel normally doesn't have INVARIANTS and WITNESS
 enabled, so I'll try to enable them next.
 
 The culprit seems to be non-default KTR settings in the kernel
 while loading alq as a module.
 
 Actually whether or not alq is loaded as a module doesn't
 seem to matter, with:
 
 options   KTR
 options   KTR_ENTRIES=262144
 options   KTR_COMPILE=(KTR_SCHED)
 options   KTR_MASK=(KTR_SCHED)
 options   KTR_CPUMASK=0x3
 options   ALQ
 options   KTR_ALQ
 
 enabling siftr panics the system, too.

That's probably because your module was built with different compile time 
options than the ones used in the kernel. These options may change structure 
sizes, function parameters, etc. and that easily causes panics.
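
As a rough illustration of the mechanism described above, the sketch below
declares the same structure twice: once as code built with an extra option
would see it, and once as code built without that option would. The structs,
their fields and the helper names are invented for this example and do not
correspond to any real kernel structure; whether this is what actually
happens with the KTR options in this thread is not established here.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as seen by code compiled WITH a hypothetical tracing option. */
struct conn_with_opt {
	uint32_t flags;
	uint64_t trace_cookie;   /* only present when the option is set */
	uint64_t n_pkts;
};

/* Layout as seen by code compiled WITHOUT the option. */
struct conn_without_opt {
	uint32_t flags;
	uint64_t n_pkts;
};

/* Byte distance between where each side believes n_pkts lives. */
size_t
n_pkts_offset_skew(void)
{
	return (offsetof(struct conn_with_opt, n_pkts) -
	    offsetof(struct conn_without_opt, n_pkts));
}

/*
 * Simulate the mismatch: memory laid out by the "with option" side is
 * updated through the "without option" side's idea of the layout.
 * Returns 1 if the update landed in the wrong field.
 */
int
update_hits_wrong_field(void)
{
	struct conn_with_opt obj = { 0, 0, 0 };
	unsigned char *raw = (unsigned char *)&obj;
	uint64_t one = 1;

	/* The other side writes n_pkts at the offset *it* was compiled with. */
	for (size_t i = 0; i < sizeof(one); i++)
		raw[offsetof(struct conn_without_opt, n_pkts) + i] =
		    ((unsigned char *)&one)[i];

	return (obj.n_pkts == 0 && obj.trace_cookie == 1);
}
```

Both sides compile cleanly on their own; the damage only shows up at run
time, when one side writes through offsets the other side does not share.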

Regards,
--
Rui Paulo




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/21/10 00:12, Fabian Keil wrote:

Fabian Keil freebsd-lis...@fabiankeil.de wrote:


Lawrence Stewart lstew...@freebsd.org wrote:


On 06/20/10 22:28, Fabian Keil wrote:



Taking pf (and altq) out of the picture doesn't seem to make
a difference.


Wouldn't have expected it to. Will be very curious to know if the panic
is triggered in GENERIC.


It's not. I, too, get pfil.c related LORs though:

lock order reversal:
  1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write 
mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e5dd68 udp (udp) @ 
/usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:3035
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_rw_rlock() at _rw_rlock+0x5f
pf_socket_lookup() at pf_socket_lookup+0x1c5
pf_test_udp() at pf_test_udp+0x8b0
pf_test() at pf_test+0x1089
pf_check_in() at pf_check_in+0x39
pfil_run_hooks() at pfil_run_hooks+0xcf
ip_input() at ip_input+0x2ae
swi_net() at swi_net+0x151
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff844d30, rbp = 0 ---
lock order reversal:
  1st 0x80e5c568 PFil hook read/write mutex (PFil hook read/write 
mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e5d788 tcp (tcp) @ 
/usr/src/sys/modules/siftr/../../netinet/siftr.c:698
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_rw_rlock() at _rw_rlock+0x5f
siftr_chkpkt() at siftr_chkpkt+0x3c4
pfil_run_hooks() at pfil_run_hooks+0xcf
ip_input() at ip_input+0x2ae
swi_net() at swi_net+0x151
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff844d30, rbp = 0 ---

My custom kernel normally doesn't have INVARIANTS and WITNESS
enabled, so I'll try to enable them next.


The culprit seems to be non-default KTR settings in the kernel
while loading alq as a module. With the following change siftr
works with my non-GENERIC kernel, too:

commit f43b8b5171c858df7b419f6a695e9e3b53531a8e
Author: Fabian Keil f...@fabiankeil.de
Date:   Sun Jun 20 15:43:01 2010 +0200

 Disable KTR changes.

diff --git a/sys/amd64/conf/ZOEY b/sys/amd64/conf/ZOEY
index 6fb3480..c584317 100644
--- a/sys/amd64/conf/ZOEY
+++ b/sys/amd64/conf/ZOEY
@@ -16,11 +16,11 @@ options ATA_CAM
  device  atapicam
  options SC_KERNEL_CONS_ATTR=(FG_GREEN|BG_BLACK)

-options KTR
-options KTR_ENTRIES=262144
-options KTR_COMPILE=(KTR_SCHED)
-options KTR_MASK=(KTR_SCHED)
-options KTR_CPUMASK=0x3
+#options KTR
+#options KTR_ENTRIES=262144
+#options KTR_COMPILE=(KTR_SCHED)
+#options KTR_MASK=(KTR_SCHED)
+#options KTR_CPUMASK=0x3

  options ACCEPT_FILTER_HTTP
  makeoptions WITH_CTF=yes


This smells very fishy. Without options KTR_ALQ, KTR shouldn't even 
care if ALQ exists or not. Not only that, but ALQ isn't even used in 
siftr_chkpkt and you clearly manage to successfully use ALQ to write the 
module load message to the log file. H...


Thanks for taking the time to find the culprit though - I'll see if I 
can reproduce here. Could you try another thing for me and see if 
reducing "options KTR_ENTRIES=262144" to a smaller number (maybe 
4096?) while leaving all the other KTR options as they are above (but 
uncommented) makes any difference? The ktr(4) man page indicates the 
default is 8192 entries, and I'm curious whether your allocation of so 
many additional entries is making something unhappy.


Thanks again for your time helping with this, I really appreciate it.

Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-20 Thread Lawrence Stewart

On 06/21/10 05:44, Rui Paulo wrote:


On 20 Jun 2010, at 20:36, Fabian Keil wrote:


Fabian Keil freebsd-lis...@fabiankeil.de wrote:


Fabian Keil freebsd-lis...@fabiankeil.de wrote:



My custom kernel normally doesn't have INVARIANTS and WITNESS
enabled, so I'll try to enable them next.


The culprit seems to be non-default KTR settings in the kernel
while loading alq as a module.


Actually whether or not alq is loaded as a module doesn't
seem to matter, with:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)
options KTR_CPUMASK=0x3
options ALQ
options KTR_ALQ

enabling siftr panics the system, too.


That's probably because your module was built with different compile time 
options than the ones used in the kernel. These options may change structure 
sizes, function parameters, etc. and that easily causes panics.


Hmm, I wonder if my instructions to build SIFTR manually are causing your 
problems. Fabian, is the siftr.ko module you're loading built as part of 
a "make buildkernel", or did you follow my instructions and run "cd 
/path/to/src/sys/modules/siftr ; make ; kldload ./siftr.ko"?


If the latter, perhaps try explicitly building SIFTR as part of 
"make buildkernel" and see if loading the module built that way still 
triggers the panic when enabled (the module will be in 
/usr/obj/path/to/src/sys/KERNCONF/modules/path/to/src/sys/modules/siftr/siftr.ko 
or, if you "make installkernel", in /boot/kernel/siftr.ko).


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-19 Thread Lev Serebryakov
Hello, Lawrence.
You wrote on 19 June 2010, 07:27:30:

 Amount of feedback received thus far: nichts, nil, nada
  I wanted to help you, but there is one problem: I don't have any
traffic-loaded 9-CURRENT machines. I have some not-so-critical 7.x and
8.x machines with noticeable traffic (for example, my torrent box
still runs 7-STABLE), but no 9-CURRENT except in VMware on my desktop :(
  I think this is a common case: 9-CURRENT machines are developers'
machines without a noticeable amount of network traffic, and all
traffic-loaded machines run more stable versions.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-19 Thread Lawrence Stewart

Hi Lev,

On 06/19/10 16:26, Lev Serebryakov wrote:

Hello, Lawrence.
You wrote on 19 June 2010, 07:27:30:


Amount of feedback received thus far: nichts, nil, nada

   I wanted to help you, but there is one problem: I don't have any
traffic-loaded 9-CURRENT machines. I have some not-so-critical 7.x and
8.x machines with noticeable traffic (for example, my torrent box
still runs 7-STABLE), but no 9-CURRENT except in VMware on my desktop :(
   I think this is a common case: 9-CURRENT machines are developers'
machines without a noticeable amount of network traffic, and all
traffic-loaded machines run more stable versions.


Right now the traffic load of the test machine is not really all that 
important to the testing. As long as the module loads, logs some 
coherent looking data whilst enabled and unloads across a range of 
different hardware and kernel archs, I'll be happy. SIFTR will be 
backported to 8 and possibly 7 also, so there will be plenty of time to 
get people with more heavily loaded systems running stable branches to 
join in testing.


This is the first real push I've made to get the code widely tested, so 
I wouldn't feel comfortable asking people to run it on 
(semi-)production, stable branch systems yet. If you're really keen to 
help test it and you wouldn't be worried about running the code on such 
a system, I would be happy to create a 7 and/or 8 backport of the 
required bits. Otherwise, I'm happy to get the initial round of 
9-CURRENT only testing feedback, commit it to head and then revisit once 
it's settled and time to merge it back to the stable branches.


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research

2010-06-19 Thread pluknet
On 13 June 2010 12:12, Lawrence Stewart lstew...@freebsd.org wrote:
 Hi all,

 The time has come to solicit some external testing for my SIFTR tool. I'm
 hoping to commit it within a week or so unless problems are discovered.

 SIFTR is a kernel module that logs a range of statistics on active TCP
 connections to a log file. It provides the ability to make highly granular
 measurements of TCP connection state, aimed at system administrators,
 developers and researchers. You can use the data to find bugs in the stack,
 understand why connections are performing badly and test new code to name a
 few uses.

 Development has been made possible in part by grants from the Cisco
 University Research Program Fund at Community Foundation Silicon Valley, and
 the FreeBSD Foundation. Bringing it into FreeBSD proper is being carried out
 under the auspices of the Enhancing the FreeBSD TCP Implementation FreeBSD
 Foundation project. More details are available at [1,2,3].

 If you can help out, please read on!

 Before continuing, make sure you're running with at least svn revision
 209119 (my commit to sys/pcpu.h), or you can manually apply the r209119
 diff to your earlier rev source tree.

 The SIFTR patch is here:

 http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209119.patch

 Copy it to the root of your source tree and run the following:

 patch -p1 < siftr_9.x.r209119.patch

 It's a loadable kernel module so you can build it for testing like so:

 cd path/to/src/sys/modules/siftr
 make
 kldload ./siftr.ko
 (don't forget to make cleandir to remove cruft when finished testing)

 After applying the patch, you can read the man page by running:

 man -M path/to/src/share/man siftr

 If I've done a decent job, all the info you need to understand what it does
 and how to use it should be in the man page.

 I'm interested in all feedback and reports of success/failure, along with
 details of the architecture tested and number of CPUs if you would be so
 kind.

 That should be enough to get the ball rolling. Thanks and I look forward to
 hearing from you!

 Cheers,
 Lawrence

 [1] http://caia.swin.edu.au/freebsd/etcp09/

 [2] http://www.freebsdfoundation.org/projects.shtml#Swinburne

 [3] http://caia.swin.edu.au/urp/newtcp/

Hi.

I'm seeing this right after enabling siftr via sysctl and changing ppl.
Sorry, if that was already discussed, known or unrelated (since em is
in locking chain).

lock order reversal:
 1st 0x80e51568 PFil hook read/write mutex (PFil hook
read/write mutex) @ /usr/src/sys/net/pfil.c:77
 2nd 0x80e52788 tcp (tcp) @
/usr/src/sys/modules/siftr/../../netinet/siftr.c:698
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_rw_rlock() at _rw_rlock+0x5f
siftr_chkpkt() at siftr_chkpkt+0x374
pfil_run_hooks() at pfil_run_hooks+0xcf
ip_input() at ip_input+0x2ae
netisr_dispatch_src() at netisr_dispatch_src+0xb8
ether_demux() at ether_demux+0x17d
ether_input() at ether_input+0x175
em_rxeof() at em_rxeof+0x193
em_handle_que() at em_handle_que+0x4a
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8bed30, rbp = 0 ---

-- 
wbr,
pluknet


Re: [CFT] SIFTR - Statistical Information For TCP Research

2010-06-19 Thread Lawrence Stewart

Hi Pluknet,

On 06/19/10 18:48, pluknet wrote:
[snip]

Hi.

I'm seeing this right after enabling siftr via sysctl and changing ppl.
Sorry, if that was already discussed, known or unrelated (since em is
in locking chain).

lock order reversal:
  1st 0x80e51568 PFil hook read/write mutex (PFil hook
read/write mutex) @ /usr/src/sys/net/pfil.c:77
  2nd 0x80e52788 tcp (tcp) @
/usr/src/sys/modules/siftr/../../netinet/siftr.c:698
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_rw_rlock() at _rw_rlock+0x5f
siftr_chkpkt() at siftr_chkpkt+0x374
pfil_run_hooks() at pfil_run_hooks+0xcf
ip_input() at ip_input+0x2ae
netisr_dispatch_src() at netisr_dispatch_src+0xb8
ether_demux() at ether_demux+0x17d
ether_input() at ether_input+0x175
em_rxeof() at em_rxeof+0x193
em_handle_que() at em_handle_que+0x4a
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8bed30, rbp = 0 ---


I believe I discussed this LOR with Robert Watson some time back and we 
came to the conclusion it is a false positive witness report and is safe 
to ignore. I should document it in the man page and figure out if 
there's some way to tell witness to not report it. Thanks for reminding 
me and for testing. Did everything else behave sanely and work ok?
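
For readers unfamiliar with what a lock order reversal report means, here is
a toy sketch of the idea: remember every "A was held while acquiring B" pair
and complain when the opposite pair shows up later. This is an illustration
only, with hypothetical names (NLOCKS, order_seen, order_reset,
note_acquire); it is not how witness(4) is implemented, and it says nothing
about whether a particular report is a false positive.

```c
#include <string.h>

#define NLOCKS 4   /* hypothetical number of lock classes tracked */

/* order_seen[a][b] != 0 means "b was acquired while a was held". */
static unsigned char order_seen[NLOCKS][NLOCKS];

/* Forget all recorded orderings. */
void
order_reset(void)
{
	memset(order_seen, 0, sizeof(order_seen));
}

/*
 * Record that lock 'next' is being acquired while 'held' is held.
 * Returns 1 if the opposite order was recorded earlier (a reversal),
 * 0 if the order is new or consistent, -1 on invalid arguments.
 */
int
note_acquire(int held, int next)
{
	if (held < 0 || held >= NLOCKS || next < 0 || next >= NLOCKS ||
	    held == next)
		return (-1);
	if (order_seen[next][held])
		return (1);
	order_seen[held][next] = 1;
	return (0);
}
```

A checker of this kind only knows that two orders were both observed; it
cannot tell whether the two code paths can actually run against each other,
which is how a report can turn out to be harmless.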


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-19 Thread Fabian Keil
Lawrence Stewart lstew...@freebsd.org wrote:

 On 06/13/10 18:12, Lawrence Stewart wrote:

  The time has come to solicit some external testing for my SIFTR tool.
  I'm hoping to commit it within a week or so unless problems are discovered.

  I'm interested in all feedback and reports of success/failure, along
  with details of the architecture tested and number of CPUs if you would
  be so kind.

I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq    $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---

This is from the third attempt, the second time I got a different
backtrace that also contained some *_iwn_* functions, the first
time I had X running, so I didn't get anything. Unfortunately
at that point the system seems to be too busted to dump core.

I'm using:
FreeBSD 9.0-CURRENT #99 r+b768fe1: Sat Jun 19 15:01:37 CEST 2010
f...@r500.local:/usr/obj/usr/src/sys/ZOEY amd64
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU T5870  @ 2.00GHz (1995.01-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x6fd  Family = 6  Model = f  Stepping = 13
  
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 2147483648 (2048 MB)
avail memory = 1976610816 (1885 MB)
ACPI APIC Table: LENOVO TP-7Y   
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 1
ioapic0 Version 2.0 irqs 0-23 on motherboard

I'm not using vanilla sources, but none of the modifications
should matter here.

I have powerd running and did not yet try without it.

The system has bge0 and iwn0, but bge0 is mainly down.

pf is compiled into the kernel, siftr is loaded as a module.

The panic seems to occur without logging a single packet first:
f...@r500 ~ $cat /var/log/siftr.log 
enable_time_secs=1276966161  enable_time_usecs=945080  siftrver=1.2.3
hz=100  tcp_rtt_scale=32  sysname=FreeBSD  sysver=900014  ipmode=4
enable_time_secs=1276966586  enable_time_usecs=314023  siftrver=1.2.3
hz=100  tcp_rtt_scale=32  sysname=FreeBSD  sysver=900014  ipmode=4

I get the impression that this is reproducible, but only tried
three times (the last time with everything mounted read-only).
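
The enable-time header lines in that log are simple key=value fields, so one
quick way to check what a given run recorded is to pull individual fields out
of the line. The sketch below assumes the fields are separated by whitespace
(tabs or spaces) and uses a hypothetical helper name, siftr_hdr_get(); it is
not part of SIFTR, and siftr(4) remains the authoritative description of the
log format.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of 'key' from a "k1=v1 k2=v2 ..." line into 'out'.
 * Fields may be separated by spaces or tabs. Returns 0 on success,
 * -1 if the key is absent or the value does not fit in 'out'.
 */
int
siftr_hdr_get(const char *line, const char *key, char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = line;

	while (*p != '\0') {
		/* Skip separators, then measure the next field. */
		p += strspn(p, " \t\r\n");
		size_t flen = strcspn(p, " \t\r\n");

		/* Match whole keys only: "ver" must not match "siftrver". */
		if (flen > klen && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=') {
			size_t vlen = flen - klen - 1;

			if (vlen >= outlen)
				return (-1);
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return (0);
		}
		p += flen;
	}
	return (-1);
}
```

Called with the header line and a key such as "hz" or "siftrver", it fills
the caller's buffer with the matching value, which is enough to confirm from
a script that two runs were started with the same module version and kernel.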

Fabian




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-19 Thread Fabian Keil
Fabian Keil freebsd-lis...@fabiankeil.de wrote:

 Lawrence Stewart lstew...@freebsd.org wrote:
 
  On 06/13/10 18:12, Lawrence Stewart wrote:
 
   The time has come to solicit some external testing for my SIFTR tool.
   I'm hoping to commit it within a week or so unless problems are 
   discovered.
 
   I'm interested in all feedback and reports of success/failure, along
   with details of the architecture tested and number of CPUs if you would
   be so kind.
 
 I got the following hand-transcribed panic maybe a second after
 sysctl net.inet.siftr.enabled=1

 I have powerd running and did not yet try without it.

Disabling powerd doesn't seem to make a difference.
I'll try with a GENERIC kernel tomorrow.

Fabian




Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-19 Thread Lawrence Stewart

Hi Fabian,

Thank you for the report. This is indeed an issue I've never seen 
before and exactly the sort of thing I wanted to uncover.


On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart lstew...@freebsd.org wrote:


On 06/13/10 18:12, Lawrence Stewart wrote:



The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.



I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 16 ]
Stopped at  siftr_chkpkt+0xd0:  addq    $0x1,0x8(%r14)
db> where
Tracing pid 12 tid 16 td 0xff00034037e0
siftr_chkpt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff83ad30, rbp = 0 ---


Hmm, I'd love to know which line of code siftr_chkpkt+0xd0 maps to. Let 
me read through the function carefully and see if I can spot an obvious 
null ptr deref. The hook function has received some major rototilling of 
late to get it ready for the import, so I must have missed something.
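
One way to chase an offset like this down, sketched under assumptions: if
siftr.ko is built with debug symbols, gdb's "list *(symbol+offset)" form can
usually resolve a frame such as siftr_chkpkt+0xd0 to a source line. The
helper below only extracts the symbol and offset from DDB backtrace lines and
prints the matching gdb command. The names frame_location and
gdb_list_command are made up for illustration; they are not part of SIFTR,
DDB or gdb.

```python
import re

# Matches a trailing "symbol+0xoffset", which is how DDB ends each frame line.
FRAME_RE = re.compile(r"(\w+)\+(0x[0-9a-fA-F]+)\s*$")


def frame_location(line):
    """Return (symbol, offset) for a DDB frame line, or None if absent."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2), 16)


def gdb_list_command(line):
    """Build a gdb 'list' command for the frame, or None for other lines."""
    location = frame_location(line)
    if location is None:
        return None
    symbol, offset = location
    return "list *({}+{:#x})".format(symbol, offset)


# Frames copied from the backtrace above.
frames = [
    "siftr_chkpkt() at siftr_chkpkt+0xd0",
    "pfil_run_hooks() at pfil_run_hooks+0xb4",
    "ip_output() at ip_output+0x382",
]

for frame in frames:
    print(gdb_list_command(frame))
```

The printed commands are meant to be pasted into a gdb session that has the
module's symbols loaded; whether they resolve depends on how the module was
compiled.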



This is from the third attempt. The second time I got a different
backtrace that also contained some *_iwn_* functions; the first
time I had X running, so I didn't get anything. Unfortunately
at that point the system seems to be too busted to dump core.


Typically, packets are direct dispatched into the stack from the driver 
so it is normal to see driver functions in a thread's stack trace when 
it's executing in the siftr pfil hook.



I'm using:
FreeBSD 9.0-CURRENT #99 r+b768fe1: Sat Jun 19 15:01:37 CEST 2010
 f...@r500.local:/usr/obj/usr/src/sys/ZOEY amd64
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU T5870  @ 2.00GHz (1995.01-MHz K8-class CPU)
   Origin = GenuineIntel  Id = 0x6fd  Family = 6  Model = f  Stepping = 13
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0xe39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM>
   AMD Features=0x20100800<SYSCALL,NX,LM>
   AMD Features2=0x1<LAHF>
   TSC: P-state invariant
real memory  = 2147483648 (2048 MB)
avail memory = 1976610816 (1885 MB)
ACPI APIC Table: <LENOVO TP-7Y>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard

I'm not using vanilla sources, but none of the modifications
should matter here.


Yes, this does not look like an issue with your sources but with the 
siftr code itself. Don't bother testing with GENERIC yet as I'm 
confident you've given me enough info to track this down.



I have powerd running and did not yet try without it.

The system has bge0 and iwn0, but bge0 is mainly down.

pf is compiled into the kernel, siftr is loaded as a module.

The panic seems to occur without logging a single packet first:
f...@r500 ~ $cat /var/log/siftr.log
enable_time_secs=1276966161 enable_time_usecs=945080 siftrver=1.2.3 hz=100 tcp_rtt_scale=32 sysname=FreeBSD sysver=900014 ipmode=4
enable_time_secs=1276966586 enable_time_usecs=314023 siftrver=1.2.3 hz=100 tcp_rtt_scale=32 sysname=FreeBSD sysver=900014 ipmode=4
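
For anyone wanting to check header lines like these programmatically, here
is a minimal sketch. It assumes each header is a single line of key=value
fields separated by whitespace and that the values themselves contain no
whitespace. The function names parse_siftr_header and enable_time_utc are
invented for this example and are not part of SIFTR.

```python
from datetime import datetime, timezone


def parse_siftr_header(line):
    """Split one SIFTR enable header into a dict of key -> value strings."""
    fields = {}
    for token in line.split():  # any run of tabs or spaces separates fields
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that carry no '='
            fields[key] = value
    return fields


def enable_time_utc(fields):
    """Combine the seconds and microseconds fields into a UTC datetime."""
    secs = int(fields["enable_time_secs"])
    usecs = int(fields["enable_time_usecs"])
    moment = datetime.fromtimestamp(secs, tz=timezone.utc)
    return moment.replace(microsecond=usecs)


# First header line from the log above, written here with tab separators.
header = (
    "enable_time_secs=1276966161\tenable_time_usecs=945080\t"
    "siftrver=1.2.3\thz=100\ttcp_rtt_scale=32\tsysname=FreeBSD\t"
    "sysver=900014\tipmode=4"
)

fields = parse_siftr_header(header)
print(fields["siftrver"], fields["hz"], enable_time_utc(fields).isoformat())
```

Splitting on arbitrary whitespace keeps the sketch working whether the
separators survive as tabs or get turned into spaces by a mail client.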

I get the impression that this is reproducible, but only tried
three times (the last time with everything mounted read-only).


Thanks again for the report and I'll be in touch as soon as I get a 
chance to look at it some more (hopefully later today).


Cheers,
Lawrence


Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

2010-06-18 Thread Lawrence Stewart

Amount of feedback received thus far: nichts, nil, nada

*sings I'm so ronery in his best Kim Jong-il voice* [4]

Just like Uncle Sam [5], Uncle Lawrence needs you too - yes, I'm 
pointing at YOU!


More specifically, people out there running current with 10-15 mins to 
spare for some testing, please read on.


On 06/13/10 18:12, Lawrence Stewart wrote:

Hi all,

The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.

SIFTR is a kernel module that logs a range of statistics on active TCP
connections to a log file. It provides the ability to make highly
granular measurements of TCP connection state, aimed at system
administrators, developers and researchers. You can use the data to find
bugs in the stack, understand why connections are performing badly, and
test new code, to name a few uses.

Development has been made possible in part by grants from the Cisco
University Research Program Fund at Community Foundation Silicon Valley,
and the FreeBSD Foundation. Bringing it into FreeBSD proper is being
carried out under the auspices of the "Enhancing the FreeBSD TCP
Implementation" FreeBSD Foundation project. More details are available
at [1,2,3].

If you can help out, please read on!

Before continuing, make sure you're running with at least svn revision
209119 (my commit to sys/pcpu.h), or you can manually apply the
r209119 diff to your earlier-revision source tree.

The SIFTR patch is here:

http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209119.patch


An updated version of the patch against svn head revision 209325 is 
available from:


http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209325.patch

There was a backwards incompatible change in the external DPCPU_SUM() 
macro in sys/pcpu.h in r209325 of head so SIFTR also had to be 
updated. Please adapt the following instructions as appropriate based on 
the patch version you're testing.



Copy it to the root of your source tree and run the following:

patch -p1 < siftr_9.x.r209119.patch

It's a loadable kernel module so you can build it for testing like so:

cd path/to/src/sys/modules/siftr
make
kldload ./siftr.ko
(don't forget to make cleandir to remove cruft when finished testing)

After applying the patch, you can read the man page by running:

man -M path/to/src/share/man siftr

If I've done a decent job, all the info you need to understand what it
does and how to use it should be in the man page.

I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.

That should be enough to get the ball rolling. Thanks and I look forward
to hearing from you!

Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/etcp09/

[2] http://www.freebsdfoundation.org/projects.shtml#Swinburne

[3] http://caia.swin.edu.au/urp/newtcp/


[4] http://www.youtube.com/watch?v=xh_9QhRzJEs (language warning)

[5] http://www.sonofthesouth.net/uncle-sam/images/uncle-sam-wants-you.jpg


[CFT] SIFTR - Statistical Information For TCP Research

2010-06-13 Thread Lawrence Stewart

Hi all,

The time has come to solicit some external testing for my SIFTR tool. 
I'm hoping to commit it within a week or so unless problems are discovered.


SIFTR is a kernel module that logs a range of statistics on active TCP 
connections to a log file. It provides the ability to make highly 
granular measurements of TCP connection state, aimed at system 
administrators, developers and researchers. You can use the data to find 
bugs in the stack, understand why connections are performing badly, and 
test new code, to name a few uses.


Development has been made possible in part by grants from the Cisco 
University Research Program Fund at Community Foundation Silicon Valley, 
and the FreeBSD Foundation. Bringing it into FreeBSD proper is being 
carried out under the auspices of the "Enhancing the FreeBSD TCP 
Implementation" FreeBSD Foundation project. More details are available 
at [1,2,3].


If you can help out, please read on!

Before continuing, make sure you're running with at least svn revision 
209119 (my commit to sys/pcpu.h), or you can manually apply the 
r209119 diff to your earlier-revision source tree.


The SIFTR patch is here:

http://people.freebsd.org/~lstewart/patches/tcp_ffcaia2008/siftr_9.x.r209119.patch

Copy it to the root of your source tree and run the following:

patch -p1 < siftr_9.x.r209119.patch

It's a loadable kernel module so you can build it for testing like so:

cd path/to/src/sys/modules/siftr
make
kldload ./siftr.ko
(don't forget to make cleandir to remove cruft when finished testing)

After applying the patch, you can read the man page by running:

man -M path/to/src/share/man siftr

If I've done a decent job, all the info you need to understand what it 
does and how to use it should be in the man page.


I'm interested in all feedback and reports of success/failure, along 
with details of the architecture tested and number of CPUs if you would 
be so kind.


That should be enough to get the ball rolling. Thanks and I look forward 
to hearing from you!


Cheers,
Lawrence

[1] http://caia.swin.edu.au/freebsd/etcp09/

[2] http://www.freebsdfoundation.org/projects.shtml#Swinburne

[3] http://caia.swin.edu.au/urp/newtcp/