new documentation: IP_TRANSPARENT, is it correct?

2017-04-21 Thread bert hubert
Hi everyone,

10 years after lartc.org I decided to document a little bit more of Linux
networking, and I hope I got it right. This email asks for your help in
making sure.

Recently I attempted to use IP_TRANSPARENT as outlined in
https://www.kernel.org/doc/Documentation/networking/tproxy.txt but I could
not figure out how it really worked from there (although I could copy paste
my way to some working code).  The web also mostly offered little in the way
of (correct) explanation.

I think I have it figured out by now, but I'm sure there are nuances I have
missed. I'm especially interested in understanding _exactly_ what the
IP_TRANSPARENT socket option does, because it appears somewhat arbitrary
right now:

"The IP_TRANSPARENT socket option enables:

* Binding to addresses that are not (usually) considered local
* Receiving connections and packets from iptables TPROXY redirected sessions"
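
For illustration, a minimal C sketch of setting the option on a UDP socket. It assumes Linux and a process holding CAP_NET_ADMIN; the helper name make_transparent_udp_socket is made up for this example, and the fallback value 19 is taken from <linux/in.h>. Without the capability, setsockopt() is expected to fail, typically with EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   /* value from <linux/in.h>, for older libc headers */
#endif

/* Hypothetical helper: returns a UDP socket with IP_TRANSPARENT enabled,
 * or -1 with errno set (typically EPERM without CAP_NET_ADMIN). */
int make_transparent_udp_socket(void)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof one) < 0) {
        int saved = errno;   /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Once the option is set, bind() on such a socket may be given an address that is not configured on any local interface.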

https://ds9a.nl/tproxy/tproxy.md.html has somewhat prettified Markdown that
requires JavaScript; plain Markdown is at
https://github.com/ahupowerdns/tproxydoc/blob/master/tproxy.md

If you could give this a read and a comment on things I got wrong, that
would be most appreciated. Pointers to other relevant documentation are also
very welcome.

Thanks!

Bert



Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread bert hubert
On Mon, Feb 19, 2007 at 03:56:23PM -0800, Stephen Hemminger wrote:

  Linux 2.6.20-rc4 appears to take 4 microseconds on my P4 3GHz for a
  non-blocking UDPv4 recvfrom() call, both on loopback and ethernet.
  
  Linux 2.6.18 on my 64 bit Athlon64 3200+ takes a similar amount of time.

  recvfrom itself is a tad worrisome, x=recvfrom. I didn't ask for the
  'libc_enable_asynccancel' stuff. I'm trying to isolate the actual syscall
  but it is proving hard work for an assembly newbie like me - socketcall
  doesn't make things easier.

Together with Zwane Mwaikambo, I managed to isolate the pure syscall. It
makes no difference: a single recvfrom continues to take around 4
microseconds at 3GHz. Many thanks to Zwane for helping out.

 Use oprofile to find the hotspot.

Will do this next - I need a setup where I can run oprofile *and* generate
decent query rates. I'd rather not run oprofile on remote machines I don't
have easy access to.

Thanks.

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread bert hubert
On Tue, Feb 20, 2007 at 11:50:13AM +0100, Andi Kleen wrote:
 P4s are pretty slow at taking locks (or rather doing atomical operations)
 and there are several of them in this path. You could try it with a UP
 kernel. Actually hotunplugging the other virtual CPU should be sufficient 
 with recent kernels.

This is on a UP kernel, on a single CPU. It does have hyperthreading, but
the kernel is uniprocessor, non-preempt. No frequency scaling. Linux
2.6.20-rc4, 2.6.19, 2.6.18, P4, P-M, Athlon 64. Ubuntu Edgy Eft on the P4.

 Also BTW RDTSC on P4 is not very accurate for small measurements
 because it has a quite high overhead by itself, i would suggest
 running it in a loop.

I've done so, with some interesting results. Source on
http://ds9a.nl/tmp/recvtimings.c - be careful to adjust the '3000' divider
to your CPU frequency if you care about absolute numbers!

These are two groups, each consisting of 10 consecutive nonblocking UDP
recvfroms, with 10 packets preloaded. Reported is the number of microseconds
per recvfrom call which yielded a packet:

$ ./recvtimings
4.142333
2.237667
1.927333
1.58
1.77
1.632333
1.712667
1.685000
1.62
2.415000
1.347333
1.545000
1.492667
1.902333
1.485000
1.532667
1.46
1.517667
1.492333
1.58
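
For readers without access to recvtimings.c, here is a rough sketch of the same kind of measurement. It is not the program linked above: it uses clock_gettime() instead of RDTSC, so no CPU frequency divider is needed, and the function name time_recvfrom and the 50 byte payload are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Elapsed microseconds between two CLOCK_MONOTONIC timestamps. */
static double usec_between(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_nsec - a->tv_nsec) / 1e3;
}

/* Preload n datagrams on loopback, then time up to n non-blocking recvfrom()
 * calls. Stores microseconds per successful call in out[] and returns the
 * number of packets read, or -1 if the sockets could not be set up. */
int time_recvfrom(int n, double *out)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char buf[1500], payload[50] = { 0 };
    int got = 0, i;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto fail;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto fail;

    for (i = 0; i < n; i++)                  /* preload the receive queue */
        if (sendto(tx, payload, sizeof payload, 0,
                   (struct sockaddr *)&addr, sizeof addr) < 0)
            goto fail;

    for (i = 0; i < n; i++) {
        struct timespec t1, t2;
        ssize_t len;

        clock_gettime(CLOCK_MONOTONIC, &t1);
        len = recvfrom(rx, buf, sizeof buf, MSG_DONTWAIT, NULL, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t2);
        if (len < 0)                         /* queue drained or error */
            break;
        out[got++] = usec_between(&t1, &t2);
    }
    close(rx);
    close(tx);
    return got;

fail:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return -1;
}
```

Calling time_recvfrom(10, out) twice in a row and printing out[] gives two groups of ten timings, comparable in shape to the list above. The two clock_gettime() calls add their own overhead to every sample.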

This is on a nearly quiet P4 - I've removed the first line:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 290064 307036 296036    0    0     0     0  124   58  0  0 100  0
 0  0      0 289972 307036 296036    0    0     0     4  139   95  0  0 100  0
 0  0      0 289972 307036 296036    0    0     0     0  119   55  0  0 100  0
 1  0      0 289972 307036 296036    0    0     0     0  135   71  0  0 100  0

HZ is clearly 100. If I usleep in between, timings for each recvfrom call
become higher. If I sleep for a full second, I get nearly flat results:
4.25
5.317667
3.525000
4.147333
3.36
3.552667
3.087667

Various differing CPUs report more or less the same results. Now I know we
have caching effects, but these effects are HUGE.

Is this supposed to be the case? I'm on an up to date system, glibc 2.4.

Bert



Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread bert hubert
On Tue, Feb 20, 2007 at 07:41:25PM +0300, Evgeniy Polyakov wrote:

 It can be recvfrom only problem - syscall overhead on my p4 (core duo,
 debian testing) is bout 300 usec - to test I ran read('dev/zero', data,
 0) in a loop.

nsec I assume?

The usec numbers for read(fd, c, 0) where fd is /dev/zero:
1.557667, 0.627667, 0.447333, 0.44, 0.44, 0.44, 0.442333,
0.44, 0.44, 0.442333, 0.442333, 0.44, 0.44, 0.442333,
0.442667, 0.44, 0.44, 0.44, 0.442333, 0.442667,

In usecs. Notice the same declining figure, but not as pronounced. With a
sleep(1) in between, we get:
1.692667, 1.80, 0.782667, 1.282667, 0.665000, 0.98, 0.925000,
0.887667, 0.662667, 0.862667, 1.077333, 1.442333, 0.66, 1.89,
0.672333, 0.795000, 0.647667, 0.692333, 0.75, 0.865000,

This doesn't look all that unhealthy.

 Could you try to hack recvfrom() for your socket to always copy some
 empty buffer and check the results without waiting for packet?

That might be out of my reach before tomorrow :-)

 If you are not hurry I can test it myself tomorrow.

Thanks. My major problem is that in my measurements, I quite often see the
'worst case' 4usec result. It would not be a problem if it happened only
once, of course.

Bert



Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread bert hubert
On Tue, Feb 20, 2007 at 09:48:59PM +0300, Evgeniy Polyakov wrote:

 Likely first overhead related to cache population or gamma-ray radiation.
 If it happens only one (it does in my test), then everything is ok I
 think. Bert, how frequently you get that long recvfrom()?

I have plotted the average time for a single non-blocking UDPv4 recvfrom
call returning 100 bytes, based on the delay I insert between recvfrom
calls, as measured in cycles spent busywaiting.

In theory, this graph should show some slope, perhaps because of the higher
chance of context switches, cache evictions and purging of any
branch-prediction information the CPU might have kept. I'm no expert.

I measure a huge slope, however. Starting at 1usec for back-to-back system
calls, it rises to 2usec after interleaving calls with a count to 20
million.

4usec is hit after 110 million.

The graph, with semi-scientific error-bars is on
http://ds9a.nl/tmp/recvfrom-usec-vs-wait.png

The code to generate it is on:
http://ds9a.nl/tmp/recvtimings.c

I'm investigating this further for other system calls. It might be that my
measurements are off, but it appears even a slight delay between calls
incurs a large penalty.
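
A simplified sketch of this kind of delay experiment, for anyone who wants to try it on another system call. It is not recvtimings.c: it times a zero-length read() on /dev/zero rather than recvfrom(), uses clock_gettime() rather than RDTSC, and the names busy_wait and avg_read_usec are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Spin for `count` iterations; volatile keeps the loop from being removed. */
static void busy_wait(unsigned long count)
{
    volatile unsigned long i;

    for (i = 0; i < count; i++)
        ;
}

/* Average microseconds per zero-length read() on /dev/zero when each of the
 * `rounds` calls is preceded by a busy-wait of `delay` iterations.
 * Returns -1.0 on error or if rounds is not positive. */
double avg_read_usec(unsigned long delay, int rounds)
{
    char c;
    double total = 0.0;
    int r, fd;

    if (rounds <= 0)
        return -1.0;
    fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1.0;

    for (r = 0; r < rounds; r++) {
        struct timespec t1, t2;

        busy_wait(delay);                    /* the inter-call delay */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (read(fd, &c, 0) < 0) {
            close(fd);
            return -1.0;
        }
        clock_gettime(CLOCK_MONOTONIC, &t2);
        total += (t2.tv_sec - t1.tv_sec) * 1e6 +
                 (t2.tv_nsec - t1.tv_nsec) / 1e3;
    }
    close(fd);
    return total / rounds;
}
```

Plotting avg_read_usec(delay, rounds) for increasing values of delay gives a curve of the same kind as the graph above, though the absolute numbers will differ per machine and per system call.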

Bert



Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread bert hubert
On Tue, Feb 20, 2007 at 02:40:40PM -0500, Benjamin LaHaise wrote:

 Make sure your system is idle.  Userspace bloat means that *lots* of idle 
 activity occurs in between timer ticks on recent distributions -- all those 

You hit the nail on the head. I had previously measured with X shut down,
but the effect didn't disappear.

With init=/bin/bash, recvfrom suddenly takes from 900nsec to 1.3usec, with
only slight correlation between inter-call delay and cycles spent.

I'm investigating this further as it appears this has a real life effect on
my P4 - a drastic one!

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 1
cpu MHz         : 3000.131
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx
constant_tsc pni monitor ds_cpl cid xtpr
bogomips        : 6003.91
clflush size    : 64

Thanks for your help!



Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread bert hubert
On Tue, Feb 20, 2007 at 02:02:00PM -0800, Rick Jones wrote:

 The slope appears to be flattening-out the farther out to the right it 
 goes.  Perhaps that is the length of time it takes to take all the 
 requisite cache misses.

The rate of flattening out appears to correlate with the number of processes
running, even though the system is otherwise 99.5% idle during my
measurements.

With only 'gdm' running, things flatten out slowly, iow, it takes longer
delays to see recvfrom slow down. With only 1 process running (init=bash),
the graph is nearly flat.

From this, it is probable that even an idle GNOME desktop (Ubuntu Edgy Eft)
is under fierce cache pressure, enough to blow away my meagre 1MB in a
matter of milliseconds.

I'm trying to figure out which processes have the most impact. I had already
killed anything non-essential, but that still leaves 140 pids.

Bert



nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-19 Thread bert hubert
Hi people,

I'm trying to save people the cost of buying extra servers by making
PowerDNS (GPL) ever faster, but I've hit a rather fundamental problem.

Linux 2.6.20-rc4 appears to take 4 microseconds on my P4 3GHz for a
non-blocking UDPv4 recvfrom() call, both on loopback and ethernet.

Linux 2.6.18 on my 64 bit Athlon64 3200+ takes a similar amount of time.

This seems like rather a lot for a 50 byte datagram, but perhaps I'm
overestimating your abilities :-)

The program is unthreaded, and I measure like this:

#define RDTSC(qp) \
do { \
  unsigned long lowPart, highPart; \
  __asm__ __volatile__("rdtsc" : "=a" (lowPart), "=d" (highPart)); \
  qp = (((unsigned long long) highPart) << 32) | lowPart; \
} while (0)

...

uint64_t tsc1, tsc2;
RDTSC(tsc1);  

if((len=recvfrom(fd, data, sizeof(data), 0, (sockaddr *)&fromaddr, &addrlen)) >= 0) {
  RDTSC(tsc2);
  printf("%f\n", (tsc2-tsc1)/3000.0);  // 3GHz P4
}

gdb generates the following dump from the actual program,
x=_Z20handleNewUDPQuestioniRN5boost3anyE, I see nothing untoward happening
between the two 'rdtsc' opcodes.

0x08091de0 <x+0>:    push   %ebp
0x08091de1 <x+1>:    mov    %esp,%ebp
0x08091de3 <x+3>:    push   %edi
0x08091de4 <x+4>:    push   %esi
0x08091de5 <x+5>:    push   %ebx
0x08091de6 <x+6>:    sub    $0x78c,%esp
0x08091dec <x+12>:   mov    %gs:0x14,%eax
0x08091df2 <x+18>:   mov    %eax,0xffe4(%ebp)
0x08091df5 <x+21>:   xor    %eax,%eax
0x08091df7 <x+23>:   movw   $0x2,0xffac(%ebp)
0x08091dfd <x+29>:   movl   $0x0,0xffb0(%ebp)
0x08091e04 <x+36>:   movw   $0x0,0xffae(%ebp)
0x08091e0a <x+42>:   movl   $0x1c,0xf8f4(%ebp)
0x08091e14 <x+52>:   rdtsc
0x08091e16 <x+54>:   mov    %edx,%ebx
0x08091e18 <x+56>:   mov    0x8(%ebp),%edx
0x08091e1b <x+59>:   mov    %eax,%esi
0x08091e1d <x+61>:   lea    0xf8f4(%ebp),%eax
0x08091e23 <x+67>:   mov    %eax,0x14(%esp)
0x08091e27 <x+71>:   lea    0xffac(%ebp),%ecx
0x08091e2a <x+74>:   lea    0xf950(%ebp),%eax
0x08091e30 <x+80>:   mov    %ecx,0x10(%esp)
0x08091e34 <x+84>:   movl   $0x0,0xc(%esp)
0x08091e3c <x+92>:   movl   $0x5dc,0x8(%esp)
0x08091e44 <x+100>:  mov    %eax,0x4(%esp)
0x08091e48 <x+104>:  mov    %edx,(%esp)
0x08091e4b <x+107>:  call   0x8192110 <recvfrom>
0x08091e50 <x+112>:  test   %eax,%eax
0x08091e52 <x+114>:  mov    %eax,0xf8b0(%ebp)
0x08091e58 <x+120>:  js     0x8092168 <x+904>
0x08091e5e <x+126>:  mov    %ebx,%eax
0x08091e60 <x+128>:  xor    %edx,%edx
0x08091e62 <x+130>:  mov    %eax,%edx
0x08091e64 <x+132>:  mov    $0x0,%eax
0x08091e69 <x+137>:  mov    %esi,%ecx
0x08091e6b <x+139>:  mov    %eax,%esi
0x08091e6d <x+141>:  or     %ecx,%esi
0x08091e6f <x+143>:  mov    %edx,%edi
0x08091e71 <x+145>:  rdtsc
0x08091e73 <x+147>:  mov    %eax,0xf8a0(%ebp)
0x08091e79 <x+153>:  mov    0xf8a0(%ebp),%eax
0x08091e7f <x+159>:  mov    %edx,%ecx
0x08091e81 <x+161>:  xor    %ebx,%ebx
0x08091e83 <x+163>:  mov    %ecx,%ebx
recvfrom itself is a tad worrisome, x=recvfrom. I didn't ask for the
'libc_enable_asynccancel' stuff. I'm trying to isolate the actual syscall
but it is proving hard work for an assembly newbie like me - socketcall
doesn't make things easier.

0xb7d62410 <x+0>:    cmpl   $0x0,%gs:0xc
0xb7d62418 <x+8>:    jne    0xb7d62439 <x+41>
0xb7d6241a <x+10>:   mov    %ebx,%edx
0xb7d6241c <x+12>:   mov    $0x66,%eax
0xb7d62421 <x+17>:   mov    $0xc,%ebx
0xb7d62426 <x+22>:   lea    0x4(%esp),%ecx
0xb7d6242a <x+26>:   call   *%gs:0x10
0xb7d62431 <x+33>:   mov    %edx,%ebx
0xb7d62433 <x+35>:   cmp    $0xff83,%eax
0xb7d62436 <x+38>:   jae    0xb7d62469 <x+89>
0xb7d62438 <x+40>:   ret
0xb7d62439 <x+41>:   push   %esi
0xb7d6243a <x+42>:   call   0xb7d6ddd0 <__libc_enable_asynccancel>
0xb7d6243f <x+47>:   mov    %eax,%esi
0xb7d62441 <x+49>:   mov    %ebx,%edx
0xb7d62443 <x+51>:   mov    $0x66,%eax
0xb7d62448 <x+56>:   mov    $0xc,%ebx
0xb7d6244d <x+61>:   lea    0x8(%esp),%ecx
0xb7d62451 <x+65>:   call   *%gs:0x10
0xb7d62458 <x+72>:   mov    %edx,%ebx
0xb7d6245a <x+74>:   xchg   %eax,%esi
0xb7d6245b <x+75>:   call   0xb7d6dd90 <__libc_disable_asynccancel>
0xb7d62460 <x+80>:   mov    %esi,%eax
0xb7d62462 <x+82>:   pop    %esi
0xb7d62463 <x+83>:   cmp    $0xff83,%eax
0xb7d62466 <x+86>:   jae    0xb7d62469 <x+89>
0xb7d62468 <x+88>:   ret
0xb7d62469 <x+89>:   call   0xb7d998f8 <__i686.get_pc_thunk.cx>
0xb7d6246e <x+94>:   add    $0x61b86,%ecx
0xb7d62474 <x+100>:  mov    0xff2c(%ecx),%ecx
0xb7d6247a <x+106>:  xor    %edx,%edx
0xb7d6247c <x+108>:  sub    %eax,%edx
0xb7d6247e <x+110>:  mov    %edx,%gs:(%ecx)
0xb7d62481 <x+113>:  or     $0x,%eax
0xb7d62484 <x+116>:  jmp    0xb7d62438 <x+40>

Any clues?


fix for 2.6.20-rc2 null pointer dereference in SoftMAC? was Re: [PATCH] softmac: Fix for work struct changes

2006-12-26 Thread bert hubert
On Sun, Dec 10, 2006 at 03:37:27PM -0600, Larry Finger wrote:
 casted to (void*). This compiled correctly but resulted in a
 softlock, because mutex_lock was called with the wrong memory
 address. The patch fixes the problem. Another issue was a wrong

(quickly, between christmas dinner preparations)
Does this explain the following, which happens reliably in stock 2.6.20-rc2 
(in-kernel zd1211rw):

Dec 24 22:07:25 localhost kernel: [  120.238914] SoftMAC: Open Authentication 
completed with 00:0e:a6:16:28:a9
Dec 24 22:07:25 localhost kernel: [  120.239005] BUG: unable to handle kernel 
NULL pointer dereference at virtual address 0006
Dec 24 22:07:25 localhost kernel: [  120.239132]  printing eip:
Dec 24 22:07:25 localhost kernel: [  120.239191] c04cf8c5
Dec 24 22:07:25 localhost kernel: [  120.239249] *pde = 
Dec 24 22:07:25 localhost kernel: [  120.239308] Oops: 0002 [#1]
Dec 24 22:07:25 localhost kernel: [  120.239367] Modules linked in: capability 
commoncap cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand 
freq_table cpufreq_conservative zd1211rw ieee80211softmac usbhid ieee80211 
ieee80211_crypt psmouse
Dec 24 22:07:25 localhost kernel: [  120.239850] CPU:0
Dec 24 22:07:25 localhost kernel: [  120.239851] EIP:
0060:[__mutex_lock_slowpath+30/89]Not tainted VLI
Dec 24 22:07:25 localhost kernel: [  120.239853] EFLAGS: 00010286   (2.6.20-rc2 
#7)
Dec 24 22:07:25 localhost kernel: [  120.240043] EIP is at 
__mutex_lock_slowpath+0x1e/0x59
Dec 24 22:07:25 localhost kernel: [  120.240106] eax: f5b449e0   ebx: f5b449dc  
 ecx: 0006   edx: 0004
Dec 24 22:07:25 localhost kernel: [  120.240173] esi: c19005a0   edi: f5b44a40  
 ebp: f8862ce8   esp: c1909ec0
Dec 24 22:07:25 localhost kernel: [  120.240241] ds: 007b   es: 007b   ss: 0068
Dec 24 22:07:25 localhost kernel: [  120.240305] Process events/0 (pid: 4, 
ti=c1908000 task=c19005a0 task.ti=c1908000)
Dec 24 22:07:25 localhost kernel: [  120.240372] Stack: f5b449e0 0006 
0020 f5b449a0 f5b44a40 c04cf7d8 f8862943 f72b8500
Dec 24 22:07:25 localhost kernel: [  120.240676]0286 f5b44314 
f5b449dc f5b44a40 0001  f5e6c9c0 f5e6c9c0
Dec 24 22:07:25 localhost kernel: [  120.240981] f5b44a40 
f8862ce8 f8862d50 0004 00100100 00200200 0004
Dec 24 22:07:25 localhost kernel: [  120.241284] Call Trace:
Dec 24 22:07:25 localhost kernel: [  120.241399]  [mutex_lock+9/10] 
mutex_lock+0x9/0xa
Dec 24 22:07:25 localhost kernel: [  120.241485]  [f8862943] 
ieee80211softmac_assoc_work+0x1b/0x3c0 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [  120.241614]  [f8862ce8] 
ieee80211softmac_assoc_notify_auth+0x0/0x1e [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [  120.241741]  [f8862d50] 
ieee80211softmac_notify_callback+0x40/0x48 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [  120.241866]  [f8862d10] 
ieee80211softmac_notify_callback+0x0/0x48 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [  120.241992]  [f8862ce8] 
ieee80211softmac_assoc_notify_auth+0x0/0x1e [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [  120.242118]  [f8862d10] 
ieee80211softmac_notify_callback+0x0/0x48 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [  120.242243]  [run_workqueue+139/311] 
run_workqueue+0x8b/0x137
Dec 24 22:07:25 localhost kernel: [  120.242336]  [worker_thread+0/302] 
worker_thread+0x0/0x12e
Dec 24 22:07:25 localhost kernel: [  120.242422]  [worker_thread+261/302] 
worker_thread+0x105/0x12e
Dec 24 22:07:25 localhost kernel: [  120.242509]  [default_wake_function+0/12] 
default_wake_function+0x0/0xc
Dec 24 22:07:25 localhost kernel: [  120.242596]  [kthread+155/191] 
kthread+0x9b/0xbf
Dec 24 22:07:25 localhost kernel: [  120.242682]  [kthread+0/191] 
kthread+0x0/0xbf
Dec 24 22:07:25 localhost kernel: [  120.242767]  [kernel_thread_helper+7/16] 
kernel_thread_helper+0x7/0x10
Dec 24 22:07:25 localhost kernel: [  120.242856]  ===
Dec 24 22:07:25 localhost kernel: [  120.242915] Code: 00 00 00 31 d2 89 d0 83 
c4 0c 5b 5e c3 56 53 83 ec 0c 89 c3 65 8b 35 08 00 00 00 8d 40 04 8b 48 04 89
60 04 89 04 24 89 4c 24 04 89 21 89 74 24 08 83 c8 ff 87 03 48 74 0d c7 06 02 
00 00 00 e8
Dec 24 22:07:25 localhost kernel: [  120.244531] EIP: 
[__mutex_lock_slowpath+30/89] __mutex_lock_slowpath+0x1e/0x59 SS:ESP 
0068:c1909ec0

This happens after starting wpa_supplicant on a zd1211rw device.




Re: bcm43xx-softmac broken on 2.6.20-rc2

2006-12-26 Thread bert hubert
On Sun, Dec 24, 2006 at 09:51:50AM -0600, Larry Finger wrote:
 This is a heads-up for anyone wishing to use bcm43xx-softmac on Linus's git 
 tree, which is now at
 v2.6.20-rc2. There are two serious bugs in that code. Fixes are found below.

For some reason your patch does not apply to stock 2.6.20-rc2, although I
don't see why. Applying it by hand makes things compile though, and even
fixes the problem I mentioned in my previous post:

http://www.spinics.net/lists/netdev/msg21906.html

Thanks!



Re: sendmsg, descriptors and no content

2006-10-31 Thread bert hubert
On Tue, Oct 31, 2006 at 11:01:01PM +0100, [EMAIL PROTECTED] wrote:
 When I use sendmsg to send descriptors from one process to another using
 unix-sockets I need to include at least one byte of normal data for the
 descriptors to be send (using the iovec structure). The same code worked

W. R. Stevens, Unix Network Programming (2nd ed), vol 1, p. 389 recommends
sending at least a byte anyhow, which allows you to detect EOF.

Also see
http://www.cs-ipv6.lancs.ac.uk/ipv6/mail-archive/LinuxNetdev/1998-03/0144.html 
however.

I've attached an example that appears to work.

Bert

#include <sys/socket.h>
#include <stdio.h>

/* Send 'data' as payload; if fd >= 0, pass it along as SCM_RIGHTS */
int sfd(int passfd, int fd, int data)
{
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr mh = { 0 };
    struct cmsghdr *cm;
    int *dp;
    struct iovec iov;

    if (fd >= 0) {
        mh.msg_control = cbuf;
        mh.msg_controllen = sizeof cbuf;
        cm = CMSG_FIRSTHDR(&mh);
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;

        dp = (int *) CMSG_DATA(cm);
        *dp = fd;
    }
    if (data != 0) {
        iov.iov_base = &data;
        iov.iov_len = sizeof data;
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
    }

    return sendmsg(passfd, &mh, 0);
}

/* Only prepared to rcv one fd per message */
int rcvfd(int passfd, int *data, int *datalen)
{
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr mh = { 0 };
    struct cmsghdr *cm;
    int *dp, ret;
    struct iovec iov;

    if (data) {
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
        iov.iov_base = data;
        iov.iov_len = sizeof(int);
    }

    mh.msg_control = cbuf;
    mh.msg_controllen = sizeof cbuf;
    cm = CMSG_FIRSTHDR(&mh);
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;

    /* datalen is optional, so check it before every dereference */
    if (datalen)
        *datalen = 0;
    ret = recvmsg(passfd, &mh, 0);
    if (ret < 0)
        return ret;
    if (datalen)
        *datalen = ret;

    dp = (int *) CMSG_DATA(cm);
    return *dp;
}

int main()
{
  int fd[2];
  int datalen;

  socketpair(AF_UNIX, SOCK_DGRAM, 0, fd);

  printf("Sending returned status: %d\n", sfd(fd[0], 0, 1));

  printf("Received fd: %d\n", rcvfd(fd[1], 0, &datalen));

  return 0;
}



Re: [Announce] Netchannels ported to the latest git tree. Gigabit benchmark. Complete rout.

2006-10-26 Thread bert hubert
On Thu, Oct 26, 2006 at 02:51:51PM +0400, Evgeniy Polyakov wrote:

 Benchmark uses 128 bytes sending/receiving per syscall (no latency
 checks, only throughput.
 Receiving CPU usage is 3 times less (90% socket code vs. 30%
 Sending CPU usage is 5 times less (upto 50% vs. upto 10%).

Wow. I currently lack the hardware to reproduce your measurements. Do you
have any idea what these numbers would be with 1024 byte system calls?

Thanks.



tested: Re: [PATCH] tcp: make cubic the default

2006-09-23 Thread bert hubert
Stephen,

I've applied both of your patches
(http://marc.theaimsgroup.com/?l=linux-netdev&m=115878447914612&w=2
and
http://marc.theaimsgroup.com/?l=linux-netdev&m=115878448125216&w=2
) and tried to break them, but they now appear to do the right thing in all
cases. Even after malforming the .config by hand, a 'make oldconfig' restores
sanity.

Reno is chosen if none of the non-scary congestion avoidance algorithms are
available, and the defaults for when they are available are as you intended.

I've testbooted the resulting kernel and everything appears to work as
desired, the proper TCP gets chosen, loading other ones does not change the
default, but does make them available.
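
One way to check from userspace which algorithm a new connection gets is to ask a fresh TCP socket. The sketch below assumes the TCP_CONGESTION socket option is available (the fallback value 13 is the one used by the Linux headers); the function name default_tcp_cong is made up for this example.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13   /* value from the Linux headers, for older libc */
#endif

/* Copies the congestion control name of a freshly created TCP socket into
 * name[] (NUL-terminated). Returns 0 on success, -1 on error. */
int default_tcp_cong(char *name, size_t size)
{
    socklen_t len;
    int fd, rc;

    if (size < 2)
        return -1;
    memset(name, 0, size);
    len = (socklen_t)(size - 1);             /* keep room for the terminator */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    rc = getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &len);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

Running this before and after loading another congestion control module should print the same name if loading does not change the default.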

Unloading the module containing the configured policy sets the policy to
'cubic', which is probably the next entry in the policy list.

All in all, this final iteration of the congestion selection patches appears
to do the job!

Davem, I'd recommend both patches for merging.

Bert

On Wed, Sep 20, 2006 at 01:32:58PM -0700, Stephen Hemminger wrote:
 Change default congestion control used from BIC to the newer CUBIC
 which it the successor to BIC but has better properties over long delay links.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
 
 ---
  net/ipv4/Kconfig |   12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)
 
 --- net-test.orig/net/ipv4/Kconfig2006-09-20 12:22:06.0 -0700
 +++ net-test/net/ipv4/Kconfig 2006-09-20 13:31:21.0 -0700
 @@ -454,7 +454,7 @@
 modules.
  
 Nearly all users can safely say no here, and a safe default
 -   selection will be made (BIC-TCP with new Reno as a fallback).
 +   selection will be made (CUBIC with new Reno as a fallback).
  
 If unsure, say N.
  
 @@ -462,7 +462,7 @@
  
  config TCP_CONG_BIC
   tristate "Binary Increase Congestion (BIC) control"
 - default y
 + default m
   ---help---
   BIC-TCP is a sender-side only change that ensures a linear RTT
   fairness under large windows while offering both scalability and
 @@ -476,7 +476,7 @@
  
  config TCP_CONG_CUBIC
   tristate "CUBIC TCP"
 - default m
 + default y
   ---help---
   This is version 2.0 of BIC-TCP which uses a cubic growth function
   among other techniques.
 @@ -573,7 +573,7 @@
  
  choice
   prompt "Default TCP congestion control"
 - default DEFAULT_BIC
 + default DEFAULT_CUBIC
   help
 Select the TCP congestion control that will be used by default
 for all connections.
 @@ -600,7 +600,7 @@
  
  endif
  
 -config TCP_CONG_BIC
 +config TCP_CONG_CUBIC
   tristate
   depends on !TCP_CONG_ADVANCED
   default y
 @@ -613,7 +613,7 @@
   default "vegas" if DEFAULT_VEGAS
   default "westwood" if DEFAULT_WESTWOOD
   default "reno" if DEFAULT_RENO
 - default "bic"
 + default "cubic"
  
  source "net/ipv4/ipvs/Kconfig"
  



Re: [PATCH] tcp: simpler bic default

2006-09-20 Thread bert hubert
On Tue, Sep 19, 2006 at 04:23:55PM -0700, Stephen Hemminger wrote:
 Okay, build testing all the possibilities now, answer by morning..

Please boot some of them as well - I can see a kernel that really wants to
load bic at boot time but can't find it.

Bert



Re: [PATCH] tcp: set congestion default through Kconfig (v2)

2006-09-19 Thread bert hubert
On Tue, Sep 19, 2006 at 02:20:07PM -0700, David Miller wrote:

  Bert's attempt was noble
  It showed your desire for the truth

It was also crap :-)

 Applied, but...
 net/ipv4/Kconfig:607:warning: defaults for choice values not supported

It does appear to do the right thing in all cases I throw against it, but
this warning is sure to generate noise.

It probably means CONFIG_BIC is both available in the menu, and set as a
separate option, and that this does not set the menu to CONFIG_BIC. This is
not a big deal however as the menu is not shown at all whenever line 607
applies.




Re: [PATCH] tcp: simpler bic default

2006-09-19 Thread bert hubert
On Tue, Sep 19, 2006 at 02:32:09PM -0700, Stephen Hemminger wrote:
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

# CONFIG_TCP_CONG_ADVANCED is not set
# CONFIG_DEFAULT_BIC is not set
# CONFIG_DEFAULT_CUBIC is not set
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="bic"

There is no bic in the kernel now - will this do the right thing?



Re: tcp congestion policy selection link order fragile

2006-09-18 Thread bert hubert
On Mon, Sep 18, 2006 at 01:51:30AM -0700, David Miller wrote:
 We created TCP_CONG_ADVANCED for a purpose.  If you turn that
 thing on, you get full control but if something breaks you get
 to keep the pieces.

But we should not try to break stuff on purpose, no matter how advanced. It
makes zero sense. To reiterate, when compiling in multiple TCP policies, a
*random* one gets enabled. This is not something we want to offer even
advanced users. It is a kernel, not an adventure course.

Please consider this near-oneliner patch which makes stuff behave more like
people expect: loading a module, or compiling in a congestion avoidance
policy only makes it available, but does not turn it on by default. 

It also cleans up two notices a bit.

I've tested this patch and it does the job for me, reno is now the default,
even when more advanced options are compiled in, but the rest is still
available.
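
To illustrate why the insertion end matters, here is a toy userspace model, not kernel code. It assumes, as a simplification, that whatever sits at the head of the list is what new connections get; the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a registered congestion control algorithm. */
struct cong {
    const char *name;
    struct cong *next;
};

/* Toy registry: the entry at the head is treated as the default. */
struct registry {
    struct cong *head;
};

/* Head insertion: the most recently registered entry becomes the default. */
void add_head(struct registry *r, struct cong *c)
{
    c->next = r->head;
    r->head = c;
}

/* Tail insertion: the first registered entry stays the default. */
void add_tail(struct registry *r, struct cong *c)
{
    struct cong **pp = &r->head;

    while (*pp)
        pp = &(*pp)->next;
    c->next = NULL;
    *pp = c;
}

const char *default_name(const struct registry *r)
{
    return r->head ? r->head->name : NULL;
}
```

With head insertion, registering reno and then bic leaves bic at the head. With tail insertion, reno stays at the head and bic is merely available.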

When in doubt, consider that I discovered this because my kernel was
crashing, and that this is bound to generate heaps of annoying email
otherwise. 

Thanks.

Signed-off-by: bert hubert [EMAIL PROTECTED]

--- linux-2.6.18-rc7/net/ipv4/tcp_cong.c.org	2006-09-18 11:42:25.0 +0200
+++ linux-2.6.18-rc7/net/ipv4/tcp_cong.c	2006-09-18 11:43:45.0 +0200
@@ -45,11 +45,11 @@
 
 	spin_lock(&tcp_cong_list_lock);
 	if (tcp_ca_find(ca->name)) {
-		printk(KERN_NOTICE "TCP %s already registered\n", ca->name);
+		printk(KERN_NOTICE "TCP congestion control '%s' already registered\n", ca->name);
 		ret = -EEXIST;
 	} else {
-		list_add_rcu(&ca->list, &tcp_cong_list);
-		printk(KERN_INFO "TCP %s registered\n", ca->name);
+		list_add_tail_rcu(&ca->list, &tcp_cong_list);
+		printk(KERN_INFO "TCP congestion control '%s' registered\n", ca->name);
 	}
 	spin_unlock(&tcp_cong_list_lock);
 
-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: tcp congestion policy selection link order fragile

2006-09-18 Thread bert hubert
On Mon, Sep 18, 2006 at 07:06:00AM -0700, David Miller wrote:

 Any ordering scheme is wrong or unexpected for _somebody_.  Look how

I agree violently. Would you agree that it would be best to have a mechanism
that explicitly sets a sane default, and does not rely on ordering?

My implementation indeed broke your intentions, but would you be open to
revamping things so the default policy is not dependent on load order?

What would the desired default be, 'BIC' in all cases?

Thanks. 

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: tcp congestion policy selection link order fragile

2006-09-18 Thread bert hubert
On Mon, Sep 18, 2006 at 11:53:09AM -0700, David Miller wrote:
  What would the desired default be, 'BIC' in all cases?
 
 And if BIC is not enabled in the configuration, then what?

As the source notes: /* we'll always have reno */. This would make the
policy: the default is bic if available, otherwise it is reno, which is
*always* available.
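
To make that fallback rule concrete, here is a minimal userspace sketch. It is an illustration only, not kernel code: the function name `pick_default` and the array-of-names representation are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given the names of the policies that are
 * available, return the name of the default. "bic" wins when present;
 * otherwise fall back to "reno", which is assumed to always exist.
 */
static const char *pick_default(const char *const *available, size_t n)
{
	size_t i;

	assert(available != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (strcmp(available[i], "bic") == 0)
			return "bic";
	}
	return "reno";
}
```

With this rule the result no longer depends on the order in which the policies were registered, only on whether bic is in the set.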

But it is all up to you. I'm willing to do the leg work.

Bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: 2.6.18-rc[67] crashes in TCP ack handling

2006-09-17 Thread bert hubert
On Sun, Sep 17, 2006 at 08:32:14AM +0900, Stephen Hemminger wrote:

 By building all the possibilities into the kernel, ie. not as modules
 you get the last one registered. TCP LP is probably the worst one
 to use, because it is designed for bulk low priority applications.
 It also is one of the newest least tested.  Right now, I would rate

Hehe, this seems to be a bad default configuration policy then. People
generally don't assume that if the kernel offers 10 policies that the most
unstable will be used by default if you compile them all in.

I've attached a patch that reorders the choices per your suggested order, so
people are most likely to get a sane default.

I've tried to make reno the default, no matter what you compiled in, but
it didn't work. The linker probably reorders tcp_cong.o in early.

 Without a back trace, it will be hard to find the bug in TCP LP

Indeed. 

Many thanks for your quick answer Stephen!

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services
--- linux-2.6.18/net/ipv4/Makefile~ 2006-09-17 11:48:33.0 +0200
+++ linux-2.6.18/net/ipv4/Makefile  2006-09-17 11:48:45.0 +0200
@@ -7,7 +7,7 @@
 ip_output.o ip_sockglue.o inet_hashtables.o \
 inet_timewait_sock.o inet_connection_sock.o \
 tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
-tcp_minisocks.o tcp_cong.o \
+tcp_minisocks.o \
 datagram.o raw.o udp.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 sysctl_net_ipv4.o fib_frontend.o fib_semantics.o
 
@@ -37,16 +37,20 @@
 obj-$(CONFIG_IP_ROUTE_MULTIPATH_CACHED) += multipath.o
 obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
 obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
-obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
-obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
-obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o
+
+obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
+obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
+obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
+obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
+obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
-obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
-obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
-obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
-obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
+obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
+
+# make sure the built in congestion scheme is the default
+obj-y += tcp_cong.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
  xfrm4_output.o


Re: tcp congestion policy selection link order fragile

2006-09-17 Thread bert hubert
The original message Stephen reacts to below apparently never made it to the
list; it can be found here: http://ds9a.nl/tmp/module-policy.txt

 Anybody who builds in random stuff without thinking is being foolish.
 But, if you can think of a better configuration method that isn't too
 grotty, then go for it.

The method I'm proposing is simple enough:

1) reno is always built-in
2) it is the default tcp congestion policy
3) loading/compiling-in additional tcp congestion policies only makes them
   available
4) userspace is free to select a non-default tcp congestion policy at will

The implementation might be as simple as making the *first* registered
congestion policy the default (instead of the last one) which would be reno,
as it is in tcp_cong.o, which is probably always loaded first (as the other
.o's need symbols that are in tcp_cong.o).
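
The difference between "last registered wins" and "first registered wins" can be shown with a toy registry. This is a userspace sketch under the assumption that the default is simply whatever sits at the head of the registration list; the struct and function names are made up for the example and are not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POLICIES 16

/* Toy registry: slot 0 plays the role of the list head (the default). */
struct registry {
	const char *names[MAX_POLICIES];
	size_t count;
};

/* Insert at the head: the most recently registered policy becomes default. */
static void register_at_head(struct registry *r, const char *name)
{
	assert(r->count < MAX_POLICIES);
	memmove(&r->names[1], &r->names[0], r->count * sizeof(r->names[0]));
	r->names[0] = name;
	r->count++;
}

/* Append at the tail: the first registered policy stays the default. */
static void register_at_tail(struct registry *r, const char *name)
{
	assert(r->count < MAX_POLICIES);
	r->names[r->count++] = name;
}

/* The default is whatever sits at the head, or NULL if nothing is registered. */
static const char *default_policy(const struct registry *r)
{
	return r->count ? r->names[0] : NULL;
}
```

Registering reno, bic, cubic and lp in that order leaves lp as the default with head insertion and reno as the default with tail insertion.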

Despite what you allege about my foolishness, I maintain that a kernel that
enables a *random policy* from the ones you compiled in, is not a sane
kernel.

The default kernel should be as sane as possible, allowing the userspace
people (ie, me) to mess things up to their heart's desire.
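
For the userspace side, a per-socket choice can be made with the TCP_CONGESTION socket option. The following is a minimal sketch, assuming a Linux system whose headers expose TCP_CONGESTION in <netinet/tcp.h>; the helper name `try_policy` and the buffer size constant are assumptions for the example.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumed buffer size for this sketch; policy names are short strings. */
#define CA_NAME_BUF 16

/*
 * Open a TCP socket, request congestion policy `name` for that socket
 * only, and read back the policy the kernel reports. Returns 0 on
 * success and -1 on failure (for example when the policy is unavailable).
 */
static int try_policy(const char *name, char *out, socklen_t outlen)
{
	socklen_t len;
	int fd, ret = -1;

	assert(outlen >= CA_NAME_BUF);
	memset(out, 0, outlen);
	len = outlen - 1;	/* keep the final byte as a NUL terminator */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name)) == 0 &&
	    getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, out, &len) == 0)
		ret = 0;

	close(fd);
	return ret;
}
```

Calling `try_policy("reno", buf, sizeof(buf))` should succeed wherever reno is available, and it leaves the system-wide default untouched.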

Bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


2.6.18-rc[67] crashes in TCP ack handling

2006-09-16 Thread bert hubert
The bad news is that I haven't yet been able to capture traces.
Once every three days or so I get a crash of 2.6.18-rc[67] which
*probably* ends in tcp_ack(), but I don't have the exact dump.

My .config is indeed heavy on TCP congestion stuff:

$ zcat /proc/config.gz | grep -i tcp
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_ADVANCED=y
# TCP congestion control
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=y
CONFIG_TCP_CONG_HSTCP=y
CONFIG_TCP_CONG_HYBLA=y
CONFIG_TCP_CONG_VEGAS=y
CONFIG_TCP_CONG_SCALABLE=y
CONFIG_TCP_CONG_LP=y
CONFIG_TCP_CONG_VENO=y
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_IP_NF_TARGET_TCPMSS=y
# CONFIG_NET_TCPPROBE is not set
# CONFIG_ISCSI_TCP is not set
# CONFIG_NFSD_TCP is not set

However, I haven't specifically configured anything.
$ dmesg | grep -i tcp 
[   33.106317] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[   33.107086] TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
[   33.107476] TCP: Hash tables configured (established 131072 bind 65536)
[   33.107605] TCP reno registered
[   40.985770] IPVS: Registered protocols (TCP, UDP, AH, ESP)
[   41.105710] TCP bic registered
[   41.105833] TCP cubic registered
[   41.105957] TCP westwood registered
[   41.106080] TCP highspeed registered
[   41.106203] TCP hybla registered
[   41.106328] TCP htcp registered
[   41.106452] TCP vegas registered
[   41.106574] TCP veno registered
[   41.106698] TCP scalable registered
[   41.106822] TCP lp registered

$ cat ipv4/tcp_congestion_control
lp

I hope to follow up this message with the actual backtrace, but this is
already a heads-up.

Sorry for not yet being able to be more specific.

bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: 2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp

2006-09-15 Thread bert hubert
 It appears to be intentional, but I don't see a reason for it.
 Can you try if this patch makes it work as expected?

 [PACKET]: Don't truncate non-linear skbs with mmaped IO
 
 Non-linear skbs are truncated to their linear part with mmaped IO.
 Fix by using skb_copy_bits instead of memcpy.

Works very well for me! I hope this can make it into 2.6.18.
Thanks everybody.
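
For readers unfamiliar with the distinction, here is a toy userspace model of why a single memcpy from the linear part loses data when a packet is split across several buffers. All types and function names below are hypothetical stand-ins for illustration; this is not the kernel's data structure or the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS 4

/* Toy stand-in for one extra payload buffer. */
struct toy_frag {
	const unsigned char *data;
	size_t len;
};

/* Toy stand-in for a packet: a contiguous part plus optional fragments. */
struct toy_packet {
	const unsigned char *linear;
	size_t linear_len;
	struct toy_frag frags[MAX_FRAGS];
	size_t nr_frags;
};

/* Total packet length: linear part plus every fragment. */
static size_t toy_total_len(const struct toy_packet *p)
{
	size_t total = p->linear_len;

	assert(p->nr_frags <= MAX_FRAGS);
	for (size_t i = 0; i < p->nr_frags; i++)
		total += p->frags[i].len;
	return total;
}

/* Truncating pattern: one memcpy from the linear part only. */
static size_t copy_linear_only(const struct toy_packet *p, unsigned char *to, size_t len)
{
	size_t n = len < p->linear_len ? len : p->linear_len;

	memcpy(to, p->linear, n);
	return n;
}

/* Complete pattern: copy the linear part, then walk each fragment. */
static size_t copy_all_bits(const struct toy_packet *p, unsigned char *to, size_t len)
{
	size_t done = copy_linear_only(p, to, len);

	for (size_t i = 0; i < p->nr_frags && done < len; i++) {
		size_t n = len - done;

		if (n > p->frags[i].len)
			n = p->frags[i].len;
		memcpy(to + done, p->frags[i].data, n);
		done += n;
	}
	return done;
}
```

In this model a packet with a 4-byte linear part and 11 bytes in fragments yields 4 bytes from `copy_linear_only` and all 15 from `copy_all_bits`.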



-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: 2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp

2006-09-14 Thread bert hubert
On Wed, Sep 13, 2006 at 08:44:21PM +0200, Patrick McHardy wrote:

 Are you using TSO on the outgoing device? If so please try to log the
 packet using iptables to see if it really is a TSO packet.

Good catch! I turned off TSO and things are working fine again.

Is this a known problem, should it be documented or fixed? I'm more than
willing to write up some warnings should this be a good idea.

Thanks Patrick! I can do without TSO but not without mmapped pcap!

Bert


-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp

2006-09-13 Thread bert hubert
Hi people,

I like to use memory mapped pcap (PACKET_MMAP) since, off the shelf, Linux is
a tad prone to dropping packets while capturing these days. It used to be a lot
better at it, but right now memory mapped pcap is the only way to get things
working somewhat. I've noticed this on many machines.

However, memory mapped pcap has started to truncate outgoing packets for me
recently, and interestingly, I only see this with locally generated TCP
packets, not with locally generated ICMP packets. I haven't yet tried UDP,
nor actual sniffing, this is all locally generated packets going out on
eth0.

Incoming packets are not truncated.

My commandline:
# PCAP_VERBOSE=1 PCAP_FRAMES=15000 tcpdump  -i eth0 -s 1600 -p -w test-dump
libpcap version: 0.9
Kernel filter, Protocol 0300, MMAP mode (12188 frames, snapshot 1600), socket type: Raw
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 1600 bytes


Within this dump we find the following outgoing TCP packet:
Frame 289 (553 bytes on wire, 66 bytes captured)
Arrival Time: Sep 13, 2006 13:18:31.96025
Time delta from previous packet: 0.72000 seconds
Time since reference or first frame: 42.738582000 seconds
Frame Number: 289
Packet Length: 553 bytes
Capture Length: 66 bytes
Protocols in frame: eth:ip:tcp
Type: IP (0x0800)
Internet Protocol, Src: 10.0.3.146 (10.0.3.146), Dst: 82.165.25.125 (82.165.25.125)

Which is truncated!

However, we also find this incoming packet:
Frame 290 (1508 bytes on wire, 1508 bytes captured)
Arrival Time: Sep 13, 2006 13:18:32.036536000
Time delta from previous packet: 0.076286000 seconds
Time since reference or first frame: 42.814868000 seconds
Frame Number: 290
Packet Length: 1508 bytes
Capture Length: 1508 bytes
Protocols in frame: eth:ip:tcp:http
Internet Protocol, Src: 82.165.25.125 (82.165.25.125), Dst: 10.0.3.146 (10.0.3.146)

Which looks just fine.
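
The test being applied to the two frames above can be written down as a small check: a frame counts as unexpectedly truncated when fewer bytes were captured than both its wire length and the snapshot length allow. This is a sketch only, and the function name is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: with a snapshot length of `snaplen`, a frame of
 * `wire_len` bytes should be captured up to min(wire_len, snaplen).
 * Anything shorter than that is an unexpected truncation.
 */
static bool unexpectedly_truncated(size_t wire_len, size_t captured_len,
				   size_t snaplen)
{
	size_t expected = wire_len < snaplen ? wire_len : snaplen;

	assert(captured_len <= wire_len);
	return captured_len < expected;
}
```

With the snapshot length of 1600 used here, the outgoing frame (553 on the wire, 66 captured) fails the check while the incoming frame (1508 on the wire, 1508 captured) passes.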

Downgrading to 'normal' mode fixes this problem, but suffers from too much
packet loss to be useful.

My tcpdump is built with:
http://public.lanl.gov/cpw/libpcap-0.9.20060417.tar.gz

It used to work just fine, but I haven't been able to find when it broke.

Please let me know how I can help solve this!

Bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: Van Jacobson's net channels and real-time

2006-04-22 Thread bert hubert
On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote:
 Going all the way to the socket is a large endeavor and will require a
 lot of restructuring to do it right, so expect this to take on the
 order of months.

That's what you said about Niagara too :-) 

Good luck!

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


Re: skb diet

2006-04-16 Thread bert hubert
On Sat, Apr 15, 2006 at 09:22:01PM +0200, Andi Kleen wrote:

 And optimizing for uncommon cases (not TCP) doesn't seem too useful.

There are servers that do tens of megabits of UDP these days (think VoIP, or
in my case, DNS), so it is not that uncommon.

Bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services


gcc -Os causes: Re: ip route add default: network unreachable? 2.6.15

2006-01-04 Thread bert hubert
On Wed, Jan 04, 2006 at 11:36:33PM +0100, bert hubert wrote:
 $ sudo ip route re default via 10.0.0.12
 RTNETLINK answers: Network is unreachable

This all goes away on removing CONFIG_CC_OPTIMIZE_FOR_SIZE from the kernel
.config when building with the gcc prerelease that Ubuntu Breezy ships.

Now verifying if this is fixed in gcc 4.0.2.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=322723 for more
details.

I hope to pin down a culprit.

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services