Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Vladimir B. Savkin
On Fri, Sep 22, 2006 at 09:51:09AM -0700, Rick Jones wrote:
  That came from named. It opens lots of sockets with SIOCGSTAMP.
  No idea what it needs that many for.
 
 IIRC ISC BIND named opens a socket for each IP it finds on the system. 
 Presumably in this way it knows implicitly the destination IP without 
 using platform-specific recvfrom/whatever extensions and gets some 
 additional parallelism in the stack on SMP systems.
 
 Why it needs/wants the timestamps I've no idea, I don't think it gets 
 them that way on all platforms.  I suppose the next time I do some named 
 benchmarking I can try to take a closer look in the source.
 

Returning to the discussion about packet timestamps, I just
use the following patch now:

diff -ur ../linux-2.6.20.1/include/linux/sysctl.h linux-2.6.20.1-ts/include/linux/sysctl.h
--- ../linux-2.6.20.1/include/linux/sysctl.h  2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/include/linux/sysctl.h  2007-03-04 19:10:36.0 +0300
@@ -280,6 +280,7 @@
    NET_CORE_BUDGET=19,
    NET_CORE_AEVENT_ETIME=20,
    NET_CORE_AEVENT_RSEQTH=21,
+   NET_CORE_ACCURATE_TIMESTAMPS=99,
 };
 
 /* /proc/sys/net/ethernet */
diff -ur ../linux-2.6.20.1/net/core/dev.c linux-2.6.20.1-ts/net/core/dev.c
--- ../linux-2.6.20.1/net/core/dev.c  2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/net/core/dev.c  2007-03-04 19:09:44.0 +0300
@@ -1043,9 +1043,11 @@
 }
 EXPORT_SYMBOL(__net_timestamp);
 
+int sysctl_accurate_timestamps = 1;
+
 static inline void net_timestamp(struct sk_buff *skb)
 {
-   if (atomic_read(&netstamp_needed))
+   if (sysctl_accurate_timestamps && atomic_read(&netstamp_needed))
        __net_timestamp(skb);
    else {
        skb->tstamp.off_sec = 0;
diff -ur ../linux-2.6.20.1/net/core/sysctl_net_core.c linux-2.6.20.1-ts/net/core/sysctl_net_core.c
--- ../linux-2.6.20.1/net/core/sysctl_net_core.c  2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/net/core/sysctl_net_core.c  2007-03-04 19:05:11.0 +0300
@@ -21,6 +21,8 @@
 
 extern int sysctl_core_destroy_delay;
 
+extern int sysctl_accurate_timestamps;
+
 #ifdef CONFIG_XFRM
 extern u32 sysctl_xfrm_aevent_etime;
 extern u32 sysctl_xfrm_aevent_rseqth;
@@ -136,6 +138,14 @@
        .mode           = 0644,
        .proc_handler   = proc_dointvec
    },
+   {
+       .ctl_name       = NET_CORE_ACCURATE_TIMESTAMPS,
+       .procname       = "accurate_timestamps",
+       .data           = &sysctl_accurate_timestamps,
+       .maxlen         = sizeof(int),
+       .mode           = 0644,
+       .proc_handler   = proc_dointvec
+   },
    { .ctl_name = 0 }
 };
 

May I ask about integrating this or a similar solution for those
like me who value routing performance (with bind9 running) over the
minor convenience of having tcpdump always display accurate
timestamps?

And why does the current kernel (2.6.20.1) still ignore the
clocksource=tsc parameter? I think with idle=poll TSC is safe to use on my
setup; it has run with TSC for many months without a problem.

~
:wq
With best regards, 
   Vladimir Savkin. 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Eric Dumazet
On Tuesday 06 March 2007 14:25, Vladimir B. Savkin wrote:

   },
 + {
 + .ctl_name   = NET_CORE_ACCURATE_TIMESTAMPS,
 + .procname   = "accurate_timestamps",
 + .data   = &sysctl_accurate_timestamps,
 + .maxlen = sizeof(int),
 + .mode   = 0644,
 + .proc_handler   = proc_dointvec
 + },
   { .ctl_name = 0 }
  };


 May I ask about integrating this or a similar solution for those
 like me who value routing performance (with bind9 running) over
 minor convenience of having tcpdump always display accurate
 timestamps?


Quite frankly, I don't like this patch:

1) Fix applications, do not bloat kernel.

2) accurate_timestamps is misleading. 
Should be disable_timestamps 



Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Vladimir B. Savkin
On Tue, Mar 06, 2007 at 03:38:44PM +0100, Eric Dumazet wrote:
 2) accurate_timestamps is misleading. 
   Should be disable_timestamps 

Not if the default is 1, as in my patch.

~
:wq
With best regards, 
   Vladimir Savkin. 



Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Eric Dumazet
On Tuesday 06 March 2007 15:43, Vladimir B. Savkin wrote:
 On Tue, Mar 06, 2007 at 03:38:44PM +0100, Eric Dumazet wrote:
  2) accurate_timestamps is misleading.
  Should be disable_timestamps

 Not if the default is 1, as in my patch.

Yes I saw this. I should write more words next time :)

Full explanation:
--

If your tunable is named accurate_timestamps then a 0 value would mean :

Use a low precision timestamp (based on xtime for example) instead of a full 
resolution...

This is not what your patch does (although it could; but beware that 
net-2.6.22 now includes ktime_t timestamping).

So :
--

It would be better to name the tunable disable_timestamps, default 0 of 
course.
It would better describe what your patch is actually doing: even if a tcpdump 
is running (and so asking for timestamps), it won't have them because the 
sysctl disabled them.

Thank you
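
A small userland sketch may make the naming argument concrete. It is not kernel code: `stamp_with_accurate()` and `stamp_with_disable()` are hypothetical helpers that only model the condition in front of `__net_timestamp()`, with `netstamp_needed` passed in as a plain counter. Both gates behave identically; only the polarity and the default of the tunable differ.

```c
#include <stdbool.h>

/* Gate as written in the posted patch: the tunable defaults to 1. */
static bool stamp_with_accurate(int accurate_timestamps, int netstamp_needed)
{
    return accurate_timestamps && netstamp_needed > 0;
}

/* Same gate under the suggested name: the tunable defaults to 0. */
static bool stamp_with_disable(int disable_timestamps, int netstamp_needed)
{
    return !disable_timestamps && netstamp_needed > 0;
}
```

In both variants a nonzero `netstamp_needed` (for example from a running tcpdump) is overridden once the administrator flips the tunable, which is the behaviour the name disable_timestamps describes.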



Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Vladimir B. Savkin
On Tue, Mar 06, 2007 at 04:16:24PM +0100, Eric Dumazet wrote:
 
 It would be better to name the tunable disable_timestamps, default 0 of 
 course 

I agree.
If networking maintainers are interested, I surely can prepare a patch.

But IMO some way to force TSC usage on x86_64 would be even better.

 It would better describe what your patch is actually doing : Even if a 
 tcpdump is running (so asking for timestamps), it won't have them because 
 the sysctl disabled them.

Well, tcpdump will have timestamps, but taken at the wrong moment.
But some other applications (those that use ip_queue, ulog, etc.) will not,
as I understand it.

 
 Thank you
 
~
:wq
With best regards, 
   Vladimir Savkin. 



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-22 Thread Alexey Kuznetsov
Hello!

 I can't even find a reference to SIOCGSTAMP in the
 dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.
 
 But I will note that tpacket_rcv() expects to always get
 valid timestamps in the SKB, it does a:

It is equally unlikely that it uses an mmapped packet socket (tpacket_rcv).

I even installed that dhcp on x86_64, and I do not see anything:
netstamp_needed remains zero when running both server and client.
It looks like dhcp was accused without guilt. :-)

It seems Andi saw some leakage in netstamp_needed (a value of 7),
but I do not see this either.


In any case, the issue is obviously more serious than just the behaviour
of some applications. On my notebook one gettimeofday() takes:

0.2 us with tsc
4.6 us with pm  (AND THIS CRAP IS DEFAULT!!)
9.4 us with pit (kinda expected)

It is ridiculous. Obviously, nobody (not only tcpdump, but everything
else) needs such a clock. Taking a timestamp takes time comparable
with processing the whole tcp frame. :-) I have no idea what can be
done without breaking everything, but it is not something to ignore.
This timer must be shot. :-)

Alexey
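
Per-call figures like the ones in this message can be approximated from userland with a loop around gettimeofday(). The thread does not say how the numbers were measured, so the following is only a sketch under that assumption; `avg_gettimeofday_us()` is a hypothetical helper name. On current kernels gettimeofday() is usually served without a real system call, so results will not be comparable to the 2006 numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Average cost of one gettimeofday() call in microseconds, measured over
 * `iterations` back-to-back calls. Returns -1.0 on bad input or clock error.
 */
static double avg_gettimeofday_us(long iterations)
{
    struct timespec start, end;
    struct timeval tv;

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (long i = 0; i < iterations; i++)
        gettimeofday(&tv, NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    double elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6 +
                        (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_us / (double)iterations;
}
```

Calling `avg_gettimeofday_us(1000000)` once per clock source (tsc, pm, pit) would give a rough comparison of the kind quoted here; the loop overhead is included in the result, so very small values are only an upper bound.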


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-22 Thread Andi Kleen
On Friday 22 September 2006 17:35, Alexey Kuznetsov wrote:
 Hello!
 
  I can't even find a reference to SIOCGSTAMP in the
  dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.
  
  But I will note that tpacket_rcv() expects to always get
  valid timestamps in the SKB, it does a:
 
 It is equally unlikely it uses mmapped packet socket (tpacket_rcv).
 
 I even installed that dhcp on x86_64. And I do not see anything,
 netstamp_needed remains zero when running both server and client.
 It looks like dhcp was defamed without a guilt. :-)

 Seems, Andi saw some leakage in netstamp_needed (value of 7),
 but I do not see this too.

That came from named. It opens lots of sockets with SIOCGSTAMP.
No idea what it needs that many for.
 
I suspect it was either dhcpd (server) or the ppp user-space daemon
the original reporter was running.

Maybe it would be a good idea to add a printk by default?

 In any case, the issue is obviously more serious than just behaviour
 of some applications. On my notebook one gettimeofday() takes:
 
   0.2 us with tsc
   4.6 us with pm  (AND THIS CRAP IS DEFAULT!!)

This is actually quite fast. I've seen much worse ratios.

Also on some i386 kernels the pmtimer reads the register three 
times to work around some buggy implementation that doesn't latch the counter
properly.

   9.4 us with pit (kinda expected)
 
 It is ridiculous. Obviously, nobody (not only tcpdump, but everything
 else) does not need such clock. Taking timestamp takes time comparable
 with processing the whole tcp frame. :-) I have no idea what is possible
 to do without breaking everything, but it is not something to ignore.
 This timer must be shot. :-)

If it's a reasonably new notebook it might actually be possible to change.
The default choices are quite conservative there because in the past
there were lots of problems with notebooks changing frequency behind
the kernel's back etc. and screwing up the TSC. But that shouldn't happen anymore.

If you had a 64bit laptop the kernel would likely make the right choice :)

Notebooks are easy because they are only single socket, so the only thing
needed is to keep track of the frequency (or not, if you have a Core+).

-Andi



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-22 Thread Rick Jones

 That came from named. It opens lots of sockets with SIOCGSTAMP.
 No idea what it needs that many for.


IIRC ISC BIND named opens a socket for each IP it finds on the system. 
Presumably in this way it knows implicitly the destination IP without 
using platform-specific recvfrom/whatever extensions and gets some 
additional parallelism in the stack on SMP systems.


Why it needs/wants the timestamps I've no idea, I don't think it gets 
them that way on all platforms.  I suppose the next time I do some named 
benchmarking I can try to take a closer look in the source.


rick jones
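
For readers unfamiliar with the interface under discussion, here is a minimal Linux sketch of a program asking for a socket timestamp with the SIOCGSTAMP ioctl. It is not taken from BIND, and the thread never establishes why named wants timestamps; `loopback_rx_stamp()` is a hypothetical helper that sends one UDP datagram to itself over loopback and then queries the stamp. As far as I understand, the first such ioctl on a socket is also what switches packet timestamping on, which is how a daemon can raise netstamp_needed.

```c
#include <arpa/inet.h>
#include <linux/sockios.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Send one UDP datagram to ourselves, receive it, and query the socket
 * timestamp with SIOCGSTAMP. Returns 0 and fills *stamp on success,
 * -1 on any failure.
 */
static int loopback_rx_stamp(struct timeval *stamp)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct timeval timeout = { .tv_sec = 1, .tv_usec = 0 };
    char buf[16];
    int rc = -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    /* Bound the wait so a lost datagram cannot block forever. */
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0 &&
        sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) == 4 &&
        recv(fd, buf, sizeof(buf), 0) == 4 &&
        ioctl(fd, SIOCGSTAMP, stamp) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

A daemon that opens one such socket per local address, as described above for named, would repeat this per socket.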


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread Andi Kleen
On Monday 18 September 2006 23:22, David Miller wrote:

 Ok, ok, but don't we have queueing disciplines that need the timestamp
 even on ingress?

I grepped and I can't find any. The only non-SIOCGSTAMP users of the
time stamp seem to be sunrpc and conntrack, and I bet both can be converted
over to jiffies without trouble.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread David Miller
From: Vladimir B. Savkin [EMAIL PROTECTED]
Date: Tue, 19 Sep 2006 02:03:31 +0400

 On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote:
  * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
 
 I had it at the very top line.

That is just ridiculous.

You can fix the networking by making it timestamp less, but what
about things like normal X11 clients that call gettimeofday()
at very high rates?


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread David Miller
From: David Lang [EMAIL PROTECTED]
Date: Mon, 18 Sep 2006 14:57:04 -0700 (PDT)

 yes tcpdump may be wrong in requesting timestamps (in most cases it
 probably is, but in some cases it's doing exactly what the sysadmin
 wants it to do), but I don't think that many sysadmins would expect
 this much of a performance hit.  there should be some way to tell
 the system to ignore requests for timestamps so that a badly behaved
 program cannot cripple the system this way (and preferably something
 that doesn't require a full SELinux/capabilities implementation)

tcpdump is not wrong in requesting timestamps, and there are
many legitimate userland programs that call gettimeofday()
for internal timestamping _A LOT_.  Such as X11 clients.

The real fact of the matter is that these x86_64 systems are using the
slowest possible time-of-day implementation, simply because it's too
hard currently to properly probe the most efficient mechanism which
is present in the system.

If getting the time of day is at the top of the profiles in the packet
input path, and we're only capturing a timestamp once per packet,
something is _VERY VERY_ wrong with the timestamp implementation
because think of all of the other seriously expensive things that
happen on a per-packet basis which should absolutely dwarf
timestamping in terms of cost.



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread David Miller
From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Tue, 19 Sep 2006 02:00:38 +0400

 * I do not understand what the hell dhcp needs timestamps for.

I can't even find a reference to SIOCGSTAMP in the
dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.

But I will note that tpacket_rcv() expects to always get
valid timestamps in the SKB, it does a:

    if (skb->tstamp.off_sec == 0) {
        __net_timestamp(skb);
        sock_enable_timestamp(sk);
    }

so that it can fill in the h->tp_sec and h->tp_usec
fields.


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread Stephen Hemminger
The sky2 hardware (and others) can timestamp in hardware, but trying
to keep device ticks and system clock in sync looked too nasty
to contemplate actually using it.


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread Thomas Graf
* David Miller [EMAIL PROTECTED] 2006-09-18 14:22
 From: Alexey Kuznetsov [EMAIL PROTECTED]
 Date: Tue, 19 Sep 2006 01:03:21 +0400
 
  1. It even does not disable possibility to record timestamp inside
 driver, which Alan was afraid of. The sequence is:
  
      if (!skb->tstamp.off_sec)
          net_timestamp(skb);
  
  2. Maybe, netif_rx() should continue to get timestamp in netif_rx().
  
  3. NAPI already introduced almost the same inaccuracy. And it is really
 silly to waste time getting timestamp in netif_receive_skb() a few
 moments before the packet is delivered to a socket.
  
  4. ...but clock source, which takes one of top lines in profiles
 must be repaired yet. :-)
 
 Ok, ok, but don't we have queueing disciplines that need the timestamp
 even on ingress?

Queueing disciplines generally only care about the time delta
between two packets; using the receive stamp would lead to
wrong results as soon as a packet is queued more than once.

However, since we recently introduced ingress queueing we
must update the stamp to make up for the delay caused by the
queue. Updating the stamp at socket enqueue time would solve
this automatically.

It seems only natural to me that the real problem is the slow
clock source which needs to be resolved regardless of the
outcome of this discussion. I believe that updating the stamp
at socket enqueue time is the right thing to do but it shouldn't
be considered as a solution to the performance problem.


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread Andi Kleen

 It seems only natural to me that the real problem is the slow
 clock source which needs to be resolved regardless of the
 outcome of this discussion. I believe that updating the stamp
 at socket enqueue time is the right thing to do but it shouldn't
 be considered as a solution to the performance problem.

While I agree it would be nice to fix that particular issue 
(it's unfortunately hard), slow clock sources in general won't go
away. They are also found on lots of other platforms.

And even if you have a fast clock source, not using it when you
don't need to is better. For example, some x86s can be quite
slow even at reading TSCs. It's much better than pmtmr, but
it is still an expensive operation that is best avoided.

-Andi



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin
On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote:
  I just found out that TSC clocksource is not implemented on x86-64.
  Kernel version 2.6.18-rc7, is it true?
 
 The x86-64 timer subsystems currently doesn't have clocksources
 at all, but it supports TSC and some other timers.

Hm. On my box, TSC did not work until I hacked arch/i386/kernel/tsc.c
into it. 
Neither clock=tsc nor clocksource=tsc had any effect.

  I've also had experience of unsynchronized TSC on dual-core Athlon,
  but it was cured by idle=poll.
 
 You can use that, but it will make your system run quite hot 
 and cost you a lot of powe^wmoney.

Here in Russia electric power is cheap compared with a hardware upgrade.

  It seems that dhcpd3 makes the box timestamping incoming packets,
  killing the performance. I think that combining router and DHCP server
  on a same box is a legitimate situation, isn't it?
 
 Yes.  Good point. DHCP is broken and needs to be fixed. Can you
 send a bug report to the DHCP maintainers? 
 
 iirc the problem used to be that RAW sockets didn't do something
 they need them to do. Maybe we can fix that now.

Will try in a few days.

Oh, and pppoe-server uses some kind of packet socket too, doesn't it?

 
 If that's not possible we can probably add a ioctl or similar
 to disable time stamping for packet sockets (DHCP shouldn't really
 need a fine grained time stamp). dhcpcd would need to use that then.

I would like some sysctl very much, too. Let tcpdump show imprecise
timestamps when forwarding performance is more important.
After all, Ciscos don't have any tcpdump analog at all, and they are 
very popular :)

 
 Keep me updated what they say.
 
 -Andi
 
~
:wq
With best regards, 
   Vladimir Savkin. 



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen
Vladimir B. Savkin [EMAIL PROTECTED] writes:

[you seem to send your emails in a strange way that doesn't keep me in cc.
Please stop doing that.]

 On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote:
    The x86-64 timer subsystems currently doesn't have clocksources
    at all, but it supports TSC and some other timers.
   
   until I hacked arch/i386/kernel/tsc.c
  
  Then you don't use x86-64. 
  
 Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64
 by hacking some Makefiles and headers. 

The codebase for timing (and lots of other things) is quite different
between 32bit and 64bit. You're really surprised it doesn't work if you do such 
things?

 But the question is, why stock 2.6.18-rc7 could not use TSC on its own?

x86-64 doesn't use the TSC when it deems it to not be reliable, which
is the case on your system.
 
     I've also had experience of unsynchronized TSC on dual-core Athlon,
     but it was cured by idle=poll.
    
    You can use that, but it will make your system run quite hot 
    and cost you a lot of powe^wmoney.
   
   Here in Russia electric power is cheap compared with hardware upgrade.
  
  It's not just electrical power - the hardware is more stressed and will
  likely fail earlier too.  As a rule of thumb the hotter your hardware runs
  the earlier it will fail.
 
 What hardware exactly? Doesn't it affect only the CPU? And they are not
 known to fail before any other components.

All hardware. It's basic physics.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Miller
From: Andi Kleen [EMAIL PROTECTED]
Date: 18 Sep 2006 11:58:21 +0200

 For netdev: I'm more and more thinking we should just avoid the
 problem completely and switch to true end2end timestamps. This
 means don't time stamp when a packet is received, but only when it
 is delivered to a socket. The timestamp at receiving is a lie
 anyways because the network hardware can add an arbitrarily long delay
 before the driver interrupt handler runs. Then the problem above
 would completely disappear.

I don't think this is wise.

People who run tcpdump want wire timestamps as close as possible.
Yes, things get delayed with the IRQ path, DMA delays, IRQ
mitigation and whatnot, but it's an order of magnitude worse if
you delay to user read() since that introduces also the delay of
the packet copies to userspace which are significantly larger than
these hardware level delays.  If tcpdump gets swapped out, the
timestamp delay can be on the order of several seconds making it
totally useless.

Andi, you will need to find another solution to this problem :-)



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

 
 People who run tcpdump want wire timestamps as close as possible.
 Yes, things get delayed with the IRQ path, DMA delays, IRQ
 mitigation and whatnot, but it's an order of magnitude worse if
 you delay to user read() since that introduces also the delay of
 the packet copies to userspace which are significantly larger than
 these hardware level delays.  If tcpdump gets swapped out, the
 timestamp delay can be on the order of several seconds making it
 totally useless.

My proposal wasn't to delay until the user read(), just to take the time stamp
in socket context. That is, as soon as packet or RAW/UDP has looked up the
socket and can check a per-socket flag, it takes the time stamp.

The only delay this would add would be the queueing time from the NIC
to the softirq. Do you really think that is that bad?

-Andi
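
The proposal in this message can be modelled in a few lines of userland C. This is only a sketch of the idea, not kernel code: `struct pkt`, `struct sock_model`, `read_clock_ns()` and `deliver()` are hypothetical stand-ins, and the fake clock simply counts how often it is read so the saving is visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for sk_buff and sock; not the kernel structures. */
struct pkt { uint64_t stamp_ns; };            /* 0 means "not stamped yet" */
struct sock_model { bool wants_stamp; };      /* the per-socket flag */

static unsigned clock_reads;                  /* how often the clock was read */

/* Stand-in for an expensive clock read such as the ACPI PM timer. */
static uint64_t read_clock_ns(void)
{
    clock_reads++;
    return 1000u * clock_reads;
}

/* Stamp at delivery time, and only if this socket asked for timestamps. */
static void deliver(struct sock_model *sk, struct pkt *p)
{
    if (sk->wants_stamp && p->stamp_ns == 0)
        p->stamp_ns = read_clock_ns();
    /* ... queueing the packet on the socket would happen here ... */
}
```

Packets that are only forwarded, or delivered to sockets without the flag, never touch the clock in this model; a packet that a driver has already stamped keeps its stamp.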


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alan Cox
On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
 The only delay this would add would be the queueing time from the NIC
 to the softirq. Do you really think that is that bad?

If you are trying to do things like network record/playback then you
want the minimal delay. There's a reason the original timestamp code
supported the hardware setting the timestamp itself - we actually had a
separate set of logic on a board that was doing the timestamping by
watching the IRQ line of the NIC chip.

Alan



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen
On Monday 18 September 2006 17:19, Alan Cox wrote:
 On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
  The only delay this would add would be the queueing time from the NIC
  to the softirq. Do you really think that is that bad?
 
 If you are trying to do things like network record/playback then you
 want the minimal delay. 

But it's not minimal. Maybe it was long ago when the code was designed
on a 3c509 but not with modern hardware: Think interrupt mitigation and NAPI. 

And with NAPI we tend to process the packets directly after they
are fetched out of the RX queue, so there is practically no delay
between driver seeing the packet and softirq seeing it.  All the queuing
is done either at hardware level or later at socket level.

 There's a reason the original timestamp code 
 supported the hardware setting the timestamp itself - we actually had a
 separate set of logic on a board that was doing the timestamping by
 watching the IRQ line of the NIC chip.

That would be fine too (because it will be likely fast), but unfortunately
I don't know of any driver that does that.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov
Hello!

 For netdev: I'm more and more thinking we should just avoid the problem
 completely and switch to true end2end timestamps. This means don't
 time stamp when a packet is received, but only when it is delivered
 to a socket.

This will work.

From the viewpoint of existing uses of the timestamp by packet sockets,
this time is no worse. The only danger is violation of causality
(when a forwarded packet or reply packet gets a timestamp earlier than
the original packet). This pathology was the main reason why the timestamp
is recorded early, before the packet is demultiplexed in netif_receive_skb().
But it is not a practical problem: delivery to packet/raw sockets
happens to be placed _before_ delivery to the real protocol handlers.


 handler runs. Then the problem above would completely disappear. 

Well, not completely. A too-slow clock source remains a too-slow clock source.
If it is so slow that it results in performance degradation, it just
should not be used at all; even such a pariah as tcpdump wants to be fast.

Actually, I have a question. Why is the subject
"Network performance degradation from 2.6.11.12 to 2.6.16.20"?
I do not see the beginning of the thread and cannot guess
why the clock source degraded. :-)

Alexey


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen
On Monday 18 September 2006 17:38, Alexey Kuznetsov wrote:
 Hello!
 
  For netdev: I'm more and more thinking we should just avoid the problem
  completely and switch to true end2end timestamps. This means don't
  time stamp when a packet is received, but only when it is delivered
  to a socket.
 
 This will work.
 
 From viewpoint of existing uses of timestamp by packet socket
 this time is not worse. The only danger is violation of causality
 (when forwarded packet or reply packet gets timestamp earlier than
 original packet). 

Hmm, not sure how that could happen. Also is it a real problem
even if it could?

  handler runs. Then the problem above would completely disappear. 
 
 Well, not completely. Too slow clock source remains too slow clock source.
 If it is so slow, that it results in performance degradation, it just
 should not be used at all, even such pariah as tcpdump wants to be fast.
 
 Actually, I have a question. Why the subject is
 Network performance degradation from 2.6.11.12 to 2.6.16.20?
 I do not see beginning of the thread and cannot guess
 why clock source degraded. :-)

It's a long and sad story.

Old kernels didn't disable the TSC on those boxes (multi core K8) and assumed
they were synchronized for timing purposes. 

This initially mostly worked if you didn't use cpufreq, 
but over a longer uptime the TSCs would drift against each other and timing
would jump more and more between CPUs.

On older versions of K8 this drift happened much more slowly (more
aggressive power saving in HLT in newer steppings made it worse; that is why
idle=poll helps) and could often be ignored. But technically it was still a 
bug there because it could break timing after long uptimes.

New multi socket K8 boxes are generally 
totally unusable with TSC because they use cpufreq and the TSCs can run
at completely different frequencies, which obviously doesn't give very 
good timing information if you assume the TSC is globally synchronized.

That is why later kernels default to TSC off.  The original plan 
was to use HPET then, which is slower than TSC, but still not that bad.
But while most modern systems have a HPET timer somewhere in the chipset, 
nearly all BIOS vendors forgot to describe it in the BIOS because Windows
didn't use it, and Linux can't find it because of that. 

Then it has to use the ACPI pmtmr which is really really slow.
The overhead of that thing is so large that you can clearly see it in
the network benchmark.

The real fix long term is to change the timer subsystem to keep all TSC
state per CPU, then it'll work on the K8s too. Unfortunately it's a moderately 
hard problem to make the result still fully monotonic. But people are working 
on it.

-Andi
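
The monotonicity problem mentioned at the end of this message can be illustrated with a toy model. This is not the kernel's timekeeping code and not the fix the thread alludes to; it only shows why unsynchronized per-CPU counters can make time appear to run backwards, and uses a simple clamp (an assumption chosen for brevity) to hide that. `struct cpu_clock`, `cpu_clocks` and `monotonic_read()` are hypothetical names.

```c
#include <stdint.h>

#define NCPUS 2

/* Hypothetical per-CPU state: each CPU's counter has its own offset
 * to a common time base. */
struct cpu_clock { int64_t offset; };

static struct cpu_clock cpu_clocks[NCPUS];
static uint64_t last_returned;   /* shared state; real code would need locking */

/* Convert a raw per-CPU counter value to global time, never going backwards. */
static uint64_t monotonic_read(int cpu, uint64_t raw_counter)
{
    uint64_t now = raw_counter + (uint64_t)cpu_clocks[cpu].offset;

    if (now < last_returned)
        now = last_returned;     /* clamp instead of jumping back in time */
    last_returned = now;
    return now;
}
```

With CPU 1 lagging CPU 0 by 50 ticks, a reading of 1000 on CPU 0 followed by a raw 1040 on CPU 1 would compute to 990; the clamp returns 1000 instead. The shared `last_returned` is exactly the kind of cross-CPU state that makes a fast, fully monotonic implementation hard.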


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov
Hello!

 Hmm, not sure how that could happen. Also is it a real problem
 even if it could?

As I said, the problem happens to be theoretical.

This would happen e.g. if the packet socket handler was installed
after the IP handler. Then tcpdump would get the packet after it is processed
(acked/replied/forwarded). This would be disastrous; the results
would be unparsable.

I recall the issue was discussed, and at that time it looked more
reasonable to solve problems of this kind by taking the timestamp once,
before the packet is seen by the rest of the stack. Who could expect that
the PIT nightmare was going to return? :-)


 Then it has to use the ACPI pmtmr which is really really slow.
 The overhead of that thing is so large that you can clearly see it in
 the network benchmark.

I see. Thank you.

Alexey


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen
On Monday 18 September 2006 18:28, Alexey Kuznetsov wrote:
 Hello!
 
  Hmm, not sure how that could happen. Also is it a real problem
  even if it could?
 
 As I said, the problem is _occasionally_ theoretical.
 
 This would happen, e.g., if the packet socket handler was installed
 after the IP handler. Then tcpdump would get the packet after it is processed
 (acked/replied/forwarded). This would be disastrous, the results
 are unparsable.

But that never happens right? 

And do you have some other preferred way to solve this? Even if the timer
was fast it would still be good to avoid it in the fast path when DHCPD
is running.

I suppose in the worst case a sysctl like Vladimir asked for could be added,
but it would seem somewhat lame.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov
Hello!

 But that never happens right? 

Right.

Well, not right. It happens. Simply because you get packet
with newer timestamp after previous handler saw this packet
and did some actions. I just do not see any bad consequences.


 And do you have some other preferred way to solve this? Even if the timer
 was fast it would still be good to avoid it in the fast path when DHCPD
 is running.

No. The way, which you suggested, seems to be the best.


1. It even does not disable possibility to record timestamp inside
   driver, which Alan was afraid of. The sequence is:

	if (!skb->tstamp.off_sec)
		net_timestamp(skb);

2. Maybe, netif_rx() should continue to get timestamp in netif_rx().

3. NAPI already introduced almost the same inaccuracy. And it is really
   silly to waste time getting timestamp in netif_receive_skb() a few
   moments before the packet is delivered to a socket.

4. ...but clock source, which takes one of top lines in profiles
   must be repaired yet. :-)
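
The sequence in point 1 can be sketched in userspace as below. This is only a model under stated assumptions: `struct pkt` and `stamp_if_needed()` are hypothetical stand-ins for the sk_buff timestamp field and net_timestamp(), and gettimeofday() stands in for whatever clock source the kernel uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for the timestamp part of an sk_buff. */
struct pkt {
    struct timeval tstamp;   /* tv_sec == 0 means "not stamped yet" */
};

/*
 * Stamp the packet only if nothing earlier (for example a driver)
 * has already stamped it. Returns true when this call read the clock.
 */
static bool stamp_if_needed(struct pkt *p)
{
    if (p->tstamp.tv_sec != 0)
        return false;            /* keep the earlier stamp untouched */
    gettimeofday(&p->tstamp, NULL);
    return true;
}
```

A packet that arrives with a driver-provided stamp passes through unchanged, so the clock is read at most once per packet.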

Alexey


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin
On Mon, Sep 18, 2006 at 01:27:57PM +0200, Andi Kleen wrote:
 The codebase for timing (and lots of other things) is quite different
 between 32bit and 64bit. You're really surprised it doesn't work if you do 
 such things?
 
It works, and after your remark above, I'm surprised.
Dunno about slow TSC drift though, not enough time has passed to
detect it, and I hope we will have this problem solved in a better way
before the drift becomes visible :)

  But the question is, why stock 2.6.18-rc7 could not use TSC on its own?
 
 x86-64 doesn't use the TSC when it deems it to not be reliable, which
 is the case on your system.
  
Could it at least print something so that I know that using TSC  was
considered, but rejected?

  What hardware exactly? Doesn't it affect only the CPU? And they are not
  known to fail before any other components.
 
 All hardware. It's basic physics.

Hm, what other hardware is affected by idle=poll? Does this option wear
out HDDs?
~
:wq
With best regards, 
   Vladimir Savkin. 



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin
On Mon, Sep 18, 2006 at 06:50:22PM +0200, Andi Kleen wrote:
 
 I suppose in the worst case a sysctl like Vladimir asked for could be added,
 but it would seem somewhat lame.
 
Please think about it this way:
suppose you have a heavily loaded router and some network problem is to
be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
switching to timestamp-it-all mode), drops OSPF adjacencies etc. Users
are angry, and you can't diagnose anything. But with imprecise
timestamps and maybe even with reordered packets you still have some
traces to analyze.
So, in this particular corner case it's not that lame.

Or maybe patching tcpdump will do better?
~
:wq
With best regards, 
   Vladimir Savkin. 



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Miller
From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Tue, 19 Sep 2006 01:03:21 +0400

 1. It even does not disable possibility to record timestamp inside
driver, which Alan was afraid of. The sequence is:
 
 	if (!skb->tstamp.off_sec)
 		net_timestamp(skb);
 
 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().
 
 3. NAPI already introduced almost the same inaccuracy. And it is really
silly to waste time getting timestamp in netif_receive_skb() a few
moments before the packet is delivered to a socket.
 
 4. ...but clock source, which takes one of top lines in profiles
must be repaired yet. :-)

Ok, ok, but don't we have queueing disciplines that need the timestamp
even on ingress?


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov
Hello!

 Please think about it this way:
 suppose you have a heavily loaded router and some network problem is to
 be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
 switching to timestamp-it-all mode

I am sorry. I cannot think that way. :-)

Instead of attempts to scare, better resend original report,
where you said how much performance degraded, I cannot find it.

* I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
* I do not understand what the hell dhcp needs timestamps for.
* I do not listen to any suggestions to screw up tcpdump with a sysctl.
  Kernel already implements a much better thing than a sysctl.
  Do not want timestamps? Fix tcpdump, add an option, submit the
  patch to tcpdump maintainers. Not a big deal.

Alexey


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin
On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote:
 Hello!
 
  Please think about it this way:
  suppose you have a heavily loaded router and some network problem is to
  be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
  switching to timestamp-it-all mode
 
 I am sorry. I cannot think that way. :-)
 
 Instead of attempts to scare, better resend original report,
 where you said how much performance degraded, I cannot find it.
 
 * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.

I had it at the very top line.

 * I do not understand what the hell dhcp needs timestamps for.
 * I do not listen to any suggestions to screw up tcpdump with a sysctl.
   Kernel already implements a much better thing than a sysctl.
   Do not want timestamps? Fix tcpdump, add an option, submit the
   patch to tcpdump maintainers. Not a big deal.

OK, point taken.
It's better to patch tcpdump.

 
 Alexey
 
~
:wq
With best regards, 
   Vladimir Savkin. 



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Lang

On Tue, 19 Sep 2006, Alexey Kuznetsov wrote:


Hello!


Please think about it this way:
suppose you have a heavily loaded router and some network problem is to
be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
switching to timestamp-it-all mode


I am sorry. I cannot think that way. :-)

Instead of attempts to scare, better resend original report,
where you said how much performance degraded, I cannot find it.

* I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
* I do not understand what the hell dhcp needs timestamps for.
* I do not listen to any suggestions to screw up tcpdump with a sysctl.
 Kernel already implements a much better thing than a sysctl.
 Do not want timestamps? Fix tcpdump, add an option, submit the
 patch to tcpdump maintainers. Not a big deal.


If firing up one program (however minor) can cause network performance to drop
by 50% (based on the numbers reported earlier in this thread), that is a
significant problem for sysadmins.


Yes, tcpdump may be wrong in requesting timestamps (in most cases it probably is,
but in some cases it's doing exactly what the sysadmin wants it to do), but I
don't think that many sysadmins would expect this much of a performance hit.
There should be some way to tell the system to ignore requests for timestamps so
that a badly behaved program cannot cripple the system this way (and preferably
something that doesn't require a full SELinux/capabilities implementation).


David Lang


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen
On Monday 18 September 2006 23:03, Alexey Kuznetsov wrote:

 
  And do you have some other preferred way to solve this? Even if the timer
  was fast it would still be good to avoid it in the fast path when DHCPD
  is running.
 
 No. The way, which you suggested, seems to be the best.

Ok. I also checked my desktop and for some reason I got a timestamp counter
of 7 (and it doesn't even run client dhcp). Haven't investigated why yet, and
I am still hoping it's not a leak.

But that hints that trying to fix all of user space to not use the ioctl 
would have been probably too much work.


 1. It even does not disable possibility to record timestamp inside
driver, which Alan was afraid of. The sequence is:
 
 	if (!skb->tstamp.off_sec)
 		net_timestamp(skb);
 
 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().

Hmm, there are still quite a lot of users and even with netif_rx() you
can have long delays from interrupt mitigation etc.

% grep -rw netif_rx drivers/net/*  | wc -l
253

 3. NAPI already introduced almost the same inaccuracy. And it is really
silly to waste time getting timestamp in netif_receive_skb() a few
moments before the packet is delivered to a socket.
 
 4. ...but clock source, which takes one of top lines in profiles
must be repaired yet. :-)

It's being worked on, but it'll take some time. But even when TSC 
can be used it's still a good idea to not call gtod unnecessarily 
because it can be still relatively slow (e.g. on P4 RDTSC takes
hundreds of cycles because it synchronizes the CPU). Also on some 
other non x86 platforms it is also relatively slow because they have 
to reach out to the chipset and every time you do that things get slow.
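
A rough way to see how expensive a clock read is on a given machine is to time many calls in a loop. The sketch below is an assumption-laden userspace approximation: `gtod_cost_ns()` is a hypothetical helper, and on systems where gettimeofday() is served without entering the kernel the number will not reflect the in-kernel cost discussed above.

```c
#include <stddef.h>
#include <sys/time.h>

/* Average cost of one gettimeofday() call, in nanoseconds, over n calls. */
static double gtod_cost_ns(long n)
{
    struct timeval start, end, scratch;
    long i;

    gettimeofday(&start, NULL);
    for (i = 0; i < n; i++)
        gettimeofday(&scratch, NULL);
    gettimeofday(&end, NULL);

    /* Elapsed wall time in microseconds, then scaled to ns per call. */
    double elapsed_us = (end.tv_sec - start.tv_sec) * 1e6
                      + (end.tv_usec - start.tv_usec);
    return elapsed_us * 1000.0 / (double)n;
}
```

Calling it with something like a million iterations and comparing the result across clock sources gives a first-order idea of the per-packet overhead when every packet is stamped.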

-Andi



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-07-10 Thread Jesper Dangaard Brouer



On Tue, 4 Jul 2006, Andi Kleen wrote:


On Tuesday 04 July 2006 13:41, Jesper Dangaard Brouer wrote:


Actually the change happens between kernel version 2.6.15 and 2.6.16.


The timestamp optimizations are older. Don't remember the exact release,
but earlier 2.6.


What I'm saying is that, with the same Config file, some Kconfig option 
changed between 2.6.15 and 2.6.16, that made my system use pmtmr for high-res 
timesource instead of TSC.




And
is a result of Andi's changes to arch/x86_64/Kconfig and
drivers/acpi/Kconfig, which allows/activates the use of the timer on
x86_64.


Not sure what you mean here?


I think, that the changes you made to the files arch/x86_64/Kconfig and 
drivers/acpi/Kconfig, caused this change...


commit: e78256b8f3e2850ad55c2d69e1429e6c2607afd3

http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commitdiff;h=e78256b8f3e2850ad55c2d69e1429e6c2607afd3

and maybe
commit: 2eb1bdbad89b19c99f8ac1de1492cdabbff6b3d3

http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commitdiff;h=2eb1bdbad89b19c99f8ac1de1492cdabbff6b3d3


Hilsen
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-07-04 Thread Jesper Dangaard Brouer


On Mon, 26 Jun 2006, Andi Kleen wrote:


I encountered the same problem on a dual core opteron equipped with a
broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
as the clock source, but the time jumped back and forth, so I changed
it to 'notsc', then the performance dropped dramatically to around the
same value as above with one CPU saturated. I suspect that the clock
precision is needed by the tg3 driver to correctly decide to switch to
polling mode, but unfortunately, the performance drop rendered the
solution so much unusable that I finally decided to use it only in
uniprocessor with TSC enabled.


2.6 is more clever at this than 2.4. In particular it does the timestamp
for each packet only when actually needed, which is relatively rare.

Old experiences do not always apply to new kernels.


Note that I experienced this problem on 2.6.

Actually the change happens between kernel version 2.6.15 and 2.6.16. And 
is a result of Andi's changes to arch/x86_64/Kconfig and 
drivers/acpi/Kconfig, which allows/activates the use of the timer on 
x86_64.


Cheers,
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-07-04 Thread Andi Kleen
On Tuesday 04 July 2006 13:41, Jesper Dangaard Brouer wrote:
 
 On Mon, 26 Jun 2006, Andi Kleen wrote:
 
  I encountered the same problem on a dual core opteron equipped with a
  broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
  as the clock source, but the time jumped back and forth, so I changed
  it to 'notsc', then the performance dropped dramatically to around the
  same value as above with one CPU saturated. I suspect that the clock
  precision is needed by the tg3 driver to correctly decide to switch to
  polling mode, but unfortunately, the performance drop rendered the
  solution so much unusable that I finally decided to use it only in
  uniprocessor with TSC enabled.
 
  2.6 is more clever at this than 2.4. In particular it does the timestamp
  for each packet only when actually needed, which is relatively rare.
 
  Old experiences do not always apply to new kernels.
 
 Note that I experienced this problem on 2.6.
 
 Actually the change happens between kernel version 2.6.15 and 2.6.16.

The timestamp optimizations are older. Don't remember the exact release,
but earlier 2.6.

 And  
 is a result of Andi's changes to arch/x86_64/Kconfig and 
 drivers/acpi/Kconfig, which allows/activates the use of the timer on 
 x86_64.

Not sure what you mean here?

2.6.18 will likely be more aggressive at using the TSC on i386 on
Intel systems where possible, but x86-64 did this already for a long time. 
When x86-64 uses non TSC then it's because using the TSC is not safe.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Harry Edmon
I understand the saying beggars can't be choosers, but I have heard nothing on 
this issue since June 19th.  Does anyone have any ideas on what is going on?  Is 
there more information I can collect that would help diagnose this problem?  And 
again, thanks for any and all help!

--
 Dr. Harry Edmon                E-MAIL: [EMAIL PROTECTED]
 206-543-0547                           [EMAIL PROTECTED]
 Dept of Atmospheric Sciences   FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Willy Tarreau
Hi Andi,

On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote:
 
  If you use pmtmr try to reboot with kernel option clock=tsc.
 
 That's dangerous advice - when the system chooses not to use
 TSC it often has a reason.
 
  
  On my Opteron AMD system I normally can route 400 kpps, but with
  timesource pmtmr I could only route around 83 kpps.  (I found the timer
  to be the issue by using oprofile).
 
 Unless you're using packet sniffing or any other application
 that requests time stamps on a socket then the timer shouldn't 
 make much difference. Incoming packets are only time stamped
 when someone asks for the timestamps.

I encountered the same problem on a dual core opteron equipped with a
broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
as the clock source, but the time jumped back and forth, so I changed
it to 'notsc', then the performance dropped dramatically to around the
same value as above with one CPU saturated. I suspect that the clock
precision is needed by the tg3 driver to correctly decide to switch to
polling mode, but unfortunately, the performance drop rendered the
solution so much unusable that I finally decided to use it only in
uniprocessor with TSC enabled.

 -Andi

Regards,
Willy



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Bill Fink
On Sun, 25 Jun 2006, Harry Edmon wrote:

 I understand the saying beggars can't be choosers, but I have heard nothing
 on this issue since June 19th.  Does anyone have any ideas on what is going
 on?  Is there more information I can collect that would help diagnose this
 problem?  And again, thanks for any and all help!

Harry,

I'd suggest checking all the ethtool configuration settings
(ethtool -a, -c, -g, -k) and statistics (ethtool -S) for both
the working and problematic kernels, and then comparing them
to see if anything jumps out at you.  Also compare ifconfig
settings and dmesg output.  Check /proc/interrupts to see if
there is any difference with the interrupt routing.  Check
sysctl.conf and rc.local for any special system configuration
or device settings that might differ between the systems.

The one thing that has caused me a lot of network performance
issues on e1000 is having TSO enabled, so if that is enabled
(check with ethtool -k), then I'd try disabling it to see if
that helps.

-Hope this helps

-Bill


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Andi Kleen

 I encountered the same problem on a dual core opteron equipped with a
 broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
 as the clock source, but the time jumped back and forth, so I changed
 it to 'notsc', then the performance dropped dramatically to around the
 same value as above with one CPU saturated. I suspect that the clock
 precision is needed by the tg3 driver to correctly decide to switch to
 polling mode, but unfortunately, the performance drop rendered the
 solution so much unusable that I finally decided to use it only in
 uniprocessor with TSC enabled.

2.6 is more clever at this than 2.4. In particular it does the timestamp
for each packet only when actually needed, which is relatively rare.

Old experiences do not always apply to new kernels.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Harry Edmon

Stephen Hemminger wrote:


Does this fix it?
   # sysctl -w net.ipv4.tcp_abc=0


That did not help.  I have 1 minute outputs from tcpdump under both 2.6.11.12 
and 2.6.16.20.  You will see a large size difference between the files.  Since 
the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web instead 
of via attachments.   Look at:


http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

And again, thank to all of you for looking into this.

--
 Dr. Harry Edmon                E-MAIL: [EMAIL PROTECTED]
 206-543-0547                           [EMAIL PROTECTED]
 Dept of Atmospheric Sciences   FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Jesper Dangaard Brouer



Harry Edmon [EMAIL PROTECTED] wrote:

I have a system with a strange network performance degradation from 
2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6. 
The system has dual single-core Xeons with hyperthreading on.

cut

Hi Harry

Can you check which high-res timesource you are using?

In the kernel log look for:
 kernel: Using tsc for high-res timesource
 kernel: Using pmtmr for high-res timesource
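
If the log is long, the check can be automated. The sketch below is only an illustration: `parse_timesource()` is a hypothetical helper that assumes the message format is exactly the one shown in the two lines above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the clock name from a log line of the form
 *   "... Using <name> for high-res timesource"
 * Returns 1 and copies the name into out on a match, 0 otherwise.
 */
static int parse_timesource(const char *line, char *out, size_t outlen)
{
    const char *p = strstr(line, "Using ");
    char name[32];
    int consumed = 0;

    if (p == NULL || outlen == 0)
        return 0;
    /* %n is stored only if the whole pattern, including the tail, matched. */
    if (sscanf(p, "Using %31s for high-res timesource%n", name, &consumed) != 1
        || consumed == 0)
        return 0;
    snprintf(out, outlen, "%s", name);
    return 1;
}
```

Feeding it each line of the kernel log and printing the name on a match tells you whether the box ended up on tsc or pmtmr.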

I have experienced some network performance degradation when using the
pmtmr timesource, on an Opteron AMD system.  It seems that the default
timesource changed between 2.6.15 and 2.6.16.


If you use pmtmr try to reboot with kernel option clock=tsc.

On my Opteron AMD system I normally can route 400 kpps, but with
timesource pmtmr I could only route around 83 kpps.  (I found the timer
to be the issue by using oprofile).
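
To put those two rates in perspective, they can be converted into a time budget per packet. This is plain arithmetic on the figures above; `ns_per_packet()` is a hypothetical helper, and attributing the whole difference to the clock source is an assumption that the oprofile result suggests but does not prove.

```c
/* Time available per packet, in nanoseconds, at a given rate in kpps. */
static double ns_per_packet(double kpps)
{
    /* 1e9 ns per second divided by (kpps * 1e3) packets per second. */
    return 1e6 / kpps;
}
```

At 400 kpps the budget is 2500 ns per packet, at 83 kpps it is roughly 12050 ns, so each packet costs about 9.5 microseconds more in the slow case.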



Cheers,
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Harry Edmon



Jesper Dangaard Brouer wrote:



Harry Edmon [EMAIL PROTECTED] wrote:

I have a system with a strange network performance degradation from 
2.6.11.12 to most recent kernels including 2.6.16.20 and 
2.6.17-rc6. The system has dual single-core Xeons with
hyperthreading on.

cut

Hi Harry

Can you check which high-res timesource you are using?

In the kernel log look for:
 kernel: Using tsc for high-res timesource
 kernel: Using pmtmr for high-res timesource

I have experienced some network performance degradation when using the
pmtmr timesource, on an Opteron AMD system.  It seems that the
default timesource changed between 2.6.15 and 2.6.16.


If you use pmtmr try to reboot with kernel option clock=tsc.

On my Opteron AMD system I normally can route 400 kpps, but with
timesource pmtmr I could only route around 83 kpps.  (I found the
timer to be the issue by using oprofile).




We have CONFIG_HPET_TIMER=y, so we do not see these messages.




Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Chris Friesen

Andi Kleen wrote:


Incoming packets are only time stamped
when someone asks for the timestamps.


Doesn't that add scheduling latency to the timestamps?  Or is it a flag
that gets set to trigger timestamping at packet arrival?


Chris


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Jesper Dangaard Brouer


On Mon, 19 Jun 2006, Andi Kleen wrote:


If you use pmtmr try to reboot with kernel option clock=tsc.


That's dangerous advice - when the system chooses not to use
TSC it often has a reason.


Sorry, it was not general advice, just something to try out.  It really
solved my network performance issue...




On my Opteron AMD system I normally can route 400 kpps, but with
timesource pmtmr I could only route around 83 kpps.  (I found the timer
to be the issue by using oprofile).


Unless you're using packet sniffing or any other application
that requests time stamps on a socket then the timer shouldn't
make much difference. Incoming packets are only time stamped
when someone asks for the timestamps.


I do not know what caused the issue on my machine, but I can look into it 
if you like to know?


I do have VLAN interfaces on the machine and it seems that eth1 runs in 
PROMISC mode (eth1.xxx does not).  Could it be caused by that?


Hilsen
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Andi Kleen
On Monday 19 June 2006 19:34, Chris Friesen wrote:
 Andi Kleen wrote:
  Incoming packets are only time stamped
  when someone asks for the timestamps.

 Doesn't that add scheduling latency to the timestamps?  Or is it a flag
 that gets set to trigger timestamping at packet arrival?

It's a flag (or, more precisely, a global counter).
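
The idea of a global counter can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the kernel code: the names `stamp_users`, `stamp_enable()`, `stamp_disable()` and `stamp_wanted()` are hypothetical, and C11 atomics stand in for the kernel's atomic_t (the counter itself shows up as netstamp_needed in the patch quoted at the top of this thread).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the global "timestamps wanted" counter. */
static atomic_int stamp_users;

/* Called when a socket first asks for timestamps. */
static void stamp_enable(void)
{
    atomic_fetch_add(&stamp_users, 1);
}

/* Called when such a socket goes away. */
static void stamp_disable(void)
{
    atomic_fetch_sub(&stamp_users, 1);
}

/* Receive path: read the clock only while at least one user exists. */
static bool stamp_wanted(void)
{
    return atomic_load(&stamp_users) > 0;
}
```

Because the check happens at packet arrival, the stamp does not pick up scheduling latency; the cost is that one interested socket makes every incoming packet pay for a clock read.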

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Herbert Xu
Harry Edmon [EMAIL PROTECTED] wrote:
 
 That did not help.  I have 1 minute outputs from tcpdump under both 2.6.11.12
 and 2.6.16.20.  You will see a large size difference between the files.
 Since the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web
 instead of via attachments.   Look at:
 
 http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
 http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

The latter shows that it took 40ms to generate an ACK.  What does
'vmstat 1' show while this is happening?
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-18 Thread Harry Edmon

Stephen Hemminger wrote:

  

Does this fix it?
   # sysctl -w net.ipv4.tcp_abc=0


Thanks for the suggestion.  I will give it a try later tonight.  Also Andrew - 
sorry for the incorrect placement of my follow-up comments.  I do appreciate 
everyone's help in figuring this out.


--
 Dr. Harry Edmon                E-MAIL: [EMAIL PROTECTED]
 206-543-0547                           [EMAIL PROTECTED]
 Dept of Atmospheric Sciences   FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Andrew Morton
On Fri, 16 Jun 2006 09:01:23 -0700
Harry Edmon [EMAIL PROTECTED] wrote:

 I have a system with a strange network performance degradation from 
 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.   
 The system has dual single-core Xeons with hyperthreading on.   The 
 application is the LDM system from UCAR/Unidata 
 (http://www.unidata.ucar.edu/software/ldm).   This system requests 
 weather data from a variety of systems using RPC calls over a reserved 
 TCP port (388), puts them into a memory mapped queue file, and then 
 sends the data out to a variety of downstream requesting systems, again 
 using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way 
 behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have 
 tried an experiment with a 2.6.17-rc6 system where it just does the 
 ingestion, and not the downstream distribution, and it is able to keep 
 up.   I would really appreciate any pointers as to where the problem may 
 be and how to diagnose it.  I have attached the config files from both 
 kernels and the sysctl.conf file I am using.   I have also included the 
 output from netstat -s on the 2.6.16.20 system during a time when it 
 was having problems.
 

(added netdev)

A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
with that in the past.

Perhaps a tcpdump of the net traffic will help to determine what's going on.


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Harry Edmon
I assume you are talking about using TCP_NODELAY as a socket option within the 
LDM software.  I could give that a try.
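
For anyone following along, setting the option amounts to one setsockopt() call per TCP socket. The sketch below is generic: `set_nodelay()` is a hypothetical helper, and where such a call would belong inside LDM is not something this thread establishes.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_NODELAY */
#include <sys/socket.h>

/*
 * Disable Nagle's algorithm on a TCP socket so that small writes are
 * sent immediately. Returns 0 on success, -1 with errno set on error.
 */
static int set_nodelay(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```

It would be called once on each connected or accepted socket, before the first RPC is written to it.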


There is a lot of traffic on this node, on the order of 2000 packets in and out 
per second, so the tcpdump output will grow pretty fast.  How long a tcpdump 
would be useful, and what options would you suggest?


I should also note that my network interfaces are Intel, using the latest e1000 
driver.



Andrew Morton wrote:

On Fri, 16 Jun 2006 09:01:23 -0700
Harry Edmon [EMAIL PROTECTED] wrote:

I have a system with a strange network performance degradation from 
2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.   
The system has dual single-core Xeons with hyperthreading on.   The 
application is the LDM system from UCAR/Unidata 
(http://www.unidata.ucar.edu/software/ldm).   This system requests 
weather data from a variety of systems using RPC calls over a reserved 
TCP port (388), puts them into a memory mapped queue file, and then 
sends the data out to a variety of downstream requesting systems, again 
using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way 
behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have 
tried an experiment with a 2.6.17-rc6 system where it just does the 
ingestion, and not the downstream distribution, and it is able to keep 
up.   I would really appreciate any pointers as to where the problem may 
be and how to diagnose it.  I have attached the config files from both 
kernels and the sysctl.conf file I am using.   I have also included the 
output from netstat -s on the 2.6.16.20 system during a time when it 
was having problems.




(added netdev)

A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
with that in the past.

Perhaps a tcpdump of the net traffic will help to determine what's going on.



--
 Dr. Harry Edmon                E-MAIL: [EMAIL PROTECTED]
 206-543-0547   [EMAIL PROTECTED]
 Dept of Atmospheric Sciences   FAX:206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Andrew Morton
On Sat, 17 Jun 2006 16:23:34 -0700
Harry Edmon [EMAIL PROTECTED] wrote:

 Andrew Morton wrote:
  On Fri, 16 Jun 2006 09:01:23 -0700
  Harry Edmon [EMAIL PROTECTED] wrote:
  
  I have a system with a strange network performance degradation from 
  2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.   
  The system has dual single-core Xeons with hyperthreading on.   The 
  application is the LDM system from UCAR/Unidata 
  (http://www.unidata.ucar.edu/software/ldm).   This system requests 
  weather data from a variety of systems using RPC calls over a reserved 
  TCP port (388), puts them into a memory mapped queue file, and then 
  sends the data out to a variety of downstream requesting systems, again 
  using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way 
  behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have 
  tried an experiment with a 2.6.17-rc6 system where it just does the 
  ingestion, and not the downstream distribution, and it is able to keep 
  up.   I would really appreciate any pointers as to where the problem may 
  be and how to diagnose it.  I have attached the config files from both 
  kernels and the sysctl.conf file I am using.   I have also included the 
  output from netstat -s on the 2.6.16.20 system during a time when it 
  was having problems.
 
  
  (added netdev)
  
  A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
  with that in the past.
  
  Perhaps a tcpdump of the net traffic will help to determine what's going on.
 

[ edit, edit - please don't top-post ]

 I assume you are talking about using TCP_NODELAY as a socket option within the
 LDM software.  I could give that a try.

The use of TCP_NODELAY caused problems with the JVM debugger.  I'm not
suggesting that enabling it will fix anything here.
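
For anyone who wants to run that experiment anyway, here is a minimal
sketch of setting the option on a TCP socket in C.  It is not taken from
the LDM source: the helper names set_tcp_nodelay() and get_tcp_nodelay()
are made up for illustration, and the place where LDM (or its RPC layer)
creates its sockets would have to be found separately.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* setsockopt, getsockopt */

/* Hypothetical helper: turn Nagle's algorithm off (enable != 0) or back
 * on (enable == 0) for a TCP socket.  Returns 0 on success, -1 on error
 * with errno set by setsockopt(). */
int set_tcp_nodelay(int fd, int enable)
{
    int flag = enable ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: read the option back.  Returns 1 if TCP_NODELAY
 * is set, 0 if it is clear, -1 on error. */
int get_tcp_nodelay(int fd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) < 0)
        return -1;
    return flag != 0;
}
```

The option is per socket, so it would have to be applied to each
connection after socket() or accept() returns.  Reading it back with
getsockopt() is a cheap way to confirm the change took effect before
comparing the two kernels again.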

 
 There is a lot of traffic on this node, on the order of 2000 packets in and out
 per second, so the tcpdump output will grow pretty fast.  How long a tcpdump 
 would be useful, and what options would you suggest?

I don't know, frankly - first one needs to develop some sort of theory,
then use the diagnostic tools to prove or disprove that theory.  And I
don't have a theory.

I guess a simple one-second bare `tcpdump -i eth0' would be a starting
point.  Perhaps compare the output of that with the output from a
correctly-operating kernel, see if anything suggests itself.  That might
also give us something which the networking developers can use.



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Stephen Hemminger

Andrew Morton wrote:

On Sat, 17 Jun 2006 16:23:34 -0700
Harry Edmon [EMAIL PROTECTED] wrote:

  

Andrew Morton wrote:


On Fri, 16 Jun 2006 09:01:23 -0700
Harry Edmon [EMAIL PROTECTED] wrote:

  
I have a system with a strange network performance degradation from 
2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.   
The system has dual single-core Xeons with hyperthreading on.   The 
application is the LDM system from UCAR/Unidata 
(http://www.unidata.ucar.edu/software/ldm).   This system requests 
weather data from a variety of systems using RPC calls over a reserved 
TCP port (388), puts them into a memory mapped queue file, and then 
sends the data out to a variety of downstream requesting systems, again 
using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way 
behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have 
tried an experiment with a 2.6.17-rc6 system where it just does the 
ingestion, and not the downstream distribution, and it is able to keep 
up.   I would really appreciate any pointers as to where the problem may 
be and how to diagnose it.  I have attached the config files from both 
kernels and the sysctl.conf file I am using.   I have also included the 
output from netstat -s on the 2.6.16.20 system during a time when it 
was having problems.




(added netdev)

A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
with that in the past.

Perhaps a tcpdump of the net traffic will help to determine what's going on.
  


[ edit, edit - please don't top-post ]

  
I assume you are talking about using TCP_NODELAY as a socket option within the 
LDM software.  I could give that a try.



The use of TCP_NODELAY caused problems with the JVM debugger.  I'm not
suggesting that enabling it will fix anything here.

  
There is a lot of traffic on this node, on the order of 2000 packets in and out 
per second, so the tcpdump output will grow pretty fast.  How long a tcpdump 
would be useful, and what options would you suggest?



I don't know, frankly - first one needs to develop some sort of theory,
then use the diagnostic tools to prove or disprove that theory.  And I
don't have a theory.

I guess a simple one-second bare `tcpdump -i eth0' would be a starting
point.  Perhaps compare the output of that with the output from a
correctly-operating kernel, see if anything suggests itself.  That might
also give us something which the networking developers can use.

  

Does this fix it?
   # sysctl -w net.ipv4.tcp_abc=0
