subject:"Network performance degradation from 2.6.11.12 to 2.6.16.20"

Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Vladimir B. Savkin

On Tue, Mar 06, 2007 at 04:16:24PM +0100, Eric Dumazet wrote:
> 
> It would be better to name the tunable "disable_timestamps", default 0 of 
> course 

I agree.
If networking maintainers are interested, I surely can prepare a patch.

But IMO some way to force TSC usage on x86_64 will be even better.

> It would better describe what your patch is actually doing : Even if a 
> tcpdump 
> is running (so asking for timestamps), it wont have them because the sysctl 
> disabled them.

Well, tcpdump will have timestamps, but taken at the wrong moment.
But some other applications (those that use ip_queue, ulog etc.) will not,
as I understand it.

> 
> Thank you
> 
~
:wq
With best regards, 
   Vladimir Savkin. 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Eric Dumazet

On Tuesday 06 March 2007 15:43, Vladimir B. Savkin wrote:
> On Tue, Mar 06, 2007 at 03:38:44PM +0100, Eric Dumazet wrote:
> > 2) "accurate_timestamps" is misleading.
> > Should be "disable_timestamps"
>
> Not if the default is 1, as in my patch.

Yes I saw this. I should write more words next time :)

Full explanation:
--

If your tunable is named "accurate_timestamps" then a 0 value would mean :

Use a low precision timestamp (based on xtime for example) instead of a full 
resolution...

This is not what your patch does (it could do that, but beware that 
net-2.6.22 now includes ktime_t timestamping).

So :
--

It would be better to name the tunable "disable_timestamps", default 0 of 
course.
It would better describe what your patch is actually doing: even if a tcpdump 
is running (so asking for timestamps), it won't have them because the sysctl 
disabled them.

Thank you


Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Vladimir B. Savkin

On Tue, Mar 06, 2007 at 03:38:44PM +0100, Eric Dumazet wrote:
> 2) "accurate_timestamps" is misleading. 
>   Should be "disable_timestamps" 

Not if the default is 1, as in my patch.

~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Eric Dumazet

On Tuesday 06 March 2007 14:25, Vladimir B. Savkin wrote:

>   },
> + {
> + .ctl_name   = NET_CORE_ACCURATE_TIMESTAMPS,
> + .procname   = "accurate_timestamps",
> + .data   = &sysctl_accurate_timestamps,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = &proc_dointvec
> + },
>   { .ctl_name = 0 }
>  };
>
>
> May I ask about integrating this or a similar solution for those
> like me who value routing performance (with bind9 running) over
> the minor convenience of having tcpdump always display accurate
> timestamps?
>

Quite frankly, I don't like this patch:

1) Fix applications, do not bloat the kernel.

2) "accurate_timestamps" is misleading. 
It should be "disable_timestamps".


Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)

2007-03-06 Thread Vladimir B. Savkin

On Fri, Sep 22, 2006 at 09:51:09AM -0700, Rick Jones wrote:
> >That came from named. It opens lots of sockets with SIOCGSTAMP.
> >No idea what it needs that many for.
> 
> IIRC ISC BIND named opens a socket for each IP it finds on the system. 
> Presumeably in this way it "knows" implicitly the destination IP without 
> using platform-specific recvfrom/whatever extensions and gets some 
> additional parallelism in the stack on SMP systems.
> 
> Why it needs/wants the timestamps I've no idea, I don't think it gets 
> them that way on all platforms.  I suppose the next time I do some named 
> benchmarking I can try to take a closer look in the source.
> 

Returning to the discussion about packet timestamps, I just
use the following patch now:

diff -ur ../linux-2.6.20.1/include/linux/sysctl.h linux-2.6.20.1-ts/include/linux/sysctl.h
--- ../linux-2.6.20.1/include/linux/sysctl.h	2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/include/linux/sysctl.h	2007-03-04 19:10:36.0 +0300
@@ -280,6 +280,7 @@
 	NET_CORE_BUDGET=19,
 	NET_CORE_AEVENT_ETIME=20,
 	NET_CORE_AEVENT_RSEQTH=21,
+	NET_CORE_ACCURATE_TIMESTAMPS=99,
 };
 
 /* /proc/sys/net/ethernet */
diff -ur ../linux-2.6.20.1/net/core/dev.c linux-2.6.20.1-ts/net/core/dev.c
--- ../linux-2.6.20.1/net/core/dev.c	2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/net/core/dev.c	2007-03-04 19:09:44.0 +0300
@@ -1043,9 +1043,11 @@
 }
 EXPORT_SYMBOL(__net_timestamp);
 
+int sysctl_accurate_timestamps = 1;
+
 static inline void net_timestamp(struct sk_buff *skb)
 {
-	if (atomic_read(&netstamp_needed))
+	if (sysctl_accurate_timestamps && atomic_read(&netstamp_needed))
 		__net_timestamp(skb);
 	else {
 		skb->tstamp.off_sec = 0;
diff -ur ../linux-2.6.20.1/net/core/sysctl_net_core.c linux-2.6.20.1-ts/net/core/sysctl_net_core.c
--- ../linux-2.6.20.1/net/core/sysctl_net_core.c	2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/net/core/sysctl_net_core.c	2007-03-04 19:05:11.0 +0300
@@ -21,6 +21,8 @@
 
 extern int sysctl_core_destroy_delay;
 
+extern int sysctl_accurate_timestamps;
+
 #ifdef CONFIG_XFRM
 extern u32 sysctl_xfrm_aevent_etime;
 extern u32 sysctl_xfrm_aevent_rseqth;
@@ -136,6 +138,14 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
+	{
+		.ctl_name	= NET_CORE_ACCURATE_TIMESTAMPS,
+		.procname	= "accurate_timestamps",
+		.data		= &sysctl_accurate_timestamps,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
 	{ .ctl_name = 0 }
 };
 

May I ask about integrating this or a similar solution for those
like me who value routing performance (with bind9 running) over
the minor convenience of having tcpdump always display accurate
timestamps?

And why does the current kernel (2.6.20.1) still ignore the parameter
clocksource=tsc? I think with idle=poll TSC is safe to use on my setup;
it has run with TSC for many months without a problem.

~
:wq
With best regards, 
   Vladimir Savkin. 
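
The effect of the tunable in the patch above can be modelled in userspace. The following is a minimal sketch, not kernel code: `struct pkt`, `stamp_now()` and `model_net_timestamp()` are hypothetical stand-ins for `struct sk_buff`, `__net_timestamp()` and `net_timestamp()`, a plain int replaces the atomic `netstamp_needed` counter, and clearing the microseconds field in the else branch is an assumption, since the posted hunk is cut off after `off_sec`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for the timestamp carried by struct sk_buff. */
struct pkt {
    long tstamp_sec;   /* 0 means "no timestamp taken" */
    long tstamp_usec;
};

/* Mirrors the patch's tunable: 1 (the default) keeps the old behaviour. */
int sysctl_accurate_timestamps = 1;

/* Models netstamp_needed: how many users have asked for timestamps. */
int netstamp_needed = 0;

/* Stand-in for __net_timestamp(): read the (possibly slow) clock. */
void stamp_now(struct pkt *p)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    p->tstamp_sec = tv.tv_sec;
    p->tstamp_usec = tv.tv_usec;
}

/* Models the patched net_timestamp(); returns true if the clock was read. */
bool model_net_timestamp(struct pkt *p)
{
    if (sysctl_accurate_timestamps && netstamp_needed) {
        stamp_now(p);
        return true;
    }
    p->tstamp_sec = 0;
    p->tstamp_usec = 0;   /* assumption: the truncated hunk clears this too */
    return false;
}
```

With the tunable at 1 the model behaves like the unpatched function. With it at 0 the clock is never read on receive, whatever `netstamp_needed` says, which is the behaviour Eric Dumazet's "disable_timestamps" naming argument refers to.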


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-22 Thread Rick Jones


> That came from named. It opens lots of sockets with SIOCGSTAMP.
> No idea what it needs that many for.

IIRC ISC BIND named opens a socket for each IP it finds on the system. 
Presumably in this way it "knows" implicitly the destination IP without 
using platform-specific recvfrom/whatever extensions and gets some 
additional parallelism in the stack on SMP systems.


Why it needs/wants the timestamps I've no idea, I don't think it gets 
them that way on all platforms.  I suppose the next time I do some named 
benchmarking I can try to take a closer look in the source.


rick jones
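
For reference, the SIOCGSTAMP request discussed here is an ioctl that returns the timestamp of the last packet passed to the user on a socket. The sketch below is an illustration only and is not taken from named or dhcpd: the helper name `loopback_packet_stamp()` is made up, and it assumes a Linux system with a working loopback interface.

```c
#include <arpa/inet.h>
#include <linux/sockios.h>   /* SIOCGSTAMP */
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Send one UDP datagram to ourselves over loopback, receive it, and ask
 * the kernel for the socket's packet timestamp.
 * Returns 0 and fills *tv on success, -1 on any error.
 */
int loopback_packet_stamp(struct timeval *tv)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[8];
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    if (sendto(tx, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) != 4)
        goto out;
    if (recv(rx, buf, sizeof(buf), 0) != 4)
        goto out;

    /* Timestamp of the last packet delivered on this socket. */
    if (ioctl(rx, SIOCGSTAMP, tv) == 0)
        rc = 0;
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return rc;
}
```

A caller passes a `struct timeval` and checks the return value; on success `tv_sec` and `tv_usec` hold the time the kernel associates with the received datagram.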

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-22 Thread Andi Kleen

On Friday 22 September 2006 17:35, Alexey Kuznetsov wrote:
> Hello!
> 
> > I can't even find a reference to SIOCGSTAMP in the
> > dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.
> > 
> > But I will note that tpacket_rcv() expects to always get
> > valid timestamps in the SKB, it does a:
> 
> It is equally unlikely it uses mmapped packet socket (tpacket_rcv).
> 
> I even installed that dhcp on x86_64. And I do not see anything,
> netstamp_needed remains zero when running both server and client.
> It looks like dhcp was defamed without a guilt. :-)
>
> Seems, Andi saw some leakage in netstamp_needed (value of 7),
> but I do not see this too.

That came from named. It opens lots of sockets with SIOCGSTAMP.
No idea what it needs that many for.

I suspect  it was either dhcpd (server) or that ppp user space daemon
the original reporter was running.

Maybe it would be a good idea to add a printk by default?

> In any case, the issue is obviously more serious than just behaviour
> of some applications. On my notebook one gettimeofday() takes:
> 
>   0.2 us with tsc
>   4.6 us with pm  (AND THIS CRAP IS DEFAULT!!)

This is actually quite fast. I've seen much worse ratios.

Also on some i386 kernels the pmtimer reads the register three 
times to work around some buggy implementation that doesn't latch the counter
properly.

>   9.4 us with pit (kinda expected)
> 
> It is ridiculous. Obviosuly, nobody (not only tcpdump, but everything
> else) does not need such clock. Taking timestamp takes time comparable
> with processing the whole tcp frame. :-) I have no idea what is possible
> to do without breaking everything, but it is not something to ignore.
> This timer must be shot. :-)

If it's a reasonably new notebook it might be actually possible to change.
The default choices are quite conservative there because in the past
there were lots of problems with notebooks changing frequency behind
the kernel's back etc. and screwing up TSC. But that shouldn't happen anymore.

If you had a 64bit laptop the kernel would likely make the right choice :)

Notebooks are easy because they are only single socket, so the only thing
needed is to keep track of the frequency (or not if you have a Core+) 

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-22 Thread Alexey Kuznetsov

Hello!

> I can't even find a reference to SIOCGSTAMP in the
> dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.
> 
> But I will note that tpacket_rcv() expects to always get
> valid timestamps in the SKB, it does a:

It is equally unlikely it uses mmapped packet socket (tpacket_rcv).

I even installed that dhcp on x86_64. And I do not see anything:
netstamp_needed remains zero when running both server and client.
It looks like dhcp was defamed without guilt. :-)

It seems Andi saw some leakage in netstamp_needed (value of 7),
but I do not see this either.


In any case, the issue is obviously more serious than just behaviour
of some applications. On my notebook one gettimeofday() takes:

0.2 us with tsc
4.6 us with pm  (AND THIS CRAP IS DEFAULT!!)
9.4 us with pit (kinda expected)

It is ridiculous. Obviously, nobody (not only tcpdump, but anything
else) needs such a clock. Taking a timestamp takes time comparable
to processing the whole TCP frame. :-) I have no idea what can be
done without breaking everything, but it is not something to ignore.
This timer must be shot. :-)

Alexey

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread Andi Kleen


> It seems only natural to me that the real problem is the slow
> clock source which needs to be resolved regardless of the
> outcome of this discussion. I believe that updating the stamp
> at socket enqueue time is the right thing to do but it shouldn't
> be considered as a solution to the performance problem.

While I agree it would be nice to fix that particular issue 
(it's unfortunately hard), slow clock sources in general won't go
away. They are also found on lots of other platforms.

And even if you have a fast clock source, not using it when you
don't need to is better. For example, some x86s can be quite
slow even reading TSCs. It's much better than pmtmr, but
it's still an expensive operation that is best avoided.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread Thomas Graf

* David Miller <[EMAIL PROTECTED]> 2006-09-18 14:22
> From: Alexey Kuznetsov <[EMAIL PROTECTED]>
> Date: Tue, 19 Sep 2006 01:03:21 +0400
> 
> > 1. It even does not disable possibility to record timestamp inside
> >    driver, which Alan was afraid of. The sequence is:
> > 
> >         if (!skb->tstamp.off_sec)
> >                 net_timestamp(skb);
> > 
> > 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().
> > 
> > 3. NAPI already introduced almost the same inaccuracy. And it is really
> >    silly to waste time getting timestamp in netif_receive_skb() a few
> >    moments before the packet is delivered to a socket.
> > 
> > 4. ...but clock source, which takes one of top lines in profiles
> >    must be repaired yet. :-)
> 
> Ok, ok, but don't we have queueing disciplines that need the timestamp
> even on ingress?

Queueing disciplines generally only care about the time delta
between two packets, using the receive stamp would lead to
wrong results as soon as a packet is queued more than once.

However, since we recently introduced ingress queueing we
must update the stamp to make up for the delay caused by the
queue. Updating the stamp at socket enqueue time would solve
this automatically.

It seems only natural to me that the real problem is the slow
clock source which needs to be resolved regardless of the
outcome of this discussion. I believe that updating the stamp
at socket enqueue time is the right thing to do but it shouldn't
be considered as a solution to the performance problem.

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread David Miller

From: Alexey Kuznetsov <[EMAIL PROTECTED]>
Date: Tue, 19 Sep 2006 02:00:38 +0400

> * I do not undestand what the hell dhcp needs timestamps for.

I can't even find a reference to SIOCGSTAMP in the
dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.

But I will note that tpacket_rcv() expects to always get
valid timestamps in the SKB, it does a:

	if (skb->tstamp.off_sec == 0) {
		__net_timestamp(skb);
		sock_enable_timestamp(sk);
	}

so that it can fill in the h->tp_sec and h->tp_usec
fields.

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread Stephen Hemminger

The sky2 hardware (and others) can timestamp in hardware, but trying
to keep device ticks and system clock in sync looked too nasty
to contemplate actually using it.

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread David Miller

From: David Lang <[EMAIL PROTECTED]>
Date: Mon, 18 Sep 2006 14:57:04 -0700 (PDT)

> yes tcpdump may be wrong in requesting timestamps (in most cases it
> probably is, but in some cases it's doing exactly what the sysadmin
> wants it to do), but I don't think that many sysadmins would expect
> this much of a performance hit.  there should be some way to tell
> the system to ignore requests for timestamps so that a badly behaved
> program cannot cripple the system this way (and preferably something
> that doesn't require a full SELinux/capabilities implementation)

tcpdump is not wrong in requesting timestamps, and there are
many legitimate userland programs that call gettimeofday()
for internal timestamping _A LOT_.  Such as X11 clients.

The real fact of the matter is that these x86_64 systems are using the
slowest possible time-of-day implementation, simply because it's "too
hard" currently to properly probe the most efficient mechanism which
is present in the system.

If getting the time of day is at the top of the profiles in the packet
input path, and we're only capturing a timestamp once per packet,
something is _VERY VERY_ wrong with the timestamp implementation
because think of all of the other seriously expensive things that
happen on a per-packet basis which should absolutely dwarf
timestamping in terms of cost.


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-19 Thread David Miller

From: "Vladimir B. Savkin" <[EMAIL PROTECTED]>
Date: Tue, 19 Sep 2006 02:03:31 +0400

> On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote:
> > * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
> 
> I had it at the very top line.

That is just ridiculous.

You can "fix" the networking by making it timestamp less but what
about things like just normal X11 clients that call gettimeofday()
at very high rates?

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 23:22, David Miller wrote:

> Ok, ok, but don't we have queueing disciplines that need the timestamp
> even on ingress?

I grepped and I can't find any. The only non-SIOCGSTAMP users of the
time stamp seem to be sunrpc and conntrack, and I bet both can be converted
over to jiffies without trouble.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 23:03, Alexey Kuznetsov wrote:

> 
> > And do you have some other prefered way to solve this? Even if the timer
> > was fast it would be still good to avoid it in the fast path when DHCPD
> > is running.
> 
> No. The way, which you suggested, seems to be the best.

Ok. I also checked my desktop and for some reason I got a timestamp counter
of 7 (and it doesn't even run client dhcp). Haven't investigated why yet, and
I am still hoping it's not a leak.

But that hints that trying to fix all of user space to not use the ioctl 
would have been probably too much work.

> 1. It even does not disable possibility to record timestamp inside
>    driver, which Alan was afraid of. The sequence is:
> 
>         if (!skb->tstamp.off_sec)
>                 net_timestamp(skb);
> 
> 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().

Hmm, there are still quite a lot of users, and even with netif_rx() you
can have long delays from interrupt mitigation etc.

% grep -rw netif_rx drivers/net/*  | wc -l
253

> 3. NAPI already introduced almost the same inaccuracy. And it is really
>    silly to waste time getting timestamp in netif_receive_skb() a few
>    moments before the packet is delivered to a socket.
> 
> 4. ...but clock source, which takes one of top lines in profiles
>    must be repaired yet. :-)

It's being worked on, but it'll take some time. But even when TSC 
can be used it's still a good idea to not call gtod unnecessarily 
because it can be still relatively slow (e.g. on P4 RDTSC takes
hundreds of cycles because it synchronizes the CPU). Also on some 
other non x86 platforms it is also relatively slow because they have 
to reach out to the chipset and every time you do that things get slow.

-Andi


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Lang


On Tue, 19 Sep 2006, Alexey Kuznetsov wrote:

> Hello!
>
> > Please think about it this way:
> > suppose you have a heavily loaded router and some network problem is to
> > be diagnosed. You run tcpdump and suddenly router becomes overloaded (by
> > switching to timestamp-it-all mode
>
> I am sorry. I cannot think that way. :-)
>
> Instead of attempts to scare, better resend original report,
> where you said how much performance degraded, I cannot find it.
>
> * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
> * I do not understand what the hell dhcp needs timestamps for.
> * I do not listen any suggestions to screw up tcpdump with a sysctl.
>   Kernel already implements much better thing than a sysctl.
>   Do not want timestamps? Fix tcpdump, add an option, submit the
>   patch to tcpdump maintainers. Not a big deal.

if firing up one program (however minor) can cause network performance to drop 
by >50% (based on the numbers reported earlier in this thread), that is a 
significant problem for sysadmins.

yes, tcpdump may be wrong in requesting timestamps (in most cases it probably is, 
but in some cases it's doing exactly what the sysadmin wants it to do), but I 
don't think that many sysadmins would expect this much of a performance hit. 
there should be some way to tell the system to ignore requests for timestamps so 
that a badly behaved program cannot cripple the system this way (and preferably 
something that doesn't require a full SELinux/capabilities implementation)


David Lang

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote:
> Hello!
> 
> > Please think about it this way:
> > suppose you haave a heavily loaded router and some network problem is to
> > be diagnosed. You run tcpdump and suddenly router becomes overloaded (by
> > switching to timestamp-it-all mode
> 
> I am sorry. I cannot think that way. :-)
> 
> Instead of attempts to scare, better resend original report,
> where you said how much performance degraded, I cannot find it.
> 
> * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.

I had it at the very top line.

> * I do not undestand what the hell dhcp needs timestamps for.
> * I do not listen any suggestions to screw up tcpdump with a sysctl.
>   Kernel already implements much better thing then a sysctl.
>   Do not want timestamps? Fix tcpdump, add an options, submit the
>   patch to tcpdump maintainers. Not a big deal. 

OK, point taken.
It's better to patch tcpdump.

> 
> Alexey
> 
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

> Please think about it this way:
> suppose you haave a heavily loaded router and some network problem is to
> be diagnosed. You run tcpdump and suddenly router becomes overloaded (by
> switching to timestamp-it-all mode

I am sorry. I cannot think that way. :-)

Instead of attempts to scare, better resend the original report,
where you said how much performance degraded; I cannot find it.

* I do see get_offset_pmtmr() in the top lines of the profile. That's scary enough.
* I do not understand what the hell dhcp needs timestamps for.
* I will not listen to any suggestions to screw up tcpdump with a sysctl.
  The kernel already implements a much better thing than a sysctl.
  Do not want timestamps? Fix tcpdump, add an option, submit the
  patch to tcpdump maintainers. Not a big deal. 

Alexey

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

> Ok, ok, but don't we have queueing disciplines that need the timestamp
> even on ingress?

I cannot find any.

ip_queue does. But it is just another user, no different from sockets.

BTW in any case, any user of the timestamp who sees 0, because the skb was
received before timestamping was enabled, has to calculate the timestamp itself
right in the place where Andi suggested. It seems preparation for the change
makes sense even without the change. :-)

Alexey


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Miller

From: Alexey Kuznetsov <[EMAIL PROTECTED]>
Date: Tue, 19 Sep 2006 01:03:21 +0400

> 1. It even does not disable possibility to record timestamp inside
>    driver, which Alan was afraid of. The sequence is:
> 
>         if (!skb->tstamp.off_sec)
>                 net_timestamp(skb);
> 
> 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().
> 
> 3. NAPI already introduced almost the same inaccuracy. And it is really
>    silly to waste time getting timestamp in netif_receive_skb() a few
>    moments before the packet is delivered to a socket.
> 
> 4. ...but clock source, which takes one of top lines in profiles
>    must be repaired yet. :-)

Ok, ok, but don't we have queueing disciplines that need the timestamp
even on ingress?

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Mon, Sep 18, 2006 at 06:50:22PM +0200, Andi Kleen wrote:
> 
> I suppose in the worst case a sysctl like Vladimir asked for could be added,
> but it would seem somewhat lame.
> 
Please think about it this way:
suppose you have a heavily loaded router and some network problem is to
be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
switching to timestamp-it-all mode), drops OSPF adjacencies etc. Users
are angry, and you can't diagnose anything. But with imprecise
timestamps and maybe even with reordered packets you still have some
traces to analyze.
So, in this particular corner case it's not that lame.

Or maybe patching tcpdump will do better?
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Mon, Sep 18, 2006 at 01:27:57PM +0200, Andi Kleen wrote:
> The codebase for timing (and lots of other things) is quite different
> between 32bit and 64bit. You're really surprised it doesn't work if you do 
> such things?
> 
It works, and after your remark above, I'm surprised.
Dunno about slow TSC drift though; not enough time has passed to
detect it, and I hope we will have this problem solved in a better way
before the drift becomes visible :)

> > But the question is, why stock 2.6.18-rc7 could not use TSC on its own?
> 
> x86-64 doesn't use the TSC when it deems it to not be reliable, which
> is the case on your system.
>  
Could it at least print something so that I know that using TSC was
considered, but rejected?

> > What hardware exactly. Doesn't it affect only CPU? And they are not
> > know to fail before any other components.
> 
> All hardware. It's basic physics.

Hm, what other hardware is affected by idle=poll? Does this option wear
out HDDs?
~
:wq
With best regards, 
   Vladimir Savkin. 
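
On kernels that export the generic clocksource framework through sysfs, the clock source actually selected can be read back from userspace. A minimal sketch, assuming the usual sysfs path is present on the running kernel (it may not be on the kernels discussed in this thread); `read_clocksource()` is a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

/* Usual sysfs location on kernels that expose the clocksource framework. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of `path` into `buf`, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_clocksource(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if any */
    return 0;
}
```

Calling `read_clocksource(CLOCKSOURCE_PATH, buf, sizeof(buf))` fills `buf` with a name such as `tsc` or `acpi_pm` when the file exists. It shows which source was chosen, not why another one was rejected.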


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

> But that never happens right? 

Right.

Well, not right. It happens. Simply because you get packet
with newer timestamp after previous handler saw this packet
and did some actions. I just do not see any bad consequences.


> And do you have some other prefered way to solve this? Even if the timer
> was fast it would be still good to avoid it in the fast path when DHCPD
> is running.

No. The way, which you suggested, seems to be the best.


1. It even does not disable possibility to record timestamp inside
   driver, which Alan was afraid of. The sequence is:

        if (!skb->tstamp.off_sec)
                net_timestamp(skb);

2. Maybe, netif_rx() should continue to get timestamp in netif_rx().

3. NAPI already introduced almost the same inaccuracy. And it is really
   silly to waste time getting timestamp in netif_receive_skb() a few
   moments before the packet is delivered to a socket.

4. ...but clock source, which takes one of top lines in profiles
   must be repaired yet. :-)

Alexey
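
The check in point 1 amounts to "stamp only if nobody has stamped yet". A userspace sketch of that rule follows, with a hypothetical `struct pkt_stamp` standing in for `skb->tstamp` and `gettimeofday()` standing in for `net_timestamp()`; unlike the kernel function, this stand-in always reads the clock when asked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for the receive timestamp carried by an skb. */
struct pkt_stamp {
    long sec;    /* 0 means "not stamped yet" */
    long usec;
};

/*
 * Stamp the packet only if nobody (for example a driver that records
 * its own receive time) has done so already.
 * Returns true if the clock was read, false if an earlier stamp was kept.
 */
bool stamp_if_missing(struct pkt_stamp *ts)
{
    struct timeval tv;

    if (ts->sec != 0)
        return false;          /* keep the earlier stamp untouched */
    gettimeofday(&tv, NULL);
    ts->sec = tv.tv_sec;
    ts->usec = tv.tv_usec;
    return true;
}
```

Because an existing stamp is never overwritten, moving the call later in the receive path costs accuracy only for packets that arrive unstamped.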

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 18:28, Alexey Kuznetsov wrote:
> Hello!
> 
> > Hmm, not sure how that could happen. Also is it a real problem
> > even if it could?
> 
> As I said, the problem is _occasionally_ theoretical.
> 
> This would happen f.e. if packet socket handler was installed
> after IP handler. Then tcpdump would get packet after it is processed
> (acked/replied/forwarded). This would be disasterous, the results
> are unparsable.

But that never happens right? 

And do you have some other preferred way to solve this? Even if the timer
was fast, it would still be good to avoid it in the fast path when DHCPD
is running.

I suppose in the worst case a sysctl like Vladimir asked for could be added,
but it would seem somewhat lame.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

> Hmm, not sure how that could happen. Also is it a real problem
> even if it could?

As I said, the problem is _occasionally_ theoretical.

This would happen e.g. if the packet socket handler was installed
after the IP handler. Then tcpdump would get the packet after it is processed
(acked/replied/forwarded). This would be disastrous; the results
would be unparsable.

I recall the issue was discussed, and at that time it looked more
reasonable to solve problems of this kind by taking the timestamp once,
before the packet is seen by the rest of the stack. Who could expect that
the PIT nightmare was going to return? :-)


> Then it has to use the ACPI pmtmr which is really really slow.
> The overhead of that thing is so large that you can clearly see it in
> the network benchmark.

I see. Thank you.

Alexey

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 17:38, Alexey Kuznetsov wrote:
> Hello!
> 
> > For netdev: I'm more and more thinking we should just avoid the problem
> > completely and switch to "true end2end" timestamps. This means don't
> > time stamp when a packet is received, but only when it is delivered
> > to a socket.
> 
> This will work.
> 
> From viewpoint of existing uses of timestamp by packet socket
> this time is not worse. The only danger is violation of causality
> (when forwarded packet or reply packet gets timestamp earlier than
> original packet). 

Hmm, not sure how that could happen. Also is it a real problem
even if it could?

> > handler runs. Then the problem above would completely disappear. 
> 
> Well, not completely. Too slow clock source remains too slow clock source.
> If it is so slow, that it results in "performance degradation", it just
> should not be used at all, even such pariah as tcpdump wants to be fast.
> 
> Actually, I have a question. Why the subject is
> "Network performance degradation from 2.6.11.12 to 2.6.16.20"?
> I do not see beginning of the thread and cannot guess
> why clock source degraded. :-)

It's a long and sad story.

Old kernels didn't disable the TSC on those boxes (multi core K8) and assumed
they were synchronized for timing purposes. 

This initially mostly worked if you didn't use cpufreq,
but over a longer uptime the TSCs would drift against each other and timing
would jump more and more between CPUs.

On older versions of K8 this drift happened much more slowly (more
aggressive power saving in HLT in newer steppings made it worse; that is why
idle=poll helps) and could often be ignored. But technically it was still a
bug there because it could break timing after long uptimes.

New multi socket K8 boxes are generally
totally unusable with TSC because they use cpufreq and the TSCs can run
at completely different frequencies, which obviously doesn't give very
good timing information if you assume the TSC is globally synchronized.

That is why later kernels default to TSC off.  The original plan 
was to use HPET then, which is slower than TSC, but still not that bad.
But while most modern systems have a HPET timer somewhere in the chipset 
nearly all BIOS vendors "forgot" to describe it in the BIOS because Windows
didn't use it and Linux can't find it because of that. 

Then it has to use the ACPI pmtmr which is really really slow.
The overhead of that thing is so large that you can clearly see it in
the network benchmark.

The real fix long term is to change the timer subsystem to keep all TSC
state per CPU, then it'll work on the K8s too. Unfortunately it's a moderately 
hard problem  to make the result still fully monotonic. But people are working 
on it.

-Andi
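
The drift problem described in this message can be illustrated with a
small toy model. This is only a sketch in Python, not kernel code: the
class and function names are hypothetical, and the 0.1% rate mismatch is
deliberately exaggerated so the effect shows up within a few simulated
seconds. Two per-CPU counters are converted to seconds with one assumed
global frequency, and time is read on alternating CPUs, roughly as a
task migrating between them would see it.

```python
class DriftingTsc:
    """Toy per-CPU cycle counter with its own tick rate (hypothetical model)."""

    def __init__(self, hz, offset_cycles=0):
        self.hz = hz
        self.offset_cycles = offset_cycles

    def read(self, true_seconds):
        # Cycles elapsed on this CPU after true_seconds of real time.
        return self.offset_cycles + int(true_seconds * self.hz)


def naive_seconds(tsc, true_seconds, assumed_hz):
    """Convert a counter reading to seconds assuming one global frequency."""
    return tsc.read(true_seconds) / assumed_hz


def sample_alternating(cpus, assumed_hz, steps, dt):
    """Read the time on alternating CPUs every dt seconds of real time."""
    return [naive_seconds(cpus[i % len(cpus)], i * dt, assumed_hz)
            for i in range(steps)]


def count_backward_steps(readings):
    """Number of consecutive readings where time appears to go backwards."""
    return sum(1 for prev, cur in zip(readings, readings[1:]) if cur < prev)


ASSUMED_HZ = 1_000_000_000
in_sync = [DriftingTsc(ASSUMED_HZ), DriftingTsc(ASSUMED_HZ)]
# Assumption for illustration: the second CPU ticks 0.1% slower.
drifting = [DriftingTsc(ASSUMED_HZ), DriftingTsc(999_000_000)]

short_run = sample_alternating(drifting, ASSUMED_HZ, steps=500, dt=0.001)
long_run = sample_alternating(drifting, ASSUMED_HZ, steps=3000, dt=0.001)
print("backward steps, short uptime:", count_backward_steps(short_run))
print("backward steps, long uptime: ", count_backward_steps(long_run))
```

In this model the short run shows no backward steps, while the longer
run does once the accumulated offset exceeds the sampling interval. That
matches the "initially mostly worked" observation above only in spirit;
real hardware drifts far more slowly than assumed here.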

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

> For netdev: I'm more and more thinking we should just avoid the problem
> completely and switch to "true end2end" timestamps. This means don't
> time stamp when a packet is received, but only when it is delivered
> to a socket.

This will work.

From the viewpoint of existing uses of timestamp by packet socket
this time is not worse. The only danger is violation of causality
(when forwarded packet or reply packet gets timestamp earlier than
original packet). This pathology was main reason why timestamp
is recorded early, before packet is demultiplexed in netif_receive_skb().
But it is not a practical problem: delivery to packet/raw sockets
is occasionally placed _before_ delivery to real protocol handlers.


> handler runs. Then the problem above would completely disappear. 

Well, not completely. Too slow clock source remains too slow clock source.
If it is so slow, that it results in "performance degradation", it just
should not be used at all, even such pariah as tcpdump wants to be fast.

Actually, I have a question. Why the subject is
"Network performance degradation from 2.6.11.12 to 2.6.16.20"?
I do not see beginning of the thread and cannot guess
why clock source degraded. :-)

Alexey
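
The causality concern raised in this message can be sketched with a
logical clock. This is a toy model in Python under stated assumptions,
not the kernel receive path: the function names are hypothetical, the
clock ticks once per step so only ordering matters, and the reply is
assumed to be stamped at the moment the protocol handler sends it.

```python
import itertools


def capture_stamps(stamp_at_delivery, tap_before_protocol):
    """Return the timestamps a sniffer would record for a packet and its reply.

    stamp_at_delivery:   stamp the original when it reaches the tap
                         (instead of once, early, on arrival).
    tap_before_protocol: the tap sees the original before the protocol
                         handler has processed it.
    """
    clock = itertools.count(1)
    arrival = next(clock)            # packet enters the receive path
    stamps = {}

    def tap_original():
        stamps["original"] = next(clock) if stamp_at_delivery else arrival

    def protocol_handler():
        # The stack answers immediately; the reply is stamped when sent.
        stamps["reply"] = next(clock)

    if tap_before_protocol:
        tap_original()
        protocol_handler()
    else:
        protocol_handler()
        tap_original()
    return stamps


def causality_ok(stamps):
    """True when the original packet is stamped before its reply."""
    return stamps["original"] < stamps["reply"]


for at_delivery in (False, True):
    for tap_first in (True, False):
        result = capture_stamps(at_delivery, tap_first)
        print(at_delivery, tap_first, result, causality_ok(result))
```

In this model the ordering only breaks when stamping happens at delivery
and the tap runs after the protocol handler, which is the combination
the message says does not occur in practice.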

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 17:19, Alan Cox wrote:
> On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
> > The only delay this would add would be the queueing time from the NIC
> > to the softirq. Do you really think that is that bad?
> 
> If you are trying to do things like network record/playback then you
> want the minimal delay. 

But it's not minimal. Maybe it was long ago when the code was designed
on a 3c509 but not with modern hardware: Think interrupt mitigation and NAPI. 

And with NAPI we tend to process the packets directly after they
are fetched out of the RX queue, so there is practically no delay
between driver seeing the packet and softirq seeing it.  All the queuing
is done either at hardware level or later at socket level.

> There's a reason the original timestamp code 
> supported the hardware setting the timestamp itself - we actually had a
> separate set of logic on a board that was doing the timestamping by
> watching the IRQ line of the NIC chip.

That would be fine too (because it will be likely fast), but unfortunately
I don't know of any driver that does that.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alan Cox

On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
> The only delay this would add would be the queueing time from the NIC
> to the softirq. Do you really think that is that bad?

If you are trying to do things like network record/playback then you
want the minimal delay. There's a reason the original timestamp code
supported the hardware setting the timestamp itself - we actually had a
separate set of logic on a board that was doing the timestamping by
watching the IRQ line of the NIC chip.

Alan


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen


> 
> People who run tcpdump want "wire" timestamps as close as possible.
> Yes, things get delayed with the IRQ path, DMA delays, IRQ
> mitigation and whatnot, but it's an order of magnitude worse if
> you delay to user read() since that introduces also the delay of
> the packet copies to userspace which are significantly larger than
> these hardware level delays.  If tcpdump gets swapped out, the
> timestamp delay can be on the order of several seconds making it
> totally useless.

My proposal wasn't to delay to user read, just to do the time stamp in socket
context. This means as soon as packet or RAW/UDP have looked up the socket and
can check a per socket flag do the time stamp.

The only delay this would add would be the queueing time from the NIC
to the softirq. Do you really think that is that bad?

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Miller

From: Andi Kleen <[EMAIL PROTECTED]>
Date: 18 Sep 2006 11:58:21 +0200

> For netdev: I'm more and more thinking we should just avoid the
> problem completely and switch to "true end2end" timestamps. This
> means don't time stamp when a packet is received, but only when it
> is delivered to a socket. The timestamp at receiving is a lie
> anyways because the network hardware can add an arbitrarily long delay
> before the driver interrupt handler runs. Then the problem above
> would completely disappear.

I don't think this is wise.

People who run tcpdump want "wire" timestamps as close as possible.
Yes, things get delayed with the IRQ path, DMA delays, IRQ
mitigation and whatnot, but it's an order of magnitude worse if
you delay to user read() since that introduces also the delay of
the packet copies to userspace which are significantly larger than
these hardware level delays.  If tcpdump gets swapped out, the
timestamp delay can be on the order of several seconds making it
totally useless.

Andi, you will need to find another solution to this problem :-)


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes:

[you seem to send your emails in a strange way that doesn't keep me in cc.
Please stop doing that.]

> On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote:
> > > > The x86-64 timer subsystem currently doesn't have clocksources
> > > > at all, but it supports TSC and some other timers.
> > > 
> > 
> > > until I hacked arch/i386/kernel/tsc.c
> > 
> > Then you don't use x86-64. 
> > 
> Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64
> by hacking some Makefiles and headers. 

The codebase for timing (and lots of other things) is quite different
between 32bit and 64bit. You're really surprised it doesn't work if you do
such things?

> But the question is, why stock 2.6.18-rc7 could not use TSC on its own?

x86-64 doesn't use the TSC when it deems it to not be reliable, which
is the case on your system.
 
> > > > > I've also had experience of unsynchronized TSC on dual-core Athlon,
> > > > > but it was cured by idle=poll.
> > > > 
> > > > You can use that, but it will make your system run quite hot 
> > > > and cost you a lot of powe^wmoney.
> > > 
> > > Here in Russia electric power is cheap compared with hardware upgrade.
> > 
> > It's not just electrical power - the hardware is more stressed and will
> > likely fail earlier too.  As a rule of thumb the hotter your hardware runs
> > the earlier it will fail.
> 
> What hardware exactly. Doesn't it affect only CPU? And they are not
> known to fail before any other components.

All hardware. It's basic physics.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote:
> > > The x86-64 timer subsystem currently doesn't have clocksources
> > > at all, but it supports TSC and some other timers.
> > 
> 
> > until I hacked arch/i386/kernel/tsc.c
> 
> Then you don't use x86-64. 
> 
Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64
by hacking some Makefiles and headers. 

But the question is, why stock 2.6.18-rc7 could not use TSC on its own?

> > > > I've also had experience of unsynchronized TSC on dual-core Athlon,
> > > > but it was cured by idle=poll.
> > > 
> > > You can use that, but it will make your system run quite hot 
> > > and cost you a lot of powe^wmoney.
> > 
> > Here in Russia electric power is cheap compared with hardware upgrade.
> 
> It's not just electrical power - the hardware is more stressed and will
> likely fail earlier too.  As a rule of thumb the hotter your hardware runs
> the earlier it will fail.

What hardware exactly. Doesn't it affect only CPU? And they are not
known to fail before any other components.
> > 
> > > > It seems that dhcpd3 makes the box timestamping incoming packets,
> > > > killing the performance. I think that combining router and DHCP server
> > > > on a same box is a legitimate situation, isn't it?
> > > 
> > > Yes.  Good point. DHCP is broken and needs to be fixed. Can you
> > > send a bug report to the DHCP maintainers? 
> > > 
> > > iirc the problem used to be that RAW sockets didn't do something
> > > they need them to do. Maybe we can fix that now.
> > 
> > Will try some days later.
> > 
> > Oh, and pppoe-server uses some kind of packet socket too, doesn't it?
> 
> The problem is not really using a packet socket, but using the SIOCGSTAMP
> ioctl on it. As soon as someone issues it the system will take accurate 
> time stamps for each incoming packet until the respective socket is closed.
> 
> Quick fix is to change user space to use gettimeofday() when it reads
> the packet instead.

Ok, thank you, I now understand.

> 
> For netdev: I'm more and more thinking we should just avoid the problem
> completely and switch to "true end2end" timestamps. This means don't
> time stamp when a packet is received, but only when it is delivered
> to a socket. The timestamp at receiving is a lie anyways because
> the network hardware can add an arbitrarily long delay before the driver
> interrupt handler runs. Then the problem above would completely disappear.
> Comments? Opinions? 
> 
> -Andi
> 
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes:

> On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote:
> > > I just found out that TSC clocksource is not implemented on x86-64.
> > > Kernel version 2.6.18-rc7, is it true?
> > 
> > The x86-64 timer subsystem currently doesn't have clocksources
> > at all, but it supports TSC and some other timers.
> 

> until I hacked arch/i386/kernel/tsc.c

Then you don't use x86-64. 

> 
> > > I've also had experience of unsynchronized TSC on dual-core Athlon,
> > > but it was cured by idle=poll.
> > 
> > You can use that, but it will make your system run quite hot 
> > and cost you a lot of powe^wmoney.
> 
> Here in Russia electric power is cheap compared with hardware upgrade.

It's not just electrical power - the hardware is more stressed and will
likely fail earlier too.  As a rule of thumb the hotter your hardware runs
the earlier it will fail.

> 
> > > It seems that dhcpd3 makes the box timestamping incoming packets,
> > > killing the performance. I think that combining router and DHCP server
> > > on a same box is a legitimate situation, isn't it?
> > 
> > Yes.  Good point. DHCP is broken and needs to be fixed. Can you
> > send a bug report to the DHCP maintainers? 
> > 
> > iirc the problem used to be that RAW sockets didn't do something
> > they need them to do. Maybe we can fix that now.
> 
> Will try some days later.
> 
> Oh, and pppoe-server uses some kind of packet socket too, doesn't it?

The problem is not really using a packet socket, but using the SIOCGSTAMP
ioctl on it. As soon as someone issues it the system will take accurate 
time stamps for each incoming packet until the respective socket is closed.

Quick fix is to change user space to use gettimeofday() when it reads
the packet instead.

For netdev: I'm more and more thinking we should just avoid the problem
completely and switch to "true end2end" timestamps. This means don't
time stamp when a packet is received, but only when it is delivered
to a socket. The timestamp at receiving is a lie anyways because
the network hardware can add an arbitrarily long delay before the driver interrupt
handler runs. Then the problem above would completely disappear. 
Comments? Opinions? 

-Andi
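
The user-space quick fix mentioned above (stamp the packet with
gettimeofday() when it is read, rather than asking the kernel via
SIOCGSTAMP) might look roughly like this. It is a minimal sketch in
Python, not code from dhcpd or any other program in this thread:
time.time() stands in for gettimeofday(), the helper name is
hypothetical, and a local datagram socket pair stands in for the packet
socket so the example runs without privileges.

```python
import socket
import time


def recv_with_user_timestamp(sock, bufsize=2048):
    """Receive one datagram and stamp it in user space right after the read.

    The stamp is taken by the application itself, so the kernel never has
    to be asked for a per-packet receive timestamp on this socket.
    """
    data = sock.recv(bufsize)
    return data, time.time()


def demo():
    # Assumption: an AF_UNIX datagram pair is a stand-in for a packet socket.
    rx, tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    rx.settimeout(2.0)
    try:
        tx.send(b"hello")
        return recv_with_user_timestamp(rx)
    finally:
        rx.close()
        tx.close()


payload, stamp = demo()
print(payload, stamp)
```

The trade-off is the one discussed elsewhere in this thread: a stamp
taken at read time includes scheduling and copy delays, which is
acceptable for something like DHCP but not for a packet sniffer.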

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote:
> > I just found out that TSC clocksource is not implemented on x86-64.
> > Kernel version 2.6.18-rc7, is it true?
> 
> The x86-64 timer subsystem currently doesn't have clocksources
> at all, but it supports TSC and some other timers.

Hm. On my box, TSC did not work until I hacked arch/i386/kernel/tsc.c
into it.
Neither clock=tsc nor clocksource=tsc had any effect.

> > I've also had experience of unsynchronized TSC on dual-core Athlon,
> > but it was cured by idle=poll.
> 
> You can use that, but it will make your system run quite hot 
> and cost you a lot of powe^wmoney.

Here in Russia electric power is cheap compared with hardware upgrade.

> > It seems that dhcpd3 makes the box timestamping incoming packets,
> > killing the performance. I think that combining router and DHCP server
> > on a same box is a legitimate situation, isn't it?
> 
> Yes.  Good point. DHCP is broken and needs to be fixed. Can you
> send a bug report to the DHCP maintainers? 
> 
> iirc the problem used to be that RAW sockets didn't do something
> they need them to do. Maybe we can fix that now.

Will try some days later.

Oh, and pppoe-server uses some kind of packet socket too, doesn't it?

> 
> If that's not possible we can probably add a ioctl or similar
> to disable time stamping for packet sockets (DHCP shouldn't really
> need a fine grained time stamp). dhcpcd would need to use that then.

I would like some sysctl very much, too. Let tcpdump show imprecise
timestamps when forwarding performance is more important.
After all, Ciscos don't have any tcpdump analog at all, and they are 
very popular :)

> 
> Keep me updated what they say.
> 
> -Andi
> 
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes:

> On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote:
> > 
> > > If you use "pmtmr" try to reboot with kernel option "clock=tsc".
> > 
> > That's dangerous advice - when the system chooses not to use
> > TSC it often has a reason.
> 
> I just found out that TSC clocksource is not implemented on x86-64.
> Kernel version 2.6.18-rc7, is it true?

The x86-64 timer subsystem currently doesn't have clocksources
at all, but it supports TSC and some other timers.

> 
> I've also had experience of unsynchronized TSC on dual-core Athlon,
> but it was cured by idle=poll.

You can use that, but it will make your system run quite hot 
and cost you a lot of powe^wmoney.

> It seems that dhcpd3 makes the box timestamping incoming packets,
> killing the performance. I think that combining router and DHCP server
> on a same box is a legitimate situation, isn't it?

Yes.  Good point. DHCP is broken and needs to be fixed. Can you
send a bug report to the DHCP maintainers? 

iirc the problem used to be that RAW sockets didn't do something
they need them to do. Maybe we can fix that now.

If that's not possible we can probably add a ioctl or similar
to disable time stamping for packet sockets (DHCP shouldn't really
need a fine grained time stamp). dhcpcd would need to use that then.

Keep me updated what they say.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-16 Thread Vladimir B. Savkin

On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote:
> 
> > If you use "pmtmr" try to reboot with kernel option "clock=tsc".
> 
> That's dangerous advice - when the system chooses not to use
> TSC it often has a reason.

I just found out that TSC clocksource is not implemented on x86-64.
Kernel version 2.6.18-rc7, is it true?

I've also had experience of unsynchronized TSC on dual-core Athlon,
but it was cured by idle=poll.

> 
> > 
> > On my Opteron AMD system i normally can route 400 kpps, but with 
> > timesource "pmtmr" i could only route around 83 kpps.  (I found the timer 
> > to be the issue by using oprofile).
> 
> Unless you're using packet sniffing or any other application
> that requests time stamps on a socket then the timer shouldn't 
> make much difference. Incoming packets are only time stamped
> when someone asks for the timestamps.
> 
It seems that dhcpd3 makes the box timestamping incoming packets,
killing the performance. I think that combining router and DHCP server
on a same box is a legitimate situation, isn't it?

~
:wq
With best regards, 
   Vladimir Savkin. 
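
For scale, the 400 kpps and 83 kpps figures quoted in this message can
be turned into an average time budget per packet. This is plain
arithmetic in Python, not a measurement. It assumes, as the oprofile
remark suggests but does not prove, that the box is CPU bound in both
cases and that the whole difference comes from reading the clock.

```python
def per_packet_budget_us(packets_per_second):
    """Average time available per packet, in microseconds, at a given rate."""
    return 1_000_000 / packets_per_second


tsc_rate = 400_000     # "400 kpps" reported with the TSC
pmtmr_rate = 83_000    # "around 83 kpps" reported with pmtmr

tsc_budget = per_packet_budget_us(tsc_rate)
pmtmr_budget = per_packet_budget_us(pmtmr_rate)
extra_per_packet = pmtmr_budget - tsc_budget

print(f"TSC:   {tsc_budget:.2f} us per packet")
print(f"pmtmr: {pmtmr_budget:.2f} us per packet")
print(f"extra: {extra_per_packet:.2f} us per packet")
```

Under those assumptions the routing path spends on the order of several
extra microseconds per packet with pmtmr, which is consistent with the
clock read showing up near the top of a profile.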


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-07-10 Thread Jesper Dangaard Brouer




On Tue, 4 Jul 2006, Andi Kleen wrote:

> On Tuesday 04 July 2006 13:41, Jesper Dangaard Brouer wrote:
>
> > Actually the change happens between kernel version 2.6.15 and 2.6.16.
>
> The timestamp optimizations are older. Don't remember the exact release,
> but earlier 2.6.

What I'm saying is that, with the same Config file, some Kconfig option
changed between 2.6.15 and 2.6.16, that made my system use pmtmr for high-res
timesource instead of TSC.

> > And
> > is a result of Andi's changes to arch/x86_64/Kconfig and
> > drivers/acpi/Kconfig, which "allows/activates" the use of the timer on
> > x86_64.
>
> Not sure what you mean here?

I think, that the changes you made to the files "arch/x86_64/Kconfig" and
"drivers/acpi/Kconfig", caused this change...


commit: e78256b8f3e2850ad55c2d69e1429e6c2607afd3

http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commitdiff;h=e78256b8f3e2850ad55c2d69e1429e6c2607afd3

and maybe
commit: 2eb1bdbad89b19c99f8ac1de1492cdabbff6b3d3

http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commitdiff;h=2eb1bdbad89b19c99f8ac1de1492cdabbff6b3d3


Regards
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-07-04 Thread Andi Kleen

On Tuesday 04 July 2006 13:41, Jesper Dangaard Brouer wrote:
> 
> On Mon, 26 Jun 2006, Andi Kleen wrote:
> 
> >> I encountered the same problem on a dual core opteron equipped with a
> >> broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
> >> as the clock source, but the time jumped back and forth, so I changed
> >> it to 'notsc', then the performance dropped dramatically to around the
> >> same value as above with one CPU saturated. I suspect that the clock
> >> precision is needed by the tg3 driver to correctly decide to switch to
> >> polling mode, but unfortunately, the performance drop rendered the
> >> solution so much unusable that I finally decided to use it only in
> >> uniprocessor with TSC enabled.
> >
> > 2.6 is more clever at this than 2.4. In particular it does the timestamp
> > for each packet only when actually needed, which is relatively rare.
> >
> > Old experiences do not always apply to new kernels.
> 
> Note, that I experienced this problem on 2.6.
> 
> Actually the change happens between kernel version 2.6.15 and 2.6.16.

The timestamp optimizations are older. Don't remember the exact release,
but earlier 2.6.

> And  
> is a result of Andi's changes to arch/x86_64/Kconfig and 
> drivers/acpi/Kconfig, which "allows/activates" the use of the timer on 
> x86_64.

Not sure what you mean here?

2.6.18 will likely be more aggressive at using the TSC on i386 on
Intel systems where possible, but x86-64 did this already for a long time. 
When x86-64 uses non TSC then it's because using the TSC is not safe.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-07-04 Thread Jesper Dangaard Brouer



On Mon, 26 Jun 2006, Andi Kleen wrote:

> > I encountered the same problem on a dual core opteron equipped with a
> > broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
> > as the clock source, but the time jumped back and forth, so I changed
> > it to 'notsc', then the performance dropped dramatically to around the
> > same value as above with one CPU saturated. I suspect that the clock
> > precision is needed by the tg3 driver to correctly decide to switch to
> > polling mode, but unfortunately, the performance drop rendered the
> > solution so much unusable that I finally decided to use it only in
> > uniprocessor with TSC enabled.
>
> 2.6 is more clever at this than 2.4. In particular it does the timestamp
> for each packet only when actually needed, which is relatively rare.
>
> Old experiences do not always apply to new kernels.

Note, that I experienced this problem on 2.6.

Actually the change happens between kernel version 2.6.15 and 2.6.16. And
is a result of Andi's changes to arch/x86_64/Kconfig and
drivers/acpi/Kconfig, which "allows/activates" the use of the timer on
x86_64.


Cheers,
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Andi Kleen


> I encountered the same problem on a dual core opteron equipped with a
> broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
> as the clock source, but the time jumped back and forth, so I changed
> it to 'notsc', then the performance dropped dramatically to around the
> same value as above with one CPU saturated. I suspect that the clock
> precision is needed by the tg3 driver to correctly decide to switch to
> polling mode, but unfortunately, the performance drop rendered the
> solution so much unusable that I finally decided to use it only in
> uniprocessor with TSC enabled.

2.6 is more clever at this than 2.4. In particular it does the timestamp
for each packet only when actually needed, which is relatively rare.

Old experiences do not always apply to new kernels.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Bill Fink

On Sun, 25 Jun 2006, Harry Edmon wrote:

> I understand the saying "beggars can't be choosers", but I have heard nothing
> on this issue since June 19th.  Does anyone have any ideas on what is going on?
> Is there more information I can collect that would help diagnose this problem?
> And again, thanks for any and all help!

Harry,

I'd suggest checking all the ethtool configuration settings
(ethtool -a, -c, -g, -k) and statistics (ethtool -S) for both
the working and problematic kernels, and then comparing them
to see if anything jumps out at you.  Also compare ifconfig
settings and dmesg output.  Check /proc/interrupts to see if
there is any difference with the interrupt routing.  Check
sysctl.conf and rc.local for any special system configuration
or device settings that might differ between the systems.

The one thing that has caused me a lot of network performance
issues on e1000 is having TSO enabled, so if that is enabled
(check with ethtool -k), then I'd try disabling it to see if
that helps.

-Hope this helps

-Bill

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Willy Tarreau

Hi Andi,

On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote:
> 
> > If you use "pmtmr" try to reboot with kernel option "clock=tsc".
> 
> That's dangerous advice - when the system chooses not to use
> TSC it often has a reason.
> 
> > 
> > On my Opteron AMD system i normally can route 400 kpps, but with 
> > timesource "pmtmr" i could only route around 83 kpps.  (I found the timer 
> > to be the issue by using oprofile).
> 
> Unless you're using packet sniffing or any other application
> that requests time stamps on a socket then the timer shouldn't 
> make much difference. Incoming packets are only time stamped
> when someone asks for the timestamps.

I encountered the same problem on a dual core opteron equipped with a
broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
as the clock source, but the time jumped back and forth, so I changed
it to 'notsc', then the performance dropped dramatically to around the
same value as above with one CPU saturated. I suspect that the clock
precision is needed by the tg3 driver to correctly decide to switch to
polling mode, but unfortunately, the performance drop rendered the
solution so much unusable that I finally decided to use it only in
uniprocessor with TSC enabled.

> -Andi

Regards,
Willy


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-25 Thread Harry Edmon

I understand the saying "beggars can't be choosers", but I have heard nothing on 
this issue since June 19th.  Does anyone have any ideas on what is going on?  Is 
there more information I can collect that would help diagnose this problem?  And 
again, thanks for any and all help!

--
 Dr. Harry Edmon                 E-MAIL: [EMAIL PROTECTED]
 206-543-0547                            [EMAIL PROTECTED]
 Dept of Atmospheric Sciences    FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Herbert Xu

Harry Edmon <[EMAIL PROTECTED]> wrote:
> 
> That did not help.  I have 1 minute outputs from tcpdump under both 2.6.11.12
> and 2.6.16.20.  You will see a large size difference between the files.
> Since the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web
> instead of via attachments.   Look at:
> 
> http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
> http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

The latter shows that it took 40ms to generate an ACK.  What does
'vmstat 1' show while this is happening?
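
One way to spot delays like this in a text capture is to scan for large gaps between consecutive lines. What follows is only a minimal sketch, assuming tcpdump's default `HH:MM:SS.ffffff` timestamp at the start of each line. The name `find_gaps` and the 30 ms threshold are illustrative and not from this thread, and the function reports gaps between any two consecutive packets, not specifically data-to-ACK delays.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold_ms=30.0):
    """Return (gap_ms, previous_line, line) for consecutive packet lines
    whose timestamps are more than threshold_ms apart."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        try:
            # Assumed format: tcpdump's default time-of-day stamp.
            stamp = datetime.strptime(fields[0], "%H:%M:%S.%f")
        except ValueError:
            continue  # not a packet line, skip it
        if prev_time is not None:
            delta = stamp - prev_time
            if delta < timedelta(0):
                delta += timedelta(days=1)  # capture crossed midnight
            gap_ms = delta.total_seconds() * 1000.0
            if gap_ms > threshold_ms:
                gaps.append((gap_ms, prev_line, line))
        prev_time, prev_line = stamp, line
    return gaps
```

Running it over both one-minute captures and comparing the number and size of the reported gaps would be one rough way to compare the two kernels.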
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Andi Kleen

On Monday 19 June 2006 19:34, Chris Friesen wrote:
> Andi Kleen wrote:
> > Incoming packets are only time stamped
> > when someone asks for the timestamps.
>
> Doesn't that add scheduling latency to the timestamps?  Or is it a flag
> that gets set to trigger timestamping at packet arrival?

It's a flag (or more precisely, a global counter).
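
To illustrate the idea, here is a toy model in Python. It is not kernel code. A counter tracks how many consumers currently want timestamps, and the receive path reads the clock only while that counter is non-zero. All names (`TimestampGate`, `enable`, `disable`, `receive`) are made up for this sketch.

```python
import time


class TimestampGate:
    """Toy model: stamp packets on arrival only while someone wants stamps."""

    def __init__(self, clock=time.time):
        self._wanted = 0          # number of consumers asking for timestamps
        self._clock = clock       # clock source; reading it may be expensive
        self.clock_reads = 0      # how many times the clock was actually read

    def enable(self):
        """A consumer (e.g. a sniffer) starts asking for timestamps."""
        self._wanted += 1

    def disable(self):
        """A consumer stops asking for timestamps."""
        if self._wanted > 0:
            self._wanted -= 1

    def receive(self, payload):
        """Receive path: read the clock only if the counter is non-zero."""
        stamp = None
        if self._wanted > 0:
            stamp = self._clock()
            self.clock_reads += 1
        return {"payload": payload, "stamp": stamp}
```

In this model the stamp is taken in the receive path itself, so it reflects arrival time rather than the time at which the consumer gets scheduled, and a slow clock costs nothing while the counter is zero.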

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Jesper Dangaard Brouer



On Mon, 19 Jun 2006, Andi Kleen wrote:

>> If you use "pmtmr" try to reboot with kernel option "clock=tsc".
>
> That's dangerous advice - when the system chooses not to use
> TSC it often has a reason.

Sorry, it was not general advice, just something to try out.  It really 
solved my network performance issue...

>> On my Opteron AMD system I normally can route 400 kpps, but with
>> timesource "pmtmr" I could only route around 83 kpps.  (I found the timer
>> to be the issue by using oprofile).
>
> Unless you're using packet sniffing or any other application
> that requests time stamps on a socket then the timer shouldn't
> make much difference. Incoming packets are only time stamped
> when someone asks for the timestamps.

I do not know what caused the issue on my machine, but I can look into it 
if you would like to know.

I do have VLAN interfaces on the machine and it seems that eth1 runs in 
PROMISC mode (eth1.xxx does not).  Could it be caused by that?


Hilsen
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Chris Friesen


Andi Kleen wrote:

> Incoming packets are only time stamped
> when someone asks for the timestamps.

Doesn't that add scheduling latency to the timestamps?  Or is it a flag 
that gets set to trigger timestamping at packet arrival?


Chris

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Harry Edmon




Jesper Dangaard Brouer wrote:

> Harry Edmon <[EMAIL PROTECTED]> wrote:
>
>> I have a system with a strange network performance degradation from
>> 2.6.11.12 to most recent kernels including 2.6.16.20 and
>> 2.6.17-rc6. The system has dual single-core Xeons with
>> hyperthreading on.
>
> Hi Harry
>
> Can you check which "high-res timesource" you are using?
>
> In the kernel log look for:
>  kernel: Using tsc for high-res timesource
>  kernel: Using pmtmr for high-res timesource
>
> I have experienced some network performance degradation when using the
> "pmtmr" timesource, on an Opteron AMD system.  It seems that the
> default timesource changed between 2.6.15 and 2.6.16.
>
> If you use "pmtmr" try to reboot with kernel option "clock=tsc".
>
> On my Opteron AMD system I normally can route 400 kpps, but with
> timesource "pmtmr" I could only route around 83 kpps.  (I found the
> timer to be the issue by using oprofile).

We have CONFIG_HPET_TIMER=y, so we do not see these messages.



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Andi Kleen


> If you use "pmtmr" try to reboot with kernel option "clock=tsc".

That's dangerous advice - when the system chooses not to use
TSC it often has a reason.

> 
> On my Opteron AMD system i normally can route 400 kpps, but with 
> timesource "pmtmr" i could only route around 83 kpps.  (I found the timer 
> to be the issue by using oprofile).

Unless you're using packet sniffing or any other application
that requests time stamps on a socket then the timer shouldn't 
make much difference. Incoming packets are only time stamped
when someone asks for the timestamps.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Jesper Dangaard Brouer




Harry Edmon <[EMAIL PROTECTED]> wrote:

> I have a system with a strange network performance degradation from
> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
> The system has dual single-core Xeons with hyperthreading on.



Hi Harry

Can you check which "high-res timesource" you are using?

In the kernel log look for:
 kernel: Using tsc for high-res timesource
 kernel: Using pmtmr for high-res timesource
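
A small sketch for pulling the timesource out of a saved kernel log, assuming the message format shown above. The function name `find_timesource` is illustrative, and where the log lives on a given system is left to the caller.

```python
import re

# Matches lines such as "kernel: Using tsc for high-res timesource".
_TIMESOURCE_RE = re.compile(r"Using (\S+) for high-res timesource")


def find_timesource(log_lines):
    """Return the last reported high-res timesource, or None if no
    matching line is present in the log."""
    found = None
    for line in log_lines:
        match = _TIMESOURCE_RE.search(line)
        if match:
            found = match.group(1)
    return found
```

The function accepts any iterable of lines, so an open log file can be passed directly. It returns None when the kernel did not log such a message at all.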

I have experienced some network performance degradation when using the 
"pmtmr" timesource, on an Opteron AMD system.  It seems that the default 
timesource changed between 2.6.15 and 2.6.16.


If you use "pmtmr" try to reboot with kernel option "clock=tsc".

On my Opteron AMD system I can normally route 400 kpps, but with 
timesource "pmtmr" I could only route around 83 kpps.  (I found the timer 
to be the issue by using oprofile.)
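
To put those two rates in per-packet terms, here is a quick worked calculation. It is plain arithmetic on the figures above. Attributing the whole difference to the clock read is an assumption made for illustration, not a separate measurement.

```python
def per_packet_us(packets_per_second):
    """Average time available per packet, in microseconds, at a given rate."""
    return 1_000_000 / packets_per_second


tsc_budget = per_packet_us(400_000)           # 2.5 us per packet
pmtmr_budget = per_packet_us(83_000)          # roughly 12 us per packet
extra_per_packet = pmtmr_budget - tsc_budget  # roughly 9.5 us per packet
```

Under that assumption, the slower timesource adds on the order of 9.5 microseconds to the handling of every packet.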



Cheers,
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Harry Edmon


Stephen Hemminger wrote:

> Does this fix it?
>    # sysctl -w net.ipv4.tcp_abc=0


That did not help.  I have 1 minute outputs from tcpdump under both 2.6.11.12 
and 2.6.16.20.  You will see a large size difference between the files.  Since 
the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web instead 
of via attachments.   Look at:


http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

And again, thanks to all of you for looking into this.

--
 Dr. Harry Edmon                  E-MAIL: [EMAIL PROTECTED]
 206-543-0547                             [EMAIL PROTECTED]
 Dept of Atmospheric Sciences     FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-18 Thread Harry Edmon


Stephen Hemminger wrote:

> Does this fix it?
>    # sysctl -w net.ipv4.tcp_abc=0


Thanks for the suggestion.  I will give it a try later tonight.  Also Andrew - 
sorry for the incorrect placement of my follow-up comments.  I do appreciate 
everyone's help in figuring this out.


--
 Dr. Harry Edmon                  E-MAIL: [EMAIL PROTECTED]
 206-543-0547                             [EMAIL PROTECTED]
 Dept of Atmospheric Sciences     FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Stephen Hemminger


Andrew Morton wrote:

> On Sat, 17 Jun 2006 16:23:34 -0700
> Harry Edmon <[EMAIL PROTECTED]> wrote:
>
>> Andrew Morton wrote:
>>
>>> On Fri, 16 Jun 2006 09:01:23 -0700
>>> Harry Edmon <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have a system with a strange network performance degradation from
>>>> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
>>>> The system has dual single-core Xeons with hyperthreading on.   The
>>>> application is the LDM system from UCAR/Unidata
>>>> (http://www.unidata.ucar.edu/software/ldm).   This system requests
>>>> weather data from a variety of systems using RPC calls over a reserved
>>>> TCP port (388), puts them into a memory mapped queue file, and then
>>>> sends the data out to a variety of downstream requesting systems, again
>>>> using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way
>>>> behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have
>>>> tried an experiment with a 2.6.17-rc6 system where it just does the
>>>> ingestion, and not the downstream distribution, and it is able to keep
>>>> up.   I would really appreciate any pointers as to where the problem may
>>>> be and how to diagnose it.  I have attached the config files from both
>>>> kernels and the sysctl.conf file I am using.   I have also included the
>>>> output from "netstat -s" on the 2.6.16.20 system during a time when it
>>>> was having problems.
>>>
>>> (added netdev)
>>>
>>> A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
>>> with that in the past.
>>>
>>> Perhaps a tcpdump of the net traffic will help to determine what's going on.
>
> [ edit, edit - please don't top-post ]
>
>> I assume you are talking about using TCP_NODELAY as a socket option within the
>> LDM software.  I could give that a try.
>
> The use of TCP_NODELAY caused problems with the JVM debugger.  I'm not
> suggesting that enabling it will fix anything here.
>
>> There is a lot of traffic on this node, on the order of 2000 packets in and out
>> per second, so the tcpdump output will grow pretty fast.  How long a tcpdump
>> would be useful, and what options would you suggest?
>
> I don't know, frankly - first one needs to develop some sort of theory,
> then use the diagnostic tools to prove or disprove that theory.  And I
> don't have a theory.
>
> I guess a simple one-second bare `tcpdump -i eth0' would be a starting
> point.  Perhaps compare the output of that with the output from a
> correctly-operating kernel, see if anything suggests itself.  That might
> also give us something which the networking developers can use.
  

Does this fix it?
   # sysctl -w net.ipv4.tcp_abc=0
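
To check the current value before changing it, the setting can be read through /proc. This is a minimal sketch, assuming the usual mapping of dotted sysctl names to paths under /proc/sys. `read_sysctl` is an illustrative helper, not an existing tool, and it does not handle names whose components themselves contain dots.

```python
import os


def read_sysctl(name, proc_root="/proc/sys"):
    """Return the value of a dotted sysctl name as a string, or None if
    the entry does not exist or cannot be read on this kernel."""
    path = os.path.join(proc_root, *name.split("."))
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None
```

For example, `read_sysctl("net.ipv4.tcp_abc")` returns the current setting as text, or None on a kernel that does not provide that entry.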


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Andrew Morton

On Sat, 17 Jun 2006 16:23:34 -0700
Harry Edmon <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Fri, 16 Jun 2006 09:01:23 -0700
> > Harry Edmon <[EMAIL PROTECTED]> wrote:
> > 
> >> I have a system with a strange network performance degradation from 
> >> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.   
> >> The system has dual single-core Xeons with hyperthreading on.   The 
> >> application is the LDM system from UCAR/Unidata 
> >> (http://www.unidata.ucar.edu/software/ldm).   This system requests 
> >> weather data from a variety of systems using RPC calls over a reserved 
> >> TCP port (388), puts them into a memory mapped queue file, and then 
> >> sends the data out to a variety of downstream requesting systems, again 
> >> using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way 
> >> behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have 
> >> tried an experiment with a 2.6.17-rc6 system where it just does the 
> >> ingestion, and not the downstream distribution, and it is able to keep 
> >> up.   I would really appreciate any pointers as to where the problem may 
> >> be and how to diagnose it.  I have attached the config files from both 
> >> kernels and the sysctl.conf file I am using.   I have also included the 
> >> output from "netstat -s" on the 2.6.16.20 system during a time when it 
> >> was having problems.
> >>
> > 
> > (added netdev)
> > 
> > A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
> > with that in the past.
> > 
> > Perhaps a tcpdump of the net traffic will help to determine what's going on.
> 

[ edit, edit - please don't top-post ]

> I assume you are talking about using TCP_NODELAY as a socket option within
> the LDM software.  I could give that a try.

The use of TCP_NODELAY caused problems with the JVM debugger.  I'm not
suggesting that enabling it will fix anything here.

> 
> There is a lot of traffic on this node, on the order of 2000 packets in and
> out per second, so the tcpdump output will grow pretty fast.  How long a
> tcpdump would be useful, and what options would you suggest?

I don't know, frankly - first one needs to develop some sort of theory,
then use the diagnostic tools to prove or disprove that theory.  And I
don't have a theory.

I guess a simple one-second bare `tcpdump -i eth0' would be a starting
point.  Perhaps compare the output of that with the output from a
correctly-operating kernel, see if anything suggests itself.  That might
also give us something which the networking developers can use.


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Harry Edmon

I assume you are talking about using TCP_NODELAY as a socket option within the 
LDM software.  I could give that a try.
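
For reference, setting that option on a TCP socket looks roughly like this. It is a minimal Python sketch of the standard socket call only. How and where LDM creates its sockets is not shown in this thread, so the function name and its placement are illustrative.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns the option on; small writes are then sent
    # without waiting to be coalesced with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The option can be read back with `getsockopt` on the same level and name to confirm it took effect.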


There is a lot of traffic on this node, on the order of 2000 packets in and out 
per second, so the tcpdump output will grow pretty fast.  How long a tcpdump 
would be useful, and what options would you suggest?


I should also note that my network interfaces are Intel, using the latest e1000 
driver.



Andrew Morton wrote:

> On Fri, 16 Jun 2006 09:01:23 -0700
> Harry Edmon <[EMAIL PROTECTED]> wrote:
>
>> I have a system with a strange network performance degradation from
>> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
>> The system has dual single-core Xeons with hyperthreading on.   The
>> application is the LDM system from UCAR/Unidata
>> (http://www.unidata.ucar.edu/software/ldm).   This system requests
>> weather data from a variety of systems using RPC calls over a reserved
>> TCP port (388), puts them into a memory mapped queue file, and then
>> sends the data out to a variety of downstream requesting systems, again
>> using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way
>> behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have
>> tried an experiment with a 2.6.17-rc6 system where it just does the
>> ingestion, and not the downstream distribution, and it is able to keep
>> up.   I would really appreciate any pointers as to where the problem may
>> be and how to diagnose it.  I have attached the config files from both
>> kernels and the sysctl.conf file I am using.   I have also included the
>> output from "netstat -s" on the 2.6.16.20 system during a time when it
>> was having problems.
>
> (added netdev)
>
> A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
> with that in the past.
>
> Perhaps a tcpdump of the net traffic will help to determine what's going on.



--
 Dr. Harry Edmon                  E-MAIL: [EMAIL PROTECTED]
 206-543-0547                             [EMAIL PROTECTED]
 Dept of Atmospheric Sciences     FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-17 Thread Andrew Morton

On Fri, 16 Jun 2006 09:01:23 -0700
Harry Edmon <[EMAIL PROTECTED]> wrote:

> I have a system with a strange network performance degradation from 
> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.   
> The system has dual single-core Xeons with hyperthreading on.   The 
> application is the LDM system from UCAR/Unidata 
> (http://www.unidata.ucar.edu/software/ldm).   This system requests 
> weather data from a variety of systems using RPC calls over a reserved 
> TCP port (388), puts them into a memory mapped queue file, and then 
> sends the data out to a variety of downstream requesting systems, again 
> using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way 
> behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have 
> tried an experiment with a 2.6.17-rc6 system where it just does the 
> ingestion, and not the downstream distribution, and it is able to keep 
> up.   I would really appreciate any pointers as to where the problem may 
> be and how to diagnose it.  I have attached the config files from both 
> kernels and the sysctl.conf file I am using.   I have also included the 
> output from "netstat -s" on the 2.6.16.20 system during a time when it 
> was having problems.
> 

(added netdev)

A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
with that in the past.

Perhaps a tcpdump of the net traffic will help to determine what's going on.
