Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)
On Tue, Mar 06, 2007 at 04:16:24PM +0100, Eric Dumazet wrote:
>
> It would be better to name the tunable "disable_timestamps", default 0 of
> course

I agree. If the networking maintainers are interested, I can certainly
prepare a patch. But IMO some way to force TSC usage on x86_64 would be
even better.

> It would better describe what your patch is actually doing : Even if a
> tcpdump is running (so asking for timestamps), it wont have them because
> the sysctl disabled them.

Well, tcpdump will have timestamps, but taken at the wrong moment.
But some other applications (those that use ip_queue, ulog etc.) will not,
as I understand it.

>
> Thank you
>
~
:wq
With best regards,
Vladimir Savkin.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)
On Tuesday 06 March 2007 15:43, Vladimir B. Savkin wrote:
> On Tue, Mar 06, 2007 at 03:38:44PM +0100, Eric Dumazet wrote:
> > 2) "accurate_timestamps" is misleading.
> > Should be "disable_timestamps"
>
> Not, if default is 1, as in my patch.

Yes, I saw this. I should write more words next time :)

Full explanation:
--
If your tunable is named "accurate_timestamps", then a 0 value would mean:
use a low precision timestamp (based on xtime for example) instead of a
full resolution one.

This is not what your patch does (it could do that, but beware that
net-2.6.22 now includes ktime_t timestamping).

So:
--
It would be better to name the tunable "disable_timestamps", default 0 of
course.

It would better describe what your patch is actually doing: even if a
tcpdump is running (so asking for timestamps), it won't have them because
the sysctl disabled them.

Thank you
Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)
On Tue, Mar 06, 2007 at 03:38:44PM +0100, Eric Dumazet wrote:
> 2) "accurate_timestamps" is misleading.
> Should be "disable_timestamps"

Not, if default is 1, as in my patch.

~
:wq
With best regards,
Vladimir Savkin.
Re: Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)
On Tuesday 06 March 2007 14:25, Vladimir B. Savkin wrote:
>  	},
> +	{
> +		.ctl_name	= NET_CORE_ACCURATE_TIMESTAMPS,
> +		.procname	= "accurate_timestamps",
> +		.data		= &sysctl_accurate_timestamps,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= &proc_dointvec
> +	},
>  	{ .ctl_name = 0 }
>  };
>
>
> May I ask about integrating this or a similar solution for those
> like me who value routing performance (with bind9 running) over the
> minor convenience of having tcpdump always display accurate
> timestamps?
>

Quite frankly, I don't like this patch:

1) Fix applications, do not bloat the kernel.

2) "accurate_timestamps" is misleading.
   Should be "disable_timestamps"
Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20)
On Fri, Sep 22, 2006 at 09:51:09AM -0700, Rick Jones wrote:
> >That came from named. It opens lots of sockets with SIOCGSTAMP.
> >No idea what it needs that many for.
>
> IIRC ISC BIND named opens a socket for each IP it finds on the system.
> Presumably in this way it "knows" implicitly the destination IP without
> using platform-specific recvfrom/whatever extensions and gets some
> additional parallelism in the stack on SMP systems.
>
> Why it needs/wants the timestamps I've no idea, I don't think it gets
> them that way on all platforms. I suppose the next time I do some named
> benchmarking I can try to take a closer look in the source.
>

Returning to the discussion about packet timestamps, I just use
the following patch now:

diff -ur ../linux-2.6.20.1/include/linux/sysctl.h linux-2.6.20.1-ts/include/linux/sysctl.h
--- ../linux-2.6.20.1/include/linux/sysctl.h	2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/include/linux/sysctl.h	2007-03-04 19:10:36.0 +0300
@@ -280,6 +280,7 @@
 	NET_CORE_BUDGET=19,
 	NET_CORE_AEVENT_ETIME=20,
 	NET_CORE_AEVENT_RSEQTH=21,
+	NET_CORE_ACCURATE_TIMESTAMPS=99,
 };
 
 /* /proc/sys/net/ethernet */
diff -ur ../linux-2.6.20.1/net/core/dev.c linux-2.6.20.1-ts/net/core/dev.c
--- ../linux-2.6.20.1/net/core/dev.c	2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/net/core/dev.c	2007-03-04 19:09:44.0 +0300
@@ -1043,9 +1043,11 @@
 }
 EXPORT_SYMBOL(__net_timestamp);
 
+int sysctl_accurate_timestamps = 1;
+
 static inline void net_timestamp(struct sk_buff *skb)
 {
-	if (atomic_read(&netstamp_needed))
+	if (sysctl_accurate_timestamps && atomic_read(&netstamp_needed))
 		__net_timestamp(skb);
 	else {
 		skb->tstamp.off_sec = 0;
diff -ur ../linux-2.6.20.1/net/core/sysctl_net_core.c linux-2.6.20.1-ts/net/core/sysctl_net_core.c
--- ../linux-2.6.20.1/net/core/sysctl_net_core.c	2007-02-20 09:34:32.0 +0300
+++ linux-2.6.20.1-ts/net/core/sysctl_net_core.c	2007-03-04 19:05:11.0 +0300
@@ -21,6 +21,8 @@
 
 extern int sysctl_core_destroy_delay;
 
+extern int sysctl_accurate_timestamps;
+
 #ifdef CONFIG_XFRM
 extern u32 sysctl_xfrm_aevent_etime;
 extern u32 sysctl_xfrm_aevent_rseqth;
@@ -136,6 +138,14 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
+	{
+		.ctl_name	= NET_CORE_ACCURATE_TIMESTAMPS,
+		.procname	= "accurate_timestamps",
+		.data		= &sysctl_accurate_timestamps,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
 	{ .ctl_name = 0 }
 };
 

May I ask about integrating this or a similar solution for those
like me who value routing performance (with bind9 running) over the
minor convenience of having tcpdump always display accurate
timestamps?

And why does the current kernel (2.6.20.1) still ignore the parameter
clocksource=tsc? I think with idle=poll TSC is safe to use on my setup;
it ran with TSC for many months without a problem.

~
:wq
With best regards,
Vladimir Savkin.
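
A side note on the clocksource question above: on kernels that expose the generic clocksource sysfs interface, the timer actually in use can be read from user space. The following is only a minimal sketch under that assumption. The sysfs directory is not guaranteed to exist on every kernel discussed in this thread (x86_64 in particular moved to the generic clocksource code later), and the helper names read_first_line() and print_clocksources() are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed sysfs location; older kernels may not expose it at all. */
#define CLOCKSOURCE_DIR "/sys/devices/system/clocksource/clocksource0/"

/* Read the first line of `path` into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Print the active and the available clocksources, or a note if the
 * kernel does not expose them. */
void print_clocksources(void)
{
	char line[256];

	if (read_first_line(CLOCKSOURCE_DIR "current_clocksource",
			    line, sizeof(line)) == 0)
		printf("current:   %s\n", line);
	else
		printf("current:   (not exposed by this kernel)\n");

	if (read_first_line(CLOCKSOURCE_DIR "available_clocksource",
			    line, sizeof(line)) == 0)
		printf("available: %s\n", line);
	else
		printf("available: (not exposed by this kernel)\n");
}
```

Calling print_clocksources() from a small main() shows whether the kernel settled on tsc, hpet or acpi_pm; it does not explain why a requested source was rejected, which is the part Vladimir is asking about.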
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
> That came from named. It opens lots of sockets with SIOCGSTAMP.
> No idea what it needs that many for.

IIRC ISC BIND named opens a socket for each IP it finds on the system.
Presumably in this way it "knows" implicitly the destination IP without
using platform-specific recvfrom/whatever extensions and gets some
additional parallelism in the stack on SMP systems.

Why it needs/wants the timestamps I've no idea, I don't think it gets
them that way on all platforms. I suppose the next time I do some named
benchmarking I can try to take a closer look in the source.

rick jones
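
For readers who have not met the ioctl being discussed, here is a minimal user-space sketch of how a program asks for the receive time of the last packet on a socket. It assumes Linux, IPv4 loopback being up, and a made-up helper name, loopback_packet_stamp(); it is not taken from named or dhcpd. As the thread describes, this kind of request is what makes the kernel start stamping incoming packets.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/sockios.h>	/* SIOCGSTAMP */

/* Send one UDP datagram to ourselves over loopback, receive it, and ask
 * the kernel for its timestamp. Returns 0 and fills *tv on success,
 * -1 on any failure. */
int loopback_packet_stamp(struct timeval *tv)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char buf[16];
	int rc = -1;
	int rx, tx;

	assert(tv != NULL);
	rx = socket(AF_INET, SOCK_DGRAM, 0);
	tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (rx < 0 || tx < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* let the kernel pick a free port */
	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
		goto out;

	if (sendto(tx, "ping", 4, 0,
		   (struct sockaddr *)&addr, sizeof(addr)) != 4)
		goto out;
	if (recv(rx, buf, sizeof(buf), 0) != 4)
		goto out;

	/* Timestamp of the last packet handed to us on this socket. */
	rc = ioctl(rx, SIOCGSTAMP, tv) < 0 ? -1 : 0;
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return rc;
}
```

A daemon that does this on one socket per local address, as named is said to do above, keeps the timestamp request active for as long as those sockets stay open.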
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Friday 22 September 2006 17:35, Alexey Kuznetsov wrote:
> Hello!
>
> > I can't even find a reference to SIOCGSTAMP in the
> > dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.
> >
> > But I will note that tpacket_rcv() expects to always get
> > valid timestamps in the SKB, it does a:
>
> It is equally unlikely it uses mmapped packet socket (tpacket_rcv).
>
> I even installed that dhcp on x86_64. And I do not see anything,
> netstamp_needed remains zero when running both server and client.
> It looks like dhcp was defamed without a guilt. :-)
>
> Seems, Andi saw some leakage in netstamp_needed (value of 7),
> but I do not see this too.

That came from named. It opens lots of sockets with SIOCGSTAMP.
No idea what it needs that many for.

I suspect it was either dhcpd (server) or that ppp user space daemon
the original reporter was running.

Maybe it would be a good idea to add a printk by default?

> In any case, the issue is obviously more serious than just behaviour
> of some applications. On my notebook one gettimeofday() takes:
>
> 0.2 us with tsc
> 4.6 us with pm (AND THIS CRAP IS DEFAULT!!)

This is actually quite fast. I've seen much worse ratios.

Also on some i386 kernels the pmtimer reads the register three times to
work around some buggy implementation that doesn't latch the counter
properly.

> 9.4 us with pit (kinda expected)
>
> It is ridiculous. Obviously, nobody (not only tcpdump, but everything
> else) does not need such clock. Taking timestamp takes time comparable
> with processing the whole tcp frame. :-) I have no idea what is possible
> to do without breaking everything, but it is not something to ignore.
> This timer must be shot. :-)

If it's a reasonably new notebook it might actually be possible to change.
The default choices are quite conservative there because in the past there
were lots of problems with notebooks changing frequency behind the kernel's
back etc. and screwing up the TSC. But that shouldn't happen anymore.

If you had a 64bit laptop the kernel would likely make the right choice :)

Notebooks are easy because they are only single socket, so the only thing
needed is to keep track of the frequency (or not if you have a Core+).

-Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello!

> I can't even find a reference to SIOCGSTAMP in the
> dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.
>
> But I will note that tpacket_rcv() expects to always get
> valid timestamps in the SKB, it does a:

It is equally unlikely it uses mmapped packet socket (tpacket_rcv).

I even installed that dhcp on x86_64. And I do not see anything,
netstamp_needed remains zero when running both server and client.
It looks like dhcp was defamed without a guilt. :-)

Seems, Andi saw some leakage in netstamp_needed (value of 7),
but I do not see this too.

In any case, the issue is obviously more serious than just behaviour
of some applications. On my notebook one gettimeofday() takes:

0.2 us with tsc
4.6 us with pm (AND THIS CRAP IS DEFAULT!!)
9.4 us with pit (kinda expected)

It is ridiculous. Obviously, nobody (not only tcpdump, but everything
else) does not need such clock. Taking timestamp takes time comparable
with processing the whole tcp frame. :-) I have no idea what is possible
to do without breaking everything, but it is not something to ignore.
This timer must be shot. :-)

Alexey
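
Figures like the ones above can be approximated from user space with a simple loop. This is a rough sketch, not the method Alexey used: the function name gettimeofday_cost_us() is made up, loop overhead is not subtracted, and on current kernels gettimeofday() is usually served through the vDSO, so the result reflects the active clocksource only loosely.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Average cost of one gettimeofday() call in microseconds, measured
 * over `iterations` back-to-back calls. */
double gettimeofday_cost_us(long iterations)
{
	struct timeval start, end, scratch;
	double elapsed_us;
	long i;

	assert(iterations > 0);
	gettimeofday(&start, NULL);
	for (i = 0; i < iterations; i++)
		gettimeofday(&scratch, NULL);
	gettimeofday(&end, NULL);

	/* Convert the elapsed wall-clock time to microseconds. */
	elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
		     (end.tv_usec - start.tv_usec);
	return elapsed_us / iterations;
}
```

Running it with a million iterations under different clocksources gives a per-call cost that can be compared against the per-packet processing time, which is the comparison the message above is making.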
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
> It seems only natural to me that the real problem is the slow
> clock source which needs to be resolved regardless of the
> outcome of this discussion. I believe that updating the stamp
> at socket enqueue time is the right thing to do but it shouldn't
> be considered as a solution to the performance problem.

While I agree it would be nice to fix that particular issue
(it's unfortunately hard), slow clock sources in general won't go away.
They are also found on lots of other platforms.

And even if you have a fast clock source, not using it when you
don't need to is better. For example, some x86s can be quite slow
even at reading the TSC. While that is much better than pmtmr, it is
still an expensive operation that is best avoided.

-Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
* David Miller <[EMAIL PROTECTED]> 2006-09-18 14:22
> From: Alexey Kuznetsov <[EMAIL PROTECTED]>
> Date: Tue, 19 Sep 2006 01:03:21 +0400
>
> > 1. It even does not disable possibility to record timestamp inside
> >    driver, which Alan was afraid of. The sequence is:
> >
> >	if (!skb->tstamp.off_sec)
> >		net_timestamp(skb);
> >
> > 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().
> >
> > 3. NAPI already introduced almost the same inaccuracy. And it is really
> >    silly to waste time getting timestamp in netif_receive_skb() a few
> >    moments before the packet is delivered to a socket.
> >
> > 4. ...but clock source, which takes one of top lines in profiles
> >    must be repaired yet. :-)
>
> Ok, ok, but don't we have queueing disciplines that need the timestamp
> even on ingress?

Queueing disciplines generally only care about the time delta
between two packets; using the receive stamp would lead to wrong
results as soon as a packet is queued more than once. However,
since we recently introduced ingress queueing we must update the
stamp to make up for the delay caused by the queue. Updating the
stamp at socket enqueue time would solve this automatically.

It seems only natural to me that the real problem is the slow
clock source which needs to be resolved regardless of the
outcome of this discussion. I believe that updating the stamp
at socket enqueue time is the right thing to do but it shouldn't
be considered as a solution to the performance problem.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
From: Alexey Kuznetsov <[EMAIL PROTECTED]>
Date: Tue, 19 Sep 2006 02:00:38 +0400

> * I do not undestand what the hell dhcp needs timestamps for.

I can't even find a reference to SIOCGSTAMP in the
dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.

But I will note that tpacket_rcv() expects to always get
valid timestamps in the SKB, it does a:

	if (skb->tstamp.off_sec == 0) {
		__net_timestamp(skb);
		sock_enable_timestamp(sk);
	}

so that it can fill in the h->tp_sec and h->tp_usec fields.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
The sky2 hardware (and others) can timestamp in hardware, but trying
to keep device ticks and system clock in sync looked too nasty
to contemplate actually using it.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
From: David Lang <[EMAIL PROTECTED]>
Date: Mon, 18 Sep 2006 14:57:04 -0700 (PDT)

> yes tcpdump may be wrong in requesting timestamps (in most cases it
> probably is, but in some cases it's doing exactly what the sysadmin
> wants it to do), but I don't think that many sysadmins would expect
> this much of a performance hit. there should be some way to tell
> the system to ignore requests for timestamps so that a badly behaved
> program cannot cripple the system this way (and preferably something
> that doesn't require a full SELinux/capabilities implementation)

tcpdump is not wrong in requesting timestamps, and there are many
legitimate userland programs that call gettimeofday() for internal
timestamping _A LOT_. Such as X11 clients.

The real fact of the matter is that these x86_64 systems are using the
slowest possible time-of-day implementation, simply because it's "too
hard" currently to properly probe the most efficient mechanism which
is present in the system.

If getting the time of day is at the top of the profiles in the packet
input path, and we're only capturing a timestamp once per packet,
something is _VERY VERY_ wrong with the timestamp implementation
because think of all of the other seriously expensive things that
happen on a per-packet basis which should absolutely dwarf
timestamping in terms of cost.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
From: "Vladimir B. Savkin" <[EMAIL PROTECTED]>
Date: Tue, 19 Sep 2006 02:03:31 +0400

> On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote:
> > * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
>
> I had it at the very top line.

That is just ridiculous.

You can "fix" the networking by making it timestamp less, but what
about things like just normal X11 clients that call gettimeofday()
at very high rates?
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 23:22, David Miller wrote:

> Ok, ok, but don't we have queueing disciplines that need the timestamp
> even on ingress?

I grepped and I can't find any. The only non-SIOCGSTAMP users of the
time stamp seem to be sunrpc and conntrack, and I bet both can be
converted over to jiffies without trouble.

-Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 23:03, Alexey Kuznetsov wrote:

> > And do you have some other prefered way to solve this? Even if the timer
> > was fast it would be still good to avoid it in the fast path when DHCPD
> > is running.
>
> No. The way, which you suggested, seems to be the best.

Ok. I also checked my desktop and for some reason I got a timestamp
counter of 7 (and it doesn't even run client dhcp). Haven't investigated
why yet, and I am still hoping it's not a leak.

But that hints that trying to fix all of user space to not use the ioctl
would probably have been too much work.

> 1. It even does not disable possibility to record timestamp inside
>    driver, which Alan was afraid of. The sequence is:
>
>	if (!skb->tstamp.off_sec)
>		net_timestamp(skb);
>
> 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().

Hmm, there are still quite a lot of users, and even with netif_rx() you can
have long delays from interrupt mitigation etc.

% grep -rw netif_rx drivers/net/* | wc -l
253

> 3. NAPI already introduced almost the same inaccuracy. And it is really
>    silly to waste time getting timestamp in netif_receive_skb() a few
>    moments before the packet is delivered to a socket.
>
> 4. ...but clock source, which takes one of top lines in profiles
>    must be repaired yet. :-)

It's being worked on, but it'll take some time. But even when TSC
can be used it's still a good idea to not call gtod unnecessarily
because it can still be relatively slow (e.g. on P4 RDTSC takes
hundreds of cycles because it synchronizes the CPU). Also, on some other
non-x86 platforms it is relatively slow because they have to
reach out to the chipset, and every time you do that things get slow.

-Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Tue, 19 Sep 2006, Alexey Kuznetsov wrote:

> Hello!
>
> > Please think about it this way:
> > suppose you have a heavily loaded router and some network problem is to
> > be diagnosed. You run tcpdump and suddenly router becomes overloaded (by
> > switching to timestamp-it-all mode
>
> I am sorry. I cannot think that way. :-)
>
> Instead of attempts to scare, better resend original report,
> where you said how much performance degraded, I cannot find it.
>
> * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
> * I do not understand what the hell dhcp needs timestamps for.
> * I do not listen any suggestions to screw up tcpdump with a sysctl.
>   Kernel already implements much better thing then a sysctl.
>   Do not want timestamps? Fix tcpdump, add an options, submit the
>   patch to tcpdump maintainers. Not a big deal.

If firing up one program (however minor) can cause network performance to
drop by >50% (based on the numbers reported earlier in this thread), that is
a significant problem for sysadmins.

Yes, tcpdump may be wrong in requesting timestamps (in most cases it probably
is, but in some cases it's doing exactly what the sysadmin wants it to do),
but I don't think that many sysadmins would expect this much of a performance
hit. There should be some way to tell the system to ignore requests for
timestamps so that a badly behaved program cannot cripple the system this way
(and preferably something that doesn't require a full SELinux/capabilities
implementation).

David Lang
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote:
> Hello!
>
> > Please think about it this way:
> > suppose you have a heavily loaded router and some network problem is to
> > be diagnosed. You run tcpdump and suddenly router becomes overloaded (by
> > switching to timestamp-it-all mode
>
> I am sorry. I cannot think that way. :-)
>
> Instead of attempts to scare, better resend original report,
> where you said how much performance degraded, I cannot find it.
>
> * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.

I had it at the very top line.

> * I do not understand what the hell dhcp needs timestamps for.
> * I do not listen any suggestions to screw up tcpdump with a sysctl.
>   Kernel already implements much better thing then a sysctl.
>   Do not want timestamps? Fix tcpdump, add an options, submit the
>   patch to tcpdump maintainers. Not a big deal.

OK, point taken. It's better to patch tcpdump.

>
> Alexey
>
~
:wq
With best regards,
Vladimir Savkin.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello!

> Please think about it this way:
> suppose you have a heavily loaded router and some network problem is to
> be diagnosed. You run tcpdump and suddenly router becomes overloaded (by
> switching to timestamp-it-all mode

I am sorry. I cannot think that way. :-)

Instead of attempts to scare, better resend the original report,
where you said how much performance degraded; I cannot find it.

* I do see get_offset_pmtmr() in the top lines of the profile. That's scary
  enough.
* I do not understand what the hell dhcp needs timestamps for.
* I do not listen to any suggestions to screw up tcpdump with a sysctl.
  The kernel already implements a much better thing than a sysctl.
  Do not want timestamps? Fix tcpdump, add an option, submit the
  patch to the tcpdump maintainers. Not a big deal.

Alexey
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello!

> Ok, ok, but don't we have queueing disciplines that need the timestamp
> even on ingress?

I cannot find any.

ip_queue does. But it is just another user, no different from sockets.

BTW in any case, any user of the timestamp who sees 0, because the skb was
received before timestamping was enabled, has to calculate the timestamp
itself right in the place where Andi suggested. Seems, preparation for
the change makes sense even without the change. :-)

Alexey
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
From: Alexey Kuznetsov <[EMAIL PROTECTED]>
Date: Tue, 19 Sep 2006 01:03:21 +0400

> 1. It even does not disable possibility to record timestamp inside
>    driver, which Alan was afraid of. The sequence is:
>
>	if (!skb->tstamp.off_sec)
>		net_timestamp(skb);
>
> 2. Maybe, netif_rx() should continue to get timestamp in netif_rx().
>
> 3. NAPI already introduced almost the same inaccuracy. And it is really
>    silly to waste time getting timestamp in netif_receive_skb() a few
>    moments before the packet is delivered to a socket.
>
> 4. ...but clock source, which takes one of top lines in profiles
>    must be repaired yet. :-)

Ok, ok, but don't we have queueing disciplines that need the timestamp
even on ingress?
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 06:50:22PM +0200, Andi Kleen wrote:
>
> I suppose in the worst case a sysctl like Vladimir asked for could be added,
> but it would seem somewhat lame.
>

Please think about it this way:
suppose you have a heavily loaded router and some network problem is to
be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
switching to timestamp-it-all mode), drops OSPF adjacencies etc.
Users are angry, and you can't diagnose anything. But with imprecise
timestamps and maybe even with reordered packets you still have some
traces to analyze.

So, in this particular corner case it's not that lame.
Or maybe patching tcpdump will do better?

~
:wq
With best regards,
Vladimir Savkin.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 01:27:57PM +0200, Andi Kleen wrote:
> The codebase for timing (and lots of other things) is quite different
> between 32bit and 64bit. You're really surprised it doesn't work if you do
> such things?
>
It works, and after your remark above, I'm surprised.
Dunno about slow TSC drift though; not enough time has passed to detect it,
and I hope we will have this problem solved in a better way before
the drift becomes visible :)

> > But the question is, why stock 2.6.18-rc7 could not use TSC on its own?
>
> x86-64 doesn't use the TSC when it deems it to not be reliable, which
> is the case on your system.
>
Could it at least print something so that I know that using TSC was
considered, but rejected?

> > What hardware exactly. Doesn't it affect only CPU? And they are not
> > known to fail before any other components.
>
> All hardware. It's basic physics.

Hm, what other hardware is affected by idle=poll?
Does this option wear out HDDs?

~
:wq
With best regards,
Vladimir Savkin.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello!

> But that never happens right?

Right.

Well, not right. It happens. Simply because you get a packet
with a newer timestamp after the previous handler saw this packet
and did some actions. I just do not see any bad consequences.

> And do you have some other prefered way to solve this? Even if the timer
> was fast it would be still good to avoid it in the fast path when DHCPD
> is running.

No. The way, which you suggested, seems to be the best.

1. It even does not disable the possibility to record the timestamp inside
   the driver, which Alan was afraid of. The sequence is:

	if (!skb->tstamp.off_sec)
		net_timestamp(skb);

2. Maybe, netif_rx() should continue to get timestamp in netif_rx().

3. NAPI already introduced almost the same inaccuracy. And it is really
   silly to waste time getting a timestamp in netif_receive_skb() a few
   moments before the packet is delivered to a socket.

4. ...but the clock source, which takes one of the top lines in profiles,
   must be repaired yet. :-)

Alexey
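
The sequence in point 1, combined with stamping at delivery time rather than at receive time, can be modelled in user space. This is only an illustrative sketch: struct pkt, pkt_stamp_if_needed(), pkt_deliver() and the clock_reads counter are hypothetical stand-ins, not kernel code, and gettimeofday() stands in for whatever clock the kernel would read.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for struct sk_buff: only the receive stamp. */
struct pkt {
	struct timeval tstamp;	/* tv_sec == 0 means "not stamped yet" */
};

static int clock_reads;		/* how often the clock was consulted */

/* Stamp the packet only if nobody (e.g. a driver) stamped it before. */
void pkt_stamp_if_needed(struct pkt *p)
{
	assert(p != NULL);
	if (p->tstamp.tv_sec == 0) {
		gettimeofday(&p->tstamp, NULL);
		clock_reads++;
	}
}

/* Delivery to a socket: the clock is read here, and only when the
 * receiving socket actually asked for timestamps. */
void pkt_deliver(struct pkt *p, int socket_wants_stamp)
{
	if (socket_wants_stamp)
		pkt_stamp_if_needed(p);
}

/* Accessor so callers can observe how many clock reads happened. */
int pkt_clock_reads(void)
{
	return clock_reads;
}
```

In this model a packet that is only forwarded never touches the clock, a packet delivered to two interested sockets is stamped once, and a stamp recorded earlier by a driver is left alone, which matches the properties argued for in points 1 and 3.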
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 18:28, Alexey Kuznetsov wrote:
> Hello!
>
> > Hmm, not sure how that could happen. Also is it a real problem
> > even if it could?
>
> As I said, the problem is _occasionally_ theoretical.
>
> This would happen f.e. if packet socket handler was installed
> after IP handler. Then tcpdump would get packet after it is processed
> (acked/replied/forwarded). This would be disastrous, the results
> are unparsable.

But that never happens right?

And do you have some other preferred way to solve this? Even if the timer
was fast it would still be good to avoid it in the fast path when DHCPD
is running.

I suppose in the worst case a sysctl like Vladimir asked for could be added,
but it would seem somewhat lame.

-Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello!

> Hmm, not sure how that could happen. Also is it a real problem
> even if it could?

As I said, the problem is _occasionally_ theoretical.

This would happen f.e. if the packet socket handler was installed
after the IP handler. Then tcpdump would get the packet after it is processed
(acked/replied/forwarded). This would be disastrous, the results
are unparsable.

I recall, the issue was discussed, and at that time it looked more reasonable
to solve problems of this kind by taking the timestamp once, before the packet
is seen by all the rest of the stack. Who could expect that the PIT nightmare
is going to return? :-)

> Then it has to use the ACPI pmtmr which is really really slow.
> The overhead of that thing is so large that you can clearly see it in
> the network benchmark.

I see. Thank you.

Alexey
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 17:38, Alexey Kuznetsov wrote:
> Hello!
>
> > For netdev: I'm more and more thinking we should just avoid the problem
> > completely and switch to "true end2end" timestamps. This means don't
> > time stamp when a packet is received, but only when it is delivered
> > to a socket.
>
> This will work.
>
> From viewpoint of existing uses of timestamp by packet socket
> this time is not worse. The only danger is violation of causality
> (when forwarded packet or reply packet gets timestamp earlier than
> original packet).

Hmm, not sure how that could happen. Also is it a real problem even if it could?

> > handler runs. Then the problem above would completely disappear.
>
> Well, not completely. Too slow clock source remains too slow clock source.
> If it is so slow, that it results in "performance degradation", it just
> should not be used at all, even such pariah as tcpdump wants to be fast.
>
> Actually, I have a question. Why the subject is
> "Network performance degradation from 2.6.11.12 to 2.6.16.20"?
> I do not see beginning of the thread and cannot guess
> why clock source degraded. :-)

It's a long and sad story.

Old kernels didn't disable the TSC on those boxes (multi core K8) and assumed they were synchronized for timing purposes. This initially mostly worked if you don't use cpufreq, but over a longer uptime the TSCs would drift against each other and timing would jump more and more between CPUs. On older versions of K8 this drift happened much slower (more aggressive power saving in HLT in newer steppings made it worse; that is why idle=poll helps) and could often be ignored. But technically it was still a bug there because it could break timing after long uptimes.

New multi socket K8 boxes are generally totally unusable with TSC because they use cpufreq and the TSCs can run at completely different frequencies, which obviously doesn't give very good timing information if you assume the TSC is globally synchronized.

That is why later kernels default to TSC off. The original plan was to use HPET then, which is slower than TSC, but still not that bad. But while most modern systems have an HPET timer somewhere in the chipset, nearly all BIOS vendors "forgot" to describe it in the BIOS because Windows didn't use it, and Linux can't find it because of that. Then it has to use the ACPI pmtmr, which is really really slow. The overhead of that thing is so large that you can clearly see it in the network benchmark.

The real fix long term is to change the timer subsystem to keep all TSC state per CPU, then it'll work on the K8s too. Unfortunately it's a moderately hard problem to make the result still fully monotonic. But people are working on it.

-Andi
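A toy model may make the drift problem described above more concrete. The sketch below is not kernel code: it simulates two per-CPU cycle counters that tick at slightly different rates (the 2 GHz frequency, the 100 ppm offset, and all names are made up for illustration) and converts both to time with a single nominal frequency, which is the "globally synchronized" assumption. Early after boot the error is hidden; after a long uptime a reader that moves from the faster CPU to the slower one sees time jump backwards.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """A CPU whose cycle counter ticks at its own actual frequency."""
    actual_hz: float

    def read_tsc(self, real_seconds: float) -> int:
        # Cycles counted since boot at this CPU's true rate.
        return int(self.actual_hz * real_seconds)


def tsc_to_seconds(cycles: int, assumed_hz: float) -> float:
    """Convert cycles to time assuming one global, synchronized frequency."""
    return cycles / assumed_hz


NOMINAL_HZ = 2_000_000_000                      # hypothetical 2 GHz part
cpu0 = ToyCpu(actual_hz=NOMINAL_HZ)             # runs exactly at nominal
cpu1 = ToyCpu(actual_hz=NOMINAL_HZ * 0.9999)    # made-up 100 ppm slower


def apparent_times(uptime_s: float, gap_s: float = 1e-6):
    """Read the clock on cpu0, then gap_s of real time later on cpu1."""
    t0 = tsc_to_seconds(cpu0.read_tsc(uptime_s), NOMINAL_HZ)
    t1 = tsc_to_seconds(cpu1.read_tsc(uptime_s + gap_s), NOMINAL_HZ)
    return t0, t1


if __name__ == "__main__":
    for uptime in (0.001, 1000.0):
        first, second = apparent_times(uptime)
        direction = "forward" if second > first else "BACKWARDS"
        print(f"uptime {uptime:>8} s: second read goes {direction}")
```

Keeping the conversion state per CPU, as proposed at the end of the message, would correspond to giving each `ToyCpu` its own `assumed_hz`; the hard part mentioned there, keeping the combined result monotonic, is not modelled here.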
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello!

> For netdev: I'm more and more thinking we should just avoid the problem
> completely and switch to "true end2end" timestamps. This means don't
> time stamp when a packet is received, but only when it is delivered
> to a socket.

This will work.

From the viewpoint of existing uses of the timestamp by packet sockets, this time is no worse. The only danger is violation of causality (when a forwarded packet or reply packet gets a timestamp earlier than the original packet). This pathology was the main reason why the timestamp is recorded early, before the packet is demultiplexed in netif_receive_skb(). But it is not a practical problem: delivery to packet/raw sockets is occasionally placed _before_ delivery to real protocol handlers.

> handler runs. Then the problem above would completely disappear.

Well, not completely. A too slow clock source remains a too slow clock source. If it is so slow that it results in "performance degradation", it just should not be used at all; even such a pariah as tcpdump wants to be fast.

Actually, I have a question. Why is the subject "Network performance degradation from 2.6.11.12 to 2.6.16.20"? I do not see the beginning of the thread and cannot guess why the clock source degraded. :-)

Alexey
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 17:19, Alan Cox wrote:
> On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
> > The only delay this would add would be the queueing time from the NIC
> > to the softirq. Do you really think that is that bad?
>
> If you are trying to do things like network record/playback then you
> want the minimal delay.

But it's not minimal. Maybe it was long ago when the code was designed on a 3c509, but not with modern hardware: think interrupt mitigation and NAPI.

And with NAPI we tend to process the packets directly after they are fetched out of the RX queue, so there is practically no delay between the driver seeing the packet and the softirq seeing it. All the queuing is done either at hardware level or later at socket level.

> There's a reason the original timestamp code
> supported the hardware setting the timestamp itself - we actually had a
> separate set of logic on a board that was doing the timestamping by
> watching the IRQ line of the NIC chip.

That would be fine too (because it will likely be fast), but unfortunately I don't know of any driver that does that.

-Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
> The only delay this would add would be the queueing time from the NIC
> to the softirq. Do you really think that is that bad?

If you are trying to do things like network record/playback then you want the minimal delay. There's a reason the original timestamp code supported the hardware setting the timestamp itself - we actually had a separate set of logic on a board that was doing the timestamping by watching the IRQ line of the NIC chip.

Alan
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
> > People who run tcpdump want "wire" timestamps as close as possible. > Yes, things get delayed with the IRQ path, DMA delays, IRQ > mitigation and whatnot, but it's an order of magnitude worse if > you delay to user read() since that introduces also the delay of > the packet copies to userspace which are significantly larger than > these hardware level delays. If tcpdump gets swapped out, the > timestamp delay can be on the order of several seconds making it > totally useless. My proposal wasn't to delay to user read, just to do the time stamp in socket context. This means as soon as packet or RAW/UDP have looked up the socket and can check a per socket flag do the time stamp. The only delay this would add would be the queueing time from the NIC to the softirq. Do you really think that is that bad? -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
From: Andi Kleen <[EMAIL PROTECTED]>
Date: 18 Sep 2006 11:58:21 +0200

> For netdev: I'm more and more thinking we should just avoid the
> problem completely and switch to "true end2end" timestamps. This
> means don't time stamp when a packet is received, but only when it
> is delivered to a socket. The timestamp at receiving is a lie
> anyways because the network hardware can add an arbitrary long delay
> before the driver interrupt handler runs. Then the problem above
> would completely disappear.

I don't think this is wise.

People who run tcpdump want "wire" timestamps as close as possible. Yes, things get delayed with the IRQ path, DMA delays, IRQ mitigation and whatnot, but it's an order of magnitude worse if you delay to user read(), since that also introduces the delay of the packet copies to userspace, which are significantly larger than these hardware level delays. If tcpdump gets swapped out, the timestamp delay can be on the order of several seconds, making it totally useless.

Andi, you will need to find another solution to this problem :-)
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes: [you seem to send your emails in a strange way that doesn't keep me in cc. Please stop doing that.] > On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote: > > > > The x86-64 timer subsystems currently doesn't have clocksources > > > > at all, but it supports TSC and some other timers. > > > > > > > > until I hacked arch/i386/kernel/tsc.c > > > > Then you don't use x86-64. > > > Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64 > by hacking some Makefiles and headers. The codebase for timing (and lots of other things) is quite different between 32bit and 64bit. You're really surprised it doesn't work if you do such things? > But the question is, why stock 2.6.18-rc7 could not use TSC on its own? x86-64 doesn't use the TSC when it deems it to not be reliable, which is the case on your system. > > > > > I've also had experience of unsychronized TSC on dual-core Athlon, > > > > > but it was cured by idle=poll. > > > > > > > > You can use that, but it will make your system run quite hot > > > > and cost you a lot of powe^wmoney. > > > > > > Here in Russia electric power is cheap compared with hardware upgrade. > > > > It's not just electrical power - the hardware is more stressed and will > > likely fail earlier too. As a rule of thumb the hotter your hardware runs > > the earlier it will fail. > > What hardware exactly. Doesn't it affect only CPU? And they are not > know to fail before any other components. All hardware. It's basic physics. -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote: > > > The x86-64 timer subsystems currently doesn't have clocksources > > > at all, but it supports TSC and some other timers. > > > > > until I hacked arch/i386/kernel/tsc.c > > Then you don't use x86-64. > Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64 by hacking some Makefiles and headers. But the question is, why stock 2.6.18-rc7 could not use TSC on its own? > > > > I've also had experience of unsychronized TSC on dual-core Athlon, > > > > but it was cured by idle=poll. > > > > > > You can use that, but it will make your system run quite hot > > > and cost you a lot of powe^wmoney. > > > > Here in Russia electric power is cheap compared with hardware upgrade. > > It's not just electrical power - the hardware is more stressed and will > likely fail earlier too. As a rule of thumb the hotter your hardware runs > the earlier it will fail. What hardware exactly. Doesn't it affect only CPU? And they are not know to fail before any other components. > > > > > > It seems that dhcpd3 makes the box timestamping incoming packets, > > > > killing the performance. I think that combining router and DHCP server > > > > on a same box is a legitimate situation, isn't it? > > > > > > Yes. Good point. DHCP is broken and needs to be fixed. Can you > > > send a bug report to the DHCP maintainers? > > > > > > iirc the problem used to be that RAW sockets didn't do something > > > they need them to do. Maybe we can fix that now. > > > > Will try some days later. > > > > Oh, and pppoe-server uses some kind of packet socket too, doesn't it? > > The problem is not really using a packet socket, but using the SIOCGSTAMP > ioctl on it. As soon as someone issues it the system will take accurate > time stamps for each incoming packet until the respective socket is closed. > > Quick fix is to change user space to use gettimeofday() when it reads > the packet instead. Ok, thank you, I now understand. 
> > For netdev: I'm more and more thinking we should just avoid the problem > completely and switch to "true end2end" timestamps. This means don't > time stamp when a packet is received, but only when it is delivered > to a socket. The timestamp at receiving is a lie anyways because > the network hardware can add an arbitary long delay before the driver > interrupt > handler runs. Then the problem above would completely disappear. > Comments? Opinions? > > -Andi > ~ :wq With best regards, Vladimir Savkin. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes: > On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote: > > > I just found out that TSC clocksource is not implemented on x86-64. > > > Kernel version 2.6.18-rc7, is it true? > > > > The x86-64 timer subsystems currently doesn't have clocksources > > at all, but it supports TSC and some other timers. > > until I hacked arch/i386/kernel/tsc.c Then you don't use x86-64. > > > > I've also had experience of unsychronized TSC on dual-core Athlon, > > > but it was cured by idle=poll. > > > > You can use that, but it will make your system run quite hot > > and cost you a lot of powe^wmoney. > > Here in Russia electric power is cheap compared with hardware upgrade. It's not just electrical power - the hardware is more stressed and will likely fail earlier too. As a rule of thumb the hotter your hardware runs the earlier it will fail. > > > > It seems that dhcpd3 makes the box timestamping incoming packets, > > > killing the performance. I think that combining router and DHCP server > > > on a same box is a legitimate situation, isn't it? > > > > Yes. Good point. DHCP is broken and needs to be fixed. Can you > > send a bug report to the DHCP maintainers? > > > > iirc the problem used to be that RAW sockets didn't do something > > they need them to do. Maybe we can fix that now. > > Will try some days later. > > Oh, and pppoe-server uses some kind of packet socket too, doesn't it? The problem is not really using a packet socket, but using the SIOCGSTAMP ioctl on it. As soon as someone issues it the system will take accurate time stamps for each incoming packet until the respective socket is closed. Quick fix is to change user space to use gettimeofday() when it reads the packet instead. For netdev: I'm more and more thinking we should just avoid the problem completely and switch to "true end2end" timestamps. This means don't time stamp when a packet is received, but only when it is delivered to a socket. 
The timestamp at receiving is a lie anyways because the network hardware can add an arbitary long delay before the driver interrupt handler runs. Then the problem above would completely disappear. Comments? Opinions? -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
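The "quick fix" mentioned above, taking the time in user space when the packet is read instead of asking the kernel through SIOCGSTAMP, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses an ordinary UDP socket on loopback as a stand-in for the packet sockets discussed in the thread, `time.time()` plays the role of gettimeofday(), and the function name `recv_with_user_timestamp` is hypothetical.

```python
import socket
import time


def recv_with_user_timestamp(sock: socket.socket, bufsize: int = 2048):
    """Receive one datagram and stamp it in user space, right after the read.

    No SIOCGSTAMP ioctl is issued here, so this code never asks the kernel
    for a per-packet arrival timestamp.
    """
    data, addr = sock.recvfrom(bufsize)
    stamp = time.time()  # user-space analogue of gettimeofday()
    return data, addr, stamp


if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))      # let the OS pick a free port
    rx.settimeout(2.0)             # do not hang if nothing arrives
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(b"hello", rx.getsockname())
    payload, sender, ts = recv_with_user_timestamp(rx)
    print(payload, sender, ts)
    rx.close()
    tx.close()
```

The stamp taken this way includes whatever time the datagram spent queued before the read, which is probably acceptable for a DHCP server but not for a capture tool.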
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote: > > I just found out that TSC clocksource is not implemented on x86-64. > > Kernel version 2.6.18-rc7, is it true? > > The x86-64 timer subsystems currently doesn't have clocksources > at all, but it supports TSC and some other timers. Hm. On my box, TSC did not work, until I hacked arch/i386/kernel/tsc.c in it. Neither clock=tsc nor clocksource=tsc didn't have any effect. > > I've also had experience of unsychronized TSC on dual-core Athlon, > > but it was cured by idle=poll. > > You can use that, but it will make your system run quite hot > and cost you a lot of powe^wmoney. Here in Russia electric power is cheap compared with hardware upgrade. > > It seems that dhcpd3 makes the box timestamping incoming packets, > > killing the performance. I think that combining router and DHCP server > > on a same box is a legitimate situation, isn't it? > > Yes. Good point. DHCP is broken and needs to be fixed. Can you > send a bug report to the DHCP maintainers? > > iirc the problem used to be that RAW sockets didn't do something > they need them to do. Maybe we can fix that now. Will try some days later. Oh, and pppoe-server uses some kind of packet socket too, doesn't it? > > If that's not possible we can probably add a ioctl or similar > to disable time stamping for packet sockets (DHCP shouldn't really > need a fine grained time stamp). dhcpcd would need to use that then. I would like some sysctl very much, too. Let tcpdump show imprecise timestamps when forwarding performance is more important. After all, Ciscos don't have any tcpdump analog at all, and they are very popular :) > > Keep me updated what they say. > > -Andi > ~ :wq With best regards, Vladimir Savkin. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes: > On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote: > > > > > If you use "pmtmr" try to reboot with kernel option "clock=tsc". > > > > That's dangerous advice - when the system choses not to use > > TSC it often has a reason. > > I just found out that TSC clocksource is not implemented on x86-64. > Kernel version 2.6.18-rc7, is it true? The x86-64 timer subsystems currently doesn't have clocksources at all, but it supports TSC and some other timers. > > I've also had experience of unsychronized TSC on dual-core Athlon, > but it was cured by idle=poll. You can use that, but it will make your system run quite hot and cost you a lot of powe^wmoney. > It seems that dhcpd3 makes the box timestamping incoming packets, > killing the performance. I think that combining router and DHCP server > on a same box is a legitimate situation, isn't it? Yes. Good point. DHCP is broken and needs to be fixed. Can you send a bug report to the DHCP maintainers? iirc the problem used to be that RAW sockets didn't do something they need them to do. Maybe we can fix that now. If that's not possible we can probably add a ioctl or similar to disable time stamping for packet sockets (DHCP shouldn't really need a fine grained time stamp). dhcpcd would need to use that then. Keep me updated what they say. -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote: > > > If you use "pmtmr" try to reboot with kernel option "clock=tsc". > > That's dangerous advice - when the system choses not to use > TSC it often has a reason. I just found out that TSC clocksource is not implemented on x86-64. Kernel version 2.6.18-rc7, is it true? I've also had experience of unsychronized TSC on dual-core Athlon, but it was cured by idle=poll. > > > > > On my Opteron AMD system i normally can route 400 kpps, but with > > timesource "pmtmr" i could only route around 83 kpps. (I found the timer > > to be the issue by using oprofile). > > Unless you're using packet sniffing or any other application > that requests time stamps on a socket then the timer shouldn't > make much difference. Incoming packets are only time stamped > when someone asks for the timestamps. > It seems that dhcpd3 makes the box timestamping incoming packets, killing the performance. I think that combining router and DHCP server on a same box is a legitimate situation, isn't it? ~ :wq With best regards, Vladimir Savkin. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Tue, 4 Jul 2006, Andi Kleen wrote: On Tuesday 04 July 2006 13:41, Jesper Dangaard Brouer wrote: Actually the change happens between kernel version 2.6.15 and 2.6.16. The timestamp optimizations are older. Don't remember the exact release, but earlier 2.6. What I'm saying is that, with the same Config file, some Kconfig option changed between 2.6.15 and 2.6.16, that made my system use pmtmr for high-res timesource instead of TSC. And is a result of Andi's changes to arch/x86_64/Kconfig and drivers/acpi/Kconfig, which "allows/activates" the use of the timer on x86_64. Not sure what you mean here? I think, that the changes you made to the files "arch/x86_64/Kconfig" and "drivers/acpi/Kconfig", caused this change... commit: e78256b8f3e2850ad55c2d69e1429e6c2607afd3 http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commitdiff;h=e78256b8f3e2850ad55c2d69e1429e6c2607afd3 and maybe commit: 2eb1bdbad89b19c99f8ac1de1492cdabbff6b3d3 http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commitdiff;h=2eb1bdbad89b19c99f8ac1de1492cdabbff6b3d3 Hilsen Jesper Brouer -- --- MSc. Master of Computer Science Dept. of Computer Science, University of Copenhagen Author of http://www.adsl-optimizer.dk --- - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Tuesday 04 July 2006 13:41, Jesper Dangaard Brouer wrote:
>
> On Mon, 26 Jun 2006, Andi Kleen wrote:
>
> >> I encountered the same problem on a dual core opteron equipped with a
> >> broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC
> >> as the clock source, but the time jumped back and forth, so I changed
> >> it to 'notsc', then the performance dropped dramatically to around the
> >> same value as above with one CPU saturated. I suspect that the clock
> >> precision is needed by the tg3 driver to correctly decide to switch to
> >> polling mode, but unfortunately, the performance drop rendered the
> >> solution so much unusable that I finally decided to use it only in
> >> uniprocessor with TSC enabled.
> >
> > 2.6 is more clever at this than 2.4. In particular it does the timestamp
> > for each packet only when actually needed, which is relatively rare.
> >
> > Old experiences do not always apply to new kernels.
>
> Note, that I experienced this problem on 2.6.
>
> Actually the change happens between kernel version 2.6.15 and 2.6.16.

The timestamp optimizations are older. Don't remember the exact release, but earlier 2.6.

> And
> is a result of Andi's changes to arch/x86_64/Kconfig and
> drivers/acpi/Kconfig, which "allows/activates" the use of the timer on
> x86_64.

Not sure what you mean here?

2.6.18 will likely be more aggressive at using the TSC on i386 on Intel systems where possible, but x86-64 has done this for a long time already. When x86-64 does not use the TSC, it's because using the TSC is not safe.

-Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, 26 Jun 2006, Andi Kleen wrote: I encountered the same problem on a dual core opteron equipped with a broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC as the clock source, but the time jumped back and forth, so I changed it to 'notsc', then the performance dropped dramatically to around the same value as above with one CPU saturated. I suspect that the clock precision is needed by the tg3 driver to correctly decide to switch to polling mode, but unfortunately, the performance drop rendered the solution so much unusable that I finally decided to use it only in uniprocessor with TSC enabled. 2.6 is more clever at this than 2.4. In particular it does the timestamp for each packet only when actually needed, which is relativelt rare. Old experiences do not always apply to new kernels. Note, that I experinced this problem on 2.6. Actually the change happens between kernel version 2.6.15 and 2.6.16. And is a result of Andi's changes to arch/x86_64/Kconfig and drivers/acpi/Kconfig, which "allows/activates" the use of the timer on x86_64. Cheers, Jesper Brouer -- --- MSc. Master of Computer Science Dept. of Computer Science, University of Copenhagen Author of http://www.adsl-optimizer.dk --- - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
> I encountered the same problem on a dual core opteron equipped with a > broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC > as the clock source, but the time jumped back and forth, so I changed > it to 'notsc', then the performance dropped dramatically to around the > same value as above with one CPU saturated. I suspect that the clock > precision is needed by the tg3 driver to correctly decide to switch to > polling mode, but unfortunately, the performance drop rendered the > solution so much unusable that I finally decided to use it only in > uniprocessor with TSC enabled. 2.6 is more clever at this than 2.4. In particular it does the timestamp for each packet only when actually needed, which is relativelt rare. Old experiences do not always apply to new kernels. -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Sun, 25 Jun 2006, Harry Edmon wrote:

> I understand the saying "beggars can't be choosers", but I have heard nothing on
> this issue since June 19th. Does anyone have any ideas on what is going on? Is
> there more information I can collect that would help diagnose this problem? And
> again, thanks for any and all help!

Harry,

I'd suggest checking all the ethtool configuration settings (ethtool -a, -c, -g, -k) and statistics (ethtool -S) for both the working and problematic kernels, and then comparing them to see if anything jumps out at you. Also compare ifconfig settings and dmesg output. Check /proc/interrupts to see if there is any difference with the interrupt routing. Check sysctl.conf and rc.local for any special system configuration or device settings that might differ between the systems.

The one thing that has caused me a lot of network performance issues on e1000 is having TSO enabled, so if that is enabled (check with ethtool -k), then I'd try disabling it to see if that helps.

-Hope this helps

-Bill
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hi Andi,

On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote:
>
> > If you use "pmtmr" try to reboot with kernel option "clock=tsc".
>
> That's dangerous advice - when the system chooses not to use
> TSC it often has a reason.
>
> >
> > On my Opteron AMD system i normally can route 400 kpps, but with
> > timesource "pmtmr" i could only route around 83 kpps. (I found the timer
> > to be the issue by using oprofile).
>
> Unless you're using packet sniffing or any other application
> that requests time stamps on a socket then the timer shouldn't
> make much difference. Incoming packets are only time stamped
> when someone asks for the timestamps.

I encountered the same problem on a dual core Opteron equipped with a Broadcom NIC (tg3) under 2.4. It could receive 1 Mpps when using TSC as the clock source, but the time jumped back and forth, so I changed it to 'notsc'. Then the performance dropped dramatically to around the same value as above, with one CPU saturated. I suspect that the clock precision is needed by the tg3 driver to correctly decide to switch to polling mode, but unfortunately the performance drop rendered the solution so unusable that I finally decided to use it only in uniprocessor mode with TSC enabled.

> -Andi

Regards,
Willy
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
I understand the saying "beggars can't be choosers", but I have heard nothing on this issue since June 19th. Does anyone have any ideas on what is going on? Is there more information I can collect that would help diagnose this problem? And again, thanks for any and all help!

--
Dr. Harry Edmon               E-MAIL: [EMAIL PROTECTED]
206-543-0547                          [EMAIL PROTECTED]
Dept of Atmospheric Sciences  FAX:    206-543-0308
University of Washington, Box 351640, Seattle, WA 98195-1640
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Harry Edmon <[EMAIL PROTECTED]> wrote:
>
> That did not help. I have 1 minute outputs from tcpdump under both 2.6.11.12
> and 2.6.16.20. You will see a large size difference between the files. Since
> the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web instead
> of via attachments. Look at:
>
> http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
> http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

The latter shows that it took 40ms to generate an ACK. What does 'vmstat 1' show while this is happening?

--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 19 June 2006 19:34, Chris Friesen wrote:
> Andi Kleen wrote:
> > Incoming packets are only time stamped
> > when someone asks for the timestamps.
>
> Doesn't that add scheduling latency to the timestamps? Or is it a flag
> that gets set to trigger timestamping at packet arrival?

It's a flag (or, more precisely, a global counter).

-Andi
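One way to picture the "global counter" is as a reference count: each socket that asks for timestamps increments it, closing that socket decrements it, and the receive path reads the clock only while the count is non-zero. The Python below is a toy model of that idea only, not the kernel's implementation; the class and method names are invented for illustration.

```python
import time


class TimestampGate:
    """Toy reference count deciding whether arriving packets get stamped."""

    def __init__(self, clock=time.time):
        self._users = 0          # sockets currently asking for timestamps
        self._clock = clock
        self.clock_reads = 0     # how often the (possibly slow) clock was read

    def enable(self):
        """Called when some socket starts wanting timestamps."""
        self._users += 1

    def disable(self):
        """Called when such a socket is closed."""
        if self._users == 0:
            raise RuntimeError("disable() without matching enable()")
        self._users -= 1

    def on_packet_arrival(self, packet: dict) -> dict:
        # Stamp at arrival, so no scheduling latency ends up in the stamp,
        # but only pay for the clock read when somebody wants the result.
        if self._users > 0:
            packet["stamp"] = self._clock()
            self.clock_reads += 1
        else:
            packet["stamp"] = None
        return packet
```

In this model a single interested socket makes every arriving packet pay for a clock read, which matches the observation elsewhere in the thread that one timestamp user can slow down forwarding for the whole box.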
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, 19 Jun 2006, Andi Kleen wrote:

>> If you use "pmtmr" try to reboot with kernel option "clock=tsc".
>
> That's dangerous advice - when the system chooses not to use
> TSC it often has a reason.

Sorry, it was not general advice, just something to try out. It really solved my network performance issue...

>> On my Opteron AMD system i normally can route 400 kpps, but with
>> timesource "pmtmr" i could only route around 83 kpps. (I found the timer
>> to be the issue by using oprofile).
>
> Unless you're using packet sniffing or any other application
> that requests time stamps on a socket then the timer shouldn't
> make much difference. Incoming packets are only time stamped
> when someone asks for the timestamps.

I do not know what caused the issue on my machine, but I can look into it if you would like to know.

I do have VLAN interfaces on the machine and it seems that eth1 runs in PROMISC mode (eth1.xxx does not). Could it be caused by that?

Regards,
Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------
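To put the two reported routing rates in perspective, they can be converted into an average time per packet: 400 kpps is 2.5 microseconds per packet, and 83 kpps is about 12 microseconds. The short Python calculation below does only that arithmetic. Attributing the whole difference to timer reads is an assumption made for illustration, not something the thread establishes.

```python
def per_packet_budget_us(kpps: float) -> float:
    """Average time available per packet, in microseconds, at a given rate."""
    return 1_000_000 / (kpps * 1_000)


tsc_us = per_packet_budget_us(400)    # rate reported with the tsc timesource
pmtmr_us = per_packet_budget_us(83)   # rate reported with the pmtmr timesource
extra_us = pmtmr_us - tsc_us          # extra average cost per packet

print(f"tsc:   {tsc_us:.2f} us/packet")
print(f"pmtmr: {pmtmr_us:.2f} us/packet")
print(f"extra: {extra_us:.2f} us/packet")
```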
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Andi Kleen wrote:
> Incoming packets are only time stamped
> when someone asks for the timestamps.

Doesn't that add scheduling latency to the timestamps? Or is it a flag that gets set to trigger timestamping at packet arrival?

Chris
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Jesper Dangaard Brouer wrote:
>
> Harry Edmon <[EMAIL PROTECTED]> wrote:
>
>> I have a system with a strange network performance degradation from
>> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
>> The system has dual single-core Xeons with hyperthreading on.
>
> Hi Harry
>
> Can you check which "high-res timesource" you are using?
>
> In the kernel log look for:
> kernel: Using tsc for high-res timesource
> kernel: Using pmtmr for high-res timesource
>
> I have experienced some network performance degradation when using the
> "pmtmr" timesource, on an Opteron AMD system. It seems that the default
> timesource changed between 2.6.15 and 2.6.16.
>
> If you use "pmtmr" try to reboot with kernel option "clock=tsc".
>
> On my Opteron AMD system i normally can route 400 kpps, but with
> timesource "pmtmr" i could only route around 83 kpps. (I found the timer
> to be the issue by using oprofile).

We have CONFIG_HPET_TIMER=y, so we do not see these messages.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
> If you use "pmtmr" try to reboot with kernel option "clock=tsc".

That's dangerous advice - when the system chooses not to use TSC it often has a reason.

> On my Opteron AMD system I normally can route 400 kpps, but with
> timesource "pmtmr" I could only route around 83 kpps. (I found the timer
> to be the issue by using oprofile).

Unless you're using packet sniffing or any other application that requests time stamps on a socket then the timer shouldn't make much difference. Incoming packets are only time stamped when someone asks for the timestamps.

-Andi
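As an illustration of what "asking for the timestamps" can look like from user space, here is a minimal Python sketch that enables SO_TIMESTAMP on a UDP socket and reads the kernel receive time from the ancillary data. It is a sketch under stated assumptions: a Linux host, a struct timeval made of two native C longs (true on common 64-bit builds), and a fallback constant of 29 when the Python build does not export SO_TIMESTAMP. The helper names are hypothetical and nothing here comes from the thread itself.

```python
import socket
import struct

# Assumption: 29 is the Linux value of SO_TIMESTAMP on most architectures.
# It is used only if this Python build does not export the constant.
SO_TIMESTAMP = getattr(socket, "SO_TIMESTAMP", 29)

# Assumption: struct timeval is two native C longs (seconds, microseconds).
TIMEVAL_FORMAT = "ll"
TIMEVAL_SIZE = struct.calcsize(TIMEVAL_FORMAT)


def parse_timeval(data):
    """Decode a native struct timeval into seconds as a float."""
    sec, usec = struct.unpack(TIMEVAL_FORMAT, data[:TIMEVAL_SIZE])
    return sec + usec / 1_000_000


def open_timestamped_udp(port, host="127.0.0.1"):
    """Create a UDP socket that asks the kernel to timestamp received packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMP, 1)
    sock.bind((host, port))
    return sock


def recv_with_timestamp(sock, bufsize=2048):
    """Receive one datagram; return (payload, kernel timestamp or None)."""
    payload, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(TIMEVAL_SIZE)
    )
    for level, ctype, cdata in ancdata:
        # On Linux the control message type for this option has the
        # same numeric value as SO_TIMESTAMP itself.
        if level == socket.SOL_SOCKET and ctype == SO_TIMESTAMP:
            return payload, parse_timeval(cdata)
    return payload, None
```

A caller would create the socket with open_timestamped_udp() and then loop on recv_with_timestamp(). A packet sniffer requests timestamps through a different interface, so this shows only the socket-option case mentioned above.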
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Harry Edmon <[EMAIL PROTECTED]> wrote:

> I have a system with a strange network performance degradation from 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6. The system has dual single-core Xeons with hyperthreading on.

Hi Harry

Can you check which "high-res timesource" you are using?

In the kernel log look for:
  kernel: Using tsc for high-res timesource
  kernel: Using pmtmr for high-res timesource

I have experienced some network performance degradation when using the "pmtmr" timesource, on an Opteron AMD system. It seems that the default timesource changed between 2.6.15 and 2.6.16.

If you use "pmtmr" try to reboot with kernel option "clock=tsc".

On my Opteron AMD system I normally can route 400 kpps, but with timesource "pmtmr" I could only route around 83 kpps. (I found the timer to be the issue by using oprofile).

Cheers,
Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---
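The log check described above can be scripted. The following Python sketch scans kernel log lines for the "Using ... for high-res timesource" message quoted in this message and returns the timesource name. The function name and the idea of passing the log as a list of lines are assumptions for illustration. Elsewhere in this thread, a system built with CONFIG_HPET_TIMER=y is reported not to print these messages, in which case the function returns None.

```python
import re

# Matches the kernel log message quoted in the message above.
TIMESOURCE_RE = re.compile(r"Using (\w+) for high-res timesource")


def find_timesource(log_lines):
    """Return the first high-res timesource named in the log, or None."""
    for line in log_lines:
        match = TIMESOURCE_RE.search(line)
        if match:
            return match.group(1)
    return None


# Example input built from the two messages quoted in the thread.
sample = [
    "kernel: some unrelated boot message",
    "kernel: Using pmtmr for high-res timesource",
]
print(find_timesource(sample))  # prints pmtmr
```

On a live system the lines could come from the output of dmesg or from the kernel log file, whichever the distribution provides.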
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Stephen Hemminger wrote:

> Does this fix it?
> # sysctl -w net.ipv4.tcp_abc=0

That did not help. I have 1 minute outputs from tcpdump under both 2.6.11.12 and 2.6.16.20. You will see a large size difference between the files. Since the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web instead of via attachments. Look at:

http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

And again, thanks to all of you for looking into this.

--
Dr. Harry Edmon                 E-MAIL: [EMAIL PROTECTED]
206-543-0547                            [EMAIL PROTECTED]
Dept of Atmospheric Sciences    FAX: 206-543-0308
University of Washington, Box 351640, Seattle, WA 98195-1640
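One simple way to compare two captures like these is to count packets per second in each. The Python sketch below assumes tcpdump's default text output, where every packet line starts with an HH:MM:SS.ffffff timestamp. The format of the posted files has not been checked here, and the sample lines are invented placeholders, not data from the thread.

```python
from collections import Counter


def packets_per_second(lines):
    """Count packet lines per wall-clock second in tcpdump text output."""
    counts = Counter()
    for line in lines:
        stamp = line.split(" ", 1)[0]
        # Accept only tokens shaped like HH:MM:SS[.ffffff].
        if len(stamp) >= 8 and stamp[2] == ":" and stamp[5] == ":":
            counts[stamp[:8]] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "09:01:23.000100 IP hostA.388 > hostB.40000: . ack 1 win 100",
    "09:01:23.500200 IP hostB.40000 > hostA.388: P 1:101(100) ack 1 win 100",
    "09:01:24.000300 IP hostA.388 > hostB.40000: . ack 101 win 100",
    "not a packet line",
]
for second, count in sorted(packets_per_second(sample).items()):
    print(second, count)
```

Running the same function over the file from each kernel and comparing the per-second totals gives a quick first look before reading individual packets.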
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Stephen Hemminger wrote:

> Does this fix it?
> # sysctl -w net.ipv4.tcp_abc=0

Thanks for the suggestion. I will give it a try later tonight. Also Andrew - sorry for the incorrect placement of my follow-up comments. I do appreciate everyone's help in figuring this out.

--
Dr. Harry Edmon                 E-MAIL: [EMAIL PROTECTED]
206-543-0547                            [EMAIL PROTECTED]
Dept of Atmospheric Sciences    FAX: 206-543-0308
University of Washington, Box 351640, Seattle, WA 98195-1640
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Andrew Morton wrote:

> On Sat, 17 Jun 2006 16:23:34 -0700 Harry Edmon <[EMAIL PROTECTED]> wrote:
>
> > Andrew Morton wrote:
> >
> > > On Fri, 16 Jun 2006 09:01:23 -0700 Harry Edmon <[EMAIL PROTECTED]> wrote:
> > >
> > > > I have a system with a strange network performance degradation from 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6. The system has dual single-core Xeons with hyperthreading on. The application is the LDM system from UCAR/Unidata (http://www.unidata.ucar.edu/software/ldm). This system requests weather data from a variety of systems using RPC calls over a reserved TCP port (388), puts them into a memory mapped queue file, and then sends the data out to a variety of downstream requesting systems, again using RPC calls. When the load is heavy, the 2.6.16.20 kernel falls way behind with the data ingestion. The 2.6.11.12 kernel does not. I have tried an experiment with a 2.6.17-rc6 system where it just does the ingestion, and not the downstream distribution, and it is able to keep up. I would really appreciate any pointers as to where the problem may be and how to diagnose it. I have attached the config files from both kernels and the sysctl.conf file I am using. I have also included the output from "netstat -s" on the 2.6.16.20 system during a time when it was having problems.
> > >
> > > (added netdev)
> > >
> > > A quick grep indicates that it isn't using TCP_NODELAY - we've had problems with that in the past.
> > >
> > > Perhaps a tcpdump of the net traffic will help to determine what's going on.
>
> [ edit, edit - please don't top-post ]
>
> > I assume you are talking about using TCP_NODELAY as a socket option within the LDM software. I could give that a try.
>
> The use of TCP_NODELAY caused problems with the JVM debugger. I'm not suggesting that enabling it will fix anything here.
>
> > There is a lot of traffic on this node, on the order of 2000 packets in and out per second, so the tcpdump output will grow pretty fast. How long a tcpdump would be useful, and what options would you suggest?
>
> I don't know, frankly - first one needs to develop some sort of theory, then use the diagnostic tools to prove or disprove that theory. And I don't have a theory.
>
> I guess a simple one-second bare `tcpdump -i eth0' would be a starting point. Perhaps compare the output of that with the output from a correctly-operating kernel, see if anything suggests itself. That might also give us something which the networking developers can use.

Does this fix it?

# sysctl -w net.ipv4.tcp_abc=0
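Before changing the setting suggested above, it can help to see its current value. A sysctl name maps to a file under /proc/sys with the dots replaced by slashes, so a small Python sketch can read it without shelling out. The helper names are hypothetical. The file may be absent on kernels that do not provide this tunable, in which case the reader function returns None.

```python
def sysctl_path(name, root="/proc/sys"):
    """Translate a dotted sysctl name into its path under /proc/sys."""
    return f"{root}/{name.replace('.', '/')}"


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(name, root)) as handle:
            return handle.read().strip()
    except OSError:
        # Missing tunable, non-Linux host, or insufficient permissions.
        return None


print(sysctl_path("net.ipv4.tcp_abc"))  # prints /proc/sys/net/ipv4/tcp_abc
print(read_sysctl("net.ipv4.tcp_abc"))
```

Recording the value before and after a test run makes it easier to tell which setting each capture was taken under.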
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Sat, 17 Jun 2006 16:23:34 -0700 Harry Edmon <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Fri, 16 Jun 2006 09:01:23 -0700
> > Harry Edmon <[EMAIL PROTECTED]> wrote:
> >
> >> I have a system with a strange network performance degradation from
> >> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
> >> The system has dual single-core Xeons with hyperthreading on. The
> >> application is the LDM system from UCAR/Unidata
> >> (http://www.unidata.ucar.edu/software/ldm). This system requests
> >> weather data from a variety of systems using RPC calls over a reserved
> >> TCP port (388), puts them into a memory mapped queue file, and then
> >> sends the data out to a variety of downstream requesting systems, again
> >> using RPC calls. When the load is heavy, the 2.6.16.20 kernel falls way
> >> behind with the data ingestion. The 2.6.11.12 kernel does not. I have
> >> tried an experiment with a 2.6.17-rc6 system where it just does the
> >> ingestion, and not the downstream distribution, and it is able to keep
> >> up. I would really appreciate any pointers as to where the problem may
> >> be and how to diagnose it. I have attached the config files from both
> >> kernels and the sysctl.conf file I am using. I have also included the
> >> output from "netstat -s" on the 2.6.16.20 system during a time when it
> >> was having problems.
> >>
> >
> > (added netdev)
> >
> > A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
> > with that in the past.
> >
> > Perhaps a tcpdump of the net traffic will help to determine what's going on.

[ edit, edit - please don't top-post ]

> I assume you are talking about using TCP_NODELAY as a socket option within the
> LDM software. I could give that a try.

The use of TCP_NODELAY caused problems with the JVM debugger. I'm not suggesting that enabling it will fix anything here.

>
> There is a lot of traffic on this node, on the order of 2000 packets in and out
> per second, so the tcpdump output will grow pretty fast. How long a tcpdump
> would be useful, and what options would you suggest?

I don't know, frankly - first one needs to develop some sort of theory, then use the diagnostic tools to prove or disprove that theory. And I don't have a theory.

I guess a simple one-second bare `tcpdump -i eth0' would be a starting point. Perhaps compare the output of that with the output from a correctly-operating kernel, see if anything suggests itself. That might also give us something which the networking developers can use.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
I assume you are talking about using TCP_NODELAY as a socket option within the LDM software. I could give that a try. There is a lot of traffic on this node, on the order of 2000 packets in and out per second, so the tcpdump output will grow pretty fast. How long a tcpdump would be useful, and what options would you suggest? I should also note that my network interfaces are Intel, using the latest e1000 driver.

Andrew Morton wrote:

> On Fri, 16 Jun 2006 09:01:23 -0700 Harry Edmon <[EMAIL PROTECTED]> wrote:
>
> > I have a system with a strange network performance degradation from 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6. The system has dual single-core Xeons with hyperthreading on. The application is the LDM system from UCAR/Unidata (http://www.unidata.ucar.edu/software/ldm). This system requests weather data from a variety of systems using RPC calls over a reserved TCP port (388), puts them into a memory mapped queue file, and then sends the data out to a variety of downstream requesting systems, again using RPC calls. When the load is heavy, the 2.6.16.20 kernel falls way behind with the data ingestion. The 2.6.11.12 kernel does not. I have tried an experiment with a 2.6.17-rc6 system where it just does the ingestion, and not the downstream distribution, and it is able to keep up. I would really appreciate any pointers as to where the problem may be and how to diagnose it. I have attached the config files from both kernels and the sysctl.conf file I am using. I have also included the output from "netstat -s" on the 2.6.16.20 system during a time when it was having problems.
>
> (added netdev)
>
> A quick grep indicates that it isn't using TCP_NODELAY - we've had problems with that in the past.
>
> Perhaps a tcpdump of the net traffic will help to determine what's going on.

--
Dr. Harry Edmon                 E-MAIL: [EMAIL PROTECTED]
206-543-0547                            [EMAIL PROTECTED]
Dept of Atmospheric Sciences    FAX: 206-543-0308
University of Washington, Box 351640, Seattle, WA 98195-1640
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Fri, 16 Jun 2006 09:01:23 -0700 Harry Edmon <[EMAIL PROTECTED]> wrote:

> I have a system with a strange network performance degradation from
> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
> The system has dual single-core Xeons with hyperthreading on. The
> application is the LDM system from UCAR/Unidata
> (http://www.unidata.ucar.edu/software/ldm). This system requests
> weather data from a variety of systems using RPC calls over a reserved
> TCP port (388), puts them into a memory mapped queue file, and then
> sends the data out to a variety of downstream requesting systems, again
> using RPC calls. When the load is heavy, the 2.6.16.20 kernel falls way
> behind with the data ingestion. The 2.6.11.12 kernel does not. I have
> tried an experiment with a 2.6.17-rc6 system where it just does the
> ingestion, and not the downstream distribution, and it is able to keep
> up. I would really appreciate any pointers as to where the problem may
> be and how to diagnose it. I have attached the config files from both
> kernels and the sysctl.conf file I am using. I have also included the
> output from "netstat -s" on the 2.6.16.20 system during a time when it
> was having problems.
>

(added netdev)

A quick grep indicates that it isn't using TCP_NODELAY - we've had problems with that in the past.

Perhaps a tcpdump of the net traffic will help to determine what's going on.
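For readers unfamiliar with the option mentioned above, this is roughly what setting TCP_NODELAY on a socket looks like. The sketch is in Python for brevity and uses a hypothetical helper name. The LDM software itself is not shown here, and whether the option helps in this case is exactly what the thread leaves open.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back; a nonzero value means it is enabled.
enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print("TCP_NODELAY enabled:", enabled)
sock.close()
```

The option has to be set on each connection's socket, so in an RPC-based application it would need to be applied wherever the TCP connections are created.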