Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
On Fri, 07 Sep 2007, jamal wrote:

> On Fri, 2007-07-09 at 10:31 +0100, James Chapman wrote:
> > Not really. I used 3-year-old, single CPU x86 boxes with e100
> > interfaces. The idle poll change keeps them in polled mode. Without
> > idle poll, I get twice as many interrupts as packets, one for txdone
> > and one for rx. NAPI is continuously scheduled in/out.
>
> Certainly faster than the machine in the paper (which was about 2 years
> old in 2005). I could never get ping -f to do that for me - so things
> must be getting worse with newer machines then.
>
> > No. Since I did a flood ping from the machine under test, the improved
> > latency meant that the ping response was handled more quickly, causing
> > the next packet to be sent sooner. So more packets were transmitted in
> > the allotted time (10 seconds).
>
> ok.
>
> > With current NAPI:
> > rtt min/avg/max/mdev = 0.902/1.843/101.727/4.659 ms, pipe 9, ipg/ewma 1.611/1.421 ms
> >
> > With idle poll changes:
> > rtt min/avg/max/mdev = 0.898/1.117/28.371/0.689 ms, pipe 3, ipg/ewma 1.175/1.236 ms
>
> Not bad in terms of latency. The deviation certainly looks better.
>
> > But the CPU has done more work.
>
> I am going to be the devil's advocate[1]:

So let me be the angel's advocate. :-)

> If the problem i am trying to solve is reduce cpu use at lower rate,
> then this is not the right answer because your cpu use has gone up.
> Your latency numbers have not improved that much (looking at the avg)
> and your throughput is not that much higher. Will i be willing to pay
> more cpu (of an already piggish cpu use by NAPI at that rate with 2
> interrupts per packet)?

I view his results much more favorably. With current NAPI, the average RTT is 104% higher than the minimum, the deviation is 4.659 ms, and the maximum RTT is 101.727 ms. With his patch, the average RTT is only 24% higher than the minimum, the deviation is only 0.689 ms, and the maximum RTT is 28.371 ms. The average RTT improved by 39%, the deviation was 6.8 times smaller, and the maximum RTT was 3.6 times smaller. So in every respect the latency was significantly better.

The throughput increased from 6200 packets to 8510 packets or an increase of 37%.

The only negative is that the CPU utilization increased from 62% to 100% or an increase of 61%, so the CPU increase was greater than the increase in the amount of work performed (17.6% greater than what one would expect purely from the increased amount of work).

You can't always improve on all metrics of a workload. Sometimes there are tradeoffs to be made, to be decided by the user based on what's most important to that user and his specific workload. And the suggested ethtool option (defaulting to current behavior) would enable the user to make that decision.

-Bill

P.S. I agree that some tests run in parallel with some CPU hogs also running might be beneficial and enlightening.

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
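The percentages in the message above can be re-derived from the quoted ping summaries and the packet/CPU figures. The following is only a small sketch of that arithmetic; the helper names are made up for illustration and are not part of any patch in this thread.

```c
/* Percentage by which `value` exceeds `base`. */
double pct_over(double value, double base)
{
	return (value - base) / base * 100.0;
}

/* Percentage by which `after` is lower than `before`. */
double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}

/*
 * How much more CPU was spent than the growth in work alone would
 * predict, in percent: (cpu growth factor / work growth factor) - 1.
 */
double excess_cpu_pct(double work_before, double work_after,
		      double cpu_before, double cpu_after)
{
	double work_factor = work_after / work_before;
	double cpu_factor = cpu_after / cpu_before;

	return (cpu_factor / work_factor - 1.0) * 100.0;
}
```

For example, pct_over(1.843, 0.902) gives the "average over minimum" figure for current NAPI, and excess_cpu_pct(6200, 8510, 62, 100) gives the extra CPU cost per unit of work, which lands close to the figure quoted above (small differences come from rounding the intermediate values).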
Re: [PATCH] Add IP1000A Driver
On Wed, 12 Sep 2007 13:35:43 +0800
黃建興-Jesse [EMAIL PROTECTED] wrote:

> -----Original Message-----
> From: Stephen Hemminger [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 11, 2007 10:42 PM
> To: Jesse Huang
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED]
> Subject: Re: [PATCH] Add IP1000A Driver
>
> > Who will be listed as maintainer of this device? A good way to show
> > that is to add an entry to MAINTAINERS file.
>
> Ok, Should I generate a patch to modify MAINTAINERS file?

Yes, can be included with patch or separate, it doesn't matter.

> > > + * Current Maintainer:
> > > + *
> > > + * Sorbica Shieh.
> > > + * 10F, No.47, Lane 2, Kwang-Fu RD.
> > > + * Sec. 2, Hsin-Chu, Taiwan, R.O.C.
> > > + * http://www.icplus.com.tw
> > > + * [EMAIL PROTECTED]
> > > + */
> >
> > Names only, no physical addresses please.
>
> Should I remove those two lines?
> 10F, No.47, Lane 2, Kwang-Fu RD.
> Sec. 2, Hsin-Chu, Taiwan, R.O.C.

It is your option, but many times people and companies move locations and this gets out of date.
[PATCH 1/2] dgrs: remove from build, config, and maintainer list
From: Nathanael Nerode

Stop building and configuring driver for Digi RightSwitch, which was never actually sold to anyone, and remove it from MAINTAINERS.

In response to an investigation into the firmware of the Digi Rightswitch driver, Andres Salomon discovered:

> Dear Andres:
>
> After further research, we found that this product was killed in place
> and never reached the market. We would like to request that this not be
> included.

Since the product never reached market, clearly nobody is using this orphaned driver.

Signed-off-by: Nathanael Nerode [EMAIL PROTECTED]
---
This is patch 1 of 2 for removing the Digi Rightswitch (dgrs).

Patch 2 would be the patch to remove the actual files. However, that would be around 400K, which doesn't seem suitable for a mailing list -- and this length seems quite unnecessary, given that it would consist solely of full-file deletions. I'm not quite sure what to do about this. Please advise.

These are the files to be deleted:
./Documentation/networking/dgrs.txt
./drivers/net/dgrs.c
./drivers/net/dgrs.h
./drivers/net/dgrs_asstruct.h
./drivers/net/dgrs_bcomm.h
./drivers/net/dgrs_es4h.h
./drivers/net/dgrs_ether.h
./drivers/net/dgrs_firmware.c (this is the very large one)
./drivers/net/dgrs_i82596.h
./drivers/net/dgrs_plx9060.h

diff -upr linux-2.6.22.6/drivers/net/Kconfig linux-2.6-deleted/drivers/net/Kconfig
--- linux-2.6.22.6/drivers/net/Kconfig 2007-08-31 02:21:01.0 -0400
+++ linux-2.6-deleted/drivers/net/Kconfig 2007-09-12 03:28:11.0 -0400
@@ -1447,21 +1447,6 @@ config TC35815
 	depends on NET_PCI && PCI && MIPS
 	select MII
 
-config DGRS
-	tristate "Digi Intl. RightSwitch SE-X support"
-	depends on NET_PCI && (PCI || EISA)
-	---help---
-	  This is support for the Digi International RightSwitch series of
-	  PCI/EISA Ethernet switch cards. These include the SE-4 and the SE-6
-	  models. If you have a network card of this type, say Y and read the
-	  Ethernet-HOWTO, available from
-	  <http://www.tldp.org/docs.html#howto>. More specific
-	  information is contained in <file:Documentation/networking/dgrs.txt>.
-
-	  To compile this driver as a module, choose M here and read
-	  <file:Documentation/networking/net-modules.txt>. The module
-	  will be called dgrs.
-
 config EEPRO100
 	tristate "EtherExpressPro/100 support (eepro100, original Becker driver)"
 	depends on NET_PCI && PCI
diff -upr linux-2.6.22.6/drivers/net/Makefile linux-2.6-deleted/drivers/net/Makefile
--- linux-2.6.22.6/drivers/net/Makefile 2007-08-31 02:21:01.0 -0400
+++ linux-2.6-deleted/drivers/net/Makefile 2007-09-12 03:28:31.0 -0400
@@ -38,7 +38,6 @@ obj-$(CONFIG_CASSINI) += cassini.o
 obj-$(CONFIG_MACE) += mace.o
 obj-$(CONFIG_BMAC) += bmac.o
 
-obj-$(CONFIG_DGRS) += dgrs.o
 obj-$(CONFIG_VORTEX) += 3c59x.o
 obj-$(CONFIG_TYPHOON) += typhoon.o
 obj-$(CONFIG_NE2K_PCI) += ne2k-pci.o 8390.o
diff -upr linux-2.6.22.6/MAINTAINERS linux-2.6-deleted/MAINTAINERS
--- linux-2.6.22.6/MAINTAINERS 2007-08-31 02:21:01.0 -0400
+++ linux-2.6-deleted/MAINTAINERS 2007-09-12 03:27:26.0 -0400
@@ -1234,12 +1234,6 @@ L: [EMAIL PROTECTED]
 W: http://www.digi.com
 S: Orphaned
 
-DIGI RIGHTSWITCH NETWORK DRIVER
-P: Rick Richardson
-L: netdev@vger.kernel.org
-W: http://www.digi.com
-S: Orphaned
-
 DIRECTORY NOTIFICATION
 P: Stephen Rothwell
 M: [EMAIL PROTECTED]

-- 
Nathanael Nerode [EMAIL PROTECTED]

[Insert famous quote here]
Re: [PATCH 0/2] Clean up owner field in sock_lock_t
From: John Heffner [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 14:01:31 -0400

> I don't know why the owner field is a (struct sock_iocb *). I'm assuming
> it's historical. Can someone check this out? Did I miss some alternate
> usage?

AIO used it somehow in net/socket.c and I believe there was some intention to access this sock_iocb deeper in the call chain. None of that materialized of course :)

> These patches are against net-2.6.24.

Thanks a lot, I'll add these patches.
Re: [PATCH]: xfrm audit calls
From: Joy Latten [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 19:03:14 -0500

> This patch modifies the current ipsec audit layer by breaking it up into
> purpose driven audit calls.
>
> So far, the only audit calls made are when adding or deleting an
> SA/policy. It had been discussed to give each key manager its own calls
> to do this, but I found there to be much redundancy since they did the
> exact same things, except for how they got auid and sid, so I combined
> them. The below audit calls can be made by any key manager. Hopefully,
> this is ok.
>
> I compiled and tested with CONFIG_AUDITSYSCALLS on and off.
>
> Signed-off-by: Joy Latten [EMAIL PROTECTED]

Patch applied, thanks!
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: Eric Dumazet [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 14:56:13 +0200

> When the periodic IP route cache flush is done (every 600 seconds on
> default configuration), some hosts suffer a lot and eventually trigger
> the soft lockup message.
>
> dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
> eventually freeing some (less than 1%) of them, while holding the
> dst_lock spinlock for the whole scan.
>
> Then it rearms a timer to redo the full thing 1/10 s later... The
> slowdown can last one minute or so, depending on how active the tcp
> sessions are.
>
> This second version of the patch converts the processing from a softirq
> based one to a workqueue.
>
> Even if the list of entries in garbage_list is huge, host is still
> responsive to softirqs and can make progress.
>
> Instead of resetting gc timer to 0.1 second if one entry was freed in a
> gc run, we do this if more than 10% of entries were freed.

I like this patch a lot, some minor fix is needed though:

> + __builtin_prefetch(&next->next, 1, 0);

Please use prefetch() instead of a direct explicit call to a gcc-specific routine :-)
Re: [PATCH 01/16] appletalk: In notifier handlers convert the void pointer to a netdevice
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:09:36 -0600

> This slightly improves code safety and clarity.
>
> Later network namespace patches touch this code so this is a
> preliminary cleanup.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24
Re: [PATCH 02/16] net: Don't implement dev_ifname32 inline
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:13:04 -0600

> The current implementation of dev_ifname makes maintenance difficult
> because updates to the implementation of the ioctl have to be made in
> two places. So this patch updates dev_ifname32 to do a classic 32/64
> structure conversion and call sys_ioctl like the rest of the compat
> calls do.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 03/16] net: Basic network namespace infrastructure.
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:15:34 -0600

> This is the basic infrastructure needed to support network namespaces.
> This infrastructure is:
> - Registration functions to support initializing per network namespace
>   data when a network namespace is created or destroyed.
> - struct net. The network namespace data structure. This structure will
>   grow as variables are made per network namespace but this is the
>   minimal starting point.
> - Functions to grab a reference to the network namespace. I provide both
>   get/put functions that keep a network namespace from being freed, and
>   hold/release functions that serve as weak references and will warn if
>   their count is not zero when the data structure is freed. Useful for
>   dealing with more complicated data structures like the ipv4 route
>   cache.
> - A list of all of the network namespaces so we can iterate over them.
> - A slab for the network namespace data structure allowing leaks to be
>   spotted.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

I realize there are some discussions about naming and fixing some races, but I applied this anyways so we can make some forward progress. We can make name changes and fixes on top of this initial work.
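The difference between the get/put and hold/release reference styles described above can be illustrated with a tiny user-space model. This is only a sketch under assumptions: struct ns_sketch and the ns_*() helpers are hypothetical names, the counters are plain ints rather than atomics, and nothing here is taken from the actual patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for the namespace structure. */
struct ns_sketch {
	int count;     /* strong references: keep the object alive */
	int use_count; /* weak references: only checked at free time */
	int freed;     /* set once the last strong reference is dropped */
	int warned;    /* set if weak references were still outstanding */
};

void ns_init(struct ns_sketch *ns)
{
	ns->count = 1; /* the creator owns the first strong reference */
	ns->use_count = 0;
	ns->freed = 0;
	ns->warned = 0;
}

/* Strong reference: the namespace cannot be freed while it is held. */
void ns_get(struct ns_sketch *ns) { ns->count++; }

/* Weak reference: does not delay freeing, but is accounted for. */
void ns_hold(struct ns_sketch *ns) { ns->use_count++; }
void ns_release(struct ns_sketch *ns) { ns->use_count--; }

/* Drop a strong reference; "free" on the last one and warn on leaks. */
void ns_put(struct ns_sketch *ns)
{
	if (--ns->count > 0)
		return;
	if (ns->use_count != 0) {
		fprintf(stderr, "namespace freed with %d weak refs\n",
			ns->use_count);
		ns->warned = 1;
	}
	ns->freed = 1;
}
```

The point of the second counter in this model is purely diagnostic: a holder that forgets to call ns_release() does not keep the object alive, but the leak becomes visible at the moment the object goes away.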
Re: [PATCH 04/16] net: Add a network namespace parameter to tasks
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:17:03 -0600

> This is the network namespace from which all sockets and anything else
> under user control ultimately get their network namespace parameters.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24
Re: [PATCH 05/16] net: Add a network namespace tag to struct net_device
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:18:12 -0600

> Please note that network devices do not increase the count on the
> network namespace. They are inside the network namespace, so the
> network namespace tag is in the nature of a back pointer, and getting
> and putting the network namespace is unnecessary.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 06/16] net: Add a network namespace parameter to struct sock
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:21:37 -0600

> Sockets need to get a reference to their network namespace, or possibly
> a simple hold if someone registers on the network namespace notifier
> and will free the sockets when the namespace is going to be destroyed.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied, thanks.
[PATCH] RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.
RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.

Calling arp_send() to initiate neighbour discovery (ND) doesn't do the full ND protocol. Namely, it doesn't handle retransmitting the arp request if it is dropped. The function neigh_event_send() does all this. Without doing full ND, rdma address resolution fails in the presence of dropped arp bcast packets.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/core/addr.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index c5c33d3..5381c80 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -161,8 +161,7 @@ static void addr_send_arp(struct sockadd
 	if (ip_route_output_key(&rt, &fl))
 		return;
 
-	arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
-		 rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
+	neigh_event_send(rt->u.dst.neighbour, NULL);
 	ip_rt_put(rt);
 }
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
This looks nice in general, getting things out of softirq context is always good.

On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote:
> #if RT_CACHE_DEBUG >= 2
> static atomic_t dst_total = ATOMIC_INIT(0);
> #endif
> -static unsigned long dst_gc_timer_expires;
> -static unsigned long dst_gc_timer_inc = DST_GC_MAX;
> -static void dst_run_gc(unsigned long);
> +static struct {
> +	spinlock_t		lock;
> +	struct dst_entry	*list;
> +	unsigned long		timer_inc;
> +	unsigned long		timer_expires;
> +} dst_garbage = {
> +	.lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
> +	.timer_inc = DST_GC_MAX,
> +};

Can you please get rid of this useless struct? It just complicates the code and means we can't use the proper DEFINE_SPINLOCK initializer.

> +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);

This should be static.
Re: [PATCH 07/16] net: Make /proc/net per network namespace
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:20:36 -0600

> This patch makes /proc/net per network namespace. It modifies the
> global variables proc_net and proc_net_stat to be per network
> namespace. The proc_net file helpers are modified to take a network
> namespace argument, and all of their callers are fixed to pass init_net
> for that argument. This ensures that all of the /proc/net files are
> only visible and usable in the initial network namespace until the code
> behind them has been updated to handle multiple network namespaces.
>
> Making /proc/net per namespace is necessary as at least some files in
> /proc/net depend upon the set of network devices which is per network
> namespace, and even more files in /proc/net have contents that are
> relevant to a single network namespace.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Patch applied, thanks.
Re: [PATCH 08/16] net: Make socket creation namespace safe.
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:23:01 -0600

> This patch passes in the namespace a new socket should be created in
> and has the socket code do the appropriate reference counting. By
> virtue of this all socket create methods are touched. In addition the
> socket create methods are modified so that they will fail if you
> attempt to create a socket in a non-default network namespace.
>
> Failing if we attempt to create a socket outside of the default network
> namespace ensures that as we incrementally make the network stack
> network namespace aware we will not export functionality that someone
> has not audited and made certain is network namespace safe. This allows
> us to partially enable network namespaces before all of the exotic
> protocols are supported.
>
> Any protocol layers I have missed will fail to compile because I now
> pass an extra parameter into the socket creation code.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Patch applied, thanks.
Re: [PATCH 1/2] dgrs: remove from build, config, and maintainer list
On Wed, 12 Sep 2007, Nathanael Nerode wrote:

> From: Nathanael Nerode
>
> Stop building and configuring driver for Digi RightSwitch, which was
> never actually sold to anyone, and remove it from MAINTAINERS.
>
> In response to an investigation into the firmware of the Digi
> Rightswitch driver, Andres Salomon discovered:

search the netdev archive for this month before sending out duplicate patches.

jgarzik was on the kernel summit, so i'm waiting on his reply to the complete removal patch.

-- 
maks
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
On Wed, 12 Sep 2007 02:05:25 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

> From: Eric Dumazet [EMAIL PROTECTED]
> Date: Tue, 11 Sep 2007 14:56:13 +0200
>
> > When the periodic IP route cache flush is done (every 600 seconds on
> > default configuration), some hosts suffer a lot and eventually trigger
> > the soft lockup message.
> >
> > dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
> > eventually freeing some (less than 1%) of them, while holding the
> > dst_lock spinlock for the whole scan.
> >
> > Then it rearms a timer to redo the full thing 1/10 s later... The
> > slowdown can last one minute or so, depending on how active the tcp
> > sessions are.
> >
> > This second version of the patch converts the processing from a
> > softirq based one to a workqueue.
> >
> > Even if the list of entries in garbage_list is huge, host is still
> > responsive to softirqs and can make progress.
> >
> > Instead of resetting gc timer to 0.1 second if one entry was freed in
> > a gc run, we do this if more than 10% of entries were freed.
>
> I like this patch a lot, some minor fix is needed though:

Thank you

I also spotted a missing static before DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task); no need to stress Adrian on this :)

> > + __builtin_prefetch(&next->next, 1, 0);
>
> Please use prefetch() instead of a direct explicit call to a
> gcc-specific routine :-)

Unfortunately, there is no equivalent for this one. This gives on my Opterons a nice prefetchnta.

prefetch(addr) is more like __builtin_prefetch(addr, 0, 3).

I would like to avoid zapping the L2 cache with useless data.

__builtin_prefetch() is included from gcc 3.1 (2002), so every platform should support it, as linux-2.6 requires gcc 3.2 at least.

I guess you are going to tell me to first publish a patch to lkml :)

Thank you
Eric
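For readers unfamiliar with the three-argument form discussed above: in GCC, __builtin_prefetch(addr, rw, locality) takes a read/write hint (0 or 1) and a temporal-locality hint (0 to 3, where 0 means the data need not stay in cache after use). A minimal user-space sketch of the list-walk pattern follows; struct node and sum_list() are illustrative names only, and whether the hint helps at all depends on the CPU and the workload.

```c
#include <stddef.h>

struct node {
	struct node *next;
	long value;
};

/*
 * Walk a singly linked list while hinting that the node after the
 * current one will be touched soon, written to (rw = 1), and is not
 * worth keeping in cache afterwards (locality = 0).
 */
long sum_list(struct node *head)
{
	long sum = 0;
	struct node *n, *next;

	for (n = head; n != NULL; n = next) {
		next = n->next;
		if (next != NULL)
			__builtin_prefetch(&next->next, 1, 0);
		sum += n->value;
	}
	return sum;
}
```

The prefetch is only a hint: removing it does not change the result of sum_list(), which is also why a portable wrapper can compile it away on compilers that lack the builtin.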
[PATCH 0/6] NET_SCHED: Rate table fixes
This set of patches aims at fixing an issue with the rate table used by the rate based schedulers. Currently we use the lower-boundary value, which results in under-estimating the actual bandwidth usage. The patches will change this to use the upper-boundary L2T (length to time) value.

The patches include changes to both the kernel and the iproute2 userspace utility. The kernel changes only add flexibility to allow userspace to do the rate table alignment. The patches have been split up into cleanup patches and actual functional change patches.

The patches also move the overhead calculation (currently only used by HTB) into the kernel, which makes it more precise (as it won't misalign the contents of the rate table).

This should raise some questions:
1. What does the current/old rate table mapping look like?
2. What does the new aligned rate table mapping look like?
3. What happens when only the TC util is changed and used on an old kernel?

Let's look at the layout of the rate tables.

Illustrating the rate table array:
Legend description
 rtab[x]   : Array index x of rtab[x]
 xmit_sz   : Transmit size contained in rtab[x] (normally transmit time)
 maps[a-b] : Packet sizes from a to b, will map into rtab[x]

(1) Current/old rate table mapping (cell_log:3):
 rtab[0]:=xmit_sz:0   maps[0-7]
 rtab[1]:=xmit_sz:8   maps[8-15]
 rtab[2]:=xmit_sz:16  maps[16-23]
 rtab[3]:=xmit_sz:24  maps[24-31]
 rtab[4]:=xmit_sz:32  maps[32-39]
 rtab[5]:=xmit_sz:40  maps[40-47]
 rtab[6]:=xmit_sz:48  maps[48-55]

The above illustrates that we are using the lower-boundary transmit size (xmit_sz).

(2) New iproute rate table mapping, with kernel cell_align support.
 rtab[0]:=xmit_sz:8   maps[0-8]
 rtab[1]:=xmit_sz:16  maps[9-16]
 rtab[2]:=xmit_sz:24  maps[17-24]
 rtab[3]:=xmit_sz:32  maps[25-32]
 rtab[4]:=xmit_sz:40  maps[33-40]
 rtab[5]:=xmit_sz:48  maps[41-48]
 rtab[6]:=xmit_sz:56  maps[49-56]

The above illustrates that we are using the upper-boundary transmit size (xmit_sz) when mapping packet sizes.

The interesting question is compatibility. If an old iproute utility is used on a new kernel, we simply get the old rate table (lower-bound) alignment. The interesting case is what happens with a new iproute util on an old kernel. The table below shows that we then use the upper-bound+1byte. I believe that this is a good and acceptable solution.

(3) New TC util on a kernel WITHOUT support for cell_align
 rtab[0]:=xmit_sz:8   maps[0-7]
 rtab[1]:=xmit_sz:16  maps[8-15]
 rtab[2]:=xmit_sz:24  maps[16-23]
 rtab[3]:=xmit_sz:32  maps[24-31]
 rtab[4]:=xmit_sz:40  maps[32-39]
 rtab[5]:=xmit_sz:48  maps[40-47]
 rtab[6]:=xmit_sz:56  maps[48-55]

-- 
Med venlig hilsen / Best regards
 Jesper Brouer
 ComX Networks A/S
 Linux Network developer
 Cand. Scient Datalog / MSc.
 Author of http://adsl-optimizer.dk
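The three mappings above can be reproduced with a few lines of C. This is a sketch under an assumption that is not spelled out in the cover letter: that the kernel-side alignment is an additive offset (cell_align = -1 for the new tables, 0 for a kernel without support) applied to the packet length before the shift. The function names are made up for illustration.

```c
#define CELL_LOG 3 /* 8-byte cells, as in the tables above */

/* Size the old userspace stores in rtab[i]: lower boundary of the cell. */
unsigned old_cell_size(unsigned i)
{
	return i << CELL_LOG;
}

/* Size the new userspace stores in rtab[i]: upper boundary of the cell. */
unsigned new_cell_size(unsigned i)
{
	return (i + 1) << CELL_LOG;
}

/*
 * Slot a packet length maps to. cell_align = 0 models the old lookup;
 * cell_align = -1 models the assumed aligned lookup.
 */
unsigned slot_for(unsigned pktlen, int cell_align)
{
	int len = (int)pktlen + cell_align;

	if (len < 0)
		len = 0;
	return (unsigned)len >> CELL_LOG;
}
```

With these helpers, case (1) is old_cell_size(slot_for(len, 0)), case (2) is new_cell_size(slot_for(len, -1)), and case (3) is new_cell_size(slot_for(len, 0)). In case (3) an 8-byte packet lands in rtab[1] and is charged as 16 bytes, which is the "upper-bound+1byte" behaviour described above.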
[PATCH 1/6] [NET_SCHED]: Cleanup L2T macros and handle oversized packets
commit a28343c933f6cfc3df1be86e0ebe8d99fa8d5f77
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date: Wed Sep 12 10:01:00 2007 +0200

    [NET_SCHED]: Cleanup L2T macros and handle oversized packets

    Change L2T (length to time) macros, in all rate based schedulers, to
    call a common function qdisc_l2t() that does the rate table lookup.
    This function handles if the packet size lookup is larger than the
    rate table, which often occurs with TSO enabled.

    Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8a67f24..4ebd615 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -302,4 +302,16 @@ drop:
 	return NET_XMIT_DROP;
 }
 
+/* Length to Time (L2T) lookup in a qdisc_rate_table, to determine how
+   long it will take to send a packet given its size.
+ */
+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
+{
+	int slot = pktlen;
+	slot >>= rtab->rate.cell_log;
+	if (slot > 255)
+		return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);
+	return rtab->data[slot];
+}
+
 #endif
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 6085be5..46deb5f 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -21,8 +21,8 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
-#define L2T(p,L) ((p)->tcfp_R_tab->data[(L)>>(p)->tcfp_R_tab->rate.cell_log])
-#define L2T_P(p,L) ((p)->tcfp_P_tab->data[(L)>>(p)->tcfp_P_tab->rate.cell_log])
+#define L2T(p,L) qdisc_l2t((p)->tcfp_R_tab, L)
+#define L2T_P(p,L) qdisc_l2t((p)->tcfp_P_tab, L)
 
 #define POL_TAB_MASK 15
 static struct tcf_common *tcf_police_ht[POL_TAB_MASK + 1];
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index e38c283..aed2af2 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -175,7 +175,7 @@ struct cbq_sched_data
 };
 
 
-#define L2T(cl,len)	((cl)->R_tab->data[(len)>>(cl)->R_tab->rate.cell_log])
+#define L2T(cl,len)	qdisc_l2t((cl)->R_tab,len)
 
 
 static __inline__ unsigned cbq_hash(u32 h)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 246a2f9..5e608a6 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -132,10 +132,8 @@ struct htb_class {
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
 		       int size)
 {
-	int slot = size >> rate->rate.cell_log;
-	if (slot > 255)
-		return (rate->data[255]*(slot >> 8) + rate->data[slot & 0xFF]);
-	return rate->data[slot];
+	long result = qdisc_l2t(rate, size);
+	return result;
 }
 
 struct htb_sched {
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 8c2639a..b0d8109 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -115,8 +115,8 @@ struct tbf_sched_data
 	struct qdisc_watchdog watchdog;	/* Watchdog timer */
 };
 
-#define L2T(q,L) ((q)->R_tab->data[(L)>>(q)->R_tab->rate.cell_log])
-#define L2T_P(q,L) ((q)->P_tab->data[(L)>>(q)->P_tab->rate.cell_log])
+#define L2T(q,L) qdisc_l2t((q)->R_tab,L)
+#define L2T_P(q,L) qdisc_l2t((q)->P_tab,L)
 
 static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 {
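To see what the oversized-packet branch of the lookup does, here is a user-space model of it. It is only a sketch: struct rate_table_sketch, l2t_sketch() and fill_linear() are illustrative names, the table is filled with byte sizes rather than transmit times, and an unsigned slot is used where the patch uses int.

```c
#include <stdint.h>

/* Illustrative stand-in for the kernel rate table: 256 cells. */
struct rate_table_sketch {
	uint8_t cell_log;
	uint32_t data[256];
};

/*
 * Same lookup shape as in the patch: lengths that fall beyond the last
 * cell are approximated from the last entry plus a wrapped-around entry.
 */
uint32_t l2t_sketch(const struct rate_table_sketch *rtab, unsigned int pktlen)
{
	unsigned int slot = pktlen >> rtab->cell_log;

	if (slot > 255)
		return rtab->data[255] * (slot >> 8) + rtab->data[slot & 0xFF];
	return rtab->data[slot];
}

/* Fill the table with the lower-boundary size of each cell. */
void fill_linear(struct rate_table_sketch *rtab, uint8_t cell_log)
{
	unsigned int i;

	rtab->cell_log = cell_log;
	for (i = 0; i < 256; i++)
		rtab->data[i] = i << cell_log;
}
```

With cell_log = 3 the table covers lengths up to 2047 bytes directly. A 2400-byte packet (slot 300) is then estimated as data[255] * 1 + data[44], which is close to, but not exactly, the value a larger table would hold, so the oversized path is an approximation rather than an exact extension.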
[PATCH 3/6] [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)
commit ef065a43b8900fbc0763eac0fa0a9a8a00c8aaa2
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date: Tue Sep 11 16:17:46 2007 +0200

    [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)

    Extend the tc_ratespec struct with two parameters:

    1) cell_align, which allows adjusting the alignment of the rate table.
    2) overhead, which allows adding a packet overhead before the lookup
       in the kernel.

    This is done in order to add support for changing the rate table to
    use the upper-boundary L2T (length to time) value. Currently we use
    the lower-boundary, which results in under-estimating the actual
    bandwidth usage.

    Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 268c515..919af93 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -77,8 +77,8 @@ struct tc_ratespec
 {
 	unsigned char cell_log;
 	unsigned char __reserved;
-	unsigned short feature;
-	short addend;
+	unsigned short overhead;
+	short cell_align;
 	unsigned short mpu;
 	__u32 rate;
 };
[PATCH 4/6] [IPROUTE2]: Overhead calculation is now done in the kernel
commit 07a74a2613440fc1a68d0faa7235ed7027532d78 Author: Jesper Dangaard Brouer [EMAIL PROTECTED] Date: Tue Sep 11 16:59:58 2007 +0200 [IPROUTE2]: Overhead calculation is now done in the kernel. The only current user is HTB. HTB overhead argument is now passed on to the kernel (in the struct tc_ratespec). Also correct the data types. Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED] diff --git a/tc/q_htb.c b/tc/q_htb.c index 53e3f78..310d36d 100644 --- a/tc/q_htb.c +++ b/tc/q_htb.c @@ -107,8 +107,9 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str __u32 rtab[256],ctab[256]; unsigned buffer=0,cbuffer=0; int cell_log=-1,ccell_log = -1; - unsigned mtu, mpu; - unsigned char mpu8 = 0, overhead = 0; + unsigned mtu; + unsigned short mpu = 0; + unsigned short overhead = 0; struct rtattr *tail; memset(opt, 0, sizeof(opt)); mtu = 1600; /* eth packet len */ @@ -127,12 +128,12 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str } } else if (matches(*argv, mpu) == 0) { NEXT_ARG(); - if (get_u8(mpu8, *argv, 10)) { + if (get_u16(mpu, *argv, 10)) { explain1(mpu); return -1; } } else if (matches(*argv, overhead) == 0) { NEXT_ARG(); - if (get_u8(overhead, *argv, 10)) { + if (get_u16(overhead, *argv, 10)) { explain1(overhead); return -1; } } else if (matches(*argv, quantum) == 0) { @@ -206,9 +207,11 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str if (!buffer) buffer = opt.rate.rate / get_hz() + mtu; if (!cbuffer) cbuffer = opt.ceil.rate / get_hz() + mtu; -/* encode overhead and mpu, 8 bits each, into lower 16 bits */ - mpu = (unsigned)mpu8 | (unsigned)overhead 8; - opt.ceil.mpu = mpu; opt.rate.mpu = mpu; + opt.ceil.overhead = overhead; + opt.rate.overhead = overhead; + + opt.ceil.mpu = mpu; + opt.rate.mpu = mpu; if ((cell_log = tc_calc_rtable(opt.rate.rate, rtab, cell_log, mtu, mpu)) 0) { fprintf(stderr, htb: failed to calculate rate table.\n); diff --git a/tc/tc_core.c 
b/tc/tc_core.c index 58155fb..1ab0ba0 100644 --- a/tc/tc_core.c +++ b/tc/tc_core.c @@ -73,8 +73,6 @@ int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu, unsigned mpu) { int i; - unsigned overhead = (mpu 8) 0xFF; - mpu = mpu 0xFF; if (mtu == 0) mtu = 2047; @@ -86,8 +84,6 @@ int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu, } for (i=0; i256; i++) { unsigned sz = (icell_log); - if (overhead) - sz += overhead; if (sz mpu) sz = mpu; rtab[i] = tc_calc_xmittime(bps, sz); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
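For readers following the data-type change, the encoding this patch removes can be sketched in a few lines of userspace C. This is only an illustration, assuming (as the removed comment says) that the old scheme packed an 8-bit mpu and an 8-bit overhead into the lower 16 bits of one field. The helper names and `struct ratespec_sketch` are made up for this note and are not part of iproute2 or the kernel headers.

```c
/* Old scheme (assumed from the removed comment): mpu lives in bits 0-7
 * and overhead in bits 8-15 of a single "mpu" field. */
static unsigned pack_mpu_overhead(unsigned char mpu8, unsigned char overhead)
{
	return (unsigned)mpu8 | ((unsigned)overhead << 8);
}

/* What a consumer of the packed value had to do to get the parts back. */
static unsigned unpack_mpu(unsigned packed)
{
	return packed & 0xFF;
}

static unsigned unpack_overhead(unsigned packed)
{
	return (packed >> 8) & 0xFF;
}

/* New scheme: two independent 16-bit fields, so neither value is limited
 * to 255 and no shifting or masking is needed.
 * Hypothetical stand-in for the relevant part of struct tc_ratespec. */
struct ratespec_sketch {
	unsigned short overhead;
	unsigned short mpu;
};
```

With separate fields, `get_u16()` can parse both arguments directly, which is why the diff swaps the `get_u8()` calls.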
[PATCH 5/6] [IPROUTE2]: Cleanup: tc_calc_rtable()
commit e3bad6e344303fec9916d1420aade98a2e6c79cc Author: Jesper Dangaard Brouer [EMAIL PROTECTED] Date: Wed Sep 5 10:47:47 2007 +0200 [IPROUTE2]: Cleanup: tc_calc_rtable(). Change tc_calc_rtable() to take a tc_ratespec struct as an argument. (cell_log still needs to be passed on as a parameter, because -1 indicate that the cell_log needs to be computed by the function.). Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED] diff --git a/tc/m_police.c b/tc/m_police.c index 5d2528b..acdfd22 100644 --- a/tc/m_police.c +++ b/tc/m_police.c @@ -263,22 +263,20 @@ int act_parse_police(struct action_util *a,int *argc_p, char ***argv_p, int tca_ } if (p.rate.rate) { - if ((Rcell_log = tc_calc_rtable(p.rate.rate, rtab, Rcell_log, mtu, mpu)) 0) { + p.rate.mpu = mpu; + if (tc_calc_rtable(p.rate, rtab, Rcell_log, mtu) 0) { fprintf(stderr, TBF: failed to calculate rate table.\n); return -1; } p.burst = tc_calc_xmittime(p.rate.rate, buffer); - p.rate.cell_log = Rcell_log; - p.rate.mpu = mpu; } p.mtu = mtu; if (p.peakrate.rate) { - if ((Pcell_log = tc_calc_rtable(p.peakrate.rate, ptab, Pcell_log, mtu, mpu)) 0) { + p.peakrate.mpu = mpu; + if (tc_calc_rtable(p.peakrate, ptab, Pcell_log, mtu) 0) { fprintf(stderr, POLICE: failed to calculate peak rate table.\n); return -1; } - p.peakrate.cell_log = Pcell_log; - p.peakrate.mpu = mpu; } tail = NLMSG_TAIL(n); diff --git a/tc/q_cbq.c b/tc/q_cbq.c index f2b4ce8..df98312 100644 --- a/tc/q_cbq.c +++ b/tc/q_cbq.c @@ -137,12 +137,11 @@ static int cbq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl if (allot (avpkt*3)/2) allot = (avpkt*3)/2; - if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, allot, mpu)) 0) { + r.mpu = mpu; + if (tc_calc_rtable(r, rtab, cell_log, allot) 0) { fprintf(stderr, CBQ: failed to calculate rate table.\n); return -1; } - r.cell_log = cell_log; - r.mpu = mpu; if (ewma_log 0) ewma_log = TC_CBQ_DEF_EWMA; @@ -336,12 +335,11 @@ static int cbq_parse_class_opt(struct qdisc_util *qu, int argc, char 
**argv, str unsigned pktsize = wrr.allot; if (wrr.allot (lss.avpkt*3)/2) wrr.allot = (lss.avpkt*3)/2; - if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, pktsize, mpu)) 0) { + r.mpu = mpu; + if (tc_calc_rtable(r, rtab, cell_log, pktsize) 0) { fprintf(stderr, CBQ: failed to calculate rate table.\n); return -1; } - r.cell_log = cell_log; - r.mpu = mpu; } if (ewma_log 0) ewma_log = TC_CBQ_DEF_EWMA; diff --git a/tc/q_htb.c b/tc/q_htb.c index 310d36d..e24ad6d 100644 --- a/tc/q_htb.c +++ b/tc/q_htb.c @@ -213,19 +213,17 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str opt.ceil.mpu = mpu; opt.rate.mpu = mpu; - if ((cell_log = tc_calc_rtable(opt.rate.rate, rtab, cell_log, mtu, mpu)) 0) { + if (tc_calc_rtable(opt.rate, rtab, cell_log, mtu) 0) { fprintf(stderr, htb: failed to calculate rate table.\n); return -1; } opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer); - opt.rate.cell_log = cell_log; - if ((ccell_log = tc_calc_rtable(opt.ceil.rate, ctab, cell_log, mtu, mpu)) 0) { + if (tc_calc_rtable(opt.ceil, ctab, ccell_log, mtu) 0) { fprintf(stderr, htb: failed to calculate ceil rate table.\n); return -1; } opt.cbuffer = tc_calc_xmittime(opt.ceil.rate, cbuffer); - opt.ceil.cell_log = ccell_log; tail = NLMSG_TAIL(n); addattr_l(n, 1024, TCA_OPTIONS, NULL, 0); diff --git a/tc/q_tbf.c b/tc/q_tbf.c index 1fc05f4..c7b4f0f 100644 --- a/tc/q_tbf.c +++ b/tc/q_tbf.c @@ -170,21 +170,20 @@ static int tbf_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl opt.limit = lim; } - if ((Rcell_log = tc_calc_rtable(opt.rate.rate, rtab, Rcell_log, mtu, mpu)) 0) { + opt.rate.mpu = mpu; + if (tc_calc_rtable(opt.rate, rtab, Rcell_log, mtu) 0) { fprintf(stderr, TBF: failed to calculate rate table.\n); return -1; } opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer); - opt.rate.cell_log = Rcell_log; - opt.rate.mpu = mpu; + if (opt.peakrate.rate) { - if ((Pcell_log =
[PATCH 6/6] [IPROUTE2]: Change the rate table calc of transmit cost to use upper bound value
commit 2e3edbef7913ac43899c8258ee59d9032778cee1
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date: Wed Sep 5 15:24:51 2007 +0200

[IPROUTE2]: Change the rate table calc of transmit cost to use upper bound value.

Patrick McHardy, Cite: 'its better to overestimate than underestimate to stay in control of the queue'.

Illustrating the rate table array:

Legend description
  rtab[x]   : Array index x of rtab[x]
  xmit_sz   : Transmit size contained in rtab[x] (normally transmit time)
  maps[a-b] : Packet sizes from a to b, will map into rtab[x]

Current/old rate table mapping (cell_log:3):
  rtab[0]:=xmit_sz:0   maps[0-7]
  rtab[1]:=xmit_sz:8   maps[8-15]
  rtab[2]:=xmit_sz:16  maps[16-23]
  rtab[3]:=xmit_sz:24  maps[24-31]
  rtab[4]:=xmit_sz:32  maps[32-39]
  rtab[5]:=xmit_sz:40  maps[40-47]
  rtab[6]:=xmit_sz:48  maps[48-55]

New rate table mapping, with kernel cell_align support.
  rtab[0]:=xmit_sz:8   maps[0-8]
  rtab[1]:=xmit_sz:16  maps[9-16]
  rtab[2]:=xmit_sz:24  maps[17-24]
  rtab[3]:=xmit_sz:32  maps[25-32]
  rtab[4]:=xmit_sz:40  maps[33-40]
  rtab[5]:=xmit_sz:48  maps[41-48]
  rtab[6]:=xmit_sz:56  maps[49-56]

New TC util on a kernel WITHOUT support for cell_align
  rtab[0]:=xmit_sz:8   maps[0-7]
  rtab[1]:=xmit_sz:16  maps[8-15]
  rtab[2]:=xmit_sz:24  maps[16-23]
  rtab[3]:=xmit_sz:32  maps[24-31]
  rtab[4]:=xmit_sz:40  maps[32-39]
  rtab[5]:=xmit_sz:48  maps[40-47]
  rtab[6]:=xmit_sz:56  maps[48-55]

Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/tc/tc_core.c b/tc/tc_core.c
index c713a18..752b07c 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -84,11 +84,12 @@ int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mt
 		cell_log++;
 	}
 	for (i=0; i<256; i++) {
-		unsigned sz = (i<<cell_log);
+		unsigned sz = ((i+1)<<cell_log);
 		if (sz < mpu)
 			sz = mpu;
 		rtab[i] = tc_calc_xmittime(bps, sz);
 	}
+	r->cell_align=-1; // Due to the sz calc
 	r->cell_log=cell_log;
 	return cell_log;
 }
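The mapping tables in this patch can be reproduced with a small standalone sketch. It assumes the kernel-side lookup adds cell_align to the packet length, clamps at zero, and then shifts by cell_log; that lookup is not shown in this patch, so treat `slot_for_len()` as an assumption about the kernel half of the series. All function names here are invented for illustration.

```c
#define RTAB_SLOTS 256

/* Smallest cell_log such that every size up to mtu lands in one of the
 * 256 slots (same loop shape as the "cell_log++" context in the diff). */
static int pick_cell_log(unsigned mtu)
{
	int cell_log = 0;

	while ((mtu >> cell_log) > RTAB_SLOTS - 1)
		cell_log++;
	return cell_log;
}

/* Size charged for slot i before the patch: the lower bound of the cell. */
static unsigned slot_size_old(int i, int cell_log)
{
	return (unsigned)i << cell_log;
}

/* Size charged for slot i after the patch: the upper bound of the cell. */
static unsigned slot_size_new(int i, int cell_log)
{
	return (unsigned)(i + 1) << cell_log;
}

/* Assumed kernel-side lookup: cell_align = -1 shifts the cell boundaries
 * so that a packet of exactly 8 bytes still maps to slot 0. */
static int slot_for_len(int pktlen, int cell_align, int cell_log)
{
	int slot = pktlen + cell_align;

	if (slot < 0)
		slot = 0;
	return slot >> cell_log;
}
```

With mtu 1600 the sketch picks cell_log 3, matching the tables. With cell_align 0 (a kernel without support) a 48-byte packet lands in slot 6 and is charged 56 bytes, which is the overestimate shown in the third table.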
net/bluetooth/hci_sock.c:352: error: storage size of 'ctv' isn't known
latest git pull, make allyesconfig on i386:

...
  CC      net/bluetooth/hci_sock.o
net/bluetooth/hci_sock.c: In function ‘hci_sock_cmsg’:
net/bluetooth/hci_sock.c:352: error: storage size of ‘ctv’ isn’t known
net/bluetooth/hci_sock.c:352: warning: unused variable ‘ctv’
make[2]: *** [net/bluetooth/hci_sock.o] Error 1
make[1]: *** [net/bluetooth] Error 2
make: *** [net] Error 2

rday

p.s. dumb question -- what locale should i be using to get those quotes to not make such a mess of my screen? thanks.

-- 
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA
http://crashcourse.ca
Re: [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue
On Wed, 12 Sep 2007 11:00:54 +0100 Christoph Hellwig [EMAIL PROTECTED] wrote:
> This looks nice in general, getting things out of softirq context is always good.

I am preparing a patch to net/ipv4/route.c to migrate rt_check_expire() as well.

> On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote:
> >  #if RT_CACHE_DEBUG >= 2
> >  static atomic_t dst_total = ATOMIC_INIT(0);
> >  #endif
> > -static unsigned long dst_gc_timer_expires;
> > -static unsigned long dst_gc_timer_inc = DST_GC_MAX;
> > -static void dst_run_gc(unsigned long);
> > +static struct {
> > +	spinlock_t lock;
> > +	struct dst_entry *list;
> > +	unsigned long timer_inc;
> > +	unsigned long timer_expires;
> > +} dst_garbage = {
> > +	.lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
> > +	.timer_inc = DST_GC_MAX,
> > +};
>
> Can you please get rid of this useless struct? It just complicates the code and means we can't use the proper DEFINE_SPINLOCK initializer.

When using the standard DEFINE_SPINLOCK initializer, the lock is in the data section, while list is in the bss section.

This 'useless struct' puts lock and list on the same cache line, and so reduces the latency of __dst_free(). I wish more structures in the kernel were used instead of relying on random placement by the linker...

> > +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);
>
> This should be static.

Yes I agree
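The placement argument can be illustrated outside the kernel. The sketch below is a userspace approximation: `fake_spinlock_t` stands in for spinlock_t, the 64-byte line size is an assumption, and the names are hypothetical. It only shows that members of one struct are laid out next to each other, whereas two separate globals (one initialized, one zeroed) may be placed in different sections by the linker.

```c
#include <stddef.h>

struct entry;				/* opaque, stands in for struct dst_entry */

/* Userspace stand-in for spinlock_t (assumption for this sketch). */
typedef struct {
	volatile int locked;
} fake_spinlock_t;

#define CACHE_LINE_ASSUMED 64		/* assumed line size, not queried */

/* Grouping the lock with the data it protects keeps them adjacent. */
static struct {
	fake_spinlock_t lock;
	struct entry *list;
	unsigned long timer_inc;
	unsigned long timer_expires;
} garbage_sketch = {
	.timer_inc = 120,		/* arbitrary illustrative value */
};

/* Byte distance between the lock and the list head it protects. */
static size_t lock_to_list_distance(void)
{
	return (size_t)((char *)&garbage_sketch.list -
			(char *)&garbage_sketch.lock);
}
```

Adjacency alone does not guarantee a shared line; that also needs the struct not to straddle a line boundary, which in the kernel would normally be arranged with an explicit alignment attribute.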
Re: [PATCH 0/6] NET_SCHED: Rate table fixes
Jesper Dangaard Brouer wrote:
> This set of patches aims at fixing an issue with the rate table used by the rate based schedulers.

ACK for all the patches :)
Re: [PATCH 09/16] net: Initialize the network namespace of network devices.
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:24:21 -0600

> Except for carefully selected pseudo devices all network interfaces should start out in the initial network namespace. Ultimately it will be register_netdev that examines what dev->nd_net is set to and places a device in a network namespace.
>
> This patch modifies alloc_netdev to initialize the network namespace a device is in with the initial network namespace. This gets it right for the vast majority of devices so their drivers need not be modified and for those few pseudo devices that need something different they can change this parameter before calling register_netdevice.
>
> The network namespace parameter on a network device is not reference counted as the devices are inside of a network namespace and cannot remain in that namespace past the lifetime of the network namespace.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH] Move the definition of pr_err() into kernel.h
On Tue, 11 Sep 2007 09:56:05 -0500 Emil Medve [EMAIL PROTECTED] wrote:
> Other pr_*() macros are already defined in kernel.h, but pr_err() was defined multiple times in several other places
>
> Signed-off-by: Emil Medve [EMAIL PROTECTED]

pr_error seems better than pr_err

Please add the full set:
  pr_alert
  pr_critical
  pr_error
  pr_warn
  pr_notice
Re: [PATCH 10/16] net: Make packet reception network namespace safe
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:25:43 -0600

> This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace.
>
> This should ensure that the various network stacks do not receive packets in anything but the initial network namespace until the code has been converted and is ready for them.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 3/6] [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)
On Wed, 12 Sep 2007 12:14:14 +0200 Jesper Dangaard Brouer [EMAIL PROTECTED] wrote:
> commit ef065a43b8900fbc0763eac0fa0a9a8a00c8aaa2
> Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
> Date: Tue Sep 11 16:17:46 2007 +0200
>
> [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)
>
> Extend the tc_ratespec struct with two parameters:
>
> 1) cell_align, which allows adjusting the alignment of the rate table.
> 2) overhead, which allows adding a packet overhead before the lookup in the kernel.
>
> This is done in order to add support for changing the rate table to use the upper-boundary L2T (length to time) value. Currently we use the lower boundary, which results in under-estimating the actual bandwidth usage.
>
> Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

Okay, but you don't need a special patch to do it. I periodically sync up the headers before each release.
Re: [PATCH 11/16] net: Make device event notification network namespace safe
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:27:11 -0600

> Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing.
>
> To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace.
>
> As the rest of the code is made network namespace aware these checks can be removed.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied, thanks.
Re: [PATCH 4/6] [IPROUTE2]: Overhead calculation is now done in the kernel
On Wed, 12 Sep 2007 12:14:39 +0200 Jesper Dangaard Brouer [EMAIL PROTECTED] wrote:
> commit 07a74a2613440fc1a68d0faa7235ed7027532d78
> Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
> Date: Tue Sep 11 16:59:58 2007 +0200
>
> [IPROUTE2]: Overhead calculation is now done in the kernel.
>
> The only current user is HTB. HTB overhead argument is now passed on to the kernel (in the struct tc_ratespec).
>
> Also correct the data types.
>
> Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

How is this binary compatible with older kernels?
Re: [PATCH 12/16] net: Support multiple network namespaces with netlink
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:28:27 -0600

> Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets.
>
> This patch updates all of the existing netlink protocols to only support the initial network namespace. Requests by clients in other namespaces will get -ECONNREFUSED, as they would if the kernel did not have the support for that netlink protocol compiled in.
>
> As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces.
>
> The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
[PATCH] [POWERPC] ucc_geth: fix module removal
- uccf should be set to NULL to not double-free memory on subsequent calls; - ind_hash_q and group_hash_q lists should be initialized in the probe() function, instead of struct_init() (called by open()), otherwise there will be an oops if ucc_geth_driver removed prior 'ifconfig ethX up'; - add unregister_netdev(); - reorder geth_remove() steps. Signed-off-by: Anton Vorontsov [EMAIL PROTECTED] --- drivers/net/ucc_geth.c | 17 ++--- 1 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c index 9a38dfe..bc2b3bf 100644 --- a/drivers/net/ucc_geth.c +++ b/drivers/net/ucc_geth.c @@ -2080,8 +2080,10 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth) if (!ugeth) return; - if (ugeth-uccf) + if (ugeth-uccf) { ucc_fast_free(ugeth-uccf); + ugeth-uccf = NULL; + } if (ugeth-p_thread_data_tx) { qe_muram_free(ugeth-thread_dat_tx_offset); @@ -2312,10 +2314,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth) ug_info = ugeth-ug_info; uf_info = ug_info-uf_info; - /* Create CQs for hash tables */ - INIT_LIST_HEAD(ugeth-group_hash_q); - INIT_LIST_HEAD(ugeth-ind_hash_q); - if (!((uf_info-bd_mem_part == MEM_PART_SYSTEM) || (uf_info-bd_mem_part == MEM_PART_MURAM))) { if (netif_msg_probe(ugeth)) @@ -3949,6 +3947,10 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma ugeth = netdev_priv(dev); spin_lock_init(ugeth-lock); + /* Create CQs for hash tables */ + INIT_LIST_HEAD(ugeth-group_hash_q); + INIT_LIST_HEAD(ugeth-ind_hash_q); + dev_set_drvdata(device, dev); /* Set the dev-base_addr to the gfar reg region */ @@ -4002,9 +4004,10 @@ static int ucc_geth_remove(struct of_device* ofdev) struct net_device *dev = dev_get_drvdata(device); struct ucc_geth_private *ugeth = netdev_priv(dev); - dev_set_drvdata(device, NULL); - ucc_geth_memclean(ugeth); + unregister_netdev(dev); free_netdev(dev); + ucc_geth_memclean(ugeth); + dev_set_drvdata(device, NULL); return 0; } -- 1.5.0.6 - To unsubscribe 
from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
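The first bullet of this patch (clear the pointer after freeing it) is a common way to make a cleanup routine safe to call more than once. A minimal userspace sketch of the idea follows; `struct priv_sketch`, `memclean_sketch()` and the `free_calls` counter are invented for illustration and only mirror the shape of ucc_geth_memclean().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver private data. */
struct priv_sketch {
	void *uccf;
};

static int free_calls;	/* counts real frees, for demonstration only */

/* Idempotent cleanup: the pointer is reset after the free, so a second
 * call sees NULL and does nothing instead of freeing the block twice. */
static void memclean_sketch(struct priv_sketch *p)
{
	if (!p)
		return;

	if (p->uccf) {
		free(p->uccf);
		free_calls++;
		p->uccf = NULL;
	}
}
```

This matters in the driver because the cleanup can be reached both from the close path and from remove(), so the same private structure may be cleaned more than once.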
[PATCH] phy: implement release function
Lately I've got this nice badness on mdio bus removal: Device 'e0103120:06' does not have a release() function, it is broken and must be fixed. [ cut here ] Badness at drivers/base/core.c:107 NIP: c015c1a8 LR: c015c1a8 CTR: c0157488 REGS: c34bdcf0 TRAP: 0700 Not tainted (2.6.23-rc5-g9ebadfbb-dirty) MSR: 00029032 EE,ME,IR,DR CR: 24088422 XER: ... [c34bdda0] [c015c1a8] device_release+0x78/0x80 (unreliable) [c34bddb0] [c01354cc] kobject_cleanup+0x80/0xbc [c34bddd0] [c01365f0] kref_put+0x54/0x6c [c34bdde0] [c013543c] kobject_put+0x24/0x34 [c34bddf0] [c015c384] put_device+0x1c/0x2c [c34bde00] [c0180e84] mdiobus_unregister+0x2c/0x58 ... Though actually there is nothing broken, it just device subsystem core expects another pattern of resource managment. This patch implement phy device's release function, thus we're getting rid of this badness. Also small hidden bug fixed, hope none other introduced. ;-) Signed-off-by: Anton Vorontsov [EMAIL PROTECTED] --- drivers/net/phy/mdio_bus.c |9 + drivers/net/phy/phy_device.c | 13 + include/linux/phy.h |1 + 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index fc2f0e6..c30196d 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -91,9 +91,12 @@ int mdiobus_register(struct mii_bus *bus) err = device_register(phydev-dev); - if (err) + if (err) { printk(KERN_ERR phy %d failed to register\n, i); + phy_device_free(phydev); + phydev = NULL; + } } bus-phy_map[i] = phydev; @@ -110,10 +113,8 @@ void mdiobus_unregister(struct mii_bus *bus) int i; for (i = 0; i PHY_MAX_ADDR; i++) { - if (bus-phy_map[i]) { + if (bus-phy_map[i]) device_unregister(bus-phy_map[i]-dev); - kfree(bus-phy_map[i]); - } } } EXPORT_SYMBOL(mdiobus_unregister); diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index e275df8..80c283c 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -44,6 +44,17 @@ static struct phy_driver 
genphy_driver; extern int mdio_bus_init(void); extern void mdio_bus_exit(void); +void phy_device_free(struct phy_device *phydev) +{ + kfree(phydev); +} +EXPORT_SYMBOL(phy_device_free); + +static void phy_device_release(struct device *dev) +{ + phy_device_free(to_phy_device(dev)); +} + struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id) { struct phy_device *dev; @@ -54,6 +65,8 @@ struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id) if (NULL == dev) return (struct phy_device*) PTR_ERR((void*)-ENOMEM); + dev-dev.release = phy_device_release; + dev-speed = 0; dev-duplex = -1; dev-pause = dev-asym_pause = 0; diff --git a/include/linux/phy.h b/include/linux/phy.h index 2a65978..9ec1363 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -398,6 +398,7 @@ int phy_mii_ioctl(struct phy_device *phydev, int phy_start_interrupts(struct phy_device *phydev); void phy_print_status(struct phy_device *phydev); struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id); +void phy_device_free(struct phy_device *phydev); extern struct bus_type mdio_bus_type; #endif /* __PHY_H */ -- 1.5.0.6 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
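The pattern the device core expects can be modelled in userspace. The sketch below is an assumption-laden illustration, not the driver model itself: `struct device_sketch` is a toy reference-counted object, and every name ending in `_sketch` is hypothetical. It shows why the owner must not kfree() the object directly: the memory is released by the release() callback when the last reference is dropped.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy reference-counted "device" with a release callback. */
struct device_sketch {
	int refcount;
	void (*release)(struct device_sketch *dev);
};

/* The PHY object embeds the device, as struct phy_device does. */
struct phy_sketch {
	int addr;
	struct device_sketch dev;
};

/* Recover the containing PHY from a pointer to its embedded device. */
#define to_phy_sketch(d) \
	((struct phy_sketch *)((char *)(d) - offsetof(struct phy_sketch, dev)))

static int freed_count;	/* demonstration counter */

static void phy_sketch_release(struct device_sketch *dev)
{
	free(to_phy_sketch(dev));
	freed_count++;
}

static struct phy_sketch *phy_sketch_create(int addr)
{
	struct phy_sketch *phy = calloc(1, sizeof(*phy));

	if (!phy)
		return NULL;
	phy->addr = addr;
	phy->dev.refcount = 1;			/* creator holds one reference */
	phy->dev.release = phy_sketch_release;	/* set once, at creation */
	return phy;
}

static void device_sketch_get(struct device_sketch *dev)
{
	dev->refcount++;
}

/* Dropping the last reference frees the object via release(). */
static void device_sketch_put(struct device_sketch *dev)
{
	if (--dev->refcount == 0 && dev->release)
		dev->release(dev);
}
```

In this model, a kfree() in the unregister path (as the old mdiobus_unregister() did) would free memory that another holder of a reference could still be using.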
Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
David Acker wrote:
> Jeff Garzik wrote:
> > David Acker wrote:
> > > Let me know if there is any other information I can provide you. I will look through the code to see what could be going on with your machine. I will also look into reproducing these results with a newer kernel. This may be tricky since compulab's patches are pretty stale and don't always apply easily.
> >
> > pktgen outputs for the various cases modified/unmodified[/others?] would be nice, if you have a spot of time.
> >
> > Jeff
>
> I am not familiar with pktgen but I seem to have it working for a simple test. I edited the 1-1 example from ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/ . The results with and without the patch are below.

It looks like you ran pktgen on the embedded system, which exercised only the transmit path. Auke indicated that the lockup was in the RU. Have you run pktgen on a test system to fire packets at the embedded system at max rate? Also test what happens when you fire packets in both directions simultaneously.

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
[PATCH 2/3] sk98lin: ethtool perm_addr build fix
Deal with API changes while sk98lin was removed. ethtool_ops no longer has a perm_addr hook.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 drivers/net/sk98lin/skethtool.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skethtool.c b/drivers/net/sk98lin/skethtool.c
index 3646069..5a6da89 100644
--- a/drivers/net/sk98lin/skethtool.c
+++ b/drivers/net/sk98lin/skethtool.c
@@ -616,7 +616,6 @@ const struct ethtool_ops SkGeEthtoolOps = {
 	.get_pauseparam		= getPauseParams,
 	.set_pauseparam		= setPauseParams,
 	.get_link		= ethtool_op_get_link,
-	.get_perm_addr		= ethtool_op_get_perm_addr,
 	.get_sg			= ethtool_op_get_sg,
 	.set_sg			= setScatterGather,
 	.get_tx_csum		= ethtool_op_get_tx_csum,
-- 
1.5.2.5
[PATCH 3/3]: sk98lin: neuter device to only SysKonnect boards
The skge driver works better for all boards except older SysKonnect boards. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index bf21862..7dc9c9e 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5168,10 +5168,17 @@ err_out: #endif static struct pci_device_id skge_pci_tbl[] = { +#ifdef SK98LIN_ALL_DEVICES { PCI_VENDOR_ID_3COM, 0x1700, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, { PCI_VENDOR_ID_3COM, 0x80eb, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, +#endif +#ifdef GENESIS + /* Generic SysKonnect SK-98xx Gigabit Ethernet Server Adapter */ { PCI_VENDOR_ID_SYSKONNECT, 0x4300, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, +#endif + /* Generic SysKonnect SK-98xx V2.0 Gigabit Ethernet Adapter */ { PCI_VENDOR_ID_SYSKONNECT, 0x4320, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, +#ifdef SK98LIN_ALL_DEVICES /* DLink card does not have valid VPD so this driver gags * { PCI_VENDOR_ID_DLINK, 0x4c00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, */ @@ -5180,6 +5187,7 @@ static struct pci_device_id skge_pci_tbl[] = { { PCI_VENDOR_ID_CNET, 0x434e, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, { PCI_VENDOR_ID_LINKSYS, 0x1032, PCI_ANY_ID, 0x0015, }, { PCI_VENDOR_ID_LINKSYS, 0x1064, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, +#endif { 0 } }; -- 1.5.2.5 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 13/16] net: Make the device list and device lookups per namespace.
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:35:46 -0600

> This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions:
>   dev_getbyhwaddr
>   dev_getfirsthwbytype
>   dev_get_by_flags
>   dev_get_by_name
>   __dev_get_by_name
>   dev_get_by_index
>   __dev_get_by_index
>   dev_ioctl
>   dev_ethtool
>   dev_load
>   wireless_process_ioctl
> were modified to take a network namespace argument, and deal with it.
>
> vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument.
>
> So basically anything in the core of the network stack that was affected by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use init_net, the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces.
>
> For now the ifindex generator is left global.
>
> Fundamentally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far.
>
> At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 14/16] net: Factor out __dev_alloc_name from dev_alloc_name
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:36:56 -0600

> When forcibly changing the network namespace of a device I need something that can generate a name for the device in the new namespace without overwriting the old name.
>
> __dev_alloc_name provides me that functionality.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 15/16] net: Implement network device movement between namespaces
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:38:46 -0600

> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate a network device is local to a single network namespace and should never be moved. Useful for pseudo devices that we need an instance in each network namespace (like the loopback device) and for any device we find that cannot handle multiple network namespaces so we may trap them in the initial network namespace.
>
> This patch introduces the function dev_change_net_namespace a function used to move a network device from one network namespace to another. To the network device nothing special appears to happen, to the components of the network stack it appears as if the network device was unregistered in the network namespace it is in, and a new device was registered in the network namespace the device was moved to.
>
> This patch sets up a namespace device destructor that upon the exit of a network namespace moves all of the movable network devices to the initial network namespace so they are not lost.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24
Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:43:44 -0600

> The simplest thing to implement is moving network devices between namespaces. However with the same attribute IFLA_NET_NS_PID we can easily implement creating devices in the destination network namespace as well. However that is a little bit trickier so this patch sticks to what is simple and easy.
>
> A pid is used to identify a process that happens to be a member of the network namespace we want to move the network device to.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace
I added the following patch to net-2.6.24 to kill a warning since net_alloc() has no users (yet).

commit f444fa9b5d70b3d431e1554e0975e012514c39f3
Author: David S. Miller [EMAIL PROTECTED](none)
Date: Wed Sep 12 14:01:08 2007 +0200

[NET]: #if 0 out net_alloc() for now.

We will undo this once it is actually used.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f259a9b..1fc513c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -32,10 +32,12 @@ void net_unlock(void)
 	mutex_unlock(&net_list_mutex);
 }
 
+#if 0
 static struct net *net_alloc(void)
 {
 	return kmem_cache_alloc(net_cachep, GFP_KERNEL);
 }
+#endif
 
 static void net_free(struct net *net)
 {
[PATCH 1/3] sk98lin: restore driver
This reverts commit e1abecc48938fbe1966ea6e78267fc673fa59295. The driver works on some hardware that skge doesn't handle yet.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
Patch too large for mailing list. Download from: http://developer.osdl.org/shemminger/patches/sk98lin-2.6.23-restore.patch
[patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
Hello, as already described old temporary sockets (client is gone) of lockd aren't closed after some time. So, with enough clients and some time gone, there are 80 open dangling sockets and you start getting messages of the form: lockd: too many open TCP sockets, consider increasing the number of nfsd threads. If I understand the code then the intention was that the server closes temporary sockets after about 6 to 12 minutes: a timer is started which calls svc_age_temp_sockets every 6 minutes. svc_age_temp_sockets: if a socket is marked OLD it gets closed. sockets which are not marked as OLD are marked OLD every time the sockets receives something OLD is cleared. But svc_age_temp_sockets never closes any socket though because it only closes sockets with svsk-sk_inuse == 0. This seems to be a bug. Here is a patch against 2.6.22.6 which changes the test to svsk-sk_inuse = 0 which was probably meant. The patched kernel runs fine here. Unused sockets get closed (after 6 to 12 minutes) Signed-off-by: Wolfgang Walter [EMAIL PROTECTED] --- ../linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.0 +0200 +++ net/sunrpc/svcsock.c2007-09-11 11:07:13.0 +0200 @@ -1572,7 +1575,7 @@ if (!test_and_set_bit(SK_OLD, svsk-sk_flags)) continue; - if (atomic_read(svsk-sk_inuse) || test_bit(SK_BUSY, svsk-sk_flags)) + if (atomic_read(svsk-sk_inuse) = 0 || test_bit(SK_BUSY, svsk-sk_flags)) continue; atomic_inc(svsk-sk_inuse); list_move(le, to_be_aged); As svc_age_temp_sockets did not do anything before this change may trigger hidden bugs. To be true I don't see why this check (atomic_read(svsk-sk_inuse) = 0 || test_bit(SK_BUSY, svsk-sk_flags)) is needed at all (it can only be an optimation) as this fields change after the check. In svc_tcp_accept there is no such check when a temporary socket is closed. 
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
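The aging scheme described in this message (mark on one timer pass, close on the next unless traffic arrived in between) can be illustrated with a small userspace sketch. This is only a model under stated assumptions: `struct temp_sock`, `on_receive` and `age_pass` are hypothetical names, and none of the kernel's locking, reference counting or SK_BUSY handling is represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a temporary server socket. */
struct temp_sock {
	bool old;     /* set by the aging pass, cleared when data arrives */
	bool closed;  /* set once the aging pass has closed the socket */
};

/* Data arrived: the socket is evidently still in use, so un-mark it. */
static void on_receive(struct temp_sock *s)
{
	assert(s != NULL);
	s->old = false;
}

/*
 * One timer-driven aging pass. A socket that is already marked old has
 * seen no traffic for a full interval and is closed; every other open
 * socket is marked old so the next pass can judge it.
 * Returns the number of sockets closed in this pass.
 */
static int age_pass(struct temp_sock *socks, size_t n)
{
	int closed = 0;

	for (size_t i = 0; i < n; i++) {
		struct temp_sock *s = &socks[i];

		if (s->closed)
			continue;
		if (!s->old) {
			s->old = true;   /* first strike */
			continue;
		}
		s->closed = true;        /* second strike: idle for a whole interval */
		closed++;
	}
	return closed;
}
```

With a 6 minute timer, an idle socket in this model is closed between one and two intervals after its last activity, which matches the "6 to 12 minutes" figure given above.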
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
On Wed, 2007-12-09 at 03:04 -0400, Bill Fink wrote:
> On Fri, 07 Sep 2007, jamal wrote:
> > I am going to be the devil's advocate[1]:
>
> So let me be the angel's advocate. :-)

I think this would make you God's advocate ;-> (http://en.wikipedia.org/wiki/God%27s_advocate)

> I view his results much more favorably.

The challenge is, under _low traffic_: bad bad CPU use. That's what is at stake, correct? Let's bury the stats for a sec ...

1) Has that CPU situation improved? No, it has gotten worse.
2) Was there a throughput problem? No. Remember, this is _low traffic_ and the complaint is not "NAPI doesn't do high throughput". I am not willing to spend 34% more cpu to get a few hundred pps (under low traffic!).
3) Latency improvement is good. But is 34% cost worthwhile for the corner case of low traffic?

Here's an analogy: I went to buy bread and complained that 66 cents was too much for such a tiny sliced loaf. You tell me you have solved my problem: asking me to pay a dollar because you made the bread slices crispier. I was complaining about the _66 cents price_, not about the crispiness of the slices ;-> Crispier slices are good - but am I, the person who was complaining about price, willing to pay 40-50% more?

People are bitching about NAPI abusing CPU; is the answer to abuse more CPU than NAPI? ;-> The answer could be "I am not solving that problem anymore" - at least that's what James is saying ;->

Note: I am not saying there's no problem - just saying the result is not addressing the problem.

> You can't always improve on all metrics of a workload.

But you gotta try to be consistent. If, for example, one packet size/rate got negative results but the next got positive results - that's lacking consistency.

> Sometimes there are tradeoffs to be made to be decided by the user based on what's most important to that user and his specific workload. And the suggested ethtool option (defaulting to current behavior) would enable the user to make that decision.

And the challenge is: What workload is willing to invest that much cpu for low traffic? Can you name one? One that may come close is database benchmarks for latency - but those folks wouldn't touch this with a mile-long pole if you told them their cpu use is going to get worse than what NAPI (that big bad CPU hog under low traffic) is giving them.

> P.S. I agree that some tests run in parallel with some CPU hogs also running might be beneficial and enlightening.

indeed.

cheers,
jamal
Re: [PATCH 07/16] net: Make /proc/net per network namespace
David Miller wrote:
> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Sat, 08 Sep 2007 15:20:36 -0600
>
> > This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to handle multiple network namespaces.
> >
> > Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace.
> >
> > Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
>
> Patch applied, thanks.

Hi Dave,

it seems the fs/proc/proc_net.c was not added to the git repository.

Regards.
-- Daniel
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
On Wed, 12 Sep 2007 04:12:00 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote:
> From: Eric Dumazet [EMAIL PROTECTED]
> Date: Wed, 12 Sep 2007 12:08:45 +0200
>
> > Unfortunately, there is no equivalent for this one. This gives on my Opterons a nice prefetchnta.
> > prefetch(addr) is more like __builtin_prefetch(addr, 0, 3).
> > I would like to avoid zapping the L2 cache with useless data.
> > __builtin_prefetch() is included from gcc 3.1 (2002), so every platform should support it, as linux-2.6 requires gcc 3.2 at least.
> > I guess you are going to tell me to first publish a patch to lkml :)
>
> Basically, yes :-)
> You won't be the only person to find this useful.

OK, let's try a normal prefetch(), I'll change it later when/if a new generic macro is added.

I added the missing 'static' and a comment about the struct {} dst_garbage. I also corrected the spelling error in the patch title (collection).

Thank you

[PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue

When the periodic IP route cache flush is done (every 600 seconds on default configuration), some hosts suffer a lot and eventually trigger the "soft lockup" message.

dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. Then it rearms a timer to redo the full thing 1/10 s later... The slowdown can last one minute or so, depending on how active the tcp sessions are.

This second version of the patch converts the processing from a softirq based one to a workqueue. Even if the list of entries in garbage_list is huge, the host is still responsive to softirqs and can make progress.

Instead of resetting the gc timer to 0.1 second if one entry was freed in a gc run, we do this if more than 10% of entries were freed.

Before patch :

Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
Aug 16 06:21:37 SRV1 kernel: Aug 16 06:21:37 SRV1 kernel: Call Trace: Aug 16 06:21:37 SRV1 kernel: IRQ [802286f0] wake_up_process+0x10/0x20 Aug 16 06:21:37 SRV1 kernel: [80251e09] softlockup_tick+0xe9/0x110 Aug 16 06:21:37 SRV1 kernel: [803cd380] dst_run_gc+0x0/0x140 Aug 16 06:21:37 SRV1 kernel: [802376f3] run_local_timers+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [802379c7] update_process_times+0x57/0x90 Aug 16 06:21:37 SRV1 kernel: [80216034] smp_local_timer_interrupt+0x34/0x60 Aug 16 06:21:37 SRV1 kernel: [802165cc] smp_apic_timer_interrupt+0x5c/0x80 Aug 16 06:21:37 SRV1 kernel: [8020a816] apic_timer_interrupt+0x66/0x70 Aug 16 06:21:37 SRV1 kernel: [803cd3d3] dst_run_gc+0x53/0x140 Aug 16 06:21:37 SRV1 kernel: [803cd3c6] dst_run_gc+0x46/0x140 Aug 16 06:21:37 SRV1 kernel: [80237148] run_timer_softirq+0x148/0x1c0 Aug 16 06:21:37 SRV1 kernel: [8023340c] __do_softirq+0x6c/0xe0 Aug 16 06:21:37 SRV1 kernel: [8020ad6c] call_softirq+0x1c/0x30 Aug 16 06:21:37 SRV1 kernel: EOI [8020cb34] do_softirq+0x34/0x90 Aug 16 06:21:37 SRV1 kernel: [802331cf] local_bh_enable_ip+0x3f/0x60 Aug 16 06:21:37 SRV1 kernel: [80422913] _spin_unlock_bh+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [803dfde8] rt_garbage_collect+0x1d8/0x320 Aug 16 06:21:37 SRV1 kernel: [803cd4dd] dst_alloc+0x1d/0xa0 Aug 16 06:21:37 SRV1 kernel: [803e1433] __ip_route_output_key+0x573/0x800 Aug 16 06:21:37 SRV1 kernel: [803c02e2] sock_common_recvmsg+0x32/0x50 Aug 16 06:21:37 SRV1 kernel: [803e16dc] ip_route_output_flow+0x1c/0x60 Aug 16 06:21:37 SRV1 kernel: [80400160] tcp_v4_connect+0x150/0x610 Aug 16 06:21:37 SRV1 kernel: [803ebf07] inet_bind_bucket_create+0x17/0x60 Aug 16 06:21:37 SRV1 kernel: [8040cd16] inet_stream_connect+0xa6/0x2c0 Aug 16 06:21:37 SRV1 kernel: [80422981] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [803c0bdf] lock_sock_nested+0xcf/0xe0 Aug 16 06:21:37 SRV1 kernel: [80422981] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [803be551] sys_connect+0x71/0xa0 Aug 16 06:21:37 SRV1 kernel: 
[803eee3f] tcp_setsockopt+0x1f/0x30 Aug 16 06:21:37 SRV1 kernel: [803c030f] sock_common_setsockopt+0xf/0x20 Aug 16 06:21:37 SRV1 kernel: [803be4bd] sys_setsockopt+0x9d/0xc0 Aug 16 06:21:37 SRV1 kernel: [8028881e] sys_ioctl+0x5e/0x80 Aug 16 06:21:37 SRV1 kernel: [80209c4e] system_call+0x7e/0x83 After patch : (RT_CACHE_DEBUG set to 2 to get following traces) dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us
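The prefetch discussion in this message turns on the third argument of `__builtin_prefetch(addr, rw, locality)`: locality 3 asks to keep the line in all cache levels, while locality 0 marks the data as having no temporal locality (the message reports this giving a prefetchnta on Opterons). A minimal sketch of a list scan using the non-temporal hint follows. It assumes GCC or Clang, and `struct entry` and `count_unreferenced` are hypothetical names used for illustration, not the kernel's dst code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on a garbage list. */
struct entry {
	struct entry *next;
	int refcnt;
};

/*
 * Walk the list once and count entries that are no longer referenced.
 * *scanned receives the number of entries visited.
 */
static size_t count_unreferenced(const struct entry *head, size_t *scanned)
{
	size_t candidates = 0;

	assert(scanned != NULL);
	*scanned = 0;
	for (const struct entry *e = head; e != NULL; e = e->next) {
		/*
		 * rw = 0: the line will only be read.
		 * locality = 0: no temporal locality, so the data need not
		 * stay in cache after this access. A prefetch of an invalid
		 * address (here possibly NULL) does not fault.
		 */
		__builtin_prefetch(e->next, 0, 0);
		if (e->refcnt == 0)
			candidates++;
		(*scanned)++;
	}
	return candidates;
}
```

Whether the hint helps depends on the list length and the CPU; it is a hint only and does not change the result of the scan.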
Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace
David Miller [EMAIL PROTECTED] writes:
> I added the following patch to net-2.6.24 to kill a warning since net_alloc() has no users (yet).

Reasonable, and thanks for merging these. Having a solid place to start helps a lot.

I will see if I can get the /proc races fixed shortly.

Eric
Re: [PATCH 07/16] net: Make /proc/net per network namespace
From: Daniel Lezcano [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:12:04 +0200

> it seems the fs/proc/proc_net.c was not added to the git repository.

Fixed, thanks for catching that.
Re: [PATCH 4/6] [IPROUTE2]: Overhead calculation is now done in the kernel
On Wed, 2007-09-12 at 13:05 +0200, Stephen Hemminger wrote:
> How is this binary compatible with older kernels?

It will be binary compatible, as I use/rename some unused variables in struct tc_ratespec.

--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: Eric Dumazet [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:16:56 +0200

> OK, let's try a normal prefetch(), I'll change it later when/if a new generic macro is added.
> I added the missing 'static' and a comment about the struct {} dst_garbage. I also corrected spelling error on patch title (collection)

I sorted out the conflicts with the network namespace stuff I just checked in and added your patch to net-2.6.24

Thanks!
Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: Johannes Berg [EMAIL PROTECTED]
Date: Thu, 06 Sep 2007 17:19:55 +0200

> Oh btw. Can we stick a might_sleep() into dev_close() *before* the test whether the device is up? That way, we'd have seen the bug, but apparently nobody before Florian ever did a 'ip link set wmaster0 down' while the other interfaces were still open.

I've added this to net-2.6.24
[NETLINK]: Introduce nested and byteorder flag to netlink attribute
This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced.

The byte-order flag is yet unused, its intended use is to allow automatic byte order conversions for all atomic types.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/include/linux/netlink.h
===
--- net-2.6.24.orig/include/linux/netlink.h 2007-09-12 13:29:49.0 +0200
+++ net-2.6.24/include/linux/netlink.h 2007-09-12 13:59:41.0 +0200
@@ -129,6 +129,20 @@
 	__u16 nla_type;
 };
 
+/*
+ * nla_type (16 bits)
+ * +---+---+-------------------------------+
+ * | N | O | Attribute Type                |
+ * +---+---+-------------------------------+
+ * N := Carries nested attributes
+ * O := Payload stored in network byte order
+ *
+ * Note: The N and O flag are mutually exclusive.
+ */
+#define NLA_F_NESTED		(1 << 15)
+#define NLA_F_NET_BYTEORDER	(1 << 14)
+#define NLA_TYPE_MASK		~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
+
 #define NLA_ALIGNTO		4
 #define NLA_ALIGN(len)		(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
 #define NLA_HDRLEN		((int) NLA_ALIGN(sizeof(struct nlattr)))
Index: net-2.6.24/include/net/netlink.h
===
--- net-2.6.24.orig/include/net/netlink.h 2007-09-12 13:29:50.0 +0200
+++ net-2.6.24/include/net/netlink.h 2007-09-12 14:17:56.0 +0200
@@ -667,6 +667,15 @@
 }
 
 /**
+ * nla_type - attribute type
+ * @nla: netlink attribute
+ */
+static inline int nla_type(const struct nlattr *nla)
+{
+	return nla->nla_type & NLA_TYPE_MASK;
+}
+
+/**
  * nla_data - head of payload
  * @nla: netlink attribute
  */
Index: net-2.6.24/net/ipv4/fib_frontend.c
===
--- net-2.6.24.orig/net/ipv4/fib_frontend.c 2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_frontend.c 2007-09-12 13:59:41.0 +0200
@@ -487,7 +487,7 @@
 	}
 
 	nlmsg_for_each_attr(attr, nlh, sizeof(struct rtmsg), remaining) {
-		switch (attr->nla_type) {
+		switch (nla_type(attr)) {
 		case RTA_DST:
 			cfg->fc_dst = nla_get_be32(attr);
 			break;
Index: net-2.6.24/net/ipv4/fib_semantics.c
===
--- net-2.6.24.orig/net/ipv4/fib_semantics.c 2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_semantics.c 2007-09-12 13:59:41.0 +0200
@@ -743,7 +743,7 @@
 		int remaining;
 
 		nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-			int type = nla->nla_type;
+			int type = nla_type(nla);
 
 			if (type) {
 				if (type > RTAX_MAX)
Index: net-2.6.24/net/ipv6/route.c
===
--- net-2.6.24.orig/net/ipv6/route.c 2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv6/route.c 2007-09-12 13:59:41.0 +0200
@@ -1278,7 +1278,7 @@
 		int remaining;
 
 		nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-			int type = nla->nla_type;
+			int type = nla_type(nla);
 
 			if (type) {
 				if (type > RTAX_MAX) {
Index: net-2.6.24/net/netlabel/netlabel_cipso_v4.c
===
--- net-2.6.24.orig/net/netlabel/netlabel_cipso_v4.c 2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/netlabel/netlabel_cipso_v4.c 2007-09-12 13:59:41.0 +0200
@@ -130,7 +130,7 @@
 		return -EINVAL;
 
 	nla_for_each_nested(nla, info->attrs[NLBL_CIPSOV4_A_TAGLST], nla_rem)
-		if (nla->nla_type == NLBL_CIPSOV4_A_TAG) {
+		if (nla_type(nla) == NLBL_CIPSOV4_A_TAG) {
 			if (iter >= CIPSO_V4_TAG_MAXCNT)
 				return -EINVAL;
 			doi_def->tags[iter++] = nla_get_u8(nla);
@@ -192,13 +192,13 @@
 	nla_for_each_nested(nla_a,
 			    info->attrs[NLBL_CIPSOV4_A_MLSLVLLST],
 			    nla_a_rem)
-		if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSLVL) {
+		if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSLVL) {
 			if (nla_validate_nested(nla_a,
 						NLBL_CIPSOV4_A_MAX,
 						netlbl_cipsov4_genl_policy) != 0)
 				goto add_std_failure;
 			nla_for_each_nested(nla_b, nla_a, nla_b_rem)
-				switch (nla_b->nla_type) {
+				switch
Re: [NETLINK]: Introduce nested and byteorder flag to netlink attribute
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:41:45 +0200

> This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced.
>
> The byte-order flag is yet unused, its intended use is to allow automatic byte order conversions for all atomic types.
>
> Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Thomas.
[net-2.6.24][NETNS][patch 0/3] fixes for the core network namespace
The following patches fix some compilation errors and boot problems related to the network namespace patchset.

They apply to net-2.6.24.
[net-2.6.24][NETNS][patch 3/3] fix bad macro definition
From: Daniel Lezcano [EMAIL PROTECTED]

The macro definition is bad. When calling next_net_device with a parameter named "dev", the resulting code is:

	struct net_device *dev = dev

and that leads to unexpected behavior. Especially when llc_core is compiled in, the kernel panics at boot time.

The patch replaces the macro definitions with static inline functions, as they were defined before.

Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/netdevice.h | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

Index: net-2.6.24/include/linux/netdevice.h
===
--- net-2.6.24.orig/include/linux/netdevice.h
+++ net-2.6.24/include/linux/netdevice.h
@@ -41,7 +41,8 @@
 #include <linux/dmaengine.h>
 #include <linux/workqueue.h>
 
-struct net;
+#include <net/net_namespace.h>
+
 struct vlan_group;
 struct ethtool_ops;
 struct netpoll_info;
@@ -739,23 +740,21 @@
 		list_for_each_entry_continue(d, &(net)->dev_base_head, dev_list)
 #define net_device_entry(lh)	list_entry(lh, struct net_device, dev_list)
 
-#define next_net_device(d)						\
-({									\
-	struct net_device *dev = d;					\
-	struct list_head *lh;						\
-	struct net *net;						\
-									\
-	net = dev->nd_net;						\
-	lh = dev->dev_list.next;					\
-	lh == &net->dev_base_head ? NULL : net_device_entry(lh);	\
-})
-
-#define first_net_device(N)						\
-({									\
-	struct net *NET = (N);						\
-	list_empty(&NET->dev_base_head) ? NULL :			\
-		net_device_entry(NET->dev_base_head.next);		\
-})
+static inline struct net_device *next_net_device(struct net_device *dev)
+{
+	struct list_head *lh;
+	struct net *net;
+
+	net = dev->nd_net;
+	lh = dev->dev_list.next;
+	return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
+}
+
+static inline struct net_device *first_net_device(struct net *net)
+{
+	return list_empty(&net->dev_base_head) ? NULL :
+		net_device_entry(net->dev_base_head.next);
+}
 
 extern int netdev_boot_setup_check(struct net_device *dev);
 extern unsigned long netdev_boot_base(const char *prefix, int unit);
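The name-capture problem described in this patch is not specific to net_device and can be shown on any linked structure. In the sketch below, `struct item` and `next_item` are hypothetical names; the faulty macro is kept inside a comment because expanding it with the colliding argument name reads an uninitialized variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
struct item {
	struct item *next;
	int id;
};

/*
 * Macro form (GCC statement expression), shown as a comment only:
 *
 *   #define NEXT_ITEM(d) ({ struct item *cur = d; cur->next; })
 *
 * NEXT_ITEM(cur) expands to
 *
 *   ({ struct item *cur = cur; cur->next; })
 *
 * The inner "cur" shadows the caller's variable and is initialized from
 * itself, so the macro dereferences an indeterminate pointer.
 */

/*
 * Function form: the argument is evaluated in the caller's scope before
 * the body runs, so no choice of variable name at the call site can
 * collide with the parameter.
 */
static inline struct item *next_item(struct item *cur)
{
	assert(cur != NULL);
	return cur->next;
}
```

A call such as `cur = next_item(cur);` is then well defined, which is the property the patch restores by going back to static inline functions.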
[net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization
From: Daniel Lezcano [EMAIL PROTECTED]

The core patchset of the network namespace sent by Eric Biederman does not do dynamic loopback creation. So there is no call to alloc_netdev_mq, which fills the network namespace field of the netdevice.

This patch assigns the loopback to the init network namespace.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 drivers/net/loopback.c |1 +
 1 file changed, 1 insertion(+)

Index: net-2.6.24/drivers/net/loopback.c
===
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -225,6 +225,7 @@
 				  | NETIF_F_LLTX
 				  | NETIF_F_NETNS_LOCAL,
 	.ethtool_ops		= &loopback_ethtool_ops,
+	.nd_net			= &init_net,
 };
 
 /* Setup and register the loopback device. */
[net-2.6.24][NETNS][patch 1/3] fix export symbols
From: Daniel Lezcano [EMAIL PROTECTED]

Add the appropriate EXPORT_SYMBOLS for proc_net_create, proc_net_fops_create and proc_net_remove to fix errors when compiling allmodconfig

Signed-off-by: Mark Nelson [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 fs/proc/proc_net.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: net-2.6.24/fs/proc/proc_net.c
===
--- net-2.6.24.orig/fs/proc/proc_net.c
+++ net-2.6.24/fs/proc/proc_net.c
@@ -31,6 +31,7 @@
 {
 	return create_proc_info_entry(name,mode, net->proc_net, get_info);
 }
+EXPORT_SYMBOL_GPL(proc_net_create);
 
 struct proc_dir_entry *proc_net_fops_create(struct net *net,
 	const char *name, mode_t mode, const struct file_operations *fops)
@@ -42,12 +43,13 @@
 		res->proc_fops = fops;
 	return res;
 }
+EXPORT_SYMBOL_GPL(proc_net_fops_create);
 
 void proc_net_remove(struct net *net, const char *name)
 {
 	remove_proc_entry(name, net->proc_net);
 }
-
+EXPORT_SYMBOL_GPL(proc_net_remove);
 
 static struct proc_dir_entry *proc_net_shadow;
Re: [PATCH net-2.6.23-rc5] ipsec interfamily route handling fix
From: Joakim Koskela [EMAIL PROTECTED]
Date: Thu, 6 Sep 2007 19:00:10 +0300

> This patch addresses a couple of issues related to interfamily ipsec modes. The problem is that the structure of the routing info changes with the family during the __xfrmX_bundle_create, which hasn't been taken properly into account. Seems that by coincidence it hasn't caused problems on 32bit platforms, but crashes for example on x86_64 in 6-4 around line 209 of xfrm6_policy.c as rt doesn't point to a rt6_info anymore, but actually a struct rtable. With 64bit pointers, the rt->rt6i_node pointer seems to hit something usually not null in the rtable that rt now points to, making it go for the path_cookie assignment and subsequently crashing.
>
> Tested on both 32/64bit with all four (44/46/64/66) combinations of transformation. I'm still a bit worried about how for example nested transformations work with all of this and would appreciate if someone more familiar with the details of these structs could comment.
>
> Signed-off-by: Joakim Koskela [EMAIL PROTECTED]

This fix basically looks fine to me, but I'd like at least one other person to review it too.
Re: new NAPI interface broken
From: Jan-Bernd Themann [EMAIL PROTECTED]
Date: Fri, 7 Sep 2007 11:37:02 +0200

> 2) On SMP systems: after netif_rx_complete has been called on CPU1 (+interrupts enabled), netif_rx_schedule could be called on CPU2 (irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED. In that case the device would be added to poll lists of CPU1 and CPU2 as net_rx_action would see NAPI_STATE_SCHED set. This must not happen. It will be caught when netif_rx_complete is called the second time (BUG() called)
>
> This would mean we have a problem on all SMP machines right now.

This is not a correct statement.

Only on your platform do network device interrupts get moved around, no other platform does this.

Sparc64 doesn't, all interrupts stay in one location after the cpu is initially chosen.

x86 and x86_64 specifically do not move around network device interrupts, even though other device types do get dynamic IRQ cpu distribution.

That's why you are the only person seeing this problem.

I agree that it should be fixed, but we should also fix the IRQ distribution scheme used on powerpc platforms which is totally broken in these cases.
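The invariant at stake in this exchange is that a device sits on at most one poll list at a time, guarded by a single "scheduled" bit. The sketch below models only that ownership idea with C11 atomics. It is not the kernel's NAPI code and it does not reproduce the cross-CPU race described above; `struct poll_ctx`, `try_schedule` and `complete` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical polling context; "scheduled" plays the role of a SCHED state bit. */
struct poll_ctx {
	atomic_flag scheduled;
	int on_lists;   /* number of poll lists currently holding this context */
};

/*
 * Try to schedule the context. Only the caller that flips the flag from
 * clear to set becomes the owner and may put the context on a list.
 */
static bool try_schedule(struct poll_ctx *p)
{
	assert(p != NULL);
	if (atomic_flag_test_and_set(&p->scheduled))
		return false;   /* already owned by another scheduler */
	p->on_lists++;
	return true;
}

/*
 * Finish polling. The owner takes the context off its list before
 * releasing the flag, so a new scheduler can never see it still listed.
 */
static void complete(struct poll_ctx *p)
{
	assert(p != NULL);
	assert(p->on_lists == 1);
	p->on_lists--;
	atomic_flag_clear(&p->scheduled);
}
```

In this model the problem reported in the thread corresponds to some code path acting on the flag's value after `complete` has released it, at which point another CPU may already own the context.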
Re: [net-2.6.24][NETNS][patch 1/3] fix export symbols
From: [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:38:12 +0200

> From: Daniel Lezcano [EMAIL PROTECTED]
>
> Add the appropriate EXPORT_SYMBOLS for proc_net_create, proc_net_fops_create and proc_net_remove to fix errors when compiling allmodconfig
>
> Signed-off-by: Mark Nelson [EMAIL PROTECTED]
> Acked-by: Benjamin Thery [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Why aren't you signing off on these patches? Please do so in the future.

Because "From: " usually means you are the patch author, and I can't tell who wrote these patches, you or these other people listed in the signoff area.
Re: [net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization
From: [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:38:13 +0200

> From: Daniel Lezcano [EMAIL PROTECTED]
>
> The core patchset of the network namespace sent by Eric Biederman does not do dynamic loopback creation. So there is no call to alloc_netdev_mq, which fills the network namespace field of the netdevice.
>
> This patch assigns the loopback to the init network namespace.
>
> Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

Applied, thanks.
[PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware
Stephen it looks like you weren't cc'd on the latest version of the veth support. So this patchset first reverts the old version of the veth support you merged. Then merges a tested version of the veth support. This reverts commit 4ed390ce43d1ec7c881721f312260df901d8390d. Conflicts: ip/ip.c --- ip/Makefile |2 +- ip/ip.c |4 +- ip/veth.c | 196 --- ip/veth.h | 17 - 4 files changed, 2 insertions(+), 217 deletions(-) delete mode 100644 ip/veth.c delete mode 100644 ip/veth.h diff --git a/ip/Makefile b/ip/Makefile index 209c5c8..9a5bfe3 100644 --- a/ip/Makefile +++ b/ip/Makefile @@ -1,7 +1,7 @@ IPOBJ=ip.o ipaddress.o iproute.o iprule.o \ rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \ ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \ -ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o veth.o +ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o RTMONOBJ=rtmon.o diff --git a/ip/ip.c b/ip/ip.c index 829fc64..4bdb83b 100644 --- a/ip/ip.c +++ b/ip/ip.c @@ -27,7 +27,6 @@ #include SNAPSHOT.h #include utils.h #include ip_common.h -#include veth.h int preferred_family = AF_UNSPEC; int show_stats = 0; @@ -48,7 +47,7 @@ static void usage(void) Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n ip [ -force ] [-batch filename\n where OBJECT := { link | addr | route | rule | neigh | ntable | tunnel |\n - maddr | mroute | monitor | xfrm | veth }\n + maddr | mroute | monitor | xfrm }\n OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n -f[amily] { inet | inet6 | ipx | dnet | link } |\n -o[neline] | -t[imestamp] }\n); @@ -78,7 +77,6 @@ static const struct cmd { { monitor,do_ipmonitor }, { xfrm, do_xfrm }, { mroute, do_multiroute }, - { veth, do_veth }, { help, do_help }, { 0 } }; diff --git a/ip/veth.c b/ip/veth.c deleted file mode 100644 index d4eecc8..000 --- a/ip/veth.c +++ /dev/null @@ -1,196 +0,0 @@ -/* - * veth.c ethernet tunnel - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU 
General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - * Authors:Pavel Emelianov, [EMAIL PROTECTED] - * - */ - -#include stdio.h -#include string.h -#include unistd.h -#include sys/types.h -#include sys/socket.h -#include linux/genetlink.h - -#include utils.h -#include veth.h - -#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN)) -#define NLA_DATA(na)((void *)((char*)(na) + NLA_HDRLEN)) - -static int do_veth_help(void) -{ - fprintf(stderr, Usage: ip veth add DEVICE PEER_NAME\n); - fprintf(stderr,del DEVICE\n); - exit(-1); -} - -static int genl_ctrl_resolve_family(const char *family) -{ - struct rtnl_handle rth; - struct nlmsghdr *nlh; - struct genlmsghdr *ghdr; - int ret = 0; - struct { - struct nlmsghdr n; - charbuf[4096]; - } req; - - memset(req, 0, sizeof(req)); - - nlh = req.n; - nlh-nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN); - nlh-nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; - nlh-nlmsg_type = GENL_ID_CTRL; - - ghdr = NLMSG_DATA(req.n); - ghdr-cmd = CTRL_CMD_GETFAMILY; - - if (rtnl_open_byproto(rth, 0, NETLINK_GENERIC) 0) { - fprintf(stderr, Cannot open generic netlink socket\n); - exit(1); - } - - addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME, family, strlen(family) + 1); - - if (rtnl_talk(rth, nlh, 0, 0, nlh, NULL, NULL) 0) { - fprintf(stderr, Error talking to the kernel\n); - goto errout; - } - - { - struct rtattr *tb[CTRL_ATTR_MAX + 1]; - struct genlmsghdr *ghdr = NLMSG_DATA(nlh); - int len = nlh-nlmsg_len; - struct rtattr *attrs; - - if (nlh-nlmsg_type != GENL_ID_CTRL) { - fprintf(stderr, Not a controller message, nlmsg_len=%d - nlmsg_type=0x%x\n, nlh-nlmsg_len, nlh-nlmsg_type); - goto errout; - } - - if (ghdr-cmd != CTRL_CMD_NEWFAMILY) { - fprintf(stderr, Unkown controller command %d\n, ghdr-cmd); - goto errout; - } - - len -= NLMSG_LENGTH(GENL_HDRLEN); - - if (len 0) { - fprintf(stderr, wrong controller message len %d\n, len); - return -1; - } - - 
attrs = (struct rtattr
Re: [PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware
Eric W. Biederman wrote:
> Stephen it looks like you weren't cc'd on the latest version of the veth support. So this patchset first reverts the old

He was. The latest version looks completely different from what is reversed in this patch.

> version of the veth support you merged. Then merges a tested version of the veth support.
>
> This reverts commit 4ed390ce43d1ec7c881721f312260df901d8390d.
Re: [net-2.6.24][NETNS][patch 3/3] fix bad macro definition
From: [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:38:14 +0200

> From: Daniel Lezcano [EMAIL PROTECTED]
>
> The macro definition is bad. When next_net_device is called with a
> parameter named dev, the resulting code is:
>
>     struct net_device *dev = dev
>
> which leads to unexpected behavior. In particular, when llc_core is
> compiled in, the kernel panics at boot time.
>
> This patch replaces the macro definitions with static inline
> functions, as they were defined before.
>
> Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
> Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

Applied, thanks.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
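
The failure mode described in this message can be sketched in plain C.
This is only an illustration under stated assumptions: struct node and
the names NEXT_NODE_BROKEN, next_node and count_nodes are hypothetical
and are not the kernel's definitions. The broken macro is kept under
#if 0 because it relies on a GNU statement expression and would read an
uninitialized variable whenever the caller's argument is itself named
dev.

```c
#include <stddef.h>

struct node {
	int id;
	struct node *next;
};

/*
 * Broken pattern (kept out of the build): the macro's local variable is
 * named "dev", so NEXT_NODE_BROKEN(dev) expands to
 *     struct node *dev = (dev);
 * which initializes the new local from itself rather than from the
 * caller's variable.
 */
#if 0
#define NEXT_NODE_BROKEN(d) ({ struct node *dev = (d); dev->next; })
#endif

/*
 * A static inline function gives the parameter its own scope, so the
 * name of the caller's variable can never collide with it.
 */
static inline struct node *next_node(struct node *dev)
{
	return dev ? dev->next : NULL;
}

/* Walk a list using a caller variable that is also called "dev". */
static int count_nodes(struct node *head)
{
	int n = 0;
	struct node *dev;

	for (dev = head; dev; dev = next_node(dev))
		n++;
	return n;
}
```

The inline function also gets normal type checking on its argument,
which the macro form does not provide.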
Re: [net-2.6.24][NETNS][patch 1/3] fix export symbols
David Miller wrote:
> From: [EMAIL PROTECTED]
> Date: Wed, 12 Sep 2007 14:38:12 +0200
>
>> From: Daniel Lezcano [EMAIL PROTECTED]
>>
>> Add the appropriate EXPORT_SYMBOLS for proc_net_create,
>> proc_net_fops_create and proc_net_remove to fix errors when
>> compiling allmodconfig
>>
>> Signed-off-by: Mark Nelson [EMAIL PROTECTED]
>> Acked-by: Benjamin Thery [EMAIL PROTECTED]
>
> Applied to net-2.6.24, thanks.
>
> Why aren't you signing off on these patches? Please do so in the
> future. Because From: usually means you are the patch author, and I
> can't tell who wrote these patches, you or these other people listed
> in the signoff area.

Sorry for that, I will take care of that next time.

Thanks.
[PATCH 2/6] [IPROUTE2] Introduce iplink_parse() routine
From: Pavel Emelyanov [EMAIL PROTECTED] Date: Thu, 19 Jul 2007 13:32:31 +0400 This routine parses CLI attributes, describing generic link parameters such as name, address, etc. This is mostly copy-pasted from iplink_modify(). Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Acked-by: Patrick McHardy [EMAIL PROTECTED] --- include/utils.h |3 + ip/iplink.c | 127 +++--- 2 files changed, 76 insertions(+), 54 deletions(-) diff --git a/include/utils.h b/include/utils.h index a3fd335..3fd851d 100644 --- a/include/utils.h +++ b/include/utils.h @@ -146,4 +146,7 @@ extern int cmdlineno; extern size_t getcmdline(char **line, size_t *len, FILE *in); extern int makeargs(char *line, char *argv[], int maxargs); +struct iplink_req; +int iplink_parse(int argc, char **argv, struct iplink_req *req, + char **name, char **type, char **link, char **dev); #endif /* __UTILS_H__ */ diff --git a/ip/iplink.c b/ip/iplink.c index 4060845..64989b2 100644 --- a/ip/iplink.c +++ b/ip/iplink.c @@ -142,140 +142,159 @@ static int iplink_have_newlink(void) } #endif /* ! 
IPLINK_IOCTL_COMPAT */ -static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv) +struct iplink_req { + struct nlmsghdr n; + struct ifinfomsgi; + charbuf[1024]; +}; + +int iplink_parse(int argc, char **argv, struct iplink_req *req, + char **name, char **type, char **link, char **dev) { + int ret, len; + char abuf[32]; int qlen = -1; int mtu = -1; - int len; - char abuf[32]; - char *dev = NULL; - char *name = NULL; - char *link = NULL; - char *type = NULL; - struct link_util *lu = NULL; - struct { - struct nlmsghdr n; - struct ifinfomsgi; - charbuf[1024]; - } req; - memset(req, 0, sizeof(req)); - - req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); - req.n.nlmsg_flags = NLM_F_REQUEST|flags; - req.n.nlmsg_type = cmd; - req.i.ifi_family = preferred_family; + ret = argc; while (argc 0) { if (strcmp(*argv, up) == 0) { - req.i.ifi_change |= IFF_UP; - req.i.ifi_flags |= IFF_UP; + req-i.ifi_change |= IFF_UP; + req-i.ifi_flags |= IFF_UP; } else if (strcmp(*argv, down) == 0) { - req.i.ifi_change |= IFF_UP; - req.i.ifi_flags = ~IFF_UP; + req-i.ifi_change |= IFF_UP; + req-i.ifi_flags = ~IFF_UP; } else if (strcmp(*argv, name) == 0) { NEXT_ARG(); - name = *argv; + *name = *argv; } else if (matches(*argv, link) == 0) { NEXT_ARG(); - link = *argv; + *link = *argv; } else if (matches(*argv, address) == 0) { NEXT_ARG(); len = ll_addr_a2n(abuf, sizeof(abuf), *argv); - addattr_l(req.n, sizeof(req), IFLA_ADDRESS, abuf, len); + addattr_l(req-n, sizeof(*req), IFLA_ADDRESS, abuf, len); } else if (matches(*argv, broadcast) == 0 || - strcmp(*argv, brd) == 0) { + strcmp(*argv, brd) == 0) { NEXT_ARG(); len = ll_addr_a2n(abuf, sizeof(abuf), *argv); - addattr_l(req.n, sizeof(req), IFLA_BROADCAST, abuf, len); + addattr_l(req-n, sizeof(*req), IFLA_BROADCAST, abuf, len); } else if (matches(*argv, txqueuelen) == 0 || - strcmp(*argv, qlen) == 0 || - matches(*argv, txqlen) == 0) { + strcmp(*argv, qlen) == 0 || + matches(*argv, txqlen) == 0) { NEXT_ARG(); if (qlen != -1) 
duparg(txqueuelen, *argv); if (get_integer(qlen, *argv, 0)) invarg(Invalid \txqueuelen\ value\n, *argv); - addattr_l(req.n, sizeof(req), IFLA_TXQLEN, qlen, 4); + addattr_l(req-n, sizeof(*req), IFLA_TXQLEN, qlen, 4); } else if (strcmp(*argv, mtu) == 0) { NEXT_ARG(); if (mtu != -1) duparg(mtu, *argv); if (get_integer(mtu, *argv, 0)) invarg(Invalid \mtu\ value\n, *argv); - addattr_l(req.n, sizeof(req), IFLA_MTU, mtu, 4); +
[PATCH 3/4] [IPROUTE2] Module for ip utility to support veth device
From: Pavel Emelyanov [EMAIL PROTECTED] Date: Thu, 19 Jul 2007 13:33:56 +0400 The link_veth.so itself. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Acked-by: Patrick McHardy [EMAIL PROTECTED] --- ip/Makefile|6 - ip/link_veth.c | 63 ip/veth.h | 12 ++ 3 files changed, 80 insertions(+), 1 deletions(-) create mode 100644 ip/link_veth.c create mode 100644 ip/veth.h diff --git a/ip/Makefile b/ip/Makefile index 9a5bfe3..b46bce3 100644 --- a/ip/Makefile +++ b/ip/Makefile @@ -8,8 +8,9 @@ RTMONOBJ=rtmon.o ALLOBJ=$(IPOBJ) $(RTMONOBJ) SCRIPTS=ifcfg rtpr routel routef TARGETS=ip rtmon +LIBS=link_veth.so -all: $(TARGETS) $(SCRIPTS) +all: $(TARGETS) $(SCRIPTS) $(LIBS) ip: $(IPOBJ) $(LIBNETLINK) $(LIBUTIL) @@ -24,3 +25,6 @@ clean: LDLIBS += -ldl LDFLAGS+= -Wl,-export-dynamic + +%.so: %.c + $(CC) $(CFLAGS) -shared $ -o $@ diff --git a/ip/link_veth.c b/ip/link_veth.c new file mode 100644 index 000..ded2cdd --- /dev/null +++ b/ip/link_veth.c @@ -0,0 +1,63 @@ +/* + * link_veth.c veth driver module + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Authors:Pavel Emelianov [EMAIL PROTECTED] + * + */ + +#include string.h + +#include utils.h +#include ip_common.h +#include veth.h + +#defineIFNAMSIZ16 + +static void usage(void) +{ + printf(Usage: ip link add ... 
type veth + [peer peer-name] [mac mac] [peer_mac mac]\n); +} + +static int veth_parse_opt(struct link_util *lu, int argc, char **argv, + struct nlmsghdr *hdr) +{ + char *name, *type, *link, *dev; + int err, len; + struct rtattr * data; + + if (strcmp(argv[0], peer) != 0) { + usage(); + return -1; + } + + data = NLMSG_TAIL(hdr); + addattr_l(hdr, 1024, VETH_INFO_PEER, NULL, 0); + + hdr-nlmsg_len += sizeof(struct ifinfomsg); + + err = iplink_parse(argc - 1, argv + 1, (struct iplink_req *)hdr, + name, type, link, dev); + if (err 0) + return err; + + if (name) { + len = strlen(name) + 1; + if (len IFNAMSIZ) + invarg(\name\ too long\n, *argv); + addattr_l(hdr, 1024, IFLA_IFNAME, name, len); + } + + data-rta_len = (void *)NLMSG_TAIL(hdr) - (void *)data; + return argc - 1 - err; +} + +struct link_util veth_link_util = { + .id = veth, + .parse_opt = veth_parse_opt, +}; diff --git a/ip/veth.h b/ip/veth.h new file mode 100644 index 000..aa2e6f9 --- /dev/null +++ b/ip/veth.h @@ -0,0 +1,12 @@ +#ifndef __NET_VETH_H__ +#define __NET_VETH_H__ + +enum { + VETH_INFO_UNSPEC, + VETH_INFO_PEER, + + __VETH_INFO_MAX +#define VETH_INFO_MAX (__VETH_INFO_MAX - 1) +}; + +#endif -- 1.5.3.rc6.17.g1911 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] [IPROUTE2] iproute2: link_veth support bug fixes.
From: Eric W. Biederman [EMAIL PROTECTED]
Date: Sat, 8 Sep 2007 10:17:43 -0600

This patch contains small compile and implementation bug fixes
for link_veth.c.

The compile fixes stop trying to build a shared object when we can
just as easily compile the code in, making support of non arch/i386
architectures easier.

The documentation is fixed to not document the previous version of
the veth support.

The code now initializes its pointers before calling iplink_parse,
and we now set name = dev if name is not passed.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 ip/Makefile    |    8 +++-----
 ip/link_veth.c |   12 +++++++++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/ip/Makefile b/ip/Makefile
index b46bce3..a98e1f3 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -3,14 +3,15 @@ IPOBJ=ip.o ipaddress.o iproute.o iprule.o \
     ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
     ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o
 
+IPOBJ += link_veth.o
+
 RTMONOBJ=rtmon.o
 
 ALLOBJ=$(IPOBJ) $(RTMONOBJ)
 SCRIPTS=ifcfg rtpr routel routef
 TARGETS=ip rtmon
-LIBS=link_veth.so
 
-all: $(TARGETS) $(SCRIPTS) $(LIBS)
+all: $(TARGETS) $(SCRIPTS)
 
 ip: $(IPOBJ) $(LIBNETLINK) $(LIBUTIL)
 
@@ -25,6 +26,3 @@ clean:
 
 LDLIBS += -ldl
 LDFLAGS+= -Wl,-export-dynamic
-
-%.so: %.c
-	$(CC) $(CFLAGS) -shared $< -o $@
diff --git a/ip/link_veth.c b/ip/link_veth.c
index ded2cdd..6f3931c 100644
--- a/ip/link_veth.c
+++ b/ip/link_veth.c
@@ -20,14 +20,16 @@
 
 static void usage(void)
 {
-	printf("Usage: ip link add ... type veth "
-			"[peer <peer-name>] [mac <mac>] [peer_mac <mac>]\n");
+	printf("Usage: ip link add ... type veth peer { ... }\n");
 }
 
 static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
 		struct nlmsghdr *hdr)
 {
-	char *name, *type, *link, *dev;
+	char *dev = NULL;
+	char *name = NULL;
+	char *link = NULL;
+	char *type = NULL;
 	int err, len;
 	struct rtattr * data;
 
@@ -46,6 +48,10 @@ static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
 	if (err < 0)
 		return err;
 
+	/* Allow ip link add dev and ip link add name */
+	if (!name)
+		name = dev;
+
 	if (name) {
 		len = strlen(name) + 1;
 		if (len > IFNAMSIZ)
-- 
1.5.3.rc6.17.g1911
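
The two implementation fixes in this patch (initializing the out
pointers, and falling back from name to dev) can be sketched in
isolation. This is a simplified illustration, not iproute2 code:
parse_link_args is a hypothetical stand-in for iplink_parse() that, like
the real routine, only stores through an out parameter when the matching
keyword is present, and peer_name is a hypothetical caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for iplink_parse(): stores into *name or *dev
 * only when the matching keyword appears, and leaves them untouched
 * otherwise. Returns 0 on success, -1 on an unknown keyword.
 */
static int parse_link_args(int argc, char **argv, char **name, char **dev)
{
	int i;

	for (i = 0; i + 1 < argc; i += 2) {
		if (strcmp(argv[i], "name") == 0)
			*name = argv[i + 1];
		else if (strcmp(argv[i], "dev") == 0)
			*dev = argv[i + 1];
		else
			return -1;
	}
	return 0;
}

/* Returns the effective peer name, or NULL if none was given. */
static const char *peer_name(int argc, char **argv)
{
	/*
	 * Both pointers must start out as NULL: the parser may never
	 * write to them, and the checks below would otherwise read
	 * indeterminate values.
	 */
	char *name = NULL;
	char *dev = NULL;

	if (parse_link_args(argc, argv, &name, &dev) < 0)
		return NULL;

	/* Accept either keyword, preferring an explicit "name". */
	if (!name)
		name = dev;
	return name;
}
```

With the uninitialized declaration that the patch removes, the
`if (name)` test after parsing could see garbage whenever the user
passed neither keyword.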
[PATCH] [IPROUTE2] Basic documentation for dynamic link creation/destruction.
This updates the usage to indicate that we have support link creation and destruction in addition to just setting link parameters. It's not really great documentation of the new netlink support for link creations and removal but it is a start. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- ip/iplink.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/ip/iplink.c b/ip/iplink.c index 64989b2..541f3d6 100644 --- a/ip/iplink.c +++ b/ip/iplink.c @@ -38,7 +38,8 @@ static void usage(void) __attribute__((noreturn)); void iplink_usage(void) { - fprintf(stderr, Usage: ip link set DEVICE { up | down |\n); + fprintf(stderr, Usage: ip link { set | add | replace | delete } DEVICE {\n); + fprintf(stderr, up | down |\n); fprintf(stderr, arp { on | off } |\n); fprintf(stderr, dynamic { on | off } |\n); fprintf(stderr, multicast { on | off } |\n); @@ -48,7 +49,9 @@ void iplink_usage(void) fprintf(stderr, txqueuelen PACKETS |\n); fprintf(stderr, name NEWNAME |\n); fprintf(stderr, address LLADDR | broadcast LLADDR |\n); - fprintf(stderr, mtu MTU }\n); + fprintf(stderr, mtu MTU | \n); + fprintf(stderr, type TYPE [ TYPE specifc options]\n); + fprintf(stderr, }\n); fprintf(stderr,ip link show [ DEVICE ]\n); exit(-1); } -- 1.5.3.rc6.17.g1911 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [IPROUTE2] Add support for moving links between network namespaces
This adds support for setting the IFLA_NET_NS_PID attribute
on links allowing them to be moved between network namespaces.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/if_link.h |    1 +
 ip/iplink.c             |    9 +++++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 23b3a8e..c948395 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -78,6 +78,7 @@ enum
 	IFLA_LINKMODE,
 	IFLA_LINKINFO,
 #define IFLA_LINKINFO IFLA_LINKINFO
+	IFLA_NET_NS_PID,
 	__IFLA_MAX
 };
 
diff --git a/ip/iplink.c b/ip/iplink.c
index 541f3d6..624c784 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -158,6 +158,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 	char abuf[32];
 	int qlen = -1;
 	int mtu = -1;
+	pid_t netns_pid = -1;
 
 	ret = argc;
 
@@ -255,6 +256,14 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 			} else
 				return on_off("dynamic");
 #endif
+		} else if (matches(*argv, "netnspid") == 0) {
+			NEXT_ARG();
+			if (netns_pid != -1)
+				duparg("netnspid", *argv);
+			if (get_integer(&netns_pid, *argv, 0))
+				invarg("Invalid \"netnspid\" value\n", *argv);
+			addattr_l(&req->n, sizeof(*req), IFLA_NET_NS_PID,
+				  &netns_pid, sizeof(netns_pid));
 		} else if (matches(*argv, "type") == 0) {
 			NEXT_ARG();
 			*type = *argv;
-- 
1.5.3.rc6.17.g1911
Re: [PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware
Pavel Emelyanov [EMAIL PROTECTED] writes:

> Eric W. Biederman wrote:
>> Stephen it looks like you weren't cc'd on the latest version of the
>> veth support. So this patchset first reverts the old
>
> He was. The latest version looks completely different from what
> is reversed in this patch.

This is against the latest snapshot I could find. My apologies if
I missed some of the communication.

Eric
Re: [-mm patch] really unexport do_softirq
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:40 +0200

> On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote:
>> ...
>> Changes since 2.6.23-rc3-mm1:
>> ...
>>  git-net.patch
>> ...
>>  git trees
>> ...
>
> This hydra had more than one head...
>
> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied, thanks.
Re: [-mm patch] unexport raise_softirq_irqoff
From: Christoph Hellwig [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 21:41:53 +0100

> On Sun, Sep 09, 2007 at 10:25:44PM +0200, Adrian Bunk wrote:
>> On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote:
>>> ...
>>> Changes since 2.6.23-rc3-mm1:
>>> ...
>>>  git-net.patch
>>> ...
>>>  git trees
>>> ...
>>
>> raise_softirq_irqoff no longer has any modular user.
>>
>> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
>
> This should probably go in through Dave's tree as it's removing this
> rather annoying user.

Yep, I've just tossed it into my tree.

Thanks.
Re: [2.6 patch] make sctp_addto_param() static
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:50 +0200

> sctp_addto_param() can become static.
>
> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied, thanks.
Re: [-mm patch] net/sctp/socket.c: make 3 variables static
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:54 +0200

> This patch makes the following needlessly global variables static:
> - sctp_memory_pressure
> - sctp_memory_allocated
> - sctp_sockets_allocated
>
> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied, thanks.
[PATCH] veth: Cleanly handle a missing peer_tb argument on creation.
I was getting strange kernel crashes when attempting to create
veth devices when I did not specify a peer argument to /bin/ip.

So this patch defaults peer_tb to all zeros and doesn't attempt
to reuse the netlink attributes for the primary link to create
the secondary link and now I can't reproduce the failures.

Given that some of the most interesting netlink attributes to
specify, like a mac address or a network device name, are generally
the wrong thing to reuse for the peer, this seems like the right
approach.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 drivers/net/veth.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9e6a746..d49bd2c 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -313,7 +313,7 @@ static int veth_newlink(struct net_device *dev,
 	struct net_device *peer;
 	struct veth_priv *priv;
 	char ifname[IFNAMSIZ];
-	struct nlattr *peer_tb[IFLA_MAX + 1], **tbp;
+	struct nlattr *peer_tb[IFLA_MAX + 1];
 
 	/*
 	 * create and register peer first
@@ -322,6 +322,7 @@ static int veth_newlink(struct net_device *dev,
 	 * skip it since no info from it is useful yet
 	 */
 
+	memset(peer_tb, 0, sizeof(peer_tb));
 	if (data != NULL && data[VETH_INFO_PEER] != NULL) {
 		struct nlattr *nla_peer;
 
@@ -336,21 +337,18 @@ static int veth_newlink(struct net_device *dev,
 		err = veth_validate(peer_tb, NULL);
 		if (err < 0)
 			return err;
+	}
 
-		tbp = peer_tb;
-	} else
-		tbp = tb;
-
-	if (tbp[IFLA_IFNAME])
-		nla_strlcpy(ifname, tbp[IFLA_IFNAME], IFNAMSIZ);
+	if (peer_tb[IFLA_IFNAME])
+		nla_strlcpy(ifname, peer_tb[IFLA_IFNAME], IFNAMSIZ);
 	else
 		snprintf(ifname, IFNAMSIZ, DRV_NAME "%%d");
 
-	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, tbp);
+	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, peer_tb);
 	if (IS_ERR(peer))
 		return PTR_ERR(peer);
 
-	if (tbp[IFLA_ADDRESS] == NULL)
+	if (peer_tb[IFLA_ADDRESS] == NULL)
 		random_ether_addr(peer->dev_addr);
 
 	err = register_netdevice(peer);
-- 
1.5.3.rc6.17.g1911
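
The approach in this patch (start from an all-empty attribute table and
fall back to defaults for anything the user did not supply for the
peer) can be sketched in userspace C. This is an illustration only:
ATTR_MAX, the ATTR_* indices and pick_peer_name are hypothetical names,
plain strings stand in for struct nlattr, and the assignment marked
below stands in for the kernel's nested attribute parsing.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ATTR_MAX 3	/* hypothetical table size */
enum { ATTR_UNSPEC, ATTR_IFNAME, ATTR_ADDRESS };

/*
 * Choose the peer's interface name. nested_ifname is the name given in
 * the optional peer section, or NULL when no peer section was supplied.
 */
static void pick_peer_name(const char *nested_ifname, char *out, size_t len)
{
	const char *peer_tb[ATTR_MAX + 1];

	/* Every attribute is absent until the peer section says otherwise. */
	memset(peer_tb, 0, sizeof(peer_tb));

	if (nested_ifname)
		peer_tb[ATTR_IFNAME] = nested_ifname;	/* stand-in for parsing */

	if (peer_tb[ATTR_IFNAME])
		snprintf(out, len, "%s", peer_tb[ATTR_IFNAME]);
	else
		snprintf(out, len, "veth%%d");	/* name template, as in the patch */
}
```

Because the table is always zeroed first, the later checks never depend
on whether a peer section was present, and nothing from the primary
link's attributes leaks into the peer.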
Re: [-mm patch] make tcp_splice_data_recv() static
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:58 +0200

> On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote:
>> ...
>> Changes since 2.6.23-rc3-mm1:
>> ...
>>  git-block.patch
>> ...
>>  git trees
>> ...
>
> tcp_splice_data_recv() can become static.
>
> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

I'll let Jens or similar pick this one up since it obviously
won't apply to my tree.
Re: new NAPI interface broken for POWER architecture?
David Miller [EMAIL PROTECTED] wrote on 12.09.2007 14:50:04:

> From: Jan-Bernd Themann [EMAIL PROTECTED]
> Date: Fri, 7 Sep 2007 11:37:02 +0200
>
>> 2) On SMP systems: after netif_rx_complete has been called on CPU1
>>    (+interrupts enabled), netif_rx_schedule could be called on CPU2
>>    (irq handler) before net_rx_action on CPU1 has checked
>>    NAPI_STATE_SCHED. In that case the device would be added to
>>    poll lists of CPU1 and CPU2 as net_rx_action would see
>>    NAPI_STATE_SCHED set. This must not happen. It will be caught
>>    when netif_rx_complete is called the second time (BUG() called)
>>
>> This would mean we have a problem on all SMP machines right now.
>
> This is not a correct statement.
>
> Only on your platform do network device interrupts get moved
> around, no other platform does this.
>
> Sparc64 doesn't, all interrupts stay in one location after
> the cpu is initially chosen.
>
> x86 and x86_64 specifically do not move around network
> device interrupts, even though other device types do get
> dynamic IRQ cpu distribution.
>
> That's why you are the only person seeing this problem.
>
> I agree that it should be fixed, but we should also fix the IRQ
> distribution scheme used on powerpc platforms which is totally
> broken in these cases.

This is definitely not something we can change in the HEA device driver
alone. It could also affect any other networking cards on POWER
(e1000,s2io...).

Paul, Michael, Arndt, what is your opinion here?

Gruss / Regards
Christoph Raisch
Re: new NAPI interface broken for POWER architecture?
From: Christoph Raisch [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 15:10:08 +0200

> This is definitely not something we can change in the HEA device driver
> alone.

And it shouldn't be, x86 implements the policy in irq balance daemon,
powerpc should do it wherever it would be appropriate there.

> Paul, Michael, Arndt, what is your opinion here?

I'm all ears too :)
dscc4.c tests for #ifndef MODULE even though it must be modular
from drivers/net/wan/dscc4.c:

  #ifndef MODULE
  static int __init dscc4_setup(char *str)
  {
          int *args[] = { &debug, &quartz, NULL }, **p = args;

          while (*p && (get_option(&str, *p) == 2))
                  p++;
          return 1;
  }

  __setup("dscc4.setup=", dscc4_setup);
  #endif

but from drivers/net/wan/Kconfig:

  ...
  config DSCC4
          tristate "Etinc PCISYNC serial board support"
          depends on HDLC && PCI && m
  ...

if i read this correctly, doesn't the "depends on" of "m" mean that
that Kconfig selection can be *at most* modular, so that that
preprocessor conditional can never be satisfied? a quick test under
"make menuconfig" seems to confirm that.

besides, the kernel parm being defined in that call to __setup()
really violates the spirit of defining kernel parms. :-)

rday
-- 
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca
Re: [PATCH] veth: Cleanly handle a missing peer_tb argument on creation.
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Wed, 12 Sep 2007 07:19:56 -0600

> I was getting strange kernel crashes when attempting to create
> veth devices when I did not specify a peer argument to /bin/ip.
>
> So this patch defaults peer_tb to all zeros and doesn't attempt
> to reuse the netlink attributes for the primary link to create
> the secondary link and now I can't reproduce the failures.
>
> Given that some of the most interesting netlink attributes to
> specify, like a mac address or a network device name, are generally
> the wrong thing to reuse for the peer, this seems like the right
> approach.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

This looks mostly fine, can someone else who knows veth a bit
review this as well?
[PATCH] IPV4 : convert rt_check_expire() from softirq processing to workqueue
On loaded/big hosts, rt_check_expire() is of little use, because it
generally breaks out of its main loop because of a jiffies change. It
can take a long time (read: many timer invocations) to actually scan
the whole hash table, freeing unused entries.

Converting it to use a workqueue instead of softirq is a nice move
because we can allow rt_check_expire() to do the scan it is supposed to
do, without hogging the CPU. This has an impact on the average number
of entries in cache, reducing ram usage. Cache is more responsive to
parameter changes (/proc/sys/net/ipv4/route/gc_timeout and
/proc/sys/net/ipv4/route/gc_interval)

Note: Maybe the default value of gc_interval (60 seconds) is too high,
since this means we actually need 5 (300/60) invocations to scan the
whole table.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 396c631..006d605 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -81,6 +81,7 @@
 #include <linux/netdevice.h>
 #include <linux/proc_fs.h>
 #include <linux/init.h>
+#include <linux/workqueue.h>
 #include <linux/skbuff.h>
 #include <linux/inetdevice.h>
 #include <linux/igmp.h>
@@ -136,7 +137,8 @@ static unsigned long rt_deadline;
 #define RTprint(a...)	printk(KERN_DEBUG a)
 
 static struct timer_list rt_flush_timer;
-static struct timer_list rt_periodic_timer;
+static void rt_check_expire(struct work_struct *work);
+static DECLARE_DELAYED_WORK(expires_work, rt_check_expire);
 static struct timer_list rt_secret_timer;
 
 /*
@@ -572,20 +574,19 @@ static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 		(fl1->iif ^ fl2->iif)) == 0;
 }
 
-/* This runs via a timer and thus is always in BH context. */
-static void rt_check_expire(unsigned long dummy)
+static void rt_check_expire(struct work_struct *work)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
 	struct rtable *rth, **rthp;
-	unsigned long now = jiffies;
 	u64 mult;
 
 	mult = ((u64)ip_rt_gc_interval) << rt_hash_log;
 	if (ip_rt_gc_timeout > 1)
 		do_div(mult, ip_rt_gc_timeout);
 	goal = (unsigned int)mult;
-	if (goal > rt_hash_mask) goal = rt_hash_mask + 1;
+	if (goal > rt_hash_mask)
+		goal = rt_hash_mask + 1;
 	for (; goal > 0; goal--) {
 		unsigned long tmo = ip_rt_gc_timeout;
 
@@ -594,11 +595,11 @@ static void rt_check_expire(unsigned long dummy)
 
 		if (*rthp == 0)
 			continue;
-		spin_lock(rt_hash_lock_addr(i));
+		spin_lock_bh(rt_hash_lock_addr(i));
 		while ((rth = *rthp) != NULL) {
 			if (rth->u.dst.expires) {
 				/* Entry is expired even if it is in use */
-				if (time_before_eq(now, rth->u.dst.expires)) {
+				if (time_before_eq(jiffies, rth->u.dst.expires)) {
 					tmo >>= 1;
 					rthp = &rth->u.dst.rt_next;
 					continue;
@@ -613,14 +614,10 @@ static void rt_check_expire(unsigned long dummy)
 			*rthp = rth->u.dst.rt_next;
 			rt_free(rth);
 		}
-		spin_unlock(rt_hash_lock_addr(i));
-
-		/* Fallback loop breaker. */
-		if (time_after(jiffies, now))
-			break;
+		spin_unlock_bh(rt_hash_lock_addr(i));
 	}
 	rover = i;
-	mod_timer(&rt_periodic_timer, jiffies + ip_rt_gc_interval);
+	schedule_delayed_work(&expires_work, ip_rt_gc_interval);
 }
 
 /* This can run from both BH and non-BH contexts, the latter
@@ -2993,17 +2990,14 @@ int __init ip_rt_init(void)
 
 	init_timer(&rt_flush_timer);
 	rt_flush_timer.function = rt_run_flush;
-	init_timer(&rt_periodic_timer);
-	rt_periodic_timer.function = rt_check_expire;
 	init_timer(&rt_secret_timer);
 	rt_secret_timer.function = rt_secret_rebuild;
 
 	/* All the timers, started at system startup tend
 	   to synchronize. Perturb it a bit.
 	 */
-	rt_periodic_timer.expires = jiffies + net_random() % ip_rt_gc_interval +
-					ip_rt_gc_interval;
-	add_timer(&rt_periodic_timer);
+	schedule_delayed_work(&expires_work,
+		net_random() % ip_rt_gc_interval + ip_rt_gc_interval);
 
 	rt_secret_timer.expires = jiffies + net_random() % ip_rt_secret_interval +
 		ip_rt_secret_interval;
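
The "5 (300/60) invocations" remark in the rt_check_expire() patch
follows from how the per-run goal is computed. The sketch below redoes
that arithmetic in userspace C under stated assumptions: buckets_per_run
is a hypothetical helper, the parameters stand in for
ip_rt_gc_interval, ip_rt_gc_timeout and rt_hash_log, a hash table of
2^16 buckets is assumed for the example, and the clamp is applied
before the narrowing cast rather than after it.

```c
#include <stdint.h>

/*
 * Number of hash buckets one run tries to scan:
 *     goal = (gc_interval << hash_log) / gc_timeout
 * clamped to the table size. hash_log must be below 32.
 */
static unsigned int buckets_per_run(unsigned int gc_interval,
				    unsigned int gc_timeout,
				    unsigned int hash_log)
{
	uint64_t mult = (uint64_t)gc_interval << hash_log;
	uint64_t table_size = (uint64_t)1 << hash_log;

	if (gc_timeout > 1)
		mult /= gc_timeout;
	if (mult > table_size)
		mult = table_size;
	return (unsigned int)mult;
}
```

With gc_interval = 60, gc_timeout = 300 and 65536 buckets, each run
covers 13107 buckets, about one fifth of the table. Integer truncation
means five runs cover 65535 buckets, so the table is fully scanned
roughly every five invocations, provided each run completes its goal.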
Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> as already described old temporary sockets (client is gone) of lockd
> aren't closed after some time. So, with enough clients and some time
> gone, there are 80 open dangling sockets and you start getting
> messages of the form:
>
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads.

Thanks for working on this problem!

> If I understand the code then the intention was that the server closes
> temporary sockets after about 6 to 12 minutes:
>
>   a timer is started which calls svc_age_temp_sockets every 6 minutes.
>
>   svc_age_temp_sockets:
>     if a socket is marked OLD it gets closed.
>     sockets which are not marked as OLD are marked OLD
>
>   every time a socket receives something OLD is cleared.
>
> But svc_age_temp_sockets never closes any socket though because it
> only closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
>
> Here is a patch against 2.6.22.6 which changes the test to
> svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs
> fine here. Unused sockets get closed (after 6 to 12 minutes)

So the fact that this changes the behavior means that sk_inuse is taking
on negative values. This can't be right--how can something like
svc_sock_put() (which does an atomic_dec_and_test) work in that case?

I wish I had time today to figure out what's going on in this case. But
from a quick look through svcsock.c for sk_inuse, it looks odd; I'm
suspicious of anything without the stereotyped behavior--initializing
to one, atomic_inc()ing whenever someone takes a reference, and
atomic_dec_and_test()ing whenever someone drops it.

--b.
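
The "stereotyped behavior" mentioned at the end of this message can be
sketched with C11 atomics. This is a generic illustration, not the
sunrpc code: struct conn and the conn_* functions are hypothetical
names, atomic_fetch_add/atomic_fetch_sub stand in for the kernel's
atomic_inc() and atomic_dec_and_test(), and setting a flag stands in
for the real teardown.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct conn {
	atomic_int refs;
	bool closed;
};

/* The creator holds the first reference, so the count starts at one. */
static void conn_init(struct conn *c)
{
	atomic_init(&c->refs, 1);
	c->closed = false;
}

/* Take a reference: always a plain increment. */
static void conn_get(struct conn *c)
{
	atomic_fetch_add(&c->refs, 1);
}

/*
 * Drop a reference. atomic_fetch_sub returns the previous value, so a
 * result of 1 means this call moved the count to zero and this caller
 * alone is responsible for teardown.
 */
static bool conn_put(struct conn *c)
{
	if (atomic_fetch_sub(&c->refs, 1) == 1) {
		c->closed = true;	/* stand-in for the real teardown */
		return true;
	}
	return false;
}
```

Under this discipline the count is never observed below zero while the
object is alive, which is why a test of the form "<= 0" changing the
behavior suggests the counter is not following the pattern.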
Re: [PATCH] [IPROUTE2] Add support for moving links between network namespaces
On Wed, 12 Sep 2007 07:05:42 -0600
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> This adds support for setting the IFLA_NET_NS_PID attribute
> on links allowing them to be moved between network namespaces.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
> ---
>  include/linux/if_link.h |    1 +
>  ip/iplink.c             |    9 +++++++++
>  2 files changed, 10 insertions(+), 0 deletions(-)

Please don't mix header file updates with command changes.
As a first step, I always install standard kernel sanitized headers.
Re: [PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware
[EMAIL PROTECTED] (Eric W. Biederman) writes:

> Pavel Emelyanov [EMAIL PROTECTED] writes:
>
>> Eric W. Biederman wrote:
>>> Stephen it looks like you weren't cc'd on the latest version of the
>>> veth support. So this patchset first reverts the old
>>
>> He was. The latest version looks completely different from what
>> is reversed in this patch.
>
> This is against the latest snapshot I could find. My apologies if
> I missed some of the communication.

I was working against:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

And the last I could find of the conversation about veth support
was in the thread announcing iproute-2-2.6.23-rc3, and Stephen
Hemminger asking for the latest version of the veth support to be
sent on Sept 1st.

So it is quite possible this has been resolved in private email,
and nothing public has been updated yet. I just don't have a copy
of anything newer, and I don't know where else I would look for
something newer. So since I'm starting to use veth I sent the patches
I had to make it work.

The last round of veth support for iproute2 I could find was sent
on the 19th of July and David Miller, Patrick McHardy, and netdev
were copied but Stephen Hemminger wasn't. Which is where my assertion
that Stephen hadn't been sent the latest version came from.

If you guys have already sorted this out and I just can't find the
code I'm overjoyed. Otherwise the patches I sent should be enough to
get things sorted out, if I have figured out the current state of
confusion.

Eric
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
jamal wrote:

On Wed, 2007-12-09 at 03:04 -0400, Bill Fink wrote:

On Fri, 07 Sep 2007, jamal wrote:

I am going to be the devil's advocate[1]:

So let me be the angel's advocate. :-)

I think this would make you God's advocate ;-> (http://en.wikipedia.org/wiki/God%27s_advocate)

I view his results much more favorably.

The challenge is, under _low traffic_: bad bad CPU use. That's what is at stake, correct?

By low traffic, I assume you mean a rate at which the NAPI driver doesn't stay in polled mode. The problem is that that rate is getting higher all the time, as interface and CPU speeds increase. This results in too many interrupts and NAPI thrashing in/out of polled mode very quickly.

Let's bury the stats for a sec ...

Yes please. We need an analysis of what happens to cpu usage, latency, pps etc when various factors are changed, e.g. input pps, NAPI busy-idle delay etc. The main purpose of my RFC wasn't to push a patch into the kernel right now, it was to highlight the issue and to find out if others were already working on it. The feedback has been good so far. I just need to find some time to do some testing. :)

People are bitching about NAPI abusing CPU, is the answer to abuse more CPU than NAPI? ;->

Jamal, do you have more details? Are people saying NAPI gets too much of the CPU pie because they profiled it? Are they complaining that system behavior degrades too much under certain network traffic conditions? Mouse cursor movement jittery? Real-time apps such as music/video players starved of CPU? Is it possible they blame NAPI because they see tangible effects on their system, not because measured CPU usage is high? I say this because my music/video player and mouse cursor behave _much_ better with my NAPI changes during general use, despite the increase in measured cpu load. Even ftp can make my system's mouse cursor jitter...

The answer could be I am not solving that problem anymore - at least that's what James is saying ;->

I'm investigating whether the symptoms I describe above can be reduced or eliminated without resorting to hardware interrupt mitigation. Specifically, I want to do more testing on the idle polling scheme which seems to improve system behavior in my tests. This will involve more than doing a flood ping or two. :)

Sometimes there are tradeoffs to be made to be decided by the user based on what's most important to that user and his specific workload. And the suggested ethtool option (defaulting to current behavior) would enable the user to make that decision.

And the challenge is: What workload is willing to invest that much cpu for low traffic? Can you name one? One that may come close is database benchmarks for latency - but those folks wouldn't touch this with a mile-long pole if you told them their cpu use is going to get worse than what NAPI (that big bad CPU hog under low traffic) is giving them.

I agree with both of you. But we need more test results first to know whether it will be useful to offer NAPI idle polling as an _option_.

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
On Wed, 12 Sep 2007 14:50:01 +0100 James Chapman [EMAIL PROTECTED] wrote:

jamal wrote:

On Wed, 2007-12-09 at 03:04 -0400, Bill Fink wrote:

On Fri, 07 Sep 2007, jamal wrote:

I am going to be the devil's advocate[1]:

So let me be the angel's advocate. :-)

I think this would make you God's advocate ;-> (http://en.wikipedia.org/wiki/God%27s_advocate)

I view his results much more favorably.

The challenge is, under _low traffic_: bad bad CPU use. That's what is at stake, correct?

By low traffic, I assume you mean a rate at which the NAPI driver doesn't stay in polled mode. The problem is that that rate is getting higher all the time, as interface and CPU speeds increase. This results in too many interrupts and NAPI thrashing in/out of polled mode very quickly.

But if you compare this to a non-NAPI driver, the same softirq overhead happens. The problem is that for many older devices, disabling IRQs requires an expensive non-cached PCI access. Smarter, newer devices all use MSI, which is pure edge triggered, and with proper register usage NAPI should be no worse than non-NAPI.
Re: [PATCH] [IPROUTE2] Add support for moving links between network namespaces
Stephen Hemminger [EMAIL PROTECTED] writes:

On Wed, 12 Sep 2007 07:05:42 -0600 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

This adds support for setting the IFLA_NET_NS_PID attribute on links allowing them to be moved between network namespaces.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/if_link.h |1 +
 ip/iplink.c |9 +
 2 files changed, 10 insertions(+), 0 deletions(-)

Please don't mix header file updates with command changes. As a first step, I always install standard kernel sanitized headers.

Sorry I didn't know. Those changes are now in net-2.6.24 so installing sanitized headers should not change anything. Please feel free to drop the if_link.h part, and if you want I can resend that patch with those few lines deleted.

Eric
net-2.6.24 build problem
ERROR: xfrm_audit_state_delete [net/key/af_key.ko] undefined! ERROR: xfrm_audit_state_add [net/key/af_key.ko] undefined! ERROR: xfrm_audit_policy_add [net/key/af_key.ko] undefined! ERROR: xfrm_audit_policy_delete [net/key/af_key.ko] undefined # # Automatically generated make config: don't edit # Linux kernel version: 2.6.23-rc5 # Wed Sep 12 15:12:02 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION=-net # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_TASKSTATS=y # CONFIG_TASK_DELAY_ACCT is not set # CONFIG_TASK_XACCT is not set # CONFIG_USER_NS is not set CONFIG_AUDIT=y CONFIG_AUDITSYSCALL=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=14 # CONFIG_SYSFS_DEPRECATED is not set CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_KALLSYMS_EXTRA_PASS=y CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set 
CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_FORCE_UNLOAD is not set CONFIG_MODVERSIONS=y # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set CONFIG_BLK_DEV_BSG=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq # # Processor type and features # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # CONFIG_SMP is not set CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_PARAVIRT is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set CONFIG_MPENTIUMM=y # CONFIG_MCORE2 is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_MVIAC7 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_XADD=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y 
CONFIG_X86_POPAD_OK=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=4 CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set CONFIG_X86_UP_APIC=y CONFIG_X86_UP_IOAPIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y CONFIG_X86_MCE=y CONFIG_X86_MCE_NONFATAL=m CONFIG_X86_MCE_P4THERMAL=y CONFIG_VM86=y # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set CONFIG_X86_REBOOTFIXUPS=y CONFIG_MICROCODE=m CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=m CONFIG_X86_CPUID=m # # Firmware Drivers # CONFIG_EDD=m # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set CONFIG_DMIID=y # CONFIG_NOHIGHMEM is not set CONFIG_HIGHMEM4G=y # CONFIG_HIGHMEM64G is not set CONFIG_PAGE_OFFSET=0xC000 CONFIG_HIGHMEM=y CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_POPULATES_NODE_MAP=y
Re: [PATCH] [IPROUTE2] Add support for moving links between network namespaces
On Wed, 12 Sep 2007 08:06:08 -0600 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

Stephen Hemminger [EMAIL PROTECTED] writes:

On Wed, 12 Sep 2007 07:05:42 -0600 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

This adds support for setting the IFLA_NET_NS_PID attribute on links allowing them to be moved between network namespaces.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/if_link.h |1 +
 ip/iplink.c |9 +
 2 files changed, 10 insertions(+), 0 deletions(-)

Please don't mix header file updates with command changes. As a first step, I always install standard kernel sanitized headers.

Sorry I didn't know. Those changes are now in net-2.6.24 so installing sanitized headers should not change anything. Please feel free to drop the if_link.h part, and if you want I can resend that patch with those few lines deleted.

Eric

I take care of fixing patches (as long as they aren't really damaged).
Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
On Wednesday September 12, [EMAIL PROTECTED] wrote:

Hello,

as already described old temporary sockets (client is gone) of lockd aren't closed after some time. So, with enough clients and some time gone, there are 80 open dangling sockets and you start getting messages of the form:

lockd: too many open TCP sockets, consider increasing the number of nfsd threads.

If I understand the code then the intention was that the server closes temporary sockets after about 6 to 12 minutes:

a timer is started which calls svc_age_temp_sockets every 6 minutes.

svc_age_temp_sockets:
if a socket is marked OLD it gets closed.
sockets which are not marked as OLD are marked OLD.

Every time the socket receives something, OLD is cleared.

But svc_age_temp_sockets never closes any socket though because it only closes sockets with svsk->sk_inuse == 0. This seems to be a bug.

Here is a patch against 2.6.22.6 which changes the test to svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs fine here. Unused sockets get closed (after 6 to 12 minutes)

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]

--- ../linux-2.6.22.6/net/sunrpc/svcsock.c	2007-08-27 18:10:14.0 +0200
+++ net/sunrpc/svcsock.c	2007-09-11 11:07:13.0 +0200
@@ -1572,7 +1575,7 @@
 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
 			continue;
-		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+		if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
 			continue;
 		atomic_inc(&svsk->sk_inuse);
 		list_move(le, &to_be_aged);

As svc_age_temp_sockets did not do anything before, this change may trigger hidden bugs.

To be true I don't see why this check

(atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))

is needed at all (it can only be an optimization) as these fields can change after the check. In svc_tcp_accept there is no such check when a temporary socket is closed.

Thanks for looking into this.

I think the correct change is to test

	if (atomic_read(&svsk->sk_inuse) > 1 || test_bit(SK_BUSY, &svsk->sk_flags))

or even

	if (atomic_read(&svsk->sk_inuse) != 1 || test_bit(SK_BUSY, &svsk->sk_flags))

sk_inuse contains a bias of '1' until SK_DEAD is set. So a normal, active socket will have an inuse count of 1 or more. If it is exactly 1, then either it is SK_DEAD (in which case there is nothing for this code to do), or it has no users, in which case it is appropriate to close the socket if it is old. Note that this test is for "the socket should not be closed", so we test if it is *not* 1, or > 1.

The tests are needed because we don't want to close a socket that might be in use elsewhere. The SK_BUSY bit and the sk_inuse count combine to check if the socket is in use at all or not.

Your change effectively disabled the test, as sk_inuse is never <= 0 (except when SK_DEAD is set).

This bug has been present since

commit aaf68cfbf2241d24d46583423f6bff5c47e088b3
Author: NeilBrown [EMAIL PROTECTED]
Date: Thu Feb 8 14:20:30 2007 -0800

(i.e. it is my fault). So it is in 2.6.21 and later and should probably go to .stable for .21 and .22.

Bruce: for you :-)

---
Correctly close old nfsd/lockd sockets.

From: NeilBrown [EMAIL PROTECTED]

Commit aaf68cfbf2241d24d46583423f6bff5c47e088b3 added a bias to sk_inuse, so this test for an unused socket now fails. So no sockets get closed because they are old (they might get closed if the client closed them).

This bug has existed since 2.6.21-rc1.

Thanks to Wolfgang Walter for finding and reporting the bug.

Cc: Wolfgang Walter [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./net/sunrpc/svcsock.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2007-09-12 16:05:23.0 +0200
+++ ./net/sunrpc/svcsock.c	2007-09-12 16:06:01.0 +0200
@@ -1592,7 +1592,8 @@ svc_age_temp_sockets(unsigned long closu
 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
 			continue;
-		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+		if (atomic_read(&svsk->sk_inuse) > 1
+		    || test_bit(SK_BUSY, &svsk->sk_flags))
 			continue;
 		atomic_inc(&svsk->sk_inuse);
 		list_move(le, &to_be_aged);
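The difference between the three candidate tests in the sunrpc thread above is easier to see on a toy model. The following is only a userspace sketch, not kernel code: `struct sock_model` and all function names are hypothetical stand-ins, and the only thing taken from the thread is the rule that sk_inuse carries a bias of 1 while the socket is alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the svc_sock fields discussed in the thread. */
struct sock_model {
	int  inuse;	/* models sk_inuse: bias of 1 plus the number of users */
	bool busy;	/* models the SK_BUSY flag */
	bool old;	/* models SK_OLD already being set from the previous pass */
};

/* Test as it stood after the bias was introduced: skips every live socket. */
static bool skip_nonzero(const struct sock_model *s)
{
	return s->inuse != 0 || s->busy;
}

/* Test from the first proposed patch: never skips a live socket. */
static bool skip_le_zero(const struct sock_model *s)
{
	return s->inuse <= 0 || s->busy;
}

/* Test from the follow-up patch: skips only sockets with real users. */
static bool skip_gt_one(const struct sock_model *s)
{
	return s->inuse > 1 || s->busy;
}

/* True when an aging pass using the given skip test would close the socket. */
static bool would_age(const struct sock_model *s,
		      bool (*skip)(const struct sock_model *))
{
	assert(s != NULL && skip != NULL);
	return s->old && !skip(s);
}
```

With an idle socket (`inuse == 1`) and a socket that has one user (`inuse == 2`), only `skip_gt_one` closes the first and leaves the second alone; `skip_nonzero` closes neither and `skip_le_zero` closes both, which matches the remark that the `<= 0` variant effectively disables the test.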
Re: [PATCH] veth: Cleanly handle a missing peer_tb argument on creation.
Eric W. Biederman wrote:

I was getting strange kernel crashes when attempting to create veth devices when I did not specify a peer argument to /bin/ip. So this patch defaults peer_tb to all zeros and doesn't attempt to reuse the netlink attributes for the primary link to create the secondary link and now I can't reproduce the failures.

Given that reusing some of the most interesting netlink attributes, like a mac address or a network device name, is generally the wrong thing to do, this seems like the right approach.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 drivers/net/veth.c | 16 +++-
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9e6a746..d49bd2c 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -313,7 +313,7 @@ static int veth_newlink(struct net_device *dev,
 	struct net_device *peer;
 	struct veth_priv *priv;
 	char ifname[IFNAMSIZ];
-	struct nlattr *peer_tb[IFLA_MAX + 1], **tbp;
+	struct nlattr *peer_tb[IFLA_MAX + 1];
 
 	/*
 	 * create and register peer first
@@ -322,6 +322,7 @@ static int veth_newlink(struct net_device *dev,
 	 * skip it since no info from it is useful yet
 	 */
 
+	memset(peer_tb, 0, sizeof(peer_tb));
 	if (data != NULL && data[VETH_INFO_PEER] != NULL) {
 		struct nlattr *nla_peer;
 
@@ -336,21 +337,18 @@ static int veth_newlink(struct net_device *dev,
 		err = veth_validate(peer_tb, NULL);
 		if (err < 0)
 			return err;
+	}
 
-		tbp = peer_tb;
-	} else
-		tbp = tb;

The intention of this part was to get the same parameters for peer as for the first device if no peer argument was specified for ip utility. Does it still work?

-
-	if (tbp[IFLA_IFNAME])
-		nla_strlcpy(ifname, tbp[IFLA_IFNAME], IFNAMSIZ);
+	if (peer_tb[IFLA_IFNAME])
+		nla_strlcpy(ifname, peer_tb[IFLA_IFNAME], IFNAMSIZ);
 	else
 		snprintf(ifname, IFNAMSIZ, DRV_NAME "%%d");
 
-	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, tbp);
+	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, peer_tb);
 	if (IS_ERR(peer))
 		return PTR_ERR(peer);
 
-	if (tbp[IFLA_ADDRESS] == NULL)
+	if (peer_tb[IFLA_ADDRESS] == NULL)
 		random_ether_addr(peer->dev_addr);
 
 	err = register_netdevice(peer);
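The behaviour change asked about in the veth review above can be illustrated with a small userspace sketch. Nothing here is kernel API: the attribute table is modelled as an array of strings, and `ATTR_IFNAME`, `ATTR_ADDRESS`, `NAME_LEN`, `build_peer_attrs` and `pick_peer_name` are hypothetical names chosen for illustration. The sketch assumes the patched logic: the peer's table starts out all zeros and is filled only from peer-specific input, never from the first device's attributes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for IFLA_IFNAME, IFLA_ADDRESS and IFNAMSIZ. */
enum { ATTR_IFNAME, ATTR_ADDRESS, ATTR_MAX };
#define NAME_LEN 16

/*
 * Build the peer's attribute table. It is cleared first so that every
 * attribute the caller did not supply reads as "absent" instead of
 * whatever happened to be on the stack. (All-zero bytes are a null
 * pointer on the platforms the kernel supports.)
 */
static void build_peer_attrs(const char *peer_attrs[ATTR_MAX],
			     const char *peer_name)
{
	memset(peer_attrs, 0, ATTR_MAX * sizeof(peer_attrs[0]));
	if (peer_name)
		peer_attrs[ATTR_IFNAME] = peer_name;
}

/*
 * Pick the peer's name from its own table only. Without a peer name the
 * result is the literal pattern "veth%d", which a later allocation step
 * would expand to a free device name.
 */
static void pick_peer_name(const char *peer_attrs[ATTR_MAX],
			   char out[NAME_LEN])
{
	if (peer_attrs[ATTR_IFNAME])
		snprintf(out, NAME_LEN, "%s", peer_attrs[ATTR_IFNAME]);
	else
		snprintf(out, NAME_LEN, "veth%%d");
}
```

Under this model the answer to the review question is no: when no peer argument is given, the peer no longer inherits the first device's name or address and falls back to the default name pattern and, in the real patch, to a random MAC address.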
Re: net-2.6.24 build problem
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 16:08:33 +0200

ERROR: xfrm_audit_state_delete [net/key/af_key.ko] undefined!
ERROR: xfrm_audit_state_add [net/key/af_key.ko] undefined!
ERROR: xfrm_audit_policy_add [net/key/af_key.ko] undefined!
ERROR: xfrm_audit_policy_delete [net/key/af_key.ko] undefined

I just checked in the following fix for this:

From 2c2d4ef06a1bdb25b721372ab63adde1523e34ec Mon Sep 17 00:00:00 2001
From: David S. Miller [EMAIL PROTECTED](none)
Date: Wed, 12 Sep 2007 16:17:36 +0200
Subject: [PATCH] [XFRM]: Add missing auditing symbol exports.

Signed-off-by: David S. Miller [EMAIL PROTECTED]
---
 net/xfrm/xfrm_policy.c |2 ++
 net/xfrm/xfrm_state.c |2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index de0ff51..50682d3 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2341,6 +2341,7 @@ xfrm_audit_policy_add(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_policy_add);
 
 void
 xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
@@ -2357,6 +2358,7 @@ xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_policy_delete);
 #endif
 
 #ifdef CONFIG_XFRM_MIGRATE
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f64621c..15734ad 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1865,6 +1865,7 @@ xfrm_audit_state_add(struct xfrm_state *x, int result, u32 auid, u32 sid)
 		(unsigned long)x->id.spi, (unsigned long)x->id.spi);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_state_add);
 
 void
 xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 sid)
@@ -1883,4 +1884,5 @@ xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 sid)
 		(unsigned long)x->id.spi, (unsigned long)x->id.spi);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_state_delete);
 #endif /* CONFIG_AUDITSYSCALL */
--
1.5.2.4
[RFT] sky2: receive hang check
Would some of the users of 2.6.23-rc5 or later who still experience hangs please test this. IT IS EXPERIMENTAL AND NOT TESTED YET. I am sending it out to see if it detects anything.

--- a/drivers/net/sky2.c	2007-09-12 14:52:18.0 +0200
+++ b/drivers/net/sky2.c	2007-09-12 15:53:16.0 +0200
@@ -1304,6 +1304,7 @@ static int sky2_up(struct net_device *de
 	/* Register is number of 4K blocks on internal RAM buffer. */
 	ramsize = sky2_read8(hw, B2_E_0) * 4;
 	printk(KERN_INFO PFX "%s: ram buffer %dK\n", dev->name, ramsize);
+	memset(&sky2->check, 0, sizeof(sky2->check));
 
 	if (ramsize > 0) {
 		u32 rxspace;
@@ -2446,11 +2447,42 @@ static void sky2_le_error(struct sky2_hw
 	sky2_write32(hw, Q_ADDR(q, Q_CSR), BMU_CLR_IRQ_CHK);
 }
 
-/* Check for lost IRQ once a second */
+static void sky2_rx_check(struct net_device *dev)
+{
+	struct sky2_port *sky2 = netdev_priv(dev);
+	struct sky2_hw *hw = sky2->hw;
+	unsigned port = sky2->port;
+	unsigned rxq = rxqaddr[port];
+	u32 mac_rp = sky2_read32(hw, SK_REG(port, RX_GMF_RP));
+	u8 mac_lev = sky2_read8(hw, SK_REG(port, RX_GMF_RLEV));
+	u8 fifo_rp = sky2_read8(hw, Q_ADDR(rxq, Q_RP));
+	u8 fifo_lev = sky2_read8(hw, Q_ADDR(rxq, Q_RL));
+
+	/* If not idle and MAC or PCI is stuck */
+	if (sky2->check.last != dev->last_rx &&
+	    ((mac_rp == sky2->check.mac_rp &&
+	      mac_lev != 0 && mac_lev >= sky2->check.mac_lev) ||
+	     /* Check if the PCI RX hang */
+	     (fifo_rp == sky2->check.fifo_rp &&
+	      fifo_lev != 0 && fifo_lev >= sky2->check.fifo_lev))) {
+
+		pr_info(PFX "%s: receiver hang detected\n", dev->name);
+		schedule_work(&hw->restart_work);
+	}
+
+	sky2->check.last = dev->last_rx;
+	sky2->check.mac_rp = mac_rp;
+	sky2->check.mac_lev = mac_lev;
+	sky2->check.fifo_rp = fifo_rp;
+	sky2->check.fifo_lev = fifo_lev;
+}
+
 static void sky2_watchdog(unsigned long arg)
 {
 	struct sky2_hw *hw = (struct sky2_hw *) arg;
+	int i;
 
+	/* Check for lost IRQ */
 	if (sky2_read32(hw, B0_ISRC)) {
 		struct net_device *dev = hw->dev[0];
 
@@ -2458,6 +2490,13 @@ static void sky2_watchdog(unsigned long
 			__netif_rx_schedule(dev);
 	}
 
+	/* Check for stuck receiver */
+	if (sky2_read8(hw, B2_E_0) != 0)
+		for (i = 0; i < hw->ports; i++)
+			if (netif_running(hw->dev[i]))
+				sky2_rx_check(hw->dev[i]);
+
+
 	if (hw->active > 0)
 		mod_timer(&hw->watchdog_timer, round_jiffies(jiffies + HZ));
 }
--- a/drivers/net/sky2.h	2007-09-05 14:21:59.0 +0200
+++ b/drivers/net/sky2.h	2007-09-12 15:36:59.0 +0200
@@ -2017,6 +2017,14 @@ struct sky2_port {
 	u16 rx_tag;
 	struct vlan_group *vlgrp;
 #endif
+	struct {
+		unsigned long last;
+		u32 mac_rp;
+		u8 mac_lev;
+		u8 fifo_rp;
+		u8 fifo_lev;
+	} check;
+
 	dma_addr_t rx_le_map;
 	dma_addr_t tx_le_map;
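The receive hang check in the sky2 patch above compares two successive once-a-second samples of the read pointers and fill levels. A minimal userspace sketch of that comparison follows; it is not driver code. `struct rx_snapshot`, `queue_stuck` and `rx_hang_suspected` are hypothetical names, and the sketch assumes that a queue counts as stuck when its read pointer has not moved while its fill level is non-zero and has not dropped (a "greater than or equal" comparison on the level).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the 'check' snapshot the patch adds per port. */
struct rx_snapshot {
	unsigned long last_rx;	/* timestamp of the last received packet */
	uint32_t mac_rp;	/* MAC FIFO read pointer */
	uint8_t  mac_lev;	/* MAC FIFO fill level */
	uint8_t  fifo_rp;	/* PCI-side queue read pointer */
	uint8_t  fifo_lev;	/* PCI-side queue fill level */
};

/* A queue looks stuck if data is pending but nothing has been read. */
static bool queue_stuck(uint32_t rp, uint8_t lev,
			uint32_t prev_rp, uint8_t prev_lev)
{
	return rp == prev_rp && lev != 0 && lev >= prev_lev;
}

/*
 * Mirrors the condition as posted: the last_rx timestamp changed since
 * the previous sample ("not idle") and either the MAC FIFO or the
 * PCI-side queue looks stuck.
 */
static bool rx_hang_suspected(const struct rx_snapshot *prev,
			      const struct rx_snapshot *now)
{
	return prev->last_rx != now->last_rx &&
	       (queue_stuck(now->mac_rp, now->mac_lev,
			    prev->mac_rp, prev->mac_lev) ||
		queue_stuck(now->fifo_rp, now->fifo_lev,
			    prev->fifo_rp, prev->fifo_lev));
}
```

Keeping the per-queue predicate separate makes it easy to experiment with the thresholds, which seems in keeping with the experimental status of the patch.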
Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
On Wed, Sep 12, 2007 at 09:37:29AM -0400, bfields wrote: So the fact that this changes the behavior means that sk_inuse is taking on negative values. Uh, no, I misread the tests, sorry. I'm not awake.--b.
Re: New NAPI API: Need for netif_napi_remove() ?!
From: Kok, Auke [EMAIL PROTECTED] Date: Mon, 10 Sep 2007 17:27:33 -0700 hm, I spoke too soon, I think I can get by for now by just modifying adapter->napi.poll when needed, and this would be clean enough for now. This might change as I enable multiqueue in this driver later though. Ok, let me know if things change. The only reason it doesn't exist was the lack of any need.
Re: [PATCH][2/2] Add ICMPMsgStats MIB (RFC 4293)
From: David Stevens [EMAIL PROTECTED] Date: Tue, 11 Sep 2007 08:21:54 -0700 So maybe it's not so bad -- I'll roll another per-interface version to see. Let us know how it goes.