Re: request for information about the ath5k licensing
On Wed, Sep 05, 2007 at 01:00:15PM -0400, Luis R. Rodriguez wrote: On 9/5/07, Michael Buesch [EMAIL PROTECTED] wrote: On Wednesday 05 September 2007, Reyk Floeter wrote: I'm the author of the free hardware driver layer for wireless Atheros devices in OpenBSD, also known as OpenHAL. I'm still trying to get an idea about the facts and the latest state of the incidence that violated the copyright of my code, because I just returned from vacation. Could you please give me some feedback about the latest state? Please reply in private, I'm not subscribed to any of the Linux lists and I'm rather interested in facts than in the usual trolling. - Has this issue been fixed? It has never been applied to any repository. - No issue and no copyright violation. - Is there any repository available with the ath5k code using a modified/extended license? No. Well that is not accurate. Please give us a few we are working on verifying some information for you. I don't know how to find the relevant bits in the various Linux git repositories. Sorry, I don't get the structure of it. Are there any other sources online except this diff on the linux kernel mailing list? - Are there any plans to release the ath5k code using a modified/extended license? No. Same here. Apologies for this taking so long. It'll all be sorted out soon. I'm still waiting for an answer. Your process is taking too long. Reyk - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][MIPS][7/7] AR7: ethernet
On Thu, 6 Sep 2007, Andrew Morton wrote:
> On Thu, 6 Sep 2007 17:34:10 +0200 Matteo Croce [EMAIL PROTECTED] wrote:
> > Driver for the cpmac 100M ethernet driver. It works fine disabling napi support, enabling it gives a kernel panic when the first IPv6 packet has to be forwarded. Other than that works fine.
>
> The driver does a lot of open-coded dma_cache_inv() calls (in a way which assumes a 32-bit bus, too). I assume that dma_cache_inv() is some mips

No, even i386 has it ;-)

> thing. I'd have thought that it would be better to use the dma mapping API throughout the driver, and its associated dma invalidation APIs.

However, Ralf just posted a patch to remove it on all architectures, and driver writers should consider it gone.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds
new NAPI interface broken
Hi Stephen,

I saw that you developed most of the new NAPI interface. I already raised this issue a while ago. Please correct me if I got it wrong.

I think there is still a serious problem with the NAPI changes that make NAPI polling independent of struct net_device objects. It's about the question of who inserts and removes devices from the poll list.

netif_rx_schedule: sets the NAPI_STATE_SCHED flag, inserts the device in the poll list.
netif_rx_complete: clears NAPI_STATE_SCHED.
netif_rx_reschedule: sets NAPI_STATE_SCHED, inserts the device in the poll list.
net_rx_action:
- removes dev from poll list
- calls poll function
- adds dev to poll list if NAPI_STATE_SCHED still set

1) netif_rx_complete and netif_rx_reschedule don't work together.

2) On SMP systems: after netif_rx_complete has been called on CPU1 (+ interrupts enabled), netif_rx_schedule could be called on CPU2 (irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED. In that case the device would be added to the poll lists of both CPU1 and CPU2, as net_rx_action would see NAPI_STATE_SCHED set. This must not happen. It will be caught when netif_rx_complete is called the second time (BUG() called).

This would mean we have a problem on all SMP machines right now.

If I got all this right then we probably need a further flag to tell net_rx_action whether to poll again or to stop (with the possibility that the device has been scheduled on a different CPU in between).

The old NAPI interface uses the return value of poll to determine whether the device has to be polled again or not. We can either switch back or, in case we want to stick to the new return value, we might have to add something similar to the NAPI_STATE_SCHED flag or a new parameter...

Regards,
Jan-Bernd
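The interleaving described in point 2) can be replayed by hand in a small single-threaded toy model. This is only a sketch under stated assumptions: all toy_* names and the two-entry on_list array are hypothetical and are not the kernel's API, the schedule step is assumed to queue the device only when the flag was previously clear, and net_rx_action is split into two halves so that the other CPU's interrupt can be placed between them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a NAPI context; NOT the kernel's struct napi_struct. */
struct toy_napi {
	bool sched;      /* models NAPI_STATE_SCHED */
	int on_list[2];  /* how often the device sits on each CPU's poll list */
};

/* Models netif_rx_schedule: set the flag and queue on this CPU
 * (assumption: only if the flag was clear before). */
static void toy_schedule(struct toy_napi *n, int cpu)
{
	if (!n->sched) {
		n->sched = true;
		n->on_list[cpu]++;
	}
}

/* Models netif_rx_complete: only clears the flag. */
static void toy_complete(struct toy_napi *n)
{
	n->sched = false;
}

/* First half of net_rx_action: take the device off this CPU's list. */
static void toy_rx_action_begin(struct toy_napi *n, int cpu)
{
	assert(n->on_list[cpu] > 0);
	n->on_list[cpu]--;
}

/* Second half: after poll returns, re-add if the flag is set. */
static void toy_rx_action_end(struct toy_napi *n, int cpu)
{
	if (n->sched)
		n->on_list[cpu]++;
}

/* Replays the described interleaving; returns the total number of
 * poll-list entries for the one device (index 0 and 1 stand for the
 * two CPUs in the description). */
int toy_replay_race(void)
{
	struct toy_napi n = { false, { 0, 0 } };

	toy_schedule(&n, 0);        /* irq on the first CPU */
	toy_rx_action_begin(&n, 0); /* softirq on the first CPU dequeues */
	toy_complete(&n);           /* poll function finishes, flag cleared */
	toy_schedule(&n, 1);        /* irq on the second CPU hits the window */
	toy_rx_action_end(&n, 0);   /* first CPU sees the flag set, re-adds */

	return n.on_list[0] + n.on_list[1];
}
```

In this model toy_replay_race() ends with the device queued once on each CPU, which is the double insertion the message warns about. A real fix would have to be judged against the actual kernel code, not this sketch.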
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
On Fri, 2007-07-09 at 10:31 +0100, James Chapman wrote: Not really. I used 3-year-old, single CPU x86 boxes with e100 interfaces. The idle poll change keeps them in polled mode. Without idle poll, I get twice as many interrupts as packets, one for txdone and one for rx. NAPI is continuously scheduled in/out. Certainly faster than the machine in the paper (which was about 2 years old in 2005). I could never get ping -f to do that for me - so things must be getting worse with newer machines then. No. Since I did a flood ping from the machine under test, the improved latency meant that the ping response was handled more quickly, causing the next packet to be sent sooner. So more packets were transmitted in the allotted time (10 seconds). ok. With current NAPI: rtt min/avg/max/mdev = 0.902/1.843/101.727/4.659 ms, pipe 9, ipg/ewma 1.611/1.421 ms With idle poll changes: rtt min/avg/max/mdev = 0.898/1.117/28.371/0.689 ms, pipe 3, ipg/ewma 1.175/1.236 ms Not bad in terms of latency. The deviation certainly looks better. But the CPU has done more work. I am going to be the devil's advocate[1]: If the problem i am trying to solve is reduce cpu use at lower rate, then this is not the right answer because your cpu use has gone up. Your latency numbers have not improved that much (looking at the avg) and your throughput is not that much higher. Will i be willing to pay more cpu (of an already piggish cpu use by NAPI at that rate with 2 interupts per packet)? Another test: try a simple ping and compare the rtts. The problem I started thinking about was the one where NAPI thrashes in/out of polled mode at higher and higher rates as network interface speeds and CPU speeds increase. A flood ping demonstrates this even on 100M links on my boxes. things must be getting worse in the state of average hardware out there. It will be worthwile exercise to compare on an even faster machine and see what transpires there. 
> Networking boxes want consistent performance/latency for all traffic patterns and they need to avoid interrupt livelock. Current practice seems to be to use hardware interrupt mitigation or timers to limit interrupt rate but this just hurts latency, as you noted. So I'm trying to find a way to limit the NAPI interrupt rate without increasing latency. My comment about this approach being suitable for routers and networked servers is that these boxes care more about minimizing packet latency than they do about wasting CPU cycles by polling idle devices.

I think the argument of "who cares about a little more cpu" is valid for the case of routers. It is a double-edged sword, because it applies both to "who cares if NAPI uses a little more cpu at low rates" and to "who cares if James turns on polling and abuses a little more-more cpu". Since NAPI is the incumbent, the onus(sp?) is to do better. You must do better sir! Look at the timers, she said - that way you may be able to cut the cpu abuse.

cheers,
jamal

[1] historically the devils advocate was a farce really ;-
Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
On Fri, Sep 07, 2007 at 03:27:15PM +0200, Johannes Berg wrote: On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote: Looks good to me from an RCU viewpoint. I cannot claim familiarity with this code. I therefore especially like the indications of where RTNL is held and not!!! :) Some questions below based on a quick scan. And a global question: should the comments about RTNL being held be replaced by ASSERT_RTNL()? I don't like ASSERT_RTNL() much because it actually tries to lock it. I'd be much happer if it was WARN_ON(!mutex_locked(rtnl_mutex)) or something equivalent. Ah! It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT() or whatever. In any case, I have an updated patch I'll be sending soon, and it requires a new list walking primitive I'll also send. Look forward to seeing it! - write_lock_bh(local-sub_if_lock); + /* we're under RTNL so all this is fine */ if (unlikely(local-reg_state == IEEE80211_DEV_UNREGISTERED)) { - write_unlock_bh(local-sub_if_lock); __ieee80211_if_del(local, sdata); return -ENODEV; } - list_add(sdata-list, local-sub_if_list); + list_add_tail_rcu(sdata-list, local-interfaces); The _rcu is required because this list isn't protected by RTNL? Yes, not all walkers of the list are protected by the RTNL. K. @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi /* Remove all virtual interfaces that use this BSS * as their sdata-bss */ struct ieee80211_sub_if_data *tsdata, *n; - LIST_HEAD(tmp_list); - write_lock_bh(local-sub_if_lock); This code is also protected by RTNL? Yes. Comment? (Or is it in the function header?) ASSERT_RTNL(); I -like- this!!! ;-) :) Thanx, Paul - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Freeing alive inet6 address
From: Denis V. Lunev [EMAIL PROTECTED]

addrconf_dad_failure calls addrconf_dad_stop, which takes the referenced address and drops the count. So the in6_ifa_put performed at out: is extra. This results in the message "Freeing alive inet6 address" and in dst entries that are never released.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---
--- ./net/ipv6/ndisc.c.ipv6dad 2007-09-03 16:54:32.0 +0400
+++ ./net/ipv6/ndisc.c 2007-09-07 13:34:30.0 +0400
@@ -736,7 +736,7 @@ static void ndisc_recv_ns(struct sk_buff
 				 * so fail our DAD process
 				 */
 				addrconf_dad_failure(ifp);
-				goto out;
+				return;
 			} else {
 				/*
 				 * This is not a dad solicitation.
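The reference-count imbalance described in the changelog can be illustrated with a toy counter. This is a sketch only: the toy_* names are hypothetical, and it assumes one reference is owned by the address list and one is taken by the lookup at the start of the receive function. Neither detail is spelled out in the message itself.

```c
#include <assert.h>

/* Toy address object; NOT the kernel's struct inet6_ifaddr. */
struct toy_ifa {
	int refcnt;
};

static void toy_hold(struct toy_ifa *ifa) { ifa->refcnt++; }
static void toy_put(struct toy_ifa *ifa)  { ifa->refcnt--; }

/* Models addrconf_dad_failure -> addrconf_dad_stop, which (per the
 * changelog) drops the reference the caller was holding. */
static void toy_dad_failure(struct toy_ifa *ifa)
{
	toy_put(ifa);
}

/*
 * Models the DAD-failure branch of ndisc_recv_ns.
 * extra_put != 0 is the old "goto out" path, 0 is the patched "return".
 * Returns the reference count left afterwards.
 */
int toy_recv_ns(int extra_put)
{
	struct toy_ifa ifa = { 1 };  /* assumed: one reference owned by the list */

	toy_hold(&ifa);              /* assumed: lookup takes a reference */
	toy_dad_failure(&ifa);       /* the DAD failure path drops it again */
	if (extra_put)
		toy_put(&ifa);       /* the put at "out:" is one too many */

	return ifa.refcnt;
}
```

With the extra put the count reaches zero while the list still points at the object, which corresponds to the "Freeing alive inet6 address" message. With the early return the list's reference survives.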
Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote: Looks good to me from an RCU viewpoint. I cannot claim familiarity with this code. I therefore especially like the indications of where RTNL is held and not!!! :) Some questions below based on a quick scan. And a global question: should the comments about RTNL being held be replaced by ASSERT_RTNL()? I don't like ASSERT_RTNL() much because it actually tries to lock it. I'd be much happer if it was WARN_ON(!mutex_locked(rtnl_mutex)) or something equivalent. In any case, I have an updated patch I'll be sending soon, and it requires a new list walking primitive I'll also send. - write_lock_bh(local-sub_if_lock); + /* we're under RTNL so all this is fine */ if (unlikely(local-reg_state == IEEE80211_DEV_UNREGISTERED)) { - write_unlock_bh(local-sub_if_lock); __ieee80211_if_del(local, sdata); return -ENODEV; } - list_add(sdata-list, local-sub_if_list); + list_add_tail_rcu(sdata-list, local-interfaces); The _rcu is required because this list isn't protected by RTNL? Yes, not all walkers of the list are protected by the RTNL. @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi /* Remove all virtual interfaces that use this BSS * as their sdata-bss */ struct ieee80211_sub_if_data *tsdata, *n; - LIST_HEAD(tmp_list); - write_lock_bh(local-sub_if_lock); This code is also protected by RTNL? Yes. ASSERT_RTNL(); I -like- this!!! ;-) :) johannes signature.asc Description: This is a digitally signed message part
[RFC] mac80211: fix virtual interface locking
Florian Lohoff noticed a bug in mac80211: when bringing the master interface down while other virtual interfaces are up we call dev_close() under a spinlock which is not allowed. This patch removes the sub_if_lock used by mac80211 in favour of using an RCU list. All list manipulations are already done under rtnl so are well protected against each other, and the read-side locks we took in the RX and TX code are already in RCU read-side critical sections. Signed-off-by: Johannes Berg [EMAIL PROTECTED] Cc: Florian Lohoff [EMAIL PROTECTED] Cc: Herbert Xu [EMAIL PROTECTED] Cc: Michal Piotrowski [EMAIL PROTECTED] Cc: Satyam Sharma [EMAIL PROTECTED] --- If you want to test this you'll need to get the other pending patches, as John is at KS he isn't pushing to Dave who is at KS too anyhow. Grab from http://johannes.sipsolutions.net/patches/net-2.6.24/all/2007-09-06-13:43/ patches 002-011, they are slated to go into net-2.6.24 if timing works out. I'll backport this fix to -stable when we actually get around to verifying it. net/mac80211/ieee80211.c | 100 - net/mac80211/ieee80211_i.h |5 -- net/mac80211/ieee80211_iface.c | 31 +--- net/mac80211/ieee80211_sta.c | 12 ++-- net/mac80211/rx.c |9 +-- net/mac80211/tx.c | 10 ++-- 6 files changed, 84 insertions(+), 83 deletions(-) --- wireless-dev.orig/net/mac80211/ieee80211.c 2007-09-07 10:52:12.604441281 +0200 +++ wireless-dev/net/mac80211/ieee80211.c 2007-09-07 16:30:34.044429746 +0200 @@ -88,24 +88,31 @@ static struct dev_mc_list *ieee80211_get return NULL; } - /* start of iteration, both unassigned */ - if (!mcd-cur !mcd-sdata) { - mcd-sdata = list_entry(local-sub_if_list.next, - struct ieee80211_sub_if_data, list); - mcd-cur = mcd-sdata-dev-mc_list; - } + /* +* Prepare for iteration if not done already. +*/ + list_prepare_entry(mcd-sdata, local-interfaces, list); - if (mcd-cur) + if (mcd-cur) { + /* +* Iterate over the multicast addresses in +* the current device (mcd-sdata). 
+*/ mcd-cur = mcd-cur-next; + } - while (!mcd-cur) { - /* reached end of interface list? */ - if (mcd-sdata-list.next == local-sub_if_list) - break; - /* otherwise try next interface */ - mcd-sdata = list_entry(mcd-sdata-list.next, - struct ieee80211_sub_if_data, list); - mcd-cur = mcd-sdata-dev-mc_list; + if (!mcd-cur) { + /* +* Iterate over the devices until finding one (the +* first or the next) with multicast addresses. +*/ + list_for_each_entry_continue_rcu(mcd-sdata, +local-interfaces, +list) { + mcd-cur = mcd-sdata-dev-mc_list; + if (mcd-cur) + break; + } } return mcd-cur; @@ -145,9 +152,10 @@ static void ieee80211_configure_filter(s /* * We can iterate through the device list for the multicast -* address list so need to lock it. +* address list so need to be in a RCU read-side section, +* the RTNL isn't held in this function. */ - read_lock(local-sub_if_lock); + rcu_read_lock(); /* be a bit nasty */ new_flags |= (131); @@ -163,7 +171,7 @@ static void ieee80211_configure_filter(s WARN_ON(mcd.cur); local-filter_flags = new_flags ~(131); - read_unlock(local-sub_if_lock); + rcu_read_unlock(); netif_tx_unlock(local-mdev); } @@ -176,14 +184,13 @@ static int ieee80211_master_open(struct struct ieee80211_sub_if_data *sdata; int res = -EOPNOTSUPP; - read_lock(local-sub_if_lock); - list_for_each_entry(sdata, local-sub_if_list, list) { + /* we hold the RTNL here so can safely walk the list */ + list_for_each_entry(sdata, local-interfaces, list) { if (sdata-dev != dev netif_running(sdata-dev)) { res = 0; break; } } - read_unlock(local-sub_if_lock); return res; } @@ -192,11 +199,10 @@ static int ieee80211_master_stop(struct struct ieee80211_local *local = wdev_priv(dev-ieee80211_ptr); struct ieee80211_sub_if_data *sdata; - read_lock(local-sub_if_lock); - list_for_each_entry(sdata, local-sub_if_list, list) + /* we hold the RTNL here so can safely walk the list */ + list_for_each_entry(sdata, local-interfaces, list)
Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

> > > > @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
> > > >  	/* Remove all virtual interfaces that use this BSS
> > > >  	 * as their sdata->bss */
> > > >  	struct ieee80211_sub_if_data *tsdata, *n;
> > > > -	LIST_HEAD(tmp_list);
> > > > -	write_lock_bh(&local->sub_if_lock);
> > >
> > > This code is also protected by RTNL?
> >
> > Yes.
>
> Comment? (Or is it in the function header?)

Oh, forgot to say: yes, there is a comment further up and even an ASSERT_RTNL()

johannes
Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

> > I don't like ASSERT_RTNL() much because it actually tries to lock it. I'd be much happier if it was WARN_ON(!mutex_locked(rtnl_mutex)) or something equivalent.
>
> Ah! It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT() or whatever.

I don't know why it tries that anyway. Maybe it's from semaphore days where you couldn't check _is_locked()?

> > In any case, I have an updated patch I'll be sending soon, and it requires a new list walking primitive I'll also send.
>
> Look forward to seeing it!

Will send in a minute.

johannes
Re: [NFS] problems with lockd in 2.6.22.6
On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote: Hello, we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message lockd: too many open TCP sockets, consider increasing the number of nfsd threads lockd: last TCP connect from ^\\236^\É^D 1) These random characters in the second line are caused by a bug in svc_tcp_accept. I already posted this patch on netdev@vger.kernel.org: Thanks, I've applied that. (The bug is a little subtle: there's actually two previous __svc_print_addr() calls which might have initialized buf correctly, and it's not obvious that the second isn't always called (since it's in a dprintk, which is a macro that expands into a printk inside a conditional)). with this patch applied one gets something like lockd: too many open TCP sockets, consider increasing the number of nfsd threads lockd: last TCP connect from 10.11.0.12, port=784 2) The number of nfsd threads we are running on the machine is 1024. So this is not the problem. It seems, though, that in the case of lockd svc_tcp_accept does not check the number of nfsd threads but the number of lockd threads which is one. As soon as the number of open lockd sockets surpasses 80 this message gets logged. This usually happens every evening when a lot of people shutdown their workstation. So to be clear: there's not an actual problem here other than that the logs are getting spammed? (Not that that isn't a problem in itself.) 3) For unknown reason these sockets then remain open. In the morning when people start their workstation again we therefor not only get a lot of these messages again but often the nfs-server does not proberly work any more. Restarting the nfs-daemon is a workaround. Hm, thanks. --b. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
PATCH to bug #8876
Hi there!

Below is a fix for this: http://bugzilla.kernel.org/show_bug.cgi?id=8876

It applies to any version from 2.6.22 up to the latest, 2.6.23-rc5-git1. Please apply :)

-CUT-
diff -urN a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c 2007-07-09 02:32:17.0 +0300
+++ b/net/ipv4/devinet.c 2007-08-10 20:33:22.0 +0300
@@ -1193,7 +1193,7 @@
 		for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
 		     ifa = ifa->ifa_next, ip_idx++) {
 			if (ip_idx < s_ip_idx)
-				goto cont;
+				continue;
 			if (inet_fill_ifaddr(skb, ifa, NETLINK_CB(cb->skb).pid,
 					     cb->nlh->nlmsg_seq,
 					     RTM_NEWADDR, NLM_F_MULTI) <= 0)
-/CUT-

Signed-off-by: [EMAIL PROTECTED]

Thanks
Nikolay Kopitonenko
problems with lockd in 2.6.22.6
Hello,

we upgraded the kernel of an nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D

1) These random characters in the second line are caused by a bug in svc_tcp_accept. I already posted this patch on netdev@vger.kernel.org:

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]
--- linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.0 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c 2007-09-03 18:27:30.0 +0200
@@ -1090,7 +1090,7 @@
 				serv->sv_name);
 			printk(KERN_NOTICE
 			       "%s: last TCP connect from %s\n",
-			       serv->sv_name, buf);
+			       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 		}
 		/*
 		 * Always select the oldest socket. It's not fair,

With this patch applied one gets something like

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from 10.11.0.12, port=784

2) The number of nfsd threads we are running on the machine is 1024, so this is not the problem. It seems, though, that in the case of lockd svc_tcp_accept does not check the number of nfsd threads but the number of lockd threads, which is one. As soon as the number of open lockd sockets surpasses 80 this message gets logged. This usually happens every evening when a lot of people shut down their workstations.

3) For an unknown reason these sockets then remain open. In the morning, when people start their workstations again, we therefore not only get a lot of these messages again, but often the nfs-server does not work properly any more. Restarting the nfs-daemon is a workaround.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
David Acker wrote: On the systems that have cache incoherent DMA, including ARM, there is a race condition between software allocating a new receive buffer and hardware writing into a buffer. The two race on touching the last Receive Frame Descriptor (RFD). It has its el-bit set and its next link equal to 0. When hardware encounters this buffer it attempts to write data to it and then update Status Word bits and Actual Count in the RFD. At the same time software may try to clear the el-bit and set the link address to a new buffer. Since the entire RFD is once cache-line, the two write operations can collide. This can lead to the receive unit stalling or interpreting random memory as its receive area. The fix is to set the el-bit on and the size to 0 on the next to last buffer in the chain. When the hardware encounters this buffer it stops and does not write to it at all. The hardware issues an RNR interrupt with the receive unit in the No Resources state. Software can write to the tail of the list because it knows hardware will stop on the previous descriptor that was marked as the end of list. Once it has a new next to last buffer prepared, it can clear the el-bit and set the size on the previous one. The race on this buffer is safe since the link already points to a valid next buffer and the software can handle the race setting the size (assuming aligned 16 bit writes are atomic with respect to the DMA read). If the hardware sees the el-bit cleared without the size set, it will move on to the next buffer and skip this one. If it sees the size set but the el-bit still set, it will complete that buffer and then RNR interrupt and wait. Flags are kept in the software descriptor to note if the el bit is set and if the size was 0. When software clears the RFD's el bit and set its size, it also clears the el flag but leaves the size was 0 bit set. This way software can identify them when the race may have occurred when cleaning the ring. 
On these descriptors, it looks ahead and if the next one is complete then hardware must have skipped the current one. Logic is added to prevent two packets in a row being marked while the receiver is running to avoid running in lockstep with the hardware and thereby limiting the required lookahead. This is a patch for 2.6.23-rc4. Signed-off-by: David Acker [EMAIL PROTECTED] first impressions are not good: pings are erratic and shoot up to 3 seconds. In an overnight stress test, the receive unit went offline and never came back up (TX still working). it sounds like something in the logic is suspending the ru too much, but I haven't had time to look deeply into the code yet. Auke --- --- linux-2.6.23-rc4/drivers/net/e100.c.orig2007-08-30 13:32:10.0 -0400 +++ linux-2.6.23-rc4/drivers/net/e100.c 2007-08-30 15:42:07.0 -0400 @@ -106,6 +106,13 @@ * the RFD, the RFD must be dma_sync'ed to maintain a consistent * view from software and hardware. * + * In order to keep updates to the RFD link field from colliding with + * hardware writes to mark packets complete, we use the feature that + * hardware will not write to a size 0 descriptor and mark the previous + * packet as end-of-list (EL). After updating the link, we remove EL + * and only then restore the size such that hardware may use the + * previous-to-end RFD. + * * Under typical operation, the receive unit (RU) is start once, * and the controller happily fills RFDs as frames arrive. 
If * replacement RFDs cannot be allocated, or the RU goes non-active, @@ -281,14 +288,14 @@ struct csr { }; enum scb_status { + rus_no_res = 0x08, rus_ready= 0x10, rus_mask = 0x3C, }; enum ru_state { - RU_SUSPENDED = 0, - RU_RUNNING = 1, - RU_UNINITIALIZED = -1, + ru_stopped = 0, + ru_running = 1, }; enum scb_stat_ack { @@ -401,10 +408,16 @@ struct rfd { u16 size; }; +enum rx_flags { + rx_el = 0x01, + rx_s0 = 0x02, +}; + struct rx { struct rx *next, *prev; struct sk_buff *skb; dma_addr_t dma_addr; + u8 flags; }; #if defined(__BIG_ENDIAN_BITFIELD) @@ -952,7 +965,7 @@ static void e100_get_defaults(struct nic ((nic-mac = mac_82558_D101_A4) ? cb_cid : cb_i)); /* Template for a freshly allocated RFD */ - nic-blank_rfd.command = cpu_to_le16(cb_el); + nic-blank_rfd.command = 0; nic-blank_rfd.rbd = 0x; nic-blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN); @@ -1753,18 +1766,48 @@ static int e100_alloc_cbs(struct nic *ni return 0; } -static inline void e100_start_receiver(struct nic *nic, struct rx *rx) +static void e100_find_mark_el(struct nic *nic, struct rx *marked_rx, int is_running) { - if(!nic-rxs) return; - if(RU_SUSPENDED != nic-ru_running) return; + struct rx *rx = nic-rx_to_use-prev-prev; +
auto recycling of TIME_WAIT connections
As I see it, TIME_WAIT state is required for 2 reasons:

1) To handle wandering duplicate packets (so a reincarnation of a connection will not be corrupted by these packets).

2) To handle the last ack from the active closer (client) not being received by the remote. If that happened, the server, which is in LAST_ACK state, would retransmit its FIN (which may contain data also), so the client must be in TIME_WAIT state to handle that. If the client is not in TIME_WAIT state, then it could only indicate to the server that data was maybe lost (with an RST).

The first issue requires a large timeout, and the TIME_WAIT timeout is currently 60 seconds on linux. That timeout effectively limits the connection rate between local TCP clients and a server to 32k/60s, or around 500 connections/second. But that issue can't really happen when the client and server are on the same machine, can it? And even if it could, the timeouts involved would be shorter.

Now linux does have an (undocumented) /proc/sys/net/ipv4/tcp_tw_recycle flag to enable recycling of TIME_WAIT connections. This is global, however, and could cause problems in general for external connections.

So how about auto enabling recycling for local connections?

cheers,
Pádraig.
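The 32k/60s figure above is plain arithmetic, sketched below. The function name tw_limited_rate is hypothetical, and "32k" is taken to mean 32768 usable local ports, which is an assumption. The message does not say which port range it has in mind.

```c
#include <assert.h>

/*
 * Rough upper bound on new connections per second between one client
 * address and one server address/port, when every closed connection
 * keeps its local port parked in TIME_WAIT for tw_secs seconds.
 */
unsigned int tw_limited_rate(unsigned int usable_ports, unsigned int tw_secs)
{
	assert(tw_secs > 0);  /* a zero timeout would mean no limit at all */
	return usable_ports / tw_secs;
}
```

With the assumed values, tw_limited_rate(32768, 60) evaluates to 546 by integer division, in line with the "around 500 connections/second" estimate.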
Re: 2.6.23-rc4-mm1: e1000e napi lockup
From: Jiri Slaby [EMAIL PROTECTED]
Date: Fri, 07 Sep 2007 09:19:30 +0200

> I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
> napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.

Yes, the semantics changed slightly in the net-2.6.24 tree the other week and someone needs to fix it up.

The netif_napi_add() implicitly does a napi_disable() call. Device open must explicitly napi_enable() and device close must explicitly napi_disable(), and if done elsewhere these calls must be strictly balanced.
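The balancing rule can be pictured with a toy state machine. This is a sketch, not kernel code: the toy_* names are hypothetical, and it assumes that disabling an already disabled context never returns, which would be consistent with the reported boot freeze but is not stated in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the enable/disable pairing; NOT struct napi_struct. */
struct toy_napi {
	bool disabled;
};

/* Models netif_napi_add(): the context starts out disabled. */
static void toy_napi_add(struct toy_napi *n)
{
	n->disabled = true;
}

/* Models napi_enable(): only valid on a disabled context. */
static void toy_napi_enable(struct toy_napi *n)
{
	assert(n->disabled);
	n->disabled = false;
}

/*
 * Models napi_disable(). Returns false where the real call is assumed
 * to wait forever because the context is already disabled.
 */
static bool toy_napi_disable(struct toy_napi *n)
{
	if (n->disabled)
		return false;
	n->disabled = true;
	return true;
}

/* probe: add, then disable right away (the reported e1000e pattern). */
bool toy_probe_then_disable(void)
{
	struct toy_napi n;

	toy_napi_add(&n);
	return toy_napi_disable(&n);
}

/* probe: add only; open: enable; close: disable. */
bool toy_balanced_open_close(void)
{
	struct toy_napi n;

	toy_napi_add(&n);
	toy_napi_enable(&n);
	return toy_napi_disable(&n);
}
```

In this model the probe-time disable is the unbalanced call, while the add/enable/disable sequence spread over probe, open and close completes normally.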
Re: wrt Age Entry For IPv4 IPv6 Route Table
I'm travelling, otherwise I would have reviewed and integrated the changes or given feedback on them. I'll be back late next week.
2.6.23-rc4-mm1: e1000e napi lockup
Hi,

I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver. napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: [NFS] problems with lockd in 2.6.22.6
Am Freitag, 7. September 2007 18:19 schrieben Sie: On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote: Hello, we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message lockd: too many open TCP sockets, consider increasing the number of nfsd threads lockd: last TCP connect from ^\\236^\É^D 2) The number of nfsd threads we are running on the machine is 1024. So this is not the problem. It seems, though, that in the case of lockd svc_tcp_accept does not check the number of nfsd threads but the number of lockd threads which is one. As soon as the number of open lockd sockets surpasses 80 this message gets logged. This usually happens every evening when a lot of people shutdown their workstation. So to be clear: there's not an actual problem here other than that the logs are getting spammed? (Not that that isn't a problem in itself.) When more than 80 nfs clients try to lock files at the same time then it probably would. 3) For unknown reason these sockets then remain open. In the morning when people start their workstation again we therefor not only get a lot of these messages again but often the nfs-server does not properly work any more. Restarting the nfs-daemon is a workaround. Hm, thanks. I don't know if the lockd thing is the reason, though. 2.6.22.6 per se runs stable (no oops, no crash etc) but kernel nfs seems to be a little bit unstable. 2.6.17.11 run for months without any nfsd-related problems whereas in 2.6.22.6 nfs needs to be restarted almost every day. Sometimes this fails with lockd_down: lockd failed to exit, clearing pid nfsd: last server has exited nfsd: unexporting all filesystems lockd_up: makesock failed, error=-98 after which the server must be rebooted. I think there is something with lockd because there are no problems over the day. It is in the morning when a lot of people log into their machines and start their desktops (I think kde locks its config files when it reads them). 
Regards -- Wolfgang Walter Studentenwerk München Anstalt des öffentlichen Rechts
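The threshold of 80 reported in this message is what one would see if the connection limit were derived from the thread count of the service that owns the socket. The following is only a sketch under that assumption: the formula (threads + 3) * 20 and the names tcp_conn_limit and too_many_sockets are illustrative and are not quoted from svc_tcp_accept. With one lockd thread the model gives 80; with 1024 nfsd threads it would give 20540.

```c
#include <assert.h>

/*
 * Hypothetical model of a per-service TCP connection limit.
 * Assumption: limit = (nr_threads + 3) * 20. This reproduces the
 * threshold of 80 seen with a single lockd thread, but it is not
 * copied from the kernel source.
 */
static unsigned int tcp_conn_limit(unsigned int nr_threads)
{
	return (nr_threads + 3) * 20;
}

/* Returns 1 when the number of open sockets has surpassed the limit. */
static int too_many_sockets(unsigned int open_socks, unsigned int nr_threads)
{
	assert(nr_threads > 0);
	return open_socks > tcp_conn_limit(nr_threads);
}
```

If the model holds, the warning depends only on which service's thread count is consulted, which would explain why raising the number of nfsd threads has no effect on the lockd message.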
Re: [PATCH] bonding: update some distro-specific documentation
Andy Gospodarek [EMAIL PROTECTED] wrote: This all looks fine except for one nit (well, request for extra detail, really): @@ -802,15 +802,20 @@ BROADCAST=192.168.1.255 ONBOOT=yes BOOTPROTO=none USERCTL=no +BONDING_OPTS=mode=balance-alb miimon=100 Be sure to change the networking specific lines (IPADDR, NETMASK, NETWORK and BROADCAST) to match your network configuration. +You also need to set the BONDING_OPTS= line to specify the desired +options for your bond0 interface. Specifying bonding options in this +way is the preferred method for configuring bonding interfaces. Can you add something here that mentions that, for the arp_ip_target option, it has to be supplied as arp_ip_target=+10.0.0.1 and not just arp_ip_target=10.0.0.1? Also, multiple targets require multiple instances of the arp_ip_target option; it doesn't work to put multiple IP addresses as in the module option (i.e., arp_ip_target=10.0.0.1,10.0.0.2). This is necessary because ifup-eth isn't adding the + when it translates the option for use with sysfs or parsing the multiple IP address syntax. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
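The translation requested above (one arp_ip_target=+ADDR entry per address instead of the comma-separated module syntax) can be sketched as a small helper. The function name expand_arp_targets is hypothetical and is not part of ifup-eth or the bonding driver; it only illustrates the string form that the message says BONDING_OPTS needs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a module-style target list ("10.0.0.1,10.0.0.2") into the form
 * described above for BONDING_OPTS: one arp_ip_target=+ADDR entry per
 * address, separated by spaces. Returns 0 on success, -1 if the output
 * buffer is too small. The helper name is hypothetical.
 */
static int expand_arp_targets(const char *list, char *out, size_t outlen)
{
	size_t used = 0;
	const char *p = list;

	assert(list != NULL && out != NULL);
	if (outlen == 0)
		return -1;
	out[0] = '\0';

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of this address */

		if (len > 0) {
			int n = snprintf(out + used, outlen - used,
					 "%sarp_ip_target=+%.*s",
					 used ? " " : "", (int)len, p);
			if (n < 0 || (size_t)n >= outlen - used)
				return -1;
			used += (size_t)n;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

Called with "10.0.0.1,10.0.0.2", the helper fills the buffer with two separate arp_ip_target entries, each carrying the leading +.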
[PATCH 2/2][RESEND] ehea: fix last_rx update
Update last_rx in registered device struct instead of in the dummy device.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
---
 drivers/net/ehea/ehea_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 1e9fd6f..717b129 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -471,7 +471,7 @@ static struct ehea_cqe *ehea_proc_rwqes(struct net_device *dev,
 			else
 				netif_receive_skb(skb);
 
-			dev->last_rx = jiffies;
+			port->netdev->last_rx = jiffies;
 		} else {
 			pr->p_stats.poll_receive_errors++;
 			port_reset = ehea_treat_poll_error(pr, rq, cqe,
-- 
1.5.2
PROBLEM: Oops with forcedeth and netkey in 2.6.21 and 22
Hi there, I cannot get my AMD64 machine working with the forcedeth network chip and Netkey. After recompiling the kernel to get iptables and OpenSWAN working, I can no longer boot my computer; it freezes during network setup. After some reboots and recompiles, I have traced the problem to Netkey: if I enable it in the kernel, I get an oops. Booting in single-user mode, I could see that the errors come from the dhcp client process and the af_packet module. If I don't load af_packet at boot, I can set up an IP address manually. Unfortunately, when launching GNOME, my computer hangs (probably some process tries to load af_packet?). The NIC is the on-board NIC of the MSI Neo4 Platinum motherboard (product MS-7125). I have tried these kernel versions, with the same behavior:
- 2.6.21.6
- 2.6.22.1
- 2.6.22.6
Here is the oops I got (captured from dmesg):
skb_under_panic: text:c02b089c len:14 put:14 head:f74e8410 data:f74e8402 tail:f74e8400 end:f74e8580 dev:NULL
[ cut here ]
kernel BUG at net/core/skbuff.c:111!
invalid opcode: [#1] PREEMPT SMP
Modules linked in: af_packet usbhid sha256 sha1 hmac crypto_hash des crypto_algapi af_key xfrm_user ohci_hcd parport_pc ehci_hcd usbcore parport ohci1394 ieee1394 nvidia(P) floppy forcedeth sg
CPU:1 EIP:0060:[c029f0c9]Tainted: P VLI EFLAGS: 00010292 (2.6.21.6 #2)
EIP is at skb_under_panic+0x59/0x60
eax: 0072 ebx: f74e8410 ecx: f72a2000 edx: esi: edi: 0800 ebp: esp: f72a3d6c ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process dhcpcd (pid: 3723, ti=f72a2000 task=f76bd030 task.ti=f72a2000)
Stack: c0398bd0 c02b089c 000e 000e f74e8410 f74e8402 f74e8400 f74e8580 c037d386 f74e8402 f7fa4c80 c02b08a1 f79b4000 f72a3f40 f79b4000 c215a900 f7fa4c80 f8b95cca f72a3ecc 0148 c016c455 f7a92600 0008cae0
Call Trace:
[c02b089c] eth_header+0x10c/0x120
[c02b08a1] eth_header+0x111/0x120
[f8b95cca] packet_sendmsg+0x14a/0x260 [af_packet]
[c016c455] link_path_walk+0x65/0xc0
[c0299dbe] sock_sendmsg+0xce/0x100
[c0131180] autoremove_wake_function+0x0/0x40
[c016d3d4] path_lookup+0x14/0x20
[c02fdb10] unix_find_other+0x30/0x1a0
[c029a193] sys_sendto+0x133/0x180
[c029b1ce] sys_socketcall+0x14e/0x280
[c0102c7e] sysenter_past_esp+0x5f/0x85
===
Code: 00 00 89 5c 24 14 8b 98 90 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 60 89 4c 24 04 c7 04 24 d0 8b 39 c0 89 44 24 08 e8 57 f4 e7 ff 0f 0b eb fe 8d 76 00 56 53 bb 86 d3 37 c0 83 ec 24 8b 70 14 85
EIP: [c029f0c9] skb_under_panic+0x59/0x60 SS:ESP 0068:f72a3d6c
Here is my running environment:
Linux amd64-linux 2.6.21.6 #1 SMP PREEMPT Thu Sep 6 23:33:04 CEST 2007 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux Gnu C 4.2.0 Gnu make 3.81 binutils Binutils util-linux 2.12r mount 2.12r module-init-tools 3.2.2 e2fsprogs 1.40.2 reiserfsprogs 3.6.19 PPP2.4.4 Linux C Library libc.2.6 Dynamic linker (ldd) 2.6 Procps 3.2.7 Net-tools 1.60 Kbd1.13 Sh-utils 6.9 udev 114
And my on-board NIC:
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3) Subsystem: Micro-Star International Co., Ltd. Unknown device 7125 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- Latency: 0 (250ns min, 5000ns max) Interrupt: pin A routed to IRQ 16 Region 0: Memory at fe029000 (32-bit, non-prefetchable) [size=4K] Region 1: I/O ports at b400 [size=8] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1 +,D2+,D3hot+,D3cold+) Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
Thanks for your help. --Alexandre
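Reading the first line of the oops above: data (f74e8402) lies 14 bytes below head (f74e8410), and put is 14, so the buffer apparently had no headroom left when a 14-byte header was prepended. The check that trips in such a case can be sketched with a toy buffer descriptor. The struct toy_skb and the toy_ functions below are simplified stand-ins written for illustration; they are not the kernel's struct sk_buff or skb_push().

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer descriptor; a simplified stand-in, not the kernel's struct sk_buff. */
struct toy_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the packet data */
	size_t len;		/* bytes of packet data */
};

/* Bytes available in front of the packet data. */
static size_t toy_headroom(const struct toy_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}

/*
 * Prepend hdr_len bytes. Returns the new data pointer, or NULL when
 * there is not enough headroom (the case the kernel treats as a bug).
 */
static unsigned char *toy_push(struct toy_skb *skb, size_t hdr_len)
{
	assert(skb != NULL);
	if (hdr_len > toy_headroom(skb))
		return NULL;
	skb->data -= hdr_len;
	skb->len += hdr_len;
	return skb->data;
}
```

A descriptor whose data pointer equals head models the situation in the report: pushing 14 bytes fails, whereas the same push succeeds when at least 14 bytes were reserved in front of the data.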
Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
On Fri, 2007-09-07 at 18:01 +0200, Michael Buesch wrote: What's the problem with trying to lock it? I think I had a problem with it once when I inserted it into some code that was atomic and it all blew up badly ;) Nothing important really but it sort of made me not like it much. johannes
Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
Kok, Auke wrote: first impressions are not good: pings are erratic and shoot up to 3 seconds. In an overnight stress test, the receive unit went offline and never came back up (TX still working). it sounds like something in the logic is suspending the ru too much, but I haven't had time to look deeply into the code yet. I don't have an e100 enabled x86 box handy but I will look into getting one setup. I just applied this patch to my PXA255 based system http://www.compulab.co.il/x255/html/x255-cm-datasheet.htm . It is running 2.6.18.4 plus compulab patches plus some hostap patches plus the e100 patch. I get: pings going from the embedded system to a desktop machine. 100 packets transmitted, 100 received, 0% packet loss, time 98996ms rtt min/avg/max/mdev = 0.239/0.728/1.512/0.571 ms Pings going the from the desktop machine to the embedded system 100 packets transmitted, 100 received, 0% packet loss, time 99217ms rtt min/avg/max/mdev = 0.206/0.876/1.473/0.575 ms iperf tcp from embedded to desktop gets: [ 5] 0.0-100.0 sec 1007 MBytes 84.4 Mbits/sec iperf udp from the embedded to the desktop gets (embedded told to send at 100mbps): [ 5] Server Report: [ 5] 0.0-100.0 sec947 MBytes 79.4 Mbits/sec 0.068 ms 16/675645 (0.0024%) [ 5] 0.0-100.0 sec 1 datagrams received out-of-order iperf tcp from the desktop to the embedded gets: [ 6] 0.0-100.0 sec 1.01 GBytes 86.4 Mbits/sec iperf udp from the desktop to the embedded gets the following when the desktop sent at 100 mbps [ 5] 0.0-100.0 sec964 MBytes 80.8 Mbits/sec 0.359 ms 126467/813760 (16%) [ 5] 0.0-100.0 sec 1 datagrams received out-of-order Boot messages for my e100 are: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI e100: Copyright(c) 1999-2005 Intel Corporation PCI: enabling device :00:09.0 ( - 0003) PCI: Setting latency timer of device :00:09.0 to 64 e100: eth0: e100_probe: addr 0x10131000, irq 111, MAC addr 00:09:30:FF:F2:F6 cat /sys/bus/pci/drivers/e100/\:00\:09.0/{device,vendor,subsystem_device,subsystem_vendor} 
0x1209 0x8086 0x 0x It's on its own interrupt line: cm-debian:~# cat /proc/interrupts |grep eth0 111: 402428 - eth0 lspci shows: 00:09.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 09) Let me know if there is any other information I can provide you. I will look through the code to see what could be going on with your machine. I will also look into reproducing these results with a newer kernel. This may be tricky since compulab's patches are pretty stale and don't always apply easily. -Ack - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
David Acker wrote: Kok, Auke wrote: first impressions are not good: pings are erratic and shoot up to 3 seconds. In an overnight stress test, the receive unit went offline and never came back up (TX still working). it sounds like something in the logic is suspending the ru too much, but I haven't had time to look deeply into the code yet. I don't have an e100 enabled x86 box handy but I will look into getting one setup. I just applied this patch to my PXA255 based system http://www.compulab.co.il/x255/html/x255-cm-datasheet.htm . It is running 2.6.18.4 plus compulab patches plus some hostap patches plus the e100 patch. I get: pings going from the embedded system to a desktop machine. 100 packets transmitted, 100 received, 0% packet loss, time 98996ms rtt min/avg/max/mdev = 0.239/0.728/1.512/0.571 ms Pings going the from the desktop machine to the embedded system 100 packets transmitted, 100 received, 0% packet loss, time 99217ms rtt min/avg/max/mdev = 0.206/0.876/1.473/0.575 ms ok, I just got a note from our lab saying that that particular system has the freak ping times even without your patch applied 8) ignoring the ping issue, we still have the ru offline, but that could have possibly been caused by whatever is causing this ping issue... More testing is needed, and I'll try to find a system without the ping issue here first. Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.
Anil Veerabhadrappa wrote: + +/* iSCSI stages */ +#define ISCSI_STAGE_SECURITY_NEGOTIATION (0) +#define ISCSI_STAGE_LOGIN_OPERATIONAL_NEGOTIATION (1) +#define ISCSI_STAGE_FULL_FEATURE_PHASE (3) +/* Logout response codes */ +#define ISCSI_LOGOUT_RESPONSE_CONNECTION_CLOSED (0) +#define ISCSI_LOGOUT_RESPONSE_CID_NOT_FOUND (1) +#define ISCSI_LOGOUT_RESPONSE_CLEANUP_FAILED (3) + +/* iSCSI task types */ +#define ISCSI_TASK_TYPE_READ(0) +#define ISCSI_TASK_TYPE_WRITE (1) +#define ISCSI_TASK_TYPE_MPATH (2) All of these iscsi code shoulds be in iscsi_proto.h or should be added there. This is a very tricky proposal as this header file is automatically generated by a well defined process and is shared between various driver supporting multiple platform/OS and the firmware. If it is not of a big issue I would like to keep it the way it is. The values that are iscsi RFC values should come from the iscsi_proto.h file and not be duplicated for each driver. +/* + * hardware reset + */ +int bnx2i_reset(struct scsi_cmnd *sc) +{ + return 0; +} So what is up with this one? It seems like if there is a way to reset hardware then you would want it as the scsi eh host reset callout instead of dropping the session. We could add some transport level recovery callouts for the iscsi specifics. We may not be able to support HBA cold reset as bnx2 driver is the primary owner of chip reset and initialization. This is the drawback of sharing network interface with the NIC driver. If there is a need for administrator to reset the iSCSI port same can be achieved by running 'ifdown eth#' and 'ifup eth#'. Current driver even allows ethernet interface reset when there are active iSCSI connection, all active iscsi sessions will be reinstated when the network link comes back live If you cannot support it or it does not make sense just remove the stub then. 
I say it is not a big deal now, but hopefully we do not hit fun like with qla3xxx and qla4xxx :) + +void bnx2i_sysfs_cleanup(void) +{ + class_device_unregister(port_class_dev); + class_unregister(bnx2i_class); +} The sysfs bits related to the hba should be use one of the scsi sysfs facilities or if they are related to iscsi bits and are generic then through the iscsi hba bnx2i needs 2 sysfs entries - 1. QP size info - this is used to size per connection shared data structures to issue work requests to chip (login, scsi cmd, tmf, nopin) and get completions from the chip (scsi completions, async messages, etc'). This is a iSCSI HBA attribute 2. port mapper - we can be more flexible on classifying this as either iSCSI HBA attribute or bnx2i driver global attribute Can hooks be added to iSCSI transport class to include these? Which ones were they exactly? I think JamesB wanted only common transport values in the transport class. If it is driver specific then it should go on the host or target or device with the scsi_host_template attrs. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
In gmane.linux.network, you wrote: But the CPU has done more work. The flood ping will always show increased CPU with these changes because the driver always stays in the NAPI poll list. For typical LAN traffic, the average CPU usage doesn't increase as much, though more measurements would be useful. I'd be particularly interested to see what happens to your latency when other apps are hogging the cpu. I assume from your description that your cpu is mostly free to schedule the niced softirqd for the device polling duration, but this won't always be the case. If other tasks are running at high priority, it could be nearly a full jiffy before softirqd gets to check the poll list again and the latency introduced could be much higher than you've yet measured. Jason - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
error(s) in 2.6.23-rc5 bonding.txt ?
I was perusing Documentation/networking/bonding.txt in a 2.6.23-rc5 tree and came across the following discussing the round-robin scheduling: Note that this out of order delivery occurs when both the sending and receiving systems are utilizing a multiple interface bond. Consider a configuration in which a balance-rr bond feeds into a single higher capacity network channel (e.g., multiple 100Mb/sec ethernets feeding a single gigabit ethernet via an etherchannel capable switch). In this configuration, traffic sent from the multiple 100Mb devices to a destination connected to the gigabit device will not see packets out of order. My first reaction was that this was incorrect - it didn't matter if the receiver was using a single link or not because the packets flowing across the multiple 100Mb links could hit the intermediate device out of order and so stay that way across the GbE link. Before I go and patch-out that text I thought I'd double check. rick jones - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ixgbe: driver for Intel(R) 82598 PCI-Express 10GbE adapters (v4)
David Miller wrote: From: Kok, Auke [EMAIL PROTECTED] Date: Thu, 06 Sep 2007 11:31:47 -0700 Also available through git:// and http:// here: http://foo-projects.org/~sofar/ixgbe-20070905-submission.patch http://foo-projects.org/~sofar/ixgbe-20070905-submission.patch.bz2 (git-am formatted!) git://lost.foo-projects.org/~ahkok/linux-2.6 ixgbe-20070905-submission To be honest I have absolutely no problems with this driver and we should just cut the crap and merge it in now. Any objections anyone makes at this point is frankly nit picking crap which we can cure with followon cleanups and corrections. Are you responding to a strawman or something? AFAICS nobody objected to it, and Auke cleaned it up a la e1000e, which got queued during KS. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] DOC: Update networking/multiqueue.txt with correct information.
Updated the multiqueue.txt document to call out the correct kernel options to select to enable multiqueue.

Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED]
---
 Documentation/networking/multiqueue.txt |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
index 00b60cc..ea5a42e 100644
--- a/Documentation/networking/multiqueue.txt
+++ b/Documentation/networking/multiqueue.txt
@@ -58,9 +58,13 @@ software, so it's a straight round-robin qdisc. It uses the same syntax and
 classification priomap that sch_prio uses, so it should be intuitive to
 configure for people who've used sch_prio.
 
-The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
-built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
-bands requested is equal to the number of queues on the hardware. If they
+In order to utilitize the multiqueue features of the qdiscs, the network
+device layer needs to enable multiple queue support. This can be done by
+selecting NETDEVICES_MULTIQUEUE under Drivers.
+
+The PRIO qdisc naturally plugs into a multiqueue device. If
+NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of
+bands requested is compared to the number of queues on the hardware. If they
 are equal, it sets a one-to-one mapping up between the queues and bands. If
 they're not equal, it will not load the qdisc. This is the same behavior
 for RR. Once the association is made, any skb that is classified will have
Re: [PATCH][MIPS][7/7] AR7: ethernet
Matteo Croce wrote: On Friday 07 September 2007 00:30:25, Andrew Morton wrote: On Thu, 6 Sep 2007 17:34:10 +0200 Matteo Croce [EMAIL PROTECTED] wrote: Driver for the cpmac 100M ethernet driver. It works fine disabling napi support, enabling it gives a kernel panic when the first IPv6 packet has to be forwarded. Other than that works fine. I'm not too sure why I got cc'ed on this (and not on patches 1-6?) but whatever. I mailed every maintainer in the respective section in the file MAINTAINERS and you were in the NETWORK DEVICE DRIVERS section This patch introduces quite a number of basic coding-style mistakes. Please run it through scripts/checkpatch.pl and review the output. Already done. I'm collecting other suggestions before committing cool, I'll wait for the resend before reviewing, then. As an author I understand that fixing up coding style / cosmetic stuff rather than meat is annoying. But it is important to emphasize that a clean driver is what makes a good, thorough, effective review possible. Jeff
Re: [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
Micah Gruber wrote: This patch fixes a potential null dereference bug where we dereference dev before a null check. This patch simply moves the dereferencing after the null check.

Signed-off-by: Micah Gruber [EMAIL PROTECTED]
---
--- a/drivers/net/tulip/uli526x.c
+++ b/drivers/net/tulip/uli526x.c
@@ -663,7 +663,7 @@
 {
 	struct net_device *dev = dev_id;
 	struct uli526x_board_info *db = netdev_priv(dev);
-	unsigned long ioaddr = dev->base_addr;
+	unsigned long ioaddr;
 	unsigned long flags;
 
 	if (!dev) {
@@ -671,6 +671,8 @@
 		return IRQ_NONE;
 	}
 
+	ioaddr = dev->base_addr;
+

as satyam noted, just remove the !dev test
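The ordering the patch aims for (validate the pointer first, read through it afterwards) can be shown in isolation. The types and names below (struct toy_dev, toy_interrupt, the TOY_IRQ_ values) are hypothetical stand-ins for illustration, not the uli526x driver's code. Note that the reply above suggests dropping the !dev test entirely, in which case the original initializer would be fine as it stood.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real kernel structs. */
struct toy_dev {
	unsigned long base_addr;
};

enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

/* Read the field only after the pointer has been validated. */
static enum toy_irqreturn toy_interrupt(void *dev_id, unsigned long *ioaddr_out)
{
	struct toy_dev *dev = dev_id;
	unsigned long ioaddr;

	assert(ioaddr_out != NULL);
	if (!dev)
		return TOY_IRQ_NONE;

	ioaddr = dev->base_addr;	/* safe: dev is known to be non-NULL here */
	*ioaddr_out = ioaddr;
	return TOY_IRQ_HANDLED;
}
```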
Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
David Acker wrote: Let me know if there is any other information I can provide you. I will look through the code to see what could be going on with your machine. I will also look into reproducing these results with a newer kernel. This may be tricky since compulab's patches are pretty stale and don't always apply easily. pktgen outputs for the various cases modified/unmodified[/others?] would be nice, if you have a spot of time. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: error(s) in 2.6.23-rc5 bonding.txt ?
Rick Jones [EMAIL PROTECTED] wrote: [...] Note that this out of order delivery occurs when both the sending and receiving systems are utilizing a multiple interface bond. Consider a configuration in which a balance-rr bond feeds into a single higher capacity network channel (e.g., multiple 100Mb/sec ethernets feeding a single gigabit ethernet via an etherchannel capable switch). In this configuration, traffic sent from the multiple 100Mb devices to a destination connected to the gigabit device will not see packets out of order. My first reaction was that this was incorrect - it didn't matter if the receiver was using a single link or not because the packets flowing across the multiple 100Mb links could hit the intermediate device out of order and so stay that way across the GbE link.

Usually it does matter, at least at the time I tested this. Usually, the even striping of traffic from the balance-rr mode will deliver in-order to a single higher speed link (e.g., N 100Mb feeding a single 1Gb). I say usually because, although I don't see it happen with the equipment I have, I'm willing to believe that there are gizmos that would bundle packets arriving on the switch ports.

The reordering (usually) occurs when packet coalescing stuff (either interrupt mitigation on the device, or NAPI) happens at the receiver end, after the packets are striped evenly into the interfaces, e.g.,

eth0    eth1    eth2
P1      P2      P3
P4      P5      P6
P7      P8      P9

and then eth0 goes and grabs a bunch of its packets, then eth1, and eth2 do the same afterwards, so the received order ends up something like P1, P4, P7, P2, P5, P8, P3, P6, P9. In Ye Olde Dayes Of Yore, with one packet per interrupt at 10 Mb/sec, this type of configuration wouldn't reorder (or at least not as badly).

The text probably is lacking in some detail, though. The real key is that the last sender before getting to the destination system has to do the round-robin striping.
Most switches that I'm familiar with (again, never seen one, but willing to believe there is one) don't have round-robin as a load balance option for etherchannel, and thus won't evenly stripe traffic, but instead do some math on the packets so that a given connection isn't split across ports. That said, it's certainly plausible that, for a given set of N ethernets all enslaved to a single bonding balance-rr, the individual ethernets could get out of sync, as it were (e.g., one running a fuller tx ring, and thus running behind the others). If bonding is the only feeder of the devices, then for a continuous flow of traffic, all the slaves will generally receive packets (from the kernel, for transmission) at pretty much the same rate, and so they won't tend to get ahead or behind. I haven't investigated into this deeply for a few years, but this is my recollection of what happened with the tests I did then. I did testing with multiple 100Mb devices feeding either other sets of 100Mb devices or single gigabit devices. I'm willing to believe that things have changed, and an N feeding into one configuration can reorder, but I haven't seen it (or really looked for it; balance-rr isn't much the rage these days). -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
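The receive-side reordering described in this message (round-robin striping across the slaves, followed by each interface being drained in a batch) can be reproduced with a toy model. This is a sketch under the simplifying assumption that the receiver empties one interface completely before touching the next; the function names are made up for illustration and nothing here is bonding driver code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: packets 1..n are striped round-robin over nslaves
 * interfaces, and the receiver then drains each interface completely
 * before moving to the next one. Writes the resulting arrival order
 * to out[] (which must hold n entries).
 */
static void rr_batched_order(int n, int nslaves, int *out)
{
	int k = 0;

	assert(n >= 0 && nslaves > 0 && out != NULL);
	for (int slave = 0; slave < nslaves; slave++)
		for (int pkt = slave + 1; pkt <= n; pkt += nslaves)
			out[k++] = pkt;
}

/* Count adjacent pairs that arrive with a lower number after a higher one. */
static int count_reorders(const int *order, int n)
{
	int reorders = 0;

	for (int i = 1; i < n; i++)
		if (order[i] < order[i - 1])
			reorders++;
	return reorders;
}
```

For nine packets over three slaves the model yields the order given in the message (P1, P4, P7, P2, P5, P8, P3, P6, P9); with a single receiving interface the order is unchanged, which matches the claim that a single higher-speed link usually sees packets in order.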
Re: 2.6.23-rc4-mm1: e1000e napi lockup
Kok, Auke wrote: David Miller wrote: From: Jiri Slaby [EMAIL PROTECTED] Date: Fri, 07 Sep 2007 09:19:30 +0200 I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver. napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot. Yes, the semantics changed slightly in the net-2.6.24 tree the other week and someone needs to fix it up. The netif_napi_add() implicitly does a napi_disable() call. Device open must explicitly napi_enable() and device close must explicitly napi_disable(), and if done elsewhere these calls must be strictly balanced. I'll fix it... it's my patch that adds the new napi code to it and I need to get it ready for the merge window anyway. well since it's close to the merge window opening, we could see what happens if DaveM pulls branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git That should make this class of pre-merge-window annoyance go away. Jeff
Re: request for information about the ath5k licensing
Reyk Floeter wrote: I'm still waiting for an answer. Your process is taking too long. Speaking as a person through which these changes flow upstream into the official kernel (ath5k maintainers - linville - me - linus)... The most important thing for today is that no ath5k stuff has been committed (nor has it ever been). I would rather take it slow and make sure everybody is happy. There is nothing upstream, and so, there is no need to rush and correct something. Collectively, this is just growing pains. Everyone is breaking new ground, trying to figure out how to best support atheros stuff on Linux. There are new tools to deal with (svn? git? flavor of the day?:)), new licenses with new ramifications to consider, a new wireless stack to deal with. What you are witnessing is but a small part of the chaos as everyone tackles these chores simultaneously. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: error(s) in 2.6.23-rc5 bonding.txt ?
That said, it's certainly plausible that, for a given set of N ethernets all enslaved to a single bonding balance-rr, the individual ethernets could get out of sync, as it were (e.g., one running a fuller tx ring, and thus running behind the others). That is the scenario of which I was thinking. If bonding is the only feeder of the devices, then for a continuous flow of traffic, all the slaves will generally receive packets (from the kernel, for transmission) at pretty much the same rate, and so they won't tend to get ahead or behind. I could see that if there was just one TCP connection going doing bulk or something, but if there were a bulk transmitter coupled with an occasional request/response (ie netperf TCP_STREAM and a TCP_RR) i'd think the tx rings would no longer remain balanced. I haven't investigated into this deeply for a few years, but this is my recollection of what happened with the tests I did then. I did testing with multiple 100Mb devices feeding either other sets of 100Mb devices or single gigabit devices. I'm willing to believe that things have changed, and an N feeding into one configuration can reorder, but I haven't seen it (or really looked for it; balance-rr isn't much the rage these days). Are you OK with that block of text simply being yanked? rick - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] atl1: add CONFIG_ATL1_EXPERIMENTAL to kconfig
From: Chris Snook [EMAIL PROTECTED]

Introduce Kconfig ATL1_EXPERIMENTAL to separate mature code from less mature code in the atl1 driver, and remove EXPERIMENTAL designation for ATL1.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Acked-by: Jay Cliburn [EMAIL PROTECTED]

--- a/drivers/net/Kconfig 2007-09-04 10:12:38.0 -0400
+++ b/drivers/net/Kconfig 2007-09-04 10:37:34.0 -0400
@@ -2329,8 +2329,8 @@ config QLA3XXX
 	  will be called qla3xxx.
 
 config ATL1
-	tristate "Attansic L1 Gigabit Ethernet support (EXPERIMENTAL)"
-	depends on PCI && EXPERIMENTAL
+	tristate "Attansic L1 Gigabit Ethernet support"
+	depends on PCI
 	select CRC32
 	select MII
 	help
@@ -2339,6 +2339,16 @@ config ATL1
 	  To compile this driver as a module, choose M here. The module
 	  will be called atl1.
 
+config ATL1_EXPERIMENTAL
+	bool "atl1 experimental features"
+	depends on ATL1 && EXPERIMENTAL
+	help
+	  This option enables various features that have not yet reached
+	  the maturity of the rest of the atl1 driver. The driver will
+	  still work fine without this option enabled.
+
+	  If unsure, say N.
+
 endif # NETDEV_1000
 
 #
Re: BUG: unable to handle kernel NULL pointer dereference1
Hi Mark, [Adding netdev to CC] On 07/09/2007, Mark Nipper [EMAIL PROTECTED] wrote: I've received two oopses now from my kernel while running the 2.6.22 series. The first was with 2.6.22.1 back in July and the second which happened just within the last day is 2.6.22.5. They both appear to be the same bug and I don't think it's hardware related. I'm attaching the entries from logcheck which I received when they happened. I'm not subscribed to the mailing list, so please make sure to copy me directly on any replies. And let me know if anyone needs any additional information to try to track this down. Thanks for reading... Regards, Michal -- LOG http://www.stardust.webpages.pl/log/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc4-mm1: e1000e napi lockup
Kok, Auke wrote: Jeff Garzik wrote: Kok, Auke wrote: David Miller wrote: From: Jiri Slaby [EMAIL PROTECTED] Date: Fri, 07 Sep 2007 09:19:30 +0200 I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver. napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot. Yes, the semantics changed slightly in the net-2.6.24 tree the other week and someone needs to fix it up. The netif_napi_add() implicitly does a napi_disable() call. Device open must explicitly napi_enable() and device close must explicitly napi_disable(), and if done elsewhere these calls must be strictly balanced. I'll fix it... it's my patch that adds the new napi code to it and I need to get it ready for the merge window anyway. well since it's close to the merge window opening, we could see what happens if DaveM pulls branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git That should make this class of pre-merge-window annoyance go away. If I do that now I get a big merge conflict: oh you are _guaranteed_ conflicts. most of that is NAPI-area code that got changed by both.
[PATCH 2/2] atl1: wrap problematic optimizations in CONFIG_ATL1_EXPERIMENTAL
From: Chris Snook [EMAIL PROTECTED]

Make certain problematic optimizations build-time configurable.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Acked-by: Jay Cliburn [EMAIL PROTECTED]

--- a/drivers/net/atl1/atl1_main.c 2007-09-04 10:12:38.0 -0400
+++ b/drivers/net/atl1/atl1_main.c 2007-09-04 11:23:26.0 -0400
@@ -2203,22 +2203,26 @@ static int __devinit atl1_probe(struct p
 	struct net_device *netdev;
 	struct atl1_adapter *adapter;
 	static int cards_found = 0;
-	bool pci_using_64 = true;
+	bool pci_using_64 = false;
 	int err;

 	err = pci_enable_device(pdev);
 	if (err)
 		return err;

+#ifdef CONFIG_ATL1_EXPERIMENTAL
 	err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
+	if (!err) {
+		pci_using_64 = true;
+		goto dma_ok;
+	}
+#endif /* CONFIG_ATL1_EXPERIMENTAL */
+	err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
 	if (err) {
-		err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
-		if (err) {
-			dev_err(&pdev->dev, "no usable DMA configuration\n");
-			goto err_dma;
-		}
-		pci_using_64 = false;
+		dev_err(&pdev->dev, "no usable DMA configuration\n");
+		goto err_dma;
 	}
+dma_ok:
 	/* Mark all PCI regions associated with PCI device
 	 * pdev as being reserved by owner atl1_driver_name
 	 */
@@ -2294,11 +2298,13 @@ static int __devinit atl1_probe(struct p
 	netdev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);

 	/*
-	 * FIXME - Until tso performance gets fixed, disable the feature.
+	 * TSO currently has performance problems,
+	 * so let's disable it by default.
 	 * Enable it with ethtool -K if desired.
 	 */
-	/* netdev->features |= NETIF_F_TSO; */
-
+#ifdef CONFIG_ATL1_EXPERIMENTAL
+	netdev->features |= NETIF_F_TSO;
+#endif
 	if (pci_using_64)
 		netdev->features |= NETIF_F_HIGHDMA;
Re: 2.6.23-rc4-mm1: e1000e napi lockup
Jeff Garzik wrote: Kok, Auke wrote: Jeff Garzik wrote: Kok, Auke wrote: David Miller wrote: From: Jiri Slaby [EMAIL PROTECTED] Date: Fri, 07 Sep 2007 09:19:30 +0200

I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in the e1000e driver. napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.

Yes, the semantics changed slightly in the net-2.6.24 tree the other week and someone needs to fix it up. The netif_napi_add() implicitly does a napi_disable() call. Device open must explicitly napi_enable() and device close must explicitly napi_disable(), and if done elsewhere these calls must be strictly balanced.

I'll fix it... it's my patch that adds the new napi code to it and I need to get it ready for the merge window anyway.

well since it's close to the merge window opening, we could see what happens if DaveM pulls branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git That should make this class of pre-merge-window annoyance go away.

If I do that now I get a big merge conflict:

oh you are _guaranteed_ conflicts. most of that is NAPI-area code that got changed by both.

actually that's the only thing it was, and fixing it up was trivial (took me about 3 minutes). it was 3x the napi code and once a struct indent change... I'll have a new e1000e napi patch for andrew in a sec.

Auke
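The balancing rule David describes can be modeled in plain userspace C. This is a toy sketch, not kernel code: `toy_napi`, `toy_napi_add()`, `toy_napi_enable()` and `toy_napi_disable()` are hypothetical names, and one boolean stands in for the real scheduling state. It assumes, as stated in the thread, that adding a NAPI context leaves it disabled. Where the real napi_disable() would wait for the context to become idle, the toy version returns an error so the unbalanced call is visible.

```c
#include <stdbool.h>

/* Toy stand-in for a NAPI context; NOT the kernel's struct napi_struct. */
struct toy_napi {
	bool disabled;	/* true while polling is disabled */
};

/* Assumption taken from the thread: a freshly added context starts disabled. */
static void toy_napi_add(struct toy_napi *n)
{
	n->disabled = true;
}

/* Returns 0 on success, -1 if the context is already enabled (unbalanced). */
static int toy_napi_enable(struct toy_napi *n)
{
	if (!n->disabled)
		return -1;
	n->disabled = false;
	return 0;
}

/*
 * Returns 0 on success, -1 if the context is already disabled.
 * The -1 case models a second disable with no enable in between,
 * which is the situation reported for the probe path.
 */
static int toy_napi_disable(struct toy_napi *n)
{
	if (n->disabled)
		return -1;
	n->disabled = true;
	return 0;
}
```

With this model, a disable issued right after the add is rejected, while the open/close pairing (enable, then disable) succeeds.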
Re: [PATCH 0/2] atl1: Introduce CONFIG_ATL1_EXPERIMENTAL
Chris Snook wrote:

The atl1 driver is currently marked EXPERIMENTAL, because a few supposedly performance-enhancing features still have problems. When these features are disabled, the driver is completely stable, fully functional, and performs well.

Patch 1/2
Creates the kconfig option CONFIG_ATL1_EXPERIMENTAL, and removes the EXPERIMENTAL designation from CONFIG_ATL1

Patch 2/2
Wraps some currently-disabled features in #ifdef CONFIG_ATL1_EXPERIMENTAL, so developers and testers can play with these features more easily, and distributions will still get a fast, stable driver with existing .config files.

We'll also be using this to wrap around various new features we'll be experimenting with in coming months. Instead of using a half dozen different kconfig options for each of them, like some drivers do, we'll just use this, and make sure things are safe for everyone before we take them out of the experimental wrapper.

Well, I haven't received patch #2 yet, but in general a runtime switch (module option?) is greatly preferred.

Jeff
Re: [PATCH 0/2] atl1: Introduce CONFIG_ATL1_EXPERIMENTAL
Jeff Garzik wrote: Chris Snook wrote:

The atl1 driver is currently marked EXPERIMENTAL, because a few supposedly performance-enhancing features still have problems. When these features are disabled, the driver is completely stable, fully functional, and performs well.

Patch 1/2
Creates the kconfig option CONFIG_ATL1_EXPERIMENTAL, and removes the EXPERIMENTAL designation from CONFIG_ATL1

Patch 2/2
Wraps some currently-disabled features in #ifdef CONFIG_ATL1_EXPERIMENTAL, so developers and testers can play with these features more easily, and distributions will still get a fast, stable driver with existing .config files.

We'll also be using this to wrap around various new features we'll be experimenting with in coming months. Instead of using a half dozen different kconfig options for each of them, like some drivers do, we'll just use this, and make sure things are safe for everyone before we take them out of the experimental wrapper.

Well, I haven't received patch #2 yet, but in general a runtime switch (module option?) is greatly preferred.

Jeff

Okay, I'll think about how we want to parameterize this. I don't want users expecting development options to be around forever. I'll resubmit something once I have more of these experimental features ready to submit.

-- Chris
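The difference between the build-time wrapper and the runtime switch Jeff asks for can be sketched in userspace C. Everything here is hypothetical and for illustration only: the `TOY_F_*` bits are not the kernel's NETIF_F_* values, `TOY_ATL1_EXPERIMENTAL` is a made-up macro, and the `experimental` argument stands in for whatever a real driver would read from a module option (in the kernel, usually a variable registered with module_param()).

```c
#include <stdbool.h>

/* Hypothetical feature bits; not the kernel's NETIF_F_* values. */
#define TOY_F_HW_VLAN	0x1u
#define TOY_F_TSO	0x2u

/*
 * Build-time gating: the choice is frozen when the file is compiled,
 * so changing it means rebuilding the driver.
 */
static unsigned int toy_features_buildtime(void)
{
	unsigned int features = TOY_F_HW_VLAN;

#ifdef TOY_ATL1_EXPERIMENTAL
	features |= TOY_F_TSO;
#endif
	return features;
}

/*
 * Run-time gating: one binary serves both cases, and the choice is
 * made from a flag supplied when the driver is loaded.
 */
static unsigned int toy_features_runtime(bool experimental)
{
	unsigned int features = TOY_F_HW_VLAN;

	if (experimental)
		features |= TOY_F_TSO;
	return features;
}
```

The runtime form lets testers flip the feature on a distribution kernel without rebuilding. Its cost is the one Chris raises: once a switch is user-visible, people may expect it to stay.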
Re: [atl1-devel] [PATCH 2/2] atl1: wrap problematic optimizations in CONFIG_ATL1_EXPERIMENTAL
On 9/8/07, Chris Snook [EMAIL PROTECTED] wrote:

From: Chris Snook [EMAIL PROTECTED]

Make certain problematic optimizations build-time configurable.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Acked-by: Jay Cliburn [EMAIL PROTECTED]

--- a/drivers/net/atl1/atl1_main.c 2007-09-04 10:12:38.0 -0400
+++ b/drivers/net/atl1/atl1_main.c 2007-09-04 11:23:26.0 -0400
@@ -2203,22 +2203,26 @@ static int __devinit atl1_probe(struct p
 	struct net_device *netdev;
 	struct atl1_adapter *adapter;
 	static int cards_found = 0;
-	bool pci_using_64 = true;
+	bool pci_using_64 = false;
 	int err;

 	err = pci_enable_device(pdev);
 	if (err)
 		return err;

+#ifdef CONFIG_ATL1_EXPERIMENTAL
 	err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
+	if (!err) {
+		pci_using_64 = true;
+		goto dma_ok;
+	}
+#endif /* CONFIG_ATL1_EXPERIMENTAL */

This is more like CONFIG_ATL1_PLEASE_KILL_MY_MACHINE; I really don't see the problem with just limiting the DMA mask:
- if you don't have physical mem over the 4GB boundary limiting DMA doesn't make any difference
- if you have more than 4GB of memory the machine won't survive long without it

Luca
Re: [atl1-devel] [PATCH 2/2] atl1: wrap problematic optimizations in CONFIG_ATL1_EXPERIMENTAL
Luca wrote: On 9/8/07, Chris Snook [EMAIL PROTECTED] wrote:

From: Chris Snook [EMAIL PROTECTED]

Make certain problematic optimizations build-time configurable.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Acked-by: Jay Cliburn [EMAIL PROTECTED]

--- a/drivers/net/atl1/atl1_main.c 2007-09-04 10:12:38.0 -0400
+++ b/drivers/net/atl1/atl1_main.c 2007-09-04 11:23:26.0 -0400
@@ -2203,22 +2203,26 @@ static int __devinit atl1_probe(struct p
 	struct net_device *netdev;
 	struct atl1_adapter *adapter;
 	static int cards_found = 0;
-	bool pci_using_64 = true;
+	bool pci_using_64 = false;
 	int err;

 	err = pci_enable_device(pdev);
 	if (err)
 		return err;

+#ifdef CONFIG_ATL1_EXPERIMENTAL
 	err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
+	if (!err) {
+		pci_using_64 = true;
+		goto dma_ok;
+	}
+#endif /* CONFIG_ATL1_EXPERIMENTAL */

This is more like CONFIG_ATL1_PLEASE_KILL_MY_MACHINE; I really don't see the problem with just limiting the DMA mask:
- if you don't have physical mem over the 4GB boundary limiting DMA doesn't make any difference
- if you have more than 4GB of memory the machine won't survive long without it

Atheros is still working on this, and we plan to fix it. 64-bit DMA *should* work. I just resubmitted your patch with the comment Jeff requested. I still may want to revisit CONFIG_ATL1_EXPERIMENTAL soon when I start playing around with more features.

-- Chris
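The control flow under discussion (try a 64-bit mask only when the experimental option is on, otherwise go straight to 32-bit) can be sketched outside the kernel. This is a minimal userspace model under stated assumptions: `toy_dev`, `toy_set_dma_mask()` and `toy_pick_dma_mask()` are hypothetical, the stub merely compares bit widths, and a runtime `experimental` flag replaces the patch's #ifdef so both paths can be exercised in one build.

```c
#include <stdbool.h>

/* Hypothetical device model: widest DMA address width the platform accepts. */
struct toy_dev {
	unsigned int max_dma_bits;
};

/* Stand-in for pci_set_dma_mask(): 0 on success, -1 if the mask is too wide. */
static int toy_set_dma_mask(const struct toy_dev *dev, unsigned int bits)
{
	return bits <= dev->max_dma_bits ? 0 : -1;
}

/*
 * Same decision order as the patched probe path: 64-bit is attempted
 * only when `experimental` is set; in every other case the 32-bit mask
 * decides success or failure. *using_64 reports which mask was kept.
 */
static int toy_pick_dma_mask(const struct toy_dev *dev, bool experimental,
			     bool *using_64)
{
	*using_64 = false;
	if (experimental && toy_set_dma_mask(dev, 64) == 0) {
		*using_64 = true;
		return 0;
	}
	return toy_set_dma_mask(dev, 32);
}
```

Luca's point maps onto the non-experimental path: with the 64-bit attempt skipped, `using_64` is never set, so the driver never advertises high-memory DMA.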
[PATCH] [-MM, FIX V3] e1000e: incorporate napi_struct changes from net-2.6.24.git
This incorporates the new napi_struct changes into e1000e. Included bugfix for ifdown hang from Krishna Kumar for e1000.

Disabling polling is no longer needed at init time, so remove napi_disable() call from _probe().

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---
 drivers/net/e1000e/e1000.h  |    2 ++
 drivers/net/e1000e/netdev.c |   39 ++++++++++++++++-----------------------
 2 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index c57e35a..d2499bb 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -187,6 +187,8 @@ struct e1000_adapter {
 	struct e1000_ring *tx_ring /* One per active queue */
 						____cacheline_aligned_in_smp;

+	struct napi_struct napi;
+
 	unsigned long tx_queue_len;
 	unsigned int restart_queue;
 	u32 txd_cmd;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 372da46..f8ec537 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1149,12 +1149,12 @@ static irqreturn_t e1000_intr_msi(int irq, void *data)
 		mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}

-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1212,12 +1212,12 @@ static irqreturn_t e1000_intr(int irq, void *data)
 		mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}

-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1662,10 +1662,10 @@ set_itr_now:
  * e1000_clean - NAPI Rx polling callback
  * @adapter: board private structure
  **/
-static int e1000_clean(struct net_device *poll_dev, int *budget)
+static int e1000_clean(struct napi_struct *napi, int budget)
 {
-	struct e1000_adapter *adapter;
-	int work_to_do = min(*budget, poll_dev->quota);
+	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
+	struct net_device *poll_dev = adapter->netdev;
 	int tx_cleaned = 0, work_done = 0;

 	/* Must NOT use netdev_priv macro here. */
@@ -1684,25 +1684,20 @@ static int e1000_clean(struct net_device *poll_dev, int *budget)
 		spin_unlock(&adapter->tx_queue_lock);
 	}

-	adapter->clean_rx(adapter, &work_done, work_to_do);
-	*budget -= work_done;
-	poll_dev->quota -= work_done;
+	adapter->clean_rx(adapter, &work_done, budget);

 	/* If no Tx and not enough Rx work done, exit the polling mode */
-	if ((!tx_cleaned && (work_done == 0)) ||
+	if ((tx_cleaned && (work_done < budget)) ||
 	   !netif_running(poll_dev)) {
 quit_polling:
 		if (adapter->itr_setting & 3)
 			e1000_set_itr(adapter);
-		netif_rx_complete(poll_dev);
-		if (test_bit(__E1000_DOWN, &adapter->state))
-			atomic_dec(&adapter->irq_sem);
-		else
-			e1000_irq_enable(adapter);
+		netif_rx_complete(poll_dev, napi);
+		e1000_irq_enable(adapter);
 		return 0;
 	}

-	return 1;
+	return work_done;
 }

 static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
@@ -2439,7 +2434,7 @@ int e1000e_up(struct e1000_adapter *adapter)

 	clear_bit(__E1000_DOWN, &adapter->state);

-	netif_poll_enable(adapter->netdev);
+	napi_enable(&adapter->napi);
 	e1000_irq_enable(adapter);

 	/* fire a link change interrupt to start the watchdog */
@@ -2472,7 +2467,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 	e1e_flush();
 	msleep(10);

-	netif_poll_disable(netdev);
+	napi_disable(&adapter->napi);
 	e1000_irq_disable(adapter);

 	del_timer_sync(&adapter->watchdog_timer);
@@ -2605,7 +2600,7 @@ static int e1000_open(struct net_device *netdev)
 	/* From here on the code is the same as e1000e_up() */
 	clear_bit(__E1000_DOWN, &adapter->state);

-	netif_poll_enable(netdev);
+	napi_enable(&adapter->napi);

 	e1000_irq_enable(adapter);

@@ -4090,8 +4085,7 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
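The change in the poll callback's shape (budget passed by value, work count returned) can be illustrated with a small userspace model. This is a sketch under assumptions, not the driver's logic: `toy_adapter`, `toy_clean_rx()` and `toy_poll()` are hypothetical, Tx cleanup is ignored, and the exit test is the plain "used less than the full budget" rule rather than the exact condition in the patch. Unlike the patch, the toy returns the work count even on the call that leaves polling mode.

```c
#include <stdbool.h>

/* Toy receive state; NOT the driver's struct e1000_adapter. */
struct toy_adapter {
	int rx_pending;		/* packets waiting in the ring */
	bool irq_enabled;	/* set once polling mode has been left */
};

/* Handle at most `budget` packets and return how many were handled. */
static int toy_clean_rx(struct toy_adapter *a, int budget)
{
	int done = a->rx_pending < budget ? a->rx_pending : budget;

	a->rx_pending -= done;
	return done;
}

/*
 * New-style poll: the budget arrives by value and the amount of work
 * done is the return value. When the ring drains before the budget is
 * used up, the toy leaves polling mode; setting irq_enabled stands in
 * for netif_rx_complete() followed by e1000_irq_enable().
 */
static int toy_poll(struct toy_adapter *a, int budget)
{
	int work_done = toy_clean_rx(a, budget);

	if (work_done < budget)
		a->irq_enabled = true;
	return work_done;
}
```

Starting with 100 pending packets and a budget of 64, the first call consumes the whole budget and stays in polling mode; the second call drains the remaining 36 and re-enables interrupts.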
Re: [PATCH] [-MM, FIX V3] e1000e: incorporate napi_struct changes from net-2.6.24.git
Auke Kok wrote:

This incorporates the new napi_struct changes into e1000e. Included bugfix for ifdown hang from Krishna Kumar for e1000.

Disabling polling is no longer needed at init time, so remove napi_disable() call from _probe().

david, while testing this patch I noticed that the poll routine is now called 100% of the time, and since I'm not doing much different than before, I suspect that something in the new napi code is staying in polling mode forever? Since e1000e is pretty much the same code as e1000, I doubt the problem is there, but you can probably tell better.

ideas?

Auke

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---
 drivers/net/e1000e/e1000.h  |    2 ++
 drivers/net/e1000e/netdev.c |   39 ++++++++++++++++-----------------------
 2 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index c57e35a..d2499bb 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -187,6 +187,8 @@ struct e1000_adapter {
 	struct e1000_ring *tx_ring /* One per active queue */
 						____cacheline_aligned_in_smp;

+	struct napi_struct napi;
+
 	unsigned long tx_queue_len;
 	unsigned int restart_queue;
 	u32 txd_cmd;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 372da46..f8ec537 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1149,12 +1149,12 @@ static irqreturn_t e1000_intr_msi(int irq, void *data)
 		mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}

-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1212,12 +1212,12 @@ static irqreturn_t e1000_intr(int irq, void *data)
 		mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}

-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1662,10 +1662,10 @@ set_itr_now:
  * e1000_clean - NAPI Rx polling callback
  * @adapter: board private structure
  **/
-static int e1000_clean(struct net_device *poll_dev, int *budget)
+static int e1000_clean(struct napi_struct *napi, int budget)
 {
-	struct e1000_adapter *adapter;
-	int work_to_do = min(*budget, poll_dev->quota);
+	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
+	struct net_device *poll_dev = adapter->netdev;
 	int tx_cleaned = 0, work_done = 0;

 	/* Must NOT use netdev_priv macro here. */
@@ -1684,25 +1684,20 @@ static int e1000_clean(struct net_device *poll_dev, int *budget)
 		spin_unlock(&adapter->tx_queue_lock);
 	}

-	adapter->clean_rx(adapter, &work_done, work_to_do);
-	*budget -= work_done;
-	poll_dev->quota -= work_done;
+	adapter->clean_rx(adapter, &work_done, budget);

 	/* If no Tx and not enough Rx work done, exit the polling mode */
-	if ((!tx_cleaned && (work_done == 0)) ||
+	if ((tx_cleaned && (work_done < budget)) ||
 	   !netif_running(poll_dev)) {
 quit_polling:
 		if (adapter->itr_setting & 3)
 			e1000_set_itr(adapter);
-		netif_rx_complete(poll_dev);
-		if (test_bit(__E1000_DOWN, &adapter->state))
-			atomic_dec(&adapter->irq_sem);
-		else
-			e1000_irq_enable(adapter);
+		netif_rx_complete(poll_dev, napi);
+		e1000_irq_enable(adapter);
 		return 0;
 	}

-	return 1;
+	return work_done;
 }

 static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
@@ -2439,7 +2434,7 @@ int e1000e_up(struct e1000_adapter *adapter)

 	clear_bit(__E1000_DOWN, &adapter->state);

-	netif_poll_enable(adapter->netdev);
+	napi_enable(&adapter->napi);
 	e1000_irq_enable(adapter);

 	/* fire a link change interrupt to start the watchdog */
@@ -2472,7 +2467,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 	e1e_flush();
 	msleep(10);

-	netif_poll_disable(netdev);
+	napi_disable(&adapter->napi);
 	e1000_irq_disable(adapter);

 	del_timer_sync(&adapter->watchdog_timer);
@@ -2605,7 +2600,7 @@ static int e1000_open(struct
Re: error(s) in 2.6.23-rc5 bonding.txt ?
Rick Jones [EMAIL PROTECTED] wrote: [...]

If bonding is the only feeder of the devices, then for a continuous flow of traffic, all the slaves will generally receive packets (from the kernel, for transmission) at pretty much the same rate, and so they won't tend to get ahead or behind.

I could see that if there was just one TCP connection going doing bulk or something, but if there were a bulk transmitter coupled with an occasional request/response (ie netperf TCP_STREAM and a TCP_RR) i'd think the tx rings would no longer remain balanced.

I'm not sure that would be the case, because even the traffic bump from the TCP_RR would be funneled through the round-robin. So, the next packet of the bulk transmit would simply be pushed back to the next available interface.

Perhaps varying packet sizes would throw things out of whack, if the small ones happened to line up all on one interface (regardless of the other traffic).

A PAUSE frame to one interface would almost certainly get things out of whack, but I don't know how long it would stay out of whack (or, really, how likely getting a PAUSE is). Probably just as long as all of the slaves are running at full speed.

I haven't investigated this deeply for a few years, but this is my recollection of what happened with the tests I did then. I did testing with multiple 100Mb devices feeding either other sets of 100Mb devices or single gigabit devices. I'm willing to believe that things have changed, and an "N feeding into one" configuration can reorder, but I haven't seen it (or really looked for it; balance-rr isn't much the rage these days).

Are you OK with that block of text simply being yanked?

Mmm... I'm an easy sell for a "usually" or other suitable caveat added in strategic places (avoiding absolute statements and all that). The text does reflect the results of experiments I ran at the time, so I'm reluctant to toss it wholesale simply because we speculate over how it might not be accurate.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
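Jay's argument that extra flows are "funneled through the round-robin" can be illustrated with a toy transmit selector. This is a sketch with hypothetical names (`toy_bond`, `toy_rr_xmit()`, `TOY_NSLAVES`), not the bonding driver's code: the selector never looks at which flow a packet belongs to, so per-slave packet counts differ by at most one however bulk and request/response traffic interleave.

```c
#include <stddef.h>

#define TOY_NSLAVES 3	/* hypothetical number of slave interfaces */

struct toy_bond {
	size_t next;			/* slave that gets the next packet */
	unsigned int sent[TOY_NSLAVES];	/* packets queued per slave */
};

/* Choose the slave for one outgoing packet, whatever flow it came from. */
static size_t toy_rr_xmit(struct toy_bond *b)
{
	size_t slave = b->next;

	b->sent[slave]++;
	b->next = (b->next + 1) % TOY_NSLAVES;
	return slave;
}
```

The model counts packets only. It says nothing about bytes or about a slave that stops draining its queue, which are the two cases (varying packet sizes, a PAUSE frame) that the message itself flags as able to upset the balance.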
Re: [TG3]: Workaround MSI bug on 5714/5780.
On Thu, 2007-09-06 at 12:50 -0700, David Miller wrote: From: Michael Chan [EMAIL PROTECTED] Date: Thu, 06 Sep 2007 12:05:30 -0700

The HT1000 bridge may very well have an MSI issue. I'm checking with ServerWorks and I will do some testing to confirm. If confirmed, we can disable MSI behind the HT1000 bridge instead of globally. The 5714 issue is not caused by the HT1000 as it is not behind the HT1000.

What I'm going to do at this point is just merge the tg3 fix into the current 2.6.23 tree right now. Meanwhile I'll have the HT1000 MSI quirk revert ready and, unless we find a reason not to, I'll ask Greg KH to merge that patch into 2.6.24

David, I see that you have already done the revert in your 2.6.23 tree. So the following patch assumes the revert is already done. I think it is quite safe for this to go into 2.6.23.

[PCI]: Add MSI quirk for ServerWorks HT1000 PCIX bridge.

This is the fix for the following problem:

https://bugzilla.redhat.com/show_bug.cgi?id=227657

The bnx2 device 5706 complains about MSI not working behind a ServerWorks HT1000 PCIX bridge. An earlier commit to fix the problem:

e3008dedff4bdc96a5f67224cd3d8d12237082a0: PCI: disable MSI by default on systems with Serverworks HT1000 chips

was not entirely correct, and has been reverted.

MSI does not work on the PCIX bus because the BIOS did not set the HT_MSI_FLAGS_ENABLE bit in the HyperTransport MSI capability on the bridge. We use the existing quirk_msi_ht_cap() to detect the problem and disable MSI in all buses behind it.

Signed-off-by: Michael Chan [EMAIL PROTECTED]
Cc: Anantha Subramanyam [EMAIL PROTECTED]
Cc: Naren Sankar [EMAIL PROTECTED]

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6da5a5d..c58429b 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1703,6 +1703,9 @@ static void __devinit quirk_msi_ht_cap(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE,
 			quirk_msi_ht_cap);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS,
+			PCI_DEVICE_ID_SERVERWORKS_HT1000_PXB,
+			quirk_msi_ht_cap);

 /* The nVidia CK804 chipset may have 2 HT MSI mappings.
  * MSI are supported if the MSI capability set in any of these mappings.
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 3e34dc0..1bdf8be 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -1428,6 +1428,7 @@
 #define PCI_DEVICE_ID_SERVERWORKS_HE 0x0008
 #define PCI_DEVICE_ID_SERVERWORKS_LE 0x0009
 #define PCI_DEVICE_ID_SERVERWORKS_GCNB_LE 0x0017
+#define PCI_DEVICE_ID_SERVERWORKS_HT1000_PXB 0x0036
 #define PCI_DEVICE_ID_SERVERWORKS_EPB 0x0103
 #define PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE 0x0132
 #define PCI_DEVICE_ID_SERVERWORKS_OSB4 0x0200
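The detection step described in the changelog (enable bit not set on the bridge, so MSI is turned off for everything behind it) can be modeled in a few lines of userspace C. This is a hedged sketch, not the body of quirk_msi_ht_cap(): `toy_bridge`, `toy_quirk_msi_ht_cap()` and `TOY_HT_MSI_FLAGS_ENABLE` are hypothetical, the flag value is defined locally for illustration, and what happens when the capability is missing altogether is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bit, defined locally for this sketch only. */
#define TOY_HT_MSI_FLAGS_ENABLE 0x1u

/* Toy bridge state; NOT the kernel's struct pci_dev. */
struct toy_bridge {
	uint8_t ht_msi_flags;		/* flags byte of the HT MSI capability */
	bool subordinate_no_msi;	/* set when MSI is disabled behind the bridge */
};

/*
 * If the firmware left the enable bit clear, mark the buses behind
 * the bridge as unable to use MSI; otherwise leave them alone.
 */
static void toy_quirk_msi_ht_cap(struct toy_bridge *br)
{
	if (!(br->ht_msi_flags & TOY_HT_MSI_FLAGS_ENABLE))
		br->subordinate_no_msi = true;
}
```

Scoping the decision to one bridge is what distinguishes this from the reverted commit, which disabled MSI by default for the whole system.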