Re: Patch [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result in tv_usec >= 1000000, which seems wrong
From: Guillaume Chazarain [EMAIL PROTECTED]
Date: Wed, 19 Jul 2006 14:47:34 +0200

> Shuya MAEDA wrote:
> > while (__delta > USEC_PER_SEC){ ... }, but I think it should be
> > while (__delta >= USEC_PER_SEC){ ... }. Is it right?
>
> I agree, good catch :-)

Applied, thanks a lot.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
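A minimal user-space sketch of the normalization loop discussed in this thread. The struct and function names are illustrative, not the PSCHED macros themselves. With `>` instead of `>=`, a sum that lands exactly on a second boundary would leave tv_usec == 1000000; with `>=` it rolls over into tv_sec.

```c
#define USEC_PER_SEC 1000000L

/* Illustrative stand-in for struct timeval. */
struct tv_sketch {
	long tv_sec;
	long tv_usec;
};

/* Add a non-negative delta (in microseconds) and keep tv_usec in
 * [0, USEC_PER_SEC). The '>=' comparison is the fix from the thread. */
static void tv_add_usec(struct tv_sketch *tv, long delta)
{
	long usec = tv->tv_usec + delta;

	while (usec >= USEC_PER_SEC) {
		tv->tv_sec++;
		usec -= USEC_PER_SEC;
	}
	tv->tv_usec = usec;
}
```

For example, adding 1 usec to {5, 999999} yields {6, 0} rather than {5, 1000000}.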
[PATCH dscape] d80211: Switch to IEEE80211_ style naming in d80211.h
Christoph Hellwig made a comment about how the names in ieee80211.h make more sense, and I also agree. This should also make patches for migrating fullmac drivers to d80211 smaller.

Hopefully I didn't miss anything in the transition. The patch is bzip2ed since a 49kb patch seems a little big to send inline.

Thanks,
-Michael Wu

[Attachment: switch-to-new-names.patch.bz2 (BZip2 compressed data)]
[NET] initialisation cleanup for ULI526x-net-driver
From: Henrik Kretzschmar [EMAIL PROTECTED]

[NET] initialisation cleanup for ULI526x-net-driver

- removes the unneeded local variable rc
- replaces pci_module_init() with pci_register_driver()
- fixes two coding style issues on switch

Signed-off-by: Henrik Kretzschmar [EMAIL PROTECTED]
---
diff -ruN linux-2.6.18-rc2-git2/drivers/net/tulip/uli526x.c linux/drivers/net/tulip/uli526x.c
--- linux-2.6.18-rc2-git2/drivers/net/tulip/uli526x.c	2006-07-24 13:58:05.0 +0200
+++ linux/drivers/net/tulip/uli526x.c	2006-07-24 14:21:43.0 +0200
@@ -1702,7 +1702,6 @@
 static int __init uli526x_init_module(void)
 {
-	int rc;
 	printk(version);
 	printed_version = 1;
@@ -1714,22 +1713,19 @@
 	if (cr6set)
 		uli526x_cr6_user_set = cr6set;
-	switch(mode) {
+	switch (mode) {
 	case ULI526X_10MHF:
 	case ULI526X_100MHF:
 	case ULI526X_10MFD:
 	case ULI526X_100MFD:
 		uli526x_media_mode = mode;
 		break;
-	default:uli526x_media_mode = ULI526X_AUTO;
+	default:
+		uli526x_media_mode = ULI526X_AUTO;
 		break;
 	}
-	rc = pci_module_init(&uli526x_driver);
-	if (rc < 0)
-		return rc;
-
-	return 0;
+	return pci_register_driver(&uli526x_driver);
 }
Re: [PATCH dscape] d80211: Switch to IEEE80211_ style naming in d80211.h
On Mon, 24 Jul 2006 01:39:30 -0700, Michael Wu wrote:
> Christoph Hellwig made a comment about how the names in ieee80211.h make
> more sense, and I also agree. This should also make patches for migrating
> fullmac drivers to d80211 smaller.

Could you split the patch into two patches (one for the d80211 stack and one for drivers) and add some description to both of them? I will take care of pushing both patches to John then.

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs
Re: [PATCH wireless-dev] d80211: Make MACSTR/MAC2STR macro available to drivers
On Sun, 23 Jul 2006 01:43:25 -0700, Michael Wu wrote:
> This patch moves the MACSTR/MAC2STR macros to d80211.h so that they are
> available to drivers. It also converts the adm8211 and bcm43xx drivers to
> use this macro.

I really dislike those MACSTR/MAC2STR names. I always fail to remember which one is which. What about renaming them while we are touching them? And why not use the MAC_FMT/MAC_ARG names used in net/ieee80211.h? ;-)

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs
Re: wireless-2.6 git repos broken
On Sun, Jul 23, 2006 at 11:13:52AM -0500, Larry Finger wrote:
> Michael Buesch wrote:
> > On Sunday 23 July 2006 17:59, Larry Finger wrote:
> > > > Do you have the same problem on other git trees? I saw some people
> > > > running into this error when cloning Linus' linux-2.6.git. I couldn't
> > > > reproduce it, using exactly the same git version.
> > >
> > > I had the same error when pulling from Linus's tree. It was fixed with
> > > a 'git fsck-objects --full' command.
> >
> > Uhm, that tells me the git tree on kernel.org is broken and John has to
> > run this command, right?
>
> I think so.

Hmmm...well, I'll look into it.

FWIW, I cloned my git tree from kernel.org onto my laptop while I was at OLS w/o any problems.

John
-- 
John W. Linville
[EMAIL PROTECTED]
Re: [PATCH 0/2] PHYLIB: Fix forcing mode reduction
Jeff,

Any status on accepting this patch? I've got some additional fixes that are based on having access to genphy_update_link().

- kumar

On Jun 5, 2006, at 6:45 PM, Nathaniel Case wrote:

> On Mon, 2006-06-05 at 17:08 -0500, Andy Fleming wrote:
>> Looks good. Feel free to send these patches to netdev@vger.kernel.org
>> (you may need to subscribe), and copy Jeff Garzik [EMAIL PROTECTED].
>
> This fixes a problem seen when a port without a cable connected would
> repeatedly print out "Trying 1000/HALF". While in the PHY_FORCING state,
> the call to phy_read_status() was resetting the value of phydev->speed
> and phydev->duplex, preventing it from incrementally trying the
> speed/duplex variations. Since we really just want the link status
> updated for the PHY_FORCING state, calling genphy_update_link() instead
> of phy_read_status() fixes this issue.
>
> Patch tested on an MPC8540 platform with a BCM5421 PHY.
>
> Signed-off-by: Nate Case [EMAIL PROTECTED]
> Signed-off-by: Andy Fleming [EMAIL PROTECTED]
> ---
> --- a/drivers/net/phy/phy.c	2006-06-04 16:01:59.0 -0500
> +++ b/drivers/net/phy/phy.c	2006-06-05 10:55:31.0 -0500
> @@ -767,7 +783,7 @@
>  			}
>  			break;
>  		case PHY_FORCING:
> -			err = phy_read_status(phydev);
> +			err = genphy_update_link(phydev);
>  			if (err)
>  				break;
> --- a/drivers/net/phy/phy_device.c	2006-06-04 16:02:08.0 -0500
> +++ b/drivers/net/phy/phy_device.c	2006-06-04 19:12:26.0 -0500
> @@ -417,6 +417,7 @@
>  	return 0;
>  }
> +EXPORT_SYMBOL(genphy_update_link);
>  /* genphy_read_status
>   *
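The forced-mode fallback described in the patch can be modeled as stepping down an ordered list of speed/duplex settings on each timeout without link. The sketch below is hypothetical (the table and helpers are not phylib's); it assumes the buggy status read effectively put the setting back to the top each cycle, which is consistent with the repeated "Trying 1000/HALF" message.

```c
#include <stddef.h>

/* Illustrative fallback table, fastest first. */
struct link_setting {
	int speed;       /* Mb/s */
	int full_duplex; /* 1 = FULL, 0 = HALF */
};

static const struct link_setting settings[] = {
	{ 1000, 1 }, { 1000, 0 }, { 100, 1 }, { 100, 0 }, { 10, 1 }, { 10, 0 },
};
#define NSETTINGS (sizeof(settings) / sizeof(settings[0]))

/* One forcing timeout without link: step to the next slower setting,
 * stopping at the slowest one. */
static size_t next_setting(size_t cur)
{
	return cur + 1 < NSETTINGS ? cur + 1 : cur;
}

/* Simulate n timeouts. With reset_each_poll set, the status read puts the
 * setting back to index 0 first, so every step lands on 1000/HALF. */
static size_t simulate(unsigned n, int reset_each_poll)
{
	size_t cur = 0;

	for (unsigned i = 0; i < n; i++) {
		if (reset_each_poll)
			cur = 0;
		cur = next_setting(cur);
	}
	return cur;
}
```

Only refreshing the link bit (as genphy_update_link() does) leaves the current speed/duplex alone, so the fallback can progress past 1000/HALF.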
Re: [PATCH wireless-dev] d80211: Make MACSTR/MAC2STR macro available to drivers
On Monday 24 July 2006 06:54, Jiri Benc wrote:
> I really dislike those MACSTR/MAC2STR names. I always fail to remember which
> one is which. What about renaming them while we are touching them? And why
> not use the MAC_FMT/MAC_ARG names used in net/ieee80211.h? ;-)

I dislike MACSTR/MAC2STR too, but I was trying to minimize changes to the d80211 code. I'll switch to MAC_FMT/MAC_ARG.

-Michael Wu
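The point of the MAC_FMT/MAC_ARG pairing is that one name is the format string and the other expands to the six byte arguments, so the two are hard to confuse. A user-space sketch, assuming definitions of the usual shape (check net/ieee80211.h for the exact text); format_mac() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned char u8;

/* Assumed shape of the net/ieee80211.h macros. */
#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
#define MAC_ARG(x) ((u8 *)(x))[0], ((u8 *)(x))[1], ((u8 *)(x))[2], \
		   ((u8 *)(x))[3], ((u8 *)(x))[4], ((u8 *)(x))[5]

/* Hypothetical helper: render a 6-byte address as "xx:xx:xx:xx:xx:xx".
 * Returns what snprintf() returns (17 when the buffer is large enough). */
static int format_mac(char *buf, size_t len, const u8 *addr)
{
	return snprintf(buf, len, MAC_FMT, MAC_ARG(addr));
}
```

A driver would use the pair the same way in printk(KERN_DEBUG "%s: peer " MAC_FMT "\n", dev->name, MAC_ARG(addr)).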
Re: [PATCH wireless-dev 0/5] Switch drivers to d80211
On 7/23/06, Michael Wu [EMAIL PROTECTED] wrote:
> Hi,
>
> This patch series converts a number of fullmac wireless drivers to use
> d80211.h instead of ieee80211.h.

Nice work.

> The remaining drivers are hostap, atmel, zd1211rw and ipw*.

I've been working on zd1211rw, give me a week. Anyone started ipw yet?

  Luis
Re: Help with bugfix for bond active-backup mode + vlans
Christophe Devriese [EMAIL PROTECTED] wrote:
[...]
>Would it be acceptable to have an interface flag IFF_SILENT that can be set
>on an interface, which prevents it from receiving packets in both forwarding
>paths?

	Starting with kernel version 2.6.17, there is code in skb_bond() to suppress traffic on inactive slaves, but it looks like that will be bypassed by hardware-accelerated VLAN packets. If I'm reading the code correctly, they have their skb->dev directly assigned to the VLAN interface, so they go into netif_receive_skb() with skb->dev not set to the slave device, which bypasses the logic in skb_bond().

	An IFF_SILENT type of flag would work fine (if checked in both input paths) for the active-backup mode, but the 802.3ad and balance-alb modes need differing types of traffic dropping; e.g., the balance-alb mode just needs to suppress broadcasts and multicasts.

	One possible solution for this would be to have bonding remove the VLAN registration for inactive slaves, which would cause the errant packets to pass through skb_bond() normally and presumably be dropped. That would work for the active-backup case, but would cause balance-alb mode to lose VLAN acceleration on all interfaces except for one.

	Another possibility would be to have __vlan_hwaccel_rx check VLAN_DEV_INFO(skb->dev)->real_dev, and if that's a bonding device, apply the same logic found in skb_bond(). Or, if there's some way to ask the question "is dev a VLAN device?", then that same test could be put into skb_bond() and all of the packet suppression fru fru could stay there.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
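The mode-dependent dropping described above can be expressed as one decision function that both receive paths call. This is a hypothetical sketch: the enum and function names are invented, and 802.3ad is left out because the mail does not spell out its rule.

```c
#include <stdbool.h>

/* Hypothetical types; only the two modes whose rule is stated are modeled. */
enum bond_mode_sketch { MODE_ACTIVE_BACKUP, MODE_BALANCE_ALB };
enum pkt_kind { PKT_UNICAST, PKT_MULTICAST, PKT_BROADCAST };

/* Should a frame received on this slave be dropped? Calling one helper from
 * both skb_bond() and the hardware-accelerated VLAN path would keep the two
 * input paths consistent. */
static bool suppress_on_slave(enum bond_mode_sketch mode, bool slave_active,
			      enum pkt_kind kind)
{
	if (slave_active)
		return false;

	switch (mode) {
	case MODE_ACTIVE_BACKUP:
		return true;                /* inactive slave: drop everything */
	case MODE_BALANCE_ALB:
		return kind != PKT_UNICAST; /* drop broadcast/multicast only */
	}
	return false;
}
```

Keeping the policy in one function means a flag such as IFF_SILENT would only have to carry "inactive", while the per-mode rule stays in bonding.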
Re: Problems with sky2 driver.
Todd Showalter wrote:
> I've been having trouble with the sky2 driver. It appears to work most of
> the time, but it will quite often wedge during transfers. The 2.6.17.*
> kernels actually seem worse than 2.6.16.19, but none of them work perfectly.
>
> What typically happens is that after working perfectly for a while,
> existing net connections hang, and subsequent net connections don't seem to
> start at all. firefox gets stuck with a bunch of half-loaded pages, for
> instance, and I've watched an scp of a large file to a colleague's machine
> stall and remain stalled.

Please test with the very latest git snapshot. A critical fix was applied after 2.6.18-rc2 was released.

> Once the machine is behaving this way, a reboot is the only way I have
> found of recovering it. We have two identical machines here that are both
> behaving this way, so I'm assuming it's not a hardware problem per se.
>
> The machines are Intel Pentium D 940 (3GHz) processors. They have ASUS
> P5LD2 motherboards, with builtin Marvell PCIe 88E8053 gigabit ethernet
> controllers. I'm not running any binary modules; it's an untainted kernel.
> I'm running a Gentoo system, but I'm using the vanilla-sources kernel (ie:
> a pure kernel.org release, not the Gentoo-specific patched version).
>
> What can I do to help solve this?
Re: Problems with sky2 driver.
On Mon, 24 Jul 2006 14:38:39 -0400
Todd Showalter [EMAIL PROTECTED] wrote:

> On Mon, 24 Jul 2006 10:53:03 -0700, Stephen Hemminger [EMAIL PROTECTED] wrote:
>
> > There is a receive problem that shows up under load, that is fixed in the
> > latest version (2.6.18 git), the patch is queued for the stable tree as
> > well.
>
>     I have hand-patched this in my 2.6.17.6 kernel. It seems better (no
> hard wedge yet), but there are still definitely problems. The most obvious
> place is in firefox; for example, the front page of slashdot half-renders
> (all the borders, no stories) and then sits loading for eternity. Ditto the
> online package database at gentoo.org. I'm seeing similar behavior with
> other websites as well. It's consistent, too; I haven't been able to view
> the slashdot front page since booting a 2.6.17 kernel.

I suspect that probably isn't a sky2 driver problem. Does it go away if you turn off TCP window scaling:

	sysctl -w net.ipv4.tcp_window_scaling=0

If so, you probably have a middlebox in your path that is not correctly handling TCP window scaling. OpenBSD seems to be particularly bad.

>     If I boot with the 2.6.16.9 kernel, I don't seem to get that problem
> until the network actually hangs.
>
>                                               Todd.
> --
> Todd Showalter
> Silverbirch Studios

-- 
Stephen Hemminger [EMAIL PROTECTED]
And in the Packet there writ down that doome
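With window scaling, the 16-bit window field carries the real window shifted right by the negotiated factor, and the peer shifts it back left. A sketch of what goes wrong if a hypothetical middlebox strips or zeroes the option so the two ends disagree about the shift; the helper names and numbers are illustrative.

```c
#include <stdint.h>

/* Value a sender puts in the 16-bit window field after right-shifting its
 * real receive window by its negotiated scale factor. */
static uint16_t advertised_field(uint32_t window_bytes, unsigned shift)
{
	uint32_t raw = window_bytes >> shift;

	return raw > 0xFFFFu ? 0xFFFFu : (uint16_t)raw;
}

/* Window the peer computes, using the shift it *believes* is in effect. */
static uint32_t peer_view(uint16_t field, unsigned believed_shift)
{
	return (uint32_t)field << believed_shift;
}
```

A 262144-byte window with shift 7 goes on the wire as 2048. A peer that correctly assumes shift 7 sees 262144 bytes, but one that was told "no scaling" sees only 2048, which can look like the half-loaded pages reported above. Setting tcp_window_scaling=0 avoids this because both directions then use a shift of 0.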
Re: Netchannles: first stage has been completed. Further ideas.
On Wed, 19 Jul 2006 13:01:50 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Wed, 19 Jul 2006 15:52:04 -0400
>
> > As a related note, I am looking into fixing inet hash tables to use RCU.
>
> IBM had posted a patch a long time ago, which would be not so hard to munge
> into the current tree. See if you can spot it in the archives :)

Srivatsa Vaddagiri from IBM did a patch:
	http://lkml.org/lkml/2004/8/31/129

And Ben had a patch:
	http://lwn.net/Articles/174596/

Srivatsa's was more complete but predates Acme's rearrangement. Also, there is some code for refcnts in it that looks wrong, or at minimum masks underlying design flaws.

 /* Ungrab socket and destroy it, if it was the last reference. */
 static inline void sock_put(struct sock *sk)
 {
-	if (atomic_dec_and_test(&sk->sk_refcnt))
-		sk_free(sk);
+sp_loop:
+	if (atomic_dec_and_test(&sk->sk_refcnt)) {
+		/* Restore ref count and schedule callback.
+		 * If we don't restore ref count, then the callback can be
+		 * scheduled by more than one CPU.
+		 */
+		atomic_inc(&sk->sk_refcnt);
+
+		if (atomic_read(&sk->sk_refcnt) == 1)
+			call_rcu(&sk->sk_rcu, sk_free_rcu);
+		else
+			goto sp_loop;
+	}
 }

Ben's still left reader/writer locks, and needed IPv6 work. He said he plans to get back to it.

-- 
Stephen Hemminger [EMAIL PROTECTED]
And in the Packet there writ down that doome
Re: Problems with sky2 driver.
On Mon, 24 Jul 2006 11:45:33 -0700, Stephen Hemminger [EMAIL PROTECTED] wrote:

> I suspect that probably isn't a sky2 driver problem. Does it go away if you
> turn off TCP window scaling:
>
> 	sysctl -w net.ipv4.tcp_window_scaling=0
>
> If so, you probably have a middlebox in your path that is not correctly
> handling TCP window scaling. OpenBSD seems to be particularly bad.

    This seems to be the case. The combination of the patch and shutting off TCP window scaling seems to have fixed the box. Thanks!

    I'll ask around locally about network structure.

                                              Todd.
-- 
Todd Showalter
Silverbirch Studios
Re: [PATCH] kthread: airo.c
Sukadev Bhattiprolu [EMAIL PROTECTED] wrote:
| Andrew,
|
| Javier Achirica, one of the major contributors to drivers/net/wireless/airo.c
| took a look at this patch, and doesn't have any problems with it. It doesn't
| fix any bugs and is just a cleanup, so it certainly isn't a candidate
| for this mainline cycle.

Here is the same patch, merged up to 2.6.18-rc2. Christoph's patch (see http://lkml.org/lkml/2006/7/13/332) still applies cleanly on top of this.

---
The airo driver is currently caching a pid for later use, but with the implementation of containers, pids themselves do not uniquely identify a task.

The driver is also using kernel_thread(), which is deprecated in drivers.

This patch essentially replaces kernel_thread() with kthread_create(). It also stores the task_struct of the airo_thread rather than its pid. Since this introduces a second task_struct in struct airo_info, the patch renames airo_info.task to airo_info.list_bss_task.

As an extension of these changes, the patch further:

	- replaces kill_proc() with kthread_stop()
	- replaces signal_pending() with kthread_should_stop()
	- removes thread completion synchronisation, which is handled by
	  kthread_stop().
Signed-off-by: Sukadev Bhattiprolu [EMAIL PROTECTED]
Cc: Javier Achirica [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: John Linville [EMAIL PROTECTED]

 drivers/net/wireless/airo.c |   38 +++---
 1 files changed, 15 insertions(+), 23 deletions(-)

Index: linux-2.6.18-rc1-mm2/drivers/net/wireless/airo.c
===================================================================
--- linux-2.6.18-rc1-mm2.orig/drivers/net/wireless/airo.c	2006-07-14 14:04:01.0 -0700
+++ linux-2.6.18-rc1-mm2/drivers/net/wireless/airo.c	2006-07-20 19:44:50.0 -0700
@@ -47,6 +47,7 @@
 #include <linux/pci.h>
 #include <asm/uaccess.h>
 #include <net/ieee80211.h>
+#include <linux/kthread.h>
 #include "airo.h"
@@ -1187,11 +1188,10 @@ struct airo_info {
 			int whichbap);
 	unsigned short *flash;
 	tdsRssiEntry *rssi;
-	struct task_struct *task;
+	struct task_struct *list_bss_task;
+	struct task_struct *airo_thread_task;
 	struct semaphore sem;
-	pid_t thr_pid;
 	wait_queue_head_t thr_wait;
-	struct completion thr_exited;
 	unsigned long expires;
 	struct {
 		struct sk_buff *skb;
@@ -1736,9 +1736,9 @@ static int readBSSListRid(struct airo_in
 		issuecommand(ai, &cmd, &rsp);
 		up(&ai->sem);
 		/* Let the command take effect */
-		ai->task = current;
+		ai->list_bss_task = current;
 		ssleep(3);
-		ai->task = NULL;
+		ai->list_bss_task = NULL;
 	}
 	rc = PC4500_readrid(ai, first ? ai->bssListFirst : ai->bssListNext,
 			    list, ai->bssListRidLen, 1);
@@ -2400,8 +2400,7 @@ void stop_airo_card( struct net_device *
 		clear_bit(FLAG_REGISTERED, &ai->flags);
 	}
 	set_bit(JOB_DIE, &ai->jobs);
-	kill_proc(ai->thr_pid, SIGTERM, 1);
-	wait_for_completion(&ai->thr_exited);
+	kthread_stop(ai->airo_thread_task);
 	/*
 	 * Clean out tx queue
@@ -2811,9 +2810,8 @@ static struct net_device *_init_airo_car
 	ai->config.len = 0;
 	ai->pci = pci;
 	init_waitqueue_head (&ai->thr_wait);
-	init_completion (&ai->thr_exited);
-	ai->thr_pid = kernel_thread(airo_thread, dev, CLONE_FS | CLONE_FILES);
-	if (ai->thr_pid < 0)
+	ai->airo_thread_task = kthread_run(airo_thread, dev, dev->name);
+	if (IS_ERR(ai->airo_thread_task))
 		goto err_out_free;
 	ai->tfm = NULL;
 	rc = add_airo_dev( dev );
@@ -2930,8 +2928,7 @@ err_out_unlink:
 	del_airo_dev(dev);
 err_out_thr:
 	set_bit(JOB_DIE, &ai->jobs);
-	kill_proc(ai->thr_pid, SIGTERM, 1);
-	wait_for_completion(&ai->thr_exited);
+	kthread_stop(ai->airo_thread_task);
 err_out_free:
 	free_netdev(dev);
 	return NULL;
@@ -3063,13 +3060,7 @@ static int airo_thread(void *data) {
 	struct airo_info *ai = dev->priv;
 	int locked;
-	daemonize("%s", dev->name);
-	allow_signal(SIGTERM);
-
 	while(1) {
-		if (signal_pending(current))
-			flush_signals(current);
-
 		/* make swsusp happy with our thread */
 		try_to_freeze();
@@ -3097,7 +3088,7 @@ static int airo_thread(void *data) {
 			set_bit(JOB_AUTOWEP, &ai->jobs);
 			break;
 		}
-		if (!signal_pending(current)) {
+		if (!kthread_should_stop()) {
 			unsigned long wake_at;
Re: linux-2.6.17(.6): bnx2.c:(.text+0xd741e): undefined reference to `crc32_le'
On Fri, 2006-07-21 at 05:06 -0700, Toralf Förster wrote:
> Compiling (an exotic ?) config I got:
> ...
>   CC      init/version.o
>   LD      init/built-in.o
>   LD      .tmp_vmlinux1
> drivers/built-in.o: In function `bnx2_set_rx_mode':
> bnx2.c:(.text+0xd741e): undefined reference to `crc32_le'
> drivers/built-in.o: In function `bnx2_test_nvram':
> bnx2.c:(.text+0xd9a5f): undefined reference to `crc32_le'
> bnx2.c:(.text+0xd9a83): undefined reference to `crc32_le'
> make: *** [.tmp_vmlinux1] Error 1

BNX2 requires the CRC32 library, and the current kernels do not have that dependency in the Kconfig. This has been fixed and will be in 2.6.18. For now, you can set CONFIG_CRC32=y, and that should fix the problem. Thanks.
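The unresolved symbol, crc32_le(), is the kernel's bit-reflected ("little-endian") CRC-32 routine from the CRC32 library, which is why the link fails when CONFIG_CRC32 is off. Below is a bit-at-a-time user-space sketch with the same calling convention: the caller passes the seed, and no final inversion is applied. The kernel version is table-driven and faster; crc32_le_sketch is an illustrative name, not the kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32, polynomial 0xEDB88320, processed one bit at a time.
 * Seed in, raw register out (no final XOR), like the kernel's crc32_le(). */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			/* If the low bit is set, shift and XOR in the polynomial. */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}
```

Seeding with ~0 and inverting the result gives the standard CRC-32: for "123456789" that is 0xCBF43926.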
Re: Repost: Re: [VLAN]: translate IF_OPER_DORMANT to netif_dormant_on()
From: Patrick McHardy [EMAIL PROTECTED]
Date: Wed, 19 Jul 2006 14:42:35 +0200

> Stefan Rompf wrote:
> > [VLAN]: Fix link state propagation
> >
> > When the queue of the underlying device is stopped at initialization
> > time or the device is marked not present, the state will be propagated
> > to the vlan device and never change.
> >
> > Based on an analysis by Patrick McHardy.
>
> ACKed-by: Patrick McHardy [EMAIL PROTECTED]

Applied, and I will queue this up for -stable.

Thanks everyone.
Re: Netchannles: first stage has been completed. Further ideas.
Hello!

> Also, there is some code for refcnt's in it that looks wrong.

Yes, it is disgusting. RCU does not allow increasing the socket refcnt in the lookup routine.

Ben's version looks cleaner here; it does not touch refcnt in RCU lookups. But it is dubious too:

 do_time_wait:
+		sock_hold(sk);

is obviously in violation of the rule.

Probably, the RCU lookup should do something like:

	if (!atomic_inc_not_zero(&sk->sk_refcnt))
		pretend_it_is_not_found;

It is clear Ben did not look into the IBM patch, because one known place of trouble is missed: when a socket moves from established to timewait, the timewait bucket must be inserted before the established socket is removed.

Alexey
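The rule Alexey describes, "take a reference only if the count is still non-zero", is what atomic_inc_not_zero() provides in the kernel. Here is a user-space C11 sketch of the same compare-and-swap loop; the function name is illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still live (count > 0).
 * Returns false once the count has reached zero: the object is being
 * freed, and an RCU lookup should pretend it was not found. */
static bool refcount_inc_not_zero_sketch(atomic_int *refcnt)
{
	int old = atomic_load_explicit(refcnt, memory_order_relaxed);

	do {
		if (old == 0)
			return false;
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(refcnt, &old, old + 1));

	return true;
}
```

Unlike the sock_put() hack quoted earlier, this never resurrects a count that already hit zero, so the free path can run exactly once.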
Re: [PATCH] SNMPv2 tcpAttemptFails counter error
From: Wei Yongjun [EMAIL PROTECTED]
Date: Wed, 05 Jul 2006 05:19:54 -0400

> In my test, those direct state transitions are not counted in
> tcpAttemptFails.
>
> Following is my patch:
>
> Signed-off-by: Wei Yongjun [EMAIL PROTECTED]

This change can be implemented more simply, I believe.

Except for the tcp_minisocks.c change, all the paths changed go to tcp_done(), which is what actually transfers the state to TCP_CLOSE.

Therefore, tcp_done() can simply be modified to check if the current state is TCP_SYN_RECV, and if so bump the counter.

Once you implement it this way, please audit all call paths to make sure we don't now bump this counter twice.

Thank you.
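The suggestion amounts to counting at the single place where the socket enters CLOSE, keyed on the state it is leaving. Below is a hypothetical sketch with invented types. It checks only SYN_RECV, as the message proposes; the MIB definition of tcpAttemptFails (RFC 2012) also counts direct SYN-SENT to CLOSED transitions, which would be one more comparison.

```c
/* Hypothetical, simplified states and socket; not the kernel's types. */
enum sketch_state { ST_SYN_SENT, ST_SYN_RECV, ST_ESTABLISHED, ST_CLOSE };

struct sketch_sock {
	enum sketch_state state;
};

static unsigned long attempt_fails; /* stand-in for the MIB counter */

/* Single exit point to CLOSE, in the spirit of tcp_done(): the counter is
 * bumped here, based on the state being left. Because the state becomes
 * CLOSE, a second call on the same socket does not count again. */
static void sketch_done(struct sketch_sock *sk)
{
	if (sk->state == ST_SYN_RECV)
		attempt_fails++;
	sk->state = ST_CLOSE;
}
```

The audit the message asks for is then about callers that might bump the counter themselves before reaching this function.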
Re: Help with bugfix for bond active-backup mode + vlans
Ben Greear [EMAIL PROTECTED] wrote:

>Jay Vosburgh wrote:
>> Another possibility would be to have __vlan_hwaccel_rx check the
>> VLAN_DEV_INFO(skb->dev)->real_dev, and if that's a bonding device, apply
>> the same logic found in skb_bond(). Or, if there's some way to ask the
>> question "is dev a VLAN device?", then that same test could be put into
>> skb_bond() and all of the packet suppression fru fru could stay there.
>
>There is a flag in if.h to denote VLAN devices:

	Thanks, I missed that.

	Sadly, elegance remains elusive, since by the time skb_bond() is called, the slave device the packet arrived on isn't available (vlan->real_dev points to 'bond0' by this point), and that information is needed to decide whether to drop the packet or not.

	The least grotty solution that comes to mind is to have __vlan_hwaccel_rx call some skb_bond_suppress_dups() function directly, and change skb_bond() to also call that function.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Re: Help with bugfix for bond active-backup mode + vlans
Jay Vosburgh wrote:
> Ben Greear [EMAIL PROTECTED] wrote:
>
>> Jay Vosburgh wrote:
>>> Another possibility would be to have __vlan_hwaccel_rx check the
>>> VLAN_DEV_INFO(skb->dev)->real_dev, and if that's a bonding device, apply
>>> the same logic found in skb_bond(). Or, if there's some way to ask the
>>> question "is dev a VLAN device?", then that same test could be put into
>>> skb_bond() and all of the packet suppression fru fru could stay there.
>>
>> There is a flag in if.h to denote VLAN devices:
>
> 	Thanks, I missed that.
>
> 	Sadly, elegance remains elusive, since by the time skb_bond() is
> called, the slave device the packet arrived on isn't available
> (vlan->real_dev points to 'bond0' by this point), and that information is
> needed to decide whether to drop the packet or not.
>
> 	The least grotty solution that comes to mind is to have
> __vlan_hwaccel_rx call some skb_bond_suppress_dups() function directly,
> and change skb_bond() to also call that function.

Can you use skb->input_dev?

Ben

-- 
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com
Re: [PATCH] SNMPv2 tcpOutSegs counter error
From: Wei Yongjun [EMAIL PROTECTED]
Date: Thu, 06 Jul 2006 04:01:18 -0400

> -	TCP_INC_STATS(TCP_MIB_OUTSEGS);
> +	if (!(tcb->sacked & TCPCB_LOST))
> +		TCP_INC_STATS(TCP_MIB_OUTSEGS);

This test is not accurate enough. For example, timer-based retransmits will not set the TCPCB_LOST bit.

I'm tempted to say to pass a flag to tcp_transmit_skb() which says whether it is a retransmit or not, but that function already takes way too many arguments.
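RFC 2012 defines tcpOutSegs as excluding segments that contain only retransmitted octets. A hypothetical sketch of the explicit-flag approach mentioned above: the caller, which knows whether it is retransmitting (including RTO timer retransmits that never set a per-skb "lost" bit), tells the accounting code directly. The names are invented, and this ignores the argument-count concern raised in the message.

```c
#include <stdbool.h>

/* Illustrative counters, not the kernel's MIB structures. */
struct seg_mib_sketch {
	unsigned long out_segs;     /* new data only */
	unsigned long retrans_segs; /* any kind of retransmission */
};

/* Classify by what the caller knows, not by inspecting skb flags. */
static void account_tx_sketch(struct seg_mib_sketch *mib, bool is_retransmit)
{
	if (is_retransmit)
		mib->retrans_segs++;
	else
		mib->out_segs++;
}
```

The design point is that only the caller knows whether this transmission is a retransmit, which is why inferring it inside the transmit path from TCPCB_LOST misses cases.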
Re: [PATCH] skge: chip clock rate typo
On Mon, 24 Jul 2006 16:34:46 -0500
Larry Finger [EMAIL PROTECTED] wrote:

> Stephen Hemminger wrote:
> > Michael Buesch spotted this typo. The impact is that the incorrect value
> > was being computed for blinking LED and interrupt moderation values.
> >
> > Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
> >
> > --- skge-2.6.orig/drivers/net/skge.c
> > +++ skge-2.6/drivers/net/skge.c
> > @@ -519,7 +519,7 @@ static inline u32 hwkhz(const struct skg
> >  	if (hw->chip_id == CHIP_ID_GENESIS)
> >  		return 53215; /* or: 53.125 MHz */
> > -
>                        ^
> Should this be 53125?
>
> Larry

yup
[PATCH] skge: chip clock rate typo
Okay, fix both typos in one patch. The impact is that the incorrect value was being computed for blinking LED and interrupt moderation values.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -516,10 +516,7 @@ static int skge_set_pauseparam(struct ne
 /* Chip internal frequency for clock calculations */
 static inline u32 hwkhz(const struct skge_hw *hw)
 {
-	if (hw->chip_id == CHIP_ID_GENESIS)
-		return 53215; /* or: 53.125 MHz */
-	else
-		return 78215; /* or: 78.125 MHz */
+	return (hw->chip_id == CHIP_ID_GENESIS) ? 53125 : 78125;
 }
 /* Chip HZ to microseconds */
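To see how the typo affects LED blink and interrupt moderation timing, convert a time interval into chip clock ticks using the kHz value. The hunk stops at the "Chip HZ to microseconds" comment and does not show the driver's helper bodies, so the conversion functions below are hypothetical stand-ins. They use 64-bit intermediates to avoid overflow.

```c
#include <stdint.h>

/* Corrected values from the patch, in kHz (53.125 MHz and 78.125 MHz). */
#define GENESIS_KHZ 53125u
#define OTHER_KHZ   78125u

/* Hypothetical: microseconds -> chip clock ticks. */
static uint32_t usecs_to_ticks(uint32_t khz, uint32_t usec)
{
	return (uint32_t)((uint64_t)khz * usec / 1000u);
}

/* Hypothetical: chip clock ticks -> microseconds. */
static uint32_t ticks_to_usecs(uint32_t khz, uint32_t ticks)
{
	return (uint32_t)((uint64_t)ticks * 1000u / khz);
}
```

For a 100 usec interval on Genesis, the corrected 53125 kHz gives 5312 ticks while the typo 53215 gives 5321, a small but systematic error in every programmed interval.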
Re: RDMA will be reverted
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 04 Jul 2006 13:34:27 -0700

> Well, here's a quick overview, leaving out some of the details.
> The difference between TOE and iWARP/RDMA is really the interface
> that they present.

Thanks for the description, Roland. It helps me understand the situation better.

> The real issues for netdev are things like Steve Wise's patch to add
> route change notifiers, which could be used to tell RNICs when to
> update the next hop for a connection they're handling.

I'll probably put Steve's patches in soon.

> More generally, it would be interesting to see if it's possible to tie
> an RNIC into the kernel's packet filtering, so that disallowed
> connections don't get set up. This seems very similar in spirit to the
> problems around packet filtering that were raised for VJ netchannels.

Don't get too excited about VJ netchannels; more and more roadblocks to their practicality are being found every day.

For example, my idea to allow ESTABLISHED TCP socket demux to be done before netfilter is flawed. Connection tracking and NAT can change the packet ID and loop it back to us to hit exactly an ESTABLISHED TCP socket, therefore we must always hit netfilter first.

All the original costs of route, netfilter, and TCP socket lookup reappear as we make VJ netchannels fit all the rules of real practical systems, eliminating their gains entirely.

I will also note in passing that papers on related ideas, such as the Exokernel stuff, are very careful to not address the issue of how practical 1) their demux engine is and 2) the negative side effects of userspace TCP implementations. For an example of the latter, if you have some 1GB JAVA process you do not want to wake that monster up just to do some ACK processing or TCP window updates, yet if you don't you violate TCP's rules and risk spurious unnecessary retransmits.

Furthermore, the VJ netchannel gains can be partially obtained from generic stateless facilities that we are going to get anyway.
Networking chips supporting multiple MSI-X vectors, chosen by hashing the flow ID, can move TCP processing to end nodes, which are CPU threads in this case, by having each such MSI-X vector target a different CPU thread.

The good news is that we've survived a long time without revolutions like VJ net channels, and the existing TCP stack can be improved dramatically and in ways that people will see benefits from in a shorter amount of time.

For example, Alexey Kuznetsov and I have some ideas on how to make the most expensive TCP function for a sender, tcp_ack(), more efficient by using different data structures for the retransmit queue and the loss/recovery packet SACK state.
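The stateless facility described here boils down to "hash the flow ID, pick a vector, and let that vector's interrupt land on one CPU". Every packet of a flow therefore reaches the same CPU without the NIC holding any per-connection state. A sketch follows. The FNV-style mixing is only a cheap illustrative hash, not what any particular chip implements, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative IPv4 flow ID (4-tuple). */
struct flow4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

static uint32_t mix32(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x01000193u; /* FNV prime, used here only as a mixer */
	return h;
}

/* Map a flow to one of nvec MSI-X vectors (nvec must be > 0). The same
 * 4-tuple always yields the same vector, hence the same CPU thread. */
static unsigned pick_msix_vector(const struct flow4 *f, unsigned nvec)
{
	uint32_t h = 0x811c9dc5u; /* FNV offset basis */

	h = mix32(h, f->saddr);
	h = mix32(h, f->daddr);
	h = mix32(h, ((uint32_t)f->sport << 16) | f->dport);
	return h % nvec;
}
```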
Re: RDMA will be reverted
From: Tom Tucker [EMAIL PROTECTED]
Date: Wed, 05 Jul 2006 12:09:42 -0500

> > A TOE net stack is closed source firmware. Linux engineers have no way
> > to fix security issues that arise. As a result, only non-TOE users will
> > receive security updates, leaving random windows of vulnerability for
> > each TOE NIC's users.
>
> - A Linux security update may or may not be relevant to a vendor's
> implementation.
>
> - If a vendor's implementation has a security issue then the customer
> must rely on the vendor to fix it. This is no less true for iWARP than
> for any adapter.

This isn't how things actually work.

Users have a computer, and they can rightly expect the community to help them solve problems that occur in the upstream kernel.

When a bug is found and the person is using NIC X, we don't necessarily forward the bug report to the vendor of NIC X. Instead we try to fix the bug. Many chip drivers are maintained by people who do not work for the company that makes the chip, and this works just fine.

If only the chip vendor can fix a security problem, this makes Linux less agile to fix. Every aspect of a problem on a Linux system that cannot be fixed entirely by the community is a net negative for Linux.

> - iWARP needs to do protocol processing in order to validate and
> evaluate TCP payload in advance of direct data placement. This
> requirement is independent of CPU speed.

Yet, RDMA itself is just an optimization meant to deal with limitations of CPU and memory speed. You can rephrase the situation in whatever way suits your argument, but it does not make the core issue go away :)

> - I suspect that connection rates for RDMA adapters fall well below
> the rates attainable with a dumb device. That said, all of the RDMA
> applications that I know of are not connection intensive. Even for
> TOE, the later HTTP versions make connection rates less of an issue.

This is a very naive evaluation of the situation.
Yes, newer versions of protocols such as HTTP make the per-client connection burden lower, but the number of clients will increase in time to more than make up for whatever gains are seen due to this. And then you have protocols which by design are connection heavy, and rightly so, such as bittorrent. Being able to handle enormous numbers of connections, with extreme scalability and low latency, is an absolute requirement of any modern day serious TCP stack. And this requirement is not going away. Wishing this requirement away due to HTTP persistent connections is very unrealistic, at best. - This is the problem we're trying to solve...incrementally and responsibly. You can't. See my email to Roland about why even VJ net channels are found to be impractical. To support netfilter properly, you must traverse the whole netfilter stack, because NAT can rewrite packets, yet still make them destined for the local system, and thus they will have a different identity for connection demux by the time the TCP stack sees the packet. All of these transformations occur between normal packet receive and the TCP stack. You would therefore need to put your card between netfilter and TCP in the packet input path, and at that point why bother with the stateful card at all? The fact is that stateless approaches will always be better than stateful things because you cannot replicate the functionality we have in the Linux stack without replicating 10 years of work into your chip's firmware. At that point you should just run Linux on your NIC since that is what you are effectively doing :) In conversations such as these, it helps us a lot if you can be frank and honest about the true absolute limitations of your technology. I can see that your viewpoint is tainted when I hear things such as HTTP persistent connections being used as a reason why high TCP connection rates won't matter in the future.
Such assertions are understood to be patently false by anyone who understands TCP and how it is used in the real world.
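A toy illustration of why connection demux cannot run before netfilter when NAT is in use: the socket is registered under the post-NAT tuple, so looking up the tuple as it arrived on the wire misses. The single-entry table and the `nat_rewrite` rule (port 200 to 300, other fields untouched) are hypothetical simplifications, not netfilter code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tuple { uint32_t saddr, daddr; uint16_t sport, dport; };

/* Toy ESTABLISHED table: the local socket was set up on port 300. */
static const struct tuple established[] = {
	{ 0xc0a80002u, 0xc0a80001u, 5555, 300 },
};

/* Return the index of the matching socket, or -1 if none matches. */
static int demux(const struct tuple *t)
{
	assert(t != NULL);
	for (size_t i = 0; i < sizeof(established) / sizeof(established[0]); i++) {
		const struct tuple *e = &established[i];
		if (e->saddr == t->saddr && e->daddr == t->daddr &&
		    e->sport == t->sport && e->dport == t->dport)
			return (int)i;
	}
	return -1;
}

/* Hypothetical DNAT rule: rewrite destination port 200 to 300, nothing else. */
static struct tuple nat_rewrite(struct tuple t)
{
	if (t.dport == 200)
		t.dport = 300;
	return t;
}
```

A packet arriving for port 200 finds no socket until the NAT step has run, which is why early demux would have to sit after netfilter anyway.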
Re: RDMA will be reverted
From: Steve Wise [EMAIL PROTECTED] Date: Wed, 05 Jul 2006 12:50:34 -0500 However, iWARP devices _could_ integrate with netfilter. For most devices the connection request event (SYN) gets passed up to the host driver. So the driver can enforce filter rules then. This doesn't work. In order to handle things like NAT and connection tracking properly you must even allow ESTABLISHED state packets to pass through netfilter. Netfilter can have rules such as NAT port 200 to 300, leave the other fields alone and your suggested scheme cannot handle this.
Re: [PATCH 1/2] remove CONFIG_HAVE_ARCH_DEV_ALLOC_SKB
From: Christoph Hellwig [EMAIL PROTECTED] Date: Fri, 7 Jul 2006 11:10:08 +0200 skbuff.h has an #ifndef CONFIG_HAVE_ARCH_DEV_ALLOC_SKB to allow architectures to reimplement __dev_alloc_skb. It's not set on any architecture and now that we have an architecture-overrideable NET_SKB_PAD there is no point at all in having one either. Signed-off-by: Christoph Hellwig [EMAIL PROTECTED] Applied, thanks Christoph.
Re: [PATCH 2/2] correct dev_alloc_skb kerneldoc
From: Christoph Hellwig [EMAIL PROTECTED] Date: Fri, 7 Jul 2006 11:09:57 +0200 dev_alloc_skb is designated for RX descriptors, not TX. (Some drivers use it for the latter anyway, but that's a different story) Signed-off-by: Christoph Hellwig [EMAIL PROTECTED] Also applied, thanks a lot.
Re: What is RDMA
That TOE/iWARP could end up being precluded by NAT seems so ironic from a POE2E standpoint. rick jones Purity Of End To END
[PATCH] ip multicast route bug fix
This should fix the problem reported in http://bugzilla.kernel.org/show_bug.cgi?id=6186 where the skb is used after it has been freed. The IP multicast routing code was reusing an skb, which could lead to a use after free or a double free.
Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/ipv4/ipmr.c | 20 ++--
 1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ba33f86..d336104 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1580,6 +1580,7 @@ int ipmr_get_route(struct sk_buff *skb,
 	cache = ipmr_cache_find(rt->rt_src, rt->rt_dst);
 	if (cache==NULL) {
+		struct sk_buff *iskb;
 		struct net_device *dev;
 		int vif;
@@ -1593,12 +1594,19 @@ int ipmr_get_route(struct sk_buff *skb,
 			read_unlock(&mrt_lock);
 			return -ENODEV;
 		}
-		skb->nh.raw = skb_push(skb, sizeof(struct iphdr));
-		skb->nh.iph->ihl = sizeof(struct iphdr)>>2;
-		skb->nh.iph->saddr = rt->rt_src;
-		skb->nh.iph->daddr = rt->rt_dst;
-		skb->nh.iph->version = 0;
-		err = ipmr_cache_unresolved(vif, skb);
+
+		iskb = alloc_skb(sizeof(struct iphdr), GFP_KERNEL);
+		if (!iskb) {
+			read_unlock(&mrt_lock);
+			return -ENOMEM;
+		}
+		memset(iskb->data, 0, sizeof(struct iphdr));
+		iskb->nh.raw = iskb->data;
+		iskb->nh.iph->ihl = sizeof(struct iphdr)>>2;
+		iskb->nh.iph->saddr = rt->rt_src;
+		iskb->nh.iph->daddr = rt->rt_dst;
+
+		err = ipmr_cache_unresolved(vif, iskb);
 		read_unlock(&mrt_lock);
 		return err;
 	}
--
1.4.0
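A userspace sketch of the ownership problem the patch addresses, under the assumption that the unresolved-route queue takes ownership of whatever buffer it is handed and may free it later. `struct buf`, `queue_unresolved` and `get_route` are hypothetical stand-ins for the sk_buff, `ipmr_cache_unresolved()` and `ipmr_get_route()`; the point is only that the fixed code builds a private buffer instead of pushing a header onto the caller's skb.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffer standing in for an sk_buff holding a bare IP header. */
struct buf { unsigned char data[20]; };

static struct buf *pending;	/* toy "unresolved" queue with a single slot */

/* Takes ownership of b: the queue frees whatever it held before. */
static void queue_unresolved(struct buf *b)
{
	assert(b != NULL);
	free(pending);
	pending = b;
}

/* Fixed pattern: queue a freshly allocated buffer, never the caller's own. */
static int get_route(unsigned char src)
{
	struct buf *ib = malloc(sizeof(*ib));

	if (!ib)
		return -1;	/* mirrors the -ENOMEM path */
	memset(ib->data, 0, sizeof(ib->data));
	ib->data[0] = src;	/* hypothetical field, for illustration only */
	queue_unresolved(ib);
	return 0;
}
```

With the old pattern the caller's buffer itself would sit in the queue, so the caller freeing it (or the queue freeing it on expiry) leaves a dangling pointer on the other side.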
Re: What is RDMA
From: Rick Jones [EMAIL PROTECTED] Date: Mon, 24 Jul 2006 15:34:30 -0700 That TOE/iWARP could end up being precluded by NAT seems so ironic from a POE2E standpoint. To be honest, we do not have a pure end-to-end internet, and some of our failed experiments in the past prove this :-) For example, we have an optimization that allows much earlier termination of TIME_WAIT connections. It relies upon TCP timestamps and attributes we can determine about end hosts using that information (it is yet another Van Jacobson idea btw). But NAT means that IP+Port does not necessarily equate to the same host over time, not even over short periods of time. A NAT box could be using Port X for host A and then host B some short time later. Therefore we had to disable the early timewait recycling trick by default.
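A simplified model of why timestamp-based TIME_WAIT recycling breaks behind NAT, assuming the recycling check keeps the last TSval seen per peer IP address and rejects a new SYN whose TSval is older. The cache and `accept_syn` are hypothetical; Linux's real check also involves timing windows that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4

/* Per-peer state keyed only by IP address, as an IP-based scheme would be. */
struct peer { uint32_t ip; uint32_t last_tsval; int valid; };
static struct peer cache[CACHE_SLOTS];

static struct peer *peer_lookup(uint32_t ip)
{
	struct peer *free_slot = NULL;

	for (int i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].valid && cache[i].ip == ip)
			return &cache[i];
		if (!cache[i].valid && !free_slot)
			free_slot = &cache[i];
	}
	if (!free_slot)
		free_slot = &cache[0];	/* toy eviction: reuse slot 0 */
	free_slot->ip = ip;
	free_slot->last_tsval = 0;	/* 0 means "nothing seen yet" here */
	free_slot->valid = 1;
	return free_slot;
}

/* Accept a SYN only if its TSval has not gone backwards for this IP. */
static int accept_syn(uint32_t ip, uint32_t tsval)
{
	struct peer *p = peer_lookup(ip);

	if (p->last_tsval && (int32_t)(tsval - p->last_tsval) < 0)
		return 0;	/* treated as an old duplicate */
	p->last_tsval = tsval;
	return 1;
}
```

Two hosts behind one NAT address have unrelated timestamp clocks, so the second host's perfectly valid SYN can look "older" than the first host's and be rejected.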
Re: [PATCH 1/2] remove CONFIG_HAVE_ARCH_DEV_ALLOC_SKB
skbuff.h has an #ifndef CONFIG_HAVE_ARCH_DEV_ALLOC_SKB to allow architectures to reimplement __dev_alloc_skb. It's not set on any architecture and now that we have an architecture-overrideable NET_SKB_PAD there is no point at all in having one either. I missed this when hch first posted it, sorry. But my impression was that the intent of the config option was to let Xen hook __dev_alloc_skb() to allocate special receive skbs to handle their page-flipping virtual network device, which goes beyond NET_SKB_PAD. So the real question is about Xen hooks I guess -- and given where the rest of Xen is, it probably does make sense to go ahead and strip this out. - R.
RE: RDMA will be reverted
[EMAIL PROTECTED] wrote: From: Steve Wise [EMAIL PROTECTED] Date: Wed, 05 Jul 2006 12:50:34 -0500 However, iWARP devices _could_ integrate with netfilter. For most devices the connection request event (SYN) gets passed up to the host driver. So the driver can enforce filter rules then. This doesn't work. In order to handle things like NAT and connection tracking properly you must even allow ESTABLISHED state packets to pass through netfilter. Netfilter can have rules such as NAT port 200 to 300, leave the other fields alone and your suggested scheme cannot handle this. This is totally irrelevant. But it does work. First, an RDMA connection once established associates a TCP connection *as identified external to the box* with an RDMA endpoint (conventionally a QP). Performing a NAT translation on a TCP packet would certainly be within the capabilities of an RNIC, but it would be pointless. The relabeled TCP segment would be associated with the same QP. Once an RDMA connection is established, the individual TCP segments are only of interest to the RDMA endpoint. Payload is delivered through the RDMA interface (the same one already used for InfiniBand). The purpose of integration with netfilter would be to ensure that no RDMA Connection could exist, or persist, if netfilter would not allow the TCP connection to be created. That is not a matter of packet filtering; it is a matter of administrative consistency. If someone uses netfilter to block connections from a given IP netmask then they reasonably expect that there will be no connections with any host within that IP netmask. They do not expect exceptions for RDMA, iSCSI or InfiniBand. The existing connection management interfaces in openfabrics, designed to support both InfiniBand and iWARP, could naturally be extended to validate all RDMA connections using an IP address with netfilter. This would be of real value.
The only real value of a rule such as NAT port 200 to 300 is to allow a remote peer to establish a connection to port 200 with a local listener using port 300. That *can* be supported without actually manipulating the header in each TCP packet. It is also possible to discuss other netfilter functionality that serves a valid end-user purpose, such as counting packets.
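A sketch of the alternative described above, assuming the port redirect is applied once, at connection setup, to choose the local listener, after which the connection (for example an RDMA QP) stays keyed by the tuple as seen on the wire. `rules`, `listener_port_for` and `setup_connection` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port-redirect rules, consulted only at connection setup. */
struct redirect { uint16_t from, to; };
static const struct redirect rules[] = { { 200, 300 } };

/* Local listener port that a connection request for wire_dport should reach. */
static uint16_t listener_port_for(uint16_t wire_dport)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (rules[i].from == wire_dport)
			return rules[i].to;
	return wire_dport;
}

/* After setup the endpoint stays keyed by the wire port, so later segments
 * need no per-packet header rewriting. */
struct conn { uint16_t wire_dport; uint16_t listener; int qp; };

static struct conn setup_connection(uint16_t wire_dport, int qp)
{
	struct conn c = { wire_dport, listener_port_for(wire_dport), qp };

	assert(qp >= 0);
	return c;
}
```

This covers only the "redirect to a listener" use of such a rule; it says nothing about the other netfilter features raised earlier in the thread.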
Re: What is RDMA
On Tuesday 25 July 2006 00:34, Rick Jones wrote: That TOE/iWARP could end up being precluded by NAT seems so ironic from a POE2E standpoint. Yes, it's sad, but unfortunately it's reality. There is even precedent: the VJ stateless TW recycling scheme also turned out to not work because of NAT considerations. -Andi
RE: RDMA will be reverted
[EMAIL PROTECTED] wrote: From: Tom Tucker [EMAIL PROTECTED] Date: Wed, 05 Jul 2006 12:09:42 -0500 A TOE net stack is closed source firmware. Linux engineers have no way to fix security issues that arise. As a result, only non-TOE users will receive security updates, leaving random windows of vulnerability for each TOE NIC's users. - A Linux security update may or may not be relevant to a vendor's implementation. - If a vendor's implementation has a security issue then the customer must rely on the vendor to fix it. This is no less true for iWARP than for any adapter. This isn't how things actually work. Users have a computer, and they can rightly expect the community to help them solve problems that occur in the upstream kernel. When a bug is found and the person is using NIC X, we don't necessarily forward the bug report to the vendor of NIC X. Instead we try to fix the bug. Many chip drivers are maintained by people who do not work for the company that makes the chip, and this works just fine. If only the chip vendor can fix a security problem, this makes Linux less agile to fix. Every aspect of a problem on a Linux system that cannot be fixed entirely by the community is a net negative for Linux. - iWARP needs to do protocol processing in order to validate and evaluate TCP payload in advance of direct data placement. This requirement is independent of CPU speed. Yet, RDMA itself is just an optimization meant to deal with limitations of cpu and memory speed. You can rephrase the situation in whatever way suits your argument, but it does not make the core issue go away :) RDMA is a protocol that allows the application to more precisely state the actual ordering requirements. It improves the end-to-end interactions and has value over a protocol with only byte or message stream semantics regardless of local interface efficiencies.
See http://ietf.org/internet-drafts/draft-ietf-rddp-applicability-08.txt In any event, isn't the value of an RDMA interface to applications already settled? The question is how best to integrate the usage of IP addresses with the kernel. The inability to perform the low-level packet validation in open source code is a limitation of *all* RDMA solutions; the transport layer of InfiniBand is just as offloaded as it is for iWARP. The patches proposed are intended to support integrated connection management for RDMA connections using IP addresses, no matter what the underlying transport is. The only difference is that *all* iWARP connections use IP addresses.
Re: RDMA will be reverted
For example, my idea to allow ESTABLISHED TCP socket demux to be done before netfilter is flawed. Connection tracking and NAT can change the packet ID and loop it back to us to hit exactly an ESTABLISHED TCP socket, therefore we must always hit netfilter first. Hmm, how does this happen? I guess either when a connection is masqueraded and an application did a bind() on a local port that is used by the masquerading engine. That could be handled by just disallowing it. Or when you have a transparent proxy setup with the proxy on the local host. Perhaps in that case netfilter could be taught to reinject packets in a way that they hit another ESTABLISHED lookup. Did I miss a case? All the original costs of route, netfilter, TCP socket lookup all reappear as we make VJ netchannels fit all the rules of real practical systems, eliminating their gains entirely. At least most of the optimizations from the early demux scheme could probably be obtained more simply by adding a fast path to iptables/conntrack/etc. that checks if all rules only check SYN etc. packets and doesn't walk the full rules then (or, more generally, a fast TCP flag mask check similar to what TCP does). With that, ESTABLISHED would hit TCP only with relatively small overhead. I will also note in passing that papers on related ideas, such as the Exokernel stuff, are very careful to not address the issue of how practical 1) their demux engine is and 2) the negative side effects of userspace TCP implementations. For an example of the latter, if you have some 1GB Java process you do not want to wake that monster up just to do some ACK processing or TCP window updates, yet if you don't you violate TCP's rules and risk spurious unnecessary retransmits. I don't quite get why the size of the process matters here - if only some user space TCP library is called directly then it shouldn't really matter how big or small the rest of the process is. Or did you mean migration costs as described below?
But on the other hand full user space TCP seems to me of little gain compared to a process context implementation. I somehow prefer to hide these implementation details in the kernel. Furthermore, the VJ netchannel gains can be partially obtained from generic stateless facilities that we are going to get anyway. Networking chips supporting multiple MSI-X vectors, chosen by hashing the flow ID, can move TCP processing to end nodes which are cpu threads in this case, by having each such MSI-X vector target a different cpu thread. The problem with the scheme is that to do process context processing effectively you would need to teach the scheduler to aggressively migrate on wake up (so that the process ends up on the CPU that was selected by the hash function in the NIC). But what do you do when you have lots of different connections with different target CPU hash values or when this would require you to move multiple compute intensive processes onto a single core? Without user context TCP, but using softirqs instead, it looks a bit better because you can at least use different CPUs to do the ACK processing etc. and the hash function spreading out connections over your CPUs doesn't hurt. But you still have relatively high cache line transfer costs in handing over these packets from the softirq CPUs to the final process consumer. I liked VJ's idea of using arrays-of-something instead of lists for that to avoid some cache line transfers. OK, at least it sounds nice in theory - haven't seen any hard numbers on this scheme compared to a traditional doubly linked list. -Andi
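A sketch of the flag-mask fast path suggested above, assuming each filter rule can be summarised by the set of TCP flags a packet must carry for the rule to match. If every rule needs some flag (for example SYN) and the packet lacks it, no rule can match and the full rule walk can be skipped. The `rule` structure and `filter` function are hypothetical, not the iptables or conntrack API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04
#define TH_ACK 0x10

/* Hypothetical rule: it can only match packets carrying all need_flags. */
struct rule { uint8_t need_flags; int verdict; };	/* verdict 1 = drop */

/* Flags required by every rule; recomputed only when the ruleset changes. */
static uint8_t common_required(const struct rule *r, size_t n)
{
	uint8_t m = 0xff;

	for (size_t i = 0; i < n; i++)
		m &= r[i].need_flags;
	return n ? m : 0;
}

static size_t rules_walked;	/* instrumentation for the example */

static int filter(const struct rule *r, size_t n, uint8_t common, uint8_t pkt_flags)
{
	/* Fast path: a flag that every rule needs is missing, nothing can match. */
	if ((pkt_flags & common) != common)
		return 0;	/* default verdict: accept */
	for (size_t i = 0; i < n; i++) {
		rules_walked++;
		if ((pkt_flags & r[i].need_flags) == r[i].need_flags)
			return r[i].verdict;
	}
	return 0;
}
```

With a SYN-only ruleset, a plain ACK on an ESTABLISHED connection takes the fast path without touching a single rule.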
[patch 1/2] s2io driver bug fixes
Hi,
This patch contains some of the bug fixes and enhancements made to the s2io driver. A brief description of the changes follows:
1. Introduce the macro S2IO_PARM_INT for declaring integer load parameters.
2. Fix a UDP_RR test failure: memset the TxDL after Tx completion.
3. PXE boot may leave the adapter in an unknown state, so reset it in probe.
4. Add Tx completion code in netpoll.
5. In s2io_vpd_read(), replace the array vpd_data[] with a pointer to save stack memory.
6. Fix a bug in the ethtool online test.
Signed-off-by: Ananda Raju [EMAIL PROTECTED]
---
diff -upNr netdev/drivers/net/s2io.c bug_fixes_1/drivers/net/s2io.c
--- netdev/drivers/net/s2io.c	2006-07-14 07:58:06.0 -0700
+++ bug_fixes_1/drivers/net/s2io.c	2006-07-14 09:26:09.0 -0700
@@ -370,38 +370,50 @@ static const u64 fix_mac[] = {
 	END_SIGN
 };
+MODULE_AUTHOR("Raghavendra Koushik [EMAIL PROTECTED]");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(DRV_VERSION);
+
+
 /* Module Loadable parameters. */
-static unsigned int tx_fifo_num = 1;
-static unsigned int tx_fifo_len[MAX_TX_FIFOS] =
-	{DEFAULT_FIFO_0_LEN, [1 ...(MAX_TX_FIFOS - 1)] = DEFAULT_FIFO_1_7_LEN};
-static unsigned int rx_ring_num = 1;
-static unsigned int rx_ring_sz[MAX_RX_RINGS] =
-	{[0 ...(MAX_RX_RINGS - 1)] = SMALL_BLK_CNT};
-static unsigned int rts_frm_len[MAX_RX_RINGS] =
-	{[0 ...(MAX_RX_RINGS - 1)] = 0 };
-static unsigned int rx_ring_mode = 1;
-static unsigned int use_continuous_tx_intrs = 1;
-static unsigned int rmac_pause_time = 0x100;
-static unsigned int mc_pause_threshold_q0q3 = 187;
-static unsigned int mc_pause_threshold_q4q7 = 187;
-static unsigned int shared_splits;
-static unsigned int tmac_util_period = 5;
-static unsigned int rmac_util_period = 5;
-static unsigned int bimodal = 0;
-static unsigned int l3l4hdr_size = 128;
-#ifndef CONFIG_S2IO_NAPI
-static unsigned int indicate_max_pkts;
-#endif
+S2IO_PARM_INT(tx_fifo_num, 1);
+S2IO_PARM_INT(rx_ring_num, 1);
+
+
+S2IO_PARM_INT(rx_ring_mode, 1);
+S2IO_PARM_INT(use_continuous_tx_intrs, 1);
+S2IO_PARM_INT(rmac_pause_time, 0x100);
+S2IO_PARM_INT(mc_pause_threshold_q0q3, 187);
+S2IO_PARM_INT(mc_pause_threshold_q4q7, 187);
+S2IO_PARM_INT(shared_splits, 0);
+S2IO_PARM_INT(tmac_util_period, 5);
+S2IO_PARM_INT(rmac_util_period, 5);
+S2IO_PARM_INT(bimodal, 0);
+S2IO_PARM_INT(l3l4hdr_size, 128);
 /* Frequency of Rx desc syncs expressed as power of 2 */
-static unsigned int rxsync_frequency = 3;
+S2IO_PARM_INT(rxsync_frequency, 3);
 /* Interrupt type. Values can be 0(INTA), 1(MSI), 2(MSI_X) */
-static unsigned int intr_type = 0;
+S2IO_PARM_INT(intr_type, 0);
 /* Large receive offload feature */
-static unsigned int lro = 0;
+S2IO_PARM_INT(lro, 0);
 /* Max pkts to be aggregated by LRO at one time. If not specified,
  * aggregation happens until we hit max IP pkt size(64K)
  */
-static unsigned int lro_max_pkts = 0x;
+S2IO_PARM_INT(lro_max_pkts, 0x);
+#ifndef CONFIG_S2IO_NAPI
+S2IO_PARM_INT(indicate_max_pkts, 0);
+#endif
+
+static unsigned int tx_fifo_len[MAX_TX_FIFOS] =
+	{DEFAULT_FIFO_0_LEN, [1 ...(MAX_TX_FIFOS - 1)] = DEFAULT_FIFO_1_7_LEN};
+static unsigned int rx_ring_sz[MAX_RX_RINGS] =
+	{[0 ...(MAX_RX_RINGS - 1)] = SMALL_BLK_CNT};
+static unsigned int rts_frm_len[MAX_RX_RINGS] =
+	{[0 ...(MAX_RX_RINGS - 1)] = 0 };
+
+module_param_array(tx_fifo_len, uint, NULL, 0);
+module_param_array(rx_ring_sz, uint, NULL, 0);
+module_param_array(rts_frm_len, uint, NULL, 0);
 /*
  * S2IO device table.
@@ -464,10 +476,9 @@ static int init_shared_mem(struct s2io_n
 		size += config->tx_cfg[i].fifo_len;
 	}
 	if (size > MAX_AVAILABLE_TXDS) {
-		DBG_PRINT(ERR_DBG, "%s: Requested TxDs too high, ",
-			  __FUNCTION__);
+		DBG_PRINT(ERR_DBG, "s2io: Requested TxDs too high, ");
 		DBG_PRINT(ERR_DBG, "Requested: %d, max supported: 8192\n", size);
-		return FAILURE;
+		return -EINVAL;
 	}
 	lst_size = (sizeof(TxD_t) * config->max_txds);
@@ -547,6 +558,7 @@ static int init_shared_mem(struct s2io_n
 	nic->ufo_in_band_v = kmalloc((sizeof(u64) * size), GFP_KERNEL);
 	if (!nic->ufo_in_band_v)
 		return -ENOMEM;
+	memset(nic->ufo_in_band_v, 0, size);
 	/* Allocation and initialization of RXDs in Rings */
 	size = 0;
@@ -1213,7 +1225,7 @@ static int init_nic(struct s2io_nic *nic
 		break;
 	}
-	/* Enable Tx FIFO partition 0. */
+	/* Enable all configured Tx FIFO partitions */
 	val64 = readq(&bar0->tx_fifo_partition_0);
 	val64 |= (TX_FIFO_PARTITION_EN);
 	writeq(val64, &bar0->tx_fifo_partition_0);
@@ -1650,7 +1662,7 @@ static void en_dis_able_nic_intrs(struct
 		writeq(temp64, &bar0->general_int_mask);
 		/*
 		 * If Hercules adapter enable GPIO otherwise
-		 * disabled all PCIX, Flash, MDIO, IIC
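The diff above uses `S2IO_PARM_INT`, but its definition lives in s2io.h and is not shown. Presumably it pairs a `static unsigned int` definition with a `module_param(..., uint, 0)` registration; the sketch below reproduces that one-line declare-and-register pattern in userspace, with a hypothetical `PARM_INT` macro and `parm_desc` descriptor standing in for `module_param()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for a load-time parameter registration. */
struct parm_desc { const char *name; unsigned int *value; };

/* Hypothetical analogue of S2IO_PARM_INT: define the variable with its
 * default and describe it as a parameter in one line. */
#define PARM_INT(X, def_val)					\
	static unsigned int X = (def_val);			\
	static const struct parm_desc X##_parm = { #X, &X }

PARM_INT(tx_fifo_num, 1);
PARM_INT(rmac_pause_time, 0x100);

static const struct parm_desc *const all_parms[] = {
	&tx_fifo_num_parm, &rmac_pause_time_parm,
};

/* Look a parameter up by name, as a loader would when parsing "name=value". */
static const struct parm_desc *find_parm(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(all_parms) / sizeof(all_parms[0]); i++)
		if (strcmp(all_parms[i]->name, name) == 0)
			return all_parms[i];
	return NULL;
}
```

The benefit of the macro form is that the default value and the registration cannot drift apart, which is what made the long list of plain `static unsigned int` declarations easy to get wrong.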
[patch 2/2] s2io driver bug fixes
Hi,
This patch contains some of the bug fixes and enhancements made to the s2io driver. A brief description of the changes follows:
1. Clean up the code to handle the GSO changes better.
2. Move repeated code in the Rx path into a common function, s2io_chk_rx_buffers().
3. Fix a bug in the MSI interrupt handling.
4. Clear statistics when the card is down.
5. Avoid linked-list traversal in LRO aggregation.
6. Use pci_dma_sync_single_for_cpu for buffer0 in 2/3-buffer mode.
7. Make the ethtool TSO get/set functions set and clear NETIF_F_TSO6.
8. Stop LRO aggregation when an ECN notification is received.
Signed-off-by: Ananda Raju [EMAIL PROTECTED]
---
diff -upNr bug_fixes_1/drivers/net/s2io.c bug_fixes_2/drivers/net/s2io.c
--- bug_fixes_1/drivers/net/s2io.c	2006-07-14 09:26:09.0 -0700
+++ bug_fixes_2/drivers/net/s2io.c	2006-07-21 05:22:19.0 -0700
@@ -76,7 +76,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
-#define DRV_VERSION "2.0.14.2"
+#define DRV_VERSION "2.0.15.2"
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -2383,9 +2383,14 @@ static int fill_rx_buffers(struct s2io_n
 			skb->data = (void *) (unsigned long)tmp;
 			skb->tail = (void *) (unsigned long)tmp;
-			((RxD3_t*)rxdp)->Buffer0_ptr =
-			    pci_map_single(nic->pdev, ba->ba_0, BUF0_LEN,
+			if (!(((RxD3_t*)rxdp)->Buffer0_ptr))
+				((RxD3_t*)rxdp)->Buffer0_ptr =
+				    pci_map_single(nic->pdev, ba->ba_0, BUF0_LEN,
 					   PCI_DMA_FROMDEVICE);
+			else
+				pci_dma_sync_single_for_device(nic->pdev,
+				    (dma_addr_t) ((RxD3_t*)rxdp)->Buffer0_ptr,
+				    BUF0_LEN, PCI_DMA_FROMDEVICE);
 			rxdp->Control_2 = SET_BUFFER0_SIZE_3(BUF0_LEN);
 			if (nic->rxd_mode == RXD_MODE_3B) {
 				/* Two buffer mode */
@@ -2398,10 +2403,13 @@ static int fill_rx_buffers(struct s2io_n
 				(nic->pdev, skb->data, dev->mtu + 4, PCI_DMA_FROMDEVICE);
-				/* Buffer-1 will be dummy buffer not used */
-				((RxD3_t*)rxdp)->Buffer1_ptr =
-				pci_map_single(nic->pdev, ba->ba_1, BUF1_LEN,
-					PCI_DMA_FROMDEVICE);
+				/* Buffer-1 will be dummy buffer. Not used */
+				if (!(((RxD3_t*)rxdp)->Buffer1_ptr)) {
+					((RxD3_t*)rxdp)->Buffer1_ptr =
+						pci_map_single(nic->pdev,
+						ba->ba_1, BUF1_LEN,
+						PCI_DMA_FROMDEVICE);
+				}
 				rxdp->Control_2 |= SET_BUFFER1_SIZE_3(1);
 				rxdp->Control_2 |= SET_BUFFER2_SIZE_3 (dev->mtu + 4);
@@ -2728,7 +2736,7 @@ static void rx_intr_handler(ring_info_t
 		/* If your are next to put index then it's FIFO full condition */
 		if ((get_block == put_block) && (get_info.offset + 1) == put_info.offset) {
-			DBG_PRINT(ERR_DBG, "%s: Ring Full\n",dev->name);
+			DBG_PRINT(INTR_DBG, "%s: Ring Full\n",dev->name);
 			break;
 		}
 		skb = (struct sk_buff *) ((unsigned long)rxdp->Host_Control);
@@ -2748,18 +2756,15 @@ static void rx_intr_handler(ring_info_t
 				 HEADER_SNAP_SIZE, PCI_DMA_FROMDEVICE);
 		} else if (nic->rxd_mode == RXD_MODE_3B) {
-			pci_unmap_single(nic->pdev, (dma_addr_t)
+			pci_dma_sync_single_for_cpu(nic->pdev, (dma_addr_t)
 				 ((RxD3_t*)rxdp)->Buffer0_ptr, BUF0_LEN, PCI_DMA_FROMDEVICE);
 			pci_unmap_single(nic->pdev, (dma_addr_t)
-				 ((RxD3_t*)rxdp)->Buffer1_ptr,
-				 BUF1_LEN, PCI_DMA_FROMDEVICE);
-			pci_unmap_single(nic->pdev, (dma_addr_t)
 				 ((RxD3_t*)rxdp)->Buffer2_ptr, dev->mtu + 4, PCI_DMA_FROMDEVICE);
 		} else {
-			pci_unmap_single(nic->pdev, (dma_addr_t)
+			pci_dma_sync_single_for_cpu(nic->pdev, (dma_addr_t)
 				 ((RxD3_t*)rxdp)->Buffer0_ptr, BUF0_LEN,
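A userspace model of change 6 in the patch above (map the fixed header buffer once, then only sync it when the descriptor is recycled), assuming a zero `buffer0_ptr` means the buffer has never been mapped. `fake_map` and `fake_sync_for_device` are hypothetical stand-ins for `pci_map_single()` and `pci_dma_sync_single_for_device()` that count calls instead of touching hardware.

```c
#include <assert.h>
#include <stdint.h>

static int map_calls, sync_calls;

/* Hypothetical stand-in for pci_map_single(): returns a nonzero fake bus address. */
static uint64_t fake_map(void *cpu_addr)
{
	(void)cpu_addr;
	map_calls++;
	return 0x1000u + (uint64_t)map_calls;
}

/* Hypothetical stand-in for pci_dma_sync_single_for_device(). */
static void fake_sync_for_device(uint64_t bus_addr)
{
	assert(bus_addr != 0);
	sync_calls++;
}

struct rxd { uint64_t buffer0_ptr; };	/* 0 means "not mapped yet" */

/* Refill: map the fixed header buffer once; on reuse just hand it back. */
static void refill(struct rxd *d, void *buf0)
{
	if (!d->buffer0_ptr)
		d->buffer0_ptr = fake_map(buf0);
	else
		fake_sync_for_device(d->buffer0_ptr);
}
```

The receive path mirrors this with `pci_dma_sync_single_for_cpu()` instead of `pci_unmap_single()`, so the mapping survives across refills and only the cache-coherency handoff is paid per packet.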
Re: RDMA will be reverted
On Tuesday 25 July 2006 01:22, David Miller wrote: From: Andi Kleen [EMAIL PROTECTED] Date: Tue, 25 Jul 2006 01:10:25 +0200 All the original costs of route, netfilter, TCP socket lookup all reappear as we make VJ netchannels fit all the rules of real practical systems, eliminating their gains entirely. At least most of the optimizations from the early demux scheme could probably be obtained more simply by adding a fast path to iptables/conntrack/etc. that checks if all rules only check SYN etc. packets and doesn't walk the full rules then (or, more generally, a fast TCP flag mask check similar to what TCP does). With that, ESTABLISHED would hit TCP only with relatively small overhead. Actually, all is not lost. Alexey has a more clever idea which is basically to run the netfilter hooks in the socket receive path. The gain being that the target CPU does the work instead of the softirq one? Some combined lookup and better handling of ESTABLISHED still seems like a good idea. One idea I had at some point was to separate conntrack for local connections vs. routed connections and attach the local conntrack to the socket (and use its lookup tables). Then at least for local connections conntrack should be nearly free. It should also solve the issue we currently have that enabling conntrack makes local network performance significantly worse. Where does state live in such a huge process? Usually, it is scattered all over its address space. Let us say that the Java application just did a lot of churning on its own data structures, swapping out TCP library state objects; we will prematurely swap that stuff back in just to spit out an ACK or similar. TCP state is usually multiple cache lines, so you would have cache misses anyway. Do you worry about the TLBs? But what do you do when you have lots of different connections with different target CPU hash values or when this would require you to move multiple compute intensive processes onto a single core?
That is why we have a scheduler :) It can't do well if it gets conflicting input. Even in a best-effort scenario, things will be generally better than they are currently, plus there is nothing precluding the flow demux MSI-X selection from getting more intelligent. Intelligent = stateful in this case. AFAIK the only way to do it stateless is hashes, and the output of hashes tends to be unpredictable by definition. For example, the demuxer could notice that TCP data transmits for flow X tend to happen on cpu X, and update a flow table to record that fact. It could use the same flow table as the one used for LRO. Hmm, I somewhat doubt that lower end NICs will ever have such flow tables. Also the flow tables could always thrash (because the on-NIC RAM is necessarily limited) or they would require the NIC to look up state in memory, which is probably not much faster than the CPUs doing it. Using hash functions in the hardware to select the MSI-X seems more elegant, cheaper and much more scalable to me. The drawback of hashes is that for processes with multiple connections you have to move some work back into the softirqs that run on the MSI-X target CPUs. So basically doing process context TCP fully will require much more complex and stateful hardware. Or you can keep it only as a fast path for specific situations (single busy connection per thread) and stay with mostly-softirq processing for the many-connection cases. -Andi
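A sketch of the kind of bounded flow table discussed above, assuming a tiny direct-mapped table that records the CPU on which a flow last transmitted, with a stateless hash as the fallback. `FLOW_SLOTS`, `flow_note_tx` and `flow_steer` are hypothetical; the example shows how a second flow mapping to the same slot evicts the first, which is the thrashing concern.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_SLOTS 4	/* deliberately tiny to make thrashing visible */

struct flow_slot { uint32_t flow_id; int cpu; int valid; };
static struct flow_slot table[FLOW_SLOTS];

/* Record that transmits for flow_id happen on cpu (overwrites the slot). */
static void flow_note_tx(uint32_t flow_id, int cpu)
{
	struct flow_slot *s = &table[flow_id % FLOW_SLOTS];

	s->flow_id = flow_id;
	s->cpu = cpu;
	s->valid = 1;
}

/* Steer a received packet: learned CPU if the slot still holds this flow,
 * otherwise fall back to a stateless hash over ncpus. */
static int flow_steer(uint32_t flow_id, int ncpus)
{
	const struct flow_slot *s = &table[flow_id % FLOW_SLOTS];

	assert(ncpus > 0);
	if (s->valid && s->flow_id == flow_id)
		return s->cpu;
	return (int)(flow_id % (uint32_t)ncpus);
}
```

Once flows outnumber slots, the table keeps evicting entries and steering silently degrades to the hash anyway, while costing NIC memory for the state.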
[XFRM]: Fix protocol field value for outgoing IPv6 GSO packets
This appears to be a mistake, but I didn't follow the GSO stuff very closely, so there could be some non-obvious reason.
[XFRM]: Fix protocol field value for outgoing IPv6 GSO packets
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
---
commit 8035f60a607630459e4440dbbc5a20f3cfbf97ac
tree f1a7061cfd1f923b3991ee8f449cffce86870a3e
parent 440848a8e33fc1927bab45bd73f6c8e042ea7abd
author Patrick McHardy [EMAIL PROTECTED] Tue, 25 Jul 2006 02:02:00 +0200
committer Patrick McHardy [EMAIL PROTECTED] Tue, 25 Jul 2006 02:02:00 +0200
 net/ipv6/xfrm6_output.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 0eea60e..c8c8b44 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -125,7 +125,7 @@ static int xfrm6_output_finish(struct sk
 	if (!skb_is_gso(skb))
 		return xfrm6_output_finish2(skb);
-	skb->protocol = htons(ETH_P_IP);
+	skb->protocol = htons(ETH_P_IPV6);
 	segs = skb_gso_segment(skb, 0);
 	kfree_skb(skb);
 	if (unlikely(IS_ERR(segs)))
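A toy model of why the protocol value matters here, assuming (as a simplification) that segmentation picks a per-protocol handler from `skb->protocol`, which is stored in network byte order. With `ETH_P_IP` on an IPv6 packet, the IPv4 handler would be chosen. `gso_handler_for` is hypothetical; the ethertype values match `<linux/if_ether.h>`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifndef ETH_P_IP
#define ETH_P_IP	0x0800	/* IPv4 ethertype */
#endif
#ifndef ETH_P_IPV6
#define ETH_P_IPV6	0x86DD	/* IPv6 ethertype */
#endif

/* Toy dispatch keyed on a network-byte-order protocol field. */
static const char *gso_handler_for(uint16_t protocol_be)
{
	switch (ntohs(protocol_be)) {
	case ETH_P_IP:
		return "ipv4";
	case ETH_P_IPV6:
		return "ipv6";
	default:
		return "none";
	}
}
```

The one-line fix simply makes the field describe the packet that is actually being segmented.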
Re: [XFRM]: Fix protocol field value for outgoing IPv6 GSO packets
On Tue, Jul 25, 2006 at 02:09:26AM +0200, Patrick McHardy wrote: This appears to be a mistake, but I didn't follow the GSO stuff very closely, so there could be some non-obvious reason. Yes, it definitely was a mistake! Thanks for picking this up, Patrick. [XFRM]: Fix protocol field value for outgoing IPv6 GSO packets Signed-off-by: Patrick McHardy [EMAIL PROTECTED] Acked-by: Herbert Xu [EMAIL PROTECTED] Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: RDMA will be reverted
This all sounds like the discussions we had within HP-UX between 10.20 and 11.0 concerning Inbound Packet Scheduling vs Thread Optimized Packet Scheduling. IPS was done by the 10.20 stack at the handoff between the driver and netisr. If the packet was not an IP datagram fragment, parts of the transport and IP headers would be hashed, and the result would be the netisr queue to which the packet would be queued for further processing. It worked fine and dandy for stuff like aggregate netperf TCP_RR tests because there was a 1-1 correspondence between a connection and a process/thread. It was OK for the networking to dictate where the process should run. That feels rather like a NIC that would hash packets and pick the MSI-X based on that. However, as Andi discusses, when there is a process/thread doing more than one connection, picking a CPU based on address hashing will be like TweedleDee and TweedleDum telling Alice to go in opposite directions. Hence TOPS in 11.X. This time, at a normal lookup point in the path, the CPU on which the application last accessed the socket is determined, and processing shifts over to that CPU. This then is the process (well, actually the scheduler) telling networking where it should do its work. That addresses the multiple connections per thread/process and still works just as well for 1-1. There are still issues if you have multiple threads/processes concurrently accessing the same socket/connection, but that one is much rarer. Nirvana I suppose would be the addition of a field in the header which could be used for the determination of where to process. A Transport Protocol option I suppose, maybe the IPv6 flow id, but knuth only knows if anyone would go for something along those lines. It does though mean that the state is per-packet without it having to be based on addressing information.
Almost like RDMA arriving saying where the data goes, but this thing says where the processing should happen :) rick jones
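To make the IPS/TOPS contrast concrete, here is a minimal user-space sketch in C. The structures, the mixing function, and both `*_pick_cpu` helpers are hypothetical; they are not HP-UX or Linux code, and the hash is a placeholder rather than what any real stack uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the IP/transport header fields an IPS-style
 * scheme hashes. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Hypothetical per-connection state for a TOPS-style scheme. */
struct sock_state {
	unsigned int last_app_cpu;	/* CPU where the application last touched the socket */
};

/* IPS-style: the packet headers alone decide the queue/CPU. */
static unsigned int ips_pick_cpu(const struct flow_key *k, unsigned int ncpus)
{
	uint32_t h = k->saddr ^ k->daddr ^ (((uint32_t)k->sport << 16) | k->dport);

	assert(ncpus > 0);
	h ^= h >> 16;		/* cheap mixing with an arbitrary constant */
	h *= 0x45d9f3bu;
	h ^= h >> 16;
	return h % ncpus;
}

/* TOPS-style: the socket lookup tells networking where the application runs,
 * so every connection owned by one thread lands on that thread's CPU. */
static unsigned int tops_pick_cpu(const struct sock_state *sk)
{
	return sk->last_app_cpu;
}
```

With two connections owned by one thread on CPU 3, the TOPS-style helper returns 3 for both, while the IPS-style helper may send them to different CPUs; that is the "opposite directions" problem described above.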
Re: RDMA will be reverted
From: Rick Jones [EMAIL PROTECTED] Date: Mon, 24 Jul 2006 17:29:05 -0700 Nirvana I suppose would be the addition of a field in the header which could be used for the determination of where to process. A Transport Protocol option I suppose, maybe the IPv6 flow id, but knuth only knows if anyone would go for something along those lines. It does though mean that the state is per-packet without it having to be based on addressing information. Almost like RDMA arriving saying where the data goes, but this thing says where the processing should happen :) Since the full interpretation of the TCP timestamp option field value is largely local to the peer setting it, there is nothing wrong with stealing a few bits for destination cpu information. It would have to be done in such a way as to not make the PAWS tests fail by accident. But I think it's doable.
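One way to picture "stealing a few bits" of the timestamp: the sender puts its clock in the high bits of TSval and a CPU hint in the low bits, and recovers the hint when the peer echoes the value back in TSecr. This is a hypothetical sketch, not existing kernel code; the 8-bit width is an assumption borrowed from the 256-CPU argument later in the thread, and the macro and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define TS_CPU_BITS 8u				/* assumption: 8 bits of CPU hint */
#define TS_CPU_MASK ((1u << TS_CPU_BITS) - 1u)

/* Build the TSval we send: clock in the high bits, CPU hint in the low bits.
 * Note the clock part effectively wraps 2^TS_CPU_BITS times faster. */
static uint32_t ts_encode(uint32_t clock_ticks, unsigned int cpu)
{
	assert(cpu <= TS_CPU_MASK);
	return (clock_ticks << TS_CPU_BITS) | cpu;
}

/* The peer echoes our TSval in TSecr; recover the CPU that should
 * process this segment. */
static unsigned int ts_cpu_hint(uint32_t tsecr)
{
	return tsecr & TS_CPU_MASK;
}

/* Recover the clock part, e.g. for RTT measurement. */
static uint32_t ts_clock(uint32_t tsecr)
{
	return tsecr >> TS_CPU_BITS;
}
```

The catch is PAWS: within one clock tick, moving to a lower-numbered CPU makes the value go backwards (`ts_encode(1000, 2) < ts_encode(1000, 5)`), which is the accidental failure DaveM warns about.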
Re: RDMA will be reverted
David Miller wrote: From: Rick Jones [EMAIL PROTECTED] Date: Mon, 24 Jul 2006 17:29:05 -0700 Nirvana I suppose would be the addition of a field in the header which could be used for the determination of where to process. A Transport Protocol option I suppose, maybe the IPv6 flow id, but knuth only knows if anyone would go for something along those lines. It does though mean that the state is per-packet without it having to be based on addressing information. Almost like RDMA arriving saying where the data goes, but this thing says where the processing should happen :) Since the full interpretation of the TCP timestamp option field value is largely local to the peer setting it, there is nothing wrong with stealing a few bits for destination cpu information. Even enough bits for 1024 or 2048 CPUs in the single system image? I have seen 1024 touted by SGI, and with things going so multi-core, perhaps 16384, while sounding initially bizarre, would be in the realm of the theoretically possible before too long. It would have to be done in such a way as to not make the PAWS tests fail by accident. But I think it's doable. That would cover TCP; are there similarly fungible fields in SCTP or other ULPs? And if we were to want to get HW support for the thing, getting it adopted in a de jure standards body would probably be in order :) rick jones
Re: RDMA will be reverted
It would have to be done in such a way as to not make the PAWS tests fail by accident. But I think it's doable. A CPU ID plus a higher-order generation number, such that whenever the process migrates to a lower-numbered CPU, the generation number is bumped to make the timestamp larger than before? rick jones
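The generation-number idea can be sketched as follows: keep the CPU ID in the low bits and a generation counter above it, and bump the generation only when the new CPU is numerically lower than the last one, so the encoded value never decreases. This is a hypothetical illustration; the clock component is left out for simplicity (a real encoding would also have to keep advancing with time and would wrap), and all names are invented.

```c
#include <assert.h>
#include <stdint.h>

#define TS_CPU_BITS 8u				/* assumption: 8 bits of CPU ID */
#define TS_CPU_MASK ((1u << TS_CPU_BITS) - 1u)

/* Hypothetical per-connection state. */
struct ts_gen_state {
	uint32_t gen;		/* higher-order generation number */
	unsigned int cpu;	/* CPU encoded in the last value sent */
};

/* Encode (gen, cpu). Moving to a higher CPU already increases the value;
 * moving to a lower one bumps gen so the value still increases. */
static uint32_t ts_gen_encode(struct ts_gen_state *st, unsigned int cpu)
{
	assert(cpu <= TS_CPU_MASK);
	if (cpu < st->cpu)
		st->gen++;
	st->cpu = cpu;
	return (st->gen << TS_CPU_BITS) | cpu;
}
```

For the CPU sequence 5, 7, 2 this yields 5, 7, 258: the drop from CPU 7 to CPU 2 bumps the generation, so the value still grows and the low bits still read 2.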
softmac possible null deref [was: Complete report of Null dereference errors in kernel 2.6.17.1]
Tom Walter Dillig wrote: [109] 452 net/ieee80211/softmac/ieee80211softmac_io.c Possible null dereference of variable *pkt in function call (include/asm/string.h:__constant_c_and_count_memset) checked at (453:net/ieee80211/softmac/ieee80211softmac_io.c) Either I'm misunderstanding, or this is bogus. When *pkt is allocated by the various child functions (e.g. ieee80211softmac_disassoc_deauth), it is always checked for NULL. Finally, line 453 does another NULL check. What is the report trying to say? Daniel
softmac possible null deref [was: Complete report of Null dereference errors in kernel 2.6.17.1]
Tom Walter Dillig wrote: [109] 452 net/ieee80211/softmac/ieee80211softmac_io.c Possible null dereference of variable *pkt in function call (include/asm/string.h:__constant_c_and_count_memset) checked at (453:net/ieee80211/softmac/ieee80211softmac_io.c) Either I'm misunderstanding, or this is bogus. When *pkt is allocated by the various child functions (e.g. ieee80211softmac_disassoc_deauth), it is always checked for NULL before being used. Finally, line 453 does another NULL check, so that any failures generated above are handled appropriately. What is the report trying to say? Daniel
Re: softmac possible null deref [was: Complete report of Null dereference errors in kernel 2.6.17.1]
On Tue, 25 Jul 2006 01:00:54 +0100 Daniel Drake [EMAIL PROTECTED] wrote: Tom Walter Dillig wrote: [109] 452 net/ieee80211/softmac/ieee80211softmac_io.c Possible null dereference of variable *pkt in function call (include/asm/string.h:__constant_c_and_count_memset) checked at (453:net/ieee80211/softmac/ieee80211softmac_io.c) Either I'm misunderstanding, or this is bogus. When *pkt is allocated by the various child functions (e.g. ieee80211softmac_disassoc_deauth), it is always checked for NULL. Finally, line 453 does another NULL check. What is the report trying to say? That the check at line 453 should be removed because it is unneeded. People who obsess about code coverage care that there are unneeded checks. I don't think it matters.
Re: RDMA will be reverted
Even enough bits for 1024 or 2048 CPUs in the single system image? MSI-X supports 255 sub-interrupts max, and most hardware probably far fewer (e.g. 8 seems to be a popular number). It can always be hashed to the existing CPUs. It's a nice idea, but I think standard hashing + processing in softirq would be worth a try first at least. -Andi
[PATCH] via-rhine: NAPI support
Add NAPI support to the via-rhine driver so that it can handle higher speeds and doesn't get overloaded by interrupts as easily.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/via-rhine.c |   75 +++
 1 files changed, 63 insertions(+), 12 deletions(-)

90258ab7e4c90d183cfa32156cd2ff48aca03974
--- skge.orig/drivers/net/via-rhine.c
+++ skge/drivers/net/via-rhine.c
@@ -30,8 +30,8 @@
 */

 #define DRV_NAME	"via-rhine"
-#define DRV_VERSION	"1.4.0"
-#define DRV_RELDATE	"June-27-2006"
+#define DRV_VERSION	"1.4.1"
+#define DRV_RELDATE	"July-24-2006"


 /* A few user-configurable values.
@@ -63,7 +63,11 @@ static const int multicast_filter_limit
    There are no ill effects from too-large receive rings. */
 #define TX_RING_SIZE	16
 #define TX_QUEUE_LEN	10	/* Limit ring entries actually used. */
+#ifdef CONFIG_VIA_RHINE_NAPI
+#define RX_RING_SIZE	64
+#else
 #define RX_RING_SIZE	16
+#endif


 /* Operational parameters that usually are not changed. */
@@ -396,7 +400,7 @@ static void rhine_tx_timeout(struct net_
 static int rhine_start_tx(struct sk_buff *skb, struct net_device *dev);
 static irqreturn_t rhine_interrupt(int irq, void *dev_instance, struct pt_regs *regs);
 static void rhine_tx(struct net_device *dev);
-static void rhine_rx(struct net_device *dev);
+static int rhine_rx(struct net_device *dev, int limit);
 static void rhine_error(struct net_device *dev, int intr_status);
 static void rhine_set_rx_mode(struct net_device *dev);
 static struct net_device_stats *rhine_get_stats(struct net_device *dev);
@@ -564,6 +568,32 @@ static void rhine_poll(struct net_device
 }
 #endif

+#ifdef CONFIG_VIA_RHINE_NAPI
+static int rhine_napipoll(struct net_device *dev, int *budget)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+	int done, limit = min(dev->quota, *budget);
+
+	done = rhine_rx(dev, limit);
+	*budget -= done;
+	dev->quota -= done;
+
+	if (done < limit) {
+		netif_rx_complete(dev);
+
+		iowrite16(IntrRxDone | IntrRxErr | IntrRxEmpty| IntrRxOverflow |
+			  IntrRxDropped | IntrRxNoBuf | IntrTxAborted |
+			  IntrTxDone | IntrTxError | IntrTxUnderrun |
+			  IntrPCIErr | IntrStatsMax | IntrLinkChange,
+			  ioaddr + IntrEnable);
+		return 0;
+	}
+	else
+		return 1;
+}
+#endif
+
 static void rhine_hw_init(struct net_device *dev, long pioaddr)
 {
 	struct rhine_private *rp = netdev_priv(dev);
@@ -744,6 +774,10 @@ static int __devinit rhine_init_one(stru
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	dev->poll_controller = rhine_poll;
 #endif
+#ifdef CONFIG_VIA_RHINE_NAPI
+	dev->poll = rhine_napipoll;
+	dev->weight = 64;
+#endif

 	if (rp->quirks & rqRhineI)
 		dev->features |= NETIF_F_SG|NETIF_F_HW_CSUM;
@@ -1165,6 +1199,7 @@ static void rhine_tx_timeout(struct net_
 	dev->trans_start = jiffies;
 	rp->stats.tx_errors++;
 	netif_wake_queue(dev);
+	netif_poll_enable(dev);
 }

 static int rhine_start_tx(struct sk_buff *skb, struct net_device *dev)
@@ -1268,8 +1303,18 @@ static irqreturn_t rhine_interrupt(int i
 				  dev->name, intr_status);

 		if (intr_status & (IntrRxDone | IntrRxErr | IntrRxDropped |
-				   IntrRxWakeUp | IntrRxEmpty | IntrRxNoBuf))
-			rhine_rx(dev);
+				   IntrRxWakeUp | IntrRxEmpty | IntrRxNoBuf)) {
+#ifdef CONFIG_VIA_RHINE_NAPI
+			iowrite16(IntrTxAborted |
+				  IntrTxDone | IntrTxError | IntrTxUnderrun |
+				  IntrPCIErr | IntrStatsMax | IntrLinkChange,
+				  ioaddr + IntrEnable);
+
+			netif_rx_schedule(dev);
+#else
+			rhine_rx(dev, RX_RING_SIZE);
+#endif
+		}

 		if (intr_status & (IntrTxErrSummary | IntrTxDone)) {
 			if (intr_status & IntrTxErrSummary) {
@@ -1367,13 +1412,12 @@ static void rhine_tx(struct net_device *
 	spin_unlock(&rp->lock);
 }

-/* This routine is logically part of the interrupt handler, but isolated
-   for clarity and better register allocation. */
-static void rhine_rx(struct net_device *dev)
+/* Process up to limit frames from receive ring */
+static int rhine_rx(struct net_device *dev, int limit)
 {
 	struct rhine_private *rp = netdev_priv(dev);
+	int count;
 	int entry = rp->cur_rx % RX_RING_SIZE;
-	int boguscnt = rp->dirty_rx + RX_RING_SIZE - rp->cur_rx;

 	if (debug > 4) {
 		printk(KERN_DEBUG "%s: rhine_rx(), entry %d status %8.8x.\n",
@@ -1382,16 +1426,18 @@ static void rhine_rx(struct net_device *
 	}
 	/* If EOP is set on the next entry, it's a new packet.
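The budget accounting in `rhine_napipoll()` follows the old (pre-2.6.24) NAPI contract: process at most `min(dev->quota, *budget)` frames, charge the work against both counters, and only when less than the limit was done, leave the poll list and re-enable receive interrupts. The user-space model below mimics that control flow with hypothetical stand-in types and functions (nothing here is a kernel API); it assumes the completion test is `done < limit`, the usual convention of that era.

```c
#include <assert.h>

/* Hypothetical stand-ins for the device state touched by the poll routine. */
struct fake_netdev {
	int quota;		/* per-device share, refilled from dev->weight by the core */
	int rx_pending;		/* frames waiting in the receive ring */
	int irq_enabled;	/* RX interrupt sources unmasked? */
	int polling;		/* still on the poll list? */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Stand-in for rhine_rx(dev, limit): consume at most limit frames. */
static int fake_rx(struct fake_netdev *dev, int limit)
{
	int done;

	assert(limit >= 0);
	done = min_int(dev->rx_pending, limit);
	dev->rx_pending -= done;
	return done;
}

/* Same shape as rhine_napipoll(): 0 = all work done, 1 = poll me again. */
static int fake_napipoll(struct fake_netdev *dev, int *budget)
{
	int done, limit = min_int(dev->quota, *budget);

	done = fake_rx(dev, limit);
	*budget -= done;
	dev->quota -= done;

	if (done < limit) {
		dev->polling = 0;	/* models netif_rx_complete() */
		dev->irq_enabled = 1;	/* models the iowrite16() to IntrEnable */
		return 0;
	}
	return 1;
}
```

With 100 frames pending and a quota of 64, the first call handles 64 frames and asks to be polled again; after the quota is refilled, the second call drains the remaining 36, completes, and turns interrupts back on.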
Re: RDMA will be reverted
From: Rick Jones [EMAIL PROTECTED] Date: Mon, 24 Jul 2006 17:55:24 -0700 Even enough bits for 1024 or 2048 CPUs in the single system image? I have seen 1024 touted by SGI, and with things going so multi-core, perhaps 16384, while sounding initially bizarre, would be in the realm of the theoretically possible before too long. Read the RSS NDIS documents from Microsoft. You aren't going to want to demux to more than, say, 256 cpus for a single network adapter, even on the largest machines. Therefore a simple translation table and/or base cpu number is sufficient, and only 8 bits of cpu identification are needed. You will also be limited by the number of MSI-X vectors, for implementations demuxing directly to cpus using MSI-X selection. That would cover TCP; are there similarly fungible fields in SCTP or other ULPs? And if we were to want to get HW support for the thing, getting it adopted in a de jure standards body would probably be in order :) Microsoft never does this, and neither do we. LRO came out of our own design, the network folks found it reasonable, and thus they have started to implement it. The same is true for Microsoft's RSS stuff. It's a hardware interpretation, therefore it belongs in a driver API specification, nowhere else.
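The "translation table and/or base cpu number" point can be sketched as an indirection table: an 8-bit index taken from the flow hash selects an entry, and the entry plus a per-adapter base CPU yields the target CPU. This is only in the spirit of the RSS indirection table mentioned above, not the NDIS specification's actual layout; the structure, sizes, and function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define RSS_TABLE_SIZE 256	/* assumption: 8-bit index, as argued above */

/* Hypothetical per-adapter steering state. */
struct rss_steer {
	unsigned int base_cpu;			/* first CPU this adapter serves */
	uint8_t table[RSS_TABLE_SIZE];		/* 8-bit index -> offset from base_cpu */
};

/* Fill the table round-robin over ncpus CPUs starting at base_cpu. */
static void rss_init(struct rss_steer *rs, unsigned int base_cpu, unsigned int ncpus)
{
	assert(ncpus > 0 && ncpus <= RSS_TABLE_SIZE);
	rs->base_cpu = base_cpu;
	for (unsigned int i = 0; i < RSS_TABLE_SIZE; i++)
		rs->table[i] = (uint8_t)(i % ncpus);
}

/* Only the low 8 bits of the flow hash index the table, so even a very
 * large machine needs just 8 bits of CPU identification per adapter. */
static unsigned int rss_cpu(const struct rss_steer *rs, uint32_t flow_hash)
{
	return rs->base_cpu + rs->table[flow_hash & (RSS_TABLE_SIZE - 1)];
}
```

For example, an adapter serving CPUs 512..527 uses `rss_init(&rs, 512, 16)`, and every flow lands in that range no matter how many CPUs the whole system has.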
Re: Help with bugfix for bond active-backup mode + vlans
Ben Greear [EMAIL PROTECTED] wrote: Jay Vosburgh wrote: Sadly, elegance remains elusive, since by the time skb_bond is called, the slave device the packet arrived on isn't available (vlan->real_dev points to 'bond0' by this point), and that information is needed to decide whether to drop the packet or not. The least grotty solution that comes to mind is to have __vlan_hwaccel_rx call some skb_bond_suppress_dups() function directly, and change skb_bond() to also call that function. Can you use skb->input_dev? Not as it is currently implemented. It is set by netif_receive_skb, not by the vlan accelerator, so input_dev ends up being the vlan device, not the underlying actual ethernet device. It looks like input_dev will be inconsistently assigned with vlans over bonding: if the slave device is vlan accelerated, input_dev will be the vlan device; if the slave isn't accelerated, input_dev will be the slave. As far as I can tell, input_dev is only used by the NET_CLS_IND (input device classification) stuff, which has warnings saying it might be going away. I'm not seeing anything else right offhand that uses it. Anyway, the skb_bond logic really needs the enslaved interface, which isn't necessarily the input_dev (even if input_dev were always the device that actually had the wire plugged into it). If the slave is itself some kind of virtual device (a vlan, for example), then input_dev wouldn't be the right thing. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
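Why the slave device matters can be shown with a toy model of active-backup duplicate suppression: the drop decision is a property of the slave the frame arrived on, so once the receive path has rewritten the device to `bond0`, the question can no longer be answered. `mock_dev` and `mock_suppress_dups()` are hypothetical stand-ins for the proposed `skb_bond_suppress_dups()`; the real `skb_bond()` also has ALB and 802.3ad exceptions that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified device model. */
struct mock_dev {
	const char *name;
	struct mock_dev *master;	/* bond device, if enslaved */
	int inactive;			/* backup slave in active-backup mode */
};

/* Decide from the *slave* the frame arrived on.
 * Returns 1 if the frame should be dropped as a duplicate. */
static int mock_suppress_dups(const struct mock_dev *slave)
{
	assert(slave != NULL);
	return slave->master != NULL && slave->inactive;
}
```

Called with the backup slave, the helper says "drop"; called with `bond0` (all that is left once `real_dev` points at the bond), it can only say "keep", which is exactly the information loss described above.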
Re: RDMA will be reverted
On Tuesday 25 July 2006 02:29, Rick Jones wrote: This all sounds like the discussions we had within HP-UX between 10.20 and 11.0 concerning Inbound Packet Scheduling vs Thread Optimized Packet Scheduling. We've also been talking about this for many years, just no code so far. Or rather, Linux has so far left the job to manual tuning. -Andi
Re: skge error; hangs w/ hardware memory hole
On Sunday 23 July 2006 08:32, Anthony DeRobertis wrote: Andreas Kleen wrote: You need to use iommu=soft swiotlb=force. The standard IOMMU is also broken on VIA, but forced swiotlb should work. Didn't work :-( swiotlb=force is unfortunately broken right now. But with this patch it should work. Does it? -Andi

Test patch only: disable DMA over 4GB

Index: linux-2.6.17-work/arch/x86_64/kernel/pci-dma.c
===
--- linux-2.6.17-work.orig/arch/x86_64/kernel/pci-dma.c
+++ linux-2.6.17-work/arch/x86_64/kernel/pci-dma.c
@@ -202,7 +202,7 @@ int dma_set_mask(struct device *dev, u64
 {
 	if (!dev->dma_mask || !dma_supported(dev, mask))
 		return -EIO;
-	*dev->dma_mask = mask;
+	*dev->dma_mask = mask & 0x;
 	return 0;
 }
 EXPORT_SYMBOL(dma_set_mask);
[IPROUTE]: Add support for multipath route realms
[IPROUTE]: Add support for multipath route realms

Routing realms exist per nexthop, but iproute currently only allows sending a single route realm, which the kernel refuses for multipath routes. Add support for specifying per-nexthop realms. Old kernels only return the first realm to userspace when dumping, so the others can't be displayed; apart from that, it also behaves correctly on old kernels.

old kernel:

1.2.3.4 realm 1
	nexthop dev dummy0 weight 1
	nexthop dev dummy1 weight 1
	nexthop dev dummy2 weight 1
	nexthop dev dummy3 weight 1

new kernel:

1.2.3.4
	nexthop realm 1 dev dummy0 weight 1
	nexthop realm 2 dev dummy1 weight 1
	nexthop realm 3 dev dummy2 weight 1
	nexthop realm 4 dev dummy3 weight 1

route queries on both old and new kernel:

1.2.3.4 dev dummy0 src 10.0.0.1 realm 1
    cache mtu 1500 advmss 1460 metric 10 64
1.2.3.4 dev dummy1 src 10.0.0.1 realm 2
    cache mtu 1500 advmss 1460 metric 10 64
1.2.3.4 dev dummy2 src 10.0.0.1 realm 3
    cache mtu 1500 advmss 1460 metric 10 64
1.2.3.4 dev dummy3 src 10.0.0.1 realm 4
    cache mtu 1500 advmss 1460 metric 10 64

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit dbc39a8d37d658776a8959d2393b1047ea124436
tree be59669a06709aaa3b194f050529fe3986928dc8
parent 8f8a36487119a3cd1afe86a9649704aca088567b
author Patrick McHardy [EMAIL PROTECTED] Tue, 25 Jul 2006 05:55:36 +0200
committer Patrick McHardy [EMAIL PROTECTED] Tue, 25 Jul 2006 05:55:36 +0200

 ip/iproute.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index a43c09e..3544f02 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -557,6 +557,18 @@ #endif
 					       RTA_DATA(tb[RTA_GATEWAY]),
 					       abuf, sizeof(abuf)));
 			}
+			if (tb[RTA_FLOW]) {
+				__u32 to = *(__u32*)RTA_DATA(tb[RTA_FLOW]);
+				__u32 from = to>>16;
+				to &= 0x;
+				fprintf(fp, "realm%s ", from ? "s" : "");
+				if (from) {
+					fprintf(fp, "%s/",
+						rtnl_rtrealm_n2a(from, b1, sizeof(b1)));
+				}
+				fprintf(fp, "%s",
+					rtnl_rtrealm_n2a(to, b1, sizeof(b1)));
+			}
 		}
 		if (r->rtm_flags&RTM_F_CLONED && r->rtm_type == RTN_MULTICAST) {
 			fprintf(fp, "%s", ll_index_to_name(nh->rtnh_ifindex));
@@ -606,6 +618,13 @@ int parse_one_nh(struct rtattr *rta, str
 			rtnh->rtnh_hops = w - 1;
 		} else if (strcmp(*argv, "onlink") == 0) {
 			rtnh->rtnh_flags |= RTNH_F_ONLINK;
+		} else if (matches(*argv, "realms") == 0) {
+			__u32 realm;
+			NEXT_ARG();
+			if (get_rt_realms(&realm, *argv))
+				invarg("\"realm\" value is invalid\n", *argv);
+			rta_addattr32(rta, 4096, RTA_FLOW, realm);
+			rtnh->rtnh_len += sizeof(struct rtattr) + 4;
 		} else
 			break;
 	}
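The dump hunk of the patch treats `RTA_FLOW` as one 32-bit value holding two 16-bit realms: the source realm in the upper half (`from` is taken from the value shifted right by 16) and the destination realm in the lower half. The sketch below packs, unpacks, and prints such a value numerically; the helper names are hypothetical, and unlike iproute it does not map realm numbers to names via `rtnl_rtrealm_n2a()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Source realm in the high 16 bits, destination realm in the low 16 bits. */
static uint32_t realms_pack(uint16_t from, uint16_t to)
{
	return ((uint32_t)from << 16) | to;
}

static void realms_unpack(uint32_t flow, uint16_t *from, uint16_t *to)
{
	*from = (uint16_t)(flow >> 16);
	*to = (uint16_t)(flow & 0xFFFFu);
}

/* Print roughly like the patch: "realm TO" when there is no source realm,
 * otherwise "realms FROM/TO" (numeric only). */
static int realms_format(char *buf, size_t len, uint32_t flow)
{
	uint16_t from, to;

	realms_unpack(flow, &from, &to);
	if (from)
		return snprintf(buf, len, "realms %u/%u", (unsigned)from, (unsigned)to);
	return snprintf(buf, len, "realm %u", (unsigned)to);
}
```

Because each nexthop now carries its own `RTA_FLOW` attribute, the same decoding is simply applied once per nexthop when dumping a multipath route.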
Can we have GET_NETDEV_DEV?
Hello! gregkh-driver-network-class_device-to-device.patch, which briefly appeared in Linux 2.6.18-rc1-mm1, broke MadWifi, which copies the physical device information from the master network device to the virtual network devices: SET_NETDEV_DEV(dev, mdev->class_dev.dev); The same code exists in hostap. The patch is gone from 2.6.18-rc1-mm2, but I'd like to be prepared if it reappears. An easy solution would be to have a GET_NETDEV_DEV macro. Then the drivers could do this: SET_NETDEV_DEV(dev, GET_NETDEV_DEV(mdev)); without having to worry about the internals of struct net_device. It should be done before class_dev is removed, or at the same time. Should I send a patch? -- Regards, Pavel Roskin
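A minimal sketch of the proposed accessor pair, using trimmed-down, hypothetical stand-ins for the kernel structures. The setter is modelled on how `SET_NETDEV_DEV` looked in kernels of that era (treat that as an assumption), and `GET_NETDEV_DEV` is the macro being proposed, not an existing one. The point is that if `class_dev` is replaced by an embedded `struct device`, only the two macro bodies change and drivers like MadWifi or hostap keep compiling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the 2.6.18-era structures. */
struct device { const char *bus_id; };
struct class_device { struct device *dev; };
struct net_device { struct class_device class_dev; };

/* Setter in the style of that era, plus the proposed getter. */
#define SET_NETDEV_DEV(net, pdev)	((net)->class_dev.dev = (pdev))
#define GET_NETDEV_DEV(net)		((net)->class_dev.dev)

/* What a driver with virtual interfaces would do: inherit the
 * physical device from the master interface. */
static void inherit_parent_dev(struct net_device *vdev, struct net_device *mdev)
{
	assert(vdev != NULL && mdev != NULL);
	SET_NETDEV_DEV(vdev, GET_NETDEV_DEV(mdev));
}
```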
Re: [PATCH] ip multicast route bug fix
On Wed, 19 Jul 2006, Stephen Hemminger wrote: This should fix the problem reported in http://bugzilla.kernel.org/show_bug.cgi?id=6186 where the skb is used after being freed. The problem is in the IP multicast routing code: it was reusing an skb, which could lead to use after free or double free. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] Acked-by: James Morris [EMAIL PROTECTED] -- James Morris [EMAIL PROTECTED]
Re: RDMA will be reverted
On Mon, Jul 24, 2006 at 03:06:13PM -0700, David Miller ([EMAIL PROTECTED]) wrote: Don't get too excited about VJ netchannels, more and more roadblocks to their practicality are being found every day. For example, my idea to allow ESTABLISHED TCP socket demux to be done before netfilter is flawed. Connection tracking and NAT can change the packet ID and loop it back to us to hit exactly an ESTABLISHED TCP socket, therefore we must always hit netfilter first. There is no problem with netfilter and process context processing: when an skb is removed from the hardware list/array and is being processed by netfilter in a netchannel (or in process context in general), there is no problem if the changed skb is rerouted into a different queue and state. All the original costs of route, netfilter, TCP socket lookup all reappear as we make VJ netchannels fit all the rules of real practical systems, eliminating their gains entirely. I will also note in passing that papers on related ideas, such as the Exokernel stuff, are very careful to not address the issue of how practical 1) their demux engine is and 2) the negative side effects of userspace TCP implementations. For an example of the latter, if you have some 1GB JAVA process you do not want to wake that monster up just to do some ACK processing or TCP window updates, yet if you don't you violate TCP's rules and risk spurious unnecessary retransmits. I still plan to continue the userspace implementation. If the gigantic-java-monster (tm) is going to read some data, it has already been awakened, so it is in memory (with the linked tcp lib) and there is zero overhead. Furthermore, the VJ netchannel gains can be partially obtained from generic stateless facilities that we are going to get anyways. Networking chips supporting multiple MSI-X vectors, chosen by hashing the flow ID, can move TCP processing to end nodes which are cpu threads in this case, by having each such MSI-X vector target a different cpu thread.
And if that CPU is very busy? Linux would somehow have to tell the NIC which CPUs are valid targets right now, and that set can change a second later, so the scheduler must be tightly bound to networking internals. Just my 2 coins. -- Evgeniy Polyakov