Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
Nagendra Tomar wrote:
> --- Davide Libenzi [EMAIL PROTECTED] wrote:
>> On Wed, 19 Sep 2007, David Miller wrote:
>>> From: Nagendra Tomar [EMAIL PROTECTED]
>>> Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)
>>>
>>>> With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait
>>>> call will not return, even when the incoming acks free the buffers.
>>>> Note that this patch assumes that the SOCK_NOSPACE check in
>>>> tcp_check_space is a trivial optimization which can be safely
>>>> removed.
>>>
>>> I already replied to your patch posting explaining that whatever
>>> is not setting SOCK_NOSPACE should be fixed instead.
>>>
>>> Please address that, thanks.
>>
>> You're not planning on putting the notion of a SOCK_NOSPACE bit inside
>> a completely device-unaware interface like epoll, I hope?
>
> Definitely not! The point is that the tcp write space available
> wakeup does not get called if the SOCK_NOSPACE bit is not set. This was
> fine when the wakeup was merely a wakeup (since the SOCK_NOSPACE bit
> indicated that someone really cared about the wakeup). Now, after the
> introduction of callback'ed wakeups, we might have some work to
> do inside the callback even if there is nobody interested in the wakeup
> at that point of time.
>
> In this particular case the ep_poll_callback is not getting called and
> hence the socket fd is not getting added to the ready list.

Does it mean that with your patch each ACK on an ET-managed socket will
trigger an epoll event?

Maybe your very sensitive high-throughput application needs to set a flag
or something at socket level to ask for such a behavior.

The default should stay as is. That is, an event should be sent only if
someone cared about the wakeup.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ethtool: marvell register update
Stephen Hemminger wrote:
> Update the decode of sky2 registers.
>
> Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

applied
Re: [PATCH] phy: export phy_mii_ioctl
Domen Puncer wrote:
> Export phy_mii_ioctl, so network drivers can use it when built as
> modules too.
>
> Signed-off-by: Domen Puncer [EMAIL PROTECTED]

applied
Re: [PATCH] pci: Fix e100 interrupt quirk
On Tue, 18 Sep 2007 15:17:37 +0400 Valentine Barshak [EMAIL PROTECTED] wrote:

> PCI memory space may have a 64-bit offset on some architectures
> (for example, PowerPC 440), and the actual PCI memory address has to be
> fixed up (an offset to PCI mem space should be added) before remapping.
> So, pci_iomap should be used instead of reading and remapping the PCI BAR
> directly. This has been tested on the Sequoia PowerPC 440EPx board.
>
> Signed-off-by: Valentine Barshak [EMAIL PROTECTED]
> ---
> --- linux-2.6.orig/drivers/pci/quirks.c	2007-09-04 21:15:43.0 +0400
> +++ linux-2.6.bld/drivers/pci/quirks.c	2007-09-05 20:46:14.0 +0400
> @@ -1444,9 +1444,9 @@
>  static void __devinit quirk_e100_interrupt(struct pci_dev *dev)
>  {
>  	u16 command;
> -	u32 bar;
>  	u8 __iomem *csr;
>  	u8 cmd_hi;
> +	int rc;
>  	switch (dev->device) {
>  	/* PCI IDs taken from drivers/net/e100.c */
> @@ -1476,16 +1476,17 @@
>  	 * re-enable them when it's ready.
>  	 */
>  	pci_read_config_word(dev, PCI_COMMAND, &command);
> -	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &bar);
> -	if (!(command & PCI_COMMAND_MEMORY) || !bar)
> +	rc = pci_request_region(dev, 0, "e100_quirk");
> +
> +	if (!(command & PCI_COMMAND_MEMORY) || (rc < 0))
>  		return;

Really? So if pci_request_region() failed and
!(command & PCI_COMMAND_MEMORY), we leak the region? So the next call
to this function will fail?

> -	csr = ioremap(bar, 8);
> +	csr = pci_iomap(dev, 0, 8);
>  	if (!csr) {
>  		printk(KERN_WARNING "PCI: Can't map %s e100 registers\n",
>  			pci_name(dev));
> -		return;
> +		goto e100_quirk_exit;
>  	}
>  	cmd_hi = readb(csr + 3);
> @@ -1495,7 +1496,9 @@
>  		writeb(1, csr + 3);
>  	}
> -	iounmap(csr);
> +	pci_iounmap(dev, csr);
> +e100_quirk_exit:
> +	pci_release_region(dev, 0);
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_e100_interrupt);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6
In the same vein as print_mac, the implementations introduce
declaration macros:

	DECLARE_IP_BUF(var)
	DECLARE_IPV6_BUF(var)

and functions:

	print_ip
	print_ipv6
	print_ipv6_nofmt

IPv4 use:

	DECLARE_IP_BUF(ipbuf);
	__be32 addr;
	print_ip(ipbuf, addr);

IPv6 use:

	DECLARE_IPV6_BUF(ipv6buf);
	const struct in6_addr *addr;
	print_ipv6(ipv6buf, addr);
and
	print_ipv6_nofmt(ipv6buf, addr);

Compiled on x86 with defconfig and allyesconfig.
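[Editor's sketch] The declare-a-buffer-then-format pattern described above can be tried out in userspace. The following is only a minimal analog, not the code from the patch: the buffer size, the return-the-buffer convention (borrowed from how print_mac is used), and the make_addr() helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Room for "255.255.255.255" plus the terminating NUL (assumed size). */
#define IP_BUF_SIZE 16

/* Hypothetical userspace stand-in for the proposed DECLARE_IP_BUF(). */
#define DECLARE_IP_BUF(var) char var[IP_BUF_SIZE]

/*
 * Format an IPv4 address held in network byte order into buf and
 * return buf, so the call can be used directly as a printf argument.
 */
static char *print_ip(char *buf, uint32_t addr_be)
{
	unsigned char b[4];

	assert(buf != NULL);
	/* Network order: the first byte in memory is the first octet. */
	memcpy(b, &addr_be, sizeof(b));
	snprintf(buf, IP_BUF_SIZE, "%u.%u.%u.%u", b[0], b[1], b[2], b[3]);
	return buf;
}

/* Illustration-only helper: build a network-order address from octets. */
static uint32_t make_addr(unsigned char a, unsigned char b,
			  unsigned char c, unsigned char d)
{
	unsigned char bytes[4] = { a, b, c, d };
	uint32_t addr;

	memcpy(&addr, bytes, sizeof(addr));
	return addr;
}
```

A caller would declare the buffer once with DECLARE_IP_BUF(ipbuf) and then pass print_ip(ipbuf, addr) as a "%s" argument, which is the kind of call site that would replace a "%u.%u.%u.%u" format with NIPQUAD(addr).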
[PATCH - net-2.6.24 1/2] Introduce and use print_ip
This removes the uses of NIPQUAD and HIPQUAD in drivers/net and net.

IPv4 use:

	DECLARE_IP_BUF(ipbuf);
	__be32 addr;
	print_ip(ipbuf, addr)

Signed-off-by: Joe Perches [EMAIL PROTECTED]

please pull from:

git pull http://repo.or.cz/r/linux-2.6/trivial-mods.git print_ipv4

stats for print_ipv4:
--
 drivers/net/bonding/bond_main.c | 35 +++-
 drivers/net/bonding/bond_sysfs.c | 31 ++-
 include/linux/ip.h | 8 +++
 include/net/ip_vs.h | 36 +++--
 include/net/sctp/sctp.h | 5 +-
 net/atm/clip.c | 5 +-
 net/atm/mpc.c | 28 ++
 net/atm/mpoa_caches.c | 20 +---
 net/bridge/netfilter/ebt_log.c | 21
 net/core/netpoll.c | 15 +++--
 net/core/utils.c | 14 +
 net/dccp/ipv4.c | 10 ++--
 net/dccp/probe.c | 14 +++--
 net/ipv4/af_inet.c | 8 ++-
 net/ipv4/arp.c | 9 ++--
 net/ipv4/fib_trie.c | 7 ++-
 net/ipv4/icmp.c | 26 +
 net/ipv4/ip_fragment.c | 5 +-
 net/ipv4/ip_input.c | 8 ++-
 net/ipv4/ipcomp.c | 5 +-
 net/ipv4/ipconfig.c | 46 +---
 net/ipv4/ipvs/ip_vs_conn.c | 63 ++
 net/ipv4/ipvs/ip_vs_core.c | 51 +++---
 net/ipv4/ipvs/ip_vs_ctl.c | 35 +++-
 net/ipv4/ipvs/ip_vs_dh.c | 10 ++--
 net/ipv4/ipvs/ip_vs_ftp.c | 19 ---
 net/ipv4/ipvs/ip_vs_lblc.c | 14 +++--
 net/ipv4/ipvs/ip_vs_lblcr.c | 34 +++-
 net/ipv4/ipvs/ip_vs_lc.c | 5 +-
 net/ipv4/ipvs/ip_vs_nq.c | 5 +-
 net/ipv4/ipvs/ip_vs_proto.c | 20 ---
 net/ipv4/ipvs/ip_vs_proto_ah.c | 24 +---
 net/ipv4/ipvs/ip_vs_proto_esp.c | 24 +---
 net/ipv4/ipvs/ip_vs_proto_tcp.c | 20 ---
 net/ipv4/ipvs/ip_vs_proto_udp.c | 10 ++--
 net/ipv4/ipvs/ip_vs_rr.c | 5 +-
 net/ipv4/ipvs/ip_vs_sed.c | 5 +-
 net/ipv4/ipvs/ip_vs_sh.c | 10 ++--
 net/ipv4/ipvs/ip_vs_sync.c | 5 +-
 net/ipv4/ipvs/ip_vs_wlc.c | 5 +-
 net/ipv4/ipvs/ip_vs_wrr.c | 5 +-
 net/ipv4/ipvs/ip_vs_xmit.c | 16 +++---
 net/ipv4/netfilter/arp_tables.c | 20 ---
 net/ipv4/netfilter/ip_tables.c | 19 ---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 16 +++---
 net/ipv4/netfilter/ipt_LOG.c | 10 ++--
 net/ipv4/netfilter/ipt_SAME.c | 21 +---
 net/ipv4/netfilter/ipt_iprange.c | 21
 net/ipv4/netfilter/ipt_recent.c | 5 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 20 ---
 net/ipv4/netfilter/nf_nat_ftp.c | 3 +-
 net/ipv4/netfilter/nf_nat_h323.c | 68 ++-
 net/ipv4/netfilter/nf_nat_irc.c | 5 +-
 net/ipv4/netfilter/nf_nat_rule.c | 16 --
 net/ipv4/netfilter/nf_nat_sip.c | 12 +++--
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 13 +++--
 net/ipv4/route.c | 58 +
 net/ipv4/tcp_input.c | 5 +-
 net/ipv4/tcp_ipv4.c | 25 +
 net/ipv4/tcp_probe.c | 8 ++-
 net/ipv4/tcp_timer.c | 5 +-
 net/ipv4/udp.c | 14 +++--
 net/ipv6/netfilter/ip6t_LOG.c | 9 ++-
 net/netfilter/nf_conntrack_ftp.c | 10 ++--
 net/netfilter/nf_conntrack_irc.c | 18 ---
 net/netfilter/xt_hashlimit.c | 10 ++--
 net/rxrpc/af_rxrpc.c | 7 ++-
 net/rxrpc/ar-error.c | 5 +-
 net/rxrpc/ar-local.c | 19 ---
 net/rxrpc/ar-peer.c | 9 ++--
 net/rxrpc/ar-proc.c | 23 +---
 net/rxrpc/ar-transport.c | 17 --
 net/rxrpc/rxkad.c | 4 +-
 net/sctp/protocol.c | 26 ++---
 net/sctp/sm_statefuns.c | 6 +-
[git patches] net driver updates
[this, sans patch which was too big for netdev, was just sent upstream.
the patch can be recreated via 'git diff net-2.6.24..upstream']

NOTE that sky2 will also be going upstream for 2.6.23-rc, as just posted
on netdev.

Please pull from the 'upstream' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream

to receive the following changes:

Al Viro (20):
      8139cp: trivial endianness annotations
      endianness annotations drivers/net/bonding/
      fix vlan in 8139cp on big-endian
      3c59x: trivial endianness annotations, NULL noise removal
      amd8111e: trivial endianness annotations, NULL noise removal
      amd8111e big-endian fix
      arcnet endianness annotations
      tulip: endianness annotations
      typhoon: trivial endianness annotations
      pcnet32: endianness
      ixgb: endianness
      drivers/net/irda: endianness, NULL noise
      starfire: trivial endianness annotations
      r8169: endianness
      via-rhine: endianness
      pppoe: endianness
      tms380tr: trivial endianness annotations
      drivers/net/appletalk: endianness
      3c509: endianness
      cxgb3: trivial endianness annotations

Alex Landau (1):
      Blackfin EMAC driver: add function to change the MAC address

Bryan Wu (3):
      Blackfin EMAC driver: add power management interface and change the bf537mac_reset to bf537mac_disable
      Blackfin EMAC driver: Add phy abstraction layer supporting in bfin_emac driver
      Blackfin EMAC driver: add a select for the PHYLIB of this driver

David Gibson (1):
      Device tree aware EMAC driver

Dhananjay Phadke (1):
      netxen: ethtool fixes

Jeff Garzik (1):
      [netdrvr] Stop using legacy hooks ->self_test_count, ->get_stats_count

Maciej W. Rozycki (3):
      sb1250-mac.c: Fix stats references
      NET_SB1250_MAC: Update Kconfig entry
      NET_SB1250_MAC: Rename to SB1250_MAC

Sivakumar Subramani (4):
      S2io: Change kmalloc+memset to k[zc]alloc
      S2io: Removed unused feature - bimodal interrupts
      S2io: Added support set_mac_address driver entry point
      S2io: Updating transceiver information in ethtool function

Stephen Hemminger (3):
      sky2: fix VLAN receive processing (resend)
      sky2: ethtool speed report bug
      sky2: version 1.18

Ursula Braun (1):
      s390 networking MAINTAINERS

Vitaly Bordug (2):
      FS_ENET: TX stuff should use fep->tx_lock, instead of fep->lock.
      FS_ENET: Add polling support

 Documentation/powerpc/booting-without-of.txt | 156 +
 MAINTAINERS | 12
 arch/mips/configs/bigsur_defconfig | 2
 arch/mips/configs/sb1250-swarm_defconfig | 2
 arch/powerpc/platforms/44x/Kconfig | 3
 arch/powerpc/platforms/cell/Kconfig | 4
 drivers/net/3c509.c | 4
 drivers/net/3c59x.c | 39
 drivers/net/8139cp.c | 59
 drivers/net/8139too.c | 11
 drivers/net/Kconfig | 89
 drivers/net/Makefile | 3
 drivers/net/amd8111e.c | 9
 drivers/net/amd8111e.h | 24
 drivers/net/appletalk/ipddp.c | 2
 drivers/net/appletalk/ipddp.h | 2
 drivers/net/arcnet/rfc1051.c | 4
 drivers/net/arcnet/rfc1201.c | 6
 drivers/net/atl1/atl1_ethtool.c | 11
 drivers/net/b44.c | 11
 drivers/net/bfin_mac.c | 347 ++-
 drivers/net/bfin_mac.h | 53
 drivers/net/bnx2.c | 20
 drivers/net/bonding/bond_3ad.c | 42
 drivers/net/bonding/bond_3ad.h | 20
 drivers/net/bonding/bond_alb.c | 19
 drivers/net/bonding/bond_alb.h | 4
 drivers/net/bonding/bond_main.c | 22
 drivers/net/bonding/bond_sysfs.c | 8
 drivers/net/bonding/bonding.h | 6
 drivers/net/cassini.c | 11
 drivers/net/chelsio/cxgb2.c | 11
 drivers/net/cxgb3/common.h | 4
 drivers/net/cxgb3/cxgb3_main.c | 11
 drivers/net/cxgb3/sge.c | 6
 drivers/net/e100.c | 19
 drivers/net/e1000/e1000_ethtool.c | 22
 drivers/net/e1000e/ethtool.c | 21
 drivers/net/ehea/ehea_ethtool.c | 13
 drivers/net/forcedeth.c | 45
 drivers/net/fs_enet/fs_enet-main.c | 77
 drivers/net/fs_enet/mac-fcc.c | 12
 drivers/net/fs_enet/mac-fec.c | 30
 drivers/net/fs_enet/mac-scc.c | 20
 drivers/net/fs_enet/mii-bitbang.c | 10
 drivers/net/gianfar_ethtool.c | 20
 drivers/net/ibm_emac/Kconfig
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- Eric Dumazet [EMAIL PROTECTED] wrote:

> Nagendra Tomar wrote:
>> --- Davide Libenzi [EMAIL PROTECTED] wrote:
>>> On Wed, 19 Sep 2007, David Miller wrote:
>>>> From: Nagendra Tomar [EMAIL PROTECTED]
>>>> Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)
>>>>
>>>>> With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait
>>>>> call will not return, even when the incoming acks free the buffers.
>>>>> Note that this patch assumes that the SOCK_NOSPACE check in
>>>>> tcp_check_space is a trivial optimization which can be safely
>>>>> removed.
>>>>
>>>> I already replied to your patch posting explaining that whatever
>>>> is not setting SOCK_NOSPACE should be fixed instead.
>>>>
>>>> Please address that, thanks.
>>>
>>> You're not planning on putting the notion of a SOCK_NOSPACE bit inside
>>> a completely device-unaware interface like epoll, I hope?
>>
>> Definitely not! The point is that the tcp write space available
>> wakeup does not get called if the SOCK_NOSPACE bit is not set. This was
>> fine when the wakeup was merely a wakeup (since the SOCK_NOSPACE bit
>> indicated that someone really cared about the wakeup). Now, after the
>> introduction of callback'ed wakeups, we might have some work to
>> do inside the callback even if there is nobody interested in the wakeup
>> at that point of time.
>>
>> In this particular case the ep_poll_callback is not getting called and
>> hence the socket fd is not getting added to the ready list.
>
> Does it mean that with your patch each ACK on an ET-managed socket will
> trigger an epoll event?
>
> Maybe your very sensitive high-throughput application needs to set a flag
> or something at socket level to ask for such a behavior.
>
> The default should stay as is. That is, an event should be sent only if
> someone cared about the wakeup.

A high throughput app will always care about the wakeup, or else
it will not be a high throughput app in the first place. An application
that occasionally writes and then goes to slumber and then writes
again will not be a high throughput app.

My point is that the SOCK_NOSPACE check does not save us much. For a
high throughput app it will almost always be set, thus making the check
insignificant, and for the low throughput case we care less.

Thanx,
Tomar
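[Editor's sketch] The disagreement above is easier to follow with a toy model of the gating that the thread describes: the write-space wakeup runs only when a "no space" bit was set earlier, so a callback that has work to do (adding the fd to a ready list) is skipped when nobody set the bit. Everything below is hypothetical userspace code; the struct, the function names and the check_nospace switch are inventions for illustration, not the kernel's tcp_check_space() or ep_poll_callback().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; none of these names are real kernel structures. */
struct toy_sock {
	bool nospace;       /* stands in for the SOCK_NOSPACE bit */
	int  ready_events;  /* how many times the epoll-style callback ran */
};

/* Stands in for the callback that queues the fd on the ready list. */
static void toy_wakeup_callback(struct toy_sock *sk)
{
	sk->ready_events++;
}

/* Some path noticed a full send buffer and flagged interest in a wakeup. */
static void toy_mark_nospace(struct toy_sock *sk)
{
	assert(sk != NULL);
	sk->nospace = true;
}

/*
 * An ACK freed send-buffer space. With check_nospace set, the callback
 * is skipped unless interest was flagged beforehand; with it cleared,
 * the callback runs on every such ACK (the cost raised in the thread).
 */
static void toy_ack_frees_space(struct toy_sock *sk, bool check_nospace)
{
	assert(sk != NULL);
	if (check_nospace && !sk->nospace)
		return;             /* wakeup, and thus the callback, skipped */
	sk->nospace = false;
	toy_wakeup_callback(sk);
}
```

In this model a socket whose bit was never set sees zero ready events with the check enabled and one per ACK with it disabled, which is the trade-off both sides are arguing about.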
Re: [PATCH 2/7] CAN: Add PF_CAN core module
Hi Patrick,

I have done almost all changes to the code as you suggested. The
changes to use the return value of can_rx_register() also fixed a
minor flaw with failing bind() and setsockopt() on raw sockets.

But there are two things left I would like to ask/understand:

Patrick McHardy [EMAIL PROTECTED] writes:

>> When the module is unloaded it calls can_proto_unregister() which
>> clears the pointer. Do you see a race condition here?
>
> Yes, you do request_module, load the module, get the cp pointer
> from proto_tab, the module is unloaded again. cp points to
> stale memory. Using module references would fix this.

How would I use the module reference counter? Somehow with
try_module_get()? I have thought something like

	cp = proto_tab[protocol];
	if (!cp ...)
		return ...;

	if (!try_module_get(cp->prot->owner))
		return ...;
	sk = sk_alloc(...)
	module_put(...);
	return ret;

But here I see two problems:

1. Between the check !cp... and referencing cp->prot->owner the
   module could get unloaded and the reference be invalid. Is there
   some lock I can hold that prevents module unloading? I haven't
   found something like this in include/linux/module.h

2. If the module gets unloaded after the first check and
   request_module() but before the call to try_module_get() the
   socket() syscall will return with error, although module auto
   loading would normally be successful. How can I prevent that?

>> find_dev_rcv_lists() is called in one place from can_rcv() with RCU
>> lock held, as you write. The other two calls to find_dev_rcv_lists()
>> are from can_rx_register/unregister() functions which change the
>> receive lists. Therefore, we can't only use RCU but need protection
>> against simultaneous writes. We do this with the spin_lock_bh(). The
>> _bh variant, because can_rcv() runs in interrupt and we need to block
>> that. I thought this is pretty standard.
>>
>> I'll check this again tomorrow, but I have put much time in these
>> locking issues already, changed it quite a few times and hoped to have
>> got it right finally.
>
> I'm not saying you should use *only* RCU, you need the lock
> for additions/removal of course, but since the receive path
> doesn't take that lock and relies on RCU, you need to use
> the _rcu list walking variant to avoid races with concurrent
> list changes.

I have no objections to add the _rcu suffix for the code changing the
receive lists, but I don't see why it's necessary. When I do a
spin_lock_bh() before writing, can't I be sure that there is no
interrupt routine running in parallel while I hold this spinlock? If
so, there is no reader in parallel because the can_rcv() function runs
in a softirq. I'd really like to understand why you think the writers
should also use the _rcu variant.

I'm sorry if I miss something obvious here, but could you try to
explain it to me?

urs
Re: [2.6.23-rc4-mm1][Bug] kernel BUG at include/linux/netdevice.h:339!
Andrew Morton wrote:
> On Mon, 17 Sep 2007 17:46:38 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote:
>
>> Kernel Bug is hit with 2.6.23-rc4-mm1 kernel on ppc64 machine.
>>
>> kernel BUG at include/linux/netdevice.h:339!
>
> (please cc netdev@vger.kernel.org on networking-related matters)
>
> You died here:
>
> static inline void napi_complete(struct napi_struct *n)
> {
> 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
>
> The NAPI changes have had a few problems and hopefully things have been
> fixed up since then. I'll try to get rc6-mm1 out this evening, so please
> retest that?

Hi Andrew,

I have not seen this bug in 2.6.23-rc6-mm1 so far.

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
Re: [PATCH 2/7] CAN: Add PF_CAN core module
Urs Thuermann wrote:
> Patrick McHardy [EMAIL PROTECTED] writes:
>
>>> When the module is unloaded it calls can_proto_unregister() which
>>> clears the pointer. Do you see a race condition here?
>>
>> Yes, you do request_module, load the module, get the cp pointer
>> from proto_tab, the module is unloaded again. cp points to
>> stale memory. Using module references would fix this.
>
> How would I use the module reference counter? Somehow with
> try_module_get()? I have thought something like
>
> 	cp = proto_tab[protocol];
> 	if (!cp ...)
> 		return ...;
>
> 	if (!try_module_get(cp->prot->owner))
> 		return ...;
> 	sk = sk_alloc(...)
> 	module_put(...);
> 	return ret;
>
> But here I see two problems:
>
> 1. Between the check !cp... and referencing cp->prot->owner the
>    module could get unloaded and the reference be invalid. Is there
>    some lock I can hold that prevents module unloading? I haven't
>    found something like this in include/linux/module.h

No, you need to add your own locking to prevent this, something
like this:

registration/unregistration:

take lock
change proto_tab[]
release lock

lookup:

take lock
cp = proto_tab[]
if (cp && !try_module_get(cp->owner))
	cp = NULL
release lock

> 2. If the module gets unloaded after the first check and
>    request_module() but before the call to try_module_get() the
>    socket() syscall will return with error, although module auto
>    loading would normally be successful. How can I prevent that?

Why do you want to prevent it? The admin unloaded the module,
so he apparently doesn't want the operation to succeed.

>>> find_dev_rcv_lists() is called in one place from can_rcv() with RCU
>>> lock held, as you write. The other two calls to find_dev_rcv_lists()
>>> are from can_rx_register/unregister() functions which change the
>>> receive lists. Therefore, we can't only use RCU but need protection
>>> against simultaneous writes. We do this with the spin_lock_bh(). The
>>> _bh variant, because can_rcv() runs in interrupt and we need to block
>>> that. I thought this is pretty standard.
>>>
>>> I'll check this again tomorrow, but I have put much time in these
>>> locking issues already, changed it quite a few times and hoped to have
>>> got it right finally.
>>
>> I'm not saying you should use *only* RCU, you need the lock
>> for additions/removal of course, but since the receive path
>> doesn't take that lock and relies on RCU, you need to use
>> the _rcu list walking variant to avoid races with concurrent
>> list changes.
>
> I have no objections to add the _rcu suffix for the code changing the
> receive lists, but I don't see why it's necessary. When I do a
> spin_lock_bh() before writing, can't I be sure that there is no
> interrupt routine running in parallel while I hold this spinlock? If
> so, there is no reader in parallel because the can_rcv() function runs
> in a softirq. I'd really like to understand why you think the writers
> should also use the _rcu variant.

I'm saying you need _rcu for the *read side*. All operations
changing the list already use the _rcu variants.

> I'm sorry if I miss something obvious here, but could you try to
> explain it to me?

spin_lock_bh only disables BHs locally, other CPUs can still process
softirqs. And since rcv_lists_lock is only used in process context,
the BH disabling is actually not even necessary.
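[Editor's sketch] The lookup scheme outlined above (take lock, read proto_tab[], try to take a module reference, release lock) can be modelled in userspace. This is only an analog under stated assumptions: a pthread mutex stands in for whatever kernel lock would be used, toy_module_get() mimics the "may fail while unloading" behaviour of try_module_get(), and all toy_* names and the table size are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPROTO 8

/* Hypothetical stand-ins for struct module and the protocol entry. */
struct toy_module {
	int  refcnt;
	bool going_away;    /* set once unloading has begun */
};

struct toy_proto {
	struct toy_module *owner;
};

static struct toy_proto *proto_tab[TOY_NPROTO];
static pthread_mutex_t proto_tab_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analog of try_module_get(): no new references once unload started. */
static bool toy_module_get(struct toy_module *m)
{
	if (m->going_away)
		return false;
	m->refcnt++;
	return true;
}

/* Registration and unregistration change the table under the lock. */
static void toy_proto_register(int idx, struct toy_proto *cp)
{
	assert(idx >= 0 && idx < TOY_NPROTO);
	pthread_mutex_lock(&proto_tab_lock);
	proto_tab[idx] = cp;
	pthread_mutex_unlock(&proto_tab_lock);
}

static void toy_proto_unregister(int idx)
{
	toy_proto_register(idx, NULL);
}

/* Lookup: the pointer read and the reference grab share one critical
 * section, so the entry cannot vanish between the two steps. */
static struct toy_proto *toy_proto_get(int idx)
{
	struct toy_proto *cp;

	assert(idx >= 0 && idx < TOY_NPROTO);
	pthread_mutex_lock(&proto_tab_lock);
	cp = proto_tab[idx];
	if (cp && !toy_module_get(cp->owner))
		cp = NULL;
	pthread_mutex_unlock(&proto_tab_lock);
	return cp;
}

/* Drop the reference taken by toy_proto_get(). */
static void toy_proto_put(struct toy_proto *cp)
{
	pthread_mutex_lock(&proto_tab_lock);
	cp->owner->refcnt--;
	pthread_mutex_unlock(&proto_tab_lock);
}
```

The point of the design is that a caller either gets a protocol entry with a reference already held or gets NULL; there is no window in which it holds a bare pointer to an entry that can be unregistered underneath it.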
Re: [PATCH] pci: Fix e100 interrupt quirk
Andrew Morton wrote:
> On Tue, 18 Sep 2007 15:17:37 +0400 Valentine Barshak [EMAIL PROTECTED] wrote:
>
>> PCI memory space may have a 64-bit offset on some architectures
>> (for example, PowerPC 440), and the actual PCI memory address has to be
>> fixed up (an offset to PCI mem space should be added) before remapping.
>> So, pci_iomap should be used instead of reading and remapping the PCI BAR
>> directly. This has been tested on the Sequoia PowerPC 440EPx board.
>>
>> Signed-off-by: Valentine Barshak [EMAIL PROTECTED]
>> ---
>> --- linux-2.6.orig/drivers/pci/quirks.c	2007-09-04 21:15:43.0 +0400
>> +++ linux-2.6.bld/drivers/pci/quirks.c	2007-09-05 20:46:14.0 +0400
>> @@ -1444,9 +1444,9 @@
>>  static void __devinit quirk_e100_interrupt(struct pci_dev *dev)
>>  {
>>  	u16 command;
>> -	u32 bar;
>>  	u8 __iomem *csr;
>>  	u8 cmd_hi;
>> +	int rc;
>>  	switch (dev->device) {
>>  	/* PCI IDs taken from drivers/net/e100.c */
>> @@ -1476,16 +1476,17 @@
>>  	 * re-enable them when it's ready.
>>  	 */
>>  	pci_read_config_word(dev, PCI_COMMAND, &command);
>> -	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &bar);
>> -	if (!(command & PCI_COMMAND_MEMORY) || !bar)
>> +	rc = pci_request_region(dev, 0, "e100_quirk");
>> +
>> +	if (!(command & PCI_COMMAND_MEMORY) || (rc < 0))
>>  		return;
>
> Really? So if pci_request_region() failed and
> !(command & PCI_COMMAND_MEMORY), we leak the region? So the next call
> to this function will fail?

I've split the command and request-region checks and submitted a new patch:
http://lkml.org/lkml/2007/9/19/106
Please take a look.
Thanks,
Valentine.

>> -	csr = ioremap(bar, 8);
>> +	csr = pci_iomap(dev, 0, 8);
>>  	if (!csr) {
>>  		printk(KERN_WARNING "PCI: Can't map %s e100 registers\n",
>>  			pci_name(dev));
>> -		return;
>> +		goto e100_quirk_exit;
>>  	}
>>  	cmd_hi = readb(csr + 3);
>> @@ -1495,7 +1496,9 @@
>>  		writeb(1, csr + 3);
>>  	}
>> -	iounmap(csr);
>> +	pci_iounmap(dev, csr);
>> +e100_quirk_exit:
>> +	pci_release_region(dev, 0);
>>  }
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_e100_interrupt);
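[Editor's sketch] The leak raised in the review comes from testing two conditions in one "if" after one of them has already acquired a resource. Below is a minimal userspace model of the ordering that avoids it: check the command bit first, acquire the region second, and route every later failure through a single release label. It is not the patch posted at the URL above; struct toy_dev and the toy_* functions are hypothetical stand-ins for the PCI calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device; every name here is hypothetical, not a kernel API. */
struct toy_dev {
	bool mem_enabled;   /* models command & PCI_COMMAND_MEMORY */
	bool region_held;   /* true while BAR 0 is reserved */
	bool map_fails;     /* force the mapping step to fail */
};

static int toy_request_region(struct toy_dev *dev)
{
	if (dev->region_held)
		return -1;      /* already reserved, e.g. leaked earlier */
	dev->region_held = true;
	return 0;
}

static void toy_release_region(struct toy_dev *dev)
{
	dev->region_held = false;
}

/* Returns 0 if the quirk ran, -1 if it bailed out early. */
static int toy_quirk(struct toy_dev *dev)
{
	int ret = -1;

	assert(dev != NULL);

	/* Check the command bit before acquiring anything. */
	if (!dev->mem_enabled)
		return -1;

	if (toy_request_region(dev) < 0)
		return -1;

	if (dev->map_fails)
		goto out_release;   /* region is held, so release it */

	ret = 0;                    /* register access would happen here */

out_release:
	toy_release_region(dev);
	return ret;
}
```

With this ordering the region is never held on any return path, so a second call on the same device can still reserve it.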
Re: [PATCH 2/7] CAN: Add PF_CAN core module
Patrick McHardy [EMAIL PROTECTED] writes:

> No, you need to add your own locking to prevent this, something
> like this:
>
> registration/unregistration:
>
> take lock
> change proto_tab[]
> release lock
>
> lookup:
>
> take lock
> cp = proto_tab[]
> if (cp && !try_module_get(cp->owner))
> 	cp = NULL
> release lock

Ah, ok. Thanks for that hint. I will add it that way.

>> 2. If the module gets unloaded after the first check and
>>    request_module() but before the call to try_module_get() the
>>    socket() syscall will return with error, although module auto
>>    loading would normally be successful. How can I prevent that?
>
> Why do you want to prevent it? The admin unloaded the module,
> so he apparently doesn't want the operation to succeed.

Well, unloading a module doesn't usually cause the operation to fail
when auto loading is enabled. It only wouldn't succeed when the
unload happens in the small window between test/request-module and
the call to try_module_get(). This looks ugly to me. But the lock you
described above would also solve this.

> I'm saying you need _rcu for the *read side*. All operations
> changing the list already use the _rcu variants.
>
>> I'm sorry if I miss something obvious here, but could you try to
>> explain it to me?
>
> spin_lock_bh only disables BHs locally, other CPUs can still process
> softirqs. And since rcv_lists_lock is only used in process context,
> the BH disabling is actually not even necessary.

Well, I finally (hopefully) got it and I have changed the code
accordingly. Thanks for your explanation.

I will post our updated code again, probably today. The issues still
left are

* module parameter for loopback, but we want to keep that.
* configure option for allowing normal users access to raw and bcm CAN
  sockets. I'll check how easily an (embedded) system can be set up
  to run relevant/all processes with the CAP_NET_RAW capability. I
  would like to kill that configure option.
* seq_files for proc fs. On my TODO list.

urs
Re: [LARTC] ifb and ppp
Please keep netdev and myself CCed.

Frithjof Hammer wrote:
>> Does this patch help?
>
> A further examination:
>
> [...]
> printk ("fri: mein type %x\n", dev->type);
> switch (dev->type) {
> [...]
>
> shows this:
>
> [EMAIL PROTECTED]:/usr/src/linux-source-2.6.21# dmesg | grep fri
> fri: mein type 1
>
> that is defined as ARPHRD_ETHER in include/linux/if_arp.h. As far as I
> understand, this means that my ppp0 device is recognized as an
> Ethernet interface.
>
> Any further help/ideas?

I misread the code; the device it looks at in tcf_mirred_init
is the target device (ifb). So what it does is check whether
the target device wants a link layer header and, if it does,
restore the one from the source device. So currently it seems
impossible to get rid of the PPP(oE) header.

Jamal, is that how it's supposed to work?
[PATCH net-2.6.24 0/9]: TCP improvements & cleanups
Hi Dave,

Just in case you're short on what to do ;-) here are some TCP
related cleanups & improvements to net-2.6.24. Including FRTO undo
fix which finally should allow FRTO to be turned on, and some
simple fastpath tweaks simple enough for the 2.6.24 schedule.
...I've a larger fastpath_hint removal patch coming up later too,
but it's really a monster which needs more time, though I guess it
could really cut down the SACK processing latencies people are
experiencing with high-speed flows (I'll probably post it with RFC
once you've picked these up).

These were boot (& couple of hours) tested on top of
net-2.6.24 (something after the first large rebase you did, so
you could count that as a success report of it too :-)). Not sure
if all those fragment/collapse paths I modified got executed
though.

--
 i.
[PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb
In general, it should not be necessary to call tcp_fragment for
already SACKed skbs, but it's better to be safe than sorry. And
indeed, it can be called from sacktag when a DSACK arrives or
some ACK (with SACK) reordering occurs (sacktag could be made
to avoid the call in the latter case, though I'm not sure if it's
worth the trouble and added complexity to cover such a marginal
case).

The collapse case has a return for the SACKED_ACKED case earlier, so
just WARN_ON if internal inconsistency is detected for some
reason.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c |    7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d65d17b..9df5b2a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -692,6 +692,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	TCP_SKB_CB(buff)->end_seq = TCP_SKB_CB(skb)->end_seq;
 	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(buff)->seq;
 
+	if (tp->sacked_out && (TCP_SKB_CB(skb)->seq == tp->highest_sack))
+		tp->highest_sack = TCP_SKB_CB(buff)->seq;
+
 	/* PSH and FIN should only be set in the second packet. */
 	flags = TCP_SKB_CB(skb)->flags;
 	TCP_SKB_CB(skb)->flags = flags & ~(TCPCB_FLAG_FIN|TCPCB_FLAG_PSH);
@@ -1723,6 +1726,10 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		/* Update sequence range on original skb. */
 		TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
 
+		if (WARN_ON(tp->sacked_out &&
+		    (TCP_SKB_CB(next_skb)->seq == tp->highest_sack)))
+			return;
+
 		/* Merge over control information. */
 		flags |= TCP_SKB_CB(next_skb)->flags; /* This moves PSH/FIN etc. over */
 		TCP_SKB_CB(skb)->flags = flags;
--
1.5.0.6
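[Editor's sketch] The tcp_fragment hunk above keeps highest_sack pointing at the highest SACKed skb when that skb is split: the tail half starts at a higher sequence number, so the marker moves to it. A small userspace model of just that bookkeeping follows. The toy_* types are invented for the example, and only the three sequence assignments and the added "if" mirror the patch text.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins; only the fields the hunk touches are modelled. */
struct toy_skb {
	uint32_t seq;       /* first sequence number covered */
	uint32_t end_seq;   /* one past the last sequence number */
};

struct toy_tp {
	uint32_t highest_sack;  /* start seq of the highest SACKed skb */
	unsigned sacked_out;    /* number of SACKed segments */
};

/*
 * Split skb after len bytes; buff receives the tail. The final "if"
 * mirrors the added lines: when the split skb is the one highest_sack
 * refers to, the marker moves to the new tail segment.
 */
static void toy_fragment(struct toy_tp *tp, struct toy_skb *skb,
			 struct toy_skb *buff, uint32_t len)
{
	assert(len > 0 && len < skb->end_seq - skb->seq);

	buff->seq = skb->seq + len;
	buff->end_seq = skb->end_seq;
	skb->end_seq = buff->seq;

	if (tp->sacked_out && skb->seq == tp->highest_sack)
		tp->highest_sack = buff->seq;
}
```

Splitting an skb covering [1000, 2000) at 400 bytes while highest_sack is 1000 leaves [1000, 1400) and [1400, 2000) with highest_sack moved to 1400; if highest_sack referred to some other skb, it is left alone.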
[PATCH 2/9] [TCP]: Make fackets_out accurate
Subtraction for fackets_out is unconditional when snd_una advances,
thus there's no need to do it inside the loop. Just make sure correct
bounds are honored.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c  |   10 +++---
 net/ipv4/tcp_output.c |   44 ++--
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fd0ae4d..09b6b1d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2302,8 +2302,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	 * 1. Reno does not count dupacks (sacked_out) automatically. */
 	if (!tp->packets_out)
 		tp->sacked_out = 0;
-	/* 2. SACK counts snd_fack in packets inaccurately. */
-	if (tp->sacked_out == 0)
+
+	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
 		tp->fackets_out = 0;
 
 	/* Now state machine starts.
@@ -2571,10 +2571,6 @@ static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
 		} else if (*seq_rtt < 0)
 			*seq_rtt = now - scb->when;
 
-		if (tp->fackets_out) {
-			__u32 dval = min(tp->fackets_out, packets_acked);
-			tp->fackets_out -= dval;
-		}
 		tp->packets_out -= packets_acked;
 
 		BUG_ON(tcp_skb_pcount(skb) == 0);
@@ -2657,7 +2653,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 				seq_rtt = now - scb->when;
 			last_ackt = skb->tstamp;
 		}
-		tcp_dec_pcount_approx(&tp->fackets_out, skb);
 		tp->packets_out -= tcp_skb_pcount(skb);
 		tcp_unlink_write_queue(skb, sk);
 		sk_stream_free_skb(sk, skb);
@@ -2672,6 +2667,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		tcp_ack_update_rtt(sk, acked, seq_rtt);
 		tcp_rearm_rto(sk);
 
+		tp->fackets_out -= min(pkts_acked, tp->fackets_out);
 		if (tcp_is_reno(tp))
 			tcp_remove_reno_sacks(sk, pkts_acked);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9df5b2a..cbe8bf6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -652,6 +652,26 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb, unsigned
 	}
 }
 
+/* When a modification to fackets out becomes necessary, we need to check
+ * skb is counted to fackets_out or not. Another important thing is to
+ * tweak SACK fastpath hint too as it would overwrite all changes unless
+ * hint is also changed.
+ */
+static void tcp_adjust_fackets_out(struct tcp_sock *tp, struct sk_buff *skb,
+				   int decr)
+{
+	if (!tp->sacked_out)
+		return;
+
+	if (!before(tp->highest_sack, TCP_SKB_CB(skb)->seq))
+		tp->fackets_out -= decr;
+
+	/* cnt_hint is off-by-one compared with fackets_out (see sacktag) */
+	if (tp->fastpath_skb_hint != NULL &&
+	    after(TCP_SKB_CB(tp->fastpath_skb_hint)->seq, TCP_SKB_CB(skb)->seq))
+		tp->fastpath_cnt_hint -= decr;
+}
+
 /* Function to create two new TCP segments.  Shrinks the given segment
  * to the specified size and appends a new segment with the rest of the
  * packet to the list.  This won't be called frequently, I hope.
@@ -746,21 +766,12 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
 			tp->lost_out -= diff;
 
-		if (diff > 0) {
-			/* Adjust Reno SACK estimate. */
-			if (tcp_is_reno(tp)) {
-				tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
-				tcp_verify_left_out(tp);
-			}
-
-			tcp_dec_pcount_approx_int(&tp->fackets_out, diff);
-			/* SACK fastpath might overwrite it unless dealt with */
-			if (tp->fastpath_skb_hint != NULL &&
-			    after(TCP_SKB_CB(tp->fastpath_skb_hint)->seq,
-				  TCP_SKB_CB(skb)->seq)) {
-				tcp_dec_pcount_approx_int(&tp->fastpath_cnt_hint, diff);
-			}
+		/* Adjust Reno SACK estimate. */
+		if (tcp_is_reno(tp) && diff > 0) {
+			tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
+			tcp_verify_left_out(tp);
 		}
+		tcp_adjust_fackets_out(tp, skb, diff);
 	}
 
 	/* Link BUFF into the send queue. */
@@ -1746,10 +1757,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		if (tcp_is_reno(tp) && tp->sacked_out)
 			tcp_dec_pcount_approx(&tp->sacked_out, next_skb);
 
-
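The bounds handling in this patch rests on two small idioms: a subtraction clamped at zero (the min() form added to tcp_clean_rtx_queue()) and wrap-safe sequence comparison (the before()/after() calls in tcp_adjust_fackets_out()). A minimal user-space sketch of both follows. The names seq_before(), seq_after() and clamp_sub() are invented for this illustration; they are modelled on the kernel helpers, not copied from them.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe comparison: true if seq1 precedes seq2 modulo 2^32.
 * Illustrative stand-in for the kernel's before(). */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Illustrative stand-in for the kernel's after(). */
static int seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}

/* Subtract decr from counter without going below zero, in the spirit of
 * "tp->fackets_out -= min(pkts_acked, tp->fackets_out)" from the patch. */
static uint32_t clamp_sub(uint32_t counter, uint32_t decr)
{
	uint32_t result = counter - (decr < counter ? decr : counter);

	assert(result <= counter);	/* an unsigned underflow would break this */
	return result;
}
```

The signed cast is what keeps the comparison correct when sequence numbers wrap past 2^32, and the clamp is what lets the counter be decremented by pkts_acked even when fewer packets were counted in fackets_out.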
[PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_
In addition, fix its function comment spacing.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/tcp.h     |    4 ++--
 net/ipv4/tcp_input.c  |   10 +-
 net/ipv4/tcp_output.c |    6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f28f382..16dfe3c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1066,8 +1066,8 @@ static inline void tcp_mib_init(void)
 	TCP_ADD_STATS_USER(TCP_MIB_MAXCONN, -1);
 }
 
-/*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+/* from STCP */
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 09b6b1d..89162a9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1670,7 +1670,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 	tp->high_seq = tp->frto_highmark;
 	TCP_ECN_queue_cwr(tp);
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 }
 
 void tcp_clear_retrans(struct tcp_sock *tp)
@@ -1741,7 +1741,7 @@ void tcp_enter_loss(struct sock *sk, int how)
 	/* Abort FRTO algorithm if one is in progress */
 	tp->frto_counter = 0;
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 }
 
 static int tcp_check_sack_reneging(struct sock *sk)
@@ -2106,7 +2106,7 @@ static void tcp_undo_cwr(struct sock *sk, const int undo)
 
 	/* There is something screwy going on with the retrans hints after
 	   an undo */
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 }
 
 static inline int tcp_may_undo(struct tcp_sock *tp)
@@ -2199,7 +2199,7 @@ static int tcp_try_undo_loss(struct sock *sk)
 			TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
 		}
 
-		clear_all_retrans_hints(tp);
+		tcp_clear_all_retrans_hints(tp);
 
 		DBGUNDO(sk, "partial loss");
 		tp->lost_out = 0;
@@ -2656,7 +2656,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		tp->packets_out -= tcp_skb_pcount(skb);
 		tcp_unlink_write_queue(skb, sk);
 		sk_stream_free_skb(sk, skb);
-		clear_all_retrans_hints(tp);
+		tcp_clear_all_retrans_hints(tp);
 	}
 
 	if (acked&FLAG_ACKED) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cbe8bf6..f46d24b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -687,7 +687,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 
 	BUG_ON(len > skb->len);
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 	nsize = skb_headlen(skb) - len;
 	if (nsize < 0)
 		nsize = 0;
@@ -1719,7 +1719,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		       tcp_skb_pcount(next_skb) != 1);
 
 		/* changing transmit queue under us so clear hints */
-		clear_all_retrans_hints(tp);
+		tcp_clear_all_retrans_hints(tp);
 
 		/* Ok.	We will be able to collapse the packet. */
 		tcp_unlink_write_queue(next_skb, sk);
@@ -1792,7 +1792,7 @@ void tcp_simple_retransmit(struct sock *sk)
 		}
 	}
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 
 	if (!lost)
 		return;
-- 
1.5.0.6
[PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue
The accounting code is pretty much the same, so it's a shame we do it
in two places.

I'm not too sure if the added fully_acked check in MTU probing is
really what we want; perhaps the added end_seq could be used in the
after() comparison.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   75 +
 1 files changed, 32 insertions(+), 43 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 89162a9..d340fd5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2528,14 +2528,12 @@ static void tcp_rearm_rto(struct sock *sk)
 	}
 }
 
-static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
-			 __u32 now, __s32 *seq_rtt)
+static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
 	__u32 seq = tp->snd_una;
 	__u32 packets_acked;
-	int acked = 0;
 
 	/* If we get here, the whole TSO packet has not been
 	 * acked.
@@ -2548,36 +2546,11 @@ static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
 	packets_acked -= tcp_skb_pcount(skb);
 
 	if (packets_acked) {
-		__u8 sacked = scb->sacked;
-
-		acked |= FLAG_DATA_ACKED;
-		if (sacked) {
-			if (sacked & TCPCB_RETRANS) {
-				if (sacked & TCPCB_SACKED_RETRANS)
-					tp->retrans_out -= packets_acked;
-				acked |= FLAG_RETRANS_DATA_ACKED;
-				*seq_rtt = -1;
-			} else if (*seq_rtt < 0)
-				*seq_rtt = now - scb->when;
-			if (sacked & TCPCB_SACKED_ACKED)
-				tp->sacked_out -= packets_acked;
-			if (sacked & TCPCB_LOST)
-				tp->lost_out -= packets_acked;
-			if (sacked & TCPCB_URG) {
-				if (tp->urg_mode &&
-				    !before(seq, tp->snd_up))
-					tp->urg_mode = 0;
-			}
-		} else if (*seq_rtt < 0)
-			*seq_rtt = now - scb->when;
-
-		tp->packets_out -= packets_acked;
-
 		BUG_ON(tcp_skb_pcount(skb) == 0);
 		BUG_ON(!before(scb->seq, scb->end_seq));
 	}
 
-	return acked;
+	return packets_acked;
 }
 
 /* Remove acknowledged frames from the retransmission queue. */
@@ -2587,6 +2560,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_buff *skb;
 	__u32 now = tcp_time_stamp;
+	int fully_acked = 1;
 	int acked = 0;
 	int prior_packets = tp->packets_out;
 	__s32 seq_rtt = -1;
@@ -2595,6 +2569,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 	while ((skb = tcp_write_queue_head(sk)) &&
 	       skb != tcp_send_head(sk)) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
+		u32 end_seq;
+		u32 packets_acked;
 		__u8 sacked = scb->sacked;
 
 		/* If our packet is before the ack sequence we can
@@ -2602,11 +2578,19 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		 * the other end.
 		 */
 		if (after(scb->end_seq, tp->snd_una)) {
-			if (tcp_skb_pcount(skb) > 1 &&
-			    after(tp->snd_una, scb->seq))
-				acked |= tcp_tso_acked(sk, skb,
-						       now, &seq_rtt);
-			break;
+			if (tcp_skb_pcount(skb) == 1 ||
+			    !after(tp->snd_una, scb->seq))
+				break;
+
+			packets_acked = tcp_tso_acked(sk, skb);
+			if (!packets_acked)
+				break;
+
+			fully_acked = 0;
+			end_seq = tp->snd_una;
+		} else {
+			packets_acked = tcp_skb_pcount(skb);
+			end_seq = scb->end_seq;
 		}
 
 		/* Initial outgoing SYN's get put onto the write_queue
@@ -2624,7 +2608,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		}
 
 		/* MTU probing checks */
-		if (icsk->icsk_mtup.probe_size) {
+		if (fully_acked && icsk->icsk_mtup.probe_size) {
 			if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) {
 				tcp_mtup_probe_success(sk, skb);
 			}
@@ -2633,27 +2617,32 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32
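After this change tcp_tso_acked() only reports how many segments of a partially acked TSO skb were covered by the ACK, and the caller does the accounting. The counting itself can be sketched in user space as below. This is a simplified model under stated assumptions: struct tso_model, model_pcount() and model_tso_acked() are hypothetical names, segments are assumed to be exactly MSS-sized, and moving seq forward stands in for tcp_trim_head().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb covering [seq, end_seq), sent as
 * MSS-sized segments. Not a kernel structure. */
struct tso_model {
	uint32_t seq;
	uint32_t end_seq;
	uint32_t mss;
};

/* Number of segments the model currently represents (its "pcount"). */
static uint32_t model_pcount(const struct tso_model *skb)
{
	uint32_t len = skb->end_seq - skb->seq;

	return (len + skb->mss - 1) / skb->mss;
}

/* Trim the acked head and return how many whole segments disappeared:
 * pcount before trimming minus pcount after trimming. */
static uint32_t model_tso_acked(struct tso_model *skb, uint32_t snd_una)
{
	uint32_t packets_acked;

	/* The caller only gets here on a partial ack: seq < snd_una < end_seq. */
	assert((int32_t)(snd_una - skb->seq) > 0);
	assert((int32_t)(skb->end_seq - snd_una) > 0);

	packets_acked = model_pcount(skb);
	skb->seq = snd_una;		/* stands in for tcp_trim_head() */
	packets_acked -= model_pcount(skb);

	return packets_acked;
}
```

A return value of zero is possible when the ACK ends inside the first segment, which is why the reworked loop in tcp_clean_rtx_queue() breaks out when packets_acked is zero.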
[PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users
Basically this change enables it, previously other undo_marker users
were left with nothing. Reverse undo_marker logic completely to get
it set right in CA_Loss. On the other hand, when spurious RTO is
detected, clear it. Clearing might be too heavy for some scenarios
but seems safe enough starting point for now and shouldn't have much
effect except in majority of cases (if in any).

By adding a new FLAG_ we avoid looping through write_queue when RTO
occurs.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   42 +++---
 1 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 74accb0..948e79a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -104,6 +104,7 @@ int sysctl_tcp_abc __read_mostly;
 #define FLAG_ONLY_ORIG_SACKED	0x200 /* SACKs only non-rexmit sent before RTO */
 #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
 #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained DSACK info */
+#define FLAG_NONHEAD_RETRANS_ACKED	0x1000 /* Non-head rexmitted data was ACKed */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -1597,6 +1598,8 @@ void tcp_enter_frto(struct sock *sk)
 	tp->undo_retrans = 0;
 
 	skb = tcp_write_queue_head(sk);
+	if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
+		tp->undo_marker = 0;
 	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
 		TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
 		tp->retrans_out -= tcp_skb_pcount(skb);
@@ -1646,6 +1649,8 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 			/* ...enter this if branch just for the first segment */
 			flag |= FLAG_DATA_ACKED;
 		} else {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
+				tp->undo_marker = 0;
 			TCP_SKB_CB(skb)->sacked &= ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
 		}
 
@@ -1661,7 +1666,6 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
 	tp->snd_cwnd_stamp = tcp_time_stamp;
-	tp->undo_marker = 0;
 	tp->frto_counter = 0;
 
 	tp->reordering = min_t(unsigned int, tp->reordering,
@@ -2587,20 +2591,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 			end_seq = scb->end_seq;
 		}
 
-		/* Initial outgoing SYN's get put onto the write_queue
-		 * just like anything else we transmit.  It is not
-		 * true data, and if we misinform our callers that
-		 * this ACK acks real data, we will erroneously exit
-		 * connection startup slow start one packet too
-		 * quickly.  This is severely frowned upon behavior.
-		 */
-		if (!(scb->flags & TCPCB_FLAG_SYN)) {
-			flag |= FLAG_DATA_ACKED;
-		} else {
-			flag |= FLAG_SYN_ACKED;
-			tp->retrans_stamp = 0;
-		}
-
 		/* MTU probing checks */
 		if (fully_acked && icsk->icsk_mtup.probe_size &&
 		    !after(tp->mtu_probe.probe_seq_end, scb->end_seq)) {
@@ -2613,6 +2603,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 					tp->retrans_out -= packets_acked;
 				flag |= FLAG_RETRANS_DATA_ACKED;
 				seq_rtt = -1;
+				if ((flag & FLAG_DATA_ACKED) ||
+				    (packets_acked > 1))
+					flag |= FLAG_NONHEAD_RETRANS_ACKED;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
 				if (fully_acked)
@@ -2634,6 +2627,20 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 		}
 		tp->packets_out -= packets_acked;
 
+		/* Initial outgoing SYN's get put onto the write_queue
+		 * just like anything else we transmit.  It is not
+		 * true data, and if we misinform our callers that
+		 * this ACK acks real data, we will erroneously exit
+		 * connection startup slow start one packet too
+		 * quickly.  This is severely frowned upon behavior.
+		 */
+		if (!(scb->flags & TCPCB_FLAG_SYN)) {
+			flag |= FLAG_DATA_ACKED;
+		} else {
+			flag |= FLAG_SYN_ACKED;
+			tp->retrans_stamp = 0;
+		}
+
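The reason the SYN/DATA flag block moves below the retransmission accounting appears to be ordering: while the head skb is being processed FLAG_DATA_ACKED is still clear, so the new test only fires for later skbs or for a multi-segment head. The sketch below models that ordering in user space. It is an illustration, not kernel code: account_acked_skb() is a hypothetical helper, and only the 0x1000 value is taken from the patch; the other two flag values are assumptions made for the sketch.

```c
#include <assert.h>

#define FLAG_DATA_ACKED			0x04	/* value assumed for this sketch */
#define FLAG_RETRANS_DATA_ACKED		0x08	/* value assumed for this sketch */
#define FLAG_NONHEAD_RETRANS_ACKED	0x1000	/* value taken from the patch */

/* Account one acked skb and return the updated flag word. The caller
 * invokes this once per skb, head of the queue first. */
static int account_acked_skb(int flag, int was_retransmitted,
			     unsigned int packets_acked)
{
	assert(packets_acked > 0);

	if (was_retransmitted) {
		flag |= FLAG_RETRANS_DATA_ACKED;
		/* FLAG_DATA_ACKED is only set once an earlier skb has been
		 * accounted, so this marks retransmitted data beyond the head. */
		if ((flag & FLAG_DATA_ACKED) || packets_acked > 1)
			flag |= FLAG_NONHEAD_RETRANS_ACKED;
	}

	/* Set after the check above, mirroring the reordering in the patch. */
	flag |= FLAG_DATA_ACKED;
	return flag;
}
```

If FLAG_DATA_ACKED were set before the check, as in the pre-patch ordering, every acked retransmission would look like non-head data.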
[PATCH 7/9] [TCP] FRTO: Update sysctl documentation
Since the SACK enhanced FRTO was added, the code has been under test
numerous times so remove experimental claim from the documentation.
Also be a bit more verbose about the usage.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 Documentation/networking/ip-sysctl.txt |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 32c2e9d..6ae2fef 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -180,13 +180,20 @@ tcp_fin_timeout - INTEGER
 	to live longer.	Cf. tcp_max_orphans.
 
 tcp_frto - INTEGER
-	Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
+	Enables Forward RTO-Recovery (F-RTO) defined in RFC4138.
+	F-RTO is an enhanced recovery algorithm for TCP retransmission
 	timeouts.  It is particularly beneficial in wireless environments
 	where packet loss is typically due to random radio interference
-	rather than intermediate router congestion. If set to 1, basic
-	version is enabled. 2 enables SACK enhanced F-RTO, which is
-	EXPERIMENTAL. The basic version can be used also when SACK is
-	enabled for a flow through tcp_sack sysctl.
+	rather than intermediate router congestion.  FRTO is sender-side
+	only modification. Therefore it does not require any support from
+	the peer, but in a typical case, however, where wireless link is
+	the local access link and most of the data flows downlink, the
+	faraway servers should have FRTO enabled to take advantage of it.
+	If set to 1, basic version is enabled.  2 enables SACK enhanced
+	F-RTO if flow uses SACK.  The basic version can be used also when
+	SACK is in use though scenario(s) with it exists where FRTO
+	interacts badly with the packet counting of the SACK enabled TCP
+	flow.
 
 tcp_frto_response - INTEGER
 	When F-RTO has detected that a TCP retransmission timeout was
-- 
1.5.0.6
[PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue
Implements following cleanups:
- Comment re-placement (CodingStyle)
- tcp_tso_acked() local (wrapper-like) variable removal
  (readability)
- __-types removed (IMHO they make local variables jumpy looking
  and just waste space)
- acked -> flag (naming conventions elsewhere in TCP code)
- linebreak adjustments (readability)
- nested if()s combined (reduced indentation)
- clarifying newlines added

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   66 ++---
 1 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d340fd5..74accb0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2528,55 +2528,49 @@ static void tcp_rearm_rto(struct sock *sk)
 	}
 }
 
+/* If we get here, the whole TSO packet has not been acked. */
 static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
-	__u32 seq = tp->snd_una;
-	__u32 packets_acked;
+	u32 packets_acked;
 
-	/* If we get here, the whole TSO packet has not been
-	 * acked.
-	 */
-	BUG_ON(!after(scb->end_seq, seq));
+	BUG_ON(!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una));
 
 	packets_acked = tcp_skb_pcount(skb);
-	if (tcp_trim_head(sk, skb, seq - scb->seq))
+	if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
 		return 0;
 	packets_acked -= tcp_skb_pcount(skb);
 
 	if (packets_acked) {
 		BUG_ON(tcp_skb_pcount(skb) == 0);
-		BUG_ON(!before(scb->seq, scb->end_seq));
+		BUG_ON(!before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq));
 	}
 
 	return packets_acked;
 }
 
-/* Remove acknowledged frames from the retransmission queue. */
-static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
+/* Remove acknowledged frames from the retransmission queue. If our packet
+ * is before the ack sequence we can discard it as it's confirmed to have
+ * arrived at the other end.
+ */
+static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_buff *skb;
-	__u32 now = tcp_time_stamp;
+	u32 now = tcp_time_stamp;
 	int fully_acked = 1;
-	int acked = 0;
+	int flag = 0;
 	int prior_packets = tp->packets_out;
-	__s32 seq_rtt = -1;
+	s32 seq_rtt = -1;
 	ktime_t last_ackt = net_invalid_timestamp();
 
-	while ((skb = tcp_write_queue_head(sk)) &&
-	       skb != tcp_send_head(sk)) {
+	while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
 		u32 end_seq;
 		u32 packets_acked;
-		__u8 sacked = scb->sacked;
+		u8 sacked = scb->sacked;
 
-		/* If our packet is before the ack sequence we can
-		 * discard it as it's confirmed to have arrived at
-		 * the other end.
-		 */
 		if (after(scb->end_seq, tp->snd_una)) {
 			if (tcp_skb_pcount(skb) == 1 ||
 			    !after(tp->snd_una, scb->seq))
@@ -2601,38 +2595,38 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		 * quickly.  This is severely frowned upon behavior.
 		 */
 		if (!(scb->flags & TCPCB_FLAG_SYN)) {
-			acked |= FLAG_DATA_ACKED;
+			flag |= FLAG_DATA_ACKED;
 		} else {
-			acked |= FLAG_SYN_ACKED;
+			flag |= FLAG_SYN_ACKED;
 			tp->retrans_stamp = 0;
 		}
 
 		/* MTU probing checks */
-		if (fully_acked && icsk->icsk_mtup.probe_size) {
-			if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) {
-				tcp_mtup_probe_success(sk, skb);
-			}
+		if (fully_acked && icsk->icsk_mtup.probe_size &&
+		    !after(tp->mtu_probe.probe_seq_end, scb->end_seq)) {
+			tcp_mtup_probe_success(sk, skb);
 		}
 
 		if (sacked) {
 			if (sacked & TCPCB_RETRANS) {
 				if (sacked & TCPCB_SACKED_RETRANS)
 					tp->retrans_out -= packets_acked;
-				acked |= FLAG_RETRANS_DATA_ACKED;
+				flag |= FLAG_RETRANS_DATA_ACKED;
 				seq_rtt = -1;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
 				if (fully_acked)
 					last_ackt = skb->tstamp;
[PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default
Most of the description that follows comes from my mail to netdev
(some editing done):

The main obstacle to FRTO use is its deployment, as it has to be on
the sender side whereas the wireless link is often the receiver's
access link. Take initiative on behalf of unlucky receivers and
enable it by default in future Linux TCP senders. Also, the IETF
seems to be interested in advancing FRTO from experimental [1].

How does FRTO help?
===================

FRTO detects spurious RTOs and avoids a number of unnecessary
retransmissions and a couple of other problems that can arise due to
an incorrect guess made at RTO (i.e., that segments were lost when
they actually got delayed, which is likely to occur e.g. in wireless
environments with link-layer retransmission). Though FRTO cannot
prevent the first (potentially unnecessary) retransmission at RTO, I
suspect that it won't cost that much even if you have to pay for each
bit (it won't be that high a percentage of all packets after all :-)).
However, usually when you have a spurious RTO, not only the first
segment is unnecessarily retransmitted but the *whole window*. It goes
like this: all cumulative ACKs got delayed due to in-order delivery,
then TCP will actually send 1.5*original cwnd worth of data in the
RTO's slow-start when the delayed ACKs arrive (basically the original
cwnd worth of it unnecessarily). In case one is interested in
minimizing unnecessary retransmissions, e.g. due to cost, those
rexmissions must never see daylight. Besides, in the worst case the
generated burst overloads the bottleneck buffers, which is likely to
significantly delay the further progress of the flow. In case of
link-layer rexmissions, ACK compression often occurs at the same time,
making the burst very sharp edged (in that case TCP often loses most
of the segments above high_seq, which means very bad performance too).

When FRTO is enabled, those unnecessary retransmissions are fully
avoided except for the first segment, and the cwnd behavior after a
detected spurious RTO is determined by the response (one can tune
that by sysctl).

The basic version of FRTO (the non-SACK enhanced one) can fail to
detect a spurious RTO as spurious and then falls back to conservative
behavior. ACK lossage is much less significant than reordering;
usually FRTO can detect a spurious RTO if at least 2 cumulative ACKs
from the original window are preserved (excluding the ACK that
advances to high_seq). With the SACK-enhanced version, the detection
is quite robust.

FRTO should remove the need to set a high lower bound for the RTO
estimator due to delay spikes that occur relatively commonly in some
environments (esp. in wireless/cellular ones).

[1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 948e79a..02b549b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -85,7 +85,7 @@ int sysctl_tcp_adv_win_scale __read_mostly = 2;
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
-int sysctl_tcp_frto __read_mostly;
+int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_frto_response __read_mostly;
 int sysctl_tcp_nometrics_save __read_mostly;
 
-- 
1.5.0.6
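The "at least 2 cumulative ACKs" remark above follows from how the basic algorithm decides: roughly, after the RTO retransmission the sender waits for two ACKs that each advance the window before calling the timeout spurious, and any duplicate ACK in between sends it back to conventional recovery. The sketch below is a heavily simplified model of that decision only, written for illustration. struct frto_model and frto_model_on_ack() are hypothetical names, and the model ignores the sequence-number checks (recover/high_seq), the new segments sent between the two ACKs, and everything the SACK-enhanced variant adds.

```c
#include <assert.h>
#include <stddef.h>

enum frto_verdict {
	FRTO_UNDECIDED,			/* still waiting for more ACKs */
	FRTO_SPURIOUS,			/* timeout declared spurious */
	FRTO_CONVENTIONAL_RECOVERY	/* fall back to normal RTO recovery */
};

/* Hypothetical per-connection state for the simplified decision. */
struct frto_model {
	int advancing_acks;		/* window-advancing ACKs seen since RTO */
	enum frto_verdict verdict;
};

/* Feed one ACK received after the RTO retransmission into the model. */
static enum frto_verdict frto_model_on_ack(struct frto_model *st,
					   int ack_advances_window)
{
	assert(st != NULL);

	if (st->verdict != FRTO_UNDECIDED)
		return st->verdict;	/* decision already made */

	if (!ack_advances_window)
		st->verdict = FRTO_CONVENTIONAL_RECOVERY;
	else if (++st->advancing_acks == 2)
		st->verdict = FRTO_SPURIOUS;

	return st->verdict;
}
```

In this model a spurious verdict needs two advancing ACKs in a row, which is why losing or reordering ACKs from the original window makes the basic version fall back to the conservative path.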
[PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations
There's no reason to clear the sacktag skb hint when small part of
the rexmit queue changes. Account changes (if any) instead when
fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
no need to discard SACK tag hint at all.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/tcp.h     |    6 +-
 net/ipv4/tcp_input.c  |   14 --
 net/ipv4/tcp_output.c |   12
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 16dfe3c..07b1faa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1067,11 +1067,15 @@ static inline void tcp_mib_init(void)
 }
 
 /* from STCP */
-static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp) {
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
 	tp->forward_skb_hint = NULL;
+}
+
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+	tcp_clear_retrans_hints_partial(tp);
 	tp->fastpath_skb_hint = NULL;
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 02b549b..1092b5a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1674,7 +1674,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 	tp->high_seq = tp->frto_highmark;
 	TCP_ECN_queue_cwr(tp);
 
-	tcp_clear_all_retrans_hints(tp);
+	tcp_clear_retrans_hints_partial(tp);
 }
 
 void tcp_clear_retrans(struct tcp_sock *tp)
@@ -1714,10 +1714,14 @@ void tcp_enter_loss(struct sock *sk, int how)
 	tp->bytes_acked = 0;
 	tcp_clear_retrans(tp);
 
-	/* Push undo marker, if it was plain RTO and nothing
-	 * was retransmitted. */
-	if (!how)
+	if (!how) {
+		/* Push undo marker, if it was plain RTO and nothing
+		 * was retransmitted. */
 		tp->undo_marker = tp->snd_una;
+		tcp_clear_retrans_hints_partial(tp);
+	} else {
+		tcp_clear_all_retrans_hints(tp);
+	}
 
 	tcp_for_write_queue(skb, sk) {
 		if (skb == tcp_send_head(sk))
@@ -1744,8 +1748,6 @@ void tcp_enter_loss(struct sock *sk, int how)
 	TCP_ECN_queue_cwr(tp);
 	/* Abort FRTO algorithm if one is in progress */
 	tp->frto_counter = 0;
-
-	tcp_clear_all_retrans_hints(tp);
 }
 
 static int tcp_check_sack_reneging(struct sock *sk)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f46d24b..cbb83ac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -687,7 +687,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 
 	BUG_ON(len > skb->len);
 
-	tcp_clear_all_retrans_hints(tp);
+	tcp_clear_retrans_hints_partial(tp);
 	nsize = skb_headlen(skb) - len;
 	if (nsize < 0)
 		nsize = 0;
@@ -1718,9 +1718,6 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		BUG_ON(tcp_skb_pcount(skb) != 1 ||
 		       tcp_skb_pcount(next_skb) != 1);
 
-		/* changing transmit queue under us so clear hints */
-		tcp_clear_all_retrans_hints(tp);
-
 		/* Ok.	We will be able to collapse the packet. */
 		tcp_unlink_write_queue(next_skb, sk);
 
@@ -1759,6 +1756,13 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 
 		tcp_adjust_fackets_out(tp, skb, tcp_skb_pcount(next_skb));
 		tp->packets_out -= tcp_skb_pcount(next_skb);
+
+		/* changed transmit queue under us so clear hints */
+		tcp_clear_retrans_hints_partial(tp);
+		/* manually tune sacktag skb hint */
+		if (tp->fastpath_skb_hint == next_skb)
+			tp->fastpath_skb_hint = skb;
+
 		sk_stream_free_skb(sk, next_skb);
 	}
 }
-- 
1.5.0.6
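The split into a partial and a full clear, plus the manual hint fix-up when two skbs are collapsed, can be modelled in a few lines of user-space C. This is an illustration under assumptions: struct skb_model, struct hints_model and the model_* functions are hypothetical stand-ins, with field names borrowed from the patch for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb on the write queue. */
struct skb_model {
	int id;
};

/* Hypothetical stand-in for the hint pointers kept in struct tcp_sock. */
struct hints_model {
	struct skb_model *lost_skb_hint;
	struct skb_model *scoreboard_skb_hint;
	struct skb_model *retransmit_skb_hint;
	struct skb_model *forward_skb_hint;
	struct skb_model *fastpath_skb_hint;	/* the sacktag hint */
};

/* Clear everything except the sacktag hint. */
static void model_clear_hints_partial(struct hints_model *tp)
{
	assert(tp != NULL);
	tp->lost_skb_hint = NULL;
	tp->scoreboard_skb_hint = NULL;
	tp->retransmit_skb_hint = NULL;
	tp->forward_skb_hint = NULL;
}

/* Clear all hints, including the sacktag hint. */
static void model_clear_hints_all(struct hints_model *tp)
{
	model_clear_hints_partial(tp);
	tp->fastpath_skb_hint = NULL;
}

/* next_skb is about to be merged into skb and freed: keep the sacktag
 * hint valid by repointing it instead of discarding it. */
static void model_collapse_fixup(struct hints_model *tp,
				 struct skb_model *skb,
				 struct skb_model *next_skb)
{
	model_clear_hints_partial(tp);
	if (tp->fastpath_skb_hint == next_skb)
		tp->fastpath_skb_hint = skb;
}
```

The point of the fix-up is that the hint must never be left pointing at an skb that is about to be freed, while still avoiding a full restart of the SACK tagging walk.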
Re: netif_rx will not free skb when I use ftp in kernel 2.6.22/2.6.21
Chris Snook wrote:
wrote:
in function at_alloc_rx_buffers(), pci_unmap_page() and netif_rx() in function at_clean_rx_irq(),

Okay, I didn't know you were talking about the atl1 driver. Are you using the in-tree driver in 2.6.22, or the pre-merge driver on sourceforge, or the vendor driver from Attansic/Atheros?

Based on the function names (at_*), looks like the vendor driver is being used. The pre-merge and in-kernel function names begin with atl1_.

Jay
[PATCH: 2.6.13-15-SMP 3/3] network: concurrently run softirq network code on SMP
Bottom Softirq Implementation. John Ye, 2007.08.27

Why this patch:
Make the kernel able to concurrently execute softirq's net code on SMP
systems. This takes full advantage of SMP to handle more packets and
greatly raises NIC throughput.
The current kernel's net packet processing logic is:
1) The CPU which handles a hardirq must be executing its related softirq.
2) One softirq instance (irqs handled by 1 CPU) can't be executed on more
than 2 CPUs at the same time.
These limitations make it hard for kernel networking to take advantage
of SMP.

How this patch:
It splits the current softirq code into 2 parts: the cpu-sensitive top
half, and the cpu-insensitive bottom half, then makes the bottom half
(called BS) execute on SMP concurrently.
The two parts are not equal in terms of size and load. The top part has
constant code size (mainly in net/core/dev.c and NIC drivers), while the
bottom part involves netfilter (iptables), whose load varies very much.
An iptables setup with 1000 rules to match will make the bottom part's
load very high. So, if the bottom part softirq can be randomly
distributed to processors and run concurrently on them, the network will
gain much more packet handling capacity and network throughput will be
increased remarkably.

Where useful:
It's useful on SMP machines that meet the following 2 conditions:
1) have high kernel network load, for example, running iptables with
thousands of rules, etc.
2) have more CPUs than active NICs, e.g. a 4-CPU machine with 2 NICs.
On these systems, as softirq load increases, some CPUs will be idle
while others (their number equals the number of NICs) keep busy.
IRQBALANCE will help, but it only shifts IRQs among CPUs and provides no
softirq concurrency. Balancing the load of each CPU will not remarkably
increase network speed.

Where NOT useful:
If the bottom half of softirq is too small (without running iptables),
or the network is too idle, the BS patch will not have a visible effect.
But it has no negative effect either.
User can turn BS functionality on/off with the /proc/sys/net/bs_enable
switch.

How to test:
On a linux box, run iptables and add 2000 rules to table filter and
table nat to simulate a huge softirq load. Then open 20 ftp sessions to
download a big file. On another machine (which uses this test machine as
its gateway), open 20 more ftp download sessions. Compare the speed
without BS enabled and with BS enabled.
cat /proc/sys/net/bs_enable. This is a switch to turn BS on/off.
cat /proc/sys/net/bs_status. This shows the usage of each CPU.
Tests showed that when bottom softirq load is high, the network
throughput can be nearly doubled on a 2-CPU machine. Hopefully it may be
quadrupled on a 4-CPU linux box.

Bugs:
It will NOT allow hotplug CPUs.
It only allows incremental CPU ids, starting from 0 to
num_online_cpus(). For example, 0,1,2,3 is OK. 0,1,8,9 is KO.

Some considerations in the future:
1) With the BS patch, the irq balance code in arch/i386/kernel/io_apic.c
seems no longer needed, at least not for network irqs.
2) Softirq load will become very small. It only runs the top half of the
old softirq, which is much less expensive than the bottom half, i.e. the
netfilter program.
To let the top softirq process more packets, can these 3 network
parameters be given a larger value?
extern int netdev_max_backlog = 1000;
extern int netdev_budget = 300;
extern int weight_p = 64;
3) Now BS runs on the built-in keventd thread; can we create new
workqueues for it to run on?

Signed-off-by: John Ye (Seeker) [EMAIL PROTECTED]

--- old/net/ipv4/ip_input.c	2007-09-20 20:50:31.0 +0800
+++ new/net/ipv4/ip_input.c	2007-09-21 05:52:40.0 +0800
@@ -362,6 +362,198 @@
 	return NET_RX_DROP;
 }
+
+#define CONFIG_BOTTOM_SOFTIRQ_SMP
+#define CONFIG_BOTTOM_SOFTIRQ_SMP_SYSCTL
+
+#ifdef CONFIG_BOTTOM_SOFTIRQ_SMP
+
+/*
+ *
+Bottom Softirq Implementation. John Ye, 2007.08.27
+
+Why this patch:
+Make kernel be able to concurrently execute softirq's net code on SMP system.
+Takes full advantages of SMP to handle more packets and greatly raises NIC throughput.
+The current kernel's net packet processing logic is:
+1) The CPU which handles a hardirq must be executing its related softirq.
+2) One softirq instance(irqs handled by 1 CPU) can't be executed on more than 2 CPUs
+at the same time.
+The limitation make kernel network be hard to take the advantages of SMP.
+
+How this patch:
+It splits the current softirq code into 2 parts: the cpu-sensitive top half,
+and the cpu-insensitive bottom half, then make bottom half(calld BS) be
+executed on SMP concurrently.
+The two parts are not equal in terms of size and load. Top part has constant code
+size(mainly, in net/core/dev.c and NIC drivers), while bottom part involves
+netfilter(iptables) whose load varies very much. An iptalbes with 1000 rules to match
+will make the bottom part's load be very high. So, if the bottom part softirq
+can be randomly distributed to processors and run concurrently on them, the network will
+gain
Re: [LARTC] ifb and ppp
On Thu, 2007-20-09 at 13:55 +0200, Patrick McHardy wrote:

Please keep netdev and myself CCed.

and me too (I am way behind on netdev)

Frithjof Hammer wrote:

Any further help/ideas?

Sorry, I didn't follow the thread - what is the goal to be achieved with the setup?

I misread the code, the device it looks at in tcf_mirred_init is the target device (ifb). So what it does is check whether the target device wants a link layer header and if it does restores the one from the source device. So currently it seems impossible to get rid of the PPP(oE) header.

It is tricky to redirect from devices that have disparity in their view of link layer headers except for those that we know don't expect anything.

Jamal, is that how it's supposed to work?

Right - some netdevices on receipt will expect the link layer header.

cheers,
jamal
[PATCH V5 1/11] net/core: add a netdev notification for slave detach
A slave of a bonding master that wants to send a notification before going down should call netdev_slave_detach(). The handling of this notification will be done outside the context of unregister_netdevice(), which is sometimes necessary, as with an IPoIB slave for example.

Signed-off-by: Moni Shoua monis at voltaire.com
---
 include/linux/if.h |    1 +
 net/core/dev.c     |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

Index: net-2.6/net/core/dev.c
===================================================================
--- net-2.6.orig/net/core/dev.c	2007-09-20 08:04:47.164051688 +0200
+++ net-2.6/net/core/dev.c	2007-09-20 09:20:21.493060579 +0200
@@ -2588,6 +2588,25 @@ int netdev_set_master(struct net_device
 	return 0;
 }
 
+/**
+ *	netdev_slave_detach - notify that slave is about to detach from master
+ *	@slave: slave device
+ *
+ *	Raise a flag that slave is about to detach from master
+ *	and notify the netdev chain.
+ *	The caller must hold the rtnl_mutex.
+ */
+
+int netdev_slave_detach(struct net_device *slave)
+{
+	int ret = 0;
+	if (slave->flags & IFF_SLAVE) {
+		slave->priv_flags |= IFF_SLAVE_DETACH;
+		ret = call_netdevice_notifiers(NETDEV_CHANGE, slave);
+	}
+	return ret;
+}
+
 static void __dev_set_promiscuity(struct net_device *dev, int inc)
 {
 	unsigned short old_flags = dev->flags;
@@ -4120,6 +4139,7 @@ EXPORT_SYMBOL(dev_set_mac_address);
 EXPORT_SYMBOL(free_netdev);
 EXPORT_SYMBOL(netdev_boot_setup_check);
 EXPORT_SYMBOL(netdev_set_master);
+EXPORT_SYMBOL(netdev_slave_detach);
 EXPORT_SYMBOL(netdev_state_change);
 EXPORT_SYMBOL(netif_receive_skb);
 EXPORT_SYMBOL(netif_rx);
Index: net-2.6/include/linux/if.h
===================================================================
--- net-2.6.orig/include/linux/if.h	2007-09-20 08:04:47.164051688 +0200
+++ net-2.6/include/linux/if.h	2007-09-20 08:15:29.577729301 +0200
@@ -61,6 +61,7 @@
 #define IFF_MASTER_ALB	0x10		/* bonding master, balance-alb.	*/
 #define IFF_BONDING	0x20		/* bonding master or slave	*/
 #define IFF_SLAVE_NEEDARP 0x40		/* need ARPs for validation	*/
+#define IFF_SLAVE_DETACH 0x80		/* slave is about to unregister	*/
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
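For readers who want to see the flag-then-notify pattern from patch 1/11 in isolation, here is a minimal userspace sketch. It is not kernel code: every toy_* name, the flag values and the fixed-size listener array are hypothetical stand-ins for struct net_device, IFF_SLAVE, IFF_SLAVE_DETACH, NETDEV_CHANGE and the netdev notifier chain.

```c
#include <stddef.h>

/* Hypothetical stand-ins for IFF_SLAVE, IFF_SLAVE_DETACH and NETDEV_CHANGE. */
#define TOY_IFF_SLAVE        0x0800u
#define TOY_IFF_SLAVE_DETACH 0x80u
#define TOY_NETDEV_CHANGE    4

struct toy_netdev {
	unsigned int flags;       /* public flags, e.g. "I am a slave"    */
	unsigned int priv_flags;  /* private flags, e.g. "about to detach" */
};

typedef int (*toy_notifier_fn)(int event, struct toy_netdev *dev);

/* A tiny fixed-size notifier chain (assumption: 4 listeners is enough). */
#define TOY_MAX_NOTIFIERS 4
static toy_notifier_fn toy_chain[TOY_MAX_NOTIFIERS];
static size_t toy_chain_len;

int toy_register_notifier(toy_notifier_fn fn)
{
	if (toy_chain_len == TOY_MAX_NOTIFIERS)
		return -1;
	toy_chain[toy_chain_len++] = fn;
	return 0;
}

static int toy_call_notifiers(int event, struct toy_netdev *dev)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < toy_chain_len; i++)
		ret = toy_chain[i](event, dev);
	return ret;
}

/* Mirrors the shape of netdev_slave_detach(): only slaves raise the flag. */
int toy_slave_detach(struct toy_netdev *slave)
{
	int ret = 0;

	if (slave->flags & TOY_IFF_SLAVE) {
		slave->priv_flags |= TOY_IFF_SLAVE_DETACH;
		ret = toy_call_notifiers(TOY_NETDEV_CHANGE, slave);
	}
	return ret;
}

/* A listener playing the bonding master's role: it only reacts to a
 * change event when the detach flag has been raised on the device. */
int toy_detach_seen;

int toy_bond_listener(int event, struct toy_netdev *dev)
{
	if (event == TOY_NETDEV_CHANGE &&
	    (dev->priv_flags & TOY_IFF_SLAVE_DETACH))
		toy_detach_seen++;
	return 0;
}
```

The point of the design is that the listener runs from the detach call itself, so the master gets a chance to drop its references before any unregister path is entered.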
[PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister
When the bonding device enslaves IPoIB devices it takes pointers to functions in the ib_ipoib module. This is fine as long as the ib_ipoib module remains loaded while the references to its functions exist. So, to help bonding clean up in time, when the IPoIB net device is a slave of a bonding master, let the master know that the IPoIB device is about to unregister (but before calling unregister).

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |    7 +++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    3 +++
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c |    1 +
 3 files changed, 11 insertions(+)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-09-20 08:35:34.0 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-09-20 14:20:16.495147879 +0200
@@ -48,6 +48,7 @@
 #include <linux/in.h>
 
 #include <net/dst.h>
+#include <linux/netdevice.h>
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("IP-over-InfiniBand net driver");
@@ -921,6 +922,7 @@ void ipoib_dev_cleanup(struct net_device
 
 	/* Delete any child interfaces first */
 	list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
+		ipoib_slave_detach(cpriv->dev);
 		unregister_netdev(cpriv->dev);
 		ipoib_dev_cleanup(cpriv->dev);
 		free_netdev(cpriv->dev);
@@ -1208,6 +1210,7 @@ static void ipoib_remove_one(struct ib_d
 		ib_unregister_event_handler(&priv->event_handler);
 		flush_scheduled_work();
 
+		ipoib_slave_detach(priv->dev);
 		unregister_netdev(priv->dev);
 		ipoib_dev_cleanup(priv->dev);
 		free_netdev(priv->dev);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_vlan.c	2007-09-20 09:26:11.0 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_vlan.c	2007-09-20 09:27:20.182709679 +0200
@@ -157,6 +157,7 @@ int ipoib_vlan_delete(struct net_device
 	mutex_lock(&ppriv->vlan_mutex);
 	list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
 		if (priv->pkey == pkey) {
+			ipoib_slave_detach(priv->dev);
 			unregister_netdev(priv->dev);
 			ipoib_dev_cleanup(priv->dev);
 			list_del(&priv->list);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2007-09-20 12:18:56.0 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h	2007-09-20 14:21:47.385972207 +0200
@@ -570,6 +570,13 @@ static inline void ipoib_cm_handle_rx_wc
 
 #endif
 
+static inline void ipoib_slave_detach(struct net_device *dev)
+{
+	rtnl_lock();
+	netdev_slave_detach(dev);
+	rtnl_unlock();
+}
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 void ipoib_create_debug_files(struct net_device *dev);
 void ipoib_delete_debug_files(struct net_device *dev);
Re: Please pull 'z1211' branch of wireless-2.6
On Wed, Sep 19, 2007 at 11:12:50PM +0100, Daniel Drake wrote:
John W. Linville wrote:

BTW: I fairly regularly get email from F7 users complaining about connection intermittency and other bugs that we don't seem to have for the softmac driver (maybe stack related issues, of which I've fixed a couple that affected me personally; I'm a little surprised that F7 jumped so early).

Hmmm...please refer any of these to bugzilla.redhat.com if you don't mind.

John
--
John W. Linville [EMAIL PROTECTED]
[PATCH V5 3/11] IB/ipoib: Bound the net device to the ipoib_neigh structure
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call.

When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device; it changes this skb to point to the slave device and calls the slave hard_start_xmit function.

Under this scheme, the ipoib_neigh_destructor assumption that for each struct neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev) can be cast to struct ipoib_dev_priv, is wrong.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n->dev->flags has the IFF_MASTER bit set.

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |    4 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   24 +++++++++++++++---------
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 ++-
 3 files changed, 20 insertions(+), 11 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2007-09-18 17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h	2007-09-18 17:09:26.534874404 +0200
@@ -328,6 +328,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 
 	struct neighbour   *neighbour;
+	struct net_device *dev;
 
 	struct list_head    list;
 };
@@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-09-18 17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-09-18 17:23:54.725744661 +0200
@@ -511,7 +511,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
 
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -830,6 +830,13 @@ static void ipoib_neigh_cleanup(struct n
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
 
+	neigh = *to_ipoib_neigh(n);
+	if (neigh) {
+		priv = netdev_priv(neigh->dev);
+		ipoib_dbg(priv, "neigh_destructor for bonding device: %s\n",
+			  n->dev->name);
+	} else
+		return;
 	ipoib_dbg(priv,
 		  "neigh_cleanup for %06x " IPOIB_GID_FMT "\n",
 		  IPOIB_QPN(n->ha),
@@ -837,13 +844,10 @@ static void ipoib_neigh_cleanup(struct n
 
 	spin_lock_irqsave(&priv->lock, flags);
 
-	neigh = *to_ipoib_neigh(n);
-	if (neigh) {
-		if (neigh->ah)
-			ah = neigh->ah;
-		list_del(&neigh->list);
-		ipoib_neigh_free(n->dev, neigh);
-	}
+	if (neigh->ah)
+		ah = neigh->ah;
+	list_del(&neigh->list);
+	ipoib_neigh_free(n->dev, neigh);
 
 	spin_unlock_irqrestore(&priv->lock, flags);
 
@@ -851,7 +855,8 @@ static void ipoib_neigh_cleanup(struct n
 		ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struct ipoib_neigh *neigh;
 
@@ -860,6 +865,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
 		return NULL;
 
 	neigh->neighbour = neighbour;
+	neigh->dev = dev;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
 	ipoib_cm_set(neigh, NULL);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2007-09-18 17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2007-09-18 17:09:26.536874045 +0200
@@ -727,7 +727,8 @@ out:
 		if (skb->dst		&&
 		    skb->dst->neighbour &&
 		    !*to_ipoib_neigh(skb->dst->neighbour)) {
-
Re: Please pull 'ssb-drivers' branch of wireless-2.6
On Wed, Sep 19, 2007 at 02:33:56PM -0700, Greg KH wrote:
On Wed, Sep 19, 2007 at 04:44:28PM -0400, John W. Linville wrote:

These patches build upon the SSB bus support added to net-2.6.24 to support the b43 wireless driver. Since Dave has that support in his tree, I'm asking him to merge these patches as well.

The second patch adds a driver for a USB OHCI device which lives on the SSB bus. Again, this is found on a number of SoC devices used especially in wireless routers and APs.

The patches are available here:
http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/ssb-drivers/0001-b44-port-to-native-ssb-support.patch
http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/ssb-drivers/0002-usb-ssb-hosted-OHCI-driver.patch

This one needs to go through the linux-usb-devel list (not the -users list) and get acked by David Brownell, the current OHCI maintainer.

Ooops, sorry -- clicked on the wrong line in MAINTAINERS...

David, please review the patch in the second link above and consider it for inclusion in 2.6.24 (once the SSB stuff in net-2.6.24 is merged).

Thanks!

John
--
John W. Linville [EMAIL PROTECTED]
[PATCH V5 4/11] IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. Between the time the carrier of an IPoIB device goes down and the time its ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects this case and prevents it from happening.

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-09-18 17:09:26.535874225 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-09-18 17:10:22.375853147 +0200
@@ -686,9 +686,10 @@ static int ipoib_start_xmit(struct sk_bu
 				goto out;
 			}
 		} else if (neigh->ah) {
-			if (unlikely(memcmp(&neigh->dgid.raw,
+			if (unlikely((memcmp(&neigh->dgid.raw,
 					    skb->dst->neighbour->ha + 4,
-					    sizeof(union ib_gid)))) {
+					    sizeof(union ib_gid))) ||
+					 (neigh->dev != dev))) {
 				spin_lock(&priv->lock);
 				/*
 				 * It's safe to call ipoib_put_ah() inside
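The check added by patch 4/11 can be read as "the cached neighbour entry is stale if either the destination GID or the transmitting device no longer matches". A minimal userspace sketch of that predicate follows. The toy_* types are hypothetical; the only detail taken from the patch is that the GID is compared against the hardware address at offset 4 (`ha + 4`).

```c
#include <stdbool.h>
#include <string.h>

#define TOY_GID_LEN    16  /* size of an IB GID in bytes                   */
#define TOY_GID_OFFSET 4   /* the patch compares against ha + 4            */

struct toy_dev {
	int ifindex;           /* hypothetical; only pointer identity is used */
};

struct toy_neigh {
	unsigned char dgid[TOY_GID_LEN]; /* GID cached when the entry was built */
	const struct toy_dev *dev;       /* device the entry was built for      */
};

/* Returns true when the cached entry must not be used for this send:
 * either the destination GID changed, or the packet is now going out
 * through a different (for example newly activated) slave device. */
bool toy_neigh_is_stale(const struct toy_neigh *neigh,
			const unsigned char *ha,
			const struct toy_dev *tx_dev)
{
	return memcmp(neigh->dgid, ha + TOY_GID_OFFSET, TOY_GID_LEN) != 0 ||
	       neigh->dev != tx_dev;
}
```

Without the second condition a fail-over to a new slave would pass the GID comparison and reuse an address handle that belongs to the old device.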
[PATCH V5 5/11] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be those of the active slave when the enslaved device is not of ARPHRD_ETHER type. Basically it overrides the settings done by ether_setup(), which are netdevice **type** dependent and hence might not be appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves of dissimilar ether types, as was concluded in the v1 discussion.

An IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3-byte IB QP (Queue Pair) number and a 16-byte IB port GID (Global ID) of the port this IPoIB device is bound to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (I have omitted here some details which are not important for the bonding RFC).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+)

Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c	2007-08-15 10:08:59.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c	2007-08-15 10:54:13.424688411 +0300
@@ -1237,6 +1237,26 @@ static int bond_compute_features(struct
 
 	return 0;
 }
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+				struct net_device *slave_dev)
+{
+	bond_dev->hard_header	        = slave_dev->hard_header;
+	bond_dev->rebuild_header        = slave_dev->rebuild_header;
+	bond_dev->hard_header_cache	= slave_dev->hard_header_cache;
+	bond_dev->header_cache_update   = slave_dev->header_cache_update;
+	bond_dev->hard_header_parse	= slave_dev->hard_header_parse;
+
+	bond_dev->neigh_setup           = slave_dev->neigh_setup;
+
+	bond_dev->type		    = slave_dev->type;
+	bond_dev->hard_header_len   = slave_dev->hard_header_len;
+	bond_dev->addr_len	    = slave_dev->addr_len;
+
+	memcpy(bond_dev->broadcast, slave_dev->broadcast,
+		slave_dev->addr_len);
+}
+
 /* enslave device slave to bond device master */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond
 		goto err_undo_flags;
 	}
 
+	/* set bonding device ether type by slave - bonding netdevices are
+	 * created with ether_setup, so when the slave type is not ARPHRD_ETHER
+	 * there is a need to override some of the type dependent attribs/funcs.
+	 *
+	 * bond ether type mutual exclusion - don't allow slaves of dissimilar
+	 * ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
+	 */
+	if (bond->slave_cnt == 0) {
+		if (slave_dev->type != ARPHRD_ETHER)
+			bond_setup_by_slave(bond_dev, slave_dev);
+	} else if (bond_dev->type != slave_dev->type) {
+		printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different "
+			"from other slaves (%d), can not enslave it.\n",
+			slave_dev->name,
+			slave_dev->type, bond_dev->type);
+			res = -EINVAL;
+			goto err_undo_flags;
+	}
+
 	if (slave_dev->set_mac_address == NULL) {
 		printk(KERN_ERR DRV_NAME
 			": %s: Error: The slave device you specified does "
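The enslave-time rule in patch 5/11 (the first slave decides the bond's type, later slaves must match it) can be sketched in userspace as below. This is only an illustration under assumptions: the toy_* structs are hypothetical, only type and addr_len are copied (the real bond_setup_by_slave() also copies header callbacks, hard_header_len and the broadcast address), and the constants mirror the usual ARPHRD_ETHER and ARPHRD_INFINIBAND values.

```c
/* Stand-ins for ARPHRD_ETHER and ARPHRD_INFINIBAND. */
#define TOY_ARPHRD_ETHER      1
#define TOY_ARPHRD_INFINIBAND 32

struct toy_dev {
	int type;               /* link layer type of the device     */
	unsigned char addr_len; /* hardware address length in bytes  */
};

struct toy_bond {
	struct toy_dev dev;     /* the bond master's own attributes  */
	int slave_cnt;          /* number of slaves currently held   */
};

/* Returns 0 on success, -1 when the slave's type conflicts with the
 * type already adopted by the bond. */
int toy_enslave(struct toy_bond *bond, const struct toy_dev *slave)
{
	if (bond->slave_cnt == 0) {
		/* First slave: a non-Ethernet device overrides what the
		 * Ethernet-style setup put into the master. */
		if (slave->type != TOY_ARPHRD_ETHER) {
			bond->dev.type = slave->type;
			bond->dev.addr_len = slave->addr_len;
		}
	} else if (bond->dev.type != slave->type) {
		/* Mutual exclusion: no mixing of dissimilar types. */
		return -1;
	}

	bond->slave_cnt++;
	return 0;
}
```

A bond that starts out with Ethernet defaults and first enslaves an InfiniBand device therefore becomes an InfiniBand-typed bond, and a later Ethernet slave is refused.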
[PATCH V5 6/11] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
This patch allows for enslaving netdevices which do not support the set_mac_address() function. In that case the bond mac address is the one of the active slave, where remote peers are notified on the mac address (neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs (this is already done by the bonding code). Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c | 87 +++- drivers/net/bonding/bonding.h |1 2 files changed, 60 insertions(+), 28 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:54:13.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:41.971632881 +0300 @@ -1095,6 +1095,14 @@ void bond_change_active_slave(struct bon if (new_active) { bond_set_slave_active_flags(new_active); } + + /* when bonding does not set the slave MAC address, the bond MAC +* address is the one of the active slave. +*/ + if (new_active !bond-do_set_mac_addr) + memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, + new_active-dev-addr_len); + bond_send_gratuitous_arp(bond); } } @@ -1351,13 +1359,22 @@ int bond_enslave(struct net_device *bond } if (slave_dev-set_mac_address == NULL) { - printk(KERN_ERR DRV_NAME - : %s: Error: The slave device you specified does - not support setting the MAC address. - Your kernel likely does not support slave - devices.\n, bond_dev-name); - res = -EOPNOTSUPP; - goto err_undo_flags; + if (bond-slave_cnt == 0) { + printk(KERN_WARNING DRV_NAME + : %s: Warning: The first slave device you + specified does not support setting the MAC + address. This bond MAC address would be that + of the active slave.\n, bond_dev-name); + bond-do_set_mac_addr = 0; + } else if (bond-do_set_mac_addr) { + printk(KERN_ERR DRV_NAME + : %s: Error: The slave device you specified + does not support setting the MAC addres,. + but this bond uses this practice. 
\n + , bond_dev-name); + res = -EOPNOTSUPP; + goto err_undo_flags; + } } new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL); @@ -1378,16 +1395,18 @@ int bond_enslave(struct net_device *bond */ memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN); - /* -* Set slave to master's mac address. The application already -* set the master's mac address to that of the first slave -*/ - memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); - addr.sa_family = slave_dev-type; - res = dev_set_mac_address(slave_dev, addr); - if (res) { - dprintk(Error %d calling set_mac_address\n, res); - goto err_free; + if (bond-do_set_mac_addr) { + /* +* Set slave to master's mac address. The application already +* set the master's mac address to that of the first slave +*/ + memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); + addr.sa_family = slave_dev-type; + res = dev_set_mac_address(slave_dev, addr); + if (res) { + dprintk(Error %d calling set_mac_address\n, res); + goto err_free; + } } res = netdev_set_master(slave_dev, bond_dev); @@ -1612,9 +1631,11 @@ err_close: dev_close(slave_dev); err_restore_mac: - memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); - addr.sa_family = slave_dev-type; - dev_set_mac_address(slave_dev, addr); + if (bond-do_set_mac_addr) { + memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); + addr.sa_family = slave_dev-type; + dev_set_mac_address(slave_dev, addr); + } err_free: kfree(new_slave); @@ -1792,10 +1813,12 @@ int bond_release(struct net_device *bond /* close slave before restoring its mac address */ dev_close(slave_dev); - /* restore original (permanent) mac address */ - memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN); -
[PATCH V5 7/11] net/bonding: Enable IP multicast for bonding IPoIB devices
Allow enslaving devices when the bonding device is not up. In the discussion of the previous post this seemed to be the cleanest way to go, and it is not expected to cause instabilities.

Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() has set the bonding device type to ARPHRD_ETHER and the address length to ETH_ALEN, the net core code computes a wrong multicast link address. This is because ip_eth_mc_map() is called at that point, whereas for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c  |    5 +++--
 drivers/net/bonding/bond_sysfs.c |    6 ++----
 2 files changed, 5 insertions(+), 6 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c	2007-08-15 10:54:41.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c	2007-08-15 10:55:48.431862446 +0300
@@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond
 
 	/* bond must be initialized by bond_open() before enslaving */
 	if (!(bond_dev->flags & IFF_UP)) {
-		dprintk("Error, master_dev is not up\n");
-		return -EPERM;
+		printk(KERN_WARNING DRV_NAME
+			"%s: master_dev is not up in bond_enslave\n",
+			bond_dev->name);
 	}
 
 	/* already enslaved */
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c	2007-08-15 10:08:58.0 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c	2007-08-15 10:55:48.432862269 +0300
@@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru
 
 	/* Quick sanity check -- is the bond interface up? */
 	if (!(bond->dev->flags & IFF_UP)) {
-		printk(KERN_ERR DRV_NAME
-		       ": %s: Unable to update slaves because interface is down.\n",
+		printk(KERN_WARNING DRV_NAME
+		       ": %s: doing slave updates when interface is down.\n",
 		       bond->dev->name);
-		ret = -EPERM;
-		goto out;
 	}
 
 	/* Note: We can't hold bond->lock here, as bond_create grabs it. */
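To make the "wrong multicast link address" point concrete, here is a small userspace sketch of the standard IPv4-to-Ethernet multicast mapping (fixed prefix 01:00:5e followed by the low 23 bits of the group address), which is the kind of mapping an Ethernet-typed device gets. The function name is hypothetical and the group is passed in host byte order; this is not the kernel's ip_eth_mc_map() itself.

```c
#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to a 6-byte Ethernet
 * multicast address: 01:00:5e plus the low 23 bits of the group. */
void toy_eth_mc_map(uint32_t group, unsigned char mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;  /* top bit of this byte is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

The result is always a 6-byte Ethernet address. A bond that later takes on a non-Ethernet type with a longer hardware address cannot use it, which is why joins made while the bond still looks like an Ethernet device end up with a bad link address.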
[PATCH V5 0/11] net/bonding: ADD IPoIB support for the bonding driver
This patch series is the fifth version (see the link to V4 below) of the suggested changes to the bonding driver so it would be able to support non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. Patches 1-10 were originally submitted in V4 and patch 11 is an addition by Jay.

Jay, the bonding patches you acked remain unchanged, while I guess I still need to get an official ack by Roland for the IPoIB patches. Is it OK with you to push the entire series to the networking tree? Roland has already agreed to do so.

Major changes from the previous version:
1. Style changes
2. IPoIB - notify slave detach on vlan delete
3. Add function to net/core for slave detach instead of having it only in ib/ipoib
4. IPoIB - handle ib device and bonding device the same way in neigh_cleanup function

Links to earlier discussion:
1. A discussion in netdev about bonding support for IPoIB. http://lists.openwall.net/netdev/2006/11/30/46
2. V4 series http://lists.openfabrics.org/pipermail/general/2007-August/039825.html
[PATCH V5 8/11] net/bonding: Handle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c |3 ++- drivers/net/bonding/bond_sysfs.c | 19 +-- drivers/net/bonding/bonding.h|1 + 3 files changed, 16 insertions(+), 7 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:55:48.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:29:11.911298577 +0300 @@ -1224,7 +1224,8 @@ static int bond_compute_features(struct struct slave *slave; struct net_device *bond_dev = bond-dev; unsigned long features = bond_dev-features; - unsigned short max_hard_header_len = ETH_HLEN; + unsigned short max_hard_header_len = max((u16)ETH_HLEN, + bond_dev-hard_header_len); int i; features = ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES); Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-08-15 10:55:48.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-08-15 12:14:41.152469089 +0300 @@ -164,9 +164,7 @@ static ssize_t bonding_store_bonds(struc printk(KERN_INFO DRV_NAME : %s is being deleted...\n, bond-dev-name); - bond_deinit(bond-dev); - bond_destroy_sysfs_entry(bond); - unregister_netdevice(bond-dev); + bond_destroy(bond); rtnl_unlock(); goto out; } @@ -260,6 +258,7 @@ static ssize_t bonding_store_slaves(stru char command[IFNAMSIZ + 1] = { 0, }; char *ifname; int i, res, found, ret = count; + u32 original_mtu; struct slave *slave; struct net_device *dev = NULL; struct bonding *bond = to_bond(d); @@ -325,6 +324,7 @@ static ssize_t bonding_store_slaves(stru } /* Set the slave's MTU to match the bond */ + original_mtu = dev-mtu; if (dev-mtu != bond-dev-mtu) { if (dev-change_mtu) { res = dev-change_mtu(dev, @@ -339,6 +339,9 @@ static ssize_t bonding_store_slaves(stru } rtnl_lock(); res = bond_enslave(bond-dev, dev); + 
bond_for_each_slave(bond, slave, i) + if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) + slave-original_mtu = original_mtu; rtnl_unlock(); if (res) { ret = res; @@ -351,13 +354,17 @@ static ssize_t bonding_store_slaves(stru bond_for_each_slave(bond, slave, i) if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) { dev = slave-dev; + original_mtu = slave-original_mtu; break; } if (dev) { printk(KERN_INFO DRV_NAME : %s: Removing slave %s\n, bond-dev-name, dev-name); rtnl_lock(); - res = bond_release(bond-dev, dev); + if (bond-setup_by_slave) + res = bond_release_and_destroy(bond-dev, dev); + else + res = bond_release(bond-dev, dev); rtnl_unlock(); if (res) { ret = res; @@ -365,9 +372,9 @@ static ssize_t bonding_store_slaves(stru } /* set the slave MTU to the default */ if (dev-change_mtu) { - dev-change_mtu(dev, 1500); + dev-change_mtu(dev, original_mtu); } else { - dev-mtu = 1500; + dev-mtu = original_mtu; } } else { Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-15 10:55:34.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-20 14:29:11.912298402 +0300 @@ -156,6 +156,7 @@ struct slave { s8 link;/* one of
[PATCH V5 9/11] net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev->state field is on. This improves the chances for the arp packet to be transmitted.

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c |   24 +++++++++++++++++++++---
 drivers/net/bonding/bonding.h   |    1 +
 2 files changed, 22 insertions(+), 3 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c	2007-08-15 10:56:33.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c	2007-08-15 11:04:37.221123652 +0300
@@ -1102,8 +1102,14 @@ void bond_change_active_slave(struct bon
 		if (new_active && !bond->do_set_mac_addr)
 			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
 				new_active->dev->addr_len);
-
-		bond_send_gratuitous_arp(bond);
+		if (bond->curr_active_slave &&
+			test_bit(__LINK_STATE_LINKWATCH_PENDING,
+					&bond->curr_active_slave->dev->state)) {
+			dprintk("delaying gratuitous arp on %s\n",
+				bond->curr_active_slave->dev->name);
+			bond->send_grat_arp = 1;
+		} else
+			bond_send_gratuitous_arp(bond);
 	}
 }
 
@@ -2083,6 +2089,17 @@ void bond_mii_monitor(struct net_device
 	 * program could monitor the link itself if needed.
 	 */
 
+	if (bond->send_grat_arp) {
+		if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING,
+				&bond->curr_active_slave->dev->state))
+			dprintk("Needs to send gratuitous arp but not yet\n");
+		else {
+			dprintk("sending delayed gratuitous arp on on %s\n",
+				bond->curr_active_slave->dev->name);
+			bond_send_gratuitous_arp(bond);
+			bond->send_grat_arp = 0;
+		}
+	}
 	read_lock(&bond->curr_slave_lock);
 	oldcurrent = bond->curr_active_slave;
 	read_unlock(&bond->curr_slave_lock);
@@ -2484,7 +2501,7 @@ static void bond_send_gratuitous_arp(str
 
 	if (bond->master_ip) {
 		bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
-				  bond->master_ip, 0);
+				bond->master_ip, 0);
 	}
 
 	list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
@@ -4293,6 +4310,7 @@ static int bond_init(struct net_device *
 	bond->current_arp_slave = NULL;
 	bond->primary_slave = NULL;
 	bond->dev = bond_dev;
+	bond->send_grat_arp = 0;
 	INIT_LIST_HEAD(&bond->vlan_list);
 
 	/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===================================================================
--- net-2.6.orig/drivers/net/bonding/bonding.h	2007-08-15 10:56:33.0 +0300
+++ net-2.6/drivers/net/bonding/bonding.h	2007-08-15 11:05:41.516451497 +0300
@@ -187,6 +187,7 @@ struct bonding {
 	struct   timer_list arp_timer;
 	s8       kill_timers;
 	s8	 do_set_mac_addr;
+	s8       send_grat_arp;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
 	struct   proc_dir_entry *proc_entry;
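The deferral logic of patch 9/11 is a small state machine: on fail-over, send the gratuitous ARP at once unless the new active slave still has a link-watch event pending, in which case remember the request and let the periodic monitor retry. A minimal userspace sketch of just that control flow is below; the toy_* names are hypothetical, and a plain boolean stands in for the __LINK_STATE_LINKWATCH_PENDING bit.

```c
#include <stdbool.h>

struct toy_bond {
	bool linkwatch_pending; /* stand-in for the active slave's pending bit */
	bool send_grat_arp;     /* deferred-send flag, as added by the patch   */
	int arps_sent;          /* counts sends so the behaviour is observable */
};

static void toy_send_grat_arp(struct toy_bond *b)
{
	b->arps_sent++;
}

/* Called on fail-over: send now, or defer while link-watch is pending. */
void toy_change_active_slave(struct toy_bond *b)
{
	if (b->linkwatch_pending)
		b->send_grat_arp = true;
	else
		toy_send_grat_arp(b);
}

/* Called periodically by the monitor: flush a deferred send once the
 * pending condition has cleared, and do it exactly once. */
void toy_mii_monitor_tick(struct toy_bond *b)
{
	if (b->send_grat_arp && !b->linkwatch_pending) {
		toy_send_grat_arp(b);
		b->send_grat_arp = false;
	}
}
```

The deferred flag is cleared only after the send, so a monitor tick that still sees the pending bit leaves the request queued for the next tick.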
[PATCH V5 10/11] net/bonding: Destroy bonding master when last slave is gone
When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after the last slave was unenslaved, because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions are not used anymore.

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c |   45 +++-
 drivers/net/bonding/bonding.h   |    3 +++
 2 files changed, 47 insertions(+), 1 deletion(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c	2007-08-20 14:43:17.123702132 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c	2007-08-20 14:43:17.850571535 +0300
@@ -1256,6 +1256,7 @@ static int bond_compute_features(struct
 static void bond_setup_by_slave(struct net_device *bond_dev,
 				struct net_device *slave_dev)
 {
+	struct bonding *bond = bond_dev->priv;
 	bond_dev->hard_header	        = slave_dev->hard_header;
 	bond_dev->rebuild_header        = slave_dev->rebuild_header;
 	bond_dev->hard_header_cache	= slave_dev->hard_header_cache;
@@ -1270,6 +1271,7 @@ static void bond_setup_by_slave(struct n
 
 	memcpy(bond_dev->broadcast, slave_dev->broadcast,
 		slave_dev->addr_len);
+	bond->setup_by_slave = 1;
 }
 
 /* enslave device slave to bond device master */
@@ -1838,6 +1840,35 @@ int bond_release(struct net_device *bond
 }
 
 /*
+ * Destroy a bonding device.
+ * Must be under rtnl_lock when this function is called.
+ */
+void bond_destroy(struct bonding *bond)
+{
+	bond_deinit(bond->dev);
+	bond_destroy_sysfs_entry(bond);
+	unregister_netdevice(bond->dev);
+}
+
+/*
+ * First release a slave and then destroy the bond if no more slaves are left.
+ * Must be under rtnl_lock when this function is called.
+ */
+int bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev)
+{
+	struct bonding *bond = bond_dev->priv;
+	int ret;
+
+	ret = bond_release(bond_dev, slave_dev);
+	if ((ret == 0) && (bond->slave_cnt == 0)) {
+		printk(KERN_INFO DRV_NAME "%s: destroying bond for.\n",
+		       bond_dev->name);
+		bond_destroy(bond);
+	}
+	return ret;
+}
+
+/*
  * This function releases all slaves.
  */
 static int bond_release_all(struct net_device *bond_dev)
@@ -3322,7 +3353,11 @@ static int bond_slave_netdev_event(unsig
 	switch (event) {
 	case NETDEV_UNREGISTER:
 		if (bond_dev) {
-			bond_release(bond_dev, slave_dev);
+			dprintk("slave %s unregisters\n", slave_dev->name);
+			if (bond->setup_by_slave)
+				bond_release_and_destroy(bond_dev, slave_dev);
+			else
+				bond_release(bond_dev, slave_dev);
 		}
 		break;
 	case NETDEV_CHANGE:
@@ -3331,6 +3366,13 @@ static int bond_slave_netdev_event(unsig
 		 * sets up a hierarchical bond, then rmmod's
 		 * one of the slave bonding devices?
 		 */
+		if (slave_dev->priv_flags & IFF_SLAVE_DETACH) {
+			dprintk("slave %s detaching\n", slave_dev->name);
+			if (bond->setup_by_slave)
+				bond_release_and_destroy(bond_dev, slave_dev);
+			else
+				bond_release(bond_dev, slave_dev);
+		}
 		break;
 	case NETDEV_DOWN:
 		/*
@@ -4311,6 +4353,7 @@ static int bond_init(struct net_device *
 	bond->primary_slave = NULL;
 	bond->dev = bond_dev;
 	bond->send_grat_arp = 0;
+	bond->setup_by_slave = 0;
 	INIT_LIST_HEAD(&bond->vlan_list);
 
 	/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===================================================================
--- net-2.6.orig/drivers/net/bonding/bonding.h	2007-08-20 14:43:17.123702132 +0300
+++ net-2.6/drivers/net/bonding/bonding.h	2007-08-20 14:47:52.845180870 +0300
@@ -188,6 +188,7 @@ struct bonding {
 	s8       kill_timers;
 	s8	 do_set_mac_addr;
 	s8       send_grat_arp;
+	s8	 setup_by_slave;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
 	struct   proc_dir_entry *proc_entry;
@@ -295,6 +296,8 @@ static inline void bond_unset_master_alb
 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev);
 int bond_create(char *name, struct bond_params *params, struct bonding **newbond);
+void
[PATCH V5 5/11] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be those of the active slave when the enslaved device is not of ARPHRD_ETHER type. Basically it overrides the settings done by ether_setup(), which are netdevice **type** dependent and hence might not be appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves of dissimilar ether types, as was concluded in the v1 discussion.

IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3-byte IB QP (Queue Pair) number and a 16-byte IB port GID (Global ID) of the port this IPoIB device is bound to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (I have omitted here some details which are not important for the bonding RFC).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c | 39 +++
 1 files changed, 39 insertions(+)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c	2007-08-15 10:08:59.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c	2007-08-15 10:54:13.424688411 +0300
@@ -1237,6 +1237,26 @@ static int bond_compute_features(struct
 	return 0;
 }
 
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+				struct net_device *slave_dev)
+{
+	bond_dev->hard_header = slave_dev->hard_header;
+	bond_dev->rebuild_header = slave_dev->rebuild_header;
+	bond_dev->hard_header_cache = slave_dev->hard_header_cache;
+	bond_dev->header_cache_update = slave_dev->header_cache_update;
+	bond_dev->hard_header_parse = slave_dev->hard_header_parse;
+
+	bond_dev->neigh_setup = slave_dev->neigh_setup;
+
+	bond_dev->type = slave_dev->type;
+	bond_dev->hard_header_len = slave_dev->hard_header_len;
+	bond_dev->addr_len = slave_dev->addr_len;
+
+	memcpy(bond_dev->broadcast, slave_dev->broadcast,
+		slave_dev->addr_len);
+}
+
 /* enslave device <slave> to bond device <master> */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond
 		goto err_undo_flags;
 	}
 
+	/* set bonding device ether type by slave - bonding netdevices are
+	 * created with ether_setup, so when the slave type is not ARPHRD_ETHER
+	 * there is a need to override some of the type dependent attribs/funcs.
+	 *
+	 * bond ether type mutual exclusion - don't allow slaves of dissimilar
+	 * ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
+	 */
+	if (bond->slave_cnt == 0) {
+		if (slave_dev->type != ARPHRD_ETHER)
+			bond_setup_by_slave(bond_dev, slave_dev);
+	} else if (bond_dev->type != slave_dev->type) {
+		printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different "
+			"from other slaves (%d), can not enslave it.\n",
+			slave_dev->name,
+			slave_dev->type, bond_dev->type);
+		res = -EINVAL;
+		goto err_undo_flags;
+	}
+
 	if (slave_dev->set_mac_address == NULL) {
 		printk(KERN_ERR DRV_NAME
 			": %s: Error: The slave device you specified does "
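The type rule in the bond_enslave() hunk above can be modelled outside the kernel. What follows is only a minimal userspace sketch, not the driver code: the model_* structures and functions are hypothetical stand-ins, and only the type and addr_len fields are modelled. The ARPHRD_* values are the ones from linux/if_arp.h.

```c
#include <errno.h>

/* Hardware type values as defined in linux/if_arp.h. */
#define ARPHRD_ETHER      1
#define ARPHRD_INFINIBAND 32

/* Hypothetical stand-in for struct net_device: only the fields the
 * rule looks at are modelled. */
struct model_dev {
	unsigned short type;      /* ARPHRD_* hardware type */
	unsigned short addr_len;  /* hardware address length in bytes */
};

/* Hypothetical stand-in for the bond: its own device plus a slave count. */
struct model_bond {
	struct model_dev dev;
	int slave_cnt;
};

/* A bond starts out looking like an Ethernet device (as ether_setup does). */
static void model_bond_init(struct model_bond *bond)
{
	bond->dev.type = ARPHRD_ETHER;
	bond->dev.addr_len = 6;
	bond->slave_cnt = 0;
}

/* The first slave decides the bond's type; every later slave must match.
 * Returns 0 on success or -EINVAL on a type mismatch. */
static int model_enslave(struct model_bond *bond, const struct model_dev *slave)
{
	if (bond->slave_cnt == 0) {
		if (slave->type != ARPHRD_ETHER) {
			/* Take over the type dependent attributes of the slave. */
			bond->dev.type = slave->type;
			bond->dev.addr_len = slave->addr_len;
		}
	} else if (bond->dev.type != slave->type) {
		return -EINVAL;
	}
	bond->slave_cnt++;
	return 0;
}
```

With this model, enslaving an InfiniBand device first turns the bond into an ARPHRD_INFINIBAND device, and a later attempt to add an Ethernet device is refused with -EINVAL while the slave count stays unchanged.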
[PATCH 11/11] bonding: Optionally allow ethernet slaves to keep own MAC
Update the "don't change MAC of slaves" functionality added in previous changes to be a generic option, rather than something tied to IB devices, as it's occasionally useful for regular ethernet devices as well.

Adds fail_over_mac option (which is automatically enabled for IB slaves), applicable only to active-backup mode.

Includes documentation update.

Updates bonding driver version to 3.2.0.

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]
---
 Documentation/networking/bonding.txt | 33 +++
 drivers/net/bonding/bond_main.c | 57 +
 drivers/net/bonding/bond_sysfs.c | 49 +
 drivers/net/bonding/bonding.h | 6 ++--
 4 files changed, 121 insertions(+), 24 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 1da5666..1134062 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -281,6 +281,39 @@ downdelay
 	will be rounded down to the nearest multiple. The default
 	value is 0.
 
+fail_over_mac
+
+	Specifies whether active-backup mode should set all slaves to
+	the same MAC address (the traditional behavior), or, when
+	enabled, change the bond's MAC address when changing the
+	active interface (i.e., fail over the MAC address itself).
+
+	Fail over MAC is useful for devices that cannot ever alter
+	their MAC address, or for devices that refuse incoming
+	broadcasts with their own source MAC (which interferes with
+	the ARP monitor).
+
+	The down side of fail over MAC is that every device on the
+	network must be updated via gratuitous ARP, vs. just updating
+	a switch or set of switches (which often takes place for any
+	traffic, not just ARP traffic, if the switch snoops incoming
+	traffic to update its tables) for the traditional method. If
+	the gratuitous ARP is lost, communication may be disrupted.
+
+	When fail over MAC is used in conjunction with the mii monitor,
+	devices which assert link up prior to being able to actually
+	transmit and receive are particularly susceptible to loss of
+	the gratuitous ARP, and an appropriate updelay setting may be
+	required.
+
+	A value of 0 disables fail over MAC, and is the default. A
+	value of 1 enables fail over MAC. This option is enabled
+	automatically if the first slave added cannot change its MAC
+	address. This option may be modified via sysfs only when no
+	slaves are present in the bond.
+
+	This option was added in bonding version 3.2.0.
+
 lacp_rate
 
 	Option specifying the rate in which we'll ask our link partner
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77caca3..c01ff9d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -97,6 +97,7 @@ static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
 static char *arp_validate = NULL;
+static int fail_over_mac = 0;
 struct bond_params bonding_defaults;
 
 module_param(max_bonds, int, 0);
@@ -130,6 +131,8 @@ module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
 module_param(arp_validate, charp, 0);
 MODULE_PARM_DESC(arp_validate, "validate src/dst of ARP probes: none (default), active, backup or all");
+module_param(fail_over_mac, int, 0);
+MODULE_PARM_DESC(fail_over_mac, "For active-backup, do not set all slaves to the same MAC. 0 for off (default), 1 for on.");
 
 /*- Global variables */
 
@@ -1099,7 +1102,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 	/* when bonding does not set the slave MAC address, the bond MAC
 	 * address is the one of the active slave.
 	 */
-	if (new_active && !bond->do_set_mac_addr)
+	if (new_active && bond->params.fail_over_mac)
 		memcpy(bond->dev->dev_addr, new_active->dev->dev_addr,
 			new_active->dev->addr_len);
 	if (bond->curr_active_slave &&
@@ -1371,16 +1374,16 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	if (slave_dev->set_mac_address == NULL) {
 		if (bond->slave_cnt == 0) {
 			printk(KERN_WARNING DRV_NAME
-				": %s: Warning: The first slave device you "
-				"specified does not support setting the MAC "
-				"address. This bond MAC address would be that "
-				"of the active slave.\n", bond_dev->name);
-			bond->do_set_mac_addr = 0;
-		} else
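For readers who want to see the fail over MAC behaviour in isolation, here is a small userspace model of the bond_change_active_slave() hunk above. It is a sketch under assumptions, not the driver code: the model_* names are hypothetical, addresses are fixed at 6 bytes, and the traditional mode is reduced to "leave the bond address alone" because in that mode the slaves are assumed to have been given the bond's MAC already at enslave time.

```c
#include <string.h>

#define MODEL_ALEN 6  /* Ethernet-sized address, chosen for the sketch */

/* Hypothetical stand-in for a slave device. */
struct model_slave {
	unsigned char dev_addr[MODEL_ALEN];
};

/* Hypothetical stand-in for the bond and its fail_over_mac parameter. */
struct model_bond_fom {
	unsigned char dev_addr[MODEL_ALEN];
	int fail_over_mac;                      /* 0 = traditional, 1 = fail over MAC */
	struct model_slave *curr_active_slave;
};

/* When fail over MAC is on, the bond's MAC follows the new active slave.
 * When it is off, the bond's MAC is left untouched. */
static void model_change_active_slave(struct model_bond_fom *bond,
				      struct model_slave *new_active)
{
	if (new_active && bond->fail_over_mac)
		memcpy(bond->dev_addr, new_active->dev_addr, MODEL_ALEN);
	bond->curr_active_slave = new_active;
}
```

The visible effect is that peers see the bond's address change on every failover when the option is on, which is why the documentation text stresses the gratuitous ARP.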
Re: Please pull 'z1211' branch of wireless-2.6
On Wed, Sep 19, 2007 at 11:08:16PM +0100, Daniel Drake wrote:

> I would like to this until 2.6.25 until I have had time to clear up some
> final issues and do more testing myself of zd1211rw-mac80211. I also
> think we need to discuss the rename...

Renames being what they are, I was hoping to avoid a bikeshed discussion about the choice of names. My main point was to get it into the tree with a unique and manageable name. I'm sure we could still rename it again before 2.6.24 ships or even later.

I know that you will argue that a rename is unnecessary if we simply port the existing driver to mac80211, which is certainly true. I just wonder if that is the least bumpy solution for users. At least with a new driver, if something doesn't work then the old driver is still there as a fallback. Plus you can avoid some confusion with old howtos and such on the web referring to an old driver instead of the new one, etc. Maybe that isn't a huge issue in this case, but I wouldn't underestimate the possible confusion.

> (just to clarify to others: this is the first I heard of this merge
> before John posted it).

Yes, sorry...permission, forgiveness...forgive? :-)

> John, thanks a lot for your efforts, I hope you don't mind waiting one
> extra release cycle for me to sort a few things out.

Well, obviously I would like to get it out now. The longer we are without a mac80211-based driver for zd1211 hardware, the longer we must maintain the softmac component (or at least take bug reports for it).

If you are determined not to have it in 2.6.24 then I will relent. I will also suggest that Larry start sending any softmac bugs to you... :-)

If we will be having a port rather than a new driver, how soon after 2.6.24-rc1 closes can we queue the port for 2.6.25? I think it should be almost immediately, to ensure maximum test exposure and to seal the deal. What do you think?

Thanks,

John
--
John W. Linville [EMAIL PROTECTED]
Re: [PATCH - net-2.6.24 1/2] Introduce and use print_ip
* Joe Perches [EMAIL PROTECTED] 2007-09-19 23:53
> This removes the uses of NIPQUAD and HIPQUAD in drivers/net and net
>
> IPV4 Use:
>
> DECLARE_IP_BUF(ipbuf);
> __be32 addr;
> print_ip(ipbuf, addr)
>
> Signed-off-by: Joe Perches [EMAIL PROTECTED]
>
> please pull from:
> git pull http://repo.or.cz/r/linux-2.6/trivial-mods.git print_ipv4

Including a patch for review would be helpful.
Re: Please pull 'z1211' branch of wireless-2.6
John W. Linville wrote:

> I know that you will argue that a rename is unnecessary if we simply
> port the existing driver to mac80211, which is certainly true. I just
> wonder if that is the least bumpy solution for users. At least with
> a new driver, if something doesn't work then the old driver is still
> there as a fallback. Plus you can avoid some confusion with old howtos
> and such on the web referring to an old driver instead of the new one,
> etc. Maybe that isn't a huge issue in this case, but I wouldn't
> underestimate the possible confusion.

Maybe I'll provide a one-off externally building driver for 2.6.25 or something like that, just as a basis for comparison. I think biting the bullet and simply attacking the issues that come up is the best way.

Old documentation will still be relevant for the mac80211 driver, especially if we don't change the driver/config names -- offhand I can't think of any obvious differences between the user interface to the 2 drivers.

>> (just to clarify to others: this is the first I heard of this merge
>> before John posted it).
>
> Yes, sorry...permission, forgiveness...forgive? :-)

Of course :)

> If you are determined not to have it in 2.6.24 then I will relent.
> I will also suggest that Larry start sending any softmac bugs to
> you... :-)

That's fine.

> If we will be having a port rather than a new driver, how soon after
> 2.6.24-rc1 closes can we queue the port for 2.6.25? I think it should
> be almost immediately, to ensure maximum test exposure and to seal
> the deal. What do you think?

I think that's realistic, I'll do what I can.

Thanks,
Daniel
Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
On Thu, 20 Sep 2007, Jeff Garzik wrote:

>> You may be pleased (or less so) to hear that the version of sb1250-mac.c
>> in your tree does not even build (because of
>> 42d53d6be113f974d8152979c88e1061b953bd12) and the patch below does not
>> address it. I ran out of time in the evening, but I will send you a fix
>> shortly. To be honest I think even with bulk changes it may be worth
>> checking whether they do not break stuff. ;-)
>
> hrm. I cannot get this to apply on top of linux-2.6.git,
> netdev-2.6.git#upstream (prior to net-2.6.24 rebase) or
> netdev-2.6.git#upstream (after net-2.6.24 rebase)

It applies on top of current -mm. It seems to apply to a copy of netdev-2.6.git#upstream that I have got, but I am probably missing something... If I try to clone your repository again I get:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git linux
Initialized empty Git repository in /home/macro/GIT-other/linux-netdev/linux/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from 'git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git' failed.
$

For linux-2.6.git the patch-mips-2.6.23-rc5-20070904-sb1250-mac-typedef-7 version applies as submitted originally; I can resubmit this one if you like.

I am slowly getting lost and I have another big chunk for sb1250-mac.c waiting to be put on top of these...

  Maciej
Re: [LARTC] ifb and ppp
> Sorry, I didn't follow the thread - what is the goal to be achieved
> with the setup?

A simple ingress shaping on ppp0 (PPPoE DSL line). I want to replace my old imq ingress shaper in favor of ifb.

My former script used iptables marks to classify the packets. My iptables marks are getting set, just as before with imq. But tc does not seem to recognize them: it only uses the default class. So I ran tcpdump -i ifb0 and discovered that the packets seem to be still encapsulated on ifb0. I suppose this is why my iptables stuff is not working.

I've attached the ingress part of my shaping script.

Thanks for your help
Frithjof

tc qdisc del dev ppp0 root 2> /dev/null > /dev/null
tc qdisc del dev ifb0 root 2> /dev/null > /dev/null
tc qdisc del dev ppp0 ingress

modprobe ifb
ifconfig ifb0 up

tc qdisc add dev ppp0 ingress
tc filter add dev ppp0 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0

tc qdisc add dev ifb0 handle 1: root hfsc default 32
tc class add dev ifb0 parent 1: classid 1:1 hfsc sc rate 6000kbit ul rate 6000kbit
tc class add dev ifb0 parent 1:1 classid 1:30 hfsc rt umax 208b dmax 20ms rate 83kbit ls rate 120kbit
tc class add dev ifb0 parent 1:1 classid 1:31 hfsc sc rate $[(6000-120)/3]kbit ul rate 6000kbit
tc class add dev ifb0 parent 1:1 classid 1:32 hfsc sc rate $[(6000-120)/3*2]kbit ul rate 6000kbit

tc qdisc add dev ifb0 parent 1:30 handle 30: sfq perturb 10
tc qdisc add dev ifb0 parent 1:31 handle 31: sfq perturb 10
tc qdisc add dev ifb0 parent 1:32 handle 32: red limit 100 min 5000 max 10 avpkt 1000 burst 50

tc filter add dev ifb0 parent 1:0 prio 0 protocol ip handle 30 fw flowid 1:30
tc filter add dev ifb0 parent 1:0 prio 0 protocol ip handle 31 fw flowid 1:31
tc filter add dev ifb0 parent 1:0 prio 0 protocol ip handle 32 fw flowid 1:32

iptables -t mangle -N MYSHAPER-IN
iptables -t mangle -I PREROUTING -i ppp0 -j MYSHAPER-IN
iptables -t mangle -A MYSHAPER-IN -p tcp -m length --length :64 -j MARK --set-mark 31 # short TCP packets are probably ACKs
iptables -t mangle -A MYSHAPER-IN -p tcp --dport 22 -m length --length :500 -j MARK --set-mark 3 # secure shell
iptables -t mangle -A MYSHAPER-IN -p tcp --sport 22 -m length --length :500 -j MARK --set-mark 31 # secure shell
iptables -t mangle -A MYSHAPER-IN -p ! tcp -j MARK --set-mark 31 # Set non-tcp packets to high priority
iptables -t mangle -A MYSHAPER-IN -m mark --mark 0 -j MARK --set-mark 32 # redundant- mark any unmarked packets as 26 (low prio)
[...]
Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6
On Wed, 19 Sep 2007 23:53:31 -0700 Joe Perches wrote:

> In the same vein as print_mac, the implementations
> introduce declaration macros:
> DECLARE_IP_BUF(var)
> DECLARE_IPV6_BUF(var)
> and functions:
> print_ip
> print_ipv6
> print_ipv6_nofmt
>
> IPV4 Use:
>
> DECLARE_IP_BUF(ipbuf);
> __be32 addr;
> print_ip(ipbuf, addr);
>
> IPV6 use:
>
> DECLARE_IPV6_BUF(ipv6buf);
> const struct in6_addr *addr;
> print_ipv6(ipv6buf, addr);
> and
> print_ipv6_nofmt(ipv6buf, addr);
>
> compiled x86, defconfig and allyesconfig

How large are the patches if you posted them for review instead of just referencing gits for them? (which cuts down on review possibilities)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
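Since the patches themselves were not posted inline, the following is only a guess at the shape of the IPv4 helper, written as a userspace sketch. MODEL_DECLARE_IP_BUF and model_print_ip are hypothetical names, and the real DECLARE_IP_BUF and print_ip in the referenced tree may be defined differently. The idea shown is the one from the description: the caller declares a buffer, and the function formats a network byte order address into it and returns the buffer so it can be used directly as a printf argument.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Room for "255.255.255.255" plus the terminating NUL. */
#define MODEL_IP_BUF_LEN 16

/* Hypothetical analogue of DECLARE_IP_BUF(var): declares the buffer. */
#define MODEL_DECLARE_IP_BUF(var) char var[MODEL_IP_BUF_LEN]

/* Hypothetical analogue of print_ip(): addr_be is in network byte order,
 * so its bytes in memory are already the dotted-quad octets in order. */
static char *model_print_ip(char *buf, uint32_t addr_be)
{
	unsigned char b[4];

	memcpy(b, &addr_be, sizeof(b));
	snprintf(buf, MODEL_IP_BUF_LEN, "%u.%u.%u.%u",
		 (unsigned)b[0], (unsigned)b[1], (unsigned)b[2], (unsigned)b[3]);
	return buf;
}
```

A caller would write MODEL_DECLARE_IP_BUF(ipbuf); and then pass model_print_ip(ipbuf, addr) as a "%s" argument, instead of expanding a four-argument NIPQUAD-style macro at every call site.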
Re: net-2.6.24 plans
On Thu, 2007-09-20 at 10:17 -0400, John W. Linville wrote:

>> 2) ATMEL USB driver
>
> These are both really new. I think I'll transfer them to my
> wireless-2.6 tree, but still hold them back at least until 2.6.25.

Also, atmel isn't even ported to mac80211 yet, is it?

>> 3) NL80211
>
> I need to check w/ Johannes to see if the user-facing portions of
> this have stabilized.

I have a patch to basically remove everything from nl80211 that we're not using today, and make the interface well-defined so each type of setting has methods to new, del, get, set, for example create, remove, get info or change a virtual interface.

If you wish, I can post this patch for inclusion into wireless-dev and then copy the resulting nl80211 to net-2.6.24, including the mac80211 hooks to make use of it. Shouldn't take more than a few hours.

johannes
Re: net-2.6.24 plans
On Wed, Sep 19, 2007 at 03:19:28PM -0700, David Miller wrote:

> So it looks like what's left is:
>
> 1) ATH5K driver
> 2) ATMEL USB driver

These are both really new. I think I'll transfer them to my wireless-2.6 tree, but still hold them back at least until 2.6.25.

> 3) NL80211

I need to check w/ Johannes to see if the user-facing portions of this have stabilized.

> 4) misc bits sprinkled around mac80211

These bits are mostly pieces with unsettled user interface issues or unsettled features that still need some development. I'll be holding on to these a while longer.

Thanks,

John
--
John W. Linville [EMAIL PROTECTED]
Re: bnx2 dirver's firmware images
On Wednesday 19 September 2007 22:43, Michael Chan wrote: On Wed, 2007-09-19 at 21:29 +0100, Denys Vlasenko wrote: Are you saying that you successfully run-tested it? I've only reviewed the code. Let's resolve these issues first before testing the code. Please test these two patches. I updated them according to your comments. -- vda diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c --- linux-2.6.23-rc6/drivers/net/bnx2.c 2007-09-14 00:08:11.0 +0100 +++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c 2007-09-20 15:47:06.0 +0100 @@ -52,6 +52,8 @@ #include bnx2_fw.h #include bnx2_fw2.h +#define FW_BUF_SIZE 0x8000 + #define DRV_MODULE_NAME bnx2 #define PFX DRV_MODULE_NAME : #define DRV_MODULE_VERSION 1.6.4 @@ -2767,89 +2769,44 @@ bnx2_set_rx_mode(struct net_device *dev) spin_unlock_bh(bp-phy_lock); } -#define FW_BUF_SIZE 0x8000 - +/* To be moved to generic lib/ */ static int -bnx2_gunzip_init(struct bnx2 *bp) +bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len, void **outbuf) { - if ((bp-gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL) - goto gunzip_nomem1; + struct z_stream_s *strm; + int rc; - if ((bp-strm = kmalloc(sizeof(*bp-strm), GFP_KERNEL)) == NULL) - goto gunzip_nomem2; + /* gzip header (1f,8b,08... 
10 bytes total + possible asciz filename) + * is stripped */ - bp-strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); - if (bp-strm-workspace == NULL) + rc = -ENOMEM; + strm = kmalloc(sizeof(*strm), GFP_KERNEL); + if (strm == NULL) + goto gunzip_nomem2; + strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); + if (strm-workspace == NULL) goto gunzip_nomem3; - return 0; + strm-next_in = zbuf; + strm-avail_in = len; + strm-next_out = gunzip_buf; + strm-avail_out = sz; + + rc = zlib_inflateInit2(strm, -MAX_WBITS); + if (rc == Z_OK) { + rc = zlib_inflate(strm, Z_FINISH); + if (rc == Z_OK) + rc = sz - strm-avail_out; + else + rc = -EINVAL; + zlib_inflateEnd(strm); + } else + rc = -EINVAL; + kfree(strm-workspace); gunzip_nomem3: - kfree(bp-strm); - bp-strm = NULL; - + kfree(strm); gunzip_nomem2: - vfree(bp-gunzip_buf); - bp-gunzip_buf = NULL; - -gunzip_nomem1: - printk(KERN_ERR PFX %s: Cannot allocate firmware buffer for - uncompression.\n, bp-dev-name); - return -ENOMEM; -} - -static void -bnx2_gunzip_end(struct bnx2 *bp) -{ - kfree(bp-strm-workspace); - - kfree(bp-strm); - bp-strm = NULL; - - if (bp-gunzip_buf) { - vfree(bp-gunzip_buf); - bp-gunzip_buf = NULL; - } -} - -static int -bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen) -{ - int n, rc; - - /* check gzip header */ - if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED)) - return -EINVAL; - - n = 10; - -#define FNAME 0x8 - if (zbuf[3] FNAME) - while ((zbuf[n++] != 0) (n len)); - - bp-strm-next_in = zbuf + n; - bp-strm-avail_in = len - n; - bp-strm-next_out = bp-gunzip_buf; - bp-strm-avail_out = FW_BUF_SIZE; - - rc = zlib_inflateInit2(bp-strm, -MAX_WBITS); - if (rc != Z_OK) - return rc; - - rc = zlib_inflate(bp-strm, Z_FINISH); - - *outlen = FW_BUF_SIZE - bp-strm-avail_out; - *outbuf = bp-gunzip_buf; - - if ((rc != Z_OK) (rc != Z_STREAM_END)) - printk(KERN_ERR PFX %s: Firmware decompression error: %s\n, - bp-dev-name, bp-strm-msg); - - 
zlib_inflateEnd(bp-strm); - - if (rc == Z_STREAM_END) - return 0; - return rc; } @@ -2902,22 +2859,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_ /* Load the Text area. */ offset = cpu_reg-spad_base + (fw-text_addr - cpu_reg-mips_view_base); if (fw-gz_text) { - u32 text_len; - void *text; - - rc = bnx2_gunzip(bp, fw-gz_text, fw-gz_text_len, text, - text_len); - if (rc) - return rc; - - fw-text = text; - } - if (fw-gz_text) { + u32 *text; int j; + text = vmalloc(FW_BUF_SIZE); + if (!text) + return -ENOMEM; + rc = bnx2_gunzip(text, FW_BUF_SIZE, fw-gz_text, fw-gz_text_len); + if (rc 0) { + vfree(text); + return rc; + } for (j = 0; j (fw-text_len / 4); j++, offset += 4) { - REG_WR_IND(bp, offset, cpu_to_le32(fw-text[j])); + REG_WR_IND(bp, offset, cpu_to_le32(text[j])); } + vfree(text); } /* Load the Data area. */ @@ -2979,27 +2935,27 @@ bnx2_init_cpus(struct bnx2 *bp) { struct cpu_reg cpu_reg; struct fw_info *fw; - int rc = 0; + int rc; void *text; - u32 text_len; - - if ((rc = bnx2_gunzip_init(bp)) != 0) - return rc; /* Initialize the RV2P processor. */ - rc = bnx2_gunzip(bp, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1), text, - text_len); - if (rc) + text = vmalloc(FW_BUF_SIZE); + if (!text) + return -ENOMEM; + rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1)); + if (rc 0) { + vfree(text); goto init_cpu_err; + } + load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1); - load_rv2p_fw(bp, text, text_len, RV2P_PROC1); - - rc = bnx2_gunzip(bp, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2), text, - text_len); - if (rc) +
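The new bnx2_gunzip() above expects the gzip header to have been stripped already, where the removed code skipped it at run time. As an illustration of what "stripping" means here, this is a userspace sketch of the same header walk the removed code performed. The function name model_gzip_header_len is hypothetical, and like the removed code it only handles the fixed 10-byte header and the optional file name; the other optional gzip fields (extra field, comment, header CRC) are assumed to be absent.

```c
#include <stddef.h>

#define MODEL_GZ_MAGIC0   0x1f
#define MODEL_GZ_MAGIC1   0x8b
#define MODEL_GZ_DEFLATED 8     /* compression method byte */
#define MODEL_GZ_FNAME    0x08  /* flag bit: NUL-terminated file name follows */

/* Returns the number of header bytes in front of the raw deflate stream,
 * or -1 if the buffer does not start with a header this sketch understands. */
static long model_gzip_header_len(const unsigned char *zbuf, size_t len)
{
	size_t n = 10;  /* fixed part of the gzip header */

	if (len < n)
		return -1;
	if (zbuf[0] != MODEL_GZ_MAGIC0 || zbuf[1] != MODEL_GZ_MAGIC1 ||
	    zbuf[2] != MODEL_GZ_DEFLATED)
		return -1;

	if (zbuf[3] & MODEL_GZ_FNAME) {
		/* Skip the file name up to and including its NUL byte. */
		while (n < len && zbuf[n] != 0)
			n++;
		if (n == len)
			return -1;  /* name not terminated inside the buffer */
		n++;
	}
	return (long)n;
}
```

A build-time tool could use such a helper to drop the header from each firmware blob, leaving only the raw deflate data that zlib_inflateInit2() with negative window bits expects.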
[PATCH 0/3 Rev-4] Age Entry For IPv4 IPv6 Route Table
Hi Dave,

Thanks for the comment. I have created another patch set as you have suggested.

Your Comments:

In avoiding the age initialization at routing cache insertion time, you make the value provided totally inaccurate and essentially useless especially the very first time the value is asked for. I really don't like these changes, they have had problems every step of the way, and the above proves that we could essentially always return an age value of zero and still be compliant with the standards.

+ if (!*age) {
+ *age = timeval_to_sec(tv);
+ NLA_PUT_U32(skb, RTA_AGE, *age);

I have made a mistake. Sorry I didn't catch it earlier :-) So, NLA_PUT_U32(skb, RTA_AGE, 0) would have made more sense?

+ } else {
+ NLA_PUT_U32(skb, RTA_AGE, timeval_to_sec(tv) - *age);
+ }

Since you didn't like the hack, I have reimplemented the above by initializing the age value at the time of insertion. I hope this is what you pointed out in your comments. Please let me know if it's ok.

Stephen, as the age value is human readable we decided that it need not be accurate. I thought that rounding up would make it a bit more readable. But I think you are right. So, in this patchset I have taken care of this issue. Is this ok?

Regards,
Varun

Original Comment:

According to RFC 4292 (IP Forwarding Table MIB) there is a need for an age entry for all the routes in the routing table. The entry in the RFC is inetCidrRouteAge and oid is inetCidrRouteAge.1.10. Many snmp applications require this age entry. So I am adding the age field in the routing table for ipv4 and ipv6 and providing the interface for this value via netlink.

I made a note of the changes I made as per the suggestions given in the community. Here is the changelog.

Changelog since ver 1:

Changes                                               Suggestion
1) Change in the interface from proc to netlink.      David Miller, Yoshifuji
   It was not approved by David Miller and Yoshifuji.
2) Change from jiffies to timeval.                    Eric Dumazet
3) Rounding up timeval                                Patrick McHardy, Oliver Hartkopp,
                                                      Eric Dumazet
4) Relocate timeval_to_sec                            Stephen Hemminger, Krishna Kumar
5) Using macro RT6_GET_ROUTE_INFO                     Krishna Kumar
6) Add proper comment for timeval_to_sec              Eric Dumazet
7) Add proper comment for timeval insertion           Thomas Graf
8) Insert the age value at route insertion            David Miller
9) Remove round off.                                  Stephen Hemminger

Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
---
Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6
On Thu, 2007-09-20 at 07:55 -0700, Randy Dunlap wrote:

> How large are the patches if you posted them for review instead of
> just referencing gits for them? (which cuts down on review possibilities)

The v4 is ~130kb, the v6 ~35kb.

There is a gitweb available at:

print_ip:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv4
commit diff:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=1e3a30d5d8b49b3accca07cc84ecf6d977cacdd5

print_ipv6:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv6
commit diff:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=e96b794a57a164db84379e2baf5fe2622a5ae3bf
[PATCH 1/3 Rev4] New attribute RTA_AGE
A new attribute RTA_AGE is added for the age value to be exported to userlevel using netlink

Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
---
 include/linux/rtnetlink.h | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c91476c..68046a4 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -263,6 +263,7 @@ enum rtattr_type_t
 	RTA_SESSION,
 	RTA_MP_ALGO, /* no longer used */
 	RTA_TABLE,
+	RTA_AGE,
 	__RTA_MAX
 };

--
1.4.3.4
[PATCH 2/3 Rev4] Initilize and populate age field
The age field is filled with the current time at the time of creation of the route. When the routes are dumped then the age value stored in the route structure is subtracted from the current time value and the difference is the age expressed in secs. Signed-off-by: Varun Chandramohan [EMAIL PROTECTED] --- net/ipv4/fib_hash.c |5 + net/ipv4/fib_lookup.h|3 ++- net/ipv4/fib_semantics.c | 13 ++--- net/ipv4/fib_trie.c |1 + 4 files changed, 18 insertions(+), 4 deletions(-) diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c index 9ad1d9f..bb52193 100644 --- a/net/ipv4/fib_hash.c +++ b/net/ipv4/fib_hash.c @@ -385,6 +385,7 @@ static int fn_hash_insert(struct fib_tab struct fib_alias *fa, *new_fa; struct fn_zone *fz; struct fib_info *fi; + struct timeval tv; u8 tos = cfg-fc_tos; __be32 key; int err; @@ -420,6 +421,7 @@ static int fn_hash_insert(struct fib_tab else fa = fib_find_alias(f-fn_alias, tos, fi-fib_priority); + do_gettimeofday(tv); /* Now fa, if non-NULL, points to the first fib alias * with the same keys [prefix,tos,priority], if such key already * exists or to the node before which we will insert new one. @@ -448,6 +450,7 @@ static int fn_hash_insert(struct fib_tab fa-fa_info = fi; fa-fa_type = cfg-fc_type; fa-fa_scope = cfg-fc_scope; + fa-fa_age = tv.tv_sec; state = fa-fa_state; fa-fa_state = ~FA_S_ACCESSED; fib_hash_genid++; @@ -507,6 +510,7 @@ static int fn_hash_insert(struct fib_tab new_fa-fa_type = cfg-fc_type; new_fa-fa_scope = cfg-fc_scope; new_fa-fa_state = 0; + new_fa-fa_age = tv.tv_sec; /* * Insert new entry to the list. 
@@ -697,6 +701,7 @@ fn_hash_dump_bucket(struct sk_buff *skb, f-fn_key, fz-fz_order, fa-fa_tos, + fa-fa_age, fa-fa_info, NLM_F_MULTI) 0) { cb-args[4] = i; diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h index eef9eec..76c4a47 100644 --- a/net/ipv4/fib_lookup.h +++ b/net/ipv4/fib_lookup.h @@ -13,6 +13,7 @@ struct fib_alias { u8 fa_type; u8 fa_scope; u8 fa_state; + time_t fa_age; }; #define FA_S_ACCESSED 0x01 @@ -27,7 +28,7 @@ extern struct fib_info *fib_create_info( extern int fib_nh_match(struct fib_config *cfg, struct fib_info *fi); extern int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, u32 tb_id, u8 type, u8 scope, __be32 dst, -int dst_len, u8 tos, struct fib_info *fi, +int dst_len, u8 tos, time_t age, struct fib_info *fi, unsigned int); extern void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, int dst_len, u32 tb_id, struct nl_info *info, diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index c434119..fa892ce 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -278,7 +278,8 @@ static inline size_t fib_nlmsg_size(stru + nla_total_size(4) /* RTA_TABLE */ + nla_total_size(4) /* RTA_DST */ + nla_total_size(4) /* RTA_PRIORITY */ -+ nla_total_size(4); /* RTA_PREFSRC */ ++ nla_total_size(4) /* RTA_PREFSRC */ ++ nla_total_size(4); /*RTA_AGE*/ /* space for nested metrics */ payload += nla_total_size((RTAX_MAX * nla_total_size(4))); @@ -313,7 +314,7 @@ void rtmsg_fib(int event, __be32 key, st err = fib_dump_info(skb, info-pid, seq, event, tb_id, fa-fa_type, fa-fa_scope, key, dst_len, - fa-fa_tos, fa-fa_info, nlm_flags); + fa-fa_tos, fa-fa_age, fa-fa_info, nlm_flags); if (err 0) { /* -EMSGSIZE implies BUG in fib_nlmsg_size() */ WARN_ON(err == -EMSGSIZE); @@ -940,11 +941,12 @@ __be32 __fib_res_prefsrc(struct fib_resu } int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, - u32 tb_id, u8 type, u8 scope, __be32 dst, int dst_len, u8 tos, + u32 tb_id, u8 type, u8 scope, __be32 dst, int 
dst_len, u8 tos, time_t age, struct fib_info *fi, unsigned int flags) { struct nlmsghdr *nlh; struct rtmsg *rtm; + struct timeval tv; nlh = nlmsg_put(skb, pid,
[PATCH 3/3 Rev4] Initialize and fill IPv6 route age
The age field of the ipv6 route structures are initilized with the current timeval at the time of route creation. When the route dump is called the route age value stored in the structure is subtracted from the present timeval and the difference is passed on as the route age. Signed-off-by: Varun Chandramohan [EMAIL PROTECTED] --- include/net/ip6_fib.h |1 + net/ipv6/addrconf.c |5 + net/ipv6/route.c | 14 ++ 3 files changed, 20 insertions(+), 0 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index c48ea87..e30a1cf 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -98,6 +98,7 @@ struct rt6_info u32 rt6i_flags; u32 rt6i_metric; + time_t rt6i_age; atomic_trt6i_ref; struct fib6_table *rt6i_table; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 91ef3be..e77c6ad 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4182,6 +4182,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notif int __init addrconf_init(void) { + struct timeval tv; int err = 0; /* The addrconf netdev notifier requires that loopback_dev @@ -4209,10 +4210,14 @@ int __init addrconf_init(void) if (err) return err; + do_gettimeofday(tv); ip6_null_entry.rt6i_idev = in6_dev_get(loopback_dev); + ip6_null_entry.rt6i_age = tv.tv_sec; #ifdef CONFIG_IPV6_MULTIPLE_TABLES ip6_prohibit_entry.rt6i_idev = in6_dev_get(loopback_dev); + ip6_prohibit_entry.rt6i_age = tv.tv_sec; ip6_blk_hole_entry.rt6i_idev = in6_dev_get(loopback_dev); + ip6_blk_hole_entry.rt6i_age = tv.tv_sec; #endif register_netdevice_notifier(ipv6_dev_notf); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 55ea80f..e9a9d00 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -600,7 +600,14 @@ static int __ip6_ins_rt(struct rt6_info { int err; struct fib6_table *table; + struct timeval tv; + do_gettimeofday(tv); + /* Update the timeval for new routes +* We add it here to make it common irrespective +* of how the new route is added. 
+*/ + rt-rt6i_age = tv.tv_sec; table = rt-rt6i_table; write_lock_bh(table-tb6_lock); err = fib6_add(table-tb6_root, rt, info); @@ -2112,6 +2119,7 @@ static inline size_t rt6_nlmsg_size(void + nla_total_size(4) /* RTA_IIF */ + nla_total_size(4) /* RTA_OIF */ + nla_total_size(4) /* RTA_PRIORITY */ + + nla_total_size(4) /*RTA_AGE*/ + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */ + nla_total_size(sizeof(struct rta_cacheinfo)); } @@ -2123,6 +2131,7 @@ static int rt6_fill_node(struct sk_buff { struct rtmsg *rtm; struct nlmsghdr *nlh; + struct timeval tv; long expires; u32 table; @@ -2186,6 +2195,11 @@ static int rt6_fill_node(struct sk_buff if (ipv6_get_saddr(rt-u.dst, dst, saddr_buf) == 0) NLA_PUT(skb, RTA_PREFSRC, 16, saddr_buf); } + + do_gettimeofday(tv); + if (rt-rt6i_age) { + NLA_PUT_U32(skb, RTA_AGE, (tv.tv_sec - rt-rt6i_age)); + } if (rtnetlink_put_metrics(skb, rt-u.dst.metrics) 0) goto nla_put_failure; -- 1.4.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
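The age bookkeeping described in this patch (stamp the route at creation, subtract at dump time) can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct route_entry`, `route_stamp()` and `route_age()` are hypothetical names, not kernel symbols, and the caller passes the current time in rather than calling do_gettimeofday().

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the rt6i_age field added by the patch. */
struct route_entry {
	time_t created;		/* seconds at insertion, 0 = never stamped */
};

/* Record the creation time, as the patch does when the route is inserted. */
static void route_stamp(struct route_entry *rt, time_t now)
{
	assert(rt != NULL);
	rt->created = now;
}

/*
 * Compute the age at dump time. Returns 1 and stores the age in *age when
 * the route was stamped, 0 otherwise (mirroring the "if (rt->rt6i_age)"
 * guard in front of the RTA_AGE attribute).
 */
static int route_age(const struct route_entry *rt, time_t now, uint32_t *age)
{
	if (!rt->created)
		return 0;
	*age = (uint32_t)(now - rt->created);
	return 1;
}
```

The 32-bit result matches the 4-byte attribute size the patch reserves in rt6_nlmsg_size().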
Re: wrong arp query with policy routing
Is there a way to force Linux to send an ARP probe whose source IP belongs to the same subnet as the requested IP? Umm, arp_filter?
Re: [LARTC] ifb and ppp
Frithjof Hammer wrote: Sorry, I didn't follow the thread - what is the goal to be achieved with the setup? A simple ingress shaping on ppp0 (PPPoE DSL line). I want to replace my old imq ingress shaper with ifb. My former script used iptables marks to classify the packets. My iptables marks are still being set, just as before with imq, but tc does not seem to recognize them: it only uses the default class. So I ran tcpdump -i ifb0 and discovered that the packets seem to be still encapsulated on ifb0. I suppose this is why my iptables stuff is not working. That's actually a completely different problem. Unlike with imq, packets are delivered to ifb *before* they pass through iptables, so at that time they're not marked. I don't see a good solution for this that lets you keep the iptables rules; I'd suggest switching to ematches.
[PATCH][MIPS][7/7] AR7: ethernet
Driver for the cpmac 100M ethernet driver. Jeff, here is the meat ;) Signed-off-by: Matteo Croce [EMAIL PROTECTED] Signed-off-by: Eugene Konev [EMAIL PROTECTED] diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 6a0863e..28ba0dc 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -1822,6 +1822,15 @@ config SC92031 To compile this driver as a module, choose M here: the module will be called sc92031. This is recommended. +config CPMAC + tristate TI AR7 CPMAC Ethernet support (EXPERIMENTAL) + depends on NET_ETHERNET EXPERIMENTAL AR7 + select PHYLIB + select FIXED_PHY + select FIXED_MII_100_FDX + help + TI AR7 CPMAC Ethernet support + config NET_POCKET bool Pocket and portable adapters depends on PARPORT diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 9501d64..b536934 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -157,6 +157,7 @@ obj-$(CONFIG_8139CP) += 8139cp.o obj-$(CONFIG_8139TOO) += 8139too.o obj-$(CONFIG_ZNET) += znet.o obj-$(CONFIG_LAN_SAA9730) += saa9730.o +obj-$(CONFIG_CPMAC) += cpmac.o obj-$(CONFIG_DEPCA) += depca.o obj-$(CONFIG_EWRK3) += ewrk3.o obj-$(CONFIG_ATP) += atp.o diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c new file mode 100644 index 000..50aad94 --- /dev/null +++ b/drivers/net/cpmac.c @@ -0,0 +1,1166 @@ +/* + * Copyright (C) 2006, 2007 Eugene Konev + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include linux/module.h +#include linux/init.h +#include linux/moduleparam.h + +#include linux/sched.h +#include linux/kernel.h +#include linux/slab.h +#include linux/errno.h +#include linux/types.h +#include linux/delay.h +#include linux/version.h + +#include linux/netdevice.h +#include linux/etherdevice.h +#include linux/ethtool.h +#include linux/skbuff.h +#include linux/mii.h +#include linux/phy.h +#include linux/platform_device.h +#include linux/dma-mapping.h +#include asm/gpio.h + +MODULE_AUTHOR(Eugene Konev); +MODULE_DESCRIPTION(TI AR7 ethernet driver (CPMAC)); +MODULE_LICENSE(GPL); + +static int rx_ring_size = 64; +static int disable_napi; +static int debug_level = 8; +static int dumb_switch; + +module_param(rx_ring_size, int, 0644); +module_param(disable_napi, int, 0644); +/* Next 2 are only used in cpmac_probe, so it's pointless to change them */ +module_param(debug_level, int, 0444); +module_param(dumb_switch, int, 0444); + +MODULE_PARM_DESC(rx_ring_size, Size of rx ring (in skbs)); +MODULE_PARM_DESC(disable_napi, Disable NAPI polling); +MODULE_PARM_DESC(debug_level, Number of NETIF_MSG bits to enable); +MODULE_PARM_DESC(dumb_switch, Assume switch is not connected to MDIO bus); + +/* frame size + 802.1q tag */ +#define CPMAC_SKB_SIZE (ETH_FRAME_LEN + 4) +#define CPMAC_TX_RING_SIZE 8 + +/* Ethernet registers */ +#define CPMAC_TX_CONTROL 0x0004 +#define CPMAC_TX_TEARDOWN 0x0008 +#define CPMAC_RX_CONTROL 0x0014 +#define CPMAC_RX_TEARDOWN 0x0018 +#define CPMAC_MBP 0x0100 +# define MBP_RXPASSCRC 0x4000 +# define MBP_RXQOS 0x2000 +# define MBP_RXNOCHAIN 0x1000 +# define MBP_RXCMF 0x0100 +# define MBP_RXSHORT 0x0080 +# define MBP_RXCEF 0x0040 +# define MBP_RXPROMISC 0x0020 +# define MBP_PROMISCCHAN(channel) (((channel) 0x7) 16) +# define 
MBP_RXBCAST 0x2000 +# define MBP_BCASTCHAN(channel)(((channel) 0x7) 8) +# define MBP_RXMCAST 0x0020 +# define MBP_MCASTCHAN(channel)((channel) 0x7) +#define CPMAC_UNICAST_ENABLE 0x0104 +#define CPMAC_UNICAST_CLEAR0x0108 +#define CPMAC_MAX_LENGTH 0x010c +#define CPMAC_BUFFER_OFFSET0x0110 +#define CPMAC_MAC_CONTROL 0x0160 +# define MAC_TXPTYPE 0x0200 +# define MAC_TXPACE0x0040 +# define MAC_MII 0x0020 +# define MAC_TXFLOW0x0010 +# define MAC_RXFLOW0x0008 +# define MAC_MTEST 0x0004 +# define MAC_LOOPBACK
Re: [PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister
+ipoib_slave_detach(cpriv->dev); unregister_netdev(cpriv->dev); Maybe you already answered this before, but I'm still not clear why this notifier call can't just be added to the start of unregister_netdevice(), so that drivers don't need to know anything about bonding internals? - R.
Re: [PATCH] RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.
Roland - can you please queue this up for 2.6.24? Done, thanks.
Re: [PATCH]: Preliminary release of Sun Neptune driver
Thanks Dave for your preliminary posting of the driver. I am copying Matheos Worku. Matheos is intimately familiar with the Neptune/NIU family of devices and their respective drivers. Not only can he be a good reviewer, he can also clarify issues around naming and so on. I agree that Neptune is just an overused internal codename not worth propagating in the code. Please feel free to add [EMAIL PROTECTED] to the reviewers list. Ariel David Miller wrote: From: Rick Jones [EMAIL PROTECTED] Date: Wed, 19 Sep 2007 16:20:39 -0700 so why niu? To what does niu translate anyway? Network Interface Unit. This is what the Niagara-2 programmers manual refers to the chip as. I try to name the files for most drivers I write as 2 or 3 letter acronyms; it looks so much better than the usual verbose names. It's very unix.
Re: [PATCH 1/2] net/: all net/ cleanup with ARRAY_SIZE
On 9/17/07, David Miller [EMAIL PROTECTED] wrote: From: Denis Cheng [EMAIL PROTECTED] Date: Sun, 2 Sep 2007 18:30:17 +0800 Signed-off-by: Denis Cheng [EMAIL PROTECTED] You already submitted the net/ipv4/af_inet.c case separately, so I had to remove it from this patch for it to apply properly. Please keep your patches straight to avoid problems like this. I can only say sorry. At that time I was not sure the earlier patch specific to net/ipv4/af_inet.c would be applied, and then I realized that the change should be made in every subsystem in the kernel source, so I regenerated a new patch for the whole net/ subsystem. In this situation, I think I should have announced that the earlier patch was deprecated, shouldn't I? In any case, I'll be more cautious with patches. Thanks. Thanks for applying. -- Denis Cheng
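For readers who have not seen this kind of cleanup, the pattern being replaced looks roughly like the sketch below. It is a userspace approximation: the kernel's own ARRAY_SIZE adds a compile-time check that its argument really is an array, which is left out here, and `proto_table` and the helper functions are made-up names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel macro (no array type check). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table, present only to have something to count. */
static const int proto_table[] = { 1, 6, 17, 132 };

/* Open-coded element count: the form such a cleanup removes. */
static size_t count_open_coded(void)
{
	return sizeof(proto_table) / sizeof(proto_table[0]);
}

/* The same count expressed through ARRAY_SIZE. */
static size_t count_with_macro(void)
{
	return ARRAY_SIZE(proto_table);
}

/* Bounds-checked lookup; the limit follows the table automatically. */
static int proto_at(size_t idx)
{
	assert(idx < ARRAY_SIZE(proto_table));
	return proto_table[idx];
}
```

Both counting functions yield the same value; the macro form simply states the intent once and cannot drift out of sync with the element type.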
Re: Please pull 'z1211' branch of wireless-2.6
Daniel Drake wrote: John W. Linville wrote: If you are determined not to have it in 2.6.24 then I will relent. I will also suggest that Larry start sending any softmac bugs to you... :-) That's fine. You're on. BTW, I will let you be the primary tester of [PATCH] fix softmac lockdep reports that Johannes posted earlier today. I see you were CC'd. I plan on testing it with bcm43xx, but I won't get to it for a couple of days. Larry
Re: Please pull 'z1211' branch of wireless-2.6
On Thu, 2007-09-20 at 11:37 -0500, Larry Finger wrote: You're on. BTW, I will let you be the primary tester of [PATCH] fix softmac lockdep reports that Johannes posted earlier today. I see you were CC'd. I plan on testing it with bcm43xx, but I won't get to it for a couple of days. The only thing it can possibly fix is our race against some other functions that use the global workqueue and lock the RTNL from within the work function while we have it locked while flushing. Conversely, it can't really break anything either. johannes
Re: [PATCH 2/3] netlink: the temp variable name max is ambiguous
On 9/17/07, David Miller [EMAIL PROTECTED] wrote: From: Denis Cheng [EMAIL PROTECTED] Date: Sun, 2 Sep 2007 03:45:58 +0800 with the macro max provided by linux/kernel.h, so changed its name to a more proper one: limit Signed-off-by: Denis Cheng [EMAIL PROTECTED] Not strictly necessary because CPP knows to differentiate between 'max(' and plain 'max' when evaluating if a CPP macro should be expanded or not. I also know that GNU CPP is intelligent, but people often are not. I just think that avoiding names that are ambiguous to humans improves readability. Nonetheless, applied to net-2.6.24, thanks. -- Denis Cheng
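The preprocessor rule David refers to is that a function-like macro is only expanded when its name is followed by an opening parenthesis. A minimal userspace sketch, assuming a simplified `max()` (the kernel's version also type-checks its arguments) and hypothetical helper names:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's max(); no type checking here. */
#define max(a, b) ((a) > (b) ? (a) : (b))

/*
 * 'max' used as a plain identifier: it is never followed by '(' here,
 * so the preprocessor leaves it alone and this compiles as written.
 */
static int clamp_ambiguous(int value)
{
	int max = 100;

	assert(max > 0);
	return value > max ? max : value;
}

/* Same logic with the rename the patch proposes; nothing to misread. */
static int clamp_renamed(int value)
{
	int limit = 100;

	return value > limit ? limit : value;
}

/* Here 'max' is followed by '(' and is expanded as a macro. */
static int larger(int a, int b)
{
	return max(a, b);
}
```

Both clamp functions behave identically, which is David's point; Denis's point is that the second one does not make a human reader stop and check.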
Re: net-2.6.24 plans
On Thu, Sep 20, 2007 at 04:50:52PM +0200, Johannes Berg wrote: On Thu, 2007-09-20 at 10:17 -0400, John W. Linville wrote: 2) ATMEL USB driver These are both really new. I think I'll transfer them to my wireless-2.6 tree, but still hold them back at least until 2.6.25. Also, atmel isn't even ported to mac80211 yet, is it? Kalle Valo has done some work on this, and I think Eugene Teo has joined the effort. They both are in contact with Pavel to accomplish the mac80211 port. John -- John W. Linville [EMAIL PROTECTED]
[PATCH 2.6.23][BNX2]: Add PHY workaround for 5709 A1.
[BNX2]: Add PHY workaround for 5709 A1.

Add the DIS_EARLY_DAC PHY workaround for 5709 A1. Without it, link sometimes does not come up.

Update version to 1.6.5.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 854d80c..66eed22 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.6.4"
-#define DRV_MODULE_RELDATE	"August 3, 2007"
+#define DRV_MODULE_VERSION	"1.6.5"
+#define DRV_MODULE_RELDATE	"September 20, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 
@@ -6727,7 +6727,8 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 	} else if (CHIP_NUM(bp) == CHIP_NUM_5706 ||
 		   CHIP_NUM(bp) == CHIP_NUM_5708)
 		bp->phy_flags |= PHY_CRC_FIX_FLAG;
-	else if (CHIP_ID(bp) == CHIP_ID_5709_A0)
+	else if (CHIP_ID(bp) == CHIP_ID_5709_A0 ||
+		 CHIP_ID(bp) == CHIP_ID_5709_A1)
 		bp->phy_flags |= PHY_DIS_EARLY_DAC_FLAG;
 
 	if ((CHIP_ID(bp) == CHIP_ID_5708_A0) ||
Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
Maciej W. Rozycki wrote: On Thu, 20 Sep 2007, Jeff Garzik wrote: You may be pleased (or less so) to hear that the version of sb1250-mac.c in your tree does not even build (because of 42d53d6be113f974d8152979c88e1061b953bd12) and the patch below does not address it. I ran out of time in the evening, but I will send you a fix shortly. To be honest I think even with bulk changes it may be worth checking whether they do not break stuff. ;-) hrm. I cannot get this to apply on top of linux-2.6.git, netdev-2.6.git#upstream (prior to net-2.6.24 rebase) or netdev-2.6.git#upstream (after net-2.6.24 rebase) It applies on top of current -mm. It seems to apply to a copy of netdev-2.6.git#upstream that I have got, but I am probably missing something... If I try to clone your repository again I get: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git linux Initialized empty Git repository in /home/macro/GIT-other/linux-netdev/linux/.git/ fatal: The remote end hung up unexpectedly fetch-pack from 'git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git' failed. Remove the linux- prefix. Jeff
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Wed, 19 Sep 2007, Nagendra Tomar wrote: The tcp_check_space() function calls tcp_new_space() only if the SOCK_NOSPACE bit is set in the socket flags. This is causing Edge Triggered EPOLLOUT events to be missed for TCP sockets, as the ep_poll_callback() is not called from the wakeup routine. The SOCK_NOSPACE bit indicates the user's intent to perform writes on that socket (set in tcp_sendmsg and tcp_poll). I believe the idea behind the SOCK_NOSPACE check is to optimize away the tcp_new_space call in cases when user is not interested in writing to the socket. These two take care of all possible scenarios in which a user can convey his intent to write on that socket. Case 1: tcp_sendmsg detects lack of sndbuf space Case 2: tcp_poll returns not writable This is fine if we do not deal with epoll's Edge Triggered events (EPOLLET). With ET events we can have a scenario where the SOCK_NOSPACE bit is not set, as the user has neither done a sendmsg nor a poll/epoll call that returned with the POLLOUT condition not set. Looking back at it, I think the current TCP code is right, once you view the event as an output buffer full->with_space transition. If you drop an fd inside epoll with EPOLLOUT|EPOLLET and you get an event (free space on the output buffer), and you do not consume it (say, with a tcp_sendmsg that refills the buffer), you won't see further OUT events, since they happen only on the full->with_space transition. Yes, I know, the read side (EPOLLIN) works differently and you get an event for every packet you receive. And yes, I do not like asymmetric things. But that does not make the EPOLLOUT|EPOLLET behavior wrong IMO. - Davide
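The "event only on the full->with_space transition" reading can be illustrated from userspace with the sketch below. It makes two loud assumptions: it uses an AF_UNIX socketpair rather than TCP, purely so that it is self-contained, and it targets Linux kernels far newer than the one under discussion (SOCK_NONBLOCK and epoll_create1() arrived later). The helper names are made up, and the sketch says nothing about the SOCK_NOSPACE question itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Queue data until the socket reports EAGAIN; returns bytes queued or -1. */
static long fill_until_eagain(int fd)
{
	static const char chunk[4096];
	long total = 0;

	for (;;) {
		ssize_t n = write(fd, chunk, sizeof(chunk));

		if (n > 0)
			total += n;
		else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;
		else if (!(n < 0 && errno == EINTR))
			return -1;
	}
}

/* Read and discard everything currently queued on fd; returns bytes read. */
static long drain(int fd)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0)
			total += n;
		else if (n < 0 && errno == EINTR)
			continue;
		else
			return total;	/* EAGAIN, EOF or error: stop */
	}
}

/* Non-blocking check: how many events does epoll report right now? */
static int poll_once(int epfd)
{
	struct epoll_event ev;

	return epoll_wait(epfd, &ev, 1, 0);
}

/*
 * seen[0]: right after registration   seen[1]: asked again, nothing changed
 * seen[2]: send buffer full           seen[3]: after the peer drained it
 * Returns 0 on success, -1 on setup failure (descriptors leak on failure).
 */
static int et_epollout_demo(int seen[4])
{
	struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };
	int sv[2], epfd;
	long queued;

	assert(seen != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
		return -1;

	seen[0] = poll_once(epfd);
	seen[1] = poll_once(epfd);
	queued = fill_until_eagain(sv[0]);
	seen[2] = poll_once(epfd);
	if (queued < 0 || drain(sv[1]) != queued)
		return -1;
	seen[3] = poll_once(epfd);

	close(epfd);
	close(sv[0]);
	close(sv[1]);
	return 0;
}
```

On a current Linux system one would expect an event after registration, none while nothing changes, none while the buffer is full, and one again once the peer has freed space. The practical rule for edge-triggered writers follows from that: write until EAGAIN before going back to epoll_wait().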
Re: [PATCH: 2.6.13-15-SMP 3/3] network: concurrently run softirq network code on SMP
The whole reason the queues are per-cpu is so that we do not have to touch remote processor state nor use locks of any kind whatsoever. With multi-queue networking cards becoming more and more available, which will split up the packet workload in hardware across all available cpus, there is less and less reason to make a patch like this one. We've known about this issue for ages, and if we felt it was appropriate to make this change, we would have done so years ago.
Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6
On Thu, 20 Sep 2007, Joe Perches wrote: On Thu, 2007-09-20 at 07:55 -0700, Randy Dunlap wrote: How large are the patches if you posted them for review instead of just referencing gits for them? (which cuts down on review possibilities) The v4 is ~130kb, the v6 ~35kb. There is a gitweb available at: print_ip: http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv4 commit diff: http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=1e3a30d5d8b49b3accca07cc84ecf6d977cacdd5 print_ipv6: http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv6 commit diff: http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=e96b794a57a164db84379e2baf5fe2622a5ae3bf ...Alternatively you could split it up a bit and send those smaller chunks for reviewing purposes only (even though it would be combined to a single big patch in the end). -- i.
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Thu, 20 Sep 2007, Eric Dumazet wrote: Does it mean that with your patch each ACK on an ET managed socket will trigger an epoll event? Maybe your very sensitive high throughput application needs to set a flag or something at socket level to ask for such a behavior. The default should stay as is. That is, an event should be sent only if someone cared about the wakeup. Unfortunately f_op->poll() does not let the caller specify the events it's interested in; that would allow splitting the send/receive wait queues and better detecting read/write cases. A waitqueue_active(->sk_wr_sleep) check would work fine for detecting whether someone is actually waiting for a write, without the false positives triggered by the read-waiters. That would be a very sane thing to do, but would require a big, dumb change to all the ->poll implementations around (that could be automated by a script - devices not caring about the events hint can just continue to use the single queue like they currently do), and a more critical and gradual change of all the devices that want to take advantage of it. That way, no more magic bits are needed, and a simple waitqueue_active() would tell you if someone is waiting for write-space events. - Davide
Re: [PATCH 2.6.23][BNX2]: Add PHY workaround for 5709 A1.
From: Michael Chan [EMAIL PROTECTED] Date: Thu, 20 Sep 2007 11:07:13 -0700 [BNX2]: Add PHY workaround for 5709 A1. Add the DIS_EARLY_DAC PHY workaround for 5709 A1. Without it, link sometimes does not come up. Update version to 1.6.5. Signed-off-by: Michael Chan [EMAIL PROTECTED] Applied, thanks Michael.
Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()
From: Krishna Kumar2 [EMAIL PROTECTED] Date: Thu, 20 Sep 2007 11:24:01 +0530 Ran 4/16/64 thread iperf on latest bits with this patch and no issues after 30 mins. I used to consistently get the bug within 1-2 mins with just 4 threads prior to this patch. Tested-by: Krishna Kumar [EMAIL PROTECTED] (if any value in that) There is much value in that :-) Thanks a lot Krishna.
Re: [PATCH: 2.6.13-15-SMP 3/3] network: concurrently run softirq network code on SMP
On Thu, 20 Sep 2007 21:04:16 +0800 john ye [EMAIL PROTECTED] wrote: Bottom Softirq Implementation. John Ye, 2007.08.27 Why this patch: Make the kernel able to concurrently execute softirq net code on SMP systems. It takes full advantage of SMP to handle more packets and greatly raises NIC throughput. The current kernel's net packet processing logic is: 1) The CPU which handles a hardirq must also execute its related softirq. 2) One softirq instance (irqs handled by 1 CPU) can't be executed on more than 2 CPUs at the same time. This limitation makes it hard for kernel networking to take advantage of SMP. How this patch: It splits the current softirq code into 2 parts: the cpu-sensitive top half, and the cpu-insensitive bottom half, then makes the bottom half (called BS) execute concurrently on SMP. The two parts are not equal in terms of size and load. The top part has constant code size (mainly in net/core/dev.c and NIC drivers), while the bottom part involves netfilter (iptables), whose load varies very much. An iptables setup with 1000 rules to match will make the bottom part's load very high. So, if the bottom part of the softirq can be randomly distributed to processors and run concurrently on them, the network will gain much more packet handling capacity, and network throughput will be increased remarkably. Where useful: It's useful on SMP machines that meet the following 2 conditions: 1) have high kernel network load (for example, running iptables with thousands of rules). 2) have more CPUs than active NICs (e.g. a 4-CPU machine with 2 NICs). On these systems, as softirq load increases, some CPUs will be idle while others (as many as there are NICs) keep busy. IRQBALANCE will help, but it only shifts IRQs among CPUs and provides no softirq concurrency. Balancing the load of each CPU will not remarkably increase network speed.
Where NOT useful: If the bottom half of the softirq is too small (without running iptables), or the network is too idle, the BS patch will have no visible effect. But it has no negative effect either. Users can turn BS functionality on or off with the /proc/sys/net/bs_enable switch. If I read this correctly, you basically changed network processing from softirq to workqueue (which is pretty much what -rt does). Perhaps optimizing and/or rearchitecting netfilter rule processing would get more benefit. But you are ignoring the issue of all the locking assumptions that get changed. Any performance gain from getting SMP will probably be lost by the additional locking required. Also the patch is formatted badly and has multiple style issues (indentation etc). If you want to have it seriously considered, follow the Documentation/CodingStyle guidelines. There is even a perl script to check it: scripts/checkpatch.pl
Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
Remove typedefs, volatiles and convert kmalloc()/memset() pairs to kcalloc(). Also reformat the surrounding clutter. Signed-off-by: Maciej W. Rozycki [EMAIL PROTECTED] --- On Thu, 20 Sep 2007, Jeff Garzik wrote: Remove the linux- prefix. Hmm, it looks like a bad application of `sed' by myself. Sorry for the noise. Maciej patch-netdev-2.6.23-rc6-20070920-sb1250-mac-typedef-9 diff -up --recursive --new-file linux-netdev-2.6.23-rc6-20070920.macro/drivers/net/sb1250-mac.c linux-netdev-2.6.23-rc6-20070920/drivers/net/sb1250-mac.c --- linux-netdev-2.6.23-rc6-20070920.macro/drivers/net/sb1250-mac.c 2007-09-20 17:55:14.0 + +++ linux-netdev-2.6.23-rc6-20070920/drivers/net/sb1250-mac.c 2007-09-20 18:09:18.0 + @@ -140,17 +140,17 @@ MODULE_PARM_DESC(int_timeout_rx, RX tim * */ -typedef enum { sbmac_speed_auto, sbmac_speed_10, - sbmac_speed_100, sbmac_speed_1000 } sbmac_speed_t; +enum sbmac_speed { sbmac_speed_auto, sbmac_speed_10, + sbmac_speed_100, sbmac_speed_1000 }; -typedef enum { sbmac_duplex_auto, sbmac_duplex_half, - sbmac_duplex_full } sbmac_duplex_t; +enum sbmac_duplex { sbmac_duplex_auto, sbmac_duplex_half, + sbmac_duplex_full }; -typedef enum { sbmac_fc_auto, sbmac_fc_disabled, sbmac_fc_frame, - sbmac_fc_collision, sbmac_fc_carrier } sbmac_fc_t; +enum sbmac_fc { sbmac_fc_auto, sbmac_fc_disabled, sbmac_fc_frame, + sbmac_fc_collision, sbmac_fc_carrier } sbmac_fc_t; -typedef enum { sbmac_state_uninit, sbmac_state_off, sbmac_state_on, - sbmac_state_broken } sbmac_state_t; +enum sbmac_state { sbmac_state_uninit, sbmac_state_off, sbmac_state_on, + sbmac_state_broken }; /** @@ -176,55 +176,61 @@ typedef enum { sbmac_state_uninit, sbmac * DMA Descriptor structure * */ -typedef struct sbdmadscr_s { +struct sbdmadscr { uint64_t dscr_a; uint64_t dscr_b; -} sbdmadscr_t; - -typedef unsigned long paddr_t; +}; /** * DMA Controller structure * */ -typedef struct sbmacdma_s { +struct sbmacdma { /* * This stuff is used to identify the channel and the registers * associated with it. 
*/ - - struct sbmac_softc *sbdma_eth; /* back pointer to associated MAC */ - int sbdma_channel; /* channel number */ - int sbdma_txdir; /* direction (1=transmit) */ - int sbdma_maxdescr;/* total # of descriptors in ring */ + struct sbmac_softc *sbdma_eth; /* back pointer to associated + MAC */ + int sbdma_channel; /* channel number */ + int sbdma_txdir;/* direction (1=transmit) */ + int sbdma_maxdescr; /* total # of descriptors + in ring */ #ifdef CONFIG_SBMAC_COALESCE - int sbdma_int_pktcnt; /* # descriptors rx/tx before interrupt*/ - int sbdma_int_timeout; /* # usec rx/tx interrupt */ + int sbdma_int_pktcnt; + /* # descriptors rx/tx + before interrupt */ + int sbdma_int_timeout; + /* # usec rx/tx interrupt */ #endif - - volatile void __iomem *sbdma_config0; /* DMA config register 0 */ - volatile void __iomem *sbdma_config1; /* DMA config register 1 */ - volatile void __iomem *sbdma_dscrbase; /* Descriptor base address */ - volatile void __iomem *sbdma_dscrcnt; /* Descriptor count register */ - volatile void __iomem *sbdma_curdscr; /* current descriptor address */ - volatile void __iomem *sbdma_oodpktlost;/* pkt drop (rx only) */ - + void __iomem*sbdma_config0; /* DMA config register 0 */ + void __iomem*sbdma_config1; /* DMA config register 1 */ + void __iomem*sbdma_dscrbase; + /* descriptor base address */ + void __iomem*sbdma_dscrcnt; /* descriptor count register */ + void __iomem*sbdma_curdscr; /* current descriptor + address */ + void __iomem*sbdma_oodpktlost; + /* pkt drop (rx only) */ /* * This stuff is for maintenance of the ring
Re: [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 20 Sep 2007 15:17:44 +0300 In general, it should not be necessary to call tcp_fragment for already SACKed skbs, but it's better to be safe than sorry. And indeed, it can be called from sacktag when a DSACK arrives or some ACK (with SACK) reordering occurs (sacktag could be made to avoid the call in the latter case, though I'm not sure if it's worth the trouble and added complexity to cover such a marginal case). The collapse case has a return for the SACKED_ACKED case earlier, so just WARN_ON if internal inconsistency is detected for some reason. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Applied, thanks Ilpo.
Re: [PATCH 2/9] [TCP]: Make fackets_out accurate
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 20 Sep 2007 15:17:45 +0300 Subtraction for fackets_out is unconditional when snd_una advances, thus there's no need to do it inside the loop. Just make sure correct bounds are honored. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Applied.
Re: [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 20 Sep 2007 15:17:46 +0300 In addition, fix its function comment spacing. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Applied. -/*from STCP */ -static inline void clear_all_retrans_hints(struct tcp_sock *tp){ +/* from STCP */ +static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) { This brace should also be on a line by itself.
Re: [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 20 Sep 2007 15:17:47 +0300 The accounting code is pretty much the same, so it's a shame we do it in two places. I'm not too sure if the added fully_acked check in MTU probing is really what we want; perhaps the added end_seq could be used in the after() comparison. Indeed there are a bunch of tradeoffs to consider when handling the TSO-partial-ack cases. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Applied, thanks Ilpo.
Re: [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 20 Sep 2007 15:17:48 +0300 Implements following cleanups: - Comment re-placement (CodingStyle) - tcp_tso_acked() local (wrapper-like) variable removal (readability) - __-types removed (IMHO they make local variables jumpy looking and just waste space) - acked -> flag (naming conventions elsewhere in TCP code) - linebreak adjustments (readability) - nested if()s combined (reduced indentation) - clarifying newlines added Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Applied.
Re: [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users
From: Ilpo Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:49 +0300

Basically this change enables it, previously other undo_marker users were left with nothing. Reverse undo_marker logic completely to get it set right in CA_Loss. On the other hand, when spurious RTO is detected, clear it. Clearing might be too heavy for some scenarios but seems safe enough starting point for now and shouldn't have much effect except in majority of cases (if in any). By adding a new FLAG_ we avoid looping through write_queue when RTO occurs.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied. Thanks for following up on all of this stuff to get FRTO in shape.
Re: [PATCH 7/9] [TCP] FRTO: Update sysctl documentation
From: Ilpo Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:50 +0300

Since the SACK enhanced FRTO was added, the code has been under test numerous times so remove experimental claim from the documentation. Also be a bit more verbose about the usage.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
Re: [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default
From: Ilpo Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:51 +0300

Most of the description that follows comes from my mail to netdev (some editing done): ...

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, thanks Ilpo!
Re: [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations
From: Ilpo Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:52 +0300

There's no reason to clear the sacktag skb hint when small part of the rexmit queue changes. Account changes (if any) instead when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so no need to discard SACK tag hint at all.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, and I followed it up with this coding style fixlet. Thanks!

commit e3723ad866a1e0690f3bc32443180ec1f6657f4a
Author: David S. Miller [EMAIL PROTECTED]
Date:   Thu Sep 20 11:40:37 2007 -0700

    [TCP]: Minor coding style fixup.

    Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 07b1faa..991ccdc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1067,14 +1067,16 @@ static inline void tcp_mib_init(void)
 }

 /* from STCP */
-static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp) {
+static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp)
+{
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
 	tp->forward_skb_hint = NULL;
 }

-static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
+{
 	tcp_clear_retrans_hints_partial(tp);
 	tp->fastpath_skb_hint = NULL;
 }
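To make the split between the two helpers in this message concrete, here is a small userspace sketch. It assumes, as the diff suggests, that the partial variant leaves fastpath_skb_hint (the SACK tag hint) alone and only the "all" variant drops it. The struct sketch_tcp_sock and the sketch_ function names are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct tcp_sock: only the hint
 * pointers named in the diff, typed as void * for illustration. */
struct sketch_tcp_sock {
	void *lost_skb_hint;
	void *scoreboard_skb_hint;
	void *retransmit_skb_hint;
	void *forward_skb_hint;
	void *fastpath_skb_hint;	/* left untouched by the partial clear */
};

/* Drop the retransmission-related hints but keep the fastpath hint. */
static inline void sketch_clear_retrans_hints_partial(struct sketch_tcp_sock *tp)
{
	tp->lost_skb_hint = NULL;
	tp->scoreboard_skb_hint = NULL;
	tp->retransmit_skb_hint = NULL;
	tp->forward_skb_hint = NULL;
}

/* Drop everything: the partial set plus the fastpath hint. */
static inline void sketch_clear_all_retrans_hints(struct sketch_tcp_sock *tp)
{
	sketch_clear_retrans_hints_partial(tp);
	tp->fastpath_skb_hint = NULL;
}
```

Callers that only fragment or collapse part of the queue would use the partial helper, which is the behaviour the patch description argues for.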
Re: [git patches] net driver updates
From: Jeff Garzik [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 03:26:10 -0400

Please pull from the 'upstream' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream to receive the following changes:

Pulled into net-2.6.24 and pushed out, thanks Jeff!
[PATCH 3/7] CAN: Add raw protocol
This patch adds the CAN raw protocol. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- include/linux/can/raw.h | 31 + net/can/Kconfig | 26 + net/can/Makefile|3 net/can/raw.c | 828 4 files changed, 888 insertions(+) Index: net-2.6.24/include/linux/can/raw.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6.24/include/linux/can/raw.h 2007-09-20 18:48:59.0 +0200 @@ -0,0 +1,31 @@ +/* + * linux/can/raw.h + * + * Definitions for raw CAN sockets + * + * Authors: Oliver Hartkopp [EMAIL PROTECTED] + * Urs Thuermann [EMAIL PROTECTED] + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Send feedback to [EMAIL PROTECTED] + * + */ + +#ifndef CAN_RAW_H +#define CAN_RAW_H + +#include linux/can.h + +#define SOL_CAN_RAW (SOL_CAN_BASE + CAN_RAW) + +/* for socket options affecting the socket (not the global system) */ + +enum { + CAN_RAW_FILTER = 1, /* set 0 .. n can_filter(s) */ + CAN_RAW_ERR_FILTER, /* set filter for error frames */ + CAN_RAW_LOOPBACK, /* local loopback (default:on) */ + CAN_RAW_RECV_OWN_MSGS /* receive my own msgs (default:off) */ +}; + +#endif Index: net-2.6.24/net/can/Kconfig === --- net-2.6.24.orig/net/can/Kconfig 2007-09-20 18:48:58.0 +0200 +++ net-2.6.24/net/can/Kconfig 2007-09-20 18:48:59.0 +0200 @@ -16,6 +16,32 @@ If you want CAN support, you should say Y here and also to the specific driver for your controller(s) below. +config CAN_RAW + tristate Raw CAN Protocol (raw access with CAN-ID filtering) + depends on CAN + default N + ---help--- + The Raw CAN protocol option offers access to the CAN bus via + the BSD socket API. You probably want to use the raw socket in + most cases where no higher level protocol is being used. The raw + socket has several filter options e.g. ID-Masking / Errorframes. + To receive/send raw CAN messages, use AF_CAN with protocol CAN_RAW. 
+ +config CAN_RAW_USER + bool Allow non-root users to access Raw CAN Protocol sockets + depends on CAN_RAW + default N + ---help--- + The Controller Area Network is a local field bus transmitting only + broadcast messages without any routing and security concepts. + In the majority of cases the user application has to deal with + raw CAN frames. Therefore it might be reasonable NOT to restrict + the CAN access only to the user root, as known from other networks. + Since CAN_RAW sockets can only send and receive frames to/from CAN + interfaces this does not affect security of others networks. + Say Y here if you want non-root users to be able to access CAN_RAW + sockets. + config CAN_DEBUG_CORE bool CAN Core debugging messages depends on CAN Index: net-2.6.24/net/can/Makefile === --- net-2.6.24.orig/net/can/Makefile2007-09-20 18:48:58.0 +0200 +++ net-2.6.24/net/can/Makefile 2007-09-20 18:48:59.0 +0200 @@ -4,3 +4,6 @@ obj-$(CONFIG_CAN) += can.o can-objs := af_can.o proc.o + +obj-$(CONFIG_CAN_RAW) += can-raw.o +can-raw-objs := raw.o Index: net-2.6.24/net/can/raw.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6.24/net/can/raw.c2007-09-20 18:48:59.0 +0200 @@ -0,0 +1,828 @@ +/* + * raw.c - Raw sockets for protocol family CAN + * + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions, the following disclaimer and + *the referenced file 'COPYING'. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. 
Neither the name of Volkswagen nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * Alternatively, provided that this notice is retained in full, this + * software may be distributed under the terms of the GNU General + * Public License (GPL) version 2 as distributed in the 'COPYING' + * file from the main directory of the
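The raw.h hunk in this patch declares CAN_RAW_FILTER as "set 0 .. n can_filter(s)". As a rough illustration of what such a filter list does, the sketch below assumes the usual mask rule, where a frame passes a filter when the masked received ID equals the masked filter ID, and assumes a socket accepts a frame if any of its filters matches. The sketch_ names are hypothetical; the real struct can_filter lives in linux/can.h, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a CAN ID filter: an ID plus a bit mask. */
struct sketch_can_filter {
	uint32_t can_id;
	uint32_t can_mask;
};

/* Assumed rule: a frame passes one filter when the bits selected by
 * the mask agree between the received ID and the filter's ID. */
static int sketch_filter_match(const struct sketch_can_filter *f, uint32_t rx_id)
{
	return (rx_id & f->can_mask) == (f->can_id & f->can_mask);
}

/* A socket with n filters accepts the frame if any filter matches.
 * With n == 0 the loop never runs, so nothing is accepted. */
static int sketch_socket_accepts(const struct sketch_can_filter *filters,
				 size_t n, uint32_t rx_id)
{
	size_t i;

	assert(filters != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (sketch_filter_match(&filters[i], rx_id))
			return 1;
	return 0;
}
```

For example, a filter with ID 0x200 and mask 0x700 would pass every ID in the range 0x200 to 0x2FF, while a filter with mask 0x7FF passes exactly one 11-bit ID.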
[PATCH 1/7] CAN: Allocate protocol numbers for PF_CAN
This patch adds a protocol/address family number, ARP hardware type, ethernet packet type, and a line discipline number for the SocketCAN implementation. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- include/linux/if_arp.h |1 + include/linux/if_ether.h |1 + include/linux/socket.h |2 ++ include/linux/tty.h |3 ++- net/core/sock.c |4 ++-- 5 files changed, 8 insertions(+), 3 deletions(-) Index: net-2.6.24/include/linux/if_arp.h === --- net-2.6.24.orig/include/linux/if_arp.h 2007-09-20 18:48:21.0 +0200 +++ net-2.6.24/include/linux/if_arp.h 2007-09-20 18:48:57.0 +0200 @@ -52,6 +52,7 @@ #define ARPHRD_ROSE270 #define ARPHRD_X25 271 /* CCITT X.25 */ #define ARPHRD_HWX25 272 /* Boards with X.25 in firmware */ +#define ARPHRD_CAN 280 /* Controller Area Network */ #define ARPHRD_PPP 512 #define ARPHRD_CISCO 513 /* Cisco HDLC */ #define ARPHRD_HDLCARPHRD_CISCO Index: net-2.6.24/include/linux/if_ether.h === --- net-2.6.24.orig/include/linux/if_ether.h2007-09-20 18:48:21.0 +0200 +++ net-2.6.24/include/linux/if_ether.h 2007-09-20 18:48:57.0 +0200 @@ -90,6 +90,7 @@ #define ETH_P_WAN_PPP 0x0007 /* Dummy type for WAN PPP frames*/ #define ETH_P_PPP_MP0x0008 /* Dummy type for PPP MP frames */ #define ETH_P_LOCALTALK 0x0009 /* Localtalk pseudo type*/ +#define ETH_P_CAN 0x000C /* Controller Area Network */ #define ETH_P_PPPTALK 0x0010 /* Dummy type for Atalk over PPP*/ #define ETH_P_TR_802_2 0x0011 /* 802.2 frames */ #define ETH_P_MOBITEX 0x0015 /* Mobitex ([EMAIL PROTECTED]) */ Index: net-2.6.24/include/linux/socket.h === --- net-2.6.24.orig/include/linux/socket.h 2007-09-20 18:48:21.0 +0200 +++ net-2.6.24/include/linux/socket.h 2007-09-20 18:48:57.0 +0200 @@ -185,6 +185,7 @@ #define AF_PPPOX 24 /* PPPoX sockets*/ #define AF_WANPIPE 25 /* Wanpipe API Sockets */ #define AF_LLC 26 /* Linux LLC*/ +#define AF_CAN 29 /* Controller Area Network */ #define AF_TIPC30 /* TIPC sockets */ #define AF_BLUETOOTH 31 /* Bluetooth sockets*/ #define 
AF_IUCV32 /* IUCV sockets */ @@ -220,6 +221,7 @@ #define PF_PPPOX AF_PPPOX #define PF_WANPIPE AF_WANPIPE #define PF_LLC AF_LLC +#define PF_CAN AF_CAN #define PF_TIPCAF_TIPC #define PF_BLUETOOTH AF_BLUETOOTH #define PF_IUCVAF_IUCV Index: net-2.6.24/include/linux/tty.h === --- net-2.6.24.orig/include/linux/tty.h 2007-09-20 18:48:21.0 +0200 +++ net-2.6.24/include/linux/tty.h 2007-09-20 18:48:57.0 +0200 @@ -24,7 +24,7 @@ #define NR_PTYSCONFIG_LEGACY_PTY_COUNT /* Number of legacy ptys */ #define NR_UNIX98_PTY_DEFAULT 4096 /* Default maximum for Unix98 ptys */ #define NR_UNIX98_PTY_MAX (1 MINORBITS) /* Absolute limit */ -#define NR_LDISCS 17 +#define NR_LDISCS 18 /* line disciplines */ #define N_TTY 0 @@ -45,6 +45,7 @@ #define N_SYNC_PPP 14 /* synchronous PPP */ #define N_HCI 15 /* Bluetooth HCI UART */ #define N_GIGASET_M101 16 /* Siemens Gigaset M101 serial DECT adapter */ +#define N_SLCAN17 /* Serial / USB serial CAN Adaptors */ /* * This character is the same as _POSIX_VDISABLE: it cannot be used as Index: net-2.6.24/net/core/sock.c === --- net-2.6.24.orig/net/core/sock.c 2007-09-20 18:48:21.0 +0200 +++ net-2.6.24/net/core/sock.c 2007-09-20 18:48:57.0 +0200 @@ -154,7 +154,7 @@ sk_lock-AF_ASH , sk_lock-AF_ECONET , sk_lock-AF_ATMSVC , sk_lock-21 , sk_lock-AF_SNA , sk_lock-AF_IRDA , sk_lock-AF_PPPOX , sk_lock-AF_WANPIPE , sk_lock-AF_LLC , - sk_lock-27 , sk_lock-28 , sk_lock-29 , + sk_lock-27 , sk_lock-28 , sk_lock-AF_CAN , sk_lock-AF_TIPC , sk_lock-AF_BLUETOOTH, sk_lock-IUCV, sk_lock-AF_RXRPC , sk_lock-AF_MAX }; @@ -168,7 +168,7 @@ slock-AF_ASH , slock-AF_ECONET , slock-AF_ATMSVC , slock-21 , slock-AF_SNA , slock-AF_IRDA , slock-AF_PPPOX , slock-AF_WANPIPE , slock-AF_LLC , - slock-27 , slock-28 , slock-29 , + slock-27 ,
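From user space, the numbers allocated by this patch are what a program passes to socket(). The sketch below probes whether a raw CAN socket can be opened at all. It assumes that protocol number 1 is CAN_RAW (that constant comes from linux/can.h, which is not part of this patch), and the fallback defines only mirror the values posted here for C libraries that do not know PF_CAN yet. The sketch_ names are hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks mirroring the numbers in the patch, for C libraries whose
 * headers predate PF_CAN. */
#ifndef AF_CAN
#define AF_CAN 29
#endif
#ifndef PF_CAN
#define PF_CAN AF_CAN
#endif

/* Assumption: protocol number 1 selects the raw CAN protocol. */
#define SKETCH_CAN_RAW 1

/* Returns 1 if a raw CAN socket could be opened, 0 otherwise, for
 * example when the kernel has no CAN support or the caller lacks the
 * required privileges. */
static int sketch_can_raw_available(void)
{
	int fd = socket(PF_CAN, SOCK_RAW, SKETCH_CAN_RAW);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```

A return value of 0 is the expected outcome on kernels built without the CAN protocol family, so callers should treat it as "not available" rather than as an error.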
[PATCH 7/7] CAN: Add documentation
This patch adds documentation for the PF_CAN protocol family.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 Documentation/networking/00-INDEX |   2
 Documentation/networking/can.txt  | 635
 2 files changed, 637 insertions(+)

Index: net-2.6.24/Documentation/networking/can.txt
===
--- /dev/null 1970-01-01 00:00:00.0 +
+++ net-2.6.24/Documentation/networking/can.txt 2007-09-20 18:49:01.0 +0200
@@ -0,0 +1,635 @@
+
+
+can.txt
+
+Readme file for the Controller Area Network Protocol Family (aka Socket CAN)
+
+This file contains
+
+  1 Overview / What is Socket CAN
+
+  2 Motivation / Why using the socket API
+
+  3 Socket CAN concept
+    3.1 receive lists
+    3.2 loopback
+    3.3 network security issues (capabilities)
+    3.4 network problem notifications
+
+  4 How to use Socket CAN
+    4.1 RAW protocol sockets with can_filters (SOCK_RAW)
+      4.1.1 RAW socket option CAN_RAW_FILTER
+      4.1.2 RAW socket option CAN_RAW_ERR_FILTER
+      4.1.3 RAW socket option CAN_RAW_LOOPBACK
+      4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
+    4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+    4.3 connected transport protocols (SOCK_SEQPACKET)
+    4.4 unconnected transport protocols (SOCK_DGRAM)
+
+  5 Socket CAN core module
+    5.1 can.ko module params
+    5.2 procfs content
+    5.3 writing own CAN protocol modules
+
+  6 CAN network drivers
+    6.1 general settings
+    6.2 loopback
+    6.3 CAN controller hardware filters
+    6.4 currently supported CAN hardware
+    6.5 todo
+
+  7 Credits
+
+
+
+1. Overview / What is Socket CAN
+
+
+The socketcan package is an implementation of CAN protocols
+(Controller Area Network) for Linux. CAN is a networking technology
+which has widespread use in automation, embedded devices, and
+automotive fields. While there have been other CAN implementations
+for Linux based on character devices, Socket CAN uses the Berkeley
+socket API, the Linux network stack and implements the CAN device
+drivers as network interfaces. The CAN socket API has been designed
+as similar as possible to the TCP/IP protocols to allow programmers,
+familiar with network programming, to easily learn how to use CAN
+sockets.
+
+2. Motivation / Why using the socket API
+
+
+There have been CAN implementations for Linux before Socket CAN so the
+question arises, why we have started another project. Most existing
+implementations come as a device driver for some CAN hardware, they
+are based on character devices and provide comparatively little
+functionality. Usually, there is only a hardware-specific device
+driver which provides a character device interface to send and
+receive raw CAN frames, directly to/from the controller hardware.
+Queueing of frames and higher-level transport protocols like ISO-TP
+have to be implemented in user space applications. Also, most
+character-device implementations support only one single process to
+open the device at a time, similar to a serial interface. Exchanging
+the CAN controller requires employment of another device driver and
+often the need for adaption of large parts of the application to the
+new driver's API.
+
+Socket CAN was designed to overcome all of these limitations. A new
+protocol family has been implemented which provides a socket interface
+to user space applications and which builds upon the Linux network
+layer, so to use all of the provided queueing functionality. A device
+driver for CAN controller hardware registers itself with the Linux
+network layer as a network device, so that CAN frames from the
+controller can be passed up to the network layer and on to the CAN
+protocol family module and also vice-versa. Also, the protocol family
+module provides an API for transport protocol modules to register, so
+that any number of transport protocols can be loaded or unloaded
+dynamically. In fact, the can core module alone does not provide any
+protocol and cannot be used without loading at least one additional
+protocol module. Multiple sockets can be opened at the same time,
+on different or the same protocol module and they can listen/send
+frames on different or the same CAN IDs. Several sockets listening on
+the same interface for frames with the same CAN ID are all passed the
+same received matching CAN frames. An application wishing to
+communicate using a specific transport protocol, e.g. ISO-TP, just
+selects that protocol when opening the socket, and then can read and
+write application data byte streams, without having to deal with
+CAN-IDs, frames, etc.
+
+Similar functionality visible from user-space
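The documentation text above states that several sockets listening for the same CAN ID on the same interface all receive the same matching frames. The sketch below models only that fan-out behaviour in plain userspace C. The receiver record and the sketch_deliver function are hypothetical and say nothing about how the receive lists in the patch series are actually organised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver record: one per listening socket. */
struct sketch_receiver {
	uint32_t can_id;	/* CAN ID this socket listens for */
	unsigned delivered;	/* frames handed to this socket so far */
};

/* Hand one received frame to every receiver registered for its ID and
 * return how many receivers got a copy. */
static size_t sketch_deliver(struct sketch_receiver *rx, size_t n,
			     uint32_t frame_id)
{
	size_t i, matches = 0;

	assert(rx != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (rx[i].can_id == frame_id) {
			rx[i].delivered++;
			matches++;
		}
	}
	return matches;
}
```

The point of the model is that delivery is not exclusive: a frame is not consumed by the first listener, which is one of the differences from the single-opener character-device drivers described in the Motivation section.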
[PATCH 0/7] CAN: Add new PF_CAN protocol family, try #7
Hello Dave, hello Patrick,

this is the seventh post of the patch series that adds the PF_CAN protocol family for the Controller Area Network.

Since our last post we have changed the following:

* Changes suggested by Patrick:
  - protect proto_tab[] by a lock.
  - add _rcu to some hlist traversals.
  - use printk_ratelimit() for module autoload failures.
  - make can_proto_unregister() and can_rx_unregister() return void.
  - use return value of can_proto_register() and can_rx_register() (this also removed a flaw in behavior of raw_bind() and raw_setsockopt() in case of failure to can_rx_register() their filters).
  - call kzalloc() with GFP_KERNEL in case NETDEV_REGISTER.
  - use round_jiffies() to calculate expiration times.
  - make some variables static and/or __read_mostly.
  - in can_create() check for net namespace before auto loading modules.
  - add build time check for struct sizes.
  - use skb_share_check() in vcan.
  - fixed some comments.
* Typos in documentation as pointed out by Randy Dunlap and Bill Fink.

The changes in try #6 were:

* Update code to work with namespaces in net-2.6.24.
* Remove SET_MODULE_OWNER() from vcan.

The changes in try #5 were:

* Remove slab destructor from calls to kmem_cache_alloc().
* Add comments about types defined in can.h.
* Update comment on vcan loopback module parameter.
* Fix typo in documentation.

The changes in try #4 were:

* Change vcan network driver to use the new RTNL API, as suggested by Patrick.
* Revert our change to use skb->iif instead of skb->cb. After discussion with Patrick and Jamal it turned out, our first implementation was correct.
* Use skb_tail_pointer() instead of skb->tail directly.
* Coding style changes to satisfy linux/scripts/checkpatch.pl.
* Minor changes for 64-bit-cleanliness.
* Minor cleanup of #include's

The changes in try #3 were:

* Use skb->sk and skb->pkt_type instead of skb->cb to pass loopback flags and originating socket down to the driver and back to the receiving socket. Thanks to Patrick McHardy for pointing out our wrong use of skb->cb.
* Use skb->iif instead of skb->cb to pass receiving interface from raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
* Set skb->protocol when sending CAN frames to netdevices.
* Removed struct raw_opt and struct bcm_opt and integrated these directly into struct raw_sock and bcm_sock resp., like most other proto implementations do.
* We have found and fixed race conditions between raw_bind(), raw_{set,get}sockopt() and raw_notifier(). This resulted in
  - complete removal of our own notifier list infrastructure in af_can.c. raw.c and bcm.c now use normal netdevice notifiers.
  - removal of ro->lock spinlock. We use lock_sock(sk) now.
  - changed deletion of dev_rcv_lists, which are now marked for deletion in the netdevice notifier in af_can.c and are actually deleted when all entries have been deleted using can_rx_unregister().
* Follow changes in 2.6.22 (e.g. ktime_t timestamps in skb).
* Removed obsolete code from vcan.c, as pointed out by Stephen Hemminger.

The changes in try #2 were:

* reduced RCU callback overhead when deleting receiver lists (thx to feedback from Paul E. McKenney).
* eliminated some code duplication in net/can/proc.c.
* renamed slock-29 and sk_lock-29 to slock-AF_CAN and sk_lock-AF_CAN in net/core/sock.c
* added entry for can.txt in Documentation/networking/00-INDEX
* added error frame definitions in include/linux/can/error.h, which are to be used by CAN network drivers.

This patch series applies against net-2.6.24 and is derived from Subversion revision r484 of http://svn.berlios.de/svnroot/repos/socketcan. It can be found in the directory http://svn.berlios.de/svnroot/repos/socketcan/trunk/patch-series/version.

Thanks very much for your work!

Best regards,
Urs Thuermann
Oliver Hartkopp

P.S. Greetings from some BSD and Linux users here at the LUG meeting in Braunschweig :-)
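One item in the change list above is a build-time check for struct sizes. The cover letter does not show how it is done, and kernel code would normally use the kernel's own BUILD_BUG_ON() macro for this. As a userspace sketch of the same idea, the example below uses a C11 static assertion; the struct, its fields and the 48-byte limit are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size of a fixed scratch area that private data must
 * fit into. The number is illustrative only. */
#define SKETCH_CB_SIZE 48

/* Hypothetical per-packet private data. */
struct sketch_cb_data {
	int ifindex;
	unsigned int flags;
	void *owner;
};

/* C11 static assertion: compilation fails if the struct ever outgrows
 * the scratch area, so the problem is caught before the code runs. */
_Static_assert(sizeof(struct sketch_cb_data) <= SKETCH_CB_SIZE,
	       "sketch_cb_data must fit into the scratch area");

/* Number of bytes still free in the scratch area. */
static size_t sketch_cb_spare_bytes(void)
{
	return SKETCH_CB_SIZE - sizeof(struct sketch_cb_data);
}
```

The benefit over a runtime check is that a later change which adds a field cannot silently overflow the area; the build simply stops.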
[PATCH 5/7] CAN: Add virtual CAN netdevice driver
This patch adds the virtual CAN bus (vcan) network driver. The vcan device is just a loopback device for CAN frames, no real CAN hardware is involved. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- drivers/net/Makefile |1 drivers/net/can/Kconfig | 25 + drivers/net/can/Makefile |5 + drivers/net/can/vcan.c | 208 +++ net/can/Kconfig |3 5 files changed, 242 insertions(+) Index: net-2.6.24/drivers/net/Makefile === --- net-2.6.24.orig/drivers/net/Makefile2007-09-20 18:48:21.0 +0200 +++ net-2.6.24/drivers/net/Makefile 2007-09-20 18:49:00.0 +0200 @@ -10,6 +10,7 @@ obj-$(CONFIG_CHELSIO_T1) += chelsio/ obj-$(CONFIG_CHELSIO_T3) += cxgb3/ obj-$(CONFIG_EHEA) += ehea/ +obj-$(CONFIG_CAN) += can/ obj-$(CONFIG_BONDING) += bonding/ obj-$(CONFIG_ATL1) += atl1/ obj-$(CONFIG_GIANFAR) += gianfar_driver.o Index: net-2.6.24/drivers/net/can/Kconfig === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6.24/drivers/net/can/Kconfig 2007-09-20 18:49:00.0 +0200 @@ -0,0 +1,25 @@ +menu CAN Device Drivers + depends on CAN + +config CAN_VCAN + tristate Virtual Local CAN Interface (vcan) + depends on CAN + default N + ---help--- + Similar to the network loopback devices, vcan offers a + virtual local CAN interface. + + This driver can also be built as a module. If so, the module + will be called vcan. + +config CAN_DEBUG_DEVICES + bool CAN devices debugging messages + depends on CAN + default N + ---help--- + Say Y here if you want the CAN device drivers to produce a bunch of + debug messages to the system log. Select this if you are having + a problem with CAN support and want to see more of what is going + on. + +endmenu Index: net-2.6.24/drivers/net/can/Makefile === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6.24/drivers/net/can/Makefile 2007-09-20 18:49:00.0 +0200 @@ -0,0 +1,5 @@ +# +# Makefile for the Linux Controller Area Network drivers. 
+# + +obj-$(CONFIG_CAN_VCAN) += vcan.o Index: net-2.6.24/drivers/net/can/vcan.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6.24/drivers/net/can/vcan.c 2007-09-20 18:49:00.0 +0200 @@ -0,0 +1,208 @@ +/* + * vcan.c - Virtual CAN interface + * + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions, the following disclaimer and + *the referenced file 'COPYING'. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. Neither the name of Volkswagen nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * Alternatively, provided that this notice is retained in full, this + * software may be distributed under the terms of the GNU General + * Public License (GPL) version 2 as distributed in the 'COPYING' + * file from the main directory of the linux kernel source. + * + * The provided data structures and external interfaces from this code + * are not restricted to be used by modules with a GPL compatible license. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH + * DAMAGE. + * + * Send feedback to [EMAIL PROTECTED] + * + */ + +#include linux/module.h +#include linux/init.h +#include linux/netdevice.h +#include linux/if_arp.h +#include linux/if_ether.h +#include
[PATCH 4/7] CAN: Add broadcast manager (bcm) protocol
This patch adds the CAN broadcast manager (bcm) protocol. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- include/linux/can/bcm.h | 65 + net/can/Kconfig | 28 net/can/Makefile|3 net/can/bcm.c | 1784 4 files changed, 1880 insertions(+) Index: net-2.6.24/include/linux/can/bcm.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6.24/include/linux/can/bcm.h 2007-09-20 18:48:59.0 +0200 @@ -0,0 +1,65 @@ +/* + * linux/can/bcm.h + * + * Definitions for CAN Broadcast Manager (BCM) + * + * Author: Oliver Hartkopp [EMAIL PROTECTED] + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Send feedback to [EMAIL PROTECTED] + * + */ + +#ifndef CAN_BCM_H +#define CAN_BCM_H + +/** + * struct bcm_msg_head - head of messages to/from the broadcast manager + * @opcode:opcode, see enum below. + * @flags: special flags, see below. + * @count: number of frames to send before changing interval. + * @ival1: interval for the first @count frames. + * @ival2: interval for the following frames. + * @can_id:CAN ID of frames to be sent or received. + * @nframes: number of frames appended to the message head. + * @frames:array of CAN frames. 
+ */ +struct bcm_msg_head { + int opcode; + int flags; + int count; + struct timeval ival1, ival2; + canid_t can_id; + int nframes; + struct can_frame frames[0]; +}; + +enum { + TX_SETUP = 1, /* create (cyclic) transmission task */ + TX_DELETE, /* remove (cyclic) transmission task */ + TX_READ,/* read properties of (cyclic) transmission task */ + TX_SEND,/* send one CAN frame */ + RX_SETUP, /* create RX content filter subscription */ + RX_DELETE, /* remove RX content filter subscription */ + RX_READ,/* read properties of RX content filter subscription */ + TX_STATUS, /* reply to TX_READ request */ + TX_EXPIRED, /* notification on performed transmissions (count=0) */ + RX_STATUS, /* reply to RX_READ request */ + RX_TIMEOUT, /* cyclic message is absent */ + RX_CHANGED /* updated CAN frame (detected content change) */ +}; + +#define SETTIMER0x0001 +#define STARTTIMER 0x0002 +#define TX_COUNTEVT 0x0004 +#define TX_ANNOUNCE 0x0008 +#define TX_CP_CAN_ID0x0010 +#define RX_FILTER_ID0x0020 +#define RX_CHECK_DLC0x0040 +#define RX_NO_AUTOTIMER 0x0080 +#define RX_ANNOUNCE_RESUME 0x0100 +#define TX_RESET_MULTI_IDX 0x0200 +#define RX_RTR_FRAME0x0400 + +#endif /* CAN_BCM_H */ Index: net-2.6.24/net/can/Kconfig === --- net-2.6.24.orig/net/can/Kconfig 2007-09-20 18:48:59.0 +0200 +++ net-2.6.24/net/can/Kconfig 2007-09-20 18:48:59.0 +0200 @@ -42,6 +42,34 @@ Say Y here if you want non-root users to be able to access CAN_RAW sockets. +config CAN_BCM + tristate Broadcast Manager CAN Protocol (with content filtering) + depends on CAN + default N + ---help--- + The Broadcast Manager offers content filtering, timeout monitoring, + sending of RTR-frames and cyclic CAN messages without permanent user + interaction. The BCM can be 'programmed' via the BSD socket API and + informs you on demand e.g. only on content updates / timeouts. + You probably want to use the bcm socket in most cases where cyclic + CAN messages are used on the bus (e.g. in automotive environments). 
+ To use the Broadcast Manager, use AF_CAN with protocol CAN_BCM. + +config CAN_BCM_USER + bool Allow non-root users to access CAN broadcast manager sockets + depends on CAN_BCM + default N + ---help--- + The Controller Area Network is a local field bus transmitting only + broadcast messages without any routing and security concepts. + In the majority of cases the user application has to deal with + raw CAN frames. Therefore it might be reasonable NOT to restrict + the CAN access only to the user root, as known from other networks. + Since CAN_BCM sockets can only send and receive frames to/from CAN + interfaces this does not affect security of others networks. + Say Y here if you want non-root users to be able to access CAN_BCM + sockets. + config CAN_DEBUG_CORE bool CAN Core debugging messages depends on CAN Index: net-2.6.24/net/can/Makefile === --- net-2.6.24.orig/net/can/Makefile2007-09-20 18:48:59.0 +0200 +++ net-2.6.24/net/can/Makefile
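The field descriptions of struct bcm_msg_head in this patch say that ival1 applies to the first count frames and ival2 to the frames after that. The sketch below models just that rule with a cut-down local copy of the header. The sketch_ names are hypothetical, the CAN ID and the trailing frame array are left out, and the reading that SETTIMER and STARTTIMER together arm a cyclic transmission is an assumption based on the flag names.

```c
#include <assert.h>
#include <sys/time.h>

/* Local mirrors of one opcode and two flags from bcm.h as posted. */
enum { SKETCH_TX_SETUP = 1 };
#define SKETCH_SETTIMER		0x0001
#define SKETCH_STARTTIMER	0x0002

/* Cut-down mirror of struct bcm_msg_head (no can_id, no frames[]). */
struct sketch_bcm_head {
	int opcode;
	int flags;
	int count;
	struct timeval ival1, ival2;
	int nframes;
};

/* Interval that applies to frame number n (0-based): ival1 for the
 * first `count` frames, ival2 for all following frames. */
static struct timeval sketch_interval_for(const struct sketch_bcm_head *h, int n)
{
	assert(n >= 0);
	return n < h->count ? h->ival1 : h->ival2;
}

/* Example header: three frames 100 ms apart, then one per second. */
static struct sketch_bcm_head sketch_cyclic_tx(void)
{
	struct sketch_bcm_head h = {0};

	h.opcode = SKETCH_TX_SETUP;
	h.flags = SKETCH_SETTIMER | SKETCH_STARTTIMER;
	h.count = 3;
	h.ival1.tv_usec = 100000;
	h.ival2.tv_sec = 1;
	h.nframes = 1;
	return h;
}
```

A real application would fill in the complete struct bcm_msg_head, append the CAN frame(s) and write the whole message to a CAN_BCM socket; none of that is shown here.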
[PATCH 6/7] CAN: Add maintainer entries
This patch adds entries in the CREDITS and MAINTAINERS file for CAN.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 CREDITS     | 16
 MAINTAINERS |  9
 2 files changed, 25 insertions(+)

Index: net-2.6.24/CREDITS
===
--- net-2.6.24.orig/CREDITS 2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/CREDITS 2007-09-20 18:49:00.0 +0200
@@ -1331,6 +1331,14 @@
 S: 5623 HZ Eindhoven
 S: The Netherlands

+N: Oliver Hartkopp
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Andrew Haylett
 E: [EMAIL PROTECTED]
 D: Selection mechanism
@@ -3284,6 +3292,14 @@
 S: F-35042 Rennes Cedex
 S: France

+N: Urs Thuermann
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Jon Tombs
 E: [EMAIL PROTECTED]
 W: http://www.esi.us.es/~jon
Index: net-2.6.24/MAINTAINERS
===
--- net-2.6.24.orig/MAINTAINERS 2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/MAINTAINERS 2007-09-20 18:49:00.0 +0200
@@ -975,6 +975,15 @@
 L: [EMAIL PROTECTED]
 S: Maintained

+CAN NETWORK LAYER
+P: Urs Thuermann
+M: [EMAIL PROTECTED]
+P: Oliver Hartkopp
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://developer.berlios.de/projects/socketcan/
+S: Maintained
+
 CALGARY x86-64 IOMMU
 P: Muli Ben-Yehuda
 M: [EMAIL PROTECTED]
Re: [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations
On Thu, 20 Sep 2007, David Miller wrote:

From: Ilpo Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:52 +0300

There's no reason to clear the sacktag skb hint when small part of the rexmit queue changes. Account changes (if any) instead when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so no need to discard SACK tag hint at all.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, and I followed it up with this coding style fixlet.

Yeah, thanks for fixing it... I just didn't notice it was left wrong while doing things that required more thinking to get them right...

-- 
 i.
[git patches] net driver fixes
This includes the sky2 update that you and sch discussed.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 drivers/net/myri10ge/myri10ge.c |    3 +
 drivers/net/phy/phy.c           |    1 +
 drivers/net/sky2.c              |  368 +++
 drivers/net/sky2.h              |   41 -
 4 files changed, 292 insertions(+), 121 deletions(-)

Brice Goglin (1):
      myri10ge: Add support for PCI device id 9

Domen Puncer (1):
      phy: export phy_mii_ioctl

Stephen Hemminger (6):
      sky2: fix VLAN receive processing (resend)
      sky2: ethtool speed report bug
      sky2: reorganize chip revision features
      sky2: fe+ chip support
      sky2: receive FIFO checking
      sky2: version 1.18

diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 1c42266..556962f 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -3094,9 +3094,12 @@ static void myri10ge_remove(struct pci_dev *pdev)
 }
 
 #define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E	0x0008
+#define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9	0x0009
 
 static struct pci_device_id myri10ge_pci_tbl[] = {
 	{PCI_DEVICE(PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E)},
+	{PCI_DEVICE
+	 (PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9)},
 	{0},
 };
 
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 0cc4369..cb230f4 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -409,6 +409,7 @@ int phy_mii_ioctl(struct phy_device *phydev,
 
 	return 0;
 }
+EXPORT_SYMBOL(phy_mii_ioctl);
 
 /**
  * phy_start_aneg - start auto-negotiation for this PHY device
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 5d812de..eaffe55 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -51,7 +51,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"1.17"
+#define DRV_VERSION		"1.18"
 #define PFX			DRV_NAME " "
 
 /*
@@ -118,12 +118,15 @@ static const struct pci_device_id sky2_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4351) }, /* 88E8036 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4352) }, /* 88E8038 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4353) }, /* 88E8039 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4354) }, /* 88E8040 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4356) }, /* 88EC033 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x435A) }, /* 88E8048 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4360) }, /* 88E8052 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4361) }, /* 88E8050 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4362) }, /* 88E8053 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4363) }, /* 88E8055 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4364) }, /* 88E8056 */
+	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4365) }, /* 88E8070 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4366) }, /* 88EC036 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4367) }, /* 88EC032 */
 	{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) }, /* 88EC034 */
@@ -147,6 +150,7 @@ static const char *yukon2_name[] = {
 	"Extreme",	/* 0xb5 */
 	"EC",		/* 0xb6 */
 	"FE",		/* 0xb7 */
+	"FE+",		/* 0xb8 */
 };
 
 static void sky2_set_multicast(struct net_device *dev);
@@ -217,8 +221,7 @@ static void sky2_power_on(struct sky2_hw *hw)
 	else
 		sky2_write8(hw, B2_Y2_CLK_GATE, 0);
 
-	if (hw->chip_id == CHIP_ID_YUKON_EC_U ||
-	    hw->chip_id == CHIP_ID_YUKON_EX) {
+	if (hw->flags & SKY2_HW_ADV_POWER_CTL) {
 		u32 reg;
 
 		sky2_pci_write32(hw, PCI_DEV_REG3, 0);
@@ -311,10 +314,8 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
 	struct sky2_port *sky2 = netdev_priv(hw->dev[port]);
 	u16 ctrl, ct1000, adv, pg, ledctrl, ledover, reg;
 
-	if (sky2->autoneg == AUTONEG_ENABLE
-	    && !(hw->chip_id == CHIP_ID_YUKON_XL
-		 || hw->chip_id == CHIP_ID_YUKON_EC_U
-		 || hw->chip_id == CHIP_ID_YUKON_EX)) {
+	if (sky2->autoneg == AUTONEG_ENABLE
+	    && !(hw->flags & SKY2_HW_NEWER_PHY)) {
 		u16 ectrl = gm_phy_read(hw, port, PHY_MARV_EXT_CTRL);
 
 		ectrl &= ~(PHY_M_EC_M_DSC_MSK | PHY_M_EC_S_DSC_MSK |
@@ -334,7 +335,7 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
 	ctrl = gm_phy_read(hw, port, PHY_MARV_PHY_CTRL);
 	if (sky2_is_copper(hw)) {
-		if (hw->chip_id == CHIP_ID_YUKON_FE) {
+		if (!(hw->flags & SKY2_HW_GIGABIT)) {
 			/* enable automatic crossover */
 			ctrl |= PHY_M_PC_MDI_XMODE(PHY_M_PC_ENA_AUTO) >> 1;
 		} else {
@@ -346,9 +347,7 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned
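
The sky2 hunks above replace lists of chip_id comparisons with capability bits that are tested where they are needed. A minimal stand-alone sketch of that pattern follows. Everything prefixed demo_/DEMO_ is hypothetical and is not the driver's code; the chip-to-flag mapping is only inferred from the comparisons the diff removes, and the bit values are borrowed from the sky2.h defines quoted in the FIFO watchdog patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not the driver's real definitions. */
enum demo_chip_id { DEMO_CHIP_XL, DEMO_CHIP_EC_U, DEMO_CHIP_EX, DEMO_CHIP_FE };

#define DEMO_HW_GIGABIT        0x0004u
#define DEMO_HW_NEWER_PHY      0x0008u
#define DEMO_HW_ADV_POWER_CTL  0x0080u

struct demo_hw {
	enum demo_chip_id chip_id;
	unsigned int flags;
};

/* Decide the capabilities once, at init time, from the chip id. */
void demo_init_flags(struct demo_hw *hw)
{
	assert(hw != NULL);

	switch (hw->chip_id) {
	case DEMO_CHIP_XL:
		hw->flags = DEMO_HW_GIGABIT | DEMO_HW_NEWER_PHY;
		break;
	case DEMO_CHIP_EC_U:
	case DEMO_CHIP_EX:
		hw->flags = DEMO_HW_GIGABIT | DEMO_HW_NEWER_PHY
			| DEMO_HW_ADV_POWER_CTL;
		break;
	case DEMO_CHIP_FE:
	default:
		hw->flags = 0;	/* 10/100 only, no extra features */
		break;
	}
}

/* Call sites test one capability bit instead of listing chip ids. */
int demo_needs_adv_power_ctl(const struct demo_hw *hw)
{
	return (hw->flags & DEMO_HW_ADV_POWER_CTL) != 0;
}

int demo_is_fast_ethernet_only(const struct demo_hw *hw)
{
	return !(hw->flags & DEMO_HW_GIGABIT);
}
```

With this layout, supporting a new chip means adding one case to the init switch rather than touching every comparison.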
Please pull 'nl80211' branch of wireless-2.6
Dave,

This patch adds the basic nl80211 infrastructure.

Thanks!

John

---

Patch is available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/nl80211/0001-nl80211-add-netlink-interface-to-cfg80211.patch

---

The following changes since commit 0d4cbb5e7f60b2f1a4d8b7f6ea4cc264262c7a01:
  Linus Torvalds (1):
        Linux 2.6.23-rc6

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git nl80211

Johannes Berg (1):
      nl80211: add netlink interface to cfg80211

 include/linux/nl80211.h      |   97 +-
 include/net/cfg80211.h       |   11 +-
 include/net/iw_handler.h     |    8 +-
 net/mac80211/ieee80211_cfg.c |    2 +-
 net/wireless/Kconfig         |   17 ++-
 net/wireless/Makefile        |    1 +
 net/wireless/core.c          |  148 +++
 net/wireless/core.h          |   32 +++
 net/wireless/nl80211.c       |  431 ++
 net/wireless/nl80211.h       |   24 +++
 10 files changed, 762 insertions(+), 9 deletions(-)
 create mode 100644 net/wireless/nl80211.c
 create mode 100644 net/wireless/nl80211.h

diff --git a/include/linux/nl80211.h b/include/linux/nl80211.h
index 9a30ba2..a5dd030 100644
--- a/include/linux/nl80211.h
+++ b/include/linux/nl80211.h
@@ -7,7 +7,97 @@
  */
 
 /**
+ * enum nl80211_commands - supported nl80211 commands
+ *
+ * @NL80211_CMD_UNSPEC: unspecified command to catch errors
+ *
+ * @NL80211_CMD_GET_WIPHY: request information about a wiphy or dump request
+ *	to get a list of all present wiphys.
+ * @NL80211_CMD_SET_WIPHY: set wiphy name, needs %NL80211_ATTR_WIPHY and
+ *	%NL80211_ATTR_WIPHY_NAME.
+ * @NL80211_CMD_NEW_WIPHY: Newly created wiphy, response to get request
+ *	or rename notification. Has attributes %NL80211_ATTR_WIPHY and
+ *	%NL80211_ATTR_WIPHY_NAME.
+ * @NL80211_CMD_DEL_WIPHY: Wiphy deleted. Has attributes
+ *	%NL80211_ATTR_WIPHY and %NL80211_ATTR_WIPHY_NAME.
+ *
+ * @NL80211_CMD_GET_INTERFACE: Request an interface's configuration;
+ *	either a dump request on a %NL80211_ATTR_WIPHY or a specific get
+ *	on an %NL80211_ATTR_IFINDEX is supported.
+ * @NL80211_CMD_SET_INTERFACE: Set type of a virtual interface, requires
+ *	%NL80211_ATTR_IFINDEX and %NL80211_ATTR_IFTYPE.
+ * @NL80211_CMD_NEW_INTERFACE: Newly created virtual interface or response
+ *	to %NL80211_CMD_GET_INTERFACE. Has %NL80211_ATTR_IFINDEX,
+ *	%NL80211_ATTR_WIPHY and %NL80211_ATTR_IFTYPE attributes. Can also
+ *	be sent from userspace to request creation of a new virtual interface,
+ *	then requires attributes %NL80211_ATTR_WIPHY, %NL80211_ATTR_IFTYPE and
+ *	%NL80211_ATTR_IFNAME.
+ * @NL80211_CMD_DEL_INTERFACE: Virtual interface was deleted, has attributes
+ *	%NL80211_ATTR_IFINDEX and %NL80211_ATTR_WIPHY. Can also be sent from
+ *	userspace to request deletion of a virtual interface, then requires
+ *	attribute %NL80211_ATTR_IFINDEX.
+ *
+ * @NL80211_CMD_MAX: highest used command number
+ * @__NL80211_CMD_AFTER_LAST: internal use
+ */
+enum nl80211_commands {
+/* don't change the order or add anything inbetween, this is ABI! */
+	NL80211_CMD_UNSPEC,
+
+	NL80211_CMD_GET_WIPHY,		/* can dump */
+	NL80211_CMD_SET_WIPHY,
+	NL80211_CMD_NEW_WIPHY,
+	NL80211_CMD_DEL_WIPHY,
+
+	NL80211_CMD_GET_INTERFACE,	/* can dump */
+	NL80211_CMD_SET_INTERFACE,
+	NL80211_CMD_NEW_INTERFACE,
+	NL80211_CMD_DEL_INTERFACE,
+
+	/* add commands here */
+
+	/* used to define NL80211_CMD_MAX below */
+	__NL80211_CMD_AFTER_LAST,
+	NL80211_CMD_MAX = __NL80211_CMD_AFTER_LAST - 1
+};
+
+
+/**
+ * enum nl80211_attrs - nl80211 netlink attributes
+ *
+ * @NL80211_ATTR_UNSPEC: unspecified attribute to catch errors
+ *
+ * @NL80211_ATTR_WIPHY: index of wiphy to operate on, cf.
+ *	/sys/class/ieee80211/<phyname>/index
+ * @NL80211_ATTR_WIPHY_NAME: wiphy name (used for renaming)
+ *
+ * @NL80211_ATTR_IFINDEX: network interface index of the device to operate on
+ * @NL80211_ATTR_IFNAME: network interface name
+ * @NL80211_ATTR_IFTYPE: type of virtual interface, see enum nl80211_iftype
+ *
+ * @NL80211_ATTR_MAX: highest attribute number currently defined
+ * @__NL80211_ATTR_AFTER_LAST: internal use
+ */
+enum nl80211_attrs {
+/* don't change the order or add anything inbetween, this is ABI! */
+	NL80211_ATTR_UNSPEC,
+
+	NL80211_ATTR_WIPHY,
+	NL80211_ATTR_WIPHY_NAME,
+
+	NL80211_ATTR_IFINDEX,
+	NL80211_ATTR_IFNAME,
+	NL80211_ATTR_IFTYPE,
+
+	/* add attributes here, update the policy in nl80211.c */
+
+	__NL80211_ATTR_AFTER_LAST,
+	NL80211_ATTR_MAX = __NL80211_ATTR_AFTER_LAST - 1
+};
+
+/**
  * enum nl80211_iftype - (virtual) interface types
+ *
  * @NL80211_IFTYPE_UNSPECIFIED: unspecified type, driver decides
  * @NL80211_IFTYPE_ADHOC: independent BSS member
  * @NL80211_IFTYPE_STATION:
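
Both enums in the patch end with an AFTER_LAST sentinel, so that the MAX constant follows the last real entry without being edited by hand. A small stand-alone sketch of the idiom follows. The demo_/DEMO_ names and the name table are hypothetical and are not part of nl80211; the table only stands in for anything indexed by attribute number, such as the policy array the patch comment refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute list with the same shape as enum nl80211_attrs. */
enum demo_attrs {
	DEMO_ATTR_UNSPEC,	/* 0 is reserved so a zeroed type is caught */

	DEMO_ATTR_WIPHY,
	DEMO_ATTR_WIPHY_NAME,

	DEMO_ATTR_IFINDEX,
	DEMO_ATTR_IFNAME,
	DEMO_ATTR_IFTYPE,

	/* new entries go here, never in the middle: the numbers are ABI */

	DEMO_ATTR_AFTER_LAST,
	DEMO_ATTR_MAX = DEMO_ATTR_AFTER_LAST - 1
};

/* Anything indexed by attribute number needs MAX + 1 slots. */
static const char *const demo_attr_names[DEMO_ATTR_MAX + 1] = {
	[DEMO_ATTR_WIPHY]      = "wiphy",
	[DEMO_ATTR_WIPHY_NAME] = "wiphy_name",
	[DEMO_ATTR_IFINDEX]    = "ifindex",
	[DEMO_ATTR_IFNAME]     = "ifname",
	[DEMO_ATTR_IFTYPE]     = "iftype",
};

/* Returns NULL for the UNSPEC slot and for numbers outside the enum. */
const char *demo_attr_name(int attr)
{
	/* The sentinel keeps MAX pinned to the last real entry. */
	assert(DEMO_ATTR_MAX == DEMO_ATTR_IFTYPE);

	if (attr <= DEMO_ATTR_UNSPEC || attr > DEMO_ATTR_MAX)
		return NULL;
	return demo_attr_names[attr];
}
```

Appending a new entry above the sentinel grows DEMO_ATTR_MAX and the table together, while the existing numbers stay stable.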
Re: 2.6.23-rc6-mm1
On Thu, 20 Sep 2007 21:42:44 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote:

> ...
> I have tested the change with cross compiler for power405 with the same
> .config with which the build problem is solved, but the build fails with
> another error
>
>   CC [M]  drivers/net/mace.o
> drivers/net/mace.c: In function 'mace_handle_misc_intrs':
> drivers/net/mace.c:642: error: 'dev' undeclared (first use in this function)
> drivers/net/mace.c:642: error: (Each undeclared identifier is reported only once
> drivers/net/mace.c:642: error: for each function it appears in.)
> make[2]: *** [drivers/net/mace.o] Error 1
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2
>
> This patch fixes the build failure
>
> Signed-off-by: Kamalesh Babulal [EMAIL PROTECTED]
> ---
> --- linux-2.6.23-rc6 /drivers/net/mace.c 2007-09-20 17:16:50.0+0530
> +++ linux-2.6.23-rc6/drivers/net/~mace.c2007-09-20 17:12: 47.0 +0530
> @@ -633,7 +633,7 @@ static void mace_set_multicast(struct ne
>      spin_unlock_irqrestore(&mp->lock, flags);
>  }
>
> -static void mace_handle_misc_intrs(struct mace_data *mp, int intr)
> +static void mace_handle_misc_intrs(struct mace_data *mp, int intr, struct net_device *dev)
>  {
>      volatile struct mace __iomem *mb = mp->mace;
>      static int mace_babbles, mace_jabbers;
> @@ -669,7 +669,7 @@ static irqreturn_t mace_interrupt(int ir
>      spin_lock_irqsave(&mp->lock, flags);
>      intr = in_8(&mb->ir); /* read interrupt register */
>      in_8(&mb->xmtrc); /* get retries */
> -    mace_handle_misc_intrs(mp, intr);
> +    mace_handle_misc_intrs(mp, intr, dev);
>
>      i = mp->tx_empty;
>      while (in_8(&mb->pr) & XMTSV) {
> @@ -682,7 +682,7 @@ static irqreturn_t mace_interrupt(int ir
>       */
>      intr = in_8(&mb->ir);
>      if (intr != 0)
> -        mace_handle_misc_intrs(mp, intr);
> +        mace_handle_misc_intrs(mp, intr, dev);
>      if (mp->tx_bad_runt) {
>          fs = in_8(&mb->xmtfs);
>          mp->tx_bad_runt = 0;
> @@ -817,7 +817,7 @@ static void mace_tx_timeout(unsigned lon
>      goto out;
>
>      /* update various counters */
> -    mace_handle_misc_intrs(mp, in_8(&mb->ir));
> +    mace_handle_misc_intrs(mp, in_8(&mb->ir), dev);
>
>      cp = mp->tx_cmds + NCMDS_TX * mp->tx_empty;

Thanks, I will fix the wordwrapping in your patch and shall send it in to
David.

> Hi,
>
> The build fails when compiling with the same .config over cross compiler
> for powerpc405
>
> drivers/net/mv643xx_eth.c: In function 'mv643xx_eth_int_handler':
> drivers/net/mv643xx_eth.c:564: error: 'bp' undeclared (first use in this function)
> drivers/net/mv643xx_eth.c:564: error: (Each undeclared identifier is reported only once
> drivers/net/mv643xx_eth.c:564: error: for each function it appears in.)
> drivers/net/mv643xx_eth.c: At top level:
> drivers/net/mv643xx_eth.c:1010: error: conflicting types for 'mv643xx_poll'
> drivers/net/mv643xx_eth.c:68: error: previous declaration of 'mv643xx_poll' was here
> make[2]: *** [drivers/net/mv643xx_eth.o] Error 1
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2

Yes, rather a lot of net drivers got broken in easy-to-fix ways.
[PATCH] sky2: be more selective about FIFO watchdog
Be more selective about when to enable the ram buffer watchdog code.
It is unnecessary on XL A3 or later revs, and with Yukon FE
the buffer is so small (4K) that the watchdog detects false positives.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c	2007-09-19 15:36:32.0 -0700
+++ b/drivers/net/sky2.c	2007-09-20 10:43:42.0 -0700
@@ -816,7 +816,8 @@ static void sky2_mac_init(struct sky2_hw
 	sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_CLR);
 	sky2_write16(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_OPER_ON);
 
-	if (!(hw->flags & SKY2_HW_RAMBUFFER)) {
+	/* On chips without ram buffer, pause is controled by MAC level */
+	if (sky2_read8(hw, B2_E_0) == 0) {
 		sky2_write8(hw, SK_REG(port, RX_GMF_LP_THR), 768/8);
 		sky2_write8(hw, SK_REG(port, RX_GMF_UP_THR), 1024/8);
 
@@ -1271,7 +1272,7 @@ static int sky2_up(struct net_device *de
 	struct sky2_port *sky2 = netdev_priv(dev);
 	struct sky2_hw *hw = sky2->hw;
 	unsigned port = sky2->port;
-	u32 imask;
+	u32 imask, ramsize;
 	int cap, err = -ENOMEM;
 	struct net_device *otherdev = hw->dev[sky2->port^1];
 
@@ -1326,13 +1327,12 @@ static int sky2_up(struct net_device *de
 
 	sky2_mac_init(hw, port);
 
-	if (hw->flags & SKY2_HW_RAMBUFFER) {
-		/* Register is number of 4K blocks on internal RAM buffer. */
-		u32 ramsize = sky2_read8(hw, B2_E_0) * 4;
+	/* Register is number of 4K blocks on internal RAM buffer. */
+	ramsize = sky2_read8(hw, B2_E_0) * 4;
+	if (ramsize > 0) {
 		u32 rxspace;
 
-		printk(KERN_DEBUG PFX "%s: ram buffer %dK\n", dev->name, ramsize);
-
+		pr_debug(PFX "%s: ram buffer %dK\n", dev->name, ramsize);
 		if (ramsize < 16)
 			rxspace = ramsize / 2;
 		else
@@ -1995,7 +1995,7 @@ static int sky2_change_mtu(struct net_de
 
 	synchronize_irq(hw->pdev->irq);
 
-	if (!(hw->flags & SKY2_HW_RAMBUFFER))
+	if (sky2_read8(hw, B2_E_0) == 0)
 		sky2_set_tx_stfwd(hw, port);
 
 	ctl = gma_read16(hw, port, GM_GP_CTRL);
@@ -2526,7 +2526,7 @@ static void sky2_watchdog(unsigned long
 			++active;
 
 			/* For chips with Rx FIFO, check if stuck */
-			if ((hw->flags & SKY2_HW_RAMBUFFER) &&
+			if ((hw->flags & SKY2_HW_FIFO_HANG_CHECK) &&
 			     sky2_rx_hung(dev)) {
 				pr_info(PFX "%s: receiver hang detected\n",
 					dev->name);
@@ -2684,8 +2684,10 @@ static int __devinit sky2_init(struct sk
 	switch(hw->chip_id) {
 	case CHIP_ID_YUKON_XL:
 		hw->flags = SKY2_HW_GIGABIT
-			| SKY2_HW_NEWER_PHY
-			| SKY2_HW_RAMBUFFER;
+			| SKY2_HW_NEWER_PHY;
+		if (hw->chip_rev < 3)
+			hw->flags |= SKY2_HW_FIFO_HANG_CHECK;
+
 		break;
 
 	case CHIP_ID_YUKON_EC_U:
@@ -2711,11 +2713,10 @@ static int __devinit sky2_init(struct sk
 			dev_err(&hw->pdev->dev, "unsupported revision Yukon-EC rev A1\n");
 			return -EOPNOTSUPP;
 		}
-		hw->flags = SKY2_HW_GIGABIT | SKY2_HW_RAMBUFFER;
+		hw->flags = SKY2_HW_GIGABIT | SKY2_HW_FIFO_HANG_CHECK;
 		break;
 
 	case CHIP_ID_YUKON_FE:
-		hw->flags = SKY2_HW_RAMBUFFER;
 		break;
 
 	case CHIP_ID_YUKON_FE_P:
--- a/drivers/net/sky2.h	2007-09-19 10:05:28.0 -0700
+++ b/drivers/net/sky2.h	2007-09-20 10:44:15.0 -0700
@@ -2063,7 +2063,7 @@ struct sky2_hw {
 #define SKY2_HW_FIBRE_PHY	0x0002
 #define SKY2_HW_GIGABIT		0x0004
 #define SKY2_HW_NEWER_PHY	0x0008
-#define SKY2_HW_RAMBUFFER	0x0010	/* chip has RAM FIFO */
+#define SKY2_HW_FIFO_HANG_CHECK	0x0010
 #define SKY2_HW_NEW_LE		0x0020	/* new LSOv2 format */
 #define SKY2_HW_AUTO_TX_SUM	0x0040	/* new IP decode for Tx */
 #define SKY2_HW_ADV_POWER_CTL	0x0080	/* additional PHY power regs */
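
The patch separates two questions that the old SKY2_HW_RAMBUFFER flag answered together: whether the chip has a RAM buffer at all (now read from the size register) and whether the receive-hang watchdog should run (now its own flag). A stand-alone sketch of that split follows, under stated assumptions: the demo_/DEMO_ names are hypothetical, and the revision cutoff uses "<" because the archived diff lost the operator and the commit message says the check is unnecessary on later XL revisions.

```c
#include <assert.h>
#include <stddef.h>

enum demo_chip { DEMO_YUKON_XL, DEMO_YUKON_EC, DEMO_YUKON_FE };

#define DEMO_HW_GIGABIT          0x0004u
#define DEMO_HW_NEWER_PHY        0x0008u
#define DEMO_HW_FIFO_HANG_CHECK  0x0010u

/* Assumed cutoff: XL revisions below this still need the watchdog. */
#define DEMO_XL_FIRST_FIXED_REV  3u

/* Pick the capability bits for a chip and revision. */
unsigned int demo_chip_flags(enum demo_chip chip, unsigned int chip_rev)
{
	unsigned int flags = 0;

	switch (chip) {
	case DEMO_YUKON_XL:
		flags = DEMO_HW_GIGABIT | DEMO_HW_NEWER_PHY;
		if (chip_rev < DEMO_XL_FIRST_FIXED_REV)
			flags |= DEMO_HW_FIFO_HANG_CHECK;
		break;
	case DEMO_YUKON_EC:
		flags = DEMO_HW_GIGABIT | DEMO_HW_FIFO_HANG_CHECK;
		break;
	case DEMO_YUKON_FE:
		/* 4K buffer: the watchdog would report false hangs */
		break;
	default:
		assert(0 && "unknown chip id");
		break;
	}
	return flags;
}

/* The size register counts 4K blocks; 0 means there is no RAM buffer. */
unsigned int demo_ram_kbytes(unsigned char blocks_reg)
{
	return (unsigned int)blocks_reg * 4u;
}

/* The watchdog consults only the dedicated flag, not the buffer size. */
int demo_should_check_rx_hang(unsigned int flags)
{
	return (flags & DEMO_HW_FIFO_HANG_CHECK) != 0;
}
```

Keeping the two questions apart is what lets Yukon FE keep its small buffer while opting out of the hang check.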
Re: Pull request for 'r8169-for-jeff-20070919' branch
On 09/19/2007 03:56 PM, Francois Romieu wrote:
> Please pull from branch 'r8169-for-jeff-20070919' in repository

People are still reporting hangs with this card in 2.6.22.6, are there
any fixes appropriate for that?