[PATCH 2.6.24 2/3] S2io: Support for add/delete/store/restore Ethernet addresses
- Support to add/delete/store/restore 64 and 128 Ethernet addresses for Xframe I and Xframe II respectively. Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED] Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED] --- diff -urpN patch1/drivers/net/s2io.c patch2/drivers/net/s2io.c --- patch1/drivers/net/s2io.c 2007-08-18 05:32:23.0 +0530 +++ patch2/drivers/net/s2io.c 2007-08-18 07:19:21.0 +0530 @@ -3589,6 +3589,9 @@ static void s2io_reset(struct s2io_nic * /* Set swapper to enable I/O register access */ s2io_set_swapper(sp); + /* restore mac_addr entries */ + restore_mac_and_mc_addr(sp); + /* Restore the MSIX table entries from local variables */ restore_xmsi_data(sp); @@ -3647,9 +3650,6 @@ static void s2io_reset(struct s2io_nic * writeq(val64, bar0-pcc_err_reg); } - /* restore the previously assigned mac address */ - set_mac_addr(sp-dev, (u8 *)sp-def_mac_addr[0].mac_addr); - sp-device_enabled_once = FALSE; } @@ -4118,8 +4118,19 @@ hw_init_failed: static int s2io_close(struct net_device *dev) { struct s2io_nic *sp = dev-priv; + struct config_param *config = sp-config; + u64 tmp64; + int off; netif_stop_queue(dev); + + /* delete all populated mac entries */ + for(off =1; off config-max_mc_addr; off++) { + tmp64 = read_mac_addr(sp,off); + if(tmp64 != 0xULL) + delete_mac_addr(sp, tmp64); + } + /* Reset card, kill tasklet and free Tx and Rx buffers. 
*/ s2io_card_down(sp); @@ -5044,7 +5055,7 @@ static void s2io_set_multicast(struct ne bar0-rmac_addr_data1_mem); val64 = RMAC_ADDR_CMD_MEM_WE | RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD | - RMAC_ADDR_CMD_MEM_OFFSET(MAC_MC_ALL_MC_ADDR_OFFSET); + RMAC_ADDR_CMD_MEM_OFFSET(config-max_mc_addr - 1); writeq(val64, bar0-rmac_addr_cmd_mem); /* Wait till command completes */ wait_for_cmd_complete(bar0-rmac_addr_cmd_mem, @@ -5052,7 +5063,7 @@ static void s2io_set_multicast(struct ne S2IO_BIT_RESET); sp-m_cast_flg = 1; - sp-all_multi_pos = MAC_MC_ALL_MC_ADDR_OFFSET; + sp-all_multi_pos = config-max_mc_addr - 1; } else if ((dev-flags IFF_ALLMULTI) (sp-m_cast_flg)) { /* Disable all Multicast addresses */ writeq(RMAC_ADDR_DATA0_MEM_ADDR(dis_addr), @@ -5121,7 +5132,8 @@ static void s2io_set_multicast(struct ne /* Update individual M_CAST address list */ if ((!sp-m_cast_flg) dev-mc_count) { if (dev-mc_count - (MAX_ADDRS_SUPPORTED - MAC_MC_ADDR_START_OFFSET - 1)) { + ((config-max_mc_addr - config-max_mac_addr) + - config-mc_start_offset - 1)) { DBG_PRINT(ERR_DBG, %s: No more Rx filters , dev-name); DBG_PRINT(ERR_DBG, can be added, please enable ); @@ -5141,7 +5153,7 @@ static void s2io_set_multicast(struct ne val64 = RMAC_ADDR_CMD_MEM_WE | RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD | RMAC_ADDR_CMD_MEM_OFFSET - (MAC_MC_ADDR_START_OFFSET + i); + (config-mc_start_offset + i); writeq(val64, bar0-rmac_addr_cmd_mem); /* Wait for command completes */ @@ -5173,7 +5185,7 @@ static void s2io_set_multicast(struct ne val64 = RMAC_ADDR_CMD_MEM_WE | RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD | RMAC_ADDR_CMD_MEM_OFFSET - (i + MAC_MC_ADDR_START_OFFSET); + (i + config-mc_start_offset); writeq(val64, bar0-rmac_addr_cmd_mem); /* Wait for command completes */ @@ -5188,6 +5200,75 @@ static void s2io_set_multicast(struct ne } } } +/* read from CAM unicast multicast addresses and store it in + * def_mac_addr structure. 
+ **/ +void store_mac_and_mc_addr(struct s2io_nic *sp) +{ + int offset; + u64 mac_addr=0x0; + struct config_param *config = sp-config; + + /* store unicast multicast mac addresses */ + for(offset = 0; offset config-max_mc_addr; offset++) { + mac_addr = read_mac_addr(sp,offset); + /* if read fails disable the entry */ + if(mac_addr == FAILURE) + mac_addr = 0xULL; + MAC_ADDR_SET(offset,mac_addr); + } +} + +/* restore unicast MAC addresses to CAM from def_mac_addr structure + **/ +static void restore_mac_and_mc_addr(struct
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
Hi Dave,

David Miller [EMAIL PROTECTED] wrote on 08/22/2007 09:52:29 AM:
> From: Krishna Kumar2 [EMAIL PROTECTED]
> Date: Wed, 22 Aug 2007 09:41:52 +0530
>
> <snip>
>
> > > Because TSO does batching already, so it's a very good tit for tat
> > > comparison of the new batching scheme vs. an existing one.
> >
> > I am planning to do more testing on your suggestion over the weekend,
> > but I had a comment. Are you saying that TSO and batching should be
> > mutually exclusive, so that only hardware that doesn't support TSO
> > (like IB) would benefit? But even if they can co-exist, aren't cases
> > like sending multiple small skbs better handled with batching?
>
> I'm not making any suggestions, so don't read that into anything I've
> said :-) I think the jury is still out, but seeing TSO perform even
> slightly worse with the batching changes in place would be very
> worrisome. This applies to both throughput and cpu utilization.

Does turning off batching solve that problem? What I mean by that is: batching can be disabled if a TSO device is worse for some cases. In fact, one change in my latest code is to not enable batching in register_netdevice (in Rev4, which I am sending in a few minutes); rather, the user has to explicitly turn batching 'on'. Wondering if that is what you are concerned about.

In any case, I will test your case on Monday (I am on vacation for the next couple of days).

Thanks,

- KK
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: net-2.6.24 failure with netconsole
Andrew Morton [EMAIL PROTECTED] wrote:
>
> David, there's basically no reason ever why anyone should add BUG() or
> BUG_ON() to net code. Please consider rejecting any patches which add
> new ones.
>
> WARN_ON() is *much* better. It at least gives the user a chance of
> getting some diagnostic info out, of performing additional tests or
> even of using their kernel if they want to test something else.
>
> The only reason to choose BUG over WARN is if we're actually concerned
> about scrogging people's data, or serious things like that (ie:
> filesystems and mm).

Well, for networking, if we continue after a serious coding error it could result in a remote kernel compromise. So BUG_ON/BUG is not entirely useless.

I'm not claiming that it's necessarily the case here though :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Oops in 2.6.22.1: skb_copy_and_csum_datagram_iovec()
Chuck Ebbert [EMAIL PROTECTED] wrote:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253290
>
> 18:57:54 osama kernel: [c05be67f] kernel_recvmsg+0x31/0x40
> 18:57:54 osama kernel: [e0bc52d4] svc_udp_recvfrom+0x114/0x368 [sunrpc]

svc_udp_recvfrom is calling kernel_recvmsg with iov == NULL.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH 2.6.24 1/3] S2io: Added support for set_mac_address driver entry point
- Added set_mac_address driver entry point - Copying permanent mac address to dev-perm_addr Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED] Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED] --- diff -urpN orig/drivers/net/s2io.c patch1/drivers/net/s2io.c --- orig/drivers/net/s2io.c 2007-08-17 02:38:17.0 +0530 +++ patch1/drivers/net/s2io.c 2007-08-18 05:32:23.0 +0530 @@ -350,6 +350,15 @@ static char ethtool_driver_stats_keys[][ timer.data = (unsigned long) arg; \ mod_timer(timer, (jiffies + exp)) \ +#defineMAC_ADDR_SET(offset,mac_addr) \ + memset(sp-def_mac_addr[offset].mac_addr, 0, sizeof(ETH_ALEN));\ + sp-def_mac_addr[offset].mac_addr[5] = (u8) (mac_addr); \ + sp-def_mac_addr[offset].mac_addr[4] = (u8) (mac_addr 8);\ + sp-def_mac_addr[offset].mac_addr[3] = (u8) (mac_addr 16);\ + sp-def_mac_addr[offset].mac_addr[2] = (u8) (mac_addr 24);\ + sp-def_mac_addr[offset].mac_addr[1] = (u8) (mac_addr 32);\ + sp-def_mac_addr[offset].mac_addr[0] = (u8) (mac_addr 40);\ + /* Add the vlan */ static void s2io_vlan_rx_register(struct net_device *dev, struct vlan_group *grp) @@ -3639,7 +3648,7 @@ static void s2io_reset(struct s2io_nic * } /* restore the previously assigned mac address */ - s2io_set_mac_addr(sp-dev, (u8 *)sp-def_mac_addr[0].mac_addr); + set_mac_addr(sp-dev, (u8 *)sp-def_mac_addr[0].mac_addr); sp-device_enabled_once = FALSE; } @@ -4067,7 +4076,7 @@ static int s2io_open(struct net_device * goto hw_init_failed; } - if (s2io_set_mac_addr(dev, dev-dev_addr) == FAILURE) { + if (set_mac_addr(dev, dev-dev_addr) == FAILURE) { DBG_PRINT(ERR_DBG, Set Mac Address Failed\n); s2io_card_down(sp); err = -ENODEV; @@ -5025,6 +5034,7 @@ static void s2io_set_multicast(struct ne 0xfeffULL; u64 dis_addr = 0xULL, mac_addr = 0; void __iomem *add; + struct config_param *config = sp-config; if ((dev-flags IFF_ALLMULTI) (!sp-m_cast_flg)) { /* Enable all Multicast addresses */ @@ -5179,8 +5189,48 @@ static void s2io_set_multicast(struct ne } } +/* add MAC address to CAM */ +static int 
add_mac_addr(struct s2io_nic *sp, u64 addr, int off) +{ + u64 val64; + struct XENA_dev_config __iomem *bar0 = sp-bar0; + + writeq(RMAC_ADDR_DATA0_MEM_ADDR(addr), + bar0-rmac_addr_data0_mem); + + val64 = + RMAC_ADDR_CMD_MEM_WE | RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD | + RMAC_ADDR_CMD_MEM_OFFSET(off); + writeq(val64, bar0-rmac_addr_cmd_mem); + + /* Wait till command completes */ + if (wait_for_cmd_complete(bar0-rmac_addr_cmd_mem, + RMAC_ADDR_CMD_MEM_STROBE_CMD_EXECUTING, + S2IO_BIT_RESET)) { + DBG_PRINT(INFO_DBG, add_mac_addr failed\n); + return FAILURE; + } + return SUCCESS; +} + +/** + * s2io_set_mac_addr driver entry point + */ +static int s2io_set_mac_addr(struct net_device *dev, void* p) +{ + struct sockaddr *addr=p; + + if (!is_valid_ether_addr(addr-sa_data)) + return -EINVAL; + + memcpy(dev-dev_addr, addr-sa_data,dev-addr_len); + + /* store the MAC address in CAM */ + return (set_mac_addr(dev, dev-dev_addr)); +} + /** - * s2io_set_mac_addr - Programs the Xframe mac address + * set_mac_addr - Programs the Xframe mac address * @dev : pointer to the device structure. * @addr: a uchar pointer to the new mac address which is to be set. * Description : This procedure will program the Xframe to receive @@ -5188,56 +5238,31 @@ static void s2io_set_multicast(struct ne * Return value: SUCCESS on success and an appropriate (-)ve integer * as defined in errno.h file on failure. */ - -static int s2io_set_mac_addr(struct net_device *dev, u8 * addr) +static int set_mac_addr(struct net_device *dev, u8 * addr) { struct s2io_nic *sp = dev-priv; - struct XENA_dev_config __iomem *bar0 = sp-bar0; - register u64 val64, mac_addr = 0; + register u64 mac_addr = 0,perm_addr=0; int i; - u64 old_mac_addr = 0; /* -* Set the new MAC address as the new unicast filter and reflect this -* change on the device address registered with the OS. It will be -* at offset 0. -*/ + * Set the new MAC address as the new unicast filter and reflect this + * change on the device address registered with the OS. 
It will be + * at offset 0. + */ for (i = 0; i ETH_ALEN; i++) { mac_addr = 8; mac_addr |= addr[i]; - old_mac_addr = 8; - old_mac_addr |= sp-def_mac_addr[0].mac_addr[i]; +
[PATCH 2.6.24 3/3] S2io: Updating transceiver information in ethtool function
- Update transceiver information in ethtool function

Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED]
Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED]
---
diff -urpN patch2/drivers/net/s2io.c patch3/drivers/net/s2io.c
--- patch2/drivers/net/s2io.c	2007-08-18 07:19:21.000000000 +0530
+++ patch3/drivers/net/s2io.c	2007-08-18 07:20:27.000000000 +0530
@@ -84,7 +84,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"

-#define DRV_VERSION "2.0.26.4"
+#define DRV_VERSION "2.0.26.5"

 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -5459,7 +5459,9 @@ static int s2io_ethtool_gset(struct net_
 	info->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE);
 	info->advertising = (SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE);
 	info->port = PORT_FIBRE;
-	/* info->transceiver?? TODO */
+
+	/* info->transceiver */
+	info->transceiver = XCVR_EXTERNAL;

 	if (netif_carrier_ok(sp->dev)) {
 		info->speed = 10000;
[PATCH 0/10 Rev4] Implement skb batching and support in IPoIB
This set of patches implements the batching xmit capability (changed from an API to a capability), and adds support for batching in IPoIB. Also included is a sample patch for E1000 (ported - thanks to Jamal's E1000 changes from an earlier kernel). I will use this patch for testing E1000 TSO vs batching after the weekend.

List of changes from previous revision:
1. [Dave/Patrick] Remove new xmit API altogether (and add a capabilities flag in dev->features). Modify documentation to remove API, etc.
2. [Evgeniy] Remove bogus checks for 0, and use spin_lock_bh.
3. [Jamal] Ported Jamal's E1000 driver changes for using batching xmit.
5. [KK] Fix out-of-order sending of skbs bug resulting in re-transmissions by a fix in IPoIB [see XXX].
6. [KK] Do not force the device to use batching by default; instead let the user enable batching if required. This is useful in case users are not aware that batching is taking place.
4. [KK] IPoIB: Remove multiple xmit handlers and convert to use one.
7. [KK] IPoIB: Removed overkill - the poll handler can be called on one CPU, so there is no need to take a new lock against parallel WC's.

Extras that I can do later:
---------------------------
1. [Patrick] Use skb_blist statically in netdevice. This could also be used to integrate GSO and batching.
2. [Evgeniy] Useful to splice lists in dev_add_skb_to_blist (and this can be done for regular xmits of GSO skbs too, for #1 above).

Patches are described as:
Mail  0/10: This mail
Mail  1/10: HOWTO documentation
Mail  2/10: Introduce skb_blist, NETIF_F_BATCH_SKBS, use single API for batching/no-batching, etc.
Mail  3/10: Modify qdisc_run() to support batching
Mail  4/10: Add ethtool support to enable/disable batching
Mail  5/10: IPoIB: Header file changes to use batching
Mail  6/10: IPoIB: CM & Multicast changes
Mail  7/10: IPoIB: Verbs changes to use batching
Mail  8/10: IPoIB: Internal post and work completion handler
Mail  9/10: IPoIB: Implement the new batching capability
Mail 10/10: E1000: Implement the new batching capability

Issues:
I am getting a huge amount of retransmissions for both TCP and TCP No Delay cases for IPoIB (which explains the slight degradation for some test cases mentioned in the previous mail). After a full test run, there were 18500 retransmissions for every 1 in the regular code. But there is a 20.7% overall improvement in BW even with this huge amount of retransmissions (which implies batching could improve results even more if this problem is fixed). Results of experiments are:
  a. With batching set to a maximum of 2 skbs, I get almost the same number of retransmissions (which implies the receiver probably is not dropping skbs). ifconfig/netstat on the receiver gives no clue (drops/errors, etc).
  b. Making the IPoIB xmit create single work requests for each skb on the blist reduces retransmissions to the same level as the regular code.
  c. A similar retransmission increase is not seen for E1000.

Please review and provide feedback; and consider for inclusion.

Thanks,

- KK

[XXX] Dave had suggested to use batching only in the net_tx_action case. When I implemented that in earlier revisions, there were lots of TCP retransmissions (about 18,000 to every 1 in the regular code). I found the reason for part of that problem: skbs get queued up in dev->qdisc (when the tx lock was not acquired or the queue was blocked); when net_tx_action is called later, it passes the batch list as an argument to qdisc_run, and this results in skbs being moved to the batch list; then the batching xmit also fails due to tx lock failure; the next many regular xmits of a single skb will go through the fast path (passing a NULL batch list to qdisc_run) and send those skbs out to the device while the previous skbs are cooling their heels in the batch list.

The first fix was to not pass NULL/batch-list to qdisc_run(), but to always check whether skbs are present in the batch list when trying to xmit. This reduced retransmissions by a third (from 18,000 to around 12,000), but led to another problem while testing - iperf transmits almost zero data for a higher number of parallel flows, like 64 or more (and when I run iperf for a 2 min run, it takes about 5-6 mins, and reports that it ran 0 secs and that the amount of data transferred is a few MB's). I don't know why this happens with this being the only change (any ideas are very much appreciated).

The second fix that resolved this was to revert back to Dave's suggestion to use batching only in the net_tx_action case, and modify the driver to see if skbs are present in the batch list and send them out first, before sending the current skb. I still see huge retransmissions for IPoIB (but not for E1000), though the number has reduced to 12,000 from the earlier 18,000.
[PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching
Add Documentation describing batching skb xmit capability. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- batching_skb_xmit.txt | 78 ++ 1 files changed, 78 insertions(+) diff -ruNp org/Documentation/networking/batching_skb_xmit.txt new/Documentation/networking/batching_skb_xmit.txt --- org/Documentation/networking/batching_skb_xmit.txt 1970-01-01 05:30:00.0 +0530 +++ new/Documentation/networking/batching_skb_xmit.txt 2007-08-22 10:21:19.0 +0530 @@ -0,0 +1,78 @@ +HOWTO for batching skb xmit support +--- + +Section 1: What is batching skb xmit +Section 2: How batching xmit works vs the regular xmit +Section 3: How drivers can support batching +Section 4: How users can work with batching + + +Introduction: Kernel support for batching skb +-- + +A new capability to support xmit of multiple skbs is provided in the netdevice +layer. Drivers which enable this capability should be able to process multiple +skbs in a single call to their xmit handler. + + +Section 1: What is batching skb xmit +- + + This capability is optionally enabled by a driver by setting the + NETIF_F_BATCH_SKBS bit in dev-features. The pre-requisite for a + driver to use this capability is that it should have a reasonably + sized hardware queue that can process multiple skbs. + + +Section 2: How batching xmit works vs the regular xmit +--- + + The network stack gets called from upper layer protocols with a single + skb to transmit. This skb is first enqueue'd and an attempt is made to + transmit it immediately (via qdisc_run). However, events like tx lock + contention, tx queue stopped, etc, can result in the skb not getting + sent out and it remains in the queue. When the next xmit is called or + when the queue is re-enabled, qdisc_run could potentially find + multiple packets in the queue, and iteratively send them all out + one-by-one. + + Batching skb xmit is a mechanism to exploit this situation where all + skbs can be passed in one shot to the device. 
This reduces driver + processing, locking at the driver (or in stack for ~LLTX drivers) + gets amortized over multiple skbs, and in case of specific drivers + where every xmit results in a completion processing (like IPoIB) - + optimizations can be made in the driver to request a completion for + only the last skb that was sent which results in saving interrupts + for every (but the last) skb that was sent in the same batch. + + Batching can result in significant performance gains for systems that + have multiple data stream paths over the same network interface card. + + +Section 3: How drivers can support batching +- + + Batching requires the driver to set the NETIF_F_BATCH_SKBS bit in + dev-features. + + The driver's xmit handler should be modified to process multiple skbs + instead of one skb. The driver's xmit handler is called either with a + skb to transmit or NULL skb, where the latter case should be handled + as a call to xmit multiple skbs. This is done by sending out all skbs + in the dev-skb_blist list (where it was added by the core stack). + + +Section 4: How users can work with batching +- + + Batching can be disabled for a particular device, e.g. on desktop + systems if only one stream of network activity for that device is + taking place, since performance could be slightly affected due to + extra processing that batching adds (unless packets are getting + sent fast resulting in stopped queue's). Batching can be enabled if + more than one stream of network activity per device is being done, + e.g. on servers; or even desktop usage with multiple browser, chat, + file transfer sessions, etc. + + Per device batching can be enabled/disabled by passing 'on' or 'off' + respectively to ethtool. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/10 Rev4] [core] Add skb_blist support for batching
Introduce skb_blist, NETIF_F_BATCH_SKBS, use single API for batching/no-batching, etc. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- include/linux/netdevice.h |4 net/core/dev.c| 21 ++--- 2 files changed, 22 insertions(+), 3 deletions(-) diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h --- org/include/linux/netdevice.h 2007-08-20 14:26:36.0 +0530 +++ new/include/linux/netdevice.h 2007-08-22 08:42:10.0 +0530 @@ -399,6 +399,7 @@ struct net_device #define NETIF_F_VLAN_CHALLENGED1024/* Device cannot handle VLAN packets */ #define NETIF_F_GSO2048/* Enable software GSO. */ #define NETIF_F_LLTX 4096/* LockLess TX */ +#define NETIF_F_BATCH_SKBS 8192/* Driver supports multiple skbs/xmit */ #define NETIF_F_MULTI_QUEUE16384 /* Has multiple TX/RX queues */ #define NETIF_F_LRO32768 /* large receive offload */ @@ -510,6 +511,9 @@ struct net_device /* Partially transmitted GSO packet. */ struct sk_buff *gso_skb; + /* List of batch skbs (optional, used if driver supports skb batching */ + struct sk_buff_head *skb_blist; + /* ingress path synchronizer */ spinlock_t ingress_lock; struct Qdisc*qdisc_ingress; diff -ruNp org/net/core/dev.c new/net/core/dev.c --- org/net/core/dev.c 2007-08-20 14:26:37.0 +0530 +++ new/net/core/dev.c 2007-08-22 10:49:22.0 +0530 @@ -898,6 +898,16 @@ void netdev_state_change(struct net_devi } } +static void free_batching(struct net_device *dev) +{ + if (dev-skb_blist) { + if (!skb_queue_empty(dev-skb_blist)) + skb_queue_purge(dev-skb_blist); + kfree(dev-skb_blist); + dev-skb_blist = NULL; + } +} + /** * dev_load- load a network module * @name: name of interface @@ -1458,7 +1468,9 @@ static int dev_gso_segment(struct sk_buf int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) { - if (likely(!skb-next)) { + if (likely(skb)) { + if (unlikely(skb-next)) + goto gso; if (!list_empty(ptype_all)) dev_queue_xmit_nit(skb, dev); @@ -1468,10 +1480,10 @@ int dev_hard_start_xmit(struct sk_buff * if (skb-next) goto gso; } - - 
return dev-hard_start_xmit(skb, dev); } + return dev-hard_start_xmit(skb, dev); + gso: do { struct sk_buff *nskb = skb-next; @@ -3791,6 +3803,9 @@ void unregister_netdevice(struct net_dev synchronize_net(); + /* Deallocate batching structure */ + free_batching(dev); + /* Shutdown queueing discipline. */ dev_shutdown(dev); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/10 Rev4] [sched] Modify qdisc_run to support batching
Modify qdisc_run() to support batching. Modify callers of qdisc_run to use batching, modify qdisc_restart to implement batching. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- include/linux/netdevice.h |2 + include/net/pkt_sched.h |6 +-- net/core/dev.c| 44 +++- net/sched/sch_generic.c | 70 ++ 4 files changed, 105 insertions(+), 17 deletions(-) diff -ruNp org/include/net/pkt_sched.h new/include/net/pkt_sched.h --- org/include/net/pkt_sched.h 2007-08-20 14:26:36.0 +0530 +++ new/include/net/pkt_sched.h 2007-08-22 09:23:57.0 +0530 @@ -80,13 +80,13 @@ extern struct qdisc_rate_table *qdisc_ge struct rtattr *tab); extern void qdisc_put_rtab(struct qdisc_rate_table *tab); -extern void __qdisc_run(struct net_device *dev); +extern void __qdisc_run(struct net_device *dev, struct sk_buff_head *blist); -static inline void qdisc_run(struct net_device *dev) +static inline void qdisc_run(struct net_device *dev, struct sk_buff_head *blist) { if (!netif_queue_stopped(dev) !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, dev-state)) - __qdisc_run(dev); + __qdisc_run(dev, blist); } extern int tc_classify_compat(struct sk_buff *skb, struct tcf_proto *tp, diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h --- org/include/linux/netdevice.h 2007-08-20 14:26:36.0 +0530 +++ new/include/linux/netdevice.h 2007-08-22 08:42:10.0 +0530 @@ -892,6 +896,8 @@ extern int dev_set_mac_address(struct n struct sockaddr *); extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev); +extern int dev_add_skb_to_blist(struct sk_buff *skb, +struct net_device *dev); extern voiddev_init(void); diff -ruNp org/net/sched/sch_generic.c new/net/sched/sch_generic.c --- org/net/sched/sch_generic.c 2007-08-20 14:26:37.0 +0530 +++ new/net/sched/sch_generic.c 2007-08-22 08:49:55.0 +0530 @@ -59,10 +59,12 @@ static inline int qdisc_qlen(struct Qdis static inline int dev_requeue_skb(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q) { - if (unlikely(skb-next)) - dev-gso_skb = 
skb; - else - q-ops-requeue(skb, q); + if (likely(skb)) { + if (unlikely(skb-next)) + dev-gso_skb = skb; + else + q-ops-requeue(skb, q); + } netif_schedule(dev); return 0; @@ -91,10 +93,15 @@ static inline int handle_dev_cpu_collisi /* * Same CPU holding the lock. It may be a transient * configuration error, when hard_start_xmit() recurses. We -* detect it by checking xmit owner and drop the packet when -* deadloop is detected. Return OK to try the next skb. +* detect it by checking xmit owner and drop the packet (or +* all packets in batching case) when deadloop is detected. +* Return OK to try the next skb. */ - kfree_skb(skb); + if (likely(skb)) + kfree_skb(skb); + else if (!skb_queue_empty(dev-skb_blist)) + skb_queue_purge(dev-skb_blist); + if (net_ratelimit()) printk(KERN_WARNING Dead loop on netdevice %s, fix it urgently!\n, dev-name); @@ -112,6 +119,38 @@ static inline int handle_dev_cpu_collisi } /* + * Algorithm to get skb(s) is: + * - Non batching drivers, or if the batch list is empty and there is + * 1 skb in the queue - dequeue skb and put it in *skbp to tell the + * caller to use the single xmit API. + * - Batching drivers where the batch list already contains atleast one + * skb, or if there are multiple skbs in the queue: keep dequeue'ing + * skb's upto a limit and set *skbp to NULL to tell the caller to use + * the multiple xmit API. + * + * Returns: + * 1 - atleast one skb is to be sent out, *skbp contains skb or NULL + * (in case 1 skbs present in blist for batching) + * 0 - no skbs to be sent. + */ +static inline int get_skb(struct net_device *dev, struct Qdisc *q, + struct sk_buff_head *blist, struct sk_buff **skbp) +{ + if (likely(!blist || (!skb_queue_len(blist) qdisc_qlen(q) = 1))) { + return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL); + } else { + struct sk_buff *skb; + int max = dev-tx_queue_len - skb_queue_len(blist); + + while (max 0 (skb = dev_dequeue_skb(dev, q)) !=
[PATCH 4/10 Rev4] [ethtool] Add ethtool support
Add ethtool support to enable/disable batching. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- include/linux/ethtool.h |2 ++ include/linux/netdevice.h |2 ++ net/core/dev.c| 36 net/core/ethtool.c| 27 +++ 4 files changed, 67 insertions(+) diff -ruNp org/include/linux/ethtool.h new/include/linux/ethtool.h --- org/include/linux/ethtool.h 2007-08-20 14:26:35.0 +0530 +++ new/include/linux/ethtool.h 2007-08-22 08:37:35.0 +0530 @@ -440,6 +440,8 @@ struct ethtool_ops { #define ETHTOOL_SFLAGS 0x0026 /* Set flags bitmap(ethtool_value) */ #define ETHTOOL_GPFLAGS0x0027 /* Get driver-private flags bitmap */ #define ETHTOOL_SPFLAGS0x0028 /* Set driver-private flags bitmap */ +#define ETHTOOL_GBATCH 0x0029 /* Get Batching (ethtool_value) */ +#define ETHTOOL_SBATCH 0x0030 /* Set Batching (ethtool_value) */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h --- org/include/linux/netdevice.h 2007-08-20 14:26:36.0 +0530 +++ new/include/linux/netdevice.h 2007-08-22 08:42:10.0 +0530 @@ -1152,6 +1152,8 @@ extern void dev_set_promiscuity(struct extern voiddev_set_allmulti(struct net_device *dev, int inc); extern voidnetdev_state_change(struct net_device *dev); extern voidnetdev_features_change(struct net_device *dev); +extern int dev_change_tx_batch_skb(struct net_device *dev, + unsigned long new_batch_skb); /* Load a device via the kmod */ extern voiddev_load(const char *name); extern voiddev_mcast_init(void); diff -ruNp org/net/core/dev.c new/net/core/dev.c --- org/net/core/dev.c 2007-08-20 14:26:37.0 +0530 +++ new/net/core/dev.c 2007-08-22 10:49:22.0 +0530 @@ -908,6 +908,42 @@ static void free_batching(struct net_dev } } +int dev_change_tx_batch_skb(struct net_device *dev, unsigned long new_batch_skb) +{ + int ret = 0; + struct sk_buff_head *blist; + + if (!(dev-features NETIF_F_BATCH_SKBS)) { + /* Driver doesn't support batching skb API */ + ret = -ENOTSUPP; + goto out; + } + + /* +* Check 
if new value is same as the current (paranoia to use !! for +* new_batch_skb as that should always be boolean). +*/ + if (!!dev-skb_blist == !!new_batch_skb) + goto out; + + if (new_batch_skb + (blist = kmalloc(sizeof *blist, GFP_KERNEL)) == NULL) { + ret = -ENOMEM; + goto out; + } + + spin_lock_bh(dev-queue_lock); + if (new_batch_skb) { + skb_queue_head_init(blist); + dev-skb_blist = blist; + } else + free_batching(dev); + spin_unlock_bh(dev-queue_lock); + +out: + return ret; +} + /** * dev_load- load a network module * @name: name of interface diff -ruNp org/net/core/ethtool.c new/net/core/ethtool.c --- org/net/core/ethtool.c 2007-08-20 14:26:37.0 +0530 +++ new/net/core/ethtool.c 2007-08-22 08:36:07.0 +0530 @@ -556,6 +556,26 @@ static int ethtool_set_gso(struct net_de return 0; } +static int ethtool_get_batch(struct net_device *dev, char __user *useraddr) +{ + struct ethtool_value edata = { ETHTOOL_GBATCH }; + + edata.data = dev-skb_blist != NULL; + if (copy_to_user(useraddr, edata, sizeof(edata))) +return -EFAULT; + return 0; +} + +static int ethtool_set_batch(struct net_device *dev, char __user *useraddr) +{ + struct ethtool_value edata; + + if (copy_from_user(edata, useraddr, sizeof(edata))) + return -EFAULT; + + return dev_change_tx_batch_skb(dev, edata.data); +} + static int ethtool_self_test(struct net_device *dev, char __user *useraddr) { struct ethtool_test test; @@ -813,6 +833,7 @@ int dev_ethtool(struct ifreq *ifr) case ETHTOOL_GGSO: case ETHTOOL_GFLAGS: case ETHTOOL_GPFLAGS: + case ETHTOOL_GBATCH: break; default: if (!capable(CAP_NET_ADMIN)) @@ -956,6 +977,12 @@ int dev_ethtool(struct ifreq *ifr) rc = ethtool_set_value(dev, useraddr, dev-ethtool_ops-set_priv_flags); break; + case ETHTOOL_GBATCH: + rc = ethtool_get_batch(dev, useraddr); + break; + case ETHTOOL_SBATCH: + rc = ethtool_set_batch(dev, useraddr); + break; default: rc = -EOPNOTSUPP; } - To unsubscribe from this list: send the line unsubscribe netdev in the body of
[PATCH 5/10 Rev4] [IPoIB] Header file changes
IPoIB header file changes to use batching. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- ipoib.h |9 ++--- 1 files changed, 6 insertions(+), 3 deletions(-) diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib.h new/drivers/infiniband/ulp/ipoib/ipoib.h --- org/drivers/infiniband/ulp/ipoib/ipoib.h2007-08-20 14:26:26.0 +0530 +++ new/drivers/infiniband/ulp/ipoib/ipoib.h2007-08-22 08:33:51.0 +0530 @@ -271,8 +271,8 @@ struct ipoib_dev_priv { struct ipoib_tx_buf *tx_ring; unsigned tx_head; unsigned tx_tail; - struct ib_sgetx_sge; - struct ib_send_wrtx_wr; + struct ib_sge*tx_sge; + struct ib_send_wr*tx_wr; struct ib_wc ibwc[IPOIB_NUM_WC]; @@ -367,8 +367,11 @@ static inline void ipoib_put_ah(struct i int ipoib_open(struct net_device *dev); int ipoib_add_pkey_attr(struct net_device *dev); +int ipoib_process_skb(struct net_device *dev, struct sk_buff *skb, + struct ipoib_dev_priv *priv, struct ipoib_ah *address, + u32 qpn, int wr_num); void ipoib_send(struct net_device *dev, struct sk_buff *skb, - struct ipoib_ah *address, u32 qpn); + struct ipoib_ah *address, u32 qpn, int num_skbs); void ipoib_reap_ah(struct work_struct *work); void ipoib_flush_paths(struct net_device *dev); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/10 Rev4] [IPoIB] CM Multicast changes
IPoIB CM Multicast changes based on header file changes. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- ipoib_cm.c| 13 + ipoib_multicast.c |4 ++-- 2 files changed, 11 insertions(+), 6 deletions(-) diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_cm.c new/drivers/infiniband/ulp/ipoib/ipoib_cm.c --- org/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-08-20 14:26:26.0 +0530 +++ new/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-08-22 08:33:51.0 +0530 @@ -493,14 +493,19 @@ static inline int post_send(struct ipoib unsigned int wr_id, u64 addr, int len) { + int ret; struct ib_send_wr *bad_wr; - priv-tx_sge.addr = addr; - priv-tx_sge.length = len; + priv-tx_sge[0].addr = addr; + priv-tx_sge[0].length= len; + + priv-tx_wr[0].wr_id = wr_id; - priv-tx_wr.wr_id = wr_id; + priv-tx_wr[0].next = NULL; + ret = ib_post_send(tx-qp, priv-tx_wr, bad_wr); + priv-tx_wr[0].next = priv-tx_wr[1]; - return ib_post_send(tx-qp, priv-tx_wr, bad_wr); + return ret; } void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx) diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_multicast.c new/drivers/infiniband/ulp/ipoib/ipoib_multicast.c --- org/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-08-20 14:26:26.0 +0530 +++ new/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-08-22 08:33:51.0 +0530 @@ -217,7 +217,7 @@ static int ipoib_mcast_join_finish(struc if (!memcmp(mcast-mcmember.mgid.raw, priv-dev-broadcast + 4, sizeof (union ib_gid))) { priv-qkey = be32_to_cpu(priv-broadcast-mcmember.qkey); - priv-tx_wr.wr.ud.remote_qkey = priv-qkey; + priv-tx_wr[0].wr.ud.remote_qkey = priv-qkey; } if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, mcast-flags)) { @@ -736,7 +736,7 @@ out: } } - ipoib_send(dev, skb, mcast-ah, IB_MULTICAST_QPN); + ipoib_send(dev, skb, mcast-ah, IB_MULTICAST_QPN, 1); } unlock: - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/10 Rev4] [IPoIB] Verbs changes
IPoIB verb changes to use batching. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- ipoib_verbs.c | 23 ++- 1 files changed, 14 insertions(+), 9 deletions(-) diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_verbs.c new/drivers/infiniband/ulp/ipoib/ipoib_verbs.c --- org/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-08-20 14:26:26.0 +0530 +++ new/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-08-22 08:33:51.0 +0530 @@ -152,11 +152,11 @@ int ipoib_transport_dev_init(struct net_ .max_send_sge = 1, .max_recv_sge = 1 }, - .sq_sig_type = IB_SIGNAL_ALL_WR, + .sq_sig_type = IB_SIGNAL_REQ_WR,/* 11.2.4.1 */ .qp_type = IB_QPT_UD }; - - int ret, size; + struct ib_send_wr *next_wr = NULL; + int i, ret, size; priv-pd = ib_alloc_pd(priv-ca); if (IS_ERR(priv-pd)) { @@ -197,12 +197,17 @@ int ipoib_transport_dev_init(struct net_ priv-dev-dev_addr[2] = (priv-qp-qp_num 8) 0xff; priv-dev-dev_addr[3] = (priv-qp-qp_num ) 0xff; - priv-tx_sge.lkey = priv-mr-lkey; - - priv-tx_wr.opcode = IB_WR_SEND; - priv-tx_wr.sg_list = priv-tx_sge; - priv-tx_wr.num_sge = 1; - priv-tx_wr.send_flags = IB_SEND_SIGNALED; + for (i = ipoib_sendq_size - 1; i = 0; i--) { + priv-tx_sge[i].lkey= priv-mr-lkey; + priv-tx_wr[i].opcode = IB_WR_SEND; + priv-tx_wr[i].sg_list = priv-tx_sge[i]; + priv-tx_wr[i].num_sge = 1; + priv-tx_wr[i].send_flags = 0; + + /* Link the list properly for provider to use */ + priv-tx_wr[i].next = next_wr; + next_wr = priv-tx_wr[i]; + } return 0; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
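The backwards loop in ipoib_transport_dev_init() above links the tx_wr array so that each entry's next pointer chains to the following slot, with the last entry terminated by NULL — iterating from the top lets a single `next_wr` variable carry the link. A minimal userspace sketch of the same linking pattern (`fake_wr` and `link_wr_array` are illustrative stand-ins for `struct ib_send_wr`, not IPoIB code):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a work request with a "next" link. */
struct fake_wr {
    int id;
    struct fake_wr *next;
};

/* Link an array of WRs into a forward chain by iterating backwards,
 * the pattern the ipoib_transport_dev_init() loop uses: afterwards,
 * wr[i].next == &wr[i+1] and the final next pointer is NULL. */
static void link_wr_array(struct fake_wr *wr, int n)
{
    struct fake_wr *next_wr = NULL;
    int i;

    for (i = n - 1; i >= 0; i--) {
        wr[i].id = i;
        wr[i].next = next_wr;   /* points at the slot linked last time */
        next_wr = &wr[i];
    }
}
```

The pre-linked chain is what lets the later post_send() hand the provider a whole run of WRs with one doorbell, breaking the chain only at the last WR of each batch.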
[PATCH 8/10 Rev4] [IPoIB] Post and work completion handler changes
IPoIB internal post and work completion handler changes. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- ipoib_ib.c | 207 - 1 files changed, 163 insertions(+), 44 deletions(-) diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_ib.c new/drivers/infiniband/ulp/ipoib/ipoib_ib.c --- org/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-08-20 14:26:26.0 +0530 +++ new/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-08-22 08:33:51.0 +0530 @@ -242,6 +242,8 @@ repost: static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); + int i, num_completions; + unsigned int tx_ring_index; unsigned int wr_id = wc-wr_id; struct ipoib_tx_buf *tx_req; unsigned long flags; @@ -255,18 +257,46 @@ static void ipoib_ib_handle_tx_wc(struct return; } - tx_req = priv-tx_ring[wr_id]; + /* Get first WC to process (no one can update tx_tail at this time) */ + tx_ring_index = priv-tx_tail (ipoib_sendq_size - 1); - ib_dma_unmap_single(priv-ca, tx_req-mapping, - tx_req-skb-len, DMA_TO_DEVICE); + /* Find number of WC's */ + num_completions = wr_id - tx_ring_index + 1; + if (unlikely(num_completions = 0)) + num_completions += ipoib_sendq_size; - ++priv-stats.tx_packets; - priv-stats.tx_bytes += tx_req-skb-len; + /* +* Handle WC's from earlier (possibly multiple) post_sends in this +* iteration as we move from tx_tail to wr_id, since if the last WR +* (which is the one which requested completion notification) failed +* to be sent for any of those earlier request(s), no completion +* notification is generated for successful WR's of those earlier +* request(s). 
+*/ + tx_req = priv-tx_ring[tx_ring_index]; + for (i = 0; i num_completions; i++) { + if (likely(tx_req-skb)) { + ib_dma_unmap_single(priv-ca, tx_req-mapping, + tx_req-skb-len, DMA_TO_DEVICE); + + ++priv-stats.tx_packets; + priv-stats.tx_bytes += tx_req-skb-len; - dev_kfree_skb_any(tx_req-skb); + dev_kfree_skb_any(tx_req-skb); + } + /* +* else this skb failed synchronously when posted and was +* freed immediately. +*/ + + if (likely(++tx_ring_index != ipoib_sendq_size)) + tx_req++; + else + tx_req = priv-tx_ring[0]; + } spin_lock_irqsave(priv-tx_lock, flags); - ++priv-tx_tail; + priv-tx_tail += num_completions; if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, priv-flags)) priv-tx_head - priv-tx_tail = ipoib_sendq_size 1) { clear_bit(IPOIB_FLAG_NETIF_STOPPED, priv-flags); @@ -335,29 +365,57 @@ void ipoib_ib_completion(struct ib_cq *c netif_rx_schedule(dev, priv-napi); } -static inline int post_send(struct ipoib_dev_priv *priv, - unsigned int wr_id, - struct ib_ah *address, u32 qpn, - u64 addr, int len) +/* + * post_send : Post WR(s) to the device. + * + * num_skbs is the number of WR's, first_wr is the first slot in tx_wr[] (or + * tx_sge[]). first_wr is normally zero unless a previous post_send returned + * error and we are trying to post the untried WR's, in which case first_wr + * is the index to the first untried WR. + * + * Break the WR link before posting so that provider knows how many WR's to + * process, and this is set back after the post. 
+ */ +static inline int post_send(struct ipoib_dev_priv *priv, u32 qpn, + int first_wr, int num_skbs, + struct ib_send_wr **bad_wr) { - struct ib_send_wr *bad_wr; + int ret; + struct ib_send_wr *last_wr, *next_wr; + + last_wr = priv-tx_wr[first_wr + num_skbs - 1]; - priv-tx_sge.addr = addr; - priv-tx_sge.length = len; + /* Set Completion Notification for last WR */ + last_wr-send_flags = IB_SEND_SIGNALED; - priv-tx_wr.wr_id = wr_id; - priv-tx_wr.wr.ud.remote_qpn = qpn; - priv-tx_wr.wr.ud.ah = address; + /* Terminate the last WR */ + next_wr = last_wr-next; + last_wr-next = NULL; - return ib_post_send(priv-qp, priv-tx_wr, bad_wr); + /* Send all the WR's in one doorbell */ + ret = ib_post_send(priv-qp, priv-tx_wr[first_wr], bad_wr); + + /* Restore send_flags WR chain */ + last_wr-send_flags = 0; + last_wr-next = next_wr; + + return ret; } -void ipoib_send(struct net_device *dev, struct
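The tx completion handler above derives how many WRs a single work completion covers from the distance between tx_tail's ring slot and the signalled wr_id, wrapping around the power-of-two send queue. A standalone sketch of that index arithmetic (`SENDQ_SIZE` and `num_completions()` are illustrative names, not the driver's):

```c
#include <assert.h>

/* Ring size must be a power of two so "index & (size - 1)" wraps. */
#define SENDQ_SIZE 8

/* Compute how many completions one WC covers, mirroring the
 * ipoib_ib_handle_tx_wc() logic: the WC carries the ring slot of the
 * last signalled WR, and everything from tx_tail up to that slot
 * (inclusive) has completed. */
static int num_completions(unsigned int tx_tail, unsigned int wr_id)
{
    unsigned int tail_index = tx_tail & (SENDQ_SIZE - 1);
    int n = (int)wr_id - (int)tail_index + 1;

    if (n <= 0)              /* wr_id wrapped past the end of the ring */
        n += SENDQ_SIZE;
    return n;
}
```

For example, with the tail at slot 6 and a completion signalled for slot 1, the span covers slots 6, 7, 0, 1 — four completions.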
[PATCH 9/10 Rev4] [IPoIB] Implement batching
IPoIB: implement the new batching API. Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- ipoib_main.c | 251 --- 1 files changed, 171 insertions(+), 80 deletions(-) diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_main.c new/drivers/infiniband/ulp/ipoib/ipoib_main.c --- org/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-20 14:26:26.0 +0530 +++ new/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-22 08:33:51.0 +0530 @@ -560,7 +560,8 @@ static void neigh_add_path(struct sk_buf goto err_drop; } } else - ipoib_send(dev, skb, path-ah, IPOIB_QPN(skb-dst-neighbour-ha)); + ipoib_send(dev, skb, path-ah, + IPOIB_QPN(skb-dst-neighbour-ha), 1); } else { neigh-ah = NULL; @@ -640,7 +641,7 @@ static void unicast_arp_send(struct sk_b ipoib_dbg(priv, Send unicast ARP to %04x\n, be16_to_cpu(path-pathrec.dlid)); - ipoib_send(dev, skb, path-ah, IPOIB_QPN(phdr-hwaddr)); + ipoib_send(dev, skb, path-ah, IPOIB_QPN(phdr-hwaddr), 1); } else if ((path-query || !path_rec_start(dev, path)) skb_queue_len(path-queue) IPOIB_MAX_PATH_REC_QUEUE) { /* put pseudoheader back on for next time */ @@ -654,105 +655,166 @@ static void unicast_arp_send(struct sk_b spin_unlock(priv-lock); } +#defineXMIT_PROCESSED_SKBS() \ + do {\ + if (wr_num) { \ + ipoib_send(dev, NULL, old_neigh-ah, old_qpn, \ + wr_num); \ + wr_num = 0; \ + } \ + } while (0) + static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_neigh *neigh; + struct sk_buff_head *blist; + int max_skbs, wr_num = 0; + u32 qpn, old_qpn = 0; + struct ipoib_neigh *neigh, *old_neigh = NULL; unsigned long flags; if (unlikely(!spin_trylock_irqsave(priv-tx_lock, flags))) return NETDEV_TX_LOCKED; - /* -* Check if our queue is stopped. Since we have the LLTX bit -* set, we can't rely on netif_stop_queue() preventing our -* xmit function from being called with a full queue. 
-*/ - if (unlikely(netif_queue_stopped(dev))) { - spin_unlock_irqrestore(priv-tx_lock, flags); - return NETDEV_TX_BUSY; - } - - if (likely(skb-dst skb-dst-neighbour)) { - if (unlikely(!*to_ipoib_neigh(skb-dst-neighbour))) { - ipoib_path_lookup(skb, dev); - goto out; - } + blist = dev-skb_blist; - neigh = *to_ipoib_neigh(skb-dst-neighbour); + if (!skb || (blist skb_queue_len(blist))) { + /* +* Either batching xmit call, or single skb case but there are +* skbs already in the batch list from previous failure to +* xmit - send the earlier skbs first to avoid out of order. +*/ + + if (skb) + __skb_queue_tail(blist, skb); + + /* +* Figure out how many skbs can be sent. This prevents the +* device getting full and avoids checking for stopped queue +* after each iteration. Now the queue can get stopped atmost +* after xmit of the last skb. +*/ + max_skbs = ipoib_sendq_size - (priv-tx_head - priv-tx_tail); + skb = __skb_dequeue(blist); + } else { + blist = NULL; + max_skbs = 1; + } - if (ipoib_cm_get(neigh)) { - if (ipoib_cm_up(neigh)) { - ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); - goto out; - } - } else if (neigh-ah) { - if (unlikely(memcmp(neigh-dgid.raw, - skb-dst-neighbour-ha + 4, - sizeof(union ib_gid { - spin_lock(priv-lock); - /* -* It's safe to call ipoib_put_ah() inside -* priv-lock here, because we know that -* path-ah will always hold one more
[PATCH 10/10 Rev4] [E1000] Implement batching
E1000: Implement batching capability (ported thanks to changes taken from Jamal). Not all changes are made in this as in IPoIB, eg, handling out of order skbs (see XXX in the first mail). Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- e1000_main.c | 150 +++ 1 files changed, 121 insertions(+), 29 deletions(-) diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c --- org/drivers/net/e1000/e1000_main.c 2007-08-20 14:26:29.0 +0530 +++ new/drivers/net/e1000/e1000_main.c 2007-08-22 08:33:51.0 +0530 @@ -157,6 +157,7 @@ static void e1000_update_phy_info(unsign static void e1000_watchdog(unsigned long data); static void e1000_82547_tx_fifo_stall(unsigned long data); static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev); +static int e1000_xmit_frames(struct net_device *dev); static struct net_device_stats * e1000_get_stats(struct net_device *netdev); static int e1000_change_mtu(struct net_device *netdev, int new_mtu); static int e1000_set_mac(struct net_device *netdev, void *p); @@ -990,7 +991,7 @@ e1000_probe(struct pci_dev *pdev, if (pci_using_dac) netdev-features |= NETIF_F_HIGHDMA; - netdev-features |= NETIF_F_LLTX; + netdev-features |= NETIF_F_LLTX | NETIF_F_BATCH_SKBS; adapter-en_mng_pt = e1000_enable_mng_pass_thru(adapter-hw); @@ -3098,6 +3099,18 @@ e1000_tx_map(struct e1000_adapter *adapt return count; } +static void e1000_kick_DMA(struct e1000_adapter *adapter, + struct e1000_tx_ring *tx_ring, int i) +{ + wmb(); + + writel(i, adapter-hw.hw_addr + tx_ring-tdt); + /* we need this if more than one processor can write to our tail +* at a time, it syncronizes IO on IA64/Altix systems */ + mmiowb(); +} + + static void e1000_tx_queue(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring, int tx_flags, int count) @@ -3144,13 +3157,7 @@ e1000_tx_queue(struct e1000_adapter *ada * know there are new descriptors to fetch. (Only * applicable for weak-ordered memory model archs, * such as IA-64). 
*/ - wmb(); - tx_ring-next_to_use = i; - writel(i, adapter-hw.hw_addr + tx_ring-tdt); - /* we need this if more than one processor can write to our tail -* at a time, it syncronizes IO on IA64/Altix systems */ - mmiowb(); } /** @@ -3257,21 +3264,31 @@ static int e1000_maybe_stop_tx(struct ne } #define TXD_USE_COUNT(S, X) (((S) (X)) + 1 ) + +struct e1000_tx_cbdata { + int count; + unsigned int max_per_txd; + unsigned int nr_frags; + unsigned int mss; +}; + +#define E1000_SKB_CB(__skb)((struct e1000_tx_cbdata *)((__skb)-cb[0])) +#define NETDEV_TX_DROPPED -5 + static int -e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev) +e1000_prep_queue_frame(struct sk_buff *skb, struct net_device *netdev) { struct e1000_adapter *adapter = netdev_priv(netdev); struct e1000_tx_ring *tx_ring; - unsigned int first, max_per_txd = E1000_MAX_DATA_PER_TXD; + unsigned int max_per_txd = E1000_MAX_DATA_PER_TXD; unsigned int max_txd_pwr = E1000_MAX_TXD_PWR; - unsigned int tx_flags = 0; unsigned int len = skb-len; - unsigned long flags; - unsigned int nr_frags = 0; - unsigned int mss = 0; + unsigned int nr_frags; + unsigned int mss; int count = 0; - int tso; unsigned int f; + struct e1000_tx_cbdata *cb = E1000_SKB_CB(skb); + len -= skb-data_len; /* This goes back to the question of how to logically map a tx queue @@ -3282,7 +3299,7 @@ e1000_xmit_frame(struct sk_buff *skb, st if (unlikely(skb-len = 0)) { dev_kfree_skb_any(skb); - return NETDEV_TX_OK; + return NETDEV_TX_DROPPED; } /* 82571 and newer doesn't need the workaround that limited descriptor @@ -3328,7 +3345,7 @@ e1000_xmit_frame(struct sk_buff *skb, st DPRINTK(DRV, ERR, __pskb_pull_tail failed.\n); dev_kfree_skb_any(skb); - return NETDEV_TX_OK; + return NETDEV_TX_DROPPED; } len = skb-len - skb-data_len; break; @@ -3372,22 +3389,32 @@ e1000_xmit_frame(struct sk_buff *skb, st (adapter-hw.mac_type == e1000_82573)) e1000_transfer_dhcp_info(adapter, skb); - if (!spin_trylock_irqsave(tx_ring-tx_lock, flags)) - /* Collision - 
tell upper layer to requeue */ - return NETDEV_TX_LOCKED; + cb->count = count; + cb->max_per_txd =
Oops in e100_up
With the davem-2.6.24 tree I get the following Oops in the e100 driver (cribbed from console):
Code: 6c ff ff ff 8b 48 0c ba 01 00 00 00 89 f0 e8 1b f2 ff ff c7 86 9c 00 00 00 01 00 00 00 e9 4e ff ff ff 89 d0 e8 b3 f8 0b 00 eb 8e 0f 0b eb fe 55 89 e5 56 53 83 ec 0c 8b 98 dc 01 00 00 e8 ff b9
EIP: e100_up+0x11d/0x121 SS:ESP 0068:f759ce38
Stack: syscall_call -> sys_ioctl -> vfs_ioctl -> do_ioctl -> sock_ioctl -> inet_ioctl -> devinet_ioctl -> dev_change_flags -> dev_open -> e100_open -> oops
The system log then goes on reporting "eth0: link up, 100Mbps, full-duplex" and hangs while trying to restore the serial console state (not sure that this is related).
Re: net-2.6.24 failure with netconsole
From: Andrew Morton [EMAIL PROTECTED] Date: Tue, 21 Aug 2007 22:54:38 -0700
> Has anyone tested all this new napi stuff with netconsole? It's pretty disastrous. It immediately goes BUG in napi_enable().
Thomas Graf has found and fixed a bug in the netconsole napi bits a few hours ago; maybe it fixes this problem?
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: Krishna Kumar2 [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 12:33:04 +0530
> Does turning off batching solve that problem? What I mean by that is: batching can be disabled if a TSO device is worse for some cases.
This new batching stuff isn't going to be enabled or disabled on a per-device basis just to get parity with how things are now. It should be enabled by default, and give at least as good performance as what can be obtained right now. Otherwise it's a clear regression.
Re: Oops in e100_up
From: Gerrit Renker [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 09:56:48 +0100
> With the davem-2.6.24 tree I get the following Oops in the e100 driver (cribbed from console):
Probably the NAPI conversion. I'll try to get to diagnosing this one soon, but I've been wrapped up in some other tasks, so if someone could beat me to it that'd be great :-)
Re: net-2.6.24 failure with netconsole
* Andrew Morton [EMAIL PROTECTED] 2007-08-21 22:54
Which used to be a BUG. It later oopsed via a null-pointer deref in net_rx_action(), which is a much preferable result. I fixed this already:
Index: net-2.6.24/include/linux/netpoll.h
===
--- net-2.6.24.orig/include/linux/netpoll.h 2007-08-22 01:02:14.0 +0200
+++ net-2.6.24/include/linux/netpoll.h 2007-08-22 01:02:30.0 +0200
@@ -75,7 +75,7 @@ static inline void *netpoll_poll_lock(st
 struct net_device *dev = napi->dev;
 rcu_read_lock(); /* deal with race on ->npinfo */
- if (dev->npinfo) {
+ if (dev && dev->npinfo) {
 spin_lock(&napi->poll_lock);
 napi->poll_owner = smp_processor_id();
 return napi;
[PATCH net-2.6.24] [NET] Cleanup: DIV_ROUND_UP
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c | 6 +-
 net/key/af_key.c | 17 +
 2 files changed, 6 insertions(+), 17 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 10b2e39..bca4ee2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -646,11 +646,7 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb, unsigned
 skb_shinfo(skb)->gso_size = 0;
 skb_shinfo(skb)->gso_type = 0;
 } else {
- unsigned int factor;
-
- factor = skb->len + (mss_now - 1);
- factor /= mss_now;
- skb_shinfo(skb)->gso_segs = factor;
+ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss_now);
 skb_shinfo(skb)->gso_size = mss_now;
 skb_shinfo(skb)->gso_type = sk->sk_gso_type;
 }
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 5502df1..17b2a69 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -352,16 +352,14 @@ static int verify_address_len(void *p)
 switch (addr->sa_family) {
 case AF_INET:
- len = sizeof(*sp) + sizeof(*sin) + (sizeof(uint64_t) - 1);
- len /= sizeof(uint64_t);
+ len = DIV_ROUND_UP(sizeof(*sp) + sizeof(*sin), sizeof(uint64_t));
 if (sp->sadb_address_len != len ||
 sp->sadb_address_prefixlen > 32)
 return -EINVAL;
 break;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 case AF_INET6:
- len = sizeof(*sp) + sizeof(*sin6) + (sizeof(uint64_t) - 1);
- len /= sizeof(uint64_t);
+ len = DIV_ROUND_UP(sizeof(*sp) + sizeof(*sin6), sizeof(uint64_t));
 if (sp->sadb_address_len != len ||
 sp->sadb_address_prefixlen > 128)
 return -EINVAL;
@@ -386,14 +384,9 @@ static int verify_address_len(void *p)
 static inline int pfkey_sec_ctx_len(struct sadb_x_sec_ctx *sec_ctx)
 {
- int len = 0;
-
- len += sizeof(struct sadb_x_sec_ctx);
- len += sec_ctx->sadb_x_ctx_len;
- len += sizeof(uint64_t) - 1;
- len /= sizeof(uint64_t);
-
- return len;
+ return DIV_ROUND_UP(sizeof(struct sadb_x_sec_ctx) +
+ sec_ctx->sadb_x_ctx_len,
+ sizeof(uint64_t));
 }
 static inline int verify_sec_ctx_len(void *p)
--
1.5.0.6
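For reference, DIV_ROUND_UP() in include/linux/kernel.h is plain round-up integer division, equivalent to the open-coded add-then-divide sequences this patch removes. A small sketch (`gso_segs()` here is an illustrative wrapper, not the kernel function):

```c
#include <assert.h>

/* DIV_ROUND_UP as defined in include/linux/kernel.h: integer
 * division rounded up, which is exactly what the replaced
 * "(x + (d - 1)) / d" computations were doing by hand. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* E.g. the tcp_set_skb_tso_segs() change: the number of MSS-sized
 * segments needed to cover a payload of `len` bytes. */
static unsigned int gso_segs(unsigned int len, unsigned int mss)
{
    return DIV_ROUND_UP(len, mss);
}
```

A 1461-byte payload with a 1460-byte MSS needs two segments, while 1460 bytes exactly needs one — the rounding boundary the open-coded versions had to get right in three places.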
[PATCH -mm] drivers/net/e1000e/netdev.c warning fix
Hi,
This patch fixes the following compilation warning:
drivers/net/e1000e/netdev.c: In function ‘e1000_setup_rctl’:
drivers/net/e1000e/netdev.c:1963: warning: unused variable ‘pages’
Regards,
Michal
-- LOG http://www.stardust.webpages.pl/log/
--- linux-mm-clean/drivers/net/e1000e/netdev.c 2007-08-22 12:20:31.0 +0200
+++ linux-work/drivers/net/e1000e/netdev.c 2007-08-22 14:44:58.0 +0200
@@ -1960,7 +1960,10 @@ static void e1000_setup_rctl(struct e100
 struct e1000_hw *hw = adapter->hw;
 u32 rctl, rfctl;
 u32 psrctl = 0;
+
+#ifndef CONFIG_E1000_DISABLE_PACKET_SPLIT
 u32 pages = 0;
+#endif
 /* Program MC offset vector base */
 rctl = er32(RCTL);
Re: [PATCH 1/1] net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()
Oops, don't use the previous version of the patch: the change in dev_mc_unsync() was not correct. Sorry. This one is a lot better (it compiles and runs). :)
Benjamin
-- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com
From: [EMAIL PROTECTED]
Subject: net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()
This patch fixes a crash that may occur when the routine dev_mc_sync() deletes an address from the list it is currently going through. It saves the pointer to the next element before deleting the current one. The problem may also exist in dev_mc_unsync().
Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
---
 net/core/dev_mcast.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)
Index: linux-2.6.23-rc2/net/core/dev_mcast.c
===
--- linux-2.6.23-rc2.orig/net/core/dev_mcast.c
+++ linux-2.6.23-rc2/net/core/dev_mcast.c
@@ -116,11 +116,13 @@ int dev_mc_add(struct net_device *dev, v
 */
 int dev_mc_sync(struct net_device *to, struct net_device *from)
 {
- struct dev_addr_list *da;
+ struct dev_addr_list *da, *next;
 int err = 0;
 netif_tx_lock_bh(to);
- for (da = from->mc_list; da != NULL; da = da->next) {
+ da = from->mc_list;
+ while (da != NULL) {
+ next = da->next;
 if (!da->da_synced) {
 err = __dev_addr_add(&to->mc_list, &to->mc_count,
 da->da_addr, da->da_addrlen, 0);
@@ -134,6 +136,7 @@ int dev_mc_sync(struct net_device *to, s
 __dev_addr_delete(&from->mc_list, &from->mc_count,
 da->da_addr, da->da_addrlen, 0);
 }
+ da = next;
 }
 if (!err)
 __dev_set_rx_mode(to);
@@ -156,12 +159,14 @@ EXPORT_SYMBOL(dev_mc_sync);
 */
 void dev_mc_unsync(struct net_device *to, struct net_device *from)
 {
- struct dev_addr_list *da;
+ struct dev_addr_list *da, *next;
 netif_tx_lock_bh(from);
 netif_tx_lock_bh(to);
- for (da = from->mc_list; da != NULL; da = da->next) {
+ da = from->mc_list;
+ while (da != NULL) {
+ next = da->next;
 if (!da->da_synced)
 continue;
 __dev_addr_delete(&to->mc_list, &to->mc_count,
@@ -169,6 +174,7 @@ void dev_mc_unsync(struct net_device *to
 da->da_synced = 0;
 __dev_addr_delete(&from->mc_list, &from->mc_count,
 da->da_addr, da->da_addrlen, 0);
+ da = next;
 }
 __dev_set_rx_mode(to);
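The crash pattern fixed here — deleting the node you are standing on, then dereferencing it to advance — is avoided by saving the next pointer before the current element can go away. A generic userspace sketch of the same save-next-then-delete idiom (`addr_list`, `drop_unsynced()` and `demo_count()` are illustrative, not the dev_mcast code):

```c
#include <assert.h>
#include <stdlib.h>

struct addr_list {
    int synced;
    struct addr_list *next;
};

/* Walk a singly linked list and unlink/free un-synced nodes, saving
 * the next pointer before the current node may be freed.
 * Returns the (possibly new) list head. */
static struct addr_list *drop_unsynced(struct addr_list *head)
{
    struct addr_list *da = head, *next, *prev = NULL;

    while (da != NULL) {
        next = da->next;        /* save before potentially freeing */
        if (!da->synced) {
            if (prev)
                prev->next = next;
            else
                head = next;
            free(da);
        } else {
            prev = da;
        }
        da = next;              /* safe even if da was just freed */
    }
    return head;
}

/* Build a three-node list with synced flags {1, 0, 1}, drop the
 * un-synced node, and return how many nodes remain. */
static int demo_count(void)
{
    struct addr_list *n, *head = NULL;
    int flags[] = { 1, 0, 1 }, i, count = 0;

    for (i = 2; i >= 0; i--) {  /* push front, so list order is 1,0,1 */
        n = malloc(sizeof(*n));
        n->synced = flags[i];
        n->next = head;
        head = n;
    }
    head = drop_unsynced(head);
    for (n = head; n; n = n->next)
        count++;
    return count;               /* survivors leak; fine for a demo */
}
```

Had the loop advanced via `da = da->next` after the delete, it would read freed memory — the same use-after-free the original `for` loop in dev_mc_sync() could hit.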
[PATCH 2.6.23-rc3-mm1] request_irq fix DEBUG_SHIRQ handling Re: 2.6.23-rc2-mm1: rtl8139 inconsistent lock state
On 10-08-2007 01:49, Mariusz Kozlowski wrote: Hello, = [ INFO: inconsistent lock state ] 2.6.23-rc2-mm1 #7 - inconsistent {in-hardirq-W} - {hardirq-on-W} usage. ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes: (tp-lock){+...}, at: [de8706e0] rtl8139_interrupt+0x27/0x46b [8139too] {in-hardirq-W} state was registered at: [c0138eeb] __lock_acquire+0x949/0x11ac [c01397e7] lock_acquire+0x99/0xb2 [c0452ff3] _spin_lock+0x35/0x42 [de8706e0] rtl8139_interrupt+0x27/0x46b [8139too] [c0147a5d] handle_IRQ_event+0x28/0x59 [c01493ca] handle_level_irq+0xad/0x10b [c0105a13] do_IRQ+0x93/0xd0 [c010441e] common_interrupt+0x2e/0x34 ... other info that might help us debug this: 1 lock held by ifconfig/5492: #0: (rtnl_mutex){--..}, at: [c0451778] mutex_lock+0x1c/0x1f stack backtrace: ... [c0452ff3] _spin_lock+0x35/0x42 [de8706e0] rtl8139_interrupt+0x27/0x46b [8139too] [c01480fd] free_irq+0x11b/0x146 [de871d59] rtl8139_close+0x8a/0x14a [8139too] [c03bde63] dev_close+0x57/0x74 ... It looks like this was possible after David's fix, which really enabled running of the handler in free_irq, but before Andrew's patch disabling local irqs for this time. So, this bug should be fixed, but IMHO similar problem is possible in request_irq. And, I think, this is not only about lockdep complaining, but real lockup possibility, because any locks in such a handler are taken in another, not expected for them context, and could be vulnerable (especially with softirqs, but probably hardirqs as well). 
Reported-by: Mariusz Kozlowski [EMAIL PROTECTED]
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
---
diff -Nurp 2.6.23-rc3-mm1-/kernel/irq/manage.c 2.6.23-rc3-mm1/kernel/irq/manage.c
--- 2.6.23-rc3-mm1-/kernel/irq/manage.c 2007-08-22 13:58:58.0 +0200
+++ 2.6.23-rc3-mm1/kernel/irq/manage.c 2007-08-22 14:12:21.0 +0200
@@ -546,14 +546,11 @@ int request_irq(unsigned int irq, irq_ha
 * We do this before actually registering it, to make sure that
 * a 'real' IRQ doesn't run in parallel with our fake
 */
- if (irqflags & IRQF_DISABLED) {
- unsigned long flags;
+ unsigned long flags;
- local_irq_save(flags);
- handler(irq, dev_id);
- local_irq_restore(flags);
- } else
- handler(irq, dev_id);
+ local_irq_save(flags);
+ handler(irq, dev_id);
+ local_irq_restore(flags);
 }
 #endif
[PATCH] AH4: Update IPv4 options handling to conform to RFC 4302.
I was asked to resend my message here, so here it is. Please CC me on replies. --- In testing our ESP/AH offload hardware, I discovered an issue with how AH handles mutable fields in IPv4. RFC 4302 (AH) states the following on the subject: For IPv4, the entire option is viewed as a unit; so even though the type and length fields within most options are immutable in transit, if an option is classified as mutable, the entire option is zeroed for ICV computation purposes. The current implementation does not zero the type and length fields, resulting in authentication failures when communicating with hosts that do (i.e. FreeBSD). I have tested record route and timestamp options (ping -R and ping -T) on a small network involving Windows XP, FreeBSD 6.2, and Linux hosts, with one router. In the presence of these options, the FreeBSD and Linux hosts (with the patch or with the hardware) can communicate. The Windows XP host simply fails to accept these packets with or without the patch. I have also been trying to test source routing options (using traceroute -g), but haven't had much luck getting this option to work *without* AH, let alone with. Signed-off-by: Nick Bowler [EMAIL PROTECTED] --- net/ipv4/ah4.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index 7a23e59..39f6211 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -46,7 +46,7 @@ static int ip_clear_mutable_options(struct iphdr *iph, __be32 *daddr) memcpy(daddr, optptr+optlen-4, 4); /* Fall through */ default: - memset(optptr+2, 0, optlen-2); + memset(optptr, 0, optlen); } l -= optlen; optptr += optlen; -- 1.5.2.2 -- Nick Bowler, Elliptic Semiconductor (http://www.ellipticsemi.com/) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
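The one-line change moves the start of the memset back by two bytes so the option's type and length octets are zeroed along with its data, as RFC 4302 requires for mutable options ("the entire option is zeroed for ICV computation purposes"). A sketch of the corrected behavior (`zero_mutable_option()` is an illustrative helper operating on a raw IPv4 options buffer; option type 68, timestamp, is used as the mutable example):

```c
#include <assert.h>
#include <string.h>

/* Zero one mutable IPv4 option in place for AH ICV computation.
 * optptr[0] is the option type, optptr[1] the total option length.
 * The fix zeroes from optptr (whole option), where the old code
 * zeroed from optptr + 2 (data only), leaving type/length intact. */
static void zero_mutable_option(unsigned char *optptr)
{
    int optlen = optptr[1];   /* read length before it is zeroed */

    memset(optptr, 0, optlen);
}
```

With the old `memset(optptr + 2, 0, optlen - 2)` the type and length octets would survive into the ICV input, which is why authentication failed against stacks (like FreeBSD's) that zero the whole option.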
Re: [PATCH 10/10 Rev4] [E1000] Implement batching
Krishna Kumar wrote: E1000: Implement batching capability (ported thanks to changes taken from Jamal). Not all changes are made in this as in IPoIB, eg, handling out of order skbs (see XXX in the first mail). Signed-off-by: Krishna Kumar [EMAIL PROTECTED] --- e1000_main.c | 150 +++ 1 files changed, 121 insertions(+), 29 deletions(-) Krishna, while I appreciate the patch I would have preferred a patch to e1000e. Not only does the e1000e driver remove a lot of the workarounds for old silicon, it is also a good way for us to move the current e1000 driver into a bit more stable maintenance mode. Do you think you can write this patch for e1000e instead? code-wise a lot of things are still the same, so your patch should be relatively easy to generate. e1000e currently lives in a branch from jeff garzik's netdev-2.6 tree Thanks, Auke diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c --- org/drivers/net/e1000/e1000_main.c 2007-08-20 14:26:29.0 +0530 +++ new/drivers/net/e1000/e1000_main.c 2007-08-22 08:33:51.0 +0530 @@ -157,6 +157,7 @@ static void e1000_update_phy_info(unsign static void e1000_watchdog(unsigned long data); static void e1000_82547_tx_fifo_stall(unsigned long data); static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev); +static int e1000_xmit_frames(struct net_device *dev); static struct net_device_stats * e1000_get_stats(struct net_device *netdev); static int e1000_change_mtu(struct net_device *netdev, int new_mtu); static int e1000_set_mac(struct net_device *netdev, void *p); @@ -990,7 +991,7 @@ e1000_probe(struct pci_dev *pdev, if (pci_using_dac) netdev-features |= NETIF_F_HIGHDMA; - netdev-features |= NETIF_F_LLTX; + netdev-features |= NETIF_F_LLTX | NETIF_F_BATCH_SKBS; adapter-en_mng_pt = e1000_enable_mng_pass_thru(adapter-hw); @@ -3098,6 +3099,18 @@ e1000_tx_map(struct e1000_adapter *adapt return count; } +static void e1000_kick_DMA(struct e1000_adapter *adapter, + struct e1000_tx_ring *tx_ring, int 
i) +{ + wmb(); + + writel(i, adapter-hw.hw_addr + tx_ring-tdt); + /* we need this if more than one processor can write to our tail +* at a time, it syncronizes IO on IA64/Altix systems */ + mmiowb(); +} + + static void e1000_tx_queue(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring, int tx_flags, int count) @@ -3144,13 +3157,7 @@ e1000_tx_queue(struct e1000_adapter *ada * know there are new descriptors to fetch. (Only * applicable for weak-ordered memory model archs, * such as IA-64). */ - wmb(); - tx_ring-next_to_use = i; - writel(i, adapter-hw.hw_addr + tx_ring-tdt); - /* we need this if more than one processor can write to our tail -* at a time, it syncronizes IO on IA64/Altix systems */ - mmiowb(); } /** @@ -3257,21 +3264,31 @@ static int e1000_maybe_stop_tx(struct ne } #define TXD_USE_COUNT(S, X) (((S) (X)) + 1 ) + +struct e1000_tx_cbdata { + int count; + unsigned int max_per_txd; + unsigned int nr_frags; + unsigned int mss; +}; + +#define E1000_SKB_CB(__skb)((struct e1000_tx_cbdata *)((__skb)-cb[0])) +#define NETDEV_TX_DROPPED -5 + static int -e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev) +e1000_prep_queue_frame(struct sk_buff *skb, struct net_device *netdev) { struct e1000_adapter *adapter = netdev_priv(netdev); struct e1000_tx_ring *tx_ring; - unsigned int first, max_per_txd = E1000_MAX_DATA_PER_TXD; + unsigned int max_per_txd = E1000_MAX_DATA_PER_TXD; unsigned int max_txd_pwr = E1000_MAX_TXD_PWR; - unsigned int tx_flags = 0; unsigned int len = skb-len; - unsigned long flags; - unsigned int nr_frags = 0; - unsigned int mss = 0; + unsigned int nr_frags; + unsigned int mss; int count = 0; - int tso; unsigned int f; + struct e1000_tx_cbdata *cb = E1000_SKB_CB(skb); + len -= skb-data_len; /* This goes back to the question of how to logically map a tx queue @@ -3282,7 +3299,7 @@ e1000_xmit_frame(struct sk_buff *skb, st if (unlikely(skb-len = 0)) { dev_kfree_skb_any(skb); - return NETDEV_TX_OK; + return NETDEV_TX_DROPPED; } /* 82571 
and newer doesn't need the workaround that limited descriptor @@ -3328,7 +3345,7 @@ e1000_xmit_frame(struct sk_buff *skb, st DPRINTK(DRV, ERR, __pskb_pull_tail failed.\n); dev_kfree_skb_any(skb); - return NETDEV_TX_OK; + return NETDEV_TX_DROPPED;
Re: [PATCH -mm] drivers/net/e1000e/netdev.c warning fix
Michal Piotrowski wrote: Hi, This patch fixes the following compilation warning: drivers/net/e1000e/netdev.c: In function ‘e1000_setup_rctl’: drivers/net/e1000e/netdev.c:1963: warning: unused variable ‘pages’ Regards, Michal It also exposes a symbol issue. I think I want to remove the #ifdef CONFIG_E1000_DISABLE_PACKET_SPLIT from this driver altogether... Auke - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] ehea: fix interface to DLPAR tools
Userspace DLPAR tool expects decimal numbers to be written to and read from sysfs entries. Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/ehea/ehea_main.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index 9756211..22d000f 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -2490,7 +2490,7 @@ static ssize_t ehea_show_port_id(struct device *dev, struct device_attribute *attr, char *buf) { struct ehea_port *port = container_of(dev, struct ehea_port, ofdev.dev); - return sprintf(buf, 0x%X, port-logical_port_id); + return sprintf(buf, %d, port-logical_port_id); } static DEVICE_ATTR(log_port_id, S_IRUSR | S_IRGRP | S_IROTH, ehea_show_port_id, @@ -2781,7 +2781,7 @@ static ssize_t ehea_probe_port(struct device *dev, u32 logical_port_id; - sscanf(buf, %X, logical_port_id); + sscanf(buf, %d, logical_port_id); port = ehea_get_port(adapter, logical_port_id); @@ -2834,7 +2834,7 @@ static ssize_t ehea_remove_port(struct device *dev, int i; u32 logical_port_id; - sscanf(buf, %X, logical_port_id); + sscanf(buf, %d, logical_port_id); port = ehea_get_port(adapter, logical_port_id); -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
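The breakage this patch fixes is easy to reproduce outside the kernel: a show routine that prints hex while the userspace tool parses decimal fails the round trip. A minimal userspace sketch (plain C, not the ehea code; the function names are made up for illustration):

```c
#include <stdio.h>

/* Sketch of the bug: the DLPAR tool reads and writes decimal, so a
 * sysfs show routine printing "0x%X" breaks the round trip. */
static int show_port_id_hex(char *buf, unsigned int id)
{
    return sprintf(buf, "0x%X", id);
}

static int show_port_id_dec(char *buf, unsigned int id)
{
    return sprintf(buf, "%d", id);
}

/* Returns 1 if a decimal parse of the shown buffer recovers the id,
 * mimicking what the userspace tool does with the sysfs contents. */
static int round_trips(int (*show)(char *, unsigned int), unsigned int id)
{
    char buf[32];
    unsigned int parsed = 0;

    show(buf, id);
    return sscanf(buf, "%u", &parsed) == 1 && parsed == id;
}
```

With the hex formatter, a decimal `sscanf` stops at the `x` and returns 0, which is why the tool saw bogus port ids.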
[PATCH 2/4] ehea: fix module parameter description
Update the module parameter description of use_mcs to show the correct default value. Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/ehea/ehea_main.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index 22d000f..db57474 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -76,7 +76,7 @@ MODULE_PARM_DESC(rq1_entries, "Number of entries for Receive Queue 1 MODULE_PARM_DESC(sq_entries, "Number of entries for the Send Queue [2^x - 1], x = [6..14]. Default = " __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")"); -MODULE_PARM_DESC(use_mcs, "0:NAPI, 1:Multiple receive queues, Default = 1 "); +MODULE_PARM_DESC(use_mcs, "0:NAPI, 1:Multiple receive queues, Default = 0 "); static int port_name_cnt = 0; static LIST_HEAD(adapter_list); -- 1.5.2
[PATCH 3/4] ehea: fix queue destructor
Includes hcp_epas_dtor in eq/cq/qp destructors to unmap HW register. Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/ehea/ehea_qmr.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/drivers/net/ehea/ehea_qmr.c b/drivers/net/ehea/ehea_qmr.c index a36fa6c..c82e245 100644 --- a/drivers/net/ehea/ehea_qmr.c +++ b/drivers/net/ehea/ehea_qmr.c @@ -235,6 +235,8 @@ int ehea_destroy_cq(struct ehea_cq *cq) if (!cq) return 0; + hcp_epas_dtor(cq-epas); + if ((hret = ehea_destroy_cq_res(cq, NORMAL_FREE)) == H_R_STATE) { ehea_error_data(cq-adapter, cq-fw_handle); hret = ehea_destroy_cq_res(cq, FORCE_FREE); @@ -361,6 +363,8 @@ int ehea_destroy_eq(struct ehea_eq *eq) if (!eq) return 0; + hcp_epas_dtor(eq-epas); + if ((hret = ehea_destroy_eq_res(eq, NORMAL_FREE)) == H_R_STATE) { ehea_error_data(eq-adapter, eq-fw_handle); hret = ehea_destroy_eq_res(eq, FORCE_FREE); @@ -541,6 +545,8 @@ int ehea_destroy_qp(struct ehea_qp *qp) if (!qp) return 0; + hcp_epas_dtor(qp-epas); + if ((hret = ehea_destroy_qp_res(qp, NORMAL_FREE)) == H_R_STATE) { ehea_error_data(qp-adapter, qp-fw_handle); hret = ehea_destroy_qp_res(qp, FORCE_FREE); -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
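The shape of this fix is the classic rule that a destructor must undo everything the constructor did: the epas register mapping was created at queue setup but never unmapped on destroy. A toy model of the ordering (hypothetical names, not the ehea API):

```c
/* Toy queue with the two resources the ehea destructors juggle:
 * a register mapping (the epas) and the firmware-side object. */
struct toy_queue {
    int epas_mapped;   /* 1 while the HW registers are mapped */
    int fw_freed;      /* 1 once the firmware resource is gone */
};

static void toy_epas_dtor(struct toy_queue *q)
{
    q->epas_mapped = 0;
}

static void toy_destroy_res(struct toy_queue *q)
{
    q->fw_freed = 1;   /* NORMAL_FREE/FORCE_FREE retry elided */
}

/* Mirrors the patched ehea_destroy_cq() order: unmap the registers
 * first, then free the firmware resource. NULL is a no-op. */
static void toy_destroy(struct toy_queue *q)
{
    if (!q)
        return;
    toy_epas_dtor(q);
    toy_destroy_res(q);
}
```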
[PATCH 4/4] ehea: show physical port state
Introduces a module parameter to decide whether the physical port link state is propagated to the network stack or not. It makes sense not to take the physical port state into account on machines with more logical partitions that communicate with each other. This is always possible no matter what the physical port state is. Thus eHEA can be considered as a switch there. Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/ehea/ehea.h |5 - drivers/net/ehea/ehea_main.c | 14 +- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h index d67f97b..8d58be5 100644 --- a/drivers/net/ehea/ehea.h +++ b/drivers/net/ehea/ehea.h @@ -39,7 +39,7 @@ #include asm/io.h #define DRV_NAME ehea -#define DRV_VERSIONEHEA_0073 +#define DRV_VERSIONEHEA_0074 /* eHEA capability flags */ #define DLPAR_PORT_ADD_REM 1 @@ -402,6 +402,8 @@ struct ehea_mc_list { #define EHEA_PORT_UP 1 #define EHEA_PORT_DOWN 0 +#define EHEA_PHY_LINK_UP 1 +#define EHEA_PHY_LINK_DOWN 0 #define EHEA_MAX_PORT_RES 16 struct ehea_port { struct ehea_adapter *adapter;/* adapter that owns this port */ @@ -427,6 +429,7 @@ struct ehea_port { u32 msg_enable; u32 sig_comp_iv; u32 state; + u8 phy_link; u8 full_duplex; u8 autoneg; u8 num_def_qps; diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index db57474..1804c99 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -53,17 +53,21 @@ static int rq3_entries = EHEA_DEF_ENTRIES_RQ3; static int sq_entries = EHEA_DEF_ENTRIES_SQ; static int use_mcs = 0; static int num_tx_qps = EHEA_NUM_TX_QP; +static int show_phys_link = 0; module_param(msg_level, int, 0); module_param(rq1_entries, int, 0); module_param(rq2_entries, int, 0); module_param(rq3_entries, int, 0); module_param(sq_entries, int, 0); +module_param(show_phys_link, int, 0); module_param(use_mcs, int, 0); module_param(num_tx_qps, int, 0); MODULE_PARM_DESC(num_tx_qps, Number of TX-QPS); 
MODULE_PARM_DESC(msg_level, msg_level); +MODULE_PARM_DESC(show_phys_link, Show link state of external port +1:yes, 0: no. Default = 0 ); MODULE_PARM_DESC(rq3_entries, Number of entries for Receive Queue 3 [2^x - 1], x = [6..14]. Default = __MODULE_STRING(EHEA_DEF_ENTRIES_RQ3) )); @@ -814,7 +818,9 @@ int ehea_set_portspeed(struct ehea_port *port, u32 port_speed) ehea_error(Failed setting port speed); } } - netif_carrier_on(port-netdev); + if (!show_phys_link || (port-phy_link == EHEA_PHY_LINK_UP)) + netif_carrier_on(port-netdev); + kfree(cb4); out: return ret; @@ -869,13 +875,19 @@ static void ehea_parse_eqe(struct ehea_adapter *adapter, u64 eqe) } if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PORT_UP, eqe)) { + port-phy_link = EHEA_PHY_LINK_UP; if (netif_msg_link(port)) ehea_info(%s: Physical port up, port-netdev-name); + if (show_phys_link) + netif_carrier_on(port-netdev); } else { + port-phy_link = EHEA_PHY_LINK_DOWN; if (netif_msg_link(port)) ehea_info(%s: Physical port down, port-netdev-name); + if (show_phys_link) + netif_carrier_off(port-netdev); } if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PRIMARY, eqe)) -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
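The carrier decision this patch introduces reduces to a two-input predicate, worth spelling out because the default (show_phys_link = 0) ignores the physical port state entirely. A sketch (the helper below is mine; the driver open-codes the checks):

```c
/* Carrier is reported up unless the admin opted into tracking the
 * external port (show_phys_link) and that port is down -- matching
 * the EHEA_PHY_LINK_UP/DOWN handling added in ehea_parse_eqe(). */
static int carrier_should_be_on(int show_phys_link, int phy_link_up)
{
    return !show_phys_link || phy_link_up;
}
```

This is why logical partitions can keep talking to each other through eHEA's internal switch even while the external cable is unplugged.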
[NET] sgiseeq: Fix return type of sgiseeq_remove
The driver remove method needs to return an int, not void. This was never noticed because this driver is usually not built as a module. Signed-off-by: Ralf Baechle [EMAIL PROTECTED] diff --git a/drivers/net/sgiseeq.c b/drivers/net/sgiseeq.c index 384b468..0fb74cb 100644 --- a/drivers/net/sgiseeq.c +++ b/drivers/net/sgiseeq.c @@ -726,7 +726,7 @@ err_out: return err; } -static void __exit sgiseeq_remove(struct platform_device *pdev) +static int __exit sgiseeq_remove(struct platform_device *pdev) { struct net_device *dev = platform_get_drvdata(pdev); struct sgiseeq_private *sp = netdev_priv(dev); @@ -735,6 +735,8 @@ static void __exit sgiseeq_remove(struct platform_device *pdev) free_page((unsigned long) sp->srings); free_netdev(dev); platform_set_drvdata(pdev, NULL); + + return 0; } static struct platform_driver sgiseeq_driver = {
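Why the return type matters: the platform bus stores `remove` as a pointer to a function returning int, so a void-returning method is a type mismatch that only surfaces when the pointer is actually taken. A reduced illustration (generic C, not the kernel's `struct platform_driver`):

```c
/* The bus-side contract: remove reports success (0) or a negative
 * errno, so the core can act on teardown failures. */
typedef int (*remove_fn)(void *pdev);

struct toy_driver {
    remove_fn remove;
};

/* Correct shape after the fix: do the teardown, then return 0.
 * A void-returning function would not match remove_fn at all. */
static int toy_remove(void *pdev)
{
    (void)pdev;               /* actual teardown elided */
    return 0;
}
```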
[PATCH 1/3] e1000e: retire last_tx_tso workaround
This TSO-related workaround is no longer needed since it's only applicable for 8254x silicon. Signed-off-by: Auke Kok [EMAIL PROTECTED] --- drivers/net/e1000e/e1000.h | 15 +++ drivers/net/e1000e/netdev.c | 20 ++-- 2 files changed, 5 insertions(+), 30 deletions(-) diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h index e3cd877..bbe5faf 100644 --- a/drivers/net/e1000e/e1000.h +++ b/drivers/net/e1000e/e1000.h @@ -142,18 +142,9 @@ struct e1000_ring { /* array of buffer information structs */ struct e1000_buffer *buffer_info; - union { - /* for TX */ - struct { - bool last_tx_tso; /* used to mark tso desc. */ - }; - /* for RX */ - struct { - /* arrays of page information for packet split */ - struct e1000_ps_page *ps_pages; - struct sk_buff *rx_skb_top; - }; - }; + /* arrays of page information for packet split */ + struct e1000_ps_page *ps_pages; + struct sk_buff *rx_skb_top; struct e1000_queue_stats stats; }; diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c index 8ebe238..4916f7c 100644 --- a/drivers/net/e1000e/netdev.c +++ b/drivers/net/e1000e/netdev.c @@ -1483,7 +1483,6 @@ static void e1000_clean_tx_ring(struct e1000_adapter *adapter) tx_ring-next_to_use = 0; tx_ring-next_to_clean = 0; - tx_ring-last_tx_tso = 0; writel(0, adapter-hw.hw_addr + tx_ring-head); writel(0, adapter-hw.hw_addr + tx_ring-tail); @@ -3216,15 +3215,6 @@ static int e1000_tx_map(struct e1000_adapter *adapter, while (len) { buffer_info = tx_ring-buffer_info[i]; size = min(len, max_per_txd); - /* Workaround for Controller erratum -- -* descriptor for non-tso packet in a linear SKB that follows a -* tso gets written back prematurely before the data is fully -* DMA'd to the controller */ - if (tx_ring-last_tx_tso !skb_is_gso(skb)) { - tx_ring-last_tx_tso = 0; - if (!skb-data_len) - size -= 4; - } /* Workaround for premature desc write-backs * in TSO mode. 
Append 4-byte sentinel desc */ @@ -3497,10 +3487,6 @@ static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev) count++; count++; - /* Controller Erratum workaround */ - if (!skb-data_len tx_ring-last_tx_tso !skb_is_gso(skb)) - count++; - count += TXD_USE_COUNT(len, max_txd_pwr); nr_frags = skb_shinfo(skb)-nr_frags; @@ -3536,12 +3522,10 @@ static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev) return NETDEV_TX_OK; } - if (tso) { - tx_ring-last_tx_tso = 1; + if (tso) tx_flags |= E1000_TX_FLAGS_TSO; - } else if (e1000_tx_csum(adapter, skb)) { + else if (e1000_tx_csum(adapter, skb)) tx_flags |= E1000_TX_FLAGS_CSUM; - } /* Old method was to assume IPv4 packet by default if TSO was enabled. * 82571 hardware supports TSO capabilities for IPv6 as well... - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
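The `count += TXD_USE_COUNT(len, max_txd_pwr)` bookkeeping that survives this patch is easy to sanity-check in isolation. Reconstructed from the driver (the garbled macro above lost its shift operator; note it always rounds up by one, deliberately over-reserving descriptors for exact multiples):

```c
/* e1000/e1000e descriptor budgeting: a buffer of S bytes needs
 * ((S >> X) + 1) descriptors when each descriptor carries at most
 * 2^X bytes (X = E1000_MAX_TXD_PWR = 12, i.e. 4 KB per descriptor). */
#define E1000_MAX_TXD_PWR 12
#define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1)
```

With the last_tx_tso erratum retired, this count no longer needs the extra `count++` slot the old workaround reserved.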
[PATCH 2/3] e1000e: Add read code and printout of PBA number (board identifier)
The PBA number allows customers and support to directly identify the type of board and characteristics such as different skews. Slightly enhance loading messages by adding module name to printout. Signed-off-by: Auke Kok [EMAIL PROTECTED] --- drivers/net/e1000e/defines.h |6 -- drivers/net/e1000e/e1000.h |2 ++ drivers/net/e1000e/lib.c | 21 + drivers/net/e1000e/netdev.c | 12 +--- 4 files changed, 36 insertions(+), 5 deletions(-) diff --git a/drivers/net/e1000e/defines.h b/drivers/net/e1000e/defines.h index ca80fde..b32ed45 100644 --- a/drivers/net/e1000e/defines.h +++ b/drivers/net/e1000e/defines.h @@ -573,9 +573,11 @@ /* For checksumming, the sum of all words in the NVM should equal 0xBABA. */ #define NVM_SUM0xBABA -#define NVM_WORD_SIZE_BASE_SHIFT 6 +/* PBA (printed board assembly) number words */ +#define NVM_PBA_OFFSET_0 8 +#define NVM_PBA_OFFSET_1 9 -/* NVM Commands - Microwire */ +#define NVM_WORD_SIZE_BASE_SHIFT 6 /* NVM Commands - SPI */ #define NVM_MAX_RETRY_SPI 5000 /* Max wait of 5ms, for RDY signal */ diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h index bbe5faf..c57e35a 100644 --- a/drivers/net/e1000e/e1000.h +++ b/drivers/net/e1000e/e1000.h @@ -358,6 +358,8 @@ extern struct e1000_info e1000_ich8_info; extern struct e1000_info e1000_ich9_info; extern struct e1000_info e1000_es2_info; +extern s32 e1000e_read_part_num(struct e1000_hw *hw, u32 *part_num); + extern s32 e1000e_commit_phy(struct e1000_hw *hw); extern bool e1000e_enable_mng_pass_thru(struct e1000_hw *hw); diff --git a/drivers/net/e1000e/lib.c b/drivers/net/e1000e/lib.c index 6645c21..3bbfe60 100644 --- a/drivers/net/e1000e/lib.c +++ b/drivers/net/e1000e/lib.c @@ -2464,3 +2464,24 @@ bool e1000e_enable_mng_pass_thru(struct e1000_hw *hw) return ret_val; } +s32 e1000e_read_part_num(struct e1000_hw *hw, u32 *part_num) +{ + s32 ret_val; + u16 nvm_data; + + ret_val = e1000_read_nvm(hw, NVM_PBA_OFFSET_0, 1, nvm_data); + if (ret_val) { + hw_dbg(hw, NVM Read Error\n); + return 
ret_val; + } + *part_num = (u32)(nvm_data 16); + + ret_val = e1000_read_nvm(hw, NVM_PBA_OFFSET_1, 1, nvm_data); + if (ret_val) { + hw_dbg(hw, NVM Read Error\n); + return ret_val; + } + *part_num |= nvm_data; + + return 0; +} diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c index 4916f7c..420e111 100644 --- a/drivers/net/e1000e/netdev.c +++ b/drivers/net/e1000e/netdev.c @@ -3966,6 +3966,7 @@ static void e1000_print_device_info(struct e1000_adapter *adapter) { struct e1000_hw *hw = adapter-hw; struct net_device *netdev = adapter-netdev; + u32 part_num; /* print bus type/speed/width info */ ndev_info(netdev, (PCI Express:2.5GB/s:%s) @@ -3980,6 +3981,10 @@ static void e1000_print_device_info(struct e1000_adapter *adapter) ndev_info(netdev, Intel(R) PRO/%s Network Connection\n, (hw-phy.type == e1000_phy_ife) ? 10/100 : 1000); + e1000e_read_part_num(hw, part_num); + ndev_info(netdev, MAC: %d, PHY: %d, PBA No: %06x-%03x\n, + hw-mac.type, hw-phy.type, + (part_num 8), (part_num 0xff)); } /** @@ -4414,9 +4419,10 @@ static struct pci_driver e1000_driver = { static int __init e1000_init_module(void) { int ret; - printk(KERN_INFO Intel(R) PRO/1000 Network Driver - %s\n, - e1000e_driver_version); - printk(KERN_INFO Copyright (c) 1999-2007 Intel Corporation.\n); + printk(KERN_INFO %s: Intel(R) PRO/1000 Network Driver - %s\n, + e1000e_driver_name, e1000e_driver_version); + printk(KERN_INFO %s: Copyright (c) 1999-2007 Intel Corporation.\n, + e1000e_driver_name); ret = pci_register_driver(e1000_driver); return ret; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] e1000e: Remove conditional packet split disable flag
This flag conflicts with e1000's Kconfig symbol and we'll leave the feature enabled by default for now. Signed-off-by: Auke Kok [EMAIL PROTECTED] --- drivers/net/e1000e/netdev.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c index 420e111..372da46 100644 --- a/drivers/net/e1000e/netdev.c +++ b/drivers/net/e1000e/netdev.c @@ -2009,7 +2009,6 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter) break; } -#ifndef CONFIG_E1000_DISABLE_PACKET_SPLIT /* * 82571 and greater support packet-split where the protocol * header is placed in skb-data and the packet data is @@ -2029,7 +2028,7 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter) pages = PAGE_USE_COUNT(adapter-netdev-mtu); if ((pages = 3) (PAGE_SIZE = 16384) (rctl E1000_RCTL_LPE)) adapter-rx_ps_pages = pages; -#endif + if (adapter-rx_ps_pages) { /* Configure extra packet-split registers */ rfctl = er32(RFCTL); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
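For context on the `pages` computation that stays behind: packet split is only enabled when the MTU fits in at most three pages and the architecture's page size is at least 16 KB. The helper, as defined in the e1000 family (reproduced from memory, so treat the exact macro as an assumption; shown with 4 KB pages):

```c
/* Page budgeting for packet split: whole pages covered by the size,
 * plus one page for any remainder. */
#define PAGE_SHIFT_ 12
#define PAGE_SIZE_ (1 << PAGE_SHIFT_)
#define PAGE_USE_COUNT(S) (((S) >> PAGE_SHIFT_) + \
                           (((S) & (PAGE_SIZE_ - 1)) ? 1 : 0))
```

On a 4 KB-page system a jumbo MTU of 9000 needs three pages, but the `PAGE_SIZE >= 16384` guard still disables the feature there; it is aimed at large-page architectures.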
[PATCH] [03/10] pasemi_mac: Enable L2 caching of packet headers
Enable settings to target l2 for the first few cachelines of the packet, since we'll access them to get to the various headers. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -216,7 +216,7 @@ static int pasemi_mac_setup_rx_resources PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE 2)); write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id), - PAS_DMA_RXCHAN_CFG_HBU(1)); + PAS_DMA_RXCHAN_CFG_HBU(2)); write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac-dma_if), PAS_DMA_RXINT_BASEL_BRBL(__pa(ring-buffers))); @@ -225,6 +225,9 @@ static int pasemi_mac_setup_rx_resources PAS_DMA_RXINT_BASEU_BRBH(__pa(ring-buffers) 32) | PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE 3)); + write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac-dma_if), + PAS_DMA_RXINT_CFG_DHL(2)); + ring-next_to_fill = 0; ring-next_to_clean = 0; Index: mainline/drivers/net/pasemi_mac.h === --- mainline.orig/drivers/net/pasemi_mac.h +++ mainline/drivers/net/pasemi_mac.h @@ -218,6 +218,14 @@ enum { #definePAS_DMA_RXINT_RCMDSTA_ACT 0x0001 #definePAS_DMA_RXINT_RCMDSTA_DROPS_M 0xfffe #definePAS_DMA_RXINT_RCMDSTA_DROPS_S 17 +#define PAS_DMA_RXINT_CFG(i) (0x204+(i)*_PAS_DMA_RXINT_STRIDE) +#definePAS_DMA_RXINT_CFG_DHL_M 0x0700 +#definePAS_DMA_RXINT_CFG_DHL_S 24 +#definePAS_DMA_RXINT_CFG_DHL(x)(((x) PAS_DMA_RXINT_CFG_DHL_S) \ +PAS_DMA_RXINT_CFG_DHL_M) +#definePAS_DMA_RXINT_CFG_WIF 0x0002 +#definePAS_DMA_RXINT_CFG_WIL 0x0001 + #define PAS_DMA_RXINT_INCR(i) (0x210+(i)*_PAS_DMA_RXINT_STRIDE) #definePAS_DMA_RXINT_INCR_INCR_M 0x #definePAS_DMA_RXINT_INCR_INCR_S 0 -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
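The new DHL field follows the driver's usual encode pattern: shift the value into position, then mask to the field width (three bits at bit 24, so out-of-range values are silently truncated). Reconstructed with the mask written out to full width, since the extraction above collapsed it:

```c
/* PAS_DMA_RXINT_CFG_DHL: 3-bit "DMA header length" field at bits
 * 26:24, counting cachelines of the packet head to target at L2. */
#define PAS_DMA_RXINT_CFG_DHL_M 0x07000000u
#define PAS_DMA_RXINT_CFG_DHL_S 24
#define PAS_DMA_RXINT_CFG_DHL(x) (((unsigned int)(x) << PAS_DMA_RXINT_CFG_DHL_S) & \
                                  PAS_DMA_RXINT_CFG_DHL_M)
```

The patch programs `DHL(2)`, i.e. the first two cachelines of each received packet, enough to cover the Ethernet/IP/TCP headers.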
[PATCH] [00/10] pasemi_mac patches for 2.6.24
Hi,

pasemi_mac patches for 2.6.24:

01/10: pasemi_mac: Abstract out register access
02/10: pasemi_mac: Stop using the pci config space accessors for register read/writes
03/10: pasemi_mac: Enable L2 caching of packet headers
04/10: pasemi_mac: Fix memcpy amount for short receives
05/10: pasemi_mac: RX performance tweaks
06/10: pasemi_mac: Batch up TX buffer frees
07/10: pasemi_mac: Enable LLTX
08/10: pasemi_mac: Fix TX ring wrap checking
09/10: pasemi_mac: Fix RX checksum flags
10/10: pasemi_mac: Clean TX ring in poll

Thanks,

Olof
[PATCH] [09/10] pasemi_mac: Fix RX checksum flags
RX side flag to use is CHECKSUM_UNNECESSARY, not CHECKSUM_COMPLETE. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -534,7 +534,7 @@ static int pasemi_mac_clean_rx(struct pa skb_put(skb, len); if (likely((macrx & XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK)) { - skb->ip_summed = CHECKSUM_COMPLETE; + skb->ip_summed = CHECKSUM_UNNECESSARY; skb->csum = (macrx & XCT_MACRX_CSUM_M) >> XCT_MACRX_CSUM_S; } else --
[PATCH] [01/10] pasemi_mac: Abstract out register access
Abstract out the PCI config read/write accesses into reg read/write ones, still calling the pci accessors on the back end. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -81,6 +81,48 @@ MODULE_PARM_DESC(debug, PA Semi MAC bit static struct pasdma_status *dma_status; +static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg) +{ + unsigned int val; + + pci_read_config_dword(mac-iob_pdev, reg, val); + return val; +} + +static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg, + unsigned int val) +{ + pci_write_config_dword(mac-iob_pdev, reg, val); +} + +static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg) +{ + unsigned int val; + + pci_read_config_dword(mac-pdev, reg, val); + return val; +} + +static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg, + unsigned int val) +{ + pci_write_config_dword(mac-pdev, reg, val); +} + +static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg) +{ + unsigned int val; + + pci_read_config_dword(mac-dma_pdev, reg, val); + return val; +} + +static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg, + unsigned int val) +{ + pci_write_config_dword(mac-dma_pdev, reg, val); +} + static int pasemi_get_mac_addr(struct pasemi_mac *mac) { struct pci_dev *pdev = mac-pdev; @@ -166,22 +208,21 @@ static int pasemi_mac_setup_rx_resources memset(ring-buffers, 0, RX_RING_SIZE * sizeof(u64)); - pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXCHAN_BASEL(chan_id), - PAS_DMA_RXCHAN_BASEL_BRBL(ring-dma)); + write_dma_reg(mac, PAS_DMA_RXCHAN_BASEL(chan_id), PAS_DMA_RXCHAN_BASEL_BRBL(ring-dma)); + + write_dma_reg(mac, PAS_DMA_RXCHAN_BASEU(chan_id), + PAS_DMA_RXCHAN_BASEU_BRBH(ring-dma 32) | + PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE 2)); + + write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id), + PAS_DMA_RXCHAN_CFG_HBU(1)); - 
pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXCHAN_BASEU(chan_id), - PAS_DMA_RXCHAN_BASEU_BRBH(ring-dma 32) | - PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE 2)); - - pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXCHAN_CFG(chan_id), - PAS_DMA_RXCHAN_CFG_HBU(1)); - - pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXINT_BASEL(mac-dma_if), - PAS_DMA_RXINT_BASEL_BRBL(__pa(ring-buffers))); - - pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXINT_BASEU(mac-dma_if), - PAS_DMA_RXINT_BASEU_BRBH(__pa(ring-buffers) 32) | - PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE 3)); + write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac-dma_if), + PAS_DMA_RXINT_BASEL_BRBL(__pa(ring-buffers))); + + write_dma_reg(mac, PAS_DMA_RXINT_BASEU(mac-dma_if), + PAS_DMA_RXINT_BASEU_BRBH(__pa(ring-buffers) 32) | + PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE 3)); ring-next_to_fill = 0; ring-next_to_clean = 0; @@ -233,18 +274,18 @@ static int pasemi_mac_setup_tx_resources memset(ring-desc, 0, TX_RING_SIZE * sizeof(struct pas_dma_xct_descr)); - pci_write_config_dword(mac-dma_pdev, PAS_DMA_TXCHAN_BASEL(chan_id), - PAS_DMA_TXCHAN_BASEL_BRBL(ring-dma)); + write_dma_reg(mac, PAS_DMA_TXCHAN_BASEL(chan_id), + PAS_DMA_TXCHAN_BASEL_BRBL(ring-dma)); val = PAS_DMA_TXCHAN_BASEU_BRBH(ring-dma 32); val |= PAS_DMA_TXCHAN_BASEU_SIZ(TX_RING_SIZE 2); - pci_write_config_dword(mac-dma_pdev, PAS_DMA_TXCHAN_BASEU(chan_id), val); + write_dma_reg(mac, PAS_DMA_TXCHAN_BASEU(chan_id), val); - pci_write_config_dword(mac-dma_pdev, PAS_DMA_TXCHAN_CFG(chan_id), - PAS_DMA_TXCHAN_CFG_TY_IFACE | - PAS_DMA_TXCHAN_CFG_TATTR(mac-dma_if) | - PAS_DMA_TXCHAN_CFG_UP | - PAS_DMA_TXCHAN_CFG_WT(2)); + write_dma_reg(mac, PAS_DMA_TXCHAN_CFG(chan_id), + PAS_DMA_TXCHAN_CFG_TY_IFACE | + PAS_DMA_TXCHAN_CFG_TATTR(mac-dma_if) | + PAS_DMA_TXCHAN_CFG_UP | + PAS_DMA_TXCHAN_CFG_WT(2)); ring-next_to_use = 0; ring-next_to_clean = 0; @@ -383,12 +424,8 @@ static void pasemi_mac_replenish_rx_ring wmb(); - pci_write_config_dword(mac-dma_pdev, -
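The point of patch 01/10 is purely mechanical: route every register access through one accessor pair so the backend can be swapped (pci config space now, ioremapped MMIO in patch 02/10) without touching any caller. The shape, modeled against a plain array standing in for the device so it runs anywhere:

```c
/* Accessor pair in the style of read_dma_reg()/write_dma_reg(); the
 * "device" here is just an array. Register offsets are byte addresses
 * of 32-bit registers, hence the division by 4. */
static unsigned int fake_dma_space[0x400];

static unsigned int read_dma_reg(unsigned int reg)
{
    return fake_dma_space[reg / 4];
}

static void write_dma_reg(unsigned int reg, unsigned int val)
{
    fake_dma_space[reg / 4] = val;
}
```

Swapping the body of these two functions for `in_le32`/`out_le32` on an ioremapped base, as patch 02/10 does, leaves every call site untouched.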
[PATCH] [08/10] pasemi_mac: Fix TX ring wrap checking
The old logic didn't detect full (tx) ring cases properly, causing overruns and general badness. Clean it up a bit and abstract out the ring size checks, always making sure to leave 1 slot open. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -69,6 +69,10 @@ #define RX_DESC_INFO(mac, num) ((mac)-rx-desc_info[(num) (RX_RING_SIZE-1)]) #define RX_BUFF(mac, num) ((mac)-rx-buffers[(num) (RX_RING_SIZE-1)]) +#define RING_USED(ring)(((ring)-next_to_fill - (ring)-next_to_clean) \ + ((ring)-size - 1)) +#define RING_AVAIL(ring) ((ring-size) - RING_USED(ring)) + #define BUF_SIZE 1646 /* 1500 MTU + ETH_HLEN + VLAN_HLEN + 2 64B cachelines */ MODULE_LICENSE(GPL); @@ -184,6 +188,7 @@ static int pasemi_mac_setup_rx_resources spin_lock_init(ring-lock); + ring-size = RX_RING_SIZE; ring-desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) * RX_RING_SIZE, GFP_KERNEL); @@ -263,6 +268,7 @@ static int pasemi_mac_setup_tx_resources spin_lock_init(ring-lock); + ring-size = TX_RING_SIZE; ring-desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) * TX_RING_SIZE, GFP_KERNEL); if (!ring-desc_info) @@ -291,7 +297,7 @@ static int pasemi_mac_setup_tx_resources PAS_DMA_TXCHAN_CFG_UP | PAS_DMA_TXCHAN_CFG_WT(2)); - ring-next_to_use = 0; + ring-next_to_fill = 0; ring-next_to_clean = 0; snprintf(ring-irq_name, sizeof(ring-irq_name), @@ -386,9 +392,7 @@ static void pasemi_mac_replenish_rx_ring int start = mac-rx-next_to_fill; unsigned int limit, count; - limit = (mac-rx-next_to_clean + RX_RING_SIZE - -mac-rx-next_to_fill) (RX_RING_SIZE - 1); - + limit = RING_AVAIL(mac-rx); /* Check to see if we're doing first-time setup */ if (unlikely(mac-rx-next_to_clean == 0 mac-rx-next_to_fill == 0)) limit = RX_RING_SIZE; @@ -572,7 +576,7 @@ restart: spin_lock_irqsave(mac-tx-lock, flags); start = mac-tx-next_to_clean; - limit = min(mac-tx-next_to_use, start+32); + limit = 
min(mac-tx-next_to_fill, start+32); count = 0; @@ -1013,14 +1017,13 @@ static int pasemi_mac_start_tx(struct sk spin_lock_irqsave(txring-lock, flags); - if (txring-next_to_clean - txring-next_to_use == TX_RING_SIZE) { + if (RING_AVAIL(txring) = 1) { spin_unlock_irqrestore(txring-lock, flags); pasemi_mac_clean_tx(mac); pasemi_mac_restart_tx_intr(mac); spin_lock_irqsave(txring-lock, flags); - if (txring-next_to_clean - txring-next_to_use == - TX_RING_SIZE) { + if (RING_AVAIL(txring) = 1) { /* Still no room -- stop the queue and wait for tx * intr when there's room. */ @@ -1029,15 +1032,15 @@ static int pasemi_mac_start_tx(struct sk } } - dp = TX_DESC(mac, txring-next_to_use); - info = TX_DESC_INFO(mac, txring-next_to_use); + dp = TX_DESC(mac, txring-next_to_fill); + info = TX_DESC_INFO(mac, txring-next_to_fill); dp-mactx = mactx; dp-ptr = ptr; info-dma = map; info-skb = skb; - txring-next_to_use++; + txring-next_to_fill++; mac-stats.tx_packets++; mac-stats.tx_bytes += skb-len; Index: mainline/drivers/net/pasemi_mac.h === --- mainline.orig/drivers/net/pasemi_mac.h +++ mainline/drivers/net/pasemi_mac.h @@ -31,7 +31,7 @@ struct pasemi_mac_txring { struct pas_dma_xct_descr*desc; dma_addr_t dma; unsigned int size; - unsigned int next_to_use; + unsigned int next_to_fill; unsigned int next_to_clean; struct pasemi_mac_buffer *desc_info; char irq_name[10]; /* eth%d tx */ -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
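The RING_USED/RING_AVAIL macros this patch introduces rely on two properties worth making explicit: the ring size is a power of two (so unsigned subtraction plus a mask survives index wraparound), and the producer stops at `RING_AVAIL <= 1` to keep one slot open. Reproduced with a toy ring (the struct is simplified from `pasemi_mac_txring`):

```c
struct toy_ring {
    unsigned int size;           /* must be a power of two */
    unsigned int next_to_fill;   /* producer index, free-running */
    unsigned int next_to_clean;  /* consumer index, free-running */
};

/* As in the patch: unsigned wraparound in the subtraction is fixed
 * up by masking with (size - 1). */
#define RING_USED(r)  (((r)->next_to_fill - (r)->next_to_clean) & ((r)->size - 1))
#define RING_AVAIL(r) ((r)->size - RING_USED(r))
```

The old `next_to_clean - next_to_use == TX_RING_SIZE` test could never be true with masked arithmetic, which is how the full-ring case slipped through.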
[PATCH] [05/10] pasemi_mac: RX performance tweaks
Various RX performance tweaks, do some explicit prefetching of packet data, etc. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -481,6 +481,7 @@ static int pasemi_mac_clean_rx(struct pa rmb(); dp = RX_DESC(mac, n); + prefetchw(dp); macrx = dp-macrx; if (!(macrx XCT_MACRX_O)) @@ -502,8 +503,10 @@ static int pasemi_mac_clean_rx(struct pa if (info-dma == dma) break; } + prefetchw(info); skb = info-skb; + prefetchw(skb); info-dma = 0; pci_unmap_single(mac-dma_pdev, dma, skb-len, @@ -526,9 +529,7 @@ static int pasemi_mac_clean_rx(struct pa skb_put(skb, len); - skb-protocol = eth_type_trans(skb, mac-netdev); - - if ((macrx XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK) { + if (likely((macrx XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK)) { skb-ip_summed = CHECKSUM_COMPLETE; skb-csum = (macrx XCT_MACRX_CSUM_M) XCT_MACRX_CSUM_S; @@ -538,6 +539,7 @@ static int pasemi_mac_clean_rx(struct pa mac-stats.rx_bytes += len; mac-stats.rx_packets++; + skb-protocol = eth_type_trans(skb, mac-netdev); netif_receive_skb(skb); dp-ptr = 0; @@ -569,7 +571,7 @@ static int pasemi_mac_clean_tx(struct pa for (i = start; i mac-tx-next_to_use; i++) { dp = TX_DESC(mac, i); - if (!dp || (dp-mactx XCT_MACTX_O)) + if (unlikely(dp-mactx XCT_MACTX_O)) break; count++; @@ -957,7 +959,7 @@ static int pasemi_mac_start_tx(struct sk struct pasemi_mac_txring *txring; struct pasemi_mac_buffer *info; struct pas_dma_xct_descr *dp; - u64 dflags; + u64 dflags, mactx, ptr; dma_addr_t map; int flags; @@ -985,6 +987,9 @@ static int pasemi_mac_start_tx(struct sk if (dma_mapping_error(map)) return NETDEV_TX_BUSY; + mactx = dflags | XCT_MACTX_LLEN(skb-len); + ptr = XCT_PTR_LEN(skb-len) | XCT_PTR_ADDR(map); + txring = mac-tx; spin_lock_irqsave(txring-lock, flags); @@ -1005,12 +1010,11 @@ static int pasemi_mac_start_tx(struct sk } } - dp = TX_DESC(mac, txring-next_to_use); info = TX_DESC_INFO(mac, 
txring-next_to_use); - dp-mactx = dflags | XCT_MACTX_LLEN(skb-len); - dp-ptr = XCT_PTR_LEN(skb-len) | XCT_PTR_ADDR(map); + dp-mactx = mactx; + dp-ptr = ptr; info-dma = map; info-skb = skb; -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [10/10] pasemi_mac: Clean TX ring in poll
Unfortunately there's no timeout for how long a packet can sit on the TX ring after completion before an interrupt is generated, and we want a completion threshold larger than one packet per interrupt. So we need to clean the TX ring occasionally even when there hasn't been an interrupt. Instead of setting up a dedicated timer for this, just clean it in the NAPI poll routine. Signed-off-by: Olof Johansson [EMAIL PROTECTED] --- I know this got rejected the last time it was submitted, but I received no suggestions on how to handle it better. I'm all ears if there's a better way. (I noticed that Intel's new ixgbe driver does the same thing.) Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -1086,6 +1086,7 @@ static int pasemi_mac_poll(struct net_de int pkts, limit = min(*budget, dev->quota); struct pasemi_mac *mac = netdev_priv(dev); + pasemi_mac_clean_tx(mac); pkts = pasemi_mac_clean_rx(mac, limit); dev->quota -= pkts; --
[PATCH] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes
Move away from using the pci config access functions for simple register access. Our device has all of the registers in the config space (hey, from the hardware point of view it looks reasonable :-), so we need to somehow get to it. Newer firmwares have it in the device tree such that we can just get it and ioremap it there (in case it ever moves in future products). For now, provide a hardcoded fallback for older firmwares. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -81,46 +81,47 @@ MODULE_PARM_DESC(debug, PA Semi MAC bit static struct pasdma_status *dma_status; -static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg) +static inline unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg) { unsigned int val; - pci_read_config_dword(mac-iob_pdev, reg, val); + val = in_le32(mac-iob_regs+reg); + return val; } -static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg, +static inline void write_iob_reg(struct pasemi_mac *mac, unsigned int reg, unsigned int val) { - pci_write_config_dword(mac-iob_pdev, reg, val); + out_le32(mac-iob_regs+reg, val); } -static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg) +static inline unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg) { unsigned int val; - pci_read_config_dword(mac-pdev, reg, val); + val = in_le32(mac-regs+reg); return val; } -static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg, +static inline void write_mac_reg(struct pasemi_mac *mac, unsigned int reg, unsigned int val) { - pci_write_config_dword(mac-pdev, reg, val); + out_le32(mac-regs+reg, val); } -static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg) +static inline unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg) { unsigned int val; - pci_read_config_dword(mac-dma_pdev, reg, val); + val = 
in_le32(mac-dma_regs+reg); return val; } -static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg, +static inline void write_dma_reg(struct pasemi_mac *mac, unsigned int reg, unsigned int val) { - pci_write_config_dword(mac-dma_pdev, reg, val); + out_le32(mac-dma_regs+reg, val); } static int pasemi_get_mac_addr(struct pasemi_mac *mac) @@ -585,7 +586,6 @@ static int pasemi_mac_clean_tx(struct pa } mac-tx-next_to_clean += count; spin_unlock_irqrestore(mac-tx-lock, flags); - netif_wake_queue(mac-netdev); return count; @@ -1076,6 +1076,73 @@ static int pasemi_mac_poll(struct net_de } } +static inline void __iomem * __devinit map_onedev(struct pci_dev *p, int index) +{ + struct device_node *dn; + void __iomem *ret; + + dn = pci_device_to_OF_node(p); + if (!dn) + goto fallback; + + ret = of_iomap(dn, index); + if (!ret) + goto fallback; + + return ret; +fallback: + /* This is hardcoded and ugly, but we have some firmware versions +* who don't provide the register space in the device tree. Luckily +* they are at well-known locations so we can just do the math here. +*/ + return ioremap(0xe000 + (p-devfn 12), 0x2000); +} + +static int __devinit pasemi_mac_map_regs(struct pasemi_mac *mac) +{ + struct resource res; + struct device_node *dn; + int err; + + mac-dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL); + if (!mac-dma_pdev) { + dev_err(mac-pdev-dev, Can't find DMA Controller\n); + return -ENODEV; + } + + mac-iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL); + if (!mac-iob_pdev) { + dev_err(mac-pdev-dev, Can't find I/O Bridge\n); + return -ENODEV; + } + + mac-regs = map_onedev(mac-pdev, 0); + mac-dma_regs = map_onedev(mac-dma_pdev, 0); + mac-iob_regs = map_onedev(mac-iob_pdev, 0); + + if (!mac-regs || !mac-dma_regs || !mac-iob_regs) { + dev_err(mac-pdev-dev, Can't map registers\n); + return -ENODEV; + } + + /* The dma status structure is located in the I/O bridge, and +* is cache coherent. 
+*/ + if (!dma_status) { + dn = pci_device_to_OF_node(mac-iob_pdev); + if (dn) + err = of_address_to_resource(dn, 1, res); + if (!dn || err) { + /* Fallback for old firmware */ + res.start = 0xfd80; + res.end = res.start + 0x1000; + } +
[PATCH] [06/10] pasemi_mac: Batch up TX buffer frees
Postpone pci unmap and skb free of the transmitted buffers to outside of the tx ring lock, batching them up 32 at a time. Also increase the count threshold to 128. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -561,37 +561,56 @@ static int pasemi_mac_clean_tx(struct pa int i; struct pasemi_mac_buffer *info; struct pas_dma_xct_descr *dp; - int start, count; + unsigned int start, count, limit; + unsigned int total_count; int flags; + struct sk_buff *skbs[32]; + dma_addr_t dmas[32]; + total_count = 0; +restart: spin_lock_irqsave(mac-tx-lock, flags); start = mac-tx-next_to_clean; + limit = min(mac-tx-next_to_use, start+32); + count = 0; - for (i = start; i mac-tx-next_to_use; i++) { + for (i = start; i limit; i++) { dp = TX_DESC(mac, i); + if (unlikely(dp-mactx XCT_MACTX_O)) + /* Not yet transmitted */ break; - count++; - info = TX_DESC_INFO(mac, i); - - pci_unmap_single(mac-dma_pdev, info-dma, -info-skb-len, PCI_DMA_TODEVICE); - dev_kfree_skb_irq(info-skb); + skbs[count] = info-skb; + dmas[count] = info-dma; info-skb = NULL; info-dma = 0; dp-mactx = 0; dp-ptr = 0; + + count++; } mac-tx-next_to_clean += count; spin_unlock_irqrestore(mac-tx-lock, flags); netif_wake_queue(mac-netdev); - return count; + for (i = 0; i count; i++) { + pci_unmap_single(mac-dma_pdev, dmas[i], +skbs[i]-len, PCI_DMA_TODEVICE); + dev_kfree_skb_irq(skbs[i]); + } + + total_count += count; + + /* If the batch was full, try to clean more */ + if (count == 32) + goto restart; + + return total_count; } @@ -787,7 +806,7 @@ static int pasemi_mac_open(struct net_de PAS_IOB_DMA_RXCH_CFG_CNTTH(0)); write_iob_reg(mac, PAS_IOB_DMA_TXCH_CFG(mac-dma_txch), - PAS_IOB_DMA_TXCH_CFG_CNTTH(32)); + PAS_IOB_DMA_TXCH_CFG_CNTTH(128)); /* Clear out any residual packet count state from firmware */ pasemi_mac_restart_rx_intr(mac); -- - To unsubscribe from this list: send the line unsubscribe 
netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [04/10] pasemi_mac: Fix memcpy amount for short receives
Fix up memcpy for short receives. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -516,9 +516,7 @@ static int pasemi_mac_clean_rx(struct pa netdev_alloc_skb(mac-netdev, len + NET_IP_ALIGN); if (new_skb) { skb_reserve(new_skb, NET_IP_ALIGN); - memcpy(new_skb-data - NET_IP_ALIGN, - skb-data - NET_IP_ALIGN, - len + NET_IP_ALIGN); + memcpy(new_skb-data, skb-data, len); /* save the skb in buffer_info as good */ skb = new_skb; }
[PATCH] [07/10] pasemi_mac: Enable LLTX
Enable LLTX on pasemi_mac: we're already doing sufficient locking in the driver to enable it. Signed-off-by: Olof Johansson [EMAIL PROTECTED] Index: mainline/drivers/net/pasemi_mac.c === --- mainline.orig/drivers/net/pasemi_mac.c +++ mainline/drivers/net/pasemi_mac.c @@ -1235,7 +1235,7 @@ pasemi_mac_probe(struct pci_dev *pdev, c dev-set_multicast_list = pasemi_mac_set_rx_mode; dev-weight = 64; dev-poll = pasemi_mac_poll; - dev-features = NETIF_F_HW_CSUM; + dev-features = NETIF_F_HW_CSUM | NETIF_F_LLTX; err = pasemi_mac_map_regs(mac); if (err)
[PATCH 00/16] xfrm netlink interface cleanups
This patchset converts the xfrm netlink bits over to the type safe netlink interface and does some cleanups. xfrm_user.c | 1041 1 file changed, 433 insertions(+), 608 deletions(-)
[PATCH 01/16] [XFRM] netlink: Use nlmsg_put() instead of NLMSG_PUT()
Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.24/net/xfrm/xfrm_user.c === --- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-20 17:09:48.0 +0200 +++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:10:34.0 +0200 @@ -588,10 +588,10 @@ static int dump_one_state(struct xfrm_st if (sp-this_idx sp-start_idx) goto out; - nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid, - sp-nlmsg_seq, - XFRM_MSG_NEWSA, sizeof(*p)); - nlh-nlmsg_flags = sp-nlmsg_flags; + nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp-nlmsg_seq, + XFRM_MSG_NEWSA, sizeof(*p), sp-nlmsg_flags); + if (nlh == NULL) + return -EMSGSIZE; p = NLMSG_DATA(nlh); copy_to_user_state(x, p); @@ -633,7 +633,6 @@ out: sp-this_idx++; return 0; -nlmsg_failure: rtattr_failure: nlmsg_trim(skb, b); return -1; @@ -1276,11 +1275,11 @@ static int dump_one_policy(struct xfrm_p if (sp-this_idx sp-start_idx) goto out; - nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid, - sp-nlmsg_seq, - XFRM_MSG_NEWPOLICY, sizeof(*p)); + nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp-nlmsg_seq, + XFRM_MSG_NEWPOLICY, sizeof(*p), sp-nlmsg_flags); + if (nlh == NULL) + return -EMSGSIZE; p = NLMSG_DATA(nlh); - nlh-nlmsg_flags = sp-nlmsg_flags; copy_to_user_policy(xp, p, dir); if (copy_to_user_tmpl(xp, skb) 0) @@ -1449,9 +1448,10 @@ static int build_aevent(struct sk_buff * struct xfrm_lifetime_cur ltime; unsigned char *b = skb_tail_pointer(skb); - nlh = NLMSG_PUT(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id)); + nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id), 0); + if (nlh == NULL) + return -EMSGSIZE; id = NLMSG_DATA(nlh); - nlh-nlmsg_flags = 0; memcpy(id-sa_id.daddr, x-id.daddr,sizeof(x-id.daddr)); id-sa_id.spi = x-id.spi; @@ -1483,7 +1483,6 @@ static int build_aevent(struct sk_buff * return skb-len; rtattr_failure: -nlmsg_failure: nlmsg_trim(skb, b); return -1; } @@ -1866,9 +1865,10 @@ static int build_migrate(struct sk_buff unsigned char *b = skb_tail_pointer(skb); int i; - nlh = NLMSG_PUT(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id)); + 
nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0); + if (nlh == NULL) + return -EMSGSIZE; pol_id = NLMSG_DATA(nlh); - nlh-nlmsg_flags = 0; /* copy data from selector, dir, and type to the pol_id */ memset(pol_id, 0, sizeof(*pol_id)); @@ -2045,20 +2045,16 @@ static int build_expire(struct sk_buff * struct nlmsghdr *nlh; unsigned char *b = skb_tail_pointer(skb); - nlh = NLMSG_PUT(skb, c-pid, 0, XFRM_MSG_EXPIRE, - sizeof(*ue)); + nlh = nlmsg_put(skb, c-pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0); + if (nlh == NULL) + return -EMSGSIZE; ue = NLMSG_DATA(nlh); - nlh-nlmsg_flags = 0; copy_to_user_state(x, ue-state); ue-hard = (c-data.hard != 0) ? 1 : 0; nlh-nlmsg_len = skb_tail_pointer(skb) - b; return skb-len; - -nlmsg_failure: - nlmsg_trim(skb, b); - return -1; } static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c) @@ -2108,9 +2104,11 @@ static int xfrm_notify_sa_flush(struct k return -ENOMEM; b = skb-tail; - nlh = NLMSG_PUT(skb, c-pid, c-seq, - XFRM_MSG_FLUSHSA, sizeof(*p)); - nlh-nlmsg_flags = 0; + nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0); + if (nlh == NULL) { + kfree_skb(skb); + return -EMSGSIZE; + } p = NLMSG_DATA(nlh); p-proto = c-data.proto; @@ -2119,10 +2117,6 @@ static int xfrm_notify_sa_flush(struct k NETLINK_CB(skb).dst_group = XFRMNLGRP_SA; return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC); - -nlmsg_failure: - kfree_skb(skb); - return -1; } static inline int xfrm_sa_len(struct xfrm_state *x) @@ -2162,8 +2156,9 @@ static int xfrm_notify_sa(struct xfrm_st return -ENOMEM; b = skb-tail; - nlh = NLMSG_PUT(skb, c-pid, c-seq, c-event, headlen); - nlh-nlmsg_flags = 0; + nlh = nlmsg_put(skb, c-pid, c-seq, c-event, headlen, 0); + if (nlh == NULL) + goto nlmsg_failure; p = NLMSG_DATA(nlh); if (c-event == XFRM_MSG_DELSA) { @@ -2233,10 +2228,10 @@ static int build_acquire(struct sk_buff unsigned char *b = skb_tail_pointer(skb); __u32 seq = xfrm_get_acqseq(); -
[PATCH 09/16] [XFRM] netlink: Use nlmsg_parse() to parse attributes
Uses nlmsg_parse() to parse the attributes. This actually changes behaviour as unknown attributes (type MAXTYPE) no longer cause an error. Instead unknown attributes will be ignored henceforth to keep older kernels compatible with more recent userspace tools. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.24/net/xfrm/xfrm_user.c === --- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:07:38.0 +0200 +++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:31:04.0 +0200 @@ -1890,7 +1890,7 @@ static int xfrm_send_migrate(struct xfrm } #endif -#define XMSGSIZE(type) NLMSG_LENGTH(sizeof(struct type)) +#define XMSGSIZE(type) sizeof(struct type) static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = { [XFRM_MSG_NEWSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info), @@ -1906,13 +1906,13 @@ static const int xfrm_msg_min[XFRM_NR_MS [XFRM_MSG_UPDSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info), [XFRM_MSG_POLEXPIRE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire), [XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush), - [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = NLMSG_LENGTH(0), + [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = 0, [XFRM_MSG_NEWAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id), [XFRM_MSG_GETAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id), [XFRM_MSG_REPORT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report), [XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id), - [XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)), - [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)), + [XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = sizeof(u32), + [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32), }; #undef XMSGSIZE @@ -1946,9 +1946,9 @@ static struct xfrm_link { static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) { - struct rtattr *xfrma[XFRMA_MAX]; + struct nlattr *xfrma[XFRMA_MAX+1]; struct xfrm_link *link; - int type, min_len; + int type, err; type = nlh-nlmsg_type; if (type XFRM_MSG_MAX) @@ -1970,30 +1970,16 @@ static int 
xfrm_user_rcv_msg(struct sk_b return netlink_dump_start(xfrm_nl, skb, nlh, link-dump, NULL); } - memset(xfrma, 0, sizeof(xfrma)); - - if (nlh-nlmsg_len (min_len = xfrm_msg_min[type])) - return -EINVAL; - - if (nlh-nlmsg_len min_len) { - int attrlen = nlh-nlmsg_len - NLMSG_ALIGN(min_len); - struct rtattr *attr = (void *) nlh + NLMSG_ALIGN(min_len); - - while (RTA_OK(attr, attrlen)) { - unsigned short flavor = attr-rta_type; - if (flavor) { - if (flavor XFRMA_MAX) - return -EINVAL; - xfrma[flavor - 1] = attr; - } - attr = RTA_NEXT(attr, attrlen); - } - } + /* FIXME: Temporary hack, nlmsg_parse() starts at xfrma[1], old code +* expects first attribute at xfrma[0] */ + err = nlmsg_parse(nlh, xfrm_msg_min[type], xfrma-1, XFRMA_MAX, NULL); + if (err 0) + return err; if (link-doit == NULL) return -EINVAL; - return link-doit(skb, nlh, xfrma); + return link-doit(skb, nlh, (struct rtattr **) xfrma); } static void xfrm_netlink_rcv(struct sock *sk, int len)
[PATCH 08/16] [XFRM] netlink: Use nlmsg_new() and type-safe size calculation helpers
Moves all complex message size calculation into own inlined helper functions and makes use of the type-safe netlink interface. Using nlmsg_new() simplifies the calculation itself as it takes care of the netlink header length by itself. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.24/net/xfrm/xfrm_user.c === --- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:04:46.0 +0200 +++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:07:38.0 +0200 @@ -670,7 +670,7 @@ static struct sk_buff *xfrm_state_netlin struct xfrm_dump_info info; struct sk_buff *skb; - skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); + skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC); if (!skb) return ERR_PTR(-ENOMEM); @@ -688,6 +688,13 @@ static struct sk_buff *xfrm_state_netlin return skb; } +static inline size_t xfrm_spdinfo_msgsize(void) +{ + return NLMSG_ALIGN(4) + + nla_total_size(sizeof(struct xfrmu_spdinfo)) + + nla_total_size(sizeof(struct xfrmu_spdhinfo)); +} + static int build_spdinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags) { struct xfrmk_spdinfo si; @@ -729,12 +736,8 @@ static int xfrm_get_spdinfo(struct sk_bu u32 *flags = nlmsg_data(nlh); u32 spid = NETLINK_CB(skb).pid; u32 seq = nlh-nlmsg_seq; - int len = NLMSG_LENGTH(sizeof(u32)); - len += RTA_SPACE(sizeof(struct xfrmu_spdinfo)); - len += RTA_SPACE(sizeof(struct xfrmu_spdhinfo)); - - r_skb = alloc_skb(len, GFP_ATOMIC); + r_skb = nlmsg_new(xfrm_spdinfo_msgsize(), GFP_ATOMIC); if (r_skb == NULL) return -ENOMEM; @@ -744,6 +747,13 @@ static int xfrm_get_spdinfo(struct sk_bu return nlmsg_unicast(xfrm_nl, r_skb, spid); } +static inline size_t xfrm_sadinfo_msgsize(void) +{ + return NLMSG_ALIGN(4) + + nla_total_size(sizeof(struct xfrmu_sadhinfo)) + + nla_total_size(4); /* XFRMA_SAD_CNT */ +} + static int build_sadinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags) { struct xfrmk_sadinfo si; @@ -779,13 +789,8 @@ static int xfrm_get_sadinfo(struct sk_bu u32 *flags = nlmsg_data(nlh); u32 spid = NETLINK_CB(skb).pid; u32 seq = 
nlh-nlmsg_seq; - int len = NLMSG_LENGTH(sizeof(u32)); - - len += RTA_SPACE(sizeof(struct xfrmu_sadhinfo)); - len += RTA_SPACE(sizeof(u32)); - - r_skb = alloc_skb(len, GFP_ATOMIC); + r_skb = nlmsg_new(xfrm_sadinfo_msgsize(), GFP_ATOMIC); if (r_skb == NULL) return -ENOMEM; @@ -1311,7 +1316,7 @@ static struct sk_buff *xfrm_policy_netli struct xfrm_dump_info info; struct sk_buff *skb; - skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!skb) return ERR_PTR(-ENOMEM); @@ -1425,6 +1430,14 @@ static int xfrm_flush_sa(struct sk_buff return 0; } +static inline size_t xfrm_aevent_msgsize(void) +{ + return NLMSG_ALIGN(sizeof(struct xfrm_aevent_id)) + + nla_total_size(sizeof(struct xfrm_replay_state)) + + nla_total_size(sizeof(struct xfrm_lifetime_cur)) + + nla_total_size(4) /* XFRM_AE_RTHR */ + + nla_total_size(4); /* XFRM_AE_ETHR */ +} static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, struct km_event *c) { @@ -1469,19 +1482,9 @@ static int xfrm_get_ae(struct sk_buff *s int err; struct km_event c; struct xfrm_aevent_id *p = nlmsg_data(nlh); - int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id)); struct xfrm_usersa_id *id = p-sa_id; - len += RTA_SPACE(sizeof(struct xfrm_replay_state)); - len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur)); - - if (p-flagsXFRM_AE_RTHR) - len+=RTA_SPACE(sizeof(u32)); - - if (p-flagsXFRM_AE_ETHR) - len+=RTA_SPACE(sizeof(u32)); - - r_skb = alloc_skb(len, GFP_ATOMIC); + r_skb = nlmsg_new(xfrm_aevent_msgsize(), GFP_ATOMIC); if (r_skb == NULL) return -ENOMEM; @@ -1824,6 +1827,13 @@ static int copy_to_user_migrate(struct x return nla_put(skb, XFRMA_MIGRATE, sizeof(um), um); } +static inline size_t xfrm_migrate_msgsize(int num_migrate) +{ + return NLMSG_ALIGN(sizeof(struct xfrm_userpolicy_id)) + + nla_total_size(sizeof(struct xfrm_user_migrate) * num_migrate) + + userpolicy_type_attrsize(); +} + static int build_migrate(struct sk_buff *skb, struct xfrm_migrate *m, int num_migrate, struct 
xfrm_selector *sel, u8 dir, u8 type) @@ -1861,12 +1871,8 @@ static int xfrm_send_migrate(struct xfrm struct xfrm_migrate *m, int num_migrate) { struct sk_buff *skb; - size_t len;
[PATCH 02/16] [XFRM] netlink: Use nlmsg_end() and nlmsg_cancel()
Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.24/net/xfrm/xfrm_user.c === --- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:10:34.0 +0200 +++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:12:20.0 +0200 @@ -583,7 +583,6 @@ static int dump_one_state(struct xfrm_st struct sk_buff *skb = sp-out_skb; struct xfrm_usersa_info *p; struct nlmsghdr *nlh; - unsigned char *b = skb_tail_pointer(skb); if (sp-this_idx sp-start_idx) goto out; @@ -628,14 +627,14 @@ static int dump_one_state(struct xfrm_st if (x-lastused) RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x-lastused), x-lastused); - nlh-nlmsg_len = skb_tail_pointer(skb) - b; + nlmsg_end(skb, nlh); out: sp-this_idx++; return 0; rtattr_failure: - nlmsg_trim(skb, b); - return -1; + nlmsg_cancel(skb, nlh); + return -EMSGSIZE; } static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb) @@ -1270,7 +1269,6 @@ static int dump_one_policy(struct xfrm_p struct sk_buff *in_skb = sp-in_skb; struct sk_buff *skb = sp-out_skb; struct nlmsghdr *nlh; - unsigned char *b = skb_tail_pointer(skb); if (sp-this_idx sp-start_idx) goto out; @@ -1289,14 +1287,14 @@ static int dump_one_policy(struct xfrm_p if (copy_to_user_policy_type(xp-type, skb) 0) goto nlmsg_failure; - nlh-nlmsg_len = skb_tail_pointer(skb) - b; + nlmsg_end(skb, nlh); out: sp-this_idx++; return 0; nlmsg_failure: - nlmsg_trim(skb, b); - return -1; + nlmsg_cancel(skb, nlh); + return -EMSGSIZE; } static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb) @@ -1446,7 +1444,6 @@ static int build_aevent(struct sk_buff * struct xfrm_aevent_id *id; struct nlmsghdr *nlh; struct xfrm_lifetime_cur ltime; - unsigned char *b = skb_tail_pointer(skb); nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id), 0); if (nlh == NULL) @@ -1479,12 +1476,11 @@ static int build_aevent(struct sk_buff * RTA_PUT(skb,XFRMA_ETIMER_THRESH,sizeof(u32),etimer); } - nlh-nlmsg_len = skb_tail_pointer(skb) - b; - return skb-len; + return nlmsg_end(skb, nlh); 
rtattr_failure: - nlmsg_trim(skb, b); - return -1; + nlmsg_cancel(skb, nlh); + return -EMSGSIZE; } static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh, @@ -1862,7 +1858,6 @@ static int build_migrate(struct sk_buff struct xfrm_migrate *mp; struct xfrm_userpolicy_id *pol_id; struct nlmsghdr *nlh; - unsigned char *b = skb_tail_pointer(skb); int i; nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0); @@ -1883,11 +1878,10 @@ static int build_migrate(struct sk_buff goto nlmsg_failure; } - nlh-nlmsg_len = skb_tail_pointer(skb) - b; - return skb-len; + return nlmsg_end(skb, nlh); nlmsg_failure: - nlmsg_trim(skb, b); - return -1; + nlmsg_cancel(skb, nlh); + return -EMSGSIZE; } static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type, @@ -2043,7 +2037,6 @@ static int build_expire(struct sk_buff * { struct xfrm_user_expire *ue; struct nlmsghdr *nlh; - unsigned char *b = skb_tail_pointer(skb); nlh = nlmsg_put(skb, c-pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0); if (nlh == NULL) @@ -2053,8 +2046,7 @@ static int build_expire(struct sk_buff * copy_to_user_state(x, ue-state); ue-hard = (c-data.hard != 0) ? 
1 : 0; - nlh-nlmsg_len = skb_tail_pointer(skb) - b; - return skb-len; + return nlmsg_end(skb, nlh); } static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c) @@ -2096,13 +2088,11 @@ static int xfrm_notify_sa_flush(struct k struct xfrm_usersa_flush *p; struct nlmsghdr *nlh; struct sk_buff *skb; - sk_buff_data_t b; int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush)); skb = alloc_skb(len, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; - b = skb-tail; nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0); if (nlh == NULL) { @@ -2113,7 +2103,7 @@ static int xfrm_notify_sa_flush(struct k p = NLMSG_DATA(nlh); p-proto = c-data.proto; - nlh-nlmsg_len = skb-tail - b; + nlmsg_end(skb, nlh); NETLINK_CB(skb).dst_group = XFRMNLGRP_SA; return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC); @@ -2140,7 +2130,6 @@ static int xfrm_notify_sa(struct xfrm_st struct xfrm_usersa_id *id; struct nlmsghdr *nlh; struct sk_buff *skb; -
[PATCH 16/16] [XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()
These functions are only used once and are a lot easier to understand if inlined directly into the function. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.24/net/xfrm/xfrm_user.c === --- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 23:05:30.0 +0200 +++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-22 16:45:31.0 +0200 @@ -214,23 +214,6 @@ static int attach_one_algo(struct xfrm_a return 0; } -static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr *rta) -{ - struct xfrm_encap_tmpl *p, *uencap; - - if (!rta) - return 0; - - uencap = nla_data(rta); - p = kmemdup(uencap, sizeof(*p), GFP_KERNEL); - if (!p) - return -ENOMEM; - - *encapp = p; - return 0; -} - - static inline int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx) { int len = 0; @@ -242,33 +225,6 @@ static inline int xfrm_user_sec_ctx_size return len; } -static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg) -{ - struct xfrm_user_sec_ctx *uctx; - - if (!u_arg) - return 0; - - uctx = nla_data(u_arg); - return security_xfrm_state_alloc(x, uctx); -} - -static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta) -{ - xfrm_address_t *p, *uaddrp; - - if (!rta) - return 0; - - uaddrp = nla_data(rta); - p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL); - if (!p) - return -ENOMEM; - - *addrpp = p; - return 0; -} - static void copy_from_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) { memcpy(x-id, p-id, sizeof(x-id)); @@ -340,15 +296,27 @@ static struct xfrm_state *xfrm_state_con xfrm_calg_get_byname, attrs[XFRMA_ALG_COMP]))) goto error; - if ((err = attach_encap_tmpl(x-encap, attrs[XFRMA_ENCAP]))) - goto error; - if ((err = attach_one_addr(x-coaddr, attrs[XFRMA_COADDR]))) - goto error; + + if (attrs[XFRMA_ENCAP]) { + x-encap = kmemdup(nla_data(attrs[XFRMA_ENCAP]), + sizeof(x-encap), GFP_KERNEL); + if (x-encap == NULL) + goto error; + } + + if (attrs[XFRMA_COADDR]) { + x-coaddr = kmemdup(nla_data(attrs[XFRMA_COADDR]), + sizeof(x-coaddr), GFP_KERNEL); 
+ if (x-coaddr == NULL) + goto error; + } + err = xfrm_init_state(x); if (err) goto error; - if ((err = attach_sec_ctx(x, attrs[XFRMA_SEC_CTX]))) + if (attrs[XFRMA_SEC_CTX] + security_xfrm_state_alloc(x, nla_data(attrs[XFRMA_SEC_CTX]))) goto error; x-km.seq = p-seq;
[PATCH 06/16] [XFRM] netlink: Move algorithm length calculation to its own function
Adds alg_len() to calculate the properly padded length of an algorithm attribute to simplify the code. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.24/net/xfrm/xfrm_user.c === --- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:16:03.0 +0200 +++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:03:43.0 +0200 @@ -33,6 +33,11 @@ #endif #include linux/audit.h +static inline int alg_len(struct xfrm_algo *alg) +{ + return sizeof(*alg) + ((alg-alg_key_len + 7) / 8); +} + static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type) { struct rtattr *rt = xfrma[type - 1]; @@ -232,7 +237,6 @@ static int attach_one_algo(struct xfrm_a struct rtattr *rta = u_arg; struct xfrm_algo *p, *ualg; struct xfrm_algo_desc *algo; - int len; if (!rta) return 0; @@ -244,8 +248,7 @@ static int attach_one_algo(struct xfrm_a return -ENOSYS; *props = algo-desc.sadb_alg_id; - len = sizeof(*ualg) + (ualg-alg_key_len + 7U) / 8; - p = kmemdup(ualg, len, GFP_KERNEL); + p = kmemdup(ualg, alg_len(ualg), GFP_KERNEL); if (!p) return -ENOMEM; @@ -617,11 +620,9 @@ static int dump_one_state(struct xfrm_st copy_to_user_state(x, p); if (x-aalg) - NLA_PUT(skb, XFRMA_ALG_AUTH, - sizeof(*(x-aalg))+(x-aalg-alg_key_len+7)/8, x-aalg); + NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x-aalg), x-aalg); if (x-ealg) - NLA_PUT(skb, XFRMA_ALG_CRYPT, - sizeof(*(x-ealg))+(x-ealg-alg_key_len+7)/8, x-ealg); + NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x-ealg), x-ealg); if (x-calg) NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x-calg)), x-calg); @@ -2072,9 +2073,9 @@ static inline int xfrm_sa_len(struct xfr { int l = 0; if (x-aalg) - l += RTA_SPACE(sizeof(*x-aalg) + (x-aalg-alg_key_len+7)/8); + l += RTA_SPACE(alg_len(x-aalg)); if (x-ealg) - l += RTA_SPACE(sizeof(*x-ealg) + (x-ealg-alg_key_len+7)/8); + l += RTA_SPACE(alg_len(x-ealg)); if (x-calg) l += RTA_SPACE(sizeof(*x-calg)); if (x-encap) @@ -2127,11 +2128,9 @@ static int xfrm_notify_sa(struct xfrm_st copy_to_user_state(x, p); if (x-aalg) - NLA_PUT(skb, 
XFRMA_ALG_AUTH, - sizeof(*(x-aalg))+(x-aalg-alg_key_len+7)/8, x-aalg); + NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x-aalg), x-aalg); if (x-ealg) - NLA_PUT(skb, XFRMA_ALG_CRYPT, - sizeof(*(x-ealg))+(x-ealg-alg_key_len+7)/8, x-ealg); + NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x-ealg), x-ealg); if (x-calg) NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x-calg)), x-calg);
[PATCH 12/16] [XFRM] netlink: Rename attribute array from xfrma[] to attrs[]
Increases readability a lot. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.24/net/xfrm/xfrm_user.c === --- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:34:10.0 +0200 +++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:34:29.0 +0200 @@ -38,9 +38,9 @@ static inline int alg_len(struct xfrm_al return sizeof(*alg) + ((alg-alg_key_len + 7) / 8); } -static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type) +static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type) { - struct rtattr *rt = xfrma[type]; + struct rtattr *rt = attrs[type]; struct xfrm_algo *algp; if (!rt) @@ -75,18 +75,18 @@ static int verify_one_alg(struct rtattr return 0; } -static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type, +static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type, xfrm_address_t **addrp) { - struct rtattr *rt = xfrma[type]; + struct rtattr *rt = attrs[type]; if (rt addrp) *addrp = RTA_DATA(rt); } -static inline int verify_sec_ctx_len(struct rtattr **xfrma) +static inline int verify_sec_ctx_len(struct rtattr **attrs) { - struct rtattr *rt = xfrma[XFRMA_SEC_CTX]; + struct rtattr *rt = attrs[XFRMA_SEC_CTX]; struct xfrm_user_sec_ctx *uctx; if (!rt) @@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str static int verify_newsa_info(struct xfrm_usersa_info *p, -struct rtattr **xfrma) +struct rtattr **attrs) { int err; @@ -125,35 +125,35 @@ static int verify_newsa_info(struct xfrm err = -EINVAL; switch (p-id.proto) { case IPPROTO_AH: - if (!xfrma[XFRMA_ALG_AUTH] || - xfrma[XFRMA_ALG_CRYPT] || - xfrma[XFRMA_ALG_COMP]) + if (!attrs[XFRMA_ALG_AUTH] || + attrs[XFRMA_ALG_CRYPT] || + attrs[XFRMA_ALG_COMP]) goto out; break; case IPPROTO_ESP: - if ((!xfrma[XFRMA_ALG_AUTH] -!xfrma[XFRMA_ALG_CRYPT]) || - xfrma[XFRMA_ALG_COMP]) + if ((!attrs[XFRMA_ALG_AUTH] +!attrs[XFRMA_ALG_CRYPT]) || + attrs[XFRMA_ALG_COMP]) goto out; break; case IPPROTO_COMP: - if (!xfrma[XFRMA_ALG_COMP] || - 
xfrma[XFRMA_ALG_AUTH] || - xfrma[XFRMA_ALG_CRYPT]) + if (!attrs[XFRMA_ALG_COMP] || + attrs[XFRMA_ALG_AUTH] || + attrs[XFRMA_ALG_CRYPT]) goto out; break; #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) case IPPROTO_DSTOPTS: case IPPROTO_ROUTING: - if (xfrma[XFRMA_ALG_COMP] || - xfrma[XFRMA_ALG_AUTH] || - xfrma[XFRMA_ALG_CRYPT] || - xfrma[XFRMA_ENCAP] || - xfrma[XFRMA_SEC_CTX]|| - !xfrma[XFRMA_COADDR]) + if (attrs[XFRMA_ALG_COMP] || + attrs[XFRMA_ALG_AUTH] || + attrs[XFRMA_ALG_CRYPT] || + attrs[XFRMA_ENCAP] || + attrs[XFRMA_SEC_CTX]|| + !attrs[XFRMA_COADDR]) goto out; break; #endif @@ -162,13 +162,13 @@ static int verify_newsa_info(struct xfrm goto out; } - if ((err = verify_one_alg(xfrma, XFRMA_ALG_AUTH))) + if ((err = verify_one_alg(attrs, XFRMA_ALG_AUTH))) goto out; - if ((err = verify_one_alg(xfrma, XFRMA_ALG_CRYPT))) + if ((err = verify_one_alg(attrs, XFRMA_ALG_CRYPT))) goto out; - if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP))) + if ((err = verify_one_alg(attrs, XFRMA_ALG_COMP))) goto out; - if ((err = verify_sec_ctx_len(xfrma))) + if ((err = verify_sec_ctx_len(attrs))) goto out; err = -EINVAL; @@ -298,12 +298,12 @@ static void copy_from_user_state(struct * somehow made shareable and move it to xfrm_state.c - JHS * */ -static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma) +static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs) { - struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL]; - struct rtattr *lt = xfrma[XFRMA_LTIME_VAL]; - struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH]; - struct rtattr *rt = xfrma[XFRMA_REPLAY_THRESH]; + struct rtattr *rp = attrs[XFRMA_REPLAY_VAL]; +
[PATCH 10/16] [XFRM] netlink: Establish an attribute policy
Adds a policy defining the minimal payload lengths for all the attributes,
allowing most attribute validation checks to be removed from the middle of
the code path. This makes updates more consistent, as many format errors are
recognised earlier, before any changes have been attempted.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:31:04.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:31:56.0 +0200
@@ -42,19 +42,12 @@ static int verify_one_alg(struct rtattr
 {
 	struct rtattr *rt = xfrma[type - 1];
 	struct xfrm_algo *algp;
-	int len;

 	if (!rt)
 		return 0;

-	len = (rt->rta_len - sizeof(*rt)) - sizeof(*algp);
-	if (len < 0)
-		return -EINVAL;
-
 	algp = RTA_DATA(rt);
-
-	len -= (algp->alg_key_len + 7U) / 8;
-	if (len < 0)
+	if (RTA_PAYLOAD(rt) < alg_len(algp))
 		return -EINVAL;

 	switch (type) {
@@ -82,55 +75,25 @@ static int verify_one_alg(struct rtattr
 	return 0;
 }

-static int verify_encap_tmpl(struct rtattr **xfrma)
-{
-	struct rtattr *rt = xfrma[XFRMA_ENCAP - 1];
-	struct xfrm_encap_tmpl *encap;
-
-	if (!rt)
-		return 0;
-
-	if ((rt->rta_len - sizeof(*rt)) < sizeof(*encap))
-		return -EINVAL;
-
-	return 0;
-}
-
-static int verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
 			   xfrm_address_t **addrp)
 {
 	struct rtattr *rt = xfrma[type - 1];

-	if (!rt)
-		return 0;
-
-	if ((rt->rta_len - sizeof(*rt)) < sizeof(**addrp))
-		return -EINVAL;
-
-	if (addrp)
+	if (rt && addrp)
 		*addrp = RTA_DATA(rt);
-
-	return 0;
 }

 static inline int verify_sec_ctx_len(struct rtattr **xfrma)
 {
 	struct rtattr *rt = xfrma[XFRMA_SEC_CTX - 1];
 	struct xfrm_user_sec_ctx *uctx;
-	int len = 0;

 	if (!rt)
 		return 0;

-	if (rt->rta_len < sizeof(*uctx))
-		return -EINVAL;
-
 	uctx = RTA_DATA(rt);
-
-	len += sizeof(struct xfrm_user_sec_ctx);
-	len += uctx->ctx_len;
-
-	if (uctx->len != len)
+	if (uctx->len != (sizeof(struct xfrm_user_sec_ctx) + uctx->ctx_len))
 		return -EINVAL;

 	return 0;
@@ -205,12 +168,8 @@ static int verify_newsa_info(struct xfrm
 		goto out;
 	if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP)))
 		goto out;
-	if ((err = verify_encap_tmpl(xfrma)))
-		goto out;
 	if ((err = verify_sec_ctx_len(xfrma)))
 		goto out;
-	if ((err = verify_one_addr(xfrma, XFRMA_COADDR, NULL)))
-		goto out;

 	err = -EINVAL;
 	switch (p->mode) {
@@ -339,9 +298,8 @@ static void copy_from_user_state(struct
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static int xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
 {
-	int err = -EINVAL;
 	struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
 	struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
 	struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH-1];
@@ -349,8 +307,6 @@ static int xfrm_update_ae_params(struct

 	if (rp) {
 		struct xfrm_replay_state *replay;
-		if (RTA_PAYLOAD(rp) < sizeof(*replay))
-			goto error;
 		replay = RTA_DATA(rp);
 		memcpy(&x->replay, replay, sizeof(*replay));
 		memcpy(&x->preplay, replay, sizeof(*replay));
@@ -358,8 +314,6 @@ static int xfrm_update_ae_params(struct

 	if (lt) {
 		struct xfrm_lifetime_cur *ltime;
-		if (RTA_PAYLOAD(lt) < sizeof(*ltime))
-			goto error;
 		ltime = RTA_DATA(lt);
 		x->curlft.bytes = ltime->bytes;
 		x->curlft.packets = ltime->packets;
@@ -367,21 +321,11 @@ static int xfrm_update_ae_params(struct
 		x->curlft.use_time = ltime->use_time;
 	}

-	if (et) {
-		if (RTA_PAYLOAD(et) < sizeof(u32))
-			goto error;
+	if (et)
 		x->replay_maxage = *(u32*)RTA_DATA(et);
-	}

-	if (rt) {
-		if (RTA_PAYLOAD(rt) < sizeof(u32))
-			goto error;
+	if (rt)
 		x->replay_maxdiff = *(u32*)RTA_DATA(rt);
-	}
-
-	return 0;
-error:
-	return err;
 }

 static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p,
@@ -429,9 +373,7 @@ static struct xfrm_state *xfrm_state_con

 	/* override default values from above */
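The idea behind this patch can be sketched in userspace: instead of length checks scattered through each handler, a single table maps attribute type to a minimum payload length, and one loop rejects short attributes before any state is touched. All names here (`xfrma_policy_min`, `check_attrs`, the sizes) are illustrative stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

enum { ATTR_UNSPEC, ATTR_ENCAP, ATTR_COADDR, ATTR_MAX };

struct attr { int type; size_t payload_len; };

/* minimal payload length required per attribute type (stand-in values) */
static const size_t xfrma_policy_min[ATTR_MAX] = {
	[ATTR_ENCAP]  = 24,	/* stand-in for sizeof(struct xfrm_encap_tmpl) */
	[ATTR_COADDR] = 16,	/* stand-in for sizeof(xfrm_address_t) */
};

/* Returns 0 if every present attribute meets its minimum, -1 otherwise. */
static int check_attrs(const struct attr *attrs, int n)
{
	for (int i = 0; i < n; i++) {
		int t = attrs[i].type;

		if (t > 0 && t < ATTR_MAX &&
		    attrs[i].payload_len < xfrma_policy_min[t])
			return -1;	/* format error caught up front */
	}
	return 0;
}
```

With a table like this, per-handler helpers such as verify_encap_tmpl() above become redundant, which is exactly what the diff removes.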
[PATCH 14/16] [XFRM] netlink: Use nla_memcpy() in xfrm_update_ae_params()
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:35:13.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:36:59.0 +0200
@@ -303,20 +303,12 @@ static void xfrm_update_ae_params(struct
 	struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];

 	if (rp) {
-		struct xfrm_replay_state *replay;
-		replay = nla_data(rp);
-		memcpy(&x->replay, replay, sizeof(*replay));
-		memcpy(&x->preplay, replay, sizeof(*replay));
+		nla_memcpy(&x->replay, rp, sizeof(x->replay));
+		nla_memcpy(&x->preplay, rp, sizeof(x->preplay));
 	}

-	if (lt) {
-		struct xfrm_lifetime_cur *ltime;
-		ltime = nla_data(lt);
-		x->curlft.bytes = ltime->bytes;
-		x->curlft.packets = ltime->packets;
-		x->curlft.add_time = ltime->add_time;
-		x->curlft.use_time = ltime->use_time;
-	}
+	if (lt)
+		nla_memcpy(&x->curlft, lt, sizeof(x->curlft));

 	if (et)
 		x->replay_maxage = nla_get_u32(et);
--
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
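The contract this conversion relies on is that nla_memcpy() copies at most min(destination size, attribute payload length), so a short attribute can no longer overrun the destination and no open-coded memcpy() of a raw pointer is needed. A userspace model of that behaviour (the kernel's nla_memcpy() takes a struct nlattr * rather than a raw payload):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy min(dst_size, payload_len) bytes and return the count copied,
 * mirroring the clamping the kernel helper performs. */
static int model_nla_memcpy(void *dst, size_t dst_size,
			    const void *payload, size_t payload_len)
{
	size_t n = payload_len < dst_size ? payload_len : dst_size;

	memcpy(dst, payload, n);
	return (int)n;
}
```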
[PATCH 13/16] [XFRM] netlink: Use nlattr instead of rtattr
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:34:29.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:35:13.0 +0200
@@ -38,16 +38,16 @@ static inline int alg_len(struct xfrm_al
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }

-static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type)
+static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
-	struct rtattr *rt = attrs[type];
+	struct nlattr *rt = attrs[type];
 	struct xfrm_algo *algp;

 	if (!rt)
 		return 0;

-	algp = RTA_DATA(rt);
-	if (RTA_PAYLOAD(rt) < alg_len(algp))
+	algp = nla_data(rt);
+	if (nla_len(rt) < alg_len(algp))
 		return -EINVAL;

 	switch (type) {
@@ -75,24 +75,24 @@ static int verify_one_alg(struct rtattr
 	return 0;
 }

-static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct nlattr **attrs, enum xfrm_attr_type_t type,
 			    xfrm_address_t **addrp)
 {
-	struct rtattr *rt = attrs[type];
+	struct nlattr *rt = attrs[type];

 	if (rt && addrp)
-		*addrp = RTA_DATA(rt);
+		*addrp = nla_data(rt);
 }

-static inline int verify_sec_ctx_len(struct rtattr **attrs)
+static inline int verify_sec_ctx_len(struct nlattr **attrs)
 {
-	struct rtattr *rt = attrs[XFRMA_SEC_CTX];
+	struct nlattr *rt = attrs[XFRMA_SEC_CTX];
 	struct xfrm_user_sec_ctx *uctx;

 	if (!rt)
 		return 0;

-	uctx = RTA_DATA(rt);
+	uctx = nla_data(rt);
 	if (uctx->len != (sizeof(struct xfrm_user_sec_ctx) + uctx->ctx_len))
 		return -EINVAL;

@@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str
 static int verify_newsa_info(struct xfrm_usersa_info *p,
-			     struct rtattr **attrs)
+			     struct nlattr **attrs)
 {
 	int err;

@@ -191,16 +191,15 @@ out:
 static int attach_one_algo(struct xfrm_algo **algpp, u8 *props,
 			   struct xfrm_algo_desc *(*get_byname)(char *, int),
-			   struct rtattr *u_arg)
+			   struct nlattr *rta)
 {
-	struct rtattr *rta = u_arg;
 	struct xfrm_algo *p, *ualg;
 	struct xfrm_algo_desc *algo;

 	if (!rta)
 		return 0;

-	ualg = RTA_DATA(rta);
+	ualg = nla_data(rta);

 	algo = get_byname(ualg->alg_name, 1);
 	if (!algo)
@@ -216,15 +215,14 @@ static int attach_one_algo(struct xfrm_a
 	return 0;
 }

-static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct rtattr *u_arg)
+static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr *rta)
 {
-	struct rtattr *rta = u_arg;
 	struct xfrm_encap_tmpl *p, *uencap;

 	if (!rta)
 		return 0;

-	uencap = RTA_DATA(rta);
+	uencap = nla_data(rta);
 	p = kmemdup(uencap, sizeof(*p), GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
@@ -245,26 +243,25 @@ static inline int xfrm_user_sec_ctx_size
 	return len;
 }

-static int attach_sec_ctx(struct xfrm_state *x, struct rtattr *u_arg)
+static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg)
 {
 	struct xfrm_user_sec_ctx *uctx;

 	if (!u_arg)
 		return 0;

-	uctx = RTA_DATA(u_arg);
+	uctx = nla_data(u_arg);
 	return security_xfrm_state_alloc(x, uctx);
 }

-static int attach_one_addr(xfrm_address_t **addrpp, struct rtattr *u_arg)
+static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta)
 {
-	struct rtattr *rta = u_arg;
 	xfrm_address_t *p, *uaddrp;

 	if (!rta)
 		return 0;

-	uaddrp = RTA_DATA(rta);
+	uaddrp = nla_data(rta);
 	p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
@@ -298,23 +295,23 @@ static void copy_from_user_state(struct
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct nlattr **attrs)
 {
-	struct rtattr *rp = attrs[XFRMA_REPLAY_VAL];
-	struct rtattr *lt = attrs[XFRMA_LTIME_VAL];
-	struct rtattr *et = attrs[XFRMA_ETIMER_THRESH];
-	struct rtattr *rt = attrs[XFRMA_REPLAY_THRESH];
+	struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
+	struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+	struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+	struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];

 	if (rp) {
 		struct xfrm_replay_state *replay;
-		replay = RTA_DATA(rp);
[PATCH 05/16] [XFRM] netlink: Use nla_put()/NLA_PUT() variants
Also makes use of copy_sec_ctx() in another place and removes duplicated code.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:15:03.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:16:03.0 +0200
@@ -576,6 +576,27 @@ struct xfrm_dump_info {
 	int this_idx;
 };

+static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
+{
+	int ctx_size = sizeof(struct xfrm_sec_ctx) + s->ctx_len;
+	struct xfrm_user_sec_ctx *uctx;
+	struct nlattr *attr;
+
+	attr = nla_reserve(skb, XFRMA_SEC_CTX, ctx_size);
+	if (attr == NULL)
+		return -EMSGSIZE;
+
+	uctx = nla_data(attr);
+	uctx->exttype = XFRMA_SEC_CTX;
+	uctx->len = ctx_size;
+	uctx->ctx_doi = s->ctx_doi;
+	uctx->ctx_alg = s->ctx_alg;
+	uctx->ctx_len = s->ctx_len;
+	memcpy(uctx + 1, s->ctx_str, s->ctx_len);
+
+	return 0;
+}
+
 static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
 {
 	struct xfrm_dump_info *sp = ptr;
@@ -596,43 +617,32 @@ static int dump_one_state(struct xfrm_st
 	copy_to_user_state(x, p);

 	if (x->aalg)
-		RTA_PUT(skb, XFRMA_ALG_AUTH,
+		NLA_PUT(skb, XFRMA_ALG_AUTH,
			sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg);
 	if (x->ealg)
-		RTA_PUT(skb, XFRMA_ALG_CRYPT,
+		NLA_PUT(skb, XFRMA_ALG_CRYPT,
			sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg);
 	if (x->calg)
-		RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
+		NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);

 	if (x->encap)
-		RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);
+		NLA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);

-	if (x->security) {
-		int ctx_size = sizeof(struct xfrm_sec_ctx) +
-				x->security->ctx_len;
-		struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-		struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-		uctx->exttype = XFRMA_SEC_CTX;
-		uctx->len = ctx_size;
-		uctx->ctx_doi = x->security->ctx_doi;
-		uctx->ctx_alg = x->security->ctx_alg;
-		uctx->ctx_len = x->security->ctx_len;
-		memcpy(uctx + 1, x->security->ctx_str, x->security->ctx_len);
-	}
+	if (x->security && copy_sec_ctx(x->security, skb) < 0)
+		goto nla_put_failure;

 	if (x->coaddr)
-		RTA_PUT(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
+		NLA_PUT(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);

 	if (x->lastused)
-		RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x->lastused), &x->lastused);
+		NLA_PUT_U64(skb, XFRMA_LASTUSED, x->lastused);

 	nlmsg_end(skb, nlh);
out:
 	sp->this_idx++;
 	return 0;

-rtattr_failure:
+nla_put_failure:
 	nlmsg_cancel(skb, nlh);
 	return -EMSGSIZE;
 }
@@ -1193,32 +1203,9 @@ static int copy_to_user_tmpl(struct xfrm
 		up->ealgos = kp->ealgos;
 		up->calgos = kp->calgos;
 	}
-	RTA_PUT(skb, XFRMA_TMPL,
-		(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr),
-		vec);
-
-	return 0;
-
-rtattr_failure:
-	return -1;
-}
-
-static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
-{
-	int ctx_size = sizeof(struct xfrm_sec_ctx) + s->ctx_len;
-	struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-	struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-	uctx->exttype = XFRMA_SEC_CTX;
-	uctx->len = ctx_size;
-	uctx->ctx_doi = s->ctx_doi;
-	uctx->ctx_alg = s->ctx_alg;
-	uctx->ctx_len = s->ctx_len;
-	memcpy(uctx + 1, s->ctx_str, s->ctx_len);
-	return 0;
-
-rtattr_failure:
-	return -1;
+	return nla_put(skb, XFRMA_TMPL,
+		       sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr, vec);
 }

 static inline int copy_to_user_state_sec_ctx(struct xfrm_state *x, struct sk_buff *skb)
@@ -1240,17 +1227,11 @@ static inline int copy_to_user_sec_ctx(s
 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
 {
-	struct xfrm_userpolicy_type upt;
+	struct xfrm_userpolicy_type upt = {
+		.type = type,
+	};

-	memset(&upt, 0, sizeof(upt));
-	upt.type = type;
-
-	RTA_PUT(skb, XFRMA_POLICY_TYPE, sizeof(upt), &upt);
-
-	return 0;
-
-rtattr_failure:
-	return -1;
+	return nla_put(skb, XFRMA_POLICY_TYPE, sizeof(upt), &upt);
 }

 #else
@@ -1440,7 +1421,6 @@ static int build_aevent(struct sk_buff *
 {
 	struct xfrm_aevent_id *id;
 	struct nlmsghdr *nlh;
-	struct xfrm_lifetime_cur ltime;
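Under the nla_put()/nla_reserve() calls used above, message space is accounted with 4-byte alignment per attribute. A sketch of that arithmetic, mirroring the standard netlink constants (the `MODEL_` names are this sketch's, not the kernel's):

```c
#include <assert.h>

#define MODEL_NLA_ALIGNTO	4
#define MODEL_NLA_ALIGN(len)	(((len) + MODEL_NLA_ALIGNTO - 1) & \
				 ~(MODEL_NLA_ALIGNTO - 1))
#define MODEL_NLA_HDRLEN	4	/* struct nlattr: u16 len + u16 type */

/* total space one attribute with 'payload' bytes consumes in a message */
static int model_nla_total_size(int payload)
{
	return MODEL_NLA_ALIGN(MODEL_NLA_HDRLEN + payload);
}
```

This is why a helper like nla_reserve() can hand back a properly aligned pointer for the caller to fill in, as copy_sec_ctx() does above.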
[PATCH 04/16] [XFRM] netlink: Use nlmsg_broadcast() and nlmsg_unicast()
This simplifies successful return codes from >0 to 0.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:13:57.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:15:03.0 +0200
@@ -800,8 +800,7 @@ static int xfrm_get_sa(struct sk_buff *s
 	if (IS_ERR(resp_skb)) {
 		err = PTR_ERR(resp_skb);
 	} else {
-		err = netlink_unicast(xfrm_nl, resp_skb,
-				      NETLINK_CB(skb).pid, MSG_DONTWAIT);
+		err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
 	}
 	xfrm_state_put(x);
out_noput:
@@ -882,8 +881,7 @@ static int xfrm_alloc_userspi(struct sk_
 		goto out;
 	}

-	err = netlink_unicast(xfrm_nl, resp_skb,
-			      NETLINK_CB(skb).pid, MSG_DONTWAIT);
+	err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);

out:
 	xfrm_state_put(x);
@@ -1393,9 +1391,8 @@ static int xfrm_get_policy(struct sk_buf
 		if (IS_ERR(resp_skb)) {
 			err = PTR_ERR(resp_skb);
 		} else {
-			err = netlink_unicast(xfrm_nl, resp_skb,
-					      NETLINK_CB(skb).pid,
-					      MSG_DONTWAIT);
+			err = nlmsg_unicast(xfrm_nl, resp_skb,
+					    NETLINK_CB(skb).pid);
 		}
 	} else {
 		xfrm_audit_log(NETLINK_CB(skb).loginuid, NETLINK_CB(skb).sid,
@@ -1525,8 +1522,7 @@ static int xfrm_get_ae(struct sk_buff *s
 	if (build_aevent(r_skb, x, &c) < 0)
 		BUG();

-	err = netlink_unicast(xfrm_nl, r_skb,
-			      NETLINK_CB(skb).pid, MSG_DONTWAIT);
+	err = nlmsg_unicast(xfrm_nl, r_skb, NETLINK_CB(skb).pid);
 	spin_unlock_bh(&x->lock);
 	xfrm_state_put(x);
 	return err;
@@ -1903,9 +1899,7 @@ static int xfrm_send_migrate(struct xfrm
 	if (build_migrate(skb, m, num_migrate, sel, dir, type) < 0)
 		BUG();

-	NETLINK_CB(skb).dst_group = XFRMNLGRP_MIGRATE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE,
-				 GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE, GFP_ATOMIC);
 }
 #else
 static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2061,8 +2055,7 @@ static int xfrm_exp_state_notify(struct
 	if (build_expire(skb, x, c) < 0)
 		BUG();

-	NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }

 static int xfrm_aevent_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2079,8 +2072,7 @@ static int xfrm_aevent_state_notify(stru
 	if (build_aevent(skb, x, c) < 0)
 		BUG();

-	NETLINK_CB(skb).dst_group = XFRMNLGRP_AEVENTS;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC);
 }

 static int xfrm_notify_sa_flush(struct km_event *c)
@@ -2105,8 +2097,7 @@ static int xfrm_notify_sa_flush(struct k
 	nlmsg_end(skb, nlh);

-	NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 }

 static inline int xfrm_sa_len(struct xfrm_state *x)
@@ -2175,8 +2166,7 @@ static int xfrm_notify_sa(struct xfrm_st
 	nlmsg_end(skb, nlh);

-	NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);

nlmsg_failure:
rtattr_failure:
@@ -2262,8 +2252,7 @@ static int xfrm_send_acquire(struct xfrm
 	if (build_acquire(skb, x, xt, xp, dir) < 0)
 		BUG();

-	NETLINK_CB(skb).dst_group = XFRMNLGRP_ACQUIRE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, GFP_ATOMIC);
 }

 /* User gives us xfrm_user_policy_info followed by an array of 0
@@ -2371,8 +2360,7 @@ static int xfrm_exp_policy_notify(struct
 	if (build_polexpire(skb, xp, dir, c) < 0)
 		BUG();

-	NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }

 static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, struct km_event *c)
@@ -2423,8 +2411,7 @@ static
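The "simplifies successful return codes" point is that netlink_unicast() returns the byte count on success, while the nlmsg_unicast() wrapper folds any positive result to 0, so callers only ever test for negative errors. A minimal model of that normalization:

```c
#include <assert.h>

/* Model of the wrapper's return-code folding: positive byte counts from
 * the underlying send become 0, negative errno values pass through. */
static int model_nlmsg_unicast(int netlink_unicast_ret)
{
	if (netlink_unicast_ret > 0)
		return 0;		/* success: hide the byte count */
	return netlink_unicast_ret;	/* error (< 0) passes through */
}
```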
[PATCH 03/16] [XFRM] netlink: Use nlmsg_data() instead of NLMSG_DATA()
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:12:20.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:13:57.0 +0200
@@ -443,7 +443,7 @@ error_no_put:
 static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
		struct rtattr **xfrma)
 {
-	struct xfrm_usersa_info *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_info *p = nlmsg_data(nlh);
 	struct xfrm_state *x;
 	int err;
 	struct km_event c;
@@ -520,7 +520,7 @@ static int xfrm_del_sa(struct sk_buff *s
 	struct xfrm_state *x;
 	int err = -ESRCH;
 	struct km_event c;
-	struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_id *p = nlmsg_data(nlh);

 	x = xfrm_user_state_lookup(p, xfrma, &err);
 	if (x == NULL)
@@ -592,7 +592,7 @@ static int dump_one_state(struct xfrm_st
 	if (nlh == NULL)
 		return -EMSGSIZE;

-	p = NLMSG_DATA(nlh);
+	p = nlmsg_data(nlh);
 	copy_to_user_state(x, p);

 	if (x->aalg)
@@ -715,7 +715,7 @@ static int xfrm_get_spdinfo(struct sk_bu
		struct rtattr **xfrma)
 {
 	struct sk_buff *r_skb;
-	u32 *flags = NLMSG_DATA(nlh);
+	u32 *flags = nlmsg_data(nlh);
 	u32 spid = NETLINK_CB(skb).pid;
 	u32 seq = nlh->nlmsg_seq;
 	int len = NLMSG_LENGTH(sizeof(u32));
@@ -765,7 +765,7 @@ static int xfrm_get_sadinfo(struct sk_bu
		struct rtattr **xfrma)
 {
 	struct sk_buff *r_skb;
-	u32 *flags = NLMSG_DATA(nlh);
+	u32 *flags = nlmsg_data(nlh);
 	u32 spid = NETLINK_CB(skb).pid;
 	u32 seq = nlh->nlmsg_seq;
 	int len = NLMSG_LENGTH(sizeof(u32));
@@ -787,7 +787,7 @@ static int xfrm_get_sadinfo(struct sk_bu
 static int xfrm_get_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
		struct rtattr **xfrma)
 {
-	struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_id *p = nlmsg_data(nlh);
 	struct xfrm_state *x;
 	struct sk_buff *resp_skb;
 	int err = -ESRCH;
@@ -841,7 +841,7 @@ static int xfrm_alloc_userspi(struct sk_
 	int family;
 	int err;

-	p = NLMSG_DATA(nlh);
+	p = nlmsg_data(nlh);
 	err = verify_userspi_info(p);
 	if (err)
 		goto out_noput;
@@ -1130,7 +1130,7 @@ static struct xfrm_policy *xfrm_policy_c
 static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
		struct rtattr **xfrma)
 {
-	struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh);
+	struct xfrm_userpolicy_info *p = nlmsg_data(nlh);
 	struct xfrm_policy *xp;
 	struct km_event c;
 	int err;
@@ -1277,8 +1277,8 @@ static int dump_one_policy(struct xfrm_p
			XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
-	p = NLMSG_DATA(nlh);
+	p = nlmsg_data(nlh);
 	copy_to_user_policy(xp, p, dir);
 	if (copy_to_user_tmpl(xp, skb) < 0)
 		goto nlmsg_failure;
@@ -1351,7 +1351,7 @@ static int xfrm_get_policy(struct sk_buf
 	struct km_event c;
 	int delete;

-	p = NLMSG_DATA(nlh);
+	p = nlmsg_data(nlh);
 	delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY;

 	err = copy_from_user_policy_type(&type, xfrma);
@@ -1420,7 +1420,7 @@ static int xfrm_flush_sa(struct sk_buff
		struct rtattr **xfrma)
 {
 	struct km_event c;
-	struct xfrm_usersa_flush *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_flush *p = nlmsg_data(nlh);
 	struct xfrm_audit audit_info;
 	int err;
@@ -1448,8 +1448,8 @@ static int build_aevent(struct sk_buff *
 	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
 	if (nlh == NULL)
 		return -EMSGSIZE;
-	id = NLMSG_DATA(nlh);
+	id = nlmsg_data(nlh);
 	memcpy(&id->sa_id.daddr, &x->id.daddr, sizeof(x->id.daddr));
 	id->sa_id.spi = x->id.spi;
 	id->sa_id.family = x->props.family;
@@ -1490,7 +1490,7 @@ static int xfrm_get_ae(struct sk_buff *s
 	struct sk_buff *r_skb;
 	int err;
 	struct km_event c;
-	struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+	struct xfrm_aevent_id *p = nlmsg_data(nlh);
 	int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id));
 	struct xfrm_usersa_id *id = &p->sa_id;
@@ -1538,7 +1538,7 @@ static int xfrm_new_ae(struct sk_buff *s
 	struct xfrm_state *x;
 	struct km_event c;
 	int err = -EINVAL;
-	struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+	struct xfrm_aevent_id *p = nlmsg_data(nlh);
 	struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
 	struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
@@ -1602,7 +1602,7 @@ static int xfrm_add_pol_expire(struct sk
		struct rtattr
[PATCH 15/16] [XFRM] netlink: Remove dependency on rtnetlink
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:36:59.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:37:18.0 +0200
@@ -19,7 +19,6 @@
 #include <linux/string.h>
 #include <linux/net.h>
 #include <linux/skbuff.h>
-#include <linux/rtnetlink.h>
 #include <linux/pfkeyv2.h>
 #include <linux/ipsec.h>
 #include <linux/init.h>
--
[PATCH 07/16] [XFRM] netlink: Clear up some of the CONFIG_XFRM_SUB_POLICY ifdef mess
Moves all of the SUB_POLICY ifdefs related to the attribute size calculation
into a function.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:03:43.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:04:46.0 +0200
@@ -1224,6 +1224,14 @@ static inline int copy_to_user_sec_ctx(s
 	}
 	return 0;
 }

+static inline size_t userpolicy_type_attrsize(void)
+{
+#ifdef CONFIG_XFRM_SUB_POLICY
+	return nla_total_size(sizeof(struct xfrm_userpolicy_type));
+#else
+	return 0;
+#endif
+}

 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
@@ -1857,9 +1865,7 @@ static int xfrm_send_migrate(struct xfrm
 	len = RTA_SPACE(sizeof(struct xfrm_user_migrate) * num_migrate);
 	len += NLMSG_SPACE(sizeof(struct xfrm_userpolicy_id));
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	skb = alloc_skb(len, GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
@@ -2214,9 +2220,7 @@ static int xfrm_send_acquire(struct xfrm
 	len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
 	len += NLMSG_SPACE(sizeof(struct xfrm_user_acquire));
 	len += RTA_SPACE(xfrm_user_sec_ctx_size(x->security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	skb = alloc_skb(len, GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
@@ -2322,9 +2326,7 @@ static int xfrm_exp_policy_notify(struct
 	len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
 	len += NLMSG_SPACE(sizeof(struct xfrm_user_polexpire));
 	len += RTA_SPACE(xfrm_user_sec_ctx_size(xp->security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	skb = alloc_skb(len, GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
@@ -2349,9 +2351,7 @@ static int xfrm_notify_policy(struct xfr
 		len += RTA_SPACE(headlen);
 		headlen = sizeof(*id);
 	}
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	len += NLMSG_SPACE(headlen);

 	skb = alloc_skb(len, GFP_ATOMIC);
@@ -2401,9 +2401,7 @@ static int xfrm_notify_policy_flush(stru
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
 	int len = 0;
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	len += NLMSG_LENGTH(0);

 	skb = alloc_skb(len, GFP_ATOMIC);
--
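The refactoring pattern here is worth noting on its own: collapse repeated #ifdef blocks into one inline helper that returns the attribute's size when the option is compiled in and 0 otherwise, so every caller adds the result unconditionally. A compilable sketch, with MODEL_SUB_POLICY standing in for CONFIG_XFRM_SUB_POLICY and a fixed size standing in for nla_total_size():

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SUB_POLICY	1	/* flip to 0 to model the option off */
#define MODEL_POLICY_TYPE_SIZE	8	/* stand-in for nla_total_size(...) */

static inline size_t model_userpolicy_type_attrsize(void)
{
#if MODEL_SUB_POLICY
	return MODEL_POLICY_TYPE_SIZE;
#else
	return 0;
#endif
}

/* Callers now just accumulate, with no conditional compilation at the
 * call site: */
static size_t model_msg_len(size_t base)
{
	size_t len = base;

	len += model_userpolicy_type_attrsize();
	return len;
}
```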
Re: [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching
On Wed, 22 Aug 2007 13:58:58 +0530 Krishna Kumar wrote:

> Add Documentation describing batching skb xmit capability.
>
> Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
> ---
>  batching_skb_xmit.txt |   78 ++
>  1 files changed, 78 insertions(+)
>
> diff -ruNp org/Documentation/networking/batching_skb_xmit.txt new/Documentation/networking/batching_skb_xmit.txt
> --- org/Documentation/networking/batching_skb_xmit.txt	1970-01-01 05:30:00.0 +0530
> +++ new/Documentation/networking/batching_skb_xmit.txt	2007-08-22 10:21:19.0 +0530
> @@ -0,0 +1,78 @@
> +		HOWTO for batching skb xmit support
> +		---
> +
> +Section 1: What is batching skb xmit
> +Section 2: How batching xmit works vs the regular xmit
> +Section 3: How drivers can support batching
> +Section 4: How users can work with batching
> +
> +
> +Introduction: Kernel support for batching skb
> +--
> +
> +A new capability to support xmit of multiple skbs is provided in the netdevice
> +layer. Drivers which enable this capability should be able to process multiple
> +skbs in a single call to their xmit handler.
> +
> +
> +Section 1: What is batching skb xmit
> +-
> +
> +	This capability is optionally enabled by a driver by setting the
> +	NETIF_F_BATCH_SKBS bit in dev->features. The pre-requisite for a

"a prerequisite"

> +	driver to use this capability is that it should have a reasonably

I would say "reasonably-sized".

> +	sized hardware queue that can process multiple skbs.
> +
> +
> +Section 2: How batching xmit works vs the regular xmit
> +---
> +
> +	The network stack gets called from upper layer protocols with a single
> +	skb to transmit. This skb is first enqueue'd and an attempt is made to

"enqueued"

> +	transmit it immediately (via qdisc_run). However, events like tx lock
> +	contention, tx queue stopped, etc, can result in the skb not getting

"etc.,"

> +	sent out and it remains in the queue. When the next xmit is called or
> +	when the queue is re-enabled, qdisc_run could potentially find
> +	multiple packets in the queue, and iteratively send them all out
> +	one-by-one.
> +
> +	Batching skb xmit is a mechanism to exploit this situation where all
> +	skbs can be passed in one shot to the device. This reduces driver
> +	processing, locking at the driver (or in stack for ~LLTX drivers)
> +	gets amortized over multiple skbs, and in case of specific drivers
> +	where every xmit results in a completion processing (like IPoIB) -
> +	optimizations can be made in the driver to request a completion for
> +	only the last skb that was sent which results in saving interrupts
> +	for every (but the last) skb that was sent in the same batch.
> +
> +	Batching can result in significant performance gains for systems that
> +	have multiple data stream paths over the same network interface card.
> +
> +
> +Section 3: How drivers can support batching
> +-
> +
> +	Batching requires the driver to set the NETIF_F_BATCH_SKBS bit in
> +	dev->features.
> +
> +	The driver's xmit handler should be modified to process multiple skbs
> +	instead of one skb. The driver's xmit handler is called either with a

"an"

> +	skb to transmit or NULL skb, where the latter case should be handled
> +	as a call to xmit multiple skbs. This is done by sending out all skbs
> +	in the dev->skb_blist list (where it was added by the core stack).
> +
> +
> +Section 4: How users can work with batching
> +-
> +
> +	Batching can be disabled for a particular device, e.g. on desktop
> +	systems if only one stream of network activity for that device is
> +	taking place, since performance could be slightly affected due to
> +	extra processing that batching adds (unless packets are getting
> +	sent fast resulting in stopped queue's). Batching can be enabled if

"queues)."

> +	more than one stream of network activity per device is being done,
> +	e.g. on servers; or even desktop usage with multiple browser, chat,
> +	file transfer sessions, etc.
> +
> +	Per device batching can be enabled/disabled by passing 'on' or 'off'
> +	respectively to ethtool.

with what other parameter(s)? e.g., "ethtool <dev> batching on/off"?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
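The dispatch shape Section 3 of the reviewed document describes (a single-skb call versus a NULL call meaning "drain the batch list") can be sketched in plain C. Every type and name here (fake_skb, fake_dev, hw_tx_one) is illustrative, not a kernel one, and the proposed NETIF_F_BATCH_SKBS API itself never went upstream in this form:

```c
#include <assert.h>
#include <stddef.h>

struct fake_skb { struct fake_skb *next; };
struct fake_dev { struct fake_skb *skb_blist; int tx_count; };

static void hw_tx_one(struct fake_dev *dev, struct fake_skb *skb)
{
	(void)skb;
	dev->tx_count++;	/* stand-in for programming a TX descriptor */
}

/* xmit handler: a single skb, or NULL meaning "drain the batch list" */
static int fake_xmit(struct fake_dev *dev, struct fake_skb *skb)
{
	if (skb) {			/* regular single-skb case */
		hw_tx_one(dev, skb);
		return 0;
	}

	while (dev->skb_blist) {	/* batch case: send the whole list */
		struct fake_skb *next = dev->skb_blist->next;

		hw_tx_one(dev, dev->skb_blist);
		dev->skb_blist = next;
	}
	return 0;
}
```

The point of the batch branch is that any per-call cost (lock acquisition, doorbell write, completion request) is paid once for the whole list rather than once per skb.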
Re: [2.6.20.17 review 35/58] forcedeth bug fix: realtek phy
On Wed, Aug 22, 2007 at 11:56:42AM -0400, Chuck Ebbert wrote:
> On 08/22/2007 05:39 AM, Willy Tarreau wrote:
>> This patch contains errata fixes for the realtek phy. It only renamed
>> the defines to be phy specific.
>>
>> Signed-off-by: Ayaz Abdulla [EMAIL PROTECTED]
>> Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED]
>> Signed-off-by: Willy Tarreau [EMAIL PROTECTED]
>> ---
>>  drivers/net/forcedeth.c |   54 +++
>>  1 files changed, 54 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
>> index c383dc3..dbfdbed 100644
>> --- a/drivers/net/forcedeth.c
>> +++ b/drivers/net/forcedeth.c
>> @@ -554,6 +554,7 @@ union ring_type {
>>  #define PHY_OUI_MARVELL	0x5043
>>  #define PHY_OUI_CICADA	0x03f1
>>  #define PHY_OUI_VITESSE	0x01c1
>> +#define PHY_OUI_REALTEK	0x01c1
>>  #define PHYID1_OUI_MASK	0x03ff
>>  #define PHYID1_OUI_SHFT	6
>>  #define PHYID2_OUI_MASK	0xfc00
>
> Realtek is 0x0732. This is still wrong upstream -- what happened to the
> patch to fix it?

Good catch, thanks Chuck! I've already seen the fix somewhere, I believe it
was on netdev, though I'm not sure. I'm fixing the patch in place right now.
I can add your signoff if you want.

Cheers,
Willy
Re: Oops in e100_up
Gerrit Renker wrote:
> With the davem-2.6.24 tree I get the following Oops in the e100 driver
> (cribbed from console):
>
> Code: 6c ff ff ff 8b 48 0c ba 01 00 00 00 89 f0 e8 1b f2 ff ff c7 86 9c
> 00 00 00 01 00 00 00 e9 4e ff ff ff 89 d0 e8 b3 f8 0b 00 eb 8e 0f 0b eb
> fe 55 89 e5 56 53 83 ec 0c 8b 98 dc 01 00 00 e8 ff b9
> EIP: e100_up+0x11d/0x121 SS:ESP 0068:f759ce38
>
> Stack: syscall_call -> sys_ioctl -> vfs_ioctl -> do_ioctl -> sock_ioctl ->
>        inet_ioctl -> devinet_ioctl -> dev_change_flags -> dev_open ->
>        e100_open -> oops
>
> The system log then goes on reporting "eth0: link up, 100Mbps,
> full-duplex" and hangs while trying to restore the serial console state
> (not sure that this is related).

restore? Is this during resume from suspend or something?

Auke
r8169: slow samba performance
> Just upgraded a motherboard and it came with an onboard Realtek card
> which appears to use the r8169 driver. The machine is a samba server and
> when serving files to a local Linux or Windows client, I only get approx
> 40-60 kbps. Write performance is fine though, in the tens of mbps, and
> scp, nfs, and ftp server all work well, so it appears specific to the
> Samba load. However, when serving to more than one client simultaneously,
> performance goes up dramatically, again into the tens of mbps, or when
> there is other network activity.

Shane, join the crowd :)

Try the fix I just re-posted over here:
http://www.spinics.net/lists/netdev/msg39244.html
Re: Oops in e100_up
Em Wed, Aug 22, 2007 at 09:35:04AM -0700, Kok, Auke escreveu:
> Gerrit Renker wrote:
>> With the davem-2.6.24 tree I get the following Oops in the e100 driver
>> (cribbed from console):
>>
>> Code: 6c ff ff ff 8b 48 0c ba 01 00 00 00 89 f0 e8 1b f2 ff ff c7 86 9c
>> 00 00 00 01 00 00 00 e9 4e ff ff ff 89 d0 e8 b3 f8 0b 00 eb 8e 0f 0b eb
>> fe 55 89 e5 56 53 83 ec 0c 8b 98 dc 01 00 00 e8 ff b9
>> EIP: e100_up+0x11d/0x121 SS:ESP 0068:f759ce38
>>
>> Stack: syscall_call -> sys_ioctl -> vfs_ioctl -> do_ioctl -> sock_ioctl ->
>>        inet_ioctl -> devinet_ioctl -> dev_change_flags -> dev_open ->
>>        e100_open -> oops
>>
>> The system log then goes on reporting "eth0: link up, 100Mbps,
>> full-duplex" and hangs while trying to restore the serial console state
>> (not sure that this is related).
>
> restore? Is this during resume from suspend or something?

This seems to be a bug reported by akpm and fixed by Thomas Graf; check a
recent post with netconsole on the subject.

- Arnaldo
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
David Miller wrote:
> I think the jury is still out, but seeing TSO perform even slightly worse
> with the batching changes in place would be very worrysome. This applies
> to both throughput and cpu utilization.

Should it be any more or less worrysome than small packet performance (eg
the TCP_RR stuff I posted recently) being rather worse with TSO enabled
than with it disabled?

rick jones
[PATCH] sanitize tc_ematch headers
The headers in tc_ematch are used by iproute2, so these headers should be
processed.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 include/linux/Kbuild |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index ad7f71a..818cc3a 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -7,6 +7,7 @@ header-y += raid/
 header-y += spi/
 header-y += sunrpc/
 header-y += tc_act/
+header-y += tc_ematch/
 header-y += netfilter/
 header-y += netfilter_arp/
 header-y += netfilter_bridge/
--
1.5.2.4
Re: [RFC IPROUTE]: Add flow classifier support
On Wed, 30 May 2007 11:42:01 +0200 Patrick McHardy [EMAIL PROTECTED] wrote: The iproute patch for the flow classifier. This patch is on hold since the netlink changes haven't made it upstream yet. -- Stephen Hemminger [EMAIL PROTECTED]
[ANNOUNCE] iproute2-2.6.23-rc3
There have been a lot of changes for 2.6.23, so here is a test release of iproute2 that should capture all the submitted patches http://developer.osdl.org/shemminger/iproute2/download/iproute2-2.6.23-rc3.tar.gz Johannes Berg (1): show multicast groups PJ Waskiewicz (1): iproute2: sch_rr support in tc Patrick McHardy (6): TC action parsing bug fix Bug fix tc action drop IPROUTE2: RTNETLINK nested attributes Use FRA_* attributes for routing rules iplink: use netlink for link configuration Fix meta ematch usage of 0 values Pavel Emelianov (1): Make ip utility veth driver aware Sridhar Samudrala (1): Fix bug in display of ipv6 cloned/cached routes Stephen Hemminger (3): Fix ss to handle partial records. sanitized headers update to 2.6.23-rc3 Fix m_ipt build -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] netdevice: kernel docbook addition
Add more kernel doc's for part of the network device API. This is only a start, and needs more work. Applies against net-2.6.24 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/Documentation/DocBook/kernel-api.tmpl 2007-08-21 15:43:37.0 -0700 +++ b/Documentation/DocBook/kernel-api.tmpl 2007-08-22 12:30:33.0 -0700 @@ -240,17 +240,23 @@ X!Ilib/string.c sect1titleDriver Support/title !Enet/core/dev.c !Enet/ethernet/eth.c +!Enet/sched/sch_generic.c !Iinclude/linux/etherdevice.h +!Iinclude/linux/netdevice.h + /sect1 + sect1titlePHY Support/title !Edrivers/net/phy/phy.c !Idrivers/net/phy/phy.c !Edrivers/net/phy/phy_device.c !Idrivers/net/phy/phy_device.c !Edrivers/net/phy/mdio_bus.c !Idrivers/net/phy/mdio_bus.c + /sect1 !-- FIXME: Removed for now since no structured comments in source + sect1titleWireless/title X!Enet/core/wireless.c --- /sect1 +-- sect1titleSynchronous PPP/title !Edrivers/net/wan/syncppp.c /sect1 --- a/include/linux/netdevice.h 2007-08-21 15:44:00.0 -0700 +++ b/include/linux/netdevice.h 2007-08-22 12:00:16.0 -0700 @@ -302,17 +302,38 @@ enum extern void FASTCALL(__napi_schedule(struct napi_struct *n)); +/** + * napi_schedule_prep - check if napi can be scheduled + * @n: napi context + * + * Test if NAPI routine is already running, and if not mark + * it as running. This is used as a condition variable + * insure only one NAPI poll instance runs + */ static inline int napi_schedule_prep(struct napi_struct *n) { return !test_and_set_bit(NAPI_STATE_SCHED, n-state); } +/** + * napi_schedule - schedule NAPI poll + * @n: napi context + * + * Schedule NAPI poll routine to be called if it is not already + * running. + */ static inline void napi_schedule(struct napi_struct *n) { if (napi_schedule_prep(n)) __napi_schedule(n); } +/** + * napi_complete - NAPI processing complete + * @n: napi context + * + * Mark NAPI processing as complete. 
+ */ static inline void napi_complete(struct napi_struct *n) { BUG_ON(!test_bit(NAPI_STATE_SCHED, n-state)); @@ -320,12 +341,26 @@ static inline void napi_complete(struct clear_bit(NAPI_STATE_SCHED, n-state); } +/** + * napi_disable - prevent NAPI from scheduling + * @n: napi context + * + * Stop NAPI from being scheduled on this context. + * Waits till any outstanding processing completes. + */ static inline void napi_disable(struct napi_struct *n) { while (test_and_set_bit(NAPI_STATE_SCHED, n-state)) msleep_interruptible(1); } +/** + * napi_disable - prevent NAPI from scheduling + * @n: napi context + * + * Resume NAPI from being scheduled on this context. + * Must be paired with napi_disable. + */ static inline void napi_enable(struct napi_struct *n) { BUG_ON(!test_bit(NAPI_STATE_SCHED, n-state)); @@ -636,6 +671,12 @@ struct net_device #defineNETDEV_ALIGN32 #defineNETDEV_ALIGN_CONST (NETDEV_ALIGN - 1) +/** + * netdev_priv - access network device private data + * @dev: network device + * + * Get network device private data + */ static inline void *netdev_priv(const struct net_device *dev) { return dev-priv; @@ -773,11 +814,24 @@ static inline void netif_schedule(struct __netif_schedule(dev); } +/** + * netif_start_queue - allow transmit + * @dev: network device + * + * Allow upper layers to call the device hard_start_xmit routine. + */ static inline void netif_start_queue(struct net_device *dev) { clear_bit(__LINK_STATE_XOFF, dev-state); } +/** + * netif_wake_queue - restart transmit + * @dev: network device + * + * Allow upper layers to call the device hard_start_xmit routine. + * Used for flow control when transmit resources are available. 
+ */ static inline void netif_wake_queue(struct net_device *dev) { #ifdef CONFIG_NETPOLL_TRAP @@ -790,16 +844,35 @@ static inline void netif_wake_queue(stru __netif_schedule(dev); } +/** + * netif_stop_queue - stop transmitted packets + * @dev: network device + * + * Stop upper layers calling the device hard_start_xmit routine. + * Used for flow control when transmit resources are unavailable. + */ static inline void netif_stop_queue(struct net_device *dev) { set_bit(__LINK_STATE_XOFF, dev-state); } +/** + * netif_queue_stopped - test if transmit queue is flowblocked + * @dev: network device + * + * Test if transmit queue on device is currently unable to send. + */ static inline int netif_queue_stopped(const struct net_device *dev) { return test_bit(__LINK_STATE_XOFF, dev-state); } +/** + * netif_running - test if up + * @dev: network device + * + * Test if the device has been brought up. + */ static inline int netif_running(const struct net_device *dev) { return
Re: [PATCH] AH4: Update IPv4 options handling to conform to RFC 4302.
From: Nick Bowler [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 10:22:53 -0400 In testing our ESP/AH offload hardware, I discovered an issue with how AH handles mutable fields in IPv4. RFC 4302 (AH) states the following on the subject: For IPv4, the entire option is viewed as a unit; so even though the type and length fields within most options are immutable in transit, if an option is classified as mutable, the entire option is zeroed for ICV computation purposes. The current implementation does not zero the type and length fields, resulting in authentication failures when communicating with hosts that do (i.e. FreeBSD). I have tested record route and timestamp options (ping -R and ping -T) on a small network involving Windows XP, FreeBSD 6.2, and Linux hosts, with one router. In the presence of these options, the FreeBSD and Linux hosts (with the patch or with the hardware) can communicate. The Windows XP host simply fails to accept these packets with or without the patch. I have also been trying to test source routing options (using traceroute -g), but haven't had much luck getting this option to work *without* AH, let alone with. Signed-off-by: Nick Bowler [EMAIL PROTECTED] Patch applied, thanks a lot Nick. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] santize tc_ematch headers
From: Stephen Hemminger [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 10:18:38 -0700 The headers in tc_ematch are used by iproute2, so these headers should be processed. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] Applied, thanks Stephen.
Re: [RFC IPROUTE]: Add flow classifier support
From: Stephen Hemminger [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 10:46:15 -0700 On Wed, 30 May 2007 11:42:01 +0200 Patrick McHardy [EMAIL PROTECTED] wrote: The iproute patch for the flow classifier. This patch is on hold since the netlink changes haven't made it upstream yet. I don't have the kernel side in my queue either, perhaps I lost it or I didn't see it when it was sent out. Patrick?
Re: [PATCH] xfrm: export sysctl_xfrm_acq_expires
From: Neil Horman [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 15:42:02 -0400 Hey all- I had noticed that an extra sysctl for xfrm had been added a while back (specifically sysctl_xfrm_acq_expires). Unlike its related sysctls, however, this was never exported so that out-of-tree modules could access it, and I thought it would be a good idea if it was. This patch handles that. Thanks Regards Neil Signed-off-by: Neil Horman [EMAIL PROTECTED] There is no reason for out-of-tree code to access it and no current examples exist. It is an internal knob controlling how a specific part of the IPSEC rule lookup operates, and that is all in-tree.
[PATCH] xfrm: export sysctl_xfrm_acq_expires
Hey all- I had noticed that an extra sysctl for xfrm had been added while back (specifically sysctl_xfrm_acq_expires). Unlike its related sysctl's however, this was never exported so that out-of-tree modules could access it, and I thought it would be a good idea if it was. This patch handles that. Thanks Regards Neil Signed-off-by: Neil Horman [EMAIL PROTECTED] xfrm_state.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index d4356e6..62ae5a2 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -34,6 +34,7 @@ u32 sysctl_xfrm_aevent_rseqth __read_mostly = XFRM_AE_SEQT_SIZE; EXPORT_SYMBOL(sysctl_xfrm_aevent_rseqth); u32 sysctl_xfrm_acq_expires __read_mostly = 30; +EXPORT_SYMBOL(sysctl_xfrm_acq_expires); /* Each xfrm_state may be linked to two tables: -- /*** *Neil Horman *Software Engineer *Red Hat, Inc. [EMAIL PROTECTED] *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/16] [XFRM] netlink: Use nlmsg_end() and nlmsg_cancel()
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:40 +0200 Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH] improved xfrm_audit_log() patch
From: David Miller [EMAIL PROTECTED] Date: Tue, 21 Aug 2007 00:24:05 -0700 (PDT) Looks good, applied to net-2.6.24, thanks Joy. Something is still buggered up in this patch, you can't add this local audit_info variable unconditionally to these functions, and alternatively you also can't add a bunch of ifdefs to xfrm_user.c to cover it up either. CC [M] net/xfrm/xfrm_user.o net/xfrm/xfrm_user.c: In function 'xfrm_add_sa': net/xfrm/xfrm_user.c:450: warning: unused variable 'audit_info' net/xfrm/xfrm_user.c: In function 'xfrm_del_sa': net/xfrm/xfrm_user.c:525: warning: unused variable 'audit_info' net/xfrm/xfrm_user.c: In function 'xfrm_add_policy': net/xfrm/xfrm_user.c:1140: warning: unused variable 'audit_info' net/xfrm/xfrm_user.c: In function 'xfrm_get_policy': net/xfrm/xfrm_user.c:1404: warning: unused variable 'audit_info' net/xfrm/xfrm_user.c: In function 'xfrm_add_pol_expire': net/xfrm/xfrm_user.c:1651: warning: unused variable 'audit_info' net/xfrm/xfrm_user.c: In function 'xfrm_add_sa_expire': net/xfrm/xfrm_user.c:1688: warning: unused variable 'audit_info' So I'm going to revert for now. Let me know when you have a fixed version of the patch. Thanks.
Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: Rick Jones [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 10:09:37 -0700 Should it be any more or less worrisome than small packet performance (eg the TCP_RR stuff I posted recently) being rather worse with TSO enabled than with it disabled? That, like any such thing shown by the batching changes, is a bug to fix.
Re: [PATCH] netdevice: kernel docbook addition
On Wed, 22 Aug 2007 12:33:14 -0700 Stephen Hemminger wrote: Add more kernel doc's for part of the network device API. This is only a start, and needs more work. Applies against net-2.6.24 Thanks! --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code ***
[PATCH 2/2] Net: ath5k, remove sysctls
ath5k, remove sysctls Syscalls were buggy and defunct in later kernels (due to sysctl check). Signed-off-by: Jiri Slaby [EMAIL PROTECTED] --- commit 069bfbe93facb3468f579568434d18f1268a487c tree 87c19ebf2c91d9fb07f1847adcb6098f2235eaaa parent b01c0e9a02b248c3e2f2923da9728ba2c3961dee author Jiri Slaby [EMAIL PROTECTED] Wed, 22 Aug 2007 22:48:41 +0200 committer Jiri Slaby [EMAIL PROTECTED] Wed, 22 Aug 2007 22:48:41 +0200 drivers/net/wireless/ath5k_base.c | 23 --- 1 files changed, 8 insertions(+), 15 deletions(-) diff --git a/drivers/net/wireless/ath5k_base.c b/drivers/net/wireless/ath5k_base.c index 2ce82ed..7f938c4 100644 --- a/drivers/net/wireless/ath5k_base.c +++ b/drivers/net/wireless/ath5k_base.c @@ -2440,21 +2440,13 @@ static struct pci_driver ath_pci_drv_id = { .resume = ath_pci_resume, }; -/* - * Static (i.e. global) sysctls. Note that the hal sysctls - * are located under ours by sharing the setting for DEV_ATH. - */ -enum { - DEV_ATH = 9,/* XXX known by hal */ -}; - static int mincalibrate = 1; static int maxcalibrate = INT_MAX / 1000; #defineCTL_AUTO-2 /* cannot be CTL_ANY or CTL_NONE */ static ctl_table ath_static_sysctls[] = { #if AR_DEBUG - { .ctl_name = CTL_AUTO, + { .procname = debug, .mode = 0644, .data = ath_debug, @@ -2462,28 +2454,28 @@ static ctl_table ath_static_sysctls[] = { .proc_handler = proc_dointvec }, #endif - { .ctl_name = CTL_AUTO, + { .procname = countrycode, .mode = 0444, .data = countrycode, .maxlen = sizeof(countrycode), .proc_handler = proc_dointvec }, - { .ctl_name = CTL_AUTO, + { .procname = outdoor, .mode = 0444, .data = outdoor, .maxlen = sizeof(outdoor), .proc_handler = proc_dointvec }, - { .ctl_name = CTL_AUTO, + { .procname = xchanmode, .mode = 0444, .data = xchanmode, .maxlen = sizeof(xchanmode), .proc_handler = proc_dointvec }, - { .ctl_name = CTL_AUTO, + { .procname = calibrate, .mode = 0644, .data = ath_calinterval, @@ -2495,14 +2487,15 @@ static ctl_table ath_static_sysctls[] = { { 0 } }; static ctl_table 
ath_ath_table[] = { - { .ctl_name = DEV_ATH, + { .procname = ath, .mode = 0555, .child= ath_static_sysctls }, { 0 } }; static ctl_table ath_root_table[] = { - { .ctl_name = CTL_DEV, + { + .ctl_name = CTL_DEV, .procname = dev, .mode = 0555, .child= ath_ath_table - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] Wild and crazy ideas involving struct sk_buff
Over in LSM/SELinux land there has been a lot of talk recently about how to deal with loopback and forwarded traffic, specifically, how to preserve the sender's security label on those two types of traffic. Yes, there is the existing sk_buff.secmark field but that is already being used for something else and utilizing it for this purpose has its pros/cons. We're currently talking about several different ideas to solve the problem, including leveraging the sk_buff.secmark field, and one of the ideas was to add an additional field to the sk_buff structure. Knowing how well that idea would go over (lead balloon is probably an understatement at best) I started looking at what I might be able to remove from the sk_buff struct to make room for a new field (the new field would be a u32). Looking at the sk_buff structure it appears that the sk_buff.dev and sk_buff.iif fields are a bit redundant and removing the sk_buff.dev field could free 32/64 bits depending on the platform. Is there any reason (performance?) for keeping the sk_buff.dev field around? Would the community be open to patches which removed it and transition users over to the sk_buff.iif field? Finally, assuming the sk_buff.dev field was removed, would the community be open to adding a new LSM/SELinux related u32 field to the sk_buff struct? Thanks. -- paul moore linux security @ hp
Re: [PATCH 04/16] [XFRM] netlink: Use nlmsg_broadcast() and nlmsg_unicast()
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:42 +0200 This simplifies successful return codes from >0 to 0. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied, thanks.
Re: [PATCH 05/16] [XFRM] netlink: Use nla_put()/NLA_PUT() variantes
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:43 +0200 Also makes use of copy_sec_ctx() in another place and removes duplicated code. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 06/16] [XFRM] netlink: Move algorithm length calculation to its own function
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:44 +0200 Adds alg_len() to calculate the properly padded length of an algorithm attribute to simplify the code. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 07/16] [XFRM] netlink: Clear up some of the CONFIG_XFRM_SUB_POLICY ifdef mess
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:45 +0200 Moves all of the SUB_POLICY ifdefs related to the attribute size calculation into a function. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 09/16] [XFRM] netlink: Use nlmsg_parse() to parse attributes
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:47 +0200 Uses nlmsg_parse() to parse the attributes. This actually changes behaviour as unknown attributes (type > MAXTYPE) no longer cause an error. Instead unknown attributes will be ignored henceforth to keep older kernels compatible with more recent userspace tools. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 10/16] [XFRM] netlink: Establish an attribute policy
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:48 +0200 Adds a policy defining the minimal payload lengths for all the attributes allowing for most attribute validation checks to be removed from in the middle of the code path. Makes updates more consistent as many format errors are recognised earlier, before any changes have been attempted. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 11/16] [XFRM] netlink: Enhance indexing of the attribute array
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:49 +0200 nlmsg_parse() puts attributes at array[type] so the indexing method can be simplified by removing the obscuring - 1. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 12/16] [XFRM] netlink: Rename attribyte array from xfrma[] to attrs[]
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:50 +0200 Increases readability a lot. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied. I named it like this to mean XFRM Attributes :-)
Re: [PATCH 6/7 v2] fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.
On Tue, 21 Aug 2007 11:47:41 -0500 Scott Wood [EMAIL PROTECTED] wrote: Vitaly Bordug wrote: On Fri, 17 Aug 2007 13:17:18 -0500 Scott Wood wrote: The existing OF glue code was crufty and broken. Rather than fix it, it will be removed, and the ethernet driver now talks to the device tree directly. A bit short description, I'd rather expect some specific improvements list, that are now up and running using device tree. Or if it is just move to new infrastucture, let's state that, too. Some of specific binding changes (there are too many to exhaustively list) are enumerated in the new CPM binding patch, which I'll send after Kumar's include/asm-ppc patch goes in (since it modifies one of those files). ok +#ifdef CONFIG_PPC_CPM_NEW_BINDING +static int __devinit find_phy(struct device_node *np, + struct fs_platform_info *fpi) +{ + struct device_node *phynode, *mdionode; + struct resource res; + int ret = 0, len; + + const u32 *data = of_get_property(np, phy-handle, len); + if (!data || len != 4) + return -EINVAL; + + phynode = of_find_node_by_phandle(*data); + if (!phynode) + return -EINVAL; + + mdionode = of_get_parent(phynode); + if (!phynode) + goto out_put_phy; + + ret = of_address_to_resource(mdionode, 0, res); + if (ret) + goto out_put_mdio; + + data = of_get_property(phynode, reg, len); + if (!data || len != 4) + goto out_put_mdio; + + snprintf(fpi-bus_id, 16, PHY_ID_FMT, res.start, *data); + +out_put_mdio: + of_node_put(mdionode); +out_put_phy: + of_node_put(phynode); + return ret; +} And without phy node? It returns -EINVAL. :-) +#ifdef CONFIG_FS_ENET_HAS_FEC +#define IS_FEC(match) ((match)-data == fs_fec_ops) +#else +#define IS_FEC(match) 0 +#endif + Since we're talking directly with device tree, why bother with CONFIG_ stuff? We are able to figure it out from dts.. We are figuring it out from the DTS (that's what match-data is). I just didn't want boards without a FEC to have to build in support for it and waste memory. 
yes, wrong snippet what about #ifdef CONFIG_CPM2 + r = fs_enet_mdio_bb_init(); + if (r != 0) + goto out_mdio_bb; +#endif +#ifdef CONFIG_8xx + r = fs_enet_mdio_fec_init(); + if (r != 0) + goto out_mdio_fec; +#endif We had to pray and hope that 8xx would only have fec, and cpm2 has some bitbanged stuff. now we can inquire dts and know for sure, at least it seems so. + fpi-rx_ring = 32; + fpi-tx_ring = 32; + fpi-rx_copybreak = 240; + fpi-use_napi = 0; + fpi-napi_weight = 17; + move params over to dts? No. These aren't attributes of the hardware, they're choices the driver makes about how much memory to use and how to interact with the rest of the kernel. + ret = find_phy(ofdev-node, fpi); + if (ret) + goto out_free_fpi; + so we're hosed without phy node. How is that different from the old code, where you're hosed without fep-fpi-bus_id? I wasn't defending old code, and consider old code is POS, new one is just great game meaningless. I am just stating the problem, that we'll have to address later. On 8xx even reference boards may be without phy at all. +static struct of_device_id fs_enet_match[] = { +#ifdef CONFIG_FS_ENET_HAS_SCC same nagging. Are we able to get rid of Kconfig arcane defining which SoC currently plays the game for fs_enet? No, it's still needed for mpc885ads to determine pin setup and conflicting device tree node removal (though that could go away if the device tree is changed to reflect the jumper settings). It's also useful for excluding unwanted code. I don't like using 8xx/CPM2 as the decision point for that -- why should I build in mac-scc.c if I have no intention of using an SCC ethernet (either because my board doesn't have one, or because it's slow and conflicts with other devices)? ok, agreed, size is most serious judge here. we'll definitely have to revisit pin problem later too (because custom designs sometimes switch contradictory devices on-the-fly, disable soc parts for alternative function, etc.) 
QE-like pin encoding may be an option for this or not- I'm inclined to look at most resource-safe approach. -Scott -- Sincerely, Vitaly - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 13/16] [XFRM] netlink: Use nlattr instead of rtattr
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:51 +0200 Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 14/16] [XFRM] netlink: Use nla_memcpy() in xfrm_update_ae_params()
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:52 +0200 Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 15/16] [XFRM] netlink: Remove dependency on rtnetlink
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:53 +0200 Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied.
Re: [PATCH 16/16] [XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:55:54 +0200 These functions are only used once and are a lot easier to understand if inlined directly into the function. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Also applied. Thanks for doing all of this work Thomas! :)
Re: [PATCH 2/7] fs_enet: Whitespace cleanup.
On Fri, 17 Aug 2007 12:53:59 -0500 Scott Wood wrote: Signed-off-by: Scott Wood [EMAIL PROTECTED] Acked-by: Vitaly Bordug [EMAIL PROTECTED] --- drivers/net/fs_enet/fs_enet-main.c | 85 --- drivers/net/fs_enet/fs_enet.h |4 +- drivers/net/fs_enet/mac-fcc.c |1 - drivers/net/fs_enet/mii-fec.c |1 - 4 files changed, 41 insertions(+), 50 deletions(-) diff --git a/drivers/net/fs_enet/fs_enet-main.c b/drivers/net/fs_enet/fs_enet-main.c index a4a2a0e..f261b90 100644 --- a/drivers/net/fs_enet/fs_enet-main.c +++ b/drivers/net/fs_enet/fs_enet-main.c @@ -353,7 +353,6 @@ static void fs_enet_tx(struct net_device *dev) do_wake = do_restart = 0; while (((sc = CBDR_SC(bdp)) BD_ENET_TX_READY) == 0) { - dirtyidx = bdp - fep-tx_bd_base; if (fep-tx_free == fep-tx_ring) @@ -454,7 +453,6 @@ fs_enet_interrupt(int irq, void *dev_id) nr = 0; while ((int_events = (*fep-ops-get_int_events)(dev)) != 0) { - nr++; int_clr_events = int_events; @@ -710,45 +708,43 @@ static void fs_timeout(struct net_device *dev) *-*/ static void generic_adjust_link(struct net_device *dev) { - struct fs_enet_private *fep = netdev_priv(dev); - struct phy_device *phydev = fep-phydev; - int new_state = 0; - - if (phydev-link) { - - /* adjust to duplex mode */ - if (phydev-duplex != fep-oldduplex){ - new_state = 1; - fep-oldduplex = phydev-duplex; - } - - if (phydev-speed != fep-oldspeed) { - new_state = 1; - fep-oldspeed = phydev-speed; - } - - if (!fep-oldlink) { - new_state = 1; - fep-oldlink = 1; - netif_schedule(dev); - netif_carrier_on(dev); - netif_start_queue(dev); - } - - if (new_state) - fep-ops-restart(dev); - - } else if (fep-oldlink) { - new_state = 1; - fep-oldlink = 0; - fep-oldspeed = 0; - fep-oldduplex = -1; - netif_carrier_off(dev); - netif_stop_queue(dev); - } - - if (new_state netif_msg_link(fep)) - phy_print_status(phydev); + struct fs_enet_private *fep = netdev_priv(dev); + struct phy_device *phydev = fep-phydev; + int new_state = 0; + + if (phydev-link) { + /* adjust to duplex mode */ + if 
(phydev-duplex != fep-oldduplex) { + new_state = 1; + fep-oldduplex = phydev-duplex; + } + + if (phydev-speed != fep-oldspeed) { + new_state = 1; + fep-oldspeed = phydev-speed; + } + + if (!fep-oldlink) { + new_state = 1; + fep-oldlink = 1; + netif_schedule(dev); + netif_carrier_on(dev); + netif_start_queue(dev); + } + + if (new_state) + fep-ops-restart(dev); + } else if (fep-oldlink) { + new_state = 1; + fep-oldlink = 0; + fep-oldspeed = 0; + fep-oldduplex = -1; + netif_carrier_off(dev); + netif_stop_queue(dev); + } + + if (new_state netif_msg_link(fep)) + phy_print_status(phydev); } @@ -792,7 +788,6 @@ static int fs_init_phy(struct net_device *dev) return 0; } - static int fs_enet_open(struct net_device *dev) { struct fs_enet_private *fep = netdev_priv(dev); @@ -978,7 +973,7 @@ static struct net_device *fs_init_instance(struct device *dev, #endif #ifdef CONFIG_FS_ENET_HAS_SCC - if (fs_get_scc_index(fpi-fs_no) =0 ) + if (fs_get_scc_index(fpi-fs_no) = 0) fep-ops = fs_scc_ops; #endif @@ -1069,9 +1064,8 @@ static struct net_device *fs_init_instance(struct device *dev, return ndev; - err: +err: if (ndev != NULL) { - if (registered) unregister_netdev(ndev); @@ -1262,7 +1256,6 @@ static int __init fs_init(void) err: cleanup_immap(); return r; - } static void __exit fs_cleanup(void) diff --git a/drivers/net/fs_enet/fs_enet.h b/drivers/net/fs_enet/fs_enet.h index 569be22..72a61e9 100644 --- a/drivers/net/fs_enet/fs_enet.h +++ b/drivers/net/fs_enet/fs_enet.h @@ -15,8 +15,8 @@ #include asm/commproc.h struct fec_info { -fec_t* fecp; - u32 mii_speed; + fec_t
Re: [RFC] Wild and crazy ideas involving struct sk_buff
From: Paul Moore [EMAIL PROTECTED] Date: Wed, 22 Aug 2007 16:31:34 -0400 We're currently talking about several different ideas to solve the problem, including leveraging the sk_buff.secmark field, and one of the ideas was to add an additional field to the sk_buff structure. Knowing how well that idea would go over (lead balloon is probably an understatement at best) I started looking at what I might be able to remove from the sk_buff struct to make room for a new field (the new field would be a u32). Looking at the sk_buff structure it appears that the sk_buff.dev and sk_buff.iif fields are a bit redundant and removing the sk_buff.dev field could free 32/64 bits depending on the platform. Is there any reason (performance?) for keeping the sk_buff.dev field around? Would the community be open to patches which removed it and transition users over to the sk_buff.iif field? Finally, assuming the sk_buff.dev field was removed, would the community be open to adding a new LSM/SELinux related u32 field to the sk_buff struct? It's there for performance, and I bet there might be some semantic issues involved. And ironically James Morris still owes me a struct sk_buff removal from when I let him put the secmark thing in there! Stop spending money you guys haven't earned yet :-)