[PATCH 2.6.24 2/3]S2io: Support for add/delete/store/restore ethernet addresses

2007-08-22 Thread Sreenivasa Honnur
- Support for adding/deleting/storing/restoring 64 and 128 Ethernet addresses
for Xframe I and Xframe II respectively.

Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED]
Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED]
---
diff -urpN patch1/drivers/net/s2io.c patch2/drivers/net/s2io.c
--- patch1/drivers/net/s2io.c   2007-08-18 05:32:23.0 +0530
+++ patch2/drivers/net/s2io.c   2007-08-18 07:19:21.0 +0530
@@ -3589,6 +3589,9 @@ static void s2io_reset(struct s2io_nic *
/* Set swapper to enable I/O register access */
s2io_set_swapper(sp);
 
+   /* restore mac_addr entries */
+   restore_mac_and_mc_addr(sp);
+
/* Restore the MSIX table entries from local variables */
restore_xmsi_data(sp);
 
@@ -3647,9 +3650,6 @@ static void s2io_reset(struct s2io_nic *
writeq(val64, &bar0->pcc_err_reg);
}
 
-   /* restore the previously assigned mac address */
-   set_mac_addr(sp->dev, (u8 *)sp->def_mac_addr[0].mac_addr);
-
sp->device_enabled_once = FALSE;
 }
 
@@ -4118,8 +4118,19 @@ hw_init_failed:
 static int s2io_close(struct net_device *dev)
 {
struct s2io_nic *sp = dev->priv;
+   struct config_param *config = sp->config;
+   u64 tmp64;
+   int off;
 
netif_stop_queue(dev);
+
+   /* delete all populated mac entries */
+   for (off = 1; off < config->max_mc_addr; off++) {
+   tmp64 = read_mac_addr(sp, off);
+   if (tmp64 != 0xffffffffffffffffULL)
+   delete_mac_addr(sp, tmp64);
+   }
+
/* Reset card, kill tasklet and free Tx and Rx buffers. */
s2io_card_down(sp);
 
@@ -5044,7 +5055,7 @@ static void s2io_set_multicast(struct ne
   &bar0->rmac_addr_data1_mem);
val64 = RMAC_ADDR_CMD_MEM_WE |
RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
-   RMAC_ADDR_CMD_MEM_OFFSET(MAC_MC_ALL_MC_ADDR_OFFSET);
+   RMAC_ADDR_CMD_MEM_OFFSET(config->max_mc_addr - 1);
writeq(val64, &bar0->rmac_addr_cmd_mem);
/* Wait till command completes */
wait_for_cmd_complete(&bar0->rmac_addr_cmd_mem,
@@ -5052,7 +5063,7 @@ static void s2io_set_multicast(struct ne
S2IO_BIT_RESET);
 
sp->m_cast_flg = 1;
-   sp->all_multi_pos = MAC_MC_ALL_MC_ADDR_OFFSET;
+   sp->all_multi_pos = config->max_mc_addr - 1;
} else if ((dev->flags & IFF_ALLMULTI) && (sp->m_cast_flg)) {
/*  Disable all Multicast addresses */
writeq(RMAC_ADDR_DATA0_MEM_ADDR(dis_addr),
@@ -5121,7 +5132,8 @@ static void s2io_set_multicast(struct ne
/*  Update individual M_CAST address list */
if ((!sp->m_cast_flg) && dev->mc_count) {
if (dev->mc_count >
-   (MAX_ADDRS_SUPPORTED - MAC_MC_ADDR_START_OFFSET - 1)) {
+   ((config->max_mc_addr - config->max_mac_addr)
+   - config->mc_start_offset - 1)) {
DBG_PRINT(ERR_DBG, "%s: No more Rx filters ",
  dev->name);
DBG_PRINT(ERR_DBG, "can be added, please enable ");
@@ -5141,7 +5153,7 @@ static void s2io_set_multicast(struct ne
val64 = RMAC_ADDR_CMD_MEM_WE |
RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
RMAC_ADDR_CMD_MEM_OFFSET
-   (MAC_MC_ADDR_START_OFFSET + i);
+   (config->mc_start_offset + i);
writeq(val64, &bar0->rmac_addr_cmd_mem);
 
/* Wait for command completes */
@@ -5173,7 +5185,7 @@ static void s2io_set_multicast(struct ne
val64 = RMAC_ADDR_CMD_MEM_WE |
RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
RMAC_ADDR_CMD_MEM_OFFSET
-   (i + MAC_MC_ADDR_START_OFFSET);
+   (i + config->mc_start_offset);
writeq(val64, &bar0->rmac_addr_cmd_mem);
 
/* Wait for command completes */
@@ -5188,6 +5200,75 @@ static void s2io_set_multicast(struct ne
}
}
 }
+/* read from CAM unicast & multicast addresses and store it in
+ * def_mac_addr structure.
+ **/
+void store_mac_and_mc_addr(struct s2io_nic *sp)
+{
+   int offset;
+   u64 mac_addr = 0x0;
+   struct config_param *config = sp->config;
+
+   /* store unicast & multicast mac addresses */
+   for (offset = 0; offset < config->max_mc_addr; offset++) {
+   mac_addr = read_mac_addr(sp, offset);
+   /* if read fails disable the entry */
+   if (mac_addr == FAILURE)
+   mac_addr = 0xffffffffffffffffULL;
+   MAC_ADDR_SET(offset, mac_addr);
+   }
+}
+
+/* restore unicast MAC addresses to CAM from def_mac_addr structure
+ **/
+static void restore_mac_and_mc_addr(struct 

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-22 Thread Krishna Kumar2
Hi Dave,

David Miller [EMAIL PROTECTED] wrote on 08/22/2007 09:52:29 AM:

 From: Krishna Kumar2 [EMAIL PROTECTED]
 Date: Wed, 22 Aug 2007 09:41:52 +0530

snip
   Because TSO does batching already, so it's a very good
   tit for tat comparison of the new batching scheme
   vs. an existing one.
 
  I am planning to do more testing on your suggestion over the
  weekend, but I had a comment. Are you saying that TSO and
  batching should be mutually exclusive so hardware that doesn't
  support TSO (like IB) only would benefit?
 
  But even if they can co-exist, aren't cases like sending
  multiple small skbs better handled with batching?

 I'm not making any suggestions, so don't read that into anything I've
 said :-)

 I think the jury is still out, but seeing TSO perform even slightly
 worse with the batching changes in place would be very worrysome.
 This applies to both throughput and cpu utilization.

Does turning off batching solve that problem? What I mean is: batching
can be disabled if a TSO device is worse off in some cases. In fact, one
change in my latest code is to not enable batching in register_netdevice
(in Rev4, which I am sending in a few minutes); instead, the user has to
explicitly turn batching 'on'.

Wondering if that is what you are concerned about. In any case, I will
test your case on Monday (I am on vacation for the next couple of days).

Thanks,

- KK



Re: net-2.6.24 failure with netconsole

2007-08-22 Thread Herbert Xu
Andrew Morton [EMAIL PROTECTED] wrote:
 
 David, there's basically no reason ever why anyone should add BUG() or
 BUG_ON() to net code.  Please consider rejecting any patches which add new
 ones.  WARN_ON() is *much* better.  It at least gives the user a chance of
 getting some diagnostic info out, of performing additional tests or even of
 using their kernel if they want to test something else.  The only reason to
 choose BUG over WARN is if we're actually concerned about scrogging
 people's data, or serious things like that (ie: filesystems and mm).

Well, for networking if we continue after a serious coding
error it could result in a remote kernel compromise.  So
BUG_ON/BUG is not entirely useless.

I'm not claiming that it's necessarily the case here though :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Oops in 2.6.22.1: skb_copy_and_csum_datagram_iovec()

2007-08-22 Thread Herbert Xu
Chuck Ebbert [EMAIL PROTECTED] wrote:
 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253290
 
 18:57:54 osama kernel:  [c05be67f] kernel_recvmsg+0x31/0x40
 18:57:54 osama kernel:  [e0bc52d4] svc_udp_recvfrom+0x114/0x368 [sunrpc]

svc_udp_recvfrom is calling kernel_recvmsg with iov == NULL.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH 2.6.24 1/3]S2io: Added support for set_mac_address driver entry point

2007-08-22 Thread Sreenivasa Honnur
- Added set_mac_address driver entry point
- Copying permanent mac address to dev->perm_addr
 
Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED]
Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED]
---
diff -urpN orig/drivers/net/s2io.c patch1/drivers/net/s2io.c
--- orig/drivers/net/s2io.c 2007-08-17 02:38:17.0 +0530
+++ patch1/drivers/net/s2io.c   2007-08-18 05:32:23.0 +0530
@@ -350,6 +350,15 @@ static char ethtool_driver_stats_keys[][
timer.data = (unsigned long) arg;   \
mod_timer(&timer, (jiffies + exp))  \
 
+#define MAC_ADDR_SET(offset, mac_addr)  \
+   memset(sp->def_mac_addr[offset].mac_addr, 0, sizeof(ETH_ALEN));\
+   sp->def_mac_addr[offset].mac_addr[5] = (u8) (mac_addr); \
+   sp->def_mac_addr[offset].mac_addr[4] = (u8) (mac_addr >> 8);\
+   sp->def_mac_addr[offset].mac_addr[3] = (u8) (mac_addr >> 16);\
+   sp->def_mac_addr[offset].mac_addr[2] = (u8) (mac_addr >> 24);\
+   sp->def_mac_addr[offset].mac_addr[1] = (u8) (mac_addr >> 32);\
+   sp->def_mac_addr[offset].mac_addr[0] = (u8) (mac_addr >> 40);\
+
 /* Add the vlan */
 static void s2io_vlan_rx_register(struct net_device *dev,
struct vlan_group *grp)
@@ -3639,7 +3648,7 @@ static void s2io_reset(struct s2io_nic *
}
 
/* restore the previously assigned mac address */
-   s2io_set_mac_addr(sp->dev, (u8 *)sp->def_mac_addr[0].mac_addr);
+   set_mac_addr(sp->dev, (u8 *)sp->def_mac_addr[0].mac_addr);
 
sp->device_enabled_once = FALSE;
 }
@@ -4067,7 +4076,7 @@ static int s2io_open(struct net_device *
goto hw_init_failed;
}
 
-   if (s2io_set_mac_addr(dev, dev->dev_addr) == FAILURE) {
+   if (set_mac_addr(dev, dev->dev_addr) == FAILURE) {
DBG_PRINT(ERR_DBG, "Set Mac Address Failed\n");
s2io_card_down(sp);
err = -ENODEV;
@@ -5025,6 +5034,7 @@ static void s2io_set_multicast(struct ne
0xfeffffffffffULL;
u64 dis_addr = 0xffffffffffffffffULL, mac_addr = 0;
void __iomem *add;
+   struct config_param *config = sp->config;
 
if ((dev->flags & IFF_ALLMULTI) && (!sp->m_cast_flg)) {
/*  Enable all Multicast addresses */
@@ -5179,8 +5189,48 @@ static void s2io_set_multicast(struct ne
}
 }
 
+/* add MAC address to CAM */
+static int add_mac_addr(struct s2io_nic *sp, u64 addr, int off)
+{
+   u64 val64;
+   struct XENA_dev_config __iomem *bar0 = sp->bar0;
+
+   writeq(RMAC_ADDR_DATA0_MEM_ADDR(addr),
+   &bar0->rmac_addr_data0_mem);
+
+   val64 =
+   RMAC_ADDR_CMD_MEM_WE | RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
+   RMAC_ADDR_CMD_MEM_OFFSET(off);
+   writeq(val64, &bar0->rmac_addr_cmd_mem);
+
+   /* Wait till command completes */
+   if (wait_for_cmd_complete(&bar0->rmac_addr_cmd_mem,
+   RMAC_ADDR_CMD_MEM_STROBE_CMD_EXECUTING,
+   S2IO_BIT_RESET)) {
+   DBG_PRINT(INFO_DBG, "add_mac_addr failed\n");
+   return FAILURE;
+   }
+   return SUCCESS;
+}
+
+/**
+ * s2io_set_mac_addr driver entry point
+ */
+static int s2io_set_mac_addr(struct net_device *dev, void* p)
+{
+   struct sockaddr *addr=p;
+
+   if (!is_valid_ether_addr(addr->sa_data))
+   return -EINVAL;
+
+   memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+
+   /* store the MAC address in CAM */
+   return (set_mac_addr(dev, dev->dev_addr));
+}
+
 /**
- *  s2io_set_mac_addr - Programs the Xframe mac address
+ *  set_mac_addr - Programs the Xframe mac address
  *  @dev : pointer to the device structure.
  *  @addr: a uchar pointer to the new mac address which is to be set.
  *  Description : This procedure will program the Xframe to receive
@@ -5188,56 +5238,31 @@ static void s2io_set_multicast(struct ne
  *  Return value: SUCCESS on success and an appropriate (-)ve integer
  *  as defined in errno.h file on failure.
  */
-
-static int s2io_set_mac_addr(struct net_device *dev, u8 * addr)
+static int set_mac_addr(struct net_device *dev, u8 * addr)
 {
struct s2io_nic *sp = dev->priv;
-   struct XENA_dev_config __iomem *bar0 = sp->bar0;
-   register u64 val64, mac_addr = 0;
+   register u64 mac_addr = 0,perm_addr=0;
int i;
-   u64 old_mac_addr = 0;
 
/*
-* Set the new MAC address as the new unicast filter and reflect this
-* change on the device address registered with the OS. It will be
-* at offset 0.
-*/
+   * Set the new MAC address as the new unicast filter and reflect this
+   * change on the device address registered with the OS. It will be
+   * at offset 0.
+   */
for (i = 0; i < ETH_ALEN; i++) {
mac_addr <<= 8;
mac_addr |= addr[i];
-   old_mac_addr <<= 8;
-   old_mac_addr |= sp->def_mac_addr[0].mac_addr[i];
+   

[PATCH 2.6.24 3/3]S2io: Updating transceiver information in ethtool function

2007-08-22 Thread Sreenivasa Honnur
- Update transceiver information in ethtool function
 
Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED]
Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED]
---
diff -urpN patch2/drivers/net/s2io.c patch3/drivers/net/s2io.c
--- patch2/drivers/net/s2io.c   2007-08-18 07:19:21.0 +0530
+++ patch3/drivers/net/s2io.c   2007-08-18 07:20:27.0 +0530
@@ -84,7 +84,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "2.0.26.4"
+#define DRV_VERSION "2.0.26.5"
 
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -5459,7 +5459,9 @@ static int s2io_ethtool_gset(struct net_
info->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE);
info->advertising = (SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE);
info->port = PORT_FIBRE;
-   /* info->transceiver?? TODO */
+
+   /* info->transceiver */
+   info->transceiver = XCVR_EXTERNAL;
 
if (netif_carrier_ok(sp->dev)) {
info->speed = 10000;



[PATCH 0/10 Rev4] Implement skb batching and support in IPoIB

2007-08-22 Thread Krishna Kumar
This set of patches implements the batching xmit capability (changed from
API), and adds support for batching in IPoIB. Also included is a sample patch
for E1000 (ported - thanks to Jamal's E1000 changes from earlier kernel). I
will use this patch for testing E1000 TSO vs batching after the weekend.

List of changes from previous revision:

1. [Dave/Patrick] Remove new xmit API altogether (and add a capabilities
flag in dev->features). Modify documentation to remove API, etc.
2. [Evgeniy] Remove bogus checks for 0, and use spin_lock_bh.
3. [Jamal] Ported Jamal's E1000 driver changes for using batching xmit.
5. [KK] Fix out-of-order sending of skbs bug resulting in re-transmissions
by a fix in IPoIB [see XXX].
6. [KK] Do not force device to use batching as default, instead let user
enable batching if required. This is useful in case users are not
aware that batching is taking place.
4. [KK] IPoIB: Remove multiple xmit handlers and convert to use one.
7. [KK] IPoIB: Removed overkill - poll handler can be called on one CPU, so
there is no need to take a new lock against parallel WC's.

Extras that I can do later:
---
1. [Patrick] Use skb_blist statically in netdevice. This could also be used
to integrate GSO and batching.
2. [Evgeniy] Useful to splice lists dev_add_skb_to_blist (and this can be
done for regular xmit's of GSO skbs too for #1 above).

Patches are described as:
 Mail 0/10:  This mail
 Mail 1/10:  HOWTO documentation
 Mail 2/10:  Introduce skb_blist, NETIF_F_BATCH_SKBS, use
 single API for batching/no-batching, etc.
 Mail 3/10:  Modify qdisc_run() to support batching
 Mail 4/10:  Add ethtool support to enable/disable batching
 Mail 5/10:  IPoIB: Header file changes to use batching
 Mail 6/10:  IPoIB: CM  Multicast changes
 Mail 7/10:  IPoIB: Verbs changes to use batching
 Mail 8/10:  IPoIB: Internal post and work completion handler
 Mail 9/10:  IPoIB: Implement the new batching capability
 Mail 10/10: E1000: Implement the new batching capability

Issues:

I am getting a huge number of retransmissions for both the TCP and TCP No Delay
cases for IPoIB (which explains the slight degradation for some test cases
mentioned in the previous mail). After a full test run, there were 18500
retransmissions for every 1 in the regular code. But there is a 20.7% overall
improvement in BW even with this huge number of retransmissions (which implies
batching could improve results even more if this problem is fixed). Results of
experiments are:
a. With batching set to maximum 2 skbs, I get almost the same number
   of retransmissions (implies receiver probably is not dropping skbs).
   ifconfig/netstat on receiver gives no clue (drop/errors, etc).
b. Making the IPoIB xmit create single work requests for each skb on
   blist reduces retrans to same as in regular code.
c. Similar retransmission increase is not seen for E1000.

Please review and provide feedback; and consider for inclusion.

Thanks,

- KK

[XXX] Dave had suggested using batching only in the net_tx_action case.
When I implemented that in earlier revisions, there were lots of TCP
retransmissions (about 18,000 to every 1 in the regular code). I found the
reason for part of that problem: skbs get queued up in dev->qdisc (when the
tx lock was not acquired or the queue was blocked); when net_tx_action is
called later, it passes the batch list as an argument to qdisc_run, and this
results in skbs being moved to the batch list; the batching xmit then also
fails due to tx lock failure; the next several regular single-skb xmits go
through the fast path (passing a NULL batch list to qdisc_run) and send those
skbs out to the device while the previous skbs are cooling their heels in the
batch list.

The first fix was to not pass NULL/batch-list to qdisc_run() but to always
check whether skbs are present in the batch list when trying to xmit. This
reduced retransmissions by a third (from 18,000 to around 12,000), but led to
another problem while testing - iperf transmits almost zero data for a higher
number of parallel flows, like 64 or more (when I run iperf for a 2 min run,
it takes about 5-6 mins, and reports that it ran 0 secs and that the amount
of data transferred is a few MBs). I don't know why this happens with this
being the only change (any ideas are very much appreciated).

The second fix that resolved this was to revert to Dave's suggestion to use
batching only in the net_tx_action case, and to modify the driver to check
whether skbs are present in the batch list and send them out first, before
sending the current skb. I still see huge retransmissions for IPoIB (but not
for E1000), though they have come down to 12,000 from the earlier 18,000.

[PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching

2007-08-22 Thread Krishna Kumar
Add Documentation describing batching skb xmit capability.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 batching_skb_xmit.txt |   78 ++
 1 files changed, 78 insertions(+)

diff -ruNp org/Documentation/networking/batching_skb_xmit.txt 
new/Documentation/networking/batching_skb_xmit.txt
--- org/Documentation/networking/batching_skb_xmit.txt  1970-01-01 
05:30:00.0 +0530
+++ new/Documentation/networking/batching_skb_xmit.txt  2007-08-22 
10:21:19.0 +0530
@@ -0,0 +1,78 @@
+HOWTO for batching skb xmit support
+---
+
+Section 1: What is batching skb xmit
+Section 2: How batching xmit works vs the regular xmit
+Section 3: How drivers can support batching
+Section 4: How users can work with batching
+
+
+Introduction: Kernel support for batching skb
+--
+
+A new capability to support xmit of multiple skbs is provided in the netdevice
+layer. Drivers which enable this capability should be able to process multiple
+skbs in a single call to their xmit handler.
+
+
+Section 1: What is batching skb xmit
+-
+
+   This capability is optionally enabled by a driver by setting the
+   NETIF_F_BATCH_SKBS bit in dev->features. The prerequisite for a
+   driver to use this capability is that it should have a reasonably
+   sized hardware queue that can process multiple skbs.
+
+
+Section 2: How batching xmit works vs the regular xmit
+---
+
+   The network stack gets called from upper layer protocols with a single
+   skb to transmit. This skb is first enqueued and an attempt is made to
+   transmit it immediately (via qdisc_run). However, events like tx lock
+   contention, tx queue stopped, etc., can result in the skb not getting
+   sent out and it remains in the queue. When the next xmit is called or
+   when the queue is re-enabled, qdisc_run could potentially find
+   multiple packets in the queue, and iteratively send them all out
+   one-by-one.
+
+   Batching skb xmit is a mechanism to exploit this situation where all
+   skbs can be passed in one shot to the device. This reduces driver
+   processing, locking at the driver (or in stack for ~LLTX drivers)
+   gets amortized over multiple skbs, and in case of specific drivers
+   where every xmit results in a completion processing (like IPoIB) -
+   optimizations can be made in the driver to request a completion for
+   only the last skb that was sent which results in saving interrupts
+   for every (but the last) skb that was sent in the same batch.
+
+   Batching can result in significant performance gains for systems that
+   have multiple data stream paths over the same network interface card.
+
+
+Section 3: How drivers can support batching
+-
+
+   Batching requires the driver to set the NETIF_F_BATCH_SKBS bit in
+   dev->features.
+
+   The driver's xmit handler should be modified to process multiple skbs
+   instead of one skb. The driver's xmit handler is called either with an
+   skb to transmit or a NULL skb, where the latter case should be handled
+   as a call to xmit multiple skbs. This is done by sending out all skbs
+   in the dev->skb_blist list (where they were added by the core stack).
+
+
+Section 4: How users can work with batching
+-
+
+   Batching can be disabled for a particular device, e.g. on desktop
+   systems if only one stream of network activity for that device is
+   taking place, since performance could be slightly affected due to
+   extra processing that batching adds (unless packets are getting
+   sent fast resulting in stopped queue's). Batching can be enabled if
+   more than one stream of network activity per device is being done,
+   e.g. on servers; or even desktop usage with multiple browser, chat,
+   file transfer sessions, etc.
+
+   Per device batching can be enabled/disabled by passing 'on' or 'off'
+   respectively to ethtool.
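
[Illustration, not part of the posted patch: a minimal sketch of a driver xmit
handler that advertises NETIF_F_BATCH_SKBS, following the NULL-skb convention
described in Section 3. my_batch_xmit() and my_hw_queue_one_skb() are
hypothetical names, and driver locking / stop-queue handling is omitted.]

	#include <linux/netdevice.h>
	#include <linux/skbuff.h>

	/* hypothetical helper: the driver's normal one-skb transmit path */
	static int my_hw_queue_one_skb(struct sk_buff *skb, struct net_device *dev);

	static int my_batch_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		if (skb)			/* regular single-skb call */
			return my_hw_queue_one_skb(skb, dev);

		/* NULL skb: batching call - drain dev->skb_blist */
		while ((skb = __skb_dequeue(dev->skb_blist)) != NULL) {
			if (my_hw_queue_one_skb(skb, dev) != NETDEV_TX_OK) {
				/* requeue and let the stack retry later */
				__skb_queue_head(dev->skb_blist, skb);
				return NETDEV_TX_BUSY;
			}
		}
		return NETDEV_TX_OK;
	}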


[PATCH 2/10 Rev4] [core] Add skb_blist support for batching

2007-08-22 Thread Krishna Kumar
Introduce skb_blist, NETIF_F_BATCH_SKBS, use single API for
batching/no-batching, etc.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 include/linux/netdevice.h |4 
 net/core/dev.c|   21 ++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h   2007-08-20 14:26:36.0 +0530
+++ new/include/linux/netdevice.h   2007-08-22 08:42:10.0 +0530
@@ -399,6 +399,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_BATCH_SKBS	8192	/* Driver supports multiple skbs/xmit */
 #define NETIF_F_MULTI_QUEUE	16384	/* Has multiple TX/RX queues */
 #define NETIF_F_LRO		32768	/* large receive offload */
 
@@ -510,6 +511,9 @@ struct net_device
/* Partially transmitted GSO packet. */
struct sk_buff  *gso_skb;
 
+   /* List of batch skbs (optional, used if driver supports skb batching) */
+   struct sk_buff_head *skb_blist;
+
/* ingress path synchronizer */
spinlock_t  ingress_lock;
struct Qdisc*qdisc_ingress;
diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c  2007-08-20 14:26:37.0 +0530
+++ new/net/core/dev.c  2007-08-22 10:49:22.0 +0530
@@ -898,6 +898,16 @@ void netdev_state_change(struct net_devi
}
 }
 
+static void free_batching(struct net_device *dev)
+{
+   if (dev->skb_blist) {
+   if (!skb_queue_empty(dev->skb_blist))
+   skb_queue_purge(dev->skb_blist);
+   kfree(dev->skb_blist);
+   dev->skb_blist = NULL;
+   }
+}
+
 /**
  * dev_load- load a network module
  * @name: name of interface
@@ -1458,7 +1468,9 @@ static int dev_gso_segment(struct sk_buf
 
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-   if (likely(!skb->next)) {
+   if (likely(skb)) {
+   if (unlikely(skb->next))
+   goto gso;
if (!list_empty(&ptype_all))
dev_queue_xmit_nit(skb, dev);
 
@@ -1468,10 +1480,10 @@ int dev_hard_start_xmit(struct sk_buff *
if (skb->next)
goto gso;
}
-
-   return dev->hard_start_xmit(skb, dev);
}
 
+   return dev->hard_start_xmit(skb, dev);
+
 gso:
do {
struct sk_buff *nskb = skb->next;
@@ -3791,6 +3803,9 @@ void unregister_netdevice(struct net_dev
 
synchronize_net();
 
+   /* Deallocate batching structure */
+   free_batching(dev);
+
/* Shutdown queueing discipline. */
dev_shutdown(dev);
 


[PATCH 3/10 Rev4] [sched] Modify qdisc_run to support batching

2007-08-22 Thread Krishna Kumar
Modify qdisc_run() to support batching. Modify callers of qdisc_run to
use batching, modify qdisc_restart to implement batching.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 include/linux/netdevice.h |2 +
 include/net/pkt_sched.h   |6 +--
 net/core/dev.c|   44 +++-
 net/sched/sch_generic.c   |   70 ++
 4 files changed, 105 insertions(+), 17 deletions(-)

diff -ruNp org/include/net/pkt_sched.h new/include/net/pkt_sched.h
--- org/include/net/pkt_sched.h 2007-08-20 14:26:36.0 +0530
+++ new/include/net/pkt_sched.h 2007-08-22 09:23:57.0 +0530
@@ -80,13 +80,13 @@ extern struct qdisc_rate_table *qdisc_ge
struct rtattr *tab);
 extern void qdisc_put_rtab(struct qdisc_rate_table *tab);
 
-extern void __qdisc_run(struct net_device *dev);
+extern void __qdisc_run(struct net_device *dev, struct sk_buff_head *blist);
 
-static inline void qdisc_run(struct net_device *dev)
+static inline void qdisc_run(struct net_device *dev, struct sk_buff_head *blist)
 {
if (!netif_queue_stopped(dev) &&
!test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
-   __qdisc_run(dev);
+   __qdisc_run(dev, blist);
 }
 
 extern int tc_classify_compat(struct sk_buff *skb, struct tcf_proto *tp,
diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h   2007-08-20 14:26:36.0 +0530
+++ new/include/linux/netdevice.h   2007-08-22 08:42:10.0 +0530
@@ -892,6 +896,8 @@ extern int  dev_set_mac_address(struct n
struct sockaddr *);
 extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev);
+extern int dev_add_skb_to_blist(struct sk_buff *skb,
+struct net_device *dev);
 
 extern voiddev_init(void);
 
diff -ruNp org/net/sched/sch_generic.c new/net/sched/sch_generic.c
--- org/net/sched/sch_generic.c 2007-08-20 14:26:37.0 +0530
+++ new/net/sched/sch_generic.c 2007-08-22 08:49:55.0 +0530
@@ -59,10 +59,12 @@ static inline int qdisc_qlen(struct Qdis
 static inline int dev_requeue_skb(struct sk_buff *skb, struct net_device *dev,
  struct Qdisc *q)
 {
-   if (unlikely(skb->next))
-   dev->gso_skb = skb;
-   else
-   q->ops->requeue(skb, q);
+   if (likely(skb)) {
+   if (unlikely(skb->next))
+   dev->gso_skb = skb;
+   else
+   q->ops->requeue(skb, q);
+   }
 
netif_schedule(dev);
return 0;
@@ -91,10 +93,15 @@ static inline int handle_dev_cpu_collisi
/*
 * Same CPU holding the lock. It may be a transient
 * configuration error, when hard_start_xmit() recurses. We
-* detect it by checking xmit owner and drop the packet when
-* deadloop is detected. Return OK to try the next skb.
+* detect it by checking xmit owner and drop the packet (or
+* all packets in batching case) when deadloop is detected.
+* Return OK to try the next skb.
 */
-   kfree_skb(skb);
+   if (likely(skb))
+   kfree_skb(skb);
+   else if (!skb_queue_empty(dev->skb_blist))
+   skb_queue_purge(dev->skb_blist);
+
if (net_ratelimit())
printk(KERN_WARNING "Dead loop on netdevice %s, "
   "fix it urgently!\n", dev->name);
@@ -112,6 +119,38 @@ static inline int handle_dev_cpu_collisi
 }
 
 /*
+ * Algorithm to get skb(s) is:
+ * - Non batching drivers, or if the batch list is empty and there is
+ *   <= 1 skb in the queue - dequeue skb and put it in *skbp to tell the
+ *   caller to use the single xmit API.
+ * - Batching drivers where the batch list already contains at least one
+ *   skb, or if there are multiple skbs in the queue: keep dequeueing
+ *   skbs up to a limit and set *skbp to NULL to tell the caller to use
+ *   the multiple xmit API.
+ *
+ * Returns:
+ * 1 - at least one skb is to be sent out, *skbp contains skb or NULL
+ * (in case > 1 skbs present in blist for batching)
+ * 0 - no skbs to be sent.
+ */
+static inline int get_skb(struct net_device *dev, struct Qdisc *q,
+ struct sk_buff_head *blist, struct sk_buff **skbp)
+{
+   if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <= 1))) {
+   return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
+   } else {
+   struct sk_buff *skb;
+   int max = dev->tx_queue_len - skb_queue_len(blist);
+
+   while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != 
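
[Editorial note: the digest truncates the hunk above mid-loop. Purely as a
reconstruction - not the posted code - the dequeue loop described by the
comment presumably continues roughly as follows, moving each skb onto the
batch list via the dev_add_skb_to_blist() helper declared earlier in this
patch.]

	while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != NULL) {
		dev_add_skb_to_blist(skb, dev);	/* queue onto dev->skb_blist */
		max--;
	}
	*skbp = NULL;	/* tell the caller to use the multiple-skb xmit API */
	return !skb_queue_empty(blist);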

[PATCH 4/10 Rev4] [ethtool] Add ethtool support

2007-08-22 Thread Krishna Kumar
Add ethtool support to enable/disable batching.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 include/linux/ethtool.h   |2 ++
 include/linux/netdevice.h |2 ++
 net/core/dev.c|   36 
 net/core/ethtool.c|   27 +++
 4 files changed, 67 insertions(+)

diff -ruNp org/include/linux/ethtool.h new/include/linux/ethtool.h
--- org/include/linux/ethtool.h 2007-08-20 14:26:35.0 +0530
+++ new/include/linux/ethtool.h 2007-08-22 08:37:35.0 +0530
@@ -440,6 +440,8 @@ struct ethtool_ops {
 #define ETHTOOL_SFLAGS 0x0026 /* Set flags bitmap(ethtool_value) */
 #define ETHTOOL_GPFLAGS	0x0027 /* Get driver-private flags bitmap */
 #define ETHTOOL_SPFLAGS	0x0028 /* Set driver-private flags bitmap */
+#define ETHTOOL_GBATCH 0x0029 /* Get Batching (ethtool_value) */
+#define ETHTOOL_SBATCH 0x0030 /* Set Batching (ethtool_value) */
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h   2007-08-20 14:26:36.0 +0530
+++ new/include/linux/netdevice.h   2007-08-22 08:42:10.0 +0530
@@ -1152,6 +1152,8 @@ extern void   dev_set_promiscuity(struct 
 extern void	dev_set_allmulti(struct net_device *dev, int inc);
 extern void	netdev_state_change(struct net_device *dev);
 extern void	netdev_features_change(struct net_device *dev);
+extern int dev_change_tx_batch_skb(struct net_device *dev,
+   unsigned long new_batch_skb);
 /* Load a device via the kmod */
 extern voiddev_load(const char *name);
 extern voiddev_mcast_init(void);
diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c  2007-08-20 14:26:37.0 +0530
+++ new/net/core/dev.c  2007-08-22 10:49:22.0 +0530
@@ -908,6 +908,42 @@ static void free_batching(struct net_dev
}
 }
 
+int dev_change_tx_batch_skb(struct net_device *dev, unsigned long new_batch_skb)
+{
+   int ret = 0;
+   struct sk_buff_head *blist;
+
+   if (!(dev->features & NETIF_F_BATCH_SKBS)) {
+   /* Driver doesn't support batching skb API */
+   ret = -ENOTSUPP;
+   goto out;
+   }
+
+   /*
+* Check if new value is same as the current (paranoia to use !! for
+* new_batch_skb as that should always be boolean).
+*/
+   if (!!dev->skb_blist == !!new_batch_skb)
+   goto out;
+
+   if (new_batch_skb &&
+   (blist = kmalloc(sizeof *blist, GFP_KERNEL)) == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   spin_lock_bh(&dev->queue_lock);
+   if (new_batch_skb) {
+   skb_queue_head_init(blist);
+   dev->skb_blist = blist;
+   } else
+   free_batching(dev);
+   spin_unlock_bh(&dev->queue_lock);
+
+out:
+   return ret;
+}
+
 /**
  * dev_load- load a network module
  * @name: name of interface
diff -ruNp org/net/core/ethtool.c new/net/core/ethtool.c
--- org/net/core/ethtool.c  2007-08-20 14:26:37.0 +0530
+++ new/net/core/ethtool.c  2007-08-22 08:36:07.0 +0530
@@ -556,6 +556,26 @@ static int ethtool_set_gso(struct net_de
return 0;
 }
 
+static int ethtool_get_batch(struct net_device *dev, char __user *useraddr)
+{
+   struct ethtool_value edata = { ETHTOOL_GBATCH };
+
+   edata.data = dev->skb_blist != NULL;
+   if (copy_to_user(useraddr, &edata, sizeof(edata)))
+   return -EFAULT;
+   return 0;
+}
+
+static int ethtool_set_batch(struct net_device *dev, char __user *useraddr)
+{
+   struct ethtool_value edata;
+
+   if (copy_from_user(&edata, useraddr, sizeof(edata)))
+   return -EFAULT;
+
+   return dev_change_tx_batch_skb(dev, edata.data);
+}
+
 static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
 {
struct ethtool_test test;
@@ -813,6 +833,7 @@ int dev_ethtool(struct ifreq *ifr)
case ETHTOOL_GGSO:
case ETHTOOL_GFLAGS:
case ETHTOOL_GPFLAGS:
+   case ETHTOOL_GBATCH:
break;
default:
if (!capable(CAP_NET_ADMIN))
@@ -956,6 +977,12 @@ int dev_ethtool(struct ifreq *ifr)
rc = ethtool_set_value(dev, useraddr,
   dev->ethtool_ops->set_priv_flags);
break;
+   case ETHTOOL_GBATCH:
+   rc = ethtool_get_batch(dev, useraddr);
+   break;
+   case ETHTOOL_SBATCH:
+   rc = ethtool_set_batch(dev, useraddr);
+   break;
default:
rc = -EOPNOTSUPP;
}
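
[Illustration, not part of the posted patch: until the ethtool utility learns
about the new commands, ETHTOOL_SBATCH can be exercised directly through the
SIOCETHTOOL ioctl from userspace. set_batching() below is a hypothetical
helper; the 0x0030 value is taken from the hunk above, and error reporting is
trimmed.]

	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <net/if.h>
	#include <linux/ethtool.h>
	#include <linux/sockios.h>
	#include <string.h>
	#include <unistd.h>

	#ifndef ETHTOOL_SBATCH
	#define ETHTOOL_SBATCH 0x0030	/* Set Batching (ethtool_value) */
	#endif

	/* returns 0 on success, non-zero if the ioctl fails (e.g. the driver
	 * does not advertise NETIF_F_BATCH_SKBS) */
	static int set_batching(const char *ifname, __u32 on)
	{
		struct ethtool_value ev = { .cmd = ETHTOOL_SBATCH, .data = on };
		struct ifreq ifr;
		int fd, ret;

		fd = socket(AF_INET, SOCK_DGRAM, 0);
		if (fd < 0)
			return -1;
		memset(&ifr, 0, sizeof(ifr));
		strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
		ifr.ifr_data = (char *)&ev;
		ret = ioctl(fd, SIOCETHTOOL, &ifr);
		close(fd);
		return ret;
	}

e.g. set_batching("ib0", 1) would enable batching on ib0, matching the
"pass 'on' or 'off' to ethtool" usage described in the HOWTO.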

[PATCH 5/10 Rev4] [IPoIB] Header file changes

2007-08-22 Thread Krishna Kumar
IPoIB header file changes to use batching.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 ipoib.h |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib.h 
new/drivers/infiniband/ulp/ipoib/ipoib.h
--- org/drivers/infiniband/ulp/ipoib/ipoib.h2007-08-20 14:26:26.0 
+0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib.h2007-08-22 08:33:51.0 
+0530
@@ -271,8 +271,8 @@ struct ipoib_dev_priv {
struct ipoib_tx_buf *tx_ring;
unsigned tx_head;
unsigned tx_tail;
-   struct ib_sge        tx_sge;
-   struct ib_send_wr    tx_wr;
+   struct ib_sge        *tx_sge;
+   struct ib_send_wr    *tx_wr;
 
struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -367,8 +367,11 @@ static inline void ipoib_put_ah(struct i
 int ipoib_open(struct net_device *dev);
 int ipoib_add_pkey_attr(struct net_device *dev);
 
+int ipoib_process_skb(struct net_device *dev, struct sk_buff *skb,
+ struct ipoib_dev_priv *priv, struct ipoib_ah *address,
+ u32 qpn, int wr_num);
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-   struct ipoib_ah *address, u32 qpn);
+   struct ipoib_ah *address, u32 qpn, int num_skbs);
 void ipoib_reap_ah(struct work_struct *work);
 
 void ipoib_flush_paths(struct net_device *dev);


[PATCH 6/10 Rev4] [IPoIB] CM Multicast changes

2007-08-22 Thread Krishna Kumar
IPoIB CM & Multicast changes based on header file changes.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 ipoib_cm.c|   13 +
 ipoib_multicast.c |4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
new/drivers/infiniband/ulp/ipoib/ipoib_cm.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-08-20 14:26:26.0 
+0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-08-22 08:33:51.0 
+0530
@@ -493,14 +493,19 @@ static inline int post_send(struct ipoib
unsigned int wr_id,
u64 addr, int len)
 {
+   int ret;
struct ib_send_wr *bad_wr;
 
-   priv->tx_sge.addr = addr;
-   priv->tx_sge.length   = len;
+   priv->tx_sge[0].addr  = addr;
+   priv->tx_sge[0].length= len;
+
+   priv->tx_wr[0].wr_id  = wr_id;
 
-   priv->tx_wr.wr_id = wr_id;
+   priv->tx_wr[0].next = NULL;
+   ret = ib_post_send(tx->qp, priv->tx_wr, &bad_wr);
+   priv->tx_wr[0].next = &priv->tx_wr[1];
 
-   return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
+   return ret;
 }
 
 void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct 
ipoib_cm_tx *tx)
diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
new/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_multicast.c  2007-08-20 
14:26:26.0 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_multicast.c  2007-08-22 
08:33:51.0 +0530
@@ -217,7 +217,7 @@ static int ipoib_mcast_join_finish(struc
if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid))) {
priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
-   priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
+   priv->tx_wr[0].wr.ud.remote_qkey = priv->qkey;
}
 
if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
@@ -736,7 +736,7 @@ out:
}
}
 
-   ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
+   ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN, 1);
}
 
 unlock:


[PATCH 7/10 Rev4] [IPoIB] Verbs changes

2007-08-22 Thread Krishna Kumar
IPoIB verb changes to use batching.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 ipoib_verbs.c |   23 ++-
 1 files changed, 14 insertions(+), 9 deletions(-)

diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 
new/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_verbs.c  2007-08-20 
14:26:26.0 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_verbs.c  2007-08-22 
08:33:51.0 +0530
@@ -152,11 +152,11 @@ int ipoib_transport_dev_init(struct net_
.max_send_sge = 1,
.max_recv_sge = 1
},
-   .sq_sig_type = IB_SIGNAL_ALL_WR,
+   .sq_sig_type = IB_SIGNAL_REQ_WR,/* 11.2.4.1 */
.qp_type = IB_QPT_UD
};
-
-   int ret, size;
+   struct ib_send_wr *next_wr = NULL;
+   int i, ret, size;
 
priv->pd = ib_alloc_pd(priv->ca);
if (IS_ERR(priv->pd)) {
@@ -197,12 +197,17 @@ int ipoib_transport_dev_init(struct net_
priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
priv->dev->dev_addr[3] = (priv->qp->qp_num      ) & 0xff;
 
-   priv->tx_sge.lkey   = priv->mr->lkey;
-
-   priv->tx_wr.opcode  = IB_WR_SEND;
-   priv->tx_wr.sg_list = &priv->tx_sge;
-   priv->tx_wr.num_sge = 1;
-   priv->tx_wr.send_flags  = IB_SEND_SIGNALED;
+   for (i = ipoib_sendq_size - 1; i >= 0; i--) {
+   priv->tx_sge[i].lkey= priv->mr->lkey;
+   priv->tx_wr[i].opcode   = IB_WR_SEND;
+   priv->tx_wr[i].sg_list  = &priv->tx_sge[i];
+   priv->tx_wr[i].num_sge  = 1;
+   priv->tx_wr[i].send_flags   = 0;
+
+   /* Link the list properly for provider to use */
+   priv->tx_wr[i].next = next_wr;
+   next_wr = &priv->tx_wr[i];
+   }
 
return 0;
 


[PATCH 8/10 Rev4] [IPoIB] Post and work completion handler changes

2007-08-22 Thread Krishna Kumar
IPoIB internal post and work completion handler changes.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 ipoib_ib.c |  207 -
 1 files changed, 163 insertions(+), 44 deletions(-)

diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
new/drivers/infiniband/ulp/ipoib/ipoib_ib.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-08-20 14:26:26.0 
+0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-08-22 08:33:51.0 
+0530
@@ -242,6 +242,8 @@ repost:
 static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
+   int i, num_completions;
+   unsigned int tx_ring_index;
unsigned int wr_id = wc->wr_id;
struct ipoib_tx_buf *tx_req;
unsigned long flags;
@@ -255,18 +257,46 @@ static void ipoib_ib_handle_tx_wc(struct
return;
}
 
-   tx_req = &priv->tx_ring[wr_id];
+   /* Get first WC to process (no one can update tx_tail at this time) */
+   tx_ring_index = priv->tx_tail & (ipoib_sendq_size - 1);
 
-   ib_dma_unmap_single(priv->ca, tx_req->mapping,
-   tx_req->skb->len, DMA_TO_DEVICE);
+   /* Find number of WC's */
+   num_completions = wr_id - tx_ring_index + 1;
+   if (unlikely(num_completions <= 0))
+   num_completions += ipoib_sendq_size;
 
-   ++priv->stats.tx_packets;
-   priv->stats.tx_bytes += tx_req->skb->len;
+   /*
+* Handle WC's from earlier (possibly multiple) post_sends in this
+* iteration as we move from tx_tail to wr_id, since if the last WR
+* (which is the one which requested completion notification) failed
+* to be sent for any of those earlier request(s), no completion
+* notification is generated for successful WR's of those earlier
+* request(s).
+*/
+   tx_req = &priv->tx_ring[tx_ring_index];
+   for (i = 0; i < num_completions; i++) {
+   if (likely(tx_req->skb)) {
+   ib_dma_unmap_single(priv->ca, tx_req->mapping,
+   tx_req->skb->len, DMA_TO_DEVICE);
+
+   ++priv->stats.tx_packets;
+   priv->stats.tx_bytes += tx_req->skb->len;
 
-   dev_kfree_skb_any(tx_req->skb);
+   dev_kfree_skb_any(tx_req->skb);
+   }
+   /*
+* else this skb failed synchronously when posted and was
+* freed immediately.
+*/
+
+   if (likely(++tx_ring_index != ipoib_sendq_size))
+   tx_req++;
+   else
+   tx_req = &priv->tx_ring[0];
+   }
 
spin_lock_irqsave(&priv->tx_lock, flags);
-   ++priv->tx_tail;
+   priv->tx_tail += num_completions;
if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) &&
priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) {
clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
@@ -335,29 +365,57 @@ void ipoib_ib_completion(struct ib_cq *c
netif_rx_schedule(dev, &priv->napi);
 }
 
-static inline int post_send(struct ipoib_dev_priv *priv,
-   unsigned int wr_id,
-   struct ib_ah *address, u32 qpn,
-   u64 addr, int len)
+/*
+ * post_send : Post WR(s) to the device.
+ *
+ * num_skbs is the number of WR's, first_wr is the first slot in tx_wr[] (or
+ * tx_sge[]). first_wr is normally zero unless a previous post_send returned
+ * error and we are trying to post the untried WR's, in which case first_wr
+ * is the index to the first untried WR.
+ *
+ * Break the WR link before posting so that provider knows how many WR's to
+ * process, and this is set back after the post.
+ */
+static inline int post_send(struct ipoib_dev_priv *priv, u32 qpn,
+   int first_wr, int num_skbs,
+   struct ib_send_wr **bad_wr)
 {
-   struct ib_send_wr *bad_wr;
+   int ret;
+   struct ib_send_wr *last_wr, *next_wr;
+
+   last_wr = &priv->tx_wr[first_wr + num_skbs - 1];
 
-   priv->tx_sge.addr = addr;
-   priv->tx_sge.length   = len;
+   /* Set Completion Notification for last WR */
+   last_wr->send_flags = IB_SEND_SIGNALED;
 
-   priv->tx_wr.wr_id = wr_id;
-   priv->tx_wr.wr.ud.remote_qpn  = qpn;
-   priv->tx_wr.wr.ud.ah  = address;
+   /* Terminate the last WR */
+   next_wr = last_wr->next;
+   last_wr->next = NULL;
 
-   return ib_post_send(priv->qp, &priv->tx_wr, &bad_wr);
+   /* Send all the WR's in one doorbell */
+   ret = ib_post_send(priv->qp, &priv->tx_wr[first_wr], bad_wr);
+
+   /* Restore send_flags & WR chain */
+   last_wr->send_flags = 0;
+   last_wr->next = next_wr;
+
+   return ret;
 }
 
-void ipoib_send(struct net_device *dev, struct 

[PATCH 9/10 Rev4] [IPoIB] Implement batching

2007-08-22 Thread Krishna Kumar
IPoIB: implement the new batching API.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 ipoib_main.c |  251 ---
 1 files changed, 171 insertions(+), 80 deletions(-)

diff -ruNp org/drivers/infiniband/ulp/ipoib/ipoib_main.c 
new/drivers/infiniband/ulp/ipoib/ipoib_main.c
--- org/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-08-20 
14:26:26.0 +0530
+++ new/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-08-22 
08:33:51.0 +0530
@@ -560,7 +560,8 @@ static void neigh_add_path(struct sk_buf
goto err_drop;
}
} else
-   ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha));
+   ipoib_send(dev, skb, path->ah,
+  IPOIB_QPN(skb->dst->neighbour->ha), 1);
} else {
neigh->ah  = NULL;
 
@@ -640,7 +641,7 @@ static void unicast_arp_send(struct sk_b
ipoib_dbg(priv, "Send unicast ARP to %04x\n",
  be16_to_cpu(path->pathrec.dlid));
 
-   ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr));
+   ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr), 1);
} else if ((path->query || !path_rec_start(dev, path)) &&
   skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
/* put pseudoheader back on for next time */
@@ -654,105 +655,166 @@ static void unicast_arp_send(struct sk_b
spin_unlock(&priv->lock);
 }
 
+#define XMIT_PROCESSED_SKBS()					\
+   do {\
+   if (wr_num) {   \
+   ipoib_send(dev, NULL, old_neigh->ah, old_qpn,   \
+  wr_num); \
+   wr_num = 0; \
+   }   \
+   } while (0)
+
 static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   struct ipoib_neigh *neigh;
+   struct sk_buff_head *blist;
+   int max_skbs, wr_num = 0;
+   u32 qpn, old_qpn = 0;
+   struct ipoib_neigh *neigh, *old_neigh = NULL;
unsigned long flags;
 
if (unlikely(!spin_trylock_irqsave(&priv->tx_lock, flags)))
return NETDEV_TX_LOCKED;
 
-   /*
-* Check if our queue is stopped.  Since we have the LLTX bit
-* set, we can't rely on netif_stop_queue() preventing our
-* xmit function from being called with a full queue.
-*/
-   if (unlikely(netif_queue_stopped(dev))) {
-   spin_unlock_irqrestore(&priv->tx_lock, flags);
-   return NETDEV_TX_BUSY;
-   }
-
-   if (likely(skb->dst && skb->dst->neighbour)) {
-   if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
-   ipoib_path_lookup(skb, dev);
-   goto out;
-   }
+   blist = dev->skb_blist;
 
-   neigh = *to_ipoib_neigh(skb->dst->neighbour);
+   if (!skb || (blist && skb_queue_len(blist))) {
+   /*
+* Either batching xmit call, or single skb case but there are
+* skbs already in the batch list from previous failure to
+* xmit - send the earlier skbs first to avoid out of order.
+*/
+
+   if (skb)
+   __skb_queue_tail(blist, skb);
+
+   /*
+* Figure out how many skbs can be sent. This prevents the
+* device getting full and avoids checking for stopped queue
+* after each iteration. Now the queue can get stopped atmost
+* after xmit of the last skb.
+*/
+   max_skbs = ipoib_sendq_size - (priv->tx_head - priv->tx_tail);
+   skb = __skb_dequeue(blist);
+   } else {
+   blist = NULL;
+   max_skbs = 1;
+   }
 
-   if (ipoib_cm_get(neigh)) {
-   if (ipoib_cm_up(neigh)) {
-   ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
-   goto out;
-   }
-   } else if (neigh->ah) {
-   if (unlikely(memcmp(&neigh->dgid.raw,
-   skb->dst->neighbour->ha + 4,
-   sizeof(union ib_gid)))) {
-   spin_lock(&priv->lock);
-   /*
-* It's safe to call ipoib_put_ah() inside
-* priv->lock here, because we know that
-* path->ah will always hold one more 

[PATCH 10/10 Rev4] [E1000] Implement batching

2007-08-22 Thread Krishna Kumar
E1000: Implement batching capability (ported thanks to changes taken from
Jamal). Not all changes made in IPoIB are made here, e.g. handling
out-of-order skbs (see XXX in the first mail).

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 e1000_main.c |  150 +++
 1 files changed, 121 insertions(+), 29 deletions(-)

diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c
--- org/drivers/net/e1000/e1000_main.c  2007-08-20 14:26:29.0 +0530
+++ new/drivers/net/e1000/e1000_main.c  2007-08-22 08:33:51.0 +0530
@@ -157,6 +157,7 @@ static void e1000_update_phy_info(unsign
 static void e1000_watchdog(unsigned long data);
 static void e1000_82547_tx_fifo_stall(unsigned long data);
 static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
+static int e1000_xmit_frames(struct net_device *dev);
 static struct net_device_stats * e1000_get_stats(struct net_device *netdev);
 static int e1000_change_mtu(struct net_device *netdev, int new_mtu);
 static int e1000_set_mac(struct net_device *netdev, void *p);
@@ -990,7 +991,7 @@ e1000_probe(struct pci_dev *pdev,
if (pci_using_dac)
netdev->features |= NETIF_F_HIGHDMA;
 
-   netdev->features |= NETIF_F_LLTX;
+   netdev->features |= NETIF_F_LLTX | NETIF_F_BATCH_SKBS;
 
adapter->en_mng_pt = e1000_enable_mng_pass_thru(&adapter->hw);
 
@@ -3098,6 +3099,18 @@ e1000_tx_map(struct e1000_adapter *adapt
return count;
 }
 
+static void e1000_kick_DMA(struct e1000_adapter *adapter,
+  struct e1000_tx_ring *tx_ring, int i)
+{
+   wmb();
+
+   writel(i, adapter->hw.hw_addr + tx_ring->tdt);
+   /* we need this if more than one processor can write to our tail
+* at a time, it syncronizes IO on IA64/Altix systems */
+   mmiowb();
+}
+
+
 static void
 e1000_tx_queue(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring,
int tx_flags, int count)
@@ -3144,13 +3157,7 @@ e1000_tx_queue(struct e1000_adapter *ada
 * know there are new descriptors to fetch.  (Only
 * applicable for weak-ordered memory model archs,
 * such as IA-64). */
-   wmb();
-
tx_ring->next_to_use = i;
-   writel(i, adapter->hw.hw_addr + tx_ring->tdt);
-   /* we need this if more than one processor can write to our tail
-* at a time, it syncronizes IO on IA64/Altix systems */
-   mmiowb();
 }
 
 /**
@@ -3257,21 +3264,31 @@ static int e1000_maybe_stop_tx(struct ne
 }
 
 #define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1 )
+
+struct e1000_tx_cbdata {
+   int count;
+   unsigned int max_per_txd;
+   unsigned int nr_frags;
+   unsigned int mss;
+};
+
+#define E1000_SKB_CB(__skb)	((struct e1000_tx_cbdata *)&((__skb)->cb[0]))
+#define NETDEV_TX_DROPPED  -5
+
 static int
-e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
+e1000_prep_queue_frame(struct sk_buff *skb, struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_tx_ring *tx_ring;
-   unsigned int first, max_per_txd = E1000_MAX_DATA_PER_TXD;
+   unsigned int max_per_txd = E1000_MAX_DATA_PER_TXD;
unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
-   unsigned int tx_flags = 0;
unsigned int len = skb->len;
-   unsigned long flags;
-   unsigned int nr_frags = 0;
-   unsigned int mss = 0;
+   unsigned int nr_frags;
+   unsigned int mss;
int count = 0;
-   int tso;
unsigned int f;
+   struct e1000_tx_cbdata *cb = E1000_SKB_CB(skb);
+
len -= skb->data_len;
 
/* This goes back to the question of how to logically map a tx queue
@@ -3282,7 +3299,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
 
if (unlikely(skb->len <= 0)) {
dev_kfree_skb_any(skb);
-   return NETDEV_TX_OK;
+   return NETDEV_TX_DROPPED;
}
 
/* 82571 and newer doesn't need the workaround that limited descriptor
@@ -3328,7 +3345,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
DPRINTK(DRV, ERR,
"__pskb_pull_tail failed.\n");
dev_kfree_skb_any(skb);
-   return NETDEV_TX_OK;
+   return NETDEV_TX_DROPPED;
}
len = skb->len - skb->data_len;
break;
@@ -3372,22 +3389,32 @@ e1000_xmit_frame(struct sk_buff *skb, st
(adapter->hw.mac_type == e1000_82573))
e1000_transfer_dhcp_info(adapter, skb);
 
-   if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags))
-   /* Collision - tell upper layer to requeue */
-   return NETDEV_TX_LOCKED;
+   cb->count = count;
+   cb->max_per_txd = 

Oops in e100_up

2007-08-22 Thread Gerrit Renker
With the davem-2.6.24 tree I get the following Oops in the e100 driver (cribbed 
from console):

Code: 6c ff ff ff 8b 48 0c ba 01 00 00 00 89 f0 e8 1b f2 ff ff c7 86 9c 00 00 00
  01 00 00 00 e9 4e ff ff ff 89 d0 e8 b3 f8 0b 00 eb 8e 0f 0b eb fe 55 89 
e5
  56 53 83 ec 0c 8b 98 dc 01 00 00 e8 ff b9

EIP: e100_up+0x11d/0x121 

SS:ESP 0068:f759ce38

Stack: syscall_call -> sys_ioctl -> vfs_ioctl -> do_ioctl -> sock_ioctl ->
inet_ioctl -> devinet_ioctl ->
   dev_change_flags -> dev_open -> e100_open -> oops

The system log then goes on reporting "eth0: link up, 100Mbps, full-duplex" and
hangs while trying to restore the serial console state (not sure that this is
related).


Re: net-2.6.24 failure with netconsole

2007-08-22 Thread David Miller
From: Andrew Morton [EMAIL PROTECTED]
Date: Tue, 21 Aug 2007 22:54:38 -0700

 Has anyone tested all this new napi stuff with netconsole?  It's pretty
 disastrous.  It immediately goes BUG in napi_enable().

Thomas Graf has found and fixed a bug in the netconsole napi
bits a few hours ago, maybe it fixes this problem?



Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-22 Thread David Miller
From: Krishna Kumar2 [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 12:33:04 +0530

 Does turning off batching solve that problem? What I mean by that is:
 batching can be disabled if a TSO device is worse for some cases.

This new batching stuff isn't going to be enabled or disabled
on a per-device basis just to get parity with how things are
now.

It should be enabled by default, and give at least as good
performance as what can be obtained right now.

Otherwise it's a clear regression.


Re: Oops in e100_up

2007-08-22 Thread David Miller
From: Gerrit Renker [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 09:56:48 +0100

 With the davem-2.6.24 tree I get the following Oops in the e100 driver 
 (cribbed from console):

Probably the NAPI conversion, I'll try to get to diagnosing this
one soon but I've been wrapped up in some other tasks so if
someone could beat me to it that'd be great :-)


Re: net-2.6.24 failure with netconsole

2007-08-22 Thread Thomas Graf
* Andrew Morton [EMAIL PROTECTED] 2007-08-21 22:54
 Which used to be a BUG.  It later oopsed via a null-pointer deref in
 net_rx_action(), which is a much preferable result.

I fixed this already

Index: net-2.6.24/include/linux/netpoll.h
===
--- net-2.6.24.orig/include/linux/netpoll.h 2007-08-22 01:02:14.0 
+0200
+++ net-2.6.24/include/linux/netpoll.h  2007-08-22 01:02:30.0 +0200
@@ -75,7 +75,7 @@ static inline void *netpoll_poll_lock(st
struct net_device *dev = napi->dev;
 
rcu_read_lock(); /* deal with race on ->npinfo */
-   if (dev->npinfo) {
+   if (dev && dev->npinfo) {
spin_lock(&napi->poll_lock);
napi->poll_owner = smp_processor_id();
return napi;


[PATCH net-2.6.24] [NET] Cleanup: DIV_ROUND_UP

2007-08-22 Thread Ilpo Järvinen

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c |6 +-
 net/key/af_key.c  |   17 +
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 10b2e39..bca4ee2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -646,11 +646,7 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct 
sk_buff *skb, unsigned
skb_shinfo(skb)->gso_size = 0;
skb_shinfo(skb)->gso_type = 0;
} else {
-   unsigned int factor;
-
-   factor = skb->len + (mss_now - 1);
-   factor /= mss_now;
-   skb_shinfo(skb)->gso_segs = factor;
+   skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss_now);
skb_shinfo(skb)->gso_size = mss_now;
skb_shinfo(skb)->gso_type = sk->sk_gso_type;
}
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 5502df1..17b2a69 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -352,16 +352,14 @@ static int verify_address_len(void *p)
 
switch (addr->sa_family) {
case AF_INET:
-   len  = sizeof(*sp) + sizeof(*sin) + (sizeof(uint64_t) - 1);
-   len /= sizeof(uint64_t);
+   len = DIV_ROUND_UP(sizeof(*sp) + sizeof(*sin), sizeof(uint64_t));
if (sp->sadb_address_len != len ||
sp->sadb_address_prefixlen > 32)
return -EINVAL;
break;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
case AF_INET6:
-   len  = sizeof(*sp) + sizeof(*sin6) + (sizeof(uint64_t) - 1);
-   len /= sizeof(uint64_t);
+   len = DIV_ROUND_UP(sizeof(*sp) + sizeof(*sin6), sizeof(uint64_t));
if (sp->sadb_address_len != len ||
sp->sadb_address_prefixlen > 128)
return -EINVAL;
@@ -386,14 +384,9 @@ static int verify_address_len(void *p)
 
 static inline int pfkey_sec_ctx_len(struct sadb_x_sec_ctx *sec_ctx)
 {
-   int len = 0;
-
-   len += sizeof(struct sadb_x_sec_ctx);
-   len += sec_ctx->sadb_x_ctx_len;
-   len += sizeof(uint64_t) - 1;
-   len /= sizeof(uint64_t);
-
-   return len;
+   return DIV_ROUND_UP(sizeof(struct sadb_x_sec_ctx) +
+   sec_ctx->sadb_x_ctx_len,
+   sizeof(uint64_t));
 }
 
 static inline int verify_sec_ctx_len(void *p)
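
[Editorial note: DIV_ROUND_UP comes from include/linux/kernel.h and is exactly
the open-coded "add (d - 1), then divide" idiom being removed:

	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

so the conversions above do not change behaviour.]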
-- 
1.5.0.6


[PATCH -mm] drivers/net/e1000e/netdev.c warning fix

2007-08-22 Thread Michal Piotrowski
Hi,

This patch fixes the following compilation warning

drivers/net/e1000e/netdev.c: In function ‘e1000_setup_rctl’:
drivers/net/e1000e/netdev.c:1963: warning: unused variable ‘pages’

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

--- linux-mm-clean/drivers/net/e1000e/netdev.c  2007-08-22 12:20:31.0 
+0200
+++ linux-work/drivers/net/e1000e/netdev.c  2007-08-22 14:44:58.0 
+0200
@@ -1960,7 +1960,10 @@ static void e1000_setup_rctl(struct e100
struct e1000_hw *hw = adapter-hw;
u32 rctl, rfctl;
u32 psrctl = 0;
+
+#ifndef CONFIG_E1000_DISABLE_PACKET_SPLIT
u32 pages = 0;
+#endif
 
/* Program MC offset vector base */
rctl = er32(RCTL);


Re: [PATCH 1/1] net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()

2007-08-22 Thread Benjamin Thery

Oops, don't use the previous version of the patch:
the change in dev_mc_unsync() was not correct.
Sorry.

This one is a lot better (it compiles and runs). :)

Benjamin
--
B e n j a m i n   T h e r y  - BULL/DT/Open Software RD

   http://www.bull.com
From: [EMAIL PROTECTED]
Subject: net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()

This patch fixes a crash that may occur when dev_mc_sync() deletes an
address from the list it is currently traversing: the pointer to the next
element is now saved before the current one is deleted.
The same problem may also exist in dev_mc_unsync().
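
A minimal, self-contained sketch of the traversal pattern the patch adopts
(hypothetical types, not the kernel's dev_addr_list code): when the loop body
may delete the current node, the next pointer must be saved before the deletion.

#include <stdlib.h>

struct addr_node {
	struct addr_node *next;
	int synced;
};

static void prune_list(struct addr_node **head)
{
	struct addr_node *cur = *head, *next;

	while (cur != NULL) {
		next = cur->next;          /* save before possibly freeing */
		if (cur->synced) {
			*head = next;      /* unlink ... */
			free(cur);         /* ... and free the current node */
		} else {
			head = &cur->next;
		}
		cur = next;                /* still valid after the free */
	}
}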

Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
---
 net/core/dev_mcast.c |   14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc2/net/core/dev_mcast.c
===
--- linux-2.6.23-rc2.orig/net/core/dev_mcast.c
+++ linux-2.6.23-rc2/net/core/dev_mcast.c
@@ -116,11 +116,13 @@ int dev_mc_add(struct net_device *dev, v
  */
 int dev_mc_sync(struct net_device *to, struct net_device *from)
 {
-	struct dev_addr_list *da;
+	struct dev_addr_list *da, *next;
 	int err = 0;
 
 	netif_tx_lock_bh(to);
-	for (da = from-mc_list; da != NULL; da = da-next) {
+	da = from-mc_list;
+	while (da != NULL) {
+		next = da-next;
 		if (!da-da_synced) {
 			err = __dev_addr_add(to-mc_list, to-mc_count,
 	 da-da_addr, da-da_addrlen, 0);
@@ -134,6 +136,7 @@ int dev_mc_sync(struct net_device *to, s
 			__dev_addr_delete(from-mc_list, from-mc_count,
 	  da-da_addr, da-da_addrlen, 0);
 		}
+		da = next;
 	}
 	if (!err)
 		__dev_set_rx_mode(to);
@@ -156,12 +159,14 @@ EXPORT_SYMBOL(dev_mc_sync);
  */
 void dev_mc_unsync(struct net_device *to, struct net_device *from)
 {
-	struct dev_addr_list *da;
+	struct dev_addr_list *da, *next;
 
 	netif_tx_lock_bh(from);
 	netif_tx_lock_bh(to);
 
-	for (da = from-mc_list; da != NULL; da = da-next) {
+	da = from-mc_list;
+	while (da != NULL) {
+		next = da-next;
 		if (!da-da_synced)
 			continue;
 		__dev_addr_delete(to-mc_list, to-mc_count,
@@ -169,6 +174,7 @@ void dev_mc_unsync(struct net_device *to
 		da-da_synced = 0;
 		__dev_addr_delete(from-mc_list, from-mc_count,
   da-da_addr, da-da_addrlen, 0);
+		da = next;
 	}
 	__dev_set_rx_mode(to);
 


[PATCH 2.6.23-rc3-mm1] request_irq fix DEBUG_SHIRQ handling Re: 2.6.23-rc2-mm1: rtl8139 inconsistent lock state

2007-08-22 Thread Jarek Poplawski
On 10-08-2007 01:49, Mariusz Kozlowski wrote:
 Hello,
 
 =
 [ INFO: inconsistent lock state ]
 2.6.23-rc2-mm1 #7
 -
 inconsistent {in-hardirq-W} - {hardirq-on-W} usage.
 ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (tp-lock){+...}, at: [de8706e0] rtl8139_interrupt+0x27/0x46b [8139too]
 {in-hardirq-W} state was registered at:
   [c0138eeb] __lock_acquire+0x949/0x11ac
   [c01397e7] lock_acquire+0x99/0xb2
   [c0452ff3] _spin_lock+0x35/0x42
   [de8706e0] rtl8139_interrupt+0x27/0x46b [8139too]
   [c0147a5d] handle_IRQ_event+0x28/0x59
   [c01493ca] handle_level_irq+0xad/0x10b
   [c0105a13] do_IRQ+0x93/0xd0
   [c010441e] common_interrupt+0x2e/0x34
...
 other info that might help us debug this:
 1 lock held by ifconfig/5492:
  #0:  (rtnl_mutex){--..}, at: [c0451778] mutex_lock+0x1c/0x1f
 
 stack backtrace:
...
  [c0452ff3] _spin_lock+0x35/0x42
  [de8706e0] rtl8139_interrupt+0x27/0x46b [8139too]
  [c01480fd] free_irq+0x11b/0x146
  [de871d59] rtl8139_close+0x8a/0x14a [8139too]
  [c03bde63] dev_close+0x57/0x74
...

It looks like this became possible after David's fix, which really
enabled running of the handler in free_irq, but before Andrew's patch,
which disables local irqs for the duration of that call.

So this particular bug should be fixed, but IMHO a similar problem is
possible in request_irq(). And I think this is not only about lockdep
complaining, but a real lockup possibility: any locks taken in such a
handler are acquired in a context they were not written for, and could be
vulnerable (especially with softirqs, but probably hardirqs as well).
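
A small sketch of the scenario described above, with hypothetical handler and
lock names: a handler written for hard-irq context takes its lock with plain
spin_lock(), so a fake invocation from process context must keep local irqs
disabled, otherwise the real (shared) interrupt can fire on the same CPU while
the lock is held and deadlock.

#include <linux/interrupt.h>
#include <linux/spinlock.h>

/* Hypothetical driver state, analogous to rtl8139's tp->lock. */
static DEFINE_SPINLOCK(dev_lock);

static irqreturn_t fake_safe_handler(int irq, void *dev_id)
{
	/* Written for hard-irq context: takes the lock without
	 * disabling interrupts itself. */
	spin_lock(&dev_lock);
	/* ... touch hardware/driver state ... */
	spin_unlock(&dev_lock);
	return IRQ_HANDLED;
}

/* Shape of the DEBUG_SHIRQ test call: keep irqs off around it so a
 * real shared interrupt cannot re-enter while dev_lock is held. */
static void debug_shirq_test_call(int irq, void *dev_id)
{
	unsigned long flags;

	local_irq_save(flags);
	fake_safe_handler(irq, dev_id);
	local_irq_restore(flags);
}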

Reported-by: Mariusz Kozlowski [EMAIL PROTECTED]
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---

diff -Nurp 2.6.23-rc3-mm1-/kernel/irq/manage.c 
2.6.23-rc3-mm1/kernel/irq/manage.c
--- 2.6.23-rc3-mm1-/kernel/irq/manage.c 2007-08-22 13:58:58.0 +0200
+++ 2.6.23-rc3-mm1/kernel/irq/manage.c  2007-08-22 14:12:21.0 +0200
@@ -546,14 +546,11 @@ int request_irq(unsigned int irq, irq_ha
 		 * We do this before actually registering it, to make sure that
 		 * a 'real' IRQ doesn't run in parallel with our fake
 		 */
-		if (irqflags & IRQF_DISABLED) {
-			unsigned long flags;
+		unsigned long flags;
 
-			local_irq_save(flags);
-			handler(irq, dev_id);
-			local_irq_restore(flags);
-		} else
-			handler(irq, dev_id);
+		local_irq_save(flags);
+		handler(irq, dev_id);
+		local_irq_restore(flags);
 	}
 #endif
 


[PATCH] AH4: Update IPv4 options handling to conform to RFC 4302.

2007-08-22 Thread Nick Bowler
I was asked to resend my message here, so here it is.
Please CC me on replies.
---
In testing our ESP/AH offload hardware, I discovered an issue with how AH
handles mutable fields in IPv4.  RFC 4302 (AH) states the following on the
subject:

For IPv4, the entire option is viewed as a unit; so even
though the type and length fields within most options are immutable
in transit, if an option is classified as mutable, the entire option
is zeroed for ICV computation purposes.

The current implementation does not zero the type and length fields, resulting
in authentication failures when communicating with hosts that do (e.g. FreeBSD).
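
A simplified sketch of what the one-line change amounts to (assumed variable
names, not the exact ip_clear_mutable_options() code):

#include <string.h>

/*
 * Zero a mutable IPv4 option in place before computing the AH ICV.
 * optptr points at the option's type byte, optlen is its total length.
 */
static void zero_mutable_option(unsigned char *optptr, int optlen)
{
	/* Old behaviour: keep type and length, zero only the payload:
	 *   memset(optptr + 2, 0, optlen - 2);
	 * RFC 4302: the entire option is treated as a unit and zeroed. */
	memset(optptr, 0, optlen);
}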

I have tested record route and timestamp options (ping -R and ping -T) on a
small network involving Windows XP, FreeBSD 6.2, and Linux hosts, with one
router.  In the presence of these options, the FreeBSD and Linux hosts (with
the patch or with the hardware) can communicate.  The Windows XP host simply
fails to accept these packets with or without the patch.

I have also been trying to test source routing options (using traceroute -g),
but haven't had much luck getting this option to work *without* AH, let alone
with.

Signed-off-by: Nick Bowler [EMAIL PROTECTED]
---
 net/ipv4/ah4.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 7a23e59..39f6211 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -46,7 +46,7 @@ static int ip_clear_mutable_options(struct iphdr *iph, __be32 
*daddr)
memcpy(daddr, optptr+optlen-4, 4);
/* Fall through */
default:
-   memset(optptr+2, 0, optlen-2);
+   memset(optptr, 0, optlen);
}
l -= optlen;
optptr += optlen;
-- 
1.5.2.2

-- 
Nick Bowler, Elliptic Semiconductor (http://www.ellipticsemi.com/)



Re: [PATCH 10/10 Rev4] [E1000] Implement batching

2007-08-22 Thread Kok, Auke

Krishna Kumar wrote:

E1000: Implement batching capability (ported thanks to changes taken from
Jamal). Not all of the changes made in IPoIB are present here, e.g.
handling of out-of-order skbs (see XXX in the first mail).

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
 e1000_main.c |  150 +++
 1 files changed, 121 insertions(+), 29 deletions(-)



Krishna,

While I appreciate the patch, I would have preferred a patch to e1000e. Not only 
does the e1000e driver remove a lot of the workarounds for old silicon, it is 
also a good way for us to move the current e1000 driver into a bit more stable 
maintenance mode.


Do you think you can write this patch for e1000e instead? Code-wise a lot of 
things are still the same, so your patch should be relatively easy to generate.


e1000e currently lives in a branch of Jeff Garzik's netdev-2.6 tree.

Thanks,

Auke



diff -ruNp org/drivers/net/e1000/e1000_main.c new/drivers/net/e1000/e1000_main.c
--- org/drivers/net/e1000/e1000_main.c  2007-08-20 14:26:29.0 +0530
+++ new/drivers/net/e1000/e1000_main.c  2007-08-22 08:33:51.0 +0530
@@ -157,6 +157,7 @@ static void e1000_update_phy_info(unsign
 static void e1000_watchdog(unsigned long data);
 static void e1000_82547_tx_fifo_stall(unsigned long data);
 static int e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
+static int e1000_xmit_frames(struct net_device *dev);
 static struct net_device_stats * e1000_get_stats(struct net_device *netdev);
 static int e1000_change_mtu(struct net_device *netdev, int new_mtu);
 static int e1000_set_mac(struct net_device *netdev, void *p);
@@ -990,7 +991,7 @@ e1000_probe(struct pci_dev *pdev,
if (pci_using_dac)
netdev-features |= NETIF_F_HIGHDMA;
 
-	netdev-features |= NETIF_F_LLTX;

+   netdev-features |= NETIF_F_LLTX | NETIF_F_BATCH_SKBS;
 
 	adapter-en_mng_pt = e1000_enable_mng_pass_thru(adapter-hw);
 
@@ -3098,6 +3099,18 @@ e1000_tx_map(struct e1000_adapter *adapt

return count;
 }
 
+static void e1000_kick_DMA(struct e1000_adapter *adapter,

+  struct e1000_tx_ring *tx_ring, int i)
+{
+   wmb();
+
+   writel(i, adapter-hw.hw_addr + tx_ring-tdt);
+   /* we need this if more than one processor can write to our tail
+* at a time, it syncronizes IO on IA64/Altix systems */
+   mmiowb();
+}
+
+
 static void
 e1000_tx_queue(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring,
int tx_flags, int count)
@@ -3144,13 +3157,7 @@ e1000_tx_queue(struct e1000_adapter *ada
 * know there are new descriptors to fetch.  (Only
 * applicable for weak-ordered memory model archs,
 * such as IA-64). */
-   wmb();
-
tx_ring-next_to_use = i;
-   writel(i, adapter-hw.hw_addr + tx_ring-tdt);
-   /* we need this if more than one processor can write to our tail
-* at a time, it syncronizes IO on IA64/Altix systems */
-   mmiowb();
 }
 
 /**

@@ -3257,21 +3264,31 @@ static int e1000_maybe_stop_tx(struct ne
 }
 
 #define TXD_USE_COUNT(S, X) (((S)  (X)) + 1 )

+
+struct e1000_tx_cbdata {
+   int count;
+   unsigned int max_per_txd;
+   unsigned int nr_frags;
+   unsigned int mss;
+};
+
+#define E1000_SKB_CB(__skb)((struct e1000_tx_cbdata *)((__skb)-cb[0]))
+#define NETDEV_TX_DROPPED  -5
+
 static int
-e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
+e1000_prep_queue_frame(struct sk_buff *skb, struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_tx_ring *tx_ring;
-   unsigned int first, max_per_txd = E1000_MAX_DATA_PER_TXD;
+   unsigned int max_per_txd = E1000_MAX_DATA_PER_TXD;
unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
-   unsigned int tx_flags = 0;
unsigned int len = skb-len;
-   unsigned long flags;
-   unsigned int nr_frags = 0;
-   unsigned int mss = 0;
+   unsigned int nr_frags;
+   unsigned int mss;
int count = 0;
-   int tso;
unsigned int f;
+   struct e1000_tx_cbdata *cb = E1000_SKB_CB(skb);
+
len -= skb-data_len;
 
 	/* This goes back to the question of how to logically map a tx queue

@@ -3282,7 +3299,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
 
 	if (unlikely(skb-len = 0)) {

dev_kfree_skb_any(skb);
-   return NETDEV_TX_OK;
+   return NETDEV_TX_DROPPED;
}
 
 	/* 82571 and newer doesn't need the workaround that limited descriptor

@@ -3328,7 +3345,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
DPRINTK(DRV, ERR,
__pskb_pull_tail failed.\n);
dev_kfree_skb_any(skb);
-   return NETDEV_TX_OK;
+   return NETDEV_TX_DROPPED;
   

Re: [PATCH -mm] drivers/net/e1000e/netdev.c warning fix

2007-08-22 Thread Kok, Auke

Michal Piotrowski wrote:

Hi,

This patch fixes the following compilation warning

drivers/net/e1000e/netdev.c: In function ‘e1000_setup_rctl’:
drivers/net/e1000e/netdev.c:1963: warning: unused variable ‘pages’

Regards,
Michal



This also exposes a symbol issue. I think I want to remove this #ifdef 
CONFIG_E1000_DISABLE_PACKET_SPLIT from this driver altogether...


Auke


[PATCH 1/4] ehea: fix interface to DLPAR tools

2007-08-22 Thread Jan-Bernd Themann
The userspace DLPAR tool expects decimal numbers to be written to
and read from the sysfs entries.
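
A minimal sketch of the convention the patch switches to, using a hypothetical
attribute rather than the real ehea code: print the id as decimal and parse it
back as decimal in the paired show/store handlers.

#include <linux/device.h>
#include <linux/kernel.h>

/* Hypothetical structure holding the id exposed via sysfs. */
struct demo_port {
	u32 logical_port_id;
};

static ssize_t demo_show_port_id(struct device *dev,
				 struct device_attribute *attr, char *buf)
{
	struct demo_port *port = dev_get_drvdata(dev);

	/* Decimal, matching what the userspace DLPAR tool expects. */
	return sprintf(buf, "%u", port->logical_port_id);
}

static ssize_t demo_store_port_id(struct device *dev,
				  struct device_attribute *attr,
				  const char *buf, size_t count)
{
	struct demo_port *port = dev_get_drvdata(dev);

	/* Parse it back as decimal as well, not as hex ("%X"). */
	sscanf(buf, "%u", &port->logical_port_id);
	return count;
}

static DEVICE_ATTR(log_port_id, S_IRUGO | S_IWUSR,
		   demo_show_port_id, demo_store_port_id);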

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---
 drivers/net/ehea/ehea_main.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 9756211..22d000f 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2490,7 +2490,7 @@ static ssize_t ehea_show_port_id(struct device *dev,
 struct device_attribute *attr, char *buf)
 {
 	struct ehea_port *port = container_of(dev, struct ehea_port, ofdev.dev);
-	return sprintf(buf, "0x%X", port->logical_port_id);
+	return sprintf(buf, "%d", port->logical_port_id);
 }
 
 static DEVICE_ATTR(log_port_id, S_IRUSR | S_IRGRP | S_IROTH, ehea_show_port_id,
@@ -2781,7 +2781,7 @@ static ssize_t ehea_probe_port(struct device *dev,
 
u32 logical_port_id;
 
-	sscanf(buf, "%X", &logical_port_id);
+	sscanf(buf, "%d", &logical_port_id);
 
port = ehea_get_port(adapter, logical_port_id);
 
@@ -2834,7 +2834,7 @@ static ssize_t ehea_remove_port(struct device *dev,
int i;
u32 logical_port_id;
 
-	sscanf(buf, "%X", &logical_port_id);
+	sscanf(buf, "%d", &logical_port_id);
 
port = ehea_get_port(adapter, logical_port_id);
 
-- 
1.5.2



[PATCH 2/4] ehea: fix module parameter description

2007-08-22 Thread Jan-Bernd Themann
Update the module parameter description of use_mcs to show the correct
default value.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 22d000f..db57474 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -76,7 +76,7 @@ MODULE_PARM_DESC(rq1_entries, Number of entries for Receive 
Queue 1 
 MODULE_PARM_DESC(sq_entries,  Number of entries for the Send Queue  
 [2^x - 1], x = [6..14]. Default = 
 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ));
-MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 1 ");
+MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");
 
 static int port_name_cnt = 0;
 static LIST_HEAD(adapter_list);
-- 
1.5.2



[PATCH 3/4] ehea: fix queue destructor

2007-08-22 Thread Jan-Bernd Themann
Include hcp_epas_dtor() in the eq/cq/qp destructors to unmap the
HW registers.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea_qmr.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ehea/ehea_qmr.c b/drivers/net/ehea/ehea_qmr.c
index a36fa6c..c82e245 100644
--- a/drivers/net/ehea/ehea_qmr.c
+++ b/drivers/net/ehea/ehea_qmr.c
@@ -235,6 +235,8 @@ int ehea_destroy_cq(struct ehea_cq *cq)
if (!cq)
return 0;
 
+   hcp_epas_dtor(cq-epas);
+
if ((hret = ehea_destroy_cq_res(cq, NORMAL_FREE)) == H_R_STATE) {
ehea_error_data(cq-adapter, cq-fw_handle);
hret = ehea_destroy_cq_res(cq, FORCE_FREE);
@@ -361,6 +363,8 @@ int ehea_destroy_eq(struct ehea_eq *eq)
if (!eq)
return 0;
 
+   hcp_epas_dtor(eq-epas);
+
if ((hret = ehea_destroy_eq_res(eq, NORMAL_FREE)) == H_R_STATE) {
ehea_error_data(eq-adapter, eq-fw_handle);
hret = ehea_destroy_eq_res(eq, FORCE_FREE);
@@ -541,6 +545,8 @@ int ehea_destroy_qp(struct ehea_qp *qp)
if (!qp)
return 0;
 
+   hcp_epas_dtor(qp-epas);
+
if ((hret = ehea_destroy_qp_res(qp, NORMAL_FREE)) == H_R_STATE) {
ehea_error_data(qp-adapter, qp-fw_handle);
hret = ehea_destroy_qp_res(qp, FORCE_FREE);
-- 
1.5.2



[PATCH 4/4] ehea: show physical port state

2007-08-22 Thread Jan-Bernd Themann
Introduce a module parameter that decides whether the physical port link
state is propagated to the network stack or not.
It makes sense not to take the physical port state into account on
machines with multiple logical partitions that communicate with each
other; this is always possible no matter what the physical port state
is, so in that setup eHEA can be considered a switch.
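
A tiny sketch (assumed names, not the ehea code) of the carrier gating the new
parameter introduces:

#include <linux/netdevice.h>

/*
 * Only let the physical port state influence the carrier when the
 * administrator asked for it; traffic between logical partitions
 * works regardless of the external link.
 */
static void demo_update_carrier(struct net_device *netdev,
				int show_phys_link, int phy_link_up)
{
	if (!show_phys_link || phy_link_up)
		netif_carrier_on(netdev);
	else
		netif_carrier_off(netdev);
}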

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h  |5 -
 drivers/net/ehea/ehea_main.c |   14 +-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..8d58be5 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSION	"EHEA_0073"
+#define DRV_VERSION	"EHEA_0074"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -402,6 +402,8 @@ struct ehea_mc_list {
 
 #define EHEA_PORT_UP 1
 #define EHEA_PORT_DOWN 0
+#define EHEA_PHY_LINK_UP 1
+#define EHEA_PHY_LINK_DOWN 0
 #define EHEA_MAX_PORT_RES 16
 struct ehea_port {
struct ehea_adapter *adapter;/* adapter that owns this port */
@@ -427,6 +429,7 @@ struct ehea_port {
u32 msg_enable;
u32 sig_comp_iv;
u32 state;
+   u8 phy_link;
u8 full_duplex;
u8 autoneg;
u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index db57474..1804c99 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -53,17 +53,21 @@ static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
 static int num_tx_qps = EHEA_NUM_TX_QP;
+static int show_phys_link = 0;
 
 module_param(msg_level, int, 0);
 module_param(rq1_entries, int, 0);
 module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
+module_param(show_phys_link, int, 0);
 module_param(use_mcs, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, Number of TX-QPS);
 MODULE_PARM_DESC(msg_level, msg_level);
+MODULE_PARM_DESC(show_phys_link, Show link state of external port
+1:yes, 0: no.  Default = 0 );
 MODULE_PARM_DESC(rq3_entries, Number of entries for Receive Queue 3 
 [2^x - 1], x = [6..14]. Default = 
 __MODULE_STRING(EHEA_DEF_ENTRIES_RQ3) ));
@@ -814,7 +818,9 @@ int ehea_set_portspeed(struct ehea_port *port, u32 
port_speed)
ehea_error(Failed setting port speed);
}
}
-   netif_carrier_on(port-netdev);
+   if (!show_phys_link || (port-phy_link == EHEA_PHY_LINK_UP))
+   netif_carrier_on(port-netdev);
+
kfree(cb4);
 out:
return ret;
@@ -869,13 +875,19 @@ static void ehea_parse_eqe(struct ehea_adapter *adapter, 
u64 eqe)
}
 
if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PORT_UP, eqe)) {
+   port-phy_link = EHEA_PHY_LINK_UP;
if (netif_msg_link(port))
ehea_info(%s: Physical port up,
  port-netdev-name);
+   if (show_phys_link)
+   netif_carrier_on(port-netdev);
} else {
+   port-phy_link = EHEA_PHY_LINK_DOWN;
if (netif_msg_link(port))
ehea_info(%s: Physical port down,
  port-netdev-name);
+   if (show_phys_link)
+   netif_carrier_off(port-netdev);
}
 
if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PRIMARY, eqe))
-- 
1.5.2



[NET] sgiseeq: Fix return type of sgiseeq_remove

2007-08-22 Thread Ralf Baechle
The driver remove method needs to return an int, not void.  This was just
never noticed because this driver is usually not built as a module.
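
For context, a minimal sketch of the expected shape (hypothetical driver name,
not the sgiseeq code): platform_driver.remove returns int, so the remove method
has to as well.

#include <linux/init.h>
#include <linux/platform_device.h>

static int __exit demo_remove(struct platform_device *pdev)
{
	/* ... unregister the netdev and free resources here ... */
	platform_set_drvdata(pdev, NULL);
	return 0;	/* an int, not void; the driver core checks it */
}

static struct platform_driver demo_driver = {
	.remove	= __exit_p(demo_remove),
	.driver	= {
		.name	= "demo",
	},
};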

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

diff --git a/drivers/net/sgiseeq.c b/drivers/net/sgiseeq.c
index 384b468..0fb74cb 100644
--- a/drivers/net/sgiseeq.c
+++ b/drivers/net/sgiseeq.c
@@ -726,7 +726,7 @@ err_out:
return err;
 }
 
-static void __exit sgiseeq_remove(struct platform_device *pdev)
+static int __exit sgiseeq_remove(struct platform_device *pdev)
 {
struct net_device *dev = platform_get_drvdata(pdev);
struct sgiseeq_private *sp = netdev_priv(dev);
@@ -735,6 +735,8 @@ static void __exit sgiseeq_remove(struct platform_device 
*pdev)
free_page((unsigned long) sp-srings);
free_netdev(dev);
platform_set_drvdata(pdev, NULL);
+
+   return 0;
 }
 
 static struct platform_driver sgiseeq_driver = {


[PATCH 1/3] e1000e: retire last_tx_tso workaround

2007-08-22 Thread Auke Kok
This TSO-related workaround is no longer needed since it's only
applicable for 8254x silicon.

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000e/e1000.h  |   15 +++
 drivers/net/e1000e/netdev.c |   20 ++--
 2 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index e3cd877..bbe5faf 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -142,18 +142,9 @@ struct e1000_ring {
/* array of buffer information structs */
struct e1000_buffer *buffer_info;
 
-   union {
-   /* for TX */
-   struct {
-   bool last_tx_tso; /* used to mark tso desc.  */
-   };
-   /* for RX */
-   struct {
-   /* arrays of page information for packet split */
-   struct e1000_ps_page *ps_pages;
-   struct sk_buff *rx_skb_top;
-   };
-   };
+   /* arrays of page information for packet split */
+   struct e1000_ps_page *ps_pages;
+   struct sk_buff *rx_skb_top;
 
struct e1000_queue_stats stats;
 };
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 8ebe238..4916f7c 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1483,7 +1483,6 @@ static void e1000_clean_tx_ring(struct e1000_adapter 
*adapter)
 
tx_ring-next_to_use = 0;
tx_ring-next_to_clean = 0;
-   tx_ring-last_tx_tso = 0;
 
writel(0, adapter-hw.hw_addr + tx_ring-head);
writel(0, adapter-hw.hw_addr + tx_ring-tail);
@@ -3216,15 +3215,6 @@ static int e1000_tx_map(struct e1000_adapter *adapter,
while (len) {
buffer_info = tx_ring-buffer_info[i];
size = min(len, max_per_txd);
-   /* Workaround for Controller erratum --
-* descriptor for non-tso packet in a linear SKB that follows a
-* tso gets written back prematurely before the data is fully
-* DMA'd to the controller */
-   if (tx_ring-last_tx_tso  !skb_is_gso(skb)) {
-   tx_ring-last_tx_tso = 0;
-   if (!skb-data_len)
-   size -= 4;
-   }
 
/* Workaround for premature desc write-backs
 * in TSO mode.  Append 4-byte sentinel desc */
@@ -3497,10 +3487,6 @@ static int e1000_xmit_frame(struct sk_buff *skb, struct 
net_device *netdev)
count++;
count++;
 
-   /* Controller Erratum workaround */
-   if (!skb-data_len  tx_ring-last_tx_tso  !skb_is_gso(skb))
-   count++;
-
count += TXD_USE_COUNT(len, max_txd_pwr);
 
nr_frags = skb_shinfo(skb)-nr_frags;
@@ -3536,12 +3522,10 @@ static int e1000_xmit_frame(struct sk_buff *skb, struct 
net_device *netdev)
return NETDEV_TX_OK;
}
 
-   if (tso) {
-   tx_ring-last_tx_tso = 1;
+   if (tso)
tx_flags |= E1000_TX_FLAGS_TSO;
-   } else if (e1000_tx_csum(adapter, skb)) {
+   else if (e1000_tx_csum(adapter, skb))
tx_flags |= E1000_TX_FLAGS_CSUM;
-   }
 
/* Old method was to assume IPv4 packet by default if TSO was enabled.
 * 82571 hardware supports TSO capabilities for IPv6 as well...


[PATCH 2/3] e1000e: Add read code and printout of PBA number (board identifier)

2007-08-22 Thread Auke Kok
The PBA number allows customers and support to directly identify the type
of board and its characteristics, such as different SKUs.

Slightly enhance the loading messages by adding the module name to the printout.

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000e/defines.h |6 --
 drivers/net/e1000e/e1000.h   |2 ++
 drivers/net/e1000e/lib.c |   21 +
 drivers/net/e1000e/netdev.c  |   12 +---
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/net/e1000e/defines.h b/drivers/net/e1000e/defines.h
index ca80fde..b32ed45 100644
--- a/drivers/net/e1000e/defines.h
+++ b/drivers/net/e1000e/defines.h
@@ -573,9 +573,11 @@
 /* For checksumming, the sum of all words in the NVM should equal 0xBABA. */
 #define NVM_SUM0xBABA
 
-#define NVM_WORD_SIZE_BASE_SHIFT   6
+/* PBA (printed board assembly) number words */
+#define NVM_PBA_OFFSET_0   8
+#define NVM_PBA_OFFSET_1   9
 
-/* NVM Commands - Microwire */
+#define NVM_WORD_SIZE_BASE_SHIFT   6
 
 /* NVM Commands - SPI */
 #define NVM_MAX_RETRY_SPI  5000 /* Max wait of 5ms, for RDY signal */
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index bbe5faf..c57e35a 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -358,6 +358,8 @@ extern struct e1000_info e1000_ich8_info;
 extern struct e1000_info e1000_ich9_info;
 extern struct e1000_info e1000_es2_info;
 
+extern s32 e1000e_read_part_num(struct e1000_hw *hw, u32 *part_num);
+
 extern s32  e1000e_commit_phy(struct e1000_hw *hw);
 
 extern bool e1000e_enable_mng_pass_thru(struct e1000_hw *hw);
diff --git a/drivers/net/e1000e/lib.c b/drivers/net/e1000e/lib.c
index 6645c21..3bbfe60 100644
--- a/drivers/net/e1000e/lib.c
+++ b/drivers/net/e1000e/lib.c
@@ -2464,3 +2464,24 @@ bool e1000e_enable_mng_pass_thru(struct e1000_hw *hw)
return ret_val;
 }
 
+s32 e1000e_read_part_num(struct e1000_hw *hw, u32 *part_num)
+{
+	s32 ret_val;
+	u16 nvm_data;
+
+	ret_val = e1000_read_nvm(hw, NVM_PBA_OFFSET_0, 1, &nvm_data);
+	if (ret_val) {
+		hw_dbg(hw, "NVM Read Error\n");
+		return ret_val;
+	}
+	*part_num = (u32)(nvm_data << 16);
+
+	ret_val = e1000_read_nvm(hw, NVM_PBA_OFFSET_1, 1, &nvm_data);
+	if (ret_val) {
+		hw_dbg(hw, "NVM Read Error\n");
+		return ret_val;
+	}
+	*part_num |= nvm_data;
+
+	return 0;
+}
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4916f7c..420e111 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -3966,6 +3966,7 @@ static void e1000_print_device_info(struct e1000_adapter 
*adapter)
 {
struct e1000_hw *hw = adapter-hw;
struct net_device *netdev = adapter-netdev;
+   u32 part_num;
 
/* print bus type/speed/width info */
ndev_info(netdev, (PCI Express:2.5GB/s:%s) 
@@ -3980,6 +3981,10 @@ static void e1000_print_device_info(struct e1000_adapter 
*adapter)
ndev_info(netdev, Intel(R) PRO/%s Network Connection\n,
  (hw-phy.type == e1000_phy_ife)
   ? 10/100 : 1000);
+	e1000e_read_part_num(hw, &part_num);
+	ndev_info(netdev, "MAC: %d, PHY: %d, PBA No: %06x-%03x\n",
+		  hw->mac.type, hw->phy.type,
+		  (part_num >> 8), (part_num & 0xff));
 }
 
 /**
@@ -4414,9 +4419,10 @@ static struct pci_driver e1000_driver = {
 static int __init e1000_init_module(void)
 {
int ret;
-	printk(KERN_INFO "Intel(R) PRO/1000 Network Driver - %s\n",
-	       e1000e_driver_version);
-	printk(KERN_INFO "Copyright (c) 1999-2007 Intel Corporation.\n");
+	printk(KERN_INFO "%s: Intel(R) PRO/1000 Network Driver - %s\n",
+	       e1000e_driver_name, e1000e_driver_version);
+	printk(KERN_INFO "%s: Copyright (c) 1999-2007 Intel Corporation.\n",
+	       e1000e_driver_name);
ret = pci_register_driver(e1000_driver);
 
return ret;


[PATCH 3/3] e1000e: Remove conditional packet split disable flag

2007-08-22 Thread Auke Kok
This flag conflicts with e1000's Kconfig symbol and we'll leave
the feature enabled by default for now.

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000e/netdev.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 420e111..372da46 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2009,7 +2009,6 @@ static void e1000_setup_rctl(struct e1000_adapter 
*adapter)
break;
}
 
-#ifndef CONFIG_E1000_DISABLE_PACKET_SPLIT
/*
 * 82571 and greater support packet-split where the protocol
 * header is placed in skb-data and the packet data is
@@ -2029,7 +2028,7 @@ static void e1000_setup_rctl(struct e1000_adapter 
*adapter)
pages = PAGE_USE_COUNT(adapter-netdev-mtu);
if ((pages = 3)  (PAGE_SIZE = 16384)  (rctl  E1000_RCTL_LPE))
adapter-rx_ps_pages = pages;
-#endif
+
if (adapter-rx_ps_pages) {
/* Configure extra packet-split registers */
rfctl = er32(RFCTL);


[PATCH] [03/10] pasemi_mac: Enable L2 caching of packet headers

2007-08-22 Thread Olof Johansson
Enable settings to target the L2 cache for the first few cachelines of the
packet, since we'll access them to get at the various headers.


Signed-off-by: Olof Johansson [EMAIL PROTECTED]


Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -216,7 +216,7 @@ static int pasemi_mac_setup_rx_resources
   PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE  2));
 
write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id),
-  PAS_DMA_RXCHAN_CFG_HBU(1));
+  PAS_DMA_RXCHAN_CFG_HBU(2));
 
write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac-dma_if),
   PAS_DMA_RXINT_BASEL_BRBL(__pa(ring-buffers)));
@@ -225,6 +225,9 @@ static int pasemi_mac_setup_rx_resources
   PAS_DMA_RXINT_BASEU_BRBH(__pa(ring-buffers)  32) |
   PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE  3));
 
+   write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac-dma_if),
+  PAS_DMA_RXINT_CFG_DHL(2));
+
ring-next_to_fill = 0;
ring-next_to_clean = 0;
 
Index: mainline/drivers/net/pasemi_mac.h
===
--- mainline.orig/drivers/net/pasemi_mac.h
+++ mainline/drivers/net/pasemi_mac.h
@@ -218,6 +218,14 @@ enum {
 #definePAS_DMA_RXINT_RCMDSTA_ACT   0x0001
 #definePAS_DMA_RXINT_RCMDSTA_DROPS_M   0xfffe
 #definePAS_DMA_RXINT_RCMDSTA_DROPS_S   17
+#define PAS_DMA_RXINT_CFG(i)   (0x204+(i)*_PAS_DMA_RXINT_STRIDE)
+#definePAS_DMA_RXINT_CFG_DHL_M 0x0700
+#definePAS_DMA_RXINT_CFG_DHL_S 24
+#definePAS_DMA_RXINT_CFG_DHL(x)(((x)  PAS_DMA_RXINT_CFG_DHL_S)  \
+PAS_DMA_RXINT_CFG_DHL_M)
+#definePAS_DMA_RXINT_CFG_WIF   0x0002
+#definePAS_DMA_RXINT_CFG_WIL   0x0001
+
 #define PAS_DMA_RXINT_INCR(i)  (0x210+(i)*_PAS_DMA_RXINT_STRIDE)
 #definePAS_DMA_RXINT_INCR_INCR_M   0x
 #definePAS_DMA_RXINT_INCR_INCR_S   0

--


[PATCH] [00/10] pasemi_mac patches for 2.6.24

2007-08-22 Thread Olof Johansson
Hi,

pasemi_mac patches for 2.6.24:

01/10: pasemi_mac: Abstract out register access
02/10: pasemi_mac: Stop using the pci config space accessors for register 
read/writes
03/10: pasemi_mac: Enable L2 caching of packet headers
04/10: pasemi_mac: Fix memcpy amount for short receives
05/10: pasemi_mac: RX performance tweaks
06/10: pasemi_mac: Batch up TX buffer frees
07/10: pasemi_mac: Enable LLTX
08/10: pasemi_mac: Fix TX ring wrap checking
09/10: pasemi_mac: Fix RX checksum flags
10/10: pasemi_mac: Clean TX ring in poll


Thanks,

Olof

--


[PATCH] [09/10] pasemi_mac: Fix RX checksum flags

2007-08-22 Thread Olof Johansson
The RX side flag to use is CHECKSUM_UNNECESSARY, not CHECKSUM_COMPLETE.

Signed-off-by: Olof Johansson [EMAIL PROTECTED]

Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -534,7 +534,7 @@ static int pasemi_mac_clean_rx(struct pa
 		skb_put(skb, len);
 
 		if (likely((macrx & XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK)) {
-			skb->ip_summed = CHECKSUM_COMPLETE;
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
 			skb->csum = (macrx & XCT_MACRX_CSUM_M) >>
 				    XCT_MACRX_CSUM_S;
 		} else

--


[PATCH] [01/10] pasemi_mac: Abstract out register access

2007-08-22 Thread Olof Johansson
Abstract out the PCI config read/write accesses into reg read/write ones,
still calling the pci accessors on the back end.

Signed-off-by: Olof Johansson [EMAIL PROTECTED]


Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -81,6 +81,48 @@ MODULE_PARM_DESC(debug, PA Semi MAC bit
 
 static struct pasdma_status *dma_status;
 
+static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
+{
+   unsigned int val;
+
+   pci_read_config_dword(mac-iob_pdev, reg, val);
+   return val;
+}
+
+static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
+ unsigned int val)
+{
+   pci_write_config_dword(mac-iob_pdev, reg, val);
+}
+
+static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
+{
+   unsigned int val;
+
+   pci_read_config_dword(mac-pdev, reg, val);
+   return val;
+}
+
+static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
+ unsigned int val)
+{
+   pci_write_config_dword(mac-pdev, reg, val);
+}
+
+static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
+{
+   unsigned int val;
+
+   pci_read_config_dword(mac-dma_pdev, reg, val);
+   return val;
+}
+
+static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
+ unsigned int val)
+{
+   pci_write_config_dword(mac-dma_pdev, reg, val);
+}
+
 static int pasemi_get_mac_addr(struct pasemi_mac *mac)
 {
struct pci_dev *pdev = mac-pdev;
@@ -166,22 +208,21 @@ static int pasemi_mac_setup_rx_resources
 
memset(ring-buffers, 0, RX_RING_SIZE * sizeof(u64));
 
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXCHAN_BASEL(chan_id),
-  PAS_DMA_RXCHAN_BASEL_BRBL(ring-dma));
+   write_dma_reg(mac, PAS_DMA_RXCHAN_BASEL(chan_id), 
PAS_DMA_RXCHAN_BASEL_BRBL(ring-dma));
+
+   write_dma_reg(mac, PAS_DMA_RXCHAN_BASEU(chan_id),
+  PAS_DMA_RXCHAN_BASEU_BRBH(ring-dma  32) |
+  PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE  2));
+
+   write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id),
+  PAS_DMA_RXCHAN_CFG_HBU(1));
 
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXCHAN_BASEU(chan_id),
-  PAS_DMA_RXCHAN_BASEU_BRBH(ring-dma  32) |
-  PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE  2));
-
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXCHAN_CFG(chan_id),
-  PAS_DMA_RXCHAN_CFG_HBU(1));
-
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXINT_BASEL(mac-dma_if),
-  PAS_DMA_RXINT_BASEL_BRBL(__pa(ring-buffers)));
-
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_RXINT_BASEU(mac-dma_if),
-  PAS_DMA_RXINT_BASEU_BRBH(__pa(ring-buffers)  
32) |
-  PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE  3));
+   write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac-dma_if),
+  PAS_DMA_RXINT_BASEL_BRBL(__pa(ring-buffers)));
+
+   write_dma_reg(mac, PAS_DMA_RXINT_BASEU(mac-dma_if),
+  PAS_DMA_RXINT_BASEU_BRBH(__pa(ring-buffers)  32) |
+  PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE  3));
 
ring-next_to_fill = 0;
ring-next_to_clean = 0;
@@ -233,18 +274,18 @@ static int pasemi_mac_setup_tx_resources
 
memset(ring-desc, 0, TX_RING_SIZE * sizeof(struct pas_dma_xct_descr));
 
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_TXCHAN_BASEL(chan_id),
-  PAS_DMA_TXCHAN_BASEL_BRBL(ring-dma));
+   write_dma_reg(mac, PAS_DMA_TXCHAN_BASEL(chan_id),
+  PAS_DMA_TXCHAN_BASEL_BRBL(ring-dma));
val = PAS_DMA_TXCHAN_BASEU_BRBH(ring-dma  32);
val |= PAS_DMA_TXCHAN_BASEU_SIZ(TX_RING_SIZE  2);
 
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_TXCHAN_BASEU(chan_id), 
val);
+   write_dma_reg(mac, PAS_DMA_TXCHAN_BASEU(chan_id), val);
 
-   pci_write_config_dword(mac-dma_pdev, PAS_DMA_TXCHAN_CFG(chan_id),
-  PAS_DMA_TXCHAN_CFG_TY_IFACE |
-  PAS_DMA_TXCHAN_CFG_TATTR(mac-dma_if) |
-  PAS_DMA_TXCHAN_CFG_UP |
-  PAS_DMA_TXCHAN_CFG_WT(2));
+   write_dma_reg(mac, PAS_DMA_TXCHAN_CFG(chan_id),
+  PAS_DMA_TXCHAN_CFG_TY_IFACE |
+  PAS_DMA_TXCHAN_CFG_TATTR(mac-dma_if) |
+  PAS_DMA_TXCHAN_CFG_UP |
+  PAS_DMA_TXCHAN_CFG_WT(2));
 
ring-next_to_use = 0;
ring-next_to_clean = 0;
@@ -383,12 +424,8 @@ static void pasemi_mac_replenish_rx_ring
 
wmb();
 
-   pci_write_config_dword(mac-dma_pdev,
-   

[PATCH] [08/10] pasemi_mac: Fix TX ring wrap checking

2007-08-22 Thread Olof Johansson
The old logic didn't detect full TX ring cases properly, causing
overruns and general badness. Clean it up a bit and abstract out the
ring size checks, always making sure to leave one slot open.
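
A standalone sketch of the arithmetic behind the new RING_USED()/RING_AVAIL()
helpers: with a power-of-two ring and free-running indices, keeping one slot
open is what makes a full ring distinguishable from an empty one.

#include <assert.h>
#include <stdio.h>

#define RING_SIZE 512			/* must be a power of two */

struct ring {
	unsigned int next_to_fill;	/* producer index, free-running */
	unsigned int next_to_clean;	/* consumer index, free-running */
};

static unsigned int ring_used(const struct ring *r)
{
	return (r->next_to_fill - r->next_to_clean) & (RING_SIZE - 1);
}

static unsigned int ring_avail(const struct ring *r)
{
	return RING_SIZE - ring_used(r);
}

int main(void)
{
	struct ring r = { .next_to_fill = 0, .next_to_clean = 0 };

	/* Fill all but one slot: with both indices masked the same way, a
	 * completely full ring would look identical to an empty one, so
	 * the producer stops while ring_avail() is still greater than 1. */
	while (ring_avail(&r) > 1)
		r.next_to_fill++;

	assert(ring_used(&r) == RING_SIZE - 1);
	printf("used %u, avail %u\n", ring_used(&r), ring_avail(&r));
	return 0;
}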


Signed-off-by: Olof Johansson [EMAIL PROTECTED]

Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -69,6 +69,10 @@
 #define RX_DESC_INFO(mac, num)	((mac)->rx->desc_info[(num) & (RX_RING_SIZE-1)])
 #define RX_BUFF(mac, num)	((mac)->rx->buffers[(num) & (RX_RING_SIZE-1)])
 
+#define RING_USED(ring)		(((ring)->next_to_fill - (ring)->next_to_clean) \
+				 & ((ring)->size - 1))
+#define RING_AVAIL(ring)	((ring->size) - RING_USED(ring))
+
 #define BUF_SIZE 1646 /* 1500 MTU + ETH_HLEN + VLAN_HLEN + 2 64B cachelines */
 
 MODULE_LICENSE(GPL);
@@ -184,6 +188,7 @@ static int pasemi_mac_setup_rx_resources
 
spin_lock_init(ring-lock);
 
+   ring-size = RX_RING_SIZE;
ring-desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) *
  RX_RING_SIZE, GFP_KERNEL);
 
@@ -263,6 +268,7 @@ static int pasemi_mac_setup_tx_resources
 
spin_lock_init(ring-lock);
 
+   ring-size = TX_RING_SIZE;
ring-desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) *
  TX_RING_SIZE, GFP_KERNEL);
if (!ring-desc_info)
@@ -291,7 +297,7 @@ static int pasemi_mac_setup_tx_resources
   PAS_DMA_TXCHAN_CFG_UP |
   PAS_DMA_TXCHAN_CFG_WT(2));
 
-   ring-next_to_use = 0;
+   ring-next_to_fill = 0;
ring-next_to_clean = 0;
 
snprintf(ring-irq_name, sizeof(ring-irq_name),
@@ -386,9 +392,7 @@ static void pasemi_mac_replenish_rx_ring
int start = mac-rx-next_to_fill;
unsigned int limit, count;
 
-   limit = (mac-rx-next_to_clean + RX_RING_SIZE -
-mac-rx-next_to_fill)  (RX_RING_SIZE - 1);
-
+   limit = RING_AVAIL(mac-rx);
/* Check to see if we're doing first-time setup */
if (unlikely(mac-rx-next_to_clean == 0  mac-rx-next_to_fill == 0))
limit = RX_RING_SIZE;
@@ -572,7 +576,7 @@ restart:
spin_lock_irqsave(mac-tx-lock, flags);
 
start = mac-tx-next_to_clean;
-   limit = min(mac-tx-next_to_use, start+32);
+   limit = min(mac-tx-next_to_fill, start+32);
 
count = 0;
 
@@ -1013,14 +1017,13 @@ static int pasemi_mac_start_tx(struct sk
 
spin_lock_irqsave(txring-lock, flags);
 
-	if (txring->next_to_clean - txring->next_to_use == TX_RING_SIZE) {
+	if (RING_AVAIL(txring) <= 1) {
spin_unlock_irqrestore(txring-lock, flags);
pasemi_mac_clean_tx(mac);
pasemi_mac_restart_tx_intr(mac);
spin_lock_irqsave(txring-lock, flags);
 
-		if (txring->next_to_clean - txring->next_to_use ==
-		    TX_RING_SIZE) {
+		if (RING_AVAIL(txring) <= 1) {
/* Still no room -- stop the queue and wait for tx
 * intr when there's room.
 */
@@ -1029,15 +1032,15 @@ static int pasemi_mac_start_tx(struct sk
}
}
 
-   dp = TX_DESC(mac, txring-next_to_use);
-   info = TX_DESC_INFO(mac, txring-next_to_use);
+   dp = TX_DESC(mac, txring-next_to_fill);
+   info = TX_DESC_INFO(mac, txring-next_to_fill);
 
dp-mactx = mactx;
dp-ptr   = ptr;
info-dma = map;
info-skb = skb;
 
-   txring-next_to_use++;
+   txring-next_to_fill++;
mac-stats.tx_packets++;
mac-stats.tx_bytes += skb-len;
 
Index: mainline/drivers/net/pasemi_mac.h
===
--- mainline.orig/drivers/net/pasemi_mac.h
+++ mainline/drivers/net/pasemi_mac.h
@@ -31,7 +31,7 @@ struct pasemi_mac_txring {
struct pas_dma_xct_descr*desc;
dma_addr_t   dma;
unsigned int size;
-   unsigned int next_to_use;
+   unsigned int next_to_fill;
unsigned int next_to_clean;
struct pasemi_mac_buffer *desc_info;
char irq_name[10];  /* eth%d tx */

--


[PATCH] [05/10] pasemi_mac: RX performance tweaks

2007-08-22 Thread Olof Johansson
Various RX performance tweaks: do some explicit prefetching of packet
data, etc.
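
A small sketch (not the driver code) of the prefetch pattern being added: issue
prefetchw() on data that will be written shortly, so the cache miss overlaps
with the work done in between.

#include <linux/prefetch.h>

struct demo_desc {
	unsigned long long macrx;
	unsigned long long ptr;
};

static void demo_clean_one(struct demo_desc *dp)
{
	/* Start pulling the descriptor cacheline in for write early ... */
	prefetchw(dp);

	/* ... do other work here while the line is being fetched ... */

	/* ... so the real stores hit a warm cache line. */
	dp->macrx = 0;
	dp->ptr = 0;
}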

Signed-off-by: Olof Johansson [EMAIL PROTECTED]

Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -481,6 +481,7 @@ static int pasemi_mac_clean_rx(struct pa
rmb();
 
dp = RX_DESC(mac, n);
+   prefetchw(dp);
macrx = dp-macrx;
 
if (!(macrx  XCT_MACRX_O))
@@ -502,8 +503,10 @@ static int pasemi_mac_clean_rx(struct pa
if (info-dma == dma)
break;
}
+   prefetchw(info);
 
skb = info-skb;
+   prefetchw(skb);
info-dma = 0;
 
pci_unmap_single(mac-dma_pdev, dma, skb-len,
@@ -526,9 +529,7 @@ static int pasemi_mac_clean_rx(struct pa
 
skb_put(skb, len);
 
-   skb-protocol = eth_type_trans(skb, mac-netdev);
-
-   if ((macrx  XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK) {
+   if (likely((macrx  XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK)) 
{
skb-ip_summed = CHECKSUM_COMPLETE;
skb-csum = (macrx  XCT_MACRX_CSUM_M) 
   XCT_MACRX_CSUM_S;
@@ -538,6 +539,7 @@ static int pasemi_mac_clean_rx(struct pa
mac-stats.rx_bytes += len;
mac-stats.rx_packets++;
 
+   skb-protocol = eth_type_trans(skb, mac-netdev);
netif_receive_skb(skb);
 
dp-ptr = 0;
@@ -569,7 +571,7 @@ static int pasemi_mac_clean_tx(struct pa
 
for (i = start; i  mac-tx-next_to_use; i++) {
dp = TX_DESC(mac, i);
-   if (!dp || (dp-mactx  XCT_MACTX_O))
+   if (unlikely(dp-mactx  XCT_MACTX_O))
break;
 
count++;
@@ -957,7 +959,7 @@ static int pasemi_mac_start_tx(struct sk
struct pasemi_mac_txring *txring;
struct pasemi_mac_buffer *info;
struct pas_dma_xct_descr *dp;
-   u64 dflags;
+   u64 dflags, mactx, ptr;
dma_addr_t map;
int flags;
 
@@ -985,6 +987,9 @@ static int pasemi_mac_start_tx(struct sk
if (dma_mapping_error(map))
return NETDEV_TX_BUSY;
 
+   mactx = dflags | XCT_MACTX_LLEN(skb-len);
+   ptr   = XCT_PTR_LEN(skb-len) | XCT_PTR_ADDR(map);
+
txring = mac-tx;
 
spin_lock_irqsave(txring-lock, flags);
@@ -1005,12 +1010,11 @@ static int pasemi_mac_start_tx(struct sk
}
}
 
-
dp = TX_DESC(mac, txring-next_to_use);
info = TX_DESC_INFO(mac, txring-next_to_use);
 
-   dp-mactx = dflags | XCT_MACTX_LLEN(skb-len);
-   dp-ptr   = XCT_PTR_LEN(skb-len) | XCT_PTR_ADDR(map);
+   dp-mactx = mactx;
+   dp-ptr   = ptr;
info-dma = map;
info-skb = skb;
 

--


[PATCH] [10/10] pasemi_mac: Clean TX ring in poll

2007-08-22 Thread Olof Johansson
Unfortunately there's no timeout for how long a packet can sit on
the TX ring after completion before an interrupt is generated, and
we want to have a threshold that's larger than one packet per interrupt.

So we have to have a timer that occasionally cleans the TX ring even
though there hasn't been an interrupt. Instead of setting up a dedicated
timer for this, just clean the ring in the NAPI poll routine.


Signed-off-by: Olof Johansson [EMAIL PROTECTED]

---

I know I got this rejected the last time it was submitted, but there were
no answers with suggestions on how to handle it better. I'm all ears if
there's a better way.  (I noticed that Intel's new ixgbe driver does the
same thing.)

Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -1086,6 +1086,7 @@ static int pasemi_mac_poll(struct net_de
 	int pkts, limit = min(*budget, dev->quota);
 	struct pasemi_mac *mac = netdev_priv(dev);
 
+	pasemi_mac_clean_tx(mac);
 	pkts = pasemi_mac_clean_rx(mac, limit);
 
 	dev->quota -= pkts;

--


[PATCH] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-22 Thread Olof Johansson
Move away from using the pci config access functions for simple register
access.  Our device has all of the registers in the config space (hey,
from the hardware point of view it looks reasonable :-), so we need to
somehow get to it. Newer firmwares have it in the device tree such that
we can just get it and ioremap it there (in case it ever moves in future
products). For now, provide a hardcoded fallback for older firmwares.


Signed-off-by: Olof Johansson [EMAIL PROTECTED]


Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -81,46 +81,47 @@ MODULE_PARM_DESC(debug, PA Semi MAC bit
 
 static struct pasdma_status *dma_status;
 
-static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int 
reg)
 {
unsigned int val;
 
-   pci_read_config_dword(mac-iob_pdev, reg, val);
+   val = in_le32(mac-iob_regs+reg);
+
return val;
 }
 
-static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac-iob_pdev, reg, val);
+   out_le32(mac-iob_regs+reg, val);
 }
 
-static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int 
reg)
 {
unsigned int val;
 
-   pci_read_config_dword(mac-pdev, reg, val);
+   val = in_le32(mac-regs+reg);
return val;
 }
 
-static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac-pdev, reg, val);
+   out_le32(mac-regs+reg, val);
 }
 
-static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int 
reg)
 {
unsigned int val;
 
-   pci_read_config_dword(mac-dma_pdev, reg, val);
+   val = in_le32(mac-dma_regs+reg);
return val;
 }
 
-static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac-dma_pdev, reg, val);
+   out_le32(mac-dma_regs+reg, val);
 }
 
 static int pasemi_get_mac_addr(struct pasemi_mac *mac)
@@ -585,7 +586,6 @@ static int pasemi_mac_clean_tx(struct pa
}
mac-tx-next_to_clean += count;
spin_unlock_irqrestore(mac-tx-lock, flags);
-
netif_wake_queue(mac-netdev);
 
return count;
@@ -1076,6 +1076,73 @@ static int pasemi_mac_poll(struct net_de
}
 }
 
+static inline void __iomem * __devinit map_onedev(struct pci_dev *p, int index)
+{
+   struct device_node *dn;
+   void __iomem *ret;
+
+   dn = pci_device_to_OF_node(p);
+   if (!dn)
+   goto fallback;
+
+   ret = of_iomap(dn, index);
+   if (!ret)
+   goto fallback;
+
+   return ret;
+fallback:
+   /* This is hardcoded and ugly, but we have some firmware versions
+* who don't provide the register space in the device tree. Luckily
+* they are at well-known locations so we can just do the math here.
+*/
+   return ioremap(0xe000 + (p-devfn  12), 0x2000);
+}
+
+static int __devinit pasemi_mac_map_regs(struct pasemi_mac *mac)
+{
+   struct resource res;
+   struct device_node *dn;
+   int err;
+
+   mac-dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL);
+   if (!mac-dma_pdev) {
+   dev_err(mac-pdev-dev, Can't find DMA Controller\n);
+   return -ENODEV;
+   }
+
+   mac-iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL);
+   if (!mac-iob_pdev) {
+   dev_err(mac-pdev-dev, Can't find I/O Bridge\n);
+   return -ENODEV;
+   }
+
+   mac-regs = map_onedev(mac-pdev, 0);
+   mac-dma_regs = map_onedev(mac-dma_pdev, 0);
+   mac-iob_regs = map_onedev(mac-iob_pdev, 0);
+
+   if (!mac-regs || !mac-dma_regs || !mac-iob_regs) {
+   dev_err(mac-pdev-dev, Can't map registers\n);
+   return -ENODEV;
+   }
+
+   /* The dma status structure is located in the I/O bridge, and
+* is cache coherent.
+*/
+   if (!dma_status) {
+   dn = pci_device_to_OF_node(mac-iob_pdev);
+   if (dn)
+   err = of_address_to_resource(dn, 1, res);
+   if (!dn || err) {
+   /* Fallback for old firmware */
+   res.start = 0xfd80;
+   res.end = res.start + 0x1000;
+   }
+   

[PATCH] [06/10] pasemi_mac: Batch up TX buffer frees

2007-08-22 Thread Olof Johansson
Postpone the PCI unmap and skb free of the transmitted buffers to outside
of the TX ring lock, batching them up 32 at a time.

Also increase the count threshold to 128.
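
A condensed sketch of the batching pattern, with assumed types and sizes:
record the completed buffers under the ring lock, then do the expensive unmap
and free work after the lock is dropped.

#include <linux/pci.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>

#define TX_BATCH 32

/* Hypothetical per-ring state, standing in for the driver's tx ring. */
struct demo_txring {
	spinlock_t lock;
	/* ... descriptors and indices ... */
};

static int demo_clean_tx(struct pci_dev *pdev, struct demo_txring *tx)
{
	struct sk_buff *skbs[TX_BATCH];
	dma_addr_t dmas[TX_BATCH];
	unsigned long flags;
	int i, count = 0;

	spin_lock_irqsave(&tx->lock, flags);
	/* Walk up to TX_BATCH completed descriptors, only recording the skb
	 * pointer and DMA handle; no unmapping or freeing under the lock.
	 * (Descriptor walk elided; it sets count to the entries collected.) */
	spin_unlock_irqrestore(&tx->lock, flags);

	/* The heavy work happens with the lock released. */
	for (i = 0; i < count; i++) {
		pci_unmap_single(pdev, dmas[i], skbs[i]->len,
				 PCI_DMA_TODEVICE);
		dev_kfree_skb_irq(skbs[i]);
	}
	return count;
}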

Signed-off-by: Olof Johansson [EMAIL PROTECTED]

Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -561,37 +561,56 @@ static int pasemi_mac_clean_tx(struct pa
int i;
struct pasemi_mac_buffer *info;
struct pas_dma_xct_descr *dp;
-   int start, count;
+   unsigned int start, count, limit;
+   unsigned int total_count;
int flags;
+   struct sk_buff *skbs[32];
+   dma_addr_t dmas[32];
 
+   total_count = 0;
+restart:
spin_lock_irqsave(mac-tx-lock, flags);
 
start = mac-tx-next_to_clean;
+   limit = min(mac-tx-next_to_use, start+32);
+
count = 0;
 
-   for (i = start; i  mac-tx-next_to_use; i++) {
+   for (i = start; i  limit; i++) {
dp = TX_DESC(mac, i);
+
if (unlikely(dp-mactx  XCT_MACTX_O))
+   /* Not yet transmitted */
break;
 
-   count++;
-
info = TX_DESC_INFO(mac, i);
-
-   pci_unmap_single(mac-dma_pdev, info-dma,
-info-skb-len, PCI_DMA_TODEVICE);
-   dev_kfree_skb_irq(info-skb);
+   skbs[count] = info-skb;
+   dmas[count] = info-dma;
 
info-skb = NULL;
info-dma = 0;
dp-mactx = 0;
dp-ptr = 0;
+
+   count++;
}
mac-tx-next_to_clean += count;
spin_unlock_irqrestore(mac-tx-lock, flags);
netif_wake_queue(mac-netdev);
 
-   return count;
+   for (i = 0; i  count; i++) {
+   pci_unmap_single(mac-dma_pdev, dmas[i],
+skbs[i]-len, PCI_DMA_TODEVICE);
+   dev_kfree_skb_irq(skbs[i]);
+   }
+
+   total_count += count;
+
+   /* If the batch was full, try to clean more */
+   if (count == 32)
+   goto restart;
+
+   return total_count;
 }
 
 
@@ -787,7 +806,7 @@ static int pasemi_mac_open(struct net_de
   PAS_IOB_DMA_RXCH_CFG_CNTTH(0));
 
write_iob_reg(mac, PAS_IOB_DMA_TXCH_CFG(mac-dma_txch),
-  PAS_IOB_DMA_TXCH_CFG_CNTTH(32));
+  PAS_IOB_DMA_TXCH_CFG_CNTTH(128));
 
/* Clear out any residual packet count state from firmware */
pasemi_mac_restart_rx_intr(mac);

--


[PATCH] [04/10] pasemi_mac: Fix memcpy amount for short receives

2007-08-22 Thread Olof Johansson
Fix up memcpy for short receives.

Signed-off-by: Olof Johansson [EMAIL PROTECTED]


Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -516,9 +516,7 @@ static int pasemi_mac_clean_rx(struct pa
 			netdev_alloc_skb(mac->netdev, len + NET_IP_ALIGN);
 		if (new_skb) {
 			skb_reserve(new_skb, NET_IP_ALIGN);
-			memcpy(new_skb->data - NET_IP_ALIGN,
-				skb->data - NET_IP_ALIGN,
-				len + NET_IP_ALIGN);
+			memcpy(new_skb->data, skb->data, len);
 			/* save the skb in buffer_info as good */
 			skb = new_skb;
 		}

--


[PATCH] [07/10] pasemi_mac: Enable LLTX

2007-08-22 Thread Olof Johansson
Enable LLTX on pasemi_mac: we're already doing sufficient locking
in the driver to enable it.


Signed-off-by: Olof Johansson [EMAIL PROTECTED]

Index: mainline/drivers/net/pasemi_mac.c
===
--- mainline.orig/drivers/net/pasemi_mac.c
+++ mainline/drivers/net/pasemi_mac.c
@@ -1235,7 +1235,7 @@ pasemi_mac_probe(struct pci_dev *pdev, c
 	dev->set_multicast_list = pasemi_mac_set_rx_mode;
 	dev->weight = 64;
 	dev->poll = pasemi_mac_poll;
-	dev->features = NETIF_F_HW_CSUM;
+	dev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX;
 
err = pasemi_mac_map_regs(mac);
if (err)

--


[PATCH 00/16] xfrm netlink interface cleanups

2007-08-22 Thread Thomas Graf
This patchset converts the xfrm netlink bits over to the type-safe
netlink interface and does some cleanups.

 xfrm_user.c | 1041 
 1 file changed, 433 insertions(+), 608 deletions(-)



[PATCH 01/16] [XFRM] netlink: Use nlmsg_put() instead of NLMSG_PUT()

2007-08-22 Thread Thomas Graf
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-20 17:09:48.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:10:34.0 +0200
@@ -588,10 +588,10 @@ static int dump_one_state(struct xfrm_st
if (sp-this_idx  sp-start_idx)
goto out;
 
-   nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid,
-   sp-nlmsg_seq,
-   XFRM_MSG_NEWSA, sizeof(*p));
-   nlh-nlmsg_flags = sp-nlmsg_flags;
+   nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp-nlmsg_seq,
+   XFRM_MSG_NEWSA, sizeof(*p), sp-nlmsg_flags);
+   if (nlh == NULL)
+   return -EMSGSIZE;
 
p = NLMSG_DATA(nlh);
copy_to_user_state(x, p);
@@ -633,7 +633,6 @@ out:
sp-this_idx++;
return 0;
 
-nlmsg_failure:
 rtattr_failure:
nlmsg_trim(skb, b);
return -1;
@@ -1276,11 +1275,11 @@ static int dump_one_policy(struct xfrm_p
if (sp-this_idx  sp-start_idx)
goto out;
 
-   nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid,
-   sp-nlmsg_seq,
-   XFRM_MSG_NEWPOLICY, sizeof(*p));
+   nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp-nlmsg_seq,
+   XFRM_MSG_NEWPOLICY, sizeof(*p), sp-nlmsg_flags);
+   if (nlh == NULL)
+   return -EMSGSIZE;
p = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = sp-nlmsg_flags;
 
copy_to_user_policy(xp, p, dir);
if (copy_to_user_tmpl(xp, skb)  0)
@@ -1449,9 +1448,10 @@ static int build_aevent(struct sk_buff *
struct xfrm_lifetime_cur ltime;
unsigned char *b = skb_tail_pointer(skb);
 
-   nlh = NLMSG_PUT(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id));
+   nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
+   if (nlh == NULL)
+   return -EMSGSIZE;
id = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = 0;
 
memcpy(id-sa_id.daddr, x-id.daddr,sizeof(x-id.daddr));
id-sa_id.spi = x-id.spi;
@@ -1483,7 +1483,6 @@ static int build_aevent(struct sk_buff *
return skb-len;
 
 rtattr_failure:
-nlmsg_failure:
nlmsg_trim(skb, b);
return -1;
 }
@@ -1866,9 +1865,10 @@ static int build_migrate(struct sk_buff 
unsigned char *b = skb_tail_pointer(skb);
int i;
 
-   nlh = NLMSG_PUT(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id));
+   nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
+   if (nlh == NULL)
+   return -EMSGSIZE;
pol_id = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = 0;
 
/* copy data from selector, dir, and type to the pol_id */
memset(pol_id, 0, sizeof(*pol_id));
@@ -2045,20 +2045,16 @@ static int build_expire(struct sk_buff *
struct nlmsghdr *nlh;
unsigned char *b = skb_tail_pointer(skb);
 
-   nlh = NLMSG_PUT(skb, c-pid, 0, XFRM_MSG_EXPIRE,
-   sizeof(*ue));
+   nlh = nlmsg_put(skb, c-pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
+   if (nlh == NULL)
+   return -EMSGSIZE;
ue = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = 0;
 
copy_to_user_state(x, ue-state);
ue-hard = (c-data.hard != 0) ? 1 : 0;
 
nlh-nlmsg_len = skb_tail_pointer(skb) - b;
return skb-len;
-
-nlmsg_failure:
-   nlmsg_trim(skb, b);
-   return -1;
 }
 
 static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2108,9 +2104,11 @@ static int xfrm_notify_sa_flush(struct k
return -ENOMEM;
b = skb-tail;
 
-   nlh = NLMSG_PUT(skb, c-pid, c-seq,
-   XFRM_MSG_FLUSHSA, sizeof(*p));
-   nlh-nlmsg_flags = 0;
+   nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0);
+   if (nlh == NULL) {
+   kfree_skb(skb);
+   return -EMSGSIZE;
+   }
 
p = NLMSG_DATA(nlh);
p-proto = c-data.proto;
@@ -2119,10 +2117,6 @@ static int xfrm_notify_sa_flush(struct k
 
NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
-
-nlmsg_failure:
-   kfree_skb(skb);
-   return -1;
 }
 
 static inline int xfrm_sa_len(struct xfrm_state *x)
@@ -2162,8 +2156,9 @@ static int xfrm_notify_sa(struct xfrm_st
return -ENOMEM;
b = skb-tail;
 
-   nlh = NLMSG_PUT(skb, c-pid, c-seq, c-event, headlen);
-   nlh-nlmsg_flags = 0;
+   nlh = nlmsg_put(skb, c-pid, c-seq, c-event, headlen, 0);
+   if (nlh == NULL)
+   goto nlmsg_failure;
 
p = NLMSG_DATA(nlh);
if (c-event == XFRM_MSG_DELSA) {
@@ -2233,10 +2228,10 @@ static int build_acquire(struct sk_buff 
unsigned char *b = skb_tail_pointer(skb);
__u32 seq = xfrm_get_acqseq();
 
-   

[PATCH 09/16] [XFRM] netlink: Use nlmsg_parse() to parse attributes

2007-08-22 Thread Thomas Graf
Uses nlmsg_parse() to parse the attributes. This actually changes
behaviour: unknown attributes (type > MAXTYPE) no longer cause an
error. Instead, unknown attributes are ignored henceforth, keeping
older kernels compatible with more recent userspace tools.
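
For readers less familiar with the helper, a minimal sketch of a typical
nlmsg_parse() call follows; the function name and the attribute checked
are illustrative, not taken from the patch:

/* Minimal sketch (not from the patch); declarations come from
 * <net/netlink.h>.  Attribute types above maxtype are skipped silently,
 * which is why unknown attributes stop being fatal after this change. */
static int example_parse(struct nlmsghdr *nlh)
{
	struct nlattr *tb[XFRMA_MAX + 1];
	int err;

	err = nlmsg_parse(nlh, sizeof(struct xfrm_usersa_info),
			  tb, XFRMA_MAX, NULL);
	if (err < 0)
		return err;

	if (tb[XFRMA_ENCAP])
		pr_debug("encap attribute present (%d bytes)\n",
			 nla_len(tb[XFRMA_ENCAP]));
	return 0;
}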

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:07:38.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:31:04.0 +0200
@@ -1890,7 +1890,7 @@ static int xfrm_send_migrate(struct xfrm
 }
 #endif
 
-#define XMSGSIZE(type) NLMSG_LENGTH(sizeof(struct type))
+#define XMSGSIZE(type) sizeof(struct type)
 
 static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
[XFRM_MSG_NEWSA   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info),
@@ -1906,13 +1906,13 @@ static const int xfrm_msg_min[XFRM_NR_MS
[XFRM_MSG_UPDSA   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info),
[XFRM_MSG_POLEXPIRE   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire),
[XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush),
-   [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = NLMSG_LENGTH(0),
+   [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = 0,
[XFRM_MSG_NEWAE   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
[XFRM_MSG_GETAE   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
[XFRM_MSG_REPORT  - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
[XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
-   [XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)),
-   [XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)),
+   [XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = sizeof(u32),
+   [XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 };
 
 #undef XMSGSIZE
@@ -1946,9 +1946,9 @@ static struct xfrm_link {
 
 static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
-   struct rtattr *xfrma[XFRMA_MAX];
+   struct nlattr *xfrma[XFRMA_MAX+1];
struct xfrm_link *link;
-   int type, min_len;
+   int type, err;
 
type = nlh-nlmsg_type;
if (type  XFRM_MSG_MAX)
@@ -1970,30 +1970,16 @@ static int xfrm_user_rcv_msg(struct sk_b
return netlink_dump_start(xfrm_nl, skb, nlh, link-dump, NULL);
}
 
-   memset(xfrma, 0, sizeof(xfrma));
-
-   if (nlh-nlmsg_len  (min_len = xfrm_msg_min[type]))
-   return -EINVAL;
-
-   if (nlh-nlmsg_len  min_len) {
-   int attrlen = nlh-nlmsg_len - NLMSG_ALIGN(min_len);
-   struct rtattr *attr = (void *) nlh + NLMSG_ALIGN(min_len);
-
-   while (RTA_OK(attr, attrlen)) {
-   unsigned short flavor = attr-rta_type;
-   if (flavor) {
-   if (flavor  XFRMA_MAX)
-   return -EINVAL;
-   xfrma[flavor - 1] = attr;
-   }
-   attr = RTA_NEXT(attr, attrlen);
-   }
-   }
+   /* FIXME: Temporary hack, nlmsg_parse() starts at xfrma[1], old code
+* expects first attribute at xfrma[0] */
+   err = nlmsg_parse(nlh, xfrm_msg_min[type], xfrma-1, XFRMA_MAX, NULL);
+   if (err  0)
+   return err;
 
if (link-doit == NULL)
return -EINVAL;
 
-   return link-doit(skb, nlh, xfrma);
+   return link-doit(skb, nlh, (struct rtattr **) xfrma);
 }
 
 static void xfrm_netlink_rcv(struct sock *sk, int len)

-- 



[PATCH 08/16] [XFRM] netlink: Use nlmsg_new() and type-safe size calculation helpers

2007-08-22 Thread Thomas Graf
Moves all complex message size calculations into their own inlined
helper functions and makes use of the type-safe netlink interface.

Using nlmsg_new() simplifies the calculation further, since it takes
care of the netlink header length itself.
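
As a generic illustration of the pattern (everything named example_* is
made up, not code from this patch): a size helper sums NLMSG_ALIGN() of
the family header plus nla_total_size() per attribute, and nlmsg_new()
adds the netlink header on top:

struct example_hdr { u32 flags; };	/* stand-in family header */

static inline size_t example_msgsize(void)
{
	return NLMSG_ALIGN(sizeof(struct example_hdr))	     /* family header   */
	       + nla_total_size(sizeof(u32))		     /* a u32 attribute */
	       + nla_total_size(sizeof(struct example_hdr)); /* a struct attr   */
}

static struct sk_buff *example_alloc_msg(void)
{
	/* nlmsg_new() accounts for struct nlmsghdr itself */
	return nlmsg_new(example_msgsize(), GFP_ATOMIC);
}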

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:04:46.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:07:38.0 +0200
@@ -670,7 +670,7 @@ static struct sk_buff *xfrm_state_netlin
struct xfrm_dump_info info;
struct sk_buff *skb;
 
-   skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+   skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
if (!skb)
return ERR_PTR(-ENOMEM);
 
@@ -688,6 +688,13 @@ static struct sk_buff *xfrm_state_netlin
return skb;
 }
 
+static inline size_t xfrm_spdinfo_msgsize(void)
+{
+   return NLMSG_ALIGN(4)
+  + nla_total_size(sizeof(struct xfrmu_spdinfo))
+  + nla_total_size(sizeof(struct xfrmu_spdhinfo));
+}
+
 static int build_spdinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags)
 {
struct xfrmk_spdinfo si;
@@ -729,12 +736,8 @@ static int xfrm_get_spdinfo(struct sk_bu
u32 *flags = nlmsg_data(nlh);
u32 spid = NETLINK_CB(skb).pid;
u32 seq = nlh-nlmsg_seq;
-   int len = NLMSG_LENGTH(sizeof(u32));
 
-   len += RTA_SPACE(sizeof(struct xfrmu_spdinfo));
-   len += RTA_SPACE(sizeof(struct xfrmu_spdhinfo));
-
-   r_skb = alloc_skb(len, GFP_ATOMIC);
+   r_skb = nlmsg_new(xfrm_spdinfo_msgsize(), GFP_ATOMIC);
if (r_skb == NULL)
return -ENOMEM;
 
@@ -744,6 +747,13 @@ static int xfrm_get_spdinfo(struct sk_bu
return nlmsg_unicast(xfrm_nl, r_skb, spid);
 }
 
+static inline size_t xfrm_sadinfo_msgsize(void)
+{
+   return NLMSG_ALIGN(4)
+  + nla_total_size(sizeof(struct xfrmu_sadhinfo))
+  + nla_total_size(4); /* XFRMA_SAD_CNT */
+}
+
 static int build_sadinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags)
 {
struct xfrmk_sadinfo si;
@@ -779,13 +789,8 @@ static int xfrm_get_sadinfo(struct sk_bu
u32 *flags = nlmsg_data(nlh);
u32 spid = NETLINK_CB(skb).pid;
u32 seq = nlh-nlmsg_seq;
-   int len = NLMSG_LENGTH(sizeof(u32));
-
-   len += RTA_SPACE(sizeof(struct xfrmu_sadhinfo));
-   len += RTA_SPACE(sizeof(u32));
-
-   r_skb = alloc_skb(len, GFP_ATOMIC);
 
+   r_skb = nlmsg_new(xfrm_sadinfo_msgsize(), GFP_ATOMIC);
if (r_skb == NULL)
return -ENOMEM;
 
@@ -1311,7 +1316,7 @@ static struct sk_buff *xfrm_policy_netli
struct xfrm_dump_info info;
struct sk_buff *skb;
 
-   skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+   skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!skb)
return ERR_PTR(-ENOMEM);
 
@@ -1425,6 +1430,14 @@ static int xfrm_flush_sa(struct sk_buff 
return 0;
 }
 
+static inline size_t xfrm_aevent_msgsize(void)
+{
+   return NLMSG_ALIGN(sizeof(struct xfrm_aevent_id))
+  + nla_total_size(sizeof(struct xfrm_replay_state))
+  + nla_total_size(sizeof(struct xfrm_lifetime_cur))
+  + nla_total_size(4) /* XFRM_AE_RTHR */
+  + nla_total_size(4); /* XFRM_AE_ETHR */
+}
 
 static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, struct 
km_event *c)
 {
@@ -1469,19 +1482,9 @@ static int xfrm_get_ae(struct sk_buff *s
int err;
struct km_event c;
struct xfrm_aevent_id *p = nlmsg_data(nlh);
-   int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id));
struct xfrm_usersa_id *id = p-sa_id;
 
-   len += RTA_SPACE(sizeof(struct xfrm_replay_state));
-   len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur));
-
-   if (p-flagsXFRM_AE_RTHR)
-   len+=RTA_SPACE(sizeof(u32));
-
-   if (p-flagsXFRM_AE_ETHR)
-   len+=RTA_SPACE(sizeof(u32));
-
-   r_skb = alloc_skb(len, GFP_ATOMIC);
+   r_skb = nlmsg_new(xfrm_aevent_msgsize(), GFP_ATOMIC);
if (r_skb == NULL)
return -ENOMEM;
 
@@ -1824,6 +1827,13 @@ static int copy_to_user_migrate(struct x
return nla_put(skb, XFRMA_MIGRATE, sizeof(um), um);
 }
 
+static inline size_t xfrm_migrate_msgsize(int num_migrate)
+{
+   return NLMSG_ALIGN(sizeof(struct xfrm_userpolicy_id))
+  + nla_total_size(sizeof(struct xfrm_user_migrate) * num_migrate)
+  + userpolicy_type_attrsize();
+}
+
 static int build_migrate(struct sk_buff *skb, struct xfrm_migrate *m,
 int num_migrate, struct xfrm_selector *sel,
 u8 dir, u8 type)
@@ -1861,12 +1871,8 @@ static int xfrm_send_migrate(struct xfrm
 struct xfrm_migrate *m, int num_migrate)
 {
struct sk_buff *skb;
-   size_t len;
 

[PATCH 02/16] [XFRM] netlink: Use nlmsg_end() and nlmsg_cancel()

2007-08-22 Thread Thomas Graf
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:10:34.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:12:20.0 +0200
@@ -583,7 +583,6 @@ static int dump_one_state(struct xfrm_st
struct sk_buff *skb = sp-out_skb;
struct xfrm_usersa_info *p;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
 
if (sp-this_idx  sp-start_idx)
goto out;
@@ -628,14 +627,14 @@ static int dump_one_state(struct xfrm_st
if (x-lastused)
RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x-lastused), x-lastused);
 
-   nlh-nlmsg_len = skb_tail_pointer(skb) - b;
+   nlmsg_end(skb, nlh);
 out:
sp-this_idx++;
return 0;
 
 rtattr_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb)
@@ -1270,7 +1269,6 @@ static int dump_one_policy(struct xfrm_p
struct sk_buff *in_skb = sp-in_skb;
struct sk_buff *skb = sp-out_skb;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
 
if (sp-this_idx  sp-start_idx)
goto out;
@@ -1289,14 +1287,14 @@ static int dump_one_policy(struct xfrm_p
if (copy_to_user_policy_type(xp-type, skb)  0)
goto nlmsg_failure;
 
-   nlh-nlmsg_len = skb_tail_pointer(skb) - b;
+   nlmsg_end(skb, nlh);
 out:
sp-this_idx++;
return 0;
 
 nlmsg_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb)
@@ -1446,7 +1444,6 @@ static int build_aevent(struct sk_buff *
struct xfrm_aevent_id *id;
struct nlmsghdr *nlh;
struct xfrm_lifetime_cur ltime;
-   unsigned char *b = skb_tail_pointer(skb);
 
nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
if (nlh == NULL)
@@ -1479,12 +1476,11 @@ static int build_aevent(struct sk_buff *
RTA_PUT(skb,XFRMA_ETIMER_THRESH,sizeof(u32),etimer);
}
 
-   nlh-nlmsg_len = skb_tail_pointer(skb) - b;
-   return skb-len;
+   return nlmsg_end(skb, nlh);
 
 rtattr_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -1862,7 +1858,6 @@ static int build_migrate(struct sk_buff 
struct xfrm_migrate *mp;
struct xfrm_userpolicy_id *pol_id;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
int i;
 
nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
@@ -1883,11 +1878,10 @@ static int build_migrate(struct sk_buff 
goto nlmsg_failure;
}
 
-   nlh-nlmsg_len = skb_tail_pointer(skb) - b;
-   return skb-len;
+   return nlmsg_end(skb, nlh);
 nlmsg_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2043,7 +2037,6 @@ static int build_expire(struct sk_buff *
 {
struct xfrm_user_expire *ue;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
 
nlh = nlmsg_put(skb, c-pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
if (nlh == NULL)
@@ -2053,8 +2046,7 @@ static int build_expire(struct sk_buff *
copy_to_user_state(x, ue-state);
ue-hard = (c-data.hard != 0) ? 1 : 0;
 
-   nlh-nlmsg_len = skb_tail_pointer(skb) - b;
-   return skb-len;
+   return nlmsg_end(skb, nlh);
 }
 
 static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2096,13 +2088,11 @@ static int xfrm_notify_sa_flush(struct k
struct xfrm_usersa_flush *p;
struct nlmsghdr *nlh;
struct sk_buff *skb;
-   sk_buff_data_t b;
int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush));
 
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
-   b = skb-tail;
 
nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0);
if (nlh == NULL) {
@@ -2113,7 +2103,7 @@ static int xfrm_notify_sa_flush(struct k
p = NLMSG_DATA(nlh);
p-proto = c-data.proto;
 
-   nlh-nlmsg_len = skb-tail - b;
+   nlmsg_end(skb, nlh);
 
NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
@@ -2140,7 +2130,6 @@ static int xfrm_notify_sa(struct xfrm_st
struct xfrm_usersa_id *id;
struct nlmsghdr *nlh;
struct sk_buff *skb;
-   

[PATCH 16/16] [XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()

2007-08-22 Thread Thomas Graf
These functions are only used once and are a lot easier to understand if
inlined directly into the calling function.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 23:05:30.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-22 16:45:31.0 +0200
@@ -214,23 +214,6 @@ static int attach_one_algo(struct xfrm_a
return 0;
 }
 
-static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr 
*rta)
-{
-   struct xfrm_encap_tmpl *p, *uencap;
-
-   if (!rta)
-   return 0;
-
-   uencap = nla_data(rta);
-   p = kmemdup(uencap, sizeof(*p), GFP_KERNEL);
-   if (!p)
-   return -ENOMEM;
-
-   *encapp = p;
-   return 0;
-}
-
-
 static inline int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx)
 {
int len = 0;
@@ -242,33 +225,6 @@ static inline int xfrm_user_sec_ctx_size
return len;
 }
 
-static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg)
-{
-   struct xfrm_user_sec_ctx *uctx;
-
-   if (!u_arg)
-   return 0;
-
-   uctx = nla_data(u_arg);
-   return security_xfrm_state_alloc(x, uctx);
-}
-
-static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta)
-{
-   xfrm_address_t *p, *uaddrp;
-
-   if (!rta)
-   return 0;
-
-   uaddrp = nla_data(rta);
-   p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL);
-   if (!p)
-   return -ENOMEM;
-
-   *addrpp = p;
-   return 0;
-}
-
 static void copy_from_user_state(struct xfrm_state *x, struct xfrm_usersa_info 
*p)
 {
memcpy(x-id, p-id, sizeof(x-id));
@@ -340,15 +296,27 @@ static struct xfrm_state *xfrm_state_con
   xfrm_calg_get_byname,
   attrs[XFRMA_ALG_COMP])))
goto error;
-   if ((err = attach_encap_tmpl(x-encap, attrs[XFRMA_ENCAP])))
-   goto error;
-   if ((err = attach_one_addr(x-coaddr, attrs[XFRMA_COADDR])))
-   goto error;
+
+   if (attrs[XFRMA_ENCAP]) {
+   x-encap = kmemdup(nla_data(attrs[XFRMA_ENCAP]),
+  sizeof(x-encap), GFP_KERNEL);
+   if (x-encap == NULL)
+   goto error;
+   }
+
+   if (attrs[XFRMA_COADDR]) {
+   x-coaddr = kmemdup(nla_data(attrs[XFRMA_COADDR]),
+   sizeof(x-coaddr), GFP_KERNEL);
+   if (x-coaddr == NULL)
+   goto error;
+   }
+
err = xfrm_init_state(x);
if (err)
goto error;
 
-   if ((err = attach_sec_ctx(x, attrs[XFRMA_SEC_CTX])))
+   if (attrs[XFRMA_SEC_CTX] 
+   security_xfrm_state_alloc(x, nla_data(attrs[XFRMA_SEC_CTX])))
goto error;
 
x-km.seq = p-seq;

-- 



[PATCH 06/16] [XFRM] netlink: Move algorithm length calculation to its own function

2007-08-22 Thread Thomas Graf
Adds alg_len() to calculate the properly padded length of an
algorithm attribute to simplify the code.
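
As a quick worked example: for a 160-bit authentication key, alg_key_len
is 160, so (160 + 7) / 8 gives 20 bytes of key material and alg_len()
returns sizeof(struct xfrm_algo) + 20.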

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:16:03.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:03:43.0 +0200
@@ -33,6 +33,11 @@
 #endif
 #include linux/audit.h
 
+static inline int alg_len(struct xfrm_algo *alg)
+{
+   return sizeof(*alg) + ((alg-alg_key_len + 7) / 8);
+}
+
 static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type)
 {
struct rtattr *rt = xfrma[type - 1];
@@ -232,7 +237,6 @@ static int attach_one_algo(struct xfrm_a
struct rtattr *rta = u_arg;
struct xfrm_algo *p, *ualg;
struct xfrm_algo_desc *algo;
-   int len;
 
if (!rta)
return 0;
@@ -244,8 +248,7 @@ static int attach_one_algo(struct xfrm_a
return -ENOSYS;
*props = algo-desc.sadb_alg_id;
 
-   len = sizeof(*ualg) + (ualg-alg_key_len + 7U) / 8;
-   p = kmemdup(ualg, len, GFP_KERNEL);
+   p = kmemdup(ualg, alg_len(ualg), GFP_KERNEL);
if (!p)
return -ENOMEM;
 
@@ -617,11 +620,9 @@ static int dump_one_state(struct xfrm_st
copy_to_user_state(x, p);
 
if (x-aalg)
-   NLA_PUT(skb, XFRMA_ALG_AUTH,
-   sizeof(*(x-aalg))+(x-aalg-alg_key_len+7)/8, x-aalg);
+   NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x-aalg), x-aalg);
if (x-ealg)
-   NLA_PUT(skb, XFRMA_ALG_CRYPT,
-   sizeof(*(x-ealg))+(x-ealg-alg_key_len+7)/8, x-ealg);
+   NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x-ealg), x-ealg);
if (x-calg)
NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x-calg)), x-calg);
 
@@ -2072,9 +2073,9 @@ static inline int xfrm_sa_len(struct xfr
 {
int l = 0;
if (x-aalg)
-   l += RTA_SPACE(sizeof(*x-aalg) + (x-aalg-alg_key_len+7)/8);
+   l += RTA_SPACE(alg_len(x-aalg));
if (x-ealg)
-   l += RTA_SPACE(sizeof(*x-ealg) + (x-ealg-alg_key_len+7)/8);
+   l += RTA_SPACE(alg_len(x-ealg));
if (x-calg)
l += RTA_SPACE(sizeof(*x-calg));
if (x-encap)
@@ -2127,11 +2128,9 @@ static int xfrm_notify_sa(struct xfrm_st
copy_to_user_state(x, p);
 
if (x-aalg)
-   NLA_PUT(skb, XFRMA_ALG_AUTH,
-   sizeof(*(x-aalg))+(x-aalg-alg_key_len+7)/8, x-aalg);
+   NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x-aalg), x-aalg);
if (x-ealg)
-   NLA_PUT(skb, XFRMA_ALG_CRYPT,
-   sizeof(*(x-ealg))+(x-ealg-alg_key_len+7)/8, x-ealg);
+   NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x-ealg), x-ealg);
if (x-calg)
NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x-calg)), x-calg);
 

-- 



[PATCH 12/16] [XFRM] netlink: Rename attribute array from xfrma[] to attrs[]

2007-08-22 Thread Thomas Graf
Increases readability a lot.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:34:10.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:34:29.0 +0200
@@ -38,9 +38,9 @@ static inline int alg_len(struct xfrm_al
return sizeof(*alg) + ((alg-alg_key_len + 7) / 8);
 }
 
-static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type)
+static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type)
 {
-   struct rtattr *rt = xfrma[type];
+   struct rtattr *rt = attrs[type];
struct xfrm_algo *algp;
 
if (!rt)
@@ -75,18 +75,18 @@ static int verify_one_alg(struct rtattr 
return 0;
 }
 
-static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type,
   xfrm_address_t **addrp)
 {
-   struct rtattr *rt = xfrma[type];
+   struct rtattr *rt = attrs[type];
 
if (rt  addrp)
*addrp = RTA_DATA(rt);
 }
 
-static inline int verify_sec_ctx_len(struct rtattr **xfrma)
+static inline int verify_sec_ctx_len(struct rtattr **attrs)
 {
-   struct rtattr *rt = xfrma[XFRMA_SEC_CTX];
+   struct rtattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_user_sec_ctx *uctx;
 
if (!rt)
@@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str
 
 
 static int verify_newsa_info(struct xfrm_usersa_info *p,
-struct rtattr **xfrma)
+struct rtattr **attrs)
 {
int err;
 
@@ -125,35 +125,35 @@ static int verify_newsa_info(struct xfrm
err = -EINVAL;
switch (p-id.proto) {
case IPPROTO_AH:
-   if (!xfrma[XFRMA_ALG_AUTH]  ||
-   xfrma[XFRMA_ALG_CRYPT]  ||
-   xfrma[XFRMA_ALG_COMP])
+   if (!attrs[XFRMA_ALG_AUTH]  ||
+   attrs[XFRMA_ALG_CRYPT]  ||
+   attrs[XFRMA_ALG_COMP])
goto out;
break;
 
case IPPROTO_ESP:
-   if ((!xfrma[XFRMA_ALG_AUTH] 
-!xfrma[XFRMA_ALG_CRYPT])   ||
-   xfrma[XFRMA_ALG_COMP])
+   if ((!attrs[XFRMA_ALG_AUTH] 
+!attrs[XFRMA_ALG_CRYPT])   ||
+   attrs[XFRMA_ALG_COMP])
goto out;
break;
 
case IPPROTO_COMP:
-   if (!xfrma[XFRMA_ALG_COMP]  ||
-   xfrma[XFRMA_ALG_AUTH]   ||
-   xfrma[XFRMA_ALG_CRYPT])
+   if (!attrs[XFRMA_ALG_COMP]  ||
+   attrs[XFRMA_ALG_AUTH]   ||
+   attrs[XFRMA_ALG_CRYPT])
goto out;
break;
 
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
case IPPROTO_DSTOPTS:
case IPPROTO_ROUTING:
-   if (xfrma[XFRMA_ALG_COMP]   ||
-   xfrma[XFRMA_ALG_AUTH]   ||
-   xfrma[XFRMA_ALG_CRYPT]  ||
-   xfrma[XFRMA_ENCAP]  ||
-   xfrma[XFRMA_SEC_CTX]||
-   !xfrma[XFRMA_COADDR])
+   if (attrs[XFRMA_ALG_COMP]   ||
+   attrs[XFRMA_ALG_AUTH]   ||
+   attrs[XFRMA_ALG_CRYPT]  ||
+   attrs[XFRMA_ENCAP]  ||
+   attrs[XFRMA_SEC_CTX]||
+   !attrs[XFRMA_COADDR])
goto out;
break;
 #endif
@@ -162,13 +162,13 @@ static int verify_newsa_info(struct xfrm
goto out;
}
 
-   if ((err = verify_one_alg(xfrma, XFRMA_ALG_AUTH)))
+   if ((err = verify_one_alg(attrs, XFRMA_ALG_AUTH)))
goto out;
-   if ((err = verify_one_alg(xfrma, XFRMA_ALG_CRYPT)))
+   if ((err = verify_one_alg(attrs, XFRMA_ALG_CRYPT)))
goto out;
-   if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP)))
+   if ((err = verify_one_alg(attrs, XFRMA_ALG_COMP)))
goto out;
-   if ((err = verify_sec_ctx_len(xfrma)))
+   if ((err = verify_sec_ctx_len(attrs)))
goto out;
 
err = -EINVAL;
@@ -298,12 +298,12 @@ static void copy_from_user_state(struct 
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs)
 {
-   struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL];
-   struct rtattr *lt = xfrma[XFRMA_LTIME_VAL];
-   struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH];
-   struct rtattr *rt = xfrma[XFRMA_REPLAY_THRESH];
+   struct rtattr *rp = attrs[XFRMA_REPLAY_VAL];
+ 

[PATCH 10/16] [XFRM] netlink: Establish an attribute policy

2007-08-22 Thread Thomas Graf
Adds a policy defining the minimal payload lengths for all the attributes,
allowing most attribute validation checks to be removed from the middle of
the code path. This makes updates more consistent, as many format errors
are recognised earlier, before any changes have been attempted.
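
For context, such a policy table looks roughly like the sketch below
(the entries are illustrative, not the exact table added by this patch);
nlmsg_parse() then rejects any attribute shorter than its declared
minimum before the handlers ever run:

static const struct nla_policy example_xfrma_policy[XFRMA_MAX + 1] = {
	[XFRMA_ALG_AUTH]	= { .len = sizeof(struct xfrm_algo) },
	[XFRMA_ENCAP]		= { .len = sizeof(struct xfrm_encap_tmpl) },
	[XFRMA_COADDR]		= { .len = sizeof(xfrm_address_t) },
	[XFRMA_ETIMER_THRESH]	= { .type = NLA_U32 },
};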

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:31:04.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:31:56.0 +0200
@@ -42,19 +42,12 @@ static int verify_one_alg(struct rtattr 
 {
struct rtattr *rt = xfrma[type - 1];
struct xfrm_algo *algp;
-   int len;
 
if (!rt)
return 0;
 
-   len = (rt-rta_len - sizeof(*rt)) - sizeof(*algp);
-   if (len  0)
-   return -EINVAL;
-
algp = RTA_DATA(rt);
-
-   len -= (algp-alg_key_len + 7U) / 8;
-   if (len  0)
+   if (RTA_PAYLOAD(rt)  alg_len(algp))
return -EINVAL;
 
switch (type) {
@@ -82,55 +75,25 @@ static int verify_one_alg(struct rtattr 
return 0;
 }
 
-static int verify_encap_tmpl(struct rtattr **xfrma)
-{
-   struct rtattr *rt = xfrma[XFRMA_ENCAP - 1];
-   struct xfrm_encap_tmpl *encap;
-
-   if (!rt)
-   return 0;
-
-   if ((rt-rta_len - sizeof(*rt))  sizeof(*encap))
-   return -EINVAL;
-
-   return 0;
-}
-
-static int verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
   xfrm_address_t **addrp)
 {
struct rtattr *rt = xfrma[type - 1];
 
-   if (!rt)
-   return 0;
-
-   if ((rt-rta_len - sizeof(*rt))  sizeof(**addrp))
-   return -EINVAL;
-
-   if (addrp)
+   if (rt  addrp)
*addrp = RTA_DATA(rt);
-
-   return 0;
 }
 
 static inline int verify_sec_ctx_len(struct rtattr **xfrma)
 {
struct rtattr *rt = xfrma[XFRMA_SEC_CTX - 1];
struct xfrm_user_sec_ctx *uctx;
-   int len = 0;
 
if (!rt)
return 0;
 
-   if (rt-rta_len  sizeof(*uctx))
-   return -EINVAL;
-
uctx = RTA_DATA(rt);
-
-   len += sizeof(struct xfrm_user_sec_ctx);
-   len += uctx-ctx_len;
-
-   if (uctx-len != len)
+   if (uctx-len != (sizeof(struct xfrm_user_sec_ctx) + uctx-ctx_len))
return -EINVAL;
 
return 0;
@@ -205,12 +168,8 @@ static int verify_newsa_info(struct xfrm
goto out;
if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP)))
goto out;
-   if ((err = verify_encap_tmpl(xfrma)))
-   goto out;
if ((err = verify_sec_ctx_len(xfrma)))
goto out;
-   if ((err = verify_one_addr(xfrma, XFRMA_COADDR, NULL)))
-   goto out;
 
err = -EINVAL;
switch (p-mode) {
@@ -339,9 +298,8 @@ static void copy_from_user_state(struct 
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static int xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
 {
-   int err = - EINVAL;
struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH-1];
@@ -349,8 +307,6 @@ static int xfrm_update_ae_params(struct 
 
if (rp) {
struct xfrm_replay_state *replay;
-   if (RTA_PAYLOAD(rp)  sizeof(*replay))
-   goto error;
replay = RTA_DATA(rp);
memcpy(x-replay, replay, sizeof(*replay));
memcpy(x-preplay, replay, sizeof(*replay));
@@ -358,8 +314,6 @@ static int xfrm_update_ae_params(struct 
 
if (lt) {
struct xfrm_lifetime_cur *ltime;
-   if (RTA_PAYLOAD(lt)  sizeof(*ltime))
-   goto error;
ltime = RTA_DATA(lt);
x-curlft.bytes = ltime-bytes;
x-curlft.packets = ltime-packets;
@@ -367,21 +321,11 @@ static int xfrm_update_ae_params(struct 
x-curlft.use_time = ltime-use_time;
}
 
-   if (et) {
-   if (RTA_PAYLOAD(et)  sizeof(u32))
-   goto error;
+   if (et)
x-replay_maxage = *(u32*)RTA_DATA(et);
-   }
 
-   if (rt) {
-   if (RTA_PAYLOAD(rt)  sizeof(u32))
-   goto error;
+   if (rt)
x-replay_maxdiff = *(u32*)RTA_DATA(rt);
-   }
-
-   return 0;
-error:
-   return err;
 }
 
 static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p,
@@ -429,9 +373,7 @@ static struct xfrm_state *xfrm_state_con
 
/* override default values from above */
 
-  

[PATCH 14/16] [XFRM] netlink: Use nla_memcpy() in xfrm_update_ae_params()

2007-08-22 Thread Thomas Graf
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:35:13.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:36:59.0 +0200
@@ -303,20 +303,12 @@ static void xfrm_update_ae_params(struct
struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
if (rp) {
-   struct xfrm_replay_state *replay;
-   replay = nla_data(rp);
-   memcpy(x-replay, replay, sizeof(*replay));
-   memcpy(x-preplay, replay, sizeof(*replay));
+   nla_memcpy(x-replay, rp, sizeof(x-replay));
+   nla_memcpy(x-preplay, rp, sizeof(x-preplay));
}
 
-   if (lt) {
-   struct xfrm_lifetime_cur *ltime;
-   ltime = nla_data(lt);
-   x-curlft.bytes = ltime-bytes;
-   x-curlft.packets = ltime-packets;
-   x-curlft.add_time = ltime-add_time;
-   x-curlft.use_time = ltime-use_time;
-   }
+   if (lt)
+   nla_memcpy(x-curlft, lt, sizeof(x-curlft));
 
if (et)
x-replay_maxage = nla_get_u32(et);

-- 



[PATCH 13/16] [XFRM] netlink: Use nlattr instead of rtattr

2007-08-22 Thread Thomas Graf
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:34:29.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:35:13.0 +0200
@@ -38,16 +38,16 @@ static inline int alg_len(struct xfrm_al
return sizeof(*alg) + ((alg-alg_key_len + 7) / 8);
 }
 
-static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type)
+static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
-   struct rtattr *rt = attrs[type];
+   struct nlattr *rt = attrs[type];
struct xfrm_algo *algp;
 
if (!rt)
return 0;
 
-   algp = RTA_DATA(rt);
-   if (RTA_PAYLOAD(rt)  alg_len(algp))
+   algp = nla_data(rt);
+   if (nla_len(rt)  alg_len(algp))
return -EINVAL;
 
switch (type) {
@@ -75,24 +75,24 @@ static int verify_one_alg(struct rtattr 
return 0;
 }
 
-static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct nlattr **attrs, enum xfrm_attr_type_t type,
   xfrm_address_t **addrp)
 {
-   struct rtattr *rt = attrs[type];
+   struct nlattr *rt = attrs[type];
 
if (rt  addrp)
-   *addrp = RTA_DATA(rt);
+   *addrp = nla_data(rt);
 }
 
-static inline int verify_sec_ctx_len(struct rtattr **attrs)
+static inline int verify_sec_ctx_len(struct nlattr **attrs)
 {
-   struct rtattr *rt = attrs[XFRMA_SEC_CTX];
+   struct nlattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_user_sec_ctx *uctx;
 
if (!rt)
return 0;
 
-   uctx = RTA_DATA(rt);
+   uctx = nla_data(rt);
if (uctx-len != (sizeof(struct xfrm_user_sec_ctx) + uctx-ctx_len))
return -EINVAL;
 
@@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str
 
 
 static int verify_newsa_info(struct xfrm_usersa_info *p,
-struct rtattr **attrs)
+struct nlattr **attrs)
 {
int err;
 
@@ -191,16 +191,15 @@ out:
 
 static int attach_one_algo(struct xfrm_algo **algpp, u8 *props,
   struct xfrm_algo_desc *(*get_byname)(char *, int),
-  struct rtattr *u_arg)
+  struct nlattr *rta)
 {
-   struct rtattr *rta = u_arg;
struct xfrm_algo *p, *ualg;
struct xfrm_algo_desc *algo;
 
if (!rta)
return 0;
 
-   ualg = RTA_DATA(rta);
+   ualg = nla_data(rta);
 
algo = get_byname(ualg-alg_name, 1);
if (!algo)
@@ -216,15 +215,14 @@ static int attach_one_algo(struct xfrm_a
return 0;
 }
 
-static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct rtattr 
*u_arg)
+static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr 
*rta)
 {
-   struct rtattr *rta = u_arg;
struct xfrm_encap_tmpl *p, *uencap;
 
if (!rta)
return 0;
 
-   uencap = RTA_DATA(rta);
+   uencap = nla_data(rta);
p = kmemdup(uencap, sizeof(*p), GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -245,26 +243,25 @@ static inline int xfrm_user_sec_ctx_size
return len;
 }
 
-static int attach_sec_ctx(struct xfrm_state *x, struct rtattr *u_arg)
+static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg)
 {
struct xfrm_user_sec_ctx *uctx;
 
if (!u_arg)
return 0;
 
-   uctx = RTA_DATA(u_arg);
+   uctx = nla_data(u_arg);
return security_xfrm_state_alloc(x, uctx);
 }
 
-static int attach_one_addr(xfrm_address_t **addrpp, struct rtattr *u_arg)
+static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta)
 {
-   struct rtattr *rta = u_arg;
xfrm_address_t *p, *uaddrp;
 
if (!rta)
return 0;
 
-   uaddrp = RTA_DATA(rta);
+   uaddrp = nla_data(rta);
p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -298,23 +295,23 @@ static void copy_from_user_state(struct 
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct nlattr **attrs)
 {
-   struct rtattr *rp = attrs[XFRMA_REPLAY_VAL];
-   struct rtattr *lt = attrs[XFRMA_LTIME_VAL];
-   struct rtattr *et = attrs[XFRMA_ETIMER_THRESH];
-   struct rtattr *rt = attrs[XFRMA_REPLAY_THRESH];
+   struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
+   struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+   struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+   struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
if (rp) {
struct xfrm_replay_state *replay;
-   replay = RTA_DATA(rp);

[PATCH 05/16] [XFRM] netlink: Use nla_put()/NLA_PUT() variants

2007-08-22 Thread Thomas Graf
Also makes use of copy_sec_ctx() in another place and removes
duplicated code.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:15:03.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:16:03.0 +0200
@@ -576,6 +576,27 @@ struct xfrm_dump_info {
int this_idx;
 };
 
+static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
+{
+   int ctx_size = sizeof(struct xfrm_sec_ctx) + s-ctx_len;
+   struct xfrm_user_sec_ctx *uctx;
+   struct nlattr *attr;
+
+   attr = nla_reserve(skb, XFRMA_SEC_CTX, ctx_size);
+   if (attr == NULL)
+   return -EMSGSIZE;
+
+   uctx = nla_data(attr);
+   uctx-exttype = XFRMA_SEC_CTX;
+   uctx-len = ctx_size;
+   uctx-ctx_doi = s-ctx_doi;
+   uctx-ctx_alg = s-ctx_alg;
+   uctx-ctx_len = s-ctx_len;
+   memcpy(uctx + 1, s-ctx_str, s-ctx_len);
+
+   return 0;
+}
+
 static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
 {
struct xfrm_dump_info *sp = ptr;
@@ -596,43 +617,32 @@ static int dump_one_state(struct xfrm_st
copy_to_user_state(x, p);
 
if (x-aalg)
-   RTA_PUT(skb, XFRMA_ALG_AUTH,
+   NLA_PUT(skb, XFRMA_ALG_AUTH,
sizeof(*(x-aalg))+(x-aalg-alg_key_len+7)/8, x-aalg);
if (x-ealg)
-   RTA_PUT(skb, XFRMA_ALG_CRYPT,
+   NLA_PUT(skb, XFRMA_ALG_CRYPT,
sizeof(*(x-ealg))+(x-ealg-alg_key_len+7)/8, x-ealg);
if (x-calg)
-   RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x-calg)), x-calg);
+   NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x-calg)), x-calg);
 
if (x-encap)
-   RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x-encap), x-encap);
+   NLA_PUT(skb, XFRMA_ENCAP, sizeof(*x-encap), x-encap);
 
-   if (x-security) {
-   int ctx_size = sizeof(struct xfrm_sec_ctx) +
-   x-security-ctx_len;
-   struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-   struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-   uctx-exttype = XFRMA_SEC_CTX;
-   uctx-len = ctx_size;
-   uctx-ctx_doi = x-security-ctx_doi;
-   uctx-ctx_alg = x-security-ctx_alg;
-   uctx-ctx_len = x-security-ctx_len;
-   memcpy(uctx + 1, x-security-ctx_str, x-security-ctx_len);
-   }
+   if (x-security  copy_sec_ctx(x-security, skb)  0)
+   goto nla_put_failure;
 
if (x-coaddr)
-   RTA_PUT(skb, XFRMA_COADDR, sizeof(*x-coaddr), x-coaddr);
+   NLA_PUT(skb, XFRMA_COADDR, sizeof(*x-coaddr), x-coaddr);
 
if (x-lastused)
-   RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x-lastused), x-lastused);
+   NLA_PUT_U64(skb, XFRMA_LASTUSED, x-lastused);
 
nlmsg_end(skb, nlh);
 out:
sp-this_idx++;
return 0;
 
-rtattr_failure:
+nla_put_failure:
nlmsg_cancel(skb, nlh);
return -EMSGSIZE;
 }
@@ -1193,32 +1203,9 @@ static int copy_to_user_tmpl(struct xfrm
up-ealgos = kp-ealgos;
up-calgos = kp-calgos;
}
-   RTA_PUT(skb, XFRMA_TMPL,
-   (sizeof(struct xfrm_user_tmpl) * xp-xfrm_nr),
-   vec);
-
-   return 0;
-
-rtattr_failure:
-   return -1;
-}
-
-static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
-{
-   int ctx_size = sizeof(struct xfrm_sec_ctx) + s-ctx_len;
-   struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-   struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-   uctx-exttype = XFRMA_SEC_CTX;
-   uctx-len = ctx_size;
-   uctx-ctx_doi = s-ctx_doi;
-   uctx-ctx_alg = s-ctx_alg;
-   uctx-ctx_len = s-ctx_len;
-   memcpy(uctx + 1, s-ctx_str, s-ctx_len);
-   return 0;
 
- rtattr_failure:
-   return -1;
+   return nla_put(skb, XFRMA_TMPL,
+  sizeof(struct xfrm_user_tmpl) * xp-xfrm_nr, vec);
 }
 
 static inline int copy_to_user_state_sec_ctx(struct xfrm_state *x, struct 
sk_buff *skb)
@@ -1240,17 +1227,11 @@ static inline int copy_to_user_sec_ctx(s
 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
 {
-   struct xfrm_userpolicy_type upt;
+   struct xfrm_userpolicy_type upt = {
+   .type = type,
+   };
 
-   memset(upt, 0, sizeof(upt));
-   upt.type = type;
-
-   RTA_PUT(skb, XFRMA_POLICY_TYPE, sizeof(upt), upt);
-
-   return 0;
-
-rtattr_failure:
-   return -1;
+   return nla_put(skb, XFRMA_POLICY_TYPE, sizeof(upt), upt);
 }
 
 #else
@@ -1440,7 +1421,6 @@ static int build_aevent(struct sk_buff *
 {
struct xfrm_aevent_id *id;
struct nlmsghdr *nlh;
-   struct xfrm_lifetime_cur ltime;

[PATCH 04/16] [XFRM] netlink: Use nlmsg_broadcast() and nlmsg_unicast()

2007-08-22 Thread Thomas Graf
This simplifies successful return codes from >0 to 0.
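
Background, sketched from the helpers' behaviour rather than from this
patch: netlink_unicast() returns the number of bytes delivered on
success, while the nlmsg_unicast() wrapper folds any positive result to
0, so callers can return its value directly as an errno-style status:

	err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
	/* err == 0 on success, negative error code on failure */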

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:13:57.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:15:03.0 +0200
@@ -800,8 +800,7 @@ static int xfrm_get_sa(struct sk_buff *s
if (IS_ERR(resp_skb)) {
err = PTR_ERR(resp_skb);
} else {
-   err = netlink_unicast(xfrm_nl, resp_skb,
- NETLINK_CB(skb).pid, MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
}
xfrm_state_put(x);
 out_noput:
@@ -882,8 +881,7 @@ static int xfrm_alloc_userspi(struct sk_
goto out;
}
 
-   err = netlink_unicast(xfrm_nl, resp_skb,
- NETLINK_CB(skb).pid, MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
 
 out:
xfrm_state_put(x);
@@ -1393,9 +1391,8 @@ static int xfrm_get_policy(struct sk_buf
if (IS_ERR(resp_skb)) {
err = PTR_ERR(resp_skb);
} else {
-   err = netlink_unicast(xfrm_nl, resp_skb,
- NETLINK_CB(skb).pid,
- MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, resp_skb,
+   NETLINK_CB(skb).pid);
}
} else {
xfrm_audit_log(NETLINK_CB(skb).loginuid, NETLINK_CB(skb).sid,
@@ -1525,8 +1522,7 @@ static int xfrm_get_ae(struct sk_buff *s
 
if (build_aevent(r_skb, x, c)  0)
BUG();
-   err = netlink_unicast(xfrm_nl, r_skb,
- NETLINK_CB(skb).pid, MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, r_skb, NETLINK_CB(skb).pid);
spin_unlock_bh(x-lock);
xfrm_state_put(x);
return err;
@@ -1903,9 +1899,7 @@ static int xfrm_send_migrate(struct xfrm
if (build_migrate(skb, m, num_migrate, sel, dir, type)  0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_MIGRATE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE,
-GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE, GFP_ATOMIC);
 }
 #else
 static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2061,8 +2055,7 @@ static int xfrm_exp_state_notify(struct 
if (build_expire(skb, x, c)  0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }
 
 static int xfrm_aevent_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2079,8 +2072,7 @@ static int xfrm_aevent_state_notify(stru
if (build_aevent(skb, x, c)  0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_AEVENTS;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, 
GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC);
 }
 
 static int xfrm_notify_sa_flush(struct km_event *c)
@@ -2105,8 +2097,7 @@ static int xfrm_notify_sa_flush(struct k
 
nlmsg_end(skb, nlh);
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 }
 
 static inline int xfrm_sa_len(struct xfrm_state *x)
@@ -2175,8 +2166,7 @@ static int xfrm_notify_sa(struct xfrm_st
 
nlmsg_end(skb, nlh);
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 
 nlmsg_failure:
 rtattr_failure:
@@ -2262,8 +2252,7 @@ static int xfrm_send_acquire(struct xfrm
if (build_acquire(skb, x, xt, xp, dir)  0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_ACQUIRE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, 
GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, GFP_ATOMIC);
 }
 
 /* User gives us xfrm_user_policy_info followed by an array of 0
@@ -2371,8 +2360,7 @@ static int xfrm_exp_policy_notify(struct
if (build_polexpire(skb, xp, dir, c)  0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }
 
 static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, struct km_event 
*c)
@@ -2423,8 +2411,7 @@ static 

[PATCH 03/16] [XFRM] netlink: Use nlmsg_data() instead of NLMSG_DATA()

2007-08-22 Thread Thomas Graf
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:12:20.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:13:57.0 +0200
@@ -443,7 +443,7 @@ error_no_put:
 static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtattr **xfrma)
 {
-   struct xfrm_usersa_info *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_info *p = nlmsg_data(nlh);
struct xfrm_state *x;
int err;
struct km_event c;
@@ -520,7 +520,7 @@ static int xfrm_del_sa(struct sk_buff *s
struct xfrm_state *x;
int err = -ESRCH;
struct km_event c;
-   struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_id *p = nlmsg_data(nlh);
 
x = xfrm_user_state_lookup(p, xfrma, err);
if (x == NULL)
@@ -592,7 +592,7 @@ static int dump_one_state(struct xfrm_st
if (nlh == NULL)
return -EMSGSIZE;
 
-   p = NLMSG_DATA(nlh);
+   p = nlmsg_data(nlh);
copy_to_user_state(x, p);
 
if (x-aalg)
@@ -715,7 +715,7 @@ static int xfrm_get_spdinfo(struct sk_bu
struct rtattr **xfrma)
 {
struct sk_buff *r_skb;
-   u32 *flags = NLMSG_DATA(nlh);
+   u32 *flags = nlmsg_data(nlh);
u32 spid = NETLINK_CB(skb).pid;
u32 seq = nlh-nlmsg_seq;
int len = NLMSG_LENGTH(sizeof(u32));
@@ -765,7 +765,7 @@ static int xfrm_get_sadinfo(struct sk_bu
struct rtattr **xfrma)
 {
struct sk_buff *r_skb;
-   u32 *flags = NLMSG_DATA(nlh);
+   u32 *flags = nlmsg_data(nlh);
u32 spid = NETLINK_CB(skb).pid;
u32 seq = nlh-nlmsg_seq;
int len = NLMSG_LENGTH(sizeof(u32));
@@ -787,7 +787,7 @@ static int xfrm_get_sadinfo(struct sk_bu
 static int xfrm_get_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtattr **xfrma)
 {
-   struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_id *p = nlmsg_data(nlh);
struct xfrm_state *x;
struct sk_buff *resp_skb;
int err = -ESRCH;
@@ -841,7 +841,7 @@ static int xfrm_alloc_userspi(struct sk_
int family;
int err;
 
-   p = NLMSG_DATA(nlh);
+   p = nlmsg_data(nlh);
err = verify_userspi_info(p);
if (err)
goto out_noput;
@@ -1130,7 +1130,7 @@ static struct xfrm_policy *xfrm_policy_c
 static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtattr **xfrma)
 {
-   struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh);
+   struct xfrm_userpolicy_info *p = nlmsg_data(nlh);
struct xfrm_policy *xp;
struct km_event c;
int err;
@@ -1277,8 +1277,8 @@ static int dump_one_policy(struct xfrm_p
XFRM_MSG_NEWPOLICY, sizeof(*p), sp-nlmsg_flags);
if (nlh == NULL)
return -EMSGSIZE;
-   p = NLMSG_DATA(nlh);
 
+   p = nlmsg_data(nlh);
copy_to_user_policy(xp, p, dir);
if (copy_to_user_tmpl(xp, skb)  0)
goto nlmsg_failure;
@@ -1351,7 +1351,7 @@ static int xfrm_get_policy(struct sk_buf
struct km_event c;
int delete;
 
-   p = NLMSG_DATA(nlh);
+   p = nlmsg_data(nlh);
delete = nlh-nlmsg_type == XFRM_MSG_DELPOLICY;
 
err = copy_from_user_policy_type(type, xfrma);
@@ -1420,7 +1420,7 @@ static int xfrm_flush_sa(struct sk_buff 
struct rtattr **xfrma)
 {
struct km_event c;
-   struct xfrm_usersa_flush *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_flush *p = nlmsg_data(nlh);
struct xfrm_audit audit_info;
int err;
 
@@ -1448,8 +1448,8 @@ static int build_aevent(struct sk_buff *
nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
if (nlh == NULL)
return -EMSGSIZE;
-   id = NLMSG_DATA(nlh);
 
+   id = nlmsg_data(nlh);
memcpy(id-sa_id.daddr, x-id.daddr,sizeof(x-id.daddr));
id-sa_id.spi = x-id.spi;
id-sa_id.family = x-props.family;
@@ -1490,7 +1490,7 @@ static int xfrm_get_ae(struct sk_buff *s
struct sk_buff *r_skb;
int err;
struct km_event c;
-   struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+   struct xfrm_aevent_id *p = nlmsg_data(nlh);
int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id));
struct xfrm_usersa_id *id = p-sa_id;
 
@@ -1538,7 +1538,7 @@ static int xfrm_new_ae(struct sk_buff *s
struct xfrm_state *x;
struct km_event c;
int err = - EINVAL;
-   struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+   struct xfrm_aevent_id *p = nlmsg_data(nlh);
struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
 
@@ -1602,7 +1602,7 @@ static int xfrm_add_pol_expire(struct sk
struct rtattr 

[PATCH 15/16] [XFRM] netlink: Remove dependency on rtnetlink

2007-08-22 Thread Thomas Graf
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:36:59.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:37:18.0 +0200
@@ -19,7 +19,6 @@
 #include linux/string.h
 #include linux/net.h
 #include linux/skbuff.h
-#include linux/rtnetlink.h
 #include linux/pfkeyv2.h
 #include linux/ipsec.h
 #include linux/init.h

-- 



[PATCH 07/16] [XFRM] netlink: Clear up some of the CONFIG_XFRM_SUB_POLICY ifdef mess

2007-08-22 Thread Thomas Graf
Moves all of the SUB_POLICY ifdefs related to the attribute size
calculation into a function.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:03:43.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:04:46.0 +0200
@@ -1224,6 +1224,14 @@ static inline int copy_to_user_sec_ctx(s
}
return 0;
 }
+static inline size_t userpolicy_type_attrsize(void)
+{
+#ifdef CONFIG_XFRM_SUB_POLICY
+   return nla_total_size(sizeof(struct xfrm_userpolicy_type));
+#else
+   return 0;
+#endif
+}
 
 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
@@ -1857,9 +1865,7 @@ static int xfrm_send_migrate(struct xfrm
 
len = RTA_SPACE(sizeof(struct xfrm_user_migrate) * num_migrate);
len += NLMSG_SPACE(sizeof(struct xfrm_userpolicy_id));
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
@@ -2214,9 +2220,7 @@ static int xfrm_send_acquire(struct xfrm
len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp-xfrm_nr);
len += NLMSG_SPACE(sizeof(struct xfrm_user_acquire));
len += RTA_SPACE(xfrm_user_sec_ctx_size(x-security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
@@ -2322,9 +2326,7 @@ static int xfrm_exp_policy_notify(struct
len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp-xfrm_nr);
len += NLMSG_SPACE(sizeof(struct xfrm_user_polexpire));
len += RTA_SPACE(xfrm_user_sec_ctx_size(xp-security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
@@ -2349,9 +2351,7 @@ static int xfrm_notify_policy(struct xfr
len += RTA_SPACE(headlen);
headlen = sizeof(*id);
}
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
len += NLMSG_SPACE(headlen);
 
skb = alloc_skb(len, GFP_ATOMIC);
@@ -2401,9 +2401,7 @@ static int xfrm_notify_policy_flush(stru
struct nlmsghdr *nlh;
struct sk_buff *skb;
int len = 0;
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
len += NLMSG_LENGTH(0);
 
skb = alloc_skb(len, GFP_ATOMIC);

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/10 Rev4] [Doc] HOWTO Documentation for batching

2007-08-22 Thread Randy Dunlap
On Wed, 22 Aug 2007 13:58:58 +0530 Krishna Kumar wrote:

 Add Documentation describing batching skb xmit capability.
 
 Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
 ---
  batching_skb_xmit.txt |   78 
 ++
  1 files changed, 78 insertions(+)
 
 diff -ruNp org/Documentation/networking/batching_skb_xmit.txt 
 new/Documentation/networking/batching_skb_xmit.txt
 --- org/Documentation/networking/batching_skb_xmit.txt1970-01-01 
 05:30:00.0 +0530
 +++ new/Documentation/networking/batching_skb_xmit.txt2007-08-22 
 10:21:19.0 +0530
 @@ -0,0 +1,78 @@
 +  HOWTO for batching skb xmit support
 +  ---
 +
 +Section 1: What is batching skb xmit
 +Section 2: How batching xmit works vs the regular xmit
 +Section 3: How drivers can support batching
 +Section 4: How users can work with batching
 +
 +
 +Introduction: Kernel support for batching skb
 +--
 +
 +A new capability to support xmit of multiple skbs is provided in the 
 netdevice
 +layer. Drivers which enable this capability should be able to process 
 multiple
 +skbs in a single call to their xmit handler.
 +
 +
 +Section 1: What is batching skb xmit
 +-
 +
 + This capability is optionally enabled by a driver by setting the
 + NETIF_F_BATCH_SKBS bit in dev-features. The pre-requisite for a

 prerequisite

 + driver to use this capability is that it should have a reasonably

I would say reasonably-sized.

 + sized hardware queue that can process multiple skbs.
 +
 +
 +Section 2: How batching xmit works vs the regular xmit
 +---
 +
 + The network stack gets called from upper layer protocols with a single
 + skb to transmit. This skb is first enqueue'd and an attempt is made to

   enqueued

 + transmit it immediately (via qdisc_run). However, events like tx lock
 + contention, tx queue stopped, etc, can result in the skb not getting

  etc.,

 + sent out and it remains in the queue. When the next xmit is called or
 + when the queue is re-enabled, qdisc_run could potentially find
 + multiple packets in the queue, and iteratively send them all out
 + one-by-one.
 +
 + Batching skb xmit is a mechanism to exploit this situation where all
 + skbs can be passed in one shot to the device. This reduces driver
 + processing, locking at the driver (or in stack for ~LLTX drivers)
 + gets amortized over multiple skbs, and in case of specific drivers
 + where every xmit results in a completion processing (like IPoIB) -
 + optimizations can be made in the driver to request a completion for
 + only the last skb that was sent which results in saving interrupts
 + for every (but the last) skb that was sent in the same batch.
 +
 + Batching can result in significant performance gains for systems that
 + have multiple data stream paths over the same network interface card.
 +
 +
 +Section 3: How drivers can support batching
 +-
 +
 + Batching requires the driver to set the NETIF_F_BATCH_SKBS bit in
 + dev-features.
 +
 + The driver's xmit handler should be modified to process multiple skbs
 + instead of one skb. The driver's xmit handler is called either with a

   an

 + skb to transmit or NULL skb, where the latter case should be handled
 + as a call to xmit multiple skbs. This is done by sending out all skbs
 + in the dev-skb_blist list (where it was added by the core stack).
 +
 +
 +Section 4: How users can work with batching
 +-
 +
 + Batching can be disabled for a particular device, e.g. on desktop
 + systems if only one stream of network activity for that device is
 + taking place, since performance could be slightly affected due to
 + extra processing that batching adds (unless packets are getting
 + sent fast resulting in stopped queue's). Batching can be enabled if

   queues).

 + more than one stream of network activity per device is being done,
 + e.g. on servers; or even desktop usage with multiple browser, chat,
 + file transfer sessions, etc.
 +
 + Per device batching can be enabled/disabled by passing 'on' or 'off'
 + respectively to ethtool.

with what other parameter(s), e.g.,

ethtool <dev> batching on/off ?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
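
A rough sketch of the xmit convention Section 3 of the quoted HOWTO
describes (NETIF_F_BATCH_SKBS and dev->skb_blist are the proposed,
non-mainline interfaces from this patch series; example_hw_queue_one()
is a made-up placeholder for the driver's per-skb TX-ring code):

static int example_batch_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* skb != NULL: the normal single-skb transmit path */
	if (skb)
		return example_hw_queue_one(dev, skb);

	/* skb == NULL: batch flush -- drain the list the core stack
	 * queued on dev->skb_blist (assumed to be a struct sk_buff_head,
	 * as the HOWTO implies) and hand each skb to the hardware */
	while ((skb = __skb_dequeue(dev->skb_blist)) != NULL)
		example_hw_queue_one(dev, skb);

	return NETDEV_TX_OK;
}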

Re: [2.6.20.17 review 35/58] forcedeth bug fix: realtek phy

2007-08-22 Thread Willy Tarreau
On Wed, Aug 22, 2007 at 11:56:42AM -0400, Chuck Ebbert wrote:
 On 08/22/2007 05:39 AM, Willy Tarreau wrote:
  This patch contains errata fixes for the realtek phy. It only renamed the
  defines to be phy specific.
  
  Signed-off-by: Ayaz Abdulla [EMAIL PROTECTED]
  Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED]
  Signed-off-by: Willy Tarreau [EMAIL PROTECTED]
  ---
   drivers/net/forcedeth.c |   54 
  +++
   1 files changed, 54 insertions(+), 0 deletions(-)
  
  diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
  index c383dc3..dbfdbed 100644
  --- a/drivers/net/forcedeth.c
  +++ b/drivers/net/forcedeth.c
  @@ -554,6 +554,7 @@ union ring_type {
   #define PHY_OUI_MARVELL0x5043
   #define PHY_OUI_CICADA 0x03f1
   #define PHY_OUI_VITESSE0x01c1
  +#define PHY_OUI_REALTEK0x01c1
   #define PHYID1_OUI_MASK0x03ff
   #define PHYID1_OUI_SHFT6
   #define PHYID2_OUI_MASK0xfc00
 
 Realtek is 0x0732
 
 This is still wrong upstream -- what happened to the patch to fix it?

Good catch, thanks Chuck! I've already seen the fix somewhere, I believe it
was on netdev, though I'm not sure. I'm fixing the patch in place right now.
I can add your signoff if you want.

Cheers,
Willy
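
For reference, based on the OUI value Chuck quotes above, the correction
would amount to something like the following one-liner (a sketch, not
the actual follow-up patch):

	-#define PHY_OUI_REALTEK	0x01c1
	+#define PHY_OUI_REALTEK	0x0732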



Re: Oops in e100_up

2007-08-22 Thread Kok, Auke

Gerrit Renker wrote:

With the davem-2.6.24 tree I get the following Oops in the e100 driver (cribbed 
from console):

Code: 6c ff ff ff 8b 48 0c ba 01 00 00 00 89 f0 e8 1b f2 ff ff c7 86 9c 00 00 00
  01 00 00 00 e9 4e ff ff ff 89 d0 e8 b3 f8 0b 00 eb 8e 0f 0b eb fe 55 89 
e5
  56 53 83 ec 0c 8b 98 dc 01 00 00 e8 ff b9

EIP: e100_up+0x11d/0x121 


SS:ESP 0068:f759ce38

Stack: syscall_call -> sys_ioctl -> vfs_ioctl -> do_ioctl -> sock_ioctl -> inet_ioctl ->
   devinet_ioctl -> dev_change_flags -> dev_open -> e100_open -> oops

The system log then goes on reporting "eth0: link up, 100Mbps, full-duplex"
and hangs while trying to restore the serial console state (not sure that
this is related).


restore? Is this during resume from suspend or something?

Auke


r8169: slow samba performance

2007-08-22 Thread Bruce Cole

Just upgraded a motherboard and it came with an onboard
Realtek card which appears to use the r8169 driver.  The
machine is a samba server and when serving files to a local
Linux or Windows client, I only get approx 40-60 kbps.
Write performance is fine though, in the tens of mbps, and
scp, nfs, and ftp server all work well, so it appears
specific to the Samba load.  However, when serving more
than one client simultaneously, or when there is other
network activity, performance goes up dramatically, again
into the tens of mbps.


Shane, join the crowd :)  Try the fix I just re-posted over here:

http://www.spinics.net/lists/netdev/msg39244.html





Re: Oops in e100_up

2007-08-22 Thread Arnaldo Carvalho de Melo
On Wed, Aug 22, 2007 at 09:35:04AM -0700, Kok, Auke wrote:
 Gerrit Renker wrote:
 With the davem-2.6.24 tree I get the following Oops in the e100 driver 
 (cribbed from console):
 
 Code: 6c ff ff ff 8b 48 0c ba 01 00 00 00 89 f0 e8 1b f2 ff ff c7 86 9c 00 
 00 00
   01 00 00 00 e9 4e ff ff ff 89 d0 e8 b3 f8 0b 00 eb 8e 0f 0b eb fe 
   55 89 e5
   56 53 83 ec 0c 8b 98 dc 01 00 00 e8 ff b9
 
 EIP: e100_up+0x11d/0x121 
 
 SS:ESP 0068:f759ce38
 
 Stack: syscall_call - sys_ioctl - vfs_ioctl - do_ioctl - sock_ioctl - 
 inet_ioctl - devinet_ioctl -
dev_change_flags - dev_open - e100_open - oops
 
 The system log then goes on reporting eth0: link up, 100Mbps, 
 full-duplex and hangs while trying to
 restore the serial console state (not sure that this is related).
 
 restore? Is this during resume from suspend or something?

This seems to have been a bug reported by akpm and fixed by
Thomas Graf; check a recent post with netconsole in the subject.

- Arnaldo


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-22 Thread Rick Jones

David Miller wrote:

I think the jury is still out, but seeing TSO perform even slightly
worse with the batching changes in place would be very worrisome.
This applies to both throughput and cpu utilization.


Should it be any more or less worrisome than small packet performance (eg the 
TCP_RR stuff I posted recently) being rather worse with TSO enabled than with it 
disabled?


rick jones



[PATCH] sanitize tc_ematch headers

2007-08-22 Thread Stephen Hemminger
The headers in tc_ematch are used by iproute2, so these headers
should be processed.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 include/linux/Kbuild |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index ad7f71a..818cc3a 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -7,6 +7,7 @@ header-y += raid/
 header-y += spi/
 header-y += sunrpc/
 header-y += tc_act/
+header-y += tc_ematch/
 header-y += netfilter/
 header-y += netfilter_arp/
 header-y += netfilter_bridge/
-- 
1.5.2.4



Re: [RFC IPROUTE]: Add flow classifier support

2007-08-22 Thread Stephen Hemminger
On Wed, 30 May 2007 11:42:01 +0200
Patrick McHardy [EMAIL PROTECTED] wrote:

 The iproute patch for the flow classifier.
 
 

This patch is on hold since the netlink changes haven't made it upstream yet.

-- 
Stephen Hemminger [EMAIL PROTECTED]


[ANNOUNCE] iproute2-2.6.23-rc3

2007-08-22 Thread Stephen Hemminger
There have been a lot of changes for 2.6.23, so here is a test release
of iproute2 that should capture all the submitted patches


http://developer.osdl.org/shemminger/iproute2/download/iproute2-2.6.23-rc3.tar.gz

Johannes Berg (1):
  show multicast groups

PJ Waskiewicz (1):
  iproute2: sch_rr support in tc

Patrick McHardy (6):
  TC action parsing bug fix
  Bug fix tc action drop
  IPROUTE2: RTNETLINK nested attributes
  Use FRA_* attributes for routing rules
  iplink: use netlink for link configuration
  Fix meta ematch usage of 0 values

Pavel Emelianov (1):
  Make ip utility veth driver aware

Sridhar Samudrala (1):
  Fix bug  in display of ipv6 cloned/cached routes

Stephen Hemminger (3):
  Fix ss to handle partial records.
  sanitized headers update to 2.6.23-rc3
  Fix m_ipt build

-- 
Stephen Hemminger [EMAIL PROTECTED]


[PATCH] netdevice: kernel docbook addition

2007-08-22 Thread Stephen Hemminger
Add more kernel doc's for part of the network device API.
This is only a start, and needs more work.

Applies against net-2.6.24

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/Documentation/DocBook/kernel-api.tmpl 2007-08-21 15:43:37.0 
-0700
+++ b/Documentation/DocBook/kernel-api.tmpl 2007-08-22 12:30:33.0 
-0700
@@ -240,17 +240,23 @@ X!Ilib/string.c
  <sect1><title>Driver Support</title>
 !Enet/core/dev.c
 !Enet/ethernet/eth.c
+!Enet/sched/sch_generic.c
 !Iinclude/linux/etherdevice.h
+!Iinclude/linux/netdevice.h
+ </sect1>
+ <sect1><title>PHY Support</title>
 !Edrivers/net/phy/phy.c
 !Idrivers/net/phy/phy.c
 !Edrivers/net/phy/phy_device.c
 !Idrivers/net/phy/phy_device.c
 !Edrivers/net/phy/mdio_bus.c
 !Idrivers/net/phy/mdio_bus.c
+ </sect1>
 <!-- FIXME: Removed for now since no structured comments in source
+ <sect1><title>Wireless</title>
 X!Enet/core/wireless.c
--->
  </sect1>
+-->
  <sect1><title>Synchronous PPP</title>
 !Edrivers/net/wan/syncppp.c
  </sect1>
--- a/include/linux/netdevice.h 2007-08-21 15:44:00.0 -0700
+++ b/include/linux/netdevice.h 2007-08-22 12:00:16.0 -0700
@@ -302,17 +302,38 @@ enum
 
 extern void FASTCALL(__napi_schedule(struct napi_struct *n));
 
+/**
+ * napi_schedule_prep - check if napi can be scheduled
+ * @n: napi context
+ *
+ * Test if NAPI routine is already running, and if not mark
+ * it as running.  This is used as a condition variable to
+ * ensure that only one NAPI poll instance runs.
+ */
 static inline int napi_schedule_prep(struct napi_struct *n)
 {
	return !test_and_set_bit(NAPI_STATE_SCHED, &n->state);
 }
 
+/**
+ * napi_schedule - schedule NAPI poll
+ * @n: napi context
+ *
+ * Schedule NAPI poll routine to be called if it is not already
+ * running.
+ */
 static inline void napi_schedule(struct napi_struct *n)
 {
if (napi_schedule_prep(n))
__napi_schedule(n);
 }
 
+/**
+ * napi_complete - NAPI processing complete
+ * @n: napi context
+ *
+ * Mark NAPI processing as complete.
+ */
 static inline void napi_complete(struct napi_struct *n)
 {
	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
@@ -320,12 +341,26 @@ static inline void napi_complete(struct 
	clear_bit(NAPI_STATE_SCHED, &n->state);
 }
 
+/**
+ * napi_disable - prevent NAPI from scheduling
+ * @n: napi context
+ *
+ * Stop NAPI from being scheduled on this context.
+ * Waits till any outstanding processing completes.
+ */
 static inline void napi_disable(struct napi_struct *n)
 {
	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
msleep_interruptible(1);
 }
 
+/**
+ * napi_enable - allow NAPI to be scheduled
+ * @n: napi context
+ *
+ * Allow NAPI to be scheduled on this context again.
+ * Must be paired with napi_disable.
+ */
 static inline void napi_enable(struct napi_struct *n)
 {
	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
@@ -636,6 +671,12 @@ struct net_device
 #define	NETDEV_ALIGN		32
 #define	NETDEV_ALIGN_CONST	(NETDEV_ALIGN - 1)
 
+/**
+ * netdev_priv - access network device private data
+ * @dev: network device
+ *
+ * Get network device private data
+ */
 static inline void *netdev_priv(const struct net_device *dev)
 {
	return dev->priv;
@@ -773,11 +814,24 @@ static inline void netif_schedule(struct
__netif_schedule(dev);
 }
 
+/**
+ * netif_start_queue - allow transmit
+ * @dev: network device
+ *
+ * Allow upper layers to call the device hard_start_xmit routine.
+ */
 static inline void netif_start_queue(struct net_device *dev)
 {
	clear_bit(__LINK_STATE_XOFF, &dev->state);
 }
 
+/**
+ * netif_wake_queue - restart transmit
+ * @dev: network device
+ *
+ * Allow upper layers to call the device hard_start_xmit routine.
+ * Used for flow control when transmit resources are available.
+ */
 static inline void netif_wake_queue(struct net_device *dev)
 {
 #ifdef CONFIG_NETPOLL_TRAP
@@ -790,16 +844,35 @@ static inline void netif_wake_queue(stru
__netif_schedule(dev);
 }
 
+/**
+ * netif_stop_queue - stop the transmit queue
+ * @dev: network device
+ *
+ * Stop upper layers calling the device hard_start_xmit routine.
+ * Used for flow control when transmit resources are unavailable.
+ */
 static inline void netif_stop_queue(struct net_device *dev)
 {
	set_bit(__LINK_STATE_XOFF, &dev->state);
 }
 
+/**
+ * netif_queue_stopped - test if transmit queue is flow blocked
+ * @dev: network device
+ *
+ * Test if transmit queue on device is currently unable to send.
+ */
 static inline int netif_queue_stopped(const struct net_device *dev)
 {
	return test_bit(__LINK_STATE_XOFF, &dev->state);
 }
 
+/**
+ * netif_running - test if up
+ * @dev: network device
+ *
+ * Test if the device has been brought up.
+ */
 static inline int netif_running(const struct net_device *dev)
 {
return 
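
For illustration, the stop/wake helpers documented above are normally used from
a driver's transmit and TX-completion paths roughly as follows; this is a
minimal sketch, with my_priv and the two hardware helpers invented purely for
the example:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct my_priv {                        /* illustrative private state */
	unsigned int tx_free;           /* free TX descriptors left */
};

/* Hypothetical hardware helpers, assumed to exist for this sketch. */
extern void queue_to_hw(struct my_priv *priv, struct sk_buff *skb);
extern unsigned int reclaim_done_descriptors(struct my_priv *priv);

static int my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);

	queue_to_hw(priv, skb);                 /* hand the frame to the NIC */
	if (--priv->tx_free == 0)
		netif_stop_queue(dev);          /* ring full: block the stack */
	return 0;
}

/* Called from the TX-completion interrupt/poll path. */
static void my_tx_complete(struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);

	priv->tx_free += reclaim_done_descriptors(priv);
	if (netif_queue_stopped(dev) && priv->tx_free)
		netif_wake_queue(dev);          /* resources back: unblock */
}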

Re: [PATCH] AH4: Update IPv4 options handling to conform to RFC 4302.

2007-08-22 Thread David Miller
From: Nick Bowler [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 10:22:53 -0400

 In testing our ESP/AH offload hardware, I discovered an issue with how AH
 handles mutable fields in IPv4.  RFC 4302 (AH) states the following on the
 subject:
 
 For IPv4, the entire option is viewed as a unit; so even
 though the type and length fields within most options are immutable
 in transit, if an option is classified as mutable, the entire option
 is zeroed for ICV computation purposes.
 
 The current implementation does not zero the type and length fields, resulting
 in authentication failures when communicating with hosts that do (i.e. 
 FreeBSD).
 
 I have tested record route and timestamp options (ping -R and ping -T) on a
 small network involving Windows XP, FreeBSD 6.2, and Linux hosts, with one
 router.  In the presence of these options, the FreeBSD and Linux hosts (with
 the patch or with the hardware) can communicate.  The Windows XP host simply
 fails to accept these packets with or without the patch.
 
 I have also been trying to test source routing options (using traceroute -g),
 but haven't had much luck getting this option to work *without* AH, let alone
 with.
 
 Signed-off-by: Nick Bowler [EMAIL PROTECTED]

Patch applied, thanks a lot Nick.
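
To make the RFC 4302 rule quoted above concrete: zeroing a mutable IPv4 option
for the ICV computation covers the type and length bytes as well as the data.
A rough sketch follows, with ip_option_is_mutable() standing in for whatever
classification the real ah4 code uses:

#include <linux/string.h>
#include <linux/types.h>

static bool ip_option_is_mutable(u8 type);      /* hypothetical classifier */

static void ah_zero_mutable_options(u8 *opts, int len)
{
	int off = 0;

	while (off < len) {
		u8 type = opts[off];
		int optlen;

		if (type == 0)          /* IPOPT_END: end of option list */
			break;
		if (type == 1) {        /* IPOPT_NOOP: single byte */
			off++;
			continue;
		}
		optlen = opts[off + 1];
		if (optlen < 2 || off + optlen > len)
			break;          /* malformed option, give up */
		if (ip_option_is_mutable(type))
			memset(opts + off, 0, optlen);  /* zero type + len + data */
		off += optlen;
	}
}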


Re: [PATCH] sanitize tc_ematch headers

2007-08-22 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 10:18:38 -0700

 The headers in tc_ematch are used by iproute2, so these headers
 should be processed.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Applied, thanks Stephen.


Re: [RFC IPROUTE]: Add flow classifier support

2007-08-22 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 10:46:15 -0700

 On Wed, 30 May 2007 11:42:01 +0200
 Patrick McHardy [EMAIL PROTECTED] wrote:
 
  The iproute patch for the flow classifier.
  
  
 
 This patch is on hold since the netlink changes haven't made it upstream yet.

I don't have the kernel side in my queue either, perhaps
I lost it or I didn't see it when it was sent out.

Patrick?


Re: [PATCH] xfrm: export sysctl_xfrm_acq_expires

2007-08-22 Thread David Miller
From: Neil Horman [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 15:42:02 -0400

 Hey all-
   I had noticed that an extra sysctl for xfrm had been added a while back
 (specifically sysctl_xfrm_acq_expires).  Unlike its related sysctl's however,
 this was never exported so that out-of-tree modules could access it, and I
 thought it would be a good idea if it was.  This patch handles that.
 
 Thanks  Regards
 Neil
 
 Signed-off-by: Neil Horman [EMAIL PROTECTED]

There is no reason for out-of-tree code to access it and no
current examples exist.

It is an internal knob controlling how a specific part of the IPSEC
rule lookup operates, and that is all in-tree.



[PATCH] xfrm: export sysctl_xfrm_acq_expires

2007-08-22 Thread Neil Horman
Hey all-
I had noticed that an extra sysctl for xfrm had been added a while back
(specifically sysctl_xfrm_acq_expires).  Unlike its related sysctl's however,
this was never exported so that out-of-tree modules could access it, and I
thought it would be a good idea if it was.  This patch handles that.

Thanks  Regards
Neil

Signed-off-by: Neil Horman [EMAIL PROTECTED]


 xfrm_state.c |1 +
 1 file changed, 1 insertion(+)


diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index d4356e6..62ae5a2 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -34,6 +34,7 @@ u32 sysctl_xfrm_aevent_rseqth __read_mostly = 
XFRM_AE_SEQT_SIZE;
 EXPORT_SYMBOL(sysctl_xfrm_aevent_rseqth);
 
 u32 sysctl_xfrm_acq_expires __read_mostly = 30;
+EXPORT_SYMBOL(sysctl_xfrm_acq_expires);
 
 /* Each xfrm_state may be linked to two tables:
 
-- 
/***
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 [EMAIL PROTECTED]
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***/


Re: [PATCH 02/16] [XFRM] netlink: Use nlmsg_end() and nlmsg_cancel()

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:40 +0200

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH] improved xfrm_audit_log() patch

2007-08-22 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Tue, 21 Aug 2007 00:24:05 -0700 (PDT)

 Looks good, applied to net-2.6.24, thanks Joy.

Something is still buggered up in this patch, you can't add this local
audit_info variable unconditionally to these functions, and
alternatively you also can't add a bunch of ifdefs to xfrm_user.c to
cover it up either.

  CC [M]  net/xfrm/xfrm_user.o
net/xfrm/xfrm_user.c: In function 'xfrm_add_sa':
net/xfrm/xfrm_user.c:450: warning: unused variable 'audit_info'
net/xfrm/xfrm_user.c: In function 'xfrm_del_sa':
net/xfrm/xfrm_user.c:525: warning: unused variable 'audit_info'
net/xfrm/xfrm_user.c: In function 'xfrm_add_policy':
net/xfrm/xfrm_user.c:1140: warning: unused variable 'audit_info'
net/xfrm/xfrm_user.c: In function 'xfrm_get_policy':
net/xfrm/xfrm_user.c:1404: warning: unused variable 'audit_info'
net/xfrm/xfrm_user.c: In function 'xfrm_add_pol_expire':
net/xfrm/xfrm_user.c:1651: warning: unused variable 'audit_info'
net/xfrm/xfrm_user.c: In function 'xfrm_add_sa_expire':
net/xfrm/xfrm_user.c:1688: warning: unused variable 'audit_info'

So I'm going to revert for now.  Let me know when you have
a fixed version of the patch.

Thanks.
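
One conventional way to keep callers free of both the unused locals and the
ifdef clutter is to give the audit call a no-op fallback when auditing is
compiled out. The sketch below approximates the xfrm_audit_log() signature
from this thread rather than copying it from the tree, so treat it as an
illustration of the pattern only:

/* Header-excerpt sketch, not the actual fix. */
#ifdef CONFIG_AUDITSYSCALL
extern void xfrm_audit_log(uid_t auid, u32 secid, int type, int result,
			   struct xfrm_policy *xp, struct xfrm_state *x);
#else
static inline void xfrm_audit_log(uid_t auid, u32 secid, int type, int result,
				  struct xfrm_policy *xp, struct xfrm_state *x)
{
	/* auditing compiled out: callers need no local state or ifdefs */
}
#endif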


Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-22 Thread David Miller
From: Rick Jones [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 10:09:37 -0700

 Should it be any more or less worrisome than small packet
 performance (eg the TCP_RR stuff I posted recently) being rather
 worse with TSO enabled than with it disabled?

That, like any such thing shown by the batching changes, is a bug
to fix.


Re: [PATCH] netdevice: kernel docbook addition

2007-08-22 Thread Randy Dunlap
On Wed, 22 Aug 2007 12:33:14 -0700 Stephen Hemminger wrote:

 Add more kernel doc's for part of the network device API.
 This is only a start, and needs more work.
 
 Applies against net-2.6.24

Thanks!

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


[PATCH 2/2] Net: ath5k, remove sysctls

2007-08-22 Thread Jiri Slaby
ath5k, remove sysctls

Sysctls were buggy and defunct in later kernels (due to the sysctl check).

Signed-off-by: Jiri Slaby [EMAIL PROTECTED]

---
commit 069bfbe93facb3468f579568434d18f1268a487c
tree 87c19ebf2c91d9fb07f1847adcb6098f2235eaaa
parent b01c0e9a02b248c3e2f2923da9728ba2c3961dee
author Jiri Slaby [EMAIL PROTECTED] Wed, 22 Aug 2007 22:48:41 +0200
committer Jiri Slaby [EMAIL PROTECTED] Wed, 22 Aug 2007 22:48:41 +0200

 drivers/net/wireless/ath5k_base.c |   23 ---
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wireless/ath5k_base.c 
b/drivers/net/wireless/ath5k_base.c
index 2ce82ed..7f938c4 100644
--- a/drivers/net/wireless/ath5k_base.c
+++ b/drivers/net/wireless/ath5k_base.c
@@ -2440,21 +2440,13 @@ static struct pci_driver ath_pci_drv_id = {
.resume = ath_pci_resume,
 };
 
-/*
- * Static (i.e. global) sysctls.  Note that the hal sysctls
- * are located under ours by sharing the setting for DEV_ATH.
- */
-enum {
-   DEV_ATH = 9,/* XXX known by hal */
-};
-
 static int mincalibrate = 1;
 static int maxcalibrate = INT_MAX / 1000;
 #define	CTL_AUTO	-2	/* cannot be CTL_ANY or CTL_NONE */
 
 static ctl_table ath_static_sysctls[] = {
 #if AR_DEBUG
-   { .ctl_name = CTL_AUTO,
+   {
  .procname = debug,
  .mode = 0644,
  .data = ath_debug,
@@ -2462,28 +2454,28 @@ static ctl_table ath_static_sysctls[] = {
  .proc_handler = proc_dointvec
},
 #endif
-   { .ctl_name = CTL_AUTO,
+   {
  .procname = countrycode,
  .mode = 0444,
  .data = countrycode,
  .maxlen   = sizeof(countrycode),
  .proc_handler = proc_dointvec
},
-   { .ctl_name = CTL_AUTO,
+   {
  .procname = outdoor,
  .mode = 0444,
  .data = outdoor,
  .maxlen   = sizeof(outdoor),
  .proc_handler = proc_dointvec
},
-   { .ctl_name = CTL_AUTO,
+   {
  .procname = xchanmode,
  .mode = 0444,
  .data = xchanmode,
  .maxlen   = sizeof(xchanmode),
  .proc_handler = proc_dointvec
},
-   { .ctl_name = CTL_AUTO,
+   {
  .procname = calibrate,
  .mode = 0644,
  .data = ath_calinterval,
@@ -2495,14 +2487,15 @@ static ctl_table ath_static_sysctls[] = {
{ 0 }
 };
 static ctl_table ath_ath_table[] = {
-   { .ctl_name = DEV_ATH,
+   {
  .procname = ath,
  .mode = 0555,
  .child= ath_static_sysctls
}, { 0 }
 };
 static ctl_table ath_root_table[] = {
-   { .ctl_name = CTL_DEV,
+   {
+ .ctl_name = CTL_DEV,
  .procname = dev,
  .mode = 0555,
  .child= ath_ath_table


[RFC] Wild and crazy ideas involving struct sk_buff

2007-08-22 Thread Paul Moore
Over in LSM/SELinux land there has been a lot of talk recently about how to 
deal with loopback and forwarded traffic, specifically, how to preserve the 
sender's security label on those two types of traffic.  Yes, there is the 
existing sk_buff.secmark field but that is already being used for something 
else and utilizing it for this purpose has its pros/cons. 

We're currently talking about several different ideas to solve the problem, 
including leveraging the sk_buff.secmark field, and one of the ideas was to 
add an additional field to the sk_buff structure.  Knowing how well that idea 
would go over (lead balloon is probably an understatement at best) I started 
looking at what I might be able to remove from the sk_buff struct to make 
room for a new field (the new field would be a u32).  Looking at the sk_buff 
structure it appears that the sk_buff.dev and sk_buff.iif fields are a bit 
redundant and removing the sk_buff.dev field could free 32/64 bits depending 
on the platform.  Is there any reason (performance?) for keeping the 
sk_buff.dev field around?  Would the community be open to patches which 
removed it and transition users over to the sk_buff.iif field?  Finally, 
assuming the sk_buff.dev field was removed, would the community be open to 
adding a new LSM/SELinux related u32 field to the sk_buff struct?

Thanks.
 
-- 
paul moore
linux security @ hp
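
A rough illustration of the transition Paul describes, looking the input
device up by index instead of caching the pointer in the skb; the 2007-era
single-argument dev_get_by_index() is assumed here, and the caller owns the
returned reference:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Sketch only: resolve the input device from skb->iif on demand.
 * Returns a referenced device (or NULL); the caller must dev_put() it. */
static struct net_device *skb_input_dev(const struct sk_buff *skb)
{
	return skb->iif ? dev_get_by_index(skb->iif) : NULL;
}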


Re: [PATCH 04/16] [XFRM] netlink: Use nlmsg_broadcast() and nlmsg_unicast()

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:42 +0200

 This simplifies successful return codes from 0 to 0.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied, thanks.


Re: [PATCH 05/16] [XFRM] netlink: Use nla_put()/NLA_PUT() variantes

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:43 +0200

 Also makes use of copy_sec_ctx() in another place and removes
 duplicated code.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 06/16] [XFRM] netlink: Move algorithm length calculation to its own function

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:44 +0200

 Adds alg_len() to calculate the properly padded length of an
 algorithm attribute to simplify the code.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 07/16] [XFRM] netlink: Clear up some of the CONFIG_XFRM_SUB_POLICY ifdef mess

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:45 +0200

 Moves all of the SUB_POLICY ifdefs related to the attribute size
 calculation into a function.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 09/16] [XFRM] netlink: Use nlmsg_parse() to parse attributes

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:47 +0200

 Uses nlmsg_parse() to parse the attributes. This actually changes
 behaviour as unknown attributes (type > MAXTYPE) no longer cause
 an error. Instead unknown attributes will be ignored henceforth
 to keep older kernels compatible with more recent userspace tools.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.
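
The pattern being switched to looks roughly like the following; XFRMA_MAX and
xfrma_policy are the xfrm netlink definitions this series works with, and the
wrapper function here is purely illustrative:

#include <net/netlink.h>
#include <linux/xfrm.h>

/* xfrma_policy is the nla_policy table established later in this series. */
extern const struct nla_policy xfrma_policy[];

/* attrs[] must have room for XFRMA_MAX + 1 entries (for an XFRM_MSG_NEWSA
 * request, whose fixed header is struct xfrm_usersa_info). */
static int example_parse(struct nlmsghdr *nlh, struct nlattr *attrs[])
{
	int err;

	/* Attribute types above XFRMA_MAX are now skipped instead of
	 * rejected; attributes that were absent stay NULL in attrs[]. */
	err = nlmsg_parse(nlh, sizeof(struct xfrm_usersa_info),
			  attrs, XFRMA_MAX, xfrma_policy);
	if (err < 0)
		return err;

	return 0;
}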


Re: [PATCH 10/16] [XFRM] netlink: Establish an attribute policy

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:48 +0200

 Adds a policy defining the minimal payload lengths for all the attributes
 allowing for most attribute validation checks to be removed from in
 the middle of the code path. Makes updates more consistent as many format
 errors are recognised earlier, before any changes have been attempted.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/16] [XFRM] netlink: Enhance indexing of the attribute array

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:49 +0200

 nlmsg_parse() puts attributes at array[type] so the indexing
 method can be simplified by removing the obscuring - 1.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 12/16] [XFRM] netlink: Rename attribute array from xfrma[] to attrs[]

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:50 +0200

 Increases readability a lot.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.

I named it like this to mean XFRM Attributes :-)


Re: [PATCH 6/7 v2] fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.

2007-08-22 Thread Vitaly Bordug
On Tue, 21 Aug 2007 11:47:41 -0500
Scott Wood [EMAIL PROTECTED] wrote:

 Vitaly Bordug wrote:
  On Fri, 17 Aug 2007 13:17:18 -0500
  Scott Wood wrote:
  
  
 The existing OF glue code was crufty and broken.  Rather than fix
 it, it will be removed, and the ethernet driver now talks to the
 device tree directly.
 
  
  A bit short description; I'd rather expect a list of the specific
  improvements that are now up and running using the device tree.
  Or if it is just a move to the new infrastructure, let's state that, too.
 
 Some of specific binding changes (there are too many to exhaustively 
 list) are enumerated in the new CPM binding patch, which I'll send 
 after Kumar's include/asm-ppc patch goes in (since it modifies one of 
 those files).
 
ok
 +#ifdef CONFIG_PPC_CPM_NEW_BINDING
 +static int __devinit find_phy(struct device_node *np,
 +  struct fs_platform_info *fpi)
 +{
 +   struct device_node *phynode, *mdionode;
 +   struct resource res;
 +   int ret = 0, len;
 +
 +   const u32 *data = of_get_property(np, "phy-handle", &len);
 +   if (!data || len != 4)
 +   return -EINVAL;
 +
 +   phynode = of_find_node_by_phandle(*data);
 +   if (!phynode)
 +   return -EINVAL;
 +
 +   mdionode = of_get_parent(phynode);
 +   if (!phynode)
 +   goto out_put_phy;
 +
 +   ret = of_address_to_resource(mdionode, 0, &res);
 +   if (ret)
 +   goto out_put_mdio;
 +
 +   data = of_get_property(phynode, "reg", &len);
 +   if (!data || len != 4)
 +   goto out_put_mdio;
 +
 +   snprintf(fpi->bus_id, 16, PHY_ID_FMT, res.start, *data);
 +
 +out_put_mdio:
 +   of_node_put(mdionode);
 +out_put_phy:
 +   of_node_put(phynode);
 +   return ret;
 +}
  
  And without phy node? 
 
 It returns -EINVAL. :-)
 
 +#ifdef CONFIG_FS_ENET_HAS_FEC
 +#define IS_FEC(match) ((match)->data == &fs_fec_ops)
 +#else
 +#define IS_FEC(match) 0
 +#endif
 +
  
  Since we're talking directly with device tree, why bother with
  CONFIG_ stuff? We are able to figure it out from dts..
 
 We are figuring it out from the DTS (that's what match-data is).  I 
 just didn't want boards without a FEC to have to build in support for
 it and waste memory.

yes, wrong snippet
what about 

 #ifdef CONFIG_CPM2
 + r = fs_enet_mdio_bb_init();
 + if (r != 0)
 + goto out_mdio_bb;
 +#endif
 +#ifdef CONFIG_8xx
 + r = fs_enet_mdio_fec_init();
 + if (r != 0)
 + goto out_mdio_fec;
 +#endif

We had to pray and hope that 8xx would only have a FEC and that cpm2 has some
bitbanged stuff; now we can inquire the dts and know for sure, at least it seems so.
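
In other words, the decision can move from Kconfig alone into the match table
itself; a sketch only, with the compatible strings and the ops type treated as
placeholders for whatever fs_enet actually uses:

#include <linux/mod_devicetable.h>

/* Placeholder ops objects; the real driver has fs_fec_ops, fs_scc_ops, ... */
extern const struct fs_ops fs_fec_ops;
extern const struct fs_ops fs_scc_ops;

static struct of_device_id fs_enet_match[] = {
#ifdef CONFIG_FS_ENET_HAS_FEC
	{ .compatible = "fsl,pq1-fec-enet", .data = (void *)&fs_fec_ops, },
#endif
#ifdef CONFIG_FS_ENET_HAS_SCC
	{ .compatible = "fsl,cpm1-scc-enet", .data = (void *)&fs_scc_ops, },
#endif
	{},
};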



 
 +   fpi-rx_ring = 32;
 +   fpi-tx_ring = 32;
 +   fpi-rx_copybreak = 240;
 +   fpi-use_napi = 0;
 +   fpi-napi_weight = 17;
 +
  
  
  move params over to  dts?
 
 No.  These aren't attributes of the hardware, they're choices the
 driver makes about how much memory to use and how to interact with
 the rest of the kernel.
 
 +   ret = find_phy(ofdev-node, fpi);
 +   if (ret)
 +   goto out_free_fpi;
 +
  
  so we're hosed without phy node.
 
 How is that different from the old code, where you're hosed without 
  fep->fpi->bus_id?
 

I wasn't defending the old code, and I consider the "old code is POS, the new
one is just great" game meaningless.
I am just stating the problem, that we'll have to address later. On 8xx even 
reference boards may be 
without phy at all.

 +static struct of_device_id fs_enet_match[] = {
 +#ifdef CONFIG_FS_ENET_HAS_SCC
  
  
  same nagging. Are we able to get rid of Kconfig arcane defining
  which SoC currently plays the game for fs_enet?
 
 No, it's still needed for mpc885ads to determine pin setup and 
 conflicting device tree node removal (though that could go away if
 the device tree is changed to reflect the jumper settings).
 
 It's also useful for excluding unwanted code.  I don't like using 
 8xx/CPM2 as the decision point for that -- why should I build in 
 mac-scc.c if I have no intention of using an SCC ethernet (either 
 because my board doesn't have one, or because it's slow and conflicts 
 with other devices)?

OK, agreed, size is the most serious judge here. We'll definitely have to
revisit the pin problem later too (because custom designs sometimes switch
contradictory devices on the fly, disable SoC parts in favour of an alternative
function, etc.). QE-like pin encoding may or may not be an option for this;
I'm inclined to look at the most resource-safe approach.

 
 -Scott

-- 
Sincerely, Vitaly


Re: [PATCH 13/16] [XFRM] netlink: Use nlattr instead of rtattr

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:51 +0200

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 14/16] [XFRM] netlink: Use nla_memcpy() in xfrm_update_ae_params()

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:52 +0200

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 15/16] [XFRM] netlink: Remove dependency on rtnetlink

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:53 +0200

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 16/16] [XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()

2007-08-22 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:55:54 +0200

 These functions are only used once and are a lot easier to understand if
 inlined directly into the function.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Also applied.

Thanks for doing all of this work Thomas! :)


Re: [PATCH 2/7] fs_enet: Whitespace cleanup.

2007-08-22 Thread Vitaly Bordug
On Fri, 17 Aug 2007 12:53:59 -0500
Scott Wood wrote:

 Signed-off-by: Scott Wood [EMAIL PROTECTED]

Acked-by: Vitaly Bordug [EMAIL PROTECTED]


 ---
  drivers/net/fs_enet/fs_enet-main.c |   85
  drivers/net/fs_enet/fs_enet.h      |    4 +-
  drivers/net/fs_enet/mac-fcc.c      |    1 -
  drivers/net/fs_enet/mii-fec.c      |    1 -
  4 files changed, 41 insertions(+), 50 deletions(-)
 
 diff --git a/drivers/net/fs_enet/fs_enet-main.c
 b/drivers/net/fs_enet/fs_enet-main.c index a4a2a0e..f261b90 100644
 --- a/drivers/net/fs_enet/fs_enet-main.c
 +++ b/drivers/net/fs_enet/fs_enet-main.c
 @@ -353,7 +353,6 @@ static void fs_enet_tx(struct net_device *dev)
  
   do_wake = do_restart = 0;
   while (((sc = CBDR_SC(bdp))  BD_ENET_TX_READY) == 0) {
 -
   dirtyidx = bdp - fep-tx_bd_base;
  
   if (fep-tx_free == fep-tx_ring)
 @@ -454,7 +453,6 @@ fs_enet_interrupt(int irq, void *dev_id)
  
   nr = 0;
   while ((int_events = (*fep-ops-get_int_events)(dev)) != 0)
 { -
   nr++;
  
   int_clr_events = int_events;
 @@ -710,45 +708,43 @@ static void fs_timeout(struct net_device *dev)
   
 *-*/
  static void generic_adjust_link(struct  net_device *dev)
  {
 -   struct fs_enet_private *fep = netdev_priv(dev);
 -   struct phy_device *phydev = fep-phydev;
 -   int new_state = 0;
 -
 -   if (phydev-link) {
 -
 -   /* adjust to duplex mode */
 -   if (phydev-duplex != fep-oldduplex){
 -   new_state = 1;
 -   fep-oldduplex = phydev-duplex;
 -   }
 -
 -   if (phydev-speed != fep-oldspeed) {
 -   new_state = 1;
 -   fep-oldspeed = phydev-speed;
 -   }
 -
 -   if (!fep-oldlink) {
 -   new_state = 1;
 -   fep-oldlink = 1;
 -   netif_schedule(dev);
 -   netif_carrier_on(dev);
 -   netif_start_queue(dev);
 -   }
 -
 -   if (new_state)
 -   fep-ops-restart(dev);
 -
 -   } else if (fep-oldlink) {
 -   new_state = 1;
 -   fep-oldlink = 0;
 -   fep-oldspeed = 0;
 -   fep-oldduplex = -1;
 -   netif_carrier_off(dev);
 -   netif_stop_queue(dev);
 -   }
 -
 -   if (new_state  netif_msg_link(fep))
 -   phy_print_status(phydev);
 + struct fs_enet_private *fep = netdev_priv(dev);
 + struct phy_device *phydev = fep-phydev;
 + int new_state = 0;
 +
 + if (phydev-link) {
 + /* adjust to duplex mode */
 + if (phydev-duplex != fep-oldduplex) {
 + new_state = 1;
 + fep-oldduplex = phydev-duplex;
 + }
 +
 + if (phydev-speed != fep-oldspeed) {
 + new_state = 1;
 + fep-oldspeed = phydev-speed;
 + }
 +
 + if (!fep-oldlink) {
 + new_state = 1;
 + fep-oldlink = 1;
 + netif_schedule(dev);
 + netif_carrier_on(dev);
 + netif_start_queue(dev);
 + }
 +
 + if (new_state)
 + fep-ops-restart(dev);
 + } else if (fep-oldlink) {
 + new_state = 1;
 + fep-oldlink = 0;
 + fep-oldspeed = 0;
 + fep-oldduplex = -1;
 + netif_carrier_off(dev);
 + netif_stop_queue(dev);
 + }
 +
 + if (new_state  netif_msg_link(fep))
 + phy_print_status(phydev);
  }
  
  
 @@ -792,7 +788,6 @@ static int fs_init_phy(struct net_device *dev)
   return 0;
  }
  
 -
  static int fs_enet_open(struct net_device *dev)
  {
   struct fs_enet_private *fep = netdev_priv(dev);
 @@ -978,7 +973,7 @@ static struct net_device *fs_init_instance(struct
 device *dev, #endif
  
  #ifdef CONFIG_FS_ENET_HAS_SCC
 - if (fs_get_scc_index(fpi-fs_no) =0 )
 + if (fs_get_scc_index(fpi-fs_no) = 0)
   fep-ops = fs_scc_ops;
  #endif
  
 @@ -1069,9 +1064,8 @@ static struct net_device
 *fs_init_instance(struct device *dev, 
   return ndev;
  
 -  err:
 +err:
   if (ndev != NULL) {
 -
   if (registered)
   unregister_netdev(ndev);
  
 @@ -1262,7 +1256,6 @@ static int __init fs_init(void)
  err:
   cleanup_immap();
   return r;
 - 
  }
  
  static void __exit fs_cleanup(void)
 diff --git a/drivers/net/fs_enet/fs_enet.h
 b/drivers/net/fs_enet/fs_enet.h index 569be22..72a61e9 100644
 --- a/drivers/net/fs_enet/fs_enet.h
 +++ b/drivers/net/fs_enet/fs_enet.h
 @@ -15,8 +15,8 @@
  #include asm/commproc.h
  
  struct fec_info {
 -fec_t*  fecp;
 - u32 mii_speed;
 + fec_t 

Re: [RFC] Wild and crazy ideas involving struct sk_buff

2007-08-22 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:31:34 -0400

 We're currently talking about several different ideas to solve the problem, 
 including leveraging the sk_buff.secmark field, and one of the ideas was to 
 add an additional field to the sk_buff structure.  Knowing how well that idea 
 would go over (lead balloon is probably an understatement at best) I started 
 looking at what I might be able to remove from the sk_buff struct to make 
 room for a new field (the new field would be a u32).  Looking at the sk_buff 
 structure it appears that the sk_buff.dev and sk_buff.iif fields are a bit 
 redundant and removing the sk_buff.dev field could free 32/64 bits depending 
 on the platform.  Is there any reason (performance?) for keeping the 
 sk_buff.dev field around?  Would the community be open to patches which 
 removed it and transition users over to the sk_buff.iif field?  Finally, 
 assuming the sk_buff.dev field was removed, would the community be open to 
 adding a new LSM/SELinux related u32 field to the sk_buff struct?

It's there for performance, and I bet there might be some semantic
issues involved.

And ironically James Morris still owes me a struct sk_buff removal
from when I let him put the secmark thing in there!

Stop spending money you guys haven't earned yet :-)

