Re: NETDEV WATCHDOG: eth2: transmit timed out with 3c905C-TX

2006-05-23 Thread Steffen Klassert
On Tue, May 23, 2006 at 03:36:35PM +0200, Marco Berizzi wrote:
 Steffen Klassert wrote:
 
 On Wed, Apr 05, 2006 at 06:33:18PM +0200, Marco Berizzi wrote:
  Hello everybody.
  I'm getting these errors (with packet/connectivity loss) on
  our firewall after I have plugged in a 3c905C nic. Linux is
  Slackware 10.2 with vanilla 2.6.16.1.
 
  Hints?
 
  PS: I have temporary resolved the problem running 'ifconfig
  eth2 down' and 'ifconfig eth2 up'
 
  Apr  5 17:47:07 Teti kernel: eth2: Resetting the Tx ring pointer.
  Apr  5 17:47:47 Teti last message repeated 4 times
  Apr  5 17:48:57 Teti last message repeated 7 times
  Apr  5 17:49:57 Teti last message repeated 6 times
  Apr  5 17:50:57 Teti last message repeated 6 times
 
  Apr  5 17:47:07 Teti kernel: NETDEV WATCHDOG: eth2: transmit timed out
 
 There were some problems of this kind with 10base2 networks in 2.6.16.
 Could you please try whether 2.6.17-rc1 has this problems too?
 
 [Sorry for the very huge delay, but after 2.6.17-rc1 upgrade
 xfs filesystem crashed].
 
 Same problem here with 2.6.17-rc3-git18. Running ifconfig
 eth2 down and ifconfig eth2 up resolves the problem for
 a while.

Actually I have not really an idea what is going on here,
but increasing the debug level could give some more informations.
Setting debug=4 is a good start.
Did you try older kernel versions too?

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NETDEV WATCHDOG: eth2: transmit timed out with 3c905C-TX

2006-06-06 Thread Steffen Klassert
On Tue, Jun 06, 2006 at 11:12:45AM +0200, Marco Berizzi wrote:
 
 I have moved this damn pc from the remote to my site and I have
 placed it in production environment with 2.6.17-rc5
 No problem after 24 hours (on the remote side the problem was
 arising after a couple of hours). I have modprobed 3c59x with
 debug=4. I see only these kind of messages (are they fine?):
 
 Jun  5 14:31:25 Pleiadi kernel: eth2: vortex_error(), status=0x8081
 Jun  5 14:31:40 Pleiadi last message repeated 3 times
 Jun  5 14:31:47 Pleiadi kernel: eth2: vortex_error(), status=0x8281
 Jun  5 14:31:47 Pleiadi kernel: eth2: Media selection timer tick happened, 
 Autonegotiate.
 Jun  5 14:31:47 Pleiadi kernel: dev-watchdog_timeo=1250
 Jun  5 14:31:47 Pleiadi kernel: eth2: MII transceiver has status 782d.
 Jun  5 14:31:47 Pleiadi kernel: eth2: Media selection timer finished, 
 Autonegotiate.
 Jun  5 14:31:51 Pleiadi kernel: eth2: vortex_error(), status=0x8081
 Jun  5 14:32:03 Pleiadi last message repeated 2 times
 Jun  5 14:32:10 Pleiadi kernel: eth2: vortex_error(), status=0x8481
 Jun  5 14:32:15 Pleiadi kernel: eth2: vortex_error(), status=0x8081
 Jun  5 14:32:46 Pleiadi last message repeated 7 times

This is ok, just normal operation of the NIC.

 
 The only relevant change, between the remote and my site, is a
 different ethernet switch where the 3c905C is connected to.
 Could it be an issue?

Well, I think it can. Problems with a switch are mostly related
to the autonegotiation of the media type and full/half-duplex. 
But in your case the autonegotiation seems to be ok 
(mii-tool/ethtool output). More specific information you can
get with the mii-diag and vortex-diag tools. You can find
these tools at http://www.scyld.com/ethercard_diag.html

There are problems with a cisco switch documented in
Documentation/networking/vortex.txt for example.

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] drivers/net/3c59x: notice carrier a little sooner

2006-01-12 Thread Steffen Klassert
Did you give the patch a try?

Actually I have no possibility to test, but
I think that netif_carrier_{on,off} still does not
work proper. The timer function does just nothing
if vp-medialock is set.

Steffen


On Thu, Jan 12, 2006 at 01:29:23PM -0500, Dan Williams wrote:
 Hi,
 
 This patch attempts to notice carrier 'on' state a little sooner.  Since
 the carrier watchdog only fires in really, really long intervals (like,
 60s in most cases), it takes up to 60s to notice that a cable has been
 plugged/unplugged.  This patch fires the carrier watchdog if (1) the
 carrier is off, and (2) packets have been successfully received.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] drivers/net/3c59x: notice carrier a little sooner

2006-01-12 Thread Steffen Klassert
On Thu, Jan 12, 2006 at 03:02:25PM -0500, Dan Williams wrote:
 On Thu, 2006-01-12 at 20:57 +0100, Steffen Klassert wrote:
  Did you give the patch a try?
  
  Actually I have no possibility to test, but
  I think that netif_carrier_{on,off} still does not
  work proper. The timer function does just nothing
  if vp-medialock is set.
  
 Worked for me with 3c905 card in a Dell Latitude C610.  I just cp'd the
 code into the vortex, so maybe vortex is different than boomerang.
 Anyway, shouldn't the timer be run a bit later after the *_rx()
 function?
 
 Dan

Of course the timer function should run in both cases.
If it works with the boomerang then it should work with vortex too.
I just remember that I had some problems with netif_carrier_ok()
when I tried to use ethtool_op_get_link about a year ago.
Perhaps I'm not up to date and it works in default case. 
I think a problem should appear at least
if one forces a certain media type. 
I still have a patch in queue to improve usage of netif_carrier_{on,off} 
but I had no possibility to test yet, so I did not send.

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] 3c59x: improve usage of netif_carrier_{on,off}

2006-01-13 Thread Steffen Klassert
...
  I still have a patch in queue to improve usage of netif_carrier_{on,off} 
  but I had no possibility to test yet, so I did not send.
 
 I'd be happy to test it out if you'd like.
 
 Dan

Hi Dan,

here is the patch for testing.


The patch _should_ do the following:

1. Set the polling intervall for media changes to 5 seconds if link is down 
and 60 seconds if link is up.
2. Handle netif_carrier_{on,of} and check for media changes in 
proper way by using mii_check_media() in the mii case.
3. Handle netif_carrier_{on,of} also if media is forced to 10baseT/100baseTx.
4. Use ethtool_op_get_link instead of vortex_get_link. 
So it is possible to test with ethtool.

The patch compiles, but as I told it is fairly untested. 

Please let me know the results of your tests.
Thanks in advance,

Steffen

Signed-off-by: Steffen Klassert [EMAIL PROTECTED]

--- vanilla-2.6.15/drivers/net/3c59x.c  2006-01-03 04:21:10.0 +0100
+++ linux-2.6.15-sk2/drivers/net/3c59x.c2006-01-13 16:15:28.0 
+0100
@@ -789,7 +789,7 @@
int options;/* 
User-settable misc. driver options. */
unsigned int media_override:4,  /* Passed-in media type. */
default_media:4,/* Read from 
the EEPROM/Wn3_Config. */
-   full_duplex:1, force_fd:1, autoselect:1,
+   full_duplex:1, autoselect:1,
bus_master:1,   /* Vortex can 
only do a fragment bus-m. */
full_bus_master_tx:1, full_bus_master_rx:2, /* Boomerang  */
flow_ctrl:1,/* Use 802.3x 
flow control (PAUSE only) */
@@ -1326,7 +1326,7 @@
vp-enable_wol = 1;
}
 
-   vp-force_fd = vp-full_duplex;
+   vp-mii.force_media = vp-full_duplex;
vp-options = option;
/* Read the station address from the EEPROM. */
EL3WINDOW(0);
@@ -1616,6 +1616,48 @@
 }
 
 static void
+vortex_set_duplex(struct net_device *dev)
+{
+   int mii_reg1, mii_reg5;
+   struct vortex_private *vp = netdev_priv(dev);
+   long *ioaddr = vp-ioaddr;
+   
+   EL3WINDOW(4);
+   mii_reg1 = mdio_read(dev, vp-phys[0], 1);
+   mii_reg5 = mdio_read(dev, vp-phys[0], 5);
+   vp-partner_flow_ctrl = ((mii_reg5  0x0400) != 0);
+   EL3WINDOW(3);
+   if (vortex_debug  1)
+   printk(KERN_INFO %s: MII #%d status %4.4x, link partner 
capability %4.4x,
+   info1 %04x, setting %s-duplex.\n,
+   dev-name, vp-phys[0],
+   mii_reg1, mii_reg5,
+   vp-info1, ((vp-info1  0x8000) || 
vp-full_duplex) ? full : half);
+   /* Set the full-duplex bit. */
+   iowrite16(  ((vp-info1  0x8000) || vp-full_duplex ? 0x20 : 0) |
+   (vp-large_frames ? 0x40 : 0) |
+   ((vp-full_duplex  vp-flow_ctrl  
vp-partner_flow_ctrl) ? 0x100 : 0),
+   ioaddr + Wn3_MAC_Ctrl);
+   /* AKPM: bug: should reset Tx and Rx after setting Duplex.  Page 180 */
+}
+
+static void
+vortex_check_media(struct net_device *dev, unsigned int init)
+{
+   struct vortex_private *vp = netdev_priv(dev);
+   unsigned int ok_to_print = 0;
+
+   if (vortex_debug  3)
+   ok_to_print = 1;
+   
+   if (mii_check_media(vp-mii, ok_to_print, init)) {
+   vp-full_duplex = vp-mii.full_duplex;
+   vortex_set_duplex(dev);
+   } else if (init) 
+   vortex_set_duplex(dev);
+}
+
+static void
 vortex_up(struct net_device *dev)
 {
struct vortex_private *vp = netdev_priv(dev);
@@ -1675,55 +1717,15 @@
printk(KERN_DEBUG %s: Initial media type %s.\n,
   dev-name, media_tbl[dev-if_port].name);
 
-   vp-full_duplex = vp-force_fd;
+   vp-full_duplex = vp-mii.force_media;
config = BFINS(config, dev-if_port, 20, 4);
if (vortex_debug  6)
printk(KERN_DEBUG vortex_up(): writing 0x%x to 
InternalConfig\n, config);
iowrite32(config, ioaddr + Wn3_Config);
 
-   if (dev-if_port == XCVR_MII || dev-if_port == XCVR_NWAY) {
-   int mii_reg1, mii_reg5;
-   EL3WINDOW(4);
-   /* Read BMSR (reg1) only to clear old status. */
-   mii_reg1 = mdio_read(dev, vp-phys[0], MII_BMSR);
-   mii_reg5 = mdio_read(dev, vp-phys[0], MII_LPA);
-   if (mii_reg5 == 0x  ||  mii_reg5 == 0x) {
-   netif_carrier_off(dev); /* No MII device or no link 
partner report */
-   } else {
-   mii_reg5 = vp-advertising;
-   if ((mii_reg5  0x0100) != 0/* 100baseTx-FD */
-|| (mii_reg5  0x00C0) == 0x0040) /* 10T-FD, 
but not 100-HD */
-   vp-full_duplex

[PATCH] 3c59x: collision statistic fix

2006-01-14 Thread Steffen Klassert
Count the total number of packets with collisions during transmission
in vp-stats.collisions.

Signed-off-by: Steffen Klassert [EMAIL PROTECTED]

--- vanilla-2.6.15/drivers/net/3c59x.c  2006-01-03 04:21:10.0 +0100
+++ linux-2.6.15-sk/drivers/net/3c59x.c 2006-01-14 17:54:16.0 +0100
@@ -754,7 +754,9 @@
 
 struct vortex_extra_stats {
 unsigned long tx_deferred;
+   unsigned long tx_max_collisions;
 unsigned long tx_multiple_collisions;
+unsigned long tx_single_collisions;
 unsigned long rx_bad_ssd;
 };
 
@@ -863,12 +865,14 @@
const char str[ETH_GSTRING_LEN];
 } ethtool_stats_keys[] = {
{ tx_deferred },
+   { tx_max_collisions },
{ tx_multiple_collisions },
+   { tx_single_collisions },
{ rx_bad_ssd },
 };
 
 /* number of ETHTOOL_GSTATS u64's */
-#define VORTEX_NUM_STATS 3
+#define VORTEX_NUM_STATS5 
 
 static int vortex_probe1(struct device *gendev, void __iomem *ioaddr, int irq,
   int chip_idx, int card_idx);
@@ -2108,9 +2112,12 @@
iowrite8(0, ioaddr + TxStatus);
if (tx_status  0x30) { /* txJabber or 
txUnderrun */
do_tx_reset = 1;
-   } else if ((tx_status  0x08)  (vp-drv_flags  
MAX_COLLISION_RESET)) {   /* maxCollisions */
-   do_tx_reset = 1;
-   reset_mask = 0x0108;/* Reset interface 
logic, but not download logic */
+   } else if (tx_status  0x08) {  /* maxCollisions */
+   vp-xstats.tx_max_collisions++;
+   if (vp-drv_flags  MAX_COLLISION_RESET) {
+   do_tx_reset = 1;
+   reset_mask = 0x0108;/* Reset 
interface logic, but not download logic */
+   }
} else {/* 
Merely re-enable the transmitter. */
iowrite16(TxEnable, ioaddr + EL3_CMD);
}
@@ -2926,7 +2933,6 @@
EL3WINDOW(6);
vp-stats.tx_carrier_errors += ioread8(ioaddr + 0);
vp-stats.tx_heartbeat_errors   += ioread8(ioaddr + 1);
-   vp-stats.collisions+= ioread8(ioaddr + 3);
vp-stats.tx_window_errors  += ioread8(ioaddr + 4);
vp-stats.rx_fifo_errors+= ioread8(ioaddr + 5);
vp-stats.tx_packets+= ioread8(ioaddr + 6);
@@ -2939,10 +2945,15 @@
vp-stats.tx_bytes  += ioread16(ioaddr + 12);
/* Extra stats for get_ethtool_stats() */
vp-xstats.tx_multiple_collisions   += ioread8(ioaddr + 2);
+   vp-xstats.tx_single_collisions += ioread8(ioaddr + 3);
vp-xstats.tx_deferred  += ioread8(ioaddr + 8);
EL3WINDOW(4);
vp-xstats.rx_bad_ssd   += ioread8(ioaddr + 12);
 
+   vp-stats.collisions = vp-xstats.tx_multiple_collisions
+   + vp-xstats.tx_single_collisions
+   + vp-xstats.tx_max_collisions;
+
{
u8 up = ioread8(ioaddr + 13);
vp-stats.rx_bytes += (up  0x0f)  16;
@@ -3036,8 +3047,10 @@
spin_unlock_irqrestore(vp-lock, flags);
 
data[0] = vp-xstats.tx_deferred;
-   data[1] = vp-xstats.tx_multiple_collisions;
-   data[2] = vp-xstats.rx_bad_ssd;
+   data[1] = vp-xstats.tx_max_collisions;
+   data[2] = vp-xstats.tx_multiple_collisions;
+   data[3] = vp-xstats.tx_single_collisions;
+   data[4] = vp-xstats.rx_bad_ssd;
 }
 
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] 3c59x: improve usage of netif_carrier_{on,off}

2006-01-16 Thread Steffen Klassert
On Mon, Jan 16, 2006 at 02:43:30PM -0500, Dan Williams wrote:
...
 The patch appears to work correctly and does notice links quite a bit
 sooner.  The only issue I noticed was that if no cable is plugged in, it
 starts off with the carrier on (/sys/class/net/eth0/carrier == 1) but
 a second later switches the carrier off.  How do I track that down?
 
 Dan

The patch should not affect the set up of the NIC that early.
vortex_up() calls netif_carrier_off() before the patch interfere.
Perhaps you can see this behavior before the old status 
is cleared by reading MII_BMSR register.  

How is it without the patch?

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH REPOST] mii: check carrier state even when force_media == 1

2006-02-09 Thread Steffen Klassert
On Thu, Feb 09, 2006 at 05:19:45PM -0500, John W. Linville wrote:
 Drivers using mii_check_media (via-rhine in particular) and also
 forcing link parameters with ethtool can reach a state where the link
 goes down and never comes back up.  This is because mii_check_media
 short-circuits early if mii-force_media != 0.  This was discussed
 in a couple of past threads, one of which is available here:
 
   http://www.ussg.iu.edu/hypermail/linux/kernel/0508.3/0670.html
 
 The patch moves the force_media check to below the carrier status
 check.This allows the link state to show correctly, while avoiding
 the check of link parameters.
 

I sent almost the same patch because of the same reasons about a year ago,
see http://oss.sgi.com/projects/netdev/archive/2005-02/msg00648.html
so I would vote for this patch too.

Acked-by: Steffen Klassert [EMAIL PROTECTED]
 Signed-off-by: John W. Linville [EMAIL PROTECTED]
 ---
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] 3c59x: fix networking for 10base2 NICs

2006-03-30 Thread Steffen Klassert

Fix broken networking for older 10base2 NICs.

Signed-off-by: Steffen Klassert [EMAIL PROTECTED]

--- linux-2.6.16-git12/drivers/net/3c59x.c  2006-03-30 14:16:23.0 
+0200
+++ linux-2.6.16-git12-sk/drivers/net/3c59x.c   2006-03-30 15:27:13.0 
+0200
@@ -788,7 +788,7 @@
int options;/* 
User-settable misc. driver options. */
unsigned int media_override:4,  /* Passed-in media type. */
default_media:4,/* Read from 
the EEPROM/Wn3_Config. */
-   full_duplex:1, force_fd:1, autoselect:1,
+   full_duplex:1, autoselect:1,
bus_master:1,   /* Vortex can 
only do a fragment bus-m. */
full_bus_master_tx:1, full_bus_master_rx:2, /* Boomerang  */
flow_ctrl:1,/* Use 802.3x 
flow control (PAUSE only) */
@@ -1633,12 +1633,6 @@
((vp-full_duplex  vp-flow_ctrl  
vp-partner_flow_ctrl) ?
0x100 : 0),
ioaddr + Wn3_MAC_Ctrl);
-
-   issue_and_wait(dev, TxReset);
-   /*
-* Don't reset the PHY - that upsets autonegotiation during DHCP 
operations.
-*/
-   issue_and_wait(dev, RxReset|0x04);
 }
 
 static void vortex_check_media(struct net_device *dev, unsigned int init)
@@ -1663,7 +1657,7 @@
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr = vp-ioaddr;
unsigned int config;
-   int i;
+   int i, mii_reg1, mii_reg5;
 
if (VORTEX_PCI(vp)) {
pci_set_power_state(VORTEX_PCI(vp), PCI_D0);/* Go active */
@@ -1723,14 +1717,23 @@
printk(KERN_DEBUG vortex_up(): writing 0x%x to 
InternalConfig\n, config);
iowrite32(config, ioaddr + Wn3_Config);
 
-   netif_carrier_off(dev);
if (dev-if_port == XCVR_MII || dev-if_port == XCVR_NWAY) {
EL3WINDOW(4);
+   mii_reg1 = mdio_read(dev, vp-phys[0], MII_BMSR);
+   mii_reg5 = mdio_read(dev, vp-phys[0], MII_LPA);
+   vp-partner_flow_ctrl = ((mii_reg5  0x0400) != 0);
+
vortex_check_media(dev, 1);
}
else
vortex_set_duplex(dev);
 
+   issue_and_wait(dev, TxReset);
+   /*
+* Don't reset the PHY - that upsets autonegotiation during DHCP 
operations.
+*/
+   issue_and_wait(dev, RxReset|0x04);
+
 
iowrite16(SetStatusEnb | 0x00, ioaddr + EL3_CMD);
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NETDEV WATCHDOG: eth2: transmit timed out with 3c905C-TX

2006-04-05 Thread Steffen Klassert
On Wed, Apr 05, 2006 at 06:33:18PM +0200, Marco Berizzi wrote:
 Hello everybody.
 I'm getting these errors (with packet/connectivity loss) on
 our firewall after I have plugged in a 3c905C nic. Linux is
 Slackware 10.2 with vanilla 2.6.16.1.
 
 Hints?
 
 PS: I have temporary resolved the problem running 'ifconfig
 eth2 down' and 'ifconfig eth2 up'
 
 Apr  5 17:47:07 Teti kernel: eth2: Resetting the Tx ring pointer.
 Apr  5 17:47:47 Teti last message repeated 4 times
 Apr  5 17:48:57 Teti last message repeated 7 times
 Apr  5 17:49:57 Teti last message repeated 6 times
 Apr  5 17:50:57 Teti last message repeated 6 times
 
 Apr  5 17:47:07 Teti kernel: NETDEV WATCHDOG: eth2: transmit timed out

There were some problems of this kind with 10base2 networks in 2.6.16.
Could you please try whether 2.6.17-rc1 has this problems too?

Steffen 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 3c59x: fix duplex configuration

2007-08-09 Thread Steffen Klassert
A special sequence of ifconfig up/down and plug/unplug the cable
can break the duplex configuration of the driver.

Setting 
vp-mii.full_duplex = vp-full_duplex 
in vortex_up should fix this.

Addresses Bug 8575 3c59x duplex configuration broken
http://bugzilla.kernel.org/show_bug.cgi?id=8575

Cc: Martin Buck [EMAIL PROTECTED]
Signed-off-by: Steffen Klassert [EMAIL PROTECTED]
---
 drivers/net/3c59x.c |1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.23-rc2.orig/drivers/net/3c59x.c
+++ linux-2.6.23-rc2/drivers/net/3c59x.c
@@ -1555,6 +1555,7 @@ vortex_up(struct net_device *dev)
mii_reg1 = mdio_read(dev, vp-phys[0], MII_BMSR);
mii_reg5 = mdio_read(dev, vp-phys[0], MII_LPA);
vp-partner_flow_ctrl = ((mii_reg5  0x0400) != 0);
+   vp-mii.full_duplex = vp-full_duplex;
 
vortex_check_media(dev, 1);
}
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Add 3c59x maintainer

2007-08-10 Thread Steffen Klassert
Add 3c59x maintainer.

Signed-off-by: Steffen Klassert [EMAIL PROTECTED]
---
 MAINTAINERS |6 ++
 1 file changed, 6 insertions(+)

--- linux-2.6.23-rc2.orig/MAINTAINERS
+++ linux-2.6.23-rc2/MAINTAINERS
@@ -97,6 +97,12 @@ M:   [EMAIL PROTECTED]
 L: netdev@vger.kernel.org
 S: Maintained
 
+3C59X NETWORK DRIVER
+P: Steffen Klassert
+M: [EMAIL PROTECTED]
+L: netdev@vger.kernel.org
+S: Maintained
+
 3CR990 NETWORK DRIVER
 P: David Dillow
 M: [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 08/18] 3c59x: check return of pci_enable_device()

2007-08-15 Thread Steffen Klassert
On Wed, Aug 15, 2007 at 06:30:00PM +0200, Steffen Klassert wrote:
 On Tue, Aug 14, 2007 at 10:54:32AM +0100, Mark Hindley wrote:
  On Tue, Aug 14, 2007 at 01:33:26AM -0400, Jeff Garzik wrote:
   I would strongly prefer that vortex_up return a value, since all the 
   important callers of this function can themselves return an error back 
   to the system.
   
   we can definitely return a meaningful return value here, if 
   pci_enable_device() fails, and I would rather not apply a patch that 
   fails to propagate a serious condition (pci_enable_device failure is 
   indeed serious) when it is possible to do so
   
  
  OK. Any comments on this revised version? I have only ignored the return of
  vortex_up in vortex_error. It is not immediately clear what to do if
  vortex_up still fails there after a pci reset.
  
  Mark
  
  
  diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
  index 001c66d..a1dfd6b 100644
  --- a/drivers/net/3c59x.c
  +++ b/drivers/net/3c59x.c
  @@ -705,7 +705,7 @@ static struct {
   
   static int vortex_probe1(struct device *gendev, void __iomem *ioaddr, int 
  irq,
 int chip_idx, int card_idx);
  -static void vortex_up(struct net_device *dev);
  +static int vortex_up(struct net_device *dev);
   static void vortex_down(struct net_device *dev, int final);
   static int vortex_open(struct net_device *dev);
   static void mdio_sync(void __iomem *ioaddr, int bits);
  @@ -841,8 +841,11 @@ static int vortex_resume(struct pci_dev *pdev)
  return -EBUSY;
  }
  if (netif_running(dev)) {
  -   vortex_up(dev);
  -   netif_device_attach(dev);
  +   err = vortex_up(dev);
  +   if (err)
  +   return err;
  +   else  
  +   netif_device_attach(dev);
  }
  }
  return 0;
 
 I think we should free the requested irq if vortex_up really fails here.
 

I was wrong, this will be done in vortex_close. So it is OK as it is.

Steffen 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [REVISED PATCH] 3c59x: check return of pci_enable_device()

2007-08-31 Thread Steffen Klassert
On Fri, Aug 31, 2007 at 09:08:37AM -0400, Jeff Garzik wrote:
 Mark Hindley wrote:
 Revised patch for this.
 
 Mark
 
 
 commit 5cf33391eba81a49038fa8be8cbad8425b80bf7f
 Author: Mark Hindley [EMAIL PROTECTED]
 Date:   Thu Aug 16 11:26:35 2007 +0100
 
 Check return of pci_enable_device in vortex_up().
 
 Also modify vortex_up to return error to callers. Handle failure of 
 vortex_up in
 vortex_open and vortex_resume.
 
 Signed-off-by: Mark Hindley [EMAIL PROTECTED]
 
 Steffen, did you ACK this?  and/or going to resend it to me?
 

Yes, I did. Andrew has it already in -mm, 
I guess that he will resend it to you soon.

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -mm 2/2] 3c59x MAINTAINERS

2007-09-04 Thread Steffen Klassert
On Tue, Sep 04, 2007 at 03:52:50AM +0530, Satyam Sharma wrote:
 Remove duplicate entry for the same driver.

This is -mm specific. Andrew did not remove the add-3c59x-maintainer
patch after pushing it to mainline. This can be fixed just by removing
the add-3c59x-maintainer patch from -mm. 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug

2007-09-04 Thread Steffen Klassert
On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:
 
 drivers/net/3c59x.c: In function 'vortex_up':
 drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this 
 function

This came in with the recently applied 3c59x-check-return-of-pci_enable_device 
patch
from Mark Hindley. I just compiled it on a PCI only machine so far, therefore I 
did
not notice the warning yet.

 
 is a genuine bug. The function returns an uninitialized value of 'err'
 back to the caller, which expects it to be 0 for success cases. Let's
 fix this by explicitly initializing 'err' to zero.
 
 Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Acked-by: Steffen Klassert [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug

2007-09-04 Thread Steffen Klassert
On Tue, Sep 04, 2007 at 09:53:31AM +0100, Mark Hindley wrote:
 On Tue, Sep 04, 2007 at 02:09:47PM +0530, Satyam Sharma wrote:
  Hi Steffen,
  
  
  On Tue, 4 Sep 2007, Steffen Klassert wrote:
  
   On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:

drivers/net/3c59x.c: In function 'vortex_up':
drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in 
this function
   
   This came in with the recently applied 
   3c59x-check-return-of-pci_enable_device patch
   from Mark Hindley. I just compiled it on a PCI only machine so far, 
   therefore I did
   not notice the warning yet.
  
  Hmm, the .config I built with had PCI=y as well. Probably a compiler
  version difference -- Jeff also mentioned yesterday that some newer
  GCC versions fail to warn about uninitialized variables cases.
  
 
 Sorry, this is my bad. I have just checked: there is no warning with gcc
 4.2 or 4.1, but 3.3 emits the warning. 
 

The only warning that I was able to trigger with gcc 4.2 is in the case of a 
.config
without PCI support. In this case I get

drivers/net/3c59x.c: In function 'vortex_up':
drivers/net/3c59x.c:1672: warning: 'err' is used uninitialized in this function
 ^^
So we have to be more carefull with the newer gcc versions.

Steffen

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug

2007-09-04 Thread Steffen Klassert
On Tue, Sep 04, 2007 at 10:35:10AM +0100, Mark Hindley wrote:
 On Tue, Sep 04, 2007 at 11:17:57AM +0200, Steffen Klassert wrote:
  
  The only warning that I was able to trigger with gcc 4.2 is in the case of 
  a .config
  without PCI support. In this case I get
  
  drivers/net/3c59x.c: In function 'vortex_up':
  drivers/net/3c59x.c:1672: warning: 'err' is used uninitialized in this 
  function
 
 That should be fixed by Satyam's patch too.

Yes, of course.
This was just to point out what's  possible to trigger with a newer gcc and 
what's not.

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] 3c59x: sparse warning fix

2007-09-13 Thread Steffen Klassert
On Wed, Sep 05, 2007 at 03:23:59PM +0100, Stephen Hemminger wrote:
 --- a/drivers/net/3c59x.c 2007-09-05 15:15:16.0 +0100
 +++ b/drivers/net/3c59x.c 2007-09-05 15:16:29.0 +0100
 @@ -1122,7 +1122,7 @@ static int __devinit vortex_probe1(struc
  + sizeof(struct boom_tx_desc) * 
 TX_RING_SIZE,
  vp-rx_ring_dma);
   retval = -ENOMEM;
 - if (vp-rx_ring == 0)
 + if (!vp-rx_ring)
   goto free_region;
  
   vp-tx_ring = (struct boom_tx_desc *)(vp-rx_ring + RX_RING_SIZE);
 @@ -2476,7 +2476,8 @@ boomerang_rx(struct net_device *dev)
  
   /* Check if the packet is long enough to just accept 
 without
  copying to a properly sized skbuff. */
 - if (pkt_len  rx_copybreak  (skb = 
 dev_alloc_skb(pkt_len + 2)) != 0) {
 + if (pkt_len  rx_copybreak 
 + (skb = dev_alloc_skb(pkt_len + 2)) ) {
   skb_reserve(skb, 2);/* Align IP on 16 byte 
 boundaries */
   pci_dma_sync_single_for_cpu(VORTEX_PCI(vp), 
 dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
   /* 'skb_put()' points to the start of sk_buff 
 data area. */
 -

Thanks,

Acked-by: Steffen Klassert [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problems with 3c59x driver (extremely low throughput)

2007-09-18 Thread Steffen Klassert
On Tue, Sep 18, 2007 at 01:32:00PM +0200, Mikael Leivisk? wrote:
 OK first of all I'm not really sure if this is the place but was told by
  Dave Dillow to send to the netdev kernel mailing list. So that's what
 I'm doing :)
 
 I tried searching the mailing lists and didn't find anything that seemed
 relevant, it is possible I missed something the archives are huge...
 
 Following is the mail I sent originally:
 
 I've run into something of a strange problem lately, I have a network
 card that runs on the Vortex/Boomerang driver. I ran the 2.6.16 kernel
 without any issues, and later switched to 2.6.22 because I couldn't get
 my IPSEC tunnel working with 2.6.16 for some reason (didn't find the
 right switches in menuconfig). And everything was seemingly working fine
 but soon after my ISP got some problems and they said they had to
 replace some hardware somewhere along my route and now when I access the
 internet (the machine is setup as router/web-file-mail-server with this
 card as the external interface) I get dreadfully slow speeds, I'll be
 lucky to get 10KB/s on a 10MB/s link. 

Does this mean that the driver worked well with 2.6.22 befor the hardware
changes of your ISP? What did they change?

 But the funny thing is if I switch
 to the 2.6.16 kernel, WHAM I get 700KB/s no sweat... The kernel
 configurations are otherwise almost exactly the same bar tickless and
 some switches in network options to get ipsec... The only thing I do is
 change the kernel and things work...
 
 So my questions are,
 Has there been any recent changes to the drivers for the card lately?
 (between .16 and .22)

There were some changes, your problem could be related to one of them. 

 Or any changes in the network infrastructure that could be the cause of it?
 Or is there some other switch in the config that could be the cause of this?
 Or is it my ISP's hardware that's not playing nice?
 Or any other ideas as to what might be the cause.
 
 I've posted the respective kernel configs, and lspci output and my
 cpuinfo on http://www.zero-kelvin.org/pub/kern/
 
 If there is any other info that you need I'll gladly provide it assuming
 I can get my hands on it   :)

There were some duplex related fixes on 3c59x recently,
could you please try whether your problem persist in 2.6.23-rc6?

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bugzilla: open bug reports

2007-10-05 Thread Steffen Klassert
On Thu, Oct 04, 2007 at 09:44:26AM -0700, Stephen Hemminger wrote:
 Bugzilla report of open bugs. Yes you could run it yourself but
 many of these bugs seem to be old and need some attention or work
 to get resolved.
 

Perhaps we should at least ask the reporters of the older bugs 
whether the bugs are still present in newer kernel versions to get
an actual state.

I will do this for the 3c59x related bugs after the release of 2.6.23.

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vortex_up should initialize err

2007-10-18 Thread Steffen Klassert
On Wed, Oct 17, 2007 at 08:28:36PM -0400, Jeff Garzik wrote:
 Badari Pulavarty wrote:
 Simple compile warning fix. (against 2.6.23-git12)
 
 Thanks,
 Badari
 
 vortex_up() should initialize 'err' for a successful return.
 
 drivers/net/3c59x.c: In function `vortex_up':
 drivers/net/3c59x.c:1494: warning: `err' might be used uninitialized in 
 this function
 
 applied as an obvious bug fix (Steffen added to CC)
 

Hm, we had already the 3c59x-fix-uninitialized-variable-bug.patch from
Satyam Sharma in -mm to fix this, but the patch was removed from -mm
some time ago. Andrew, what happened to this one?

However, this issue should be fixed.

Thanks,

Steffen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Steffen Klassert
On Thu, Apr 23, 2015 at 11:26:20AM +0800, Herbert Xu wrote:
 Hi:
 
 It looks like our IPsec implementations of CCM and GCM are buggy
 in that they don't include the IV in the authentication calculation.

Seems like crypto_rfc4106_crypt() passes the associated data it
got from ESP directly to gcm, without chaining with the IV.

 
 This definitely breaks interoperability with anyone who implements
 them correctly.  The fact that there have been no reports on this
 probably means that nobody has run into this in the field yet.
 
 On the security side, this is probably not a big deal for CCM
 because it always verifies the authentication tag after decryption.
 But for GCM this may be a DoS issue as an attacker could modify
 the IV without triggering the authentication check and thus cause
 an unnecessary decryption.  For both CCM and GCM this will result
 in random data injected as a packet into the network stack which
 hopefully will be dropped.
 
 In order to fix this without breaking backwards compatibility,
 my plan is to introduce new templates such as rfc4106v2 which
 implement the RFC correctly.  The existing templates will be
 retained so that current users aren't broken by the fix.

Adding a second template for the correct implementation is
probaply the only thing that we can do if we don't want to
break backwards compatibility. But maybe we can add a warning
to the old implementation, such that users notice that they
use a broken version.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm: fix a race in xfrm_state_lookup_byspi

2015-04-30 Thread Steffen Klassert
On Wed, Apr 29, 2015 at 05:25:25AM +, Du, Fan wrote:
 
 -Original Message-
 From: roy.qing...@gmail.com [mailto:roy.qing...@gmail.com]
 Sent: Wednesday, April 29, 2015 8:43 AM
 To: netdev@vger.kernel.org
 Cc: Du, Fan; steffen.klass...@secunet.com
 Subject: [PATCH] xfrm: fix a race in xfrm_state_lookup_byspi
 
 From: Li RongQing roy.qing...@gmail.com
 
 The returned xfrm_state should be hold before unlock xfrm_state_lock,
 otherwise the returned xfrm_state maybe be released.
 
 Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..]
 Cc: Fan Du fan...@intel.com
 Signed-off-by: Li RongQing roy.qing...@gmail.com
 
 Acked-by: Fan Du fan...@intel.com

Applied to the ipsec tree, thanks a lot!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm: fix the return code when xfrm_*_register_afinfo failed

2015-04-27 Thread Steffen Klassert
On Thu, Apr 23, 2015 at 11:06:53AM +0800, roy.qing...@gmail.com wrote:
 From: Li RongQing roy.qing...@gmail.com
 
 If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo-family] had the
 value, return the -EEXIST, not -ENOBUFS
 
 Signed-off-by: Li RongQing roy.qing...@gmail.com

Also applied to ipsec-next, thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][net-next] xfrm: slightly optimise xfrm_input

2015-04-27 Thread Steffen Klassert
On Fri, Apr 24, 2015 at 04:49:31PM +0800, roy.qing...@gmail.com wrote:
 From: Li RongQing roy.qing...@gmail.com
 
 Check x-km.state with XFRM_STATE_ACQ only when state is not
 XFRM_STAT_VALID, not everytime
 
 Signed-off-by: Li RongQing roy.qing...@gmail.com

Applied to ipsec-next, thanks a lot Li!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][net-next][v2] xfrm: optimise the use of walk list header in xfrm_policy/state_walk

2015-04-27 Thread Steffen Klassert
On Wed, Apr 22, 2015 at 05:13:18PM +0800, Herbert Xu wrote:
 On Wed, Apr 22, 2015 at 05:09:54PM +0800, roy.qing...@gmail.com wrote:
  From: Li RongQing roy.qing...@gmail.com
  
  The walk from input is the list header, and marked as dead, and will
  be skipped in loop.
  
  list_first_entry() can be used to return the true usable value from
  walk if walk is not empty
  
  Signed-off-by: Li RongQing roy.qing...@gmail.com
 
 Acked-by: Herbert Xu herb...@gondor.apana.org.au

Applied to ipsec-next, thanks everyone!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm: remove the xfrm_queue_purge definition

2015-04-27 Thread Steffen Klassert
On Wed, Apr 22, 2015 at 03:51:16PM +0800, roy.qing...@gmail.com wrote:
 From: Li RongQing roy.qing...@gmail.com
 
 The task of xfrm_queue_purge is same as skb_queue_purge, so remove it
 
 Signed-off-by: Li RongQing roy.qing...@gmail.com

Applied to ipsec-next, thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3 PATCH 0/8] crypto: Convert all AEAD users to new interface

2015-05-27 Thread Steffen Klassert
On Wed, May 27, 2015 at 04:01:05PM +0800, Herbert Xu wrote:
 Hi:
 
 The only changes from the last version are that set_ad no longer
 takes a cryptoff argument and testmgr has been updated to always
 supply space for the authentication tag.
 
 The algif_aead patch has been removed and will be posted separately.
 
 Series description:
 
 This series of patches convert all in-tree AEAD users that I
 could find to the new single SG list interface.  For IPsec it
 also adopts the new explicit IV generator scheme.
 
 To recap, the old AEAD interface takes an associated data (AD)
 SG list in addition to the plain/cipher text SG list(s).  That
 forces the underlying AEAD algorithm implementors to try to stitch
 those two lists together where possible in order to maximise the
 contiguous chunk of memory passed to the ICV/hash function.  Things
 get even more hairy for IPsec as it has a third piece of memory,
 the generated IV (giv) that needs to be hashed.  One look at the
 nasty things authenc does for example is enough to make anyone
 puke :)
 
 In fact the interface is just getting in our way because for the
 main user IPsec the data is naturally contiguous as the protocol
 was designed with this in mind.
 
 So the new AEAD interface gets rid of the separate AD SG list
 and instead simply requires the AD to be at the head of the src
 and dst SG lists.
 
 The conversion of in-tree users is fairly straightforward.  The
 only non-trivial bit is IPsec as I'm taking this opportunity to
 move the IV generation knowledge into IPsec as that's where it
 belongs since we may in future wish to support different generation
 schemes for a single algorithm.

Not sure if I missed something in the flood of patches, but if I
apply your v3 patchset on top of the cryptodev tree, it crashes
like that buring boot:

[4.668297] [ cut here ]
[4.669143] kernel BUG at 
/home/klassert/git/linux-stk/include/linux/scatterlist.h:67!
[4.670457] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
[4.671595] CPU: 0 PID: 1363 Comm: cryptomgr_test Not tainted 4.0.0+ #951
[4.672025] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Bochs 01/01/2011
[4.672025] task: ce9e7300 ti: ceb54000 task.ti: ceb54000
[4.672025] EIP: 0060:[c11d45b5] EFLAGS: 00010206 CPU: 0
[4.672025] EIP is at scatterwalk_ffwd+0xf5/0x100
[4.672025] EAX: ceb43b20 EBX: ceb55c94 ECX: 0014 EDX: c11db23f
[4.672025] ESI: 0010 EDI: 0003 EBP: ceb55c7c ESP: ceb55c6c
[4.672025]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[4.672025] CR0: 8005003b CR2: bfbb6fc0 CR3: 0eb26000 CR4: 06d0
[4.672025] Stack:
[4.672025]  cffd28c0 0014 ceb35400 cea33618 ceb55cd0 c11d45e8 ceb43b20 

[4.672025]  ceb35438 c11db220 ceb55c9c c11db23f ceb55cac c11da470 ceb35438 
ceb353c8
[4.672025]  ceb55cb4 c11da763 ceb55cd0 c11f2c6f ceb35400 0200 ceb35358 
ceb353c8
[4.672025] Call Trace:
[4.672025]  [c11d45e8] scatterwalk_map_and_copy+0x28/0xc0
[4.672025]  [c11db220] ? shash_ahash_finup+0x80/0x80
[4.672025]  [c11db23f] ? shash_async_finup+0x1f/0x30
[4.672025]  [c11da470] ? crypto_ahash_op+0x20/0x50
[4.672025]  [c11da763] ? crypto_ahash_finup+0x13/0x20
[4.672025]  [c11f2c6f] ? crypto_authenc_ahash_fb+0xaf/0xd0
[4.672025]  [c11f2dfc] crypto_authenc_genicv+0xfc/0x340
[4.672025]  [c11f3526] crypto_authenc_encrypt+0x96/0xb0
[4.672025]  [c11f3490] ? crypto_authenc_decrypt+0x3e0/0x3e0
[4.672025]  [c11d4eb7] old_crypt+0xa7/0xc0
[4.672025]  [c11d4f09] old_encrypt+0x19/0x20
[4.672025]  [c11ddbe8] __test_aead+0x268/0x1580
[4.672025]  [c11d28a7] ? __crypto_alloc_tfm+0x37/0x120
[4.672025]  [c11d28a7] ? __crypto_alloc_tfm+0x37/0x120
[4.672025]  [c11d7742] ? skcipher_geniv_init+0x22/0x40
[4.672025]  [c11d7d73] ? eseqiv_init+0x43/0x50
[4.672025]  [c11d2936] ? __crypto_alloc_tfm+0xc6/0x120
[4.672025]  [c11df101] test_aead+0x31/0xc0
[4.672025]  [c11df1d3] alg_test_aead+0x43/0xa0
[4.672025]  [c11def2e] ? alg_find_test+0x2e/0x70
[4.672025]  [c11dfe42] alg_test+0xa2/0x240
[4.672025]  [c106dd83] ? finish_task_switch+0x83/0xe0
[4.672025]  [c159c002] ? __schedule+0x412/0x1067
[4.672025]  [c1085f57] ? __wake_up_common+0x47/0x70
[4.672025]  [c11dbc10] ? cryptomgr_notify+0x450/0x450
[4.672025]  [c11dbc4f] cryptomgr_test+0x3f/0x50
[4.672025]  [c1066dfb] kthread+0xab/0xc0
[4.672025]  [c15a1a41] ret_from_kernel_thread+0x21/0x30
[4.672025]  [c1066d50] ? __kthread_parkme+0x80/0x80
[4.672025] Code: 83 c4 04 5b 5e 5f 5d c3 81 3b 21 43 65 87 75 13 8b 43 04 
83 e0 fe 83 c8 02 89 43 04 89 d8 e9 4d ff ff ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 
0f 0b 8d b4 26 00 00 00 00 55 89 e5 57 56 53 83 ec 40 3e
[4.672025] EIP: [c11d45b5] scatterwalk_ffwd+0xf5/0x100 SS:ESP 
0068:ceb55c6c
[4.721562] ---[ end trace 94a02f0816fe7c7f ]---

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a 

Re: Looking for a lost patch

2015-05-27 Thread Steffen Klassert
On Thu, May 21, 2015 at 05:25:24PM -0400, David Miller wrote:
 From: Steffen Klassert steffen.klass...@secunet.com
 Date: Wed, 20 May 2015 08:32:23 +0200
 
  On Tue, May 19, 2015 at 11:32:15AM -0700, Alexander Duyck wrote:
  On 05/19/2015 12:57 AM, Steffen Klassert wrote:
  The MTU should be 1500. All the IPsec overhead is handled by PMTU
  discovery, just like in the case we use IPsec without vti tunnels.
  The IPv6 side of vti does it like that.
  
  The problem is the PMTU isn't communicated to things that make use
  of the tunnel.  For example if I do a ping -s 2000 x.x.x.x across
  an IPv6 VTI interface it will fail currently as it assumes the MTU
  is 1500 and so it is fragmenting the ping packet at sizes that won't
  be communicated across the underlying interface.
  
  Well, the problem is that the local socket is still attached on the
  skb. The socket gets an error notification if the packet is too big,
  but ping does not care much about these error notifications.
  
  One option to get such applications to work is to orphan the skb
  in the vti xmit function. Then the packet is not assumed to be
  local, so PMTU discovery is triggered on that route.
  
  Something like this should work for IPv6:
 
 When a packet traverses software layered devices, we should not orphan
 the socket.
 
 In fact, we have taken great pains to make sure this works so that the
 socket memory accounting is done correctly on the original top-level
 socket.

I have not considered this as an official patch :)
It was more to demonstrate that PMTU discovery with IPsec tunnels can
work, so we don't need to reduce the MTU of the tunnel device.

We currently check if a socket is attached to a skb and do socket
error notification in this case, otherwise we do PMTU discovery if
the packet is too big. Looks like this socket check is not sufficient
if the packet is already transmitted through a tunnel device.

I wonder if we have something to know that a packet was already
transmitted through a tunnel device. We could switch from socket
notification to PMTU discovery in this case.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3 PATCH 0/8] crypto: Convert all AEAD users to new interface

2015-05-27 Thread Steffen Klassert
On Wed, May 27, 2015 at 05:29:22PM +0800, Herbert Xu wrote:
 On Wed, May 27, 2015 at 11:25:33AM +0200, Steffen Klassert wrote:
  
  Not sure if I missed something in the flood of patches, but if I
  apply your v3 patchset on top of the cryptodev tree, it crashes
  like that buring boot:
 
 Sorry, I forgot to mention that v3 depends on the series of fixes
 posted just before it (but only to linux-crypto):
 
 https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg14487.html
 

OK, I'll try with this.

Thanks!

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ipsec PATCH 0/3] Preserve skb-mark through VTI tunnels

2015-05-27 Thread Steffen Klassert
On Wed, May 27, 2015 at 07:16:37AM -0700, Alexander Duyck wrote:
 These patches are meant to try and address the fact the VTI tunnels are
 currently overwriting the skb-mark value.  I am generally happy with the
 first two patches, however the third patch still modifies the skb-mark,
 though it undoes after the fact.
 
 The main problem I am trying to address is the fact that currently if I use
 an v6 over v6 VTI tunnel I cannot receive any traffic on the interface as
 the skb-mark is bleeding through and causing the traffic to be dropped.
 
 ---
 
 Alexander Duyck (3):
   ip_vti/ip6_vti: Do not touch skb-mark on xmit
   xfrm: Override skb-mark with tunnel-parm.i_key in xfrm_input
   ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call

All applied to the ipsec tree, thanks a lot Alexander!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] xfrm: fix a race in xfrm_state_lookup_byspi

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

The returned xfrm_state should be hold before unlock xfrm_state_lock,
otherwise the returned xfrm_state maybe be released.

Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..]
Cc: Fan Du fan...@intel.com
Signed-off-by: Li RongQing roy.qing...@gmail.com
Acked-by: Fan Du fan...@intel.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f5e39e3..96688cd 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -927,8 +927,8 @@ struct xfrm_state *xfrm_state_lookup_byspi(struct net *net, 
__be32 spi,
x-id.spi != spi)
continue;
 
-   spin_unlock_bh(net-xfrm.xfrm_state_lock);
xfrm_state_hold(x);
+   spin_unlock_bh(net-xfrm.xfrm_state_lock);
return x;
}
spin_unlock_bh(net-xfrm.xfrm_state_lock);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] esp6: Use high-order sequence number bits for IV generation

2015-05-28 Thread Steffen Klassert
From: Herbert Xu herb...@gondor.apana.org.au

I noticed we were only using the low-order bits for IV generation
when ESN is enabled.  This is very bad because it means that the
IV can repeat.  We must use the full 64 bits.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/ipv6/esp6.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 31f1b5d..7c07ce3 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -248,7 +248,8 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff 
*skb)
aead_givcrypt_set_crypt(req, sg, sg, clen, iv);
aead_givcrypt_set_assoc(req, asg, assoclen);
aead_givcrypt_set_giv(req, esph-enc_data,
- XFRM_SKB_CB(skb)-seq.output.low);
+ XFRM_SKB_CB(skb)-seq.output.low +
+ ((u64)XFRM_SKB_CB(skb)-seq.output.hi  32));
 
ESP_SKB_CB(skb)-tmp = tmp;
err = crypto_aead_givencrypt(req);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pull request (net): ipsec 2015-05-28

2015-05-28 Thread Steffen Klassert
1) Fix a race in xfrm_state_lookup_byspi, we need to take
   the refcount before we release xfrm_state_lock.
   From Li RongQing.

2) Fix IV generation on ESN state. We used just the
   low order sequence numbers for IV generation on
   ESN, as a result the IV can repeat on the same
   state. Fix this by using the  high order sequence
   number bits too and make sure to always initialize
   the high order bits with zero. These patches are
   serious stable candidates. Fixes from Herbert Xu.

3) Fix the skb-mark handling on vti. We don't
   reset skb-mark in skb_scrub_packet anymore,
   so vti must care to restore the original
   value back after it was used to lookup the
   vti policy and state. Fixes from Alexander Duyck.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 39376ccb1968ba9f83e2a880a8bf02ad5dea44e1:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf (2015-04-27 
23:12:34 -0400)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to d55c670cbc54b2270a465cdc382ce71adae45785:

  ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call (2015-05-28 06:23:32 
+0200)


Alexander Duyck (3):
  ip_vti/ip6_vti: Do not touch skb-mark on xmit
  xfrm: Override skb-mark with tunnel-parm.i_key in xfrm_input
  ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call

Herbert Xu (3):
  esp4: Use high-order sequence number bits for IV generation
  esp6: Use high-order sequence number bits for IV generation
  xfrm: Always zero high-order sequence number bits

Li RongQing (1):
  xfrm: fix a race in xfrm_state_lookup_byspi

 net/ipv4/esp4.c|  3 ++-
 net/ipv4/ip_vti.c  | 14 ++
 net/ipv6/esp6.c|  3 ++-
 net/ipv6/ip6_vti.c | 13 ++---
 net/xfrm/xfrm_input.c  | 17 -
 net/xfrm/xfrm_replay.c |  2 ++
 net/xfrm/xfrm_state.c  |  2 +-
 7 files changed, 43 insertions(+), 11 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] esp4: Use high-order sequence number bits for IV generation

2015-05-28 Thread Steffen Klassert
From: Herbert Xu herb...@gondor.apana.org.au

I noticed we were only using the low-order bits for IV generation
when ESN is enabled.  This is very bad because it means that the
IV can repeat.  We must use the full 64 bits.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/ipv4/esp4.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 421a80b..30b544f 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -256,7 +256,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff 
*skb)
aead_givcrypt_set_crypt(req, sg, sg, clen, iv);
aead_givcrypt_set_assoc(req, asg, assoclen);
aead_givcrypt_set_giv(req, esph-enc_data,
- XFRM_SKB_CB(skb)-seq.output.low);
+ XFRM_SKB_CB(skb)-seq.output.low +
+ ((u64)XFRM_SKB_CB(skb)-seq.output.hi  32));
 
ESP_SKB_CB(skb)-tmp = tmp;
err = crypto_aead_givencrypt(req);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] ip_vti/ip6_vti: Do not touch skb-mark on xmit

2015-05-28 Thread Steffen Klassert
From: Alexander Duyck alexander.h.du...@redhat.com

Instead of modifying skb-mark we can simply modify the flowi_mark that is
generated as a result of the xfrm_decode_session.  By doing this we don't
need to actually touch the skb-mark and it can be preserved as it passes
out through the tunnel.

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/ipv4/ip_vti.c  | 5 +++--
 net/ipv6/ip6_vti.c | 4 +++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 9f7269f..4c318e1 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -216,8 +216,6 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, 
struct net_device *dev)
 
memset(fl, 0, sizeof(fl));
 
-   skb-mark = be32_to_cpu(tunnel-parms.o_key);
-
switch (skb-protocol) {
case htons(ETH_P_IP):
xfrm_decode_session(skb, fl, AF_INET);
@@ -233,6 +231,9 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, 
struct net_device *dev)
return NETDEV_TX_OK;
}
 
+   /* override mark with tunnel output key */
+   fl.flowi_mark = be32_to_cpu(tunnel-parms.o_key);
+
return vti_xmit(skb, dev, fl);
 }
 
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ed9d681..104de4d 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -495,7 +495,6 @@ vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
int ret;
 
memset(fl, 0, sizeof(fl));
-   skb-mark = be32_to_cpu(t-parms.o_key);
 
switch (skb-protocol) {
case htons(ETH_P_IPV6):
@@ -516,6 +515,9 @@ vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
goto tx_err;
}
 
+   /* override mark with tunnel output key */
+   fl.flowi_mark = be32_to_cpu(t-parms.o_key);
+
ret = vti6_xmit(skb, dev, fl);
if (ret  0)
goto tx_err;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call

2015-05-28 Thread Steffen Klassert
From: Alexander Duyck alexander.h.du...@redhat.com

The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb-mark modified
after completing the function.  This resulted in the original skb-mark
value being lost.  Since we only need skb-mark to be set for
xfrm_policy_check we can pull the assignment into the rcv_cb calls and then
just restore the original mark after xfrm_policy_check has been completed.

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/ipv4/ip_vti.c  | 9 +++--
 net/ipv6/ip6_vti.c | 9 +++--
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 4c318e1..0c15208 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -65,7 +65,6 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 
spi,
goto drop;
 
XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4 = tunnel;
-   skb-mark = be32_to_cpu(tunnel-parms.i_key);
 
return xfrm_input(skb, nexthdr, spi, encap_type);
}
@@ -91,6 +90,8 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
struct pcpu_sw_netstats *tstats;
struct xfrm_state *x;
struct ip_tunnel *tunnel = XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4;
+   u32 orig_mark = skb-mark;
+   int ret;
 
if (!tunnel)
return 1;
@@ -107,7 +108,11 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
x = xfrm_input_state(skb);
family = x-inner_mode-afinfo-family;
 
-   if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family))
+   skb-mark = be32_to_cpu(tunnel-parms.i_key);
+   ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family);
+   skb-mark = orig_mark;
+
+   if (!ret)
return -EPERM;
 
skb_scrub_packet(skb, !net_eq(tunnel-net, dev_net(skb-dev)));
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 104de4d..ff3bd86 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -322,7 +322,6 @@ static int vti6_rcv(struct sk_buff *skb)
}
 
XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip6 = t;
-   skb-mark = be32_to_cpu(t-parms.i_key);
 
rcu_read_unlock();
 
@@ -342,6 +341,8 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
struct pcpu_sw_netstats *tstats;
struct xfrm_state *x;
struct ip6_tnl *t = XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip6;
+   u32 orig_mark = skb-mark;
+   int ret;
 
if (!t)
return 1;
@@ -358,7 +359,11 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
x = xfrm_input_state(skb);
family = x-inner_mode-afinfo-family;
 
-   if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family))
+   skb-mark = be32_to_cpu(t-parms.i_key);
+   ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family);
+   skb-mark = orig_mark;
+
+   if (!ret)
return -EPERM;
 
skb_scrub_packet(skb, !net_eq(t-net, dev_net(skb-dev)));
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] xfrm: Always zero high-order sequence number bits

2015-05-28 Thread Steffen Klassert
From: Herbert Xu herb...@gondor.apana.org.au

As we're now always including the high bits of the sequence number
in the IV generation process we need to ensure that they don't
contain crap.

This patch ensures that the high sequence bits are always zeroed
so that we don't leak random data into the IV.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_replay.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c
index dab57da..4fd725a 100644
--- a/net/xfrm/xfrm_replay.c
+++ b/net/xfrm/xfrm_replay.c
@@ -99,6 +99,7 @@ static int xfrm_replay_overflow(struct xfrm_state *x, struct 
sk_buff *skb)
 
if (x-type-flags  XFRM_TYPE_REPLAY_PROT) {
XFRM_SKB_CB(skb)-seq.output.low = ++x-replay.oseq;
+   XFRM_SKB_CB(skb)-seq.output.hi = 0;
if (unlikely(x-replay.oseq == 0)) {
x-replay.oseq--;
xfrm_audit_state_replay_overflow(x, skb);
@@ -177,6 +178,7 @@ static int xfrm_replay_overflow_bmp(struct xfrm_state *x, 
struct sk_buff *skb)
 
if (x-type-flags  XFRM_TYPE_REPLAY_PROT) {
XFRM_SKB_CB(skb)-seq.output.low = ++replay_esn-oseq;
+   XFRM_SKB_CB(skb)-seq.output.hi = 0;
if (unlikely(replay_esn-oseq == 0)) {
replay_esn-oseq--;
xfrm_audit_state_replay_overflow(x, skb);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] xfrm: Override skb-mark with tunnel-parm.i_key in xfrm_input

2015-05-28 Thread Steffen Klassert
From: Alexander Duyck alexander.h.du...@redhat.com

This change makes it so that if a tunnel is defined we just use the mark
from the tunnel instead of the mark from the skb header.  By doing this we
can avoid the need to set skb-mark inside of the tunnel receive functions.

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_input.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 526c4fe..b58286e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -13,6 +13,8 @@
 #include net/dst.h
 #include net/ip.h
 #include net/xfrm.h
+#include net/ip_tunnels.h
+#include net/ip6_tunnel.h
 
 static struct kmem_cache *secpath_cachep __read_mostly;
 
@@ -186,6 +188,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
struct xfrm_state *x = NULL;
xfrm_address_t *daddr;
struct xfrm_mode *inner_mode;
+   u32 mark = skb-mark;
unsigned int family;
int decaps = 0;
int async = 0;
@@ -203,6 +206,18 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
   XFRM_SPI_SKB_CB(skb)-daddroff);
family = XFRM_SPI_SKB_CB(skb)-family;
 
+   /* if tunnel is present override skb-mark value with tunnel i_key */
+   if (XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4) {
+   switch (family) {
+   case AF_INET:
+   mark = 
be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4-parms.i_key);
+   break;
+   case AF_INET6:
+   mark = 
be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip6-parms.i_key);
+   break;
+   }
+   }
+
/* Allocate new secpath or COW existing one. */
if (!skb-sp || atomic_read(skb-sp-refcnt) != 1) {
struct sec_path *sp;
@@ -229,7 +244,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
goto drop;
}
 
-   x = xfrm_state_lookup(net, skb-mark, daddr, spi, nexthdr, 
family);
+   x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
if (x == NULL) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
xfrm_audit_state_notfound(skb, family, spi, seq);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm6: Do not use xfrm_local_error for path MTU issues in tunnels

2015-05-28 Thread Steffen Klassert
On Thu, May 28, 2015 at 12:18:51AM -0700, Alexander Duyck wrote:
 On 05/27/2015 10:36 PM, Steffen Klassert wrote:
 On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
 This change makes it so that we use icmpv6_send to report PMTU issues back
 into tunnels in the case that the resulting packet is larger than the MTU
 of the outgoing interface.  Previously xfrm_local_error was being used in
 this case, however this was resulting in no changes, I suspect due to the
 fact that the tunnel itself was being kept out of the loop.
 
 This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
 behavior seen if the socket was orphaned.  Instead of requiring the socket
 to be orphaned this patch simply defaults to using icmpv6_send in the case
 that the frame came though a tunnel.
 We can use icmpv6_send() just in the case that the packet
 was already transmitted by a tunnel device, otherwise we
 get the bug back that I mentioned in my other mail.
 
 Not sure if we have something to know that the packet
 traversed a tunnel device. That's what I asked in the
 thread 'Looking for a lost patch'.
 
 Okay I will try to do some more digging.  From what I can tell right
 now it looks like my ping attempts are getting hung up on the
 xfrm_local_error in __xfrm6_output.  I wonder if we couldn't somehow
 make use of the skb-cb to store a pointer to the tunnel that could
 be checked to determine if we are going through a VTI or not.

Maybe it is as easy as the patch below, could you please test it?

Subject: [PATCH RFC] vti6: Add pmtu handling to vti6_xmit.

We currently rely on the PMTU discovery of xfrm.
However if a packet is localy sent, the PMTU mechanism
of xfrm tries to to local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti6_xmit to
report MTU changes immediately.

Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/ipv6/ip6_vti.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ff3bd86..13cb771 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -434,6 +434,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, 
struct flowi *fl)
struct dst_entry *dst = skb_dst(skb);
struct net_device *tdev;
struct xfrm_state *x;
+   int mtu;
int err = -1;
 
if (!dst)
@@ -468,6 +469,15 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, 
struct flowi *fl)
skb_dst_set(skb, dst);
skb-dev = skb_dst(skb)-dev;
 
+   mtu = dst_mtu(dst);
+   if (!skb-ignore_df  skb-len  mtu) {
+   skb_dst(skb)-ops-update_pmtu(dst, NULL, skb, mtu);
+
+   icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+
+   return -EMSGSIZE;
+   }
+
err = dst_output(skb);
if (net_xmit_eval(err) == 0) {
struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev-tstats);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pull request (net-next): ipsec-next 2015-05-28

2015-05-28 Thread Steffen Klassert
1) Remove xfrm_queue_purge as this is the same as skb_queue_purge.

2) Optimize policy and state walk.

3) Use a sane return code if afinfo registration fails.

4) Only check fori a acquire state if the state is not valid.

5) Remove a unnecessary NULL check before xfrm_pol_hold
   as it checks the input for NULL.

6) Return directly if the xfrm hold queue is empty, avoid
   to take a lock as it is nothing to do in this case.

7) Optimize the inexact policy search and allow for matching
   of policies with priority ~0U.

All from Li RongQing.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 04b7fe6a4a231871ef681bc95e08fe66992f7b1f:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide (2015-04-17 
16:36:59 -0400)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to 8faf491e642011f449d6405be52bedb70964aed6:

  xfrm: optimise to search the inexact policy list (2015-05-18 10:31:56 +0200)


Li RongQing (7):
  xfrm: remove the xfrm_queue_purge definition
  xfrm: optimise the use of walk list header in xfrm_policy/state_walk
  xfrm: fix the return code when xfrm_*_register_afinfo failed
  xfrm: slightly optimise xfrm_input
  xfrm: remove the unnecessary checking before call xfrm_pol_hold
  xfrm: move the checking for old xfrm_policy hold_queue to beginning
  xfrm: optimise to search the inexact policy list

 net/xfrm/xfrm_input.c  | 12 ++--
 net/xfrm/xfrm_policy.c | 42 --
 net/xfrm/xfrm_state.c  |  4 ++--
 3 files changed, 28 insertions(+), 30 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] xfrm: remove the xfrm_queue_purge definition

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

The task of xfrm_queue_purge is same as skb_queue_purge, so remove it

Signed-off-by: Li RongQing roy.qing...@gmail.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_policy.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 638af06..d8c35ad 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -315,14 +315,6 @@ void xfrm_policy_destroy(struct xfrm_policy *policy)
 }
 EXPORT_SYMBOL(xfrm_policy_destroy);
 
-static void xfrm_queue_purge(struct sk_buff_head *list)
-{
-   struct sk_buff *skb;
-
-   while ((skb = skb_dequeue(list)) != NULL)
-   kfree_skb(skb);
-}
-
 /* Rule must be locked. Release descentant resources, announce
  * entry dead. The rule must be unlinked from lists to the moment.
  */
@@ -335,7 +327,7 @@ static void xfrm_policy_kill(struct xfrm_policy *policy)
 
if (del_timer(policy-polq.hold_timer))
xfrm_pol_put(policy);
-   xfrm_queue_purge(policy-polq.hold_queue);
+   skb_queue_purge(policy-polq.hold_queue);
 
if (del_timer(policy-timer))
xfrm_pol_put(policy);
@@ -1955,7 +1947,7 @@ out:
 
 purge_queue:
pq-timeout = 0;
-   xfrm_queue_purge(pq-hold_queue);
+   skb_queue_purge(pq-hold_queue);
xfrm_pol_put(pol);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] xfrm: remove the unnecessary checking before call xfrm_pol_hold

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

xfrm_pol_hold will check its input with NULL

Signed-off-by: Li RongQing roy.qing...@gmail.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_policy.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index c4c47f3..435bc0d 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1127,8 +1127,8 @@ static struct xfrm_policy 
*xfrm_policy_lookup_bytype(struct net *net, u8 type,
break;
}
}
-   if (ret)
-   xfrm_pol_hold(ret);
+
+   xfrm_pol_hold(ret);
 fail:
read_unlock_bh(net-xfrm.xfrm_policy_lock);
 
@@ -3211,8 +3211,7 @@ static struct xfrm_policy *xfrm_migrate_policy_find(const 
struct xfrm_selector *
}
}
 
-   if (ret)
-   xfrm_pol_hold(ret);
+   xfrm_pol_hold(ret);
 
read_unlock_bh(net-xfrm.xfrm_policy_lock);
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] xfrm: move the checking for old xfrm_policy hold_queue to beginning

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

if hold_queue of old xfrm_policy is NULL, return directly, then not need to
run other codes, especially take the spin lock

Signed-off-by: Li RongQing roy.qing...@gmail.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_policy.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 435bc0d..3d264e5 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -700,6 +700,9 @@ static void xfrm_policy_requeue(struct xfrm_policy *old,
struct xfrm_policy_queue *pq = old-polq;
struct sk_buff_head list;
 
+   if (skb_queue_empty(pq-hold_queue))
+   return;
+
__skb_queue_head_init(list);
 
spin_lock_bh(pq-hold_queue.lock);
@@ -708,9 +711,6 @@ static void xfrm_policy_requeue(struct xfrm_policy *old,
xfrm_pol_put(old);
spin_unlock_bh(pq-hold_queue.lock);
 
-   if (skb_queue_empty(list))
-   return;
-
pq = new-polq;
 
spin_lock_bh(pq-hold_queue.lock);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] xfrm: fix the return code when xfrm_*_register_afinfo failed

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo-family] had the
value, return the -EEXIST, not -ENOBUFS

Signed-off-by: Li RongQing roy.qing...@gmail.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_input.c  | 2 +-
 net/xfrm/xfrm_policy.c | 2 +-
 net/xfrm/xfrm_state.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 526c4fe..459796a 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -29,7 +29,7 @@ int xfrm_input_register_afinfo(struct xfrm_input_afinfo 
*afinfo)
return -EAFNOSUPPORT;
spin_lock_bh(xfrm_input_afinfo_lock);
if (unlikely(xfrm_input_afinfo[afinfo-family] != NULL))
-   err = -ENOBUFS;
+   err = -EEXIST;
else
rcu_assign_pointer(xfrm_input_afinfo[afinfo-family], afinfo);
spin_unlock_bh(xfrm_input_afinfo_lock);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 847053e..c4c47f3 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2808,7 +2808,7 @@ int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo 
*afinfo)
return -EAFNOSUPPORT;
spin_lock(xfrm_policy_afinfo_lock);
if (unlikely(xfrm_policy_afinfo[afinfo-family] != NULL))
-   err = -ENOBUFS;
+   err = -EEXIST;
else {
struct dst_ops *dst_ops = afinfo-dst_ops;
if (likely(dst_ops-kmem_cachep == NULL))
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index df93183..e47e498 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1908,7 +1908,7 @@ int xfrm_state_register_afinfo(struct xfrm_state_afinfo 
*afinfo)
return -EAFNOSUPPORT;
spin_lock_bh(xfrm_state_afinfo_lock);
if (unlikely(xfrm_state_afinfo[afinfo-family] != NULL))
-   err = -ENOBUFS;
+   err = -EEXIST;
else
rcu_assign_pointer(xfrm_state_afinfo[afinfo-family], afinfo);
spin_unlock_bh(xfrm_state_afinfo_lock);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] xfrm: slightly optimise xfrm_input

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

Check x-km.state with XFRM_STATE_ACQ only when state is not
XFRM_STAT_VALID, not everytime

Signed-off-by: Li RongQing roy.qing...@gmail.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_input.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 459796a..1858a45f 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -239,13 +239,13 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
skb-sp-xvec[skb-sp-len++] = x;
 
spin_lock(x-lock);
-   if (unlikely(x-km.state == XFRM_STATE_ACQ)) {
-   XFRM_INC_STATS(net, LINUX_MIB_XFRMACQUIREERROR);
-   goto drop_unlock;
-   }
 
if (unlikely(x-km.state != XFRM_STATE_VALID)) {
-   XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEINVALID);
+   if (x-km.state == XFRM_STATE_ACQ)
+   XFRM_INC_STATS(net, LINUX_MIB_XFRMACQUIREERROR);
+   else
+   XFRM_INC_STATS(net,
+  LINUX_MIB_XFRMINSTATEINVALID);
goto drop_unlock;
}
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] xfrm: optimise the use of walk list header in xfrm_policy/state_walk

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

The walk from input is the list header, and marked as dead, and will
be skipped in loop.

list_first_entry() can be used to return the true usable value from
walk if walk is not empty

Signed-off-by: Li RongQing roy.qing...@gmail.com
Acked-by: Herbert Xu herb...@gondor.apana.org.au
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_policy.c | 4 +++-
 net/xfrm/xfrm_state.c  | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index d8c35ad..847053e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1004,7 +1004,9 @@ int xfrm_policy_walk(struct net *net, struct 
xfrm_policy_walk *walk,
if (list_empty(walk-walk.all))
x = list_first_entry(net-xfrm.policy_all, struct 
xfrm_policy_walk_entry, all);
else
-   x = list_entry(walk-walk.all, struct xfrm_policy_walk_entry, 
all);
+   x = list_first_entry(walk-walk.all,
+struct xfrm_policy_walk_entry, all);
+
list_for_each_entry_from(x, net-xfrm.policy_all, all) {
if (x-dead)
continue;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f5e39e3..df93183 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1626,7 +1626,7 @@ int xfrm_state_walk(struct net *net, struct 
xfrm_state_walk *walk,
if (list_empty(walk-all))
x = list_first_entry(net-xfrm.state_all, struct 
xfrm_state_walk, all);
else
-   x = list_entry(walk-all, struct xfrm_state_walk, all);
+   x = list_first_entry(walk-all, struct xfrm_state_walk, all);
list_for_each_entry_from(x, net-xfrm.state_all, all) {
if (x-state == XFRM_STATE_DEAD)
continue;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] xfrm: optimise to search the inexact policy list

2015-05-28 Thread Steffen Klassert
From: Li RongQing roy.qing...@gmail.com

The policies are organized into list by priority ascent of policy,
so it is unnecessary to continue to loop the policy if the priority
of current looped police is larger than or equal priority which is
from the policy_bydst list.

This allows to match policy with ~0U priority in inexact list too.

Signed-off-by: Li RongQing roy.qing...@gmail.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_policy.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 3d264e5..18cead7 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1114,6 +1114,9 @@ static struct xfrm_policy 
*xfrm_policy_lookup_bytype(struct net *net, u8 type,
}
chain = net-xfrm.policy_inexact[dir];
hlist_for_each_entry(pol, chain, bydst) {
+   if ((pol-priority = priority)  ret)
+   break;
+
err = xfrm_policy_match(pol, fl, type, family, dir);
if (err) {
if (err == -ESRCH)
@@ -1122,7 +1125,7 @@ static struct xfrm_policy 
*xfrm_policy_lookup_bytype(struct net *net, u8 type,
ret = ERR_PTR(err);
goto fail;
}
-   } else if (pol-priority  priority) {
+   } else {
ret = pol;
break;
}
@@ -3203,9 +3206,11 @@ static struct xfrm_policy 
*xfrm_migrate_policy_find(const struct xfrm_selector *
}
chain = net-xfrm.policy_inexact[dir];
hlist_for_each_entry(pol, chain, bydst) {
+   if ((pol-priority = priority)  ret)
+   break;
+
if (xfrm_migrate_selector_match(sel, pol-selector) 
-   pol-type == type 
-   pol-priority  priority) {
+   pol-type == type) {
ret = pol;
break;
}
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm6: Do not use xfrm_local_error for path MTU issues in tunnels

2015-05-27 Thread Steffen Klassert
On Thu, May 28, 2015 at 12:49:19PM +0800, Herbert Xu wrote:
 On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
  This change makes it so that we use icmpv6_send to report PMTU issues back
  into tunnels in the case that the resulting packet is larger than the MTU
  of the outgoing interface.  Previously xfrm_local_error was being used in
  this case, however this was resulting in no changes, I suspect due to the
  fact that the tunnel itself was being kept out of the loop.
  
  This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
  behavior seen if the socket was orphaned.  Instead of requiring the socket
  to be orphaned this patch simply defaults to using icmpv6_send in the case
  that the frame came though a tunnel.
  
  Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
 
 Does this still work with normal tunnel mode and identical inner
 and outer addresses? I recall we used to have a bug where in that
 situation the kernel would interpret the ICMP message as a reduction
 in outer MTU and thus resulting in a loop where the MTU keeps
 getting smaller.

Right, I think this reintroduces a bug that I fixed some years ago with
commit dd767856a36e (xfrm6: Don't call icmpv6_send on local error)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm6: Do not use xfrm_local_error for path MTU issues in tunnels

2015-05-27 Thread Steffen Klassert
On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote:
 This change makes it so that we use icmpv6_send to report PMTU issues back
 into tunnels in the case that the resulting packet is larger than the MTU
 of the outgoing interface.  Previously xfrm_local_error was being used in
 this case, however this was resulting in no changes, I suspect due to the
 fact that the tunnel itself was being kept out of the loop.
 
 This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the
 behavior seen if the socket was orphaned.  Instead of requiring the socket
 to be orphaned this patch simply defaults to using icmpv6_send in the case
 that the frame came though a tunnel.

We can use icmpv6_send() just in the case that the packet
was already transmitted by a tunnel device, otherwise we
get the bug back that I mentioned in my other mail.

Not sure if we have something to know that the packet
traversed a tunnel device. That's what I asked in the
thread 'Looking for a lost patch'.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Looking for a lost patch

2015-05-27 Thread Steffen Klassert
On Wed, May 27, 2015 at 11:46:03AM -0400, David Miller wrote:
 From: Steffen Klassert steffen.klass...@secunet.com
 Date: Wed, 27 May 2015 10:35:16 +0200
 
  We currently check if a socket is attached to a skb and do socket
  error notification in this case, otherwise we do PMTU discovery if
  the packet is too big. Looks like this socket check is not sufficient
  if the packet is already transmitted through a tunnel device.
  
  I wonder if we have something to know that a packet was already
  transmitted through a tunnel device. We could switch from socket
  notification to PMTU discovery in this case.
 
 Generally speaking, we should not be orphaning the socket as it
 traverses through tunnels.
 
 In fact we have taken great pains to avoid doing this.

Yes, I'm aware of this. I don't want to orphan the socket,
all I wanted to do is to change the way we notify about MTU
changes. I.e. use icmpv6_send() instead of xfrm_local_error()
if the packet traversed a tunnel, that's why I wondered whether
we can know this.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net-next PATCH RFC 0/3] Preserve skb-mark through VTI tunnels

2015-05-27 Thread Steffen Klassert
On Tue, May 26, 2015 at 03:41:10PM -0700, Alexander Duyck wrote:
 These patches are meant to try and address the fact the VTI tunnels are
 currently overwriting the skb-mark value.  I am generally happy with the
 first two patches, however the third patch still modifies the skb-mark,
 though it undoes after the fact.

I don't see any better solution, so I think this should be ok for now.
On the long run we need to replace this gre key/mark matching with
a separate interface.

 
 The main problem I am trying to address is the fact that currently if I use
 an v6 over v6 VTI tunnel I cannot receive any traffic on the interface as
 the skb-mark is bleeding through and causing the traffic to be dropped.

This is broken in the current mainline, so it should go into the ipsec
tree as a bugfix. I'd merge this patchset if you submit it to that tree.

Thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xfrm: Always zero high-order sequence number bits

2015-05-21 Thread Steffen Klassert
On Thu, May 21, 2015 at 12:38:12AM +0800, Herbert Xu wrote:
 As we're now always including the high bits of the sequence number
 in the IV generation process we need to ensure that they don't
 contain crap.
 
 This patch ensures that the high sequence bits are always zeroed
 so that we don't leak random data into the IV.
 
 Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

Applied, thanks Herbert!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ipsec-next] xfrm: Use VRF master index if output device is enslaved

2015-08-19 Thread Steffen Klassert
On Wed, Aug 19, 2015 at 11:35:55AM -0700, David Ahern wrote:
 
 I think you should use the new vrf_master_index() helper that acquires rcu 
 because
 it looks possible to call -decode_session() without rcu read lock, e.g. in 
 the hold_timer
 function xfrm_policy_queue_process(), though I haven’t tested it and might 
 be missing
 something. :-)
 
 I was digging into code paths yesterday. Today I added WARN_ON and
 seems like the rcu_read_lock is held:
 
 if (skb_dst(skb)) {
 WARN_ON(!rcu_read_lock_held()  !rcu_read_lock_bh_held());
 oif = vrf_master_ifindex_rcu(skb_dst(skb)-dev) ?
 : skb_dst(skb)-dev-ifindex;
 pr_info(_decode_session: oif %d skb_dst(skb)-dev-ifindex %d\n,
 oif, skb_dst(skb)-dev-ifindex);
 }
 
 I get the printk, but not the WARN_ON splat.

Well, this depends on the codepath that called xfrm_decode_session().
It really think it was not called through xfrm_policy_queue_process()
because this codepath is just used if the sysctl xfrm_larval_drop is switched
off (on by default) and the required xfrm_state is not yet established.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ipsec-next v2] xfrm: Use VRF master index if output device is enslaved

2015-08-21 Thread Steffen Klassert
On Fri, Aug 21, 2015 at 02:11:21AM +0300, Nikolay Aleksandrov wrote:
 
  On Aug 21, 2015, at 1:06 AM, David Ahern d...@cumulusnetworks.com wrote:
  
  Directs route lookups to VRF table. Compiles out if NET_VRF is not
  enabled. With this patch able to successfully bring up ipsec tunnels
  in VRFs, even with duplicate network configuration.
  
  Signed-off-by: David Ahern d...@cumulusnetworks.com
  ---
  v2
  - use vrf_master_ifindex rather than vrf_master_ifindex_rcu
  
  net/ipv4/xfrm4_policy.c | 7 +--
  net/ipv6/xfrm6_policy.c | 7 +--
  2 files changed, 10 insertions(+), 4 deletions(-)
  
 
 Looks good to me,
 
 Acked-by: Nikolay Aleksandrov niko...@cumulusnetworks.com

David, can you please take this directly into net-next?

Acked-by: Steffen Klassert steffen.klass...@secunet.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] net: Document xfrm4_gc_thresh and xfrm6_gc_thresh

2015-08-17 Thread Steffen Klassert
From: Alexander Duyck alexander.h.du...@redhat.com

This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh
based on the comments in commit eeb1b73378b56 (xfrm: Increase the garbage
collector threshold).

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 Documentation/networking/ip-sysctl.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 56db1ef..46e88ed 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1181,6 +1181,11 @@ tag - INTEGER
Allows you to write a number, which can be used as required.
Default value is 0.
 
+xfrm4_gc_thresh - INTEGER
+   The threshold at which we will start garbage collecting for IPv4
+   destination cache entries.  At twice this value the system will
+   refuse new allocations.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru
 
@@ -1617,6 +1622,11 @@ ratelimit - INTEGER
otherwise the minimal space between responses in milliseconds.
Default: 1000
 
+xfrm6_gc_thresh - INTEGER
+   The threshold at which we will start garbage collecting for IPv6
+   destination cache entries.  At twice this value the system will
+   refuse new allocations.
+
 
 IPv6 Update by:
 Pekka Savola pek...@netcore.fi
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] xfrm: Add oif to dst lookups

2015-08-17 Thread Steffen Klassert
From: David Ahern d...@cumulusnetworks.com

Rules can be installed that direct route lookups to specific tables based
on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
struct and passed to the resolver routines.

Signed-off-by: David Ahern d...@cumulusnetworks.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 include/net/xfrm.h  |  7 +--
 net/ipv4/xfrm4_policy.c | 11 ++-
 net/ipv6/xfrm6_policy.c |  7 ---
 net/xfrm/xfrm_policy.c  | 24 ++--
 4 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index f0ee97e..312e3fe 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -285,10 +285,13 @@ struct xfrm_policy_afinfo {
unsigned short  family;
struct dst_ops  *dst_ops;
void(*garbage_collect)(struct net *net);
-   struct dst_entry*(*dst_lookup)(struct net *net, int tos,
+   struct dst_entry*(*dst_lookup)(struct net *net,
+  int tos, int oif,
   const xfrm_address_t *saddr,
   const xfrm_address_t *daddr);
-   int (*get_saddr)(struct net *net, xfrm_address_t 
*saddr, xfrm_address_t *daddr);
+   int (*get_saddr)(struct net *net, int oif,
+xfrm_address_t *saddr,
+xfrm_address_t *daddr);
void(*decode_session)(struct sk_buff *skb,
  struct flowi *fl,
  int reverse);
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index bff6974..55b3c0f 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -19,7 +19,7 @@
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo;
 
 static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 
*fl4,
-   int tos,
+   int tos, int oif,
const xfrm_address_t *saddr,
const xfrm_address_t *daddr)
 {
@@ -28,6 +28,7 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net *net, 
struct flowi4 *fl4,
memset(fl4, 0, sizeof(*fl4));
fl4-daddr = daddr-a4;
fl4-flowi4_tos = tos;
+   fl4-flowi4_oif = oif;
if (saddr)
fl4-saddr = saddr-a4;
 
@@ -38,22 +39,22 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net 
*net, struct flowi4 *fl4,
return ERR_CAST(rt);
 }
 
-static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos,
+static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif,
  const xfrm_address_t *saddr,
  const xfrm_address_t *daddr)
 {
struct flowi4 fl4;
 
-   return __xfrm4_dst_lookup(net, fl4, tos, saddr, daddr);
+   return __xfrm4_dst_lookup(net, fl4, tos, oif, saddr, daddr);
 }
 
-static int xfrm4_get_saddr(struct net *net,
+static int xfrm4_get_saddr(struct net *net, int oif,
   xfrm_address_t *saddr, xfrm_address_t *daddr)
 {
struct dst_entry *dst;
struct flowi4 fl4;
 
-   dst = __xfrm4_dst_lookup(net, fl4, 0, NULL, daddr);
+   dst = __xfrm4_dst_lookup(net, fl4, 0, oif, NULL, daddr);
if (IS_ERR(dst))
return -EHOSTUNREACH;
 
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index ed0583c..a74013d 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -26,7 +26,7 @@
 
 static struct xfrm_policy_afinfo xfrm6_policy_afinfo;
 
-static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos,
+static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
  const xfrm_address_t *saddr,
  const xfrm_address_t *daddr)
 {
@@ -35,6 +35,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, 
int tos,
int err;
 
memset(fl6, 0, sizeof(fl6));
+   fl6.flowi6_oif = oif;
memcpy(fl6.daddr, daddr, sizeof(fl6.daddr));
if (saddr)
memcpy(fl6.saddr, saddr, sizeof(fl6.saddr));
@@ -50,13 +51,13 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, 
int tos,
return dst;
 }
 
-static int xfrm6_get_saddr(struct net *net,
+static int xfrm6_get_saddr(struct net *net, int oif,
   xfrm_address_t *saddr, xfrm_address_t *daddr)
 {
struct dst_entry *dst;
struct net_device *dev;
 
-   dst = xfrm6_dst_lookup(net, 0, NULL, daddr);
+   dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr);
if (IS_ERR(dst

[PATCH 2/4] net/xfrm: use kmemdup rather than duplicating its implementation

2015-08-17 Thread Steffen Klassert
From: Andrzej Hajda a.ha...@samsung.com

The patch was generated using fixed coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2014320

Signed-off-by: Andrzej Hajda a.ha...@samsung.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/xfrm/xfrm_user.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 0cebf1f..a8de9e3 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -925,12 +925,10 @@ static int xfrm_dump_sa(struct sk_buff *skb, struct 
netlink_callback *cb)
return err;
 
if (attrs[XFRMA_ADDRESS_FILTER]) {
-   filter = kmalloc(sizeof(*filter), GFP_KERNEL);
+   filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]),
+sizeof(*filter), GFP_KERNEL);
if (filter == NULL)
return -ENOMEM;
-
-   memcpy(filter, nla_data(attrs[XFRMA_ADDRESS_FILTER]),
-  sizeof(*filter));
}
 
if (attrs[XFRMA_PROTO])
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pull request (net-next): ipsec-next 2015-08-17

2015-08-17 Thread Steffen Klassert
1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
   From Thomas Egerer.

2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
   From Andrzej Hajda.

3) Pass oif to the xfrm lookups so that it gets set on the flow
   and the resolver routines can match based on oif.
   From David Ahern.

4) Add documentation for the new xfrm garbage collector threshold.
   From Alexander Duyck.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 07a51cd3794960548627a27aae68c1446341db32:

  vxlan: fix fdb_dump index calculation (2015-08-10 21:15:18 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to e69948a0a5309f3ef5715cb4ca7a9bd77d64e2cf:

  net: Document xfrm4_gc_thresh and xfrm6_gc_thresh (2015-08-12 08:28:04 +0200)


Alexander Duyck (1):
  net: Document xfrm4_gc_thresh and xfrm6_gc_thresh

Andrzej Hajda (1):
  net/xfrm: use kmemdup rather than duplicating its implementation

David Ahern (1):
  xfrm: Add oif to dst lookups

Thomas Egerer (1):
  xfrm6: Fix IPv6 ECN decapsulation

 Documentation/networking/ip-sysctl.txt | 10 ++
 include/net/xfrm.h |  7 +--
 net/ipv4/xfrm4_policy.c| 11 ++-
 net/ipv6/xfrm6_mode_tunnel.c   |  3 +--
 net/ipv6/xfrm6_policy.c|  7 ---
 net/xfrm/xfrm_policy.c | 24 ++--
 net/xfrm/xfrm_user.c   |  6 ++
 7 files changed, 42 insertions(+), 26 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] xfrm6: Fix IPv6 ECN decapsulation

2015-08-17 Thread Steffen Klassert
From: Thomas Egerer thomas.ege...@secunet.com

Using ipv6_get_dsfield on the outer IP header implies that inner and
outer header are of the the same address family. For interfamily
tunnels, particularly 646, the code reading the DSCP field obtains the
wrong values (IHL and the upper four bits of the DSCP field).
This can cause the code to detect a congestion encoutered state in the
outer header and enable the corresponding bits in the inner header, too.

Since the DSCP field is stored in the xfrm mode common buffer
independently from the IP version of the outer header, it's safe (and
correct) to take this value from there.

Signed-off-by: Thomas Egerer thomas.ege...@secunet.com
Signed-off-by: Steffen Klassert steffen.klass...@secunet.com
---
 net/ipv6/xfrm6_mode_tunnel.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 901ef6f..f7fbdba 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -20,10 +20,9 @@
 
 static inline void ipip6_ecn_decapsulate(struct sk_buff *skb)
 {
-   const struct ipv6hdr *outer_iph = ipv6_hdr(skb);
struct ipv6hdr *inner_iph = ipipv6_hdr(skb);
 
-   if (INET_ECN_is_ce(ipv6_get_dsfield(outer_iph)))
+   if (INET_ECN_is_ce(XFRM_MODE_SKB_CB(skb)-tos))
IP6_ECN_set_ce(inner_iph);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IPsec maintenance during the next weeks

2015-07-29 Thread Steffen Klassert
On Tue, Jul 07, 2015 at 10:58:17PM -0700, David Miller wrote:
 From: Steffen Klassert steffen.klass...@secunet.com
 Date: Wed, 8 Jul 2015 07:04:32 +0200
 
  I'll be off without mail access for the next two and a half weeks.
  Can you please take urgent IPsec patches directly into the net
  tree during this time? 
  
  I'll let you know as soon as I'm back.
 
 Sure, thanks for letting me know.

I'm back and continue to run the ipsec trees.

Thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm: Add oif to dst lookups

2015-08-12 Thread Steffen Klassert
On Mon, Aug 10, 2015 at 04:58:11PM -0600, David Ahern wrote:
 Rules can be installed that direct route lookups to specific tables based
 on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
 struct and passed to the resolver routines.
 
 Signed-off-by: David Ahern d...@cumulusnetworks.com

Applied to ipsec-next, thanks a lot!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 28/31] net/xfrm: use kmemdup rather than duplicating its implementation

2015-08-12 Thread Steffen Klassert
On Fri, Aug 07, 2015 at 09:59:34AM +0200, Andrzej Hajda wrote:
 The patch was generated using fixed coccinelle semantic patch
 scripts/coccinelle/api/memdup.cocci [1].
 
 [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
 
 Signed-off-by: Andrzej Hajda a.ha...@samsung.com

Applied to ipsec-next, thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [net-next PATCH] net: Document xfrm4_gc_thresh and xfrm6_gc_thresh

2015-08-12 Thread Steffen Klassert
On Tue, Aug 11, 2015 at 01:51:52PM -0700, David Miller wrote:
 From: Alexander Duyck alexander.h.du...@redhat.com
 Date: Tue, 11 Aug 2015 13:35:01 -0700
 
  This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh
  based on the comments in commit eeb1b73378b56 (xfrm: Increase the garbage
  collector threshold).
  
  Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
 
 Thanks, I'll let Steffen review and pick this up.

Now applied to ipsec-next, thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] xfrm: Fix unaligned access to stats in copy_to_user_state()

2015-10-23 Thread Steffen Klassert
On Wed, Oct 21, 2015 at 11:48:25AM -0400, Sowmini Varadhan wrote:
> 
> On sparc, deleting established SAs (e.g., by restarting ipsec)
> results in unaligned access messages via xfrm_del_sa -> 
> km_state_notify -> xfrm_send_state_notify().
> 
> Even though struct xfrm_usersa_info is aligned on 8-byte boundaries,
> netlink attributes are fundamentally only 4 byte aligned, and this
> cannot be changed for nla_data() that is passed up to userspace.
> As a result, the put_unaligned() macro needs to be used to
> set up potentially unaligned fields such as the xfrm_stats in
> copy_to_user_state()
> 
> Signed-off-by: Sowmini Varadhan 
> ---
> v2: review comment from thread: cannot use PTR_ALIGN as this would break
> userspace assumptions about 4 byte alignment. Use *_unaligned() macros
> as needed, instead.

This works on intel 32-bit and 64-bit as expected.
Patch applied to ipsec-next, thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] xfrm: Fix pmtu discovery for local generated packets.

2015-10-22 Thread Steffen Klassert
Commit 044a832a777 ("xfrm: Fix local error reporting crash
with interfamily tunnels") moved the setting of skb->protocol
behind the last access of the inner mode family to fix an
interfamily crash. Unfortunately now skb->protocol might not
be set at all, so we fail dispatch to the inner address family.
As a reault, the local error handler is not called and the
mtu value is not reported back to userspace.

We fix this by setting skb->protocol on message size errors
before we call xfrm_local_error.

Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily 
tunnels")
Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/ipv4/xfrm4_output.c | 2 ++
 net/ipv6/xfrm6_output.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 2878dbf..41a2613 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -30,6 +30,8 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 
mtu = dst_mtu(skb_dst(skb));
if (skb->len > mtu) {
+   skb->protocol = htons(ETH_P_IP);
+
if (skb->sk)
xfrm_local_error(skb, mtu);
else
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index be033f2..e15feb7 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -79,6 +79,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 
if (!skb->ignore_df && skb->len > mtu) {
skb->dev = dst->dev;
+   skb->protocol = htons(ETH_P_IPV6);
 
if (xfrm6_local_dontfrag(skb))
xfrm6_local_rxpmtu(skb, mtu);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] xfrm: Fix state threshold configuration from userspace

2015-10-22 Thread Steffen Klassert
From: Michael Rossberg <michael.rossb...@tu-ilmenau.de>

Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
timer (XFRMA_ETIMER_THRESH) of a state without having to set other
attributes like replay counter and byte lifetime. Changing these other
values while traffic flows will break the state.

Signed-off-by: Michael Rossberg <michael.rossb...@tu-ilmenau.de>
Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/xfrm/xfrm_user.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8de9e3..24e06a2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1928,8 +1928,10 @@ static int xfrm_new_ae(struct sk_buff *skb, struct 
nlmsghdr *nlh,
struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
struct nlattr *re = attrs[XFRMA_REPLAY_ESN_VAL];
struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+   struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+   struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
-   if (!lt && !rp && !re)
+   if (!lt && !rp && !re && !et && !rt)
return err;
 
/* pedantic mode - thou shalt sayeth replaceth */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pull request (net): ipsec 2015-10-22

2015-10-22 Thread Steffen Klassert
1) Fix IPsec pre-encap fragmentation for GSO packets.
   From Herbert Xu.

2) Fix some header checks in _decode_session6.
   We skip the header informations if the data pointer points
   already behind the header in question for some protocols.
   This is because we call pskb_may_pull with a negative value
   converted to unsigened int from pskb_may_pull in this case.
   Skipping the header informations can lead to incorrect policy
   lookups. From Mathias Krause.

3) Allow to change the replay threshold and expiry timer of a
   state without having to set other attributes like replay
   counter and byte lifetime. Changing these other attributes
   may break the SA. From Michael Rossberg.

4) Fix pmtu discovery for local generated packets.
   We may fail dispatch to the inner address family.
   As a reault, the local error handler is not called
   and the mtu value is not reported back to userspace.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 724a7636ad026a3a68f3fc626ccd04111f65cfd9:

  Merge branch 'sctp-fixes' (2015-09-03 15:43:06 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to ca064bd89363a6e7e71b1c5226ff1b718957a9d4:

  xfrm: Fix pmtu discovery for local generated packets. (2015-10-19 10:30:05 
+0200)


Herbert Xu (1):
  ipv6: Fix IPsec pre-encap fragmentation check

Mathias Krause (1):
  xfrm6: Fix ICMPv6 and MH header checks in _decode_session6

Michael Rossberg (1):
  xfrm: Fix state threshold configuration from userspace

Steffen Klassert (1):
  xfrm: Fix pmtu discovery for local generated packets.

 net/ipv4/xfrm4_output.c |  2 ++
 net/ipv6/xfrm6_output.c | 18 --
 net/ipv6/xfrm6_policy.c |  6 --
 net/xfrm/xfrm_user.c|  4 +++-
 4 files changed, 21 insertions(+), 9 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] xfrm6: Fix ICMPv6 and MH header checks in _decode_session6

2015-10-22 Thread Steffen Klassert
From: Mathias Krause <mathias.kra...@secunet.com>

Ensure there's enough data left prior calling pskb_may_pull(). If
skb->data was already advanced, we'll call pskb_may_pull() with a
negative value converted to unsigned int -- leading to a huge
positive value. That won't matter in practice as pskb_may_pull()
will likely fail in this case, but it leads to underflow reports on
kernels handling such kind of over-/underflows, e.g. a PaX enabled
kernel instrumented with the size_overflow plugin.

Reported-by: satmd <sa...@lain.at>
Reported-and-tested-by: Marcin Jurkowski <marci...@gmail.com>
Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
Cc: PaX Team <pagee...@freemail.hu>
Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/ipv6/xfrm6_policy.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa28..f10b940 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -178,7 +178,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int 
reverse)
return;
 
case IPPROTO_ICMPV6:
-   if (!onlyproto && pskb_may_pull(skb, nh + offset + 2 - 
skb->data)) {
+   if (!onlyproto && (nh + offset + 2 < skb->data ||
+   pskb_may_pull(skb, nh + offset + 2 - skb->data))) {
u8 *icmp;
 
nh = skb_network_header(skb);
@@ -192,7 +193,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int 
reverse)
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
case IPPROTO_MH:
offset += ipv6_optlen(exthdr);
-   if (!onlyproto && pskb_may_pull(skb, nh + offset + 3 - 
skb->data)) {
+   if (!onlyproto && (nh + offset + 3 < skb->data ||
+   pskb_may_pull(skb, nh + offset + 3 - skb->data))) {
struct ip6_mh *mh;
 
nh = skb_network_header(skb);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] ipv6: Fix IPsec pre-encap fragmentation check

2015-10-22 Thread Steffen Klassert
From: Herbert Xu <herb...@gondor.apana.org.au>

The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
packets.  That is, we perform fragmentation pre-encap rather than
post-encap.

A check was added later to ensure that proper MTU information is
passed back for locally generated traffic.  Unfortunately this
check was performed on all IPsec packets, including transport-mode
packets.

What's more, the check failed to take GSO into account.

The end result is that transport-mode GSO packets get dropped at
the check.

This patch fixes it by moving the tunnel mode check forward as well
as adding the GSO check.

Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/ipv6/xfrm6_output.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 09c76a7..be033f2 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -136,6 +136,7 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff 
*skb)
struct dst_entry *dst = skb_dst(skb);
struct xfrm_state *x = dst->xfrm;
int mtu;
+   bool toobig;
 
 #ifdef CONFIG_NETFILTER
if (!x) {
@@ -144,25 +145,29 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff 
*skb)
}
 #endif
 
+   if (x->props.mode != XFRM_MODE_TUNNEL)
+   goto skip_frag;
+
if (skb->protocol == htons(ETH_P_IPV6))
mtu = ip6_skb_dst_mtu(skb);
else
mtu = dst_mtu(skb_dst(skb));
 
-   if (skb->len > mtu && xfrm6_local_dontfrag(skb)) {
+   toobig = skb->len > mtu && !skb_is_gso(skb);
+
+   if (toobig && xfrm6_local_dontfrag(skb)) {
xfrm6_local_rxpmtu(skb, mtu);
return -EMSGSIZE;
-   } else if (!skb->ignore_df && skb->len > mtu && skb->sk) {
+   } else if (!skb->ignore_df && toobig && skb->sk) {
xfrm_local_error(skb, mtu);
return -EMSGSIZE;
}
 
-   if (x->props.mode == XFRM_MODE_TUNNEL &&
-   ((skb->len > mtu && !skb_is_gso(skb)) ||
-   dst_allfrag(skb_dst(skb {
+   if (toobig || dst_allfrag(skb_dst(skb)))
return ip6_fragment(sk, skb,
x->outer_mode->afinfo->output_finish);
-   }
+
+skip_frag:
return x->outer_mode->afinfo->output_finish(sk, skb);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] xfrm: Fix unaligned access to stats in copy_to_user_state()

2015-10-30 Thread Steffen Klassert
From: Sowmini Varadhan <sowmini.varad...@oracle.com>

On sparc, deleting established SAs (e.g., by restarting ipsec)
results in unaligned access messages via xfrm_del_sa ->
km_state_notify -> xfrm_send_state_notify().

Even though struct xfrm_usersa_info is aligned on 8-byte boundaries,
netlink attributes are fundamentally only 4 byte aligned, and this
cannot be changed for nla_data() that is passed up to userspace.
As a result, the put_unaligned() macro needs to be used to
set up potentially unaligned fields such as the xfrm_stats in
copy_to_user_state()

Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/xfrm/xfrm_user.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8de9e3..639e0d5 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -31,6 +31,7 @@
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
 #endif
+#include 
 
 static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
@@ -728,7 +729,9 @@ static void copy_to_user_state(struct xfrm_state *x, struct 
xfrm_usersa_info *p)
memcpy(>sel, >sel, sizeof(p->sel));
memcpy(>lft, >lft, sizeof(p->lft));
memcpy(>curlft, >curlft, sizeof(p->curlft));
-   memcpy(>stats, >stats, sizeof(p->stats));
+   put_unaligned(x->stats.replay_window, >stats.replay_window);
+   put_unaligned(x->stats.replay, >stats.replay);
+   put_unaligned(x->stats.integrity_failed, >stats.integrity_failed);
memcpy(>saddr, >props.saddr, sizeof(p->saddr));
p->mode = x->props.mode;
p->replay_window = x->props.replay_window;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] xfrm: Let the flowcache handle its size by default.

2015-10-30 Thread Steffen Klassert
The xfrm flowcache size is limited by the flowcache limit
(4096 * number of online cpus) and the xfrm garbage collector
threshold (2 * 32768), whatever is reached first. This means
that we can hit the garbage collector limit only on systems
with more than 16 cpus. On such systems we simply refuse
new allocations if we reach the limit, so new flows are dropped.
On syslems with 16 or less cpus, we hit the flowcache limit.
In this case, we shrink the flow cache instead of refusing new
flows.

We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.

The xfrm garbage collector threshold can still be set below
the flowcache limit to reduce the memory usage of the flowcache.

Tested-by: Dan Streetman <dan.street...@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 Documentation/networking/ip-sysctl.txt | 6 --
 net/ipv4/xfrm4_policy.c| 2 +-
 net/ipv6/xfrm6_policy.c| 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ebe94f2..260f30b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1199,7 +1199,8 @@ tag - INTEGER
 xfrm4_gc_thresh - INTEGER
The threshold at which we will start garbage collecting for IPv4
destination cache entries.  At twice this value the system will
-   refuse new allocations.
+   refuse new allocations. The value must be set below the flowcache
+   limit (4096 * number of online cpus) to take effect.
 
 igmp_link_local_mcast_reports - BOOLEAN
Enable IGMP reports for link local multicast groups in the
@@ -1645,7 +1646,8 @@ ratelimit - INTEGER
 xfrm6_gc_thresh - INTEGER
The threshold at which we will start garbage collecting for IPv6
destination cache entries.  At twice this value the system will
-   refuse new allocations.
+   refuse new allocations. The value must be set below the flowcache
+   limit (4096 * number of online cpus) to take effect.
 
 
 IPv6 Update by:
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 0304d16..75e8d48 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -246,7 +246,7 @@ static struct dst_ops xfrm4_dst_ops = {
.destroy =  xfrm4_dst_destroy,
.ifdown =   xfrm4_dst_ifdown,
.local_out =__ip_local_out,
-   .gc_thresh =32768,
+   .gc_thresh =INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa28..2fad593 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -287,7 +287,7 @@ static struct dst_ops xfrm6_dst_ops = {
.destroy =  xfrm6_dst_destroy,
.ifdown =   xfrm6_dst_ifdown,
.local_out =__ip6_local_out,
-   .gc_thresh =32768,
+   .gc_thresh =INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] xfrm4: Fix header checks in _decode_session4.

2015-10-30 Thread Steffen Klassert
We skip the header informations if the data pointer points
already behind the header in question for some protocols.
This is because we call pskb_may_pull with a negative value
converted to unsigened int from pskb_may_pull in this case.
Skipping the header informations can lead to incorrect policy
lookups, so fix it by a check of the data pointer position
before we call pskb_may_pull.

Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/ipv4/xfrm4_policy.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 75e8d48..e4d533c 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -137,7 +137,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int 
reverse)
break;
 
case IPPROTO_ICMP:
-   if (pskb_may_pull(skb, xprth + 2 - skb->data)) {
+   if (xprth + 2 < skb->data ||
+   pskb_may_pull(skb, xprth + 2 - skb->data)) {
u8 *icmp = xprth;
 
fl4->fl4_icmp_type = icmp[0];
@@ -146,7 +147,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int 
reverse)
break;
 
case IPPROTO_ESP:
-   if (pskb_may_pull(skb, xprth + 4 - skb->data)) {
+   if (xprth + 4 < skb->data ||
+   pskb_may_pull(skb, xprth + 4 - skb->data)) {
__be32 *ehdr = (__be32 *)xprth;
 
fl4->fl4_ipsec_spi = ehdr[0];
@@ -154,7 +156,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int 
reverse)
break;
 
case IPPROTO_AH:
-   if (pskb_may_pull(skb, xprth + 8 - skb->data)) {
+   if (xprth + 8 < skb->data ||
+   pskb_may_pull(skb, xprth + 8 - skb->data)) {
__be32 *ah_hdr = (__be32 *)xprth;
 
fl4->fl4_ipsec_spi = ah_hdr[1];
@@ -162,7 +165,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int 
reverse)
break;
 
case IPPROTO_COMP:
-   if (pskb_may_pull(skb, xprth + 4 - skb->data)) {
+   if (xprth + 4 < skb->data ||
+   pskb_may_pull(skb, xprth + 4 - skb->data)) {
__be16 *ipcomp_hdr = (__be16 *)xprth;
 
fl4->fl4_ipsec_spi = 
htonl(ntohs(ipcomp_hdr[1]));
@@ -170,7 +174,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int 
reverse)
break;
 
case IPPROTO_GRE:
-   if (pskb_may_pull(skb, xprth + 12 - skb->data)) {
+   if (xprth + 12 < skb->data ||
+   pskb_may_pull(skb, xprth + 12 - skb->data)) {
__be16 *greflags = (__be16 *)xprth;
__be32 *gre_hdr = (__be32 *)xprth;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] xfrm: Increment statistic counter on inner mode error

2015-10-30 Thread Steffen Klassert
Increment the LINUX_MIB_XFRMINSTATEMODEERROR statistic counter
to notify about dropped packets if we fail to fetch a inner mode.

Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/xfrm/xfrm_input.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 60ce701..ad7f5b3 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -330,8 +330,10 @@ resume:
 
if (x->sel.family == AF_UNSPEC) {
inner_mode = xfrm_ip2inner_mode(x, 
XFRM_MODE_SKB_CB(skb)->protocol);
-   if (inner_mode == NULL)
+   if (inner_mode == NULL) {
+   XFRM_INC_STATS(net, 
LINUX_MIB_XFRMINSTATEMODEERROR);
goto drop;
+   }
}
 
if (inner_mode->input(x, skb)) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] xfrm4: Reload skb header pointers after calling pskb_may_pull.

2015-10-30 Thread Steffen Klassert
A call to pskb_may_pull may change the pointers into the packet,
so reload the pointers after the call.

Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/ipv4/xfrm4_policy.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index e4d533c..269b137 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -129,7 +129,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, 
int reverse)
case IPPROTO_DCCP:
if (xprth + 4 < skb->data ||
pskb_may_pull(skb, xprth + 4 - skb->data)) {
-   __be16 *ports = (__be16 *)xprth;
+   __be16 *ports;
+
+   xprth = skb_network_header(skb) + iph->ihl * 4;
+   ports = (__be16 *)xprth;
 
fl4->fl4_sport = ports[!!reverse];
fl4->fl4_dport = ports[!reverse];
@@ -139,7 +142,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, 
int reverse)
case IPPROTO_ICMP:
if (xprth + 2 < skb->data ||
pskb_may_pull(skb, xprth + 2 - skb->data)) {
-   u8 *icmp = xprth;
+   u8 *icmp;
+
+   xprth = skb_network_header(skb) + iph->ihl * 4;
+   icmp = xprth;
 
fl4->fl4_icmp_type = icmp[0];
fl4->fl4_icmp_code = icmp[1];
@@ -149,7 +155,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, 
int reverse)
case IPPROTO_ESP:
if (xprth + 4 < skb->data ||
pskb_may_pull(skb, xprth + 4 - skb->data)) {
-   __be32 *ehdr = (__be32 *)xprth;
+   __be32 *ehdr;
+
+   xprth = skb_network_header(skb) + iph->ihl * 4;
+   ehdr = (__be32 *)xprth;
 
fl4->fl4_ipsec_spi = ehdr[0];
}
@@ -158,7 +167,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, 
int reverse)
case IPPROTO_AH:
if (xprth + 8 < skb->data ||
pskb_may_pull(skb, xprth + 8 - skb->data)) {
-   __be32 *ah_hdr = (__be32 *)xprth;
+   __be32 *ah_hdr;
+
+   xprth = skb_network_header(skb) + iph->ihl * 4;
+   ah_hdr = (__be32 *)xprth;
 
fl4->fl4_ipsec_spi = ah_hdr[1];
}
@@ -167,7 +179,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, 
int reverse)
case IPPROTO_COMP:
if (xprth + 4 < skb->data ||
pskb_may_pull(skb, xprth + 4 - skb->data)) {
-   __be16 *ipcomp_hdr = (__be16 *)xprth;
+   __be16 *ipcomp_hdr;
+
+   xprth = skb_network_header(skb) + iph->ihl * 4;
+   ipcomp_hdr = (__be16 *)xprth;
 
fl4->fl4_ipsec_spi = 
htonl(ntohs(ipcomp_hdr[1]));
}
@@ -176,8 +191,12 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, 
int reverse)
case IPPROTO_GRE:
if (xprth + 12 < skb->data ||
pskb_may_pull(skb, xprth + 12 - skb->data)) {
-   __be16 *greflags = (__be16 *)xprth;
-   __be32 *gre_hdr = (__be32 *)xprth;
+   __be16 *greflags;
+   __be32 *gre_hdr;
+
+   xprth = skb_network_header(skb) + iph->ihl * 4;
+   greflags = (__be16 *)xprth;
+   gre_hdr = (__be32 *)xprth;
 
if (greflags[0] & GRE_KEY) {
if (greflags[0] & GRE_CSUM)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pull request (net-next): ipsec-next 2015-10-30

2015-10-30 Thread Steffen Klassert
1) The flow cache is limited by the flow cache limit which
   depends on the number of cpus and the xfrm garbage collector
   threshold which is independent of the number of cpus. This
   leads to the fact that on systems with more than 16 cpus
   we hit the xfrm garbage collector limit and refuse new
   allocations, so new flows are dropped. On systems with 16
   or less cpus, we hit the flowcache limit. In this case, we
   shrink the flow cache instead of refusing new flows. 

   We increase the xfrm garbage collector threshold to INT_MAX
   to get the same behaviour, independent of the number of cpus.

2) Fix some unaligned accesses on sparc systems.
   From Sowmini Varadhan.

3) Fix some header checks in _decode_session4. We may call
   pskb_may_pull with a negative value converted to unsigened
   int from pskb_may_pull. This can lead to incorrect policy
   lookups. We fix this by a check of the data pointer position
   before we call pskb_may_pull.

4) Reload skb header pointers after calling pskb_may_pull
   in _decode_session4 as this may change the pointers into
   the packet. 

5) Add a missing statistic counter on inner mode errors.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 8a4683a5e06efda7e1f327213678d4dcafc0d894:

  net: help compiler generate better code in eth_get_headlen (2015-09-28 
22:51:15 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to cb866e3298cd7412503fc7e2c265753c853fab9d:

  xfrm: Increment statistic counter on inner mode error (2015-10-23 07:52:58 
+0200)


Sowmini Varadhan (1):
  xfrm: Fix unaligned access to stats in copy_to_user_state()

Steffen Klassert (4):
  xfrm: Let the flowcache handle its size by default.
  xfrm4: Fix header checks in _decode_session4.
  xfrm4: Reload skb header pointers after calling pskb_may_pull.
  xfrm: Increment statistic counter on inner mode error

 Documentation/networking/ip-sysctl.txt |  6 ++--
 net/ipv4/xfrm4_policy.c| 50 +-
 net/ipv6/xfrm6_policy.c|  2 +-
 net/xfrm/xfrm_input.c  |  4 ++-
 net/xfrm/xfrm_user.c   |  5 +++-
 5 files changed, 49 insertions(+), 18 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] xfrm: dst_entries_init() per-net dst_ops

2015-11-03 Thread Steffen Klassert
On Thu, Oct 29, 2015 at 09:51:16AM -0400, Dan Streetman wrote:
> Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
> templates; their dst_entries counters will never be used.  Move the
> xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
> xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
> and dst_entries_destroy for each net namespace.
> 
> The ipv4 and ipv6 xfrms each create dst_ops template, and perform
> dst_entries_init on the templates.  The template values are copied to each
> net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
> pcpuc_entries field is a percpu counter and cannot be used correctly by
> simply copying it to another object.
> 
> The result of this is a very subtle bug; changes to the dst entries
> counter from one net namespace may sometimes get applied to a different
> net namespace dst entries counter.  This is because of how the percpu
> counter works; it has a main count field as well as a pointer to the
> percpu variables.  Each net namespace maintains its own main count
> variable, but all point to one set of percpu variables.  When any net
> namespace happens to change one of the percpu variables to outside its
> small batch range, its count is moved to the net namespace's main count
> variable.  So with multiple net namespaces operating concurrently, the
> dst_ops entries counter can stray from the actual value that it should
> be; if counts are consistently moved from one net namespace to another
> (which my testing showed is likely), then one net namespace winds up
> with a negative dst_ops count while another winds up with a continually
> increasing count, eventually reaching its gc_thresh limit, which causes
> all new traffic on the net namespace to fail with -ENOBUFS.
> 
> Signed-off-by: Dan Streetman 
> Signed-off-by: Dan Streetman 

Applied to the ipsec tree, thanks Dan!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6

2015-10-19 Thread Steffen Klassert
On Mon, Oct 12, 2015 at 12:49:29PM -0600, David Ahern wrote:
> On 10/9/15 11:27 AM, David Ahern wrote:
> >On 10/9/15 1:17 AM, Steffen Klassert wrote:
> >>>>diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> >>>>index 30caa289c5db..5cedfda4b241 100644
> >>>>--- a/net/ipv6/xfrm6_policy.c
> >>>>+++ b/net/ipv6/xfrm6_policy.c
> >>>>@@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct
> >>>>net *net, int tos, int oif,
> >>>>
> >>>>  memset(, 0, sizeof(fl6));
> >>>>  fl6.flowi6_oif = oif;
> >>>>+fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
> >>>>  memcpy(, daddr, sizeof(fl6.daddr));
> >>>>  if (saddr)
> >>>>  memcpy(, saddr, sizeof(fl6.saddr));
> >>>
> >>>I found that this fix is still not sufficient with the mip6
> >>>(Mobile IPv6) use case.
> >>
> >>It does not even fix the vti case. The behaviour of the vti devices is
> >>the same, with and without the patch.
> >>
> >
> >The attached patch applied to Linus' tree works for me. Currently the
> >above change is not in his tree, so I added it to this patch. Once you
> >confirm that it works for you I'll create the delta-patch for net and
> >send out.
> 
> Steffen: Have you had a chance to try the patch? Does it solve the
> vti6 problem for you?

The delta of your patch and the current net tree fixes the
vti6 problems.

Current net-next works without any changes.

Thanks David!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: Really fix vti6 with oif in dst lookups

2015-10-20 Thread Steffen Klassert
On Mon, Oct 19, 2015 at 08:26:05AM -0700, David Ahern wrote:
> 6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6")
> is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them.
> 
> Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
> Cc: Steffen Klassert <steffen.klass...@secunet.com>
> Signed-off-by: David Ahern <d...@cumulusnetworks.com>
> ---
> This is needed for net in 4.3. 6e28b000825d was mistakenly created on
> top of the VRF patches which had this change. Steffen has confirmed
> VTI6 works with this change.

Acked-by: Steffen Klassert <steffen.klass...@secunet.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/2] xfrm: Fix unaligned access in xfrm_notify_sa() for DELSA

2015-10-21 Thread Steffen Klassert
On Mon, Oct 19, 2015 at 05:23:29PM -0400, Sowmini Varadhan wrote:
> On sparc, deleting established SAs (e.g., by restarting ipsec
> at the peer) results in unaligned access messages via
> xfrm_del_sa -> km_state_notify -> xfrm_send_state_notify().
> Use an aligned pointer to xfrm_usersa_info for this case.
> 
> Signed-off-by: Sowmini Varadhan 
> ---
>  net/xfrm/xfrm_user.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> index a8de9e3..158ef4a 100644
> --- a/net/xfrm/xfrm_user.c
> +++ b/net/xfrm/xfrm_user.c
> @@ -2659,7 +2659,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const 
> struct km_event *c)
>   if (attr == NULL)
>   goto out_free_skb;
>  
> - p = nla_data(attr);
> + p = PTR_ALIGN(nla_data(attr), __alignof__(*p));

Hm, this breaks userspace notifications on 64-bit systems.
Userspace expects this to be aligned to 4, with your patch
it is aligned to 8 on 64-bit.

Without your patch I get the correct notification when deleting a SA:

ip x m

Deleted src 172.16.0.2 dst 172.16.0.1
proto esp spi 0x0002 reqid 2 mode tunnel
replay-window 32
auth-trunc hmac(sha1) 0x31323334353637383930 96
enc cbc(aes) 0x31323334353637383930313233343536
sel src 10.0.0.0/24 dst 192.168.0.0/24

With your patch I get for the same SA:

ip x m

Deleted src 50.0.0.0 dst 0.0.0.0
proto 0 reqid 0 mode transport
replay-window 0 flag decap-dscp
auth-trunc hmac(sha1) 0x31323334353637383930 96
enc cbc(aes) 0x31323334353637383930313233343536
sel src 0.0.0.0/0 dst 0.234.255.255/0 proto igmp

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6

2015-10-13 Thread Steffen Klassert
Hi David.

On Mon, Oct 12, 2015 at 12:49:29PM -0600, David Ahern wrote:
> On 10/9/15 11:27 AM, David Ahern wrote:
> >
> >The attached patch applied to Linus' tree works for me. Currently the
> >above change is not in his tree, so I added it to this patch. Once you
> >confirm that it works for you I'll create the delta-patch for net and
> >send out.
> 
> Steffen: Have you had a chance to try the patch? Does it solve the
> vti6 problem for you?
> 

I'm not at office this week, so not able to do a test
before monday. I let you know as soon as I tried the
patch.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6

2015-10-09 Thread Steffen Klassert
On Fri, Oct 09, 2015 at 03:54:22PM +0900, Hajime Tazaki wrote:
> 
> Hello David,
> 
> At Mon,  5 Oct 2015 08:32:51 -0600,
> David Ahern wrote:
> 
> > 
> > diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> > index 30caa289c5db..5cedfda4b241 100644
> > --- a/net/ipv6/xfrm6_policy.c
> > +++ b/net/ipv6/xfrm6_policy.c
> > @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net 
> > *net, int tos, int oif,
> >  
> > memset(, 0, sizeof(fl6));
> > fl6.flowi6_oif = oif;
> > +   fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
> > memcpy(, daddr, sizeof(fl6.daddr));
> > if (saddr)
> > memcpy(, saddr, sizeof(fl6.saddr));
> 
> I found that this fix is still not sufficient with the mip6
> (Mobile IPv6) use case.

It does not even fix the vti case. The behaviour of the vti devices is
the same, with and without the patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


IPsec maintenance during the next weeks

2015-07-07 Thread Steffen Klassert
David,

I'll be off without mail access for the next two and a half weeks.
Can you please take urgent IPsec patches directly into the net
tree during this time? 

I'll let you know as soon as I'm back.

Thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IPv6 xfrm GSO fragmentation bug

2015-08-31 Thread Steffen Klassert
On Sun, Aug 30, 2015 at 04:24:32PM +0800, Herbert Xu wrote:
> Hi Steffen:
> 
> I received a bug report regarding poor IPComp performance over
> IPv6:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1257952
> 
> It appears to have been caused by
> 
> commit dd767856a36e00b631d65ebc4bb81b19915532d6
> Author: Steffen Klassert <steffen.klass...@secunet.com>
> Date:   Tue Oct 11 01:44:30 2011 +
> 
> xfrm6: Don't call icmpv6_send on local error
> 
> which addded an MTU check without a GSO override.
> 
> Fixing it obviously isn't difficult, but I am wondering why I
> can't find a corresponding patch for IPv4.  Do we need that check
> to be present in xfrm6_output at all? If we do why isn't it needed
> for IPv4 as well?

As far as I remember, this was to catch local message size
errors before __xfrm6_output calls ip6_fragment which would
use icmpv6_send for the error notification. IPv4 does not do
fragmentation in the xfrm4_output functions, so this mtu
check was not needed there.

I think just adding the gso checks should be fine.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IPv6 xfrm GSO fragmentation bug

2015-09-07 Thread Steffen Klassert
On Fri, Sep 04, 2015 at 01:21:06PM +0800, Herbert Xu wrote:
> On Mon, Aug 31, 2015 at 03:35:26PM +0800, Herbert Xu wrote:
> > 
> > I see where the bug came from.  Indeed IPv6 does do fragmentation
> > but only for tunnel mode.  While your patch added a check that also
> > affected transport mode.  So in addition to the GSO fix we should
> > also make the MTU check conditional to tunnel mode.
> 
> Here is the patch:
> 
> ---8<---
> ipv6: Fix IPsec pre-encap fragmentation check
> 
> The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
> packets.  That is, we perform fragmentation pre-encap rather than
> post-encap.
> 
> A check was added later to ensure that proper MTU information is
> passed back for locally generated traffic.  Unfortunately this
> check was performed on all IPsec packets, including transport-mode
> packets.
> 
> What's more, the check failed to take GSO into account.
> 
> The end result is that transport-mode GSO packets get dropped at
> the check.
> 
> This patch fixes it by moving the tunnel mode check forward as well
> as adding the GSO check.
> 
> Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
> Signed-off-by: Herbert Xu 

Applied to the ipsec tree, thanks Herbert!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm6: Fix ICMPv6 and MH header checks in _decode_session6

2015-09-14 Thread Steffen Klassert
On Fri, Sep 11, 2015 at 09:57:20AM +0200, Mathias Krause wrote:
> From: Mathias Krause 
> 
> Ensure there's enough data left prior calling pskb_may_pull(). If
> skb->data was already advanced, we'll call pskb_may_pull() with a
> negative value converted to unsigned int -- leading to a huge
> positive value. That won't matter in practice as pskb_may_pull()
> will likely fail in this case, but it leads to underflow reports on
> kernels handling such kind of over-/underflows, e.g. a PaX enabled
> kernel instrumented with the size_overflow plugin.
> 
> Reported-by: satmd 
> Reported-and-tested-by: Marcin Jurkowski 
> Signed-off-by: Mathias Krause 
> Cc: PaX Team 

Skipping upper layer informations due to a wrong length calculation,
may also leed to incorrect policy lookups. Patch applied to the
ipsec tree, thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: Fix vti use case with oif in dst lookups

2015-09-16 Thread Steffen Klassert
On Tue, Sep 15, 2015 at 03:10:50PM -0700, David Ahern wrote:
> Steffen reported that the recent change to add oif to dst lookups breaks
> the VTI use case. The problem is that with the oif set in the flow struct
> the comparison to the nh_oif is triggered. Fix by splitting the
> FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
> bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
> nh oif (FLOWI_FLAG_SKIP_NH_OIF).
> 
> Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
> 
> Signed-off-by: David Ahern <d...@cumulusnetworks.com>

This works, thanks a lot for the quick fix!

> ---
> IPv6 does not show this problem for me. So no change is added for IPv6.
> If your mileage varies let me know and I'll take another look.

IPv6 works just fine as it is, so no change needed.

Acked-by: Steffen Klassert <steffen.klass...@secunet.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xfrm4_garbage_collect reaching limit

2015-09-16 Thread Steffen Klassert
On Mon, Sep 14, 2015 at 11:14:59PM -0400, Dan Streetman wrote:
> On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
> <steffen.klass...@secunet.com> wrote:
> >
> >> Possibly the
> >> default value of xfrm4_gc_thresh could be set proportional to
> >> num_online_cpus(), but that doesn't help when cpus are onlined after
> >> boot.
> >
> > This could be an option, we could change the xfrm4_gc_thresh value with
> > a cpu notifier callback if more cpus come up after boot.
> 
> the issue there is, if the value is changed by the user, does a cpu
> hotplug reset it back to default...

What about the patch below? With this we are independent of the number
of cpus. It should cover most, if not all usecases.

While we are at it, we could think about increasing the flowcache
percpu limit. This value was choosen back in 2003, so maybe we could
have more than 4k cache entries per cpu these days.


Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.

The xfrm flowcache size is limited by the flowcache limit
(4096 * number of online cpus) and the xfrm garbage collector
threshold (2 * 32768), whatever is reached first. This means
that we can hit the garbage collector limit only on systems
with more than 16 cpus. On such systems we simply refuse
new allocations if we reach the limit, so new flows are dropped.
On syslems with 16 or less cpus, we hit the flowcache limit.
In this case, we shrink the flow cache instead of refusing new
flows.

We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.

The xfrm garbage collector threshold can still be set below
the flowcache limit to reduce the memory usage of the flowcache.

Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 Documentation/networking/ip-sysctl.txt | 6 --
 net/ipv4/xfrm4_policy.c| 2 +-
 net/ipv6/xfrm6_policy.c| 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ebe94f2..260f30b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1199,7 +1199,8 @@ tag - INTEGER
 xfrm4_gc_thresh - INTEGER
The threshold at which we will start garbage collecting for IPv4
destination cache entries.  At twice this value the system will
-   refuse new allocations.
+   refuse new allocations. The value must be set below the flowcache
+   limit (4096 * number of online cpus) to take effect.
 
 igmp_link_local_mcast_reports - BOOLEAN
Enable IGMP reports for link local multicast groups in the
@@ -1645,7 +1646,8 @@ ratelimit - INTEGER
 xfrm6_gc_thresh - INTEGER
The threshold at which we will start garbage collecting for IPv6
destination cache entries.  At twice this value the system will
-   refuse new allocations.
+   refuse new allocations. The value must be set below the flowcache
+   limit (4096 * number of online cpus) to take effect.
 
 
 IPv6 Update by:
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 1e06c4f..3dffc73 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
.destroy =  xfrm4_dst_destroy,
.ifdown =   xfrm4_dst_ifdown,
.local_out =__ip_local_out,
-   .gc_thresh =32768,
+   .gc_thresh =INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index f10b940..e9af39a 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -289,7 +289,7 @@ static struct dst_ops xfrm6_dst_ops = {
.destroy =  xfrm6_dst_destroy,
.ifdown =   xfrm6_dst_ifdown,
.local_out =__ip6_local_out,
-   .gc_thresh =32768,
+   .gc_thresh =INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xfrm4_garbage_collect reaching limit

2015-09-11 Thread Steffen Klassert
Hi Dan.

On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote:
> Hi Steffen,
> 
> I've been working with Jay on a ipsec issue, which I believe he
> discussed with you.  

Yes, we talked about this at the LPC.

> In this case the xfrm4_garbage_collect is
> returning error because the number of xfrm4 dst entries has exceeded
> twice the gc_thresh, which causes new allocations of xfrm4 dst objects
> to fail, thus making the ipsec connection unusable (until dst objects
> are removed/freed).
> 
> The main reason the count gets to the limit is because the
> xfrm4_policy_afinfo.garbage_collect function - which points to
> flow_cache_flush (indirectly) - doesn't actually guarantee any xfrm4
> dst will get cleaned up, it only cleans up unused entries.
> 
> The flow cache hashtable size limit watermark does restrict how many
> flow cache entries exist (by shrinking the per-cpu hashtable once it
> has 4k entries), and therefore indirectly controls the total number of
> xfrm4 dst objects.  However, there's a mismatch between the default
> xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
> objects) - and the flow cache hashtable limit of 4k objects per cpu.
> Any system with 16 or less cpus will have a total limit of 64k (or
> less) flow cache entries, so the 64k xfrm4 dst entry limit will never
> be reached.  However for any system with more than 16 cpus, the flow
> cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst
> allocation can fail, rendering the ipsec connection unusable.
> 
> The most obvious solution is for the system admin to increase the
> xfrm4_gc_thresh value, although it's not really an obvious solution to
> the end-user what value they should set it to :-) 

Yes, a static gc threshold is always wrong for some workloads. So
the user needs to adjust it to his needs, even if the right value
is not obvious.

> Possibly the
> default value of xfrm4_gc_thresh could be set proportional to
> num_online_cpus(), but that doesn't help when cpus are onlined after
> boot.  

This could be an option, we could change the xfrm4_gc_thresh value with
a cpu notifier callback if more cpus come up after boot.

> Also, a warning message indicating the xfrm4_gc_thresh limit
> was reached, and a suggestion to increase the limit, may help anyone
> who hits the issue.
> 
> I'm not sure if something more aggressive is appropriate, like
> removing active entries during garbage collection. 

It would not make too much sense to push an active flow out of the
fastpath just to add some other flow. If the number of active
entries is to high, there is no other option than increasing the
gc threshold.

You could try to reduce the number of active entries by shutting
down stale security associations frequently.

> Or, removing the
> failure condition from xfrm4_garbage_collect so xfrm4 dst_ops can
> always be allocated,

This would open doors for DOS attacks, we can't do this.

> or just increasing it from gc_thresh * 2 up to *
> 4 or more.

This would just defer the problem, so not a real solution.

That said, whatever we do, we just paper over the real problem,
that is the flowcache itself. Everything that need this kind
of garbage collecting is fundamentally broken. But as long as
nobody volunteers to work on a replacement, we have to live
with this situation somehow.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfrm: Add oif to dst lookups

2015-09-15 Thread Steffen Klassert
On Mon, Aug 10, 2015 at 04:58:11PM -0600, David Ahern wrote:
> Rules can be installed that direct route lookups to specific tables based
> on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
> struct and passed to the resolver routines.
> 
> Signed-off-by: David Ahern 

David, this change broke vti tunnels.

> @@ -1690,8 +1694,8 @@ static struct dst_entry *xfrm_bundle_create(struct 
> xfrm_policy *policy,
>  
>   if (xfrm[i]->props.mode != XFRM_MODE_TRANSPORT) {
>   family = xfrm[i]->props.family;
> - dst = xfrm_dst_lookup(xfrm[i], tos, , ,
> -   family);
> + dst = xfrm_dst_lookup(xfrm[i], tos, fl->flowi_oif,
> +   , , family);

Passing the original output interface to xfrm_dst_lookup will generate
a routing loop whenever the original output interface is not identical
to the tunnel endpoint, like it is with vti. We can not ask for a route
through a specific interface here. This is the lookup for the tunnel
endpoints, so it must return a route through the local tunnel endpoint
device.

I don't know how you are going to use this with your vrf changes, so
I'm not sure how to fix this in a way that it works with vrf. Please
look into this.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xfrm4_garbage_collect reaching limit

2015-09-17 Thread Steffen Klassert
On Thu, Sep 17, 2015 at 09:23:35PM -0700, David Miller wrote:
> From: Steffen Klassert <steffen.klass...@secunet.com>
> Date: Wed, 16 Sep 2015 10:45:41 +0200
> 
> > index 1e06c4f..3dffc73 100644
> > --- a/net/ipv4/xfrm4_policy.c
> > +++ b/net/ipv4/xfrm4_policy.c
> > @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
> > .destroy =  xfrm4_dst_destroy,
> > .ifdown =   xfrm4_dst_ifdown,
> > .local_out =__ip_local_out,
> > -   .gc_thresh =32768,
> > +   .gc_thresh =INT_MAX,
> >  };
> >  
> >  static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
> 
> This means the dst_ops->gc() for xfrm will never be invoked.
> 
> Is that intentional?

Yes. This is already the case on systems with less than 8 cpus
because the flowcache is limited to 4096 entries per cpu. The
percpu flowcache shrinks itself to 'low_watermark' enrires if
it hits the percpu limit.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function

2015-09-30 Thread Steffen Klassert
On Tue, Sep 29, 2015 at 04:58:46PM -0600, David Ahern wrote:
> Hi Tom:
> 
> On 9/29/15 4:17 PM, Tom Herbert wrote:
> >This patch adds xfrm6_xlat_addr which is called in the data path
> >to perform address translation (primarily for the receive path). Modules
> >may register their own callback to perform a translation-- this
> >registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del.
> >xfrm6_xlat_addr allows translation of addresses for an sk_buff.
> 
> 
> Seems like a stretch to lump this into xfrms. You have a separate
> genl based config as opposed to the netlink xfrm API and you are
> calling the xlat_addr function directly in ip6_rcv as opposed to via
> some policy with dst_ops driven redirection. Why call this a xfrm?

I have to agree here. We have policies and states to do the lookups
and to describe the transformation. Just adding a callback to do this
in a different way does not integrate well into xfrm.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] xfrm: Fix state threshold configuration from userspace

2015-09-30 Thread Steffen Klassert
On Tue, Sep 29, 2015 at 11:25:08AM +0200, Michael Rossberg wrote:
> Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
> timer (XFRMA_ETIMER_THRESH) of a state without having to set other
> attributes like replay counter and byte lifetime. Changing these other
> values while traffic flows will break the state.
> 
> Signed-off-by: Michael Rossberg 

Applied to the ipsec tree, thanks Michael!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv

2015-09-30 Thread Steffen Klassert
On Tue, Sep 29, 2015 at 03:17:22PM -0700, Tom Herbert wrote:
> Call before performing NF_HOOK and routing in order to perform address
> translation in the receive path.
> 
> Signed-off-by: Tom Herbert 
> ---
>  net/ipv6/ip6_input.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 9075acf..06dac55 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, 
> struct packet_type *pt
>   /* Must drop socket now because of tproxy. */
>   skb_orphan(skb);
>  
> + /* Translate destination address before routing */
> + xfrm6_xlat_addr(skb);
> +

This shows that xfrm is not the right place to add this. The existing
xfrm hooks are located at the same place as your current LWT hooks are.

You could use the existing xfrm hooks similar to xfrm tunnel modes.
This reinserts the transformed packet back into layer2, but I guess
this is not what you want.

I'm currently paying with a GRO codepath for IPsec to get the
packets transformed early. If you can do your address translation
that early, it could be an option too. This clearly depends on
enabled GRO at the receiving device, but you would still have
the LWT hook as a fallback.

>   return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
>  net, NULL, skb, dev, NULL,
>  ip6_rcv_finish);

Or, try to use the netfilter hook that seems to be at the right
place at least.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xfrm4_garbage_collect reaching limit

2015-09-30 Thread Steffen Klassert
On Mon, Sep 21, 2015 at 10:51:11AM -0400, Dan Streetman wrote:
> On Fri, Sep 18, 2015 at 1:00 AM, Dan Streetman <ddstr...@ieee.org> wrote:
> > On Wed, Sep 16, 2015 at 4:45 AM, Steffen Klassert
> > <steffen.klass...@secunet.com> wrote:
> >>
> >> What about the patch below? With this we are independent of the number
> >> of cpus. It should cover most, if not all usecases.
> >
> > yep that works, thanks!  I'll give it a test also, but I don't see how
> > it would fail.
> 
> Yep, on a test setup that previously failed within several hours, it
> ran over the weekend successfully.  Thanks!
> 
> Tested-by: Dan Streetman <dan.street...@canonical.com>
> 
> >
> >>
> >> While we are at it, we could think about increasing the flowcache
> >> percpu limit. This value was choosen back in 2003, so maybe we could
> >> have more than 4k cache entries per cpu these days.
> >>
> >>
> >> Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.
> >>
> >> The xfrm flowcache size is limited by the flowcache limit
> >> (4096 * number of online cpus) and the xfrm garbage collector
> >> threshold (2 * 32768), whatever is reached first. This means
> >> that we can hit the garbage collector limit only on systems
> >> with more than 16 cpus. On such systems we simply refuse
> >> new allocations if we reach the limit, so new flows are dropped.
> >> On syslems with 16 or less cpus, we hit the flowcache limit.
> >> In this case, we shrink the flow cache instead of refusing new
> >> flows.
> >>
> >> We increase the xfrm garbage collector threshold to INT_MAX
> >> to get the same behaviour, independent of the number of cpus.
> >>
> >> The xfrm garbage collector threshold can still be set below
> >> the flowcache limit to reduce the memory usage of the flowcache.
> >>
> >> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>

I've applied this to ipsec-next now. It can be considered as a fix too,
but we still can tweak the value via the sysctl in the meantime. So
it is better to test it a bit longer before it hits the mainline.

Thanks a lot for your work Dan!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IPsec workshop/BoF at netdev1.1?

2015-12-08 Thread Steffen Klassert
On Mon, Dec 07, 2015 at 08:20:01AM -0800, Eric Dumazet wrote:
> On Mon, 2015-12-07 at 13:00 +0100, Steffen Klassert wrote:
> > Is there any interest in doing an IPsec workshop/BoF at netdev1.1?
> > 
> > This mail is to probe if we can gather enough discussion topics to run
> > such a workshop/BoF. So if someone is interested to attend and/or has a
> > related discussion topic, please let me know.
> > 
> > My current topics:
> > 
> > - Adding a software GRO/GSO codepath for IPsec. I have working example code,
> >   a RFC version could be ready before the netdev conference.
> 
> 
> - Remove rwlock usage (xfrm_policy_lock) from fast path and szitch to
> RCU ?

Good point. I add it to the topic list, thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ipsec impact on performance

2015-12-08 Thread Steffen Klassert
On Mon, Dec 07, 2015 at 06:27:48AM -0500, Sowmini Varadhan wrote:
> On (12/07/15 09:40), Steffen Klassert wrote:
> > 
> > I've pushed it to
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/klassert/linux-stk.git/log/?h=net-next-ipsec-offload
> > 
> > It is just example code, nothing that I would show usually.
> > But you asked for it, so here is it :)
> 
> that's fine, I dont expect more at this point, just want to 
> test-drive it, and see how it compares to my approach. 

Would be nice if you could share the results. Comments are
welcome too, of course.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ipsec impact on performance

2015-12-03 Thread Steffen Klassert
On Wed, Dec 02, 2015 at 07:05:38AM -0500, Sowmini Varadhan wrote:
> On (12/02/15 07:53), Steffen Klassert wrote:
> > 
> > I'm currently working on a GRO/GSO codepath for IPsec too. The GRO part
> > works already. I decapsulate/decrypt the packets on layer2 with a esp GRO
> > callback function and reinject them into napi_gro_receive(). So in case
> > the decapsulated packet is TCP, GRO can aggregate big packets.
> 
> Would you be able to share your patch with me? I'd like to give that a try
> just to get preliminary numbers (and I could massage it as needed
> for transport mode too).

I've got the final bits to work today, I can do async crypto now.
I can push the patches to a public tree after some polishing.
But I have to warn, it has still bugs and no usefull commit messages.

I did a first test with forwaring esp in tunnel mode. The crypto
algorithm I used was:

pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))

Result:

iperf -c 10.0.0.12 -t 60

Client connecting to 10.0.0.12, TCP port 5001
TCP window size: 45.0 KByte (default)

[  3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-60.0 sec  32.8 GBytes  4.70 Gbits/sec

I provide more informatios as soon as the code is available.

> 
> > Another thing, I thought about setting up an IPsec BoF/workshop at
> > netdev1.1. My main topic is GRO/GSO for IPsec. I'll send out a mail
> > to the list later this week to see if there is enough interest and
> > maybe some additional topics.
> 
> Sounds like an excellent idea. I'm certainly interested.

Great, than we are at least two :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ipsec impact on performance

2015-12-03 Thread Steffen Klassert
On Thu, Dec 03, 2015 at 06:38:20AM -0500, Sowmini Varadhan wrote:
> On (12/03/15 09:45), Steffen Klassert wrote:
> > pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))
> > 
> > Result:
> > 
> > iperf -c 10.0.0.12 -t 60
> > 
> > Client connecting to 10.0.0.12, TCP port 5001
> > TCP window size: 45.0 KByte (default)
> > 
> > [  3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001
> > [ ID] Interval   Transfer Bandwidth
> > [  3]  0.0-60.0 sec  32.8 GBytes  4.70 Gbits/sec
> > 
> > I provide more informatios as soon as the code is available.
> 
> that's pretty good compared to the baseline. 
> I'd like to try out our patches, when they are ready.
> 
> I think you may get some more improvement if you manually pin the irq 
> and iperf to specific cpus (at least that was my observation for transp 
> mode) 

I do that already. I have dedicated crypto and IO cpus, 2 cpus
do networking IO and 4 cpus do crypto (parallelized with pcrypt).

The bottleneck is now the cpu that does the TX path (checksumming
of the GSO segments).

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   7   8   9   >