Re: NETDEV WATCHDOG: eth2: transmit timed out with 3c905C-TX
On Tue, May 23, 2006 at 03:36:35PM +0200, Marco Berizzi wrote: Steffen Klassert wrote: On Wed, Apr 05, 2006 at 06:33:18PM +0200, Marco Berizzi wrote: Hello everybody. I'm getting these errors (with packet/connectivity loss) on our firewall after I have plugged in a 3c905C nic. Linux is Slackware 10.2 with vanilla 2.6.16.1. Hints? PS: I have temporary resolved the problem running 'ifconfig eth2 down' and 'ifconfig eth2 up' Apr 5 17:47:07 Teti kernel: eth2: Resetting the Tx ring pointer. Apr 5 17:47:47 Teti last message repeated 4 times Apr 5 17:48:57 Teti last message repeated 7 times Apr 5 17:49:57 Teti last message repeated 6 times Apr 5 17:50:57 Teti last message repeated 6 times Apr 5 17:47:07 Teti kernel: NETDEV WATCHDOG: eth2: transmit timed out There were some problems of this kind with 10base2 networks in 2.6.16. Could you please try whether 2.6.17-rc1 has this problems too? [Sorry for the very huge delay, but after 2.6.17-rc1 upgrade xfs filesystem crashed]. Same problem here with 2.6.17-rc3-git18. Running ifconfig eth2 down and ifconfig eth2 up resolves the problem for a while. Actually I have not really an idea what is going on here, but increasing the debug level could give some more informations. Setting debug=4 is a good start. Did you try older kernel versions too? Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NETDEV WATCHDOG: eth2: transmit timed out with 3c905C-TX
On Tue, Jun 06, 2006 at 11:12:45AM +0200, Marco Berizzi wrote: I have moved this damn pc from the remote to my site and I have placed it in production environment with 2.6.17-rc5 No problem after 24 hours (on the remote side the problem was arising after a couple of hours). I have modprobed 3c59x with debug=4. I see only these kind of messages (are they fine?): Jun 5 14:31:25 Pleiadi kernel: eth2: vortex_error(), status=0x8081 Jun 5 14:31:40 Pleiadi last message repeated 3 times Jun 5 14:31:47 Pleiadi kernel: eth2: vortex_error(), status=0x8281 Jun 5 14:31:47 Pleiadi kernel: eth2: Media selection timer tick happened, Autonegotiate. Jun 5 14:31:47 Pleiadi kernel: dev-watchdog_timeo=1250 Jun 5 14:31:47 Pleiadi kernel: eth2: MII transceiver has status 782d. Jun 5 14:31:47 Pleiadi kernel: eth2: Media selection timer finished, Autonegotiate. Jun 5 14:31:51 Pleiadi kernel: eth2: vortex_error(), status=0x8081 Jun 5 14:32:03 Pleiadi last message repeated 2 times Jun 5 14:32:10 Pleiadi kernel: eth2: vortex_error(), status=0x8481 Jun 5 14:32:15 Pleiadi kernel: eth2: vortex_error(), status=0x8081 Jun 5 14:32:46 Pleiadi last message repeated 7 times This is ok, just normal operation of the NIC. The only relevant change, between the remote and my site, is a different ethernet switch where the 3c905C is connected to. Could it be an issue? Well, I think it can. Problems with a switch are mostly related to the autonegotiation of the media type and full/half-duplex. But in your case the autonegotiation seems to be ok (mii-tool/ethtool output). More specific information you can get with the mii-diag and vortex-diag tools. You can find these tools at http://www.scyld.com/ethercard_diag.html There are problems with a cisco switch documented in Documentation/networking/vortex.txt for example. Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] drivers/net/3c59x: notice carrier a little sooner
Did you give the patch a try? Actually I have no possibility to test, but I think that netif_carrier_{on,off} still does not work proper. The timer function does just nothing if vp-medialock is set. Steffen On Thu, Jan 12, 2006 at 01:29:23PM -0500, Dan Williams wrote: Hi, This patch attempts to notice carrier 'on' state a little sooner. Since the carrier watchdog only fires in really, really long intervals (like, 60s in most cases), it takes up to 60s to notice that a cable has been plugged/unplugged. This patch fires the carrier watchdog if (1) the carrier is off, and (2) packets have been successfully received. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] drivers/net/3c59x: notice carrier a little sooner
On Thu, Jan 12, 2006 at 03:02:25PM -0500, Dan Williams wrote: On Thu, 2006-01-12 at 20:57 +0100, Steffen Klassert wrote: Did you give the patch a try? Actually I have no possibility to test, but I think that netif_carrier_{on,off} still does not work proper. The timer function does just nothing if vp-medialock is set. Worked for me with 3c905 card in a Dell Latitude C610. I just cp'd the code into the vortex, so maybe vortex is different than boomerang. Anyway, shouldn't the timer be run a bit later after the *_rx() function? Dan Of course the timer function should run in both cases. If it works with the boomerang then it should work with vortex too. I just remember that I had some problems with netif_carrier_ok() when I tried to use ethtool_op_get_link about a year ago. Perhaps I'm not up to date and it works in default case. I think a problem should appear at least if one forces a certain media type. I still have a patch in queue to improve usage of netif_carrier_{on,off} but I had no possibility to test yet, so I did not send. Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] 3c59x: improve usage of netif_carrier_{on,off}
... I still have a patch in queue to improve usage of netif_carrier_{on,off} but I had no possibility to test yet, so I did not send. I'd be happy to test it out if you'd like. Dan Hi Dan, here is the patch for testing. The patch _should_ do the following: 1. Set the polling intervall for media changes to 5 seconds if link is down and 60 seconds if link is up. 2. Handle netif_carrier_{on,of} and check for media changes in proper way by using mii_check_media() in the mii case. 3. Handle netif_carrier_{on,of} also if media is forced to 10baseT/100baseTx. 4. Use ethtool_op_get_link instead of vortex_get_link. So it is possible to test with ethtool. The patch compiles, but as I told it is fairly untested. Please let me know the results of your tests. Thanks in advance, Steffen Signed-off-by: Steffen Klassert [EMAIL PROTECTED] --- vanilla-2.6.15/drivers/net/3c59x.c 2006-01-03 04:21:10.0 +0100 +++ linux-2.6.15-sk2/drivers/net/3c59x.c2006-01-13 16:15:28.0 +0100 @@ -789,7 +789,7 @@ int options;/* User-settable misc. driver options. */ unsigned int media_override:4, /* Passed-in media type. */ default_media:4,/* Read from the EEPROM/Wn3_Config. */ - full_duplex:1, force_fd:1, autoselect:1, + full_duplex:1, autoselect:1, bus_master:1, /* Vortex can only do a fragment bus-m. */ full_bus_master_tx:1, full_bus_master_rx:2, /* Boomerang */ flow_ctrl:1,/* Use 802.3x flow control (PAUSE only) */ @@ -1326,7 +1326,7 @@ vp-enable_wol = 1; } - vp-force_fd = vp-full_duplex; + vp-mii.force_media = vp-full_duplex; vp-options = option; /* Read the station address from the EEPROM. */ EL3WINDOW(0); @@ -1616,6 +1616,48 @@ } static void +vortex_set_duplex(struct net_device *dev) +{ + int mii_reg1, mii_reg5; + struct vortex_private *vp = netdev_priv(dev); + long *ioaddr = vp-ioaddr; + + EL3WINDOW(4); + mii_reg1 = mdio_read(dev, vp-phys[0], 1); + mii_reg5 = mdio_read(dev, vp-phys[0], 5); + vp-partner_flow_ctrl = ((mii_reg5 0x0400) != 0); + EL3WINDOW(3); + if (vortex_debug 1) + printk(KERN_INFO %s: MII #%d status %4.4x, link partner capability %4.4x, + info1 %04x, setting %s-duplex.\n, + dev-name, vp-phys[0], + mii_reg1, mii_reg5, + vp-info1, ((vp-info1 0x8000) || vp-full_duplex) ? full : half); + /* Set the full-duplex bit. */ + iowrite16( ((vp-info1 0x8000) || vp-full_duplex ? 0x20 : 0) | + (vp-large_frames ? 0x40 : 0) | + ((vp-full_duplex vp-flow_ctrl vp-partner_flow_ctrl) ? 0x100 : 0), + ioaddr + Wn3_MAC_Ctrl); + /* AKPM: bug: should reset Tx and Rx after setting Duplex. Page 180 */ +} + +static void +vortex_check_media(struct net_device *dev, unsigned int init) +{ + struct vortex_private *vp = netdev_priv(dev); + unsigned int ok_to_print = 0; + + if (vortex_debug 3) + ok_to_print = 1; + + if (mii_check_media(vp-mii, ok_to_print, init)) { + vp-full_duplex = vp-mii.full_duplex; + vortex_set_duplex(dev); + } else if (init) + vortex_set_duplex(dev); +} + +static void vortex_up(struct net_device *dev) { struct vortex_private *vp = netdev_priv(dev); @@ -1675,55 +1717,15 @@ printk(KERN_DEBUG %s: Initial media type %s.\n, dev-name, media_tbl[dev-if_port].name); - vp-full_duplex = vp-force_fd; + vp-full_duplex = vp-mii.force_media; config = BFINS(config, dev-if_port, 20, 4); if (vortex_debug 6) printk(KERN_DEBUG vortex_up(): writing 0x%x to InternalConfig\n, config); iowrite32(config, ioaddr + Wn3_Config); - if (dev-if_port == XCVR_MII || dev-if_port == XCVR_NWAY) { - int mii_reg1, mii_reg5; - EL3WINDOW(4); - /* Read BMSR (reg1) only to clear old status. */ - mii_reg1 = mdio_read(dev, vp-phys[0], MII_BMSR); - mii_reg5 = mdio_read(dev, vp-phys[0], MII_LPA); - if (mii_reg5 == 0x || mii_reg5 == 0x) { - netif_carrier_off(dev); /* No MII device or no link partner report */ - } else { - mii_reg5 = vp-advertising; - if ((mii_reg5 0x0100) != 0/* 100baseTx-FD */ -|| (mii_reg5 0x00C0) == 0x0040) /* 10T-FD, but not 100-HD */ - vp-full_duplex
[PATCH] 3c59x: collision statistic fix
Count the total number of packets with collisions during transmission in vp-stats.collisions. Signed-off-by: Steffen Klassert [EMAIL PROTECTED] --- vanilla-2.6.15/drivers/net/3c59x.c 2006-01-03 04:21:10.0 +0100 +++ linux-2.6.15-sk/drivers/net/3c59x.c 2006-01-14 17:54:16.0 +0100 @@ -754,7 +754,9 @@ struct vortex_extra_stats { unsigned long tx_deferred; + unsigned long tx_max_collisions; unsigned long tx_multiple_collisions; +unsigned long tx_single_collisions; unsigned long rx_bad_ssd; }; @@ -863,12 +865,14 @@ const char str[ETH_GSTRING_LEN]; } ethtool_stats_keys[] = { { tx_deferred }, + { tx_max_collisions }, { tx_multiple_collisions }, + { tx_single_collisions }, { rx_bad_ssd }, }; /* number of ETHTOOL_GSTATS u64's */ -#define VORTEX_NUM_STATS 3 +#define VORTEX_NUM_STATS5 static int vortex_probe1(struct device *gendev, void __iomem *ioaddr, int irq, int chip_idx, int card_idx); @@ -2108,9 +2112,12 @@ iowrite8(0, ioaddr + TxStatus); if (tx_status 0x30) { /* txJabber or txUnderrun */ do_tx_reset = 1; - } else if ((tx_status 0x08) (vp-drv_flags MAX_COLLISION_RESET)) { /* maxCollisions */ - do_tx_reset = 1; - reset_mask = 0x0108;/* Reset interface logic, but not download logic */ + } else if (tx_status 0x08) { /* maxCollisions */ + vp-xstats.tx_max_collisions++; + if (vp-drv_flags MAX_COLLISION_RESET) { + do_tx_reset = 1; + reset_mask = 0x0108;/* Reset interface logic, but not download logic */ + } } else {/* Merely re-enable the transmitter. */ iowrite16(TxEnable, ioaddr + EL3_CMD); } @@ -2926,7 +2933,6 @@ EL3WINDOW(6); vp-stats.tx_carrier_errors += ioread8(ioaddr + 0); vp-stats.tx_heartbeat_errors += ioread8(ioaddr + 1); - vp-stats.collisions+= ioread8(ioaddr + 3); vp-stats.tx_window_errors += ioread8(ioaddr + 4); vp-stats.rx_fifo_errors+= ioread8(ioaddr + 5); vp-stats.tx_packets+= ioread8(ioaddr + 6); @@ -2939,10 +2945,15 @@ vp-stats.tx_bytes += ioread16(ioaddr + 12); /* Extra stats for get_ethtool_stats() */ vp-xstats.tx_multiple_collisions += ioread8(ioaddr + 2); + vp-xstats.tx_single_collisions += ioread8(ioaddr + 3); vp-xstats.tx_deferred += ioread8(ioaddr + 8); EL3WINDOW(4); vp-xstats.rx_bad_ssd += ioread8(ioaddr + 12); + vp-stats.collisions = vp-xstats.tx_multiple_collisions + + vp-xstats.tx_single_collisions + + vp-xstats.tx_max_collisions; + { u8 up = ioread8(ioaddr + 13); vp-stats.rx_bytes += (up 0x0f) 16; @@ -3036,8 +3047,10 @@ spin_unlock_irqrestore(vp-lock, flags); data[0] = vp-xstats.tx_deferred; - data[1] = vp-xstats.tx_multiple_collisions; - data[2] = vp-xstats.rx_bad_ssd; + data[1] = vp-xstats.tx_max_collisions; + data[2] = vp-xstats.tx_multiple_collisions; + data[3] = vp-xstats.tx_single_collisions; + data[4] = vp-xstats.rx_bad_ssd; } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] 3c59x: improve usage of netif_carrier_{on,off}
On Mon, Jan 16, 2006 at 02:43:30PM -0500, Dan Williams wrote: ... The patch appears to work correctly and does notice links quite a bit sooner. The only issue I noticed was that if no cable is plugged in, it starts off with the carrier on (/sys/class/net/eth0/carrier == 1) but a second later switches the carrier off. How do I track that down? Dan The patch should not affect the set up of the NIC that early. vortex_up() calls netif_carrier_off() before the patch interfere. Perhaps you can see this behavior before the old status is cleared by reading MII_BMSR register. How is it without the patch? Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST] mii: check carrier state even when force_media == 1
On Thu, Feb 09, 2006 at 05:19:45PM -0500, John W. Linville wrote: Drivers using mii_check_media (via-rhine in particular) and also forcing link parameters with ethtool can reach a state where the link goes down and never comes back up. This is because mii_check_media short-circuits early if mii-force_media != 0. This was discussed in a couple of past threads, one of which is available here: http://www.ussg.iu.edu/hypermail/linux/kernel/0508.3/0670.html The patch moves the force_media check to below the carrier status check.This allows the link state to show correctly, while avoiding the check of link parameters. I sent almost the same patch because of the same reasons about a year ago, see http://oss.sgi.com/projects/netdev/archive/2005-02/msg00648.html so I would vote for this patch too. Acked-by: Steffen Klassert [EMAIL PROTECTED] Signed-off-by: John W. Linville [EMAIL PROTECTED] --- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] 3c59x: fix networking for 10base2 NICs
Fix broken networking for older 10base2 NICs. Signed-off-by: Steffen Klassert [EMAIL PROTECTED] --- linux-2.6.16-git12/drivers/net/3c59x.c 2006-03-30 14:16:23.0 +0200 +++ linux-2.6.16-git12-sk/drivers/net/3c59x.c 2006-03-30 15:27:13.0 +0200 @@ -788,7 +788,7 @@ int options;/* User-settable misc. driver options. */ unsigned int media_override:4, /* Passed-in media type. */ default_media:4,/* Read from the EEPROM/Wn3_Config. */ - full_duplex:1, force_fd:1, autoselect:1, + full_duplex:1, autoselect:1, bus_master:1, /* Vortex can only do a fragment bus-m. */ full_bus_master_tx:1, full_bus_master_rx:2, /* Boomerang */ flow_ctrl:1,/* Use 802.3x flow control (PAUSE only) */ @@ -1633,12 +1633,6 @@ ((vp-full_duplex vp-flow_ctrl vp-partner_flow_ctrl) ? 0x100 : 0), ioaddr + Wn3_MAC_Ctrl); - - issue_and_wait(dev, TxReset); - /* -* Don't reset the PHY - that upsets autonegotiation during DHCP operations. -*/ - issue_and_wait(dev, RxReset|0x04); } static void vortex_check_media(struct net_device *dev, unsigned int init) @@ -1663,7 +1657,7 @@ struct vortex_private *vp = netdev_priv(dev); void __iomem *ioaddr = vp-ioaddr; unsigned int config; - int i; + int i, mii_reg1, mii_reg5; if (VORTEX_PCI(vp)) { pci_set_power_state(VORTEX_PCI(vp), PCI_D0);/* Go active */ @@ -1723,14 +1717,23 @@ printk(KERN_DEBUG vortex_up(): writing 0x%x to InternalConfig\n, config); iowrite32(config, ioaddr + Wn3_Config); - netif_carrier_off(dev); if (dev-if_port == XCVR_MII || dev-if_port == XCVR_NWAY) { EL3WINDOW(4); + mii_reg1 = mdio_read(dev, vp-phys[0], MII_BMSR); + mii_reg5 = mdio_read(dev, vp-phys[0], MII_LPA); + vp-partner_flow_ctrl = ((mii_reg5 0x0400) != 0); + vortex_check_media(dev, 1); } else vortex_set_duplex(dev); + issue_and_wait(dev, TxReset); + /* +* Don't reset the PHY - that upsets autonegotiation during DHCP operations. +*/ + issue_and_wait(dev, RxReset|0x04); + iowrite16(SetStatusEnb | 0x00, ioaddr + EL3_CMD); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NETDEV WATCHDOG: eth2: transmit timed out with 3c905C-TX
On Wed, Apr 05, 2006 at 06:33:18PM +0200, Marco Berizzi wrote: Hello everybody. I'm getting these errors (with packet/connectivity loss) on our firewall after I have plugged in a 3c905C nic. Linux is Slackware 10.2 with vanilla 2.6.16.1. Hints? PS: I have temporary resolved the problem running 'ifconfig eth2 down' and 'ifconfig eth2 up' Apr 5 17:47:07 Teti kernel: eth2: Resetting the Tx ring pointer. Apr 5 17:47:47 Teti last message repeated 4 times Apr 5 17:48:57 Teti last message repeated 7 times Apr 5 17:49:57 Teti last message repeated 6 times Apr 5 17:50:57 Teti last message repeated 6 times Apr 5 17:47:07 Teti kernel: NETDEV WATCHDOG: eth2: transmit timed out There were some problems of this kind with 10base2 networks in 2.6.16. Could you please try whether 2.6.17-rc1 has this problems too? Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] 3c59x: fix duplex configuration
A special sequence of ifconfig up/down and plug/unplug the cable can break the duplex configuration of the driver. Setting vp-mii.full_duplex = vp-full_duplex in vortex_up should fix this. Addresses Bug 8575 3c59x duplex configuration broken http://bugzilla.kernel.org/show_bug.cgi?id=8575 Cc: Martin Buck [EMAIL PROTECTED] Signed-off-by: Steffen Klassert [EMAIL PROTECTED] --- drivers/net/3c59x.c |1 + 1 file changed, 1 insertion(+) --- linux-2.6.23-rc2.orig/drivers/net/3c59x.c +++ linux-2.6.23-rc2/drivers/net/3c59x.c @@ -1555,6 +1555,7 @@ vortex_up(struct net_device *dev) mii_reg1 = mdio_read(dev, vp-phys[0], MII_BMSR); mii_reg5 = mdio_read(dev, vp-phys[0], MII_LPA); vp-partner_flow_ctrl = ((mii_reg5 0x0400) != 0); + vp-mii.full_duplex = vp-full_duplex; vortex_check_media(dev, 1); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Add 3c59x maintainer
Add 3c59x maintainer. Signed-off-by: Steffen Klassert [EMAIL PROTECTED] --- MAINTAINERS |6 ++ 1 file changed, 6 insertions(+) --- linux-2.6.23-rc2.orig/MAINTAINERS +++ linux-2.6.23-rc2/MAINTAINERS @@ -97,6 +97,12 @@ M: [EMAIL PROTECTED] L: netdev@vger.kernel.org S: Maintained +3C59X NETWORK DRIVER +P: Steffen Klassert +M: [EMAIL PROTECTED] +L: netdev@vger.kernel.org +S: Maintained + 3CR990 NETWORK DRIVER P: David Dillow M: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 08/18] 3c59x: check return of pci_enable_device()
On Wed, Aug 15, 2007 at 06:30:00PM +0200, Steffen Klassert wrote: On Tue, Aug 14, 2007 at 10:54:32AM +0100, Mark Hindley wrote: On Tue, Aug 14, 2007 at 01:33:26AM -0400, Jeff Garzik wrote: I would strongly prefer that vortex_up return a value, since all the important callers of this function can themselves return an error back to the system. we can definitely return a meaningful return value here, if pci_enable_device() fails, and I would rather not apply a patch that fails to propagate a serious condition (pci_enable_device failure is indeed serious) when it is possible to do so OK. Any comments on this revised version? I have only ignored the return of vortex_up in vortex_error. It is not immediately clear what to do if vortex_up still fails there after a pci reset. Mark diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c index 001c66d..a1dfd6b 100644 --- a/drivers/net/3c59x.c +++ b/drivers/net/3c59x.c @@ -705,7 +705,7 @@ static struct { static int vortex_probe1(struct device *gendev, void __iomem *ioaddr, int irq, int chip_idx, int card_idx); -static void vortex_up(struct net_device *dev); +static int vortex_up(struct net_device *dev); static void vortex_down(struct net_device *dev, int final); static int vortex_open(struct net_device *dev); static void mdio_sync(void __iomem *ioaddr, int bits); @@ -841,8 +841,11 @@ static int vortex_resume(struct pci_dev *pdev) return -EBUSY; } if (netif_running(dev)) { - vortex_up(dev); - netif_device_attach(dev); + err = vortex_up(dev); + if (err) + return err; + else + netif_device_attach(dev); } } return 0; I think we should free the requested irq if vortex_up really fails here. I was wrong, this will be done in vortex_close. So it is OK as it is. Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REVISED PATCH] 3c59x: check return of pci_enable_device()
On Fri, Aug 31, 2007 at 09:08:37AM -0400, Jeff Garzik wrote: Mark Hindley wrote: Revised patch for this. Mark commit 5cf33391eba81a49038fa8be8cbad8425b80bf7f Author: Mark Hindley [EMAIL PROTECTED] Date: Thu Aug 16 11:26:35 2007 +0100 Check return of pci_enable_device in vortex_up(). Also modify vortex_up to return error to callers. Handle failure of vortex_up in vortex_open and vortex_resume. Signed-off-by: Mark Hindley [EMAIL PROTECTED] Steffen, did you ACK this? and/or going to resend it to me? Yes, I did. Andrew has it already in -mm, I guess that he will resend it to you soon. Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -mm 2/2] 3c59x MAINTAINERS
On Tue, Sep 04, 2007 at 03:52:50AM +0530, Satyam Sharma wrote: Remove duplicate entry for the same driver. This is -mm specific. Andrew did not remove the add-3c59x-maintainer patch after pushing it to mainline. This can be fixed just by removing the add-3c59x-maintainer patch from -mm. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote: drivers/net/3c59x.c: In function 'vortex_up': drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this function This came in with the recently applied 3c59x-check-return-of-pci_enable_device patch from Mark Hindley. I just compiled it on a PCI only machine so far, therefore I did not notice the warning yet. is a genuine bug. The function returns an uninitialized value of 'err' back to the caller, which expects it to be 0 for success cases. Let's fix this by explicitly initializing 'err' to zero. Signed-off-by: Satyam Sharma [EMAIL PROTECTED] Acked-by: Steffen Klassert [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
On Tue, Sep 04, 2007 at 09:53:31AM +0100, Mark Hindley wrote: On Tue, Sep 04, 2007 at 02:09:47PM +0530, Satyam Sharma wrote: Hi Steffen, On Tue, 4 Sep 2007, Steffen Klassert wrote: On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote: drivers/net/3c59x.c: In function 'vortex_up': drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this function This came in with the recently applied 3c59x-check-return-of-pci_enable_device patch from Mark Hindley. I just compiled it on a PCI only machine so far, therefore I did not notice the warning yet. Hmm, the .config I built with had PCI=y as well. Probably a compiler version difference -- Jeff also mentioned yesterday that some newer GCC versions fail to warn about uninitialized variables cases. Sorry, this is my bad. I have just checked: there is no warning with gcc 4.2 or 4.1, but 3.3 emits the warning. The only warning that I was able to trigger with gcc 4.2 is in the case of a .config without PCI support. In this case I get drivers/net/3c59x.c: In function 'vortex_up': drivers/net/3c59x.c:1672: warning: 'err' is used uninitialized in this function ^^ So we have to be more carefull with the newer gcc versions. Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
On Tue, Sep 04, 2007 at 10:35:10AM +0100, Mark Hindley wrote: On Tue, Sep 04, 2007 at 11:17:57AM +0200, Steffen Klassert wrote: The only warning that I was able to trigger with gcc 4.2 is in the case of a .config without PCI support. In this case I get drivers/net/3c59x.c: In function 'vortex_up': drivers/net/3c59x.c:1672: warning: 'err' is used uninitialized in this function That should be fixed by Satyam's patch too. Yes, of course. This was just to point out what's possible to trigger with a newer gcc and what's not. Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] 3c59x: sparse warning fix
On Wed, Sep 05, 2007 at 03:23:59PM +0100, Stephen Hemminger wrote: --- a/drivers/net/3c59x.c 2007-09-05 15:15:16.0 +0100 +++ b/drivers/net/3c59x.c 2007-09-05 15:16:29.0 +0100 @@ -1122,7 +1122,7 @@ static int __devinit vortex_probe1(struc + sizeof(struct boom_tx_desc) * TX_RING_SIZE, vp-rx_ring_dma); retval = -ENOMEM; - if (vp-rx_ring == 0) + if (!vp-rx_ring) goto free_region; vp-tx_ring = (struct boom_tx_desc *)(vp-rx_ring + RX_RING_SIZE); @@ -2476,7 +2476,8 @@ boomerang_rx(struct net_device *dev) /* Check if the packet is long enough to just accept without copying to a properly sized skbuff. */ - if (pkt_len rx_copybreak (skb = dev_alloc_skb(pkt_len + 2)) != 0) { + if (pkt_len rx_copybreak + (skb = dev_alloc_skb(pkt_len + 2)) ) { skb_reserve(skb, 2);/* Align IP on 16 byte boundaries */ pci_dma_sync_single_for_cpu(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE); /* 'skb_put()' points to the start of sk_buff data area. */ - Thanks, Acked-by: Steffen Klassert [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problems with 3c59x driver (extremely low throughput)
On Tue, Sep 18, 2007 at 01:32:00PM +0200, Mikael Leivisk? wrote: OK first of all I'm not really sure if this is the place but was told by Dave Dillow to send to the netdev kernel mailing list. So that's what I'm doing :) I tried searching the mailing lists and didn't find anything that seemed relevant, it is possible I missed something the archives are huge... Following is the mail I sent originally: I've run into something of a strange problem lately, I have a network card that runs on the Vortex/Boomerang driver. I ran the 2.6.16 kernel without any issues, and later switched to 2.6.22 because I couldn't get my IPSEC tunnel working with 2.6.16 for some reason (didn't find the right switches in menuconfig). And everything was seemingly working fine but soon after my ISP got some problems and they said they had to replace some hardware somewhere along my route and now when I access the internet (the machine is setup as router/web-file-mail-server with this card as the external interface) I get dreadfully slow speeds, I'll be lucky to get 10KB/s on a 10MB/s link. Does this mean that the driver worked well with 2.6.22 befor the hardware changes of your ISP? What did they change? But the funny thing is if I switch to the 2.6.16 kernel, WHAM I get 700KB/s no sweat... The kernel configurations are otherwise almost exactly the same bar tickless and some switches in network options to get ipsec... The only thing I do is change the kernel and things work... So my questions are, Has there been any recent changes to the drivers for the card lately? (between .16 and .22) There were some changes, your problem could be related to one of them. Or any changes in the network infrastructure that could be the cause of it? Or is there some other switch in the config that could be the cause of this? Or is it my ISP's hardware that's not playing nice? Or any other ideas as to what might be the cause. I've posted the respective kernel configs, and lspci output and my cpuinfo on http://www.zero-kelvin.org/pub/kern/ If there is any other info that you need I'll gladly provide it assuming I can get my hands on it :) There were some duplex related fixes on 3c59x recently, could you please try whether your problem persist in 2.6.23-rc6? Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bugzilla: open bug reports
On Thu, Oct 04, 2007 at 09:44:26AM -0700, Stephen Hemminger wrote: Bugzilla report of open bugs. Yes you could run it yourself but many of these bugs seem to be old and need some attention or work to get resolved. Perhaps we should at least ask the reporters of the older bugs whether the bugs are still present in newer kernel versions to get an actual state. I will do this for the 3c59x related bugs after the release of 2.6.23. Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vortex_up should initialize err
On Wed, Oct 17, 2007 at 08:28:36PM -0400, Jeff Garzik wrote: Badari Pulavarty wrote: Simple compile warning fix. (against 2.6.23-git12) Thanks, Badari vortex_up() should initialize 'err' for a successful return. drivers/net/3c59x.c: In function `vortex_up': drivers/net/3c59x.c:1494: warning: `err' might be used uninitialized in this function applied as an obvious bug fix (Steffen added to CC) Hm, we had already the 3c59x-fix-uninitialized-variable-bug.patch from Satyam Sharma in -mm to fix this, but the patch was removed from -mm some time ago. Andrew, what happened to this one? However, this issue should be fixed. Thanks, Steffen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CCM/GCM implementation defect
On Thu, Apr 23, 2015 at 11:26:20AM +0800, Herbert Xu wrote: Hi: It looks like our IPsec implementations of CCM and GCM are buggy in that they don't include the IV in the authentication calculation. Seems like crypto_rfc4106_crypt() passes the associated data it got from ESP directly to gcm, without chaining with the IV. This definitely breaks interoperability with anyone who implements them correctly. The fact that there have been no reports on this probably means that nobody has run into this in the field yet. On the security side, this is probably not a big deal for CCM because it always verifies the authentication tag after decryption. But for GCM this may be a DoS issue as an attacker could modify the IV without triggering the authentication check and thus cause an unnecessary decryption. For both CCM and GCM this will result in random data injected as a packet into the network stack which hopefully will be dropped. In order to fix this without breaking backwards compatibility, my plan is to introduce new templates such as rfc4106v2 which implement the RFC correctly. The existing templates will be retained so that current users aren't broken by the fix. Adding a second template for the correct implementation is probaply the only thing that we can do if we don't want to break backwards compatibility. But maybe we can add a warning to the old implementation, such that users notice that they use a broken version. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm: fix a race in xfrm_state_lookup_byspi
On Wed, Apr 29, 2015 at 05:25:25AM +, Du, Fan wrote: -Original Message- From: roy.qing...@gmail.com [mailto:roy.qing...@gmail.com] Sent: Wednesday, April 29, 2015 8:43 AM To: netdev@vger.kernel.org Cc: Du, Fan; steffen.klass...@secunet.com Subject: [PATCH] xfrm: fix a race in xfrm_state_lookup_byspi From: Li RongQing roy.qing...@gmail.com The returned xfrm_state should be hold before unlock xfrm_state_lock, otherwise the returned xfrm_state maybe be released. Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..] Cc: Fan Du fan...@intel.com Signed-off-by: Li RongQing roy.qing...@gmail.com Acked-by: Fan Du fan...@intel.com Applied to the ipsec tree, thanks a lot! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm: fix the return code when xfrm_*_register_afinfo failed
On Thu, Apr 23, 2015 at 11:06:53AM +0800, roy.qing...@gmail.com wrote: From: Li RongQing roy.qing...@gmail.com If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo-family] had the value, return the -EEXIST, not -ENOBUFS Signed-off-by: Li RongQing roy.qing...@gmail.com Also applied to ipsec-next, thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][net-next] xfrm: slightly optimise xfrm_input
On Fri, Apr 24, 2015 at 04:49:31PM +0800, roy.qing...@gmail.com wrote: From: Li RongQing roy.qing...@gmail.com Check x-km.state with XFRM_STATE_ACQ only when state is not XFRM_STAT_VALID, not everytime Signed-off-by: Li RongQing roy.qing...@gmail.com Applied to ipsec-next, thanks a lot Li! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][net-next][v2] xfrm: optimise the use of walk list header in xfrm_policy/state_walk
On Wed, Apr 22, 2015 at 05:13:18PM +0800, Herbert Xu wrote: On Wed, Apr 22, 2015 at 05:09:54PM +0800, roy.qing...@gmail.com wrote: From: Li RongQing roy.qing...@gmail.com The walk from input is the list header, and marked as dead, and will be skipped in loop. list_first_entry() can be used to return the true usable value from walk if walk is not empty Signed-off-by: Li RongQing roy.qing...@gmail.com Acked-by: Herbert Xu herb...@gondor.apana.org.au Applied to ipsec-next, thanks everyone! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm: remove the xfrm_queue_purge definition
On Wed, Apr 22, 2015 at 03:51:16PM +0800, roy.qing...@gmail.com wrote: From: Li RongQing roy.qing...@gmail.com The task of xfrm_queue_purge is same as skb_queue_purge, so remove it Signed-off-by: Li RongQing roy.qing...@gmail.com Applied to ipsec-next, thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3 PATCH 0/8] crypto: Convert all AEAD users to new interface
On Wed, May 27, 2015 at 04:01:05PM +0800, Herbert Xu wrote: Hi: The only changes from the last version are that set_ad no longer takes a cryptoff argument and testmgr has been updated to always supply space for the authentication tag. The algif_aead patch has been removed and will be posted separately. Series description: This series of patches convert all in-tree AEAD users that I could find to the new single SG list interface. For IPsec it also adopts the new explicit IV generator scheme. To recap, the old AEAD interface takes an associated data (AD) SG list in addition to the plain/cipher text SG list(s). That forces the underlying AEAD algorithm implementors to try to stitch those two lists together where possible in order to maximise the contiguous chunk of memory passed to the ICV/hash function. Things get even more hairy for IPsec as it has a third piece of memory, the generated IV (giv) that needs to be hashed. One look at the nasty things authenc does for example is enough to make anyone puke :) In fact the interface is just getting in our way because for the main user IPsec the data is naturally contiguous as the protocol was designed with this in mind. So the new AEAD interface gets rid of the separate AD SG list and instead simply requires the AD to be at the head of the src and dst SG lists. The conversion of in-tree users is fairly straightforward. The only non-trivial bit is IPsec as I'm taking this opportunity to move the IV generation knowledge into IPsec as that's where it belongs since we may in future wish to support different generation schemes for a single algorithm. Not sure if I missed something in the flood of patches, but if I apply your v3 patchset on top of the cryptodev tree, it crashes like that buring boot: [4.668297] [ cut here ] [4.669143] kernel BUG at /home/klassert/git/linux-stk/include/linux/scatterlist.h:67! [4.670457] invalid opcode: [#1] SMP DEBUG_PAGEALLOC [4.671595] CPU: 0 PID: 1363 Comm: cryptomgr_test Not tainted 4.0.0+ #951 [4.672025] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [4.672025] task: ce9e7300 ti: ceb54000 task.ti: ceb54000 [4.672025] EIP: 0060:[c11d45b5] EFLAGS: 00010206 CPU: 0 [4.672025] EIP is at scatterwalk_ffwd+0xf5/0x100 [4.672025] EAX: ceb43b20 EBX: ceb55c94 ECX: 0014 EDX: c11db23f [4.672025] ESI: 0010 EDI: 0003 EBP: ceb55c7c ESP: ceb55c6c [4.672025] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 [4.672025] CR0: 8005003b CR2: bfbb6fc0 CR3: 0eb26000 CR4: 06d0 [4.672025] Stack: [4.672025] cffd28c0 0014 ceb35400 cea33618 ceb55cd0 c11d45e8 ceb43b20 [4.672025] ceb35438 c11db220 ceb55c9c c11db23f ceb55cac c11da470 ceb35438 ceb353c8 [4.672025] ceb55cb4 c11da763 ceb55cd0 c11f2c6f ceb35400 0200 ceb35358 ceb353c8 [4.672025] Call Trace: [4.672025] [c11d45e8] scatterwalk_map_and_copy+0x28/0xc0 [4.672025] [c11db220] ? shash_ahash_finup+0x80/0x80 [4.672025] [c11db23f] ? shash_async_finup+0x1f/0x30 [4.672025] [c11da470] ? crypto_ahash_op+0x20/0x50 [4.672025] [c11da763] ? crypto_ahash_finup+0x13/0x20 [4.672025] [c11f2c6f] ? crypto_authenc_ahash_fb+0xaf/0xd0 [4.672025] [c11f2dfc] crypto_authenc_genicv+0xfc/0x340 [4.672025] [c11f3526] crypto_authenc_encrypt+0x96/0xb0 [4.672025] [c11f3490] ? crypto_authenc_decrypt+0x3e0/0x3e0 [4.672025] [c11d4eb7] old_crypt+0xa7/0xc0 [4.672025] [c11d4f09] old_encrypt+0x19/0x20 [4.672025] [c11ddbe8] __test_aead+0x268/0x1580 [4.672025] [c11d28a7] ? __crypto_alloc_tfm+0x37/0x120 [4.672025] [c11d28a7] ? __crypto_alloc_tfm+0x37/0x120 [4.672025] [c11d7742] ? skcipher_geniv_init+0x22/0x40 [4.672025] [c11d7d73] ? eseqiv_init+0x43/0x50 [4.672025] [c11d2936] ? __crypto_alloc_tfm+0xc6/0x120 [4.672025] [c11df101] test_aead+0x31/0xc0 [4.672025] [c11df1d3] alg_test_aead+0x43/0xa0 [4.672025] [c11def2e] ? alg_find_test+0x2e/0x70 [4.672025] [c11dfe42] alg_test+0xa2/0x240 [4.672025] [c106dd83] ? finish_task_switch+0x83/0xe0 [4.672025] [c159c002] ? __schedule+0x412/0x1067 [4.672025] [c1085f57] ? __wake_up_common+0x47/0x70 [4.672025] [c11dbc10] ? cryptomgr_notify+0x450/0x450 [4.672025] [c11dbc4f] cryptomgr_test+0x3f/0x50 [4.672025] [c1066dfb] kthread+0xab/0xc0 [4.672025] [c15a1a41] ret_from_kernel_thread+0x21/0x30 [4.672025] [c1066d50] ? __kthread_parkme+0x80/0x80 [4.672025] Code: 83 c4 04 5b 5e 5f 5d c3 81 3b 21 43 65 87 75 13 8b 43 04 83 e0 fe 83 c8 02 89 43 04 89 d8 e9 4d ff ff ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 8d b4 26 00 00 00 00 55 89 e5 57 56 53 83 ec 40 3e [4.672025] EIP: [c11d45b5] scatterwalk_ffwd+0xf5/0x100 SS:ESP 0068:ceb55c6c [4.721562] ---[ end trace 94a02f0816fe7c7f ]--- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a
Re: Looking for a lost patch
On Thu, May 21, 2015 at 05:25:24PM -0400, David Miller wrote: From: Steffen Klassert steffen.klass...@secunet.com Date: Wed, 20 May 2015 08:32:23 +0200 On Tue, May 19, 2015 at 11:32:15AM -0700, Alexander Duyck wrote: On 05/19/2015 12:57 AM, Steffen Klassert wrote: The MTU should be 1500. All the IPsec overhead is handled by PMTU discovery, just like in the case we use IPsec without vti tunnels. The IPv6 side of vti does it like that. The problem is the PMTU isn't communicated to things that make use of the tunnel. For example if I do a ping -s 2000 x.x.x.x across an IPv6 VTI interface it will fail currently as it assumes the MTU is 1500 and so it is fragmenting the ping packet at sizes that won't be communicated across the underlying interface. Well, the problem is that the local socket is still attached on the skb. The socket gets an error notification if the packet is too big, but ping does not care much about these error notifications. One option to get such applications to work is to orphan the skb in the vti xmit function. Then the packet is not assumed to be local, so PMTU discovery is triggered on that route. Something like this should work for IPv6: When a packet traverses software layered devices, we should not orphan the socket. In fact, we have taken great pains to make sure this works so that the socket memory accounting is done correctly on the original top-level socket. I have not considered this as an official patch :) It was more to demonstrate that PMTU discovery with IPsec tunnels can work, so we don't need to reduce the MTU of the tunnel device. We currently check if a socket is attached to a skb and do socket error notification in this case, otherwise we do PMTU discovery if the packet is too big. Looks like this socket check is not sufficient if the packet is already transmitted through a tunnel device. I wonder if we have something to know that a packet was already transmitted through a tunnel device. We could switch from socket notification to PMTU discovery in this case. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3 PATCH 0/8] crypto: Convert all AEAD users to new interface
On Wed, May 27, 2015 at 05:29:22PM +0800, Herbert Xu wrote: On Wed, May 27, 2015 at 11:25:33AM +0200, Steffen Klassert wrote: Not sure if I missed something in the flood of patches, but if I apply your v3 patchset on top of the cryptodev tree, it crashes like that buring boot: Sorry, I forgot to mention that v3 depends on the series of fixes posted just before it (but only to linux-crypto): https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg14487.html OK, I'll try with this. Thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ipsec PATCH 0/3] Preserve skb-mark through VTI tunnels
On Wed, May 27, 2015 at 07:16:37AM -0700, Alexander Duyck wrote: These patches are meant to try and address the fact the VTI tunnels are currently overwriting the skb-mark value. I am generally happy with the first two patches, however the third patch still modifies the skb-mark, though it undoes after the fact. The main problem I am trying to address is the fact that currently if I use an v6 over v6 VTI tunnel I cannot receive any traffic on the interface as the skb-mark is bleeding through and causing the traffic to be dropped. --- Alexander Duyck (3): ip_vti/ip6_vti: Do not touch skb-mark on xmit xfrm: Override skb-mark with tunnel-parm.i_key in xfrm_input ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call All applied to the ipsec tree, thanks a lot Alexander! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] xfrm: fix a race in xfrm_state_lookup_byspi
From: Li RongQing roy.qing...@gmail.com The returned xfrm_state should be hold before unlock xfrm_state_lock, otherwise the returned xfrm_state maybe be released. Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..] Cc: Fan Du fan...@intel.com Signed-off-by: Li RongQing roy.qing...@gmail.com Acked-by: Fan Du fan...@intel.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index f5e39e3..96688cd 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -927,8 +927,8 @@ struct xfrm_state *xfrm_state_lookup_byspi(struct net *net, __be32 spi, x-id.spi != spi) continue; - spin_unlock_bh(net-xfrm.xfrm_state_lock); xfrm_state_hold(x); + spin_unlock_bh(net-xfrm.xfrm_state_lock); return x; } spin_unlock_bh(net-xfrm.xfrm_state_lock); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] esp6: Use high-order sequence number bits for IV generation
From: Herbert Xu herb...@gondor.apana.org.au I noticed we were only using the low-order bits for IV generation when ESN is enabled. This is very bad because it means that the IV can repeat. We must use the full 64 bits. Signed-off-by: Herbert Xu herb...@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/ipv6/esp6.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c index 31f1b5d..7c07ce3 100644 --- a/net/ipv6/esp6.c +++ b/net/ipv6/esp6.c @@ -248,7 +248,8 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb) aead_givcrypt_set_crypt(req, sg, sg, clen, iv); aead_givcrypt_set_assoc(req, asg, assoclen); aead_givcrypt_set_giv(req, esph-enc_data, - XFRM_SKB_CB(skb)-seq.output.low); + XFRM_SKB_CB(skb)-seq.output.low + + ((u64)XFRM_SKB_CB(skb)-seq.output.hi 32)); ESP_SKB_CB(skb)-tmp = tmp; err = crypto_aead_givencrypt(req); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
pull request (net): ipsec 2015-05-28
1) Fix a race in xfrm_state_lookup_byspi, we need to take the refcount before we release xfrm_state_lock. From Li RongQing. 2) Fix IV generation on ESN state. We used just the low order sequence numbers for IV generation on ESN, as a result the IV can repeat on the same state. Fix this by using the high order sequence number bits too and make sure to always initialize the high order bits with zero. These patches are serious stable candidates. Fixes from Herbert Xu. 3) Fix the skb-mark handling on vti. We don't reset skb-mark in skb_scrub_packet anymore, so vti must care to restore the original value back after it was used to lookup the vti policy and state. Fixes from Alexander Duyck. Please pull or let me know if there are problems. Thanks! The following changes since commit 39376ccb1968ba9f83e2a880a8bf02ad5dea44e1: Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf (2015-04-27 23:12:34 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master for you to fetch changes up to d55c670cbc54b2270a465cdc382ce71adae45785: ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call (2015-05-28 06:23:32 +0200) Alexander Duyck (3): ip_vti/ip6_vti: Do not touch skb-mark on xmit xfrm: Override skb-mark with tunnel-parm.i_key in xfrm_input ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call Herbert Xu (3): esp4: Use high-order sequence number bits for IV generation esp6: Use high-order sequence number bits for IV generation xfrm: Always zero high-order sequence number bits Li RongQing (1): xfrm: fix a race in xfrm_state_lookup_byspi net/ipv4/esp4.c| 3 ++- net/ipv4/ip_vti.c | 14 ++ net/ipv6/esp6.c| 3 ++- net/ipv6/ip6_vti.c | 13 ++--- net/xfrm/xfrm_input.c | 17 - net/xfrm/xfrm_replay.c | 2 ++ net/xfrm/xfrm_state.c | 2 +- 7 files changed, 43 insertions(+), 11 deletions(-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] esp4: Use high-order sequence number bits for IV generation
From: Herbert Xu herb...@gondor.apana.org.au I noticed we were only using the low-order bits for IV generation when ESN is enabled. This is very bad because it means that the IV can repeat. We must use the full 64 bits. Signed-off-by: Herbert Xu herb...@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/ipv4/esp4.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 421a80b..30b544f 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -256,7 +256,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) aead_givcrypt_set_crypt(req, sg, sg, clen, iv); aead_givcrypt_set_assoc(req, asg, assoclen); aead_givcrypt_set_giv(req, esph-enc_data, - XFRM_SKB_CB(skb)-seq.output.low); + XFRM_SKB_CB(skb)-seq.output.low + + ((u64)XFRM_SKB_CB(skb)-seq.output.hi 32)); ESP_SKB_CB(skb)-tmp = tmp; err = crypto_aead_givencrypt(req); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] ip_vti/ip6_vti: Do not touch skb-mark on xmit
From: Alexander Duyck alexander.h.du...@redhat.com Instead of modifying skb-mark we can simply modify the flowi_mark that is generated as a result of the xfrm_decode_session. By doing this we don't need to actually touch the skb-mark and it can be preserved as it passes out through the tunnel. Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/ipv4/ip_vti.c | 5 +++-- net/ipv6/ip6_vti.c | 4 +++- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index 9f7269f..4c318e1 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -216,8 +216,6 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) memset(fl, 0, sizeof(fl)); - skb-mark = be32_to_cpu(tunnel-parms.o_key); - switch (skb-protocol) { case htons(ETH_P_IP): xfrm_decode_session(skb, fl, AF_INET); @@ -233,6 +231,9 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } + /* override mark with tunnel output key */ + fl.flowi_mark = be32_to_cpu(tunnel-parms.o_key); + return vti_xmit(skb, dev, fl); } diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index ed9d681..104de4d 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -495,7 +495,6 @@ vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev) int ret; memset(fl, 0, sizeof(fl)); - skb-mark = be32_to_cpu(t-parms.o_key); switch (skb-protocol) { case htons(ETH_P_IPV6): @@ -516,6 +515,9 @@ vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev) goto tx_err; } + /* override mark with tunnel output key */ + fl.flowi_mark = be32_to_cpu(t-parms.o_key); + ret = vti6_xmit(skb, dev, fl); if (ret 0) goto tx_err; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] ip_vti/ip6_vti: Preserve skb-mark after rcv_cb call
From: Alexander Duyck alexander.h.du...@redhat.com The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb-mark modified after completing the function. This resulted in the original skb-mark value being lost. Since we only need skb-mark to be set for xfrm_policy_check we can pull the assignment into the rcv_cb calls and then just restore the original mark after xfrm_policy_check has been completed. Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/ipv4/ip_vti.c | 9 +++-- net/ipv6/ip6_vti.c | 9 +++-- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index 4c318e1..0c15208 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -65,7 +65,6 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi, goto drop; XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4 = tunnel; - skb-mark = be32_to_cpu(tunnel-parms.i_key); return xfrm_input(skb, nexthdr, spi, encap_type); } @@ -91,6 +90,8 @@ static int vti_rcv_cb(struct sk_buff *skb, int err) struct pcpu_sw_netstats *tstats; struct xfrm_state *x; struct ip_tunnel *tunnel = XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4; + u32 orig_mark = skb-mark; + int ret; if (!tunnel) return 1; @@ -107,7 +108,11 @@ static int vti_rcv_cb(struct sk_buff *skb, int err) x = xfrm_input_state(skb); family = x-inner_mode-afinfo-family; - if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family)) + skb-mark = be32_to_cpu(tunnel-parms.i_key); + ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family); + skb-mark = orig_mark; + + if (!ret) return -EPERM; skb_scrub_packet(skb, !net_eq(tunnel-net, dev_net(skb-dev))); diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index 104de4d..ff3bd86 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -322,7 +322,6 @@ static int vti6_rcv(struct sk_buff *skb) } XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip6 = t; - skb-mark = be32_to_cpu(t-parms.i_key); rcu_read_unlock(); @@ -342,6 +341,8 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err) struct pcpu_sw_netstats *tstats; struct xfrm_state *x; struct ip6_tnl *t = XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip6; + u32 orig_mark = skb-mark; + int ret; if (!t) return 1; @@ -358,7 +359,11 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err) x = xfrm_input_state(skb); family = x-inner_mode-afinfo-family; - if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family)) + skb-mark = be32_to_cpu(t-parms.i_key); + ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family); + skb-mark = orig_mark; + + if (!ret) return -EPERM; skb_scrub_packet(skb, !net_eq(t-net, dev_net(skb-dev))); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] xfrm: Always zero high-order sequence number bits
From: Herbert Xu herb...@gondor.apana.org.au As we're now always including the high bits of the sequence number in the IV generation process we need to ensure that they don't contain crap. This patch ensures that the high sequence bits are always zeroed so that we don't leak random data into the IV. Signed-off-by: Herbert Xu herb...@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_replay.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c index dab57da..4fd725a 100644 --- a/net/xfrm/xfrm_replay.c +++ b/net/xfrm/xfrm_replay.c @@ -99,6 +99,7 @@ static int xfrm_replay_overflow(struct xfrm_state *x, struct sk_buff *skb) if (x-type-flags XFRM_TYPE_REPLAY_PROT) { XFRM_SKB_CB(skb)-seq.output.low = ++x-replay.oseq; + XFRM_SKB_CB(skb)-seq.output.hi = 0; if (unlikely(x-replay.oseq == 0)) { x-replay.oseq--; xfrm_audit_state_replay_overflow(x, skb); @@ -177,6 +178,7 @@ static int xfrm_replay_overflow_bmp(struct xfrm_state *x, struct sk_buff *skb) if (x-type-flags XFRM_TYPE_REPLAY_PROT) { XFRM_SKB_CB(skb)-seq.output.low = ++replay_esn-oseq; + XFRM_SKB_CB(skb)-seq.output.hi = 0; if (unlikely(replay_esn-oseq == 0)) { replay_esn-oseq--; xfrm_audit_state_replay_overflow(x, skb); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] xfrm: Override skb-mark with tunnel-parm.i_key in xfrm_input
From: Alexander Duyck alexander.h.du...@redhat.com This change makes it so that if a tunnel is defined we just use the mark from the tunnel instead of the mark from the skb header. By doing this we can avoid the need to set skb-mark inside of the tunnel receive functions. Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_input.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 526c4fe..b58286e 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -13,6 +13,8 @@ #include net/dst.h #include net/ip.h #include net/xfrm.h +#include net/ip_tunnels.h +#include net/ip6_tunnel.h static struct kmem_cache *secpath_cachep __read_mostly; @@ -186,6 +188,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) struct xfrm_state *x = NULL; xfrm_address_t *daddr; struct xfrm_mode *inner_mode; + u32 mark = skb-mark; unsigned int family; int decaps = 0; int async = 0; @@ -203,6 +206,18 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) XFRM_SPI_SKB_CB(skb)-daddroff); family = XFRM_SPI_SKB_CB(skb)-family; + /* if tunnel is present override skb-mark value with tunnel i_key */ + if (XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4) { + switch (family) { + case AF_INET: + mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip4-parms.i_key); + break; + case AF_INET6: + mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)-tunnel.ip6-parms.i_key); + break; + } + } + /* Allocate new secpath or COW existing one. */ if (!skb-sp || atomic_read(skb-sp-refcnt) != 1) { struct sec_path *sp; @@ -229,7 +244,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) goto drop; } - x = xfrm_state_lookup(net, skb-mark, daddr, spi, nexthdr, family); + x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family); if (x == NULL) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES); xfrm_audit_state_notfound(skb, family, spi, seq); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm6: Do not use xfrm_local_error for path MTU issues in tunnels
On Thu, May 28, 2015 at 12:18:51AM -0700, Alexander Duyck wrote: On 05/27/2015 10:36 PM, Steffen Klassert wrote: On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote: This change makes it so that we use icmpv6_send to report PMTU issues back into tunnels in the case that the resulting packet is larger than the MTU of the outgoing interface. Previously xfrm_local_error was being used in this case, however this was resulting in no changes, I suspect due to the fact that the tunnel itself was being kept out of the loop. This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the behavior seen if the socket was orphaned. Instead of requiring the socket to be orphaned this patch simply defaults to using icmpv6_send in the case that the frame came though a tunnel. We can use icmpv6_send() just in the case that the packet was already transmitted by a tunnel device, otherwise we get the bug back that I mentioned in my other mail. Not sure if we have something to know that the packet traversed a tunnel device. That's what I asked in the thread 'Looking for a lost patch'. Okay I will try to do some more digging. From what I can tell right now it looks like my ping attempts are getting hung up on the xfrm_local_error in __xfrm6_output. I wonder if we couldn't somehow make use of the skb-cb to store a pointer to the tunnel that could be checked to determine if we are going through a VTI or not. Maybe it is as easy as the patch below, could you please test it? Subject: [PATCH RFC] vti6: Add pmtu handling to vti6_xmit. We currently rely on the PMTU discovery of xfrm. However if a packet is localy sent, the PMTU mechanism of xfrm tries to to local socket notification what might not work for applications like ping that don't check for this. So add pmtu handling to vti6_xmit to report MTU changes immediately. Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/ipv6/ip6_vti.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index ff3bd86..13cb771 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -434,6 +434,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl) struct dst_entry *dst = skb_dst(skb); struct net_device *tdev; struct xfrm_state *x; + int mtu; int err = -1; if (!dst) @@ -468,6 +469,15 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl) skb_dst_set(skb, dst); skb-dev = skb_dst(skb)-dev; + mtu = dst_mtu(dst); + if (!skb-ignore_df skb-len mtu) { + skb_dst(skb)-ops-update_pmtu(dst, NULL, skb, mtu); + + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); + + return -EMSGSIZE; + } + err = dst_output(skb); if (net_xmit_eval(err) == 0) { struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev-tstats); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
pull request (net-next): ipsec-next 2015-05-28
1) Remove xfrm_queue_purge as this is the same as skb_queue_purge. 2) Optimize policy and state walk. 3) Use a sane return code if afinfo registration fails. 4) Only check fori a acquire state if the state is not valid. 5) Remove a unnecessary NULL check before xfrm_pol_hold as it checks the input for NULL. 6) Return directly if the xfrm hold queue is empty, avoid to take a lock as it is nothing to do in this case. 7) Optimize the inexact policy search and allow for matching of policies with priority ~0U. All from Li RongQing. Please pull or let me know if there are problems. Thanks! The following changes since commit 04b7fe6a4a231871ef681bc95e08fe66992f7b1f: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide (2015-04-17 16:36:59 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master for you to fetch changes up to 8faf491e642011f449d6405be52bedb70964aed6: xfrm: optimise to search the inexact policy list (2015-05-18 10:31:56 +0200) Li RongQing (7): xfrm: remove the xfrm_queue_purge definition xfrm: optimise the use of walk list header in xfrm_policy/state_walk xfrm: fix the return code when xfrm_*_register_afinfo failed xfrm: slightly optimise xfrm_input xfrm: remove the unnecessary checking before call xfrm_pol_hold xfrm: move the checking for old xfrm_policy hold_queue to beginning xfrm: optimise to search the inexact policy list net/xfrm/xfrm_input.c | 12 ++-- net/xfrm/xfrm_policy.c | 42 -- net/xfrm/xfrm_state.c | 4 ++-- 3 files changed, 28 insertions(+), 30 deletions(-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] xfrm: remove the xfrm_queue_purge definition
From: Li RongQing roy.qing...@gmail.com The task of xfrm_queue_purge is same as skb_queue_purge, so remove it Signed-off-by: Li RongQing roy.qing...@gmail.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_policy.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 638af06..d8c35ad 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -315,14 +315,6 @@ void xfrm_policy_destroy(struct xfrm_policy *policy) } EXPORT_SYMBOL(xfrm_policy_destroy); -static void xfrm_queue_purge(struct sk_buff_head *list) -{ - struct sk_buff *skb; - - while ((skb = skb_dequeue(list)) != NULL) - kfree_skb(skb); -} - /* Rule must be locked. Release descentant resources, announce * entry dead. The rule must be unlinked from lists to the moment. */ @@ -335,7 +327,7 @@ static void xfrm_policy_kill(struct xfrm_policy *policy) if (del_timer(policy-polq.hold_timer)) xfrm_pol_put(policy); - xfrm_queue_purge(policy-polq.hold_queue); + skb_queue_purge(policy-polq.hold_queue); if (del_timer(policy-timer)) xfrm_pol_put(policy); @@ -1955,7 +1947,7 @@ out: purge_queue: pq-timeout = 0; - xfrm_queue_purge(pq-hold_queue); + skb_queue_purge(pq-hold_queue); xfrm_pol_put(pol); } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] xfrm: remove the unnecessary checking before call xfrm_pol_hold
From: Li RongQing roy.qing...@gmail.com xfrm_pol_hold will check its input with NULL Signed-off-by: Li RongQing roy.qing...@gmail.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_policy.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index c4c47f3..435bc0d 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1127,8 +1127,8 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type, break; } } - if (ret) - xfrm_pol_hold(ret); + + xfrm_pol_hold(ret); fail: read_unlock_bh(net-xfrm.xfrm_policy_lock); @@ -3211,8 +3211,7 @@ static struct xfrm_policy *xfrm_migrate_policy_find(const struct xfrm_selector * } } - if (ret) - xfrm_pol_hold(ret); + xfrm_pol_hold(ret); read_unlock_bh(net-xfrm.xfrm_policy_lock); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] xfrm: move the checking for old xfrm_policy hold_queue to beginning
From: Li RongQing roy.qing...@gmail.com if hold_queue of old xfrm_policy is NULL, return directly, then not need to run other codes, especially take the spin lock Signed-off-by: Li RongQing roy.qing...@gmail.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_policy.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 435bc0d..3d264e5 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -700,6 +700,9 @@ static void xfrm_policy_requeue(struct xfrm_policy *old, struct xfrm_policy_queue *pq = old-polq; struct sk_buff_head list; + if (skb_queue_empty(pq-hold_queue)) + return; + __skb_queue_head_init(list); spin_lock_bh(pq-hold_queue.lock); @@ -708,9 +711,6 @@ static void xfrm_policy_requeue(struct xfrm_policy *old, xfrm_pol_put(old); spin_unlock_bh(pq-hold_queue.lock); - if (skb_queue_empty(list)) - return; - pq = new-polq; spin_lock_bh(pq-hold_queue.lock); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] xfrm: fix the return code when xfrm_*_register_afinfo failed
From: Li RongQing roy.qing...@gmail.com If xfrm_*_register_afinfo failed since xfrm_*_afinfo[afinfo-family] had the value, return the -EEXIST, not -ENOBUFS Signed-off-by: Li RongQing roy.qing...@gmail.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_input.c | 2 +- net/xfrm/xfrm_policy.c | 2 +- net/xfrm/xfrm_state.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 526c4fe..459796a 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -29,7 +29,7 @@ int xfrm_input_register_afinfo(struct xfrm_input_afinfo *afinfo) return -EAFNOSUPPORT; spin_lock_bh(xfrm_input_afinfo_lock); if (unlikely(xfrm_input_afinfo[afinfo-family] != NULL)) - err = -ENOBUFS; + err = -EEXIST; else rcu_assign_pointer(xfrm_input_afinfo[afinfo-family], afinfo); spin_unlock_bh(xfrm_input_afinfo_lock); diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 847053e..c4c47f3 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -2808,7 +2808,7 @@ int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo) return -EAFNOSUPPORT; spin_lock(xfrm_policy_afinfo_lock); if (unlikely(xfrm_policy_afinfo[afinfo-family] != NULL)) - err = -ENOBUFS; + err = -EEXIST; else { struct dst_ops *dst_ops = afinfo-dst_ops; if (likely(dst_ops-kmem_cachep == NULL)) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index df93183..e47e498 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1908,7 +1908,7 @@ int xfrm_state_register_afinfo(struct xfrm_state_afinfo *afinfo) return -EAFNOSUPPORT; spin_lock_bh(xfrm_state_afinfo_lock); if (unlikely(xfrm_state_afinfo[afinfo-family] != NULL)) - err = -ENOBUFS; + err = -EEXIST; else rcu_assign_pointer(xfrm_state_afinfo[afinfo-family], afinfo); spin_unlock_bh(xfrm_state_afinfo_lock); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] xfrm: slightly optimise xfrm_input
From: Li RongQing roy.qing...@gmail.com Check x-km.state with XFRM_STATE_ACQ only when state is not XFRM_STAT_VALID, not everytime Signed-off-by: Li RongQing roy.qing...@gmail.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_input.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 459796a..1858a45f 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -239,13 +239,13 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) skb-sp-xvec[skb-sp-len++] = x; spin_lock(x-lock); - if (unlikely(x-km.state == XFRM_STATE_ACQ)) { - XFRM_INC_STATS(net, LINUX_MIB_XFRMACQUIREERROR); - goto drop_unlock; - } if (unlikely(x-km.state != XFRM_STATE_VALID)) { - XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEINVALID); + if (x-km.state == XFRM_STATE_ACQ) + XFRM_INC_STATS(net, LINUX_MIB_XFRMACQUIREERROR); + else + XFRM_INC_STATS(net, + LINUX_MIB_XFRMINSTATEINVALID); goto drop_unlock; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] xfrm: optimise the use of walk list header in xfrm_policy/state_walk
From: Li RongQing roy.qing...@gmail.com The walk from input is the list header, and marked as dead, and will be skipped in loop. list_first_entry() can be used to return the true usable value from walk if walk is not empty Signed-off-by: Li RongQing roy.qing...@gmail.com Acked-by: Herbert Xu herb...@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_policy.c | 4 +++- net/xfrm/xfrm_state.c | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index d8c35ad..847053e 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1004,7 +1004,9 @@ int xfrm_policy_walk(struct net *net, struct xfrm_policy_walk *walk, if (list_empty(walk-walk.all)) x = list_first_entry(net-xfrm.policy_all, struct xfrm_policy_walk_entry, all); else - x = list_entry(walk-walk.all, struct xfrm_policy_walk_entry, all); + x = list_first_entry(walk-walk.all, +struct xfrm_policy_walk_entry, all); + list_for_each_entry_from(x, net-xfrm.policy_all, all) { if (x-dead) continue; diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index f5e39e3..df93183 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1626,7 +1626,7 @@ int xfrm_state_walk(struct net *net, struct xfrm_state_walk *walk, if (list_empty(walk-all)) x = list_first_entry(net-xfrm.state_all, struct xfrm_state_walk, all); else - x = list_entry(walk-all, struct xfrm_state_walk, all); + x = list_first_entry(walk-all, struct xfrm_state_walk, all); list_for_each_entry_from(x, net-xfrm.state_all, all) { if (x-state == XFRM_STATE_DEAD) continue; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] xfrm: optimise to search the inexact policy list
From: Li RongQing roy.qing...@gmail.com The policies are organized into list by priority ascent of policy, so it is unnecessary to continue to loop the policy if the priority of current looped police is larger than or equal priority which is from the policy_bydst list. This allows to match policy with ~0U priority in inexact list too. Signed-off-by: Li RongQing roy.qing...@gmail.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_policy.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 3d264e5..18cead7 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1114,6 +1114,9 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type, } chain = net-xfrm.policy_inexact[dir]; hlist_for_each_entry(pol, chain, bydst) { + if ((pol-priority = priority) ret) + break; + err = xfrm_policy_match(pol, fl, type, family, dir); if (err) { if (err == -ESRCH) @@ -1122,7 +1125,7 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type, ret = ERR_PTR(err); goto fail; } - } else if (pol-priority priority) { + } else { ret = pol; break; } @@ -3203,9 +3206,11 @@ static struct xfrm_policy *xfrm_migrate_policy_find(const struct xfrm_selector * } chain = net-xfrm.policy_inexact[dir]; hlist_for_each_entry(pol, chain, bydst) { + if ((pol-priority = priority) ret) + break; + if (xfrm_migrate_selector_match(sel, pol-selector) - pol-type == type - pol-priority priority) { + pol-type == type) { ret = pol; break; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm6: Do not use xfrm_local_error for path MTU issues in tunnels
On Thu, May 28, 2015 at 12:49:19PM +0800, Herbert Xu wrote: On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote: This change makes it so that we use icmpv6_send to report PMTU issues back into tunnels in the case that the resulting packet is larger than the MTU of the outgoing interface. Previously xfrm_local_error was being used in this case, however this was resulting in no changes, I suspect due to the fact that the tunnel itself was being kept out of the loop. This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the behavior seen if the socket was orphaned. Instead of requiring the socket to be orphaned this patch simply defaults to using icmpv6_send in the case that the frame came though a tunnel. Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com Does this still work with normal tunnel mode and identical inner and outer addresses? I recall we used to have a bug where in that situation the kernel would interpret the ICMP message as a reduction in outer MTU and thus resulting in a loop where the MTU keeps getting smaller. Right, I think this reintroduces a bug that I fixed some years ago with commit dd767856a36e (xfrm6: Don't call icmpv6_send on local error) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm6: Do not use xfrm_local_error for path MTU issues in tunnels
On Wed, May 27, 2015 at 10:40:32AM -0700, Alexander Duyck wrote: This change makes it so that we use icmpv6_send to report PMTU issues back into tunnels in the case that the resulting packet is larger than the MTU of the outgoing interface. Previously xfrm_local_error was being used in this case, however this was resulting in no changes, I suspect due to the fact that the tunnel itself was being kept out of the loop. This patch fixes PMTU problems seen on ip6_vti tunnels and is based on the behavior seen if the socket was orphaned. Instead of requiring the socket to be orphaned this patch simply defaults to using icmpv6_send in the case that the frame came though a tunnel. We can use icmpv6_send() just in the case that the packet was already transmitted by a tunnel device, otherwise we get the bug back that I mentioned in my other mail. Not sure if we have something to know that the packet traversed a tunnel device. That's what I asked in the thread 'Looking for a lost patch'. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Looking for a lost patch
On Wed, May 27, 2015 at 11:46:03AM -0400, David Miller wrote: From: Steffen Klassert steffen.klass...@secunet.com Date: Wed, 27 May 2015 10:35:16 +0200 We currently check if a socket is attached to a skb and do socket error notification in this case, otherwise we do PMTU discovery if the packet is too big. Looks like this socket check is not sufficient if the packet is already transmitted through a tunnel device. I wonder if we have something to know that a packet was already transmitted through a tunnel device. We could switch from socket notification to PMTU discovery in this case. Generally speaking, we should not be orphaning the socket as it traverses through tunnels. In fact we have taken great pains to avoid doing this. Yes, I'm aware of this. I don't want to orphan the socket, all I wanted to do is to change the way we notify about MTU changes. I.e. use icmpv6_send() instead of xfrm_local_error() if the packet traversed a tunnel, that's why I wondered whether we can know this. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next PATCH RFC 0/3] Preserve skb-mark through VTI tunnels
On Tue, May 26, 2015 at 03:41:10PM -0700, Alexander Duyck wrote: These patches are meant to try and address the fact the VTI tunnels are currently overwriting the skb-mark value. I am generally happy with the first two patches, however the third patch still modifies the skb-mark, though it undoes after the fact. I don't see any better solution, so I think this should be ok for now. On the long run we need to replace this gre key/mark matching with a separate interface. The main problem I am trying to address is the fact that currently if I use an v6 over v6 VTI tunnel I cannot receive any traffic on the interface as the skb-mark is bleeding through and causing the traffic to be dropped. This is broken in the current mainline, so it should go into the ipsec tree as a bugfix. I'd merge this patchset if you submit it to that tree. Thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xfrm: Always zero high-order sequence number bits
On Thu, May 21, 2015 at 12:38:12AM +0800, Herbert Xu wrote: As we're now always including the high bits of the sequence number in the IV generation process we need to ensure that they don't contain crap. This patch ensures that the high sequence bits are always zeroed so that we don't leak random data into the IV. Signed-off-by: Herbert Xu herb...@gondor.apana.org.au Applied, thanks Herbert! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH ipsec-next] xfrm: Use VRF master index if output device is enslaved
On Wed, Aug 19, 2015 at 11:35:55AM -0700, David Ahern wrote: I think you should use the new vrf_master_index() helper that acquires rcu because it looks possible to call -decode_session() without rcu read lock, e.g. in the hold_timer function xfrm_policy_queue_process(), though I haven’t tested it and might be missing something. :-) I was digging into code paths yesterday. Today I added WARN_ON and seems like the rcu_read_lock is held: if (skb_dst(skb)) { WARN_ON(!rcu_read_lock_held() !rcu_read_lock_bh_held()); oif = vrf_master_ifindex_rcu(skb_dst(skb)-dev) ? : skb_dst(skb)-dev-ifindex; pr_info(_decode_session: oif %d skb_dst(skb)-dev-ifindex %d\n, oif, skb_dst(skb)-dev-ifindex); } I get the printk, but not the WARN_ON splat. Well, this depends on the codepath that called xfrm_decode_session(). It really think it was not called through xfrm_policy_queue_process() because this codepath is just used if the sysctl xfrm_larval_drop is switched off (on by default) and the required xfrm_state is not yet established. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH ipsec-next v2] xfrm: Use VRF master index if output device is enslaved
On Fri, Aug 21, 2015 at 02:11:21AM +0300, Nikolay Aleksandrov wrote: On Aug 21, 2015, at 1:06 AM, David Ahern d...@cumulusnetworks.com wrote: Directs route lookups to VRF table. Compiles out if NET_VRF is not enabled. With this patch able to successfully bring up ipsec tunnels in VRFs, even with duplicate network configuration. Signed-off-by: David Ahern d...@cumulusnetworks.com --- v2 - use vrf_master_ifindex rather than vrf_master_ifindex_rcu net/ipv4/xfrm4_policy.c | 7 +-- net/ipv6/xfrm6_policy.c | 7 +-- 2 files changed, 10 insertions(+), 4 deletions(-) Looks good to me, Acked-by: Nikolay Aleksandrov niko...@cumulusnetworks.com David, can you please take this directly into net-next? Acked-by: Steffen Klassert steffen.klass...@secunet.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] net: Document xfrm4_gc_thresh and xfrm6_gc_thresh
From: Alexander Duyck alexander.h.du...@redhat.com This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh based on the comments in commit eeb1b73378b56 (xfrm: Increase the garbage collector threshold). Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- Documentation/networking/ip-sysctl.txt | 10 ++ 1 file changed, 10 insertions(+) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 56db1ef..46e88ed 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1181,6 +1181,11 @@ tag - INTEGER Allows you to write a number, which can be used as required. Default value is 0. +xfrm4_gc_thresh - INTEGER + The threshold at which we will start garbage collecting for IPv4 + destination cache entries. At twice this value the system will + refuse new allocations. + Alexey Kuznetsov. kuz...@ms2.inr.ac.ru @@ -1617,6 +1622,11 @@ ratelimit - INTEGER otherwise the minimal space between responses in milliseconds. Default: 1000 +xfrm6_gc_thresh - INTEGER + The threshold at which we will start garbage collecting for IPv6 + destination cache entries. At twice this value the system will + refuse new allocations. + IPv6 Update by: Pekka Savola pek...@netcore.fi -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] xfrm: Add oif to dst lookups
From: David Ahern d...@cumulusnetworks.com Rules can be installed that direct route lookups to specific tables based on oif. Plumb the oif through the xfrm lookups so it gets set in the flow struct and passed to the resolver routines. Signed-off-by: David Ahern d...@cumulusnetworks.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- include/net/xfrm.h | 7 +-- net/ipv4/xfrm4_policy.c | 11 ++- net/ipv6/xfrm6_policy.c | 7 --- net/xfrm/xfrm_policy.c | 24 ++-- 4 files changed, 29 insertions(+), 20 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index f0ee97e..312e3fe 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -285,10 +285,13 @@ struct xfrm_policy_afinfo { unsigned short family; struct dst_ops *dst_ops; void(*garbage_collect)(struct net *net); - struct dst_entry*(*dst_lookup)(struct net *net, int tos, + struct dst_entry*(*dst_lookup)(struct net *net, + int tos, int oif, const xfrm_address_t *saddr, const xfrm_address_t *daddr); - int (*get_saddr)(struct net *net, xfrm_address_t *saddr, xfrm_address_t *daddr); + int (*get_saddr)(struct net *net, int oif, +xfrm_address_t *saddr, +xfrm_address_t *daddr); void(*decode_session)(struct sk_buff *skb, struct flowi *fl, int reverse); diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index bff6974..55b3c0f 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -19,7 +19,7 @@ static struct xfrm_policy_afinfo xfrm4_policy_afinfo; static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, - int tos, + int tos, int oif, const xfrm_address_t *saddr, const xfrm_address_t *daddr) { @@ -28,6 +28,7 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, memset(fl4, 0, sizeof(*fl4)); fl4-daddr = daddr-a4; fl4-flowi4_tos = tos; + fl4-flowi4_oif = oif; if (saddr) fl4-saddr = saddr-a4; @@ -38,22 +39,22 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, return ERR_CAST(rt); } -static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, +static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif, const xfrm_address_t *saddr, const xfrm_address_t *daddr) { struct flowi4 fl4; - return __xfrm4_dst_lookup(net, fl4, tos, saddr, daddr); + return __xfrm4_dst_lookup(net, fl4, tos, oif, saddr, daddr); } -static int xfrm4_get_saddr(struct net *net, +static int xfrm4_get_saddr(struct net *net, int oif, xfrm_address_t *saddr, xfrm_address_t *daddr) { struct dst_entry *dst; struct flowi4 fl4; - dst = __xfrm4_dst_lookup(net, fl4, 0, NULL, daddr); + dst = __xfrm4_dst_lookup(net, fl4, 0, oif, NULL, daddr); if (IS_ERR(dst)) return -EHOSTUNREACH; diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index ed0583c..a74013d 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -26,7 +26,7 @@ static struct xfrm_policy_afinfo xfrm6_policy_afinfo; -static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, +static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, const xfrm_address_t *saddr, const xfrm_address_t *daddr) { @@ -35,6 +35,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int err; memset(fl6, 0, sizeof(fl6)); + fl6.flowi6_oif = oif; memcpy(fl6.daddr, daddr, sizeof(fl6.daddr)); if (saddr) memcpy(fl6.saddr, saddr, sizeof(fl6.saddr)); @@ -50,13 +51,13 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, return dst; } -static int xfrm6_get_saddr(struct net *net, +static int xfrm6_get_saddr(struct net *net, int oif, xfrm_address_t *saddr, xfrm_address_t *daddr) { struct dst_entry *dst; struct net_device *dev; - dst = xfrm6_dst_lookup(net, 0, NULL, daddr); + dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr); if (IS_ERR(dst
[PATCH 2/4] net/xfrm: use kmemdup rather than duplicating its implementation
From: Andrzej Hajda a.ha...@samsung.com The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda a.ha...@samsung.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/xfrm/xfrm_user.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 0cebf1f..a8de9e3 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -925,12 +925,10 @@ static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb) return err; if (attrs[XFRMA_ADDRESS_FILTER]) { - filter = kmalloc(sizeof(*filter), GFP_KERNEL); + filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]), +sizeof(*filter), GFP_KERNEL); if (filter == NULL) return -ENOMEM; - - memcpy(filter, nla_data(attrs[XFRMA_ADDRESS_FILTER]), - sizeof(*filter)); } if (attrs[XFRMA_PROTO]) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
pull request (net-next): ipsec-next 2015-08-17
1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels. From Thomas Egerer. 2) Use kmemdup instead of duplicating it in xfrm_dump_sa(). From Andrzej Hajda. 3) Pass oif to the xfrm lookups so that it gets set on the flow and the resolver routines can match based on oif. From David Ahern. 4) Add documentation for the new xfrm garbage collector threshold. From Alexander Duyck. Please pull or let me know if there are problems. Thanks! The following changes since commit 07a51cd3794960548627a27aae68c1446341db32: vxlan: fix fdb_dump index calculation (2015-08-10 21:15:18 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master for you to fetch changes up to e69948a0a5309f3ef5715cb4ca7a9bd77d64e2cf: net: Document xfrm4_gc_thresh and xfrm6_gc_thresh (2015-08-12 08:28:04 +0200) Alexander Duyck (1): net: Document xfrm4_gc_thresh and xfrm6_gc_thresh Andrzej Hajda (1): net/xfrm: use kmemdup rather than duplicating its implementation David Ahern (1): xfrm: Add oif to dst lookups Thomas Egerer (1): xfrm6: Fix IPv6 ECN decapsulation Documentation/networking/ip-sysctl.txt | 10 ++ include/net/xfrm.h | 7 +-- net/ipv4/xfrm4_policy.c| 11 ++- net/ipv6/xfrm6_mode_tunnel.c | 3 +-- net/ipv6/xfrm6_policy.c| 7 --- net/xfrm/xfrm_policy.c | 24 ++-- net/xfrm/xfrm_user.c | 6 ++ 7 files changed, 42 insertions(+), 26 deletions(-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] xfrm6: Fix IPv6 ECN decapsulation
From: Thomas Egerer thomas.ege...@secunet.com Using ipv6_get_dsfield on the outer IP header implies that inner and outer header are of the the same address family. For interfamily tunnels, particularly 646, the code reading the DSCP field obtains the wrong values (IHL and the upper four bits of the DSCP field). This can cause the code to detect a congestion encoutered state in the outer header and enable the corresponding bits in the inner header, too. Since the DSCP field is stored in the xfrm mode common buffer independently from the IP version of the outer header, it's safe (and correct) to take this value from there. Signed-off-by: Thomas Egerer thomas.ege...@secunet.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com --- net/ipv6/xfrm6_mode_tunnel.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c index 901ef6f..f7fbdba 100644 --- a/net/ipv6/xfrm6_mode_tunnel.c +++ b/net/ipv6/xfrm6_mode_tunnel.c @@ -20,10 +20,9 @@ static inline void ipip6_ecn_decapsulate(struct sk_buff *skb) { - const struct ipv6hdr *outer_iph = ipv6_hdr(skb); struct ipv6hdr *inner_iph = ipipv6_hdr(skb); - if (INET_ECN_is_ce(ipv6_get_dsfield(outer_iph))) + if (INET_ECN_is_ce(XFRM_MODE_SKB_CB(skb)-tos)) IP6_ECN_set_ce(inner_iph); } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IPsec maintenance during the next weeks
On Tue, Jul 07, 2015 at 10:58:17PM -0700, David Miller wrote: From: Steffen Klassert steffen.klass...@secunet.com Date: Wed, 8 Jul 2015 07:04:32 +0200 I'll be off without mail access for the next two and a half weeks. Can you please take urgent IPsec patches directly into the net tree during this time? I'll let you know as soon as I'm back. Sure, thanks for letting me know. I'm back and continue to run the ipsec trees. Thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm: Add oif to dst lookups
On Mon, Aug 10, 2015 at 04:58:11PM -0600, David Ahern wrote: Rules can be installed that direct route lookups to specific tables based on oif. Plumb the oif through the xfrm lookups so it gets set in the flow struct and passed to the resolver routines. Signed-off-by: David Ahern d...@cumulusnetworks.com Applied to ipsec-next, thanks a lot! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 28/31] net/xfrm: use kmemdup rather than duplicating its implementation
On Fri, Aug 07, 2015 at 09:59:34AM +0200, Andrzej Hajda wrote: The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda a.ha...@samsung.com Applied to ipsec-next, thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next PATCH] net: Document xfrm4_gc_thresh and xfrm6_gc_thresh
On Tue, Aug 11, 2015 at 01:51:52PM -0700, David Miller wrote: From: Alexander Duyck alexander.h.du...@redhat.com Date: Tue, 11 Aug 2015 13:35:01 -0700 This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh based on the comments in commit eeb1b73378b56 (xfrm: Increase the garbage collector threshold). Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com Thanks, I'll let Steffen review and pick this up. Now applied to ipsec-next, thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net-next] xfrm: Fix unaligned access to stats in copy_to_user_state()
On Wed, Oct 21, 2015 at 11:48:25AM -0400, Sowmini Varadhan wrote: > > On sparc, deleting established SAs (e.g., by restarting ipsec) > results in unaligned access messages via xfrm_del_sa -> > km_state_notify -> xfrm_send_state_notify(). > > Even though struct xfrm_usersa_info is aligned on 8-byte boundaries, > netlink attributes are fundamentally only 4 byte aligned, and this > cannot be changed for nla_data() that is passed up to userspace. > As a result, the put_unaligned() macro needs to be used to > set up potentially unaligned fields such as the xfrm_stats in > copy_to_user_state() > > Signed-off-by: Sowmini Varadhan> --- > v2: review comment from thread: cannot use PTR_ALIGN as this would break > userspace assumptions about 4 byte alignment. Use *_unaligned() macros > as needed, instead. This works on intel 32-bit and 64-bit as expected. Patch applied to ipsec-next, thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] xfrm: Fix pmtu discovery for local generated packets.
Commit 044a832a777 ("xfrm: Fix local error reporting crash with interfamily tunnels") moved the setting of skb->protocol behind the last access of the inner mode family to fix an interfamily crash. Unfortunately now skb->protocol might not be set at all, so we fail dispatch to the inner address family. As a reault, the local error handler is not called and the mtu value is not reported back to userspace. We fix this by setting skb->protocol on message size errors before we call xfrm_local_error. Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily tunnels") Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/ipv4/xfrm4_output.c | 2 ++ net/ipv6/xfrm6_output.c | 1 + 2 files changed, 3 insertions(+) diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c index 2878dbf..41a2613 100644 --- a/net/ipv4/xfrm4_output.c +++ b/net/ipv4/xfrm4_output.c @@ -30,6 +30,8 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb) mtu = dst_mtu(skb_dst(skb)); if (skb->len > mtu) { + skb->protocol = htons(ETH_P_IP); + if (skb->sk) xfrm_local_error(skb, mtu); else diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c index be033f2..e15feb7 100644 --- a/net/ipv6/xfrm6_output.c +++ b/net/ipv6/xfrm6_output.c @@ -79,6 +79,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb) if (!skb->ignore_df && skb->len > mtu) { skb->dev = dst->dev; + skb->protocol = htons(ETH_P_IPV6); if (xfrm6_local_dontfrag(skb)) xfrm6_local_rxpmtu(skb, mtu); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] xfrm: Fix state threshold configuration from userspace
From: Michael Rossberg <michael.rossb...@tu-ilmenau.de> Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry timer (XFRMA_ETIMER_THRESH) of a state without having to set other attributes like replay counter and byte lifetime. Changing these other values while traffic flows will break the state. Signed-off-by: Michael Rossberg <michael.rossb...@tu-ilmenau.de> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/xfrm/xfrm_user.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index a8de9e3..24e06a2 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -1928,8 +1928,10 @@ static int xfrm_new_ae(struct sk_buff *skb, struct nlmsghdr *nlh, struct nlattr *rp = attrs[XFRMA_REPLAY_VAL]; struct nlattr *re = attrs[XFRMA_REPLAY_ESN_VAL]; struct nlattr *lt = attrs[XFRMA_LTIME_VAL]; + struct nlattr *et = attrs[XFRMA_ETIMER_THRESH]; + struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH]; - if (!lt && !rp && !re) + if (!lt && !rp && !re && !et && !rt) return err; /* pedantic mode - thou shalt sayeth replaceth */ -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
pull request (net): ipsec 2015-10-22
1) Fix IPsec pre-encap fragmentation for GSO packets. From Herbert Xu. 2) Fix some header checks in _decode_session6. We skip the header informations if the data pointer points already behind the header in question for some protocols. This is because we call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull in this case. Skipping the header informations can lead to incorrect policy lookups. From Mathias Krause. 3) Allow to change the replay threshold and expiry timer of a state without having to set other attributes like replay counter and byte lifetime. Changing these other attributes may break the SA. From Michael Rossberg. 4) Fix pmtu discovery for local generated packets. We may fail dispatch to the inner address family. As a reault, the local error handler is not called and the mtu value is not reported back to userspace. Please pull or let me know if there are problems. Thanks! The following changes since commit 724a7636ad026a3a68f3fc626ccd04111f65cfd9: Merge branch 'sctp-fixes' (2015-09-03 15:43:06 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master for you to fetch changes up to ca064bd89363a6e7e71b1c5226ff1b718957a9d4: xfrm: Fix pmtu discovery for local generated packets. (2015-10-19 10:30:05 +0200) Herbert Xu (1): ipv6: Fix IPsec pre-encap fragmentation check Mathias Krause (1): xfrm6: Fix ICMPv6 and MH header checks in _decode_session6 Michael Rossberg (1): xfrm: Fix state threshold configuration from userspace Steffen Klassert (1): xfrm: Fix pmtu discovery for local generated packets. net/ipv4/xfrm4_output.c | 2 ++ net/ipv6/xfrm6_output.c | 18 -- net/ipv6/xfrm6_policy.c | 6 -- net/xfrm/xfrm_user.c| 4 +++- 4 files changed, 21 insertions(+), 9 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] xfrm6: Fix ICMPv6 and MH header checks in _decode_session6
From: Mathias Krause <mathias.kra...@secunet.com> Ensure there's enough data left prior calling pskb_may_pull(). If skb->data was already advanced, we'll call pskb_may_pull() with a negative value converted to unsigned int -- leading to a huge positive value. That won't matter in practice as pskb_may_pull() will likely fail in this case, but it leads to underflow reports on kernels handling such kind of over-/underflows, e.g. a PaX enabled kernel instrumented with the size_overflow plugin. Reported-by: satmd <sa...@lain.at> Reported-and-tested-by: Marcin Jurkowski <marci...@gmail.com> Signed-off-by: Mathias Krause <mathias.kra...@secunet.com> Cc: PaX Team <pagee...@freemail.hu> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/ipv6/xfrm6_policy.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 30caa28..f10b940 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -178,7 +178,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse) return; case IPPROTO_ICMPV6: - if (!onlyproto && pskb_may_pull(skb, nh + offset + 2 - skb->data)) { + if (!onlyproto && (nh + offset + 2 < skb->data || + pskb_may_pull(skb, nh + offset + 2 - skb->data))) { u8 *icmp; nh = skb_network_header(skb); @@ -192,7 +193,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse) #if IS_ENABLED(CONFIG_IPV6_MIP6) case IPPROTO_MH: offset += ipv6_optlen(exthdr); - if (!onlyproto && pskb_may_pull(skb, nh + offset + 3 - skb->data)) { + if (!onlyproto && (nh + offset + 3 < skb->data || + pskb_may_pull(skb, nh + offset + 3 - skb->data))) { struct ip6_mh *mh; nh = skb_network_header(skb); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] ipv6: Fix IPsec pre-encap fragmentation check
From: Herbert Xu <herb...@gondor.apana.org.au> The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode packets. That is, we perform fragmentation pre-encap rather than post-encap. A check was added later to ensure that proper MTU information is passed back for locally generated traffic. Unfortunately this check was performed on all IPsec packets, including transport-mode packets. What's more, the check failed to take GSO into account. The end result is that transport-mode GSO packets get dropped at the check. This patch fixes it by moving the tunnel mode check forward as well as adding the GSO check. Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error") Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/ipv6/xfrm6_output.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c index 09c76a7..be033f2 100644 --- a/net/ipv6/xfrm6_output.c +++ b/net/ipv6/xfrm6_output.c @@ -136,6 +136,7 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff *skb) struct dst_entry *dst = skb_dst(skb); struct xfrm_state *x = dst->xfrm; int mtu; + bool toobig; #ifdef CONFIG_NETFILTER if (!x) { @@ -144,25 +145,29 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff *skb) } #endif + if (x->props.mode != XFRM_MODE_TUNNEL) + goto skip_frag; + if (skb->protocol == htons(ETH_P_IPV6)) mtu = ip6_skb_dst_mtu(skb); else mtu = dst_mtu(skb_dst(skb)); - if (skb->len > mtu && xfrm6_local_dontfrag(skb)) { + toobig = skb->len > mtu && !skb_is_gso(skb); + + if (toobig && xfrm6_local_dontfrag(skb)) { xfrm6_local_rxpmtu(skb, mtu); return -EMSGSIZE; - } else if (!skb->ignore_df && skb->len > mtu && skb->sk) { + } else if (!skb->ignore_df && toobig && skb->sk) { xfrm_local_error(skb, mtu); return -EMSGSIZE; } - if (x->props.mode == XFRM_MODE_TUNNEL && - ((skb->len > mtu && !skb_is_gso(skb)) || - dst_allfrag(skb_dst(skb { + if (toobig || dst_allfrag(skb_dst(skb))) return ip6_fragment(sk, skb, x->outer_mode->afinfo->output_finish); - } + +skip_frag: return x->outer_mode->afinfo->output_finish(sk, skb); } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] xfrm: Fix unaligned access to stats in copy_to_user_state()
From: Sowmini Varadhan <sowmini.varad...@oracle.com> On sparc, deleting established SAs (e.g., by restarting ipsec) results in unaligned access messages via xfrm_del_sa -> km_state_notify -> xfrm_send_state_notify(). Even though struct xfrm_usersa_info is aligned on 8-byte boundaries, netlink attributes are fundamentally only 4 byte aligned, and this cannot be changed for nla_data() that is passed up to userspace. As a result, the put_unaligned() macro needs to be used to set up potentially unaligned fields such as the xfrm_stats in copy_to_user_state() Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/xfrm/xfrm_user.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index a8de9e3..639e0d5 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -31,6 +31,7 @@ #if IS_ENABLED(CONFIG_IPV6) #include #endif +#include static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type) { @@ -728,7 +729,9 @@ static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p) memcpy(>sel, >sel, sizeof(p->sel)); memcpy(>lft, >lft, sizeof(p->lft)); memcpy(>curlft, >curlft, sizeof(p->curlft)); - memcpy(>stats, >stats, sizeof(p->stats)); + put_unaligned(x->stats.replay_window, >stats.replay_window); + put_unaligned(x->stats.replay, >stats.replay); + put_unaligned(x->stats.integrity_failed, >stats.integrity_failed); memcpy(>saddr, >props.saddr, sizeof(p->saddr)); p->mode = x->props.mode; p->replay_window = x->props.replay_window; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] xfrm: Let the flowcache handle its size by default.
The xfrm flowcache size is limited by the flowcache limit (4096 * number of online cpus) and the xfrm garbage collector threshold (2 * 32768), whatever is reached first. This means that we can hit the garbage collector limit only on systems with more than 16 cpus. On such systems we simply refuse new allocations if we reach the limit, so new flows are dropped. On syslems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. The xfrm garbage collector threshold can still be set below the flowcache limit to reduce the memory usage of the flowcache. Tested-by: Dan Streetman <dan.street...@canonical.com> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- Documentation/networking/ip-sysctl.txt | 6 -- net/ipv4/xfrm4_policy.c| 2 +- net/ipv6/xfrm6_policy.c| 2 +- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ebe94f2..260f30b 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1199,7 +1199,8 @@ tag - INTEGER xfrm4_gc_thresh - INTEGER The threshold at which we will start garbage collecting for IPv4 destination cache entries. At twice this value the system will - refuse new allocations. + refuse new allocations. The value must be set below the flowcache + limit (4096 * number of online cpus) to take effect. igmp_link_local_mcast_reports - BOOLEAN Enable IGMP reports for link local multicast groups in the @@ -1645,7 +1646,8 @@ ratelimit - INTEGER xfrm6_gc_thresh - INTEGER The threshold at which we will start garbage collecting for IPv6 destination cache entries. At twice this value the system will - refuse new allocations. + refuse new allocations. The value must be set below the flowcache + limit (4096 * number of online cpus) to take effect. IPv6 Update by: diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 0304d16..75e8d48 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -246,7 +246,7 @@ static struct dst_ops xfrm4_dst_ops = { .destroy = xfrm4_dst_destroy, .ifdown = xfrm4_dst_ifdown, .local_out =__ip_local_out, - .gc_thresh =32768, + .gc_thresh =INT_MAX, }; static struct xfrm_policy_afinfo xfrm4_policy_afinfo = { diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 30caa28..2fad593 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -287,7 +287,7 @@ static struct dst_ops xfrm6_dst_ops = { .destroy = xfrm6_dst_destroy, .ifdown = xfrm6_dst_ifdown, .local_out =__ip6_local_out, - .gc_thresh =32768, + .gc_thresh =INT_MAX, }; static struct xfrm_policy_afinfo xfrm6_policy_afinfo = { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] xfrm4: Fix header checks in _decode_session4.
We skip the header informations if the data pointer points already behind the header in question for some protocols. This is because we call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull in this case. Skipping the header informations can lead to incorrect policy lookups, so fix it by a check of the data pointer position before we call pskb_may_pull. Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/ipv4/xfrm4_policy.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 75e8d48..e4d533c 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -137,7 +137,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) break; case IPPROTO_ICMP: - if (pskb_may_pull(skb, xprth + 2 - skb->data)) { + if (xprth + 2 < skb->data || + pskb_may_pull(skb, xprth + 2 - skb->data)) { u8 *icmp = xprth; fl4->fl4_icmp_type = icmp[0]; @@ -146,7 +147,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) break; case IPPROTO_ESP: - if (pskb_may_pull(skb, xprth + 4 - skb->data)) { + if (xprth + 4 < skb->data || + pskb_may_pull(skb, xprth + 4 - skb->data)) { __be32 *ehdr = (__be32 *)xprth; fl4->fl4_ipsec_spi = ehdr[0]; @@ -154,7 +156,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) break; case IPPROTO_AH: - if (pskb_may_pull(skb, xprth + 8 - skb->data)) { + if (xprth + 8 < skb->data || + pskb_may_pull(skb, xprth + 8 - skb->data)) { __be32 *ah_hdr = (__be32 *)xprth; fl4->fl4_ipsec_spi = ah_hdr[1]; @@ -162,7 +165,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) break; case IPPROTO_COMP: - if (pskb_may_pull(skb, xprth + 4 - skb->data)) { + if (xprth + 4 < skb->data || + pskb_may_pull(skb, xprth + 4 - skb->data)) { __be16 *ipcomp_hdr = (__be16 *)xprth; fl4->fl4_ipsec_spi = htonl(ntohs(ipcomp_hdr[1])); @@ -170,7 +174,8 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) break; case IPPROTO_GRE: - if (pskb_may_pull(skb, xprth + 12 - skb->data)) { + if (xprth + 12 < skb->data || + pskb_may_pull(skb, xprth + 12 - skb->data)) { __be16 *greflags = (__be16 *)xprth; __be32 *gre_hdr = (__be32 *)xprth; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] xfrm: Increment statistic counter on inner mode error
Increment the LINUX_MIB_XFRMINSTATEMODEERROR statistic counter to notify about dropped packets if we fail to fetch a inner mode. Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/xfrm/xfrm_input.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 60ce701..ad7f5b3 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -330,8 +330,10 @@ resume: if (x->sel.family == AF_UNSPEC) { inner_mode = xfrm_ip2inner_mode(x, XFRM_MODE_SKB_CB(skb)->protocol); - if (inner_mode == NULL) + if (inner_mode == NULL) { + XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR); goto drop; + } } if (inner_mode->input(x, skb)) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] xfrm4: Reload skb header pointers after calling pskb_may_pull.
A call to pskb_may_pull may change the pointers into the packet, so reload the pointers after the call. Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- net/ipv4/xfrm4_policy.c | 33 ++--- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index e4d533c..269b137 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -129,7 +129,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) case IPPROTO_DCCP: if (xprth + 4 < skb->data || pskb_may_pull(skb, xprth + 4 - skb->data)) { - __be16 *ports = (__be16 *)xprth; + __be16 *ports; + + xprth = skb_network_header(skb) + iph->ihl * 4; + ports = (__be16 *)xprth; fl4->fl4_sport = ports[!!reverse]; fl4->fl4_dport = ports[!reverse]; @@ -139,7 +142,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) case IPPROTO_ICMP: if (xprth + 2 < skb->data || pskb_may_pull(skb, xprth + 2 - skb->data)) { - u8 *icmp = xprth; + u8 *icmp; + + xprth = skb_network_header(skb) + iph->ihl * 4; + icmp = xprth; fl4->fl4_icmp_type = icmp[0]; fl4->fl4_icmp_code = icmp[1]; @@ -149,7 +155,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) case IPPROTO_ESP: if (xprth + 4 < skb->data || pskb_may_pull(skb, xprth + 4 - skb->data)) { - __be32 *ehdr = (__be32 *)xprth; + __be32 *ehdr; + + xprth = skb_network_header(skb) + iph->ihl * 4; + ehdr = (__be32 *)xprth; fl4->fl4_ipsec_spi = ehdr[0]; } @@ -158,7 +167,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) case IPPROTO_AH: if (xprth + 8 < skb->data || pskb_may_pull(skb, xprth + 8 - skb->data)) { - __be32 *ah_hdr = (__be32 *)xprth; + __be32 *ah_hdr; + + xprth = skb_network_header(skb) + iph->ihl * 4; + ah_hdr = (__be32 *)xprth; fl4->fl4_ipsec_spi = ah_hdr[1]; } @@ -167,7 +179,10 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) case IPPROTO_COMP: if (xprth + 4 < skb->data || pskb_may_pull(skb, xprth + 4 - skb->data)) { - __be16 *ipcomp_hdr = (__be16 *)xprth; + __be16 *ipcomp_hdr; + + xprth = skb_network_header(skb) + iph->ihl * 4; + ipcomp_hdr = (__be16 *)xprth; fl4->fl4_ipsec_spi = htonl(ntohs(ipcomp_hdr[1])); } @@ -176,8 +191,12 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse) case IPPROTO_GRE: if (xprth + 12 < skb->data || pskb_may_pull(skb, xprth + 12 - skb->data)) { - __be16 *greflags = (__be16 *)xprth; - __be32 *gre_hdr = (__be32 *)xprth; + __be16 *greflags; + __be32 *gre_hdr; + + xprth = skb_network_header(skb) + iph->ihl * 4; + greflags = (__be16 *)xprth; + gre_hdr = (__be32 *)xprth; if (greflags[0] & GRE_KEY) { if (greflags[0] & GRE_CSUM) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
pull request (net-next): ipsec-next 2015-10-30
1) The flow cache is limited by the flow cache limit which depends on the number of cpus and the xfrm garbage collector threshold which is independent of the number of cpus. This leads to the fact that on systems with more than 16 cpus we hit the xfrm garbage collector limit and refuse new allocations, so new flows are dropped. On systems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. 2) Fix some unaligned accesses on sparc systems. From Sowmini Varadhan. 3) Fix some header checks in _decode_session4. We may call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull. This can lead to incorrect policy lookups. We fix this by a check of the data pointer position before we call pskb_may_pull. 4) Reload skb header pointers after calling pskb_may_pull in _decode_session4 as this may change the pointers into the packet. 5) Add a missing statistic counter on inner mode errors. Please pull or let me know if there are problems. Thanks! The following changes since commit 8a4683a5e06efda7e1f327213678d4dcafc0d894: net: help compiler generate better code in eth_get_headlen (2015-09-28 22:51:15 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master for you to fetch changes up to cb866e3298cd7412503fc7e2c265753c853fab9d: xfrm: Increment statistic counter on inner mode error (2015-10-23 07:52:58 +0200) Sowmini Varadhan (1): xfrm: Fix unaligned access to stats in copy_to_user_state() Steffen Klassert (4): xfrm: Let the flowcache handle its size by default. xfrm4: Fix header checks in _decode_session4. xfrm4: Reload skb header pointers after calling pskb_may_pull. xfrm: Increment statistic counter on inner mode error Documentation/networking/ip-sysctl.txt | 6 ++-- net/ipv4/xfrm4_policy.c| 50 +- net/ipv6/xfrm6_policy.c| 2 +- net/xfrm/xfrm_input.c | 4 ++- net/xfrm/xfrm_user.c | 5 +++- 5 files changed, 49 insertions(+), 18 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3] xfrm: dst_entries_init() per-net dst_ops
On Thu, Oct 29, 2015 at 09:51:16AM -0400, Dan Streetman wrote: > Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops > templates; their dst_entries counters will never be used. Move the > xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to > xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init > and dst_entries_destroy for each net namespace. > > The ipv4 and ipv6 xfrms each create dst_ops template, and perform > dst_entries_init on the templates. The template values are copied to each > net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops > pcpuc_entries field is a percpu counter and cannot be used correctly by > simply copying it to another object. > > The result of this is a very subtle bug; changes to the dst entries > counter from one net namespace may sometimes get applied to a different > net namespace dst entries counter. This is because of how the percpu > counter works; it has a main count field as well as a pointer to the > percpu variables. Each net namespace maintains its own main count > variable, but all point to one set of percpu variables. When any net > namespace happens to change one of the percpu variables to outside its > small batch range, its count is moved to the net namespace's main count > variable. So with multiple net namespaces operating concurrently, the > dst_ops entries counter can stray from the actual value that it should > be; if counts are consistently moved from one net namespace to another > (which my testing showed is likely), then one net namespace winds up > with a negative dst_ops count while another winds up with a continually > increasing count, eventually reaching its gc_thresh limit, which causes > all new traffic on the net namespace to fail with -ENOBUFS. > > Signed-off-by: Dan Streetman> Signed-off-by: Dan Streetman Applied to the ipsec tree, thanks Dan! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
On Mon, Oct 12, 2015 at 12:49:29PM -0600, David Ahern wrote: > On 10/9/15 11:27 AM, David Ahern wrote: > >On 10/9/15 1:17 AM, Steffen Klassert wrote: > >>>>diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c > >>>>index 30caa289c5db..5cedfda4b241 100644 > >>>>--- a/net/ipv6/xfrm6_policy.c > >>>>+++ b/net/ipv6/xfrm6_policy.c > >>>>@@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct > >>>>net *net, int tos, int oif, > >>>> > >>>> memset(, 0, sizeof(fl6)); > >>>> fl6.flowi6_oif = oif; > >>>>+fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF; > >>>> memcpy(, daddr, sizeof(fl6.daddr)); > >>>> if (saddr) > >>>> memcpy(, saddr, sizeof(fl6.saddr)); > >>> > >>>I found that this fix is still not sufficient with the mip6 > >>>(Mobile IPv6) use case. > >> > >>It does not even fix the vti case. The behaviour of the vti devices is > >>the same, with and without the patch. > >> > > > >The attached patch applied to Linus' tree works for me. Currently the > >above change is not in his tree, so I added it to this patch. Once you > >confirm that it works for you I'll create the delta-patch for net and > >send out. > > Steffen: Have you had a chance to try the patch? Does it solve the > vti6 problem for you? The delta of your patch and the current net tree fixes the vti6 problems. Current net-next works without any changes. Thanks David! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: Really fix vti6 with oif in dst lookups
On Mon, Oct 19, 2015 at 08:26:05AM -0700, David Ahern wrote: > 6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6") > is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them. > > Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") > Cc: Steffen Klassert <steffen.klass...@secunet.com> > Signed-off-by: David Ahern <d...@cumulusnetworks.com> > --- > This is needed for net in 4.3. 6e28b000825d was mistakenly created on > top of the VRF patches which had this change. Steffen has confirmed > VTI6 works with this change. Acked-by: Steffen Klassert <steffen.klass...@secunet.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/2] xfrm: Fix unaligned access in xfrm_notify_sa() for DELSA
On Mon, Oct 19, 2015 at 05:23:29PM -0400, Sowmini Varadhan wrote: > On sparc, deleting established SAs (e.g., by restarting ipsec > at the peer) results in unaligned access messages via > xfrm_del_sa -> km_state_notify -> xfrm_send_state_notify(). > Use an aligned pointer to xfrm_usersa_info for this case. > > Signed-off-by: Sowmini Varadhan> --- > net/xfrm/xfrm_user.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c > index a8de9e3..158ef4a 100644 > --- a/net/xfrm/xfrm_user.c > +++ b/net/xfrm/xfrm_user.c > @@ -2659,7 +2659,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const > struct km_event *c) > if (attr == NULL) > goto out_free_skb; > > - p = nla_data(attr); > + p = PTR_ALIGN(nla_data(attr), __alignof__(*p)); Hm, this breaks userspace notifications on 64-bit systems. Userspace expects this to be aligned to 4, with your patch it is aligned to 8 on 64-bit. Without your patch I get the correct notification when deleting a SA: ip x m Deleted src 172.16.0.2 dst 172.16.0.1 proto esp spi 0x0002 reqid 2 mode tunnel replay-window 32 auth-trunc hmac(sha1) 0x31323334353637383930 96 enc cbc(aes) 0x31323334353637383930313233343536 sel src 10.0.0.0/24 dst 192.168.0.0/24 With your patch I get for the same SA: ip x m Deleted src 50.0.0.0 dst 0.0.0.0 proto 0 reqid 0 mode transport replay-window 0 flag decap-dscp auth-trunc hmac(sha1) 0x31323334353637383930 96 enc cbc(aes) 0x31323334353637383930313233343536 sel src 0.0.0.0/0 dst 0.234.255.255/0 proto igmp -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
Hi David. On Mon, Oct 12, 2015 at 12:49:29PM -0600, David Ahern wrote: > On 10/9/15 11:27 AM, David Ahern wrote: > > > >The attached patch applied to Linus' tree works for me. Currently the > >above change is not in his tree, so I added it to this patch. Once you > >confirm that it works for you I'll create the delta-patch for net and > >send out. > > Steffen: Have you had a chance to try the patch? Does it solve the > vti6 problem for you? > I'm not at office this week, so not able to do a test before monday. I let you know as soon as I tried the patch. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6
On Fri, Oct 09, 2015 at 03:54:22PM +0900, Hajime Tazaki wrote: > > Hello David, > > At Mon, 5 Oct 2015 08:32:51 -0600, > David Ahern wrote: > > > > > diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c > > index 30caa289c5db..5cedfda4b241 100644 > > --- a/net/ipv6/xfrm6_policy.c > > +++ b/net/ipv6/xfrm6_policy.c > > @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net > > *net, int tos, int oif, > > > > memset(, 0, sizeof(fl6)); > > fl6.flowi6_oif = oif; > > + fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF; > > memcpy(, daddr, sizeof(fl6.daddr)); > > if (saddr) > > memcpy(, saddr, sizeof(fl6.saddr)); > > I found that this fix is still not sufficient with the mip6 > (Mobile IPv6) use case. It does not even fix the vti case. The behaviour of the vti devices is the same, with and without the patch. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
IPsec maintenance during the next weeks
David, I'll be off without mail access for the next two and a half weeks. Can you please take urgent IPsec patches directly into the net tree during this time? I'll let you know as soon as I'm back. Thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IPv6 xfrm GSO fragmentation bug
On Sun, Aug 30, 2015 at 04:24:32PM +0800, Herbert Xu wrote: > Hi Steffen: > > I received a bug report regarding poor IPComp performance over > IPv6: > > https://bugzilla.redhat.com/show_bug.cgi?id=1257952 > > It appears to have been caused by > > commit dd767856a36e00b631d65ebc4bb81b19915532d6 > Author: Steffen Klassert <steffen.klass...@secunet.com> > Date: Tue Oct 11 01:44:30 2011 + > > xfrm6: Don't call icmpv6_send on local error > > which addded an MTU check without a GSO override. > > Fixing it obviously isn't difficult, but I am wondering why I > can't find a corresponding patch for IPv4. Do we need that check > to be present in xfrm6_output at all? If we do why isn't it needed > for IPv4 as well? As far as I remember, this was to catch local message size errors before __xfrm6_output calls ip6_fragment which would use icmpv6_send for the error notification. IPv4 does not do fragmentation in the xfrm4_output functions, so this mtu check was not needed there. I think just adding the gso checks should be fine. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IPv6 xfrm GSO fragmentation bug
On Fri, Sep 04, 2015 at 01:21:06PM +0800, Herbert Xu wrote: > On Mon, Aug 31, 2015 at 03:35:26PM +0800, Herbert Xu wrote: > > > > I see where the bug came from. Indeed IPv6 does do fragmentation > > but only for tunnel mode. While your patch added a check that also > > affected transport mode. So in addition to the GSO fix we should > > also make the MTU check conditional to tunnel mode. > > Here is the patch: > > ---8<--- > ipv6: Fix IPsec pre-encap fragmentation check > > The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode > packets. That is, we perform fragmentation pre-encap rather than > post-encap. > > A check was added later to ensure that proper MTU information is > passed back for locally generated traffic. Unfortunately this > check was performed on all IPsec packets, including transport-mode > packets. > > What's more, the check failed to take GSO into account. > > The end result is that transport-mode GSO packets get dropped at > the check. > > This patch fixes it by moving the tunnel mode check forward as well > as adding the GSO check. > > Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error") > Signed-off-by: Herbert XuApplied to the ipsec tree, thanks Herbert! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm6: Fix ICMPv6 and MH header checks in _decode_session6
On Fri, Sep 11, 2015 at 09:57:20AM +0200, Mathias Krause wrote: > From: Mathias Krause> > Ensure there's enough data left prior calling pskb_may_pull(). If > skb->data was already advanced, we'll call pskb_may_pull() with a > negative value converted to unsigned int -- leading to a huge > positive value. That won't matter in practice as pskb_may_pull() > will likely fail in this case, but it leads to underflow reports on > kernels handling such kind of over-/underflows, e.g. a PaX enabled > kernel instrumented with the size_overflow plugin. > > Reported-by: satmd > Reported-and-tested-by: Marcin Jurkowski > Signed-off-by: Mathias Krause > Cc: PaX Team Skipping upper layer informations due to a wrong length calculation, may also leed to incorrect policy lookups. Patch applied to the ipsec tree, thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: Fix vti use case with oif in dst lookups
On Tue, Sep 15, 2015 at 03:10:50PM -0700, David Ahern wrote: > Steffen reported that the recent change to add oif to dst lookups breaks > the VTI use case. The problem is that with the oif set in the flow struct > the comparison to the nh_oif is triggered. Fix by splitting the > FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache > bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare > nh oif (FLOWI_FLAG_SKIP_NH_OIF). > > Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") > > Signed-off-by: David Ahern <d...@cumulusnetworks.com> This works, thanks a lot for the quick fix! > --- > IPv6 does not show this problem for me. So no change is added for IPv6. > If your mileage varies let me know and I'll take another look. IPv6 works just fine as it is, so no change needed. Acked-by: Steffen Klassert <steffen.klass...@secunet.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xfrm4_garbage_collect reaching limit
On Mon, Sep 14, 2015 at 11:14:59PM -0400, Dan Streetman wrote: > On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert > <steffen.klass...@secunet.com> wrote: > > > >> Possibly the > >> default value of xfrm4_gc_thresh could be set proportional to > >> num_online_cpus(), but that doesn't help when cpus are onlined after > >> boot. > > > > This could be an option, we could change the xfrm4_gc_thresh value with > > a cpu notifier callback if more cpus come up after boot. > > the issue there is, if the value is changed by the user, does a cpu > hotplug reset it back to default... What about the patch below? With this we are independent of the number of cpus. It should cover most, if not all usecases. While we are at it, we could think about increasing the flowcache percpu limit. This value was choosen back in 2003, so maybe we could have more than 4k cache entries per cpu these days. Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default. The xfrm flowcache size is limited by the flowcache limit (4096 * number of online cpus) and the xfrm garbage collector threshold (2 * 32768), whatever is reached first. This means that we can hit the garbage collector limit only on systems with more than 16 cpus. On such systems we simply refuse new allocations if we reach the limit, so new flows are dropped. On syslems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. The xfrm garbage collector threshold can still be set below the flowcache limit to reduce the memory usage of the flowcache. Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> --- Documentation/networking/ip-sysctl.txt | 6 -- net/ipv4/xfrm4_policy.c| 2 +- net/ipv6/xfrm6_policy.c| 2 +- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ebe94f2..260f30b 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1199,7 +1199,8 @@ tag - INTEGER xfrm4_gc_thresh - INTEGER The threshold at which we will start garbage collecting for IPv4 destination cache entries. At twice this value the system will - refuse new allocations. + refuse new allocations. The value must be set below the flowcache + limit (4096 * number of online cpus) to take effect. igmp_link_local_mcast_reports - BOOLEAN Enable IGMP reports for link local multicast groups in the @@ -1645,7 +1646,8 @@ ratelimit - INTEGER xfrm6_gc_thresh - INTEGER The threshold at which we will start garbage collecting for IPv6 destination cache entries. At twice this value the system will - refuse new allocations. + refuse new allocations. The value must be set below the flowcache + limit (4096 * number of online cpus) to take effect. IPv6 Update by: diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 1e06c4f..3dffc73 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = { .destroy = xfrm4_dst_destroy, .ifdown = xfrm4_dst_ifdown, .local_out =__ip_local_out, - .gc_thresh =32768, + .gc_thresh =INT_MAX, }; static struct xfrm_policy_afinfo xfrm4_policy_afinfo = { diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index f10b940..e9af39a 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -289,7 +289,7 @@ static struct dst_ops xfrm6_dst_ops = { .destroy = xfrm6_dst_destroy, .ifdown = xfrm6_dst_ifdown, .local_out =__ip6_local_out, - .gc_thresh =32768, + .gc_thresh =INT_MAX, }; static struct xfrm_policy_afinfo xfrm6_policy_afinfo = { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xfrm4_garbage_collect reaching limit
Hi Dan. On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote: > Hi Steffen, > > I've been working with Jay on a ipsec issue, which I believe he > discussed with you. Yes, we talked about this at the LPC. > In this case the xfrm4_garbage_collect is > returning error because the number of xfrm4 dst entries has exceeded > twice the gc_thresh, which causes new allocations of xfrm4 dst objects > to fail, thus making the ipsec connection unusable (until dst objects > are removed/freed). > > The main reason the count gets to the limit is because the > xfrm4_policy_afinfo.garbage_collect function - which points to > flow_cache_flush (indirectly) - doesn't actually guarantee any xfrm4 > dst will get cleaned up, it only cleans up unused entries. > > The flow cache hashtable size limit watermark does restrict how many > flow cache entries exist (by shrinking the per-cpu hashtable once it > has 4k entries), and therefore indirectly controls the total number of > xfrm4 dst objects. However, there's a mismatch between the default > xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst > objects) - and the flow cache hashtable limit of 4k objects per cpu. > Any system with 16 or less cpus will have a total limit of 64k (or > less) flow cache entries, so the 64k xfrm4 dst entry limit will never > be reached. However for any system with more than 16 cpus, the flow > cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst > allocation can fail, rendering the ipsec connection unusable. > > The most obvious solution is for the system admin to increase the > xfrm4_gc_thresh value, although it's not really an obvious solution to > the end-user what value they should set it to :-) Yes, a static gc threshold is always wrong for some workloads. So the user needs to adjust it to his needs, even if the right value is not obvious. > Possibly the > default value of xfrm4_gc_thresh could be set proportional to > num_online_cpus(), but that doesn't help when cpus are onlined after > boot. This could be an option, we could change the xfrm4_gc_thresh value with a cpu notifier callback if more cpus come up after boot. > Also, a warning message indicating the xfrm4_gc_thresh limit > was reached, and a suggestion to increase the limit, may help anyone > who hits the issue. > > I'm not sure if something more aggressive is appropriate, like > removing active entries during garbage collection. It would not make too much sense to push an active flow out of the fastpath just to add some other flow. If the number of active entries is to high, there is no other option than increasing the gc threshold. You could try to reduce the number of active entries by shutting down stale security associations frequently. > Or, removing the > failure condition from xfrm4_garbage_collect so xfrm4 dst_ops can > always be allocated, This would open doors for DOS attacks, we can't do this. > or just increasing it from gc_thresh * 2 up to * > 4 or more. This would just defer the problem, so not a real solution. That said, whatever we do, we just paper over the real problem, that is the flowcache itself. Everything that need this kind of garbage collecting is fundamentally broken. But as long as nobody volunteers to work on a replacement, we have to live with this situation somehow. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfrm: Add oif to dst lookups
On Mon, Aug 10, 2015 at 04:58:11PM -0600, David Ahern wrote: > Rules can be installed that direct route lookups to specific tables based > on oif. Plumb the oif through the xfrm lookups so it gets set in the flow > struct and passed to the resolver routines. > > Signed-off-by: David AhernDavid, this change broke vti tunnels. > @@ -1690,8 +1694,8 @@ static struct dst_entry *xfrm_bundle_create(struct > xfrm_policy *policy, > > if (xfrm[i]->props.mode != XFRM_MODE_TRANSPORT) { > family = xfrm[i]->props.family; > - dst = xfrm_dst_lookup(xfrm[i], tos, , , > - family); > + dst = xfrm_dst_lookup(xfrm[i], tos, fl->flowi_oif, > + , , family); Passing the original output interface to xfrm_dst_lookup will generate a routing loop whenever the original output interface is not identical to the tunnel endpoint, like it is with vti. We can not ask for a route through a specific interface here. This is the lookup for the tunnel endpoints, so it must return a route through the local tunnel endpoint device. I don't know how you are going to use this with your vrf changes, so I'm not sure how to fix this in a way that it works with vrf. Please look into this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xfrm4_garbage_collect reaching limit
On Thu, Sep 17, 2015 at 09:23:35PM -0700, David Miller wrote: > From: Steffen Klassert <steffen.klass...@secunet.com> > Date: Wed, 16 Sep 2015 10:45:41 +0200 > > > index 1e06c4f..3dffc73 100644 > > --- a/net/ipv4/xfrm4_policy.c > > +++ b/net/ipv4/xfrm4_policy.c > > @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = { > > .destroy = xfrm4_dst_destroy, > > .ifdown = xfrm4_dst_ifdown, > > .local_out =__ip_local_out, > > - .gc_thresh =32768, > > + .gc_thresh =INT_MAX, > > }; > > > > static struct xfrm_policy_afinfo xfrm4_policy_afinfo = { > > This means the dst_ops->gc() for xfrm will never be invoked. > > Is that intentional? Yes. This is already the case on systems with less than 8 cpus because the flowcache is limited to 4096 entries per cpu. The percpu flowcache shrinks itself to 'low_watermark' enrires if it hits the percpu limit. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function
On Tue, Sep 29, 2015 at 04:58:46PM -0600, David Ahern wrote: > Hi Tom: > > On 9/29/15 4:17 PM, Tom Herbert wrote: > >This patch adds xfrm6_xlat_addr which is called in the data path > >to perform address translation (primarily for the receive path). Modules > >may register their own callback to perform a translation-- this > >registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del. > >xfrm6_xlat_addr allows translation of addresses for an sk_buff. > > > Seems like a stretch to lump this into xfrms. You have a separate > genl based config as opposed to the netlink xfrm API and you are > calling the xlat_addr function directly in ip6_rcv as opposed to via > some policy with dst_ops driven redirection. Why call this a xfrm? I have to agree here. We have policies and states to do the lookups and to describe the transformation. Just adding a callback to do this in a different way does not integrate well into xfrm. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] xfrm: Fix state threshold configuration from userspace
On Tue, Sep 29, 2015 at 11:25:08AM +0200, Michael Rossberg wrote: > Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry > timer (XFRMA_ETIMER_THRESH) of a state without having to set other > attributes like replay counter and byte lifetime. Changing these other > values while traffic flows will break the state. > > Signed-off-by: Michael RossbergApplied to the ipsec tree, thanks Michael! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv
On Tue, Sep 29, 2015 at 03:17:22PM -0700, Tom Herbert wrote: > Call before performing NF_HOOK and routing in order to perform address > translation in the receive path. > > Signed-off-by: Tom Herbert> --- > net/ipv6/ip6_input.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c > index 9075acf..06dac55 100644 > --- a/net/ipv6/ip6_input.c > +++ b/net/ipv6/ip6_input.c > @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, > struct packet_type *pt > /* Must drop socket now because of tproxy. */ > skb_orphan(skb); > > + /* Translate destination address before routing */ > + xfrm6_xlat_addr(skb); > + This shows that xfrm is not the right place to add this. The existing xfrm hooks are located at the same place as your current LWT hooks are. You could use the existing xfrm hooks similar to xfrm tunnel modes. This reinserts the transformed packet back into layer2, but I guess this is not what you want. I'm currently paying with a GRO codepath for IPsec to get the packets transformed early. If you can do your address translation that early, it could be an option too. This clearly depends on enabled GRO at the receiving device, but you would still have the LWT hook as a fallback. > return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING, > net, NULL, skb, dev, NULL, > ip6_rcv_finish); Or, try to use the netfilter hook that seems to be at the right place at least. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xfrm4_garbage_collect reaching limit
On Mon, Sep 21, 2015 at 10:51:11AM -0400, Dan Streetman wrote: > On Fri, Sep 18, 2015 at 1:00 AM, Dan Streetman <ddstr...@ieee.org> wrote: > > On Wed, Sep 16, 2015 at 4:45 AM, Steffen Klassert > > <steffen.klass...@secunet.com> wrote: > >> > >> What about the patch below? With this we are independent of the number > >> of cpus. It should cover most, if not all usecases. > > > > yep that works, thanks! I'll give it a test also, but I don't see how > > it would fail. > > Yep, on a test setup that previously failed within several hours, it > ran over the weekend successfully. Thanks! > > Tested-by: Dan Streetman <dan.street...@canonical.com> > > > > >> > >> While we are at it, we could think about increasing the flowcache > >> percpu limit. This value was choosen back in 2003, so maybe we could > >> have more than 4k cache entries per cpu these days. > >> > >> > >> Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default. > >> > >> The xfrm flowcache size is limited by the flowcache limit > >> (4096 * number of online cpus) and the xfrm garbage collector > >> threshold (2 * 32768), whatever is reached first. This means > >> that we can hit the garbage collector limit only on systems > >> with more than 16 cpus. On such systems we simply refuse > >> new allocations if we reach the limit, so new flows are dropped. > >> On syslems with 16 or less cpus, we hit the flowcache limit. > >> In this case, we shrink the flow cache instead of refusing new > >> flows. > >> > >> We increase the xfrm garbage collector threshold to INT_MAX > >> to get the same behaviour, independent of the number of cpus. > >> > >> The xfrm garbage collector threshold can still be set below > >> the flowcache limit to reduce the memory usage of the flowcache. > >> > >> Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com> I've applied this to ipsec-next now. It can be considered as a fix too, but we still can tweak the value via the sysctl in the meantime. So it is better to test it a bit longer before it hits the mainline. Thanks a lot for your work Dan! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IPsec workshop/BoF at netdev1.1?
On Mon, Dec 07, 2015 at 08:20:01AM -0800, Eric Dumazet wrote: > On Mon, 2015-12-07 at 13:00 +0100, Steffen Klassert wrote: > > Is there any interest in doing an IPsec workshop/BoF at netdev1.1? > > > > This mail is to probe if we can gather enough discussion topics to run > > such a workshop/BoF. So if someone is interested to attend and/or has a > > related discussion topic, please let me know. > > > > My current topics: > > > > - Adding a software GRO/GSO codepath for IPsec. I have working example code, > > a RFC version could be ready before the netdev conference. > > > - Remove rwlock usage (xfrm_policy_lock) from fast path and szitch to > RCU ? Good point. I add it to the topic list, thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ipsec impact on performance
On Mon, Dec 07, 2015 at 06:27:48AM -0500, Sowmini Varadhan wrote: > On (12/07/15 09:40), Steffen Klassert wrote: > > > > I've pushed it to > > > > https://git.kernel.org/cgit/linux/kernel/git/klassert/linux-stk.git/log/?h=net-next-ipsec-offload > > > > It is just example code, nothing that I would show usually. > > But you asked for it, so here is it :) > > that's fine, I dont expect more at this point, just want to > test-drive it, and see how it compares to my approach. Would be nice if you could share the results. Comments are welcome too, of course. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ipsec impact on performance
On Wed, Dec 02, 2015 at 07:05:38AM -0500, Sowmini Varadhan wrote: > On (12/02/15 07:53), Steffen Klassert wrote: > > > > I'm currently working on a GRO/GSO codepath for IPsec too. The GRO part > > works already. I decapsulate/decrypt the packets on layer2 with a esp GRO > > callback function and reinject them into napi_gro_receive(). So in case > > the decapsulated packet is TCP, GRO can aggregate big packets. > > Would you be able to share your patch with me? I'd like to give that a try > just to get preliminary numbers (and I could massage it as needed > for transport mode too). I've got the final bits to work today, I can do async crypto now. I can push the patches to a public tree after some polishing. But I have to warn, it has still bugs and no usefull commit messages. I did a first test with forwaring esp in tunnel mode. The crypto algorithm I used was: pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni))) Result: iperf -c 10.0.0.12 -t 60 Client connecting to 10.0.0.12, TCP port 5001 TCP window size: 45.0 KByte (default) [ 3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-60.0 sec 32.8 GBytes 4.70 Gbits/sec I provide more informatios as soon as the code is available. > > > Another thing, I thought about setting up an IPsec BoF/workshop at > > netdev1.1. My main topic is GRO/GSO for IPsec. I'll send out a mail > > to the list later this week to see if there is enough interest and > > maybe some additional topics. > > Sounds like an excellent idea. I'm certainly interested. Great, than we are at least two :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ipsec impact on performance
On Thu, Dec 03, 2015 at 06:38:20AM -0500, Sowmini Varadhan wrote: > On (12/03/15 09:45), Steffen Klassert wrote: > > pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni))) > > > > Result: > > > > iperf -c 10.0.0.12 -t 60 > > > > Client connecting to 10.0.0.12, TCP port 5001 > > TCP window size: 45.0 KByte (default) > > > > [ 3] local 192.168.0.12 port 39380 connected with 10.0.0.12 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 3] 0.0-60.0 sec 32.8 GBytes 4.70 Gbits/sec > > > > I provide more informatios as soon as the code is available. > > that's pretty good compared to the baseline. > I'd like to try out our patches, when they are ready. > > I think you may get some more improvement if you manually pin the irq > and iperf to specific cpus (at least that was my observation for transp > mode) I do that already. I have dedicated crypto and IO cpus, 2 cpus do networking IO and 4 cpus do crypto (parallelized with pcrypt). The bottleneck is now the cpu that does the TX path (checksumming of the GSO segments). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html