date:20151217

Re: rhashtable: Prevent spurious EBUSY errors on insertion

2015-12-17 Thread Herbert Xu

On Thu, Dec 17, 2015 at 04:46:00PM +0800, Xin Long wrote:
>
> Sorry for the late test, but unfortunately my case with rhashtable still
> returns EBUSY.
> I added some debug code in rhashtable_insert_rehash(), and found:
> *future_tbl is null*
> 
> fail:
> /* Do not fail the insert if someone else did a rehash. */
> if (likely(rcu_dereference_raw(tbl->future_tbl))) {
> printk("future_tbl is there\n");
> return 0;
> } else {
> printk("future_tbl is null\n");
> }
> 
> any idea why ?

That's presumably because you got a genuine double rehash.

Until you post your code we can't really help you.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: rhashtable: Prevent spurious EBUSY errors on insertion

2015-12-17 Thread Xin Long

On Thu, Dec 17, 2015 at 4:48 PM, Herbert Xu  wrote:
> On Thu, Dec 17, 2015 at 04:46:00PM +0800, Xin Long wrote:
>>
>> Sorry for the late test, but unfortunately my case with rhashtable still
>> returns EBUSY.
>> I added some debug code in rhashtable_insert_rehash(), and found:
>> *future_tbl is null*
>>
>> fail:
>> /* Do not fail the insert if someone else did a rehash. */
>> if (likely(rcu_dereference_raw(tbl->future_tbl))) {
>> printk("future_tbl is there\n");
>> return 0;
>> } else {
>> printk("future_tbl is null\n");
>> }
>>
>> any idea why ?
>
> That's presumably because you got a genuine double rehash.
>
> Until you post your code we can't really help you.
>
I wish I could, but my code is a big patch for sctp, and this issue
happens in a special stress test based on that patch.
I'm trying to think of how I can show you. :)

Re: Load Balancing for AF_INET Raw Sockets

2015-12-17 Thread Prashant Upadhyaya

On Tue, Dec 15, 2015 at 6:26 PM, Prashant Upadhyaya
 wrote:
> Hi,
>
> I open a raw socket for listening to all the UDP packets in a raw fashion --
>
> socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
>
> Then I use recvfrom to read the packets over the socket.
>
> The above works mighty fine.
> I want to find out if it is possible to 'load balance' the UDP flows
> by opening up multiple instances of this socket and then possibly
> setting some socket options so that I can scale up the reading via
> multiple threads doing recvfrom on these from multiple cores.
> (I know it is possible over packet sockets, but that is a different usecase)
>
> Regards
> -Prashant

ah, the msg_name field in msghdr should do the trick for src address.
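The msg_name approach mentioned above can be sketched roughly as follows. This is only an illustration and not code from the thread: recv_with_source is a hypothetical helper name, and the sketch assumes an AF_INET socket. It shows recvmsg() filling msg_name with the sender's address; on a SOCK_RAW socket the received buffer would additionally start with the IP header.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: receive one datagram and report the sender's
 * address through msg_name. Returns the number of bytes received,
 * or -1 on error (errno is set by recvmsg).
 */
ssize_t recv_with_source(int fd, void *buf, size_t len,
                         struct sockaddr_in *src)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(src, 0, sizeof(*src));

    msg.msg_name = src;              /* kernel writes the source address here */
    msg.msg_namelen = sizeof(*src);
    msg.msg_iov = &iov;              /* single buffer for the packet data */
    msg.msg_iovlen = 1;

    return recvmsg(fd, &msg, 0);
}
```

Each reader thread could call such a helper on its own socket and then inspect src->sin_addr; how the flows get distributed across sockets is a separate question that this sketch does not address.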

[PATCH 1/1] bonding: restrict up state in 802.3ad mode

2015-12-17 Thread zyjzyj2000

From: Zhu Yanjun 

In 802.3ad mode, the speed and duplex of each slave are needed. On some
NICs, however, there is a delay between the NIC reaching the up state and
its speed and duplex becoming available. As a result, a slave in 802.3ad
mode is sometimes in the up state without speed and duplex, which prevents
bonding in 802.3ad mode from working properly.
To make the bonding driver compatible with more NICs, it is
necessary to restrict the up state in 802.3ad mode.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/bonding/bond_main.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9e0f8a7..0a80fb3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1991,6 +1991,25 @@ static int bond_miimon_inspect(struct bonding *bond)
 
link_state = bond_check_dev_link(bond, slave->dev, 0);
 
+   /* Some NICs have a time span between netif_running and
+* getting speed and duplex. That is, after a NIC is up (netif_running),
+* there is a time span before this NIC has negotiated speed and duplex.
+* During this time span, the slave in 802.3ad is configured without speed
+* and duplex. This 802.3ad bonding will not work because it needs slave's speed
+* and duplex to generate key field.
+* As such, we restrict up in 802.3ad mode to: netif_running &&
+* speed != SPEED_UNKNOWN && duplex != DUPLEX_UNKNOWN
+*/
+   if ((BMSR_LSTATUS == link_state) &&
+   (BOND_MODE(bond) == BOND_MODE_8023AD)) {
+   bond_update_speed_duplex(slave);
+   if ((slave->speed == SPEED_UNKNOWN) ||
+   (slave->duplex == DUPLEX_UNKNOWN)) {
+   link_state = 0;
+   netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");
+   }
+   }
+
switch (slave->link) {
case BOND_LINK_UP:
if (link_state)
-- 
1.7.9.5
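The condition described in the patch comment above (link reported up, and both speed and duplex known) can be written as a small standalone predicate. The following is a userspace sketch, not the bonding driver's code: struct ex_slave, the EX_* constants, and ex_8023ad_link_usable are hypothetical names chosen for illustration, standing in for the kernel's slave structure and its SPEED_UNKNOWN / DUPLEX_UNKNOWN values.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's SPEED_UNKNOWN / DUPLEX_UNKNOWN. */
#define EX_SPEED_UNKNOWN  (-1)
#define EX_DUPLEX_UNKNOWN 0xff

/* Hypothetical, reduced view of a bonding slave. */
struct ex_slave {
    bool carrier_up;  /* result of the link check */
    int speed;        /* negotiated speed, or EX_SPEED_UNKNOWN */
    int duplex;       /* negotiated duplex, or EX_DUPLEX_UNKNOWN */
};

/*
 * In 802.3ad mode the slave only counts as up once the link is up
 * AND speed and duplex have both been negotiated.
 */
bool ex_8023ad_link_usable(const struct ex_slave *s)
{
    return s->carrier_up &&
           s->speed != EX_SPEED_UNKNOWN &&
           s->duplex != EX_DUPLEX_UNKNOWN;
}
```

A slave with carrier but unknown speed or duplex is treated as down here, which corresponds to the patch setting link_state to 0 in that case.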


Re: [PATCH v3] net/macb: add support for resetting PHY using GPIO

2015-12-17 Thread Gregory CLEMENT

Hi Arnd,
 
 On Wed., Dec. 16 2015, Arnd Bergmann  wrote:

> On Wednesday 16 December 2015 19:31:30 Gregory CLEMENT wrote:
>> diff --git a/drivers/net/ethernet/cadence/macb.c 
>> b/drivers/net/ethernet/cadence/macb.c
>> index 88c1e1a..35661aa 100644
>> --- a/drivers/net/ethernet/cadence/macb.c
>> +++ b/drivers/net/ethernet/cadence/macb.c
>> @@ -28,6 +28,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>
> Is this the patch that is already in linux-next?

I've just checked and it is the v2 which is in linux-next. David applied
it on Monday in his next branch but I was not aware of it.

David,

if I remember well you do not remove patches from your branch.  So would
you agree to take a follow-up patch on top of 5833e0526820 "net/macb:
add support for resetting PHY using GPIO" ?

I will fix the error found by Arnd and also use a better device tree
binding (more future proof).

Thanks,

Gregory

>
> I needed an additional
>
> #include 
>
> to avoid this build error on randconfig builds without GPIOLIB:
>
> drivers/net/ethernet/cadence/macb.c: In function 'macb_probe':
> drivers/net/ethernet/cadence/macb.c:2908:19: error: implicit declaration 
> of function 'devm_gpiod_get_optional' [-Werror=implicit-function-declaration]
>   bp->reset_gpio = devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset",
>^
> drivers/net/ethernet/cadence/macb.c:2909:8: error: 'GPIOD_OUT_HIGH' 
> undeclared (first use in this function)
> GPIOD_OUT_HIGH);
> ^
> drivers/net/ethernet/cadence/macb.c:2909:8: note: each undeclared 
> identifier is reported only once for each function it appears in
> drivers/net/ethernet/cadence/macb.c: In function 'macb_remove':
> drivers/net/ethernet/cadence/macb.c:2979:3: error: implicit declaration 
> of function 'gpiod_set_value' [-Werror=implicit-function-declaration]
>
>
>   Arnd

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

Re: rhashtable: Prevent spurious EBUSY errors on insertion

2015-12-17 Thread Xin Long

On Thu, Dec 3, 2015 at 8:41 PM, Herbert Xu  wrote:
> On Mon, Nov 30, 2015 at 06:18:59PM +0800, Herbert Xu wrote:
>>
>> OK that's better.  I think I see the problem.  The test in
>> rhashtable_insert_rehash is racy and if two threads both try
>> to grow the table one of them may be tricked into doing a rehash
>> instead.
>>
>> I'm working on a fix.
>
> OK this patch fixes the EBUSY problem as far as I can tell.  Please
> let me know if you still observe EBUSY with it.  I'll respond to the
> ENOMEM problem in another email.
>
> ---8<---
> Thomas and Phil observed that under stress rhashtable insertion
> sometimes failed with EBUSY, even though this error should only
> ever be seen when we're under attack and our hash chain length
> has grown to an unacceptable level, even after a rehash.
>
> It turns out that the logic for detecting whether there is an
> existing rehash is faulty.  In particular, when two threads both
> try to grow the same table at the same time, one of them may see
> the newly grown table and thus erroneously conclude that it had
> been rehashed.  This is what leads to the EBUSY error.
>
> This patch fixes this by remembering the current last table we
> used during insertion so that rhashtable_insert_rehash can detect
> when another thread has also done a resize/rehash.  When this is
> detected we will give up our resize/rehash and simply retry the
> insertion with the new table.
>
> Reported-by: Thomas Graf 
> Reported-by: Phil Sutter 
> Signed-off-by: Herbert Xu 
>
> diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
> index 843ceca..e50b31d 100644
> --- a/include/linux/rhashtable.h
> +++ b/include/linux/rhashtable.h
> @@ -19,6 +19,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -339,10 +340,11 @@ static inline int lockdep_rht_bucket_is_held(const 
> struct bucket_table *tbl,
>  int rhashtable_init(struct rhashtable *ht,
> const struct rhashtable_params *params);
>
> -int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
> -  struct rhash_head *obj,
> -  struct bucket_table *old_tbl);
> -int rhashtable_insert_rehash(struct rhashtable *ht);
> +struct bucket_table *rhashtable_insert_slow(struct rhashtable *ht,
> +   const void *key,
> +   struct rhash_head *obj,
> +   struct bucket_table *old_tbl);
> +int rhashtable_insert_rehash(struct rhashtable *ht, struct bucket_table 
> *tbl);
>
>  int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter 
> *iter);
>  void rhashtable_walk_exit(struct rhashtable_iter *iter);
> @@ -598,9 +600,11 @@ restart:
>
> new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
> if (unlikely(new_tbl)) {
> -   err = rhashtable_insert_slow(ht, key, obj, new_tbl);
> -   if (err == -EAGAIN)
> +   tbl = rhashtable_insert_slow(ht, key, obj, new_tbl);
> +   if (!IS_ERR_OR_NULL(tbl))
> goto slow_path;
> +
> +   err = PTR_ERR(tbl);
> goto out;
> }
>
> @@ -611,7 +615,7 @@ restart:
> if (unlikely(rht_grow_above_100(ht, tbl))) {
>  slow_path:
> spin_unlock_bh(lock);
> -   err = rhashtable_insert_rehash(ht);
> +   err = rhashtable_insert_rehash(ht, tbl);
> rcu_read_unlock();
> if (err)
> return err;
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index a54ff89..2ff7ed9 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -389,33 +389,31 @@ static bool rhashtable_check_elasticity(struct 
> rhashtable *ht,
> return false;
>  }
>
> -int rhashtable_insert_rehash(struct rhashtable *ht)
> +int rhashtable_insert_rehash(struct rhashtable *ht,
> +struct bucket_table *tbl)
>  {
> struct bucket_table *old_tbl;
> struct bucket_table *new_tbl;
> -   struct bucket_table *tbl;
> unsigned int size;
> int err;
>
> old_tbl = rht_dereference_rcu(ht->tbl, ht);
> -   tbl = rhashtable_last_table(ht, old_tbl);
>
> size = tbl->size;
>
> +   err = -EBUSY;
> +
> if (rht_grow_above_75(ht, tbl))
> size *= 2;
> /* Do not schedule more than one rehash */
> else if (old_tbl != tbl)
> -   return -EBUSY;
> +   goto fail;
> +
> +   err = -ENOMEM;
>
> new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
> -   if (new_tbl == NULL) {
> -   /* Schedule async resize/rehash to try allocation
> -* non-atomic context.
> -*/
> -   schedule_work(&ht->run_work);
> -   return -ENOMEM;
> -   }
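The quoted patch changes rhashtable_insert_slow to return a pointer instead of an int, so that one return value can carry "done", "retry with this table", or an error code. The kernel does this with ERR_PTR, PTR_ERR and IS_ERR_OR_NULL. The following is a userspace sketch of that idiom under stated assumptions: the ex_* names are hypothetical re-implementations for illustration, not the kernel's definitions, and ex_classify only models the three outcomes visible in the quoted hunk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: error codes occupy the top 4095 values of the address space. */
#define EX_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ex_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode it again; a NULL pointer decodes to 0. */
static inline long ex_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline bool ex_is_err_or_null(const void *p)
{
    return p == NULL || (uintptr_t)p >= (uintptr_t)(-EX_MAX_ERRNO);
}

enum ex_action { EX_DONE, EX_RETRY, EX_FAIL };

/*
 * Model of the caller in the quoted hunk:
 *   valid pointer -> go to the slow path with that table (EX_RETRY)
 *   NULL          -> insertion succeeded, err = 0 (EX_DONE)
 *   error pointer -> propagate the errno value (EX_FAIL)
 */
enum ex_action ex_classify(const void *tbl, long *err)
{
    if (!ex_is_err_or_null(tbl))
        return EX_RETRY;

    *err = ex_ptr_err(tbl);
    return *err ? EX_FAIL : EX_DONE;
}
```

The point of the design is that the insert path can hand back the table it actually used, which the rehash helper then compares against the current table to notice a concurrent resize.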

Re: [PATCH net 2/2] udp: restrict offloads to one namespace

2015-12-17 Thread Hannes Frederic Sowa

Hi all,

On 17.12.2015 01:04, David Miller wrote:
> From: Hannes Frederic Sowa 
> Date: Tue, 15 Dec 2015 21:01:54 +0100
> 
>> udp tunnel offloads tend to aggregate datagrams based on inner
>> headers. gro engine gets notified by tunnel implementations about
>> possible offloads. The match is solely based on the port number.
>>
>> Imagine a tunnel bound to port 53, the offloading will look into all
>> DNS packets and try to aggregate them based on the inner data found
>> within. This could lead to data corruption and malformed DNS packets.
>>
>> While this patch minimizes the problem and helps an administrator to find
>> the issue by querying ip tunnel/fou, a better way would be to match on
>> the specific destination ip address so if a user space socket is bound
>> to the same address it will conflict.
>>
>> Cc: Tom Herbert 
>> Cc: Eric Dumazet 
>> Signed-off-by: Hannes Frederic Sowa 
> 
> It looks like this issue is still being hashed out so I've marked this
> patch as deferred for now.


I think we need this patch. We later can decide to add more
classification attributes, like dst ip down to gro, but the netns marks
are important.

With user namespaces a normal user can start a new network namespace
with all privileges and thus add new offloads, letting the other stack
interpret this garbage. Because the user namespace can also add
arbitrary ip addresses to its interface, solely matching those is not
enough.

Tom any further comments?

Thanks,
Hannes


Re: Does the Community use Coverity ?

2015-12-17 Thread pavi1729

Sanket, Jeff,
Thanks a ton, that was helpful.

Cheers,
Pavi

On Wed, Dec 16, 2015 at 12:32 AM, Jeff Haran  wrote:
>>-Original Message-
>>From: kernelnewbies-bounces+jharan=bytemobile@kernelnewbies.org
>>[mailto:kernelnewbies-
>>bounces+jharan=bytemobile@kernelnewbies.org] On Behalf Of
>>pavi1729
>>Sent: Monday, December 14, 2015 11:03 PM
>>To: kernelnewb...@kernelnewbies.org; linux-fsde...@vger.kernel.org; linux-
>>m...@vger.kernel.org; linux-net...@vger.kernel.org
>>Subject: Does the Community use Coverity ?
>>
>>Hi,
>>  May I know if the community uses the Coverity tool and, if yes where can I
>>find a repo of Coverity scans of kernels and IGNORE LIST; cause there
>>obviously would be false positives.
>>
>>Cheers,
>>Pavi
>
> https://scan.coverity.com/
>
> Sign up for an account and join the Linux project.
>
> Jeff Haran
>

Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code

2015-12-17 Thread Hannes Frederic Sowa

On 16.12.2015 21:09, Rainer Weikusat wrote:
> With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
> receive code was changed from using mutex_lock(&u->readlock) to
> mutex_lock_interruptible(&u->readlock) to prevent signals from being
> delayed for an indefinite time if a thread sleeping on the mutex
> happened to be selected for handling the signal. But this was never a
> problem with the stream receive code (as opposed to its datagram
> counterpart) as that never went to sleep waiting for new messages with the
> mutex held and thus, wouldn't cause secondary readers to block on the
> mutex waiting for the sleeping primary reader. As the interruptible
> locking makes the code more complicated in exchange for no benefit,
> change it back to using mutex_lock.
> 
> Signed-off-by: Rainer Weikusat 
> ---
> 
> Considering that the datagram receive routine also doesn't go to sleep
> with the mutex held anymore, the 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490
> change to unix_autobind is now similarly purposeless.

I wouldn't do this conversion, yet. There is still a deadlock lingering
around which should be solved earlier:

http://lists.openwall.net/netdev/2015/11/10/4

Unfortunately I haven't found a good way to solve it yet.

Thanks,
Hannes


[PATCH] net/macb: Update device tree binding for resetting PHY using GPIO

2015-12-17 Thread Gregory CLEMENT

Instead of being at the MAC level, the reset GPIO property is moved to the
PHY child node level. It is still managed by the MAC, but from the point
of view of the binding it makes more sense for it to be part of the PHY node.

This commit also fixes a build error if GPIOLIB is not selected.

Signed-off-by: Gregory CLEMENT 
---
 Documentation/devicetree/bindings/net/macb.txt |  8 ++--
 drivers/net/ethernet/cadence/macb.c| 15 ---
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/macb.txt 
b/Documentation/devicetree/bindings/net/macb.txt
index 4a7fb6c..38c8e84 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -19,8 +19,8 @@ Required properties:
Optional elements: 'tx_clk'
 - clocks: Phandles to input clocks.
 
-Optional properties:
-- phy-reset-gpios : Should specify the gpio for phy reset
+Optional properties for PHY child node:
+- reset-gpios : Should specify the gpio for phy reset
 
 Examples:
 
@@ -32,4 +32,8 @@ Examples:
local-mac-address = [3a 0e 03 04 05 06];
clock-names = "pclk", "hclk", "tx_clk";
clocks = < 30>, < 30>, < 13>;
+   ethernet-phy@1 {
+   reg = <0x1>;
+   reset-gpios = < 6 1>;
+   };
};
diff --git a/drivers/net/ethernet/cadence/macb.c 
b/drivers/net/ethernet/cadence/macb.c
index 71fbda3..12370dd 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -2813,6 +2815,7 @@ static int macb_probe(struct platform_device *pdev)
  = macb_clk_init;
int (*init)(struct platform_device *) = macb_init;
struct device_node *np = pdev->dev.of_node;
+   struct device_node *phy_node;
const struct macb_config *macb_config = NULL;
struct clk *pclk, *hclk, *tx_clk;
unsigned int queue_mask, num_queues;
@@ -2901,8 +2904,14 @@ static int macb_probe(struct platform_device *pdev)
macb_get_hwaddr(bp);
 
/* Power up the PHY if there is a GPIO reset */
-   bp->reset_gpio = devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset",
-GPIOD_OUT_HIGH);
+   phy_node =  of_get_next_available_child(np, NULL);
+   if (phy_node) {
+   int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
+   if (gpio_is_valid(gpio))
+   bp->reset_gpio = gpio_to_desc(gpio);
+   gpiod_set_value(bp->reset_gpio, GPIOD_OUT_HIGH);
+   }
+   of_node_put(phy_node);
 
err = of_get_phy_mode(np);
if (err < 0) {
@@ -2972,7 +2981,7 @@ static int macb_remove(struct platform_device *pdev)
mdiobus_free(bp->mii_bus);
 
/* Shutdown the PHY if there is a GPIO reset */
-   gpiod_set_value(bp->reset_gpio, 0);
+   gpiod_set_value(bp->reset_gpio, GPIOD_OUT_LOW);
 
unregister_netdev(dev);
clk_disable_unprepare(bp->tx_clk);
-- 
2.5.0


Re: [PATCH 0/2] net: usb: cdc_ncm: Adding support for two new Dell devices

2015-12-17 Thread Daniele Palmas

Hi Bjorn,

2015-12-17 13:21 GMT+01:00 Bjørn Mork :
> Dan Williams  writes:
>> On Wed, 2015-12-16 at 10:39 +0100, Daniele Palmas wrote:
>>> This patch series adds support in the cdc_ncm driver for two devices
>>> based on the same platform, that are different only for carrier
>>> customization.
>>>
>>> The devices do not have ARP capabilities.
>>>
>>> Daniele Palmas (2):
>>>   net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband
>>> Card
>>>   net: usb: cdc_ncm: Adding Dell DW5813 LTE AT Mobile Broadband
>>> Card
>>
>> Quite interesting; Google knows nothing about these devices that I can
>> find.  What platform are these based on?
>
> There are a number of launchpad bugs reported for these devices.  Don't
> know why they chose that channel..  This one:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1520147
> shows that it is an Intel/Infineon based design.
>
> BTW, DeWalt seems to own these device names :)
>

The patches were first passed to Canonical for custom OS image testing.

413c:81ba is the HSPA+ variant of the family: however this does not
use ncm, but ecm, see
https://github.com/torvalds/linux/commit/0b88393cdf6b1322522849e61f7a3328f4fd3843

>
> Bjørn

Thanks,
Daniele

Re: [PATCH] net/macb: add proper header file

2015-12-17 Thread Nicolas Ferre

On 17/12/2015 13:15, Sudip Mukherjee wrote:
> mips allmodconfig build fails with the error:
> 
> drivers/net/ethernet/cadence/macb.c: In function 'macb_probe':
> drivers/net/ethernet/cadence/macb.c:2908:2: error: implicit declaration of 
> function 'devm_gpiod_get_optional' [-Werror=implicit-function-declaration]
>   bp->reset_gpio = devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset",
>   ^
> drivers/net/ethernet/cadence/macb.c:2909:8: error: 'GPIOD_OUT_HIGH' 
> undeclared (first use in this function)
> GPIOD_OUT_HIGH);
> ^
> drivers/net/ethernet/cadence/macb.c:2909:8: note: each undeclared identifier 
> is reported only once for each function it appears in
> drivers/net/ethernet/cadence/macb.c: In function 'macb_remove':
> drivers/net/ethernet/cadence/macb.c:2979:3: error: implicit declaration of 
> function 'gpiod_set_value' [-Werror=implicit-function-declaration]
>gpiod_set_value(bp->reset_gpio, 0);
>^
> 
> Add the proper header file to resolve it.  
> 
> Fixes: 5833e0526820 ("net/macb: add support for resetting PHY using GPIO")
> Cc: Gregory CLEMENT <gregory.clem...@free-electrons.com>
> Signed-off-by: Sudip Mukherjee <su...@vectorindia.org>

This one is fixed and handled by Gregory here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/394433.html

Thanks a lot anyway, bye,


> ---
> 
> build log with next-20151217 is at:
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/97388463
> 
>  drivers/net/ethernet/cadence/macb.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c 
> b/drivers/net/ethernet/cadence/macb.c
> index 0123646..988ee14 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> 


-- 
Nicolas Ferre

[PATCH 1/1] mac80211: improve the contiguous mask checking

2015-12-17 Thread Zeng Zhaoxiu

If the result of adding the first set bit to the mask is a power of 2,
the mask must be contiguous. "mask & -mask" yields the first (lowest)
set bit of the mask.

Signed-off-by: Zeng Zhaoxiu 
---
 net/mac80211/iface.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index c9e325d..4c896e8 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1628,7 +1628,9 @@ static void ieee80211_assign_perm_addr(struct 
ieee80211_local *local,
((u64)m[2] << 3*8) | ((u64)m[3] << 2*8) |
((u64)m[4] << 1*8) | ((u64)m[5] << 0*8);
 
-   if (__ffs64(mask) + hweight64(mask) != fls64(mask)) {
+   inc = (mask & -mask);
+   val = mask + inc;
+   if ((val & (val - 1)) != 0) {
/* not a contiguous mask ... not handled now! */
pr_info("not contiguous\n");
break;
@@ -1649,7 +1651,6 @@ static void ieee80211_assign_perm_addr(struct 
ieee80211_local *local,
((u64)m[2] << 3*8) | ((u64)m[3] << 2*8) |
((u64)m[4] << 1*8) | ((u64)m[5] << 0*8);
 
-   inc = 1ULL<<__ffs64(mask);
val = (start & mask);
addr = (start & ~mask) | (val & mask);
do {
-- 
2.5.0
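The check introduced by the patch above can be tried out in isolation. The following is a minimal userspace sketch, not the mac80211 code: ex_mask_is_contiguous is a hypothetical name, and rejecting an all-zero mask is an assumption added here (the quoted hunk does not show how a zero mask is handled). Adding the lowest set bit to the mask makes the carry ripple through the lowest run of ones; if that run was the only one, the sum has at most one bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the set bits of mask form one contiguous run.
 * Assumption of this sketch: a zero mask is reported as not contiguous.
 */
bool ex_mask_is_contiguous(uint64_t mask)
{
    uint64_t lowest = mask & -mask;  /* isolate the lowest set bit */
    uint64_t sum = mask + lowest;    /* carry clears the lowest run of ones */

    /* sum is zero (run reached bit 63) or a power of two: single run */
    return mask != 0 && (sum & (sum - 1)) == 0;
}
```

For example, 0x0ff0 plus its lowest bit 0x0010 gives 0x1000, a power of two, so the mask is contiguous. For 0x0f0f the sum is 0x0f10, which still has several bits set, so the mask is rejected.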


Re: [PATCH] net/macb: add proper header file

2015-12-17 Thread Sudip Mukherjee

On Thu, Dec 17, 2015 at 01:41:59PM +0100, Gregory CLEMENT wrote:
> Hi Sudip,
>  
>  On Thu., Dec. 17 2015, Sudip Mukherjee  wrote:
> 
> > mips allmodconfig build fails with the error:
> >
> > drivers/net/ethernet/cadence/macb.c: In function 'macb_probe':
> > drivers/net/ethernet/cadence/macb.c:2908:2: error: implicit declaration of 
> > function 'devm_gpiod_get_optional' [-Werror=implicit-function-declaration]
> >   bp->reset_gpio = devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset",
> >   ^
> > drivers/net/ethernet/cadence/macb.c:2909:8: error: 'GPIOD_OUT_HIGH' 
> > undeclared (first use in this function)
> > GPIOD_OUT_HIGH);
> > ^
> > drivers/net/ethernet/cadence/macb.c:2909:8: note: each undeclared 
> > identifier is reported only once for each function it appears in
> > drivers/net/ethernet/cadence/macb.c: In function 'macb_remove':
> > drivers/net/ethernet/cadence/macb.c:2979:3: error: implicit declaration of 
> > function 'gpiod_set_value' [-Werror=implicit-function-declaration]
> >gpiod_set_value(bp->reset_gpio, 0);
> >^
> >
> > Add the proper header file to resolve it.  
> 
> 
> A proper fix has already been posted, along with the proper device tree
> bindings too:
> http://marc.info/?l=linux-netdev=145034590619620=2

Thanks, just compiled with it and the build error for which I posted this patch
is now gone.

regards
sudip

Re: [PATCH][iproute2] tc/q_htb.c: Fix the MPU value output in 'tc -d class show dev ' command

2015-12-17 Thread Phil Sutter

On Thu, Dec 17, 2015 at 12:45:33PM +0300, Dmitrii Shcherbakov wrote:
> >I don't think your patch should contain this cleanup of "b4".
> 
> It seems that b3 is only used for the legacy overhead part, and if I remove
> that part, b3 is not going to be used. So I figured I would remove b4 and use b3 instead.

No worries, your intentions are good per se. All he wants from you is to
split changes up: a dedicated patch for mpu->overhead conversion and a
second one dedicated for SPRINT_BUF variable cleanup.

HTH, Phil

[PATCH net 1/2] net/mlx4_en: Remove dependency between timestamping capability and service_task

2015-12-17 Thread Or Gerlitz

From: Eugenia Emantayev 

Service task is responsible for other tasks in addition to timestamping
overflow check. Launch it even if timestamping is not supported by device.

Fixes: 07841f9d94c1 ('net/mlx4_en: Schedule napi when RX buffers allocation 
fails')
Signed-off-by: Eugenia Emantayev 
Signed-off-by: Eran Ben Elisha 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 886e1bc..4eef316 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3058,9 +3058,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int 
port,
}
queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
 
-   if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
-   queue_delayed_work(mdev->workqueue, &priv->service_task,
-  SERVICE_TASK_DELAY);
+   queue_delayed_work(mdev->workqueue, &priv->service_task,
+  SERVICE_TASK_DELAY);
 
mlx4_en_set_stats_bitmap(mdev->dev, &priv->stats_bitmap,
 mdev->profile.prof[priv->port].rx_ppp,
-- 
2.3.7


[PATCH net 2/2] net/mlx4_en: Fix HW timestamp init issue upon system startup

2015-12-17 Thread Or Gerlitz

From: Eugenia Emantayev 

mlx4_en_init_timestamp was called before creation of netdev and port
init, thus used uninitialized values.  Specifically - NIC frequency was
incorrect causing wrong calculations and later wrong HW timestamps.

Fixes: 1ec4864b1017 ('net/mlx4_en: Fixed crash when port type is changed')
Signed-off-by: Eugenia Emantayev 
Signed-off-by: Marina Varshaver 
Signed-off-by: Eran Ben Elisha 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c  | 7 +++
 drivers/net/ethernet/mellanox/mlx4/en_main.c   | 7 ---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 7 +++
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 8a083d7..038f9ce 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -242,6 +242,13 @@ void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev)
unsigned long flags;
u64 ns, zero = 0;
 
+   /* mlx4_en_init_timestamp is called for each netdev.
+* mdev->ptp_clock is common for all ports, skip initialization if
+* was done for other port.
+*/
+   if (mdev->ptp_clock)
+   return;
+
rwlock_init(&mdev->clock_lock);
 
memset(&mdev->cycles, 0, sizeof(mdev->cycles));
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c 
b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 005f910..e0ec280 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -232,9 +232,6 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void 
*endev_ptr)
if (mdev->pndev[i])
mlx4_en_destroy_netdev(mdev->pndev[i]);
 
-   if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
-   mlx4_en_remove_timestamp(mdev);
-
flush_workqueue(mdev->workqueue);
destroy_workqueue(mdev->workqueue);
(void) mlx4_mr_free(dev, &mdev->mr);
@@ -320,10 +317,6 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH)
mdev->port_cnt++;
 
-   /* Initialize time stamp mechanism */
-   if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
-   mlx4_en_init_timestamp(mdev);
-
/* Set default number of RX rings*/
mlx4_en_set_num_rx_rings(mdev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 4eef316..7869f97 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2072,6 +2072,9 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
/* flush any pending task for this netdev */
flush_workqueue(mdev->workqueue);
 
+   if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
+   mlx4_en_remove_timestamp(mdev);
+
/* Detach the netdev so tasks would not attempt to access it */
mutex_lock(&mdev->state_lock);
mdev->pndev[priv->port] = NULL;
@@ -3058,6 +3061,10 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int 
port,
}
queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
 
+   /* Initialize time stamp mechanism */
+   if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
+   mlx4_en_init_timestamp(mdev);
+
queue_delayed_work(mdev->workqueue, &priv->service_task,
   SERVICE_TASK_DELAY);
 
-- 
2.3.7


[PATCH net 0/2] Mellanox mlx4 driver fixes

2015-12-17 Thread Or Gerlitz

Hi Dave,

Two small fixes from Jenny for code flows that deal with time-stamping.


Or.

Eugenia Emantayev (2):
  net/mlx4_en: Remove dependency between timestamping capability and 
service_task
  net/mlx4_en: Fix HW timestamp init issue upon system startup

 drivers/net/ethernet/mellanox/mlx4/en_clock.c  |  7 +++
 drivers/net/ethernet/mellanox/mlx4/en_main.c   |  7 ---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 10 --
 3 files changed, 15 insertions(+), 9 deletions(-)

-- 
2.3.7


[PATCH net-next] team: Advertise tunneling offload features

2015-12-17 Thread Or Gerlitz

From: Eran Ben Elisha 

When the underlying device supports offloading encapsulated traffic,
we need to reflect that through the hw_enc_features field of the
team net-device.

This will cause the xmit path in the core networking stack to provide
team with encapsulated GSO frames to offload into the HW etc.

Using this over Mellanox ConnectX3-pro (mlx4 driver) card that supports
VXLAN offloads we got 36.0 Gbits/sec using eight iperf streams.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Jack Morgenstein 
Reviewed-by: Or Gerlitz 
Acked-by: Jiri Pirko 
---
 drivers/net/team/team.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 915f60f..2528331 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -985,10 +985,14 @@ static void team_port_disable(struct team *team,
NETIF_F_FRAGLIST | NETIF_F_ALL_TSO | \
NETIF_F_HIGHDMA | NETIF_F_LRO)
 
+#define TEAM_ENC_FEATURES  (NETIF_F_HW_CSUM | NETIF_F_SG | \
+NETIF_F_RXCSUM | NETIF_F_ALL_TSO)
+
 static void __team_compute_features(struct team *team)
 {
struct team_port *port;
u32 vlan_features = TEAM_VLAN_FEATURES & NETIF_F_ALL_FOR_ALL;
+   netdev_features_t enc_features  = TEAM_ENC_FEATURES;
unsigned short max_hard_header_len = ETH_HLEN;
unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |
IFF_XMIT_DST_RELEASE_PERM;
@@ -997,6 +1001,11 @@ static void __team_compute_features(struct team *team)
vlan_features = netdev_increment_features(vlan_features,
port->dev->vlan_features,
TEAM_VLAN_FEATURES);
+   enc_features =
+   netdev_increment_features(enc_features,
+ port->dev->hw_enc_features,
+ TEAM_ENC_FEATURES);
+
 
dst_release_flag &= port->dev->priv_flags;
if (port->dev->hard_header_len > max_hard_header_len)
@@ -1004,6 +1013,7 @@ static void __team_compute_features(struct team *team)
}
 
team->dev->vlan_features = vlan_features;
+   team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
team->dev->hard_header_len = max_hard_header_len;
 
team->dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
@@ -2091,6 +2101,7 @@ static void team_setup(struct net_device *dev)
   NETIF_F_HW_VLAN_CTAG_RX |
   NETIF_F_HW_VLAN_CTAG_FILTER;
 
+   dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
dev->features |= dev->hw_features;
 }
 
-- 
2.3.7


Re: Load Balancing for AF_INET Raw Sockets

2015-12-17 Thread Eric Dumazet

On Thu, 2015-12-17 at 11:40 +0530, Prashant Upadhyaya wrote:
> On Tue, Dec 15, 2015 at 8:09 PM, Eric Dumazet  wrote:
> > On Tue, 2015-12-15 at 18:26 +0530, Prashant Upadhyaya wrote:
> >> Hi,
> >>
> >> I open a raw socket for listening to all the UDP packets in a raw fashion 
> >> --
> >>
> >> socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
> >>
> >> Then I use recvfrom to read the packets over the socket.
> >>
> >> The above works mighty fine.
> >> I want to find out if it is possible to 'load balance' the UDP flows
> >> by opening up multiple instances of this socket and then possibly
> >> setting some socket options so that I can scale up the reading via
> >> multiple threads doing recvfrom on these from multiple cores.
> >> (I know it is possible over packet sockets, but that is a different 
> >> usecase)
> >
> > No plan yet to support fanout on multiple raw sockets.
> >
> >
> Hi,
> 
> One question on the AF_INET6 raw sockets.
> Here I don't get the ipv6 header at all when I read a packet.
> I checked the RFC 3542 and it specifies the following as the ancillary
> data which can be obtained --
> 
> Four similar pieces of information can be returned for a received
>packet as ancillary data:
> 
>   1.  the destination IPv6 address,
>   2.  the arriving interface index,
>   3.  the arriving hop limit, and
>   4.  the arriving traffic class value.
> 
> Now how do I obtain the 'src IPv6 address' ?

man recvfrom
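
To illustrate the pointer to recvfrom(2): the source address is returned in
the address argument of recvfrom(), not as ancillary data. A minimal
userspace sketch follows. The helper names format_source() and
recv_with_source() are made up for this example, and error handling is kept
deliberately short.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: render the sender address filled in by recvfrom()
 * as text. Returns dst on success, NULL on failure. */
const char *format_source(const struct sockaddr_in6 *from,
			  char *dst, socklen_t dstlen)
{
	return inet_ntop(AF_INET6, &from->sin6_addr, dst, dstlen);
}

/* Hypothetical helper: receive one packet from an AF_INET6 socket and
 * report who sent it. Returns the byte count, or -1 on error. */
ssize_t recv_with_source(int fd, void *buf, size_t len,
			 char *src, socklen_t srclen)
{
	struct sockaddr_in6 from;
	socklen_t fromlen = sizeof(from);
	ssize_t n;

	/* The kernel fills "from" with the source address of the packet. */
	n = recvfrom(fd, buf, len, 0, (struct sockaddr *)&from, &fromlen);
	if (n < 0)
		return -1;
	if (!format_source(&from, src, srclen))
		return -1;
	return n;
}
```

A caller would pass a buffer of at least INET6_ADDRSTRLEN bytes for src.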




Re: [PATCH 0/2] net: usb: cdc_ncm: Adding support for two new Dell devices

2015-12-17 Thread Daniele Palmas

Hi Dan,

2015-12-16 18:12 GMT+01:00 Dan Williams :
> On Wed, 2015-12-16 at 10:39 +0100, Daniele Palmas wrote:
>> This patch series add support in the cdc_ncm driver for two devices
>> based on the same platform, that are different only for carrier
>> customization.
>>
>> The devices do not have ARP capabilities.
>>
>> Daniele Palmas (2):
>>   net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband
>> Card
>>   net: usb: cdc_ncm: Adding Dell DW5813 LTE AT Mobile Broadband
>> Card
>
> Quite interesting; Google knows nothing about these devices that I can
> find.  What platform are these based on?
>

Those devices are still in the testing stage, so there is nothing
about them on the web yet. They are Infineon based.

> But in any case, since these blocks are almost identical to the DW5550
> block, maybe update the comments to indicate that they need NOARP
> unlike the MBM platform that the 5550 is based on?

Sure, I can do a V2 patch for that.

>
> Dan

Thanks,
Daniele

[PATCH net-next] nfp: call netif_carrier_off() during init

2015-12-17 Thread Jakub Kicinski

Netdevs default to carrier on, we should call netif_carrier_off()
during initialization since we handle carrier state changes in the
driver.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 6c5af4cb5bdc..43c618bafdb6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2417,6 +2417,7 @@ int nfp_net_netdev_init(struct net_device *netdev)
ether_setup(netdev);
netdev->netdev_ops = &nfp_net_netdev_ops;
netdev->watchdog_timeo = msecs_to_jiffies(5 * 1000);
+   netif_carrier_off(netdev);
 
nfp_net_set_ethtool_ops(netdev);
nfp_net_irqs_assign(netdev);
-- 
1.9.1


Re: [PATCH] net/macb: add proper header file

2015-12-17 Thread Gregory CLEMENT

Hi Sudip,
 
 On jeu., déc. 17 2015, Sudip Mukherjee <sudipm.mukher...@gmail.com> wrote:

> mips allmodconfig build fails with the error:
>
> drivers/net/ethernet/cadence/macb.c: In function 'macb_probe':
> drivers/net/ethernet/cadence/macb.c:2908:2: error: implicit declaration of 
> function 'devm_gpiod_get_optional' [-Werror=implicit-function-declaration]
>   bp->reset_gpio = devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset",
>   ^
> drivers/net/ethernet/cadence/macb.c:2909:8: error: 'GPIOD_OUT_HIGH' 
> undeclared (first use in this function)
> GPIOD_OUT_HIGH);
> ^
> drivers/net/ethernet/cadence/macb.c:2909:8: note: each undeclared identifier 
> is reported only once for each function it appears in
> drivers/net/ethernet/cadence/macb.c: In function 'macb_remove':
> drivers/net/ethernet/cadence/macb.c:2979:3: error: implicit declaration of 
> function 'gpiod_set_value' [-Werror=implicit-function-declaration]
>gpiod_set_value(bp->reset_gpio, 0);
>^
>
> Add the proper header file to resolve it.  


A proper fix has already been posted, along with the proper device tree
bindings:
http://marc.info/?l=linux-netdev=145034590619620=2

Thanks,

Gregory

>
> Fixes: 5833e0526820 ("net/macb: add support for resetting PHY using GPIO")
> Cc: Gregory CLEMENT <gregory.clem...@free-electrons.com>
> Signed-off-by: Sudip Mukherjee <su...@vectorindia.org>
> ---
>
> build log with next-20151217 is at:
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/97388463
>
>  drivers/net/ethernet/cadence/macb.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c 
> b/drivers/net/ethernet/cadence/macb.c
> index 0123646..988ee14 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> -- 
> 1.9.1
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

[iproute PATCH v2 2/2] ss: support closing inet sockets via SOCK_DESTROY.

2015-12-17 Thread Lorenzo Colitti

This patch adds a -K / --kill option to ss that attempts to
forcibly close matching sockets using SOCK_DESTROY.

This is implemented by adding a new "struct action" to struct
filter. If necessary, this can be extended later on to support
further actions on sockets.

Because ss typically prints sockets instead of acting on them,
and because the kernel only supports forcibly closing some types
of sockets, the output of -K is as follows:

- If closing the socket succeeds, the socket is printed.
- If the kernel does not support forcibly closing this type of
  socket (e.g., if it's a UDP socket, or a TIME_WAIT socket),
  the socket is silently skipped.
- If an error occurs (e.g., permission denied), the error is
  reported and ss exits.

Signed-off-by: Lorenzo Colitti 
---
 include/linux/sock_diag.h |  1 +
 misc/ss.c | 52 +--
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index 024e1f4..dafcb89 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -4,6 +4,7 @@
 #include 
 
 #define SOCK_DIAG_BY_FAMILY 20
+#define SOCK_DESTROY 21
 
 struct sock_diag_req {
__u8sdiag_family;
diff --git a/misc/ss.c b/misc/ss.c
index 0dab32c..be70c41 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -160,6 +160,9 @@ struct filter
int states;
int families;
struct ssfilter *f;
+   struct {
+   __u8 kill:1;
+   } action;
 };
 
 static const struct filter default_dbs[MAX_DB] = {
@@ -2194,8 +2197,27 @@ static int sockdiag_send(int family, int fd, int 
protocol, struct filter *f)
 struct inet_diag_arg {
struct filter *f;
int protocol;
+   struct rtnl_handle *rth;
 };
 
+static int kill_inet_sock(const struct sockaddr_nl *addr,
+   struct nlmsghdr *h, void *arg)
+{
+   struct inet_diag_msg *d = NLMSG_DATA(h);
+   struct inet_diag_arg *diag_arg = arg;
+   struct rtnl_handle *rth = diag_arg->rth;
+   DIAG_REQUEST(req, struct inet_diag_req_v2 r);
+
+   req.nlh.nlmsg_type = SOCK_DESTROY;
+   req.nlh.nlmsg_flags = NLM_F_REQUEST;
+   req.nlh.nlmsg_seq = ++rth->seq;
+   req.r.sdiag_family = d->idiag_family;
+   req.r.sdiag_protocol = diag_arg->protocol;
+   req.r.id = d->id;
+
+   return rtnl_send_check_ack(rth, &req, req.nlh.nlmsg_len, 1);
+}
+
 static int show_one_inet_sock(const struct sockaddr_nl *addr,
struct nlmsghdr *h, void *arg)
 {
@@ -2205,6 +2227,15 @@ static int show_one_inet_sock(const struct sockaddr_nl 
*addr,
 
if (!(diag_arg->f->families & (1 << r->idiag_family)))
return 0;
+   if (diag_arg->f->action.kill && kill_inet_sock(addr, h, arg) != 0) {
+   if (errno == EOPNOTSUPP) {
+   /* This socket can't be closed. Silently skip it. */
+   return 0;
+   } else {
+   perror("SOCK_DESTROY answers");
+   return -1;
+   }
+   }
if ((err = inet_show_sock(h, diag_arg->f, diag_arg->protocol)) < 0)
return err;
 
@@ -2214,12 +2245,21 @@ static int show_one_inet_sock(const struct sockaddr_nl 
*addr,
 static int inet_show_netlink(struct filter *f, FILE *dump_fp, int protocol)
 {
int err = 0;
-   struct rtnl_handle rth;
+   struct rtnl_handle rth, rth2;
int family = PF_INET;
struct inet_diag_arg arg = { .f = f, .protocol = protocol };
 
if (rtnl_open_byproto(&rth, 0, NETLINK_SOCK_DIAG))
return -1;
+
+   if (f->action.kill) {
+   if (rtnl_open_byproto(&rth2, 0, NETLINK_SOCK_DIAG)) {
+   rtnl_close(&rth);
+   return -1;
+   }
+   arg.rth = &rth2;
+   }
+
rth.dump = MAGIC_SEQ;
rth.dump_fp = dump_fp;
if (preferred_family == PF_INET6)
@@ -2243,6 +2283,8 @@ again:
 
 Exit:
rtnl_close(&rth);
+   if (arg.rth)
+   rtnl_close(arg.rth);
return err;
 }
 
@@ -3489,6 +3531,8 @@ static void _usage(FILE *dest)
 "   -x, --unix  display only Unix domain sockets\n"
 "   -f, --family=FAMILY display sockets of type FAMILY\n"
 "\n"
+"   -K, --kill  forcibly close sockets, display what was closed\n"
+"\n"
 "   -A, --query=QUERY, --socket=QUERY\n"
 "   QUERY := 
{all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink}[,QUERY]\n"
 "\n"
@@ -3579,6 +3623,7 @@ static const struct option long_opts[] = {
{ "context", 0, 0, 'Z' },
{ "contexts", 0, 0, 'z' },
{ "net", 1, 0, 'N' },
+   { "kill", 0, 0, 'K' },
{ 0 }
 
 };
@@ -3593,7 +3638,7 @@ int main(int argc, char *argv[])
int ch;
int state_filter = 0;
 
-   while ((ch = getopt_long(argc, argv, 
"dhaletuwxnro460spbEf:miA:D:F:vVzZN:",
+   while ((ch = getopt_long(argc, argv,

Re: [RFCv4 bluetooth-next 1/2] 6lowpan: iphc: add support for stateful compression

2015-12-17 Thread Alexander Aring

On Thu, Dec 17, 2015 at 12:26:37PM +, Duda, Lukasz wrote:
> Hi Alex!
> 
> > -Original Message-
> > From: Alexander Aring [mailto:alex.ar...@gmail.com]
> > Sent: Tuesday, December 15, 2015 12:09
> > To: Duda, Lukasz
> > Cc: linux-w...@vger.kernel.org; linux-blueto...@vger.kernel.org;
> > netdev@vger.kernel.org; ker...@pengutronix.de; m...@sandelman.ca;
> > martin.gergel...@hs-rm.de
> > Subject: Re: [RFCv4 bluetooth-next 1/2] 6lowpan: iphc: add support for
> > stateful compression
> > 
> > On Tue, Dec 15, 2015 at 10:29:51AM +, Duda, Lukasz wrote:
> > > First of all great work for your series of patches on 6lowpan improvements
> > and
> > > stateful compression!
> > >
> > > I have just done some testing of this patch (without RADVD modifications),
> > and
> > > I can share my experiments using 6LoWPAN over BT-LE by sending simple
> > ICMPv6
> > > messages. Contexts for BTLE device has been added manually.
> > >
> > 
> > did you test that with linux <-> linux? Or linux <->
> > $SOME_OTHER_6LOWPAN_BTLE_STACK.
> > 
> > I tested it on my side with RIOT, it has 802.15.4 6LoWPAN support and
> > also manipulate manually the context table.
> > 
> 
> I have tested it with nRF52 BTLE device from Nordic Semiconductor 
> with IoT SDK, and Linux Ubuntu with BTLE Dongle that acts as Router/Master.
> 
> > > Experiment 1:
> > >
> > > Router: 2001:db8::1/64 BTLE: 2001:db8::211:22FF:FE33:4455/64
> > > CID 1: 2001:db8::/64
> > >
> > > Works fine, I see that CID 1 is used for both addresses. Router has 64 
> > > bits of
> > > IID inline and BTLE node has 0.
> > >
> > > Experiment 2:
> > >
> > > Router: 2001:db8::1/64 BTLE: 2001:db8::211:22FF:FE33:4455/64
> > > CID 3: 2001:db8::/64 CID 5: 2001:db8::1/128
> > >
> > > Works also fine, I see that both CID 3 and 5 are used, as well as both 
> > > sides
> > > compress its IID in the best possible way. So the patch appears to work 
> > > fine
> > on
> > > 6LoWPAN over BT-LE.
> > >
> > 
> > ok.
> > 
> > >
> > > However, I notice that the folder created in the sys/kernel/debug/6lowpan/
> > for
> > > my bluetooth network interface is called "bt%d". And I would imagine this
> > > should be "bt0", "bt1", ... and not the template?
> > >
> > 
> > urgh, this should not happen. I use "dev->name" for that and dev is the
> > netdevice structure. This should be an _unique_ interface name,
> > otherwise you will getting trouble if you have two btle 6lowpan
> > interfaces.
> > 
> > I didn't realized it because I create my interface with:
> > 
> > ip link add link wpan0 name lowpan0 type lowpan
> > 
> > but should change it into:
> > 
> > ip link add link wpan0 name lowpan%d type lowpan
> > 
> > I realized that the dev->name will be changed from template into "real"
> > name after registering. This should do the job:
> > 
> > diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
> > index c7f06f5..faf65ba 100644
> > --- a/net/6lowpan/core.c
> > +++ b/net/6lowpan/core.c
> > @@ -29,13 +29,13 @@ int lowpan_register_netdevice(struct net_device
> > *dev,
> > 
> > lowpan_priv(dev)->lltype = lltype;
> > 
> > -   ret = lowpan_dev_debugfs_init(dev);
> > +   ret = register_netdevice(dev);
> > if (ret < 0)
> > return ret;
> > 
> > -   ret = register_netdevice(dev);
> > +   ret = lowpan_dev_debugfs_init(dev);
> > if (ret < 0)
> > -   lowpan_dev_debugfs_exit(dev);
> > +   unregister_netdevice(dev);
> > 
> > return ret;
> >  }
> > 
> > I think it should be safe to do that after registering because we held the
> > RTNL lock. And the interface isn't up after registering.
> > 
> 
> Thanks! Your patch helped, and I acked it in separate mail thread.
> 
> > > Also, I notice that the compression for Flow Control and Traffic Label in 
> > > IPv6
> > > header has been modified, these fields are no longer compressed in any
> > packets
> > > (0b11 value) that comes from Linux Kernel (e.g. ICMP Echo Request,
> > > Router Advertisement), instead I get three extra bytes (0b01 value).
> > > I would like to understand reason for this modification a little better.
> > 
> > 0b11 means that traffic class and flow label are zero. Are you sure that
> > these fields are zero inside the IPv6 header when you transmit
> > "e.g. ICMP Echo Request, RA"?
> > 
> > Can you verify this by running tcpdump/wireshark? Or instruments some
> > printk's at [0] for hdr->flow_lbl array and hdr->priority?
> > 
> > - Alex
> > 
> > [0] http://lxr.free-electrons.com/source/net/6lowpan/iphc.c#L428
> 
> I did some more linux debugging, and indeed, its IPv6 stack that already 
> gives ip6hdr
> with flow label set to some strange number. Do you know maybe the reason of 
> this?
> On Kernel version < 4.2 that field was always set to 0, thus better 
> compression can
> be applied.
> 

I think it depends on the ping6 implementation. There are several of them
out there; I am using "iputils" [0].
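
For reference on the TF values discussed above (0b11 versus 0b01), here is
a small sketch of how the two-bit TF field follows from the traffic class
and flow label, based on my reading of RFC 6282 section 3.1.1. The names
iphc_pick_tf() and iphc_tf_inline_bytes() are made up for this example and
are not the functions in net/6lowpan/iphc.c.

```c
#include <stdint.h>

/* TF encodings as described in RFC 6282, section 3.1.1. */
enum iphc_tf {
	IPHC_TF_INLINE_ALL = 0x0, /* 00: ECN + DSCP + flow label inline */
	IPHC_TF_ELIDE_DSCP = 0x1, /* 01: ECN + flow label inline */
	IPHC_TF_ELIDE_FLOW = 0x2, /* 10: ECN + DSCP inline */
	IPHC_TF_ELIDE_ALL  = 0x3, /* 11: traffic class and flow label elided */
};

/* Hypothetical helper: pick the most compact TF value for a header. */
enum iphc_tf iphc_pick_tf(uint8_t traffic_class, uint32_t flow_label)
{
	uint8_t dscp = traffic_class >> 2; /* upper six bits of traffic class */

	flow_label &= 0xFFFFF; /* the flow label is 20 bits wide */
	if (flow_label == 0)
		return traffic_class == 0 ? IPHC_TF_ELIDE_ALL
					  : IPHC_TF_ELIDE_FLOW;
	return dscp == 0 ? IPHC_TF_ELIDE_DSCP : IPHC_TF_INLINE_ALL;
}

/* Number of bytes carried inline for each TF value. */
unsigned int iphc_tf_inline_bytes(enum iphc_tf tf)
{
	switch (tf) {
	case IPHC_TF_INLINE_ALL:
		return 4;
	case IPHC_TF_ELIDE_DSCP:
		return 3;
	case IPHC_TF_ELIDE_FLOW:
		return 1;
	default:
		return 0;
	}
}
```

With a non-zero flow label and a zero DSCP this gives 0b01 and three inline
bytes, which matches the three extra bytes reported above.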

Look inside the manpage of iputils:

-F flow label
ping6 only.  Allocate

[PATCH net] sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close

2015-12-17 Thread Xin Long

In sctp_close, sctp_make_abort_user may return NULL because of memory
allocation failure. If this happens, it will bypass any state change
and never free the assoc. The assoc has no chance to be freed and it
will be kept in memory with the state it had even after the socket is
closed by sctp_close().

So if sctp_make_abort_user fails to allocate memory, we should just
free the asoc, as there isn't much else that we can do.

Signed-off-by: Xin Long 
Acked-by: Marcelo Ricardo Leitner 
---
 net/sctp/socket.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 9b6cc6d..267b8f8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1513,8 +1513,12 @@ static void sctp_close(struct sock *sk, long timeout)
struct sctp_chunk *chunk;
 
chunk = sctp_make_abort_user(asoc, NULL, 0);
-   if (chunk)
+   if (chunk) {
sctp_primitive_ABORT(net, asoc, chunk);
+   } else {
+   sctp_unhash_established(asoc);
+   sctp_association_free(asoc);
+   }
} else
sctp_primitive_SHUTDOWN(net, asoc, NULL);
}
-- 
2.1.0


Re: pull-request: mac80211 2015-12-15

2015-12-17 Thread Johannes Berg

On Wed, 2015-12-16 at 18:34 -0500, David Miller wrote:
> 
> Something about your text encoding kept this from ending up
> in patchwork for some reason.
> 

Hm. I don't see anything special with this, seems to just be plain text
8bit transfer encoding.

Do you want me to watch out for things getting into patchwork in the
future?

johannes

[iproute PATCH v2 1/2] libnetlink: add a variant of rtnl_send_check that consumes ACKs

2015-12-17 Thread Lorenzo Colitti

The new variant is identical to rtnl_send_check, except it also
consumes the kernel response instead of using MSG_PEEK. This is
useful for callers that send simple commands that never cause a
response but only ACKs, and that expect to receive and deal
with errors without printing them to stderr like rtnl_talk does.

Signed-off-by: Lorenzo Colitti 
---
 include/libnetlink.h |  2 ++
 lib/libnetlink.c | 14 +++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 431189e..a88cb4d 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -75,6 +75,8 @@ int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
__attribute__((warn_unused_result));
 int rtnl_send(struct rtnl_handle *rth, const void *buf, int)
__attribute__((warn_unused_result));
+int rtnl_send_check_ack(struct rtnl_handle *rth, const void *buf, int, int)
+   __attribute__((warn_unused_result));
 int rtnl_send_check(struct rtnl_handle *rth, const void *buf, int)
__attribute__((warn_unused_result));
 
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 1658214..a3ad83a 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -134,18 +134,21 @@ int rtnl_send(struct rtnl_handle *rth, const void *buf, 
int len)
return send(rth->fd, buf, len, 0);
 }
 
-int rtnl_send_check(struct rtnl_handle *rth, const void *buf, int len)
+int rtnl_send_check_ack(struct rtnl_handle *rth, const void *buf, int len,
+   int ack)
 {
struct nlmsghdr *h;
-   int status;
+   int status, flags;
char resp[1024];
 
status = send(rth->fd, buf, len, 0);
if (status < 0)
return status;
 
+   flags = MSG_DONTWAIT | (ack ? 0 : MSG_PEEK);
+
/* Check for immediate errors */
-   status = recv(rth->fd, resp, sizeof(resp), MSG_DONTWAIT|MSG_PEEK);
+   status = recv(rth->fd, resp, sizeof(resp), flags);
if (status < 0) {
if (errno == EAGAIN)
return 0;
@@ -167,6 +170,11 @@ int rtnl_send_check(struct rtnl_handle *rth, const void 
*buf, int len)
return 0;
 }
 
+inline int rtnl_send_check(struct rtnl_handle *rth, const void *buf, int len)
+{
+   return rtnl_send_check_ack(rth, buf, len, 0);
+}
+
 int rtnl_dump_request(struct rtnl_handle *rth, int type, void *req, int len)
 {
struct nlmsghdr nlh;
-- 
2.6.0.rc2.230.g3dd15c0


Re: [PATCH net-next v5 2/8] netfilter: Factor out nf_ct_get_info().

2015-12-17 Thread Sergei Shtylyov


Hello.

On 12/17/2015 3:36 AM, Jarno Rajahalme wrote:


Define a new inline function to map conntrack status to enum
ip_conntrack_info.  This removes the need to otherwise duplicate this
code in a later patch ("openvswitch: Find existing conntrack entry
after upcall.").

Signed-off-by: Jarno Rajahalme 

[...]

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 3cb3cb8..7546fc7 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1056,25 +1056,15 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
ct = nf_ct_tuplehash_to_ctrack(h);

/* It exists; we have (non-exclusive) reference. */
-   if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) {
-   *ctinfo = IP_CT_ESTABLISHED_REPLY;
-   /* Please set reply bit if this packet OK */
-   *set_reply = 1;
-   } else {
-   /* Once we've had two way comms, always ESTABLISHED. */
-   if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
-   pr_debug("nf_conntrack_in: normal packet for %p\n", ct);
-   *ctinfo = IP_CT_ESTABLISHED;
-   } else if (test_bit(IPS_EXPECTED_BIT, &ct->status)) {
-   pr_debug("nf_conntrack_in: related packet for %p\n",
-ct);
-   *ctinfo = IP_CT_RELATED;
-   } else {
-   pr_debug("nf_conntrack_in: new packet for %p\n", ct);
-   *ctinfo = IP_CT_NEW;
-   }
-   *set_reply = 0;
-   }
+   *ctinfo = nf_ct_get_info(h);
+   if (*ctinfo == IP_CT_ESTABLISHED)
+   pr_debug("nf_conntrack_in: normal packet for %p\n", ct);
+   else if (*ctinfo == IP_CT_RELATED)
+   pr_debug("nf_conntrack_in: related packet for %p\n", ct);
+   else if (*ctinfo == IP_CT_NEW)
+   pr_debug("nf_conntrack_in: new packet for %p\n", ct);


   This is asking to be a *switch* statement...
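
A rough sketch of what the switch form could look like, written as plain
userspace C so it can be compiled on its own. The enum demo_ctinfo and the
function demo_ctinfo_label() are stand-ins invented for this example; they
only mirror the three states named in the hunk and are not the kernel's
enum ip_conntrack_info.

```c
#include <stddef.h>

/* Local stand-ins for the states named in the quoted hunk. */
enum demo_ctinfo {
	DEMO_CT_ESTABLISHED,
	DEMO_CT_RELATED,
	DEMO_CT_NEW,
	DEMO_CT_OTHER,
};

/* Map a state to the word used in the debug message, or NULL when the
 * quoted code would log nothing for that state. */
const char *demo_ctinfo_label(enum demo_ctinfo ctinfo)
{
	switch (ctinfo) {
	case DEMO_CT_ESTABLISHED:
		return "normal";
	case DEMO_CT_RELATED:
		return "related";
	case DEMO_CT_NEW:
		return "new";
	default:
		return NULL;
	}
}
```

In the kernel code, each case would presumably wrap the corresponding
pr_debug() call rather than return a string.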

[...]

MBR, Sergei


Re: [PATCH net-next v5 7/8] openvswitch: Delay conntrack helper call for new connections.

2015-12-17 Thread Sergei Shtylyov


Hello.

On 12/17/2015 3:36 AM, Jarno Rajahalme wrote:


There is no need to help connections that are not confirmed, so we can
delay helping new connections to the time when they are confirmed.
This change is needed for NAT support, and having this as a separate
patch will make the following NAT patch a bit easier to review.

Signed-off-by: Jarno Rajahalme 
---
  net/openvswitch/conntrack.c | 20 +++-
  1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7aa38fa..ba44287 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c

[...]

@@ -491,11 +496,16 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
return -ENOENT;

ovs_ct_update_key(skb, key, true);
+   }

-   if (ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
-   WARN_ONCE(1, "helper rejected packet");
-   return -EINVAL;
-   }
+   /* Call the helper right after nf_conntrack_in() for confirmed
+* connections, but only when committing for unconfirmed connections.
+*/
+   ct = nf_ct_get(skb, &ctinfo);
+   if (ct && (nf_ct_is_confirmed(ct) ? !cached : info->commit)
+   && ovs_ct_helper(skb, info->family) != NF_ACCEPT) {


   Please leave && on the line being broken, don't carry it into the 
continuation line.


MBR, Sergei


Re: [iproute PATCH v2 2/2] ss: support closing inet sockets via SOCK_DESTROY.

2015-12-17 Thread Eric Dumazet

On Thu, 2015-12-17 at 22:22 +0900, Lorenzo Colitti wrote:
> This patch adds a -K / --kill option to ss that attempts to
> forcibly close matching sockets using SOCK_DESTROY.
> 
> This is implemented by adding a new "struct action" to struct
> filter. If necessary, this can be extended later on to support
> further actions on sockets.

Does not work for me :

lpaa23:~# ./ss dst lpaa24|wc -l
401
lpaa23:~# ./ss -K dst lpaa24|wc -l
1
lpaa23:~# ./ss dst lpaa24|wc -l
401
lpaa23:~# id
uid=0(root) gid=0(root)
groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),26(tape)

Kernel is latest David Miller net-next



Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code

2015-12-17 Thread Rainer Weikusat

Hannes Frederic Sowa  writes:
> On 16.12.2015 21:09, Rainer Weikusat wrote:
>> With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
>> receive code was changed from using mutex_lock(&u->readlock) to
>> mutex_lock_interruptible(&u->readlock) to prevent signals from being
>> delayed for an indefinite time if a thread sleeping on the mutex
>> happened to be selected for handling the signal. But this was never a
>> problem with the stream receive code (as opposed to its datagram
>> counterpart) as that never went to sleep waiting for new messages with the
>> mutex held and thus, wouldn't cause secondary readers to block on the
>> mutex waiting for the sleeping primary reader. As the interruptible
>> locking makes the code more complicated in exchange for no benefit,
>> change it back to using mutex_lock.
>> 
>> Signed-off-by: Rainer Weikusat 
>> ---
>> 
>> Considering that the datagram receive routine also doesn't go to sleep
>> with the mutex held anymore, the 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490
>> change to unix_autobind is now similarly purposeless.
>
> I wouldn't do this conversion, yet. There is still a deadlock lingering
> around which should be solved earlier:
>
> http://lists.openwall.net/netdev/2015/11/10/4
>
> Unfortunately I haven't found a good way how to solve it, yet.

Judging from the link, that's not related to the stream receive code.

[PATCH net] be2net: Avoid accessing eq object in be_msix_register routine, when i < 0.

2015-12-17 Thread Venkat Duvvuru

When the first request_irq fails in be_msix_register, i value
would be zero. The current code decrements the i value and
accesses the eq object without validating the decremented
"i" value. This can cause an "invalid memory address access"
violation.

This patch fixes the problem by accessing the eq object after
validating the "i" value.

Signed-off-by: Venkat Duvvuru 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index b6ad029..6598820 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3299,8 +3299,10 @@ static int be_msix_register(struct be_adapter *adapter)
 
return 0;
 err_msix:
-   for (i--, eqo = &adapter->eq_obj[i]; i >= 0; i--, eqo--)
+   for (i--; i >= 0; i--) {
+   eqo = &adapter->eq_obj[i];
free_irq(be_msix_vec_get(adapter, eqo), eqo);
+   }
dev_warn(&adapter->pdev->dev, "MSIX Request IRQ failed - err %d\n",
 status);
be_msix_disable(adapter);
-- 
1.8.3.1
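
The unwind pattern this patch moves to can be sketched in plain userspace
C as below. The struct demo_vec and the function demo_register() are
hypothetical stand-ins for the driver's event queue objects and IRQ
registration; the point is only that the element address is formed after
the index has been checked, so nothing is touched when the very first
request fails.

```c
/* Hypothetical stand-in for a per-vector object. */
struct demo_vec {
	int requested;
};

/* Request n vectors in order; fail_at < 0 means no request fails.
 * Returns 0 on success, -1 after undoing any earlier requests. */
int demo_register(struct demo_vec *vecs, int n, int fail_at)
{
	int i;

	for (i = 0; i < n; i++) {
		if (i == fail_at)
			goto err;
		vecs[i].requested = 1;
	}
	return 0;

err:
	/* i is the index that failed, so release i-1 down to 0. The
	 * address is only taken inside the loop, after i >= 0 holds. */
	for (i--; i >= 0; i--) {
		struct demo_vec *v = &vecs[i];

		v->requested = 0;
	}
	return -1;
}
```

With fail_at == 0 the cleanup loop body never runs, which is the case the
commit message describes.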


Re: [Patch net] net: check both type and procotol for tcp sockets

2015-12-17 Thread Willem de Bruijn

On Thu, Dec 17, 2015 at 2:39 AM, Cong Wang  wrote:
> Dmitry reported the following out-of-bound access:
>
> Call Trace:
>  [] __asan_report_load4_noabort+0x3e/0x40
> mm/kasan/report.c:294
>  [] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
>  [< inline >] SYSC_setsockopt net/socket.c:1746
>  [] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
>  [] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
>
> This is because we mistake a raw socket as a tcp socket.
> We should check both sk->sk_type and sk->sk_protocol to ensure
> it is a tcp socket.
>
> Willem points out __skb_complete_tx_timestamp() needs to fix as well.
>
> Reported-by: Dmitry Vyukov 
> Cc: Willem de Bruijn 
> Cc: Eric Dumazet 
> Signed-off-by: Cong Wang 

Acked-by: Willem de Bruijn 

Thanks for fixing both cases at once.

[PATCH] net ixgbevf: ethtool statistics yields/misses fixes

2015-12-17 Thread Tristan Colgate

ixgbevf over counts yields and does not actually count misses.
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index d3e5f5b..ffdd8df 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -417,13 +417,13 @@ static void ixgbevf_get_ethtool_stats(struct net_device *netdev,
for (i = 0; i < adapter->num_rx_queues; i++) {
rx_yields += adapter->rx_ring[i]->stats.yields;
rx_cleaned += adapter->rx_ring[i]->stats.cleaned;
-   rx_yields += adapter->rx_ring[i]->stats.yields;
+   rx_missed += adapter->rx_ring[i]->stats.misses;
}

for (i = 0; i < adapter->num_tx_queues; i++) {
tx_yields += adapter->tx_ring[i]->stats.yields;
tx_cleaned += adapter->tx_ring[i]->stats.cleaned;
-   tx_yields += adapter->tx_ring[i]->stats.yields;
+   tx_missed += adapter->tx_ring[i]->stats.misses;
}

adapter->bp_rx_yields = rx_yields;
-- 
2.1.4

Re: [PATCH 07/15] i40iw: add hw and utils files

2015-12-17 Thread Christoph Hellwig

> +#ifndef UNREFERENCED_PARAMETER
> +#define UNREFERENCED_PARAMETER(_p)   \
> +{\
> + (_p) = (_p);\
> +}
> +#endif

No need for this, just remove it.

> +#define I40E_MASK(mask, shift) (mask << shift)

Please just open-code the shift, this macro is silly.

> +#define i40iw_flush(a)  readl((a)->hw_addr + I40E_GLGEN_STAT)
> +
> +#define wr32(a, reg, value) writel((value), (a)->hw_addr + (reg))
> +#define rd32(a, reg)readl((a)->hw_addr + (reg))

Please turn these into inlines.

> +
> +#ifndef readq
> +static inline u64 rd64(u8 * __iomem addr)
> +{
> + return ((u64)readl(addr)) | (((u64)readl(addr + 4UL)) << 32);
> +}
> +#else
> +#define rd64(a)readq((a))
> +#endif

Please use the magic in  instead.

> +
> +#define db_wr32(a, value)   writel((value), (a))

Please remove this pointless wrapper.

> +void SLEEP(u8 ms);

Please give this function a sensible name.


Re: rhashtable: Prevent spurious EBUSY errors on insertion

2015-12-17 Thread Xin Long

On Thu, Dec 17, 2015 at 5:00 PM, Xin Long  wrote:
> On Thu, Dec 17, 2015 at 4:48 PM, Herbert Xu wrote:
>> On Thu, Dec 17, 2015 at 04:46:00PM +0800, Xin Long wrote:
>>>
>>> sorry for late test, but unfortunately, my case with rhashtalbe still
>>> return EBUSY.
>>> I added some debug code in rhashtable_insert_rehash(), and found:
>>> *future_tbl is null*
>>>
>>> fail:
>>> /* Do not fail the insert if someone else did a rehash. */
>>> if (likely(rcu_dereference_raw(tbl->future_tbl))) {
>>> printk("future_tbl is there\n");
>>> return 0;
>>> } else {
>>> printk("future_tbl is null\n");
>>> }
>>>
>>> any idea why ?
>>
>> That's presumably because you got a genuine double rehash.
>>
>> Until you post your code we can't really help you.
>>
> i wish i could , but my codes is a big patch for sctp, and this issue
> happens in a special stress test based on this patch.
> im trying to think how i can show you. :)

I'm just wondering, why don't we handle the genuine double rehash
issue inside rhashtable? I mean, it's just a temporary error that a
simple retry may fix.

[PATCH net-next 0/2] Local checksum offload for VXLAN

2015-12-17 Thread Edward Cree

When the inner packet checksum is offloaded, the outer UDP checksum is easy
 to calculate as it doesn't depend on the payload (because the inner checksum
 cancels out everything from the inner packet except the pseudo header).
Thus, transmit checksums for VXLAN (and in principle other encapsulations,
 but I haven't enabled it for / tested with those) can be offloaded on any
 device supporting NETIF_F_HW_CSUM.  Only the innermost checksum has to be
 offloaded, the rest are filled in by the stack.
Tested by hacking a driver to report NETIF_F_HW_CSUM, call skb_checksum_help
 before transmitting a packet, and not actually offload anything to the hw.
In principle it should also be possible to apply this technique when the
 inner packet has been checksummed by software, but only if skb->csum_start
 and skb->csum_offset have been filled in to describe the inner checksum.
 However in this case it is easier to use skb->csum and skb->csum_start, as
 gso_make_checksum() already does - a similar but simpler technique.  It's
 not clear to me where else this should be done, so this is out of scope for
 this patch series.

Edward Cree (2):
  net: udp: local checksum offload for encapsulation
  net: vxlan: enable local checksum offload on HW_CSUM devices

 drivers/net/vxlan.c |  5 -
 net/ipv4/udp.c  | 34 +-
 2 files changed, 33 insertions(+), 6 deletions(-)

-- 
2.4.3


Re: [PATCH 1/1] mac80211: improve the contiguous mask checking

2015-12-17 Thread Johannes Berg

On Thu, 2015-12-17 at 22:59 +0800, Zeng Zhaoxiu wrote:
> If the result of adding the first set bit to the mask is power of 2,
> the mask must be contiguous. "mask & -mask" can get the first set bit
> of mask gracefully.

> - if (__ffs64(mask) + hweight64(mask) != fls64(mask))
> {
> + inc = (mask & -mask);
> + val = mask + inc;
> + if ((val & (val - 1)) != 0) {
>   /* not a contiguous mask ... not handled now! */

Hm. Ok, I can see how that would work, but it doesn't really seem like
much of an "improvement" to me? Surely I seem to need much more
thinking to understand this. There's no reason to optimise it either,
so why should we change it?

johannes
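
For readers following the bit trick under discussion, here is a small
stand-alone sketch of the proposed check next to a plain loop-based
reference. The function names are made up for illustration and this is
not the mac80211 code. Worked by hand: for mask 0x0ff0, inc is 0x0010
and val is 0x1000, a power of two, so the mask is contiguous; for mask
0x0f0f, inc is 0x0001 and val is 0x0f10, and val & (val - 1) is 0x0f00,
so it is not.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed check: adding the lowest set bit to a contiguous run of ones
 * carries through the whole run, leaving at most one bit set.
 */
static int mask_is_contiguous(uint64_t mask)
{
	uint64_t inc = mask & -mask;	/* isolate the lowest set bit */
	uint64_t val = mask + inc;	/* wraps to 0 if the run reaches bit 63 */

	return (val & (val - 1)) == 0;	/* zero or a power of two */
}

/* Reference: strip trailing zeros, then strip the run of ones. */
static int mask_is_contiguous_ref(uint64_t mask)
{
	while (mask && !(mask & 1))
		mask >>= 1;
	while (mask & 1)
		mask >>= 1;
	return mask == 0;
}
```

Both functions treat an all-zero mask as contiguous; whether that is
acceptable for a given caller is a separate question.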

RE: Random packet loss using IPsec with AES128-SHA1

2015-12-17 Thread Gabriele Beltrame

Hi Steffen,

I don't think it's IPsec, or at least not IPsec alone (I can see the
outbound packet in tcpdump). It looks more like the Xen/AWS Ethernet driver,
or several things combining to cause the issue. The odd thing is that it
seems to affect AES-CBC only (3DES-CBC and AES-GCM are fine).
This is the short discussion on the Strongswan support wiki:
https://wiki.strongswan.org/issues/1220

Thanks,
Gabriele

-Original Message-
From: Steffen Klassert [mailto:steffen.klass...@secunet.com] 
Sent: Wednesday, 16 December 2015 11:00
To: Gabriele Beltrame 
Cc: netdev@vger.kernel.org
Subject: Re: Random packet loss using IPsec with AES128-SHA1

On Wed, Dec 16, 2015 at 10:17:54AM +0100, Gabriele Beltrame wrote:
> Hi,
> 
> I'm running a few Strongswan 5.3.* CentOS (Kernel 3.16.7, 4.2.6, 
> 4.1.*) instances on AWS to terminate VPNs between each other and/or to 
> other devices across the Internet.
> While investigating some application issues, I've noticed that on 
> every VPNs I have random packet losses (from 1% to 4% over 100 to 300
> requests sent).
> This only happens when the two following conditions are met: (a) AES 
> encryption used, (b) IP packet size shorter than about (150+8+20)Bytes.

I've never seen this.

If xfrm statistics are compiled in, a counter is bumped for each packet
dropped by IPsec. You can check these counters in /proc/net/xfrm_stat.

This will tell you at least whether IPsec is the reason for your packet
loss.


Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code

2015-12-17 Thread Hannes Frederic Sowa

On 17.12.2015 16:28, Rainer Weikusat wrote:
> Hannes Frederic Sowa  writes:
>> On 16.12.2015 21:09, Rainer Weikusat wrote:
>>> With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
>>> receive code was changed from using mutex_lock(&u->readlock) to
>>> mutex_lock_interruptible(&u->readlock) to prevent signals from being
>>> delayed for an indefinite time if a thread sleeping on the mutex
>>> happened to be selected for handling the signal. But this was never a
>>> problem with the stream receive code (as opposed to its datagram
>>> counterpart) as that never went to sleep waiting for new messages with the
>>> mutex held and thus, wouldn't cause secondary readers to block on the
>>> mutex waiting for the sleeping primary reader. As the interruptible
>>> locking makes the code more complicated in exchange for no benefit,
>>> change it back to using mutex_lock.
>>>
>>> Signed-off-by: Rainer Weikusat 
>>> ---
>>>
>>> Considering that the datagram receive routine also doesn't go the sleep
>>> with the mutex held anymore, the 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490
>>> change to unix_autobind is now similarly purposeless.
>>
>> I wouldn't do this conversion, yet. There is still a deadlock lingering
>> around which should be solved earlier:
>>
>> http://lists.openwall.net/netdev/2015/11/10/4
>>
>> Unfortunately I haven't found a good way how to solve it, yet.
> 
> Judging from the link, that's not related to the stream receive code.
> 

No, but to commit 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490 where the
mutexes of unix_bind and unix_autobind got changed.

The unix_stream_read_generic conversion is fine.

Acked-by: Hannes Frederic Sowa 

Thanks,
Hannes


Re: [PATCH/RFC net-next] ravb: Add dma queue interrupt support

2015-12-17 Thread Yoshihiro Kaneko

Hi,

2015-12-16 4:00 GMT+09:00 Sergei Shtylyov :
> Hello.
>
> On 12/15/2015 03:23 PM, Yoshihiro Kaneko wrote:
>
>> From: Kazuya Mizuguchi 
>>
>> This patch supports the following interrupts.
>>
>> - One interrupt for multiple (descriptor, error, management)
>> - One interrupt for emac
>> - Four interrupts for dma queue (best effort rx/tx, network control rx/tx)
>
>
>You don't say why the current 2-interrupt scheme (implemented by Simon's
> patch) isn't enough...
>
>> Signed-off-by: Kazuya Mizuguchi 
>> Signed-off-by: Yoshihiro Kaneko 
>
> [...]
>
>> diff --git a/drivers/net/ethernet/renesas/ravb.h
>> b/drivers/net/ethernet/renesas/ravb.h
>> index 9fbe92a..eada5a1 100644
>> --- a/drivers/net/ethernet/renesas/ravb.h
>> +++ b/drivers/net/ethernet/renesas/ravb.h
>> @@ -157,6 +157,7 @@ enum ravb_reg {
>> TIC = 0x0378,
>> TIS = 0x037C,
>> ISS = 0x0380,
>> +   CIE = 0x0384,
>
>
>I'd like to see some comment clarifying that this is R-Car gen3 only reg.
>
> [...]
>>
>> @@ -556,6 +566,16 @@ enum ISS_BIT {
>> ISS_DPS15   = 0x8000,
>>   };
>>
>> +/* CIE */
>
>
>And here as well.
>
>> +enum CIE_BIT {
>> +   CIE_CRIE= 0x0001, /* Common Receive Interrupt Enable
>> */
>> +   CIE_CTIE= 0x0100, /* Common Transmit Interrupt Enable
>> */
>> +   CIE_RQFM= 0x0001, /* Reception Queue Full Mode */
>> +   CIE_CL0M= 0x0002, /* Common Line 0 Mode */
>> +   CIE_RFWL= 0x0004, /* Rx-FIFO Warning interrupt Line */
>
>
>You forgot "Select" at the end.
>
>> +   CIE_RFFL= 0x0008, /* Rx-FIFO Full interrupt Line */
>
>
>Here as well.
>Well, generally we don't have such comments for the other registers, so
> this will look somewhat out of line...

I agree. I will remove those comments.

>
> [...]
>>
>> @@ -592,6 +612,18 @@ enum GIS_BIT {
>> GIS_PTMF= 0x0004,
>>   };
>>
>> +/* GIx */
>
>
>I'd prefer GIC/GIS.
>
>> +#define RAVB_GIx_ALL   0x03ff
>
>
>No RAVB_ prefix please.
>
>> +
>> +/* RIx0 */
>
>
>RIE0/RID0.
>
>> +#define RAVB_RIx0_ALL  0x0003
>
>
>No prefix. And I'd rather call it RIx0_FRx. Or even RIE0_FRS and
> RID0_FRD.
>
>> +
>> +/* RIx2 */
>
>
>RIE2/RID2.
>
>> +#define RAVB_RIx2_ALL  0x8003
>
>
>No prefix. And there's bit 31 in this register, according to my gen3
> manual. So, your _ALL isn't really "all bits". I'd rather call it RIx2_QFx.
> Or even RIE2_QFS and RID2_QFD.

I think that bit 31 is included in the value 0x8003. Or am I
missing something?

>
>> +
>> +/* TIx */
>
>
>TIE/TID.
>
>> +#define RAVB_TIx_ALL   0x000f
>
>
>No prefix. And there's bit 31 in this register, according to my gen3
> manual. So, your _ALL isn't really "all bits".

I think the correct value is 0x000f0f0f.

>
> [...]
>>
>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c
>> b/drivers/net/ethernet/renesas/ravb_main.c
>> index 120cc25..753b67d 100644
>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>
> [...]
>>
>> @@ -376,6 +386,7 @@ static void ravb_emac_init(struct net_device *ndev)
>>   static int ravb_dmac_init(struct net_device *ndev)
>>   {
>> int error;
>> +   struct ravb_private *priv = netdev_priv(ndev);
>
>
>Please declare this variable before 'error' -- DaveM really prefers
> "reversed Christmas tree" declaration order.
>
> [...]
>>
>> @@ -411,14 +422,28 @@ static int ravb_dmac_init(struct net_device *ndev)
>> ravb_write(ndev, TCCR_TFEN, TCCR);
>>
>> /* Interrupt init: */
>> -   /* Frame receive */
>> -   ravb_write(ndev, RIC0_FRE0 | RIC0_FRE1, RIC0);
>> -   /* Disable FIFO full warning */
>> -   ravb_write(ndev, 0, RIC1);
>> -   /* Receive FIFO full error, descriptor empty */
>> -   ravb_write(ndev, RIC2_QFE0 | RIC2_QFE1 | RIC2_RFFE, RIC2);
>> -   /* Frame transmitted, timestamp FIFO updated */
>> -   ravb_write(ndev, TIC_FTE0 | TIC_FTE1 | TIC_TFUE, TIC);
>> +   if (priv->chip_id == RCAR_GEN2) {
>> +   /* Frame receive */
>> +   ravb_write(ndev, RIC0_FRE0 | RIC0_FRE1, RIC0);
>> +   /* Disable FIFO full warning */
>> +   ravb_write(ndev, 0, RIC1);
>> +   /* Receive FIFO full error, descriptor empty */
>> +   ravb_write(ndev, RIC2_QFE0 | RIC2_QFE1 | RIC2_RFFE, RIC2);
>> +   /* Frame transmitted, timestamp FIFO updated */
>> +   ravb_write(ndev, TIC_FTE0 | TIC_FTE1 | TIC_TFUE, TIC);
>> +   } else {
>> +   /* Clear CIE.CTIE, CIE.CRIE, DIL.DPLx */
>> +   ravb_write(ndev, 0, CIE);
>
>
>Why clear CIE if you immediately overwrite it?
>
>> +   ravb_write(ndev, 0, DIL);
>> +   /* Set queue specific interrupt */
>> +

Re: [PATCH 08/15] i40iw: add files for iwarp interface

2015-12-17 Thread Christoph Hellwig

> + i40iw_next_iw_state(iwqp, I40IW_QP_STATE_ERROR, 0, 0, 0);
> +
> + if (!iwqp->user_mode) {
> + if (iwqp->iwscq)
> + i40iw_clean_cqes(iwqp, iwqp->iwscq);
> + if ((iwqp->iwrcq) && (iwqp->iwrcq != iwqp->iwscq))

Please try to do a pass over your code and remove all these pointless
braces.

> +static int i40iw_setup_virt_qp(struct i40iw_device *iwdev,
> +struct i40iw_qp *iwqp,
> +struct i40iw_qp_init_info *init_info)
> +{
> + struct i40iw_pbl *iwpbl = iwqp->iwpbl;
> + struct i40iw_qp_mr *qpmr = >qp_mr;
> + u64 *sq_base;
> +
> + sq_base = kmap(qpmr->sq_page);
> + iwqp->sq_kmapped = 1;


You must never use kmap for any long lived resource.  Just allocate
it out of lowmem so that you don't need the kmap.

> + ukinfo->rq = (u64 *)((u8 *)mem->va + (sqdepth * I40IW_QP_WQE_MIN_SIZE));
> + info->rq_pa = (uintptr_t)((u8 *)mem->pa + (sqdepth * 
> I40IW_QP_WQE_MIN_SIZE));
> +
> + ukinfo->shadow_area = (u64 *)((u8 *)ukinfo->rq +
> +   (rqdepth * I40IW_QP_WQE_MIN_SIZE));
> + info->shadow_area_pa = info->rq_pa + (rqdepth * I40IW_QP_WQE_MIN_SIZE);

Can you please try to get away with fewer casts here?  Note that Linux does
use the GCC extension for void pointer arithmetic.  Even without that you
never need casts to or from void pointers.  All this happens in
lots of places in the code, so a little audit would be useful.

> +/**
> + * i40iw_alloc_mw - Allocate memory window
> + * @ibpd: protection domain
> + * @type: memory window type
> + */
> +static struct ib_mw *i40iw_alloc_mw(struct ib_pd *ibpd,
> + enum ib_mw_type type)
> +{
> + return ERR_PTR(-ENOSYS);
> +}
> +
> +/**
> + * i40iw_dealloc_mw - Free a memory window
> + * @ibmw: memory window to free
> + */
> +static int i40iw_dealloc_mw(struct ib_mw *ibmw)
> +{
> + return -EIO;
> +}
> +
> +/**
> + * i40iw_bind_mw - Bind a memory window to a qp
> + * @ibqp: queue pair
> + * @ibmw: memory window
> + * @ibmw_bind: pointer to bind structure
> + */
> +static int i40iw_bind_mw(struct ib_qp *ibqp,
> +  struct ib_mw *ibmw,
> +  struct ib_mw_bind *ibmw_bind)
> +{
> + return -ENOSYS;
> +}

There shouldn't be any need to stub all these out.

> +/**
> + * i40iw_init_ofa_device - initialization of iwarp device
> + * @iwdev: iwarp device
> + */
> +static struct i40iw_ib_device *i40iw_init_ofa_device(struct i40iw_device 
> *iwdev)

Where is that weird ofa prefix coming from?

> + iwibdev->ibdev.reg_phys_mr = i40iw_reg_phys_mr;

Please don't add phys MR support in new drivers, it's about to
disappear.

> + iwibdev->ibdev.detach_mcast = NULL;
> + iwibdev->ibdev.attach_mcast = NULL;
> + iwibdev->ibdev.get_protocol_stats = i40iw_get_protocol_stats;
> + iwibdev->ibdev.process_mad = NULL;

All the unused fields should already be zeroed.

[PATCH net-next 2/2] net: vxlan: enable local checksum offload on HW_CSUM devices

2015-12-17 Thread Edward Cree

Signed-off-by: Edward Cree 
---
 drivers/net/vxlan.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6369a57..c1660d6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1785,6 +1785,9 @@ static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *sk
bool udp_sum = !!(vxflags & VXLAN_F_UDP_CSUM);
int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
u16 hdrlen = sizeof(struct vxlanhdr);
+   /* Is device able to do the inner checksum? */
+   bool inner_csum = skb_dst(skb) && skb_dst(skb)->dev &&
+   (skb_dst(skb)->dev->features & NETIF_F_HW_CSUM);
 
if ((vxflags & VXLAN_F_REMCSUM_TX) &&
skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -1814,7 +1817,7 @@ static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *sk
if (WARN_ON(!skb))
return -ENOMEM;
 
-   skb = iptunnel_handle_offloads(skb, udp_sum, type);
+   skb = iptunnel_handle_offloads(skb, udp_sum && !inner_csum, type);
if (IS_ERR(skb))
return PTR_ERR(skb);
 
-- 
2.4.3


[PATCH net-next 1/2] net: udp: local checksum offload for encapsulation

2015-12-17 Thread Edward Cree

The arithmetic properties of the ones-complement checksum mean that a
 correctly checksummed inner packet, including its checksum, has a ones
 complement sum depending only on whatever value was used to initialise
 the checksum field before checksumming (in the case of TCP and UDP,
 this is the ones complement sum of the pseudo header, complemented).
Consequently, if we are going to offload the inner checksum with
 CHECKSUM_PARTIAL, we can compute the outer checksum based only on the
 packet data not covered by the inner checksum, and the initial value of
 the inner checksum field.

Signed-off-by: Edward Cree 
---
 net/ipv4/udp.c | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8841e98..3e63c3d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -767,13 +767,37 @@ void udp_set_csum(bool nocheck, struct sk_buff *skb,
 {
struct udphdr *uh = udp_hdr(skb);
 
-   if (nocheck)
+   if (nocheck) {
uh->check = 0;
-   else if (skb_is_gso(skb))
+   } else if (skb_is_gso(skb)) {
uh->check = ~udp_v4_check(len, saddr, daddr, 0);
-   else if (skb_dst(skb) && skb_dst(skb)->dev &&
-(skb_dst(skb)->dev->features &
- (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) {
+   } else if (skb->ip_summed == CHECKSUM_PARTIAL &&
+  skb_dst(skb) && skb_dst(skb)->dev &&
+  (skb_dst(skb)->dev->features &
+   (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) {
+   /* Everything from csum_start onwards will be
+* checksummed and will thus have a sum of whatever
+* we previously put in the checksum field (eg. sum
+* of pseudo-header)
+*/
+   __wsum csum;
+
+   /* Fill in our pseudo-header checksum */
+   uh->check = ~udp_v4_check(len, saddr, daddr, 0);
+   /* Start with complement of inner pseudo-header checksum */
+   csum = ~skb_checksum(skb, skb_checksum_start_offset(skb) + skb->csum_offset,
+2, 0);
+   /* Add in checksum of our headers (incl. pseudo-header
+* checksum filled in above)
+*/
+   csum = skb_checksum(skb, 0, skb_checksum_start_offset(skb), csum);
+   /* The result is the outer checksum */
+   uh->check = csum_fold(csum);
+   if (uh->check == 0)
+   uh->check = CSUM_MANGLED_0;
+   } else if (skb_dst(skb) && skb_dst(skb)->dev &&
+  (skb_dst(skb)->dev->features &
+   (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) {
 
BUG_ON(skb->ip_summed == CHECKSUM_PARTIAL);
 
-- 
2.4.3
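
The arithmetic property the commit message relies on can be checked in
isolation. The sketch below is plain user-space C over 16-bit words, not
kernel code; oc_sum(), fill_checksum() and sum_depends_only_on_seed()
are hypothetical helper names, and "seed" stands for whatever value the
checksum field held before checksumming (for example a pseudo-header
sum). It shows that once the checksum has been filled in, the
ones-complement sum of the whole packet equals the complement of the
seed, whatever the rest of the packet contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones-complement sum with end-around carry. */
static uint16_t oc_sum(const uint16_t *words, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		acc += words[i];
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Fill in the checksum at words[field]; the field must already hold
 * the seed value.
 */
static void fill_checksum(uint16_t *words, size_t n, size_t field)
{
	words[field] = (uint16_t)~oc_sum(words, n);
}

/* Returns 1 if the sum of the checksummed packet equals ~seed.
 * Note: 0x0000 and 0xffff both represent zero in ones-complement
 * arithmetic; this comparison treats them as distinct, so a seed of
 * 0xffff would need special handling.
 */
static int sum_depends_only_on_seed(uint16_t *words, size_t n,
				    size_t field, uint16_t seed)
{
	words[field] = seed;
	fill_checksum(words, n, field);
	return oc_sum(words, n) == (uint16_t)~seed;
}
```

Two packets with different payloads but the same seed therefore
contribute the same value to an outer sum, which is why the outer
checksum does not need to look at the inner payload.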



Re: [iproute PATCH v2 1/2] libnetlink: add a variant of rtnl_send_check that consumes ACKs

2015-12-17 Thread Eric Dumazet

On Thu, 2015-12-17 at 22:22 +0900, Lorenzo Colitti wrote:
> The new variant is identical to rtnl_send_check, except it also
> consumes the kernel response instead of using MSG_PEEK. This is
> useful for callers that send simple commands that never cause a
> response but only ACKs, and that expect to receive and deal
> with errors without printing them to stderr like rtnl_talk does.

> +inline int rtnl_send_check(struct rtnl_handle *rth, const void *buf, int len)
> +{
> + return rtnl_send_check_ack(rth, buf, len, 0);
> +}

Please remove this inline attribute.


Instantaneous Threshold ECN marking for DCTCP

2015-12-17 Thread Bryce Cronkite-Ratcliff

Hi there,

I am attempting to run some experiments with DCTCP using mininet (so
this is all user-level emulation -- my switch is a linux namespace
using OVS). My question is on how to apply DCTCP's AQM. DCTCP's AQM
just marks all packets with CE if they are received when the queue is
over a threshold value of K and otherwise does not mark CE. The DCTCP
papers I have seen suggest that RED can be used to achieve this: set
probability to 1, burst to 1, and min and max to the same value.

However, tc-red, in accordance with this paper, does not allow burst
to be significantly below the minimum threshold value, so I am unable
to set the parameters I would like -- in particular to set burst to 1
-- without monkey-patching tc.

Is there a way to achieve this simple threshold-ECN marking AQM with
tc, or another approach?

Thank you!
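
For reference, the marking rule described above is small enough to state
in code. This is only an illustrative sketch of the rule itself, with
made-up function names and queue lengths counted in packets as an
assumption; it is not a tc configuration and not kernel or OVS code.

```c
#include <assert.h>
#include <stddef.h>

/* Instantaneous-threshold marking: decide per packet from the current
 * queue length only, with no averaging and no probability ramp.
 */
static int dctcp_should_mark_ce(size_t queue_len_pkts, size_t k_pkts)
{
	return queue_len_pkts > k_pkts;
}

/* Counts how many arrivals in a trace of queue lengths would be marked. */
static size_t count_marked(const size_t *qlen_trace, size_t n, size_t k_pkts)
{
	size_t i, marked = 0;

	for (i = 0; i < n; i++)
		marked += dctcp_should_mark_ce(qlen_trace[i], k_pkts);
	return marked;
}
```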

Re: rhashtable: Prevent spurious EBUSY errors on insertion

2015-12-17 Thread David Miller

From: Xin Long 
Date: Thu, 17 Dec 2015 17:00:35 +0800

> On Thu, Dec 17, 2015 at 4:48 PM, Herbert Xu wrote:
>> On Thu, Dec 17, 2015 at 04:46:00PM +0800, Xin Long wrote:
>>>
>>> sorry for late test, but unfortunately, my case with rhashtalbe still
>>> return EBUSY.
>>> I added some debug code in rhashtable_insert_rehash(), and found:
>>> *future_tbl is null*
>>>
>>> fail:
>>> /* Do not fail the insert if someone else did a rehash. */
>>> if (likely(rcu_dereference_raw(tbl->future_tbl))) {
>>> printk("future_tbl is there\n");
>>> return 0;
>>> } else {
>>> printk("future_tbl is null\n");
>>> }
>>>
>>> any idea why ?
>>
>> That's presumably because you got a genuine double rehash.
>>
>> Until you post your code we can't really help you.
>>
> i wish i could , but my codes is a big patch for sctp, and this issue
> happens in a special stress test based on this patch.
> im trying to think how i can show you. :)

Simply post it.

[RFC 4/5] ethtool: support per queue sub command --show-coalesce

2015-12-17 Thread kan . liang

From: Kan Liang 

Get the coalesce settings of all masked queues from the kernel and dump
them one by one.

Example:

 $ sudo ./ethtool --set-perqueue-command eth5 queue_mask 0x11
   --show-coalesce
 Queue: 0
 Adaptive RX: off  TX: off
 stats-block-usecs: 0
 sample-interval: 0
 pkt-rate-low: 0
 pkt-rate-high: 0

 rx-usecs: 222
 rx-frames: 0
 rx-usecs-irq: 0
 rx-frames-irq: 256

 tx-usecs: 222
 tx-frames: 0
 tx-usecs-irq: 0
 tx-frames-irq: 256

 rx-usecs-low: 0
 rx-frame-low: 0
 tx-usecs-low: 0
 tx-frame-low: 0

 rx-usecs-high: 0
 rx-frame-high: 0
 tx-usecs-high: 0
 tx-frame-high: 0

 Queue: 4
 Adaptive RX: off  TX: off
 stats-block-usecs: 0
 sample-interval: 0
 pkt-rate-low: 0
 pkt-rate-high: 0

 rx-usecs: 222
 rx-frames: 0
 rx-usecs-irq: 0
 rx-frames-irq: 256

 tx-usecs: 222
 tx-frames: 0
 tx-usecs-irq: 0
 tx-frames-irq: 256

 rx-usecs-low: 0
 rx-frame-low: 0
 tx-usecs-low: 0
 tx-frame-low: 0

 rx-usecs-high: 0
 rx-frame-high: 0
 tx-usecs-high: 0
 tx-frame-high: 0

Signed-off-by: Kan Liang 
---
 ethtool.c | 61 +++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index ea88c85..e7becc9 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1236,6 +1236,29 @@ static int dump_coalesce(const struct ethtool_coalesce *ecoal)
return 0;
 }
 
+void dump_per_queue_coalesce(struct ethtool_per_queue_op *per_queue_opt,
+__u64 *queue_mask)
+{
+   char *addr;
+   int i;
+
+   addr = (char *)per_queue_opt + sizeof(*per_queue_opt);
+   for (i = 0; i < MAX_QUEUE_MASK; i++) {
+   int queue = i * 64;
+   __u64 mask = queue_mask[i];
+
+   while (mask > 0) {
+   if (mask & 0x1) {
+   fprintf(stdout, "Queue: %d\n", queue);
+   dump_coalesce((struct ethtool_coalesce *)addr);
+   addr += sizeof(struct ethtool_coalesce);
+   }
+   mask = mask >> 1;
+   queue++;
+   }
+   }
+}
+
 struct feature_state {
u32 off_flags;
struct ethtool_gfeatures features;
@@ -4148,7 +4171,8 @@ static const struct option {
  " [ advertise %x ]\n"
  " [ tx-lpi on|off ]\n"
  " [ tx-timer %d ]\n"},
-   { "--set-perqueue-command", 1, do_perqueue, "Set per queue command",
+   { "--set-perqueue-command", 1, do_perqueue, "Set per queue command. "
+ "The supported sub commands include --show-coalesce",
  " [queue_mask %x] SUB_COMMAND\n"},
{ "-h|--help", 0, show_usage, "Show this help" },
{ "--version", 0, do_version, "Show version number" },
@@ -4242,8 +4266,30 @@ static int find_max_queue_num(struct cmd_context *ctx)
return MAX(MAX(echannels.rx_count, echannels.tx_count), echannels.combined_count);
 }
 
+static struct ethtool_per_queue_op *
+get_per_queue_coalesce(struct cmd_context *ctx,
+  __u64 *queue_mask, int queue_num)
+{
+   struct ethtool_per_queue_op *per_queue_opt;
+
+   per_queue_opt = malloc(sizeof(*per_queue_opt) + queue_num * sizeof(struct ethtool_coalesce));
+   if (!per_queue_opt)
+   return NULL;
+   memcpy(per_queue_opt->queue_mask, queue_mask, MAX_QUEUE_MASK * sizeof(__u64));
+   per_queue_opt->cmd = ETHTOOL_PERQUEUE;
+   per_queue_opt->sub_command = ETHTOOL_GCOALESCE;
+   if (send_ioctl(ctx, per_queue_opt)) {
+   free(per_queue_opt);
+   perror("Cannot get device per queue parameters");
+   return NULL;
+   }
+
+   return per_queue_opt;
+}
+
 static int do_perqueue(struct cmd_context *ctx)
 {
+   struct ethtool_per_queue_op *per_queue_opt;
__u64 queue_mask[MAX_QUEUE_MASK] = {0};
__u64 mask;
int i, queue_num = 0;
@@ -4286,7 +4332,18 @@ static int do_perqueue(struct cmd_context *ctx)
if (i < 0)
exit_bad_args();
 
-   /* no sub_command support yet */
+   if (strstr(args[i].opts, "--show-coalesce") != NULL) {
+   per_queue_opt = get_per_queue_coalesce(ctx, queue_mask, queue_num);
+   if (per_queue_opt == NULL) {
+   perror("Cannot get device per queue parameters");
+   return -EFAULT;
+   }
+   dump_per_queue_coalesce(per_queue_opt, queue_mask);
+   free(per_queue_opt);
+   } else {
+   perror("The subcommand is not supported yet");
+   return -EOPNOTSUPP;
+   }
 
return 0;
 }
-- 
1.7.11.7
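
The way a queue mask maps to queue numbers in the dump loop above can be
tried stand-alone. The sketch below mirrors the bit walk in
dump_per_queue_coalesce() but only collects queue indices; MASK_WORDS is
an assumed value standing in for MAX_QUEUE_MASK, and queues_from_mask()
is a hypothetical helper, not part of ethtool. With the mask 0x11 from
the example it selects queues 0 and 4, and the n-th selected queue
corresponds to the n-th coalesce record that follows the header.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_WORDS 4	/* assumed; stands in for MAX_QUEUE_MASK */

/* Writes the selected queue indices to out[] (up to max entries) and
 * returns how many queues the mask selects.
 */
static int queues_from_mask(const uint64_t *queue_mask, int *out, int max)
{
	int i, found = 0;

	for (i = 0; i < MASK_WORDS; i++) {
		int queue = i * 64;	/* first queue covered by word i */
		uint64_t mask = queue_mask[i];

		while (mask > 0) {
			if (mask & 0x1) {
				if (found < max)
					out[found] = queue;
				found++;
			}
			mask >>= 1;
			queue++;
		}
	}
	return found;
}
```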


[RFC 2/5] ethtool: move cmdline_coalesce out of do_scoalesce

2015-12-17 Thread kan . liang

From: Kan Liang 

Move cmdline_coalesce out of do_scoalesce so it can be shared with
other functions.
No behavior change.

Signed-off-by: Kan Liang 
---
 ethtool.c | 147 +++---
 1 file changed, 74 insertions(+), 73 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 04c5015..cb9e630 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1900,85 +1900,86 @@ static int do_gcoalesce(struct cmd_context *ctx)
return 0;
 }
 
+static struct ethtool_coalesce s_ecoal;
+static s32 coal_stats_wanted = -1;
+static int coal_adaptive_rx_wanted = -1;
+static int coal_adaptive_tx_wanted = -1;
+static s32 coal_sample_rate_wanted = -1;
+static s32 coal_pkt_rate_low_wanted = -1;
+static s32 coal_pkt_rate_high_wanted = -1;
+static s32 coal_rx_usec_wanted = -1;
+static s32 coal_rx_frames_wanted = -1;
+static s32 coal_rx_usec_irq_wanted = -1;
+static s32 coal_rx_frames_irq_wanted = -1;
+static s32 coal_tx_usec_wanted = -1;
+static s32 coal_tx_frames_wanted = -1;
+static s32 coal_tx_usec_irq_wanted = -1;
+static s32 coal_tx_frames_irq_wanted = -1;
+static s32 coal_rx_usec_low_wanted = -1;
+static s32 coal_rx_frames_low_wanted = -1;
+static s32 coal_tx_usec_low_wanted = -1;
+static s32 coal_tx_frames_low_wanted = -1;
+static s32 coal_rx_usec_high_wanted = -1;
+static s32 coal_rx_frames_high_wanted = -1;
+static s32 coal_tx_usec_high_wanted = -1;
+static s32 coal_tx_frames_high_wanted = -1;
+
+static struct cmdline_info cmdline_coalesce[] = {
+   { "adaptive-rx", CMDL_BOOL, &coal_adaptive_rx_wanted,
+ &s_ecoal.use_adaptive_rx_coalesce },
+   { "adaptive-tx", CMDL_BOOL, &coal_adaptive_tx_wanted,
+ &s_ecoal.use_adaptive_tx_coalesce },
+   { "sample-interval", CMDL_S32, &coal_sample_rate_wanted,
+ &s_ecoal.rate_sample_interval },
+   { "stats-block-usecs", CMDL_S32, &coal_stats_wanted,
+ &s_ecoal.stats_block_coalesce_usecs },
+   { "pkt-rate-low", CMDL_S32, &coal_pkt_rate_low_wanted,
+ &s_ecoal.pkt_rate_low },
+   { "pkt-rate-high", CMDL_S32, &coal_pkt_rate_high_wanted,
+ &s_ecoal.pkt_rate_high },
+   { "rx-usecs", CMDL_S32, &coal_rx_usec_wanted,
+ &s_ecoal.rx_coalesce_usecs },
+   { "rx-frames", CMDL_S32, &coal_rx_frames_wanted,
+ &s_ecoal.rx_max_coalesced_frames },
+   { "rx-usecs-irq", CMDL_S32, &coal_rx_usec_irq_wanted,
+ &s_ecoal.rx_coalesce_usecs_irq },
+   { "rx-frames-irq", CMDL_S32, &coal_rx_frames_irq_wanted,
+ &s_ecoal.rx_max_coalesced_frames_irq },
+   { "tx-usecs", CMDL_S32, &coal_tx_usec_wanted,
+ &s_ecoal.tx_coalesce_usecs },
+   { "tx-frames", CMDL_S32, &coal_tx_frames_wanted,
+ &s_ecoal.tx_max_coalesced_frames },
+   { "tx-usecs-irq", CMDL_S32, &coal_tx_usec_irq_wanted,
+ &s_ecoal.tx_coalesce_usecs_irq },
+   { "tx-frames-irq", CMDL_S32, &coal_tx_frames_irq_wanted,
+ &s_ecoal.tx_max_coalesced_frames_irq },
+   { "rx-usecs-low", CMDL_S32, &coal_rx_usec_low_wanted,
+ &s_ecoal.rx_coalesce_usecs_low },
+   { "rx-frames-low", CMDL_S32, &coal_rx_frames_low_wanted,
+ &s_ecoal.rx_max_coalesced_frames_low },
+   { "tx-usecs-low", CMDL_S32, &coal_tx_usec_low_wanted,
+ &s_ecoal.tx_coalesce_usecs_low },
+   { "tx-frames-low", CMDL_S32, &coal_tx_frames_low_wanted,
+ &s_ecoal.tx_max_coalesced_frames_low },
+   { "rx-usecs-high", CMDL_S32, &coal_rx_usec_high_wanted,
+ &s_ecoal.rx_coalesce_usecs_high },
+   { "rx-frames-high", CMDL_S32, &coal_rx_frames_high_wanted,
+ &s_ecoal.rx_max_coalesced_frames_high },
+   { "tx-usecs-high", CMDL_S32, &coal_tx_usec_high_wanted,
+ &s_ecoal.tx_coalesce_usecs_high },
+   { "tx-frames-high", CMDL_S32, &coal_tx_frames_high_wanted,
+ &s_ecoal.tx_max_coalesced_frames_high },
+};
 static int do_scoalesce(struct cmd_context *ctx)
 {
-   struct ethtool_coalesce ecoal;
int gcoalesce_changed = 0;
-   s32 coal_stats_wanted = -1;
-   int coal_adaptive_rx_wanted = -1;
-   int coal_adaptive_tx_wanted = -1;
-   s32 coal_sample_rate_wanted = -1;
-   s32 coal_pkt_rate_low_wanted = -1;
-   s32 coal_pkt_rate_high_wanted = -1;
-   s32 coal_rx_usec_wanted = -1;
-   s32 coal_rx_frames_wanted = -1;
-   s32 coal_rx_usec_irq_wanted = -1;
-   s32 coal_rx_frames_irq_wanted = -1;
-   s32 coal_tx_usec_wanted = -1;
-   s32 coal_tx_frames_wanted = -1;
-   s32 coal_tx_usec_irq_wanted = -1;
-   s32 coal_tx_frames_irq_wanted = -1;
-   s32 coal_rx_usec_low_wanted = -1;
-   s32 coal_rx_frames_low_wanted = -1;
-   s32 coal_tx_usec_low_wanted = -1;
-   s32 coal_tx_frames_low_wanted = -1;
-   s32 coal_rx_usec_high_wanted = -1;
-   s32 coal_rx_frames_high_wanted = -1;
-   s32 coal_tx_usec_high_wanted = -1;
-   s32 coal_tx_frames_high_wanted = -1;
-   struct cmdline_info cmdline_coalesce[] = {
-   { "adaptive-rx", CMDL_BOOL, &coal_adaptive_rx_wanted,
- &ecoal.use_adaptive_rx_coalesce },
-   {

[RFC 3/5] ethtool: introduce new ioctl for per queue setting

2015-12-17 Thread kan . liang

From: Kan Liang 

Introduce a new ioctl for setting per-queue parameters.
Users can apply a command to specific queues by passing SUB_COMMAND and
queue_mask, as in the following command:

 ethtool --set-perqueue-command DEVNAME [queue_mask %x] SUB_COMMAND

If queue_mask is not set, the SUB_COMMAND will be applied to all queues.

The following patches will enable SUB_COMMANDs for per queue setting.
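
As a rough illustration of the `[queue_mask %x]` argument described above, the sketch below parses a hex string into 64-bit words, least significant word first, taking 16 hex digits at a time from the right. It is not the patch's set_queue_mask(): the name parse_queue_mask(), the optional "0x" prefix handling, and the digit validation are assumptions added for the example.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_QUEUE       4096
#define MAX_QUEUE_MASK  (MAX_QUEUE / 64)

/*
 * Parse a hex string into 64-bit words, least significant word first.
 * Returns 0 on success, -1 on malformed or oversized input.
 * (Hypothetical helper; prefix stripping and validation are assumptions.)
 */
static int parse_queue_mask(uint64_t mask[MAX_QUEUE_MASK], const char *str)
{
	size_t len, i, word = 0;

	/* Accept an optional "0x"/"0X" prefix. */
	if (str[0] == '0' && (str[1] == 'x' || str[1] == 'X'))
		str += 2;

	len = strlen(str);
	if (len == 0 || len > MAX_QUEUE / 4)	/* 4 bits per hex digit */
		return -1;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)str[i]))
			return -1;

	memset(mask, 0, MAX_QUEUE_MASK * sizeof(mask[0]));

	/* Consume up to 16 digits at a time, starting from the right. */
	while (len > 0) {
		size_t n = len < 16 ? len : 16;
		char tmp[17];

		memcpy(tmp, str + len - n, n);
		tmp[n] = '\0';
		mask[word++] = strtoull(tmp, NULL, 16);
		len -= n;
	}
	return 0;
}
```

With this layout, bit b of word i selects queue i * 64 + b, which is the numbering the kernel side of the series uses.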

Signed-off-by: Kan Liang 
---
 ethtool-copy.h | 18 +++
 ethtool.c  | 94 ++
 internal.h |  2 ++
 3 files changed, 114 insertions(+)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index d23ffc4..a76c1dc 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -1108,6 +1108,22 @@ enum ethtool_sfeatures_retval_bits {
 #define ETHTOOL_F_WISH  (1 << ETHTOOL_F_WISH__BIT)
 #define ETHTOOL_F_COMPAT        (1 << ETHTOOL_F_COMPAT__BIT)
 
+#define MAX_QUEUE  4096
+#define MAX_QUEUE_MASK (MAX_QUEUE / 64)
+
+/**
+ * struct ethtool_per_queue_op - apply sub command to the queues in mask.
+ * @cmd: ETHTOOL_PERQUEUE
+ * @queue_mask: Mask the queues which sub command apply to
+ * @sub_command: the sub command
+ * @data: parameters of the command
+ */
+struct ethtool_per_queue_op {
+   __u32   cmd;
+   __u64   queue_mask[MAX_QUEUE_MASK];
+   __u32   sub_command;
+   chardata_placeholder_removed
+};
 
 /* CMDs currently supported */
 #define ETHTOOL_GSET   0x0001 /* Get settings. */
@@ -1190,6 +1206,8 @@ enum ethtool_sfeatures_retval_bits {
 #define ETHTOOL_GTUNABLE   0x0048 /* Get tunable configuration */
 #define ETHTOOL_STUNABLE   0x0049 /* Set tunable configuration */
 
+#define ETHTOOL_PERQUEUE   0x004a /* Set per queue options */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
 #define SPARC_ETH_SSET ETHTOOL_SSET
diff --git a/ethtool.c b/ethtool.c
index cb9e630..ea88c85 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3989,6 +3989,8 @@ static int do_seee(struct cmd_context *ctx)
return 0;
 }
 
+static int do_perqueue(struct cmd_context *ctx);
+
 #ifndef TEST_ETHTOOL
 int send_ioctl(struct cmd_context *ctx, void *cmd)
 {
@@ -4146,6 +4148,8 @@ static const struct option {
  " [ advertise %x ]\n"
  " [ tx-lpi on|off ]\n"
  " [ tx-timer %d ]\n"},
+   { "--set-perqueue-command", 1, do_perqueue, "Set per queue command",
+ " [queue_mask %x] SUB_COMMAND\n"},
{ "-h|--help", 0, show_usage, "Show this help" },
{ "--version", 0, do_version, "Show version number" },
{}
@@ -4197,6 +4201,96 @@ static int find_option(int argc, char **argp)
return -1;
 }
 
+static int set_queue_mask(u64 *queue_mask, char *str)
+{
+   int len = strlen(str);
+   int index = BITS_TO_U64((len * 4));
+   char tmp[17];
+   char *end = str + len;
+   int i, num;
+
+   if (len > MAX_QUEUE)
+   return -EINVAL;
+
+   for (i = 0; i < index; i++) {
+   num = end - str;
+   if (num >= 16) {
+   end -= 16;
+   num = 16;
+   } else {
+   end = str;
+   }
+   strncpy(tmp, end, num);
+   tmp[num] = '\0';
+
+   queue_mask[i] = strtoull(tmp, NULL, 16);
+   }
+
+   return 0;
+}
+
+#define MAX(x, y) (x > y ? x : y)
+
+static int find_max_queue_num(struct cmd_context *ctx)
+{
+   struct ethtool_channels echannels;
+
+   echannels.cmd = ETHTOOL_GCHANNELS;
+   if (send_ioctl(ctx, &echannels))
+   return -1;
+
+   return MAX(MAX(echannels.rx_count, echannels.tx_count), echannels.combined_count);
+}
+
+static int do_perqueue(struct cmd_context *ctx)
+{
+   __u64 queue_mask[MAX_QUEUE_MASK] = {0};
+   __u64 mask;
+   int i, queue_num = 0;
+
+   if (ctx->argc == 0)
+   exit_bad_args();
+
+   /* All queues will be applied if no queue_mask set */
+   if (strncmp(*ctx->argp, "queue_mask", 10)) {
+   queue_num = find_max_queue_num(ctx);
+   if (queue_num < 0) {
+   perror("Cannot get queue number");
+   return -EFAULT;
+   }
+   for (i = 0; i < queue_num / 64; i++)
+   queue_mask[i] = ~0;
+   queue_mask[i] = (1ULL << (queue_num - i * 64)) - 1;
+   } else {
+   ctx->argc--;
+   ctx->argp++;
+   if (set_queue_mask(queue_mask, *ctx->argp)) {
+   perror("Invalid queue mask");
+   return -EINVAL;
+   }
+   ctx->argc--;
+   ctx->argp++;
+
+   /* Get the masked queue number */
+   for (i = 0; i < MAX_QUEUE_MASK; i++) {
+   mask = queue_mask[i];
+

Re: [PATCH v3] net/macb: add support for resetting PHY using GPIO

2015-12-17 Thread David Miller

From: Gregory CLEMENT 
Date: Thu, 17 Dec 2015 09:39:32 +0100

> if I remembered well you do not remove patch from your branch.  So would
> you agree to take a follow-up patch on top of 5833e0526820 "net/macb:
> add support for resetting PHY using GPIO" ?

Yes.

Re: [PATCH net-next 0/2] Local checksum offload for VXLAN

2015-12-17 Thread Tom Herbert

On Thu, Dec 17, 2015 at 7:27 AM, Edward Cree  wrote:
> When the inner packet checksum is offloaded, the outer UDP checksum is easy
>  to calculate as it doesn't depend on the payload (because the inner checksum
>  cancels out everything from the inner packet except the pseudo header).
> Thus, transmit checksums for VXLAN (and in principle other encapsulations,
>  but I haven't enabled it for / tested with those) can be offloaded on any
>  device supporting NETIF_F_HW_CSUM.  Only the innermost checksum has to be
>  offloaded, the rest are filled in by the stack.
> Tested by hacking a driver to report NETIF_F_HW_CSUM, call skb_checksum_help
>  before transmitting a packet, and not actually offload anything to the hw.
> In principle it should also be possible to apply this technique when the
>  inner packet has been checksummed by software, but only if skb->csum_start
>  and skb->csum_offset have been filled in to describe the inner checksum.
>  However in this case it is easier to use skb->csum and skb->csum_start, as
>  gso_make_checksum() already does - a similar but simpler technique.  It's
>  not clear to me where else this should be done, so this is out of scope for
>  this patch series.
>
Edward, it took me a while to understand how this works, but this
really is an amazing trick! This implies that we don't need to worry
about HW support for offloading multiple checksums.

I'm not sure that we need bits in VXLAN or any other encapsulation. It
should be sufficient in udp_set_csum that if we already have
CHECKSUM_PARTIAL that can always be used to do local checksum offload.
This also should be independent of whether the device does
NETIF_F_HW_CSUM or can offload NETIF_F_IP[V6]_CSUM for encapsulated
packets.

It would be nice to have more formal documentation also. This is a
very powerful mechanism but the math behind it and requirements are
subtle.

Tom
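
One way to check the cancellation discussed above is a small user-space model. In the sketch below, everything is an assumption made for illustration: a 64-byte fake packet, an inner checksum region starting at byte 16, a checksum field 6 bytes into that region, and an arbitrary seed standing in for the inner pseudo-header sum. The outer pseudo-header is left out because it adds the same term to both computations. This is not the code from the patch series.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of len bytes (len even), seeded with init and folded. */
static uint16_t csum_fold_add(const uint8_t *p, size_t len, uint32_t init)
{
	uint32_t sum = init;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static void put16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Returns 1 when the outer checksum predicted before the inner checksum
 * is filled in equals the checksum computed over the finished packet.
 */
static int lco_matches_full(void)
{
	enum { LEN = 64, CSUM_START = 16, CSUM_OFFSET = 6 };
	const uint16_t pseudo = 0x1234;	/* arbitrary inner pseudo-header sum */
	uint8_t pkt[LEN];
	uint16_t lco, inner, full;

	for (size_t i = 0; i < LEN; i++)
		pkt[i] = (uint8_t)(i * 37 + 11);

	/* As with CHECKSUM_PARTIAL, the field initially holds the seed. */
	put16(pkt + CSUM_START + CSUM_OFFSET, pseudo);

	/* Prediction: bytes before csum_start plus the complement of the seed. */
	lco = (uint16_t)~csum_fold_add(pkt, CSUM_START, (uint16_t)~pseudo);

	/* What the device does later: finish the inner checksum. */
	inner = (uint16_t)~csum_fold_add(pkt + CSUM_START, LEN - CSUM_START, 0);
	put16(pkt + CSUM_START + CSUM_OFFSET, inner);

	/* Reference: checksum over the whole finished packet. */
	full = (uint16_t)~csum_fold_add(pkt, LEN, 0);

	return lco == full;
}
```

The two values agree because, once the inner checksum is filled in, everything from csum_start to the end of the packet sums to the complement of the seed, so the payload never has to be read to compute the outer checksum.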

> Edward Cree (2):
>   net: udp: local checksum offload for encapsulation
>   net: vxlan: enable local checksum offload on HW_CSUM devices
>
>  drivers/net/vxlan.c |  5 -
>  net/ipv4/udp.c  | 34 +-
>  2 files changed, 33 insertions(+), 6 deletions(-)
>
> --
> 2.4.3

[RFC 2/5] net/ethtool: support get coalesce per queue

2015-12-17 Thread kan . liang

From: Kan Liang 

The device driver has to provide an interface for getting per-queue
coalesce settings. The interrupt coalescing parameters of each masked
queue are copied back to user space one by one.

Signed-off-by: Kan Liang 
---
 include/linux/ethtool.h |  5 -
 net/core/ethtool.c  | 33 -
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 653dc9c..107f75f 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -201,6 +201,8 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
  * @get_module_eeprom: Get the eeprom information from the plug-in module
  * @get_eee: Get Energy-Efficient (EEE) supported and status.
  * @set_eee: Set EEE status (enable/disable) as well as LPI timers.
+ * @get_per_queue_coalesce: Get interrupt coalescing parameters per queue.
+ * Returns a negative error code or zero.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -279,7 +281,8 @@ struct ethtool_ops {
   const struct ethtool_tunable *, void *);
int (*set_tunable)(struct net_device *,
   const struct ethtool_tunable *, const void *);
-
+   int (*get_per_queue_coalesce)(struct net_device *, int,
+ struct ethtool_coalesce *);
 
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 125fb32..22ff69a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1748,6 +1748,36 @@ out:
return ret;
 }
 
+static int ethtool_get_per_queue_coalesce(struct net_device *dev,
+ void __user *useraddr,
+ struct ethtool_per_queue_op *per_queue_opt)
+{
+   u64 queue_mask;
+   int bit, i, ret;
+
+   if (!dev->ethtool_ops->get_per_queue_coalesce)
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+   for (i = 0; i < MAX_QUEUE_MASK; i++) {
+   queue_mask = per_queue_opt->queue_mask[i];
+   if (queue_mask > 0) {
+   for_each_set_bit(bit, (unsigned long *)&queue_mask, 64) {
+   struct ethtool_coalesce coalesce = { .cmd = ETHTOOL_GCOALESCE };
+
+   ret = dev->ethtool_ops->get_per_queue_coalesce(dev, bit + i * 64, &coalesce);
+   if (ret != 0)
+   return ret;
+   if (copy_to_user(useraddr, &coalesce, sizeof(coalesce)))
+   return -EFAULT;
+   useraddr += sizeof(coalesce);
+   }
+   }
+   }
+
+   return 0;
+}
+
 static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
 {
struct ethtool_per_queue_op per_queue_opt;
@@ -1756,7 +1786,8 @@ static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
return -EFAULT;
 
switch (per_queue_opt.sub_command) {
-
+   case ETHTOOL_GCOALESCE:
+   return ethtool_get_per_queue_coalesce(dev, useraddr, &per_queue_opt);
default:
return -EOPNOTSUPP;
};
-- 
1.7.11.7
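
The handler above walks the mask word by word and hands queue number `bit + i * 64` to the driver, writing one coalesce block per set bit in ascending order. The user-space sketch below shows the same slot-to-queue mapping; masked_queues() is a hypothetical helper name, not part of the series.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUE       4096
#define MAX_QUEUE_MASK  (MAX_QUEUE / 64)

/*
 * List the queue numbers selected by mask, lowest first. Entry k of the
 * result is the queue whose coalesce block sits in slot k of the data area.
 * Returns the number of selected queues; at most max entries are written.
 */
static size_t masked_queues(const uint64_t mask[MAX_QUEUE_MASK],
			    int *queues, size_t max)
{
	size_t n = 0;
	int i, bit;

	for (i = 0; i < MAX_QUEUE_MASK; i++) {
		for (bit = 0; bit < 64; bit++) {
			if (!(mask[i] & (1ULL << bit)))
				continue;
			if (n < max)
				queues[n] = i * 64 + bit;
			n++;
		}
	}
	return n;
}
```

A caller reading the reply would pair slot k of the returned data with queues[k].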


[RFC 5/5] i40e/ethtool: support coalesce setting by queue

2015-12-17 Thread kan . liang

From: Kan Liang 

This patch implements set_per_queue_coalesce for the i40e driver.
In i40e, only the rx and tx usecs have per-queue values, so changing
these two parameters affects only the specified queue. The other
interrupt coalescing parameters are shared among queues, so changing
them for one queue affects all queues.

Signed-off-by: Kan Liang 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 55 +++---
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index b41f0be..5a35fdb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1901,14 +1901,29 @@ static int i40e_get_per_queue_coalesce(struct net_device *netdev, int queue,
return __i40e_get_coalesce(netdev, ec, queue);
 }
 
-static int i40e_set_coalesce(struct net_device *netdev,
-struct ethtool_coalesce *ec)
+static void i40e_set_itr_for_queue(struct i40e_vsi *vsi, int queue, u16 vector)
 {
-   struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_q_vector *q_vector;
-   struct i40e_vsi *vsi = np->vsi;
struct i40e_pf *pf = vsi->back;
struct i40e_hw *hw = &pf->hw;
+   u16 intrl = INTRL_USEC_TO_REG(vsi->int_rate_limit);
+
+   q_vector = vsi->q_vectors[queue];
+   q_vector->rx.itr = ITR_TO_REG(vsi->rx_itr_setting);
+   wr32(hw, I40E_PFINT_ITRN(0, vector - 1), q_vector->rx.itr);
+   q_vector->tx.itr = ITR_TO_REG(vsi->tx_itr_setting);
+   wr32(hw, I40E_PFINT_ITRN(1, vector - 1), q_vector->tx.itr);
+   wr32(hw, I40E_PFINT_RATEN(vector - 1), intrl);
+   i40e_flush(hw);
+}
+
+static int __i40e_set_coalesce(struct net_device *netdev,
+  struct ethtool_coalesce *ec,
+  int queue)
+{
+   struct i40e_netdev_priv *np = netdev_priv(netdev);
+   struct i40e_vsi *vsi = np->vsi;
+   struct i40e_pf *pf = vsi->back;
u16 vector;
int i;
 
@@ -1964,21 +1979,32 @@ static int i40e_set_coalesce(struct net_device *netdev,
else
vsi->tx_itr_setting &= ~I40E_ITR_DYNAMIC;
 
-   for (i = 0; i < vsi->num_q_vectors; i++, vector++) {
-   u16 intrl = INTRL_USEC_TO_REG(vsi->int_rate_limit);
-
-   q_vector = vsi->q_vectors[i];
-   q_vector->rx.itr = ITR_TO_REG(vsi->rx_itr_setting);
-   wr32(hw, I40E_PFINT_ITRN(0, vector - 1), q_vector->rx.itr);
-   q_vector->tx.itr = ITR_TO_REG(vsi->tx_itr_setting);
-   wr32(hw, I40E_PFINT_ITRN(1, vector - 1), q_vector->tx.itr);
-   wr32(hw, I40E_PFINT_RATEN(vector - 1), intrl);
-   i40e_flush(hw);
+   if (queue < 0) {
+   for (i = 0; i < vsi->num_q_vectors; i++, vector++)
+   i40e_set_itr_for_queue(vsi, i, vector);
+   } else {
+   if (queue >= vsi->num_q_vectors) {
+   netif_info(pf, drv, netdev, "Invalid queue value, queue range is 0 - %d\n", vsi->num_q_vectors - 1);
+   return -EINVAL;
+   }
+   i40e_set_itr_for_queue(vsi, queue, vector + queue);
}
 
return 0;
 }
 
+static int i40e_set_coalesce(struct net_device *netdev,
+struct ethtool_coalesce *ec)
+{
+   return __i40e_set_coalesce(netdev, ec, -1);
+}
+
+static int i40e_set_per_queue_coalesce(struct net_device *netdev, int queue,
+  struct ethtool_coalesce *ec)
+{
+   return __i40e_set_coalesce(netdev, ec, queue);
+}
+
 /**
  * i40e_get_rss_hash_opts - Get RSS hash Input Set for each flow type
  * @pf: pointer to the physical function struct
@@ -2818,6 +2844,7 @@ static const struct ethtool_ops i40e_ethtool_ops = {
.get_priv_flags = i40e_get_priv_flags,
.set_priv_flags = i40e_set_priv_flags,
.get_per_queue_coalesce = i40e_get_per_queue_coalesce,
+   .set_per_queue_coalesce = i40e_set_per_queue_coalesce,
 };
 
 void i40e_set_ethtool_ops(struct net_device *netdev)
-- 
1.7.11.7


[RFC 1/5] net/ethtool: introduce a new ioctl for per queue setting

2015-12-17 Thread kan . liang

From: Kan Liang 

Introduce a new ioctl, ETHTOOL_PERQUEUE, for setting per-queue parameters.
The following patches will enable some SUB_COMMANDs for per-queue
setting.

Signed-off-by: Kan Liang 
---
 include/uapi/linux/ethtool.h | 18 ++
 net/core/ethtool.c   | 17 +
 2 files changed, 35 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index cd16291..05bc92a 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1144,6 +1144,22 @@ enum ethtool_sfeatures_retval_bits {
 #define ETHTOOL_F_WISH  (1 << ETHTOOL_F_WISH__BIT)
 #define ETHTOOL_F_COMPAT        (1 << ETHTOOL_F_COMPAT__BIT)
 
+#define MAX_QUEUE  4096
+#define MAX_QUEUE_MASK (MAX_QUEUE / 64)
+
+/**
+ * struct ethtool_per_queue_op - apply sub command to the queues in mask.
+ * @cmd: ETHTOOL_PERQUEUE
+ * @queue_mask: Mask the queues which sub command apply to
+ * @sub_command: the sub command
+ * @data: parameters of the command
+ */
+struct ethtool_per_queue_op {
+   __u32   cmd;
+   __u64   queue_mask[MAX_QUEUE_MASK];
+   __u32   sub_command;
+   char    data[];
+};
 
 /* CMDs currently supported */
 #define ETHTOOL_GSET   0x0001 /* Get settings. */
@@ -1226,6 +1242,8 @@ enum ethtool_sfeatures_retval_bits {
 #define ETHTOOL_GTUNABLE   0x0048 /* Get tunable configuration */
 #define ETHTOOL_STUNABLE   0x0049 /* Set tunable configuration */
 
+#define ETHTOOL_PERQUEUE   0x004a /* Set per queue options */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
 #define SPARC_ETH_SSET ETHTOOL_SSET
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 29edf74..125fb32 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1748,6 +1748,20 @@ out:
return ret;
 }
 
+static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
+{
+   struct ethtool_per_queue_op per_queue_opt;
+
+   if (copy_from_user(&per_queue_opt, useraddr, sizeof(per_queue_opt)))
+   return -EFAULT;
+
+   switch (per_queue_opt.sub_command) {
+
+   default:
+   return -EOPNOTSUPP;
+   };
+}
+
 /* The main entry point in this file.  Called from net/core/dev_ioctl.c */
 
 int dev_ethtool(struct net *net, struct ifreq *ifr)
@@ -1991,6 +2005,9 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
case ETHTOOL_STUNABLE:
rc = ethtool_set_tunable(dev, useraddr);
break;
+   case ETHTOOL_PERQUEUE:
+   rc = ethtool_set_per_queue(dev, useraddr);
+   break;
default:
rc = -EOPNOTSUPP;
}
-- 
1.7.11.7
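
The header introduced above is followed by per-queue payload, and the get/set handlers elsewhere in this series step past it with `sizeof(*per_queue_opt)`. A minimal sketch of how a caller might size such a request is shown below. struct per_queue_op is a local stand-in that mirrors the proposed layout, and count_queues() and alloc_per_queue_op() are hypothetical helpers, not part of the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_QUEUE       4096
#define MAX_QUEUE_MASK  (MAX_QUEUE / 64)

/* Local stand-in mirroring the layout proposed in the patch. */
struct per_queue_op {
	uint32_t cmd;
	uint64_t queue_mask[MAX_QUEUE_MASK];
	uint32_t sub_command;
	char     data[];
};

/* Count the queues selected by the mask. */
static size_t count_queues(const uint64_t mask[MAX_QUEUE_MASK])
{
	size_t n = 0;

	for (int i = 0; i < MAX_QUEUE_MASK; i++)
		for (uint64_t w = mask[i]; w; w &= w - 1)	/* clear lowest set bit */
			n++;
	return n;
}

/*
 * Allocate a zeroed request with room for one entry_size block per
 * selected queue after the header. The caller frees the result.
 */
static struct per_queue_op *alloc_per_queue_op(const uint64_t mask[MAX_QUEUE_MASK],
					       size_t entry_size)
{
	struct per_queue_op *op;

	op = calloc(1, sizeof(*op) + count_queues(mask) * entry_size);
	if (!op)
		return NULL;
	memcpy(op->queue_mask, mask, sizeof(op->queue_mask));
	return op;
}
```

Sizing by the number of set bits rather than by MAX_QUEUE keeps the buffer proportional to the queues actually addressed.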


[RFC 4/5] i40e/ethtool: support coalesce getting by queue

2015-12-17 Thread kan . liang

From: Kan Liang 

This patch implements get_per_queue_coalesce for the i40e driver.
In i40e, only the rx and tx usecs have per-queue values, so only these
two parameters are read from the queue's registers. The other interrupt
coalescing parameters are shared among queues, so the values stored in
the vsi are returned.

Signed-off-by: Kan Liang 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 3f385ff..b41f0be 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1843,11 +1843,16 @@ static int i40e_set_phys_id(struct net_device *netdev,
  * 125us (8000 interrupts per second) == ITR(62)
  */
 
-static int i40e_get_coalesce(struct net_device *netdev,
-struct ethtool_coalesce *ec)
+static int __i40e_get_coalesce(struct net_device *netdev,
+  struct ethtool_coalesce *ec,
+  int queue)
 {
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
+   struct i40e_pf *pf = vsi->back;
+   struct i40e_hw *hw = &pf->hw;
+   struct i40e_q_vector *q_vector;
+   u16 vector;
 
ec->tx_max_coalesced_frames_irq = vsi->work_limit;
ec->rx_max_coalesced_frames_irq = vsi->work_limit;
@@ -1869,9 +1874,33 @@ static int i40e_get_coalesce(struct net_device *netdev,
ec->rx_coalesce_usecs_high = vsi->int_rate_limit;
ec->tx_coalesce_usecs_high = vsi->int_rate_limit;
 
+   if (queue > 0) {
+   if (queue >= vsi->num_q_vectors) {
+   netif_info(pf, drv, netdev, "Invalid queue number\n");
+   return -EINVAL;
+   }
+
+   q_vector = vsi->q_vectors[queue];
+   vector = vsi->base_vector + queue;
+
+   ec->rx_coalesce_usecs = ITR_REG_TO_USEC(rd32(hw, I40E_PFINT_ITRN(0, vector - 1)));
+   ec->tx_coalesce_usecs = ITR_REG_TO_USEC(rd32(hw, I40E_PFINT_ITRN(1, vector - 1)));
+   }
return 0;
 }
 
+static int i40e_get_coalesce(struct net_device *netdev,
+struct ethtool_coalesce *ec)
+{
+   return __i40e_get_coalesce(netdev, ec, -1);
+}
+
+static int i40e_get_per_queue_coalesce(struct net_device *netdev, int queue,
+  struct ethtool_coalesce *ec)
+{
+   return __i40e_get_coalesce(netdev, ec, queue);
+}
+
 static int i40e_set_coalesce(struct net_device *netdev,
 struct ethtool_coalesce *ec)
 {
@@ -2788,6 +2817,7 @@ static const struct ethtool_ops i40e_ethtool_ops = {
.get_ts_info= i40e_get_ts_info,
.get_priv_flags = i40e_get_priv_flags,
.set_priv_flags = i40e_set_priv_flags,
+   .get_per_queue_coalesce = i40e_get_per_queue_coalesce,
 };
 
 void i40e_set_ethtool_ops(struct net_device *netdev)
-- 
1.7.11.7


[RFC 3/5] net/ethtool: support set coalesce per queue

2015-12-17 Thread kan . liang

From: Kan Liang 

The device driver has to provide an interface for setting per-queue
coalesce settings. The wanted coalesce settings for each masked queue
are stored in "data" and copied from user space.

Signed-off-by: Kan Liang 
---
 include/linux/ethtool.h |  4 
 net/core/ethtool.c  | 33 +
 2 files changed, 37 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 107f75f..b3bbbcb 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -203,6 +203,8 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
  * @set_eee: Set EEE status (enable/disable) as well as LPI timers.
  * @get_per_queue_coalesce: Get interrupt coalescing parameters per queue.
  * Returns a negative error code or zero.
+ * @set_per_queue_coalesce: Set interrupt coalescing parameters per queue.
+ * Returns a negative error code or zero.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -283,6 +285,8 @@ struct ethtool_ops {
   const struct ethtool_tunable *, const void *);
int (*get_per_queue_coalesce)(struct net_device *, int,
  struct ethtool_coalesce *);
+   int (*set_per_queue_coalesce)(struct net_device *, int,
+ struct ethtool_coalesce *);
 
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 22ff69a..c8c5726 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1778,6 +1778,37 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
return 0;
 }
 
+static int ethtool_set_per_queue_coalesce(struct net_device *dev,
+ void __user *useraddr,
+ struct ethtool_per_queue_op *per_queue_opt)
+{
+   u64 queue_mask;
+   int bit, i, ret;
+
+   if (!dev->ethtool_ops->set_per_queue_coalesce)
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+   for (i = 0; i < MAX_QUEUE_MASK; i++) {
+   queue_mask = per_queue_opt->queue_mask[i];
+   if (queue_mask > 0) {
+   for_each_set_bit(bit, (unsigned long *)&queue_mask, 64) {
+   struct ethtool_coalesce coalesce;
+
+   if (copy_from_user(&coalesce, useraddr, sizeof(coalesce)))
+   return -EFAULT;
+
+   ret = dev->ethtool_ops->set_per_queue_coalesce(dev, bit + i * 64, &coalesce);
+   if (ret != 0)
+   return ret;
+   useraddr += sizeof(coalesce);
+   }
+   }
+   }
+
+   return 0;
+}
+
 static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
 {
struct ethtool_per_queue_op per_queue_opt;
@@ -1788,6 +1819,8 @@ static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
switch (per_queue_opt.sub_command) {
case ETHTOOL_GCOALESCE:
return ethtool_get_per_queue_coalesce(dev, useraddr, &per_queue_opt);
+   case ETHTOOL_SCOALESCE:
+   return ethtool_set_per_queue_coalesce(dev, useraddr, &per_queue_opt);
default:
return -EOPNOTSUPP;
};
-- 
1.7.11.7


[RFC 5/5] ethtool: support per queue sub command --coalesce

2015-12-17 Thread kan . liang

From: Kan Liang 

This patch sets coalesce per queue in a way similar to do_scoalesce.
It reads the current settings, changes them, and writes them back to the
kernel for each masked queue.

Example:

 $ sudo ./ethtool --set-perqueue-command eth5 queue_mask 0x1 --coalesce
 rx-usecs 10 tx-usecs 5
 $ sudo ./ethtool --set-perqueue-command eth5 queue_mask 0x1
 --show-coalesce

 Queue: 0
 Adaptive RX: on  TX: on
 stats-block-usecs: 0
 sample-interval: 0
 pkt-rate-low: 0
 pkt-rate-high: 0

 rx-usecs: 10
 rx-frames: 0
 rx-usecs-irq: 0
 rx-frames-irq: 256

 tx-usecs: 5
 tx-frames: 0
 tx-usecs-irq: 0
 tx-frames-irq: 256

 rx-usecs-low: 0
 rx-frame-low: 0
 tx-usecs-low: 0
 tx-frame-low: 0

 rx-usecs-high: 0
 rx-frame-high: 0
 tx-usecs-high: 0
 tx-frame-high: 0

Signed-off-by: Kan Liang 
---
 ethtool.c | 58 +-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/ethtool.c b/ethtool.c
index e7becc9..43eaa86 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -4172,7 +4172,7 @@ static const struct option {
  " [ tx-lpi on|off ]\n"
  " [ tx-timer %d ]\n"},
{ "--set-perqueue-command", 1, do_perqueue, "Set per queue command. "
- "The supported sub commands include --show-coalesce",
+ "The supported sub commands include --show-coalesce, --coalesce",
  " [queue_mask %x] SUB_COMMAND\n"},
{ "-h|--help", 0, show_usage, "Show this help" },
{ "--version", 0, do_version, "Show version number" },
@@ -4287,6 +4287,52 @@ get_per_queue_coalesce(struct cmd_context *ctx,
return per_queue_opt;
 }
 
+static void __set_per_queue_coalesce(int queue)
+{
+   int changed = 0;
+
+   do_generic_set(cmdline_coalesce, ARRAY_SIZE(cmdline_coalesce),
+  &changed);
+
+   if (!changed)
+   fprintf(stderr, "Queue %d, no coalesce parameters changed\n", queue);
+}
+
+static void set_per_queue_coalesce(struct cmd_context *ctx,
+  struct ethtool_per_queue_op *per_queue_opt)
+{
+   __u64 *queue_mask = per_queue_opt->queue_mask;
+   char *addr = (char *)per_queue_opt + sizeof(*per_queue_opt);
+   int gcoalesce_changed = 0;
+   int i;
+
+   parse_generic_cmdline(ctx, &gcoalesce_changed,
+ cmdline_coalesce, ARRAY_SIZE(cmdline_coalesce));
+
+   for (i = 0; i < MAX_QUEUE_MASK; i++) {
+   int queue = i * 64;
+   __u64 mask = queue_mask[i];
+
+   while (mask > 0) {
+   if (mask & 0x1) {
+   memcpy(&s_ecoal, addr, sizeof(struct ethtool_coalesce));
+   __set_per_queue_coalesce(queue);
+   memcpy(addr, &s_ecoal, sizeof(struct ethtool_coalesce));
+   addr += sizeof(struct ethtool_coalesce);
+   }
+   mask = mask >> 1;
+   queue++;
+   }
+   }
+
+   per_queue_opt->cmd = ETHTOOL_PERQUEUE;
+   per_queue_opt->sub_command = ETHTOOL_SCOALESCE;
+
+   if (send_ioctl(ctx, per_queue_opt))
+   perror("Cannot set device per queue parameters");
+
+}
+
 static int do_perqueue(struct cmd_context *ctx)
 {
struct ethtool_per_queue_op *per_queue_opt;
@@ -4340,6 +4386,16 @@ static int do_perqueue(struct cmd_context *ctx)
}
dump_per_queue_coalesce(per_queue_opt, queue_mask);
free(per_queue_opt);
+   } else if (strstr(args[i].opts, "--coalesce") != NULL) {
+   ctx->argc--;
+   ctx->argp++;
+   per_queue_opt = get_per_queue_coalesce(ctx, queue_mask, queue_num);
+   if (per_queue_opt == NULL) {
+   perror("Cannot get device per queue parameters");
+   return -EFAULT;
+   }
+   set_per_queue_coalesce(ctx, per_queue_opt);
+   free(per_queue_opt);
} else {
perror("The subcommand is not supported yet");
return -EOPNOTSUPP;
-- 
1.7.11.7
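
The read, change, write-back cycle described in this patch can be sketched in isolation. The example below is only a model: struct coal is a two-field stand-in for struct ethtool_coalesce, struct coal_wanted follows the "-1 means not requested" convention of the coal_*_wanted variables, and apply_one() and apply_wanted() are hypothetical helpers rather than ethtool functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field stand-in for struct ethtool_coalesce. */
struct coal {
	uint32_t rx_usecs;
	uint32_t tx_usecs;
};

/* Requested values; -1 means "not given on the command line". */
struct coal_wanted {
	int32_t rx_usecs;
	int32_t tx_usecs;
};

/* Apply one wanted value; return 1 if the current value changed. */
static int apply_one(uint32_t *cur, int32_t wanted)
{
	if (wanted < 0 || *cur == (uint32_t)wanted)
		return 0;
	*cur = (uint32_t)wanted;
	return 1;
}

/*
 * Update n per-queue entries in place, as read back from the kernel,
 * and return how many entries actually changed.
 */
static size_t apply_wanted(struct coal *entries, size_t n,
			   const struct coal_wanted *w)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		int c = apply_one(&entries[i].rx_usecs, w->rx_usecs);

		c |= apply_one(&entries[i].tx_usecs, w->tx_usecs);
		changed += (size_t)c;
	}
	return changed;
}
```

Fields left at -1 keep whatever the kernel reported for that queue, which is why the current settings are fetched before anything is written back.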


[RFC 1/5] ethtool: move option parsing related codes into function

2015-12-17 Thread kan . liang

From: Kan Liang 

Move option parsing code into find_option function.
No behavior changes.

Signed-off-by: Kan Liang 
---
 ethtool.c | 49 +++--
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 92c40b8..04c5015 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -4173,6 +4173,29 @@ static int show_usage(struct cmd_context *ctx)
return 0;
 }
 
+static int find_option(int argc, char **argp)
+{
+   const char *opt;
+   size_t len;
+   int k;
+
+   for (k = 0; args[k].opts; k++) {
+   opt = args[k].opts;
+   for (;;) {
+   len = strcspn(opt, "|");
+   if (strncmp(*argp, opt, len) == 0 &&
+   (*argp)[len] == 0)
+   return k;
+
+   if (opt[len] == 0)
+   break;
+   opt += len + 1;
+   }
+   }
+
+   return -1;
+}
+
 int main(int argc, char **argp)
 {
int (*func)(struct cmd_context *);
@@ -4190,24 +4213,14 @@ int main(int argc, char **argp)
 */
if (argc == 0)
exit_bad_args();
-   for (k = 0; args[k].opts; k++) {
-   const char *opt;
-   size_t len;
-   opt = args[k].opts;
-   for (;;) {
-   len = strcspn(opt, "|");
-   if (strncmp(*argp, opt, len) == 0 &&
-   (*argp)[len] == 0) {
-   argp++;
-   argc--;
-   func = args[k].func;
-   want_device = args[k].want_device;
-   goto opt_found;
-   }
-   if (opt[len] == 0)
-   break;
-   opt += len + 1;
-   }
+
+   k = find_option(argc, argp);
+   if (k >= 0) {
+   argp++;
+   argc--;
+   func = args[k].func;
+   want_device = args[k].want_device;
+   goto opt_found;
}
if ((*argp)[0] == '-')
exit_bad_args();
-- 
1.7.11.7
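
The matching loop that this patch moves into find_option() can be tried on its own. The sketch below applies the same strcspn()/strncmp() logic to a plain NULL-terminated array of option strings; match_option() and its array-based signature are assumptions for the example, whereas the patch itself walks the global args[] table.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry in opts (NULL-terminated) that lists
 * arg among its '|'-separated alternatives, or -1 if there is none.
 */
static int match_option(const char *const *opts, const char *arg)
{
	for (int k = 0; opts[k]; k++) {
		const char *opt = opts[k];

		for (;;) {
			/* Length of the current alternative, up to '|' or NUL. */
			size_t len = strcspn(opt, "|");

			/* Require a full match, not just a common prefix. */
			if (strncmp(arg, opt, len) == 0 && arg[len] == '\0')
				return k;
			if (opt[len] == '\0')
				break;
			opt += len + 1;	/* skip past the '|' */
		}
	}
	return -1;
}
```

Index 0 is a valid result here, so a caller distinguishes "found" from "not found" by testing for a value `>= 0`.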


Re: pull-request: mac80211 2015-12-15

2015-12-17 Thread David Miller

From: Johannes Berg 
Date: Thu, 17 Dec 2015 13:44:32 +0100

> On Wed, 2015-12-16 at 18:34 -0500, David Miller wrote:
>> 
>> Something about your text encoding kept this from ending up
>> in patchwork for some reason.
>> 
> 
> Hm. I don't see anything special with this, seems to just be plain text
> 8bit transfer encoding.
> 
> Do you want me to watch out for things getting into patchwork in the
> future?

Look in the quoted text of mine, see that underline thing after
the ">>"?  Where are those coming from?  Those were all over the
place in your pull request and tripped up patchwork's parser I
guess.

Re: [PATCH net 2/2] udp: restrict offloads to one namespace

2015-12-17 Thread Hannes Frederic Sowa

On 17.12.2015 18:32, Tom Herbert wrote:
> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
>  wrote:
>> With user namespaces a normal user can start a new network namespace
>> with all privileges and thus add new offloads, letting the other stack
>> interpret this garbage. Because the user namespace can also add
>> arbitrary ip addresses to its interface, solely matching those is not
>> enough.
>>
>> Tom any further comments?
>>
> I still don't think this addresses the core problem. If we're just
> worried about offloads being added in a user namespace that conflict
> with those in the root space, it might be just as easy to disallow
> setting offloads except in default namespace.

I am fine with that solution, too.

> [...]
>
> To address this in the host stack the solution is pretty
> straightforward, we need to decide that the packet is going to be
> received before applying any offloads. Essentially we want to do an
> early_demux _really_ early. If we demux and get UDP socket for
> instance, then the protocol specific GRO function can be retrieved
> from the socket. So this will work with single listener port like
> encaps do today,  and also if encapsulation is being used over a
> connected socket. This also works if we want to support a user defined
> GRO function like I mentioned we might want to do for QUIC etc.

An approximation can be done, but I don't think it is feasible to
implement this kind of check across namespace borders, ip rules and
netfilter rulesets, which could all change the outcome of the process.

Bye,
Hannes



Re: [PATCH net 2/2] udp: restrict offloads to one namespace

2015-12-17 Thread Tom Herbert

On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
 wrote:
> Hi all,
>
> On 17.12.2015 01:04, David Miller wrote:
>> From: Hannes Frederic Sowa 
>> Date: Tue, 15 Dec 2015 21:01:54 +0100
>>
>>> udp tunnel offloads tend to aggregate datagrams based on inner
>>> headers. gro engine gets notified by tunnel implementations about
>>> possible offloads. The match is solely based on the port number.
>>>
>>> Imagine a tunnel bound to port 53, the offloading will look into all
>>> DNS packets and tries to aggregate them based on the inner data found
>>> within. This could lead to data corruption and malformed DNS packets.
>>>
>>> While this patch minimizes the problem and helps an administrator to find
>>> the issue by querying ip tunnel/fou, a better way would be to match on
>>> the specific destination ip address so if a user space socket is bound
>>> to the same address it will conflict.
>>>
>>> Cc: Tom Herbert 
>>> Cc: Eric Dumazet 
>>> Signed-off-by: Hannes Frederic Sowa 
>>
>> It looks like this issue is still being hashed out so I've marked this
>> patch as deferred for now.
>
>
> I think we need this patch. We later can decide to add more
> classification attributes, like dst ip down to gro, but the netns marks
> are important.
>
> With user namespaces a normal user can start a new network namespace
> with all privileges and thus add new offloads, letting the other stack
> interpret this garbage. Because the user namespace can also add
> arbitrary ip addresses to its interface, solely matching those is not
> enough.
>
> Tom any further comments?
>
I still don't think this addresses the core problem. If we're just
worried about offloads being added in a user namespace that conflict
with those in the root space, it might be just as easy to disallow
setting offloads except in default namespace.

The core problem is that UDP port numbers don't have global meaning,
and don't really have any meaning to anyone except the sender and
receiver. This is different from IP protocol numbers, where IP
protocol number 6 is always interpreted as TCP anywhere in the
network. From RFC7605:

"It is important to recognize that any interpretation of port numbers
-- except at the endpoints -- may be incorrect, because port numbers
are meaningful only at the endpoints."

In the case of device offloads the device is not an endpoint so
interpretation of port numbers may be incorrect. This is also true in
GRO since it happens before it has been determined that packet is
being received at the local endpoint. The possibility of
misinterpretation based on destination port in the stack occurs when
we process packets that are later forwarded as opposed to received
which can happen with netns or even with just forwarding enabled. If
the misinterpretation causes corruption or mis-delivery the fault lies
in the *implementation* not the protocol!

To address this in the host stack the solution is pretty
straightforward, we need to decide that the packet is going to be
received before applying any offloads. Essentially we want to do an
early_demux _really_ early. If we demux and get UDP socket for
instance, then the protocol specific GRO function can be retrieved
from the socket. So this will work with single listener port like
encaps do today,  and also if encapsulation is being used over a
connected socket. This also works if we want to support a user defined
GRO function like I mentioned we might want to do for QUIC etc.

For hardware offloads the problem is harder to solve to be completely
correct (or at least correct approaching 100% probability).
Possibilities are:
1) Use protocol agnostic offloads since they don't care about UDP or
port numbers (we've already discussed this!)
2) Use magic numbers in the protocol
(https://www.ietf.org/id/draft-herbert-udp-magic-numbers-01.txt).
3) Use ntuple filters to identify the packets to be subject to offload
based on more than just the port number. This really should have been
the interface for VXLAN offload from the beginning anyway!

Tom

> Thanks,
> Hannes
>
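
As a rough illustration of the namespace restriction under discussion, the sketch below keys an offload lookup on both the UDP port and a namespace identifier, so that a registration made in one namespace is never applied to traffic in another. It is a toy model and not the kernel's data structure: demo_offload, demo_table, demo_find_offload and the integer netns_id (standing in for a struct net pointer) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy offload registration: a port plus the namespace that registered it. */
struct demo_offload {
	uint16_t port;
	int netns_id;	/* stand-in for a struct net pointer */
};

/* Example registrations: port 4789 in namespace 0, port 53 in namespace 1. */
static const struct demo_offload demo_table[] = {
	{ 4789, 0 },
	{ 53, 1 },
};

/* Return the matching registration, or NULL if port or namespace differ. */
static const struct demo_offload *demo_find_offload(uint16_t port, int netns_id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (demo_table[i].port == port &&
		    demo_table[i].netns_id == netns_id)
			return &demo_table[i];
	}
	return NULL;
}
```

With this table, DNS traffic on port 53 in namespace 0 finds no offload even though namespace 1 registered that port. It does not address the broader point made above that port numbers are only meaningful at the endpoints.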

Re: [PATCH net 2/2] udp: restrict offloads to one namespace

2015-12-17 Thread Tom Herbert

On Thu, Dec 17, 2015 at 9:40 AM, Hannes Frederic Sowa
 wrote:
> On 17.12.2015 18:32, Tom Herbert wrote:
>> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
>>  wrote:
>>> With user namespaces a normal user can start a new network namespace
>>> with all privileges and thus add new offloads, letting the other stack
>>> interpret this garbage. Because the user namespace can also add
>>> arbitrary ip addresses to its interface, solely matching those is not
>>> enough.
>>>
>>> Tom any further comments?
>>>
>> I still don't think this addresses the core problem. If we're just
>> worried about offloads being added in a user namespace that conflict
>> with those in the root space, it might be just as easy to disallow
>> setting offloads except in default namespace.
>
> I am fine with that solution, too.
>
>> [...]
>>
>> To address this in the host stack the solution is pretty
>> straightforward, we need to decide that the packet is going to be
>> received before applying any offloads. Essentially we want to do an
>> early_demux _really_ early. If we demux and get UDP socket for
>> instance, then the protocol specific GRO function can be retrieved
>> from the socket. So this will work with single listener port like
>> encaps do today,  and also if encapsulation is being used over a
>> connected socket. This also works if we want to support a user defined
>> GRO function like I mentioned we might want to do for QUIC etc.
>
> An approximation can be done, but I don't think it is feasible to
> implement this kind of check across namespace borders, ip rules and
> netfilter rulesets, which could all change the outcome of the process.
>
For receive offloads we don't need to worry about checking other namespaces.

> Bye,
> Hannes
>
>

Re: [PATCH/RFC net-next] ravb: Add dma queue interrupt support

2015-12-17 Thread Sergei Shtylyov


Hello.

On 12/17/2015 07:29 PM, Yoshihiro Kaneko wrote:


>>> From: Kazuya Mizuguchi
>>>
>>> This patch supports the following interrupts.
>>>
>>> - One interrupt for multiple (descriptor, error, management)
>>> - One interrupt for emac
>>> - Four interrupts for dma queue (best effort rx/tx, network control rx/tx)
>>
>> You don't say why the current 2-interrupt scheme (implemented by Simon's
>> patch) isn't enough...
>>
>>> Signed-off-by: Kazuya Mizuguchi
>>> Signed-off-by: Yoshihiro Kaneko

[...]

>>> diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
>>> index 9fbe92a..eada5a1 100644
>>> --- a/drivers/net/ethernet/renesas/ravb.h
>>> +++ b/drivers/net/ethernet/renesas/ravb.h
>>> @@ -157,6 +157,7 @@ enum ravb_reg {

[...]

>>> +#define RAVB_RIx2_ALL  0x8003
>>
>> No prefix. And there's bit 31 in this register, according to my gen3
>> manual. So, your _ALL isn't really "all bits". I'd rather call it RIx2_QFx.
>> Or even RIE2_QFS and RID2_QFD.
>
> I think that bit 31 is included in the value 0x8003. Or I'm
> missing something?

   Sorry, I misread the code.

>>> +
>>> +/* TIx */
>>
>> TIE/TID.
>>
>>> +#define RAVB_TIx_ALL   0x000f
>>
>> No prefix. And there's bit 31 in this register, according to my gen3
>> manual.

   Oops, no bit 31 in these regs.

>> So, your _ALL isn't really "all bits".
>
> I think the correct value is 0x000f0f0f.

   Indeed, please fix.

[...]

>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 120cc25..753b67d 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c

[...]

>>> @@ -654,7 +679,7 @@ static int ravb_stop_dma(struct net_device *ndev)
>>>    }
>>>
>>>    /* E-MAC interrupt handler */
>>> -static void ravb_emac_interrupt(struct net_device *ndev)
>>> +static void _ravb_emac_interrupt(struct net_device *ndev)
>>
>> ravb_emac_interrupt_[un]locked() perhaps? Not sure which is more
>> correct... :-)
>
> How about ravb_process_emac_interrupt() ?

   I've made up my mind -- I'd prefer ravb_emac_interrupt_unlocked().

[...]

>>> +static irqreturn_t ravb_multi_interrupt(int irq, void *dev_id)
>>> +{
>>> +   struct net_device *ndev = dev_id;
>>> +   struct ravb_private *priv = netdev_priv(ndev);
>>> +   irqreturn_t result = IRQ_NONE;
>>> +   u32 iss;
>>> +
>>> +   spin_lock(&priv->lock);
>>> +   /* Get interrupt status */
>>> +   iss = ravb_read(ndev, ISS);
>>> +
>>>  /* Error status summary */
>>>  if (iss & ISS_ES) {
>>>  ravb_error_interrupt(ndev);
>>>  result = IRQ_HANDLED;
>>>  }
>>>
>>> +   /* Management */
>>
>> Really? I thought that's gPTP Interrupt...
>
> gPTP seems to be a part of Management related interrupts.

   ISS.CGIM is still described as gPTP interrupt mirror in my gen3 manual.

>>>  if (iss & ISS_CGIS)
>>>  result = ravb_ptp_interrupt(ndev);
>>>
>>> @@ -776,6 +843,55 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id)

[...]

> Thanks,
> kaneko


MBR, Sergei


Re: pull-request: mac80211 2015-12-15

2015-12-17 Thread David Rivshin (Allworx)

On Thu, 17 Dec 2015 12:04:48 -0500 (EST)
David Miller  wrote:

> From: Johannes Berg 
> Date: Thu, 17 Dec 2015 13:44:32 +0100
> 
> > On Wed, 2015-12-16 at 18:34 -0500, David Miller wrote:
> >> 
> >> Something about your text encoding kept this from ending up
> >> in patchwork for some reason.
> >> 
> > 
> > Hm. I don't see anything special with this, seems to just be plain
> > text 8bit transfer encoding.
> > 
> > Do you want me to watch out for things getting into patchwork in the
> > future?
> 
> Look in the quoted text of mine, see that underline thing after
> the ">>"?  Where are those coming from?  Those were all over the
> place in your pull request and tripped up patchwork's parser I
> guess.

I was curious and took a look. I suspect what you're seeing are the UTF-8 
<0xC2 0xA0> sequence, which translates to codepoint 0x00A0 "no-break space"
(same as in latin1). They seem to have been used in place of regular spaces 
for the purpose of indenting. The Content-Type charset in Johannes' emails 
is "UTF-8", so I think that's legal. Although I have no idea how Patchwork 
reacts to it. 
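
A quick way to check a message body for the byte sequence described above is to scan for 0xC2 followed by 0xA0. This is a small sketch, not anything Patchwork does; count_nbsp_utf8 is a made-up helper name, and the scan assumes the input is valid UTF-8, where 0xC2 can only appear as a lead byte.

```c
#include <stddef.h>

/* Count UTF-8 encoded U+00A0 NO-BREAK SPACE (bytes 0xC2 0xA0) in a buffer. */
static size_t count_nbsp_utf8(const unsigned char *buf, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i + 1 < len; i++) {
		if (buf[i] == 0xC2 && buf[i + 1] == 0xA0) {
			n++;
			i++;	/* skip the continuation byte just matched */
		}
	}
	return n;
}
```

Running this over the quoted lines of the pull request, a non-zero count on lines that only look indented would point at the no-break spaces rather than at the transfer encoding.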


Re: [PATCH net] sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close

2015-12-17 Thread Vlad Yasevich

On 12/17/2015 09:30 AM, Xin Long wrote:
> In sctp_close, sctp_make_abort_user may return NULL because of memory
> allocation failure. If this happens, it will bypass any state change
> and never free the assoc. The assoc has no chance to be freed and it
> will be kept in memory with the state it had even after the socket is
> closed by sctp_close().
> 
> So if sctp_make_abort_user fails to allocate memory, we should just
> free the asoc, as there isn't much else that we can do.
> 
> Signed-off-by: Xin Long 
> Acked-by: Marcelo Ricardo Leitner 
> ---
>  net/sctp/socket.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 9b6cc6d..267b8f8 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1513,8 +1513,12 @@ static void sctp_close(struct sock *sk, long timeout)
>   struct sctp_chunk *chunk;
>  
>   chunk = sctp_make_abort_user(asoc, NULL, 0);
> - if (chunk)
> + if (chunk) {
>   sctp_primitive_ABORT(net, asoc, chunk);
> + } else {
> + sctp_unhash_established(asoc);
> + sctp_association_free(asoc);
> + }

I don't think you can do that for an association that has not been closed.

I think a cleaner approach might be to update abort primitive handlers
to handle a NULL chunk value and unconditionally call the primitive.

This guarantees that any timers or waitqueues that might be active are
stopped correctly.

-vlad


>   } else
>   sctp_primitive_SHUTDOWN(net, asoc, NULL);
>   }
> 


Re: [PATCH net] sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close

2015-12-17 Thread Marcelo Ricardo Leitner


On 17-12-2015 16:29, Vlad Yasevich wrote:

> On 12/17/2015 09:30 AM, Xin Long wrote:
>> In sctp_close, sctp_make_abort_user may return NULL because of memory
>> allocation failure. If this happens, it will bypass any state change
>> and never free the assoc. The assoc has no chance to be freed and it
>> will be kept in memory with the state it had even after the socket is
>> closed by sctp_close().
>>
>> So if sctp_make_abort_user fails to allocate memory, we should just
>> free the asoc, as there isn't much else that we can do.
>>
>> Signed-off-by: Xin Long
>> Acked-by: Marcelo Ricardo Leitner
>> ---
>>  net/sctp/socket.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 9b6cc6d..267b8f8 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -1513,8 +1513,12 @@ static void sctp_close(struct sock *sk, long timeout)
>>  struct sctp_chunk *chunk;
>>
>>  chunk = sctp_make_abort_user(asoc, NULL, 0);
>> -   if (chunk)
>> +   if (chunk) {
>>  sctp_primitive_ABORT(net, asoc, chunk);
>> +   } else {
>> +   sctp_unhash_established(asoc);
>> +   sctp_association_free(asoc);
>> +   }
>
> I don't think you can do that for an association that has not been closed.
>
> I think a cleaner approach might be to update abort primitive handlers
> to handle a NULL chunk value and unconditionally call the primitive.
>
> This guarantees that any timers or waitqueues that might be active are
> stopped correctly.


sctp_association_free() is what does that job, even on that path.
Everything between the primitive call and the call to
sctp_association_free() is just status changes and packet xmit, which
this patch skips when we are under memory pressure. Packet xmit and
ULP events are likely to fail then anyway.


sctp_sf_do_9_1_prm_abort() -> SCTP_CMD_ASSOC_FAILED ->
  sctp_cmd_assoc_failed -> ULP events, send abort, and SCTP_CMD_DELETE_TCB ->
  sctp_cmd_delete_tcb ->
    sctp_unhash_established(asoc);
    sctp_association_free(asoc);
and returns.

There is a check on sctp_cmd_delete_tcb() that avoids calling that on 
temp assocs on listening sockets, but that condition is false due to the 
check on sk_shutdown so it will call those two functions anyway.


  Marcelo


Re: [PATCHv3 net-next] ipv6: allow routes to be configured with expire values

2015-12-17 Thread David Miller

From: Xin Long 
Date: Wed, 16 Dec 2015 17:50:11 +0800

> Add the support for adding expire value to routes,  requested by
> Tom Gundersen  for systemd-networkd, and NetworkManager
> wants it too.
> 
> implement it by adding the new RTNETLINK attribute RTA_EXPIRES.
> 
> Signed-off-by: Xin Long 

Applied, thanks.

Re: [PATCHv3 net-next] ipv6: allow routes to be configured with expire values

2015-12-17 Thread Hannes Frederic Sowa

On 17.12.2015 21:23, Dan Williams wrote:
> On Thu, 2015-12-17 at 15:08 -0500, David Miller wrote:
>> From: Dan Williams 
>> Date: Wed, 16 Dec 2015 11:03:52 -0600
>>
>>> On Wed, 2015-12-16 at 17:50 +0800, Xin Long wrote:
>>>> Add the support for adding expire value to routes,  requested by
>>>> Tom Gundersen  for systemd-networkd, and NetworkManager
>>>> wants it too.
>>>>
>>>> implement it by adding the new RTNETLINK attribute RTA_EXPIRES.
>>>
>>> Could you also add bits to send RTA_EXPIRES back to userspace in the
>>> route dump in rt6_fill_node(), so that userspace can figure out when
>>> RTA_EXPIRES is supported or not?
>>>
>>> (obviously having it there isn't foolproof as if there are no routes on
>>> the system yet userspace can't figure out support, but it's better than
>>> nothing...)
>>
>> That brings up an interesting issue, and I do not agree that we should
>> publish the value for the purpose of determining if the kernel supports
>> it or not.
> 
> That said, userspace still needs to read back the EXPIRES attribute, if
> only for iproute.  The program setting RTA_EXPIRES isn't the only thing
> that wants to know about the route's details.
>
>> We need to come up with a policy for handling unknown attributes
>> because what we do now doesn't work.
> 
> Definitely agree.
> 
>> I'm almost positive that the right thing to do is to unilaterally
>> make nlmsg_parse() error out on out-of-range attribute type numbers,
>> and then backport that to all -stable branches.
> 
> This works for one attribute because then userspace gets an error like
> EOPNOTSUPP or something.  But which attribute caused it?  Does
> userspace then have to retry the operation a couple times with all the
> different combinations of potentially unsupported options?
> 
> If we're going to error out on unrecognized options, I'd really like to
> see some kind of netlink features bitmap or something that positively
> indicates which options the kernel will accept.

Based on your mail I started to look if we can simply publish the
nla_policy maps to user space, which get fed to nlmsg_parse. I am
working on a rtnl_annotate function which adds this information along
with a new netlink flag NLM_F_DUMP_POLICY to query those.

Right now I am struggling with nested attributes and if it is safe to
move NLA_UNSPEC to the value 1 so we can determine if a specific
attribute is set in the policy or not...

Also nested attributes seem to be quite hairy, maybe there is no reason
to inform user space about them, I don't yet know.

This infrastructure should be safe to use also when features get backported.

Bye,
Hannes


Re: [Patch net] net: check both type and procotol for tcp sockets

2015-12-17 Thread David Miller

From: Cong Wang 
Date: Wed, 16 Dec 2015 23:39:04 -0800

> Dmitry reported the following out-of-bound access:
> 
> Call Trace:
>  [] __asan_report_load4_noabort+0x3e/0x40
> mm/kasan/report.c:294
>  [] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
>  [< inline >] SYSC_setsockopt net/socket.c:1746
>  [] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
>  [] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> 
> This is because we mistake a raw socket for a tcp socket.
> We should check both sk->sk_type and sk->sk_protocol to ensure
> it is a tcp socket.
> 
> Willem points out __skb_complete_tx_timestamp() needs to be fixed as well.
> 
> Reported-by: Dmitry Vyukov 
> Cc: Willem de Bruijn 
> Cc: Eric Dumazet 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks.
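
The distinction the commit message draws can be shown from userspace. A socket created with socket(AF_INET, SOCK_RAW, IPPROTO_TCP) carries the TCP protocol number without being a TCP socket, so a test on the protocol alone is not enough. The helper below is only a sketch of that two-field check; demo_is_tcp is an invented name, and its plain integer arguments stand in for the kernel's sk->sk_type and sk->sk_protocol.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* A socket is TCP only if both its type and its protocol say so. */
static bool demo_is_tcp(int type, int protocol)
{
	return type == SOCK_STREAM && protocol == IPPROTO_TCP;
}
```

A raw socket opened with the TCP protocol number fails the check on its type, which is the case that was being mistaken for TCP in the report.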

Re: pull-request: mac80211 2015-12-15

2015-12-17 Thread Joe Perches

On Thu, 2015-12-17 at 13:44 +0100, Johannes Berg wrote:
> On Wed, 2015-12-16 at 18:34 -0500, David Miller wrote:
> >  
> > Something about your text encoding kept this from ending up
> > in patchwork for some reason.
> > 
> 
> Hm. I don't see anything special with this, seems to just be plain
> text
> 8bit transfer encoding.
> 
> Do you want me to watch out for things getting into patchwork in the
> future?
> 

Hey Johannes.

You seem to be using:

X-Mailer: Evolution 3.18.1-1 

which, to be overly technical, _sucks_.

The new composer for Evolution has way too many defects
to list.

It adds non-breaking spaces (NBSP) characters instead of
a standard ASCII space in various places.

Every version of Evolution since 3.14 is terrible at
sending text only messages.

3.12 works

Re: [PATCH net-next V1 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-17 Thread Richard Cochran

On Thu, Dec 17, 2015 at 02:35:34PM +0200, Saeed Mahameed wrote:
> @@ -63,6 +65,7 @@
>  #define MLX5E_TX_CQ_POLL_BUDGET128
>  #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
>  #define MLX5E_SQ_BF_BUDGET 16
> +#define MLX5E_SERVICE_TASK_DELAY   (HZ / 4)

Hm...
  
> +void mlx5e_timestamp_overflow_check(struct mlx5e_priv *priv)
> +{
> + bool timeout = time_is_before_jiffies(priv->tstamp.last_overflow_check +
> +   priv->tstamp.overflow_period);
> + unsigned long flags;
> +
> + if (timeout) {
> + write_lock_irqsave(&priv->tstamp.lock, flags);
> + timecounter_read(&priv->tstamp.clock);
> + write_unlock_irqrestore(&priv->tstamp.lock, flags);
> + priv->tstamp.last_overflow_check = jiffies;

Here you have extra book keeping, because the rate of the work
callbacks is not the same as the rate of the overflow checks.

> + }
> +}

> +void mlx5e_timestamp_init(struct mlx5e_priv *priv)
> +{
> + struct mlx5e_tstamp *tstamp = &priv->tstamp;
> + u64 ns;
> + u64 frac = 0;
> + u32 dev_freq;
> +
> + mlx5e_timestamp_init_config(tstamp);
> + dev_freq = MLX5_CAP_GEN(priv->mdev, device_frequency_khz);
> + if (!dev_freq) {
> + mlx5_core_warn(priv->mdev, "invalid device_frequency_khz. %s failed\n",
> +__func__);
> + return;
> + }
> + rwlock_init(&tstamp->lock);
> + memset(&tstamp->cycles, 0, sizeof(tstamp->cycles));
> + tstamp->cycles.read = mlx5e_read_clock;
> + tstamp->cycles.shift = MLX5E_CYCLES_SHIFT;
> + tstamp->cycles.mult = clocksource_khz2mult(dev_freq,
> +tstamp->cycles.shift);
> + tstamp->nominal_c_mult = tstamp->cycles.mult;
> + tstamp->cycles.mask = CLOCKSOURCE_MASK(41);
> +
> + timecounter_init(&tstamp->clock, &tstamp->cycles,
> +  ktime_to_ns(ktime_get_real()));
> +
> + /* Calculate period in seconds to call the overflow watchdog - to make
> +  * sure counter is checked at least once every wrap around.
> +  */
> + ns = cyclecounter_cyc2ns(&tstamp->cycles, tstamp->cycles.mask, frac,
> +  &frac);
> + do_div(ns, NSEC_PER_SEC / 2 / HZ);
> + tstamp->overflow_period = ns;
> +}

And here you take great pains to calculate the rate of overflow checks...

> +/* mlx5e_service_task - Run service task for tasks that needed to be done
> + * periodically
> + */
> +static void mlx5e_service_task(struct work_struct *work)
> +{
> + struct delayed_work *dwork = to_delayed_work(work);
> + struct mlx5e_priv *priv = container_of(dwork, struct mlx5e_priv,
> +service_task);
> +
> + mutex_lock(&priv->state_lock);
> + if (test_bit(MLX5E_STATE_OPENED, &priv->state) &&
> + !test_bit(MLX5E_STATE_DESTROYING, &priv->state)) {
> + if (MLX5_CAP_GEN(priv->mdev, device_frequency_khz)) {
> + mlx5e_timestamp_overflow_check(priv);
> + /* Only mlx5e_timestamp_overflow_check is called from
> +  * this service task. schedule a new task only if clock
> +  * is initialized. if changed, move the scheduler.
> +  */
> + schedule_delayed_work(dwork, MLX5E_SERVICE_TASK_DELAY);

Why not simply use the rate you calculated, rather than some hard
coded value?

Consider what happens if MLX5E_SERVICE_TASK_DELAY is too long or way
too short.

> + }
> + }
> + mutex_unlock(&priv->state_lock);
> +}
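
To make the point about deriving the delay concrete, here is a back-of-the-envelope sketch in plain C rather than kernel code. It estimates how long a free-running counter of a given width takes to wrap at a given frequency and schedules the check at half that interval. The helper names are invented, and the 156250 kHz figure used in the note below is an assumed example frequency, not a value taken from the patch; only the 41-bit width comes from the CLOCKSOURCE_MASK(41) line quoted above.

```c
#include <stdint.h>

/* Milliseconds until a counter of the given bit width wraps at freq_khz. */
static uint64_t demo_wrap_ms(unsigned int bits, uint32_t freq_khz)
{
	uint64_t mask;

	if (freq_khz == 0)
		return 0;	/* unknown frequency: no meaningful period */
	mask = (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);

	/* freq_khz cycles elapse per millisecond. */
	return mask / freq_khz;
}

/* Delay between checks so that each wrap period sees at least two of them. */
static uint64_t demo_check_delay_ms(unsigned int bits, uint32_t freq_khz)
{
	return demo_wrap_ms(bits, freq_khz) / 2;
}
```

For a 41-bit counter at the assumed 156250 kHz this gives a wrap time of 14073748 ms, a little under four hours, so a check every HZ / 4 jiffies would run far more often than the counter needs. A much slower or much faster clock would shift that balance, which is why computing the delay from the counter parameters is more robust than a constant.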

Re: [PATCH net-next] ipv6: addrconf: use stable address generator for ARPHRD_NONE

2015-12-17 Thread David Miller

From: Bjørn Mork 
Date: Wed, 16 Dec 2015 16:44:38 +0100

> Add a new address generator mode, using the stable address generator
> with an automatically generated secret. This is intended as a default
> address generator mode for device types with no EUI64 implementation.
> The new generator is used for ARPHRD_NONE interfaces initially, adding
> default IPv6 autoconf support to e.g. tun interfaces.
> 
> If the addrgenmode is set to 'random', either by default or manually,
> and no stable secret is available, then a random secret is used as
> input for the stable-privacy address generator.  The secret can be
> read and modified like manually configured secrets, using the proc
> interface.  Modifying the secret will change the addrgen mode to
> 'stable-privacy' to indicate that it operates on a known secret.
> 
> Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
> a known secret is available when the device is created, then the mode
> will default to 'stable-privacy' as before.  The mode can be manually
> set to 'random' but it will behave exactly like 'stable-privacy' in
> this case. The secret will not change.
> 
> Cc: Hannes Frederic Sowa 
> Cc: 吉藤英明 
> Signed-off-by: Bjørn Mork 

I'll give Hannes and Hideaki a chance to review this.

Re: [PATCH net-next V1 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-17 Thread Richard Cochran

On Thu, Dec 17, 2015 at 02:35:34PM +0200, Saeed Mahameed wrote:
> +static int mlx5e_get_ts_info(struct net_device *dev,
> +  struct ethtool_ts_info *info)
> +{
> + struct mlx5e_priv *priv = netdev_priv(dev);
> + int ret;
> +
> + ret = ethtool_op_get_ts_info(dev, info);
> + if (ret)
> + return ret;
> +
> + if (MLX5_CAP_GEN(priv->mdev, device_frequency_khz)) {
> + info->so_timestamping |=
> + SOF_TIMESTAMPING_TX_HARDWARE |
> + SOF_TIMESTAMPING_RX_HARDWARE |
> + SOF_TIMESTAMPING_RAW_HARDWARE;
> +
> + info->tx_types =
> + (1 << HWTSTAMP_TX_OFF) |
> + (1 << HWTSTAMP_TX_ON);
> +
> + info->rx_filters =
> + (1 << HWTSTAMP_FILTER_NONE) |
> + (1 << HWTSTAMP_FILTER_ALL);
> + }

Here you need:

info->phc_index = -1;

and then in the next patch, use the PHC index when available.

> + return 0;
> +}
> +

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 7c8c408..4ae70cd 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -36,6 +36,10 @@
>  #include 
>  #include "en.h"
>  
> +#define MLX5E_RX_HW_STAMP(priv)  \
> + (priv->tstamp.hwtstamp_config.rx_filter ==  \
> +  HWTSTAMP_FILTER_ALL)

Use an inline function, please.  Also, that line fits in 80 columns
easily.

> + if (MLX5E_RX_HW_STAMP(priv))
> + mlx5e_fill_hwstamp(&priv->tstamp, skb_hwtstamps(skb),
> +get_cqe_ts(cqe));
> +

> +#define MLX5E_TX_HW_STAMP(priv, skb) \
> + (priv->tstamp.hwtstamp_config.tx_type == HWTSTAMP_TX_ON &&  \
> + skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)

Use inline function.

Thanks,
Richard
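
For readers unfamiliar with the request, this is roughly what replacing such a macro with an inline function looks like. The sketch is self-contained and does not use the real driver types: the demo_ structures and the DEMO_HWTSTAMP_ constants are stand-ins that only mirror the field names visible in the quoted macro.

```c
#include <stdbool.h>

/* Stand-ins for the kernel definitions; the values are arbitrary. */
enum {
	DEMO_HWTSTAMP_FILTER_NONE,
	DEMO_HWTSTAMP_FILTER_ALL
};

struct demo_hwtstamp_config {
	int tx_type;
	int rx_filter;
};

struct demo_tstamp {
	struct demo_hwtstamp_config hwtstamp_config;
};

struct demo_priv {
	struct demo_tstamp tstamp;
};

/* Inline replacement for a macro like MLX5E_RX_HW_STAMP(priv). */
static inline bool demo_rx_hw_stamp(const struct demo_priv *priv)
{
	return priv->tstamp.hwtstamp_config.rx_filter == DEMO_HWTSTAMP_FILTER_ALL;
}
```

Compared with the macro, the function form type-checks its argument and evaluates it exactly once.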

Re: [PATCH] net/macb: Update device tree binding for resetting PHY using GPIO

2015-12-17 Thread David Miller

From: Gregory CLEMENT 
Date: Thu, 17 Dec 2015 10:51:04 +0100

> Instead of being at the MAC level the reset gpio property is moved to the
> PHY child node level. It is still managed by the MAC, but from the point
> of view of the binding it makes more sense to be part of the PHY node.
> 
> This commit also fixes build errors if GPIOLIB is not selected.
> 
> Signed-off-by: Gregory CLEMENT 

Applied to net-next, thanks.

Please be explicit in your patch postings about which tree you are
targeting a change at.

Re: [PATCH net-next V1 4/4] net/mlx5e: Add PTP Hardware Clock (PHC) support

2015-12-17 Thread Richard Cochran

On Thu, Dec 17, 2015 at 02:35:35PM +0200, Saeed Mahameed wrote:
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> index 8e86f2c..b2e5014 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> @@ -880,6 +880,9 @@ static int mlx5e_get_ts_info(struct net_device *dev,
>   (1 << HWTSTAMP_FILTER_ALL);
>   }
>  
> + if (priv->tstamp.ptp)
> + info->phc_index = ptp_clock_index(priv->tstamp.ptp);
else
info->phc_index = -1;

Thanks,
Richard
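
The convention behind the suggested else branch is that phc_index holds the index of the associated PTP hardware clock, or -1 when there is none. A minimal sketch of that rule, with an invented demo_ptp_clock type in place of the kernel's struct ptp_clock and ptp_clock_index():

```c
#include <stddef.h>

/* Stand-in for a registered PTP clock; only its index matters here. */
struct demo_ptp_clock {
	int index;
};

/* Report the clock's index, or -1 when no clock was registered. */
static int demo_phc_index(const struct demo_ptp_clock *ptp)
{
	return ptp ? ptp->index : -1;
}
```

Setting the value on both branches means userspace never sees a stale or uninitialised index when clock registration failed or was skipped.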

Re: [PATCH net-next] tun: honor IFF_UP in tun_get_user()

2015-12-17 Thread David Miller

From: Eric Dumazet 
Date: Wed, 16 Dec 2015 08:57:37 -0800

> From: Eric Dumazet 
> 
> If a tun interface is turned down, we should not allow packet injection
> into the kernel.
> 
> Kernel does not send packets to the tun already.
> 
> TUNATTACHFILTER can not be used as only tun_net_xmit() is taking care
> of it.
> 
> Reported-by: Curt Wohlgemuth 
> Signed-off-by: Eric Dumazet 

This looks fine, applied, thanks Eric.

Re: [PATCH net] tcp: restore fastopen with no data in SYN packet

2015-12-17 Thread David Miller

From: Eric Dumazet 
Date: Wed, 16 Dec 2015 13:53:10 -0800

> From: Eric Dumazet 
> 
> Yuchung tracked a regression caused by commit 57be5bdad759 ("ip: convert
> tcp_sendmsg() to iov_iter primitives") for TCP Fast Open.
> 
> Some Fast Open users do not actually add any data in the SYN packet.
> 
> Fixes: 57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives")
> Reported-by: Yuchung Cheng 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable, thanks Eric.

Re: [PATCHv3 net-next] ipv6: allow routes to be configured with expire values

2015-12-17 Thread David Miller

From: Dan Williams 
Date: Wed, 16 Dec 2015 11:03:52 -0600

> On Wed, 2015-12-16 at 17:50 +0800, Xin Long wrote:
>> Add the support for adding expire value to routes,  requested by
>> Tom Gundersen  for systemd-networkd, and NetworkManager
>> wants it too.
>> 
>> implement it by adding the new RTNETLINK attribute RTA_EXPIRES.
> 
> Could you also add bits to send RTA_EXPIRES back to userspace in the
> route dump in rt6_fill_node(), so that userspace can figure out when
> RTA_EXPIRES is supported or not?
> 
> (obviously having it there isn't foolproof as if there are no routes on
> the system yet userspace can't figure out support, but it's better than
> nothing...)

That brings up an interesting issue, and I do not agree that we should
publish the value for the purpose of determining if the kernel supports
it or not.

We need to come up with a policy for handling unknown attributes
because what we do now doesn't work.

I'm almost positive that the right thing to do is to unilaterally
make nlmsg_parse() error out on out-of-range attribute type numbers,
and then backport that to all -stable branches.
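
The strict behaviour proposed above can be sketched in userspace as follows.
This is only an illustration under stated assumptions: struct attr_hdr,
put_attr(), parse_attrs_strict() and the return codes are hypothetical names,
not the kernel's struct nlattr or nlmsg_parse(). The point is that a type
above maxtype stops the parse with an error instead of being skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified attribute header (the kernel uses struct nlattr). */
struct attr_hdr {
    uint16_t len;   /* total length in bytes, header included */
    uint16_t type;
};

#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

enum { PARSE_OK = 0, PARSE_MALFORMED = -1, PARSE_UNKNOWN_ATTR = -2 };

/* Append one attribute at offset off; returns the offset of the next one. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, uint16_t dlen)
{
    struct attr_hdr h;
    size_t total = sizeof(h) + dlen;
    size_t padded = ATTR_ALIGN(total);

    h.len = (uint16_t)total;
    h.type = type;
    memcpy(buf + off, &h, sizeof(h));
    memcpy(buf + off + sizeof(h), data, dlen);
    memset(buf + off + total, 0, padded - total);
    return off + padded;
}

/*
 * Walk the attributes in buf. tb[] must have maxtype + 1 entries and
 * receives a pointer to each payload. An attribute type above maxtype
 * aborts the parse with PARSE_UNKNOWN_ATTR rather than being ignored.
 */
static int parse_attrs_strict(const unsigned char *buf, size_t buflen,
                              uint16_t maxtype, const unsigned char **tb)
{
    size_t off = 0;

    memset(tb, 0, sizeof(*tb) * ((size_t)maxtype + 1));
    while (buflen - off >= sizeof(struct attr_hdr)) {
        struct attr_hdr h;
        size_t left = buflen - off;
        size_t step;

        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > left)
            return PARSE_MALFORMED;
        if (h.type > maxtype)
            return PARSE_UNKNOWN_ATTR;
        tb[h.type] = buf + off + sizeof(h);
        step = ATTR_ALIGN(h.len);
        off += step < left ? step : left;
    }
    return PARSE_OK;
}
```

A caller built against maxtype 2 that receives a type 7 attribute gets
PARSE_UNKNOWN_ATTR, which is the signal userspace would act on.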


Re: [PATCH net-next] nfp: clear ring delayed kick counters

2015-12-17 Thread David Miller

From: Jakub Kicinski 
Date: Wed, 16 Dec 2015 19:08:52 +

> We need to clear delayed kick counters when we free rings otherwise
> after ndo_close()/ndo_open() we could kick HW by more entries than
> actually written to rings.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Rolf Neugebauer 

Applied, thank you.

Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code

2015-12-17 Thread David Miller

From: Rainer Weikusat 
Date: Wed, 16 Dec 2015 20:09:25 +

> With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
> receive code was changed from using mutex_lock(&u->readlock) to
> mutex_lock_interruptible(&u->readlock) to prevent signals from being
> delayed for an indefinite time if a thread sleeping on the mutex
> happened to be selected for handling the signal. But this was never a
> problem with the stream receive code (as opposed to its datagram
> counterpart) as that never went to sleep waiting for new messages with the
> mutex held and thus, wouldn't cause secondary readers to block on the
> mutex waiting for the sleeping primary reader. As the interruptible
> locking makes the code more complicated in exchange for no benefit,
> change it back to using mutex_lock.
> 
> Signed-off-by: Rainer Weikusat 

Applied, thanks Rainer.

Re: [PATCH v2 0/3] drivers: net: cpsw: Fix bugs in fixed-link PHY DT parsing

2015-12-17 Thread David Miller

From: "David Rivshin (Allworx)" 
Date: Wed, 16 Dec 2015 23:02:08 -0500

> I have tested on the following hardware configurations:
>  - (EVMSK) dual emac with two real MDIO-connected phys using RGMII-TXID
>  - single emac with fixed-link using RGMII
> Testing of other CPSW emac configurations that folks may have would
> be appreciated.

I'm going to wait until some others give some feedback and testing
results on this one, thanks.

Re: [PATCH net] sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close

2015-12-17 Thread Vlad Yasevich

On 12/17/2015 02:01 PM, Marcelo Ricardo Leitner wrote:
> Em 17-12-2015 16:29, Vlad Yasevich escreveu:
>> On 12/17/2015 09:30 AM, Xin Long wrote:
>>> In sctp_close, sctp_make_abort_user may return NULL because of memory
>>> allocation failure. If this happens, it will bypass any state change
>>> and never free the assoc. The assoc has no chance to be freed and it
>>> will be kept in memory with the state it had even after the socket is
>>> closed by sctp_close().
>>>
>>> So if sctp_make_abort_user fails to allocate memory, we should just
>>> free the asoc, as there isn't much else that we can do.
>>>
>>> Signed-off-by: Xin Long 
>>> Acked-by: Marcelo Ricardo Leitner 
>>> ---
>>>   net/sctp/socket.c | 6 +-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index 9b6cc6d..267b8f8 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -1513,8 +1513,12 @@ static void sctp_close(struct sock *sk, long timeout)
>>>   struct sctp_chunk *chunk;
>>>
>>>   chunk = sctp_make_abort_user(asoc, NULL, 0);
>>> -if (chunk)
>>> +if (chunk) {
>>>   sctp_primitive_ABORT(net, asoc, chunk);
>>> +} else {
>>> +sctp_unhash_established(asoc);
>>> +sctp_association_free(asoc);
>>> +}
>>
>> I don't think you can do that for an association that has not been closed.
>>
>> I think a cleaner approach might be to update abort primitive handlers
>> to handle a NULL chunk value and unconditionally call the primitive.
>>
>> This guarantees that any timers or waitqueues that might be active are
>> stopped correctly.
> 
> sctp_association_free() is the one who does that job, even that way. All in between the
> primitive call and then the call to sctp_association_free() is just status changes and
> packet xmit, which doing this way we cut out when we are in memory pressure. pkt xmit or
> ULP events are likely going to fail too anyway.
> 
> sctp_sf_do_9_1_prm_abort() -> SCTP_CMD_ASSOC_FAILED ->
>   sctp_cmd_assoc_failed -> ULP events, send abort, and SCTP_CMD_DELETE_TCB ->
> sctp_cmd_delete_tcb ->
>   sctp_unhash_established(asoc);
>   sctp_association_free(asoc);
> and returns.
> 
> There is a check on sctp_cmd_delete_tcb() that avoids calling that on temp assocs on
> listening sockets, but that condition is false due to the check on sk_shutdown so it will
> call those two functions anyway.

The condition I am a bit concerned about is one thread waiting in sctp_wait_for_sndbuf
while another does an abort.

I think this is OK though.  I need to look a bit more...

-vlad


> 
>   Marcelo
> 


Re: [PATCH net-next] ipv6: add IPV6_HDRINCL option for raw sockets

2015-12-17 Thread David Miller

From: Hannes Frederic Sowa 
Date: Wed, 16 Dec 2015 17:22:47 +0100

> Same as in Windows, we miss IPV6_HDRINCL for SOL_IPV6 and SOL_RAW.
> The SOL_IP/IP_HDRINCL is not available for IPv6 sockets.
> 
> Signed-off-by: Hannes Frederic Sowa 

This looks fine, applied, thanks!

Re: [PATCHv3 net-next] ipv6: allow routes to be configured with expire values

2015-12-17 Thread Dan Williams

On Thu, 2015-12-17 at 15:08 -0500, David Miller wrote:
> From: Dan Williams 
> Date: Wed, 16 Dec 2015 11:03:52 -0600
> 
> > On Wed, 2015-12-16 at 17:50 +0800, Xin Long wrote:
> >> Add the support for adding expire value to routes,  requested by
> >> Tom Gundersen  for systemd-networkd, and NetworkManager
> >> wants it too.
> >> 
> >> implement it by adding the new RTNETLINK attribute RTA_EXPIRES.
> > 
> > Could you also add bits to send RTA_EXPIRES back to userspace in the
> > route dump in rt6_fill_node(), so that userspace can figure out when
> > RTA_EXPIRES is supported or not?
> > 
> > (obviously having it there isn't foolproof as if there are no routes on
> > the system yet userspace can't figure out support, but it's better than
> > nothing...)
> 
> That brings up an interesting issue, and I do not agree that we should
> publish the value for the purpose of determining if the kernel supports
> it or not.

That said, userspace still needs to read back the EXPIRES attribute, if
only for iproute.  The program setting RTA_EXPIRES isn't the only thing
that wants to know about the route's details.

> We need to come up with a policy for handling unknown attributes
> because what we do now doesn't work.

Definitely agree.

> I'm almost positive that the right thing to do is to unilaterally
> make nlmsg_parse() error out on out-of-range attribute type numbers,
> and then backport that to all -stable branches.

This works for one attribute because then userspace gets an error like
EOPNOTSUPP or something.  But which attribute caused it?  Does
userspace then have to retry the operation a couple times with all the
different combinations of potentially unsupported options?

If we're going to error out on unrecognized options, I'd really like to
see some kind of netlink features bitmap or something that positively
indicates which options the kernel will accept.

Dan
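
A features bitmap of the kind suggested above could look roughly like this.
It is a sketch only: the attribute numbers and the unsupported_attrs() helper
are hypothetical and no such netlink interface is implied to exist. Given a
bitmap of what the receiver accepts, the sender learns in one step which of
its requested attributes would be rejected, without retrying combinations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical attribute numbers, for illustration only. */
enum { ATTR_DST = 1, ATTR_GATEWAY = 2, ATTR_EXPIRES = 3 };

#define ATTR_BIT(t) (UINT64_C(1) << (t))

/*
 * supported: bitmap advertised by the receiver, one bit per accepted type.
 * requested: bitmap of the types the sender wants to use.
 * Returns the requested types the receiver would reject (0 if none).
 */
static uint64_t unsupported_attrs(uint64_t supported, uint64_t requested)
{
    return requested & ~supported;
}
```

With an older receiver that only advertises ATTR_DST and ATTR_GATEWAY, a
request that also carries ATTR_EXPIRES yields exactly the ATTR_EXPIRES bit,
so the sender knows which attribute to drop.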

Re: [PATCH net 2/2] udp: restrict offloads to one namespace

2015-12-17 Thread Hannes Frederic Sowa

On 17.12.2015 19:10, Tom Herbert wrote:
> On Thu, Dec 17, 2015 at 9:40 AM, Hannes Frederic Sowa
>  wrote:
>> On 17.12.2015 18:32, Tom Herbert wrote:
>>> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
>>>  wrote:
>>>> With user namespaces a normal user can start a new network namespace
>>>> with all privileges and thus add new offloads, letting the other stack
>>>> interpret this garbage. Because the user namespace can also add
>>>> arbitrary ip addresses to its interface, solely matching those is not
>>>> enough.
>>>>
>>>> Tom any further comments?
>>>>
>>> I still don't think this addresses the core problem. If we're just
>>> worried about offloads being added in a user namespace that conflict
>>> with the those in the root space, it might be just as easy to disallow
>>> setting offloads except in default namespace.
>>
>> I am fine with that solution, too.
>>
>>> [...]
>>>
>>> To address this in the host stack the solution is pretty
>>> straightforward, we need to decide that the packet is going to be
>>> received before applying any offloads. Essentially we want to do an
>>> early_demux _really_ early. If we demux and get UDP socket for
>>> instance, then the protocol specific GRO function can be retrieved
>>> from the socket. So this will work with single listener port like
>>> encaps do today,  and also if encapsulation is being used over a
>>> connected socket. This also works if we want to support a user defined
>>> GRO function like I mentioned we might want to do for QUIC etc.
>>
>> An approximation can be done, but I don't think it is feasible to
>> implement this kind of checks across namespace borders, ip rules and
>> netfilter rulesets, which could all change the outcome of the process.
>>
> For receive offloads we don't need to worry about checking other namespaces.

That is true. Albeit for net-branch/stable I would still suggest either
this patch or restricting udp offloads just to the initial net namespace.

Bye,
Hannes


Re: [PATCHv2 net-next 0/4] Few l2 table related enhancements for cxgb4

2015-12-17 Thread David Miller

From: Hariprasad Shenai 
Date: Thu, 17 Dec 2015 13:45:06 +0530

> This series adds a new API to allocate and update l2t entry, replaces
> arpq_head/arpq_tail with double skb double linked list. Use t4_mgmt_tx()
> to send control packets of l2t write request. Use symbolic constants
> while calculating vlan priority.
> 
> This patch series has been created against net-next tree and includes
> patches on cxgb4 driver.
> 
> We have included all the maintainers of respective drivers. Kindly review
> the change and let us know in case of any review comments.
> 
> Thanks
> 
> V2: Remove unnecessary MAS operation while calculating vlan prio in
> PATCH 1/4 ("cxgb4: Use symbolic constant for VLAN priority calculation")
> based on review comment by David Miller

Series applied.

However, while reading over this, it seems to me that e->refcnt doesn't
need to be an atomic_t.  It's always changed under the rwlock, therefore
you can just use non-atomic refcounting.

[PATCH 2/2] mkiss: Fix use after free in sixpack_close().

2015-12-17 Thread David Miller


Need to do the unregister_device() after all references to the driver
private have been done.

Signed-off-by: David S. Miller 
---
 drivers/net/hamradio/mkiss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index 216bfd3..0b72b9d 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -798,13 +798,13 @@ static void mkiss_close(struct tty_struct *tty)
if (!atomic_dec_and_test(&ax->refcnt))
down(&ax->dead_sem);
 
-   unregister_netdev(ax->dev);
-
/* Free all AX25 frame buffers. */
kfree(ax->rbuff);
kfree(ax->xbuff);
 
ax->tty = NULL;
+
+   unregister_netdev(ax->dev);
 }
 
 /* Perform I/O control on an active ax25 channel. */
-- 
2.4.1


Re: [PATCH net-next] net: qmi_wwan: ignore bogus CDC Union descriptors

2015-12-17 Thread David Miller

From: Bjørn Mork 
Date: Thu, 17 Dec 2015 12:44:04 +0100

> The CDC descriptors found on these vendor specific functions should
> not be considered authoritative.  They seem to be ignored by drivers
> for other systems, and the quality is therefore low.
> 
> One device (1e0e:9001) has been reported to have such a bogus union
> descriptor on the QMI function, making it fail probing even if the
> device id was dynamically added.  The report was not complete enough
> to allow adding a device entry for this modem. But this should at
> least fix the dynamic id probing problem.
> 
> Reported-by: Kanerva Topi 
> Signed-off-by: Bjørn Mork 

Applied, thanks!

Re: [PATCH net-next] team: Advertise tunneling offload features

2015-12-17 Thread David Miller

From: Or Gerlitz 
Date: Thu, 17 Dec 2015 16:11:55 +0200

> From: Eran Ben Elisha 
> 
> When the underlying device supports offloading encapsulated traffic,
> we need to reflect that through the hw_enc_features field of the
> team net-device.
> 
> This will cause the xmit path in the core networking stack to provide
> team with encapsulated GSO frames to offload into the HW etc.
> 
> Using this over Mellanox ConnectX3-pro (mlx4 driver) card that supports
> VXLAN offloads we got 36.0 Gbits/sec using eight iperf streams.
> 
> Signed-off-by: Eran Ben Elisha 
> Signed-off-by: Jack Morgenstein 
> Reviewed-by: Or Gerlitz 
> Acked-by: Jiri Pirko 

Applied, thanks.

Re: use-after-free in sixpack_close

2015-12-17 Thread Ralf Baechle DL5RB

On Thu, Dec 17, 2015 at 04:05:32PM -0500, David Miller wrote:

> This should fix it, the only thing I'm unsure of is if we should perhaps
> also use del_timer_sync() here.  Anyone?

I think so.

  Ralf

Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2015-12-17 Thread Jay Vosburgh

 wrote:

>From: Zhu Yanjun 
>
>In 802.3ad mode, the speed and duplex is needed. But in some NIC,
>there is a time span between NIC up state and getting speed and duplex.
>As such, sometimes a slave in 802.3ad mode is in up state without
>speed and duplex. This will make bonding in 802.3ad mode can not
>work well. 
>To make bonding driver be compatible with more NICs, it is
>necessary to restrict the up state in 802.3ad mode.

What device is this?  It seems a bit odd that an Ethernet device
can be carrier up but not have the duplex and speed available.

Also, what are the option settings for bonding?  Specifically,
is "use_carrier" set to 0?  The default setting is 1.

In general, though, bonding expects a speed or duplex change to
be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would
propagate to the 802.3ad logic.

If the device here is going carrier up prior to having speed or
duplex available, then maybe it should call netdev_state_change() when
the duplex and speed are available, or delay calling netif_carrier_on().

>Signed-off-by: Zhu Yanjun 
>---
> drivers/net/bonding/bond_main.c |   19 +++
> 1 file changed, 19 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 9e0f8a7..0a80fb3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1991,6 +1991,25 @@ static int bond_miimon_inspect(struct bonding *bond)
> 
>   link_state = bond_check_dev_link(bond, slave->dev, 0);
> 
>+  /* Since some NIC has time span between netif_running and
>+   * getting speed and duplex. That is, after a NIC is up (netif_running),
>+   * there is a time span before this NIC is negotiated with speed and duplex.
>+   * During this time span, the slave in 802.3ad is configured without speed
>+   * and duplex. This 802.3ad bonding will not work because it needs slave's speed
>+   * and duplex to generate key field.
>+   * As such, we restrict up in 802.3ad mode to: netif_running && speed != SPEED_UNKNOWN &&
>+   * duplex != DUPLEX_UNKNOWN
>+   */
>+  if ((BMSR_LSTATUS == link_state) &&
>+  (BOND_MODE(bond) == BOND_MODE_8023AD)) {
>+  bond_update_speed_duplex(slave);
>+  if ((slave->speed == SPEED_UNKNOWN) ||
>+  (slave->duplex == DUPLEX_UNKNOWN)) {
>+  link_state = 0;
>+  netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");
>+  }
>+  }

Also, as a functional note on this patch, the above looks like
it will spam the log repeatedly every miimon interval for as long as the
"carrier up but no speed/duplex" situation persists.

-J

>   switch (slave->link) {
>   case BOND_LINK_UP:
>   if (link_state)
>-- 
>1.7.9.5
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com
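
One way to avoid the repeated log message noted above is to warn only when
the "carrier up but no speed/duplex" condition begins, and re-arm once it
clears. The following is a userspace sketch of that idea; struct link_monitor
and poll_link() are hypothetical and are not part of the bonding driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-slave state; not the bonding driver's struct slave. */
struct link_monitor {
    bool warned;      /* set once the warning was printed for this episode */
    unsigned logged;  /* number of messages emitted, kept for demonstration */
};

/*
 * Called once per polling interval. Returns the effective link state:
 * carrier without known speed/duplex is treated as link down.
 */
static bool poll_link(struct link_monitor *m, bool carrier,
                      bool speed_duplex_known)
{
    if (carrier && !speed_duplex_known) {
        if (!m->warned) {
            fprintf(stderr,
                    "link up but speed/duplex unknown, treating as down\n");
            m->warned = true;
            m->logged++;
        }
        return false;
    }
    m->warned = false;  /* condition cleared: allow a future warning */
    return carrier;
}
```

Polling five times in the bad state prints one message, not five; a second
message appears only after the link has been good in between.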

Re: [PATCH] decnet: fix possible NULL deref in dnet_select_source()

2015-12-17 Thread Vegard Nossum

On 7 April 2014 at 21:18, David Miller  wrote:
> From: Eric Dumazet 
> Date: Sun, 06 Apr 2014 14:59:14 -0700
>
>> From: Eric Dumazet 
>>
>> dnet_select_source() should make sure dn_ptr is not NULL.
>>
>> While looking at this decnet code, I believe I found a device
>> reference leak, lets fix it as well.
>>
>> Reported-by: Sasha Levin 
>> Signed-off-by: Eric Dumazet 
>> ---
>> It seems this bug is very old, no recent change is involved.
>
> The callers work hard to ensure this.
>
> Analyzing all call sites:
>
> 1) __dn_fib_res_prefsrc() uses the FIB entry device pointer, we should not
>be adding FIB entries pointing to devices which do not have their
>decnet private initialized yet.
>
> 2) dn_route_output_slow()
>
>The paths leading to the dnet_select_address() call(s) check if
>dev_out->dn_ptr is not NULL, except when using loopback.
>
> In some other paths the device comes from neigh->dev, from which the
> 'neigh' was looked up in dn_neigh_table.  There should not be neighbour
> entries in this table pointing to devices which do not have their
> decnet private setup yet.
>
> And in the loopback case, it is the decnet stack's responsibility to
> make sure ->dn_ptr is setup properly, else it should fail the module
> load and stack initialization.
>
> I think there is some core fundamental issue here, and just adding
> a NULL check to dnet_select_source() is just papering around the issue.
>
> Please look closer at the stack trace, this code, and my analysis
> above to figure out what's really going on so we can fix this properly.
>

Hi,

(Reviving old thread: https://lkml.org/lkml/2014/4/6/101)

I've just run into the same bug and I can confirm it's still present
on a stock Ubuntu kernel and can be reliably triggered by a non-root
user given that the loopback device is in a "down" state.

So as far as I understand:

dev_out->dn_ptr is assigned a non-NULL value in dn_dev_up() ->
dn_dev_create() when the loopback device is brought up (and set to
NULL when it is brought down).

dn_route_output_slow() doesn't check for a NULL value in the "No
destination? Assume its local" (!fld.daddr) case -- it also doesn't
check in any other way if the device is up or down.

Another bit in dn_route_output_slow() uses dn_dev_get_default() in
another "Not there? Perhaps its a local address" case, which _does_
check ->dn_ptr (but it uses decnet_default_device, not
init_net.loopback_dev).

There are other users of init_net.loopback_dev which don't seem to
check its ->dn_ptr.

I'm a bit uncertain about the other callsites that check ->dn_ptr for
a NULL value -- unless they take RTNL, how are they safe against a
race with somebody else bringing the device down (see
dn_dev_down()/dn_dev_delete()) and freeing ->dn_ptr after they get
ahold of it?

I think we could add NULL checks to dn_route_output_slow(). In any
case we shouldn't allow the device to be used if it's down, right?

I also understood from Sasha that decnet is generally in a bit of a
sorry state -- should we just add 'depends on BROKEN' in the Kconfig
to prevent more problems down the line?

Vegard

[GIT] Networking

2015-12-17 Thread David Miller


1) Fix uninitialized variable warnings in nfnetlink_queue, a lot of people
   reported this... From Arnd Bergmann.

2) Don't init mutex twice in i40e driver, from Jesse Brandeburg.

3) Fix spurious EBUSY in rhashtable, from Herbert Xu.

4) Missing DMA unmaps in mvpp2 driver, from Marcin Wojtas.

5) Fix race with work structure access in pppoe driver causing
   corruptions, from Guillaume Nault.

6) Fix OOPS due to sh_eth_rx() not checking whether netdev_alloc_skb()
   actually succeeded or not, from Sergei Shtylyov.

7) Don't lose flags when setting IFA_F_OPTIMISTIC in ipv6 code, from
   Bjørn Mork.

8) VXLAN_HD_RCO defined incorrectly, fix from Jiri Benc.

9) Fix clock source used for cookies in SCTP, from Marcelo Ricardo Leitner.

10) aurora driver needs HAS_DMA dependency, from Geert Uytterhoeven.

11) ndo_fill_metadata_dst op of vxlan has to handle ipv6 tunneling properly
as well, from Jiri Benc.

12) Handle request sockets properly in xfrm layer, from Eric Dumazet.

13) Double stats update in ipv6 geneve transmit path, fix from Pravin
B Shelar.

14) sk->sk_policy[] needs RCU protection, and as a result
xfrm_policy_destroy() needs to free policies using an RCU grace
period, from Eric Dumazet.

15) SCTP needs to clone ipv6 tx options in order to avoid use after
free, from Eric Dumazet.

16) Missing kbuild export of ila.h, from Stephen Hemminger.

17) Missing mdiobus_alloc() return value checking in mdio-mux.c, from
Tobias Klauser.

18) Validate protocol value range in ->create() methods, from Hannes
Frederic Sowa.

19) Fix early socket demux races that result in illegal dst reuse,
from Eric Dumazet.

20) Validate socket address length in pptp code, from WANG Cong.

21) skb_reorder_vlan_header() uses incorrect offset and can corrupt
packets, from Vlad Yasevich.

22) Fix memory leaks in nl80211 registry code, from Ola Olsson.

23) Timeout loop count handling fixes in mISDN, xgbe, qlge, sfc, and
qlcnic.  From Dan Carpenter.

24) msg.msg_iocb needs to be cleared in recvfrom() otherwise, for
example, AF_ALG will interpret it as an async call.  From
Tadeusz Struk.

25) inetpeer_set_addr_v4 forgets to initialize the 'vif' field, from
Eric Dumazet.

26) rhashtable enforces the minimum table size not early enough,
breaking how we calculate the per-cpu lock allocations.  From
Herbert Xu.

27) Fix FCC port lockup in 82xx driver, from Martin Roth.

28) FOU sockets need to be freed using RCU, from Hannes Frederic Sowa.

29) Fix out-of-bounds access in __skb_complete_tx_timestamp() and
sock_setsockopt() wrt. timestamp handling.  From WANG Cong.

Please pull, thanks a lot!

The following changes since commit 071f5d105a0ae93aeb02197c4ee3557e8cc57a21:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-12-03 16:02:46 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to ac5cc977991d2dce85fc734a6c71ddb33f6fe3c1:

  net: check both type and procotol for tcp sockets (2015-12-17 15:46:32 -0500)


Alan Cox (1):
  ser_gigaset: turn nonsense checks into WARN_ON

Alexander Duyck (1):
  ixgbe: Reset interface after enabling SR-IOV

Andrew Lunn (1):
  phy: micrel: Fix finding PHY properties in MAC node.

Andrzej Hajda (1):
  net/mlx4_core: fix handling return value of mlx4_slave_convert_port

Andy Shevchenko (2):
  net:hns: annotate IO address space properly
  net:hns: print MAC with %pM

Ariel Elior (1):
  qed: Fix BAR size split for some servers

Arnd Bergmann (3):
  netfilter: nfnetlink_queue: avoid harmless unnitialized variable warnings
  net: fsl: avoid 64-bit warning on pq_mdio
  net: ezchip: fix address space confusion in nps_enet.c

Bert Kenward (1):
  sfc: only use RSS filters if we're using RSS

Bjørn Mork (2):
  ipv6: keep existing flags when setting IFA_F_OPTIMISTIC
  net: cdc_mbim: add "NDP to end" quirk for Huawei E3372

Chen-Yu Tsai (1):
  stmmac: dwmac-sunxi: Call exit cleanup function in probe error path

Dan Carpenter (5):
  mISDN: fix a loop count
  amd-xgbe: fix a couple timeout loops
  qlge: fix a timeout loop in ql_change_rx_buffers()
  sfc: fix a timeout loop
  qlcnic: fix a timeout loop

David Ahern (1):
  net: Flush local routes when device changes vrf association

David S. Miller (13):
  Merge branch 'mvpp2-fixes'
  Merge branch 'master' of git://git.kernel.org/.../jkirsher/net-queue
  Merge branch 'sctp-timestamp-fixes'
  Revert "rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation"
  Merge branch 'qed-fixes'
  Merge branch 'vxlan-ipv6-metadata-dst'
  Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
  Merge branch 'bnxt_en-fixes'
  Merge branch 'mpls-fixes'
  Merge git://git.kernel.org/.../pablo/nf
  Merge

Re: use-after-free in sixpack_close

2015-12-17 Thread David Miller

From: One Thousand Gnomes 
Date: Thu, 17 Dec 2015 11:41:04 +

>> This report is then followed by a dozen of other use-after-free reports.
>> 
>> On commit edb42dc7bc0da0125ceacab810a553ce1f0cac8d (Dec 15).
>> 
>> Thank you
> 
> sixpack_close does unregister_netdev(sp->dev), which frees sp as sp is
> actually allocated via alloc_netdev()
> 
> Then deletes two timers within sp
> 
> Then frees two buffers indexed off sp

This should fix it, the only thing I'm unsure of is if we should perhaps
also use del_timer_sync() here.  Anyone?


[PATCH 1/2] 6pack: Fix use after free in sixpack_close().

Need to do the unregister_device() after all references to the driver
private have been done.

Signed-off-by: David S. Miller 
---
 drivers/net/hamradio/6pack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c
index 7c4a415..218f3ab 100644
--- a/drivers/net/hamradio/6pack.c
+++ b/drivers/net/hamradio/6pack.c
@@ -683,14 +683,14 @@ static void sixpack_close(struct tty_struct *tty)
if (!atomic_dec_and_test(&sp->refcnt))
down(&sp->dead_sem);
 
-   unregister_netdev(sp->dev);
-
del_timer(&sp->tx_t);
del_timer(&sp->resync_t);
 
/* Free all 6pack frame buffers. */
kfree(sp->rbuff);
kfree(sp->xbuff);
+
+   unregister_netdev(sp->dev);
 }
 
 /* Perform I/O control on an active 6pack channel. */
-- 
2.4.1
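
The ordering problem fixed by this patch can be shown with a small userspace
analogue. It assumes, as described in the quoted report, that the private
data lives in the same allocation as the device and that unregistering frees
it. struct device, struct priv and the device_*() helpers are hypothetical
stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical private area, embedded in the device allocation. */
struct priv {
    unsigned char *rbuff;
    unsigned char *xbuff;
};

struct device {
    int registered;
    struct priv priv;   /* freed together with the device */
};

static struct device *device_alloc(void)
{
    struct device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->priv.rbuff = malloc(64);
    dev->priv.xbuff = malloc(64);
    if (!dev->priv.rbuff || !dev->priv.xbuff) {
        free(dev->priv.rbuff);
        free(dev->priv.xbuff);
        free(dev);
        return NULL;
    }
    dev->registered = 1;
    return dev;
}

/* Analogue of an unregister call that also frees the device memory. */
static void device_unregister(struct device *dev)
{
    dev->registered = 0;
    free(dev);
}

/*
 * Correct teardown order: finish every access to the private area first,
 * release the device last. Calling device_unregister() before the two
 * free() calls would read p->rbuff and p->xbuff from freed memory.
 */
static void device_close(struct device *dev)
{
    struct priv *p = &dev->priv;

    free(p->rbuff);
    free(p->xbuff);
    device_unregister(dev);   /* p must not be used after this point */
}
```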


Re: [PATCHv3 net-next] ipv6: allow routes to be configured with expire values

2015-12-17 Thread David Miller

From: Dan Williams 
Date: Thu, 17 Dec 2015 14:23:57 -0600

> On Thu, 2015-12-17 at 15:08 -0500, David Miller wrote:
>> That brings up an interesting issue, and I do not agree that we should
>> publish the value for the purpose of determining if the kernel supports
>> it or not.
> 
> That said, userspace still needs to read back the EXPIRES attribute, if
> only for iproute.  The program setting RTA_EXPIRES isn't the only thing
> that wants to know about the route's details.

Agreed.

>> I'm almost positive that the right thing to do is to unilaterally
>> make nlmsg_parse() error out on out-of-range attribute type numbers,
>> and then backport that to all -stable branches.
> 
> This works for one attribute because then userspace gets an error like
> EOPNOTSUPP or something.  But which attribute caused it?  Does
> userspace then have to retry the operation a couple times with all the
> different combinations of potentially unsupported options?
> 
> If we're going to error out on unrecognized options, I'd really like to
> see some kind of netlink features bitmap or something that positively
> indicates which options the kernel will accept.

Also agree.  But it has to be really simple and trivial so that -stable
backports are possible.

Re: [PATCH net 2/2] udp: restrict offloads to one namespace

2015-12-17 Thread Tom Herbert

On Thu, Dec 17, 2015 at 12:33 PM, Hannes Frederic Sowa
 wrote:
> On 17.12.2015 19:10, Tom Herbert wrote:
>> On Thu, Dec 17, 2015 at 9:40 AM, Hannes Frederic Sowa
>>  wrote:
>>> On 17.12.2015 18:32, Tom Herbert wrote:
>>>> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
>>>>  wrote:
>>>>> With user namespaces a normal user can start a new network namespace
>>>>> with all privileges and thus add new offloads, letting the other stack
>>>>> interpret this garbage. Because the user namespace can also add
>>>>> arbitrary ip addresses to its interface, solely matching those is not
>>>>> enough.
>>>>>
>>>>> Tom any further comments?
>>>>>
>>>> I still don't think this addresses the core problem. If we're just
>>>> worried about offloads being added in a user namespace that conflict
>>>> with those in the root space, it might be just as easy to disallow
>>>> setting offloads except in default namespace.
>>>
>>> I am fine with that solution, too.
>>>
>>>> [...]
>>>>
>>>> To address this in the host stack the solution is pretty
>>>> straightforward, we need to decide that the packet is going to be
>>>> received before applying any offloads. Essentially we want to do an
>>>> early_demux _really_ early. If we demux and get UDP socket for
>>>> instance, then the protocol specific GRO function can be retrieved
>>>> from the socket. So this will work with single listener port like
>>>> encaps do today, and also if encapsulation is being used over a
>>>> connected socket. This also works if we want to support a user defined
>>>> GRO function like I mentioned we might want to do for QUIC etc.
>>>
>>> An approximation can be done, but I don't think it is feasible to
>>> implement this kind of checks across namespace borders, ip rules and
>>> netfilter rulesets, which could all change the outcome of the process.
>>>
>> For receive offloads we don't need to worry about checking other namespaces.
>
> That is true. Albeit for net-branch/stable I would still suggest either
> this patch or restricting udp offloads just to the initial net namespace.
>
I would opt for the latter then.

> Bye,
> Hannes
>
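
A rough userspace model of the option preferred above, restricting UDP offload registration to the initial net namespace. This is a sketch under assumptions, not the actual patch: all toy_ names are hypothetical stand-ins (the kernel has its own struct net, init_net and net_eq()), and the -EPERM return value is a guess.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace. */
struct toy_net {
    int id;
};

/* Hypothetical stand-in for the initial (root) namespace. */
static struct toy_net toy_init_net = { 0 };

/* One registered offload, keyed by UDP port. */
struct toy_offload {
    unsigned short port;
    struct toy_offload *next;
};

/* Global list of offloads, shared by every namespace in this model. */
static struct toy_offload *toy_offloads;

static bool toy_net_eq(const struct toy_net *a, const struct toy_net *b)
{
    return a == b;
}

/* Refuse registration from any namespace other than the initial one,
 * so an unprivileged user namespace cannot affect the shared list. */
static int toy_add_offload(struct toy_net *net, struct toy_offload *uo)
{
    if (!toy_net_eq(net, &toy_init_net))
        return -EPERM; /* assumed error code */

    uo->next = toy_offloads;
    toy_offloads = uo;
    return 0;
}
```

The check sits at registration time only, which matches the point above that receive offloads need no cross-namespace checks once untrusted namespaces cannot add entries.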

Re: [PATCH 2/2] mkiss: Fix use after free in sixpack_close().

2015-12-17 Thread Ralf Baechle

On Thu, Dec 17, 2015 at 04:05:49PM -0500, David Miller wrote:

> Subject: [PATCH 2/2] mkiss: Fix use after free in sixpack_close().

Make that subject "... mkiss_close()."

  Ralf

Re: [PATCH net-next] ipv6: addrconf: use stable address generator for ARPHRD_NONE

2015-12-17 Thread Hannes Frederic Sowa

On 16.12.2015 16:44, Bjørn Mork wrote:
> Add a new address generator mode, using the stable address generator
> with an automatically generated secret. This is intended as a default
> address generator mode for device types with no EUI64 implementation.
> The new generator is used for ARPHRD_NONE interfaces initially, adding
> default IPv6 autoconf support to e.g. tun interfaces.
> 
> If the addrgenmode is set to 'random', either by default or manually,
> and no stable secret is available, then a random secret is used as
> input for the stable-privacy address generator.  The secret can be
> read and modified like manually configured secrets, using the proc
> interface.  Modifying the secret will change the addrgen mode to
> 'stable-privacy' to indicate that it operates on a known secret.
> 
> Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
> a known secret is available when the device is created, then the mode
> will default to 'stable-privacy' as before.  The mode can be manually
> set to 'random' but it will behave exactly like 'stable-privacy' in
> this case. The secret will not change.
> 
> Cc: Hannes Frederic Sowa 
> Cc: 吉藤英明 
> Signed-off-by: Bjørn Mork 
> ---
> Changes since RFC:
>  - use IN6_ADDR_GEN_MODE_RANDOM as requested
>  - set the device type specific default in addrconf_dev_config() instead
>    of ipv6_add_dev() to catch device type changes
> 
> I guess an explanation is needed for the last change: My primary
> use case for this is the raw-ip support recently added to the qmi_wwan
> driver. This driver creates an ARPHRD_ETHER device which can be morphed
> into an ARPHRD_NONE device. So at device creation time everything is
> fine and EUI64 is supported. But when the link is set up, the device
> type is changed and EUI64 does not work. So I needed to override the
> default for ARPHRD_NONE at link up time.
> 
> An alternative would be to keep the ipv6_add_dev() code and add device
> type change notifier code. That seemed messier to me. Besides, we
> already do type based address configuration decisions in
> addrconf_dev_config(), deciding whether autoconfiguration is available
> or not.

I agree with your statements above and have reviewed the patch. Also, I
can't see any bad side effects from ARPHRD_NONE interfaces picking up
link-local addresses in general.

Acked-by: Hannes Frederic Sowa 

Thanks,
Hannes
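
The mode transitions described in the patch text can be modelled roughly as follows. This is a userspace sketch of the described behaviour only, not the patch itself: all toy_ names are hypothetical, the EUI64 mode is listed only for completeness, and rand() stands in for a proper random source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SECRET_LEN 16

enum toy_mode {
    TOY_MODE_EUI64,
    TOY_MODE_STABLE_PRIVACY,
    TOY_MODE_RANDOM
};

struct toy_dev {
    enum toy_mode mode;
    bool have_secret;
    uint8_t secret[TOY_SECRET_LEN];
};

/* Generate a secret only if none exists; an existing one is kept. */
static void toy_ensure_secret(struct toy_dev *d)
{
    size_t i;

    if (d->have_secret)
        return;
    for (i = 0; i < TOY_SECRET_LEN; i++)
        d->secret[i] = (uint8_t)(rand() & 0xff); /* not cryptographic */
    d->have_secret = true;
}

/* Default for a device type without EUI64 support: a known secret
 * selects 'stable-privacy', otherwise 'random' with a fresh secret. */
static void toy_dev_config(struct toy_dev *d, const uint8_t *known_secret)
{
    d->have_secret = false;
    if (known_secret) {
        memcpy(d->secret, known_secret, TOY_SECRET_LEN);
        d->have_secret = true;
        d->mode = TOY_MODE_STABLE_PRIVACY;
    } else {
        d->mode = TOY_MODE_RANDOM;
        toy_ensure_secret(d);
    }
}

/* Manually selecting 'random' never replaces an existing secret. */
static void toy_set_mode_random(struct toy_dev *d)
{
    d->mode = TOY_MODE_RANDOM;
    toy_ensure_secret(d);
}

/* Writing a secret switches the mode to 'stable-privacy'. */
static void toy_write_secret(struct toy_dev *d, const uint8_t *secret)
{
    memcpy(d->secret, secret, TOY_SECRET_LEN);
    d->have_secret = true;
    d->mode = TOY_MODE_STABLE_PRIVACY;
}
```

In this model the address generator itself is the same in both modes; the mode only records whether the secret was supplied or generated.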

