date:20170519

Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.

2017-05-19 Thread Toshiaki Makita

On 2017/05/19 16:09, Vlad Yasevich wrote:
> On 05/18/2017 10:13 PM, Toshiaki Makita wrote:
>> On 2017/05/18 22:31, Vladislav Yasevich wrote:
>>> It appears that since commit 8cb65d000, Q-in-Q vlans have been
>>> broken.  The series that commit is part of enabled TSO and checksum
>>> offloading on Q-in-Q vlans.  However, most HW we support can't handle
>>> it.  To work around the issue, the above commit added a function that
>>> turns off offloads on Q-in-Q devices, but it left the checksum offload.
>>> That will cause issues with most older devices that supprort very basic
>>> checksum offload capabilities as well as some newer devices (we've
>>> reproduced te problem with both be2net and bnx).
>>>
>>> To solve this for everyone, turn off checksum offloading feature
>>> by default when sending Q-in-Q traffic.  Devices that are proven to
>>> work can provided a corrected ndo_features_check implemetation.
>>>
>>> Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers")
>>> CC: Toshiaki Makita 
>>> Signed-off-by: Vladislav Yasevich 
>>> ---
>>>  include/linux/if_vlan.h | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
>>> index 8d5fcd6..ae537f0 100644
>>> --- a/include/linux/if_vlan.h
>>> +++ b/include/linux/if_vlan.h
>>> @@ -619,7 +619,6 @@ static inline netdev_features_t 
>>> vlan_features_check(const struct sk_buff *skb,
>>>  NETIF_F_SG |
>>>  NETIF_F_HIGHDMA |
>>>  NETIF_F_FRAGLIST |
>>> -NETIF_F_HW_CSUM |
>>>  NETIF_F_HW_VLAN_CTAG_TX |
>>>  NETIF_F_HW_VLAN_STAG_TX);
>>>  
>>
>> I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem
>> is IP_CSUM and IPV6_CSUM.
>> So wouldn't it be better to leave HW_CSUM and drop IP_CSUM/IPV6_CSUM,
>> i.e. change intersection into bitwise AND?
>>
> 
> It wasn't really a problem before accelerations got enabled on q-in-q
> vlans.

Right for stacked vlan device.
But I think the check was there for packets from guests forwarded by
bridge to vlan device so it was a problem before 8cb65d000.

>> The intersection was introduced in db115037bb57 ("net: fix checksum
>> features handling in netif_skb_features()"), but I guess for this
>> particular check the intersection was not needed.
>>
> 
> So, to put it another way, leave the intersection with HW_CSUM in the mask,
> and then do:
> 
>   return features & ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
> 
> This might work, but it assumes that everyone who announce HW_CSUM can
> do q-in-q vlans.  It's been a bit of a pain tracking this down and I'd rather
> fix it for everyone and let individual driver authors verify that Q-in-Q works
> correctly with HW checksum.  However, I am willing to do the above if
> that's what people want.

At least HW_CSUM in the check was introduced intentionally.
https://www.spinics.net/lists/netdev/msg152016.html

And I think HW_CSUM should work with any packets.
You know, include/linux/skbuff.h says
>  *NETIF_F_HW_CSUM - The driver (or its device) is able to compute one
>  *  IP (one's complement) checksum for any combination
>  *  of protocols or protocol layering.

We should be able to safely enable it.

...But you are so worried about layer2 protocol handling of HW_CSUM
devices, I'm ok with disabling it for now.

-- 
Toshiaki Makita

Re: [PATCH v4 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping

2017-05-19 Thread Miroslav Lichvar

On Thu, May 18, 2017 at 04:16:26PM -0400, Willem de Bruijn wrote:
> On Thu, May 18, 2017 at 9:06 AM, Miroslav Lichvar  wrote:
> > +/* On transmit, software and hardware timestamps are returned 
> > independently.
> > + * As the two skb clones share the hardware timestamp, which may be updated
> > + * before the software timestamp is received, a hardware TX timestamp may 
> > be
> > + * returned only if there is no software TX timestamp. A false software
> > + * timestamp made for SOCK_RCVTSTAMP when a real timestamp is missing must
> > + * be ignored.
> 
> Please expand on why this case can be ignored. It is quite subtle. How about
> something like
> 
> *
> * A false software timestamp is one made inside the __sock_recv_timestamp
> * call itself. These are generated whenever SO_TIMESTAMP(NS) is enabled
> * on the socket, even when the timestamp reported is for another option, such
> * as hardware tx timestamp.
> *
> * Ignore these when deciding whether a timestamp source is hw or sw.
> */

That seems a bit too verbose to me. :) Would the following work?

/* On transmit, software and hardware timestamps are returned independently.
 * As the two skb clones share the hardware timestamp, which may be updated
 * before the software timestamp is received, a hardware TX timestamp may be
 * returned only if there is no software TX timestamp. Ignore false software
 * timestamps, which may be made in the __sock_recv_timestamp() call when the
 * option SO_TIMESTAMP(NS) is enabled on the socket, even when the skb has a
 * hardware timestamp.
 */

> > +static bool skb_is_swtx_tstamp(const struct sk_buff *skb,
> > +  const struct sock *sk, int false_tstamp)
> > +{
> > +   if (false_tstamp && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)
> 
> Also, why is it ignored only for the new mode?

Good point. That should not be there. The function can be now reduced
to a single line again. I originally tried a different approach,
disabling false timestamps in the new mode, but then I thought it's
better to not complicate it unnecessarily and keep it consistent.

-- 
Miroslav Lichvar

[PATCH 10/12] netfilter: nf_tables: revisit chain/object refcounting from elements

2017-05-19 Thread Pablo Neira Ayuso

Andreas reports that the following incremental update using our commit
protocol doesn't work.

 # nft -f incremental-update.nft
 delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
 delete chain ip filter CIn_1
 ... Error: Could not process rule: Device or resource busy

The existing code is not well-integrated into the commit phase protocol,
since element deletions do not result in refcount decrement from the
preparation phase. This results in bogus EBUSY errors like the one
above.

Two new functions come with this patch:

* nft_set_elem_activate() function is used from the abort path, to
  restore the set element refcounting on objects that occurred from
  the preparation phase.

* nft_set_elem_deactivate() that is called from nft_del_setelem() to
  decrement set element refcounting on objects from the preparation
  phase in the commit protocol.

The nft_data_uninit() has been renamed to nft_data_release() since this
function does not uninitialize any data store in the data register,
instead just releases the references to objects. Moreover, a new
function nft_data_hold() has been introduced to be used from
nft_set_elem_activate().

Reported-by: Andreas Schultz 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables.h |  2 +-
 net/netfilter/nf_tables_api.c | 82 ++-
 net/netfilter/nft_bitwise.c   |  4 +-
 net/netfilter/nft_cmp.c   |  2 +-
 net/netfilter/nft_immediate.c |  5 ++-
 net/netfilter/nft_range.c |  4 +-
 6 files changed, 81 insertions(+), 18 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h 
b/include/net/netfilter/nf_tables.h
index 028faec8fc27..8a8bab8d7b15 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -176,7 +176,7 @@ struct nft_data_desc {
 int nft_data_init(const struct nft_ctx *ctx,
  struct nft_data *data, unsigned int size,
  struct nft_data_desc *desc, const struct nlattr *nla);
-void nft_data_uninit(const struct nft_data *data, enum nft_data_types type);
+void nft_data_release(const struct nft_data *data, enum nft_data_types type);
 int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
  enum nft_data_types type, unsigned int len);
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 5f4a4d48b871..da314be0c048 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3627,9 +3627,9 @@ void nft_set_elem_destroy(const struct nft_set *set, void 
*elem,
 {
struct nft_set_ext *ext = nft_set_elem_ext(set, elem);
 
-   nft_data_uninit(nft_set_ext_key(ext), NFT_DATA_VALUE);
+   nft_data_release(nft_set_ext_key(ext), NFT_DATA_VALUE);
if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA))
-   nft_data_uninit(nft_set_ext_data(ext), set->dtype);
+   nft_data_release(nft_set_ext_data(ext), set->dtype);
if (destroy_expr && nft_set_ext_exists(ext, NFT_SET_EXT_EXPR))
nf_tables_expr_destroy(NULL, nft_set_ext_expr(ext));
if (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF))
@@ -3638,6 +3638,18 @@ void nft_set_elem_destroy(const struct nft_set *set, 
void *elem,
 }
 EXPORT_SYMBOL_GPL(nft_set_elem_destroy);
 
+/* Only called from commit path, nft_set_elem_deactivate() already deals with
+ * the refcounting from the preparation phase.
+ */
+static void nf_tables_set_elem_destroy(const struct nft_set *set, void *elem)
+{
+   struct nft_set_ext *ext = nft_set_elem_ext(set, elem);
+
+   if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR))
+   nf_tables_expr_destroy(NULL, nft_set_ext_expr(ext));
+   kfree(elem);
+}
+
 static int nft_setelem_parse_flags(const struct nft_set *set,
   const struct nlattr *attr, u32 *flags)
 {
@@ -3849,9 +3861,9 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct 
nft_set *set,
kfree(elem.priv);
 err3:
if (nla[NFTA_SET_ELEM_DATA] != NULL)
-   nft_data_uninit(, d2.type);
+   nft_data_release(, d2.type);
 err2:
-   nft_data_uninit(, d1.type);
+   nft_data_release(, d1.type);
 err1:
return err;
 }
@@ -3896,6 +3908,53 @@ static int nf_tables_newsetelem(struct net *net, struct 
sock *nlsk,
return err;
 }
 
+/**
+ * nft_data_hold - hold a nft_data item
+ *
+ * @data: struct nft_data to release
+ * @type: type of data
+ *
+ * Hold a nft_data item. NFT_DATA_VALUE types can be silently discarded,
+ * NFT_DATA_VERDICT bumps the reference to chains in case of NFT_JUMP and
+ * NFT_GOTO verdicts. This function must be called on active data objects
+ * from the second phase of the commit protocol.
+ */
+static void nft_data_hold(const struct nft_data *data, enum nft_data_types 
type)
+{
+   if (type == NFT_DATA_VERDICT) {
+   switch (data->verdict.code)

[PATCH 09/12] netfilter: nf_tables: missing sanitization in data from userspace

2017-05-19 Thread Pablo Neira Ayuso

Do not assume userspace always sends us NFT_DATA_VALUE for bitwise and
cmp expressions. Although NFT_DATA_VERDICT does not make any sense, it
is still possible to handcraft a netlink message using this incorrect
data type.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_bitwise.c | 19 ++-
 net/netfilter/nft_cmp.c | 12 ++--
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
index 877d9acd91ef..96bd4f325b0f 100644
--- a/net/netfilter/nft_bitwise.c
+++ b/net/netfilter/nft_bitwise.c
@@ -83,17 +83,26 @@ static int nft_bitwise_init(const struct nft_ctx *ctx,
tb[NFTA_BITWISE_MASK]);
if (err < 0)
return err;
-   if (d1.len != priv->len)
-   return -EINVAL;
+   if (d1.len != priv->len) {
+   err = -EINVAL;
+   goto err1;
+   }
 
err = nft_data_init(NULL, >xor, sizeof(priv->xor), ,
tb[NFTA_BITWISE_XOR]);
if (err < 0)
-   return err;
-   if (d2.len != priv->len)
-   return -EINVAL;
+   goto err1;
+   if (d2.len != priv->len) {
+   err = -EINVAL;
+   goto err2;
+   }
 
return 0;
+err2:
+   nft_data_uninit(>xor, d2.type);
+err1:
+   nft_data_uninit(>mask, d1.type);
+   return err;
 }
 
 static int nft_bitwise_dump(struct sk_buff *skb, const struct nft_expr *expr)
diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c
index 2b96effeadc1..8c9d0fb19118 100644
--- a/net/netfilter/nft_cmp.c
+++ b/net/netfilter/nft_cmp.c
@@ -201,10 +201,18 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const 
struct nlattr * const tb[])
if (err < 0)
return ERR_PTR(err);
 
+   if (desc.type != NFT_DATA_VALUE) {
+   err = -EINVAL;
+   goto err1;
+   }
+
if (desc.len <= sizeof(u32) && op == NFT_CMP_EQ)
return _cmp_fast_ops;
-   else
-   return _cmp_ops;
+
+   return _cmp_ops;
+err1:
+   nft_data_uninit(, desc.type);
+   return ERR_PTR(-EINVAL);
 }
 
 struct nft_expr_type nft_cmp_type __read_mostly = {
-- 
2.1.4

[PATCH 00/12] Netfilter/IPVS fixes for net

2017-05-19 Thread Pablo Neira Ayuso

Hi David,

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) When using IPVS in direct-routing mode, normal traffic from the LVS
   host to a back-end server is sometimes incorrectly NATed on the way
   back into the LVS host. Patch to fix this from Julian Anastasov.

2) Calm down clang compilation warning in ctnetlink due to type
   mismatch, from Matthias Kaehlcke.

3) Do not re-setup NAT for conntracks that are already confirmed, this
   is fixing a problem that was introduced in the previous nf-next batch.
   Patch from Liping Zhang.

4) Do not allow conntrack helper removal from userspace cthelper
   infrastructure if already in used. This comes with an initial patch
   to introduce nf_conntrack_helper_put() that is required by this fix.
   From Liping Zhang.

5) Zero the pad when copying data to userspace, otherwise iptables fails
   to remove rules. This is a follow up on the patchset that sorts out
   the internal match/target structure pointer leak to userspace. Patch
   from the same author, Willem de Bruijn. This also comes with a build
   failure when CONFIG_COMPAT is not on, coming in the last patch of
   this series.

6) SYNPROXY crashes with conntrack entries that are created via
   ctnetlink, more specifically via conntrackd state sync. Patch from
   Eric Leblond.

7) RCU safe iteration on set element dumping in nf_tables, from
   Liping Zhang.

8) Missing sanitization of immediate date for the bitwise and cmp
   expressions in nf_tables.

9) Refcounting logic for chain and objects from set elements does not
   integrate into the nf_tables 2-phase commit protocol.

10) Missing sanitization of target verdict in ebtables arpreply target,
from Gao Feng.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 1c4d5f51a812a82de97beee24f48ed05c65ebda5:

  vmxnet3: ensure that adapter is in proper state during force_close 
(2017-05-12 12:23:52 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 751a9c763849f5859cb69ea44b0430d00672f637:

  netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside 
CONFIG_COMPAT (2017-05-18 13:10:03 +0200)


Eric Leblond (1):
  netfilter: synproxy: fix conntrackd interaction

Gao Feng (1):
  ebtables: arpreply: Add the standard target sanity check

Julian Anastasov (1):
  ipvs: SNAT packet replies only for NATed connections

Liping Zhang (4):
  netfilter: don't setup nat info for confirmed ct
  netfilter: introduce nf_conntrack_helper_put helper function
  netfilter: nfnl_cthelper: reject del request if helper obj is in use
  netfilter: nf_tables: can't assume lock is acquired when dumping set elems

Matthias Kaehlcke (1):
  netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch

Pablo Neira Ayuso (3):
  Merge tag 'ipvs-fixes-for-v4.12' of http://git.kernel.org/.../horms/ipvs
  netfilter: nf_tables: missing sanitization in data from userspace
  netfilter: nf_tables: revisit chain/object refcounting from elements

Willem de Bruijn (2):
  netfilter: xtables: zero padding in data_to_user
  netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside 
CONFIG_COMPAT

 include/linux/netfilter/x_tables.h  |   2 +-
 include/linux/netfilter_bridge/ebtables.h   |   5 +
 include/net/netfilter/nf_conntrack_helper.h |   4 +
 include/net/netfilter/nf_tables.h   |   2 +-
 net/bridge/netfilter/ebt_arpreply.c |   3 +
 net/bridge/netfilter/ebtables.c |   9 +-
 net/netfilter/ipvs/ip_vs_core.c |  19 +++-
 net/netfilter/nf_conntrack_helper.c |  12 +++
 net/netfilter/nf_conntrack_netlink.c|  11 +-
 net/netfilter/nf_nat_core.c |   4 +
 net/netfilter/nf_tables_api.c   | 160 ++--
 net/netfilter/nfnetlink_cthelper.c  |  17 +--
 net/netfilter/nft_bitwise.c |  19 +++-
 net/netfilter/nft_cmp.c |  12 ++-
 net/netfilter/nft_ct.c  |   4 +-
 net/netfilter/nft_immediate.c   |   5 +-
 net/netfilter/nft_range.c   |   4 +-
 net/netfilter/nft_set_hash.c|   2 +-
 net/netfilter/x_tables.c|  24 +++--
 net/netfilter/xt_CT.c   |   6 +-
 net/openvswitch/conntrack.c |   4 +-
 21 files changed, 249 insertions(+), 79 deletions(-)

[PATCH 03/12] netfilter: don't setup nat info for confirmed ct

2017-05-19 Thread Pablo Neira Ayuso

From: Liping Zhang 

We cannot setup nat info if the ct has been confirmed already, else,
different cpu may race to handle the same ct. In extreme situation,
we may hit the "BUG_ON(nf_nat_initialized(ct, maniptype))" in the
nf_nat_setup_info.

Also running the following commands will easily hit NF_CT_ASSERT in
nf_conntrack_alter_reply:
  # nft flush ruleset
  # ping -c 2 -W 1 1.1.1.111 &
  # nft add table t
  # nft add chain t c {type nat hook postrouting priority 0 \;}
  # nft add rule t c snat to 4.5.6.7
  WARNING: CPU: 1 PID: 10065 at net/netfilter/nf_conntrack_core.c:1472
  nf_conntrack_alter_reply+0x9a/0x1a0 [nf_conntrack]
  [...]
  Call Trace:
   nf_nat_setup_info+0xad/0x840 [nf_nat]
   ? deactivate_slab+0x65d/0x6c0
   nft_nat_eval+0xcd/0x100 [nft_nat]
   nft_do_chain+0xff/0x5d0 [nf_tables]
   ? mark_held_locks+0x6f/0xa0
   ? __local_bh_enable_ip+0x70/0xa0
   ? trace_hardirqs_on_caller+0x11f/0x190
   ? ipt_do_table+0x310/0x610
   ? trace_hardirqs_on+0xd/0x10
   ? __local_bh_enable_ip+0x70/0xa0
   ? ipt_do_table+0x32b/0x610
   ? __lock_acquire+0x2ac/0x1580
   ? ipt_do_table+0x32b/0x610
   nft_nat_do_chain+0x65/0x80 [nft_chain_nat_ipv4]
   nf_nat_ipv4_fn+0x1ae/0x240 [nf_nat_ipv4]
   nf_nat_ipv4_out+0x4a/0xf0 [nf_nat_ipv4]
   nft_nat_ipv4_out+0x15/0x20 [nft_chain_nat_ipv4]
   nf_hook_slow+0x2c/0xf0
   ip_output+0x154/0x270

So for the confirmed ct, just ignore it and return NF_ACCEPT.

Fixes: 9a08ecfe74d7 ("netfilter: don't attach a nat extension by default")
Signed-off-by: Liping Zhang 
Acked-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_nat_core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index b48d6b5aae8a..ef0be325a0c6 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -409,6 +409,10 @@ nf_nat_setup_info(struct nf_conn *ct,
 {
struct nf_conntrack_tuple curr_tuple, new_tuple;
 
+   /* Can't setup nat info for confirmed ct. */
+   if (nf_ct_is_confirmed(ct))
+   return NF_ACCEPT;
+
NF_CT_ASSERT(maniptype == NF_NAT_MANIP_SRC ||
 maniptype == NF_NAT_MANIP_DST);
BUG_ON(nf_nat_initialized(ct, maniptype));
-- 
2.1.4

[PATCH 02/12] netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch

2017-05-19 Thread Pablo Neira Ayuso

From: Matthias Kaehlcke 

Not all parameters passed to ctnetlink_parse_tuple() and
ctnetlink_exp_dump_tuple() match the enum type in the signatures of these
functions. Since this is intended change the argument type of to be an
unsigned integer value.

Signed-off-by: Matthias Kaehlcke 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index dcf561b5c97a..fa752626029e 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1007,9 +1007,8 @@ static const struct nla_policy 
tuple_nla_policy[CTA_TUPLE_MAX+1] = {
 
 static int
 ctnetlink_parse_tuple(const struct nlattr * const cda[],
- struct nf_conntrack_tuple *tuple,
- enum ctattr_type type, u_int8_t l3num,
- struct nf_conntrack_zone *zone)
+ struct nf_conntrack_tuple *tuple, u32 type,
+ u_int8_t l3num, struct nf_conntrack_zone *zone)
 {
struct nlattr *tb[CTA_TUPLE_MAX+1];
int err;
@@ -2447,7 +2446,7 @@ static struct nfnl_ct_hook ctnetlink_glue_hook = {
 
 static int ctnetlink_exp_dump_tuple(struct sk_buff *skb,
const struct nf_conntrack_tuple *tuple,
-   enum ctattr_expect type)
+   u32 type)
 {
struct nlattr *nest_parms;
 
-- 
2.1.4

RE: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

2017-05-19 Thread David Laight

From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-ow...@vger.kernel.org] 
On Behalf Of Jim Baxter
> Sent: 16 May 2017 18:41
> 
> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
> 
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.

Why is this driver copying multiple tx messages into a single skb.
Surely there are better ways to do this??

I think it is generating a 'multi-ethernet frame' URB with an
overall header for each URB and a header for each ethernet frame.

Given that the USB stack allows multiple concurrent transmits I'm
surprised that batching large ethernet frames makes much difference.

Also the USB target can't actually tell when URB that contain
multiples of the USB packet size end.
So it is possible to send a single NTB as multiple URB.
Of course, the usb_net might have other ideas.

David

Re: [PATCH v2] xfrm: fix state migration copy replay sequence numbers

2017-05-19 Thread Steffen Klassert

On Fri, May 19, 2017 at 12:47:00PM +0200, Antony Antony wrote:
> During xfrm migration copy replay and preplay sequence numbers
> from the previous state.
> 
> Here is a tcpdump output showing the problem.
> 10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder.
> After the migration it sent wrong sequence number, reset to 1.
> The migration is from 10.0.0.52 to 10.0.0.53.
> 
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7cf), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: 
> ESP(spi=0xca1c282d,seq=0x7cf), length 136
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d0), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: 
> ESP(spi=0xca1c282d,seq=0x7d0), length 136
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d1), length 136
> 
> NOTE: next sequence is wrong 0x1
> 
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), 
> length 136
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d2), length 136
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), 
> length 136
> 
> Signed-off-by: Antony Antony 

Applied, thanks Antony!

Re: [PATCH] xfrm: fix state migration replay sequence numbers

2017-05-19 Thread Steffen Klassert

On Thu, May 18, 2017 at 04:39:53PM +0200, Antony Antony wrote:
> During xfrm migration replay and preplay sequence numbers are not 
> copied from the previous state. 
> 
> Here is tcpdump output showing the problem.
> 10.0.10.46 is running vanilla kernel, IKE/IPsec responder.
> After the migration it sent wrong sequence number, reset to 1.
> The migration is from 10.0.0.52 to 10.0.0.53.
> 
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7cf), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: 
> ESP(spi=0xca1c282d,seq=0x7cf), length 136
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d0), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: 
> ESP(spi=0xca1c282d,seq=0x7d0), length 136
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d1), length 136
> 
> NOTE: next sequence is wrong 0x1
> 
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), 
> length 136
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d2), length 136
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), 
> length 136
> 
> The attached patch fix it by copying replay and preplay.

The patch looks ok, but please do a v2 and put the above
informations into the commit message. This is usefull
information that we would loose otherwise.

Thanks!

Re: [PATCH net-next] xfrm: Make function xfrm_dev_register static

2017-05-19 Thread Steffen Klassert

On Thu, May 18, 2017 at 03:51:38PM +, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Fixes the following sparse warning:
> 
> net/xfrm/xfrm_device.c:141:5: warning:
>  symbol 'xfrm_dev_register' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied to ipsec-next, thanks!

Re: ipsec doesn't route TCP with 4.11 kernel

2017-05-19 Thread Steffen Klassert

On Tue, May 16, 2017 at 03:05:40PM -0400, Don Bowman wrote:
> On 3 May 2017 at 04:14, Steffen Klassert  wrote:
> > On Sat, Apr 29, 2017 at 08:39:34PM -0400, Don Bowman wrote:
> >> On 28 April 2017 at 03:13, Steffen Klassert
> >>  wrote:
> >> > On Thu, Apr 27, 2017 at 06:13:38PM -0400, Don Bowman wrote:
> >> >> On 27 April 2017 at 04:42, Steffen Klassert 
> >> >> 
> >> >> wrote:
> >> >> > On Wed, Apr 26, 2017 at 10:01:34PM -0700, Cong Wang wrote:
> >> >> >> (Cc'ing netdev and IPSec maintainers)
> >> >> >>
> >> >> >> On Tue, Apr 25, 2017 at 6:08 PM, Don Bowman  
> >> >> >> wrote:
> >> >>
> >>
> >> 
> >>
> >> confirmed, with this patch in place that the tcp functions properly.
> >
> > Thanks for testing!
> >
> > I'll make sure to get this fix into the mainline soon.
> 
> Thanks. Let me know if there is any more assistance I can provide.
> I've been running the patch for 2 weeks now on 3 machines.

Thanks for testing!
I plan to push it upstream at the beginning of te next week.

Re: [PATCH v2] e1000e: Don't return uninitialized stats

2017-05-19 Thread Jeff Kirsher

On Thu, 2017-05-18 at 10:46 -0400, David Miller wrote:
> From: Benjamin Poirier 
> Date: Wed, 17 May 2017 16:24:13 -0400
> 
> > Some statistics passed to ethtool are garbage because
> > e1000e_get_stats64()
> > doesn't write them, for example: tx_heartbeat_errors. This leaks kernel
> > memory to userspace and confuses users.
> > 
> > Do like ixgbe and use dev_get_stats() which first zeroes out
> > rtnl_link_stats64.
> > 
> > Fixes: 5944701df90d ("net: remove useless memset's in drivers
> > get_stats64")
> > Reported-by: Stefan Priebe 
> > Signed-off-by: Benjamin Poirier 
> 
> Jeff, please be sure to pick this up, thanks.

Yep, I have it in my tree, thanks.

signature.asc
Description: This is a digitally signed message part

RE: [PATCH net-next 9/9] nfp: eliminate an if statement in calculation of completed frames

2017-05-19 Thread David Laight

From: Jakub Kicinski
> Sent: 17 May 2017 18:37
..
> > >   while (todo--) {
> > >   idx = D_IDX(tx_ring, tx_ring->rd_p++);
> >
> > That '++' looks suspicious.
> > I think you need to decide whether you are incrementing pointers into the 
> > ring
> > or indexes into it.
> > Sometimes it is safer to use a non-wrapping index and mask when accessing 
> > the entry.
> > entry_ptr = [idx & (RING_SIZE - 1)]
> > Ring full is then (read_idx == write_idx + RING_SIZE),
> > ring empty (read_idx == write_idx).
> > So the index just wrap at (probably)_2^32.
> 
> I may be missing the point.  I use a mix of the two, actually, the
> software pointers are free running (non-wrapping) but the HW QCP
> pointers wrap.  Because HW pointers wrap I always keep one entry on
> the rings empty, see nfp_net_tx_full().

Ah, I'd assumed that rd_p was a pointer, not an index.

David

[PATCH net-next] cxgb4: fix incorrect cim_la output for T6

2017-05-19 Thread Ganesh Goudar

take care of UpDbgLaRdPtr[0-3] restriction for T6

Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c 
b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index aded42b96..917b46b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -8268,6 +8268,13 @@ int t4_cim_read_la(struct adapter *adap, u32 *la_buf, 
unsigned int *wrptr)
if (ret)
break;
idx = (idx + 1) & UPDBGLARDPTR_M;
+
+   /* Bits 0-3 of UpDbgLaRdPtr can be between  to 1001 to
+* identify the 32-bit portion of the full 312-bit data
+*/
+   if (is_t6(adap->params.chip))
+   while ((idx & 0xf) > 9)
+   idx = (idx + 1) % UPDBGLARDPTR_M;
}
 restart:
if (cfg & UPDBGLAEN_F) {
-- 
2.1.0

Re: [PATCH 4.4-only] openvswitch: clear sender cpu before forwarding packets

2017-05-19 Thread Anoob Soman


On 18/05/17 09:11, Greg KH wrote:

So backporting that one patch solves the issue here?  Can you please
verify it, and let me know before I apply it?

thanks,

greg k-h


yes, I can do that.

[PATCH 11/12] ebtables: arpreply: Add the standard target sanity check

2017-05-19 Thread Pablo Neira Ayuso

From: Gao Feng 

The info->target comes from userspace and it would be used directly.
So we need to add the sanity check to make sure it is a valid standard
target, although the ebtables tool has already checked it. Kernel needs
to validate anything coming from userspace.

If the target is set as an evil value, it would break the ebtables
and cause a panic. Because the non-standard target is treated as one
offset.

Now add one helper function ebt_invalid_target, and we would replace
the macro INVALID_TARGET later.

Signed-off-by: Gao Feng 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter_bridge/ebtables.h | 5 +
 net/bridge/netfilter/ebt_arpreply.c   | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/netfilter_bridge/ebtables.h 
b/include/linux/netfilter_bridge/ebtables.h
index a30efb437e6d..e0cbf17af780 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -125,4 +125,9 @@ extern unsigned int ebt_do_table(struct sk_buff *skb,
 /* True if the target is not a standard target */
 #define INVALID_TARGET (info->target < -NUM_STANDARD_TARGETS || info->target 
>= 0)
 
+static inline bool ebt_invalid_target(int target)
+{
+   return (target < -NUM_STANDARD_TARGETS || target >= 0);
+}
+
 #endif
diff --git a/net/bridge/netfilter/ebt_arpreply.c 
b/net/bridge/netfilter/ebt_arpreply.c
index 5929309beaa1..db85230e49c3 100644
--- a/net/bridge/netfilter/ebt_arpreply.c
+++ b/net/bridge/netfilter/ebt_arpreply.c
@@ -68,6 +68,9 @@ static int ebt_arpreply_tg_check(const struct xt_tgchk_param 
*par)
if (e->ethproto != htons(ETH_P_ARP) ||
e->invflags & EBT_IPROTO)
return -EINVAL;
+   if (ebt_invalid_target(info->target))
+   return -EINVAL;
+
return 0;
 }
 
-- 
2.1.4

[PATCH 08/12] netfilter: nf_tables: can't assume lock is acquired when dumping set elems

2017-05-19 Thread Pablo Neira Ayuso

From: Liping Zhang 

When dumping the elements related to a specified set, we may invoke the
nf_tables_dump_set with the NFNL_SUBSYS_NFTABLES lock not acquired. So
we should use the proper rcu operation to avoid race condition, just
like other nft dump operations.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_api.c | 78 +++
 net/netfilter/nft_set_hash.c  |  2 +-
 2 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 559225029740..5f4a4d48b871 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3367,35 +3367,50 @@ static int nf_tables_dump_setelem(const struct nft_ctx 
*ctx,
return nf_tables_fill_setelem(args->skb, set, elem);
 }
 
+struct nft_set_dump_ctx {
+   const struct nft_set*set;
+   struct nft_ctx  ctx;
+};
+
 static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   struct nft_set_dump_ctx *dump_ctx = cb->data;
struct net *net = sock_net(skb->sk);
-   u8 genmask = nft_genmask_cur(net);
+   struct nft_af_info *afi;
+   struct nft_table *table;
struct nft_set *set;
struct nft_set_dump_args args;
-   struct nft_ctx ctx;
-   struct nlattr *nla[NFTA_SET_ELEM_LIST_MAX + 1];
+   bool set_found = false;
struct nfgenmsg *nfmsg;
struct nlmsghdr *nlh;
struct nlattr *nest;
u32 portid, seq;
-   int event, err;
+   int event;
 
-   err = nlmsg_parse(cb->nlh, sizeof(struct nfgenmsg), nla,
- NFTA_SET_ELEM_LIST_MAX, nft_set_elem_list_policy,
- NULL);
-   if (err < 0)
-   return err;
+   rcu_read_lock();
+   list_for_each_entry_rcu(afi, >nft.af_info, list) {
+   if (afi != dump_ctx->ctx.afi)
+   continue;
 
-   err = nft_ctx_init_from_elemattr(, net, cb->skb, cb->nlh,
-(void *)nla, genmask);
-   if (err < 0)
-   return err;
+   list_for_each_entry_rcu(table, >tables, list) {
+   if (table != dump_ctx->ctx.table)
+   continue;
 
-   set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET],
-  genmask);
-   if (IS_ERR(set))
-   return PTR_ERR(set);
+   list_for_each_entry_rcu(set, >sets, list) {
+   if (set == dump_ctx->set) {
+   set_found = true;
+   break;
+   }
+   }
+   break;
+   }
+   break;
+   }
+
+   if (!set_found) {
+   rcu_read_unlock();
+   return -ENOENT;
+   }
 
event  = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWSETELEM);
portid = NETLINK_CB(cb->skb).portid;
@@ -3407,11 +3422,11 @@ static int nf_tables_dump_set(struct sk_buff *skb, 
struct netlink_callback *cb)
goto nla_put_failure;
 
nfmsg = nlmsg_data(nlh);
-   nfmsg->nfgen_family = ctx.afi->family;
+   nfmsg->nfgen_family = afi->family;
nfmsg->version  = NFNETLINK_V0;
-   nfmsg->res_id   = htons(ctx.net->nft.base_seq & 0x);
+   nfmsg->res_id   = htons(net->nft.base_seq & 0x);
 
-   if (nla_put_string(skb, NFTA_SET_ELEM_LIST_TABLE, ctx.table->name))
+   if (nla_put_string(skb, NFTA_SET_ELEM_LIST_TABLE, table->name))
goto nla_put_failure;
if (nla_put_string(skb, NFTA_SET_ELEM_LIST_SET, set->name))
goto nla_put_failure;
@@ -3422,12 +3437,13 @@ static int nf_tables_dump_set(struct sk_buff *skb, 
struct netlink_callback *cb)
 
args.cb = cb;
args.skb= skb;
-   args.iter.genmask   = nft_genmask_cur(ctx.net);
+   args.iter.genmask   = nft_genmask_cur(net);
args.iter.skip  = cb->args[0];
args.iter.count = 0;
args.iter.err   = 0;
args.iter.fn= nf_tables_dump_setelem;
-   set->ops->walk(, set, );
+   set->ops->walk(_ctx->ctx, set, );
+   rcu_read_unlock();
 
nla_nest_end(skb, nest);
nlmsg_end(skb, nlh);
@@ -3441,9 +3457,16 @@ static int nf_tables_dump_set(struct sk_buff *skb, 
struct netlink_callback *cb)
return skb->len;
 
 nla_put_failure:
+   rcu_read_unlock();
return -ENOSPC;
 }
 
+static int nf_tables_dump_set_done(struct netlink_callback *cb)
+{
+   kfree(cb->data);
+   return 0;
+}
+
 static int nf_tables_getsetelem(struct net *net, struct sock *nlsk,
struct sk_buff *skb,

[PATCH 12/12] netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT

2017-05-19 Thread Pablo Neira Ayuso

From: Willem de Bruijn 

The patch in the Fixes references COMPAT_XT_ALIGN in the definition
of XT_DATA_TO_USER, outside an #ifdef CONFIG_COMPAT block.

Split XT_DATA_TO_USER into separate compat and non compat variants and
define the first inside an CONFIG_COMPAT block.

This simplifies both variants by removing branches inside the macro.

Fixes: 324318f0248c ("netfilter: xtables: zero padding in data_to_user")
Reported-by: Stephen Rothwell 
Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/x_tables.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d17769599c10..1770c1d9b37f 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -296,18 +296,17 @@ int xt_data_to_user(void __user *dst, const void *src,
 }
 EXPORT_SYMBOL_GPL(xt_data_to_user);
 
-#define XT_DATA_TO_USER(U, K, TYPE, C_SIZE)\
+#define XT_DATA_TO_USER(U, K, TYPE)\
xt_data_to_user(U->data, K->data,   \
K->u.kernel.TYPE->usersize, \
-   C_SIZE ? : K->u.kernel.TYPE->TYPE##size,\
-   C_SIZE ? COMPAT_XT_ALIGN(C_SIZE) :  \
-XT_ALIGN(K->u.kernel.TYPE->TYPE##size))
+   K->u.kernel.TYPE->TYPE##size,   \
+   XT_ALIGN(K->u.kernel.TYPE->TYPE##size))
 
 int xt_match_to_user(const struct xt_entry_match *m,
 struct xt_entry_match __user *u)
 {
return XT_OBJ_TO_USER(u, m, match, 0) ||
-  XT_DATA_TO_USER(u, m, match, 0);
+  XT_DATA_TO_USER(u, m, match);
 }
 EXPORT_SYMBOL_GPL(xt_match_to_user);
 
@@ -315,7 +314,7 @@ int xt_target_to_user(const struct xt_entry_target *t,
  struct xt_entry_target __user *u)
 {
return XT_OBJ_TO_USER(u, t, target, 0) ||
-  XT_DATA_TO_USER(u, t, target, 0);
+  XT_DATA_TO_USER(u, t, target);
 }
 EXPORT_SYMBOL_GPL(xt_target_to_user);
 
@@ -614,6 +613,12 @@ void xt_compat_match_from_user(struct xt_entry_match *m, 
void **dstptr,
 }
 EXPORT_SYMBOL_GPL(xt_compat_match_from_user);
 
+#define COMPAT_XT_DATA_TO_USER(U, K, TYPE, C_SIZE) \
+   xt_data_to_user(U->data, K->data,   \
+   K->u.kernel.TYPE->usersize, \
+   C_SIZE, \
+   COMPAT_XT_ALIGN(C_SIZE))
+
 int xt_compat_match_to_user(const struct xt_entry_match *m,
void __user **dstptr, unsigned int *size)
 {
@@ -629,7 +634,7 @@ int xt_compat_match_to_user(const struct xt_entry_match *m,
if (match->compat_to_user((void __user *)cm->data, m->data))
return -EFAULT;
} else {
-   if (XT_DATA_TO_USER(cm, m, match, msize - sizeof(*cm)))
+   if (COMPAT_XT_DATA_TO_USER(cm, m, match, msize - sizeof(*cm)))
return -EFAULT;
}
 
@@ -975,7 +980,7 @@ int xt_compat_target_to_user(const struct xt_entry_target 
*t,
if (target->compat_to_user((void __user *)ct->data, t->data))
return -EFAULT;
} else {
-   if (XT_DATA_TO_USER(ct, t, target, tsize - sizeof(*ct)))
+   if (COMPAT_XT_DATA_TO_USER(ct, t, target, tsize - sizeof(*ct)))
return -EFAULT;
}
 
-- 
2.1.4

[PATCH 04/12] netfilter: introduce nf_conntrack_helper_put helper function

2017-05-19 Thread Pablo Neira Ayuso

From: Liping Zhang 

And convert module_put invocation to nf_conntrack_helper_put, this is
prepared for the followup patch, which will add a refcnt for cthelper,
so we can reject the deleting request when cthelper is in use.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_conntrack_helper.h | 2 ++
 net/netfilter/nf_conntrack_helper.c | 6 ++
 net/netfilter/nft_ct.c  | 4 ++--
 net/netfilter/xt_CT.c   | 6 +++---
 net/openvswitch/conntrack.c | 4 ++--
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index e04fa7691e5d..c1c12411103a 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -79,6 +79,8 @@ struct nf_conntrack_helper *__nf_conntrack_helper_find(const 
char *name,
 struct nf_conntrack_helper *nf_conntrack_helper_try_module_get(const char 
*name,
   u16 l3num,
   u8 protonum);
+void nf_conntrack_helper_put(struct nf_conntrack_helper *helper);
+
 void nf_ct_helper_init(struct nf_conntrack_helper *helper,
   u16 l3num, u16 protonum, const char *name,
   u16 default_port, u16 spec_port, u32 id,
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 3a60efa7799b..e17006b6e434 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -181,6 +181,12 @@ nf_conntrack_helper_try_module_get(const char *name, u16 
l3num, u8 protonum)
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_try_module_get);
 
+void nf_conntrack_helper_put(struct nf_conntrack_helper *helper)
+{
+   module_put(helper->me);
+}
+EXPORT_SYMBOL_GPL(nf_conntrack_helper_put);
+
 struct nf_conn_help *
 nf_ct_helper_ext_add(struct nf_conn *ct,
 struct nf_conntrack_helper *helper, gfp_t gfp)
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index a34ceb38fc55..1678e9e75e8e 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -826,9 +826,9 @@ static void nft_ct_helper_obj_destroy(struct nft_object 
*obj)
struct nft_ct_helper_obj *priv = nft_obj_data(obj);
 
if (priv->helper4)
-   module_put(priv->helper4->me);
+   nf_conntrack_helper_put(priv->helper4);
if (priv->helper6)
-   module_put(priv->helper6->me);
+   nf_conntrack_helper_put(priv->helper6);
 }
 
 static void nft_ct_helper_obj_eval(struct nft_object *obj,
diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index bb7ad82dcd56..623ef37de886 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -96,7 +96,7 @@ xt_ct_set_helper(struct nf_conn *ct, const char *helper_name,
 
help = nf_ct_helper_ext_add(ct, helper, GFP_KERNEL);
if (help == NULL) {
-   module_put(helper->me);
+   nf_conntrack_helper_put(helper);
return -ENOMEM;
}
 
@@ -263,7 +263,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
 err4:
help = nfct_help(ct);
if (help)
-   module_put(help->helper->me);
+   nf_conntrack_helper_put(help->helper);
 err3:
nf_ct_tmpl_free(ct);
 err2:
@@ -346,7 +346,7 @@ static void xt_ct_tg_destroy(const struct xt_tgdtor_param 
*par,
if (ct) {
help = nfct_help(ct);
if (help)
-   module_put(help->helper->me);
+   nf_conntrack_helper_put(help->helper);
 
nf_ct_netns_put(par->net, par->family);
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index bf602e33c40a..08679ebb3068 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1123,7 +1123,7 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info 
*info, const char *name,
 
help = nf_ct_helper_ext_add(info->ct, helper, GFP_KERNEL);
if (!help) {
-   module_put(helper->me);
+   nf_conntrack_helper_put(helper);
return -ENOMEM;
}
 
@@ -1584,7 +1584,7 @@ void ovs_ct_free_action(const struct nlattr *a)
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info)
 {
if (ct_info->helper)
-   module_put(ct_info->helper->me);
+   nf_conntrack_helper_put(ct_info->helper);
if (ct_info->ct)
nf_ct_tmpl_free(ct_info->ct);
 }
-- 
2.1.4

[PATCH 06/12] netfilter: xtables: zero padding in data_to_user

2017-05-19 Thread Pablo Neira Ayuso

From: Willem de Bruijn 

When looking up an iptables rule, the iptables binary compares the
aligned match and target data (XT_ALIGN). In some cases this can
exceed the actual data size to include padding bytes.

Before commit f77bc5b23fb1 ("iptables: use match, target and data
copy_to_user helpers") the malloc()ed bytes were overwritten by the
kernel with kzalloced contents, zeroing the padding and making the
comparison succeed. After this patch, the kernel copies and clears
only data, leaving the padding bytes undefined.

Extend the clear operation from data size to aligned data size to
include the padding bytes, if any.

Padding bytes can be observed in both match and target, and the bug
triggered, by issuing a rule with match icmp and target ACCEPT:

  iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
  iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT

Fixes: f77bc5b23fb1 ("iptables: use match, target and data copy_to_user 
helpers")
Reported-by: Paul Moore 
Reported-by: Richard Guy Briggs 
Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter/x_tables.h | 2 +-
 net/bridge/netfilter/ebtables.c| 9 ++---
 net/netfilter/x_tables.c   | 9 ++---
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h 
b/include/linux/netfilter/x_tables.h
index be378cf47fcc..b3044c2c62cb 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -294,7 +294,7 @@ int xt_match_to_user(const struct xt_entry_match *m,
 int xt_target_to_user(const struct xt_entry_target *t,
  struct xt_entry_target __user *u);
 int xt_data_to_user(void __user *dst, const void *src,
-   int usersize, int size);
+   int usersize, int size, int aligned_size);
 
 void *xt_copy_counters_from_user(const void __user *user, unsigned int len,
 struct xt_counters_info *info, bool compat);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 9ec0c9f908fa..9c6e619f452b 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1373,7 +1373,8 @@ static inline int ebt_obj_to_user(char __user *um, const 
char *_name,
strlcpy(name, _name, sizeof(name));
if (copy_to_user(um, name, EBT_FUNCTION_MAXNAMELEN) ||
put_user(datasize, (int __user *)(um + EBT_FUNCTION_MAXNAMELEN)) ||
-   xt_data_to_user(um + entrysize, data, usersize, datasize))
+   xt_data_to_user(um + entrysize, data, usersize, datasize,
+   XT_ALIGN(datasize)))
return -EFAULT;
 
return 0;
@@ -1658,7 +1659,8 @@ static int compat_match_to_user(struct ebt_entry_match 
*m, void __user **dstptr,
if (match->compat_to_user(cm->data, m->data))
return -EFAULT;
} else {
-   if (xt_data_to_user(cm->data, m->data, match->usersize, msize))
+   if (xt_data_to_user(cm->data, m->data, match->usersize, msize,
+   COMPAT_XT_ALIGN(msize)))
return -EFAULT;
}
 
@@ -1687,7 +1689,8 @@ static int compat_target_to_user(struct ebt_entry_target 
*t,
if (target->compat_to_user(cm->data, t->data))
return -EFAULT;
} else {
-   if (xt_data_to_user(cm->data, t->data, target->usersize, tsize))
+   if (xt_data_to_user(cm->data, t->data, target->usersize, tsize,
+   COMPAT_XT_ALIGN(tsize)))
return -EFAULT;
}
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8876b7da6884..d17769599c10 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -283,12 +283,13 @@ static int xt_obj_to_user(u16 __user *psize, u16 size,
   >u.user.revision, K->u.kernel.TYPE->revision)
 
 int xt_data_to_user(void __user *dst, const void *src,
-   int usersize, int size)
+   int usersize, int size, int aligned_size)
 {
usersize = usersize ? : size;
if (copy_to_user(dst, src, usersize))
return -EFAULT;
-   if (usersize != size && clear_user(dst + usersize, size - usersize))
+   if (usersize != aligned_size &&
+   clear_user(dst + usersize, aligned_size - usersize))
return -EFAULT;
 
return 0;
@@ -298,7 +299,9 @@ EXPORT_SYMBOL_GPL(xt_data_to_user);
 #define XT_DATA_TO_USER(U, K, TYPE, C_SIZE)\
xt_data_to_user(U->data, K->data,   \
K->u.kernel.TYPE->usersize, \
-   C_SIZE ? : K->u.kernel.TYPE->TYPE##size)
+

[PATCH 07/12] netfilter: synproxy: fix conntrackd interaction

2017-05-19 Thread Pablo Neira Ayuso

From: Eric Leblond 

This patch fixes the creation of connection tracking entry from
netlink when synproxy is used. It was missing the addition of
the synproxy extension.

This was causing kernel crashes when a conntrack entry created by
conntrackd was used after the switch of traffic from active node
to the passive node.

Signed-off-by: Eric Leblond 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index fa752626029e..9799a50bc604 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -45,6 +45,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #ifdef CONFIG_NF_NAT_NEEDED
 #include 
 #include 
@@ -1827,6 +1829,8 @@ ctnetlink_create_conntrack(struct net *net,
nf_ct_tstamp_ext_add(ct, GFP_ATOMIC);
nf_ct_ecache_ext_add(ct, 0, 0, GFP_ATOMIC);
nf_ct_labels_ext_add(ct);
+   nfct_seqadj_ext_add(ct);
+   nfct_synproxy_ext_add(ct);
 
/* we must add conntrack extensions before confirmation. */
ct->status |= IPS_CONFIRMED;
-- 
2.1.4

[PATCH 05/12] netfilter: nfnl_cthelper: reject del request if helper obj is in use

2017-05-19 Thread Pablo Neira Ayuso

From: Liping Zhang 

We can still delete the ct helper even if it is in use, this will cause
a use-after-free error. In more detail, I mean:
  # nfct helper add ssdp inet udp
  # iptables -t raw -A OUTPUT -p udp -j CT --helper ssdp
  # nfct helper delete ssdp //--> oops, succeed!
  BUG: unable to handle kernel paging request at 26ca
  IP: 0x26ca
  [...]
  Call Trace:
   ? ipv4_helper+0x62/0x80 [nf_conntrack_ipv4]
   nf_hook_slow+0x21/0xb0
   ip_output+0xe9/0x100
   ? ip_fragment.constprop.54+0xc0/0xc0
   ip_local_out+0x33/0x40
   ip_send_skb+0x16/0x80
   udp_send_skb+0x84/0x240
   udp_sendmsg+0x35d/0xa50

So add reference count to fix this issue, if ct helper is used by
others, reject the delete request.

Apply this patch:
  # nfct helper delete ssdp
  nfct v1.4.3: netlink error: Device or resource busy

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_conntrack_helper.h |  2 ++
 net/netfilter/nf_conntrack_helper.c |  6 ++
 net/netfilter/nfnetlink_cthelper.c  | 17 +++--
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index c1c12411103a..c519bb5b5bb8 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -9,6 +9,7 @@
 
 #ifndef _NF_CONNTRACK_HELPER_H
 #define _NF_CONNTRACK_HELPER_H
+#include 
 #include 
 #include 
 #include 
@@ -26,6 +27,7 @@ struct nf_conntrack_helper {
struct hlist_node hnode;/* Internal use. */
 
char name[NF_CT_HELPER_NAME_LEN]; /* name of the module */
+   refcount_t refcnt;
struct module *me;  /* pointer to self */
const struct nf_conntrack_expect_policy *expect_policy;
 
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index e17006b6e434..7f6100ca63be 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -174,6 +174,10 @@ nf_conntrack_helper_try_module_get(const char *name, u16 
l3num, u8 protonum)
 #endif
if (h != NULL && !try_module_get(h->me))
h = NULL;
+   if (h != NULL && !refcount_inc_not_zero(>refcnt)) {
+   module_put(h->me);
+   h = NULL;
+   }
 
rcu_read_unlock();
 
@@ -183,6 +187,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_helper_try_module_get);
 
 void nf_conntrack_helper_put(struct nf_conntrack_helper *helper)
 {
+   refcount_dec(>refcnt);
module_put(helper->me);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_put);
@@ -423,6 +428,7 @@ int nf_conntrack_helper_register(struct nf_conntrack_helper 
*me)
}
}
}
+   refcount_set(>refcnt, 1);
hlist_add_head_rcu(>hnode, _ct_helper_hash[h]);
nf_ct_helper_count++;
 out:
diff --git a/net/netfilter/nfnetlink_cthelper.c 
b/net/netfilter/nfnetlink_cthelper.c
index 950bf6eadc65..be678a323598 100644
--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -686,6 +686,7 @@ static int nfnl_cthelper_del(struct net *net, struct sock 
*nfnl,
tuple_set = true;
}
 
+   ret = -ENOENT;
list_for_each_entry_safe(nlcth, n, _cthelper_list, list) {
cur = >helper;
j++;
@@ -699,16 +700,20 @@ static int nfnl_cthelper_del(struct net *net, struct sock 
*nfnl,
 tuple.dst.protonum != cur->tuple.dst.protonum))
continue;
 
-   found = true;
-   nf_conntrack_helper_unregister(cur);
-   kfree(cur->expect_policy);
+   if (refcount_dec_if_one(>refcnt)) {
+   found = true;
+   nf_conntrack_helper_unregister(cur);
+   kfree(cur->expect_policy);
 
-   list_del(>list);
-   kfree(nlcth);
+   list_del(>list);
+   kfree(nlcth);
+   } else {
+   ret = -EBUSY;
+   }
}
 
/* Make sure we return success if we flush and there is no helpers */
-   return (found || j == 0) ? 0 : -ENOENT;
+   return (found || j == 0) ? 0 : ret;
 }
 
 static const struct nla_policy nfnl_cthelper_policy[NFCTH_MAX+1] = {
-- 
2.1.4

[PATCH 01/12] ipvs: SNAT packet replies only for NATed connections

2017-05-19 Thread Pablo Neira Ayuso

From: Julian Anastasov 

We do not check if packet from real server is for NAT
connection before performing SNAT. This causes problems
for setups that use DR/TUN and allow local clients to
access the real server directly, for example:

- local client in director creates IPVS-DR/TUN connection
CIP->VIP and the request packets are routed to RIP.
Talks are finished but IPVS connection is not expired yet.

- second local client creates non-IPVS connection CIP->RIP
with same reply tuple RIP->CIP and when replies are received
on LOCAL_IN we wrongly assign them for the first client
connection because RIP->CIP matches the reply direction.
As result, IPVS SNATs replies for non-IPVS connections.

The problem is more visible to local UDP clients but in rare
cases it can happen also for TCP or remote clients when the
real server sends the reply traffic via the director.

So, better to be more precise for the reply traffic.
As replies are not expected for DR/TUN connections, better
to not touch them.

Reported-by: Nick Moriarty 
Tested-by: Nick Moriarty 
Signed-off-by: Julian Anastasov 
Signed-off-by: Simon Horman 
---
 net/netfilter/ipvs/ip_vs_core.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index d2d7bdf1d510..ad99c1ceea6f 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -849,10 +849,8 @@ static int handle_response_icmp(int af, struct sk_buff 
*skb,
 {
unsigned int verdict = NF_DROP;
 
-   if (IP_VS_FWD_METHOD(cp) != 0) {
-   pr_err("shouldn't reach here, because the box is on the "
-  "half connection in the tun/dr module.\n");
-   }
+   if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
+   goto ignore_cp;
 
/* Ensure the checksum is correct */
if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) {
@@ -886,6 +884,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
ip_vs_notrack(skb);
else
ip_vs_update_conntrack(skb, cp, 0);
+
+ignore_cp:
verdict = NF_ACCEPT;
 
 out:
@@ -1385,8 +1385,11 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, 
struct sk_buff *skb, in
 */
cp = pp->conn_out_get(ipvs, af, skb, );
 
-   if (likely(cp))
+   if (likely(cp)) {
+   if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
+   goto ignore_cp;
return handle_response(af, skb, pd, cp, , hooknum);
+   }
 
/* Check for real-server-started requests */
if (atomic_read(>conn_out_counter)) {
@@ -1444,9 +1447,15 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, 
struct sk_buff *skb, in
}
}
}
+
+out:
IP_VS_DBG_PKT(12, af, pp, skb, iph.off,
  "ip_vs_out: packet continues traversal as normal");
return NF_ACCEPT;
+
+ignore_cp:
+   __ip_vs_conn_put(cp);
+   goto out;
 }
 
 /*
-- 
2.1.4

Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.

2017-05-19 Thread Vlad Yasevich

On 05/19/2017 04:16 AM, Toshiaki Makita wrote:
> On 2017/05/19 16:09, Vlad Yasevich wrote:
>> On 05/18/2017 10:13 PM, Toshiaki Makita wrote:
>>> On 2017/05/18 22:31, Vladislav Yasevich wrote:
 It appears that since commit 8cb65d000, Q-in-Q vlans have been
 broken.  The series that commit is part of enabled TSO and checksum
 offloading on Q-in-Q vlans.  However, most HW we support can't handle
 it.  To work around the issue, the above commit added a function that
 turns off offloads on Q-in-Q devices, but it left the checksum offload.
 That will cause issues with most older devices that supprort very basic
 checksum offload capabilities as well as some newer devices (we've
 reproduced te problem with both be2net and bnx).

 To solve this for everyone, turn off checksum offloading feature
 by default when sending Q-in-Q traffic.  Devices that are proven to
 work can provided a corrected ndo_features_check implemetation.

 Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers")
 CC: Toshiaki Makita 
 Signed-off-by: Vladislav Yasevich 
 ---
  include/linux/if_vlan.h | 1 -
  1 file changed, 1 deletion(-)

 diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
 index 8d5fcd6..ae537f0 100644
 --- a/include/linux/if_vlan.h
 +++ b/include/linux/if_vlan.h
 @@ -619,7 +619,6 @@ static inline netdev_features_t 
 vlan_features_check(const struct sk_buff *skb,
 NETIF_F_SG |
 NETIF_F_HIGHDMA |
 NETIF_F_FRAGLIST |
 -   NETIF_F_HW_CSUM |
 NETIF_F_HW_VLAN_CTAG_TX |
 NETIF_F_HW_VLAN_STAG_TX);
  
>>>
>>> I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem
>>> is IP_CSUM and IPV6_CSUM.
>>> So wouldn't it be better to leave HW_CSUM and drop IP_CSUM/IPV6_CSUM,
>>> i.e. change intersection into bitwise AND?
>>>
>>
>> It wasn't really a problem before accelerations got enabled on q-in-q
>> vlans.
> 
> Right for stacked vlan device.
> But I think the check was there for packets from guests forwarded by
> bridge to vlan device so it was a problem before 8cb65d000.

Not really, since stacked vlans in guests wouldn't have accelerations on.
Haven't really tried a new guest on old hosts.  It might be an issue there...

> 
>>> The intersection was introduced in db115037bb57 ("net: fix checksum
>>> features handling in netif_skb_features()"), but I guess for this
>>> particular check the intersection was not needed.
>>>
>>
>> So, to put it another way, leave the intersection with HW_CSUM in the mask,
>> and then do:
>>
>>   return features & ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
>>
>> This might work, but it assumes that everyone who announce HW_CSUM can
>> do q-in-q vlans.  It's been a bit of a pain tracking this down and I'd rather
>> fix it for everyone and let individual driver authors verify that Q-in-Q 
>> works
>> correctly with HW checksum.  However, I am willing to do the above if
>> that's what people want.
> 
> At least HW_CSUM in the check was introduced intentionally.
> https://www.spinics.net/lists/netdev/msg152016.html
> 
> And I think HW_CSUM should work with any packets.
> You know, include/linux/skbuff.h says
>>  *   NETIF_F_HW_CSUM - The driver (or its device) is able to compute one
>>  * IP (one's complement) checksum for any combination
>>  * of protocols or protocol layering.
> 
> We should be able to safely enable it.
> 
> ...But you are so worried about layer2 protocol handling of HW_CSUM
> devices, I'm ok with disabling it for now.
> 

It's a concern after running across this issue.  Granted, the few devices
we've seen this bug on use IP/IPV6 checksum features.  I am hoping someone
else might weigh in here.

-vlad

Re: [PATCH v5 net-next 5/7] net: fix documentation of struct scm_timestamping

2017-05-19 Thread Miroslav Lichvar

On Thu, May 18, 2017 at 03:38:30PM -0400, Willem de Bruijn wrote:
> On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar  
> wrote:
> > +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
> > +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
> > +software timestamp will be generated in the recvmsg() call and passed
> > +in ts[0] when a real software timestamp is missing.
> 
> With receive software timestamping this is expected behavior? I would make
> explicit that this happens even on tx timestamps.

How about adding ", e.g. when receive timestamping is enabled
between receiving the message and the recvmsg() call, or it is a
message with a hardware transmit timestamp." ?

> > For this reason it
> > +is not recommended to combine SO_TIMESTAMP(NS) with SO_TIMESTAMPING.
> 
> And I'd remove this. The extra timestamp is harmless, and we may be missing
> other reasons why someone would want to enable both on the same socket.

Ok. I'm just concerned people will inadvertently use the timestamp as
a real timestamp and then wonder why SW TX timestamping is so bad. I
have fallen into this trap.

-- 
Miroslav Lichvar

[PATCH net-next v2] ibmveth: Support to enable LSO/CSO for Trunk VEA.

2017-05-19 Thread Sivakumar Krishnasamy

Current largesend and checksum offload feature in ibmveth driver,
 - Source VM sends the TCP packets with ip_summed field set as
   CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in
   checksum field
 - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark
   "no checksum" and "checksum good" bits in transmit buffer descriptor
   before the packet is delivered to pseries PowerVM Hypervisor
 - If ibmveth has largesend capability enabled, transmit buffer descriptors
   are market accordingly before packet is delivered to Hypervisor
   (along with mss value for packets with length > MSS)
 - Destination VM's ibmveth driver receives the packet with "checksum good"
   bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY
 - If "largesend" bit was on, mss value is copied from receive descriptor
   into SKB's gso_size and other flags are appropriately set for
   packets > MSS size
 - The packet is now successfully delivered up the stack in destination VM

The offloads described above works fine for TCP communication among VMs in
the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B )

We are now enabling support for OVS in pseries PowerVM environment. One of
our requirements is to have ibmveth driver configured in "Trunk" mode, when
they are used with OVS. This is because, PowerVM Hypervisor will no more
bridge the packets between VMs, instead the packets are delivered to
IO Server which hosts OVS to bridge them between VMs or to external
networks (flow shown below),
  VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor
   <=> VM B
In "IO server" the packet is received by inbound Trunk ibmveth and then
delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown
below),
Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth

In this model, we hit the following issues which impacted the VM
communication performance,

 - Issue 1: ibmveth doesn't support largesend and checksum offload features
   when configured as "Trunk". Driver has explicit checks to prevent
   enabling these offloads.

 - Issue 2: SYN packet drops seen at destination VM. When the packet
   originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to
   IO server's inbound Trunk ibmveth, on validating "checksum good" bits
   in ibmveth receive routine, SKB's ip_summed field is set with
   CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux
   Bridge) and delivered to outbound Trunk ibmveth. At this point the
   outbound ibmveth transmit routine will not set "no checksum" and
   "checksum good" bits in transmit buffer descriptor, as it does so only
   when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets
   delivered to destination VM, TCP layer receives the packet with checksum
   value of 0 and with no checksum related flags in ip_summed field. This
   leads to packet drops. So, TCP connections never goes through fine.

 - Issue 3: First packet of a TCP connection will be dropped, if there is
   no OVS flow cached in datapath. OVS while trying to identify the flow,
   computes the checksum. The computed checksum will be invalid at the
   receiving end, as ibmveth transmit routine zeroes out the pseudo
   checksum value in the packet. This leads to packet drop.

 - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list.
   When Physical NIC has GRO enabled and when OVS bridges these packets,
   OVS vport send code will end up calling dev_queue_xmit, which in turn
   calls validate_xmit_skb.
   In validate_xmit_skb routine, the larger packets will get segmented into
   MSS sized segments, if SKB has a frag_list and if the driver to which
   they are delivered to doesn't support NETIF_F_FRAGLIST feature.

This patch addresses the above four issues, thereby enabling end to end
largesend and checksum offload support for better performance.

 - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and
   checksum offloads.
 - Fix for Issue 2 : When ibmveth receives a packet with "checksum good"
   bit set and if its configured in Trunk mode, set appropriate SKB fields
   using skb_partial_csum_set (ip_summed field is set with
   CHECKSUM_PARTIAL)
 - Fix for Issue 3: Recompute the pseudo header checksum before sending the
   SKB up the stack.
 - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up
   allocating buffers and copying data, this fix gives
   upto 4X throughput increase.

Note: All these fixes need to be dropped together as fixing just one of
them will lead to other issues immediately (especially for Issues 1,2 & 3).

Signed-off-by: Sivakumar Krishnasamy 
---
 drivers/net/ethernet/ibm/ibmveth.c | 107 ++---
 drivers/net/ethernet/ibm/ibmveth.h |   1 +
 2 files changed, 90 insertions(+), 18 deletions(-)

diff --git

Re: [PATCH v5 net-next 4/7] net: add new control message for incoming HW-timestamped packets

2017-05-19 Thread Miroslav Lichvar

On Thu, May 18, 2017 at 04:20:53PM -0400, Willem de Bruijn wrote:
> On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar  
> wrote:
> > +SOF_TIMESTAMPING_OPT_PKTINFO:
> > +
> > +  Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
> > +  packets with hardware timestamps. The message contains struct
> > +  scm_ts_pktinfo, which supplies the index of the real interface which
> > +  received the packet and its length at layer 2. A valid (non-zero)
> > +  interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is
> > +  enabled and the driver is using NAPI.
> 
> It is probably good to explicitly call out that the remaining two fields
> are reserved and undefined. To stress that applications cannot be
> overly pedantic and start failing if these become non-zero.

Ok. I'm adding "The struct contains also two other fields, but they
are reserved and undefined".

-- 
Miroslav Lichvar

[PATCH v2] xfrm: fix state migration copy replay sequence numbers

2017-05-19 Thread Antony Antony

During xfrm migration copy replay and preplay sequence numbers
from the previous state.

Here is a tcpdump output showing the problem.
10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder.
After the migration it sent wrong sequence number, reset to 1.
The migration is from 10.0.0.52 to 10.0.0.53.

IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), 
length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), 
length 136
IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), 
length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), 
length 136

IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]

IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), 
length 136

NOTE: next sequence is wrong 0x1

IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), 
length 136
IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), 
length 136
IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), 
length 136

Signed-off-by: Antony Antony 
---
Changes in v2:
  - include tcpdump output showing the problem in the commit message.

 net/xfrm/xfrm_state.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index fc3c5aa..2e291bc 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1383,6 +1383,8 @@ static struct xfrm_state *xfrm_state_clone(struct 
xfrm_state *orig)
x->curlft.add_time = orig->curlft.add_time;
x->km.state = orig->km.state;
x->km.seq = orig->km.seq;
+   x->replay = orig->replay;
+   x->preplay = orig->preplay;
 
return x;
 
-- 
2.9.3

Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers

2017-05-19 Thread Jesper Dangaard Brouer

On Thu, 18 May 2017 17:41:48 +0200
Jesper Dangaard Brouer  wrote:

> Introducing a new XDP feature and associated bpf helper bpf_xdp_rxhash.
> 
> The rxhash and type allow filtering on packets without touching
> packet memory.  The performance difference on my system with a
> 100 Gbit/s mlx5 NIC is 12Mpps to 19Mpps.

The XDP/bpf program I use (called xdp_rxhash) for testing this feature
is available via my github repo here:

 
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_rxhash_kern.c
 
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_rxhash_user.c

The cmdline output looks like:

$ sudo ./xdp_rxhash --dev mlx5p2 --sec 2 --notouch

xdp-action ppspps-human-readable mem  
XDP_ABORTED19694682   19,694,682 2.000205  no_touch
XDP_DROP   0  0  2.000205  no_touch
XDP_PASS   10 10 2.000205  no_touch
XDP_TX 0  0  2.000205  no_touch
rx_total   19694701   19,694,701 2.000205  no_touch

hash_type:L3   ppspps-human-readable sample-period
Unknown0  0  2.000205
IPv4   19694725   19,694,725 2.000205
IPv6   0  0  2.000205

hash_type:L4   ppspps-human-readable sample-period
Unknown10 10 2.000205
TCP0  0  2.000205
UDP19694697   19,694,697 2.000205

^CInterrupted: Removing XDP program on ifindex:5 device:mlx5p2


The 10 pps XDP_PASS is a ping command I rand at the same time. Notice
how these ping-ICMP packets were categorized as L4=Unknown and L3=IPv4.
The L4 categorization is usually UDP or TCP, but looking at driver-code
it seems some HW support detecting other L4 types, like ICMP, SCTP, etc.


$ sudo taskset -c 4 ping -i 0.1 -c 1 198.18.100.1 -c 100 -q
PING 198.18.100.1 (198.18.100.1) 56(84) bytes of data.

--- 198.18.100.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 10294ms
rtt min/avg/max/mdev = 0.010/0.071/0.113/0.043 ms


p.s. xdp-newbies can via this commit find the links back to the netdev
kernel RFC patches/mails:
 https://github.com/netoptimizer/prototype-kernel/commit/9647e1b563970

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH v3 net-next 5/5] dsa: add maintainer of Microchip KSZ switches

2017-05-19 Thread Florian Fainelli

On 05/19/2017 03:57 PM, woojung@microchip.com wrote:
> From: Woojung Huh 
> 
> Adding maintainer of Microchip KSZ switches.
> 
> Signed-off-by: Woojung Huh 
> Reviewed-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 

> ---
>  MAINTAINERS | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f7d568b..a72b40c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8454,6 +8454,16 @@ F: drivers/media/platform/atmel/atmel-isc.c
>  F:   drivers/media/platform/atmel/atmel-isc-regs.h
>  F:   devicetree/bindings/media/atmel-isc.txt
>  
> +MICROCHIP KSZ SERIES ETHERNET SWITCH DRIVER
> +M:   Woojung Huh 
> +M:   Microchip Linux Driver Support 
> +L:   netdev@vger.kernel.org
> +S:   Maintained
> +F:   net/dsa/tag_ksz.c
> +F:   drivers/net/dsa/microchip/*
> +F:   include/linux/platform_data/microchip-ksz.h
> +F:   Documentation/devicetree/bindings/net/dsa/ksz.txt
> +
>  MICROCHIP USB251XB DRIVER
>  M:   Richard Leitner 
>  L:   linux-...@vger.kernel.org
> 


-- 
Florian

Re: [PATCH v3 net-next 4/5] dsa: Add spi support to Microchip KSZ switches

2017-05-19 Thread Florian Fainelli

On 05/19/2017 03:57 PM, woojung@microchip.com wrote:
> From: Woojung Huh 
> 
> A sample SPI configuration for Microchip KSZ switches.
> 
> Signed-off-by: Woojung Huh 
> Reviewed-by: Andrew Lunn 

Subject should be something like:

dt-bindings: net: dsa: Add Microchip KSZ switches binding

With that fixed and the nits below:

Reviewed-by: Florian Fainelli 

> ---
>  Documentation/devicetree/bindings/net/dsa/ksz.txt | 73 
> +++
>  1 file changed, 73 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt 
> b/Documentation/devicetree/bindings/net/dsa/ksz.txt
> new file mode 100644
> index 000..8a13966
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
> @@ -0,0 +1,73 @@
> +Microchip KSZ Series Ethernet switches
> +==
> +
> +Required properties:
> +
> +- compatible: For external switch chips, compatible string must be exactly 
> one
> +  of: "microchip,ksz9477"
> +
> +See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional
> +required and optional properties.
> +
> +Examples:
> +
> +Ethernet switch connected via SPI to the host, CPU port wired to eth0:
> +
> + eth0: ethernet@10001000 {
> + fixed-link {
> + reg = <7>

There is a missing semicolon, and for a fixed-link, there is no "reg"
property.

> + speed = <1000>;
> + duplex-full;

Actually the correct property is named "full-duplex" (like you put it
for the switch)

> + };
> + };
> +
> + spi1: spi@f8008000 {
> + pinctrl-0 = <_spi_ksz>;
> + cs-gpios = < 25 0>;
> + id = <1>;
> + status = "okay";
> +
> + ksz9477: ksz9477@0 {
> + compatible = "microchip,ksz9477";
> + reg = <0>;
> +
> + spi-max-frequency = <4400>;
> + spi-cpha;
> + spi-cpol;
> +
> + status = "okay";
> + ports {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + port@0 {
> + reg = <0>;
> + label = "lan1";
> + };
> + port@1 {
> + reg = <1>;
> + label = "lan2";
> + };
> + port@2 {
> + reg = <2>;
> + label = "lan3";
> + };
> + port@3 {
> + reg = <3>;
> + label = "lan4";
> + };
> + port@4 {
> + reg = <4>;
> + label = "lan5";
> + };
> + port@5 {
> + reg = <5>;
> + label = "cpu";
> + ethernet = <>;
> + fixed-link {
> + speed = <1000>;
> + full-duplex;
> + };
> + };
> + };
> + };
> + };
> 


-- 
Florian

RE: [PATCH v3 net-next 1/5] dsa: add support for Microchip KSZ tail tagging

2017-05-19 Thread Woojung.Huh

>> + if (padlen) {
>> + u8 *pad = skb_put(nskb, padlen);
>> +
>> + memset(pad, 0, padlen);
>> + }
>
>Can you use skb_put_padto() here instead of open coding this?
>
>> +
>> + tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
>> + tag[0] = 0;
>> + tag[1] = 1 << p->dp->index; /* destnation port */
>
>typo: destination port
>
>With that fixed:
>
>Reviewed-by: Florian Fainelli 
HI Florian,

Thanks for prompt reviews. Will submit another version.

- Woojung

Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers

2017-05-19 Thread Jakub Kicinski

On Fri, 19 May 2017 20:07:52 -0700, Alexei Starovoitov wrote:
> How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is?
> We can make sure that XDP program does read only access into it and
> it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else
> in there, but this will not be uapi and it will be pretty obvious
> to program authors that their programs are vendor specific.
> 'not uapi' here means that mellanox is free to change their HW descriptor
> and its contents as they wish.

Hm..  Would that mean we have to teach the verifier about all possible
drivers and their metadata structures (i.e. sizes thereof).  And add an
UAPI enum of known drivers?

Other idea I floated in early days was to standardize the fields but
let the driver "JIT" the accesses to look at the right offset of the
right structure.  Admittedly that would be a lot more work.

[PATCH v2 net-next] net: ipv6: fix code style error and warning of ndisc.c

2017-05-19 Thread yuan linyu

From: yuan linyu 

Signed-off-by: yuan linyu 
---
 net/ipv6/ndisc.c | 300 ---
 1 file changed, 155 insertions(+), 145 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index d310dc4..5a3dfaa 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -12,8 +12,7 @@
  *  2 of the License, or (at your option) any later version.
  */
 
-/*
- * Changes:
+/* Changes:
  *
  * Alexey I. Froloff   :   RFC6106 (DNSSL) support
  * Pierre Ynard:   export userland ND options
@@ -99,7 +98,6 @@ static const struct neigh_ops ndisc_hh_ops = {
.connected_output = neigh_resolve_output,
 };
 
-
 static const struct neigh_ops ndisc_direct_ops = {
.family =   AF_INET6,
.output =   neigh_direct_output,
@@ -147,13 +145,13 @@ void __ndisc_fill_addr_option(struct sk_buff *skb, int 
type, void *data,
u8 *opt = skb_put(skb, space);
 
opt[0] = type;
-   opt[1] = space>>3;
+   opt[1] = space >> 3;
 
memset(opt + 2, 0, pad);
opt   += pad;
space -= pad;
 
-   memcpy(opt+2, data, data_len);
+   memcpy(opt + 2, data, data_len);
data_len += 2;
opt += data_len;
space -= data_len;
@@ -182,6 +180,7 @@ static struct nd_opt_hdr *ndisc_next_option(struct 
nd_opt_hdr *cur,
struct nd_opt_hdr *end)
 {
int type;
+
if (!cur || !end || cur >= end)
return NULL;
type = cur->nd_opt_type;
@@ -222,6 +221,7 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
memset(ndopts, 0, sizeof(*ndopts));
while (opt_len) {
int l;
+
if (opt_len < sizeof(struct nd_opt_hdr))
return NULL;
l = nd_opt->nd_opt_len << 3;
@@ -240,13 +240,15 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
  "%s: duplicated ND6 option found: 
type=%d\n",
  __func__, nd_opt->nd_opt_type);
} else {
-   ndopts->nd_opt_array[nd_opt->nd_opt_type] = 
nd_opt;
+   ndopts->nd_opt_array[nd_opt->nd_opt_type] =
+   nd_opt;
}
break;
case ND_OPT_PREFIX_INFO:
ndopts->nd_opts_pi_end = nd_opt;
if (!ndopts->nd_opt_array[nd_opt->nd_opt_type])
-   ndopts->nd_opt_array[nd_opt->nd_opt_type] = 
nd_opt;
+   ndopts->nd_opt_array[nd_opt->nd_opt_type] =
+   nd_opt;
break;
 #ifdef CONFIG_IPV6_ROUTE_INFO
case ND_OPT_ROUTE_INFO:
@@ -261,8 +263,7 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
if (!ndopts->nd_useropts)
ndopts->nd_useropts = nd_opt;
} else {
-   /*
-* Unknown options must be silently ignored,
+   /* Unknown options must be silently ignored,
 * to accommodate future extension to the
 * protocol.
 */
@@ -280,7 +281,8 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
return ndopts;
 }
 
-int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device 
*dev, int dir)
+int ndisc_mc_map(const struct in6_addr *addr, char *buf,
+struct net_device *dev, int dir)
 {
switch (dev->type) {
case ARPHRD_ETHER:
@@ -327,9 +329,8 @@ static int ndisc_constructor(struct neighbour *neigh)
bool is_multicast = ipv6_addr_is_multicast(addr);
 
in6_dev = in6_dev_get(dev);
-   if (!in6_dev) {
+   if (!in6_dev)
return -EINVAL;
-   }
 
parms = in6_dev->nd_parms;
__neigh_parms_put(neigh->parms);
@@ -344,12 +345,12 @@ static int ndisc_constructor(struct neighbour *neigh)
if (is_multicast) {
neigh->nud_state = NUD_NOARP;
ndisc_mc_map(addr, neigh->ha, dev, 1);
-   } else if (dev->flags&(IFF_NOARP|IFF_LOOPBACK)) {
+   } else if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) {
neigh->nud_state = NUD_NOARP;
memcpy(neigh->ha, dev->dev_addr, dev->addr_len);
-   if (dev->flags_LOOPBACK)
+   if (dev->flags & IFF_LOOPBACK)
neigh->type = RTN_LOCAL;
-

Re: [PATCH v3 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477

2017-05-19 Thread Florian Fainelli

On 05/19/2017 03:57 PM, woojung@microchip.com wrote:
> From: Woojung Huh 
> 
> The KSZ9477 is a fully integrated layer 2, managed, 7 ports GigE switch
> with numerous advanced features. 5 ports incorporate 10/100/1000 Mbps PHYs.
> The other 2 ports have interfaces that can be configured as SGMII, RGMII, MII
> or RMII. Either of these may connect directly to a host processor or
> to an external PHY. The SGMII port may interface to a fiber optic transceiver.
> 
> This driver currently supports vlan, fdb, mdb & mirror dsa switch operations.
> 
> Signed-off-by: Woojung Huh 

Looks great, thanks Woojung!

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH] i40e: Fix incorrect pf->flags

2017-05-19 Thread Tushar Dave

Fix bug introduced by 'commit 47994c119a36e ("i40e: remove
hw_disabled_flags in favor of using separate flag bits")' that
mistakenly wipes out pf->flags.

Signed-off-by: Tushar Dave 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d5c9c9e..6b98d34 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8821,9 +8821,9 @@ static int i40e_sw_init(struct i40e_pf *pf)
(pf->hw.aq.api_min_ver > 4))) {
/* Supported in FW API version higher than 1.4 */
pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
-   pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
+   pf->flags |= I40E_FLAG_HW_ATR_EVICT_CAPABLE;
} else {
-   pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
+   pf->flags |= I40E_FLAG_HW_ATR_EVICT_CAPABLE;
}
 
pf->eeprom_version = 0xDEAD;
-- 
1.9.1

Re: [PATCH v3 net-next 2/5] phy: micrel: add Microchip KSZ 9477 Switch PHY support

2017-05-19 Thread Florian Fainelli

On 05/19/2017 03:57 PM, woojung@microchip.com wrote:
> From: Woojung Huh 
> 
> Adding Microchip 9477 Phy included in KSZ9477 Switch.
> 
> Signed-off-by: Woojung Huh 
> Signed-off-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH v3 net-next 1/5] dsa: add support for Microchip KSZ tail tagging

2017-05-19 Thread Florian Fainelli

On 05/19/2017 03:57 PM, woojung@microchip.com wrote:
> From: Woojung Huh 
> 
> Adding support for the Microchip KSZ switch family tail tagging.
> 
> Signed-off-by: Woojung Huh 
> Reviewed-by: Andrew Lunn 
> ---
>  include/net/dsa.h  |   1 +
>  net/dsa/Kconfig|   3 ++
>  net/dsa/Makefile   |   1 +
>  net/dsa/dsa.c  |   3 ++
>  net/dsa/dsa_priv.h |   3 ++
>  net/dsa/tag_ksz.c  | 103 
> +
>  6 files changed, 114 insertions(+)
>  create mode 100644 net/dsa/tag_ksz.c
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 791fed6..fbb00a6 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -31,6 +31,7 @@ enum dsa_tag_protocol {
>   DSA_TAG_PROTO_BRCM,
>   DSA_TAG_PROTO_DSA,
>   DSA_TAG_PROTO_EDSA,
> + DSA_TAG_PROTO_KSZ,
>   DSA_TAG_PROTO_LAN9303,
>   DSA_TAG_PROTO_MTK,
>   DSA_TAG_PROTO_QCA,
> diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
> index 297389b..cc5f8f9 100644
> --- a/net/dsa/Kconfig
> +++ b/net/dsa/Kconfig
> @@ -25,6 +25,9 @@ config NET_DSA_TAG_DSA
>  config NET_DSA_TAG_EDSA
>   bool
>  
> +config NET_DSA_TAG_KSZ
> + bool
> +
>  config NET_DSA_TAG_LAN9303
>   bool
>  
> diff --git a/net/dsa/Makefile b/net/dsa/Makefile
> index f8c0251..b15141f 100644
> --- a/net/dsa/Makefile
> +++ b/net/dsa/Makefile
> @@ -6,6 +6,7 @@ dsa_core-y += dsa.o slave.o dsa2.o switch.o legacy.o
>  dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o
>  dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o
>  dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o
> +dsa_core-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o
>  dsa_core-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o
>  dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
>  dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 3288a80..402459e 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -49,6 +49,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = 
> {
>  #ifdef CONFIG_NET_DSA_TAG_EDSA
>   [DSA_TAG_PROTO_EDSA] = _netdev_ops,
>  #endif
> +#ifdef CONFIG_NET_DSA_TAG_KSZ
> + [DSA_TAG_PROTO_KSZ] = _netdev_ops,
> +#endif
>  #ifdef CONFIG_NET_DSA_TAG_LAN9303
>   [DSA_TAG_PROTO_LAN9303] = _netdev_ops,
>  #endif
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index c274130..6f23dfa 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -85,6 +85,9 @@ extern const struct dsa_device_ops dsa_netdev_ops;
>  /* tag_edsa.c */
>  extern const struct dsa_device_ops edsa_netdev_ops;
>  
> +/* tag_ksz.c */
> +extern const struct dsa_device_ops ksz_netdev_ops;
> +
>  /* tag_lan9303.c */
>  extern const struct dsa_device_ops lan9303_netdev_ops;
>  
> diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
> new file mode 100644
> index 000..cbc79b5
> --- /dev/null
> +++ b/net/dsa/tag_ksz.c
> @@ -0,0 +1,103 @@
> +/*
> + * net/dsa/tag_ksz.c - Microchip KSZ Switch tag format handling
> + * Copyright (c) 2017 Microchip Technology
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "dsa_priv.h"
> +
> +/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
> + * 
> ---
> + * 
> DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
> + * 
> ---
> + * tag0 : Prioritization (not used now)
> + * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
> + *
> + * For Egress (KSZ -> Host), 1 byte is added before FCS.
> + * 
> ---
> + * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
> + * 
> ---
> + * tag0 : zero-based value represents port
> + * (eg, 0x00=port1, 0x02=port3, 0x06=port7)
> + */
> +
> +#define  KSZ_INGRESS_TAG_LEN 2
> +#define  KSZ_EGRESS_TAG_LEN  1
> +
> +static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> + struct dsa_slave_priv *p = netdev_priv(dev);
> + struct sk_buff *nskb;
> + int padlen;
> + u8 *tag;
> +
> + padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
> +
> + if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
> + nskb = skb;
> + } else {
> + nskb = alloc_skb(NET_IP_ALIGN + skb->len +
> +  padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC);
> + if (!nskb) {
> + kfree_skb(skb);
> + return

Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread Alexei Starovoitov

On 5/19/17 4:16 PM, David Miller wrote:

From: Alexei Starovoitov 
Date: Fri, 19 May 2017 14:37:56 -0700

On 5/19/17 1:41 PM, David Miller wrote:

From: Edward Cree 
Date: Fri, 19 May 2017 18:17:42 +0100

One question: is there a way to build the verifier as userland code
 (or at least as a module), or will I have to reboot every time I
 want to test a change?

There currently is no such machanism, you will have to reboot every
time.

I have considered working on making the code buildable outside of the
kernel.  It shouldn't be too hard.

it's not hard.
We did it twice and both times abandoned.
First time to have 'user space verifier' to check programs before
loading and second time for fuzzing via llvm.
Abandoned since it diverges very quickly from kernel.

Well, my idea was the create an environment in which kernel verifier.c
could be built as-is.

Maybe there would be some small compromises in verifier.c such as an
ifdef test or two, but that should be it.

that's exactly what we did the first time. Added few ifdef to verifier.c
Second time we went even further by compiling kernel/bpf/verifier.c
as-is and linking everything magically via user space hooks
all the way that test_verifier.c runs as-is but calling
bpf_check() function that was compiled for user space via clang.
That code is here:
https://github.com/iovisor/bpf-fuzzer
It's definitely possible to refresh it and make it work again.

My point that unless we put such 'lets build verifier.c for user space'
as part of tools/testing/selftests/ or something, such project is
destined to bit rot.

Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers

2017-05-19 Thread Jakub Kicinski

On Fri, 19 May 2017 20:34:00 -0700, Alexei Starovoitov wrote:
> On Fri, May 19, 2017 at 08:21:47PM -0700, Jakub Kicinski wrote:
> > On Fri, 19 May 2017 20:07:52 -0700, Alexei Starovoitov wrote:  
> > > How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is?
> > > We can make sure that XDP program does read only access into it and
> > > it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else
> > > in there, but this will not be uapi and it will be pretty obvious
> > > to program authors that their programs are vendor specific.
> > > 'not uapi' here means that mellanox is free to change their HW descriptor
> > > and its contents as they wish.  
> > 
> > Hm..  Would that mean we have to teach the verifier about all possible
> > drivers and their metadata structures (i.e. sizes thereof).  And add an
> > UAPI enum of known drivers?  
> 
> why? no uapi other than a pointer to this hw rx descriptor.
> Different sizeof(hw_rx_descriptor) is not a problem.
> We deal with it already in tracing. All tracepoints have different
> sizeof(*ctx), yet the safety is preserved.

Ack, quick read of tracing code reveals this indeed should work.

Re: [PATCH net-next] geneve: always fill CSUM6_RX configuration

2017-05-19 Thread Pravin Shelar

On Thu, May 18, 2017 at 12:59 PM, Eric Garver  wrote:
> CSMU6_RX is relevant for collect_metadata as well. As such leave it
> outside of the dev's IPv4/IPv6 checks.
>
Can you explain it bit? is this flag used with ipv4 tunnels?

> Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
> Signed-off-by: Eric Garver 
> ---
>  drivers/net/geneve.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index dec5d563ab19..f557d1dc3f9b 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -1311,13 +1311,13 @@ static int geneve_fill_info(struct sk_buff *skb, 
> const struct net_device *dev)
> if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_TX,
>!(info->key.tun_flags & TUNNEL_CSUM)))
> goto nla_put_failure;
> -
> -   if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
> -  !geneve->use_udp6_rx_checksums))
> -   goto nla_put_failure;
>  #endif
> }
>
> +   if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
> +  !geneve->use_udp6_rx_checksums))
> +   goto nla_put_failure;
> +
> if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) ||
> nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) ||
> nla_put_be32(skb, IFLA_GENEVE_LABEL, info->key.label))
> --
> 2.12.0
>

[net-next] net: ipv6: fix code style error and warning of ndisc.c

2017-05-19 Thread yuan linyu

From: yuan linyu 

Signed-off-by: yuan linyu 
---
 net/ipv6/ndisc.c | 300 ---
 1 file changed, 155 insertions(+), 145 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index d310dc4..5a3dfaa 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -12,8 +12,7 @@
  *  2 of the License, or (at your option) any later version.
  */
 
-/*
- * Changes:
+/* Changes:
  *
  * Alexey I. Froloff   :   RFC6106 (DNSSL) support
  * Pierre Ynard:   export userland ND options
@@ -99,7 +98,6 @@ static const struct neigh_ops ndisc_hh_ops = {
.connected_output = neigh_resolve_output,
 };
 
-
 static const struct neigh_ops ndisc_direct_ops = {
.family =   AF_INET6,
.output =   neigh_direct_output,
@@ -147,13 +145,13 @@ void __ndisc_fill_addr_option(struct sk_buff *skb, int 
type, void *data,
u8 *opt = skb_put(skb, space);
 
opt[0] = type;
-   opt[1] = space>>3;
+   opt[1] = space >> 3;
 
memset(opt + 2, 0, pad);
opt   += pad;
space -= pad;
 
-   memcpy(opt+2, data, data_len);
+   memcpy(opt + 2, data, data_len);
data_len += 2;
opt += data_len;
space -= data_len;
@@ -182,6 +180,7 @@ static struct nd_opt_hdr *ndisc_next_option(struct 
nd_opt_hdr *cur,
struct nd_opt_hdr *end)
 {
int type;
+
if (!cur || !end || cur >= end)
return NULL;
type = cur->nd_opt_type;
@@ -222,6 +221,7 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
memset(ndopts, 0, sizeof(*ndopts));
while (opt_len) {
int l;
+
if (opt_len < sizeof(struct nd_opt_hdr))
return NULL;
l = nd_opt->nd_opt_len << 3;
@@ -240,13 +240,15 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
  "%s: duplicated ND6 option found: 
type=%d\n",
  __func__, nd_opt->nd_opt_type);
} else {
-   ndopts->nd_opt_array[nd_opt->nd_opt_type] = 
nd_opt;
+   ndopts->nd_opt_array[nd_opt->nd_opt_type] =
+   nd_opt;
}
break;
case ND_OPT_PREFIX_INFO:
ndopts->nd_opts_pi_end = nd_opt;
if (!ndopts->nd_opt_array[nd_opt->nd_opt_type])
-   ndopts->nd_opt_array[nd_opt->nd_opt_type] = 
nd_opt;
+   ndopts->nd_opt_array[nd_opt->nd_opt_type] =
+   nd_opt;
break;
 #ifdef CONFIG_IPV6_ROUTE_INFO
case ND_OPT_ROUTE_INFO:
@@ -261,8 +263,7 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
if (!ndopts->nd_useropts)
ndopts->nd_useropts = nd_opt;
} else {
-   /*
-* Unknown options must be silently ignored,
+   /* Unknown options must be silently ignored,
 * to accommodate future extension to the
 * protocol.
 */
@@ -280,7 +281,8 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
return ndopts;
 }
 
-int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device 
*dev, int dir)
+int ndisc_mc_map(const struct in6_addr *addr, char *buf,
+struct net_device *dev, int dir)
 {
switch (dev->type) {
case ARPHRD_ETHER:
@@ -327,9 +329,8 @@ static int ndisc_constructor(struct neighbour *neigh)
bool is_multicast = ipv6_addr_is_multicast(addr);
 
in6_dev = in6_dev_get(dev);
-   if (!in6_dev) {
+   if (!in6_dev)
return -EINVAL;
-   }
 
parms = in6_dev->nd_parms;
__neigh_parms_put(neigh->parms);
@@ -344,12 +345,12 @@ static int ndisc_constructor(struct neighbour *neigh)
if (is_multicast) {
neigh->nud_state = NUD_NOARP;
ndisc_mc_map(addr, neigh->ha, dev, 1);
-   } else if (dev->flags&(IFF_NOARP|IFF_LOOPBACK)) {
+   } else if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) {
neigh->nud_state = NUD_NOARP;
memcpy(neigh->ha, dev->dev_addr, dev->addr_len);
-   if (dev->flags_LOOPBACK)
+   if (dev->flags & IFF_LOOPBACK)
neigh->type = RTN_LOCAL;
-

Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers

2017-05-19 Thread Alexei Starovoitov

On Thu, May 18, 2017 at 05:41:48PM +0200, Jesper Dangaard Brouer wrote:
>  
> +/* XDP rxhash have an associated type, which is related to the RSS
> + * (Receive Side Scaling) standard, but NIC HW have different mapping
> + * and support. Thus, create mapping that is interesting for XDP.  XDP
> + * would primarly want insight into L3 and L4 protocol info.
> + *
> + * TODO: Likely need to get extended with "L3_IPV6_EX" due RSS standard
> + *
> + * The HASH_TYPE will be returned from bpf helper as the top 32-bit of
> + * the 64-bit rxhash (internally type stored in xdp_buff->flags).
> + */
> +#define XDP_HASH(x)  ((x) & ((1ULL << 32)-1))
> +#define XDP_HASH_TYPE(x) ((x) >> 32)
> +
> +#define XDP_HASH_TYPE_L3_SHIFT   0
> +#define XDP_HASH_TYPE_L3_BITS3
> +#define XDP_HASH_TYPE_L3_MASK((1ULL << XDP_HASH_TYPE_L3_BITS)-1)
> +#define XDP_HASH_TYPE_L3(x)  ((x) & XDP_HASH_TYPE_L3_MASK)
> +enum {
> + XDP_HASH_TYPE_L3_IPV4 = 1,
> + XDP_HASH_TYPE_L3_IPV6,
> +};
> +
> +#define XDP_HASH_TYPE_L4_SHIFT   XDP_HASH_TYPE_L3_BITS
> +#define XDP_HASH_TYPE_L4_BITS5
> +#define XDP_HASH_TYPE_L4_MASK
> \
> + (((1ULL << XDP_HASH_TYPE_L4_BITS)-1) << XDP_HASH_TYPE_L4_SHIFT)
> +#define XDP_HASH_TYPE_L4(x)  ((x) & XDP_HASH_TYPE_L4_MASK)
> +enum {
> + _XDP_HASH_TYPE_L4_TCP = 1,
> + _XDP_HASH_TYPE_L4_UDP,
> +};
> +#define XDP_HASH_TYPE_L4_TCP (_XDP_HASH_TYPE_L4_TCP << 
> XDP_HASH_TYPE_L4_SHIFT)
> +#define XDP_HASH_TYPE_L4_UDP (_XDP_HASH_TYPE_L4_UDP << 
> XDP_HASH_TYPE_L4_SHIFT)

imo this is dangerous territory.
As far as I can see this information doesn't exist in the current drivers at all
and you're enabling it in the patch 5 via fancy:
+   u32 ht = (mlx5_htype_l4_to_xdp[((cht & CQE_RSS_HTYPE_L4) >> 6)] | \
+ mlx5_htype_l3_to_xdp[((cht & CQE_RSS_HTYPE_IP) >> 2)]);

It's pretty cool that you've discovered this hidden mlx5 feature
Did you find it in some hw spec ?
And it looks useful to me, but
1. i'm worried that we'd be relying on something that mellanox didn't
 implement in their drivers before. Was it tested and guarnteed to exist
 in the future revisions of firmware? Is it cx4 or cx4-lx or cx5 feature?
2. but the main concern that it is mellanox only feature. At least I cannot
see anything like this in broadcom and intel nics

In the very beginning we discussed that XDP programs should be as generic as
possible and HW independent while at the same time we want to expose HW
specific features to XDP programs.
So I'm totally fine to expose this fancy hw hash and ipv4 vs v6 and tcp vs udp
flags to xdp programs somehow, but I'm completely against making it into uapi.

How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is?
We can make sure that XDP program does read only access into it and
it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else
in there, but this will not be uapi and it will be pretty obvious
to program authors that their programs are vendor specific.
'not uapi' here means that mellanox is free to change their HW descriptor
and its contents as they wish.
Also no copies and bit conversions will be necessary, so the cost will
be zero to programs that don't use it and we wouldn't need to change
verifier to discover access to this stuff.

Note that bpf programs already have access to all in-kernel data structures
on the tracing side, so this is nothing new and tracing program authors
got used to structures changing from kernel to kernel.
XDP program authors can do the same for vendor specific bits while we
keep core XDP uapi generic across all nics.

Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers

2017-05-19 Thread Alexei Starovoitov

On Fri, May 19, 2017 at 08:21:47PM -0700, Jakub Kicinski wrote:
> On Fri, 19 May 2017 20:07:52 -0700, Alexei Starovoitov wrote:
> > How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is?
> > We can make sure that XDP program does read only access into it and
> > it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else
> > in there, but this will not be uapi and it will be pretty obvious
> > to program authors that their programs are vendor specific.
> > 'not uapi' here means that mellanox is free to change their HW descriptor
> > and its contents as they wish.
> 
> Hm..  Would that mean we have to teach the verifier about all possible
> drivers and their metadata structures (i.e. sizes thereof).  And add an
> UAPI enum of known drivers?

why? no uapi other than a pointer to this hw rx descriptor.
Different sizeof(hw_rx_descriptor) is not a problem.
We deal with it already in tracing. All tracepoints have different
sizeof(*ctx), yet the safety is preserved.

> Other idea I floated in early days was to standardize the fields but
> let the driver "JIT" the accesses to look at the right offset of the
> right structure.  Admittedly that would be a lot more work.

'standardize the fields' sounds nice, but failed here already.
As far as I can see the meaning of packet 'hash' is quite different
across the drivers and 'hash' is just a beginning.
I hope we can standardize on 'csum' field and make it checksum_complete,
but so far out of 10+G nics only mlx5 and nfp do it in hw.
We need it at least for mlx4, but it can only fake it via expensive math.

Re: [PATCH net-next v2] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Hangbin Liu

On Fri, May 19, 2017 at 07:30:43PM +0200, Ivan Vecera wrote:
> Current bridge code incorrectly handles starting/stopping of hello and
> hold timers during STP enable/disable.
> 
> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>transition. The timers are already stopped in NO_STP state so
>this is confusing no-op.

Hi Ivan,

Shouldn't we start hello timer in br_stp_start when NO_STP -> BR_KERNEL_STP ?
> 
> 2. During USER_STP->NO_STP transition the timers are started. This
>does not make sense and is confusion because the timer should not be
>active in NO_STP state.

Yes, but what about BR_KERNEL_STP -> NO_STP in function br_stp_stop() ?
> 
> Cc: da...@davemloft.net
> Cc: sas...@cumulusnetworks.com
> Cc: step...@networkplumber.org
> Cc: bri...@lists.linux-foundation.org
> Cc: lucien@gmail.com
> Cc: niko...@cumulusnetworks.com
> Signed-off-by: Ivan Vecera 
> ---
>  net/bridge/br_stp_if.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 08341d2aa9c9..a05027027513 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
> *arg)
>  
>  static void br_stp_start(struct net_bridge *br)
>  {
> - struct net_bridge_port *p;
>   int err = -ENOENT;
>  
>   if (net_eq(dev_net(br->dev), _net))
> @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
>   if (!err) {
>   br->stp_enabled = BR_USER_STP;
>   br_debug(br, "userspace STP started\n");
> -
> - /* Stop hello and hold timers */
> - del_timer(>hello_timer);
> - list_for_each_entry(p, >port_list, list)
> - del_timer(>hold_timer);

I'm not sure if user space daemon will send bpdu or not? In comment
76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and
hold timers"). Nikolay said we should not handle it with BR_USER_STP.

>   } else {
>   br->stp_enabled = BR_KERNEL_STP;
>   br_debug(br, "using kernel STP\n");
> @@ -187,7 +181,6 @@ static void br_stp_start(struct net_bridge *br)
>  
>  static void br_stp_stop(struct net_bridge *br)
>  {
> - struct net_bridge_port *p;
>   int err;
>  
>   if (br->stp_enabled == BR_USER_STP) {
> @@ -196,10 +189,6 @@ static void br_stp_stop(struct net_bridge *br)
>   br_err(br, "failed to stop userspace STP (%d)\n", err);
>  
>   /* To start timers on any ports left in blocking */
> - mod_timer(>hello_timer, jiffies + br->hello_time);
> - list_for_each_entry(p, >port_list, list)
> - mod_timer(>hold_timer,
> -   round_jiffies(jiffies + BR_HOLD_TIME));

If we do not del hello_timer. after it expired in br_hello_timer_expired(),
Our state is br->dev->flags & IFF_UP and br->stp_enabled == NO_STP, it will
call mod_timer(>hello_timer, round_jiffies(jiffies + br->hello_time))
and we will keep sending bpdu message even after stp stoped.

>   spin_lock_bh(>lock);
>   br_port_state_selection(br);
>   spin_unlock_bh(>lock);
> -- 

So how about just like

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index d8ad73b..0198f62 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -183,6 +183,7 @@ static void br_stp_start(struct net_bridge *br)
} else {
br->stp_enabled = BR_KERNEL_STP;
br_debug(br, "using kernel STP\n");
+   mod_timer(>hello_timer, jiffies + br->hello_time);

/* To start timers on any ports left in blocking */
br_port_state_selection(br);
@@ -202,7 +203,6 @@ static void br_stp_stop(struct net_bridge *br)
br_err(br, "failed to stop userspace STP (%d)\n", err);

/* To start timers on any ports left in blocking */
-   mod_timer(>hello_timer, jiffies + br->hello_time);
list_for_each_entry(p, >port_list, list)
mod_timer(>hold_timer,
  round_jiffies(jiffies + BR_HOLD_TIME));
@@ -211,6 +211,7 @@ static void br_stp_stop(struct net_bridge *br)
spin_unlock_bh(>lock);
}

+   del_timer_sync(>hello_timer);
br->stp_enabled = BR_NO_STP;
 }

Thanks
Hangbin

Re: [PATCH net] ah: use crypto_memneq to check the ICV

2017-05-19 Thread Steffen Klassert

On Fri, May 19, 2017 at 02:25:14PM +0200, Sabrina Dubroca wrote:
> Hi Steffen,
> 
> 2017-05-04, 12:41:24 +0200, Steffen Klassert wrote:
> > On Wed, May 03, 2017 at 04:57:57PM +0200, Sabrina Dubroca wrote:
> > > Signed-off-by: Sabrina Dubroca 
> > > ---
> > >  net/ipv4/ah4.c | 5 +++--
> > >  net/ipv6/ah6.c | 5 +++--
> > >  2 files changed, 6 insertions(+), 4 deletions(-)
> > 
> > Is this a fix for something? If so, please describe what it fixes.
> > If not, it can wait until after the merge window and merged into
> > ipsec-next then.
> 
> Do you prefer that I resend the patch, or can you take it directly?

No need to resend, I've just applied it directly.

Thanks a lot Sabrina!

RE: [PATCH net-next] cxgb4: fix incorrect cim_la output for T6

2017-05-19 Thread David Laight

From: Ganesh Goudar
> Sent: 19 May 2017 11:12
T6
> 
> take care of UpDbgLaRdPtr[0-3] restriction for T6
> 
> Signed-off-by: Ganesh Goudar 
> ---
>  drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c 
> b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> index aded42b96..917b46b 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> @@ -8268,6 +8268,13 @@ int t4_cim_read_la(struct adapter *adap, u32 *la_buf, 
> unsigned int *wrptr)
>   if (ret)
>   break;
>   idx = (idx + 1) & UPDBGLARDPTR_M;
> +
> + /* Bits 0-3 of UpDbgLaRdPtr can be between  to 1001 to
> +  * identify the 32-bit portion of the full 312-bit data
> +  */
> + if (is_t6(adap->params.chip))
> + while ((idx & 0xf) > 9)
> + idx = (idx + 1) % UPDBGLARDPTR_M;

Why the loop, maybe:
if (is_t6(adap->params.chip) && (idx & 0xf) >= 9)
idx = (idx & 0xf0) + 0x10;
else
idx++;
idx &= UPDBGLARDPTR_M;

David

Re: general protection fault in skb_release_data

2017-05-19 Thread Eric Dumazet

On Fri, May 19, 2017 at 5:57 AM, Andrey Konovalov  wrote:
> On Fri, May 19, 2017 at 12:18 PM,   wrote:
>> Hi,
>>
>> I've got the following bug report while fuzzing the
>> kernel(master-f83246089ca) with syzkaller.
>>
>> program and config are attached.
>
> Hi!
>
> Thanks for the report!
>
> Adding kernel maintainers.
>
> I can confirm that we've hist this bug multiple times, but never been
> able to reproduce it.
>
> I was able to reproduce it on 2ea659a9ef488125eb46da6eb571de5eae5c43f6
> (4.12-rc1).
>
> Using the attached syzkaller program I was able to generate C
> reproducer, attached. Sometimes I need to run it a few times to
> trigger the bug.
>
> @idaifish If you find more bugs please run ./scripts/get_maintainer.pl
> to get the list of subsystem maintainers and add them to the
> recipients. I've updated instructions of how to report kernel bugs
> found with syzkaller in README.
>
> Thanks!
>
>>
>> ===
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault:  [#1] SMP KASAN
>> Dumping ftrace buffer:
>>(ftrace buffer empty)
>> Modules linked in:
>> CPU: 2 PID: 21599 Comm: syz-executor3 Not tainted 4.11.0-rc8+ #1
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>> task: 88006c16dec0 task.stack: 880058bb8000
>> RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
>> RIP: 0010:compound_head include/linux/page-flags.h:146 [inline]
>> RIP: 0010:put_page include/linux/mm.h:796 [inline]
>> RIP: 0010:__skb_frag_unref include/linux/skbuff.h:2613 [inline]
>> RIP: 0010:skb_release_data+0x201/0x3b0 net/core/skbuff.c:593
>> RSP: 0018:880058bbf570 EFLAGS: 00010a02
>> RAX: 11032b488bad1523 RBX: 88006c6e8ec8 RCX: c9000190e000
>> RDX: 11000d8dd1df RSI: 8293d7e3 RDI: 88195a445d68a919
>> RBP: 880058bbf5a8 R08: 7bdf27567b31597f R09: 
>> R10: 00d2 R11: 7dc9ab6dec891f24 R12: 
>> R13: dc00 R14: 88006a475940 R15: 88195a445d68a8f9
>> FS:  7fbbc6a34700() GS:88006e40() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 2004e000 CR3: 6b868000 CR4: 06e0
>> Call Trace:
>>  skb_release_all+0x4a/0x60 net/core/skbuff.c:669
>>  __kfree_skb net/core/skbuff.c:683 [inline]
>>  kfree_skb+0x85/0x1b0 net/core/skbuff.c:704
>>  __ip6_append_data.isra.42+0x26ed/0x33b0 net/ipv6/ip6_output.c:1519
>>  ip6_append_data+0x1a8/0x2f0 net/ipv6/ip6_output.c:1633
>>  udpv6_sendmsg+0x7bd/0x2360 net/ipv6/udp.c:1264
>>  inet_sendmsg+0x123/0x3a0 net/ipv4/af_inet.c:762
>>  sock_sendmsg_nosec net/socket.c:633 [inline]
>>  sock_sendmsg+0xca/0x110 net/socket.c:643
>>  ___sys_sendmsg+0x79f/0x900 net/socket.c:1997
>>  __sys_sendmsg+0xd1/0x170 net/socket.c:2031
>>  SYSC_sendmsg net/socket.c:2042 [inline]
>>  SyS_sendmsg+0x2d/0x50 net/socket.c:2038
>>  entry_SYSCALL_64_fastpath+0x1a/0xa9
>> RIP: 0033:0x44fb79
>> RSP: 002b:7fbbc6a33b58 EFLAGS: 0212 ORIG_RAX: 002e
>> RAX: ffda RBX: 00718000 RCX: 0044fb79
>> RDX:  RSI: 2000afc8 RDI: 0005
>> RBP: 03fb R08:  R09: 
>> R10:  R11: 0212 R12: 0005
>> R13: 20001000 R14: 00048000 R15: 
>> Code: 48 83 c0 03 48 c1 e0 04 48 01 d8 48 89 c2 48 c1 ea 03 42 80 3c 2a 00
>> 0f 85 92 01 00 00 4c 8b 38 49 8d 7f 20 48 89 f8 48 c1 e8 03 <42> 80 3c 28 00
>> 0f 85 6f 01 00 00 49 8b 47 20 a8 01 0f 84 3b ff
>> RIP: __read_once_size include/linux/compiler.h:254 [inline] RSP:
>> 880058bbf570
>> RIP: compound_head include/linux/page-flags.h:146 [inline] RSP:
>> 880058bbf570
>> RIP: put_page include/linux/mm.h:796 [inline] RSP: 880058bbf570
>> RIP: __skb_frag_unref include/linux/skbuff.h:2613 [inline] RSP:
>> 880058bbf570
>> RIP: skb_release_data+0x201/0x3b0 net/core/skbuff.c:593 RSP:
>> 880058bbf570
>> ---[ end trace 46c9f72a66cd8627 ]---
>> Kernel panic - not syncing: Fatal exception
>> Dumping ftrace buffer:
>>(ftrace buffer empty)
>> Kernel Offset: disabled
>> Rebooting in 86400 seconds..
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "syzkaller" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to syzkaller+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

Hi Andrey, please try following patch
Thanks.


patch2482
Description: Binary data

Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread Alexei Starovoitov


On 5/19/17 7:21 AM, Edward Cree wrote:

I'm currently translating the algos to C.  But for the kernel patch,
 I'll need to read & understand the existing verifier code, so it
 might take a while :)  (I don't suppose there's any design document
 or hacking-notes you could point me at?)


Dave just gave a good overview of the verifier:
https://www.spinics.net/lists/xdp-newbies/msg00185.html

Few more details in
Documentation/networking/filter.txt
and in top comments in kernel/bpf/verifier.c

General bpf arch overview:
http://docs.cilium.io/en/latest/bpf/#instruction-set


But I'll give it a go for sure.


Thanks!

[PATCH v2 net] smsc95xx: Support only IPv4 TCP/UDP csum offload

2017-05-19 Thread Nisar.Sayed

From: Nisar Sayed 

When TX checksum offload is used, if the computed checksum is 0 the
LAN95xx device do not alter the checksum to 0x.  In the case of ipv4
UDP checksum, it indicates to receiver that no checksum is calculated.
Under ipv6, UDP checksum yields a result of zero must be changed to
0x. Hence disabling checksum offload for ipv6 packets.

Signed-off-by: Nisar Sayed 

Reported-by: popcorn mix 
---
Changes
V2
- Updated e-mail alias
- Updated commit message and comments

 drivers/net/usb/smsc95xx.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 765400b..2dfca96 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -681,7 +681,7 @@ static int smsc95xx_set_features(struct net_device *netdev,
if (ret < 0)
return ret;
 
-   if (features & NETIF_F_HW_CSUM)
+   if (features & NETIF_F_IP_CSUM)
read_buf |= Tx_COE_EN_;
else
read_buf &= ~Tx_COE_EN_;
@@ -1279,12 +1279,19 @@ static int smsc95xx_bind(struct usbnet *dev, struct 
usb_interface *intf)
 
spin_lock_init(>mac_cr_lock);
 
+   /* LAN95xx devices do not alter the computed checksum of 0 to 0x.
+* RFC 2460, ipv6 UDP calculated checksum yields a result of zero must
+* be changed to 0x. RFC 768, ipv4 UDP computed checksum is zero,
+* it is transmitted as all ones. The zero transmitted checksum means
+* transmitter generated no checksum. Hence, enable csum offload only
+* for ipv4 packets.
+*/
if (DEFAULT_TX_CSUM_ENABLE)
-   dev->net->features |= NETIF_F_HW_CSUM;
+   dev->net->features |= NETIF_F_IP_CSUM;
if (DEFAULT_RX_CSUM_ENABLE)
dev->net->features |= NETIF_F_RXCSUM;
 
-   dev->net->hw_features = NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
+   dev->net->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM;
 
smsc95xx_init_mac_address(dev);
 
-- 
1.9.1

Re: general protection fault in skb_release_data

2017-05-19 Thread Andrey Konovalov

On Fri, May 19, 2017 at 4:36 PM, 'Eric Dumazet' via syzkaller
 wrote:
> On Fri, May 19, 2017 at 5:57 AM, Andrey Konovalov  
> wrote:
>> On Fri, May 19, 2017 at 12:18 PM,   wrote:
>>> Hi,
>>>
>>> I've got the following bug report while fuzzing the
>>> kernel(master-f83246089ca) with syzkaller.
>>>
>>> program and config are attached.
>>
>> Hi!
>>
>> Thanks for the report!
>>
>> Adding kernel maintainers.
>>
>> I can confirm that we've hist this bug multiple times, but never been
>> able to reproduce it.
>>
>> I was able to reproduce it on 2ea659a9ef488125eb46da6eb571de5eae5c43f6
>> (4.12-rc1).
>>
>> Using the attached syzkaller program I was able to generate C
>> reproducer, attached. Sometimes I need to run it a few times to
>> trigger the bug.
>>
>> @idaifish If you find more bugs please run ./scripts/get_maintainer.pl
>> to get the list of subsystem maintainers and add them to the
>> recipients. I've updated instructions of how to report kernel bugs
>> found with syzkaller in README.
>>
>> Thanks!
>>
>>>
>>> ===
>>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>>> general protection fault:  [#1] SMP KASAN
>>> Dumping ftrace buffer:
>>>(ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 2 PID: 21599 Comm: syz-executor3 Not tainted 4.11.0-rc8+ #1
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>>> task: 88006c16dec0 task.stack: 880058bb8000
>>> RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
>>> RIP: 0010:compound_head include/linux/page-flags.h:146 [inline]
>>> RIP: 0010:put_page include/linux/mm.h:796 [inline]
>>> RIP: 0010:__skb_frag_unref include/linux/skbuff.h:2613 [inline]
>>> RIP: 0010:skb_release_data+0x201/0x3b0 net/core/skbuff.c:593
>>> RSP: 0018:880058bbf570 EFLAGS: 00010a02
>>> RAX: 11032b488bad1523 RBX: 88006c6e8ec8 RCX: c9000190e000
>>> RDX: 11000d8dd1df RSI: 8293d7e3 RDI: 88195a445d68a919
>>> RBP: 880058bbf5a8 R08: 7bdf27567b31597f R09: 
>>> R10: 00d2 R11: 7dc9ab6dec891f24 R12: 
>>> R13: dc00 R14: 88006a475940 R15: 88195a445d68a8f9
>>> FS:  7fbbc6a34700() GS:88006e40() knlGS:
>>> CS:  0010 DS:  ES:  CR0: 80050033
>>> CR2: 2004e000 CR3: 6b868000 CR4: 06e0
>>> Call Trace:
>>>  skb_release_all+0x4a/0x60 net/core/skbuff.c:669
>>>  __kfree_skb net/core/skbuff.c:683 [inline]
>>>  kfree_skb+0x85/0x1b0 net/core/skbuff.c:704
>>>  __ip6_append_data.isra.42+0x26ed/0x33b0 net/ipv6/ip6_output.c:1519
>>>  ip6_append_data+0x1a8/0x2f0 net/ipv6/ip6_output.c:1633
>>>  udpv6_sendmsg+0x7bd/0x2360 net/ipv6/udp.c:1264
>>>  inet_sendmsg+0x123/0x3a0 net/ipv4/af_inet.c:762
>>>  sock_sendmsg_nosec net/socket.c:633 [inline]
>>>  sock_sendmsg+0xca/0x110 net/socket.c:643
>>>  ___sys_sendmsg+0x79f/0x900 net/socket.c:1997
>>>  __sys_sendmsg+0xd1/0x170 net/socket.c:2031
>>>  SYSC_sendmsg net/socket.c:2042 [inline]
>>>  SyS_sendmsg+0x2d/0x50 net/socket.c:2038
>>>  entry_SYSCALL_64_fastpath+0x1a/0xa9
>>> RIP: 0033:0x44fb79
>>> RSP: 002b:7fbbc6a33b58 EFLAGS: 0212 ORIG_RAX: 002e
>>> RAX: ffda RBX: 00718000 RCX: 0044fb79
>>> RDX:  RSI: 2000afc8 RDI: 0005
>>> RBP: 03fb R08:  R09: 
>>> R10:  R11: 0212 R12: 0005
>>> R13: 20001000 R14: 00048000 R15: 
>>> Code: 48 83 c0 03 48 c1 e0 04 48 01 d8 48 89 c2 48 c1 ea 03 42 80 3c 2a 00
>>> 0f 85 92 01 00 00 4c 8b 38 49 8d 7f 20 48 89 f8 48 c1 e8 03 <42> 80 3c 28 00
>>> 0f 85 6f 01 00 00 49 8b 47 20 a8 01 0f 84 3b ff
>>> RIP: __read_once_size include/linux/compiler.h:254 [inline] RSP:
>>> 880058bbf570
>>> RIP: compound_head include/linux/page-flags.h:146 [inline] RSP:
>>> 880058bbf570
>>> RIP: put_page include/linux/mm.h:796 [inline] RSP: 880058bbf570
>>> RIP: __skb_frag_unref include/linux/skbuff.h:2613 [inline] RSP:
>>> 880058bbf570
>>> RIP: skb_release_data+0x201/0x3b0 net/core/skbuff.c:593 RSP:
>>> 880058bbf570
>>> ---[ end trace 46c9f72a66cd8627 ]---
>>> Kernel panic - not syncing: Fatal exception
>>> Dumping ftrace buffer:
>>>(ftrace buffer empty)
>>> Kernel Offset: disabled
>>> Rebooting in 86400 seconds..
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "syzkaller" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to syzkaller+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>
> Hi Andrey, please try following patch
> Thanks.

Hi Eric,

Your patch fixes the bug for me.

Thanks!

Tested-by: Andrey Konovalov 

>
> --

[PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Xin Long

Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello 
and hold timers")
Reported-by: Haidong Li 
Signed-off-by: Xin Long 
---
 net/bridge/br_stp_if.c| 1 +
 net/bridge/br_stp_timer.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 08341d2..0db8102 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -179,6 +179,7 @@ static void br_stp_start(struct net_bridge *br)
br_debug(br, "using kernel STP\n");
 
/* To start timers on any ports left in blocking */
+   mod_timer(>hello_timer, jiffies + br->hello_time);
br_port_state_selection(br);
}
 
diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index c98b3e5..60b6fe2 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg)
if (br->dev->flags & IFF_UP) {
br_config_bpdu_generation(br);
 
-   if (br->stp_enabled != BR_USER_STP)
+   if (br->stp_enabled == BR_KERNEL_STP)
mod_timer(>hello_timer,
  round_jiffies(jiffies + br->hello_time));
}
-- 
2.1.0

RE: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

2017-05-19 Thread David Laight

From: Bjørn Mork
> Sent: 19 May 2017 14:56
...
> Unless someone has a nice way to just collect a list of skbs and have
> them converted to proper framing on the fly when transmitting, without
> having to care about USB packet boundaries.

skb can be linked into arbitrary chains (or even trees), but I suspect
the usbnet code would need to be taught about them.

For XHCI it isn't too bad because it will do arbitrary scatter-gather
(apart from the ring end).
But I believe the earlier controllers only support fragments that
end on usb packet boundaries.

I looked at the usbnet code a few years ago, I'm sure it ought to
be possible to shortcut most of the code that uses URB and directly
write to the xhci (in particular) ring descriptors.

For receive you probably want to feed the USB stack multiple (probably)
2k buffers, and handle the debatching into ethernet frames yourself.

One of the ASIX drivers used to be particularly bad, it allocated 64k
skb for receive (the hardware can merge IP packets) and then hacked
the true_size before passing upstream.

David

[PATCH v2] hdlcdrv: fix divide error bug if bitrate is 0

2017-05-19 Thread Firo Yang

The divisor s->par.bitrate will always be 0 until initialized by
ndo_open() and hdlcdrv_open().

In order to fix this divide zero error, check whether the netdevice was
opened by ndo_open() before performing divide.And we also check the the
value of bitrate in case of bad setting of it.

Reported-by: Andrey Konovalov 
Signed-off-by: Firo Yang 
---
v1->v2:
Change Reported-by to Andrey Konovalov.
Send it to original report thread.

v0->v1:
Reviewed by walter harms .
Return ENODEV instead of EPERM if !netif_running(dev)
Check if s->par.bitrate > 0.
 drivers/net/hamradio/hdlcdrv.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c
index 8c3633c..b0f417f 100644
--- a/drivers/net/hamradio/hdlcdrv.c
+++ b/drivers/net/hamradio/hdlcdrv.c
@@ -576,6 +576,10 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
case HDLCDRVCTL_CALIBRATE:
if(!capable(CAP_SYS_RAWIO))
return -EPERM;
+   if (!netif_running(dev))
+   return -ENODEV;
+   if (!(s->par.bitrate > 0))
+   return -EINVAL;
if (bi.data.calibrate > INT_MAX / s->par.bitrate)
return -EINVAL;
s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
-- 
2.9.4

Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.

2017-05-19 Thread Toshiaki Makita


On 17/05/19 (金) 18:53, Vlad Yasevich wrote:

On 05/19/2017 04:16 AM, Toshiaki Makita wrote:

On 2017/05/19 16:09, Vlad Yasevich wrote:

On 05/18/2017 10:13 PM, Toshiaki Makita wrote:

On 2017/05/18 22:31, Vladislav Yasevich wrote:

It appears that since commit 8cb65d000, Q-in-Q vlans have been
broken.  The series that commit is part of enabled TSO and checksum
offloading on Q-in-Q vlans.  However, most HW we support can't handle
it.  To work around the issue, the above commit added a function that
turns off offloads on Q-in-Q devices, but it left the checksum offload.
That will cause issues with most older devices that supprort very basic
checksum offload capabilities as well as some newer devices (we've
reproduced te problem with both be2net and bnx).

To solve this for everyone, turn off checksum offloading feature
by default when sending Q-in-Q traffic.  Devices that are proven to
work can provided a corrected ndo_features_check implemetation.

Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers")
CC: Toshiaki Makita 
Signed-off-by: Vladislav Yasevich 
---
 include/linux/if_vlan.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 8d5fcd6..ae537f0 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -619,7 +619,6 @@ static inline netdev_features_t vlan_features_check(const 
struct sk_buff *skb,
 NETIF_F_SG |
 NETIF_F_HIGHDMA |
 NETIF_F_FRAGLIST |
-NETIF_F_HW_CSUM |
 NETIF_F_HW_VLAN_CTAG_TX |
 NETIF_F_HW_VLAN_STAG_TX);



I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem
is IP_CSUM and IPV6_CSUM.
So wouldn't it be better to leave HW_CSUM and drop IP_CSUM/IPV6_CSUM,
i.e. change intersection into bitwise AND?



It wasn't really a problem before accelerations got enabled on q-in-q
vlans.


Right for stacked vlan device.
But I think the check was there for packets from guests forwarded by
bridge to vlan device so it was a problem before 8cb65d000.


Not really, since stacked vlans in guests wouldn't have accelerations on.
Haven't really tried a new guest on old hosts.  It might be an issue there...


It's real. I'm now remembering that I came across a similar issue before 
introducing 8cb65d000. The situation was that bridge (vlan_filtering) 
adds a vlan tag to a frame which is already tagged by guests, or by a 
vlan device on the top of the bridge (Note that virtio and bridge have 
HW_CSUM in vlan_features). I addressed the problem in drivers side since 
all the IP/IPV6_CSUM drivers I encountered the issue on are able to 
notify devices of IP header offset. Now I checked be2net driver's code 
and realized it doesn't provide IP offset so it makes sense to drop 
IP/IPV6_CSUM by default. Anyway, kernels before 8cb65d000 have that 
problem, not only after 8cb65d000.


Toshiaki Makita

Re: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

2017-05-19 Thread Bjørn Mork

David Laight  writes:

> From: linux-usb-ow...@vger.kernel.org 
> [mailto:linux-usb-ow...@vger.kernel.org] On Behalf Of Jim Baxter
>> Sent: 16 May 2017 18:41
>> 
>> The CDC-NCM driver can require large amounts of memory to create
>> skb's and this can be a problem when the memory becomes fragmented.
>> 
>> This especially affects embedded systems that have constrained
>> resources but wish to maximise the throughput of CDC-NCM with 16KiB
>> NTB's.
>
> Why is this driver copying multiple tx messages into a single skb.

Mostly becasue it already did that when I started messing with it, and I
didn't know how to avoid that.

> Surely there are better ways to do this??

But I have been there thinking this exact thought a couple of times.
Suggestions are appreciated.

> I think it is generating a 'multi-ethernet frame' URB with an
> overall header for each URB and a header for each ethernet frame.

With some negotiated alignment restrictions, and a linked list of frame
pointer arrays.  But yes, that is basically it.

(it's not always ethernet - with MBIM it can be IP or arbitrary as well,
but I don't think that makes any difference)

> Given that the USB stack allows multiple concurrent transmits I'm
> surprised that batching large ethernet frames makes much difference.

Me too.  Actually, I don't think it does.  The protocol was developed
with specific device restrictions in mind. These might be invalid today.
There is no reason to believe that using simple CDC ECM framing
(i.e. one ethernet frame per URB) is any problem.

> Also the USB target can't actually tell when URB that contain
> multiples of the USB packet size end.
> So it is possible to send a single NTB as multiple URB.

Nice idea!  Never thought of that.  Yes, the driver could use a number
smaller buffers regardless of the NTB size, by abusing the fact that the
device will see them as a contigious USB transfer as long as they fall
on USB packet boundaries.

Started thinking about how to do this in practice.  It seemed obviously
simply to jsut fire off the buffers as they fill up until the the max
aggregation size or time has been exceeded.  But the header makes this
harder than necessary.  It contains both a length and a pointer to the
first frame pointer array (NDP).  So we will have to decide the size of
the NTB and where to put the first NDP before sending the first USB
packet. This is possible if we always go for the pad-to-max strategy.
We'll also have to make some assumptions about the size of the NDP(s) as
we add them, but we already do that so I don't think it is a big deal.

Might be the way to go.

Unless someone has a nice way to just collect a list of skbs and have
them converted to proper framing on the fly when transmitting, without
having to care about USB packet boundaries.

Bjørn

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Nikolay Aleksandrov


On 5/19/17 5:48 PM, Florian Westphal wrote:

Nikolay Aleksandrov  wrote:

On 5/19/17 5:20 PM, Xin Long wrote:

Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and 
hold timers")
Reported-by: Haidong Li 
Signed-off-by: Xin Long 
---
  net/bridge/br_stp_if.c| 1 +
  net/bridge/br_stp_timer.c | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)



This doesn't make much sense to me, how do you change from USER_STP to
KERNEL_STP without first going through NO_STP ?


This is easily rerpoduceable via:

ip link add vethin1 type veth peer name vethout1
ip link add vethin2 type veth peer name vethout2

ip link set vethin1 up
ip link set vethin2 up

ip link set vethout1 up
ip link set vethout2 up

brctl addbr br0
brctl addbr br1

brctl stp br0 on
brctl stp br1 on


I think this step with moving NO_STP -> KERNEL_STP should be last, then
I can see how the timer won't be started.



brctl addif br0 vethin1
brctl addif br0 vethin2

brctl addif br1 vethout1
brctl addif br1 vethout2

ip link set br0 up
ip link set br1 up

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov :
> On 5/19/17 5:51 PM, Ivan Vecera wrote:
>>
>> 2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov
>> :
>>>
>>> On 5/19/17 5:20 PM, Xin Long wrote:

 Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
 kernel hello and hold timers"), bridge would not start hello_timer if
 stp_enabled is not KERNEL_STP when br_dev_open.

 The problem is even if users set stp_enabled with KERNEL_STP later,
 the timer will still not be started. It causes that KERNEL_STP can
 not really work. Users have to re-ifup the bridge to avoid this.

 This patch is to fix it by starting br->hello_timer when enabling
 KERNEL_STP in br_stp_start.

 As an improvement, it's also to start hello_timer again only when
 br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
 no reason to start the timer again when it's NO_STP.

 Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel
 hello and hold timers")
 Reported-by: Haidong Li 
 Signed-off-by: Xin Long 
 ---
net/bridge/br_stp_if.c| 1 +
net/bridge/br_stp_timer.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

>>>
>>> This doesn't make much sense to me, how do you change from USER_STP to
>>> KERNEL_STP without first going through NO_STP ?
>>>
>>> If you go through NO_STP then all will be fine because br_stp_stop will
>>> restart
>>> the timers if the previous val was USER_STP.
>>>
>> The problem occurs when KERNEL_STP is enabled if the bridge itself is
>> already
>> up. Then the hello_timer is not started. If the hello and hold timers
>> should run only
>> when KERNEL_STP is used then there are another problematic places
>> (will send follow-up).
>>
>> Ivan
>>
>
> Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only
> then, because you cannot do direct USER_STP -> KERNEL_STP.
>
No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as USER_STP->NO_STP:

1) NO_STP->KERNEL_STP issue
hello_timer should be started in br_stp_start() - this patch

2) KERNEL_STP->NO_STP issue
hello timer and hold timers should be stopped (deleted) in br_stp_stop()

3) USER_STP->NO_STP issue
hello timer and hold timers should NOT be started in br_stp_stop()

Ivan

[PATCH iproute2] ip: add handling for new CAN netlink interface

2017-05-19 Thread Remigiusz Kołłątaj

This patch adds handling for new CAN netlink interface introduced in
4.11 kernel:
- IFLA_CAN_TERMINATION,
- IFLA_CAN_TERMINATION_CONST,
- IFLA_CAN_BITRATE_CONST,
- IFLA_CAN_DATA_BITRATE_CONST

Output example:
$ip -d link show can0
6: can0:  mtu 16 qdisc noop state DOWN mode DEFAULT group default 
qlen 10
link/can  promiscuity 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
  bitrate 8
 [   2,3,5,8,8,   10,
125000,   15,   175000,   20,   225000,   25,
275000,   30,   50,   625000,   80,  100 ]
  termination 0 [ 0, 120 ]
  clock 0numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 
65535

Signed-off-by: Remigiusz Kołłątaj 
---
 ip/iplink_can.c | 98 ++---
 1 file changed, 94 insertions(+), 4 deletions(-)

diff --git a/ip/iplink_can.c b/ip/iplink_can.c
index 20d4d37d..5df56b2b 100644
--- a/ip/iplink_can.c
+++ b/ip/iplink_can.c
@@ -41,6 +41,8 @@ static void print_usage(FILE *f)
"\t[ restart-ms TIME-MS ]\n"
"\t[ restart ]\n"
"\n"
+   "\t[ termination { 0..65535 } ]\n"
+   "\n"
"\tWhere: BITRATE   := { 1..100 }\n"
"\t   SAMPLE-POINT  := { 0.000..0.999 }\n"
"\t   TQ:= { NUMBER }\n"
@@ -220,6 +222,14 @@ static int can_parse_opt(struct link_util *lu, int argc, 
char **argv,
if (get_u32(, *argv, 0))
invarg("invalid \"restart-ms\" value\n", *argv);
addattr32(n, 1024, IFLA_CAN_RESTART_MS, val);
+   } else if (matches(*argv, "termination") == 0) {
+   __u16 val;
+
+   NEXT_ARG();
+   if (get_u16(, *argv, 0))
+   invarg("invalid \"termination\" value\n",
+  *argv);
+   addattr16(n, 1024, IFLA_CAN_TERMINATION, val);
} else if (matches(*argv, "help") == 0) {
usage();
return -1;
@@ -282,7 +292,8 @@ static void can_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
fprintf(f, "restart-ms %d ", *restart_ms);
}
 
-   if (tb[IFLA_CAN_BITTIMING]) {
+   /* bittiming is irrelevant if fixed bitrate is defined */
+   if (tb[IFLA_CAN_BITTIMING] && !tb[IFLA_CAN_BITRATE_CONST]) {
struct can_bittiming *bt = RTA_DATA(tb[IFLA_CAN_BITTIMING]);
 
fprintf(f, "\nbitrate %d sample-point %.3f ",
@@ -292,7 +303,8 @@ static void can_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
bt->sjw);
}
 
-   if (tb[IFLA_CAN_BITTIMING_CONST]) {
+   /* bittiming const is irrelevant if fixed bitrate is defined */
+   if (tb[IFLA_CAN_BITTIMING_CONST] && !tb[IFLA_CAN_BITRATE_CONST]) {
struct can_bittiming_const *btc =
RTA_DATA(tb[IFLA_CAN_BITTIMING_CONST]);
 
@@ -303,7 +315,37 @@ static void can_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
btc->brp_min, btc->brp_max, btc->brp_inc);
}
 
-   if (tb[IFLA_CAN_DATA_BITTIMING]) {
+   if (tb[IFLA_CAN_BITRATE_CONST]) {
+   __u32 *bitrate_const = RTA_DATA(tb[IFLA_CAN_BITRATE_CONST]);
+   int bitrate_cnt = RTA_PAYLOAD(tb[IFLA_CAN_BITRATE_CONST]) /
+ sizeof(*bitrate_const);
+   int i;
+   __u32 bitrate = 0;
+
+   if (tb[IFLA_CAN_BITTIMING]) {
+   struct can_bittiming *bt =
+   RTA_DATA(tb[IFLA_CAN_BITTIMING]);
+   bitrate = bt->bitrate;
+   }
+
+   fprintf(f, "\nbitrate %u", bitrate);
+   fprintf(f, "\n   [");
+
+   for (i = 0; i < bitrate_cnt - 1; ++i) {
+   /* This will keep lines below 80 signs */
+   if (!(i % 6) && i)
+   fprintf(f, "\n");
+
+   fprintf(f, "%8u, ", bitrate_const[i]);
+   }
+
+   if (!(i % 6) && i)
+   fprintf(f, "\n");
+   fprintf(f, "%8u ]", bitrate_const[i]);
+   }
+
+   /* data bittiming is irrelevant if fixed bitrate is defined */
+   if (tb[IFLA_CAN_DATA_BITTIMING] && !tb[IFLA_CAN_DATA_BITRATE_CONST]) {
struct can_bittiming *dbt =
RTA_DATA(tb[IFLA_CAN_DATA_BITTIMING]);
 
@@ -315,7 +357,9 @@ static void can_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
dbt->phase_seg2, dbt->sjw);
}
 
-   if

Good day my dear, please i need your assistant

2017-05-19 Thread Aisha Muammar Gaddafi

How are you doing today, Hope you are in perfect condition? Please i apologies 
with you to exercise a little patience and read through my letter i am 
contacting you for obvious reason and important business. My Name is Aisha 
Gaddafi,i am open minded person and easy going person romance and hard working, 
Please i really need to have a good relationship with you, i have some 
important discussion to discuss with you.There is information i would like you 
to keep very confidential, i will appreciate if you can reply me through my 
Private email address (aishamuammar...@yahoo.com) for further discussion, I 
WILL BE VERY HAPPY IF YOU ACCEPT ME AS YOUR FRIEND,Thanks and God bless, this 
is just brief about me,hope to review the secret inner most part of me once i 
received from you.
Aisha Gaddafi.

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Florian Westphal

Nikolay Aleksandrov  wrote:
> On 5/19/17 5:20 PM, Xin Long wrote:
> >Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
> >kernel hello and hold timers"), bridge would not start hello_timer if
> >stp_enabled is not KERNEL_STP when br_dev_open.
> >
> >The problem is even if users set stp_enabled with KERNEL_STP later,
> >the timer will still not be started. It causes that KERNEL_STP can
> >not really work. Users have to re-ifup the bridge to avoid this.
> >
> >This patch is to fix it by starting br->hello_timer when enabling
> >KERNEL_STP in br_stp_start.
> >
> >As an improvement, it's also to start hello_timer again only when
> >br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
> >no reason to start the timer again when it's NO_STP.
> >
> >Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel 
> >hello and hold timers")
> >Reported-by: Haidong Li 
> >Signed-off-by: Xin Long 
> >---
> >  net/bridge/br_stp_if.c| 1 +
> >  net/bridge/br_stp_timer.c | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> 
> This doesn't make much sense to me, how do you change from USER_STP to
> KERNEL_STP without first going through NO_STP ?

This is easily rerpoduceable via:

ip link add vethin1 type veth peer name vethout1
ip link add vethin2 type veth peer name vethout2

ip link set vethin1 up
ip link set vethin2 up

ip link set vethout1 up
ip link set vethout2 up

brctl addbr br0
brctl addbr br1

brctl stp br0 on
brctl stp br1 on

brctl addif br0 vethin1
brctl addif br0 vethin2

brctl addif br1 vethout1
brctl addif br1 vethout2

ip link set br0 up
ip link set br1 up

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov :
> On 5/19/17 5:20 PM, Xin Long wrote:
>>
>> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
>> kernel hello and hold timers"), bridge would not start hello_timer if
>> stp_enabled is not KERNEL_STP when br_dev_open.
>>
>> The problem is even if users set stp_enabled with KERNEL_STP later,
>> the timer will still not be started. It causes that KERNEL_STP can
>> not really work. Users have to re-ifup the bridge to avoid this.
>>
>> This patch is to fix it by starting br->hello_timer when enabling
>> KERNEL_STP in br_stp_start.
>>
>> As an improvement, it's also to start hello_timer again only when
>> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
>> no reason to start the timer again when it's NO_STP.
>>
>> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel
>> hello and hold timers")
>> Reported-by: Haidong Li 
>> Signed-off-by: Xin Long 
>> ---
>>   net/bridge/br_stp_if.c| 1 +
>>   net/bridge/br_stp_timer.c | 2 +-
>>   2 files changed, 2 insertions(+), 1 deletion(-)
>>
>
> This doesn't make much sense to me, how do you change from USER_STP to
> KERNEL_STP without first going through NO_STP ?
>
> If you go through NO_STP then all will be fine because br_stp_stop will
> restart
> the timers if the previous val was USER_STP.
>
The problem occurs when KERNEL_STP is enabled if the bridge itself is already
up. Then the hello_timer is not started. If the hello and hold timers
should run only
when KERNEL_STP is used then there are another problematic places
(will send follow-up).

Ivan

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Nikolay Aleksandrov


On 5/19/17 6:03 PM, Ivan Vecera wrote:

2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov :

On 5/19/17 5:51 PM, Ivan Vecera wrote:


2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov
:


On 5/19/17 5:20 PM, Xin Long wrote:



Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel
hello and hold timers")
Reported-by: Haidong Li 
Signed-off-by: Xin Long 
---
net/bridge/br_stp_if.c| 1 +
net/bridge/br_stp_timer.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)



This doesn't make much sense to me, how do you change from USER_STP to
KERNEL_STP without first going through NO_STP ?

If you go through NO_STP then all will be fine because br_stp_stop will
restart
the timers if the previous val was USER_STP.


The problem occurs when KERNEL_STP is enabled if the bridge itself is
already
up. Then the hello_timer is not started. If the hello and hold timers
should run only
when KERNEL_STP is used then there are another problematic places
(will send follow-up).

Ivan



Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only
then, because you cannot do direct USER_STP -> KERNEL_STP.


No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as USER_STP->NO_STP:

1) NO_STP->KERNEL_STP issue
hello_timer should be started in br_stp_start() - this patch



Right, I was talking only about this patch. By the way what about
the port hold_timers ? This patch only starts the hello_timer.


2) KERNEL_STP->NO_STP issue
hello timer and hold timers should be stopped (deleted) in br_stp_stop()

3) USER_STP->NO_STP issue
hello timer and hold timers should NOT be started in br_stp_stop()



Yep, ack.


Ivan

Re: [PATCH net 3/3] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

2017-05-19 Thread Jason Wang




On 2017年05月18日 21:31, Vladislav Yasevich wrote:

Since virtio does not provide it's own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it.  This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.

CC: "Michael S. Tsirkin" 
CC: Jason Wang 
CC: virtualizat...@lists.linux-foundation.org
Signed-off-by: Vladislav Yasevich 
---
  drivers/net/virtio_net.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8324a5e..341fb96 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2028,6 +2028,7 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_poll_controller = virtnet_netpoll,
  #endif
.ndo_xdp= virtnet_xdp,
+   .ndo_features_check = passthru_features_check,
  };
  
  static void virtnet_config_changed_work(struct work_struct *work)


Acked-by: Jason Wang

Re: [PATCH v4 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping

2017-05-19 Thread Willem de Bruijn

On Fri, May 19, 2017 at 6:00 AM, Miroslav Lichvar  wrote:
> On Thu, May 18, 2017 at 04:16:26PM -0400, Willem de Bruijn wrote:
>> On Thu, May 18, 2017 at 9:06 AM, Miroslav Lichvar  
>> wrote:
>> > +/* On transmit, software and hardware timestamps are returned 
>> > independently.
>> > + * As the two skb clones share the hardware timestamp, which may be 
>> > updated
>> > + * before the software timestamp is received, a hardware TX timestamp may 
>> > be
>> > + * returned only if there is no software TX timestamp. A false software
>> > + * timestamp made for SOCK_RCVTSTAMP when a real timestamp is missing must
>> > + * be ignored.
>>
>> Please expand on why this case can be ignored. It is quite subtle. How about
>> something like
>>
>> *
>> * A false software timestamp is one made inside the __sock_recv_timestamp
>> * call itself. These are generated whenever SO_TIMESTAMP(NS) is enabled
>> * on the socket, even when the timestamp reported is for another option, such
>> * as hardware tx timestamp.
>> *
>> * Ignore these when deciding whether a timestamp source is hw or sw.
>> */
>
> That seems a bit too verbose to me. :) Would the following work?
>
> /* On transmit, software and hardware timestamps are returned independently.
>  * As the two skb clones share the hardware timestamp, which may be updated
>  * before the software timestamp is received, a hardware TX timestamp may be
>  * returned only if there is no software TX timestamp. Ignore false software
>  * timestamps, which may be made in the __sock_recv_timestamp() call when the
>  * option SO_TIMESTAMP(NS) is enabled on the socket, even when the skb has a
>  * hardware timestamp.
>  */

Looks great, thanks.

>
>> > +static bool skb_is_swtx_tstamp(const struct sk_buff *skb,
>> > +  const struct sock *sk, int false_tstamp)
>> > +{
>> > +   if (false_tstamp && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)
>>
>> Also, why is it ignored only for the new mode?
>
> Good point. That should not be there. The function can be now reduced
> to a single line again. I originally tried a different approach,
> disabling false timestamps in the new mode, but then I thought it's
> better to not complicate it unnecessarily and keep it consistent.
>
> --
> Miroslav Lichvar

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 17:05 GMT+02:00 Nikolay Aleksandrov :
> On 5/19/17 6:03 PM, Ivan Vecera wrote:
>>
>> 2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov
>> :
>>>
>>> On 5/19/17 5:51 PM, Ivan Vecera wrote:


 2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov
 :
>
>
> On 5/19/17 5:20 PM, Xin Long wrote:
>>
>>
>>
>> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
>> kernel hello and hold timers"), bridge would not start hello_timer if
>> stp_enabled is not KERNEL_STP when br_dev_open.
>>
>> The problem is even if users set stp_enabled with KERNEL_STP later,
>> the timer will still not be started. It causes that KERNEL_STP can
>> not really work. Users have to re-ifup the bridge to avoid this.
>>
>> This patch is to fix it by starting br->hello_timer when enabling
>> KERNEL_STP in br_stp_start.
>>
>> As an improvement, it's also to start hello_timer again only when
>> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
>> no reason to start the timer again when it's NO_STP.
>>
>> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop
>> kernel
>> hello and hold timers")
>> Reported-by: Haidong Li 
>> Signed-off-by: Xin Long 
>> ---
>> net/bridge/br_stp_if.c| 1 +
>> net/bridge/br_stp_timer.c | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>
> This doesn't make much sense to me, how do you change from USER_STP to
> KERNEL_STP without first going through NO_STP ?
>
> If you go through NO_STP then all will be fine because br_stp_stop will
> restart
> the timers if the previous val was USER_STP.
>
 The problem occurs when KERNEL_STP is enabled if the bridge itself is
 already
 up. Then the hello_timer is not started. If the hello and hold timers
 should run only
 when KERNEL_STP is used then there are another problematic places
 (will send follow-up).

 Ivan

>>>
>>> Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only
>>> then, because you cannot do direct USER_STP -> KERNEL_STP.
>>>
>> No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as
>> USER_STP->NO_STP:
>>
>> 1) NO_STP->KERNEL_STP issue
>> hello_timer should be started in br_stp_start() - this patch
>>
>
> Right, I was talking only about this patch. By the way what about
> the port hold_timers ? This patch only starts the hello_timer.

The hold_timers should be started indirectly from br_transmit_config()
called from
br_config_bpdu_generation() called from hello_timer_expired() handler.

I.

>> 2) KERNEL_STP->NO_STP issue
>> hello timer and hold timers should be stopped (deleted) in br_stp_stop()
>>
>> 3) USER_STP->NO_STP issue
>> hello timer and hold timers should NOT be started in br_stp_stop()
>>
>
> Yep, ack.
>
>> Ivan
>>
>

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 16:20 GMT+02:00 Xin Long :
> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
> kernel hello and hold timers"), bridge would not start hello_timer if
> stp_enabled is not KERNEL_STP when br_dev_open.
>
> The problem is even if users set stp_enabled with KERNEL_STP later,
> the timer will still not be started. It causes that KERNEL_STP can
> not really work. Users have to re-ifup the bridge to avoid this.
>
> This patch is to fix it by starting br->hello_timer when enabling
> KERNEL_STP in br_stp_start.
>
> As an improvement, it's also to start hello_timer again only when
> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
> no reason to start the timer again when it's NO_STP.
>
> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello 
> and hold timers")
> Reported-by: Haidong Li 
> Signed-off-by: Xin Long 
> ---
>  net/bridge/br_stp_if.c| 1 +
>  net/bridge/br_stp_timer.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 08341d2..0db8102 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -179,6 +179,7 @@ static void br_stp_start(struct net_bridge *br)
> br_debug(br, "using kernel STP\n");
>
> /* To start timers on any ports left in blocking */
> +   mod_timer(>hello_timer, jiffies + br->hello_time);
> br_port_state_selection(br);
> }
>
> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
> index c98b3e5..60b6fe2 100644
> --- a/net/bridge/br_stp_timer.c
> +++ b/net/bridge/br_stp_timer.c
> @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg)
> if (br->dev->flags & IFF_UP) {
> br_config_bpdu_generation(br);
>
> -   if (br->stp_enabled != BR_USER_STP)
> +   if (br->stp_enabled == BR_KERNEL_STP)
> mod_timer(>hello_timer,
>   round_jiffies(jiffies + br->hello_time));
> }
> --
> 2.1.0
>
Reviewed-by: Ivan Vecera

[PATCH net] sctp: fix ICMP processing if skb is non-linear

2017-05-19 Thread Davide Caratti

when the ICMP packet is carried by a paged skb, sctp_err_lookup() may fail
validation even if the payload contents match an open socket: as a
consequence, sometimes ICMPs are wrongly ignored. Use skb_header_pointer()
to retrieve encapsulated SCTP headers, to ensure that ICMP payloads are
validated correctly, also when skbs are non-linear.

Signed-off-by: Davide Caratti 
---
 include/net/sctp/sctp.h |  2 +-
 net/sctp/input.c| 29 +++--
 net/sctp/ipv6.c |  2 +-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 069582e..1b8c16b 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -152,7 +152,7 @@ void sctp_v4_err(struct sk_buff *skb, u32 info);
 void sctp_hash_endpoint(struct sctp_endpoint *);
 void sctp_unhash_endpoint(struct sctp_endpoint *);
 struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
-struct sctphdr *, struct sctp_association **,
+struct sctp_association **,
 struct sctp_transport **);
 void sctp_err_finish(struct sock *, struct sctp_transport *);
 void sctp_icmp_frag_needed(struct sock *, struct sctp_association *,
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 0e06a27..7f3f983 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -469,19 +469,19 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
 
 /* Common lookup code for icmp/icmpv6 error handler. */
 struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *skb,
-struct sctphdr *sctphdr,
 struct sctp_association **app,
 struct sctp_transport **tpp)
 {
+   struct sctp_init_chunk _chunkhdr, *chunkhdr;
+   struct sctphdr _sctphdr, *sctphdr;
union sctp_addr saddr;
union sctp_addr daddr;
struct sctp_af *af;
struct sock *sk = NULL;
struct sctp_association *asoc;
struct sctp_transport *transport = NULL;
-   struct sctp_init_chunk *chunkhdr;
-   __u32 vtag = ntohl(sctphdr->vtag);
-   int len = skb->len - ((void *)sctphdr - (void *)skb->data);
+   int offset;
+   __u32 vtag;
 
*app = NULL; *tpp = NULL;
 
@@ -515,14 +515,23 @@ struct sock *sctp_err_lookup(struct net *net, int family, 
struct sk_buff *skb,
 * or the chunk type or the Initiate Tag does not match, silently
 * discard the packet.
 */
+   offset = skb_transport_offset(skb);
+   sctphdr = skb_header_pointer(skb, offset, sizeof(_sctphdr), &_sctphdr);
+   if (unlikely(!sctphdr))
+   goto out;
+
+   vtag = ntohl(sctphdr->vtag);
if (vtag == 0) {
-   chunkhdr = (void *)sctphdr + sizeof(struct sctphdr);
-   if (len < sizeof(struct sctphdr) + sizeof(sctp_chunkhdr_t)
- + sizeof(__be32) ||
+   offset += sizeof(_sctphdr);
+   /* chunk header + first 4 octects of init header */
+   chunkhdr = skb_header_pointer(skb, offset,
+ sizeof(struct sctp_chunkhdr) +
+ sizeof(__be32), &_chunkhdr);
+   if (!chunkhdr ||
chunkhdr->chunk_hdr.type != SCTP_CID_INIT ||
-   ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag) {
+   ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag)
goto out;
-   }
+
} else if (vtag != asoc->c.peer_vtag) {
goto out;
}
@@ -585,7 +594,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
savesctp = skb->transport_header;
skb_reset_network_header(skb);
skb_set_transport_header(skb, ihlen);
-   sk = sctp_err_lookup(net, AF_INET, skb, sctp_hdr(skb), , 
);
+   sk = sctp_err_lookup(net, AF_INET, skb, , );
/* Put back, the original values. */
skb->network_header = saveip;
skb->transport_header = savesctp;
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 142b70e..d72c8d5 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -157,7 +157,7 @@ static void sctp_v6_err(struct sk_buff *skb, struct 
inet6_skb_parm *opt,
savesctp = skb->transport_header;
skb_reset_network_header(skb);
skb_set_transport_header(skb, offset);
-   sk = sctp_err_lookup(net, AF_INET6, skb, sctp_hdr(skb), , 
);
+   sk = sctp_err_lookup(net, AF_INET6, skb, , );
/* Put back, the original pointers. */
skb->network_header   = saveip;
skb->transport_header = savesctp;
-- 
2.7.4

Re: [RFC net-next PATCH 2/5] mlx5: fix bug reading rss_hash_type from CQE

2017-05-19 Thread Daniel Borkmann


On 05/18/2017 05:41 PM, Jesper Dangaard Brouer wrote:

Masks for extracting part of the Completion Queue Entry (CQE)
field rss_hash_type was swapped, namely CQE_RSS_HTYPE_IP and
CQE_RSS_HTYPE_L4.

The bug resulted in setting skb->l4_hash, even-though the
rss_hash_type indicated that hash was NOT computed over the
L4 (UDP or TCP) part of the packet.

Added comments from the datasheet, to make it more clear what
these masks are selecting.

Signed-off-by: Jesper Dangaard Brouer 
---


Stand-alone fix for -net tree?


  include/linux/mlx5/device.h |   10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index dd9a263ed368..a940ec6a046c 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -787,8 +787,14 @@ enum {
  };

  enum {
-   CQE_RSS_HTYPE_IP= 0x3 << 6,
-   CQE_RSS_HTYPE_L4= 0x3 << 2,
+   CQE_RSS_HTYPE_IP= 0x3 << 2,
+   /* cqe->rss_hash_type[3:2] - IP destination selected for hash
+* (00 = none,  01 = IPv4, 10 = IPv6, 11 = Reserved)
+*/
+   CQE_RSS_HTYPE_L4= 0x3 << 6,
+   /* cqe->rss_hash_type[7:6] - L4 destination selected for hash
+* (00 = none, 01 = TCP. 10 = UDP, 11 = IPSEC.SPI
+*/
  };

  enum {

[PATCH v6 net-next 7/7] net: ethernet: update drivers to make both SW and HW TX timestamps

2017-05-19 Thread Miroslav Lichvar

Some drivers were calling the skb_tx_timestamp() function only when
a hardware timestamp was not requested. Now that applications can use
the SOF_TIMESTAMPING_OPT_TX_SWHW option to request both software and
hardware timestamps, the drivers need to be modified to unconditionally
call skb_tx_timestamp().

CC: Richard Cochran 
CC: Willem de Bruijn 
Signed-off-by: Miroslav Lichvar 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 3 +--
 drivers/net/ethernet/intel/e1000e/netdev.c| 4 ++--
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c   | 3 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 ++
 4 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 89b21d7..5a2ad9c 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1391,8 +1391,7 @@ static void xgbe_prep_tx_tstamp(struct xgbe_prv_data 
*pdata,
spin_unlock_irqrestore(>tstamp_lock, flags);
}
 
-   if (!XGMAC_GET_BITS(packet->attributes, TX_PACKET_ATTRIBUTES, PTP))
-   skb_tx_timestamp(skb);
+   skb_tx_timestamp(skb);
 }
 
 static void xgbe_prep_vlan(struct sk_buff *skb, struct xgbe_packet_data 
*packet)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 0ff9295..6ed3bc4 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5868,10 +5868,10 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
adapter->tx_hwtstamp_skb = skb_get(skb);
adapter->tx_hwtstamp_start = jiffies;
schedule_work(>tx_hwtstamp_work);
-   } else {
-   skb_tx_timestamp(skb);
}
 
+   skb_tx_timestamp(skb);
+
netdev_sent_queue(netdev, skb->len);
e1000_tx_queue(tx_ring, tx_flags, count);
/* Make sure there is space in the ring for the next send. */
diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c 
b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
index 1e59435..89831ad 100644
--- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
+++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
@@ -1418,8 +1418,7 @@ static netdev_tx_t sxgbe_xmit(struct sk_buff *skb, struct 
net_device *dev)
priv->hw->desc->tx_enable_tstamp(first_desc);
}
 
-   if (!tqueue->hwts_tx_en)
-   skb_tx_timestamp(skb);
+   skb_tx_timestamp(skb);
 
priv->hw->dma->enable_dma_transmission(priv->ioaddr, txq_index);
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index cce862b..27c12e7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2880,8 +2880,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, 
struct net_device *dev)
priv->xstats.tx_set_ic_bit++;
}
 
-   if (!priv->hwts_tx_en)
-   skb_tx_timestamp(skb);
+   skb_tx_timestamp(skb);
 
if (unlikely((skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
 priv->hwts_tx_en)) {
@@ -3084,8 +3083,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, 
struct net_device *dev)
priv->xstats.tx_set_ic_bit++;
}
 
-   if (!priv->hwts_tx_en)
-   skb_tx_timestamp(skb);
+   skb_tx_timestamp(skb);
 
/* Ready to fill the first descriptor and set the OWN bit w/o any
 * problems because all the descriptors are actually ready to be
-- 
2.9.3

[PATCH v6 net-next 5/7] net: fix documentation of struct scm_timestamping

2017-05-19 Thread Miroslav Lichvar

The scm_timestamping struct may return multiple non-zero fields, e.g.
when both software and hardware RX timestamping is enabled, or when the
SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
software timestamp is generated in the recvmsg() call in order to always
return a SCM_TIMESTAMP(NS) message.

CC: Richard Cochran 
CC: Willem de Bruijn 
Signed-off-by: Miroslav Lichvar 
---
 Documentation/networking/timestamping.txt | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/timestamping.txt 
b/Documentation/networking/timestamping.txt
index ce11e3a..50eb0e5 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -322,7 +322,7 @@ struct scm_timestamping {
 };
 
 The structure can return up to three timestamps. This is a legacy
-feature. Only one field is non-zero at any time. Most timestamps
+feature. At least one field is non-zero at any time. Most timestamps
 are passed in ts[0]. Hardware timestamps are passed in ts[2].
 
 ts[1] used to hold hardware timestamps converted to system time.
@@ -331,6 +331,12 @@ a HW PTP clock source, to allow time conversion in 
userspace and
 optionally synchronize system time with a userspace PTP stack such
 as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt.
 
+Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
+together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
+software timestamp will be generated in the recvmsg() call and passed
+in ts[0] when a real software timestamp is missing. This happens also
+on hardware transmit timestamps.
+
 2.1.1 Transmit timestamps with MSG_ERRQUEUE
 
 For transmit timestamps the outgoing packet is looped back to the
-- 
2.9.3

[PATCH v6 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping

2017-05-19 Thread Miroslav Lichvar

Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
be looped to the socket's error queue with a software timestamp even
when a hardware transmit timestamp is expected to be provided by the
driver.

Applications using this option will receive two separate messages from
the error queue, one with a software timestamp and the other with a
hardware timestamp. As the hardware timestamp is saved to the shared skb
info, which may happen before the first message with software timestamp
is received by the application, the hardware timestamp is copied to the
SCM_TIMESTAMPING control message only when the skb has no software
timestamp or it is an incoming packet.

While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
there are no other users.

CC: Richard Cochran 
CC: Willem de Bruijn 
Signed-off-by: Miroslav Lichvar 
---
 Documentation/networking/timestamping.txt |  8 
 include/linux/skbuff.h| 10 ++
 include/uapi/linux/net_tstamp.h   |  3 ++-
 net/core/skbuff.c |  4 
 net/socket.c  | 20 ++--
 5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/Documentation/networking/timestamping.txt 
b/Documentation/networking/timestamping.txt
index 50eb0e5..196ba17 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -203,6 +203,14 @@ SOF_TIMESTAMPING_OPT_PKTINFO:
   enabled and the driver is using NAPI. The struct contains also two
   other fields, but they are reserved and undefined.
 
+SOF_TIMESTAMPING_OPT_TX_SWHW:
+
+  Request both hardware and software timestamps for outgoing packets
+  when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE
+  are enabled at the same time. If both timestamps are generated,
+  two separate messages will be looped to the socket's error queue,
+  each containing just one timestamp.
+
 New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
 disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
 regardless of the setting of sysctl net.core.tstamp_allow_data.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1f8028c..3b2e284 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3254,13 +3254,6 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 void skb_tstamp_tx(struct sk_buff *orig_skb,
   struct skb_shared_hwtstamps *hwtstamps);
 
-static inline void sw_tx_timestamp(struct sk_buff *skb)
-{
-   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
-   !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
-   skb_tstamp_tx(skb, NULL);
-}
-
 /**
  * skb_tx_timestamp() - Driver hook for transmit timestamping
  *
@@ -3276,7 +3269,8 @@ static inline void sw_tx_timestamp(struct sk_buff *skb)
 static inline void skb_tx_timestamp(struct sk_buff *skb)
 {
skb_clone_tx_timestamp(skb);
-   sw_tx_timestamp(skb);
+   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
+   skb_tstamp_tx(skb, NULL);
 }
 
 /**
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index dee74d3..3d421d9 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -28,8 +28,9 @@ enum {
SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
SOF_TIMESTAMPING_OPT_STATS = (1<<12),
SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
+   SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
 
-   SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO,
+   SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TX_SWHW,
SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 346d3e8..68c02df 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3875,6 +3875,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
if (!sk)
return;
 
+   if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
+   skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
+   return;
+
tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY;
if (!skb_may_tx_timestamp(sk, tsonly))
return;
diff --git a/net/socket.c b/net/socket.c
index 67db7d8..cb355a7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -662,6 +662,19 @@ static bool skb_is_err_queue(const struct sk_buff *skb)
return skb->pkt_type == PACKET_OUTGOING;
 }
 
+/* On transmit, software and hardware timestamps are returned independently.
+ * As the two skb clones share the hardware timestamp, which may be updated
+ * before the software timestamp is received, a hardware TX timestamp may be
+ * returned only if there is no software TX timestamp. Ignore false software
+ * timestamps, which may be made in the __sock_recv_timestamp() call when the
+ * option

Re: [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Stephen Hemminger

On Fri, 19 May 2017 18:25:43 +0200
Ivan Vecera  wrote:

> Current bridge code incorrectly handles starting/stopping of hello and
> hold timers during STP enable/disable.
> 
> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>transition. This is not correct as the timers are stopped in NO_STP
>case.
> 
> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
>This is not also correct as the timers should be stopped in NO_STP
>state.
> 
> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
>transition. They should be stopped as they are running in KERNEL_STP
>state and should not run in NO_STP case.
> 
> The patch is a follow-up for "bridge: start hello_timer when enabling
> KERNEL_STP in br_stp_start" patch from Xin Long.
> 
> Cc: da...@davemloft.net
> Cc: sas...@cumulusnetworks.com
> Cc: step...@networkplumber.org
> Cc: bri...@lists.linux-foundation.org
> Cc: lucien@gmail.com
> Cc: niko...@cumulusnetworks.com
> Signed-off-by: Ivan Vecera 

Overall, this looks correct but the wording of commit message
is too terse.

It would be better to add a more complete description of the impact
of this from a user's point of view. I am concerned that this
might have other side effects.

For example, what is the sequence of commands to validated this.

What is the impact, should this go to stable?

[PATCH net-next 7/9] fou: make local function static

2017-05-19 Thread Stephen Hemminger

The build header functions are not used by any other code.

net/ipv6/fou6.c:36:5: warning: no previous prototype for ‘fou6_build_header’ 
[-Wmissing-prototypes]
net/ipv6/fou6.c:54:5: warning: no previous prototype for ‘gue6_build_header’ 
[-Wmissing-prototypes]

Need to do some code rearranging to satisfy different Kconfig possiblities.

Signed-off-by: Stephen Hemminger 
---
 net/ipv4/fou.c  | 82 -
 net/ipv6/fou6.c | 14 +-
 2 files changed, 47 insertions(+), 49 deletions(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 805f6607f8d9..8e0257d01200 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -859,25 +860,6 @@ size_t gue_encap_hlen(struct ip_tunnel_encap *e)
 }
 EXPORT_SYMBOL(gue_encap_hlen);
 
-static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
- struct flowi4 *fl4, u8 *protocol, __be16 sport)
-{
-   struct udphdr *uh;
-
-   skb_push(skb, sizeof(struct udphdr));
-   skb_reset_transport_header(skb);
-
-   uh = udp_hdr(skb);
-
-   uh->dest = e->dport;
-   uh->source = sport;
-   uh->len = htons(skb->len);
-   udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
-fl4->saddr, fl4->daddr, skb->len);
-
-   *protocol = IPPROTO_UDP;
-}
-
 int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
   u8 *protocol, __be16 *sport, int type)
 {
@@ -894,24 +876,6 @@ int __fou_build_header(struct sk_buff *skb, struct 
ip_tunnel_encap *e,
 }
 EXPORT_SYMBOL(__fou_build_header);
 
-int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
-u8 *protocol, struct flowi4 *fl4)
-{
-   int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM :
-  SKB_GSO_UDP_TUNNEL;
-   __be16 sport;
-   int err;
-
-   err = __fou_build_header(skb, e, protocol, , type);
-   if (err)
-   return err;
-
-   fou_build_udp(skb, e, fl4, protocol, sport);
-
-   return 0;
-}
-EXPORT_SYMBOL(fou_build_header);
-
 int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
   u8 *protocol, __be16 *sport, int type)
 {
@@ -985,8 +949,46 @@ int __gue_build_header(struct sk_buff *skb, struct 
ip_tunnel_encap *e,
 }
 EXPORT_SYMBOL(__gue_build_header);
 
-int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
-u8 *protocol, struct flowi4 *fl4)
+#ifdef CONFIG_NET_FOU_IP_TUNNELS
+
+static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
+ struct flowi4 *fl4, u8 *protocol, __be16 sport)
+{
+   struct udphdr *uh;
+
+   skb_push(skb, sizeof(struct udphdr));
+   skb_reset_transport_header(skb);
+
+   uh = udp_hdr(skb);
+
+   uh->dest = e->dport;
+   uh->source = sport;
+   uh->len = htons(skb->len);
+   udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
+fl4->saddr, fl4->daddr, skb->len);
+
+   *protocol = IPPROTO_UDP;
+}
+
+static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+   u8 *protocol, struct flowi4 *fl4)
+{
+   int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM :
+  SKB_GSO_UDP_TUNNEL;
+   __be16 sport;
+   int err;
+
+   err = __fou_build_header(skb, e, protocol, , type);
+   if (err)
+   return err;
+
+   fou_build_udp(skb, e, fl4, protocol, sport);
+
+   return 0;
+}
+
+static int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+   u8 *protocol, struct flowi4 *fl4)
 {
int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM :
   SKB_GSO_UDP_TUNNEL;
@@ -1001,9 +1003,7 @@ int gue_build_header(struct sk_buff *skb, struct 
ip_tunnel_encap *e,
 
return 0;
 }
-EXPORT_SYMBOL(gue_build_header);
 
-#ifdef CONFIG_NET_FOU_IP_TUNNELS
 
 static const struct ip_tunnel_encap_ops fou_iptun_ops = {
.encap_hlen = fou_encap_hlen,
diff --git a/net/ipv6/fou6.c b/net/ipv6/fou6.c
index 9ea249b9451e..6de3c04b0f30 100644
--- a/net/ipv6/fou6.c
+++ b/net/ipv6/fou6.c
@@ -14,6 +14,8 @@
 #include 
 #include 
 
+#if IS_ENABLED(CONFIG_IPV6_FOU_TUNNEL)
+
 static void fou6_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
   struct flowi6 *fl6, u8 *protocol, __be16 sport)
 {
@@ -33,8 +35,8 @@ static void fou6_build_udp(struct sk_buff *skb, struct 
ip_tunnel_encap *e,
*protocol = IPPROTO_UDP;
 }
 
-int fou6_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
- u8 *protocol, struct flowi6 *fl6)
+static int

[PATCH net-next 1/9] dcb: enforce minimum length on IEEE_APPS attribute

2017-05-19 Thread Stephen Hemminger

Found by reviewing the warning about unused policy table.
The code implies that it meant to check for size, but since
it unrolled the loop for attribute validation that is never used.
Instead do explicit check for attribute.

Compile tested only. Needs review by original author.

Signed-off-by: Stephen Hemminger 
---
 net/dcb/dcbnl.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 93106120f987..733f523707ac 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -178,10 +178,6 @@ static const struct nla_policy 
dcbnl_ieee_policy[DCB_ATTR_IEEE_MAX + 1] = {
[DCB_ATTR_IEEE_QCN_STATS]   = {.len = sizeof(struct ieee_qcn_stats)},
 };
 
-static const struct nla_policy dcbnl_ieee_app[DCB_ATTR_IEEE_APP_MAX + 1] = {
-   [DCB_ATTR_IEEE_APP] = {.len = sizeof(struct dcb_app)},
-};
-
 /* DCB number of traffic classes nested attributes. */
 static const struct nla_policy dcbnl_featcfg_nest[DCB_FEATCFG_ATTR_MAX + 1] = {
[DCB_FEATCFG_ATTR_ALL]  = {.type = NLA_FLAG},
@@ -1463,8 +1459,15 @@ static int dcbnl_ieee_set(struct net_device *netdev, 
struct nlmsghdr *nlh,
 
nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) {
struct dcb_app *app_data;
+
if (nla_type(attr) != DCB_ATTR_IEEE_APP)
continue;
+
+   if (nla_len(attr) < sizeof(struct dcb_app)) {
+   err = -ERANGE;
+   goto err;
+   }
+
app_data = nla_data(attr);
if (ops->ieee_setapp)
err = ops->ieee_setapp(netdev, app_data);
-- 
2.11.0

[PATCH net-next 4/9] inet: fix warning about missing prototype

2017-05-19 Thread Stephen Hemminger

The prototype for inet_rcv_saddr_equal was not being included.

Signed-off-by: Stephen Hemminger 
---
 net/ipv4/inet_connection_sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1054d330bf9d..82dec8825d28 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef INET_CSK_DEBUG
 const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
-- 
2.11.0

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Nikolay Aleksandrov


On 5/19/17 5:20 PM, Xin Long wrote:

Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and 
hold timers")
Reported-by: Haidong Li 
Signed-off-by: Xin Long 
---
  net/bridge/br_stp_if.c| 1 +
  net/bridge/br_stp_timer.c | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 08341d2..0db8102 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -179,6 +179,7 @@ static void br_stp_start(struct net_bridge *br)
br_debug(br, "using kernel STP\n");
  
  		/* To start timers on any ports left in blocking */

+   mod_timer(>hello_timer, jiffies + br->hello_time);
br_port_state_selection(br);
}
  
diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c

index c98b3e5..60b6fe2 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg)
if (br->dev->flags & IFF_UP) {
br_config_bpdu_generation(br);
  
-		if (br->stp_enabled != BR_USER_STP)

+   if (br->stp_enabled == BR_KERNEL_STP)
mod_timer(>hello_timer,
  round_jiffies(jiffies + br->hello_time));
}



+CC Bridge maintainers & fixes commit author

Acked-by: Nikolay Aleksandrov

Re: [PATCH v6 net-next 5/7] net: fix documentation of struct scm_timestamping

2017-05-19 Thread Willem de Bruijn

On Fri, May 19, 2017 at 11:52 AM, Miroslav Lichvar  wrote:
> The scm_timestamping struct may return multiple non-zero fields, e.g.
> when both software and hardware RX timestamping is enabled, or when the
> SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
> software timestamp is generated in the recvmsg() call in order to always
> return a SCM_TIMESTAMP(NS) message.
>
> CC: Richard Cochran 
> CC: Willem de Bruijn 
> Signed-off-by: Miroslav Lichvar 

Acked-by: Willem de Bruijn

Re: [PATCH v6 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping

2017-05-19 Thread Willem de Bruijn

On Fri, May 19, 2017 at 11:52 AM, Miroslav Lichvar  wrote:
> Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
> be looped to the socket's error queue with a software timestamp even
> when a hardware transmit timestamp is expected to be provided by the
> driver.
>
> Applications using this option will receive two separate messages from
> the error queue, one with a software timestamp and the other with a
> hardware timestamp. As the hardware timestamp is saved to the shared skb
> info, which may happen before the first message with software timestamp
> is received by the application, the hardware timestamp is copied to the
> SCM_TIMESTAMPING control message only when the skb has no software
> timestamp or it is an incoming packet.
>
> While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
> there are no other users.
>
> CC: Richard Cochran 
> CC: Willem de Bruijn 
> Signed-off-by: Miroslav Lichvar 

Acked-by: Willem de Bruijn

Re: [PATCH v5 net-next 5/7] net: fix documentation of struct scm_timestamping

2017-05-19 Thread Willem de Bruijn

On Fri, May 19, 2017 at 6:11 AM, Miroslav Lichvar  wrote:
> On Thu, May 18, 2017 at 03:38:30PM -0400, Willem de Bruijn wrote:
>> On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar  
>> wrote:
>> > +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
>> > +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
>> > +software timestamp will be generated in the recvmsg() call and passed
>> > +in ts[0] when a real software timestamp is missing.
>>
>> With receive software timestamping this is expected behavior? I would make
>> explicit that this happens even on tx timestamps.
>
> How about adding ", e.g. when receive timestamping is enabled
> between receiving the message and the recvmsg() call, or it is a
> message with a hardware transmit timestamp." ?

Perhaps even more brief "This happens also on hardware tx timestamps."

>> > For this reason it
>> > +is not recommended to combine SO_TIMESTAMP(NS) with SO_TIMESTAMPING.
>>
>> And I'd remove this. The extra timestamp is harmless, and we may be missing
>> other reasons why someone would want to enable both on the same socket.
>
> Ok. I'm just concerned people will inadvertently use the timestamp as
> a real timestamp and then wonder why SW TX timestamping is so bad. I
> have fallen into this trap.

So have I. It is equally surprising when only enabling SO_TIMESTAMP and
observing out of order timestamps. It is certainly worth calling out.

[PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Ivan Vecera

Current bridge code incorrectly handles starting/stopping of hello and
hold timers during STP enable/disable.

1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
   transition. This is not correct as the timers are stopped in NO_STP
   case.

2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
   This is not also correct as the timers should be stopped in NO_STP
   state.

3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
   transition. They should be stopped as they are running in KERNEL_STP
   state and should not run in NO_STP case.

The patch is a follow-up for "bridge: start hello_timer when enabling
KERNEL_STP in br_stp_start" patch from Xin Long.

Cc: da...@davemloft.net
Cc: sas...@cumulusnetworks.com
Cc: step...@networkplumber.org
Cc: bri...@lists.linux-foundation.org
Cc: lucien@gmail.com
Cc: niko...@cumulusnetworks.com
Signed-off-by: Ivan Vecera 
---
 net/bridge/br_stp_if.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 0db8102995a5..f137ebf27755 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
*arg)
 
 static void br_stp_start(struct net_bridge *br)
 {
-   struct net_bridge_port *p;
int err = -ENOENT;
 
if (net_eq(dev_net(br->dev), _net))
@@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
if (!err) {
br->stp_enabled = BR_USER_STP;
br_debug(br, "userspace STP started\n");
-
-   /* Stop hello and hold timers */
-   del_timer(>hello_timer);
-   list_for_each_entry(p, >port_list, list)
-   del_timer(>hold_timer);
} else {
br->stp_enabled = BR_KERNEL_STP;
br_debug(br, "using kernel STP\n");
@@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br)
br_err(br, "failed to stop userspace STP (%d)\n", err);
 
/* To start timers on any ports left in blocking */
-   mod_timer(>hello_timer, jiffies + br->hello_time);
-   list_for_each_entry(p, >port_list, list)
-   mod_timer(>hold_timer,
- round_jiffies(jiffies + BR_HOLD_TIME));
spin_lock_bh(>lock);
br_port_state_selection(br);
spin_unlock_bh(>lock);
+   } else {
+   /* BR_KERNEL_STP - stop hello and hold timers */
+   del_timer(>hello_timer);
+   list_for_each_entry(p, >port_list, list)
+   del_timer(>hold_timer);
}
 
br->stp_enabled = BR_NO_STP;
-- 
2.13.0

Re: [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Nikolay Aleksandrov


On 5/19/17 7:25 PM, Ivan Vecera wrote:

Current bridge code incorrectly handles starting/stopping of hello and
hold timers during STP enable/disable.

1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
transition. This is not correct as the timers are stopped in NO_STP
case.


This really is a noop, but ok.



2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
This is not also correct as the timers should be stopped in NO_STP
state.


Indeed, but the actual end result is almost as them being stopped because
in the timers there are specific checks if the STP == KERNEL_STP (see
br_transmit_config()) and the hold_timers will simply expire and not rearm
in any other mode. The only real problem is the hello_timer which continues
to rearm itself, but with Xin's earlier patch that is taken care of too.



3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
transition. They should be stopped as they are running in KERNEL_STP
state and should not run in NO_STP case.


Same comment as for point 2.



The patch is a follow-up for "bridge: start hello_timer when enabling
KERNEL_STP in br_stp_start" patch from Xin Long.



I'd say this is more of a cleanup/improvement after Xin's patch and thus would
suggest targeting net-next. The only real issue is fixed by his patch.


Cc: da...@davemloft.net
Cc: sas...@cumulusnetworks.com
Cc: step...@networkplumber.org
Cc: bri...@lists.linux-foundation.org
Cc: lucien@gmail.com
Cc: niko...@cumulusnetworks.com
Signed-off-by: Ivan Vecera 
---
  net/bridge/br_stp_if.c | 15 +--
  1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 0db8102995a5..f137ebf27755 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
*arg)
  
  static void br_stp_start(struct net_bridge *br)

  {
-   struct net_bridge_port *p;
int err = -ENOENT;
  
  	if (net_eq(dev_net(br->dev), _net))

@@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
if (!err) {
br->stp_enabled = BR_USER_STP;
br_debug(br, "userspace STP started\n");
-
-   /* Stop hello and hold timers */
-   del_timer(>hello_timer);
-   list_for_each_entry(p, >port_list, list)
-   del_timer(>hold_timer);
} else {
br->stp_enabled = BR_KERNEL_STP;
br_debug(br, "using kernel STP\n");
@@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br)
br_err(br, "failed to stop userspace STP (%d)\n", err);
  
  		/* To start timers on any ports left in blocking */

-   mod_timer(>hello_timer, jiffies + br->hello_time);
-   list_for_each_entry(p, >port_list, list)
-   mod_timer(>hold_timer,
- round_jiffies(jiffies + BR_HOLD_TIME));
spin_lock_bh(>lock);
br_port_state_selection(br);
spin_unlock_bh(>lock);
+   } else {
+   /* BR_KERNEL_STP - stop hello and hold timers */
+   del_timer(>hello_timer);
+   list_for_each_entry(p, >port_list, list)
+   del_timer(>hold_timer);
}
  
  	br->stp_enabled = BR_NO_STP;

Re: [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Ivan Vecera

2017-05-19 18:51 GMT+02:00 Xin Long :
> On Sat, May 20, 2017 at 12:25 AM, Ivan Vecera  wrote:
>> Current bridge code incorrectly handles starting/stopping of hello and
>> hold timers during STP enable/disable.
>>
>> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>>transition. This is not correct as the timers are stopped in NO_STP
>>case.
>>
>> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
>>This is not also correct as the timers should be stopped in NO_STP
>>state.
>>
>> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
>>transition. They should be stopped as they are running in KERNEL_STP
>>state and should not run in NO_STP case.
>>
>> The patch is a follow-up for "bridge: start hello_timer when enabling
>> KERNEL_STP in br_stp_start" patch from Xin Long.
>>
>> Cc: da...@davemloft.net
>> Cc: sas...@cumulusnetworks.com
>> Cc: step...@networkplumber.org
>> Cc: bri...@lists.linux-foundation.org
>> Cc: lucien@gmail.com
>> Cc: niko...@cumulusnetworks.com
>> Signed-off-by: Ivan Vecera 
>> ---
>>  net/bridge/br_stp_if.c | 15 +--
>>  1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
>> index 0db8102995a5..f137ebf27755 100644
>> --- a/net/bridge/br_stp_if.c
>> +++ b/net/bridge/br_stp_if.c
>> @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
>> *arg)
>>
>>  static void br_stp_start(struct net_bridge *br)
>>  {
>> -   struct net_bridge_port *p;
>> int err = -ENOENT;
>>
>> if (net_eq(dev_net(br->dev), _net))
>> @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
>> if (!err) {
>> br->stp_enabled = BR_USER_STP;
>> br_debug(br, "userspace STP started\n");
>> -
>> -   /* Stop hello and hold timers */
>> -   del_timer(>hello_timer);
>> -   list_for_each_entry(p, >port_list, list)
>> -   del_timer(>hold_timer);
>> } else {
>> br->stp_enabled = BR_KERNEL_STP;
>> br_debug(br, "using kernel STP\n");
>> @@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br)
>> br_err(br, "failed to stop userspace STP (%d)\n", 
>> err);
>>
>> /* To start timers on any ports left in blocking */
>> -   mod_timer(>hello_timer, jiffies + br->hello_time);
>> -   list_for_each_entry(p, >port_list, list)
>> -   mod_timer(>hold_timer,
>> - round_jiffies(jiffies + BR_HOLD_TIME));
>> spin_lock_bh(>lock);
>> br_port_state_selection(br);
>> spin_unlock_bh(>lock);
>> +   } else {
>> +   /* BR_KERNEL_STP - stop hello and hold timers */
>> +   del_timer(>hello_timer);
>> +   list_for_each_entry(p, >port_list, list)
>> +   del_timer(>hold_timer);
> I'm thinking, what if the timers are running when deleting them ?
> del_timer may not be going to delete it, and still have to stop itself
> next time when br->stp_enabled = BR_NO_STP.
>
> So do you think it's better to do nothing here and just leave it to be
> stopped by itself when checking br->stp_enabled  in
> br_hello_timer_expired ?

Yes, this kind of "lazy stopping" could be safer.

I.

Re: [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Ivan Vecera

2017-05-19 18:55 GMT+02:00 Nikolay Aleksandrov :
> On 5/19/17 7:25 PM, Ivan Vecera wrote:
>>
>> Current bridge code incorrectly handles starting/stopping of hello and
>> hold timers during STP enable/disable.
>>
>> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>> transition. This is not correct as the timers are stopped in NO_STP
>> case.
>
>
> This really is a noop, but ok.

Yes, stopping of stopped timers are safe but confusing.

>> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
>> This is not also correct as the timers should be stopped in NO_STP
>> state.
>
>
> Indeed, but the actual end result is almost as them being stopped because
> in the timers there are specific checks if the STP == KERNEL_STP (see
> br_transmit_config()) and the hold_timers will simply expire and not rearm
> in any other mode. The only real problem is the hello_timer which continues
> to rearm itself, but with Xin's earlier patch that is taken care of too.

Yes, this is clean-up as well. The starting of timers are more
confusing than dangerous
but from a reader's point of view the starting of timers is non-sense
when STP is
going to be disabled.

>>
>> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
>> transition. They should be stopped as they are running in KERNEL_STP
>> state and should not run in NO_STP case.
>
>
> Same comment as for point 2.
This can be removed... and leave hello_timer handler to stop itself.

>> The patch is a follow-up for "bridge: start hello_timer when enabling
>> KERNEL_STP in br_stp_start" patch from Xin Long.
>>
>
> I'd say this is more of a cleanup/improvement after Xin's patch and thus
> would
> suggest targeting net-next. The only real issue is fixed by his patch.
Agree... will send resend against net-next.

Thanks for comments,
Ivan

[PATCH iproute2 1/1] tc: fix Makefile to build skbmod

2017-05-19 Thread Roman Mashak

Signed-off-by: Roman Mashak 
---
 tc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tc/Makefile b/tc/Makefile
index 9a6bb1d..678b302 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -45,6 +45,7 @@ TCMODULES += m_nat.o
 TCMODULES += m_pedit.o
 TCMODULES += m_ife.o
 TCMODULES += m_skbedit.o
+TCMODULES += m_skbmod.o
 TCMODULES += m_csum.o
 TCMODULES += m_simple.o
 TCMODULES += m_vlan.o
-- 
1.9.1

Re: [RFC net-next PATCH 1/5] samples/bpf: xdp_tx_iptunnel make use of map_data[]

2017-05-19 Thread Daniel Borkmann


On 05/18/2017 05:41 PM, Jesper Dangaard Brouer wrote:

There is no reason to use a compile time constant MAX_IPTNL_ENTRIES
shared between the _user.c and _kern.c, when map_data[].def.max_entries
can tell us dynamically what the max_entries were of the ELF map that
the bpf loaded created.

Signed-off-by: Jesper Dangaard Brouer 


Previous code was perhaps a bit more robust in the sense that the
order of the map wouldn't matter due to MAX_IPTNL_ENTRIES being
shared. Now you rely on it being in slot 1 (map_data[1].def.max_entries)
from "maps" section in ELF.


  samples/bpf/xdp_tx_iptunnel_common.h |2 --
  samples/bpf/xdp_tx_iptunnel_kern.c   |2 +-
  samples/bpf/xdp_tx_iptunnel_user.c   |   14 +-
  3 files changed, 10 insertions(+), 8 deletions(-)


Not sure it's worth it given this actually adds more code and makes
it more fragile at the same time. Only point I could see is to demo
usage of map_data[1].def.max_entries for sample code.

Perhaps at the very minimum add a warning comment to xdp_tx_iptunnel_kern.c
that should the code be further extended with additional maps, that
ordering of struct bpf_map_def entries really matters here to not break
the _user.c part.

Other than that, this should be sent as stand-alone "cleanup" ...

[PATCH v6 net-next 4/7] net: add new control message for incoming HW-timestamped packets

2017-05-19 Thread Miroslav Lichvar

Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
for incoming packets with hardware timestamps. It contains the index of
the real interface which received the packet and the length of the
packet at layer 2.

The index is useful with bonding, bridges and other interfaces, where
IP_PKTINFO doesn't allow applications to determine which PHC made the
timestamp. With the L2 length (and link speed) it is possible to
transpose preamble timestamps to trailer timestamps, which are used in
the NTP protocol.

While this information could be provided by two new socket options
independently from timestamping, it doesn't look like they would be very
useful. With this option any performance impact is limited to hardware
timestamping.

Use dev_get_by_napi_id() to get the device and its index. On kernels
with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
index will be returned in the control message.

CC: Richard Cochran 
Acked-by: Willem de Bruijn 
Signed-off-by: Miroslav Lichvar 
---
 Documentation/networking/timestamping.txt | 10 ++
 include/uapi/asm-generic/socket.h |  2 ++
 include/uapi/linux/net_tstamp.h   | 11 ++-
 net/socket.c  | 27 ++-
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/timestamping.txt 
b/Documentation/networking/timestamping.txt
index 96f5069..ce11e3a 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -193,6 +193,16 @@ SOF_TIMESTAMPING_OPT_STATS:
   the transmit timestamps, such as how long a certain block of
   data was limited by peer's receiver window.
 
+SOF_TIMESTAMPING_OPT_PKTINFO:
+
+  Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
+  packets with hardware timestamps. The message contains struct
+  scm_ts_pktinfo, which supplies the index of the real interface which
+  received the packet and its length at layer 2. A valid (non-zero)
+  interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is
+  enabled and the driver is using NAPI. The struct contains also two
+  other fields, but they are reserved and undefined.
+
 New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
 disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
 regardless of the setting of sysctl net.core.tstamp_allow_data.
diff --git a/include/uapi/asm-generic/socket.h 
b/include/uapi/asm-generic/socket.h
index 2b48856..a5f6e81 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -100,4 +100,6 @@
 
 #define SO_COOKIE  57
 
+#define SCM_TIMESTAMPING_PKTINFO   58
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 0749fb1..dee74d3 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -9,6 +9,7 @@
 #ifndef _NET_TIMESTAMPING_H
 #define _NET_TIMESTAMPING_H
 
+#include 
 #include/* for SO_TIMESTAMPING */
 
 /* SO_TIMESTAMPING gets an integer bit field comprised of these values */
@@ -26,8 +27,9 @@ enum {
SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
SOF_TIMESTAMPING_OPT_STATS = (1<<12),
+   SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
 
-   SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS,
+   SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO,
SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 SOF_TIMESTAMPING_LAST
 };
@@ -130,4 +132,11 @@ enum hwtstamp_rx_filters {
HWTSTAMP_FILTER_NTP_ALL,
 };
 
+/* SCM_TIMESTAMPING_PKTINFO control message */
+struct scm_ts_pktinfo {
+   __u32 if_index;
+   __u32 pkt_length;
+   __u32 reserved[2];
+};
+
 #endif /* _NET_TIMESTAMPING_H */
diff --git a/net/socket.c b/net/socket.c
index c2564eb..67db7d8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -662,6 +662,27 @@ static bool skb_is_err_queue(const struct sk_buff *skb)
return skb->pkt_type == PACKET_OUTGOING;
 }
 
+static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb)
+{
+   struct scm_ts_pktinfo ts_pktinfo;
+   struct net_device *orig_dev;
+
+   if (!skb_mac_header_was_set(skb))
+   return;
+
+   memset(_pktinfo, 0, sizeof(ts_pktinfo));
+
+   rcu_read_lock();
+   orig_dev = dev_get_by_napi_id(skb_napi_id(skb));
+   if (orig_dev)
+   ts_pktinfo.if_index = orig_dev->ifindex;
+   rcu_read_unlock();
+
+   ts_pktinfo.pkt_length = skb->len - skb_mac_offset(skb);
+   put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO,
+sizeof(ts_pktinfo), _pktinfo);
+}
+
 /*
  * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
  */
@@ -699,8 +720,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock 
*sk,

[PATCH v6 net-next 2/7] net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL

2017-05-19 Thread Miroslav Lichvar

Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid
filter and update drivers which can timestamp all packets, or which
explicitly list unsupported filters instead of using a default case, to
handle the filter.

CC: Richard Cochran 
CC: Willem de Bruijn 
Signed-off-by: Miroslav Lichvar 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c   | 1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 1 +
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 1 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 1 +
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c   | 1 +
 drivers/net/ethernet/intel/e1000e/netdev.c | 1 +
 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 1 +
 drivers/net/ethernet/intel/igb/igb_ptp.c   | 1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c   | 1 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 1 +
 drivers/net/ethernet/neterion/vxge/vxge-main.c | 1 +
 drivers/net/ethernet/qlogic/qede/qede_ptp.c| 1 +
 drivers/net/ethernet/sfc/ef10.c| 1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 1 +
 drivers/net/ethernet/ti/cpsw.c | 1 +
 drivers/net/ethernet/tile/tilegx.c | 1 +
 net/core/dev_ioctl.c   | 3 +--
 18 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index c772420..89b21d7 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1268,6 +1268,7 @@ static int xgbe_set_hwtstamp_settings(struct 
xgbe_prv_data *pdata,
case HWTSTAMP_FILTER_NONE:
break;
 
+   case HWTSTAMP_FILTER_NTP_ALL:
case HWTSTAMP_FILTER_ALL:
XGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSENALL, 1);
XGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSENA, 1);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 7414ffd..14c236e 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -15351,6 +15351,7 @@ int bnx2x_configure_ptp_filters(struct bnx2x *bp)
break;
case HWTSTAMP_FILTER_ALL:
case HWTSTAMP_FILTER_SOME:
+   case HWTSTAMP_FILTER_NTP_ALL:
bp->rx_filter = HWTSTAMP_FILTER_NONE;
break;
case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 649f2aa..ba01242 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -3024,6 +3024,7 @@ static int hwtstamp_ioctl(struct net_device *netdev, 
struct ifreq *ifr)
case HWTSTAMP_FILTER_PTP_V2_EVENT:
case HWTSTAMP_FILTER_PTP_V2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+   case HWTSTAMP_FILTER_NTP_ALL:
conf.rx_filter = HWTSTAMP_FILTER_ALL;
break;
default:
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index d51c8d8..31d737c 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -2085,6 +2085,7 @@ static int hwtstamp_ioctl(struct net_device *netdev, 
struct ifreq *ifr)
case HWTSTAMP_FILTER_PTP_V2_EVENT:
case HWTSTAMP_FILTER_PTP_V2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+   case HWTSTAMP_FILTER_NTP_ALL:
conf.rx_filter = HWTSTAMP_FILTER_ALL;
break;
default:
diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c 
b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
index a213868..2887bca 100644
--- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
+++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
@@ -755,6 +755,7 @@ static int octeon_mgmt_ioctl_hwtstamp(struct net_device 
*netdev,
case HWTSTAMP_FILTER_PTP_V2_EVENT:
case HWTSTAMP_FILTER_PTP_V2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+   case HWTSTAMP_FILTER_NTP_ALL:
p->has_rx_tstamp = have_hw_timestamps;
config.rx_filter = HWTSTAMP_FILTER_ALL;
if (p->has_rx_tstamp) {
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index b367972..0ff9295 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3680,6 +3680,7 @@ static int e1000e_config_hwtstamp(struct e1000_adapter 
*adapter,
 * Delay Request messages but not both so fall-through to
 * time stamp all packets.
 */
+   case

[PATCH v6 net-next 3/7] net: add function to retrieve original skb device using NAPI ID

2017-05-19 Thread Miroslav Lichvar

Since commit b68581778cd0 ("net: Make skb->skb_iif always track
skb->dev") skbs don't have the original index of the interface which
received the packet. This information is now needed for a new control
message related to hardware timestamping.

Instead of adding a new field to skb, we can find the device by the NAPI
ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the
driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to
hide the CONFIG_NET_RX_BUSY_POLL ifdef.

CC: Richard Cochran 
Suggested-by: Willem de Bruijn 
Acked-by: Willem de Bruijn 
Signed-off-by: Miroslav Lichvar 
---
 include/linux/netdevice.h |  1 +
 include/linux/skbuff.h|  9 +
 net/core/dev.c| 26 ++
 3 files changed, 36 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3f39d27..b6c36d5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2456,6 +2456,7 @@ static inline int dev_recursion_level(void)
 struct net_device *dev_get_by_index(struct net *net, int ifindex);
 struct net_device *__dev_get_by_index(struct net *net, int ifindex);
 struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex);
+struct net_device *dev_get_by_napi_id(unsigned int napi_id);
 int netdev_get_name(struct net *net, char *name, int ifindex);
 int dev_restart(struct net_device *dev);
 int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7c0cb2c..1f8028c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -855,6 +855,15 @@ static inline bool skb_pkt_type_ok(u32 ptype)
return ptype <= PACKET_OTHERHOST;
 }
 
+static inline unsigned int skb_napi_id(const struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+   return skb->napi_id;
+#else
+   return 0;
+#endif
+}
+
 void kfree_skb(struct sk_buff *skb);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_tx_error(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index acd594c..6d3c452 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -162,6 +162,7 @@ static int netif_rx_internal(struct sk_buff *skb);
 static int call_netdevice_notifiers_info(unsigned long val,
 struct net_device *dev,
 struct netdev_notifier_info *info);
+static struct napi_struct *napi_by_id(unsigned int napi_id);
 
 /*
  * The @dev_base_head list is protected by @dev_base_lock and the rtnl
@@ -866,6 +867,31 @@ struct net_device *dev_get_by_index(struct net *net, int 
ifindex)
 EXPORT_SYMBOL(dev_get_by_index);
 
 /**
+ * dev_get_by_napi_id - find a device by napi_id
+ * @napi_id: ID of the NAPI struct
+ *
+ * Search for an interface by NAPI ID. Returns %NULL if the device
+ * is not found or a pointer to the device. The device has not had
+ * its reference counter increased so the caller must be careful
+ * about locking. The caller must hold RCU lock.
+ */
+
+struct net_device *dev_get_by_napi_id(unsigned int napi_id)
+{
+   struct napi_struct *napi;
+
+   WARN_ON_ONCE(!rcu_read_lock_held());
+
+   if (napi_id < MIN_NAPI_ID)
+   return NULL;
+
+   napi = napi_by_id(napi_id);
+
+   return napi ? napi->dev : NULL;
+}
+EXPORT_SYMBOL(dev_get_by_napi_id);
+
+/**
  * netdev_get_name - get a netdevice name, knowing its ifindex.
  * @net: network namespace
  * @name: a pointer to the buffer where the name will be stored.
-- 
2.9.3

[PATCH v6 net-next 0/7] Extend socket timestamping API

2017-05-19 Thread Miroslav Lichvar

Changes v5->v6:
- fixed skb_is_swtx_tstamp() when OPT_TX_SWHW is disabled and improved
  its description
- improved OPT_PKTINFO documentation
- improved scm_timestamping documentation

Changes v4->v5:
- fixed initialization of reserved fields in struct scm_ts_pktinfo

Changes v3->v4:
- added reserved fields to struct scm_ts_pktinfo
- replaced patch fixing false SW timestamps with a documentation fix
- updated OPT_TX_SWHW patch to handle false SW timestamps

Changes v2->v3:
- modified struct scm_ts_pktinfo to use fixed-width integer types
- added WARN_ON_ONCE for missing RCU lock in dev_get_by_napi_id()
- modified dev_get_by_napi_id() to not return dev in unexpected branch
- modified recv to return SCM_TIMESTAMPING_PKTINFO even if the interface
  index is unknown

Changes v1->v2:
- added separate patch for new NAPI functions 
- split code from __sock_recv_timestamp() for better readability
- fixed RCU locking
- fixed compiler warning (missing case in switch in first patch)
- inline sw_tx_timestamp() in its only user

Changes RFC->v1:
- reworked SOF_TIMESTAMPING_OPT_PKTINFO patch to not add new fields to
  skb shared info (net device is now looked up by napi_id), not require
  any changes in drivers, and restrict the cmsg to incoming packets
- renamed SOF_TIMESTAMPING_OPT_MULTIMSG to SOF_TIMESTAMPING_OPT_TX_SWHW
  and fixed its description
- moved struct scm_ts_pktinfo from errqueue.h to net_tstamp.h as it
  can't be received from the error queue anymore
- improved commit descriptions and removed incorrect comment

This patchset adds new options to the timestamping API that will be
useful for NTP implementations and possibly other applications.

The first patch specifies a timestamp filter for NTP packets. The second
patch updates drivers that can timestamp all packets, or need to list
the filter as unsupported. There is no attempt to add the support to the
phyter driver.

The third patch adds two helper functions working with NAPI ID, which is
needed by the next patch. The fourth patch adds a new option to get a
new control message with the L2 length and interface index for incoming
packets with hardware timestamps.

The fifth patch fixes documentation on number of non-zero fields in
scm_timestamping and warns about false software timestamps when
SO_TIMESTAMP(NS) is combined with SCM_TIMESTAMPING.

The sixth patch adds a new option to request both software and hardware
timestamps for outgoing packets. The seventh patch updates drivers that
assumed software timestamping cannot be used together with hardware
timestamping.

The patches have been tested on x86_64 machines with igb and e1000e
drivers.

Miroslav Lichvar (7):
  net: define receive timestamp filter for NTP
  net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL
  net: add function to retrieve original skb device using NAPI ID
  net: add new control message for incoming HW-timestamped packets
  net: fix documentation of struct scm_timestamping
  net: allow simultaneous SW and HW transmit timestamping
  net: ethernet: update drivers to make both SW and HW TX timestamps

 Documentation/networking/timestamping.txt  | 26 +++-
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c   |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |  1 +
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  1 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |  1 +
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c   |  1 +
 drivers/net/ethernet/intel/e1000e/netdev.c |  5 ++-
 drivers/net/ethernet/intel/i40e/i40e_ptp.c |  1 +
 drivers/net/ethernet/intel/igb/igb_ptp.c   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c   |  1 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  1 +
 drivers/net/ethernet/neterion/vxge/vxge-main.c |  1 +
 drivers/net/ethernet/qlogic/qede/qede_ptp.c|  1 +
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c|  3 +-
 drivers/net/ethernet/sfc/ef10.c|  1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  7 ++--
 drivers/net/ethernet/ti/cpsw.c |  1 +
 drivers/net/ethernet/tile/tilegx.c |  1 +
 include/linux/netdevice.h  |  1 +
 include/linux/skbuff.h | 19 +
 include/uapi/asm-generic/socket.h  |  2 +
 include/uapi/linux/net_tstamp.h| 15 ++-
 net/core/dev.c | 26 
 net/core/dev_ioctl.c   |  1 +
 net/core/skbuff.c  |  4 ++
 net/socket.c   | 47 --
 27 files changed, 151 insertions(+), 23 deletions(-)

-- 
2.9.3

[PATCH v6 net-next 1/7] net: define receive timestamp filter for NTP

2017-05-19 Thread Miroslav Lichvar

Add HWTSTAMP_FILTER_NTP_ALL to the hwtstamp_rx_filters enum for
timestamping of NTP packets. There is currently only one driver
(phyter) that could support it directly.

CC: Richard Cochran 
CC: Willem de Bruijn 
Signed-off-by: Miroslav Lichvar 
---
 include/uapi/linux/net_tstamp.h | 3 +++
 net/core/dev_ioctl.c| 2 ++
 2 files changed, 5 insertions(+)

diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 464dcca..0749fb1 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -125,6 +125,9 @@ enum hwtstamp_rx_filters {
HWTSTAMP_FILTER_PTP_V2_SYNC,
/* PTP v2/802.AS1, any layer, Delay_req packet */
HWTSTAMP_FILTER_PTP_V2_DELAY_REQ,
+
+   /* NTP, UDP, all versions and packet modes */
+   HWTSTAMP_FILTER_NTP_ALL,
 };
 
 #endif /* _NET_TIMESTAMPING_H */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d2..8f036a7 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -227,6 +227,8 @@ static int net_hwtstamp_validate(struct ifreq *ifr)
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
rx_filter_valid = 1;
break;
+   case HWTSTAMP_FILTER_NTP_ALL:
+   break;
}
 
if (!tx_type_valid || !rx_filter_valid)
-- 
2.9.3

HELLO!!!!!

2017-05-19 Thread MR. NEIL TROTTER

HELLO!

I am Mr  Neil Trotter, the current winner of 108  Euro millions jackpot, if you 
have received this email then you are of the lucky fellows to benefit from 
me,so do get back to me for a better understanding. Here is the website for 
proof
http://www.huffingtonpost.co.uk/2014/03/18/neil-trotter-euromillions-winner_n_4984234.html}
Contact Email;(mr.neiltrotter...@outlook.com)

THANKS,
MR. NEIL TROTTER.

Re: [PATCH net-next V5 0/9] vhost_net rx batch dequeuing

2017-05-19 Thread Michael S. Tsirkin

On Fri, May 19, 2017 at 02:27:16PM +0800, Jason Wang wrote:
> 
> 
> On 2017年05月18日 04:59, Michael S. Tsirkin wrote:
> > On Wed, May 17, 2017 at 12:14:36PM +0800, Jason Wang wrote:
> > > This series tries to implement rx batching for vhost-net. This is done
> > > by batching the dequeuing from skb_array which was exported by
> > > underlayer socket and pass the sbk back through msg_control to finish
> > > userspace copying. This is also the requirement for more batching
> > > implemention on rx path.
> > > 
> > > Tests shows at most 7.56% improvment bon rx pps on top of batch
> > > zeroing and no obvious changes for TCP_STREAM/TCP_RR result.
> > > 
> > > Please review.
> > > 
> > > Thanks
> > A surprisingly large gain for such as simple change.  It would be nice
> > to understand better why this helps - in particular, does the optimal
> > batch size change if ring is bigger or smaller?
> 
> Will test, just want to confirm. You mean virtio ring not tx_queue_len here?
> 
> Thanks

Exactly.
Thanks,
MST

Re: [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Xin Long

On Sat, May 20, 2017 at 12:25 AM, Ivan Vecera  wrote:
> Current bridge code incorrectly handles starting/stopping of hello and
> hold timers during STP enable/disable.
>
> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>transition. This is not correct as the timers are stopped in NO_STP
>case.
>
> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
>This is not also correct as the timers should be stopped in NO_STP
>state.
>
> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
>transition. They should be stopped as they are running in KERNEL_STP
>state and should not run in NO_STP case.
>
> The patch is a follow-up for "bridge: start hello_timer when enabling
> KERNEL_STP in br_stp_start" patch from Xin Long.
>
> Cc: da...@davemloft.net
> Cc: sas...@cumulusnetworks.com
> Cc: step...@networkplumber.org
> Cc: bri...@lists.linux-foundation.org
> Cc: lucien@gmail.com
> Cc: niko...@cumulusnetworks.com
> Signed-off-by: Ivan Vecera 
> ---
>  net/bridge/br_stp_if.c | 15 +--
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 0db8102995a5..f137ebf27755 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
> *arg)
>
>  static void br_stp_start(struct net_bridge *br)
>  {
> -   struct net_bridge_port *p;
> int err = -ENOENT;
>
> if (net_eq(dev_net(br->dev), _net))
> @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
> if (!err) {
> br->stp_enabled = BR_USER_STP;
> br_debug(br, "userspace STP started\n");
> -
> -   /* Stop hello and hold timers */
> -   del_timer(>hello_timer);
> -   list_for_each_entry(p, >port_list, list)
> -   del_timer(>hold_timer);
> } else {
> br->stp_enabled = BR_KERNEL_STP;
> br_debug(br, "using kernel STP\n");
> @@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br)
> br_err(br, "failed to stop userspace STP (%d)\n", 
> err);
>
> /* To start timers on any ports left in blocking */
> -   mod_timer(>hello_timer, jiffies + br->hello_time);
> -   list_for_each_entry(p, >port_list, list)
> -   mod_timer(>hold_timer,
> - round_jiffies(jiffies + BR_HOLD_TIME));
> spin_lock_bh(>lock);
> br_port_state_selection(br);
> spin_unlock_bh(>lock);
> +   } else {
> +   /* BR_KERNEL_STP - stop hello and hold timers */
> +   del_timer(>hello_timer);
> +   list_for_each_entry(p, >port_list, list)
> +   del_timer(>hold_timer);
I'm thinking, what if the timers are running when deleting them ?
del_timer may not be going to delete it, and still have to stop itself
next time when br->stp_enabled = BR_NO_STP.

So do you think it's better to do nothing here and just leave it to be
stopped by itself when checking br->stp_enabled  in
br_hello_timer_expired ?

> }
>
> br->stp_enabled = BR_NO_STP;
> --
> 2.13.0
>

Re: [PATCH net-next] net: sched: provide stubs for tcf_chain_{get,put} for CONFIG_NET_CLS=n

2017-05-19 Thread Cong Wang

On Thu, May 18, 2017 at 9:24 AM, Sabrina Dubroca  wrote:
> This also changes tcf_chain_get() to return an error pointer instead of
> NULL, so that tcf_action_goto_chain_init() can differentiate memory
> allocation failure from lack of support.
>
> Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
> Signed-off-by: Sabrina Dubroca 
> ---
> I'm not sure this EOPNOTSUPP is really necessary, ie if we can really
> reach the tcf_action_goto_chain_init() call when CONFIG_NET_CLS=n.
> If not, a simpler patch would add a tcf_chain_get() stub that just
> returns NULL, as we wouldn't have to care about returning an incorrect
> error code from tcf_action_goto_chain_init().


I wonder if we should just make CONFIG_NET_CLS_ACT depending
on CONFIG_NET_CLS. Although without filters we can still have
standalone actions but no one can use them.

[PATCH net-next 3/9] udp: make local function static

2017-05-19 Thread Stephen Hemminger

udp_queue_rcv_skb was global but only used in one file.
Identified by this warning:
net/ipv4/udp.c:1775:5: warning: no previous prototype for ‘udp_queue_rcv_skb’ 
[-Wmissing-prototypes]
 int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)

Signed-off-by: Stephen Hemminger 
---
 net/ipv4/udp.c | 2 +-
 net/ipv6/udp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e7b6cfcca627..8d7980ca27a0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1772,7 +1772,7 @@ EXPORT_SYMBOL(udp_encap_enable);
  * Note that in the success and error cases, the skb is assumed to
  * have either been requeued or freed.
  */
-int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
struct udp_sock *up = udp_sk(sk);
int is_udplite = IS_UDPLITE(sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f78fdf8c9f0f..d558c1fef6fd 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -570,7 +570,7 @@ void udpv6_encap_enable(void)
 }
 EXPORT_SYMBOL(udpv6_encap_enable);
 
-int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
struct udp_sock *up = udp_sk(sk);
int is_udplite = IS_UDPLITE(sk);
-- 
2.11.0

[PATCH net-next 6/9] xfrm: make xfrm_dev_register static

2017-05-19 Thread Stephen Hemminger

This function is only used in this file and should not be global.

Signed-off-by: Stephen Hemminger 
---
 net/xfrm/xfrm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 8ec8a3fcf8d4..50ec73399b48 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -138,7 +138,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct 
xfrm_state *x)
 }
 EXPORT_SYMBOL_GPL(xfrm_dev_offload_ok);
 
-int xfrm_dev_register(struct net_device *dev)
+static int xfrm_dev_register(struct net_device *dev)
 {
if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
return NOTIFY_BAD;
-- 
2.11.0

[PATCH net-next 8/9] ipv6: drop unused variables in seg6_genl_dumphac

2017-05-19 Thread Stephen Hemminger

THe seg6_pernet_data variable was set but never used.

Signed-off-by: Stephen Hemminger 
---
 net/ipv6/seg6.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
index 5f44ffed2576..15fba55e3da8 100644
--- a/net/ipv6/seg6.c
+++ b/net/ipv6/seg6.c
@@ -303,13 +303,9 @@ static int seg6_genl_dumphmac_done(struct netlink_callback 
*cb)
 static int seg6_genl_dumphmac(struct sk_buff *skb, struct netlink_callback *cb)
 {
struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
-   struct net *net = sock_net(skb->sk);
-   struct seg6_pernet_data *sdata;
struct seg6_hmac_info *hinfo;
int ret;
 
-   sdata = seg6_pernet(net);
-
ret = rhashtable_walk_start(iter);
if (ret && ret != -EAGAIN)
goto done;
-- 
2.11.0

[PATCH net-next 2/9] ila: propagate error code in ila_output

2017-05-19 Thread Stephen Hemminger

This warning:
net/ipv6/ila/ila_lwt.c: In function ‘ila_output’:
net/ipv6/ila/ila_lwt.c:42:6: warning: variable ‘err’ set but not used 
[-Wunused-but-set-variable]

It looks like the code attempts to set propagate different error
values, but always returned -EINVAL.

Compile tested only. Needs review by original author.

Signed-off-by: Stephen Hemminger 
---
 net/ipv6/ila/ila_lwt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index b3df03e3faa0..f4a413aba423 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -91,7 +91,7 @@ static int ila_output(struct net *net, struct sock *sk, 
struct sk_buff *skb)
 
 drop:
kfree_skb(skb);
-   return -EINVAL;
+   return err;
 }
 
 static int ila_input(struct sk_buff *skb)
-- 
2.11.0

1 2 3 >

1 - 100 of 219 matches

Mail list logo