Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.
On 2017/05/19 16:09, Vlad Yasevich wrote: > On 05/18/2017 10:13 PM, Toshiaki Makita wrote: >> On 2017/05/18 22:31, Vladislav Yasevich wrote: >>> It appears that since commit 8cb65d000, Q-in-Q vlans have been >>> broken. The series that commit is part of enabled TSO and checksum >>> offloading on Q-in-Q vlans. However, most HW we support can't handle >>> it. To work around the issue, the above commit added a function that >>> turns off offloads on Q-in-Q devices, but it left the checksum offload. >>> That will cause issues with most older devices that supprort very basic >>> checksum offload capabilities as well as some newer devices (we've >>> reproduced te problem with both be2net and bnx). >>> >>> To solve this for everyone, turn off checksum offloading feature >>> by default when sending Q-in-Q traffic. Devices that are proven to >>> work can provided a corrected ndo_features_check implemetation. >>> >>> Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers") >>> CC: Toshiaki Makita>>> Signed-off-by: Vladislav Yasevich >>> --- >>> include/linux/if_vlan.h | 1 - >>> 1 file changed, 1 deletion(-) >>> >>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h >>> index 8d5fcd6..ae537f0 100644 >>> --- a/include/linux/if_vlan.h >>> +++ b/include/linux/if_vlan.h >>> @@ -619,7 +619,6 @@ static inline netdev_features_t >>> vlan_features_check(const struct sk_buff *skb, >>> NETIF_F_SG | >>> NETIF_F_HIGHDMA | >>> NETIF_F_FRAGLIST | >>> -NETIF_F_HW_CSUM | >>> NETIF_F_HW_VLAN_CTAG_TX | >>> NETIF_F_HW_VLAN_STAG_TX); >>> >> >> I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem >> is IP_CSUM and IPV6_CSUM. >> So wouldn't it be better to leave HW_CSUM and drop IP_CSUM/IPV6_CSUM, >> i.e. change intersection into bitwise AND? >> > > It wasn't really a problem before accelerations got enabled on q-in-q > vlans. Right for stacked vlan device. But I think the check was there for packets from guests forwarded by bridge to vlan device so it was a problem before 8cb65d000. >> The intersection was introduced in db115037bb57 ("net: fix checksum >> features handling in netif_skb_features()"), but I guess for this >> particular check the intersection was not needed. >> > > So, to put it another way, leave the intersection with HW_CSUM in the mask, > and then do: > > return features & ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM); > > This might work, but it assumes that everyone who announce HW_CSUM can > do q-in-q vlans. It's been a bit of a pain tracking this down and I'd rather > fix it for everyone and let individual driver authors verify that Q-in-Q works > correctly with HW checksum. However, I am willing to do the above if > that's what people want. At least HW_CSUM in the check was introduced intentionally. https://www.spinics.net/lists/netdev/msg152016.html And I think HW_CSUM should work with any packets. You know, include/linux/skbuff.h says > *NETIF_F_HW_CSUM - The driver (or its device) is able to compute one > * IP (one's complement) checksum for any combination > * of protocols or protocol layering. We should be able to safely enable it. ...But you are so worried about layer2 protocol handling of HW_CSUM devices, I'm ok with disabling it for now. -- Toshiaki Makita
Re: [PATCH v4 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping
On Thu, May 18, 2017 at 04:16:26PM -0400, Willem de Bruijn wrote: > On Thu, May 18, 2017 at 9:06 AM, Miroslav Lichvarwrote: > > +/* On transmit, software and hardware timestamps are returned > > independently. > > + * As the two skb clones share the hardware timestamp, which may be updated > > + * before the software timestamp is received, a hardware TX timestamp may > > be > > + * returned only if there is no software TX timestamp. A false software > > + * timestamp made for SOCK_RCVTSTAMP when a real timestamp is missing must > > + * be ignored. > > Please expand on why this case can be ignored. It is quite subtle. How about > something like > > * > * A false software timestamp is one made inside the __sock_recv_timestamp > * call itself. These are generated whenever SO_TIMESTAMP(NS) is enabled > * on the socket, even when the timestamp reported is for another option, such > * as hardware tx timestamp. > * > * Ignore these when deciding whether a timestamp source is hw or sw. > */ That seems a bit too verbose to me. :) Would the following work? /* On transmit, software and hardware timestamps are returned independently. * As the two skb clones share the hardware timestamp, which may be updated * before the software timestamp is received, a hardware TX timestamp may be * returned only if there is no software TX timestamp. Ignore false software * timestamps, which may be made in the __sock_recv_timestamp() call when the * option SO_TIMESTAMP(NS) is enabled on the socket, even when the skb has a * hardware timestamp. */ > > +static bool skb_is_swtx_tstamp(const struct sk_buff *skb, > > + const struct sock *sk, int false_tstamp) > > +{ > > + if (false_tstamp && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) > > Also, why is it ignored only for the new mode? Good point. That should not be there. The function can be now reduced to a single line again. I originally tried a different approach, disabling false timestamps in the new mode, but then I thought it's better to not complicate it unnecessarily and keep it consistent. -- Miroslav Lichvar
[PATCH 10/12] netfilter: nf_tables: revisit chain/object refcounting from elements
Andreas reports that the following incremental update using our commit protocol doesn't work. # nft -f incremental-update.nft delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 } delete chain ip filter CIn_1 ... Error: Could not process rule: Device or resource busy The existing code is not well-integrated into the commit phase protocol, since element deletions do not result in refcount decrement from the preparation phase. This results in bogus EBUSY errors like the one above. Two new functions come with this patch: * nft_set_elem_activate() function is used from the abort path, to restore the set element refcounting on objects that occurred from the preparation phase. * nft_set_elem_deactivate() that is called from nft_del_setelem() to decrement set element refcounting on objects from the preparation phase in the commit protocol. The nft_data_uninit() has been renamed to nft_data_release() since this function does not uninitialize any data store in the data register, instead just releases the references to objects. Moreover, a new function nft_data_hold() has been introduced to be used from nft_set_elem_activate(). Reported-by: Andreas SchultzSigned-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tables.h | 2 +- net/netfilter/nf_tables_api.c | 82 ++- net/netfilter/nft_bitwise.c | 4 +- net/netfilter/nft_cmp.c | 2 +- net/netfilter/nft_immediate.c | 5 ++- net/netfilter/nft_range.c | 4 +- 6 files changed, 81 insertions(+), 18 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 028faec8fc27..8a8bab8d7b15 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -176,7 +176,7 @@ struct nft_data_desc { int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data, unsigned int size, struct nft_data_desc *desc, const struct nlattr *nla); -void nft_data_uninit(const struct nft_data *data, enum nft_data_types type); +void nft_data_release(const struct nft_data *data, enum nft_data_types type); int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data, enum nft_data_types type, unsigned int len); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 5f4a4d48b871..da314be0c048 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3627,9 +3627,9 @@ void nft_set_elem_destroy(const struct nft_set *set, void *elem, { struct nft_set_ext *ext = nft_set_elem_ext(set, elem); - nft_data_uninit(nft_set_ext_key(ext), NFT_DATA_VALUE); + nft_data_release(nft_set_ext_key(ext), NFT_DATA_VALUE); if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA)) - nft_data_uninit(nft_set_ext_data(ext), set->dtype); + nft_data_release(nft_set_ext_data(ext), set->dtype); if (destroy_expr && nft_set_ext_exists(ext, NFT_SET_EXT_EXPR)) nf_tables_expr_destroy(NULL, nft_set_ext_expr(ext)); if (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF)) @@ -3638,6 +3638,18 @@ void nft_set_elem_destroy(const struct nft_set *set, void *elem, } EXPORT_SYMBOL_GPL(nft_set_elem_destroy); +/* Only called from commit path, nft_set_elem_deactivate() already deals with + * the refcounting from the preparation phase. + */ +static void nf_tables_set_elem_destroy(const struct nft_set *set, void *elem) +{ + struct nft_set_ext *ext = nft_set_elem_ext(set, elem); + + if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR)) + nf_tables_expr_destroy(NULL, nft_set_ext_expr(ext)); + kfree(elem); +} + static int nft_setelem_parse_flags(const struct nft_set *set, const struct nlattr *attr, u32 *flags) { @@ -3849,9 +3861,9 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, kfree(elem.priv); err3: if (nla[NFTA_SET_ELEM_DATA] != NULL) - nft_data_uninit(, d2.type); + nft_data_release(, d2.type); err2: - nft_data_uninit(, d1.type); + nft_data_release(, d1.type); err1: return err; } @@ -3896,6 +3908,53 @@ static int nf_tables_newsetelem(struct net *net, struct sock *nlsk, return err; } +/** + * nft_data_hold - hold a nft_data item + * + * @data: struct nft_data to release + * @type: type of data + * + * Hold a nft_data item. NFT_DATA_VALUE types can be silently discarded, + * NFT_DATA_VERDICT bumps the reference to chains in case of NFT_JUMP and + * NFT_GOTO verdicts. This function must be called on active data objects + * from the second phase of the commit protocol. + */ +static void nft_data_hold(const struct nft_data *data, enum nft_data_types type) +{ + if (type == NFT_DATA_VERDICT) { + switch (data->verdict.code)
[PATCH 09/12] netfilter: nf_tables: missing sanitization in data from userspace
Do not assume userspace always sends us NFT_DATA_VALUE for bitwise and cmp expressions. Although NFT_DATA_VERDICT does not make any sense, it is still possible to handcraft a netlink message using this incorrect data type. Signed-off-by: Pablo Neira Ayuso--- net/netfilter/nft_bitwise.c | 19 ++- net/netfilter/nft_cmp.c | 12 ++-- 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c index 877d9acd91ef..96bd4f325b0f 100644 --- a/net/netfilter/nft_bitwise.c +++ b/net/netfilter/nft_bitwise.c @@ -83,17 +83,26 @@ static int nft_bitwise_init(const struct nft_ctx *ctx, tb[NFTA_BITWISE_MASK]); if (err < 0) return err; - if (d1.len != priv->len) - return -EINVAL; + if (d1.len != priv->len) { + err = -EINVAL; + goto err1; + } err = nft_data_init(NULL, >xor, sizeof(priv->xor), , tb[NFTA_BITWISE_XOR]); if (err < 0) - return err; - if (d2.len != priv->len) - return -EINVAL; + goto err1; + if (d2.len != priv->len) { + err = -EINVAL; + goto err2; + } return 0; +err2: + nft_data_uninit(>xor, d2.type); +err1: + nft_data_uninit(>mask, d1.type); + return err; } static int nft_bitwise_dump(struct sk_buff *skb, const struct nft_expr *expr) diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c index 2b96effeadc1..8c9d0fb19118 100644 --- a/net/netfilter/nft_cmp.c +++ b/net/netfilter/nft_cmp.c @@ -201,10 +201,18 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) if (err < 0) return ERR_PTR(err); + if (desc.type != NFT_DATA_VALUE) { + err = -EINVAL; + goto err1; + } + if (desc.len <= sizeof(u32) && op == NFT_CMP_EQ) return _cmp_fast_ops; - else - return _cmp_ops; + + return _cmp_ops; +err1: + nft_data_uninit(, desc.type); + return ERR_PTR(-EINVAL); } struct nft_expr_type nft_cmp_type __read_mostly = { -- 2.1.4
[PATCH 00/12] Netfilter/IPVS fixes for net
Hi David, The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) When using IPVS in direct-routing mode, normal traffic from the LVS host to a back-end server is sometimes incorrectly NATed on the way back into the LVS host. Patch to fix this from Julian Anastasov. 2) Calm down clang compilation warning in ctnetlink due to type mismatch, from Matthias Kaehlcke. 3) Do not re-setup NAT for conntracks that are already confirmed, this is fixing a problem that was introduced in the previous nf-next batch. Patch from Liping Zhang. 4) Do not allow conntrack helper removal from userspace cthelper infrastructure if already in used. This comes with an initial patch to introduce nf_conntrack_helper_put() that is required by this fix. From Liping Zhang. 5) Zero the pad when copying data to userspace, otherwise iptables fails to remove rules. This is a follow up on the patchset that sorts out the internal match/target structure pointer leak to userspace. Patch from the same author, Willem de Bruijn. This also comes with a build failure when CONFIG_COMPAT is not on, coming in the last patch of this series. 6) SYNPROXY crashes with conntrack entries that are created via ctnetlink, more specifically via conntrackd state sync. Patch from Eric Leblond. 7) RCU safe iteration on set element dumping in nf_tables, from Liping Zhang. 8) Missing sanitization of immediate date for the bitwise and cmp expressions in nf_tables. 9) Refcounting logic for chain and objects from set elements does not integrate into the nf_tables 2-phase commit protocol. 10) Missing sanitization of target verdict in ebtables arpreply target, from Gao Feng. You can pull these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git Thanks! The following changes since commit 1c4d5f51a812a82de97beee24f48ed05c65ebda5: vmxnet3: ensure that adapter is in proper state during force_close (2017-05-12 12:23:52 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD for you to fetch changes up to 751a9c763849f5859cb69ea44b0430d00672f637: netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT (2017-05-18 13:10:03 +0200) Eric Leblond (1): netfilter: synproxy: fix conntrackd interaction Gao Feng (1): ebtables: arpreply: Add the standard target sanity check Julian Anastasov (1): ipvs: SNAT packet replies only for NATed connections Liping Zhang (4): netfilter: don't setup nat info for confirmed ct netfilter: introduce nf_conntrack_helper_put helper function netfilter: nfnl_cthelper: reject del request if helper obj is in use netfilter: nf_tables: can't assume lock is acquired when dumping set elems Matthias Kaehlcke (1): netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch Pablo Neira Ayuso (3): Merge tag 'ipvs-fixes-for-v4.12' of http://git.kernel.org/.../horms/ipvs netfilter: nf_tables: missing sanitization in data from userspace netfilter: nf_tables: revisit chain/object refcounting from elements Willem de Bruijn (2): netfilter: xtables: zero padding in data_to_user netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT include/linux/netfilter/x_tables.h | 2 +- include/linux/netfilter_bridge/ebtables.h | 5 + include/net/netfilter/nf_conntrack_helper.h | 4 + include/net/netfilter/nf_tables.h | 2 +- net/bridge/netfilter/ebt_arpreply.c | 3 + net/bridge/netfilter/ebtables.c | 9 +- net/netfilter/ipvs/ip_vs_core.c | 19 +++- net/netfilter/nf_conntrack_helper.c | 12 +++ net/netfilter/nf_conntrack_netlink.c| 11 +- net/netfilter/nf_nat_core.c | 4 + net/netfilter/nf_tables_api.c | 160 ++-- net/netfilter/nfnetlink_cthelper.c | 17 +-- net/netfilter/nft_bitwise.c | 19 +++- net/netfilter/nft_cmp.c | 12 ++- net/netfilter/nft_ct.c | 4 +- net/netfilter/nft_immediate.c | 5 +- net/netfilter/nft_range.c | 4 +- net/netfilter/nft_set_hash.c| 2 +- net/netfilter/x_tables.c| 24 +++-- net/netfilter/xt_CT.c | 6 +- net/openvswitch/conntrack.c | 4 +- 21 files changed, 249 insertions(+), 79 deletions(-)
[PATCH 03/12] netfilter: don't setup nat info for confirmed ct
From: Liping ZhangWe cannot setup nat info if the ct has been confirmed already, else, different cpu may race to handle the same ct. In extreme situation, we may hit the "BUG_ON(nf_nat_initialized(ct, maniptype))" in the nf_nat_setup_info. Also running the following commands will easily hit NF_CT_ASSERT in nf_conntrack_alter_reply: # nft flush ruleset # ping -c 2 -W 1 1.1.1.111 & # nft add table t # nft add chain t c {type nat hook postrouting priority 0 \;} # nft add rule t c snat to 4.5.6.7 WARNING: CPU: 1 PID: 10065 at net/netfilter/nf_conntrack_core.c:1472 nf_conntrack_alter_reply+0x9a/0x1a0 [nf_conntrack] [...] Call Trace: nf_nat_setup_info+0xad/0x840 [nf_nat] ? deactivate_slab+0x65d/0x6c0 nft_nat_eval+0xcd/0x100 [nft_nat] nft_do_chain+0xff/0x5d0 [nf_tables] ? mark_held_locks+0x6f/0xa0 ? __local_bh_enable_ip+0x70/0xa0 ? trace_hardirqs_on_caller+0x11f/0x190 ? ipt_do_table+0x310/0x610 ? trace_hardirqs_on+0xd/0x10 ? __local_bh_enable_ip+0x70/0xa0 ? ipt_do_table+0x32b/0x610 ? __lock_acquire+0x2ac/0x1580 ? ipt_do_table+0x32b/0x610 nft_nat_do_chain+0x65/0x80 [nft_chain_nat_ipv4] nf_nat_ipv4_fn+0x1ae/0x240 [nf_nat_ipv4] nf_nat_ipv4_out+0x4a/0xf0 [nf_nat_ipv4] nft_nat_ipv4_out+0x15/0x20 [nft_chain_nat_ipv4] nf_hook_slow+0x2c/0xf0 ip_output+0x154/0x270 So for the confirmed ct, just ignore it and return NF_ACCEPT. Fixes: 9a08ecfe74d7 ("netfilter: don't attach a nat extension by default") Signed-off-by: Liping Zhang Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_nat_core.c | 4 1 file changed, 4 insertions(+) diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index b48d6b5aae8a..ef0be325a0c6 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -409,6 +409,10 @@ nf_nat_setup_info(struct nf_conn *ct, { struct nf_conntrack_tuple curr_tuple, new_tuple; + /* Can't setup nat info for confirmed ct. */ + if (nf_ct_is_confirmed(ct)) + return NF_ACCEPT; + NF_CT_ASSERT(maniptype == NF_NAT_MANIP_SRC || maniptype == NF_NAT_MANIP_DST); BUG_ON(nf_nat_initialized(ct, maniptype)); -- 2.1.4
[PATCH 02/12] netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch
From: Matthias KaehlckeNot all parameters passed to ctnetlink_parse_tuple() and ctnetlink_exp_dump_tuple() match the enum type in the signatures of these functions. Since this is intended change the argument type of to be an unsigned integer value. Signed-off-by: Matthias Kaehlcke Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index dcf561b5c97a..fa752626029e 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1007,9 +1007,8 @@ static const struct nla_policy tuple_nla_policy[CTA_TUPLE_MAX+1] = { static int ctnetlink_parse_tuple(const struct nlattr * const cda[], - struct nf_conntrack_tuple *tuple, - enum ctattr_type type, u_int8_t l3num, - struct nf_conntrack_zone *zone) + struct nf_conntrack_tuple *tuple, u32 type, + u_int8_t l3num, struct nf_conntrack_zone *zone) { struct nlattr *tb[CTA_TUPLE_MAX+1]; int err; @@ -2447,7 +2446,7 @@ static struct nfnl_ct_hook ctnetlink_glue_hook = { static int ctnetlink_exp_dump_tuple(struct sk_buff *skb, const struct nf_conntrack_tuple *tuple, - enum ctattr_expect type) + u32 type) { struct nlattr *nest_parms; -- 2.1.4
RE: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low
From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-ow...@vger.kernel.org] On Behalf Of Jim Baxter > Sent: 16 May 2017 18:41 > > The CDC-NCM driver can require large amounts of memory to create > skb's and this can be a problem when the memory becomes fragmented. > > This especially affects embedded systems that have constrained > resources but wish to maximise the throughput of CDC-NCM with 16KiB > NTB's. Why is this driver copying multiple tx messages into a single skb. Surely there are better ways to do this?? I think it is generating a 'multi-ethernet frame' URB with an overall header for each URB and a header for each ethernet frame. Given that the USB stack allows multiple concurrent transmits I'm surprised that batching large ethernet frames makes much difference. Also the USB target can't actually tell when URB that contain multiples of the USB packet size end. So it is possible to send a single NTB as multiple URB. Of course, the usb_net might have other ideas. David
Re: [PATCH v2] xfrm: fix state migration copy replay sequence numbers
On Fri, May 19, 2017 at 12:47:00PM +0200, Antony Antony wrote: > During xfrm migration copy replay and preplay sequence numbers > from the previous state. > > Here is a tcpdump output showing the problem. > 10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder. > After the migration it sent wrong sequence number, reset to 1. > The migration is from 10.0.0.52 to 10.0.0.53. > > IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7cf), length 136 > IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: > ESP(spi=0xca1c282d,seq=0x7cf), length 136 > IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d0), length 136 > IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: > ESP(spi=0xca1c282d,seq=0x7d0), length 136 > > IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] > IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] > IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] > IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] > > IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d1), length 136 > > NOTE: next sequence is wrong 0x1 > > IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), > length 136 > IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d2), length 136 > IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), > length 136 > > Signed-off-by: Antony AntonyApplied, thanks Antony!
Re: [PATCH] xfrm: fix state migration replay sequence numbers
On Thu, May 18, 2017 at 04:39:53PM +0200, Antony Antony wrote: > During xfrm migration replay and preplay sequence numbers are not > copied from the previous state. > > Here is tcpdump output showing the problem. > 10.0.10.46 is running vanilla kernel, IKE/IPsec responder. > After the migration it sent wrong sequence number, reset to 1. > The migration is from 10.0.0.52 to 10.0.0.53. > > IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7cf), length 136 > IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: > ESP(spi=0xca1c282d,seq=0x7cf), length 136 > IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d0), length 136 > IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: > ESP(spi=0xca1c282d,seq=0x7d0), length 136 > > IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] > IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] > IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] > IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] > > IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d1), length 136 > > NOTE: next sequence is wrong 0x1 > > IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), > length 136 > IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d2), length 136 > IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), > length 136 > > The attached patch fix it by copying replay and preplay. The patch looks ok, but please do a v2 and put the above informations into the commit message. This is usefull information that we would loose otherwise. Thanks!
Re: [PATCH net-next] xfrm: Make function xfrm_dev_register static
On Thu, May 18, 2017 at 03:51:38PM +, Wei Yongjun wrote: > From: Wei Yongjun> > Fixes the following sparse warning: > > net/xfrm/xfrm_device.c:141:5: warning: > symbol 'xfrm_dev_register' was not declared. Should it be static? > > Signed-off-by: Wei Yongjun Applied to ipsec-next, thanks!
Re: ipsec doesn't route TCP with 4.11 kernel
On Tue, May 16, 2017 at 03:05:40PM -0400, Don Bowman wrote: > On 3 May 2017 at 04:14, Steffen Klassertwrote: > > On Sat, Apr 29, 2017 at 08:39:34PM -0400, Don Bowman wrote: > >> On 28 April 2017 at 03:13, Steffen Klassert > >> wrote: > >> > On Thu, Apr 27, 2017 at 06:13:38PM -0400, Don Bowman wrote: > >> >> On 27 April 2017 at 04:42, Steffen Klassert > >> >> > >> >> wrote: > >> >> > On Wed, Apr 26, 2017 at 10:01:34PM -0700, Cong Wang wrote: > >> >> >> (Cc'ing netdev and IPSec maintainers) > >> >> >> > >> >> >> On Tue, Apr 25, 2017 at 6:08 PM, Don Bowman > >> >> >> wrote: > >> >> > >> > >> > >> > >> confirmed, with this patch in place that the tcp functions properly. > > > > Thanks for testing! > > > > I'll make sure to get this fix into the mainline soon. > > Thanks. Let me know if there is any more assistance I can provide. > I've been running the patch for 2 weeks now on 3 machines. Thanks for testing! I plan to push it upstream at the beginning of te next week.
Re: [PATCH v2] e1000e: Don't return uninitialized stats
On Thu, 2017-05-18 at 10:46 -0400, David Miller wrote: > From: Benjamin Poirier> Date: Wed, 17 May 2017 16:24:13 -0400 > > > Some statistics passed to ethtool are garbage because > > e1000e_get_stats64() > > doesn't write them, for example: tx_heartbeat_errors. This leaks kernel > > memory to userspace and confuses users. > > > > Do like ixgbe and use dev_get_stats() which first zeroes out > > rtnl_link_stats64. > > > > Fixes: 5944701df90d ("net: remove useless memset's in drivers > > get_stats64") > > Reported-by: Stefan Priebe > > Signed-off-by: Benjamin Poirier > > Jeff, please be sure to pick this up, thanks. Yep, I have it in my tree, thanks. signature.asc Description: This is a digitally signed message part
RE: [PATCH net-next 9/9] nfp: eliminate an if statement in calculation of completed frames
From: Jakub Kicinski > Sent: 17 May 2017 18:37 .. > > > while (todo--) { > > > idx = D_IDX(tx_ring, tx_ring->rd_p++); > > > > That '++' looks suspicious. > > I think you need to decide whether you are incrementing pointers into the > > ring > > or indexes into it. > > Sometimes it is safer to use a non-wrapping index and mask when accessing > > the entry. > > entry_ptr = [idx & (RING_SIZE - 1)] > > Ring full is then (read_idx == write_idx + RING_SIZE), > > ring empty (read_idx == write_idx). > > So the index just wrap at (probably)_2^32. > > I may be missing the point. I use a mix of the two, actually, the > software pointers are free running (non-wrapping) but the HW QCP > pointers wrap. Because HW pointers wrap I always keep one entry on > the rings empty, see nfp_net_tx_full(). Ah, I'd assumed that rd_p was a pointer, not an index. David
[PATCH net-next] cxgb4: fix incorrect cim_la output for T6
take care of UpDbgLaRdPtr[0-3] restriction for T6 Signed-off-by: Ganesh Goudar--- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index aded42b96..917b46b 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -8268,6 +8268,13 @@ int t4_cim_read_la(struct adapter *adap, u32 *la_buf, unsigned int *wrptr) if (ret) break; idx = (idx + 1) & UPDBGLARDPTR_M; + + /* Bits 0-3 of UpDbgLaRdPtr can be between to 1001 to +* identify the 32-bit portion of the full 312-bit data +*/ + if (is_t6(adap->params.chip)) + while ((idx & 0xf) > 9) + idx = (idx + 1) % UPDBGLARDPTR_M; } restart: if (cfg & UPDBGLAEN_F) { -- 2.1.0
Re: [PATCH 4.4-only] openvswitch: clear sender cpu before forwarding packets
On 18/05/17 09:11, Greg KH wrote: So backporting that one patch solves the issue here? Can you please verify it, and let me know before I apply it? thanks, greg k-h yes, I can do that.
[PATCH 11/12] ebtables: arpreply: Add the standard target sanity check
From: Gao FengThe info->target comes from userspace and it would be used directly. So we need to add the sanity check to make sure it is a valid standard target, although the ebtables tool has already checked it. Kernel needs to validate anything coming from userspace. If the target is set as an evil value, it would break the ebtables and cause a panic. Because the non-standard target is treated as one offset. Now add one helper function ebt_invalid_target, and we would replace the macro INVALID_TARGET later. Signed-off-by: Gao Feng Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter_bridge/ebtables.h | 5 + net/bridge/netfilter/ebt_arpreply.c | 3 +++ 2 files changed, 8 insertions(+) diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h index a30efb437e6d..e0cbf17af780 100644 --- a/include/linux/netfilter_bridge/ebtables.h +++ b/include/linux/netfilter_bridge/ebtables.h @@ -125,4 +125,9 @@ extern unsigned int ebt_do_table(struct sk_buff *skb, /* True if the target is not a standard target */ #define INVALID_TARGET (info->target < -NUM_STANDARD_TARGETS || info->target >= 0) +static inline bool ebt_invalid_target(int target) +{ + return (target < -NUM_STANDARD_TARGETS || target >= 0); +} + #endif diff --git a/net/bridge/netfilter/ebt_arpreply.c b/net/bridge/netfilter/ebt_arpreply.c index 5929309beaa1..db85230e49c3 100644 --- a/net/bridge/netfilter/ebt_arpreply.c +++ b/net/bridge/netfilter/ebt_arpreply.c @@ -68,6 +68,9 @@ static int ebt_arpreply_tg_check(const struct xt_tgchk_param *par) if (e->ethproto != htons(ETH_P_ARP) || e->invflags & EBT_IPROTO) return -EINVAL; + if (ebt_invalid_target(info->target)) + return -EINVAL; + return 0; } -- 2.1.4
[PATCH 08/12] netfilter: nf_tables: can't assume lock is acquired when dumping set elems
From: Liping ZhangWhen dumping the elements related to a specified set, we may invoke the nf_tables_dump_set with the NFNL_SUBSYS_NFTABLES lock not acquired. So we should use the proper rcu operation to avoid race condition, just like other nft dump operations. Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 78 +++ net/netfilter/nft_set_hash.c | 2 +- 2 files changed, 57 insertions(+), 23 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 559225029740..5f4a4d48b871 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3367,35 +3367,50 @@ static int nf_tables_dump_setelem(const struct nft_ctx *ctx, return nf_tables_fill_setelem(args->skb, set, elem); } +struct nft_set_dump_ctx { + const struct nft_set*set; + struct nft_ctx ctx; +}; + static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) { + struct nft_set_dump_ctx *dump_ctx = cb->data; struct net *net = sock_net(skb->sk); - u8 genmask = nft_genmask_cur(net); + struct nft_af_info *afi; + struct nft_table *table; struct nft_set *set; struct nft_set_dump_args args; - struct nft_ctx ctx; - struct nlattr *nla[NFTA_SET_ELEM_LIST_MAX + 1]; + bool set_found = false; struct nfgenmsg *nfmsg; struct nlmsghdr *nlh; struct nlattr *nest; u32 portid, seq; - int event, err; + int event; - err = nlmsg_parse(cb->nlh, sizeof(struct nfgenmsg), nla, - NFTA_SET_ELEM_LIST_MAX, nft_set_elem_list_policy, - NULL); - if (err < 0) - return err; + rcu_read_lock(); + list_for_each_entry_rcu(afi, >nft.af_info, list) { + if (afi != dump_ctx->ctx.afi) + continue; - err = nft_ctx_init_from_elemattr(, net, cb->skb, cb->nlh, -(void *)nla, genmask); - if (err < 0) - return err; + list_for_each_entry_rcu(table, >tables, list) { + if (table != dump_ctx->ctx.table) + continue; - set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET], - genmask); - if (IS_ERR(set)) - return PTR_ERR(set); + list_for_each_entry_rcu(set, >sets, list) { + if (set == dump_ctx->set) { + set_found = true; + break; + } + } + break; + } + break; + } + + if (!set_found) { + rcu_read_unlock(); + return -ENOENT; + } event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWSETELEM); portid = NETLINK_CB(cb->skb).portid; @@ -3407,11 +3422,11 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) goto nla_put_failure; nfmsg = nlmsg_data(nlh); - nfmsg->nfgen_family = ctx.afi->family; + nfmsg->nfgen_family = afi->family; nfmsg->version = NFNETLINK_V0; - nfmsg->res_id = htons(ctx.net->nft.base_seq & 0x); + nfmsg->res_id = htons(net->nft.base_seq & 0x); - if (nla_put_string(skb, NFTA_SET_ELEM_LIST_TABLE, ctx.table->name)) + if (nla_put_string(skb, NFTA_SET_ELEM_LIST_TABLE, table->name)) goto nla_put_failure; if (nla_put_string(skb, NFTA_SET_ELEM_LIST_SET, set->name)) goto nla_put_failure; @@ -3422,12 +3437,13 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) args.cb = cb; args.skb= skb; - args.iter.genmask = nft_genmask_cur(ctx.net); + args.iter.genmask = nft_genmask_cur(net); args.iter.skip = cb->args[0]; args.iter.count = 0; args.iter.err = 0; args.iter.fn= nf_tables_dump_setelem; - set->ops->walk(, set, ); + set->ops->walk(_ctx->ctx, set, ); + rcu_read_unlock(); nla_nest_end(skb, nest); nlmsg_end(skb, nlh); @@ -3441,9 +3457,16 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; nla_put_failure: + rcu_read_unlock(); return -ENOSPC; } +static int nf_tables_dump_set_done(struct netlink_callback *cb) +{ + kfree(cb->data); + return 0; +} + static int nf_tables_getsetelem(struct net *net, struct sock *nlsk, struct sk_buff *skb,
[PATCH 12/12] netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT
From: Willem de BruijnThe patch in the Fixes references COMPAT_XT_ALIGN in the definition of XT_DATA_TO_USER, outside an #ifdef CONFIG_COMPAT block. Split XT_DATA_TO_USER into separate compat and non compat variants and define the first inside an CONFIG_COMPAT block. This simplifies both variants by removing branches inside the macro. Fixes: 324318f0248c ("netfilter: xtables: zero padding in data_to_user") Reported-by: Stephen Rothwell Signed-off-by: Willem de Bruijn Signed-off-by: Pablo Neira Ayuso --- net/netfilter/x_tables.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index d17769599c10..1770c1d9b37f 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -296,18 +296,17 @@ int xt_data_to_user(void __user *dst, const void *src, } EXPORT_SYMBOL_GPL(xt_data_to_user); -#define XT_DATA_TO_USER(U, K, TYPE, C_SIZE)\ +#define XT_DATA_TO_USER(U, K, TYPE)\ xt_data_to_user(U->data, K->data, \ K->u.kernel.TYPE->usersize, \ - C_SIZE ? : K->u.kernel.TYPE->TYPE##size,\ - C_SIZE ? COMPAT_XT_ALIGN(C_SIZE) : \ -XT_ALIGN(K->u.kernel.TYPE->TYPE##size)) + K->u.kernel.TYPE->TYPE##size, \ + XT_ALIGN(K->u.kernel.TYPE->TYPE##size)) int xt_match_to_user(const struct xt_entry_match *m, struct xt_entry_match __user *u) { return XT_OBJ_TO_USER(u, m, match, 0) || - XT_DATA_TO_USER(u, m, match, 0); + XT_DATA_TO_USER(u, m, match); } EXPORT_SYMBOL_GPL(xt_match_to_user); @@ -315,7 +314,7 @@ int xt_target_to_user(const struct xt_entry_target *t, struct xt_entry_target __user *u) { return XT_OBJ_TO_USER(u, t, target, 0) || - XT_DATA_TO_USER(u, t, target, 0); + XT_DATA_TO_USER(u, t, target); } EXPORT_SYMBOL_GPL(xt_target_to_user); @@ -614,6 +613,12 @@ void xt_compat_match_from_user(struct xt_entry_match *m, void **dstptr, } EXPORT_SYMBOL_GPL(xt_compat_match_from_user); +#define COMPAT_XT_DATA_TO_USER(U, K, TYPE, C_SIZE) \ + xt_data_to_user(U->data, K->data, \ + K->u.kernel.TYPE->usersize, \ + C_SIZE, \ + COMPAT_XT_ALIGN(C_SIZE)) + int xt_compat_match_to_user(const struct xt_entry_match *m, void __user **dstptr, unsigned int *size) { @@ -629,7 +634,7 @@ int xt_compat_match_to_user(const struct xt_entry_match *m, if (match->compat_to_user((void __user *)cm->data, m->data)) return -EFAULT; } else { - if (XT_DATA_TO_USER(cm, m, match, msize - sizeof(*cm))) + if (COMPAT_XT_DATA_TO_USER(cm, m, match, msize - sizeof(*cm))) return -EFAULT; } @@ -975,7 +980,7 @@ int xt_compat_target_to_user(const struct xt_entry_target *t, if (target->compat_to_user((void __user *)ct->data, t->data)) return -EFAULT; } else { - if (XT_DATA_TO_USER(ct, t, target, tsize - sizeof(*ct))) + if (COMPAT_XT_DATA_TO_USER(ct, t, target, tsize - sizeof(*ct))) return -EFAULT; } -- 2.1.4
[PATCH 04/12] netfilter: introduce nf_conntrack_helper_put helper function
From: Liping ZhangAnd convert module_put invocation to nf_conntrack_helper_put, this is prepared for the followup patch, which will add a refcnt for cthelper, so we can reject the deleting request when cthelper is in use. Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_conntrack_helper.h | 2 ++ net/netfilter/nf_conntrack_helper.c | 6 ++ net/netfilter/nft_ct.c | 4 ++-- net/netfilter/xt_CT.c | 6 +++--- net/openvswitch/conntrack.c | 4 ++-- 5 files changed, 15 insertions(+), 7 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index e04fa7691e5d..c1c12411103a 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -79,6 +79,8 @@ struct nf_conntrack_helper *__nf_conntrack_helper_find(const char *name, struct nf_conntrack_helper *nf_conntrack_helper_try_module_get(const char *name, u16 l3num, u8 protonum); +void nf_conntrack_helper_put(struct nf_conntrack_helper *helper); + void nf_ct_helper_init(struct nf_conntrack_helper *helper, u16 l3num, u16 protonum, const char *name, u16 default_port, u16 spec_port, u32 id, diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 3a60efa7799b..e17006b6e434 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -181,6 +181,12 @@ nf_conntrack_helper_try_module_get(const char *name, u16 l3num, u8 protonum) } EXPORT_SYMBOL_GPL(nf_conntrack_helper_try_module_get); +void nf_conntrack_helper_put(struct nf_conntrack_helper *helper) +{ + module_put(helper->me); +} +EXPORT_SYMBOL_GPL(nf_conntrack_helper_put); + struct nf_conn_help * nf_ct_helper_ext_add(struct nf_conn *ct, struct nf_conntrack_helper *helper, gfp_t gfp) diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c index a34ceb38fc55..1678e9e75e8e 100644 --- a/net/netfilter/nft_ct.c +++ b/net/netfilter/nft_ct.c @@ -826,9 +826,9 @@ static void nft_ct_helper_obj_destroy(struct nft_object *obj) struct nft_ct_helper_obj *priv = nft_obj_data(obj); if (priv->helper4) - module_put(priv->helper4->me); + nf_conntrack_helper_put(priv->helper4); if (priv->helper6) - module_put(priv->helper6->me); + nf_conntrack_helper_put(priv->helper6); } static void nft_ct_helper_obj_eval(struct nft_object *obj, diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c index bb7ad82dcd56..623ef37de886 100644 --- a/net/netfilter/xt_CT.c +++ b/net/netfilter/xt_CT.c @@ -96,7 +96,7 @@ xt_ct_set_helper(struct nf_conn *ct, const char *helper_name, help = nf_ct_helper_ext_add(ct, helper, GFP_KERNEL); if (help == NULL) { - module_put(helper->me); + nf_conntrack_helper_put(helper); return -ENOMEM; } @@ -263,7 +263,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, err4: help = nfct_help(ct); if (help) - module_put(help->helper->me); + nf_conntrack_helper_put(help->helper); err3: nf_ct_tmpl_free(ct); err2: @@ -346,7 +346,7 @@ static void xt_ct_tg_destroy(const struct xt_tgdtor_param *par, if (ct) { help = nfct_help(ct); if (help) - module_put(help->helper->me); + nf_conntrack_helper_put(help->helper); nf_ct_netns_put(par->net, par->family); diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index bf602e33c40a..08679ebb3068 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1123,7 +1123,7 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name, help = nf_ct_helper_ext_add(info->ct, helper, GFP_KERNEL); if (!help) { - module_put(helper->me); + nf_conntrack_helper_put(helper); return -ENOMEM; } @@ -1584,7 +1584,7 @@ void ovs_ct_free_action(const struct nlattr *a) static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info) { if (ct_info->helper) - module_put(ct_info->helper->me); + nf_conntrack_helper_put(ct_info->helper); if (ct_info->ct) nf_ct_tmpl_free(ct_info->ct); } -- 2.1.4
[PATCH 06/12] netfilter: xtables: zero padding in data_to_user
From: Willem de BruijnWhen looking up an iptables rule, the iptables binary compares the aligned match and target data (XT_ALIGN). In some cases this can exceed the actual data size to include padding bytes. Before commit f77bc5b23fb1 ("iptables: use match, target and data copy_to_user helpers") the malloc()ed bytes were overwritten by the kernel with kzalloced contents, zeroing the padding and making the comparison succeed. After this patch, the kernel copies and clears only data, leaving the padding bytes undefined. Extend the clear operation from data size to aligned data size to include the padding bytes, if any. Padding bytes can be observed in both match and target, and the bug triggered, by issuing a rule with match icmp and target ACCEPT: iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT Fixes: f77bc5b23fb1 ("iptables: use match, target and data copy_to_user helpers") Reported-by: Paul Moore Reported-by: Richard Guy Briggs Signed-off-by: Willem de Bruijn Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter/x_tables.h | 2 +- net/bridge/netfilter/ebtables.c| 9 ++--- net/netfilter/x_tables.c | 9 ++--- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index be378cf47fcc..b3044c2c62cb 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -294,7 +294,7 @@ int xt_match_to_user(const struct xt_entry_match *m, int xt_target_to_user(const struct xt_entry_target *t, struct xt_entry_target __user *u); int xt_data_to_user(void __user *dst, const void *src, - int usersize, int size); + int usersize, int size, int aligned_size); void *xt_copy_counters_from_user(const void __user *user, unsigned int len, struct xt_counters_info *info, bool compat); diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index 9ec0c9f908fa..9c6e619f452b 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -1373,7 +1373,8 @@ static inline int ebt_obj_to_user(char __user *um, const char *_name, strlcpy(name, _name, sizeof(name)); if (copy_to_user(um, name, EBT_FUNCTION_MAXNAMELEN) || put_user(datasize, (int __user *)(um + EBT_FUNCTION_MAXNAMELEN)) || - xt_data_to_user(um + entrysize, data, usersize, datasize)) + xt_data_to_user(um + entrysize, data, usersize, datasize, + XT_ALIGN(datasize))) return -EFAULT; return 0; @@ -1658,7 +1659,8 @@ static int compat_match_to_user(struct ebt_entry_match *m, void __user **dstptr, if (match->compat_to_user(cm->data, m->data)) return -EFAULT; } else { - if (xt_data_to_user(cm->data, m->data, match->usersize, msize)) + if (xt_data_to_user(cm->data, m->data, match->usersize, msize, + COMPAT_XT_ALIGN(msize))) return -EFAULT; } @@ -1687,7 +1689,8 @@ static int compat_target_to_user(struct ebt_entry_target *t, if (target->compat_to_user(cm->data, t->data)) return -EFAULT; } else { - if (xt_data_to_user(cm->data, t->data, target->usersize, tsize)) + if (xt_data_to_user(cm->data, t->data, target->usersize, tsize, + COMPAT_XT_ALIGN(tsize))) return -EFAULT; } diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 8876b7da6884..d17769599c10 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -283,12 +283,13 @@ static int xt_obj_to_user(u16 __user *psize, u16 size, >u.user.revision, K->u.kernel.TYPE->revision) int xt_data_to_user(void __user *dst, const void *src, - int usersize, int size) + int usersize, int size, int aligned_size) { usersize = usersize ? : size; if (copy_to_user(dst, src, usersize)) return -EFAULT; - if (usersize != size && clear_user(dst + usersize, size - usersize)) + if (usersize != aligned_size && + clear_user(dst + usersize, aligned_size - usersize)) return -EFAULT; return 0; @@ -298,7 +299,9 @@ EXPORT_SYMBOL_GPL(xt_data_to_user); #define XT_DATA_TO_USER(U, K, TYPE, C_SIZE)\ xt_data_to_user(U->data, K->data, \ K->u.kernel.TYPE->usersize, \ - C_SIZE ? : K->u.kernel.TYPE->TYPE##size) +
[PATCH 07/12] netfilter: synproxy: fix conntrackd interaction
From: Eric LeblondThis patch fixes the creation of connection tracking entry from netlink when synproxy is used. It was missing the addition of the synproxy extension. This was causing kernel crashes when a conntrack entry created by conntrackd was used after the switch of traffic from active node to the passive node. Signed-off-by: Eric Leblond Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 4 1 file changed, 4 insertions(+) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index fa752626029e..9799a50bc604 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -45,6 +45,8 @@ #include #include #include +#include +#include #ifdef CONFIG_NF_NAT_NEEDED #include #include @@ -1827,6 +1829,8 @@ ctnetlink_create_conntrack(struct net *net, nf_ct_tstamp_ext_add(ct, GFP_ATOMIC); nf_ct_ecache_ext_add(ct, 0, 0, GFP_ATOMIC); nf_ct_labels_ext_add(ct); + nfct_seqadj_ext_add(ct); + nfct_synproxy_ext_add(ct); /* we must add conntrack extensions before confirmation. */ ct->status |= IPS_CONFIRMED; -- 2.1.4
[PATCH 05/12] netfilter: nfnl_cthelper: reject del request if helper obj is in use
From: Liping ZhangWe can still delete the ct helper even if it is in use, this will cause a use-after-free error. In more detail, I mean: # nfct helper add ssdp inet udp # iptables -t raw -A OUTPUT -p udp -j CT --helper ssdp # nfct helper delete ssdp //--> oops, succeed! BUG: unable to handle kernel paging request at 26ca IP: 0x26ca [...] Call Trace: ? ipv4_helper+0x62/0x80 [nf_conntrack_ipv4] nf_hook_slow+0x21/0xb0 ip_output+0xe9/0x100 ? ip_fragment.constprop.54+0xc0/0xc0 ip_local_out+0x33/0x40 ip_send_skb+0x16/0x80 udp_send_skb+0x84/0x240 udp_sendmsg+0x35d/0xa50 So add reference count to fix this issue, if ct helper is used by others, reject the delete request. Apply this patch: # nfct helper delete ssdp nfct v1.4.3: netlink error: Device or resource busy Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_conntrack_helper.h | 2 ++ net/netfilter/nf_conntrack_helper.c | 6 ++ net/netfilter/nfnetlink_cthelper.c | 17 +++-- 3 files changed, 19 insertions(+), 6 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index c1c12411103a..c519bb5b5bb8 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -9,6 +9,7 @@ #ifndef _NF_CONNTRACK_HELPER_H #define _NF_CONNTRACK_HELPER_H +#include #include #include #include @@ -26,6 +27,7 @@ struct nf_conntrack_helper { struct hlist_node hnode;/* Internal use. */ char name[NF_CT_HELPER_NAME_LEN]; /* name of the module */ + refcount_t refcnt; struct module *me; /* pointer to self */ const struct nf_conntrack_expect_policy *expect_policy; diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index e17006b6e434..7f6100ca63be 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -174,6 +174,10 @@ nf_conntrack_helper_try_module_get(const char *name, u16 l3num, u8 protonum) #endif if (h != NULL && !try_module_get(h->me)) h = NULL; + if (h != NULL && !refcount_inc_not_zero(>refcnt)) { + module_put(h->me); + h = NULL; + } rcu_read_unlock(); @@ -183,6 +187,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_helper_try_module_get); void nf_conntrack_helper_put(struct nf_conntrack_helper *helper) { + refcount_dec(>refcnt); module_put(helper->me); } EXPORT_SYMBOL_GPL(nf_conntrack_helper_put); @@ -423,6 +428,7 @@ int nf_conntrack_helper_register(struct nf_conntrack_helper *me) } } } + refcount_set(>refcnt, 1); hlist_add_head_rcu(>hnode, _ct_helper_hash[h]); nf_ct_helper_count++; out: diff --git a/net/netfilter/nfnetlink_cthelper.c b/net/netfilter/nfnetlink_cthelper.c index 950bf6eadc65..be678a323598 100644 --- a/net/netfilter/nfnetlink_cthelper.c +++ b/net/netfilter/nfnetlink_cthelper.c @@ -686,6 +686,7 @@ static int nfnl_cthelper_del(struct net *net, struct sock *nfnl, tuple_set = true; } + ret = -ENOENT; list_for_each_entry_safe(nlcth, n, _cthelper_list, list) { cur = >helper; j++; @@ -699,16 +700,20 @@ static int nfnl_cthelper_del(struct net *net, struct sock *nfnl, tuple.dst.protonum != cur->tuple.dst.protonum)) continue; - found = true; - nf_conntrack_helper_unregister(cur); - kfree(cur->expect_policy); + if (refcount_dec_if_one(>refcnt)) { + found = true; + nf_conntrack_helper_unregister(cur); + kfree(cur->expect_policy); - list_del(>list); - kfree(nlcth); + list_del(>list); + kfree(nlcth); + } else { + ret = -EBUSY; + } } /* Make sure we return success if we flush and there is no helpers */ - return (found || j == 0) ? 0 : -ENOENT; + return (found || j == 0) ? 0 : ret; } static const struct nla_policy nfnl_cthelper_policy[NFCTH_MAX+1] = { -- 2.1.4
[PATCH 01/12] ipvs: SNAT packet replies only for NATed connections
From: Julian AnastasovWe do not check if packet from real server is for NAT connection before performing SNAT. This causes problems for setups that use DR/TUN and allow local clients to access the real server directly, for example: - local client in director creates IPVS-DR/TUN connection CIP->VIP and the request packets are routed to RIP. Talks are finished but IPVS connection is not expired yet. - second local client creates non-IPVS connection CIP->RIP with same reply tuple RIP->CIP and when replies are received on LOCAL_IN we wrongly assign them for the first client connection because RIP->CIP matches the reply direction. As result, IPVS SNATs replies for non-IPVS connections. The problem is more visible to local UDP clients but in rare cases it can happen also for TCP or remote clients when the real server sends the reply traffic via the director. So, better to be more precise for the reply traffic. As replies are not expected for DR/TUN connections, better to not touch them. Reported-by: Nick Moriarty Tested-by: Nick Moriarty Signed-off-by: Julian Anastasov Signed-off-by: Simon Horman --- net/netfilter/ipvs/ip_vs_core.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index d2d7bdf1d510..ad99c1ceea6f 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -849,10 +849,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb, { unsigned int verdict = NF_DROP; - if (IP_VS_FWD_METHOD(cp) != 0) { - pr_err("shouldn't reach here, because the box is on the " - "half connection in the tun/dr module.\n"); - } + if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) + goto ignore_cp; /* Ensure the checksum is correct */ if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) { @@ -886,6 +884,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb, ip_vs_notrack(skb); else ip_vs_update_conntrack(skb, cp, 0); + +ignore_cp: verdict = NF_ACCEPT; out: @@ -1385,8 +1385,11 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in */ cp = pp->conn_out_get(ipvs, af, skb, ); - if (likely(cp)) + if (likely(cp)) { + if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) + goto ignore_cp; return handle_response(af, skb, pd, cp, , hooknum); + } /* Check for real-server-started requests */ if (atomic_read(>conn_out_counter)) { @@ -1444,9 +1447,15 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in } } } + +out: IP_VS_DBG_PKT(12, af, pp, skb, iph.off, "ip_vs_out: packet continues traversal as normal"); return NF_ACCEPT; + +ignore_cp: + __ip_vs_conn_put(cp); + goto out; } /* -- 2.1.4
Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.
On 05/19/2017 04:16 AM, Toshiaki Makita wrote: > On 2017/05/19 16:09, Vlad Yasevich wrote: >> On 05/18/2017 10:13 PM, Toshiaki Makita wrote: >>> On 2017/05/18 22:31, Vladislav Yasevich wrote: It appears that since commit 8cb65d000, Q-in-Q vlans have been broken. The series that commit is part of enabled TSO and checksum offloading on Q-in-Q vlans. However, most HW we support can't handle it. To work around the issue, the above commit added a function that turns off offloads on Q-in-Q devices, but it left the checksum offload. That will cause issues with most older devices that supprort very basic checksum offload capabilities as well as some newer devices (we've reproduced te problem with both be2net and bnx). To solve this for everyone, turn off checksum offloading feature by default when sending Q-in-Q traffic. Devices that are proven to work can provided a corrected ndo_features_check implemetation. Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers") CC: Toshiaki MakitaSigned-off-by: Vladislav Yasevich --- include/linux/if_vlan.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 8d5fcd6..ae537f0 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -619,7 +619,6 @@ static inline netdev_features_t vlan_features_check(const struct sk_buff *skb, NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | - NETIF_F_HW_CSUM | NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX); >>> >>> I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem >>> is IP_CSUM and IPV6_CSUM. >>> So wouldn't it be better to leave HW_CSUM and drop IP_CSUM/IPV6_CSUM, >>> i.e. change intersection into bitwise AND? >>> >> >> It wasn't really a problem before accelerations got enabled on q-in-q >> vlans. > > Right for stacked vlan device. > But I think the check was there for packets from guests forwarded by > bridge to vlan device so it was a problem before 8cb65d000. Not really, since stacked vlans in guests wouldn't have accelerations on. Haven't really tried a new guest on old hosts. It might be an issue there... > >>> The intersection was introduced in db115037bb57 ("net: fix checksum >>> features handling in netif_skb_features()"), but I guess for this >>> particular check the intersection was not needed. >>> >> >> So, to put it another way, leave the intersection with HW_CSUM in the mask, >> and then do: >> >> return features & ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM); >> >> This might work, but it assumes that everyone who announce HW_CSUM can >> do q-in-q vlans. It's been a bit of a pain tracking this down and I'd rather >> fix it for everyone and let individual driver authors verify that Q-in-Q >> works >> correctly with HW checksum. However, I am willing to do the above if >> that's what people want. > > At least HW_CSUM in the check was introduced intentionally. > https://www.spinics.net/lists/netdev/msg152016.html > > And I think HW_CSUM should work with any packets. > You know, include/linux/skbuff.h says >> * NETIF_F_HW_CSUM - The driver (or its device) is able to compute one >> * IP (one's complement) checksum for any combination >> * of protocols or protocol layering. > > We should be able to safely enable it. > > ...But you are so worried about layer2 protocol handling of HW_CSUM > devices, I'm ok with disabling it for now. > It's a concern after running across this issue. Granted, the few devices we've seen this bug on use IP/IPV6 checksum features. I am hoping someone else might weigh in here. -vlad
Re: [PATCH v5 net-next 5/7] net: fix documentation of struct scm_timestamping
On Thu, May 18, 2017 at 03:38:30PM -0400, Willem de Bruijn wrote: > On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar> wrote: > > +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled > > +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false > > +software timestamp will be generated in the recvmsg() call and passed > > +in ts[0] when a real software timestamp is missing. > > With receive software timestamping this is expected behavior? I would make > explicit that this happens even on tx timestamps. How about adding ", e.g. when receive timestamping is enabled between receiving the message and the recvmsg() call, or it is a message with a hardware transmit timestamp." ? > > For this reason it > > +is not recommended to combine SO_TIMESTAMP(NS) with SO_TIMESTAMPING. > > And I'd remove this. The extra timestamp is harmless, and we may be missing > other reasons why someone would want to enable both on the same socket. Ok. I'm just concerned people will inadvertently use the timestamp as a real timestamp and then wonder why SW TX timestamping is so bad. I have fallen into this trap. -- Miroslav Lichvar
[PATCH net-next v2] ibmveth: Support to enable LSO/CSO for Trunk VEA.
Current largesend and checksum offload feature in ibmveth driver, - Source VM sends the TCP packets with ip_summed field set as CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in checksum field - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark "no checksum" and "checksum good" bits in transmit buffer descriptor before the packet is delivered to pseries PowerVM Hypervisor - If ibmveth has largesend capability enabled, transmit buffer descriptors are market accordingly before packet is delivered to Hypervisor (along with mss value for packets with length > MSS) - Destination VM's ibmveth driver receives the packet with "checksum good" bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY - If "largesend" bit was on, mss value is copied from receive descriptor into SKB's gso_size and other flags are appropriately set for packets > MSS size - The packet is now successfully delivered up the stack in destination VM The offloads described above works fine for TCP communication among VMs in the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B ) We are now enabling support for OVS in pseries PowerVM environment. One of our requirements is to have ibmveth driver configured in "Trunk" mode, when they are used with OVS. This is because, PowerVM Hypervisor will no more bridge the packets between VMs, instead the packets are delivered to IO Server which hosts OVS to bridge them between VMs or to external networks (flow shown below), VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor <=> VM B In "IO server" the packet is received by inbound Trunk ibmveth and then delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown below), Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth In this model, we hit the following issues which impacted the VM communication performance, - Issue 1: ibmveth doesn't support largesend and checksum offload features when configured as "Trunk". Driver has explicit checks to prevent enabling these offloads. - Issue 2: SYN packet drops seen at destination VM. When the packet originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to IO server's inbound Trunk ibmveth, on validating "checksum good" bits in ibmveth receive routine, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux Bridge) and delivered to outbound Trunk ibmveth. At this point the outbound ibmveth transmit routine will not set "no checksum" and "checksum good" bits in transmit buffer descriptor, as it does so only when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets delivered to destination VM, TCP layer receives the packet with checksum value of 0 and with no checksum related flags in ip_summed field. This leads to packet drops. So, TCP connections never goes through fine. - Issue 3: First packet of a TCP connection will be dropped, if there is no OVS flow cached in datapath. OVS while trying to identify the flow, computes the checksum. The computed checksum will be invalid at the receiving end, as ibmveth transmit routine zeroes out the pseudo checksum value in the packet. This leads to packet drop. - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list. When Physical NIC has GRO enabled and when OVS bridges these packets, OVS vport send code will end up calling dev_queue_xmit, which in turn calls validate_xmit_skb. In validate_xmit_skb routine, the larger packets will get segmented into MSS sized segments, if SKB has a frag_list and if the driver to which they are delivered to doesn't support NETIF_F_FRAGLIST feature. This patch addresses the above four issues, thereby enabling end to end largesend and checksum offload support for better performance. - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and checksum offloads. - Fix for Issue 2 : When ibmveth receives a packet with "checksum good" bit set and if its configured in Trunk mode, set appropriate SKB fields using skb_partial_csum_set (ip_summed field is set with CHECKSUM_PARTIAL) - Fix for Issue 3: Recompute the pseudo header checksum before sending the SKB up the stack. - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up allocating buffers and copying data, this fix gives upto 4X throughput increase. Note: All these fixes need to be dropped together as fixing just one of them will lead to other issues immediately (especially for Issues 1,2 & 3). Signed-off-by: Sivakumar Krishnasamy--- drivers/net/ethernet/ibm/ibmveth.c | 107 ++--- drivers/net/ethernet/ibm/ibmveth.h | 1 + 2 files changed, 90 insertions(+), 18 deletions(-) diff --git
Re: [PATCH v5 net-next 4/7] net: add new control message for incoming HW-timestamped packets
On Thu, May 18, 2017 at 04:20:53PM -0400, Willem de Bruijn wrote: > On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar> wrote: > > +SOF_TIMESTAMPING_OPT_PKTINFO: > > + > > + Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming > > + packets with hardware timestamps. The message contains struct > > + scm_ts_pktinfo, which supplies the index of the real interface which > > + received the packet and its length at layer 2. A valid (non-zero) > > + interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is > > + enabled and the driver is using NAPI. > > It is probably good to explicitly call out that the remaining two fields > are reserved and undefined. To stress that applications cannot be > overly pedantic and start failing if these become non-zero. Ok. I'm adding "The struct contains also two other fields, but they are reserved and undefined". -- Miroslav Lichvar
[PATCH v2] xfrm: fix state migration copy replay sequence numbers
During xfrm migration copy replay and preplay sequence numbers from the previous state. Here is a tcpdump output showing the problem. 10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder. After the migration it sent wrong sequence number, reset to 1. The migration is from 10.0.0.52 to 10.0.0.53. IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136 IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136 IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136 IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136 IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136 NOTE: next sequence is wrong 0x1 IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136 IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136 IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136 Signed-off-by: Antony Antony--- Changes in v2: - include tcpdump output showing the problem in the commit message. net/xfrm/xfrm_state.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index fc3c5aa..2e291bc 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1383,6 +1383,8 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig) x->curlft.add_time = orig->curlft.add_time; x->km.state = orig->km.state; x->km.seq = orig->km.seq; + x->replay = orig->replay; + x->preplay = orig->preplay; return x; -- 2.9.3
Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
On Thu, 18 May 2017 17:41:48 +0200 Jesper Dangaard Brouerwrote: > Introducing a new XDP feature and associated bpf helper bpf_xdp_rxhash. > > The rxhash and type allow filtering on packets without touching > packet memory. The performance difference on my system with a > 100 Gbit/s mlx5 NIC is 12Mpps to 19Mpps. The XDP/bpf program I use (called xdp_rxhash) for testing this feature is available via my github repo here: https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_rxhash_kern.c https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_rxhash_user.c The cmdline output looks like: $ sudo ./xdp_rxhash --dev mlx5p2 --sec 2 --notouch xdp-action ppspps-human-readable mem XDP_ABORTED19694682 19,694,682 2.000205 no_touch XDP_DROP 0 0 2.000205 no_touch XDP_PASS 10 10 2.000205 no_touch XDP_TX 0 0 2.000205 no_touch rx_total 19694701 19,694,701 2.000205 no_touch hash_type:L3 ppspps-human-readable sample-period Unknown0 0 2.000205 IPv4 19694725 19,694,725 2.000205 IPv6 0 0 2.000205 hash_type:L4 ppspps-human-readable sample-period Unknown10 10 2.000205 TCP0 0 2.000205 UDP19694697 19,694,697 2.000205 ^CInterrupted: Removing XDP program on ifindex:5 device:mlx5p2 The 10 pps XDP_PASS is a ping command I rand at the same time. Notice how these ping-ICMP packets were categorized as L4=Unknown and L3=IPv4. The L4 categorization is usually UDP or TCP, but looking at driver-code it seems some HW support detecting other L4 types, like ICMP, SCTP, etc. $ sudo taskset -c 4 ping -i 0.1 -c 1 198.18.100.1 -c 100 -q PING 198.18.100.1 (198.18.100.1) 56(84) bytes of data. --- 198.18.100.1 ping statistics --- 100 packets transmitted, 100 received, 0% packet loss, time 10294ms rtt min/avg/max/mdev = 0.010/0.071/0.113/0.043 ms p.s. xdp-newbies can via this commit find the links back to the netdev kernel RFC patches/mails: https://github.com/netoptimizer/prototype-kernel/commit/9647e1b563970 -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v3 net-next 5/5] dsa: add maintainer of Microchip KSZ switches
On 05/19/2017 03:57 PM, woojung@microchip.com wrote: > From: Woojung Huh> > Adding maintainer of Microchip KSZ switches. > > Signed-off-by: Woojung Huh > Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli > --- > MAINTAINERS | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index f7d568b..a72b40c 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -8454,6 +8454,16 @@ F: drivers/media/platform/atmel/atmel-isc.c > F: drivers/media/platform/atmel/atmel-isc-regs.h > F: devicetree/bindings/media/atmel-isc.txt > > +MICROCHIP KSZ SERIES ETHERNET SWITCH DRIVER > +M: Woojung Huh > +M: Microchip Linux Driver Support > +L: netdev@vger.kernel.org > +S: Maintained > +F: net/dsa/tag_ksz.c > +F: drivers/net/dsa/microchip/* > +F: include/linux/platform_data/microchip-ksz.h > +F: Documentation/devicetree/bindings/net/dsa/ksz.txt > + > MICROCHIP USB251XB DRIVER > M: Richard Leitner > L: linux-...@vger.kernel.org > -- Florian
Re: [PATCH v3 net-next 4/5] dsa: Add spi support to Microchip KSZ switches
On 05/19/2017 03:57 PM, woojung@microchip.com wrote: > From: Woojung Huh> > A sample SPI configuration for Microchip KSZ switches. > > Signed-off-by: Woojung Huh > Reviewed-by: Andrew Lunn Subject should be something like: dt-bindings: net: dsa: Add Microchip KSZ switches binding With that fixed and the nits below: Reviewed-by: Florian Fainelli > --- > Documentation/devicetree/bindings/net/dsa/ksz.txt | 73 > +++ > 1 file changed, 73 insertions(+) > create mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt > > diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt > b/Documentation/devicetree/bindings/net/dsa/ksz.txt > new file mode 100644 > index 000..8a13966 > --- /dev/null > +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt > @@ -0,0 +1,73 @@ > +Microchip KSZ Series Ethernet switches > +== > + > +Required properties: > + > +- compatible: For external switch chips, compatible string must be exactly > one > + of: "microchip,ksz9477" > + > +See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional > +required and optional properties. > + > +Examples: > + > +Ethernet switch connected via SPI to the host, CPU port wired to eth0: > + > + eth0: ethernet@10001000 { > + fixed-link { > + reg = <7> There is a missing semicolon, and for a fixed-link, there is no "reg" property. > + speed = <1000>; > + duplex-full; Actually the correct property is named "full-duplex" (like you put it for the switch) > + }; > + }; > + > + spi1: spi@f8008000 { > + pinctrl-0 = <_spi_ksz>; > + cs-gpios = < 25 0>; > + id = <1>; > + status = "okay"; > + > + ksz9477: ksz9477@0 { > + compatible = "microchip,ksz9477"; > + reg = <0>; > + > + spi-max-frequency = <4400>; > + spi-cpha; > + spi-cpol; > + > + status = "okay"; > + ports { > + #address-cells = <1>; > + #size-cells = <0>; > + port@0 { > + reg = <0>; > + label = "lan1"; > + }; > + port@1 { > + reg = <1>; > + label = "lan2"; > + }; > + port@2 { > + reg = <2>; > + label = "lan3"; > + }; > + port@3 { > + reg = <3>; > + label = "lan4"; > + }; > + port@4 { > + reg = <4>; > + label = "lan5"; > + }; > + port@5 { > + reg = <5>; > + label = "cpu"; > + ethernet = <>; > + fixed-link { > + speed = <1000>; > + full-duplex; > + }; > + }; > + }; > + }; > + }; > -- Florian
RE: [PATCH v3 net-next 1/5] dsa: add support for Microchip KSZ tail tagging
>> + if (padlen) { >> + u8 *pad = skb_put(nskb, padlen); >> + >> + memset(pad, 0, padlen); >> + } > >Can you use skb_put_padto() here instead of open coding this? > >> + >> + tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN); >> + tag[0] = 0; >> + tag[1] = 1 << p->dp->index; /* destnation port */ > >typo: destination port > >With that fixed: > >Reviewed-by: Florian FainelliHI Florian, Thanks for prompt reviews. Will submit another version. - Woojung
Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
On Fri, 19 May 2017 20:07:52 -0700, Alexei Starovoitov wrote: > How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is? > We can make sure that XDP program does read only access into it and > it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else > in there, but this will not be uapi and it will be pretty obvious > to program authors that their programs are vendor specific. > 'not uapi' here means that mellanox is free to change their HW descriptor > and its contents as they wish. Hm.. Would that mean we have to teach the verifier about all possible drivers and their metadata structures (i.e. sizes thereof). And add an UAPI enum of known drivers? Other idea I floated in early days was to standardize the fields but let the driver "JIT" the accesses to look at the right offset of the right structure. Admittedly that would be a lot more work.
[PATCH v2 net-next] net: ipv6: fix code style error and warning of ndisc.c
From: yuan linyuSigned-off-by: yuan linyu --- net/ipv6/ndisc.c | 300 --- 1 file changed, 155 insertions(+), 145 deletions(-) diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index d310dc4..5a3dfaa 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -12,8 +12,7 @@ * 2 of the License, or (at your option) any later version. */ -/* - * Changes: +/* Changes: * * Alexey I. Froloff : RFC6106 (DNSSL) support * Pierre Ynard: export userland ND options @@ -99,7 +98,6 @@ static const struct neigh_ops ndisc_hh_ops = { .connected_output = neigh_resolve_output, }; - static const struct neigh_ops ndisc_direct_ops = { .family = AF_INET6, .output = neigh_direct_output, @@ -147,13 +145,13 @@ void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data, u8 *opt = skb_put(skb, space); opt[0] = type; - opt[1] = space>>3; + opt[1] = space >> 3; memset(opt + 2, 0, pad); opt += pad; space -= pad; - memcpy(opt+2, data, data_len); + memcpy(opt + 2, data, data_len); data_len += 2; opt += data_len; space -= data_len; @@ -182,6 +180,7 @@ static struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur, struct nd_opt_hdr *end) { int type; + if (!cur || !end || cur >= end) return NULL; type = cur->nd_opt_type; @@ -222,6 +221,7 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, memset(ndopts, 0, sizeof(*ndopts)); while (opt_len) { int l; + if (opt_len < sizeof(struct nd_opt_hdr)) return NULL; l = nd_opt->nd_opt_len << 3; @@ -240,13 +240,15 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, "%s: duplicated ND6 option found: type=%d\n", __func__, nd_opt->nd_opt_type); } else { - ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt; + ndopts->nd_opt_array[nd_opt->nd_opt_type] = + nd_opt; } break; case ND_OPT_PREFIX_INFO: ndopts->nd_opts_pi_end = nd_opt; if (!ndopts->nd_opt_array[nd_opt->nd_opt_type]) - ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt; + ndopts->nd_opt_array[nd_opt->nd_opt_type] = + nd_opt; break; #ifdef CONFIG_IPV6_ROUTE_INFO case ND_OPT_ROUTE_INFO: @@ -261,8 +263,7 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, if (!ndopts->nd_useropts) ndopts->nd_useropts = nd_opt; } else { - /* -* Unknown options must be silently ignored, + /* Unknown options must be silently ignored, * to accommodate future extension to the * protocol. */ @@ -280,7 +281,8 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, return ndopts; } -int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device *dev, int dir) +int ndisc_mc_map(const struct in6_addr *addr, char *buf, +struct net_device *dev, int dir) { switch (dev->type) { case ARPHRD_ETHER: @@ -327,9 +329,8 @@ static int ndisc_constructor(struct neighbour *neigh) bool is_multicast = ipv6_addr_is_multicast(addr); in6_dev = in6_dev_get(dev); - if (!in6_dev) { + if (!in6_dev) return -EINVAL; - } parms = in6_dev->nd_parms; __neigh_parms_put(neigh->parms); @@ -344,12 +345,12 @@ static int ndisc_constructor(struct neighbour *neigh) if (is_multicast) { neigh->nud_state = NUD_NOARP; ndisc_mc_map(addr, neigh->ha, dev, 1); - } else if (dev->flags&(IFF_NOARP|IFF_LOOPBACK)) { + } else if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) { neigh->nud_state = NUD_NOARP; memcpy(neigh->ha, dev->dev_addr, dev->addr_len); - if (dev->flags_LOOPBACK) + if (dev->flags & IFF_LOOPBACK) neigh->type = RTN_LOCAL; -
Re: [PATCH v3 net-next 3/5] dsa: add DSA switch driver for Microchip KSZ9477
On 05/19/2017 03:57 PM, woojung@microchip.com wrote: > From: Woojung Huh> > The KSZ9477 is a fully integrated layer 2, managed, 7 ports GigE switch > with numerous advanced features. 5 ports incorporate 10/100/1000 Mbps PHYs. > The other 2 ports have interfaces that can be configured as SGMII, RGMII, MII > or RMII. Either of these may connect directly to a host processor or > to an external PHY. The SGMII port may interface to a fiber optic transceiver. > > This driver currently supports vlan, fdb, mdb & mirror dsa switch operations. > > Signed-off-by: Woojung Huh Looks great, thanks Woojung! Reviewed-by: Florian Fainelli -- Florian
[PATCH] i40e: Fix incorrect pf->flags
Fix bug introduced by 'commit 47994c119a36e ("i40e: remove hw_disabled_flags in favor of using separate flag bits")' that mistakenly wipes out pf->flags. Signed-off-by: Tushar Dave--- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index d5c9c9e..6b98d34 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -8821,9 +8821,9 @@ static int i40e_sw_init(struct i40e_pf *pf) (pf->hw.aq.api_min_ver > 4))) { /* Supported in FW API version higher than 1.4 */ pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE; - pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE; + pf->flags |= I40E_FLAG_HW_ATR_EVICT_CAPABLE; } else { - pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE; + pf->flags |= I40E_FLAG_HW_ATR_EVICT_CAPABLE; } pf->eeprom_version = 0xDEAD; -- 1.9.1
Re: [PATCH v3 net-next 2/5] phy: micrel: add Microchip KSZ 9477 Switch PHY support
On 05/19/2017 03:57 PM, woojung@microchip.com wrote: > From: Woojung Huh> > Adding Microchip 9477 Phy included in KSZ9477 Switch. > > Signed-off-by: Woojung Huh > Signed-off-by: Andrew Lunn Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH v3 net-next 1/5] dsa: add support for Microchip KSZ tail tagging
On 05/19/2017 03:57 PM, woojung@microchip.com wrote: > From: Woojung Huh> > Adding support for the Microchip KSZ switch family tail tagging. > > Signed-off-by: Woojung Huh > Reviewed-by: Andrew Lunn > --- > include/net/dsa.h | 1 + > net/dsa/Kconfig| 3 ++ > net/dsa/Makefile | 1 + > net/dsa/dsa.c | 3 ++ > net/dsa/dsa_priv.h | 3 ++ > net/dsa/tag_ksz.c | 103 > + > 6 files changed, 114 insertions(+) > create mode 100644 net/dsa/tag_ksz.c > > diff --git a/include/net/dsa.h b/include/net/dsa.h > index 791fed6..fbb00a6 100644 > --- a/include/net/dsa.h > +++ b/include/net/dsa.h > @@ -31,6 +31,7 @@ enum dsa_tag_protocol { > DSA_TAG_PROTO_BRCM, > DSA_TAG_PROTO_DSA, > DSA_TAG_PROTO_EDSA, > + DSA_TAG_PROTO_KSZ, > DSA_TAG_PROTO_LAN9303, > DSA_TAG_PROTO_MTK, > DSA_TAG_PROTO_QCA, > diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig > index 297389b..cc5f8f9 100644 > --- a/net/dsa/Kconfig > +++ b/net/dsa/Kconfig > @@ -25,6 +25,9 @@ config NET_DSA_TAG_DSA > config NET_DSA_TAG_EDSA > bool > > +config NET_DSA_TAG_KSZ > + bool > + > config NET_DSA_TAG_LAN9303 > bool > > diff --git a/net/dsa/Makefile b/net/dsa/Makefile > index f8c0251..b15141f 100644 > --- a/net/dsa/Makefile > +++ b/net/dsa/Makefile > @@ -6,6 +6,7 @@ dsa_core-y += dsa.o slave.o dsa2.o switch.o legacy.o > dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o > dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o > dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o > +dsa_core-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o > dsa_core-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o > dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o > dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c > index 3288a80..402459e 100644 > --- a/net/dsa/dsa.c > +++ b/net/dsa/dsa.c > @@ -49,6 +49,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = > { > #ifdef CONFIG_NET_DSA_TAG_EDSA > [DSA_TAG_PROTO_EDSA] = _netdev_ops, > #endif > +#ifdef CONFIG_NET_DSA_TAG_KSZ > + [DSA_TAG_PROTO_KSZ] = _netdev_ops, > +#endif > #ifdef CONFIG_NET_DSA_TAG_LAN9303 > [DSA_TAG_PROTO_LAN9303] = _netdev_ops, > #endif > diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h > index c274130..6f23dfa 100644 > --- a/net/dsa/dsa_priv.h > +++ b/net/dsa/dsa_priv.h > @@ -85,6 +85,9 @@ extern const struct dsa_device_ops dsa_netdev_ops; > /* tag_edsa.c */ > extern const struct dsa_device_ops edsa_netdev_ops; > > +/* tag_ksz.c */ > +extern const struct dsa_device_ops ksz_netdev_ops; > + > /* tag_lan9303.c */ > extern const struct dsa_device_ops lan9303_netdev_ops; > > diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c > new file mode 100644 > index 000..cbc79b5 > --- /dev/null > +++ b/net/dsa/tag_ksz.c > @@ -0,0 +1,103 @@ > +/* > + * net/dsa/tag_ksz.c - Microchip KSZ Switch tag format handling > + * Copyright (c) 2017 Microchip Technology > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + */ > + > +#include > +#include > +#include > +#include > +#include "dsa_priv.h" > + > +/* For Ingress (Host -> KSZ), 2 bytes are added before FCS. > + * > --- > + * > DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes) > + * > --- > + * tag0 : Prioritization (not used now) > + * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5) > + * > + * For Egress (KSZ -> Host), 1 byte is added before FCS. > + * > --- > + * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes) > + * > --- > + * tag0 : zero-based value represents port > + * (eg, 0x00=port1, 0x02=port3, 0x06=port7) > + */ > + > +#define KSZ_INGRESS_TAG_LEN 2 > +#define KSZ_EGRESS_TAG_LEN 1 > + > +static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev) > +{ > + struct dsa_slave_priv *p = netdev_priv(dev); > + struct sk_buff *nskb; > + int padlen; > + u8 *tag; > + > + padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len; > + > + if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) { > + nskb = skb; > + } else { > + nskb = alloc_skb(NET_IP_ALIGN + skb->len + > + padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC); > + if (!nskb) { > + kfree_skb(skb); > + return
Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
On 5/19/17 4:16 PM, David Miller wrote: From: Alexei StarovoitovDate: Fri, 19 May 2017 14:37:56 -0700 On 5/19/17 1:41 PM, David Miller wrote: From: Edward Cree Date: Fri, 19 May 2017 18:17:42 +0100 One question: is there a way to build the verifier as userland code (or at least as a module), or will I have to reboot every time I want to test a change? There currently is no such machanism, you will have to reboot every time. I have considered working on making the code buildable outside of the kernel. It shouldn't be too hard. it's not hard. We did it twice and both times abandoned. First time to have 'user space verifier' to check programs before loading and second time for fuzzing via llvm. Abandoned since it diverges very quickly from kernel. Well, my idea was the create an environment in which kernel verifier.c could be built as-is. Maybe there would be some small compromises in verifier.c such as an ifdef test or two, but that should be it. that's exactly what we did the first time. Added few ifdef to verifier.c Second time we went even further by compiling kernel/bpf/verifier.c as-is and linking everything magically via user space hooks all the way that test_verifier.c runs as-is but calling bpf_check() function that was compiled for user space via clang. That code is here: https://github.com/iovisor/bpf-fuzzer It's definitely possible to refresh it and make it work again. My point that unless we put such 'lets build verifier.c for user space' as part of tools/testing/selftests/ or something, such project is destined to bit rot.
Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
On Fri, 19 May 2017 20:34:00 -0700, Alexei Starovoitov wrote: > On Fri, May 19, 2017 at 08:21:47PM -0700, Jakub Kicinski wrote: > > On Fri, 19 May 2017 20:07:52 -0700, Alexei Starovoitov wrote: > > > How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is? > > > We can make sure that XDP program does read only access into it and > > > it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else > > > in there, but this will not be uapi and it will be pretty obvious > > > to program authors that their programs are vendor specific. > > > 'not uapi' here means that mellanox is free to change their HW descriptor > > > and its contents as they wish. > > > > Hm.. Would that mean we have to teach the verifier about all possible > > drivers and their metadata structures (i.e. sizes thereof). And add an > > UAPI enum of known drivers? > > why? no uapi other than a pointer to this hw rx descriptor. > Different sizeof(hw_rx_descriptor) is not a problem. > We deal with it already in tracing. All tracepoints have different > sizeof(*ctx), yet the safety is preserved. Ack, quick read of tracing code reveals this indeed should work.
Re: [PATCH net-next] geneve: always fill CSUM6_RX configuration
On Thu, May 18, 2017 at 12:59 PM, Eric Garverwrote: > CSMU6_RX is relevant for collect_metadata as well. As such leave it > outside of the dev's IPv4/IPv6 checks. > Can you explain it bit? is this flag used with ipv4 tunnels? > Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") > Signed-off-by: Eric Garver > --- > drivers/net/geneve.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c > index dec5d563ab19..f557d1dc3f9b 100644 > --- a/drivers/net/geneve.c > +++ b/drivers/net/geneve.c > @@ -1311,13 +1311,13 @@ static int geneve_fill_info(struct sk_buff *skb, > const struct net_device *dev) > if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_TX, >!(info->key.tun_flags & TUNNEL_CSUM))) > goto nla_put_failure; > - > - if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, > - !geneve->use_udp6_rx_checksums)) > - goto nla_put_failure; > #endif > } > > + if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, > + !geneve->use_udp6_rx_checksums)) > + goto nla_put_failure; > + > if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) || > nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) || > nla_put_be32(skb, IFLA_GENEVE_LABEL, info->key.label)) > -- > 2.12.0 >
[net-next] net: ipv6: fix code style error and warning of ndisc.c
From: yuan linyuSigned-off-by: yuan linyu --- net/ipv6/ndisc.c | 300 --- 1 file changed, 155 insertions(+), 145 deletions(-) diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index d310dc4..5a3dfaa 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -12,8 +12,7 @@ * 2 of the License, or (at your option) any later version. */ -/* - * Changes: +/* Changes: * * Alexey I. Froloff : RFC6106 (DNSSL) support * Pierre Ynard: export userland ND options @@ -99,7 +98,6 @@ static const struct neigh_ops ndisc_hh_ops = { .connected_output = neigh_resolve_output, }; - static const struct neigh_ops ndisc_direct_ops = { .family = AF_INET6, .output = neigh_direct_output, @@ -147,13 +145,13 @@ void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data, u8 *opt = skb_put(skb, space); opt[0] = type; - opt[1] = space>>3; + opt[1] = space >> 3; memset(opt + 2, 0, pad); opt += pad; space -= pad; - memcpy(opt+2, data, data_len); + memcpy(opt + 2, data, data_len); data_len += 2; opt += data_len; space -= data_len; @@ -182,6 +180,7 @@ static struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur, struct nd_opt_hdr *end) { int type; + if (!cur || !end || cur >= end) return NULL; type = cur->nd_opt_type; @@ -222,6 +221,7 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, memset(ndopts, 0, sizeof(*ndopts)); while (opt_len) { int l; + if (opt_len < sizeof(struct nd_opt_hdr)) return NULL; l = nd_opt->nd_opt_len << 3; @@ -240,13 +240,15 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, "%s: duplicated ND6 option found: type=%d\n", __func__, nd_opt->nd_opt_type); } else { - ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt; + ndopts->nd_opt_array[nd_opt->nd_opt_type] = + nd_opt; } break; case ND_OPT_PREFIX_INFO: ndopts->nd_opts_pi_end = nd_opt; if (!ndopts->nd_opt_array[nd_opt->nd_opt_type]) - ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt; + ndopts->nd_opt_array[nd_opt->nd_opt_type] = + nd_opt; break; #ifdef CONFIG_IPV6_ROUTE_INFO case ND_OPT_ROUTE_INFO: @@ -261,8 +263,7 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, if (!ndopts->nd_useropts) ndopts->nd_useropts = nd_opt; } else { - /* -* Unknown options must be silently ignored, + /* Unknown options must be silently ignored, * to accommodate future extension to the * protocol. */ @@ -280,7 +281,8 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev, return ndopts; } -int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device *dev, int dir) +int ndisc_mc_map(const struct in6_addr *addr, char *buf, +struct net_device *dev, int dir) { switch (dev->type) { case ARPHRD_ETHER: @@ -327,9 +329,8 @@ static int ndisc_constructor(struct neighbour *neigh) bool is_multicast = ipv6_addr_is_multicast(addr); in6_dev = in6_dev_get(dev); - if (!in6_dev) { + if (!in6_dev) return -EINVAL; - } parms = in6_dev->nd_parms; __neigh_parms_put(neigh->parms); @@ -344,12 +345,12 @@ static int ndisc_constructor(struct neighbour *neigh) if (is_multicast) { neigh->nud_state = NUD_NOARP; ndisc_mc_map(addr, neigh->ha, dev, 1); - } else if (dev->flags&(IFF_NOARP|IFF_LOOPBACK)) { + } else if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) { neigh->nud_state = NUD_NOARP; memcpy(neigh->ha, dev->dev_addr, dev->addr_len); - if (dev->flags_LOOPBACK) + if (dev->flags & IFF_LOOPBACK) neigh->type = RTN_LOCAL; -
Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
On Thu, May 18, 2017 at 05:41:48PM +0200, Jesper Dangaard Brouer wrote: > > +/* XDP rxhash have an associated type, which is related to the RSS > + * (Receive Side Scaling) standard, but NIC HW have different mapping > + * and support. Thus, create mapping that is interesting for XDP. XDP > + * would primarly want insight into L3 and L4 protocol info. > + * > + * TODO: Likely need to get extended with "L3_IPV6_EX" due RSS standard > + * > + * The HASH_TYPE will be returned from bpf helper as the top 32-bit of > + * the 64-bit rxhash (internally type stored in xdp_buff->flags). > + */ > +#define XDP_HASH(x) ((x) & ((1ULL << 32)-1)) > +#define XDP_HASH_TYPE(x) ((x) >> 32) > + > +#define XDP_HASH_TYPE_L3_SHIFT 0 > +#define XDP_HASH_TYPE_L3_BITS3 > +#define XDP_HASH_TYPE_L3_MASK((1ULL << XDP_HASH_TYPE_L3_BITS)-1) > +#define XDP_HASH_TYPE_L3(x) ((x) & XDP_HASH_TYPE_L3_MASK) > +enum { > + XDP_HASH_TYPE_L3_IPV4 = 1, > + XDP_HASH_TYPE_L3_IPV6, > +}; > + > +#define XDP_HASH_TYPE_L4_SHIFT XDP_HASH_TYPE_L3_BITS > +#define XDP_HASH_TYPE_L4_BITS5 > +#define XDP_HASH_TYPE_L4_MASK > \ > + (((1ULL << XDP_HASH_TYPE_L4_BITS)-1) << XDP_HASH_TYPE_L4_SHIFT) > +#define XDP_HASH_TYPE_L4(x) ((x) & XDP_HASH_TYPE_L4_MASK) > +enum { > + _XDP_HASH_TYPE_L4_TCP = 1, > + _XDP_HASH_TYPE_L4_UDP, > +}; > +#define XDP_HASH_TYPE_L4_TCP (_XDP_HASH_TYPE_L4_TCP << > XDP_HASH_TYPE_L4_SHIFT) > +#define XDP_HASH_TYPE_L4_UDP (_XDP_HASH_TYPE_L4_UDP << > XDP_HASH_TYPE_L4_SHIFT) imo this is dangerous territory. As far as I can see this information doesn't exist in the current drivers at all and you're enabling it in the patch 5 via fancy: + u32 ht = (mlx5_htype_l4_to_xdp[((cht & CQE_RSS_HTYPE_L4) >> 6)] | \ + mlx5_htype_l3_to_xdp[((cht & CQE_RSS_HTYPE_IP) >> 2)]); It's pretty cool that you've discovered this hidden mlx5 feature Did you find it in some hw spec ? And it looks useful to me, but 1. i'm worried that we'd be relying on something that mellanox didn't implement in their drivers before. Was it tested and guarnteed to exist in the future revisions of firmware? Is it cx4 or cx4-lx or cx5 feature? 2. but the main concern that it is mellanox only feature. At least I cannot see anything like this in broadcom and intel nics In the very beginning we discussed that XDP programs should be as generic as possible and HW independent while at the same time we want to expose HW specific features to XDP programs. So I'm totally fine to expose this fancy hw hash and ipv4 vs v6 and tcp vs udp flags to xdp programs somehow, but I'm completely against making it into uapi. How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is? We can make sure that XDP program does read only access into it and it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else in there, but this will not be uapi and it will be pretty obvious to program authors that their programs are vendor specific. 'not uapi' here means that mellanox is free to change their HW descriptor and its contents as they wish. Also no copies and bit conversions will be necessary, so the cost will be zero to programs that don't use it and we wouldn't need to change verifier to discover access to this stuff. Note that bpf programs already have access to all in-kernel data structures on the tracing side, so this is nothing new and tracing program authors got used to structures changing from kernel to kernel. XDP program authors can do the same for vendor specific bits while we keep core XDP uapi generic across all nics.
Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
On Fri, May 19, 2017 at 08:21:47PM -0700, Jakub Kicinski wrote: > On Fri, 19 May 2017 20:07:52 -0700, Alexei Starovoitov wrote: > > How about exposing 'struct mlx5_cqe64 *' to XDP programs as-is? > > We can make sure that XDP program does read only access into it and > > it will see cqe->rss_hash_result, cqe->rss_hash_type and everything else > > in there, but this will not be uapi and it will be pretty obvious > > to program authors that their programs are vendor specific. > > 'not uapi' here means that mellanox is free to change their HW descriptor > > and its contents as they wish. > > Hm.. Would that mean we have to teach the verifier about all possible > drivers and their metadata structures (i.e. sizes thereof). And add an > UAPI enum of known drivers? why? no uapi other than a pointer to this hw rx descriptor. Different sizeof(hw_rx_descriptor) is not a problem. We deal with it already in tracing. All tracepoints have different sizeof(*ctx), yet the safety is preserved. > Other idea I floated in early days was to standardize the fields but > let the driver "JIT" the accesses to look at the right offset of the > right structure. Admittedly that would be a lot more work. 'standardize the fields' sounds nice, but failed here already. As far as I can see the meaning of packet 'hash' is quite different across the drivers and 'hash' is just a beginning. I hope we can standardize on 'csum' field and make it checksum_complete, but so far out of 10+G nics only mlx5 and nfp do it in hw. We need it at least for mlx4, but it can only fake it via expensive math.
Re: [PATCH net-next v2] bridge: fix hello and hold timers starting/stopping
On Fri, May 19, 2017 at 07:30:43PM +0200, Ivan Vecera wrote: > Current bridge code incorrectly handles starting/stopping of hello and > hold timers during STP enable/disable. > > 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP >transition. The timers are already stopped in NO_STP state so >this is confusing no-op. Hi Ivan, Shouldn't we start hello timer in br_stp_start when NO_STP -> BR_KERNEL_STP ? > > 2. During USER_STP->NO_STP transition the timers are started. This >does not make sense and is confusion because the timer should not be >active in NO_STP state. Yes, but what about BR_KERNEL_STP -> NO_STP in function br_stp_stop() ? > > Cc: da...@davemloft.net > Cc: sas...@cumulusnetworks.com > Cc: step...@networkplumber.org > Cc: bri...@lists.linux-foundation.org > Cc: lucien@gmail.com > Cc: niko...@cumulusnetworks.com > Signed-off-by: Ivan Vecera> --- > net/bridge/br_stp_if.c | 11 --- > 1 file changed, 11 deletions(-) > > diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c > index 08341d2aa9c9..a05027027513 100644 > --- a/net/bridge/br_stp_if.c > +++ b/net/bridge/br_stp_if.c > @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char > *arg) > > static void br_stp_start(struct net_bridge *br) > { > - struct net_bridge_port *p; > int err = -ENOENT; > > if (net_eq(dev_net(br->dev), _net)) > @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br) > if (!err) { > br->stp_enabled = BR_USER_STP; > br_debug(br, "userspace STP started\n"); > - > - /* Stop hello and hold timers */ > - del_timer(>hello_timer); > - list_for_each_entry(p, >port_list, list) > - del_timer(>hold_timer); I'm not sure if user space daemon will send bpdu or not? In comment 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers"). Nikolay said we should not handle it with BR_USER_STP. > } else { > br->stp_enabled = BR_KERNEL_STP; > br_debug(br, "using kernel STP\n"); > @@ -187,7 +181,6 @@ static void br_stp_start(struct net_bridge *br) > > static void br_stp_stop(struct net_bridge *br) > { > - struct net_bridge_port *p; > int err; > > if (br->stp_enabled == BR_USER_STP) { > @@ -196,10 +189,6 @@ static void br_stp_stop(struct net_bridge *br) > br_err(br, "failed to stop userspace STP (%d)\n", err); > > /* To start timers on any ports left in blocking */ > - mod_timer(>hello_timer, jiffies + br->hello_time); > - list_for_each_entry(p, >port_list, list) > - mod_timer(>hold_timer, > - round_jiffies(jiffies + BR_HOLD_TIME)); If we do not del hello_timer. after it expired in br_hello_timer_expired(), Our state is br->dev->flags & IFF_UP and br->stp_enabled == NO_STP, it will call mod_timer(>hello_timer, round_jiffies(jiffies + br->hello_time)) and we will keep sending bpdu message even after stp stoped. > spin_lock_bh(>lock); > br_port_state_selection(br); > spin_unlock_bh(>lock); > -- So how about just like diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index d8ad73b..0198f62 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -183,6 +183,7 @@ static void br_stp_start(struct net_bridge *br) } else { br->stp_enabled = BR_KERNEL_STP; br_debug(br, "using kernel STP\n"); + mod_timer(>hello_timer, jiffies + br->hello_time); /* To start timers on any ports left in blocking */ br_port_state_selection(br); @@ -202,7 +203,6 @@ static void br_stp_stop(struct net_bridge *br) br_err(br, "failed to stop userspace STP (%d)\n", err); /* To start timers on any ports left in blocking */ - mod_timer(>hello_timer, jiffies + br->hello_time); list_for_each_entry(p, >port_list, list) mod_timer(>hold_timer, round_jiffies(jiffies + BR_HOLD_TIME)); @@ -211,6 +211,7 @@ static void br_stp_stop(struct net_bridge *br) spin_unlock_bh(>lock); } + del_timer_sync(>hello_timer); br->stp_enabled = BR_NO_STP; } Thanks Hangbin
Re: [PATCH net] ah: use crypto_memneq to check the ICV
On Fri, May 19, 2017 at 02:25:14PM +0200, Sabrina Dubroca wrote: > Hi Steffen, > > 2017-05-04, 12:41:24 +0200, Steffen Klassert wrote: > > On Wed, May 03, 2017 at 04:57:57PM +0200, Sabrina Dubroca wrote: > > > Signed-off-by: Sabrina Dubroca> > > --- > > > net/ipv4/ah4.c | 5 +++-- > > > net/ipv6/ah6.c | 5 +++-- > > > 2 files changed, 6 insertions(+), 4 deletions(-) > > > > Is this a fix for something? If so, please describe what it fixes. > > If not, it can wait until after the merge window and merged into > > ipsec-next then. > > Do you prefer that I resend the patch, or can you take it directly? No need to resend, I've just applied it directly. Thanks a lot Sabrina!
RE: [PATCH net-next] cxgb4: fix incorrect cim_la output for T6
From: Ganesh Goudar > Sent: 19 May 2017 11:12 T6 > > take care of UpDbgLaRdPtr[0-3] restriction for T6 > > Signed-off-by: Ganesh Goudar> --- > drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c > b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c > index aded42b96..917b46b 100644 > --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c > +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c > @@ -8268,6 +8268,13 @@ int t4_cim_read_la(struct adapter *adap, u32 *la_buf, > unsigned int *wrptr) > if (ret) > break; > idx = (idx + 1) & UPDBGLARDPTR_M; > + > + /* Bits 0-3 of UpDbgLaRdPtr can be between to 1001 to > + * identify the 32-bit portion of the full 312-bit data > + */ > + if (is_t6(adap->params.chip)) > + while ((idx & 0xf) > 9) > + idx = (idx + 1) % UPDBGLARDPTR_M; Why the loop, maybe: if (is_t6(adap->params.chip) && (idx & 0xf) >= 9) idx = (idx & 0xf0) + 0x10; else idx++; idx &= UPDBGLARDPTR_M; David
Re: general protection fault in skb_release_data
On Fri, May 19, 2017 at 5:57 AM, Andrey Konovalovwrote: > On Fri, May 19, 2017 at 12:18 PM, wrote: >> Hi, >> >> I've got the following bug report while fuzzing the >> kernel(master-f83246089ca) with syzkaller. >> >> program and config are attached. > > Hi! > > Thanks for the report! > > Adding kernel maintainers. > > I can confirm that we've hist this bug multiple times, but never been > able to reproduce it. > > I was able to reproduce it on 2ea659a9ef488125eb46da6eb571de5eae5c43f6 > (4.12-rc1). > > Using the attached syzkaller program I was able to generate C > reproducer, attached. Sometimes I need to run it a few times to > trigger the bug. > > @idaifish If you find more bugs please run ./scripts/get_maintainer.pl > to get the list of subsystem maintainers and add them to the > recipients. I've updated instructions of how to report kernel bugs > found with syzkaller in README. > > Thanks! > >> >> === >> kasan: GPF could be caused by NULL-ptr deref or user memory access >> general protection fault: [#1] SMP KASAN >> Dumping ftrace buffer: >>(ftrace buffer empty) >> Modules linked in: >> CPU: 2 PID: 21599 Comm: syz-executor3 Not tainted 4.11.0-rc8+ #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> Ubuntu-1.8.2-1ubuntu1 04/01/2014 >> task: 88006c16dec0 task.stack: 880058bb8000 >> RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline] >> RIP: 0010:compound_head include/linux/page-flags.h:146 [inline] >> RIP: 0010:put_page include/linux/mm.h:796 [inline] >> RIP: 0010:__skb_frag_unref include/linux/skbuff.h:2613 [inline] >> RIP: 0010:skb_release_data+0x201/0x3b0 net/core/skbuff.c:593 >> RSP: 0018:880058bbf570 EFLAGS: 00010a02 >> RAX: 11032b488bad1523 RBX: 88006c6e8ec8 RCX: c9000190e000 >> RDX: 11000d8dd1df RSI: 8293d7e3 RDI: 88195a445d68a919 >> RBP: 880058bbf5a8 R08: 7bdf27567b31597f R09: >> R10: 00d2 R11: 7dc9ab6dec891f24 R12: >> R13: dc00 R14: 88006a475940 R15: 88195a445d68a8f9 >> FS: 7fbbc6a34700() GS:88006e40() knlGS: >> CS: 0010 DS: ES: CR0: 80050033 >> CR2: 2004e000 CR3: 6b868000 CR4: 06e0 >> Call Trace: >> skb_release_all+0x4a/0x60 net/core/skbuff.c:669 >> __kfree_skb net/core/skbuff.c:683 [inline] >> kfree_skb+0x85/0x1b0 net/core/skbuff.c:704 >> __ip6_append_data.isra.42+0x26ed/0x33b0 net/ipv6/ip6_output.c:1519 >> ip6_append_data+0x1a8/0x2f0 net/ipv6/ip6_output.c:1633 >> udpv6_sendmsg+0x7bd/0x2360 net/ipv6/udp.c:1264 >> inet_sendmsg+0x123/0x3a0 net/ipv4/af_inet.c:762 >> sock_sendmsg_nosec net/socket.c:633 [inline] >> sock_sendmsg+0xca/0x110 net/socket.c:643 >> ___sys_sendmsg+0x79f/0x900 net/socket.c:1997 >> __sys_sendmsg+0xd1/0x170 net/socket.c:2031 >> SYSC_sendmsg net/socket.c:2042 [inline] >> SyS_sendmsg+0x2d/0x50 net/socket.c:2038 >> entry_SYSCALL_64_fastpath+0x1a/0xa9 >> RIP: 0033:0x44fb79 >> RSP: 002b:7fbbc6a33b58 EFLAGS: 0212 ORIG_RAX: 002e >> RAX: ffda RBX: 00718000 RCX: 0044fb79 >> RDX: RSI: 2000afc8 RDI: 0005 >> RBP: 03fb R08: R09: >> R10: R11: 0212 R12: 0005 >> R13: 20001000 R14: 00048000 R15: >> Code: 48 83 c0 03 48 c1 e0 04 48 01 d8 48 89 c2 48 c1 ea 03 42 80 3c 2a 00 >> 0f 85 92 01 00 00 4c 8b 38 49 8d 7f 20 48 89 f8 48 c1 e8 03 <42> 80 3c 28 00 >> 0f 85 6f 01 00 00 49 8b 47 20 a8 01 0f 84 3b ff >> RIP: __read_once_size include/linux/compiler.h:254 [inline] RSP: >> 880058bbf570 >> RIP: compound_head include/linux/page-flags.h:146 [inline] RSP: >> 880058bbf570 >> RIP: put_page include/linux/mm.h:796 [inline] RSP: 880058bbf570 >> RIP: __skb_frag_unref include/linux/skbuff.h:2613 [inline] RSP: >> 880058bbf570 >> RIP: skb_release_data+0x201/0x3b0 net/core/skbuff.c:593 RSP: >> 880058bbf570 >> ---[ end trace 46c9f72a66cd8627 ]--- >> Kernel panic - not syncing: Fatal exception >> Dumping ftrace buffer: >>(ftrace buffer empty) >> Kernel Offset: disabled >> Rebooting in 86400 seconds.. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "syzkaller" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to syzkaller+unsubscr...@googlegroups.com. >> For more options, visit https://groups.google.com/d/optout. Hi Andrey, please try following patch Thanks. patch2482 Description: Binary data
Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
On 5/19/17 7:21 AM, Edward Cree wrote: I'm currently translating the algos to C. But for the kernel patch, I'll need to read & understand the existing verifier code, so it might take a while :) (I don't suppose there's any design document or hacking-notes you could point me at?) Dave just gave a good overview of the verifier: https://www.spinics.net/lists/xdp-newbies/msg00185.html Few more details in Documentation/networking/filter.txt and in top comments in kernel/bpf/verifier.c General bpf arch overview: http://docs.cilium.io/en/latest/bpf/#instruction-set But I'll give it a go for sure. Thanks!
[PATCH v2 net] smsc95xx: Support only IPv4 TCP/UDP csum offload
From: Nisar SayedWhen TX checksum offload is used, if the computed checksum is 0 the LAN95xx device do not alter the checksum to 0x. In the case of ipv4 UDP checksum, it indicates to receiver that no checksum is calculated. Under ipv6, UDP checksum yields a result of zero must be changed to 0x. Hence disabling checksum offload for ipv6 packets. Signed-off-by: Nisar Sayed Reported-by: popcorn mix --- Changes V2 - Updated e-mail alias - Updated commit message and comments drivers/net/usb/smsc95xx.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index 765400b..2dfca96 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -681,7 +681,7 @@ static int smsc95xx_set_features(struct net_device *netdev, if (ret < 0) return ret; - if (features & NETIF_F_HW_CSUM) + if (features & NETIF_F_IP_CSUM) read_buf |= Tx_COE_EN_; else read_buf &= ~Tx_COE_EN_; @@ -1279,12 +1279,19 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf) spin_lock_init(>mac_cr_lock); + /* LAN95xx devices do not alter the computed checksum of 0 to 0x. +* RFC 2460, ipv6 UDP calculated checksum yields a result of zero must +* be changed to 0x. RFC 768, ipv4 UDP computed checksum is zero, +* it is transmitted as all ones. The zero transmitted checksum means +* transmitter generated no checksum. Hence, enable csum offload only +* for ipv4 packets. +*/ if (DEFAULT_TX_CSUM_ENABLE) - dev->net->features |= NETIF_F_HW_CSUM; + dev->net->features |= NETIF_F_IP_CSUM; if (DEFAULT_RX_CSUM_ENABLE) dev->net->features |= NETIF_F_RXCSUM; - dev->net->hw_features = NETIF_F_HW_CSUM | NETIF_F_RXCSUM; + dev->net->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM; smsc95xx_init_mac_address(dev); -- 1.9.1
Re: general protection fault in skb_release_data
On Fri, May 19, 2017 at 4:36 PM, 'Eric Dumazet' via syzkallerwrote: > On Fri, May 19, 2017 at 5:57 AM, Andrey Konovalov > wrote: >> On Fri, May 19, 2017 at 12:18 PM, wrote: >>> Hi, >>> >>> I've got the following bug report while fuzzing the >>> kernel(master-f83246089ca) with syzkaller. >>> >>> program and config are attached. >> >> Hi! >> >> Thanks for the report! >> >> Adding kernel maintainers. >> >> I can confirm that we've hist this bug multiple times, but never been >> able to reproduce it. >> >> I was able to reproduce it on 2ea659a9ef488125eb46da6eb571de5eae5c43f6 >> (4.12-rc1). >> >> Using the attached syzkaller program I was able to generate C >> reproducer, attached. Sometimes I need to run it a few times to >> trigger the bug. >> >> @idaifish If you find more bugs please run ./scripts/get_maintainer.pl >> to get the list of subsystem maintainers and add them to the >> recipients. I've updated instructions of how to report kernel bugs >> found with syzkaller in README. >> >> Thanks! >> >>> >>> === >>> kasan: GPF could be caused by NULL-ptr deref or user memory access >>> general protection fault: [#1] SMP KASAN >>> Dumping ftrace buffer: >>>(ftrace buffer empty) >>> Modules linked in: >>> CPU: 2 PID: 21599 Comm: syz-executor3 Not tainted 4.11.0-rc8+ #1 >>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >>> Ubuntu-1.8.2-1ubuntu1 04/01/2014 >>> task: 88006c16dec0 task.stack: 880058bb8000 >>> RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline] >>> RIP: 0010:compound_head include/linux/page-flags.h:146 [inline] >>> RIP: 0010:put_page include/linux/mm.h:796 [inline] >>> RIP: 0010:__skb_frag_unref include/linux/skbuff.h:2613 [inline] >>> RIP: 0010:skb_release_data+0x201/0x3b0 net/core/skbuff.c:593 >>> RSP: 0018:880058bbf570 EFLAGS: 00010a02 >>> RAX: 11032b488bad1523 RBX: 88006c6e8ec8 RCX: c9000190e000 >>> RDX: 11000d8dd1df RSI: 8293d7e3 RDI: 88195a445d68a919 >>> RBP: 880058bbf5a8 R08: 7bdf27567b31597f R09: >>> R10: 00d2 R11: 7dc9ab6dec891f24 R12: >>> R13: dc00 R14: 88006a475940 R15: 88195a445d68a8f9 >>> FS: 7fbbc6a34700() GS:88006e40() knlGS: >>> CS: 0010 DS: ES: CR0: 80050033 >>> CR2: 2004e000 CR3: 6b868000 CR4: 06e0 >>> Call Trace: >>> skb_release_all+0x4a/0x60 net/core/skbuff.c:669 >>> __kfree_skb net/core/skbuff.c:683 [inline] >>> kfree_skb+0x85/0x1b0 net/core/skbuff.c:704 >>> __ip6_append_data.isra.42+0x26ed/0x33b0 net/ipv6/ip6_output.c:1519 >>> ip6_append_data+0x1a8/0x2f0 net/ipv6/ip6_output.c:1633 >>> udpv6_sendmsg+0x7bd/0x2360 net/ipv6/udp.c:1264 >>> inet_sendmsg+0x123/0x3a0 net/ipv4/af_inet.c:762 >>> sock_sendmsg_nosec net/socket.c:633 [inline] >>> sock_sendmsg+0xca/0x110 net/socket.c:643 >>> ___sys_sendmsg+0x79f/0x900 net/socket.c:1997 >>> __sys_sendmsg+0xd1/0x170 net/socket.c:2031 >>> SYSC_sendmsg net/socket.c:2042 [inline] >>> SyS_sendmsg+0x2d/0x50 net/socket.c:2038 >>> entry_SYSCALL_64_fastpath+0x1a/0xa9 >>> RIP: 0033:0x44fb79 >>> RSP: 002b:7fbbc6a33b58 EFLAGS: 0212 ORIG_RAX: 002e >>> RAX: ffda RBX: 00718000 RCX: 0044fb79 >>> RDX: RSI: 2000afc8 RDI: 0005 >>> RBP: 03fb R08: R09: >>> R10: R11: 0212 R12: 0005 >>> R13: 20001000 R14: 00048000 R15: >>> Code: 48 83 c0 03 48 c1 e0 04 48 01 d8 48 89 c2 48 c1 ea 03 42 80 3c 2a 00 >>> 0f 85 92 01 00 00 4c 8b 38 49 8d 7f 20 48 89 f8 48 c1 e8 03 <42> 80 3c 28 00 >>> 0f 85 6f 01 00 00 49 8b 47 20 a8 01 0f 84 3b ff >>> RIP: __read_once_size include/linux/compiler.h:254 [inline] RSP: >>> 880058bbf570 >>> RIP: compound_head include/linux/page-flags.h:146 [inline] RSP: >>> 880058bbf570 >>> RIP: put_page include/linux/mm.h:796 [inline] RSP: 880058bbf570 >>> RIP: __skb_frag_unref include/linux/skbuff.h:2613 [inline] RSP: >>> 880058bbf570 >>> RIP: skb_release_data+0x201/0x3b0 net/core/skbuff.c:593 RSP: >>> 880058bbf570 >>> ---[ end trace 46c9f72a66cd8627 ]--- >>> Kernel panic - not syncing: Fatal exception >>> Dumping ftrace buffer: >>>(ftrace buffer empty) >>> Kernel Offset: disabled >>> Rebooting in 86400 seconds.. >>> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "syzkaller" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to syzkaller+unsubscr...@googlegroups.com. >>> For more options, visit https://groups.google.com/d/optout. > > Hi Andrey, please try following patch > Thanks. Hi Eric, Your patch fixes the bug for me. Thanks! Tested-by: Andrey Konovalov > > --
[PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers"), bridge would not start hello_timer if stp_enabled is not KERNEL_STP when br_dev_open. The problem is even if users set stp_enabled with KERNEL_STP later, the timer will still not be started. It causes that KERNEL_STP can not really work. Users have to re-ifup the bridge to avoid this. This patch is to fix it by starting br->hello_timer when enabling KERNEL_STP in br_stp_start. As an improvement, it's also to start hello_timer again only when br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is no reason to start the timer again when it's NO_STP. Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers") Reported-by: Haidong LiSigned-off-by: Xin Long --- net/bridge/br_stp_if.c| 1 + net/bridge/br_stp_timer.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index 08341d2..0db8102 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -179,6 +179,7 @@ static void br_stp_start(struct net_bridge *br) br_debug(br, "using kernel STP\n"); /* To start timers on any ports left in blocking */ + mod_timer(>hello_timer, jiffies + br->hello_time); br_port_state_selection(br); } diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c index c98b3e5..60b6fe2 100644 --- a/net/bridge/br_stp_timer.c +++ b/net/bridge/br_stp_timer.c @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); - if (br->stp_enabled != BR_USER_STP) + if (br->stp_enabled == BR_KERNEL_STP) mod_timer(>hello_timer, round_jiffies(jiffies + br->hello_time)); } -- 2.1.0
RE: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low
From: Bjørn Mork > Sent: 19 May 2017 14:56 ... > Unless someone has a nice way to just collect a list of skbs and have > them converted to proper framing on the fly when transmitting, without > having to care about USB packet boundaries. skb can be linked into arbitrary chains (or even trees), but I suspect the usbnet code would need to be taught about them. For XHCI it isn't too bad because it will do arbitrary scatter-gather (apart from the ring end). But I believe the earlier controllers only support fragments that end on usb packet boundaries. I looked at the usbnet code a few years ago, I'm sure it ought to be possible to shortcut most of the code that uses URB and directly write to the xhci (in particular) ring descriptors. For receive you probably want to feed the USB stack multiple (probably) 2k buffers, and handle the debatching into ethernet frames yourself. One of the ASIX drivers used to be particularly bad, it allocated 64k skb for receive (the hardware can merge IP packets) and then hacked the true_size before passing upstream. David
[PATCH v2] hdlcdrv: fix divide error bug if bitrate is 0
The divisor s->par.bitrate will always be 0 until initialized by ndo_open() and hdlcdrv_open(). In order to fix this divide zero error, check whether the netdevice was opened by ndo_open() before performing divide.And we also check the the value of bitrate in case of bad setting of it. Reported-by: Andrey KonovalovSigned-off-by: Firo Yang --- v1->v2: Change Reported-by to Andrey Konovalov. Send it to original report thread. v0->v1: Reviewed by walter harms . Return ENODEV instead of EPERM if !netif_running(dev) Check if s->par.bitrate > 0. drivers/net/hamradio/hdlcdrv.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c index 8c3633c..b0f417f 100644 --- a/drivers/net/hamradio/hdlcdrv.c +++ b/drivers/net/hamradio/hdlcdrv.c @@ -576,6 +576,10 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) case HDLCDRVCTL_CALIBRATE: if(!capable(CAP_SYS_RAWIO)) return -EPERM; + if (!netif_running(dev)) + return -ENODEV; + if (!(s->par.bitrate > 0)) + return -EINVAL; if (bi.data.calibrate > INT_MAX / s->par.bitrate) return -EINVAL; s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; -- 2.9.4
Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.
On 17/05/19 (金) 18:53, Vlad Yasevich wrote: On 05/19/2017 04:16 AM, Toshiaki Makita wrote: On 2017/05/19 16:09, Vlad Yasevich wrote: On 05/18/2017 10:13 PM, Toshiaki Makita wrote: On 2017/05/18 22:31, Vladislav Yasevich wrote: It appears that since commit 8cb65d000, Q-in-Q vlans have been broken. The series that commit is part of enabled TSO and checksum offloading on Q-in-Q vlans. However, most HW we support can't handle it. To work around the issue, the above commit added a function that turns off offloads on Q-in-Q devices, but it left the checksum offload. That will cause issues with most older devices that supprort very basic checksum offload capabilities as well as some newer devices (we've reproduced te problem with both be2net and bnx). To solve this for everyone, turn off checksum offloading feature by default when sending Q-in-Q traffic. Devices that are proven to work can provided a corrected ndo_features_check implemetation. Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers") CC: Toshiaki MakitaSigned-off-by: Vladislav Yasevich --- include/linux/if_vlan.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 8d5fcd6..ae537f0 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -619,7 +619,6 @@ static inline netdev_features_t vlan_features_check(const struct sk_buff *skb, NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | -NETIF_F_HW_CSUM | NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX); I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem is IP_CSUM and IPV6_CSUM. So wouldn't it be better to leave HW_CSUM and drop IP_CSUM/IPV6_CSUM, i.e. change intersection into bitwise AND? It wasn't really a problem before accelerations got enabled on q-in-q vlans. Right for stacked vlan device. But I think the check was there for packets from guests forwarded by bridge to vlan device so it was a problem before 8cb65d000. Not really, since stacked vlans in guests wouldn't have accelerations on. Haven't really tried a new guest on old hosts. It might be an issue there... It's real. I'm now remembering that I came across a similar issue before introducing 8cb65d000. The situation was that bridge (vlan_filtering) adds a vlan tag to a frame which is already tagged by guests, or by a vlan device on the top of the bridge (Note that virtio and bridge have HW_CSUM in vlan_features). I addressed the problem in drivers side since all the IP/IPV6_CSUM drivers I encountered the issue on are able to notify devices of IP header offset. Now I checked be2net driver's code and realized it doesn't provide IP offset so it makes sense to drop IP/IPV6_CSUM by default. Anyway, kernels before 8cb65d000 have that problem, not only after 8cb65d000. Toshiaki Makita
Re: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low
David Laightwrites: > From: linux-usb-ow...@vger.kernel.org > [mailto:linux-usb-ow...@vger.kernel.org] On Behalf Of Jim Baxter >> Sent: 16 May 2017 18:41 >> >> The CDC-NCM driver can require large amounts of memory to create >> skb's and this can be a problem when the memory becomes fragmented. >> >> This especially affects embedded systems that have constrained >> resources but wish to maximise the throughput of CDC-NCM with 16KiB >> NTB's. > > Why is this driver copying multiple tx messages into a single skb. Mostly becasue it already did that when I started messing with it, and I didn't know how to avoid that. > Surely there are better ways to do this?? But I have been there thinking this exact thought a couple of times. Suggestions are appreciated. > I think it is generating a 'multi-ethernet frame' URB with an > overall header for each URB and a header for each ethernet frame. With some negotiated alignment restrictions, and a linked list of frame pointer arrays. But yes, that is basically it. (it's not always ethernet - with MBIM it can be IP or arbitrary as well, but I don't think that makes any difference) > Given that the USB stack allows multiple concurrent transmits I'm > surprised that batching large ethernet frames makes much difference. Me too. Actually, I don't think it does. The protocol was developed with specific device restrictions in mind. These might be invalid today. There is no reason to believe that using simple CDC ECM framing (i.e. one ethernet frame per URB) is any problem. > Also the USB target can't actually tell when URB that contain > multiples of the USB packet size end. > So it is possible to send a single NTB as multiple URB. Nice idea! Never thought of that. Yes, the driver could use a number smaller buffers regardless of the NTB size, by abusing the fact that the device will see them as a contigious USB transfer as long as they fall on USB packet boundaries. Started thinking about how to do this in practice. It seemed obviously simply to jsut fire off the buffers as they fill up until the the max aggregation size or time has been exceeded. But the header makes this harder than necessary. It contains both a length and a pointer to the first frame pointer array (NDP). So we will have to decide the size of the NTB and where to put the first NDP before sending the first USB packet. This is possible if we always go for the pad-to-max strategy. We'll also have to make some assumptions about the size of the NDP(s) as we add them, but we already do that so I don't think it is a big deal. Might be the way to go. Unless someone has a nice way to just collect a list of skbs and have them converted to proper framing on the fly when transmitting, without having to care about USB packet boundaries. Bjørn
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On 5/19/17 5:48 PM, Florian Westphal wrote: Nikolay Aleksandrovwrote: On 5/19/17 5:20 PM, Xin Long wrote: Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers"), bridge would not start hello_timer if stp_enabled is not KERNEL_STP when br_dev_open. The problem is even if users set stp_enabled with KERNEL_STP later, the timer will still not be started. It causes that KERNEL_STP can not really work. Users have to re-ifup the bridge to avoid this. This patch is to fix it by starting br->hello_timer when enabling KERNEL_STP in br_stp_start. As an improvement, it's also to start hello_timer again only when br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is no reason to start the timer again when it's NO_STP. Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers") Reported-by: Haidong Li Signed-off-by: Xin Long --- net/bridge/br_stp_if.c| 1 + net/bridge/br_stp_timer.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) This doesn't make much sense to me, how do you change from USER_STP to KERNEL_STP without first going through NO_STP ? This is easily rerpoduceable via: ip link add vethin1 type veth peer name vethout1 ip link add vethin2 type veth peer name vethout2 ip link set vethin1 up ip link set vethin2 up ip link set vethout1 up ip link set vethout2 up brctl addbr br0 brctl addbr br1 brctl stp br0 on brctl stp br1 on I think this step with moving NO_STP -> KERNEL_STP should be last, then I can see how the timer won't be started. brctl addif br0 vethin1 brctl addif br0 vethin2 brctl addif br1 vethout1 brctl addif br1 vethout2 ip link set br0 up ip link set br1 up
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov: > On 5/19/17 5:51 PM, Ivan Vecera wrote: >> >> 2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov >> : >>> >>> On 5/19/17 5:20 PM, Xin Long wrote: Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers"), bridge would not start hello_timer if stp_enabled is not KERNEL_STP when br_dev_open. The problem is even if users set stp_enabled with KERNEL_STP later, the timer will still not be started. It causes that KERNEL_STP can not really work. Users have to re-ifup the bridge to avoid this. This patch is to fix it by starting br->hello_timer when enabling KERNEL_STP in br_stp_start. As an improvement, it's also to start hello_timer again only when br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is no reason to start the timer again when it's NO_STP. Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers") Reported-by: Haidong Li Signed-off-by: Xin Long --- net/bridge/br_stp_if.c| 1 + net/bridge/br_stp_timer.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) >>> >>> This doesn't make much sense to me, how do you change from USER_STP to >>> KERNEL_STP without first going through NO_STP ? >>> >>> If you go through NO_STP then all will be fine because br_stp_stop will >>> restart >>> the timers if the previous val was USER_STP. >>> >> The problem occurs when KERNEL_STP is enabled if the bridge itself is >> already >> up. Then the hello_timer is not started. If the hello and hold timers >> should run only >> when KERNEL_STP is used then there are another problematic places >> (will send follow-up). >> >> Ivan >> > > Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only > then, because you cannot do direct USER_STP -> KERNEL_STP. > No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as USER_STP->NO_STP: 1) NO_STP->KERNEL_STP issue hello_timer should be started in br_stp_start() - this patch 2) KERNEL_STP->NO_STP issue hello timer and hold timers should be stopped (deleted) in br_stp_stop() 3) USER_STP->NO_STP issue hello timer and hold timers should NOT be started in br_stp_stop() Ivan
[PATCH iproute2] ip: add handling for new CAN netlink interface
This patch adds handling for new CAN netlink interface introduced in 4.11 kernel: - IFLA_CAN_TERMINATION, - IFLA_CAN_TERMINATION_CONST, - IFLA_CAN_BITRATE_CONST, - IFLA_CAN_DATA_BITRATE_CONST Output example: $ip -d link show can0 6: can0:mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10 link/can promiscuity 0 can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0 bitrate 8 [ 2,3,5,8,8, 10, 125000, 15, 175000, 20, 225000, 25, 275000, 30, 50, 625000, 80, 100 ] termination 0 [ 0, 120 ] clock 0numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 Signed-off-by: Remigiusz Kołłątaj --- ip/iplink_can.c | 98 ++--- 1 file changed, 94 insertions(+), 4 deletions(-) diff --git a/ip/iplink_can.c b/ip/iplink_can.c index 20d4d37d..5df56b2b 100644 --- a/ip/iplink_can.c +++ b/ip/iplink_can.c @@ -41,6 +41,8 @@ static void print_usage(FILE *f) "\t[ restart-ms TIME-MS ]\n" "\t[ restart ]\n" "\n" + "\t[ termination { 0..65535 } ]\n" + "\n" "\tWhere: BITRATE := { 1..100 }\n" "\t SAMPLE-POINT := { 0.000..0.999 }\n" "\t TQ:= { NUMBER }\n" @@ -220,6 +222,14 @@ static int can_parse_opt(struct link_util *lu, int argc, char **argv, if (get_u32(, *argv, 0)) invarg("invalid \"restart-ms\" value\n", *argv); addattr32(n, 1024, IFLA_CAN_RESTART_MS, val); + } else if (matches(*argv, "termination") == 0) { + __u16 val; + + NEXT_ARG(); + if (get_u16(, *argv, 0)) + invarg("invalid \"termination\" value\n", + *argv); + addattr16(n, 1024, IFLA_CAN_TERMINATION, val); } else if (matches(*argv, "help") == 0) { usage(); return -1; @@ -282,7 +292,8 @@ static void can_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) fprintf(f, "restart-ms %d ", *restart_ms); } - if (tb[IFLA_CAN_BITTIMING]) { + /* bittiming is irrelevant if fixed bitrate is defined */ + if (tb[IFLA_CAN_BITTIMING] && !tb[IFLA_CAN_BITRATE_CONST]) { struct can_bittiming *bt = RTA_DATA(tb[IFLA_CAN_BITTIMING]); fprintf(f, "\nbitrate %d sample-point %.3f ", @@ -292,7 +303,8 @@ static void can_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) bt->sjw); } - if (tb[IFLA_CAN_BITTIMING_CONST]) { + /* bittiming const is irrelevant if fixed bitrate is defined */ + if (tb[IFLA_CAN_BITTIMING_CONST] && !tb[IFLA_CAN_BITRATE_CONST]) { struct can_bittiming_const *btc = RTA_DATA(tb[IFLA_CAN_BITTIMING_CONST]); @@ -303,7 +315,37 @@ static void can_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) btc->brp_min, btc->brp_max, btc->brp_inc); } - if (tb[IFLA_CAN_DATA_BITTIMING]) { + if (tb[IFLA_CAN_BITRATE_CONST]) { + __u32 *bitrate_const = RTA_DATA(tb[IFLA_CAN_BITRATE_CONST]); + int bitrate_cnt = RTA_PAYLOAD(tb[IFLA_CAN_BITRATE_CONST]) / + sizeof(*bitrate_const); + int i; + __u32 bitrate = 0; + + if (tb[IFLA_CAN_BITTIMING]) { + struct can_bittiming *bt = + RTA_DATA(tb[IFLA_CAN_BITTIMING]); + bitrate = bt->bitrate; + } + + fprintf(f, "\nbitrate %u", bitrate); + fprintf(f, "\n ["); + + for (i = 0; i < bitrate_cnt - 1; ++i) { + /* This will keep lines below 80 signs */ + if (!(i % 6) && i) + fprintf(f, "\n"); + + fprintf(f, "%8u, ", bitrate_const[i]); + } + + if (!(i % 6) && i) + fprintf(f, "\n"); + fprintf(f, "%8u ]", bitrate_const[i]); + } + + /* data bittiming is irrelevant if fixed bitrate is defined */ + if (tb[IFLA_CAN_DATA_BITTIMING] && !tb[IFLA_CAN_DATA_BITRATE_CONST]) { struct can_bittiming *dbt = RTA_DATA(tb[IFLA_CAN_DATA_BITTIMING]); @@ -315,7 +357,9 @@ static void can_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) dbt->phase_seg2, dbt->sjw); } - if
Good day my dear, please i need your assistant
How are you doing today, Hope you are in perfect condition? Please i apologies with you to exercise a little patience and read through my letter i am contacting you for obvious reason and important business. My Name is Aisha Gaddafi,i am open minded person and easy going person romance and hard working, Please i really need to have a good relationship with you, i have some important discussion to discuss with you.There is information i would like you to keep very confidential, i will appreciate if you can reply me through my Private email address (aishamuammar...@yahoo.com) for further discussion, I WILL BE VERY HAPPY IF YOU ACCEPT ME AS YOUR FRIEND,Thanks and God bless, this is just brief about me,hope to review the secret inner most part of me once i received from you. Aisha Gaddafi.
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
Nikolay Aleksandrovwrote: > On 5/19/17 5:20 PM, Xin Long wrote: > >Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop > >kernel hello and hold timers"), bridge would not start hello_timer if > >stp_enabled is not KERNEL_STP when br_dev_open. > > > >The problem is even if users set stp_enabled with KERNEL_STP later, > >the timer will still not be started. It causes that KERNEL_STP can > >not really work. Users have to re-ifup the bridge to avoid this. > > > >This patch is to fix it by starting br->hello_timer when enabling > >KERNEL_STP in br_stp_start. > > > >As an improvement, it's also to start hello_timer again only when > >br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is > >no reason to start the timer again when it's NO_STP. > > > >Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel > >hello and hold timers") > >Reported-by: Haidong Li > >Signed-off-by: Xin Long > >--- > > net/bridge/br_stp_if.c| 1 + > > net/bridge/br_stp_timer.c | 2 +- > > 2 files changed, 2 insertions(+), 1 deletion(-) > > > > This doesn't make much sense to me, how do you change from USER_STP to > KERNEL_STP without first going through NO_STP ? This is easily rerpoduceable via: ip link add vethin1 type veth peer name vethout1 ip link add vethin2 type veth peer name vethout2 ip link set vethin1 up ip link set vethin2 up ip link set vethout1 up ip link set vethout2 up brctl addbr br0 brctl addbr br1 brctl stp br0 on brctl stp br1 on brctl addif br0 vethin1 brctl addif br0 vethin2 brctl addif br1 vethout1 brctl addif br1 vethout2 ip link set br0 up ip link set br1 up
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov: > On 5/19/17 5:20 PM, Xin Long wrote: >> >> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop >> kernel hello and hold timers"), bridge would not start hello_timer if >> stp_enabled is not KERNEL_STP when br_dev_open. >> >> The problem is even if users set stp_enabled with KERNEL_STP later, >> the timer will still not be started. It causes that KERNEL_STP can >> not really work. Users have to re-ifup the bridge to avoid this. >> >> This patch is to fix it by starting br->hello_timer when enabling >> KERNEL_STP in br_stp_start. >> >> As an improvement, it's also to start hello_timer again only when >> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is >> no reason to start the timer again when it's NO_STP. >> >> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel >> hello and hold timers") >> Reported-by: Haidong Li >> Signed-off-by: Xin Long >> --- >> net/bridge/br_stp_if.c| 1 + >> net/bridge/br_stp_timer.c | 2 +- >> 2 files changed, 2 insertions(+), 1 deletion(-) >> > > This doesn't make much sense to me, how do you change from USER_STP to > KERNEL_STP without first going through NO_STP ? > > If you go through NO_STP then all will be fine because br_stp_stop will > restart > the timers if the previous val was USER_STP. > The problem occurs when KERNEL_STP is enabled if the bridge itself is already up. Then the hello_timer is not started. If the hello and hold timers should run only when KERNEL_STP is used then there are another problematic places (will send follow-up). Ivan
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On 5/19/17 6:03 PM, Ivan Vecera wrote: 2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov: On 5/19/17 5:51 PM, Ivan Vecera wrote: 2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov : On 5/19/17 5:20 PM, Xin Long wrote: Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers"), bridge would not start hello_timer if stp_enabled is not KERNEL_STP when br_dev_open. The problem is even if users set stp_enabled with KERNEL_STP later, the timer will still not be started. It causes that KERNEL_STP can not really work. Users have to re-ifup the bridge to avoid this. This patch is to fix it by starting br->hello_timer when enabling KERNEL_STP in br_stp_start. As an improvement, it's also to start hello_timer again only when br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is no reason to start the timer again when it's NO_STP. Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers") Reported-by: Haidong Li Signed-off-by: Xin Long --- net/bridge/br_stp_if.c| 1 + net/bridge/br_stp_timer.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) This doesn't make much sense to me, how do you change from USER_STP to KERNEL_STP without first going through NO_STP ? If you go through NO_STP then all will be fine because br_stp_stop will restart the timers if the previous val was USER_STP. The problem occurs when KERNEL_STP is enabled if the bridge itself is already up. Then the hello_timer is not started. If the hello and hold timers should run only when KERNEL_STP is used then there are another problematic places (will send follow-up). Ivan Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only then, because you cannot do direct USER_STP -> KERNEL_STP. No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as USER_STP->NO_STP: 1) NO_STP->KERNEL_STP issue hello_timer should be started in br_stp_start() - this patch Right, I was talking only about this patch. By the way what about the port hold_timers ? This patch only starts the hello_timer. 2) KERNEL_STP->NO_STP issue hello timer and hold timers should be stopped (deleted) in br_stp_stop() 3) USER_STP->NO_STP issue hello timer and hold timers should NOT be started in br_stp_stop() Yep, ack. Ivan
Re: [PATCH net 3/3] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
On 2017年05月18日 21:31, Vladislav Yasevich wrote: Since virtio does not provide it's own ndo_features_check handler, TSO, and now checksum offload, are disabled for stacked vlans. Re-enable the support and let the host take care of it. This restores/improves Guest-to-Guest performance over Q-in-Q vlans. CC: "Michael S. Tsirkin"CC: Jason Wang CC: virtualizat...@lists.linux-foundation.org Signed-off-by: Vladislav Yasevich --- drivers/net/virtio_net.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 8324a5e..341fb96 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2028,6 +2028,7 @@ static const struct net_device_ops virtnet_netdev = { .ndo_poll_controller = virtnet_netpoll, #endif .ndo_xdp= virtnet_xdp, + .ndo_features_check = passthru_features_check, }; static void virtnet_config_changed_work(struct work_struct *work) Acked-by: Jason Wang
Re: [PATCH v4 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping
On Fri, May 19, 2017 at 6:00 AM, Miroslav Lichvarwrote: > On Thu, May 18, 2017 at 04:16:26PM -0400, Willem de Bruijn wrote: >> On Thu, May 18, 2017 at 9:06 AM, Miroslav Lichvar >> wrote: >> > +/* On transmit, software and hardware timestamps are returned >> > independently. >> > + * As the two skb clones share the hardware timestamp, which may be >> > updated >> > + * before the software timestamp is received, a hardware TX timestamp may >> > be >> > + * returned only if there is no software TX timestamp. A false software >> > + * timestamp made for SOCK_RCVTSTAMP when a real timestamp is missing must >> > + * be ignored. >> >> Please expand on why this case can be ignored. It is quite subtle. How about >> something like >> >> * >> * A false software timestamp is one made inside the __sock_recv_timestamp >> * call itself. These are generated whenever SO_TIMESTAMP(NS) is enabled >> * on the socket, even when the timestamp reported is for another option, such >> * as hardware tx timestamp. >> * >> * Ignore these when deciding whether a timestamp source is hw or sw. >> */ > > That seems a bit too verbose to me. :) Would the following work? > > /* On transmit, software and hardware timestamps are returned independently. > * As the two skb clones share the hardware timestamp, which may be updated > * before the software timestamp is received, a hardware TX timestamp may be > * returned only if there is no software TX timestamp. Ignore false software > * timestamps, which may be made in the __sock_recv_timestamp() call when the > * option SO_TIMESTAMP(NS) is enabled on the socket, even when the skb has a > * hardware timestamp. > */ Looks great, thanks. > >> > +static bool skb_is_swtx_tstamp(const struct sk_buff *skb, >> > + const struct sock *sk, int false_tstamp) >> > +{ >> > + if (false_tstamp && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) >> >> Also, why is it ignored only for the new mode? > > Good point. That should not be there. The function can be now reduced > to a single line again. I originally tried a different approach, > disabling false timestamps in the new mode, but then I thought it's > better to not complicate it unnecessarily and keep it consistent. > > -- > Miroslav Lichvar
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
2017-05-19 17:05 GMT+02:00 Nikolay Aleksandrov: > On 5/19/17 6:03 PM, Ivan Vecera wrote: >> >> 2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov >> : >>> >>> On 5/19/17 5:51 PM, Ivan Vecera wrote: 2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov : > > > On 5/19/17 5:20 PM, Xin Long wrote: >> >> >> >> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop >> kernel hello and hold timers"), bridge would not start hello_timer if >> stp_enabled is not KERNEL_STP when br_dev_open. >> >> The problem is even if users set stp_enabled with KERNEL_STP later, >> the timer will still not be started. It causes that KERNEL_STP can >> not really work. Users have to re-ifup the bridge to avoid this. >> >> This patch is to fix it by starting br->hello_timer when enabling >> KERNEL_STP in br_stp_start. >> >> As an improvement, it's also to start hello_timer again only when >> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is >> no reason to start the timer again when it's NO_STP. >> >> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop >> kernel >> hello and hold timers") >> Reported-by: Haidong Li >> Signed-off-by: Xin Long >> --- >> net/bridge/br_stp_if.c| 1 + >> net/bridge/br_stp_timer.c | 2 +- >> 2 files changed, 2 insertions(+), 1 deletion(-) >> > > This doesn't make much sense to me, how do you change from USER_STP to > KERNEL_STP without first going through NO_STP ? > > If you go through NO_STP then all will be fine because br_stp_stop will > restart > the timers if the previous val was USER_STP. > The problem occurs when KERNEL_STP is enabled if the bridge itself is already up. Then the hello_timer is not started. If the hello and hold timers should run only when KERNEL_STP is used then there are another problematic places (will send follow-up). Ivan >>> >>> Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only >>> then, because you cannot do direct USER_STP -> KERNEL_STP. >>> >> No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as >> USER_STP->NO_STP: >> >> 1) NO_STP->KERNEL_STP issue >> hello_timer should be started in br_stp_start() - this patch >> > > Right, I was talking only about this patch. By the way what about > the port hold_timers ? This patch only starts the hello_timer. The hold_timers should be started indirectly from br_transmit_config() called from br_config_bpdu_generation() called from hello_timer_expired() handler. I. >> 2) KERNEL_STP->NO_STP issue >> hello timer and hold timers should be stopped (deleted) in br_stp_stop() >> >> 3) USER_STP->NO_STP issue >> hello timer and hold timers should NOT be started in br_stp_stop() >> > > Yep, ack. > >> Ivan >> >
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
2017-05-19 16:20 GMT+02:00 Xin Long: > Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop > kernel hello and hold timers"), bridge would not start hello_timer if > stp_enabled is not KERNEL_STP when br_dev_open. > > The problem is even if users set stp_enabled with KERNEL_STP later, > the timer will still not be started. It causes that KERNEL_STP can > not really work. Users have to re-ifup the bridge to avoid this. > > This patch is to fix it by starting br->hello_timer when enabling > KERNEL_STP in br_stp_start. > > As an improvement, it's also to start hello_timer again only when > br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is > no reason to start the timer again when it's NO_STP. > > Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello > and hold timers") > Reported-by: Haidong Li > Signed-off-by: Xin Long > --- > net/bridge/br_stp_if.c| 1 + > net/bridge/br_stp_timer.c | 2 +- > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c > index 08341d2..0db8102 100644 > --- a/net/bridge/br_stp_if.c > +++ b/net/bridge/br_stp_if.c > @@ -179,6 +179,7 @@ static void br_stp_start(struct net_bridge *br) > br_debug(br, "using kernel STP\n"); > > /* To start timers on any ports left in blocking */ > + mod_timer(>hello_timer, jiffies + br->hello_time); > br_port_state_selection(br); > } > > diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c > index c98b3e5..60b6fe2 100644 > --- a/net/bridge/br_stp_timer.c > +++ b/net/bridge/br_stp_timer.c > @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) > if (br->dev->flags & IFF_UP) { > br_config_bpdu_generation(br); > > - if (br->stp_enabled != BR_USER_STP) > + if (br->stp_enabled == BR_KERNEL_STP) > mod_timer(>hello_timer, > round_jiffies(jiffies + br->hello_time)); > } > -- > 2.1.0 > Reviewed-by: Ivan Vecera
[PATCH net] sctp: fix ICMP processing if skb is non-linear
when the ICMP packet is carried by a paged skb, sctp_err_lookup() may fail validation even if the payload contents match an open socket: as a consequence, sometimes ICMPs are wrongly ignored. Use skb_header_pointer() to retrieve encapsulated SCTP headers, to ensure that ICMP payloads are validated correctly, also when skbs are non-linear. Signed-off-by: Davide Caratti--- include/net/sctp/sctp.h | 2 +- net/sctp/input.c| 29 +++-- net/sctp/ipv6.c | 2 +- 3 files changed, 21 insertions(+), 12 deletions(-) diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index 069582e..1b8c16b 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -152,7 +152,7 @@ void sctp_v4_err(struct sk_buff *skb, u32 info); void sctp_hash_endpoint(struct sctp_endpoint *); void sctp_unhash_endpoint(struct sctp_endpoint *); struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *, -struct sctphdr *, struct sctp_association **, +struct sctp_association **, struct sctp_transport **); void sctp_err_finish(struct sock *, struct sctp_transport *); void sctp_icmp_frag_needed(struct sock *, struct sctp_association *, diff --git a/net/sctp/input.c b/net/sctp/input.c index 0e06a27..7f3f983 100644 --- a/net/sctp/input.c +++ b/net/sctp/input.c @@ -469,19 +469,19 @@ void sctp_icmp_proto_unreachable(struct sock *sk, /* Common lookup code for icmp/icmpv6 error handler. */ struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *skb, -struct sctphdr *sctphdr, struct sctp_association **app, struct sctp_transport **tpp) { + struct sctp_init_chunk _chunkhdr, *chunkhdr; + struct sctphdr _sctphdr, *sctphdr; union sctp_addr saddr; union sctp_addr daddr; struct sctp_af *af; struct sock *sk = NULL; struct sctp_association *asoc; struct sctp_transport *transport = NULL; - struct sctp_init_chunk *chunkhdr; - __u32 vtag = ntohl(sctphdr->vtag); - int len = skb->len - ((void *)sctphdr - (void *)skb->data); + int offset; + __u32 vtag; *app = NULL; *tpp = NULL; @@ -515,14 +515,23 @@ struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *skb, * or the chunk type or the Initiate Tag does not match, silently * discard the packet. */ + offset = skb_transport_offset(skb); + sctphdr = skb_header_pointer(skb, offset, sizeof(_sctphdr), &_sctphdr); + if (unlikely(!sctphdr)) + goto out; + + vtag = ntohl(sctphdr->vtag); if (vtag == 0) { - chunkhdr = (void *)sctphdr + sizeof(struct sctphdr); - if (len < sizeof(struct sctphdr) + sizeof(sctp_chunkhdr_t) - + sizeof(__be32) || + offset += sizeof(_sctphdr); + /* chunk header + first 4 octects of init header */ + chunkhdr = skb_header_pointer(skb, offset, + sizeof(struct sctp_chunkhdr) + + sizeof(__be32), &_chunkhdr); + if (!chunkhdr || chunkhdr->chunk_hdr.type != SCTP_CID_INIT || - ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag) { + ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag) goto out; - } + } else if (vtag != asoc->c.peer_vtag) { goto out; } @@ -585,7 +594,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info) savesctp = skb->transport_header; skb_reset_network_header(skb); skb_set_transport_header(skb, ihlen); - sk = sctp_err_lookup(net, AF_INET, skb, sctp_hdr(skb), , ); + sk = sctp_err_lookup(net, AF_INET, skb, , ); /* Put back, the original values. */ skb->network_header = saveip; skb->transport_header = savesctp; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 142b70e..d72c8d5 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -157,7 +157,7 @@ static void sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, savesctp = skb->transport_header; skb_reset_network_header(skb); skb_set_transport_header(skb, offset); - sk = sctp_err_lookup(net, AF_INET6, skb, sctp_hdr(skb), , ); + sk = sctp_err_lookup(net, AF_INET6, skb, , ); /* Put back, the original pointers. */ skb->network_header = saveip; skb->transport_header = savesctp; -- 2.7.4
Re: [RFC net-next PATCH 2/5] mlx5: fix bug reading rss_hash_type from CQE
On 05/18/2017 05:41 PM, Jesper Dangaard Brouer wrote: Masks for extracting part of the Completion Queue Entry (CQE) field rss_hash_type was swapped, namely CQE_RSS_HTYPE_IP and CQE_RSS_HTYPE_L4. The bug resulted in setting skb->l4_hash, even-though the rss_hash_type indicated that hash was NOT computed over the L4 (UDP or TCP) part of the packet. Added comments from the datasheet, to make it more clear what these masks are selecting. Signed-off-by: Jesper Dangaard Brouer--- Stand-alone fix for -net tree? include/linux/mlx5/device.h | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index dd9a263ed368..a940ec6a046c 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -787,8 +787,14 @@ enum { }; enum { - CQE_RSS_HTYPE_IP= 0x3 << 6, - CQE_RSS_HTYPE_L4= 0x3 << 2, + CQE_RSS_HTYPE_IP= 0x3 << 2, + /* cqe->rss_hash_type[3:2] - IP destination selected for hash +* (00 = none, 01 = IPv4, 10 = IPv6, 11 = Reserved) +*/ + CQE_RSS_HTYPE_L4= 0x3 << 6, + /* cqe->rss_hash_type[7:6] - L4 destination selected for hash +* (00 = none, 01 = TCP. 10 = UDP, 11 = IPSEC.SPI +*/ }; enum {
[PATCH v6 net-next 7/7] net: ethernet: update drivers to make both SW and HW TX timestamps
Some drivers were calling the skb_tx_timestamp() function only when a hardware timestamp was not requested. Now that applications can use the SOF_TIMESTAMPING_OPT_TX_SWHW option to request both software and hardware timestamps, the drivers need to be modified to unconditionally call skb_tx_timestamp(). CC: Richard CochranCC: Willem de Bruijn Signed-off-by: Miroslav Lichvar --- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 3 +-- drivers/net/ethernet/intel/e1000e/netdev.c| 4 ++-- drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 3 +-- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 ++ 4 files changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c index 89b21d7..5a2ad9c 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -1391,8 +1391,7 @@ static void xgbe_prep_tx_tstamp(struct xgbe_prv_data *pdata, spin_unlock_irqrestore(>tstamp_lock, flags); } - if (!XGMAC_GET_BITS(packet->attributes, TX_PACKET_ATTRIBUTES, PTP)) - skb_tx_timestamp(skb); + skb_tx_timestamp(skb); } static void xgbe_prep_vlan(struct sk_buff *skb, struct xgbe_packet_data *packet) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 0ff9295..6ed3bc4 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -5868,10 +5868,10 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb, adapter->tx_hwtstamp_skb = skb_get(skb); adapter->tx_hwtstamp_start = jiffies; schedule_work(>tx_hwtstamp_work); - } else { - skb_tx_timestamp(skb); } + skb_tx_timestamp(skb); + netdev_sent_queue(netdev, skb->len); e1000_tx_queue(tx_ring, tx_flags, count); /* Make sure there is space in the ring for the next send. */ diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c index 1e59435..89831ad 100644 --- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c +++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c @@ -1418,8 +1418,7 @@ static netdev_tx_t sxgbe_xmit(struct sk_buff *skb, struct net_device *dev) priv->hw->desc->tx_enable_tstamp(first_desc); } - if (!tqueue->hwts_tx_en) - skb_tx_timestamp(skb); + skb_tx_timestamp(skb); priv->hw->dma->enable_dma_transmission(priv->ioaddr, txq_index); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index cce862b..27c12e7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2880,8 +2880,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) priv->xstats.tx_set_ic_bit++; } - if (!priv->hwts_tx_en) - skb_tx_timestamp(skb); + skb_tx_timestamp(skb); if (unlikely((skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) && priv->hwts_tx_en)) { @@ -3084,8 +3083,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) priv->xstats.tx_set_ic_bit++; } - if (!priv->hwts_tx_en) - skb_tx_timestamp(skb); + skb_tx_timestamp(skb); /* Ready to fill the first descriptor and set the OWN bit w/o any * problems because all the descriptors are actually ready to be -- 2.9.3
[PATCH v6 net-next 5/7] net: fix documentation of struct scm_timestamping
The scm_timestamping struct may return multiple non-zero fields, e.g. when both software and hardware RX timestamping is enabled, or when the SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false software timestamp is generated in the recvmsg() call in order to always return a SCM_TIMESTAMP(NS) message. CC: Richard CochranCC: Willem de Bruijn Signed-off-by: Miroslav Lichvar --- Documentation/networking/timestamping.txt | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index ce11e3a..50eb0e5 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -322,7 +322,7 @@ struct scm_timestamping { }; The structure can return up to three timestamps. This is a legacy -feature. Only one field is non-zero at any time. Most timestamps +feature. At least one field is non-zero at any time. Most timestamps are passed in ts[0]. Hardware timestamps are passed in ts[2]. ts[1] used to hold hardware timestamps converted to system time. @@ -331,6 +331,12 @@ a HW PTP clock source, to allow time conversion in userspace and optionally synchronize system time with a userspace PTP stack such as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt. +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false +software timestamp will be generated in the recvmsg() call and passed +in ts[0] when a real software timestamp is missing. This happens also +on hardware transmit timestamps. + 2.1.1 Transmit timestamps with MSG_ERRQUEUE For transmit timestamps the outgoing packet is looped back to the -- 2.9.3
[PATCH v6 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping
Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to be looped to the socket's error queue with a software timestamp even when a hardware transmit timestamp is expected to be provided by the driver. Applications using this option will receive two separate messages from the error queue, one with a software timestamp and the other with a hardware timestamp. As the hardware timestamp is saved to the shared skb info, which may happen before the first message with software timestamp is received by the application, the hardware timestamp is copied to the SCM_TIMESTAMPING control message only when the skb has no software timestamp or it is an incoming packet. While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as there are no other users. CC: Richard CochranCC: Willem de Bruijn Signed-off-by: Miroslav Lichvar --- Documentation/networking/timestamping.txt | 8 include/linux/skbuff.h| 10 ++ include/uapi/linux/net_tstamp.h | 3 ++- net/core/skbuff.c | 4 net/socket.c | 20 ++-- 5 files changed, 34 insertions(+), 11 deletions(-) diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 50eb0e5..196ba17 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -203,6 +203,14 @@ SOF_TIMESTAMPING_OPT_PKTINFO: enabled and the driver is using NAPI. The struct contains also two other fields, but they are reserved and undefined. +SOF_TIMESTAMPING_OPT_TX_SWHW: + + Request both hardware and software timestamps for outgoing packets + when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE + are enabled at the same time. If both timestamps are generated, + two separate messages will be looped to the socket's error queue, + each containing just one timestamp. + New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate regardless of the setting of sysctl net.core.tstamp_allow_data. diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 1f8028c..3b2e284 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3254,13 +3254,6 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, void skb_tstamp_tx(struct sk_buff *orig_skb, struct skb_shared_hwtstamps *hwtstamps); -static inline void sw_tx_timestamp(struct sk_buff *skb) -{ - if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP && - !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)) - skb_tstamp_tx(skb, NULL); -} - /** * skb_tx_timestamp() - Driver hook for transmit timestamping * @@ -3276,7 +3269,8 @@ static inline void sw_tx_timestamp(struct sk_buff *skb) static inline void skb_tx_timestamp(struct sk_buff *skb) { skb_clone_tx_timestamp(skb); - sw_tx_timestamp(skb); + if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP) + skb_tstamp_tx(skb, NULL); } /** diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index dee74d3..3d421d9 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -28,8 +28,9 @@ enum { SOF_TIMESTAMPING_OPT_TSONLY = (1<<11), SOF_TIMESTAMPING_OPT_STATS = (1<<12), SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13), + SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14), - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO, + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TX_SWHW, SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) | SOF_TIMESTAMPING_LAST }; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 346d3e8..68c02df 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3875,6 +3875,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, if (!sk) return; + if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) && + skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS) + return; + tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY; if (!skb_may_tx_timestamp(sk, tsonly)) return; diff --git a/net/socket.c b/net/socket.c index 67db7d8..cb355a7 100644 --- a/net/socket.c +++ b/net/socket.c @@ -662,6 +662,19 @@ static bool skb_is_err_queue(const struct sk_buff *skb) return skb->pkt_type == PACKET_OUTGOING; } +/* On transmit, software and hardware timestamps are returned independently. + * As the two skb clones share the hardware timestamp, which may be updated + * before the software timestamp is received, a hardware TX timestamp may be + * returned only if there is no software TX timestamp. Ignore false software + * timestamps, which may be made in the __sock_recv_timestamp() call when the + * option
Re: [PATCH net] bridge: fix hello and hold timers starting/stopping
On Fri, 19 May 2017 18:25:43 +0200 Ivan Vecerawrote: > Current bridge code incorrectly handles starting/stopping of hello and > hold timers during STP enable/disable. > > 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP >transition. This is not correct as the timers are stopped in NO_STP >case. > > 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition. >This is not also correct as the timers should be stopped in NO_STP >state. > > 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP >transition. They should be stopped as they are running in KERNEL_STP >state and should not run in NO_STP case. > > The patch is a follow-up for "bridge: start hello_timer when enabling > KERNEL_STP in br_stp_start" patch from Xin Long. > > Cc: da...@davemloft.net > Cc: sas...@cumulusnetworks.com > Cc: step...@networkplumber.org > Cc: bri...@lists.linux-foundation.org > Cc: lucien@gmail.com > Cc: niko...@cumulusnetworks.com > Signed-off-by: Ivan Vecera Overall, this looks correct but the wording of commit message is too terse. It would be better to add a more complete description of the impact of this from a user's point of view. I am concerned that this might have other side effects. For example, what is the sequence of commands to validated this. What is the impact, should this go to stable?
[PATCH net-next 7/9] fou: make local function static
The build header functions are not used by any other code. net/ipv6/fou6.c:36:5: warning: no previous prototype for ‘fou6_build_header’ [-Wmissing-prototypes] net/ipv6/fou6.c:54:5: warning: no previous prototype for ‘gue6_build_header’ [-Wmissing-prototypes] Need to do some code rearranging to satisfy different Kconfig possiblities. Signed-off-by: Stephen Hemminger--- net/ipv4/fou.c | 82 - net/ipv6/fou6.c | 14 +- 2 files changed, 47 insertions(+), 49 deletions(-) diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c index 805f6607f8d9..8e0257d01200 100644 --- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -859,25 +860,6 @@ size_t gue_encap_hlen(struct ip_tunnel_encap *e) } EXPORT_SYMBOL(gue_encap_hlen); -static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e, - struct flowi4 *fl4, u8 *protocol, __be16 sport) -{ - struct udphdr *uh; - - skb_push(skb, sizeof(struct udphdr)); - skb_reset_transport_header(skb); - - uh = udp_hdr(skb); - - uh->dest = e->dport; - uh->source = sport; - uh->len = htons(skb->len); - udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb, -fl4->saddr, fl4->daddr, skb->len); - - *protocol = IPPROTO_UDP; -} - int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, u8 *protocol, __be16 *sport, int type) { @@ -894,24 +876,6 @@ int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, } EXPORT_SYMBOL(__fou_build_header); -int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, -u8 *protocol, struct flowi4 *fl4) -{ - int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM : - SKB_GSO_UDP_TUNNEL; - __be16 sport; - int err; - - err = __fou_build_header(skb, e, protocol, , type); - if (err) - return err; - - fou_build_udp(skb, e, fl4, protocol, sport); - - return 0; -} -EXPORT_SYMBOL(fou_build_header); - int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, u8 *protocol, __be16 *sport, int type) { @@ -985,8 +949,46 @@ int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, } EXPORT_SYMBOL(__gue_build_header); -int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, -u8 *protocol, struct flowi4 *fl4) +#ifdef CONFIG_NET_FOU_IP_TUNNELS + +static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e, + struct flowi4 *fl4, u8 *protocol, __be16 sport) +{ + struct udphdr *uh; + + skb_push(skb, sizeof(struct udphdr)); + skb_reset_transport_header(skb); + + uh = udp_hdr(skb); + + uh->dest = e->dport; + uh->source = sport; + uh->len = htons(skb->len); + udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb, +fl4->saddr, fl4->daddr, skb->len); + + *protocol = IPPROTO_UDP; +} + +static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, + u8 *protocol, struct flowi4 *fl4) +{ + int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM : + SKB_GSO_UDP_TUNNEL; + __be16 sport; + int err; + + err = __fou_build_header(skb, e, protocol, , type); + if (err) + return err; + + fou_build_udp(skb, e, fl4, protocol, sport); + + return 0; +} + +static int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, + u8 *protocol, struct flowi4 *fl4) { int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL; @@ -1001,9 +1003,7 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, return 0; } -EXPORT_SYMBOL(gue_build_header); -#ifdef CONFIG_NET_FOU_IP_TUNNELS static const struct ip_tunnel_encap_ops fou_iptun_ops = { .encap_hlen = fou_encap_hlen, diff --git a/net/ipv6/fou6.c b/net/ipv6/fou6.c index 9ea249b9451e..6de3c04b0f30 100644 --- a/net/ipv6/fou6.c +++ b/net/ipv6/fou6.c @@ -14,6 +14,8 @@ #include #include +#if IS_ENABLED(CONFIG_IPV6_FOU_TUNNEL) + static void fou6_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e, struct flowi6 *fl6, u8 *protocol, __be16 sport) { @@ -33,8 +35,8 @@ static void fou6_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e, *protocol = IPPROTO_UDP; } -int fou6_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, - u8 *protocol, struct flowi6 *fl6) +static int
[PATCH net-next 1/9] dcb: enforce minimum length on IEEE_APPS attribute
Found by reviewing the warning about unused policy table. The code implies that it meant to check for size, but since it unrolled the loop for attribute validation that is never used. Instead do explicit check for attribute. Compile tested only. Needs review by original author. Signed-off-by: Stephen Hemminger--- net/dcb/dcbnl.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c index 93106120f987..733f523707ac 100644 --- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -178,10 +178,6 @@ static const struct nla_policy dcbnl_ieee_policy[DCB_ATTR_IEEE_MAX + 1] = { [DCB_ATTR_IEEE_QCN_STATS] = {.len = sizeof(struct ieee_qcn_stats)}, }; -static const struct nla_policy dcbnl_ieee_app[DCB_ATTR_IEEE_APP_MAX + 1] = { - [DCB_ATTR_IEEE_APP] = {.len = sizeof(struct dcb_app)}, -}; - /* DCB number of traffic classes nested attributes. */ static const struct nla_policy dcbnl_featcfg_nest[DCB_FEATCFG_ATTR_MAX + 1] = { [DCB_FEATCFG_ATTR_ALL] = {.type = NLA_FLAG}, @@ -1463,8 +1459,15 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh, nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) { struct dcb_app *app_data; + if (nla_type(attr) != DCB_ATTR_IEEE_APP) continue; + + if (nla_len(attr) < sizeof(struct dcb_app)) { + err = -ERANGE; + goto err; + } + app_data = nla_data(attr); if (ops->ieee_setapp) err = ops->ieee_setapp(netdev, app_data); -- 2.11.0
[PATCH net-next 4/9] inet: fix warning about missing prototype
The prototype for inet_rcv_saddr_equal was not being included. Signed-off-by: Stephen Hemminger--- net/ipv4/inet_connection_sock.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 1054d330bf9d..82dec8825d28 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -25,6 +25,7 @@ #include #include #include +#include #ifdef INET_CSK_DEBUG const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n"; -- 2.11.0
Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On 5/19/17 5:20 PM, Xin Long wrote: Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers"), bridge would not start hello_timer if stp_enabled is not KERNEL_STP when br_dev_open. The problem is even if users set stp_enabled with KERNEL_STP later, the timer will still not be started. It causes that KERNEL_STP can not really work. Users have to re-ifup the bridge to avoid this. This patch is to fix it by starting br->hello_timer when enabling KERNEL_STP in br_stp_start. As an improvement, it's also to start hello_timer again only when br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is no reason to start the timer again when it's NO_STP. Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers") Reported-by: Haidong LiSigned-off-by: Xin Long --- net/bridge/br_stp_if.c| 1 + net/bridge/br_stp_timer.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index 08341d2..0db8102 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -179,6 +179,7 @@ static void br_stp_start(struct net_bridge *br) br_debug(br, "using kernel STP\n"); /* To start timers on any ports left in blocking */ + mod_timer(>hello_timer, jiffies + br->hello_time); br_port_state_selection(br); } diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c index c98b3e5..60b6fe2 100644 --- a/net/bridge/br_stp_timer.c +++ b/net/bridge/br_stp_timer.c @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg) if (br->dev->flags & IFF_UP) { br_config_bpdu_generation(br); - if (br->stp_enabled != BR_USER_STP) + if (br->stp_enabled == BR_KERNEL_STP) mod_timer(>hello_timer, round_jiffies(jiffies + br->hello_time)); } +CC Bridge maintainers & fixes commit author Acked-by: Nikolay Aleksandrov
Re: [PATCH v6 net-next 5/7] net: fix documentation of struct scm_timestamping
On Fri, May 19, 2017 at 11:52 AM, Miroslav Lichvarwrote: > The scm_timestamping struct may return multiple non-zero fields, e.g. > when both software and hardware RX timestamping is enabled, or when the > SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false > software timestamp is generated in the recvmsg() call in order to always > return a SCM_TIMESTAMP(NS) message. > > CC: Richard Cochran > CC: Willem de Bruijn > Signed-off-by: Miroslav Lichvar Acked-by: Willem de Bruijn
Re: [PATCH v6 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping
On Fri, May 19, 2017 at 11:52 AM, Miroslav Lichvarwrote: > Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to > be looped to the socket's error queue with a software timestamp even > when a hardware transmit timestamp is expected to be provided by the > driver. > > Applications using this option will receive two separate messages from > the error queue, one with a software timestamp and the other with a > hardware timestamp. As the hardware timestamp is saved to the shared skb > info, which may happen before the first message with software timestamp > is received by the application, the hardware timestamp is copied to the > SCM_TIMESTAMPING control message only when the skb has no software > timestamp or it is an incoming packet. > > While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as > there are no other users. > > CC: Richard Cochran > CC: Willem de Bruijn > Signed-off-by: Miroslav Lichvar Acked-by: Willem de Bruijn
Re: [PATCH v5 net-next 5/7] net: fix documentation of struct scm_timestamping
On Fri, May 19, 2017 at 6:11 AM, Miroslav Lichvarwrote: > On Thu, May 18, 2017 at 03:38:30PM -0400, Willem de Bruijn wrote: >> On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar >> wrote: >> > +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled >> > +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false >> > +software timestamp will be generated in the recvmsg() call and passed >> > +in ts[0] when a real software timestamp is missing. >> >> With receive software timestamping this is expected behavior? I would make >> explicit that this happens even on tx timestamps. > > How about adding ", e.g. when receive timestamping is enabled > between receiving the message and the recvmsg() call, or it is a > message with a hardware transmit timestamp." ? Perhaps even more brief "This happens also on hardware tx timestamps." >> > For this reason it >> > +is not recommended to combine SO_TIMESTAMP(NS) with SO_TIMESTAMPING. >> >> And I'd remove this. The extra timestamp is harmless, and we may be missing >> other reasons why someone would want to enable both on the same socket. > > Ok. I'm just concerned people will inadvertently use the timestamp as > a real timestamp and then wonder why SW TX timestamping is so bad. I > have fallen into this trap. So have I. It is equally surprising when only enabling SO_TIMESTAMP and observing out of order timestamps. It is certainly worth calling out.
[PATCH net] bridge: fix hello and hold timers starting/stopping
Current bridge code incorrectly handles starting/stopping of hello and hold timers during STP enable/disable. 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP transition. This is not correct as the timers are stopped in NO_STP case. 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition. This is not also correct as the timers should be stopped in NO_STP state. 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP transition. They should be stopped as they are running in KERNEL_STP state and should not run in NO_STP case. The patch is a follow-up for "bridge: start hello_timer when enabling KERNEL_STP in br_stp_start" patch from Xin Long. Cc: da...@davemloft.net Cc: sas...@cumulusnetworks.com Cc: step...@networkplumber.org Cc: bri...@lists.linux-foundation.org Cc: lucien@gmail.com Cc: niko...@cumulusnetworks.com Signed-off-by: Ivan Vecera--- net/bridge/br_stp_if.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index 0db8102995a5..f137ebf27755 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char *arg) static void br_stp_start(struct net_bridge *br) { - struct net_bridge_port *p; int err = -ENOENT; if (net_eq(dev_net(br->dev), _net)) @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br) if (!err) { br->stp_enabled = BR_USER_STP; br_debug(br, "userspace STP started\n"); - - /* Stop hello and hold timers */ - del_timer(>hello_timer); - list_for_each_entry(p, >port_list, list) - del_timer(>hold_timer); } else { br->stp_enabled = BR_KERNEL_STP; br_debug(br, "using kernel STP\n"); @@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br) br_err(br, "failed to stop userspace STP (%d)\n", err); /* To start timers on any ports left in blocking */ - mod_timer(>hello_timer, jiffies + br->hello_time); - list_for_each_entry(p, >port_list, list) - mod_timer(>hold_timer, - round_jiffies(jiffies + BR_HOLD_TIME)); spin_lock_bh(>lock); br_port_state_selection(br); spin_unlock_bh(>lock); + } else { + /* BR_KERNEL_STP - stop hello and hold timers */ + del_timer(>hello_timer); + list_for_each_entry(p, >port_list, list) + del_timer(>hold_timer); } br->stp_enabled = BR_NO_STP; -- 2.13.0
Re: [PATCH net] bridge: fix hello and hold timers starting/stopping
On 5/19/17 7:25 PM, Ivan Vecera wrote: Current bridge code incorrectly handles starting/stopping of hello and hold timers during STP enable/disable. 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP transition. This is not correct as the timers are stopped in NO_STP case. This really is a noop, but ok. 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition. This is not also correct as the timers should be stopped in NO_STP state. Indeed, but the actual end result is almost as them being stopped because in the timers there are specific checks if the STP == KERNEL_STP (see br_transmit_config()) and the hold_timers will simply expire and not rearm in any other mode. The only real problem is the hello_timer which continues to rearm itself, but with Xin's earlier patch that is taken care of too. 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP transition. They should be stopped as they are running in KERNEL_STP state and should not run in NO_STP case. Same comment as for point 2. The patch is a follow-up for "bridge: start hello_timer when enabling KERNEL_STP in br_stp_start" patch from Xin Long. I'd say this is more of a cleanup/improvement after Xin's patch and thus would suggest targeting net-next. The only real issue is fixed by his patch. Cc: da...@davemloft.net Cc: sas...@cumulusnetworks.com Cc: step...@networkplumber.org Cc: bri...@lists.linux-foundation.org Cc: lucien@gmail.com Cc: niko...@cumulusnetworks.com Signed-off-by: Ivan Vecera--- net/bridge/br_stp_if.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index 0db8102995a5..f137ebf27755 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char *arg) static void br_stp_start(struct net_bridge *br) { - struct net_bridge_port *p; int err = -ENOENT; if (net_eq(dev_net(br->dev), _net)) @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br) if (!err) { br->stp_enabled = BR_USER_STP; br_debug(br, "userspace STP started\n"); - - /* Stop hello and hold timers */ - del_timer(>hello_timer); - list_for_each_entry(p, >port_list, list) - del_timer(>hold_timer); } else { br->stp_enabled = BR_KERNEL_STP; br_debug(br, "using kernel STP\n"); @@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br) br_err(br, "failed to stop userspace STP (%d)\n", err); /* To start timers on any ports left in blocking */ - mod_timer(>hello_timer, jiffies + br->hello_time); - list_for_each_entry(p, >port_list, list) - mod_timer(>hold_timer, - round_jiffies(jiffies + BR_HOLD_TIME)); spin_lock_bh(>lock); br_port_state_selection(br); spin_unlock_bh(>lock); + } else { + /* BR_KERNEL_STP - stop hello and hold timers */ + del_timer(>hello_timer); + list_for_each_entry(p, >port_list, list) + del_timer(>hold_timer); } br->stp_enabled = BR_NO_STP;
Re: [PATCH net] bridge: fix hello and hold timers starting/stopping
2017-05-19 18:51 GMT+02:00 Xin Long: > On Sat, May 20, 2017 at 12:25 AM, Ivan Vecera wrote: >> Current bridge code incorrectly handles starting/stopping of hello and >> hold timers during STP enable/disable. >> >> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP >>transition. This is not correct as the timers are stopped in NO_STP >>case. >> >> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition. >>This is not also correct as the timers should be stopped in NO_STP >>state. >> >> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP >>transition. They should be stopped as they are running in KERNEL_STP >>state and should not run in NO_STP case. >> >> The patch is a follow-up for "bridge: start hello_timer when enabling >> KERNEL_STP in br_stp_start" patch from Xin Long. >> >> Cc: da...@davemloft.net >> Cc: sas...@cumulusnetworks.com >> Cc: step...@networkplumber.org >> Cc: bri...@lists.linux-foundation.org >> Cc: lucien@gmail.com >> Cc: niko...@cumulusnetworks.com >> Signed-off-by: Ivan Vecera >> --- >> net/bridge/br_stp_if.c | 15 +-- >> 1 file changed, 5 insertions(+), 10 deletions(-) >> >> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c >> index 0db8102995a5..f137ebf27755 100644 >> --- a/net/bridge/br_stp_if.c >> +++ b/net/bridge/br_stp_if.c >> @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char >> *arg) >> >> static void br_stp_start(struct net_bridge *br) >> { >> - struct net_bridge_port *p; >> int err = -ENOENT; >> >> if (net_eq(dev_net(br->dev), _net)) >> @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br) >> if (!err) { >> br->stp_enabled = BR_USER_STP; >> br_debug(br, "userspace STP started\n"); >> - >> - /* Stop hello and hold timers */ >> - del_timer(>hello_timer); >> - list_for_each_entry(p, >port_list, list) >> - del_timer(>hold_timer); >> } else { >> br->stp_enabled = BR_KERNEL_STP; >> br_debug(br, "using kernel STP\n"); >> @@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br) >> br_err(br, "failed to stop userspace STP (%d)\n", >> err); >> >> /* To start timers on any ports left in blocking */ >> - mod_timer(>hello_timer, jiffies + br->hello_time); >> - list_for_each_entry(p, >port_list, list) >> - mod_timer(>hold_timer, >> - round_jiffies(jiffies + BR_HOLD_TIME)); >> spin_lock_bh(>lock); >> br_port_state_selection(br); >> spin_unlock_bh(>lock); >> + } else { >> + /* BR_KERNEL_STP - stop hello and hold timers */ >> + del_timer(>hello_timer); >> + list_for_each_entry(p, >port_list, list) >> + del_timer(>hold_timer); > I'm thinking, what if the timers are running when deleting them ? > del_timer may not be going to delete it, and still have to stop itself > next time when br->stp_enabled = BR_NO_STP. > > So do you think it's better to do nothing here and just leave it to be > stopped by itself when checking br->stp_enabled in > br_hello_timer_expired ? Yes, this kind of "lazy stopping" could be safer. I.
Re: [PATCH net] bridge: fix hello and hold timers starting/stopping
2017-05-19 18:55 GMT+02:00 Nikolay Aleksandrov: > On 5/19/17 7:25 PM, Ivan Vecera wrote: >> >> Current bridge code incorrectly handles starting/stopping of hello and >> hold timers during STP enable/disable. >> >> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP >> transition. This is not correct as the timers are stopped in NO_STP >> case. > > > This really is a noop, but ok. Yes, stopping of stopped timers are safe but confusing. >> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition. >> This is not also correct as the timers should be stopped in NO_STP >> state. > > > Indeed, but the actual end result is almost as them being stopped because > in the timers there are specific checks if the STP == KERNEL_STP (see > br_transmit_config()) and the hold_timers will simply expire and not rearm > in any other mode. The only real problem is the hello_timer which continues > to rearm itself, but with Xin's earlier patch that is taken care of too. Yes, this is clean-up as well. The starting of timers are more confusing than dangerous but from a reader's point of view the starting of timers is non-sense when STP is going to be disabled. >> >> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP >> transition. They should be stopped as they are running in KERNEL_STP >> state and should not run in NO_STP case. > > > Same comment as for point 2. This can be removed... and leave hello_timer handler to stop itself. >> The patch is a follow-up for "bridge: start hello_timer when enabling >> KERNEL_STP in br_stp_start" patch from Xin Long. >> > > I'd say this is more of a cleanup/improvement after Xin's patch and thus > would > suggest targeting net-next. The only real issue is fixed by his patch. Agree... will send resend against net-next. Thanks for comments, Ivan
[PATCH iproute2 1/1] tc: fix Makefile to build skbmod
Signed-off-by: Roman Mashak--- tc/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/tc/Makefile b/tc/Makefile index 9a6bb1d..678b302 100644 --- a/tc/Makefile +++ b/tc/Makefile @@ -45,6 +45,7 @@ TCMODULES += m_nat.o TCMODULES += m_pedit.o TCMODULES += m_ife.o TCMODULES += m_skbedit.o +TCMODULES += m_skbmod.o TCMODULES += m_csum.o TCMODULES += m_simple.o TCMODULES += m_vlan.o -- 1.9.1
Re: [RFC net-next PATCH 1/5] samples/bpf: xdp_tx_iptunnel make use of map_data[]
On 05/18/2017 05:41 PM, Jesper Dangaard Brouer wrote: There is no reason to use a compile time constant MAX_IPTNL_ENTRIES shared between the _user.c and _kern.c, when map_data[].def.max_entries can tell us dynamically what the max_entries were of the ELF map that the bpf loaded created. Signed-off-by: Jesper Dangaard BrouerPrevious code was perhaps a bit more robust in the sense that the order of the map wouldn't matter due to MAX_IPTNL_ENTRIES being shared. Now you rely on it being in slot 1 (map_data[1].def.max_entries) from "maps" section in ELF. samples/bpf/xdp_tx_iptunnel_common.h |2 -- samples/bpf/xdp_tx_iptunnel_kern.c |2 +- samples/bpf/xdp_tx_iptunnel_user.c | 14 +- 3 files changed, 10 insertions(+), 8 deletions(-) Not sure it's worth it given this actually adds more code and makes it more fragile at the same time. Only point I could see is to demo usage of map_data[1].def.max_entries for sample code. Perhaps at the very minimum add a warning comment to xdp_tx_iptunnel_kern.c that should the code be further extended with additional maps, that ordering of struct bpf_map_def entries really matters here to not break the _user.c part. Other than that, this should be sent as stand-alone "cleanup" ...
[PATCH v6 net-next 4/7] net: add new control message for incoming HW-timestamped packets
Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message for incoming packets with hardware timestamps. It contains the index of the real interface which received the packet and the length of the packet at layer 2. The index is useful with bonding, bridges and other interfaces, where IP_PKTINFO doesn't allow applications to determine which PHC made the timestamp. With the L2 length (and link speed) it is possible to transpose preamble timestamps to trailer timestamps, which are used in the NTP protocol. While this information could be provided by two new socket options independently from timestamping, it doesn't look like they would be very useful. With this option any performance impact is limited to hardware timestamping. Use dev_get_by_napi_id() to get the device and its index. On kernels with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero index will be returned in the control message. CC: Richard CochranAcked-by: Willem de Bruijn Signed-off-by: Miroslav Lichvar --- Documentation/networking/timestamping.txt | 10 ++ include/uapi/asm-generic/socket.h | 2 ++ include/uapi/linux/net_tstamp.h | 11 ++- net/socket.c | 27 ++- 4 files changed, 48 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 96f5069..ce11e3a 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -193,6 +193,16 @@ SOF_TIMESTAMPING_OPT_STATS: the transmit timestamps, such as how long a certain block of data was limited by peer's receiver window. +SOF_TIMESTAMPING_OPT_PKTINFO: + + Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming + packets with hardware timestamps. The message contains struct + scm_ts_pktinfo, which supplies the index of the real interface which + received the packet and its length at layer 2. A valid (non-zero) + interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is + enabled and the driver is using NAPI. The struct contains also two + other fields, but they are reserved and undefined. + New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate regardless of the setting of sysctl net.core.tstamp_allow_data. diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 2b48856..a5f6e81 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -100,4 +100,6 @@ #define SO_COOKIE 57 +#define SCM_TIMESTAMPING_PKTINFO 58 + #endif /* __ASM_GENERIC_SOCKET_H */ diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index 0749fb1..dee74d3 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -9,6 +9,7 @@ #ifndef _NET_TIMESTAMPING_H #define _NET_TIMESTAMPING_H +#include #include/* for SO_TIMESTAMPING */ /* SO_TIMESTAMPING gets an integer bit field comprised of these values */ @@ -26,8 +27,9 @@ enum { SOF_TIMESTAMPING_OPT_CMSG = (1<<10), SOF_TIMESTAMPING_OPT_TSONLY = (1<<11), SOF_TIMESTAMPING_OPT_STATS = (1<<12), + SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13), - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS, + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO, SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) | SOF_TIMESTAMPING_LAST }; @@ -130,4 +132,11 @@ enum hwtstamp_rx_filters { HWTSTAMP_FILTER_NTP_ALL, }; +/* SCM_TIMESTAMPING_PKTINFO control message */ +struct scm_ts_pktinfo { + __u32 if_index; + __u32 pkt_length; + __u32 reserved[2]; +}; + #endif /* _NET_TIMESTAMPING_H */ diff --git a/net/socket.c b/net/socket.c index c2564eb..67db7d8 100644 --- a/net/socket.c +++ b/net/socket.c @@ -662,6 +662,27 @@ static bool skb_is_err_queue(const struct sk_buff *skb) return skb->pkt_type == PACKET_OUTGOING; } +static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb) +{ + struct scm_ts_pktinfo ts_pktinfo; + struct net_device *orig_dev; + + if (!skb_mac_header_was_set(skb)) + return; + + memset(_pktinfo, 0, sizeof(ts_pktinfo)); + + rcu_read_lock(); + orig_dev = dev_get_by_napi_id(skb_napi_id(skb)); + if (orig_dev) + ts_pktinfo.if_index = orig_dev->ifindex; + rcu_read_unlock(); + + ts_pktinfo.pkt_length = skb->len - skb_mac_offset(skb); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO, +sizeof(ts_pktinfo), _pktinfo); +} + /* * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP) */ @@ -699,8 +720,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
[PATCH v6 net-next 2/7] net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid filter and update drivers which can timestamp all packets, or which explicitly list unsupported filters instead of using a default case, to handle the filter. CC: Richard CochranCC: Willem de Bruijn Signed-off-by: Miroslav Lichvar --- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 1 + drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 + drivers/net/ethernet/cavium/liquidio/lio_main.c| 1 + drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 1 + drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 1 + drivers/net/ethernet/intel/e1000e/netdev.c | 1 + drivers/net/ethernet/intel/i40e/i40e_ptp.c | 1 + drivers/net/ethernet/intel/igb/igb_ptp.c | 1 + drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 1 + drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 1 + drivers/net/ethernet/neterion/vxge/vxge-main.c | 1 + drivers/net/ethernet/qlogic/qede/qede_ptp.c| 1 + drivers/net/ethernet/sfc/ef10.c| 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 + drivers/net/ethernet/ti/cpsw.c | 1 + drivers/net/ethernet/tile/tilegx.c | 1 + net/core/dev_ioctl.c | 3 +-- 18 files changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c index c772420..89b21d7 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -1268,6 +1268,7 @@ static int xgbe_set_hwtstamp_settings(struct xgbe_prv_data *pdata, case HWTSTAMP_FILTER_NONE: break; + case HWTSTAMP_FILTER_NTP_ALL: case HWTSTAMP_FILTER_ALL: XGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSENALL, 1); XGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSENA, 1); diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index 7414ffd..14c236e 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -15351,6 +15351,7 @@ int bnx2x_configure_ptp_filters(struct bnx2x *bp) break; case HWTSTAMP_FILTER_ALL: case HWTSTAMP_FILTER_SOME: + case HWTSTAMP_FILTER_NTP_ALL: bp->rx_filter = HWTSTAMP_FILTER_NONE; break; case HWTSTAMP_FILTER_PTP_V1_L4_EVENT: diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index 649f2aa..ba01242 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -3024,6 +3024,7 @@ static int hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr) case HWTSTAMP_FILTER_PTP_V2_EVENT: case HWTSTAMP_FILTER_PTP_V2_SYNC: case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: + case HWTSTAMP_FILTER_NTP_ALL: conf.rx_filter = HWTSTAMP_FILTER_ALL; break; default: diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index d51c8d8..31d737c 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -2085,6 +2085,7 @@ static int hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr) case HWTSTAMP_FILTER_PTP_V2_EVENT: case HWTSTAMP_FILTER_PTP_V2_SYNC: case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: + case HWTSTAMP_FILTER_NTP_ALL: conf.rx_filter = HWTSTAMP_FILTER_ALL; break; default: diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c index a213868..2887bca 100644 --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c @@ -755,6 +755,7 @@ static int octeon_mgmt_ioctl_hwtstamp(struct net_device *netdev, case HWTSTAMP_FILTER_PTP_V2_EVENT: case HWTSTAMP_FILTER_PTP_V2_SYNC: case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: + case HWTSTAMP_FILTER_NTP_ALL: p->has_rx_tstamp = have_hw_timestamps; config.rx_filter = HWTSTAMP_FILTER_ALL; if (p->has_rx_tstamp) { diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index b367972..0ff9295 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -3680,6 +3680,7 @@ static int e1000e_config_hwtstamp(struct e1000_adapter *adapter, * Delay Request messages but not both so fall-through to * time stamp all packets. */ + case
[PATCH v6 net-next 3/7] net: add function to retrieve original skb device using NAPI ID
Since commit b68581778cd0 ("net: Make skb->skb_iif always track skb->dev") skbs don't have the original index of the interface which received the packet. This information is now needed for a new control message related to hardware timestamping. Instead of adding a new field to skb, we can find the device by the NAPI ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to hide the CONFIG_NET_RX_BUSY_POLL ifdef. CC: Richard CochranSuggested-by: Willem de Bruijn Acked-by: Willem de Bruijn Signed-off-by: Miroslav Lichvar --- include/linux/netdevice.h | 1 + include/linux/skbuff.h| 9 + net/core/dev.c| 26 ++ 3 files changed, 36 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3f39d27..b6c36d5 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2456,6 +2456,7 @@ static inline int dev_recursion_level(void) struct net_device *dev_get_by_index(struct net *net, int ifindex); struct net_device *__dev_get_by_index(struct net *net, int ifindex); struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex); +struct net_device *dev_get_by_napi_id(unsigned int napi_id); int netdev_get_name(struct net *net, char *name, int ifindex); int dev_restart(struct net_device *dev); int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb); diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 7c0cb2c..1f8028c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -855,6 +855,15 @@ static inline bool skb_pkt_type_ok(u32 ptype) return ptype <= PACKET_OTHERHOST; } +static inline unsigned int skb_napi_id(const struct sk_buff *skb) +{ +#ifdef CONFIG_NET_RX_BUSY_POLL + return skb->napi_id; +#else + return 0; +#endif +} + void kfree_skb(struct sk_buff *skb); void kfree_skb_list(struct sk_buff *segs); void skb_tx_error(struct sk_buff *skb); diff --git a/net/core/dev.c b/net/core/dev.c index acd594c..6d3c452 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -162,6 +162,7 @@ static int netif_rx_internal(struct sk_buff *skb); static int call_netdevice_notifiers_info(unsigned long val, struct net_device *dev, struct netdev_notifier_info *info); +static struct napi_struct *napi_by_id(unsigned int napi_id); /* * The @dev_base_head list is protected by @dev_base_lock and the rtnl @@ -866,6 +867,31 @@ struct net_device *dev_get_by_index(struct net *net, int ifindex) EXPORT_SYMBOL(dev_get_by_index); /** + * dev_get_by_napi_id - find a device by napi_id + * @napi_id: ID of the NAPI struct + * + * Search for an interface by NAPI ID. Returns %NULL if the device + * is not found or a pointer to the device. The device has not had + * its reference counter increased so the caller must be careful + * about locking. The caller must hold RCU lock. + */ + +struct net_device *dev_get_by_napi_id(unsigned int napi_id) +{ + struct napi_struct *napi; + + WARN_ON_ONCE(!rcu_read_lock_held()); + + if (napi_id < MIN_NAPI_ID) + return NULL; + + napi = napi_by_id(napi_id); + + return napi ? napi->dev : NULL; +} +EXPORT_SYMBOL(dev_get_by_napi_id); + +/** * netdev_get_name - get a netdevice name, knowing its ifindex. * @net: network namespace * @name: a pointer to the buffer where the name will be stored. -- 2.9.3
[PATCH v6 net-next 0/7] Extend socket timestamping API
Changes v5->v6: - fixed skb_is_swtx_tstamp() when OPT_TX_SWHW is disabled and improved its description - improved OPT_PKTINFO documentation - improved scm_timestamping documentation Changes v4->v5: - fixed initialization of reserved fields in struct scm_ts_pktinfo Changes v3->v4: - added reserved fields to struct scm_ts_pktinfo - replaced patch fixing false SW timestamps with a documentation fix - updated OPT_TX_SWHW patch to handle false SW timestamps Changes v2->v3: - modified struct scm_ts_pktinfo to use fixed-width integer types - added WARN_ON_ONCE for missing RCU lock in dev_get_by_napi_id() - modified dev_get_by_napi_id() to not return dev in unexpected branch - modified recv to return SCM_TIMESTAMPING_PKTINFO even if the interface index is unknown Changes v1->v2: - added separate patch for new NAPI functions - split code from __sock_recv_timestamp() for better readability - fixed RCU locking - fixed compiler warning (missing case in switch in first patch) - inline sw_tx_timestamp() in its only user Changes RFC->v1: - reworked SOF_TIMESTAMPING_OPT_PKTINFO patch to not add new fields to skb shared info (net device is now looked up by napi_id), not require any changes in drivers, and restrict the cmsg to incoming packets - renamed SOF_TIMESTAMPING_OPT_MULTIMSG to SOF_TIMESTAMPING_OPT_TX_SWHW and fixed its description - moved struct scm_ts_pktinfo from errqueue.h to net_tstamp.h as it can't be received from the error queue anymore - improved commit descriptions and removed incorrect comment This patchset adds new options to the timestamping API that will be useful for NTP implementations and possibly other applications. The first patch specifies a timestamp filter for NTP packets. The second patch updates drivers that can timestamp all packets, or need to list the filter as unsupported. There is no attempt to add the support to the phyter driver. The third patch adds two helper functions working with NAPI ID, which is needed by the next patch. The fourth patch adds a new option to get a new control message with the L2 length and interface index for incoming packets with hardware timestamps. The fifth patch fixes documentation on number of non-zero fields in scm_timestamping and warns about false software timestamps when SO_TIMESTAMP(NS) is combined with SCM_TIMESTAMPING. The sixth patch adds a new option to request both software and hardware timestamps for outgoing packets. The seventh patch updates drivers that assumed software timestamping cannot be used together with hardware timestamping. The patches have been tested on x86_64 machines with igb and e1000e drivers. Miroslav Lichvar (7): net: define receive timestamp filter for NTP net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL net: add function to retrieve original skb device using NAPI ID net: add new control message for incoming HW-timestamped packets net: fix documentation of struct scm_timestamping net: allow simultaneous SW and HW transmit timestamping net: ethernet: update drivers to make both SW and HW TX timestamps Documentation/networking/timestamping.txt | 26 +++- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 4 +- drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 + drivers/net/ethernet/cavium/liquidio/lio_main.c| 1 + drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 1 + drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 1 + drivers/net/ethernet/intel/e1000e/netdev.c | 5 ++- drivers/net/ethernet/intel/i40e/i40e_ptp.c | 1 + drivers/net/ethernet/intel/igb/igb_ptp.c | 1 + drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 1 + drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 1 + drivers/net/ethernet/neterion/vxge/vxge-main.c | 1 + drivers/net/ethernet/qlogic/qede/qede_ptp.c| 1 + drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c| 3 +- drivers/net/ethernet/sfc/ef10.c| 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 ++-- drivers/net/ethernet/ti/cpsw.c | 1 + drivers/net/ethernet/tile/tilegx.c | 1 + include/linux/netdevice.h | 1 + include/linux/skbuff.h | 19 + include/uapi/asm-generic/socket.h | 2 + include/uapi/linux/net_tstamp.h| 15 ++- net/core/dev.c | 26 net/core/dev_ioctl.c | 1 + net/core/skbuff.c | 4 ++ net/socket.c | 47 -- 27 files changed, 151 insertions(+), 23 deletions(-) -- 2.9.3
[PATCH v6 net-next 1/7] net: define receive timestamp filter for NTP
Add HWTSTAMP_FILTER_NTP_ALL to the hwtstamp_rx_filters enum for timestamping of NTP packets. There is currently only one driver (phyter) that could support it directly. CC: Richard CochranCC: Willem de Bruijn Signed-off-by: Miroslav Lichvar --- include/uapi/linux/net_tstamp.h | 3 +++ net/core/dev_ioctl.c| 2 ++ 2 files changed, 5 insertions(+) diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index 464dcca..0749fb1 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -125,6 +125,9 @@ enum hwtstamp_rx_filters { HWTSTAMP_FILTER_PTP_V2_SYNC, /* PTP v2/802.AS1, any layer, Delay_req packet */ HWTSTAMP_FILTER_PTP_V2_DELAY_REQ, + + /* NTP, UDP, all versions and packet modes */ + HWTSTAMP_FILTER_NTP_ALL, }; #endif /* _NET_TIMESTAMPING_H */ diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index b94b1d2..8f036a7 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -227,6 +227,8 @@ static int net_hwtstamp_validate(struct ifreq *ifr) case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: rx_filter_valid = 1; break; + case HWTSTAMP_FILTER_NTP_ALL: + break; } if (!tx_type_valid || !rx_filter_valid) -- 2.9.3
HELLO!!!!!
HELLO! I am Mr Neil Trotter, the current winner of 108 Euro millions jackpot, if you have received this email then you are of the lucky fellows to benefit from me,so do get back to me for a better understanding. Here is the website for proof http://www.huffingtonpost.co.uk/2014/03/18/neil-trotter-euromillions-winner_n_4984234.html} Contact Email;(mr.neiltrotter...@outlook.com) THANKS, MR. NEIL TROTTER.
Re: [PATCH net-next V5 0/9] vhost_net rx batch dequeuing
On Fri, May 19, 2017 at 02:27:16PM +0800, Jason Wang wrote: > > > On 2017年05月18日 04:59, Michael S. Tsirkin wrote: > > On Wed, May 17, 2017 at 12:14:36PM +0800, Jason Wang wrote: > > > This series tries to implement rx batching for vhost-net. This is done > > > by batching the dequeuing from skb_array which was exported by > > > underlayer socket and pass the sbk back through msg_control to finish > > > userspace copying. This is also the requirement for more batching > > > implemention on rx path. > > > > > > Tests shows at most 7.56% improvment bon rx pps on top of batch > > > zeroing and no obvious changes for TCP_STREAM/TCP_RR result. > > > > > > Please review. > > > > > > Thanks > > A surprisingly large gain for such as simple change. It would be nice > > to understand better why this helps - in particular, does the optimal > > batch size change if ring is bigger or smaller? > > Will test, just want to confirm. You mean virtio ring not tx_queue_len here? > > Thanks Exactly. Thanks, MST
Re: [PATCH net] bridge: fix hello and hold timers starting/stopping
On Sat, May 20, 2017 at 12:25 AM, Ivan Vecerawrote: > Current bridge code incorrectly handles starting/stopping of hello and > hold timers during STP enable/disable. > > 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP >transition. This is not correct as the timers are stopped in NO_STP >case. > > 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition. >This is not also correct as the timers should be stopped in NO_STP >state. > > 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP >transition. They should be stopped as they are running in KERNEL_STP >state and should not run in NO_STP case. > > The patch is a follow-up for "bridge: start hello_timer when enabling > KERNEL_STP in br_stp_start" patch from Xin Long. > > Cc: da...@davemloft.net > Cc: sas...@cumulusnetworks.com > Cc: step...@networkplumber.org > Cc: bri...@lists.linux-foundation.org > Cc: lucien@gmail.com > Cc: niko...@cumulusnetworks.com > Signed-off-by: Ivan Vecera > --- > net/bridge/br_stp_if.c | 15 +-- > 1 file changed, 5 insertions(+), 10 deletions(-) > > diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c > index 0db8102995a5..f137ebf27755 100644 > --- a/net/bridge/br_stp_if.c > +++ b/net/bridge/br_stp_if.c > @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char > *arg) > > static void br_stp_start(struct net_bridge *br) > { > - struct net_bridge_port *p; > int err = -ENOENT; > > if (net_eq(dev_net(br->dev), _net)) > @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br) > if (!err) { > br->stp_enabled = BR_USER_STP; > br_debug(br, "userspace STP started\n"); > - > - /* Stop hello and hold timers */ > - del_timer(>hello_timer); > - list_for_each_entry(p, >port_list, list) > - del_timer(>hold_timer); > } else { > br->stp_enabled = BR_KERNEL_STP; > br_debug(br, "using kernel STP\n"); > @@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br) > br_err(br, "failed to stop userspace STP (%d)\n", > err); > > /* To start timers on any ports left in blocking */ > - mod_timer(>hello_timer, jiffies + br->hello_time); > - list_for_each_entry(p, >port_list, list) > - mod_timer(>hold_timer, > - round_jiffies(jiffies + BR_HOLD_TIME)); > spin_lock_bh(>lock); > br_port_state_selection(br); > spin_unlock_bh(>lock); > + } else { > + /* BR_KERNEL_STP - stop hello and hold timers */ > + del_timer(>hello_timer); > + list_for_each_entry(p, >port_list, list) > + del_timer(>hold_timer); I'm thinking, what if the timers are running when deleting them ? del_timer may not be going to delete it, and still have to stop itself next time when br->stp_enabled = BR_NO_STP. So do you think it's better to do nothing here and just leave it to be stopped by itself when checking br->stp_enabled in br_hello_timer_expired ? > } > > br->stp_enabled = BR_NO_STP; > -- > 2.13.0 >
Re: [PATCH net-next] net: sched: provide stubs for tcf_chain_{get,put} for CONFIG_NET_CLS=n
On Thu, May 18, 2017 at 9:24 AM, Sabrina Dubrocawrote: > This also changes tcf_chain_get() to return an error pointer instead of > NULL, so that tcf_action_goto_chain_init() can differentiate memory > allocation failure from lack of support. > > Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") > Signed-off-by: Sabrina Dubroca > --- > I'm not sure this EOPNOTSUPP is really necessary, ie if we can really > reach the tcf_action_goto_chain_init() call when CONFIG_NET_CLS=n. > If not, a simpler patch would add a tcf_chain_get() stub that just > returns NULL, as we wouldn't have to care about returning an incorrect > error code from tcf_action_goto_chain_init(). I wonder if we should just make CONFIG_NET_CLS_ACT depending on CONFIG_NET_CLS. Although without filters we can still have standalone actions but no one can use them.
[PATCH net-next 3/9] udp: make local function static
udp_queue_rcv_skb was global but only used in one file. Identified by this warning: net/ipv4/udp.c:1775:5: warning: no previous prototype for ‘udp_queue_rcv_skb’ [-Wmissing-prototypes] int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) Signed-off-by: Stephen Hemminger--- net/ipv4/udp.c | 2 +- net/ipv6/udp.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index e7b6cfcca627..8d7980ca27a0 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1772,7 +1772,7 @@ EXPORT_SYMBOL(udp_encap_enable); * Note that in the success and error cases, the skb is assumed to * have either been requeued or freed. */ -int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { struct udp_sock *up = udp_sk(sk); int is_udplite = IS_UDPLITE(sk); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index f78fdf8c9f0f..d558c1fef6fd 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -570,7 +570,7 @@ void udpv6_encap_enable(void) } EXPORT_SYMBOL(udpv6_encap_enable); -int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) +static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { struct udp_sock *up = udp_sk(sk); int is_udplite = IS_UDPLITE(sk); -- 2.11.0
[PATCH net-next 6/9] xfrm: make xfrm_dev_register static
This function is only used in this file and should not be global. Signed-off-by: Stephen Hemminger--- net/xfrm/xfrm_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 8ec8a3fcf8d4..50ec73399b48 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -138,7 +138,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x) } EXPORT_SYMBOL_GPL(xfrm_dev_offload_ok); -int xfrm_dev_register(struct net_device *dev) +static int xfrm_dev_register(struct net_device *dev) { if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops) return NOTIFY_BAD; -- 2.11.0
[PATCH net-next 8/9] ipv6: drop unused variables in seg6_genl_dumphac
THe seg6_pernet_data variable was set but never used. Signed-off-by: Stephen Hemminger--- net/ipv6/seg6.c | 4 1 file changed, 4 deletions(-) diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index 5f44ffed2576..15fba55e3da8 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -303,13 +303,9 @@ static int seg6_genl_dumphmac_done(struct netlink_callback *cb) static int seg6_genl_dumphmac(struct sk_buff *skb, struct netlink_callback *cb) { struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0]; - struct net *net = sock_net(skb->sk); - struct seg6_pernet_data *sdata; struct seg6_hmac_info *hinfo; int ret; - sdata = seg6_pernet(net); - ret = rhashtable_walk_start(iter); if (ret && ret != -EAGAIN) goto done; -- 2.11.0
[PATCH net-next 2/9] ila: propagate error code in ila_output
This warning: net/ipv6/ila/ila_lwt.c: In function ‘ila_output’: net/ipv6/ila/ila_lwt.c:42:6: warning: variable ‘err’ set but not used [-Wunused-but-set-variable] It looks like the code attempts to set propagate different error values, but always returned -EINVAL. Compile tested only. Needs review by original author. Signed-off-by: Stephen Hemminger--- net/ipv6/ila/ila_lwt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c index b3df03e3faa0..f4a413aba423 100644 --- a/net/ipv6/ila/ila_lwt.c +++ b/net/ipv6/ila/ila_lwt.c @@ -91,7 +91,7 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb) drop: kfree_skb(skb); - return -EINVAL; + return err; } static int ila_input(struct sk_buff *skb) -- 2.11.0