Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP
Thu, Feb 09, 2017 at 12:41:20AM CET, t...@herbertland.com wrote:
>This patch creates an infrastructure for registering and running code at
>XDP hooks in drivers. This extends and generalizes the original XDP/BPF
>interface. Specifically, it defines a generic xdp_hook structure and a
>set of hooks that can be assigned to devices or napi instances. These
>hooks are also generic to allow for XDP/BPF programs as well as non-BPF
>code (e.g. kernel code can be written in a module).
>
>An XDP hook is defined by the xdp_hook structure. A pointer to this
>structure is passed into the XDP register function to set up a hook.
>The XDP register function mallocs its own xdp_hook structure and copies
>the values from the xdp_hook passed in. The register function also saves
>the pointer value of the xdp_hook argument; this pointer is used in
>subsequent calls to XDP to identify the registered hook.
>
>The interface is defined in net/xdp.h. This includes the definition of
>xdp_hook, functions to register and unregister hooks on a device
>or individual instances of napi, and xdp_hook_run that is called by
>drivers to run the hooks.
>
>Signed-off-by: Tom Herbert
>---
> drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c |   1 +
> include/linux/filter.h                           |  10 +-
> include/linux/netdev_features.h                  |   3 +-
> include/linux/netdevice.h                        |  16 ++
> include/net/xdp.h                                | 310 +++
> include/trace/events/xdp.h                       |  31 +++
> kernel/bpf/core.c                                |   1 +
> net/core/Makefile                                |   2 +-
> net/core/dev.c                                   |  53 ++--
> net/core/filter.c                                |   1 +
> net/core/rtnetlink.c                             |  14 +-
> net/core/xdp.c                                   | 304 ++
> 12 files changed, 711 insertions(+), 35 deletions(-)
> create mode 100644 include/net/xdp.h
> create mode 100644 net/core/xdp.c
>

[...]

>@@ -48,6 +49,36 @@ TRACE_EVENT(xdp_exception,
>                   __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB))
> );
>
>+/* Temporaray trace function. This will be renamed to xdp_exception after all

typo

>+ * the calling drivers have been patched.
>+ */
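
For readers skimming the changelog quoted above, here is a rough userspace sketch of the register semantics it describes: the register function allocates its own copy of the hook and remembers the caller's pointer purely as an identity key for later calls. Every name here (toy_hook, toy_hook_register and so on) is made up for illustration and is not taken from the patch; the "stop at the first non-zero verdict" rule in toy_hook_run is likewise only an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the xdp_hook structure in the changelog. */
struct toy_hook {
	int (*func)(void *priv, void *pkt);
	void *priv;
};

struct toy_hook_entry {
	struct toy_hook hook;        /* private copy made at register time */
	const struct toy_hook *key;  /* caller's pointer, used only as identity */
	struct toy_hook_entry *next;
};

struct toy_hook_set {
	struct toy_hook_entry *head;
};

/* Copy *h into a freshly allocated entry; refuse duplicate keys. */
int toy_hook_register(struct toy_hook_set *set, const struct toy_hook *h)
{
	struct toy_hook_entry *e;

	for (e = set->head; e; e = e->next)
		if (e->key == h)
			return -1;

	e = malloc(sizeof(*e));
	if (!e)
		return -1;

	e->hook = *h;
	e->key = h;
	e->next = set->head;
	set->head = e;
	return 0;
}

/* Find the entry by the pointer that was passed to register. */
int toy_hook_unregister(struct toy_hook_set *set, const struct toy_hook *h)
{
	struct toy_hook_entry **pp, *e;

	for (pp = &set->head; (e = *pp) != NULL; pp = &e->next) {
		if (e->key == h) {
			*pp = e->next;
			free(e);
			return 0;
		}
	}
	return -1;
}

/* Assumption: run hooks in list order, stop at first non-zero verdict. */
int toy_hook_run(const struct toy_hook_set *set, void *pkt)
{
	const struct toy_hook_entry *e;
	int ret = 0;

	for (e = set->head; e; e = e->next) {
		ret = e->hook.func(e->hook.priv, pkt);
		if (ret)
			break;
	}
	return ret;
}

/* Example hook: counts invocations through priv. */
int toy_count_hook(void *priv, void *pkt)
{
	(void)pkt;
	++*(int *)priv;
	return 0;
}
```

Because the entry holds a copy, the caller may modify or reuse its own structure after registering; only the address has to stay unique for as long as the hook is registered.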
Re: [PATCHv6 net-next 4/6] sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter
On Thu, Feb 9, 2017 at 5:50 AM, Marcelo Ricardo Leitner wrote:
> On Wed, Feb 08, 2017 at 07:48:01PM -0200, Marcelo Ricardo Leitner wrote:
>> Hi Xin,
>>
>> On Thu, Feb 09, 2017 at 01:18:18AM +0800, Xin Long wrote:
>> > This patch is to implement Sender-Side Procedures for the SSN/TSN
>> > Reset Request Parameter described in rfc6525 section 5.1.4.
>> >
>> > It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
>> > for users.
>> >
>> > Signed-off-by: Xin Long
>> ...
>> > +
>> > +int sctp_send_reset_assoc(struct sctp_association *asoc)
>> > +{
>> > +        struct sctp_chunk *chunk = NULL;
>> > +        int retval;
>> > +        __u16 i;
>> > +
>> > +        if (!asoc->peer.reconf_capable ||
>> > +            !(asoc->strreset_enable & SCTP_ENABLE_RESET_ASSOC_REQ))
>> > +                return -ENOPROTOOPT;
>> > +
>> > +        if (asoc->strreset_outstanding)
>> > +                return -EINPROGRESS;
>> > +
>> > +        chunk = sctp_make_strreset_tsnreq(asoc);
>>                   ^--- refcnt = 1 (as per sctp_chunkify())
>>
>> > +        if (!chunk)
>> > +                return -ENOMEM;
>> > +
>> > +        /* Block further xmit of data until this request is completed */
>> > +        for (i = 0; i < asoc->stream->outcnt; i++)
>> > +                asoc->stream->out[i].state = SCTP_STREAM_CLOSED;
>> > +
>> > +        asoc->strreset_chunk = chunk;
>> > +        sctp_chunk_hold(asoc->strreset_chunk);
>>                   ^--- refcnt = 2
>> > +
>> > +        retval = sctp_send_reconf(asoc, chunk);
>> > +        if (retval) {
>> > +                sctp_chunk_put(asoc->strreset_chunk);
>>                   ^--- refcnt = 1
>>
>> Won't we leak the chunk here?
>
> No we won't, sctp_send_reconf() frees it for us, aye.

yups. :)
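
The reference counting discussed above can be walked through with a small userspace model. This is only a sketch under the assumption stated in the reply, namely that the send function drops the caller's reference when it fails; the toy_* names and the toy_freed counter are invented for the example and do not exist in net/sctp.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_chunk {
	int refcnt;
};

struct toy_assoc {
	struct toy_chunk *pending;   /* plays the role of asoc->strreset_chunk */
};

int toy_freed;                       /* counts frees so the flow can be checked */

struct toy_chunk *toy_chunk_new(void)
{
	struct toy_chunk *c = malloc(sizeof(*c));

	if (c)
		c->refcnt = 1;       /* creator owns the first reference */
	return c;
}

void toy_chunk_hold(struct toy_chunk *c)
{
	c->refcnt++;
}

void toy_chunk_put(struct toy_chunk *c)
{
	if (--c->refcnt == 0) {
		toy_freed++;
		free(c);
	}
}

/* Assumption from the thread: on failure the send path drops the
 * caller's reference; on success the reference stays with the queue.
 */
int toy_send(struct toy_chunk *c, int fail)
{
	if (fail) {
		toy_chunk_put(c);
		return -1;
	}
	return 0;
}

int toy_send_reset(struct toy_assoc *a, int fail)
{
	struct toy_chunk *c = toy_chunk_new();  /* refcnt = 1 */

	if (!c)
		return -1;

	a->pending = c;
	toy_chunk_hold(a->pending);             /* refcnt = 2 */

	if (toy_send(c, fail)) {                /* failure path: refcnt = 1 */
		toy_chunk_put(a->pending);      /* refcnt = 0, chunk freed */
		a->pending = NULL;
		return -1;
	}
	return 0;
}
```

On the failure path both references are dropped exactly once, so nothing leaks, which matches the conclusion reached in the thread.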
[lkp-robot] [rhashtable] 60be2ebf32: INFO:suspicious_RCU_usage
FYI, we noticed the following commit:

commit: 60be2ebf326aa90c88e9a967412557d832a1612e ("rhashtable: Add nested tables")
url: https://github.com/0day-ci/linux/commits/Herbert-Xu/rhashtable-Handle-table-allocation-failure-during-insertion/20170207-204835

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

+-------------------------------------------------+------------+------------+
|                                                 | 12f609851b | 60be2ebf32 |
+-------------------------------------------------+------------+------------+
| boot_successes                                  | 123        | 5          |
| boot_failures                                   | 9          | 8          |
| BUG:kernel_reboot-without-warning_in_test_stage | 9          |            |
| INFO:suspicious_RCU_usage                       | 0          | 8          |
+-------------------------------------------------+------------+------------+

[ 222.678280] [ INFO: suspicious RCU usage. ]
[ 222.699410] 4.10.0-rc6-00112-g60be2eb #534 Not tainted
[ 222.725264] ---
[ 222.741887] lib/rhashtable.c:1125 suspicious rcu_dereference_check() usage!
[ 222.783537]
[ 222.783537] other info that might help us debug this:
[ 222.783537]
[ 222.823615]
[ 222.823615] rcu_scheduler_active = 2, debug_locks = 1
[ 222.856672] 4 locks held by kworker/0:1/19:
[ 222.877765] #0: ("events"){.+.+.+}, at: [] process_one_work+0x2ac/0x761
[ 222.917921] #1: ((>run_work)){+.+.+.}, at: [] process_one_work+0x2d6/0x761
[ 222.960763] #2: (>mutex){+.+.+.}, at: [] rht_deferred_worker+0x22/0x26a
[ 223.004378] #3: (&(>locks[i])->rlock){+.}, at: [] rhashtable_rehash_table+0xf6/0x640
[ 223.055769]
[ 223.055769] stack backtrace:
[ 223.078034] CPU: 0 PID: 19 Comm: kworker/0:1 Not tainted 4.10.0-rc6-00112-g60be2eb #534
[ 223.118613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 223.169614] Workqueue: events rht_deferred_worker
[ 223.193302] Call Trace:
[ 223.205882] dump_stack+0x19/0x1b
[ 223.222639] lockdep_rcu_suspicious+0xdd/0xf4
[ 223.244610] rht_bucket_nested+0x107/0x10c
[ 223.265382] rhashtable_rehash_table+0x373/0x640
[ 223.289295] rht_deferred_worker+0x112/0x26a
[ 223.308573] process_one_work+0x3b7/0x761
[ 223.328696] ? process_one_work+0x2d6/0x761
[ 223.349210] worker_thread+0x2aa/0x6a5
[ 223.368199] kthread+0x107/0x13a
[ 223.384633] ? process_one_work+0x761/0x761
[ 223.405721] ? __kthread_create_on_node+0x232/0x232
[ 223.430350] ret_from_fork+0x31/0x40
[ 223.448309]
[ 223.456052] ===
[ 223.477118] [ INFO: suspicious RCU usage. ]
[ 223.498249] 4.10.0-rc6-00112-g60be2eb #534 Not tainted
[ 223.524590] ---
[ 223.545889] lib/rhashtable.c:1130 suspicious rcu_dereference_check() usage!
[ 223.588207]
[ 223.588207] other
Re: [PATCHv6 net-next 3/6] sctp: add support for generating stream reconf ssn/tsn reset request chunk
On Thu, Feb 9, 2017 at 5:57 AM, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 09, 2017 at 01:18:17AM +0800, Xin Long wrote:
>> This patch is to define SSN/TSN Reset Request Parameter described
>> in rfc6525 section 4.3.
>>
>> It's also to drop some unnecessary __packed in include/linux/sctp.h.
>
> Oups, extra line in the changelog here.

I've moved the "drop __packed" to patch 1/6, this line should
have been removed.

Hi, David, can you remove this line when applying ?

Thanks.

>
>>
>> Signed-off-by: Xin Long
>> ---
>>  include/linux/sctp.h     |  5 +
>>  include/net/sctp/sm.h    |  2 ++
>>  net/sctp/sm_make_chunk.c | 29 +
>>  3 files changed, 36 insertions(+)
>>
>> diff --git a/include/linux/sctp.h b/include/linux/sctp.h
>> index d74fca3..71c0d41 100644
>> --- a/include/linux/sctp.h
>> +++ b/include/linux/sctp.h
>> @@ -737,4 +737,9 @@ struct sctp_strreset_inreq {
>>          __u16 list_of_streams[0];
>>  };
>>
>> +struct sctp_strreset_tsnreq {
>> +        sctp_paramhdr_t param_hdr;
>> +        __u32 request_seq;
>> +};
>> +
>>  #endif /* __LINUX_SCTP_H__ */
>> diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
>> index 430ed13..ac37c17 100644
>> --- a/include/net/sctp/sm.h
>> +++ b/include/net/sctp/sm.h
>> @@ -265,6 +265,8 @@ struct sctp_chunk *sctp_make_strreset_req(
>>                                  const struct sctp_association *asoc,
>>                                  __u16 stream_num, __u16 *stream_list,
>>                                  bool out, bool in);
>> +struct sctp_chunk *sctp_make_strreset_tsnreq(
>> +                                const struct sctp_association *asoc);
>>  void sctp_chunk_assign_tsn(struct sctp_chunk *);
>>  void sctp_chunk_assign_ssn(struct sctp_chunk *);
>>
>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>> index c7d3249..749842a 100644
>> --- a/net/sctp/sm_make_chunk.c
>> +++ b/net/sctp/sm_make_chunk.c
>> @@ -3658,3 +3658,32 @@ struct sctp_chunk *sctp_make_strreset_req(
>>
>>          return retval;
>>  }
>> +
>> +/* RE-CONFIG 4.3 (SSN/TSN RESET ALL)
>> + *   0                   1                   2                   3
>> + *   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + *  |     Parameter Type = 15       |      Parameter Length = 8     |
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + *  |         Re-configuration Request Sequence Number              |
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + */
>> +struct sctp_chunk *sctp_make_strreset_tsnreq(
>> +                                const struct sctp_association *asoc)
>> +{
>> +        struct sctp_strreset_tsnreq tsnreq;
>> +        __u16 length = sizeof(tsnreq);
>> +        struct sctp_chunk *retval;
>> +
>> +        retval = sctp_make_reconf(asoc, length);
>> +        if (!retval)
>> +                return NULL;
>> +
>> +        tsnreq.param_hdr.type = SCTP_PARAM_RESET_TSN_REQUEST;
>> +        tsnreq.param_hdr.length = htons(length);
>> +        tsnreq.request_seq = htonl(asoc->strreset_outseq);
>> +
>> +        sctp_addto_chunk(retval, sizeof(tsnreq), &tsnreq);
>> +
>> +        return retval;
>> +}
>> --
>> 2.1.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
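
As a side note on the wire layout in the quoted comment, the 8-byte parameter can be reproduced in a few lines of userspace C. This is a sketch only: the toy_* types and the TOY_PARAM_RESET_TSN_REQUEST macro are hypothetical stand-ins, with the type value 15 and length 8 taken from the diagram, and it does not use the kernel's own structures or helpers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PARAM_RESET_TSN_REQUEST 15   /* "Parameter Type = 15" */

struct toy_paramhdr {
	uint16_t type;      /* network byte order */
	uint16_t length;    /* network byte order, header included */
};

struct toy_tsnreq {
	struct toy_paramhdr hdr;
	uint32_t request_seq;   /* network byte order */
};

/* Serialize one SSN/TSN Reset Request parameter into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t toy_build_tsnreq(uint8_t *buf, size_t buflen, uint32_t seq)
{
	struct toy_tsnreq req;

	if (buflen < sizeof(req))
		return 0;

	req.hdr.type = htons(TOY_PARAM_RESET_TSN_REQUEST);
	req.hdr.length = htons((uint16_t)sizeof(req));
	req.request_seq = htonl(seq);

	memcpy(buf, &req, sizeof(req));
	return sizeof(req);
}
```

With a sequence number of 0x12345678 the buffer holds 00 0f 00 08 12 34 56 78, i.e. type, length and sequence number, each in network byte order.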
Re: [net, v3, 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
Hello Florian,

On 09.02.2017 at 08:13, Florian Fainelli wrote:
> On 02/08/2017 10:58 PM, Heiko Schocher wrote:
>> Hello Florian,
>>
>> On 09.02.2017 at 01:13, Florian Fainelli wrote:
>>> The Generic PHY drivers get assigned after we checked that the current
>>> PHY driver is NULL, so we need to check a few things before we can
>>> safely dereference d->driver. This would be causing a NULL dereference to
>>> occur when a system binds to the Generic PHY driver. Update
>>> phy_attach_direct() to do the following:
>>>
>>> - grab the driver module reference after we have assigned the Generic
>>>   PHY drivers accordingly
>>>
>>> - update the error path to clean up the module reference in case the
>>>   Generic PHY probe function fails
>>>
>>> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
>>> Signed-off-by: Florian Fainelli
>>> ---
>>>  drivers/net/phy/phy_device.c | 16 +++-
>>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> just stumbled over this bug on an am335x based board, with a
>> KSZ8081 attached, so there a "fixed-link" is used like:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-baltos-ir3220.dts#n105
>>
>> With your patch it crashes also ...
>
> The final version of the patch is here:
>
> http://patchwork.ozlabs.org/patch/725923/

Huh, sorry ...

> Do you mind giving it a try?

With this patch, ethernet works again fine on this board, thanks!

bye,
Heiko
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
[PATCH 0/4] Whitespace checkpatch fixes
This patch set fixes various whitespace checkpatch errors and warnings.

Tobin C. Harding (4):
  net: Fix checkpatch WARNING: please, no space before tabs
  net: Fix checkpatch whitespace errors
  net: Fix checkpatch block comments warnings
  net: Fix checkpatch, Missing a blank line after declarations

 net/core/dev.c | 259 ++---
 1 file changed, 137 insertions(+), 122 deletions(-)

--
2.7.4

"David S. Miller" (maintainer:NETWORKING [GENERAL],commit_signer:72/82=88%)
Eric Dumazet (commit_signer:23/82=28%,authored:19/82=23%,added_lines:150/1100=14%,removed_lines:104/791=13%)
Alexander Duyck (commit_signer:14/82=17%,authored:11/82=13%,added_lines:259/1100=24%,removed_lines:66/791=8%)
David Ahern (commit_signer:9/82=11%,authored:7/82=9%,added_lines:219/1100=20%,removed_lines:230/791=29%)
Jesper Dangaard Brouer (commit_signer:6/82=7%)
"Tobin C. Harding" (added_lines:137/1100=12%,removed_lines:122/791=15%)
Jiri Pirko (added_lines:90/1100=8%)
stephen hemminger (removed_lines:136/791=17%)
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-ker...@vger.kernel.org (open list)
Re: [net, v3, 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
On 02/08/2017 10:58 PM, Heiko Schocher wrote:
> Hello Florian,
>
> On 09.02.2017 at 01:13, Florian Fainelli wrote:
>> The Generic PHY drivers get assigned after we checked that the current
>> PHY driver is NULL, so we need to check a few things before we can
>> safely dereference d->driver. This would be causing a NULL dereference to
>> occur when a system binds to the Generic PHY driver. Update
>> phy_attach_direct() to do the following:
>>
>> - grab the driver module reference after we have assigned the Generic
>>   PHY drivers accordingly
>>
>> - update the error path to clean up the module reference in case the
>>   Generic PHY probe function fails
>>
>> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY
>> driver")
>> Signed-off-by: Florian Fainelli
>> ---
>>  drivers/net/phy/phy_device.c | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> just stumbled over this bug on an am335x based board, with a
> KSZ8081 attached, so there a "fixed-link" is used like:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-baltos-ir3220.dts#n105
>
>
> With your patch it crashes also ...

The final version of the patch is here:

http://patchwork.ozlabs.org/patch/725923/

Do you mind giving it a try?
--
Florian
[PATCH 4/4] net: Fix checkpatch, Missing a blank line after declarations
This patch fixes multiple occurrences of checkpatch WARNING: Missing a blank line after declarations. Signed-off-by: Tobin C. Harding--- net/core/dev.c | 13 + 1 file changed, 13 insertions(+) diff --git a/net/core/dev.c b/net/core/dev.c index 6a076a1..fa63485 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2520,6 +2520,7 @@ u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb, if (dev->num_tc) { u8 tc = netdev_get_prio_tc_map(dev, skb->priority); + qoffset = dev->tc_to_txq[tc].offset; qcount = dev->tc_to_txq[tc].count; } @@ -2734,9 +2735,11 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb) { #ifdef CONFIG_HIGHMEM int i; + if (!(dev->features & NETIF_F_HIGHDMA)) { for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { skb_frag_t *frag = _shinfo(skb)->frags[i]; + if (PageHighMem(skb_frag_page(frag))) return 1; } @@ -2750,6 +2753,7 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb) for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { skb_frag_t *frag = _shinfo(skb)->frags[i]; dma_addr_t addr = page_to_phys(skb_frag_page(frag)); + if (!pdev->dma_mask || addr + PAGE_SIZE - 1 > *pdev->dma_mask) return 1; } @@ -3227,6 +3231,7 @@ static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb) if (queue_index < 0 || skb->ooo_okay || queue_index >= dev->real_num_tx_queues) { int new_index = get_xps_queue(dev, skb); + if (new_index < 0) new_index = skb_tx_hash(dev, skb); @@ -3256,6 +3261,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev, if (dev->real_num_tx_queues != 1) { const struct net_device_ops *ops = dev->netdev_ops; + if (ops->ndo_select_queue) queue_index = ops->ndo_select_queue(dev, skb, accel_priv, __netdev_pick_tx); @@ -3781,6 +3787,7 @@ static int netif_rx_internal(struct sk_buff *skb) #endif { unsigned int qtail; + ret = enqueue_to_backlog(skb, get_cpu(), ); put_cpu(); } @@ -3840,6 +3847,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) while (clist) { 
struct sk_buff *skb = clist; + clist = clist->next; WARN_ON(atomic_read(>users)); @@ -5708,6 +5716,7 @@ static int netdev_adjacent_sysfs_add(struct net_device *dev, struct list_head *dev_list) { char linkname[IFNAMSIZ+7]; + sprintf(linkname, dev_list == >adj_list.upper ? "upper_%s" : "lower_%s", adj_dev->name); return sysfs_create_link(&(dev->dev.kobj), &(adj_dev->dev.kobj), @@ -5718,6 +5727,7 @@ static void netdev_adjacent_sysfs_del(struct net_device *dev, struct list_head *dev_list) { char linkname[IFNAMSIZ+7]; + sprintf(linkname, dev_list == >adj_list.upper ? "upper_%s" : "lower_%s", name); sysfs_remove_link(&(dev->dev.kobj), linkname); @@ -5987,6 +5997,7 @@ void netdev_upper_dev_unlink(struct net_device *dev, struct net_device *upper_dev) { struct netdev_notifier_changeupper_info changeupper_info; + ASSERT_RTNL(); changeupper_info.upper_dev = upper_dev; @@ -6748,6 +6759,7 @@ EXPORT_SYMBOL(dev_change_xdp_fd); static int dev_new_index(struct net *net) { int ifindex = net->ifindex; + for (;;) { if (++ifindex <= 0) ifindex = 1; @@ -7168,6 +7180,7 @@ void netif_tx_stop_all_queues(struct net_device *dev) for (i = 0; i < dev->num_tx_queues; i++) { struct netdev_queue *txq = netdev_get_tx_queue(dev, i); + netif_tx_stop_queue(txq); } } -- 2.7.4
[PATCH 1/4] net: Fix checkpatch WARNING: please, no space before tabs
This patch fixes multiple occurrences of space before tabs warnings. More lines of code were moved than required to keep kernel-doc comments uniform. Signed-off-by: Tobin C. Harding--- net/core/dev.c | 142 - 1 file changed, 71 insertions(+), 71 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 29101c9..2753690 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1,5 +1,5 @@ /* - * NET3Protocol independent device support routines. + * NET3Protocol independent device support routines. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -7,7 +7,7 @@ * 2 of the License, or (at your option) any later version. * * Derived from the non IP parts of dev.c 1.0.19 - * Authors:Ross Biro + * Authors: Ross Biro * Fred N. van Kempen, * Mark Evans, * @@ -21,9 +21,9 @@ * * Changes: * D.J. Barrow : Fixed bug where dev->refcnt gets set - * to 2 if register_netdev gets called - * before net_dev_init & also removed a - * few lines of code in the process. + * to 2 if register_netdev gets called + * before net_dev_init & also removed a + * few lines of code in the process. * Alan Cox: device private ioctl copies fields back. * Alan Cox: Transmit queue code does relevant * stunts to keep the queue safe. @@ -36,7 +36,7 @@ * Alan Cox: 100 backlog just doesn't cut it when * you start doing multicast video 8) * Alan Cox: Rewrote net_bh and list manager. - * Alan Cox: Fix ETH_P_ALL echoback lengths. + * Alan Cox: Fix ETH_P_ALL echoback lengths. * Alan Cox: Took out transmit every packet pass * Saved a few bytes in the ioctl handler * Alan Cox: Network driver sets packet type before @@ -46,7 +46,7 @@ * Richard Kooijman: Timestamp fixes. * Alan Cox: Wrong field in SIOCGIFDSTADDR * Alan Cox: Device lock protection. - * Alan Cox: Fixed nasty side effect of device close + * Alan Cox: Fixed nasty side effect of device close * changes. 
* Rudi Cilibrasi : Pass the right thing to * set_mac_address() @@ -67,8 +67,8 @@ * Paul Rusty Russell : SIOCSIFNAME * Pekka Riikonen : Netdev boot-time settings code * Andrew Morton : Make unregister_netdevice wait - * indefinitely on dev->refcnt - * J Hadi Salim: - Backlog queue sampling + * indefinitely on dev->refcnt + * J Hadi Salim: - Backlog queue sampling * - netif_rx() feedback */ @@ -574,13 +574,13 @@ static int netdev_boot_setup_add(char *name, struct ifmap *map) } /** - * netdev_boot_setup_check - check boot time settings - * @dev: the netdevice + * netdev_boot_setup_check - check boot time settings + * @dev: the netdevice * - * Check boot time settings for the device. - * The found settings are set for the device to be used - * later in the device probing. - * Returns 0 if no settings found, 1 if they are. + * Check boot time settings for the device. + * The found settings are set for the device to be used + * later in the device probing. + * Returns 0 if no settings found, 1 if they are. */ int netdev_boot_setup_check(struct net_device *dev) { @@ -590,10 +590,10 @@ int netdev_boot_setup_check(struct net_device *dev) for (i = 0; i < NETDEV_BOOT_SETUP_MAX; i++) { if (s[i].name[0] != '\0' && s[i].name[0] != ' ' && !strcmp(dev->name, s[i].name)) { - dev->irq= s[i].map.irq; - dev->base_addr = s[i].map.base_addr; - dev->mem_start = s[i].map.mem_start; - dev->mem_end= s[i].map.mem_end; + dev->irq = s[i].map.irq; + dev->base_addr = s[i].map.base_addr; +
[PATCH 2/4] net: Fix checkpatch whitespace errors
This patch fixes two trivial whitespace errors. Brace should be on the previous line and trailing statements should be on next line. Signed-off-by: Tobin C. Harding--- net/core/dev.c | 39 --- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 2753690..471e41a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -192,7 +192,8 @@ static seqcount_t devnet_rename_seq; static inline void dev_base_seq_inc(struct net *net) { - while (++net->dev_base_seq == 0); + while (++net->dev_base_seq == 0) + ; } static inline struct hlist_head *dev_name_hash(struct net *net, const char *name) @@ -274,8 +275,8 @@ EXPORT_PER_CPU_SYMBOL(softnet_data); * register_netdevice() inits txq->_xmit_lock and sets lockdep class * according to dev->type */ -static const unsigned short netdev_lock_type[] = - {ARPHRD_NETROM, ARPHRD_ETHER, ARPHRD_EETHER, ARPHRD_AX25, +static const unsigned short netdev_lock_type[] = { +ARPHRD_NETROM, ARPHRD_ETHER, ARPHRD_EETHER, ARPHRD_AX25, ARPHRD_PRONET, ARPHRD_CHAOS, ARPHRD_IEEE802, ARPHRD_ARCNET, ARPHRD_APPLETLK, ARPHRD_DLCI, ARPHRD_ATM, ARPHRD_METRICOM, ARPHRD_IEEE1394, ARPHRD_EUI64, ARPHRD_INFINIBAND, ARPHRD_SLIP, @@ -291,22 +292,22 @@ static const unsigned short netdev_lock_type[] = ARPHRD_IEEE80211_RADIOTAP, ARPHRD_PHONET, ARPHRD_PHONET_PIPE, ARPHRD_IEEE802154, ARPHRD_VOID, ARPHRD_NONE}; -static const char *const netdev_lock_name[] = - {"_xmit_NETROM", "_xmit_ETHER", "_xmit_EETHER", "_xmit_AX25", -"_xmit_PRONET", "_xmit_CHAOS", "_xmit_IEEE802", "_xmit_ARCNET", -"_xmit_APPLETLK", "_xmit_DLCI", "_xmit_ATM", "_xmit_METRICOM", -"_xmit_IEEE1394", "_xmit_EUI64", "_xmit_INFINIBAND", "_xmit_SLIP", -"_xmit_CSLIP", "_xmit_SLIP6", "_xmit_CSLIP6", "_xmit_RSRVD", -"_xmit_ADAPT", "_xmit_ROSE", "_xmit_X25", "_xmit_HWX25", -"_xmit_PPP", "_xmit_CISCO", "_xmit_LAPB", "_xmit_DDCMP", -"_xmit_RAWHDLC", "_xmit_TUNNEL", "_xmit_TUNNEL6", "_xmit_FRAD", -"_xmit_SKIP", "_xmit_LOOPBACK", "_xmit_LOCALTLK", "_xmit_FDDI", -"_xmit_BIF", 
"_xmit_SIT", "_xmit_IPDDP", "_xmit_IPGRE", -"_xmit_PIMREG", "_xmit_HIPPI", "_xmit_ASH", "_xmit_ECONET", -"_xmit_IRDA", "_xmit_FCPP", "_xmit_FCAL", "_xmit_FCPL", -"_xmit_FCFABRIC", "_xmit_IEEE80211", "_xmit_IEEE80211_PRISM", -"_xmit_IEEE80211_RADIOTAP", "_xmit_PHONET", "_xmit_PHONET_PIPE", -"_xmit_IEEE802154", "_xmit_VOID", "_xmit_NONE"}; +static const char *const netdev_lock_name[] = { + "_xmit_NETROM", "_xmit_ETHER", "_xmit_EETHER", "_xmit_AX25", + "_xmit_PRONET", "_xmit_CHAOS", "_xmit_IEEE802", "_xmit_ARCNET", + "_xmit_APPLETLK", "_xmit_DLCI", "_xmit_ATM", "_xmit_METRICOM", + "_xmit_IEEE1394", "_xmit_EUI64", "_xmit_INFINIBAND", "_xmit_SLIP", + "_xmit_CSLIP", "_xmit_SLIP6", "_xmit_CSLIP6", "_xmit_RSRVD", + "_xmit_ADAPT", "_xmit_ROSE", "_xmit_X25", "_xmit_HWX25", + "_xmit_PPP", "_xmit_CISCO", "_xmit_LAPB", "_xmit_DDCMP", + "_xmit_RAWHDLC", "_xmit_TUNNEL", "_xmit_TUNNEL6", "_xmit_FRAD", + "_xmit_SKIP", "_xmit_LOOPBACK", "_xmit_LOCALTLK", "_xmit_FDDI", + "_xmit_BIF", "_xmit_SIT", "_xmit_IPDDP", "_xmit_IPGRE", + "_xmit_PIMREG", "_xmit_HIPPI", "_xmit_ASH", "_xmit_ECONET", + "_xmit_IRDA", "_xmit_FCPP", "_xmit_FCAL", "_xmit_FCPL", + "_xmit_FCFABRIC", "_xmit_IEEE80211", "_xmit_IEEE80211_PRISM", + "_xmit_IEEE80211_RADIOTAP", "_xmit_PHONET", "_xmit_PHONET_PIPE", + "_xmit_IEEE802154", "_xmit_VOID", "_xmit_NONE"}; static struct lock_class_key netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)]; static struct lock_class_key netdev_addr_lock_key[ARRAY_SIZE(netdev_lock_type)]; -- 2.7.4
[PATCH 3/4] net: Fix checkpatch block comments warnings
Fix multiple occurrences of checkpatch warning. WARNING: Block comments use * on subsequent lines. Also make comment blocks more uniform. Signed-off-by: Tobin C. Harding--- net/core/dev.c | 65 +- 1 file changed, 33 insertions(+), 32 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 471e41a..6a076a1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -353,10 +353,11 @@ static inline void netdev_set_addr_lockdep_class(struct net_device *dev) #endif /*** + * + * Protocol management and registration routines + * + ***/ - Protocol management and registration routines - -***/ /* * Add a protocol ID to the list. Now that the input handler is @@ -539,10 +540,10 @@ void dev_remove_offload(struct packet_offload *po) EXPORT_SYMBOL(dev_remove_offload); /** - - Device Boot-time Settings Routines - -***/ + * + * Device Boot-time Settings Routines + * + **/ /* Boot time configuration table */ static struct netdev_boot_setup dev_boot_setup[NETDEV_BOOT_SETUP_MAX]; @@ -664,10 +665,10 @@ int __init netdev_boot_setup(char *str) __setup("netdev=", netdev_boot_setup); /*** - - Device Interface Subroutines - -***/ + * + * Device Interface Subroutines + * + ***/ /** * dev_get_iflink - get 'iflink' value of a interface @@ -3343,16 +3344,16 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv) } /* The device has no queue. Common case for software devices: - loopback, all the sorts of tunnels... +* loopback, all the sorts of tunnels... - Really, it is unlikely that netif_tx_lock protection is necessary - here. (f.e. loopback and IP tunnels are clean ignoring statistics - counters.) - However, it is possible, that they rely on protection - made by us here. +* Really, it is unlikely that netif_tx_lock protection is necessary +* here. (f.e. loopback and IP tunnels are clean ignoring statistics +* counters.) +* However, it is possible, that they rely on protection +* made by us here. - Check this and shot the lock. It is not prone from deadlocks. 
- Either shot noqueue qdisc, it is even simpler 8) +* Check this and shot the lock. It is not prone from deadlocks. +*Either shot noqueue qdisc, it is even simpler 8) */ if (dev->flags & IFF_UP) { int cpu = smp_processor_id(); /* ok because BHs are off */ @@ -3414,9 +3415,9 @@ int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv) EXPORT_SYMBOL(dev_queue_xmit_accel); -/*=== - Receiver routines - ===*/ +/* + * Receiver routines + */ int netdev_max_backlog __read_mostly = 1000; EXPORT_SYMBOL(netdev_max_backlog); @@ -6448,8 +6449,8 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags) } /* NOTE: order of synchronization of IFF_PROMISC and IFF_ALLMULTI - is important. Some (broken) drivers set IFF_PROMISC, when - IFF_ALLMULTI is requested not asking us and not reporting. +* is important. Some (broken) drivers set IFF_PROMISC, when +* IFF_ALLMULTI is requested not asking us and not reporting. */ if ((flags ^ dev->gflags) & IFF_ALLMULTI) { int inc = (flags & IFF_ALLMULTI) ? 1 : -1; @@ -6813,8 +6814,8 @@ static void rollback_registered_many(struct list_head *head) /* Notify protocols, that we are about to destroy - this device. They should clean all the things. - */ +* this device. They should clean all the things. +*/ call_netdevice_notifiers(NETDEV_UNREGISTER, dev); if (!dev->rtnl_link_ops || @@ -7951,12 +7952,12 @@ int
Re: [net, v3, 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
Hello Florian,

On 09.02.2017 at 01:13, Florian Fainelli wrote:
> The Generic PHY drivers get assigned after we checked that the current
> PHY driver is NULL, so we need to check a few things before we can
> safely dereference d->driver. This would be causing a NULL dereference to
> occur when a system binds to the Generic PHY driver. Update
> phy_attach_direct() to do the following:
>
> - grab the driver module reference after we have assigned the Generic
>   PHY drivers accordingly
>
> - update the error path to clean up the module reference in case the
>   Generic PHY probe function fails
>
> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
> Signed-off-by: Florian Fainelli
> ---
>  drivers/net/phy/phy_device.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)

just stumbled over this bug on an am335x based board, with a
KSZ8081 attached, so there a "fixed-link" is used like:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-baltos-ir3220.dts#n105

With your patch it crashes also ... If I remove this part:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d63d190..9dd08a4 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -921,11 +921,6 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
                 return -EIO;
         }

-        if (!try_module_get(d->driver->owner)) {
-                dev_err(&dev->dev, "failed to get the device driver module\n");
-                return -EIO;
-        }
-
         get_device(d);

         /* Assume that if there is no driver, that it doesn't

it boots again. I think you simply forgot to remove this?

bye,
Heiko

> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 0d8f4d3847f6..d63d190a95ef 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>          struct module *ndev_owner = dev->dev.parent->driver->owner;
>          struct mii_bus *bus = phydev->mdio.bus;
>          struct device *d = &phydev->mdio.dev;
> +        bool using_genphy = false;
>          int err;
>
>          /* For Ethernet device drivers that register their own MDIO bus, we
> @@ -938,12 +939,22 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>                  d->driver = &genphy_driver[GENPHY_DRV_1G].mdiodrv.driver;
>
> +                using_genphy = true;
> +        }
> +
> +        if (!try_module_get(d->driver->owner)) {
> +                dev_err(&dev->dev, "failed to get the device driver module\n");
> +                err = -EIO;
> +                goto error_put_device;
> +        }
> +
> +        if (using_genphy) {
>                  err = d->driver->probe(d);
>                  if (err >= 0)
>                          err = device_bind_driver(d);
>
>                  if (err)
> -                        goto error;
> +                        goto error_module_put;
>          }
>
>          if (phydev->attached_dev) {
> @@ -981,6 +992,9 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>  error:
>          phy_detach(phydev);
> +error_module_put:
> +        module_put(d->driver->owner);
> +error_put_device:
>          put_device(d);
>          module_put(d->driver->owner);
>          if (ndev_owner != bus->owner)

--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
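
For anyone following the error-path part of this thread, the ordering question can be modelled outside the kernel. The sketch below is an assumption-laden toy, not the driver code: toy_state, toy_attach and the two flags are invented, and plain counters stand in for get_device()/put_device() and try_module_get()/module_put(). It only illustrates the idea that each label undoes exactly the steps that succeeded before the jump, in reverse order.

```c
#include <assert.h>
#include <errno.h>

struct toy_state {
	int dev_refs;   /* stands in for the device reference */
	int mod_refs;   /* stands in for the driver module reference */
};

/* module_ok and probe_ok simulate whether each step succeeds. */
int toy_attach(struct toy_state *s, int module_ok, int probe_ok)
{
	int err;

	s->dev_refs++;                  /* take the device reference */

	if (!module_ok) {               /* module reference not available */
		err = -EIO;
		goto err_put_device;
	}
	s->mod_refs++;                  /* module reference now held */

	if (!probe_ok) {                /* probe failed after both were taken */
		err = -ENODEV;
		goto err_module_put;
	}

	return 0;

err_module_put:
	s->mod_refs--;                  /* undo only what was acquired */
err_put_device:
	s->dev_refs--;
	return err;
}
```

A failure at either step leaves both counters where they started, and a success leaves each incremented once; a label that dropped the module reference twice, or dropped it before it was taken, would show up as a negative counter in this model.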
Crashes in -next due to 'net: phy: Fix lack of reference count on PHY driver'
Hi,

I see a number of my qemu tests in -next crash. Affected are tests of
nios2, xtensa, and arm64. The arm64 crash log looks as follows.

[0.734220] Hardware name: ZynqMP EP108 (DT)
[0.734298] task: 80007cb5 task.stack: 80007cb4c000
[0.734533] PC is at phy_attach_direct+0x54/0x1b8
[0.734592] LR is at phy_connect_direct+0x1c/0x70
[0.734643] pc : [] lr : [] pstate: 6045
...
[0.740044] [] phy_attach_direct+0x54/0x1b8
[0.740118] [] phy_connect_direct+0x1c/0x70
[0.740191] [] macb_probe+0x5a8/0x978
[0.740378] [] platform_drv_probe+0x50/0xb8
[0.740449] [] driver_probe_device+0x224/0x2c8
[0.740519] [] __driver_attach+0xac/0xb0
[0.740587] [] bus_for_each_dev+0x60/0xa0
[0.740653] [] driver_attach+0x20/0x28
[0.740716] [] bus_add_driver+0x1d0/0x238
[0.740782] [] driver_register+0x60/0xf8
[0.740849] [] __platform_driver_register+0x40/0x48
[0.740924] [] macb_driver_init+0x18/0x20
[0.740994] [] do_one_initcall+0x38/0x120
[0.741062] [] kernel_init_freeable+0x19c/0x23c
[0.741134] [] kernel_init+0x10/0x100
[0.741199] [] ret_from_fork+0x10/0x50

Detailed logs are available at http://kerneltests.org/builders, in the
'next' column. The scripts used to run the tests are available in the
architecture subdirectories of
https://github.com/groeck/linux-build-test/tree/master/rootfs.

I bisected the nios2 failure; it points to commit cafe8df8b9 ("net: phy:
Fix lack of reference count on PHY driver"). Bisect log is attached below.
Reverting this patch fixes the problem for all affected architectures
in my tests.

Guenter

---
# bad: [e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993] Add linux-next specific files for 20170208
# good: [d5adbfcd5f7bcc6fa58a41c5c5ada0e5c826ce2c] Linux 4.10-rc7
git bisect start 'HEAD' 'v4.10-rc7'
# bad: [403e468309f9e2b2dbe264be1ad29b1486ed720e] Merge remote-tracking branch 'crypto/master'
git bisect bad 403e468309f9e2b2dbe264be1ad29b1486ed720e
# bad: [8c3f07a3ae77164de4405fa97baca4f103f5] Merge remote-tracking branch 'hid/for-next'
git bisect bad 8c3f07a3ae77164de4405fa97baca4f103f5
# bad: [dd4318312c6fc5c00ae7619f875fb73538a2c1e6] Merge remote-tracking branch 'omap/for-next'
git bisect bad dd4318312c6fc5c00ae7619f875fb73538a2c1e6
# good: [52b61c8b3eefd40f8a70131cdc0c7348f18f463c] Merge branch 'next/soc' into for-next
git bisect good 52b61c8b3eefd40f8a70131cdc0c7348f18f463c
# good: [cbd3dcf3b865b961a9c10ff16e42908832cee63f] Merge branch 'next/dt' into for-next
git bisect good cbd3dcf3b865b961a9c10ff16e42908832cee63f
# bad: [66842bac82cae0e9378eea1c54ab9751e32a929b] Merge remote-tracking branch 'arm/for-next'
git bisect bad 66842bac82cae0e9378eea1c54ab9751e32a929b
# bad: [926af6273fc683cd98cd0ce7bf0d04a02eed6742] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 926af6273fc683cd98cd0ce7bf0d04a02eed6742
# bad: [2dcab598484185dea7ec22219c76dcdd59e3cb90] sctp: avoid BUG_ON on sctp_wait_for_sndbuf
git bisect bad 2dcab598484185dea7ec22219c76dcdd59e3cb90
# bad: [89389b4d5524350e74974cf711fe4a18206c09d3] Merge tag 'mac80211-for-davem-2017-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
git bisect bad 89389b4d5524350e74974cf711fe4a18206c09d3
# bad: [34b2cef20f19c87999fff3da4071e66937db9644] ipv4: keep skb->dst around in presence of IP options
git bisect bad 34b2cef20f19c87999fff3da4071e66937db9644
# bad: [cafe8df8b9bc9aa3dffa827c1a6757c6cd36f657] net: phy: Fix lack of reference count on PHY driver
git bisect bad cafe8df8b9bc9aa3dffa827c1a6757c6cd36f657
# good: [770f82253dbd7e6892a88018f2f6cd395f48d214] mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G'
git bisect good 770f82253dbd7e6892a88018f2f6cd395f48d214
# good: [2372bcda5e681bc85d57a3604265155e1a4c040b] Merge branch 'mlx4-queue-reinit'
git bisect good 2372bcda5e681bc85d57a3604265155e1a4c040b
# first bad commit: [cafe8df8b9bc9aa3dffa827c1a6757c6cd36f657] net: phy: Fix lack of reference count on PHY driver

Re: Extending socket timestamping API for NTP
[Resend as plain text]

> On Feb 07, 2017, at 06:01, Miroslav Lichvar wrote:
>
> 5) new SO_TIMESTAMPING options to get transposed RX timestamps
>
>    PTP uses preamble RX timestamps, but NTP works with trailer RX
>    timestamps. This means NTP implementations currently need to
>    transpose HW RX timestamps. The calculation requires the link speed
>    and the length of the packet at layer 2. It seems this can be
>    reliably done only using raw sockets. It would be very nice if the
>    kernel could transpose the timestamps automatically.
>
>    The existing SOF_TIMESTAMPING_RX_HARDWARE flag could be aliased to
>    SOF_TIMESTAMPING_RX_HARDWARE_PREAMBLE and the new flag could be
>    SOF_TIMESTAMPING_RX_HARDWARE_TRAILER.
>
>    PTP has a similar problem with SW RX timestamps, which are closer
>    to the trailer timestamps rather than preamble timestamps. A new
>    SOF_TIMESTAMPING_RX_SOFTWARE_PREAMBLE flag could be added for PTP
>    implementations to get transposed timestamps in order to improve
>    accuracy.
>
> 6) new SO_TIMESTAMPING option to get PHC index with HW timestamps
>
>    With bridges, bonding and other things it's difficult to determine
>    which PHC timestamped the packet. It would be very useful if the
>    PHC index was provided with each HW timestamp.
>
>    I'm not sure what would be the best place to put it. I guess the
>    second timespec in scm_timestamping could be reused for this, but
>    that sounds like a gross hack. Do we need to define a new struct?

Miroslav, if #5 were implemented, would #6 still be needed?

Denny

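For readers following item 5, here is a minimal userspace sketch of the transposition itself, assuming the preamble timestamp marks the start of the layer 2 frame and the trailer timestamp marks its end. The function name and parameters are hypothetical, not an existing kernel or socket API, and where exactly hardware latches the preamble timestamp varies by NIC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): convert a preamble RX timestamp
 * into a trailer RX timestamp.
 *
 * preamble_ns - timestamp taken at the start of the frame, in nanoseconds
 * frame_len   - frame length at layer 2 in bytes (assumed to include the FCS)
 * link_mbps   - link speed in Mbit/s, e.g. 1000 for 1 Gbit/s
 */
static int64_t preamble_to_trailer_ns(int64_t preamble_ns,
				      uint32_t frame_len,
				      uint32_t link_mbps)
{
	int64_t bits = (int64_t)frame_len * 8;

	assert(link_mbps > 0);

	/* bits / (Mbit/s) is microseconds; scale first to stay in ns */
	return preamble_ns + (bits * 1000) / link_mbps;
}
```

This shows why both the link speed and the layer 2 length are needed: a 1518-byte frame adds roughly 121 us at 100 Mbit/s but only about 12 us at 1 Gbit/s.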
[PATCH 1/1] ixgbe: add the external ixgbe fiber transceiver status
When the ixgbe fiber transceiver is external, it is necessary to get
the present/absent status of this external ixgbe fiber transceiver.

The steps to get the present/absent status:
The enp1s0f0 is an external ixgbe fiber NIC.

ethtool enp1s0f0
...
        Port: FIBRE
        PHYAD: 0
        Transceiver: external(present) <---The transceiver is present.
        Auto-negotiation: on
        Supports Wake-on: d
...
Or
...
        Port: FIBRE
        PHYAD: 0
        Transceiver: external(absent) <---The transceiver is absent
        Auto-negotiation: on
        Supports Wake-on: d
...

Signed-off-by: Zhu Yanjun
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 15 +++
 include/uapi/linux/ethtool.h                     |  4 
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index fd192bf..b3f86f4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -313,6 +313,21 @@ static int ixgbe_get_settings(struct net_device *netdev,
 		break;
 	}
 
+	/* When the tranceiver is external, the following is meaningful.
+	 * ecmd->reserved[0] has 3 values:
+	 * 0x0: tranceiver absent
+	 * 0x4: tranceiver present
+	 * others: not support
+	 */
+	if (ecmd->port == PORT_FIBRE) {
+		u32 status = IXGBE_READ_REG(hw, IXGBE_ESDP) & IXGBE_ESDP_SDP2;
+
+		if (status == 0x4)
+			ecmd->transceiver = XCVR_EXTERNAL_PRESENT;
+		if (status == 0x0)
+			ecmd->transceiver = XCVR_EXTERNAL_ABSENT;
+	}
+
 	/* Indicate pause support */
 	ecmd->supported |= SUPPORTED_Pause;
 
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 3dc91a4..8e8225a 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1541,6 +1541,10 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #define XCVR_DUMMY2		0x03
 #define XCVR_DUMMY3		0x04
 
+/* The fiber transceiver status */
+#define XCVR_EXTERNAL_ABSENT	0x05
+#define XCVR_EXTERNAL_PRESENT	0x06
+
 /* Enable or disable autonegotiation. */
 #define AUTONEG_DISABLE		0x00
 #define AUTONEG_ENABLE		0x01
-- 
2.7.4

[PATCH 1/1] ethtool: add the external transceiver status of the ixgbe fiber
When the fiber transceiver of the ixgbe NIC is external, it is sometimes
necessary to get the present/absent status of that transceiver.

The steps to get the present/absent status:
The NIC enp1s0f0 is an external ixgbe fiber NIC.

ethtool enp1s0f0
...
        Port: FIBRE
        PHYAD: 0
        Transceiver: external(present) <---The transceiver is present.
        Auto-negotiation: on
        Supports Wake-on: d
...
Or
...
        Port: FIBRE
        PHYAD: 0
        Transceiver: external(absent) <---The transceiver is absent
        Auto-negotiation: on
        Supports Wake-on: d
...

Signed-off-by: Zhu Yanjun
---
 ethtool-copy.h | 2 ++
 ethtool.c      | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 3d299e3..1c6db9a 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -1536,6 +1536,8 @@ static __inline__ int ethtool_validate_duplex(__u8 duplex)
 #define XCVR_DUMMY1		0x02
 #define XCVR_DUMMY2		0x03
 #define XCVR_DUMMY3		0x04
+#define XCVR_EXTERNAL_ABSENT	0x05
+#define XCVR_EXTERNAL_PRESENT	0x06
 
 /* Enable or disable autonegotiation. */
 #define AUTONEG_DISABLE		0x00
diff --git a/ethtool.c b/ethtool.c
index 7af039e..85cf5a2 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -811,6 +811,12 @@ dump_link_usettings(const struct ethtool_link_usettings *link_usettings)
 	case XCVR_EXTERNAL:
 		fprintf(stdout, "external\n");
 		break;
+	case XCVR_EXTERNAL_PRESENT:
+		fprintf(stdout, "external(present)\n");
+		break;
+	case XCVR_EXTERNAL_ABSENT:
+		fprintf(stdout, "external(absent)\n");
+		break;
 	default:
 		fprintf(stdout, "Unknown!\n");
 		break;
-- 
2.7.4

[PATCHv2 iproute2 net-next] man: ip-link.8: Document bridge_slave fdb_flush option
Signed-off-by: Hangbin Liu
---
 man/man8/ip-link.8.in | 5 +
 1 file changed, 5 insertions(+)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 469bb43..651a255 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1429,6 +1429,8 @@ the following additional arguments are supported:
 .B "ip link set type bridge_slave"
 [
+.B fdb_flush
+] [
 .BI state " STATE"
 ] [
 .BI priority " PRIO"
@@ -1459,6 +1461,9 @@ the following additional arguments are supported:
 .in +8
 .sp
+.B fdb_flush
+- flush bridge slave's fdb dynamic entries.
+
 .BI state " STATE"
 - Set port state.
 .I STATE
-- 
2.5.5

[PATCH net] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
The Generic PHY drivers get assigned after we checked that the current
PHY driver is NULL, so we need to check a few things before we can
safely dereference d->driver. This would be causing a NULL dereference to
occur when a system binds to the Generic PHY driver. Update
phy_attach_direct() to do the following:

- grab the driver module reference after we have assigned the Generic
  PHY drivers accordingly, and remember we came from the generic PHY
  path

- update the error path to clean up the module reference in case the
  Generic PHY probe function fails

- split the error path involving phy_detach() to avoid double free/put
  since phy_detach() does all the clean up

- finally, have phy_detach() drop the module reference count before we
  call device_release_driver() for the Generic PHY driver case

Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
Signed-off-by: Florian Fainelli
---
David,

This is applicable to the "net" and the "net-next" tree since you
merged "net" into "net-next".

I will fix the PHY driver bind/unbind mess another time, because we are
running out of time for 4.10-rc final, and it's not like it worked before
and got broken in this cycle, it just never worked (the bind/unbind).

Thanks!

 drivers/net/phy/phy_device.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0d8f4d3847f6..8c8e15b8739d 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	struct module *ndev_owner = dev->dev.parent->driver->owner;
 	struct mii_bus *bus = phydev->mdio.bus;
 	struct device *d = &phydev->mdio.dev;
+	bool using_genphy = false;
 	int err;
 
 	/* For Ethernet device drivers that register their own MDIO bus, we
@@ -920,11 +921,6 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 		return -EIO;
 	}
 
-	if (!try_module_get(d->driver->owner)) {
-		dev_err(&dev->dev, "failed to get the device driver module\n");
-		return -EIO;
-	}
-
 	get_device(d);
 
 	/* Assume that if there is no driver, that it doesn't
@@ -938,12 +934,22 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 			d->driver =
 				&genphy_driver[GENPHY_DRV_1G].mdiodrv.driver;
 
+		using_genphy = true;
+	}
+
+	if (!try_module_get(d->driver->owner)) {
+		dev_err(&dev->dev, "failed to get the device driver module\n");
+		err = -EIO;
+		goto error_put_device;
+	}
+
+	if (using_genphy) {
 		err = d->driver->probe(d);
 		if (err >= 0)
 			err = device_bind_driver(d);
 
 		if (err)
-			goto error;
+			goto error_module_put;
 	}
 
 	if (phydev->attached_dev) {
@@ -980,9 +986,14 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	return err;
 
 error:
+	/* phy_detach() does all of the cleanup below */
 	phy_detach(phydev);
-	put_device(d);
+	return err;
+
+error_module_put:
 	module_put(d->driver->owner);
+error_put_device:
+	put_device(d);
 	if (ndev_owner != bus->owner)
 		module_put(bus->owner);
 	return err;
@@ -1045,6 +1056,8 @@ void phy_detach(struct phy_device *phydev)
 
 	phy_led_triggers_unregister(phydev);
 
+	module_put(phydev->mdio.dev.driver->owner);
+
 	/* If the device had no specific driver before (i.e. - it
 	 * was using the generic driver), we unbind the device
 	 * from the generic driver so that there's a chance a
@@ -1065,7 +1078,6 @@ void phy_detach(struct phy_device *phydev)
 	bus = phydev->mdio.bus;
 
 	put_device(&phydev->mdio.dev);
-	module_put(phydev->mdio.dev.driver->owner);
 	if (ndev_owner != bus->owner)
 		module_put(bus->owner);
 }
-- 
2.9.3

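As an illustration only, the unwind order this patch sets up in phy_attach_direct() can be modelled in userspace with two counters standing in for the device and module references. Everything below (struct refs, enum fail_at, attach_sketch) is a hypothetical stand-in, not kernel code; the point is that each error label releases exactly what was acquired before the failing step, in reverse order.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device and driver-module reference counts. */
struct refs {
	int device;
	int module;
};

/* Which step of the attach sequence should fail (for error injection). */
enum fail_at { FAIL_NONE, FAIL_MODULE_GET, FAIL_PROBE };

static int attach_sketch(struct refs *r, enum fail_at fail)
{
	int err;

	assert(r != NULL);

	r->device++;			/* models get_device(d) */

	if (fail == FAIL_MODULE_GET) {	/* models try_module_get() failing */
		err = -EIO;
		goto error_put_device;
	}
	r->module++;			/* module reference now held */

	if (fail == FAIL_PROBE) {	/* models probe/bind failing */
		err = -ENODEV;
		goto error_module_put;
	}

	return 0;

error_module_put:
	r->module--;			/* models module_put() */
error_put_device:
	r->device--;			/* models put_device(d) */
	return err;
}
```

Injecting a failure at each step and checking that both counters return to zero is the same kind of check described in the follow-up mail about testing all error paths.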
Re: [PATCH net] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
On 02/08/2017 07:05 PM, Florian Fainelli wrote:
> The Generic PHY drivers get assigned after we checked that the current
> PHY driver is NULL, so we need to check a few things before we can
> safely dereference d->driver. This would be causing a NULL dereference to
> occur when a system binds to the Generic PHY driver. Update
> phy_attach_direct() to do the following:
>
> - grab the driver module reference after we have assigned the Generic
>   PHY drivers accordingly, and remember we came from the generic PHY
>   path
>
> - update the error path to clean up the module reference in case the
>   Generic PHY probe function fails
>
> - split the error path involving phy_detach() to avoid double free/put
>   since phy_detach() does all the clean up
>
> - finally, have phy_detach() drop the module reference count before we
>   call device_release_driver() for the Generic PHY driver case
>
> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
> Signed-off-by: Florian Fainelli

Just FWIW, this time I tested all error paths in phy_attach_direct() by
directly injecting errors, and did that with both the Generic PHY driver
and another driver to make sure there were no reference count problems,
nor double frees.

Thanks all!

> ---
> David,
>
> This is applicable to the "net" and the "net-next" tree since you
> merged "net" into "net-next".
>
> I will fix the PHY driver bind/unbind mess another time, because we are
> running
> out of time for 4.10-rc final, and it's not like it worked before and got
> broken in this cycle, it just never worked (the bind/unbind).
>
> Thanks!
> > drivers/net/phy/phy_device.c | 28 > 1 file changed, 20 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 0d8f4d3847f6..8c8e15b8739d 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct > phy_device *phydev, > struct module *ndev_owner = dev->dev.parent->driver->owner; > struct mii_bus *bus = phydev->mdio.bus; > struct device *d = >mdio.dev; > + bool using_genphy = false; > int err; > > /* For Ethernet device drivers that register their own MDIO bus, we > @@ -920,11 +921,6 @@ int phy_attach_direct(struct net_device *dev, struct > phy_device *phydev, > return -EIO; > } > > - if (!try_module_get(d->driver->owner)) { > - dev_err(>dev, "failed to get the device driver module\n"); > - return -EIO; > - } > - > get_device(d); > > /* Assume that if there is no driver, that it doesn't > @@ -938,12 +934,22 @@ int phy_attach_direct(struct net_device *dev, struct > phy_device *phydev, > d->driver = > _driver[GENPHY_DRV_1G].mdiodrv.driver; > > + using_genphy = true; > + } > + > + if (!try_module_get(d->driver->owner)) { > + dev_err(>dev, "failed to get the device driver module\n"); > + err = -EIO; > + goto error_put_device; > + } > + > + if (using_genphy) { > err = d->driver->probe(d); > if (err >= 0) > err = device_bind_driver(d); > > if (err) > - goto error; > + goto error_module_put; > } > > if (phydev->attached_dev) { > @@ -980,9 +986,14 @@ int phy_attach_direct(struct net_device *dev, struct > phy_device *phydev, > return err; > > error: > + /* phy_detach() does all of the cleanup below */ > phy_detach(phydev); > - put_device(d); > + return err; > + > +error_module_put: > module_put(d->driver->owner); > +error_put_device: > + put_device(d); > if (ndev_owner != bus->owner) > module_put(bus->owner); > return err; > @@ -1045,6 +1056,8 @@ void phy_detach(struct phy_device *phydev) > > 
phy_led_triggers_unregister(phydev); > > + module_put(phydev->mdio.dev.driver->owner); > + > /* If the device had no specific driver before (i.e. - it >* was using the generic driver), we unbind the device >* from the generic driver so that there's a chance a > @@ -1065,7 +1078,6 @@ void phy_detach(struct phy_device *phydev) > bus = phydev->mdio.bus; > > put_device(>mdio.dev); > - module_put(phydev->mdio.dev.driver->owner); > if (ndev_owner != bus->owner) > module_put(bus->owner); > } > -- Florian
Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
On 2/8/17 12:40 PM, David Ahern wrote:
> On 2/8/17 3:52 AM, Daniel Borkmann wrote:
>> for cBPF dumps it looks like this in ss. Can you tell me what these
>> 11 insns do? Likely you can, but can a normal admin?
>>
>> # ss -0 -b
>> Netid  Recv-Q Send-Q  Local Address:Port  Peer Address:Port
>> p_raw  0      0       *:em1               *
>> bpf filter (11): 0x28 0 0 12, 0x15 0 8 2048, 0x30 0 0 23, 0x15 0 6
>> 17, 0x28 0 0 20, 0x45 4 0 8191, 0xb1 0 0 14, 0x48 0 0 16, 0x15 0 1 68,
>> 0x06 0 0 4294967295, 0x06 0 0 0,
> ...
>
> It's not rocket science. We should be able to write tools that do the
> same for bpf as objdump does for assembly. It is a matter of someone
> having the need and taking the initiative. BTW, the bpf option was added

After just a couple of hours of hacking this afternoon, leveraging some
of the verifier code in the kernel, here is the above bpf filter in more
human-friendly terms:

BPF_LD | BPF_ABS | BPF_H 0xc : val = *(u16 *)skb[12]
BPF_JMP | BPF_JEQ | BPF_K 0 8 0x800 : if !(val == 0x800) goto pc+8
BPF_LD | BPF_ABS | BPF_B 0x17 : val = *(u8 *)skb[23]
BPF_JMP | BPF_JEQ | BPF_K 0 6 0x11 : if !(val == 0x11) goto pc+6
BPF_LD | BPF_ABS | BPF_H 0x14 : val = *(u16 *)skb[20]
BPF_JMP | BPF_JSET | BPF_K 4 0 0x1fff : if ((val & 0x1fff) != 0) goto pc+4
BPF_LDX | BPF_MSH | BPF_B 0xe :
BPF_LD | BPF_IND | BPF_H 0x10 : val = *(u16 *)skb[16]
BPF_JMP | BPF_JEQ | BPF_K 0 1 0x44 : if !(val == 0x44) goto pc+1
BPF_RET : ret
BPF_RET 0 : ret 0

(long lines so I chopped the reprint of the hex on the left)

That said, verifying that the program attached to a cgroup is correct
for a VRF does not require it to be pretty printed or viewed by humans.
I can automate the checks on namespace id and device index.

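For anyone wanting to reproduce the left-hand column of that listing, the following is a rough userspace sketch of splitting a classic BPF opcode into its class, mode and size fields. The mask and field values follow the classic BPF opcode encoding; the helper names (cbpf_class_name() and friends) are made up for this example and are not the tool mentioned in the mail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Field masks of a classic BPF opcode. */
#define CBPF_CLASS(code)	((code) & 0x07)
#define CBPF_SIZE(code)		((code) & 0x18)
#define CBPF_MODE(code)		((code) & 0xe0)

static const char *cbpf_class_name(uint16_t code)
{
	static const char *const names[8] = {
		"BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
		"BPF_ALU", "BPF_JMP", "BPF_RET", "BPF_MISC",
	};

	return names[CBPF_CLASS(code)];
}

/* Size field, meaningful for the load classes. */
static const char *cbpf_size_name(uint16_t code)
{
	switch (CBPF_SIZE(code)) {
	case 0x00: return "BPF_W";
	case 0x08: return "BPF_H";
	case 0x10: return "BPF_B";
	default:   return "?";
	}
}

/* Addressing mode field, meaningful for the load classes. */
static const char *cbpf_mode_name(uint16_t code)
{
	switch (CBPF_MODE(code)) {
	case 0x00: return "BPF_IMM";
	case 0x20: return "BPF_ABS";
	case 0x40: return "BPF_IND";
	case 0x60: return "BPF_MEM";
	case 0x80: return "BPF_LEN";
	case 0xa0: return "BPF_MSH";
	default:   return "?";
	}
}

/* Format a load opcode as "CLASS | MODE | SIZE"; returns snprintf()'s result. */
static int cbpf_format_load(uint16_t code, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "%s | %s | %s", cbpf_class_name(code),
			cbpf_mode_name(code), cbpf_size_name(code));
}
```

Feeding it the first opcode of the ss dump, 0x28, yields the BPF_LD, BPF_ABS and BPF_H fields shown on the first line of the listing; jump and return opcodes would need their own field decoding, which this sketch leaves out.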
[RFC PATCH net-next 1/1] net: rmnet_data: Initial implementation
RmNet Data driver provides a transport agnostic MAP (multiplexing and aggregation protocol) support in embedded and bridge modes. Module provides virtual network devices which can be attached to any IP-mode physical device. This will be used to provide all MAP functionality on future hardware in a single consistent location. Signed-off-by: Subash Abhinov Kasiviswanathan--- Documentation/networking/rmnet_data.txt | 82 +++ include/uapi/linux/Kbuild |2 + include/uapi/linux/if_arp.h |1 + include/uapi/linux/if_ether.h |3 +- include/uapi/linux/rmnet_data.h | 416 +++ net/Kconfig |1 + net/Makefile|1 + net/rmnet_data/Kconfig | 21 + net/rmnet_data/Makefile | 14 + net/rmnet_data/rmnet_data_config.c | 1149 +++ net/rmnet_data/rmnet_data_config.h | 123 net/rmnet_data/rmnet_data_handlers.c| 560 +++ net/rmnet_data/rmnet_data_handlers.h| 24 + net/rmnet_data/rmnet_data_main.c| 60 ++ net/rmnet_data/rmnet_data_private.h | 76 ++ net/rmnet_data/rmnet_data_stats.c | 86 +++ net/rmnet_data/rmnet_data_stats.h | 61 ++ net/rmnet_data/rmnet_data_trace.h | 183 + net/rmnet_data/rmnet_data_vnd.c | 602 net/rmnet_data/rmnet_data_vnd.h | 40 ++ net/rmnet_data/rmnet_map.h | 148 net/rmnet_data/rmnet_map_command.c | 180 + net/rmnet_data/rmnet_map_data.c | 154 + 23 files changed, 3986 insertions(+), 1 deletion(-) create mode 100644 Documentation/networking/rmnet_data.txt create mode 100644 include/uapi/linux/rmnet_data.h create mode 100644 net/rmnet_data/Kconfig create mode 100644 net/rmnet_data/Makefile create mode 100644 net/rmnet_data/rmnet_data_config.c create mode 100644 net/rmnet_data/rmnet_data_config.h create mode 100644 net/rmnet_data/rmnet_data_handlers.c create mode 100644 net/rmnet_data/rmnet_data_handlers.h create mode 100644 net/rmnet_data/rmnet_data_main.c create mode 100644 net/rmnet_data/rmnet_data_private.h create mode 100644 net/rmnet_data/rmnet_data_stats.c create mode 100644 net/rmnet_data/rmnet_data_stats.h create mode 100644 net/rmnet_data/rmnet_data_trace.h create mode 100644 
net/rmnet_data/rmnet_data_vnd.c
 create mode 100644 net/rmnet_data/rmnet_data_vnd.h
 create mode 100644 net/rmnet_data/rmnet_map.h
 create mode 100644 net/rmnet_data/rmnet_map_command.c
 create mode 100644 net/rmnet_data/rmnet_map_data.c

diff --git a/Documentation/networking/rmnet_data.txt b/Documentation/networking/rmnet_data.txt
new file mode 100644
index 000..ff6cce8
--- /dev/null
+++ b/Documentation/networking/rmnet_data.txt
@@ -0,0 +1,82 @@
+1. Introduction
+
+rmnet_data driver is used for supporting the Multiplexing and aggregation
+Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm
+Technologies, Inc. modems.
+
+This driver can be used to register onto any physical network device in
+IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.
+
+Multiplexing allows for creation of logical netdevices (rmnet_data devices) to
+handle multiple private data networks (PDN) like a default internet, tethering,
+multimedia messaging service (MMS) or IP media subsystem (IMS). Hardware sends
+packets with MAP headers to rmnet_data. Based on the multiplexer id, rmnet_data
+routes to the appropriate PDN after removing the MAP header.
+
+Aggregation is required to achieve high data rates. This involves hardware
+sending aggregated bunch of MAP frames. rmnet_data driver will de-aggregate
+these MAP frames and send them to appropriate PDN's.
+
+2. Packet format
+
+a. MAP packet (data / control)
+
+MAP header has the same endianness of the IP packet.
+
+Packet format -
+
+Bit             0             1           2-7      8 - 15           16 - 31
+Function   Command / Data   Reserved     Pad   Multiplexer ID    Payload length
+Bit            32 - x
+Function     Raw  Bytes
+
+Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command
+or data packet. Control packet is used for transport level flow control. Data
+packets are standard IP packets.
+
+Reserved bits are usually zeroed out and to be ignored by receiver.
+
+Padding is number of bytes to be added for 4 byte alignment if required by
+hardware.
+
+Multiplexer ID is to indicate the PDN on which data has to be sent.
+
+Payload length includes the padding length but does not include MAP header
+length.
+
+b. MAP packet (command specific)
+
+Bit             0             1           2-7      8 - 15           16 - 31
+Function     Command        Reserved     Pad   Multiplexer ID    Payload length
+Bit            32 - 39        40 - 45    46 - 47       48 - 63
+Function   Command name    Reserved   Command Type   Reserved
+Bit
[RFC PATCH net-next 0/1] net: Add support for rmnet_data driver
This patch adds support for the rmnet_data driver which is required to support recent chipsets using Qualcomm Technologies, Inc. modems. The data from hardware follows the multiplexing and aggregation protocol (MAP). This driver can be used to register onto any physical network device in IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator. rmnet_data driver helps to decode these packets and queue them to network stack (and encode and transmit it to the physical device). Subash Abhinov Kasiviswanathan (1): net: rmnet_data: Initial implementation Documentation/networking/rmnet_data.txt | 82 +++ include/uapi/linux/Kbuild |2 + include/uapi/linux/if_arp.h |1 + include/uapi/linux/if_ether.h |3 +- include/uapi/linux/rmnet_data.h | 416 +++ net/Kconfig |1 + net/Makefile|1 + net/rmnet_data/Kconfig | 21 + net/rmnet_data/Makefile | 14 + net/rmnet_data/rmnet_data_config.c | 1149 +++ net/rmnet_data/rmnet_data_config.h | 123 net/rmnet_data/rmnet_data_handlers.c| 560 +++ net/rmnet_data/rmnet_data_handlers.h| 24 + net/rmnet_data/rmnet_data_main.c| 60 ++ net/rmnet_data/rmnet_data_private.h | 76 ++ net/rmnet_data/rmnet_data_stats.c | 86 +++ net/rmnet_data/rmnet_data_stats.h | 61 ++ net/rmnet_data/rmnet_data_trace.h | 183 + net/rmnet_data/rmnet_data_vnd.c | 602 net/rmnet_data/rmnet_data_vnd.h | 40 ++ net/rmnet_data/rmnet_map.h | 148 net/rmnet_data/rmnet_map_command.c | 180 + net/rmnet_data/rmnet_map_data.c | 154 + 23 files changed, 3986 insertions(+), 1 deletion(-) create mode 100644 Documentation/networking/rmnet_data.txt create mode 100644 include/uapi/linux/rmnet_data.h create mode 100644 net/rmnet_data/Kconfig create mode 100644 net/rmnet_data/Makefile create mode 100644 net/rmnet_data/rmnet_data_config.c create mode 100644 net/rmnet_data/rmnet_data_config.h create mode 100644 net/rmnet_data/rmnet_data_handlers.c create mode 100644 net/rmnet_data/rmnet_data_handlers.h create mode 100644 net/rmnet_data/rmnet_data_main.c create mode 100644 
net/rmnet_data/rmnet_data_private.h create mode 100644 net/rmnet_data/rmnet_data_stats.c create mode 100644 net/rmnet_data/rmnet_data_stats.h create mode 100644 net/rmnet_data/rmnet_data_trace.h create mode 100644 net/rmnet_data/rmnet_data_vnd.c create mode 100644 net/rmnet_data/rmnet_data_vnd.h create mode 100644 net/rmnet_data/rmnet_map.h create mode 100644 net/rmnet_data/rmnet_map_command.c create mode 100644 net/rmnet_data/rmnet_map_data.c -- 1.9.1
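To make the decode step mentioned in the cover letter a bit more concrete, here is a small userspace sketch that parses the 4-byte MAP data header as laid out in the patch's documentation (command/data bit, reserved bit, 6-bit pad, 8-bit multiplexer ID, 16-bit payload length). It assumes that "bit 0" is the most significant bit of the first byte and that the length is in network byte order; both are assumptions of this sketch, and struct map_hdr_sketch and map_parse_sketch() are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the 4-byte MAP header. */
struct map_hdr_sketch {
	uint8_t  is_command;	/* bit 0: 1 = command, 0 = data */
	uint8_t  pad_len;	/* bits 2-7: padding bytes for alignment */
	uint8_t  mux_id;	/* bits 8-15: multiplexer ID (selects the PDN) */
	uint16_t payload_len;	/* bits 16-31: includes padding, excludes header */
};

/* Returns 0 on success, -1 if the buffer is too short for a header. */
static int map_parse_sketch(const uint8_t *buf, size_t len,
			    struct map_hdr_sketch *h)
{
	assert(h != NULL);

	if (buf == NULL || len < 4)
		return -1;

	h->is_command = buf[0] >> 7;		/* assumed: bit 0 is the MSB */
	h->pad_len = buf[0] & 0x3f;		/* low six bits of byte 0 */
	h->mux_id = buf[1];
	h->payload_len = (uint16_t)((buf[2] << 8) | buf[3]);	/* big endian */

	return 0;
}
```

A receiver would then strip the header, hand payload_len minus pad_len bytes to the logical device selected by mux_id, and repeat for the next frame in an aggregated buffer.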
[PATCH v3 net-next 10/10] openvswitch: Pack struct sw_flow_key.
struct sw_flow_key has two 16-bit holes. Move the most matched conntrack match fields there. In some typical cases this reduces the size of the key that needs to be hashed into half and into one cache line. Signed-off-by: Jarno Rajahalme--- net/openvswitch/conntrack.c| 40 net/openvswitch/conntrack.h| 8 net/openvswitch/flow.h | 14 -- net/openvswitch/flow_netlink.c | 11 +++ 4 files changed, 39 insertions(+), 34 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index de47782..a12825e 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -154,7 +154,7 @@ static void __ovs_ct_update_key_orig_tp(struct sw_flow_key *key, const struct nf_conntrack_tuple *orig, u8 icmp_proto) { - key->ct.orig_proto = orig->dst.protonum; + key->ct_orig_proto = orig->dst.protonum; if (orig->dst.protonum == icmp_proto) { key->ct.orig_tp.src = htons(orig->dst.u.icmp.type); key->ct.orig_tp.dst = htons(orig->dst.u.icmp.code); @@ -168,8 +168,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, const struct nf_conntrack_zone *zone, const struct nf_conn *ct) { - key->ct.state = state; - key->ct.zone = zone->id; + key->ct_state = state; + key->ct_zone = zone->id; key->ct.mark = ovs_ct_get_mark(ct); ovs_ct_get_labels(ct, >ct.labels); @@ -197,10 +197,10 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, return; } } - /* Clear 'ct.orig_proto' to mark the non-existence of conntrack + /* Clear 'ct_orig_proto' to mark the non-existence of conntrack * original direction key fields. */ - key->ct.orig_proto = 0; + key->ct_orig_proto = 0; } /* Update 'key' based on skb->_nfct. 
If 'post_ct' is true, then OVS has @@ -230,7 +230,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb, if (ct->master) state |= OVS_CS_F_RELATED; if (keep_nat_flags) { - state |= key->ct.state & OVS_CS_F_NAT_MASK; + state |= key->ct_state & OVS_CS_F_NAT_MASK; } else { if (ct->status & IPS_SRC_NAT) state |= OVS_CS_F_SRC_NAT; @@ -261,11 +261,11 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key) int ovs_ct_put_key(const struct sw_flow_key *swkey, const struct sw_flow_key *output, struct sk_buff *skb) { - if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state)) + if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct_state)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) && - nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone)) + nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct_zone)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) && @@ -277,14 +277,14 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey, >ct.labels)) return -EMSGSIZE; - if (swkey->ct.orig_proto) { + if (swkey->ct_orig_proto) { if (swkey->eth.type == htons(ETH_P_IP)) { struct ovs_key_ct_tuple_ipv4 orig = { output->ipv4.ct_orig.src, output->ipv4.ct_orig.dst, output->ct.orig_tp.src, output->ct.orig_tp.dst, - output->ct.orig_proto, + output->ct_orig_proto, }; if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, sizeof(orig), )) @@ -295,7 +295,7 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey, IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst), output->ct.orig_tp.src, output->ct.orig_tp.dst, - output->ct.orig_proto, + output->ct_orig_proto, }; if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, sizeof(orig), )) @@ -614,11 +614,11 @@ static bool skb_nfct_cached(struct net *net, * due to an upcall. If the connection was not confirmed, it is not * cached and needs to be run through conntrack again. */ - if (!ct && key->ct.state & OVS_CS_F_TRACKED && - !(key->ct.state & OVS_CS_F_INVALID) && - key->ct.zone
[PATCH v3 net-next 04/10] openvswitch: Unionize ovs_key_ct_label with a u32 array.
Make the array of labels in struct ovs_key_ct_labels a union, adding a u32 array of the same byte size as the existing u8 array. It is faster to loop through the labels 32 bits at a time, which is also the alignment of netlink attributes.

Signed-off-by: Jarno Rajahalme
---
 include/uapi/linux/openvswitch.h |  8 ++--
 net/openvswitch/conntrack.c      | 15 ---
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 375d812..96aee34 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -446,9 +446,13 @@ struct ovs_key_nd {
 	__u8	nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN	16
+#define OVS_CT_LABELS_LEN_32	4
+#define OVS_CT_LABELS_LEN	(OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-	__u8	ct_labels[OVS_CT_LABELS_LEN];
+	union {
+		__u8	ct_labels[OVS_CT_LABELS_LEN];
+		__u32	ct_labels_32[OVS_CT_LABELS_LEN_32];
+	};
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a6ff374..f23934c 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -281,20 +281,21 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
 		/* Triggers a change event, which makes sense only for
 		 * confirmed connections.
 		 */
-		int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-						OVS_CT_LABELS_LEN / sizeof(u32));
+		int err = nf_connlabels_replace(ct, labels->ct_labels_32,
+						mask->ct_labels_32,
+						OVS_CT_LABELS_LEN_32);
 		if (err)
 			return err;
 	} else {
 		u32 *dst = (u32 *)cl->bits;
-		const u32 *msk = (const u32 *)mask->ct_labels;
-		const u32 *lbl = (const u32 *)labels->ct_labels;
+		const u32 *msk = mask->ct_labels_32;
+		const u32 *lbl = labels->ct_labels_32;
 		int i;
 
 		/* No-one else has access to the non-confirmed entry, copy
 		 * labels over, keeping any bits we are not explicitly setting.
 		 */
-		for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+		for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
 			dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
 	}
 
@@ -866,8 +867,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
 {
 	size_t i;
 
-	for (i = 0; i < sizeof(*labels); i++)
-		if (labels->ct_labels[i])
+	for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+		if (labels->ct_labels_32[i])
 			return true;
 
 	return false;
-- 
2.1.4
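For readers who want to try the idea outside the kernel, here is a minimal userspace sketch of the same union layout and the word-wise masked merge. The names ct_labels, ct_labels_merge and ct_labels_nonzero are illustrative stand-ins and are not part of the patch; it assumes a C11 compiler for the anonymous union.

```c
#include <assert.h>
#include <stdint.h>

#define CT_LABELS_LEN_32 4
#define CT_LABELS_LEN    (CT_LABELS_LEN_32 * sizeof(uint32_t))

/* Illustrative stand-in for struct ovs_key_ct_labels: the same 16 bytes
 * viewed either as bytes or as 32-bit words.
 */
struct ct_labels {
	union {
		uint8_t  bytes[CT_LABELS_LEN];
		uint32_t words[CT_LABELS_LEN_32];
	};
};

static_assert(sizeof(struct ct_labels) == CT_LABELS_LEN,
	      "both views must cover the same 16 bytes");

/* Copy the masked label bits into dst, keeping the bits outside the mask. */
static void ct_labels_merge(struct ct_labels *dst,
			    const struct ct_labels *labels,
			    const struct ct_labels *mask)
{
	int i;

	for (i = 0; i < CT_LABELS_LEN_32; i++)
		dst->words[i] = (dst->words[i] & ~mask->words[i]) |
				(labels->words[i] & mask->words[i]);
}

/* Word-wise check, four iterations instead of sixteen. */
static int ct_labels_nonzero(const struct ct_labels *labels)
{
	int i;

	for (i = 0; i < CT_LABELS_LEN_32; i++)
		if (labels->words[i])
			return 1;
	return 0;
}
```

The union avoids the pointer casts that the old code needed to reach the 32-bit view.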
[PATCH v3 net-next 07/10] openvswitch: Inherit master's labels.
We avoid calling into nf_conntrack_in() for expected connections, as that would remove the expectation that we want to stick around until we are ready to commit the connection. Instead, we do a lookup in the expectation table directly. However, after a successful expectation lookup we have set the flow key label field from the master connection, whereas nf_conntrack_in() does not do this. This leads to master's labels being inherited after an expectation lookup, but those labels not being inherited after the corresponding conntrack action with a commit flag.

This patch resolves the problem by changing the commit code path to also inherit the master's labels to the expected connection. Resolving this conflict in favor of inheriting the labels allows more information to be passed from the master connection to related connections, which would otherwise be much harder if the 32 bits in the connmark are not enough. Labels can still be set explicitly, so this change only affects the default values of the labels in the presence of a master connection.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Jarno Rajahalme--- net/openvswitch/conntrack.c | 45 +++-- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 0e038ee..5fbadcd 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -73,6 +73,8 @@ struct ovs_conntrack_info { #endif }; +static bool labels_nonzero(const struct ovs_key_ct_labels *labels); + static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info); static u16 key_to_nfproto(const struct sw_flow_key *key) @@ -272,18 +274,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key, const struct ovs_key_ct_labels *labels, const struct ovs_key_ct_labels *mask) { - struct nf_conn_labels *cl; - u32 *dst; - int i; + struct nf_conn_labels *cl, *master_cl; + bool have_mask = labels_nonzero(mask); + + /* Inherit master's labels to the related connection? */ + master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL; + + if (!master_cl && !have_mask) + return 0; /* Nothing to do. */ cl = ovs_ct_get_conn_labels(ct); if (!cl) return -ENOSPC; - dst = (u32 *)cl->bits; - for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) - dst[i] = (dst[i] & ~mask->ct_labels_32[i]) | - (labels->ct_labels_32[i] & mask->ct_labels_32[i]); + /* Inherit the master's labels, if any. 
*/ + if (master_cl) + *cl = *master_cl; + + if (have_mask) { + u32 *dst = (u32 *)cl->bits; + int i; + + for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) + dst[i] = (dst[i] & ~mask->ct_labels_32[i]) | + (labels->ct_labels_32[i] +& mask->ct_labels_32[i]); + } memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN); @@ -911,13 +927,14 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, if (err) return err; } - if (labels_nonzero(>labels.mask)) { - if (!nf_ct_is_confirmed(ct)) - err = ovs_ct_init_labels(ct, key, >labels.value, ->labels.mask); - else - err = ovs_ct_set_labels(ct, key, >labels.value, - >labels.mask); + if (!nf_ct_is_confirmed(ct)) { + err = ovs_ct_init_labels(ct, key, >labels.value, +>labels.mask); + if (err) + return err; + } else if (labels_nonzero(>labels.mask)) { + err = ovs_ct_set_labels(ct, key, >labels.value, + >labels.mask); if (err) return err; } -- 2.1.4
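The ordering in the patched ovs_ct_init_labels() (inherit first, then apply the explicit mask) can be sketched in plain C as follows. This is only an illustration under simplifying assumptions: labels are bare uint32_t arrays, and init_labels and words_nonzero are hypothetical names rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LABEL_WORDS 4

static bool words_nonzero(const uint32_t w[LABEL_WORDS])
{
	for (int i = 0; i < LABEL_WORDS; i++)
		if (w[i])
			return true;
	return false;
}

/* Returns true if dst was written, false if there was nothing to do. */
static bool init_labels(uint32_t dst[LABEL_WORDS],
			const uint32_t *master,	/* NULL if no master */
			const uint32_t labels[LABEL_WORDS],
			const uint32_t mask[LABEL_WORDS])
{
	bool have_mask = words_nonzero(mask);

	if (!master && !have_mask)
		return false;

	/* Start from the master's labels, if any. */
	if (master)
		memcpy(dst, master, LABEL_WORDS * sizeof(uint32_t));

	/* Explicitly set bits override the inherited ones. */
	if (have_mask)
		for (int i = 0; i < LABEL_WORDS; i++)
			dst[i] = (dst[i] & ~mask[i]) | (labels[i] & mask[i]);

	return true;
}
```

With an all-zero mask the related connection simply ends up with a copy of the master's labels, which is the new default the commit message describes.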
[PATCH v3 net-next 05/10] openvswitch: Simplify labels length logic.
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128 distinct labels"), the size of the conntrack labels extension has been fixed to 128 bits, so we do not need to check for labels sizes shorter than 128 at run-time. This patch simplifies labels length logic accordingly, but allows the conntrack labels size to be increased in the future without breaking the build. In the event of conntrack labels increasing in size, OVS would still be able to deal with the first 128 label bits.

Suggested-by: Joe Stringer
Signed-off-by: Jarno Rajahalme
---
 net/openvswitch/conntrack.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index f23934c..c7db4da 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -129,22 +129,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
 #endif
 }
 
+/* Guard against conntrack labels max size shrinking below 128 bits. */
+#if NF_CT_LABELS_MAX_SIZE < 16
+#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
+#endif
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
 			      struct ovs_key_ct_labels *labels)
 {
 	struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
-	if (cl) {
-		size_t len = sizeof(cl->bits);
-
-		if (len > OVS_CT_LABELS_LEN)
-			len = OVS_CT_LABELS_LEN;
-		else if (len < OVS_CT_LABELS_LEN)
-			memset(labels, 0, OVS_CT_LABELS_LEN);
-		memcpy(labels, cl->bits, len);
-	} else {
+	if (cl)
+		memcpy(labels, cl->bits,
+		       sizeof(cl->bits) > OVS_CT_LABELS_LEN
+		       ? OVS_CT_LABELS_LEN : sizeof(cl->bits));
+	else
 		memset(labels, 0, OVS_CT_LABELS_LEN);
-	}
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
@@ -274,7 +274,7 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
 		nf_ct_labels_ext_add(ct);
 		cl = nf_ct_labels_find(ct);
 	}
-	if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
+	if (!cl)
 		return -ENOSPC;
 
 	if (nf_ct_is_confirmed(ct)) {
-- 
2.1.4
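The combination of a build-time guard and a copy clamped to the smaller size may be easier to see in isolation. The sketch below is a userspace approximation only: CT_LABELS_MAX is a hypothetical stand-in for NF_CT_LABELS_MAX_SIZE, KEY_LABELS_LEN for OVS_CT_LABELS_LEN, and key_get_labels is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_LABELS_LEN 16	/* bytes exposed in the flow key */
#define CT_LABELS_MAX  16	/* hypothetical stand-in for NF_CT_LABELS_MAX_SIZE */

/* Fail the build, rather than misbehave at run time, if the source
 * buffer could ever be smaller than what the key expects.
 */
#if CT_LABELS_MAX < KEY_LABELS_LEN
#error CT_LABELS_MAX must be at least KEY_LABELS_LEN bytes
#endif

/* Copy at most KEY_LABELS_LEN bytes; zero the key when no labels exist. */
static void key_get_labels(unsigned char key[KEY_LABELS_LEN],
			   const unsigned char *ct_bits)
{
	if (ct_bits)
		memcpy(key, ct_bits,
		       CT_LABELS_MAX > KEY_LABELS_LEN
		       ? KEY_LABELS_LEN : CT_LABELS_MAX);
	else
		memset(key, 0, KEY_LABELS_LEN);
}
```

Raising CT_LABELS_MAX to, say, 32 still compiles and copies only the first 16 bytes, while lowering it below 16 stops the build.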
[PATCH v3 net-next 00/10] openvswitch: Conntrack integration improvements.
This series improves the conntrack integration code in the openvswitch module by fixing outdated comments (patch 1), fixing bugs (patches 2, 3, and 7), clarifying code (patches 4, 5, and 6), improving performance (patch 10), and adding new features enabling better translation from firewall admission policy to network configuration requested by user communities (patches 8 and 9).

v3: Rebase to the current net-next, add the comment-only patch 1, and reshuffle some of the patches as requested by Joe.

Jarno Rajahalme (10):
  openvswitch: Fix comments for skb->_nfct
  openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
  openvswitch: Do not trigger events for unconfirmed connections.
  openvswitch: Unionize ovs_key_ct_label with a u32 array.
  openvswitch: Simplify labels length logic.
  openvswitch: Refactor labels initialization.
  openvswitch: Inherit master's labels.
  openvswitch: Add original direction conntrack tuple to sw_flow_key.
  openvswitch: Add force commit.
  openvswitch: Pack struct sw_flow_key.

 include/uapi/linux/openvswitch.h |  33 -
 net/openvswitch/actions.c        |   2 +
 net/openvswitch/conntrack.c      | 298 ++-
 net/openvswitch/conntrack.h      |  14 +-
 net/openvswitch/flow.c           |  34 -
 net/openvswitch/flow.h           |  55 ++--
 net/openvswitch/flow_netlink.c   |  92 +---
 net/openvswitch/flow_netlink.h   |   7 +-
 8 files changed, 422 insertions(+), 113 deletions(-)

-- 
2.1.4
[PATCH v3 net-next 06/10] openvswitch: Refactor labels initialization.
Refactoring conntrack labels initialization makes changes in later patches easier to review.

Signed-off-by: Jarno Rajahalme
---
 net/openvswitch/conntrack.c | 104 ++--
 1 file changed, 62 insertions(+), 42 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c7db4da..0e038ee 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -229,19 +229,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
 	return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
 			   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-	enum ip_conntrack_info ctinfo;
-	struct nf_conn *ct;
 	u32 new_mark;
 
-	/* The connection could be invalid, in which case set_mark is no-op. */
-	ct = nf_ct_get(skb, &ctinfo);
-	if (!ct)
-		return 0;
-
 	new_mark = ct_mark | (ct->mark & ~(mask));
 	if (ct->mark != new_mark) {
 		ct->mark = new_mark;
@@ -256,50 +249,66 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-			     const struct ovs_key_ct_labels *labels,
-			     const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-	enum ip_conntrack_info ctinfo;
 	struct nf_conn_labels *cl;
-	struct nf_conn *ct;
-
-	/* The connection could be invalid, in which case set_label is no-op.*/
-	ct = nf_ct_get(skb, &ctinfo);
-	if (!ct)
-		return 0;
 
 	cl = nf_ct_labels_find(ct);
 	if (!cl) {
 		nf_ct_labels_ext_add(ct);
 		cl = nf_ct_labels_find(ct);
 	}
+
+	return cl;
+}
+
+/* Initialize labels for a new, to be committed conntrack entry. Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over. Also, we refrain from
+ * triggering events, as receiving change events before the create event would
+ * be confusing.
+ */ +static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key, + const struct ovs_key_ct_labels *labels, + const struct ovs_key_ct_labels *mask) +{ + struct nf_conn_labels *cl; + u32 *dst; + int i; + + cl = ovs_ct_get_conn_labels(ct); if (!cl) return -ENOSPC; - if (nf_ct_is_confirmed(ct)) { - /* Triggers a change event, which makes sense only for -* confirmed connections. -*/ - int err = nf_connlabels_replace(ct, labels->ct_labels_32, - mask->ct_labels_32, - OVS_CT_LABELS_LEN_32); - if (err) - return err; - } else { - u32 *dst = (u32 *)cl->bits; - const u32 *msk = mask->ct_labels_32; - const u32 *lbl = labels->ct_labels_32; - int i; + dst = (u32 *)cl->bits; + for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) + dst[i] = (dst[i] & ~mask->ct_labels_32[i]) | + (labels->ct_labels_32[i] & mask->ct_labels_32[i]); - /* No-one else has access to the non-confirmed entry, copy -* labels over, keeping any bits we are not explicitly setting. -*/ - for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) - dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]); - } + memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN); + + return 0; +} + +static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, +const struct ovs_key_ct_labels *labels, +const struct ovs_key_ct_labels *mask) +{ + struct nf_conn_labels *cl; + int err; + + cl = ovs_ct_get_conn_labels(ct); + if (!cl) + return -ENOSPC; + + err = nf_connlabels_replace(ct, labels->ct_labels_32, + mask->ct_labels_32, + OVS_CT_LABELS_LEN_32); + if (err) + return err; + + memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN); - ovs_ct_get_labels(ct, >ct.labels); return 0; } @@ -879,25 +888,36 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) { + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; int err; err =
[PATCH v3 net-next 03/10] openvswitch: Do not trigger events for unconfirmed connections.
Receiving change events before the 'new' event for the connection has been received can be confusing. Avoid triggering change events for setting conntrack mark or labels before the conntrack entry has been confirmed. Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark") Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme--- net/openvswitch/conntrack.c | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4df9a54..a6ff374 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -245,7 +245,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key, new_mark = ct_mark | (ct->mark & ~(mask)); if (ct->mark != new_mark) { ct->mark = new_mark; - nf_conntrack_event_cache(IPCT_MARK, ct); + if (nf_ct_is_confirmed(ct)) + nf_conntrack_event_cache(IPCT_MARK, ct); key->ct.mark = new_mark; } @@ -262,7 +263,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key, enum ip_conntrack_info ctinfo; struct nf_conn_labels *cl; struct nf_conn *ct; - int err; /* The connection could be invalid, in which case set_label is no-op.*/ ct = nf_ct_get(skb, ); @@ -277,10 +277,26 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key, if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN) return -ENOSPC; - err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask, - OVS_CT_LABELS_LEN / sizeof(u32)); - if (err) - return err; + if (nf_ct_is_confirmed(ct)) { + /* Triggers a change event, which makes sense only for +* confirmed connections. 
+*/ + int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask, + OVS_CT_LABELS_LEN / sizeof(u32)); + if (err) + return err; + } else { + u32 *dst = (u32 *)cl->bits; + const u32 *msk = (const u32 *)mask->ct_labels; + const u32 *lbl = (const u32 *)labels->ct_labels; + int i; + + /* No-one else has access to the non-confirmed entry, copy +* labels over, keeping any bits we are not explicitly setting. +*/ + for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++) + dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]); + } ovs_ct_get_labels(ct, >ct.labels); return 0; -- 2.1.4
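As a rough illustration of the rule this patch introduces (update the value either way, but only raise a change event once the entry is confirmed), consider the following sketch. struct fake_conn and conn_set_mark are hypothetical userspace stand-ins; the event counter replaces the real nf_conntrack_event_cache() call, and the mark arithmetic is copied from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the few struct nf_conn fields used here. */
struct fake_conn {
	uint32_t mark;
	bool confirmed;
	unsigned int events;	/* change events that would have been cached */
};

static void conn_set_mark(struct fake_conn *ct, uint32_t ct_mark, uint32_t mask)
{
	/* Same expression as in ovs_ct_set_mark(): keep bits outside mask. */
	uint32_t new_mark = ct_mark | (ct->mark & ~mask);

	if (ct->mark != new_mark) {
		ct->mark = new_mark;
		/* No change event before the 'new' event has gone out. */
		if (ct->confirmed)
			ct->events++;
	}
}
```

An unconfirmed entry therefore gets its mark silently, and listeners first see it as part of the creation event.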
[PATCH v3 net-next 01/10] openvswitch: Fix comments for skb->_nfct
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that they are combined into '_nfct'. Signed-off-by: Jarno Rajahalme--- net/openvswitch/conntrack.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index fbffe0e..5de6d12 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -157,7 +157,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, ovs_ct_get_labels(ct, >ct.labels); } -/* Update 'key' based on skb->nfct. If 'post_ct' is true, then OVS has +/* Update 'key' based on skb->_nfct. If 'post_ct' is true, then OVS has * previously sent the packet to conntrack via the ct action. If * 'keep_nat_flags' is true, the existing NAT flags retained, else they are * initialized from the connection status. @@ -421,12 +421,12 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) /* Find an existing connection which this packet belongs to without * re-attributing statistics or modifying the connection state. This allows an - * skb->nfct lost due to an upcall to be recovered during actions execution. + * skb->_nfct lost due to an upcall to be recovered during actions execution. * * Must be called with rcu_read_lock. * - * On success, populates skb->nfct and skb->nfctinfo, and returns the - * connection. Returns NULL if there is no existing entry. + * On success, populates skb->_nfct and returns the connection. Returns NULL + * if there is no existing entry. */ static struct nf_conn * ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, @@ -464,7 +464,7 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, return ct; } -/* Determine whether skb->nfct is equal to the result of conntrack lookup. */ +/* Determine whether skb->_nfct is equal to the result of conntrack lookup. 
*/ static bool skb_nfct_cached(struct net *net, const struct sw_flow_key *key, const struct ovs_conntrack_info *info, @@ -475,7 +475,7 @@ static bool skb_nfct_cached(struct net *net, ct = nf_ct_get(skb, ); /* If no ct, check if we have evidence that an existing conntrack entry -* might be found for this skb. This happens when we lose a skb->nfct +* might be found for this skb. This happens when we lose a skb->_nfct * due to an upcall. If the connection was not confirmed, it is not * cached and needs to be run through conntrack again. */ @@ -699,7 +699,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if * not done already. Update key with new CT state after passing the packet * through conntrack. - * Note that if the packet is deemed invalid by conntrack, skb->nfct will be + * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be * set to NULL and 0 will be returned. */ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, -- 2.1.4
[PATCH v3 net-next 09/10] openvswitch: Add force commit.
Stateful network admission policy may allow connections to one direction and reject connections initiated in the other direction. After policy change it is possible that for a new connection an overlapping conntrack entry already exists, where the original direction of the existing connection is opposed to the new connection's initial packet. Most importantly, conntrack state relating to the current packet gets the "reply" designation based on whether the original direction tuple or the reply direction tuple matched. If this "directionality" is wrong w.r.t. to the stateful network admission policy it may happen that packets in neither direction are correctly admitted. This patch adds a new "force commit" option to the OVS conntrack action that checks the original direction of an existing conntrack entry. If that direction is opposed to the current packet, the existing conntrack entry is deleted and a new one is subsequently created in the correct direction. Signed-off-by: Jarno Rajahalme--- include/uapi/linux/openvswitch.h | 5 + net/openvswitch/conntrack.c | 26 -- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 90af8b8..7f41f7d 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -674,6 +674,10 @@ struct ovs_action_hash { * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG. * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address * translation (NAT) on the packet. + * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing + * nothing if the connection is already committed will check that the current + * packet is in conntrack entry's original direction. If directionality does + * not match, will delete the existing conntrack entry and commit a new one. 
*/ enum ovs_ct_attr { OVS_CT_ATTR_UNSPEC, @@ -684,6 +688,7 @@ enum ovs_ct_attr { OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of related connections. */ OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */ + OVS_CT_ATTR_FORCE_COMMIT, /* No argument */ __OVS_CT_ATTR_MAX }; diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 8685bcd..de47782 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -65,6 +65,7 @@ struct ovs_conntrack_info { struct nf_conn *ct; u8 commit : 1; u8 nat : 3; /* enum ovs_ct_nat */ + u8 force : 1; u16 family; struct md_mark mark; struct md_labels labels; @@ -615,10 +616,13 @@ static bool skb_nfct_cached(struct net *net, */ if (!ct && key->ct.state & OVS_CS_F_TRACKED && !(key->ct.state & OVS_CS_F_INVALID) && - key->ct.zone == info->zone.id) + key->ct.zone == info->zone.id) { ct = ovs_ct_find_existing(net, >zone, info->family, skb, !!(key->ct.state & OVS_CS_F_NAT_MASK)); + if (ct) + nf_ct_get(skb, ); + } if (!ct) return false; if (!net_eq(net, read_pnet(>ct_net))) @@ -632,6 +636,18 @@ static bool skb_nfct_cached(struct net *net, if (help && rcu_access_pointer(help->helper) != info->helper) return false; } + /* Force conntrack entry direction to the current packet? */ + if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { + /* Delete the conntrack entry if confirmed, else just release +* the reference. 
+*/ + if (nf_ct_is_confirmed(ct)) + nf_ct_delete(ct, 0, 0); + else + nf_conntrack_put(>ct_general); + nf_ct_set(skb, NULL, 0); + return false; + } return true; } @@ -1209,6 +1225,7 @@ static int parse_nat(const struct nlattr *attr, static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = { [OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 }, + [OVS_CT_ATTR_FORCE_COMMIT] = { .minlen = 0, .maxlen = 0 }, [OVS_CT_ATTR_ZONE] = { .minlen = sizeof(u16), .maxlen = sizeof(u16) }, [OVS_CT_ATTR_MARK] = { .minlen = sizeof(struct md_mark), @@ -1248,6 +1265,9 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info, } switch (type) { + case OVS_CT_ATTR_FORCE_COMMIT: + info->force = true; + /* fall through. */ case OVS_CT_ATTR_COMMIT: info->commit = true; break; @@ -1474,7
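The decision added to skb_nfct_cached() boils down to a small truth table, sketched below. The enum and function names are invented for illustration; in the patch the three outcomes correspond to returning true, calling nf_ct_delete(), and calling nf_conntrack_put().

```c
#include <assert.h>
#include <stdbool.h>

enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

enum cached_action {
	KEEP_ENTRY,	/* the cached entry can be used as is */
	DELETE_ENTRY,	/* confirmed entry in the wrong direction */
	RELEASE_REF,	/* unconfirmed entry: just drop our reference */
};

/* What to do with a cached conntrack entry for the current packet. */
static enum cached_action force_commit_check(bool force, enum ct_dir dir,
					      bool confirmed)
{
	if (!force || dir == DIR_ORIGINAL)
		return KEEP_ENTRY;

	return confirmed ? DELETE_ENTRY : RELEASE_REF;
}
```

In both non-keep cases the patch also clears the skb's conntrack reference and reports the entry as not cached, so a fresh entry is created in the packet's own direction.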
[PATCH v3 net-next 02/10] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
The conntrack lookup for existing connections fails to invert the packet 5-tuple for NATted packets, and therefore fails to find the existing conntrack entry. Conntrack only stores 5-tuples for incoming packets, and there are various situations where a lookup on a packet that has already been transformed by NAT needs to be made. Looking up an existing conntrack entry when executing a packet received from userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple for the conntrack lookup whenever the packet has already been transformed by conntrack from its input form, as evidenced by one of the NAT flags being set in the conntrack state metadata.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme
---
 net/openvswitch/conntrack.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 5de6d12..4df9a54 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -430,7 +430,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-		     u8 l3num, struct sk_buff *skb)
+		     u8 l3num, struct sk_buff *skb, bool natted)
 {
 	struct nf_conntrack_l3proto *l3proto;
 	struct nf_conntrack_l4proto *l4proto;
@@ -453,6 +453,17 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 		return NULL;
 	}
 
+	/* Must invert the tuple if skb has been transformed by NAT. */
+	if (natted) {
+		struct nf_conntrack_tuple inverse;
+
+		if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
+			pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+			return NULL;
+		}
+		tuple = inverse;
+	}
+
 	/* look for tuple match */
 	h = nf_conntrack_find_get(net, zone, &tuple);
 	if (!h)
@@ -460,6 +471,13 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 
 	ct = nf_ct_tuplehash_to_ctrack(h);
 
+	/* Inverted packet tuple matches the reverse direction conntrack tuple,
+	 * select the other tuplehash to get the right 'ctinfo' bits for this
+	 * packet.
+	 */
+	if (natted)
+		h = &ct->tuplehash[!h->tuple.dst.dir];
+
 	nf_ct_set(skb, ct, ovs_ct_get_info(h));
 	return ct;
 }
@@ -482,7 +500,9 @@ static bool skb_nfct_cached(struct net *net,
 	if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
 	    !(key->ct.state & OVS_CS_F_INVALID) &&
 	    key->ct.zone == info->zone.id)
-		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb);
+		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
+					  !!(key->ct.state
+					     & OVS_CS_F_NAT_MASK));
 	if (!ct)
 		return false;
 	if (!net_eq(net, read_pnet(&ct->ct_net)))
-- 
2.1.4
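The two steps of the fix (look up with the inverted tuple, then flip the matched direction back) can be shown with a toy 5-tuple. This is a simplified sketch: struct tuple5 and the helper names are hypothetical, and the real nf_ct_invert_tuple() is protocol-aware (ICMP, for example, is not a plain source/destination swap).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tuple5 {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t  proto;
};

/* Swap source and destination, as the opposite direction would carry them. */
static struct tuple5 tuple5_invert(struct tuple5 t)
{
	struct tuple5 inv = {
		.src_ip = t.dst_ip, .dst_ip = t.src_ip,
		.src_port = t.dst_port, .dst_port = t.src_port,
		.proto = t.proto,
	};
	return inv;
}

/* Tuple used for the lookup: inverted when the skb already went through NAT. */
static struct tuple5 lookup_tuple(struct tuple5 pkt, bool natted)
{
	return natted ? tuple5_invert(pkt) : pkt;
}

/* Direction of the packet (0 = original, 1 = reply) given the direction of
 * the tuplehash that matched. An inverted lookup matches the opposite one.
 */
static int packet_dir(int matched_dir, bool natted)
{
	return natted ? !matched_dir : matched_dir;
}
```

Without the second step the packet would be classified with the ctinfo bits of the wrong direction even though the right entry was found.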
[PATCH v3 net-next 08/10] openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct sw_flow_key. The new fields are initially marked as non-existent, and are populated whenever a conntrack action is executed and either finds or generates a conntrack entry. This means that these fields exist for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the original direction tuple of the conntrack entry relating to the current packet, or from the original direction tuple of the master conntrack entry, if the current conntrack entry has a master. Generally, expected connections of connections having an assigned helper (e.g., FTP) have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to allow matching on them for policy decision purposes, with the premise that the admissibility of tracked connections' reply packets (as well as original direction packets), and both direction packets of any related connections, may be based on ACL rules applying to the master connection's original direction 5-tuple. This also makes it easier to make policy decisions when the actual packet headers might have been transformed by NAT, as the original direction 5-tuple represents the packet headers before any such transformation.

When using the original direction 5-tuple, the admissibility of return and/or related packets need not be based on the mere existence of a conntrack entry, allowing separation of admission policy from the established conntrack state. While existence of a conntrack entry is required for admission of the return or related packets, policy changes can render connections that were initially admitted to be rejected or dropped afterwards. If the admission of the return and related packets was based on mere conntrack state (e.g., connection being in an established state), a policy change that would make the connection rejected or dropped would need to find and delete all conntrack entries affected by such a change. When using the original direction 5-tuple matching, the affected conntrack entries can be allowed to time out instead, as the established state of the connection would not need to be the basis for packet admission any more.

It should be noted that the directionality of related connections may be the same or different than that of the master connection, and neither the original direction 5-tuple nor the conntrack state bits carry this information. If needed, the directionality of the master connection can be stored in master's conntrack mark or labels, which are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack allows mutual exclusion between ARP/ND and the new conntrack original tuple fields. Hence, the IP addresses are overlaid in union with ARP and ND fields. This allows the sw_flow_key to not grow much due to this patch, but it also means that we must be careful to never use the new key fields with ARP or ND packets. ARP is easy to distinguish and keep mutually exclusive based on the ethernet type, but ND, being an ICMPv6 protocol, requires a bit more attention.
Signed-off-by: Jarno Rajahalme--- include/uapi/linux/openvswitch.h | 20 +- net/openvswitch/actions.c| 2 + net/openvswitch/conntrack.c | 86 +--- net/openvswitch/conntrack.h | 10 - net/openvswitch/flow.c | 34 +--- net/openvswitch/flow.h | 49 ++- net/openvswitch/flow_netlink.c | 85 +-- net/openvswitch/flow_netlink.h | 7 +++- 8 files changed, 246 insertions(+), 47 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 96aee34..90af8b8 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -1,6 +1,6 @@ /* - * Copyright (c) 2007-2013 Nicira, Inc. + * Copyright (c) 2007-2017 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -331,6 +331,8 @@ enum ovs_key_attr { OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */ OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */ OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */ + OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, /* struct ovs_key_ct_tuple_ipv4 */ + OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, /* struct ovs_key_ct_tuple_ipv6 */ #ifdef __KERNEL__ OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */ @@ -472,6 +474,22 @@ struct ovs_key_ct_labels { #define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT) +struct ovs_key_ct_tuple_ipv4 { + __be32 ipv4_src; + __be32 ipv4_dst; + __be16 src_port; + __be16 dst_port; + __u8 ipv4_proto; +}; + +struct
[PATCH 2/3 v2 net-next] enic: add udp_tunnel ndo for vxlan offload
Defines enic_udp_tunnel_add/del for configuring vxlan tunnel offload. enic supports offload of only one ipv4/udp port. There are two modes that fw supports for vxlan offload. mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set if checksum of csum is ok. This bit is or of ip_csum_ok and tcp_udp_csum_ok mode 2: BIT(0) in rss_hash is set if it is encapsulated packet. BIT(1) is set if outer_ip_csum_ok/ BIT(2) is set if outer_tcp_csum_ok tcp_udp_csum_ok/ipv4_csum_ok is set if inner csum is OK. Signed-off-by: Govindarajulu Varadarajan--- drivers/net/ethernet/cisco/enic/enic.h | 6 ++ drivers/net/ethernet/cisco/enic/enic_main.c | 156 +++- 2 files changed, 159 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h index 9023c858715d..2b23f46b34d3 100644 --- a/drivers/net/ethernet/cisco/enic/enic.h +++ b/drivers/net/ethernet/cisco/enic/enic.h @@ -135,6 +135,11 @@ struct enic_rfs_flw_tbl { struct timer_list rfs_may_expire; }; +struct vxlan_offload { + u16 vxlan_udp_port_number; + u8 patch_level; +}; + /* Per-instance private data structure */ struct enic { struct net_device *netdev; @@ -175,6 +180,7 @@ struct enic { /* receive queue cache line section */ cacheline_aligned struct vnic_rq rq[ENIC_RQ_MAX]; unsigned int rq_count; + struct vxlan_offload vxlan; u64 rq_truncated_pkts; u64 rq_bad_fcs; struct napi_struct napi[ENIC_RQ_MAX + ENIC_WQ_MAX]; diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index c009f6ddabf7..7e56bf95cfc7 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -45,6 +45,7 @@ #endif #include #include +#include #include "cq_enet_desc.h" #include "vnic_dev.h" @@ -176,6 +177,92 @@ static void enic_unset_affinity_hint(struct enic *enic) irq_set_affinity_hint(enic->msix_entry[i].vector, NULL); } +static void enic_udp_tunnel_add(struct net_device *netdev, + struct 
udp_tunnel_info *ti) +{ + struct enic *enic = netdev_priv(netdev); + __be16 port = ti->port; + int err; + + spin_lock_bh(>devcmd_lock); + + if (ti->type != UDP_TUNNEL_TYPE_VXLAN) { + netdev_info(netdev, "udp_tnl: only vxlan tunnel offload supported"); + goto error; + } + + if (ti->sa_family != AF_INET) { + netdev_info(netdev, "vxlan: only IPv4 offload supported"); + goto error; + } + + if (enic->vxlan.vxlan_udp_port_number) { + if (ntohs(port) == enic->vxlan.vxlan_udp_port_number) + netdev_warn(netdev, "vxlan: udp port already offloaded"); + else + netdev_info(netdev, "vxlan: offload supported for only one UDP port"); + + goto error; + } + + err = vnic_dev_overlay_offload_cfg(enic->vdev, + OVERLAY_CFG_VXLAN_PORT_UPDATE, + ntohs(port)); + if (err) + goto error; + + err = vnic_dev_overlay_offload_ctrl(enic->vdev, OVERLAY_FEATURE_VXLAN, + enic->vxlan.patch_level); + if (err) + goto error; + + enic->vxlan.vxlan_udp_port_number = ntohs(port); + + netdev_info(netdev, "vxlan fw-vers-%d: offload enabled for udp port: %d, sa_family: %d ", + (int)enic->vxlan.patch_level, ntohs(port), ti->sa_family); + + goto unlock; + +error: + netdev_info(netdev, "failed to offload udp port: %d, sa_family: %d, type: %d", + ntohs(port), ti->sa_family, ti->type); +unlock: + spin_unlock_bh(>devcmd_lock); +} + +static void enic_udp_tunnel_del(struct net_device *netdev, + struct udp_tunnel_info *ti) +{ + struct enic *enic = netdev_priv(netdev); + int err; + + spin_lock_bh(>devcmd_lock); + + if ((ti->sa_family != AF_INET) || + ((ntohs(ti->port) != enic->vxlan.vxlan_udp_port_number)) || + (ti->type != UDP_TUNNEL_TYPE_VXLAN)) { + netdev_info(netdev, "udp_tnl: port:%d, sa_family: %d, type: %d not offloaded", + ntohs(ti->port), ti->sa_family, ti->type); + goto unlock; + } + + err = vnic_dev_overlay_offload_ctrl(enic->vdev, OVERLAY_FEATURE_VXLAN, + OVERLAY_OFFLOAD_DISABLE); + if (err) { + netdev_err(netdev, "vxlan: del offload udp port: %d failed", + ntohs(ti->port)); + goto unlock; + } + + 
enic->vxlan.vxlan_udp_port_number = 0; + + netdev_info(netdev, "vxlan: del
[PATCH 1/3 v2 net-next] enic: add devcmds for vxlan offload
This patch adds devcmds needed for vxlan offload. Implement 3 new devcmd overlay_offload_ctrl: enable/disable offload overlay_offload_cfg: update offload udp port number get_supported_feature_ver: get hw supported offload version. Each version has different bitmap for csum_ok/encap Signed-off-by: Govindarajulu Varadarajan--- drivers/net/ethernet/cisco/enic/vnic_dev.c| 34 ++ drivers/net/ethernet/cisco/enic/vnic_dev.h| 5 +++ drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 51 +++ drivers/net/ethernet/cisco/enic/vnic_enet.h | 1 + 4 files changed, 91 insertions(+) diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c b/drivers/net/ethernet/cisco/enic/vnic_dev.c index 8f27df3207bc..1841ad45d215 100644 --- a/drivers/net/ethernet/cisco/enic/vnic_dev.c +++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c @@ -1247,3 +1247,37 @@ int vnic_dev_classifier(struct vnic_dev *vdev, u8 cmd, u16 *entry, return ret; } + +int vnic_dev_overlay_offload_ctrl(struct vnic_dev *vdev, u8 overlay, u8 config) +{ + u64 a0 = overlay; + u64 a1 = config; + int wait = 1000; + + return vnic_dev_cmd(vdev, CMD_OVERLAY_OFFLOAD_CTRL, , , wait); +} + +int vnic_dev_overlay_offload_cfg(struct vnic_dev *vdev, u8 overlay, +u16 vxlan_udp_port_number) +{ + u64 a1 = vxlan_udp_port_number; + u64 a0 = overlay; + int wait = 1000; + + return vnic_dev_cmd(vdev, CMD_OVERLAY_OFFLOAD_CFG, , , wait); +} + +int vnic_dev_get_supported_feature_ver(struct vnic_dev *vdev, u8 feature, + u64 *supported_versions) +{ + u64 a0 = feature; + int wait = 1000; + u64 a1 = 0; + int ret; + + ret = vnic_dev_cmd(vdev, CMD_GET_SUPP_FEATURE_VER, , , wait); + if (!ret) + *supported_versions = a0; + + return ret; +} diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.h b/drivers/net/ethernet/cisco/enic/vnic_dev.h index 54156c484424..9d43d6bb9907 100644 --- a/drivers/net/ethernet/cisco/enic/vnic_dev.h +++ b/drivers/net/ethernet/cisco/enic/vnic_dev.h @@ -179,5 +179,10 @@ int vnic_dev_set_mac_addr(struct vnic_dev *vdev, u8 *mac_addr); int 
vnic_dev_classifier(struct vnic_dev *vdev, u8 cmd, u16 *entry, struct filter *data); int vnic_devcmd_init(struct vnic_dev *vdev); +int vnic_dev_overlay_offload_ctrl(struct vnic_dev *vdev, u8 overlay, u8 config); +int vnic_dev_overlay_offload_cfg(struct vnic_dev *vdev, u8 overlay, +u16 vxlan_udp_port_number); +int vnic_dev_get_supported_feature_ver(struct vnic_dev *vdev, u8 feature, + u64 *supported_versions); #endif /* _VNIC_DEV_H_ */ diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h index 2a812880b884..d83880b0d468 100644 --- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h +++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h @@ -406,6 +406,31 @@ enum vnic_devcmd_cmd { * in: (u32) a0=Queue Pair number */ CMD_QP_STATS_CLEAR = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 63), + + /* Use this devcmd for agreeing on the highest common version supported +* by both driver and fw for features who need such a facility. +* in: (u64) a0 = feature (driver requests for the supported versions +* on this feature) +* out: (u64) a0 = bitmap of all supported versions for that feature +*/ + CMD_GET_SUPP_FEATURE_VER = _CMDC(_CMD_DIR_RW, _CMD_VTYPE_ENET, 69), + + /* Control (Enable/Disable) overlay offloads on the given vnic +* in: (u8) a0 = OVERLAY_FEATURE_NVGRE : NVGRE +* a0 = OVERLAY_FEATURE_VXLAN : VxLAN +* in: (u8) a1 = OVERLAY_OFFLOAD_ENABLE : Enable or +* a1 = OVERLAY_OFFLOAD_DISABLE : Disable or +* a1 = OVERLAY_OFFLOAD_ENABLE_V2 : Enable with version 2 +*/ + CMD_OVERLAY_OFFLOAD_CTRL = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 72), + + /* Configuration of overlay offloads feature on a given vNIC +* in: (u8) a0 = DEVCMD_OVERLAY_NVGRE : NVGRE +* a0 = DEVCMD_OVERLAY_VXLAN : VxLAN +* in: (u8) a1 = VXLAN_PORT_UPDATE : VxLAN +* in: (u16) a2 = unsigned short int port information +*/ + CMD_OVERLAY_OFFLOAD_CFG = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 73), }; /* CMD_ENABLE2 flags */ @@ -657,4 +682,30 @@ struct devcmd2_result { #define 
DEVCMD2_RING_SIZE 32 #define DEVCMD2_DESC_SIZE 128 +enum overlay_feature_t { + OVERLAY_FEATURE_NVGRE = 1, + OVERLAY_FEATURE_VXLAN, + OVERLAY_FEATURE_MAX, +}; + +enum overlay_ofld_cmd { + OVERLAY_OFFLOAD_ENABLE, + OVERLAY_OFFLOAD_DISABLE, + OVERLAY_OFFLOAD_ENABLE_P2, + OVERLAY_OFFLOAD_MAX, +}; + +#define
[PATCH 0/3 v2 net-next] enic: add vxlan offload support
This series adds vxlan offload support for the enic driver.

The first patch adds the vxlan devcmds for configuring vxlan offload parameters.

The second patch adds ndo_udp_tunnel_add/del and offload on the rx path. There are two modes in which fw supports vxlan offload.

mode 0: fcoe bit is set for an encapsulated packet. fcoe_fc_crc_ok is set if the checksum is OK. This bit is the OR of ip_csum_ok and tcp_udp_csum_ok.

mode 2: BIT(0) in rss_hash is set if it is an encapsulated packet. BIT(1) is set if outer_ip_csum_ok. BIT(2) is set if outer_tcp_csum_ok.

Some hw supports only mode 0, some supports modes 0 and 2. The driver gets the supported modes bitmap using the get_supported_feature_ver devcmd and selects the highest mode that both driver and fw support.

The third patch adds offload support on the tx path by adding enic_features_check().

v2: Order local variable declarations from longest to shortest line, on all three patches.

Govindarajulu Varadarajan (3):
  enic: add devcmds for vxlan offload
  enic: add udp_tunnel ndo for vxlan offload
  enic: add vxlan offload on tx path

 drivers/net/ethernet/cisco/enic/enic.h        |   6 +
 drivers/net/ethernet/cisco/enic/enic_main.c   | 282 --
 drivers/net/ethernet/cisco/enic/vnic_dev.c    |  34
 drivers/net/ethernet/cisco/enic/vnic_dev.h    |   5 +
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h |  51 +
 drivers/net/ethernet/cisco/enic/vnic_enet.h   |   1 +
 6 files changed, 364 insertions(+), 15 deletions(-)

-- 
2.11.0
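The mode negotiation and the mode 2 flag layout described in this cover letter can be sketched in standalone C. This is only an illustration under stated assumptions, not the driver's code: the names pick_highest_common_mode, decode_mode2 and struct encap_status are hypothetical, and the sketch assumes that bit N of the supported-versions bitmap means "mode N is supported".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return the highest mode number present in both
 * bitmaps, or -1 if driver and fw share no mode.
 * Assumption: bit N set means "mode N is supported".
 */
static int pick_highest_common_mode(uint64_t fw_modes, uint64_t drv_modes)
{
	uint64_t common = fw_modes & drv_modes;
	int mode;

	for (mode = 63; mode >= 0; mode--) {
		if (common & ((uint64_t)1 << mode))
			return mode;
	}

	return -1;
}

/* Hypothetical container for the three mode 2 flags. */
struct encap_status {
	bool encap;             /* BIT(0): packet is encapsulated */
	bool outer_ip_csum_ok;  /* BIT(1): outer IP checksum is OK */
	bool outer_tcp_csum_ok; /* BIT(2): outer_tcp_csum_ok, as named in the text */
};

/* Decode the low three bits of rss_hash as described for mode 2. */
static struct encap_status decode_mode2(uint32_t rss_hash)
{
	struct encap_status st;

	st.encap = (rss_hash & (1u << 0)) != 0;
	st.outer_ip_csum_ok = (rss_hash & (1u << 1)) != 0;
	st.outer_tcp_csum_ok = (rss_hash & (1u << 2)) != 0;

	return st;
}
```

With a fw bitmap of 0x5 (modes 0 and 2) and a driver that also knows both, the helper picks mode 2; if the driver only knows mode 0, it falls back to mode 0.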
[PATCH 3/3 v2 net-next] enic: add vxlan offload on tx path
Define ndo_features_check. Hw supports offload only for ipv4 inner and ipv4 outer pkt. Code refactor for setting inner tcp pseudo csum. Signed-off-by: Govindarajulu Varadarajan--- drivers/net/ethernet/cisco/enic/enic_main.c | 126 +--- 1 file changed, 114 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index 7e56bf95cfc7..4b87beeabce1 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -263,6 +263,48 @@ static void enic_udp_tunnel_del(struct net_device *netdev, spin_unlock_bh(>devcmd_lock); } +static netdev_features_t enic_features_check(struct sk_buff *skb, +struct net_device *dev, +netdev_features_t features) +{ + const struct ethhdr *eth = (struct ethhdr *)skb_inner_mac_header(skb); + struct enic *enic = netdev_priv(dev); + struct udphdr *udph; + u16 port = 0; + u16 proto; + + if (!skb->encapsulation) + return features; + + features = vxlan_features_check(skb, features); + + /* hardware only supports IPv4 vxlan tunnel */ + if (vlan_get_protocol(skb) != htons(ETH_P_IP)) + goto out; + + /* hardware does not support offload of ipv6 inner pkt */ + if (eth->h_proto != ntohs(ETH_P_IP)) + goto out; + + proto = ip_hdr(skb)->protocol; + + if (proto == IPPROTO_UDP) { + udph = udp_hdr(skb); + port = be16_to_cpu(udph->dest); + } + + /* HW supports offload of only one UDP port. 
Remove CSUM and GSO MASK +* for other UDP port tunnels +*/ + if (port != enic->vxlan.vxlan_udp_port_number) + goto out; + + return features; + +out: + return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK); +} + int enic_is_dynamic(struct enic *enic) { return enic->pdev->device == PCI_DEVICE_ID_CISCO_VIC_ENET_DYN; @@ -591,20 +633,19 @@ static int enic_queue_wq_skb_csum_l4(struct enic *enic, struct vnic_wq *wq, return err; } -static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq, -struct sk_buff *skb, unsigned int mss, -int vlan_tag_insert, unsigned int vlan_tag, -int loopback) +static void enic_preload_tcp_csum_encap(struct sk_buff *skb) { - unsigned int frag_len_left = skb_headlen(skb); - unsigned int len_left = skb->len - frag_len_left; - unsigned int hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb); - int eop = (len_left == 0); - unsigned int len; - dma_addr_t dma_addr; - unsigned int offset = 0; - skb_frag_t *frag; + if (skb->protocol == cpu_to_be16(ETH_P_IP)) { + inner_ip_hdr(skb)->check = 0; + inner_tcp_hdr(skb)->check = + ~csum_tcpudp_magic(inner_ip_hdr(skb)->saddr, + inner_ip_hdr(skb)->daddr, 0, + IPPROTO_TCP, 0); + } +} +static void enic_preload_tcp_csum(struct sk_buff *skb) +{ /* Preload TCP csum field with IP pseudo hdr calculated * with IP length set to zero. HW will later add in length * to each TCP segment resulting from the TSO. 
@@ -618,6 +659,30 @@ static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq, tcp_hdr(skb)->check = ~csum_ipv6_magic(_hdr(skb)->saddr, _hdr(skb)->daddr, 0, IPPROTO_TCP, 0); } +} + +static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq, +struct sk_buff *skb, unsigned int mss, +int vlan_tag_insert, unsigned int vlan_tag, +int loopback) +{ + unsigned int frag_len_left = skb_headlen(skb); + unsigned int len_left = skb->len - frag_len_left; + int eop = (len_left == 0); + unsigned int offset = 0; + unsigned int hdr_len; + dma_addr_t dma_addr; + unsigned int len; + skb_frag_t *frag; + + if (skb->encapsulation) { + hdr_len = skb_inner_transport_header(skb) - skb->data; + hdr_len += inner_tcp_hdrlen(skb); + enic_preload_tcp_csum_encap(skb); + } else { + hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb); + enic_preload_tcp_csum(skb); + } /* Queue WQ_ENET_MAX_DESC_LEN length descriptors * for the main skb fragment @@ -666,6 +731,38 @@ static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq, return 0; } +static inline int enic_queue_wq_skb_encap(struct enic *enic, struct vnic_wq *wq, +
[PATCH RFC v2 7/8] bnxt: Changes to use generic XDP infrastructure
Change XDP program management functional interface to correspond to new XDP API. Signed-off-by: Tom Herbert--- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 14 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 46 +++ 3 files changed, 27 insertions(+), 35 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index cda1c78..ce311fb 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -2091,9 +2091,6 @@ static void bnxt_free_rx_rings(struct bnxt *bp) struct bnxt_rx_ring_info *rxr = >rx_ring[i]; struct bnxt_ring_struct *ring; - if (rxr->xdp_prog) - bpf_prog_put(rxr->xdp_prog); - kfree(rxr->rx_tpa); rxr->rx_tpa = NULL; @@ -2381,15 +2378,6 @@ static int bnxt_init_one_rx_ring(struct bnxt *bp, int ring_nr) ring = >rx_ring_struct; bnxt_init_rxbd_pages(ring, type); - if (BNXT_RX_PAGE_MODE(bp) && bp->xdp_prog) { - rxr->xdp_prog = bpf_prog_add(bp->xdp_prog, 1); - if (IS_ERR(rxr->xdp_prog)) { - int rc = PTR_ERR(rxr->xdp_prog); - - rxr->xdp_prog = NULL; - return rc; - } - } prod = rxr->rx_prod; for (i = 0; i < bp->rx_ring_size; i++) { if (bnxt_alloc_rx_data(bp, rxr, prod, GFP_KERNEL) != 0) { @@ -7157,8 +7145,6 @@ static void bnxt_remove_one(struct pci_dev *pdev) pci_iounmap(pdev, bp->bar0); kfree(bp->edev); bp->edev = NULL; - if (bp->xdp_prog) - bpf_prog_put(bp->xdp_prog); free_netdev(dev); pci_release_regions(pdev); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 9f07b9c..3efe7af 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -1175,7 +1175,7 @@ struct bnxt { u8 num_leds; struct bnxt_led_infoleds[BNXT_MAX_LED]; - struct bpf_prog *xdp_prog; + boolxdp_enabled; }; #define BNXT_RX_STATS_OFFSET(counter) \ diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 
899c30f..3cfdc94 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -85,18 +85,18 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, struct page *page, u8 **data_ptr, unsigned int *len, u8 *event) { - struct bpf_prog *xdp_prog = READ_ONCE(rxr->xdp_prog); struct bnxt_tx_ring_info *txr; struct bnxt_sw_rx_bd *rx_buf; struct pci_dev *pdev; struct xdp_buff xdp; + struct xdp_hook *last_hook; dma_addr_t mapping; void *orig_data; u32 tx_avail; u32 offset; u32 act; - if (!xdp_prog) + if (!xdp_hook_run_needed_check(bp->dev, >bnapi->napi)) return false; pdev = bp->pdev; @@ -113,7 +113,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, dma_sync_single_for_cpu(>dev, mapping + offset, *len, bp->rx_dir); rcu_read_lock(); - act = bpf_prog_run_xdp(xdp_prog, ); + act = xdp_hook_run_ret_last(>bnapi->napi, , _hook); rcu_read_unlock(); tx_avail = bnxt_tx_avail(bp, txr); @@ -134,7 +134,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, case XDP_TX: if (tx_avail < 2) { - trace_xdp_exception(bp->dev, xdp_prog, act); + trace_xdp_hook_exception(bp->dev, last_hook, act); bnxt_reuse_rx_data(rxr, cons, page); return true; } @@ -147,10 +147,10 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, bnxt_reuse_rx_data(rxr, cons, page); return true; default: - bpf_warn_invalid_xdp_action(act); + xdp_warn_invalid_action(act); /* Fall thru */ case XDP_ABORTED: - trace_xdp_exception(bp->dev, xdp_prog, act); + trace_xdp_hook_exception(bp->dev, last_hook, act); /* Fall thru */ case XDP_DROP: bnxt_reuse_rx_data(rxr, cons, page); @@ -160,13 +160,15 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, } /* Under rtnl_lock */ -static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog) +static int bnxt_xdp_init(struct bnxt
[PATCH RFC v2 2/8] mlx4: Changes to use generic XDP infrastructure
Change XDP program management functional interface to correspond to new XDP API. Signed-off-by: Tom Herbert--- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 92 +- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 27 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 1 + drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 - 4 files changed, 29 insertions(+), 92 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 748e9f6..613786a 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -2195,8 +2196,7 @@ int mlx4_en_try_alloc_resources(struct mlx4_en_priv *priv, struct mlx4_en_port_profile *prof, bool carry_xdp_prog) { - struct bpf_prog *xdp_prog; - int i, t; + int t; mlx4_en_copy_priv(tmp, priv, prof); @@ -2211,22 +2211,6 @@ int mlx4_en_try_alloc_resources(struct mlx4_en_priv *priv, return -ENOMEM; } - /* All rx_rings has the same xdp_prog. Pick the first one. 
*/ - xdp_prog = rcu_dereference_protected( - priv->rx_ring[0]->xdp_prog, - lockdep_is_held(>mdev->state_lock)); - - if (xdp_prog && carry_xdp_prog) { - xdp_prog = bpf_prog_add(xdp_prog, tmp->rx_ring_num); - if (IS_ERR(xdp_prog)) { - mlx4_en_free_resources(tmp); - return PTR_ERR(xdp_prog); - } - for (i = 0; i < tmp->rx_ring_num; i++) - rcu_assign_pointer(tmp->rx_ring[i]->xdp_prog, - xdp_prog); - } - return 0; } @@ -2713,42 +2697,20 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m return err; } -static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) +static int mlx4_xdp_init(struct net_device *dev, bool enable) { struct mlx4_en_priv *priv = netdev_priv(dev); struct mlx4_en_dev *mdev = priv->mdev; struct mlx4_en_port_profile new_prof; - struct bpf_prog *old_prog; struct mlx4_en_priv *tmp; int tx_changed = 0; - int xdp_ring_num; int port_up = 0; - int err; - int i; + int xdp_ring_num, err; - xdp_ring_num = prog ? priv->rx_ring_num : 0; + xdp_ring_num = enable ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0; - /* No need to reconfigure buffers when simply swapping the -* program for a new one. 
-*/ - if (priv->tx_ring_num[TX_XDP] == xdp_ring_num) { - if (prog) { - prog = bpf_prog_add(prog, priv->rx_ring_num - 1); - if (IS_ERR(prog)) - return PTR_ERR(prog); - } - mutex_lock(>state_lock); - for (i = 0; i < priv->rx_ring_num; i++) { - old_prog = rcu_dereference_protected( - priv->rx_ring[i]->xdp_prog, - lockdep_is_held(>state_lock)); - rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog); - if (old_prog) - bpf_prog_put(old_prog); - } - mutex_unlock(>state_lock); + if (priv->tx_ring_num[TX_XDP] == xdp_ring_num) return 0; - } if (!mlx4_en_check_xdp_mtu(dev, dev->mtu)) return -EOPNOTSUPP; @@ -2757,14 +2719,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) if (!tmp) return -ENOMEM; - if (prog) { - prog = bpf_prog_add(prog, priv->rx_ring_num - 1); - if (IS_ERR(prog)) { - err = PTR_ERR(prog); - goto out; - } - } - mutex_lock(>state_lock); memcpy(_prof, priv->prof, sizeof(struct mlx4_en_port_profile)); new_prof.tx_ring_num[TX_XDP] = xdp_ring_num; @@ -2777,11 +2731,8 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) } err = mlx4_en_try_alloc_resources(priv, tmp, _prof, false); - if (err) { - if (prog) - bpf_prog_sub(prog, priv->rx_ring_num - 1); + if (err) goto unlock_out; - } if (priv->port_up) { port_up = 1; @@ -2792,15 +2743,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) if (tx_changed) netif_set_real_num_tx_queues(dev, priv->tx_ring_num[TX]); - for (i = 0; i < priv->rx_ring_num; i++) { - old_prog =
RE: [net] net: phy: Fix lack of reference count on PHY driver
> -----Original Message-----
> From: Andrew Lunn [mailto:and...@lunn.ch]
> Sent: Thursday, February 09, 2017 12:24 AM
> To: Robin Murphy
> Cc: Florian Fainelli; netdev@vger.kernel.org; da...@davemloft.net;
> rmk+ker...@armlinux.org.uk; maowenan; Catalin Marinas
> Subject: Re: [net] net: phy: Fix lack of reference count on PHY driver
>
> On Wed, Feb 08, 2017 at 04:03:43PM +, Robin Murphy wrote:
> > Hi all,
> >
> > We're seeing a new boot-time crash[1] on SMSC911x hardware from this
> > patch in today's HEAD (as cafe8df8b9bc)...
>
> Hi Robin
>
> Thank for the report. See the discussion on netdev under the subject "Kernel
> crashes in phy_attach_direct()"
>
> Andrew

There is a bug report from Dan Carpenter (dan.carpen...@oracle.com), who ran static analysis and found this issue. Thanks a lot.

[bug report] net: phy: Fix lack of reference count on PHY driver
[PATCH RFC v2 5/8] virt_net: Changes to use generic XDP infrastructure
Change XDP program management functional interface to correspond to new XDP API. Signed-off-by: Tom Herbert--- drivers/net/virtio_net.c | 98 +++- 1 file changed, 38 insertions(+), 60 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 11e2853..e8b1747 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -93,8 +93,6 @@ struct receive_queue { struct napi_struct napi; - struct bpf_prog __rcu *xdp_prog; - /* Chain pages by the private ptr. */ struct page *pages; @@ -140,6 +138,9 @@ struct virtnet_info { /* Host can handle any s/g split between our header and packet data */ bool any_header_sg; + /* XDP has been enabled in device */ + bool xdp_enabled; + /* Packet virtio header size */ u8 hdr_len; @@ -414,13 +415,12 @@ static struct sk_buff *receive_small(struct net_device *dev, void *buf, unsigned int len) { struct sk_buff * skb = buf; - struct bpf_prog *xdp_prog; + struct xdp_hook *last_hook; len -= vi->hdr_len; rcu_read_lock(); - xdp_prog = rcu_dereference(rq->xdp_prog); - if (xdp_prog) { + if (xdp_hook_run_needed_check(dev, >napi)) { struct virtio_net_hdr_mrg_rxbuf *hdr = buf; struct xdp_buff xdp; u32 act; @@ -431,8 +431,7 @@ static struct sk_buff *receive_small(struct net_device *dev, xdp.data_hard_start = skb->data; xdp.data = skb->data + VIRTIO_XDP_HEADROOM; xdp.data_end = xdp.data + len; - act = bpf_prog_run_xdp(xdp_prog, ); - + act = xdp_hook_run_ret_last(>napi, , _hook); switch (act) { case XDP_PASS: /* Recalculate length in case bpf program changed it */ @@ -441,13 +440,13 @@ static struct sk_buff *receive_small(struct net_device *dev, break; case XDP_TX: if (unlikely(!virtnet_xdp_xmit(vi, rq, , skb))) - trace_xdp_exception(vi->dev, xdp_prog, act); + trace_xdp_hook_exception(vi->dev, last_hook, act); rcu_read_unlock(); goto xdp_xmit; default: - bpf_warn_invalid_xdp_action(act); + xdp_warn_invalid_action(act); case XDP_ABORTED: - trace_xdp_exception(vi->dev, xdp_prog, act); + trace_xdp_hook_exception(vi->dev, 
last_hook, act); case XDP_DROP: goto err_xdp; } @@ -559,16 +558,15 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, struct page *page = virt_to_head_page(buf); int offset = buf - page_address(page); struct sk_buff *head_skb, *curr_skb; - struct bpf_prog *xdp_prog; unsigned int truesize; head_skb = NULL; rcu_read_lock(); - xdp_prog = rcu_dereference(rq->xdp_prog); - if (xdp_prog) { + if (xdp_hook_run_needed_check(dev, >napi)) { struct page *xdp_page; struct xdp_buff xdp; + struct xdp_hook *last_hook; void *data; u32 act; @@ -599,7 +597,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len; xdp.data = data + vi->hdr_len; xdp.data_end = xdp.data + (len - vi->hdr_len); - act = bpf_prog_run_xdp(xdp_prog, ); + act = xdp_hook_run_ret_last(>napi, , _hook); switch (act) { case XDP_PASS: @@ -622,16 +620,16 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, break; case XDP_TX: if (unlikely(!virtnet_xdp_xmit(vi, rq, , data))) - trace_xdp_exception(vi->dev, xdp_prog, act); + trace_xdp_hook_exception(vi->dev, last_hook, act); ewma_pkt_len_add(>mrg_avg_pkt_len, len); if (unlikely(xdp_page != page)) goto err_xdp; rcu_read_unlock(); goto xdp_xmit; default: - bpf_warn_invalid_xdp_action(act); + xdp_warn_invalid_action(act); case XDP_ABORTED: - trace_xdp_exception(vi->dev, xdp_prog, act); + trace_xdp_hook_exception(vi->dev, last_hook, act); case XDP_DROP: if (unlikely(xdp_page != page))
Re: [PATCH net-next v2 3/3] net: ethernet: bgmac: driver power manangement
On 02/08/2017 01:24 PM, Jon Mason wrote: > From: Joey Zhong> > Implement suspend/resume callbacks in the bgmac driver. This makes sure > that we de-initialize and re-initialize the hardware correctly before > entering suspend and when resuming. > > Signed-off-by: Joey Zhong > Signed-off-by: Jon Mason > --- > drivers/net/ethernet/broadcom/bgmac-platform.c | 34 + > drivers/net/ethernet/broadcom/bgmac.c | 51 > ++ > drivers/net/ethernet/broadcom/bgmac.h | 2 + > 3 files changed, 87 insertions(+) > > diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c > b/drivers/net/ethernet/broadcom/bgmac-platform.c > index 2d153f7..3df91e7 100644 > --- a/drivers/net/ethernet/broadcom/bgmac-platform.c > +++ b/drivers/net/ethernet/broadcom/bgmac-platform.c > @@ -21,8 +21,12 @@ > #include > #include "bgmac.h" > > +#define NICPM_PADRING_CFG0x0004 > #define NICPM_IOMUX_CTRL 0x0008 > > +#define NICPM_PADRING_CFG_INIT_VAL 0x7400 > +#define NICPM_IOMUX_CTRL_INIT_VAL_AX 0x2188 > + > #define NICPM_IOMUX_CTRL_INIT_VAL0x3196e000 > #define NICPM_IOMUX_CTRL_SPD_SHIFT 10 > #define NICPM_IOMUX_CTRL_SPD_10M 0 > @@ -108,6 +112,10 @@ static void bgmac_nicpm_speed_set(struct net_device > *net_dev) > if (!bgmac->plat.nicpm_base) > return; > > + /* SET RGMII IO CONFIG */ > + writel(NICPM_PADRING_CFG_INIT_VAL, > +bgmac->plat.nicpm_base + NICPM_PADRING_CFG); > + > val = NICPM_IOMUX_CTRL_INIT_VAL; > switch (bgmac->net_dev->phydev->speed) { > default: > @@ -239,6 +247,31 @@ static int bgmac_remove(struct platform_device *pdev) > return 0; > } > > +#ifdef CONFIG_PM > +static int bgmac_suspend(struct device *dev) > +{ > + struct bgmac *bgmac = dev_get_drvdata(dev); > + > + return bgmac_enet_suspend(bgmac); > +} > + > +static int bgmac_resume(struct device *dev) > +{ > + struct bgmac *bgmac = dev_get_drvdata(dev); > + > + return bgmac_enet_resume(bgmac); > +} > + > +static const struct dev_pm_ops bgmac_pm_ops = { > + .suspend = bgmac_suspend, > + .resume = bgmac_resume > +}; > + > +#define BGMAC_PM_OPS 
(_pm_ops) > +#else > +#define BGMAC_PM_OPS NULL > +#endif /* CONFIG_PM */ > + > static const struct of_device_id bgmac_of_enet_match[] = { > {.compatible = "brcm,amac",}, > {.compatible = "brcm,nsp-amac",}, > @@ -252,6 +285,7 @@ static struct platform_driver bgmac_enet_driver = { > .driver = { > .name = "bgmac-enet", > .of_match_table = bgmac_of_enet_match, > + .pm = BGMAC_PM_OPS > }, > .probe = bgmac_probe, > .remove = bgmac_remove, > diff --git a/drivers/net/ethernet/broadcom/bgmac.c > b/drivers/net/ethernet/broadcom/bgmac.c > index bd549f8..e78c91d 100644 > --- a/drivers/net/ethernet/broadcom/bgmac.c > +++ b/drivers/net/ethernet/broadcom/bgmac.c > @@ -1478,6 +1478,7 @@ int bgmac_enet_probe(struct bgmac *bgmac) > > net_dev->irq = bgmac->irq; > SET_NETDEV_DEV(net_dev, bgmac->dev); > + dev_set_drvdata(bgmac->dev, bgmac); > > if (!is_valid_ether_addr(bgmac->mac_addr)) { > dev_err(bgmac->dev, "Invalid MAC addr: %pM\n", > @@ -1551,5 +1552,55 @@ void bgmac_enet_remove(struct bgmac *bgmac) > } > EXPORT_SYMBOL_GPL(bgmac_enet_remove); > > +int bgmac_enet_suspend(struct bgmac *bgmac) > +{ > + if (!netif_running(bgmac->net_dev)) > + return 0; > + > + phy_stop(bgmac->net_dev->phydev); > + > + netif_stop_queue(bgmac->net_dev); > + > + napi_disable(>napi); > + > + netif_tx_lock(bgmac->net_dev); > + netif_device_detach(bgmac->net_dev); > + netif_tx_unlock(bgmac->net_dev); > + > + bgmac_chip_intrs_off(bgmac); > + bgmac_chip_reset(bgmac); > + bgmac_dma_cleanup(bgmac); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(bgmac_enet_suspend); > + > +int bgmac_enet_resume(struct bgmac *bgmac) > +{ > + int rc; > + > + if (netif_running(bgmac->net_dev)) > + return 0; This should be if (!netif_running()) here, if it is running, we need to do all of what is below. 
With that fixed: Reviewed-by: Florian Fainelli > + > + rc = bgmac_dma_init(bgmac); > + if (rc) > + return rc; > + > + bgmac_chip_init(bgmac); > + > + napi_enable(>napi); > + > + netif_tx_lock(bgmac->net_dev); > + netif_device_attach(bgmac->net_dev); > + netif_tx_unlock(bgmac->net_dev); > + > + netif_start_queue(bgmac->net_dev); > + > + phy_start(bgmac->net_dev->phydev); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(bgmac_enet_resume); > + > MODULE_AUTHOR("Rafał Miłecki"); > MODULE_LICENSE("GPL"); > diff --git a/drivers/net/ethernet/broadcom/bgmac.h > b/drivers/net/ethernet/broadcom/bgmac.h > index 5a518fe..741ca27 100644 > ---
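The guard that the review comment asks for can be illustrated with a small standalone model. This is a sketch under assumptions, not bgmac code: struct fake_dev, fake_suspend and fake_resume are hypothetical stand-ins, and hw_up merely represents "the hardware has been initialized". The point is that both callbacks skip their work only when the interface is not running.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state. */
struct fake_dev {
	bool running; /* models netif_running() */
	bool hw_up;   /* models "hardware initialized" */
};

/* Tear the hardware down only if the interface is running. */
static int fake_suspend(struct fake_dev *d)
{
	if (!d->running)
		return 0;

	d->hw_up = false;
	return 0;
}

/* Bring the hardware back only if the interface is running.
 * An inverted check here would skip re-initialization exactly
 * when it is needed.
 */
static int fake_resume(struct fake_dev *d)
{
	if (!d->running)
		return 0;

	d->hw_up = true;
	return 0;
}
```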
[PATCH net v3 0/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
Hi all,

This patch series addresses the crash seen with the Generic PHY driver in phy_attach_direct(), introduced in the latest pull to Linus. We also address how to properly bind and unbind to/from the PHY drivers, which would previously crash in flames since we did not stop the state machine.

Thanks!

Changes in v3:

- did more testing as module/built-in, with Generic and non-Generic PHY drivers
- exercised error paths on purpose by injecting errors
- properly increment the Generic PHY module reference count as well
- fixed the error path to be correct

Changes in v2:

- fixed net: phy: Fix lack of reference count on PHY driver against the Generic PHY driver, which is special

Florian Fainelli (3):
  net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
  net: phy: Check phydev->drv
  net: phy: Fix PHY driver bind and unbind events

 drivers/net/phy/phy.c        | 26 +
 drivers/net/phy/phy_device.c | 45
 include/linux/phy.h          |  3 +++
 3 files changed, 66 insertions(+), 8 deletions(-)

-- 
2.9.3
[PATCH net v3 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
The Generic PHY drivers get assigned after we have checked that the current PHY driver is NULL, so we need to check a few things before we can safely dereference d->driver. This would cause a NULL dereference to occur when a system binds to the Generic PHY driver. Update phy_attach_direct() to do the following:

- grab the driver module reference after we have assigned the Generic PHY drivers accordingly
- update the error path to clean up the module reference in case the Generic PHY probe function fails

Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
Signed-off-by: Florian Fainelli
---
 drivers/net/phy/phy_device.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0d8f4d3847f6..d63d190a95ef 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	struct module *ndev_owner = dev->dev.parent->driver->owner;
 	struct mii_bus *bus = phydev->mdio.bus;
 	struct device *d = &phydev->mdio.dev;
+	bool using_genphy = false;
 	int err;

 	/* For Ethernet device drivers that register their own MDIO bus, we
@@ -938,12 +939,22 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 			d->driver =
 				&genphy_driver[GENPHY_DRV_1G].mdiodrv.driver;

+		using_genphy = true;
+	}
+
+	if (!try_module_get(d->driver->owner)) {
+		dev_err(&dev->dev, "failed to get the device driver module\n");
+		err = -EIO;
+		goto error_put_device;
+	}
+
+	if (using_genphy) {
 		err = d->driver->probe(d);
 		if (err >= 0)
 			err = device_bind_driver(d);

 		if (err)
-			goto error;
+			goto error_module_put;
 	}

 	if (phydev->attached_dev) {
@@ -981,6 +992,9 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 error:
 	phy_detach(phydev);
+error_module_put:
+	module_put(d->driver->owner);
+error_put_device:
 	put_device(d);
 	module_put(d->driver->owner);
 	if (ndev_owner != bus->owner)
-- 
2.9.3
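The ordering this commit message describes (take the module reference first, probe second, and unwind in reverse order on failure) can be modelled in a few lines of standalone C. This is a hedged sketch, not the PHYLIB implementation: struct fake_module, fake_try_module_get, fake_module_put and fake_attach are hypothetical, and plain integer counters stand in for the real device and module reference counts.

```c
#include <errno.h>

/* Hypothetical stand-in for a module with a reference count. */
struct fake_module {
	int refcnt;
	int alive; /* 0 models a module that is going away */
};

/* Models try_module_get(): fails if the module is going away. */
static int fake_try_module_get(struct fake_module *m)
{
	if (!m->alive)
		return 0;

	m->refcnt++;
	return 1;
}

/* Models module_put(). */
static void fake_module_put(struct fake_module *m)
{
	m->refcnt--;
}

/* Acquire a device reference, then a module reference, then "probe".
 * probe_err is the value the simulated probe returns (0 on success).
 * Each error label undoes exactly one acquisition, in reverse order.
 */
static int fake_attach(struct fake_module *drv, int *dev_refs, int probe_err)
{
	int err;

	(*dev_refs)++; /* models get_device() */

	if (!fake_try_module_get(drv)) {
		err = -EIO;
		goto error_put_device;
	}

	if (probe_err) {
		err = probe_err;
		goto error_module_put;
	}

	return 0;

error_module_put:
	fake_module_put(drv);
error_put_device:
	(*dev_refs)--; /* models put_device() */
	return err;
}
```

In this model every failure path leaves both counters where they started, which is the property the error labels are meant to guarantee.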
[PATCH net v3 2/3] net: phy: Check phydev->drv
In preparation for supporting driver bind/unbind properly, sprinkle checks on
phydev->drv where we may call into PHYLIB from user-space or other parts of the
kernel.

Suggested-by: Russell King
Signed-off-by: Florian Fainelli
---
 drivers/net/phy/phy.c        | 26 ++
 drivers/net/phy/phy_device.c |  2 +-
 include/linux/phy.h          |  3 +++
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 7cc1b7dcfe05..d6f7838455dd 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -580,7 +580,7 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd)
 		return 0;

 	case SIOCSHWTSTAMP:
-		if (phydev->drv->hwtstamp)
+		if (phydev->drv && phydev->drv->hwtstamp)
 			return phydev->drv->hwtstamp(phydev, ifr);
 		/* fall through */

@@ -603,6 +603,9 @@ int phy_start_aneg(struct phy_device *phydev)
 {
 	int err;

+	if (!phydev->drv)
+		return -EIO;
+
 	mutex_lock(&phydev->lock);

 	if (AUTONEG_DISABLE == phydev->autoneg)
@@ -975,7 +978,7 @@ void phy_state_machine(struct work_struct *work)

 	old_state = phydev->state;

-	if (phydev->drv->link_change_notify)
+	if (phydev->drv && phydev->drv->link_change_notify)
 		phydev->drv->link_change_notify(phydev);

 	switch (phydev->state) {
@@ -1286,6 +1289,9 @@ EXPORT_SYMBOL(phy_write_mmd_indirect);
  */
 int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable)
 {
+	if (!phydev->drv)
+		return -EIO;
+
 	/* According to 802.3az,the EEE is supported only in full duplex-mode.
 	 * Also EEE feature is active when core is operating with MII, GMII
 	 * or RGMII (all kinds). Internal PHYs are also allowed to proceed and
@@ -1363,6 +1369,9 @@ EXPORT_SYMBOL(phy_init_eee);
  */
 int phy_get_eee_err(struct phy_device *phydev)
 {
+	if (!phydev->drv)
+		return -EIO;
+
 	return phy_read_mmd_indirect(phydev, MDIO_PCS_EEE_WK_ERR, MDIO_MMD_PCS);
 }
 EXPORT_SYMBOL(phy_get_eee_err);
@@ -1379,6 +1388,9 @@ int phy_ethtool_get_eee(struct phy_device *phydev, struct ethtool_eee *data)
 {
 	int val;

+	if (!phydev->drv)
+		return -EIO;
+
 	/* Get Supported EEE */
 	val = phy_read_mmd_indirect(phydev, MDIO_PCS_EEE_ABLE, MDIO_MMD_PCS);
 	if (val < 0)
@@ -1412,6 +1424,9 @@ int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data)
 {
 	int val = ethtool_adv_to_mmd_eee_adv_t(data->advertised);

+	if (!phydev->drv)
+		return -EIO;
+
 	/* Mask prohibited EEE modes */
 	val &= ~phydev->eee_broken_modes;

@@ -1423,7 +1438,7 @@ EXPORT_SYMBOL(phy_ethtool_set_eee);

 int phy_ethtool_set_wol(struct phy_device *phydev, struct ethtool_wolinfo *wol)
 {
-	if (phydev->drv->set_wol)
+	if (phydev->drv && phydev->drv->set_wol)
 		return phydev->drv->set_wol(phydev, wol);

 	return -EOPNOTSUPP;
@@ -1432,7 +1447,7 @@ EXPORT_SYMBOL(phy_ethtool_set_wol);

 void phy_ethtool_get_wol(struct phy_device *phydev, struct ethtool_wolinfo *wol)
 {
-	if (phydev->drv->get_wol)
+	if (phydev->drv && phydev->drv->get_wol)
 		phydev->drv->get_wol(phydev, wol);
 }
 EXPORT_SYMBOL(phy_ethtool_get_wol);
@@ -1468,6 +1483,9 @@ int phy_ethtool_nway_reset(struct net_device *ndev)
 	if (!phydev)
 		return -ENODEV;

+	if (!phydev->drv)
+		return -EIO;
+
 	return genphy_restart_aneg(phydev);
 }
 EXPORT_SYMBOL(phy_ethtool_nway_reset);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d63d190a95ef..40675b9706ae 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1790,7 +1790,7 @@ static int phy_remove(struct device *dev)
 	phydev->state = PHY_DOWN;
 	mutex_unlock(&phydev->lock);

-	if (phydev->drv->remove)
+	if (phydev->drv && phydev->drv->remove)
 		phydev->drv->remove(phydev);
 	phydev->drv = NULL;

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7fc1105605bf..231e07bb0d76 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -802,6 +802,9 @@ int phy_stop_interrupts(struct phy_device *phydev);

 static inline int phy_read_status(struct phy_device *phydev)
 {
+	if (!phydev->drv)
+		return -EIO;
+
 	return phydev->drv->read_status(phydev);
 }

--
2.9.3
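The two guard styles used in the patch above (an early -EIO return for mandatory callbacks, and a combined driver-and-callback test for optional ones) can be illustrated outside the kernel. This is only a minimal userspace sketch: the demo_* types and functions are hypothetical stand-ins, not the real struct phy_device or struct phy_driver.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the PHYLIB structures. */
struct demo_phy_device;

struct demo_phy_driver {
	int (*read_status)(struct demo_phy_device *phydev);
	int (*set_wol)(struct demo_phy_device *phydev, int wol);
};

struct demo_phy_device {
	struct demo_phy_driver *drv;	/* NULL while no driver is bound */
	int link;
};

/* Mandatory callback: bail out with -EIO when no driver is bound. */
static int demo_read_status(struct demo_phy_device *phydev)
{
	if (!phydev->drv)
		return -EIO;

	return phydev->drv->read_status(phydev);
}

/* Optional callback: test the driver pointer before the callback pointer. */
static int demo_set_wol(struct demo_phy_device *phydev, int wol)
{
	if (phydev->drv && phydev->drv->set_wol)
		return phydev->drv->set_wol(phydev, wol);

	return -EOPNOTSUPP;
}

/* Example driver callback, assumed for illustration only. */
static int demo_drv_read_status(struct demo_phy_device *phydev)
{
	phydev->link = 1;
	return 0;
}
```

With the guards in place, a caller that races with an unbind gets an error code instead of dereferencing a NULL drv pointer.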
Re: [PATCH v2 net-next 5/9] openvswitch: Refactor labels initialization.
On 8 February 2017 at 11:32, Jarno Rajahalme wrote:
> Refactoring conntrack labels initialization makes chenges in later

*changes

> patches easier to review.
>
> Signed-off-by: Jarno Rajahalme

Minor other nit:

>
>         cl = nf_ct_labels_find(ct);
>         if (!cl) {
>                 nf_ct_labels_ext_add(ct);
>                 cl = nf_ct_labels_find(ct);
>         }
> +
> +       return cl;
> +}
> +
> +/* Initialize labels for a new, to be committed conntrack entry.  Note that

Maybe insert 'yet':

Initialize labels for a new, yet to be committed conntrack entry. Note that
[PATCH RFC v2 8/8] xdp: Cleanup after API changes
This patch: - Change trace_xdp_hook_exception to trace_xdp_exception - Remove XDP_SETUP_PROG and XDP_QUERY_PROG constants - Remove bpf_warn_invalid_xdp_action Signed-off-by: Tom Herbert--- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 4 +-- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 +-- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 4 +-- .../net/ethernet/netronome/nfp/nfp_net_common.c| 8 +++--- drivers/net/ethernet/qlogic/qede/qede_fp.c | 6 ++--- drivers/net/virtio_net.c | 8 +++--- include/linux/filter.h | 1 - include/linux/netdevice.h | 15 --- include/trace/events/xdp.h | 29 -- kernel/bpf/core.c | 1 - net/core/filter.c | 6 - 11 files changed, 16 insertions(+), 70 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 3cfdc94..e894b67 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -134,7 +134,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, case XDP_TX: if (tx_avail < 2) { - trace_xdp_hook_exception(bp->dev, last_hook, act); + trace_xdp_exception(bp->dev, last_hook, act); bnxt_reuse_rx_data(rxr, cons, page); return true; } @@ -150,7 +150,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, xdp_warn_invalid_action(act); /* Fall thru */ case XDP_ABORTED: - trace_xdp_hook_exception(bp->dev, last_hook, act); + trace_xdp_exception(bp->dev, last_hook, act); /* Fall thru */ case XDP_DROP: bnxt_reuse_rx_data(rxr, cons, page); diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index a8fddc0..d8648fe 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -927,12 +927,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud length, cq->ring, _pending))) goto consumed; - trace_xdp_hook_exception(dev, last_hook, act); + trace_xdp_exception(dev, 
last_hook, act); goto xdp_drop_no_cnt; /* Drop on xmit failure */ default: xdp_warn_invalid_action(act); case XDP_ABORTED: - trace_xdp_hook_exception(dev, last_hook, act); + trace_xdp_exception(dev, last_hook, act); case XDP_DROP: ring->xdp_drop++; xdp_drop_no_cnt: diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 50ab4b9..1be1eef 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -740,12 +740,12 @@ static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq, return false; case XDP_TX: if (unlikely(!mlx5e_xmit_xdp_frame(rq, di, ))) - trace_xdp_hook_exception(rq->netdev, last_hook, act); + trace_xdp_exception(rq->netdev, last_hook, act); return true; default: xdp_warn_invalid_action(act); case XDP_ABORTED: - trace_xdp_hook_exception(rq->netdev, last_hook, act); + trace_xdp_exception(rq->netdev, last_hook, act); case XDP_DROP: rq->stats.xdp_drop++; mlx5e_page_release(rq, di, true); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 2dee867..381f6be 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -1613,15 +1613,13 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget) if (unlikely(!nfp_net_tx_xdp_buf(nn, rx_ring, tx_ring, rxbuf, pkt_off, pkt_len))) - trace_xdp_hook_exception(nn->netdev, -last_hook, -
Re: [PATCH v2 net-next 6/9] openvswitch: Inherit master's labels.
On 8 February 2017 at 11:32, Jarno Rajahalme wrote:
> We avoid calling into nf_conntrack_in() for expected connections, as
> that would remove the expectation that we want to stick around until
> we are ready to commit the connection.  Instead, we do a lookup in the
> expectation table directly.  However, after a successful expectation
> lookup we have set the flow key label field from the master
> connection, whereas nf_conntrack_in() does not do this.  This leads to
> master's labels being inherited after an expectation lookup, but those
> labels not being inherited after the corresponding conntrack action
> with a commit flag.
>
> This patch resolves the problem by changing the commit code path to
> also inherit the master's labels to the expected connection.
> Resolving this conflict in favor or inheriting the labels allows more

*of inheriting

> information be passed from the master connection to related
> connections, which would otherwise be much harder if the 32 bits in
> the connmark are not enough.  Labels can still be set explicitly, so
> this change only affects the default values of the labels in presense
> of a master connection.
>
> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
> Signed-off-by: Jarno Rajahalme

> @@ -272,18 +274,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct,
> struct sw_flow_key *key,
>                               const struct ovs_key_ct_labels *labels,
>                               const struct ovs_key_ct_labels *mask)
>  {
> -       struct nf_conn_labels *cl;
> -       u32 *dst;
> -       int i;
> +       struct nf_conn_labels *cl, *master_cl;
> +       bool have_mask = labels_nonzero(mask);
> +
> +       /* Inherit master's labels to the related connection? */
> +       master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;

I don't think (ct->master) needs the parentheses.
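Re: [Inquiry] GFP-051U inquiry, switching power supply 48V 3.3A power supply
Hello, purchasing department,

We have received your message. Are you looking for a reliable supplier for "power supply manufacturing"? We are the most professional power supply manufacturer in Taiwan, with our own R&D team and manufacturing plant. We have long-term customers such as PHILIPS, HP, TOSHIBA, LITEON, etc., and extensive experience producing switching power supplies, USB chargers, and PoE power supplies.

You are welcome to contact us; we look forward to your reply.

吉密科技 林榮宗 0422587996 gme.po...@msa.hinet.net

If this message reached you in error, please forward it. Thank you.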
[PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP
This patch creates an infrastructure for registering and running code at
XDP hooks in drivers. This extends and generalizes the original XDP/BPF
interface. Specifically, it defines a generic xdp_hook structure and a
set of hooks that can be assigned to devices or napi instances. These
hooks are also generic to allow for XDP/BPF programs as well as non-BPF
code (e.g. kernel code can be written in a module).

An XDP hook is defined by the xdp_hook structure. A pointer to this
structure is passed into the XDP register function to set up a hook.
The XDP register function mallocs its own xdp_hook structure and copies
the values from the xdp_hook passed in. The register function also saves
the pointer value of the xdp_hook argument; this pointer is used in
subsequent calls to XDP to identify the registered hook.

The interface is defined in net/xdp.h. This includes the definition of
xdp_hook, functions to register and unregister hooks on a device
or individual instances of napi, and xdp_hook_run that is called by
drivers to run the hooks.

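The register-by-copy scheme described above (allocate a private copy, remember the caller's pointer as the hook's identity) might look roughly like the following userspace sketch. Everything here is an assumption made for illustration: the demo_* names, the singly linked list, the error codes, and the "stop at the first non-zero verdict" rule are not taken from net/xdp.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified hook descriptor; not the real struct xdp_hook. */
struct demo_hook {
	int (*func)(void *priv, void *pkt);
	void *priv;
	const struct demo_hook *key;	/* caller's pointer, used as identity */
	struct demo_hook *next;
};

struct demo_hook_list {
	struct demo_hook *head;
};

/* Register: allocate a private copy and remember the caller's pointer. */
static int demo_hook_register(struct demo_hook_list *list,
			      const struct demo_hook *tmpl)
{
	struct demo_hook *copy, **pos;

	/* Walk to the tail, rejecting a template that is already registered. */
	for (pos = &list->head; *pos; pos = &(*pos)->next)
		if ((*pos)->key == tmpl)
			return -EEXIST;

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -ENOMEM;

	*copy = *tmpl;		/* copy the values passed in */
	copy->key = tmpl;	/* save the pointer value for later lookups */
	copy->next = NULL;
	*pos = copy;
	return 0;
}

/* Unregister: find the private copy through the caller's pointer. */
static int demo_hook_unregister(struct demo_hook_list *list,
				const struct demo_hook *tmpl)
{
	struct demo_hook **pos, *victim;

	for (pos = &list->head; *pos; pos = &(*pos)->next) {
		if ((*pos)->key == tmpl) {
			victim = *pos;
			*pos = victim->next;
			free(victim);
			return 0;
		}
	}
	return -ENOENT;
}

/* Run hooks in registration order; stop at the first non-zero verdict. */
static int demo_hook_run(const struct demo_hook_list *list, void *pkt)
{
	const struct demo_hook *h;
	int ret;

	for (h = list->head; h; h = h->next) {
		ret = h->func(h->priv, pkt);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example hook, assumed for illustration: counts invocations via priv. */
static int demo_count_hook(void *priv, void *pkt)
{
	(void)pkt;
	(*(int *)priv)++;
	return 0;
}
```

Because the list owns its copies, the caller's structure only has to stay at a stable address for as long as it is used as the lookup key.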
Signed-off-by: Tom Herbert--- drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c | 1 + include/linux/filter.h | 10 +- include/linux/netdev_features.h | 3 +- include/linux/netdevice.h| 16 ++ include/net/xdp.h| 310 +++ include/trace/events/xdp.h | 31 +++ kernel/bpf/core.c| 1 + net/core/Makefile| 2 +- net/core/dev.c | 53 ++-- net/core/filter.c| 1 + net/core/rtnetlink.c | 14 +- net/core/xdp.c | 304 ++ 12 files changed, 711 insertions(+), 35 deletions(-) create mode 100644 include/net/xdp.h create mode 100644 net/core/xdp.c diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c index 335beb8..d294fb2 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c @@ -38,6 +38,7 @@ #include #include #include +#include #include "nfp_asm.h" #include "nfp_bpf.h" diff --git a/include/linux/filter.h b/include/linux/filter.h index e4eb254..bb9f2f2 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -428,7 +428,7 @@ struct sk_filter { struct bpf_prog *prog; }; -#define BPF_PROG_RUN(filter, ctx) (*filter->bpf_func)(ctx, filter->insnsi) +#define BPF_PROG_RUN(filter, ctx) (*(filter)->bpf_func)(ctx, (filter)->insnsi) #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN @@ -437,12 +437,6 @@ struct bpf_skb_data_end { void *data_end; }; -struct xdp_buff { - void *data; - void *data_end; - void *data_hard_start; -}; - /* compute the linear packet data range [data, data_end) which * will be accessed by cls_bpf, act_bpf and lwt programs */ @@ -504,6 +498,8 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, return BPF_PROG_RUN(prog, skb); } +struct xdp_buff; + static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, struct xdp_buff *xdp) { diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 9a04195..f22d379 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -71,8 +71,8 @@ 
enum { NETIF_F_HW_VLAN_STAG_RX_BIT,/* Receive VLAN STAG HW acceleration */ NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */ NETIF_F_HW_L2FW_DOFFLOAD_BIT, /* Allow L2 Forwarding in Hardware */ - NETIF_F_HW_TC_BIT, /* Offload TC infrastructure */ + NETIF_F_XDP_BIT,/* Support XDP interface */ /* * Add your fresh new feature above and remember to update @@ -134,6 +134,7 @@ enum { #define NETIF_F_HW_VLAN_STAG_TX__NETIF_F(HW_VLAN_STAG_TX) #define NETIF_F_HW_L2FW_DOFFLOAD __NETIF_F(HW_L2FW_DOFFLOAD) #define NETIF_F_HW_TC __NETIF_F(HW_TC) +#define NETIF_F_XDP__NETIF_F(XDP) #define for_each_netdev_feature(mask_addr, bit)\ for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 58afbd1..2404e76 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -324,6 +324,7 @@ struct napi_struct { struct sk_buff *skb; struct hrtimer timer; struct list_headdev_list; + struct xdp_hook_set __rcu *xdp_hooks; struct hlist_node napi_hash_node; unsigned intnapi_id; }; @@ -821,12 +822,25 @@ enum xdp_netdev_command { * return true if a program is
[PATCH RFC v2 3/8] nfp: Changes to use generic XDP infrastructure
Change XDP program management functional interface to correspond to new XDP API. Signed-off-by: Tom Herbert--- drivers/net/ethernet/netronome/nfp/nfp_net.h | 5 +- .../net/ethernet/netronome/nfp/nfp_net_common.c| 172 ++--- .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 12 +- 3 files changed, 87 insertions(+), 102 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h index 2115f44..09a315e 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h @@ -504,14 +504,13 @@ struct nfp_net { unsigned fw_loaded:1; unsigned bpf_offload_skip_sw:1; unsigned bpf_offload_xdp:1; + unsigned xdp_enabled:1; u32 ctrl; u32 fl_bufsz; u32 rx_offset; - struct bpf_prog *xdp_prog; - struct nfp_net_tx_ring *tx_rings; struct nfp_net_rx_ring *rx_rings; @@ -792,7 +791,7 @@ void nfp_net_coalesce_write_cfg(struct nfp_net *nn); int nfp_net_irqs_alloc(struct nfp_net *nn); void nfp_net_irqs_disable(struct nfp_net *nn); int -nfp_net_ring_reconfig(struct nfp_net *nn, struct bpf_prog **xdp_prog, +nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_ring_set *rx, struct nfp_net_ring_set *tx); #ifdef CONFIG_NFP_NET_DEBUG diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 6ac43ab..2dee867 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -65,6 +65,7 @@ #include #include +#include #include "nfp_net_ctrl.h" #include "nfp_net.h" @@ -1166,10 +1167,10 @@ nfp_net_napi_alloc_one(struct nfp_net *nn, int direction, dma_addr_t *dma_addr) { void *frag; - if (!nn->xdp_prog) - frag = napi_alloc_frag(nn->fl_bufsz); - else + if (nn->xdp_enabled) frag = page_address(alloc_page(GFP_ATOMIC | __GFP_COLD)); + else + frag = napi_alloc_frag(nn->fl_bufsz); if (!frag) { nn_warn_ratelimit(nn, "Failed to alloc receive page frag\n"); return NULL; @@ -1177,7 
+1178,7 @@ nfp_net_napi_alloc_one(struct nfp_net *nn, int direction, dma_addr_t *dma_addr) *dma_addr = nfp_net_dma_map_rx(nn, frag, nn->fl_bufsz, direction); if (dma_mapping_error(>pdev->dev, *dma_addr)) { - nfp_net_free_frag(frag, nn->xdp_prog); + nfp_net_free_frag(frag, nn->xdp_enabled); nn_warn_ratelimit(nn, "Failed to map DMA RX buffer\n"); return NULL; } @@ -1248,17 +1249,15 @@ static void nfp_net_rx_ring_reset(struct nfp_net_rx_ring *rx_ring) * nfp_net_rx_ring_bufs_free() - Free any buffers currently on the RX ring * @nn:NFP Net device * @rx_ring: RX ring to remove buffers from - * @xdp: Whether XDP is enabled * * Assumes that the device is stopped and buffers are in [0, ring->cnt - 1) * entries. After device is disabled nfp_net_rx_ring_reset() must be called * to restore required ring geometry. */ static void -nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring, - bool xdp) +nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring) { - int direction = xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; + int direction = nn->xdp_enabled ? 
DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; unsigned int i; for (i = 0; i < rx_ring->cnt - 1; i++) { @@ -1271,7 +1270,7 @@ nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring, nfp_net_dma_unmap_rx(nn, rx_ring->rxbufs[i].dma_addr, rx_ring->bufsz, direction); - nfp_net_free_frag(rx_ring->rxbufs[i].frag, xdp); + nfp_net_free_frag(rx_ring->rxbufs[i].frag, nn->xdp_enabled); rx_ring->rxbufs[i].dma_addr = 0; rx_ring->rxbufs[i].frag = NULL; } @@ -1284,8 +1283,7 @@ nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring, * @xdp: Whether XDP is enabled */ static int -nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring, - bool xdp) +nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring) { struct nfp_net_rx_buf *rxbufs; unsigned int i; @@ -1295,9 +1293,9 @@ nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring, for (i = 0; i < rx_ring->cnt - 1; i++) { rxbufs[i].frag = nfp_net_rx_alloc_one(rx_ring, [i].dma_addr, -rx_ring->bufsz, xdp); +rx_ring->bufsz, nn->xdp_enabled);
Re: [PATCH net v3 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
On 02/08/2017 04:13 PM, Florian Fainelli wrote:
> The Generic PHY drivers gets assigned after we checked that the current
> PHY driver is NULL, so we need to check a few things before we can
> safely dereference d->driver. This would be causing a NULL deference to
> occur when a system binds to the Generic PHY driver. Update
> phy_attach_direct() to do the following:
>
> - grab the driver module reference after we have assigned the Generic
>   PHY drivers accordingly
>
> - update the error path to clean up the module reference in case the
>   Generic PHY probe function fails
>
> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
> Signed-off-by: Florian Fainelli
> ---
>  drivers/net/phy/phy_device.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 0d8f4d3847f6..d63d190a95ef 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct
> phy_device *phydev,
>  	struct module *ndev_owner = dev->dev.parent->driver->owner;
>  	struct mii_bus *bus = phydev->mdio.bus;
>  	struct device *d = &phydev->mdio.dev;
> +	bool using_genphy = false;
>  	int err;
>
>  	/* For Ethernet device drivers that register their own MDIO bus, we
> @@ -938,12 +939,22 @@ int phy_attach_direct(struct net_device *dev, struct
> phy_device *phydev,
>  			d->driver =
>  				&genphy_driver[GENPHY_DRV_1G].mdiodrv.driver;
>
> +		using_genphy = true;
> +	}
> +
> +	if (!try_module_get(d->driver->owner)) {
> +		dev_err(&dev->dev, "failed to get the device driver module\n");
> +		err = -EIO;
> +		goto error_put_device;
> +	}

And still not correct, since we need to remove the other hunk, one day I
will learn how to properly rebase my work... will submit a v4 shortly.
--
Florian
[PATCH net v3 3/3] net: phy: Fix PHY driver bind and unbind events
The PHY library does not deal very well with bind and unbind events. The first
thing we would see is that we were not properly canceling the PHY state machine
workqueue, so we would be crashing while dereferencing phydev->drv since there
is no driver attached anymore.

Once we fix that, there are several things that did not quite work as expected:

- if the PHY state machine was running, we were not stopping it properly, and
  the state machine state would not be marked as such

- when we rebind the driver, nothing would happen, since we would not know which
  state we were before the unbind

This patch takes the following approach:

- if the PHY was attached, and the state machine was running we would stop it,
  remember where we left, and schedule the state machine for restart upon
  driver bind

- if the PHY was attached, but HALTED, we would let it in that state, and do not
  alter the state upon driver bind

- in all other cases (detached) we would keep the PHY in DOWN state waiting for
  a network driver to show up, and set PHY_READY on driver bind

Suggested-by: Russell King
Signed-off-by: Florian Fainelli
---
 drivers/net/phy/phy_device.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 40675b9706ae..6e46f6807bb7 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1723,6 +1723,7 @@ static int phy_probe(struct device *dev)
 	struct phy_device *phydev = to_phy_device(dev);
 	struct device_driver *drv = phydev->mdio.dev.driver;
 	struct phy_driver *phydrv = to_phy_driver(drv);
+	bool should_start = false;
 	int err = 0;

 	phydev->drv = phydrv;
@@ -1772,24 +1773,46 @@ static int phy_probe(struct device *dev)
 	}

 	/* Set the state to READY by default */
-	phydev->state = PHY_READY;
+	if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
+		should_start = true;
+	else
+		phydev->state = PHY_READY;

 	if (phydev->drv->probe)
 		err = phydev->drv->probe(phydev);

 	mutex_unlock(&phydev->lock);

+	if (should_start)
+		phy_start(phydev);
+
 	return err;
 }

 static int phy_remove(struct device *dev)
 {
 	struct phy_device *phydev = to_phy_device(dev);
+	bool should_stop = false;
+	enum phy_state state;
+
+	cancel_delayed_work_sync(&phydev->state_queue);

 	mutex_lock(&phydev->lock);
-	phydev->state = PHY_DOWN;
+	state = phydev->state;
+	if (state > PHY_UP && state != PHY_HALTED)
+		should_stop = true;
+	else
+		phydev->state = PHY_DOWN;
 	mutex_unlock(&phydev->lock);

+	/* phy_stop() sets the state to HALTED, undo that for the ->probe() function
+	 * to have a chance to resume where we left
+	 */
+	if (should_stop) {
+		phy_stop(phydev);
+		phydev->state = state;
+	}
+
 	if (phydev->drv && phydev->drv->remove)
 		phydev->drv->remove(phydev);
 	phydev->drv = NULL;
--
2.9.3
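The state handling in the diff above boils down to one predicate that is evaluated on both unbind and bind. The sketch below mirrors the two condition checks from the diff as pure functions; the enum is a hypothetical, shortened state list (the real enum phy_state has more members), and the only property assumed is that "running" states compare greater than the UP state.

```c
#include <stdbool.h>

/* Hypothetical subset of PHY states; ordering is the only thing assumed. */
enum demo_phy_state {
	DEMO_PHY_DOWN,
	DEMO_PHY_READY,
	DEMO_PHY_UP,
	DEMO_PHY_RUNNING,
	DEMO_PHY_HALTED,
};

/* Same test as `state > PHY_UP && state != PHY_HALTED` in the diff. */
static bool demo_machine_active(enum demo_phy_state state)
{
	return state > DEMO_PHY_UP && state != DEMO_PHY_HALTED;
}

/* State left behind by the remove path. */
static enum demo_phy_state demo_state_after_unbind(enum demo_phy_state state)
{
	/* Active: keep the state so a later bind can resume from it. */
	if (demo_machine_active(state))
		return state;

	return DEMO_PHY_DOWN;
}

/* State chosen by the probe path; *should_start asks for a restart. */
static enum demo_phy_state demo_state_after_bind(enum demo_phy_state state,
						 bool *should_start)
{
	*should_start = demo_machine_active(state);
	if (*should_start)
		return state;

	return DEMO_PHY_READY;
}
```

Keeping the predicate in one helper makes it easier to see that remove and probe agree on which states count as "was running".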
[PATCH RFC v2 6/8] mlx5: Changes to use generic XDP infrastructure
Change XDP program management functional interface to correspond to new XDP API. Signed-off-by: Tom Herbert--- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 105 ++ drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 +-- 3 files changed, 33 insertions(+), 87 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 95ca03c..0255423 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -381,7 +381,6 @@ struct mlx5e_rq { u16rx_headroom; struct mlx5e_rx_am am; /* Adaptive Moderation */ - struct bpf_prog *xdp_prog; /* control */ struct mlx5_wq_ctrlwq_ctrl; @@ -695,7 +694,7 @@ struct mlx5e_priv { /* priv data path fields - start */ struct mlx5e_sq**txq_to_sq_map; int channeltc_to_txq_map[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC]; - struct bpf_prog *xdp_prog; + bool xdp_enabled; /* priv data path fields - end */ unsigned long state; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 3cce628..da91cf52 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -36,6 +36,7 @@ #include #include #include +#include #include "en.h" #include "en_tc.h" #include "eswitch.h" @@ -113,7 +114,7 @@ static void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type) static void mlx5e_set_rq_priv_params(struct mlx5e_priv *priv) { u8 rq_type = mlx5e_check_fragmented_striding_rq_cap(priv->mdev) && - !priv->xdp_prog ? + !priv->xdp_enabled ? MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ : MLX5_WQ_TYPE_LINKED_LIST; mlx5e_set_rq_type_params(priv, rq_type); @@ -568,14 +569,12 @@ static int mlx5e_create_rq(struct mlx5e_channel *c, rq->ix = c->ix; rq->priv= c->priv; - rq->xdp_prog = priv->xdp_prog ? 
bpf_prog_inc(priv->xdp_prog) : NULL; - if (IS_ERR(rq->xdp_prog)) { - err = PTR_ERR(rq->xdp_prog); - rq->xdp_prog = NULL; - goto err_rq_wq_destroy; - } - - if (rq->xdp_prog) { + if (priv->xdp_enabled) { + /* Note XDP is checked whether it is enabled for the device. If +* XDP programs are set per ring as opposed to setting program +* across the device this could be adjusted to account for +* that. +*/ rq->buff.map_dir = DMA_BIDIRECTIONAL; rq->rx_headroom = XDP_PACKET_HEADROOM; } else { @@ -662,8 +661,6 @@ static int mlx5e_create_rq(struct mlx5e_channel *c, mlx5_core_destroy_mkey(mdev, >umr_mkey); err_rq_wq_destroy: - if (rq->xdp_prog) - bpf_prog_put(rq->xdp_prog); mlx5_wq_destroy(>wq_ctrl); return err; @@ -673,9 +670,6 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq) { int i; - if (rq->xdp_prog) - bpf_prog_put(rq->xdp_prog); - switch (rq->wq_type) { case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: mlx5e_rq_free_mpwqe_info(rq); @@ -1547,7 +1541,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, c->netdev = priv->netdev; c->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key); c->num_tc = priv->params.num_tc; - c->xdp = !!priv->xdp_prog; + c->xdp = priv->xdp_enabled; if (priv->params.rx_am_enabled) rx_cq_profile = mlx5e_am_get_def_profile(priv->params.rx_cq_period_mode); @@ -3196,96 +3190,52 @@ static void mlx5e_tx_timeout(struct net_device *dev) schedule_work(>tx_timeout_work); } -static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog) +static int mlx5e_xdp_init(struct net_device *netdev, bool enable) { struct mlx5e_priv *priv = netdev_priv(netdev); - struct bpf_prog *old_prog; int err = 0; - bool reset, was_opened; - int i; + bool was_opened; mutex_lock(>state_lock); - if ((netdev->features & NETIF_F_LRO) && prog) { + if (priv->xdp_enabled == enable) + goto unlock; + + if ((netdev->features & NETIF_F_LRO) && enable) { netdev_warn(netdev, "can't set XDP while LRO is on, disable LRO first\n"); err = -EINVAL; goto unlock; } was_opened 
= test_bit(MLX5E_STATE_OPENED, >state); - /* no need for full reset when exchanging programs */ - reset = (!priv->xdp_prog || !prog); - if
Re: Extending socket timestamping API for NTP
On Wed, Feb 8, 2017 at 2:26 AM, Miroslav Lichvar wrote:
> On Tue, Feb 07, 2017 at 12:37:15PM -0800, sdncurious wrote:
>> On Tue, Feb 7, 2017 at 6:01 AM, Miroslav Lichvar wrote:
>> > 6) new SO_TIMESTAMPING option to get PHC index with HW timestamps
>> >
>> >    With bridges, bonding and other things it's difficult to determine
>> >    which PHC timestamped the packet. It would be very useful if the
>> >    PHC index was provided with each HW timestamp.
>> >
>> >    I'm not sure what would be the best place to put it. I guess the
>> >    second timespec in scm_timestamping could be reused for this, but
>> >    that sounds like a gross hack. Do we need to define a new struct?
>>
>> What is the use case for this. even if the delay though the PHY's how
>> would that be compensated ?
>
> The idea was that applications like NTP servers and clients wouldn't
> have to care about interfaces and how they map together with addresses
> to PHCs over time. Currently, I use the interface index from
> IP_PKTINFO to get the PHC, but that doesn't work with bridges and
> other virtual interfaces. Another possibility would be an option to
> modify the behavior of IP_PKTINFO to save the index of the real
> interface. I'm not sure how would that compare in difficulty to
> extending SCM_TIMESTAMPING with PHC index.

Why not just return the digest that is in the message ? Though I am
not sure if the least 32 bits will result in too many collisions.

RMS

>
> --
> Miroslav Lichvar
[PATCH 2/2] net: ethernet: ti: cpsw: fix resume because of usage count
The usage count function is based on the ndev_running flag, which is updated
before ndo_open/close is called. If the ndo is called from another place, in
this case suspend/resume, the counter is not changed, which breaks
suspend/resume. For a common resource it makes no difference which device is
using it; only the device count matters. So replace the usage count function
with a variable, and increment and decrement it in ndo_open/close.

Fixes: 03fd01ad0eead23eb79294b6fb4d71dcac493855 "net: ethernet: ti: cpsw: don't duplicate ndev_running"
Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 42 --
 1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 9714fab..1ffaad1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -399,6 +399,7 @@ struct cpsw_common {
 	struct cpts *cpts;
 	int rx_ch_num, tx_ch_num;
 	int speed;
+	int usage_count;
 };

 struct cpsw_priv {
@@ -671,18 +672,6 @@ static void cpsw_intr_disable(struct cpsw_common *cpsw)
 	return;
 }

-static int cpsw_get_usage_count(struct cpsw_common *cpsw)
-{
-	u32 i;
-	u32 usage_count = 0;
-
-	for (i = 0; i < cpsw->data.slaves; i++)
-		if (cpsw->slaves[i].ndev && netif_running(cpsw->slaves[i].ndev))
-			usage_count++;
-
-	return usage_count;
-}
-
 static void cpsw_tx_handler(void *token, int len, int status)
 {
 	struct netdev_queue *txq;
@@ -716,8 +705,7 @@ static void cpsw_rx_handler(void *token, int len, int status)

 	if (unlikely(status < 0) || unlikely(!netif_running(ndev))) {
 		/* In dual emac mode check for all interfaces */
-		if (cpsw->data.dual_emac &&
-		    cpsw_get_usage_count(cpsw) &&
+		if (cpsw->data.dual_emac && cpsw->usage_count &&
 		    (status >= 0)) {
 			/* The packet received is for the interface which
 			 * is already down and the other interface is up
@@ -1492,11 +1480,8 @@ static int cpsw_ndo_open(struct net_device *ndev)
 		 CPSW_MAJOR_VERSION(reg), CPSW_MINOR_VERSION(reg),
 		 CPSW_RTL_VERSION(reg));

-	/* Initialize host and slave ports.
-	 * Given ndev is marked as opened already, so init port only if 1 ndev
-	 * is opened
-	 */
-	if (cpsw_get_usage_count(cpsw) < 2)
+	/* Initialize host and slave ports */
+	if (!cpsw->usage_count)
 		cpsw_init_host_port(priv);
 	for_each_slave(priv, cpsw_slave_open, priv);

@@ -1507,10 +1492,8 @@ static int cpsw_ndo_open(struct net_device *ndev)
 		cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
 				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);

-	/* Given ndev is marked as opened already, so if more ndev
-	 * are opened - no need to init shared resources.
-	 */
-	if (cpsw_get_usage_count(cpsw) < 2) {
+	/* initialize shared resources for every ndev */
+	if (!cpsw->usage_count) {
 		/* disable priority elevation */
 		__raw_writel(0, &cpsw->regs->ptype);

@@ -1552,6 +1535,7 @@ static int cpsw_ndo_open(struct net_device *ndev)

 	cpdma_ctlr_start(cpsw->dma);
 	cpsw_intr_enable(cpsw);
+	cpsw->usage_count++;

 	return 0;

@@ -1572,10 +1556,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
 	netif_tx_stop_all_queues(priv->ndev);
 	netif_carrier_off(priv->ndev);

-	/* Given ndev is marked as close already,
-	 * so disable shared resources if no open devices
-	 */
-	if (!cpsw_get_usage_count(cpsw)) {
+	if (cpsw->usage_count <= 1) {
 		napi_disable(&cpsw->napi_rx);
 		napi_disable(&cpsw->napi_tx);
 		cpts_unregister(cpsw->cpts);
@@ -1588,6 +1569,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
 	if (cpsw_need_resplit(cpsw))
 		cpsw_split_res(ndev);

+	cpsw->usage_count--;
 	pm_runtime_put_sync(cpsw->dev);
 	return 0;
 }
@@ -2393,7 +2375,7 @@ static int cpsw_resume_data_pass(struct net_device *ndev)
 		netif_dormant_off(slave->ndev);

 	/* After this receive is started */
-	if (cpsw_get_usage_count(cpsw)) {
+	if (cpsw->usage_count) {
 		ret = cpsw_fill_rx_channels(priv);
 		if (ret)
 			return ret;
@@ -2447,7 +2429,7 @@ static int cpsw_set_channels(struct net_device *ndev,
 		}
 	}

-	if (cpsw_get_usage_count(cpsw))
+	if (cpsw->usage_count)
 		cpsw_split_res(ndev);

 	ret = cpsw_resume_data_pass(ndev);
@@ -2529,7 +2511,7 @@ static int cpsw_set_ringparam(struct net_device
[PATCH 1/2] net: ethernet: ti: cpsw: fix cpsw assignment in resume
There is a copy-paste error which hides a breakage of resume in the CPSW
driver: netdev_priv() was replaced with ndev_to_cpsw(ndev) in suspend, but was
left unchanged in resume.

Fixes: 606f39939595a4d4540406bfc11f265b2036af6d (ti: cpsw: move platform data and slaves info to cpsw_common)
Reported-by: Alexey Starikovskiy
Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4d1c0c3..9714fab 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -3225,7 +3225,7 @@ static int cpsw_resume(struct device *dev)
 {
 	struct platform_device *pdev = to_platform_device(dev);
 	struct net_device *ndev = platform_get_drvdata(pdev);
-	struct cpsw_common *cpsw = netdev_priv(ndev);
+	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);

 	/* Select default pin state */
 	pinctrl_pm_select_default_state(dev);
--
2.7.4
[PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume
These two patches fix suspend/resume chain.

Ivan Khoronzhuk (2):
  net: ethernet: ti: cpsw: fix cpsw assignment in resume
  net: ethernet: ti: cpsw: fix resume because of usage count

 drivers/net/ethernet/ti/cpsw.c | 44 +-
 1 file changed, 13 insertions(+), 31 deletions(-)

--
2.7.4
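The idea behind patch 2/2 of this series, an explicit counter bumped in open and dropped in stop instead of a count derived from netif_running(), can be sketched in isolation. The demo_* names below are hypothetical and the "shared resource" is reduced to a flag; the two conditions are the ones the patch uses (!usage_count on open, usage_count <= 1 on stop).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared part of the driver state. */
struct demo_common {
	int usage_count;	/* number of currently opened interfaces */
	bool shared_ready;	/* whether shared resources are set up */
};

/* Open path: the first opener initializes the shared resources. */
static void demo_open(struct demo_common *c)
{
	if (!c->usage_count)
		c->shared_ready = true;

	c->usage_count++;
}

/* Stop path: the last user tears the shared resources down. */
static void demo_stop(struct demo_common *c)
{
	if (c->usage_count <= 1)
		c->shared_ready = false;

	c->usage_count--;
}
```

Because the counter changes only inside these two functions, it stays correct no matter which caller (the network core or a suspend/resume handler) invokes them.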
[PATCH RFC v2 4/8] qede: Changes to use generic XDP infrastructure
Change XDP program management functional interface to correspond to new XDP API. Signed-off-by: Tom Herbert--- drivers/net/ethernet/qlogic/qede/qede.h | 3 +- drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 2 +- drivers/net/ethernet/qlogic/qede/qede_filter.c | 39 ++--- drivers/net/ethernet/qlogic/qede/qede_fp.c | 36 +-- drivers/net/ethernet/qlogic/qede/qede_main.c| 23 --- 5 files changed, 44 insertions(+), 59 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index b423406..e1baf88 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -213,10 +213,9 @@ struct qede_dev { u16 geneve_dst_port; bool wol_enabled; + bool xdp_enabled; struct qede_rdma_devrdma_info; - - struct bpf_prog *xdp_prog; }; enum QEDE_STATE { diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c index baf2642..5559d6e 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c +++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c @@ -341,7 +341,7 @@ static int qede_get_sset_count(struct net_device *dev, int stringset) num_stats += QEDE_RSS_COUNT(edev) * QEDE_NUM_RQSTATS; /* Account for XDP statistics [if needed] */ - if (edev->xdp_prog) + if (edev->xdp_enabled) num_stats += QEDE_RSS_COUNT(edev) * QEDE_NUM_TQSTATS; return num_stats; diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c b/drivers/net/ethernet/qlogic/qede/qede_filter.c index 107c3fd..9c9db44 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_filter.c +++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c @@ -426,7 +426,7 @@ int qede_set_features(struct net_device *dev, netdev_features_t features) * aggregations, so no need to actually reload. 
*/ __qede_lock(edev); - if (edev->xdp_prog) + if (edev->xdp_enabled) args.func(edev, ); else qede_reload(edev, , true); @@ -506,29 +506,21 @@ void qede_udp_tunnel_del(struct net_device *dev, struct udp_tunnel_info *ti) schedule_delayed_work(>sp_task, 0); } -static void qede_xdp_reload_func(struct qede_dev *edev, -struct qede_reload_args *args) +static int qede_xdp_check_bpf(struct qede_dev *edev, struct bpf_prog *prog) { - struct bpf_prog *old; - - old = xchg(>xdp_prog, args->u.new_prog); - if (old) - bpf_prog_put(old); -} - -static int qede_xdp_set(struct qede_dev *edev, struct bpf_prog *prog) -{ - struct qede_reload_args args; - if (prog && prog->xdp_adjust_head) { DP_ERR(edev, "Does not support bpf_xdp_adjust_head()\n"); return -EOPNOTSUPP; } - /* If we're called, there was already a bpf reference increment */ - args.func = _xdp_reload_func; - args.u.new_prog = prog; - qede_reload(edev, , false); + return 0; +} + +static int qede_xdp_init(struct qede_dev *edev, bool enable) +{ + edev->xdp_enabled = enable; + + qede_reload(edev, NULL, false); return 0; } @@ -538,11 +530,12 @@ int qede_xdp(struct net_device *dev, struct netdev_xdp *xdp) struct qede_dev *edev = netdev_priv(dev); switch (xdp->command) { - case XDP_SETUP_PROG: - return qede_xdp_set(edev, xdp->prog); - case XDP_QUERY_PROG: - xdp->prog_attached = !!edev->xdp_prog; - return 0; + case XDP_MODE_OFF: + return qede_xdp_init(edev, true); + case XDP_MODE_ON: + return qede_xdp_init(edev, false); + case XDP_CHECK_BPF_PROG: + return qede_xdp_check_bpf(edev, xdp->prog); default: return -EINVAL; } diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c index 26848ee..af885c3 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_fp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include "qede.h" @@ -987,13 +988,14 @@ static bool qede_pkt_is_ip_fragmented(struct eth_fast_path_rx_reg_cqe *cqe, static bool 
qede_rx_xdp(struct qede_dev *edev, struct qede_fastpath *fp, struct qede_rx_queue *rxq, - struct bpf_prog *prog, struct sw_rx_data *bd, struct eth_fast_path_rx_reg_cqe *cqe) { u16 len = le16_to_cpu(cqe->len_on_first_bd); struct xdp_buff xdp; enum xdp_action act; + struct xdp_hook
[PATCH RFC v2 0/8] xdp: Generalize XDP
This patch set generalizes XDP by making the hooks in drivers generic. This has a number of advantages:

- Allows alternative users of the XDP hooks other than the original BPF
- Allows a means to pipeline XDP programs together
- Reduces the amount of code and complexity needed in drivers to manage XDP
- Provides a more structured environment that is extensible to new features while being mostly transparent to the drivers

The generic XDP infrastructure is based on an xdp_hook structure that contains callback functions and a private data structure that can be populated by the user of XDP. The XDP hooks are registered either on a netdev or a napi (both maintain a list of XDP hooks). Allowing per netdev hooks makes management of XDP a lot simpler when the intent is for the hook to apply to the whole device (as is the case with XDP_BPF so far).

Multiple XDP hooks may be registered on a device or napi instance; the order of execution is given by the priority field of the xdp_hook structure. Execution of the list continues to the end or until a program returns something other than XDP_PASS. If both napi XDP hooks and device hooks are enabled, the NAPI hooks are run first.

The xdp_hook structure contains a "hookfn" field, which is the function that executes a hook. The "priv" structure is private data that is provided as an argument to hookfn; in the case of a BPF hook this is simply the bpf_prog.

Hooks may be registered by xdp_register_dev_hook or xdp_register_napi_hook, and subsequently they can be unregistered by xdp_unregister_dev_hook and xdp_unregister_napi_hook. The identifier for a hook is the pointer to the template hook that was used to register the hook. xdp_find_dev_hook and xdp_find_napi_hook will return whether a hook has been registered and optionally return the contents of the hook.

xdp_bpf_check_prog is called for BPF programs to check if the driver is okay with running the program (uses the XDP_CHECK_BPF_PROG ndo command described below).
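The execution order described above can be modelled in a few lines of plain userspace C. This is only a sketch of the semantics in the cover letter, not the code in net/core/xdp.c: every sk_ name is hypothetical, the action values are arbitrary, and "lower priority value runs first" is an assumption, since the letter only says that the priority field determines the order.

```c
#include <assert.h>
#include <stddef.h>

/* Action names follow the cover letter; the numeric values are arbitrary. */
enum sk_xdp_action { SK_XDP_ABORTED, SK_XDP_DROP, SK_XDP_PASS, SK_XDP_TX };

/* Hypothetical stand-in for xdp_hook: a callback, its private data, a priority. */
struct sk_hook {
	int priority;	/* assumption: lower value runs first */
	int (*hookfn)(const void *priv, void *pkt);
	const void *priv;
};

#define SK_MAX_HOOKS 8

/* A simple array of hooks, kept sorted by priority. */
struct sk_hook_list {
	struct sk_hook hooks[SK_MAX_HOOKS];
	size_t count;
};

/* Copy the template hook into the list, keeping the array sorted.
 * Returns 0 on success, -1 if the list is full.
 */
static int sk_hook_register(struct sk_hook_list *l, const struct sk_hook *tmpl)
{
	size_t i;

	if (l->count == SK_MAX_HOOKS)
		return -1;

	/* Shift higher-priority-value entries up by one slot. */
	for (i = l->count; i > 0 && l->hooks[i - 1].priority > tmpl->priority; i--)
		l->hooks[i] = l->hooks[i - 1];

	l->hooks[i] = *tmpl;
	l->count++;
	return 0;
}

/* Run one list: stop at the first hook that returns something other than PASS. */
static int sk_hook_list_run(const struct sk_hook_list *l, void *pkt)
{
	size_t i;

	for (i = 0; i < l->count; i++) {
		int act = l->hooks[i].hookfn(l->hooks[i].priv, pkt);

		if (act != SK_XDP_PASS)
			return act;
	}
	return SK_XDP_PASS;
}

/* NAPI hooks run first; device hooks run only if every NAPI hook passed. */
static int sk_hooks_run(const struct sk_hook_list *napi,
			const struct sk_hook_list *dev, void *pkt)
{
	int act = sk_hook_list_run(napi, pkt);

	if (act != SK_XDP_PASS)
		return act;
	return sk_hook_list_run(dev, pkt);
}

/* Example hook: counts the call in *pkt (treated as an int) and returns
 * the action stored in priv.
 */
static int sk_fixed_action(const void *priv, void *pkt)
{
	*(int *)pkt += 1;
	return *(const int *)priv;
}
```

With one passing NAPI hook and two device hooks (pass at priority 5, drop at priority 10), sk_hooks_run() calls all three and returns the drop verdict; adding a NAPI hook that returns TX short-circuits everything behind it.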
Driver interface:

Drivers no longer deal with BPF programs for the most part; instead they call into the XDP interface. There are two functions of interest for use in the receive data path:

- xdp_hook_run_needed_check: returns true if there is an XDP program registered on the napi instance or its device
- xdp_hook_run, xdp_hook_run_ret_last: runs the XDP programs for the hooks registered for the given napi instance or its device. The latter variant returns a pointer to the last XDP hook that was run (useful for reporting).

The ndo_xdp defines a new set of commands for this interface. A driver should implement these commands:

- XDP_MODE_ON: Initialize device to use XDP. Called when the first XDP program is registered on a device (including on a NAPI instance).
- XDP_MODE_OFF: XDP is finished on the device. Called after the last XDP hook has been unregistered for a device.
- XDP_CHECK_BPF_PROG: Check if a BPF program is acceptable to a device to run.
- XDP_OFFLOAD_BPF: Offload the associated BPF program (e.g. Netronome).

A new net feature, NETIF_F_XDP, is added so that a driver can indicate that it supports XDP.

This patch set:

- Adds the infrastructure described above, including the xdp.c and xdp.h files.
- Modifies mlx4, mlx5, qede, nfp, and virt_net drivers to use the new interface. That is mostly removing the management of BPF programs and changing to call the new interface.

v2:

- Eliminate use of nfhooks like lists. Just use a simple array for the hooks
- Modify more drivers that now support XDP

Tested: TBD

Tom Herbert (8):
  xdp: Infrastructure to generalize XDP
  mlx4: Changes to use generic XDP infrastructure
  nfp: Changes to use generic XDP infrastructure
  qede: Changes to use generic XDP infrastructure
  virt_net: Changes to use generic XDP infrastructure
  mlx5: Changes to use generic XDP infrastructure
  bnxt: Changes to use generic XDP infrastructure
  xdp: Cleanup after API changes

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 14 -
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 46 +--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 92 ++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 27 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 1 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 105 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 +-
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c | 1 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h | 5 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c | 170 ++-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 12 +-
 drivers/net/ethernet/qlogic/qede/qede.h | 3 +-
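For illustration, the driver side of the ndo_xdp commands listed in the cover letter could look roughly like the following userspace model. It is a sketch under stated assumptions: the sk_ types are hypothetical stand-ins for the driver private structure and bpf_prog, the "reload" is just a counter, and rejecting programs that use bpf_xdp_adjust_head() mirrors what the qede patch in this series does rather than anything required by the interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Command names follow the cover letter; the numeric values are arbitrary. */
enum sk_xdp_cmd {
	SK_XDP_MODE_ON,
	SK_XDP_MODE_OFF,
	SK_XDP_CHECK_BPF_PROG,
	SK_XDP_OFFLOAD_BPF,
};

/* Hypothetical stand-in for struct bpf_prog. */
struct sk_prog {
	bool uses_adjust_head;
};

/* Hypothetical stand-in for a driver's private device structure. */
struct sk_dev {
	bool xdp_enabled;
	unsigned int reloads;
};

/* MODE_ON enables XDP, MODE_OFF disables it; either way the device is
 * reconfigured, which the counter stands in for.
 */
static int sk_dev_set_mode(struct sk_dev *dev, bool enable)
{
	dev->xdp_enabled = enable;
	dev->reloads++;
	return 0;
}

/* The driver only vets the program; it no longer stores or manages it. */
static int sk_dev_check_prog(const struct sk_prog *prog)
{
	if (prog && prog->uses_adjust_head)
		return -EOPNOTSUPP;
	return 0;
}

static int sk_ndo_xdp(struct sk_dev *dev, enum sk_xdp_cmd cmd,
		      const struct sk_prog *prog)
{
	switch (cmd) {
	case SK_XDP_MODE_ON:
		return sk_dev_set_mode(dev, true);
	case SK_XDP_MODE_OFF:
		return sk_dev_set_mode(dev, false);
	case SK_XDP_CHECK_BPF_PROG:
		return sk_dev_check_prog(prog);
	default:
		/* e.g. SK_XDP_OFFLOAD_BPF on a device without offload */
		return -EINVAL;
	}
}
```

The point of the sketch is the division of labour: the driver tracks a single "XDP is on" flag and answers whether a program is acceptable, while program lifetime is handled by the generic layer.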
Re: [PATCH v2 net-next 0/9] openvswitch: Conntrack integration improvements.
On 8 February 2017 at 11:32, Jarno Rajahalme wrote:
> This series improves the conntrack integration code in the openvswitch
> module by fixing bugs (patches 1, 4, and 6), clarifying code (patches
> 2, 3, and 5), improving performance (patch 9), and adding new features
> enabling better translation from firewall admission policy to network
> configuration requested by user communities (patches 7 and 8).

Looks like this needs another rebase. For the patches I haven't specifically commented on (1,2,4,9), they looked fine to me:

Acked-by: Joe Stringer

I presume that Pravin will also want to take a look.
Re: [PATCH v2 net-next 8/9] openvswitch: Add force commit.
On 8 February 2017 at 11:32, Jarno Rajahalme wrote:
> Stateful network admission policy may allow connections to one
> direction and reject connections initiated in the other direction.
> After policy change it is possible that for a new connection an
> overlapping conntrack entry already exists, where the original
> direction of the existing connection is opposed to the new
> connection's initial packet.
>
> Most importantly, conntrack state relating to the current packet gets
> the "reply" designation based on whether the original direction tuple
> or the reply direction tuple matched. If this "directionality" is
> wrong w.r.t. to the stateful network admission policy it may happen
> that packets in neither direction are correctly admitted.
>
> This patch adds a new "force commit" option to the OVS conntrack
> action that checks the original direction of an existing conntrack
> entry. If that direction is opposed to the current packet, the
> existing conntrack entry is deleted and a new one is subsequently
> created in the correct direction.
>
> Signed-off-by: Jarno Rajahalme
>
> if (help && rcu_access_pointer(help->helper) != info->helper)
> return false;
> }
> + /* Force conntrack entry direction to the current packet? */
> + if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
> + /* Delete the conntrack entry if confirmed, else just release
> +* the reference.
> +*/
> + if (nf_ct_is_confirmed(ct))
> + nf_ct_delete(ct, 0, 0);
> + else
> + nf_conntrack_put(>ct_general);
> + skb->nfct = NULL;
> + skb->nfctinfo = 0;

This can use nf_ct_set().
Re: i40e: driver can't probe device (capabilities discovery error)
I forgot to attach a perhaps important file, the output of lspci. Here it goes. Thanks, Guilherme 0002:01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T [8086:1589] (rev 02) Subsystem: Super Micro Computer Inc Device [15d9:] Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-
Re: [PATCH v2 net-next 7/9] openvswitch: Add original direction conntrack tuple to sw_flow_key.
On 8 February 2017 at 11:32, Jarno Rajahalme wrote:
> Add the fields of the conntrack original direction 5-tuple to struct
> sw_flow_key. The new fields are initially marked as non-existent, and
> are populated whenever a conntrack action is executed and either finds
> or generates a conntrack entry. This means that these fields exist
> for all packets that were not rejected by conntrack as untrackable.
>
> The original tuple fields in the sw_flow_key are filled from the
> original direction tuple of the conntrack entry relating to the
> current packet, or from the original direction tuple of the master
> conntrack entry, if the current conntrack entry has a master.
> Generally, expected connections of connections having an assigned
> helper (e.g., FTP), have a master conntrack entry.
>
> The main purpose of the new conntrack original tuple fields is to
> allow matching on them for policy decision purposes, with the premise
> that the admissibility of tracked connections reply packets (as well
> as original direction packets), and both direction packets of any
> related connections may be based on ACL rules applying to the master
> connection's original direction 5-tuple. This also makes it easier to
> make policy decisions when the actual packet headers might have been
> transformed by NAT, as the original direction 5-tuple represents the
> packet headers before any such transformation.
>
> When using the original direction 5-tuple the admissibility of return
> and/or related packets need not be based on the mere existence of a
> conntrack entry, allowing separation of admission policy from the
> established conntrack state. While existence of a conntrack entry is
> required for admission of the return or related packets, policy
> changes can render connections that were initially admitted to be
> rejected or dropped afterwards. If the admission of the return and
> related packets was based on mere conntrack state (e.g., connection
> being in an established state), a policy change that would make the
> connection rejected or dropped would need to find and delete all
> conntrack entries affected by such a change. When using the original
> direction 5-tuple matching the affected conntrack entries can be
> allowed to time out instead, as the established state of the
> connection would not need to be the basis for packet admission any
> more.
>
> It should be noted that the directionality of related connections may
> be the same or different than that of the master connection, and
> neither the original direction 5-tuple nor the conntrack state bits
> carry this information. If needed, the directionality of the master
> connection can be stored in master's conntrack mark or labels, which
> are automatically inherited by the expected related connections.
>
> The fact that neither ARP not ND packets are trackable by conntrack

* ARP nor ND

> allows mutual exclusion between ARP/ND and the new conntrack original
> tuple fields. Hence, the IP addresses are overlaid in union with ARP
> and ND fields. This allows the sw_flow_key to not grow much due to
> this patch, but it also means that we must be careful to never use the
> new key fields with ARP or ND packets. ARP is easy to distinguish and
> keep mutually exclusive based on the ethernet type, but ND being an
> ICMPv6 protocol requires a bit more attention.
>
> Signed-off-by: Jarno Rajahalme

Acked-by: Joe Stringer
Re: [RFC 1/2] dt: emac: document device-tree based phy discovery and setup
On Sun, Feb 05, 2017 at 11:25:05PM +0100, Christian Lamparter wrote:
> This patch adds documentation for a new "phy-handler" property

s/phy-handler/phy-handle/

> and "mdio" sub-node. These allows the enumeration of PHYs which
> are supported by the phy library under drivers/net/phy.
>
> The EMAC ethernet controller in IBM and AMCC 4xx chips is
> currently stuck with a few privately defined phy
> implementations. It has no support for PHYs which
> are supported by the generic phylib.
>
> Signed-off-by: Christian Lamparter
> ---
> .../devicetree/bindings/powerpc/4xx/emac.txt | 60 +-
> 1 file changed, 58 insertions(+), 2 deletions(-)

Otherwise,

Acked-by: Rob Herring
[patch net-next 14/15] mlxsw: spectrum_router: Don't reflect LINKDOWN nexthops
From: Ido SchimmelThe kernel resolves the nexthops for a given route using FIB_LOOKUP_IGNORE_LINKSTATE which means a notification can be sent for a route with one of its nexthops being LINKDOWN. In case IGNORE_ROUTES_WITH_LINKDOWN is set for the nexthop netdev, then we shouldn't reflect the nexthop to the device's table. Once the nexthop netdev's carrier goes up we'll be notified using NH_ADD and reflect it to the device. Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 1c68b40..8dfc025 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include #include @@ -1503,6 +1504,7 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp, struct fib_nh *fib_nh) { struct net_device *dev = fib_nh->nh_dev; + struct in_device *in_dev; struct mlxsw_sp_rif *r; int err; @@ -1512,6 +1514,11 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp, if (err) return err; + in_dev = __in_dev_get_rtnl(dev); + if (in_dev && IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) && + fib_nh->nh_flags & RTNH_F_LINKDOWN) + return 0; + r = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev); if (!r) return 0; -- 2.7.4
Re: Extending socket timestamping API for NTP
Dealing with individual interfaces does not make sense. This seems to be a case where the reciprocity property is violated and hence should be handled as such. This is different from the case where the two sides have single NICs of different speeds. In this case the NIC used and the speed can change with each packet. Although I am not sure if that is possible, because the hash should always land the packet on the same NIC of the bond.

7. Reciprocity Errors

The above analysis assumes that the delays on the outbound and inbound paths are the same; that is, the paths are reciprocal. This is assured if the propagation delays are the same, the transmission rates are the same and the packet lengths are the same. In the NTP on-wire protocol all packets have the same length. If we assume the transmission rates are the same, the only difference in path delays must be due to nonreciprocal transmission paths. This often occurs if one way is via landline and the other via satellite. It can also occur when the paths traverse tag-switched core networks.

RMS

On Wed, Feb 8, 2017 at 2:26 AM, Miroslav Lichvar wrote:
> On Tue, Feb 07, 2017 at 12:37:15PM -0800, sdncurious wrote:
>> On Tue, Feb 7, 2017 at 6:01 AM, Miroslav Lichvar wrote:
>> > 6) new SO_TIMESTAMPING option to get PHC index with HW timestamps
>> >
>> >    With bridges, bonding and other things it's difficult to determine
>> >    which PHC timestamped the packet. It would be very useful if the
>> >    PHC index was provided with each HW timestamp.
>> >
>> >    I'm not sure what would be the best place to put it. I guess the
>> >    second timespec in scm_timestamping could be reused for this, but
>> >    that sounds like a gross hack. Do we need to define a new struct?
>>
>> What is the use case for this. even if the delay though the PHY's how
>> would that be compensated ?
>
> The idea was that applications like NTP servers and clients wouldn't
> have to care about interfaces and how they map together with addresses
> to PHCs over time. Currently, I use the interface index from
> IP_PKTINFO to get the PHC, but that doesn't work with bridges and
> other virtual interfaces. Another possibility would be an option to
> modify the behavior of IP_PKTINFO to save the index of the real
> interface. I'm not sure how would that compare in difficulty to
> extending SCM_TIMESTAMPING with PHC index.
>
> --
> Miroslav Lichvar
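To make the reciprocity point concrete, here is a small sketch of the standard NTP on-wire calculation, offset = ((T2 - T1) + (T3 - T4)) / 2 and delay = (T4 - T1) - (T3 - T2), together with a helper that fabricates the four timestamps from known one-way delays. The sk_ names and the simulation helper are hypothetical and exist only for this example; the takeaway is that any asymmetry between the outbound and inbound delay shows up as an offset error of half the difference.

```c
#include <assert.h>

/* The four timestamps of one client/server exchange:
 * t1 = client transmit, t2 = server receive,
 * t3 = server transmit, t4 = client receive.
 */
struct sk_ntp_sample {
	double t1, t2, t3, t4;
};

/* Measured clock offset of the server relative to the client. */
static double sk_ntp_offset(const struct sk_ntp_sample *s)
{
	return ((s->t2 - s->t1) + (s->t3 - s->t4)) / 2.0;
}

/* Measured round-trip delay, excluding the server's processing time. */
static double sk_ntp_delay(const struct sk_ntp_sample *s)
{
	return (s->t4 - s->t1) - (s->t3 - s->t2);
}

/* Build a sample from a known true offset and known one-way delays.
 * t1 and t4 are read from the client clock, t2 and t3 from the server
 * clock, which runs true_offset ahead of the client.
 */
static struct sk_ntp_sample sk_ntp_simulate(double true_offset, double d_out,
					    double d_in, double t1,
					    double server_hold)
{
	struct sk_ntp_sample s;

	s.t1 = t1;
	s.t2 = t1 + d_out + true_offset;
	s.t3 = s.t2 + server_hold;
	s.t4 = s.t3 - true_offset + d_in;
	return s;
}
```

With a true offset of 0.5 s, an outbound delay of 0.25 s and an inbound delay of 0.125 s, the measured offset comes out as 0.5625 s: the error is (0.25 - 0.125) / 2 = 0.0625 s, while the measured delay is still the correct sum, 0.375 s. With equal delays in both directions the error vanishes.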
[PATCH v3 net-next 2/2] net: dsa: mv88e6xxx: Add mv88e6390 watchdog interrupt support
Implement the ops needed to support the watchdog for the MV88E6390 family. Signed-off-by: Andrew Lunn--- v2: All new v3: Remove g2_ prefix from ops. --- drivers/net/dsa/mv88e6xxx/chip.c | 9 +++ drivers/net/dsa/mv88e6xxx/global2.c | 48 +++ drivers/net/dsa/mv88e6xxx/global2.h | 2 ++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 12 + 4 files changed, 71 insertions(+) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 489a59f5dea3..7658284beaf9 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3419,6 +3419,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3446,6 +3447,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3473,6 +3475,7 @@ static const struct mv88e6xxx_ops mv88e6191_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3530,6 +3533,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3694,6 +3698,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = { .stats_get_stats = mv88e6390_stats_get_stats, 
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3722,6 +3727,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3752,6 +3758,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3781,6 +3788,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3808,6 +3816,7 @@ static const struct mv88e6xxx_ops mv88e6391_ops = { .stats_get_stats = mv88e6390_stats_get_stats, .g1_set_cpu_port = mv88e6390_g1_set_cpu_port, .g1_set_egress_port = mv88e6390_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; diff --git a/drivers/net/dsa/mv88e6xxx/global2.c b/drivers/net/dsa/mv88e6xxx/global2.c index 1e2d65826d12..8f15bc7b1f5f 100644 --- a/drivers/net/dsa/mv88e6xxx/global2.c +++ b/drivers/net/dsa/mv88e6xxx/global2.c @@ -686,6 +686,54 @@ const struct mv88e6xxx_irq_ops mv88e6097_watchdog_ops = { .irq_free = mv88e6097_watchdog_free, }; +static int mv88e6390_watchdog_setup(struct mv88e6xxx_chip *chip) +{ + return mv88e6xxx_g2_update(chip, GLOBAL2_WDOG_CONTROL, + 
GLOBAL2_WDOG_INT_ENABLE | + GLOBAL2_WDOG_CUT_THROUGH | + GLOBAL2_WDOG_QUEUE_CONTROLLER | + GLOBAL2_WDOG_EGRESS | + GLOBAL2_WDOG_FORCE_IRQ); +} + +static int mv88e6390_watchdog_action(struct mv88e6xxx_chip *chip, int irq) +{ + int err; + u16 reg; + + mv88e6xxx_g2_write(chip, GLOBAL2_WDOG_CONTROL,
[PATCH v3 net-next 1/2] net: dsa: mv88e6xxx: Add watchdog interrupt handler
The switch contains a watchdog looking for issues with the internal gubbins of the switch. Hook the interrupt the watchdog triggers and log the value of the control register indicating why the watchdog fired. The watchdog can only be cleared with a switch reset, which will destroy the current configuration. Rather than doing this, just disable the interrupt. The mv88e6390 family has different watchdog registers. So use an ops structure, so support for the mv88e6390 family can be added later. Signed-off-by: Andrew Lunn--- v2: Use ops and exclude the 6390 family Add missing locks in the IRQ handler v3: remove g2_ from ops name. --- drivers/net/dsa/mv88e6xxx/chip.c | 14 ++ drivers/net/dsa/mv88e6xxx/global2.c | 89 ++- drivers/net/dsa/mv88e6xxx/global2.h | 4 ++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 21 + 4 files changed, 127 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 7b4e40b286e4..489a59f5dea3 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3111,6 +3111,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .ppu_enable = mv88e6185_g1_ppu_enable, .ppu_disable = mv88e6185_g1_ppu_disable, @@ -3179,6 +3180,7 @@ static const struct mv88e6xxx_ops mv88e6123_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3205,6 +3207,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops 
= _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .ppu_enable = mv88e6185_g1_ppu_enable, .ppu_disable = mv88e6185_g1_ppu_disable, @@ -3232,6 +3235,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3250,6 +3254,7 @@ static const struct mv88e6xxx_ops mv88e6165_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3276,6 +3281,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3304,6 +3310,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3330,6 +3337,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3358,6 +3366,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, 
+ .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .reset = mv88e6352_g1_reset, }; @@ -3380,6 +3389,7 @@ static const struct mv88e6xxx_ops mv88e6185_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .watchdog_ops = _watchdog_ops, .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, .ppu_enable =
[PATCHv2 net-next] net: dsa: mv88e6xxx: Move forward declaration to where it is needed
Move it out from the middle of the #defines to just before it is needed.

Signed-off-by: Andrew Lunn
Reviewed-by: Vivien Didelot
---
v2: Rebased onto latest net-next.

 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 8a21800374f3..d6b335cd8c09 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -662,8 +662,6 @@ enum mv88e6xxx_cap {
 	 MV88E6XXX_FLAGS_PVT |		\
 	 MV88E6XXX_FLAGS_SERDES)
 
-struct mv88e6xxx_ops;
-
 #define MV88E6XXX_FLAGS_FAMILY_6390	\
 	(MV88E6XXX_FLAG_EEE |		\
 	 MV88E6XXX_FLAG_GLOBAL2 |	\
@@ -673,6 +671,8 @@ struct mv88e6xxx_ops;
 	 MV88E6XXX_FLAGS_MULTI_CHIP |	\
 	 MV88E6XXX_FLAGS_PVT)
 
+struct mv88e6xxx_ops;
+
 struct mv88e6xxx_info {
 	enum mv88e6xxx_family family;
 	u16 prod_num;
--
2.11.0
[PATCH v3 net-next 0/2] mv88e6xxx Watchdog support
The Marvell switches have a built-in watchdog over some of the internal state machine. The watchdog can be configured to raise an interrupt on error. The problem the watchdog found is then logged to the kernel log.

The older switches can automagically perform a software reset when the watchdog triggers. This just resets the internal state machine, but leaves the switch configuration unchanged.

The 6390 family of switches cannot both raise an interrupt and automagically perform a software reset. So the interrupt handler has to perform the switch reset, and then re-enable the watchdog interrupts.

This has been tested using hacked together debugfs code which allows the "force" bit to be set, to cause a watchdog interrupt.

v2: Remove g2_ prefix

Andrew Lunn (2):
  net: dsa: mv88e6xxx: Add watchdog interrupt handler
  net: dsa: mv88e6xxx: Add mv88e6390 watchdog interrupt support

 drivers/net/dsa/mv88e6xxx/chip.c | 23 ++
 drivers/net/dsa/mv88e6xxx/global2.c | 137 +-
 drivers/net/dsa/mv88e6xxx/global2.h | 6 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 33
 4 files changed, 198 insertions(+), 1 deletion(-)

--
2.11.0
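The per-family split described in this series (an ops structure so that the 6390 family can handle the interrupt differently) can be sketched in userspace C as below. This is a model only: the register is a plain variable, the two bit definitions are invented for the example and do not correspond to the real GLOBAL2_WDOG_* layout, and all sk_ names are hypothetical. The older-family handler follows patch 1/2 (log and disable the interrupt), the 6390-style handler follows this cover letter (reset, then re-enable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented bit layout, for the sketch only. */
#define SK_WDOG_INT_ENABLE	(1u << 0)
#define SK_WDOG_EVENT		(1u << 1)

struct sk_chip;

/* Per-family watchdog operations, selected once per chip model. */
struct sk_irq_ops {
	int (*irq_setup)(struct sk_chip *chip);
	int (*irq_action)(struct sk_chip *chip);
	void (*irq_free)(struct sk_chip *chip);
};

struct sk_chip {
	uint16_t wdog_ctrl;		/* models the watchdog control register */
	unsigned int resets;		/* counts software resets performed */
	const struct sk_irq_ops *watchdog_ops;
};

static int sk_wdog_enable(struct sk_chip *chip)
{
	chip->wdog_ctrl |= SK_WDOG_INT_ENABLE;
	return 0;
}

static void sk_wdog_disable(struct sk_chip *chip)
{
	chip->wdog_ctrl &= (uint16_t)~SK_WDOG_INT_ENABLE;
}

/* Older families: leave the event bit for inspection, mask the interrupt. */
static int sk_old_action(struct sk_chip *chip)
{
	sk_wdog_disable(chip);
	return 0;
}

/* 6390-style: the handler resets the switch, then re-arms the watchdog. */
static int sk_new_action(struct sk_chip *chip)
{
	chip->resets++;
	chip->wdog_ctrl = 0;		/* the reset clears the register */
	return chip->watchdog_ops->irq_setup(chip);
}

static const struct sk_irq_ops sk_old_watchdog_ops = {
	.irq_setup = sk_wdog_enable,
	.irq_action = sk_old_action,
	.irq_free = sk_wdog_disable,
};

static const struct sk_irq_ops sk_new_watchdog_ops = {
	.irq_setup = sk_wdog_enable,
	.irq_action = sk_new_action,
	.irq_free = sk_wdog_disable,
};

/* Family-independent entry point: returns true if the event was handled. */
static bool sk_watchdog_irq(struct sk_chip *chip)
{
	if (!chip->watchdog_ops || !(chip->wdog_ctrl & SK_WDOG_EVENT))
		return false;
	return chip->watchdog_ops->irq_action(chip) == 0;
}
```

The caller never needs to know which family it is talking to; a chip with no watchdog_ops simply reports that nothing was handled.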
[PATCH] net: micrel: ks8695net: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone may test this patch. Signed-off-by: Philippe Reynes--- drivers/net/ethernet/micrel/ks8695net.c | 91 +-- 1 files changed, 50 insertions(+), 41 deletions(-) diff --git a/drivers/net/ethernet/micrel/ks8695net.c b/drivers/net/ethernet/micrel/ks8695net.c index d210615..bd51e05 100644 --- a/drivers/net/ethernet/micrel/ks8695net.c +++ b/drivers/net/ethernet/micrel/ks8695net.c @@ -854,85 +854,94 @@ static int ks8695_poll(struct napi_struct *napi, int budget) } /** - * ks8695_wan_get_settings - Get device-specific settings. + * ks8695_wan_get_link_ksettings - Get device-specific settings. * @ndev: The network device to read settings from * @cmd: The ethtool structure to read into */ static int -ks8695_wan_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd) +ks8695_wan_get_link_ksettings(struct net_device *ndev, + struct ethtool_link_ksettings *cmd) { struct ks8695_priv *ksp = netdev_priv(ndev); u32 ctrl; + u32 supported, advertising; /* All ports on the KS8695 support these... 
*/ - cmd->supported = (SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full | + supported = (SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full | SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full | SUPPORTED_TP | SUPPORTED_MII); - cmd->transceiver = XCVR_INTERNAL; - cmd->advertising = ADVERTISED_TP | ADVERTISED_MII; - cmd->port = PORT_MII; - cmd->supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause); - cmd->phy_address = 0; + advertising = ADVERTISED_TP | ADVERTISED_MII; + cmd->base.port = PORT_MII; + supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause); + cmd->base.phy_address = 0; ctrl = readl(ksp->phyiface_regs + KS8695_WMC); if ((ctrl & WMC_WAND) == 0) { /* auto-negotiation is enabled */ - cmd->advertising |= ADVERTISED_Autoneg; + advertising |= ADVERTISED_Autoneg; if (ctrl & WMC_WANA100F) - cmd->advertising |= ADVERTISED_100baseT_Full; + advertising |= ADVERTISED_100baseT_Full; if (ctrl & WMC_WANA100H) - cmd->advertising |= ADVERTISED_100baseT_Half; + advertising |= ADVERTISED_100baseT_Half; if (ctrl & WMC_WANA10F) - cmd->advertising |= ADVERTISED_10baseT_Full; + advertising |= ADVERTISED_10baseT_Full; if (ctrl & WMC_WANA10H) - cmd->advertising |= ADVERTISED_10baseT_Half; + advertising |= ADVERTISED_10baseT_Half; if (ctrl & WMC_WANAP) - cmd->advertising |= ADVERTISED_Pause; - cmd->autoneg = AUTONEG_ENABLE; + advertising |= ADVERTISED_Pause; + cmd->base.autoneg = AUTONEG_ENABLE; - ethtool_cmd_speed_set(cmd, - (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10); - cmd->duplex = (ctrl & WMC_WDS) ? + cmd->base.speed = (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10; + cmd->base.duplex = (ctrl & WMC_WDS) ? DUPLEX_FULL : DUPLEX_HALF; } else { /* auto-negotiation is disabled */ - cmd->autoneg = AUTONEG_DISABLE; + cmd->base.autoneg = AUTONEG_DISABLE; - ethtool_cmd_speed_set(cmd, ((ctrl & WMC_WANF100) ? - SPEED_100 : SPEED_10)); - cmd->duplex = (ctrl & WMC_WANFF) ? + cmd->base.speed = (ctrl & WMC_WANF100) ? + SPEED_100 : SPEED_10; + cmd->base.duplex = (ctrl & WMC_WANFF) ? 
DUPLEX_FULL : DUPLEX_HALF; } + ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported, + supported); + ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising, + advertising); + return 0; } /** - * ks8695_wan_set_settings - Set device-specific settings. + * ks8695_wan_set_link_ksettings - Set device-specific settings. * @ndev: The network device to configure * @cmd: The settings to configure */ static int -ks8695_wan_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd) +ks8695_wan_set_link_ksettings(struct net_device *ndev, + const struct ethtool_link_ksettings *cmd) { struct ks8695_priv *ksp = netdev_priv(ndev); u32 ctrl; + u32
Re: [PATCH v2 net-next 3/9] openvswitch: Simplify labels length logic.
On 8 February 2017 at 11:32, Jarno Rajahalme wrote: > Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128 > distinct labels"), the size of conntrack labels extension has fixed to > 128 bits, so we do not need to check for labels sizes shorter than 128 > at run-time. This patch simplifies labels length logic accordingly, > but allows the conntrack labels size to be increased in the future > without breaking the build. In the event of conntrack labels > increasing in size OVS would still be able to deal with the 128 first > label bits. > > Suggested-by: Joe Stringer > Signed-off-by: Jarno Rajahalme > --- > net/openvswitch/conntrack.c | 22 +++--- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > index 6730f09..a07e5cd 100644 > --- a/net/openvswitch/conntrack.c > +++ b/net/openvswitch/conntrack.c > @@ -129,22 +129,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct) > #endif > } > > +/* Guard against conntrack labels max size shrinking below 128 bits. */ > +#if NF_CT_LABELS_MAX_SIZE < 16 > +#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes > +#endif > + > static void ovs_ct_get_labels(const struct nf_conn *ct, > struct ovs_key_ct_labels *labels) > { > struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL; > > - if (cl) { > - size_t len = sizeof(cl->bits); > - > - if (len > OVS_CT_LABELS_LEN) > - len = OVS_CT_LABELS_LEN; > - else if (len < OVS_CT_LABELS_LEN) > - memset(labels, 0, OVS_CT_LABELS_LEN); > - memcpy(labels, cl->bits, len); > - } else { > + if (cl) > + memcpy(labels, cl->bits, > + sizeof(cl->bits) > OVS_CT_LABELS_LEN > + ? OVS_CT_LABELS_LEN : sizeof(cl->bits)); Is this to be defensive? If sizeof(cl->bits) is larger than OVS_CT_LABELS_LEN, we'll use OVS_CT_LABELS_LEN; if it's equal, we'll still use OVS_CT_LABELS_LEN; if it's less, the preprocessor check above will fail the build. Why not memcpy(.., OVS_CT_LABELS_LEN) ?
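The question above can be checked with a small user-space sketch. The names mirror the patch, but struct labels and both size constants here are assumptions made for illustration, not the kernel definitions. Once a compile-time guard guarantees the source holds at least OVS_CT_LABELS_LEN bytes, the clamped length always comes out as OVS_CT_LABELS_LEN, so a plain constant-length copy should behave the same:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel constants; the values are assumptions. */
#define NF_CT_LABELS_MAX_SIZE 16	/* bytes held by the labels extension */
#define OVS_CT_LABELS_LEN     16	/* bytes OVS exposes to its users */

/* Compile-time guard, similar in spirit to the #if/#error in the patch. */
_Static_assert(NF_CT_LABELS_MAX_SIZE >= OVS_CT_LABELS_LEN,
	       "labels extension must hold at least OVS_CT_LABELS_LEN bytes");

struct labels {
	unsigned char bits[NF_CT_LABELS_MAX_SIZE];
};

/* Length as computed in the patch: min(sizeof(bits), OVS_CT_LABELS_LEN). */
static size_t clamped_len(void)
{
	return sizeof(((struct labels *)0)->bits) > OVS_CT_LABELS_LEN
		? OVS_CT_LABELS_LEN : sizeof(((struct labels *)0)->bits);
}

/* Copy the first OVS_CT_LABELS_LEN bytes, or zero the output if cl is NULL. */
static void get_labels(const struct labels *cl, unsigned char *out)
{
	assert(out != NULL);
	if (cl)
		memcpy(out, cl->bits, OVS_CT_LABELS_LEN);
	else
		memset(out, 0, OVS_CT_LABELS_LEN);
}
```

With the guard in place, clamped_len() cannot return anything other than OVS_CT_LABELS_LEN, which is the point the reviewer is making.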
Re: [PATCH net-next] net: dsa: Fix duplicate object rule
Hi Florian, Florian Fainelli writes: > While adding switch.o to the list of DSA object files, we essentially > duplicated the previous obj-y line and just added switch.o, remove the > duplicate. > > Fixes: f515f192ab4f ("net: dsa: add switch notifier") > Signed-off-by: Florian Fainelli Reviewed-by: Vivien Didelot My bad, thanks! Vivien
[PATCH net-next] net: dsa: Fix duplicate object rule
While adding switch.o to the list of DSA object files, we essentially duplicated the previous obj-y line and just added switch.o, remove the duplicate. Fixes: f515f192ab4f ("net: dsa: add switch notifier") Signed-off-by: Florian Fainelli --- net/dsa/Makefile | 1 - 1 file changed, 1 deletion(-) diff --git a/net/dsa/Makefile b/net/dsa/Makefile index 72912982de3d..31d343796251 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -1,6 +1,5 @@ # the core obj-$(CONFIG_NET_DSA) += dsa_core.o -dsa_core-y += dsa.o slave.o dsa2.o dsa_core-y += dsa.o slave.o dsa2.o switch.o # tagging formats -- 2.9.3
[PATCH net] net: phy: Initialize mdio clock at probe function
From: Yendapally Reddy Dhananjaya Reddy USB PHYs need the MDIO clock divisor enabled earlier to work. Initialize mdio clock divisor in probe function. The ext bus bit available in the same register will be used by mdio mux to enable external mdio. Signed-off-by: Yendapally Reddy Dhananjaya Reddy Fixes: ddc24ae1 ("net: phy: Broadcom iProc MDIO bus driver") Reviewed-by: Florian Fainelli Signed-off-by: Jon Mason --- drivers/net/phy/mdio-bcm-iproc.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/phy/mdio-bcm-iproc.c b/drivers/net/phy/mdio-bcm-iproc.c index c0b4e65..46fe1ae 100644 --- a/drivers/net/phy/mdio-bcm-iproc.c +++ b/drivers/net/phy/mdio-bcm-iproc.c @@ -81,8 +81,6 @@ static int iproc_mdio_read(struct mii_bus *bus, int phy_id, int reg) if (rc) return rc; - iproc_mdio_config_clk(priv->base); - /* Prepare the read operation */ cmd = (MII_DATA_TA_VAL << MII_DATA_TA_SHIFT) | (reg << MII_DATA_RA_SHIFT) | @@ -112,8 +110,6 @@ static int iproc_mdio_write(struct mii_bus *bus, int phy_id, if (rc) return rc; - iproc_mdio_config_clk(priv->base); - /* Prepare the write operation */ cmd = (MII_DATA_TA_VAL << MII_DATA_TA_SHIFT) | (reg << MII_DATA_RA_SHIFT) | @@ -163,6 +159,8 @@ static int iproc_mdio_probe(struct platform_device *pdev) bus->read = iproc_mdio_read; bus->write = iproc_mdio_write; + iproc_mdio_config_clk(priv->base); + rc = of_mdiobus_register(bus, pdev->dev.of_node); if (rc) { dev_err(&pdev->dev, "MDIO bus registration failed\n"); -- 2.7.4
Re: net/ipv4: warning in nf_nat_ipv4_fn
Andrey Konovalov wrote: > Hi, > > I've got the following error report while fuzzing the kernel with syzkaller. > > On commit 926af6273fc683cd98cd0ce7bf0d04a02eed6742. > > A reproducer and .config are attached. > > WARNING: CPU: 2 PID: 26582 at > net/ipv4/netfilter/nf_nat_l3proto_ipv4.c:261 > nf_nat_ipv4_fn+0x7f2/0xa50 > net/ipv4/netfilter/nf_nat_l3proto_ipv4.c:261 > Kernel panic - not syncing: panic_on_warn set ... That's this assert: /* We never see fragments: conntrack defrags on pre-routing * and local-out, and nf_nat_out protects post-routing. */ NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb))); ... and it's wrong. I will send a patch to remove it.
Re: [PATCH] [net-next] ARM: orion: remove unused wnr854t_switch_plat_data
On 02/08/2017 01:24 PM, Arnd Bergmann wrote: > The other instances of this structure got removed along with the MDIO > device change, but this one was left behind and needs to be removed > as well: > > arch/arm/mach-orion5x/wnr854t-setup.c:109:44: error: > 'wnr854t_switch_plat_data' defined but not used [-Werror=unused-variable] > static struct dsa_platform_data __initdata wnr854t_switch_plat_data = { > > Fixes: 575e93f7b5e6 ("ARM: orion: Register DSA switch as a MDIO device") > Signed-off-by: Arnd Bergmann Acked-by: Florian Fainelli Thanks Arnd! -- Florian
RE: [PATCH 1/1] hv_netvsc: fix a netvsc stats typo
Please ignore this patch. I will resubmit it to net-next. > -----Original Message----- > From: Simon Xiao [mailto:six...@microsoft.com] > Sent: Tuesday, February 7, 2017 10:03 AM > To: KY Srinivasan; Haiyang Zhang; Stephen Hemminger; > de...@linuxdriverproject.org; netdev@vger.kernel.org; > linux-ker...@vger.kernel.org > Cc: Simon Xiao > Subject: [PATCH 1/1] hv_netvsc: fix a netvsc stats typo > > [This sender failed our fraud detection checks and may not be who they > appear to be. Learn about spoofing at http://aka.ms/LearnAboutSpoofing] > > Now, return the correct tx_errors stats in netvsc. > > Signed-off-by: Simon Xiao > Reviewed-by: Haiyang Zhang > --- > drivers/net/hyperv/netvsc_drv.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/hyperv/netvsc_drv.c > b/drivers/net/hyperv/netvsc_drv.c > index 72b0c1f..725ac19 100644 > --- a/drivers/net/hyperv/netvsc_drv.c > +++ b/drivers/net/hyperv/netvsc_drv.c > @@ -920,7 +920,7 @@ static void netvsc_get_stats64(struct net_device *net, > } > > t->tx_dropped = net->stats.tx_dropped; > - t->tx_errors= net->stats.tx_dropped; > + t->tx_errors= net->stats.tx_errors; > > t->rx_dropped = net->stats.rx_dropped; > t->rx_errors= net->stats.rx_errors; > -- > 2.7.4
Re: loopback device reference count leakage
On Mon, Feb 6, 2017 at 6:32 PM, Kaiwen Xu wrote: > Hi Cong, > > I did some more testing, seems like your second assumption is correct. > There is indeed some things holding the references to a particular dst > which preventing it to be gc'ed. Excellent! > > I added logging to each dst_hold (or dst_hold_safe, or > skb_dst_force_safe) and dst_release, which formatted as following: > > () []: dst_release / dst_hold ... > > > And inside dst_gc_task(), I added logging when gc delay occurred, > formatted as: > > [dst_gc_task] (): delayed > > I have the log attached. The following line looks suspicious: Feb 6 16:27:24 kernel: [63589.458067] [dst_gc_task] lodebug (2): delayed 19 Looks like you ended up having one dst whose refcnt is 19 in GC, and this lasted for a rather long time for some reason. It is hard to know if it is a refcnt leak even with your log, since there were 4K+ refcnt'ing happened on that dst... Meanwhile, can you share your setup of your container? What network device do you use in your container? How is it connected to outside? Thanks.
Re: [PATCHv6 net-next 3/6] sctp: add support for generating stream reconf ssn/tsn reset request chunk
On Thu, Feb 09, 2017 at 01:18:17AM +0800, Xin Long wrote: > This patch is to define SSN/TSN Reset Request Parameter described > in rfc6525 section 4.3. > > It's also to drop some unnecessary __packed in include/linux/sctp.h. Oops, extra line in the changelog here. > > Signed-off-by: Xin Long > --- > include/linux/sctp.h | 5 + > include/net/sctp/sm.h| 2 ++ > net/sctp/sm_make_chunk.c | 29 + > 3 files changed, 36 insertions(+) > > diff --git a/include/linux/sctp.h b/include/linux/sctp.h > index d74fca3..71c0d41 100644 > --- a/include/linux/sctp.h > +++ b/include/linux/sctp.h > @@ -737,4 +737,9 @@ struct sctp_strreset_inreq { > __u16 list_of_streams[0]; > }; > > +struct sctp_strreset_tsnreq { > + sctp_paramhdr_t param_hdr; > + __u32 request_seq; > +}; > + > #endif /* __LINUX_SCTP_H__ */ > diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h > index 430ed13..ac37c17 100644 > --- a/include/net/sctp/sm.h > +++ b/include/net/sctp/sm.h > @@ -265,6 +265,8 @@ struct sctp_chunk *sctp_make_strreset_req( > const struct sctp_association *asoc, > __u16 stream_num, __u16 *stream_list, > bool out, bool in); > +struct sctp_chunk *sctp_make_strreset_tsnreq( > + const struct sctp_association *asoc); > void sctp_chunk_assign_tsn(struct sctp_chunk *); > void sctp_chunk_assign_ssn(struct sctp_chunk *); > > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c > index c7d3249..749842a 100644 > --- a/net/sctp/sm_make_chunk.c > +++ b/net/sctp/sm_make_chunk.c > @@ -3658,3 +3658,32 @@ struct sctp_chunk *sctp_make_strreset_req( > > return retval; > } > + > +/* RE-CONFIG 4.3 (SSN/TSN RESET ALL) > + * 0 1 2 3 > + * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 > + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > + * | Parameter Type = 15 | Parameter Length = 8 | > + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > + * | Re-configuration Request Sequence Number | > + * 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > + */ > +struct sctp_chunk *sctp_make_strreset_tsnreq( > + const struct sctp_association *asoc) > +{ > + struct sctp_strreset_tsnreq tsnreq; > + __u16 length = sizeof(tsnreq); > + struct sctp_chunk *retval; > + > + retval = sctp_make_reconf(asoc, length); > + if (!retval) > + return NULL; > + > + tsnreq.param_hdr.type = SCTP_PARAM_RESET_TSN_REQUEST; > + tsnreq.param_hdr.length = htons(length); > + tsnreq.request_seq = htonl(asoc->strreset_outseq); > + > + sctp_addto_chunk(retval, sizeof(tsnreq), &tsnreq); > + > + return retval; > +} > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
[PATCH 1/2] net: qcom/emac: add ethtool support for reading hardware registers
Implement the get_regs_len and get_regs ethtool methods. The driver returns the values of selected hardware registers. To make the register offsets known to emac_ethtool, the register offset macros are all combined into one header file. They were inexplicably and arbitrarily split between two files. Signed-off-by: Timur Tabi --- drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 40 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 52 --- drivers/net/ethernet/qualcomm/emac/emac.h | 108 -- 3 files changed, 119 insertions(+), 81 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c index c418a6e..abb9df5 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c @@ -170,6 +170,43 @@ static int emac_set_pauseparam(struct net_device *netdev, return 0; } +/* Selected registers that might want to track during runtime. */ +static const u16 emac_regs[] = { + EMAC_DMA_MAS_CTRL, + EMAC_MAC_CTRL, + EMAC_TXQ_CTRL_0, + EMAC_RXQ_CTRL_0, + EMAC_DMA_CTRL, + EMAC_INT_MASK, + EMAC_AXI_MAST_CTRL, + EMAC_CORE_HW_VERSION, + EMAC_MISC_CTRL, +}; + +/* Every time emac_regs[] above is changed, increase this version number. 
*/ +#define EMAC_REGS_VERSION 0 + +#define EMAC_MAX_REG_SIZE ARRAY_SIZE(emac_regs) + +static void emac_get_regs(struct net_device *netdev, + struct ethtool_regs *regs, void *buff) +{ + struct emac_adapter *adpt = netdev_priv(netdev); + u32 *val = buff; + unsigned int i; + + regs->version = EMAC_REGS_VERSION; + regs->len = EMAC_MAX_REG_SIZE * sizeof(u32); + + for (i = 0; i < EMAC_MAX_REG_SIZE; i++) + val[i] = readl(adpt->base + emac_regs[i]); +} + +static int emac_get_regs_len(struct net_device *netdev) +{ + return EMAC_MAX_REG_SIZE * sizeof(32); +} + static const struct ethtool_ops emac_ethtool_ops = { .get_link_ksettings = phy_ethtool_get_link_ksettings, .set_link_ksettings = phy_ethtool_set_link_ksettings, @@ -189,6 +226,9 @@ static int emac_set_pauseparam(struct net_device *netdev, .nway_reset = emac_nway_reset, .get_link = ethtool_op_get_link, + + .get_regs_len= emac_get_regs_len, + .get_regs= emac_get_regs, }; void emac_set_ethtool_ops(struct net_device *netdev) diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c index 4b3e014..cc065ff 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c @@ -25,58 +25,6 @@ #include "emac.h" #include "emac-sgmii.h" -/* EMAC base register offsets */ -#define EMAC_MAC_CTRL 0x001480 -#define EMAC_WOL_CTRL0 0x0014a0 -#define EMAC_RSS_KEY0 0x0014b0 -#define EMAC_H1TPD_BASE_ADDR_LO0x0014e0 -#define EMAC_H2TPD_BASE_ADDR_LO0x0014e4 -#define EMAC_H3TPD_BASE_ADDR_LO0x0014e8 -#define EMAC_INTER_SRAM_PART9 0x001534 -#define EMAC_DESC_CTRL_0 0x001540 -#define EMAC_DESC_CTRL_1 0x001544 -#define EMAC_DESC_CTRL_2 0x001550 -#define EMAC_DESC_CTRL_10 0x001554 -#define EMAC_DESC_CTRL_12 0x001558 -#define EMAC_DESC_CTRL_13 0x00155c -#define EMAC_DESC_CTRL_3 0x001560 -#define EMAC_DESC_CTRL_4 0x001564 -#define EMAC_DESC_CTRL_5 0x001568 -#define EMAC_DESC_CTRL_14 0x00156c -#define EMAC_DESC_CTRL_15 0x001570 -#define EMAC_DESC_CTRL_16 
0x001574 -#define EMAC_DESC_CTRL_6 0x001578 -#define EMAC_DESC_CTRL_8 0x001580 -#define EMAC_DESC_CTRL_9 0x001584 -#define EMAC_DESC_CTRL_11 0x001588 -#define EMAC_TXQ_CTRL_00x001590 -#define EMAC_TXQ_CTRL_10x001594 -#define EMAC_TXQ_CTRL_20x001598 -#define EMAC_RXQ_CTRL_00x0015a0 -#define EMAC_RXQ_CTRL_10x0015a4 -#define EMAC_RXQ_CTRL_20x0015a8 -#define EMAC_RXQ_CTRL_30x0015ac -#define EMAC_BASE_CPU_NUMBER 0x0015b8 -#define EMAC_DMA_CTRL 0x0015c0 -#define EMAC_MAILBOX_0 0x0015e0 -#define EMAC_MAILBOX_5 0x0015e4 -#define EMAC_MAILBOX_6 0x0015e8 -#define EMAC_MAILBOX_130x0015ec -#define EMAC_MAILBOX_2 0x0015f4 -#define EMAC_MAILBOX_3 0x0015f8 -#define EMAC_MAILBOX_110x00160c -#define EMAC_AXI_MAST_CTRL 0x001610 -#define EMAC_MAILBOX_120x001614
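A user-space sketch of the get_regs_len/get_regs pairing from the patch above may help when reading it. Everything here is an assumption for illustration: the offsets are made up, and a plain array stands in for the memory-mapped register block that the driver reads with readl(). Note that the patch computes its length with sizeof(32), which is sizeof(int); that matches sizeof(u32) on the usual Linux ABIs but was presumably meant to be sizeof(u32), which is what the sketch uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical register offsets (in bytes) into a fake register block. */
static const uint16_t dump_regs[] = { 0x00, 0x08, 0x10 };

/* Size in bytes of the buffer the caller must provide for a dump. */
static size_t regs_len(void)
{
	return ARRAY_SIZE(dump_regs) * sizeof(uint32_t);
}

/* Copy each selected register from the block into the output buffer. */
static void get_regs(const uint32_t *base, uint32_t *out)
{
	size_t i;

	assert(base != NULL && out != NULL);
	for (i = 0; i < ARRAY_SIZE(dump_regs); i++)
		out[i] = base[dump_regs[i] / sizeof(uint32_t)];
}
```

The two functions must agree on the element count and element size, since the caller sizes its buffer from one and the other fills it.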
[PATCH 2/2] net: qcom/emac: add ethtool support for setting ring parameters
Implement the set_ringparam method, which allows the user to specify the size of the TX and RX descriptor rings. The values are constrained to the limits of the hardware. Since the driver does not use separate queues for mini or jumbo frames, attempts to set those values are rejected. If the interface is already running when the setting is changed, then the interface is reset. Signed-off-by: Timur Tabi --- drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 24 +++ 1 file changed, 24 insertions(+) diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c index abb9df5..a3e2292 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c @@ -145,6 +145,29 @@ static void emac_get_ringparam(struct net_device *netdev, ring->tx_pending = adpt->tx_desc_cnt; } +static int emac_set_ringparam(struct net_device *netdev, + struct ethtool_ringparam *ring) +{ + struct emac_adapter *adpt = netdev_priv(netdev); + + /* We don't have separate queues/rings for small/large frames, so +* reject any attempt to specify those values separately. +*/ + if (ring->rx_mini_pending || ring->rx_jumbo_pending) + return -EINVAL; + + adpt->tx_desc_cnt = + clamp_val(ring->tx_pending, EMAC_MIN_TX_DESCS, EMAC_MAX_TX_DESCS); + + adpt->rx_desc_cnt = + clamp_val(ring->rx_pending, EMAC_MIN_RX_DESCS, EMAC_MAX_RX_DESCS); + + if (netif_running(netdev)) + return emac_reinit_locked(adpt); + + return 0; +} + static void emac_get_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause) { @@ -219,6 +242,7 @@ static int emac_get_regs_len(struct net_device *netdev) .get_ethtool_stats = emac_get_ethtool_stats, .get_ringparam = emac_get_ringparam, + .set_ringparam = emac_set_ringparam, .get_pauseparam = emac_get_pauseparam, .set_pauseparam = emac_set_pauseparam, -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
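The validation in emac_set_ringparam can be sketched in user space as follows. The limits are hypothetical, since the patch does not show the EMAC_MIN/MAX_*_DESCS values, clamp_uint() is a stand-in for the kernel's clamp_val(), and the sketch returns -1 where the driver returns -EINVAL. It leaves out the reinit step entirely:

```c
#include <assert.h>

/* Hypothetical limits; the real descriptor limits are not in the patch. */
#define MIN_DESCS 128u
#define MAX_DESCS 1024u

/* Constrain val to [lo, hi], in the manner of the kernel's clamp_val(). */
static unsigned int clamp_uint(unsigned int val, unsigned int lo,
			       unsigned int hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

struct ring_request {
	unsigned int rx_pending;
	unsigned int rx_mini_pending;
	unsigned int rx_jumbo_pending;
	unsigned int tx_pending;
};

struct ring_config {
	unsigned int rx_desc_cnt;
	unsigned int tx_desc_cnt;
};

/* Returns 0 on success, -1 if mini or jumbo ring sizes were requested. */
static int set_ringparam(struct ring_config *cfg,
			 const struct ring_request *req)
{
	if (req->rx_mini_pending || req->rx_jumbo_pending)
		return -1;

	cfg->tx_desc_cnt = clamp_uint(req->tx_pending, MIN_DESCS, MAX_DESCS);
	cfg->rx_desc_cnt = clamp_uint(req->rx_pending, MIN_DESCS, MAX_DESCS);
	return 0;
}
```

One consequence of clamping rather than rejecting is that an out-of-range request succeeds with an adjusted value, so a caller would need to read the ring parameters back to see what was actually applied.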
Re: [PATCHv6 net-next 2/6] sctp: streams should be recovered when it fails to send request.
On Thu, Feb 09, 2017 at 01:18:16AM +0800, Xin Long wrote: > Now when sending stream reset request, it closes the streams to > block further xmit of data until this request is completed, then > calls sctp_send_reconf to send the chunk. > > But if sctp_send_reconf returns err, and it doesn't recover the > streams' states back, which means the request chunk would not be > queued and sent, so the asoc will get stuck, streams are closed > and no packet is even queued. > > This patch is to fix it by recovering the streams' states when > it fails to send the request, it is also to fix a return value. > > Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset > Request Parameter") > Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner > --- > net/sctp/stream.c | 19 +-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/net/sctp/stream.c b/net/sctp/stream.c > index 13d5e07..6a686e3 100644 > --- a/net/sctp/stream.c > +++ b/net/sctp/stream.c > @@ -136,8 +136,10 @@ int sctp_send_reset_streams(struct sctp_association > *asoc, > goto out; > > chunk = sctp_make_strreset_req(asoc, str_nums, str_list, out, in); > - if (!chunk) > + if (!chunk) { > + retval = -ENOMEM; > goto out; > + } > > if (out) { > if (str_nums) > @@ -149,7 +151,6 @@ int sctp_send_reset_streams(struct sctp_association *asoc, > stream->out[i].state = SCTP_STREAM_CLOSED; > } > > - asoc->strreset_outstanding = out + in; > asoc->strreset_chunk = chunk; > sctp_chunk_hold(asoc->strreset_chunk); > > @@ -157,8 +158,22 @@ int sctp_send_reset_streams(struct sctp_association > *asoc, > if (retval) { > sctp_chunk_put(asoc->strreset_chunk); > asoc->strreset_chunk = NULL; > + if (!out) > + goto out; > + > + if (str_nums) > + for (i = 0; i < str_nums; i++) > + stream->out[str_list[i]].state = > +SCTP_STREAM_OPEN; > + else > + for (i = 0; i < stream->outcnt; i++) > + stream->out[i].state = SCTP_STREAM_OPEN; > + > + goto out; > } > > + asoc->strreset_outstanding = out + in; > + > out: > 
return retval; > } > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
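The recovery described in the changelog above follows a close, send, reopen-on-failure pattern. Here is a minimal user-space sketch of that pattern under stated assumptions: the types and the send_fn callback are hypothetical stand-ins, not the SCTP structures or sctp_send_reconf(), and the chunk handling is left out:

```c
#include <assert.h>
#include <stddef.h>

enum stream_state { STREAM_OPEN, STREAM_CLOSED };

/* Set the state of the listed streams, or of all streams if n is 0. */
static void set_states(enum stream_state *out, size_t outcnt,
		       const unsigned short *list, size_t n,
		       enum stream_state state)
{
	size_t i;

	if (n) {
		for (i = 0; i < n; i++) {
			assert(list[i] < outcnt);
			out[list[i]] = state;
		}
	} else {
		for (i = 0; i < outcnt; i++)
			out[i] = state;
	}
}

/*
 * Close the streams, attempt the send, and reopen them if the send fails.
 * send_fn is a hypothetical stand-in for the transmit path: it returns 0
 * on success and a negative value on failure.
 */
static int reset_streams(enum stream_state *out, size_t outcnt,
			 const unsigned short *list, size_t n,
			 int (*send_fn)(void))
{
	int ret;

	set_states(out, outcnt, list, n, STREAM_CLOSED);
	ret = send_fn();
	if (ret)
		set_states(out, outcnt, list, n, STREAM_OPEN);
	return ret;
}

/* Two trivial senders for exercising both paths. */
static int send_ok(void)   { return 0; }
static int send_fail(void) { return -1; }
```

As in the patch, the failure path marks the affected streams open unconditionally rather than restoring whatever state they had before the request.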
Re: [PATCHv6 net-next 1/6] sctp: drop unnecessary __packed from some stream reconf structures
On Thu, Feb 09, 2017 at 01:18:15AM +0800, Xin Long wrote: > commit 85c727b59483 ("sctp: drop __packed from almost all SCTP structures") > has removed __packed from almost all SCTP structures. But there still are > three structures where it should be dropped. > > This patch is to remove it from some stream reconf structures. > > Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner > --- > include/linux/sctp.h | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/include/linux/sctp.h b/include/linux/sctp.h > index 2408c68..d74fca3 100644 > --- a/include/linux/sctp.h > +++ b/include/linux/sctp.h > @@ -721,7 +721,7 @@ struct sctp_infox { > struct sctp_reconf_chunk { > sctp_chunkhdr_t chunk_hdr; > __u8 params[0]; > -} __packed; > +}; > > struct sctp_strreset_outreq { > sctp_paramhdr_t param_hdr; > @@ -729,12 +729,12 @@ struct sctp_strreset_outreq { > __u32 response_seq; > __u32 send_reset_at_tsn; > __u16 list_of_streams[0]; > -} __packed; > +}; > > struct sctp_strreset_inreq { > sctp_paramhdr_t param_hdr; > __u32 request_seq; > __u16 list_of_streams[0]; > -} __packed; > +}; > > #endif /* __LINUX_SCTP_H__ */ > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [PATCH v2 net-next 1/9] sunvnet: make sunvnet common code dynamically loadable
On 2/8/2017 11:29 AM, David Miller wrote: From: Shannon Nelson Date: Tue, 7 Feb 2017 14:12:54 -0800 +static int __init sunvnet_common_init(void) +{ + pr_info("%s\n", version); + return 0; +} +module_init(sunvnet_common_init); + +static void __exit sunvnet_common_exit(void) +{ + /* Empty function, just here to fill the exit function pointer +* slot. In some combinations of older gcc and newer kernel, +* leaving this undefined results in the kernel marking it as a +* permanent module; it will show up in lsmod output as [permanent] +* and not be unloadable. +*/ +} +module_exit(sunvnet_common_exit); + This module is just providing infrastructure for other modules. So skip the init function, and that way you don't need the exit function either. The kernel log message when the real sunvnet driver loads is sufficient, you don't need one here. Sure - thanks, sln
Re: [PATCHv6 net-next 4/6] sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter
On Wed, Feb 08, 2017 at 07:48:01PM -0200, Marcelo Ricardo Leitner wrote: > Hi Xin, > > On Thu, Feb 09, 2017 at 01:18:18AM +0800, Xin Long wrote: > > This patch is to implement Sender-Side Procedures for the SSN/TSN > > Reset Request Parameter described in rfc6525 section 5.1.4. > > > > It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3 > > for users. > > > > Signed-off-by: Xin Long > ... > > + > > +int sctp_send_reset_assoc(struct sctp_association *asoc) > > +{ > > + struct sctp_chunk *chunk = NULL; > > + int retval; > > + __u16 i; > > + > > + if (!asoc->peer.reconf_capable || > > + !(asoc->strreset_enable & SCTP_ENABLE_RESET_ASSOC_REQ)) > > + return -ENOPROTOOPT; > > + > > + if (asoc->strreset_outstanding) > > + return -EINPROGRESS; > > + > > + chunk = sctp_make_strreset_tsnreq(asoc); > ^--- refcnt = 1 (as per sctp_chunkify()) > > > + if (!chunk) > > + return -ENOMEM; > > + > > + /* Block further xmit of data until this request is completed */ > > + for (i = 0; i < asoc->stream->outcnt; i++) > > + asoc->stream->out[i].state = SCTP_STREAM_CLOSED; > > + > > + asoc->strreset_chunk = chunk; > > + sctp_chunk_hold(asoc->strreset_chunk); > ^--- refcnt = 2 > > + > > + retval = sctp_send_reconf(asoc, chunk); > > + if (retval) { > > + sctp_chunk_put(asoc->strreset_chunk); > ^--- refcnt = 1 > > Won't we leak the chunk here? No we won't, sctp_send_reconf() frees it for us, aye.
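The reference counting walked through above can be modelled with a toy user-space sketch. It assumes, as the reply states, that the send path drops one reference itself when it fails. The struct chunk type and all helpers here are hypothetical and much simpler than the SCTP ones:

```c
#include <assert.h>
#include <stdlib.h>

struct chunk {
	int refcnt;
};

static int chunks_freed;	/* counts frees so the sketch can be observed */

/* Allocate a chunk holding one reference, or return NULL. */
static struct chunk *chunk_new(void)
{
	struct chunk *c = malloc(sizeof(*c));

	if (c)
		c->refcnt = 1;
	return c;
}

static void chunk_hold(struct chunk *c)
{
	c->refcnt++;
}

/* Drop one reference and free the chunk when the last one goes away. */
static void chunk_put(struct chunk *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0) {
		free(c);
		chunks_freed++;
	}
}

/* Hypothetical send that always fails and drops the caller's reference. */
static int send_failing(struct chunk *c)
{
	chunk_put(c);
	return -1;
}

/* Error path under discussion; *saved plays the role of strreset_chunk. */
static int send_reset(struct chunk **saved)
{
	struct chunk *c = chunk_new();		/* refcnt = 1 */
	int ret;

	if (!c)
		return -2;

	*saved = c;
	chunk_hold(*saved);			/* refcnt = 2 */

	ret = send_failing(c);			/* refcnt = 1 */
	if (ret) {
		chunk_put(*saved);		/* refcnt = 0, chunk freed */
		*saved = NULL;
	}
	return ret;
}
```

Under that assumption the count reaches zero on the error path, so nothing is leaked; if the send path did not drop its reference on failure, one reference would remain.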
[PATCH 0/2] net: qcom/emac: add the last ethtool functions
These two patches implement the remaining two ethtool functions that are of interest to the Qualcomm EMAC driver. These are the last patches that will be submitted for the 4.11 merge window. Timur Tabi (2): net: qcom/emac: add ethtool support for reading hardware registers net: qcom/emac: add ethtool support for setting ring parameters drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 64 + drivers/net/ethernet/qualcomm/emac/emac-mac.c | 52 --- drivers/net/ethernet/qualcomm/emac/emac.h | 108 -- 3 files changed, 143 insertions(+), 81 deletions(-) -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCHv6 net-next 4/6] sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter
Hi Xin, On Thu, Feb 09, 2017 at 01:18:18AM +0800, Xin Long wrote: > This patch is to implement Sender-Side Procedures for the SSN/TSN > Reset Request Parameter described in rfc6525 section 5.1.4. > > It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3 > for users. > > Signed-off-by: Xin Long ... > + > +int sctp_send_reset_assoc(struct sctp_association *asoc) > +{ > + struct sctp_chunk *chunk = NULL; > + int retval; > + __u16 i; > + > + if (!asoc->peer.reconf_capable || > + !(asoc->strreset_enable & SCTP_ENABLE_RESET_ASSOC_REQ)) > + return -ENOPROTOOPT; > + > + if (asoc->strreset_outstanding) > + return -EINPROGRESS; > + > + chunk = sctp_make_strreset_tsnreq(asoc); ^--- refcnt = 1 (as per sctp_chunkify()) > + if (!chunk) > + return -ENOMEM; > + > + /* Block further xmit of data until this request is completed */ > + for (i = 0; i < asoc->stream->outcnt; i++) > + asoc->stream->out[i].state = SCTP_STREAM_CLOSED; > + > + asoc->strreset_chunk = chunk; > + sctp_chunk_hold(asoc->strreset_chunk); ^--- refcnt = 2 > + > + retval = sctp_send_reconf(asoc, chunk); > + if (retval) { > + sctp_chunk_put(asoc->strreset_chunk); ^--- refcnt = 1 Won't we leak the chunk here? > + asoc->strreset_chunk = NULL; > + > + for (i = 0; i < asoc->stream->outcnt; i++) > + asoc->stream->out[i].state = SCTP_STREAM_OPEN; > + > + return retval; > + } > + > + asoc->strreset_outstanding = 1; > + > + return 0; > +} > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
RE: [PATCHv4 0/7] Refactor macvtap to re-use tap functionality by other virtual intefaces
> -----Original Message----- > From: David Miller [mailto:da...@davemloft.net] > Sent: Tuesday, February 07, 2017 10:28 AM > To: Grandhi, Sainath > Cc: netdev@vger.kernel.org; mah...@bandewar.net; > linux-ker...@vger.kernel.org > Subject: Re: [PATCHv4 0/7] Refactor macvtap to re-use tap functionality by > other virtual intefaces > > From: Sainath Grandhi > Date: Mon, 6 Feb 2017 13:36:08 -0800 > > > Tap character devices can be implemented on other virtual interfaces > > like ipvlan, similar to macvtap. Source code for tap functionality in > > macvtap can be re-used for this purpose. > > > > This patch series splits macvtap source into two modules, macvtap and tap. > > This patch series also includes a patch for implementing tap character > > device driver based on the IP-VLAN network interface, called ipvtap. > > > > These patches are tested on x86 platform. > > You're going to have to rework the module and Kconfig parts of this set of > changes. > > The user should not have to modify any existing Kconfig setting to get the > same set of modules which already exist today. > > Yet when I run "make oldconfig" after applying these changes it prompts me > for: > > TAP module support for virtual interfaces (TAP) [N/m/y/?] (NEW) > > And that's not really acceptable. I had MACVTAP set, I should still get the > infrastructure necessary to get that module built. > > If you want to do patch #6 you have to do it in a way that is transparent to > existing kernel configs. Please check the next version of the patches. I modified Kconfig to make TAP a symbol that is not visible to the user. > > Thanks.
[PATCHv5 0/7] Refactor macvtap to re-use tap functionality by other virtual intefaces
Tap character devices can be implemented on other virtual interfaces like ipvlan, similar to macvtap. Source code for tap functionality in macvtap can be re-used for this purpose. This patch series splits macvtap source into two modules, macvtap and tap. This patch series also includes a patch for implementing tap character device driver based on the IP-VLAN network interface, called ipvtap. These patches are tested on x86 platform. Sainath Grandhi (7): tap: Refactoring macvtap.c tap: Renaming tap related APIs, data structures, macros tap: Tap character device creation/destroy API tap: Abstract type of virtual interface from tap implementation tap: Extending tap device create/destroy APIs tap: tap as an independent module ipvtap: IP-VLAN based tap driver drivers/net/Kconfig | 20 + drivers/net/Makefile |2 + drivers/net/ipvlan/Makefile |1 + drivers/net/ipvlan/ipvlan.h |7 + drivers/net/ipvlan/ipvlan_core.c |5 +- drivers/net/ipvlan/ipvlan_main.c | 27 +- drivers/net/ipvlan/ipvtap.c | 241 drivers/net/macvlan.c|2 +- drivers/net/macvtap.c| 1229 ++-- drivers/net/tap.c| 1268 ++ drivers/vhost/Kconfig|2 +- drivers/vhost/net.c |3 +- include/linux/if_macvlan.h | 17 +- include/linux/if_tap.h | 75 +++ 14 files changed, 1690 insertions(+), 1209 deletions(-) create mode 100644 drivers/net/ipvlan/ipvtap.c create mode 100644 drivers/net/tap.c create mode 100644 include/linux/if_tap.h -- 2.7.4
[PATCHv5 3/7] tap: Tap character device creation/destroy API
This patch provides tap device create/destroy APIs in tap.c.

Signed-off-by: Sainath Grandhi
---
 drivers/net/macvtap_main.c | 30 +++---
 drivers/net/tap.c          | 62 ++
 include/linux/if_tap.h     |  3 +++
 3 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 548f339..215ab7a 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -28,7 +28,6 @@
  * Variables for dealing with macvtaps device numbers.
  */
 static dev_t macvtap_major;
-#define MACVTAP_NUM_DEVS (1U << MINORBITS)
 
 static const void *macvtap_net_namespace(struct device *d)
 {
@@ -159,57 +158,46 @@ static struct notifier_block macvtap_notifier_block __read_mostly = {
 	.notifier_call	= macvtap_device_event,
 };
 
-extern struct file_operations tap_fops;
 static int macvtap_init(void)
 {
 	int err;
 
-	err = alloc_chrdev_region(&macvtap_major, 0,
-				  MACVTAP_NUM_DEVS, "macvtap");
-	if (err)
-		goto out1;
+	err = tap_create_cdev(&macvtap_cdev, &macvtap_major, "macvtap");
 
-	cdev_init(&macvtap_cdev, &tap_fops);
-	err = cdev_add(&macvtap_cdev, macvtap_major, MACVTAP_NUM_DEVS);
 	if (err)
-		goto out2;
+		goto out1;
 
 	err = class_register(&macvtap_class);
 	if (err)
-		goto out3;
+		goto out2;
 
 	err = register_netdevice_notifier(&macvtap_notifier_block);
 	if (err)
-		goto out4;
+		goto out3;
 
 	err = macvlan_link_register(&macvtap_link_ops);
 	if (err)
-		goto out5;
+		goto out4;
 
 	return 0;
 
-out5:
-	unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-	class_unregister(&macvtap_class);
+	unregister_netdevice_notifier(&macvtap_notifier_block);
 out3:
-	cdev_del(&macvtap_cdev);
+	class_unregister(&macvtap_class);
 out2:
-	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
+	tap_destroy_cdev(macvtap_major, &macvtap_cdev);
 out1:
 	return err;
 }
 module_init(macvtap_init);
 
-extern struct idr minor_idr;
 static void macvtap_exit(void)
 {
 	rtnl_link_unregister(&macvtap_link_ops);
 	unregister_netdevice_notifier(&macvtap_notifier_block);
 	class_unregister(&macvtap_class);
-	cdev_del(&macvtap_cdev);
-	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
-	idr_destroy(&minor_idr);
+	tap_destroy_cdev(macvtap_major, &macvtap_cdev);
 }
 module_exit(macvtap_exit);
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 15ca2d5..04ba978 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -123,8 +123,12 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
-static DEFINE_MUTEX(minor_lock);
-DEFINE_IDR(minor_idr);
+struct major_info {
+	dev_t major;
+	struct idr minor_idr;
+	struct mutex minor_lock;
+	const char *device_name;
+} macvtap_major;
 
 #define GOODCOPY_LEN 128
 
@@ -413,26 +417,26 @@ int tap_get_minor(struct macvlan_dev *vlan)
 {
 	int retval = -ENOMEM;
 
-	mutex_lock(&minor_lock);
-	retval = idr_alloc(&minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL);
+	mutex_lock(&macvtap_major.minor_lock);
+	retval = idr_alloc(&macvtap_major.minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL);
 	if (retval >= 0) {
 		vlan->minor = retval;
 	} else if (retval == -ENOSPC) {
 		netdev_err(vlan->dev, "Too many tap devices\n");
 		retval = -EINVAL;
 	}
-	mutex_unlock(&minor_lock);
+	mutex_unlock(&macvtap_major.minor_lock);
 	return retval < 0 ? retval : 0;
 }
 
 void tap_free_minor(struct macvlan_dev *vlan)
 {
-	mutex_lock(&minor_lock);
+	mutex_lock(&macvtap_major.minor_lock);
 	if (vlan->minor) {
-		idr_remove(&minor_idr, vlan->minor);
+		idr_remove(&macvtap_major.minor_idr, vlan->minor);
 		vlan->minor = 0;
 	}
-	mutex_unlock(&minor_lock);
+	mutex_unlock(&macvtap_major.minor_lock);
 }
 
 static struct net_device *dev_get_by_tap_minor(int minor)
@@ -440,13 +444,13 @@ static struct net_device *dev_get_by_tap_minor(int minor)
 	struct net_device *dev = NULL;
 	struct macvlan_dev *vlan;
 
-	mutex_lock(&minor_lock);
-	vlan = idr_find(&minor_idr, minor);
+	mutex_lock(&macvtap_major.minor_lock);
+	vlan = idr_find(&macvtap_major.minor_idr, minor);
 	if (vlan) {
 		dev = vlan->dev;
 		dev_hold(dev);
 	}
-	mutex_unlock(&minor_lock);
+	mutex_unlock(&macvtap_major.minor_lock);
 	return dev;
 }
 
@@ -1184,3 +1188,39 @@ int tap_queue_resize(struct macvlan_dev *vlan)
 	kfree(arrays);
 	return ret;
 }
+
+int tap_create_cdev(struct cdev *tap_cdev,
+		    dev_t *tap_major, const char *device_name)
+{
+	int err;
+
+	err = alloc_chrdev_region(tap_major, 0, TAP_NUM_DEVS, device_name);
+	if (err)
+		goto out1;
+
+	cdev_init(tap_cdev, &tap_fops);
+
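For readers following the label renumbering in macvtap_init(), the unwinding order can be illustrated with a small user-space sketch. This is not code from the patch: every demo_* name is hypothetical, and each step only records a letter so the setup and undo order can be inspected. It assumes nothing beyond standard C.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the registration steps in macvtap_init();
 * none of these demo_* names exist in the kernel.
 */
static int demo_fail_at;	/* which step (1..4) should fail, 0 = none */
static char demo_trace[16];	/* records the order of setup and undo steps */
static size_t demo_trace_len;

static void demo_note(char tag)
{
	if (demo_trace_len < sizeof(demo_trace) - 1) {
		demo_trace[demo_trace_len++] = tag;
		demo_trace[demo_trace_len] = '\0';
	}
}

static int demo_setup(int step, char tag)
{
	if (demo_fail_at == step)
		return -1;
	demo_note(tag);
	return 0;
}

static int demo_init(void)
{
	int err;

	err = demo_setup(1, 'C');	/* stands in for tap_create_cdev() */
	if (err)
		goto out1;

	err = demo_setup(2, 'K');	/* stands in for class_register() */
	if (err)
		goto out2;

	err = demo_setup(3, 'N');	/* stands in for register_netdevice_notifier() */
	if (err)
		goto out3;

	err = demo_setup(4, 'L');	/* stands in for macvlan_link_register() */
	if (err)
		goto out4;

	return 0;

out4:
	demo_note('n');	/* undo step 3 */
out3:
	demo_note('k');	/* undo step 2 */
out2:
	demo_note('c');	/* undo step 1 */
out1:
	return err;
}
```

Each label undoes exactly the steps that succeeded before the failing one, in reverse order, which is why folding two setup calls into tap_create_cdev() removes one label from the ladder.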
[PATCHv5 4/7] tap: Abstract type of virtual interface from tap implementation
The macvlan object is restructured to hold tap-related elements in a
separate entity, tap_dev. On the NETDEV_REGISTER device event, tap_dev
is registered with the idr and fetched again in tap_open. A few of the
tap functions are modified to accept tap_dev as an argument. The tap_dev
object includes callbacks to be used by the underlying virtual interface
to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi
---
 drivers/net/macvlan.c      |   2 +-
 drivers/net/macvtap_main.c |  71 +---
 drivers/net/tap.c          | 264 -
 include/linux/if_tap.h     |  57 +-
 4 files changed, 229 insertions(+), 165 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 20b3fdf2..79383f9 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1526,7 +1526,6 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
 int macvlan_link_register(struct rtnl_link_ops *ops)
 {
 	/* common fields */
-	ops->priv_size		= sizeof(struct macvlan_dev);
 	ops->validate		= macvlan_validate;
 	ops->maxtype		= IFLA_MACVLAN_MAX;
 	ops->policy		= macvlan_policy;
@@ -1549,6 +1548,7 @@ static struct rtnl_link_ops macvlan_link_ops = {
 	.newlink	= macvlan_newlink,
 	.dellink	= macvlan_dellink,
 	.get_link_net	= macvlan_get_link_net,
+	.priv_size	= sizeof(struct macvlan_dev),
 };
 
 static int macvlan_device_event(struct notifier_block *unused,
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 215ab7a..0238df6 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -24,6 +24,11 @@
 #include
 #include
 
+struct macvtap_dev {
+	struct macvlan_dev vlan;
+	struct tap_dev tap;
+};
+
 /*
  * Variables for dealing with macvtaps device numbers.
  */
@@ -46,22 +51,55 @@ static struct cdev macvtap_cdev;
 #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
 		      NETIF_F_TSO6 | NETIF_F_UFO)
 
+static void macvtap_count_tx_dropped(struct tap_dev *tap)
+{
+	struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);
+	struct macvlan_dev *vlan = &vlantap->vlan;
+
+	this_cpu_inc(vlan->pcpu_stats->tx_dropped);
+}
+
+static void macvtap_count_rx_dropped(struct tap_dev *tap)
+{
+	struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);
+	struct macvlan_dev *vlan = &vlantap->vlan;
+
+	macvlan_count_rx(vlan, 0, 0, 0);
+}
+
+static void macvtap_update_features(struct tap_dev *tap,
+				    netdev_features_t features)
+{
+	struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);
+	struct macvlan_dev *vlan = &vlantap->vlan;
+
+	vlan->set_features = features;
+	netdev_update_features(vlan->dev);
+}
+
 static int macvtap_newlink(struct net *src_net,
 			   struct net_device *dev,
 			   struct nlattr *tb[],
 			   struct nlattr *data[])
 {
-	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct macvtap_dev *vlantap = netdev_priv(dev);
 	int err;
 
-	INIT_LIST_HEAD(&vlan->queue_list);
+	INIT_LIST_HEAD(&vlantap->tap.queue_list);
 
 	/* Since macvlan supports all offloads by default, make
 	 * tap support all offloads also.
 	 */
-	vlan->tap_features = TUN_OFFLOADS;
+	vlantap->tap.tap_features = TUN_OFFLOADS;
 
-	err = netdev_rx_handler_register(dev, tap_handle_frame, vlan);
+	/* Register callbacks for rx/tx drops accounting and updating
+	 * net_device features
+	 */
+	vlantap->tap.count_tx_dropped = macvtap_count_tx_dropped;
+	vlantap->tap.count_rx_dropped = macvtap_count_rx_dropped;
+	vlantap->tap.update_features = macvtap_update_features;
+
+	err = netdev_rx_handler_register(dev, tap_handle_frame, &vlantap->tap);
 	if (err)
 		return err;
 
@@ -74,14 +112,18 @@ static int macvtap_newlink(struct net *src_net,
 		return err;
 	}
 
+	vlantap->tap.dev = vlantap->vlan.dev;
+
 	return 0;
 }
 
 static void macvtap_dellink(struct net_device *dev,
 			    struct list_head *head)
 {
+	struct macvtap_dev *vlantap = netdev_priv(dev);
+
 	netdev_rx_handler_unregister(dev);
-	tap_del_queues(dev);
+	tap_del_queues(&vlantap->tap);
 	macvlan_dellink(dev, head);
 }
 
@@ -96,13 +138,14 @@ static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
 	.setup		= macvtap_setup,
 	.newlink	= macvtap_newlink,
 	.dellink	= macvtap_dellink,
+	.priv_size	= sizeof(struct macvtap_dev),
 };
 
 static int macvtap_device_event(struct notifier_block *unused,
 				unsigned long event, void *ptr)
 {
 	struct net_device *dev =
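The embedding used by struct macvtap_dev, where generic tap code calls back into the owning driver, can be sketched in user space as follows. This is an illustration only: the demo_* types are hypothetical, simplified stand-ins, and container_of is redefined locally with offsetof rather than taken from kernel headers.

```c
#include <stddef.h>

/* Local user-space equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct tap_dev: generic code sees only this. */
struct demo_tap_dev {
	void (*count_tx_dropped)(struct demo_tap_dev *tap);
};

/* Hypothetical stand-in for struct macvlan_dev with a plain counter. */
struct demo_vlan_dev {
	unsigned long tx_dropped;
};

/* Mirrors struct macvtap_dev: both objects embedded side by side. */
struct demo_macvtap_dev {
	struct demo_vlan_dev vlan;
	struct demo_tap_dev tap;
};

/* Driver-specific callback: recover the outer object from the tap member. */
static void demo_count_tx_dropped(struct demo_tap_dev *tap)
{
	struct demo_macvtap_dev *vlantap =
		demo_container_of(tap, struct demo_macvtap_dev, tap);

	vlantap->vlan.tx_dropped++;
}

/* Generic tap-side code: knows nothing about the embedding driver. */
static void demo_generic_drop(struct demo_tap_dev *tap)
{
	if (tap->count_tx_dropped)
		tap->count_tx_dropped(tap);
}
```

The point of the design is that the generic side never names the macvlan type; a second driver could embed the same tap structure next to its own private data and supply different callbacks.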
[PATCHv5 2/7] tap: Renaming tap related APIs, data structures, macros
Renaming tap related APIs, data structures and macros in tap.c from macvtap_.* to tap_.* Signed-off-by: Sainath Grandhi--- drivers/net/macvtap_main.c | 18 +-- drivers/net/tap.c | 332 ++--- drivers/vhost/net.c| 3 +- include/linux/if_macvlan.h | 17 +-- include/linux/if_macvtap.h | 10 -- include/linux/if_tap.h | 23 6 files changed, 202 insertions(+), 201 deletions(-) delete mode 100644 include/linux/if_macvtap.h create mode 100644 include/linux/if_tap.h diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 96ffa60..548f339 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -1,6 +1,6 @@ #include #include -#include +#include #include #include #include @@ -62,7 +62,7 @@ static int macvtap_newlink(struct net *src_net, */ vlan->tap_features = TUN_OFFLOADS; - err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan); + err = netdev_rx_handler_register(dev, tap_handle_frame, vlan); if (err) return err; @@ -82,7 +82,7 @@ static void macvtap_dellink(struct net_device *dev, struct list_head *head) { netdev_rx_handler_unregister(dev); - macvtap_del_queues(dev); + tap_del_queues(dev); macvlan_dellink(dev, head); } @@ -121,7 +121,7 @@ static int macvtap_device_event(struct notifier_block *unused, * been registered but before register_netdevice has * finished running. 
*/ - err = macvtap_get_minor(vlan); + err = tap_get_minor(vlan); if (err) return notifier_from_errno(err); @@ -129,7 +129,7 @@ static int macvtap_device_event(struct notifier_block *unused, classdev = device_create(_class, >dev, devt, dev, tap_name); if (IS_ERR(classdev)) { - macvtap_free_minor(vlan); + tap_free_minor(vlan); return notifier_from_errno(PTR_ERR(classdev)); } err = sysfs_create_link(>dev.kobj, >kobj, @@ -144,10 +144,10 @@ static int macvtap_device_event(struct notifier_block *unused, sysfs_remove_link(>dev.kobj, tap_name); devt = MKDEV(MAJOR(macvtap_major), vlan->minor); device_destroy(_class, devt); - macvtap_free_minor(vlan); + tap_free_minor(vlan); break; case NETDEV_CHANGE_TX_QUEUE_LEN: - if (macvtap_queue_resize(vlan)) + if (tap_queue_resize(vlan)) return NOTIFY_BAD; break; } @@ -159,7 +159,7 @@ static struct notifier_block macvtap_notifier_block __read_mostly = { .notifier_call = macvtap_device_event, }; -extern struct file_operations macvtap_fops; +extern struct file_operations tap_fops; static int macvtap_init(void) { int err; @@ -169,7 +169,7 @@ static int macvtap_init(void) if (err) goto out1; - cdev_init(_cdev, _fops); + cdev_init(_cdev, _fops); err = cdev_add(_cdev, macvtap_major, MACVTAP_NUM_DEVS); if (err) goto out2; diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 6f6228e..15ca2d5 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -24,16 +24,16 @@ #include /* - * A macvtap queue is the central object of this driver, it connects + * A tap queue is the central object of this driver, it connects * an open character device to a macvlan interface. There can be * multiple queues on one interface, which map back to queues * implemented in hardware on the underlying device. * - * macvtap_proto is used to allocate queues through the sock allocation + * tap_proto is used to allocate queues through the sock allocation * mechanism. 
* */ -struct macvtap_queue { +struct tap_queue { struct sock sk; struct socket sock; struct socket_wq wq; @@ -47,21 +47,21 @@ struct macvtap_queue { struct skb_array skb_array; }; -#define MACVTAP_FEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE) +#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE) -#define MACVTAP_VNET_LE 0x8000 -#define MACVTAP_VNET_BE 0x4000 +#define TAP_VNET_LE 0x8000 +#define TAP_VNET_BE 0x4000 #ifdef CONFIG_TUN_VNET_CROSS_LE -static inline bool macvtap_legacy_is_little_endian(struct macvtap_queue *q) +static inline bool tap_legacy_is_little_endian(struct tap_queue *q) { - return q->flags & MACVTAP_VNET_BE ? false : + return q->flags & TAP_VNET_BE ? false : virtio_legacy_is_little_endian(); } -static long macvtap_get_vnet_be(struct macvtap_queue *q, int __user *sp) +static long tap_get_vnet_be(struct tap_queue *q, int __user *sp)
[PATCHv5 5/7] tap: Extending tap device create/destroy APIs
Extending tap APIs get/free_minor and create/destroy_cdev to handle more
than one type of virtual interface.

Signed-off-by: Sainath Grandhi
---
 drivers/net/macvtap_main.c |   6 +--
 drivers/net/tap.c          | 101 +++--
 include/linux/if_tap.h     |   4 +-
 3 files changed, 85 insertions(+), 26 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 0238df6..a4bfc10 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -163,7 +163,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 		 * been registered but before register_netdevice has
 		 * finished running.
 		 */
-		err = tap_get_minor(&vlantap->tap);
+		err = tap_get_minor(macvtap_major, &vlantap->tap);
 		if (err)
 			return notifier_from_errno(err);
 
@@ -171,7 +171,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 		classdev = device_create(&macvtap_class, &dev->dev, devt,
 					 dev, tap_name);
 		if (IS_ERR(classdev)) {
-			tap_free_minor(&vlantap->tap);
+			tap_free_minor(macvtap_major, &vlantap->tap);
 			return notifier_from_errno(PTR_ERR(classdev));
 		}
 		err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
@@ -186,7 +186,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 		sysfs_remove_link(&dev->dev.kobj, tap_name);
 		devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor);
 		device_destroy(&macvtap_class, devt);
-		tap_free_minor(&vlantap->tap);
+		tap_free_minor(macvtap_major, &vlantap->tap);
 		break;
 	case NETDEV_CHANGE_TX_QUEUE_LEN:
 		if (tap_queue_resize(&vlantap->tap))
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 7d3e8b1..b7cdc90 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -99,12 +99,17 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
+
+static LIST_HEAD(major_list);
+
 struct major_info {
+	struct rcu_head rcu;
 	dev_t major;
 	struct idr minor_idr;
 	struct mutex minor_lock;
 	const char *device_name;
-} macvtap_major;
+	struct list_head next;
+};
 
 #define GOODCOPY_LEN 128
 
@@ -385,44 +390,72 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 	return RX_HANDLER_CONSUMED;
 }
 
-int tap_get_minor(struct tap_dev *tap)
+static struct major_info *tap_get_major(int major)
+{
+	struct major_info *tap_major;
+
+	list_for_each_entry_rcu(tap_major, &major_list, next) {
+		if (tap_major->major == major)
+			return tap_major;
+	}
+
+	return NULL;
+}
+
+int tap_get_minor(dev_t major, struct tap_dev *tap)
 {
 	int retval = -ENOMEM;
+	struct major_info *tap_major;
 
-	mutex_lock(&macvtap_major.minor_lock);
-	retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL);
+	tap_major = tap_get_major(MAJOR(major));
+	if (!tap_major)
+		return -EINVAL;
+
+	mutex_lock(&tap_major->minor_lock);
+	retval = idr_alloc(&tap_major->minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL);
 	if (retval >= 0) {
 		tap->minor = retval;
 	} else if (retval == -ENOSPC) {
 		netdev_err(tap->dev, "Too many tap devices\n");
 		retval = -EINVAL;
 	}
-	mutex_unlock(&macvtap_major.minor_lock);
+	mutex_unlock(&tap_major->minor_lock);
 	return retval < 0 ? retval : 0;
 }
 
-void tap_free_minor(struct tap_dev *tap)
+void tap_free_minor(dev_t major, struct tap_dev *tap)
 {
-	mutex_lock(&macvtap_major.minor_lock);
+	struct major_info *tap_major;
+
+	tap_major = tap_get_major(MAJOR(major));
+	if (!tap_major)
+		return;
+
+	mutex_lock(&tap_major->minor_lock);
 	if (tap->minor) {
-		idr_remove(&macvtap_major.minor_idr, tap->minor);
+		idr_remove(&tap_major->minor_idr, tap->minor);
 		tap->minor = 0;
 	}
-	mutex_unlock(&macvtap_major.minor_lock);
+	mutex_unlock(&tap_major->minor_lock);
 }
 
-static struct tap_dev *dev_get_by_tap_minor(int minor)
+static struct tap_dev *dev_get_by_tap_file(int major, int minor)
 {
 	struct net_device *dev = NULL;
 	struct tap_dev *tap;
+	struct major_info *tap_major;
+
+	tap_major = tap_get_major(major);
+	if (!tap_major)
+		return NULL;
 
-	mutex_lock(&macvtap_major.minor_lock);
-	tap = idr_find(&macvtap_major.minor_idr, minor);
+	mutex_lock(&tap_major->minor_lock);
+	tap = idr_find(&tap_major->minor_idr, minor);
 	if (tap) {
 		dev = tap->dev;
 		dev_hold(dev);
 	}
-	mutex_unlock(&macvtap_major.minor_lock);
+	mutex_unlock(&tap_major->minor_lock);
 	return tap;
 }
 
@@ -454,7 +487,7 @@ static int
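The idea of keeping one minor-number space per registered major can be sketched in user space as below. This is a simplified illustration, not the patch's implementation: all demo_* names are hypothetical, a fixed array replaces the RCU-protected list, a byte map replaces the idr, and there is no locking. Unlike tap_get_minor(), which returns 0 and stores the result in tap->minor, the sketch returns the allocated minor directly.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_MAJORS	4
#define DEMO_NUM_MINORS	8	/* the patch uses TAP_NUM_DEVS (1U << MINORBITS) */

/* Hypothetical, simplified counterpart of struct major_info. */
struct demo_major_info {
	unsigned int major;
	int registered;
	unsigned char minor_used[DEMO_NUM_MINORS];
};

static struct demo_major_info demo_majors[DEMO_MAX_MAJORS];

/* Look up the bookkeeping entry for a major, or NULL if unknown. */
static struct demo_major_info *demo_get_major(unsigned int major)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_MAJORS; i++) {
		if (demo_majors[i].registered && demo_majors[i].major == major)
			return &demo_majors[i];
	}
	return NULL;
}

/* Register a new major so that minors can be allocated under it. */
static int demo_add_major(unsigned int major)
{
	size_t i;

	if (demo_get_major(major))
		return -EEXIST;

	for (i = 0; i < DEMO_MAX_MAJORS; i++) {
		if (!demo_majors[i].registered) {
			demo_majors[i].registered = 1;
			demo_majors[i].major = major;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Allocate the lowest free minor (starting at 1) under the given major. */
static int demo_get_minor(unsigned int major)
{
	struct demo_major_info *info = demo_get_major(major);
	int minor;

	if (!info)
		return -EINVAL;

	for (minor = 1; minor < DEMO_NUM_MINORS; minor++) {
		if (!info->minor_used[minor]) {
			info->minor_used[minor] = 1;
			return minor;
		}
	}
	return -EINVAL;	/* the patch likewise maps "no space" to -EINVAL */
}

/* Release a minor previously handed out under the given major. */
static void demo_free_minor(unsigned int major, int minor)
{
	struct demo_major_info *info = demo_get_major(major);

	if (info && minor > 0 && minor < DEMO_NUM_MINORS)
		info->minor_used[minor] = 0;
}
```

Two drivers registering different majors each start allocating from minor 1, which is the property the per-major lookup is meant to provide.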
[PATCHv5 1/7] tap: Refactoring macvtap.c
macvtap module has code for tap/queue management and link management. This patch splits the code into macvtap_main.c for link management and tap.c for tap/queue management. Functionality in tap.c can be re-used for implementing tap on other virtual interfaces. Signed-off-by: Sainath Grandhi--- drivers/net/Makefile | 2 + drivers/net/macvtap_main.c | 218 +++ drivers/net/{macvtap.c => tap.c} | 204 ++-- include/linux/if_macvtap.h | 10 ++ 4 files changed, 238 insertions(+), 196 deletions(-) create mode 100644 drivers/net/macvtap_main.c rename drivers/net/{macvtap.c => tap.c} (84%) create mode 100644 include/linux/if_macvtap.h diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 7336cbd..19b03a9 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -29,6 +29,8 @@ obj-$(CONFIG_GTP) += gtp.o obj-$(CONFIG_NLMON) += nlmon.o obj-$(CONFIG_NET_VRF) += vrf.o +macvtap-objs := macvtap_main.o tap.o + # # Networking Drivers # diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c new file mode 100644 index 000..96ffa60 --- /dev/null +++ b/drivers/net/macvtap_main.c @@ -0,0 +1,218 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +/* + * Variables for dealing with macvtaps device numbers. 
+ */ +static dev_t macvtap_major; +#define MACVTAP_NUM_DEVS (1U << MINORBITS) + +static const void *macvtap_net_namespace(struct device *d) +{ + struct net_device *dev = to_net_dev(d->parent); + return dev_net(dev); +} + +static struct class macvtap_class = { + .name = "macvtap", + .owner = THIS_MODULE, + .ns_type = _ns_type_operations, + .namespace = macvtap_net_namespace, +}; +static struct cdev macvtap_cdev; + +#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \ + NETIF_F_TSO6 | NETIF_F_UFO) + +static int macvtap_newlink(struct net *src_net, + struct net_device *dev, + struct nlattr *tb[], + struct nlattr *data[]) +{ + struct macvlan_dev *vlan = netdev_priv(dev); + int err; + + INIT_LIST_HEAD(>queue_list); + + /* Since macvlan supports all offloads by default, make +* tap support all offloads also. +*/ + vlan->tap_features = TUN_OFFLOADS; + + err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan); + if (err) + return err; + + /* Don't put anything that may fail after macvlan_common_newlink +* because we can't undo what it does. 
+*/ + err = macvlan_common_newlink(src_net, dev, tb, data); + if (err) { + netdev_rx_handler_unregister(dev); + return err; + } + + return 0; +} + +static void macvtap_dellink(struct net_device *dev, + struct list_head *head) +{ + netdev_rx_handler_unregister(dev); + macvtap_del_queues(dev); + macvlan_dellink(dev, head); +} + +static void macvtap_setup(struct net_device *dev) +{ + macvlan_common_setup(dev); + dev->tx_queue_len = TUN_READQ_SIZE; +} + +static struct rtnl_link_ops macvtap_link_ops __read_mostly = { + .kind = "macvtap", + .setup = macvtap_setup, + .newlink= macvtap_newlink, + .dellink= macvtap_dellink, +}; + +static int macvtap_device_event(struct notifier_block *unused, + unsigned long event, void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct macvlan_dev *vlan; + struct device *classdev; + dev_t devt; + int err; + char tap_name[IFNAMSIZ]; + + if (dev->rtnl_link_ops != _link_ops) + return NOTIFY_DONE; + + snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex); + vlan = netdev_priv(dev); + + switch (event) { + case NETDEV_REGISTER: + /* Create the device node here after the network device has +* been registered but before register_netdevice has +* finished running. +*/ + err = macvtap_get_minor(vlan); + if (err) + return notifier_from_errno(err); + + devt = MKDEV(MAJOR(macvtap_major), vlan->minor); + classdev = device_create(_class, >dev, devt, +dev, tap_name); + if (IS_ERR(classdev)) { + macvtap_free_minor(vlan); + return notifier_from_errno(PTR_ERR(classdev)); + } + err =
[PATCHv5 7/7] ipvtap: IP-VLAN based tap driver
This patch adds a tap character device driver that is based on the IP-VLAN network interface, called ipvtap. An ipvtap device can be created in the same way as an ipvlan device, using 'type ipvtap', and then accessed using the tap user space interface. Signed-off-by: Sainath Grandhi--- drivers/net/Kconfig | 13 +++ drivers/net/Makefile | 1 + drivers/net/ipvlan/Makefile | 1 + drivers/net/ipvlan/ipvlan.h | 7 ++ drivers/net/ipvlan/ipvlan_core.c | 5 +- drivers/net/ipvlan/ipvlan_main.c | 27 +++-- drivers/net/ipvlan/ipvtap.c | 241 +++ 7 files changed, 281 insertions(+), 14 deletions(-) create mode 100644 drivers/net/ipvlan/ipvtap.c diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 8f6d21b4..fe83dc1 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -166,6 +166,19 @@ config IPVLAN To compile this driver as a module, choose M here: the module will be called ipvlan. +config IPVTAP + tristate "IP-VLAN based tap driver" + depends on IPVLAN + depends on INET + select TAP + ---help--- + This adds a specialized tap character device driver that is based + on the IP-VLAN network interface, called ipvtap. An ipvtap device + can be added in the same way as a ipvlan device, using 'type + ipvtap', and then be accessed through the tap user space interface. + + To compile this driver as a module, choose M here: the module + will be called ipvtap. 
config VXLAN tristate "Virtual eXtensible Local Area Network (VXLAN)" diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 7dd86ca..98ed4d9 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -7,6 +7,7 @@ # obj-$(CONFIG_BONDING) += bonding/ obj-$(CONFIG_IPVLAN) += ipvlan/ +obj-$(CONFIG_IPVTAP) += ipvlan/ obj-$(CONFIG_DUMMY) += dummy.o obj-$(CONFIG_EQUALIZER) += eql.o obj-$(CONFIG_IFB) += ifb.o diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile index df79910..8a2c64d 100644 --- a/drivers/net/ipvlan/Makefile +++ b/drivers/net/ipvlan/Makefile @@ -3,5 +3,6 @@ # obj-$(CONFIG_IPVLAN) += ipvlan.o +obj-$(CONFIG_IPVTAP) += ipvtap.o ipvlan-objs := ipvlan_core.o ipvlan_main.o diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h index dbfbb33..4362d88 100644 --- a/drivers/net/ipvlan/ipvlan.h +++ b/drivers/net/ipvlan/ipvlan.h @@ -133,4 +133,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb, u16 proto); unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, +unsigned int len, bool success, bool mcast); +int ipvlan_link_new(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]); +void ipvlan_link_delete(struct net_device *dev, struct list_head *head); +void ipvlan_link_setup(struct net_device *dev); +int ipvlan_link_register(struct rtnl_link_ops *ops); #endif /* __IPVLAN_H */ diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 83ce74a..9af16ab 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -16,8 +16,8 @@ void ipvlan_init_secret(void) net_get_random_once(_jhash_secret, sizeof(ipvlan_jhash_secret)); } -static void ipvlan_count_rx(const struct ipvl_dev *ipvlan, - unsigned int len, bool success, bool mcast) +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, +unsigned int len, bool success, 
bool mcast) { if (!ipvlan) return; @@ -36,6 +36,7 @@ static void ipvlan_count_rx(const struct ipvl_dev *ipvlan, this_cpu_inc(ipvlan->pcpu_stats->rx_errs); } } +EXPORT_SYMBOL_GPL(ipvlan_count_rx); static u8 ipvlan_get_v6_hash(const void *iaddr) { diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 8b0f993..ed750e2 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -494,8 +494,8 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb, return ret; } -static int ipvlan_link_new(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[]) +int ipvlan_link_new(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) { struct ipvl_dev *ipvlan = netdev_priv(dev); struct ipvl_port *port; @@ -567,8 +567,9 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev, ipvlan_port_destroy(phy_dev); return err; } +EXPORT_SYMBOL_GPL(ipvlan_link_new); -static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
[PATCHv5 6/7] tap: tap as an independent module
This patch makes tap a separate module for other types of virtual interfaces, for example, ipvlan to use. Signed-off-by: Sainath Grandhi--- drivers/net/Kconfig | 7 +++ drivers/net/Makefile | 3 +-- drivers/net/{macvtap_main.c => macvtap.c} | 0 drivers/net/tap.c | 11 +++ drivers/vhost/Kconfig | 2 +- include/linux/if_tap.h| 4 ++-- 6 files changed, 22 insertions(+), 5 deletions(-) rename drivers/net/{macvtap_main.c => macvtap.c} (100%) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 95c32f2..8f6d21b4 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -135,6 +135,7 @@ config MACVTAP tristate "MAC-VLAN based tap driver" depends on MACVLAN depends on INET + select TAP help This adds a specialized tap character device driver that is based on the MAC-VLAN network interface, called macvtap. A macvtap device @@ -284,6 +285,12 @@ config TUN If you don't know what to use this for, you don't need it. +config TAP + tristate + ---help--- + This option is selected by any driver implementing tap user space + interface for a virtual interface to re-use core tap functionality. 
+ config TUN_VNET_CROSS_LE bool "Support for cross-endian vnet headers on little-endian kernels" default n diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 19b03a9..7dd86ca 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -21,6 +21,7 @@ obj-$(CONFIG_PHYLIB) += phy/ obj-$(CONFIG_RIONET) += rionet.o obj-$(CONFIG_NET_TEAM) += team/ obj-$(CONFIG_TUN) += tun.o +obj-$(CONFIG_TAP) += tap.o obj-$(CONFIG_VETH) += veth.o obj-$(CONFIG_VIRTIO_NET) += virtio_net.o obj-$(CONFIG_VXLAN) += vxlan.o @@ -29,8 +30,6 @@ obj-$(CONFIG_GTP) += gtp.o obj-$(CONFIG_NLMON) += nlmon.o obj-$(CONFIG_NET_VRF) += vrf.o -macvtap-objs := macvtap_main.o tap.o - # # Networking Drivers # diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap.c similarity index 100% rename from drivers/net/macvtap_main.c rename to drivers/net/macvtap.c diff --git a/drivers/net/tap.c b/drivers/net/tap.c index b7cdc90..a0ed508 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -312,6 +312,7 @@ void tap_del_queues(struct tap_dev *tap) /* guarantee that any future tap_set_queue will fail */ tap->numvtaps = MAX_TAP_QUEUES; } +EXPORT_SYMBOL_GPL(tap_del_queues); rx_handler_result_t tap_handle_frame(struct sk_buff **pskb) { @@ -389,6 +390,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb) kfree_skb(skb); return RX_HANDLER_CONSUMED; } +EXPORT_SYMBOL_GPL(tap_handle_frame); static struct major_info *tap_get_major(int major) { @@ -422,6 +424,7 @@ int tap_get_minor(dev_t major, struct tap_dev *tap) mutex_unlock(_major->minor_lock); return retval < 0 ? 
retval : 0; } +EXPORT_SYMBOL_GPL(tap_get_minor); void tap_free_minor(dev_t major, struct tap_dev *tap) { @@ -438,6 +441,7 @@ void tap_free_minor(dev_t major, struct tap_dev *tap) } mutex_unlock(_major->minor_lock); } +EXPORT_SYMBOL_GPL(tap_free_minor); static struct tap_dev *dev_get_by_tap_file(int major, int minor) { @@ -1193,6 +1197,7 @@ int tap_queue_resize(struct tap_dev *tap) kfree(arrays); return ret; } +EXPORT_SYMBOL_GPL(tap_queue_resize); static int tap_list_add(dev_t major, const char *device_name) { @@ -1240,6 +1245,7 @@ int tap_create_cdev(struct cdev *tap_cdev, out1: return err; } +EXPORT_SYMBOL_GPL(tap_create_cdev); void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev) { @@ -1255,3 +1261,8 @@ void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev) list_del_rcu(_major->next); kfree_rcu(tap_major, rcu); } +EXPORT_SYMBOL_GPL(tap_destroy_cdev); + +MODULE_AUTHOR("Arnd Bergmann "); +MODULE_AUTHOR("Sainath Grandhi "); +MODULE_LICENSE("GPL"); diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 40764ec..cfdecea 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -1,6 +1,6 @@ config VHOST_NET tristate "Host kernel accelerator for virtio net" - depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) + depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) select VHOST ---help--- This kernel module can be loaded in host kernel to accelerate diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h index 362e71c..3482c3c 100644 --- a/include/linux/if_tap.h +++ b/include/linux/if_tap.h @@ -1,7 +1,7 @@ #ifndef _LINUX_IF_TAP_H_ #define _LINUX_IF_TAP_H_ -#if IS_ENABLED(CONFIG_MACVTAP) +#if IS_ENABLED(CONFIG_TAP) struct socket *tap_get_socket(struct file *); #else #include @@ -12,7 +12,7 @@ static inline struct socket
[PATCH net v3 0/2] net: ethernet: bgmac: bug fixes
Changes in v3:
* Reworked the init sequence patch to only remove the device reset if
  the device is actually in reset. Given that this code doesn't bear
  much resemblance to the original code, I'm changing the author of the
  patch. This was tested on NS2 SVK.

Changes in v2:
* Reworked the first patch to make it more obvious what portions of the
  register were being preserved (per Rafal Mileki)
* Style change to reorder the function variables in patch 2 (per Sergei
  Shtylyov)

Bug fixes for bgmac driver

Hari Vyas (1):
  net: ethernet: bgmac: mac address change bug

Jon Mason (1):
  net: ethernet: bgmac: init sequence bug

 drivers/net/ethernet/broadcom/bgmac-platform.c | 27 --
 drivers/net/ethernet/broadcom/bgmac.c          |  6 +-
 drivers/net/ethernet/broadcom/bgmac.h          | 16 +++
 3 files changed, 38 insertions(+), 11 deletions(-)

--
2.7.4
[PATCH] netlink: move nla_put_{u8,u16,u32} out of line
When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather
large stack frames in some functions. This goes unnoticed normally
because CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of
commit 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than
warnings with KASAN=y").

The kernelci.org build bot however has the warning enabled and that led
me to investigate it a little further, as every build produces these
warnings:

net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]

It turns out that there is a relatively simple workaround for the
netlink users that currently use a local variable in order to do the
type conversion: Moving the three functions (for each of the typical
sizes) to lib/nlattr.c avoids using local variables in the caller, which
drastically reduces the stack usage for nl80211 and br_netlink.

It would be good if we could enable the frame size check after that
again, but that should be a separate patch and it requires some more
testing to see what the largest acceptable frame size should be.
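The effect described above can be illustrated with a small user-space sketch. It is not the kernel code: the demo_* names are hypothetical, the buffer format is made up, and `__attribute__((noinline))` is a GCC/Clang extension used here only to keep the helper out of line. The assumption being illustrated is that an inlined helper which passes the address of its by-value argument turns that argument into an address-taken local of the caller, once per call site, whereas an out-of-line helper keeps that slot in its own short-lived frame.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output buffer standing in for a socket buffer. */
struct demo_buf {
	unsigned char data[64];
	size_t len;
};

/* Generic "put": append a one-byte type tag followed by len payload bytes. */
static int demo_put(struct demo_buf *b, int type, size_t len, const void *data)
{
	if (b->len + 1 + len > sizeof(b->data))
		return -1;
	b->data[b->len++] = (unsigned char)type;
	memcpy(b->data + b->len, data, len);
	b->len += len;
	return 0;
}

/* Out-of-line typed helper: the address-taken copy of value lives in this
 * function's frame and is gone again when it returns.
 */
__attribute__((noinline))
int demo_put_u32(struct demo_buf *b, int type, uint32_t value)
{
	return demo_put(b, type, sizeof(value), &value);
}

/* A caller with several attributes: no address-taken locals of its own. */
static int demo_fill(struct demo_buf *b)
{
	if (demo_put_u32(b, 1, 100) ||
	    demo_put_u32(b, 2, 200) ||
	    demo_put_u32(b, 3, 300))
		return -1;
	return 0;
}
```

A function such as demo_fill() with dozens of attribute calls is the shape of code the frame-size warnings point at; with the helper out of line, its frame no longer grows with the number of calls.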
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: kasan-...@googlegroups.com
Signed-off-by: Arnd Bergmann
---
 include/net/netlink.h | 23 +++
 lib/nlattr.c          | 18 ++
 2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index b239fcd33d80..48b117e80509 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -755,10 +755,7 @@ static inline int nla_parse_nested(struct nlattr *tb[], int maxtype,
  * @attrtype: attribute type
  * @value: numeric value
  */
-static inline int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
-{
-	return nla_put(skb, attrtype, sizeof(u8), &value);
-}
+extern int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value);
 
 /**
  * nla_put_u16 - Add a u16 netlink attribute to a socket buffer
@@ -766,10 +763,7 @@ static inline int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
  * @attrtype: attribute type
  * @value: numeric value
  */
-static inline int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value)
-{
-	return nla_put(skb, attrtype, sizeof(u16), &value);
-}
+extern int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value);
 
 /**
  * nla_put_be16 - Add a __be16 netlink attribute to a socket buffer
@@ -779,7 +773,7 @@ static inline int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value)
  */
 static inline int nla_put_be16(struct sk_buff *skb, int attrtype, __be16 value)
 {
-	return nla_put(skb, attrtype, sizeof(__be16), &value);
+	return nla_put_u16(skb, attrtype, (u16 __force)value);
 }
 
 /**
@@ -801,7 +795,7 @@ static inline int nla_put_net16(struct sk_buff *skb, int attrtype, __be16 value)
  */
 static inline int nla_put_le16(struct sk_buff *skb, int attrtype, __le16 value)
 {
-	return nla_put(skb, attrtype, sizeof(__le16), &value);
+	return nla_put_u16(skb, attrtype, (u16 __force)value);
 }
 
 /**
@@ -810,10 +804,7 @@ static inline int nla_put_le16(struct sk_buff *skb, int attrtype, __le16 value)
  * @attrtype: attribute type
  * @value: numeric value
  */
-static inline int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value)
-{
-	return nla_put(skb, attrtype, sizeof(u32), &value);
-}
+int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value);
 
 /**
  * nla_put_be32 - Add a __be32 netlink attribute to a socket buffer
@@ -823,7 +814,7 @@ static inline int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value)
  */
 static inline int nla_put_be32(struct sk_buff *skb, int attrtype, __be32 value)
 {
-	return nla_put(skb, attrtype, sizeof(__be32), &value);
+	return nla_put_u32(skb, attrtype, (u32 __force)value);
 }
 
 /**
@@ -845,7 +836,7 @@ static inline int nla_put_net32(struct sk_buff *skb, int attrtype, __be32 value)
  */
 static inline int nla_put_le32(struct sk_buff *skb, int attrtype, __le32 value)
 {
-	return nla_put(skb, attrtype, sizeof(__le32), &value);
+	return nla_put_u32(skb, attrtype, (u32 __force)value);
 }
 
 /**
diff --git a/lib/nlattr.c b/lib/nlattr.c
index b42b8577fc23..2988b08a7e4d 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -548,6 +548,24 @@ int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
 }
 EXPORT_SYMBOL(nla_put);
 
+int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
+{
+	return nla_put(skb, attrtype, sizeof(u8), &value);
+}
+EXPORT_SYMBOL(nla_put_u8);
+
+int