Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-08 Thread Jiri Pirko
Thu, Feb 09, 2017 at 12:41:20AM CET, t...@herbertland.com wrote:
>This patch creates an infrastructure for registering and running code at
>XDP hooks in drivers. This extends and generalizes the original XDP/BPF
>interface. Specifically, it defines a generic xdp_hook structure and a
>set of hooks that can be assigned to devices or napi instances.  These
>hooks are also generic to allow for XDP/BPF programs as well as non-BPF
>code (e.g. kernel code can be written in a module).
>
>An XDP hook is defined by the xdp_hook structure. A pointer to this
>structure is passed into the XDP register function to set up a hook.
>The XDP register function mallocs its own xdp_hook structure and copies
>the values from the xdp_hook passed in. The register function also saves
>the pointer value of the xdp_hook argument; this pointer is used in
>subsequent calls to XDP to identify the registered hook.
>
>The interface is defined in net/xdp.h. This includes the definition of
>xdp_hook, functions to register and unregister hooks on a device
>or individual instances of napi, and xdp_hook_run that is called by
>drivers to run the hooks.
>
>Signed-off-by: Tom Herbert 
>---
> drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c |   1 +
> include/linux/filter.h                           |  10 +-
> include/linux/netdev_features.h                  |   3 +-
> include/linux/netdevice.h                        |  16 ++
> include/net/xdp.h                                | 310 +++
> include/trace/events/xdp.h                       |  31 +++
> kernel/bpf/core.c                                |   1 +
> net/core/Makefile                                |   2 +-
> net/core/dev.c                                   |  53 ++--
> net/core/filter.c                                |   1 +
> net/core/rtnetlink.c                             |  14 +-
> net/core/xdp.c                                   | 304 ++
> 12 files changed, 711 insertions(+), 35 deletions(-)
> create mode 100644 include/net/xdp.h
> create mode 100644 net/core/xdp.c
>

[...]

>@@ -48,6 +49,36 @@ TRACE_EVENT(xdp_exception,
> __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB))
> );
> 
>+/* Temporaray trace function. This will be renamed to xdp_exception after all

Typo: "Temporaray" should be "Temporary".

>+ * the calling drivers have been patched.
>+ */
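The quoted commit message describes the registration contract: the register function allocates its own xdp_hook, copies the caller's values into it, and keeps the caller's pointer as the handle for later calls, while drivers call xdp_hook_run() to execute the hooks. A minimal userspace sketch of that contract follows. Every sketch_* name, the two-value verdict enum, and the "first non-PASS verdict wins" policy are assumptions for illustration, not the net/xdp.h API from the patch.

```c
#include <stdlib.h>

/* Hypothetical verdicts, loosely modelled on XDP actions. */
enum sketch_verdict { SKETCH_PASS, SKETCH_DROP };

/* Hypothetical hook descriptor filled in by the caller. */
struct sketch_hook {
	enum sketch_verdict (*fn)(void *priv, const void *pkt);
	void *priv;
};

/* Registry entry: a private copy of the hook plus the caller's pointer,
 * which acts as the handle for later calls. */
struct sketch_entry {
	struct sketch_hook copy;
	const struct sketch_hook *key;
	struct sketch_entry *next;
};

struct sketch_dev {
	struct sketch_entry *hooks;
};

/* Allocate our own copy of *hook and remember the caller's pointer. */
int sketch_hook_register(struct sketch_dev *dev, const struct sketch_hook *hook)
{
	struct sketch_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->copy = *hook;
	e->key = hook;
	e->next = dev->hooks;	/* newest hook runs first in this sketch */
	dev->hooks = e;
	return 0;
}

/* Find the entry by the pointer that was passed to register, then free it. */
int sketch_hook_unregister(struct sketch_dev *dev, const struct sketch_hook *hook)
{
	struct sketch_entry **pp;

	for (pp = &dev->hooks; *pp; pp = &(*pp)->next) {
		if ((*pp)->key == hook) {
			struct sketch_entry *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 0;
		}
	}
	return -1;
}

/* Driver-side entry point: run hooks in list order, first non-PASS wins. */
enum sketch_verdict sketch_hook_run(const struct sketch_dev *dev, const void *pkt)
{
	const struct sketch_entry *e;

	for (e = dev->hooks; e; e = e->next) {
		enum sketch_verdict v = e->copy.fn(e->copy.priv, pkt);

		if (v != SKETCH_PASS)
			return v;
	}
	return SKETCH_PASS;
}

/* Example hook: count packets and let them pass. */
enum sketch_verdict sketch_count_hook(void *priv, const void *pkt)
{
	(void)pkt;
	++*(unsigned long *)priv;
	return SKETCH_PASS;
}

/* Example hook: drop packets whose first byte is zero. */
enum sketch_verdict sketch_drop_zero_hook(void *priv, const void *pkt)
{
	(void)priv;
	return *(const unsigned char *)pkt == 0 ? SKETCH_DROP : SKETCH_PASS;
}
```

Because the registry keeps a copy, the caller can change or reuse its own structure after registering without affecting the installed hook; only the structure's address is used to find it again on unregister.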


Re: [PATCHv6 net-next 4/6] sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter

2017-02-08 Thread Xin Long
On Thu, Feb 9, 2017 at 5:50 AM, Marcelo Ricardo Leitner wrote:
> On Wed, Feb 08, 2017 at 07:48:01PM -0200, Marcelo Ricardo Leitner wrote:
>> Hi Xin,
>>
>> On Thu, Feb 09, 2017 at 01:18:18AM +0800, Xin Long wrote:
>> > This patch is to implement Sender-Side Procedures for the SSN/TSN
>> > Reset Request Parameter described in rfc6525 section 5.1.4.
>> >
>> > It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
>> > for users.
>> >
>> > Signed-off-by: Xin Long 
>> ...
>> > +
>> > +int sctp_send_reset_assoc(struct sctp_association *asoc)
>> > +{
>> > +   struct sctp_chunk *chunk = NULL;
>> > +   int retval;
>> > +   __u16 i;
>> > +
>> > +   if (!asoc->peer.reconf_capable ||
>> > +   !(asoc->strreset_enable & SCTP_ENABLE_RESET_ASSOC_REQ))
>> > +   return -ENOPROTOOPT;
>> > +
>> > +   if (asoc->strreset_outstanding)
>> > +   return -EINPROGRESS;
>> > +
>> > +   chunk = sctp_make_strreset_tsnreq(asoc);
>>   ^--- refcnt = 1 (as per sctp_chunkify())
>>
>> > +   if (!chunk)
>> > +   return -ENOMEM;
>> > +
>> > +   /* Block further xmit of data until this request is completed */
>> > +   for (i = 0; i < asoc->stream->outcnt; i++)
>> > +   asoc->stream->out[i].state = SCTP_STREAM_CLOSED;
>> > +
>> > +   asoc->strreset_chunk = chunk;
>> > +   sctp_chunk_hold(asoc->strreset_chunk);
>>  ^--- refcnt = 2
>> > +
>> > +   retval = sctp_send_reconf(asoc, chunk);
>> > +   if (retval) {
>> > +   sctp_chunk_put(asoc->strreset_chunk);
>>   ^--- refcnt = 1
>>
>> Won't we leak the chunk here?
>
> No we won't, sctp_send_reconf() frees it for us, aye.
yups. :)

>
>
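Marcelo's annotations trace the chunk's reference count through sctp_send_reset_assoc(): 1 after creation, 2 after the hold, and the question is whether the error path leaks. A minimal userspace sketch of the ownership rule, assuming (as confirmed in the thread) that the send routine drops one reference itself when it fails. All sketch_* names are hypothetical stand-ins, not the SCTP API, and clearing the saved pointer on failure is an assumption, since the quoted code is cut off before that point.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct sctp_chunk. */
struct sketch_chunk {
	int refcnt;
};

static int sketch_freed;	/* number of chunks actually freed */

struct sketch_chunk *sketch_chunk_new(void)
{
	struct sketch_chunk *c = malloc(sizeof(*c));

	if (c)
		c->refcnt = 1;		/* creator owns one reference */
	return c;
}

void sketch_chunk_hold(struct sketch_chunk *c)
{
	c->refcnt++;
}

void sketch_chunk_put(struct sketch_chunk *c)
{
	if (--c->refcnt == 0) {
		free(c);
		sketch_freed++;
	}
}

/* Assumed contract: on failure the send routine drops one reference
 * itself; on success that reference moves to the output path. */
int sketch_send_reconf(struct sketch_chunk *c, int fail)
{
	if (fail) {
		sketch_chunk_put(c);
		return -1;
	}
	return 0;
}

/* Same sequence as sctp_send_reset_assoc(): make, hold, send, put on error. */
int sketch_reset_assoc(struct sketch_chunk **saved, int fail)
{
	struct sketch_chunk *c = sketch_chunk_new();	/* refcnt = 1 */
	int err;

	if (!c)
		return -1;
	*saved = c;
	sketch_chunk_hold(c);				/* refcnt = 2 */
	err = sketch_send_reconf(c, fail);
	if (err) {					/* refcnt = 1 after failed send */
		sketch_chunk_put(c);			/* refcnt = 0: freed, no leak */
		*saved = NULL;
	}
	return err;
}
```

On the success path the chunk is left with two references, one held by the association and one by the transmit path, which matches the annotated sequence before the error branch.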


[lkp-robot] [rhashtable] 60be2ebf32: INFO:suspicious_RCU_usage

2017-02-08 Thread kernel test robot

FYI, we noticed the following commit:

commit: 60be2ebf326aa90c88e9a967412557d832a1612e ("rhashtable: Add nested tables")
url: https://github.com/0day-ci/linux/commits/Herbert-Xu/rhashtable-Handle-table-allocation-failure-during-insertion/20170207-204835


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


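+-------------------------------------------------+------------+------------+
|                                                 | 12f609851b | 60be2ebf32 |
+-------------------------------------------------+------------+------------+
| boot_successes                                  | 123        | 5          |
| boot_failures                                   | 9          | 8          |
| BUG:kernel_reboot-without-warning_in_test_stage | 9          |            |
| INFO:suspicious_RCU_usage                       | 0          | 8          |
+-------------------------------------------------+------------+------------+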



[  222.678280] [ INFO: suspicious RCU usage. ]
[  222.699410] 4.10.0-rc6-00112-g60be2eb #534 Not tainted
[  222.725264] ---
[  222.741887] lib/rhashtable.c:1125 suspicious rcu_dereference_check() usage!
[  222.783537] 
[  222.783537] other info that might help us debug this:
[  222.783537] 
[  222.823615] 
[  222.823615] rcu_scheduler_active = 2, debug_locks = 1
[  222.856672] 4 locks held by kworker/0:1/19:
[  222.877765]  #0:  ("events"){.+.+.+}, at: [] process_one_work+0x2ac/0x761
[  222.917921]  #1:  ((>run_work)){+.+.+.}, at: [] process_one_work+0x2d6/0x761
[  222.960763]  #2:  (>mutex){+.+.+.}, at: [] rht_deferred_worker+0x22/0x26a
[  223.004378]  #3:  (&(>locks[i])->rlock){+.}, at: [] rhashtable_rehash_table+0xf6/0x640
[  223.055769] 
[  223.055769] stack backtrace:
[  223.078034] CPU: 0 PID: 19 Comm: kworker/0:1 Not tainted 4.10.0-rc6-00112-g60be2eb #534
[  223.118613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[  223.169614] Workqueue: events rht_deferred_worker
[  223.193302] Call Trace:
[  223.205882]  dump_stack+0x19/0x1b
[  223.222639]  lockdep_rcu_suspicious+0xdd/0xf4
[  223.244610]  rht_bucket_nested+0x107/0x10c
[  223.265382]  rhashtable_rehash_table+0x373/0x640
[  223.289295]  rht_deferred_worker+0x112/0x26a
[  223.308573]  process_one_work+0x3b7/0x761
[  223.328696]  ? process_one_work+0x2d6/0x761
[  223.349210]  worker_thread+0x2aa/0x6a5
[  223.368199]  kthread+0x107/0x13a
[  223.384633]  ? process_one_work+0x761/0x761
[  223.405721]  ? __kthread_create_on_node+0x232/0x232
[  223.430350]  ret_from_fork+0x31/0x40
[  223.448309] 
[  223.456052] ===
[  223.477118] [ INFO: suspicious RCU usage. ]
[  223.498249] 4.10.0-rc6-00112-g60be2eb #534 Not tainted
[  223.524590] ---
[  223.545889] lib/rhashtable.c:1130 suspicious rcu_dereference_check() usage!
[  223.588207] 
[  223.588207] other 

Re: [PATCHv6 net-next 3/6] sctp: add support for generating stream reconf ssn/tsn reset request chunk

2017-02-08 Thread Xin Long
On Thu, Feb 9, 2017 at 5:57 AM, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 09, 2017 at 01:18:17AM +0800, Xin Long wrote:
>> This patch is to define SSN/TSN Reset Request Parameter described
>> in rfc6525 section 4.3.
>>
>> It's also to drop some unnecessary __packed in include/linux/sctp.h.
>
> Oops, extra line in the changelog here.
I've moved the "drop __packed" change to patch 1/6, so this line should
have been removed.
Hi David, could you remove this line when applying?

Thanks.

>
>>
>> Signed-off-by: Xin Long 
>> ---
>>  include/linux/sctp.h     |  5 +
>>  include/net/sctp/sm.h    |  2 ++
>>  net/sctp/sm_make_chunk.c | 29 +
>>  3 files changed, 36 insertions(+)
>>
>> diff --git a/include/linux/sctp.h b/include/linux/sctp.h
>> index d74fca3..71c0d41 100644
>> --- a/include/linux/sctp.h
>> +++ b/include/linux/sctp.h
>> @@ -737,4 +737,9 @@ struct sctp_strreset_inreq {
>>   __u16 list_of_streams[0];
>>  };
>>
>> +struct sctp_strreset_tsnreq {
>> + sctp_paramhdr_t param_hdr;
>> + __u32 request_seq;
>> +};
>> +
>>  #endif /* __LINUX_SCTP_H__ */
>> diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
>> index 430ed13..ac37c17 100644
>> --- a/include/net/sctp/sm.h
>> +++ b/include/net/sctp/sm.h
>> @@ -265,6 +265,8 @@ struct sctp_chunk *sctp_make_strreset_req(
>>   const struct sctp_association *asoc,
>>   __u16 stream_num, __u16 *stream_list,
>>   bool out, bool in);
>> +struct sctp_chunk *sctp_make_strreset_tsnreq(
>> + const struct sctp_association *asoc);
>>  void sctp_chunk_assign_tsn(struct sctp_chunk *);
>>  void sctp_chunk_assign_ssn(struct sctp_chunk *);
>>
>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>> index c7d3249..749842a 100644
>> --- a/net/sctp/sm_make_chunk.c
>> +++ b/net/sctp/sm_make_chunk.c
>> @@ -3658,3 +3658,32 @@ struct sctp_chunk *sctp_make_strreset_req(
>>
>>   return retval;
>>  }
>> +
>> +/* RE-CONFIG 4.3 (SSN/TSN RESET ALL)
>> + *   0   1   2   3
>> + *   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + *  | Parameter Type = 15   |  Parameter Length = 8 |
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + *  | Re-configuration Request Sequence Number  |
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + */
>> +struct sctp_chunk *sctp_make_strreset_tsnreq(
>> + const struct sctp_association *asoc)
>> +{
>> + struct sctp_strreset_tsnreq tsnreq;
>> + __u16 length = sizeof(tsnreq);
>> + struct sctp_chunk *retval;
>> +
>> + retval = sctp_make_reconf(asoc, length);
>> + if (!retval)
>> + return NULL;
>> +
>> + tsnreq.param_hdr.type = SCTP_PARAM_RESET_TSN_REQUEST;
>> + tsnreq.param_hdr.length = htons(length);
>> + tsnreq.request_seq = htonl(asoc->strreset_outseq);
>> +
>> + sctp_addto_chunk(retval, sizeof(tsnreq), );
>> +
>> + return retval;
>> +}
>> --
>> 2.1.0
>>
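The parameter built by sctp_make_strreset_tsnreq() is eight bytes: a 4-byte parameter header (type 15, length 8) followed by the 32-bit Re-configuration Request Sequence Number, all in network byte order, as in the RFC 6525 section 4.3 diagram quoted above. A userspace sketch that serializes the same layout with POSIX htons()/htonl(); the sketch_* struct and function names are hypothetical mirrors, not the kernel definitions. One difference: the sketch converts the type with htons() explicitly, whereas the quoted patch assigns SCTP_PARAM_RESET_TSN_REQUEST directly, which suggests the kernel constant is already stored big-endian (an assumption here).

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Parameter type from the RFC 6525 section 4.3 diagram. */
#define SKETCH_PARAM_RESET_TSN_REQUEST 15

/* Userspace mirror of the layout: 4-byte header + 4-byte sequence number. */
struct sketch_paramhdr {
	uint16_t type;
	uint16_t length;
};

struct sketch_strreset_tsnreq {
	struct sketch_paramhdr param_hdr;
	uint32_t request_seq;
};

/* Fill the parameter in network byte order and copy it into buf.
 * Returns the number of bytes written (8 for this layout). */
size_t sketch_build_tsnreq(uint8_t *buf, uint32_t outseq)
{
	struct sketch_strreset_tsnreq tsnreq;
	uint16_t length = sizeof(tsnreq);

	tsnreq.param_hdr.type = htons(SKETCH_PARAM_RESET_TSN_REQUEST);
	tsnreq.param_hdr.length = htons(length);
	tsnreq.request_seq = htonl(outseq);
	memcpy(buf, &tsnreq, sizeof(tsnreq));
	return sizeof(tsnreq);
}
```

For a request sequence of 0x01020304 the wire bytes are 00 0f 00 08 01 02 03 04.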


Re: [net, v3, 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Heiko Schocher

Hello Florian,

On 09.02.2017 at 08:13, Florian Fainelli wrote:
> On 02/08/2017 10:58 PM, Heiko Schocher wrote:
>> Hello Florian,
>>
>> On 09.02.2017 at 01:13, Florian Fainelli wrote:
>>> The Generic PHY driver gets assigned after we checked that the current
>>> PHY driver is NULL, so we need to check a few things before we can
>>> safely dereference d->driver. This would be causing a NULL dereference to
>>> occur when a system binds to the Generic PHY driver. Update
>>> phy_attach_direct() to do the following:
>>>
>>> - grab the driver module reference after we have assigned the Generic
>>>   PHY drivers accordingly
>>>
>>> - update the error path to clean up the module reference in case the
>>>   Generic PHY probe function fails
>>>
>>> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY
>>> driver")
>>> Signed-off-by: Florian Fainelli 
>>> ---
>>>    drivers/net/phy/phy_device.c | 16 +++-
>>>    1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> I just stumbled over this bug on an am335x-based board with a
>> KSZ8081 attached, where a "fixed-link" is used like this:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-baltos-ir3220.dts#n105
>>
>> With your patch it also crashes ...
>
> The final version of the patch is here:
>
> http://patchwork.ozlabs.org/patch/725923/

Huh, sorry ...

> Do you mind giving it a try?

With this patch, Ethernet works fine again on this board, thanks!

bye,
Heiko
--
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


[PATCH 0/4] Whitespace checkpatch fixes

2017-02-08 Thread Tobin C. Harding
This patch set fixes various whitespace checkpatch errors and warnings.

Tobin C. Harding (4):
  net: Fix checkpatch WARNING: please, no space before tabs
  net: Fix checkpatch whitespace errors
  net: Fix checkpatch block comments warnings
  net: Fix checkpatch, Missing a blank line after declarations

 net/core/dev.c | 259 ++---
 1 file changed, 137 insertions(+), 122 deletions(-)

-- 
2.7.4



"David S. Miller"  (maintainer:NETWORKING [GENERAL],commit_signer:72/82=88%)
Eric Dumazet  (commit_signer:23/82=28%,authored:19/82=23%,added_lines:150/1100=14%,removed_lines:104/791=13%)
Alexander Duyck  (commit_signer:14/82=17%,authored:11/82=13%,added_lines:259/1100=24%,removed_lines:66/791=8%)
David Ahern  (commit_signer:9/82=11%,authored:7/82=9%,added_lines:219/1100=20%,removed_lines:230/791=29%)
Jesper Dangaard Brouer  (commit_signer:6/82=7%)
"Tobin C. Harding"  (added_lines:137/1100=12%,removed_lines:122/791=15%)
Jiri Pirko  (added_lines:90/1100=8%)
stephen hemminger  (removed_lines:136/791=17%)
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-ker...@vger.kernel.org (open list)
 


Re: [net, v3, 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Florian Fainelli


On 02/08/2017 10:58 PM, Heiko Schocher wrote:
> Hello Florian,
> 
> On 09.02.2017 at 01:13, Florian Fainelli wrote:
>> The Generic PHY driver gets assigned after we checked that the current
>> PHY driver is NULL, so we need to check a few things before we can
>> safely dereference d->driver. This would be causing a NULL dereference to
>> occur when a system binds to the Generic PHY driver. Update
>> phy_attach_direct() to do the following:
>>
>> - grab the driver module reference after we have assigned the Generic
>>   PHY drivers accordingly
>>
>> - update the error path to clean up the module reference in case the
>>   Generic PHY probe function fails
>>
>> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY
>> driver")
>> Signed-off-by: Florian Fainelli 
>> ---
>>   drivers/net/phy/phy_device.c | 16 +++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> I just stumbled over this bug on an am335x-based board with a
> KSZ8081 attached, where a "fixed-link" is used like this:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-baltos-ir3220.dts#n105
> 
> 
> With your patch it also crashes ...

The final version of the patch is here:

http://patchwork.ozlabs.org/patch/725923/

Do you mind giving it a try?
-- 
Florian
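The ordering problem discussed in this thread is that d->driver can still be NULL when try_module_get(d->driver->owner) runs, because the Generic PHY fallback is assigned later in phy_attach_direct(). A minimal sketch of the fixed ordering, modelled on the using_genphy version of the patch: assign the fallback first, then take the module reference, and unwind with goto labels in reverse order. All sketch_* types and functions are hypothetical stand-ins for struct module, try_module_get() and get_device()/put_device(); clearing d->driver on failure is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct module and struct device_driver. */
struct sketch_module {
	int refs;
	bool unloading;		/* simulate a module that refuses new references */
};

struct sketch_driver {
	struct sketch_module *owner;
	int (*probe)(void);
};

struct sketch_phy {
	struct sketch_driver *driver;	/* NULL until a driver is bound */
	int device_refs;
};

static bool sketch_try_module_get(struct sketch_module *m)
{
	if (m->unloading)
		return false;
	m->refs++;
	return true;
}

static void sketch_module_put(struct sketch_module *m)
{
	m->refs--;
}

static int sketch_probe_ok(void)
{
	return 0;
}

static int sketch_probe_fail(void)
{
	return -2;
}

/* Fall back to the generic driver before dereferencing d->driver, then
 * undo each acquired resource in reverse order on failure. */
int sketch_attach(struct sketch_phy *d, struct sketch_driver *genphy)
{
	bool using_genphy = false;
	int err;

	d->device_refs++;			/* like get_device(d) */

	if (!d->driver) {
		d->driver = genphy;
		using_genphy = true;
	}

	if (!sketch_try_module_get(d->driver->owner)) {
		err = -1;
		goto error_put_device;
	}

	if (using_genphy) {
		err = d->driver->probe();
		if (err)
			goto error_module_put;
	}
	return 0;

error_module_put:
	sketch_module_put(d->driver->owner);
error_put_device:
	if (using_genphy)
		d->driver = NULL;		/* sketch-only: forget the fallback */
	d->device_refs--;			/* like put_device(d) */
	return err;
}
```

Each label releases exactly what was acquired before the jump, so a failed generic probe drops the module reference and the device reference, while a failed module get only drops the device reference.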


[PATCH 4/4] net: Fix checkpatch, Missing a blank line after declarations

2017-02-08 Thread Tobin C. Harding
This patch fixes multiple occurrences of checkpatch WARNING: Missing
a blank line after declarations.

Signed-off-by: Tobin C. Harding 
---
 net/core/dev.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6a076a1..fa63485 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2520,6 +2520,7 @@ u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
 
if (dev->num_tc) {
u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
+
qoffset = dev->tc_to_txq[tc].offset;
qcount = dev->tc_to_txq[tc].count;
}
@@ -2734,9 +2735,11 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_HIGHMEM
int i;
+
if (!(dev->features & NETIF_F_HIGHDMA)) {
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_frag_t *frag = _shinfo(skb)->frags[i];
+
if (PageHighMem(skb_frag_page(frag)))
return 1;
}
@@ -2750,6 +2753,7 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_frag_t *frag = _shinfo(skb)->frags[i];
dma_addr_t addr = page_to_phys(skb_frag_page(frag));
+
if (!pdev->dma_mask || addr + PAGE_SIZE - 1 > *pdev->dma_mask)
return 1;
}
@@ -3227,6 +3231,7 @@ static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
if (queue_index < 0 || skb->ooo_okay ||
queue_index >= dev->real_num_tx_queues) {
int new_index = get_xps_queue(dev, skb);
+
if (new_index < 0)
new_index = skb_tx_hash(dev, skb);
 
@@ -3256,6 +3261,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 
if (dev->real_num_tx_queues != 1) {
const struct net_device_ops *ops = dev->netdev_ops;
+
if (ops->ndo_select_queue)
queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
__netdev_pick_tx);
@@ -3781,6 +3787,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 #endif
{
unsigned int qtail;
+
ret = enqueue_to_backlog(skb, get_cpu(), );
put_cpu();
}
@@ -3840,6 +3847,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 
while (clist) {
struct sk_buff *skb = clist;
+
clist = clist->next;
 
WARN_ON(atomic_read(>users));
@@ -5708,6 +5716,7 @@ static int netdev_adjacent_sysfs_add(struct net_device *dev,
  struct list_head *dev_list)
 {
char linkname[IFNAMSIZ+7];
+
sprintf(linkname, dev_list == >adj_list.upper ?
"upper_%s" : "lower_%s", adj_dev->name);
return sysfs_create_link(&(dev->dev.kobj), &(adj_dev->dev.kobj),
@@ -5718,6 +5727,7 @@ static void netdev_adjacent_sysfs_del(struct net_device *dev,
   struct list_head *dev_list)
 {
char linkname[IFNAMSIZ+7];
+
sprintf(linkname, dev_list == >adj_list.upper ?
"upper_%s" : "lower_%s", name);
sysfs_remove_link(&(dev->dev.kobj), linkname);
@@ -5987,6 +5997,7 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 struct net_device *upper_dev)
 {
struct netdev_notifier_changeupper_info changeupper_info;
+
ASSERT_RTNL();
 
changeupper_info.upper_dev = upper_dev;
@@ -6748,6 +6759,7 @@ EXPORT_SYMBOL(dev_change_xdp_fd);
 static int dev_new_index(struct net *net)
 {
int ifindex = net->ifindex;
+
for (;;) {
if (++ifindex <= 0)
ifindex = 1;
@@ -7168,6 +7180,7 @@ void netif_tx_stop_all_queues(struct net_device *dev)
 
for (i = 0; i < dev->num_tx_queues; i++) {
struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
+
netif_tx_stop_queue(txq);
}
 }
-- 
2.7.4



[PATCH 1/4] net: Fix checkpatch WARNING: please, no space before tabs

2017-02-08 Thread Tobin C. Harding
This patch fixes multiple occurrences of the "please, no space before
tabs" warning. Some lines were changed beyond what checkpatch strictly
required, to keep the kernel-doc comments uniform.

Signed-off-by: Tobin C. Harding 
---
 net/core/dev.c | 142 -
 1 file changed, 71 insertions(+), 71 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 29101c9..2753690 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1,5 +1,5 @@
 /*
- * NET3Protocol independent device support routines.
+ *  NET3Protocol independent device support routines.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -7,7 +7,7 @@
  * 2 of the License, or (at your option) any later version.
  *
  * Derived from the non IP parts of dev.c 1.0.19
- * Authors:Ross Biro
+ *  Authors:   Ross Biro
  * Fred N. van Kempen, 
  * Mark Evans, 
  *
@@ -21,9 +21,9 @@
  *
  * Changes:
  *  D.J. Barrow :   Fixed bug where dev->refcnt gets set
- * to 2 if register_netdev gets called
- * before net_dev_init & also removed a
- * few lines of code in the process.
+ *  to 2 if register_netdev gets called
+ *  before net_dev_init & also removed a
+ *  few lines of code in the process.
  * Alan Cox:   device private ioctl copies fields back.
  * Alan Cox:   Transmit queue code does relevant
  * stunts to keep the queue safe.
@@ -36,7 +36,7 @@
  * Alan Cox:   100 backlog just doesn't cut it when
  * you start doing multicast video 8)
  * Alan Cox:   Rewrote net_bh and list manager.
- * Alan Cox:   Fix ETH_P_ALL echoback lengths.
+ *  Alan Cox:   Fix ETH_P_ALL echoback lengths.
  * Alan Cox:   Took out transmit every packet pass
  * Saved a few bytes in the ioctl handler
  * Alan Cox:   Network driver sets packet type before
@@ -46,7 +46,7 @@
  * Richard Kooijman:   Timestamp fixes.
  * Alan Cox:   Wrong field in SIOCGIFDSTADDR
  * Alan Cox:   Device lock protection.
- * Alan Cox:   Fixed nasty side effect of device close
+ *  Alan Cox:   Fixed nasty side effect of device close
  * changes.
  * Rudi Cilibrasi  :   Pass the right thing to
  * set_mac_address()
@@ -67,8 +67,8 @@
  * Paul Rusty Russell  :   SIOCSIFNAME
  *  Pekka Riikonen  :  Netdev boot-time settings code
  *  Andrew Morton   :   Make unregister_netdevice wait
- * indefinitely on dev->refcnt
- * J Hadi Salim:   - Backlog queue sampling
+ *  indefinitely on dev->refcnt
+ *  J Hadi Salim:   - Backlog queue sampling
  * - netif_rx() feedback
  */
 
@@ -574,13 +574,13 @@ static int netdev_boot_setup_add(char *name, struct ifmap *map)
 }
 
 /**
- * netdev_boot_setup_check - check boot time settings
- * @dev: the netdevice
+ * netdev_boot_setup_check - check boot time settings
+ * @dev: the netdevice
  *
- * Check boot time settings for the device.
- * The found settings are set for the device to be used
- * later in the device probing.
- * Returns 0 if no settings found, 1 if they are.
+ * Check boot time settings for the device.
+ * The found settings are set for the device to be used
+ * later in the device probing.
+ * Returns 0 if no settings found, 1 if they are.
  */
 int netdev_boot_setup_check(struct net_device *dev)
 {
@@ -590,10 +590,10 @@ int netdev_boot_setup_check(struct net_device *dev)
for (i = 0; i < NETDEV_BOOT_SETUP_MAX; i++) {
if (s[i].name[0] != '\0' && s[i].name[0] != ' ' &&
!strcmp(dev->name, s[i].name)) {
-   dev->irq= s[i].map.irq;
-   dev->base_addr  = s[i].map.base_addr;
-   dev->mem_start  = s[i].map.mem_start;
-   dev->mem_end= s[i].map.mem_end;
+   dev->irq = s[i].map.irq;
+   dev->base_addr = s[i].map.base_addr;
+   

[PATCH 2/4] net: Fix checkpatch whitespace errors

2017-02-08 Thread Tobin C. Harding
This patch fixes two trivial whitespace errors: an opening brace should
be on the previous line, and a trailing statement should be on the next
line.

Signed-off-by: Tobin C. Harding 
---
 net/core/dev.c | 39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 2753690..471e41a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -192,7 +192,8 @@ static seqcount_t devnet_rename_seq;
 
 static inline void dev_base_seq_inc(struct net *net)
 {
-   while (++net->dev_base_seq == 0);
+   while (++net->dev_base_seq == 0)
+   ;
 }
 
 static inline struct hlist_head *dev_name_hash(struct net *net, const char *name)
@@ -274,8 +275,8 @@ EXPORT_PER_CPU_SYMBOL(softnet_data);
  * register_netdevice() inits txq->_xmit_lock and sets lockdep class
  * according to dev->type
  */
-static const unsigned short netdev_lock_type[] =
-   {ARPHRD_NETROM, ARPHRD_ETHER, ARPHRD_EETHER, ARPHRD_AX25,
+static const unsigned short netdev_lock_type[] = {
+ARPHRD_NETROM, ARPHRD_ETHER, ARPHRD_EETHER, ARPHRD_AX25,
 ARPHRD_PRONET, ARPHRD_CHAOS, ARPHRD_IEEE802, ARPHRD_ARCNET,
 ARPHRD_APPLETLK, ARPHRD_DLCI, ARPHRD_ATM, ARPHRD_METRICOM,
 ARPHRD_IEEE1394, ARPHRD_EUI64, ARPHRD_INFINIBAND, ARPHRD_SLIP,
@@ -291,22 +292,22 @@ static const unsigned short netdev_lock_type[] =
 ARPHRD_IEEE80211_RADIOTAP, ARPHRD_PHONET, ARPHRD_PHONET_PIPE,
 ARPHRD_IEEE802154, ARPHRD_VOID, ARPHRD_NONE};
 
-static const char *const netdev_lock_name[] =
-   {"_xmit_NETROM", "_xmit_ETHER", "_xmit_EETHER", "_xmit_AX25",
-"_xmit_PRONET", "_xmit_CHAOS", "_xmit_IEEE802", "_xmit_ARCNET",
-"_xmit_APPLETLK", "_xmit_DLCI", "_xmit_ATM", "_xmit_METRICOM",
-"_xmit_IEEE1394", "_xmit_EUI64", "_xmit_INFINIBAND", "_xmit_SLIP",
-"_xmit_CSLIP", "_xmit_SLIP6", "_xmit_CSLIP6", "_xmit_RSRVD",
-"_xmit_ADAPT", "_xmit_ROSE", "_xmit_X25", "_xmit_HWX25",
-"_xmit_PPP", "_xmit_CISCO", "_xmit_LAPB", "_xmit_DDCMP",
-"_xmit_RAWHDLC", "_xmit_TUNNEL", "_xmit_TUNNEL6", "_xmit_FRAD",
-"_xmit_SKIP", "_xmit_LOOPBACK", "_xmit_LOCALTLK", "_xmit_FDDI",
-"_xmit_BIF", "_xmit_SIT", "_xmit_IPDDP", "_xmit_IPGRE",
-"_xmit_PIMREG", "_xmit_HIPPI", "_xmit_ASH", "_xmit_ECONET",
-"_xmit_IRDA", "_xmit_FCPP", "_xmit_FCAL", "_xmit_FCPL",
-"_xmit_FCFABRIC", "_xmit_IEEE80211", "_xmit_IEEE80211_PRISM",
-"_xmit_IEEE80211_RADIOTAP", "_xmit_PHONET", "_xmit_PHONET_PIPE",
-"_xmit_IEEE802154", "_xmit_VOID", "_xmit_NONE"};
+static const char *const netdev_lock_name[] = {
+   "_xmit_NETROM", "_xmit_ETHER", "_xmit_EETHER", "_xmit_AX25",
+   "_xmit_PRONET", "_xmit_CHAOS", "_xmit_IEEE802", "_xmit_ARCNET",
+   "_xmit_APPLETLK", "_xmit_DLCI", "_xmit_ATM", "_xmit_METRICOM",
+   "_xmit_IEEE1394", "_xmit_EUI64", "_xmit_INFINIBAND", "_xmit_SLIP",
+   "_xmit_CSLIP", "_xmit_SLIP6", "_xmit_CSLIP6", "_xmit_RSRVD",
+   "_xmit_ADAPT", "_xmit_ROSE", "_xmit_X25", "_xmit_HWX25",
+   "_xmit_PPP", "_xmit_CISCO", "_xmit_LAPB", "_xmit_DDCMP",
+   "_xmit_RAWHDLC", "_xmit_TUNNEL", "_xmit_TUNNEL6", "_xmit_FRAD",
+   "_xmit_SKIP", "_xmit_LOOPBACK", "_xmit_LOCALTLK", "_xmit_FDDI",
+   "_xmit_BIF", "_xmit_SIT", "_xmit_IPDDP", "_xmit_IPGRE",
+   "_xmit_PIMREG", "_xmit_HIPPI", "_xmit_ASH", "_xmit_ECONET",
+   "_xmit_IRDA", "_xmit_FCPP", "_xmit_FCAL", "_xmit_FCPL",
+   "_xmit_FCFABRIC", "_xmit_IEEE80211", "_xmit_IEEE80211_PRISM",
+   "_xmit_IEEE80211_RADIOTAP", "_xmit_PHONET", "_xmit_PHONET_PIPE",
+   "_xmit_IEEE802154", "_xmit_VOID", "_xmit_NONE"};
 
 static struct lock_class_key netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
 static struct lock_class_key netdev_addr_lock_key[ARRAY_SIZE(netdev_lock_type)];
-- 
2.7.4
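Patch 2/4 only moves the empty statement of dev_base_seq_inc() onto its own line; behaviour is unchanged. The loop increments the counter until it is non-zero, so a wrap-around skips 0. A standalone sketch of the same loop follows. struct sketch_net is a hypothetical stand-in for struct net, and the reading that 0 is reserved to mean "no sequence recorded yet" is an assumption, not something the patch states.

```c
#include <limits.h>

/* Hypothetical stand-in for struct net: only the counter matters here. */
struct sketch_net {
	unsigned int dev_base_seq;
};

/* Same loop as dev_base_seq_inc(): increment until the value is non-zero,
 * so wrapping past UINT_MAX yields 1, never 0. The empty loop body sits
 * on its own line, which is the layout checkpatch asks for. */
static inline void sketch_dev_base_seq_inc(struct sketch_net *net)
{
	while (++net->dev_base_seq == 0)
		;
}
```

Starting from UINT_MAX, one call leaves the counter at 1; starting from 41, it leaves it at 42.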



[PATCH 3/4] net: Fix checkpatch block comments warnings

2017-02-08 Thread Tobin C. Harding
Fix multiple occurrences of the checkpatch warning "WARNING: Block
comments use * on subsequent lines". Also make the comment blocks
more uniform.

Signed-off-by: Tobin C. Harding 
---
 net/core/dev.c | 65 +-
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 471e41a..6a076a1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -353,10 +353,11 @@ static inline void netdev_set_addr_lockdep_class(struct net_device *dev)
 #endif
 
 
/***
+ *
+ * Protocol management and registration routines
+ *
+ ***/
 
-   Protocol management and registration routines
-
-***/
 
 /*
  * Add a protocol ID to the list. Now that the input handler is
@@ -539,10 +540,10 @@ void dev_remove_offload(struct packet_offload *po)
 EXPORT_SYMBOL(dev_remove_offload);
 
 /**
-
- Device Boot-time Settings Routines
-
-***/
+ *
+ *   Device Boot-time Settings Routines
+ *
+ **/
 
 /* Boot time configuration table */
 static struct netdev_boot_setup dev_boot_setup[NETDEV_BOOT_SETUP_MAX];
@@ -664,10 +665,10 @@ int __init netdev_boot_setup(char *str)
 __setup("netdev=", netdev_boot_setup);
 
 
/***
-
-   Device Interface Subroutines
-
-***/
+ *
+ * Device Interface Subroutines
+ *
+ ***/
 
 /**
  * dev_get_iflink  - get 'iflink' value of a interface
@@ -3343,16 +3344,16 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
}
 
/* The device has no queue. Common case for software devices:
-  loopback, all the sorts of tunnels...
+* loopback, all the sorts of tunnels...
 
-  Really, it is unlikely that netif_tx_lock protection is necessary
-  here.  (f.e. loopback and IP tunnels are clean ignoring statistics
-  counters.)
-  However, it is possible, that they rely on protection
-  made by us here.
+* Really, it is unlikely that netif_tx_lock protection is necessary
+* here.  (f.e. loopback and IP tunnels are clean ignoring statistics
+* counters.)
+* However, it is possible, that they rely on protection
+* made by us here.
 
-  Check this and shot the lock. It is not prone from deadlocks.
-  Either shot noqueue qdisc, it is even simpler 8)
+* Check this and shot the lock. It is not prone from deadlocks.
+*Either shot noqueue qdisc, it is even simpler 8)
 */
if (dev->flags & IFF_UP) {
int cpu = smp_processor_id(); /* ok because BHs are off */
@@ -3414,9 +3415,9 @@ int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv)
 EXPORT_SYMBOL(dev_queue_xmit_accel);
 
 
-/*===
-   Receiver routines
-  ===*/
+/*
+ * Receiver routines
+ */
 
 int netdev_max_backlog __read_mostly = 1000;
 EXPORT_SYMBOL(netdev_max_backlog);
@@ -6448,8 +6449,8 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
}
 
/* NOTE: order of synchronization of IFF_PROMISC and IFF_ALLMULTI
-  is important. Some (broken) drivers set IFF_PROMISC, when
-  IFF_ALLMULTI is requested not asking us and not reporting.
+* is important. Some (broken) drivers set IFF_PROMISC, when
+* IFF_ALLMULTI is requested not asking us and not reporting.
 */
if ((flags ^ dev->gflags) & IFF_ALLMULTI) {
int inc = (flags & IFF_ALLMULTI) ? 1 : -1;
@@ -6813,8 +6814,8 @@ static void rollback_registered_many(struct list_head *head)
 
 
/* Notify protocols, that we are about to destroy
-  this device. They should clean all the things.
-   */
+* this device. They should clean all the things.
+*/
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
 
if (!dev->rtnl_link_ops ||
@@ -7951,12 +7952,12 @@ int 

Re: [net, v3, 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Heiko Schocher

Hello Florian,

On 09.02.2017 at 01:13, Florian Fainelli wrote:
> The Generic PHY driver gets assigned after we checked that the current
> PHY driver is NULL, so we need to check a few things before we can
> safely dereference d->driver. This would be causing a NULL dereference to
> occur when a system binds to the Generic PHY driver. Update
> phy_attach_direct() to do the following:
>
> - grab the driver module reference after we have assigned the Generic
>    PHY drivers accordingly
>
> - update the error path to clean up the module reference in case the
>    Generic PHY probe function fails
>
> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
> Signed-off-by: Florian Fainelli 
> ---
>   drivers/net/phy/phy_device.c | 16 +++-
>   1 file changed, 15 insertions(+), 1 deletion(-)

I just stumbled over this bug on an am335x-based board with a
KSZ8081 attached, where a "fixed-link" is used like this:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-baltos-ir3220.dts#n105

With your patch it also crashes ...

If I remove this part:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d63d190..9dd08a4 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -921,11 +921,6 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
return -EIO;
}

-   if (!try_module_get(d->driver->owner)) {
-   dev_err(>dev, "failed to get the device driver module\n");
-   return -EIO;
-   }
-
get_device(d);

/* Assume that if there is no driver, that it doesn't

it boots again. I think you simply forgot to remove this?

bye,
Heiko


diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0d8f4d3847f6..d63d190a95ef 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
struct module *ndev_owner = dev->dev.parent->driver->owner;
struct mii_bus *bus = phydev->mdio.bus;
struct device *d = >mdio.dev;
+   bool using_genphy = false;
int err;

/* For Ethernet device drivers that register their own MDIO bus, we
@@ -938,12 +939,22 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
d->driver =
_driver[GENPHY_DRV_1G].mdiodrv.driver;

+   using_genphy = true;
+   }
+
+   if (!try_module_get(d->driver->owner)) {
+   dev_err(>dev, "failed to get the device driver module\n");
+   err = -EIO;
+   goto error_put_device;
+   }
+
+   if (using_genphy) {
err = d->driver->probe(d);
if (err >= 0)
err = device_bind_driver(d);

if (err)
-   goto error;
+   goto error_module_put;
}

if (phydev->attached_dev) {
@@ -981,6 +992,9 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,

  error:
phy_detach(phydev);
+error_module_put:
+   module_put(d->driver->owner);
+error_put_device:
put_device(d);
module_put(d->driver->owner);
if (ndev_owner != bus->owner)



--
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Crashes in -next due to 'net: phy: Fix lack of reference count on PHY driver'

2017-02-08 Thread Guenter Roeck
Hi,

I see a number of my qemu tests in -next crash. Affected are tests
of nios2, xtensa, and arm64.

The arm64 crash log looks as follows.

[0.734220] Hardware name: ZynqMP EP108 (DT)
[0.734298] task: 80007cb5 task.stack: 80007cb4c000
[0.734533] PC is at phy_attach_direct+0x54/0x1b8
[0.734592] LR is at phy_connect_direct+0x1c/0x70
[0.734643] pc : [] lr : [] pstate: 6045
...
[0.740044] [] phy_attach_direct+0x54/0x1b8
[0.740118] [] phy_connect_direct+0x1c/0x70
[0.740191] [] macb_probe+0x5a8/0x978
[0.740378] [] platform_drv_probe+0x50/0xb8
[0.740449] [] driver_probe_device+0x224/0x2c8
[0.740519] [] __driver_attach+0xac/0xb0
[0.740587] [] bus_for_each_dev+0x60/0xa0
[0.740653] [] driver_attach+0x20/0x28
[0.740716] [] bus_add_driver+0x1d0/0x238
[0.740782] [] driver_register+0x60/0xf8
[0.740849] [] __platform_driver_register+0x40/0x48
[0.740924] [] macb_driver_init+0x18/0x20
[0.740994] [] do_one_initcall+0x38/0x120
[0.741062] [] kernel_init_freeable+0x19c/0x23c
[0.741134] [] kernel_init+0x10/0x100
[0.741199] [] ret_from_fork+0x10/0x50

Detailed logs are available at http://kerneltests.org/builders, in the 'next'
column. The scripts used to run the tests are available in the architecture
subdirectories of https://github.com/groeck/linux-build-test/tree/master/rootfs.

I bisected the nios2 failure; it points to commit cafe8df8b9 ("net: phy: Fix
lack of reference count on PHY driver"). Bisect log is attached below.
Reverting this patch fixes the problem for all affected architectures
in my tests.

Guenter

---
# bad: [e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993] Add linux-next specific files for 20170208
# good: [d5adbfcd5f7bcc6fa58a41c5c5ada0e5c826ce2c] Linux 4.10-rc7
git bisect start 'HEAD' 'v4.10-rc7'
# bad: [403e468309f9e2b2dbe264be1ad29b1486ed720e] Merge remote-tracking branch 'crypto/master'
git bisect bad 403e468309f9e2b2dbe264be1ad29b1486ed720e
# bad: [8c3f07a3ae77164de4405fa97baca4f103f5] Merge remote-tracking branch 'hid/for-next'
git bisect bad 8c3f07a3ae77164de4405fa97baca4f103f5
# bad: [dd4318312c6fc5c00ae7619f875fb73538a2c1e6] Merge remote-tracking branch 'omap/for-next'
git bisect bad dd4318312c6fc5c00ae7619f875fb73538a2c1e6
# good: [52b61c8b3eefd40f8a70131cdc0c7348f18f463c] Merge branch 'next/soc' into for-next
git bisect good 52b61c8b3eefd40f8a70131cdc0c7348f18f463c
# good: [cbd3dcf3b865b961a9c10ff16e42908832cee63f] Merge branch 'next/dt' into for-next
git bisect good cbd3dcf3b865b961a9c10ff16e42908832cee63f
# bad: [66842bac82cae0e9378eea1c54ab9751e32a929b] Merge remote-tracking branch 'arm/for-next'
git bisect bad 66842bac82cae0e9378eea1c54ab9751e32a929b
# bad: [926af6273fc683cd98cd0ce7bf0d04a02eed6742] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 926af6273fc683cd98cd0ce7bf0d04a02eed6742
# bad: [2dcab598484185dea7ec22219c76dcdd59e3cb90] sctp: avoid BUG_ON on sctp_wait_for_sndbuf
git bisect bad 2dcab598484185dea7ec22219c76dcdd59e3cb90
# bad: [89389b4d5524350e74974cf711fe4a18206c09d3] Merge tag 'mac80211-for-davem-2017-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
git bisect bad 89389b4d5524350e74974cf711fe4a18206c09d3
# bad: [34b2cef20f19c87999fff3da4071e66937db9644] ipv4: keep skb->dst around in presence of IP options
git bisect bad 34b2cef20f19c87999fff3da4071e66937db9644
# bad: [cafe8df8b9bc9aa3dffa827c1a6757c6cd36f657] net: phy: Fix lack of reference count on PHY driver
git bisect bad cafe8df8b9bc9aa3dffa827c1a6757c6cd36f657
# good: [770f82253dbd7e6892a88018f2f6cd395f48d214] mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G'
git bisect good 770f82253dbd7e6892a88018f2f6cd395f48d214
# good: [2372bcda5e681bc85d57a3604265155e1a4c040b] Merge branch 'mlx4-queue-reinit'
git bisect good 2372bcda5e681bc85d57a3604265155e1a4c040b
# first bad commit: [cafe8df8b9bc9aa3dffa827c1a6757c6cd36f657] net: phy: Fix lack of reference count on PHY driver


Re: Extending socket timestamping API for NTP

2017-02-08 Thread Denny Page
[Resend as plain text]


> On Feb 07, 2017, at 06:01, Miroslav Lichvar  wrote:
> 
> 5) new SO_TIMESTAMPING options to get transposed RX timestamps
> 
>   PTP uses preamble RX timestamps, but NTP works with trailer RX
>   timestamps. This means NTP implementations currently need to
>   transpose HW RX timestamps. The calculation requires the link speed
>   and the length of the packet at layer 2. It seems this can be
>   reliably done only using raw sockets. It would be very nice if the
>   kernel could transpose the timestamps automatically.
> 
>   The existing SOF_TIMESTAMPING_RX_HARDWARE flag could be aliased to
>   SOF_TIMESTAMPING_RX_HARDWARE_PREAMBLE and the new flag could be
>   SOF_TIMESTAMPING_RX_HARDWARE_TRAILER.
> 
>   PTP has a similar problem with SW RX timestamps, which are closer
>   to the trailer timestamps rather than preamble timestamps. A new
>   SOF_TIMESTAMPING_RX_SOFTWARE_PREAMBLE flag could be added for PTP
>   implementations to get transposed timestamps in order to improve
>   accuracy.
> 
> 6) new SO_TIMESTAMPING option to get PHC index with HW timestamps
> 
>   With bridges, bonding and other things it's difficult to determine
>   which PHC timestamped the packet. It would be very useful if the
>   PHC index was provided with each HW timestamp.
> 
>   I'm not sure what would be the best place to put it. I guess the
>   second timespec in scm_timestamping could be reused for this, but
>   that sounds like a gross hack. Do we need to define a new struct?


Miroslav, if #5 were implemented, would #6 still be needed?

Denny
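For reference, the transposition described in item 5 amounts to adding the frame's transmission time to the preamble timestamp. A minimal sketch, assuming the preamble timestamp marks the start of the MAC header (right after the start-of-frame delimiter) and that the layer-2 length counts bytes from there through the FCS; `model_rx_preamble_to_trailer()` is a hypothetical name, not an existing kernel or NTP API:

```c
#include <stdint.h>

/* Hypothetical helper, not an existing kernel or NTP API.
 * Assumptions: preamble_ns is taken at the start of the MAC header
 * (right after the start-of-frame delimiter), l2_len counts bytes from
 * there through the FCS, and speed_mbps is the negotiated link speed. */
static int64_t model_rx_preamble_to_trailer(int64_t preamble_ns,
					    uint32_t l2_len,
					    uint32_t speed_mbps)
{
	uint64_t wire_ns;

	if (speed_mbps == 0)
		return preamble_ns;	/* unknown speed: leave untouched */

	/* bits / (Mbit/s) gives microseconds; scale by 1000 for ns. */
	wire_ns = (uint64_t)l2_len * 8u * 1000u / speed_mbps;
	return preamble_ns + (int64_t)wire_ns;
}
```

At 1 Gb/s one bit takes 1 ns, so a 100-byte frame shifts the timestamp by 800 ns; the exact byte range to count depends on where the NIC actually latches the timestamp.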




[PATCH 1/1] ixgbe: add the external ixgbe fiber transceiver status

2017-02-08 Thread Zhu Yanjun
When the ixgbe fiber transceiver is external, it is necessary to get
the present/absent status of this external ixgbe fiber transceiver.

The steps to get the present/absent status, where
enp1s0f0 is an external ixgbe fiber NIC:

ethtool enp1s0f0

...
Port: FIBRE
PHYAD: 0
Transceiver: external(present) <---The transceiver is present.
Auto-negotiation: on
Supports Wake-on: d
...

Or
...
Port: FIBRE
PHYAD: 0
Transceiver: external(absent) <---The transceiver is absent
Auto-negotiation: on
Supports Wake-on: d
...

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 15 +++
 include/uapi/linux/ethtool.h |  4 
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index fd192bf..b3f86f4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -313,6 +313,21 @@ static int ixgbe_get_settings(struct net_device *netdev,
break;
}
 
+   /* When the transceiver is external, the following is meaningful.
+* ecmd->reserved[0] has 3 values:
+* 0x0: transceiver absent
+* 0x4: transceiver present
+* others: not supported
+*/
+   if (ecmd->port == PORT_FIBRE) {
+   u32 status = IXGBE_READ_REG(hw, IXGBE_ESDP) & IXGBE_ESDP_SDP2;
+
+   if (status == 0x4)
+   ecmd->transceiver = XCVR_EXTERNAL_PRESENT;
+   if (status == 0x0)
+   ecmd->transceiver = XCVR_EXTERNAL_ABSENT;
+   }
+
/* Indicate pause support */
ecmd->supported |= SUPPORTED_Pause;
 
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 3dc91a4..8e8225a 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1541,6 +1541,10 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #define XCVR_DUMMY20x03
 #define XCVR_DUMMY30x04
 
+/* The fiber transceiver status */
+#define XCVR_EXTERNAL_ABSENT   0x05
+#define XCVR_EXTERNAL_PRESENT  0x06
+
 /* Enable or disable autonegotiation. */
 #define AUTONEG_DISABLE0x00
 #define AUTONEG_ENABLE 0x01
-- 
2.7.4
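The register check added to ixgbe_get_settings() reduces to a single-bit test. A minimal sketch, assuming IXGBE_ESDP_SDP2 is bit 2 (0x4), which is consistent with the 0x0/0x4 cases in the patch comment; `MODEL_ESDP_SDP2` and `model_xcvr_from_esdp()` are hypothetical names:

```c
#include <stdint.h>

/* Assumed SDP2 bit value, matching the 0x0/0x4 cases in the patch. */
#define MODEL_ESDP_SDP2        0x4u

/* Transceiver codes as proposed in the uapi hunk of this patch. */
#define XCVR_EXTERNAL_ABSENT   0x05
#define XCVR_EXTERNAL_PRESENT  0x06

/* Hypothetical helper: map a raw ESDP register value to the ethtool
 * transceiver code the patch would report for a fibre port. */
static uint8_t model_xcvr_from_esdp(uint32_t esdp)
{
	return (esdp & MODEL_ESDP_SDP2) ? XCVR_EXTERNAL_PRESENT
					: XCVR_EXTERNAL_ABSENT;
}
```

Because the mask keeps a single bit, the masked value can only be 0x0 or 0x4, so the "others: not support" case in the patch comment cannot arise from this register read.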



[PATCH 1/1] ethtool: add the external transceiver status of the ixgbe fiber

2017-02-08 Thread Zhu Yanjun
When the fiber transceiver of the ixgbe NIC is external, it is sometimes
necessary to get the present/absent status of that transceiver.

The steps to get the present/absent status, where
the NIC enp1s0f0 is an external ixgbe fiber NIC:

ethtool enp1s0f0

...
Port: FIBRE
PHYAD: 0
Transceiver: external(present) <---The transceiver is present.
Auto-negotiation: on
Supports Wake-on: d
...

Or
...
Port: FIBRE
PHYAD: 0
Transceiver: external(absent) <---The transceiver is absent
Auto-negotiation: on
Supports Wake-on: d
...

Signed-off-by: Zhu Yanjun 
---
 ethtool-copy.h | 2 ++
 ethtool.c  | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 3d299e3..1c6db9a 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -1536,6 +1536,8 @@ static __inline__ int ethtool_validate_duplex(__u8 duplex)
 #define XCVR_DUMMY10x02
 #define XCVR_DUMMY20x03
 #define XCVR_DUMMY30x04
+#define XCVR_EXTERNAL_ABSENT   0x05
+#define XCVR_EXTERNAL_PRESENT  0x06
 
 /* Enable or disable autonegotiation. */
 #define AUTONEG_DISABLE0x00
diff --git a/ethtool.c b/ethtool.c
index 7af039e..85cf5a2 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -811,6 +811,12 @@ dump_link_usettings(const struct ethtool_link_usettings *link_usettings)
case XCVR_EXTERNAL:
fprintf(stdout, "external\n");
break;
+   case XCVR_EXTERNAL_PRESENT:
+   fprintf(stdout, "external(present)\n");
+   break;
+   case XCVR_EXTERNAL_ABSENT:
+   fprintf(stdout, "external(absent)\n");
+   break;
default:
fprintf(stdout, "Unknown!\n");
break;
-- 
2.7.4
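The new cases extend the transceiver switch in dump_link_usettings(). As a sketch of the resulting mapping, using a hypothetical `model_xcvr_label()` that returns the label instead of calling fprintf() (XCVR_INTERNAL/XCVR_EXTERNAL are the existing uapi values; 0x05/0x06 are the values proposed here):

```c
#include <stdint.h>

/* 0x00/0x01 from the existing uapi header, 0x05/0x06 as proposed. */
#define XCVR_INTERNAL          0x00
#define XCVR_EXTERNAL          0x01
#define XCVR_EXTERNAL_ABSENT   0x05
#define XCVR_EXTERNAL_PRESENT  0x06

/* Hypothetical helper mirroring the switch in dump_link_usettings(),
 * but returning the label rather than printing it. */
static const char *model_xcvr_label(uint8_t xcvr)
{
	switch (xcvr) {
	case XCVR_INTERNAL:
		return "internal";
	case XCVR_EXTERNAL:
		return "external";
	case XCVR_EXTERNAL_PRESENT:
		return "external(present)";
	case XCVR_EXTERNAL_ABSENT:
		return "external(absent)";
	default:
		return "Unknown!";
	}
}
```

Any value this copy of ethtool-copy.h does not know falls through to "Unknown!", which is also what an ethtool binary without this patch would print for 0x05 or 0x06.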



[PATCHv2 iproute2 net-next] man: ip-link.8: Document bridge_slave fdb_flush option

2017-02-08 Thread Hangbin Liu
Signed-off-by: Hangbin Liu 
---
 man/man8/ip-link.8.in | 5 +
 1 file changed, 5 insertions(+)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 469bb43..651a255 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1429,6 +1429,8 @@ the following additional arguments are supported:
 
 .B "ip link set type bridge_slave"
 [
+.B fdb_flush
+] [
 .BI state " STATE"
 ] [
 .BI priority " PRIO"
@@ -1459,6 +1461,9 @@ the following additional arguments are supported:
 
 .in +8
 .sp
+.B fdb_flush
+- flush bridge slave's fdb dynamic entries.
+
 .BI state " STATE"
 - Set port state.
 .I STATE
-- 
2.5.5



[PATCH net] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Florian Fainelli
The Generic PHY driver gets assigned after we check that the current
PHY driver is NULL, so we need to check a few things before we can
safely dereference d->driver. This would cause a NULL dereference to
occur when a system binds to the Generic PHY driver. Update
phy_attach_direct() to do the following:

- grab the driver module reference after we have assigned the Generic
  PHY drivers accordingly, and remember we came from the generic PHY
  path

- update the error path to clean up the module reference in case the
  Generic PHY probe function fails

- split the error path involving phy_detach() to avoid double free/put
  since phy_detach() does all the clean up

- finally, have phy_detach() drop the module reference count before we
  call device_release_driver() for the Generic PHY driver case

Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
Signed-off-by: Florian Fainelli 
---
David,

This is applicable to the "net" and the "net-next" tree since you
merged "net" into "net-next".

I will fix the PHY driver bind/unbind mess another time, because we are running
out of time for 4.10-rc final, and it's not like it worked before and got
broken in this cycle, it just never worked (the bind/unbind).

Thanks!

 drivers/net/phy/phy_device.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0d8f4d3847f6..8c8e15b8739d 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
struct module *ndev_owner = dev->dev.parent->driver->owner;
struct mii_bus *bus = phydev->mdio.bus;
struct device *d = >mdio.dev;
+   bool using_genphy = false;
int err;
 
/* For Ethernet device drivers that register their own MDIO bus, we
@@ -920,11 +921,6 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
return -EIO;
}
 
-   if (!try_module_get(d->driver->owner)) {
-   dev_err(>dev, "failed to get the device driver module\n");
-   return -EIO;
-   }
-
get_device(d);
 
/* Assume that if there is no driver, that it doesn't
@@ -938,12 +934,22 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
d->driver =
_driver[GENPHY_DRV_1G].mdiodrv.driver;
 
+   using_genphy = true;
+   }
+
+   if (!try_module_get(d->driver->owner)) {
+   dev_err(>dev, "failed to get the device driver module\n");
+   err = -EIO;
+   goto error_put_device;
+   }
+
+   if (using_genphy) {
err = d->driver->probe(d);
if (err >= 0)
err = device_bind_driver(d);
 
if (err)
-   goto error;
+   goto error_module_put;
}
 
if (phydev->attached_dev) {
@@ -980,9 +986,14 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
return err;
 
 error:
+   /* phy_detach() does all of the cleanup below */
phy_detach(phydev);
-   put_device(d);
+   return err;
+
+error_module_put:
module_put(d->driver->owner);
+error_put_device:
+   put_device(d);
if (ndev_owner != bus->owner)
module_put(bus->owner);
return err;
@@ -1045,6 +1056,8 @@ void phy_detach(struct phy_device *phydev)
 
phy_led_triggers_unregister(phydev);
 
+   module_put(phydev->mdio.dev.driver->owner);
+
/* If the device had no specific driver before (i.e. - it
 * was using the generic driver), we unbind the device
 * from the generic driver so that there's a chance a
@@ -1065,7 +1078,6 @@ void phy_detach(struct phy_device *phydev)
bus = phydev->mdio.bus;
 
put_device(>mdio.dev);
-   module_put(phydev->mdio.dev.driver->owner);
if (ndev_owner != bus->owner)
module_put(bus->owner);
 }
-- 
2.9.3
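The ordering this patch establishes (take the module reference only once d->driver is known to be non-NULL, and drop it in phy_detach() before the generic driver is unbound) can be checked with a small user-space model. This is a sketch under stated assumptions: `struct model_phy`, `model_attach()`, `model_detach()` and the `fail_at` injection points are hypothetical stand-ins, not kernel code; refcounts are plain ints, and try_module_get(), the generic probe and later failures are simulated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_driver {
	int module_refs;		/* stands in for the module refcount */
};

struct model_phy {
	struct model_driver *driver;	/* NULL until a driver is bound */
	int device_refs;		/* get_device()/put_device() count */
	bool attached;
};

static struct model_driver genphy;	/* the Generic PHY driver */

/* Error injection points, one per unwind path. */
enum fail_at { FAIL_NONE, FAIL_MODULE_GET, FAIL_PROBE, FAIL_LATE };

static void model_detach(struct model_phy *phy)
{
	assert(phy->attached && phy->driver != NULL);

	/* Drop the module reference while phy->driver is still valid... */
	phy->driver->module_refs--;

	/* ...because unbinding the generic driver clears the pointer. */
	if (phy->driver == &genphy)
		phy->driver = NULL;

	phy->device_refs--;
	phy->attached = false;
}

static int model_attach(struct model_phy *phy, enum fail_at fail)
{
	bool using_genphy = false;
	int err;

	phy->device_refs++;			/* get_device() */

	if (phy->driver == NULL) {
		phy->driver = &genphy;		/* fall back to Generic PHY */
		using_genphy = true;
	}

	/* Only now is phy->driver guaranteed to be non-NULL. */
	if (fail == FAIL_MODULE_GET) {		/* try_module_get() failed */
		err = -EIO;
		goto error_put_device;
	}
	phy->driver->module_refs++;

	if (using_genphy && fail == FAIL_PROBE) {
		err = -ENODEV;			/* generic probe/bind failed */
		goto error_module_put;
	}

	phy->attached = true;
	if (fail == FAIL_LATE) {		/* any failure after attach */
		err = -EBUSY;
		goto error;
	}
	return 0;

error:
	model_detach(phy);			/* releases both references */
	return err;

error_module_put:
	phy->driver->module_refs--;
error_put_device:
	phy->device_refs--;
	return err;
}
```

Each injected failure should leave both counters back at zero, and a successful attach followed by detach should do the same, which is the property the error-injection testing in the follow-up is checking for.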



Re: [PATCH net] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Florian Fainelli
On 02/08/2017 07:05 PM, Florian Fainelli wrote:
> The Generic PHY driver gets assigned after we check that the current
> PHY driver is NULL, so we need to check a few things before we can
> safely dereference d->driver. This would cause a NULL dereference to
> occur when a system binds to the Generic PHY driver. Update
> phy_attach_direct() to do the following:
> 
> - grab the driver module reference after we have assigned the Generic
>   PHY drivers accordingly, and remember we came from the generic PHY
>   path
> 
> - update the error path to clean up the module reference in case the
>   Generic PHY probe function fails
> 
> - split the error path involving phy_detach() to avoid double free/put
>   since phy_detach() does all the clean up
> 
> - finally, have phy_detach() drop the module reference count before we
>   call device_release_driver() for the Generic PHY driver case
> 
> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
> Signed-off-by: Florian Fainelli 

Just FWIW, this time I tested all error paths in phy_attach_direct() by
directly injecting errors, and did that with both the Generic PHY driver
and another driver to make sure there were no reference count problems,
nor double frees.

Thanks all!



-- 
Florian


Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions

2017-02-08 Thread David Ahern
On 2/8/17 12:40 PM, David Ahern wrote:
> On 2/8/17 3:52 AM, Daniel Borkmann wrote:
>> for cBPF dumps it looks like this in ss. Can you tell me what these
>> 11 insns do? Likely you can, but can a normal admin?
>>
>> # ss -0 -b
>> Netid  Recv-Q Send-Q   Local Address:Port   Peer Address:Port
>> p_raw  0      0        *:em1                *
>> bpf filter (11):  0x28 0 0 12, 0x15 0 8 2048, 0x30 0 0 23, 0x15 0 6
>> 17, 0x28 0 0 20, 0x45 4 0 8191, 0xb1 0 0 14, 0x48 0 0 16, 0x15 0 1 68,
>> 0x06 0 0 4294967295, 0x06 0 0 0,
> 
...

> 
> It's not rocket science. We should be able to write tools that do the
> same for bpf as objdump does for assembly. It is a matter of someone
> having the need and taking the initiative. BTW, the bpf option was added

With just a couple of hours of hacking this afternoon, leveraging some of
the verifier code in the kernel, here is the above bpf filter in more
human-friendly terms:

BPF_LD  | BPF_ABS  | BPF_H   0xc:  val = *(u16 *)skb[12]
BPF_JMP | BPF_JEQ  | BPF_K  0  8 0x800  :  if !(val == 0x800) goto pc+8
BPF_LD  | BPF_ABS  | BPF_B   0x17   :  val = *(u8 *)skb[23]
BPF_JMP | BPF_JEQ  | BPF_K  0  6 0x11   :  if !(val == 0x11) goto pc+6
BPF_LD  | BPF_ABS  | BPF_H   0x14   :  val = *(u16 *)skb[20]
BPF_JMP | BPF_JSET | BPF_K  4  0 0x1fff :  if ((val & 0x1fff) != 0) goto pc+4
BPF_LDX | BPF_MSH  | BPF_B   0xe:
BPF_LD  | BPF_IND  | BPF_H   0x10   :  val = *(u16 *)skb[16]
BPF_JMP | BPF_JEQ  | BPF_K  0  1 0x44   :  if !(val == 0x44) goto pc+1
BPF_RET :  ret 
BPF_RET 0   :  ret 0

(long lines so I chopped the reprint of the hex on the left)

That said, verifying that the program attached to a cgroup is correct
for a VRF does not require it to be pretty printed or viewed by humans.
I can automate the checks on namespace id and device index.
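The opcode column of such a listing falls straight out of the classic BPF encoding: the class sits in the low 3 bits, and the upper bits hold mode and size for loads, or operation and source for jumps. A minimal sketch of that decoding, assuming the field values from <linux/filter.h>; `cbpf_decode()` is a hypothetical helper, not the tool described above (which reuses kernel verifier code), and it renders only the opcode names, not the operand semantics:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic BPF opcode fields, values as in <linux/filter.h>. */
#define CBPF_CLASS(code) ((code) & 0x07)
#define CBPF_SIZE(code)  ((code) & 0x18)
#define CBPF_MODE(code)  ((code) & 0xe0)
#define CBPF_OP(code)    ((code) & 0xf0)
#define CBPF_SRC(code)   ((code) & 0x08)

static const char *const class_names[8] = {
	"BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
	"BPF_ALU", "BPF_JMP", "BPF_RET", "BPF_MISC",
};
static const char *const size_names[4] = { "BPF_W", "BPF_H", "BPF_B", "?" };
static const char *const mode_names[8] = {
	"BPF_IMM", "BPF_ABS", "BPF_IND", "BPF_MEM",
	"BPF_LEN", "BPF_MSH", "?", "?",
};

static const char *jmp_name(unsigned int op)
{
	switch (op) {
	case 0x00: return "BPF_JA";
	case 0x10: return "BPF_JEQ";
	case 0x20: return "BPF_JGT";
	case 0x30: return "BPF_JGE";
	case 0x40: return "BPF_JSET";
	default:   return "?";
	}
}

/* Hypothetical helper: render one opcode as "CLASS | MODE | SIZE" for
 * loads or "CLASS | OP | SRC" for jumps; other classes print only the
 * class name. buf must hold at least 32 bytes. */
static void cbpf_decode(uint16_t code, char *buf, size_t len)
{
	const char *parts[3] = { class_names[CBPF_CLASS(code)], NULL, NULL };

	assert(len >= 32);	/* longest result is 26 characters */
	(void)len;

	switch (CBPF_CLASS(code)) {
	case 0x00:		/* BPF_LD */
	case 0x01:		/* BPF_LDX */
		parts[1] = mode_names[CBPF_MODE(code) >> 5];
		parts[2] = size_names[CBPF_SIZE(code) >> 3];
		break;
	case 0x05:		/* BPF_JMP */
		parts[1] = jmp_name(CBPF_OP(code));
		parts[2] = CBPF_SRC(code) ? "BPF_X" : "BPF_K";
		break;
	default:
		break;
	}

	buf[0] = '\0';
	for (int i = 0; i < 3 && parts[i] != NULL; i++) {
		if (i > 0)
			strcat(buf, " | ");
		strcat(buf, parts[i]);
	}
}
```

Fed the codes from the ss dump (0x28, 0x15, 0x30, 0x45, 0xb1, 0x48, 0x06), this reproduces the left-hand column of the listing above.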


[RFC PATCH net-next 1/1] net: rmnet_data: Initial implementation

2017-02-08 Thread Subash Abhinov Kasiviswanathan
The RmNet Data driver provides transport-agnostic MAP (multiplexing and
aggregation protocol) support in embedded and bridge modes. The module
provides virtual network devices which can be attached to any IP-mode
physical device. This will be used to provide all MAP functionality
on future hardware in a single consistent location.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 Documentation/networking/rmnet_data.txt |   82 +++
 include/uapi/linux/Kbuild   |2 +
 include/uapi/linux/if_arp.h |1 +
 include/uapi/linux/if_ether.h   |3 +-
 include/uapi/linux/rmnet_data.h |  416 +++
 net/Kconfig |1 +
 net/Makefile|1 +
 net/rmnet_data/Kconfig  |   21 +
 net/rmnet_data/Makefile |   14 +
 net/rmnet_data/rmnet_data_config.c  | 1149 +++
 net/rmnet_data/rmnet_data_config.h  |  123 
 net/rmnet_data/rmnet_data_handlers.c|  560 +++
 net/rmnet_data/rmnet_data_handlers.h|   24 +
 net/rmnet_data/rmnet_data_main.c|   60 ++
 net/rmnet_data/rmnet_data_private.h |   76 ++
 net/rmnet_data/rmnet_data_stats.c   |   86 +++
 net/rmnet_data/rmnet_data_stats.h   |   61 ++
 net/rmnet_data/rmnet_data_trace.h   |  183 +
 net/rmnet_data/rmnet_data_vnd.c |  602 
 net/rmnet_data/rmnet_data_vnd.h |   40 ++
 net/rmnet_data/rmnet_map.h  |  148 
 net/rmnet_data/rmnet_map_command.c  |  180 +
 net/rmnet_data/rmnet_map_data.c |  154 +
 23 files changed, 3986 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/rmnet_data.txt
 create mode 100644 include/uapi/linux/rmnet_data.h
 create mode 100644 net/rmnet_data/Kconfig
 create mode 100644 net/rmnet_data/Makefile
 create mode 100644 net/rmnet_data/rmnet_data_config.c
 create mode 100644 net/rmnet_data/rmnet_data_config.h
 create mode 100644 net/rmnet_data/rmnet_data_handlers.c
 create mode 100644 net/rmnet_data/rmnet_data_handlers.h
 create mode 100644 net/rmnet_data/rmnet_data_main.c
 create mode 100644 net/rmnet_data/rmnet_data_private.h
 create mode 100644 net/rmnet_data/rmnet_data_stats.c
 create mode 100644 net/rmnet_data/rmnet_data_stats.h
 create mode 100644 net/rmnet_data/rmnet_data_trace.h
 create mode 100644 net/rmnet_data/rmnet_data_vnd.c
 create mode 100644 net/rmnet_data/rmnet_data_vnd.h
 create mode 100644 net/rmnet_data/rmnet_map.h
 create mode 100644 net/rmnet_data/rmnet_map_command.c
 create mode 100644 net/rmnet_data/rmnet_map_data.c

diff --git a/Documentation/networking/rmnet_data.txt b/Documentation/networking/rmnet_data.txt
new file mode 100644
index 000..ff6cce8
--- /dev/null
+++ b/Documentation/networking/rmnet_data.txt
@@ -0,0 +1,82 @@
+1. Introduction
+
+The rmnet_data driver supports the Multiplexing and Aggregation
+Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm
+Technologies, Inc. modems.
+
+This driver can be used to register onto any physical network device in
+IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.
+
+Multiplexing allows for the creation of logical netdevices (rmnet_data devices)
+to handle multiple private data networks (PDN) like a default internet, tethering,
+multimedia messaging service (MMS) or IP multimedia subsystem (IMS). Hardware
+sends packets with MAP headers to rmnet_data. Based on the multiplexer id,
+rmnet_data routes each packet to the appropriate PDN after removing the MAP header.
+
+Aggregation is required to achieve high data rates. This involves hardware
+sending an aggregated bunch of MAP frames. The rmnet_data driver de-aggregates
+these MAP frames and sends them to the appropriate PDNs.
+
+2. Packet format
+
+a. MAP packet (data / control)
+
+The MAP header has the same endianness as the IP packet.
+
+Packet format -
+
+Bit         0                1          2-7    8-15             16-31
+Function    Command / Data   Reserved   Pad    Multiplexer ID   Payload length
+Bit         32-x
+Function    Raw bytes
+
+The Command (1) / Data (0) bit indicates whether the packet is a MAP command
+or data packet. Command packets are used for transport-level flow control.
+Data packets are standard IP packets.
+
+Reserved bits are usually zeroed out and are to be ignored by the receiver.
+
+Padding is the number of bytes added for 4-byte alignment, if required by
+hardware.
+
+Multiplexer ID indicates the PDN on which data has to be sent.
+
+Payload length includes the padding length but does not include MAP header
+length.
+
+b. MAP packet (command specific)
+
+Bit         0                1          2-7            8-15             16-31
+Function    Command          Reserved   Pad            Multiplexer ID   Payload length
+Bit         32-39            40-45      46-47          48-63
+Function    Command name     Reserved   Command Type   Reserved
+Bit  
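The data-packet header layout and the de-aggregation step described in this documentation can be sketched as follows. This is a minimal sketch under stated assumptions, not the driver's code: bit 0 is taken as the most significant bit of the first byte, the 16-bit payload length is read big-endian ("same endianness as the IP packet"), padding is assumed to sit at the end of the payload, and command frames are not treated specially. `struct model_map_hdr`, `model_map_parse()` and `model_map_deaggregate()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAP_HDR_LEN 4u

/* Hypothetical decoded view of the 4-byte MAP header. */
struct model_map_hdr {
	uint8_t  cd;		/* bit 0: 1 = command, 0 = data */
	uint8_t  pad;		/* bits 2-7: padding bytes (bit 1 reserved) */
	uint8_t  mux_id;	/* bits 8-15: selects the PDN */
	uint16_t pkt_len;	/* bits 16-31: payload length incl. padding */
};

static struct model_map_hdr model_map_parse(const uint8_t *p)
{
	struct model_map_hdr h;

	h.cd = (uint8_t)(p[0] >> 7);
	h.pad = (uint8_t)(p[0] & 0x3f);
	h.mux_id = p[1];
	h.pkt_len = (uint16_t)((p[2] << 8) | p[3]);
	return h;
}

/* Walk a buffer of back-to-back MAP frames. For each frame, record the
 * mux ID and the IP packet length (payload minus padding). Returns the
 * number of frames recorded, or -1 if a frame is truncated or its
 * padding exceeds its payload. */
static int model_map_deaggregate(const uint8_t *buf, size_t len,
				 uint8_t *mux_ids, uint16_t *ip_lens,
				 int max_frames)
{
	size_t off = 0;
	int n = 0;

	while (off + MODEL_MAP_HDR_LEN <= len && n < max_frames) {
		struct model_map_hdr h = model_map_parse(buf + off);
		size_t frame_len = MODEL_MAP_HDR_LEN + h.pkt_len;

		if (h.pad > h.pkt_len || frame_len > len - off)
			return -1;

		mux_ids[n] = h.mux_id;
		ip_lens[n] = (uint16_t)(h.pkt_len - h.pad);
		n++;
		off += frame_len;	/* next header follows padded payload */
	}
	return n;
}
```

Since the payload length already includes the padding but not the header, the next frame always starts MODEL_MAP_HDR_LEN + pkt_len bytes after the current one.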

[RFC PATCH net-next 0/1] net: Add support for rmnet_data driver

2017-02-08 Thread Subash Abhinov Kasiviswanathan
This patch adds support for the rmnet_data driver which is required to
support recent chipsets using Qualcomm Technologies, Inc. modems. The data
from hardware follows the multiplexing and aggregation protocol (MAP).

This driver can be used to register onto any physical network device in
IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.

The rmnet_data driver decodes these packets and queues them to the network
stack (and encodes outgoing packets and transmits them to the physical device).

Subash Abhinov Kasiviswanathan (1):
  net: rmnet_data: Initial implementation

 Documentation/networking/rmnet_data.txt |   82 +++
 include/uapi/linux/Kbuild   |2 +
 include/uapi/linux/if_arp.h |1 +
 include/uapi/linux/if_ether.h   |3 +-
 include/uapi/linux/rmnet_data.h |  416 +++
 net/Kconfig |1 +
 net/Makefile|1 +
 net/rmnet_data/Kconfig  |   21 +
 net/rmnet_data/Makefile |   14 +
 net/rmnet_data/rmnet_data_config.c  | 1149 +++
 net/rmnet_data/rmnet_data_config.h  |  123 
 net/rmnet_data/rmnet_data_handlers.c|  560 +++
 net/rmnet_data/rmnet_data_handlers.h|   24 +
 net/rmnet_data/rmnet_data_main.c|   60 ++
 net/rmnet_data/rmnet_data_private.h |   76 ++
 net/rmnet_data/rmnet_data_stats.c   |   86 +++
 net/rmnet_data/rmnet_data_stats.h   |   61 ++
 net/rmnet_data/rmnet_data_trace.h   |  183 +
 net/rmnet_data/rmnet_data_vnd.c |  602 
 net/rmnet_data/rmnet_data_vnd.h |   40 ++
 net/rmnet_data/rmnet_map.h  |  148 
 net/rmnet_data/rmnet_map_command.c  |  180 +
 net/rmnet_data/rmnet_map_data.c |  154 +
 23 files changed, 3986 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/rmnet_data.txt
 create mode 100644 include/uapi/linux/rmnet_data.h
 create mode 100644 net/rmnet_data/Kconfig
 create mode 100644 net/rmnet_data/Makefile
 create mode 100644 net/rmnet_data/rmnet_data_config.c
 create mode 100644 net/rmnet_data/rmnet_data_config.h
 create mode 100644 net/rmnet_data/rmnet_data_handlers.c
 create mode 100644 net/rmnet_data/rmnet_data_handlers.h
 create mode 100644 net/rmnet_data/rmnet_data_main.c
 create mode 100644 net/rmnet_data/rmnet_data_private.h
 create mode 100644 net/rmnet_data/rmnet_data_stats.c
 create mode 100644 net/rmnet_data/rmnet_data_stats.h
 create mode 100644 net/rmnet_data/rmnet_data_trace.h
 create mode 100644 net/rmnet_data/rmnet_data_vnd.c
 create mode 100644 net/rmnet_data/rmnet_data_vnd.h
 create mode 100644 net/rmnet_data/rmnet_map.h
 create mode 100644 net/rmnet_data/rmnet_map_command.c
 create mode 100644 net/rmnet_data/rmnet_map_data.c

-- 
1.9.1



[PATCH v3 net-next 10/10] openvswitch: Pack struct sw_flow_key.

2017-02-08 Thread Jarno Rajahalme
struct sw_flow_key has two 16-bit holes. Move the most commonly matched
conntrack fields there. In some typical cases this halves the size of
the key that needs to be hashed and fits it into one cache line.

Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c| 40 
 net/openvswitch/conntrack.h|  8 
 net/openvswitch/flow.h | 14 --
 net/openvswitch/flow_netlink.c | 11 +++
 4 files changed, 39 insertions(+), 34 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index de47782..a12825e 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -154,7 +154,7 @@ static void __ovs_ct_update_key_orig_tp(struct sw_flow_key *key,
const struct nf_conntrack_tuple *orig,
u8 icmp_proto)
 {
-   key->ct.orig_proto = orig->dst.protonum;
+   key->ct_orig_proto = orig->dst.protonum;
if (orig->dst.protonum == icmp_proto) {
key->ct.orig_tp.src = htons(orig->dst.u.icmp.type);
key->ct.orig_tp.dst = htons(orig->dst.u.icmp.code);
@@ -168,8 +168,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
const struct nf_conntrack_zone *zone,
const struct nf_conn *ct)
 {
-   key->ct.state = state;
-   key->ct.zone = zone->id;
+   key->ct_state = state;
+   key->ct_zone = zone->id;
key->ct.mark = ovs_ct_get_mark(ct);
ovs_ct_get_labels(ct, >ct.labels);
 
@@ -197,10 +197,10 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
return;
}
}
-   /* Clear 'ct.orig_proto' to mark the non-existence of conntrack
+   /* Clear 'ct_orig_proto' to mark the non-existence of conntrack
 * original direction key fields.
 */
-   key->ct.orig_proto = 0;
+   key->ct_orig_proto = 0;
 }
 
 /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
@@ -230,7 +230,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
if (ct->master)
state |= OVS_CS_F_RELATED;
if (keep_nat_flags) {
-   state |= key->ct.state & OVS_CS_F_NAT_MASK;
+   state |= key->ct_state & OVS_CS_F_NAT_MASK;
} else {
if (ct->status & IPS_SRC_NAT)
state |= OVS_CS_F_SRC_NAT;
@@ -261,11 +261,11 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
 int ovs_ct_put_key(const struct sw_flow_key *swkey,
   const struct sw_flow_key *output, struct sk_buff *skb)
 {
-   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state))
+   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct_state))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
-   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
+   nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct_zone))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
@@ -277,14 +277,14 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
>ct.labels))
return -EMSGSIZE;
 
-   if (swkey->ct.orig_proto) {
+   if (swkey->ct_orig_proto) {
if (swkey->eth.type == htons(ETH_P_IP)) {
struct ovs_key_ct_tuple_ipv4 orig = {
output->ipv4.ct_orig.src,
output->ipv4.ct_orig.dst,
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,
sizeof(orig), ))
@@ -295,7 +295,7 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
output->ct.orig_tp.src,
output->ct.orig_tp.dst,
-   output->ct.orig_proto,
+   output->ct_orig_proto,
};
if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,
sizeof(orig), ))
@@ -614,11 +614,11 @@ static bool skb_nfct_cached(struct net *net,
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run through conntrack again.
 */
-   if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
-   !(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone 

[PATCH v3 net-next 04/10] openvswitch: Unionize ovs_key_ct_label with a u32 array.

2017-02-08 Thread Jarno Rajahalme
Make the array of labels in struct ovs_key_ct_labels a union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at a time, which is also
the alignment of netlink attributes.
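
A minimal userspace sketch of what the union buys, assuming C11
anonymous unions and using <stdint.h> types in place of the kernel's
__u8/__u32.  The struct mirrors the uapi change; the two helpers follow
the shape of labels_nonzero() and the masked copy in conntrack.c but are
illustrations, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the uapi struct after this patch: the same 16
 * bytes viewed either as bytes or as four 32-bit words. */
#define OVS_CT_LABELS_LEN_32 4
#define OVS_CT_LABELS_LEN (OVS_CT_LABELS_LEN_32 * sizeof(uint32_t))

struct ovs_key_ct_labels {
	union {
		uint8_t  ct_labels[OVS_CT_LABELS_LEN];
		uint32_t ct_labels_32[OVS_CT_LABELS_LEN_32];
	};
};

/* The union must not change the size of the netlink attribute payload. */
static_assert(sizeof(struct ovs_key_ct_labels) == 16, "labels are 128 bits");

/* Word-at-a-time zero test: 4 iterations instead of 16. */
static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
{
	for (size_t i = 0; i < OVS_CT_LABELS_LEN_32; i++)
		if (labels->ct_labels_32[i])
			return true;
	return false;
}

/* Masked update: bits set in 'mask' come from 'lbl', all other bits of
 * 'dst' are kept. */
static void labels_apply_mask(struct ovs_key_ct_labels *dst,
			      const struct ovs_key_ct_labels *lbl,
			      const struct ovs_key_ct_labels *mask)
{
	for (size_t i = 0; i < OVS_CT_LABELS_LEN_32; i++)
		dst->ct_labels_32[i] =
			(dst->ct_labels_32[i] & ~mask->ct_labels_32[i]) |
			(lbl->ct_labels_32[i] & mask->ct_labels_32[i]);
}
```

Because the u32 member aliases the same storage, callers no longer need
casts such as (u32 *)labels to get word access.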

Signed-off-by: Jarno Rajahalme 
---
 include/uapi/linux/openvswitch.h |  8 ++--
 net/openvswitch/conntrack.c  | 15 ---
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 375d812..96aee34 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -446,9 +446,13 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
-#define OVS_CT_LABELS_LEN  16
+#define OVS_CT_LABELS_LEN_32   4
+#define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
 struct ovs_key_ct_labels {
-   __u8ct_labels[OVS_CT_LABELS_LEN];
+   union {
+   __u8ct_labels[OVS_CT_LABELS_LEN];
+   __u32   ct_labels_32[OVS_CT_LABELS_LEN_32];
+   };
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a6ff374..f23934c 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -281,20 +281,21 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
/* Triggers a change event, which makes sense only for
 * confirmed connections.
 */
-   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
+   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
if (err)
return err;
} else {
u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = (const u32 *)mask->ct_labels;
-   const u32 *lbl = (const u32 *)labels->ct_labels;
+   const u32 *msk = mask->ct_labels_32;
+   const u32 *lbl = labels->ct_labels_32;
int i;
 
/* No-one else has access to the non-confirmed entry, copy
 * labels over, keeping any bits we are not explicitly setting.
 */
-   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
}
 
@@ -866,8 +867,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
 {
size_t i;
 
-   for (i = 0; i < sizeof(*labels); i++)
-   if (labels->ct_labels[i])
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   if (labels->ct_labels_32[i])
return true;
 
return false;
-- 
2.1.4



[PATCH v3 net-next 07/10] openvswitch: Inherit master's labels.

2017-02-08 Thread Jarno Rajahalme
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
the master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information to be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough.  Labels can still be set explicitly, so
this change only affects the default values of the labels in the
presence of a master connection.
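
The resulting order of operations (inherit the master's labels first,
then apply any explicitly masked bits on top) can be sketched in
userspace as follows.  struct toy_conn and toy_init_labels() are
hypothetical stand-ins for struct nf_conn and ovs_ct_init_labels();
'has_labels' models whether nf_ct_labels_find() would find a labels
extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LABELS_WORDS 4	/* 128 label bits */

/* Hypothetical conntrack entry: 'master' is NULL unless this is an
 * expected (related) connection. */
struct toy_conn {
	const struct toy_conn *master;
	bool has_labels;
	uint32_t labels[LABELS_WORDS];
};

static bool words_nonzero(const uint32_t *w)
{
	for (size_t i = 0; i < LABELS_WORDS; i++)
		if (w[i])
			return true;
	return false;
}

/* Copy the master's labels (if any), then apply the requested masked bits.
 * Returns false when there was nothing to do. */
static bool toy_init_labels(struct toy_conn *ct, const uint32_t *value,
			    const uint32_t *mask)
{
	const struct toy_conn *m = ct->master;
	bool inherit = m && m->has_labels;
	bool have_mask = words_nonzero(mask);

	assert(ct != NULL);
	if (!inherit && !have_mask)
		return false;

	ct->has_labels = true;
	if (inherit)
		for (size_t i = 0; i < LABELS_WORDS; i++)
			ct->labels[i] = m->labels[i];

	if (have_mask)
		for (size_t i = 0; i < LABELS_WORDS; i++)
			ct->labels[i] = (ct->labels[i] & ~mask[i]) |
					(value[i] & mask[i]);
	return true;
}
```

Explicitly set bits still win over inherited ones, which matches the
statement that labels can still be set explicitly.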

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 0e038ee..5fbadcd 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -73,6 +73,8 @@ struct ovs_conntrack_info {
 #endif
 };
 
+static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
+
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -272,18 +274,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
  const struct ovs_key_ct_labels *labels,
  const struct ovs_key_ct_labels *mask)
 {
-   struct nf_conn_labels *cl;
-   u32 *dst;
-   int i;
+   struct nf_conn_labels *cl, *master_cl;
+   bool have_mask = labels_nonzero(mask);
+
+   /* Inherit master's labels to the related connection? */
+   master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;
+
+   if (!master_cl && !have_mask)
+   return 0;   /* Nothing to do. */
 
cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   dst = (u32 *)cl->bits;
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
-   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
+   /* Inherit the master's labels, if any. */
+   if (master_cl)
+   *cl = *master_cl;
+
+   if (have_mask) {
+   u32 *dst = (u32 *)cl->bits;
+   int i;
+
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i]
+& mask->ct_labels_32[i]);
+   }
 
memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
@@ -911,13 +927,14 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
if (err)
return err;
}
-   if (labels_nonzero(>labels.mask)) {
-   if (!nf_ct_is_confirmed(ct))
-   err = ovs_ct_init_labels(ct, key, >labels.value,
->labels.mask);
-   else
-   err = ovs_ct_set_labels(ct, key, >labels.value,
-   >labels.mask);
+   if (!nf_ct_is_confirmed(ct)) {
+   err = ovs_ct_init_labels(ct, key, >labels.value,
+>labels.mask);
+   if (err)
+   return err;
+   } else if (labels_nonzero(>labels.mask)) {
+   err = ovs_ct_set_labels(ct, key, >labels.value,
+   >labels.mask);
if (err)
return err;
}
-- 
2.1.4



[PATCH v3 net-next 05/10] openvswitch: Simplify labels length logic.

2017-02-08 Thread Jarno Rajahalme
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of the conntrack labels extension has been
fixed at 128 bits, so we do not need to check at run-time for label
sizes shorter than 128 bits.  This patch simplifies the labels length
logic accordingly, but still allows the conntrack labels size to be
increased in the future without breaking the build.  Should conntrack
labels grow in size, OVS would still be able to deal with the first
128 label bits.
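
A userspace sketch of the two halves of this logic: a build-time guard
against the extension shrinking, and a clamped copy that exports only the
first 128 bits if it grows.  TOY_NF_CT_LABELS_MAX_SIZE = 32 is a
hypothetical future size chosen to exercise the clamp; it is not the
kernel's current value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OVS_CT_LABELS_LEN 16
/* Hypothetical: pretend the netfilter extension grew to 256 bits. */
#define TOY_NF_CT_LABELS_MAX_SIZE 32

/* Same guard style as the patch: refuse to build if the extension shrinks
 * below what OVS exports. */
#if TOY_NF_CT_LABELS_MAX_SIZE < OVS_CT_LABELS_LEN
#error labels extension must be at least 16 bytes
#endif

struct toy_conn_labels {
	uint8_t bits[TOY_NF_CT_LABELS_MAX_SIZE];
};

/* Export at most the first OVS_CT_LABELS_LEN bytes.  A NULL 'cl' (no labels
 * extension on the connection) yields all-zero labels. */
static void toy_get_labels(const struct toy_conn_labels *cl, uint8_t *out)
{
	assert(out != NULL);
	if (cl)
		memcpy(out, cl->bits,
		       sizeof(cl->bits) > OVS_CT_LABELS_LEN ?
		       OVS_CT_LABELS_LEN : sizeof(cl->bits));
	else
		memset(out, 0, OVS_CT_LABELS_LEN);
}
```

Since the guard rules out a short extension at build time, the run-time
branch that zero-filled a partial copy is no longer needed.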

Suggested-by: Joe Stringer 
Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index f23934c..c7db4da 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -129,22 +129,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
 #endif
 }
 
+/* Guard against conntrack labels max size shrinking below 128 bits. */
+#if NF_CT_LABELS_MAX_SIZE < 16
+#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
+#endif
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
-   if (cl) {
-   size_t len = sizeof(cl->bits);
-
-   if (len > OVS_CT_LABELS_LEN)
-   len = OVS_CT_LABELS_LEN;
-   else if (len < OVS_CT_LABELS_LEN)
-   memset(labels, 0, OVS_CT_LABELS_LEN);
-   memcpy(labels, cl->bits, len);
-   } else {
+   if (cl)
+   memcpy(labels, cl->bits,
+  sizeof(cl->bits) > OVS_CT_LABELS_LEN
+  ? OVS_CT_LABELS_LEN : sizeof(cl->bits));
+   else
memset(labels, 0, OVS_CT_LABELS_LEN);
-   }
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
@@ -274,7 +274,7 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
-   if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
+   if (!cl)
return -ENOSPC;
 
if (nf_ct_is_confirmed(ct)) {
-- 
2.1.4



[PATCH v3 net-next 00/10] openvswitch: Conntrack integration improvements.

2017-02-08 Thread Jarno Rajahalme
This series improves the conntrack integration code in the openvswitch
module by fixing outdated comments (patch 1) and bugs (patches 2, 3,
and 7), clarifying code (patches 4, 5, and 6), improving performance
(patch 10), and adding new features, requested by user communities,
that enable better translation from firewall admission policy to
network configuration (patches 8 and 9).

v3: Rebase to the current net-next, add the comment-only patch 1, and
reshuffle some of the patches as requested by Joe.

Jarno Rajahalme (10):
  openvswitch: Fix comments for skb->_nfct
  openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
  openvswitch: Do not trigger events for unconfirmed connections.
  openvswitch: Unionize ovs_key_ct_label with a u32 array.
  openvswitch: Simplify labels length logic.
  openvswitch: Refactor labels initialization.
  openvswitch: Inherit master's labels.
  openvswitch: Add original direction conntrack tuple to sw_flow_key.
  openvswitch: Add force commit.
  openvswitch: Pack struct sw_flow_key.

 include/uapi/linux/openvswitch.h |  33 -
 net/openvswitch/actions.c|   2 +
 net/openvswitch/conntrack.c  | 298 ++-
 net/openvswitch/conntrack.h  |  14 +-
 net/openvswitch/flow.c   |  34 -
 net/openvswitch/flow.h   |  55 ++--
 net/openvswitch/flow_netlink.c   |  92 +---
 net/openvswitch/flow_netlink.h   |   7 +-
 8 files changed, 422 insertions(+), 113 deletions(-)

-- 
2.1.4



[PATCH v3 net-next 06/10] openvswitch: Refactor labels initialization.

2017-02-08 Thread Jarno Rajahalme
Refactoring conntrack labels initialization makes changes in later
patches easier to review.

Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 104 ++--
 1 file changed, 62 insertions(+), 42 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c7db4da..0e038ee 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -229,19 +229,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
return 0;
 }
 
-static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key,
   u32 ct_mark, u32 mask)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
u32 new_mark;
 
-   /* The connection could be invalid, in which case set_mark is no-op. */
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
-
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
@@ -256,50 +249,66 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
 #endif
 }
 
-static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
-const struct ovs_key_ct_labels *labels,
-const struct ovs_key_ct_labels *mask)
+static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct)
 {
-   enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
-   struct nf_conn *ct;
-
-   /* The connection could be invalid, in which case set_label is no-op.*/
-   ct = nf_ct_get(skb, );
-   if (!ct)
-   return 0;
 
cl = nf_ct_labels_find(ct);
if (!cl) {
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
+
+   return cl;
+}
+
+/* Initialize labels for a new, to be committed conntrack entry.  Note that
+ * since the new connection is not yet confirmed, and thus no-one else has
+ * access to it's labels, we simply write them over.  Also, we refrain from
+ * triggering events, as receiving change events before the create event would
+ * be confusing.
+ */
+static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
+ const struct ovs_key_ct_labels *labels,
+ const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   u32 *dst;
+   int i;
+
+   cl = ovs_ct_get_conn_labels(ct);
if (!cl)
return -ENOSPC;
 
-   if (nf_ct_is_confirmed(ct)) {
-   /* Triggers a change event, which makes sense only for
-* confirmed connections.
-*/
-   int err = nf_connlabels_replace(ct, labels->ct_labels_32,
-   mask->ct_labels_32,
-   OVS_CT_LABELS_LEN_32);
-   if (err)
-   return err;
-   } else {
-   u32 *dst = (u32 *)cl->bits;
-   const u32 *msk = mask->ct_labels_32;
-   const u32 *lbl = labels->ct_labels_32;
-   int i;
+   dst = (u32 *)cl->bits;
+   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
+   dst[i] = (dst[i] & ~mask->ct_labels_32[i]) |
+   (labels->ct_labels_32[i] & mask->ct_labels_32[i]);
 
-   /* No-one else has access to the non-confirmed entry, copy
-* labels over, keeping any bits we are not explicitly setting.
-*/
-   for (i = 0; i < OVS_CT_LABELS_LEN_32; i++)
-   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
-   }
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
+
+   return 0;
+}
+
+static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key,
+const struct ovs_key_ct_labels *labels,
+const struct ovs_key_ct_labels *mask)
+{
+   struct nf_conn_labels *cl;
+   int err;
+
+   cl = ovs_ct_get_conn_labels(ct);
+   if (!cl)
+   return -ENOSPC;
+
+   err = nf_connlabels_replace(ct, labels->ct_labels_32,
+   mask->ct_labels_32,
+   OVS_CT_LABELS_LEN_32);
+   if (err)
+   return err;
+
+   memcpy(>ct.labels, cl->bits, OVS_CT_LABELS_LEN);
 
-   ovs_ct_get_labels(ct, >ct.labels);
return 0;
 }
 
@@ -879,25 +888,36 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
 const struct ovs_conntrack_info *info,
 struct sk_buff *skb)
 {
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
int err;
 
err = 

[PATCH v3 net-next 03/10] openvswitch: Do not trigger events for unconfirmed connections.

2017-02-08 Thread Jarno Rajahalme
Receiving change events for a connection before its 'new' event can be
confusing.  Avoid triggering change events when setting the conntrack
mark or labels before the conntrack entry has been confirmed.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4df9a54..a6ff374 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -245,7 +245,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
new_mark = ct_mark | (ct->mark & ~(mask));
if (ct->mark != new_mark) {
ct->mark = new_mark;
-   nf_conntrack_event_cache(IPCT_MARK, ct);
+   if (nf_ct_is_confirmed(ct))
+   nf_conntrack_event_cache(IPCT_MARK, ct);
key->ct.mark = new_mark;
}
 
@@ -262,7 +263,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
enum ip_conntrack_info ctinfo;
struct nf_conn_labels *cl;
struct nf_conn *ct;
-   int err;
 
/* The connection could be invalid, in which case set_label is no-op.*/
ct = nf_ct_get(skb, );
@@ -277,10 +277,26 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key,
if (!cl || sizeof(cl->bits) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
-   err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-   OVS_CT_LABELS_LEN / sizeof(u32));
-   if (err)
-   return err;
+   if (nf_ct_is_confirmed(ct)) {
+   /* Triggers a change event, which makes sense only for
+* confirmed connections.
+*/
+   int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
+   OVS_CT_LABELS_LEN / sizeof(u32));
+   if (err)
+   return err;
+   } else {
+   u32 *dst = (u32 *)cl->bits;
+   const u32 *msk = (const u32 *)mask->ct_labels;
+   const u32 *lbl = (const u32 *)labels->ct_labels;
+   int i;
+
+   /* No-one else has access to the non-confirmed entry, copy
+* labels over, keeping any bits we are not explicitly setting.
+*/
+   for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++)
+   dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]);
+   }
 
ovs_ct_get_labels(ct, >ct.labels);
return 0;
-- 
2.1.4



[PATCH v3 net-next 01/10] openvswitch: Fix comments for skb->_nfct

2017-02-08 Thread Jarno Rajahalme
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.

Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index fbffe0e..5de6d12 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -157,7 +157,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
ovs_ct_get_labels(ct, >ct.labels);
 }
 
-/* Update 'key' based on skb->nfct.  If 'post_ct' is true, then OVS has
+/* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
  * previously sent the packet to conntrack via the ct action.  If
  * 'keep_nat_flags' is true, the existing NAT flags retained, else they are
  * initialized from the connection status.
@@ -421,12 +421,12 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
 
 /* Find an existing connection which this packet belongs to without
  * re-attributing statistics or modifying the connection state.  This allows an
- * skb->nfct lost due to an upcall to be recovered during actions execution.
+ * skb->_nfct lost due to an upcall to be recovered during actions execution.
  *
  * Must be called with rcu_read_lock.
  *
- * On success, populates skb->nfct and skb->nfctinfo, and returns the
- * connection.  Returns NULL if there is no existing entry.
+ * On success, populates skb->_nfct and returns the connection.  Returns NULL
+ * if there is no existing entry.
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
@@ -464,7 +464,7 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
return ct;
 }
 
-/* Determine whether skb->nfct is equal to the result of conntrack lookup. */
+/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool skb_nfct_cached(struct net *net,
const struct sw_flow_key *key,
const struct ovs_conntrack_info *info,
@@ -475,7 +475,7 @@ static bool skb_nfct_cached(struct net *net,
 
ct = nf_ct_get(skb, );
/* If no ct, check if we have evidence that an existing conntrack entry
-* might be found for this skb.  This happens when we lose a skb->nfct
+* might be found for this skb.  This happens when we lose a skb->_nfct
 * due to an upcall.  If the connection was not confirmed, it is not
 * cached and needs to be run through conntrack again.
 */
@@ -699,7 +699,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
 /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
  * not done already.  Update key with new CT state after passing the packet
  * through conntrack.
- * Note that if the packet is deemed invalid by conntrack, skb->nfct will be
+ * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be
  * set to NULL and 0 will be returned.
  */
 static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
-- 
2.1.4



[PATCH v3 net-next 09/10] openvswitch: Add force commit.

2017-02-08 Thread Jarno Rajahalme
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After a policy change it is possible that an overlapping conntrack
entry already exists for a new connection, where the original
direction of the existing connection is opposite to that of the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. the stateful network admission policy, it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposite to the current packet's, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme 
---
 include/uapi/linux/openvswitch.h |  5 +
 net/openvswitch/conntrack.c  | 26 --
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 90af8b8..7f41f7d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -674,6 +674,10 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
  * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address
  * translation (NAT) on the packet.
+ * @OVS_CT_ATTR_FORCE_COMMIT: Like %OVS_CT_ATTR_COMMIT, but instead of doing
+ * nothing if the connection is already committed will check that the current
+ * packet is in conntrack entry's original direction.  If directionality does
+ * not match, will delete the existing conntrack entry and commit a new one.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -684,6 +688,7 @@ enum ovs_ct_attr {
OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
   related connections. */
OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */
+   OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 8685bcd..de47782 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -65,6 +65,7 @@ struct ovs_conntrack_info {
struct nf_conn *ct;
u8 commit : 1;
u8 nat : 3; /* enum ovs_ct_nat */
+   u8 force : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -615,10 +616,13 @@ static bool skb_nfct_cached(struct net *net,
 */
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
-   key->ct.zone == info->zone.id)
+   key->ct.zone == info->zone.id) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct.state
 & OVS_CS_F_NAT_MASK));
+   if (ct)
+   nf_ct_get(skb, );
+   }
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
@@ -632,6 +636,18 @@ static bool skb_nfct_cached(struct net *net,
if (help && rcu_access_pointer(help->helper) != info->helper)
return false;
}
+   /* Force conntrack entry direction to the current packet? */
+   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   /* Delete the conntrack entry if confirmed, else just release
+* the reference.
+*/
+   if (nf_ct_is_confirmed(ct))
+   nf_ct_delete(ct, 0, 0);
+   else
+   nf_conntrack_put(>ct_general);
+   nf_ct_set(skb, NULL, 0);
+   return false;
+   }
 
return true;
 }
@@ -1209,6 +1225,7 @@ static int parse_nat(const struct nlattr *attr,
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
+   [OVS_CT_ATTR_FORCE_COMMIT]  = { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -1248,6 +1265,9 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
}
 
switch (type) {
+   case OVS_CT_ATTR_FORCE_COMMIT:
+   info->force = true;
+   /* fall through. */
case OVS_CT_ATTR_COMMIT:
info->commit = true;
break;
@@ -1474,7 

[PATCH v3 net-next 02/10] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

2017-02-08 Thread Jarno Rajahalme
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry.  Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made.  Looking up
an existing conntrack entry when executing a packet received from
userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.
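
Why the inversion is needed can be shown with a source-NATted UDP flow
in a userspace sketch.  The toy_ types are hypothetical and handle only
TCP/UDP-style tuples, where inverting is a plain swap.  The kernel's
nf_ct_invert_tuple() also handles protocols such as ICMP.  The entry
stores the pre-NAT original tuple and a reply tuple addressed to the NAT
address, so the post-NAT packet matches neither until it is inverted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto;
};

/* Hypothetical entry holding both directions: [0] original, [1] reply. */
struct toy_conn {
	struct toy_tuple tuple[2];
};

static struct toy_tuple toy_invert(struct toy_tuple t)
{
	struct toy_tuple inv = {
		.src_ip = t.dst_ip, .dst_ip = t.src_ip,
		.src_port = t.dst_port, .dst_port = t.src_port,
		.proto = t.proto,
	};
	return inv;
}

static bool toy_tuple_eq(struct toy_tuple a, struct toy_tuple b)
{
	return a.src_ip == b.src_ip && a.dst_ip == b.dst_ip &&
	       a.src_port == b.src_port && a.dst_port == b.dst_port &&
	       a.proto == b.proto;
}

/* Returns the packet's direction (0 original, 1 reply) or -1 on a miss.
 * When 'natted', the packet tuple is inverted before matching and the
 * matched direction is flipped back, mirroring how the patch selects the
 * other tuplehash to get the right ctinfo. */
static int toy_find_dir(const struct toy_conn *ct, struct toy_tuple pkt,
			bool natted)
{
	struct toy_tuple key = natted ? toy_invert(pkt) : pkt;

	for (int dir = 0; dir < 2; dir++)
		if (toy_tuple_eq(ct->tuple[dir], key))
			return natted ? !dir : dir;
	return -1;
}
```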

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme 
---
 net/openvswitch/conntrack.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 5de6d12..4df9a54 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -430,7 +430,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-u8 l3num, struct sk_buff *skb)
+u8 l3num, struct sk_buff *skb, bool natted)
 {
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -453,6 +453,17 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
return NULL;
}
 
+   /* Must invert the tuple if skb has been transformed by NAT. */
+   if (natted) {
+   struct nf_conntrack_tuple inverse;
+
+   if (!nf_ct_invert_tuple(, , l3proto, l4proto)) {
+   pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+   return NULL;
+   }
+   tuple = inverse;
+   }
+
/* look for tuple match */
h = nf_conntrack_find_get(net, zone, );
if (!h)
@@ -460,6 +471,13 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 
ct = nf_ct_tuplehash_to_ctrack(h);
 
+   /* Inverted packet tuple matches the reverse direction conntrack tuple,
+* select the other tuplehash to get the right 'ctinfo' bits for this
+* packet.
+*/
+   if (natted)
+   h = >tuplehash[!h->tuple.dst.dir];
+
nf_ct_set(skb, ct, ovs_ct_get_info(h));
return ct;
 }
@@ -482,7 +500,9 @@ static bool skb_nfct_cached(struct net *net,
if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
!(key->ct.state & OVS_CS_F_INVALID) &&
key->ct.zone == info->zone.id)
-   ct = ovs_ct_find_existing(net, >zone, info->family, skb);
+   ct = ovs_ct_find_existing(net, >zone, info->family, skb,
+ !!(key->ct.state
+& OVS_CS_F_NAT_MASK));
if (!ct)
return false;
if (!net_eq(net, read_pnet(>ct_net)))
-- 
2.1.4



[PATCH v3 net-next 08/10] openvswitch: Add original direction conntrack tuple to sw_flow_key.

2017-02-08 Thread Jarno Rajahalme
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections that have an assigned
helper (e.g., FTP) have a master conntrack entry.
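
The selection rule above is small enough to sketch directly.  The tuple
struct follows the field layout of struct ovs_key_ct_tuple_ipv4 but uses
host byte order for simplicity (the real fields are __be32/__be16).
struct toy_conn and both helpers are hypothetical.  A protocol of 0
stands for "no original tuple present", matching how ct_orig_proto is
cleared elsewhere in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_ct_tuple_ipv4 {
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t  ipv4_proto;	/* 0 means "no original tuple present" */
};

/* Hypothetical conntrack entry: 'master' is non-NULL for expected
 * connections, e.g. an FTP data connection. */
struct toy_conn {
	const struct toy_conn *master;
	struct toy_ct_tuple_ipv4 orig;	/* original-direction tuple */
};

/* The tuple exposed to the flow key is the master's original tuple if
 * there is a master, else the connection's own. */
static struct toy_ct_tuple_ipv4 toy_ct_orig_tuple(const struct toy_conn *ct)
{
	if (ct->master)
		ct = ct->master;
	return ct->orig;
}

static bool toy_ct_orig_present(const struct toy_ct_tuple_ipv4 *t)
{
	return t->ipv4_proto != 0;
}
```

With this rule, an ACL written against the FTP control connection's
5-tuple (for example, destination port 21) also matches packets of the
related data connection.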

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections' reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple, the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same as or different from that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This keeps sw_flow_key from growing much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
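
A reduced userspace sketch of the IPv4 half of this overlay, modelled on
the ipv4.ct_orig fields referenced by ovs_ct_put_key().  The toy_ types,
the host-order eth_type and the single guard function are assumptions
for illustration, not the layout of struct sw_flow_key.  The IPv6/ND
case would additionally need an ICMPv6 type check, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ETH_P_IP  0x0800	/* same values as ETH_P_IP / ETH_P_ARP */
#define TOY_ETH_P_ARP 0x0806

/* ARP hardware addresses and conntrack original-direction IPv4 addresses
 * share storage, since no packet needs both. */
union toy_ipv4_extra {
	struct { uint32_t src, dst; } ct_orig;
	struct { uint8_t sha[6], tha[6]; } arp;
};
static_assert(sizeof(union toy_ipv4_extra) == 12,
	      "overlay costs max(8, 12) bytes, not 8 + 12");

struct toy_key {
	uint16_t eth_type;	/* host byte order in this sketch */
	uint8_t ct_orig_proto;	/* 0 => ct_orig fields absent */
	union toy_ipv4_extra ipv4;
};

/* Only read ct_orig for IPv4 packets that conntrack actually tracked; for
 * ARP the same bytes hold sha/tha. */
static bool toy_ct_orig_valid(const struct toy_key *k)
{
	return k->eth_type == TOY_ETH_P_IP && k->ct_orig_proto != 0;
}
```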

Signed-off-by: Jarno Rajahalme 
---
 include/uapi/linux/openvswitch.h | 20 +-
 net/openvswitch/actions.c|  2 +
 net/openvswitch/conntrack.c  | 86 +---
 net/openvswitch/conntrack.h  | 10 -
 net/openvswitch/flow.c   | 34 +---
 net/openvswitch/flow.h   | 49 ++-
 net/openvswitch/flow_netlink.c   | 85 +--
 net/openvswitch/flow_netlink.h   |  7 +++-
 8 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 96aee34..90af8b8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -1,6 +1,6 @@
 
 /*
- * Copyright (c) 2007-2013 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -331,6 +331,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
+   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -472,6 +474,22 @@ struct ovs_key_ct_labels {
 
 #define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT)
 
+struct ovs_key_ct_tuple_ipv4 {
+   __be32 ipv4_src;
+   __be32 ipv4_dst;
+   __be16 src_port;
+   __be16 dst_port;
+   __u8   ipv4_proto;
+};
+
+struct 

[PATCH 2/3 v2 net-next] enic: add udp_tunnel ndo for vxlan offload

2017-02-08 Thread Govindarajulu Varadarajan
Defines enic_udp_tunnel_add/del for configuring vxlan tunnel offload.
enic supports offload of only one ipv4/udp port.

There are two modes that fw supports for vxlan offload.

mode 0: the fcoe bit is set for an encapsulated packet. fcoe_fc_crc_ok is set
if the checksum is OK; this bit is the OR of ip_csum_ok and
tcp_udp_csum_ok.

mode 2: BIT(0) in rss_hash is set if the packet is encapsulated.
BIT(1) is set if outer_ip_csum_ok.
BIT(2) is set if outer_tcp_csum_ok.

tcp_udp_csum_ok/ipv4_csum_ok are set if the inner checksum is OK.
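The following is a minimal sketch of how a receive path might classify a completion in each mode. The struct, the function, and the flat boolean parameters are hypothetical stand-ins for the descriptor fields named above. Two rules are this sketch's assumptions rather than something stated in the description. First, in mode 0, fcoe_fc_crc_ok is treated as the outer checksum status. Second, in mode 2, the outer checksum counts as good only when both BIT(1) and BIT(2) are set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification result for one received completion. */
struct rx_encap_info {
	bool encap;		/* packet is VXLAN-encapsulated */
	bool outer_csum_ok;	/* outer checksum(s) validated by HW */
	bool inner_csum_ok;	/* IPv4 and TCP/UDP checksum flags both set */
};

/* mode is 0 or 2, as negotiated with firmware. */
static struct rx_encap_info classify_rx(int mode, bool fcoe,
					bool fcoe_fc_crc_ok, uint32_t rss_hash,
					bool ipv4_csum_ok, bool tcp_udp_csum_ok)
{
	struct rx_encap_info info = { false, false, false };

	switch (mode) {
	case 0:
		/* fcoe marks encapsulation; fcoe_fc_crc_ok is the status. */
		info.encap = fcoe;
		info.outer_csum_ok = fcoe && fcoe_fc_crc_ok;
		break;
	case 2:
		/* BIT(0): encap, BIT(1): outer IP ok, BIT(2): outer L4 ok. */
		info.encap = rss_hash & (1u << 0);
		info.outer_csum_ok = info.encap &&
				     (rss_hash & (1u << 1)) &&
				     (rss_hash & (1u << 2));
		break;
	default:
		break;
	}

	/* For encapsulated packets these flags describe the inner headers. */
	info.inner_csum_ok = ipv4_csum_ok && tcp_udp_csum_ok;
	return info;
}
```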

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic.h  |   6 ++
 drivers/net/ethernet/cisco/enic/enic_main.c | 156 +++-
 2 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 9023c858715d..2b23f46b34d3 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -135,6 +135,11 @@ struct enic_rfs_flw_tbl {
struct timer_list rfs_may_expire;
 };
 
+struct vxlan_offload {
+   u16 vxlan_udp_port_number;
+   u8 patch_level;
+};
+
 /* Per-instance private data structure */
 struct enic {
struct net_device *netdev;
@@ -175,6 +180,7 @@ struct enic {
/* receive queue cache line section */
cacheline_aligned struct vnic_rq rq[ENIC_RQ_MAX];
unsigned int rq_count;
+   struct vxlan_offload vxlan;
u64 rq_truncated_pkts;
u64 rq_bad_fcs;
struct napi_struct napi[ENIC_RQ_MAX + ENIC_WQ_MAX];
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index c009f6ddabf7..7e56bf95cfc7 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -45,6 +45,7 @@
 #endif
 #include 
 #include 
+#include 
 
 #include "cq_enet_desc.h"
 #include "vnic_dev.h"
@@ -176,6 +177,92 @@ static void enic_unset_affinity_hint(struct enic *enic)
irq_set_affinity_hint(enic->msix_entry[i].vector, NULL);
 }
 
+static void enic_udp_tunnel_add(struct net_device *netdev,
+   struct udp_tunnel_info *ti)
+{
+   struct enic *enic = netdev_priv(netdev);
+   __be16 port = ti->port;
+   int err;
+
+   spin_lock_bh(&enic->devcmd_lock);
+
+   if (ti->type != UDP_TUNNEL_TYPE_VXLAN) {
+   netdev_info(netdev, "udp_tnl: only vxlan tunnel offload supported");
+   goto error;
+   }
+
+   if (ti->sa_family != AF_INET) {
+   netdev_info(netdev, "vxlan: only IPv4 offload supported");
+   goto error;
+   }
+
+   if (enic->vxlan.vxlan_udp_port_number) {
+   if (ntohs(port) == enic->vxlan.vxlan_udp_port_number)
+   netdev_warn(netdev, "vxlan: udp port already offloaded");
+   else
+   netdev_info(netdev, "vxlan: offload supported for only one UDP port");
+
+   goto error;
+   }
+
+   err = vnic_dev_overlay_offload_cfg(enic->vdev,
+  OVERLAY_CFG_VXLAN_PORT_UPDATE,
+  ntohs(port));
+   if (err)
+   goto error;
+
+   err = vnic_dev_overlay_offload_ctrl(enic->vdev, OVERLAY_FEATURE_VXLAN,
+   enic->vxlan.patch_level);
+   if (err)
+   goto error;
+
+   enic->vxlan.vxlan_udp_port_number = ntohs(port);
+
+   netdev_info(netdev, "vxlan fw-vers-%d: offload enabled for udp port: %d, sa_family: %d ",
+   (int)enic->vxlan.patch_level, ntohs(port), ti->sa_family);
+
+   goto unlock;
+
+error:
+   netdev_info(netdev, "failed to offload udp port: %d, sa_family: %d, type: %d",
+   ntohs(port), ti->sa_family, ti->type);
+unlock:
+   spin_unlock_bh(&enic->devcmd_lock);
+}
+
+static void enic_udp_tunnel_del(struct net_device *netdev,
+   struct udp_tunnel_info *ti)
+{
+   struct enic *enic = netdev_priv(netdev);
+   int err;
+
+   spin_lock_bh(&enic->devcmd_lock);
+
+   if ((ti->sa_family != AF_INET) ||
+   ((ntohs(ti->port) != enic->vxlan.vxlan_udp_port_number)) ||
+   (ti->type != UDP_TUNNEL_TYPE_VXLAN)) {
+   netdev_info(netdev, "udp_tnl: port:%d, sa_family: %d, type: %d not offloaded",
+   ntohs(ti->port), ti->sa_family, ti->type);
+   goto unlock;
+   }
+
+   err = vnic_dev_overlay_offload_ctrl(enic->vdev, OVERLAY_FEATURE_VXLAN,
+   OVERLAY_OFFLOAD_DISABLE);
+   if (err) {
+   netdev_err(netdev, "vxlan: del offload udp port: %d failed",
+  ntohs(ti->port));
+   goto unlock;
+   }
+
+   enic->vxlan.vxlan_udp_port_number = 0;
+
+   netdev_info(netdev, "vxlan: del 

[PATCH 1/3 v2 net-next] enic: add devcmds for vxlan offload

2017-02-08 Thread Govindarajulu Varadarajan
This patch adds the devcmds needed for vxlan offload. It implements three new devcmds:

overlay_offload_ctrl: enable/disable offload
overlay_offload_cfg: update the offload udp port number
get_supported_feature_ver: get the offload versions supported by hw. Each
   version uses a different bitmap for csum_ok/encap

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/vnic_dev.c| 34 ++
 drivers/net/ethernet/cisco/enic/vnic_dev.h|  5 +++
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 51 +++
 drivers/net/ethernet/cisco/enic/vnic_enet.h   |  1 +
 4 files changed, 91 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c b/drivers/net/ethernet/cisco/enic/vnic_dev.c
index 8f27df3207bc..1841ad45d215 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
@@ -1247,3 +1247,37 @@ int vnic_dev_classifier(struct vnic_dev *vdev, u8 cmd, u16 *entry,
 
return ret;
 }
+
+int vnic_dev_overlay_offload_ctrl(struct vnic_dev *vdev, u8 overlay, u8 config)
+{
+   u64 a0 = overlay;
+   u64 a1 = config;
+   int wait = 1000;
+
+   return vnic_dev_cmd(vdev, CMD_OVERLAY_OFFLOAD_CTRL, &a0, &a1, wait);
+}
+
+int vnic_dev_overlay_offload_cfg(struct vnic_dev *vdev, u8 overlay,
+u16 vxlan_udp_port_number)
+{
+   u64 a1 = vxlan_udp_port_number;
+   u64 a0 = overlay;
+   int wait = 1000;
+
+   return vnic_dev_cmd(vdev, CMD_OVERLAY_OFFLOAD_CFG, &a0, &a1, wait);
+}
+
+int vnic_dev_get_supported_feature_ver(struct vnic_dev *vdev, u8 feature,
+  u64 *supported_versions)
+{
+   u64 a0 = feature;
+   int wait = 1000;
+   u64 a1 = 0;
+   int ret;
+
+   ret = vnic_dev_cmd(vdev, CMD_GET_SUPP_FEATURE_VER, &a0, &a1, wait);
+   if (!ret)
+   *supported_versions = a0;
+
+   return ret;
+}
diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.h b/drivers/net/ethernet/cisco/enic/vnic_dev.h
index 54156c484424..9d43d6bb9907 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.h
@@ -179,5 +179,10 @@ int vnic_dev_set_mac_addr(struct vnic_dev *vdev, u8 *mac_addr);
 int vnic_dev_classifier(struct vnic_dev *vdev, u8 cmd, u16 *entry,
struct filter *data);
 int vnic_devcmd_init(struct vnic_dev *vdev);
+int vnic_dev_overlay_offload_ctrl(struct vnic_dev *vdev, u8 overlay, u8 config);
+int vnic_dev_overlay_offload_cfg(struct vnic_dev *vdev, u8 overlay,
+u16 vxlan_udp_port_number);
+int vnic_dev_get_supported_feature_ver(struct vnic_dev *vdev, u8 feature,
+  u64 *supported_versions);
 
 #endif /* _VNIC_DEV_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
index 2a812880b884..d83880b0d468 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
@@ -406,6 +406,31 @@ enum vnic_devcmd_cmd {
 * in: (u32) a0=Queue Pair number
 */
CMD_QP_STATS_CLEAR = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 63),
+
+   /* Use this devcmd for agreeing on the highest common version supported
+* by both driver and fw for features who need such a facility.
+* in:  (u64) a0 = feature (driver requests for the supported versions
+*  on this feature)
+* out: (u64) a0 = bitmap of all supported versions for that feature
+*/
+   CMD_GET_SUPP_FEATURE_VER = _CMDC(_CMD_DIR_RW, _CMD_VTYPE_ENET, 69),
+
+   /* Control (Enable/Disable) overlay offloads on the given vnic
+* in: (u8) a0 = OVERLAY_FEATURE_NVGRE : NVGRE
+*  a0 = OVERLAY_FEATURE_VXLAN : VxLAN
+* in: (u8) a1 = OVERLAY_OFFLOAD_ENABLE : Enable or
+*  a1 = OVERLAY_OFFLOAD_DISABLE : Disable or
+*  a1 = OVERLAY_OFFLOAD_ENABLE_V2 : Enable with version 2
+*/
+   CMD_OVERLAY_OFFLOAD_CTRL = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 72),
+
+   /* Configuration of overlay offloads feature on a given vNIC
+* in: (u8) a0 = DEVCMD_OVERLAY_NVGRE : NVGRE
+*  a0 = DEVCMD_OVERLAY_VXLAN : VxLAN
+* in: (u8) a1 = VXLAN_PORT_UPDATE : VxLAN
+* in: (u16) a2 = unsigned short int port information
+*/
+   CMD_OVERLAY_OFFLOAD_CFG = _CMDC(_CMD_DIR_WRITE, _CMD_VTYPE_ENET, 73),
 };
 
 /* CMD_ENABLE2 flags */
@@ -657,4 +682,30 @@ struct devcmd2_result {
 #define DEVCMD2_RING_SIZE  32
 #define DEVCMD2_DESC_SIZE  128
 
+enum overlay_feature_t {
+   OVERLAY_FEATURE_NVGRE = 1,
+   OVERLAY_FEATURE_VXLAN,
+   OVERLAY_FEATURE_MAX,
+};
+
+enum overlay_ofld_cmd {
+   OVERLAY_OFFLOAD_ENABLE,
+   OVERLAY_OFFLOAD_DISABLE,
+   OVERLAY_OFFLOAD_ENABLE_P2,
+   OVERLAY_OFFLOAD_MAX,
+};
+
+#define 

[PATCH 0/3 v2 net-next] enic: add vxlan offload support

2017-02-08 Thread Govindarajulu Varadarajan
This series adds vxlan offload support for the enic driver. The first
patch adds the vxlan devcmds for configuring vxlan offload parameters.
The second patch adds ndo_udp_tunnel_add/del and offload on the rx path.
There are two modes in which fw supports vxlan offload.

mode 0: the fcoe bit is set for an encapsulated packet. fcoe_fc_crc_ok is set
if the checksum is OK; this bit is the OR of ip_csum_ok and
tcp_udp_csum_ok.

mode 2: BIT(0) in rss_hash is set if the packet is encapsulated.
BIT(1) is set if outer_ip_csum_ok.
BIT(2) is set if outer_tcp_csum_ok.

Some hw supports only mode 0; some supports both mode 0 and mode 2. The
driver gets the bitmap of supported modes using the get_supported_feature_ver
devcmd and selects the highest mode that both the driver and fw support.
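The selection step can be sketched as a bitmap intersection followed by a search for the highest set bit. This sketch assumes that bit N of the returned bitmap means "version N is supported". The function name is hypothetical and is not taken from the enic driver.

```c
#include <stdint.h>

/* Hypothetical helper: return the highest version set in both bitmaps
 * (bit N set = version N supported), or -1 if there is none in common. */
static int highest_common_version(uint64_t fw_bitmap, uint64_t drv_bitmap)
{
	uint64_t common = fw_bitmap & drv_bitmap;
	int v;

	for (v = 63; v >= 0; v--)
		if (common & (UINT64_C(1) << v))
			return v;
	return -1;
}
```

For example, a driver that understands modes 0 and 2 would pass 0x5. If the hardware reports 0x1, the result is mode 0. If it reports 0x5, the result is mode 2.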

The third patch adds offload support on the tx path by adding
enic_features_check().

v2: Order local variable declarations from longest to shortest line,
on all three patches.

Govindarajulu Varadarajan (3):
  enic: add devcmds for vxlan offload
  enic: add udp_tunnel ndo for vxlan offload
  enic: add vxlan offload on tx path

 drivers/net/ethernet/cisco/enic/enic.h|   6 +
 drivers/net/ethernet/cisco/enic/enic_main.c   | 282 --
 drivers/net/ethernet/cisco/enic/vnic_dev.c|  34 
 drivers/net/ethernet/cisco/enic/vnic_dev.h|   5 +
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h |  51 +
 drivers/net/ethernet/cisco/enic/vnic_enet.h   |   1 +
 6 files changed, 364 insertions(+), 15 deletions(-)

-- 
2.11.0



[PATCH 3/3 v2 net-next] enic: add vxlan offload on tx path

2017-02-08 Thread Govindarajulu Varadarajan
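Define ndo_features_check. The hw supports offload only for packets with
an IPv4 outer header and an IPv4 inner header.

Refactor the code that preloads the TCP pseudo-header checksum, adding a
variant for the inner (encapsulated) TCP header.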
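The decision made by the features check can be modeled in isolation. Checksum and GSO offload are kept only for an encapsulated IPv4-in-IPv4 packet whose outer UDP destination port is the single offloaded port. Otherwise, those feature bits are cleared. This is a simplified sketch: the packet descriptor struct and the two mask values are hypothetical stand-ins for the skb and for NETIF_F_CSUM_MASK / NETIF_F_GSO_MASK. It also rejects non-UDP outer protocols explicitly instead of relying on the port comparison alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NETIF_F_CSUM_MASK and NETIF_F_GSO_MASK. */
#define FEAT_CSUM_MASK	0x0000000Fu
#define FEAT_GSO_MASK	0x0000FF00u

#define ETH_TYPE_IPV4	0x0800
#define IP_PROTO_UDP	17

/* Hypothetical, simplified description of an outgoing packet. */
struct pkt_desc {
	bool encapsulated;
	uint16_t outer_ethertype;
	uint16_t inner_ethertype;
	uint8_t outer_ip_proto;
	uint16_t outer_udp_dport;	/* host byte order */
};

static uint32_t features_check(const struct pkt_desc *p, uint32_t features,
			       uint16_t offloaded_port)
{
	/* Non-tunneled traffic is unaffected. */
	if (!p->encapsulated)
		return features;

	/* Only IPv4 outer, IPv4 inner, UDP, on the one offloaded port. */
	if (p->outer_ethertype != ETH_TYPE_IPV4 ||
	    p->inner_ethertype != ETH_TYPE_IPV4 ||
	    p->outer_ip_proto != IP_PROTO_UDP ||
	    p->outer_udp_dport != offloaded_port)
		return features & ~(FEAT_CSUM_MASK | FEAT_GSO_MASK);

	return features;
}
```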

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 126 +---
 1 file changed, 114 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 7e56bf95cfc7..4b87beeabce1 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -263,6 +263,48 @@ static void enic_udp_tunnel_del(struct net_device *netdev,
spin_unlock_bh(&enic->devcmd_lock);
 }
 
+static netdev_features_t enic_features_check(struct sk_buff *skb,
+struct net_device *dev,
+netdev_features_t features)
+{
+   const struct ethhdr *eth = (struct ethhdr *)skb_inner_mac_header(skb);
+   struct enic *enic = netdev_priv(dev);
+   struct udphdr *udph;
+   u16 port = 0;
+   u16 proto;
+
+   if (!skb->encapsulation)
+   return features;
+
+   features = vxlan_features_check(skb, features);
+
+   /* hardware only supports IPv4 vxlan tunnel */
+   if (vlan_get_protocol(skb) != htons(ETH_P_IP))
+   goto out;
+
+   /* hardware does not support offload of ipv6 inner pkt */
+   if (eth->h_proto != ntohs(ETH_P_IP))
+   goto out;
+
+   proto = ip_hdr(skb)->protocol;
+
+   if (proto == IPPROTO_UDP) {
+   udph = udp_hdr(skb);
+   port = be16_to_cpu(udph->dest);
+   }
+
+   /* HW supports offload of only one UDP port. Remove CSUM and GSO MASK
+* for other UDP port tunnels
+*/
+   if (port  != enic->vxlan.vxlan_udp_port_number)
+   goto out;
+
+   return features;
+
+out:
+   return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
+}
+
 int enic_is_dynamic(struct enic *enic)
 {
return enic->pdev->device == PCI_DEVICE_ID_CISCO_VIC_ENET_DYN;
@@ -591,20 +633,19 @@ static int enic_queue_wq_skb_csum_l4(struct enic *enic, struct vnic_wq *wq,
return err;
 }
 
-static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq,
-struct sk_buff *skb, unsigned int mss,
-int vlan_tag_insert, unsigned int vlan_tag,
-int loopback)
+static void enic_preload_tcp_csum_encap(struct sk_buff *skb)
 {
-   unsigned int frag_len_left = skb_headlen(skb);
-   unsigned int len_left = skb->len - frag_len_left;
-   unsigned int hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
-   int eop = (len_left == 0);
-   unsigned int len;
-   dma_addr_t dma_addr;
-   unsigned int offset = 0;
-   skb_frag_t *frag;
+   if (skb->protocol == cpu_to_be16(ETH_P_IP)) {
+   inner_ip_hdr(skb)->check = 0;
+   inner_tcp_hdr(skb)->check =
+   ~csum_tcpudp_magic(inner_ip_hdr(skb)->saddr,
+  inner_ip_hdr(skb)->daddr, 0,
+  IPPROTO_TCP, 0);
+   }
+}
 
+static void enic_preload_tcp_csum(struct sk_buff *skb)
+{
/* Preload TCP csum field with IP pseudo hdr calculated
 * with IP length set to zero.  HW will later add in length
 * to each TCP segment resulting from the TSO.
@@ -618,6 +659,30 @@ static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq,
tcp_hdr(skb)->check = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
&ipv6_hdr(skb)->daddr, 0, IPPROTO_TCP, 0);
}
+}
+
+static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq,
+struct sk_buff *skb, unsigned int mss,
+int vlan_tag_insert, unsigned int vlan_tag,
+int loopback)
+{
+   unsigned int frag_len_left = skb_headlen(skb);
+   unsigned int len_left = skb->len - frag_len_left;
+   int eop = (len_left == 0);
+   unsigned int offset = 0;
+   unsigned int hdr_len;
+   dma_addr_t dma_addr;
+   unsigned int len;
+   skb_frag_t *frag;
+
+   if (skb->encapsulation) {
+   hdr_len = skb_inner_transport_header(skb) - skb->data;
+   hdr_len += inner_tcp_hdrlen(skb);
+   enic_preload_tcp_csum_encap(skb);
+   } else {
+   hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
+   enic_preload_tcp_csum(skb);
+   }
 
/* Queue WQ_ENET_MAX_DESC_LEN length descriptors
 * for the main skb fragment
@@ -666,6 +731,38 @@ static int enic_queue_wq_skb_tso(struct enic *enic, struct vnic_wq *wq,
return 0;
 }
 
+static inline int enic_queue_wq_skb_encap(struct enic *enic, struct vnic_wq *wq,
+  

[PATCH RFC v2 7/8] bnxt: Changes to use generic XDP infrastructure

2017-02-08 Thread Tom Herbert
Change the XDP program management functional interface to correspond to the new
XDP API.

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 14 
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 46 +++
 3 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index cda1c78..ce311fb 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2091,9 +2091,6 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
struct bnxt_ring_struct *ring;
 
-   if (rxr->xdp_prog)
-   bpf_prog_put(rxr->xdp_prog);
-
kfree(rxr->rx_tpa);
rxr->rx_tpa = NULL;
 
@@ -2381,15 +2378,6 @@ static int bnxt_init_one_rx_ring(struct bnxt *bp, int ring_nr)
ring = &rxr->rx_ring_struct;
bnxt_init_rxbd_pages(ring, type);
 
-   if (BNXT_RX_PAGE_MODE(bp) && bp->xdp_prog) {
-   rxr->xdp_prog = bpf_prog_add(bp->xdp_prog, 1);
-   if (IS_ERR(rxr->xdp_prog)) {
-   int rc = PTR_ERR(rxr->xdp_prog);
-
-   rxr->xdp_prog = NULL;
-   return rc;
-   }
-   }
prod = rxr->rx_prod;
for (i = 0; i < bp->rx_ring_size; i++) {
if (bnxt_alloc_rx_data(bp, rxr, prod, GFP_KERNEL) != 0) {
@@ -7157,8 +7145,6 @@ static void bnxt_remove_one(struct pci_dev *pdev)
pci_iounmap(pdev, bp->bar0);
kfree(bp->edev);
bp->edev = NULL;
-   if (bp->xdp_prog)
-   bpf_prog_put(bp->xdp_prog);
free_netdev(dev);
 
pci_release_regions(pdev);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 9f07b9c..3efe7af 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1175,7 +1175,7 @@ struct bnxt {
u8  num_leds;
struct bnxt_led_infoleds[BNXT_MAX_LED];
 
-   struct bpf_prog *xdp_prog;
+   boolxdp_enabled;
 };
 
 #define BNXT_RX_STATS_OFFSET(counter)  \
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 899c30f..3cfdc94 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -85,18 +85,18 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 struct page *page, u8 **data_ptr, unsigned int *len, u8 *event)
 {
-   struct bpf_prog *xdp_prog = READ_ONCE(rxr->xdp_prog);
struct bnxt_tx_ring_info *txr;
struct bnxt_sw_rx_bd *rx_buf;
struct pci_dev *pdev;
struct xdp_buff xdp;
+   struct xdp_hook *last_hook;
dma_addr_t mapping;
void *orig_data;
u32 tx_avail;
u32 offset;
u32 act;
 
-   if (!xdp_prog)
+   if (!xdp_hook_run_needed_check(bp->dev, &rxr->bnapi->napi))
return false;
 
pdev = bp->pdev;
@@ -113,7 +113,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
dma_sync_single_for_cpu(&pdev->dev, mapping + offset, *len, bp->rx_dir);
 
rcu_read_lock();
-   act = bpf_prog_run_xdp(xdp_prog, &xdp);
+   act = xdp_hook_run_ret_last(&rxr->bnapi->napi, &xdp, &last_hook);
rcu_read_unlock();
 
tx_avail = bnxt_tx_avail(bp, txr);
@@ -134,7 +134,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 
case XDP_TX:
if (tx_avail < 2) {
-   trace_xdp_exception(bp->dev, xdp_prog, act);
+   trace_xdp_hook_exception(bp->dev, last_hook, act);
bnxt_reuse_rx_data(rxr, cons, page);
return true;
}
@@ -147,10 +147,10 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
bnxt_reuse_rx_data(rxr, cons, page);
return true;
default:
-   bpf_warn_invalid_xdp_action(act);
+   xdp_warn_invalid_action(act);
/* Fall thru */
case XDP_ABORTED:
-   trace_xdp_exception(bp->dev, xdp_prog, act);
+   trace_xdp_hook_exception(bp->dev, last_hook, act);
/* Fall thru */
case XDP_DROP:
bnxt_reuse_rx_data(rxr, cons, page);
@@ -160,13 +160,15 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 }
 
 /* Under rtnl_lock */
-static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog)
+static int bnxt_xdp_init(struct bnxt 

[PATCH RFC v2 2/8] mlx4: Changes to use generic XDP infrastructure

2017-02-08 Thread Tom Herbert
Change the XDP program management functional interface to correspond to the new
XDP API.

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 92 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 27 
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |  1 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
 4 files changed, 29 insertions(+), 92 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 748e9f6..613786a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -2195,8 +2196,7 @@ int mlx4_en_try_alloc_resources(struct mlx4_en_priv *priv,
struct mlx4_en_port_profile *prof,
bool carry_xdp_prog)
 {
-   struct bpf_prog *xdp_prog;
-   int i, t;
+   int t;
 
mlx4_en_copy_priv(tmp, priv, prof);
 
@@ -2211,22 +2211,6 @@ int mlx4_en_try_alloc_resources(struct mlx4_en_priv *priv,
return -ENOMEM;
}
 
-   /* All rx_rings has the same xdp_prog.  Pick the first one. */
-   xdp_prog = rcu_dereference_protected(
-   priv->rx_ring[0]->xdp_prog,
-   lockdep_is_held(&priv->mdev->state_lock));
-
-   if (xdp_prog && carry_xdp_prog) {
-   xdp_prog = bpf_prog_add(xdp_prog, tmp->rx_ring_num);
-   if (IS_ERR(xdp_prog)) {
-   mlx4_en_free_resources(tmp);
-   return PTR_ERR(xdp_prog);
-   }
-   for (i = 0; i < tmp->rx_ring_num; i++)
-   rcu_assign_pointer(tmp->rx_ring[i]->xdp_prog,
-  xdp_prog);
-   }
-
return 0;
 }
 
@@ -2713,42 +2697,20 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m
return err;
 }
 
-static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+static int mlx4_xdp_init(struct net_device *dev, bool enable)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
struct mlx4_en_dev *mdev = priv->mdev;
struct mlx4_en_port_profile new_prof;
-   struct bpf_prog *old_prog;
struct mlx4_en_priv *tmp;
int tx_changed = 0;
-   int xdp_ring_num;
int port_up = 0;
-   int err;
-   int i;
+   int xdp_ring_num, err;
 
-   xdp_ring_num = prog ? priv->rx_ring_num : 0;
+   xdp_ring_num = enable ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0;
 
-   /* No need to reconfigure buffers when simply swapping the
-* program for a new one.
-*/
-   if (priv->tx_ring_num[TX_XDP] == xdp_ring_num) {
-   if (prog) {
-   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
-   if (IS_ERR(prog))
-   return PTR_ERR(prog);
-   }
-   mutex_lock(&mdev->state_lock);
-   for (i = 0; i < priv->rx_ring_num; i++) {
-   old_prog = rcu_dereference_protected(
-   priv->rx_ring[i]->xdp_prog,
-   lockdep_is_held(&mdev->state_lock));
-   rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog);
-   if (old_prog)
-   bpf_prog_put(old_prog);
-   }
-   mutex_unlock(&mdev->state_lock);
+   if (priv->tx_ring_num[TX_XDP] == xdp_ring_num)
return 0;
-   }
 
if (!mlx4_en_check_xdp_mtu(dev, dev->mtu))
return -EOPNOTSUPP;
@@ -2757,14 +2719,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
if (!tmp)
return -ENOMEM;
 
-   if (prog) {
-   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
-   if (IS_ERR(prog)) {
-   err = PTR_ERR(prog);
-   goto out;
-   }
-   }
-
mutex_lock(&mdev->state_lock);
memcpy(&new_prof, priv->prof, sizeof(struct mlx4_en_port_profile));
new_prof.tx_ring_num[TX_XDP] = xdp_ring_num;
@@ -2777,11 +2731,8 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
}
 
err = mlx4_en_try_alloc_resources(priv, tmp, &new_prof, false);
-   if (err) {
-   if (prog)
-   bpf_prog_sub(prog, priv->rx_ring_num - 1);
+   if (err)
goto unlock_out;
-   }
 
if (priv->port_up) {
port_up = 1;
@@ -2792,15 +2743,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
if (tx_changed)
netif_set_real_num_tx_queues(dev, priv->tx_ring_num[TX]);
 
-   for (i = 0; i < priv->rx_ring_num; i++) {
-   old_prog = 

RE: [net] net: phy: Fix lack of reference count on PHY driver

2017-02-08 Thread maowenan

> -Original Message-
> From: Andrew Lunn [mailto:and...@lunn.ch]
> Sent: Thursday, February 09, 2017 12:24 AM
> To: Robin Murphy
> Cc: Florian Fainelli; netdev@vger.kernel.org; da...@davemloft.net;
> rmk+ker...@armlinux.org.uk; maowenan; Catalin Marinas
> Subject: Re: [net] net: phy: Fix lack of reference count on PHY driver
> 
> On Wed, Feb 08, 2017 at 04:03:43PM +, Robin Murphy wrote:
> > Hi all,
> >
> > We're seeing a new boot-time crash[1] on SMSC911x hardware from this
> > patch in today's HEAD (as cafe8df8b9bc)...
> 
> Hi Robin
> 
> Thanks for the report. See the discussion on netdev under the subject "Kernel
> crashes in phy_attach_direct()"
> 
> Andrew


There is a bug report from Dan Carpenter (dan.carpen...@oracle.com), who ran
static analysis and found this issue. Thanks a lot.
[bug report] net: phy: Fix lack of reference count on PHY driver


[PATCH RFC v2 5/8] virt_net: Changes to use generic XDP infrastructure

2017-02-08 Thread Tom Herbert
Change the XDP program management functional interface to correspond to the new
XDP API.

Signed-off-by: Tom Herbert 
---
 drivers/net/virtio_net.c | 98 +++-
 1 file changed, 38 insertions(+), 60 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11e2853..e8b1747 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -93,8 +93,6 @@ struct receive_queue {
 
struct napi_struct napi;
 
-   struct bpf_prog __rcu *xdp_prog;
-
/* Chain pages by the private ptr. */
struct page *pages;
 
@@ -140,6 +138,9 @@ struct virtnet_info {
/* Host can handle any s/g split between our header and packet data */
bool any_header_sg;
 
+   /* XDP has been enabled in device */
+   bool xdp_enabled;
+
/* Packet virtio header size */
u8 hdr_len;
 
@@ -414,13 +415,12 @@ static struct sk_buff *receive_small(struct net_device *dev,
 void *buf, unsigned int len)
 {
struct sk_buff * skb = buf;
-   struct bpf_prog *xdp_prog;
+   struct xdp_hook *last_hook;
 
len -= vi->hdr_len;
 
rcu_read_lock();
-   xdp_prog = rcu_dereference(rq->xdp_prog);
-   if (xdp_prog) {
+   if (xdp_hook_run_needed_check(dev, &rq->napi)) {
struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
struct xdp_buff xdp;
u32 act;
@@ -431,8 +431,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
xdp.data_hard_start = skb->data;
xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
xdp.data_end = xdp.data + len;
-   act = bpf_prog_run_xdp(xdp_prog, &xdp);
-
+   act = xdp_hook_run_ret_last(&rq->napi, &xdp, &last_hook);
switch (act) {
case XDP_PASS:
/* Recalculate length in case bpf program changed it */
@@ -441,13 +440,13 @@ static struct sk_buff *receive_small(struct net_device *dev,
break;
case XDP_TX:
if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
-   trace_xdp_exception(vi->dev, xdp_prog, act);
+   trace_xdp_hook_exception(vi->dev, last_hook, act);
rcu_read_unlock();
goto xdp_xmit;
default:
-   bpf_warn_invalid_xdp_action(act);
+   xdp_warn_invalid_action(act);
case XDP_ABORTED:
-   trace_xdp_exception(vi->dev, xdp_prog, act);
+   trace_xdp_hook_exception(vi->dev, last_hook, act);
case XDP_DROP:
goto err_xdp;
}
@@ -559,16 +558,15 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
struct page *page = virt_to_head_page(buf);
int offset = buf - page_address(page);
struct sk_buff *head_skb, *curr_skb;
-   struct bpf_prog *xdp_prog;
unsigned int truesize;
 
head_skb = NULL;
 
rcu_read_lock();
-   xdp_prog = rcu_dereference(rq->xdp_prog);
-   if (xdp_prog) {
+   if (xdp_hook_run_needed_check(dev, &rq->napi)) {
struct page *xdp_page;
struct xdp_buff xdp;
+   struct xdp_hook *last_hook;
void *data;
u32 act;
 
@@ -599,7 +597,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
xdp.data = data + vi->hdr_len;
xdp.data_end = xdp.data + (len - vi->hdr_len);
-   act = bpf_prog_run_xdp(xdp_prog, &xdp);
+   act = xdp_hook_run_ret_last(&rq->napi, &xdp, &last_hook);
 
switch (act) {
case XDP_PASS:
@@ -622,16 +620,16 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
break;
case XDP_TX:
if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, data)))
-   trace_xdp_exception(vi->dev, xdp_prog, act);
+   trace_xdp_hook_exception(vi->dev, last_hook, act);
ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
if (unlikely(xdp_page != page))
goto err_xdp;
rcu_read_unlock();
goto xdp_xmit;
default:
-   bpf_warn_invalid_xdp_action(act);
+   xdp_warn_invalid_action(act);
case XDP_ABORTED:
-   trace_xdp_exception(vi->dev, xdp_prog, act);
+   trace_xdp_hook_exception(vi->dev, last_hook, act);
case XDP_DROP:
if (unlikely(xdp_page != page))
  

Re: [PATCH net-next v2 3/3] net: ethernet: bgmac: driver power management

2017-02-08 Thread Florian Fainelli
On 02/08/2017 01:24 PM, Jon Mason wrote:
> From: Joey Zhong 
> 
> Implement suspend/resume callbacks in the bgmac driver. This makes sure
> that we de-initialize and re-initialize the hardware correctly before
> entering suspend and when resuming.
> 
> Signed-off-by: Joey Zhong 
> Signed-off-by: Jon Mason 
> ---
>  drivers/net/ethernet/broadcom/bgmac-platform.c | 34 +
>  drivers/net/ethernet/broadcom/bgmac.c  | 51 ++
>  drivers/net/ethernet/broadcom/bgmac.h  |  2 +
>  3 files changed, 87 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c b/drivers/net/ethernet/broadcom/bgmac-platform.c
> index 2d153f7..3df91e7 100644
> --- a/drivers/net/ethernet/broadcom/bgmac-platform.c
> +++ b/drivers/net/ethernet/broadcom/bgmac-platform.c
> @@ -21,8 +21,12 @@
>  #include 
>  #include "bgmac.h"
>  
> +#define NICPM_PADRING_CFG0x0004
>  #define NICPM_IOMUX_CTRL 0x0008
>  
> +#define NICPM_PADRING_CFG_INIT_VAL   0x7400
> +#define NICPM_IOMUX_CTRL_INIT_VAL_AX 0x2188
> +
>  #define NICPM_IOMUX_CTRL_INIT_VAL0x3196e000
>  #define NICPM_IOMUX_CTRL_SPD_SHIFT   10
>  #define NICPM_IOMUX_CTRL_SPD_10M 0
> @@ -108,6 +112,10 @@ static void bgmac_nicpm_speed_set(struct net_device *net_dev)
>   if (!bgmac->plat.nicpm_base)
>   return;
>  
> + /* SET RGMII IO CONFIG */
> + writel(NICPM_PADRING_CFG_INIT_VAL,
> +bgmac->plat.nicpm_base + NICPM_PADRING_CFG);
> +
>   val = NICPM_IOMUX_CTRL_INIT_VAL;
>   switch (bgmac->net_dev->phydev->speed) {
>   default:
> @@ -239,6 +247,31 @@ static int bgmac_remove(struct platform_device *pdev)
>   return 0;
>  }
>  
> +#ifdef CONFIG_PM
> +static int bgmac_suspend(struct device *dev)
> +{
> + struct bgmac *bgmac = dev_get_drvdata(dev);
> +
> + return bgmac_enet_suspend(bgmac);
> +}
> +
> +static int bgmac_resume(struct device *dev)
> +{
> + struct bgmac *bgmac = dev_get_drvdata(dev);
> +
> + return bgmac_enet_resume(bgmac);
> +}
> +
> +static const struct dev_pm_ops bgmac_pm_ops = {
> + .suspend = bgmac_suspend,
> + .resume = bgmac_resume
> +};
> +
> +#define BGMAC_PM_OPS (&bgmac_pm_ops)
> +#else
> +#define BGMAC_PM_OPS NULL
> +#endif /* CONFIG_PM */
> +
>  static const struct of_device_id bgmac_of_enet_match[] = {
>   {.compatible = "brcm,amac",},
>   {.compatible = "brcm,nsp-amac",},
> @@ -252,6 +285,7 @@ static struct platform_driver bgmac_enet_driver = {
>   .driver = {
>   .name  = "bgmac-enet",
>   .of_match_table = bgmac_of_enet_match,
> + .pm = BGMAC_PM_OPS
>   },
>   .probe = bgmac_probe,
>   .remove = bgmac_remove,
> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
> index bd549f8..e78c91d 100644
> --- a/drivers/net/ethernet/broadcom/bgmac.c
> +++ b/drivers/net/ethernet/broadcom/bgmac.c
> @@ -1478,6 +1478,7 @@ int bgmac_enet_probe(struct bgmac *bgmac)
>  
>   net_dev->irq = bgmac->irq;
>   SET_NETDEV_DEV(net_dev, bgmac->dev);
> + dev_set_drvdata(bgmac->dev, bgmac);
>  
>   if (!is_valid_ether_addr(bgmac->mac_addr)) {
>   dev_err(bgmac->dev, "Invalid MAC addr: %pM\n",
> @@ -1551,5 +1552,55 @@ void bgmac_enet_remove(struct bgmac *bgmac)
>  }
>  EXPORT_SYMBOL_GPL(bgmac_enet_remove);
>  
> +int bgmac_enet_suspend(struct bgmac *bgmac)
> +{
> + if (!netif_running(bgmac->net_dev))
> + return 0;
> +
> + phy_stop(bgmac->net_dev->phydev);
> +
> + netif_stop_queue(bgmac->net_dev);
> +
> + napi_disable(&bgmac->napi);
> +
> + netif_tx_lock(bgmac->net_dev);
> + netif_device_detach(bgmac->net_dev);
> + netif_tx_unlock(bgmac->net_dev);
> +
> + bgmac_chip_intrs_off(bgmac);
> + bgmac_chip_reset(bgmac);
> + bgmac_dma_cleanup(bgmac);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(bgmac_enet_suspend);
> +
> +int bgmac_enet_resume(struct bgmac *bgmac)
> +{
> + int rc;
> +
> + if (netif_running(bgmac->net_dev))
> + return 0;

This should be if (!netif_running()) here: if it is running, we need to
do all of what is below.
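The symmetry being asked for can be shown with a toy model. Both callbacks bail out only when the interface is down, so the work is undone on suspend and redone on resume. Detaching the device does not clear the interface's running state, so resume still sees it as running. The struct and function names below are hypothetical, and the single flag stands in for the real stop/reset and init sequences.

```c
#include <stdbool.h>

/* Hypothetical device model: 'running' mirrors netif_running(),
 * 'hw_up' records whether DMA/chip state is currently initialised. */
struct fake_dev {
	bool running;
	bool hw_up;
};

static int fake_suspend(struct fake_dev *d)
{
	if (!d->running)
		return 0;	/* interface down: nothing to tear down */
	d->hw_up = false;	/* stands in for phy_stop, reset, DMA cleanup */
	return 0;
}

static int fake_resume(struct fake_dev *d)
{
	if (!d->running)	/* same guard as suspend */
		return 0;
	d->hw_up = true;	/* stands in for DMA init, chip init, phy_start */
	return 0;
}
```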

With that fixed:

Reviewed-by: Florian Fainelli 

> +
> + rc = bgmac_dma_init(bgmac);
> + if (rc)
> + return rc;
> +
> + bgmac_chip_init(bgmac);
> +
> + napi_enable(&bgmac->napi);
> +
> + netif_tx_lock(bgmac->net_dev);
> + netif_device_attach(bgmac->net_dev);
> + netif_tx_unlock(bgmac->net_dev);
> +
> + netif_start_queue(bgmac->net_dev);
> +
> + phy_start(bgmac->net_dev->phydev);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(bgmac_enet_resume);
> +
>  MODULE_AUTHOR("Rafał Miłecki");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h
> index 5a518fe..741ca27 100644
> --- 

[PATCH net v3 0/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Florian Fainelli
Hi all,

This patch series addresses the crash seen with the Generic PHY driver
in phy_attach_direct() introduced in the latest pull to Linus.

We also address how to properly bind and unbind PHY drivers, which would
previously crash in flames since we did not stop the state machine.

Thanks!

Changes in v3:

- did more testing as module/built-in, with Generic and non-Generic PHY drivers
- exercised error paths on purpose by injecting errors
- properly increment the Generic PHY module reference count as well
- fixed the error path to be correct

Changes in v2:

- fixed "net: phy: Fix lack of reference count on PHY driver" against
  the Generic PHY driver, which is special

Florian Fainelli (3):
  net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
  net: phy: Check phydev->drv
  net: phy: Fix PHY driver bind and unbind events

 drivers/net/phy/phy.c| 26 +
 drivers/net/phy/phy_device.c | 45 
 include/linux/phy.h  |  3 +++
 3 files changed, 66 insertions(+), 8 deletions(-)

-- 
2.9.3



[PATCH net v3 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Florian Fainelli
The Generic PHY driver gets assigned after we have checked that the current
PHY driver is NULL, so we need to check a few things before we can
safely dereference d->driver. This would cause a NULL dereference to
occur when a system binds to the Generic PHY driver. Update
phy_attach_direct() to do the following:

- grab the driver module reference after we have assigned the Generic
  PHY drivers accordingly

- update the error path to clean up the module reference in case the
  Generic PHY probe function fails

Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy_device.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0d8f4d3847f6..d63d190a95ef 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
struct module *ndev_owner = dev->dev.parent->driver->owner;
struct mii_bus *bus = phydev->mdio.bus;
struct device *d = &phydev->mdio.dev;
+   bool using_genphy = false;
int err;
 
/* For Ethernet device drivers that register their own MDIO bus, we
@@ -938,12 +939,22 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
d->driver =
&genphy_driver[GENPHY_DRV_1G].mdiodrv.driver;
 
+   using_genphy = true;
+   }
+
+   if (!try_module_get(d->driver->owner)) {
+   dev_err(&dev->dev, "failed to get the device driver module\n");
+   err = -EIO;
+   goto error_put_device;
+   }
+
+   if (using_genphy) {
err = d->driver->probe(d);
if (err >= 0)
err = device_bind_driver(d);
 
if (err)
-   goto error;
+   goto error_module_put;
}
 
if (phydev->attached_dev) {
@@ -981,6 +992,9 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 error:
phy_detach(phydev);
+error_module_put:
+   module_put(d->driver->owner);
+error_put_device:
put_device(d);
module_put(d->driver->owner);
if (ndev_owner != bus->owner)
-- 
2.9.3
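The error handling in phy_attach_direct() above relies on the usual kernel goto-unwind ladder. As a hedged illustration only, this standalone sketch models the device and module references with plain counters (`struct refs` and `attach_sketch()` are hypothetical; only the label names come from the patch) and shows the invariant such a ladder is meant to keep: each label releases only what was acquired before the jump to it, so every reference is dropped exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for get_device()/put_device()
 * and try_module_get()/module_put(). */
struct refs {
	int device;
	int module;
};

static int attach_sketch(struct refs *r, bool module_ok, bool probe_ok)
{
	int err;

	r->device++;                 /* device reference taken first */

	if (!module_ok) {            /* try_module_get() failed */
		err = -EIO;
		goto error_put_device;   /* no module ref held: skip module_put */
	}
	r->module++;                 /* module reference now held */

	if (!probe_ok) {             /* hypothetical genphy probe/bind failure */
		err = -ENODEV;
		goto error_module_put;
	}
	return 0;                    /* success: caller keeps both references */

error_module_put:
	r->module--;                 /* module_put(), exactly once */
error_put_device:
	r->device--;                 /* put_device() */
	return err;
}
```

Any extra release placed after the last label would run on every error path and unbalance the counters.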



[PATCH net v3 2/3] net: phy: Check phydev->drv

2017-02-08 Thread Florian Fainelli
In preparation for supporting driver bind/unbind properly, sprinkle checks on
phydev->drv where we may call into PHYLIB from user-space or other parts of the
kernel.

Suggested-by: Russell King 
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy.c| 26 ++
 drivers/net/phy/phy_device.c |  2 +-
 include/linux/phy.h  |  3 +++
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 7cc1b7dcfe05..d6f7838455dd 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -580,7 +580,7 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd)
return 0;
 
case SIOCSHWTSTAMP:
-   if (phydev->drv->hwtstamp)
+   if (phydev->drv && phydev->drv->hwtstamp)
return phydev->drv->hwtstamp(phydev, ifr);
/* fall through */
 
@@ -603,6 +603,9 @@ int phy_start_aneg(struct phy_device *phydev)
 {
int err;
 
+   if (!phydev->drv)
+   return -EIO;
+
mutex_lock(&phydev->lock);
 
if (AUTONEG_DISABLE == phydev->autoneg)
@@ -975,7 +978,7 @@ void phy_state_machine(struct work_struct *work)
 
old_state = phydev->state;
 
-   if (phydev->drv->link_change_notify)
+   if (phydev->drv && phydev->drv->link_change_notify)
phydev->drv->link_change_notify(phydev);
 
switch (phydev->state) {
@@ -1286,6 +1289,9 @@ EXPORT_SYMBOL(phy_write_mmd_indirect);
  */
 int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable)
 {
+   if (!phydev->drv)
+   return -EIO;
+
/* According to 802.3az,the EEE is supported only in full duplex-mode.
 * Also EEE feature is active when core is operating with MII, GMII
 * or RGMII (all kinds). Internal PHYs are also allowed to proceed and
@@ -1363,6 +1369,9 @@ EXPORT_SYMBOL(phy_init_eee);
  */
 int phy_get_eee_err(struct phy_device *phydev)
 {
+   if (!phydev->drv)
+   return -EIO;
+
return phy_read_mmd_indirect(phydev, MDIO_PCS_EEE_WK_ERR, MDIO_MMD_PCS);
 }
 EXPORT_SYMBOL(phy_get_eee_err);
@@ -1379,6 +1388,9 @@ int phy_ethtool_get_eee(struct phy_device *phydev, struct ethtool_eee *data)
 {
int val;
 
+   if (!phydev->drv)
+   return -EIO;
+
/* Get Supported EEE */
val = phy_read_mmd_indirect(phydev, MDIO_PCS_EEE_ABLE, MDIO_MMD_PCS);
if (val < 0)
@@ -1412,6 +1424,9 @@ int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data)
 {
int val = ethtool_adv_to_mmd_eee_adv_t(data->advertised);
 
+   if (!phydev->drv)
+   return -EIO;
+
/* Mask prohibited EEE modes */
val &= ~phydev->eee_broken_modes;
 
@@ -1423,7 +1438,7 @@ EXPORT_SYMBOL(phy_ethtool_set_eee);
 
 int phy_ethtool_set_wol(struct phy_device *phydev, struct ethtool_wolinfo *wol)
 {
-   if (phydev->drv->set_wol)
+   if (phydev->drv && phydev->drv->set_wol)
return phydev->drv->set_wol(phydev, wol);
 
return -EOPNOTSUPP;
@@ -1432,7 +1447,7 @@ EXPORT_SYMBOL(phy_ethtool_set_wol);
 
 void phy_ethtool_get_wol(struct phy_device *phydev, struct ethtool_wolinfo *wol)
 {
-   if (phydev->drv->get_wol)
+   if (phydev->drv && phydev->drv->get_wol)
phydev->drv->get_wol(phydev, wol);
 }
 EXPORT_SYMBOL(phy_ethtool_get_wol);
@@ -1468,6 +1483,9 @@ int phy_ethtool_nway_reset(struct net_device *ndev)
if (!phydev)
return -ENODEV;
 
+   if (!phydev->drv)
+   return -EIO;
+
return genphy_restart_aneg(phydev);
 }
 EXPORT_SYMBOL(phy_ethtool_nway_reset);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d63d190a95ef..40675b9706ae 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1790,7 +1790,7 @@ static int phy_remove(struct device *dev)
phydev->state = PHY_DOWN;
mutex_unlock(&phydev->lock);
 
-   if (phydev->drv->remove)
+   if (phydev->drv && phydev->drv->remove)
phydev->drv->remove(phydev);
phydev->drv = NULL;
 
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7fc1105605bf..231e07bb0d76 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -802,6 +802,9 @@ int phy_stop_interrupts(struct phy_device *phydev);
 
 static inline int phy_read_status(struct phy_device *phydev)
 {
+   if (!phydev->drv)
+   return -EIO;
+
return phydev->drv->read_status(phydev);
 }
 
-- 
2.9.3
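The pattern this patch applies is to test `phydev->drv` before dereferencing it: return `-EIO` where a bound driver is required, and fall back to `-EOPNOTSUPP` where the callback is optional. A minimal sketch of both cases, using hypothetical cut-down structs rather than the real `struct phy_device`/`struct phy_driver`:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down driver and device structures. */
struct drv_sketch {
	int (*read_status)(void *dev);      /* mandatory callback */
	int (*set_wol)(void *dev, int wol); /* optional callback */
};

struct dev_sketch {
	const struct drv_sketch *drv;       /* NULL once the driver is unbound */
};

/* Mandatory op: without a bound driver the PHY cannot be used. */
static int sketch_read_status(struct dev_sketch *d)
{
	if (!d->drv)
		return -EIO;
	return d->drv->read_status(d);
}

/* Optional op: check the driver pointer first, then the callback. */
static int sketch_set_wol(struct dev_sketch *d, int wol)
{
	if (d->drv && d->drv->set_wol)
		return d->drv->set_wol(d, wol);
	return -EOPNOTSUPP;
}
```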



Re: [PATCH v2 net-next 5/9] openvswitch: Refactor labels initialization.

2017-02-08 Thread Joe Stringer
On 8 February 2017 at 11:32, Jarno Rajahalme  wrote:
> Refactoring conntrack labels initialization makes chenges in later

*changes

> patches easier to review.
>
> Signed-off-by: Jarno Rajahalme 

Minor other nit:

>
> cl = nf_ct_labels_find(ct);
> if (!cl) {
> nf_ct_labels_ext_add(ct);
> cl = nf_ct_labels_find(ct);
> }
> +
> +   return cl;
> +}
> +
> +/* Initialize labels for a new, to be committed conntrack entry.  Note that

Maybe insert 'yet':

Initialize labels for a new, yet to be committed conntrack entry.  Note that


[PATCH RFC v2 8/8] xdp: Cleanup after API changes

2017-02-08 Thread Tom Herbert
This patch:
  - Change trace_xdp_hook_exception to trace_xdp_exception
  - Remove XDP_SETUP_PROG and XDP_QUERY_PROG constants
  - Remove bpf_warn_invalid_xdp_action

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |  4 +--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |  4 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  4 +--
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  8 +++---
 drivers/net/ethernet/qlogic/qede/qede_fp.c |  6 ++---
 drivers/net/virtio_net.c   |  8 +++---
 include/linux/filter.h |  1 -
 include/linux/netdevice.h  | 15 ---
 include/trace/events/xdp.h | 29 --
 kernel/bpf/core.c  |  1 -
 net/core/filter.c  |  6 -
 11 files changed, 16 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 3cfdc94..e894b67 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -134,7 +134,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 
case XDP_TX:
if (tx_avail < 2) {
-   trace_xdp_hook_exception(bp->dev, last_hook, act);
+   trace_xdp_exception(bp->dev, last_hook, act);
bnxt_reuse_rx_data(rxr, cons, page);
return true;
}
@@ -150,7 +150,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
xdp_warn_invalid_action(act);
/* Fall thru */
case XDP_ABORTED:
-   trace_xdp_hook_exception(bp->dev, last_hook, act);
+   trace_xdp_exception(bp->dev, last_hook, act);
/* Fall thru */
case XDP_DROP:
bnxt_reuse_rx_data(rxr, cons, page);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index a8fddc0..d8648fe 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -927,12 +927,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
length, cq->ring,
&doorbell_pending)))
goto consumed;
-   trace_xdp_hook_exception(dev, last_hook, act);
+   trace_xdp_exception(dev, last_hook, act);
goto xdp_drop_no_cnt; /* Drop on xmit failure */
default:
xdp_warn_invalid_action(act);
case XDP_ABORTED:
-   trace_xdp_hook_exception(dev, last_hook, act);
+   trace_xdp_exception(dev, last_hook, act);
case XDP_DROP:
ring->xdp_drop++;
 xdp_drop_no_cnt:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 50ab4b9..1be1eef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -740,12 +740,12 @@ static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,
return false;
case XDP_TX:
if (unlikely(!mlx5e_xmit_xdp_frame(rq, di, &xdp)))
-   trace_xdp_hook_exception(rq->netdev, last_hook, act);
+   trace_xdp_exception(rq->netdev, last_hook, act);
return true;
default:
xdp_warn_invalid_action(act);
case XDP_ABORTED:
-   trace_xdp_hook_exception(rq->netdev, last_hook, act);
+   trace_xdp_exception(rq->netdev, last_hook, act);
case XDP_DROP:
rq->stats.xdp_drop++;
mlx5e_page_release(rq, di, true);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2dee867..381f6be 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1613,15 +1613,13 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
if (unlikely(!nfp_net_tx_xdp_buf(nn, rx_ring,
 tx_ring, rxbuf,
 pkt_off, pkt_len)))
-   trace_xdp_hook_exception(nn->netdev,
-last_hook,
-  

Re: [PATCH v2 net-next 6/9] openvswitch: Inherit master's labels.

2017-02-08 Thread Joe Stringer
On 8 February 2017 at 11:32, Jarno Rajahalme  wrote:
> We avoid calling into nf_conntrack_in() for expected connections, as
> that would remove the expectation that we want to stick around until
> we are ready to commit the connection.  Instead, we do a lookup in the
> expectation table directly.  However, after a successful expectation
> lookup we have set the flow key label field from the master
> connection, whereas nf_conntrack_in() does not do this.  This leads to
> master's labels being inherited after an expectation lookup, but those
> labels not being inherited after the corresponding conntrack action
> with a commit flag.
>
> This patch resolves the problem by changing the commit code path to
> also inherit the master's labels to the expected connection.
> Resolving this conflict in favor or inheriting the labels allows more

*of inheriting

> information be passed from the master connection to related
> connections, which would otherwise be much harder if the 32 bits in
> the connmark are not enough.  Labels can still be set explicitly, so
> this change only affects the default values of the labels in presence
> of a master connection.
>
> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
> Signed-off-by: Jarno Rajahalme 



> @@ -272,18 +274,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key,
>   const struct ovs_key_ct_labels *labels,
>   const struct ovs_key_ct_labels *mask)
>  {
> -   struct nf_conn_labels *cl;
> -   u32 *dst;
> -   int i;
> +   struct nf_conn_labels *cl, *master_cl;
> +   bool have_mask = labels_nonzero(mask);
> +
> +   /* Inherit master's labels to the related connection? */
> +   master_cl = (ct->master) ? nf_ct_labels_find(ct->master) : NULL;

I don't think (ct->master) needs the parentheses.
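The inheritance rule described in the quoted commit message can be sketched as follows. This is a hedged illustration, not the openvswitch code: it assumes the 128-bit labels are handled as four 32-bit words and that explicitly supplied labels are applied under their mask on top of whatever was inherited from the master. `init_labels_sketch()` and `LABEL_WORDS` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LABEL_WORDS 4 /* 128-bit conntrack labels as 4 x u32 (assumption) */

/* Initial labels of a new connection; master is NULL when there is none. */
static void init_labels_sketch(uint32_t dst[LABEL_WORDS],
			       const uint32_t *master,
			       const uint32_t labels[LABEL_WORDS],
			       const uint32_t mask[LABEL_WORDS])
{
	int i;

	/* Default: inherit the master's labels, otherwise start from zero. */
	if (master)
		memcpy(dst, master, LABEL_WORDS * sizeof(uint32_t));
	else
		memset(dst, 0, LABEL_WORDS * sizeof(uint32_t));

	/* Explicitly set bits (selected by mask) override inherited ones. */
	for (i = 0; i < LABEL_WORDS; i++)
		dst[i] = (dst[i] & ~mask[i]) | (labels[i] & mask[i]);
}
```

Bits outside the mask keep the master's value, which is what lets labels carry more context to related connections than the 32-bit connmark.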


Re: [Inquiry] GFP-051U inquiry, switching power supply 48V 3.3A power supply

2017-02-08 Thread 吉密科技
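Hello Purchasing,
We have received your letter.

Are you looking for a reliable supplier for "power supply manufacturing"?
We are Taiwan's most professional power supply manufacturer, with our own R&D team and manufacturing plant.
We have long-term customers such as PHILIPS, HP, TOSHIBA, LITEON, etc.,
and extensive experience producing switching power supplies, USB chargers, and PoE power supplies.
Feel free to contact us; we look forward to your reply.


吉密科技
林榮宗
0422587996
gme.po...@msa.hinet.net

If this was sent to the wrong person, please forward it. Thank you.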

[PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-08 Thread Tom Herbert
This patch creates an infrastructure for registering and running code at
XDP hooks in drivers. This extends and generalizes the original XDP/BPF
interface. Specifically, it defines a generic xdp_hook structure and a
set of hooks that can be assigned to devices or napi instances.  These
hooks are also generic to allow for XDP/BPF programs as well as non-BPF
code (e.g. kernel code can be written in a module).

An XDP hook is defined by the xdp_hook structure. A pointer to this
structure is passed into the XDP register function to set up a hook.
The XDP register function allocates its own xdp_hook structure and copies
the values from the xdp_hook passed in. The register function also saves
the pointer value of the xdp_hook argument; this pointer is used in
subsequent calls to XDP to identify the registered hook.

The interface is defined in net/xdp.h. This includes the definition of
xdp_hook, functions to register and unregister hooks on a device
or individual instances of napi, and xdp_hook_run that is called by
drivers to run the hooks.
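The registration scheme described above (copy the caller's xdp_hook, remember the caller's pointer as the handle) can be sketched in plain C. This is a minimal single-list illustration, not the net/core/xdp.c implementation: `struct hook_ops`, `struct hook_entry`, `hook_register()` and `hook_unregister()` are hypothetical, and there is no RCU or locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical descriptor filled in by the caller. */
struct hook_ops {
	int priority;
	void *priv;
};

/* Hypothetical registry entry owned by the core. */
struct hook_entry {
	struct hook_ops ops;        /* private copy of the caller's values */
	const struct hook_ops *key; /* caller's pointer, used as the handle */
	struct hook_entry *next;
};

static struct hook_entry *hook_list;

/* Allocate our own entry, copy the values in, remember the caller's pointer. */
static int hook_register(const struct hook_ops *ops)
{
	struct hook_entry *h = malloc(sizeof(*h));

	if (!h)
		return -ENOMEM;
	h->ops = *ops;
	h->key = ops;
	h->next = hook_list;
	hook_list = h;
	return 0;
}

/* Later calls identify the registration by the same pointer. */
static int hook_unregister(const struct hook_ops *ops)
{
	struct hook_entry **pp;

	for (pp = &hook_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->key == ops) {
			struct hook_entry *h = *pp;

			*pp = h->next;
			free(h);
			return 0;
		}
	}
	return -ENOENT;
}
```

Because the entry holds a copy, the caller may modify its own struct after registering without affecting the installed hook; only the pointer value has to stay stable as a handle.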

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c |   1 +
 include/linux/filter.h   |  10 +-
 include/linux/netdev_features.h  |   3 +-
 include/linux/netdevice.h|  16 ++
 include/net/xdp.h| 310 +++
 include/trace/events/xdp.h   |  31 +++
 kernel/bpf/core.c|   1 +
 net/core/Makefile|   2 +-
 net/core/dev.c   |  53 ++--
 net/core/filter.c|   1 +
 net/core/rtnetlink.c |  14 +-
 net/core/xdp.c   | 304 ++
 12 files changed, 711 insertions(+), 35 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 net/core/xdp.c

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
index 335beb8..d294fb2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "nfp_asm.h"
 #include "nfp_bpf.h"
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e4eb254..bb9f2f2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -428,7 +428,7 @@ struct sk_filter {
struct bpf_prog *prog;
 };
 
-#define BPF_PROG_RUN(filter, ctx)  (*filter->bpf_func)(ctx, filter->insnsi)
+#define BPF_PROG_RUN(filter, ctx)  (*(filter)->bpf_func)(ctx, (filter)->insnsi)
 
 #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
 
@@ -437,12 +437,6 @@ struct bpf_skb_data_end {
void *data_end;
 };
 
-struct xdp_buff {
-   void *data;
-   void *data_end;
-   void *data_hard_start;
-};
-
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf, act_bpf and lwt programs
  */
@@ -504,6 +498,8 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
return BPF_PROG_RUN(prog, skb);
 }
 
+struct xdp_buff;
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
struct xdp_buff *xdp)
 {
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9a04195..f22d379 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -71,8 +71,8 @@ enum {
NETIF_F_HW_VLAN_STAG_RX_BIT,/* Receive VLAN STAG HW acceleration */
NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
NETIF_F_HW_L2FW_DOFFLOAD_BIT,   /* Allow L2 Forwarding in Hardware */
-
NETIF_F_HW_TC_BIT,  /* Offload TC infrastructure */
+   NETIF_F_XDP_BIT,/* Support XDP interface */
 
/*
 * Add your fresh new feature above and remember to update
@@ -134,6 +134,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_TX    __NETIF_F(HW_VLAN_STAG_TX)
 #define NETIF_F_HW_L2FW_DOFFLOAD   __NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_HW_TC  __NETIF_F(HW_TC)
+#define NETIF_F_XDP                __NETIF_F(XDP)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 58afbd1..2404e76 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,6 +324,7 @@ struct napi_struct {
struct sk_buff  *skb;
struct hrtimer  timer;
struct list_head    dev_list;
+   struct xdp_hook_set __rcu *xdp_hooks;
struct hlist_node   napi_hash_node;
unsigned int        napi_id;
 };
@@ -821,12 +822,25 @@ enum xdp_netdev_command {
 * return true if a program is 

[PATCH RFC v2 3/8] nfp: Changes to use generic XDP infrastructure

2017-02-08 Thread Tom Herbert
Change the XDP program management interface to correspond to the new
XDP API.

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   5 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 172 ++---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  12 +-
 3 files changed, 87 insertions(+), 102 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 2115f44..09a315e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -504,14 +504,13 @@ struct nfp_net {
unsigned fw_loaded:1;
unsigned bpf_offload_skip_sw:1;
unsigned bpf_offload_xdp:1;
+   unsigned xdp_enabled:1;
 
u32 ctrl;
u32 fl_bufsz;
 
u32 rx_offset;
 
-   struct bpf_prog *xdp_prog;
-
struct nfp_net_tx_ring *tx_rings;
struct nfp_net_rx_ring *rx_rings;
 
@@ -792,7 +791,7 @@ void nfp_net_coalesce_write_cfg(struct nfp_net *nn);
 int nfp_net_irqs_alloc(struct nfp_net *nn);
 void nfp_net_irqs_disable(struct nfp_net *nn);
 int
-nfp_net_ring_reconfig(struct nfp_net *nn, struct bpf_prog **xdp_prog,
+nfp_net_ring_reconfig(struct nfp_net *nn,
  struct nfp_net_ring_set *rx, struct nfp_net_ring_set *tx);
 
 #ifdef CONFIG_NFP_NET_DEBUG
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 6ac43ab..2dee867 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -65,6 +65,7 @@
 
 #include 
 #include 
+#include 
 
 #include "nfp_net_ctrl.h"
 #include "nfp_net.h"
@@ -1166,10 +1167,10 @@ nfp_net_napi_alloc_one(struct nfp_net *nn, int direction, dma_addr_t *dma_addr)
 {
void *frag;
 
-   if (!nn->xdp_prog)
-   frag = napi_alloc_frag(nn->fl_bufsz);
-   else
+   if (nn->xdp_enabled)
frag = page_address(alloc_page(GFP_ATOMIC | __GFP_COLD));
+   else
+   frag = napi_alloc_frag(nn->fl_bufsz);
if (!frag) {
nn_warn_ratelimit(nn, "Failed to alloc receive page frag\n");
return NULL;
@@ -1177,7 +1178,7 @@ nfp_net_napi_alloc_one(struct nfp_net *nn, int direction, dma_addr_t *dma_addr)
 
*dma_addr = nfp_net_dma_map_rx(nn, frag, nn->fl_bufsz, direction);
if (dma_mapping_error(&nn->pdev->dev, *dma_addr)) {
-   nfp_net_free_frag(frag, nn->xdp_prog);
+   nfp_net_free_frag(frag, nn->xdp_enabled);
nn_warn_ratelimit(nn, "Failed to map DMA RX buffer\n");
return NULL;
}
@@ -1248,17 +1249,15 @@ static void nfp_net_rx_ring_reset(struct nfp_net_rx_ring *rx_ring)
  * nfp_net_rx_ring_bufs_free() - Free any buffers currently on the RX ring
  * @nn:NFP Net device
  * @rx_ring:   RX ring to remove buffers from
- * @xdp:   Whether XDP is enabled
  *
  * Assumes that the device is stopped and buffers are in [0, ring->cnt - 1)
  * entries.  After device is disabled nfp_net_rx_ring_reset() must be called
  * to restore required ring geometry.
  */
 static void
-nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring,
- bool xdp)
+nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring)
 {
-   int direction = xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
+   int direction = nn->xdp_enabled ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
unsigned int i;
 
for (i = 0; i < rx_ring->cnt - 1; i++) {
@@ -1271,7 +1270,7 @@ nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring,
 
nfp_net_dma_unmap_rx(nn, rx_ring->rxbufs[i].dma_addr,
 rx_ring->bufsz, direction);
-   nfp_net_free_frag(rx_ring->rxbufs[i].frag, xdp);
+   nfp_net_free_frag(rx_ring->rxbufs[i].frag, nn->xdp_enabled);
rx_ring->rxbufs[i].dma_addr = 0;
rx_ring->rxbufs[i].frag = NULL;
}
@@ -1284,8 +1283,7 @@ nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring,
  * @xdp:   Whether XDP is enabled
  */
 static int
-nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring,
-  bool xdp)
+nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring)
 {
struct nfp_net_rx_buf *rxbufs;
unsigned int i;
@@ -1295,9 +1293,9 @@ nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring,
for (i = 0; i < rx_ring->cnt - 1; i++) {
rxbufs[i].frag =
nfp_net_rx_alloc_one(rx_ring, &rxbufs[i].dma_addr,
-rx_ring->bufsz, xdp);
+rx_ring->bufsz, nn->xdp_enabled);

Re: [PATCH net v3 1/3] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-08 Thread Florian Fainelli
On 02/08/2017 04:13 PM, Florian Fainelli wrote:
> The Generic PHY driver gets assigned after we have checked that the current
> PHY driver is NULL, so we need to check a few things before we can
> safely dereference d->driver. This would cause a NULL dereference to
> occur when a system binds to the Generic PHY driver. Update
> phy_attach_direct() to do the following:
> 
> - grab the driver module reference after we have assigned the Generic
>   PHY drivers accordingly
> 
> - update the error path to clean up the module reference in case the
>   Generic PHY probe function fails
> 
> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/phy/phy_device.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 0d8f4d3847f6..d63d190a95ef 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -908,6 +908,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>   struct module *ndev_owner = dev->dev.parent->driver->owner;
>   struct mii_bus *bus = phydev->mdio.bus;
>   struct device *d = &phydev->mdio.dev;
> + bool using_genphy = false;
>   int err;
>  
>   /* For Ethernet device drivers that register their own MDIO bus, we
> @@ -938,12 +939,22 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>   d->driver =
>   &genphy_driver[GENPHY_DRV_1G].mdiodrv.driver;
>  
> + using_genphy = true;
> + }
> +
> + if (!try_module_get(d->driver->owner)) {
> + dev_err(&dev->dev, "failed to get the device driver module\n");
> + err = -EIO;
> + goto error_put_device;
> + }

And this is still not correct, since we need to remove the other hunk. One day I
will learn how to properly rebase my work... I will submit a v4 shortly.
-- 
Florian


[PATCH net v3 3/3] net: phy: Fix PHY driver bind and unbind events

2017-02-08 Thread Florian Fainelli
The PHY library does not deal very well with bind and unbind events. The first
thing we would see is that we were not properly canceling the PHY state machine
workqueue, so we would crash while dereferencing phydev->drv, since there
is no driver attached anymore.

Once we fix that, there are several things that did not quite work as expected:

- if the PHY state machine was running, we were not stopping it properly, and
  the state machine state would not be marked as such
- when we rebind the driver, nothing would happen, since we would not know which
  state we were in before the unbind

This patch takes the following approach:

- if the PHY was attached and the state machine was running, we stop it,
  remember where we left off, and schedule the state machine for restart upon
  driver bind
- if the PHY was attached but HALTED, we leave it in that state and do not
  alter the state upon driver bind
- in all other cases (detached) we keep the PHY in DOWN state waiting for
  a network driver to show up, and set PHY_READY on driver bind

Suggested-by: Russell King 
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy_device.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 40675b9706ae..6e46f6807bb7 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1723,6 +1723,7 @@ static int phy_probe(struct device *dev)
struct phy_device *phydev = to_phy_device(dev);
struct device_driver *drv = phydev->mdio.dev.driver;
struct phy_driver *phydrv = to_phy_driver(drv);
+   bool should_start = false;
int err = 0;
 
phydev->drv = phydrv;
@@ -1772,24 +1773,46 @@ static int phy_probe(struct device *dev)
}
 
/* Set the state to READY by default */
-   phydev->state = PHY_READY;
+   if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
+   should_start = true;
+   else
+   phydev->state = PHY_READY;
 
if (phydev->drv->probe)
err = phydev->drv->probe(phydev);
 
mutex_unlock(&phydev->lock);
 
+   if (should_start)
+   phy_start(phydev);
+
return err;
 }
 
 static int phy_remove(struct device *dev)
 {
struct phy_device *phydev = to_phy_device(dev);
+   bool should_stop = false;
+   enum phy_state state;
+
+   cancel_delayed_work_sync(&phydev->state_queue);
 
mutex_lock(&phydev->lock);
-   phydev->state = PHY_DOWN;
+   state = phydev->state;
+   if (state > PHY_UP && state != PHY_HALTED)
+   should_stop = true;
+   else
+   phydev->state = PHY_DOWN;
mutex_unlock(&phydev->lock);
 
+   /* phy_stop() sets the state to HALTED, undo that for the ->probe() function
+* to have a chance to resume where we left
+*/
+   if (should_stop) {
+   phy_stop(phydev);
+   phydev->state = state;
+   }
+
if (phydev->drv && phydev->drv->remove)
phydev->drv->remove(phydev);
phydev->drv = NULL;
-- 
2.9.3
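The state handling in phy_remove() and phy_probe() above can be modelled as a tiny state machine. In this hypothetical sketch, `enum st` keeps only the ordering the patch relies on (started states sort after UP, with HALTED excluded) and a boolean stands in for the delayed work; `struct phy_sketch`, `sketch_remove()` and `sketch_probe()` are not phylib code.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down, hypothetical copy of enum phy_state: only the relative order
 * that the "state > PHY_UP && state != PHY_HALTED" test depends on. */
enum st { ST_DOWN, ST_READY, ST_UP, ST_AN, ST_RUNNING, ST_NOLINK, ST_HALTED };

struct phy_sketch {
	enum st state;
	bool machine_running; /* stands in for the state_queue delayed work */
};

static bool was_started(enum st s)
{
	return s > ST_UP && s != ST_HALTED;
}

/* Unbind: stop the machine; keep the old state only if it was started. */
static void sketch_remove(struct phy_sketch *p)
{
	p->machine_running = false;  /* cancel_delayed_work_sync() + phy_stop() */
	if (!was_started(p->state))
		p->state = ST_DOWN;
	/* else: the patch restores the saved state after phy_stop() */
}

/* Bind: restart from the saved state, or go to READY. */
static void sketch_probe(struct phy_sketch *p)
{
	if (was_started(p->state))
		p->machine_running = true; /* phy_start() */
	else
		p->state = ST_READY;
}
```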



[PATCH RFC v2 6/8] mlx5: Changes to use generic XDP infrastructure

2017-02-08 Thread Tom Herbert
Change the XDP program management interface to correspond to the new
XDP API.

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 105 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |  12 +--
 3 files changed, 33 insertions(+), 87 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 95ca03c..0255423 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -381,7 +381,6 @@ struct mlx5e_rq {
u16                    rx_headroom;
 
struct mlx5e_rx_am am; /* Adaptive Moderation */
-   struct bpf_prog   *xdp_prog;
 
/* control */
struct mlx5_wq_ctrl    wq_ctrl;
@@ -695,7 +694,7 @@ struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_sq**txq_to_sq_map;
int channeltc_to_txq_map[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
-   struct bpf_prog *xdp_prog;
+   bool   xdp_enabled;
/* priv data path fields - end */
 
unsigned long  state;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3cce628..da91cf52 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "en.h"
 #include "en_tc.h"
 #include "eswitch.h"
@@ -113,7 +114,7 @@ static void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type)
 static void mlx5e_set_rq_priv_params(struct mlx5e_priv *priv)
 {
u8 rq_type = mlx5e_check_fragmented_striding_rq_cap(priv->mdev) &&
-   !priv->xdp_prog ?
+   !priv->xdp_enabled ?
MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ :
MLX5_WQ_TYPE_LINKED_LIST;
mlx5e_set_rq_type_params(priv, rq_type);
@@ -568,14 +569,12 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
rq->ix  = c->ix;
rq->priv= c->priv;
 
-   rq->xdp_prog = priv->xdp_prog ? bpf_prog_inc(priv->xdp_prog) : NULL;
-   if (IS_ERR(rq->xdp_prog)) {
-   err = PTR_ERR(rq->xdp_prog);
-   rq->xdp_prog = NULL;
-   goto err_rq_wq_destroy;
-   }
-
-   if (rq->xdp_prog) {
+   if (priv->xdp_enabled) {
+   /* Note XDP is checked whether it is enabled for the device. If
+* XDP programs are set per ring as opposed to setting program
+* across the device this could be adjusted to account for
+* that.
+*/
rq->buff.map_dir = DMA_BIDIRECTIONAL;
rq->rx_headroom = XDP_PACKET_HEADROOM;
} else {
@@ -662,8 +661,6 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
mlx5_core_destroy_mkey(mdev, &rq->umr_mkey);
 
 err_rq_wq_destroy:
-   if (rq->xdp_prog)
-   bpf_prog_put(rq->xdp_prog);
mlx5_wq_destroy(&rq->wq_ctrl);
 
return err;
@@ -673,9 +670,6 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
 {
int i;
 
-   if (rq->xdp_prog)
-   bpf_prog_put(rq->xdp_prog);
-
switch (rq->wq_type) {
case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
mlx5e_rq_free_mpwqe_info(rq);
@@ -1547,7 +1541,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
c->netdev   = priv->netdev;
c->mkey_be  = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key);
c->num_tc   = priv->params.num_tc;
-   c->xdp  = !!priv->xdp_prog;
+   c->xdp  = priv->xdp_enabled;
 
if (priv->params.rx_am_enabled)
rx_cq_profile = mlx5e_am_get_def_profile(priv->params.rx_cq_period_mode);
@@ -3196,96 +3190,52 @@ static void mlx5e_tx_timeout(struct net_device *dev)
schedule_work(&priv->tx_timeout_work);
 }
 
-static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
+static int mlx5e_xdp_init(struct net_device *netdev, bool enable)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct bpf_prog *old_prog;
int err = 0;
-   bool reset, was_opened;
-   int i;
+   bool was_opened;
 
mutex_lock(&priv->state_lock);
 
-   if ((netdev->features & NETIF_F_LRO) && prog) {
+   if (priv->xdp_enabled == enable)
+   goto unlock;
+
+   if ((netdev->features & NETIF_F_LRO) && enable) {
netdev_warn(netdev, "can't set XDP while LRO is on, disable LRO first\n");
err = -EINVAL;
goto unlock;
}
 
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   /* no need for full reset when exchanging programs */
-   reset = (!priv->xdp_prog || !prog);
 
-   if 

Re: Extending socket timestamping API for NTP

2017-02-08 Thread sdncurious
On Wed, Feb 8, 2017 at 2:26 AM, Miroslav Lichvar  wrote:
> On Tue, Feb 07, 2017 at 12:37:15PM -0800, sdncurious wrote:
>> On Tue, Feb 7, 2017 at 6:01 AM, Miroslav Lichvar  wrote:
>> > 6) new SO_TIMESTAMPING option to get PHC index with HW timestamps
>> >
>> >With bridges, bonding and other things it's difficult to determine
>> >which PHC timestamped the packet. It would be very useful if the
>> >PHC index was provided with each HW timestamp.
>> >
>> >I'm not sure what would be the best place to put it. I guess the
>> >second timespec in scm_timestamping could be reused for this, but
>> >that sounds like a gross hack. Do we need to define a new struct?
>>
>> What is the use case for this. even if the delay though the PHY's how
>> would that be compensated ?
>
> The idea was that applications like NTP servers and clients wouldn't
> have to care about interfaces and how they map together with addresses
> to PHCs over time. Currently, I use the interface index from
> IP_PKTINFO to get the PHC, but that doesn't work with bridges and
> other virtual interfaces. Another possibility would be an option to
> modify the behavior of IP_PKTINFO to save the index of the real
> interface. I'm not sure how would that compare in difficulty to
> extending SCM_TIMESTAMPING with PHC index.

Why not just return the digest that is in the message?
Though I am not sure whether the least significant 32 bits would result in too many collisions.

RMS

>
> --
> Miroslav Lichvar


[PATCH 2/2] net: ethernet: ti: cpsw: fix resume because of usage count

2017-02-08 Thread Ivan Khoronzhuk
The usage count function is based on the ndev_running flag, which is
updated before ndo_open/close are called. When those callbacks are
invoked from another place, in this case suspend/resume, the flag is
not changed, which breaks suspend/resume. For the common resources it
makes no difference which device is using them; only the number of
devices matters. So replace the usage count function with a variable,
and increment and decrement it in ndo_open/close.
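A minimal sketch of the counter-based scheme described above, assuming a cut-down stand-in for struct cpsw_common (the names shared_hw, shared_open, shared_stop and shared_needs_resume are hypothetical). The first open sets up the shared resources, the last stop tears them down, and because suspend/resume does not touch the counter, resume still sees how many devices are open.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared driver state. */
struct shared_hw {
	int usage_count;   /* number of currently opened ndevs */
	bool shared_ready; /* host port, DMA, NAPI etc. set up */
};

/* ndo_open path: init shared resources only for the first device. */
static void shared_open(struct shared_hw *hw)
{
	if (!hw->usage_count)
		hw->shared_ready = true;  /* e.g. cpsw_init_host_port() */
	hw->usage_count++;                /* counted once open succeeded */
}

/* ndo_stop path: release shared resources when the last device closes. */
static void shared_stop(struct shared_hw *hw)
{
	if (hw->usage_count <= 1)
		hw->shared_ready = false; /* e.g. napi_disable() */
	hw->usage_count--;
}

/* Resume does not go through ndo_open/ndo_stop, so the counter still
 * says whether any device needs its RX channels refilled. */
static bool shared_needs_resume(const struct shared_hw *hw)
{
	return hw->usage_count != 0;
}
```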

Fixes: 03fd01ad0eead23eb79294b6fb4d71dcac493855
"net: ethernet: ti: cpsw: don't duplicate ndev_running"

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 42 --
 1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 9714fab..1ffaad1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -399,6 +399,7 @@ struct cpsw_common {
struct cpts *cpts;
int rx_ch_num, tx_ch_num;
int speed;
+   int usage_count;
 };
 
 struct cpsw_priv {
@@ -671,18 +672,6 @@ static void cpsw_intr_disable(struct cpsw_common *cpsw)
return;
 }
 
-static int cpsw_get_usage_count(struct cpsw_common *cpsw)
-{
-   u32 i;
-   u32 usage_count = 0;
-
-   for (i = 0; i < cpsw->data.slaves; i++)
-   if (cpsw->slaves[i].ndev && netif_running(cpsw->slaves[i].ndev))
-   usage_count++;
-
-   return usage_count;
-}
-
 static void cpsw_tx_handler(void *token, int len, int status)
 {
struct netdev_queue *txq;
@@ -716,8 +705,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
if (unlikely(status < 0) || unlikely(!netif_running(ndev))) {
/* In dual emac mode check for all interfaces */
-   if (cpsw->data.dual_emac &&
-   cpsw_get_usage_count(cpsw) &&
+   if (cpsw->data.dual_emac && cpsw->usage_count &&
(status >= 0)) {
/* The packet received is for the interface which
 * is already down and the other interface is up
@@ -1492,11 +1480,8 @@ static int cpsw_ndo_open(struct net_device *ndev)
 CPSW_MAJOR_VERSION(reg), CPSW_MINOR_VERSION(reg),
 CPSW_RTL_VERSION(reg));
 
-   /* Initialize host and slave ports.
-* Given ndev is marked as opened already, so init port only if 1 ndev
-* is opened
-*/
-   if (cpsw_get_usage_count(cpsw) < 2)
+   /* Initialize host and slave ports */
+   if (!cpsw->usage_count)
cpsw_init_host_port(priv);
for_each_slave(priv, cpsw_slave_open, priv);
 
@@ -1507,10 +1492,8 @@ static int cpsw_ndo_open(struct net_device *ndev)
cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
 
-   /* Given ndev is marked as opened already, so if more ndev
-* are opened - no need to init shared resources.
-*/
-   if (cpsw_get_usage_count(cpsw) < 2) {
+   /* initialize shared resources only for the first opened ndev */
+   if (!cpsw->usage_count) {
/* disable priority elevation */
__raw_writel(0, >regs->ptype);
 
@@ -1552,6 +1535,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
 
cpdma_ctlr_start(cpsw->dma);
cpsw_intr_enable(cpsw);
+   cpsw->usage_count++;
 
return 0;
 
@@ -1572,10 +1556,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
netif_tx_stop_all_queues(priv->ndev);
netif_carrier_off(priv->ndev);
 
-   /* Given ndev is marked as close already,
-* so disable shared resources if no open devices
-*/
-   if (!cpsw_get_usage_count(cpsw)) {
+   if (cpsw->usage_count <= 1) {
napi_disable(>napi_rx);
napi_disable(>napi_tx);
cpts_unregister(cpsw->cpts);
@@ -1588,6 +1569,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
if (cpsw_need_resplit(cpsw))
cpsw_split_res(ndev);
 
+   cpsw->usage_count--;
pm_runtime_put_sync(cpsw->dev);
return 0;
 }
@@ -2393,7 +2375,7 @@ static int cpsw_resume_data_pass(struct net_device *ndev)
netif_dormant_off(slave->ndev);
 
/* After this receive is started */
-   if (cpsw_get_usage_count(cpsw)) {
+   if (cpsw->usage_count) {
ret = cpsw_fill_rx_channels(priv);
if (ret)
return ret;
@@ -2447,7 +2429,7 @@ static int cpsw_set_channels(struct net_device *ndev,
}
}
 
-   if (cpsw_get_usage_count(cpsw))
+   if (cpsw->usage_count)
cpsw_split_res(ndev);
 
ret = cpsw_resume_data_pass(ndev);
@@ -2529,7 +2511,7 @@ static int cpsw_set_ringparam(struct net_device 

[PATCH 1/2] net: ethernet: ti: cpsw: fix cpsw assignment in resume

2017-02-08 Thread Ivan Khoronzhuk
There is a copy-paste error which hides a resume breakage in the CPSW
driver: netdev_priv() was replaced with ndev_to_cpsw(ndev) in suspend,
but left unchanged in resume.

Fixes: 606f39939595a4d4540406bfc11f265b2036af6d
(ti: cpsw: move platform data and slaves info to cpsw_common)

Reported-by: Alexey Starikovskiy 
Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4d1c0c3..9714fab 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -3225,7 +3225,7 @@ static int cpsw_resume(struct device *dev)
 {
struct platform_device  *pdev = to_platform_device(dev);
struct net_device   *ndev = platform_get_drvdata(pdev);
-   struct cpsw_common  *cpsw = netdev_priv(ndev);
+   struct cpsw_common  *cpsw = ndev_to_cpsw(ndev);
 
/* Select default pin state */
pinctrl_pm_select_default_state(dev);
-- 
2.7.4



[PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume

2017-02-08 Thread Ivan Khoronzhuk
These two patches fix the suspend/resume chain.

Ivan Khoronzhuk (2):
  net: ethernet: ti: cpsw: fix cpsw assignment in resume
  net: ethernet: ti: cpsw: fix resume because of usage count

 drivers/net/ethernet/ti/cpsw.c | 44 +-
 1 file changed, 13 insertions(+), 31 deletions(-)

-- 
2.7.4



[PATCH RFC v2 4/8] qede: Changes to use generic XDP infrastructure

2017-02-08 Thread Tom Herbert
Change the XDP program management interface to correspond to the new
XDP API.

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/qlogic/qede/qede.h |  3 +-
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  2 +-
 drivers/net/ethernet/qlogic/qede/qede_filter.c  | 39 ++---
 drivers/net/ethernet/qlogic/qede/qede_fp.c  | 36 +--
 drivers/net/ethernet/qlogic/qede/qede_main.c| 23 ---
 5 files changed, 44 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index b423406..e1baf88 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -213,10 +213,9 @@ struct qede_dev {
u16 geneve_dst_port;
 
bool wol_enabled;
+   bool xdp_enabled;
 
struct qede_rdma_devrdma_info;
-
-   struct bpf_prog *xdp_prog;
 };
 
 enum QEDE_STATE {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index baf2642..5559d6e 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -341,7 +341,7 @@ static int qede_get_sset_count(struct net_device *dev, int stringset)
num_stats += QEDE_RSS_COUNT(edev) * QEDE_NUM_RQSTATS;
 
/* Account for XDP statistics [if needed] */
-   if (edev->xdp_prog)
+   if (edev->xdp_enabled)
num_stats += QEDE_RSS_COUNT(edev) * QEDE_NUM_TQSTATS;
return num_stats;
 
diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index 107c3fd..9c9db44 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -426,7 +426,7 @@ int qede_set_features(struct net_device *dev, netdev_features_t features)
 * aggregations, so no need to actually reload.
 */
__qede_lock(edev);
-   if (edev->xdp_prog)
+   if (edev->xdp_enabled)
args.func(edev, );
else
qede_reload(edev, , true);
@@ -506,29 +506,21 @@ void qede_udp_tunnel_del(struct net_device *dev, struct udp_tunnel_info *ti)
schedule_delayed_work(>sp_task, 0);
 }
 
-static void qede_xdp_reload_func(struct qede_dev *edev,
-struct qede_reload_args *args)
+static int qede_xdp_check_bpf(struct qede_dev *edev, struct bpf_prog *prog)
 {
-   struct bpf_prog *old;
-
-   old = xchg(>xdp_prog, args->u.new_prog);
-   if (old)
-   bpf_prog_put(old);
-}
-
-static int qede_xdp_set(struct qede_dev *edev, struct bpf_prog *prog)
-{
-   struct qede_reload_args args;
-
if (prog && prog->xdp_adjust_head) {
DP_ERR(edev, "Does not support bpf_xdp_adjust_head()\n");
return -EOPNOTSUPP;
}
 
-   /* If we're called, there was already a bpf reference increment */
-   args.func = _xdp_reload_func;
-   args.u.new_prog = prog;
-   qede_reload(edev, , false);
+   return 0;
+}
+
+static int qede_xdp_init(struct qede_dev *edev, bool enable)
+{
+   edev->xdp_enabled = enable;
+
+   qede_reload(edev, NULL, false);
 
return 0;
 }
@@ -538,11 +530,12 @@ int qede_xdp(struct net_device *dev, struct netdev_xdp *xdp)
struct qede_dev *edev = netdev_priv(dev);
 
switch (xdp->command) {
-   case XDP_SETUP_PROG:
-   return qede_xdp_set(edev, xdp->prog);
-   case XDP_QUERY_PROG:
-   xdp->prog_attached = !!edev->xdp_prog;
-   return 0;
+   case XDP_MODE_OFF:
+   return qede_xdp_init(edev, false);
+   case XDP_MODE_ON:
+   return qede_xdp_init(edev, true);
+   case XDP_CHECK_BPF_PROG:
+   return qede_xdp_check_bpf(edev, xdp->prog);
default:
return -EINVAL;
}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 26848ee..af885c3 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "qede.h"
@@ -987,13 +988,14 @@ static bool qede_pkt_is_ip_fragmented(struct eth_fast_path_rx_reg_cqe *cqe,
 static bool qede_rx_xdp(struct qede_dev *edev,
struct qede_fastpath *fp,
struct qede_rx_queue *rxq,
-   struct bpf_prog *prog,
struct sw_rx_data *bd,
struct eth_fast_path_rx_reg_cqe *cqe)
 {
u16 len = le16_to_cpu(cqe->len_on_first_bd);
struct xdp_buff xdp;
enum xdp_action act;
+   struct xdp_hook 

[PATCH RFC v2 0/8] xdp: Generalize XDP

2017-02-08 Thread Tom Herbert
This patch set generalizes XDP by making the hooks in drivers
generic. This has a number of advantages:

  - Allows alternative users of the XDP hooks other than the original
BPF
  - Allows a means to pipeline XDP programs together
  - Reduces the amount of code and complexity needed in drivers to
manage XDP
  - Provides a more structured environment that is extensible to new
features while being mostly transparent to the drivers

The generic XDP infrastructure is based on an xdp_hook structure that
contains callback functions and a private data structure that can be
populated by the user of XDP. The XDP hooks are registered either on a
netdev or on a napi instance (both maintain a list of XDP hooks).
Allowing per-netdev hooks makes management of XDP a lot simpler when the
intent is for the hook to apply to the whole device (as is the case with
XDP_BPF so far). Multiple XDP hooks may be registered on a device or napi
instance; the order of execution is given by the priority field of
the xdp_hook structure. Execution of the list continues to the end or
until a program returns something other than XDP_PASS. If both
napi XDP hooks and device hooks are enabled, the napi hooks are run
first.
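The execution rule above (run each list in order, stop at the first verdict other than XDP_PASS, napi hooks before device hooks) can be sketched in plain C. This is not the code from net/core/xdp.c: struct xdp_hook is cut down to the fields named in this letter, each hook array is assumed to be already sorted by priority, and the example hooks count_and_pass/count_and_drop are hypothetical.

```c
#include <stddef.h>

enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX };

struct xdp_buff; /* opaque in this sketch */

/* Cut-down stand-in for the patch's struct xdp_hook. */
struct xdp_hook {
	unsigned int priority; /* arrays below are assumed pre-sorted */
	void *priv;            /* e.g. the bpf_prog for a BPF hook */
	enum xdp_action (*hookfn)(void *priv, struct xdp_buff *xdp);
};

/* Run one hook list; stop at the first non-XDP_PASS verdict. */
static enum xdp_action run_hooks(const struct xdp_hook *hooks, size_t n,
				 struct xdp_buff *xdp)
{
	for (size_t i = 0; i < n; i++) {
		enum xdp_action act = hooks[i].hookfn(hooks[i].priv, xdp);

		if (act != XDP_PASS)
			return act;
	}
	return XDP_PASS;
}

/* napi hooks run first, then the device hooks. */
static enum xdp_action run_all_hooks(const struct xdp_hook *napi, size_t n_napi,
				     const struct xdp_hook *dev, size_t n_dev,
				     struct xdp_buff *xdp)
{
	enum xdp_action act = run_hooks(napi, n_napi, xdp);

	if (act != XDP_PASS)
		return act;
	return run_hooks(dev, n_dev, xdp);
}

/* Example hooks: priv points at a counter of how often the hook ran. */
static enum xdp_action count_and_pass(void *priv, struct xdp_buff *xdp)
{
	(void)xdp;
	++*(int *)priv;
	return XDP_PASS;
}

static enum xdp_action count_and_drop(void *priv, struct xdp_buff *xdp)
{
	(void)xdp;
	++*(int *)priv;
	return XDP_DROP;
}
```

With one passing napi hook and device hooks [drop, pass], the packet is dropped by the first device hook and the second device hook never runs.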

The xdp_hook structure contains a "hookfn" field, which is the function
that executes a hook. The "priv" structure is private data that is
provided as an argument to hookfn; in the case of a BPF hook this is
simply the bpf_prog.

Hooks may be registered with xdp_register_dev_hook or
xdp_register_napi_hook, and subsequently unregistered with
xdp_unregister_dev_hook and xdp_unregister_napi_hook. The
identifier for a hook is the pointer to the template hook that was
used to register it. xdp_find_dev_hook and xdp_find_napi_hook
return whether a hook has been registered and optionally return
the contents of the hook. xdp_bpf_check_prog is called for BPF
programs to check whether the driver is okay with running the
program (it uses the XDP_CHECK_BPF_PROG ndo command
described below).
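A minimal sketch of identifying a registered hook by the caller's template pointer, as described above. Assumptions: a fixed-size array instead of the malloc'd copy mentioned in patch 1/8, a cut-down struct hook, and hypothetical function names (hook_register, hook_find, hook_unregister); sorting by priority is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOOKS 8 /* hypothetical limit */

struct hook {           /* cut-down stand-in for struct xdp_hook */
	int priority;
	void *priv;
};

struct hook_slot {
	const struct hook *id; /* caller's template pointer = identifier */
	struct hook copy;      /* private copy taken at registration */
};

struct hook_table {
	struct hook_slot slots[MAX_HOOKS];
	size_t n;
};

static int hook_register(struct hook_table *t, const struct hook *tmpl)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->slots[i].id == tmpl)
			return -EEXIST;
	if (t->n == MAX_HOOKS)
		return -ENOSPC;
	t->slots[t->n].id = tmpl;
	t->slots[t->n].copy = *tmpl; /* later edits to *tmpl are not seen */
	t->n++;
	return 0;
}

/* Report whether tmpl is registered; optionally copy out its contents. */
static bool hook_find(const struct hook_table *t, const struct hook *tmpl,
		      struct hook *out)
{
	for (size_t i = 0; i < t->n; i++) {
		if (t->slots[i].id == tmpl) {
			if (out)
				*out = t->slots[i].copy;
			return true;
		}
	}
	return false;
}

static int hook_unregister(struct hook_table *t, const struct hook *tmpl)
{
	for (size_t i = 0; i < t->n; i++) {
		if (t->slots[i].id != tmpl)
			continue;
		/* Shift the remaining slots down to keep execution order. */
		for (; i + 1 < t->n; i++)
			t->slots[i] = t->slots[i + 1];
		t->n--;
		return 0;
	}
	return -ENOENT;
}
```

Because the table stores a copy, the caller's template only serves as a handle after registration; changing it does not change the registered hook.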

Driver interface:

Drivers no longer deal with BPF programs for the most part, instead
they call into the XDP interface.

There are two functions of interest for use in the receive data path:
  - xdp_hook_run_needed_check: returns true if there is an XDP
program registered on the napi instance or its device
  - xdp_hook_run, xdp_hook_run_ret_last: runs the XDP programs for
the hooks registered for the given napi instance or its device.
The latter variant returns a pointer to the last XDP hook that
was run (useful for reporting).

The ndo_xdp defines a new set of commands for this interface. A driver
should implement these commands:
  - XDP_MODE_ON: Initialize the device to use XDP. Called when the first
    XDP program is registered on a device (including on a NAPI
    instance).
  - XDP_MODE_OFF: XDP is finished on the device. Called after the last
    XDP hook has been unregistered for a device.
  - XDP_CHECK_BPF_PROG: Check whether a BPF program is acceptable for
    the device to run.
  - XDP_OFFLOAD_BPF: Offload the associated BPF program (e.g. Netronome).
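How a driver's ndo_xdp handler might map these commands, sketched with a hypothetical fake_dev in place of real driver state (all names here are illustrative only). XDP_MODE_ON enables XDP and XDP_MODE_OFF disables it; a repeated request is a no-op, as in the mlx5 change; enabling while LRO is on fails, as in mlx5e_xdp_init(); and a program using bpf_xdp_adjust_head() is rejected, as in qede. Offload is not supported in this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Command names from the cover letter; numeric values are arbitrary. */
enum xdp_cmd { XDP_MODE_ON, XDP_MODE_OFF, XDP_CHECK_BPF_PROG, XDP_OFFLOAD_BPF };

struct fake_dev {       /* hypothetical driver private state */
	bool xdp_enabled;
	bool lro_on;
	int reloads;    /* counts ring/channel reconfigurations */
};

static int fake_xdp_init(struct fake_dev *d, bool enable)
{
	if (d->xdp_enabled == enable)
		return 0;         /* already in the requested mode */
	if (enable && d->lro_on)
		return -EINVAL;   /* XDP and LRO are mutually exclusive */
	d->xdp_enabled = enable;
	d->reloads++;             /* e.g. qede_reload() or reopening channels */
	return 0;
}

static int fake_ndo_xdp(struct fake_dev *d, enum xdp_cmd cmd,
			bool prog_adjusts_head)
{
	switch (cmd) {
	case XDP_MODE_ON:
		return fake_xdp_init(d, true);
	case XDP_MODE_OFF:
		return fake_xdp_init(d, false);
	case XDP_CHECK_BPF_PROG:
		return prog_adjusts_head ? -EOPNOTSUPP : 0;
	default:
		return -EINVAL;
	}
}
```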

A new net feature, NETIF_F_XDP, is added so that a driver can indicate
that it supports XDP.

This patch set:
  - Adds the infrastructure described above, including the xdp.c and
    xdp.h files.
  - Modifies the mlx4, mlx5, qede, nfp, virt_net, and bnxt drivers to use
    the new interface. This mostly removes the management of BPF programs
    from the drivers and changes them to call the new interface.

v2:
  - Eliminate use of nfhooks-like lists. Just use a simple array for
    the hooks
  - Modify more drivers that now support XDP

Tested: TBD


Tom Herbert (8):
  xdp: Infrastructure to generalize XDP
  mlx4: Changes to use generic XDP infrastructure
  nfp: Changes to use generic XDP infrastructure
  qede: Changes to use generic XDP infrastructure
  virt_net: Changes to use generic XDP infrastructure
  mlx5: Changes to use generic XDP infrastructure
  bnxt: Changes to use generic XDP infrastructure
  xdp: Cleanup after API changes

 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  14 -
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |  46 +--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  92 ++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |  27 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |   1 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   1 -
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 105 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  12 +-
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c   |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   5 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 170 ++-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  12 +-
 drivers/net/ethernet/qlogic/qede/qede.h|   3 +-
 

Re: [PATCH v2 net-next 0/9] openvswitch: Conntrack integration improvements.

2017-02-08 Thread Joe Stringer
On 8 February 2017 at 11:32, Jarno Rajahalme  wrote:
> This series improves the conntrack integration code in the openvswitch
> module by fixing bugs (patches 1, 4, and 6), clarifying code (patches
> 2, 3, and 5), improving performance (patch 9), and adding new features
> enabling better translation from firewall admission policy to network
> configuration requested by user communities (patches 7 and 8).

Looks like this needs another rebase.

For the patches I haven't specifically commented on (1,2,4,9), they
looked fine to me:

Acked-by: Joe Stringer 

I presume that Pravin will also want to take a look.


Re: [PATCH v2 net-next 8/9] openvswitch: Add force commit.

2017-02-08 Thread Joe Stringer
On 8 February 2017 at 11:32, Jarno Rajahalme  wrote:
> Stateful network admission policy may allow connections to one
> direction and reject connections initiated in the other direction.
> After policy change it is possible that for a new connection an
> overlapping conntrack entry already exists, where the original
> direction of the existing connection is opposed to the new
> connection's initial packet.
>
> Most importantly, conntrack state relating to the current packet gets
> the "reply" designation based on whether the original direction tuple
> or the reply direction tuple matched.  If this "directionality" is
> wrong w.r.t. to the stateful network admission policy it may happen
> that packets in neither direction are correctly admitted.
>
> This patch adds a new "force commit" option to the OVS conntrack
> action that checks the original direction of an existing conntrack
> entry.  If that direction is opposed to the current packet, the
> existing conntrack entry is deleted and a new one is subsequently
> created in the correct direction.
>
> Signed-off-by: Jarno Rajahalme 



> if (help && rcu_access_pointer(help->helper) != info->helper)
> return false;
> }
> +   /* Force conntrack entry direction to the current packet? */
> +   if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
> +   /* Delete the conntrack entry if confirmed, else just release
> +* the reference.
> +*/
> +   if (nf_ct_is_confirmed(ct))
> +   nf_ct_delete(ct, 0, 0);
> +   else
> +   nf_conntrack_put(>ct_general);
> +   skb->nfct = NULL;
> +   skb->nfctinfo = 0;

This can use nf_ct_set().
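The force-commit rule quoted above reduces to a small decision. The entry is kept unless force is set and the packet is in the reply direction. Otherwise a confirmed entry is deleted and an unconfirmed one only has its reference dropped, after which a new entry is committed in the packet's direction. A sketch with hypothetical enum names, leaving out the actual conntrack calls (including the nf_ct_set() cleanup suggested above):

```c
#include <stdbool.h>

enum ct_dir { CT_DIR_ORIGINAL, CT_DIR_REPLY }; /* mirrors IP_CT_DIR_* */

enum force_result {
	FORCE_KEEP,    /* entry already matches the packet's direction */
	FORCE_DELETE,  /* confirmed entry: delete it, then commit anew */
	FORCE_RELEASE, /* unconfirmed entry: just drop the reference */
};

static enum force_result force_commit_action(bool force, enum ct_dir pkt_dir,
					     bool confirmed)
{
	if (!force || pkt_dir == CT_DIR_ORIGINAL)
		return FORCE_KEEP;
	return confirmed ? FORCE_DELETE : FORCE_RELEASE;
}
```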


Re: i40e: driver can't probe device (capabilities discovery error)

2017-02-08 Thread Guilherme G. Piccoli
I forgot to attach a perhaps important file, the output of lspci.
Here it goes.

Thanks,


Guilherme
0002:01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T [8086:1589] (rev 02)
Subsystem: Super Micro Computer Inc Device [15d9:]
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- 

Re: [PATCH v2 net-next 7/9] openvswitch: Add original direction conntrack tuple to sw_flow_key.

2017-02-08 Thread Joe Stringer
On 8 February 2017 at 11:32, Jarno Rajahalme  wrote:
> Add the fields of the conntrack original direction 5-tuple to struct
> sw_flow_key.  The new fields are initially marked as non-existent, and
> are populated whenever a conntrack action is executed and either finds
> or generates a conntrack entry.  This means that these fields exist
> for all packets that were not rejected by conntrack as untrackable.
>
> The original tuple fields in the sw_flow_key are filled from the
> original direction tuple of the conntrack entry relating to the
> current packet, or from the original direction tuple of the master
> conntrack entry, if the current conntrack entry has a master.
> Generally, expected connections of connections having an assigned
> helper (e.g., FTP), have a master conntrack entry.
>
> The main purpose of the new conntrack original tuple fields is to
> allow matching on them for policy decision purposes, with the premise
> that the admissibility of tracked connections reply packets (as well
> as original direction packets), and both direction packets of any
> related connections may be based on ACL rules applying to the master
> connection's original direction 5-tuple.  This also makes it easier to
> make policy decisions when the actual packet headers might have been
> transformed by NAT, as the original direction 5-tuple represents the
> packet headers before any such transformation.
>
> When using the original direction 5-tuple the admissibility of return
> and/or related packets need not be based on the mere existence of a
> conntrack entry, allowing separation of admission policy from the
> established conntrack state.  While existence of a conntrack entry is
> required for admission of the return or related packets, policy
> changes can render connections that were initially admitted to be
> rejected or dropped afterwards.  If the admission of the return and
> related packets was based on mere conntrack state (e.g., connection
> being in an established state), a policy change that would make the
> connection rejected or dropped would need to find and delete all
> conntrack entries affected by such a change.  When using the original
> direction 5-tuple matching the affected conntrack entries can be
> allowed to time out instead, as the established state of the
> connection would not need to be the basis for packet admission any
> more.
>
> It should be noted that the directionality of related connections may
> be the same or different than that of the master connection, and
> neither the original direction 5-tuple nor the conntrack state bits
> carry this information.  If needed, the directionality of the master
> connection can be stored in master's conntrack mark or labels, which
> are automatically inherited by the expected related connections.
>
> The fact that neither ARP not ND packets are trackable by conntrack

* ARP nor ND

> allows mutual exclusion between ARP/ND and the new conntrack original
> tuple fields.  Hence, the IP addresses are overlaid in union with ARP
> and ND fields.  This allows the sw_flow_key to not grow much due to
> this patch, but it also means that we must be careful to never use the
> new key fields with ARP or ND packets.  ARP is easy to distinguish and
> keep mutually exclusive based on the ethernet type, but ND being an
> ICMPv6 protocol requires a bit more attention.
>
> Signed-off-by: Jarno Rajahalme 

Acked-by: Joe Stringer 
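The tuple-selection rule in patch 7/9 (use the master's original-direction tuple when the entry has a master, otherwise the entry's own) can be sketched as follows. The structs are simplified, IPv4-only stand-ins, not the conntrack or sw_flow_key definitions:

```c
#include <stddef.h>
#include <stdint.h>

struct tuple5 {                  /* simplified IPv4 5-tuple */
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

struct ct_entry {
	struct tuple5 orig;            /* original-direction tuple */
	const struct ct_entry *master; /* set for expected/related conns */
};

/* Tuple exported to the flow key: the master's original tuple if any. */
static const struct tuple5 *ct_orig_tuple_for_key(const struct ct_entry *ct)
{
	return ct->master ? &ct->master->orig : &ct->orig;
}
```

For an FTP data connection whose master is the control connection, policy would then match on the control connection's client-to-port-21 tuple regardless of the data connection's own direction.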


Re: [RFC 1/2] dt: emac: document device-tree based phy discovery and setup

2017-02-08 Thread Rob Herring
On Sun, Feb 05, 2017 at 11:25:05PM +0100, Christian Lamparter wrote:
> This patch adds documentation for a new "phy-handler" property

s/phy-handler/phy-handle/

> and "mdio" sub-node. These allows the enumeration of PHYs which
> are supported by the phy library under drivers/net/phy.
> 
> The EMAC ethernet controller in IBM and AMCC 4xx chips is
> currently stuck with a few privately defined phy
> implementations. It has no support for PHYs which
> are supported by the generic phylib.
> 
> Signed-off-by: Christian Lamparter 
> ---
>  .../devicetree/bindings/powerpc/4xx/emac.txt   | 60 
> +-
>  1 file changed, 58 insertions(+), 2 deletions(-)

Otherwise,

Acked-by: Rob Herring 



[patch net-next 14/15] mlxsw: spectrum_router: Don't reflect LINKDOWN nexthops

2017-02-08 Thread Jiri Pirko
From: Ido Schimmel 

The kernel resolves the nexthops for a given route using
FIB_LOOKUP_IGNORE_LINKSTATE which means a notification can be sent for a
route with one of its nexthops being LINKDOWN.

In case IGNORE_ROUTES_WITH_LINKDOWN is set for the nexthop netdev, then
we shouldn't reflect the nexthop to the device's table.

Once the nexthop netdev's carrier goes up we'll be notified using NH_ADD
and reflect it to the device.
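The check added by this patch boils down to a three-way condition. A sketch with the inputs flattened into a hypothetical struct nh_state (the patch reads them from the in_device and fib_nh directly):

```c
#include <stdbool.h>

struct nh_state {
	bool has_in_dev;      /* __in_dev_get_rtnl() returned non-NULL */
	bool ignore_linkdown; /* IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) */
	bool linkdown;        /* fib_nh->nh_flags & RTNH_F_LINKDOWN */
};

/* True if the nexthop should be reflected to the device's table now. */
static bool nh_should_reflect(const struct nh_state *s)
{
	if (s->has_in_dev && s->ignore_linkdown && s->linkdown)
		return false; /* wait for NH_ADD when the carrier comes up */
	return true;
}
```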

Signed-off-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 1c68b40..8dfc025 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1503,6 +1504,7 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp,
 struct fib_nh *fib_nh)
 {
struct net_device *dev = fib_nh->nh_dev;
+   struct in_device *in_dev;
struct mlxsw_sp_rif *r;
int err;
 
@@ -1512,6 +1514,11 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp,
if (err)
return err;
 
+   in_dev = __in_dev_get_rtnl(dev);
+   if (in_dev && IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
+   fib_nh->nh_flags & RTNH_F_LINKDOWN)
+   return 0;
+
r = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
if (!r)
return 0;
-- 
2.7.4



Re: Extending socket timestamping API for NTP

2017-02-08 Thread sdncurious
Dealing with individual interfaces does not make sense. This seems to be a
case where the reciprocity property is violated, and it should be handled as
such. This is different from the case where the two sides each have a single
NIC but at different speeds. Here the NIC used, and its speed, can change with
each packet. Although I am not sure that is possible, because the hash should
always land the packets of a flow on the same NIC of the bond.

7. Reciprocity Errors

The above analysis assumes that the delays on the outbound and inbound
paths are the same; that is, the paths are reciprocal. This is assured if
the propagation delays are the same, the transmission rates are the same and
the packet lengths are the same. In the NTP on-wire protocol all packets
have the same length. If we assume the transmission rates are the same,
the only difference in path delays must be due to nonreciprocal
transmission paths. This often occurs if one way is via landline and the
other via satellite. It can also occur when the paths traverse tag-switched
core networks.

RMS

On Wed, Feb 8, 2017 at 2:26 AM, Miroslav Lichvar  wrote:
> On Tue, Feb 07, 2017 at 12:37:15PM -0800, sdncurious wrote:
>> On Tue, Feb 7, 2017 at 6:01 AM, Miroslav Lichvar  wrote:
>> > 6) new SO_TIMESTAMPING option to get PHC index with HW timestamps
>> >
>> >With bridges, bonding and other things it's difficult to determine
>> >which PHC timestamped the packet. It would be very useful if the
>> >PHC index was provided with each HW timestamp.
>> >
>> >I'm not sure what would be the best place to put it. I guess the
>> >second timespec in scm_timestamping could be reused for this, but
>> >that sounds like a gross hack. Do we need to define a new struct?
>>
>> What is the use case for this. even if the delay though the PHY's how
>> would that be compensated ?
>
> The idea was that applications like NTP servers and clients wouldn't
> have to care about interfaces and how they map together with addresses
> to PHCs over time. Currently, I use the interface index from
> IP_PKTINFO to get the PHC, but that doesn't work with bridges and
> other virtual interfaces. Another possibility would be an option to
> modify the behavior of IP_PKTINFO to save the index of the real
> interface. I'm not sure how would that compare in difficulty to
> extending SCM_TIMESTAMPING with PHC index.
>
> --
> Miroslav Lichvar
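To make the reciprocity point concrete, the standard NTP on-wire offset and delay calculation is sketched below. Here t1..t4 are client transmit, server receive, server transmit and client receive times in one consistent unit; the function names are hypothetical. With perfectly synchronised clocks, an outbound delay d_out and an inbound delay d_in still produce an apparent offset of (d_out - d_in)/2. That is the error a non-reciprocal path, or a per-packet change of NIC and link speed in a bond, would introduce.

```c
struct ntp_sample {
	double offset; /* estimated server clock minus client clock */
	double delay;  /* round-trip network delay */
};

static struct ntp_sample ntp_compute(double t1, double t2, double t3, double t4)
{
	struct ntp_sample s;

	s.offset = ((t2 - t1) + (t3 - t4)) / 2.0;
	s.delay = (t4 - t1) - (t3 - t2);
	return s;
}

/* Offset error caused purely by asymmetric one-way delays. */
static double asym_offset_error(double d_out, double d_in)
{
	return (d_out - d_in) / 2.0;
}
```

For example, t1 = 0, t2 = 10, t3 = 11, t4 = 13 ms (10 ms out, 1 ms server processing, 2 ms back) gives an offset of 4 ms and a delay of 12 ms, even though both clocks agree.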


[PATCH v3 net-next 2/2] net: dsa: mv88e6xxx: Add mv88e6390 watchdog interrupt support

2017-02-08 Thread Andrew Lunn
Implement the ops needed to support the watchdog for the MV88E6390
family.

Signed-off-by: Andrew Lunn 
---
v2:
   All new
v3:
   Remove g2_ prefix from ops.
---
 drivers/net/dsa/mv88e6xxx/chip.c  |  9 +++
 drivers/net/dsa/mv88e6xxx/global2.c   | 48 +++
 drivers/net/dsa/mv88e6xxx/global2.h   |  2 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 12 +
 4 files changed, 71 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 489a59f5dea3..7658284beaf9 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3419,6 +3419,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3446,6 +3447,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3473,6 +3475,7 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3530,6 +3533,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3694,6 +3698,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu =  mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3722,6 +3727,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu =  mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3752,6 +3758,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3781,6 +3788,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3808,6 +3816,7 @@ static const struct mv88e6xxx_ops mv88e6391_ops = {
.stats_get_stats = mv88e6390_stats_get_stats,
.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
.g1_set_egress_port = mv88e6390_g1_set_egress_port,
+   .watchdog_ops = _watchdog_ops,
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
diff --git a/drivers/net/dsa/mv88e6xxx/global2.c b/drivers/net/dsa/mv88e6xxx/global2.c
index 1e2d65826d12..8f15bc7b1f5f 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.c
+++ b/drivers/net/dsa/mv88e6xxx/global2.c
@@ -686,6 +686,54 @@ const struct mv88e6xxx_irq_ops mv88e6097_watchdog_ops = {
.irq_free = mv88e6097_watchdog_free,
 };
 
+static int mv88e6390_watchdog_setup(struct mv88e6xxx_chip *chip)
+{
+   return mv88e6xxx_g2_update(chip, GLOBAL2_WDOG_CONTROL,
+  GLOBAL2_WDOG_INT_ENABLE |
+  GLOBAL2_WDOG_CUT_THROUGH |
+  GLOBAL2_WDOG_QUEUE_CONTROLLER |
+  GLOBAL2_WDOG_EGRESS |
+  GLOBAL2_WDOG_FORCE_IRQ);
+}
+
+static int mv88e6390_watchdog_action(struct mv88e6xxx_chip *chip, int irq)
+{
+   int err;
+   u16 reg;
+
+   mv88e6xxx_g2_write(chip, GLOBAL2_WDOG_CONTROL, 

[PATCH v3 net-next 1/2] net: dsa: mv88e6xxx: Add watchdog interrupt handler

2017-02-08 Thread Andrew Lunn
The switch contains a watchdog looking for issues with the internal
gubbins of the switch. Hook the interrupt the watchdog triggers and
log the value of the control register indicating why the watchdog
fired. The watchdog can only be cleared with a switch reset, which
will destroy the current configuration. Rather than doing this, just
disable the interrupt.

The mv88e6390 family has different watchdog registers. So use an ops
structure, so that support for the mv88e6390 family can be added later.
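The per-family ops pattern described above can be sketched in plain user-space C. This is a simplified model, not the driver code: `struct chip`, `struct watchdog_ops` and the `demo_*`/`chip_watchdog_*` functions are hypothetical stand-ins (the patch itself appears to use `struct mv88e6xxx_irq_ops` with members such as `.irq_free`), and the "register" is just a struct field. Generic code only calls through the table, so a family that leaves it NULL is simply skipped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct chip;

/* Hypothetical per-family watchdog callbacks. */
struct watchdog_ops {
	int (*setup)(struct chip *chip);
	int (*action)(struct chip *chip, int irq);
	void (*free)(struct chip *chip);
};

struct chip {
	const struct watchdog_ops *wd_ops; /* NULL: family has no watchdog support */
	uint16_t wdog_ctrl;                /* simulated watchdog control register */
	int irq_enabled;
};

static int demo_setup(struct chip *chip)
{
	chip->irq_enabled = 1;
	return 0;
}

static int demo_action(struct chip *chip, int irq)
{
	/* Log why the watchdog fired, then mask the interrupt instead of
	 * resetting the switch (a reset would lose the configuration). */
	printf("watchdog irq %d, control 0x%04x\n", irq, chip->wdog_ctrl);
	chip->irq_enabled = 0;
	return 0;
}

static void demo_free(struct chip *chip)
{
	chip->irq_enabled = 0;
}

static const struct watchdog_ops demo_watchdog_ops = {
	.setup = demo_setup,
	.action = demo_action,
	.free = demo_free,
};

/* Generic code: dispatch only if this family provides the callback. */
static int chip_watchdog_setup(struct chip *chip)
{
	if (!chip->wd_ops || !chip->wd_ops->setup)
		return 0;
	return chip->wd_ops->setup(chip);
}

static int chip_watchdog_irq(struct chip *chip, int irq)
{
	if (!chip->wd_ops || !chip->wd_ops->action)
		return 0;
	return chip->wd_ops->action(chip, irq);
}

static void chip_watchdog_free(struct chip *chip)
{
	if (chip->wd_ops && chip->wd_ops->free)
		chip->wd_ops->free(chip);
}
```

Adding a new family then only means providing another ops table; the generic setup, IRQ and teardown paths stay unchanged.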

Signed-off-by: Andrew Lunn 
---
v2:
  Use ops and exclude the 6390 family
  Add missing locks in the IRQ handler
v3:
  remove g2_ from ops name.
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 14 ++
 drivers/net/dsa/mv88e6xxx/global2.c   | 89 ++-
 drivers/net/dsa/mv88e6xxx/global2.h   |  4 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 21 +
 4 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 7b4e40b286e4..489a59f5dea3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3111,6 +3111,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.ppu_enable = mv88e6185_g1_ppu_enable,
.ppu_disable = mv88e6185_g1_ppu_disable,
@@ -3179,6 +3180,7 @@ static const struct mv88e6xxx_ops mv88e6123_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3205,6 +3207,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.ppu_enable = mv88e6185_g1_ppu_enable,
.ppu_disable = mv88e6185_g1_ppu_disable,
@@ -3232,6 +3235,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3250,6 +3254,7 @@ static const struct mv88e6xxx_ops mv88e6165_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3276,6 +3281,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3304,6 +3310,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3330,6 +3337,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3358,6 +3366,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.reset = mv88e6352_g1_reset,
 };
@@ -3380,6 +3389,7 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+   .watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
.ppu_enable = 

[PATCHv2 net-next] net: dsa: mv88e6xxx: Move forward declaration to where it is needed

2017-02-08 Thread Andrew Lunn
Move it out from the middle of the #defines to just before it is
needed.
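The move is safe because a forward declaration only has to appear before the first place that names the type, and a pointer to an incomplete struct is allowed. A minimal sketch, with hypothetical `struct ops`/`struct info` standing in for `struct mv88e6xxx_ops`/`struct mv88e6xxx_info`:

```c
#include <assert.h>
#include <stddef.h>

/* Forward (incomplete) declaration, placed just before its first use. */
struct ops;

struct info {
	int family;
	const struct ops *ops; /* pointer to an incomplete type is fine */
};

/* The full definition can come later (or live in another file). */
struct ops {
	int (*reset)(void);
};

static int fake_reset(void)
{
	return 0;
}

static const struct ops demo_ops = { .reset = fake_reset };
static const struct info demo_info = { .family = 6390, .ops = &demo_ops };
```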

Signed-off-by: Andrew Lunn 
Reviewed-by: Vivien Didelot 
---
v2:

Rebased onto latest net-next.

drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 8a21800374f3..d6b335cd8c09 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -662,8 +662,6 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAGS_PVT |  \
 MV88E6XXX_FLAGS_SERDES)
 
-struct mv88e6xxx_ops;
-
 #define MV88E6XXX_FLAGS_FAMILY_6390\
(MV88E6XXX_FLAG_EEE |   \
 MV88E6XXX_FLAG_GLOBAL2 |   \
@@ -673,6 +671,8 @@ struct mv88e6xxx_ops;
 MV88E6XXX_FLAGS_MULTI_CHIP |   \
 MV88E6XXX_FLAGS_PVT)
 
+struct mv88e6xxx_ops;
+
 struct mv88e6xxx_info {
enum mv88e6xxx_family family;
u16 prod_num;
-- 
2.11.0



[PATCH v3 net-next 0/2] mv88e6xxx Watchdog support

2017-02-08 Thread Andrew Lunn
The Marvell switches have a built-in watchdog over some of the
internal state machines. The watchdog can be configured to raise an
interrupt on error. The problem the watchdog found is then logged to
the kernel log.

The older switches can automagically perform a software reset when the
watchdog triggers. This just resets the internal state machine, but
leaves the switch configuration unchanged.

The 6390 family of switches cannot both raise an interrupt and
automagically perform a software reset. So the interrupt handler has
to perform the switch reset, and then re-enable the watchdog
interrupts.
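The 6390 flow described above (log, reset, then re-arm) might look roughly like this in simulated form. Everything here is hypothetical: `struct sim_chip`, the bit values and `sim_switch_reset()` model the behaviour described in this cover letter, not the real GLOBAL2 watchdog registers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit values only, not the real register layout. */
#define WDOG_INT_ENABLE 0x0001u
#define WDOG_EVENT      0x0100u /* pretend "watchdog fired" status bit */

struct sim_chip {
	uint16_t wdog_ctrl; /* simulated watchdog control register */
	unsigned int resets;
};

static void sim_switch_reset(struct sim_chip *chip)
{
	/* Model a reset of the internal state machines that also clears
	 * the interrupt enable, so the handler has to re-arm it. */
	chip->wdog_ctrl = 0;
	chip->resets++;
}

/* The hardware cannot both interrupt and auto-reset, so the handler
 * logs the cause, resets the switch itself, then re-enables the IRQ. */
static int sim_watchdog_irq(struct sim_chip *chip)
{
	uint16_t cause = chip->wdog_ctrl;

	if (!(cause & WDOG_EVENT))
		return 0; /* not a watchdog event */

	fprintf(stderr, "watchdog event, control 0x%04x\n", cause);
	sim_switch_reset(chip);
	chip->wdog_ctrl |= WDOG_INT_ENABLE; /* re-arm */
	return 1;
}
```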

This has been tested using hacked-together debugfs code which allows
the "force" bit to be set, to cause a watchdog interrupt.

v2: Remove g2_prefix

Andrew Lunn (2):
  net: dsa: mv88e6xxx: Add watchdog interrupt handler
  net: dsa: mv88e6xxx: Add mv88e6390 watchdog interrupt support

 drivers/net/dsa/mv88e6xxx/chip.c  |  23 ++
 drivers/net/dsa/mv88e6xxx/global2.c   | 137 +-
 drivers/net/dsa/mv88e6xxx/global2.h   |   6 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  33 
 4 files changed, 198 insertions(+), 1 deletion(-)

-- 
2.11.0



[PATCH] net: micrel: ks8695net: use new api ethtool_{get|set}_link_ksettings

2017-02-08 Thread Philippe Reynes
The ethtool API {get|set}_settings is deprecated.
This patch moves this driver to the new {get|set}_link_ksettings API.
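The main mechanical change in this kind of conversion is that `struct ethtool_link_ksettings` carries link modes in a bitmap rather than a `u32`. That is why the driver builds local `supported`/`advertising` masks and converts them with `ethtool_convert_legacy_u32_to_link_mode()` at the end. A user-space sketch of that conversion, assuming a hypothetical 96-bit bitmap (the kernel's real size and helpers differ):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bitmap size; larger than 32 bits on purpose. */
#define DEMO_LINK_MODE_BITS 96
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS(n) (((n) + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Clear the whole bitmap, then copy the legacy 32-bit mask into its
 * low bits, one bit at a time. */
static void demo_legacy_u32_to_bitmap(unsigned long *dst, uint32_t legacy)
{
	size_t bit;

	memset(dst, 0, BITMAP_LONGS(DEMO_LINK_MODE_BITS) * sizeof(*dst));
	for (bit = 0; bit < 32; bit++)
		if (legacy & (UINT32_C(1) << bit))
			dst[bit / BITS_PER_ULONG] |= 1UL << (bit % BITS_PER_ULONG);
}

static int demo_test_bit(const unsigned long *map, size_t bit)
{
	return (int)((map[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1UL);
}
```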

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/micrel/ks8695net.c |   91 +--
 1 files changed, 50 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/micrel/ks8695net.c b/drivers/net/ethernet/micrel/ks8695net.c
index d210615..bd51e05 100644
--- a/drivers/net/ethernet/micrel/ks8695net.c
+++ b/drivers/net/ethernet/micrel/ks8695net.c
@@ -854,85 +854,94 @@ static int ks8695_poll(struct napi_struct *napi, int budget)
 }
 
 /**
- * ks8695_wan_get_settings - Get device-specific settings.
+ * ks8695_wan_get_link_ksettings - Get device-specific settings.
  * @ndev: The network device to read settings from
  * @cmd: The ethtool structure to read into
  */
 static int
-ks8695_wan_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+ks8695_wan_get_link_ksettings(struct net_device *ndev,
+ struct ethtool_link_ksettings *cmd)
 {
struct ks8695_priv *ksp = netdev_priv(ndev);
u32 ctrl;
+   u32 supported, advertising;
 
/* All ports on the KS8695 support these... */
-   cmd->supported = (SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full |
+   supported = (SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full |
  SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full |
  SUPPORTED_TP | SUPPORTED_MII);
-   cmd->transceiver = XCVR_INTERNAL;
 
-   cmd->advertising = ADVERTISED_TP | ADVERTISED_MII;
-   cmd->port = PORT_MII;
-   cmd->supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause);
-   cmd->phy_address = 0;
+   advertising = ADVERTISED_TP | ADVERTISED_MII;
+   cmd->base.port = PORT_MII;
+   supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause);
+   cmd->base.phy_address = 0;
 
ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
if ((ctrl & WMC_WAND) == 0) {
/* auto-negotiation is enabled */
-   cmd->advertising |= ADVERTISED_Autoneg;
+   advertising |= ADVERTISED_Autoneg;
if (ctrl & WMC_WANA100F)
-   cmd->advertising |= ADVERTISED_100baseT_Full;
+   advertising |= ADVERTISED_100baseT_Full;
if (ctrl & WMC_WANA100H)
-   cmd->advertising |= ADVERTISED_100baseT_Half;
+   advertising |= ADVERTISED_100baseT_Half;
if (ctrl & WMC_WANA10F)
-   cmd->advertising |= ADVERTISED_10baseT_Full;
+   advertising |= ADVERTISED_10baseT_Full;
if (ctrl & WMC_WANA10H)
-   cmd->advertising |= ADVERTISED_10baseT_Half;
+   advertising |= ADVERTISED_10baseT_Half;
if (ctrl & WMC_WANAP)
-   cmd->advertising |= ADVERTISED_Pause;
-   cmd->autoneg = AUTONEG_ENABLE;
+   advertising |= ADVERTISED_Pause;
+   cmd->base.autoneg = AUTONEG_ENABLE;
 
-   ethtool_cmd_speed_set(cmd,
- (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10);
-   cmd->duplex = (ctrl & WMC_WDS) ?
+   cmd->base.speed = (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10;
+   cmd->base.duplex = (ctrl & WMC_WDS) ?
DUPLEX_FULL : DUPLEX_HALF;
} else {
/* auto-negotiation is disabled */
-   cmd->autoneg = AUTONEG_DISABLE;
+   cmd->base.autoneg = AUTONEG_DISABLE;
 
-   ethtool_cmd_speed_set(cmd, ((ctrl & WMC_WANF100) ?
-   SPEED_100 : SPEED_10));
-   cmd->duplex = (ctrl & WMC_WANFF) ?
+   cmd->base.speed = (ctrl & WMC_WANF100) ?
+   SPEED_100 : SPEED_10;
+   cmd->base.duplex = (ctrl & WMC_WANFF) ?
DUPLEX_FULL : DUPLEX_HALF;
}
 
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   supported);
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
+   advertising);
+
return 0;
 }
 
 /**
- * ks8695_wan_set_settings - Set device-specific settings.
+ * ks8695_wan_set_link_ksettings - Set device-specific settings.
  * @ndev: The network device to configure
  * @cmd: The settings to configure
  */
 static int
-ks8695_wan_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+ks8695_wan_set_link_ksettings(struct net_device *ndev,
+ const struct ethtool_link_ksettings *cmd)
 {
struct ks8695_priv *ksp = netdev_priv(ndev);
u32 ctrl;
+   u32 

Re: [PATCH v2 net-next 3/9] openvswitch: Simplify labels length logic.

2017-02-08 Thread Joe Stringer
On 8 February 2017 at 11:32, Jarno Rajahalme  wrote:
> Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
> distinct labels"), the size of the conntrack labels extension has been
> fixed at 128 bits, so we do not need to check for label sizes shorter
> than 128 at run-time.  This patch simplifies the labels length logic
> accordingly, but allows the conntrack labels size to be increased in the
> future without breaking the build.  In the event of conntrack labels
> increasing in size, OVS would still be able to deal with the first 128
> label bits.
>
> Suggested-by: Joe Stringer 
> Signed-off-by: Jarno Rajahalme 
> ---
>  net/openvswitch/conntrack.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 6730f09..a07e5cd 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -129,22 +129,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
>  #endif
>  }
>
> +/* Guard against conntrack labels max size shrinking below 128 bits. */
> +#if NF_CT_LABELS_MAX_SIZE < 16
> +#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes
> +#endif
> +
>  static void ovs_ct_get_labels(const struct nf_conn *ct,
>   struct ovs_key_ct_labels *labels)
>  {
> struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
>
> -   if (cl) {
> -   size_t len = sizeof(cl->bits);
> -
> -   if (len > OVS_CT_LABELS_LEN)
> -   len = OVS_CT_LABELS_LEN;
> -   else if (len < OVS_CT_LABELS_LEN)
> -   memset(labels, 0, OVS_CT_LABELS_LEN);
> -   memcpy(labels, cl->bits, len);
> -   } else {
> +   if (cl)
> +   memcpy(labels, cl->bits,
> +  sizeof(cl->bits) > OVS_CT_LABELS_LEN
> +  ? OVS_CT_LABELS_LEN : sizeof(cl->bits));

Is this to be defensive? If sizeof(cl->bits) is larger than
OVS_CT_LABELS_LEN, we'll use OVS_CT_LABELS_LEN; if it's equal, we'll
still use OVS_CT_LABELS_LEN; if it's less, the preprocessor check above
will fail the build. Why not just memcpy(.., OVS_CT_LABELS_LEN)?
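The suggestion can be illustrated with a small sketch: once a build-time check guarantees the source holds at least the number of bytes OVS copies, the copy length can simply be that constant. This sketch uses C11 `_Static_assert` instead of the patch's `#if`/`#error` (the patch tests a preprocessor macro, so `#if` is the natural fit there); `OVS_LABELS_LEN` and the `demo_*` types are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OVS_LABELS_LEN 16 /* hypothetical: bytes OVS exposes */

struct demo_conn_labels {
	uint8_t bits[16]; /* stand-in for conntrack's label storage */
};

struct demo_ovs_labels {
	uint8_t ct_labels[OVS_LABELS_LEN];
};

/* Fail the build, not the run, if the source ever becomes too small. */
_Static_assert(sizeof(((struct demo_conn_labels *)0)->bits) >= OVS_LABELS_LEN,
	       "conntrack labels must be at least 128 bits");

static void demo_get_labels(const struct demo_conn_labels *cl,
			    struct demo_ovs_labels *labels)
{
	if (cl)
		/* Guaranteed by the assertion: a fixed-length copy is safe. */
		memcpy(labels->ct_labels, cl->bits, OVS_LABELS_LEN);
	else
		memset(labels, 0, sizeof(*labels));
}
```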


Re: [PATCH net-next] net: dsa: Fix duplicate object rule

2017-02-08 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> While adding switch.o to the list of DSA object files, we essentially
> duplicated the previous dsa_core-y line and just added switch.o. Remove
> the duplicate.
>
> Fixes: f515f192ab4f ("net: dsa: add switch notifier")
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 

My bad, thanks!

   Vivien


[PATCH net-next] net: dsa: Fix duplicate object rule

2017-02-08 Thread Florian Fainelli
While adding switch.o to the list of DSA object files, we essentially
duplicated the previous dsa_core-y line and just added switch.o. Remove
the duplicate.

Fixes: f515f192ab4f ("net: dsa: add switch notifier")
Signed-off-by: Florian Fainelli 
---
 net/dsa/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 72912982de3d..31d343796251 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -1,6 +1,5 @@
 # the core
 obj-$(CONFIG_NET_DSA) += dsa_core.o
-dsa_core-y += dsa.o slave.o dsa2.o
 dsa_core-y += dsa.o slave.o dsa2.o switch.o
 
 # tagging formats
-- 
2.9.3



[PATCH net] net: phy: Initialize mdio clock at probe function

2017-02-08 Thread Jon Mason
From: Yendapally Reddy Dhananjaya Reddy 

USB PHYs need the MDIO clock divisor enabled earlier to work.
Initialize the MDIO clock divisor in the probe function. The ext bus
bit available in the same register will be used by the MDIO mux
to enable external MDIO.

Signed-off-by: Yendapally Reddy Dhananjaya Reddy 
Fixes: ddc24ae1 ("net: phy: Broadcom iProc MDIO bus driver")
Reviewed-by: Florian Fainelli 
Signed-off-by: Jon Mason 
---
 drivers/net/phy/mdio-bcm-iproc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/mdio-bcm-iproc.c b/drivers/net/phy/mdio-bcm-iproc.c
index c0b4e65..46fe1ae 100644
--- a/drivers/net/phy/mdio-bcm-iproc.c
+++ b/drivers/net/phy/mdio-bcm-iproc.c
@@ -81,8 +81,6 @@ static int iproc_mdio_read(struct mii_bus *bus, int phy_id, int reg)
if (rc)
return rc;
 
-   iproc_mdio_config_clk(priv->base);
-
/* Prepare the read operation */
cmd = (MII_DATA_TA_VAL << MII_DATA_TA_SHIFT) |
(reg << MII_DATA_RA_SHIFT) |
@@ -112,8 +110,6 @@ static int iproc_mdio_write(struct mii_bus *bus, int phy_id,
if (rc)
return rc;
 
-   iproc_mdio_config_clk(priv->base);
-
/* Prepare the write operation */
cmd = (MII_DATA_TA_VAL << MII_DATA_TA_SHIFT) |
(reg << MII_DATA_RA_SHIFT) |
@@ -163,6 +159,8 @@ static int iproc_mdio_probe(struct platform_device *pdev)
bus->read = iproc_mdio_read;
bus->write = iproc_mdio_write;
 
+   iproc_mdio_config_clk(priv->base);
+
rc = of_mdiobus_register(bus, pdev->dev.of_node);
if (rc) {
dev_err(&pdev->dev, "MDIO bus registration failed\n");
-- 
2.7.4



Re: net/ipv4: warning in nf_nat_ipv4_fn

2017-02-08 Thread Florian Westphal
Andrey Konovalov  wrote:
> Hi,
> 
> I've got the following error report while fuzzing the kernel with syzkaller.
> 
> On commit 926af6273fc683cd98cd0ce7bf0d04a02eed6742.
> 
> A reproducer and .config are attached.
> 
> WARNING: CPU: 2 PID: 26582 at
> net/ipv4/netfilter/nf_nat_l3proto_ipv4.c:261
> nf_nat_ipv4_fn+0x7f2/0xa50
> net/ipv4/netfilter/nf_nat_l3proto_ipv4.c:261
> Kernel panic - not syncing: panic_on_warn set ...

That's this assert:
/* We never see fragments: conntrack defrags on pre-routing
 * and local-out, and nf_nat_out protects post-routing.
 */
NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));


... and it's wrong. I will send a patch to remove it.
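For reference, the condition in that assertion treats a packet as a fragment when either the More Fragments flag or the fragment offset is set. A user-space sketch of that test, mirroring the kernel's `ip_is_fragment()` check on `frag_off` (the `DEMO_*` constants are local copies of the RFC 791 bit layout, not kernel headers):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 16-bit flags/fragment-offset field (RFC 791), host order. */
#define DEMO_IP_MF     0x2000 /* More Fragments */
#define DEMO_IP_OFFSET 0x1FFF /* fragment offset mask */

/* frag_off_be is in network byte order, as in the on-wire header.
 * MF set: first or middle fragment; offset non-zero: middle or last. */
static bool demo_ip_is_fragment(uint16_t frag_off_be)
{
	return (frag_off_be & htons(DEMO_IP_MF | DEMO_IP_OFFSET)) != 0;
}
```

A packet with only DF (0x4000) set is not a fragment, which is the common case on most paths.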



Re: [PATCH] [net-next] ARM: orion: remove unused wnr854t_switch_plat_data

2017-02-08 Thread Florian Fainelli
On 02/08/2017 01:24 PM, Arnd Bergmann wrote:
> The other instances of this structure got removed along with the MDIO
> device change, but this one was left behind and needs to be removed
> as well:
> 
> arch/arm/mach-orion5x/wnr854t-setup.c:109:44: error: 
> 'wnr854t_switch_plat_data' defined but not used [-Werror=unused-variable]
>  static struct dsa_platform_data __initdata wnr854t_switch_plat_data = {
> 
> Fixes: 575e93f7b5e6 ("ARM: orion: Register DSA switch as a MDIO device")
> Signed-off-by: Arnd Bergmann 

Acked-by: Florian Fainelli 

Thanks Arnd!
-- 
Florian


RE: [PATCH 1/1] hv_netvsc: fix a netvsc stats typo

2017-02-08 Thread Simon Xiao
Please ignore this patch. I will resubmit it to net-next.

> -----Original Message-----
> From: Simon Xiao [mailto:six...@microsoft.com]
> Sent: Tuesday, February 7, 2017 10:03 AM
> To: KY Srinivasan ; Haiyang Zhang
> ; Stephen Hemminger
> ; de...@linuxdriverproject.org;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org
> Cc: Simon Xiao 
> Subject: [PATCH 1/1] hv_netvsc: fix a netvsc stats typo
> 
> 
> Now, return the correct tx_errors stats in netvsc.
> 
> Signed-off-by: Simon Xiao 
> Reviewed-by: Haiyang Zhang 
> ---
>  drivers/net/hyperv/netvsc_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 72b0c1f..725ac19 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -920,7 +920,7 @@ static void netvsc_get_stats64(struct net_device *net,
> }
> 
> t->tx_dropped   = net->stats.tx_dropped;
> -   t->tx_errors= net->stats.tx_dropped;
> +   t->tx_errors= net->stats.tx_errors;
> 
> t->rx_dropped   = net->stats.rx_dropped;
> t->rx_errors= net->stats.rx_errors;
> --
> 2.7.4



Re: loopback device reference count leakage

2017-02-08 Thread Cong Wang
On Mon, Feb 6, 2017 at 6:32 PM, Kaiwen Xu  wrote:
> Hi Cong,
>
> I did some more testing; it seems your second assumption is correct.
> There are indeed some things holding references to a particular dst,
> which prevents it from being gc'ed.

Excellent!

>
> I added logging to each dst_hold (or dst_hold_safe, or
> skb_dst_force_safe) and dst_release, formatted as follows:
>
>  () []: dst_release / dst_hold ...  
> 
>
> And inside dst_gc_task(), I added logging when a GC delay occurred,
> formatted as:
>
> [dst_gc_task]  (): delayed 
>
> I have the log attached.

The following line looks suspicious:

Feb  6 16:27:24  kernel: [63589.458067] [dst_gc_task] lodebug (2): delayed 19

Looks like you ended up having one dst whose refcnt is 19 in GC,
and this lasted for a rather long time for some reason.

It is hard to know if it is a refcnt leak even with your log, since there
were 4K+ refcnt operations on that dst...

Meanwhile, can you share the setup of your container? What network device
do you use in your container? How is it connected to the outside?

Thanks.


Re: [PATCHv6 net-next 3/6] sctp: add support for generating stream reconf ssn/tsn reset request chunk

2017-02-08 Thread Marcelo Ricardo Leitner
On Thu, Feb 09, 2017 at 01:18:17AM +0800, Xin Long wrote:
> This patch is to define SSN/TSN Reset Request Parameter described
> in rfc6525 section 4.3.
> 
> It's also to drop some unnecessary __packed in include/linux/sctp.h.

Oops, extra line in the changelog here.

> 
> Signed-off-by: Xin Long 
> ---
>  include/linux/sctp.h |  5 +
>  include/net/sctp/sm.h|  2 ++
>  net/sctp/sm_make_chunk.c | 29 +
>  3 files changed, 36 insertions(+)
> 
> diff --git a/include/linux/sctp.h b/include/linux/sctp.h
> index d74fca3..71c0d41 100644
> --- a/include/linux/sctp.h
> +++ b/include/linux/sctp.h
> @@ -737,4 +737,9 @@ struct sctp_strreset_inreq {
>   __u16 list_of_streams[0];
>  };
>  
> +struct sctp_strreset_tsnreq {
> + sctp_paramhdr_t param_hdr;
> + __u32 request_seq;
> +};
> +
>  #endif /* __LINUX_SCTP_H__ */
> diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
> index 430ed13..ac37c17 100644
> --- a/include/net/sctp/sm.h
> +++ b/include/net/sctp/sm.h
> @@ -265,6 +265,8 @@ struct sctp_chunk *sctp_make_strreset_req(
>   const struct sctp_association *asoc,
>   __u16 stream_num, __u16 *stream_list,
>   bool out, bool in);
> +struct sctp_chunk *sctp_make_strreset_tsnreq(
> + const struct sctp_association *asoc);
>  void sctp_chunk_assign_tsn(struct sctp_chunk *);
>  void sctp_chunk_assign_ssn(struct sctp_chunk *);
>  
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index c7d3249..749842a 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3658,3 +3658,32 @@ struct sctp_chunk *sctp_make_strreset_req(
>  
>   return retval;
>  }
> +
> +/* RE-CONFIG 4.3 (SSN/TSN RESET ALL)
> + *   0   1   2   3
> + *   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *  | Parameter Type = 15   |  Parameter Length = 8 |
> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *  | Re-configuration Request Sequence Number  |
> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + */
> +struct sctp_chunk *sctp_make_strreset_tsnreq(
> + const struct sctp_association *asoc)
> +{
> + struct sctp_strreset_tsnreq tsnreq;
> + __u16 length = sizeof(tsnreq);
> + struct sctp_chunk *retval;
> +
> + retval = sctp_make_reconf(asoc, length);
> + if (!retval)
> + return NULL;
> +
> + tsnreq.param_hdr.type = SCTP_PARAM_RESET_TSN_REQUEST;
> + tsnreq.param_hdr.length = htons(length);
> + tsnreq.request_seq = htonl(asoc->strreset_outseq);
> +
> + sctp_addto_chunk(retval, sizeof(tsnreq), &tsnreq);
> +
> + return retval;
> +}
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
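The parameter layout in the RFC 6525 diagram above maps onto 8 bytes on the wire. A user-space sketch that encodes it, with hypothetical `demo_*` names. Note that this sketch converts a host-order type constant with `htons()`, whereas the kernel's `SCTP_PARAM_*` values appear to be defined already in network byte order, which would explain why the patch assigns the type directly.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PARAM_RESET_TSN_REQUEST 15 /* RFC 6525 4.3, host order */

struct demo_paramhdr {
	uint16_t type;
	uint16_t length;
};

struct demo_strreset_tsnreq {
	struct demo_paramhdr param_hdr;
	uint32_t request_seq; /* Re-configuration Request Sequence Number */
};

/* Encode the parameter into buf (at least 8 bytes); returns its length. */
static size_t demo_build_tsnreq(uint8_t *buf, uint32_t request_seq)
{
	struct demo_strreset_tsnreq tsnreq;
	uint16_t length = sizeof(tsnreq);

	tsnreq.param_hdr.type = htons(DEMO_PARAM_RESET_TSN_REQUEST);
	tsnreq.param_hdr.length = htons(length);
	tsnreq.request_seq = htonl(request_seq);

	memcpy(buf, &tsnreq, sizeof(tsnreq));
	return sizeof(tsnreq);
}
```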


[PATCH 1/2] net: qcom/emac: add ethtool support for reading hardware registers

2017-02-08 Thread Timur Tabi
Implement the get_regs_len and get_regs ethtool methods.  The driver
returns the values of selected hardware registers.

To make the register offsets known to emac_ethtool, the register
offset macros are all combined into one header file.  They were
inexplicably and arbitrarily split between two files.
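The `get_regs_len`/`get_regs` pair described above boils down to "one `u32` per exported register, in table order". A user-space sketch with a simulated MMIO window; the offsets are taken from the EMAC defines in this patch (MAC_CTRL, TXQ_CTRL_0, RXQ_CTRL_0, DMA_CTRL), but `demo_mmio`, `demo_readl()` and the other `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated register window (zero-initialized). */
static uint8_t demo_mmio[0x2000];

static const uint16_t demo_regs[] = { 0x1480, 0x1590, 0x15a0, 0x15c0 };

#define DEMO_NUM_REGS (sizeof(demo_regs) / sizeof(demo_regs[0]))

static uint32_t demo_readl(size_t offset)
{
	uint32_t v;

	memcpy(&v, demo_mmio + offset, sizeof(v));
	return v;
}

/* get_regs_len: bytes needed for one u32 per exported register. */
static int demo_get_regs_len(void)
{
	return (int)(DEMO_NUM_REGS * sizeof(uint32_t));
}

/* get_regs: fill the caller's buffer in the same order as the table. */
static void demo_get_regs(void *buff)
{
	uint32_t *val = buff;
	size_t i;

	for (i = 0; i < DEMO_NUM_REGS; i++)
		val[i] = demo_readl(demo_regs[i]);
}
```

Because the dump is positional, a version number that changes whenever the table changes lets userspace know how to label each word.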

Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c |  40 
 drivers/net/ethernet/qualcomm/emac/emac-mac.c |  52 ---
 drivers/net/ethernet/qualcomm/emac/emac.h | 108 --
 3 files changed, 119 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
index c418a6e..abb9df5 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
@@ -170,6 +170,43 @@ static int emac_set_pauseparam(struct net_device *netdev,
return 0;
 }
 
+/* Selected registers that we might want to track at runtime. */
+static const u16 emac_regs[] = {
+   EMAC_DMA_MAS_CTRL,
+   EMAC_MAC_CTRL,
+   EMAC_TXQ_CTRL_0,
+   EMAC_RXQ_CTRL_0,
+   EMAC_DMA_CTRL,
+   EMAC_INT_MASK,
+   EMAC_AXI_MAST_CTRL,
+   EMAC_CORE_HW_VERSION,
+   EMAC_MISC_CTRL,
+};
+
+/* Every time emac_regs[] above is changed, increase this version number. */
+#define EMAC_REGS_VERSION  0
+
+#define EMAC_MAX_REG_SIZE  ARRAY_SIZE(emac_regs)
+
+static void emac_get_regs(struct net_device *netdev,
+ struct ethtool_regs *regs, void *buff)
+{
+   struct emac_adapter *adpt = netdev_priv(netdev);
+   u32 *val = buff;
+   unsigned int i;
+
+   regs->version = EMAC_REGS_VERSION;
+   regs->len = EMAC_MAX_REG_SIZE * sizeof(u32);
+
+   for (i = 0; i < EMAC_MAX_REG_SIZE; i++)
+   val[i] = readl(adpt->base + emac_regs[i]);
+}
+
+static int emac_get_regs_len(struct net_device *netdev)
+{
+   return EMAC_MAX_REG_SIZE * sizeof(u32);
+}
+
 static const struct ethtool_ops emac_ethtool_ops = {
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = phy_ethtool_set_link_ksettings,
@@ -189,6 +226,9 @@ static int emac_set_pauseparam(struct net_device *netdev,
.nway_reset = emac_nway_reset,
 
.get_link = ethtool_op_get_link,
+
+   .get_regs_len= emac_get_regs_len,
+   .get_regs= emac_get_regs,
 };
 
 void emac_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 4b3e014..cc065ff 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -25,58 +25,6 @@
 #include "emac.h"
 #include "emac-sgmii.h"
 
-/* EMAC base register offsets */
-#define EMAC_MAC_CTRL  0x001480
-#define EMAC_WOL_CTRL0 0x0014a0
-#define EMAC_RSS_KEY0  0x0014b0
-#define EMAC_H1TPD_BASE_ADDR_LO0x0014e0
-#define EMAC_H2TPD_BASE_ADDR_LO0x0014e4
-#define EMAC_H3TPD_BASE_ADDR_LO0x0014e8
-#define EMAC_INTER_SRAM_PART9  0x001534
-#define EMAC_DESC_CTRL_0   0x001540
-#define EMAC_DESC_CTRL_1   0x001544
-#define EMAC_DESC_CTRL_2   0x001550
-#define EMAC_DESC_CTRL_10  0x001554
-#define EMAC_DESC_CTRL_12  0x001558
-#define EMAC_DESC_CTRL_13  0x00155c
-#define EMAC_DESC_CTRL_3   0x001560
-#define EMAC_DESC_CTRL_4   0x001564
-#define EMAC_DESC_CTRL_5   0x001568
-#define EMAC_DESC_CTRL_14  0x00156c
-#define EMAC_DESC_CTRL_15  0x001570
-#define EMAC_DESC_CTRL_16  0x001574
-#define EMAC_DESC_CTRL_6   0x001578
-#define EMAC_DESC_CTRL_8   0x001580
-#define EMAC_DESC_CTRL_9   0x001584
-#define EMAC_DESC_CTRL_11  0x001588
-#define EMAC_TXQ_CTRL_00x001590
-#define EMAC_TXQ_CTRL_10x001594
-#define EMAC_TXQ_CTRL_20x001598
-#define EMAC_RXQ_CTRL_00x0015a0
-#define EMAC_RXQ_CTRL_10x0015a4
-#define EMAC_RXQ_CTRL_20x0015a8
-#define EMAC_RXQ_CTRL_30x0015ac
-#define EMAC_BASE_CPU_NUMBER   0x0015b8
-#define EMAC_DMA_CTRL  0x0015c0
-#define EMAC_MAILBOX_0 0x0015e0
-#define EMAC_MAILBOX_5 0x0015e4
-#define EMAC_MAILBOX_6 0x0015e8
-#define EMAC_MAILBOX_130x0015ec
-#define EMAC_MAILBOX_2 0x0015f4
-#define EMAC_MAILBOX_3 0x0015f8
-#define EMAC_MAILBOX_110x00160c
-#define EMAC_AXI_MAST_CTRL 0x001610
-#define EMAC_MAILBOX_120x001614

[PATCH 2/2] net: qcom/emac: add ethtool support for setting ring parameters

2017-02-08 Thread Timur Tabi
Implement the set_ringparam method, which allows the user to specify
the size of the TX and RX descriptor rings.  The values are constrained
to the limits of the hardware.

Since the driver does not use separate queues for mini or jumbo frames,
attempts to set those values are rejected.

If the interface is already running when the setting is changed, then
the interface is reset.
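The validation described above (reject mini/jumbo, clamp TX/RX to hardware limits, reset if running) can be sketched as follows. `DEMO_MIN_DESCS`/`DEMO_MAX_DESCS` are placeholder limits, `demo_clamp()` stands in for the kernel's `clamp_val()`, and the reinit counter stands in for `emac_reinit_locked()`.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder limits; the real EMAC_MIN/MAX_*_DESCS values are not shown. */
#define DEMO_MIN_DESCS 16u
#define DEMO_MAX_DESCS 16384u

struct demo_ring_req {
	unsigned int rx_pending, rx_mini_pending, rx_jumbo_pending, tx_pending;
};

struct demo_adapter {
	unsigned int rx_desc_cnt, tx_desc_cnt;
	int running, reinit_count;
};

static unsigned int demo_clamp(unsigned int v, unsigned int lo, unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

static int demo_set_ringparam(struct demo_adapter *adpt,
			      const struct demo_ring_req *ring)
{
	/* No separate mini/jumbo rings, so any request for them is rejected. */
	if (ring->rx_mini_pending || ring->rx_jumbo_pending)
		return -EINVAL;

	adpt->tx_desc_cnt = demo_clamp(ring->tx_pending, DEMO_MIN_DESCS, DEMO_MAX_DESCS);
	adpt->rx_desc_cnt = demo_clamp(ring->rx_pending, DEMO_MIN_DESCS, DEMO_MAX_DESCS);

	/* A running interface must be re-initialized to use the new sizes. */
	if (adpt->running)
		adpt->reinit_count++;
	return 0;
}
```

Clamping silently, rather than returning an error for out-of-range sizes, matches the behaviour the commit message describes.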

Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 24 +++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
index abb9df5..a3e2292 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
@@ -145,6 +145,29 @@ static void emac_get_ringparam(struct net_device *netdev,
ring->tx_pending = adpt->tx_desc_cnt;
 }
 
+static int emac_set_ringparam(struct net_device *netdev,
+ struct ethtool_ringparam *ring)
+{
+   struct emac_adapter *adpt = netdev_priv(netdev);
+
+   /* We don't have separate queues/rings for small/large frames, so
+* reject any attempt to specify those values separately.
+*/
+   if (ring->rx_mini_pending || ring->rx_jumbo_pending)
+   return -EINVAL;
+
+   adpt->tx_desc_cnt =
+   clamp_val(ring->tx_pending, EMAC_MIN_TX_DESCS, EMAC_MAX_TX_DESCS);
+
+   adpt->rx_desc_cnt =
+   clamp_val(ring->rx_pending, EMAC_MIN_RX_DESCS, EMAC_MAX_RX_DESCS);
+
+   if (netif_running(netdev))
+   return emac_reinit_locked(adpt);
+
+   return 0;
+}
+
 static void emac_get_pauseparam(struct net_device *netdev,
struct ethtool_pauseparam *pause)
 {
@@ -219,6 +242,7 @@ static int emac_get_regs_len(struct net_device *netdev)
.get_ethtool_stats = emac_get_ethtool_stats,
 
.get_ringparam = emac_get_ringparam,
+   .set_ringparam = emac_set_ringparam,
 
.get_pauseparam = emac_get_pauseparam,
.set_pauseparam = emac_set_pauseparam,
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.



Re: [PATCHv6 net-next 2/6] sctp: streams should be recovered when it fails to send request.

2017-02-08 Thread Marcelo Ricardo Leitner
On Thu, Feb 09, 2017 at 01:18:16AM +0800, Xin Long wrote:
> Now when sending stream reset request, it closes the streams to
> block further xmit of data until this request is completed, then
> calls sctp_send_reconf to send the chunk.
> 
> But if sctp_send_reconf returns an error, the streams' states are not
> restored, so the request chunk is never queued or sent and the asoc
> gets stuck: the streams stay closed and no packet is ever queued.
> 
> This patch fixes it by recovering the streams' states when sending
> the request fails. It also fixes a return value.
> 
> Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/stream.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/stream.c b/net/sctp/stream.c
> index 13d5e07..6a686e3 100644
> --- a/net/sctp/stream.c
> +++ b/net/sctp/stream.c
> @@ -136,8 +136,10 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
>   goto out;
>  
>   chunk = sctp_make_strreset_req(asoc, str_nums, str_list, out, in);
> - if (!chunk)
> + if (!chunk) {
> + retval = -ENOMEM;
>   goto out;
> + }
>  
>   if (out) {
>   if (str_nums)
> @@ -149,7 +151,6 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
>   stream->out[i].state = SCTP_STREAM_CLOSED;
>   }
>  
> - asoc->strreset_outstanding = out + in;
>   asoc->strreset_chunk = chunk;
>   sctp_chunk_hold(asoc->strreset_chunk);
>  
> @@ -157,8 +158,22 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
>   if (retval) {
>   sctp_chunk_put(asoc->strreset_chunk);
>   asoc->strreset_chunk = NULL;
> + if (!out)
> + goto out;
> +
> + if (str_nums)
> + for (i = 0; i < str_nums; i++)
> + stream->out[str_list[i]].state =
> +SCTP_STREAM_OPEN;
> + else
> + for (i = 0; i < stream->outcnt; i++)
> + stream->out[i].state = SCTP_STREAM_OPEN;
> +
> + goto out;
>   }
>  
> + asoc->strreset_outstanding = out + in;
> +
>  out:
>   return retval;
>  }
> -- 
> 2.1.0
> 
> 
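The fix follows a close, try, roll back pattern. A simplified sketch with hypothetical types: `send_err` stands in for the return value of `sctp_send_reconf()`, and the outstanding-request counter is only set after a successful send, as in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum demo_stream_state { DEMO_STREAM_OPEN, DEMO_STREAM_CLOSED };

struct demo_stream {
	enum demo_stream_state out[8];
	uint16_t outcnt;
};

/* Set the listed streams (or all of them when nums == 0) to state. */
static void demo_set_state(struct demo_stream *s, const uint16_t *list,
			   uint16_t nums, enum demo_stream_state state)
{
	uint16_t i;

	if (nums)
		for (i = 0; i < nums; i++)
			s->out[list[i]] = state;
	else
		for (i = 0; i < s->outcnt; i++)
			s->out[i] = state;
}

/* Close the streams, "send", and reopen them if sending fails, so the
 * association is not left with closed streams and nothing queued. */
static int demo_reset_streams(struct demo_stream *s, const uint16_t *list,
			      uint16_t nums, int send_err, int *outstanding)
{
	demo_set_state(s, list, nums, DEMO_STREAM_CLOSED);

	if (send_err) {
		demo_set_state(s, list, nums, DEMO_STREAM_OPEN); /* roll back */
		return send_err;
	}

	*outstanding = 1; /* only count the request once it was sent */
	return 0;
}
```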


Re: [PATCHv6 net-next 1/6] sctp: drop unnecessary __packed from some stream reconf structures

2017-02-08 Thread Marcelo Ricardo Leitner
On Thu, Feb 09, 2017 at 01:18:15AM +0800, Xin Long wrote:
> commit 85c727b59483 ("sctp: drop __packed from almost all SCTP structures")
> has removed __packed from almost all SCTP structures. But there still are
> three structures where it should be dropped.
> 
> This patch is to remove it from some stream reconf structures.
> 
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  include/linux/sctp.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/sctp.h b/include/linux/sctp.h
> index 2408c68..d74fca3 100644
> --- a/include/linux/sctp.h
> +++ b/include/linux/sctp.h
> @@ -721,7 +721,7 @@ struct sctp_infox {
>  struct sctp_reconf_chunk {
>   sctp_chunkhdr_t chunk_hdr;
>   __u8 params[0];
> -} __packed;
> +};
>  
>  struct sctp_strreset_outreq {
>   sctp_paramhdr_t param_hdr;
> @@ -729,12 +729,12 @@ struct sctp_strreset_outreq {
>   __u32 response_seq;
>   __u32 send_reset_at_tsn;
>   __u16 list_of_streams[0];
> -} __packed;
> +};
>  
>  struct sctp_strreset_inreq {
>   sctp_paramhdr_t param_hdr;
>   __u32 request_seq;
>   __u16 list_of_streams[0];
> -} __packed;
> +};
>  
>  #endif /* __LINUX_SCTP_H__ */
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

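Dropping __packed here should not change the on-wire layout, because every field in these structures is already naturally aligned; what changes is the alignment the compiler may assume, so on architectures without efficient unaligned access it no longer has to emit byte-wise loads and stores. A quick userspace check of that reasoning, assuming simplified stand-ins (struct model_paramhdr, struct outreq_natural, struct outreq_packed) that follow the field list of the outgoing SSN reset request parameter from RFC 6525, section 4.1, and GCC/Clang's __attribute__((packed)) on a common ABI where uint32_t is 4-byte aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for sctp_paramhdr_t: 16-bit type and length. */
struct model_paramhdr {
	uint16_t type;
	uint16_t length;
};

/* Outgoing SSN reset request parameter, without __packed. */
struct outreq_natural {
	struct model_paramhdr param_hdr;
	uint32_t request_seq;
	uint32_t response_seq;
	uint32_t send_reset_at_tsn;
	uint16_t list_of_streams[];
};

/* The same fields, with the GCC packed attribute. */
struct outreq_packed {
	struct model_paramhdr param_hdr;
	uint32_t request_seq;
	uint32_t response_seq;
	uint32_t send_reset_at_tsn;
	uint16_t list_of_streams[];
} __attribute__((packed));

/* Same size and offsets, so the bytes put on the wire are identical ... */
static_assert(sizeof(struct outreq_natural) == sizeof(struct outreq_packed),
	      "size differs");
static_assert(offsetof(struct outreq_natural, list_of_streams) ==
	      offsetof(struct outreq_packed, list_of_streams),
	      "stream list offset differs");

/* ... but only the unpacked version keeps the 4-byte alignment guarantee. */
static_assert(_Alignof(struct outreq_natural) == 4, "natural alignment");
static_assert(_Alignof(struct outreq_packed) == 1, "packed alignment");
```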

Re: [PATCH v2 net-next 1/9] sunvnet: make sunvnet common code dynamically loadable

2017-02-08 Thread Shannon Nelson

On 2/8/2017 11:29 AM, David Miller wrote:
> From: Shannon Nelson 
> Date: Tue,  7 Feb 2017 14:12:54 -0800
>
>> +static int __init sunvnet_common_init(void)
>> +{
>> +   pr_info("%s\n", version);
>> +   return 0;
>> +}
>> +module_init(sunvnet_common_init);
>> +
>> +static void __exit sunvnet_common_exit(void)
>> +{
>> +   /* Empty function, just here to fill the exit function pointer
>> +    * slot.  In some combinations of older gcc and newer kernel,
>> +    * leaving this undefined results in the kernel marking it as a
>> +    * permanent module; it will show up in lsmod output as [permanent]
>> +    * and not be unloadable.
>> +    */
>> +}
>> +module_exit(sunvnet_common_exit);
>> +
>
> This module is just providing infrastructure for other modules.
>
> So skip the init function, and that way you don't need the exit
> function either.
>
> The kernel log message when the real sunvnet driver loads is
> sufficient, you don't need one here.

Sure - thanks,
sln


Re: [PATCHv6 net-next 4/6] sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter

2017-02-08 Thread Marcelo Ricardo Leitner
On Wed, Feb 08, 2017 at 07:48:01PM -0200, Marcelo Ricardo Leitner wrote:
> Hi Xin,
> 
> On Thu, Feb 09, 2017 at 01:18:18AM +0800, Xin Long wrote:
> > This patch is to implement Sender-Side Procedures for the SSN/TSN
> > Reset Request Parameter described in rfc6525 section 5.1.4.
> > 
> > It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
> > for users.
> > 
> > Signed-off-by: Xin Long 
> ...
> > +
> > +int sctp_send_reset_assoc(struct sctp_association *asoc)
> > +{
> > +   struct sctp_chunk *chunk = NULL;
> > +   int retval;
> > +   __u16 i;
> > +
> > +   if (!asoc->peer.reconf_capable ||
> > +   !(asoc->strreset_enable & SCTP_ENABLE_RESET_ASSOC_REQ))
> > +   return -ENOPROTOOPT;
> > +
> > +   if (asoc->strreset_outstanding)
> > +   return -EINPROGRESS;
> > +
> > +   chunk = sctp_make_strreset_tsnreq(asoc);
>   ^--- refcnt = 1 (as per sctp_chunkify())
> 
> > +   if (!chunk)
> > +   return -ENOMEM;
> > +
> > +   /* Block further xmit of data until this request is completed */
> > +   for (i = 0; i < asoc->stream->outcnt; i++)
> > +   asoc->stream->out[i].state = SCTP_STREAM_CLOSED;
> > +
> > +   asoc->strreset_chunk = chunk;
> > +   sctp_chunk_hold(asoc->strreset_chunk);
>  ^--- refcnt = 2
> > +
> > +   retval = sctp_send_reconf(asoc, chunk);
> > +   if (retval) {
> > +   sctp_chunk_put(asoc->strreset_chunk);
>   ^--- refcnt = 1
> 
> Won't we leak the chunk here?

No we won't, sctp_send_reconf() frees it for us, aye.
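The reference-count walk-through above can be checked with a toy model. This is a sketch, not kernel code: struct model_chunk, model_chunk_hold(), model_chunk_put() and model_send_reconf() are hypothetical, and the model simply assumes, as stated in the reply above, that sctp_send_reconf() drops the reference it was given when it fails.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct sctp_chunk. */
struct model_chunk {
	int refcnt;
};

static int model_frees;	/* number of chunks actually freed */

static struct model_chunk *model_chunk_alloc(void)
{
	struct model_chunk *c = malloc(sizeof(*c));

	if (c)
		c->refcnt = 1;	/* like sctp_chunkify(): one initial ref */
	return c;
}

static void model_chunk_hold(struct model_chunk *c)
{
	c->refcnt++;
}

static void model_chunk_put(struct model_chunk *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0) {
		free(c);
		model_frees++;
	}
}

/* Assumed behaviour of sctp_send_reconf(): on failure it drops the
 * reference it was handed; on success the queued chunk keeps it
 * (the later transmit/free path is not modelled). */
static int model_send_reconf(struct model_chunk *c, int fail)
{
	if (fail) {
		model_chunk_put(c);
		return -1;
	}
	return 0;
}
```

With the put inside the failed send plus the put in the caller's error path, the count reaches zero and the chunk is freed exactly once, which is why the error path does not leak.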




[PATCH 0/2] net: qcom/emac: add the last ethtool functions

2017-02-08 Thread Timur Tabi
These two patches implement the remaining two ethtool functions that
are of interest to the Qualcomm EMAC driver.  These are the last 
patches that will be submitted for the 4.11 merge window.

Timur Tabi (2):
  net: qcom/emac: add ethtool support for reading hardware registers
  net: qcom/emac: add ethtool support for setting ring parameters

 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c |  64 +
 drivers/net/ethernet/qualcomm/emac/emac-mac.c |  52 ---
 drivers/net/ethernet/qualcomm/emac/emac.h | 108 --
 3 files changed, 143 insertions(+), 81 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.



Re: [PATCHv6 net-next 4/6] sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter

2017-02-08 Thread Marcelo Ricardo Leitner
Hi Xin,

On Thu, Feb 09, 2017 at 01:18:18AM +0800, Xin Long wrote:
> This patch is to implement Sender-Side Procedures for the SSN/TSN
> Reset Request Parameter described in rfc6525 section 5.1.4.
> 
> It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
> for users.
> 
> Signed-off-by: Xin Long 
...
> +
> +int sctp_send_reset_assoc(struct sctp_association *asoc)
> +{
> + struct sctp_chunk *chunk = NULL;
> + int retval;
> + __u16 i;
> +
> + if (!asoc->peer.reconf_capable ||
> + !(asoc->strreset_enable & SCTP_ENABLE_RESET_ASSOC_REQ))
> + return -ENOPROTOOPT;
> +
> + if (asoc->strreset_outstanding)
> + return -EINPROGRESS;
> +
> + chunk = sctp_make_strreset_tsnreq(asoc);
  ^--- refcnt = 1 (as per sctp_chunkify())

> + if (!chunk)
> + return -ENOMEM;
> +
> + /* Block further xmit of data until this request is completed */
> + for (i = 0; i < asoc->stream->outcnt; i++)
> + asoc->stream->out[i].state = SCTP_STREAM_CLOSED;
> +
> + asoc->strreset_chunk = chunk;
> + sctp_chunk_hold(asoc->strreset_chunk);
 ^--- refcnt = 2
> +
> + retval = sctp_send_reconf(asoc, chunk);
> + if (retval) {
> + sctp_chunk_put(asoc->strreset_chunk);
  ^--- refcnt = 1

Won't we leak the chunk here?


> + asoc->strreset_chunk = NULL;
> +
> + for (i = 0; i < asoc->stream->outcnt; i++)
> + asoc->stream->out[i].state = SCTP_STREAM_OPEN;
> +
> + return retval;
> + }
> +
> + asoc->strreset_outstanding = 1;
> +
> + return 0;
> +}
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


RE: [PATCHv4 0/7] Refactor macvtap to re-use tap functionality by other virtual intefaces

2017-02-08 Thread Grandhi, Sainath


> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, February 07, 2017 10:28 AM
> To: Grandhi, Sainath 
> Cc: netdev@vger.kernel.org; mah...@bandewar.net; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCHv4 0/7] Refactor macvtap to re-use tap functionality by
> other virtual intefaces
> 
> From: Sainath Grandhi 
> Date: Mon,  6 Feb 2017 13:36:08 -0800
> 
> > Tap character devices can be implemented on other virtual interfaces
> > like ipvlan, similar to macvtap. Source code for tap functionality in
> > macvtap can be re-used for this purpose.
> >
> > This patch series splits macvtap source into two modules, macvtap and tap.
> > This patch series also includes a patch for implementing tap character
> > device driver based on the IP-VLAN network interface, called ipvtap.
> >
> > These patches are tested on x86 platform.
> 
> You're going to have to rework the module and Kconfig parts of this set of
> changes.
> 
> The user should not have to modify any existing Kconfig setting to get the
> same set of modules which already exist today.
> 
> Yet when I run "make oldconfig" after applying these changes it prompts me
> for:
> 
> TAP module support for virtual interfaces (TAP) [N/m/y/?] (NEW)
> 
> And that's not really acceptable.  I had MACVTAP set, I should still get the
> infrastructure necessary to get that module built.
> 
> If you want to do patch #6 you have to do it in a way that is transparent to
> existing kernel configs.

Please check the next version of the patches. I modified Kconfig to make TAP a
symbol that is not visible to the user.

> 
> Thanks.


[PATCHv5 0/7] Refactor macvtap to re-use tap functionality by other virtual intefaces

2017-02-08 Thread Sainath Grandhi
Tap character devices can be implemented on other virtual interfaces like
ipvlan, similar to macvtap. Source code for tap functionality in macvtap
can be re-used for this purpose.

This patch series splits macvtap source into two modules, macvtap and tap.
This patch series also includes a patch for implementing tap character
device driver based on the IP-VLAN network interface, called ipvtap.

These patches are tested on x86 platform.

Sainath Grandhi (7):
  tap: Refactoring macvtap.c
  tap: Renaming tap related APIs, data structures, macros
  tap: Tap character device creation/destroy API
  tap: Abstract type of virtual interface from tap  implementation
  tap: Extending tap device create/destroy APIs
  tap: tap as an independent module
  ipvtap: IP-VLAN based tap driver

 drivers/net/Kconfig  |   20 +
 drivers/net/Makefile |2 +
 drivers/net/ipvlan/Makefile  |1 +
 drivers/net/ipvlan/ipvlan.h  |7 +
 drivers/net/ipvlan/ipvlan_core.c |5 +-
 drivers/net/ipvlan/ipvlan_main.c |   27 +-
 drivers/net/ipvlan/ipvtap.c  |  241 
 drivers/net/macvlan.c|2 +-
 drivers/net/macvtap.c| 1229 ++--
 drivers/net/tap.c| 1268 ++
 drivers/vhost/Kconfig|2 +-
 drivers/vhost/net.c  |3 +-
 include/linux/if_macvlan.h   |   17 +-
 include/linux/if_tap.h   |   75 +++
 14 files changed, 1690 insertions(+), 1209 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c
 create mode 100644 drivers/net/tap.c
 create mode 100644 include/linux/if_tap.h

-- 
2.7.4



[PATCHv5 3/7] tap: Tap character device creation/destroy API

2017-02-08 Thread Sainath Grandhi
This patch provides tap device create/destroy APIs in tap.c.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c | 30 +++---
 drivers/net/tap.c  | 62 ++
 include/linux/if_tap.h |  3 +++
 3 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 548f339..215ab7a 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -28,7 +28,6 @@
  * Variables for dealing with macvtaps device numbers.
  */
 static dev_t macvtap_major;
-#define MACVTAP_NUM_DEVS (1U << MINORBITS)
 
 static const void *macvtap_net_namespace(struct device *d)
 {
@@ -159,57 +158,46 @@ static struct notifier_block macvtap_notifier_block __read_mostly = {
.notifier_call  = macvtap_device_event,
 };
 
-extern struct file_operations tap_fops;
 static int macvtap_init(void)
 {
int err;
 
-   err = alloc_chrdev_region(&macvtap_major, 0,
-   MACVTAP_NUM_DEVS, "macvtap");
-   if (err)
-   goto out1;
+   err = tap_create_cdev(&macvtap_cdev, &macvtap_major, "macvtap");
 
-   cdev_init(&macvtap_cdev, &tap_fops);
-   err = cdev_add(&macvtap_cdev, macvtap_major, MACVTAP_NUM_DEVS);
if (err)
-   goto out2;
+   goto out1;
 
err = class_register(&macvtap_class);
if (err)
-   goto out3;
+   goto out2;
 
err = register_netdevice_notifier(&macvtap_notifier_block);
if (err)
-   goto out4;
+   goto out3;
 
err = macvlan_link_register(&macvtap_link_ops);
if (err)
-   goto out5;
+   goto out4;
 
return 0;
 
-out5:
-   unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-   class_unregister(&macvtap_class);
+   unregister_netdevice_notifier(&macvtap_notifier_block);
 out3:
-   cdev_del(&macvtap_cdev);
+   class_unregister(&macvtap_class);
 out2:
-   unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
+   tap_destroy_cdev(macvtap_major, &macvtap_cdev);
 out1:
return err;
 }
 module_init(macvtap_init);
 
-extern struct idr minor_idr;
 static void macvtap_exit(void)
 {
rtnl_link_unregister(&macvtap_link_ops);
unregister_netdevice_notifier(&macvtap_notifier_block);
class_unregister(&macvtap_class);
-   cdev_del(&macvtap_cdev);
-   unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
-   idr_destroy(&minor_idr);
+   tap_destroy_cdev(macvtap_major, &macvtap_cdev);
 }
 module_exit(macvtap_exit);
 
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 15ca2d5..04ba978 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -123,8 +123,12 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
-static DEFINE_MUTEX(minor_lock);
-DEFINE_IDR(minor_idr);
+struct major_info {
+   dev_t major;
+   struct idr minor_idr;
+   struct mutex minor_lock;
+   const char *device_name;
+} macvtap_major;
 
 #define GOODCOPY_LEN 128
 
@@ -413,26 +417,26 @@ int tap_get_minor(struct macvlan_dev *vlan)
 {
int retval = -ENOMEM;
 
-   mutex_lock(&minor_lock);
-   retval = idr_alloc(&minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL);
+   mutex_lock(&macvtap_major.minor_lock);
+   retval = idr_alloc(&macvtap_major.minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL);
if (retval >= 0) {
vlan->minor = retval;
} else if (retval == -ENOSPC) {
netdev_err(vlan->dev, "Too many tap devices\n");
retval = -EINVAL;
}
-   mutex_unlock(&minor_lock);
+   mutex_unlock(&macvtap_major.minor_lock);
return retval < 0 ? retval : 0;
 }
 
 void tap_free_minor(struct macvlan_dev *vlan)
 {
-   mutex_lock(&minor_lock);
+   mutex_lock(&macvtap_major.minor_lock);
if (vlan->minor) {
-   idr_remove(&minor_idr, vlan->minor);
+   idr_remove(&macvtap_major.minor_idr, vlan->minor);
vlan->minor = 0;
}
-   mutex_unlock(&minor_lock);
+   mutex_unlock(&macvtap_major.minor_lock);
 }
 
 static struct net_device *dev_get_by_tap_minor(int minor)
@@ -440,13 +444,13 @@ static struct net_device *dev_get_by_tap_minor(int minor)
struct net_device *dev = NULL;
struct macvlan_dev *vlan;
 
-   mutex_lock(&minor_lock);
-   vlan = idr_find(&minor_idr, minor);
+   mutex_lock(&macvtap_major.minor_lock);
+   vlan = idr_find(&macvtap_major.minor_idr, minor);
if (vlan) {
dev = vlan->dev;
dev_hold(dev);
}
-   mutex_unlock(&minor_lock);
+   mutex_unlock(&macvtap_major.minor_lock);
return dev;
 }
 
@@ -1184,3 +1188,39 @@ int tap_queue_resize(struct macvlan_dev *vlan)
kfree(arrays);
return ret;
 }
+
+int tap_create_cdev(struct cdev *tap_cdev,
+   dev_t *tap_major, const char *device_name)
+{
+   int err;
+
+   err = alloc_chrdev_region(tap_major, 0, TAP_NUM_DEVS, device_name);
+   if (err)
+   goto out1;
+
+   cdev_init(tap_cdev, &tap_fops);
+ 

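The reworked macvtap_init() keeps the usual kernel error-unwind idiom: each goto outN label undoes, in reverse order, only the steps that had already succeeded, so folding the two character-device steps into tap_create_cdev() also shortens the label chain by one. A userspace sketch of the idiom, in which the setup steps are hypothetical placeholders that only record letters into a log (upper case for setup, lower case for teardown); the comments name the real call each step stands for:

```c
#include <assert.h>
#include <string.h>

static char model_log[32];	/* records which steps ran, in order */

static void log_step(const char *s)
{
	strcat(model_log, s);
}

/* Hypothetical setup step n; fails (returns -1) when n == fail_at. */
static int setup_step(int n, int fail_at, const char *tag)
{
	if (n == fail_at)
		return -1;
	log_step(tag);
	return 0;
}

/* fail_at selects which of the four steps fails (0: none fails). */
static int model_init(int fail_at)
{
	int err;

	assert(fail_at >= 0 && fail_at <= 4);
	model_log[0] = '\0';

	err = setup_step(1, fail_at, "C");	/* tap_create_cdev() */
	if (err)
		goto out1;
	err = setup_step(2, fail_at, "K");	/* class_register() */
	if (err)
		goto out2;
	err = setup_step(3, fail_at, "N");	/* register_netdevice_notifier() */
	if (err)
		goto out3;
	err = setup_step(4, fail_at, "L");	/* macvlan_link_register() */
	if (err)
		goto out4;
	return 0;

out4:
	log_step("n");	/* unregister_netdevice_notifier() */
out3:
	log_step("k");	/* class_unregister() */
out2:
	log_step("c");	/* tap_destroy_cdev() */
out1:
	return err;
}
```

For example, a failure at step 3 leaves the log as "CKkc": the class and the character device are torn down, and nothing that never ran is undone.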
[PATCHv5 4/7] tap: Abstract type of virtual interface from tap implementation

2017-02-08 Thread Sainath Grandhi
The macvlan object is restructured to hold tap related elements in a separate
entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with
idr and fetched again on tap_open. A few of the tap functions are modified to
accept tap_dev as an argument. The tap_dev object includes callbacks to be used
by the underlying virtual interface to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvlan.c  |   2 +-
 drivers/net/macvtap_main.c |  71 +---
 drivers/net/tap.c  | 264 -
 include/linux/if_tap.h |  57 +-
 4 files changed, 229 insertions(+), 165 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 20b3fdf2..79383f9 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1526,7 +1526,6 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
 int macvlan_link_register(struct rtnl_link_ops *ops)
 {
/* common fields */
-   ops->priv_size  = sizeof(struct macvlan_dev);
ops->validate   = macvlan_validate;
ops->maxtype= IFLA_MACVLAN_MAX;
ops->policy = macvlan_policy;
@@ -1549,6 +1548,7 @@ static struct rtnl_link_ops macvlan_link_ops = {
.newlink= macvlan_newlink,
.dellink= macvlan_dellink,
.get_link_net   = macvlan_get_link_net,
+   .priv_size  = sizeof(struct macvlan_dev),
 };
 
 static int macvlan_device_event(struct notifier_block *unused,
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 215ab7a..0238df6 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -24,6 +24,11 @@
 #include 
 #include 
 
+struct macvtap_dev {
+   struct macvlan_dev vlan;
+   struct tap_dev tap;
+};
+
 /*
  * Variables for dealing with macvtaps device numbers.
  */
@@ -46,22 +51,55 @@ static struct cdev macvtap_cdev;
 #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
  NETIF_F_TSO6 | NETIF_F_UFO)
 
+static void macvtap_count_tx_dropped(struct tap_dev *tap)
+{
+   struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);
+   struct macvlan_dev *vlan = &vlantap->vlan;
+
+   this_cpu_inc(vlan->pcpu_stats->tx_dropped);
+}
+
+static void macvtap_count_rx_dropped(struct tap_dev *tap)
+{
+   struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);
+   struct macvlan_dev *vlan = &vlantap->vlan;
+
+   macvlan_count_rx(vlan, 0, 0, 0);
+}
+
+static void macvtap_update_features(struct tap_dev *tap,
+   netdev_features_t features)
+{
+   struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);
+   struct macvlan_dev *vlan = &vlantap->vlan;
+
+   vlan->set_features = features;
+   netdev_update_features(vlan->dev);
+}
+
 static int macvtap_newlink(struct net *src_net,
   struct net_device *dev,
   struct nlattr *tb[],
   struct nlattr *data[])
 {
-   struct macvlan_dev *vlan = netdev_priv(dev);
+   struct macvtap_dev *vlantap = netdev_priv(dev);
int err;
 
-   INIT_LIST_HEAD(&vlan->queue_list);
+   INIT_LIST_HEAD(&vlantap->tap.queue_list);
 
/* Since macvlan supports all offloads by default, make
 * tap support all offloads also.
 */
-   vlan->tap_features = TUN_OFFLOADS;
+   vlantap->tap.tap_features = TUN_OFFLOADS;
 
-   err = netdev_rx_handler_register(dev, tap_handle_frame, vlan);
+   /* Register callbacks for rx/tx drops accounting and updating
+* net_device features
+*/
+   vlantap->tap.count_tx_dropped = macvtap_count_tx_dropped;
+   vlantap->tap.count_rx_dropped = macvtap_count_rx_dropped;
+   vlantap->tap.update_features  = macvtap_update_features;
+
+   err = netdev_rx_handler_register(dev, tap_handle_frame, &vlantap->tap);
if (err)
return err;
 
@@ -74,14 +112,18 @@ static int macvtap_newlink(struct net *src_net,
return err;
}
 
+   vlantap->tap.dev = vlantap->vlan.dev;
+
return 0;
 }
 
 static void macvtap_dellink(struct net_device *dev,
struct list_head *head)
 {
+   struct macvtap_dev *vlantap = netdev_priv(dev);
+
netdev_rx_handler_unregister(dev);
-   tap_del_queues(dev);
+   tap_del_queues(&vlantap->tap);
macvlan_dellink(dev, head);
 }
 
@@ -96,13 +138,14 @@ static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
.setup  = macvtap_setup,
.newlink= macvtap_newlink,
.dellink= macvtap_dellink,
+   .priv_size  = sizeof(struct macvtap_dev),
 };
 
 static int macvtap_device_event(struct notifier_block *unused,
unsigned long event, void *ptr)
 {
struct net_device *dev = 

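The central trick of this patch is that generic tap code only ever sees a struct tap_dev, while each driver embeds it next to its own private data and recovers the outer structure with container_of() inside the callbacks it registers. A self-contained userspace sketch of that pattern; the container_of macro below is a simplified version of the kernel's (it omits the type check), the structures are trimmed stand-ins, and struct model_stats and tap_drop_packet() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(): no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part: all the tap core knows about. */
struct tap_dev {
	void (*count_tx_dropped)(struct tap_dev *tap);
};

/* Driver-specific parts (simplified stand-ins). */
struct model_stats {
	unsigned long tx_dropped;
};

struct macvlan_dev {
	struct model_stats stats;
};

struct macvtap_dev {
	struct macvlan_dev vlan;
	struct tap_dev tap;
};

/* Callback registered by the macvtap side: find the outer struct. */
static void macvtap_count_tx_dropped(struct tap_dev *tap)
{
	struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);

	vlantap->vlan.stats.tx_dropped++;
}

/* Hypothetical generic tap-core path that drops a packet. */
static void tap_drop_packet(struct tap_dev *tap)
{
	assert(tap != NULL);
	if (tap->count_tx_dropped)
		tap->count_tx_dropped(tap);
}
```

Because tap is not the first member of struct macvtap_dev, the offset subtraction in container_of() is what makes this work; a plain cast from struct tap_dev * would point at the wrong object.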
[PATCHv5 2/7] tap: Renaming tap related APIs, data structures, macros

2017-02-08 Thread Sainath Grandhi
Renaming tap related APIs, data structures and macros in tap.c from macvtap_.*
to tap_.*

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c |  18 +--
 drivers/net/tap.c  | 332 ++---
 drivers/vhost/net.c|   3 +-
 include/linux/if_macvlan.h |  17 +--
 include/linux/if_macvtap.h |  10 --
 include/linux/if_tap.h |  23 
 6 files changed, 202 insertions(+), 201 deletions(-)
 delete mode 100644 include/linux/if_macvtap.h
 create mode 100644 include/linux/if_tap.h

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 96ffa60..548f339 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -1,6 +1,6 @@
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -62,7 +62,7 @@ static int macvtap_newlink(struct net *src_net,
 */
vlan->tap_features = TUN_OFFLOADS;
 
-   err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan);
+   err = netdev_rx_handler_register(dev, tap_handle_frame, vlan);
if (err)
return err;
 
@@ -82,7 +82,7 @@ static void macvtap_dellink(struct net_device *dev,
struct list_head *head)
 {
netdev_rx_handler_unregister(dev);
-   macvtap_del_queues(dev);
+   tap_del_queues(dev);
macvlan_dellink(dev, head);
 }
 
@@ -121,7 +121,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 * been registered but before register_netdevice has
 * finished running.
 */
-   err = macvtap_get_minor(vlan);
+   err = tap_get_minor(vlan);
if (err)
return notifier_from_errno(err);
 
@@ -129,7 +129,7 @@ static int macvtap_device_event(struct notifier_block *unused,
classdev = device_create(&macvtap_class, &dev->dev, devt,
 dev, tap_name);
if (IS_ERR(classdev)) {
-   macvtap_free_minor(vlan);
+   tap_free_minor(vlan);
return notifier_from_errno(PTR_ERR(classdev));
}
err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
@@ -144,10 +144,10 @@ static int macvtap_device_event(struct notifier_block *unused,
sysfs_remove_link(&dev->dev.kobj, tap_name);
devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
device_destroy(&macvtap_class, devt);
-   macvtap_free_minor(vlan);
+   tap_free_minor(vlan);
break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
-   if (macvtap_queue_resize(vlan))
+   if (tap_queue_resize(vlan))
return NOTIFY_BAD;
break;
}
@@ -159,7 +159,7 @@ static struct notifier_block macvtap_notifier_block __read_mostly = {
.notifier_call  = macvtap_device_event,
 };
 
-extern struct file_operations macvtap_fops;
+extern struct file_operations tap_fops;
 static int macvtap_init(void)
 {
int err;
@@ -169,7 +169,7 @@ static int macvtap_init(void)
if (err)
goto out1;
 
-   cdev_init(&macvtap_cdev, &macvtap_fops);
+   cdev_init(&macvtap_cdev, &tap_fops);
err = cdev_add(&macvtap_cdev, macvtap_major, MACVTAP_NUM_DEVS);
if (err)
goto out2;
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 6f6228e..15ca2d5 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -24,16 +24,16 @@
 #include 
 
 /*
- * A macvtap queue is the central object of this driver, it connects
+ * A tap queue is the central object of this driver, it connects
  * an open character device to a macvlan interface. There can be
  * multiple queues on one interface, which map back to queues
  * implemented in hardware on the underlying device.
  *
- * macvtap_proto is used to allocate queues through the sock allocation
+ * tap_proto is used to allocate queues through the sock allocation
  * mechanism.
  *
  */
-struct macvtap_queue {
+struct tap_queue {
struct sock sk;
struct socket sock;
struct socket_wq wq;
@@ -47,21 +47,21 @@ struct macvtap_queue {
struct skb_array skb_array;
 };
 
-#define MACVTAP_FEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
+#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
 
-#define MACVTAP_VNET_LE 0x8000
-#define MACVTAP_VNET_BE 0x4000
+#define TAP_VNET_LE 0x8000
+#define TAP_VNET_BE 0x4000
 
 #ifdef CONFIG_TUN_VNET_CROSS_LE
-static inline bool macvtap_legacy_is_little_endian(struct macvtap_queue *q)
+static inline bool tap_legacy_is_little_endian(struct tap_queue *q)
 {
-   return q->flags & MACVTAP_VNET_BE ? false :
+   return q->flags & TAP_VNET_BE ? false :
virtio_legacy_is_little_endian();
 }
 
-static long macvtap_get_vnet_be(struct macvtap_queue *q, int __user *sp)
+static long tap_get_vnet_be(struct tap_queue *q, int __user *sp)
 

[PATCHv5 5/7] tap: Extending tap device create/destroy APIs

2017-02-08 Thread Sainath Grandhi
Extending tap APIs get/free_minor and create/destroy_cdev to handle more than
one type of virtual interface.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c |   6 +--
 drivers/net/tap.c  | 101 +++--
 include/linux/if_tap.h |   4 +-
 3 files changed, 85 insertions(+), 26 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 0238df6..a4bfc10 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -163,7 +163,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 * been registered but before register_netdevice has
 * finished running.
 */
-   err = tap_get_minor(&vlantap->tap);
+   err = tap_get_minor(macvtap_major, &vlantap->tap);
if (err)
return notifier_from_errno(err);
 
@@ -171,7 +171,7 @@ static int macvtap_device_event(struct notifier_block *unused,
classdev = device_create(&macvtap_class, &dev->dev, devt,
 dev, tap_name);
if (IS_ERR(classdev)) {
-   tap_free_minor(&vlantap->tap);
+   tap_free_minor(macvtap_major, &vlantap->tap);
return notifier_from_errno(PTR_ERR(classdev));
}
err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
@@ -186,7 +186,7 @@ static int macvtap_device_event(struct notifier_block *unused,
sysfs_remove_link(&dev->dev.kobj, tap_name);
devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor);
device_destroy(&macvtap_class, devt);
-   tap_free_minor(&vlantap->tap);
+   tap_free_minor(macvtap_major, &vlantap->tap);
break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
if (tap_queue_resize(&vlantap->tap))
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 7d3e8b1..b7cdc90 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -99,12 +99,17 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
+
+static LIST_HEAD(major_list);
+
 struct major_info {
+   struct rcu_head rcu;
dev_t major;
struct idr minor_idr;
struct mutex minor_lock;
const char *device_name;
-} macvtap_major;
+   struct list_head next;
+};
 
 #define GOODCOPY_LEN 128
 
@@ -385,44 +390,72 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
return RX_HANDLER_CONSUMED;
 }
 
-int tap_get_minor(struct tap_dev *tap)
+static struct major_info *tap_get_major(int major)
+{
+   struct major_info *tap_major;
+
+   list_for_each_entry_rcu(tap_major, &major_list, next) {
+   if (tap_major->major == major)
+   return tap_major;
+   }
+
+   return NULL;
+}
+
+int tap_get_minor(dev_t major, struct tap_dev *tap)
 {
int retval = -ENOMEM;
+   struct major_info *tap_major;
 
-   mutex_lock(&macvtap_major.minor_lock);
-   retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL);
+   tap_major = tap_get_major(MAJOR(major));
+   if (!tap_major)
+   return -EINVAL;
+
+   mutex_lock(&tap_major->minor_lock);
+   retval = idr_alloc(&tap_major->minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL);
if (retval >= 0) {
tap->minor = retval;
} else if (retval == -ENOSPC) {
netdev_err(tap->dev, "Too many tap devices\n");
retval = -EINVAL;
}
-   mutex_unlock(&macvtap_major.minor_lock);
+   mutex_unlock(&tap_major->minor_lock);
return retval < 0 ? retval : 0;
 }
 
-void tap_free_minor(struct tap_dev *tap)
+void tap_free_minor(dev_t major, struct tap_dev *tap)
 {
-   mutex_lock(&macvtap_major.minor_lock);
+   struct major_info *tap_major;
+
+   tap_major = tap_get_major(MAJOR(major));
+   if (!tap_major)
+   return;
+
+   mutex_lock(&tap_major->minor_lock);
if (tap->minor) {
-   idr_remove(&macvtap_major.minor_idr, tap->minor);
+   idr_remove(&tap_major->minor_idr, tap->minor);
tap->minor = 0;
}
-   mutex_unlock(&macvtap_major.minor_lock);
+   mutex_unlock(&tap_major->minor_lock);
 }
 
-static struct tap_dev *dev_get_by_tap_minor(int minor)
+static struct tap_dev *dev_get_by_tap_file(int major, int minor)
 {
struct net_device *dev = NULL;
struct tap_dev *tap;
+   struct major_info *tap_major;
+
+   tap_major = tap_get_major(major);
+   if (!tap_major)
+   return NULL;
 
-   mutex_lock(&macvtap_major.minor_lock);
-   tap = idr_find(&macvtap_major.minor_idr, minor);
+   mutex_lock(&tap_major->minor_lock);
+   tap = idr_find(&tap_major->minor_idr, minor);
if (tap) {
dev = tap->dev;
dev_hold(dev);
}
-   mutex_unlock(&macvtap_major.minor_lock);
+   mutex_unlock(&tap_major->minor_lock);
return tap;
 }
 
@@ -454,7 +487,7 @@ static int 

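This patch replaces the single global macvtap_major with a list of struct major_info entries, looked up by major number, each owning its own minor-number allocator. The sketch below models only that lookup-then-allocate logic in userspace. It drops the RCU list and the mutex, replaces the IDR with a small array of flags, and all names (struct model_major, model_get_minor(), MODEL_NUM_MINORS) are hypothetical. Like the idr_alloc(..., 1, TAP_NUM_DEVS, ...) call in the patch, it never hands out minor 0, which tap_free_minor() treats as "no minor allocated".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_MINORS 8	/* the patch uses TAP_NUM_DEVS (1U << MINORBITS) */

/* Simplified stand-in for struct major_info (no RCU, no mutex). */
struct model_major {
	int major;
	bool used[MODEL_NUM_MINORS];
	struct model_major *next;
};

static struct model_major *model_major_list;

/* Like tap_get_major(): find the entry registered for this major. */
static struct model_major *model_get_major(int major)
{
	struct model_major *m;

	for (m = model_major_list; m; m = m->next)
		if (m->major == major)
			return m;
	return NULL;
}

/* Allocate the lowest free minor in [1, MODEL_NUM_MINORS). */
static int model_get_minor(int major, int *minor)
{
	struct model_major *m = model_get_major(major);
	int i;

	assert(minor != NULL);
	if (!m)
		return -EINVAL;
	for (i = 1; i < MODEL_NUM_MINORS; i++) {
		if (!m->used[i]) {
			m->used[i] = true;
			*minor = i;
			return 0;
		}
	}
	return -EINVAL;	/* the patch reports -ENOSPC as -EINVAL */
}

static void model_free_minor(int major, int minor)
{
	struct model_major *m = model_get_major(major);

	if (m && minor > 0 && minor < MODEL_NUM_MINORS)
		m->used[minor] = false;
}
```

Each major keeps an independent minor space, so a macvtap device and an ipvtap device can both hold minor 1 under their own majors.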
[PATCHv5 1/7] tap: Refactoring macvtap.c

2017-02-08 Thread Sainath Grandhi
macvtap module has code for tap/queue management and link management. This
patch splits the code into macvtap_main.c for link management and tap.c for
tap/queue management. Functionality in tap.c can be re-used for implementing
tap on other virtual interfaces.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Makefile |   2 +
 drivers/net/macvtap_main.c   | 218 +++
 drivers/net/{macvtap.c => tap.c} | 204 ++--
 include/linux/if_macvtap.h   |  10 ++
 4 files changed, 238 insertions(+), 196 deletions(-)
 create mode 100644 drivers/net/macvtap_main.c
 rename drivers/net/{macvtap.c => tap.c} (84%)
 create mode 100644 include/linux/if_macvtap.h

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd..19b03a9 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -29,6 +29,8 @@ obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
 
+macvtap-objs := macvtap_main.o tap.o
+
 #
 # Networking Drivers
 #
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
new file mode 100644
index 000..96ffa60
--- /dev/null
+++ b/drivers/net/macvtap_main.c
@@ -0,0 +1,218 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Variables for dealing with macvtaps device numbers.
+ */
+static dev_t macvtap_major;
+#define MACVTAP_NUM_DEVS (1U << MINORBITS)
+
+static const void *macvtap_net_namespace(struct device *d)
+{
+   struct net_device *dev = to_net_dev(d->parent);
+   return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+   .name = "macvtap",
+   .owner = THIS_MODULE,
+   .ns_type = &net_ns_type_operations,
+   .namespace = macvtap_net_namespace,
+};
+static struct cdev macvtap_cdev;
+
+#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
+ NETIF_F_TSO6 | NETIF_F_UFO)
+
+static int macvtap_newlink(struct net *src_net,
+  struct net_device *dev,
+  struct nlattr *tb[],
+  struct nlattr *data[])
+{
+   struct macvlan_dev *vlan = netdev_priv(dev);
+   int err;
+
+   INIT_LIST_HEAD(&vlan->queue_list);
+
+   /* Since macvlan supports all offloads by default, make
+* tap support all offloads also.
+*/
+   vlan->tap_features = TUN_OFFLOADS;
+
+   err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan);
+   if (err)
+   return err;
+
+   /* Don't put anything that may fail after macvlan_common_newlink
+* because we can't undo what it does.
+*/
+   err = macvlan_common_newlink(src_net, dev, tb, data);
+   if (err) {
+   netdev_rx_handler_unregister(dev);
+   return err;
+   }
+
+   return 0;
+}
+
+static void macvtap_dellink(struct net_device *dev,
+   struct list_head *head)
+{
+   netdev_rx_handler_unregister(dev);
+   macvtap_del_queues(dev);
+   macvlan_dellink(dev, head);
+}
+
+static void macvtap_setup(struct net_device *dev)
+{
+   macvlan_common_setup(dev);
+   dev->tx_queue_len = TUN_READQ_SIZE;
+}
+
+static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
+   .kind   = "macvtap",
+   .setup  = macvtap_setup,
+   .newlink= macvtap_newlink,
+   .dellink= macvtap_dellink,
+};
+
+static int macvtap_device_event(struct notifier_block *unused,
+   unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct macvlan_dev *vlan;
+   struct device *classdev;
+   dev_t devt;
+   int err;
+   char tap_name[IFNAMSIZ];
+
+   if (dev->rtnl_link_ops != &macvtap_link_ops)
+   return NOTIFY_DONE;
+
+   snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex);
+   vlan = netdev_priv(dev);
+
+   switch (event) {
+   case NETDEV_REGISTER:
+   /* Create the device node here after the network device has
+* been registered but before register_netdevice has
+* finished running.
+*/
+   err = macvtap_get_minor(vlan);
+   if (err)
+   return notifier_from_errno(err);
+
+   devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
+   classdev = device_create(&macvtap_class, &dev->dev, devt,
+dev, tap_name);
+   if (IS_ERR(classdev)) {
+   macvtap_free_minor(vlan);
+   return notifier_from_errno(PTR_ERR(classdev));
+   }
+   err = 

[PATCHv5 7/7] ipvtap: IP-VLAN based tap driver

2017-02-08 Thread Sainath Grandhi
This patch adds a tap character device driver that is based on the
IP-VLAN network interface, called ipvtap. An ipvtap device can be created
in the same way as an ipvlan device, using 'type ipvtap', and then accessed
using the tap user space interface.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Kconfig  |  13 +++
 drivers/net/Makefile |   1 +
 drivers/net/ipvlan/Makefile  |   1 +
 drivers/net/ipvlan/ipvlan.h  |   7 ++
 drivers/net/ipvlan/ipvlan_core.c |   5 +-
 drivers/net/ipvlan/ipvlan_main.c |  27 +++--
 drivers/net/ipvlan/ipvtap.c  | 241 +++
 7 files changed, 281 insertions(+), 14 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8f6d21b4..fe83dc1 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -166,6 +166,19 @@ config IPVLAN
   To compile this driver as a module, choose M here: the module
   will be called ipvlan.
 
+config IPVTAP
+   tristate "IP-VLAN based tap driver"
+   depends on IPVLAN
+   depends on INET
+   select TAP
+   ---help---
+ This adds a specialized tap character device driver that is based
+ on the IP-VLAN network interface, called ipvtap. An ipvtap device
+ can be added in the same way as an ipvlan device, using 'type
+ ipvtap', and then be accessed through the tap user space interface.
+
+ To compile this driver as a module, choose M here: the module
+ will be called ipvtap.
 
 config VXLAN
tristate "Virtual eXtensible Local Area Network (VXLAN)"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7dd86ca..98ed4d9 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -7,6 +7,7 @@
 #
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_IPVLAN) += ipvlan/
+obj-$(CONFIG_IPVTAP) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile
index df79910..8a2c64d 100644
--- a/drivers/net/ipvlan/Makefile
+++ b/drivers/net/ipvlan/Makefile
@@ -3,5 +3,6 @@
 #
 
 obj-$(CONFIG_IPVLAN) += ipvlan.o
+obj-$(CONFIG_IPVTAP) += ipvtap.o
 
 ipvlan-objs := ipvlan_core.o ipvlan_main.o
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index dbfbb33..4362d88 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -133,4 +133,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
  u16 proto);
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
 const struct nf_hook_state *state);
+void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
+unsigned int len, bool success, bool mcast);
+int ipvlan_link_new(struct net *src_net, struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[]);
+void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
+void ipvlan_link_setup(struct net_device *dev);
+int ipvlan_link_register(struct rtnl_link_ops *ops);
 #endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 83ce74a..9af16ab 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -16,8 +16,8 @@ void ipvlan_init_secret(void)
net_get_random_once(&ipvlan_jhash_secret, sizeof(ipvlan_jhash_secret));
 }
 
-static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
-   unsigned int len, bool success, bool mcast)
+void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
+unsigned int len, bool success, bool mcast)
 {
if (!ipvlan)
return;
@@ -36,6 +36,7 @@ static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
this_cpu_inc(ipvlan->pcpu_stats->rx_errs);
}
 }
+EXPORT_SYMBOL_GPL(ipvlan_count_rx);
 
 static u8 ipvlan_get_v6_hash(const void *iaddr)
 {
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 8b0f993..ed750e2 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -494,8 +494,8 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb,
return ret;
 }
 
-static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
-  struct nlattr *tb[], struct nlattr *data[])
+int ipvlan_link_new(struct net *src_net, struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[])
 {
struct ipvl_dev *ipvlan = netdev_priv(dev);
struct ipvl_port *port;
@@ -567,8 +567,9 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
ipvlan_port_destroy(phy_dev);
return err;
 }
+EXPORT_SYMBOL_GPL(ipvlan_link_new);
 
-static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)

[PATCHv5 6/7] tap: tap as an independent module

2017-02-08 Thread Sainath Grandhi
This patch makes tap a separate module so that other types of virtual
interfaces, for example ipvlan, can use it.
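
Once tap is its own module, the exported tap_get_minor()/tap_free_minor() pair hands out minor numbers to whichever front end (macvtap, ipvtap) registers a device, under tap_major->minor_lock. The user space sketch below only illustrates that bookkeeping: it substitutes a fixed-size array and a pthread mutex for the kernel's data structures, and the names and the SKETCH_NUM_MINORS bound are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical bound; the kernel applies its own per-major limit. */
#define SKETCH_NUM_MINORS 8

/* One table per registered major, shared by every tap front end. */
struct minor_table {
	pthread_mutex_t lock;
	bool used[SKETCH_NUM_MINORS];
};

/* Reserve the lowest free minor >= 1 (minor 0 stays unused here).
 * Returns 0 and stores the minor in *minor, or -ENOSPC when full. */
static int sketch_get_minor(struct minor_table *t, int *minor)
{
	int ret = -ENOSPC;

	pthread_mutex_lock(&t->lock);
	for (int i = 1; i < SKETCH_NUM_MINORS; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			*minor = i;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Release a minor so that a later registration can reuse it. */
static void sketch_free_minor(struct minor_table *t, int minor)
{
	pthread_mutex_lock(&t->lock);
	if (minor > 0 && minor < SKETCH_NUM_MINORS)
		t->used[minor] = false;
	pthread_mutex_unlock(&t->lock);
}
```

A front end pairs the two calls the way the macvtap notifier does: reserve a minor, create the device node, and release the minor again if device_create() fails.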

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Kconfig   |  7 +++
 drivers/net/Makefile  |  3 +--
 drivers/net/{macvtap_main.c => macvtap.c} |  0
 drivers/net/tap.c | 11 +++
 drivers/vhost/Kconfig |  2 +-
 include/linux/if_tap.h|  4 ++--
 6 files changed, 22 insertions(+), 5 deletions(-)
 rename drivers/net/{macvtap_main.c => macvtap.c} (100%)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 95c32f2..8f6d21b4 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -135,6 +135,7 @@ config MACVTAP
tristate "MAC-VLAN based tap driver"
depends on MACVLAN
depends on INET
+   select TAP
help
  This adds a specialized tap character device driver that is based
  on the MAC-VLAN network interface, called macvtap. A macvtap device
@@ -284,6 +285,12 @@ config TUN
 
  If you don't know what to use this for, you don't need it.
 
+config TAP
+   tristate
+   ---help---
+ This option is selected by any driver implementing the tap user space
+ interface for a virtual interface, so that it can re-use core tap functionality.
+
 config TUN_VNET_CROSS_LE
bool "Support for cross-endian vnet headers on little-endian kernels"
default n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 19b03a9..7dd86ca 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_PHYLIB) += phy/
 obj-$(CONFIG_RIONET) += rionet.o
 obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
+obj-$(CONFIG_TAP) += tap.o
 obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
@@ -29,8 +30,6 @@ obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
 
-macvtap-objs := macvtap_main.o tap.o
-
 #
 # Networking Drivers
 #
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap.c
similarity index 100%
rename from drivers/net/macvtap_main.c
rename to drivers/net/macvtap.c
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index b7cdc90..a0ed508 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -312,6 +312,7 @@ void tap_del_queues(struct tap_dev *tap)
/* guarantee that any future tap_set_queue will fail */
tap->numvtaps = MAX_TAP_QUEUES;
 }
+EXPORT_SYMBOL_GPL(tap_del_queues);
 
 rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 {
@@ -389,6 +390,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
kfree_skb(skb);
return RX_HANDLER_CONSUMED;
 }
+EXPORT_SYMBOL_GPL(tap_handle_frame);
 
 static struct major_info *tap_get_major(int major)
 {
@@ -422,6 +424,7 @@ int tap_get_minor(dev_t major, struct tap_dev *tap)
mutex_unlock(&tap_major->minor_lock);
return retval < 0 ? retval : 0;
 }
+EXPORT_SYMBOL_GPL(tap_get_minor);
 
 void tap_free_minor(dev_t major, struct tap_dev *tap)
 {
@@ -438,6 +441,7 @@ void tap_free_minor(dev_t major, struct tap_dev *tap)
}
mutex_unlock(&tap_major->minor_lock);
 }
+EXPORT_SYMBOL_GPL(tap_free_minor);
 
 static struct tap_dev *dev_get_by_tap_file(int major, int minor)
 {
@@ -1193,6 +1197,7 @@ int tap_queue_resize(struct tap_dev *tap)
kfree(arrays);
return ret;
 }
+EXPORT_SYMBOL_GPL(tap_queue_resize);
 
 static int tap_list_add(dev_t major, const char *device_name)
 {
@@ -1240,6 +1245,7 @@ int tap_create_cdev(struct cdev *tap_cdev,
 out1:
return err;
 }
+EXPORT_SYMBOL_GPL(tap_create_cdev);
 
 void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev)
 {
@@ -1255,3 +1261,8 @@ void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev)
list_del_rcu(&tap_major->next);
kfree_rcu(tap_major, rcu);
 }
+EXPORT_SYMBOL_GPL(tap_destroy_cdev);
+
+MODULE_AUTHOR("Arnd Bergmann ");
+MODULE_AUTHOR("Sainath Grandhi ");
+MODULE_LICENSE("GPL");
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 40764ec..cfdecea 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,6 @@
 config VHOST_NET
tristate "Host kernel accelerator for virtio net"
-   depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP)
+   depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP)
select VHOST
---help---
  This kernel module can be loaded in host kernel to accelerate
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 362e71c..3482c3c 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -1,7 +1,7 @@
 #ifndef _LINUX_IF_TAP_H_
 #define _LINUX_IF_TAP_H_
 
-#if IS_ENABLED(CONFIG_MACVTAP)
+#if IS_ENABLED(CONFIG_TAP)
 struct socket *tap_get_socket(struct file *);
 #else
 #include 
@@ -12,7 +12,7 @@ static inline struct socket 

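The vhost_net dependency in the tap patch above changes from (MACVTAP || !MACVTAP) to (TAP || !TAP), because vhost_net calls tap_get_socket(), which now lives in the tap module. The idiom follows from Kconfig's tristate arithmetic: with n=0, m=1 and y=2, '!' yields 2 minus its operand, '&&' takes the minimum, '||' takes the maximum, and a "depends on" expression caps the symbol's value. (TAP || !TAP) therefore evaluates to y when TAP is y or n, but to m when TAP is m, so VHOST_NET cannot be built in while the code it calls is a module. The C sketch below merely evaluates that expression; the enum and function names are made up for illustration and are not part of scripts/kconfig.

```c
/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig expression semantics: '!' is 2 - x, '&&' is min, '||' is max. */
static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(2 - a);
}

static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Upper bound that
 * "depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP)"
 * places on VHOST_NET for the given symbol values. */
static enum tristate vhost_net_limit(enum tristate net, enum tristate eventfd,
				     enum tristate tun, enum tristate tap)
{
	enum tristate e = tri_and(net, eventfd);

	e = tri_and(e, tri_or(tun, tri_not(tun)));
	return tri_and(e, tri_or(tap, tri_not(tap)));
}
```

With TUN=n and TAP=m, vhost_net_limit() returns TRI_M, so VHOST_NET may be m or n but not y; with TAP=n the if_tap.h stub branch is used instead and the limit stays y.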
[PATCH net v3 0/2] net: ethernet: bgmac: bug fixes

2017-02-08 Thread Jon Mason
Changes in v3:
* Reworked the init sequence patch to only remove the device reset if
  the device is actually in reset.  Given that this code doesn't bear
  much resemblance to the original code, I'm changing the author of the
  patch.  This was tested on NS2 SVK.

Changes in v2:
* Reworked the first patch to make it more obvious which portions of the
  register were being preserved (per Rafał Miłecki)
* Style change to reorder the function variables in patch 2 (per Sergei
  Shtylyov)


Bug fixes for bgmac driver


Hari Vyas (1):
  net: ethernet: bgmac: mac address change bug

Jon Mason (1):
  net: ethernet: bgmac: init sequence bug

 drivers/net/ethernet/broadcom/bgmac-platform.c | 27 --
 drivers/net/ethernet/broadcom/bgmac.c  |  6 +-
 drivers/net/ethernet/broadcom/bgmac.h  | 16 +++
 3 files changed, 38 insertions(+), 11 deletions(-)

-- 
2.7.4



[PATCH] netlink: move nla_put_{u8,u16,u32} out of line

2017-02-08 Thread Arnd Bergmann
When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
stack frames in some functions. This goes unnoticed normally because
CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit
3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
KASAN=y").

The kernelci.org build bot however has the warning enabled and that led
me to investigate it a little further, as every build produces these warnings:

net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]

It turns out that there is a relatively simple workaround for the netlink
users that currently use a local variable in order to do the type conversion:
Moving the three functions (for each of the typical sizes) to lib/nlattr.c
avoids using local variables in the caller, which drastically reduces the
stack usage for nl80211 and br_netlink.
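
The byte-order variants can share the out-of-line helpers because nla_put() copies the in-memory bytes of the value: a __be16 that has already been converted passes through the u16 helper unchanged with a plain cast, as the nla_put_be16() hunk in this patch does. With a single out-of-line copy, the addressable local lives in the helper's frame rather than in every caller. The user space sketch below mimics that layering; struct attr_buf, attr_put() and the attr_put_* helpers are hypothetical stand-ins for the sk_buff and the nla_put() family, while the attribute layout (16-bit length and type in host order, payload padded to 4 bytes) follows the netlink attribute format.

```c
#include <arpa/inet.h>	/* htons()/htonl() for callers producing big-endian values */
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ATTR_ALIGNTO	4
#define ATTR_ALIGN(len)	(((len) + ATTR_ALIGNTO - 1) & ~(ATTR_ALIGNTO - 1))
#define ATTR_HDRLEN	4	/* 16-bit length + 16-bit type */

/* Hypothetical stand-in for the sk_buff tail that nla_put() appends to. */
struct attr_buf {
	unsigned char data[256];
	size_t len;
};

/* Append one type-length-value attribute, zero-padding to 4 bytes. */
static int attr_put(struct attr_buf *b, uint16_t type, uint16_t plen,
		    const void *payload)
{
	uint16_t total = ATTR_HDRLEN + plen;
	size_t need = ATTR_ALIGN(total);

	if (b->len + need > sizeof(b->data))
		return -EMSGSIZE;
	memcpy(b->data + b->len, &total, sizeof(total));	/* host order */
	memcpy(b->data + b->len + 2, &type, sizeof(type));
	memcpy(b->data + b->len + ATTR_HDRLEN, payload, plen);
	memset(b->data + b->len + total, 0, need - total);	/* padding */
	b->len += need;
	return 0;
}

/* GCC/Clang attribute keeps this out of line, as a separate lib/nlattr.c
 * would; only this frame holds the addressable copy of value. */
__attribute__((noinline))
int attr_put_u16(struct attr_buf *b, uint16_t type, uint16_t value)
{
	return attr_put(b, type, sizeof(value), &value);
}

/* be_value is already big-endian, so its bytes reach the buffer unchanged
 * (the kernel version adds a __force cast to silence sparse). */
static inline int attr_put_be16(struct attr_buf *b, uint16_t type,
				uint16_t be_value)
{
	return attr_put_u16(b, type, be_value);
}
```

Calling attr_put_be16(&buf, 1, htons(0x1234)) appends an 8-byte attribute whose payload bytes are 0x12 0x34 followed by two bytes of padding, regardless of host endianness.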

It would be good to enable the frame size check again after this, but that
should be a separate patch, and it requires more testing to determine what
the largest acceptable frame size should be.

Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Cc: kasan-...@googlegroups.com
Signed-off-by: Arnd Bergmann 
---
 include/net/netlink.h | 23 +++
 lib/nlattr.c  | 18 ++
 2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index b239fcd33d80..48b117e80509 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -755,10 +755,7 @@ static inline int nla_parse_nested(struct nlattr *tb[], int maxtype,
  * @attrtype: attribute type
  * @value: numeric value
  */
-static inline int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
-{
-   return nla_put(skb, attrtype, sizeof(u8), &value);
-}
+extern int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value);
 
 /**
  * nla_put_u16 - Add a u16 netlink attribute to a socket buffer
@@ -766,10 +763,7 @@ static inline int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
  * @attrtype: attribute type
  * @value: numeric value
  */
-static inline int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value)
-{
-   return nla_put(skb, attrtype, sizeof(u16), &value);
-}
+extern int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value);
 
 /**
  * nla_put_be16 - Add a __be16 netlink attribute to a socket buffer
@@ -779,7 +773,7 @@ static inline int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value)
  */
 static inline int nla_put_be16(struct sk_buff *skb, int attrtype, __be16 value)
 {
-   return nla_put(skb, attrtype, sizeof(__be16), &value);
+   return nla_put_u16(skb, attrtype, (u16 __force)value);
 }
 
 /**
@@ -801,7 +795,7 @@ static inline int nla_put_net16(struct sk_buff *skb, int attrtype, __be16 value)
  */
 static inline int nla_put_le16(struct sk_buff *skb, int attrtype, __le16 value)
 {
-   return nla_put(skb, attrtype, sizeof(__le16), &value);
+   return nla_put_u16(skb, attrtype, (u16 __force)value);
 }
 
 /**
@@ -810,10 +804,7 @@ static inline int nla_put_le16(struct sk_buff *skb, int attrtype, __le16 value)
  * @attrtype: attribute type
  * @value: numeric value
  */
-static inline int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value)
-{
-   return nla_put(skb, attrtype, sizeof(u32), &value);
-}
+int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value);
 
 /**
  * nla_put_be32 - Add a __be32 netlink attribute to a socket buffer
@@ -823,7 +814,7 @@ static inline int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value)
  */
 static inline int nla_put_be32(struct sk_buff *skb, int attrtype, __be32 value)
 {
-   return nla_put(skb, attrtype, sizeof(__be32), &value);
+   return nla_put_u32(skb, attrtype, (u32 __force)value);
 }
 
 /**
@@ -845,7 +836,7 @@ static inline int nla_put_net32(struct sk_buff *skb, int attrtype, __be32 value)
  */
 static inline int nla_put_le32(struct sk_buff *skb, int attrtype, __le32 value)
 {
-   return nla_put(skb, attrtype, sizeof(__le32), &value);
+   return nla_put_u32(skb, attrtype, (u32 __force)value);
 }
 
 /**
diff --git a/lib/nlattr.c b/lib/nlattr.c
index b42b8577fc23..2988b08a7e4d 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -548,6 +548,24 @@ int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
 }
 EXPORT_SYMBOL(nla_put);
 
+int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
+{
+   return nla_put(skb, attrtype, sizeof(u8), &value);
+}
+EXPORT_SYMBOL(nla_put_u8);
+
+int 
