date:20160627

Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-27 Thread John Fastabend

On 16-06-27 09:07 AM, Saeed Mahameed wrote:
> From: Or Gerlitz 
> 
> Add the commands to set and show the mode of SRIOV E-Switch,
> two modes are supported:
> 
> * legacy   : operating in the "old" L2 based mode (DMAC --> VF vport)
> * offloads : offloading SW rules/policy (e.g Bridge/FDB or TC/Flows based) 
> set by the host OS
> 
> Signed-off-by: Or Gerlitz 
> Signed-off-by: Saeed Mahameed 
> ---

Hi,

Nice work overall. I also really appreciate that the core networking
interfaces appear to be able to support this without any change.

On this patch, though, do we really need modes like this? My concern with
modes is twofold. One, it's another knob that some controller will have
to get right, which I would prefer to avoid. And two, I suspect switching
between the two modes flushes the tables or leaves them in some
unexpected state? At least I can't figure out what the expected state
should be off-hand.

Could we instead continue to use the "legacy" mode by default by just
populating the fdb table correctly? Then, if users want to enable
the "offloads" mode, they can modify the fdb tables by deleting or
adding entries, or just extend the dmac/vf mapping via 'tc'. This
would seem natural to me. The flooding rules in fdb might need to be
exposed a bit more cleanly to get the right default flooding behavior,
etc. But to me at least this would be much cleaner. Everything will be
nicely defined and we won't have issues with drivers having subtly
different defaults between legacy/offload, in the transitions
between the states, on resets, etc. If users need to discover the
current configuration then they just query fdb, query tc, and the state
is known. No need for any magic toggle switch, as best I can see.

Otherwise I didn't review the mlx code but read the commit msgs and
it looks good. I'll take a closer look in the morning.

Thanks,
John

Re: [PATCH net-next 6/9] net: hns: normalize two different loop

2016-06-27 Thread Daode Huang




On 2016/6/27 20:13, Andy Shevchenko wrote:

On Mon, 2016-06-27 at 05:08 -0700, Joe Perches wrote:

On Mon, 2016-06-27 at 15:00 +0300, Andy Shevchenko wrote:

On Mon, 2016-06-27 at 04:49 -0700, Joe Perches wrote:

On Mon, 2016-06-27 at 17:54 +0800, Yisen Zhuang wrote:

From: Daode Huang 

There are two approaches to assign data, one does 2 loops, another
does 1 loop. This patch normalizes the different methods to 1 loop.

[]

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c

[]

@@ -2567,15 +2567,15 @@ static char *hns_dsaf_get_node_stats_strings(char *data, int node,
buff += ETH_GSTRING_LEN;
if (node < DSAF_SERVICE_NW_NUM && !is_ver1) {
for (i = 0; i < DSAF_PRIO_NR; i++) {
-   snprintf(buff, ETH_GSTRING_LEN,
-    "inod%d_pfc_prio%d_pkts", node, i);
-   buff += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i < DSAF_PRIO_NR; i++) {
-   snprintf(buff, ETH_GSTRING_LEN,
-    "onod%d_pfc_prio%d_pkts", node, i);
+   snprintf(buff + 0 * ETH_GSTRING_LEN * DSAF_PRIO_NR,
+    ETH_GSTRING_LEN, "inod%d_pfc_prio%d_pkts",
+    node, i);
+   snprintf(buff + 1 * ETH_GSTRING_LEN * DSAF_PRIO_NR,
+    ETH_GSTRING_LEN, "onod%d_pfc_prio%d_pkts",
+    node, i);
buff += ETH_GSTRING_LEN;

This looks odd and likely incorrect.

Why? the idea is to print stats for Rx and Tx at once.

I hope it was tested.

It changes the order of the strings in buff.

I don't see how.

Hi Andy,
The patch was tested before it was sent out.


Is it a bug fix or a style fix?

If it's a bug fix, then it should likely be added
to the stable trees.

I doubt it's a bug fix.


Because the previous patch has been accepted in net-next, and this set is
an appendix to that series, we are also sending this bug fix to net-next
in order to avoid merge conflicts.

thanks.
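
Whether the single loop changes the order of the strings in buff can be checked outside the driver. The following is a minimal user-space sketch, not the hns code: STR_LEN and PRIO_NR are stand-ins for ETH_GSTRING_LEN and DSAF_PRIO_NR, the function names are made up, and the final pointer adjustment after the single loop is an assumption, because the quoted hunk does not show what follows the loop.

```c
#include <stdio.h>
#include <string.h>

#define STR_LEN 32  /* stand-in for ETH_GSTRING_LEN */
#define PRIO_NR 8   /* stand-in for DSAF_PRIO_NR */

/* Old layout: one loop for the "inod" names, a second loop for "onod". */
char *fill_two_loops(char *buff, int node)
{
    int i;

    for (i = 0; i < PRIO_NR; i++) {
        snprintf(buff, STR_LEN, "inod%d_pfc_prio%d_pkts", node, i);
        buff += STR_LEN;
    }
    for (i = 0; i < PRIO_NR; i++) {
        snprintf(buff, STR_LEN, "onod%d_pfc_prio%d_pkts", node, i);
        buff += STR_LEN;
    }
    return buff;
}

/* Patched layout: one loop that writes both blocks at fixed offsets. */
char *fill_one_loop(char *buff, int node)
{
    int i;

    for (i = 0; i < PRIO_NR; i++) {
        snprintf(buff + 0 * STR_LEN * PRIO_NR, STR_LEN,
                 "inod%d_pfc_prio%d_pkts", node, i);
        snprintf(buff + 1 * STR_LEN * PRIO_NR, STR_LEN,
                 "onod%d_pfc_prio%d_pkts", node, i);
        buff += STR_LEN;
    }
    /* Assumption: the caller still has to skip the second block,
     * since buff only advanced past the first one. */
    return buff + STR_LEN * PRIO_NR;
}
```

Called on two zeroed buffers of 2 * STR_LEN * PRIO_NR bytes, both functions leave the same bytes behind: the "inod" names in the first PRIO_NR slots and the "onod" names in the following PRIO_NR slots.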

Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().

2016-06-27 Thread Eric Dumazet

On Tue, 2016-06-28 at 12:56 +0800, Ding Tianhong wrote:
> The problem occurs on my system when a lot of drivers register
> their own handlers on the netdev_chain notifier call chain, and I
> then create 4095 vlan devices on one NIC and add several IPv6 addresses
> to each of them, like this:
> 
> for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; 
> done
> for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
> for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
> for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done
> 
> ifconfig eth0 up
> ifconfig eth0 down

I would very much prefer cond_resched() at a more appropriate place.

touch_nmi_watchdog() does not fundamentally solve the issue, as some
process is holding one cpu for a very long time.

Probably in addrconf_ifdown(), as if you have 100,000 IPv6 addresses on
a single netdev, this function might also trigger a soft lockup, without
playing with 4096 vlans...

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a1f6b7b315317f811cafbf386cf21dfc510c2010..13b675f79a751db45af28fc0474ddb17d9b69b06 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3566,6 +3566,7 @@ restart:
}
}
spin_unlock_bh(&addrconf_hash_lock);
+   cond_resched();
}
 
write_lock_bh(&idev->lock);

Re: IP ID check (flush_id) in inet_gro_receive is necessary or not？

2016-06-27 Thread Eric Dumazet

On Tue, 2016-06-28 at 12:40 +0800, Tan Xiaojun wrote:
> Hi everyone,
> 
>   I'm sorry to bother you. But I was confused.
> 
>   The IP ID check (flush_id) in inet_gro_receive is only used by
> tcp_gro_receive, and in tcp_gro_receive we have tcphdr check to ensure
> the order of skbs,
>   like below:
> 
>   flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
>   flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
> 
>   So if I remove the IP ID check in inet_gro_receive, will there be a
> problem? And under what circumstances?

You probably missed a recent patch?

commit 1530545ed64b42e87acb43c0c16401bd1ebae6bf
Author: Alexander Duyck 
Date:   Sun Apr 10 21:44:57 2016 -0400

GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values

This patch does two things.

First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field.  As
a result we should now be able to aggregate flows that were converted from
IPv6 to IPv4.  In addition this allows us more flexibility for future
implementations of segmentation as we may be able to use a fixed IP ID when
segmenting the flow.

The second thing this does is that it places limitations on the outer IPv4
ID header in the case of tunneled frames.  Specifically it forces the IP ID
to be incrementing by 1 unless the DF bit is set in the outer IPv4 header.
This way we can avoid creating overlapping series of IP IDs that could
possibly be fragmented if the frame goes through GRO and is then
resegmented via GSO.

Signed-off-by: Alexander Duyck 
Signed-off-by: David S. Miller

[PATCH v2] notifier: Fix soft lockup for notifier_call_chain().

2016-06-27 Thread Ding Tianhong

The problem occurs on my system when a lot of drivers register
their own handlers on the netdev_chain notifier call chain, and I
then create 4095 vlan devices on one NIC and add several IPv6 addresses
to each of them, like this:

for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; 
done
for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done

ifconfig eth0 up
ifconfig eth0 down

Then the system stalls for several seconds and a soft lockup occurs:

<0>[ 7620.364058]NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ifconfig:19186]
<0>[ 7620.364592]Call trace:
<4>[ 7620.364599][] dump_backtrace+0x0/0x220
<4>[ 7620.364603][] show_stack+0x20/0x28
<4>[ 7620.364607][] dump_stack+0x90/0xb0
<4>[ 7620.364612][] watchdog_timer_fn+0x41c/0x460
<4>[ 7620.364617][] __run_hrtimer+0x98/0x2d8
<4>[ 7620.364620][] hrtimer_interrupt+0x110/0x288
<4>[ 7620.364624][] arch_timer_handler_phys+0x38/0x48
<4>[ 7620.364628][] handle_percpu_devid_irq+0x9c/0x190
<4>[ 7620.364632][] generic_handle_irq+0x40/0x58
<4>[ 7620.364635][] __handle_domain_irq+0x68/0xc0
<4>[ 7620.364638][] gic_handle_irq+0xc4/0x1c8
<4>[ 7620.364641]Exception stack(0xffc0309b3640 to 0xffc0309b3770)
<4>[ 7620.364644]3640: 1000  ffc0309b37c0 
ffbfa1019cf8
<4>[ 7620.364647]3660: 8145 ffc0309b3958  
ffbfa1013008
<4>[ 7620.364651]3680: 07f0 ffbfa131b770 ffd08aaadc40 
ffbfa1019cf8
<4>[ 7620.364654]36a0: ffbfa1019cc4 ffd089c2b000 ffd08eff8000 
ffc0309b3958
<4>[ 7620.364656]36c0: ffbfa101c5c0   
ffbfa101c66c
<4>[ 7620.364659]36e0: 7f7f7f7f7f7f7f7f 0030  

<4>[ 7620.364662]3700:   ffc000393d58 
007f794d67b0
<4>[ 7620.364665]3720: 007fe62215d0 ffc0309b3830 ffc00021d8e0 
ffbfa1049b68
<4>[ 7620.364668]3740: ffc000697578 ffc0006974b8 ffc0309b3958 

<4>[ 7620.364670]3760: ffbfa1013008 07f0
<4>[ 7620.364673][] el1_irq+0x80/0x100
<4>[ 7620.364692][] fib6_walk+0x3c/0x70 [ipv6]
<4>[ 7620.364710][] fib6_clean_tree+0x68/0x90 [ipv6]
<4>[ 7620.364727][] __fib6_clean_all+0x88/0xc0 [ipv6]
<4>[ 7620.364746][] fib6_clean_all+0x28/0x30 [ipv6]
<4>[ 7620.364763][] rt6_ifdown+0x64/0x148 [ipv6]
<4>[ 7620.364781][] addrconf_ifdown+0x68/0x540 [ipv6]
<4>[ 7620.364798][] addrconf_notify+0xd0/0x8b8 [ipv6]
<4>[ 7620.364801][] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364804][] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364809][] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364812][] dev_close_many+0xd0/0x138
<4>[ 7620.364821][] vlan_device_event+0x4a8/0x6a0 [8021q]
<4>[ 7620.364824][] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364827][] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364830][] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364833][] __dev_notify_flags+0xb8/0xe0
<4>[ 7620.364836][] dev_change_flags+0x54/0x68
<4>[ 7620.364840][] devinet_ioctl+0x650/0x700
<4>[ 7620.364843][] inet_ioctl+0xa4/0xc8
<4>[ 7620.364847][] sock_do_ioctl+0x44/0x88
<4>[ 7620.364850][] sock_ioctl+0x23c/0x308
<4>[ 7620.364854][] do_vfs_ioctl+0x48c/0x620
<4>[ 7620.364857][] SyS_ioctl+0x94/0xa8

=cut here

It looks like notifier_call_chain() has to deal with too many handlers and
does not feed the watchdog until it finishes the work. Since
notifier_call_chain() may be called in atomic context, add
touch_nmi_watchdog() in the loop to fix this problem; with this change
the soft lockup no longer occurs.

v2: cond_resched() cannot be used because the chain may run in atomic
context, so feed the watchdog in the loop instead.

Signed-off-by: Ding Tianhong 
---
 kernel/notifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index fd2c9ac..7eca3c1 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Notifier list for kernel code which wants to be called
@@ -92,6 +93,8 @@ static int notifier_call_chain(struct notifier_block **nl,
 #endif
ret = nb->notifier_call(nb, val, v);
 
+   touch_nmi_watchdog();
+
if (nr_calls)
(*nr_calls)++;
 
-- 
1.9.0

IP ID check (flush_id) in inet_gro_receive is necessary or not？

2016-06-27 Thread Tan Xiaojun

Hi everyone,

I'm sorry to bother you, but I am confused.

The IP ID check (flush_id) in inet_gro_receive is only used by
tcp_gro_receive, and in tcp_gro_receive we have a tcphdr check to ensure
the order of skbs, like below:

flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);

So if I remove the IP ID check in inet_gro_receive, will there be a
problem? And under what circumstances?


Thanks.
Xiaojun.
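
The two quoted lines can be tried in isolation. Below is a user-space sketch under stated assumptions: struct tcp_fields and tcp_seq_flush() are hypothetical names, held_len stands in for skb_gro_len(p), and the __force annotation is dropped because it only matters to sparse. It shows what this check covers, namely matching ACK numbers and contiguous sequence numbers. It does not look at the IPv4 header at all, which is where the IP ID rules described in the commit quoted in the reply apply.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* htonl, ntohl */

/* Minimal stand-in for the two TCP header fields the check reads.
 * Both are kept in network byte order, as in struct tcphdr. */
struct tcp_fields {
    uint32_t seq;
    uint32_t ack_seq;
};

/*
 * Returns 0 when the new segment (th) directly follows the held one (th2),
 * and a non-zero value when the two cannot be merged.
 */
uint32_t tcp_seq_flush(const struct tcp_fields *th2, uint32_t held_len,
                       const struct tcp_fields *th)
{
    uint32_t flush = 0;

    /* ACK numbers must be identical (byte order is irrelevant for XOR). */
    flush |= th->ack_seq ^ th2->ack_seq;
    /* The new sequence number must equal held seq + held payload length. */
    flush |= (ntohl(th2->seq) + held_len) ^ ntohl(th->seq);
    return flush;
}
```

A result of 0 means the segment continues the held one. Because the arithmetic is done on 32-bit unsigned values, the comparison also holds across sequence-number wraparound.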

Re: [PATCH] rtlwifi: Create _rtl_dbg_trace function to reduce RT_TRACE code size

2016-06-27 Thread Joe Perches

On Mon, 2016-06-27 at 19:53 -0500, Larry Finger wrote:
> On 06/25/2016 05:46 PM, Joe Perches wrote:
> > 
> > This debugging macro can expand to a lot of code.
> > Make it a function to reduce code size.
> > 
> > (x86-64 defconfig w/ all rtlwifi drivers and allyesconfig)
> > $ size drivers/net/wireless/realtek/rtlwifi/built-in.o*
> >    text    data     bss     dec     hex filename
> >  900083  200499    1907 1102489  10d299 drivers/net/wireless/realtek/rtlwifi/built-in.o.defconfig.new
> > 1113597  200499    1907 1316003  1414a3 drivers/net/wireless/realtek/rtlwifi/built-in.o.defconfig.old
> > 1746879  453503    8512 2208894  21b47e drivers/net/wireless/realtek/rtlwifi/built-in.o.new
> > 2051965  503311    8512 2563788  271ecc drivers/net/wireless/realtek/rtlwifi/built-in.o.old
> > 
> > Signed-off-by: Joe Perches 
> I acked this before; however there is a bug that breaks the build if
> CONFIG_RTLWIFI_DEBUG is not defined. The rest of the code calls
> _rtl_dbg_trace(), but that symbol is never defined. The problem can be fixed
> in debug.c or debug.h.

Confused a bit.  What breaks again?

debug.h:

#ifdef CONFIG_RTLWIFI_DEBUG
[]
__printf(5, 6)
void _rtl_dbg_trace(struct rtl_priv *rtlpriv, int comp, int level,
const char *modname, const char *fmt, ...);

#define RT_TRACE(rtlpriv, comp, level, fmt, ...)\
_rtl_dbg_trace(rtlpriv, comp, level,\
   KBUILD_MODNAME, fmt, ##__VA_ARGS__)
[]
#else
[]
__printf(4, 5)
static inline void RT_TRACE(struct rtl_priv *rtlpriv,
int comp, int level,
const char *fmt, ...)
{
}
[]
#endif
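
The header layout quoted above follows a common pattern: with debugging enabled, the macro forwards to one out-of-line function; with debugging disabled, an empty inline of the same name keeps call sites compiling. The following is a user-space sketch of that pattern only. DEMO_DEBUG, struct demo_priv, demo_dbg_trace() and DEMO_TRACE() are hypothetical names, the comp argument is left out for brevity, and the level filter is an assumption about what such a function might do.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEMO_DEBUG 1  /* hypothetical switch; remove to compile tracing away */

struct demo_priv {
    int debug_level;  /* messages above this level are dropped */
};

#ifdef DEMO_DEBUG

/* One out-of-line function holds the filtering and formatting code,
 * so each call site only pays for a function call. */
__attribute__((format(printf, 4, 5)))
void demo_dbg_trace(struct demo_priv *priv, int level,
                    const char *modname, const char *fmt, ...)
{
    va_list args;

    if (level > priv->debug_level)
        return;

    printf("%s: ", modname);
    va_start(args, fmt);
    vprintf(fmt, args);
    va_end(args);
}

#define DEMO_TRACE(priv, level, fmt, ...) \
    demo_dbg_trace(priv, level, "demo", fmt, ##__VA_ARGS__)

#else

/* Debugging disabled: the empty inline still type-checks the format
 * string and its arguments, but generates no tracing code. */
__attribute__((format(printf, 3, 4)))
static inline void DEMO_TRACE(struct demo_priv *priv, int level,
                              const char *fmt, ...)
{
    (void)priv;
    (void)level;
    (void)fmt;
}

#endif
```

Call sites use only DEMO_TRACE(), for example DEMO_TRACE(&priv, 1, "value=%d\n", 42), and never name the out-of-line function directly, so they build in both configurations.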

Re: [PATCH v2 net-next] tcp: md5: use kmalloc() backed scratch areas

2016-06-27 Thread Herbert Xu

On Mon, Jun 27, 2016 at 10:58:42AM -0700, Andy Lutomirski wrote:
>
> I wonder if it's worth switching from ahash to shash, though.  It
> would probably be simpler and faster.

No shash is not appropriate here because it needs to hash skb
frags which are SG lists.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] notifier: Fix soft lockup for notifier_call_chain().

2016-06-27 Thread Ding Tianhong

On 2016/6/28 3:50, Cong Wang wrote:
> On Fri, Jun 24, 2016 at 7:46 PM, Ding Tianhong wrote:
>> diff --git a/kernel/notifier.c b/kernel/notifier.c
>> index fd2c9ac..9c30411 100644
>> --- a/kernel/notifier.c
>> +++ b/kernel/notifier.c
>> @@ -92,6 +92,8 @@ static int notifier_call_chain(struct notifier_block **nl,
>>  #endif
>> ret = nb->notifier_call(nb, val, v);
>>
>> +   cond_resched();
>> +
>> if (nr_calls)
>> (*nr_calls)++;
> 
> NAK.
> 
> You can't do a resched in atomic context in __atomic_notifier_call_chain().
> 
> 
Sorry, I missed this. I think adding touch_nmi_watchdog() looks like the
best solution for this problem.

Thanks
Ding

Re: [PATCH] geneve: fix max_mtu setting

2016-06-27 Thread Jesse Gross

On Mon, Jun 27, 2016 at 6:27 PM, 严海双  wrote:
>
> On Jun 28, 2016, at 12:10 AM, Jesse Gross  wrote:
>
> On Sun, Jun 26, 2016 at 6:13 PM, Haishuang Yan
>  wrote:
>
>
> On Jun 26, 2016, at 8:35 PM, zhuyj  wrote:
>
> +   if (geneve->remote.sa.sa_family == AF_INET)
> +   max_mtu -= sizeof(struct iphdr);
> +   else
> +   max_mtu -= sizeof(struct ipv6hdr);
>
> Sorry, if sa_family is not AF_INET, is it AF_INET6?
>
> There are a lot of macros in include/linux/socket.h.
>
> Zhu Yanjun
>
>
> Only two values, AF_INET and AF_INET6, are assigned in
> geneve_newlink:
>
>
> There's actually a third possibility: AF_UNSPEC, which is the default
> if neither remote type is specified. This is used by lightweight
> tunnels and should be able to work with either IPv4/v6. For the
> purposes of the MTU calculation this means that the IPv4 header size
> should be used to avoid disallowing potentially valid configurations.
>
>
> Yes, you’re right. Thanks for your advice. I will send a v2 commit like this:
>
>if (geneve->remote.sa.sa_family == AF_INET6)
>   max_mtu -= sizeof(struct ipv6hdr);
>else
>   max_mtu -= sizeof(struct iphdr);
>
> Is this ok?

Yes, that looks fine to me.
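
The effect of testing for AF_INET6 instead of AF_INET can be shown with a small user-space sketch. The names demo_max_mtu(), DEMO_IP_MAX_MTU and the tunnel_overhead parameter are hypothetical, and the header sizes are hard-coded (20 bytes for an IPv4 header without options, 40 bytes for an IPv6 header) instead of using the kernel's sizeof(struct iphdr) and sizeof(struct ipv6hdr).

```c
#include <sys/socket.h>  /* AF_INET, AF_INET6, AF_UNSPEC */

#define DEMO_IP_MAX_MTU   0xFFFF  /* assumed upper bound for this sketch */
#define DEMO_IPV4_HDR_LEN 20      /* IPv4 header without options */
#define DEMO_IPV6_HDR_LEN 40      /* fixed IPv6 header */

/* Largest inner MTU once the outer IP header and the other tunnel
 * overhead (passed in by the caller) have been subtracted. */
int demo_max_mtu(int sa_family, int tunnel_overhead)
{
    int max_mtu = DEMO_IP_MAX_MTU - tunnel_overhead;

    if (sa_family == AF_INET6)
        max_mtu -= DEMO_IPV6_HDR_LEN;
    else
        /* AF_INET and AF_UNSPEC both land here, so an unspecified
         * remote is not limited by the larger IPv6 header. */
        max_mtu -= DEMO_IPV4_HDR_LEN;

    return max_mtu;
}
```

With the original test (== AF_INET), an AF_UNSPEC remote would have fallen into the else branch and been charged the IPv6 header size, which is the case Jesse points out.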

[PATCH net-next v2 6/6] r8152: add byte_enable for ocp_read_word function

2016-06-27 Thread Hayes Wang

Add byte_enable for ocp_read_word() so that it reads only the
desired 2 bytes of data instead of 4 bytes.

This avoids the issue described in
commit b4d99def0938 ("r8152: remove sram_read"). The
original method always reads 4 bytes of data, which may
cause problems when reading the PHY registers.

The new method is supported since RTL8152B, but it
doesn't affect the previous chips. The byte_enable
bits are reserved bits on the previous chips, and
the hw ignores them.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 2fd4944..0bb7c1b 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -945,11 +945,13 @@ static u16 ocp_read_word(struct r8152 *tp, u16 type, u16 index)
 {
u32 data;
__le32 tmp;
+   u16 byen = BYTE_EN_WORD;
u8 shift = index & 2;
 
index &= ~3;
+   byen <<= shift;
 
-   generic_ocp_read(tp, index, sizeof(tmp), &tmp, type);
+   generic_ocp_read(tp, index, sizeof(tmp), &tmp, type | byen);
 
data = __le32_to_cpu(tmp);
data >>= (shift * 8);
-- 
2.4.11
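
The index arithmetic in this hunk can be followed with a small user-space sketch. It is not the driver code: the demo_* names are made up, the value 0x33 for the word byte-enable mask is an assumption about BYTE_EN_WORD, and the 32-bit value is taken to be already converted to host order (the driver does that with __le32_to_cpu()).

```c
#include <stdint.h>

#define DEMO_BYTE_EN_WORD 0x33  /* assumed value of BYTE_EN_WORD */

/* Address actually sent to the device: rounded down to 4 bytes. */
uint16_t demo_aligned_index(uint16_t index)
{
    return (uint16_t)(index & ~3u);
}

/* Byte-enable mask for the 2-byte word at index.
 * shift is 0 for the low word and 2 for the high word. */
uint16_t demo_byte_enable(uint16_t index)
{
    uint8_t shift = index & 2;

    return (uint16_t)(DEMO_BYTE_EN_WORD << shift);
}

/* Pick the requested 16-bit word out of the aligned 32-bit value. */
uint16_t demo_extract_word(uint32_t dword, uint16_t index)
{
    uint8_t shift = index & 2;

    return (uint16_t)((dword >> (shift * 8)) & 0xffff);
}
```

For an index whose bit 1 is set, the request goes to the aligned address two bytes below it, the mask selects the upper two byte lanes, and the result is taken from the upper half of the returned value.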

[PATCH net-next v2 5/6] r8152: support RTL8153B

2016-06-27 Thread Hayes Wang

Support new chip RTL8153B.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 560 +---
 1 file changed, 533 insertions(+), 27 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 3ccbff0..2fd4944 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -28,7 +28,7 @@
 #include 
 
 /* Information for net-next */
-#define NETNEXT_VERSION "08"
+#define NETNEXT_VERSION "09"
 
 /* Information for net */
 #define NET_VERSION "3"
@@ -50,11 +50,14 @@
 #define PLA_FMC 0xc0b4
 #define PLA_CFG_WOL 0xc0b6
 #define PLA_TEREDO_CFG 0xc0bc
+#define PLA_TEREDO_WAKE_BASE   0xc0c4
 #define PLA_MAR 0xcd00
 #define PLA_BACKUP 0xd000
 #define PAL_BDC_CR 0xd1a0
 #define PLA_TEREDO_TIMER   0xd2cc
 #define PLA_REALWOW_TIMER  0xd2e8
+#define PLA_EFUSE_DATA 0xdd00
+#define PLA_EFUSE_CMD  0xdd02
 #define PLA_LEDSEL 0xdd90
 #define PLA_LED_FEATURE 0xdd92
 #define PLA_PHYAR  0xde00
@@ -104,7 +107,9 @@
 #define USB_CSR_DUMMY2 0xb466
 #define USB_DEV_STAT   0xb808
 #define USB_CONNECT_TIMER  0xcbf8
+#define USB_MSC_TIMER  0xcbfc
 #define USB_BURST_SIZE 0xcfc0
+#define USB_LPM_CONFIG 0xcfd8
 #define USB_USB_CTRL   0xd406
 #define USB_PHY_CTRL   0xd408
 #define USB_TX_AGG 0xd40a
@@ -112,14 +117,19 @@
 #define USB_USB_TIMER  0xd428
 #define USB_RX_EARLY_TIMEOUT   0xd42c
 #define USB_RX_EARLY_SIZE  0xd42e
-#define USB_PM_CTRL_STATUS 0xd432
+#define USB_PM_CTRL_STATUS 0xd432  /* RTL8153A */
+#define USB_RX_EXTRA_AGGR_TMR  0xd432  /* RTL8153B */
 #define USB_TX_DMA 0xd434
+#define USB_UPT_RXDMA_OWN  0xd437
 #define USB_TOLERANCE  0xd490
 #define USB_LPM_CTRL   0xd41a
+#define USB_U1U2_TIMER 0xd4da
 #define USB_UPS_CTRL   0xd800
 #define USB_MISC_0 0xd81a
 #define USB_POWER_CUT  0xd80a
 #define USB_AFE_CTRL2  0xd824
+#define USB_UPS_CFG 0xd842
+#define USB_UPS_FLAGS  0xd848
 #define USB_WDT11_CTRL 0xe43c
 #define USB_BP_BA  0xfc26
 #define USB_BP_0   0xfc28
@@ -141,6 +151,7 @@
 #define OCP_EEE_AR 0xa41a
 #define OCP_EEE_DATA   0xa41c
 #define OCP_PHY_STATUS 0xa420
+#define OCP_NCTL_CFG   0xa42c
 #define OCP_POWER_CFG  0xa430
 #define OCP_EEE_CFG 0xa432
 #define OCP_SRAM_ADDR  0xa436
@@ -150,9 +161,14 @@
 #define OCP_EEE_ADV 0xa5d0
 #define OCP_EEE_LPABLE 0xa5d2
 #define OCP_PHY_STATE  0xa708  /* nway state for 8153 */
+#define OCP_PHY_PATCH_STAT 0xb800
+#define OCP_PHY_PATCH_CMD  0xb820
+#define OCP_ADC_IOFFSET 0xbcfc
 #define OCP_ADC_CFG 0xbc06
+#define OCP_SYSCLK_CFG 0xc416
 
 /* SRAM Register */
+#define SRAM_GREEN_CFG 0x8011
 #define SRAM_LPF_CFG   0x8012
 #define SRAM_10M_AMP1  0x8080
 #define SRAM_10M_AMP2  0x8082
@@ -250,6 +266,10 @@
 /* PAL_BDC_CR */
 #define ALDPS_PROXY_MODE   0x0001
 
+/* PLA_EFUSE_CMD */
+#define EFUSE_READ_CMD BIT(15)
+#define EFUSE_DATA_BIT16   BIT(7)
+
 /* PLA_CONFIG34 */
 #define LINK_ON_WAKE_EN 0x0010
 #define LINK_OFF_WAKE_EN   0x0008
@@ -275,6 +295,7 @@
 
 /* PLA_MAC_PWR_CTRL2 */
 #define EEE_SPDWN_RATIO 0x8007
+#define MAC_CLK_SPDWN_EN   BIT(15)
 
 /* PLA_MAC_PWR_CTRL3 */
 #define PKT_AVAIL_SPDWN_EN 0x0100
@@ -326,6 +347,9 @@
 #define STAT_SPEED_HIGH 0x
 #define STAT_SPEED_FULL 0x0002
 
+/* USB_LPM_CONFIG */
+#define LPM_U1U2_EN BIT(0)
+
 /* USB_TX_AGG */
 #define TX_AGG_MAX_THRESHOLD   0x03
 
@@ -333,11 +357,16 @@
 #define RX_THR_SUPPER  0x0c350180
 #define RX_THR_HIGH 0x7a120180
 #define RX_THR_SLOW 0x0180
+#define RX_THR_B   0x00010001
 
 /* USB_TX_DMA */
 #define TEST_MODE_DISABLE  0x0001
 #define TX_SIZE_ADJUST1 0x0100
 
+/* USB_UPT_RXDMA_OWN */
+#define OWN_UPDATE BIT(0)
+#define OWN_CLEAR  BIT(1)
+
 /* USB_UPS_CTRL */
 #define POWER_CUT  0x0100
 
@@ -354,6 +383,8 @@
 /* USB_POWER_CUT */
 #define PWR_EN 0x0001
 #define PHASE2_EN  0x0008
+#define UPS_EN BIT(4)
+#define USP_PREWAKE BIT(5)
 
 /* USB_MISC_0 */
 #define PCUT_STATUS 0x0001
@@ -380,6 +411,37 @@
 #define SEN_VAL_NORMAL 0xa000
 #define SEL_RXIDLE 0x0100
 
+/* USB_UPS_CFG */
+#define SAW_CNT_1MS_MASK   0x0fff
+
+/* USB_UPS_FLAGS */
+#define UPS_FLAGS_R_TUNE   BIT(0)
+#define UPS_FLAGS_EN_10M_CKDIV BIT(1)
+#define UPS_FLAGS_250M_CKDIV

[PATCH net-next v2 0/6] r8152: support new chips

2016-06-27 Thread Hayes Wang

v2:
Fix the commit message for patch #6.

v1:
In order to support new chips, adjust some codes. Then, add the settings
for the new chips.

Hayes Wang (6):
  r8152: add aldps_enable for rtl_ops
  r8152: add u1u2_enable for rtl_ops
  r8152: add power_cut_en for rtl_ops
  r8152: support the new chip 8050
  r8152: support RTL8153B
  r8152: add byte_enable for ocp_read_word function

 drivers/net/usb/r8152.c | 621 
 1 file changed, 576 insertions(+), 45 deletions(-)

-- 
2.4.11

[PATCH net-next v2 4/6] r8152: support the new chip 8050

2016-06-27 Thread Hayes Wang

Support a new chip which has the product ID 0x8050.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index a4f8a01..3ccbff0 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -646,6 +646,7 @@ enum rtl_version {
RTL_VER_04,
RTL_VER_05,
RTL_VER_06,
+   RTL_VER_07,
RTL_VER_MAX
 };
 
@@ -3920,6 +3921,7 @@ static int rtl8152_get_coalesce(struct net_device *netdev,
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
return -EOPNOTSUPP;
default:
break;
@@ -3939,6 +3941,7 @@ static int rtl8152_set_coalesce(struct net_device *netdev,
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
return -EOPNOTSUPP;
default:
break;
@@ -4038,6 +4041,7 @@ static int rtl8152_change_mtu(struct net_device *dev, int new_mtu)
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
return eth_change_mtu(dev, new_mtu);
default:
break;
@@ -4109,6 +4113,9 @@ static void r8152b_get_version(struct r8152 *tp)
tp->version = RTL_VER_06;
tp->mii.supports_gmii = 1;
break;
+   case 0x4800:
+   tp->version = RTL_VER_07;
+   break;
default:
netif_info(tp, probe, tp->netdev,
   "Unknown version 0x%04x\n", version);
@@ -4141,6 +4148,7 @@ static int rtl_ops_init(struct r8152 *tp)
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
ops->init   = r8152b_init;
ops->enable = rtl8152_enable;
ops->disable = rtl8152_disable;
@@ -4336,6 +4344,7 @@ static void rtl8152_disconnect(struct usb_interface *intf)
 
 /* table of devices that work with this driver */
 static struct usb_device_id rtl8152_table[] = {
+   {REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8050)},
{REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8152)},
{REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8153)},
{REALTEK_USB_DEVICE(VENDOR_ID_SAMSUNG, 0xa101)},
-- 
2.4.11

[PATCH net-next v2 3/6] r8152: add power_cut_en for rtl_ops

2016-06-27 Thread Hayes Wang

Add power_cut_en() for rtl_ops.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f51d799..a4f8a01 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -622,6 +622,7 @@ struct r8152 {
void (*aldps_enable)(struct r8152 *tp, bool enable);
void (*u1u2_enable)(struct r8152 *tp, bool enable);
void (*hw_phy_cfg)(struct r8152 *);
+   void (*power_cut_en)(struct r8152 *tp, bool enable);
} rtl_ops;
 
int intr_interval;
@@ -2391,6 +2392,13 @@ static void r8153_power_cut_en(struct r8152 *tp, bool enable)
else
ocp_data &= ~(PWR_EN | PHASE2_EN);
ocp_write_word(tp, MCU_TYPE_USB, USB_POWER_CUT, ocp_data);
+}
+
+static void r8153A_power_cut_en(struct r8152 *tp, bool enable)
+{
+   u32 ocp_data;
+
+   r8153_power_cut_en(tp, enable);
 
ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_MISC_0);
ocp_data &= ~PCUT_STATUS;
@@ -2941,7 +2949,7 @@ static void rtl8153_down(struct r8152 *tp)
 
tp->rtl_ops.u1u2_enable(tp, false);
r8153_u2p3en(tp, false);
-   r8153_power_cut_en(tp, false);
+   tp->rtl_ops.power_cut_en(tp, false);
tp->rtl_ops.aldps_enable(tp, false);
r8153_enter_oob(tp);
tp->rtl_ops.aldps_enable(tp, true);
@@ -3397,7 +3405,7 @@ static void r8153_init(struct r8152 *tp)
 
ocp_write_word(tp, MCU_TYPE_USB, USB_CONNECT_TIMER, 0x0001);
 
-   r8153_power_cut_en(tp, false);
+   r8153A_power_cut_en(tp, false);
r8153_u1u2en(tp, true);
 
ocp_write_word(tp, MCU_TYPE_PLA, PLA_MAC_PWR_CTRL, ALDPS_SPDWN_RATIO);
@@ -4122,7 +4130,7 @@ static void rtl8153_unload(struct r8152 *tp)
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return;
 
-   r8153_power_cut_en(tp, false);
+   tp->rtl_ops.power_cut_en(tp, false);
 }
 
 static int rtl_ops_init(struct r8152 *tp)
@@ -4145,6 +4153,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->aldps_enable   = r8152_aldps_en;
ops->u1u2_enable = r8153_u1u2en;
ops->hw_phy_cfg = r8152b_hw_phy_cfg;
+   ops->power_cut_en   = r8152_power_cut_en;
break;
 
case RTL_VER_03:
@@ -4163,6 +4172,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->aldps_enable   = r8153_aldps_en;
ops->u1u2_enable = r8153_u1u2en;
ops->hw_phy_cfg = r8153_hw_phy_cfg;
+   ops->power_cut_en   = r8153A_power_cut_en;
break;
 
default:
-- 
2.4.11

[PATCH net-next v2 1/6] r8152: add aldps_enable for rtl_ops

2016-06-27 Thread Hayes Wang

Add aldps_enable() for rtl_ops.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 11178f9..b253003 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -619,6 +619,7 @@ struct r8152 {
int (*eee_get)(struct r8152 *, struct ethtool_eee *);
int (*eee_set)(struct r8152 *, struct ethtool_eee *);
bool (*in_nway)(struct r8152 *);
+   void (*aldps_enable)(struct r8152 *tp, bool enable);
void (*hw_phy_cfg)(struct r8152 *);
} rtl_ops;
 
@@ -2474,9 +2475,9 @@ static void r8152_aldps_en(struct r8152 *tp, bool enable)
 
 static void rtl8152_disable(struct r8152 *tp)
 {
-   r8152_aldps_en(tp, false);
+   tp->rtl_ops.aldps_enable(tp, false);
rtl_disable(tp);
-   r8152_aldps_en(tp, true);
+   tp->rtl_ops.aldps_enable(tp, true);
 }
 
 static void r8152b_hw_phy_cfg(struct r8152 *tp)
@@ -2801,9 +2802,7 @@ static void r8153_aldps_en(struct r8152 *tp, bool enable)
 
 static void rtl8153_disable(struct r8152 *tp)
 {
-   r8153_aldps_en(tp, false);
-   rtl_disable(tp);
-   r8153_aldps_en(tp, true);
+   rtl8152_disable(tp);
usb_enable_lpm(tp->udev);
 }
 
@@ -2924,9 +2923,9 @@ static void rtl8153_up(struct r8152 *tp)
return;
 
r8153_u1u2en(tp, false);
-   r8153_aldps_en(tp, false);
+   tp->rtl_ops.aldps_enable(tp, false);
r8153_first_init(tp);
-   r8153_aldps_en(tp, true);
+   tp->rtl_ops.aldps_enable(tp, true);
r8153_u2p3en(tp, true);
r8153_u1u2en(tp, true);
usb_enable_lpm(tp->udev);
@@ -2942,9 +2941,9 @@ static void rtl8153_down(struct r8152 *tp)
r8153_u1u2en(tp, false);
r8153_u2p3en(tp, false);
r8153_power_cut_en(tp, false);
-   r8153_aldps_en(tp, false);
+   tp->rtl_ops.aldps_enable(tp, false);
r8153_enter_oob(tp);
-   r8153_aldps_en(tp, true);
+   tp->rtl_ops.aldps_enable(tp, true);
 }
 
 static bool rtl8152_in_nway(struct r8152 *tp)
@@ -4142,6 +4141,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_get = r8152_get_eee;
ops->eee_set = r8152_set_eee;
ops->in_nway = rtl8152_in_nway;
+   ops->aldps_enable   = r8152_aldps_en;
ops->hw_phy_cfg = r8152b_hw_phy_cfg;
break;
 
@@ -4158,6 +4158,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_get = r8153_get_eee;
ops->eee_set = r8153_set_eee;
ops->in_nway = rtl8153_in_nway;
+   ops->aldps_enable   = r8153_aldps_en;
ops->hw_phy_cfg = r8153_hw_phy_cfg;
break;
 
-- 
2.4.11
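
The refactoring in this patch replaces direct calls to a chip-specific helper with a call through the rtl_ops table, so that shared code such as rtl8152_disable() can serve more than one chip. A minimal user-space sketch of that idea follows. All names (struct demo_nic, demo_ops_init(), the chip_a and chip_b helpers) are hypothetical, and the log buffer exists only to make the call order visible.

```c
#include <stdbool.h>
#include <string.h>

struct demo_nic;

/* Per-chip operations, selected once at init time. */
struct demo_ops {
    void (*aldps_enable)(struct demo_nic *nic, bool enable);
};

struct demo_nic {
    struct demo_ops ops;
    char log[64];  /* records which helpers ran, for illustration only */
};

static void log_step(struct demo_nic *nic, const char *step)
{
    strncat(nic->log, step, sizeof(nic->log) - strlen(nic->log) - 1);
}

static void chip_a_aldps_en(struct demo_nic *nic, bool enable)
{
    log_step(nic, enable ? "A+ " : "A- ");
}

static void chip_b_aldps_en(struct demo_nic *nic, bool enable)
{
    log_step(nic, enable ? "B+ " : "B- ");
}

/* Shared path: no chip-specific helper is named here any more. */
void demo_disable(struct demo_nic *nic)
{
    nic->ops.aldps_enable(nic, false);
    log_step(nic, "disable ");
    nic->ops.aldps_enable(nic, true);
}

/* Picks the helpers by version, like the switch in rtl_ops_init(). */
void demo_ops_init(struct demo_nic *nic, int version)
{
    memset(nic, 0, sizeof(*nic));
    nic->ops.aldps_enable = (version <= 2) ? chip_a_aldps_en
                                           : chip_b_aldps_en;
}
```

After demo_ops_init(&nic, 1), calling demo_disable(&nic) runs the chip A helper around the shared step; with version 3 the same function runs the chip B helper instead.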

[PATCH net-next v2 2/6] r8152: add u1u2_enable for rtl_ops

2016-06-27 Thread Hayes Wang

Add u1u2_enable() for rtl_ops.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index b253003..f51d799 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -620,6 +620,7 @@ struct r8152 {
int (*eee_set)(struct r8152 *, struct ethtool_eee *);
bool (*in_nway)(struct r8152 *);
void (*aldps_enable)(struct r8152 *tp, bool enable);
+   void (*u1u2_enable)(struct r8152 *tp, bool enable);
void (*hw_phy_cfg)(struct r8152 *);
} rtl_ops;
 
@@ -2408,7 +2409,7 @@ static void rtl_runtime_suspend_enable(struct r8152 *tp, bool enable)
if (enable) {
u32 ocp_data;
 
-   r8153_u1u2en(tp, false);
+   tp->rtl_ops.u1u2_enable(tp, false);
r8153_u2p3en(tp, false);
 
__rtl_set_wol(tp, WAKE_ANY);
@@ -2423,7 +2424,7 @@ static void rtl_runtime_suspend_enable(struct r8152 *tp, bool enable)
} else {
__rtl_set_wol(tp, tp->saved_wolopts);
r8153_u2p3en(tp, true);
-   r8153_u1u2en(tp, true);
+   tp->rtl_ops.u1u2_enable(tp, true);
}
 }
 
@@ -2922,12 +2923,12 @@ static void rtl8153_up(struct r8152 *tp)
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return;
 
-   r8153_u1u2en(tp, false);
+   tp->rtl_ops.u1u2_enable(tp, false);
tp->rtl_ops.aldps_enable(tp, false);
r8153_first_init(tp);
tp->rtl_ops.aldps_enable(tp, true);
r8153_u2p3en(tp, true);
-   r8153_u1u2en(tp, true);
+   tp->rtl_ops.u1u2_enable(tp, true);
usb_enable_lpm(tp->udev);
 }
 
@@ -2938,7 +2939,7 @@ static void rtl8153_down(struct r8152 *tp)
return;
}
 
-   r8153_u1u2en(tp, false);
+   tp->rtl_ops.u1u2_enable(tp, false);
r8153_u2p3en(tp, false);
r8153_power_cut_en(tp, false);
tp->rtl_ops.aldps_enable(tp, false);
@@ -4142,6 +4143,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_set = r8152_set_eee;
ops->in_nway = rtl8152_in_nway;
ops->aldps_enable   = r8152_aldps_en;
+   ops->u1u2_enable = r8153_u1u2en;
ops->hw_phy_cfg = r8152b_hw_phy_cfg;
break;
 
@@ -4159,6 +4161,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_set = r8153_set_eee;
ops->in_nway = rtl8153_in_nway;
ops->aldps_enable   = r8153_aldps_en;
+   ops->u1u2_enable = r8153_u1u2en;
ops->hw_phy_cfg = r8153_hw_phy_cfg;
break;
 
-- 
2.4.11

Re: Supporting C45 PHY without ID registers

2016-06-27 Thread Moritz Fischer

Hi Andrew,

On Mon, Jun 27, 2016 at 5:56 PM, Andrew Lunn  wrote:

> Does it have any ID registers at all?

There is a vendor specific (to my knowledge) register at device 1
register 65535 ([1]) that could be read back. I haven't seen anyone
else do that.

Thanks,

Moritz

[1] 
http://www.xilinx.com/support/documentation/ip_documentation/ten_gig_eth_pcs_pma/v6_0/pg068-ten-gig-eth-pcs-pma.pdf

linux-next: manual merge of the net-next tree with the imx-mxs tree

2016-06-27 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/freescale/fec.h

between commit:

  293809055656 ("ARM: imx6: disable deeper idle states when FEC is active w/o HW workaround")

from the imx-mxs tree and commit:

  ff7566b8d71f ("net: fec: add interrupt coalesc quirk flag")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/freescale/fec.h
index dc71a88e9c55,92fd5c0bf4df..
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@@ -442,8 -442,8 +442,10 @@@ struct bufdesc_ex 
  #define FEC_QUIRK_SINGLE_MDIO (1 << 11)
  /* Controller supports RACC register */
  #define FEC_QUIRK_HAS_RACC    (1 << 12)
 +/* Interrupt doesn't wake CPU from deep idle */
 +#define FEC_QUIRK_ERR006687   (1 << 13)
+ /* Controller supports interrupt coalesc */
 -#define FEC_QUIRK_HAS_COALESCE (1 << 13)
++#define FEC_QUIRK_HAS_COALESCE (1 << 14)
  
  struct bufdesc_prop {
int qid;
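
The resolution above works because each quirk must own a distinct bit: both trees had claimed bit 13, so one flag moves to bit 14. A small standalone sketch of that invariant follows. The three values are copied from the resolved hunk; the overlap helper and the compile-time check are hypothetical additions for illustration, not part of fec.h.

```c
#include <assert.h>

/* Values as they stand after the merge resolution. */
#define FEC_QUIRK_HAS_RACC      (1 << 12)
#define FEC_QUIRK_ERR006687     (1 << 13)
#define FEC_QUIRK_HAS_COALESCE  (1 << 14)

/* Hypothetical helper: nonzero if two quirk masks share any bit. */
static inline int demo_quirks_overlap(unsigned int a, unsigned int b)
{
	return (a & b) != 0;
}

/* C11 compile-time guard: the build fails if the two flags ever collide. */
_Static_assert((FEC_QUIRK_ERR006687 & FEC_QUIRK_HAS_COALESCE) == 0,
	       "FEC quirk flags must use distinct bits");
```

Had the conflict been resolved by keeping both definitions at bit 13, a test for one quirk would silently have matched the other.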

Re: [PATCH net-next 0/6] net: dsa: Platform data for dsa2.c

2016-06-27 Thread Florian Fainelli

2016-06-27 18:05 GMT-07:00 Andrew Lunn :
> On Mon, Jun 27, 2016 at 05:52:37PM -0700, Florian Fainelli wrote:
>> Hi all,
>>
>> This patch series adds support for platform data using the new code from
>> net/dsa/dsa2.c. The motivation behind this is that we have a bit of in tree
>> platforms (ar7, bcm47xx, x86, others) that could be benefiting from the new
>> dsa_register_switch() API model but do not support Device Tree, nor is there a
>> plan to bring Device Tree to these platforms (time vs. benefits).
>
> Hi Florian
>
> Please could you convert an in tree device to actually use this.

Sure. I don't think there are going to be in-tree users who need the
dsa2_port_link information; most of what we have is typically single
chip, and so in that case we can even re-use the existing
dsa_platform_data.
-- 
Florian

Re: [PATCH net-next 0/6] net: dsa: Platform data for dsa2.c

2016-06-27 Thread Andrew Lunn

On Mon, Jun 27, 2016 at 05:52:37PM -0700, Florian Fainelli wrote:
> Hi all,
> 
> This patch series adds support for platform data using the new code from
> net/dsa/dsa2.c. The motivation behind this is that we have a bit of in tree
> platforms (ar7, bcm47xx, x86, others) that could be benefiting from the new
> dsa_register_switch() API model but do not support Device Tree, nor is there a
> plan to bring Device Tree to these platforms (time vs. benefits).

Hi Florian

Please could you convert an in tree device to actually use this.

   Thanks
Andrew

Re: Supporting C45 PHY without ID registers

2016-06-27 Thread Andrew Lunn

On Mon, Jun 27, 2016 at 04:36:20PM -0700, Moritz Fischer wrote:
> Hi all,
> 
> I have a 10GigE PHY that I'm working with that has most of its
> functionality available via MDIO in a clause 45 compliant fashion;
> however, the usual probe method fails since the ID registers
> are not implemented.

Hi Moritz

Does it have any ID registers at all?

 Andrew

[PATCH net-next 3/6] net: dsa: Suffix function manipulating device_node with _dn

2016-06-27 Thread Florian Fainelli

Make it clear that these functions take a device_node structure pointer

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 3a782ceef716..bdde5d217326 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -110,8 +110,8 @@ static bool dsa_port_is_cpu(struct dsa_port *port)
return false;
 }
 
-static bool dsa_ds_find_port(struct dsa_switch *ds,
-struct device_node *port)
+static bool dsa_ds_find_port_dn(struct dsa_switch *ds,
+   struct device_node *port)
 {
u32 index;
 
@@ -121,8 +121,8 @@ static bool dsa_ds_find_port(struct dsa_switch *ds,
return false;
 }
 
-static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
-   struct device_node *port)
+static struct dsa_switch *dsa_dst_find_port_dn(struct dsa_switch_tree *dst,
+  struct device_node *port)
 {
struct dsa_switch *ds;
u32 index;
@@ -132,7 +132,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
if (!ds)
continue;
 
-   if (dsa_ds_find_port(ds, port))
+   if (dsa_ds_find_port_dn(ds, port))
return ds;
}
 
@@ -153,7 +153,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
if (!link)
break;
 
-   dst_ds = dsa_dst_find_port(dst, link);
+   dst_ds = dsa_dst_find_port_dn(dst, link);
of_node_put(link);
 
if (!dst_ds)
@@ -557,7 +557,7 @@ static int dsa_parse_ports_dn(struct device_node *ports, struct dsa_switch *ds)
return 0;
 }
 
-static int dsa_parse_member(struct device_node *np, u32 *tree, u32 *index)
+static int dsa_parse_member_dn(struct device_node *np, u32 *tree, u32 *index)
 {
int err;
 
@@ -603,7 +603,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
u32 tree, index;
int err;
 
-   err = dsa_parse_member(np, &tree, &index);
+   err = dsa_parse_member_dn(np, &tree, &index);
if (err)
return err;
 
-- 
2.7.4

[PATCH net-next 0/6] net: dsa: Platform data for dsa2.c

2016-06-27 Thread Florian Fainelli

Hi all,

This patch series adds support for platform data using the new code from
net/dsa/dsa2.c. The motivation behind this is that we have a bit of in tree
platforms (ar7, bcm47xx, x86, others) that could be benefiting from the new
dsa_register_switch() API model but do not support Device Tree, nor is there a
plan to bring Device Tree to these platforms (time vs. benefits).

The approach taken here is to introduce a new set of platform data structures:
- dsa2_platform_data: per-switch platform data information
- dsa2_port_data: per-port platform data information
- dsa2_port_link: per-port link topology
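
To make the three structures concrete, here is a userspace sketch of how a board file might describe one switch in a cascade. Everything named demo_* is a hypothetical stand-in, not the header from this series; the meaning given to the link fields (far-end switch index and far-end port) and the "port is valid when it has a name" rule are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SWITCHES 4	/* stand-in for DSA_MAX_SWITCHES */
#define DEMO_MAX_PORTS    12	/* stand-in for DSA_MAX_PORTS */

/* Per-port link topology: one entry per neighbouring switch. */
struct demo_port_link {
	bool valid;
	uint32_t index;		/* assumed: index of the switch at the far end */
	unsigned int port;	/* assumed: port number on that switch */
};

/* Per-port platform data, reduced to the fields this sketch needs. */
struct demo_port_data {
	const char *name;
	struct demo_port_link links[DEMO_MAX_SWITCHES];
};

/* Per-switch platform data. */
struct demo_platform_data {
	uint32_t tree;
	uint32_t index;
	struct demo_port_data ports[DEMO_MAX_PORTS];
};

/* Switch 0 of tree 0: port 0 is the CPU port, port 5 cascades to switch 1. */
static const struct demo_platform_data demo_sw0 = {
	.tree = 0,
	.index = 0,
	.ports = {
		[0] = { .name = "cpu" },
		[1] = { .name = "lan1" },
		[5] = {
			.name = "dsa",
			.links = { [0] = { .valid = true, .index = 1, .port = 5 } },
		},
	},
};

/* Assumption: a port counts as described when it has a name. */
static bool demo_port_is_valid(const struct demo_port_data *p)
{
	return p->name != NULL;
}

/* Count the links a port declares to other switches. */
static unsigned int demo_port_count_links(const struct demo_port_data *p)
{
	unsigned int i, n = 0;

	for (i = 0; i < DEMO_MAX_SWITCHES; i++)
		if (p->links[i].valid)
			n++;
	return n;
}
```

Designated initializers leave every undescribed port zeroed, so unused ports need no explicit entry, which plays the same role as a missing port node in a Device Tree description.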

The platform data is mapped as closely as possible to the Device Tree binding.
Where needed, the code either looks specifically for DT attributes/properties
or parses the platform data. Unlike net/dsa/dsa.c, we do not try to make the DT
code allocate and fit within the platform data, but that could be an option if
deemed necessary.

This was tested against a "loopback" driver which is available here, and which
was built both as a module and into the kernel; modprobe and rmmod were
successful:

https://github.com/ffainelli/linux/blob/27024f4c9b43cf879348baa5445763c690fbc618/drivers/net/dsa/dsa_loop.c

in two configurations:

- first configuration has 4 switches connected in a cascade [1]
- second configuration is the actual binding example [2]

[1]:
https://github.com/ffainelli/linux/commit/0aa1e98c027e8e41ed2f83825da82a3d8a1c3ae2
[2]:
https://github.com/ffainelli/linux/commit/27024f4c9b43cf879348baa5445763c690fbc618

Florian Fainelli (6):
  net: dsa: Pass device pointer to dsa_register_switch
  net: dsa: Make most functions take a dsa_port argument
  net: dsa: Suffix function manipulating device_node with _dn
  net: dsa: Move ports assignment closer to error checking
  net: dsa: Export dev_to_net_device()
  net: dsa: Add support for platform data

 drivers/net/dsa/b53/b53_common.c  |   2 +-
 drivers/net/dsa/mv88e6xxx/chip.c  |   7 +-
 include/linux/platform_data/dsa.h |  61 +++
 include/net/dsa.h |  10 +-
 net/dsa/Kconfig   |   1 +
 net/dsa/dsa.c |  58 ++
 net/dsa/dsa2.c| 216 --
 net/dsa/dsa_priv.h|   4 +-
 8 files changed, 273 insertions(+), 86 deletions(-)
 create mode 100644 include/linux/platform_data/dsa.h

-- 
2.7.4

[PATCH net-next 2/6] net: dsa: Make most functions take a dsa_port argument

2016-06-27 Thread Florian Fainelli

In preparation for allowing platform data, and therefore no valid
device_node pointer, make most DSA functions take a pointer to a
dsa_port structure whenever possible. While at it, introduce a
dsa_port_is_valid() helper function which checks whether port->dn is
NULL or not at the moment.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa.c  | 14 +++--
 net/dsa/dsa2.c | 61 +-
 net/dsa/dsa_priv.h |  4 ++--
 3 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 766d2a525ada..d117580a78b6 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -208,8 +208,9 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
 
 /* basic switch operations **/
 int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
- struct device_node *port_dn, int port)
+ struct dsa_port *dport, int port)
 {
+   struct device_node *port_dn = dport->dn;
struct phy_device *phydev;
int ret, mode;
 
@@ -237,15 +238,15 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
 
 static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct device *dev)
 {
-   struct device_node *port_dn;
+   struct dsa_port *dport;
int ret, port;
 
for (port = 0; port < DSA_MAX_PORTS; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
 
-   port_dn = ds->ports[port].dn;
-   ret = dsa_cpu_dsa_setup(ds, dev, port_dn, port);
+   dport = &ds->ports[port];
+   ret = dsa_cpu_dsa_setup(ds, dev, dport, port);
if (ret)
return ret;
}
@@ -494,8 +495,9 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
return ds;
 }
 
-void dsa_cpu_dsa_destroy(struct device_node *port_dn)
+void dsa_cpu_dsa_destroy(struct dsa_port *port)
 {
+   struct device_node *port_dn = port->dn;
struct phy_device *phydev;
 
if (of_phy_is_fixed_link(port_dn)) {
@@ -531,7 +533,7 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
for (port = 0; port < DSA_MAX_PORTS; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
-   dsa_cpu_dsa_destroy(ds->ports[port].dn);
+   dsa_cpu_dsa_destroy(&ds->ports[port]);
 
/* Clearing a bit which is not set does no harm */
ds->cpu_port_mask |= ~(1 << port);
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 0940a0ec83e6..3a782ceef716 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -77,11 +77,16 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst,
kref_put(&dst->refcount, dsa_free_dst);
 }
 
-static bool dsa_port_is_dsa(struct device_node *port)
+static bool dsa_port_is_valid(struct dsa_port *port)
+{
+   return !!port->dn;
+}
+
+static bool dsa_port_is_dsa(struct dsa_port *port)
 {
const char *name;
 
-   name = of_get_property(port, "label", NULL);
+   name = of_get_property(port->dn, "label", NULL);
if (!name)
return false;
 
@@ -91,11 +96,11 @@ static bool dsa_port_is_dsa(struct device_node *port)
return false;
 }
 
-static bool dsa_port_is_cpu(struct device_node *port)
+static bool dsa_port_is_cpu(struct dsa_port *port)
 {
const char *name;
 
-   name = of_get_property(port, "label", NULL);
+   name = of_get_property(port->dn, "label", NULL);
if (!name)
return false;
 
@@ -136,7 +141,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
 
 static int dsa_port_complete(struct dsa_switch_tree *dst,
 struct dsa_switch *src_ds,
-struct device_node *port,
+struct dsa_port *port,
 u32 src_port)
 {
struct device_node *link;
@@ -144,7 +149,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
struct dsa_switch *dst_ds;
 
for (index = 0;; index++) {
-   link = of_parse_phandle(port, "link", index);
+   link = of_parse_phandle(port->dn, "link", index);
if (!link)
break;
 
@@ -167,13 +172,13 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
  */
 static int dsa_ds_complete(struct dsa_switch_tree *dst, struct dsa_switch *ds)
 {
-   struct device_node *port;
+   struct dsa_port *port;
u32 index;
int err;
 
for (index = 0; index < DSA_MAX_PORTS; index++) {
-   port = ds->ports[index].dn;
-   if (!port)
+   port = &ds->ports[index];
+   if (!dsa_port_is_valid(port))
continue;
 
if (!dsa_port_is_dsa(port))
@@ -213,7 +218,7 @@ static int

[PATCH net-next 4/6] net: dsa: Move ports assignment closer to error checking

2016-06-27 Thread Florian Fainelli

Move the assignment of ports in _dsa_register_switch() closer to where
it is checked; no functional change. Re-order declarations to
preserve the inverted Christmas tree style.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index bdde5d217326..a565bd919aa3 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -598,8 +598,8 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds,
 static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
struct device_node *np = dev->of_node;
-   struct device_node *ports = dsa_get_ports(ds, np);
struct dsa_switch_tree *dst;
+   struct device_node *ports;
u32 tree, index;
int err;
 
@@ -607,6 +607,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
if (err)
return err;
 
+   ports = dsa_get_ports(ds, np);
if (IS_ERR(ports))
return PTR_ERR(ports);
 
-- 
2.7.4

Re: [PATCH] rtlwifi: Create _rtl_dbg_trace function to reduce RT_TRACE code size

2016-06-27 Thread Larry Finger


On 06/25/2016 05:46 PM, Joe Perches wrote:

This debugging macro can expand to a lot of code.
Make it a function to reduce code size.

(x86-64 defconfig w/ all rtlwifi drivers and allyesconfig)
$ size drivers/net/wireless/realtek/rtlwifi/built-in.o*
   text    data     bss     dec     hex filename
 900083  200499    1907 1102489  10d299 drivers/net/wireless/realtek/rtlwifi/built-in.o.defconfig.new
1113597  200499    1907 1316003  1414a3 drivers/net/wireless/realtek/rtlwifi/built-in.o.defconfig.old
1746879  453503    8512 2208894  21b47e drivers/net/wireless/realtek/rtlwifi/built-in.o.new
2051965  503311    8512 2563788  271ecc drivers/net/wireless/realtek/rtlwifi/built-in.o.old

Signed-off-by: Joe Perches 


I acked this before; however there is a bug that breaks the build if 
CONFIG_RTLWIFI_DEBUG is not defined. The rest of the code calls 
_rtl_dbg_trace(), but that symbol is never defined. The problem can be fixed in 
debug.c or debug.h.


Larry
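
One common shape for the fix described above is to provide a no-op stub whenever the debug option is off, so that every RT_TRACE() call site still resolves. The following is a userspace sketch of that pattern only; the demo_* names, the DEMO_DEBUG switch and the int return value are hypothetical stand-ins chosen so the behaviour can be checked, and this is not the code that was eventually applied to rtlwifi.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical switch standing in for CONFIG_RTLWIFI_DEBUG; on by default here. */
#ifndef DEMO_DEBUG
#define DEMO_DEBUG 1
#endif

/* Hypothetical stand-in for the debug fields of struct rtl_priv. */
struct demo_priv {
	unsigned int components;	/* bitmask of enabled components */
	int level;			/* highest level that gets printed */
};

#if DEMO_DEBUG
/*
 * Out-of-line tracer: the filter is compiled once, not at every call site.
 * Returns 1 if the message was emitted, 0 if it was filtered out.
 */
static int demo_dbg_trace(const struct demo_priv *priv, unsigned int comp,
			  int level, const char *fmt, ...)
{
	va_list args;

	if (!(comp & priv->components) || level > priv->level)
		return 0;

	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
	return 1;
}
#else
/* Stub: keeps every caller compiling and linking when debugging is off. */
static inline int demo_dbg_trace(const struct demo_priv *priv, unsigned int comp,
				 int level, const char *fmt, ...)
{
	(void)priv; (void)comp; (void)level; (void)fmt;
	return 0;
}
#endif

/* Call sites use the macro and never need to know which variant is built. */
#define DEMO_TRACE(priv, comp, level, ...) \
	demo_dbg_trace(priv, comp, level, __VA_ARGS__)
```

Placing the stub in the header (the debug.h option) lets the compiler drop the call entirely in non-debug builds, while defining an empty function in debug.c keeps a real symbol around; either choice removes the undefined reference.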


---
  drivers/net/wireless/realtek/rtlwifi/debug.c | 25 +
  drivers/net/wireless/realtek/rtlwifi/debug.h | 17 +
  2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/debug.c b/drivers/net/wireless/realtek/rtlwifi/debug.c
index fd25aba..33905bb 100644
--- a/drivers/net/wireless/realtek/rtlwifi/debug.c
+++ b/drivers/net/wireless/realtek/rtlwifi/debug.c
@@ -48,3 +48,28 @@ void rtl_dbgp_flag_init(struct ieee80211_hw *hw)
/*Init Debug flag enable condition */
  }
  EXPORT_SYMBOL_GPL(rtl_dbgp_flag_init);
+
+#ifdef CONFIG_RTLWIFI_DEBUG
+void _rtl_dbg_trace(struct rtl_priv *rtlpriv, int comp, int level,
+   const char *modname, const char *fmt, ...)
+{
+   if (unlikely((comp & rtlpriv->dbg.global_debugcomponents) &&
+(level <= rtlpriv->dbg.global_debuglevel))) {
+   struct va_format vaf;
+   va_list args;
+
+   va_start(args, fmt);
+
+   vaf.fmt = fmt;
+   vaf.va = &args;
+
+   printk(KERN_DEBUG "%s:%ps:<%lx-%x> %pV",
+  modname, __builtin_return_address(0),
+  in_interrupt(), in_atomic(),
+  &vaf);
+
+   va_end(args);
+   }
+}
+EXPORT_SYMBOL_GPL(_rtl_dbg_trace);
+#endif
diff --git a/drivers/net/wireless/realtek/rtlwifi/debug.h b/drivers/net/wireless/realtek/rtlwifi/debug.h
index fc794b3..6156a79 100644
--- a/drivers/net/wireless/realtek/rtlwifi/debug.h
+++ b/drivers/net/wireless/realtek/rtlwifi/debug.h
@@ -174,15 +174,16 @@ do {  
\
}   \
  } while (0)

+
+struct rtl_priv;
+
+__printf(5, 6)
+void _rtl_dbg_trace(struct rtl_priv *rtlpriv, int comp, int level,
+   const char *modname, const char *fmt, ...);
+
  #define RT_TRACE(rtlpriv, comp, level, fmt, ...)  \
-do {   \
-   if (unlikely(((comp) & rtlpriv->dbg.global_debugcomponents) &&   \
-((level) <= rtlpriv->dbg.global_debuglevel))) {  \
-   printk(KERN_DEBUG KBUILD_MODNAME ":%s():<%lx-%x> " fmt, \
-  __func__, in_interrupt(), in_atomic(),   \
-  ##__VA_ARGS__);  \
-   }   \
-} while (0)
+   _rtl_dbg_trace(rtlpriv, comp, level,\
+  KBUILD_MODNAME, fmt, ##__VA_ARGS__)

  #define RTPRINT(rtlpriv, dbgtype, dbgflag, fmt, ...)  \
  do {  \

[PATCH net-next 6/6] net: dsa: Add support for platform data

2016-06-27 Thread Florian Fainelli

Allow drivers to use the new DSA API with platform data. Most of the
code in net/dsa/dsa2.c does not rely so much on device_nodes and can get
the same information from platform_data instead.

Signed-off-by: Florian Fainelli 
---
 include/linux/platform_data/dsa.h |  61 
 include/net/dsa.h |   7 ++
 net/dsa/Kconfig   |   1 +
 net/dsa/dsa.c |  43 +++
 net/dsa/dsa2.c| 147 +++---
 5 files changed, 218 insertions(+), 41 deletions(-)
 create mode 100644 include/linux/platform_data/dsa.h

diff --git a/include/linux/platform_data/dsa.h b/include/linux/platform_data/dsa.h
new file mode 100644
index 000000000000..72a91903a88f
--- /dev/null
+++ b/include/linux/platform_data/dsa.h
@@ -0,0 +1,61 @@
+#ifndef __DSA_PDATA_H
+#define __DSA_PDATA_H
+
+#include 
+#include 
+#include 
+#include 
+
+struct dsa2_port_link {
+   bool valid;
+   u32 index;
+   unsigned int port;
+};
+
+struct dsa2_port_data {
+   /*
+* Name of the ports, can be unique or a template (e.g: port%d)
+*/
+   const char *name;
+
+   /*
+* PHY interface
+*/
+   phy_interface_t phy_iface;
+
+   /*
+* Fixed PHY status information, if needed by the port (e.g: CPU port)
+*/
+   struct fixed_phy_status fixed_phy_status;
+   int link_gpio;
+
+   /*
+* Links to other switches in the tree
+*/
+   struct dsa2_port_link   links[DSA_MAX_SWITCHES];
+};
+
+struct dsa2_platform_data {
+   /*
+* Reference to a Linux network interface that connects
+* to this switch chip.
+*/
+   struct device   *netdev;
+
+   /*
+* Tree number
+*/
+   u32 tree;
+
+   /*
+* Switch chip index within the tree
+*/
+   u32 index;
+
+   /*
+* Ports layout and description
+*/
+   struct dsa2_port_data ports[DSA_MAX_PORTS];
+};
+
+#endif /* __DSA_PDATA_H */
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 48dce1fd100a..5e686640fd8e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -83,6 +83,7 @@ struct dsa_platform_data {
 };
 
 struct packet_type;
+struct dsa2_platform_data;
 
 struct dsa_switch_tree {
struct list_head list;
@@ -101,6 +102,7 @@ struct dsa_switch_tree {
 * this dsa switch tree instance.
 */
struct dsa_platform_data*pd;
+   struct dsa2_platform_data   *pd2;
 
/*
 * Reference to network device to use, and which tagging
@@ -136,9 +138,14 @@ struct dsa_switch_tree {
const struct dsa_device_ops *tag_ops;
 };
 
+struct dsa2_port_data;
+
 struct dsa_port {
+   const char  *name;
struct net_device   *netdev;
+   struct phy_device   *phydev;
struct device_node  *dn;
+   struct dsa2_port_data   *data;
 };
 
 struct dsa_switch {
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index ff7736f7ff42..a152bafedce5 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -8,6 +8,7 @@ config NET_DSA
tristate "Distributed Switch Architecture"
depends on HAVE_NET_DSA && NET_SWITCHDEV
select PHYLIB
+   select FIXED_PHY
---help---
  Say Y if you want to enable support for the hardware switches supported
  by the Distributed Switch Architecture.
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 9b5ff9814b5e..6a84067a5b18 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dsa_priv.h"
 
 char dsa_driver_version[] = "0.1";
@@ -211,10 +212,14 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
  struct dsa_port *dport, int port)
 {
struct device_node *port_dn = dport->dn;
+   struct dsa2_port_data *pdata = dport->data;
struct phy_device *phydev;
int ret, mode;
 
-   if (of_phy_is_fixed_link(port_dn)) {
+   if (!port_dn && !pdata)
+   return 0;
+
+   if (port_dn && of_phy_is_fixed_link(port_dn)) {
ret = of_phy_register_fixed_link(port_dn);
if (ret) {
dev_err(dev, "failed to register fixed PHY\n");
@@ -225,13 +230,25 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
mode = of_get_phy_mode(port_dn);
if (mode < 0)
mode = PHY_INTERFACE_MODE_NA;
-   phydev->interface = mode;
+   } else if (pdata->fixed_phy_status.speed != 0) {
+   phydev = fixed_phy_register(PHY_POLL, &pdata->fixed_phy_status,
+   pdata->link_gpio,
+   NULL);
+   if (IS_ERR(phydev)) {
+   dev_err(dev, "failed to register fixed PHY\n");
+

[PATCH net-next 5/6] net: dsa: Export dev_to_net_device()

2016-06-27 Thread Florian Fainelli

We are going to need this in net/dsa/dsa2.c as well, so make it
available.

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h | 1 +
 net/dsa/dsa.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 6de162c8283e..48dce1fd100a 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -369,6 +369,7 @@ struct dsa_switch_driver {
 void register_switch_driver(struct dsa_switch_driver *type);
 void unregister_switch_driver(struct dsa_switch_driver *type);
 struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev);
+struct net_device *dev_to_net_device(struct device *dev);
 
 static inline void *ds_to_priv(struct dsa_switch *ds)
 {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d117580a78b6..9b5ff9814b5e 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -626,7 +626,7 @@ struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dsa_host_dev_to_mii_bus);
 
-static struct net_device *dev_to_net_device(struct device *dev)
+struct net_device *dev_to_net_device(struct device *dev)
 {
struct device *d;
 
@@ -643,6 +643,7 @@ static struct net_device *dev_to_net_device(struct device *dev)
 
return NULL;
 }
+EXPORT_SYMBOL_GPL(dev_to_net_device);
 
 #ifdef CONFIG_OF
 static int dsa_of_setup_routing_table(struct dsa_platform_data *pd,
-- 
2.7.4

[PATCH net-next 1/6] net: dsa: Pass device pointer to dsa_register_switch

2016-06-27 Thread Florian Fainelli

In preparation for allowing dsa_register_switch() to be supplied with
device/platform data, pass down a struct device pointer instead of a
struct device_node.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 2 +-
 drivers/net/dsa/mv88e6xxx/chip.c | 7 +++
 include/net/dsa.h| 2 +-
 net/dsa/dsa2.c   | 7 ---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 444de7b9..e5799a68cfc8 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1778,7 +1778,7 @@ int b53_switch_register(struct b53_device *dev)
 
pr_info("found switch: %s, rev %i\n", dev->name, dev->core_rev);
 
-   return dsa_register_switch(dev->ds, dev->ds->dev->of_node);
+   return dsa_register_switch(dev->ds, dev->ds->dev);
 }
 EXPORT_SYMBOL(b53_switch_register);
 
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5cb06f7673af..11617c04cd33 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3848,8 +3848,7 @@ static struct dsa_switch_driver mv88e6xxx_switch_driver = {
.port_fdb_dump  = mv88e6xxx_port_fdb_dump,
 };
 
-static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip,
-struct device_node *np)
+static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
 {
struct device *dev = chip->dev;
struct dsa_switch *ds;
@@ -3864,7 +3863,7 @@ static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip,
 
dev_set_drvdata(dev, ds);
 
-   return dsa_register_switch(ds, np);
+   return dsa_register_switch(ds, dev);
 }
 
 static void mv88e6xxx_unregister_switch(struct mv88e6xxx_chip *chip)
@@ -3911,7 +3910,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (err)
return err;
 
-   err = mv88e6xxx_register_switch(chip, np);
+   err = mv88e6xxx_register_switch(chip);
if (err) {
mv88e6xxx_mdio_unregister(chip);
return err;
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 20b3087ad193..6de162c8283e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -381,5 +381,5 @@ static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst)
 }
 
 void dsa_unregister_switch(struct dsa_switch *ds);
-int dsa_register_switch(struct dsa_switch *ds, struct device_node *np);
+int dsa_register_switch(struct dsa_switch *ds, struct device *dev);
 #endif
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 83b95fc4cede..0940a0ec83e6 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -590,8 +590,9 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds,
return ports;
 }
 
-static int _dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
+static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
+   struct device_node *np = dev->of_node;
struct device_node *ports = dsa_get_ports(ds, np);
struct dsa_switch_tree *dst;
u32 tree, index;
@@ -660,12 +661,12 @@ out:
return err;
 }
 
-int dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
+int dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
int err;
 
mutex_lock(&dsa2_mutex);
-   err = _dsa_register_switch(ds, np);
+   err = _dsa_register_switch(ds, dev);
mutex_unlock(&dsa2_mutex);
 
return err;
-- 
2.7.4

Re: [RFC] tun: Support VIRTIO_NET_HDR_F_DATA_VALID in tun_get_user

2016-06-27 Thread Tom Herbert

On Fri, Jun 24, 2016 at 7:21 PM, Subash Abhinov Kasiviswanathan
 wrote:
> Userspace applications might sometimes process packets from hardware
> which has already validated checksum, perform trivial operations and
> then queue them back to the network stack. By not recomputing the
> checksum here, we can see significant improvement in performance.
>
> Sample application here is CLAT which does IPv6 to IPv4 translation.
> IPv6 packets for which checksum is validated in hardware are captured
> in CLAT and then translated to IPv4 and then queued back to network
> stack. In this case, it is expected that the application would not
> corrupt the packet and recomputing the checksum would be redundant.
>
> Pass the hint to kernel to skip checksum validation if
> VIRTIO_NET_HDR_F_DATA_VALID is set from userspace.
>
CHECKSUM_UNNECESSARY is not a hint, the interface is very specific. It
means that a checksum(s) has been verified to be correct. There is no
way here to validate that the userspace code is doing the right thing.
virtionet interface really should also pass checksum-complete values
to be up with the times, this would be more robust and harder to
silently get wrong.

Tom

> Signed-off-by: Subash Abhinov Kasiviswanathan 
> ---
>  drivers/net/tun.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e16487c..a5828a5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1263,6 +1263,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> }
> }
>
> +   if (gso.flags & VIRTIO_NET_HDR_F_DATA_VALID)
> +   skb->ip_summed = CHECKSUM_UNNECESSARY;
> +
> switch (tun->flags & TUN_TYPE_MASK) {
> case IFF_TUN:
> if (tun->flags & IFF_NO_PI) {
> --
> 1.9.1
>

Supporting C45 PHY without ID registers

2016-06-27 Thread Moritz Fischer

Hi all,

I have a 10GigE PHY that I'm working with that has most of its
functionality available via MDIO in a clause 45 compliant fashion;
however, the usual probe method fails since the ID registers
are not implemented.

I hacked up drivers/of/of_mdio.c to include something similar to of_get_phy_id()
for c45 phys but I was wondering if someone else has a better idea
than this in my dt:

ethernet_phy1: ethernet-phy@4 {
compatible = "ethernet-phy-id.",
"ethernet-phy-id4242.4242",
"ethernet-phy-id.",
"ethernet-phy-id4343.4343",
"ethernet-phy-ieee802.3-c45";
reg = <4>;
};

Where I made up 42424242 and 43434343 as ids for my PCS / PMA. Ideas?
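
For context, compatible strings of the form "ethernet-phy-idAAAA.BBBB" encode a 32-bit PHY ID as two 16-bit hex halves. A userspace sketch of that parsing step follows; it is modelled on my understanding of what of_get_phy_id() does for clause 22 PHYs, and the function name demo_parse_phy_id is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "ethernet-phy-idAAAA.BBBB" into a 32-bit ID.
 * Returns 0 on success, -1 if the string does not have that form.
 */
static int demo_parse_phy_id(const char *compat, uint32_t *phy_id)
{
	unsigned int upper, lower;

	if (sscanf(compat, "ethernet-phy-id%4x.%4x", &upper, &lower) != 2)
		return -1;

	/* Upper half goes in bits 31..16, lower half in bits 15..0. */
	*phy_id = ((upper & 0xffff) << 16) | (lower & 0xffff);
	return 0;
}
```

Fed "ethernet-phy-id4242.4242" this yields 0x42424242, while "ethernet-phy-ieee802.3-c45" is rejected, so a c45 variant of the lookup would need to walk the compatible list and collect one ID per matching entry.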

Cheers,

Moritz

Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Tom Herbert

On Mon, Jun 27, 2016 at 3:53 PM, Cong Wang  wrote:
> On Mon, Jun 27, 2016 at 3:04 PM, Tom Herbert  wrote:
>> On Mon, Jun 27, 2016 at 2:49 PM, Cong Wang  wrote:
>>> On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert  wrote:
>>>> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang  wrote:
>>>>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz  wrote:
>>>>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:
>>>>>>> The stack doesn't trust the complete csum by hardware
>>>>>>> even when it is correct.
>>>>>>
>>>>>> Can you explain that a little further?
>>>>>
>>>>> Sure, here is the code in __skb_checksum_complete():
>>>>>
>>>>> /* skb->csum holds pseudo checksum */
>>>>> sum = csum_fold(csum_add(skb->csum, csum));
>>>>> if (likely(!sum)) {
>>>>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>>> !skb->csum_complete_sw)
>>>>> netdev_rx_csum_fault(skb->dev);
>>>>> }
>>>>>
>>>>> So when sum == 0, it means the checksum is correct. And
>>>>> we already set ->ip_summed to CHECKSUM_COMPLETE
>>>>> after check_csum(), and ->csum_complete_sw is initialized
>>>>> to 0 when we allocate the skb. This is why we trigger
>>>>> netdev_rx_csum_fault().
>>>>>
>>>> Yes, but this also means that the driver gave the stack a checksum
>>>> complete value that was incorrect. That's an error.
>>>
>>> That is the whole purpose of commit f8c6455bb04b944edb69e,
>>> isn't it?
>>
>> No. Unless you've uncovered some other bug, what is probably happening
>> is that driver receives a packet with a checksum complete value. It
>> records the value in the skbuff and marks it as CHECKSUM_COMPLETE.
>> Subsequently, the stack tries to validate a transport layer checksum,
>> and the validation fails (checksum does not sum to zero). The stack
>> will then call __skb_checksum_complete from
>> __skb_checksum_validate_complete. In this case the stack computes that
>> transport checksum by hand and sees that transport checksum is valid--
>> so that means that the original value in checksum complete was not
>> correct, it is not set to the computed checksum of the whole packet.
>> This is an important error because it catches issues where checksum is
>> not correctly being pulled up.
>
> I see, the comments in mlx4 driver said:
>
> /* Although the stack expects checksum which doesn't include the pseudo
>  * header, the HW adds it. To address that, we are subtracting the pseudo
>  * header checksum from the checksum value provided by the HW.
>  */
>
> which seems to imply it calculates a correct checksum for the whole
> packet here, but the stack disagrees. Therefore skb->csum is still
> not what the stack expects.
>
Right, skb->csum is not what the stack expects. When it does the
computation over the same data it arrives at a different value than
what the driver sets. When this error pops, it means the checksum in
the packet is correct, but the driver or something in the stack messed
up skb->csum.

> Given that skb_checksum_simple_validate() always passes a null pseudo
> header, it looks like either the fix-up for the pseudo header is not needed
> at all for the ICMP case, OR we need to call skb_checksum_validate()
> for the ICMPv4 case. Hmm...

Pseudo header is not part of IPv4 checksum calculation so
skb_checksum_simple_validate is correct. Seems like a good chance
driver is doing fix-up wrong for ICMP.

Tom
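
The point about ICMPv4 can be checked with plain one's-complement arithmetic. The sketch below is userspace code, not the kernel's csum_partial()/csum_fold(); all demo_* names and the sample packet are made up for illustration. It sums a correctly checksummed ICMP echo message with no pseudo header and shows that the folded result is zero, while any leftover term (such as a pseudo-header contribution that a fix-up failed to remove) makes it nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to a running 32-bit sum. */
static uint32_t demo_csum_partial(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte with zero */
	return sum;
}

/* Fold the carries back into 16 bits and complement the result. */
static uint16_t demo_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * ICMP echo request: type 8, code 0, checksum 0x8468, id 0x1234, seq 1,
 * payload "ab". The checksum covers only the ICMP message itself.
 */
static const uint8_t demo_icmp_echo[] = {
	0x08, 0x00, 0x84, 0x68, 0x12, 0x34, 0x00, 0x01, 0x61, 0x62,
};
```

Starting the sum at zero models skb_checksum_simple_validate() with its null pseudo header; starting it at a nonzero value models a driver-supplied skb->csum that still carries something the stack did not expect.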

Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Cong Wang

On Mon, Jun 27, 2016 at 3:04 PM, Tom Herbert  wrote:
> On Mon, Jun 27, 2016 at 2:49 PM, Cong Wang  wrote:
>> On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert  wrote:
>>> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang  wrote:
>>>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz  wrote:
>>>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:
>>>>>> The stack doesn't trust the complete csum by hardware
>>>>>> even when it is correct.
>>>>>
>>>>> Can you explain that a little further?
>>>>
>>>> Sure, here is the code in __skb_checksum_complete():
>>>>
>>>> /* skb->csum holds pseudo checksum */
>>>> sum = csum_fold(csum_add(skb->csum, csum));
>>>> if (likely(!sum)) {
>>>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>> !skb->csum_complete_sw)
>>>> netdev_rx_csum_fault(skb->dev);
>>>> }
>>>>
>>>> So when sum == 0, it means the checksum is correct. And
>>>> we already set ->ip_summed to CHECKSUM_COMPLETE
>>>> after check_csum(), and ->csum_complete_sw is initialized
>>>> to 0 when we allocate the skb. This is why we trigger
>>>> netdev_rx_csum_fault().
>>>>
>>> Yes, but this also means that the driver gave the stack a checksum
>>> complete value that was incorrect. That's an error.
>>
>> That is the whole purpose of commit f8c6455bb04b944edb69e,
>> isn't it?
>
> No. Unless you've uncovered some other bug, what is probably happening
> is that driver receives a packet with a checksum complete value. It
> records the value in the skbuff and marks it as CHECKSUM_COMPLETE.
> Subsequently, the stack tries to validate a transport layer checksum,
> and the validation fails (checksum does not sum to zero). The stack
> will then call __skb_checksum_complete from
> __skb_checksum_validate_complete. In this case the stack computes that
> transport checksum by hand and sees that transport checksum is valid--
> so that means that the original value in checksum complete was not
> correct, it is not set to the computed checksum of the whole packet.
> This is an important error because it catches issues where checksum is
> not correctly being pulled up.

I see. The comment in the mlx4 driver says:

/* Although the stack expects checksum which doesn't include the pseudo
 * header, the HW adds it. To address that, we are subtracting the pseudo
 * header checksum from the checksum value provided by the HW.
 */

which seems to imply that it calculates a correct checksum for the whole
packet here, but the stack disagrees. Therefore skb->csum is still not
what the stack expects.

Given that skb_checksum_simple_validate() always passes a null pseudo
header, it looks like either the fix-up for the pseudo header is not needed
at all for the ICMP case, or we need to call skb_checksum_validate()
for the ICMPv4 case. Hmm...
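
The fix-up described in the driver comment quoted above is plain one's-complement arithmetic. Below is a minimal userspace sketch of it, not the mlx4 code: the helper names csum16_add(), csum16_sub() and csum16_buf() are made up for illustration, and the buffers are assumed to hold big-endian 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Subtracting a value is the same as adding its one's complement. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	return csum16_add(a, (uint16_t)~b);
}

/* Sum a buffer as big-endian 16-bit words; an odd trailing byte is
 * treated as the high byte of a zero-padded word.
 */
static uint16_t csum16_buf(const uint8_t *p, size_t len)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum = csum16_add(sum, (uint16_t)((p[i] << 8) | p[i + 1]));
	if (i < len)
		sum = csum16_add(sum, (uint16_t)(p[i] << 8));
	return sum;
}
```

If a device reports the sum of pseudo header plus payload, csum16_sub(hw, csum16_buf(pseudo, len)) gives back the payload-only sum. Whether that is the right value to hand to the stack for ICMPv4 is exactly the open question above, since no pseudo header takes part in the ICMPv4 checksum.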

Re: [PATCH v4 01/29] bluetooth: Switch SMP to crypto_cipher_encrypt_one()

2016-06-27 Thread Andy Lutomirski

On Mon, Jun 27, 2016 at 3:30 PM, Marcel Holtmann  wrote:
> Hi Ingo,
>
>>>> SMP does ECB crypto on stack buffers.  This is complicated and
>>>> fragile, and it will not work if the stack is virtually allocated.
>>>>
>>>> Switch to the crypto_cipher interface, which is simpler and safer.
>>>>
>>>> Cc: Marcel Holtmann 
>>>> Cc: Gustavo Padovan 
>>>> Cc: Johan Hedberg 
>>>> Cc: "David S. Miller" 
>>>> Cc: linux-blueto...@vger.kernel.org
>>>> Cc: netdev@vger.kernel.org
>>>> Acked-by: Herbert Xu 
>>>> Acked-and-tested-by: Johan Hedberg 
>>>> Signed-off-by: Andy Lutomirski 
>>>> ---
>>>> net/bluetooth/smp.c | 67 ++---
>>>> 1 file changed, 28 insertions(+), 39 deletions(-)
>>>
>>> patch has been applied to bluetooth-next tree.
>>
>> Sadly carrying this separately will delay the virtual kernel stacks feature
>> by a kernel cycle, because it's a must-have prerequisite.
>
> I can take it back out, but then I have the fear that the ECDH change to use
> KPP for SMP might be the one that has to wait a kernel cycle. Either way is
> fine with me, but I want to avoid nasty merge conflicts in the Bluetooth SMP
> code.

Nothing goes wrong if an identical patch is queued in both places,
right?  Or, if you prefer not to duplicate it, could one of you commit
it and the other one pull it?  Ingo, given that this is patch 1 in the
series and unlikely to change, if you want to make this whole thing
have a separate branch in -tip, this could live there for starters.
(But, if you do so, please make sure you base off a very new copy of
Linus' tree -- the series is heavily dependent on the thread_info
change he applied a few days ago.)

--Andy

[PATCH] tcp: add an ability to dump and restore window parameters

2016-06-27 Thread Andrey Vagin

We found that a restored TCP socket sometimes doesn't work.

The cause of this bug is incorrect window parameters: in this case
tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The
other side drops packets with this seq, because seq is less than
tp->rcv_nxt ( tcp_sequence() ).

Data from a send queue is sent only if there is enough space in the
window, so when we restore unacked data, we need to expand the window to
fit this data.

This was done in the first version of this patch:
"tcp: extend window to fit all restored unacked data in a send queue"

Then Alexey recommended that I restore the window parameters instead of
adjusting them according to the data in the send queue. This sounds
reasonable.

rcv_wnd has to be restored, because it was reported to the other side
and the offered window is never shrunk.
One of the reasons why we need to restore snd_wnd was described above.

Cc: Pavel Emelyanov 
Cc: "David S. Miller" 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
Signed-off-by: Andrey Vagin 
---
 include/uapi/linux/tcp.h | 10 +
 net/ipv4/tcp.c   | 57 
 2 files changed, 67 insertions(+)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 53e8e3f..482898f 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -115,12 +115,22 @@ enum {
 #define TCP_CC_INFO		26	/* Get Congestion Control (optional) info */
 #define TCP_SAVE_SYN		27	/* Record SYN headers for new connections */
 #define TCP_SAVED_SYN		28	/* Get SYN headers recorded for connection */
+#define TCP_REPAIR_WINDOW	29	/* Get/set window parameters */
 
 struct tcp_repair_opt {
__u32   opt_code;
__u32   opt_val;
 };
 
+struct tcp_repair_window {
+   __u32   snd_wl1;
+   __u32   snd_wnd;
+   __u32   max_window;
+
+   __u32   rcv_wnd;
+   __u32   rcv_wup;
+};
+
 enum {
TCP_NO_QUEUE,
TCP_RECV_QUEUE,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c7ed14..108ef2a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2277,6 +2277,38 @@ static inline bool tcp_can_repair_sock(const struct sock *sk)
 		((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_ESTABLISHED));
 }
 
+static int tcp_repair_set_window(struct tcp_sock *tp, char __user *optbuf, int len)
+{
+	struct tcp_repair_window opt;
+
+	if (!tp->repair)
+		return -EPERM;
+
+	if (len != sizeof(opt))
+		return -EINVAL;
+
+	if (copy_from_user(&opt, optbuf, sizeof(opt)))
+		return -EFAULT;
+
+	if (opt.max_window < opt.snd_wnd)
+		return -EINVAL;
+
+	if (after(opt.snd_wl1, tp->rcv_nxt + opt.rcv_wnd))
+		return -EINVAL;
+
+	if (after(opt.rcv_wup, tp->rcv_nxt))
+		return -EINVAL;
+
+	tp->snd_wl1	= opt.snd_wl1;
+	tp->snd_wnd	= opt.snd_wnd;
+	tp->max_window	= opt.max_window;
+
+	tp->rcv_wnd	= opt.rcv_wnd;
+	tp->rcv_wup	= opt.rcv_wup;
+
+	return 0;
+}
+
 static int tcp_repair_options_est(struct tcp_sock *tp,
struct tcp_repair_opt __user *optbuf, unsigned int len)
 {
@@ -2604,6 +2636,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
else
tp->tsoffset = val - tcp_time_stamp;
break;
+   case TCP_REPAIR_WINDOW:
+   err = tcp_repair_set_window(tp, optval, optlen);
+   break;
case TCP_NOTSENT_LOWAT:
tp->notsent_lowat = val;
sk->sk_write_space(sk);
@@ -2860,6 +2895,28 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
return -EINVAL;
break;
 
+	case TCP_REPAIR_WINDOW: {
+		struct tcp_repair_window opt;
+
+		if (get_user(len, optlen))
+			return -EFAULT;
+
+		if (len != sizeof(opt))
+			return -EINVAL;
+
+		if (!tp->repair)
+			return -EPERM;
+
+		opt.snd_wl1	= tp->snd_wl1;
+		opt.snd_wnd	= tp->snd_wnd;
+		opt.max_window	= tp->max_window;
+		opt.rcv_wnd	= tp->rcv_wnd;
+		opt.rcv_wup	= tp->rcv_wup;
+
+		if (copy_to_user(optval, &opt, len))
+			return -EFAULT;
+		return 0;
+	}
case TCP_QUEUE_SEQ:
if (tp->repair_queue == TCP_SEND_QUEUE)
val = tp->write_seq;
-- 
2.5.5
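
The three sanity checks in tcp_repair_set_window() can be modelled in userspace, which may help when preparing values for a restore. This is only a sketch under assumptions: struct repair_window mirrors the layout of the proposed struct tcp_repair_window, seq_after() restates the wrap-safe comparison that the kernel's after() performs, and the rcv_nxt argument stands in for tp->rcv_nxt. The names repair_window, seq_after and repair_window_ok are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same field order as the struct tcp_repair_window proposed above. */
struct repair_window {
	uint32_t snd_wl1;
	uint32_t snd_wnd;
	uint32_t max_window;
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
};

/* Wrap-safe "a is after b" for 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Returns true when the values would pass the checks that the patch
 * answers with -EINVAL; rcv_nxt plays the role of tp->rcv_nxt.
 */
static bool repair_window_ok(const struct repair_window *w, uint32_t rcv_nxt)
{
	if (w->max_window < w->snd_wnd)
		return false;
	if (seq_after(w->snd_wl1, rcv_nxt + w->rcv_wnd))
		return false;
	if (seq_after(w->rcv_wup, rcv_nxt))
		return false;
	return true;
}
```

The signed difference keeps the comparisons correct when sequence numbers wrap around 2^32, so a window that straddles the wrap point is still accepted.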

Re: [PATCH v4 01/29] bluetooth: Switch SMP to crypto_cipher_encrypt_one()

2016-06-27 Thread Marcel Holtmann

Hi Ingo,

>>> SMP does ECB crypto on stack buffers.  This is complicated and
>>> fragile, and it will not work if the stack is virtually allocated.
>>> 
>>> Switch to the crypto_cipher interface, which is simpler and safer.
>>> 
>>> Cc: Marcel Holtmann 
>>> Cc: Gustavo Padovan 
>>> Cc: Johan Hedberg 
>>> Cc: "David S. Miller" 
>>> Cc: linux-blueto...@vger.kernel.org
>>> Cc: netdev@vger.kernel.org
>>> Acked-by: Herbert Xu 
>>> Acked-and-tested-by: Johan Hedberg 
>>> Signed-off-by: Andy Lutomirski 
>>> ---
>>> net/bluetooth/smp.c | 67 
>>> ++---
>>> 1 file changed, 28 insertions(+), 39 deletions(-)
>> 
>> patch has been applied to bluetooth-next tree.
> 
> Sadly carrying this separately will delay the virtual kernel stacks feature
> by a kernel cycle, because it's a must-have prerequisite.

I can take it back out, but then I have the fear that the ECDH change to use KPP
for SMP might be the one that has to wait a kernel cycle. Either way is fine
with me, but I want to avoid nasty merge conflicts in the Bluetooth SMP code.

Regards

Marcel

[PATCH 2/2] net: ethernet: mvpp2: use phy_ethtool_{get|set}_link_ksettings

2016-06-27 Thread Philippe Reynes

There are two generic functions, phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/marvell/mvpp2.c |   22 ++
 1 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 18477fe..0b04717 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5813,24 +5813,6 @@ static int mvpp2_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 /* Ethtool methods */
 
-/* Get settings (phy address, speed) for ethtools */
-static int mvpp2_ethtool_get_settings(struct net_device *dev,
- struct ethtool_cmd *cmd)
-{
-   if (!dev->phydev)
-   return -ENODEV;
-   return phy_ethtool_gset(dev->phydev, cmd);
-}
-
-/* Set settings (phy address, speed) for ethtools */
-static int mvpp2_ethtool_set_settings(struct net_device *dev,
- struct ethtool_cmd *cmd)
-{
-   if (!dev->phydev)
-   return -ENODEV;
-   return phy_ethtool_sset(dev->phydev, cmd);
-}
-
 /* Set interrupt coalescing for ethtools */
 static int mvpp2_ethtool_set_coalesce(struct net_device *dev,
  struct ethtool_coalesce *c)
@@ -5965,13 +5947,13 @@ static const struct net_device_ops mvpp2_netdev_ops = {
 
 static const struct ethtool_ops mvpp2_eth_tool_ops = {
.get_link   = ethtool_op_get_link,
-   .get_settings   = mvpp2_ethtool_get_settings,
-   .set_settings   = mvpp2_ethtool_set_settings,
.set_coalesce   = mvpp2_ethtool_set_coalesce,
.get_coalesce   = mvpp2_ethtool_get_coalesce,
.get_drvinfo= mvpp2_ethtool_get_drvinfo,
.get_ringparam  = mvpp2_ethtool_get_ringparam,
.set_ringparam  = mvpp2_ethtool_set_ringparam,
+   .get_link_ksettings = phy_ethtool_get_link_ksettings,
+   .set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
 
 /* Driver initialization */
-- 
1.7.4.4

[PATCH 1/2] net: ethernet: mvpp2: use phydev from struct net_device

2016-06-27 Thread Philippe Reynes

The private structure contains a pointer to phydev, but the structure
net_device already contains such a pointer. So we can remove the phydev
pointer from the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/marvell/mvpp2.c |   34 --
 1 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 868a957..18477fe 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -699,7 +699,6 @@ struct mvpp2_port {
u16 rx_ring_size;
struct mvpp2_pcpu_stats __percpu *stats;
 
-   struct phy_device *phy_dev;
phy_interface_t phy_interface;
struct device_node *phy_node;
unsigned int link;
@@ -4850,7 +4849,7 @@ static irqreturn_t mvpp2_isr(int irq, void *dev_id)
 static void mvpp2_link_event(struct net_device *dev)
 {
struct mvpp2_port *port = netdev_priv(dev);
-   struct phy_device *phydev = port->phy_dev;
+   struct phy_device *phydev = dev->phydev;
int status_change = 0;
u32 val;
 
@@ -5416,6 +5415,8 @@ static int mvpp2_poll(struct napi_struct *napi, int budget)
 /* Set hw internals when starting port */
 static void mvpp2_start_dev(struct mvpp2_port *port)
 {
+   struct net_device *ndev = port->dev;
+
mvpp2_gmac_max_rx_size_set(port);
mvpp2_txp_max_tx_size_set(port);
 
@@ -5425,13 +5426,15 @@ static void mvpp2_start_dev(struct mvpp2_port *port)
mvpp2_interrupts_enable(port);
 
mvpp2_port_enable(port);
-   phy_start(port->phy_dev);
+   phy_start(ndev->phydev);
netif_tx_start_all_queues(port->dev);
 }
 
 /* Set hw internals when stopping port */
 static void mvpp2_stop_dev(struct mvpp2_port *port)
 {
+   struct net_device *ndev = port->dev;
+
/* Stop new packets from arriving to RXQs */
mvpp2_ingress_disable(port);
 
@@ -5447,7 +5450,7 @@ static void mvpp2_stop_dev(struct mvpp2_port *port)
 
mvpp2_egress_disable(port);
mvpp2_port_disable(port);
-   phy_stop(port->phy_dev);
+   phy_stop(ndev->phydev);
 }
 
 /* Return positive if MTU is valid */
@@ -5535,7 +5538,6 @@ static int mvpp2_phy_connect(struct mvpp2_port *port)
phy_dev->supported &= PHY_GBIT_FEATURES;
phy_dev->advertising = phy_dev->supported;
 
-   port->phy_dev = phy_dev;
port->link= 0;
port->duplex  = 0;
port->speed   = 0;
@@ -5545,8 +5547,9 @@ static int mvpp2_phy_connect(struct mvpp2_port *port)
 
 static void mvpp2_phy_disconnect(struct mvpp2_port *port)
 {
-   phy_disconnect(port->phy_dev);
-   port->phy_dev = NULL;
+   struct net_device *ndev = port->dev;
+
+   phy_disconnect(ndev->phydev);
 }
 
 static int mvpp2_open(struct net_device *dev)
@@ -5796,13 +5799,12 @@ mvpp2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 
 static int mvpp2_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 {
-   struct mvpp2_port *port = netdev_priv(dev);
int ret;
 
-   if (!port->phy_dev)
+   if (!dev->phydev)
return -ENOTSUPP;
 
-   ret = phy_mii_ioctl(port->phy_dev, ifr, cmd);
+   ret = phy_mii_ioctl(dev->phydev, ifr, cmd);
if (!ret)
mvpp2_link_event(dev);
 
@@ -5815,22 +5817,18 @@ static int mvpp2_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 static int mvpp2_ethtool_get_settings(struct net_device *dev,
  struct ethtool_cmd *cmd)
 {
-   struct mvpp2_port *port = netdev_priv(dev);
-
-   if (!port->phy_dev)
+   if (!dev->phydev)
return -ENODEV;
-   return phy_ethtool_gset(port->phy_dev, cmd);
+   return phy_ethtool_gset(dev->phydev, cmd);
 }
 
 /* Set settings (phy address, speed) for ethtools */
 static int mvpp2_ethtool_set_settings(struct net_device *dev,
  struct ethtool_cmd *cmd)
 {
-   struct mvpp2_port *port = netdev_priv(dev);
-
-   if (!port->phy_dev)
+   if (!dev->phydev)
return -ENODEV;
-   return phy_ethtool_sset(port->phy_dev, cmd);
+   return phy_ethtool_sset(dev->phydev, cmd);
 }
 
 /* Set interrupt coalescing for ethtools */
-- 
1.7.4.4

Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Tom Herbert

On Mon, Jun 27, 2016 at 2:49 PM, Cong Wang  wrote:
> On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert  wrote:
>> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang  wrote:
>>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz  wrote:
>>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:
>>>>> The stack doesn't trust the complete csum by hardware
>>>>> even when it is correct.
>>>>
>>>> Can you explain that a little further?
>>>
>>> Sure, here is the code in __skb_checksum_complete():
>>>
>>> /* skb->csum holds pseudo checksum */
>>> sum = csum_fold(csum_add(skb->csum, csum));
>>> if (likely(!sum)) {
>>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>> !skb->csum_complete_sw)
>>> netdev_rx_csum_fault(skb->dev);
>>> }
>>>
>>> So when sum == 0, it means the checksum is correct. And
>>> we already set ->ip_summed to CHECKSUM_COMPLETE
>>> after check_csum(), and ->csum_complete_sw is initialized
>>> to 0 when we allocate the skb. This is why we trigger
>>> netdev_rx_csum_fault().
>>>
>> Yes, but this also means that the driver gave the stack a checksum
>> complete value that was incorrect. That's an error.
>
> That is the whole purpose of commit f8c6455bb04b944edb69e,
> isn't it?

No. Unless you've uncovered some other bug, what is probably happening
is that driver receives a packet with a checksum complete value. It
records the value in the skbuff and marks it as CHECKSUM_COMPLETE.
Subsequently, the stack tries to validate a transport layer checksum,
and the validation fails (checksum does not sum to zero). The stack
will then call __skb_checksum_complete from
__skb_checksum_validate_complete. In this case the stack computes that
transport checksum by hand and sees that transport checksum is valid--
so that means that the original value in checksum complete was not
correct, it is not set to the computed checksum of the whole packet.
This is an important error because it catches issues where checksum is
not correctly being pulled up.

Tom
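
The "sums to zero" test mentioned above can be tried in userspace. The sketch below is not the kernel implementation: partial_sum() and fold_complement() are illustrative names, loosely modelled on what a partial checksum and csum_fold() do, and the packet is assumed to be a byte buffer in network order.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate big-endian 16-bit words into a 32-bit partial sum.
 * 'sum' carries in any value already computed (for example a value
 * reported by a device).
 */
static uint32_t partial_sum(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);
	if (i < len)
		sum += (uint32_t)(p[i] << 8);
	return sum;
}

/* Fold the partial sum to 16 bits and complement it.
 * A result of 0 means the data, checksum field included, verifies.
 */
static uint16_t fold_complement(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For an ICMP message whose checksum field is filled in correctly, fold_complement(partial_sum(msg, len, 0)) is 0. If the starting value carries something extra, such as a pseudo header sum that should not be there, the result is non-zero even though the packet itself is fine. That is the mismatch the stack reports as a fault once its own computation succeeds.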

Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Cong Wang

On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert  wrote:
> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang  wrote:
>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz  wrote:
>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:
>>>> The stack doesn't trust the complete csum by hardware
>>>> even when it is correct.
>>>
>>> Can you explain that a little further?
>>
>> Sure, here is the code in __skb_checksum_complete():
>>
>> /* skb->csum holds pseudo checksum */
>> sum = csum_fold(csum_add(skb->csum, csum));
>> if (likely(!sum)) {
>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>> !skb->csum_complete_sw)
>> netdev_rx_csum_fault(skb->dev);
>> }
>>
>> So when sum == 0, it means the checksum is correct. And
>> we already set ->ip_summed to CHECKSUM_COMPLETE
>> after check_csum(), and ->csum_complete_sw is initialized
>> to 0 when we allocate the skb. This is why we trigger
>> netdev_rx_csum_fault().
>>
> Yes, but this also means that the driver gave the stack a checksum
> complete value that was incorrect. That's an error.

That is the whole purpose of commit f8c6455bb04b944edb69e,
isn't it?

Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Tom Herbert

On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang  wrote:
> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz  wrote:
>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:
>>> The stack doesn't trust the complete csum by hardware
>>> even when it is correct.
>>
>> Can you explain that a little further?
>
> Sure, here is the code in __skb_checksum_complete():
>
> /* skb->csum holds pseudo checksum */
> sum = csum_fold(csum_add(skb->csum, csum));
> if (likely(!sum)) {
> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> !skb->csum_complete_sw)
> netdev_rx_csum_fault(skb->dev);
> }
>
> So when sum == 0, it means the checksum is correct. And
> we already set ->ip_summed to CHECKSUM_COMPLETE
> after check_csum(), and ->csum_complete_sw is initialized
> to 0 when we allocate the skb. This is why we trigger
> netdev_rx_csum_fault().
>
Yes, but this also means that the driver gave the stack a checksum
complete value that was incorrect. That's an error.

Tom

>
>>
>>> In the case we fix csum by ourself
>>> probably it is safe to just mark it as completed by software.
>>
>>> This should shut up a kernel warning from netdev_rx_csum_fault()
>>> with mlx4 driver for ICMP packets.
>>
>> can you point/paste the exact warning and how to reproduce that? is
>> that as simple as running ping and/or ping6?
>
> Yes, ping is enough to reproduce it every time.
>
> The warning is below:
>
> [ 8693.680997] eth0: hw csum failure
> [ 8693.681003] CPU: 5 PID: 34 Comm: ksoftirqd/5 Not tainted 4.1.20-t6.el5 #1
> [ 8693.681005] Hardware name: SYNNEX HYVE-ZEUS/X9DRD-iF, BIOS 3.0.4 12/06/2013
> [ 8693.681008]   88085c15fae8 81502872
> 881051397800
> [ 8693.681011]  881054c08b01 88085c15fb08 814569c5
> 
> [ 8693.681014]  8808572d7200 88085c15fb38 81450738
> 8808572d7200
> [ 8693.681017] Call Trace:
> [ 8693.681025]  [] dump_stack+0x4d/0x63
> [ 8693.681030]  [] netdev_rx_csum_fault+0x38/0x3c
> [ 8693.681033]  [] __skb_checksum_complete+0x6e/0xb6
> [ 8693.681036]  [] icmp_rcv+0x17a/0x32f
> [ 8693.681040]  [] ip_local_deliver_finish+0xd1/0x153
> [ 8693.681042]  [] ip_local_deliver+0x8d/0x94
> [ 8693.681045]  [] ? xfrm4_policy_check.constprop.6+0x55/0x55
> [ 8693.681048]  [] ip_rcv_finish+0x289/0x2cc
> [ 8693.681050]  [] ip_rcv+0x27d/0x30a
> [ 8693.681053]  [] __netif_receive_skb_core+0x3f2/0x483
> [ 8693.681056]  [] __netif_receive_skb+0x18/0x5a
> [ 8693.681058]  [] process_backlog+0x90/0x10c
> [ 8693.681061]  [] net_rx_action+0x101/0x2aa
> [ 8693.681066]  [] __do_softirq+0x10c/0x26d
> [ 8693.681068]  [] run_ksoftirqd+0x1a/0x2f
> [ 8693.681071]  [] smpboot_thread_fn+0x149/0x167
> [ 8693.681074]  [] ? sort_range+0x24/0x24
> [ 8693.681076]  [] ? sort_range+0x24/0x24
> [ 8693.681080]  [] kthread+0xae/0xb6
> [ 8693.681082]  [] ? add_sysfs_param.isra.4+0xe1/0x18c
> [ 8693.681085]  [] ? __kthread_parkme+0x61/0x61
> [ 8693.681088]  [] ret_from_fork+0x42/0x70
> [ 8693.681090]  [] ? __kthread_parkme+0x61/0x61

Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Cong Wang

On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz  wrote:
> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:
>> The stack doesn't trust the complete csum by hardware
>> even when it is correct.
>
> Can you explain that a little further?

Sure, here is the code in __skb_checksum_complete():

	/* skb->csum holds pseudo checksum */
	sum = csum_fold(csum_add(skb->csum, csum));
	if (likely(!sum)) {
		if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
		    !skb->csum_complete_sw)
			netdev_rx_csum_fault(skb->dev);
	}

So when sum == 0, it means the checksum is correct. And
we already set ->ip_summed to CHECKSUM_COMPLETE
after check_csum(), and ->csum_complete_sw is initialized
to 0 when we allocate the skb. This is why we trigger
netdev_rx_csum_fault().


>
>> In the case we fix csum by ourself
>> probably it is safe to just mark it as completed by software.
>
>> This should shut up a kernel warning from netdev_rx_csum_fault()
>> with mlx4 driver for ICMP packets.
>
> can you point/paste the exact warning and how to reproduce that? is
> that as simple as running ping and/or ping6?

Yes, ping is enough to reproduce it every time.

The warning is below:

[ 8693.680997] eth0: hw csum failure
[ 8693.681003] CPU: 5 PID: 34 Comm: ksoftirqd/5 Not tainted 4.1.20-t6.el5 #1
[ 8693.681005] Hardware name: SYNNEX HYVE-ZEUS/X9DRD-iF, BIOS 3.0.4 12/06/2013
[ 8693.681008]   88085c15fae8 81502872
881051397800
[ 8693.681011]  881054c08b01 88085c15fb08 814569c5

[ 8693.681014]  8808572d7200 88085c15fb38 81450738
8808572d7200
[ 8693.681017] Call Trace:
[ 8693.681025]  [] dump_stack+0x4d/0x63
[ 8693.681030]  [] netdev_rx_csum_fault+0x38/0x3c
[ 8693.681033]  [] __skb_checksum_complete+0x6e/0xb6
[ 8693.681036]  [] icmp_rcv+0x17a/0x32f
[ 8693.681040]  [] ip_local_deliver_finish+0xd1/0x153
[ 8693.681042]  [] ip_local_deliver+0x8d/0x94
[ 8693.681045]  [] ? xfrm4_policy_check.constprop.6+0x55/0x55
[ 8693.681048]  [] ip_rcv_finish+0x289/0x2cc
[ 8693.681050]  [] ip_rcv+0x27d/0x30a
[ 8693.681053]  [] __netif_receive_skb_core+0x3f2/0x483
[ 8693.681056]  [] __netif_receive_skb+0x18/0x5a
[ 8693.681058]  [] process_backlog+0x90/0x10c
[ 8693.681061]  [] net_rx_action+0x101/0x2aa
[ 8693.681066]  [] __do_softirq+0x10c/0x26d
[ 8693.681068]  [] run_ksoftirqd+0x1a/0x2f
[ 8693.681071]  [] smpboot_thread_fn+0x149/0x167
[ 8693.681074]  [] ? sort_range+0x24/0x24
[ 8693.681076]  [] ? sort_range+0x24/0x24
[ 8693.681080]  [] kthread+0xae/0xb6
[ 8693.681082]  [] ? add_sysfs_param.isra.4+0xe1/0x18c
[ 8693.681085]  [] ? __kthread_parkme+0x61/0x61
[ 8693.681088]  [] ret_from_fork+0x42/0x70
[ 8693.681090]  [] ? __kthread_parkme+0x61/0x61

Re: [PATCH v2 net-next] tcp: md5: use kmalloc() backed scratch areas

2016-06-27 Thread Eric Dumazet

On Mon, 2016-06-27 at 11:31 -0700, Cong Wang wrote:

> Not a problem of your patch, but it seems these allocations never
> get freed once we start using tcp md5. Maybe we should free them
> when the last socket using tcp md5 is gone?

If we constantly allocate-deallocate these tiny blocks for occasional
TCP MD5 use, it becomes quite expensive.

With the current code, only the first TCP MD5 usage triggers an extra setup cost.

Re: [PATCH v2 net-next] tcp: md5: use kmalloc() backed scratch areas

2016-06-27 Thread Eric Dumazet

On Mon, 2016-06-27 at 10:58 -0700, Andy Lutomirski wrote:

> Seems reasonable.
> 
> I wonder if it's worth switching from ahash to shash, though.  It
> would probably be simpler and faster.

Well, I have no opinion on this, I will let a crypto guy do this
change if he cares ;)

Thanks.

Re: [PATCH v2 iproute2 3/3] ss: Add support to filter on device

2016-06-27 Thread Stephen Hemminger

On Mon, 27 Jun 2016 11:34:25 -0700
David Ahern  wrote:

> + case SSF_DEVCOND:
> + {
> + struct aafilter *a = (void *)f->pred;

I don't like the wandering bracket left, but all the code has that.
After this will change it to:
	case SSF_DEVCOND: {
		struct aafilter *a = f->pred;
		...

Re: [iproute PATCH v3 0/6] Big C99 style initializer rework

2016-06-27 Thread Stephen Hemminger

On Mon, 27 Jun 2016 20:23:02 +0200
Phil Sutter  wrote:

> Hi,
> 
> On Mon, Jun 27, 2016 at 10:59:12AM -0700, Stephen Hemminger wrote:
> > On Thu, 23 Jun 2016 17:34:08 +
> > Phil Sutter  wrote:
> > 
> > > This is v3 of my C99-style initializer related patch series. The changes
> > > since v2 are:
> [...]
> > 
> > I like the idea and it makes code cleaner. But doing this introduces lots
> > of warnings and that is not acceptable.
> > ip
> > CC   ip.o
> > CC   ipaddress.o
> > ipaddress.c: In function ‘print_queuelen’:
> > ipaddress.c:175:10: warning: missing braces around initializer 
> > [-Wmissing-braces]
> >struct ifreq ifr = { 0 };
> >   ^
> 
> I saw these too with gcc-3.4.6 but not with 5.3.0. It appears to be a
> gcc bug[1]. One possible workaround is to match the brace level of the
> first field, but it's quite ugly: [2]. Another way might be to
> initialize one of the fields to zero, like so:
> 
> | struct ifreq ifr = { .ifr_qlen = 0 };
> 
> What do you think?
> 
> Thanks, Phil
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119
> [2] 
> http://nwl.cc/cgi-bin/git/gitweb.cgi?p=iproute2.git;a=commitdiff;h=a1cbf2b63c995b2f633c5b4699248ab308b201d2;hp=3809cfec65b03716d1d0360338126df4b4f3fbf6

I am using gcc on Debian stable which is 5.3.1.
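
For reference, the two workarounds discussed in the quoted text can be compared on a small stand-in type. This is a sketch only: struct demo_req is a hypothetical type with the same shape as struct ifreq (an array first, then a union), and make_designated() and make_braced() are illustrative names, not iproute2 code.

```c
/* Hypothetical stand-in for struct ifreq: an array, then a union. */
struct demo_req {
	char name[16];
	union {
		int qlen;
		char data[16];
	} u;
};

/* Designated initializer: names a scalar member, so there is no
 * brace-level mismatch; members not named are zero-initialized.
 */
static struct demo_req make_designated(void)
{
	struct demo_req r = { .u.qlen = 0 };

	return r;
}

/* Fully braced form: the inner braces match the nesting of the
 * first member, which is the array.
 */
static struct demo_req make_braced(void)
{
	struct demo_req r = { { 0 } };

	return r;
}
```

Both forms leave name[] and u.qlen zeroed. Writing { 0 } against a type shaped like this is the pattern that -Wmissing-braces reports in the output above, because the first member is itself an aggregate.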

Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Or Gerlitz

On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:
> The stack doesn't trust the complete csum by hardware
> even when it is correct.

Can you explain that a little further?

> In the case we fix csum by ourself
> probably it is safe to just mark it as completed by software.

> This should shut up a kernel warning from netdev_rx_csum_fault()
> with mlx4 driver for ICMP packets.

can you point/paste the exact warning and how to reproduce that? is
that as simple as running ping and/or ping6?

> Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM 
> COMPLETE')
> Cc: Shani Michaeli 
> Cc: Tariq Toukan 
> Cc: Yishai Hadas 
> Signed-off-by: Cong Wang 
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index c1b3a9c..b44c434 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -732,6 +732,7 @@ static int check_csum(struct mlx4_cqe *cqe, struct 
> sk_buff *skb, void *va,
> if (get_fixed_ipv6_csum(hw_checksum, skb, hdr))
> return -1;
>  #endif
> +   skb->csum_complete_sw = 1;
> return 0;
>  }
>
> --
> 2.1.0
>

[patch] qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()

2016-06-27 Thread Dan Carpenter

There is a static checker warning here "warn: mask and shift to zero"
and the code sets "ring" to zero every time.  From looking at how
QLCNIC_FETCH_RING_ID() is used in qlcnic_83xx_process_rcv_ring() the
qlcnic_83xx_hndl() should be removed.

Fixes: 4be41e92f7c6 ('qlcnic: 83xx data path routines')
Signed-off-by: Dan Carpenter 
---
Not tested.

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
index 7bd6f25..607bb7d 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
@@ -2220,7 +2220,7 @@ void qlcnic_83xx_process_rcv_ring_diag(struct qlcnic_host_sds_ring *sds_ring)
 	if (!opcode)
 		return;
 
-	ring = QLCNIC_FETCH_RING_ID(qlcnic_83xx_hndl(sts_data[0]));
+	ring = QLCNIC_FETCH_RING_ID(sts_data[0]);
 	qlcnic_83xx_process_rcv_diag(adapter, ring, sts_data);
 	desc = &sds_ring->desc_head[consumer];
 	desc->status_desc_data[0] = cpu_to_le64(STATUS_OWNER_PHANTOM);
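
The "mask and shift to zero" warning is easy to reproduce outside the driver. The macros below are illustrative assumptions, not copies of the qlcnic headers (their definitions are not quoted in this mail): DEMO_HNDL() keeps 15 bits of a 64-bit status word and DEMO_FETCH_RING_ID() takes its top bit.

```c
#include <stdint.h>

/* Illustrative macros only; the real ones live in the qlcnic headers. */
#define DEMO_HNDL(data)		(((data) >> 48) & 0x7FFF)
#define DEMO_FETCH_RING_ID(h)	((h) >> 63)

/* Masking to 15 bits first leaves nothing in bit 63, so this is always 0. */
static uint64_t ring_masked(uint64_t sts)
{
	return DEMO_FETCH_RING_ID(DEMO_HNDL(sts));
}

/* Taking the top bit of the raw status word keeps the information. */
static uint64_t ring_direct(uint64_t sts)
{
	return DEMO_FETCH_RING_ID(sts);
}
```

With these definitions ring_masked() returns 0 for every input, while ring_direct() returns 1 whenever bit 63 of the status word is set.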

Re: [PATCH net-next 03/16] net/mlx5: E-Switch, Add miss rule for offloads mode

2016-06-27 Thread Or Gerlitz

On Mon, Jun 27, 2016 at 7:53 PM, Sergei Shtylyov
 wrote:

>> +static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
>> +{
>> +   struct mlx5_flow_destination dest;
>> +   struct mlx5_flow_rule *flow_rule = NULL;
>> +   int match_header = 0;
>
>
>This variable doesn't appear necessary...

yep

>> +   dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
>> +   dest.vport_num = 0;
>> +
>> +   flow_rule = mlx5_add_flow_rule(esw->fdb_table.fdb, match_header,
>> match_c,
>
>
>Why not just pass 0 instead of 'match_header'?

Correct, will fix that.

Re: [PATCH] notifier: Fix soft lockup for notifier_call_chain().

2016-06-27 Thread Cong Wang

On Fri, Jun 24, 2016 at 7:46 PM, Ding Tianhong  wrote:
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index fd2c9ac..9c30411 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -92,6 +92,8 @@ static int notifier_call_chain(struct notifier_block **nl,
>  #endif
> ret = nb->notifier_call(nb, val, v);
>
> +   cond_resched();
> +
> if (nr_calls)
> (*nr_calls)++;

NAK.

You can't do a resched in atomic context in __atomic_notifier_call_chain().

[PATCH net] bpf, perf: delay release of BPF prog after grace period

2016-06-27 Thread Daniel Borkmann

Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved
destruction of BPF program from free_event_rcu() callback to __free_event(),
which is problematic if used with tail calls: if prog A is attached as
trace event directly, but at the same time present in a tail call map used
by another trace event program elsewhere, then we need to delay destruction
via RCU grace period since it can still be in use by the program doing the
tail call (the prog first needs to be dropped from the tail call map, then
trace event with prog A attached destroyed, so we get immediate destruction).

Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: Jann Horn 
---
 include/linux/bpf.h  | 4 
 kernel/events/core.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8269caf..0de4de6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -264,6 +264,10 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
+
+static inline void bpf_prog_put_rcu(struct bpf_prog *prog)
+{
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 /* verifier prototypes for helper functions called from eBPF programs */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 274450e..d00c47b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7531,7 +7531,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
prog = event->tp_event->prog;
if (prog) {
event->tp_event->prog = NULL;
-   bpf_prog_put(prog);
+   bpf_prog_put_rcu(prog);
}
 }
 
-- 
1.9.3

Re: [PATCH v4 00/19] CALIPSO Implementation

2016-06-27 Thread Paul Moore

On Thu, Jun 23, 2016 at 3:21 PM, David Miller  wrote:
> From: Huw Davies 
> Date: Tue, 21 Jun 2016 10:55:48 +0100
>
>> On Tue, Jun 21, 2016 at 05:39:28AM -0400, David Miller wrote:
>>> From: Huw Davies 
>>> Date: Mon, 20 Jun 2016 14:36:40 +0100
>>>
>>> > This patch series implements RFC 5570 - Common Architecture Label IPv6
>>> > Security Option (CALIPSO).  Its goal is to set MLS sensitivity labels
>>> > on IPv6 packets using a hop-by-hop option.  CALIPSO is very similar to
>>> > its IPv4 cousin CIPSO and much of this series is based on that code.
>>>
>>> What tree do you expect to integrate this?
>>
>> My understanding is that Paul Moore is happy to take them
>> in via the SELinux tree.  However, these patches do touch
>> some core networking code, such as the IPv6 option handling
>> code (in a similar manner to the way CIPSO touched the IPv4
>> option code), so if you have any comments on those aspects
>> that would be good to hear.
>
> No objections on my part.

Okay.

The changes between v3 and v4 were pretty trivial so I've gone ahead
and merged v4 into the SELinux next branch.  Huw, thanks very much for
all of this, I know it was a lot of work.

-- 
paul moore
www.paul-moore.com

RE: [PATCH net-next 00/13] liquidio: updates and bug fixes

2016-06-27 Thread Vatsavayi, Raghu

Thanks Dave. I will make sure there are no double sign-offs next time.
Raghu.

> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Saturday, June 25, 2016 9:09 AM
> To: Vatsavayi, Raghu
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 00/13] liquidio: updates and bug fixes
> 
> From: Raghu Vatsavayi 
> Date: Tue, 21 Jun 2016 22:53:02 -0700
> 
> > Please consider following patch series for liquidio bug fixes and
> > updates on top of net-next. Following patches should be applied in the
> > following order as some of them depend on earlier patches in the
> > series.
> 
> Series applied, thanks.
> 
> Please don't put double-signoffs on your patches in future submissions, that
> is completely unnecessary.

[PATCH v2 iproute2 0/6] Add support for vrf keyword

2016-06-27 Thread David Ahern

Currently the syntax for VRF related commands is rather kludgy and
inconsistent from one subcommand to another. This set adds support
for the VRF keyword to the link, address, neigh, and route commands
to improve the user experience listing data associated with vrfs,
modifying routes or doing a route lookup.

v2
- rebased to top of tree
- all checkpatch warnings are usage lines. The change in these
  patches is consistent with existing code for usage lines

David Ahern (6):
  ip vrf: Add name_is_vrf
  ip link/addr: Add support for vrf keyword
  ip neigh: Add support for keyword
  ip route: Change type mask to bitmask
  ip vrf: Add ipvrf_get_table
  ip route: Add support for vrf keyword

 ip/ip_common.h  |   3 ++
 ip/ipaddress.c  |  12 +-
 ip/iplink.c |  15 ++-
 ip/iplink_vrf.c | 119 
 ip/ipneigh.c|  14 ++-
 ip/iproute.c|  43 
 6 files changed, 195 insertions(+), 11 deletions(-)

-- 
2.1.4

[PATCH iproute2 4/6] ip route: Change type mask to bitmask

2016-06-27 Thread David Ahern

Allow option to select multiple route types to show or exclude
specific route types.
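
As a rough illustration of the bitmask idea (not the code in this patch), each
route type gets one bit and an empty mask means "no type filter". The helper
names type_bit and type_selected are made up for this sketch, and it uses 1ULL
for the shift where the diff uses a plain 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Route type values as defined in <linux/rtnetlink.h>. */
enum { RTN_UNSPEC, RTN_UNICAST, RTN_LOCAL, RTN_BROADCAST };

/* Bit for one route type; 1ULL keeps the shift 64 bits wide
 * (hypothetical helper, not part of the patch). */
static uint64_t type_bit(unsigned int type)
{
	assert(type < 64);
	return 1ULL << type;
}

/* An empty mask selects every type; otherwise the type's bit must be set. */
static bool type_selected(uint64_t typemask, unsigned int type)
{
	return !typemask || (typemask & type_bit(type)) != 0;
}
```

Excluding types is then just the complement of the bits to hide, for example
~(type_bit(RTN_LOCAL) | type_bit(RTN_BROADCAST)).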

Signed-off-by: David Ahern 
---
 ip/iproute.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 8224d7ffa94b..aae693d17be8 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -113,7 +113,7 @@ static struct
int flushe;
int protocol, protocolmask;
int scope, scopemask;
-   int type, typemask;
+   __u64 typemask;
int tos, tosmask;
int iif, iifmask;
int oif, oifmask;
@@ -178,7 +178,8 @@ static int filter_nlmsg(struct nlmsghdr *n, struct rtattr 
**tb, int host_len)
return 0;
if ((filter.scope^r->rtm_scope))
return 0;
-   if ((filter.type^r->rtm_type))
+
+   if (filter.typemask && !(filter.typemask & (1 << r->rtm_type)))
return 0;
if ((filter.tos^r->rtm_tos))
return 0;
@@ -365,7 +366,8 @@ int print_route(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 
if (n->nlmsg_type == RTM_DELROUTE)
fprintf(fp, "Deleted ");
-   if ((r->rtm_type != RTN_UNICAST || show_details > 0) && !filter.type)
+   if ((r->rtm_type != RTN_UNICAST || show_details > 0) &&
+   (!filter.typemask || (filter.typemask & (1 << r->rtm_type
fprintf(fp, "%s ", rtnl_rtntype_n2a(r->rtm_type, b1, 
sizeof(b1)));
 
if (tb[RTA_DST]) {
@@ -1433,10 +1435,9 @@ static int iproute_list_flush_or_save(int argc, char 
**argv, int action)
int type;
 
NEXT_ARG();
-   filter.typemask = -1;
if (rtnl_rtntype_a2n(, *argv))
invarg("node type value is invalid\n", *argv);
-   filter.type = type;
+   filter.typemask = (1<

[PATCH iproute2 5/6] ip vrf: Add ipvrf_get_table

2016-06-27 Thread David Ahern

Add ipvrf_get_table to lookup table id for device name. Returns 0
on any error or if name is not a VRF device.

Signed-off-by: David Ahern 
---
 ip/ip_common.h  |  1 +
 ip/iplink_vrf.c | 66 +
 2 files changed, 67 insertions(+)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 410eb135774a..8fdb7219fc2b 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -90,6 +90,7 @@ struct link_util *get_link_slave_kind(const char *slave_kind);
 
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
 
+__u32 ipvrf_get_table(char *name);
 bool name_is_vrf(char *name);
 
 #ifndef INFINITY_LIFE_TIME
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
index abd43c08423e..2eecb4564f7e 100644
--- a/ip/iplink_vrf.c
+++ b/ip/iplink_vrf.c
@@ -97,6 +97,72 @@ struct link_util vrf_slave_link_util = {
.slave  = true,
 };
 
+/* returns table id if name is a VRF device */
+__u32 ipvrf_get_table(char *name)
+{
+   struct {
+   struct nlmsghdr n;
+   struct ifinfomsgi;
+   charbuf[1024];
+   } req = {
+   .n = {
+   .nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+   .nlmsg_flags = NLM_F_REQUEST,
+   .nlmsg_type  = RTM_GETLINK,
+   },
+   .i = {
+   .ifi_family  = preferred_family,
+   },
+   };
+   struct {
+   struct nlmsghdr n;
+   char buf[8192];
+   } answer;
+   struct rtattr *tb[IFLA_MAX+1];
+   struct rtattr *li[IFLA_INFO_MAX+1];
+   struct rtattr *vrf_attr[IFLA_VRF_MAX + 1];
+   struct ifinfomsg *ifi;
+   __u32 tb_id = 0;
+   int len;
+
+   addattr_l(, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
+
+   if (rtnl_talk(, , , sizeof(answer)) < 0)
+   goto err;
+
+   ifi = NLMSG_DATA();
+   len = answer.n.nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
+   if (len < 0) {
+   fprintf(stderr, "BUG: Invalid response to link query.\n");
+   goto err;
+   }
+
+   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+
+   if (!tb[IFLA_LINKINFO])
+   goto err;
+
+   parse_rtattr_nested(li, IFLA_INFO_MAX, tb[IFLA_LINKINFO]);
+
+   if (!li[IFLA_INFO_KIND] || !li[IFLA_INFO_DATA])
+   goto err;
+
+   if (strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf"))
+   goto err;
+
+   parse_rtattr_nested(vrf_attr, IFLA_VRF_MAX, li[IFLA_INFO_DATA]);
+   if (vrf_attr[IFLA_VRF_TABLE])
+   tb_id = rta_getattr_u32(vrf_attr[IFLA_VRF_TABLE]);
+
+   if (!tb_id)
+   fprintf(stderr, "BUG: VRF %s is missing table id\n", name);
+
+   return tb_id;
+
+err:
+   return 0;
+}
+
 bool name_is_vrf(char *name)
 {
struct {
-- 
2.1.4

[PATCH iproute2 6/6] ip route: Add support for vrf keyword

2016-06-27 Thread David Ahern

Add vrf keyword to 'ip route' commands. Allows:
1. Users can list routes by VRF name:
   $ ip route show vrf NAME

   VRF tables have all routes including local and broadcast routes.
   The VRF keyword filters LOCAL and BROADCAST routes; to see all
   routes the table option can be used. Or to see local routes only
   for a VRF:
   $ ip route show vrf NAME type local

2. Add or delete a route for a VRF:
   $ ip route {add|delete} vrf NAME 

3. Do a route lookup for a VRF:
   $ ip route get vrf NAME ADDRESS
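
For readers following the add/delete case: the diff puts table ids below 256
in the 8-bit rtm_table header field and sends larger ids in an RTA_TABLE
attribute instead. A standalone sketch of that split, where struct table_ref
and encode_table are hypothetical stand-ins for the real netlink request:

```c
#include <stdint.h>

#define RT_TABLE_UNSPEC 0	/* value from <linux/rtnetlink.h> */

/* Hypothetical stand-in for the request: the 8-bit header field plus
 * the 32-bit value that would travel in an RTA_TABLE attribute. */
struct table_ref {
	uint8_t  rtm_table;
	int      has_attr;	/* non-zero if RTA_TABLE would be added */
	uint32_t attr_table;
};

/* Small ids fit in the header; larger ones need the attribute. */
static struct table_ref encode_table(uint32_t tid)
{
	struct table_ref ref = { .rtm_table = RT_TABLE_UNSPEC };

	if (tid < 256) {
		ref.rtm_table = (uint8_t)tid;
	} else {
		ref.has_attr = 1;
		ref.attr_table = tid;
	}
	return ref;
}
```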

Signed-off-by: David Ahern 
---
 ip/iproute.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index aae693d17be8..bd661c16cb46 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -67,10 +67,10 @@ static void usage(void)
fprintf(stderr, "   ip route showdump\n");
fprintf(stderr, "   ip route get ADDRESS [ from ADDRESS iif STRING 
]\n");
fprintf(stderr, "[ oif STRING ] [ tos TOS 
]\n");
-   fprintf(stderr, "[ mark NUMBER ]\n");
+   fprintf(stderr, "[ mark NUMBER ] [ vrf NAME 
]\n");
fprintf(stderr, "   ip route { add | del | change | append | 
replace } ROUTE\n");
fprintf(stderr, "SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact 
PREFIX ]\n");
-   fprintf(stderr, "[ table TABLE_ID ] [ proto RTPROTO ]\n");
+   fprintf(stderr, "[ table TABLE_ID ] [ vrf NAME ] [ proto 
RTPROTO ]\n");
fprintf(stderr, "[ type TYPE ] [ scope SCOPE ]\n");
fprintf(stderr, "ROUTE := NODE_SPEC [ INFO_SPEC ]\n");
fprintf(stderr, "NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]\n");
@@ -1141,6 +1141,20 @@ static int iproute_modify(int cmd, unsigned int flags, 
int argc, char **argv)
addattr32(, sizeof(req), RTA_TABLE, tid);
}
table_ok = 1;
+   } else if (matches(*argv, "vrf") == 0) {
+   __u32 tid;
+
+   NEXT_ARG();
+   tid = ipvrf_get_table(*argv);
+   if (tid == 0)
+   invarg("Invalid VRF\n", *argv);
+   if (tid < 256)
+   req.r.rtm_table = tid;
+   else {
+   req.r.rtm_table = RT_TABLE_UNSPEC;
+   addattr32(, sizeof(req), RTA_TABLE, tid);
+   }
+   table_ok = 1;
} else if (strcmp(*argv, "dev") == 0 ||
   strcmp(*argv, "oif") == 0) {
NEXT_ARG();
@@ -1395,6 +1409,15 @@ static int iproute_list_flush_or_save(int argc, char 
**argv, int action)
}
} else
filter.tb = tid;
+   } else if (matches(*argv, "vrf") == 0) {
+   __u32 tid;
+
+   NEXT_ARG();
+   tid = ipvrf_get_table(*argv);
+   if (tid == 0)
+   invarg("Invalid VRF\n", *argv);
+   filter.tb = tid;
+   filter.typemask = ~(1 << RTN_LOCAL | 1<

[PATCH iproute2 1/6] ip vrf: Add name_is_vrf

2016-06-27 Thread David Ahern

Add name_is_vrf function to determine if given name corresponds to a
VRF device.

Signed-off-by: David Ahern 
---
 ip/ip_common.h  |  2 ++
 ip/iplink_vrf.c | 53 +
 2 files changed, 55 insertions(+)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index e8da9e034b15..410eb135774a 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -90,6 +90,8 @@ struct link_util *get_link_slave_kind(const char *slave_kind);
 
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
 
+bool name_is_vrf(char *name);
+
 #ifndef INFINITY_LIFE_TIME
 #define INFINITY_LIFE_TIME  0xU
 #endif
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
index e3c7b4652da5..abd43c08423e 100644
--- a/ip/iplink_vrf.c
+++ b/ip/iplink_vrf.c
@@ -96,3 +96,56 @@ struct link_util vrf_slave_link_util = {
.print_opt  = vrf_slave_print_opt,
.slave  = true,
 };
+
+bool name_is_vrf(char *name)
+{
+   struct {
+   struct nlmsghdr n;
+   struct ifinfomsgi;
+   charbuf[1024];
+   } req = {
+   .n = {
+   .nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+   .nlmsg_flags = NLM_F_REQUEST,
+   .nlmsg_type  = RTM_GETLINK,
+   },
+   .i = {
+   .ifi_family  = preferred_family,
+   },
+   };
+   struct {
+   struct nlmsghdr n;
+   char buf[8192];
+   } answer;
+   struct rtattr *tb[IFLA_MAX+1];
+   struct rtattr *li[IFLA_INFO_MAX+1];
+   struct ifinfomsg *ifi;
+   int len;
+
+   addattr_l(, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
+
+   if (rtnl_talk(, , , sizeof(answer)) < 0)
+   goto err;
+
+   ifi = NLMSG_DATA();
+   len = answer.n.nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
+   if (len < 0) {
+   fprintf(stderr, "BUG: Invalid response to link query.\n");
+   goto err;
+   }
+
+   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+
+   if (!tb[IFLA_LINKINFO])
+   goto err;
+
+   parse_rtattr_nested(li, IFLA_INFO_MAX, tb[IFLA_LINKINFO]);
+
+   if (!li[IFLA_INFO_KIND])
+   goto err;
+
+   return strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf") == 0;
+
+err:
+   return false;
+}
-- 
2.1.4

[PATCH v2 iproute2 2/6] ip link/addr: Add support for vrf keyword

2016-06-27 Thread David Ahern

Add vrf keyword to 'ip link' and 'ip addr' commands (common list code).

Allows:
1. Adding a link to a VRF
   $ ip link set NAME vrf NAME

   Removing a link from a VRF still uses 'ip link set NAME nomaster'

2. Showing links associated with a VRF:
   $ ip link show vrf NAME

3. List addresses associated with links in a VRF
   $ ip -br addr show vrf red

Signed-off-by: David Ahern 
---
 ip/ipaddress.c | 12 +++-
 ip/iplink.c| 15 +--
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 8766530f7fa7..03688d422dcc 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -79,7 +79,7 @@ static void usage(void)
fprintf(stderr, "[ to PREFIX ] [ FLAG-LIST 
] [ label LABEL ] [up]\n");
fprintf(stderr, "   ip address [ show [ dev IFNAME ] [ scope 
SCOPE-ID ] [ master DEVICE ]\n");
fprintf(stderr, " [ type TYPE ] [ to PREFIX ] [ 
FLAG-LIST ]\n");
-   fprintf(stderr, " [ label LABEL ] [up] ]\n");
+   fprintf(stderr, " [ label LABEL ] [up] [ vrf 
NAME ] ]\n");
fprintf(stderr, "   ip address {showdump|restore}\n");
fprintf(stderr, "IFADDR := PREFIX | ADDR peer PREFIX\n");
fprintf(stderr, "  [ broadcast ADDR ] [ anycast ADDR ]\n");
@@ -1620,6 +1620,16 @@ static int ipaddr_list_flush_or_save(int argc, char 
**argv, int action)
if (!ifindex)
invarg("Device does not exist\n", *argv);
filter.master = ifindex;
+   } else if (strcmp(*argv, "vrf") == 0) {
+   int ifindex;
+
+   NEXT_ARG();
+   ifindex = ll_name_to_index(*argv);
+   if (!ifindex)
+   invarg("Not a valid VRF name\n", *argv);
+   if (!name_is_vrf(*argv))
+   invarg("Not a valid VRF name\n", *argv);
+   filter.master = ifindex;
} else if (strcmp(*argv, "type") == 0) {
NEXT_ARG();
filter.kind = *argv;
diff --git a/ip/iplink.c b/ip/iplink.c
index b1f8a37922f5..f2a2e13cf0c5 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -82,11 +82,11 @@ void iplink_usage(void)
fprintf(stderr, "  [ query_rss { on | 
off} ]\n");
fprintf(stderr, "  [ state { auto | 
enable | disable} ] ]\n");
fprintf(stderr, "  [ trust { on | off} 
] ]\n");
-   fprintf(stderr, " [ master DEVICE ]\n");
+   fprintf(stderr, " [ master DEVICE ][ vrf NAME 
]\n");
fprintf(stderr, " [ nomaster ]\n");
fprintf(stderr, " [ addrgenmode { eui64 | none 
| stable_secret | random } ]\n");
fprintf(stderr, " [ protodown { on | off } 
]\n");
-   fprintf(stderr, "   ip link show [ DEVICE | group GROUP ] [up] 
[master DEV] [type TYPE]\n");
+   fprintf(stderr, "   ip link show [ DEVICE | group GROUP ] [up] 
[master DEV] [vrf NAME] [type TYPE]\n");
 
if (iplink_have_newlink()) {
fprintf(stderr, "   ip link help [ TYPE ]\n");
@@ -603,6 +603,17 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
invarg("Device does not exist\n", *argv);
addattr_l(>n, sizeof(*req), IFLA_MASTER,
  , 4);
+   } else if (strcmp(*argv, "vrf") == 0) {
+   int ifindex;
+
+   NEXT_ARG();
+   ifindex = ll_name_to_index(*argv);
+   if (!ifindex)
+   invarg("Not a valid VRF name\n", *argv);
+   if (!name_is_vrf(*argv))
+   invarg("Not a valid VRF name\n", *argv);
+   addattr_l(>n, sizeof(*req), IFLA_MASTER,
+ , sizeof(ifindex));
} else if (matches(*argv, "nomaster") == 0) {
int ifindex = 0;
 
-- 
2.1.4

[PATCH iproute2 3/6] ip neigh: Add support for keyword

2016-06-27 Thread David Ahern

Add vrf keyword to 'ip neigh' commands. Allows listing neighbor
entries for all links associated with a given VRF.

Signed-off-by: David Ahern 
---
 ip/ipneigh.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 4ddb747e2086..3e444712645f 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -48,7 +48,8 @@ static void usage(void)
 {
fprintf(stderr, "Usage: ip neigh { add | del | change | replace }\n"
"{ ADDR [ lladdr LLADDR ] [ nud STATE ] 
| proxy ADDR } [ dev DEV ]\n");
-   fprintf(stderr, "   ip neigh { show | flush } [ proxy ] [ to PREFIX 
] [ dev DEV ] [ nud STATE ]\n\n");
+   fprintf(stderr, "   ip neigh { show | flush } [ proxy ] [ to PREFIX 
] [ dev DEV ] [ nud STATE ]\n");
+   fprintf(stderr, " [ vrf NAME ]\n\n");
fprintf(stderr, "STATE := { permanent | noarp | stale | reachable | 
none |\n"
"   incomplete | delay | probe | failed }\n");
exit(-1);
@@ -385,6 +386,17 @@ static int do_show_or_flush(int argc, char **argv, int 
flush)
invarg("Device does not exist\n", *argv);
addattr32(, sizeof(req), NDA_MASTER, ifindex);
filter.master = ifindex;
+   } else if (strcmp(*argv, "vrf") == 0) {
+   int ifindex;
+
+   NEXT_ARG();
+   ifindex = ll_name_to_index(*argv);
+   if (!ifindex)
+   invarg("Not a valid VRF name\n", *argv);
+   if (!name_is_vrf(*argv))
+   invarg("Not a valid VRF name\n", *argv);
+   addattr32(, sizeof(req), NDA_MASTER, ifindex);
+   filter.master = ifindex;
} else if (strcmp(*argv, "unused") == 0) {
filter.unused_only = 1;
} else if (strcmp(*argv, "nud") == 0) {
-- 
2.1.4

[PATCH v2 iproute2 1/3] ss: Refactor inet_show_sock

2016-06-27 Thread David Ahern

Extract parsing of sockstat and filter from inet_show_sock.
While moving run_ssfilter into the callers of inet_show_sock, enable
userspace filtering before the kill.

Signed-off-by: David Ahern 
---
 misc/ss.c | 75 ---
 1 file changed, 48 insertions(+), 27 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 02be7e7407df..f164ca920308 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2038,42 +2038,48 @@ static void tcp_show_info(const struct nlmsghdr *nlh, 
struct inet_diag_msg *r,
}
 }
 
-static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
+static void parse_diag_msg(struct nlmsghdr *nlh, struct sockstat *s)
 {
struct rtattr *tb[INET_DIAG_MAX+1];
struct inet_diag_msg *r = NLMSG_DATA(nlh);
-   struct sockstat s = {};
 
parse_rtattr(tb, INET_DIAG_MAX, (struct rtattr *)(r+1),
 nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
 
-   s.state = r->idiag_state;
-   s.local.family  = s.remote.family = r->idiag_family;
-   s.lport = ntohs(r->id.idiag_sport);
-   s.rport = ntohs(r->id.idiag_dport);
-   s.wq= r->idiag_wqueue;
-   s.rq= r->idiag_rqueue;
-   s.ino   = r->idiag_inode;
-   s.uid   = r->idiag_uid;
-   s.iface = r->id.idiag_if;
-   s.sk= cookie_sk_get(>id.idiag_cookie[0]);
-
-   if (s.local.family == AF_INET) {
-   s.local.bytelen = s.remote.bytelen = 4;
-   } else {
-   s.local.bytelen = s.remote.bytelen = 16;
-   }
+   s->state= r->idiag_state;
+   s->local.family = s->remote.family = r->idiag_family;
+   s->lport= ntohs(r->id.idiag_sport);
+   s->rport= ntohs(r->id.idiag_dport);
+   s->wq   = r->idiag_wqueue;
+   s->rq   = r->idiag_rqueue;
+   s->ino  = r->idiag_inode;
+   s->uid  = r->idiag_uid;
+   s->iface= r->id.idiag_if;
+   s->sk   = cookie_sk_get(>id.idiag_cookie[0]);
+
+   if (s->local.family == AF_INET)
+   s->local.bytelen = s->remote.bytelen = 4;
+   else
+   s->local.bytelen = s->remote.bytelen = 16;
 
-   memcpy(s.local.data, r->id.idiag_src, s.local.bytelen);
-   memcpy(s.remote.data, r->id.idiag_dst, s.local.bytelen);
+   memcpy(s->local.data, r->id.idiag_src, s->local.bytelen);
+   memcpy(s->remote.data, r->id.idiag_dst, s->local.bytelen);
+}
 
-   if (f && f->f && run_ssfilter(f->f, ) == 0)
-   return 0;
+static int inet_show_sock(struct nlmsghdr *nlh,
+ struct sockstat *s,
+ int protocol)
+{
+   struct rtattr *tb[INET_DIAG_MAX+1];
+   struct inet_diag_msg *r = NLMSG_DATA(nlh);
+
+   parse_rtattr(tb, INET_DIAG_MAX, (struct rtattr *)(r+1),
+nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
 
if (tb[INET_DIAG_PROTOCOL])
protocol = *(__u8 *)RTA_DATA(tb[INET_DIAG_PROTOCOL]);
 
-   inet_stats_print(, protocol);
+   inet_stats_print(s, protocol);
 
if (show_options) {
struct tcpstat t = {};
@@ -2085,8 +2091,8 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct 
filter *f, int protocol)
}
 
if (show_details) {
-   sock_details_print();
-   if (s.local.family == AF_INET6 && tb[INET_DIAG_SKV6ONLY]) {
+   sock_details_print(s);
+   if (s->local.family == AF_INET6 && tb[INET_DIAG_SKV6ONLY]) {
unsigned char v6only;
 
v6only = *(__u8 *)RTA_DATA(tb[INET_DIAG_SKV6ONLY]);
@@ -2268,9 +2274,16 @@ static int show_one_inet_sock(const struct sockaddr_nl 
*addr,
int err;
struct inet_diag_arg *diag_arg = arg;
struct inet_diag_msg *r = NLMSG_DATA(h);
+   struct sockstat s = {};
 
if (!(diag_arg->f->families & (1 << r->idiag_family)))
return 0;
+
+   parse_diag_msg(h, );
+
+   if (diag_arg->f->f && run_ssfilter(diag_arg->f->f, ) == 0)
+   return 0;
+
if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
if (errno == EOPNOTSUPP || errno == ENOENT) {
/* Socket can't be closed, or is already closed. */
@@ -2280,7 +2293,9 @@ static int show_one_inet_sock(const struct sockaddr_nl 
*addr,
return -1;
}
}
-   if ((err = inet_show_sock(h, diag_arg->f, diag_arg->protocol)) < 0)
+
+   err = inet_show_sock(h, , diag_arg->protocol);
+   if (err < 0)
return err;
 
return 0;
@@ -2345,6 +2360,7 @@ static int tcp_show_netlink_file(struct filter *f)
while (1) {
int status, err;
struct nlmsghdr *h = (struct nlmsghdr *)buf;
+   struct sockstat s = {};

[PATCH v2 iproute2 3/3] ss: Add support to filter on device

2016-06-27 Thread David Ahern

Add support for device names in the filter. Example:

root@kenny:~# ss -t  'sport == :22 && dev == red'
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0  0  10.100.1.2%red:ssh  10.100.1.254:47814
ESTAB  0  0   2100:1::2%red:ssh2100:1::64:49406

Since the kernel does not support iface in the filter, specifying a
device name means all filtering is done in userspace.
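
A much reduced sketch of what userspace evaluation of such a filter looks
like: a small expression tree is walked per socket and the device leaf
compares interface indexes. The types and names below (struct node,
struct sock_info, run_filter) are simplified assumptions, not the ss
structures themselves.

```c
#include <stdbool.h>
#include <stddef.h>

enum node_type { NODE_DEVCOND, NODE_AND, NODE_OR, NODE_NOT };

/* Simplified filter node (hypothetical; ss uses struct ssfilter). */
struct node {
	enum node_type type;
	unsigned int iface;		/* used by NODE_DEVCOND only */
	const struct node *pred, *post;	/* children for AND/OR/NOT */
};

/* Only the field this sketch needs from a socket. */
struct sock_info {
	unsigned int iface;
};

/* Recursively evaluate the filter against one socket. */
static bool run_filter(const struct node *f, const struct sock_info *s)
{
	switch (f->type) {
	case NODE_DEVCOND:
		return s->iface == f->iface;
	case NODE_AND:
		return run_filter(f->pred, s) && run_filter(f->post, s);
	case NODE_OR:
		return run_filter(f->pred, s) || run_filter(f->post, s);
	case NODE_NOT:
		return !run_filter(f->pred, s);
	}
	return false;
}
```

'dev != NAME' then falls out as a NOT node wrapped around the device leaf,
which matches how the grammar change in this patch builds it.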

Signed-off-by: David Ahern 
---
 misc/ss.c   | 32 
 misc/ssfilter.h |  2 ++
 misc/ssfilter.y | 22 +-
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 0510701619ac..20ea3a44ffc5 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1043,6 +1043,7 @@ static void inet_addr_print(const inet_prefix *a, int 
port, unsigned int ifindex
 struct aafilter {
inet_prefix addr;
int port;
+   unsigned intiface;
struct aafilter *next;
 };
 
@@ -1157,7 +1158,12 @@ static int run_ssfilter(struct ssfilter *f, struct 
sockstat *s)
 
return s->lport <= a->port;
}
+   case SSF_DEVCOND:
+   {
+   struct aafilter *a = (void *)f->pred;
 
+   return s->iface == a->iface;
+   }
/* Yup. It is recursion. Sorry. */
case SSF_AND:
return run_ssfilter(f->pred, s) && run_ssfilter(f->post, s);
@@ -1328,6 +1334,11 @@ static int ssfilter_bytecompile(struct ssfilter *f, char 
**bytecode)
*bytecode = a;
return l1+4;
}
+   case SSF_DEVCOND:
+   {
+   /* bytecompile for SSF_DEVCOND not supported yet */
+   return 0;
+   }
default:
abort();
}
@@ -1416,6 +1427,27 @@ static int xll_name_to_index(const char *dev)
return ll_name_to_index(dev);
 }
 
+void *parse_devcond(char *name)
+{
+   struct aafilter a = { .iface = 0 };
+   struct aafilter *res;
+
+   a.iface = xll_name_to_index(name);
+   if (a.iface == 0) {
+   char *end;
+   unsigned long res;
+
+   res = strtoul(name, , 0);
+   if (!end || end == name || *end || res > UINT_MAX)
+   return NULL;
+   }
+
+   res = malloc(sizeof(*res));
+   *res = a;
+
+   return res;
+}
+
 void *parse_hostcond(char *addr, bool is_port)
 {
char *port = NULL;
diff --git a/misc/ssfilter.h b/misc/ssfilter.h
index 53922a844457..c7db8eee9578 100644
--- a/misc/ssfilter.h
+++ b/misc/ssfilter.h
@@ -8,6 +8,7 @@
 #define SSF_S_GE  7
 #define SSF_S_LE  8
 #define SSF_S_AUTO  9
+#define SSF_DEVCOND 10
 
 #include 
 
@@ -20,3 +21,4 @@ struct ssfilter
 
 int ssfilter_parse(struct ssfilter **f, int argc, char **argv, FILE *fp);
 void *parse_hostcond(char *addr, bool is_port);
+void *parse_devcond(char *name);
diff --git a/misc/ssfilter.y b/misc/ssfilter.y
index a258d04b85d7..14bf9817f2c3 100644
--- a/misc/ssfilter.y
+++ b/misc/ssfilter.y
@@ -36,7 +36,7 @@ static void yyerror(char *s)
 
 %}
 
-%token HOSTCOND DCOND SCOND DPORT SPORT LEQ GEQ NEQ AUTOBOUND
+%token HOSTCOND DCOND SCOND DPORT SPORT LEQ GEQ NEQ AUTOBOUND DEVCOND DEVNAME
 %left '|'
 %left '&'
 %nonassoc '!'
@@ -108,6 +108,14 @@ expr:  DCOND HOSTCOND
 {
$$ = alloc_node(SSF_NOT, alloc_node(SSF_SCOND, $3));
 }
+| DEVNAME '=' DEVCOND
+{
+   $$ = alloc_node(SSF_DEVCOND, $3);
+}
+| DEVNAME NEQ DEVCOND
+{
+   $$ = alloc_node(SSF_NOT, alloc_node(SSF_DEVCOND, $3));
+}
 
 | AUTOBOUND
 {
@@ -237,6 +245,10 @@ int yylex(void)
tok_type = SPORT;
return SPORT;
}
+   if (strcmp(curtok, "dev") == 0) {
+   tok_type = DEVNAME;
+   return DEVNAME;
+   }
if (strcmp(curtok, ">=") == 0 ||
strcmp(curtok, "ge") == 0 ||
strcmp(curtok, "geq") == 0)
@@ -263,6 +275,14 @@ int yylex(void)
tok_type = AUTOBOUND;
return AUTOBOUND;
}
+   if (tok_type == DEVNAME) {
+   yylval = (void*)parse_devcond(curtok);
+   if (yylval == NULL) {
+   fprintf(stderr, "Cannot parse device.\n");
+   exit(1);
+   }
+   return DEVCOND;
+   }
yylval = (void*)parse_hostcond(curtok, tok_type == SPORT || tok_type == 
DPORT);
if (yylval == NULL) {
fprintf(stderr, "Cannot parse dst/src address.\n");
-- 
2.1.4

[PATCH v2 iproute2 0/3] ss: Add support to filter by device

2016-06-27 Thread David Ahern

Add support for specifying device name in the filter to ss.
The kernel does not provide support for iface filtering, so if
the user specifies 'dev == NAME' or 'dev != NAME' all filtering
is done in userspace.

I will send a patch to add support for iface filtering in the kernel,
but in practice ss will need to handle kernels both with and without
that support for some time, which this set provides for.

v2
- fixed checkpatch errors and warnings

David Ahern (3):
  ss: Refactor inet_show_sock
  ss: Allow ssfilter_bytecompile to return 0
  ss: Add support to filter on device

 misc/ss.c   | 159 +---
 misc/ssfilter.h |   2 +
 misc/ssfilter.y |  22 +++-
 3 files changed, 140 insertions(+), 43 deletions(-)

-- 
2.1.4

[PATCH v2 iproute2 2/3] ss: Allow ssfilter_bytecompile to return 0

2016-06-27 Thread David Ahern

Allow ssfilter_bytecompile to return 0 for filter ops the kernel
does not support. If such an op is in the filter string then all
filtering is done in userspace.
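
The propagation rule can be pictured with a toy compiler: any unsupported
leaf makes the whole expression compile to length 0, and the sender then
transmits only the bare request. This is a sketch under assumptions; the
node type, the leaf length of 8 and the helper names are invented here.

```c
#include <stddef.h>

enum op { OP_PORT, OP_DEV, OP_AND };

/* Toy expression node (hypothetical, much simpler than struct ssfilter). */
struct expr {
	enum op op;
	const struct expr *lhs, *rhs;	/* only used by OP_AND */
};

/* Returns a bytecode length, or 0 if any part cannot be compiled. */
static int compile_len(const struct expr *e)
{
	int l1, l2;

	switch (e->op) {
	case OP_PORT:
		return 8;	/* arbitrary length for a supported leaf */
	case OP_DEV:
		return 0;	/* no kernel opcode: give up on bytecode */
	case OP_AND:
		l1 = compile_len(e->lhs);
		l2 = compile_len(e->rhs);
		if (!l1 || !l2)
			return 0;
		return l1 + l2;
	}
	return 0;
}

/* One iovec for the bare request, three when rtattr + bytecode follow. */
static int request_iovlen(int bclen)
{
	return bclen ? 3 : 1;
}
```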

Signed-off-by: David Ahern 
---
 misc/ss.c | 52 +---
 1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index f164ca920308..0510701619ac 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1273,11 +1273,16 @@ static int ssfilter_bytecompile(struct ssfilter *f, 
char **bytecode)
 
case SSF_AND:
{
-   char *a1, *a2, *a;
+   char *a1 = NULL, *a2 = NULL, *a;
int l1, l2;
 
l1 = ssfilter_bytecompile(f->pred, );
l2 = ssfilter_bytecompile(f->post, );
+   if (!l1 || !l2) {
+   free(a1);
+   free(a2);
+   return 0;
+   }
if (!(a = malloc(l1+l2))) abort();
memcpy(a, a1, l1);
memcpy(a+l1, a2, l2);
@@ -1288,11 +1293,16 @@ static int ssfilter_bytecompile(struct ssfilter *f, 
char **bytecode)
}
case SSF_OR:
{
-   char *a1, *a2, *a;
+   char *a1 = NULL, *a2 = NULL, *a;
int l1, l2;
 
l1 = ssfilter_bytecompile(f->pred, );
l2 = ssfilter_bytecompile(f->post, );
+   if (!l1 || !l2) {
+   free(a1);
+   free(a2);
+   return 0;
+   }
if (!(a = malloc(l1+l2+4))) abort();
memcpy(a, a1, l1);
memcpy(a+l1+4, a2, l2);
@@ -1303,10 +1313,14 @@ static int ssfilter_bytecompile(struct ssfilter *f, 
char **bytecode)
}
case SSF_NOT:
{
-   char *a1, *a;
+   char *a1 = NULL, *a;
int l1;
 
l1 = ssfilter_bytecompile(f->pred, );
+   if (!l1) {
+   free(a1);
+   return 0;
+   }
if (!(a = malloc(l1+4))) abort();
memcpy(a, a1, l1);
free(a1);
@@ -2127,6 +2141,7 @@ static int tcpdiag_send(int fd, int protocol, struct 
filter *f)
struct msghdr msg;
struct rtattr rta;
struct iovec iov[3];
+   int iovlen = 1;
 
if (protocol == IPPROTO_UDP)
return -1;
@@ -2162,18 +2177,21 @@ static int tcpdiag_send(int fd, int protocol, struct 
filter *f)
};
if (f->f) {
bclen = ssfilter_bytecompile(f->f, );
-   rta.rta_type = INET_DIAG_REQ_BYTECODE;
-   rta.rta_len = RTA_LENGTH(bclen);
-   iov[1] = (struct iovec){ , sizeof(rta) };
-   iov[2] = (struct iovec){ bc, bclen };
-   req.nlh.nlmsg_len += RTA_LENGTH(bclen);
+   if (bclen) {
+   rta.rta_type = INET_DIAG_REQ_BYTECODE;
+   rta.rta_len = RTA_LENGTH(bclen);
+   iov[1] = (struct iovec){ , sizeof(rta) };
+   iov[2] = (struct iovec){ bc, bclen };
+   req.nlh.nlmsg_len += RTA_LENGTH(bclen);
+   iovlen = 3;
+   }
}
 
msg = (struct msghdr) {
.msg_name = (void *),
.msg_namelen = sizeof(nladdr),
.msg_iov = iov,
-   .msg_iovlen = f->f ? 3 : 1,
+   .msg_iovlen = iovlen,
};
 
if (sendmsg(fd, , 0) < 0) {
@@ -2194,6 +2212,7 @@ static int sockdiag_send(int family, int fd, int 
protocol, struct filter *f)
struct msghdr msg;
struct rtattr rta;
struct iovec iov[3];
+   int iovlen = 1;
 
if (family == PF_UNSPEC)
return tcpdiag_send(fd, protocol, f);
@@ -,18 +2241,21 @@ static int sockdiag_send(int family, int fd, int 
protocol, struct filter *f)
};
if (f->f) {
bclen = ssfilter_bytecompile(f->f, );
-   rta.rta_type = INET_DIAG_REQ_BYTECODE;
-   rta.rta_len = RTA_LENGTH(bclen);
-   iov[1] = (struct iovec){ , sizeof(rta) };
-   iov[2] = (struct iovec){ bc, bclen };
-   req.nlh.nlmsg_len += RTA_LENGTH(bclen);
+   if (bclen) {
+   rta.rta_type = INET_DIAG_REQ_BYTECODE;
+   rta.rta_len = RTA_LENGTH(bclen);
+   iov[1] = (struct iovec){ , sizeof(rta) };
+   iov[2] = (struct iovec){ bc, bclen };
+   req.nlh.nlmsg_len += RTA_LENGTH(bclen);
+   iovlen = 3;
+   }
}
 
msg = (struct msghdr) {
.msg_name = (void *),
.msg_namelen = sizeof(nladdr),
.msg_iov = iov,
-   .msg_iovlen = f->f ? 3 : 1,
+

Re: [PATCH v2 net-next] tcp: md5: use kmalloc() backed scratch areas

2016-06-27 Thread Cong Wang

On Mon, Jun 27, 2016 at 9:51 AM, Eric Dumazet  wrote:
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 
> 5c7ed147449c1b7ba029b12e033ad779a631460a..fddc0ab76c1df82cb05dba03271b773e3b2d
>  100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2969,8 +2969,18 @@ static void __tcp_alloc_md5sig_pool(void)
> return;
>
> for_each_possible_cpu(cpu) {
> +   void *scratch = per_cpu(tcp_md5sig_pool, cpu).scratch;
> struct ahash_request *req;
>
> +   if (!scratch) {
> +   scratch = kmalloc_node(sizeof(union tcp_md5sum_block) 
> +
> +  sizeof(struct tcphdr),
> +  GFP_KERNEL,
> +  cpu_to_node(cpu));
> +   if (!scratch)
> +   return;
> +   per_cpu(tcp_md5sig_pool, cpu).scratch = scratch;
> +   }
> if (per_cpu(tcp_md5sig_pool, cpu).md5_req)
> continue;

Not a problem of your patch, but it seems these allocations never
get freed once we start using tcp md5. Maybe we should free them
when the last socket using tcp md5 is gone?

Re: [iproute PATCH v3 0/6] Big C99 style initializer rework

2016-06-27 Thread Phil Sutter

Hi,

On Mon, Jun 27, 2016 at 10:59:12AM -0700, Stephen Hemminger wrote:
> On Thu, 23 Jun 2016 17:34:08 +
> Phil Sutter  wrote:
> 
> > This is v3 of my C99-style initializer related patch series. The changes
> > since v2 are:
[...]
> 
> I like the idea and it makes code cleaner. But doing this introduces lots of
> warnings and that is not acceptable.
> ip
> CC   ip.o
> CC   ipaddress.o
> ipaddress.c: In function ‘print_queuelen’:
> ipaddress.c:175:10: warning: missing braces around initializer [-Wmissing-braces]
>struct ifreq ifr = { 0 };
>   ^

I saw these too with gcc-3.4.6 but not with 5.3.0. It appears to be a
gcc bug[1]. One possible workaround is to match the brace level of the
first field, but it's quite ugly: [2]. Another way might be to
initialize one of the fields to zero, like so:

| struct ifreq ifr = { .ifr_qlen = 0 };
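
To make the second option concrete, here is a small sketch on a made-up
struct whose first member is itself an aggregate, which is the shape that
triggers the warning for '{ 0 }'. struct fake_ifreq and make_req are
hypothetical; the point is only that naming one field with a designator
leaves the members that are not named zero-initialized as well.

```c
/* Hypothetical stand-in for struct ifreq: the first member is a nested
 * aggregate, so a bare "{ 0 }" lacks one level of braces. */
struct fake_ifreq {
	struct {
		char name[16];
	} ifrn;
	union {
		int  qlen;
		char data[24];
	} ifru;
};

/* Name one field explicitly; members not named are zero-initialized. */
static struct fake_ifreq make_req(void)
{
	struct fake_ifreq req = { .ifru.qlen = 0 };

	return req;
}
```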

What do you think?

Thanks, Phil

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119
[2] http://nwl.cc/cgi-bin/git/gitweb.cgi?p=iproute2.git;a=commitdiff;h=a1cbf2b63c995b2f633c5b4699248ab308b201d2;hp=3809cfec65b03716d1d0360338126df4b4f3fbf6

[Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-27 Thread Cong Wang

The stack doesn't trust a complete csum from hardware even when it
is correct. When we fix up the csum ourselves, it is probably safe
to just mark it as completed by software.

This should shut up a kernel warning from netdev_rx_csum_fault()
with mlx4 driver for ICMP packets.

Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Cc: Shani Michaeli 
Cc: Tariq Toukan 
Cc: Yishai Hadas 
Signed-off-by: Cong Wang 
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c1b3a9c..b44c434 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -732,6 +732,7 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff 
*skb, void *va,
if (get_fixed_ipv6_csum(hw_checksum, skb, hdr))
return -1;
 #endif
+   skb->csum_complete_sw = 1;
return 0;
 }
 
-- 
2.1.0

[PATCH net-next] net: bridge: add support for IGMP/MLD stats and export them via netlink

2016-06-27 Thread Nikolay Aleksandrov

This patch adds stats support for the currently used IGMP/MLD types by the
bridge. The stats are per-port (plus one stat per-bridge) and per-direction
(RX/TX). The stats are exported over netlink through the new linkxstats API
(RTM_GETSTATS). In order to minimize the performance impact, a new option
is used to enable/disable the stats - multicast_stats_enabled, similar to
the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
lookups and checks, we make use of the current "igmp" member of the bridge
private skb->cb region to record the type on Rx (both host-generated and
external packets pass by multicast_rcv()). We can do that since the igmp
member was used as a boolean and all the valid IGMP/MLD types are positive
values. The normal bridge fast-path is not affected at all, the only
affected paths are the flooding ones and since we make use of the IGMP/MLD
type, we can quickly determine if the packet should be counted using
cache-hot data (cb's igmp member). We add counters for:
* IGMP Queries
* IGMP Leaves
* IGMP v1/v2/v3 reports

* MLD Queries
* MLD Leaves
* MLD v1/v2 reports

These are invaluable when monitoring or debugging complex multicast setups
with bridges.
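
The per-type, per-direction accounting described above can be sketched in userspace as follows. The demo_* names are hypothetical and this is not the bridge implementation; only the IGMP message type values are taken from the IGMP RFCs. A type of 0 stands for "not IGMP", mirroring the observation that all valid types are positive:

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_DIR_RX, DEMO_DIR_TX, DEMO_DIR_SIZE };

/* IGMP message type values (RFC 1112, RFC 2236, RFC 3376) */
#define DEMO_IGMP_QUERY     0x11
#define DEMO_IGMP_V1_REPORT 0x12
#define DEMO_IGMP_V2_REPORT 0x16
#define DEMO_IGMP_LEAVE     0x17
#define DEMO_IGMP_V3_REPORT 0x22

/* Hypothetical counter block, one slot per direction for each type. */
struct demo_mcast_stats {
	uint64_t igmp_queries[DEMO_DIR_SIZE];
	uint64_t igmp_leaves[DEMO_DIR_SIZE];
	uint64_t igmp_v1reports[DEMO_DIR_SIZE];
	uint64_t igmp_v2reports[DEMO_DIR_SIZE];
	uint64_t igmp_v3reports[DEMO_DIR_SIZE];
};

/* Count one packet. Returns 1 if a counter was bumped, 0 otherwise. */
static int demo_mcast_count(struct demo_mcast_stats *st, uint8_t type, int dir)
{
	/* type 0 means the packet was not recorded as IGMP: nothing to do */
	if (!type || dir < 0 || dir >= DEMO_DIR_SIZE)
		return 0;

	switch (type) {
	case DEMO_IGMP_QUERY:
		st->igmp_queries[dir]++;
		break;
	case DEMO_IGMP_LEAVE:
		st->igmp_leaves[dir]++;
		break;
	case DEMO_IGMP_V1_REPORT:
		st->igmp_v1reports[dir]++;
		break;
	case DEMO_IGMP_V2_REPORT:
		st->igmp_v2reports[dir]++;
		break;
	case DEMO_IGMP_V3_REPORT:
		st->igmp_v3reports[dir]++;
		break;
	default:
		return 0;
	}
	return 1;
}
```

Because the recorded type doubles as the "is this IGMP at all" flag, the common non-multicast case costs a single comparison.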

Signed-off-by: Nikolay Aleksandrov 
---
 include/uapi/linux/if_bridge.h |  27 +++
 include/uapi/linux/if_link.h   |   1 +
 net/bridge/br_device.c |  10 ++-
 net/bridge/br_forward.c|  13 ++-
 net/bridge/br_if.c |   9 ++-
 net/bridge/br_input.c  |   3 +
 net/bridge/br_multicast.c  | 176 +
 net/bridge/br_netlink.c|  94 --
 net/bridge/br_private.h|  41 +-
 net/bridge/br_sysfs_br.c   |  25 ++
 10 files changed, 356 insertions(+), 43 deletions(-)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 397d503fdedb..9a0d81615a2f 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -247,8 +247,35 @@ enum {
 enum {
BRIDGE_XSTATS_UNSPEC,
BRIDGE_XSTATS_VLAN,
+   BRIDGE_XSTATS_MCAST,
+   BRIDGE_XSTATS_PAD,
__BRIDGE_XSTATS_MAX
 };
 #define BRIDGE_XSTATS_MAX (__BRIDGE_XSTATS_MAX - 1)
 
+enum {
+   BR_MCAST_DIR_RX,
+   BR_MCAST_DIR_TX,
+   BR_MCAST_DIR_SIZE
+};
+
+/* IGMP/MLD statistics */
+struct br_mcast_stats {
+   __u64 igmp_queries[BR_MCAST_DIR_SIZE];
+   __u64 igmp_leaves[BR_MCAST_DIR_SIZE];
+   __u64 igmp_v1reports[BR_MCAST_DIR_SIZE];
+   __u64 igmp_v2reports[BR_MCAST_DIR_SIZE];
+   __u64 igmp_v3reports[BR_MCAST_DIR_SIZE];
+
+   __u64 mld_queries[BR_MCAST_DIR_SIZE];
+   __u64 mld_leaves[BR_MCAST_DIR_SIZE];
+   __u64 mld_v1reports[BR_MCAST_DIR_SIZE];
+   __u64 mld_v2reports[BR_MCAST_DIR_SIZE];
+};
+
+struct br_mcast_stats_nla {
+   int ifindex;
+   __u32 pad1;
+   struct br_mcast_stats mstats;
+};
 #endif /* _UAPI_LINUX_IF_BRIDGE_H */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index bb36bd5675a7..c9cb6adfda13 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -273,6 +273,7 @@ enum {
IFLA_BR_VLAN_DEFAULT_PVID,
IFLA_BR_PAD,
IFLA_BR_VLAN_STATS_ENABLED,
+   IFLA_BR_MCAST_STATS_ENABLED,
__IFLA_BR_MAX,
 };
 
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 2c8095a5d824..0c39e0f6da09 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -104,8 +104,16 @@ static int br_dev_init(struct net_device *dev)
return -ENOMEM;
 
err = br_vlan_init(br);
-   if (err)
+   if (err) {
free_percpu(br->stats);
+   return err;
+   }
+
+   err = br_multicast_init_stats(br);
+   if (err) {
+   free_percpu(br->stats);
+   br_vlan_flush(br);
+   }
br_set_lockdep_class(dev);
 
return err;
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index f47759f05b6d..6c196037d818 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -198,8 +198,10 @@ static void br_flood(struct net_bridge *br, struct sk_buff 
*skb,
   struct sk_buff *skb),
 bool unicast)
 {
-   struct net_bridge_port *p;
+   u8 igmp_type = br_multicast_igmp_type(skb);
+   __be16 proto = skb->protocol;
struct net_bridge_port *prev;
+   struct net_bridge_port *p;
 
prev = NULL;
 
@@ -218,6 +220,9 @@ static void br_flood(struct net_bridge *br, struct sk_buff 
*skb,
prev = maybe_deliver(prev, p, skb, __packet_hook);
if (IS_ERR(prev))
goto out;
+   if (prev == p)
+   br_multicast_count(p->br, p, proto, igmp_type,
+  BR_MCAST_DIR_TX);
}
 
if (!prev)
@@ -257,9 +262,12 @@ static void br_multicast_flood(struct

Re: [PATCH iproute2 3/3] ss: Add support to filter on device

2016-06-27 Thread Stephen Hemminger

On Tue, 21 Jun 2016 20:38:26 -0700
David Ahern  wrote:

> Add support for device names in the filter. Example:
> 
> root@kenny:~# ss -t  'sport == :22 && dev == red'
> State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
> ESTAB  0  0  10.100.1.2%red:ssh  10.100.1.254:47814
> ESTAB  0  0   2100:1::2%red:ssh2100:1::64:49406
> 
> Since kernel does not support iface in the filter specifying a
> device name means all filtering is done in userspace.
> 

Introduces new checkpatch whitespace issues.
ERROR: code indent should use tabs where possible
#60: FILE: misc/ss.c:1337:
+   case SSF_DEVCOND:$

WARNING: please, no spaces at the start of a line
#60: FILE: misc/ss.c:1337:
+   case SSF_DEVCOND:$

ERROR: code indent should use tabs where possible
#63: FILE: misc/ss.c:1340:
+return 0;$

WARNING: please, no spaces at the start of a line
#63: FILE: misc/ss.c:1340:
+return 0;$

Re: [PATCH iproute2 1/3] ss: Refactor inet_show_sock

2016-06-27 Thread Stephen Hemminger

On Tue, 21 Jun 2016 20:38:24 -0700
David Ahern  wrote:

> Extract parsing of sockstat and filter from inet_show_sock.
> While moving run_ssfilter into callers of inet_show_sock enable
> userspace filtering before the kill.
> 
> Signed-off-by: David Ahern 

I would add this, but it has checkpatch style issues:

WARNING: braces {} are not necessary for any arm of this statement
#65: FILE: misc/ss.c:2060:
+   if (s->local.family == AF_INET) {
[...]
} else {
[...]

WARNING: line over 80 characters
#80: FILE: misc/ss.c:2070:
+static int inet_show_sock(struct nlmsghdr *nlh, struct sockstat *s, int 
protocol)

ERROR: do not use assignment in if condition
#129: FILE: misc/ss.c:2295:
+   if ((err = inet_show_sock(h, , diag_arg->protocol)) < 0)

Re: [PATCH iproute2] man: ip-link: Add vrf type

2016-06-27 Thread Stephen Hemminger

On Tue, 21 Jun 2016 16:29:01 -0700
David Ahern  wrote:

> Add description for vrf type to ip-link man page.
> 
> Signed-off-by: David Ahern 

Applied

Re: [iproute PATCH] Fix MAC address length check

2016-06-27 Thread Stephen Hemminger

On Wed, 22 Jun 2016 12:05:38 +0200
Phil Sutter  wrote:

> I forgot to change the variable in the conditional, too.
> 
> Fixes: 8fe58d58941f4 ("iplink: Check address length via netlink")
> Signed-off-by: Phil Sutter 

Applied

Re: [PATCH v2 net-next] tcp: md5: use kmalloc() backed scratch areas

2016-06-27 Thread Andy Lutomirski

On Mon, Jun 27, 2016 at 9:51 AM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> Some arches have virtually mapped kernel stacks, or will soon have.
>
> tcp_md5_hash_header() uses an automatic variable to copy tcp header
> before mangling th->check and calling crypto function, which might
> be problematic on such arches.
>
> David says that using percpu storage is also problematic on non SMP
> builds.
>
> Just use kmalloc() to allocate scratch areas.

Seems reasonable.

I wonder if it's worth switching from ahash to shash, though.  It
would probably be simpler and faster.

--Andy

Re: [iproute PATCH v3 0/6] Big C99 style initializer rework

2016-06-27 Thread Stephen Hemminger

On Thu, 23 Jun 2016 17:34:08 +
Phil Sutter  wrote:

> This is v3 of my C99-style initializer related patch series. The changes
> since v2 are:
> 
> - Flattened embedded struct's initializers:
>   Since the field names are very short, I figured it makes more sense to
>   keep indenting low. Also, the same style is already used in
>   ip/xfrm_policy.c so take that as an example.
> 
> - Moved leftover nlmsg_seq initializing into the common place as well:
>   I was unsure whether this is a good idea at first (due to the
>   increment), but again it's done in ip/xfrm_policy.c as well so should
>   be fine.
> 
> - Added a comma after the last field initializer as suggested by Jakub.
> 
> - Dropped patch 7 since it was NACKed.
> 
> - Eliminated checkpatch non-compliance.
> 
> - Second go at union bpf_attr in tc/tc_bpf.c:
>   I figured that while it is not possible to initialize fields, gcc-3.4.6
>   does not complain when setting the whole union to zero using '= {0}'.
>   So I did this and thereby at least got rid of the memset calls.
> 
> For reference, here's the v2 changelog:
> 
> - Rebased onto current upstream master:
>   My own commit a0a73b298a579 ("tc: m_action: Use C99 style initializers
>   for struct req") contains most of the changes to tc/m_action.c already,
>   so I put the remaining ones into a dedicated patch (the first one here)
>   with a better description.
> 
> - Tested against gcc-3.4.6:
>   This is the oldest gcc version I was able to install locally. It indeed
>   does not like the former changes in tc/tc_bpf.c, so I reverted them.
>   Apart from emitting many warnings, it successfully compiles the
>   sources.
> 
> In the process of compatibility testing, I made a few more changes which
> make sense to have:
> 
> - New patch 5 allows to conveniently override the compiler via command
>   line.
> 
> - New patch 6 eliminates a warning with old gcc but looks valid in
>   general.
> 
> - A warning made me look at ip/tcp_metrics.c and I found a minor code
>   simplification (patch 7).
> 
> Phil Sutter (6):
>   tc: m_action: Improve conversion to C99 style initializers
>   Use C99 style initializers everywhere
>   Replace malloc && memset by calloc
>   No need to initialize rtattr fields before parsing
>   Makefile: Allow to override CC
>   misc/ifstat: simplify unsigned value comparison
> 
>  Makefile   |   4 +-
>  bridge/fdb.c   |  25 ++--
>  bridge/link.c  |  14 +++
>  bridge/mdb.c   |  17 -
>  bridge/vlan.c  |  17 -
>  genl/ctrl.c|  44 +
>  genl/genl.c|   3 +-
>  ip/ip6tunnel.c |  10 ++---
>  ip/ipaddress.c |  33 +++-
>  ip/ipaddrlabel.c   |  21 --
>  ip/iplink.c|  61 -
>  ip/iplink_can.c|   4 +-
>  ip/ipmaddr.c   |  25 
>  ip/ipmroute.c  |   8 +---
>  ip/ipneigh.c   |  30 ++-
>  ip/ipnetconf.c |  10 ++---
>  ip/ipnetns.c   |  39 +--
>  ip/ipntable.c  |  25 
>  ip/iproute.c   |  78 +
>  ip/iprule.c|  22 +--
>  ip/iptoken.c   |  19 -
>  ip/iptunnel.c  |  31 +--
>  ip/ipxfrm.c|  26 -
>  ip/link_gre.c  |  18 -
>  ip/link_gre6.c |  18 -
>  ip/link_ip6tnl.c   |  25 +---
>  ip/link_iptnl.c|  22 +--
>  ip/link_vti.c  |  18 -
>  ip/link_vti6.c |  18 -
>  ip/xfrm_policy.c   |  99 +++
>  ip/xfrm_state.c| 110 
> ++---
>  lib/libnetlink.c   |  77 ++---
>  lib/ll_map.c   |   1 -
>  lib/names.c|   7 +---
>  misc/arpd.c|  64 ++-
>  misc/ifstat.c  |   2 +-
>  misc/lnstat.c  |   6 +--
>  misc/lnstat_util.c |   4 +-
>  misc/ss.c  |  37 +++---
>  tc/e_bpf.c |   7 +---
>  tc/em_canid.c  |   4 +-
>  tc/em_cmp.c|   4 +-
>  tc/em_ipset.c  |   4 +-
>  tc/em_meta.c   |   4 +-
>  tc/em_nbyte.c  |   4 +-
>  tc/em_u32.c|   4 +-
>  tc/f_flow.c|   3 --
>  tc/f_flower.c  |   3 +-
>  tc/f_fw.c  |   6 +--
>  tc/f_route.c   |   3 --
>  tc/f_rsvp.c|   6 +--
>  tc/f_u32.c |  12 ++
>  tc/m_action.c  |  26 -
>  tc/m_bpf.c |   5 +--
>  tc/m_csum.c|   4 +-
>  tc/m_ematch.c  |   4 +-
>  tc/m_gact.c|   5 +--
>  tc/m_ife.c |   5 +--
>  tc/m_ipt.c |  13 ++-
>  tc/m_mirred.c  |   7 +---
>  tc/m_nat.c |   4 +-
>  tc/m_pedit.c   |  11 ++
>  tc/m_police.c  |   5 +--
>  tc/q_atm.c |   3 +-
>  tc/q_cbq.c |  22 +++
>  tc/q_choke.c   |   4 +-
>  tc/q_codel.c   |   3 +-
>  tc/q_dsmark.c  |   1 -
>  tc/q_fifo.c|   4 +-

Re: [PATCH iproute2] Enable use of extra debugging information

2016-06-27 Thread Stephen Hemminger

On Tue, 21 Jun 2016 16:27:09 -0700
David Ahern  wrote:

> Add -g flag to builds if DEBUG parameter is set. Improves
> debugging with gdb.
> 
> Signed-off-by: David Ahern 

I would rather not put this in the upstream repo.
Developers are free to modify flags as they see fit when debugging.

Re: [iproute PATCH] man: ip-address, ip-link: Document 'type' quirk

2016-06-27 Thread Stephen Hemminger

On Fri, 24 Jun 2016 12:14:23 +0200
Phil Sutter  wrote:

> This covers the fact that calling 'ip {link|addr} show type foobar' does
> not return an error.
> 
> Signed-off-by: Phil Sutter 

Applied

Re: [PATCH iproute2 net-next] bridge: man: fix "brige" typo

2016-06-27 Thread Stephen Hemminger

On Tue, 21 Jun 2016 19:28:50 +
Vivien Didelot  wrote:

> Signed-off-by: Vivien Didelot 

Applied to current; no need to wait for net-next.

Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653

2016-06-27 Thread Marc Dionne

On Mon, Jun 27, 2016 at 12:38 PM, Florian Westphal  wrote:
> Marc Dionne  wrote:
>> On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal  wrote:
>> > Marc Dionne  wrote:
>> >> Hi,
>
>> > hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
>> > if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
>> > -   zone, net))
>> > -   goto out;
>> > +   zone, net)) {
>> > +   nf_ct_add_to_dying_list(ct);
>> > +   ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
>> > +   goto dying;
>> > +   }
>
> This is bogus as h can be a reply too (key compare does not deal
> with it).
>
> Below is what I actually intended; I can't come up with a reason why
> you experience this issue other than that we're getting confused over
> reply/original direction.
>
> If the patch doesn't help either, can you tell us what kind of iptables
> rules are installed on the affected system or perhaps report perf drop
> monitor stat when things go wrong?
>
> Thanks!

The additional patch didn't help either.

I had a lot of iptables bloat, but I reverted to old simple iptables
and ip6tables configs (attached), and still see the problem.  Note
that the test normally uses ipv6, but the behaviour is the same with
ipv4.

Marc


iptables
Description: Binary data


ip6tables
Description: Binary data

Re: [PATCH net-next 03/16] net/mlx5: E-Switch, Add miss rule for offloads mode

2016-06-27 Thread Sergei Shtylyov


Hello.

On 06/27/2016 07:07 PM, Saeed Mahameed wrote:


From: Or Gerlitz 

In the sriov offloads mode, packets that are not matched by any other
rule should be sent towards the e-switch manager for further processing.

Add such "miss" rule which matches ANY packet as the last rule in the
e-switch FDB and programs the HW to send the packet to vport 0 where
the e-switch manager runs.
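
As a rough illustration of the "miss" rule idea, here is a first-match table lookup in which the last entry has an all-zero mask and therefore matches any packet. The demo_* names, the table contents, and the 48-bit DMAC keys are hypothetical; this is a sketch of the concept, not of the mlx5 flow steering API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: match (key & mask) == (value & mask), forward to vport. */
struct demo_rule {
	uint64_t mask;
	uint64_t value;
	int dest_vport;
};

/* Two exact-match DMAC rules followed by the catch-all miss rule. */
static const struct demo_rule demo_fdb[] = {
	{ 0xffffffffffffULL, 0x020000000001ULL, 1 },
	{ 0xffffffffffffULL, 0x020000000002ULL, 2 },
	{ 0, 0, 0 },	/* miss rule: empty mask matches ANY packet, goes to vport 0 */
};

/* First matching rule wins; returns -1 only if the table has no miss rule. */
static int demo_fdb_lookup(const struct demo_rule *rules, size_t n, uint64_t dmac)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((dmac & rules[i].mask) == (rules[i].value & rules[i].mask))
			return rules[i].dest_vport;
	return -1;
}
```

Keeping the catch-all entry last is what makes it a miss rule: any more specific rule placed before it takes precedence.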

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 

[...]


diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c6b28df..9310017 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -38,6 +38,41 @@
 #include "mlx5_core.h"
 #include "eswitch.h"

+static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
+{
+   struct mlx5_flow_destination dest;
+   struct mlx5_flow_rule *flow_rule = NULL;
+   int match_header = 0;


   This variable doesn't appear necessary...


+   u32 *match_v, *match_c;
+   int err = 0;
+
+   match_v = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   if (!match_v || !match_c) {
+   esw_warn(esw->dev, "FDB: Failed to alloc match parameters\n");
+   err = -ENOMEM;
+   goto out;
+   }
+
+   dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
+   dest.vport_num = 0;
+
+   flow_rule = mlx5_add_flow_rule(esw->fdb_table.fdb, match_header, 
match_c,


   Why not just pass 0 instead of 'match_header'?


+  match_v, 
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
+  0, &dest);
+   if (IS_ERR(flow_rule)) {
+   err = PTR_ERR(flow_rule);
+   esw_warn(esw->dev,  "FDB: Failed to add miss flow rule err %d\n", err);
+   goto out;
+   }
+
+   esw->fdb_table.offloads.miss_rule = flow_rule;
+out:
+   kfree(match_v);
+   kfree(match_c);
+   return err;
+}
+
 #define MAX_PF_SQ 256

 int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)

[...]

MBR, Sergei

Re: [ethtool PATCH v2 4/4] ethtool: Enhancing link mode bits to support 25G/50G/100G

2016-06-27 Thread David Decotigny

On Sun, Jun 26, 2016 at 12:45 PM, Vidya Sagar Ravipati
 wrote:
> From: Vidya Sagar Ravipati 
>
> Enhancing link mode bits to support 25G/50G/100G
> for supported and advertised speed mode bits
>
> Signed-off-by: Vidya Sagar Ravipati 
> ---
>  ethtool.c | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/ethtool.c b/ethtool.c
> index 1d6564e..5c3c765 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -512,6 +512,15 @@ static void init_global_link_mode_masks(void)
> ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT,
> ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT,
> ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT,
> +   ETHTOOL_LINK_MODE_25000baseCR_Full_BIT,
> +   ETHTOOL_LINK_MODE_25000baseKR_Full_BIT,
> +   ETHTOOL_LINK_MODE_25000baseSR_Full_BIT,
> +   ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT,
> +   ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT,
> +   ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT,
> +   ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT,
> +   ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT,
> +   ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT,
> };
> static const enum ethtool_link_mode_bit_indices
> additional_advertised_flags_bits[] = {
> @@ -632,6 +641,24 @@ static void dump_link_caps(const char *prefix, const 
> char *an_prefix,
>   "56000baseSR4/Full" },
> { 0, ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT,
>   "56000baseLR4/Full" },
> +   { 0, ETHTOOL_LINK_MODE_25000baseCR_Full_BIT,
> + "25000baseCR/Full" },
> +   { 0, ETHTOOL_LINK_MODE_25000baseKR_Full_BIT,
> + "25000baseKR/Full" },
> +   { 0, ETHTOOL_LINK_MODE_25000baseSR_Full_BIT,
> + "25000baseSR/Full" },
> +   { 0, ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT,
> + "50000baseCR2/Full" },
> +   { 0, ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT,
> + "50000baseKR2/Full" },
> +   { 0, ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT,
> + "100000baseKR4/Full" },
> +   { 0, ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT,
> + "100000baseSR4/Full" },
> +   { 0, ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT,
> + "100000baseCR4/Full" },
> +   { 0, ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT,
> + "100000baseLR4_ER4/Full" },
> };
> int indent;
> int did1, new_line_pend, i;
> --
> 2.1.4
>

Acked-By: David Decotigny

[PATCH v2 net-next] tcp: md5: use kmalloc() backed scratch areas

2016-06-27 Thread Eric Dumazet

From: Eric Dumazet 

Some arches have virtually mapped kernel stacks, or will soon have.

tcp_md5_hash_header() uses an automatic variable to copy tcp header
before mangling th->check and calling crypto function, which might
be problematic on such arches.

David says that using percpu storage is also problematic on non SMP
builds.

Just use kmalloc() to allocate scratch areas.
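
The layout used by the patch (pseudo-header immediately followed by a copy of the TCP header, both in one heap buffer) can be sketched in userspace like this. The demo_* structs are simplified, hypothetical stand-ins, malloc() takes the place of kmalloc(), and no hashing is performed:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>

/* Simplified stand-in for the IPv4 TCP pseudo-header (12 bytes). */
struct demo_pseudohdr {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  pad;
	uint8_t  protocol;
	uint16_t len;
};

/* Simplified stand-in for a TCP header without options (20 bytes). */
struct demo_tcphdr {
	uint16_t source;
	uint16_t dest;
	uint32_t seq;
	uint32_t ack_seq;
	uint16_t doff_flags;
	uint16_t window;
	uint16_t check;
	uint16_t urg_ptr;
};

/* One heap allocation holds both headers, so nothing lives on the stack. */
static void *demo_alloc_scratch(void)
{
	return malloc(sizeof(struct demo_pseudohdr) + sizeof(struct demo_tcphdr));
}

/* Fill the scratch area; returns the number of bytes that would be hashed. */
static size_t demo_fill_headers(void *scratch, uint32_t saddr, uint32_t daddr,
				const struct demo_tcphdr *th, uint16_t nbytes)
{
	struct demo_pseudohdr *bp = scratch;
	struct demo_tcphdr *copy;

	bp->saddr = saddr;
	bp->daddr = daddr;
	bp->pad = 0;
	bp->protocol = 6;		/* IPPROTO_TCP */
	bp->len = htons(nbytes);

	/* The header copy sits right behind the pseudo-header. */
	copy = (struct demo_tcphdr *)(bp + 1);
	memcpy(copy, th, sizeof(*th));
	copy->check = 0;		/* only the copy is mangled */

	return sizeof(*bp) + sizeof(*th);
}
```

Since both pieces are contiguous, a single scatterlist entry (or a single update call) can cover them, which is what lets the patch fold the two hashing steps into one.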

Signed-off-by: Eric Dumazet 
Reported-by: Andy Lutomirski 
---
 include/net/tcp.h   |3 +--
 net/ipv4/tcp.c  |   10 ++
 net/ipv4/tcp_ipv4.c |   31 ++-
 net/ipv6/tcp_ipv6.c |   29 -
 4 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a79894b667265cdf9e3fe793b4757e2f932b378a..7d892f65d6c88a4520e4e2a9695478e0e8a7a7f7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1384,7 +1384,7 @@ union tcp_md5sum_block {
 /* - pool: digest algorithm, hash description and scratch buffer */
 struct tcp_md5sig_pool {
struct ahash_request*md5_req;
-   union tcp_md5sum_block  md5_blk;
+   void*scratch;
 };
 
 /* - functions */
@@ -1420,7 +1420,6 @@ static inline void tcp_put_md5sig_pool(void)
local_bh_enable();
 }
 
-int tcp_md5_hash_header(struct tcp_md5sig_pool *, const struct tcphdr *);
 int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *, const struct sk_buff *,
  unsigned int header_len);
 int tcp_md5_hash_key(struct tcp_md5sig_pool *hp,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 
5c7ed147449c1b7ba029b12e033ad779a631460a..fddc0ab76c1df82cb05dba03271b773e3b2d
 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2969,8 +2969,18 @@ static void __tcp_alloc_md5sig_pool(void)
return;
 
for_each_possible_cpu(cpu) {
+   void *scratch = per_cpu(tcp_md5sig_pool, cpu).scratch;
struct ahash_request *req;
 
+   if (!scratch) {
+   scratch = kmalloc_node(sizeof(union tcp_md5sum_block) +
+  sizeof(struct tcphdr),
+  GFP_KERNEL,
+  cpu_to_node(cpu));
+   if (!scratch)
+   return;
+   per_cpu(tcp_md5sig_pool, cpu).scratch = scratch;
+   }
if (per_cpu(tcp_md5sig_pool, cpu).md5_req)
continue;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3708de2a66833cf1d4a221a2b6ce3923bde978c4..32b048e524d6773538918eca175b3f422f9c2aa7 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1018,27 +1018,28 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char 
__user *optval,
  GFP_KERNEL);
 }
 
-static int tcp_v4_md5_hash_pseudoheader(struct tcp_md5sig_pool *hp,
-   __be32 daddr, __be32 saddr, int nbytes)
+static int tcp_v4_md5_hash_headers(struct tcp_md5sig_pool *hp,
+  __be32 daddr, __be32 saddr,
+  const struct tcphdr *th, int nbytes)
 {
struct tcp4_pseudohdr *bp;
struct scatterlist sg;
+   struct tcphdr *_th;
 
-   bp = &hp->md5_blk.ip4;
-
-   /*
-* 1. the TCP pseudo-header (in the order: source IP address,
-* destination IP address, zero-padded protocol number, and
-* segment length)
-*/
+   bp = hp->scratch;
bp->saddr = saddr;
bp->daddr = daddr;
bp->pad = 0;
bp->protocol = IPPROTO_TCP;
bp->len = cpu_to_be16(nbytes);
 
-   sg_init_one(&sg, bp, sizeof(*bp));
-   ahash_request_set_crypt(hp->md5_req, &sg, NULL, sizeof(*bp));
+   _th = (struct tcphdr *)(bp + 1);
+   memcpy(_th, th, sizeof(*th));
+   _th->check = 0;
+
+   sg_init_one(&sg, bp, sizeof(*bp) + sizeof(*th));
+   ahash_request_set_crypt(hp->md5_req, &sg, NULL,
+   sizeof(*bp) + sizeof(*th));
return crypto_ahash_update(hp->md5_req);
 }
 
@@ -1055,9 +1056,7 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const 
struct tcp_md5sig_key *key,
 
if (crypto_ahash_init(req))
goto clear_hash;
-   if (tcp_v4_md5_hash_pseudoheader(hp, daddr, saddr, th->doff << 2))
-   goto clear_hash;
-   if (tcp_md5_hash_header(hp, th))
+   if (tcp_v4_md5_hash_headers(hp, daddr, saddr, th, th->doff << 2))
goto clear_hash;
if (tcp_md5_hash_key(hp, key))
goto clear_hash;
@@ -1101,9 +1100,7 @@ int tcp_v4_md5_hash_skb(char *md5_hash, const struct 
tcp_md5sig_key *key,
if (crypto_ahash_init(req))
goto clear_hash;
 
-   if (tcp_v4_md5_hash_pseudoheader(hp, daddr, saddr, skb->len))
-   goto clear_hash;
-

[PATCH net] net: bridge: fix vlan stats continue counter

2016-06-27 Thread Nikolay Aleksandrov

I made a dumb off-by-one mistake when I added the vlan stats counter
dumping code. The increment should happen before the check, not after;
otherwise we miss one entry when we continue dumping.

Fixes: a60c090361ea ("bridge: netlink: export per-vlan stats")
Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index a5343c7232bf..85e89f693589 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1273,7 +1273,7 @@ static int br_fill_linkxstats(struct sk_buff *skb, const struct net_device *dev,
struct bridge_vlan_xstats vxi;
struct br_vlan_stats stats;
 
-   if (vl_idx++ < *prividx)
+   if (++vl_idx < *prividx)
continue;
memset(&vxi, 0, sizeof(vxi));
vxi.vid = v->vid;
-- 
2.1.4
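
The effect of the one-character change can be reproduced with a simplified userspace model of a resumable dump. Everything here (the demo_* names, the fixed entry count, the "room" limit standing in for a full netlink message) is an assumption made for illustration and is not the bridge code:

```c
#include <assert.h>

#define DEMO_NENTRIES 5

/*
 * One dump pass. Entries already dumped (index below *prividx) are skipped.
 * When the message is "full" the current index is saved in *prividx and
 * 1 is returned so the caller resumes later; 0 means the dump is complete.
 */
static int demo_dump_once(int pre_increment, int *prividx, int room, int *seen)
{
	int vl_idx = 0, used = 0, v;

	assert(room > 0);
	for (v = 0; v < DEMO_NENTRIES; v++) {
		if (pre_increment ? (++vl_idx < *prividx) : (vl_idx++ < *prividx))
			continue;
		if (used == room) {
			/* no space left for entry v: remember where to resume */
			*prividx = vl_idx;
			return 1;
		}
		seen[v]++;
		used++;
	}
	*prividx = 0;
	return 0;
}

/* Run passes until the dump completes; return how many entries were emitted. */
static int demo_dump_all(int pre_increment, int room)
{
	int seen[DEMO_NENTRIES] = { 0 };
	int prividx = 0, total = 0, v;

	while (demo_dump_once(pre_increment, &prividx, room, seen))
		;
	for (v = 0; v < DEMO_NENTRIES; v++)
		total += seen[v];
	return total;
}
```

In this model the post-increment variant saves an index one past the entry that did not fit, so that entry is skipped on resume; the pre-increment variant resumes exactly at it.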

[PATCH net-next 11/16] net/mlx5e: Create NIC global resources only once

2016-06-27 Thread Saeed Mahameed

From: Hadar Hen Zion 

To allow creating more than one netdev over the same PCI function, we
change the driver such that global NIC resources are created once and
later shared amongst all the mlx5e netdevs running over that port.

Move the CQ UAR, PD (pdn), Transport Domain (tdn), MKey resources from
being kept in the mlx5e priv part to a new resources structure
(mlx5e_resources) placed under the mlx5_core device.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c| 112 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 124 +++--
 include/linux/mlx5/driver.h|  13 +++
 5 files changed, 171 insertions(+), 90 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_common.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 96f1826..9b14dad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -6,8 +6,8 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
fs_counters.o rl.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
-   en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
-   en_rx_am.o en_txrx.o en_clock.o vxlan.o en_tc.o \
-   en_arfs.o
+   en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
+   en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
+   en_tc.o en_arfs.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index da885c0..da93bf55 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -570,10 +570,6 @@ struct mlx5e_priv {
 
unsigned long  state;
struct mutex   state_lock; /* Protects Interface state */
-   struct mlx5_uarcq_uar;
-   u32pdn;
-   u32tdn;
-   struct mlx5_core_mkey  mkey;
struct mlx5_core_mkey  umr_mkey;
struct mlx5e_rqdrop_rq;
 
@@ -788,5 +784,7 @@ int mlx5e_rx_flow_steer(struct net_device *dev, const 
struct sk_buff *skb,
 #endif
 
 u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev);
+int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
+void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
 
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
new file mode 100644
index 0000000..33b3732
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -0,0 +1,112 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "en.h"
+
+/* mlx5e global resources should be placed in this file.
+ * Global resources are common to all the netdevices created on the same nic.
+ */
+
+static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
+struct mlx5_core_mkey *mkey)
+{
+   struct mlx5_create_mkey_mbox_in *in;
+   int err;
+

Re: [PATCH] geneve: fix max_mtu setting

2016-06-27 Thread Jesse Gross

On Sun, Jun 26, 2016 at 6:13 PM, 严海双  wrote:
>
>> On Jun 26, 2016, at 8:35 PM, zhuyj  wrote:
>>
>> +   if (geneve->remote.sa.sa_family == AF_INET)
>> +   max_mtu -= sizeof(struct iphdr);
>> +   else
>> +   max_mtu -= sizeof(struct ipv6hdr);
>>
>> Sorry, but if sa_family is not AF_INET, is it necessarily AF_INET6?
>>
>> There are a lot of AF_* macros in include/linux/socket.h.
>>
>> Zhu Yanjun
>>
>
> Only two values, AF_INET and AF_INET6, are assigned in geneve_newlink:

There's actually a third possibility: AF_UNSPEC, which is the default
if neither remote type is specified. This is used by lightweight
tunnels and should be able to work with either IPv4/v6. For the
purposes of the MTU calculation this means that the IPv4 header size
should be used to avoid disallowing potentially valid configurations.
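
A minimal sketch of the selection rule described above, with a hypothetical helper name and the standard fixed header sizes (20 bytes for IPv4 without options, 40 bytes for IPv6) written out as constants:

```c
#include <assert.h>
#include <sys/socket.h>

#define DEMO_IPV4_HDR_LEN 20	/* IPv4 header without options */
#define DEMO_IPV6_HDR_LEN 40	/* fixed IPv6 header */

/*
 * Hypothetical helper: outer IP header size to subtract from the maximum MTU.
 * Only an explicit IPv6 remote uses the larger value; AF_INET and AF_UNSPEC
 * both use the IPv4 size so that valid IPv4 configurations are not rejected.
 */
static int demo_tunnel_ip_overhead(int family)
{
	return family == AF_INET6 ? DEMO_IPV6_HDR_LEN : DEMO_IPV4_HDR_LEN;
}
```

Testing for AF_INET6 rather than AF_INET is the design point: the unspecified case then falls on the smaller deduction by default.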

[PATCH net-next 02/16] net/mlx5: E-Switch, Add support for the sriov offloads mode

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

Unlike the legacy mode, here, forwarding rules are not learned by the
driver per events on macs set by VFs/VMs into their vports, but rather
should be programmed by higher-level SW entities.

That said, in the offloads mode (SRIOV_OFFLOADS), two flow
groups are still created by the driver for management (slow path) purposes:

The first group will be used for sending packets over e-switch vports
from the host OS where the e-switch management code runs, to be
received by VFs.

The second group will be used by a miss rule which forwards packets toward
the e-switch manager. Further logic will trap these packets such that
the receiving net-device as seen by the networking stack is the representor
of the vport that sent the packet over the e-switch data-path.

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  35 +++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  16 +++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 135 +
 4 files changed, 168 insertions(+), 20 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index c4f450f..96f1826 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -5,7 +5,7 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
fs_counters.o rl.o
 
-mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
+mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
en_rx_am.o en_txrx.o en_clock.o vxlan.o en_tc.o \
en_arfs.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 8068dde..1fc4cfd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -40,17 +40,6 @@
 
 #define UPLINK_VPORT 0xFFFF
 
-#define MLX5_DEBUG_ESWITCH_MASK BIT(3)
-
-#define esw_info(dev, format, ...) \
-   pr_info("(%s): E-Switch: " format, (dev)->priv.name, ##__VA_ARGS__)
-
-#define esw_warn(dev, format, ...) \
-   pr_warn("(%s): E-Switch: " format, (dev)->priv.name, ##__VA_ARGS__)
-
-#define esw_debug(dev, format, ...)\
-   mlx5_core_dbg_mask(dev, MLX5_DEBUG_ESWITCH_MASK, format, ##__VA_ARGS__)
-
 enum {
MLX5_ACTION_NONE = 0,
MLX5_ACTION_ADD  = 1,
@@ -92,6 +81,9 @@ enum {
MC_ADDR_CHANGE | \
PROMISC_CHANGE)
 
+int  esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports);
+void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw);
+
 static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, u16 vport,
u32 events_mask)
 {
@@ -578,7 +570,8 @@ static int esw_add_uc_addr(struct mlx5_eswitch *esw, struct 
vport_addr *vaddr)
if (err)
goto abort;
 
-   if (esw->fdb_table.fdb) /* SRIOV is enabled: Forward UC MAC to vport */
+   /* SRIOV is enabled: Forward UC MAC to vport */
+   if (esw->fdb_table.fdb && esw->mode == SRIOV_LEGACY)
vaddr->flow_rule = esw_fdb_set_vport_rule(esw, mac, vport);
 
esw_debug(esw->dev, "\tADDED UC MAC: vport[%d] %pM index:%d fr(%p)\n",
@@ -1543,7 +1536,7 @@ static void esw_disable_vport(struct mlx5_eswitch *esw, 
int vport_num)
 int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs, int mode)
 {
int err;
-   int i;
+   int i, enabled_events;
 
if (!esw || !MLX5_CAP_GEN(esw->dev, vport_group_manager) ||
MLX5_CAP_GEN(esw->dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
@@ -1562,18 +1555,19 @@ int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, 
int nvfs, int mode)
esw_warn(esw->dev, "E-Switch engress ACL is not supported by 
FW\n");
 
esw_info(esw->dev, "E-Switch enable SRIOV: nvfs(%d) mode (%d)\n", nvfs, 
mode);
-   if (mode != SRIOV_LEGACY)
-   return -EINVAL;
-
esw->mode = mode;
esw_disable_vport(esw, 0);
 
-   err = esw_create_legacy_fdb_table(esw, nvfs + 1);
+   if (mode == SRIOV_LEGACY)
+   err = esw_create_legacy_fdb_table(esw, nvfs + 1);
+   else
+   err = esw_create_offloads_fdb_table(esw, nvfs + 1);
if (err)
goto abort;
 
+   enabled_events = (mode == SRIOV_LEGACY) ? SRIOV_VPORT_EVENTS : 
UC_ADDR_CHANGE;
for (i = 0; i <= nvfs;

[PATCH net-next 07/16] net/mlx5: E-Switch, Add API to create vport rx rules

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

Add the API to create vport rx rules of the form

packet meta-data :: vport == $VPORT --> $TIR

where the TIR is opened by this VF representor.

This logic will be used for packets that didn't match any rule in the
e-switch datapath and should be received into the host OS through the
netdevice that represents the VF they were sent from.
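
As a rough illustration of the rule form above (vport == $VPORT --> $TIR),
here is a minimal user-space sketch. It is not the driver code: struct
rx_rule and rx_lookup() are hypothetical names, and the all-ones mask only
mirrors the idea of MLX5_SET_TO_ONES on source_port.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in mlx5. */
struct rx_rule {
	uint16_t match_value;	/* source vport the rule matches on */
	uint16_t match_mask;	/* all ones: exact match on the vport */
	uint32_t tirn;		/* TIR opened by the VF representor */
};

/* Return the TIR for a packet that came from src_vport, or -1 on a miss. */
static int64_t rx_lookup(const struct rx_rule *rules, size_t n,
			 uint16_t src_vport)
{
	for (size_t i = 0; i < n; i++) {
		if ((src_vport & rules[i].match_mask) == rules[i].match_value)
			return rules[i].tirn;
	}
	return -1;
}
```

With one such rule per VF, a packet that missed in the e-switch and carries
source vport 2 in its meta-data lands on the TIR registered for vport 2.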

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  4 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 85 ++
 2 files changed, 89 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 32db37a..cf959f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -157,6 +157,7 @@ enum {
 
 struct mlx5_esw_offload {
struct mlx5_flow_table *ft_offloads;
+   struct mlx5_flow_group *vport_rx_group;
 };
 
 struct mlx5_eswitch {
@@ -201,6 +202,9 @@ int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
 struct mlx5_flow_rule *
 mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, int vport, u32 
sqn);
 
+struct mlx5_flow_rule *
+mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 
tirn);
+
 #define MLX5_DEBUG_ESWITCH_MASK BIT(3)
 
 #define esw_info(dev, format, ...) \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 3ca926b..67ff1e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -245,3 +245,88 @@ static void esw_destroy_offloads_table(struct mlx5_eswitch 
*esw)
 
mlx5_destroy_flow_table(offloads->ft_offloads);
 }
+
+static int esw_create_vport_rx_group(struct mlx5_eswitch *esw)
+{
+   int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+   struct mlx5_flow_group *g;
+   struct mlx5_priv *priv = &esw->dev->priv;
+   u32 *flow_group_in;
+   void *match_criteria, *misc;
+   int err = 0;
+   int nvports = priv->sriov.num_vfs + 2;
+
+   flow_group_in = mlx5_vzalloc(inlen);
+   if (!flow_group_in)
+   return -ENOMEM;
+
+   /* create vport rx group */
+   memset(flow_group_in, 0, inlen);
+   MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable,
+MLX5_MATCH_MISC_PARAMETERS);
+
+   match_criteria = MLX5_ADDR_OF(create_flow_group_in, flow_group_in, 
match_criteria);
+   misc = MLX5_ADDR_OF(fte_match_param, match_criteria, misc_parameters);
+   MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
+
+   MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 0);
+   MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, nvports - 
1);
+
+   g = mlx5_create_flow_group(esw->offloads.ft_offloads, flow_group_in);
+
+   if (IS_ERR(g)) {
+   err = PTR_ERR(g);
+   mlx5_core_warn(esw->dev, "Failed to create vport rx group err 
%d\n", err);
+   goto out;
+   }
+
+   esw->offloads.vport_rx_group = g;
+out:
+   kfree(flow_group_in);
+   return err;
+}
+
+static void esw_destroy_vport_rx_group(struct mlx5_eswitch *esw)
+{
+   mlx5_destroy_flow_group(esw->offloads.vport_rx_group);
+}
+
+struct mlx5_flow_rule *
+mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 
tirn)
+{
+   struct mlx5_flow_destination dest;
+   struct mlx5_flow_rule *flow_rule;
+   int match_header = MLX5_MATCH_MISC_PARAMETERS;
+   u32 *match_v, *match_c;
+   void *misc;
+
+   match_v = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   if (!match_v || !match_c) {
+   esw_warn(esw->dev, "Failed to alloc match parameters\n");
+   flow_rule = ERR_PTR(-ENOMEM);
+   goto out;
+   }
+
+   misc = MLX5_ADDR_OF(fte_match_param, match_v, misc_parameters);
+   MLX5_SET(fte_match_set_misc, misc, source_port, vport);
+
+   misc = MLX5_ADDR_OF(fte_match_param, match_c, misc_parameters);
+   MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
+
+   dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
+   dest.tir_num = tirn;
+
+   flow_rule = mlx5_add_flow_rule(esw->offloads.ft_offloads, match_header, match_c,
+  match_v, MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
+  0, &dest);
+   if (IS_ERR(flow_rule)) {
+   esw_warn(esw->dev, "fs offloads: Failed to add vport rx rule 
err %ld\n", PTR_ERR(flow_rule));
+   goto out;
+   }
+
+out:
+   kfree(match_v);
+   kfree(match_c);
+

[PATCH net-next 10/16] net/mlx5e: Add devlink based SRIOV mode changes (legacy --> offloads)

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

Implement handlers for the devlink commands to get and set the SRIOV
E-Switch mode.

When turning to the offloads mode, we disable the e-switch and enable
it again in the new mode, create the NIC offloads table and create VF reps.

When turning to legacy mode, we remove the VF reps and the offloads
table, and re-initiate the e-switch in its legacy mode.

The actual creation/removal of the VF reps is done in downstream patches.
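
The disable-then-enable sequence described above can be sketched in user
space as follows. This is only a model under assumptions: struct esw_model
and its helpers are hypothetical, a single flag stands in for the NIC
offloads table, and treating a request for the current mode as a no-op is a
choice made for this sketch rather than something the patch states.

```c
#include <errno.h>

/* Hypothetical model of the mode switch; these names are not in mlx5. */
enum esw_mode { ESW_NONE, ESW_LEGACY, ESW_OFFLOADS };

struct esw_model {
	enum esw_mode mode;
	int nvfs;
	int offloads_table;	/* 1 while the NIC offloads table exists */
};

static void esw_model_disable(struct esw_model *e)
{
	e->offloads_table = 0;	/* offloads table goes away with the mode */
	e->mode = ESW_NONE;
}

static void esw_model_enable(struct esw_model *e, int nvfs, enum esw_mode mode)
{
	e->nvfs = nvfs;
	e->offloads_table = (mode == ESW_OFFLOADS);
	e->mode = mode;
}

/* Switch modes by tearing the e-switch down and bringing it up again. */
static int esw_model_set_mode(struct esw_model *e, enum esw_mode new_mode)
{
	int nvfs = e->nvfs;

	if (e->mode == ESW_NONE || new_mode == ESW_NONE)
		return -EINVAL;	/* SRIOV must already be enabled */
	if (e->mode == new_mode)
		return 0;	/* assumption of this sketch: no-op */

	esw_model_disable(e);
	esw_model_enable(e, nvfs, new_mode);
	return 0;
}
```

The number of VFs is read before the disable step so that the e-switch comes
back with the same vports in the new mode.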

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  12 ++-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 102 -
 2 files changed, 105 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 1fc4cfd..12f509c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -81,8 +81,8 @@ enum {
MC_ADDR_CHANGE | \
PROMISC_CHANGE)
 
-int  esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports);
-void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw);
+int esw_offloads_init(struct mlx5_eswitch *esw, int nvports);
+void esw_offloads_cleanup(struct mlx5_eswitch *esw, int nvports);
 
 static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, u16 vport,
u32 events_mask)
@@ -1561,7 +1561,7 @@ int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, 
int nvfs, int mode)
if (mode == SRIOV_LEGACY)
err = esw_create_legacy_fdb_table(esw, nvfs + 1);
else
-   err = esw_create_offloads_fdb_table(esw, nvfs + 1);
+   err = esw_offloads_init(esw, nvfs + 1);
if (err)
goto abort;
 
@@ -1581,6 +1581,7 @@ abort:
 void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw)
 {
struct esw_mc_addr *mc_promisc;
+   int nvports;
int i;
 
if (!esw || !MLX5_CAP_GEN(esw->dev, vport_group_manager) ||
@@ -1591,6 +1592,7 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw)
 esw->enabled_vports, esw->mode);
 
mc_promisc = esw->mc_promisc;
+   nvports = esw->enabled_vports;
 
for (i = 0; i < esw->total_vports; i++)
esw_disable_vport(esw, i);
@@ -1600,8 +1602,8 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw)
 
if (esw->mode == SRIOV_LEGACY)
esw_destroy_legacy_fdb_table(esw);
-   else
-   esw_destroy_offloads_fdb_table(esw);
+   else if (esw->mode == SRIOV_OFFLOADS)
+   esw_offloads_cleanup(esw, nvports);
 
esw->mode = SRIOV_NONE;
/* VPORT 0 (PF) must be enabled back with non-sriov configuration */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 3b3afbd..a39af6b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -114,7 +114,7 @@ out:
 
 #define MAX_PF_SQ 256
 
-int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
+static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
 {
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
struct mlx5_core_dev *dev = esw->dev;
@@ -202,7 +202,7 @@ ns_err:
return err;
 }
 
-void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
+static void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
 {
if (!esw->fdb_table.fdb)
return;
@@ -331,12 +331,106 @@ out:
return flow_rule;
 }
 
+static int esw_offloads_start(struct mlx5_eswitch *esw)
+{
+   int err, num_vfs = esw->dev->priv.sriov.num_vfs;
+
+   if (esw->mode != SRIOV_LEGACY) {
+   esw_warn(esw->dev, "Can't set offloads mode, SRIOV legacy not 
enabled\n");
+   return -EINVAL;
+   }
+
+   mlx5_eswitch_disable_sriov(esw);
+   err = mlx5_eswitch_enable_sriov(esw, num_vfs, SRIOV_OFFLOADS);
+   if (err)
+   esw_warn(esw->dev, "Failed set eswitch to offloads, err %d\n", 
err);
+   return err;
+}
+
+int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
+{
+   int err;
+
+   err = esw_create_offloads_fdb_table(esw, nvports);
+   if (err)
+   return err;
+
+   err = esw_create_offloads_table(esw);
+   if (err)
+   goto create_ft_err;
+
+   err = esw_create_vport_rx_group(esw);
+   if (err)
+   goto create_fg_err;
+
+   return 0;
+
+create_fg_err:
+   esw_destroy_offloads_table(esw);
+
+create_ft_err:
+   esw_destroy_offloads_fdb_table(esw);
+   return err;
+}
+
+static int esw_offloads_stop(struct mlx5_eswitch *esw)
+{
+

[PATCH net-next 14/16] net/mlx5e: Add support for multiple profiles

2016-06-27 Thread Saeed Mahameed

From: Hadar Hen Zion 

To allow support in representor netdevices where we create more than one
netdevice per NIC, add profiles to the mlx5e driver. The profiling
allows for creation of mlx5e instances with different characteristics.

Each profile implements its own behavior using a set of function pointers
defined in struct mlx5e_profile. This avoids complex per-profile branching
in the code.

Currently only the profile for the conventional NIC is implemented,
which is of use when a netdev is created upon pci probe.

This patch doesn't add any new functionality.
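
To make the function-pointer idea concrete, a minimal user-space sketch
follows. It is not mlx5e code: struct nic_profile, struct nic_model and the
"plain" callbacks are hypothetical, only a few of the callbacks are modelled,
and the max_tc value of 8 is an arbitrary choice for the example.

```c
#include <stddef.h>

struct nic_model;

/* Hypothetical, trimmed-down analogue of a per-instance profile. */
struct nic_profile {
	int  (*init_rx)(struct nic_model *nic);
	void (*update_stats)(struct nic_model *nic);
	int  max_tc;
};

struct nic_model {
	const struct nic_profile *profile;
	int rx_ready;
	unsigned long stats_updates;
};

static int plain_init_rx(struct nic_model *nic)
{
	nic->rx_ready = 1;
	return 0;
}

static void plain_update_stats(struct nic_model *nic)
{
	nic->stats_updates++;
}

/* The only profile in this sketch, standing in for the conventional NIC. */
static const struct nic_profile plain_nic_profile = {
	.init_rx	= plain_init_rx,
	.update_stats	= plain_update_stats,
	.max_tc		= 8,	/* arbitrary value for the example */
};

/* Common code never branches on the instance type; it calls the profile. */
static int nic_model_attach(struct nic_model *nic, const struct nic_profile *p)
{
	nic->profile = p;
	nic->rx_ready = 0;
	nic->stats_updates = 0;
	return nic->profile->init_rx(nic);
}

static void nic_model_tick(struct nic_model *nic)
{
	nic->profile->update_stats(nic);
}
```

A second profile would only need its own callbacks and limits; the attach
and tick paths stay unchanged.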

Signed-off-by: Hadar Hen Zion 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  17 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 341 ++
 2 files changed, 240 insertions(+), 118 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 1843a4c..8d4d2b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -568,6 +568,22 @@ enum {
MLX5E_NIC_PRIO
 };
 
+struct mlx5e_profile {
+   void(*init)(struct mlx5_core_dev *mdev,
+   struct net_device *netdev,
+   const struct mlx5e_profile *profile);
+   void(*cleanup)(struct mlx5e_priv *priv);
+   int (*init_rx)(struct mlx5e_priv *priv);
+   void(*cleanup_rx)(struct mlx5e_priv *priv);
+   int (*init_tx)(struct mlx5e_priv *priv);
+   void(*cleanup_tx)(struct mlx5e_priv *priv);
+   void(*enable)(struct mlx5e_priv *priv);
+   void(*disable)(struct mlx5e_priv *priv);
+   void(*update_stats)(struct mlx5e_priv *priv);
+   int (*max_nch)(struct mlx5_core_dev *mdev);
+   int max_tc;
+};
+
 struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_sq**txq_to_sq_map;
@@ -601,6 +617,7 @@ struct mlx5e_priv {
struct mlx5e_stats stats;
struct mlx5e_tstamptstamp;
u16 q_counter;
+   const struct mlx5e_profile *profile;
 };
 
 enum mlx5e_link_mode {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index db890b2..8ffe68b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -234,7 +234,7 @@ static void mlx5e_update_stats_work(struct work_struct 
*work)
   update_stats_work);
mutex_lock(&priv->state_lock);
if (test_bit(MLX5E_STATE_OPENED, &priv->state)) {
-   mlx5e_update_stats(priv);
+   priv->profile->update_stats(priv);
queue_delayed_work(priv->wq, dwork,
   
msecs_to_jiffies(MLX5E_UPDATE_STATS_INTERVAL));
}
@@ -1037,7 +1037,7 @@ static void mlx5e_build_channeltc_to_txq_map(struct 
mlx5e_priv *priv, int ix)
 {
int i;
 
-   for (i = 0; i < MLX5E_MAX_NUM_TC; i++)
+   for (i = 0; i < priv->profile->max_tc; i++)
priv->channeltc_to_txq_map[ix][i] =
ix + i * priv->params.num_channels;
 }
@@ -1525,21 +1525,20 @@ static void mlx5e_destroy_rqt(struct mlx5e_priv *priv, 
struct mlx5e_rqt *rqt)
mlx5_core_destroy_rqt(priv->mdev, rqt->rqtn);
 }
 
-static int mlx5e_create_rqts(struct mlx5e_priv *priv)
+static int mlx5e_create_indirect_rqts(struct mlx5e_priv *priv)
+{
+   struct mlx5e_rqt *rqt = &priv->indir_rqt;
+
+   return mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, 0, rqt);
+}
+
+static int mlx5e_create_direct_rqts(struct mlx5e_priv *priv)
 {
-   int nch = mlx5e_get_max_num_channels(priv->mdev);
struct mlx5e_rqt *rqt;
int err;
int ix;
 
-   /* Indirect RQT */
-   rqt = &priv->indir_rqt;
-   err = mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, 0, rqt);
-   if (err)
-   return err;
-
-   /* Direct RQTs */
-   for (ix = 0; ix < nch; ix++) {
+   for (ix = 0; ix < priv->profile->max_nch(priv->mdev); ix++) {
rqt = &priv->direct_tir[ix].rqt;
err = mlx5e_create_rqt(priv, 1 /*size */, ix, rqt);
if (err)
@@ -1552,22 +1551,9 @@ err_destroy_rqts:
for (ix--; ix >= 0; ix--)
mlx5e_destroy_rqt(priv, &priv->direct_tir[ix].rqt);
 
-   mlx5e_destroy_rqt(priv, &priv->indir_rqt);
-
return err;
 }
 
-static void mlx5e_destroy_rqts(struct mlx5e_priv *priv)
-{
-   int nch = mlx5e_get_max_num_channels(priv->mdev);
-   int i;
-
-   for (i = 0; i < nch; i++)
-   mlx5e_destroy_rqt(priv, &priv->direct_tir[i].rqt);
-
-   mlx5e_destroy_rqt(priv, &priv->indir_rqt);
-}
-
 int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, int ix)
 {
struct mlx5_core_dev *mdev = priv->mdev;
@@ -1677,7 +1663,7

[PATCH net-next 12/16] net/mlx5e: TIRs management refactoring

2016-06-27 Thread Saeed Mahameed

From: Hadar Hen Zion 

The current refresh tirs self loopback mechanism refreshes all the tirs
belonging to the same mlx5e instance to prevent self loopback by packets
sent over any ring of that instance. This mechanism relies on all the
tirs/tises of an instance to be created with the same transport domain
number (tdn).

Change the driver to refresh all the tirs created under the same tdn
regardless of which mlx5e netdev instance they belong to.

This behaviour is needed for introducing new mlx5e instances which serve
to represent SRIOV VFs. The representors and the PF share the vport used for
E-Switch management, and we want to avoid NIC level HW loopback between
them, e.g when sending broadcast packets. To achieve that, both the
representors and the PF NIC will share the tdn.

This patch doesn't add any new functionality.
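
The change from per-instance to per-tdn refresh can be pictured with a small
user-space sketch. The names are hypothetical (struct tdn_model, struct
tir_model), and a plain flag stands in for the self-loopback setting that
the refresh applies to each TIR.

```c
#include <stddef.h>

/* Hypothetical model: one list of TIRs per transport domain (tdn). */
struct tir_model {
	unsigned int tirn;
	int owner;		/* which netdev instance created it */
	int self_lb_blocked;	/* set when the refresh reaches this TIR */
	struct tir_model *next;
};

struct tdn_model {
	struct tir_model *tirs;	/* every TIR created under this tdn */
};

/* Creation registers the TIR on the tdn-wide list, whoever the owner is. */
static void tdn_add_tir(struct tdn_model *td, struct tir_model *tir)
{
	tir->self_lb_blocked = 0;
	tir->next = td->tirs;
	td->tirs = tir;
}

/* Refresh walks the shared list, so it covers all instances on the tdn. */
static int tdn_refresh_tirs(struct tdn_model *td)
{
	int refreshed = 0;

	for (struct tir_model *t = td->tirs; t; t = t->next) {
		t->self_lb_blocked = 1;
		refreshed++;
	}
	return refreshed;
}
```

Because the list hangs off the transport domain rather than a netdev, TIRs
created by a representor and by the PF NIC are refreshed in the same pass.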

Signed-off-by: Hadar Hen Zion 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 12 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 14 +++---
 .../net/ethernet/mellanox/mlx5/core/en_common.c| 48 +++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c|  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 56 +-
 6 files changed, 77 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index da93bf55..ded3f96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -552,9 +552,10 @@ struct mlx5e_flow_steering {
struct mlx5e_arfs_tablesarfs;
 };
 
-struct mlx5e_direct_tir {
+struct mlx5e_tir {
u32  tirn;
u32  rqtn;
+   struct list_head list;
 };
 
 enum {
@@ -576,8 +577,8 @@ struct mlx5e_priv {
struct mlx5e_channel **channel;
u32tisn[MLX5E_MAX_NUM_TC];
u32indir_rqtn;
-   u32indir_tirn[MLX5E_NUM_INDIR_TIRS];
-   struct mlx5e_direct_tirdirect_tir[MLX5E_MAX_NUM_CHANNELS];
+   struct mlx5e_tir   indir_tir[MLX5E_NUM_INDIR_TIRS];
+   struct mlx5e_tir   direct_tir[MLX5E_MAX_NUM_CHANNELS];
u32tx_rates[MLX5E_MAX_NUM_SQS];
 
struct mlx5e_flow_steering fs;
@@ -784,7 +785,12 @@ int mlx5e_rx_flow_steer(struct net_device *dev, const 
struct sk_buff *skb,
 #endif
 
 u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev);
+int mlx5e_create_tir(struct mlx5_core_dev *mdev,
+struct mlx5e_tir *tir, u32 *in, int inlen);
+void mlx5e_destroy_tir(struct mlx5_core_dev *mdev,
+  struct mlx5e_tir *tir);
 int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
 void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
+int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev);
 
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 3515e78..10f18d4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -93,14 +93,14 @@ static enum mlx5e_traffic_types arfs_get_tt(enum arfs_type 
type)
 static int arfs_disable(struct mlx5e_priv *priv)
 {
struct mlx5_flow_destination dest;
-   u32 *tirn = priv->indir_tirn;
+   struct mlx5e_tir *tir = priv->indir_tir;
int err = 0;
int tt;
int i;
 
dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
for (i = 0; i < ARFS_NUM_TYPES; i++) {
-   dest.tir_num = tirn[i];
+   dest.tir_num = tir[i].tirn;
tt = arfs_get_tt(i);
/* Modify ttc rules destination to bypass the aRFS tables*/
err = mlx5_modify_rule_destination(priv->fs.ttc.rules[tt],
@@ -176,7 +176,7 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
struct arfs_table *arfs_t = &priv->fs.arfs.arfs_tables[type];
struct mlx5_flow_destination dest;
u8 match_criteria_enable = 0;
-   u32 *tirn = priv->indir_tirn;
+   struct mlx5e_tir *tir = priv->indir_tir;
u32 *match_criteria;
u32 *match_value;
int err = 0;
@@ -192,16 +192,16 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
switch (type) {
case ARFS_IPV4_TCP:
-   dest.tir_num = tirn[MLX5E_TT_IPV4_TCP];
+   dest.tir_num = tir[MLX5E_TT_IPV4_TCP].tirn;
break;
case ARFS_IPV4_UDP:
-   dest.tir_num = tirn[MLX5E_TT_IPV4_UDP];
+   dest.tir_num = tir[MLX5E_TT_IPV4_UDP].tirn;
break;
case

[PATCH net-next 09/16] net/mlx5: Add devlink interface

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

The devlink interface is initially used to set/get the mode of the SRIOV 
e-switch.

Currently, these are only stubs for get/set; a downstream patch will actually
fill them out.
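
A small user-space sketch of what "stubs" means for a caller is given below.
It does not use the real struct devlink_ops: struct mode_ops_model and
model_mode_get() are hypothetical, and treating a missing callback the same
as a stub is an assumption of this model.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ops table; not the real struct devlink_ops. */
struct mode_ops_model {
	int (*mode_set)(void *priv, uint16_t mode);
	int (*mode_get)(void *priv, uint16_t *mode);
};

/* Stubs in the spirit of this patch: registered, but not implemented yet. */
static int stub_mode_set(void *priv, uint16_t mode)
{
	(void)priv;
	(void)mode;
	return -EOPNOTSUPP;
}

static int stub_mode_get(void *priv, uint16_t *mode)
{
	(void)priv;
	(void)mode;
	return -EOPNOTSUPP;
}

static const struct mode_ops_model stub_ops = {
	.mode_set = stub_mode_set,
	.mode_get = stub_mode_get,
};

/* No callbacks at all, like a table built without the #ifdef'd entries. */
static const struct mode_ops_model empty_ops = { NULL, NULL };

/* Caller side: a missing callback and a stub both say "not supported". */
static int model_mode_get(const struct mode_ops_model *ops, void *priv,
			  uint16_t *mode)
{
	if (!ops->mode_get)
		return -EOPNOTSUPP;
	return ops->mode_get(priv, mode);
}
```

Either way user space sees the same error until a later patch supplies real
handlers, so wiring the interface first does not change behaviour.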

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|  1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  4 
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 10 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 26 ++
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 1cf722e..aae4688 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -4,6 +4,7 @@
 
 config MLX5_CORE
tristate "Mellanox Technologies ConnectX-4 and Connect-IB core driver"
+   depends on MAY_USE_DEVLINK
depends on PCI
default n
---help---
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index cf959f7..7843f98 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -35,6 +35,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #define MLX5_MAX_UC_PER_VPORT(dev) \
@@ -205,6 +206,9 @@ mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch 
*esw, int vport, u32 sqn
 struct mlx5_flow_rule *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 
tirn);
 
+int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode);
+int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode);
+
 #define MLX5_DEBUG_ESWITCH_MASK BIT(3)
 
 #define esw_info(dev, format, ...) \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 67ff1e8..3b3afbd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -330,3 +330,13 @@ out:
kfree(match_c);
return flow_rule;
 }
+
+int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
+{
+   return -EOPNOTSUPP;
+}
+
+int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
+{
+   return -EOPNOTSUPP;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 08cae34..2abd387 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -51,6 +51,7 @@
 #ifdef CONFIG_RFS_ACCEL
 #include 
 #endif
+#include 
 #include "mlx5_core.h"
 #include "fs_core.h"
 #ifdef CONFIG_MLX5_CORE_EN
@@ -1315,19 +1316,28 @@ struct mlx5_core_event_handler {
  void *data);
 };
 
+static const struct devlink_ops mlx5_devlink_ops = {
+#ifdef CONFIG_MLX5_CORE_EN
+   .eswitch_mode_set = mlx5_devlink_eswitch_mode_set,
+   .eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
+#endif
+};
 
 static int init_one(struct pci_dev *pdev,
const struct pci_device_id *id)
 {
struct mlx5_core_dev *dev;
+   struct devlink *devlink;
struct mlx5_priv *priv;
int err;
 
-   dev = kzalloc(sizeof(*dev), GFP_KERNEL);
-   if (!dev) {
+   devlink = devlink_alloc(&mlx5_devlink_ops, sizeof(*dev));
+   if (!devlink) {
dev_err(&pdev->dev, "kzalloc failed\n");
return -ENOMEM;
}
+
+   dev = devlink_priv(devlink);
priv = &dev->priv;
priv->pci_dev_data = id->driver_data;
 
@@ -1364,15 +1374,21 @@ static int init_one(struct pci_dev *pdev,
goto clean_health;
}
 
+   err = devlink_register(devlink, &pdev->dev);
+   if (err)
+   goto clean_load;
+
return 0;
 
+clean_load:
+   mlx5_unload_one(dev, priv);
 clean_health:
mlx5_health_cleanup(dev);
 close_pci:
mlx5_pci_close(dev, priv);
 clean_dev:
pci_set_drvdata(pdev, NULL);
-   kfree(dev);
+   devlink_free(devlink);
 
return err;
 }
@@ -1380,8 +1396,10 @@ clean_dev:
 static void remove_one(struct pci_dev *pdev)
 {
struct mlx5_core_dev *dev  = pci_get_drvdata(pdev);
+   struct devlink *devlink = priv_to_devlink(dev);
struct mlx5_priv *priv = &dev->priv;
 
+   devlink_unregister(devlink);
if (mlx5_unload_one(dev, priv)) {
dev_err(&dev->pdev->dev, "mlx5_unload_one failed\n");
mlx5_health_cleanup(dev);
@@ -1390,7 +1408,7 @@ static void remove_one(struct pci_dev *pdev)
mlx5_health_cleanup(dev);
mlx5_pci_close(dev, priv);
pci_set_drvdata(pdev, NULL);
-   kfree(dev);
+   devlink_free(devlink);
 }
 
 static pci_ers_result_t

[PATCH net-next 05/16] net/mlx5: Introduce offloads steering namespace

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

Add a new namespace (MLX5_FLOW_NAMESPACE_OFFLOADS) to be populated
with flow steering rules that have to be executed before the EN NIC
steering rules are matched.

The namespace is located after the bypass name-space and before the
kernel name-space. Therefore, it precedes the HW processing done for
rules set for the kernel NIC name-space.

Under SRIOV, it would allow us to match on e-switch missed packets
and forward them to the relevant VF representor TIR.
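
The ordering argument can be reduced to a very small model. The sketch below
is a simplification under stated assumptions: it treats each name-space as
either holding a matching rule or not, and lets the earliest one win. The
enum and first_matching_ns() are hypothetical and say nothing about how the
steering tree is actually walked in HW.

```c
/* Hypothetical ordering model: lower value means processed earlier. */
enum ns_model {
	NS_BYPASS,
	NS_OFFLOADS,	/* sits after bypass and before kernel */
	NS_KERNEL,
	NS_COUNT
};

/*
 * has_rule[ns] is nonzero if that name-space holds a rule matching the
 * packet. Returns the first name-space that claims it, or -1 if none does.
 */
static int first_matching_ns(const int has_rule[NS_COUNT])
{
	for (int ns = 0; ns < NS_COUNT; ns++) {
		if (has_rule[ns])
			return ns;
	}
	return -1;
}
```

In this model a packet matched by both an offloads rule and a kernel NIC
rule is claimed by the offloads name-space, which is the property the
representor TIR forwarding relies on.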

Signed-off-by: Or Gerlitz 
Signed-off-by: Amir Vadai 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 11 ++-
 include/linux/mlx5/fs.h   |  1 +
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index e912a3d..b040110 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -83,6 +83,11 @@
 #define ANCHOR_NUM_LEVELS 1
 #define ANCHOR_NUM_PRIOS 1
 #define ANCHOR_MIN_LEVEL (BY_PASS_MIN_LEVEL + 1)
+
+#define OFFLOADS_MAX_FT 1
+#define OFFLOADS_NUM_PRIOS 1
+#define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + 1)
+
 struct node_caps {
size_t  arr_sz;
long*caps;
@@ -98,7 +103,7 @@ static struct init_tree_node {
int num_levels;
 } root_fs = {
.type = FS_TYPE_NAMESPACE,
-   .ar_size = 4,
+   .ar_size = 5,
.children = (struct init_tree_node[]) {
ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
 
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
@@ -107,6 +112,9 @@ static struct init_tree_node {
  
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
 ADD_NS(ADD_MULTIPLE_PRIO(MLX5_BY_PASS_NUM_PRIOS,
  BY_PASS_PRIO_NUM_LEVELS))),
+   ADD_PRIO(0, OFFLOADS_MIN_LEVEL, 0, {},
+ADD_NS(ADD_MULTIPLE_PRIO(OFFLOADS_NUM_PRIOS, 
OFFLOADS_MAX_FT))),
+
ADD_PRIO(0, KERNEL_MIN_LEVEL, 0, {},
 ADD_NS(ADD_MULTIPLE_PRIO(1, 1),
ADD_MULTIPLE_PRIO(KERNEL_NIC_NUM_PRIOS,
@@ -1369,6 +1377,7 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 
switch (type) {
case MLX5_FLOW_NAMESPACE_BYPASS:
+   case MLX5_FLOW_NAMESPACE_OFFLOADS:
case MLX5_FLOW_NAMESPACE_KERNEL:
case MLX5_FLOW_NAMESPACE_LEFTOVERS:
case MLX5_FLOW_NAMESPACE_ANCHOR:
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 4b7a107..6ad1119 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -54,6 +54,7 @@ static inline void build_leftovers_ft_param(int *priority,
 
 enum mlx5_flow_namespace_type {
MLX5_FLOW_NAMESPACE_BYPASS,
+   MLX5_FLOW_NAMESPACE_OFFLOADS,
MLX5_FLOW_NAMESPACE_KERNEL,
MLX5_FLOW_NAMESPACE_LEFTOVERS,
MLX5_FLOW_NAMESPACE_ANCHOR,
-- 
2.8.0

[PATCH net-next 13/16] net/mlx5e: Mark enabled RQTs instances explicitly

2016-06-27 Thread Saeed Mahameed

From: Hadar Hen Zion 

In the current driver implementation two types of receive queue
tables (RQTs) are in use - direct and indirect.

Change the driver to mark each newly created RQT (direct or indirect)
as "enabled". This behaviour is needed for introducing new mlx5e
instances which serve to represent SRIOV VFs.

The VF representors will have only one type of RQTs (direct).

An "enabled" flag is added to each RQT to allow better handling
and code sharing between the representors and the nic netdevices.

This patch doesn't add any new functionality.
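
How an "enabled" flag helps shared code can be shown with a short user-space
sketch. The names (struct rqt_model and its helpers) are hypothetical, and
the teardown helper is an illustration of the kind of sharing the flag
permits, not a function from this series.

```c
#include <stddef.h>

/* Hypothetical model of an RQT that remembers whether it was created. */
struct rqt_model {
	unsigned int rqtn;
	int enabled;	/* set on successful create, cleared on destroy */
};

static void rqt_model_create(struct rqt_model *rqt, unsigned int rqtn)
{
	rqt->rqtn = rqtn;
	rqt->enabled = 1;
}

static void rqt_model_destroy(struct rqt_model *rqt)
{
	rqt->enabled = 0;
}

/* Shared teardown: skip tables this instance never created. */
static size_t rqt_model_destroy_enabled(struct rqt_model *rqts, size_t n)
{
	size_t destroyed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!rqts[i].enabled)
			continue;
		rqt_model_destroy(&rqts[i]);
		destroyed++;
	}
	return destroyed;
}
```

An instance that only ever creates direct RQTs can run the same teardown as
the NIC netdevice; the tables it never created are simply skipped.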

Signed-off-by: Hadar Hen Zion 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 13 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 45 +-
 3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ded3f96..1843a4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -552,10 +552,15 @@ struct mlx5e_flow_steering {
struct mlx5e_arfs_tablesarfs;
 };
 
-struct mlx5e_tir {
-   u32  tirn;
+struct mlx5e_rqt {
u32  rqtn;
-   struct list_head list;
+   bool enabled;
+};
+
+struct mlx5e_tir {
+   u32   tirn;
+   struct mlx5e_rqt  rqt;
+   struct list_head  list;
 };
 
 enum {
@@ -576,7 +581,7 @@ struct mlx5e_priv {
 
struct mlx5e_channel **channel;
u32tisn[MLX5E_MAX_NUM_TC];
-   u32indir_rqtn;
+   struct mlx5e_rqt   indir_rqt;
struct mlx5e_tir   indir_tir[MLX5E_NUM_INDIR_TIRS];
struct mlx5e_tir   direct_tir[MLX5E_MAX_NUM_CHANNELS];
u32tx_rates[MLX5E_MAX_NUM_SQS];
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 877cf68..7c5c477 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -898,7 +898,7 @@ static int mlx5e_set_rxfh(struct net_device *dev, const u32 
*indir,
mutex_lock(&priv->state_lock);
 
if (indir) {
-   u32 rqtn = priv->indir_rqtn;
+   u32 rqtn = priv->indir_rqt.rqtn;
 
memcpy(priv->params.indirection_rqt, indir,
   sizeof(priv->params.indirection_rqt));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 808dff4..db890b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1487,7 +1487,8 @@ static void mlx5e_fill_direct_rqt_rqn(struct mlx5e_priv 
*priv, void *rqtc,
MLX5_SET(rqtc, rqtc, rq_num[0], rqn);
 }
 
-static int mlx5e_create_rqt(struct mlx5e_priv *priv, int sz, int ix, u32 *rqtn)
+static int mlx5e_create_rqt(struct mlx5e_priv *priv, int sz,
+   int ix, struct mlx5e_rqt *rqt)
 {
struct mlx5_core_dev *mdev = priv->mdev;
void *rqtc;
@@ -1510,34 +1511,37 @@ static int mlx5e_create_rqt(struct mlx5e_priv *priv, 
int sz, int ix, u32 *rqtn)
else
mlx5e_fill_direct_rqt_rqn(priv, rqtc, ix);
 
-   err = mlx5_core_create_rqt(mdev, in, inlen, rqtn);
+   err = mlx5_core_create_rqt(mdev, in, inlen, &rqt->rqtn);
+   if (!err)
+   rqt->enabled = true;
 
kvfree(in);
return err;
 }
 
-static void mlx5e_destroy_rqt(struct mlx5e_priv *priv, u32 rqtn)
+static void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt)
 {
-   mlx5_core_destroy_rqt(priv->mdev, rqtn);
+   rqt->enabled = false;
+   mlx5_core_destroy_rqt(priv->mdev, rqt->rqtn);
 }
 
 static int mlx5e_create_rqts(struct mlx5e_priv *priv)
 {
int nch = mlx5e_get_max_num_channels(priv->mdev);
-   u32 *rqtn;
+   struct mlx5e_rqt *rqt;
int err;
int ix;
 
/* Indirect RQT */
-   rqtn = &priv->indir_rqtn;
-   err = mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, 0, rqtn);
+   rqt = &priv->indir_rqt;
+   err = mlx5e_create_rqt(priv, MLX5E_INDIR_RQT_SIZE, 0, rqt);
if (err)
return err;
 
/* Direct RQTs */
for (ix = 0; ix < nch; ix++) {
-   rqtn = &priv->direct_tir[ix].rqtn;
-   err = mlx5e_create_rqt(priv, 1 /*size */, ix, rqtn);
+   rqt = &priv->direct_tir[ix].rqt;
+   err = mlx5e_create_rqt(priv, 1 /*size */, ix, rqt);
if (err)
goto err_destroy_rqts;
}
@@ -1546,9 +1550,9 @@ static int mlx5e_create_rqts(struct mlx5e_priv *priv)

[PATCH net-next 03/16] net/mlx5: E-Switch, Add miss rule for offloads mode

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

In the sriov offloads mode, packets that are not matched by any other
rule should be sent towards the e-switch manager for further processing.

Add such "miss" rule which matches ANY packet as the last rule in the
e-switch FDB and programs the HW to send the packet to vport 0 where
the e-switch manager runs.
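
The effect of a match-ANY rule placed last can be sketched in user space as
below. This is a model only: struct fdb_rule_model and fdb_model_lookup()
are hypothetical, a 64-bit key stands in for the packet headers, and an
all-zero mask plays the role of match_header = 0 with empty match criteria.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an FDB entry: masked match, vport destination. */
struct fdb_rule_model {
	uint64_t value;
	uint64_t mask;		/* 0 means "match any packet" */
	int dest_vport;
};

/* Rules are tried in order; the first one whose masked key matches wins. */
static int fdb_model_lookup(const struct fdb_rule_model *rules, size_t n,
			    uint64_t key)
{
	for (size_t i = 0; i < n; i++) {
		if ((key & rules[i].mask) == (rules[i].value & rules[i].mask))
			return rules[i].dest_vport;
	}
	return -1;	/* only reachable without a miss rule */
}
```

With the miss rule as the final entry, a lookup in this model never falls
through: anything not claimed by an earlier rule goes to vport 0.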

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  1 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 42 ++
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 2360180..8eed33f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -144,6 +144,7 @@ struct mlx5_eswitch_fdb {
struct offloads_fdb {
struct mlx5_flow_group *send_to_vport_grp;
struct mlx5_flow_group *miss_grp;
+   struct mlx5_flow_rule  *miss_rule;
} offloads;
};
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c6b28df..9310017 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -38,6 +38,41 @@
 #include "mlx5_core.h"
 #include "eswitch.h"
 
+static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
+{
+   struct mlx5_flow_destination dest;
+   struct mlx5_flow_rule *flow_rule = NULL;
+   int match_header = 0;
+   u32 *match_v, *match_c;
+   int err = 0;
+
+   match_v = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   if (!match_v || !match_c) {
+   esw_warn(esw->dev, "FDB: Failed to alloc match parameters\n");
+   err = -ENOMEM;
+   goto out;
+   }
+
+   dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
+   dest.vport_num = 0;
+
+   flow_rule = mlx5_add_flow_rule(esw->fdb_table.fdb, match_header, match_c,
+  match_v, MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
+  0, &dest);
+   if (IS_ERR(flow_rule)) {
+   err = PTR_ERR(flow_rule);
+   esw_warn(esw->dev,  "FDB: Failed to add miss flow rule err %d\n", err);
+   goto out;
+   }
+
+   esw->fdb_table.offloads.miss_rule = flow_rule;
+out:
+   kfree(match_v);
+   kfree(match_c);
+   return err;
+}
+
 #define MAX_PF_SQ 256
 
 int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
@@ -110,8 +145,14 @@ int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
}
esw->fdb_table.offloads.miss_grp = g;
 
+   err = esw_add_fdb_miss_rule(esw);
+   if (err)
+   goto miss_rule_err;
+
return 0;
 
+miss_rule_err:
+   mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
 miss_err:
mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
 send_vport_err:
@@ -128,6 +169,7 @@ void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
return;
 
esw_debug(esw->dev, "Destroy offloads FDB Table\n");
+   mlx5_del_flow_rule(esw->fdb_table.offloads.miss_rule);
mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
 
-- 
2.8.0

[PATCH net-next 06/16] net/mlx5: E-Switch, Add offloads table

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

The offloads table belongs to the NIC offloads namespace and is used as part
of the SRIOV offloads logic to steer packets that hit the e-switch miss rule
to the TIR of the relevant VF representor.

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  5 
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 31 ++
 2 files changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index b7fabd1..32db37a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -155,6 +155,10 @@ enum {
SRIOV_OFFLOADS
 };
 
+struct mlx5_esw_offload {
+   struct mlx5_flow_table *ft_offloads;
+};
+
 struct mlx5_eswitch {
struct mlx5_core_dev *dev;
struct mlx5_l2_table l2_table;
@@ -169,6 +173,7 @@ struct mlx5_eswitch {
 */
struct mutex state_lock;
struct esw_mc_addr  *mc_promisc;
+   struct mlx5_esw_offload offloads;
int mode;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index a8be43d..3ca926b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -214,3 +214,34 @@ void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
 
mlx5_destroy_flow_table(esw->fdb_table.fdb);
 }
+
+static int esw_create_offloads_table(struct mlx5_eswitch *esw)
+{
+   struct mlx5_flow_namespace *ns;
+   struct mlx5_flow_table *ft_offloads;
+   struct mlx5_core_dev *dev = esw->dev;
+   int err = 0;
+
+   ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_OFFLOADS);
+   if (!ns) {
+   esw_warn(esw->dev, "Failed to get offloads flow namespace\n");
+   return -ENOMEM;
+   }
+
+   ft_offloads = mlx5_create_flow_table(ns, 0, dev->priv.sriov.num_vfs + 2, 0);
+   if (IS_ERR(ft_offloads)) {
+   err = PTR_ERR(ft_offloads);
+   esw_warn(esw->dev, "Failed to create offloads table, err %d\n", err);
+   return err;
+   }
+
+   esw->offloads.ft_offloads = ft_offloads;
+   return 0;
+}
+
+static void esw_destroy_offloads_table(struct mlx5_eswitch *esw)
+{
+   struct mlx5_esw_offload *offloads = &esw->offloads;
+
+   mlx5_destroy_flow_table(offloads->ft_offloads);
+}
-- 
2.8.0

[PATCH net-next 01/16] net/mlx5: E-Switch, Add operational mode to the SRIOV e-Switch

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

Define three modes for the SRIOV e-switch operation: none (SRIOV_NONE,
none of the VF vports are enabled), legacy (SRIOV_LEGACY, the current mode)
and SRIOV offloads (SRIOV_OFFLOADS). Currently, when in SRIOV, only the
legacy mode is supported, where steering rules are of the form:

destination mac --> VF vport

This patch does not change any functionality.

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
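To make the legacy rule form above concrete, the following is a small
user-space sketch of "destination mac --> VF vport" steering gated on the
e-switch mode. Everything here is an assumption made for illustration: the
`example_` enum, the `example_l2_entry` table and `example_legacy_steer` do
not exist in the driver, and the real FDB lives in hardware flow tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative copy of the three modes named in the commit message. */
enum example_sriov_mode {
	EXAMPLE_SRIOV_NONE,
	EXAMPLE_SRIOV_LEGACY,
	EXAMPLE_SRIOV_OFFLOADS,
};

/* Hypothetical legacy FDB entry: one destination MAC per VF vport. */
struct example_l2_entry {
	uint8_t dmac[6];
	int vport;
};

/* Return the vport owning dmac in legacy mode, or -1 when nothing applies. */
static int example_legacy_steer(enum example_sriov_mode mode,
				const struct example_l2_entry *tbl, size_t n,
				const uint8_t dmac[6])
{
	size_t i;

	assert(dmac != NULL);
	if (mode != EXAMPLE_SRIOV_LEGACY)
		return -1; /* legacy rules are only installed in legacy mode */

	for (i = 0; i < n; i++) {
		if (memcmp(tbl[i].dmac, dmac, 6) == 0)
			return tbl[i].vport;
	}
	return -1;
}

static const struct example_l2_entry example_l2[] = {
	{ { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 }, 1 },
	{ { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 }, 2 },
};

static const uint8_t example_unknown_mac[6] = {
	0x02, 0x00, 0x00, 0x00, 0x00, 0x7f
};
```

The mode check is the only part that relates to this patch; the lookup itself
merely restates the rule form quoted in the commit message.
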
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 51 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 19 +++--
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  5 ++-
 3 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index aebbd6c..8068dde 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -428,7 +428,7 @@ esw_fdb_set_vport_promisc_rule(struct mlx5_eswitch *esw, u32 vport)
return __esw_fdb_set_vport_rule(esw, vport, true, mac_c, mac_v);
 }
 
-static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
+static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports)
 {
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
struct mlx5_core_dev *dev = esw->dev;
@@ -479,7 +479,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
esw_warn(dev, "Failed to create flow group err(%d)\n", err);
goto out;
}
-   esw->fdb_table.addr_grp = g;
+   esw->fdb_table.legacy.addr_grp = g;
 
/* Allmulti group : One rule that forwards any mcast traffic */
MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable,
@@ -494,7 +494,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
esw_warn(dev, "Failed to create allmulti flow group err(%d)\n", err);
goto out;
}
-   esw->fdb_table.allmulti_grp = g;
+   esw->fdb_table.legacy.allmulti_grp = g;
 
/* Promiscuous group :
 * One rule that forward all unmatched traffic from previous groups
@@ -511,17 +511,17 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
esw_warn(dev, "Failed to create promisc flow group err(%d)\n", err);
goto out;
}
-   esw->fdb_table.promisc_grp = g;
+   esw->fdb_table.legacy.promisc_grp = g;
 
 out:
if (err) {
-   if (!IS_ERR_OR_NULL(esw->fdb_table.allmulti_grp)) {
-   mlx5_destroy_flow_group(esw->fdb_table.allmulti_grp);
-   esw->fdb_table.allmulti_grp = NULL;
+   if (!IS_ERR_OR_NULL(esw->fdb_table.legacy.allmulti_grp)) {
+   mlx5_destroy_flow_group(esw->fdb_table.legacy.allmulti_grp);
+   esw->fdb_table.legacy.allmulti_grp = NULL;
}
-   if (!IS_ERR_OR_NULL(esw->fdb_table.addr_grp)) {
-   mlx5_destroy_flow_group(esw->fdb_table.addr_grp);
-   esw->fdb_table.addr_grp = NULL;
+   if (!IS_ERR_OR_NULL(esw->fdb_table.legacy.addr_grp)) {
+   mlx5_destroy_flow_group(esw->fdb_table.legacy.addr_grp);
+   esw->fdb_table.legacy.addr_grp = NULL;
}
if (!IS_ERR_OR_NULL(esw->fdb_table.fdb)) {
mlx5_destroy_flow_table(esw->fdb_table.fdb);
@@ -533,20 +533,20 @@ out:
return err;
 }
 
-static void esw_destroy_fdb_table(struct mlx5_eswitch *esw)
+static void esw_destroy_legacy_fdb_table(struct mlx5_eswitch *esw)
 {
if (!esw->fdb_table.fdb)
return;
 
esw_debug(esw->dev, "Destroy FDB Table\n");
-   mlx5_destroy_flow_group(esw->fdb_table.promisc_grp);
-   mlx5_destroy_flow_group(esw->fdb_table.allmulti_grp);
-   mlx5_destroy_flow_group(esw->fdb_table.addr_grp);
+   mlx5_destroy_flow_group(esw->fdb_table.legacy.promisc_grp);
+   mlx5_destroy_flow_group(esw->fdb_table.legacy.allmulti_grp);
+   mlx5_destroy_flow_group(esw->fdb_table.legacy.addr_grp);
mlx5_destroy_flow_table(esw->fdb_table.fdb);
esw->fdb_table.fdb = NULL;
-   esw->fdb_table.addr_grp = NULL;
-   esw->fdb_table.allmulti_grp = NULL;
-   esw->fdb_table.promisc_grp = NULL;
+   esw->fdb_table.legacy.addr_grp = NULL;
+   esw->fdb_table.legacy.allmulti_grp = NULL;
+   esw->fdb_table.legacy.promisc_grp = NULL;
 }
 
 /* E-Switch vport UC/MC lists management */
@@ -1540,7 +1540,7 @@ static void esw_disable_vport(struct mlx5_eswitch *esw, int vport_num)
 }
 
 /* Public E-Switch API */
-int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs)
+int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs,

[PATCH net-next 15/16] net/mlx5: Add Representors registration API

2016-06-27 Thread Saeed Mahameed

From: Hadar Hen Zion 

Introduce E-Switch functions to register and unregister representors.

Those functions are called by the mlx5e driver when the PF NIC is
created upon PCI probe, regardless of the E-Switch mode (NONE,
LEGACY or OFFLOADS).

Add a basic E-Switch database that holds the vport representors
upon creation.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 60 +++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 10 
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  | 12 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 19 +++
 5 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8d4d2b2..f61255c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -571,7 +571,7 @@ enum {
 struct mlx5e_profile {
void (*init)(struct mlx5_core_dev *mdev,
struct net_device *netdev,
-   const struct mlx5e_profile *profile);
+   const struct mlx5e_profile *profile, void *ppriv);
void (*cleanup)(struct mlx5e_priv *priv);
int (*init_rx)(struct mlx5e_priv *priv);
void (*cleanup_rx)(struct mlx5e_priv *priv);
@@ -618,6 +618,7 @@ struct mlx5e_priv {
struct mlx5e_tstamp tstamp;
u16 q_counter;
const struct mlx5e_profile *profile;
+   void  *ppriv;
 };
 
 enum mlx5e_link_mode {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8ffe68b..bfe3a4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2881,7 +2881,8 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
 
 static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
struct net_device *netdev,
-   const struct mlx5e_profile *profile)
+   const struct mlx5e_profile *profile,
+   void *ppriv)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
u32 link_speed = 0;
@@ -2963,6 +2964,7 @@ static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
priv->netdev   = netdev;
priv->params.num_channels  = profile->max_nch(mdev);
priv->profile  = profile;
+   priv->ppriv= ppriv;
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
mlx5e_ets_init(priv);
@@ -3127,18 +3129,25 @@ static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv)
 
 static void mlx5e_nic_init(struct mlx5_core_dev *mdev,
   struct net_device *netdev,
-  const struct mlx5e_profile *profile)
+  const struct mlx5e_profile *profile,
+  void *ppriv)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
 
-   mlx5e_build_nic_netdev_priv(mdev, netdev, profile);
+   mlx5e_build_nic_netdev_priv(mdev, netdev, profile, ppriv);
mlx5e_build_nic_netdev(netdev);
mlx5e_vxlan_init(priv);
 }
 
 static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 {
+   struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5_eswitch *esw = mdev->priv.eswitch;
+
mlx5e_vxlan_cleanup(priv);
+
+   if (MLX5_CAP_GEN(mdev, vport_group_manager))
+   mlx5_eswitch_unregister_vport_rep(esw, 0);
 }
 
 static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
@@ -3230,6 +3239,8 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 {
struct net_device *netdev = priv->netdev;
struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5_eswitch *esw = mdev->priv.eswitch;
+   struct mlx5_eswitch_rep rep;
 
if (mlx5e_vxlan_allowed(mdev)) {
rtnl_lock();
@@ -3239,6 +3250,12 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 
mlx5e_enable_async_events(priv);
queue_work(priv->wq, &priv->set_rx_mode_work);
+
+   if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+   rep.vport = 0;
+   rep.priv_data = priv;
+   mlx5_eswitch_register_vport_rep(esw, &rep);
+   }
 }
 
 static void mlx5e_nic_disable(struct mlx5e_priv *priv)
@@ -3262,7 +3279,7 @@ static const struct mlx5e_profile mlx5e_nic_profile = {
 };
 
 static void *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
-const struct mlx5e_profile *profile)
+

[PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

Add the commands to set and show the mode of the SRIOV E-Switch.
Two modes are supported:

* legacy   : operating in the "old" L2 based mode (DMAC --> VF vport)
* offloads : offloading SW rules/policy (e.g. Bridge/FDB or TC/Flows based)
             set by the host OS

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
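For readers who want to see the shape of the get/set dispatch without kernel
headers, below is a user-space sketch under stated assumptions. All
`example_` names are hypothetical stand-ins for the devlink structures, the
driver state is a single field, and the driver-side check that rejects any
mode other than legacy or offloads is an assumption, not something this patch
specifies. Only the "return -EOPNOTSUPP when the callback is absent" behaviour
mirrors the handlers in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three mode values added to the uapi header. */
enum example_eswitch_mode {
	EXAMPLE_ESWITCH_MODE_NONE,
	EXAMPLE_ESWITCH_MODE_LEGACY,
	EXAMPLE_ESWITCH_MODE_OFFLOADS,
};

struct example_devlink;

/* Optional per-driver callbacks, modelled on the two new devlink ops. */
struct example_devlink_ops {
	int (*eswitch_mode_get)(struct example_devlink *dl, uint16_t *p_mode);
	int (*eswitch_mode_set)(struct example_devlink *dl, uint16_t mode);
};

struct example_devlink {
	const struct example_devlink_ops *ops;
	uint16_t cur_mode; /* stand-in for driver-private state */
};

/* Core-side dispatch: report "not supported" when a callback is absent. */
static int example_mode_get(struct example_devlink *dl, uint16_t *p_mode)
{
	assert(dl != NULL && p_mode != NULL);
	if (!dl->ops || !dl->ops->eswitch_mode_get)
		return -EOPNOTSUPP;
	return dl->ops->eswitch_mode_get(dl, p_mode);
}

static int example_mode_set(struct example_devlink *dl, uint16_t mode)
{
	assert(dl != NULL);
	if (!dl->ops || !dl->ops->eswitch_mode_set)
		return -EOPNOTSUPP;
	return dl->ops->eswitch_mode_set(dl, mode);
}

/* Driver-side callbacks (assumed behaviour, for illustration only). */
static int example_drv_mode_get(struct example_devlink *dl, uint16_t *p_mode)
{
	*p_mode = dl->cur_mode;
	return 0;
}

static int example_drv_mode_set(struct example_devlink *dl, uint16_t mode)
{
	if (mode != EXAMPLE_ESWITCH_MODE_LEGACY &&
	    mode != EXAMPLE_ESWITCH_MODE_OFFLOADS)
		return -EINVAL;
	dl->cur_mode = mode;
	return 0;
}

static const struct example_devlink_ops example_drv_ops = {
	.eswitch_mode_get = example_drv_mode_get,
	.eswitch_mode_set = example_drv_mode_set,
};
```

A device whose `ops` pointer is NULL, or whose ops table omits the callbacks,
gets -EOPNOTSUPP from both paths, so existing drivers need no change.
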
 include/net/devlink.h|  3 ++
 include/uapi/linux/devlink.h |  9 +
 net/core/devlink.c   | 87 
 3 files changed, 99 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 1d45b61..c99ffe8 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -90,6 +90,9 @@ struct devlink_ops {
   u16 tc_index,
   enum devlink_sb_pool_type pool_type,
   u32 *p_cur, u32 *p_max);
+
+   int (*eswitch_mode_get)(struct devlink *devlink, u16 *p_mode);
+   int (*eswitch_mode_set)(struct devlink *devlink, u16 mode);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index ba0073b..dd7c1b4 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -57,6 +57,8 @@ enum devlink_command {
DEVLINK_CMD_SB_OCC_SNAPSHOT,
DEVLINK_CMD_SB_OCC_MAX_CLEAR,
 
+   DEVLINK_CMD_ESWITCH_MODE_GET,
+   DEVLINK_CMD_ESWITCH_MODE_SET,
/* add new commands above here */
 
__DEVLINK_CMD_MAX,
@@ -95,6 +97,12 @@ enum devlink_sb_threshold_type {
 
 #define DEVLINK_SB_THRESHOLD_TO_ALPHA_MAX 20
 
+enum devlink_eswitch_mode {
+   DEVLINK_ESWITCH_MODE_NONE,
+   DEVLINK_ESWITCH_MODE_LEGACY,
+   DEVLINK_ESWITCH_MODE_OFFLOADS,
+};
+
 enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -125,6 +133,7 @@ enum devlink_attr {
DEVLINK_ATTR_SB_TC_INDEX,   /* u16 */
DEVLINK_ATTR_SB_OCC_CUR,/* u32 */
DEVLINK_ATTR_SB_OCC_MAX,/* u32 */
+   DEVLINK_ATTR_ESWITCH_MODE,  /* u16 */
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 933e8d4..b2e592a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1394,6 +1394,78 @@ static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb,
return -EOPNOTSUPP;
 }
 
+static int devlink_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
+   enum devlink_command cmd, u32 portid,
+   u32 seq, int flags, u16 mode)
+{
+   void *hdr;
+
+   hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+   if (!hdr)
+   return -EMSGSIZE;
+
+   if (devlink_nl_put_handle(msg, devlink))
+   goto nla_put_failure;
+
+   if (nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode))
+   goto nla_put_failure;
+
+   genlmsg_end(msg, hdr);
+   return 0;
+
+nla_put_failure:
+   genlmsg_cancel(msg, hdr);
+   return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_eswitch_mode_get_doit(struct sk_buff *skb,
+   struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   const struct devlink_ops *ops = devlink->ops;
+   struct sk_buff *msg;
+   u16 mode;
+   int err;
+
+   if (!ops || !ops->eswitch_mode_get)
+   return -EOPNOTSUPP;
+
+   err = ops->eswitch_mode_get(devlink, &mode);
+   if (err)
+   return err;
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return -ENOMEM;
+
+   err = devlink_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_MODE_GET,
+  info->snd_portid, info->snd_seq, 0, mode);
+
+   if (err) {
+   nlmsg_free(msg);
+   return err;
+   }
+
+   return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_eswitch_mode_set_doit(struct sk_buff *skb,
+   struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   const struct devlink_ops *ops = devlink->ops;
+   u16 mode;
+
+   if (!info->attrs[DEVLINK_ATTR_ESWITCH_MODE])
+   return -EINVAL;
+
+   mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
+
+   if (ops && ops->eswitch_mode_set)
+   return ops->eswitch_mode_set(devlink, mode);
+   return -EOPNOTSUPP;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -1407,6 +1479,7 @@ static const struct

[PATCH net-next 16/16] net/mlx5e: Introduce SRIOV VF representors

2016-06-27 Thread Saeed Mahameed

From: Hadar Hen Zion 

Implement the relevant profile functions to create an mlx5e driver instance
serving as a VF representor. When SRIOV offloads mode is enabled, each VF
will have a representor netdevice instance on the host.

To do that, we also export a set of shared service functions from en_main.c,
such that they can be used by both the NIC and representor netdevs.

The newly created representor netdevice has a basic set of net_device_ops,
which are the same ndo functions as the NIC netdevice, and an ndo of its
own for the phys port name.

The profiling infrastructure allows sharing code between the NIC and the
vport representor even though the representor has only a subset of the
NIC functionality.

The VF reps and the PF, which is used in that mode to represent the uplink,
expose switchdev ops. Currently the only op supported is attr get for the
port parent ID, which here serves to identify net-devices belonging to the
same HW E-Switch. Other than that, no offloading is implemented, and hence
switching functionality is achieved if one sets SW switching rules, e.g.
using tc, bridge or ovs.

Port phys name (ndo_get_phys_port_name) is implemented to allow exporting
the VF vport number to user-space; along with the switchdev port parent
id (phys_switch_id), it enables a udev-based consistent naming scheme:

SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
ATTR{phys_port_name}!="", NAME="$PF_NIC$attr{phys_port_name}"

where phys_switch_id is exposed by the PF (and VF reps) and $PF_NIC is
the name of the PF netdevice.

Signed-off-by: Hadar Hen Zion 
Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
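The naming scheme in the commit message can be pictured with a short
user-space sketch. The assumptions are loud ones: the exact string that
ndo_get_phys_port_name returns is not shown in this excerpt, so printing the
number in decimal is a guess, and both helper functions are hypothetical. The
second helper only mimics what the quoted udev rule does, namely concatenating
the PF netdevice name with a non-empty phys_port_name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the port name is simply the vport number in decimal. */
static int example_phys_port_name(char *buf, size_t len, unsigned int vport)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "%u", vport);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Mimic NAME="$PF_NIC$attr{phys_port_name}": the rule only applies when
 * phys_port_name is non-empty, and the result is a plain concatenation.
 */
static int example_udev_name(char *buf, size_t len, const char *pf_nic,
			     const char *phys_port_name)
{
	int n;

	assert(buf != NULL && pf_nic != NULL && phys_port_name != NULL);
	if (strlen(phys_port_name) == 0)
		return -1; /* ATTR{phys_port_name}!="" did not match */

	n = snprintf(buf, len, "%s%s", pf_nic, phys_port_name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the rule concatenates without a separator, a PF named "eth4" and a
port name "2" would produce "eth42"; whether that is desirable is a policy
choice for whoever writes the udev rule.
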
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  28 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  53 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 387 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  20 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  96 -
 6 files changed, 567 insertions(+), 19 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9b14dad..a574dea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -8,6 +8,6 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
-   en_tc.o en_arfs.o
+   en_tc.o en_arfs.o en_rep.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f61255c..5912a02 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
@@ -816,4 +817,31 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
 void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
 int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev);
 
+struct mlx5_eswitch_rep;
+int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
+struct mlx5_eswitch_rep *rep);
+void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
+   struct mlx5_eswitch_rep *rep);
+int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep);
+void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
+ struct mlx5_eswitch_rep *rep);
+int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv);
+void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv);
+int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr);
+
+int mlx5e_create_direct_rqts(struct mlx5e_priv *priv);
+void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
+int mlx5e_create_direct_tirs(struct mlx5e_priv *priv);
+void mlx5e_destroy_direct_tirs(struct mlx5e_priv *priv);
+int mlx5e_create_tises(struct mlx5e_priv *priv);
+void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv);
+int mlx5e_close(struct net_device *netdev);
+int mlx5e_open(struct net_device *netdev);
+void mlx5e_update_stats_work(struct work_struct *work);
+void *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
+ const struct mlx5e_profile *profile, void *ppriv);
+void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
+struct rtnl_link_stats64 *
+mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats);
+
 #endif /* __MLX5_EN_H__ */
diff --git

[PATCH net-next 04/16] net/mlx5: E-Switch, Add API to create send-to-vport rules

2016-06-27 Thread Saeed Mahameed

From: Or Gerlitz 

Add the API to create send-to-vport e-switch rules of the form

 packet meta-data :: send-queue-number == $SQN and source-vport == 0 --> $VPORT

These rules are to be used for a send-to-vport logic which conceptually bypasses
the "normal" steering rules currently present at the e-switch datapath.

Such a rule should apply only to packets that originate in the e-switch manager
vport (0) and are sent on a given SQN which is used by a given VF representor
device, hence the matching logic.

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
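The matching logic described above can be sketched in user-space C as
follows. This is an illustration only: `example_pkt_meta`,
`example_stv_rule` and `example_send_to_vport` are hypothetical names, and
the real rules are programmed into the hardware FDB rather than walked in
software. The sketch shows that both conditions, source vport 0 and a
matching SQN, must hold for the bypass to apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet meta-data carried alongside a transmitted packet. */
struct example_pkt_meta {
	uint32_t source_sqn;   /* send queue number the packet was sent on */
	uint16_t source_vport; /* vport the packet originated from */
};

/* Hypothetical send-to-vport rule: SQN of a representor --> its VF vport. */
struct example_stv_rule {
	uint32_t sqn;
	uint16_t dest_vport;
};

/* Return the destination vport, or -1 when no send-to-vport rule applies. */
static int example_send_to_vport(const struct example_stv_rule *rules,
				 size_t n, const struct example_pkt_meta *meta)
{
	size_t i;

	assert(rules != NULL && meta != NULL);
	for (i = 0; i < n; i++) {
		/* Both conditions must hold: source vport 0 AND matching SQN. */
		if (meta->source_vport == 0 &&
		    meta->source_sqn == rules[i].sqn)
			return rules[i].dest_vport;
	}
	return -1;
}

static const struct example_stv_rule example_rules[] = {
	{ 0x10, 1 },
	{ 0x20, 2 },
};
```

A packet sent on SQN 0x20 from a VF (non-zero source vport) does not match,
so it continues through the normal steering rules.
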
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 39 ++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 8eed33f..b7fabd1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -193,6 +193,8 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
 int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
 int vport,
 struct ifla_vf_stats *vf_stats);
+struct mlx5_flow_rule *
+mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, int vport, u32 sqn);
 
 #define MLX5_DEBUG_ESWITCH_MASK BIT(3)
 
@@ -204,5 +206,4 @@ int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
 
 #define esw_debug(dev, format, ...)\
mlx5_core_dbg_mask(dev, MLX5_DEBUG_ESWITCH_MASK, format, ##__VA_ARGS__)
-
 #endif /* __MLX5_ESWITCH_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 9310017..a8be43d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -38,6 +38,45 @@
 #include "mlx5_core.h"
 #include "eswitch.h"
 
+struct mlx5_flow_rule *
+mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, int vport, u32 sqn)
+{
+   struct mlx5_flow_destination dest;
+   struct mlx5_flow_rule *flow_rule;
+   int match_header = MLX5_MATCH_MISC_PARAMETERS;
+   u32 *match_v, *match_c;
+   void *misc;
+
+   match_v = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   if (!match_v || !match_c) {
+   esw_warn(esw->dev, "FDB: Failed to alloc match parameters\n");
+   flow_rule = ERR_PTR(-ENOMEM);
+   goto out;
+   }
+
+   misc = MLX5_ADDR_OF(fte_match_param, match_v, misc_parameters);
+   MLX5_SET(fte_match_set_misc, misc, source_sqn, sqn);
+   MLX5_SET(fte_match_set_misc, misc, source_port, 0x0); /* source vport is 0 */
+
+   misc = MLX5_ADDR_OF(fte_match_param, match_c, misc_parameters);
+   MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_sqn);
+   MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
+
+   dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
+   dest.vport_num = vport;
+
+   flow_rule = mlx5_add_flow_rule(esw->fdb_table.fdb, match_header, match_c,
+  match_v, MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
+  0, &dest);
+   if (IS_ERR(flow_rule))
+   esw_warn(esw->dev, "FDB: Failed to add send to vport rule err %ld\n", PTR_ERR(flow_rule));
+out:
+   kfree(match_v);
+   kfree(match_c);
+   return flow_rule;
+}
+
 static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
 {
struct mlx5_flow_destination dest;
-- 
2.8.0
