date:20170627

Re: [PATCH net-next 1/3] net: ethtool: add support for forward error correction modes

2017-06-27 Thread Dustin Byford

Hi Gal,

On Sun Jun 25 16:38, Gal Pressman wrote:
> 
> > ...
> >
> > SHOW FEC option:
> > root@tor: ethtool --show-fec  swp1
> > FEC parameters for swp1:
> > Active FEC encodings: RS
> > Configured FEC encodings:  RS | BaseR
> >
> > ETHTOOL DEVNAME output modification:
> >
> > ethtool devname output:
> > root@tor:~# ethtool swp1
> > Settings for swp1:
> > root@hpe-7712-03:~# ethtool swp18
> > Settings for swp18:
> > Supported ports: [ FIBRE ]
> > Supported link modes:   4baseCR4/Full
> > 4baseSR4/Full
> > 4baseLR4/Full
> > 10baseSR4/Full
> > 10baseCR4/Full
> > 10baseLR4_ER4/Full
> > Supported pause frame use: No
> > Supports auto-negotiation: Yes
> > Supported FEC modes: [RS | BaseR | None | Not reported]
> > Advertised link modes:  Not reported
> > Advertised pause frame use: No
> > Advertised auto-negotiation: No
> > Advertised FEC modes: [RS | BaseR | None | Not reported]
> >  One or more FEC modes
> > Speed: 10Mb/s
> > Duplex: Full
> > Port: FIBRE
> > PHYAD: 106
> > Transceiver: internal
> > Auto-negotiation: off
> > Link detected: yes

> What is the difference between the information in ethtool DEVNAME and ethtool 
> --show-fec DEVNAME?

I think there are two questions there.  First, how does the FEC-related
information from glinksettings differ from what's retrieved via
gfecparam.  Second, how is that expressed through the ethtool UI.

Regarding the uapi (as we imagined it), glinksettings returns FEC
information through three fields:

@supported: the complete set of FEC modes the hardware supports, at any
speed, medium, or autoneg combination.

@advertising: the set of modes advertised to the link partner through
the relevant autoneg mechanism.

@lp_advertising: the set of modes the link partner is advertising
through autoneg.

gfecparam is used to fetch a couple more important facts about the FEC
configuration:

1) What FEC mode is currently active, either as a result of the autoneg
process, or a previous call to sfecparam.  This is returned in
gfecparam->active_fec.

2) If autoneg is off, what is the currently configured FEC mode.  This
is a bitmask returned in gfecparam->fec.  I imagine it's typically a
single mode, but a mask makes it easier to implement a "don't care" policy,
or otherwise allow the NIC/driver to pick between a set of modes.

Regarding the UI: ethtool DEVNAME gets most of its info from
glinksettings, and it's easy to represent the FEC parameters affected by
autoneg there.  ethtool --show-fec simply reports the output of
gfecparam.  I agree the difference is subtle; perhaps it makes sense to
combine all the FEC information into ethtool DEVNAME?
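
To make the split between the active mode and the configured mask concrete,
here is a minimal user-space sketch of how a tool might render the two values
that gfecparam reports. The bit values, struct, and function names below are
placeholders invented for illustration; they are not the uapi definitions
from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Placeholder bit values for illustration only; the real definitions
 * live in the uapi header touched by the patch. */
#define DEMO_FEC_NONE  (1u << 0)
#define DEMO_FEC_RS    (1u << 1)
#define DEMO_FEC_BASER (1u << 2)

/* Hypothetical mirror of what gfecparam reports. */
struct demo_fecparam {
	unsigned int active_fec; /* mode currently in use on the link */
	unsigned int fec;        /* configured mask; may hold several modes */
};

/* Render a FEC bitmask as e.g. "RS | BaseR"; an empty mask gives "None". */
static void demo_fec_to_str(unsigned int mask, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } names[] = {
		{ DEMO_FEC_RS, "RS" },
		{ DEMO_FEC_BASER, "BaseR" },
		{ DEMO_FEC_NONE, "None" },
	};
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(mask & names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " | ", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
	if (buf[0] == '\0')
		strncat(buf, "None", len - 1);
}

/* Print the two facts that --show-fec reports. */
static void demo_show_fec(const char *dev, const struct demo_fecparam *p)
{
	char active[32], conf[32];

	demo_fec_to_str(p->active_fec, active, sizeof(active));
	demo_fec_to_str(p->fec, conf, sizeof(conf));
	printf("FEC parameters for %s:\n", dev);
	printf("Active FEC encodings: %s\n", active);
	printf("Configured FEC encodings: %s\n", conf);
}
```

Keeping the configured value as a mask is what lets a single call express
"RS or BaseR, driver's choice", while the active value normally carries one
bit.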

> I can't find a usage of LINK_MODE_FEC_* bits in downstream patches.

I'm not sure what patches you're looking at, but I think those bits
directly affect the "Advertised FEC modes" and "Supported FEC Modes"
fields.

--Dustin

Re: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

2017-06-27 Thread Daniel Vetter

On Tue, Jun 27, 2017 at 10:51 PM, Jeff Kirsher
 wrote:
> This was submitted and accepted into David Miller's net-next tree.  I can
> see if Dave can pull it into his net tree.  Does stable need to pick this
> up as well?

Nah if it landed somewhere at least I'm happy, we can carry the fixup
for a while longer locally.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [PATCH net v2] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Eric Dumazet

On Wed, 2017-06-28 at 12:53 +0800, gfree.w...@vip.163.com wrote:
> From: Gao Feng 
> 
> When qdisc fail to init, qdisc_create would invoke the destroy callback
> to cleanup. But there is no check if the callback exists really. So it
> would cause the panic if there is no real destroy callback like the qdisc
> codel, fq, and so on.
> 
> Take codel as an example following:
> When a malicious user constructs one invalid netlink msg, it would cause
> codel_init->codel_change->nla_parse_nested failed.
> Then kernel would invoke the destroy callback directly but qdisc codel
> doesn't define one. It causes one panic as a result.
> 
> Now add one the check for destroy to avoid the possible panic.
> 
> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> Signed-off-by: Gao Feng 
> ---

Acked-by: Eric Dumazet 

Thanks !

[PATCH net v2] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread gfree . wind

From: Gao Feng 

When a qdisc fails to init, qdisc_create() invokes the destroy callback
to clean up, but it never checks whether the callback actually exists.
This causes a panic for qdiscs that define no destroy callback, such as
codel, fq, and so on.

Take codel as an example:
When a malicious user constructs an invalid netlink message,
codel_init->codel_change->nla_parse_nested fails.
The kernel then invokes the destroy callback directly, but codel
doesn't define one, which results in a panic.

Now add a check for destroy to avoid the possible panic.

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Gao Feng 
---
 v2: Add the Fixes and an example in changelog, Per Cong Wang & Eric
 v1: initial version

 net/sched/sch_api.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index e88342f..cfdbfa1 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1019,7 +1019,8 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 		return sch;
 	}
 	/* ops->init() failed, we call ->destroy() like qdisc_create_dflt() */
-	ops->destroy(sch);
+	if (ops->destroy)
+		ops->destroy(sch);
 err_out3:
 	dev_put(dev);
 	kfree((char *) sch - sch->padded);
-- 
1.9.1
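
The guard added by this patch follows a common pattern for optional
callbacks in an ops table. The stand-alone sketch below models it in user
space; all demo_* names are hypothetical stand-ins and none of this is the
kernel's actual qdisc code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct Qdisc and struct Qdisc_ops. */
struct demo_qdisc { int destroyed; };

struct demo_qdisc_ops {
	int  (*init)(struct demo_qdisc *q);
	void (*destroy)(struct demo_qdisc *q); /* optional, may be NULL */
};

/* Returns 0 on success, or the init error after best-effort cleanup. */
static int demo_qdisc_create(const struct demo_qdisc_ops *ops,
			     struct demo_qdisc *q)
{
	int err = ops->init ? ops->init(q) : 0;

	if (!err)
		return 0;
	/* init failed: only call destroy if this qdisc provides one */
	if (ops->destroy)
		ops->destroy(q);
	return err;
}

/* An init that always fails, to exercise the error path. */
static int demo_failing_init(struct demo_qdisc *q) { (void)q; return -22; }
static void demo_destroy(struct demo_qdisc *q) { q->destroyed = 1; }
```

Without the NULL check, the error path would jump through a NULL function
pointer for any ops table that leaves destroy unset.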

Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-27 Thread Leon Romanovsky

On Tue, Jun 27, 2017 at 03:18:59PM -0700, Stephen Hemminger wrote:
> On Tue, 27 Jun 2017 20:46:15 +0300
> Leon Romanovsky  wrote:
>
> > On Tue, Jun 27, 2017 at 11:37:35AM -0600, Jason Gunthorpe wrote:
> > > On Tue, Jun 27, 2017 at 08:33:01PM +0300, Leon Romanovsky wrote:
> > >
> > > > My initial plan was to put all parsers under their respective names, in
> > > > the similar way as I did for caps: $ rdma dev show mlx5_4 caps
> > >
> > > I think you should have a useful summary display similar to 'ip a' and
> > > other commands.
> > >
> > > guid(s), subnet prefix or default gid for IB, lid/lmc, link state,
> > > speed, mtu, pkeys protocol(s)
> >
> > It will, but before I would like to see this tool be a part of
> > iproute2, so other people will be able to extend it in addition
> > to me.
> >
> > Are you fine with the proposed code?
> >
>
> Output formats need to be nailed down. The output of iproute2 commands is
> almost like an ABI. Users build scripts to parse it (whether that is a great
> idea or not is debatable, it mostly shows the weakness in programmatic APIs).
> Therefore fully changing output formats in later revisions is likely to get
> users upset.
>
> The first version doesn't have to be perfect, just close to the overall goal
> of what is planned.

In this version, I'm going to use arrays without indexes, because I prefer
to expose the bare minimum from the kernel, which is RDMA netlink. After
everything else is settled, I'll move those defines to UAPI and reuse them
in rdmatool.

I'll send a new version with a -d/--details flag and an enriched minimal
summary. It won't include anything related to the gid and pkey tables yet.

Thanks




Re: [PATCH iproute2 V1 3/6] rdma: Add device capability parsing

2017-06-27 Thread Leon Romanovsky

On Tue, Jun 27, 2017 at 04:04:49PM -0700, Stephen Hemminger wrote:
> On Tue, 27 Jun 2017 17:39:17 +0300
> Leon Romanovsky  wrote:
>
> > +static const char *dev_caps[64] = {
> > +   "RESIZE_MAX_WR",
> > +   "BAD_PKEY_CNTR",
> > +   "BAD_QKEY_CNTR",
> > +   "RAW_MULTI",
> > +   "AUTO_PATH_MIG",
> > +   "CHANGE_PHY_PORT",
> > +   "UD_AV_PORT_ENFORCE",
> > +   "CURR_QP_STATE_MOD",
> > +   "SHUTDOWN_PORT",
> > +   "INIT_TYPE",
> > +   "PORT_ACTIVE_EVENT",
> > +   "SYS_IMAGE_GUID",
> > +   "RC_RNR_NAK_GEN",
> > +   "SRQ_RESIZE",
> > +   "N_NOTIFY_CQ",
> > +   "LOCAL_DMA_LKEY",
> > +   "RESERVED",
> > +   "MEM_WINDOW",
> > +   "UD_IP_CSUM",
> > +   "UD_TSO",
> > +   "XRC",
> > +   "MEM_MGT_EXTENSIONS",
> > +   "BLOCK_MULTICAST_LOOPBACK",
> > +   "MEM_WINDOW_TYPE_2A",
> > +   "MEM_WINDOW_TYPE_2B",
> > +   "RC_IP_CSUM",
> > +   "RAW_IP_CSUM",
> > +   "CROSS_CHANNEL",
> > +   "MANAGED_FLOW_STEERING",
> > +   "SIGNATURE_HANDOVER",
> > +   "ON_DEMAND_PAGING",
> > +   "SG_GAPS_REG",
> > +   "VIRTUAL_FUNCTION",
> > +   "RAW_SCATTER_FCS",
> > +   "RDMA_NETDEV_OPA_VNIC",
> > +};
>
> Please use array initializer so that header and capabilities don't get 
> different values.
> Are the bit values in some rdma header file?

It is enum ib_device_cap_flags, copied from include/rdma/ib_verbs.h.
Neither enum ib_device_cap_flags nor enum ib_port_cap_flags is exposed
to userspace (include/uapi/rdma/*) yet; I'm planning to move them there in
the next cycle.
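
For reference, the array-initializer form Stephen asks for could look roughly
like the sketch below. The demo_* enum is a hypothetical subset whose indices
simply follow the order of the quoted array; it is not the ib_verbs.h
definition.

```c
#include <stddef.h>

/* Hypothetical subset of capability bit positions, for illustration only. */
enum demo_cap_bit {
	DEMO_CAP_RESIZE_MAX_WR = 0,
	DEMO_CAP_BAD_PKEY_CNTR = 1,
	DEMO_CAP_MEM_WINDOW    = 17,
	DEMO_CAP_MAX           = 64,
};

/* Each name is tied to its bit index, so reordering or adding entries
 * cannot silently shift the mapping. Unlisted slots stay NULL. */
static const char *const demo_cap_names[DEMO_CAP_MAX] = {
	[DEMO_CAP_RESIZE_MAX_WR] = "RESIZE_MAX_WR",
	[DEMO_CAP_BAD_PKEY_CNTR] = "BAD_PKEY_CNTR",
	[DEMO_CAP_MEM_WINDOW]    = "MEM_WINDOW",
};

/* Look up the name for one bit; unknown bits map to "UNKNOWN". */
static const char *demo_cap_name(unsigned int bit)
{
	if (bit >= DEMO_CAP_MAX || !demo_cap_names[bit])
		return "UNKNOWN";
	return demo_cap_names[bit];
}
```

With a positional initializer, inserting one string in the middle shifts
every later name by one bit; the designated form avoids that.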

Thanks



[no subject]

2017-06-27 Thread System Administrator

Attention;

Your messages have exceeded the storage limit of 5 GB set by the
administrator; usage is currently at 10.9 GB. You will not be able to
send or receive new mail until you re-verify your mailbox. To restore
your mailbox, send the following information below:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be
disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...776774990..2017
Mail Technical Support ©2017

Thank you
System Administrator

Re: [PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Michael S. Tsirkin

On Wed, Jun 28, 2017 at 11:40:30AM +0800, Jason Wang wrote:
> 
> 
> On 2017-06-28 11:31, Michael S. Tsirkin wrote:
> > On Wed, Jun 28, 2017 at 10:45:18AM +0800, Jason Wang wrote:
> > > On 2017-06-28 10:17, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 28, 2017 at 10:14:34AM +0800, Jason Wang wrote:
> > > > > On 2017-06-28 10:02, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 28, 2017 at 09:54:03AM +0800, Jason Wang wrote:
> > > > > > > We should allow csumed packet for small buffer, otherwise XDP_PASS
> > > > > > > won't work correctly.
> > > > > > > 
> > > > > > > Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
> > > > > > > Signed-off-by: Jason Wang
> > > > > > The issue would be VIRTIO_NET_HDR_F_DATA_VALID might be set.
> > > > > > What do you think?
> > > > > I think it's safe. For XDP_PASS, it work like in the past.
> > > > That's the part I don't get. With DATA_VALID csum in packet is wrong, XDP
> > > > tools assume it's value.
> > > DATA_VALID is CHECKSUM_UNNECESSARY on the host, and according to the comment
> > > in skbuff.h
> > > 
> > > 
> > > "
> > >   *   The hardware you're dealing with doesn't calculate the full checksum
> > >   *   (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums
> > >   *   for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY
> > >   *   if their checksums are okay. skb->csum is still undefined in this case
> > >   *   though. A driver or device must never modify the checksum field in the
> > >   *   packet even if checksum is verified.
> > > "
> > > 
> > > The csum is correct I believe?
> > > 
> > > Thanks
> > That's on input. But I think for tun it's output, where that is equivalent
> > to CHECKSUM_NONE
> > 
> > 
> 
> Yes, but the comment said:
> 
> "
> CHECKSUM_NONE:
>  *
>  *   The skb was already checksummed by the protocol, or a checksum is not
>  *   required.
>  *
>  * CHECKSUM_UNNECESSARY:
>  *
>  *   This has the same meaning on as CHECKSUM_NONE for checksum offload on
>  *   output.
>  *
> "
> 
> So still correct I think?
> 
> Thanks

Hmm maybe I mean NEEDS_CHECKSUM actually.

I'll need to re-read the spec.

-- 
MST

Re: [PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Jason Wang




On 2017-06-28 11:31, Michael S. Tsirkin wrote:
> On Wed, Jun 28, 2017 at 10:45:18AM +0800, Jason Wang wrote:
> > On 2017-06-28 10:17, Michael S. Tsirkin wrote:
> > > On Wed, Jun 28, 2017 at 10:14:34AM +0800, Jason Wang wrote:
> > > > On 2017-06-28 10:02, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 28, 2017 at 09:54:03AM +0800, Jason Wang wrote:
> > > > > > We should allow csumed packet for small buffer, otherwise XDP_PASS
> > > > > > won't work correctly.
> > > > > > 
> > > > > > Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
> > > > > > Signed-off-by: Jason Wang
> > > > > The issue would be VIRTIO_NET_HDR_F_DATA_VALID might be set.
> > > > > What do you think?
> > > > I think it's safe. For XDP_PASS, it work like in the past.
> > > That's the part I don't get. With DATA_VALID csum in packet is wrong, XDP
> > > tools assume it's value.
> > DATA_VALID is CHECKSUM_UNNECESSARY on the host, and according to the comment
> > in skbuff.h
> > 
> > "
> >   *   The hardware you're dealing with doesn't calculate the full checksum
> >   *   (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums
> >   *   for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY
> >   *   if their checksums are okay. skb->csum is still undefined in this case
> >   *   though. A driver or device must never modify the checksum field in the
> >   *   packet even if checksum is verified.
> > "
> > 
> > The csum is correct I believe?
> > 
> > Thanks
> That's on input. But I think for tun it's output, where that is equivalent
> to CHECKSUM_NONE

Yes, but the comment said:

"
CHECKSUM_NONE:
 *
 *   The skb was already checksummed by the protocol, or a checksum is not
 *   required.
 *
 * CHECKSUM_UNNECESSARY:
 *
 *   This has the same meaning on as CHECKSUM_NONE for checksum offload on
 *   output.
 *
"

So still correct I think?

Thanks

Re: Re: [net PATCH] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Cong Wang

On Tue, Jun 27, 2017 at 5:54 PM, Gao Feng  wrote:
> At 2017-06-28 01:49:50, "Eric Dumazet"  wrote:
>>On Tue, 2017-06-27 at 10:08 -0700, Cong Wang wrote:
>>> On Tue, Jun 27, 2017 at 9:50 AM, Eric Dumazet  
>>> wrote:
>>> > On Tue, 2017-06-27 at 09:30 -0700, Cong Wang wrote:
>>> >> On Mon, Jun 26, 2017 at 6:35 PM,   wrote:
>>> >> > From: Gao Feng 
>>> >> >
>>> >> > When qdisc fail to init, qdisc_create would invoke the destroy callback
>>> >> > to cleanup. But there is no check if the callback exists really. So it
>>> >> > would cause the panic if there is no real destroy callback like these
>>> >> > qdisc codel, pfifo, pfifo_fast, and so on.
>>> >> >
>>> >> > Now add one the check for destroy to avoid the possible panic.
>>> >> >
>>> >> > Signed-off-by: Gao Feng 
>>> >>
>>> >> Looks good,
>>> >>
>>> >> Acked-by: Cong Wang 
>>> >>
>>> >> This is introduced by commit 87b60cfacf9f17cf71933c6e33b.
>>> >> Please add proper Fixes tag next time.
>
> OK. Actually I didn't know it is introduced by this commit before :)
> Need I send an update patch again ?
>

Yes please, update the changelog as Eric suggested and also
add Fixes.

Re: [PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Michael S. Tsirkin

On Wed, Jun 28, 2017 at 10:45:18AM +0800, Jason Wang wrote:
> 
> 
> On 2017-06-28 10:17, Michael S. Tsirkin wrote:
> > On Wed, Jun 28, 2017 at 10:14:34AM +0800, Jason Wang wrote:
> > > 
> > > On 2017-06-28 10:02, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 28, 2017 at 09:54:03AM +0800, Jason Wang wrote:
> > > > > We should allow csumed packet for small buffer, otherwise XDP_PASS
> > > > > won't work correctly.
> > > > > 
> > > > > Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
> > > > > Signed-off-by: Jason Wang 
> > > > The issue would be VIRTIO_NET_HDR_F_DATA_VALID might be set.
> > > > What do you think?
> > > I think it's safe. For XDP_PASS, it work like in the past.
> > That's the part I don't get. With DATA_VALID csum in packet is wrong, XDP
> > tools assume it's value.
> 
> DATA_VALID is CHECKSUM_UNNECESSARY on the host, and according to the comment
> in skbuff.h
> 
> 
> "
>  *   The hardware you're dealing with doesn't calculate the full checksum
>  *   (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums
>  *   for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY
>  *   if their checksums are okay. skb->csum is still undefined in this case
>  *   though. A driver or device must never modify the checksum field in the
>  *   packet even if checksum is verified.
> "
> 
> The csum is correct I believe?
> 
> Thanks

That's on input. But I think for tun it's output, where that is equivalent
to CHECKSUM_NONE


> > 
> > > For XDP_TX, we
> > > zero the vnet header.
> > Again TX offload is disabled, so packets will go out with an invalid
> > checksum.
> > 
> > > For adjusting header, XDP prog should deal with csum.
> > > 
> > > Thanks
> > That part seems right.
> > 
> > > > > ---
> > > > > The patch is needed for -stable.
> > > > > ---
> > > > >drivers/net/virtio_net.c | 2 +-
> > > > >1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 143d8a9..499fcc9 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -413,7 +413,7 @@ static struct sk_buff *receive_small(struct 
> > > > > net_device *dev,
> > > > >   void *orig_data;
> > > > >   u32 act;
> > > > > - if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> > > > > + if (unlikely(hdr->hdr.gso_type))
> > > > >   goto err_xdp;
> > > > >   xdp.data_hard_start = buf + VIRTNET_RX_PAD + 
> > > > > vi->hdr_len;
> > > > > -- 
> > > > > 2.7.4

Re: [net] virtio-net: serialize tx routine during reset

2017-06-27 Thread McCabe, Robert J

Acked-by: Robert McCabe

Re: [PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Jason Wang




On 2017-06-28 10:17, Michael S. Tsirkin wrote:
> On Wed, Jun 28, 2017 at 10:14:34AM +0800, Jason Wang wrote:
> > 
> > On 2017-06-28 10:02, Michael S. Tsirkin wrote:
> > > On Wed, Jun 28, 2017 at 09:54:03AM +0800, Jason Wang wrote:
> > > > We should allow csumed packet for small buffer, otherwise XDP_PASS
> > > > won't work correctly.
> > > > 
> > > > Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
> > > > Signed-off-by: Jason Wang 
> > > The issue would be VIRTIO_NET_HDR_F_DATA_VALID might be set.
> > > What do you think?
> > I think it's safe. For XDP_PASS, it work like in the past.
> That's the part I don't get. With DATA_VALID csum in packet is wrong, XDP
> tools assume it's value.

DATA_VALID is CHECKSUM_UNNECESSARY on the host, and according to the
comment in skbuff.h

"
 *   The hardware you're dealing with doesn't calculate the full checksum
 *   (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums
 *   for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY
 *   if their checksums are okay. skb->csum is still undefined in this case
 *   though. A driver or device must never modify the checksum field in the
 *   packet even if checksum is verified.
"

The csum is correct I believe?

Thanks

> > For XDP_TX, we
> > zero the vnet header.
> Again TX offload is disabled, so packets will go out with an invalid
> checksum.
> 
> > For adjusting header, XDP prog should deal with csum.
> > 
> > Thanks
> That part seems right.
> 
> > > > ---
> > > > The patch is needed for -stable.
> > > > ---
> > > >  drivers/net/virtio_net.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 143d8a9..499fcc9 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -413,7 +413,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
> > > >  		void *orig_data;
> > > >  		u32 act;
> > > > -		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> > > > +		if (unlikely(hdr->hdr.gso_type))
> > > >  			goto err_xdp;
> > > >  		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
> > > > -- 
> > > > 2.7.4

[PATCH v2] datapath: Avoid using stack larger than 1024.

2017-06-27 Thread Tonghao Zhang

When compiling OvS-master on a 4.4.0-81 kernel,
there is a warning:

CC [M]  /root/ovs/datapath/linux/datapath.o
/root/ovs/datapath/linux/datapath.c: In function ‘ovs_flow_cmd_set’:
/root/ovs/datapath/linux/datapath.c:1221:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]

This patch factors out match-init and action-copy to avoid the
"Wframe-larger-than=1024" warning. Because mask is only
used to get actions, we add a new function to save some
stack space.

Signed-off-by: Tonghao Zhang 
---
 datapath/datapath.c | 73 -
 1 file changed, 50 insertions(+), 23 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index c85029c..fdbe314 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -1100,6 +1100,50 @@ static struct sw_flow_actions *get_flow_actions(struct net *net,
 	return acts;
 }
 
+/* Factor out match-init and action-copy to avoid
+ * "Wframe-larger-than=1024" warning. Because mask is only
+ * used to get actions, we new a function to save some
+ * stack space.
+ *
+ * If there are not key and action attrs, we return 0
+ * directly. In the case, the caller will also not use the
+ * match as before. If there is action attr, we try to get
+ * actions and save them to *acts.
+ * */
+static int ovs_nla_init_match_and_action(struct net *net,
+					 struct sw_flow_match *match,
+					 struct sw_flow_key *key,
+					 struct nlattr **a,
+					 struct sw_flow_actions **acts,
+					 bool log)
+{
+	struct sw_flow_mask mask;
+	int error = 0;
+
+	if (a[OVS_FLOW_ATTR_KEY]) {
+		ovs_match_init(match, key, true, &mask);
+		error = ovs_nla_get_match(net, match, a[OVS_FLOW_ATTR_KEY],
+					  a[OVS_FLOW_ATTR_MASK], log);
+		if (error)
+			return error;
+	}
+
+	if (a[OVS_FLOW_ATTR_ACTIONS]) {
+		if (!a[OVS_FLOW_ATTR_KEY]) {
+			OVS_NLERR(log,
+				  "Flow key attribute not present in set flow.");
+			return -EINVAL;
+		}
+
+		*acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], key,
+					 &mask, log);
+		if (IS_ERR(*acts))
+			return PTR_ERR(*acts);
+	}
+
+	return 0;
+}
+
 static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1107,7 +1151,6 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	struct ovs_header *ovs_header = info->userhdr;
 	struct sw_flow_key key;
 	struct sw_flow *flow;
-	struct sw_flow_mask mask;
 	struct sk_buff *reply = NULL;
 	struct datapath *dp;
 	struct sw_flow_actions *old_acts = NULL, *acts = NULL;
@@ -1119,34 +1162,18 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	bool ufid_present;
 
 	ufid_present = ovs_nla_get_ufid(&sfid, a[OVS_FLOW_ATTR_UFID], log);
-	if (a[OVS_FLOW_ATTR_KEY]) {
-		ovs_match_init(&match, &key, true, &mask);
-		error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
-					  a[OVS_FLOW_ATTR_MASK], log);
-	} else if (!ufid_present) {
+	if (!a[OVS_FLOW_ATTR_KEY] && !ufid_present) {
 		OVS_NLERR(log,
 			  "Flow set message rejected, Key attribute missing.");
-		error = -EINVAL;
+		return -EINVAL;
 	}
+
+	error = ovs_nla_init_match_and_action(net, &match, &key, a,
+					      &acts, log);
 	if (error)
 		goto error;
 
-	/* Validate actions. */
-	if (a[OVS_FLOW_ATTR_ACTIONS]) {
-		if (!a[OVS_FLOW_ATTR_KEY]) {
-			OVS_NLERR(log,
-				  "Flow key attribute not present in set flow.");
-			error = -EINVAL;
-			goto error;
-		}
-
-		acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], &key,
-					&mask, log);
-		if (IS_ERR(acts)) {
-			error = PTR_ERR(acts);
-			goto error;
-		}
-
+	if (acts) {
 		/* Can allocate before locking if have acts. */
 		reply = ovs_flow_cmd_alloc_info(acts, &sfid, info, false,
 						ufid_flags);
-- 
1.8.3.1
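
The stack-saving idea in this patch can be sketched in isolation: a large
scratch object is moved into a helper so it only occupies stack while the
helper runs. The demo_* names are hypothetical, and the noinline attribute
(GCC/Clang) is an addition of this sketch to keep the compiler from merging
the frames again; the patch as posted does not use it.

```c
#include <string.h>

/* Hypothetical large scratch structure standing in for sw_flow_mask. */
struct demo_mask { unsigned char bytes[512]; };

/* The scratch buffer lives in this helper's frame only. noinline keeps
 * the compiler from folding it back into the caller's frame. */
static __attribute__((noinline)) int demo_init_and_check(const char *key)
{
	struct demo_mask mask;

	if (!key)
		return -1;
	memset(&mask, 0xff, sizeof(mask));
	return mask.bytes[0] == 0xff ? 0 : -1;
}

/* The caller's frame no longer includes struct demo_mask. */
static int demo_cmd_set(const char *key)
{
	return demo_init_and_check(key);
}
```

Whether the caller's frame actually shrinks depends on the compiler's
inlining decisions, which is why the sketch pins them down explicitly.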

Re: [PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Michael S. Tsirkin

On Wed, Jun 28, 2017 at 10:14:34AM +0800, Jason Wang wrote:
> 
> 
> On 2017-06-28 10:02, Michael S. Tsirkin wrote:
> > On Wed, Jun 28, 2017 at 09:54:03AM +0800, Jason Wang wrote:
> > > We should allow csumed packet for small buffer, otherwise XDP_PASS
> > > won't work correctly.
> > > 
> > > Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
> > > Signed-off-by: Jason Wang 
> > The issue would be VIRTIO_NET_HDR_F_DATA_VALID might be set.
> > What do you think?
> 
> I think it's safe. For XDP_PASS, it work like in the past.

That's the part I don't get. With DATA_VALID csum in packet is wrong, XDP
tools assume it's value.

> For XDP_TX, we
> zero the vnet header.

Again TX offload is disabled, so packets will go out with an invalid
checksum.

> For adjusting header, XDP prog should deal with csum.
> 
> Thanks

That part seems right.

> > 
> > > ---
> > > The patch is needed for -stable.
> > > ---
> > >   drivers/net/virtio_net.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 143d8a9..499fcc9 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -413,7 +413,7 @@ static struct sk_buff *receive_small(struct 
> > > net_device *dev,
> > >   void *orig_data;
> > >   u32 act;
> > > - if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> > > + if (unlikely(hdr->hdr.gso_type))
> > >   goto err_xdp;
> > >   xdp.data_hard_start = buf + VIRTNET_RX_PAD + 
> > > vi->hdr_len;
> > > -- 
> > > 2.7.4

Re: [PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Jason Wang




On 2017-06-28 10:02, Michael S. Tsirkin wrote:
> On Wed, Jun 28, 2017 at 09:54:03AM +0800, Jason Wang wrote:
> > We should allow csumed packet for small buffer, otherwise XDP_PASS
> > won't work correctly.
> > 
> > Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
> > Signed-off-by: Jason Wang 
> The issue would be VIRTIO_NET_HDR_F_DATA_VALID might be set.
> What do you think?

I think it's safe. For XDP_PASS, it works like in the past. For XDP_TX,
we zero the vnet header. For adjusting header, XDP prog should deal with
csum.

Thanks

> > ---
> > The patch is needed for -stable.
> > ---
> >  drivers/net/virtio_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 143d8a9..499fcc9 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -413,7 +413,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
> >  		void *orig_data;
> >  		u32 act;
> >  
> > -		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> > +		if (unlikely(hdr->hdr.gso_type))
> >  			goto err_xdp;
> >  
> >  		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
> > -- 
> > 2.7.4

Re: [PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Michael S. Tsirkin

On Wed, Jun 28, 2017 at 09:54:03AM +0800, Jason Wang wrote:
> We should allow csumed packet for small buffer, otherwise XDP_PASS
> won't work correctly.
> 
> Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
> Signed-off-by: Jason Wang 

The issue would be VIRTIO_NET_HDR_F_DATA_VALID might be set.
What do you think?

> ---
> The patch is needed for -stable.
> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 143d8a9..499fcc9 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -413,7 +413,7 @@ static struct sk_buff *receive_small(struct net_device 
> *dev,
>   void *orig_data;
>   u32 act;
>  
> - if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> + if (unlikely(hdr->hdr.gso_type))
>   goto err_xdp;
>  
>   xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
> -- 
> 2.7.4

Re: [PATCH net] virtio-net: serialize tx routine during reset

2017-06-27 Thread Michael S. Tsirkin

On Wed, Jun 28, 2017 at 09:51:03AM +0800, Jason Wang wrote:
> We don't hold any tx lock when trying to disable TX during reset, this
> would lead a use after free since ndo_start_xmit() tries to access
> the virtqueue which has already been freed. Fix this by using
> netif_tx_disable() before freeing the vqs, this could make sure no tx
> after vq freeing.
> 
> Reported-by: Jean-Philippe Menil 
> Tested-by: Jean-Philippe Menil 
> Fixes commit f600b6905015 ("virtio_net: Add XDP support")
> Cc: John Fastabend 
> Signed-off-by: Jason Wang 

Acked-by: Michael S. Tsirkin 

Thanks a lot Jason. I think this is needed in stable as well.

> ---
>  drivers/net/virtio_net.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a871f45..143d8a9 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1797,6 +1797,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  	flush_work(&vi->config_work);
>  
>  	netif_device_detach(vi->dev);
> +	netif_tx_disable(vi->dev);
>  	cancel_delayed_work_sync(&vi->refill);
>  
>  	if (netif_running(vi->dev)) {
> -- 
> 2.7.4

[PATCH net] virtio-net: unbreak cusmed packet for small buffer XDP

2017-06-27 Thread Jason Wang

We should allow csumed packet for small buffer, otherwise XDP_PASS
won't work correctly.

Fixes commit bb91accf2733 ("virtio-net: XDP support for small buffers")
Signed-off-by: Jason Wang 
---
The patch is needed for -stable.
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 143d8a9..499fcc9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -413,7 +413,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		void *orig_data;
 		u32 act;
 
-		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
+		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
 		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
-- 
2.7.4
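
The behavioural change in this one-line patch can be modelled as two
predicates over the header fields the check reads. This is only a sketch:
the struct and function names are hypothetical, and the flag value used in
any caller is arbitrary rather than a real VIRTIO_NET_HDR_F_* constant.

```c
#include <stdbool.h>

/* Hypothetical mirror of the two virtio-net header fields the check reads. */
struct demo_vnet_hdr {
	unsigned char flags;    /* e.g. a checksum-related flag set by the host */
	unsigned char gso_type; /* non-zero for GSO packets */
};

/* Old check: any flag or GSO type sends the packet to the error path. */
static bool demo_reject_old(const struct demo_vnet_hdr *h)
{
	return h->gso_type || h->flags;
}

/* New check: only GSO packets are rejected; flagged packets pass. */
static bool demo_reject_new(const struct demo_vnet_hdr *h)
{
	return h->gso_type;
}
```

A packet carrying only a checksum flag is rejected by the old predicate and
accepted by the new one, which is the case the patch description refers to.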

[PATCH net] virtio-net: serialize tx routine during reset

2017-06-27 Thread Jason Wang

We don't hold any tx lock when trying to disable TX during reset, which
can lead to a use-after-free, since ndo_start_xmit() tries to access
the virtqueue which has already been freed. Fix this by calling
netif_tx_disable() before freeing the vqs; this makes sure there is no tx
after the vqs are freed.

Reported-by: Jean-Philippe Menil 
Tested-by: Jean-Philippe Menil 
Fixes commit f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend 
Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a871f45..143d8a9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1797,6 +1797,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	flush_work(&vi->config_work);
 
 	netif_device_detach(vi->dev);
+	netif_tx_disable(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
 
 	if (netif_running(vi->dev)) {
-- 
2.7.4

Re: [PATCH NET V5 2/2] net: hns: Use phy_driver to setup Phy loopback

2017-06-27 Thread Yunsheng Lin

Hi, Andrew

On 2017/6/27 21:29, Andrew Lunn wrote:
 -  phy_write(phy_dev, COPPER_CONTROL_REG, val);
 +  err = phy_resume(phy_dev);
>>>
>>> Maybe this was discussed with an earlier version of these patches. Why
>>> are using phy_resume() and phy_suspend()?
>> When self_test is invoked with ETH_TEST_FL_OFFLINE option, hns mac driver
>> call dev_close to set net dev to offline state if net dev is online.
>> Doing the actual phy loopback test require phy is power up, So phy_resume
>> and phy_suspend are used.
> 
> O.K, so you at least need some comments, because this is not obvious.
> 
>>From your description, it sounds like you can call phy_resume() on a
> device which is not suspended. 
Do you mean after calling dev_close, the device is still not suspended?
If that is the case, is there any way I can ensure the device is suspended?

> In general, suspend is expected to
> store away state which will be lost when powering down a
> device. Resume writes that state back into the device after it is
> powered up. So resuming a device which was never suspended could write
> bad state into it.
Do you mean phydev->suspended has bad state?

> 
> Also, what about if WOL has been set before closing the device?
phy_suspend will return an error:

int phy_suspend(struct phy_device *phydev)
{
struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
struct ethtool_wolinfo wol = { .cmd = ETHTOOL_GWOL };
int ret = 0;

/* If the device has WOL enabled, we cannot suspend the PHY */
phy_ethtool_get_wol(phydev, &wol);
if (wol.wolopts)
return -EBUSY;

if (phydev->drv && phydrv->suspend)
ret = phydrv->suspend(phydev);

if (ret)
return ret;

phydev->suspended = true;

return ret;
}

Best Regards
Yunsheng Lin
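
The two concerns raised above (resuming a PHY that was never suspended, and phy_suspend() refusing with -EBUSY when WOL is set) can be illustrated with a small userspace sketch. This is not kernel code: struct mock_phy and both helpers are hypothetical stand-ins, and the "only resume if suspended" guard is just one possible way a caller might address the concern, not something the patch under review does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few phy_device fields discussed here. */
struct mock_phy {
	bool suspended;    /* mirrors phydev->suspended */
	bool wol_enabled;  /* stands in for wol.wolopts != 0 */
	int resume_calls;  /* counts how often the resume hook would run */
};

/* Refuse to suspend while WOL is armed, like the phy_suspend() quoted above. */
static int mock_phy_suspend(struct mock_phy *phy)
{
	if (phy->wol_enabled)
		return -EBUSY;

	phy->suspended = true;
	return 0;
}

/* Assumed guard: only run the resume hook if a suspend really happened. */
static int mock_phy_resume_if_suspended(struct mock_phy *phy)
{
	if (!phy->suspended)
		return 0;

	phy->resume_calls++;
	phy->suspended = false;
	return 0;
}
```

With WOL set, mock_phy_suspend() fails and a later mock_phy_resume_if_suspended() leaves the device untouched, so no stale state is written back.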

Re:Re: [net PATCH] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Gao Feng

At 2017-06-28 01:49:50, "Eric Dumazet"  wrote:
>On Tue, 2017-06-27 at 10:08 -0700, Cong Wang wrote:
>> On Tue, Jun 27, 2017 at 9:50 AM, Eric Dumazet  wrote:
>> > On Tue, 2017-06-27 at 09:30 -0700, Cong Wang wrote:
>> >> On Mon, Jun 26, 2017 at 6:35 PM,   wrote:
>> >> > From: Gao Feng 
>> >> >
>> >> > When qdisc fail to init, qdisc_create would invoke the destroy callback
>> >> > to cleanup. But there is no check if the callback exists really. So it
>> >> > would cause the panic if there is no real destroy callback like these
>> >> > qdisc codel, pfifo, pfifo_fast, and so on.
>> >> >
>> >> > Now add one the check for destroy to avoid the possible panic.
>> >> >
>> >> > Signed-off-by: Gao Feng 
>> >>
>> >> Looks good,
>> >>
>> >> Acked-by: Cong Wang 
>> >>
>> >> This is introduced by commit 87b60cfacf9f17cf71933c6e33b.
>> >> Please add proper Fixes tag next time.

OK. Actually, I didn't know it was introduced by this commit :)
Do I need to send an updated patch?

>> >
>> > Given that pfifo, pfifo_fast or codel can not fail their init,
>> 
>> 
>> How about codel_init() -> codel_change() -> nla_parse_nested() ?
>
>
>Yeah, with a malicious user space then (iproute2/tc is fine), codel
>could be problematic.
>
>pfifo and pfifo_fast can definitely not hit this bug.
>
>changelog needs a bit of attention, even if the bug is real.
>
>Thanks.
>
>

Yes, codel could fail to init, and fifo/pfifo could also fail when
"nla_len(opt) < sizeof(*ctl)".

Best Regards
Feng
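
For readers following the thread, the fix under discussion amounts to checking an optional callback before calling it on the error path. A minimal userspace sketch of that pattern follows; struct mock_qdisc_ops, mock_qdisc_create() and the two helper callbacks are hypothetical names chosen for illustration and are not the real qdisc_create() code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down analogue of a qdisc ops table. */
struct mock_qdisc_ops {
	int (*init)(void *priv);
	void (*destroy)(void *priv);   /* optional, may be NULL */
};

/* Run init and, on failure, clean up only if a destroy hook exists. */
static int mock_qdisc_create(const struct mock_qdisc_ops *ops, void *priv)
{
	int err = ops->init ? ops->init(priv) : 0;

	if (err) {
		/* Without this check a NULL destroy pointer would be called. */
		if (ops->destroy)
			ops->destroy(priv);
		return err;
	}

	return 0;
}

/* Test helper: an init that always fails. */
static int failing_init(void *priv)
{
	(void)priv;
	return -EINVAL;
}

/* Test helper: counts how many times destroy was invoked. */
static void counting_destroy(void *priv)
{
	++*(int *)priv;
}
```

An ops table with a failing init and no destroy hook then returns the init error instead of dereferencing a NULL function pointer.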

Re: [PATCH NET V6 1/2] net: phy: Add phy loopback support in net phy framework

2017-06-27 Thread Yunsheng Lin

Hi, Madalin

On 2017/6/27 19:48, Madalin-cristian Bucur wrote:
>> -Original Message-
>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
>> On Behalf Of Lin Yun Sheng
>> Sent: Tuesday, June 27, 2017 2:01 PM
>> To: da...@davemloft.net; and...@lunn.ch; f.faine...@gmail.com
>> Cc: huangda...@hisilicon.com; xuw...@hisilicon.com;
>> liguo...@hisilicon.com; yisen.zhu...@huawei.com;
>> gabriele.paol...@huawei.com; john.ga...@huawei.com; linux...@huawei.com;
>> yisen.zhu...@huawei.com; salil.me...@huawei.com; lipeng...@huawei.com;
>> trem...@gmail.com; netdev@vger.kernel.org; linux-ker...@vger.kernel.org
>> Subject: [PATCH NET V6 1/2] net: phy: Add phy loopback support in net phy
>> framework
>>
>> This patch add set_loopback in phy_driver, which is used by Mac
>> driver to enable or disable a phy. it also add a generic
>> genphy_loopback function, which use BMCR loopback bit to enable
>> or disable a phy.
> 
> "disable a phy" or disable the PHY loopback function?
It should be "disable the PHY loopback function", thanks for pointing that out.

> 

>> @@ -1123,6 +1123,39 @@ int phy_resume(struct phy_device *phydev)
>>  }
>>  EXPORT_SYMBOL(phy_resume);
>>
>> +int phy_loopback(struct phy_device *phydev, bool enable)
>> +{
>> +struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
>> +int ret = 0;
>> +
>> +mutex_lock(&phydev->lock);
>> +
>> +if (enable && phydev->loopback_enabled) {
>> +ret = -EBUSY;
>> +goto out;
>> +}
>> +
>> +if (!enable && !phydev->loopback_enabled) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
> 
> if (enable == phydev->loopback_enabled)
A single if statement doesn't work here, since the two cases return different error codes.

> 
>> +if (phydev->drv && phydrv->set_loopback)
>> +ret = phydrv->set_loopback(phydev, enable);
>> +else
>> +ret = -EOPNOTSUPP;
>> +
>> +if (ret)
>> +goto out;
>> +
>> +phydev->loopback_enabled = enable;
>> +
>> +out:
>> +mutex_unlock(&phydev->lock);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(phy_loopback);
>> +
>

Best Regards
Yunsheng
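
The point about the two checks returning different error codes is easier to see in a standalone model of the state handling. The sketch below is a userspace approximation only: struct mock_phy, struct mock_phy_driver and mock_phy_loopback() are hypothetical names, and the mutex from the quoted patch is deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_phy;

/* Hypothetical driver table with an optional loopback hook. */
struct mock_phy_driver {
	int (*set_loopback)(struct mock_phy *phy, bool enable);
};

struct mock_phy {
	const struct mock_phy_driver *drv;
	bool loopback_enabled;
};

/* Same decision order as the quoted phy_loopback(), minus the locking. */
static int mock_phy_loopback(struct mock_phy *phy, bool enable)
{
	int ret;

	if (enable && phy->loopback_enabled)
		return -EBUSY;      /* already in loopback */

	if (!enable && !phy->loopback_enabled)
		return -EINVAL;     /* nothing to disable */

	if (!phy->drv || !phy->drv->set_loopback)
		return -EOPNOTSUPP; /* driver has no loopback support */

	ret = phy->drv->set_loopback(phy, enable);
	if (ret)
		return ret;

	phy->loopback_enabled = enable;
	return 0;
}

/* Test helper: a driver hook that always succeeds. */
static int always_ok(struct mock_phy *phy, bool enable)
{
	(void)phy;
	(void)enable;
	return 0;
}
```

Collapsing the first two checks into "enable == loopback_enabled" would force both redundant requests onto one error code, which is the reason given for keeping them separate.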

Re: [PATCH/RFC net-next 2/9] nfp: add phys_switch_id support

2017-06-27 Thread Jakub Kicinski

On Wed, 28 Jun 2017 01:21:42 +0200, Simon Horman wrote:
> Add phys_switch_id support by allowing lookup of
> SWITCHDEV_ATTR_ID_PORT_PARENT_ID via the nfp_repr_port_attr_get
> switchdev operation.
> 
> This is visible to user-space in the phys_switch_id attribute
> of a netdev.
> 
> e.g.
> cd /sys/devices/pci:00/:00:01.0/:01:00.0
> find . -name phys_switch_id | xargs grep .
> ./net/eth3/phys_switch_id:00154d1300bd
> ./net/eth4/phys_switch_id:00154d1300bd
> ./net/eth2/phys_switch_id:00154d1300bd
> grep: ./net/eth5/phys_switch_id: Operation not supported
> 
> In the above, eth2 and eth3 are representor netdevs for the first and second
> physical port. eth4 is the representor for the PF, and eth5 is the PF netdev.
> 
> Signed-off-by: Simon Horman 

Reviewed-by: Jakub Kicinski

[PATCH/RFC net-next 7/9] nfp: add metadata to each flow offload

2017-06-27 Thread Simon Horman

From: Pieter Jansen van Vuuren 

Adds metadata describing the mask id of each flow and keeps track of
flows installed in hardware. Previously a flow could not be removed
from hardware as there was no way of knowing whether a specific flow
was installed. This is solved by storing the offloaded flows in a
hash table.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/flower/main.c   |   8 -
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  52 +++
 .../net/ethernet/netronome/nfp/flower/metadata.c   | 379 +
 .../net/ethernet/netronome/nfp/flower/offload.c|  30 +-
 5 files changed, 459 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/metadata.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 1ba0ea78adc3..b8e1358868bd 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -35,6 +35,7 @@ nfp-objs += \
flower/cmsg.o \
flower/main.o \
flower/match.o \
+   flower/metadata.o \
flower/offload.o
 endif
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 7b27871f489c..a7c9dea8cb9c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -47,14 +47,6 @@
 #include "../nfp_port.h"
 #include "./cmsg.h"
 
-/**
- * struct nfp_flower_priv - Flower APP per-vNIC priv data
- * @nn: Pointer to vNIC
- */
-struct nfp_flower_priv {
-   struct nfp_net *nn;
-};
-
 static const char *nfp_flower_extra_cap(struct nfp_app *app, struct nfp_net 
*nn)
 {
return "FLOWER";
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 52db2acb250e..cc184618306c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -34,6 +34,7 @@
 #ifndef __NFP_FLOWER_H__
 #define __NFP_FLOWER_H__ 1
 
+#include 
 #include 
 
 #include "cmsg.h"
@@ -45,6 +46,42 @@ struct tc_to_netdev;
 struct net_device;
 struct nfp_app;
 
+#define NFP_FLOWER_HASH_BITS   10
+#define NFP_FLOWER_HASH_SEED   129004
+
+#define NFP_FLOWER_MASK_ENTRY_RS   256
+#define NFP_FLOWER_MASK_ELEMENT_RS 1
+#define NFP_FLOWER_MASK_HASH_BITS  10
+#define NFP_FLOWER_MASK_HASH_SEED  9198806
+
+#define NFP_FL_META_FLAG_NEW_MASK  128
+#define NFP_FL_META_FLAG_LAST_MASK 1
+
+#define NFP_FL_MASK_REUSE_TIME 40
+#define NFP_FL_MASK_ID_LOCATION 1
+
+struct nfp_fl_mask_id {
+   struct circ_buf mask_id_free_list;
+   struct timeval *last_used;
+   u8 init_unallocated;
+};
+
+/**
+ * struct nfp_flower_priv - Flower APP per-vNIC priv data
+ * @nn:Pointer to vNIC
+ * @flower_version:HW version of flower
+ * @mask_ids:  List of free mask ids
+ * @mask_table:Hash table used to store masks
+ * @flow_table:Hash table used to store flower rules
+ */
+struct nfp_flower_priv {
+   struct nfp_net *nn;
+   u64 flower_version;
+   struct nfp_fl_mask_id mask_ids;
+   DECLARE_HASHTABLE(mask_table, NFP_FLOWER_MASK_HASH_BITS);
+   DECLARE_HASHTABLE(flow_table, NFP_FLOWER_HASH_BITS);
+};
+
 struct nfp_fl_key_ls {
u32 key_layer_two;
u8 key_layer;
@@ -69,6 +106,10 @@ struct nfp_fl_payload {
char *action_data;
 };
 
+int nfp_flower_metadata_init(struct nfp_app *app);
+void nfp_flower_metadata_cleanup(struct nfp_app *app);
+
+int nfp_repr_get_port_id(struct net_device *netdev);
 int nfp_flower_repr_init(struct nfp_app *app);
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
u32 handle, __be16 proto, struct tc_to_netdev *tc);
@@ -79,5 +120,16 @@ int nfp_flower_compile_flow_match(struct 
tc_cls_flower_offload *flow,
 int nfp_flower_compile_action(struct tc_cls_flower_offload *flow,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow);
+int nfp_compile_flow_metadata(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
+ struct nfp_fl_payload *nfp_flow);
+int nfp_modify_flow_metadata(struct nfp_app *app,
+struct nfp_fl_payload *nfp_flow);
+
+struct nfp_fl_payload *
+nfp_flower_find_in_fl_table(struct nfp_app *app,
+   unsigned long tc_flower_cookie);
+int nfp_flower_remove_fl_table(struct nfp_app *app,
+  unsigned long tc_flower_cookie);
 
 #endif
diff

[PATCH/RFC net-next 8/9] nfp: add a stats handler for flower offloads

2017-06-27 Thread Simon Horman

From: Pieter Jansen van Vuuren 

Previously there was no way of updating flow rule stats after they
have been offloaded to hardware. This is solved by keeping track of
stats received from hardware and providing this to the TC handler
on request.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  26 
 .../net/ethernet/netronome/nfp/flower/metadata.c   | 143 -
 .../net/ethernet/netronome/nfp/flower/offload.c|  14 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   1 +
 4 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index cc184618306c..8490ef1129ea 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -38,6 +38,7 @@
 #include 
 
 #include "cmsg.h"
+#include "../nfp_net.h"
 
 #define NFP_FLOWER_ALLOWED_VER 0x00010001UL
 
@@ -46,9 +47,13 @@ struct tc_to_netdev;
 struct net_device;
 struct nfp_app;
 
+#define NFP_FL_REPEATED_HASH_MAX   BIT(17)
 #define NFP_FLOWER_HASH_BITS   10
 #define NFP_FLOWER_HASH_SEED   129004
 
+#define NFP_FL_STATS_ENTRY_RS  BIT(20)
+#define NFP_FL_STATS_ELEM_RS   4
+
 #define NFP_FLOWER_MASK_ENTRY_RS   256
 #define NFP_FLOWER_MASK_ELEMENT_RS 1
 #define NFP_FLOWER_MASK_HASH_BITS  10
@@ -66,10 +71,17 @@ struct nfp_fl_mask_id {
u8 init_unallocated;
 };
 
+struct nfp_fl_stats_id {
+   struct circ_buf free_list;
+   u32 init_unalloc;
+   u8 repeated_em_count;
+};
+
 /**
  * struct nfp_flower_priv - Flower APP per-vNIC priv data
  * @nn:Pointer to vNIC
  * @flower_version:HW version of flower
+ * @stats_ids: List of free stats ids
  * @mask_ids:  List of free mask ids
  * @mask_table:Hash table used to store masks
  * @flow_table:Hash table used to store flower rules
@@ -77,6 +89,7 @@ struct nfp_fl_mask_id {
 struct nfp_flower_priv {
struct nfp_net *nn;
u64 flower_version;
+   struct nfp_fl_stats_id stats_ids;
struct nfp_fl_mask_id mask_ids;
DECLARE_HASHTABLE(mask_table, NFP_FLOWER_MASK_HASH_BITS);
DECLARE_HASHTABLE(flow_table, NFP_FLOWER_HASH_BITS);
@@ -101,11 +114,20 @@ struct nfp_fl_rule_metadata {
 
 struct nfp_fl_payload {
struct nfp_fl_rule_metadata meta;
+   spinlock_t lock_nfp_flow_stats; /* serialize flow stats access. */
+   struct nfp_stat_pair stats;
char *unmasked_data;
char *mask_data;
char *action_data;
 };
 
+struct nfp_fl_stats_frame {
+   __be32 stats_con_id;
+   __be32 pkt_count;
+   __be64 byte_count;
+   __be64 stats_cookie;
+};
+
 int nfp_flower_metadata_init(struct nfp_app *app);
 void nfp_flower_metadata_cleanup(struct nfp_app *app);
 
@@ -132,4 +154,8 @@ nfp_flower_find_in_fl_table(struct nfp_app *app,
 int nfp_flower_remove_fl_table(struct nfp_app *app,
   unsigned long tc_flower_cookie);
 
+void nfp_flower_rx_flow_stats(struct nfp_app *app, struct sk_buff *skb);
+void nfp_flower_populate_stats(struct nfp_fl_payload *nfp_flow);
+void nfp_flower_stats_clear(struct nfp_fl_payload *nfp_flow);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index acbf4c757988..75a98da049f7 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -39,6 +39,7 @@
 #include 
 
 #include "main.h"
+#include "cmsg.h"
 
 struct nfp_mask_id_table {
struct hlist_node link;
@@ -53,6 +54,115 @@ struct nfp_flower_table {
struct hlist_node link;
 };
 
+static int nfp_release_stats_entry(struct nfp_app *app, u32 stats_context_id)
+{
+   struct nfp_flower_priv *priv = app->priv;
+   struct circ_buf *ring;
+
+   ring = &priv->stats_ids.free_list;
+   /* Check if buffer is full. */
+   if (!CIRC_SPACE(ring->head, ring->tail, NFP_FL_STATS_ENTRY_RS *
+   NFP_FL_STATS_ELEM_RS -
+   NFP_FL_STATS_ELEM_RS + 1))
+   return -ENOBUFS;
+
+   memcpy(&ring->buf[ring->head], &stats_context_id, NFP_FL_STATS_ELEM_RS);
+   ring->head = (ring->head + NFP_FL_STATS_ELEM_RS) %
+(NFP_FL_STATS_ENTRY_RS * NFP_FL_STATS_ELEM_RS);
+
+   return 0;
+}
+
+static int nfp_get_stats_entry(struct nfp_app *app, u32 *stats_context_id)
+{
+   struct nfp_flower_priv *priv = app->priv;
+   u32 freed_stats_id, temp_stats_id;
+   struct circ_buf *ring;
+
+   ring = &priv->stats_ids.free_list;
+   freed_stats_id = NFP_FL_STATS_ENTRY_RS;
+   /* Check for unallocated entries first. */
+   if

[PATCH/RFC net-next 9/9] nfp: add control message passing capabilities to flower offloads

2017-06-27 Thread Simon Horman

From: Pieter Jansen van Vuuren 

Previously the flower offloads never sent messages to the hardware,
and never registered a handler for receiving messages from hardware.
This patch enables the flower offloads to send control messages to
hardware when adding and removing flow rules. Additionally it
registers a control message rx handler for receiving stats updates
from hardware for each offloaded flow.

Additionally this patch adds 4 control message types: add, modify and
delete flow, as well as flow stats. It also allows
nfp_flower_cmsg_get_data() to be used outside of cmsg.c.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   | 11 ++---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 12 +
 .../net/ethernet/netronome/nfp/flower/offload.c| 57 ++
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 916a6196d2ba..dd7fa9cf225f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 
+#include "main.h"
 #include "../nfpcore/nfp_cpp.h"
 #include "../nfp_net_repr.h"
 #include "./cmsg.h"
@@ -52,12 +53,7 @@ nfp_flower_cmsg_get_hdr(struct sk_buff *skb)
return (struct nfp_flower_cmsg_hdr *)skb->data;
 }
 
-static void *nfp_flower_cmsg_get_data(struct sk_buff *skb)
-{
-   return (unsigned char *)skb->data + NFP_FLOWER_CMSG_HLEN;
-}
-
-static struct sk_buff *
+struct sk_buff *
 nfp_flower_cmsg_alloc(struct nfp_app *app, unsigned int size,
  enum nfp_flower_cmsg_type_port type)
 {
@@ -148,6 +144,9 @@ void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff 
*skb)
case NFP_FLOWER_CMSG_TYPE_PORT_MOD:
nfp_flower_cmsg_portmod_rx(app, skb);
break;
+   case NFP_FLOWER_CMSG_TYPE_FLOW_STATS:
+   nfp_flower_rx_flow_stats(app, skb);
+   break;
default:
nfp_flower_cmsg_warn(app, "Cannot handle invalid repr control 
type %u\n",
 type);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 4c72e537af32..5a997feb6f80 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -245,7 +245,11 @@ struct nfp_flower_cmsg_hdr {
 
 /* Types defined for port related control messages  */
 enum nfp_flower_cmsg_type_port {
+   NFP_FLOWER_CMSG_TYPE_FLOW_ADD = 0,
+   NFP_FLOWER_CMSG_TYPE_FLOW_MOD = 1,
+   NFP_FLOWER_CMSG_TYPE_FLOW_DEL = 2,
NFP_FLOWER_CMSG_TYPE_PORT_MOD = 8,
+   NFP_FLOWER_CMSG_TYPE_FLOW_STATS =   15,
NFP_FLOWER_CMSG_TYPE_PORT_ECHO = 16,
NFP_FLOWER_CMSG_TYPE_MAX =  32,
 };
@@ -300,7 +304,15 @@ nfp_flower_cmsg_pcie_port(u8 nfp_pcie, enum 
nfp_flower_cmsg_port_vnic_type type,
   NFP_FLOWER_CMSG_PORT_TYPE_PCIE_PORT);
 }
 
+static inline void *nfp_flower_cmsg_get_data(struct sk_buff *skb)
+{
+   return (unsigned char *)skb->data + NFP_FLOWER_CMSG_HLEN;
+}
+
 int nfp_flower_cmsg_portmod(struct nfp_repr *repr, bool carrier_ok);
 void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff *skb);
+struct sk_buff *
+nfp_flower_cmsg_alloc(struct nfp_app *app, unsigned int size,
+ enum nfp_flower_cmsg_type_port type);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index b39c96623657..4dcd50675926 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -45,6 +45,52 @@
 #include "../nfp_net.h"
 #include "../nfp_port.h"
 
+static int
+nfp_flower_xmit_flow(struct net_device *netdev,
+struct nfp_fl_payload *nfp_flow, u8 mtype)
+{
+   u32 meta_len, key_len, mask_len, act_len, tot_len;
+   struct nfp_repr *priv = netdev_priv(netdev);
+   struct sk_buff *skb;
+   unsigned char *msg;
+
+   meta_len =  sizeof(struct nfp_fl_rule_metadata);
+   key_len = nfp_flow->meta.key_len;
+   mask_len = nfp_flow->meta.mask_len;
+   act_len = nfp_flow->meta.act_len;
+
+   tot_len = meta_len + key_len + mask_len + act_len;
+
+   /* Convert to long words as firmware expects
+* lengths in units of NFP_FL_LW_SIZ.
+*/
+   nfp_flow->meta.key_len /= NFP_FL_LW_SIZ;
+   nfp_flow->meta.mask_len /= NFP_FL_LW_SIZ;
+   nfp_flow->meta.act_len /= NFP_FL_LW_SIZ;
+
+   skb = nfp_flower_cmsg_alloc(priv->app, tot_len, mtype);
+   if (!skb)
+   return -ENOMEM;
+
+   msg =

[PATCH/RFC net-next 4/9] nfp: extend flower add flow offload

2017-06-27 Thread Simon Horman

From: Pieter Jansen van Vuuren 

Extends the flower flow add function by calculating which match
fields are present in the flower offload structure and allocating
the appropriate space to describe these.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 141 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  24 +++
 .../net/ethernet/netronome/nfp/flower/offload.c| 166 -
 3 files changed, 330 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index c10ae7631941..1b1888e8dc14 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -40,6 +40,147 @@
 
 #include "../nfp_app.h"
 
+#define NFP_FLOWER_LAYER_META  BIT(0)
+#define NFP_FLOWER_LAYER_PORT  BIT(1)
+#define NFP_FLOWER_LAYER_MAC   BIT(2)
+#define NFP_FLOWER_LAYER_TP BIT(3)
+#define NFP_FLOWER_LAYER_IPV4  BIT(4)
+#define NFP_FLOWER_LAYER_IPV6  BIT(5)
+#define NFP_FLOWER_LAYER_CT BIT(6)
+#define NFP_FLOWER_LAYER_VXLAN BIT(7)
+
+#define NFP_FLOWER_LAYER_ETHER BIT(3)
+#define NFP_FLOWER_LAYER_ARP   BIT(4)
+
+/* Metadata without L2 (1W/4B)
+ * 
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  key_layers   |mask_id|   reserved|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_meta_one {
+   u8 nfp_flow_key_layer;
+   u8 mask_id;
+   u16 reserved;
+};
+
+/* Metadata with L2 (1W/4B)
+ * 
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |key_type   |mask_id| PCP |p|   vlan outermost VID  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * ^   ^
+ *   NOTE: | TCI   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_meta_two {
+   u8 nfp_flow_key_layer;
+   u8 mask_id;
+   __be16 tci;
+};
+
+/* Port details (1W/4B)
+ * 
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | port_ingress  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_in_port {
+   __be32 in_port;
+};
+
+/* L2 details (4W/16B)
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | mac_addr_dst, 31 - 0  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  mac_addr_dst, 47 - 32| mac_addr_src, 15 - 0  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | mac_addr_src, 47 - 16 |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |   mpls outermost label|  TC |B|   reserved  |q|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_mac_mpls {
+   u8 mac_dst[6];
+   u8 mac_src[6];
+   __be32 mpls_lse;
+};
+
+/* L4 ports (for UDP, TCP, SCTP) (1W/4B)
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |port_src   |   port_dst|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_tp_ports {
+   __be16 port_src;
+   __be16 port_dst;
+};
+
+/* L3 IPv4 details (3W/12B)
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |DSCP   |ECN|   protocol|   reserved|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |ipv4_addr_src  |
+ *

[PATCH/RFC net-next 5/9] nfp: extend flower matching capabilities

2017-06-27 Thread Simon Horman

From: Pieter Jansen van Vuuren 

Extends matching capabilities for flower offloads to include vlan,
layer 2, layer 3 and layer 4 type matches. This includes both exact
and wildcard matching.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |   4 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   6 +
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 292 +
 .../net/ethernet/netronome/nfp/flower/offload.c|   5 +
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h  |  10 +-
 6 files changed, 317 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/match.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index d7afd2b410fe..018cef3fa10a 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -33,6 +33,7 @@ ifeq ($(CONFIG_NFP_APP_FLOWER),y)
 nfp-objs += \
flower/cmsg.o \
flower/main.o \
+   flower/match.o \
flower/offload.o
 endif
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 1b1888e8dc14..1956c1acf39f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -52,6 +52,10 @@
 #define NFP_FLOWER_LAYER_ETHER BIT(3)
 #define NFP_FLOWER_LAYER_ARP   BIT(4)
 
+#define NFP_FLOWER_MASK_VLAN_PRIO  GENMASK(15, 13)
+#define NFP_FLOWER_MASK_VLAN_CFI   BIT(12)
+#define NFP_FLOWER_MASK_VLAN_VID   GENMASK(11, 0)
+
 /* Metadata without L2 (1W/4B)
  * 
  *3   2   1
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index b4e5d9b75c01..d2b2bf783f32 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -38,6 +38,7 @@
 
 #define NFP_FLOWER_ALLOWED_VER 0x00010001UL
 
+struct tc_cls_flower_offload;
 struct tc_to_netdev;
 struct net_device;
 struct nfp_app;
@@ -69,4 +70,9 @@ struct nfp_fl_payload {
 int nfp_flower_repr_init(struct nfp_app *app);
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
u32 handle, __be16 proto, struct tc_to_netdev *tc);
+int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+ struct nfp_fl_key_ls *key_ls,
+ struct net_device *netdev,
+ struct nfp_fl_payload *nfp_flow);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c 
b/drivers/net/ethernet/netronome/nfp/flower/match.c
new file mode 100644
index ..b14c6b2be803
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -0,0 +1,292 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "main.h"
+#include "cmsg.h"
+
+static void
+nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
+   struct tc_cls_flower_offload *flow, u8 key_type,
+   bool mask_version)
+{
+   struct flow_dissector_key_vlan
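
The three NFP_FLOWER_MASK_VLAN_* masks added to cmsg.h in this patch line up with the usual 16-bit VLAN TCI layout (3 priority bits, 1 CFI bit, 12 VID bits). How the truncated function above uses them is not visible in this archive, so the following is only a userspace sketch of packing and unpacking such a field; the TCI_MASK_* names and helper functions are hypothetical, and the value is kept in host byte order rather than the __be16 the driver structs use.

```c
#include <stdint.h>

/* Userspace equivalents of the masks added to cmsg.h. */
#define TCI_MASK_PRIO 0xe000u  /* GENMASK(15, 13) */
#define TCI_MASK_CFI  0x1000u  /* BIT(12) */
#define TCI_MASK_VID  0x0fffu  /* GENMASK(11, 0) */

/* Pack priority, CFI and VLAN id into one host-order TCI value. */
static uint16_t tci_pack(unsigned int prio, unsigned int cfi, unsigned int vid)
{
	return (uint16_t)(((prio << 13) & TCI_MASK_PRIO) |
			  ((cfi << 12) & TCI_MASK_CFI) |
			  (vid & TCI_MASK_VID));
}

/* Extract the 3-bit priority field. */
static unsigned int tci_prio(uint16_t tci)
{
	return (tci & TCI_MASK_PRIO) >> 13;
}

/* Extract the 12-bit VLAN id. */
static unsigned int tci_vid(uint16_t tci)
{
	return tci & TCI_MASK_VID;
}
```

Out-of-range inputs are silently masked to the field width, which keeps the neighbouring fields intact.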

[PATCH/RFC net-next 1/9] net: switchdev: add SET_SWITCHDEV_OPS helper

2017-06-27 Thread Simon Horman

Add a helper to allow switchdev ops to be set if NET_SWITCHDEV is configured
and do nothing otherwise. This allows for slightly cleaner code which
uses switchdev but does not select NET_SWITCHDEV.

Signed-off-by: Simon Horman 
---
 include/net/switchdev.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index c784a6ac6ef1..8ae9e3b6392e 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -217,6 +217,8 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
 
 bool switchdev_port_same_parent_id(struct net_device *a,
   struct net_device *b);
+
+#define SWITCHDEV_SET_OPS(netdev, ops) ((netdev)->switchdev_ops = (ops))
 #else
 
 static inline void switchdev_deferred_process(void)
@@ -322,6 +324,8 @@ static inline bool switchdev_port_same_parent_id(struct 
net_device *a,
return false;
 }
 
+#define SWITCHDEV_SET_OPS(netdev, ops) do {} while (0)
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
-- 
2.1.4
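
The helper added by this patch follows a common pattern: a macro that performs an assignment when a feature is configured and expands to an empty statement otherwise, so callers need no #ifdef of their own. A self-contained userspace sketch of the same idea follows; every MOCK_/mock_ name is hypothetical, and MOCK_NET_SWITCHDEV merely stands in for the kernel config symbol.

```c
#include <stddef.h>

/* Stand-in for the config symbol; remove to build the no-op variant. */
#define MOCK_NET_SWITCHDEV 1

struct mock_switchdev_ops {
	int unused;
};

struct mock_netdev {
	const struct mock_switchdev_ops *switchdev_ops;
};

#ifdef MOCK_NET_SWITCHDEV
#define MOCK_SWITCHDEV_SET_OPS(netdev, ops) ((netdev)->switchdev_ops = (ops))
#else
#define MOCK_SWITCHDEV_SET_OPS(netdev, ops) do {} while (0)
#endif

const struct mock_switchdev_ops mock_ops = { 0 };

/* Driver-style init code: identical source for both configurations. */
static void mock_netdev_init(struct mock_netdev *netdev)
{
	netdev->switchdev_ops = NULL;
	MOCK_SWITCHDEV_SET_OPS(netdev, &mock_ops);
}
```

In the no-op variant the macro arguments are never evaluated, so the ops expression must not carry side effects the caller relies on.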

[PATCH/RFC net-next 2/9] nfp: add phys_switch_id support

2017-06-27 Thread Simon Horman

Add phys_switch_id support by allowing lookup of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID via the nfp_repr_port_attr_get
switchdev operation.

This is visible to user-space in the phys_switch_id attribute
of a netdev.

e.g.
cd /sys/devices/pci:00/:00:01.0/:01:00.0
find . -name phys_switch_id | xargs grep .
./net/eth3/phys_switch_id:00154d1300bd
./net/eth4/phys_switch_id:00154d1300bd
./net/eth2/phys_switch_id:00154d1300bd
grep: ./net/eth5/phys_switch_id: Operation not supported

In the above, eth2 and eth3 are representor netdevs for the first and second
physical port. eth4 is the representor for the PF, and eth5 is the PF netdev.

Signed-off-by: Simon Horman 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  3 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |  2 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.c  | 28 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |  3 +++
 4 files changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2e728543e840..b5834525c5f0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include "nfpcore/nfp_nsp.h"
@@ -3703,6 +3704,8 @@ static void nfp_net_netdev_init(struct nfp_net *nn)
netdev->netdev_ops = &nfp_net_netdev_ops;
netdev->watchdog_timeo = msecs_to_jiffies(5 * 1000);
 
+   SWITCHDEV_SET_OPS(netdev, &nfp_port_switchdev_ops);
+
/* MTU range: 68 - hw-specific max */
netdev->min_mtu = ETH_MIN_MTU;
netdev->max_mtu = nn->max_mtu;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index 046b89eb4cf2..bc9108071e5b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "nfpcore/nfp_cpp.h"
 #include "nfpcore/nfp_nsp.h"
@@ -299,6 +300,7 @@ int nfp_repr_init(struct nfp_app *app, struct net_device 
*netdev,
repr->dst->u.port_info.lower_dev = pf_netdev;
 
netdev->netdev_ops = &nfp_repr_netdev_ops;
+   SWITCHDEV_SET_OPS(netdev, &nfp_port_switchdev_ops);
 
err = register_netdev(netdev);
if (err)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c 
b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index 0b44952945d8..c95215eb87c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -59,6 +59,34 @@ struct nfp_port *nfp_port_from_netdev(struct net_device 
*netdev)
return NULL;
 }
 
+static int
+nfp_port_attr_get(struct net_device *netdev, struct switchdev_attr *attr)
+{
+   struct nfp_port *port;
+
+   port = nfp_port_from_netdev(netdev);
+   if (!port)
+   return -EOPNOTSUPP;
+
+   switch (attr->id) {
+   case SWITCHDEV_ATTR_ID_PORT_PARENT_ID: {
+   const u8 *serial;
+   /* N.B: attr->u.ppid.id is binary data */
+   attr->u.ppid.id_len = nfp_cpp_serial(port->app->cpp, &serial);
+   memcpy(&attr->u.ppid.id, serial, attr->u.ppid.id_len);
+   break;
+   }
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   return 0;
+}
+
+const struct switchdev_ops nfp_port_switchdev_ops = {
+   .switchdev_port_attr_get= nfp_port_attr_get,
+};
+
 struct nfp_port *
 nfp_port_from_id(struct nfp_pf *pf, enum nfp_port_type type, unsigned int id)
 {
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h 
b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index 57d852a4ca59..de60cacd3362 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -35,6 +35,7 @@
 #define _NFP_PORT_H_
 
 #include 
+#include 
 
 struct net_device;
 struct nfp_app;
@@ -106,6 +107,8 @@ struct nfp_port {
struct list_head port_list;
 };
 
+extern const struct switchdev_ops nfp_port_switchdev_ops;
+
 struct nfp_port *nfp_port_from_netdev(struct net_device *netdev);
 struct nfp_port *
 nfp_port_from_id(struct nfp_pf *pf, enum nfp_port_type type, unsigned int id);
-- 
2.1.4

[PATCH/RFC net-next 3/9] nfp: provide infrastructure for offloading flower based TC filters

2017-06-27 Thread Simon Horman

From: Pieter Jansen van Vuuren 

Adds a flower based TC offload handler for representor devices. This
is in addition to the bpf based offload handler. The changes in this
patch will be used in a follow-up patch to add tc flower offload to
the NFP.

The flower app enables tc offloads on representors by default.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   3 +-
 drivers/net/ethernet/netronome/nfp/flower/main.c   |   9 ++
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  48 +++
 .../net/ethernet/netronome/nfp/flower/offload.c| 144 +
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |  18 +++
 5 files changed, 221 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/offload.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 43bdbc228969..d7afd2b410fe 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -32,7 +32,8 @@ nfp-objs := \
 ifeq ($(CONFIG_NFP_APP_FLOWER),y)
 nfp-objs += \
flower/cmsg.o \
-   flower/main.o
+   flower/main.o \
+   flower/offload.o
 endif
 
 ifeq ($(CONFIG_BPF_SYSCALL),y)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index ab68a8f58862..7b27871f489c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 
+#include "main.h"
 #include "../nfpcore/nfp_cpp.h"
 #include "../nfpcore/nfp_nsp.h"
 #include "../nfp_app.h"
@@ -303,8 +304,14 @@ static int nfp_flower_vnic_init(struct nfp_app *app, 
struct nfp_net *nn,
eth_hw_addr_random(nn->dp.netdev);
netif_keep_dst(nn->dp.netdev);
 
+   if (nfp_flower_repr_init(app))
+   goto err_free_priv;
+
return 0;
 
+err_free_priv:
+   kfree(app->priv);
+   app->priv = NULL;
 err_invalid_port:
nn->port = nfp_port_alloc(app, NFP_PORT_INVALID, nn->dp.netdev);
return PTR_ERR_OR_ZERO(nn->port);
@@ -367,4 +374,6 @@ const struct nfp_app_type app_flower = {
 
.eswitch_mode_get  = eswitch_mode_get,
.repr_get   = nfp_flower_repr_get,
+
+   .setup_tc   = nfp_flower_setup_tc,
 };
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
new file mode 100644
index ..119f66068c2b
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __NFP_FLOWER_H__
+#define __NFP_FLOWER_H__ 1
+
+#include 
+
+#define NFP_FLOWER_ALLOWED_VER 0x00010001UL
+
+struct tc_to_netdev;
+struct net_device;
+struct nfp_app;
+
+int nfp_flower_repr_init(struct nfp_app *app);
+int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
+   u32 handle, __be16 proto, struct tc_to_netdev *tc);
+#endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
new file mode 100644
index ..9127c28ea9c3
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -0,0 +1,144 @@

[PATCH/RFC net-next 6/9] nfp: add basic action capabilities to flower offloads

2017-06-27 Thread Simon Horman

From: Pieter Jansen van Vuuren 

Adds push vlan, pop vlan, output and drop action capabilities
to flower offloads.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/flower/action.c | 210 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  45 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   5 +
 .../net/ethernet/netronome/nfp/flower/offload.c|  11 ++
 5 files changed, 272 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/action.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 018cef3fa10a..1ba0ea78adc3 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -31,6 +31,7 @@ nfp-objs := \
 
 ifeq ($(CONFIG_NFP_APP_FLOWER),y)
 nfp-objs += \
+   flower/action.o \
flower/cmsg.o \
flower/main.o \
flower/match.o \
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
new file mode 100644
index ..391afb55504c
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "main.h"
+#include "../nfp_net_repr.h"
+
+static void nfp_fl_pop_vlan(struct nfp_fl_pop_vlan *pop_vlan)
+{
+   size_t act_size = sizeof(struct nfp_fl_pop_vlan);
+   u16 tmp_pop_vlan_op;
+
+   tmp_pop_vlan_op =
+   FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size / NFP_FL_LW_SIZ) |
+   FIELD_PREP(NFP_FL_ACT_JMP_ID, NFP_FL_ACTION_OPCODE_POP_VLAN);
+
+   pop_vlan->a_op = cpu_to_be16(tmp_pop_vlan_op);
+   pop_vlan->reserved = 0;
+}
+
+static void
+nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
+const struct tc_action *action)
+{
+   size_t act_size = sizeof(struct nfp_fl_push_vlan);
+   struct tcf_vlan *vlan = to_vlan(action);
+   u16 tmp_push_vlan_tci;
+   u16 tmp_push_vlan_op;
+
+   tmp_push_vlan_op =
+   FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size / NFP_FL_LW_SIZ) |
+   FIELD_PREP(NFP_FL_ACT_JMP_ID, NFP_FL_ACTION_OPCODE_PUSH_VLAN);
+
+   push_vlan->a_op = cpu_to_be16(tmp_push_vlan_op);
+   /* Set action push vlan parameters. */
+   push_vlan->reserved = 0;
+   push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
+
+   tmp_push_vlan_tci =
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   NFP_FL_PUSH_VLAN_CFI;
+   push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
+}
+
+static int
+nfp_fl_output(struct nfp_fl_output *output, const struct tc_action *action,
+ struct nfp_fl_payload *nfp_flow, bool last,
+ struct net_device *in_dev)
+{
+   size_t act_size = sizeof(struct nfp_fl_output);
+   struct net_device *out_dev;
+   u16 tmp_output_op;
+   int ifindex;
+
+   /* Set action opcode to output action. */
+   tmp_output_op =
+   FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size / NFP_FL_LW_SIZ) |
+   FIELD_PREP(NFP_FL_ACT_JMP_ID, NFP_FL_ACTION_OPCODE_OUTPUT);
+
+   output->a_op

[PATCH/RFC net-next 0/9] introduce flower offload capabilities

2017-06-27 Thread Simon Horman

Hi,

this series adds flower offload to the NFP driver. It builds on recent
work to add representors and a skeleton flower app; the app now does what
its name says.

In general the approach taken is to allow some flows within
the universe of possible flower matches and tc actions to be offloaded.
It is planned that this support will grow over time but the support
offered by this patch-set seems to be a reasonable starting point.

Pieter Jansen van Vuuren (7):
  nfp: provide infrastructure for offloading flower based TC filters
  nfp: extend flower add flow offload
  nfp: extend flower matching capabilities
  nfp: add basic action capabilities to flower offloads
  nfp: add metadata to each flow offload
  nfp: add a stats handler for flower offloads
  nfp: add control message passing capabilities to flower offloads

Simon Horman (2):
  net: switchdev: add SET_SWITCHDEV_OPS helper
  nfp: add phys_switch_id support

 drivers/net/ethernet/netronome/nfp/Makefile|   6 +-
 drivers/net/ethernet/netronome/nfp/flower/action.c | 210 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |  11 +-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 202 
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  17 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   | 161 +++
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 292 
 .../net/ethernet/netronome/nfp/flower/metadata.c   | 518 +
 .../net/ethernet/netronome/nfp/flower/offload.c| 417 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   3 +
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |  20 +
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h  |  10 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.c  |  28 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |   3 +
 include/net/switchdev.h|   4 +
 16 files changed, 1887 insertions(+), 16 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/action.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/match.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/metadata.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/offload.c

-- 
2.1.4

Re: [PATCH iproute2 0/3] ip-link: XDP flags and offload mode

2017-06-27 Thread Stephen Hemminger

On Mon, 26 Jun 2017 17:23:50 -0700
Jakub Kicinski  wrote:

> Hi!
> 
> This series adds support for specifying DRV_MODE and new HW_MODE
> flags when binding an XDP program to the driver.  It also teaches
> ip link about "xdpoffload" attachment mode.
> 
> Examples:
> # ip link set dev p4p1 xdpoffload obj prog.o sec '.text'
> # ip link show dev p4p1
> 60: p4p1:  mtu 1500 xdpoffload/id:2 qdisc noop state 
> DOWN mode DEFAULT group default qlen 1000
> link/ether 00:15:4d:12:27:6b brd ff:ff:ff:ff:ff:ff
> # ip link set dev p4p1 xdpoffload off
> 
> Note: this is based on top of Martin's "bpf: Add support for 
> IFLA_XDP_PROG_ID".
> 
> Jakub Kicinski (3):
>   bpf: print xdp offloaded mode
>   bpf: add xdpdrv for requesting XDP driver mode
>   bpf: allow requesting XDP HW offload
> 
>  include/linux/if_link.h |  2 ++
>  ip/iplink.c |  7 ++-
>  ip/iplink_xdp.c |  9 -
>  ip/xdp.h|  3 ++-
>  man/man8/ip-link.8.in   | 13 -
>  5 files changed, 30 insertions(+), 4 deletions(-)
> 

Applied to net-next.
Next time please put net-next in the patch subject line (same as for kernel).

Re: [PATCH iproute2 net-next] bpf: Add support for IFLA_XDP_PROG_ID

2017-06-27 Thread Stephen Hemminger

On Wed, 21 Jun 2017 14:29:42 -0700
Martin KaFai Lau  wrote:

> This patch adds support to the newly added IFLA_XDP_PROG_ID.
> 
> ./ip link show dev eth0
> 3: eth0:  mtu 1500 xdpgeneric/id:2 qdisc 
> [...]
> 
> Signed-off-by: Martin KaFai Lau 

Applied to net-next

Re: [PATCH iproute2] bpf: indicate lderr when bpf_apply_relo_data fails

2017-06-27 Thread Daniel Borkmann


On 06/28/2017 01:09 AM, Stephen Hemminger wrote:
> On Tue, 27 Jun 2017 02:48:36 +0200
> Daniel Borkmann  wrote:
>
>> When LLVM wrongly generates a rodata relo entry (llvm BZ #33599),
>> then just bail out instead of probing for prog w/o reloc, which
>> will fail in this case anyway.
>>
>> Signed-off-by: Daniel Borkmann 
>
> Applied, but don't you want a reasonable error message.

Thanks, the error message in this case throws:

ELF contains non-map related relo data in entry  pointing to section ! Compiler bug?!

Re: [PATCH iproute2] bpf: indicate lderr when bpf_apply_relo_data fails

2017-06-27 Thread Stephen Hemminger

On Tue, 27 Jun 2017 02:48:36 +0200
Daniel Borkmann  wrote:

> When LLVM wrongly generates a rodata relo entry (llvm BZ #33599),
> then just bail out instead of probing for prog w/o reloc, which
> will fail in this case anyway.
> 
> Signed-off-by: Daniel Borkmann 

Applied, but don't you want a reasonable error message.

Re: [PATCH] man: ip-route.8: Mention that lower metric means higher priority

2017-06-27 Thread Stephen Hemminger

On Wed, 21 Jun 2017 21:59:45 +0200
Lukas Braun  wrote:

> This is quite counter-intuitive when using the 'preference' keyword.
> 
> Signed-off-by: Lukas Braun 

I think it goes back to some wording in RFC.
But your addition makes sense. Applied.

Re: [iproute PATCH] man: Collect names of man pages automatically

2017-06-27 Thread Stephen Hemminger

On Tue, 27 Jun 2017 21:00:25 +0200
Phil Sutter  wrote:

> As it turned out, forgetting to add a man page to the respective
> Makefile when introducing it is a common mistake. Overcome this once and
> for all by using $(wildcard) function in Makefiles.
> 
> Fixes: 7124942942e53 ("genl: add manpage")
> Fixes: 958cd210942c8 ("ifcfg: add manpage")
> Fixes: e1b7f883e50de ("man: add documentation for IPv6 SR commands")
> Fixes: 1949f82cdf62c ("Introduce ip vrf command")
> Fixes: 535194a172d23 ("tipc: add peer remove functionality")
> Signed-off-by: Phil Sutter 

Thanks for fixing this common mistake.
Applied.

Re: [PATCH iproute2 V1 3/6] rdma: Add device capability parsing

2017-06-27 Thread Stephen Hemminger

On Tue, 27 Jun 2017 17:39:17 +0300
Leon Romanovsky  wrote:

> +static const char *dev_caps[64] = {
> + "RESIZE_MAX_WR",
> + "BAD_PKEY_CNTR",
> + "BAD_QKEY_CNTR",
> + "RAW_MULTI",
> + "AUTO_PATH_MIG",
> + "CHANGE_PHY_PORT",
> + "UD_AV_PORT_ENFORCE",
> + "CURR_QP_STATE_MOD",
> + "SHUTDOWN_PORT",
> + "INIT_TYPE",
> + "PORT_ACTIVE_EVENT",
> + "SYS_IMAGE_GUID",
> + "RC_RNR_NAK_GEN",
> + "SRQ_RESIZE",
> + "N_NOTIFY_CQ",
> + "LOCAL_DMA_LKEY",
> + "RESERVED",
> + "MEM_WINDOW",
> + "UD_IP_CSUM",
> + "UD_TSO",
> + "XRC",
> + "MEM_MGT_EXTENSIONS",
> + "BLOCK_MULTICAST_LOOPBACK",
> + "MEM_WINDOW_TYPE_2A",
> + "MEM_WINDOW_TYPE_2B",
> + "RC_IP_CSUM",
> + "RAW_IP_CSUM",
> + "CROSS_CHANNEL",
> + "MANAGED_FLOW_STEERING",
> + "SIGNATURE_HANDOVER",
> + "ON_DEMAND_PAGING",
> + "SG_GAPS_REG",
> + "VIRTUAL_FUNCTION",
> + "RAW_SCATTER_FCS",
> + "RDMA_NETDEV_OPA_VNIC",
> +};

Please use an array initializer so that the header and the capabilities
don't get different values.
Are the bit values in some rdma header file?
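
One way to read the request above: C99 designated initializers bind each name to its bit index, so inserting or reordering an entry cannot silently shift every later name. The sketch below is illustrative only. The CAP_* enum and dev_cap_name() are hypothetical names, and the bit positions simply mirror the order of the quoted table rather than any real RDMA header.

```c
/* Hypothetical bit positions, mirroring the order of the quoted dev_caps[]
 * table. A real tool would take these from the RDMA uapi header instead.
 */
enum dev_cap_bit {
	CAP_RESIZE_MAX_WR	= 0,
	CAP_BAD_PKEY_CNTR	= 1,
	CAP_MEM_WINDOW		= 17,
	CAP_XRC			= 20,
	CAP_MAX			= 64,
};

/* Each string is bound to its bit explicitly; unlisted slots stay NULL. */
static const char *const dev_caps[CAP_MAX] = {
	[CAP_RESIZE_MAX_WR]	= "RESIZE_MAX_WR",
	[CAP_BAD_PKEY_CNTR]	= "BAD_PKEY_CNTR",
	[CAP_MEM_WINDOW]	= "MEM_WINDOW",
	[CAP_XRC]		= "XRC",
};

/* Return the name for a capability bit, or "UNKNOWN" if none is listed. */
static const char *dev_cap_name(unsigned int bit)
{
	if (bit >= CAP_MAX || !dev_caps[bit])
		return "UNKNOWN";
	return dev_caps[bit];
}
```

With this layout a gap in the table shows up as an explicit "UNKNOWN" rather than as a name attached to the wrong bit.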

Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-27 Thread Stephen Hemminger

On Tue, 27 Jun 2017 20:46:15 +0300
Leon Romanovsky  wrote:

> On Tue, Jun 27, 2017 at 11:37:35AM -0600, Jason Gunthorpe wrote:
> > On Tue, Jun 27, 2017 at 08:33:01PM +0300, Leon Romanovsky wrote:
> >  
> > > My initial plan was to put all parsers under their respective names, in
> > > the similar way as I did for caps: $ rdma dev show mlx5_4 caps  
> >
> > I think you should have a useful summary display similar to 'ip a' and
> > other commands.
> >
> > guid(s), subnet prefix or default gid for IB, lid/lmc, link state,
> > speed, mtu, pkeys protocol(s)  
> 
> It will, but before I would like to see this tool be a part of
> iproute2, so other people will be able to extend it in addition
> to me.
> 
> Are you fine with the proposed code?
> 

Output formats need to be nailed down. The output of iproute2 commands is almost
like an ABI. Users build scripts to parse it (whether that is a great idea or
not is debatable; it mostly shows the weakness of programmatic APIs). Therefore,
fully changing output formats in later revisions is likely to upset users.

The first version doesn't have to be perfect, just close to the overall goal
of what is planned.


Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-27 Thread Stephen Hemminger

On Tue, 27 Jun 2017 20:33:01 +0300
Leon Romanovsky  wrote:

> On Tue, Jun 27, 2017 at 10:41:50AM -0600, Jason Gunthorpe wrote:
> > On Tue, Jun 27, 2017 at 12:21:29PM +0300, Leon Romanovsky wrote:  
> > > > What will be the output of such command?
> > > >  $ rdma dev show mlx5_4  
> > >
> > > ip-like style:
> > >
> > > $ rdma dev show mlx5_4
> > > 5: mlx5_4:
> > > caps:  > > PORT_ACTIVE_EVENT, SYS_IMAGE_GUID, RC_RNR_NAK_GEN, MEM_WINDOW, 
> > > UD_IP_CSUM, UD_TSO, XRC, MEM_MGT_EXTENSIONS, BLOCK_MULTICAST_LOOPBACK, 
> > > MEM_WINDOW_TYPE_2B, RAW_IP_CSUM, SIGNATURE_HANDOVER, VIRTUAL_FUNCTION>
> > > $ rdma link show mlx5_3
> > > 4/1: mlx5_3/1:
> > > caps:   
> >
> > I think that is better, maybe it should only show under some kind of
> > verbose mode, I don't know, it depends what other stuff ends up being
> > displayed..
> >
> > Are you going to dump the gid table and pkey table too in one of these 
> > commands?  
> 
> My initial plan was to put all parsers under their respective names, in
> the similar way as I did for caps: $ rdma dev show mlx5_4 caps
> 
> So for large dumps, I'm going to use that technique again and maybe print 
> summary as a default.
> For example, for gids, we can print utilization as a summary while whole
> table if someone really wants it: $ rdma link show mlx5_4/1 gids 
> 
> Something like that.
> 
> Thanks
> 
> >
> > Jason  

Agree with discussion so far.

For iproute2 style commands, the show and set commands need to have similar
arguments. Ideally, everything after the colon in the show output would be
parameters to the set command.

Please consider having a concise form for normal users and a detailed form
(with -d) for configuration and setup cases. The caps should not need to be
displayed in normal show output.



RE: [net-next v2 6/6] ixgbe: Add malicious driver detection support

2017-06-27 Thread Tantilov, Emil S

>-----Original Message-----
>From: Or Gerlitz [mailto:gerlitz...@gmail.com]
>Sent: Tuesday, June 27, 2017 2:07 PM
>To: Tantilov, Emil S 
>Cc: Kirsher, Jeffrey T ; David Miller
>; Greenwalt, Paul ; Linux
>Netdev List ; nhor...@redhat.com;
>sassm...@redhat.com; jogre...@redhat.com
>Subject: Re: [net-next v2 6/6] ixgbe: Add malicious driver detection
>support
>
>On Tue, Jun 27, 2017 at 11:59 PM, Tantilov, Emil S
> wrote:
>>>-----Original Message-----
>>>From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
>On
>>>Behalf Of Or Gerlitz
>>>Sent: Tuesday, June 27, 2017 2:08 AM
>>>To: Kirsher, Jeffrey T 
>>>Cc: David Miller ; Greenwalt, Paul
>>>; Linux Netdev List ;
>>>nhor...@redhat.com; sassm...@redhat.com; jogre...@redhat.com
>>>Subject: Re: [net-next v2 6/6] ixgbe: Add malicious driver detection
>>>support
>>>
>>>On Tue, Jun 27, 2017 at 11:51 AM, Jeff Kirsher
>>> wrote:
>>>> From: Paul Greenwalt 
>>>>
>>>> Add malicious driver detection (MDD) support for X550, X550em_a,
>>>> and X550em_x devices.
>>>>
>>>> MDD is a hardware SR-IOV security feature which the driver enables by
>>>> default, but can be controlled on|off by ethtool set-priv-flags
>>>
>>>wait, we have the trusted vf concept, which you implement
>>>(ixgbe_ndo_set_vf_trust)
>>>so you can enable by default for all vfs and disable it for trusted
>>>ones, why create an ixgbe special config knob? IMHO we should make all
>>>possible efforts to avoid priv ethtool flags usage.
>>
>> The "trusted" option was added to allow use cases that were not possible in
>> the default driver configuration for SRIOV (promiscuous mode, overriding the
>> MAC). While these modes can lead to issues (performance with promisc) they can
>> still be useful for certain configurations.
>>
>> MDD is a completely different type of protection that incorporates checks for
>> queue context, Tx descriptors and out-of-bounds DMA/memory access that can
>> disrupt the operation of the interfaces. You can read more about it in the
>> X550 datasheet (section 7.9.4.3 malicious Driver Detection):
>>
>https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-controller-x550-family-documentation.html
>>
>> For that reason we do not want to make it part of the "trusted" option.
>
>you can extend the trusted option without breaking the UAPI, currently
>it's one bit y/n, but you should have there at least seven more bits
>to use.
>
>> In addition MDD is a global setting and cannot be configured per-VF.
>
>can you state more clearly why you think the right configuration knob
>here is a per-driver ethtool private flag?

Mainly because I am not sure that other (non-Intel) drivers will benefit from
such an option. In normal operation this functionality should not cause issues
and if it doesn't we may be able to deprecate the private flag in the future.

On the other hand if the same/similar feature exists in other drivers then
it would perhaps make more sense to introduce a new option altogether.

Thanks,
Emil

[PATCH] net: stmmac: Add additional registers for dwmac1000_dma ethtool

2017-06-27 Thread thor . thayer

From: Thor Thayer 

Version 3.70a of the Designware has additional DMA registers so
add those to the ethtool DMA Register dump.
Offset 9  - Receive Interrupt Watchdog Timer Register
Offset 10 - AXI Bus Mode Register
Offset 11 - AHB or AXI Status Register
Offset 22 - HW Feature Register

Signed-off-by: Thor Thayer 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c  | 4 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 471a9aa..22cf635 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -205,8 +205,8 @@ static void dwmac1000_dump_dma_regs(void __iomem *ioaddr, 
u32 *reg_space)
 {
int i;
 
-   for (i = 0; i < 22; i++)
-   if ((i < 9) || (i > 17))
+   for (i = 0; i < 23; i++)
+   if ((i < 12) || (i > 17))
reg_space[DMA_BUS_MODE / 4 + i] =
readl(ioaddr + DMA_BUS_MODE + i * 4);
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 743170d..babb39c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -29,7 +29,7 @@
 #include "stmmac.h"
 #include "dwmac_dma.h"
 
-#define REG_SPACE_SIZE 0x1054
+#define REG_SPACE_SIZE 0x1060
 #define MAC100_ETHTOOL_NAME"st_mac100"
 #define GMAC_ETHTOOL_NAME  "st_gmac"
 
-- 
2.7.4
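
The index filter in the dwmac1000 hunk above can be checked in isolation. The sketch below is a userspace illustration, not driver code: dma_reg_is_dumped(), dma_reg_addr() and dma_regs_dumped() are hypothetical helpers, and the 0x1000 base is an assumption (the related ethtool patch describes the DMA registers as starting at offset 0x1000 in the DW IP).

```c
#define DMA_BASE	0x1000u	/* assumed start of the DMA register block */
#define DMA_NUM_REGS	23	/* loop bound after the patch */

/* Mirrors the patched condition: indices 12..17 are skipped. */
static int dma_reg_is_dumped(int i)
{
	return i >= 0 && i < DMA_NUM_REGS && (i < 12 || i > 17);
}

/* Byte address of DMA register i: one 32-bit word per index. */
static unsigned int dma_reg_addr(int i)
{
	return DMA_BASE + (unsigned int)i * 4;
}

/* Count how many registers the patched loop copies. */
static int dma_regs_dumped(void)
{
	int i, n = 0;

	for (i = 0; i < DMA_NUM_REGS; i++)
		n += dma_reg_is_dumped(i);
	return n;
}
```

With these bounds, indices 9 to 11 and 22 (the registers listed in the commit message) are included, while 12 to 17 remain skipped.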

[PATCH 2/2] ethtool: stmmac: Add DMA HW Feature Register

2017-06-27 Thread thor . thayer

From: Thor Thayer 

This patch adds the DMA HW Feature Register which is at the end
of the DMA registers and is documented in Version 3.70a.

Signed-off-by: Thor Thayer 
---
 stmmac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/stmmac.c b/stmmac.c
index e1bb291..7d7bebd 100644
--- a/stmmac.c
+++ b/stmmac.c
@@ -64,7 +64,7 @@ int st_gmac_dump_regs(struct ethtool_drvinfo *info, struct 
ethtool_regs *regs)
fprintf(stdout, "\n");
fprintf(stdout, "DMA Registers\n");
stmmac_reg = (unsigned int *)regs->data + DMA_REG_OFFSET;
-   for (i = 0; i < 22; i++)
+   for (i = 0; i < 23; i++)
fprintf(stdout, "Reg%d  0x%08X\n", i, *stmmac_reg++);
 
return 0;
-- 
2.7.4

[PATCH 1/2] ethtool: stmmac: Fix Designware ethtool register dump

2017-06-27 Thread thor . thayer

From: Thor Thayer 

Commit fbf68229ffe7 ("net: stmmac: unify registers dumps methods")
modified the register dump to store the DMA registers at the DMA register
offset (0x1000), but ethtool (stmmac.c) looks for the DMA registers right
after the MAC registers, which is offset 12.
This patch adds the DMA register offset so that indexing is correct.

Signed-off-by: Thor Thayer 
---
 stmmac.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/stmmac.c b/stmmac.c
index fb69bfe..e1bb291 100644
--- a/stmmac.c
+++ b/stmmac.c
@@ -14,6 +14,9 @@
 #include 
 #include "internal.h"
 
+/* The DMA Registers start at offset 0x1000 in the DW IP */
+#define DMA_REG_OFFSET (0x1000 / 4)
+
 int st_mac100_dump_regs(struct ethtool_drvinfo *info,
struct ethtool_regs *regs)
 {
@@ -36,6 +39,7 @@ int st_mac100_dump_regs(struct ethtool_drvinfo *info,
 
fprintf(stdout, "\n");
fprintf(stdout, "DMA Registers\n");
+   stmmac_reg = (unsigned int *)regs->data + DMA_REG_OFFSET;
for (i = 0; i < 9; i++)
fprintf(stdout, "CSR%d  0x%08X\n", i, *stmmac_reg++);
 
@@ -59,6 +63,7 @@ int st_gmac_dump_regs(struct ethtool_drvinfo *info, struct 
ethtool_regs *regs)
 
fprintf(stdout, "\n");
fprintf(stdout, "DMA Registers\n");
+   stmmac_reg = (unsigned int *)regs->data + DMA_REG_OFFSET;
for (i = 0; i < 22; i++)
fprintf(stdout, "Reg%d  0x%08X\n", i, *stmmac_reg++);
 
-- 
2.7.4
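
The indexing change above amounts to converting a byte offset into a word index. A minimal sketch, assuming the register dump is a flat buffer of 32-bit words; dma_regs() is a hypothetical helper and not part of ethtool:

```c
#include <stdint.h>

/* Byte offset 0x1000 expressed as an index into an array of 32-bit words. */
#define DMA_REG_OFFSET	(0x1000 / 4)

/* Return a pointer to the first DMA register inside a raw register dump. */
static const uint32_t *dma_regs(const void *data)
{
	return (const uint32_t *)data + DMA_REG_OFFSET;
}
```

Dividing by 4 matters because pointer arithmetic on a 32-bit pointer advances in words, not bytes.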

[PATCH 0/2] ethtool: stmmac: Fix DMA register dump

2017-06-27 Thread thor . thayer

From: Thor Thayer 

The DMA register dump structure changed which requires this
change to the indexing of the DMA registers.
Also dump the DMA HW Feature Register.

Thor Thayer (2):
  ethtool: stmmac: Fix Designware ethtool register dump
  ethtool: stmmac: Add DMA HW Feature Register

 stmmac.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.7.4

Re: [PATCH] datapath: Avoid using stack larger than 1024.

2017-06-27 Thread Pravin Shelar

On Tue, Jun 27, 2017 at 2:20 AM, Tonghao Zhang  wrote:
> When compiling OvS-master on 4.4.0-81 kernel,
> there is a warning:
>
> CC [M]  /root/ovs/datapath/linux/datapath.o
> /root/ovs/datapath/linux/datapath.c: In function
> ‘ovs_flow_cmd_set’:
> /root/ovs/datapath/linux/datapath.c:1221:1: warning:
> the frame size of 1040 bytes is larger than 1024 bytes
> [-Wframe-larger-than=]
>
> This patch uses kmalloc to allocate memory for sw_flow_mask and
> avoids using the stack.
>
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/datapath.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index c85029c..da8cd68 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1107,7 +1107,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
> genl_info *info)
> struct ovs_header *ovs_header = info->userhdr;
> struct sw_flow_key key;
> struct sw_flow *flow;
> -   struct sw_flow_mask mask;
> +   struct sw_flow_mask *mask;
> struct sk_buff *reply = NULL;
> struct datapath *dp;
> struct sw_flow_actions *old_acts = NULL, *acts = NULL;
> @@ -1120,7 +1120,11 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, 
> struct genl_info *info)
>
> ufid_present = ovs_nla_get_ufid(&sfid, a[OVS_FLOW_ATTR_UFID], log);
> if (a[OVS_FLOW_ATTR_KEY]) {
> -   ovs_match_init(&match, &key, true, &mask);
> +   mask = kmalloc(sizeof(struct sw_flow_mask), GFP_KERNEL);
> +   if (!mask)
> +   return -ENOMEM;
> +
> +   ovs_match_init(&match, &key, true, mask);

Rather than allocating mask object and freeing it at the end, can you
factor out code which needs mask into new function to save some stack
space in this function?

> error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
>   a[OVS_FLOW_ATTR_MASK], log);
> } else if (!ufid_present) {
> @@ -1141,7 +1145,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
> genl_info *info)
> }
>
> acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], &key,
> -   &mask, log);
> +   mask, log);
> if (IS_ERR(acts)) {
> error = PTR_ERR(acts);
> goto error;
> @@ -1216,6 +1220,7 @@ err_unlock_ovs:
>  err_kfree_acts:
> ovs_nla_free_flow_actions(acts);
>  error:
> +   kfree(mask);

The mask is not freed in the usual, non-error case.
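
The cleanup concern above can be illustrated outside the kernel. The sketch below is a userspace analogue under stated assumptions: struct flow_mask, flow_set() and live_allocs are hypothetical stand-ins, with malloc()/free() in place of kmalloc()/kfree(). It shows a single exit label reached by both the success and the error path, and a pointer initialised to NULL so the free is safe when nothing was allocated.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a structure too large to keep on the stack. */
struct flow_mask {
	unsigned char bytes[512];
};

static int live_allocs;	/* outstanding allocations, for leak checking */

/* have_key: whether a mask is needed; fail: force the error path. */
static int flow_set(int have_key, int fail)
{
	struct flow_mask *mask = NULL;
	int err = 0;

	if (have_key) {
		mask = malloc(sizeof(*mask));
		if (!mask)
			return -ENOMEM;
		live_allocs++;
		memset(mask, 0, sizeof(*mask));
	}

	if (fail) {
		err = -EINVAL;
		goto out;
	}

	/* ... work that consumes mask would go here ... */

out:
	if (mask)
		live_allocs--;
	free(mask);	/* free(NULL) is a no-op, like kfree(NULL) */
	return err;
}
```

Routing the success path through the same label is one way to make sure the allocation is released in every case; factoring the mask-dependent code into its own function, as suggested earlier in the review, would avoid the allocation altogether.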

Re: [net-next v2 6/6] ixgbe: Add malicious driver detection support

2017-06-27 Thread Or Gerlitz

On Tue, Jun 27, 2017 at 11:59 PM, Tantilov, Emil S
 wrote:
>>-----Original Message-----
>>From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
>>Behalf Of Or Gerlitz
>>Sent: Tuesday, June 27, 2017 2:08 AM
>>To: Kirsher, Jeffrey T 
>>Cc: David Miller ; Greenwalt, Paul
>>; Linux Netdev List ;
>>nhor...@redhat.com; sassm...@redhat.com; jogre...@redhat.com
>>Subject: Re: [net-next v2 6/6] ixgbe: Add malicious driver detection
>>support
>>
>>On Tue, Jun 27, 2017 at 11:51 AM, Jeff Kirsher
>> wrote:
>>> From: Paul Greenwalt 
>>>
>>> Add malicious driver detection (MDD) support for X550, X550em_a,
>>> and X550em_x devices.
>>>
>>> MDD is a hardware SR-IOV security feature which the driver enables by
>>> default, but can be controlled on|off by ethtool set-priv-flags
>>
>>wait, we have the trusted vf concept, which you implement
>>(ixgbe_ndo_set_vf_trust)
>>so you can enable by default for all vfs and disable it for trusted
>>ones, why create an ixgbe special config knob? IMHO we should make all
>>possible efforts to avoid priv ethtool flags usage.
>
> The "trusted" option was added to allow use cases that were not possible in
> the default driver configuration for SRIOV (promiscuous mode, overriding the
> MAC). While these modes can lead to issues (performance with promisc) they can
> still be useful for certain configurations.
>
> MDD is a completely different type of protection that incorporates checks for
> queue context, Tx descriptors and out-of-bounds DMA/memory access that can
> disrupt the operation of the interfaces. You can read more about it in the
> X550 datasheet (section 7.9.4.3 malicious Driver Detection):
> https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-controller-x550-family-documentation.html
>
> For that reason we do not want to make it part of the "trusted" option.

you can extend the trusted option without breaking the UAPI, currently
it's one bit y/n, but you should have there at least seven more bits
to use.

> In addition MDD is a global setting and cannot be configured per-VF.

can you state more clearly why you think the right configuration knob
here is per driver ethtool private flag?

Or.

RE: [net-next v2 6/6] ixgbe: Add malicious driver detection support

2017-06-27 Thread Tantilov, Emil S

>-Original Message-
>From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
>Behalf Of Or Gerlitz
>Sent: Tuesday, June 27, 2017 2:08 AM
>To: Kirsher, Jeffrey T 
>Cc: David Miller ; Greenwalt, Paul
>; Linux Netdev List ;
>nhor...@redhat.com; sassm...@redhat.com; jogre...@redhat.com
>Subject: Re: [net-next v2 6/6] ixgbe: Add malicious driver detection
>support
>
>On Tue, Jun 27, 2017 at 11:51 AM, Jeff Kirsher
> wrote:
>> From: Paul Greenwalt 
>>
>> Add malicious driver detection (MDD) support for X550, X550em_a,
>> and X550em_x devices.
>>
>> MDD is a hardware SR-IOV security feature which the driver enables by
>> default, but can be controlled on|off by ethtool set-priv-flags
>
>wait, we have the trusted vf concept, which you implement
>(ixgbe_ndo_set_vf_trust)
>so you can enable by default for all vfs and disable it for trusted
>ones, why create an ixgbe special config knob? IMHO we should make every
>possible effort to avoid priv ethtool flags usage.

The "trusted" option was added to allow use cases that were not possible in the
default driver configuration for SRIOV (promiscuous mode, overriding the MAC).
While these modes can lead to issues (performance with promisc) they can still
be useful for certain configurations.

MDD is a completely different type of protection that incorporates checks for
queue context, Tx descriptors and out-of-bounds DMA/memory access that can
disrupt the operation of the interfaces. You can read more about it in the X550
datasheet (section 7.9.4.3 malicious Driver Detection):
https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-controller-x550-family-documentation.html

For that reason we do not want to make it part of the "trusted" option.

In addition MDD is a global setting and cannot be configured per-VF.

Thanks,
Emil

Re: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

2017-06-27 Thread Jeff Kirsher

On Wed, 2017-06-28 at 05:28 +1000, Dave Airlie wrote:
> On 20 June 2017 at 18:49, Daniel Vetter  wrote:
> > On Wed, Jun 07, 2017 at 01:07:33AM +, Brown, Aaron F wrote:
> > > > From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org]
> > > > On Behalf
> > > > Of Jeff Kirsher
> > > > Sent: Tuesday, June 6, 2017 1:46 PM
> > > > To: David Miller ; Nikula, Jani
> > > > 
> > > > Cc: Ursulin, Tvrtko ; daniel.vetter@ffwll
> > > > .ch; intel-
> > > > g...@lists.freedesktop.org; linux-ker...@vger.kernel.org;
> > > > jani.nik...@linux.intel.com; ch...@chris-wilson.co.uk; Ertman,
> > > > David M
> > > > ; intel-wired-...@lists.osuosl.org; dri-
> > > > de...@lists.freedesktop.org; netdev@vger.kernel.org; airlied@gmail.
> > > > com
> > > > Subject: Re: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo
> > > > e1000e_pm_freeze if __e1000_shutdown fails
> > > > 
> > > > On Fri, 2017-06-02 at 14:14 -0400, David Miller wrote:
> > > > > From: Jani Nikula 
> > > > > Date: Wed, 31 May 2017 18:50:43 +0300
> > > > > 
> > > > > > From: Chris Wilson 
> > > > > > 
> > > > > > > An error during suspend (e1000e_pm_suspend),
> > > > > 
> > > > >  ...
> > > > > > lead to complete failure:
> > > > > 
> > > > >  ...
> > > > > > The unwind failures stems from commit 2800209994f8 ("e1000e:
> > > > > > Refactor PM
> > > > > > flows"), but it may be a later patch that introduced the non-
> > > > > > recoverable
> > > > > > behaviour.
> > > > > > 
> > > > > > Fixes: 2800209994f8 ("e1000e: Refactor PM flows")
> > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99847
> > > > > > Cc: Tvrtko Ursulin 
> > > > > > Cc: Jeff Kirsher 
> > > > > > Cc: Dave Ertman 
> > > > > > Cc: Bruce Allan 
> > > > > > Cc: intel-wired-...@lists.osuosl.org
> > > > > > Cc: netdev@vger.kernel.org
> > > > > > Signed-off-by: Chris Wilson 
> > > > > > [Jani: bikeshed repainted]
> > > > > > Signed-off-by: Jani Nikula 
> > > > > 
> > > > > Jeff, please make sure this gets submitted to me soon.
> > > > 
> > > > Expect it later tonight, just finishing up testing.
> > > 
> > > Tested-by: Aaron Brown 
> > 
> > Hm, I seem to be blind, but I can't find it anywhere in -rc6. Does
> > someone
> > have the sha1 from Linus' git for this patch?
> 
> Guys this is a pretty serious regression, just left blowing in the
> wind, is anyone responsible for e1000e?

This was submitted and accepted into David Miller's net-next tree.  I can
see if Dave can pull it into his net tree.  Does stable need to pick this
up as well?


[PATCH net-next 1/2] vxlan: change vxlan_validate() to use netlink_ext_ack for error reporting

2017-06-27 Thread Matthias Schiffer

The kernel log is not where users expect error messages for netlink
requests; as we have extended acks now, we can replace pr_debug() with
NL_SET_ERR_MSG_ATTR().

While we're at it, also fix the !is_valid_ether_addr() error message (as it
not only rejects the all-zero address, but also multicast addresses), and
add messages for the remaining attributes.

Signed-off-by: Matthias Schiffer 
---
 drivers/net/vxlan.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fd0ff97e3d81..01957e39f2cd 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2716,12 +2716,14 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
 {
if (tb[IFLA_ADDRESS]) {
if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) {
-   pr_debug("invalid link address (not ethernet)\n");
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_ADDRESS],
+   "invalid link address (not ethernet)");
return -EINVAL;
}
 
if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) {
-   pr_debug("invalid all zero ethernet address\n");
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_ADDRESS],
+   "invalid ethernet address");
return -EADDRNOTAVAIL;
}
}
@@ -2729,8 +2731,11 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
if (tb[IFLA_MTU]) {
u32 mtu = nla_get_u32(tb[IFLA_MTU]);
 
-   if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)
+   if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_MTU],
+   "invalid MTU");
return -EINVAL;
+   }
}
 
if (!data)
@@ -2739,8 +2744,11 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
if (data[IFLA_VXLAN_ID]) {
u32 id = nla_get_u32(data[IFLA_VXLAN_ID]);
 
-   if (id >= VXLAN_N_VID)
+   if (id >= VXLAN_N_VID) {
+   NL_SET_ERR_MSG_ATTR(extack, data[IFLA_VXLAN_ID],
+   "invalid VXLAN ID");
return -ERANGE;
+   }
}
 
if (data[IFLA_VXLAN_PORT_RANGE]) {
@@ -2748,8 +2756,8 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
= nla_data(data[IFLA_VXLAN_PORT_RANGE]);
 
if (ntohs(p->high) < ntohs(p->low)) {
-   pr_debug("port range %u .. %u not valid\n",
-ntohs(p->low), ntohs(p->high));
+   NL_SET_ERR_MSG_ATTR(extack, data[IFLA_VXLAN_PORT_RANGE],
+   "port range not valid");
return -EINVAL;
}
}
-- 
2.13.2
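
As a rough illustration of the idea behind this patch (hand the error text back to the caller instead of writing it to the kernel log), here is a small userspace sketch. It is not the kernel code: struct toy_ext_ack and toy_validate_port_range() are hypothetical stand-ins for netlink_ext_ack and the port-range check in vxlan_validate(), and the message string is simply copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct netlink_ext_ack: carries a message to the caller. */
struct toy_ext_ack {
	const char *msg;
};

/* Reject an inverted port range and report why through extack (if provided). */
int toy_validate_port_range(uint16_t low, uint16_t high,
			    struct toy_ext_ack *extack)
{
	if (high < low) {
		if (extack)
			extack->msg = "port range not valid";
		return -EINVAL;
	}
	return 0;
}

/* Small self-check: the caller sees both the error code and the reason. */
void toy_extack_demo(void)
{
	struct toy_ext_ack ack = { NULL };

	assert(toy_validate_port_range(100, 200, &ack) == 0);
	assert(ack.msg == NULL);
	assert(toy_validate_port_range(200, 100, &ack) == -EINVAL);
	assert(strcmp(ack.msg, "port range not valid") == 0);
}
```

The point of the sketch is only that the message travels with the return value to whoever issued the request; how the real macros attach the offending attribute is not modelled here.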

[PATCH net-next 2/2] vxlan: add back error messages to vxlan_config_validate() as extended netlink acks

2017-06-27 Thread Matthias Schiffer

When refactoring the vxlan config validation, some kernel log messages were
removed. This brings them back using the new netlink_ext_ack support, and
adds some more in the recently added code handling link-local IPv6
addresses.

Signed-off-by: Matthias Schiffer 
---
 drivers/net/vxlan.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 01957e39f2cd..8d4248ab09c2 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2909,7 +2909,8 @@ static int vxlan_sock_add(struct vxlan_dev *vxlan)
 
 static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf,
 struct net_device **lower,
-struct vxlan_dev *old)
+struct vxlan_dev *old,
+struct netlink_ext_ack *extack)
 {
struct vxlan_net *vn = net_generic(src_net, vxlan_net_id);
struct vxlan_dev *tmp;
@@ -2923,6 +2924,8 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf,
 */
if ((conf->flags & ~VXLAN_F_ALLOWED_GPE) ||
!(conf->flags & VXLAN_F_COLLECT_METADATA)) {
+   NL_SET_ERR_MSG(extack,
+  "unsupported combination of extensions");
return -EINVAL;
}
}
@@ -2957,14 +2960,20 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf,
 
if (local_type & IPV6_ADDR_LINKLOCAL) {
if (!(remote_type & IPV6_ADDR_LINKLOCAL) &&
-   (remote_type != IPV6_ADDR_ANY))
+   (remote_type != IPV6_ADDR_ANY)) {
+   NL_SET_ERR_MSG(extack,
+  "invalid combination of address scopes");
return -EINVAL;
+   }
 
conf->flags |= VXLAN_F_IPV6_LINKLOCAL;
} else {
if (remote_type ==
-   (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL))
+   (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL)) {
+   NL_SET_ERR_MSG(extack,
+  "invalid combination of address scopes");
return -EINVAL;
+   }
 
conf->flags &= ~VXLAN_F_IPV6_LINKLOCAL;
}
@@ -2991,12 +3000,18 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf,
 
*lower = lowerdev;
} else {
-   if (vxlan_addr_multicast(&conf->remote_ip))
+   if (vxlan_addr_multicast(&conf->remote_ip)) {
+   NL_SET_ERR_MSG(extack,
+  "multicast destination requires interface to be specified");
return -EINVAL;
+   }
 
 #if IS_ENABLED(CONFIG_IPV6)
-   if (conf->flags & VXLAN_F_IPV6_LINKLOCAL)
+   if (conf->flags & VXLAN_F_IPV6_LINKLOCAL) {
+   NL_SET_ERR_MSG(extack,
+  "link-local local/remote addresses require interface to be specified");
return -EINVAL;
+   }
 #endif
 
*lower = NULL;
@@ -3028,6 +3043,7 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf,
tmp->cfg.remote_ifindex != conf->remote_ifindex)
continue;
 
+   NL_SET_ERR_MSG(extack, "duplicate VNI");
return -EEXIST;
}
 
@@ -3083,14 +3099,14 @@ static void vxlan_config_apply(struct net_device *dev,
 }
 
 static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
-  struct vxlan_config *conf,
-  bool changelink)
+  struct vxlan_config *conf, bool changelink,
+  struct netlink_ext_ack *extack)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
struct net_device *lowerdev;
int ret;
 
-   ret = vxlan_config_validate(src_net, conf, &lowerdev, vxlan);
+   ret = vxlan_config_validate(src_net, conf, &lowerdev, vxlan, extack);
if (ret)
return ret;
 
@@ -3100,13 +3116,14 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 }
 
 static int __vxlan_dev_create(struct net *net, struct net_device *dev,
- struct vxlan_config *conf)
+ struct vxlan_config *conf,
+

Re: RFC: sk leak in sock_graft?

2017-06-27 Thread Sowmini Varadhan

On (06/27/17 15:59), Sowmini Varadhan wrote:
> > Why does rds-tcp need to call sock_graft() without those invariants
> > met?
> 
> It would certainly help to declare "don't use sock_create_kern()
> if you are going to accept on this socket"; I don't see that being
> mandated anywhere.

I can look into getting rds_tcp_accept_one also calling sock_create_lite
like every other caller, (though I may not get to this for another week,
due to travel), but the code in sock_graft() doesn't look right either.

At the very least, there needs to be a WARN_ON(parent->sk) there,
to provide a gentle dope-slap for the next slob that stumbles on this
type of leak.

--Sowmini

Re: [PATCH v3 07/11] tty: improve tty_insert_flip_char() fast path

2017-06-27 Thread Arnd Bergmann

On Mon, Jun 26, 2017 at 3:58 PM, Arnd Bergmann  wrote:

> * With asan-stack=1, gcc uses at least 64 bytes per such variable
>   (two times ASAN_RED_ZONE_SIZE), while clang only uses 16 bytes
>   (2 * (1<   use any more space than with kasan completely disabled
>   (no -fsanitize=kernel-address).

I asked around the Linaro toolchain team today, and arrived at this commit
in llvm: https://github.com/llvm-mirror/llvm/commit/daa1bf3b74054

Prior to this, the llvm behavior was the same as gcc, using 64 bytes
for each small (<= 16 byte) variable instead of just 16 or 32 as it
does now. llvm now also uses a larger redzone (up to 256 bytes) for
very large stack objects, which also seems like a good idea.

While it would be hard to argue that the gcc behavior is a bug,
it should be possible to implement the same optimization in gcc,
and that would solve a lot of the stack size issues with KASAN.

> Can you say which behavior you find 'sane' or 'not sane' here,
> specifically? Maybe we can make future gcc releases use a
> smaller redzone like clang does.
>
> If we find a way to improve gcc so it uses less stack here, we still
> have a problem with existing compilers still producing dangerously
> high stack usage, as well as annoying warnings for an allmodconfig
> build as soon as we start warning about this again.

This problem obviously still stands.

  Arnd

Re: RFC: sk leak in sock_graft?

2017-06-27 Thread Sowmini Varadhan

On (06/27/17 15:38), David Miller wrote:
> 
> It could simply be the case that rds-tcp is the first setup that
> created that situation where there is a parent->sk already.

Possibly. I noticed that other callers call sock_create_lite(),
and I don't know the history here; this seems to have been
the case from day 1 of rds-tcp. (And I dread changing
rds_tcp_accept_kern() to do this, because then every module unload
would need to go and check if sock->sk is non-null first, before
cleaning it up.)

> Why does rds-tcp need to call sock_graft() without those invariants
> met?

It would certainly help to declare "don't use sock_create_kern()
if you are going to accept on this socket"; I don't see that being
mandated anywhere.

It would also help to have a BUG_ON(parent->sk) or at least a
WARN_ON(parent->sk) in sock_graft, before unilaterally assigning 
it to the new sk. 

--Sowmini
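
To make the leak and the proposed check concrete, here is a toy userspace model. It is only a sketch under stated assumptions: struct toy_sock, struct toy_socket and both toy_graft*() helpers are hypothetical names, reference counting is reduced to a plain integer, and the "warning" is modelled as a return value rather than a real WARN_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct sock and struct socket (hypothetical, not kernel types). */
struct toy_sock {
	int refcnt;
};

struct toy_socket {
	struct toy_sock *sk;
};

/* Models the unconditional assignment: any previous parent->sk is simply forgotten. */
void toy_graft(struct toy_socket *parent, struct toy_sock *sk)
{
	parent->sk = sk;
}

/*
 * Same assignment, but reports whether the "parent->sk is not yet set"
 * invariant was violated. Returns 1 where a WARN_ON(parent->sk) would fire.
 */
int toy_graft_checked(struct toy_socket *parent, struct toy_sock *sk)
{
	int warned = (parent->sk != NULL);

	parent->sk = sk;
	return warned;
}

/* Small self-check covering both kinds of caller. */
void toy_graft_demo(void)
{
	struct toy_sock old_sk = { 1 };
	struct toy_sock new_sk = { 1 };
	struct toy_socket full = { &old_sk };	/* created with a sock attached */
	struct toy_socket lite = { NULL };	/* created without one */

	assert(toy_graft_checked(&lite, &new_sk) == 0);
	assert(toy_graft_checked(&full, &new_sk) == 1);
	/* Nothing points at old_sk any more, yet its reference was never dropped. */
	assert(full.sk == &new_sk && old_sk.refcnt == 1);
}
```

In the model, the "full" caller corresponds to a socket that already had a sock attached before the graft, and the "lite" caller to one that did not.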

Re: [PATCH net-next 00/14] nfp: get_phys_port_name for representors and SR-IOV reorder

2017-06-27 Thread David Miller

From: Jakub Kicinski 
Date: Tue, 27 Jun 2017 00:50:14 -0700

> This series starts by making the error message if FW cannot be located
> easier to understand.  Then I move some functions from PCI probe files
> into library code (nfpcore) where they belong, and remove one function
> which is never used.
> 
> Next few patches equip representors with nfp_port structure and make
> their NDOs fully shared (not defined in apps), thanks to which we can 
> easily determine which netdevs are NFP's by comparing the NDO pointers.
> 
> 10th patch makes use of the shared NDOs and nfp_ports to deliver
> netdev-type independent .ndo_get_phys_port_name() implementation.
> 
> Patches 11 and 12 reorder the nfp_app SR-IOV callbacks with enabling
> SR-IOV VFs.  Unfortunately due to how PCI subsystem works we can't
> guarantee being able to disable SR-IOV at exit or that it will be
> disabled when we first probe...  We must therefore make sure FW is
> able to deal with being loaded while SR-IOV is already on.
> 
> Patch 13 fixes potential deadlock when enabling SR-IOV happens at
> the same time as port state refresh.  Note that this can't happen
> at this point, since Flower doesn't refresh ports... but lockdep 
> doesn't know about such details and we will have to deal with this
> sooner or later anyway.
> 
> Last but not least a new Kconfig is added to make sure those who 
> don't care about flower offloads have a way of not including the 
> code in their kernels.  Thanks to nfp_app separation this costs us
> a single ifdef and excluding flower files from the build.

Series applied, thanks.

Re: [PATCH] net: usb: asix88179_178a: Add support for the Belkin B2B128

2017-06-27 Thread David Miller

From: "Andrew F. Davis" 
Date: Mon, 26 Jun 2017 12:41:20 -0500

> The Belkin B2B128 is a USB 3.0 Hub + Gigabit Ethernet Adapter, the
> Ethernet adapter uses the ASIX AX88179 USB 3.0 to Gigabit Ethernet
> chip supported by this driver, add the USB ID for the same.
> 
> This patch is based on work by Geoffrey Tran 
> who has indicated they would like this upstreamed by someone more
> familiar with the upstreaming process.
> 
> Signed-off-by: Andrew F. Davis 

Applied, thank you.

Re: [PATCH net-next 0/2] ipv6: udp: exploit dev_scratch helpers

2017-06-27 Thread David Miller

From: Paolo Abeni 
Date: Mon, 26 Jun 2017 19:01:49 +0200

> When bringing in the recent cache optimization for the UDP protocol, I forgot
> to leverage the newly introduced scratched area helpers in the UDPv6 code path.
> As a result, the UDPv6 implementation suffers some unnecessary performance
> penalty when compared to v4.
> 
> This series aims to bring UDPv6 back on equal footing with v4.
> The first patch moves the shared helpers to the common include files, while
> the second uses them in the UDPv6 code.
> 
> This gives 5-8% performance improvement for a system under flood with small
> UDPv6 packets. The performance delta is less than the one reported on the
> original patch set because the UDPv6 code path already leveraged some of the
> optimization.

Series applied, thank you.

Re: [PATCH v2] fsl/fman: add dependency on HAS_DMA

2017-06-27 Thread David Miller

From: Madalin Bucur 
Date: Mon, 26 Jun 2017 18:47:00 +0300

> A previous commit (5567e989198b5a8d) inserted a dependency on DMA
> API that requires HAS_DMA to be added in Kconfig.
> 
> Signed-off-by: Madalin Bucur 

Applied, thank you.

Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-27 Thread David Miller

From: Andrew Lunn 
Date: Mon, 26 Jun 2017 15:34:39 +0200

> I still fear this is going to be an ethtool call with only one user.

That is my fear as well.

We are also in a sort-of moratorium for adding new major ethtool
features until the conversion of ethtool over to netlink occurs.

Re: [PATCH net-next] tcp: fix null ptr deref in getsockopt(..., TCP_ULP, ...)

2017-06-27 Thread David Miller

From: Dave Watson 
Date: Mon, 26 Jun 2017 08:36:47 -0700

> If icsk_ulp_ops is unset, it dereferences a null ptr.
> Add a null ptr check.
> 
> BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 [inline]
> BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 net/ipv4/tcp.c:3057
> Read of size 4 at addr 0020 by task syz-executor1/15452
> 
> Signed-off-by: Dave Watson 
> Reported-by: "Levin, Alexander (Sasha Levin)" 

Applied, thanks.

Re: RFC: sk leak in sock_graft?

2017-06-27 Thread David Miller

From: Sowmini Varadhan 
Date: Sat, 24 Jun 2017 09:08:27 -0400

> We're seeing a memleak when we run an infinite loop that 
> loads/unloads rds-tcp, and runs some traffic between each 
> load/unload.
> 
> Analysis shows that this is happening for the following reason:
> 
> inet_accept -> sock_graft does
>   parent->sk = sk
> but if the parent->sk was previously pointing at some other
> struct sock "old_sk" (happens in the case of rds_tcp_accept_one()
> which has historically called sock_create_kern() to set up
> the new_sock), we need to sock_put(old_sk), else we'd leak it.
> 
> In general, sock_graft() is cutting loose the parent->sk,
> so it looks like it needs to release its refcnt on it?
> 
> Patch below takes care of the leak in our case, but I could use
> some input on other locking considerations, and if this is ok
> with other modules that use sock_graft()

It could simply be the case that rds-tcp is the first setup that
created that situation where there is a parent->sk already.

In all of the normal accept*() code paths, a plain struct socket
is allocated and nothing sets newsock->sk to anything before that
sock_graft() call.

Why does rds-tcp need to call sock_graft() without those invariants
met?

Thanks.

Re: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

2017-06-27 Thread Dave Airlie

On 20 June 2017 at 18:49, Daniel Vetter  wrote:
> On Wed, Jun 07, 2017 at 01:07:33AM +, Brown, Aaron F wrote:
>> > From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
>> > Of Jeff Kirsher
>> > Sent: Tuesday, June 6, 2017 1:46 PM
>> > To: David Miller ; Nikula, Jani
>> > 
>> > Cc: Ursulin, Tvrtko ; daniel.vet...@ffwll.ch; 
>> > intel-
>> > g...@lists.freedesktop.org; linux-ker...@vger.kernel.org;
>> > jani.nik...@linux.intel.com; ch...@chris-wilson.co.uk; Ertman, David M
>> > ; intel-wired-...@lists.osuosl.org; dri-
>> > de...@lists.freedesktop.org; netdev@vger.kernel.org; airl...@gmail.com
>> > Subject: Re: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo
>> > e1000e_pm_freeze if __e1000_shutdown fails
>> >
>> > On Fri, 2017-06-02 at 14:14 -0400, David Miller wrote:
>> > > From: Jani Nikula 
>> > > Date: Wed, 31 May 2017 18:50:43 +0300
>> > >
>> > > > From: Chris Wilson 
>> > > >
>> > > > An error during suspend (e1000e_pm_suspend),
>> > >
>> > >  ...
>> > > > lead to complete failure:
>> > >
>> > >  ...
>> > > > The unwind failures stems from commit 2800209994f8 ("e1000e:
>> > > > Refactor PM
>> > > > flows"), but it may be a later patch that introduced the non-
>> > > > recoverable
>> > > > behaviour.
>> > > >
>> > > > Fixes: 2800209994f8 ("e1000e: Refactor PM flows")
>> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99847
>> > > > Cc: Tvrtko Ursulin 
>> > > > Cc: Jeff Kirsher 
>> > > > Cc: Dave Ertman 
>> > > > Cc: Bruce Allan 
>> > > > Cc: intel-wired-...@lists.osuosl.org
>> > > > Cc: netdev@vger.kernel.org
>> > > > Signed-off-by: Chris Wilson 
>> > > > [Jani: bikeshed repainted]
>> > > > Signed-off-by: Jani Nikula 
>> > >
>> > > Jeff, please make sure this gets submitted to me soon.
>> >
>> > Expect it later tonight, just finishing up testing.
>>
>> Tested-by: Aaron Brown 
>
> Hm, I seem to be blind, but I can't find it anywhere in -rc6. Does someone
> have the sha1 from Linus' git for this patch?

Guys this is a pretty serious regression, just left blowing in the
wind, is anyone responsible for e1000e?

Dave.

[iproute PATCH] man: Collect names of man pages automatically

2017-06-27 Thread Phil Sutter

As it turned out, forgetting to add a man page to the respective
Makefile when introducing it is a common mistake. Overcome this once and
for all by using $(wildcard) function in Makefiles.

Fixes: 7124942942e53 ("genl: add manpage")
Fixes: 958cd210942c8 ("ifcfg: add manpage")
Fixes: e1b7f883e50de ("man: add documentation for IPv6 SR commands")
Fixes: 1949f82cdf62c ("Introduce ip vrf command")
Fixes: 535194a172d23 ("tipc: add peer remove functionality")
Signed-off-by: Phil Sutter 
---
 man/man3/Makefile |  2 +-
 man/man7/Makefile |  2 +-
 man/man8/Makefile | 21 +
 3 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/man/man3/Makefile b/man/man3/Makefile
index bf55658c96746..a98741de29262 100644
--- a/man/man3/Makefile
+++ b/man/man3/Makefile
@@ -1,4 +1,4 @@
-MAN3PAGES=libnetlink.3
+MAN3PAGES = $(wildcard *.3)
 
 all:
 
diff --git a/man/man7/Makefile b/man/man7/Makefile
index ccfd8398c5f29..689fc713b6729 100644
--- a/man/man7/Makefile
+++ b/man/man7/Makefile
@@ -1,4 +1,4 @@
-MAN7PAGES = tc-hfsc.7
+MAN7PAGES = $(wildcard *.7)
 
 all:
 
diff --git a/man/man8/Makefile b/man/man8/Makefile
index f33186446819e..12af66be4bc73 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -1,25 +1,6 @@
 TARGETS = ip-address.8 ip-link.8 ip-route.8
 
-MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 rtpr.8 ss.8 \
-   tc.8 tc-bfifo.8 tc-bpf.8 tc-cbq.8 tc-cbq-details.8 tc-choke.8 tc-codel.8 \
-   tc-fq.8 \
-   tc-drr.8 tc-ematch.8 tc-fq_codel.8 tc-hfsc.8 tc-htb.8 tc-pie.8 \
-   tc-mqprio.8 tc-netem.8 tc-pfifo.8 tc-pfifo_fast.8 tc-prio.8 tc-red.8 \
-   tc-sfb.8 tc-sfq.8 tc-stab.8 tc-tbf.8 \
-   bridge.8 rtstat.8 ctstat.8 nstat.8 routef.8 \
-   ip-addrlabel.8 ip-fou.8 ip-gue.8 ip-l2tp.8 ip-macsec.8 \
-   ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8 \
-   ip-netns.8 ip-ntable.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 \
-   ip-tcp_metrics.8 ip-netconf.8 ip-token.8 \
-   tipc.8 tipc-bearer.8 tipc-link.8 tipc-media.8 tipc-nametable.8 \
-   tipc-node.8 tipc-socket.8 \
-   tc-basic.8 tc-cgroup.8 tc-flow.8 tc-flower.8 tc-fw.8 tc-route.8 \
-   tc-tcindex.8 tc-u32.8 tc-matchall.8 \
-   tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \
-   tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8 tc-skbmod.8 tc-ife.8 \
-   tc-tunnel_key.8 tc-sample.8 \
-   devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8 \
-   ifstat.8
+MAN8PAGES = $(TARGETS) $(filter-out $(TARGETS),$(wildcard *.8))
 
 all: $(TARGETS)
 
-- 
2.13.1

Re: [PATCH net] net: prevent sign extension in dev_get_stats()

2017-06-27 Thread David Miller

From: Eric Dumazet 
Date: Tue, 27 Jun 2017 07:02:20 -0700

> From: Eric Dumazet 
> 
> Similar to the fix provided by Dominik Heidler in commit
> 9b3dc0a17d73 ("l2tp: cast l2tp traffic counter to unsigned")
> we need to take care of 32bit kernels in dev_get_stats().
> 
> When using atomic_long_read(), we add a 'long' to u64 and
> might misinterpret high order bit, unless we cast to unsigned.
> 
> Fixes: caf586e5f23ce ("net: add a core netdev->rx_dropped counter")
> Fixes: 015f0688f57ca ("net: net: add a core netdev->tx_dropped counter")
> Fixes: 6e7333d315a76 ("net: add rx_nohandler stat counter")
> Signed-off-by: Eric Dumazet 
> Cc: Jarod Wilson 

Applied and queued up for -stable, thanks.
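
The sign-extension problem described in the quoted changelog can be reproduced in userspace. The following is a minimal sketch, not the kernel code: int32_t stands in for a 32-bit kernel's 'long', and add_signed()/add_unsigned() are made-up helper names for the two patterns.

```c
#include <assert.h>
#include <stdint.h>

/* int32_t stands in for 'long' on a 32-bit kernel (assumption for this sketch). */
typedef int32_t long32_t;

/* Problematic pattern: the signed counter is sign-extended when widened to 64 bits. */
uint64_t add_signed(uint64_t total, long32_t counter)
{
	return total + counter;
}

/* Pattern from the fix: cast to unsigned first, so the counter is zero-extended. */
uint64_t add_unsigned(uint64_t total, long32_t counter)
{
	return total + (uint32_t)counter;
}

/* Small self-check: a counter whose high-order bit is set reads back as negative. */
void sign_extension_demo(void)
{
	long32_t wrapped = INT32_MIN;	/* bit pattern 0x80000000 */

	assert(add_signed(0, wrapped) == UINT64_C(0xFFFFFFFF80000000));
	assert(add_unsigned(0, wrapped) == UINT64_C(0x80000000));
}
```

With the signed addition, a counter that has just passed 2^31 turns the 64-bit total into a huge bogus value; with the cast, the total is simply 0x80000000.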

Re: [net-next] net: remove policy-routing.txt documentation

2017-06-27 Thread David Miller

From: Vincent Bernat 
Date: Tue, 27 Jun 2017 15:42:57 +0200

> It dates back from 2.1.16 and is obsolete since 2.1.68 when the current
> rule system has been introduced.
> 
> Signed-off-by: Vincent Bernat 

Applied.  I am very surprised that document was still there :)

Re: [PATCH net-next] vxlan: fix incorrect nlattr access in MTU check

2017-06-27 Thread David Miller

From: Matthias Schiffer 
Date: Tue, 27 Jun 2017 14:42:43 +0200

> The access to the wrong variable could lead to a NULL dereference and
> possibly other invalid memory reads in vxlan newlink/changelink requests
> with a IFLA_MTU attribute.
> 
> Fixes: a985343ba906 "vxlan: refactor verification and application of configuration"
> Signed-off-by: Matthias Schiffer 

Applied, thanks.

Re: [PATCH 1/1] tc: custom qdisc pkt size translation table

2017-06-27 Thread Eric Dumazet

On Tue, 2017-06-27 at 12:37 -0500, Robert McCabe wrote:
> Yeah, sorry didn't even think about that.
> I guess my first question would be is there another way via the
> iproute2 project where a user could
> configure the stab->data pkt size translation table used in the
> __qdisc_calculate_pkt_len method
> in the kernel source (net/sched/sched_api.c)?
> 
> Also, let's say I went ahead and added TC_LINKLAYER_CUSTOM
> to the include/uapi/linux/pkt_sched.h
> file in the kernel source ... would I also need to make the same
> change in include/uapi/linux/pkt_sched.h in
> the iproute2 source?
> 
> Do you recommend an alternative (more elegant) approach to what I'm
> trying to accomplish?

Note that since you probably want to be able to dump the table
(tc -s -d qdisc show ), you might need a kernel change anyway.

Then the iproute2 change would be a companion.
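
For readers unfamiliar with the size table being discussed, here is a deliberately simplified sketch of a table lookup using the eight example rows from the proposed patch. It is an assumption-laden toy: toy_stab_lookup() and example_table are hypothetical names, and the sketch leaves out anything else the kernel's __qdisc_calculate_pkt_len() may do (it only indexes the table by packet length and clamps to the last slot).

```c
#include <assert.h>
#include <stddef.h>

/* Output sizes from the example in the patch description: one entry per 256-byte cell. */
const unsigned int example_table[8] = {
	400, 800, 1200, 1600, 2000, 2400, 2800, 3200
};

/*
 * Map a packet length to a translated size.
 * cell_log is log2 of the cell width (8 for 256-byte cells).
 * Lengths beyond the table are clamped to the last entry (a simplification).
 */
unsigned int toy_stab_lookup(const unsigned int *table, size_t tsize,
			     unsigned int cell_log, unsigned int pkt_len)
{
	size_t slot = pkt_len >> cell_log;

	if (slot >= tsize)
		slot = tsize - 1;
	return table[slot];
}

/* Small self-check against the rows listed in the patch description. */
void toy_stab_demo(void)
{
	assert(toy_stab_lookup(example_table, 8, 8, 0) == 400);
	assert(toy_stab_lookup(example_table, 8, 8, 255) == 400);
	assert(toy_stab_lookup(example_table, 8, 8, 256) == 800);
	assert(toy_stab_lookup(example_table, 8, 8, 1500) == 2400);
	assert(toy_stab_lookup(example_table, 8, 8, 2047) == 3200);
}
```

A table like this is what a "custom" link layer would have to get into the kernel, and what a dump would have to read back.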

Re: [PATCH 1/1] tc: custom qdisc pkt size translation table

2017-06-27 Thread Robert McCabe

Yeah, sorry didn't even think about that.
I guess my first question would be is there another way via the
iproute2 project where a user could
configure the stab->data pkt size translation table used in the
__qdisc_calculate_pkt_len method
in the kernel source (net/sched/sched_api.c)?

Also, let's say I went ahead and added TC_LINKLAYER_CUSTOM
to the include/uapi/linux/pkt_sched.h
file in the kernel source ... would I also need to make the same
change in include/uapi/linux/pkt_sched.h in
the iproute2 source?

Do you recommend an alternative (more elegant) approach to what I'm
trying to accomplish?

On Tue, Jun 27, 2017 at 11:55 AM, Eric Dumazet  wrote:
> On Tue, 2017-06-27 at 11:29 -0500, McCabe, Robert J wrote:
>> Added the "custom" linklayer qdisc stab option.
>> This allows the user to specify the pkt size translation
>> parameters from stdin.
>> Example:
>>tc qdisc add ... stab tsize 8 linklayer custom htb
>>Custom size table:
>>InputSizeStart -> InputSizeEnd: Output Pkt Size
>>0 - 255: 400
>>256 - 511: 800
>>512 - 767: 1200
>>768 - 1023: 1600
>>1024 - 1279: 2000
>>1280 - 1535: 2400
>>1536 - 1791: 2800
>>1792 - 2047: 3200
>>
>> Signed-off-by: McCabe, Robert J 
>> ---
>>  include/linux/pkt_sched.h |  1 +
>>  tc/tc_core.c  | 51 +++
>>  tc/tc_core.h  |  2 +-
>>  tc/tc_stab.c  |  4 +++-
>>  tc/tc_util.c  |  5 +
>>  5 files changed, 61 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
>> index 099bf55..289bb81 100644
>> --- a/include/linux/pkt_sched.h
>> +++ b/include/linux/pkt_sched.h
>> @@ -82,6 +82,7 @@ enum tc_link_layer {
>>   TC_LINKLAYER_UNAWARE, /* Indicate unaware old iproute2 util */
>>   TC_LINKLAYER_ETHERNET,
>>   TC_LINKLAYER_ATM,
>> + TC_LINKLAYER_CUSTOM,
>>  };
>>  #define TC_LINKLAYER_MASK 0x0F /* limit use to lower 4 bits */
>>
>
>
> You can not do this : This file is coming from the kernel
> ( include/uapi/linux/pkt_sched.h )
>
> Since your patch is user space only, you need to find another way ?
>
>
>

Re: [net PATCH] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Eric Dumazet

On Tue, 2017-06-27 at 10:08 -0700, Cong Wang wrote:
> On Tue, Jun 27, 2017 at 9:50 AM, Eric Dumazet  wrote:
> > On Tue, 2017-06-27 at 09:30 -0700, Cong Wang wrote:
> >> On Mon, Jun 26, 2017 at 6:35 PM,   wrote:
> >> > From: Gao Feng 
> >> >
> >> > When a qdisc fails to init, qdisc_create would invoke the destroy callback
> >> > to clean up. But there is no check whether the callback really exists, so it
> >> > would cause a panic if there is no destroy callback, as with the
> >> > qdiscs codel, pfifo, pfifo_fast, and so on.
> >> >
> >> > Now add a check for destroy to avoid the possible panic.
> >> >
> >> > Signed-off-by: Gao Feng 
> >>
> >> Looks good,
> >>
> >> Acked-by: Cong Wang 
> >>
> >> This is introduced by commit 87b60cfacf9f17cf71933c6e33b.
> >> Please add proper Fixes tag next time.
> >
> > Given that pfifo, pfifo_fast or codel can not fail their init,
> 
> 
> How about codel_init() -> codel_change() -> nla_parse_nested() ?


Yeah, with a malicious user space then (iproute2/tc is fine), codel
could be problematic.

pfifo and pfifo_fast can definitely not hit this bug.

changelog needs a bit of attention, even if the bug is real.

Thanks.
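
The pattern under discussion (an optional callback that must be checked before it is invoked on the error path) can be sketched in userspace as follows. This is a toy under stated assumptions: struct toy_qdisc_ops, toy_qdisc_create() and the helper callbacks are hypothetical names and do not mirror the real qdisc_create().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy ops table: destroy is optional and may be NULL (hypothetical type). */
struct toy_qdisc_ops {
	int (*init)(void);
	void (*destroy)(void);
};

/* Counts how often the destroy callback ran, for the self-check below. */
int toy_destroy_calls;

int toy_failing_init(void)
{
	return -EINVAL;
}

void toy_counting_destroy(void)
{
	toy_destroy_calls++;
}

/* On init failure, clean up through destroy only if the callback exists. */
int toy_qdisc_create(const struct toy_qdisc_ops *ops)
{
	int err = ops->init ? ops->init() : 0;

	if (err) {
		if (ops->destroy)
			ops->destroy();
		return err;
	}
	return 0;
}

/* Small self-check: a failing init without a destroy callback must not crash. */
void toy_qdisc_demo(void)
{
	struct toy_qdisc_ops no_destroy = { toy_failing_init, NULL };
	struct toy_qdisc_ops with_destroy = { toy_failing_init, toy_counting_destroy };

	toy_destroy_calls = 0;
	assert(toy_qdisc_create(&no_destroy) == -EINVAL);
	assert(toy_destroy_calls == 0);
	assert(toy_qdisc_create(&with_destroy) == -EINVAL);
	assert(toy_destroy_calls == 1);
}
```

Without the NULL check, the first case in the self-check would call through a null function pointer, which is the panic the patch is guarding against.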

Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-27 Thread Leon Romanovsky

On Tue, Jun 27, 2017 at 11:37:35AM -0600, Jason Gunthorpe wrote:
> On Tue, Jun 27, 2017 at 08:33:01PM +0300, Leon Romanovsky wrote:
>
> > My initial plan was to put all parsers under their respective names, in
> > the similar way as I did for caps: $ rdma dev show mlx5_4 caps
>
> I think you should have a useful summary display similar to 'ip a' and
> other commands.
>
> guid(s), subnet prefix or default gid for IB, lid/lmc, link state,
> speed, mtu, pkeys protocol(s)

It will, but first I would like to see this tool become a part of
iproute2, so other people will be able to extend it in addition
to me.

Are you fine with the proposed code?

>
> A show gid table command makes sense for RoCE where it can show the
> gid and the IP binding for it, RoCE mode, etc.
>
> Jason



Re: [PATCH v6 05/21] net-next: stmmac: Add dwmac-sun8i

2017-06-27 Thread Corentin Labbe

On Tue, Jun 27, 2017 at 07:29:37PM +0200, Maxime Ripard wrote:
> On Tue, Jun 27, 2017 at 02:37:48PM +0200, Corentin Labbe wrote:
> > On Tue, Jun 27, 2017 at 11:33:56AM +0100, Andre Przywara wrote:
> > > Hi,
> > > 
> > > On 27/06/17 11:23, Icenowy Zheng wrote:
> > > > 
> > > > 
> > > > On June 27, 2017, at 6:15:58 PM GMT+08:00, Andre Przywara wrote:
> > > >> Hi,
> > > >>
> > > >> On 27/06/17 10:41, Maxime Ripard wrote:
> > > >>> On Tue, Jun 27, 2017 at 10:02:45AM +0100, Andre Przywara wrote:
> > >  Hi,
> > > 
> > >  (CC:ing some people from that Rockchip dmwac series)
> > > 
> > >  On 27/06/17 09:21, Corentin Labbe wrote:
> > > > On Tue, Jun 27, 2017 at 04:11:21PM +0800, Chen-Yu Tsai wrote:
> > > >> On Tue, Jun 27, 2017 at 4:05 PM, Corentin Labbe
> > > >>  wrote:
> > > >>> On Mon, Jun 26, 2017 at 01:18:23AM +0100, André Przywara wrote:
> > >  On 31/05/17 08:18, Corentin Labbe wrote:
> > > > The dwmac-sun8i is a heavy hacked version of stmmac hardware by
> > > > allwinner.
> > > > In fact the only common part is the descriptor management and
> > > >> the first
> > > > register function.
> > > 
> > >  Hi,
> > > 
> > >  I know I am a bit late with this, but while adapting the U-Boot
> > > >> driver
> > >  to the new binding I was wondering about the internal PHY
> > > >> detection:
> > > 
> > > 
> > >  So here you seem to deduce the usage of the internal PHY by the
> > > >> PHY
> > >  interface specified in the DT (MII = internal, RGMII =
> > > >> external).
> > >  I think I raised this question before, but isn't it perfectly
> > > >> legal for
> > >  a board to use MII with an external PHY even on those SoCs that
> > > >> feature
> > >  an internal PHY?
> > >  On the first glance that does not make too much sense, but apart
> > > >> from
> > >  not being the correct binding to describe all of the SoCs
> > > >> features I see
> > >  two scenarios:
> > >  1) A board vendor might choose to not use the internal PHY
> > > >> because it
> > >  has bugs, lacks features (configurability) or has other issues.
> > > >> For
> > >  instance I have heard reports that the internal PHY makes the
> > > >> SoC go
> > >  rather hot, possibly limiting the CPU frequency. By using an
> > > >> external
> > >  MII PHY (which are still cheaper than RGMII PHYs) this can be
> > > >> avoided.
> > >  2) A PHY does not necessarily need to be directly connected to
> > >  magnetics. Indeed quite some boards use (RG)MII to connect to a
> > > >> switch
> > >  IC or some other network circuitry, for instance fibre
> > > >> connectors.
> > > 
> > >  So I was wondering if we would need an explicit:
> > >    allwinner,use-internal-phy;
> > >  boolean DT property to signal the usage of the internal PHY?
> > >  Alternatively we could go with the negative version:
> > >    allwinner,disable-internal-phy;
> > > 
> > >  Or what about introducing a new "allwinner,internal-mii-phy"
> > > >> compatible
> > >  string for the *PHY* node and use that?
> > > 
> > >  I just want to avoid that we introduce a binding that causes us
> > >  headaches later. I think we can still fix this with a followup
> > > >> patch
> > >  before the driver and its binding hit a release kernel.
> > > 
> > >  Cheers,
> > >  Andre.
> > > 
> > > >>>
> > > >>> I just see some patch, where "phy-mode = internal" is valid.
> > > >>> I will try to find a way to use it
> > > >>
> > > >> Can you provide a link?
> > > >
> > > > https://lkml.org/lkml/2017/6/23/479
> > > >
> > > >>
> > > >> I'm not a fan of using phy-mode for this. There's no guarantee
> > > >> what
> > > >> mode the internal PHY uses. That's what phy-mode is for.
> > > 
> > >  I can understand Chen-Yu's concerns, but ...
> > > 
> > > > For each soc the internal PHY mode is know and setted in
> > > >> emac_variant/internal_phy
> > > > So its not a problem.
> > > 
> > >  that is true as well, at least for now.
> > > 
> > >  So while I agree that having a separate property to indicate
> > >  the usage of the internal PHY would be nice, I am bit tempted
> > >  to use this easier approach and piggy back on the existing
> > >  phy-mode property.
> > > >>>
> > > >>> We're trying to fix an issue that works for now too.
> > > >>>
> > > >>> If we want to consider future weird cases, then we must
> > > >>> consider all of them. And the phy mode changing is definitely
> > > >>> not really far fetched.
> > > >>>
> > > >>> I agree with Chen-Yu, and I really feel like the compatible
> > > >>> solution

Re: [PATCH v6 05/21] net-next: stmmac: Add dwmac-sun8i

2017-06-27 Thread Florian Fainelli

On 06/27/2017 10:29 AM, Maxime Ripard wrote:
> On Tue, Jun 27, 2017 at 02:37:48PM +0200, Corentin Labbe wrote:
>> On Tue, Jun 27, 2017 at 11:33:56AM +0100, Andre Przywara wrote:
>>> Hi,
>>>
>>> On 27/06/17 11:23, Icenowy Zheng wrote:


 On June 27, 2017, at 6:15:58 PM GMT+08:00, Andre Przywara wrote:
> Hi,
>
> On 27/06/17 10:41, Maxime Ripard wrote:
>> On Tue, Jun 27, 2017 at 10:02:45AM +0100, Andre Przywara wrote:
>>> Hi,
>>>
>>> (CC:ing some people from that Rockchip dmwac series)
>>>
>>> On 27/06/17 09:21, Corentin Labbe wrote:
 On Tue, Jun 27, 2017 at 04:11:21PM +0800, Chen-Yu Tsai wrote:
> On Tue, Jun 27, 2017 at 4:05 PM, Corentin Labbe
>  wrote:
>> On Mon, Jun 26, 2017 at 01:18:23AM +0100, André Przywara wrote:
>>> On 31/05/17 08:18, Corentin Labbe wrote:
 The dwmac-sun8i is a heavy hacked version of stmmac hardware by
 allwinner.
 In fact the only common part is the descriptor management and
> the first
 register function.
>>>
>>> Hi,
>>>
>>> I know I am a bit late with this, but while adapting the U-Boot
> driver
>>> to the new binding I was wondering about the internal PHY
> detection:
>>>
>>>
>>> So here you seem to deduce the usage of the internal PHY by the
> PHY
>>> interface specified in the DT (MII = internal, RGMII =
> external).
>>> I think I raised this question before, but isn't it perfectly
> legal for
>>> a board to use MII with an external PHY even on those SoCs that
> feature
>>> an internal PHY?
>>> On the first glance that does not make too much sense, but apart
> from
>>> not being the correct binding to describe all of the SoCs
> features I see
>>> two scenarios:
>>> 1) A board vendor might choose to not use the internal PHY
> because it
>>> has bugs, lacks features (configurability) or has other issues.
> For
>>> instance I have heard reports that the internal PHY makes the
> SoC go
>>> rather hot, possibly limiting the CPU frequency. By using an
> external
>>> MII PHY (which are still cheaper than RGMII PHYs) this can be
> avoided.
>>> 2) A PHY does not necessarily need to be directly connected to
>>> magnetics. Indeed quite some boards use (RG)MII to connect to a
> switch
>>> IC or some other network circuitry, for instance fibre
> connectors.
>>>
>>> So I was wondering if we would need an explicit:
>>>   allwinner,use-internal-phy;
>>> boolean DT property to signal the usage of the internal PHY?
>>> Alternatively we could go with the negative version:
>>>   allwinner,disable-internal-phy;
>>>
>>> Or what about introducing a new "allwinner,internal-mii-phy"
> compatible
>>> string for the *PHY* node and use that?
>>>
>>> I just want to avoid that we introduce a binding that causes us
>>> headaches later. I think we can still fix this with a followup
> patch
>>> before the driver and its binding hit a release kernel.
>>>
>>> Cheers,
>>> Andre.
>>>
>>
>> I just see some patch, where "phy-mode = internal" is valid.
>> I will try to find a way to use it
>
> Can you provide a link?

 https://lkml.org/lkml/2017/6/23/479

>
> I'm not a fan of using phy-mode for this. There's no guarantee
> what
> mode the internal PHY uses. That's what phy-mode is for.
>>>
>>> I can understand Chen-Yu's concerns, but ...
>>>
 For each soc the internal PHY mode is know and setted in
> emac_variant/internal_phy
 So its not a problem.
>>>
>>> that is true as well, at least for now.
>>>
>>> So while I agree that having a separate property to indicate
>>> the usage of the internal PHY would be nice, I am bit tempted
>>> to use this easier approach and piggy back on the existing
>>> phy-mode property.
>>
>> We're trying to fix an issue that works for now too.
>>
>> If we want to consider future weird cases, then we must
>> consider all of them. And the phy mode changing is definitely
>> not really far fetched.
>>
>> I agree with Chen-Yu, and I really feel like the compatible
>> solution you suggested would cover both your concerns, and
>> ours.
>
> So something like this?
>   emac: emac@1c3 {
>   compatible = "allwinner,sun8i-h3-emac";
>   ...
>   phy-mode = "mii";
>   phy-handle = <_mii_phy>;
>   ...
>
>   mdio: mdio {
>#address-cells = <1>;
>

Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-27 Thread Jason Gunthorpe

On Tue, Jun 27, 2017 at 08:33:01PM +0300, Leon Romanovsky wrote:

> My initial plan was to put all parsers under their respective names, in
> the similar way as I did for caps: $ rdma dev show mlx5_4 caps

I think you should have a useful summary display similar to 'ip a' and
other commands.

guid(s), subnet prefix or default gid for IB, lid/lmc, link state,
speed, mtu, pkeys protocol(s)

A show gid table command makes sense for rocee where it can show the
gid and the IP binding for it, rocee mode, etc..

Jason

Re: [RESEND PATCH v4 1/3] Bluetooth: bnep: fix possible might sleep error in bnep_session

2017-06-27 Thread Marcel Holtmann

Hi Jeffy,

> It looks like bnep_session has same pattern as the issue reported in
> old rfcomm:
> 
>   while (1) {
>   set_current_state(TASK_INTERRUPTIBLE);
>   if (condition)
>   break;
>   // may call might_sleep here
>   schedule();
>   }
>   __set_current_state(TASK_RUNNING);
> 
> Which fixed at:
>   dfb2fae Bluetooth: Fix nested sleeps
> 
> So let's fix it at the same way, also follow the suggestion of:
> https://lwn.net/Articles/628628/
> 
> Signed-off-by: Jeffy Chen 
> Reviewed-by: Brian Norris 
> Reviewed-by: AL Yu-Chen Cho 
> ---
> 
> Changes in v4: None
> Changes in v2: None
> 
> net/bluetooth/bnep/core.c | 11 +--
> 1 file changed, 5 insertions(+), 6 deletions(-)

all 3 patches have been applied to bluetooth-next tree.

Regards

Marcel

Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-27 Thread Leon Romanovsky

On Tue, Jun 27, 2017 at 10:41:50AM -0600, Jason Gunthorpe wrote:
> On Tue, Jun 27, 2017 at 12:21:29PM +0300, Leon Romanovsky wrote:
> > > What will be the output of such command?
> > >  $ rdma dev show mlx5_4
> >
> > ip-like style:
> >
> > $ rdma dev show mlx5_4
> > 5: mlx5_4:
> > caps:  > PORT_ACTIVE_EVENT, SYS_IMAGE_GUID, RC_RNR_NAK_GEN, MEM_WINDOW, UD_IP_CSUM, 
> > UD_TSO, XRC, MEM_MGT_EXTENSIONS, BLOCK_MULTICAST_LOOPBACK, 
> > MEM_WINDOW_TYPE_2B, RAW_IP_CSUM, SIGNATURE_HANDOVER, VIRTUAL_FUNCTION>
> > $ rdma link show mlx5_3
> > 4/1: mlx5_3/1:
> > caps: 
>
> I think that is better, maybe it should only show under some kind of
> verbose mode, I don't know, it depends what other stuff ends up being
> displayed..
>
> Are you going to dump the gid table and pkey table too in one of these 
> commands?

My initial plan was to put all parsers under their respective names,
similar to what I did for caps: $ rdma dev show mlx5_4 caps

So for large dumps, I'm going to use that technique again and maybe print
a summary by default.
For example, for gids, we can print utilization as a summary and the whole
table only if someone really wants it: $ rdma link show mlx5_4/1 gids

Something like that.

Thanks

>
> Jason



Re: [PATCH v6 05/21] net-next: stmmac: Add dwmac-sun8i

2017-06-27 Thread Maxime Ripard

On Tue, Jun 27, 2017 at 02:37:48PM +0200, Corentin Labbe wrote:
> On Tue, Jun 27, 2017 at 11:33:56AM +0100, Andre Przywara wrote:
> > Hi,
> > 
> > On 27/06/17 11:23, Icenowy Zheng wrote:
> > > 
> > > 
> > > On June 27, 2017, at 6:15:58 PM GMT+08:00, Andre Przywara wrote:
> > >> Hi,
> > >>
> > >> On 27/06/17 10:41, Maxime Ripard wrote:
> > >>> On Tue, Jun 27, 2017 at 10:02:45AM +0100, Andre Przywara wrote:
> >  Hi,
> > 
> >  (CC:ing some people from that Rockchip dmwac series)
> > 
> >  On 27/06/17 09:21, Corentin Labbe wrote:
> > > On Tue, Jun 27, 2017 at 04:11:21PM +0800, Chen-Yu Tsai wrote:
> > >> On Tue, Jun 27, 2017 at 4:05 PM, Corentin Labbe
> > >>  wrote:
> > >>> On Mon, Jun 26, 2017 at 01:18:23AM +0100, André Przywara wrote:
> >  On 31/05/17 08:18, Corentin Labbe wrote:
> > > The dwmac-sun8i is a heavy hacked version of stmmac hardware by
> > > allwinner.
> > > In fact the only common part is the descriptor management and
> > >> the first
> > > register function.
> > 
> >  Hi,
> > 
> >  I know I am a bit late with this, but while adapting the U-Boot
> > >> driver
> >  to the new binding I was wondering about the internal PHY
> > >> detection:
> > 
> > 
> >  So here you seem to deduce the usage of the internal PHY by the
> > >> PHY
> >  interface specified in the DT (MII = internal, RGMII =
> > >> external).
> >  I think I raised this question before, but isn't it perfectly
> > >> legal for
> >  a board to use MII with an external PHY even on those SoCs that
> > >> feature
> >  an internal PHY?
> >  On the first glance that does not make too much sense, but apart
> > >> from
> >  not being the correct binding to describe all of the SoCs
> > >> features I see
> >  two scenarios:
> >  1) A board vendor might choose to not use the internal PHY
> > >> because it
> >  has bugs, lacks features (configurability) or has other issues.
> > >> For
> >  instance I have heard reports that the internal PHY makes the
> > >> SoC go
> >  rather hot, possibly limiting the CPU frequency. By using an
> > >> external
> >  MII PHY (which are still cheaper than RGMII PHYs) this can be
> > >> avoided.
> >  2) A PHY does not necessarily need to be directly connected to
> >  magnetics. Indeed quite some boards use (RG)MII to connect to a
> > >> switch
> >  IC or some other network circuitry, for instance fibre
> > >> connectors.
> > 
> >  So I was wondering if we would need an explicit:
> >    allwinner,use-internal-phy;
> >  boolean DT property to signal the usage of the internal PHY?
> >  Alternatively we could go with the negative version:
> >    allwinner,disable-internal-phy;
> > 
> >  Or what about introducing a new "allwinner,internal-mii-phy"
> > >> compatible
> >  string for the *PHY* node and use that?
> > 
> >  I just want to avoid that we introduce a binding that causes us
> >  headaches later. I think we can still fix this with a followup
> > >> patch
> >  before the driver and its binding hit a release kernel.
> > 
> >  Cheers,
> >  Andre.
> > 
> > >>>
> > >>> I just see some patch, where "phy-mode = internal" is valid.
> > >>> I will try to find a way to use it
> > >>
> > >> Can you provide a link?
> > >
> > > https://lkml.org/lkml/2017/6/23/479
> > >
> > >>
> > >> I'm not a fan of using phy-mode for this. There's no guarantee
> > >> what
> > >> mode the internal PHY uses. That's what phy-mode is for.
> > 
> >  I can understand Chen-Yu's concerns, but ...
> > 
> > > For each soc the internal PHY mode is know and setted in
> > >> emac_variant/internal_phy
> > > So its not a problem.
> > 
> >  that is true as well, at least for now.
> > 
> >  So while I agree that having a separate property to indicate
> >  the usage of the internal PHY would be nice, I am bit tempted
> >  to use this easier approach and piggy back on the existing
> >  phy-mode property.
> > >>>
> > >>> We're trying to fix an issue that works for now too.
> > >>>
> > >>> If we want to consider future weird cases, then we must
> > >>> consider all of them. And the phy mode changing is definitely
> > >>> not really far fetched.
> > >>>
> > >>> I agree with Chen-Yu, and I really feel like the compatible
> > >>> solution you suggested would cover both your concerns, and
> > >>> ours.
> > >>
> > >> So something like this?
> > >>  emac: emac@1c3 {
> > >>  compatible = "allwinner,sun8i-h3-emac";
> > >>  ...
> > >>  phy-mode = "mii";
> > >>  phy-handle = <_mii_phy>;
> > >>  ...
> > >>
> > >>

Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-06-27 Thread crow

Hi,
Other users are reporting the same issue with the mainline kernel on
Ubuntu, so this is certainly not distribution related; see [0]. I hope
someone will have time after the 4.12 release to try to fix this
issue.

Regards,

[0] http://forum.khadas.com/t/ubuntu-server-rom-linux-mainline-v170624-pre-alpha-version-emmc-installation/803/12

On Thu, Jun 15, 2017 at 4:40 PM, crow  wrote:
> Hi,
>
> On Sun, Jun 11, 2017 at 7:03 PM, crow  wrote:
>> Hi Andrew,
>>
>> On Sun, Jun 11, 2017 at 5:21 PM, Andrew Lunn  wrote:
 Thank your for the suggestion, and thanks Martin to explaining me over
 IRC what actually I should do.

 I recompiled mainline kernel 4.12-rc4 with the Amlogic driver:
 replaced drivers/net/phy/meson-gxl.c with
 https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c

 But this did not solve the issue. As soon as i start git clone i lose
 network connection to device (no session timeout/disconnect this time,
 but I am unable to reconnect over SSH or to get OK ping replay back).
>>>
>>
>> 1) First problem reported I can't reproduce anymore, every reboot/cold
>> boot with mainline kernel the Ethernet speed is detected as
>> "100Mbps/Full" , but as seen in first post there was this issue.
>
> Once I did setup u-boot to have network in u-boot and did just an ping
> to activate network. And after boot Ethernet was detected as 10Mbps.
> But again was not able to reproduce it. I double check that I have 5E
> cable.
>
> in u-boot Ethernet is detected as below
> kvim#ping x.x.x.x
> Speed: 100, full duplex
> Using dwmac.c941 device
> host x.x.x.x is alive
> kvim#
>
> then I let ArchLinuxArm to boot and found out I can't connect to
> device over SSH. Check over serial console and found:
>
> # dmesg | tail -n 10
> [8.334790] meson8b-dwmac c941.ethernet eth0: device MAC
> address 00:15:18:01:81:31
> [8.436668] Meson GXL Internal PHY 0.e40908ff:08: attached PHY
> driver [Meson GXL Internal PHY] (mii_bus:phy_addr=0.e40908ff:08,
> irq=-1)
> [8.535171] meson8b-dwmac c941.ethernet eth0: PTP not supported by HW
> [   10.225264] brcmfmac: brcmf_c_preinit_dcmds: Firmware version =
> wl0: Mar  1 2015 07:29:38 version 7.45.18 (r538002) FWID 01-6a2c8ad4
> [   10.635703] meson8b-dwmac c941.ethernet eth0: Link is Up -
> 10Mbps/Half - flow control off
> # uname -a
> Linux khadasvimpro 4.12.0-rc4-3-ARCH #1 SMP Thu Jun 8 00:17:20 CEST
> 2017 aarch64 GNU/Linux
> #
> # mii-tool -vvv eth0
> Using SIOCGMIIPHY=0x8947
> eth0: no autonegotiation,, link ok
>   registers for MII PHY 8:
> 1000 782d 0181 4400 01e1 0001 0005 2001
>        
> 0040 0002 40e8 5400 1c1c   
> fff0   000a 1407 0040  105a
>   product info: vendor 00:60:51, model 0 rev 0
>   basic mode:   autonegotiation enabled
>   basic status: autonegotiation complete, link ok
>   capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
> #
> # ifconfig eth0 down && ifconfig eth0 up
> [ 1972.596690] Meson GXL Internal PHY 0.e40908ff:08: attached PHY
> driver [Meson GXL Internal PHY] (mii_bus:phy_addr=0.e40908ff:08,
> irq=-1)
> [ 1972.704156] meson8b-dwmac c941.ethernet eth0: PTP not supported by HW
> [ 1974.795698] meson8b-dwmac c941.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> #
> # mii-tool -vvv eth0
> Using SIOCGMIIPHY=0x8947
> eth0: negotiated 1000baseT-HD flow-control, link ok
>   registers for MII PHY 8:
> 1000 782d 0181 4400 01e1 c1e1 000f 2001
>        
> 0040 0002 40e8 5400 1c1c   
> fff0   020a 1407 00ca  105a
>   product info: vendor 00:60:51, model 0 rev 0
>   basic mode:   autonegotiation enabled
>   basic status: autonegotiation complete, link ok
>   capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
> #
>
> 2) see below
>> 2) see below
>>
>>> So this shows it is more than a PHY problem. The Ethernet MAC driver
>>> is probably also part of the problem.
>>
>> There are some stmmac fixes [1] in soon to be rc5, compiled current
>> master (without amlogic.c) with the fixes but for me the issue still
>> persist. I will compile once released rc5 with amlogic.c and report
>> back.
>>
>>> Are there any mainline kernels which work O.K?
>>
>> Khadas VIM support was added in 4.12-rc1. And I did test all four rc's
>> but without success.
>
> I did test many Kernel builds but all test have failed when
> downloading bigger files / doing git clone.
> As Martin Blumenstingl suggested I start with

Re: [net PATCH] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Cong Wang

On Tue, Jun 27, 2017 at 9:50 AM, Eric Dumazet  wrote:
> On Tue, 2017-06-27 at 09:30 -0700, Cong Wang wrote:
>> On Mon, Jun 26, 2017 at 6:35 PM,   wrote:
>> > From: Gao Feng 
>> >
>> > When qdisc fail to init, qdisc_create would invoke the destroy callback
>> > to cleanup. But there is no check if the callback exists really. So it
>> > would cause the panic if there is no real destroy callback like these
>> > qdisc codel, pfifo, pfifo_fast, and so on.
>> >
>> > Now add one the check for destroy to avoid the possible panic.
>> >
>> > Signed-off-by: Gao Feng 
>>
>> Looks good,
>>
>> Acked-by: Cong Wang 
>>
>> This is introduced by commit 87b60cfacf9f17cf71933c6e33b.
>> Please add proper Fixes tag next time.
>
> Given that pfifo, pfifo_fast or codel can not fail their init,


How about codel_init() -> codel_change() -> nla_parse_nested() ?


> I do not see this patch as a net candidate, and the Fixes: tag seems not
> needed.


True, but we add Fixes tag to net-next candidates too.

Re: [PATCH v2] netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv

2017-06-27 Thread Pablo Neira Ayuso

On Tue, Jun 27, 2017 at 05:58:25PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Jun 07, 2017 at 03:50:38PM +0200, Mateusz Jurczyk wrote:
> > Verify that the length of the socket buffer is sufficient to cover the
> > nlmsghdr structure before accessing the nlh->nlmsg_len field for further
> > input sanitization. If the client only supplies 1-3 bytes of data in
> > sk_buff, then nlh->nlmsg_len remains partially uninitialized and
> > contains leftover memory from the corresponding kernel allocation.
> > Operating on such data may result in indeterminate evaluation of the
> > nlmsg_len < NLMSG_HDRLEN expression.
> > 
> > The bug was discovered by a runtime instrumentation designed to detect
> > use of uninitialized memory in the kernel. The patch prevents this and
> > other similar tools (e.g. KMSAN) from flagging this behavior in the future.
> 
> Applied, thanks.

Wait, I'm holding this back after a closer look.

I think we have to remove this:

	if (nlh->nlmsg_len < NLMSG_HDRLEN || <---
	    skb->len < NLMSG_HDRLEN + sizeof(struct nfgenmsg))
		return;

in nfnetlink_rcv_skb_batch()

now that we make this upfront check from nfnetlink_rcv().

P.S: Sorry I couldn't look at this any sooner, I've been busy in the
last weeks preparing things for the upcoming Netfilter Workshop.
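
The ordering issue behind the patch can be shown with a small user-space sketch. It assumes a simplified header, struct msg_hdr, that stands in for struct nlmsghdr; msg_hdr_ok() and MSG_HDRLEN are hypothetical names and this is not the nfnetlink code. The buffer length is tested first, so the length field is never read from bytes the sender did not supply.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct nlmsghdr; only the layout matters. */
struct msg_hdr {
	uint32_t len;	/* total message length, header included */
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

#define MSG_HDRLEN sizeof(struct msg_hdr)

/*
 * Return 1 if buf holds a complete, self-consistent header, 0 otherwise.
 * The check on buflen comes first: with only 1-3 bytes supplied, hdr.len
 * would otherwise be read from memory the caller never filled in.
 */
static int msg_hdr_ok(const unsigned char *buf, size_t buflen)
{
	struct msg_hdr hdr;

	if (buflen < MSG_HDRLEN)
		return 0;

	memcpy(&hdr, buf, sizeof(hdr));	/* copy out to avoid unaligned access */
	if (hdr.len < MSG_HDRLEN || hdr.len > buflen)
		return 0;

	return 1;
}
```

Once the caller has done the length test up front, a second test of the length field further down the call chain is redundant, which is the cleanup suggested above.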

Re: [PATCH 1/1] tc: custom qdisc pkt size translation table

2017-06-27 Thread Eric Dumazet

On Tue, 2017-06-27 at 11:29 -0500, McCabe, Robert J wrote:
> Added the "custom" linklayer qdisc stab option.
> This allows the user to specify the pkt size translation
> parameters from stdin.
> Example:
>tc qdisc add ... stab tsize 8 linklayer custom htb
>Custom size table:
>InputSizeStart -> IntputSizeEnd: Output Pkt Size
>0 - 255: 400
>256 - 511: 800
>512 - 767: 1200
>768 - 1023: 1600
>1024 - 1279: 2000
>1280 - 1535: 2400
>1536 - 1791: 2800
>1792 - 2047: 3200
> 
> Signed-off-by: McCabe, Robert J 
> ---
>  include/linux/pkt_sched.h |  1 +
>  tc/tc_core.c  | 51 +++
>  tc/tc_core.h  |  2 +-
>  tc/tc_stab.c  |  4 +++-
>  tc/tc_util.c  |  5 +
>  5 files changed, 61 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
> index 099bf55..289bb81 100644
> --- a/include/linux/pkt_sched.h
> +++ b/include/linux/pkt_sched.h
> @@ -82,6 +82,7 @@ enum tc_link_layer {
>   TC_LINKLAYER_UNAWARE, /* Indicate unaware old iproute2 util */
>   TC_LINKLAYER_ETHERNET,
>   TC_LINKLAYER_ATM,
> + TC_LINKLAYER_CUSTOM,
>  };
>  #define TC_LINKLAYER_MASK 0x0F /* limit use to lower 4 bits */
>  


You cannot do this: this file comes from the kernel
(include/uapi/linux/pkt_sched.h).

Since your patch is user space only, you need to find another way.
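
As an aside for readers, the table lookup described in the quoted commit message can be sketched in self-contained C. The names size_entry, example_table and lookup_size() are hypothetical, and the sketch takes the real packet size as input and compares with <=, which is an assumption that differs from the strict < in the patch (the patch is called with the upper bound of each slot). The values are the example table from the commit message.

```c
#include <stddef.h>

/* One slot of the custom size table (hypothetical layout). */
struct size_entry {
	unsigned int start;	/* first input size covered by this slot */
	unsigned int out;	/* size reported for packets in this slot */
};

/* Example table from the patch description: 8 slots of 256 bytes each. */
static const struct size_entry example_table[] = {
	{    0,  400 }, {  256,  800 }, {  512, 1200 }, {  768, 1600 },
	{ 1024, 2000 }, { 1280, 2400 }, { 1536, 2800 }, { 1792, 3200 },
};

/*
 * Walk from the highest slot down and return the output size of the
 * first slot whose start is at or below the packet size. Returns 0
 * when the table is empty.
 */
static unsigned int lookup_size(const struct size_entry *t, size_t n,
				unsigned int size)
{
	for (size_t i = n; i-- > 0; )
		if (t[i].start <= size)
			return t[i].out;
	return 0;
}
```

Sizes beyond the last slot fall into the last entry in this sketch; whether that is the desired behaviour is a design choice the patch does not spell out.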

Re: [net PATCH] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Eric Dumazet

On Tue, 2017-06-27 at 09:30 -0700, Cong Wang wrote:
> On Mon, Jun 26, 2017 at 6:35 PM,   wrote:
> > From: Gao Feng 
> >
> > When qdisc fail to init, qdisc_create would invoke the destroy callback
> > to cleanup. But there is no check if the callback exists really. So it
> > would cause the panic if there is no real destroy callback like these
> > qdisc codel, pfifo, pfifo_fast, and so on.
> >
> > Now add one the check for destroy to avoid the possible panic.
> >
> > Signed-off-by: Gao Feng 
> 
> Looks good,
> 
> Acked-by: Cong Wang 
> 
> This is introduced by commit 87b60cfacf9f17cf71933c6e33b.
> Please add proper Fixes tag next time.

Given that pfifo, pfifo_fast or codel can not fail their init,
I do not see this patch as a net candidate, and the Fixes: tag seems not
needed.

Gao, have you really hit a bug, or is this patch some kind of cleanup or
prep work ?

If yes, please properly identify which packet scheduler had an issue.

Thanks.

Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-27 Thread Jason Gunthorpe

On Tue, Jun 27, 2017 at 12:21:29PM +0300, Leon Romanovsky wrote:
> > What will be the output of such command?
> >  $ rdma dev show mlx5_4
> 
> ip-like style:
> 
> $ rdma dev show mlx5_4
> 5: mlx5_4:
> caps:  SYS_IMAGE_GUID, RC_RNR_NAK_GEN, MEM_WINDOW, UD_IP_CSUM, UD_TSO, XRC, 
> MEM_MGT_EXTENSIONS, BLOCK_MULTICAST_LOOPBACK, MEM_WINDOW_TYPE_2B, 
> RAW_IP_CSUM, SIGNATURE_HANDOVER, VIRTUAL_FUNCTION>
> $ rdma link show mlx5_3
> 4/1: mlx5_3/1:
> caps: 

I think that is better, maybe it should only show under some kind of
verbose mode, I don't know, it depends what other stuff ends up being
displayed..

Are you going to dump the gid table and pkey table too in one of these commands?

Jason

[PATCH 1/1] tc: custom qdisc pkt size translation table

2017-06-27 Thread McCabe, Robert J

Added the "custom" linklayer qdisc stab option.
This allows the user to specify the pkt size translation
parameters from stdin.
Example:
   tc qdisc add ... stab tsize 8 linklayer custom htb
   Custom size table:
   InputSizeStart -> IntputSizeEnd: Output Pkt Size
   0 - 255: 400
   256 - 511: 800
   512 - 767: 1200
   768 - 1023: 1600
   1024 - 1279: 2000
   1280 - 1535: 2400
   1536 - 1791: 2800
   1792 - 2047: 3200

Signed-off-by: McCabe, Robert J 
---
 include/linux/pkt_sched.h |  1 +
 tc/tc_core.c  | 51 +++
 tc/tc_core.h  |  2 +-
 tc/tc_stab.c  |  4 +++-
 tc/tc_util.c  |  5 +
 5 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 099bf55..289bb81 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -82,6 +82,7 @@ enum tc_link_layer {
TC_LINKLAYER_UNAWARE, /* Indicate unaware old iproute2 util */
TC_LINKLAYER_ETHERNET,
TC_LINKLAYER_ATM,
+   TC_LINKLAYER_CUSTOM,
 };
 #define TC_LINKLAYER_MASK 0x0F /* limit use to lower 4 bits */
 
diff --git a/tc/tc_core.c b/tc/tc_core.c
index 821b741..fb04704 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "tc_core.h"
 #include 
@@ -28,6 +29,16 @@
 static double tick_in_usec = 1;
 static double clock_factor = 1;
 
+struct size_table_entry {
+   unsigned int input_size_boundary_start;
+   unsigned int output_size_bytes;
+};
+
+//TODO: free
+static struct size_table_entry* custom_size_table = NULL;
+static int num_size_table_entries = 0;
+
+
 int tc_core_time2big(unsigned int time)
 {
__u64 t = time;
@@ -89,6 +100,23 @@ static unsigned int tc_align_to_atm(unsigned int size)
return linksize;
 }
 
+static unsigned int tc_align_to_custom(unsigned int size)
+{
+   int i;
+
+   assert(custom_size_table != NULL);
+
+   for(i = num_size_table_entries -1; i >= 0 ; --i)
+   {
+   if(custom_size_table[i].input_size_boundary_start < size)
+   {
+   /* found it */
+   return custom_size_table[i].output_size_bytes;
+   }
+   }
+   return 0;
+}
+
 static unsigned int tc_adjust_size(unsigned int sz, unsigned int mpu, enum 
link_layer linklayer)
 {
if (sz < mpu)
@@ -97,6 +125,8 @@ static unsigned int tc_adjust_size(unsigned int sz, unsigned 
int mpu, enum link_
switch (linklayer) {
case LINKLAYER_ATM:
return tc_align_to_atm(sz);
+   case LINKLAYER_CUSTOM:
+   return tc_align_to_custom(sz);
case LINKLAYER_ETHERNET:
default:
/* No size adjustments on Ethernet */
@@ -185,6 +215,27 @@ int tc_calc_size_table(struct tc_sizespec *s, __u16 **stab)
if (!*stab)
return -1;
 
+   if(LINKLAYER_CUSTOM == linklayer)
+   {
+   custom_size_table = malloc(sizeof(struct size_table_entry)* 
s->tsize);
+   if(!custom_size_table)
+   return -1;
+   num_size_table_entries = s->tsize;
+
+   printf("Custom size table:\n");
+   printf("InputSizeStart -> IntputSizeEnd : Output Pkt Size\n");
+   for(i = 0; i <= s->tsize - 1; ++i)
+   {
+   printf("%d - %d: ", i << s->cell_log, ((i+1) << s->cell_log) - 1);
+   if(!scanf("%u", &custom_size_table[i].output_size_bytes))
+   {
+   fprintf(stderr, "Invalid custom stab table 
entry!\n");
+   return -1;
+   }
+
+   custom_size_table[i].input_size_boundary_start = i << 
s->cell_log;
+   }
+   }
 again:
for (i = s->tsize - 1; i >= 0; i--) {
sz = tc_adjust_size((i + 1) << s->cell_log, s->mpu, linklayer);
diff --git a/tc/tc_core.h b/tc/tc_core.h
index 8a63b79..8e97222 100644
--- a/tc/tc_core.h
+++ b/tc/tc_core.h
@@ -10,9 +10,9 @@ enum link_layer {
LINKLAYER_UNSPEC,
LINKLAYER_ETHERNET,
LINKLAYER_ATM,
+   LINKLAYER_CUSTOM,
 };
 
-
 int  tc_core_time2big(unsigned time);
 unsigned tc_core_time2tick(unsigned time);
 unsigned tc_core_tick2time(unsigned tick);
diff --git a/tc/tc_stab.c b/tc/tc_stab.c
index 1a0a3e3..8374c76 100644
--- a/tc/tc_stab.c
+++ b/tc/tc_stab.c
@@ -37,7 +37,9 @@ static void stab_help(void)
"   tsize : how many slots should size table have {512}\n"
"   mpu   : minimum packet size used in rate computations\n"
"   overhead  : per-packet size overhead used in rate 
computations\n"
-   "   linklayer : adapting to a linklayer e.g. atm\n"
+   "   linklayer : adapting to a linklayer e.g. ethernet, atm or 
custom\n"
+

Re: [PATCH] datapath: Avoid using stack larger than 1024.

2017-06-27 Thread Greg Rose


On 06/27/2017 12:03 AM, Tonghao Zhang wrote:

When compiling OvS-master on 4.4.0-81 kernel,
there is a warning:

 CC [M]  /root/ovs/datapath/linux/datapath.o
 /root/ovs/datapath/linux/datapath.c: In function
 ‘ovs_flow_cmd_set’:
 /root/ovs/datapath/linux/datapath.c:1221:1: warning:
 the frame size of 1040 bytes is larger than 1024 bytes
 [-Wframe-larger-than=]

This patch uses kmalloc to allocate memory for sw_flow_mask
instead of placing it on the stack.

Signed-off-by: Tonghao Zhang 
---
  datapath/datapath.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index c85029c..da8cd68 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -1107,7 +1107,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
  struct ovs_header *ovs_header = info->userhdr;
  struct sw_flow_key key;
  struct sw_flow *flow;
-struct sw_flow_mask mask;
+struct sw_flow_mask *mask;
  struct sk_buff *reply = NULL;
  struct datapath *dp;
  struct sw_flow_actions *old_acts = NULL, *acts = NULL;
@@ -1120,7 +1120,11 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)

  ufid_present = ovs_nla_get_ufid(, a[OVS_FLOW_ATTR_UFID], log);
  if (a[OVS_FLOW_ATTR_KEY]) {
-ovs_match_init(, , true, );
+mask = kmalloc(sizeof(struct sw_flow_mask), GFP_KERNEL);
+if (!mask)
+return -ENOMEM;
+
+ovs_match_init(, , true, mask);
  error = ovs_nla_get_match(net, , a[OVS_FLOW_ATTR_KEY],
a[OVS_FLOW_ATTR_MASK], log);
  } else if (!ufid_present) {
@@ -1141,7 +1145,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
  }

  acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], ,
-, log);
+mask, log);
  if (IS_ERR(acts)) {
  error = PTR_ERR(acts);
  goto error;
@@ -1216,6 +1220,7 @@ err_unlock_ovs:
  err_kfree_acts:
  ovs_nla_free_flow_actions(acts);
  error:
+kfree(mask);
  return error;
  }



It looks fine to me but let's copy the maintainer Pravin

- Greg
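
One detail worth checking in the hunks above: mask is only allocated in the a[OVS_FLOW_ATTR_KEY] branch, yet it is freed at the shared error label. A minimal userspace sketch of the usual way to keep such a shared exit path safe follows. It uses malloc/free in place of kmalloc/kfree, and demo_mask and demo_cmd_set are hypothetical names, not OVS code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a structure too large to keep on the stack. */
struct demo_mask {
	unsigned char bytes[512];
};

/* Returns 0 on success, -1 on allocation failure. */
static int demo_cmd_set(int have_key)
{
	struct demo_mask *mask = NULL;	/* NULL keeps the shared exit path safe */
	int error = 0;

	if (have_key) {
		mask = malloc(sizeof(*mask));
		if (!mask)
			return -1;
		memset(mask, 0, sizeof(*mask));
	}

	/* ... work that may jump to the common exit on failure ... */

	free(mask);	/* free(NULL) is a no-op, so the no-key path is fine too */
	return error;
}
```

Both free(NULL) and kfree(NULL) are defined to do nothing, so initializing the pointer is enough to cover the paths that never allocate.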

Re: [net PATCH] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Cong Wang

On Mon, Jun 26, 2017 at 6:35 PM,   wrote:
> From: Gao Feng 
>
> When a qdisc fails to init, qdisc_create invokes the destroy callback
> to clean up, but it does not check whether the callback actually exists.
> This causes a panic for qdiscs that have no destroy callback, such as
> codel, pfifo, pfifo_fast, and so on.
>
> Add a check for destroy to avoid the possible panic.
>
> Signed-off-by: Gao Feng 

Looks good,

Acked-by: Cong Wang 

This was introduced by commit 87b60cfacf9f17cf71933c6e33b.
Please add proper Fixes tag next time.

Thanks.
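
For readers skimming the thread, the guard being acked can be sketched in plain C as below. This is only an illustration under assumptions: demo_ops and demo_create are hypothetical stand-ins, not the kernel's struct Qdisc_ops or qdisc_create.

```c
#include <stddef.h>

/* Hypothetical ops table; the destroy hook is optional and may be NULL. */
struct demo_ops {
	int  (*init)(void *priv);
	void (*destroy)(void *priv);
};

static int destroy_calls;

static int  demo_init_fail(void *priv) { (void)priv; return -1; }
static void demo_destroy(void *priv)   { (void)priv; destroy_calls++; }

/* Returns 0 on success, or the init error after optional cleanup. */
static int demo_create(const struct demo_ops *ops, void *priv)
{
	int err = ops->init ? ops->init(priv) : 0;

	if (err) {
		/* Only call the cleanup hook if the ops table provides one. */
		if (ops->destroy)
			ops->destroy(priv);
		return err;
	}
	return 0;
}
```

Calling demo_create() with an ops table whose destroy member is NULL then returns the init error without dereferencing a NULL function pointer.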

[PATCH] rxrpc: remove unused static variables

2017-06-27 Thread Sebastian Andrzej Siewior

The rxrpc_security_methods and rxrpc_security_sem user has been removed
in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was
noticed by kbuild test robot for the -RT tree but is also true for !RT.

Reported-by: kbuild test robot 
Cc: "David S. Miller" 
Cc: David Howells 
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 net/rxrpc/security.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c
index 7d921e56e715..13df56a738e5 100644
--- a/net/rxrpc/security.c
+++ b/net/rxrpc/security.c
@@ -19,9 +19,6 @@
 #include 
 #include "ar-internal.h"
 
-static LIST_HEAD(rxrpc_security_methods);
-static DECLARE_RWSEM(rxrpc_security_sem);
-
 static const struct rxrpc_security *rxrpc_security_types[] = {
[RXRPC_SECURITY_NONE]   = _no_security,
 #ifdef CONFIG_RXKAD
-- 
2.13.1

Re: [PATCH v6 05/21] net-next: stmmac: Add dwmac-sun8i

2017-06-27 Thread Maxime Ripard

On Tue, Jun 27, 2017 at 11:33:56AM +0100, Andre Przywara wrote:
> Hi,
> 
> On 27/06/17 11:23, Icenowy Zheng wrote:
> > 
> > 
> > On 27 June 2017 at 18:15:58 GMT+08:00, Andre Przywara wrote:
> >> Hi,
> >>
> >> On 27/06/17 10:41, Maxime Ripard wrote:
> >>> On Tue, Jun 27, 2017 at 10:02:45AM +0100, Andre Przywara wrote:
>  Hi,
> 
>  (CC:ing some people from that Rockchip dmwac series)
> 
>  On 27/06/17 09:21, Corentin Labbe wrote:
> > On Tue, Jun 27, 2017 at 04:11:21PM +0800, Chen-Yu Tsai wrote:
> >> On Tue, Jun 27, 2017 at 4:05 PM, Corentin Labbe
> >>  wrote:
> >>> On Mon, Jun 26, 2017 at 01:18:23AM +0100, André Przywara wrote:
>  On 31/05/17 08:18, Corentin Labbe wrote:
> > The dwmac-sun8i is a heavy hacked version of stmmac hardware by
> > allwinner.
> > In fact the only common part is the descriptor management and
> >> the first
> > register function.
> 
>  Hi,
> 
>  I know I am a bit late with this, but while adapting the U-Boot
> >> driver
>  to the new binding I was wondering about the internal PHY
> >> detection:
> 
> 
>  So here you seem to deduce the usage of the internal PHY by the
> >> PHY
>  interface specified in the DT (MII = internal, RGMII =
> >> external).
>  I think I raised this question before, but isn't it perfectly
> >> legal for
>  a board to use MII with an external PHY even on those SoCs that
> >> feature
>  an internal PHY?
>  On the first glance that does not make too much sense, but apart
> >> from
>  not being the correct binding to describe all of the SoCs
> >> features I see
>  two scenarios:
>  1) A board vendor might choose to not use the internal PHY
> >> because it
>  has bugs, lacks features (configurability) or has other issues.
> >> For
>  instance I have heard reports that the internal PHY makes the
> >> SoC go
>  rather hot, possibly limiting the CPU frequency. By using an
> >> external
>  MII PHY (which are still cheaper than RGMII PHYs) this can be
> >> avoided.
>  2) A PHY does not necessarily need to be directly connected to
>  magnetics. Indeed quite some boards use (RG)MII to connect to a
> >> switch
>  IC or some other network circuitry, for instance fibre
> >> connectors.
> 
>  So I was wondering if we would need an explicit:
>    allwinner,use-internal-phy;
>  boolean DT property to signal the usage of the internal PHY?
>  Alternatively we could go with the negative version:
>    allwinner,disable-internal-phy;
> 
>  Or what about introducing a new "allwinner,internal-mii-phy"
> >> compatible
>  string for the *PHY* node and use that?
> 
>  I just want to avoid that we introduce a binding that causes us
>  headaches later. I think we can still fix this with a followup
> >> patch
>  before the driver and its binding hit a release kernel.
> 
>  Cheers,
>  Andre.
> 
> >>>
> >>> I just see some patch, where "phy-mode = internal" is valid.
> >>> I will try to find a way to use it
> >>
> >> Can you provide a link?
> >
> > https://lkml.org/lkml/2017/6/23/479
> >
> >>
> >> I'm not a fan of using phy-mode for this. There's no guarantee
> >> what
> >> mode the internal PHY uses. That's what phy-mode is for.
> 
>  I can understand Chen-Yu's concerns, but ...
> 
> > > For each SoC the internal PHY mode is known and set in
> > >> emac_variant/internal_phy
> > > So it's not a problem.
> 
>  that is true as well, at least for now.
> 
>  So while I agree that having a separate property to indicate the
> >> usage
>  of the internal PHY would be nice, I am bit tempted to use this
> >> easier
>  approach and piggy back on the existing phy-mode property.
> >>>
> >>> We're trying to fix an issue that works for now too.
> >>>
> >>> If we want to consider future weird cases, then we must consider all
> >>> of them. And the phy mode changing is definitely not really far
> >>> fetched.
> >>>
> >>> I agree with Chen-Yu, and I really feel like the compatible solution
> >>> you suggested would cover both your concerns, and ours.
> >>
> >> So something like this?
> >>emac: emac@1c3 {
> >>compatible = "allwinner,sun8i-h3-emac";
> >>...
> >>phy-mode = "mii";
> >>phy-handle = <_mii_phy>;
> >>...
> >>
> >>mdio: mdio {
> >>#address-cells = <1>;
> >>#size-cells = <0>;
> >>int_mii_phy: ethernet-phy@1 {
> >>compatible = "allwinner,sun8i-h3-ephy";
> >>syscon = <>;
> > 
> > The MAC still needs to set some bits of syscon register.

Re: [PATCH v2] netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv

2017-06-27 Thread Pablo Neira Ayuso

On Wed, Jun 07, 2017 at 03:50:38PM +0200, Mateusz Jurczyk wrote:
> Verify that the length of the socket buffer is sufficient to cover the
> nlmsghdr structure before accessing the nlh->nlmsg_len field for further
> input sanitization. If the client only supplies 1-3 bytes of data in
> sk_buff, then nlh->nlmsg_len remains partially uninitialized and
> contains leftover memory from the corresponding kernel allocation.
> Operating on such data may result in indeterminate evaluation of the
> nlmsg_len < NLMSG_HDRLEN expression.
> 
> The bug was discovered by a runtime instrumentation designed to detect
> use of uninitialized memory in the kernel. The patch prevents this and
> other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Applied, thanks.
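
The ordering described in the commit message (confirm the header is fully present, then read its length field) can be sketched in userspace C as follows. This is a hedged illustration: demo_hdr and demo_msg_ok are hypothetical and only loosely modeled on a length-prefixed netlink-style message, not the nfnetlink code itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for a length-prefixed message. */
struct demo_hdr {
	uint32_t len;	/* total message length, header included */
	uint16_t type;
	uint16_t flags;
};

/* Returns 1 if buf holds a well-formed message, 0 otherwise. */
static int demo_msg_ok(const unsigned char *buf, size_t buflen)
{
	struct demo_hdr hdr;

	/* First make sure the whole header is present ... */
	if (buflen < sizeof(hdr))
		return 0;

	/* ... and only then read the length field out of it. */
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.len < sizeof(hdr) || hdr.len > buflen)
		return 0;

	return 1;
}
```

With a 1-3 byte input the function returns before hdr.len is ever read, which is the property the patch is after.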

Re: [PATCH net-next v3 02/13] sock: skb_copy_ubufs support for compound pages

2017-06-27 Thread Willem de Bruijn

 I looked at some kmap_atomic() implementations and I do not think
 it supports compound pages.
>>>
>>> Indeed. Thanks. It appears that I can do the obvious thing and
>>> kmap the individual page that is being copied inside the loop:
>>>
>>>   kmap_atomic(skb_frag_page(f) + (f_off >> PAGE_SHIFT));
>>>
>>> This is similar to existing logic in copy_huge_page_from_user
>>> and __flush_dcache_page in arch/arm/mm/flush.c
>>>
>>> But, this also applies to other skb operations that call kmap_atomic,
>>> such as skb_copy_bits and __skb_checksum. Not all can be called
>>> from a codepath with a compound user page, but I have to address
>>> the ones that can.
>>
>> Yeah that's quite a mess, it looks like this assumption that
>> kmap can handle compound pages exists in quite a few places.
>
> I hadn't even considered that skbs can already hold compound
> page frags without zerocopy.
>
> Open coding all call sites to iterate is tedious and unnecessary
> in the common case where a page is not highmem.
>
> kmap_atomic has enough slots to map an entire order-3 compound
> page at once. But kmap_atomic cannot fail and there may be edge
> cases that are larger than order-3.
>
> Packet rings allocate with __GFP_COMP and an order derived
> from (user supplied) tp_block_size, for instance. But it links each
> skb_frag_t from an individual page, so this case seems okay.
>
> Perhaps calls to kmap_atomic can be replaced with a
> kmap_compound(..) that checks
>
>  __this_cpu_read(__kmap_atomic_idx) +  (1 << compound_order(p)) < KM_TYPE_NR
>
> before calling kmap_atomic on all pages in the compound page. In
> the common case that the page is not high mem, a single call is
> enough, as there is no per-page operation.

This does not work. Some callers, such as __skb_checksum, cannot
fail, so neither can kmap_compound. Also, vaddr of consecutive
kmap_atomic calls are not guaranteed to be in order. Indeed, on x86
and arm vaddr appears to grow down: (FIXADDR_TOP - ((x) << PAGE_SHIFT))

An alternative is to change the kmap_atomic callers in skbuff.c. To
avoid open coding, we can wrap the kmap_atomic; op; kunmap_atomic
in a macro that loops only if needed:

static inline bool skb_frag_must_loop(struct page *p)
{
#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
if (PageHighMem(p))
return true;
#endif
return false;
}

#define skb_frag_map_foreach(f, start, size, p, p_off, cp, copied)  \
for (p = skb_frag_page(f) + ((start) >> PAGE_SHIFT),\
 p_off = (start) & (PAGE_SIZE - 1), \
 copied = 0,\
 cp = skb_frag_must_loop(p) ?   \
min_t(u32, size, PAGE_SIZE - p_off) : size; \
 copied < size; \
 copied += cp, p++, p_off = 0,  \
 cp = min_t(u32, size - copied, PAGE_SIZE)) \

This does not change behavior on machines without high mem
or on low mem pages.

skb_seq_read keeps a mapping between calls to the function,
so will need a separate approach.
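
To make the chunking of the proposed macro easier to follow, here is a userspace sketch of the same iteration. It is an illustration under assumptions: DEMO_PAGE_SIZE is fixed at 4096, the must_loop flag stands in for skb_frag_must_loop(), and demo_frag_chunks is a hypothetical helper that only records chunk sizes instead of mapping pages.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the illustration */

static uint32_t demo_min(uint32_t a, uint32_t b) { return a < b ? a : b; }

/*
 * Walk [start, start + size) of a multi-page fragment and return how many
 * chunks the loop visits. If chunk_len is non-NULL it receives the size of
 * up to max_chunks chunks.
 */
static unsigned demo_frag_chunks(uint32_t start, uint32_t size, int must_loop,
				 uint32_t *chunk_len, unsigned max_chunks)
{
	uint32_t p_off = start & (DEMO_PAGE_SIZE - 1);
	uint32_t copied = 0;
	/* Highmem-like pages are handled one page at a time, others in one go. */
	uint32_t cp = must_loop ? demo_min(size, DEMO_PAGE_SIZE - p_off) : size;
	unsigned n = 0;

	while (copied < size) {
		if (chunk_len && n < max_chunks)
			chunk_len[n] = cp;
		n++;
		copied += cp;
		/* Every chunk after the first starts at page offset 0. */
		cp = demo_min(size - copied, DEMO_PAGE_SIZE);
	}
	return n;
}
```

For start = 100 and size = 10000 the looping case visits three chunks (the tail of the first page, one full page, and the remainder), while the non-looping case visits a single chunk of the full size.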

Re: [PATCH 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-06-27 Thread Heiko Stuebner

Hi David,

On Tuesday, 27 June 2017, 22:33:20 CEST, David.Wu wrote:
> On 2017/6/24 1:19, Heiko Stuebner wrote:
> > On Friday, 23 June 2017, 12:59:07 CEST, David Wu wrote:
> >> To make the internal phy work, the phy_clock,
> >> phy cru_reset and related registers need to be configured.
> >>
> >> Change-Id: I6971c0a769754b824b1b908b56080cbaf7867d13
> > 
> > please remove all Change-Ids from patches before sending upstream.
> > There were more affected patches in this series.
> > 
> >> Signed-off-by: David Wu 
> >> ---
> >>   .../devicetree/bindings/net/rockchip-dwmac.txt |  3 +
> >>   drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 82 
> >> ++
> >>   2 files changed, 85 insertions(+)
> >>
> >> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt 
> >> b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> >> index 8f42755..0514f69 100644
> >> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> >> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> >> @@ -22,6 +22,7 @@ Required properties:
> >>   < SCLK_MACREF_OUT> clock gate for RMII reference clock output
> >>   < ACLK_GMAC>: AXI clock gate for GMAC
> >>   < PCLK_GMAC>: APB clock gate for GMAC
> >> + < MAC_PHY>: clock for internal macphy
> > 
> > that clock should not be listed as always "Required" like it is here.
> > Make it some sort of extra paragraph marking it as required when using
> > an internal phy.
> > 
> 
> Okay, move it to the option.
> 
> >>- clock-names: One name for each entry in the clocks property.
> >>- phy-mode: See ethernet.txt file in the same directory.
> >>- pinctrl-names: Names corresponding to the numbered pinctrl states.
> >> @@ -35,6 +36,8 @@ Required properties:
> >>- assigned-clocks: main clock, should be < SCLK_MAC>;
> >>- assigned-clock-parents = parent of main clock.
> >>  can be <_gmac> or < SCLK_MAC_PLL>.
> >> + - phy-type: For internal phy, it must be "internal"; For external phy, 
> >> no need
> >> +   to configure this.
> >>   
> >>   Optional properties:
> >>- tx_delay: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as 
> >> default.
> >> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c 
> >> b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> >> index a8e8fd5..c1a1413 100644
> >> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> >> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> >> @@ -41,6 +41,7 @@ struct rk_gmac_ops {
> >>void (*set_to_rmii)(struct rk_priv_data *bsp_priv);
> >>void (*set_rgmii_speed)(struct rk_priv_data *bsp_priv, int speed);
> >>void (*set_rmii_speed)(struct rk_priv_data *bsp_priv, int speed);
> >> +  void (*internal_phy_powerup)(struct rk_priv_data *bsp_priv);
> >>   };
> >>   
> >>   struct rk_priv_data {
> >> @@ -52,6 +53,7 @@ struct rk_priv_data {
> >>   
> >>bool clk_enabled;
> >>bool clock_input;
> >> +  bool internal_phy;
> >>   
> >>struct clk *clk_mac;
> >>struct clk *gmac_clkin;
> >> @@ -61,6 +63,9 @@ struct rk_priv_data {
> >>struct clk *clk_mac_refout;
> >>struct clk *aclk_mac;
> >>struct clk *pclk_mac;
> >> +  struct clk *clk_macphy;
> >> +
> >> +  struct reset_control *macphy_reset;
> >>   
> >>int tx_delay;
> >>int rx_delay;
> >> @@ -750,6 +755,48 @@ static void rk3399_set_rmii_speed(struct rk_priv_data 
> >> *bsp_priv, int speed)
> >>.set_rmii_speed = rk3399_set_rmii_speed,
> >>   };
> >>   
> >> +#define RK_GRF_MACPHY_CON0    0xb00
> >> +#define RK_GRF_MACPHY_CON1    0xb04
> >> +#define RK_GRF_MACPHY_CON2    0xb08
> >> +#define RK_GRF_MACPHY_CON3    0xb0c
> >> +
> >> +#define RK_MACPHY_ENABLE  GRF_BIT(0)
> >> +#define RK_MACPHY_DISABLE GRF_CLR_BIT(0)
> >> +#define RK_MACPHY_CFG_CLK_50M GRF_BIT(14)
> >> +#define RK_GMAC2PHY_RMII_MODE (GRF_BIT(6) | GRF_CLR_BIT(7))
> >> +#define RK_GRF_CON2_MACPHY_ID HIWORD_UPDATE(0x1234, 0x, 0)
> >> +#define RK_GRF_CON3_MACPHY_ID HIWORD_UPDATE(0x35, 0x3f, 0)
> > 
> > These are primarily registers for the rk3328 and come from the GRF which is
> > somehow prone to chip-designers moving bits around in registers and also
> > especially the register offsets (*_CONx) will probably not stay the same
> > on future socs.
> > 
> 
> I think they should try to keep the same. But what you said is very 
> reasonable. So let's give rk3228 and rk3328 different 
> internal_phy_powerup() in the rk_gmac_ops to set their own configuration?

I just looked at both the rk3228 and rk3328 GRFs and really this seems
to be the first time I see GRF-parts that are similar :-) .

There is no need to duplicate code unnecessarily, if the registers really
are the same for both. So I guess, just prefix everything with a rk3228_*
and add a comment that the rk3328 uses the same GRF layout.

That way future socs, can then add their (likely) changed

[PATCH] iwlwifi: mvm: fix iwl_mvm_sar_find_wifi_pkg corner case

2017-06-27 Thread Arnd Bergmann

gcc warns about what it thinks is an uninitialized variable
access:

drivers/net/wireless/intel/iwlwifi/mvm/fw.c: In function 
'iwl_mvm_sar_find_wifi_pkg.isra.14':
drivers/net/wireless/intel/iwlwifi/mvm/fw.c:1102:5: error: 'wifi_pkg' may be 
used uninitialized in this function [-Werror=maybe-uninitialized]

That problem cannot really happen, as we check data->package.count
to ensure that the loop is entered at least once.
However, something that can indeed happen is returning an incorrect
wifi_pkg pointer in case none of the elements are what we are looking
for.

This modifies the loop again, to only return a correct object, and
to shut up that warning.

Fixes: c386dacb4ed6 ("iwlwifi: mvm: refactor SAR init to prepare for dynamic 
SAR")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 24cc406d87ef..730c7f68c0b3 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -1094,14 +1094,12 @@ static union acpi_object 
*iwl_mvm_sar_find_wifi_pkg(struct iwl_mvm *mvm,
domain = _pkg->package.elements[0];
if (domain->type == ACPI_TYPE_INTEGER &&
domain->integer.value == ACPI_WIFI_DOMAIN)
-   break;
-
-   wifi_pkg = NULL;
+   goto found;
}
 
-   if (!wifi_pkg)
-   return ERR_PTR(-ENOENT);
+   return ERR_PTR(-ENOENT);
 
+found:
return wifi_pkg;
 }
 
-- 
2.9.0
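
The control-flow idea in the patch above (a pointer leaves the search only when it has been verified as a match) can be sketched outside the driver like this. The names are hypothetical: demo_pkg, DEMO_WIFI_DOMAIN and demo_find_pkg are illustrative stand-ins, and the sketch returns NULL where the driver returns ERR_PTR(-ENOENT).

```c
#include <stddef.h>

/* Hypothetical package with a single field standing in for the domain value. */
struct demo_pkg {
	int domain;
};

#define DEMO_WIFI_DOMAIN 0x07	/* assumed value, for illustration only */

/* Return the first package whose domain matches, or NULL if none does. */
static const struct demo_pkg *demo_find_pkg(const struct demo_pkg *pkgs,
					    size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		const struct demo_pkg *pkg = &pkgs[i];

		if (pkg->domain == DEMO_WIFI_DOMAIN)
			return pkg;	/* only a verified match leaves the loop */
	}

	return NULL;	/* falling off the end means "not found" */
}
```

Because the loop variable is never returned after the loop ends, a list with no matching element cannot leak its last entry to the caller.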

Re: [PATCH 01/11] net: phy: Add rockchip phy driver support

2017-06-27 Thread David.Wu


Hi Andrew,

On 2017/6/27 22:46, Andrew Lunn wrote:

it has been licensed from somebody.


And does that somebody already have a driver for it? There is no point
adding a driver, if all you need to do is add the ID to another
driver.



I didn't find it.
Maybe use the same, but the configuration is different.
But this may also be possible, upstream with a new driver.


Andrew

Re: [PATCH 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-06-27 Thread Andrew Lunn

> I'm a little confused for the property of phy-mode = "internal".
> If the property of phy-mode is configured as "internal" from DT , i
> could not get the rmii or rgmii mode for the phy.
> I use it to differentiate rmii or rgmii for different configuration.

phy-mode is about the bus between the MAC and the PHY. Internal means
there is not a standard bus between the MAC and the PHY, something
proprietary is being used to embed the PHY in the MAC.

If you are using RMII or RGMII, then it is not internal, in that a
standard bus is being used. It does not matter if that bus is not
available external to the SoC, it still exists.

  Andrew

Re: [PATCH 01/11] net: phy: Add rockchip phy driver support

2017-06-27 Thread Andrew Lunn

> it has been licensed from somebody.

And does that somebody already have a driver for it? There is no point
adding a driver, if all you need to do is add the ID to another
driver.

Andrew

Re: [PATCH 01/11] net: phy: Add rockchip phy driver support

2017-06-27 Thread David.Wu


Hi Andrew,

On 2017/6/24 10:19, Andrew Lunn wrote:

On Fri, Jun 23, 2017 at 12:41:59PM +0800, David Wu wrote:

Support internal ephy currently.

Signed-off-by: David Wu 
---
  drivers/net/phy/Kconfig|  4 ++
  drivers/net/phy/Makefile   |  1 +
  drivers/net/phy/rockchip.c | 94 ++
  3 files changed, 99 insertions(+)
  create mode 100644 drivers/net/phy/rockchip.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index c360dd6..86010d4 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -350,6 +350,10 @@ config XILINX_GMII2RGMII
   the Reduced Gigabit Media Independent Interface(RGMII) between
   Ethernet physical media devices and the Gigabit Ethernet controller.
  
+config ROCKCHIP_MAC_PHY


This is a bit of an odd name, having both MAC and PHY in it. Are there
any other RockChip PHYs? Any external PHYS? Are they register
incompatible with the internal PHY?  Is it even RockChip IP? Or has it
been licensed from somebody else?



Maybe I just wanted to highlight that it applies to the PHY connected to
the MAC. Actually, I named it Rockchip at the beginning but, as Heiko
said, PHY covers a wide range, so I added a modifier to restrict it.


Yes, we use other external phys, like realtek and etc...

If we have any other phy in our socs, it also could be expanded at 
rockchip_phy_tbl{} at rockchip.c


it has been licensed from somebody.


I would more likely just call it ROCKCHIP_PHY.

   Andrew

[PATCH iproute2 V1 0/5] RDMAtool

2017-06-27 Thread Leon Romanovsky

From: Leon Romanovsky 

Hi,

This is the second version of the series implementing the RDMAtool, the tool
to configure RDMA devices. The initial proposal was sent as RFC [1] and
was based on sysfs entries as POC.

The current series was rewritten completely to work with RDMA netlinks as
a source of user<->kernel communications. In order to achieve that, the
RDMA netlinks were extensively refactored and modernized [2, 3, 4 and 5].

Changelog
v0->v1:
 * Moved hunk with changes in man/Makefile from first patch to the last patch
 * Removed the "unknown command" from the examples in commit messages
 * Removed special "caps" parsing command and put it to be part of general 
"show" command
 * Changed parsed capability format to be similar to iproute2 suite
 * Added FW version as an output of show command.
 * Added forgotten CAP_FLAGS to the nla_policy list
RFC->v0:
 * Removed everything that is not implemented yet.
 * Abandoned sysfs interfaces in favor of netlink.

Available in the "topic/rdmatool-netlink-v1" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/iproute2.git

Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/iproute2.git/log/?h=topic/rdmatool-netlink-v1

Thanks

[1] https://www.spinics.net/lists/linux-rdma/msg49575.html
[2] https://patchwork.kernel.org/patch/9752865/
[3] https://www.spinics.net/lists/linux-rdma/msg50827.html
[4] https://www.spinics.net/lists/linux-rdma/msg51210.html
[5] https://patchwork.kernel.org/patch/9811729/ and
https://patchwork.kernel.org/patch/9811731/

Cc: Doug Ledford 
Cc: Ariel Almog 
Cc: Dennis Dalessandro 
Cc: Jason Gunthorpe 
Cc: Linux RDMA 
Cc: Linux Netdev 

Leon Romanovsky (6):
  rdma: Add basic infrastructure for RDMA tool
  rdma: Add dev object
  rdma: Add device capability parsing
  rdma: Add link option and parsing
  rdma: Add FW version to the device output
  rdma: Add initial manual for the tool

 Makefile  |   2 +-
 man/man8/Makefile |   3 +-
 man/man8/rdma.8   |  82 ++
 rdma/.gitignore   |   1 +
 rdma/Makefile |  22 
 rdma/dev.c| 145 +
 rdma/link.c   | 202 +++
 rdma/rdma.c   | 112 
 rdma/rdma.h   |  90 
 rdma/utils.c  | 312 ++
 10 files changed, 969 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/rdma.8
 create mode 100644 rdma/.gitignore
 create mode 100644 rdma/Makefile
 create mode 100644 rdma/dev.c
 create mode 100644 rdma/link.c
 create mode 100644 rdma/rdma.c
 create mode 100644 rdma/rdma.h
 create mode 100644 rdma/utils.c

--
2.13.1

[PATCH iproute2 V1 3/6] rdma: Add device capability parsing

2017-06-27 Thread Leon Romanovsky

From: Leon Romanovsky 

Add parsing interface for the device capability flags

$ rdma dev show
1: mlx5_0:
caps: 
2: mlx5_1:
caps: 
3: mlx5_2:
caps: 
4: mlx5_3:
caps: 
5: mlx5_4:
caps: 
root@mtr-leonro:~#

$ rdma dev show mlx5_4
5: mlx5_4:
caps: 

Signed-off-by: Leon Romanovsky 
---
 rdma/dev.c   | 99 +++-
 rdma/rdma.h  |  3 ++
 rdma/utils.c |  2 +-
 3 files changed, 95 insertions(+), 9 deletions(-)

diff --git a/rdma/dev.c b/rdma/dev.c
index d4809d63..76f4af88 100644
--- a/rdma/dev.c
+++ b/rdma/dev.c
@@ -17,28 +17,111 @@ static int dev_help(struct rdma *rd)
return 0;
 }
 
-static void dev_one_show(const struct dev_map *dev_map)
+static const char *dev_caps[64] = {
+   "RESIZE_MAX_WR",
+   "BAD_PKEY_CNTR",
+   "BAD_QKEY_CNTR",
+   "RAW_MULTI",
+   "AUTO_PATH_MIG",
+   "CHANGE_PHY_PORT",
+   "UD_AV_PORT_ENFORCE",
+   "CURR_QP_STATE_MOD",
+   "SHUTDOWN_PORT",
+   "INIT_TYPE",
+   "PORT_ACTIVE_EVENT",
+   "SYS_IMAGE_GUID",
+   "RC_RNR_NAK_GEN",
+   "SRQ_RESIZE",
+   "N_NOTIFY_CQ",
+   "LOCAL_DMA_LKEY",
+   "RESERVED",
+   "MEM_WINDOW",
+   "UD_IP_CSUM",
+   "UD_TSO",
+   "XRC",
+   "MEM_MGT_EXTENSIONS",
+   "BLOCK_MULTICAST_LOOPBACK",
+   "MEM_WINDOW_TYPE_2A",
+   "MEM_WINDOW_TYPE_2B",
+   "RC_IP_CSUM",
+   "RAW_IP_CSUM",
+   "CROSS_CHANNEL",
+   "MANAGED_FLOW_STEERING",
+   "SIGNATURE_HANDOVER",
+   "ON_DEMAND_PAGING",
+   "SG_GAPS_REG",
+   "VIRTUAL_FUNCTION",
+   "RAW_SCATTER_FCS",
+   "RDMA_NETDEV_OPA_VNIC",
+};
+
+static int dev_print_caps(struct rdma *rd)
 {
-   pr_out("%u: %s:\n", dev_map->idx, dev_map->dev_name);
+   struct dev_map *dev_map = rd->dev_map_curr;
+   uint64_t caps = dev_map->caps;
+   bool found = false;
+   uint32_t idx;
+
+   pr_out("caps: <");
+   for (idx = 0; idx < 64; idx++) {
+   if (caps & 0x1) {
+   pr_out("%s", dev_caps[idx]?dev_caps[idx]:"UNKNOWN");
+   if (caps >> 0x1)
+   pr_out(", ");
+   found = true;
+   }
+   caps >>= 0x1;
+   }
+   if(!found)
+   pr_out("NONE");
+
+   pr_out(">\n");
+   return 0;
+}
+
+static int dev_no_args(struct rdma *rd)
+{
+   struct dev_map *dev_map = rd->dev_map_curr;
+
+   pr_out("%u: %s: \n", dev_map->idx, dev_map->dev_name);
+   return dev_print_caps(rd);
+}
+
+static int dev_one_show(struct rdma *rd)
+{
+   const struct rdma_cmd cmds[] = {
+   { NULL, dev_no_args},
+   { 0 }
+   };
+
+   return rdma_exec_cmd(rd, cmds, "parameter");
+
 }
 
 static int dev_show(struct rdma *rd)
 {
struct dev_map *dev_map;
+   int ret = 0;
 
if (rd_no_arg(rd)) {
-   list_for_each_entry(dev_map, >dev_map_list, list)
-   dev_one_show(dev_map);
+   list_for_each_entry(dev_map, >dev_map_list, list) {
+   rd->dev_map_curr = dev_map;
+   ret = dev_one_show(rd);
+   if (ret)
+   return ret;
+   }
+
}
else {
-   dev_map = dev_map_lookup(rd, false);
-   if (!dev_map) {
+
