date:20180301

Re: [PATCH iproute2 2/3] ss: Fix build with old libc headers without AF_VSOCK

2018-03-01 Thread Serhey Popovych

Stephen Hemminger wrote:
> On Tue, 27 Feb 2018 14:06:51 +0200
> Serhey Popovych  wrote:
> 
>> diff --git a/include/compat/libc/bits/socket.h 
>> b/include/compat/libc/bits/socket.h
>> new file mode 100644
>> index 000..25ef0d5
>> --- /dev/null
>> +++ b/include/compat/libc/bits/socket.h
>> @@ -0,0 +1,15 @@
>> +#ifndef _IP_COMPAT_BITS_SOCKET_H
>> +#define _IP_COMPAT_BITS_SOCKET_H
>> +
>> +#include_next 
>> +
>> +#ifndef AF_VSOCK
>> +#define PF_VSOCK  40
>> +#define AF_VSOCK  PF_VSOCK
>> +#undef PF_MAX
>> +#undef AF_MAX
>> +#define PF_MAX  41
>> +#define AF_MAX  PF_MAX
>> +#endif /* AF_VSOCK */
>> +
>> +#endif /* _IP_COMPAT_BITS_SOCKET_H */
> 
> It makes more sense to change ss.c to ifdef out the code related to AF_VSOCK
> if it is not defined.  Rather than asking for unknown address family.
> 

Yes, I did this as v0 before sending the series. But now I think that a
single place for all compat stuff would ease tracking of changes.
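For comparison, the ifdef approach Stephen describes could look roughly like the sketch below. It is only an illustration: family_name() is a hypothetical helper, not a function taken from ss.c, and the point is just that the AF_VSOCK case compiles away when the libc headers do not define the constant.

```c
#include <sys/socket.h>

/* Hypothetical helper: map an address family to a printable name.
 * The AF_VSOCK case is only compiled when the headers provide it,
 * so old libc headers never see the unknown constant.
 */
static const char *family_name(int family)
{
	switch (family) {
	case AF_INET:
		return "inet";
	case AF_INET6:
		return "inet6";
#ifdef AF_VSOCK
	case AF_VSOCK:
		return "vsock";
#endif
	default:
		return "unknown";
	}
}
```

The trade-off discussed in this thread is that every user of AF_VSOCK needs such a guard, whereas a compat header defines the constant once.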




Re: [PATCH iproute2 1/3] ip: Fix compilation with kernel headers < 3.4

2018-03-01 Thread Serhey Popovych

Stephen Hemminger wrote:
> On Tue, 27 Feb 2018 21:34:56 +0200
> Serhey Popovych  wrote:
> 
>> Stephen Hemminger wrote:
>>> On Tue, 27 Feb 2018 14:06:50 +0200
>>> Serhey Popovych  wrote:
>>>   
>>>> Since commit 596b1c94aa38 ("iproute: build more easily on Android"),
>>>> iproute2 uses types __kernel_long_t and __kernel_ulong_t but does not
>>>> provide internal definitions for it.
>>>>
>>>> This means that compilation using kernel headers that are older than 3.4
>>>> (where these types were added) will fail. This situation may be uncommon
>>>> for native compilation, but not uncommon for cross compilation where the
>>>> toolchains may be a bit older.
>>>>
>>>> Provide the necessary types internally if not provided by the kernel
>>>> headers to fix compilation in such cases.
>>>>
>>>> Co-Developed-by: Serhii Popovych 
>>>> Signed-off-by: Thomas De Schampheleire 
>>>> Signed-off-by: Serhey Popovych 
>>>> ---
>>>>  Makefile  |5 -
>>>>  include/compat/kernel/linux/sysinfo.h |   14 ++  
>>>
>>> Why not just start a single file include/compat.h which is what
>>> other software does.  
>>
>> Yes it is good, but not for our case. We use include_next to define
>> __kernel_long_t and __kernel_ulong_t types if they not defined. If doing
>> single  we need to include it in nearly all .c files as first
>> include file.
>>
>> I also start thinking on single  and found it bit complicated
>> than just adding header, (re)defining functionality and then include_next.
>>
>>>
>>> Doing fine grained kernel and libc per file makes it more painful.  
>>
>> Agree, and we already have  done using similar schema
>> that is reverted with this series.
> 
> This is a real rats nest. It all comes because kernel headers are including 
> asm/posix_types.h.
> Normally, I would just clone that file out of kernel headers process, but the 
> file
> is arch specific which doesn't help.
>

Anyway, I still think that using include_next and separate kernel
and libc directories is the most flexible scheme to provide/track compatibility:

  1) No need to modify each .c file in package by adding custom
 . No need to track places where we need to include it.

  2) Way to tweak kernel/libc headers at first place and then
 continue with system/uapi headers via include_next.

  3) It is possible to completely replace system/uapi header by just
 putting it in the correct location under compat/ in the same way we
 already did for .

  4) Single place for all compat stuff: no need to add compatibility:
 easy to track changes.

So at the moment we have two possible approaches:

  1) Use a compat directory and include_next

  2) Provide a single compat.h header file and include it in all .c
 (or at least utils.h and some .c that does not include it).

Is this correct? Other options are welcome. If you prefer to use
compat.h I can prepare series, but at this moment I think this could
potentially have side effects (like missing include of compat.h in
some .c files in some setups).

> 


Re: [Intel-wired-lan] SRIOV on Intel x710 fail to get attached both at VM creation time and VM runtime

2018-03-01 Thread Stefan Assmann

On 2018-03-01 19:40, Alexander Duyck wrote:
> On Thu, Mar 1, 2018 at 8:12 AM,   wrote:
> > + intel-wired-...@lists.osuosl.org
> >
> >
> > On 2018-03-01 21:41, p...@codeaurora.org wrote:
> >>
> >> Hi All,
> >>
> >> I am facing the following issue on kernel 4.14.14.
> >>
> >> Enable SRIOV on Intel x710 card.
> >> echo 32 > /sys/class/net/eth1/device/sriov_numvfs
> >> start net_pool
> >> virsh net-start intel_pool
> >>
> >> case 1)
> >> attach the VF while creating VM:
> >> virt-install --accelerate --import --disk /home/disk.img --network
> >> network=intel_pool  --boot uefi --name poza-guest --os-type linux
> >> --os-variant rhel7 --ram 8000 --vcpus 4
> >>
> >> case 2)
> >> create VM:
> >> virt-install --accelerate --import --disk /home/disk.img --boot uefi
> >> --name poza-guest --os-type linux --os-variant rhel7 --ram 8000
> >> --vcpus 4
> >> attach it:
> >> virsh attach-interface --domain oza-guest --type network --source
> >> intel_pool --target eth1
> >>
> >> kernel logs:
> >> [44287.825287] i40evf :01:02.0: Unable to send opcode 2 to PF, err
> >> I40E_ERR_QUEUE_EMPTY, aq_err OK
> >> [44287.962640] i40e :01:00.0: VF 0 still in reset. Try again.
> >> error: Failed to attach interface
> >> error: Cannot set interface MAC/vlanid to 52:54:00:e9:f1:b5/0 for
> >> ifname eth1 vf 0: Resource temporarily unavailable
> >>
> >>
> >> The same use case works with following card with the same kernel
> >> version and rootfs.
> >> Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
> >>
> >> for detailed logs please have a look at
> >> https://bugzilla.kernel.org/show_bug.cgi?id=198959
> >>
> >> Regards,
> >> Oza.
> 
> So the first question that jumps to mind is what is the firmware
> version on the PF, ethtool -i should tell you. Are there any issues
> bringing up the PF and getting it to pass traffic?

There's a patch on Intel-wired-lan titled "i40e: Fix attach VF to VM
issue" which should fix the problem. It's not upstream yet.

As a workaround you could unload/blacklist the i40evf driver in the
host.

  Stefan

Re: [PATCH net] virtio-net: re enable XDP_REDIRECT for mergeable buffer

2018-03-01 Thread Jesper Dangaard Brouer

On Fri,  2 Mar 2018 14:25:29 +0800
Jason Wang  wrote:

> @@ -770,6 +774,19 @@ static struct sk_buff *receive_mergeable(struct 
> net_device *dev,
>   goto err_xdp;
>   rcu_read_unlock();
>   goto xdp_xmit;
> + case XDP_REDIRECT:
> + err = xdp_do_redirect(dev, , xdp_prog);
> + if (err) {
> + trace_xdp_exception(vi->dev, xdp_prog, act);

Do not add a trace_xdp_exception here... this is handled inside
xdp_do_redirect() invocation.

> + if (unlikely(xdp_page != page))
> + put_page(xdp_page);
> + goto err_xdp;
> + }
> + *xdp_xmit = true;
> + if (unlikely(xdp_page != page))
> + goto err_xdp;
> + rcu_read_unlock();
> + goto xdp_xmit;



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

RE: [PATCH] pci-iov: Add support for unmanaged SR-IOV

2018-03-01 Thread Tian, Kevin

> From: Tian, Kevin
> Sent: Friday, March 2, 2018 2:54 PM
> 
> > From: Alex Williamson
> > Sent: Friday, March 2, 2018 4:22 AM
> > >
> > > I am pretty sure that you are describing is true of some, but not for
> > > all. I think the Amazon solutions and the virtio solution are doing
> > > hard partitioning of the part. I will leave it to those guys to speak
> > > for themselves since I don't know anything about the hardware design
> > > of those parts.
> >
> > I think we'd need device specific knowledge and enablement to be able
> > to take advantage of any hardware partitioning, otherwise we need to
> > assume the pf is privileged, as implemented in other sriov devices.
> >
> > I'm also trying to imagine whether there's a solution via the new
> > vfio/mdev interface, where the mdev vendor driver would bind to the pf
> > and effectively present itself as the mdev device.  The vendor driver
> > could provide sriov capabilities and bear the burden of ensuring that
> > the pf is used cooperatively.  The only existing mdev vendor drivers are
> > vGPUs and rely on on-device DMA translation and isolation, such as
> > through GTTs, but there have been some thoughts on providing IOMMU
> > based
> > isolation of mdev/sriov mixed devices (assuming DMA is even required
> > for userspace management of the pf in this use case).  [Cc Kirti]
> > Thanks,
> >
> 
> Hope not distracting this thread, but above sounds like an interesting
> idea. Actually we ever brainstormed similar thought for another
> potential usage - supporting VF live migration. We are already working
> on an generic extension to allow state save/restore of mdev instance.
> If vendor driver could further wrap pf as a mdev instance, it could

I meant "wrap vf as a mdev instance" here.

> leverage the same framework for a clean state migration on VF. based
> on mmap callback the vendor driver can easily switch back-and-forth
> between pass through and trap/emulation of the VF resources. Of
> course doing so alone doesn't address all the demands of VF live
> migration (e.g. dirty page tracking still requires other techniques),
> but it does pave a way toward a general framework to support VF
> live migration.
> 
> If above is feasible, finally we could use one mdev framework to
> manage both mdev and pf/vf assignment, while providing added
> values which are difficult to achieve today. :-)
> 
> Thanks
> Kevin

RE: [PATCH] pci-iov: Add support for unmanaged SR-IOV

2018-03-01 Thread Tian, Kevin

> From: Alex Williamson
> Sent: Friday, March 2, 2018 4:22 AM
> >
> > I am pretty sure that you are describing is true of some, but not for
> > all. I think the Amazon solutions and the virtio solution are doing
> > hard partitioning of the part. I will leave it to those guys to speak
> > for themselves since I don't know anything about the hardware design
> > of those parts.
> 
> I think we'd need device specific knowledge and enablement to be able
> to take advantage of any hardware partitioning, otherwise we need to
> assume the pf is privileged, as implemented in other sriov devices.
> 
> I'm also trying to imagine whether there's a solution via the new
> vfio/mdev interface, where the mdev vendor driver would bind to the pf
> and effectively present itself as the mdev device.  The vendor driver
> could provide sriov capabilities and bear the burden of ensuring that
> the pf is used cooperatively.  The only existing mdev vendor drivers are
> vGPUs and rely on on-device DMA translation and isolation, such as
> through GTTs, but there have been some thoughts on providing IOMMU
> based
> isolation of mdev/sriov mixed devices (assuming DMA is even required
> for userspace management of the pf in this use case).  [Cc Kirti]
> Thanks,
> 

I hope this doesn't distract from the thread, but the above sounds like an
interesting idea. We actually brainstormed a similar idea for another
potential usage - supporting VF live migration. We are already working
on a generic extension to allow state save/restore of an mdev instance.
If the vendor driver could further wrap pf as a mdev instance, it could
leverage the same framework for a clean state migration on VF. Based
on mmap callback the vendor driver can easily switch back-and-forth
between pass through and trap/emulation of the VF resources. Of
course doing so alone doesn't address all the demands of VF live
migration (e.g. dirty page tracking still requires other techniques), 
but it does pave a way toward a general framework to support VF
live migration.

If the above is feasible, we could eventually use one mdev framework to
manage both mdev and pf/vf assignment, while providing added
value that is difficult to achieve today. :-)

Thanks
Kevin

[PATCH net] virtio-net: re enable XDP_REDIRECT for mergeable buffer

2018-03-01 Thread Jason Wang

XDP_REDIRECT support for mergeable buffer was removed since commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.

Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 55 +---
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9bb9e56..11e48c5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -504,6 +504,7 @@ static struct page *xdp_linearize_page(struct receive_queue 
*rq,
page_off += *len;
 
while (--*num_buf) {
+   int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
unsigned int buflen;
void *buf;
int off;
@@ -518,7 +519,7 @@ static struct page *xdp_linearize_page(struct receive_queue 
*rq,
/* guard against a misconfigured or uncooperative backend that
 * is sending packet larger than the MTU.
 */
-   if ((page_off + buflen) > PAGE_SIZE) {
+   if ((page_off + buflen + tailroom) > PAGE_SIZE) {
put_page(p);
goto err_buf;
}
@@ -690,6 +691,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
unsigned int truesize;
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
bool sent;
+   int err;
 
head_skb = NULL;
 
@@ -701,7 +703,12 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
void *data;
u32 act;
 
-   /* This happens when rx buffer size is underestimated */
+   /* This happens when rx buffer size is underestimated
+* or headroom is not enough because of the buffer
+* was refilled before XDP is set. This should only
+* happen for the first several packets, so we don't
+* care much about its performance.
+*/
if (unlikely(num_buf > 1 ||
 headroom < virtnet_get_headroom(vi))) {
/* linearize data for XDP */
@@ -736,9 +743,6 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
 
act = bpf_prog_run_xdp(xdp_prog, );
 
-   if (act != XDP_PASS)
-   ewma_pkt_len_add(>mrg_avg_pkt_len, len);
-
switch (act) {
case XDP_PASS:
/* recalculate offset to account for any header
@@ -770,6 +774,19 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
goto err_xdp;
rcu_read_unlock();
goto xdp_xmit;
+   case XDP_REDIRECT:
+   err = xdp_do_redirect(dev, , xdp_prog);
+   if (err) {
+   trace_xdp_exception(vi->dev, xdp_prog, act);
+   if (unlikely(xdp_page != page))
+   put_page(xdp_page);
+   goto err_xdp;
+   }
+   *xdp_xmit = true;
+   if (unlikely(xdp_page != page))
+   goto err_xdp;
+   rcu_read_unlock();
+   goto xdp_xmit;
default:
bpf_warn_invalid_xdp_action(act);
case XDP_ABORTED:
@@ -1013,13 +1030,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, 
struct receive_queue *rq,
 }
 
 static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
- struct ewma_pkt_len *avg_pkt_len)
+ struct ewma_pkt_len *avg_pkt_len,
+ unsigned int room)
 {
const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
unsigned int len;
 
-   len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
+   if (room)
+   return PAGE_SIZE - room;
+
+   len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
rq->min_buf_len, PAGE_SIZE - hdr_len);
+
return ALIGN(len, L1_CACHE_BYTES);
 }
 
@@ -1028,21 +1050,27 @@ static int add_recvbuf_mergeable(struct virtnet_info 
*vi,
 {
struct page_frag *alloc_frag = >alloc_frag;
unsigned int headroom = virtnet_get_headroom(vi);
+   unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
+   unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
char *buf;
void *ctx;
int err;
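
The buffer sizing in this patch can be illustrated with a small user-space sketch. Everything numeric here is an assumption made for the example (4096-byte pages, 64-byte cache lines, caller-supplied headroom and skb_shared_info sizes); the real values depend on architecture and kernel configuration, and the helper names are hypothetical.

```c
#define PAGE_SIZE_ASSUMED	4096u	/* assumption: 4 KiB pages */
#define CACHE_LINE_ASSUMED	64u	/* assumption: 64-byte cache lines */

/* Round x up to the next multiple of a. */
#define ALIGN_UP(x, a)	((((x) + (a) - 1) / (a)) * (a))

/* Fixed rx buffer length when XDP headroom is reserved.
 * Returns 0 when no room is reserved, meaning the caller would fall
 * back to the EWMA-based estimate as in the non-XDP case.
 */
static unsigned int fixed_buf_len(unsigned int headroom,
				  unsigned int shinfo_size)
{
	unsigned int tailroom = headroom ? shinfo_size : 0;
	unsigned int room = ALIGN_UP(headroom + tailroom, CACHE_LINE_ASSUMED);

	return room ? PAGE_SIZE_ASSUMED - room : 0;
}

/* Linearize check: the copied data plus the aligned tailroom must
 * still fit in one page.
 */
static int fits_in_page(unsigned int page_off, unsigned int buflen,
			unsigned int shinfo_size)
{
	unsigned int tailroom = ALIGN_UP(shinfo_size, CACHE_LINE_ASSUMED);

	return page_off + buflen + tailroom <= PAGE_SIZE_ASSUMED;
}
```

With an assumed headroom of 256 bytes and an assumed 320-byte skb_shared_info, the reserved room is 576 bytes, which leaves 3520 bytes of a 4096-byte page for packet data.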

Re: [PATCH v2 net] rds: Incorrect reference counting in TCP socket creation

2018-03-01 Thread santosh.shilim...@oracle.com




On 3/1/18 9:07 PM, Ka-Cheong Poon wrote:
> Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the
> accept socket") has a reference counting issue in TCP socket creation
> when accepting a new connection.  The code uses sock_create_lite() to
> create a kernel socket.  But it does not do __module_get() on the
> socket owner.  When the connection is shutdown and sock_release() is
> called to free the socket, the owner's reference count is decremented
> and becomes incorrect.  Note that this bug only shows up when the socket
> owner is configured as a kernel module.
>
> v2: Update comments

Versioning comment typically goes below "---" and not part of
commit message.

> Signed-off-by: Ka-Cheong Poon
> ---

Patch looks fine.
Acked-by: Santosh Shilimkar

Re: SRIOV on Intel x710 fail to get attached both at VM creation time and VM runtime

2018-03-01 Thread poza


On 2018-03-02 09:10, Alexander Duyck wrote:
> On Thu, Mar 1, 2018 at 8:12 AM,   wrote:
>> + intel-wired-...@lists.osuosl.org
>>
>> On 2018-03-01 21:41, p...@codeaurora.org wrote:
>>>
>>> Hi All,
>>>
>>> I am facing the following issue on kernel 4.14.14.
>>>
>>> Enable SRIOV on Intel x710 card.
>>> echo 32 > /sys/class/net/eth1/device/sriov_numvfs
>>> start net_pool
>>> virsh net-start intel_pool
>>>
>>> case 1)
>>> attach the VF while creating VM:
>>> virt-install --accelerate --import --disk /home/disk.img --network
>>> network=intel_pool  --boot uefi --name poza-guest --os-type linux
>>> --os-variant rhel7 --ram 8000 --vcpus 4
>>>
>>> case 2)
>>> create VM:
>>> virt-install --accelerate --import --disk /home/disk.img --boot uefi
>>> --name poza-guest --os-type linux --os-variant rhel7 --ram 8000
>>> --vcpus 4
>>> attach it:
>>> virsh attach-interface --domain oza-guest --type network --source
>>> intel_pool --target eth1
>>>
>>> kernel logs:
>>> [44287.825287] i40evf :01:02.0: Unable to send opcode 2 to PF, err
>>> I40E_ERR_QUEUE_EMPTY, aq_err OK
>>> [44287.962640] i40e :01:00.0: VF 0 still in reset. Try again.
>>> error: Failed to attach interface
>>> error: Cannot set interface MAC/vlanid to 52:54:00:e9:f1:b5/0 for
>>> ifname eth1 vf 0: Resource temporarily unavailable
>>>
>>>
>>> The same use case works with following card with the same kernel
>>> version and rootfs.
>>> Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
>>>
>>> for detailed logs please have a look at
>>> https://bugzilla.kernel.org/show_bug.cgi?id=198959
>>>
>>> Regards,
>>> Oza.
>
> So the first question that jumps to mind is what is the firmware
> version on the PF, ethtool -i should tell you. Are there any issues
> bringing up the PF and getting it to pass traffic?


Yes, the PF is working fine, and the VFs are working fine on the host as well.
The only problem is attaching them to VM instances.

root@AW-BGLR-REP-32:/sys/class/net/eth0/device# echo 2 > sriov_numvfs

root@AW-BGLR-REP-32:/sys/class/net/eth0/device# ethtool -i eth7
driver: i40evf
version: 3.0.0-k
firmware-version: N/A
expansion-rom-version:
bus-info: :01:02.1
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
root@AW-BGLR-REP-32:/sys/class/net/eth0/device# ethtool -i eth0
driver: i40e
version: 2.1.14-k
firmware-version: 6.01 0x800034af 1.1747.0
expansion-rom-version:
bus-info: :01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


root@AW-BGLR-REP-32:/sys/class/net/eth0/device# udhcpc -i eth0
udhcpc (v1.22.1) started
Sending discover...
Sending select for 10.131.25.80...
Lease of 10.131.25.80 obtained, lease time 86400

root@AW-BGLR-REP-32:/sys/class/net/eth0/device# udhcpc -i eth7
udhcpc (v1.22.1) started
Sending discover...
Sending select for 10.131.25.133...
Lease of 10.131.25.133 obtained, lease time 86400
ip: RTNETLINK answers: File exists

root@AW-BGLR-REP-32:/sys/class/net/eth0/device# ifconfig -a
eth0  Link encap:Ethernet  HWaddr 68:05:ca:38:6f:40
  inet addr:10.131.25.80  Bcast:10.131.31.255  Mask:255.255.248.0
  inet6 addr: 2002:c023:9c17:d124:6a05:caff:fe38:6f40/64 Scope:Global
  inet6 addr: fe80::6a05:caff:fe38:6f40/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:979 errors:0 dropped:0 overruns:0 frame:0
  TX packets:129 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:77926 (77.9 KB)  TX bytes:24790 (24.7 KB)

eth1  Link encap:Ethernet  HWaddr 8c:fd:f0:06:a8:d9
  inet addr:10.131.24.219  Bcast:10.131.31.255  Mask:255.255.248.0
  inet6 addr: fe80::8efd:f0ff:fe06:a8d9/64 Scope:Link
  inet6 addr: 2002:c023:9c17:d124:8efd:f0ff:fe06:a8d9/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:5176 errors:0 dropped:0 overruns:0 frame:0
  TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:401506 (401.5 KB)  TX bytes:1638 (1.6 KB)
  Interrupt:37

eth2  Link encap:Ethernet  HWaddr 68:05:ca:38:6f:41
  BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth3  Link encap:Ethernet  HWaddr 68:05:ca:38:6f:42
  BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth4  Link encap:Ethernet  HWaddr 68:05:ca:38:6f:43
  BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0

RE: [PATCH v3 net-next 1/2] lan743x: Add main source files for new lan743x driver

2018-03-01 Thread Bryan.Whitehead

> > +static int lan743x_phy_reset(struct lan743x_adapter *adapter) {
> > +   u32 data;
> > +
> > +   data = lan743x_csr_read(adapter, PMT_CTL);
> > +   data |= PMT_CTL_ETH_PHY_RST_;
> > +   lan743x_csr_write(adapter, PMT_CTL, data);
> > +
> > +   return readx_poll_timeout(LAN743X_CSR_READ_OP, PMT_CTL,
> data,
> > + (!(data & PMT_CTL_ETH_PHY_RST_) &&
> > + (data & PMT_CTL_READY_)),
> > + 5, 100);
> > +}
> 
> Hi Bryan
> 
> Could you explain this a bit more. What exactly is it resetting? Do we need to
> tell the phylib that the PHY has been reset and that it needs to re-program 
> it?
> Or by phy do you mean a SERDES interface?

Hi Andrew,

This function resets the Ethernet phy. But it is called only in probe and 
before mdiobus_register. So I don't believe it is necessary to tell phylib.

[snip]
> > +
> > +   /* PHY interrupt enabled here */
> > +   phy_start(phydev);
> > +   phy_start_aneg(phydev);
> > +   return 0;
> 
> Are phy interrupts really enabled here? I could have missed it, but I don't see
> anywhere PHY interrupts are configured. This is either done via device tree,
> you set phydev->irq, or mdiobus->irq[X].

Sorry that is an obsolete comment, I will remove it. It is not using phy 
interrupts. It's using polling.

> 
> > +static int lan743x_tx_open(struct lan743x_tx *tx) {
> > +   struct lan743x_adapter *adapter = NULL;
> > +   u32 data = 0;
> > +   int ret;
> > +
> > +   adapter = tx->adapter;
> > +   ret = lan743x_tx_ring_init(tx);
> > +   if (ret)
> > +   goto return_error;
> 
> You could just return here. You don't do anything useful at
> return_error:

Yes, I'll fix it.

[snip]
 
> This is much nicer without all the flags. Thanks.

No problem, thanks for your patience with me.

> 
> > +static struct pci_driver lan743x_pcidev_driver = {
> > +   .name = DRIVER_NAME,
> > +   .id_table = lan743x_pcidev_tbl,
> > +   .probe= lan743x_pcidev_probe,
> > +   .remove   = lan743x_pcidev_remove,
> > +   .shutdown = lan743x_pcidev_shutdown, };
> > +
> > +static int __init lan743x_module_init(void) {
> > +   int result = -EINVAL;
> > +
> > +   pr_info(DRIVER_DESC "\n");
> > +   result = pci_register_driver(&lan743x_pcidev_driver);
> > +   if (result)
> > +   pr_warn("pci_register_driver returned error code, %d\n",
> > +   result);
> > +   return result;
> > +}
> > +
> > +module_init(lan743x_module_init);
> > +
> > +static void __exit lan743x_module_exit(void) {
> > +   pci_unregister_driver(&lan743x_pcidev_driver);
> > +}
> >
> > +module_exit(lan743x_module_exit);
> 
> You can replace this boilerplate code with module_pci_driver().
> You don't do anything special here.

OK

Thanks,
Bryan

[PATCH v2 net] rds: Incorrect reference counting in TCP socket creation

2018-03-01 Thread Ka-Cheong Poon

Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the
accept socket") has a reference counting issue in TCP socket creation
when accepting a new connection.  The code uses sock_create_lite() to
create a kernel socket.  But it does not do __module_get() on the
socket owner.  When the connection is shutdown and sock_release() is
called to free the socket, the owner's reference count is decremented
and becomes incorrect.  Note that this bug only shows up when the socket
owner is configured as a kernel module.

v2: Update comments

Signed-off-by: Ka-Cheong Poon 
---
 net/rds/tcp_listen.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index c061d6e..2257118 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 Oracle.  All rights reserved.
+ * Copyright (c) 2006, 2018 Oracle.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -142,12 +142,20 @@ int rds_tcp_accept_one(struct socket *sock)
if (ret)
goto out;
 
-   new_sock->type = sock->type;
-   new_sock->ops = sock->ops;
ret = sock->ops->accept(sock, new_sock, O_NONBLOCK, true);
if (ret < 0)
goto out;
 
+   /* sock_create_lite() does not get a hold on the owner module so we
+* need to do it here.  Note that sock_release() uses sock->ops to
+* determine if it needs to decrement the reference count.  So set
+* sock->ops after calling accept() in case that fails.  And there's
+* no need to do try_module_get() as the listener should have a hold
+* already.
+*/
+   new_sock->ops = sock->ops;
+   __module_get(new_sock->ops->owner);
+
ret = rds_tcp_keepalive(new_sock);
if (ret < 0)
goto out;
-- 
1.8.3.1
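
The reference counting problem described in this patch can be modelled in user space. The sketch below is a toy model under stated assumptions: all toy_* names are hypothetical stand-ins, not kernel APIs, and the only behaviour borrowed from the kernel is that the release path drops one owner reference whenever ops is set.

```c
#include <stddef.h>

/* Toy stand-ins for struct module, proto_ops and struct socket. */
struct toy_module { int refcnt; };
struct toy_ops { struct toy_module *owner; };
struct toy_socket { const struct toy_ops *ops; };

/* Models sock_create_lite(): no ops assigned, no owner reference taken. */
static void toy_create_lite(struct toy_socket *s)
{
	s->ops = NULL;
}

/* Models sock_release(): drops one owner reference if ops is set. */
static void toy_release(struct toy_socket *s)
{
	if (s->ops) {
		s->ops->owner->refcnt--;
		s->ops = NULL;
	}
}

/* Models the accept path. With take_ref == 0 the new socket borrows
 * the listener's ops without a reference (the bug); with take_ref != 0
 * it takes its own reference first (the fix).
 */
static void toy_accept(const struct toy_socket *listener,
		       struct toy_socket *new_sock, int take_ref)
{
	toy_create_lite(new_sock);
	new_sock->ops = listener->ops;
	if (take_ref)
		new_sock->ops->owner->refcnt++;
}
```

In the buggy variant, releasing the accepted socket consumes the reference that belongs to the listener, so the count drifts down by one per connection.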

linux-next: Signed-off-by missing for commits in the net-next tree

2018-03-01 Thread Stephen Rothwell

Hi all,

Commits

  568477045f80 (" phy: marvell10g: Utilize gen10g_no_soft_reset()")
  0adfdb667ab5 (" phy: cortina: Utilize generic functions")
  aebc78a40b88 (" phy: teranetics: Utilize generic functions")
  e8a714e086e4 (" phy: Export gen10g_* functions")
  6ed33d3a06e6 (" phy: aquantia: Utilize genphy_c45_aneg_done()")

are missing a Signed-off-by from their committer.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH net 0/4] net: dsa: Use strncpy() for ethtool::get_strings

2018-03-01 Thread Florian Fainelli

Hi David,

On 03/01/2018 04:25 PM, Florian Fainelli wrote:
> Hi all,
> 
> After turning on KASAN on one of my systems, I started getting lots of out of
> bounds errors while fetching a given port's statistics, and indeed using
> memcpy() is unsafe for copying strings, so let's use strncpy() instead.

Looks like only patch 1 is necessary, but there are more drivers with
the same pattern under drivers/net/phy: marvell.c, micrel.c and
bcm-phy-lib.c, so I will submit a v2 with those fixed.

> 
> Florian Fainelli (4):
>   net: dsa: b53: Use strncpy() for ethtool::get_strings
>   net: dsa: loop: Use strncpy() for ethtool::get_strings
>   net: dsa: microchip: Utilize strncpy() for ethtool::get_strings
>   net: dsa: mv88e6xxx: Utilize strncpy() for ethtool::get_strings
> 
>  drivers/net/dsa/b53/b53_common.c   | 4 ++--
>  drivers/net/dsa/dsa_loop.c | 4 ++--
>  drivers/net/dsa/microchip/ksz_common.c | 4 ++--
>  drivers/net/dsa/mv88e6xxx/chip.c   | 4 ++--
>  4 files changed, 8 insertions(+), 8 deletions(-)
> 

-- 
Florian

Re: [PATCH net 4/4] net: dsa: mv88e6xxx: Utilize strncpy() for ethtool::get_strings

2018-03-01 Thread Florian Fainelli

On 03/01/2018 07:08 PM, Andrew Lunn wrote:
> On Thu, Mar 01, 2018 at 04:25:29PM -0800, Florian Fainelli wrote:
>> Do not use memcpy() which is not safe, but instead use strncpy() which
>> will make sure that the string is NUL terminated (in the Linux
>> implementation) if the string is smaller than the length specified. This
>> fixes KASAN out of bounds warnings while fetching port statistics.
>>
>> Fixes: f5e2ed022dff ("dsa: mv88e6xxx: Add Second back of statistics")
> 
> I'm sure it goes back much further than that.

You are right, it appears that I used the most recent commit that
changed the stats last.

This is not actually needed per-se here because the string is defined to
be ETH_GSTRING_LEN bytes, so unlike b53, we are not copying past the
buffer, in fact only the first patch is really necessary.
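
To make the difference concrete, here is a small user-space sketch. ETH_GSTRING_LEN is 32 as in the kernel's ethtool header; put_stat_name() is a hypothetical helper used only for illustration and does not come from any of the drivers in this series.

```c
#include <string.h>

#define ETH_GSTRING_LEN 32	/* same value as the kernel's ethtool header */

/* Copy one statistic name into a fixed-width ethtool string slot.
 *
 * strncpy() stops reading at the NUL in name and zero-fills the rest
 * of the slot. memcpy(slot, name, ETH_GSTRING_LEN) would instead read
 * ETH_GSTRING_LEN bytes from name no matter how short it is, which is
 * the out-of-bounds read KASAN reports for short string literals.
 *
 * Note: a name of ETH_GSTRING_LEN or more characters is truncated
 * without a terminating NUL.
 */
static void put_stat_name(char *slot, const char *name)
{
	strncpy(slot, name, ETH_GSTRING_LEN);
}
```

When the source is itself an ETH_GSTRING_LEN-byte array, as in mv88e6xxx, both calls stay inside the source buffer, which is why only the first patch is strictly needed.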

Thanks!

> 
>> Signed-off-by: Florian Fainelli 
> 
> Reviewed-by: Andrew Lunn 
> 
> Andrew
> 

-- 
Florian

[PATCH bpf-next 3/4] tools: bpftool: read from stdin when batch file name is "-"

2018-03-01 Thread Jakub Kicinski

From: Quentin Monnet 

Make bpftool read its command list from standard input when the name of
the input file is a single dash.

Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index cdee4c3d30c3..1da54a9b5ea3 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -195,7 +195,10 @@ static int do_batch(int argc, char **argv)
}
NEXT_ARG();
 
-   fp = fopen(*argv, "r");
+   if (!strcmp(*argv, "-"))
+   fp = stdin;
+   else
+   fp = fopen(*argv, "r");
if (!fp) {
p_err("Can't open file (%s): %s", *argv, strerror(errno));
return -1;
@@ -284,7 +287,8 @@ static int do_batch(int argc, char **argv)
err = 0;
}
 err_close:
-   fclose(fp);
+   if (fp != stdin)
+   fclose(fp);
 
if (json_output)
jsonw_end_array(json_wtr);
-- 
2.15.1
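
The pattern used in this patch can be isolated into two small helpers, shown here as a sketch. The helper names are hypothetical and do not exist in bpftool; the patch itself open-codes the same checks inside do_batch().

```c
#include <stdio.h>
#include <string.h>

/* Return stdin when the file name is "-", otherwise open the file.
 * Returns NULL if the file cannot be opened.
 */
static FILE *open_batch_input(const char *name)
{
	if (!strcmp(name, "-"))
		return stdin;
	return fopen(name, "r");
}

/* Close the stream unless it is stdin, which the caller does not own. */
static void close_batch_input(FILE *fp)
{
	if (fp && fp != stdin)
		fclose(fp);
}
```

Skipping fclose() for stdin keeps standard input usable for anything that runs later in the same process.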

Re: [Patch nf-next] netfilter: make xt_rateest hash table per net

2018-03-01 Thread Eric Dumazet

On Thu, 2018-03-01 at 18:58 -0800, Cong Wang wrote:
> As suggested by Eric, we need to make the xt_rateest
> hash table and its lock per netns to reduce lock
> contentions.
> 
> Cc: Florian Westphal 
> Cc: Eric Dumazet 
> Cc: Pablo Neira Ayuso 
> Signed-off-by: Cong Wang 
> ---
>  include/net/netfilter/xt_rateest.h |  4 +-
>  net/netfilter/xt_RATEEST.c | 91 
> +++---
>  net/netfilter/xt_rateest.c | 10 ++---
>  3 files changed, 72 insertions(+), 33 deletions(-)

Very nice, thanks !

Reviewed-by: Eric Dumazet 

Although the main reason was to avoid name collisions between different
netns.

Hash table is small enough that it can be allocated for each netns.

Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer

2018-03-01 Thread Jason Wang




On 2018-03-01 21:36, Michael S. Tsirkin wrote:
> On Thu, Mar 01, 2018 at 11:19:04AM +0800, Jason Wang wrote:
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. Other complaints are, the
>> complex linearize logic and EWMA estimation may increase the
>> possibility of linearizing.
>>
>> Signed-off-by: Jason Wang
>
> How about reposting just this patch for net? It's pretty small
> and this way we don't have broken redirect there.
> Probably keeping the linearize in here to reduce the
> amount of changes.



Looks possible, let me post a version for net.

Thanks

[PATCH bpf-next 0/4] tools: bpftool: improve batch mode

2018-03-01 Thread Jakub Kicinski

Quentin says:

Several enhancements for bpftool batch mode are introduced in this series.

More specifically, input files for batch mode gain support for:
  * comments (starting with '#'),
  * continuation lines (after a line ending with '\'),
  * arguments enclosed between quotes.

Also, make bpftool able to read from standard input when "-" is provided as
input file name.
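
A minimal sketch of the "-" convention described above, not the code from
the series. The helper names open_batch_input() and close_batch_input()
are hypothetical and only illustrate the idea that "-" selects standard
input while any other name is opened as a file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return stdin when the name is "-", otherwise
 * open the named file for reading. Returns NULL if fopen() fails.
 */
static FILE *open_batch_input(const char *name)
{
	if (strcmp(name, "-") == 0)
		return stdin;
	return fopen(name, "r");
}

/* Close the stream unless it is stdin, which this code did not open. */
static void close_batch_input(FILE *fp)
{
	if (fp && fp != stdin)
		fclose(fp);
}
```

Keeping the stdin check in the close path as well avoids closing a stream
the program did not open itself.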


Quentin Monnet (4):
  tools: bpftool: support comments in batch files
  tools: bpftool: support continuation lines in batch files
  tools: bpftool: read from stdin when batch file name is "-"
  tools: bpftool: add support for quotations in batch files

 tools/bpf/bpftool/main.c | 104 ---
 1 file changed, 89 insertions(+), 15 deletions(-)

-- 
2.15.1

[PATCH bpf-next 2/4] tools: bpftool: support continuation lines in batch files

2018-03-01 Thread Jakub Kicinski

From: Quentin Monnet 

Add support for continuation lines, such as in the following example:

prog show
prog dump xlated \
id 1337 opcodes

This patch is based on the continuation-line support code in
lib/utils.c from the iproute2 package.

"Lines" in error messages are renamed as "commands", as we count the
number of commands (but we ignore empty lines, comments, and do not add
continuation lines to the count).
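
The joining step described above could be sketched roughly as follows.
This is an illustration under assumptions, not the patch's code: the name
read_command() is hypothetical, and unlike the patch it only looks for the
backslash at the very end of the physical line.

```c
#include <stdio.h>
#include <string.h>

/* Read one logical command into buf, joining physical lines that end
 * with a backslash immediately before the newline.
 * Returns 1 on success, 0 at end of input, and -1 if a continuation
 * line is missing or no room is left in buf. Truncation of a single
 * over-long physical line is not detected in this sketch.
 */
static int read_command(FILE *fp, char *buf, size_t size)
{
	size_t len;

	if (!fgets(buf, (int)size, fp))
		return 0;

	len = strlen(buf);
	while (len >= 2 && buf[len - 2] == '\\' && buf[len - 1] == '\n') {
		/* Drop the backslash-newline pair. */
		len -= 2;
		buf[len] = '\0';

		/* Append the next physical line right after it. */
		if (size - len < 2 ||
		    !fgets(buf + len, (int)(size - len), fp))
			return -1;
		len += strlen(buf + len);
	}
	return 1;
}
```

With the example input from this message, the two physical lines of the
"prog dump" command come back as a single command, which is why counting
commands rather than lines is the more natural unit for error messages.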

Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 79587e6decae..cdee4c3d30c3 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -46,6 +46,9 @@
 
 #include "main.h"
 
+#define BATCH_LINE_LEN_MAX 65536
+#define BATCH_ARG_NB_MAX 4096
+
 const char *bin_name;
 static int last_argc;
 static char **last_argv;
@@ -171,9 +174,9 @@ static const struct cmd cmds[] = {
 
 static int do_batch(int argc, char **argv)
 {
+   char buf[BATCH_LINE_LEN_MAX], contline[BATCH_LINE_LEN_MAX];
+   char *n_argv[BATCH_ARG_NB_MAX];
unsigned int lines = 0;
-   char *n_argv[4096];
-   char buf[65536];
int n_argc;
FILE *fp;
char *cp;
@@ -210,13 +213,38 @@ static int do_batch(int argc, char **argv)
break;
}
 
+   /* Append continuation lines if any (coming after a line ending
+* with '\' in the batch file).
+*/
+   while ((cp = strstr(buf, "\\\n")) != NULL) {
+   if (!fgets(contline, sizeof(contline), fp) ||
+   strlen(contline) == 0) {
+   p_err("missing continuation line on command %d",
+ lines);
+   err = -1;
+   goto err_close;
+   }
+
+   cp = strchr(contline, '#');
+   if (cp)
+   *cp = '\0';
+
+   if (strlen(buf) + strlen(contline) + 1 > sizeof(buf)) {
+   p_err("command %d is too long", lines);
+   err = -1;
+   goto err_close;
+   }
+   buf[strlen(buf) - 2] = '\0';
+   strcat(buf, contline);
+   }
+
n_argc = 0;
n_argv[n_argc] = strtok(buf, " \t\n");
 
while (n_argv[n_argc]) {
n_argc++;
if (n_argc == ARRAY_SIZE(n_argv)) {
-   p_err("line %d has too many arguments, skip",
+   p_err("command %d has too many arguments, skip",
  lines);
n_argc = 0;
break;
@@ -252,7 +280,7 @@ static int do_batch(int argc, char **argv)
p_err("reading batch file failed: %s", strerror(errno));
err = -1;
} else {
-   p_info("processed %d lines", lines);
+   p_info("processed %d commands", lines);
err = 0;
}
 err_close:
-- 
2.15.1

[PATCH bpf-next 1/4] tools: bpftool: support comments in batch files

2018-03-01 Thread Jakub Kicinski

From: Quentin Monnet 

Replace '#' with '\0' in commands read from batch files in order to avoid
processing the remaining part of the line, thus allowing users to use
comments in the files.
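
As a standalone illustration of the approach described above (the helper
name strip_comment() is hypothetical; the patch does this inline in
do_batch()):

```c
#include <string.h>

/* Truncate the line at the first '#', if there is one. Everything from
 * the '#' to the end of the line, including the newline, is dropped.
 */
static void strip_comment(char *line)
{
	char *cp = strchr(line, '#');

	if (cp)
		*cp = '\0';
}
```

Note that this simple scan treats every '#' as the start of a comment,
including one that appears inside an argument.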

Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 185acfa229b5..79587e6decae 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -176,6 +176,7 @@ static int do_batch(int argc, char **argv)
char buf[65536];
int n_argc;
FILE *fp;
+   char *cp;
int err;
int i;
 
@@ -200,6 +201,10 @@ static int do_batch(int argc, char **argv)
if (json_output)
jsonw_start_array(json_wtr);
while (fgets(buf, sizeof(buf), fp)) {
+   cp = strchr(buf, '#');
+   if (cp)
+   *cp = '\0';
+
if (strlen(buf) == sizeof(buf) - 1) {
errno = E2BIG;
break;
-- 
2.15.1

[PATCH bpf-next 4/4] tools: bpftool: add support for quotations in batch files

2018-03-01 Thread Jakub Kicinski

From: Quentin Monnet 

Improve argument parsing from batch input files in order to support
arguments enclosed between single (') or double quotes ("). For example,
this command can now be parsed in batch mode:

bpftool prog dump xlated id 1337 file "/tmp/my file with spaces"

The function responsible for parsing command arguments is copied from
its counterpart in lib/utils.c in iproute2 package.
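
For readers who want to try the splitting logic outside bpftool, here is a
small self-contained sketch that follows the same approach. The name
split_args() is hypothetical, and error reporting is reduced to a -1
return value instead of p_err().

```c
#include <string.h>

/* Split line in place into at most maxargs - 1 arguments, honouring
 * single and double quotes. Returns the argument count, or -1 when
 * there are too many arguments or a quote is left unterminated.
 */
static int split_args(char *line, char *argv[], int maxargs)
{
	static const char ws[] = " \t\r\n";
	char *cp = line;
	int argc = 0;

	while (*cp) {
		cp += strspn(cp, ws);		/* skip leading whitespace */
		if (*cp == '\0')
			break;
		if (argc >= maxargs - 1)
			return -1;

		if (*cp == '\'' || *cp == '"') {
			char quote = *cp++;

			argv[argc++] = cp;
			cp = strchr(cp, quote);	/* find the closing quote */
			if (!cp)
				return -1;
		} else {
			argv[argc++] = cp;
			cp += strcspn(cp, ws);	/* find the end of the word */
			if (*cp == '\0')
				break;
		}
		*cp++ = '\0';			/* terminate this argument */
	}
	argv[argc] = NULL;
	return argc;
}
```

Applied to the example command above (without the leading "bpftool"), the
quoted path ends up as a single argument with the quotes removed.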

Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.c | 65 +---
 1 file changed, 51 insertions(+), 14 deletions(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 1da54a9b5ea3..1ec852d21d44 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -160,6 +160,54 @@ void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep)
}
 }
 
+/* Split command line into argument vector. */
+static int make_args(char *line, char *n_argv[], int maxargs, int cmd_nb)
+{
+   static const char ws[] = " \t\r\n";
+   char *cp = line;
+   int n_argc = 0;
+
+   while (*cp) {
+   /* Skip leading whitespace. */
+   cp += strspn(cp, ws);
+
+   if (*cp == '\0')
+   break;
+
+   if (n_argc >= (maxargs - 1)) {
+   p_err("too many arguments to command %d", cmd_nb);
+   return -1;
+   }
+
+   /* Word begins with quote. */
+   if (*cp == '\'' || *cp == '"') {
+   char quote = *cp++;
+
+   n_argv[n_argc++] = cp;
+   /* Find ending quote. */
+   cp = strchr(cp, quote);
+   if (!cp) {
+   p_err("unterminated quoted string in command %d",
+ cmd_nb);
+   return -1;
+   }
+   } else {
+   n_argv[n_argc++] = cp;
+
+   /* Find end of word. */
+   cp += strcspn(cp, ws);
+   if (*cp == '\0')
+   break;
+   }
+
+   /* Separate words. */
+   *cp++ = 0;
+   }
+   n_argv[n_argc] = NULL;
+
+   return n_argc;
+}
+
 static int do_batch(int argc, char **argv);
 
 static const struct cmd cmds[] = {
@@ -241,22 +289,11 @@ static int do_batch(int argc, char **argv)
strcat(buf, contline);
}
 
-   n_argc = 0;
-   n_argv[n_argc] = strtok(buf, " \t\n");
-
-   while (n_argv[n_argc]) {
-   n_argc++;
-   if (n_argc == ARRAY_SIZE(n_argv)) {
-   p_err("command %d has too many arguments, skip",
- lines);
-   n_argc = 0;
-   break;
-   }
-   n_argv[n_argc] = strtok(NULL, " \t\n");
-   }
-
+   n_argc = make_args(buf, n_argv, BATCH_ARG_NB_MAX, lines);
if (!n_argc)
continue;
+   if (n_argc < 0)
+   goto err_close;
 
if (json_output) {
jsonw_start_object(json_wtr);
-- 
2.15.1

Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer

2018-03-01 Thread Jason Wang




On 2018-03-01 22:16, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 21:15:36 +0800
> Jason Wang  wrote:
>
>> On 2018-03-01 18:35, Jesper Dangaard Brouer wrote:
>>> On Thu, 1 Mar 2018 17:23:37 +0800
>>> Jason Wang  wrote:
>>>
>>>> On 2018-03-01 17:10, Jesper Dangaard Brouer wrote:
>>>>> On Thu,  1 Mar 2018 11:19:03 +0800
>>>>> Jason Wang  wrote:
>>>>>
>>>>>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
>>>>>> was removed since commit 7324f5399b06 ("virtio_net: disable
>>>>>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
>>>>>>
>>>>>> - not enough tailroom was reserved which breaks cpumap
>>>>>
>>>>> To address this at a more fundamental level, I would suggest that we/you
>>>>> instead extend XDP to know its buffers' "frame" size/end.  (The
>>>>> assumption used to be, xdp_buff->data_hard_start + PAGE_SIZE, but
>>>>> ixgbe+virtio_net broke that assumption).
>>>>>
>>>>> It should actually be fairly easy to implement:
>>>>> * Simply extend xdp_buff with a "data_hard_end" pointer.
>>>>
>>>> Right, and then cpumap can warn and drop packets with insufficient
>>>> tailroom.
>>>>
>>>> But it should be a patch on top of this I think.
>>>
>>> Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
>>> the end/size of the frame, then we don't need this mixed XDP
>>> generic/native code path mixing.
>>
>> I know this but I'm still a little bit confused. According to the commit
>> log of 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in
>> receive_mergeable() case"), you said:
>>
>> """
>>       The longer explaination is that receive_mergeable() tries to
>>       work-around and satisfy these XDP requiresments e.g. by having a
>>       function xdp_linearize_page() that allocates and memcpy RX buffers
>>       around (in case packet is scattered across multiple rx buffers).  This
>>       does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
>>       we have not implemented bpf_xdp_adjust_tail yet).
>> """
>>
>> So I consider the tailroom is a must for the (future) tail adjustment.
>
> That is true, BUT implementing the "data_hard_end" extension is a
> pre-requisite.  It will also be to catch the issue of too little
> tail-room if/when implementing bpf_xdp_adjust_tail().
>
> It is of course a "nice-to-have", to fix this virtio_net driver's
> receive_mergeable() call to have enough tail-room, but I don't see it
> as a solution to the fundamental problem.
>
>>> You could re-enable native redirect, and push the responsibility to
>>> cpumap for detecting this too-small frame "missing tailroom" (and avoid
>>> crashing...). (If we really want to support this, cpumap could fallback
>>> to dev_alloc_skb, and handle it gracefully).
>>
>> Right but it will be slower than build_skb().
>
> True, but bad argument in this context, as you are already using a
> similar function call napi_alloc_skb().  And it will be even slower to
> call generic-XDP code path.

Well, there's no generic skb implementation for cpumap redirection so I
think we're talking about native XDP for cpumap. In this case, we won't
even use napi_alloc_skb().


Thanks
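
To make the "data_hard_end" idea discussed in this thread more concrete,
here is a userspace sketch. Everything in it is an assumption for
illustration: struct xdp_buff_sketch is a simplified stand-in for the
kernel's struct xdp_buff, and xdp_has_tailroom() is a hypothetical helper,
not an existing kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct xdp_buff, extended with
 * the proposed data_hard_end pointer.
 */
struct xdp_buff_sketch {
	unsigned char *data_hard_start;	/* start of the buffer */
	unsigned char *data;		/* start of packet data */
	unsigned char *data_end;	/* end of packet data */
	unsigned char *data_hard_end;	/* end of the buffer (proposed) */
};

/* True when at least `needed` bytes are free after the packet data. */
static bool xdp_has_tailroom(const struct xdp_buff_sketch *xdp, size_t needed)
{
	return (size_t)(xdp->data_hard_end - xdp->data_end) >= needed;
}
```

A consumer such as cpumap could then compare the available tailroom with
the space it needs for struct skb_shared_info and drop or fall back when
the check fails, instead of assuming a fixed frame size.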

Re: [patch net-next 09/10] net: sch: prio: Add offload ability for grafting a child

2018-03-01 Thread Jakub Kicinski

On Thu, 1 Mar 2018 22:38:50 -0500, Alexander Aring wrote:
> I guess that to make extack work, you need to return an errno on failure.

AFAIK extack is printed as a warning if the operation did not fail.

Re: SRIOV on Intel x710 fail to get attached both at VM creation time and VM runtime

2018-03-01 Thread Alexander Duyck

On Thu, Mar 1, 2018 at 8:12 AM,   wrote:
> + intel-wired-...@lists.osuosl.org
>
>
> On 2018-03-01 21:41, p...@codeaurora.org wrote:
>>
>> Hi All,
>>
>> I am facing the following issue on kernel 4.14.14.
>>
>> Enable SRIOV on Intel x710 card.
>> echo 32 > /sys/class/net/eth1/device/sriov_numvfs
>> start net_pool
>> virsh net-start intel_pool
>>
>> case 1)
>> attach the VF while creating the VM:
>> virt-install --accelerate --import --disk /home/disk.img --network
>> network=intel_pool  --boot uefi --name poza-guest --os-type linux
>> --os-variant rhel7 --ram 8000 --vcpus 4
>>
>> case 2)
>> create VM:
>> virt-install --accelerate --import --disk /home/disk.img --boot uefi
>> --name poza-guest --os-type linux --os-variant rhel7 --ram 8000
>> --vcpus 4
>> attach it:
>> virsh attach-interface --domain oza-guest --type network --source
>> intel_pool --target eth1
>>
>> kernel logs:
>> [44287.825287] i40evf :01:02.0: Unable to send opcode 2 to PF, err
>> I40E_ERR_QUEUE_EMPTY, aq_err OK
>> [44287.962640] i40e :01:00.0: VF 0 still in reset. Try again.
>> error: Failed to attach interface
>> error: Cannot set interface MAC/vlanid to 52:54:00:e9:f1:b5/0 for
>> ifname eth1 vf 0: Resource temporarily unavailable
>>
>>
>> The same use case works with following card with the same kernel
>> version and rootfs.
>> Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
>>
>> for detailed logs please have a look at
>> https://bugzilla.kernel.org/show_bug.cgi?id=198959
>>
>> Regards,
>> Oza.

So the first question that jumps to mind is: what is the firmware
version on the PF? ethtool -i should tell you. Are there any issues
bringing up the PF and getting it to pass traffic?

Thanks.

- Alex

Re: [patch net-next 09/10] net: sch: prio: Add offload ability for grafting a child

2018-03-01 Thread Alexander Aring

Hi,

On Wed, Feb 28, 2018 at 4:45 AM, Jiri Pirko  wrote:
> From: Nogah Frankel 
>
> Offload sch_prio graft command for capable drivers.
> Warn in case of a failure, unless the graft was done as part of a destroy
> operation (the new qdisc is a noop) or if all the qdiscs (the parent, the
> old child, and the new one) are not offloaded.
>
> Signed-off-by: Nogah Frankel 
> Reviewed-by: Yuval Mintz 
> Signed-off-by: Jiri Pirko 
> ---
>  include/net/pkt_cls.h |  8 
>  net/sched/sch_prio.c  | 32 
...
> +   if (*old)
> +   any_qdisc_is_offloaded |= (*old)->flags &
> +  TCQ_F_OFFLOADED;
> +
> +   if (any_qdisc_is_offloaded)
> +   NL_SET_ERR_MSG(extack, "Offloading graft operation failed.");
> +   }
> +
> return 0;
>  }
>

I guess that to make extack work, you need to return an errno on failure.

- Alex

[PATCH v2 net-next 05/10] net: Rename NETEVENT_MULTIPATH_HASH_UPDATE

2018-03-01 Thread David Ahern

Rename NETEVENT_MULTIPATH_HASH_UPDATE to
NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
in the IPv4 hash policy.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +-
 include/net/netevent.h| 2 +-
 net/ipv4/sysctl_net_ipv4.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 69f16c605b9d..93d48c1b2bf8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2430,7 +2430,7 @@ static int mlxsw_sp_router_netevent_event(struct notifier_block *nb,
mlxsw_core_schedule_work(&net_work->work);
mlxsw_sp_port_dev_put(mlxsw_sp_port);
break;
-   case NETEVENT_MULTIPATH_HASH_UPDATE:
+   case NETEVENT_IPV4_MPATH_HASH_UPDATE:
net = ptr;
 
if (!net_eq(net, &init_net))
diff --git a/include/net/netevent.h b/include/net/netevent.h
index 40e7bab68490..baee605a94ab 100644
--- a/include/net/netevent.h
+++ b/include/net/netevent.h
@@ -26,7 +26,7 @@ enum netevent_notif_type {
NETEVENT_NEIGH_UPDATE = 1, /* arg is struct neighbour ptr */
NETEVENT_REDIRECT, /* arg is struct netevent_redirect ptr */
NETEVENT_DELAY_PROBE_TIME_UPDATE, /* arg is struct neigh_parms ptr */
-   NETEVENT_MULTIPATH_HASH_UPDATE, /* arg is struct net ptr */
+   NETEVENT_IPV4_MPATH_HASH_UPDATE, /* arg is struct net ptr */
 };
 
 int register_netevent_notifier(struct notifier_block *nb);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 89683d868b37..011de9a20ec6 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -400,7 +400,7 @@ static int proc_fib_multipath_hash_policy(struct ctl_table *table, int write,
 
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (write && ret == 0)
-   call_netevent_notifiers(NETEVENT_MULTIPATH_HASH_UPDATE, net);
+   call_netevent_notifiers(NETEVENT_IPV4_MPATH_HASH_UPDATE, net);
 
return ret;
 }
-- 
2.11.0

[PATCH v2 net-next 02/10] net: Align ip_multipath_l3_keys and ip6_multipath_l3_keys

2018-03-01 Thread David Ahern

Symmetry is good and allows easy comparison that ipv4 and ipv6 are
doing the same thing. To that end, change ip_multipath_l3_keys to
set addresses at the end after the icmp compares, and move the
initialization of ipv6 flow keys to rt6_multipath_hash.
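
The restructuring described above (pick a header while running the ICMP
checks, then set the addresses once at a common exit) can be sketched in
userspace as follows. The types and the function select_l3_keys() are
hypothetical simplifications; the real code works on struct iphdr,
struct icmphdr and struct flow_keys.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header and key types for illustration. */
struct hdr_sketch {
	uint32_t saddr;
	uint32_t daddr;
	int is_icmp_error;	/* nonzero if this packet is an ICMP error */
};

struct keys_sketch {
	uint32_t src;
	uint32_t dst;
};

/* Use the inner header of an ICMP error when it is available, otherwise
 * the outer header. The addresses are written in one place only.
 */
static void select_l3_keys(const struct hdr_sketch *outer,
			   const struct hdr_sketch *inner,
			   struct keys_sketch *keys)
{
	const struct hdr_sketch *key_hdr = outer;

	if (!outer->is_icmp_error)
		goto out;
	if (!inner)
		goto out;

	key_hdr = inner;
out:
	keys->src = key_hdr->saddr;
	keys->dst = key_hdr->daddr;
}
```

Because every early exit goes through the same label, the IPv4 and IPv6
variants can share the same shape, which is the symmetry the commit
message is after.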

Signed-off-by: David Ahern 
---
 net/ipv4/route.c | 20 +++-
 net/ipv6/route.c |  4 ++--
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5615d26b3db7..78338f89370e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1748,37 +1748,39 @@ static void ip_multipath_l3_keys(const struct sk_buff *skb,
 struct flow_keys *hash_keys)
 {
const struct iphdr *outer_iph = ip_hdr(skb);
+   const struct iphdr *key_iph = outer_iph;
const struct iphdr *inner_iph;
const struct icmphdr *icmph;
struct iphdr _inner_iph;
struct icmphdr _icmph;
 
-   hash_keys->addrs.v4addrs.src = outer_iph->saddr;
-   hash_keys->addrs.v4addrs.dst = outer_iph->daddr;
if (likely(outer_iph->protocol != IPPROTO_ICMP))
-   return;
+   goto out;
 
if (unlikely((outer_iph->frag_off & htons(IP_OFFSET)) != 0))
-   return;
+   goto out;
 
icmph = skb_header_pointer(skb, outer_iph->ihl * 4, sizeof(_icmph),
   &_icmph);
if (!icmph)
-   return;
+   goto out;
 
if (icmph->type != ICMP_DEST_UNREACH &&
icmph->type != ICMP_REDIRECT &&
icmph->type != ICMP_TIME_EXCEEDED &&
icmph->type != ICMP_PARAMETERPROB)
-   return;
+   goto out;
 
inner_iph = skb_header_pointer(skb,
   outer_iph->ihl * 4 + sizeof(_icmph),
   sizeof(_inner_iph), &_inner_iph);
if (!inner_iph)
-   return;
-   hash_keys->addrs.v4addrs.src = inner_iph->saddr;
-   hash_keys->addrs.v4addrs.dst = inner_iph->daddr;
+   goto out;
+
+   key_iph = inner_iph;
+out:
+   hash_keys->addrs.v4addrs.src = key_iph->saddr;
+   hash_keys->addrs.v4addrs.dst = key_iph->daddr;
 }
 
 /* if skb is set it will be used and fl4 can be NULL */
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e2bb40824c85..190d9690dfe0 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1815,8 +1815,6 @@ static void ip6_multipath_l3_keys(const struct sk_buff *skb,
key_iph = inner_iph;
_flkeys = NULL;
 out:
-   memset(keys, 0, sizeof(*keys));
-   keys->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
if (_flkeys) {
keys->addrs.v6addrs.src = _flkeys->addrs.v6addrs.src;
keys->addrs.v6addrs.dst = _flkeys->addrs.v6addrs.dst;
@@ -1837,6 +1835,8 @@ u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb,
struct flow_keys hash_keys;
 
if (skb) {
+   memset(&hash_keys, 0, sizeof(hash_keys));
+   hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
ip6_multipath_l3_keys(skb, &hash_keys, flkeys);
return flow_hash_from_keys(&hash_keys) >> 1;
}
-- 
2.11.0

[PATCH v2 net-next 04/10] net/ipv6: Make rt6_multipath_hash similar to fib_multipath_hash

2018-03-01 Thread David Ahern

Make rt6_multipath_hash more of a direct parallel to fib_multipath_hash
and reduce stack and overhead in the process: get_hash_from_flowi6 is
just a wrapper around __get_hash_from_flowi6 with another stack
allocation for flow_keys. Move setting the addresses, protocol and
label into rt6_multipath_hash and allow it to make the call to
flow_hash_from_keys.

Signed-off-by: David Ahern 
---
 net/ipv6/route.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 190d9690dfe0..5c89af2c54f4 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1833,15 +1833,21 @@ u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb,
   struct flow_keys *flkeys)
 {
struct flow_keys hash_keys;
+   u32 mhash;
 
+   memset(&hash_keys, 0, sizeof(hash_keys));
+   hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
if (skb) {
-   memset(&hash_keys, 0, sizeof(hash_keys));
-   hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
ip6_multipath_l3_keys(skb, &hash_keys, flkeys);
-   return flow_hash_from_keys(&hash_keys) >> 1;
+   } else {
+   hash_keys.addrs.v6addrs.src = fl6->saddr;
+   hash_keys.addrs.v6addrs.dst = fl6->daddr;
+   hash_keys.tags.flow_label = (__force u32)fl6->flowlabel;
+   hash_keys.basic.ip_proto = fl6->flowi6_proto;
}
+   mhash = flow_hash_from_keys(&hash_keys);
 
-   return get_hash_from_flowi6(fl6) >> 1;
+   return mhash >> 1;
 }
 
 void ip6_route_input(struct sk_buff *skb)
-- 
2.11.0

[PATCH v2 net-next 03/10] net/ipv4: Simplify fib_multipath_hash with optional flow keys

2018-03-01 Thread David Ahern

As of commit e37b1e978bec5 ("ipv6: route: dissect flow in input path if
fib rules need it") fib_multipath_hash takes an optional flow keys
argument. If non-NULL it means the skb has already been dissected. If not
set, then fib_multipath_hash needs to call skb_flow_dissect_flow_keys.

Simplify the logic by setting flkeys to the local stack variable keys.
This simplifies fib_multipath_hash by having only one set of instructions
setting hash_keys.
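
The pattern can be shown in isolation with a small userspace sketch. All
names here (pkt_sketch, flow_keys_sketch, dissect_sketch(),
fill_hash_keys()) are hypothetical stand-ins, not kernel APIs; the point
is only that the optional pointer is redirected to a local variable so a
single copy path remains.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet and flow key types. */
struct pkt_sketch {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

struct flow_keys_sketch {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Counts how often the stand-in dissector runs. */
static int dissect_calls;

/* Stand-in for skb_flow_dissect_flow_keys(). */
static void dissect_sketch(const struct pkt_sketch *pkt,
			   struct flow_keys_sketch *keys)
{
	dissect_calls++;
	keys->src = pkt->src;
	keys->dst = pkt->dst;
	keys->sport = pkt->sport;
	keys->dport = pkt->dport;
	keys->proto = pkt->proto;
}

/* flkeys is optional: when it is NULL, dissect into a local variable
 * and point flkeys at it, so only one copy path is needed.
 */
static void fill_hash_keys(const struct pkt_sketch *pkt,
			   const struct flow_keys_sketch *flkeys,
			   struct flow_keys_sketch *hash_keys)
{
	struct flow_keys_sketch keys;

	if (!flkeys) {
		dissect_sketch(pkt, &keys);
		flkeys = &keys;
	}

	*hash_keys = *flkeys;
}
```

When the caller already has dissected keys, the dissector is not run a
second time.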

Signed-off-by: David Ahern 
---
 net/ipv4/route.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 78338f89370e..a7940b676f52 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1811,24 +1811,20 @@ int fib_multipath_hash(const struct net *net, const struct fib_info *fi,
/* short-circuit if we already have L4 hash present */
if (skb->l4_hash)
return skb_get_hash_raw(skb) >> 1;
+
memset(&hash_keys, 0, sizeof(hash_keys));
 
-   if (flkeys) {
-   hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-   hash_keys.addrs.v4addrs.src = flkeys->addrs.v4addrs.src;
-   hash_keys.addrs.v4addrs.dst = flkeys->addrs.v4addrs.dst;
-   hash_keys.ports.src = flkeys->ports.src;
-   hash_keys.ports.dst = flkeys->ports.dst;
-   hash_keys.basic.ip_proto = flkeys->basic.ip_proto;
-   } else {
+   if (!flkeys) {
skb_flow_dissect_flow_keys(skb, &keys, flag);
-   hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-   hash_keys.addrs.v4addrs.src = keys.addrs.v4addrs.src;
-   hash_keys.addrs.v4addrs.dst = keys.addrs.v4addrs.dst;
-   hash_keys.ports.src = keys.ports.src;
-   hash_keys.ports.dst = keys.ports.dst;
-   hash_keys.basic.ip_proto = keys.basic.ip_proto;
+   flkeys = &keys;
}
+
+   hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+   hash_keys.addrs.v4addrs.src = flkeys->addrs.v4addrs.src;
+   hash_keys.addrs.v4addrs.dst = flkeys->addrs.v4addrs.dst;
+   hash_keys.ports.src = flkeys->ports.src;
+   hash_keys.ports.dst = flkeys->ports.dst;
+   hash_keys.basic.ip_proto = flkeys->basic.ip_proto;
} else {
memset(&hash_keys, 0, sizeof(hash_keys));
hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-- 
2.11.0

[PATCH v2 net-next 06/10] net/ipv6: Pass skb to route lookup

2018-03-01 Thread David Ahern

IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.
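
A userspace sketch of the idea, assuming nothing about the real
fib_rules code beyond what is stated above: an opaque const pointer rides
along in the argument struct, the intermediate layer passes it through
untouched, and only the final lookup callback interprets it. All names
(lookup_arg_sketch, rules_lookup_sketch(), table_lookup_sketch()) are
hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fib_lookup_arg. */
struct lookup_arg_sketch {
	const void *lookup_data;	/* opaque per-lookup context */
	int result;
};

typedef int (*lookup_fn)(int table, const void *data);

/* Final lookup function: the only place that knows what lookup_data
 * points to. Here it is an optional int used to adjust the result.
 */
static int table_lookup_sketch(int table, const void *data)
{
	const int *extra = data;

	return extra ? table + *extra : table;
}

/* Intermediate "rules" layer: forwards lookup_data without touching it. */
static void rules_lookup_sketch(int table, struct lookup_arg_sketch *arg,
				lookup_fn fn)
{
	arg->result = fn(table, arg->lookup_data);
}
```

Callers that have no extra context simply leave lookup_data as NULL, which
matches the call sites in this patch that pass NULL for the skb.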

Signed-off-by: David Ahern 
---
 drivers/infiniband/core/cma.c  |  2 +-
 drivers/net/ipvlan/ipvlan_core.c   |  3 +-
 drivers/net/vrf.c  |  7 ++--
 include/net/fib_rules.h|  1 +
 include/net/ip6_fib.h  |  4 ++-
 include/net/ip6_route.h| 11 +++---
 net/ipv6/anycast.c |  2 +-
 net/ipv6/fib6_rules.c  |  8 +++--
 net/ipv6/icmp.c|  3 +-
 net/ipv6/ip6_fib.c |  3 +-
 net/ipv6/ip6_gre.c |  2 +-
 net/ipv6/ip6_tunnel.c  |  4 +--
 net/ipv6/ip6_vti.c |  2 +-
 net/ipv6/mcast.c   |  4 +--
 net/ipv6/netfilter/ip6t_rpfilter.c |  2 +-
 net/ipv6/netfilter/nft_fib_ipv6.c  |  3 +-
 net/ipv6/route.c   | 72 +++---
 net/ipv6/seg6_local.c  |  4 +--
 18 files changed, 83 insertions(+), 54 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 3ae32d1ddd27..915bbd867b61 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1334,7 +1334,7 @@ static bool validate_ipv6_net_dev(struct net_device *net_dev,
   IPV6_ADDR_LINKLOCAL;
struct rt6_info *rt = rt6_lookup(dev_net(net_dev), _addr->sin6_addr,
 _addr->sin6_addr, net_dev->ifindex,
-strict);
+NULL, strict);
bool ret;
 
if (!rt)
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 17daebd19e65..1a8132eb2a3e 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -817,7 +817,8 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
};
 
skb_dst_drop(skb);
-   dst = ip6_route_input_lookup(dev_net(sdev), sdev, &fl6, flags);
+   dst = ip6_route_input_lookup(dev_net(sdev), sdev, &fl6,
+skb, flags);
skb_dst_set(skb, dst);
break;
}
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index e459e601c57f..c6be49d3a9eb 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -941,6 +941,7 @@ static struct rt6_info *vrf_ip6_route_lookup(struct net *net,
 const struct net_device *dev,
 struct flowi6 *fl6,
 int ifindex,
+const struct sk_buff *skb,
 int flags)
 {
struct net_vrf *vrf = netdev_priv(dev);
@@ -959,7 +960,7 @@ static struct rt6_info *vrf_ip6_route_lookup(struct net *net,
if (!table)
return NULL;
 
-   return ip6_pol_route(net, table, ifindex, fl6, flags);
+   return ip6_pol_route(net, table, ifindex, fl6, skb, flags);
 }
 
 static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
@@ -977,7 +978,7 @@ static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
struct net *net = dev_net(vrf_dev);
struct rt6_info *rt6;
 
-   rt6 = vrf_ip6_route_lookup(net, vrf_dev, &fl6, ifindex,
+   rt6 = vrf_ip6_route_lookup(net, vrf_dev, &fl6, ifindex, skb,
   RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_IFACE);
if (unlikely(!rt6))
return;
@@ -1110,7 +1111,7 @@ static struct dst_entry *vrf_link_scope_lookup(const struct net_device *dev,
if (!ipv6_addr_any(&fl6->saddr))
flags |= RT6_LOOKUP_F_HAS_SADDR;
 
-   rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif, flags);
+   rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif, NULL, flags);
if (rt)
dst = &rt->dst;
 
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 1c9e17c11953..e5cfcfc7dd93 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -47,6 +47,7 @@ struct fib_rule {
 
 struct fib_lookup_arg {
void*lookup_ptr;
+   const void  *lookup_data;
void*result;
struct fib_rule *rule;
u32 table;
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 8d906a35b534..5e86fd9dc857 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -350,7 +350,8 @@ struct fib6_table {
 
 typedef struct rt6_info *(*pol_lookup_t)(struct net *,

[PATCH v2 net-next 07/10] net/ipv6: Add support for path selection using hash of 5-tuple

2018-03-01 Thread David Ahern

Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow label. To that end,
add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.
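
To illustrate the difference between the two policies, here is a
userspace sketch. It is not the kernel implementation: FNV-1a stands in
for flow_hash_from_keys(), addresses are shortened to 32 bits, and the
names flow_sketch and multipath_hash_sketch() are hypothetical. Policy 0
hashes addresses, protocol and flow label; policy 1 hashes the 5-tuple.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified flow description. */
struct flow_sketch {
	uint32_t src, dst;	/* addresses, shortened to 32 bits */
	uint32_t flow_label;
	uint16_t sport, dport;
	uint8_t proto;
};

/* FNV-1a update step; a stand-in for the kernel's hash function. */
static uint32_t fnv1a_update(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* policy 0: L3 (addresses, protocol, flow label)
 * policy 1: L4 (addresses, protocol, source and destination ports)
 */
static uint32_t multipath_hash_sketch(int policy, const struct flow_sketch *fl)
{
	uint32_t h = 2166136261u;

	h = fnv1a_update(h, &fl->src, sizeof(fl->src));
	h = fnv1a_update(h, &fl->dst, sizeof(fl->dst));
	h = fnv1a_update(h, &fl->proto, sizeof(fl->proto));
	if (policy == 1) {
		h = fnv1a_update(h, &fl->sport, sizeof(fl->sport));
		h = fnv1a_update(h, &fl->dport, sizeof(fl->dport));
	} else {
		h = fnv1a_update(h, &fl->flow_label, sizeof(fl->flow_label));
	}
	return h;
}
```

Under policy 0, two flows between the same hosts with the same flow label
hash identically regardless of ports; under policy 1 the ports take part
in the hash, so such flows can be spread across different paths.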

Signed-off-by: David Ahern 
---
 Documentation/networking/ip-sysctl.txt |  7 
 include/net/ip6_route.h|  4 +-
 include/net/netevent.h |  1 +
 include/net/netns/ipv6.h   |  1 +
 net/ipv6/icmp.c|  2 +-
 net/ipv6/route.c   | 68 ++
 net/ipv6/sysctl_net_ipv6.c | 27 ++
 7 files changed, 91 insertions(+), 19 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index a553d4e4a0fb..783675a730e5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1363,6 +1363,13 @@ flowlabel_reflect - BOOLEAN
FALSE: disabled
Default: FALSE
 
+fib_multipath_hash_policy - INTEGER
+   Controls which hash policy to use for multipath routes.
+   Default: 0 (Layer 3)
+   Possible values:
+   0 - Layer 3 (source and destination addresses plus flow label)
+   1 - Layer 4 (standard 5-tuple)
+
 anycast_src_echo_reply - BOOLEAN
Controls the use of anycast addresses as source addresses for ICMPv6
echo reply
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 9594f9317952..ce2abc0ff102 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -130,8 +130,8 @@ static inline int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
 struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
const struct in6_addr *saddr, int oif,
const struct sk_buff *skb, int flags);
-u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb,
-  struct flow_keys *hkeys);
+u32 rt6_multipath_hash(const struct net *net, const struct flowi6 *fl6,
+  const struct sk_buff *skb, struct flow_keys *hkeys);
 
 struct dst_entry *icmp6_dst_alloc(struct net_device *dev, struct flowi6 *fl6);
 
diff --git a/include/net/netevent.h b/include/net/netevent.h
index baee605a94ab..d9918261701c 100644
--- a/include/net/netevent.h
+++ b/include/net/netevent.h
@@ -27,6 +27,7 @@ enum netevent_notif_type {
NETEVENT_REDIRECT, /* arg is struct netevent_redirect ptr */
NETEVENT_DELAY_PROBE_TIME_UPDATE, /* arg is struct neigh_parms ptr */
NETEVENT_IPV4_MPATH_HASH_UPDATE, /* arg is struct net ptr */
+   NETEVENT_IPV6_MPATH_HASH_UPDATE, /* arg is struct net ptr */
 };
 
 int register_netevent_notifier(struct notifier_block *nb);
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index e286fda09fcf..5b51110435fc 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -28,6 +28,7 @@ struct netns_sysctl_ipv6 {
int ip6_rt_gc_elasticity;
int ip6_rt_mtu_expires;
int ip6_rt_min_advmss;
+   int multipath_hash_policy;
int flowlabel_consistency;
int auto_flowlabels;
int icmpv6_time;
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index a5d929223820..6f84668be6ea 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -522,7 +522,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
fl6.fl6_icmp_type = type;
fl6.fl6_icmp_code = code;
fl6.flowi6_uid = sock_net_uid(net, NULL);
-   fl6.mp_hash = rt6_multipath_hash(&fl6, skb, NULL);
+   fl6.mp_hash = rt6_multipath_hash(net, &fl6, skb, NULL);
security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
 
sk = icmpv6_xmit_lock(net);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d2b8368663cb..f0ae58424c45 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -450,7 +450,8 @@ static bool rt6_check_expired(const struct rt6_info *rt)
return false;
 }
 
-static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
+static struct rt6_info *rt6_multipath_select(const struct net *net,
+struct rt6_info *match,
 struct flowi6 *fl6, int oif,
 const struct sk_buff *skb,
 int strict)
@@ -461,7 +462,7 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
 * case it will always be non-zero. Otherwise now is the time to do it.
 */
if (!fl6->mp_hash)
-   fl6->mp_hash = rt6_multipath_hash(fl6, skb, NULL);
+   fl6->mp_hash = rt6_multipath_hash(net, fl6, skb, NULL);
 
if (fl6->mp_hash <=

[PATCH v2 net-next 01/10] net/ipv4: Pass net to fib_multipath_hash instead of fib_info

2018-03-01 Thread David Ahern

fib_multipath_hash only needs the net struct to check a sysctl. Make it
clear by passing net instead of fib_info. In the end this allows
alignment between the ipv4 and ipv6 versions.

Signed-off-by: David Ahern 
---
 include/net/ip_fib.h | 5 +++--
 net/ipv4/fib_semantics.c | 2 +-
 net/ipv4/route.c | 9 +
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 8812582a94d5..1c4219e88726 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -395,8 +395,9 @@ int fib_sync_down_addr(struct net_device *dev, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
-  const struct sk_buff *skb, struct flow_keys *flkeys);
+int fib_multipath_hash(const struct net *net, const struct fib_info *fi,
+  const struct flowi4 *fl4, const struct sk_buff *skb,
+  struct flow_keys *flkeys);
 #endif
 void fib_select_multipath(struct fib_result *res, int hash);
 void fib_select_path(struct net *net, struct fib_result *res,
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 181b0d8d589c..02c1ff19a46f 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1770,7 +1770,7 @@ void fib_select_path(struct net *net, struct fib_result *res,
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
if (res->fi->fib_nhs > 1) {
-   int h = fib_multipath_hash(res->fi, fl4, skb, NULL);
+   int h = fib_multipath_hash(net, res->fi, fl4, skb, NULL);
 
fib_select_multipath(res, h);
}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3bb686dac273..5615d26b3db7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1782,10 +1782,10 @@ static void ip_multipath_l3_keys(const struct sk_buff *skb,
 }
 
 /* if skb is set it will be used and fl4 can be NULL */
-int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
-  const struct sk_buff *skb, struct flow_keys *flkeys)
+int fib_multipath_hash(const struct net *net, const struct fib_info *fi,
+  const struct flowi4 *fl4, const struct sk_buff *skb,
+  struct flow_keys *flkeys)
 {
-   struct net *net = fi->fib_net;
struct flow_keys hash_keys;
u32 mhash;
 
@@ -1852,7 +1852,8 @@ static int ip_mkroute_input(struct sk_buff *skb,
 {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
if (res->fi && res->fi->fib_nhs > 1) {
-   int h = fib_multipath_hash(res->fi, NULL, skb, hkeys);
+   int h = fib_multipath_hash(res->fi->fib_net, res->fi,
+  NULL, skb, hkeys);
 
fib_select_multipath(res, h);
}
-- 
2.11.0

[PATCH v2 net-next 09/10] net: Remove unused get_hash_from_flow functions

2018-03-01 Thread David Ahern

__get_hash_from_flowi6 is still used for flowlabels, but the IPv4
variant and the wrappers to both are not used. Remove them.

Signed-off-by: David Ahern 
---
 include/net/flow.h| 15 ---
 net/core/flow_dissector.c | 16 
 2 files changed, 31 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 64e7ee9cb980..b3982de26e81 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -221,21 +221,6 @@ static inline unsigned int flow_key_size(u16 family)
 }
 
 __u32 __get_hash_from_flowi6(const struct flowi6 *fl6, struct flow_keys *keys);
-
-static inline __u32 get_hash_from_flowi6(const struct flowi6 *fl6)
-{
-   struct flow_keys keys;
-
-   return __get_hash_from_flowi6(fl6, &keys);
-}
-
 __u32 __get_hash_from_flowi4(const struct flowi4 *fl4, struct flow_keys *keys);
 
-static inline __u32 get_hash_from_flowi4(const struct flowi4 *fl4)
-{
-   struct flow_keys keys;
-
-   return __get_hash_from_flowi4(fl4, &keys);
-}
-
 #endif
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 559db9ea8d86..d29f09bc5ff9 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1341,22 +1341,6 @@ __u32 __get_hash_from_flowi6(const struct flowi6 *fl6, 
struct flow_keys *keys)
 }
 EXPORT_SYMBOL(__get_hash_from_flowi6);
 
-__u32 __get_hash_from_flowi4(const struct flowi4 *fl4, struct flow_keys *keys)
-{
-   memset(keys, 0, sizeof(*keys));
-
-   keys->addrs.v4addrs.src = fl4->saddr;
-   keys->addrs.v4addrs.dst = fl4->daddr;
-   keys->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-   keys->ports.src = fl4->fl4_sport;
-   keys->ports.dst = fl4->fl4_dport;
-   keys->keyid.keyid = fl4->fl4_gre_key;
-   keys->basic.ip_proto = fl4->flowi4_proto;
-
-   return flow_hash_from_keys(keys);
-}
-EXPORT_SYMBOL(__get_hash_from_flowi4);
-
 static const struct flow_dissector_key flow_keys_dissector_keys[] = {
{
.key_id = FLOW_DISSECTOR_KEY_CONTROL,
-- 
2.11.0

[PATCH v2 net-next 00/10] net/ipv6: Add support for path selection using hash of 5-tuple

2018-03-01 Thread David Ahern

Hardware supports multipath selection using the standard L4 5-tuple
instead of just L3 and the flow label. In addition, some network
operators prefer IPv6 path selection to use the 5-tuple. To that end,
add support to IPv6 for multipath hash policy similar to
bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice").
The default is still L3 which covers source and destination addresses
along with flow label and IPv6 protocol. This gives users a choice in
hash algorithms if they believe L3 only and the IPv6 flow label are not
sufficient for their use case.

A separate sysctl is added for IPv6, allowing IPv4 and IPv6 to use
different algorithms if desired.
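
The difference between the two policies can be illustrated with a small userspace sketch. This is not the kernel code: struct flow6, mix() and multipath_hash6() are hypothetical names, and FNV-1a is only a stand-in for the kernel's flow hash. It assumes policy 0 hashes addresses, flow label and protocol, while policy 1 hashes addresses, ports and protocol without the flow label.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow description; not the kernel's struct flowi6. */
struct flow6 {
	uint8_t  src[16];
	uint8_t  dst[16];
	uint32_t flow_label;
	uint16_t sport;
	uint16_t dport;
	uint8_t  proto;
};

/* FNV-1a step, used here only as a stand-in for the kernel flow hash. */
static uint32_t mix(uint32_t h, const void *p, size_t len)
{
	const uint8_t *b = p;
	size_t i;

	for (i = 0; i < len; i++)
		h = (h ^ b[i]) * 16777619u;
	return h;
}

/* policy 0: L3 (addresses, flow label, protocol)
 * policy 1: L4 (addresses, ports, protocol; flow label is left out)
 */
static uint32_t multipath_hash6(int policy, const struct flow6 *f)
{
	uint32_t h = 2166136261u;

	h = mix(h, f->src, sizeof(f->src));
	h = mix(h, f->dst, sizeof(f->dst));
	if (policy == 1) {
		h = mix(h, &f->sport, sizeof(f->sport));
		h = mix(h, &f->dport, sizeof(f->dport));
	} else {
		h = mix(h, &f->flow_label, sizeof(f->flow_label));
	}
	return mix(h, &f->proto, sizeof(f->proto));
}
```

With policy 0, two UDP flows between the same hosts that carry the same flow label and differ only in destination port produce the same hash, so they follow the same nexthop. With policy 1 the port difference feeds into the hash, so such flows can be spread across nexthops.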

The first 3 patches modify the IPv4 variant so that at the end of the
patch set the ipv4 and ipv6 implementations are direct parallels.

Patch 4 refactors the existing rt6_multipath_hash in preparation for
adding the policy option.

Patch 5 renames the existing netevent to have IPv4 in the name so ipv4
changes can be distinguished from IPv6 if the netevent handler cares.

Patch 6 adds the skb as an argument through the FIB lookup functions
to the multipath selection. Needed for the forwarding case.
 
Patch 7 adds the L4 hash support.

Patch 8 adds the hook for the netevent to the spectrum driver to update
the ASIC.

Patch 9 removes no longer used code.

Patch 10 adds a testcase for IPv6 multipath with L4 hash.

v1 to v2
- rebased to top of tree
- added refactor of fib_multipath_hash following recent change
- plumb skb through lookup functions to multipath selection
- fix sysctl setting; was missing the data set in ipv6_sysctl_net_init
- added test case

RFC to v1:
- rebase to top of net-next
- fix addr_type in hash_keys and removed flow label as noticed by Ido
- added a comment to cover letter about choice in algorithms based on
  use case per Or's comments

David Ahern (10):
  net/ipv4: Pass net to fib_multipath_hash instead of fib_info
  net: Align ip_multipath_l3_keys and ip6_multipath_l3_keys
  net/ipv4: Simplify fib_multipath_hash with optional flow keys
  net/ipv6: Make rt6_multipath_hash similar to fib_multipath_hash
  net: Rename NETEVENT_MULTIPATH_HASH_UPDATE
  net/ipv6: Pass skb to route lookup
  net/ipv6: Add support for path selection using hash of 5-tuple
  mlxsw: spectrum_router: Add support for ipv6 hash policy update
  net: Remove unused get_hash_from_flow functions
  selftests: forwarding: Add multipath test for L4 hashing

 Documentation/networking/ip-sysctl.txt |   7 ++
 drivers/infiniband/core/cma.c  |   2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |  13 +-
 drivers/net/ipvlan/ipvlan_core.c   |   3 +-
 drivers/net/vrf.c  |   7 +-
 include/net/fib_rules.h|   1 +
 include/net/flow.h |  15 ---
 include/net/ip6_fib.h  |   4 +-
 include/net/ip6_route.h|  15 ++-
 include/net/ip_fib.h   |   5 +-
 include/net/netevent.h |   3 +-
 include/net/netns/ipv6.h   |   1 +
 net/core/flow_dissector.c  |  16 ---
 net/ipv4/fib_semantics.c   |   2 +-
 net/ipv4/route.c   |  53 
 net/ipv4/sysctl_net_ipv4.c |   2 +-
 net/ipv6/anycast.c |   2 +-
 net/ipv6/fib6_rules.c  |   8 +-
 net/ipv6/icmp.c|   5 +-
 net/ipv6/ip6_fib.c |   3 +-
 net/ipv6/ip6_gre.c |   2 +-
 net/ipv6/ip6_tunnel.c  |   4 +-
 net/ipv6/ip6_vti.c |   2 +-
 net/ipv6/mcast.c   |   4 +-
 net/ipv6/netfilter/ip6t_rpfilter.c |   2 +-
 net/ipv6/netfilter/nft_fib_ipv6.c  |   3 +-
 net/ipv6/route.c   | 134 +++--
 net/ipv6/seg6_local.c  |   4 +-
 net/ipv6/sysctl_net_ipv6.c |  27 +
 .../selftests/net/forwarding/router_multipath.sh   |  44 +++
 30 files changed, 261 insertions(+), 132 deletions(-)

-- 
2.11.0

[PATCH v2 net-next 08/10] mlxsw: spectrum_router: Add support for ipv6 hash policy update

2018-03-01 Thread David Ahern

Similar to 28678f07f127d ("mlxsw: spectrum_router: Update multipath hash
parameters upon netevents") for IPv4, make sure the kernel and asic are
using the same hash algorithm for path selection.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 93d48c1b2bf8..6f0457b6e408 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2431,6 +2431,7 @@ static int mlxsw_sp_router_netevent_event(struct 
notifier_block *nb,
mlxsw_sp_port_dev_put(mlxsw_sp_port);
break;
case NETEVENT_IPV4_MPATH_HASH_UPDATE:
+   case NETEVENT_IPV6_MPATH_HASH_UPDATE:
net = ptr;
 
if (!net_eq(net, &init_net))
@@ -7030,13 +7031,21 @@ static void mlxsw_sp_mp4_hash_init(char *recr2_pl)
 
 static void mlxsw_sp_mp6_hash_init(char *recr2_pl)
 {
+   bool only_l3 = !init_net.ipv6.sysctl.multipath_hash_policy;
+
mlxsw_sp_mp_hash_header_set(recr2_pl,
MLXSW_REG_RECR2_IPV6_EN_NOT_TCP_NOT_UDP);
mlxsw_sp_mp_hash_header_set(recr2_pl, MLXSW_REG_RECR2_IPV6_EN_TCP_UDP);
mlxsw_reg_recr2_ipv6_sip_enable(recr2_pl);
mlxsw_reg_recr2_ipv6_dip_enable(recr2_pl);
-   mlxsw_sp_mp_hash_field_set(recr2_pl, MLXSW_REG_RECR2_IPV6_FLOW_LABEL);
mlxsw_sp_mp_hash_field_set(recr2_pl, MLXSW_REG_RECR2_IPV6_NEXT_HEADER);
+   if (only_l3) {
+   mlxsw_sp_mp_hash_field_set(recr2_pl, MLXSW_REG_RECR2_IPV6_FLOW_LABEL);
+   } else {
+   mlxsw_sp_mp_hash_header_set(recr2_pl, MLXSW_REG_RECR2_TCP_UDP_EN_IPV6);
+   mlxsw_sp_mp_hash_field_set(recr2_pl, MLXSW_REG_RECR2_TCP_UDP_SPORT);
+   mlxsw_sp_mp_hash_field_set(recr2_pl, MLXSW_REG_RECR2_TCP_UDP_DPORT);
+   }
 }
 
 static int mlxsw_sp_mp_hash_init(struct mlxsw_sp *mlxsw_sp)
-- 
2.11.0

[PATCH v2 net-next 10/10] selftests: forwarding: Add multipath test for L4 hashing

2018-03-01 Thread David Ahern

Add IPv6 multipath test using L4 hashing. Created with inputs from
Ido Schimmel.

Signed-off-by: David Ahern 
---
 .../selftests/net/forwarding/router_multipath.sh   | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/router_multipath.sh 
b/tools/testing/selftests/net/forwarding/router_multipath.sh
index 55595305a604..3bc351008db6 100755
--- a/tools/testing/selftests/net/forwarding/router_multipath.sh
+++ b/tools/testing/selftests/net/forwarding/router_multipath.sh
@@ -235,6 +235,45 @@ multipath4_test()
sysctl -q -w net.ipv4.fib_multipath_hash_policy=$hash_policy
 }
 
+multipath6_l4_test()
+{
+   local desc="$1"
+   local weight_rp12=$2
+   local weight_rp13=$3
+   local t0_rp12 t0_rp13 t1_rp12 t1_rp13
+   local packets_rp12 packets_rp13
+   local hash_policy
+
+   # Transmit multiple flows from h1 to h2 and make sure they are
+   # distributed between both multipath links (rp12 and rp13)
+   # according to the configured weights.
+   hash_policy=$(sysctl -n net.ipv6.fib_multipath_hash_policy)
+   sysctl -q -w net.ipv6.fib_multipath_hash_policy=1
+
+   ip route replace 2001:db8:2::/64 vrf vrf-r1 \
+  nexthop via fe80:2::22 dev $rp12 weight $weight_rp12 \
+  nexthop via fe80:3::23 dev $rp13 weight $weight_rp13
+
+   t0_rp12=$(link_stats_tx_packets_get $rp12)
+   t0_rp13=$(link_stats_tx_packets_get $rp13)
+
+   $MZ $h1 -6 -q -p 64 -A 2001:db8:1::2 -B 2001:db8:2::2 \
+  -d 1msec -t udp "sp=1024,dp=0-32768"
+
+   t1_rp12=$(link_stats_tx_packets_get $rp12)
+   t1_rp13=$(link_stats_tx_packets_get $rp13)
+
+   let "packets_rp12 = $t1_rp12 - $t0_rp12"
+   let "packets_rp13 = $t1_rp13 - $t0_rp13"
+   multipath_eval "$desc" $weight_rp12 $weight_rp13 $packets_rp12 $packets_rp13
+
+   ip route replace 2001:db8:2::/64 vrf vrf-r1 \
+  nexthop via fe80:2::22 dev $rp12 \
+  nexthop via fe80:3::23 dev $rp13
+
+   sysctl -q -w net.ipv6.fib_multipath_hash_policy=$hash_policy
+}
+
 multipath6_test()
 {
local desc="$1"
@@ -278,6 +317,11 @@ multipath_test()
multipath6_test "ECMP" 1 1
multipath6_test "Weighted MP 2:1" 2 1
multipath6_test "Weighted MP 11:45" 11 45
+
+   log_info "Running IPv6 L4 hash multipath tests"
+   multipath6_l4_test "ECMP" 1 1
+   multipath6_l4_test "Weighted MP 2:1" 2 1
+   multipath6_l4_test "Weighted MP 11:45" 11 45
 }
 
 setup_prepare()
-- 
2.11.0

Re: [PATCH v2 net-next 5/5] net: dsa: mv88e6xxx: Get mv88e6352 SERDES statistics

2018-03-01 Thread Andrew Lunn

> +void mv88e6352_serdes_get_strings(struct mv88e6xxx_chip *chip,
> +   int port, uint8_t *data)
> +{
> + struct mv88e6352_serdes_hw_stat *stat;
> + int i;
> +
> + if (!mv88e6352_port_has_serdes(chip, port))
> + return;
> +
> + for (i = 0; i < ARRAY_SIZE(mv88e6352_serdes_hw_stats); i++) {
> + stat = &mv88e6352_serdes_hw_stats[i];
> + memcpy(data + i * ETH_GSTRING_LEN, stat->string,
> +ETH_GSTRING_LEN);

This has the same problem as Florian just fixed, using memcpy instead
of strncpy. I will spin a new version with this fixed.

   Andrew

Re: [PATCH net 4/4] net: dsa: mv88e6xxx: Utilize strncpy() for ethtool::get_strings

2018-03-01 Thread Andrew Lunn

On Thu, Mar 01, 2018 at 04:25:29PM -0800, Florian Fainelli wrote:
> Do not use memcpy() which is not safe, but instead use strncpy() which
> will make sure that the string is NUL terminated (in the Linux
> implementation) if the string is smaller than the length specified. This
> fixes KASAN out of bounds warnings while fetching port statistics.
> 
> Fixes: f5e2ed022dff ("dsa: mv88e6xxx: Add Second back of statistics")

I'm sure it goes back much further than that.

> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew
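
A minimal userspace sketch of the difference, assuming a fixed-width string table like the one ethtool get_strings fills. GSTRING_LEN and put_stat_name() are hypothetical names for illustration; only the value 32 mirrors ETH_GSTRING_LEN.

```c
#include <stddef.h>
#include <string.h>

#define GSTRING_LEN 32	/* same value as ETH_GSTRING_LEN */

/* Copy one statistic name into slot 'index' of an ethtool-style
 * string table. strncpy() stops reading 'name' at its NUL and
 * zero-fills the rest of the slot, so a short string literal is
 * never read past its end. memcpy() with GSTRING_LEN would read
 * 32 bytes from the source no matter how short the string is.
 */
static void put_stat_name(char *data, size_t index, const char *name)
{
	strncpy(data + index * GSTRING_LEN, name, GSTRING_LEN);
}
```

Note that strncpy() does not add a terminator when the name is GSTRING_LEN characters or longer; the slot is then filled completely.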

[Patch nf-next] netfilter: make xt_rateest hash table per net

2018-03-01 Thread Cong Wang

As suggested by Eric, we need to make the xt_rateest
hash table and its lock per netns to reduce lock
contention.

Cc: Florian Westphal 
Cc: Eric Dumazet 
Cc: Pablo Neira Ayuso 
Signed-off-by: Cong Wang 
---
 include/net/netfilter/xt_rateest.h |  4 +-
 net/netfilter/xt_RATEEST.c | 91 +++---
 net/netfilter/xt_rateest.c | 10 ++---
 3 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/include/net/netfilter/xt_rateest.h 
b/include/net/netfilter/xt_rateest.h
index b1db13772554..832ab69efda5 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -21,7 +21,7 @@ struct xt_rateest {
struct net_rate_estimator __rcu *rate_est;
 };
 
-struct xt_rateest *xt_rateest_lookup(const char *name);
-void xt_rateest_put(struct xt_rateest *est);
+struct xt_rateest *xt_rateest_lookup(struct net *net, const char *name);
+void xt_rateest_put(struct net *net, struct xt_rateest *est);
 
 #endif /* _XT_RATEEST_H */
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 141c295191f6..dec843cadf46 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -14,15 +14,21 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 
-static DEFINE_MUTEX(xt_rateest_mutex);
-
 #define RATEEST_HSIZE  16
-static struct hlist_head rateest_hash[RATEEST_HSIZE] __read_mostly;
+
+struct xt_rateest_net {
+   struct mutex hash_lock;
+   struct hlist_head hash[RATEEST_HSIZE];
+};
+
+static unsigned int xt_rateest_id;
+
 static unsigned int jhash_rnd __read_mostly;
 
 static unsigned int xt_rateest_hash(const char *name)
@@ -31,21 +37,23 @@ static unsigned int xt_rateest_hash(const char *name)
   (RATEEST_HSIZE - 1);
 }
 
-static void xt_rateest_hash_insert(struct xt_rateest *est)
+static void xt_rateest_hash_insert(struct xt_rateest_net *xn,
+  struct xt_rateest *est)
 {
unsigned int h;
 
h = xt_rateest_hash(est->name);
-   hlist_add_head(&est->list, &rateest_hash[h]);
+   hlist_add_head(&est->list, &xn->hash[h]);
 }
 
-static struct xt_rateest *__xt_rateest_lookup(const char *name)
+static struct xt_rateest *__xt_rateest_lookup(struct xt_rateest_net *xn,
+ const char *name)
 {
struct xt_rateest *est;
unsigned int h;
 
h = xt_rateest_hash(name);
-   hlist_for_each_entry(est, &rateest_hash[h], list) {
+   hlist_for_each_entry(est, &xn->hash[h], list) {
if (strcmp(est->name, name) == 0) {
est->refcnt++;
return est;
@@ -55,20 +63,23 @@ static struct xt_rateest *__xt_rateest_lookup(const char 
*name)
return NULL;
 }
 
-struct xt_rateest *xt_rateest_lookup(const char *name)
+struct xt_rateest *xt_rateest_lookup(struct net *net, const char *name)
 {
+   struct xt_rateest_net *xn = net_generic(net, xt_rateest_id);
struct xt_rateest *est;
 
-   mutex_lock(&xt_rateest_mutex);
-   est = __xt_rateest_lookup(name);
-   mutex_unlock(&xt_rateest_mutex);
+   mutex_lock(&xn->hash_lock);
+   est = __xt_rateest_lookup(xn, name);
+   mutex_unlock(&xn->hash_lock);
return est;
 }
 EXPORT_SYMBOL_GPL(xt_rateest_lookup);
 
-void xt_rateest_put(struct xt_rateest *est)
+void xt_rateest_put(struct net *net, struct xt_rateest *est)
 {
-   mutex_lock(&xt_rateest_mutex);
+   struct xt_rateest_net *xn = net_generic(net, xt_rateest_id);
+
+   mutex_lock(&xn->hash_lock);
if (--est->refcnt == 0) {
hlist_del(&est->list);
gen_kill_estimator(&est->rate_est);
@@ -78,7 +89,7 @@ void xt_rateest_put(struct xt_rateest *est)
 */
kfree_rcu(est, rcu);
}
-   mutex_unlock(&xt_rateest_mutex);
+   mutex_unlock(&xn->hash_lock);
 }
 EXPORT_SYMBOL_GPL(xt_rateest_put);
 
@@ -98,6 +109,7 @@ xt_rateest_tg(struct sk_buff *skb, const struct 
xt_action_param *par)
 
 static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
 {
+   struct xt_rateest_net *xn = net_generic(par->net, xt_rateest_id);
struct xt_rateest_target_info *info = par->targinfo;
struct xt_rateest *est;
struct {
@@ -108,10 +120,10 @@ static int xt_rateest_tg_checkentry(const struct 
xt_tgchk_param *par)
 
net_get_random_once(&jhash_rnd, sizeof(jhash_rnd));
 
-   mutex_lock(&xt_rateest_mutex);
-   est = __xt_rateest_lookup(info->name);
+   mutex_lock(&xn->hash_lock);
+   est = __xt_rateest_lookup(xn, info->name);
if (est) {
-   mutex_unlock(&xt_rateest_mutex);
+   mutex_unlock(&xn->hash_lock);
/*
 * If estimator parameters are specified, they must match the
 * existing estimator.
@@ -119,7 +131,7 @@ static int xt_rateest_tg_checkentry(const struct 
xt_tgchk_param *par)

Re: [PATCH bpf-next v2 0/8] tools: bpftool: visualization support for eBPF program

2018-03-01 Thread Alexei Starovoitov

On Thu, Mar 01, 2018 at 06:01:15PM -0800, Jakub Kicinski wrote:
> Jiong says:
> 
> This patch set is an application of CFG information on eBPF program
> visualization. It presents some initial code for building CFG information
> from eBPF instruction sequences.
> 
> After we get eBPF program bytecode, we do sub-program detection and
> basic-block partition. This information is then visualized as a DOT
> graph.
> 
> The user could use any DOT graphic tools (xdot, graphviz etc) to view it.
> 
> For example:
> 
>   bpftool prog dump xlated id 2 visual &>output.dot
> 
>   [xdot | dotty] output.dot
>   dot -Tpng -o output.png

I think the generated .png isn't readable even for simple and short programs.
Maybe some fancy visualizer can help, but I like the cfg parser a lot.
Maybe you can show jumps and cfg the way perf does it in text with arrows;
then it will be much more usable for text-only folks like me.

Applied to bpf-next, Thanks everyone.

Re: [PATCH] pci-iov: Add support for unmanaged SR-IOV

2018-03-01 Thread Alexander Duyck

On Thu, Mar 1, 2018 at 3:58 PM, Alex Williamson
 wrote:
> On Thu, 1 Mar 2018 14:42:40 -0800
> Alexander Duyck  wrote:
>
>> On Thu, Mar 1, 2018 at 12:22 PM, Alex Williamson
>>  wrote:
>> > On Wed, 28 Feb 2018 16:36:38 -0800
>> > Alexander Duyck  wrote:
>> >
>> >> On Wed, Feb 28, 2018 at 2:59 PM, Alex Williamson
>> >>  wrote:
>> >> > On Wed, 28 Feb 2018 09:49:21 -0800
>> >> > Alexander Duyck  wrote:
>> >> >
>> >> >> On Tue, Feb 27, 2018 at 2:25 PM, Alexander Duyck
>> >> >>  wrote:
>> >> >> > On Tue, Feb 27, 2018 at 1:40 PM, Alex Williamson
>> >> >> >  wrote:
>> >> >> >> On Tue, 27 Feb 2018 11:06:54 -0800
>> >> >> >> Alexander Duyck  wrote:
>> >> >> >>
>> >> >> >>> From: Alexander Duyck 
>> >> >> >>>
>> >> >> >>> This patch is meant to add support for SR-IOV on devices when the 
>> >> >> >>> VFs are
>> >> >> >>> not managed by the kernel. Examples of recent patches attempting 
>> >> >> >>> to do this
>> >> >> >>> include:
>> >> >> >>
>> >> >> >> It appears to enable sriov when the _pf_ is not managed by the
>> >> >> >> kernel, but by "managed" we mean that either there is no pf driver 
>> >> >> >> or
>> >> >> >> the pf driver doesn't provide an sriov_configure callback,
>> >> >> >> intentionally or otherwise.
>> >> >> >>
>> >> >> >>> virtio - https://patchwork.kernel.org/patch/10241225/
>> >> >> >>> pci-stub - https://patchwork.kernel.org/patch/10109935/
>> >> >> >>> vfio - https://patchwork.kernel.org/patch/10103353/
>> >> >> >>> uio - https://patchwork.kernel.org/patch/9974031/
>> >> >> >>
>> >> >> >> So is the goal to get around the issues with enabling sriov on each 
>> >> >> >> of
>> >> >> >> the above drivers by doing it under the covers or are you really 
>> >> >> >> just
>> >> >> >> trying to enable sriov for a truly unmanaged (no pf driver) case?
>> >> >> >> For
>> >> >> >> example, should a driver explicitly not wanting sriov enabled 
>> >> >> >> implement
>> >> >> >> a dummy sriov_configure function?
>> >> >> >>
>> >> >> >>> Since this is quickly blowing up into a multi-driver problem it is 
>> >> >> >>> probably
>> >> >> >>> best to implement this solution in one spot.
>> >> >> >>>
>> >> >> >>> This patch is an attempt to do that. What we do with this patch is 
>> >> >> >>> provide
>> >> >> >>> a generic call to enable SR-IOV in the case that the PF driver is 
>> >> >> >>> either
>> >> >> >>> not present, or the PF driver doesn't support configuring SR-IOV.
>> >> >> >>>
>> >> >> >>> A new sysfs value called sriov_unmanaged_autoprobe has been added. 
>> >> >> >>> This
>> >> >> >>> value is used as the drivers_autoprobe setting of the VFs when 
>> >> >> >>> they are
>> >> >> >>> being managed by an external entity such as userspace or device 
>> >> >> >>> firmware
>> >> >> >>> instead of being managed by the kernel.
>> >> >> >>
>> >> >> >> Documentation/ABI/testing/sysfs-bus-pci update is missing.
>> >> >> >
>> >> >> > I can make sure to update that in the next version.
>> >> >> >
>> >> >> >>> One side effect of this change is that the sriov_drivers_autoprobe 
>> >> >> >>> and
>> >> >> >>> sriov_unmanaged_autoprobe will only apply their updates when 
>> >> >> >>> SR-IOV is
>> >> >> >>> disabled. Attempts to update them when SR-IOV is in use will only 
>> >> >> >>> update
>> >> >> >>> the local value and will not update sriov->autoprobe.
>> >> >> >>
>> >> >> >> And we expect users to understand when sriov_drivers_autoprobe 
>> >> >> >> applies
>> >> >> >> vs sriov_unmanaged_autoprobe, even though they're using the same
>> >> >> >> interfaces to enable sriov?  Are all combinations expected to work, 
>> >> >> >> ex.
>> >> >> >> unmanaged sriov is enabled, a native pf driver loads, vfs work?  Not
>> >> >> >> only does it seem like there's opportunity to use this 
>> >> >> >> incorrectly, I
>> >> >> >> think maybe it might be difficult to use correctly.
>> >> >> >>
>> >> >> >>> I based my patch set originally on the patch by Mark Rustad but 
>> >> >> >>> there isn't
>> >> >> >>> much left after going through and cleaning out the bits that were 
>> >> >> >>> no longer
>> >> >> >>> needed, and after incorporating the feedback from David Miller.
>> >> >> >>>
>> >> >> >>> I have included the authors of the original 4 patches above in the 
>> >> >> >>> Cc here.
>> >> >> >>> My hope is to get feedback and/or review on if this works for 
>> >> >> >>> their use
>> >> >> >>> cases.
>> >> >> >>>
>> >> >> >>> Cc: Mark Rustad 
>> >> >> >>> Cc: Maximilian Heyne 
>> >> >> >>> Cc: Liang-Min Wang 
>> >> >> >>> Cc: David Woodhouse 
>> >> >> >>> Signed-off-by: Alexander Duyck 
>> >> >> >>> ---
>> >> >> >>>  drivers/pci/iov.c|

Re: [PATCH net-next 0/2] tcp_bbr: more GSO work

2018-03-01 Thread David Miller

From: Eric Dumazet 
Date: Wed, 28 Feb 2018 14:40:45 -0800

> Playing with r8152 USB 1Gbit NIC, on both USB2 and USB3 slots,
> I found that BBR was performing poorly, because of TSO being limited to 16KB.
> 
> This patch series makes sure BBR is not underestimating the number
> of packets that are needed to fill the pipe when a device has
> suboptimal TSO limits.

Series applied, thanks Eric.

Re: pull-request: bpf 2018-02-28

2018-03-01 Thread David Miller

From: Daniel Borkmann 
Date: Wed, 28 Feb 2018 21:27:58 +0100

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Add schedule points and reduce the number of loop iterations
>the test_bpf kernel module is performing in order to not hog
>the CPU for too long, from Eric.
> 
> 2) Fix an out of bounds access in tail calls in the ppc64 BPF
>JIT compiler, from Daniel.
> 
> 3) Fix a crash on arm64 on unaligned BPF xadd operations that
>could be triggered via interpreter and JIT, from Daniel.
> 
> Please note that once you merge net into net-next at some point, there
> is a minor merge conflict in test_verifier.c since test cases had
> been added at the end in both trees. Resolution is trivial: keep all
> the test cases from both trees.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.

Re: [PATCH net-next] socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)

2018-03-01 Thread David Miller

From: Soheil Hassas Yeganeh 
Date: Tue, 27 Feb 2018 18:22:40 -0500

> From: Soheil Hassas Yeganeh 
> 
> recvmmsg does not call ___sys_recvmsg when sk_err is set.
> That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
> should always call ___sys_recvmsg regardless of sk->sk_err to
> be able to clear error queue. Otherwise, users are not able to
> drain the error queue using recvmmsg.
> 
> Signed-off-by: Soheil Hassas Yeganeh 
> Reviewed-by: Eric Dumazet 
> Signed-off-by: Willem de Bruijn 

Applied, thank you.
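
For context, a rough userspace sketch of draining the error queue in batches with recvmmsg(). drain_errqueue() and BATCH are hypothetical names, the buffer sizes are arbitrary assumptions, and the sketch discards the messages instead of parsing the control data a real consumer would inspect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for recvmmsg() and struct mmsghdr */
#endif
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 8

/* Read and discard everything on the socket error queue.
 * Returns the number of error-queue messages consumed.
 */
static int drain_errqueue(int fd)
{
	struct mmsghdr msgs[BATCH];
	struct iovec iov[BATCH];
	char data[BATCH][256];
	char ctrl[BATCH][512];
	int total = 0;
	int n, i;

	do {
		/* Headers must be reset before every call. */
		memset(msgs, 0, sizeof(msgs));
		for (i = 0; i < BATCH; i++) {
			iov[i].iov_base = data[i];
			iov[i].iov_len = sizeof(data[i]);
			msgs[i].msg_hdr.msg_iov = &iov[i];
			msgs[i].msg_hdr.msg_iovlen = 1;
			msgs[i].msg_hdr.msg_control = ctrl[i];
			msgs[i].msg_hdr.msg_controllen = sizeof(ctrl[i]);
		}
		n = recvmmsg(fd, msgs, BATCH,
			     MSG_ERRQUEUE | MSG_DONTWAIT, NULL);
		if (n > 0)
			total += n;
	} while (n > 0);

	return total;
}
```

The loop stops on the first call that returns no messages, which on an empty error queue is expected to be -1 with errno set to EAGAIN.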

Re: [PATCH] net: allow interface to be set into VRF if VLAN interface in same VRF

2018-03-01 Thread David Miller

From: Mike Manning 
Date: Mon, 26 Feb 2018 23:49:30 +

> Setting an interface into a VRF fails with 'RTNETLINK answers: File
> exists' if one of its VLAN interfaces is already in the same VRF.
> As the VRF is an upper device of the VLAN interface, it is also showing
> up as an upper device of the interface itself. The solution is to
> restrict this check to devices other than master. As only one master
> device can be linked to a device, the check in this case is that the
> upper device (VRF) being linked to is not the same as the master device
> instead of it not being any one of the upper devices.
> 
> The following example shows an interface ens12 (with a VLAN interface
> ens12.10) being set into VRF green, which behaves as expected:
> 
>   # ip link add link ens12 ens12.10 type vlan id 10
>   # ip link set dev ens12 master vrfgreen
>   # ip link show dev ens12
> 3: ens12:  mtu 1500 qdisc fq_codel
>master vrfgreen state UP mode DEFAULT group default qlen 1000
>link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
> 
> But if the VLAN interface has previously been set into the same VRF,
> then setting the interface into the VRF fails:
> 
>   # ip link set dev ens12 nomaster
>   # ip link set dev ens12.10 master vrfgreen
>   # ip link show dev ens12.10
> 39: ens12.10@ens12:  mtu 1500
> qdisc noqueue master vrfgreen state UP mode DEFAULT group default
> qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
>   # ip link set dev ens12 master vrfgreen
> RTNETLINK answers: File exists
> 
> The workaround is to move the VLAN interface back into the default VRF
> beforehand, but it has to be shut first so as to avoid the risk of
> traffic leaking from the VRF. This fix avoids needing this workaround.
> 
> Signed-off-by: Mike Manning 

Applied, thanks Mike.

[PATCH net-next] selftests: rtnetlink: remove testns on test fail

2018-03-01 Thread Prashant Bhole

This patch removes testns after a test failure so that the next test can
continue with a clean ns.

Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/net/rtnetlink.sh | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/net/rtnetlink.sh 
b/tools/testing/selftests/net/rtnetlink.sh
index a622eeecc3a6..e6f485235435 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -517,6 +517,7 @@ kci_test_gretap()
ip link help gretap 2>&1 | grep -q "^Usage:"
if [ $? -ne 0 ];then
echo "SKIP: gretap: iproute2 too old"
+   ip netns del "$testns"
return 1
fi
 
@@ -543,6 +544,7 @@ kci_test_gretap()
 
if [ $ret -ne 0 ]; then
echo "FAIL: gretap"
+   ip netns del "$testns"
return 1
fi
echo "PASS: gretap"
@@ -565,6 +567,7 @@ kci_test_ip6gretap()
ip link help ip6gretap 2>&1 | grep -q "^Usage:"
if [ $? -ne 0 ];then
echo "SKIP: ip6gretap: iproute2 too old"
+   ip netns del "$testns"
return 1
fi
 
@@ -591,6 +594,7 @@ kci_test_ip6gretap()
 
if [ $ret -ne 0 ]; then
echo "FAIL: ip6gretap"
+   ip netns del "$testns"
return 1
fi
echo "PASS: ip6gretap"
@@ -655,6 +659,7 @@ kci_test_erspan()
 
if [ $ret -ne 0 ]; then
echo "FAIL: erspan"
+   ip netns del "$testns"
return 1
fi
echo "PASS: erspan"
@@ -720,6 +725,7 @@ kci_test_ip6erspan()
 
if [ $ret -ne 0 ]; then
echo "FAIL: ip6erspan"
+   ip netns del "$testns"
return 1
fi
echo "PASS: ip6erspan"
-- 
2.13.6

[PATCH net-next 0/6] enic update

2018-03-01 Thread Govindarajulu Varadarajan

This series adds support for IPv6 vxlan offload and UDP rss along with a
bug fix in filling the rq ring.

Govindarajulu Varadarajan (6):
  enic: Check inner ip proto for pseudo header csum
  enic: Add vxlan offload support for IPv6 pkts
  enic: Check if hw supports multi wq with vxlan offload
  enic: set UDP rss flag
  enic: enable rq before updating rq descriptors
  enic: set IG desc cache flag in open

 drivers/net/ethernet/cisco/enic/enic.h |  3 +-
 drivers/net/ethernet/cisco/enic/enic_ethtool.c | 36 
 drivers/net/ethernet/cisco/enic/enic_main.c| 81 +-
 drivers/net/ethernet/cisco/enic/vnic_dev.c | 22 ++-
 drivers/net/ethernet/cisco/enic/vnic_dev.h |  3 +-
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h  |  5 ++
 drivers/net/ethernet/cisco/enic/vnic_nic.h |  1 +
 7 files changed, 131 insertions(+), 20 deletions(-)

-- 
2.16.2

Re: [PATCH net-next v3 0/5] net: phy: Reduce duplication

2018-03-01 Thread David Miller

From: Florian Fainelli 
Date: Thu,  1 Mar 2018 16:08:54 -0800

> This patch series reduces the duplication among 10G PHY drivers that just
> essentially stub most functions, but do that while replicating what the 
> existing
> generic functions do.

Series applied, thanks Florian.

[PATCH net-next 4/6] enic: set UDP rss flag

2018-03-01 Thread Govindarajulu Varadarajan

New hardware needs UDP flag set to enable UDP L4 rss hash. Add ethtool
get option to display supported rss flow hash.

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic_ethtool.c | 36 ++
 drivers/net/ethernet/cisco/enic/enic_main.c|  4 ++-
 drivers/net/ethernet/cisco/enic/vnic_dev.c | 17 
 drivers/net/ethernet/cisco/enic/vnic_dev.h |  1 +
 drivers/net/ethernet/cisco/enic/vnic_nic.h |  1 +
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_ethtool.c 
b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
index efb9333c7cf8..869006c2002d 100644
--- a/drivers/net/ethernet/cisco/enic/enic_ethtool.c
+++ b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
@@ -474,6 +474,39 @@ static int enic_grxclsrule(struct enic *enic, struct 
ethtool_rxnfc *cmd)
return 0;
 }
 
+static int enic_get_rx_flow_hash(struct enic *enic, struct ethtool_rxnfc *cmd)
+{
+   cmd->data = 0;
+
+   switch (cmd->flow_type) {
+   case TCP_V6_FLOW:
+   case TCP_V4_FLOW:
+   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   /* Fall through */
+   case UDP_V6_FLOW:
+   case UDP_V4_FLOW:
+   if (vnic_dev_capable_udp_rss(enic->vdev))
+   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   /* Fall through */
+   case SCTP_V4_FLOW:
+   case AH_ESP_V4_FLOW:
+   case AH_V4_FLOW:
+   case ESP_V4_FLOW:
+   case SCTP_V6_FLOW:
+   case AH_ESP_V6_FLOW:
+   case AH_V6_FLOW:
+   case ESP_V6_FLOW:
+   case IPV4_FLOW:
+   case IPV6_FLOW:
+   cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int enic_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
  u32 *rule_locs)
 {
@@ -500,6 +533,9 @@ static int enic_get_rxnfc(struct net_device *dev, struct 
ethtool_rxnfc *cmd,
ret = enic_grxclsrule(enic, cmd);
spin_unlock_bh(&enic->rfs_h.lock);
break;
+   case ETHTOOL_GRXFH:
+   ret = enic_get_rx_flow_hash(enic, cmd);
+   break;
default:
ret = -EOPNOTSUPP;
break;
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index 3280a05f9cf1..5213bc01a6e9 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2316,7 +2316,7 @@ static int enic_set_rss_nic_cfg(struct enic *enic)
 {
struct device *dev = enic_get_dev(enic);
const u8 rss_default_cpu = 0;
-   const u8 rss_hash_type = NIC_CFG_RSS_HASH_TYPE_IPV4 |
+   u8 rss_hash_type = NIC_CFG_RSS_HASH_TYPE_IPV4 |
NIC_CFG_RSS_HASH_TYPE_TCP_IPV4 |
NIC_CFG_RSS_HASH_TYPE_IPV6 |
NIC_CFG_RSS_HASH_TYPE_TCP_IPV6;
@@ -2324,6 +2324,8 @@ static int enic_set_rss_nic_cfg(struct enic *enic)
const u8 rss_base_cpu = 0;
u8 rss_enable = ENIC_SETTING(enic, RSS) && (enic->rq_count > 1);
 
+   if (vnic_dev_capable_udp_rss(enic->vdev))
+   rss_hash_type |= NIC_CFG_RSS_HASH_TYPE_UDP;
if (rss_enable) {
if (!enic_set_rsskey(enic)) {
if (enic_set_rsscpu(enic, rss_hash_bits)) {
diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c 
b/drivers/net/ethernet/cisco/enic/vnic_dev.c
index b60fb6e3e775..a2b376055b2d 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
@@ -1281,3 +1281,20 @@ int vnic_dev_get_supported_feature_ver(struct vnic_dev 
*vdev, u8 feature,
 
return ret;
 }
+
+bool vnic_dev_capable_udp_rss(struct vnic_dev *vdev)
+{
+   u64 a0 = CMD_NIC_CFG, a1 = 0;
+   u64 rss_hash_type;
+   int wait = 1000;
+   int err;
+
+   err = vnic_dev_cmd(vdev, CMD_CAPABILITY, &a0, &a1, wait);
+   if (err || !a0)
+   return 0;
+
+   rss_hash_type = (a1 >> NIC_CFG_RSS_HASH_TYPE_SHIFT) &
+   NIC_CFG_RSS_HASH_TYPE_MASK_FIELD;
+
+   return (rss_hash_type & NIC_CFG_RSS_HASH_TYPE_UDP);
+}
diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.h 
b/drivers/net/ethernet/cisco/enic/vnic_dev.h
index db160f459852..59d4cc8fbb85 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.h
@@ -184,5 +184,6 @@ int vnic_dev_overlay_offload_cfg(struct vnic_dev *vdev, u8 
overlay,
 u16 vxlan_udp_port_number);
 int vnic_dev_get_supported_feature_ver(struct vnic_dev *vdev, u8 feature,
   u64 *supported_versions, u64 *a1);
+bool vnic_dev_capable_udp_rss(struct vnic_dev *vdev);
 
 #endif /* _VNIC_DEV_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/vnic_nic.h

[PATCH net-next 6/6] enic: set IG desc cache flag in open

2018-03-01 Thread Govindarajulu Varadarajan

New adapters need the CMD_OPENF_IG_DESCCACHE flag to be set. If this flag
is not set, fw flushes the global IG desc cache. The flag is a no-op on
older adapters.

Also increment the driver version.

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic.h| 2 +-
 drivers/net/ethernet/cisco/enic/enic_main.c   | 3 ++-
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic.h 
b/drivers/net/ethernet/cisco/enic/enic.h
index 83be9a5e8daa..0dd64acd2a3f 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -33,7 +33,7 @@
 
 #define DRV_NAME   "enic"
 #define DRV_DESCRIPTION"Cisco VIC Ethernet NIC Driver"
-#define DRV_VERSION"2.3.0.45"
+#define DRV_VERSION"2.3.0.53"
 #define DRV_COPYRIGHT  "Copyright 2008-2013 Cisco Systems, Inc"
 
 #define ENIC_BARS_MAX  6
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index 243d5c5fd5e3..a25fb95492a0 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2196,9 +2196,10 @@ static int enic_dev_wait(struct vnic_dev *vdev,
 static int enic_dev_open(struct enic *enic)
 {
int err;
+   u32 flags = CMD_OPENF_IG_DESCCACHE;
 
err = enic_dev_wait(enic->vdev, vnic_dev_open,
-   vnic_dev_open_done, 0);
+   vnic_dev_open_done, flags);
if (err)
dev_err(enic_get_dev(enic), "vNIC device open failed, err %d\n",
err);
diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h 
b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
index 8fce9ef1c9bc..41de4ba622a1 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
@@ -439,6 +439,7 @@ enum vnic_devcmd_cmd {
 
 /* flags for CMD_OPEN */
 #define CMD_OPENF_OPROM0x1 /* open coming from option rom 
*/
+#define CMD_OPENF_IG_DESCCACHE 0x2 /* Do not flush IG DESC cache */
 
 /* flags for CMD_INIT */
 #define CMD_INITF_DEFAULT_MAC  0x1 /* init with default mac addr */
-- 
2.16.2

Re: [PATCH net 0/4] Fixes, cleanup and modernization for mac89x0 driver

2018-03-01 Thread David Miller

From: Finn Thain 
Date: Thu,  1 Mar 2018 18:29:28 -0500 (EST)

> Changes since v4 of combined patch series:
> - Removed redundant and non-portable MACH_IS_MAC tests.
> - Added acked-by tags from Geert Uytterhoeven.
> - Omitted patches unrelated to mac89x0 driver.

Series applied, thank you.

[PATCH net-next 1/6] enic: Check inner ip proto for pseudo header csum

2018-03-01 Thread Govindarajulu Varadarajan

To compute the pseudo IP header csum, we need to check the inner header of
an encap pkt, not the outer IP header.

Also add the pseudo csum for IPv6 inner pkts.
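
For readers unfamiliar with the pseudo header csum, here is a minimal
user-space sketch of the IPv4 case. It is not driver code:
pseudo_hdr_sum_v4() is a hypothetical helper, and it takes host-order
values for simplicity, whereas the kernel helpers used in the patch work
on network-order header fields.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the driver: ones-complement sum of the
 * IPv4 pseudo header (source, destination, protocol, L4 length), folded
 * to 16 bits. All arguments are in host byte order for simplicity.
 */
static uint16_t pseudo_hdr_sum_v4(uint32_t saddr, uint32_t daddr,
				  uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffff;
	sum += daddr >> 16;
	sum += daddr & 0xffff;
	sum += proto;
	sum += l4_len;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

The patch itself passes 0 as the length to csum_tcpudp_magic() and
csum_ipv6_magic(); the point of the sketch is only that the addresses fed
into the sum must come from the inner header of an encap pkt.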

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index f202ba72a811..252285894968 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -635,12 +635,25 @@ static int enic_queue_wq_skb_csum_l4(struct enic *enic, 
struct vnic_wq *wq,
 
 static void enic_preload_tcp_csum_encap(struct sk_buff *skb)
 {
-   if (skb->protocol == cpu_to_be16(ETH_P_IP)) {
+   const struct ethhdr *eth = (struct ethhdr *)skb_inner_mac_header(skb);
+
+   switch (eth->h_proto) {
+   case ntohs(ETH_P_IP):
inner_ip_hdr(skb)->check = 0;
inner_tcp_hdr(skb)->check =
~csum_tcpudp_magic(inner_ip_hdr(skb)->saddr,
   inner_ip_hdr(skb)->daddr, 0,
   IPPROTO_TCP, 0);
+   break;
+   case ntohs(ETH_P_IPV6):
+   inner_tcp_hdr(skb)->check =
+   ~csum_ipv6_magic(&inner_ipv6_hdr(skb)->saddr,
+&inner_ipv6_hdr(skb)->daddr, 0,
+IPPROTO_TCP, 0);
+   break;
+   default:
+   WARN_ONCE(1, "Non ipv4/ipv6 inner pkt for encap offload");
+   break;
}
 }
 
-- 
2.16.2

[PATCH net-next 2/6] enic: Add vxlan offload support for IPv6 pkts

2018-03-01 Thread Govindarajulu Varadarajan

New adapters support vxlan offload for inner IPv6 and outer IPv6 vxlan
pkts.

Fw sets BIT(0) and BIT(1) in a1 if hw supports ipv6 inner and outer pkt
offload, respectively.
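
A minimal sketch of how the capability bits in a1 can be decoded, assuming
the bit positions of the ENIC_VXLAN_* defines used in this series. The
helper names are hypothetical and only illustrate the flag test; the BIT()
macro is redefined locally so the snippet builds outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))

/* Same bit positions as the ENIC_VXLAN_* defines in this series. */
#define ENIC_VXLAN_INNER_IPV6	BIT(0)
#define ENIC_VXLAN_OUTER_IPV6	BIT(1)

/* Hypothetical helper: keep only the low byte of a1, as enic_probe() does
 * when it stores the value in enic->vxlan.flags.
 */
static uint8_t vxlan_flags_from_a1(uint64_t a1)
{
	return (uint8_t)a1;
}

/* Hypothetical helper: true if hw can offload vxlan over outer IPv6. */
static bool vxlan_outer_ipv6_ok(uint8_t flags)
{
	return (flags & ENIC_VXLAN_OUTER_IPV6) != 0;
}

/* Hypothetical helper: true if hw can offload inner IPv6 pkts. */
static bool vxlan_inner_ipv6_ok(uint8_t flags)
{
	return (flags & ENIC_VXLAN_INNER_IPV6) != 0;
}
```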

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic.h|  1 +
 drivers/net/ethernet/cisco/enic/enic_main.c   | 44 +--
 drivers/net/ethernet/cisco/enic/vnic_dev.c|  5 ++-
 drivers/net/ethernet/cisco/enic/vnic_dev.h|  2 +-
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h |  3 ++
 5 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic.h 
b/drivers/net/ethernet/cisco/enic/enic.h
index 9b218f0e5a4c..83be9a5e8daa 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -140,6 +140,7 @@ struct enic_rfs_flw_tbl {
 struct vxlan_offload {
u16 vxlan_udp_port_number;
u8 patch_level;
+   u8 flags;
 };
 
 /* Per-instance private data structure */
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index 252285894968..848aac477cff 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -191,8 +191,16 @@ static void enic_udp_tunnel_add(struct net_device *netdev,
goto error;
}
 
-   if (ti->sa_family != AF_INET) {
-   netdev_info(netdev, "vxlan: only IPv4 offload supported");
+   switch (ti->sa_family) {
+   case AF_INET6:
+   if (!(enic->vxlan.flags & ENIC_VXLAN_OUTER_IPV6)) {
+   netdev_info(netdev, "vxlan: only IPv4 offload 
supported");
+   goto error;
+   }
+   /* Fall through */
+   case AF_INET:
+   break;
+   default:
goto error;
}
 
@@ -271,22 +279,37 @@ static netdev_features_t enic_features_check(struct 
sk_buff *skb,
struct enic *enic = netdev_priv(dev);
struct udphdr *udph;
u16 port = 0;
-   u16 proto;
+   u8 proto;
 
if (!skb->encapsulation)
return features;
 
features = vxlan_features_check(skb, features);
 
-   /* hardware only supports IPv4 vxlan tunnel */
-   if (vlan_get_protocol(skb) != htons(ETH_P_IP))
+   switch (vlan_get_protocol(skb)) {
+   case htons(ETH_P_IPV6):
+   if (!(enic->vxlan.flags & ENIC_VXLAN_OUTER_IPV6))
+   goto out;
+   proto = ipv6_hdr(skb)->nexthdr;
+   break;
+   case htons(ETH_P_IP):
+   proto = ip_hdr(skb)->protocol;
+   break;
+   default:
goto out;
+   }
 
-   /* hardware does not support offload of ipv6 inner pkt */
-   if (eth->h_proto != ntohs(ETH_P_IP))
+   switch (eth->h_proto) {
+   case ntohs(ETH_P_IPV6):
+   if (!(enic->vxlan.flags & ENIC_VXLAN_INNER_IPV6))
+   goto out;
+   /* Fall through */
+   case ntohs(ETH_P_IP):
+   break;
+   default:
goto out;
+   }
 
-   proto = ip_hdr(skb)->protocol;
 
if (proto == IPPROTO_UDP) {
udph = udp_hdr(skb);
@@ -2914,9 +2937,11 @@ static int enic_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
netdev->hw_features |= NETIF_F_RXCSUM;
if (ENIC_SETTING(enic, VXLAN)) {
u64 patch_level;
+   u64 a1 = 0;
 
netdev->hw_enc_features |= NETIF_F_RXCSUM   |
   NETIF_F_TSO  |
+  NETIF_F_TSO6 |
   NETIF_F_TSO_ECN  |
   NETIF_F_GSO_UDP_TUNNEL   |
   NETIF_F_HW_CSUM  |
@@ -2935,9 +2960,10 @@ static int enic_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 */
err = vnic_dev_get_supported_feature_ver(enic->vdev,
 VIC_FEATURE_VXLAN,
-&patch_level);
+&patch_level, &a1);
if (err)
patch_level = 0;
+   enic->vxlan.flags = (u8)a1;
/* mask bits that are supported by driver
 */
patch_level &= BIT_ULL(0) | BIT_ULL(2);
diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c 
b/drivers/net/ethernet/cisco/enic/vnic_dev.c
index 39bad67422dd..b60fb6e3e775 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
@@ -1269,14 +1269,13 @@ int vnic_dev_overlay_offload_cfg(struct vnic_dev *vdev, 
u8 overlay,
 }
 
 int

[PATCH net-next 3/6] enic: Check if hw supports multi wq with vxlan offload

2018-03-01 Thread Govindarajulu Varadarajan

Some adapters do not support vxlan offload when multi wq is configured.

If hw supports vxlan offload with multi wq, BIT(2) is set in a1.
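
The resulting decision can be sketched as below. This is an illustration
only: vxlan_offload_allowed() is a hypothetical helper, and wq_count
stands in for the value the driver reads with vnic_dev_get_res_count().

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))
#define ENIC_VXLAN_MULTI_WQ	BIT(2)

/* Hypothetical helper: offload is fine with a single wq, and with more
 * than one wq only if fw reported multi wq support in the flags.
 */
static bool vxlan_offload_allowed(unsigned int wq_count, uint8_t flags)
{
	return wq_count == 1 || (flags & ENIC_VXLAN_MULTI_WQ) != 0;
}
```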

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic_main.c   | 5 +
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index 848aac477cff..3280a05f9cf1 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -212,6 +212,11 @@ static void enic_udp_tunnel_add(struct net_device *netdev,
 
goto error;
}
+   if ((vnic_dev_get_res_count(enic->vdev, RES_TYPE_WQ) != 1) &&
+   !(enic->vxlan.flags & ENIC_VXLAN_MULTI_WQ)) {
+   netdev_info(netdev, "vxlan: vxlan offload with multi wq not 
supported on this adapter");
+   goto error;
+   }
 
err = vnic_dev_overlay_offload_cfg(enic->vdev,
   OVERLAY_CFG_VXLAN_PORT_UPDATE,
diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h 
b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
index 69529a3516cd..8fce9ef1c9bc 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
@@ -699,6 +699,7 @@ enum overlay_ofld_cmd {
 
 #define ENIC_VXLAN_INNER_IPV6  BIT(0)
 #define ENIC_VXLAN_OUTER_IPV6  BIT(1)
+#define ENIC_VXLAN_MULTI_WQBIT(2)
 
 /* Use this enum to get the supported versions for each of these features
  * If you need to use the devcmd_get_supported_feature_version(), add
-- 
2.16.2

[PATCH net-next 5/6] enic: enable rq before updating rq descriptors

2018-03-01 Thread Govindarajulu Varadarajan

The rq should be enabled before posting buffers to the rq desc. Otherwise
hw sees a stale value and causes DMAR errors.

Signed-off-by: Govindarajulu Varadarajan 
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c 
b/drivers/net/ethernet/cisco/enic/enic_main.c
index 5213bc01a6e9..243d5c5fd5e3 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1939,6 +1939,8 @@ static int enic_open(struct net_device *netdev)
}
 
for (i = 0; i < enic->rq_count; i++) {
+   /* enable rq before updating rq desc */
+   vnic_rq_enable(&enic->rq[i]);
vnic_rq_fill(&enic->rq[i], enic_rq_alloc_buf);
/* Need at least one buffer on ring to get going */
if (vnic_rq_desc_used(&enic->rq[i]) == 0) {
@@ -1950,8 +1952,6 @@ static int enic_open(struct net_device *netdev)
 
for (i = 0; i < enic->wq_count; i++)
vnic_wq_enable(&enic->wq[i]);
-   for (i = 0; i < enic->rq_count; i++)
-   vnic_rq_enable(&enic->rq[i]);
 
if (!enic_is_dynamic(enic) && !enic_is_sriov_vf(enic))
enic_dev_add_station_addr(enic);
@@ -1977,8 +1977,12 @@ static int enic_open(struct net_device *netdev)
return 0;
 
 err_out_free_rq:
-   for (i = 0; i < enic->rq_count; i++)
+   for (i = 0; i < enic->rq_count; i++) {
+   err = vnic_rq_disable(&enic->rq[i]);
+   if (err)
+   return err;
vnic_rq_clean(&enic->rq[i], enic_free_rq_buf);
+   }
enic_dev_notify_unset(enic);
 err_out_free_intr:
enic_unset_affinity_hint(enic);
-- 
2.16.2

Re: [PATCH v2 net-next 0/4] selftests: forwarding: misc bug fixes and enhancements

2018-03-01 Thread David Miller

From: David Ahern 
Date: Thu,  1 Mar 2018 13:49:29 -0800

> Bug fixes and an enhancement for the recent forwarding tests:
> - only check tc version on tc tests
> - handle multipath tests failing with 0 packet count
> - fix ping command for IPv6 on Debian jessie
> - improve summary of multipath tests
> 
> v2
> - add CHECK_TC to bridge_vlan_aware.sh (Ido)
> - dropped patch 2; always check for mz given its use
> - fixed commit message for the last patch (Multipath: was dropped)

Series applied, thanks David.

Re: [PATCH net-next] fib_rules: FRA_GENERIC_POLICY updates for ip proto, sport and dport attrs

2018-03-01 Thread Eric Dumazet

On Thu, 2018-03-01 at 17:55 -0800, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport 
> and dport")
> Reported-by: Eric Dumazet 
> Signed-off-by: Roopa Prabhu 
> ---
>  include/net/fib_rules.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
> index 6dd0a00..1c9e17c 100644
> --- a/include/net/fib_rules.h
> +++ b/include/net/fib_rules.h
> @@ -112,7 +112,11 @@ struct fib_rule_notifier_info {
>   [FRA_GOTO]  = { .type = NLA_U32 }, \
>   [FRA_L3MDEV]= { .type = NLA_U8 }, \
>   [FRA_UID_RANGE] = { .len = sizeof(struct fib_rule_uid_range) }, \
> - [FRA_PROTOCOL]  = { .type = NLA_U8 }
> + [FRA_PROTOCOL]  = { .type = NLA_U8 }, \
> + [FRA_IP_PROTO]  = { .type = NLA_U8 }, \
> + [FRA_SPORT_RANGE] = { .len = sizeof(struct fib_rule_port_range) }, \
> + [FRA_DPORT_RANGE] = { .len = sizeof(struct fib_rule_port_range) }
> +

Thanks Roopa !

Reviewed-by: Eric Dumazet

Re: [PATCH bpf-next v2 0/8] tools: bpftool: visualization support for eBPF program

2018-03-01 Thread David Miller

From: Jakub Kicinski 
Date: Thu,  1 Mar 2018 18:01:15 -0800

> This patch set is an application of CFG information on eBPF program
> visualization. It presents some initial code for building CFG information
> from eBPF instruction sequences.

For series:

Acked-by: David S. Miller

Re: [PATCH net-next] fib_rules: FRA_GENERIC_POLICY updates for ip proto, sport and dport attrs

2018-03-01 Thread David Miller

From: Roopa Prabhu 
Date: Thu,  1 Mar 2018 17:55:37 -0800

> From: Roopa Prabhu 
> 
> Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport 
> and dport")
> Reported-by: Eric Dumazet 
> Signed-off-by: Roopa Prabhu 

Applied, thanks Roopa.

[PATCH bpf-next v2 4/8] tools: bpftool: partition basic-block for each function in the CFG

2018-03-01 Thread Jakub Kicinski

From: Jiong Wang 

This patch partitions each function in the CFG into basic blocks. The
algorithm is simple: a first traversal identifies the basic-block heads,
then a second traversal identifies the tails.

We could build extended basic blocks (EBB) in next steps. EBBs could make
the graph more readable when the eBPF sequence is big.
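
A minimal sketch of the two traversals on a simplified instruction model.
The types and helpers here (struct insn, mark_heads(), block_tail()) are
hypothetical and use array indices instead of the linked lists in the
patch; the jump target is computed as index + off + 1, as in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified instruction kinds, not real eBPF opcodes. */
enum kind { K_OTHER, K_JA, K_JCOND, K_CALL, K_EXIT };

struct insn {
	enum kind kind;
	int off;	/* jump offset, relative to the next instruction */
};

/* First traversal: mark every instruction that starts a basic block. */
static void mark_heads(const struct insn *prog, size_t n, bool *head)
{
	size_t i;

	for (i = 0; i < n; i++)
		head[i] = false;
	if (n == 0)
		return;

	head[0] = true;
	for (i = 0; i < n; i++) {
		long target;

		/* Calls and exits do not start a new block here. */
		if (prog[i].kind != K_JA && prog[i].kind != K_JCOND)
			continue;

		target = (long)i + prog[i].off + 1;
		if (target >= 0 && (size_t)target < n)
			head[target] = true;

		/* A conditional jump also falls through. */
		if (prog[i].kind == K_JCOND && i + 1 < n)
			head[i + 1] = true;
	}
}

/* Second traversal: the tail of the block starting at 'start' is the
 * instruction just before the next head, or the last instruction.
 */
static size_t block_tail(const bool *head, size_t n, size_t start)
{
	size_t i = start + 1;

	while (i < n && !head[i])
		i++;

	return i - 1;
}
```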

Signed-off-by: Jiong Wang 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/cfg.c | 118 +++-
 1 file changed, 117 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/cfg.c b/tools/bpf/bpftool/cfg.c
index 8330dedb3576..152df904d421 100644
--- a/tools/bpf/bpftool/cfg.c
+++ b/tools/bpf/bpftool/cfg.c
@@ -49,17 +49,32 @@ struct cfg {
 
 struct func_node {
struct list_head l;
+   struct list_head bbs;
struct bpf_insn *start;
struct bpf_insn *end;
int idx;
+   int bb_num;
+};
+
+struct bb_node {
+   struct list_head l;
+   struct bpf_insn *head;
+   struct bpf_insn *tail;
+   int idx;
 };
 
 #define func_prev(func)list_prev_entry(func, l)
 #define func_next(func)list_next_entry(func, l)
+#define bb_prev(bb)list_prev_entry(bb, l)
+#define bb_next(bb)list_next_entry(bb, l)
 #define cfg_first_func(cfg)\
list_first_entry(&cfg->funcs, struct func_node, l)
 #define cfg_last_func(cfg) \
list_last_entry(&cfg->funcs, struct func_node, l)
+#define func_first_bb(func)\
+   list_first_entry(&func->bbs, struct bb_node, l)
+#define func_last_bb(func) \
+   list_last_entry(&func->bbs, struct bb_node, l)
 
 static struct func_node *cfg_append_func(struct cfg *cfg, struct bpf_insn 
*insn)
 {
@@ -86,6 +101,30 @@ static struct func_node *cfg_append_func(struct cfg *cfg, 
struct bpf_insn *insn)
return new_func;
 }
 
+static struct bb_node *func_append_bb(struct func_node *func,
+ struct bpf_insn *insn)
+{
+   struct bb_node *new_bb, *bb;
+
+   list_for_each_entry(bb, &func->bbs, l) {
+   if (bb->head == insn)
+   return bb;
+   else if (bb->head > insn)
+   break;
+   }
+
+   bb = bb_prev(bb);
+   new_bb = calloc(1, sizeof(*new_bb));
+   if (!new_bb) {
+   p_err("OOM when allocating BB node");
+   return NULL;
+   }
+   new_bb->head = insn;
+   list_add(&new_bb->l, &bb->l);
+
+   return new_bb;
+}
+
 static bool cfg_partition_funcs(struct cfg *cfg, struct bpf_insn *cur,
struct bpf_insn *end)
 {
@@ -115,13 +154,83 @@ static bool cfg_partition_funcs(struct cfg *cfg, struct 
bpf_insn *cur,
return false;
 }
 
+static bool func_partition_bb_head(struct func_node *func)
+{
+   struct bpf_insn *cur, *end;
+   struct bb_node *bb;
+
+   cur = func->start;
+   end = func->end;
+   INIT_LIST_HEAD(&func->bbs);
+   bb = func_append_bb(func, cur);
+   if (!bb)
+   return true;
+
+   for (; cur <= end; cur++) {
+   if (BPF_CLASS(cur->code) == BPF_JMP) {
+   u8 opcode = BPF_OP(cur->code);
+
+   if (opcode == BPF_EXIT || opcode == BPF_CALL)
+   continue;
+
+   bb = func_append_bb(func, cur + cur->off + 1);
+   if (!bb)
+   return true;
+
+   if (opcode != BPF_JA) {
+   bb = func_append_bb(func, cur + 1);
+   if (!bb)
+   return true;
+   }
+   }
+   }
+
+   return false;
+}
+
+static void func_partition_bb_tail(struct func_node *func)
+{
+   struct bb_node *bb, *last;
+   unsigned int bb_idx = 0;
+
+   last = func_last_bb(func);
+   last->tail = func->end;
+   bb = func_first_bb(func);
+   list_for_each_entry_from(bb, &last->l, l) {
+   bb->tail = bb_next(bb)->head - 1;
+   bb->idx = bb_idx++;
+   }
+
+   last->idx = bb_idx++;
+   func->bb_num = bb_idx;
+}
+
+static bool func_partition_bb(struct func_node *func)
+{
+   if (func_partition_bb_head(func))
+   return true;
+
+   func_partition_bb_tail(func);
+
+   return false;
+}
+
 static bool cfg_build(struct cfg *cfg, struct bpf_insn *insn, unsigned int len)
 {
int cnt = len / sizeof(*insn);
+   struct func_node *func;
 
INIT_LIST_HEAD(&cfg->funcs);
 
-   return cfg_partition_funcs(cfg, insn, insn + cnt);
+   if (cfg_partition_funcs(cfg, insn, insn + cnt))
+   return true;
+
+   list_for_each_entry(func, &cfg->funcs, l) {
+   if (func_partition_bb(func))
+   return true;
+   }
+
+   return false;
 }
 
 static void cfg_destroy(struct cfg

[PATCH bpf-next v2 8/8] tools: bpftool: add bash completion for CFG dump

2018-03-01 Thread Jakub Kicinski

From: Quentin Monnet 

Add bash completion for the "visual" keyword used for dumping the CFG of
eBPF programs with bpftool. Make sure we only complete with this keyword
when we dump "xlated" (and not "jited") instructions.

Acked-by: Jiong Wang 
Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/bash-completion/bpftool | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/bpf/bpftool/bash-completion/bpftool 
b/tools/bpf/bpftool/bash-completion/bpftool
index 08719c54a614..490811b45fa7 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -147,7 +147,7 @@ _bpftool()
 
 # Deal with simplest keywords
 case $prev in
-help|key|opcodes)
+help|key|opcodes|visual)
 return 0
 ;;
 tag)
@@ -223,11 +223,16 @@ _bpftool()
 return 0
 ;;
 *)
-_bpftool_once_attr 'file'
+_bpftool_once_attr 'file'
+if _bpftool_search_list 'xlated'; then
+COMPREPLY+=( $( compgen -W 'opcodes visual' -- \
+"$cur" ) )
+else
 COMPREPLY+=( $( compgen -W 'opcodes' -- \
 "$cur" ) )
-return 0
-;;
+fi
+return 0
+;;
 esac
 ;;
 pin)
-- 
2.15.1

[PATCH bpf-next v2 6/8] tools: bpftool: generate .dot graph from CFG information

2018-03-01 Thread Jakub Kicinski

From: Jiong Wang 

This patch lets bpftool print a .dot graph file to stdout.

This graph is generated by the following steps:

  - iterate through the function list.
  - generate basic-block(BB) definition for each BB in the function.
  - draw out edges to connect BBs.
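
The steps above can be sketched as follows. This is an illustration under
simplified assumptions, not the code in the patch: struct edge, struct
func and the emit_*() helpers are hypothetical, edges are kept in a flat
array, and the node labels with disassembled instructions are left out.
Only the fn_<func>_bb_<bb> naming and the cluster layout follow the patch.

```c
#include <stdio.h>

/* Hypothetical flattened edge: indices of source and destination BBs. */
struct edge {
	int src;
	int dst;
};

/* Hypothetical flattened function: BB count plus its edge array. */
struct func {
	int idx;
	int bb_num;
	const struct edge *edges;
	int edge_num;
};

/* Format one node name the way the patch names them. */
static int format_bb_name(char *buf, size_t len, int func_idx, int bb_idx)
{
	return snprintf(buf, len, "fn_%d_bb_%d", func_idx, bb_idx);
}

/* Emit one function as a DOT cluster: BB definitions first, then edges. */
static void emit_func(FILE *out, const struct func *f)
{
	char src[32], dst[32];
	int i;

	fprintf(out, "subgraph \"cluster_%d\" {\n", f->idx);
	for (i = 0; i < f->bb_num; i++) {
		format_bb_name(src, sizeof(src), f->idx, i);
		fprintf(out, "\t%s [shape=record];\n", src);
	}
	for (i = 0; i < f->edge_num; i++) {
		format_bb_name(src, sizeof(src), f->idx, f->edges[i].src);
		format_bb_name(dst, sizeof(dst), f->idx, f->edges[i].dst);
		fprintf(out, "\t%s:s -> %s:n;\n", src, dst);
	}
	fprintf(out, "}\n");
}

/* Iterate through the function list and wrap everything in one digraph. */
static void emit_graph(FILE *out, const struct func *funcs, int func_num)
{
	int i;

	fprintf(out, "digraph \"sketch\" {\n");
	for (i = 0; i < func_num; i++)
		emit_func(out, &funcs[i]);
	fprintf(out, "}\n");
}
```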

This patch is the initial support; the layout and decoration of the .dot
graph could be improved.

Also, it would be useful if we could visualize some performance data from
static analysis.

Signed-off-by: Jiong Wang 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/cfg.c   | 93 +++
 tools/bpf/bpftool/xlated_dumper.c | 52 ++
 tools/bpf/bpftool/xlated_dumper.h |  2 +
 3 files changed, 147 insertions(+)

diff --git a/tools/bpf/bpftool/cfg.c b/tools/bpf/bpftool/cfg.c
index c57f88cf2834..f30b3a4a840b 100644
--- a/tools/bpf/bpftool/cfg.c
+++ b/tools/bpf/bpftool/cfg.c
@@ -41,6 +41,7 @@
 
 #include "cfg.h"
 #include "main.h"
+#include "xlated_dumper.h"
 
 struct cfg {
struct list_head funcs;
@@ -408,6 +409,96 @@ static void cfg_destroy(struct cfg *cfg)
}
 }
 
+static void draw_bb_node(struct func_node *func, struct bb_node *bb)
+{
+   const char *shape;
+
+   if (bb->idx == ENTRY_BLOCK_INDEX || bb->idx == EXIT_BLOCK_INDEX)
+   shape = "Mdiamond";
+   else
+   shape = "record";
+
+   printf("\tfn_%d_bb_%d [shape=%s,style=filled,label=\"",
+  func->idx, bb->idx, shape);
+
+   if (bb->idx == ENTRY_BLOCK_INDEX) {
+   printf("ENTRY");
+   } else if (bb->idx == EXIT_BLOCK_INDEX) {
+   printf("EXIT");
+   } else {
+   unsigned int start_idx;
+   struct dump_data dd = {};
+
+   printf("{");
+   kernel_syms_load();
+   start_idx = bb->head - func->start;
+   dump_xlated_for_graph(&dd, bb->head, bb->tail, start_idx);
+   kernel_syms_destroy();
+   printf("}");
+   }
+
+   printf("\"];\n\n");
+}
+
+static void draw_bb_succ_edges(struct func_node *func, struct bb_node *bb)
+{
+   const char *style = "\"solid,bold\"";
+   const char *color = "black";
+   int func_idx = func->idx;
+   struct edge_node *e;
+   int weight = 10;
+
+   if (list_empty(&bb->e_succs))
+   return;
+
+   list_for_each_entry(e, &bb->e_succs, l) {
+   printf("\tfn_%d_bb_%d:s -> fn_%d_bb_%d:n [style=%s, color=%s, 
weight=%d, constraint=true",
+  func_idx, e->src->idx, func_idx, e->dst->idx,
+  style, color, weight);
+   printf("];\n");
+   }
+}
+
+static void func_output_bb_def(struct func_node *func)
+{
+   struct bb_node *bb;
+
+   list_for_each_entry(bb, &func->bbs, l) {
+   draw_bb_node(func, bb);
+   }
+}
+
+static void func_output_edges(struct func_node *func)
+{
+   int func_idx = func->idx;
+   struct bb_node *bb;
+
+   list_for_each_entry(bb, &func->bbs, l) {
+   draw_bb_succ_edges(func, bb);
+   }
+
+   /* Add an invisible edge from ENTRY to EXIT, this is to
+* improve the graph layout.
+*/
+   printf("\tfn_%d_bb_%d:s -> fn_%d_bb_%d:n [style=\"invis\", 
constraint=true];\n",
+  func_idx, ENTRY_BLOCK_INDEX, func_idx, EXIT_BLOCK_INDEX);
+}
+
+static void cfg_dump(struct cfg *cfg)
+{
+   struct func_node *func;
+
+   printf("digraph \"DOT graph for eBPF program\" {\n");
+   list_for_each_entry(func, &cfg->funcs, l) {
+   printf("subgraph \"cluster_%d\" 
{\n\tstyle=\"dashed\";\n\tcolor=\"black\";\n\tlabel=\"func_%d ()\";\n",
+  func->idx, func->idx);
+   func_output_bb_def(func);
+   func_output_edges(func);
+   printf("}\n");
+   }
+   printf("}\n");
+}
+
 void dump_xlated_cfg(void *buf, unsigned int len)
 {
struct bpf_insn *insn = buf;
@@ -417,5 +508,7 @@ void dump_xlated_cfg(void *buf, unsigned int len)
if (cfg_build(&cfg, insn, len))
return;
 
+   cfg_dump(&cfg);
+
cfg_destroy(&cfg);
 }
diff --git a/tools/bpf/bpftool/xlated_dumper.c 
b/tools/bpf/bpftool/xlated_dumper.c
index dfcdf794c9d1..20da835e9e38 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -123,6 +123,37 @@ static void print_insn(struct bpf_verifier_env *env, const 
char *fmt, ...)
va_end(args);
 }
 
+static void
+print_insn_for_graph(struct bpf_verifier_env *env, const char *fmt, ...)
+{
+   char buf[64], *p;
+   va_list args;
+
+   va_start(args, fmt);
+   vsnprintf(buf, sizeof(buf), fmt, args);
+   va_end(args);
+
+   p = buf;
+   while (*p != '\0') {
+   if (*p == '\n') {
+   memmove(p + 3, p, strlen(buf) + 1 - (p - buf));
+   /* Align each instruction dump

[PATCH bpf-next v2 7/8] tools: bpftool: new command-line option and documentation for 'visual'

2018-03-01 Thread Jakub Kicinski

From: Jiong Wang 

This patch adds a new command-line option for visualizing the xlated eBPF
sequence.

Documentation is updated accordingly.

Usage:

  bpftool prog dump xlated id 2 visual

Reviewed-by: Quentin Monnet 
Signed-off-by: Jiong Wang 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 18 --
 tools/bpf/bpftool/prog.c | 12 +++-
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index e4ceee7f2dff..67ca6c69376c 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -21,7 +21,7 @@ MAP COMMANDS
 =
 
 |  **bpftool** **prog { show | list }** [*PROG*]
-|  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | 
**opcodes**}]
+|  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** 
| **visual**}]
 |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
 |  **bpftool** **prog load** *OBJ* *FILE*
@@ -39,12 +39,18 @@ DESCRIPTION
  Output will start with program ID followed by program type and
  zero or more named attributes (depending on kernel version).
 
-   **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** }]
- Dump eBPF instructions of the program from the kernel.
- If *FILE* is specified image will be written to a file,
- otherwise it will be disassembled and printed to stdout.
+   **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** | 
**visual** }]
+ Dump eBPF instructions of the program from the kernel. By
+ default, eBPF will be disassembled and printed to standard
+ output in human-readable format. In this case, **opcodes**
+ controls if raw opcodes should be printed as well.
 
- **opcodes** controls if raw opcodes will be printed.
+ If **file** is specified, the binary image will instead be
+ written to *FILE*.
+
+ If **visual** is specified, control flow graph (CFG) will be
+ built instead, and eBPF instructions will be presented with
+ CFG in DOT format, on standard output.
 
**bpftool prog dump jited**  *PROG* [{ **file** *FILE* | **opcodes** }]
  Dump jited image (host machine code) of the program.
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index c5afee9838e6..f7a810897eac 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 
+#include "cfg.h"
 #include "main.h"
 #include "xlated_dumper.h"
 
@@ -415,6 +416,7 @@ static int do_dump(int argc, char **argv)
unsigned int buf_size;
char *filepath = NULL;
bool opcodes = false;
+   bool visual = false;
unsigned char *buf;
__u32 *member_len;
__u64 *member_ptr;
@@ -453,6 +455,9 @@ static int do_dump(int argc, char **argv)
} else if (is_prefix(*argv, "opcodes")) {
opcodes = true;
NEXT_ARG();
+   } else if (is_prefix(*argv, "visual")) {
+   visual = true;
+   NEXT_ARG();
}
 
if (argc) {
@@ -536,6 +541,11 @@ static int do_dump(int argc, char **argv)
}
 
disasm_print_insn(buf, *member_len, opcodes, name);
+   } else if (visual) {
+   if (json_output)
+   jsonw_null(json_wtr);
+   else
+   dump_xlated_cfg(buf, *member_len);
} else {
kernel_syms_load();
if (json_output)
@@ -596,7 +606,7 @@ static int do_help(int argc, char **argv)
 
fprintf(stderr,
"Usage: %s %s { show | list } [PROG]\n"
-   "   %s %s dump xlated PROG [{ file FILE | opcodes }]\n"
+   "   %s %s dump xlated PROG [{ file FILE | opcodes | visual 
}]\n"
"   %s %s dump jited  PROG [{ file FILE | opcodes }]\n"
"   %s %s pin   PROG FILE\n"
"   %s %s load  OBJ  FILE\n"
-- 
2.15.1

[PATCH bpf-next v2 0/8] tools: bpftool: visualization support for eBPF program

2018-03-01 Thread Jakub Kicinski

Jiong says:

This patch set is an application of CFG information on eBPF program
visualization. It presents some initial code for building CFG information
from eBPF instruction sequences.

After we get the eBPF program bytecode, we do sub-program detection and
basic-block partitioning. This information is then visualized as a DOT
graph.

The user can use any DOT graphic tool (xdot, graphviz, etc.) to view it.

For example:

  bpftool prog dump xlated id 2 visual &>output.dot

  [xdot | dotty] output.dot
  dot -Tpng -o output.png output.dot

This initial patch set hasn't tuned the layout or decoration of the dot
description much; we could improve them later once the direction of the
patch set is agreed on. We could also visualize some static analysis
performance data.

v2 (Jakub):
 - update license headers and add SPDX tags.

Jiong Wang (7):
  tools: bpftool: remove unnecessary 'if' to reduce indentation
  tools: bpftool: factor out xlated dump related code into separate file
  tools: bpftool: detect sub-programs from the eBPF sequence
  tools: bpftool: partition basic-block for each function in the CFG
  tools: bpftool: add out edges for each basic-block
  tools: bpftool: generate .dot graph from CFG information
  tools: bpftool: new command-line option and documentation for 'visual'

Quentin Monnet (1):
  tools: bpftool: add bash completion for CFG dump

 tools/bpf/bpftool/Documentation/bpftool-prog.rst |  18 +-
 tools/bpf/bpftool/bash-completion/bpftool|  13 +-
 tools/bpf/bpftool/cfg.c  | 514 +++
 tools/bpf/bpftool/cfg.h  |  43 ++
 tools/bpf/bpftool/prog.c | 305 ++
 tools/bpf/bpftool/xlated_dumper.c| 338 +++
 tools/bpf/bpftool/xlated_dumper.h|  64 +++
 7 files changed, 1010 insertions(+), 285 deletions(-)
 create mode 100644 tools/bpf/bpftool/cfg.c
 create mode 100644 tools/bpf/bpftool/cfg.h
 create mode 100644 tools/bpf/bpftool/xlated_dumper.c
 create mode 100644 tools/bpf/bpftool/xlated_dumper.h

-- 
2.15.1

[PATCH bpf-next v2 1/8] tools: bpftool: remove unnecessary 'if' to reduce indentation

2018-03-01 Thread Jakub Kicinski

From: Jiong Wang 

It is obvious we could use 'else if' instead of starting a new 'if' in the
touched code.

Signed-off-by: Jiong Wang 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/prog.c | 38 ++
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index e549e329be82..950d11dd42ab 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -777,27 +777,25 @@ static int do_dump(int argc, char **argv)
 
if (json_output)
jsonw_null(json_wtr);
-   } else {
-   if (member_len == _prog_len) {
-   const char *name = NULL;
-
-   if (info.ifindex) {
-   name = ifindex_to_bfd_name_ns(info.ifindex,
- info.netns_dev,
- info.netns_ino);
-   if (!name)
-   goto err_free;
-   }
-
-   disasm_print_insn(buf, *member_len, opcodes, name);
-   } else {
-   kernel_syms_load();
-   if (json_output)
-   dump_xlated_json(, buf, *member_len, 
opcodes);
-   else
-   dump_xlated_plain(, buf, *member_len, 
opcodes);
-   kernel_syms_destroy();
+   } else if (member_len == _prog_len) {
+   const char *name = NULL;
+
+   if (info.ifindex) {
+   name = ifindex_to_bfd_name_ns(info.ifindex,
+ info.netns_dev,
+ info.netns_ino);
+   if (!name)
+   goto err_free;
}
+
+   disasm_print_insn(buf, *member_len, opcodes, name);
+   } else {
+   kernel_syms_load();
+   if (json_output)
+   dump_xlated_json(, buf, *member_len, opcodes);
+   else
+   dump_xlated_plain(, buf, *member_len, opcodes);
+   kernel_syms_destroy();
}
 
free(buf);
-- 
2.15.1

[PATCH bpf-next v2 3/8] tools: bpftool: detect sub-programs from the eBPF sequence

2018-03-01 Thread Jakub Kicinski

From: Jiong Wang 

This patch detects all sub-programs in the eBPF sequence and keeps the
information in the new CFG data structure.

The detection algorithm is basically the same as the one in the verifier,
except that we need to use insn->off instead of insn->imm to get the
pc-relative call offset, because the verifier has modified
insn->off/insn->imm by the time it finishes the verification.

Also, we don't need to repeat some of the sanity checks, as the verifier has
already done them.
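
As a rough illustration of the detection step described above, here is a
minimal standalone sketch. It is not the code from this patch: struct
sketch_insn, the SK_* constants and sketch_find_subprogs() are hypothetical
stand-ins, and the instruction layout is reduced to the fields the scan looks
at. The constant values are assumed to match BPF_JMP, BPF_CALL and
BPF_PSEUDO_CALL from the uapi headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct bpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* Assumed to mirror BPF_JMP, BPF_CALL and BPF_PSEUDO_CALL. */
#define SK_BPF_JMP	0x05
#define SK_BPF_CALL	0x80
#define SK_PSEUDO_CALL	1

/*
 * Mark is_start[i] for every instruction index that begins a sub-program
 * and return how many were found. Index 0 always starts the main program.
 * Call targets are taken from insn->off, as the commit message describes
 * for xlated dumps.
 */
static size_t sketch_find_subprogs(const struct sketch_insn *insns, size_t n,
				   bool *is_start)
{
	size_t i, count = 0;

	assert(insns && is_start);

	for (i = 0; i < n; i++)
		is_start[i] = false;
	if (n == 0)
		return 0;
	is_start[0] = true;

	for (i = 0; i < n; i++) {
		long target;

		if (insns[i].code != (SK_BPF_JMP | SK_BPF_CALL))
			continue;
		if (insns[i].src_reg != SK_PSEUDO_CALL)
			continue;
		/* pc-relative: relative to the instruction after the call */
		target = (long)i + insns[i].off + 1;
		if (target < 0 || (size_t)target >= n)
			continue; /* defensive only; out of range is ignored */
		is_start[target] = true;
	}

	for (i = 0; i < n; i++)
		if (is_start[i])
			count++;

	return count;
}
```

The sketch records start positions in a flat array to stay short; the patch
itself keeps a sorted list of func_node entries instead.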

Signed-off-by: Jiong Wang 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/cfg.c | 147 
 tools/bpf/bpftool/cfg.h |  43 ++
 2 files changed, 190 insertions(+)
 create mode 100644 tools/bpf/bpftool/cfg.c
 create mode 100644 tools/bpf/bpftool/cfg.h

diff --git a/tools/bpf/bpftool/cfg.c b/tools/bpf/bpftool/cfg.c
new file mode 100644
index ..8330dedb3576
--- /dev/null
+++ b/tools/bpf/bpftool/cfg.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/*
+ * Copyright (C) 2018 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "cfg.h"
+#include "main.h"
+
+struct cfg {
+   struct list_head funcs;
+   int func_num;
+};
+
+struct func_node {
+   struct list_head l;
+   struct bpf_insn *start;
+   struct bpf_insn *end;
+   int idx;
+};
+
+#define func_prev(func)list_prev_entry(func, l)
+#define func_next(func)list_next_entry(func, l)
+#define cfg_first_func(cfg)\
+   list_first_entry(>funcs, struct func_node, l)
+#define cfg_last_func(cfg) \
+   list_last_entry(>funcs, struct func_node, l)
+
+static struct func_node *cfg_append_func(struct cfg *cfg, struct bpf_insn 
*insn)
+{
+   struct func_node *new_func, *func;
+
+   list_for_each_entry(func, >funcs, l) {
+   if (func->start == insn)
+   return func;
+   else if (func->start > insn)
+   break;
+   }
+
+   func = func_prev(func);
+   new_func = calloc(1, sizeof(*new_func));
+   if (!new_func) {
+   p_err("OOM when allocating FUNC node");
+   return NULL;
+   }
+   new_func->start = insn;
+   new_func->idx = cfg->func_num;
+   list_add(_func->l, >l);
+   cfg->func_num++;
+
+   return new_func;
+}
+
+static bool cfg_partition_funcs(struct cfg *cfg, struct bpf_insn *cur,
+   struct bpf_insn *end)
+{
+   struct func_node *func, *last_func;
+
+   func = cfg_append_func(cfg, cur);
+   if (!func)
+   return true;
+
+   for (; cur < end; cur++) {
+   if (cur->code != (BPF_JMP | BPF_CALL))
+   continue;
+   if (cur->src_reg != BPF_PSEUDO_CALL)
+   continue;
+   func = cfg_append_func(cfg, cur + cur->off + 1);
+   if (!func)
+   return true;
+   }
+
+   last_func = cfg_last_func(cfg);
+   last_func->end = end - 1;
+   func = cfg_first_func(cfg);
+   list_for_each_entry_from(func, _func->l, l) {
+   func->end =

[PATCH bpf-next v2 5/8] tools: bpftool: add out edges for each basic-block

2018-03-01 Thread Jakub Kicinski

From: Jiong Wang 

This patch adds out edges for each basic-block. We will need these out
edges to finish the .dot graph drawing.

Signed-off-by: Jiong Wang 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/cfg.c | 162 +++-
 1 file changed, 160 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/cfg.c b/tools/bpf/bpftool/cfg.c
index 152df904d421..c57f88cf2834 100644
--- a/tools/bpf/bpftool/cfg.c
+++ b/tools/bpf/bpftool/cfg.c
@@ -58,15 +58,32 @@ struct func_node {
 
 struct bb_node {
struct list_head l;
+   struct list_head e_prevs;
+   struct list_head e_succs;
struct bpf_insn *head;
struct bpf_insn *tail;
int idx;
 };
 
+#define EDGE_FLAG_EMPTY0x0
+#define EDGE_FLAG_FALLTHROUGH  0x1
+#define EDGE_FLAG_JUMP 0x2
+struct edge_node {
+   struct list_head l;
+   struct bb_node *src;
+   struct bb_node *dst;
+   int flags;
+};
+
+#define ENTRY_BLOCK_INDEX  0
+#define EXIT_BLOCK_INDEX   1
+#define NUM_FIXED_BLOCKS   2
 #define func_prev(func)list_prev_entry(func, l)
 #define func_next(func)list_next_entry(func, l)
 #define bb_prev(bb)list_prev_entry(bb, l)
 #define bb_next(bb)list_next_entry(bb, l)
+#define entry_bb(func) func_first_bb(func)
+#define exit_bb(func)  func_last_bb(func)
 #define cfg_first_func(cfg)\
list_first_entry(>funcs, struct func_node, l)
 #define cfg_last_func(cfg) \
@@ -120,11 +137,30 @@ static struct bb_node *func_append_bb(struct func_node 
*func,
return NULL;
}
new_bb->head = insn;
+   INIT_LIST_HEAD(_bb->e_prevs);
+   INIT_LIST_HEAD(_bb->e_succs);
list_add(_bb->l, >l);
 
return new_bb;
 }
 
+static struct bb_node *func_insert_dummy_bb(struct list_head *after)
+{
+   struct bb_node *bb;
+
+   bb = calloc(1, sizeof(*bb));
+   if (!bb) {
+   p_err("OOM when allocating BB node");
+   return NULL;
+   }
+
+   INIT_LIST_HEAD(>e_prevs);
+   INIT_LIST_HEAD(>e_succs);
+   list_add(>l, after);
+
+   return bb;
+}
+
 static bool cfg_partition_funcs(struct cfg *cfg, struct bpf_insn *cur,
struct bpf_insn *end)
 {
@@ -190,8 +226,8 @@ static bool func_partition_bb_head(struct func_node *func)
 
 static void func_partition_bb_tail(struct func_node *func)
 {
+   unsigned int bb_idx = NUM_FIXED_BLOCKS;
struct bb_node *bb, *last;
-   unsigned int bb_idx = 0;
 
last = func_last_bb(func);
last->tail = func->end;
@@ -205,6 +241,23 @@ static void func_partition_bb_tail(struct func_node *func)
func->bb_num = bb_idx;
 }
 
+static bool func_add_special_bb(struct func_node *func)
+{
+   struct bb_node *bb;
+
+   bb = func_insert_dummy_bb(>bbs);
+   if (!bb)
+   return true;
+   bb->idx = ENTRY_BLOCK_INDEX;
+
+   bb = func_insert_dummy_bb(_last_bb(func)->l);
+   if (!bb)
+   return true;
+   bb->idx = EXIT_BLOCK_INDEX;
+
+   return false;
+}
+
 static bool func_partition_bb(struct func_node *func)
 {
if (func_partition_bb_head(func))
@@ -215,6 +268,96 @@ static bool func_partition_bb(struct func_node *func)
return false;
 }
 
+static struct bb_node *func_search_bb_with_head(struct func_node *func,
+   struct bpf_insn *insn)
+{
+   struct bb_node *bb;
+
+   list_for_each_entry(bb, >bbs, l) {
+   if (bb->head == insn)
+   return bb;
+   }
+
+   return NULL;
+}
+
+static struct edge_node *new_edge(struct bb_node *src, struct bb_node *dst,
+ int flags)
+{
+   struct edge_node *e;
+
+   e = calloc(1, sizeof(*e));
+   if (!e) {
+   p_err("OOM when allocating edge node");
+   return NULL;
+   }
+
+   if (src)
+   e->src = src;
+   if (dst)
+   e->dst = dst;
+
+   e->flags |= flags;
+
+   return e;
+}
+
+static bool func_add_bb_edges(struct func_node *func)
+{
+   struct bpf_insn *insn;
+   struct edge_node *e;
+   struct bb_node *bb;
+
+   bb = entry_bb(func);
+   e = new_edge(bb, bb_next(bb), EDGE_FLAG_FALLTHROUGH);
+   if (!e)
+   return true;
+   list_add_tail(>l, >e_succs);
+
+   bb = exit_bb(func);
+   e = new_edge(bb_prev(bb), bb, EDGE_FLAG_FALLTHROUGH);
+   if (!e)
+   return true;
+   list_add_tail(>l, >e_prevs);
+
+   bb = entry_bb(func);
+   bb = bb_next(bb);
+   list_for_each_entry_from(bb, _bb(func)->l, l) {
+   e = new_edge(bb, NULL, EDGE_FLAG_EMPTY);
+   if (!e)
+   return true;
+   e->src = bb;
+
+

[PATCH bpf-next v2 2/8] tools: bpftool: factor out xlated dump related code into separate file

2018-03-01 Thread Jakub Kicinski

From: Jiong Wang 

This patch factors out the code for dumping xlated eBPF instructions into
xlated_dumper.[h|c].

These are quite independent dumper functions, so they are better kept
separately.

New dumper support will be added in later patches in this set.

Signed-off-by: Jiong Wang 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/prog.c  | 255 +
 tools/bpf/bpftool/xlated_dumper.c | 286 ++
 tools/bpf/bpftool/xlated_dumper.h |  62 +
 3 files changed, 349 insertions(+), 254 deletions(-)
 create mode 100644 tools/bpf/bpftool/xlated_dumper.c
 create mode 100644 tools/bpf/bpftool/xlated_dumper.h

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 950d11dd42ab..c5afee9838e6 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -48,7 +48,7 @@
 #include 
 
 #include "main.h"
-#include "disasm.h"
+#include "xlated_dumper.h"
 
 static const char * const prog_type_name[] = {
[BPF_PROG_TYPE_UNSPEC]  = "unspec",
@@ -407,259 +407,6 @@ static int do_show(int argc, char **argv)
return err;
 }
 
-#define SYM_MAX_NAME   256
-
-struct kernel_sym {
-   unsigned long address;
-   char name[SYM_MAX_NAME];
-};
-
-struct dump_data {
-   unsigned long address_call_base;
-   struct kernel_sym *sym_mapping;
-   __u32 sym_count;
-   char scratch_buff[SYM_MAX_NAME];
-};
-
-static int kernel_syms_cmp(const void *sym_a, const void *sym_b)
-{
-   return ((struct kernel_sym *)sym_a)->address -
-  ((struct kernel_sym *)sym_b)->address;
-}
-
-static void kernel_syms_load(struct dump_data *dd)
-{
-   struct kernel_sym *sym;
-   char buff[256];
-   void *tmp, *address;
-   FILE *fp;
-
-   fp = fopen("/proc/kallsyms", "r");
-   if (!fp)
-   return;
-
-   while (!feof(fp)) {
-   if (!fgets(buff, sizeof(buff), fp))
-   break;
-   tmp = realloc(dd->sym_mapping,
- (dd->sym_count + 1) *
- sizeof(*dd->sym_mapping));
-   if (!tmp) {
-out:
-   free(dd->sym_mapping);
-   dd->sym_mapping = NULL;
-   fclose(fp);
-   return;
-   }
-   dd->sym_mapping = tmp;
-   sym = >sym_mapping[dd->sym_count];
-   if (sscanf(buff, "%p %*c %s", , sym->name) != 2)
-   continue;
-   sym->address = (unsigned long)address;
-   if (!strcmp(sym->name, "__bpf_call_base")) {
-   dd->address_call_base = sym->address;
-   /* sysctl kernel.kptr_restrict was set */
-   if (!sym->address)
-   goto out;
-   }
-   if (sym->address)
-   dd->sym_count++;
-   }
-
-   fclose(fp);
-
-   qsort(dd->sym_mapping, dd->sym_count,
- sizeof(*dd->sym_mapping), kernel_syms_cmp);
-}
-
-static void kernel_syms_destroy(struct dump_data *dd)
-{
-   free(dd->sym_mapping);
-}
-
-static struct kernel_sym *kernel_syms_search(struct dump_data *dd,
-unsigned long key)
-{
-   struct kernel_sym sym = {
-   .address = key,
-   };
-
-   return dd->sym_mapping ?
-  bsearch(, dd->sym_mapping, dd->sym_count,
-  sizeof(*dd->sym_mapping), kernel_syms_cmp) : NULL;
-}
-
-static void print_insn(struct bpf_verifier_env *env, const char *fmt, ...)
-{
-   va_list args;
-
-   va_start(args, fmt);
-   vprintf(fmt, args);
-   va_end(args);
-}
-
-static const char *print_call_pcrel(struct dump_data *dd,
-   struct kernel_sym *sym,
-   unsigned long address,
-   const struct bpf_insn *insn)
-{
-   if (sym)
-   snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-"%+d#%s", insn->off, sym->name);
-   else
-   snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-"%+d#0x%lx", insn->off, address);
-   return dd->scratch_buff;
-}
-
-static const char *print_call_helper(struct dump_data *dd,
-struct kernel_sym *sym,
-unsigned long address)
-{
-   if (sym)
-   snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-"%s", sym->name);
-   else
-   snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-"0x%lx", address);
-   return dd->scratch_buff;
-}
-
-static const char *print_call(void *private_data,
- const struct bpf_insn

Re: [Patch 4.14 0/4] net_sched: backport tc filter fixes to 4.14

2018-03-01 Thread David Miller

From: Cong Wang 
Date: Thu,  1 Mar 2018 13:46:35 -0800

> This patchset backports 4 important bug fixes for tc filter to
> 4.14 stable branch. Due to some big changes between 4.14 and 4.15,
> the backports are not trivial, I have to adjust and fix the conflicts
> manually.
> 
> Thanks to Roland for reporting the kernel warning and testing
> the patches.
> 
> Reported-by: Roland Franke 
> Tested-by: Roland Franke 
> Cc: Jiri Pirko 
> Cc: Roman Kapl 
> Cc: David S. Miller 
> Signed-off-by: Cong Wang 

Greg, please queue up this series for -stable.

Thank you!

[PATCH net-next] fib_rules: FRA_GENERIC_POLICY updates for ip proto, sport and dport attrs

2018-03-01 Thread Roopa Prabhu

From: Roopa Prabhu 

Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport and dport")
Reported-by: Eric Dumazet 
Signed-off-by: Roopa Prabhu 
---
 include/net/fib_rules.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6dd0a00..1c9e17c 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -112,7 +112,11 @@ struct fib_rule_notifier_info {
[FRA_GOTO]  = { .type = NLA_U32 }, \
[FRA_L3MDEV]= { .type = NLA_U8 }, \
[FRA_UID_RANGE] = { .len = sizeof(struct fib_rule_uid_range) }, \
-   [FRA_PROTOCOL]  = { .type = NLA_U8 }
+   [FRA_PROTOCOL]  = { .type = NLA_U8 }, \
+   [FRA_IP_PROTO]  = { .type = NLA_U8 }, \
+   [FRA_SPORT_RANGE] = { .len = sizeof(struct fib_rule_port_range) }, \
+   [FRA_DPORT_RANGE] = { .len = sizeof(struct fib_rule_port_range) }
+
 
 static inline void fib_rule_get(struct fib_rule *rule)
 {
-- 
2.1.4

[RFC PATCH v3] ptr_ring: linked list fallback

2018-03-01 Thread Michael S. Tsirkin

So pointer rings work fine, but they have a problem: make them too small
and not enough entries fit.  Make them too large and you start flushing
your cache and running out of memory.

This is a new idea of mine: a ring backed by a linked list. Once you run
out of ring entries, instead of a drop you fall back on a list with a
common lock.

Should work well for the case where the ring is typically sized
correctly, but will help address the fact that some users try to set e.g.
tx queue length to 100.

In other words, the idea is that if a user sets a really huge TX queue
length, we allocate a ptr_ring which is smaller, and use the backup
linked list when necessary to provide the requested TX queue length
legitimately.
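
To make the idea concrete, here is a single-threaded userspace sketch of a
small ring that spills into a linked list once it is full. It is only an
illustration under simplifying assumptions: there is no locking at all, the
ring size is a compile-time constant, and struct sk_node, struct sk_ring,
sk_produce() and sk_consume() are made-up names, not the API in the patch
below.

```c
#include <assert.h>
#include <stddef.h>

/* Every queued entry starts with these links (compare struct plist). */
struct sk_node {
	struct sk_node *next;
	struct sk_node *last;	/* only valid in the entry held by the ring */
	int value;
};

#define SK_RING_SIZE 2

struct sk_ring {
	struct sk_node *slots[SK_RING_SIZE];
	int producer;		/* next slot to fill */
	int consumer;		/* next slot to drain */
	int used;		/* occupied slots */
	struct sk_node *pending; /* overflow chain currently being drained */
	int list_num;		/* entries currently on overflow chains */
	int list_size;		/* max entries allowed on overflow chains */
};

/* Returns 0 on success, -1 if both the ring and the list budget are full. */
static int sk_produce(struct sk_ring *r, struct sk_node *n)
{
	n->next = NULL;
	n->last = n;

	if (r->used < SK_RING_SIZE) {
		r->slots[r->producer] = n;
		r->producer = (r->producer + 1) % SK_RING_SIZE;
		r->used++;
		return 0;
	}

	if (r->list_num < r->list_size) {
		/* Ring full: chain behind the most recently produced entry. */
		int prev = r->producer ? r->producer - 1 : SK_RING_SIZE - 1;
		struct sk_node *first = r->slots[prev];

		assert(first != NULL);
		first->last->next = n;
		first->last = n;
		r->list_num++;
		return 0;
	}

	return -1;
}

/* Returns the oldest entry, or NULL if nothing is queued. */
static struct sk_node *sk_consume(struct sk_ring *r)
{
	struct sk_node *n;

	if (r->pending) {
		n = r->pending;
		r->pending = n->next;
		r->list_num--;
		return n;
	}

	if (!r->used)
		return NULL;

	n = r->slots[r->consumer];
	r->slots[r->consumer] = NULL;
	r->consumer = (r->consumer + 1) % SK_RING_SIZE;
	r->used--;
	r->pending = n->next;	/* drain its overflow chain before the next slot */
	return n;
}
```

Because overflow entries hang off the last ring entry produced, and the
consumer drains that chain before moving to the next slot, FIFO order is
preserved in this sketch even when the ring and the list are both in use.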

My hope is that this will move us closer to a direction where e.g. fw codel
can use ptr rings without locking at all.  The API is still very rough, and
I really need to take a hard look at lock nesting.

As was pointed out, this approach only brings benefits if ring
is rarely completely full. Whether that's common remains to be seen.

Compiled only, sending for early feedback/flames.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/ptr_ring.h | 79 +---
 1 file changed, 75 insertions(+), 4 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index e633522..aadb751 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -31,16 +31,25 @@
 #include 
 #endif
 
+/* entries must start with the following structure */
+struct plist {
+   struct plist *next;
+   struct plist *last; /* only valid in the 1st entry */
+};
+
 struct ptr_ring {
int producer cacheline_aligned_in_smp;
spinlock_t producer_lock;
int consumer_head cacheline_aligned_in_smp; /* next valid entry */
int consumer_tail; /* next entry to invalidate */
+   struct plist *consumer_list;
+   int list_num;
spinlock_t consumer_lock;
/* Shared consumer/producer data */
/* Read-only by both the producer and the consumer */
int size cacheline_aligned_in_smp; /* max entries in queue */
int batch; /* number of entries to consume in a batch */
+   int list_size;
void **queue;
 };
 
@@ -121,10 +130,42 @@ static inline int __ptr_ring_produce(struct ptr_ring *r, 
void *ptr)
 }
 
 /*
- * Note: resize (below) nests producer lock within consumer lock, so if you
- * consume in interrupt or BH context, you must disable interrupts/BH when
- * calling this.
+ * Note: resize API with the _fallback should be used when calling this.
  */
+static inline int ptr_ring_produce_fallback(struct ptr_ring *r, void *ptr)
+{
+   int ret;
+   unsigned long flags;
+   struct plist *p = ptr;
+
+   p->next = NULL;
+   p->last = p;
+
+   spin_lock_irqsave(>producer_lock, flags);
+   ret = __ptr_ring_produce(r, ptr);
+   if (ret && r->list_size) {
+   spin_lock(>consumer_lock);
+   ret = __ptr_ring_produce(r, ptr);
+   if (ret && r->list_num < r->list_size) {
+   int producer = r->producer ? r->producer - 1 :
+   r->size - 1;
+   struct plist *first = r->queue[producer];
+
+   BUG_ON(!first);
+
+   first->last->next = p;
+   first->last = p;
+
+   r->list_num++;
+   }
+   spin_unlock(>consumer_lock);
+   }
+
+   spin_unlock_irqrestore(>producer_lock, flags);
+
+   return ret;
+}
+
 static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
 {
int ret;
@@ -136,6 +177,7 @@ static inline int ptr_ring_produce(struct ptr_ring *r, void 
*ptr)
return ret;
 }
 
+
 static inline int ptr_ring_produce_irq(struct ptr_ring *r, void *ptr)
 {
int ret;
@@ -372,6 +414,27 @@ static inline void *ptr_ring_consume_bh(struct ptr_ring *r)
return ptr;
 }
 
+static inline void *ptr_ring_consume_fallback(struct ptr_ring *r)
+{
+   unsigned long flags;
+   struct plist *ptr;
+
+   spin_lock_irqsave(>consumer_lock, flags);
+   if (r->consumer_list) {
+   ptr = r->consumer_list;
+   r->consumer_list = ptr->next;
+   r->list_num--;
+   } else {
+   ptr = __ptr_ring_consume(r);
+   if (ptr) {
+   r->consumer_list = ptr->next;
+   }
+   }
+   spin_unlock_irqrestore(>consumer_lock, flags);
+
+   return ptr;
+}
+
 static inline int ptr_ring_consume_batched(struct ptr_ring *r,
   void **array, int n)
 {
@@ -487,7 +550,8 @@ static inline void __ptr_ring_set_size(struct ptr_ring *r, 
int size)
r->batch = 1;
 }
 
-static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
+static inline int ptr_ring_init_fallback(struct ptr_ring


Re: [RFC PATCH V1 01/12] audit: add container id

2018-03-01 Thread Richard Guy Briggs

On 2018-03-01 14:41, Richard Guy Briggs wrote:
> Implement the proc fs write to set the audit container ID of a process,
> emitting an AUDIT_CONTAINER record to document the event.
> 
> This is a write from the container orchestrator task to a proc entry of
> the form /proc/PID/containerid where PID is the process ID of the newly
> created task that is to become the first task in a container, or an
> additional task added to a container.
> 
> The write expects up to a u64 value (unset: 18446744073709551615).
> 
> This will produce a record such as this:
> type=UNKNOWN[1333] msg=audit(1519903238.968:261): op=set pid=596 uid=0 
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0 
> ses=1 opid=596 old-contid=18446744073709551615 contid=123455 res=0
> 
> The "op" field indicates an initial set.  The "pid" to "ses" fields are
> the orchestrator while the "opid" field is the object's PID, the process
> being "contained".  Old and new container ID values are given in the
> "contid" fields, while res indicates its success.
> 
> It is not permitted to self-set, unset or re-set the container ID.  A
> child inherits its parent's container ID, but then can be set only once
> after.

There are more restrictions coming later:
- check that the child being set has no children or threads yet, or
  forcibly set them all to the same container ID (assuming they all pass
  the same tests).  This will also prevent an orch from setting its
  parent and other tit-for-tat games to circumvent the basic checks.
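
For discussion, the basic checks from the commit message (no self-set, no
unset, no re-set, and one set allowed after inheriting) could be modelled
roughly as below. This is a userspace sketch, not the kernel code: struct
sk_task, its "inherited" flag, sk_set_contid() and the plain -1 error return
are all assumptions made for illustration, and the child/thread checks
mentioned above are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The "unset" value from the commit message: 18446744073709551615. */
#define SK_CID_UNSET UINT64_MAX

/* Hypothetical, reduced view of a task for this sketch. */
struct sk_task {
	int pid;
	uint64_t contid;
	bool inherited;	/* contid was copied from the parent at fork */
};

/* Returns 0 if the ID was set, -1 if one of the basic rules forbids it. */
static int sk_set_contid(const struct sk_task *orch, struct sk_task *t,
			 uint64_t cid)
{
	assert(orch && t);

	if (orch->pid == t->pid)
		return -1;	/* self-set is not permitted */
	if (cid == SK_CID_UNSET)
		return -1;	/* unsetting is not permitted */
	if (t->contid != SK_CID_UNSET && !t->inherited)
		return -1;	/* re-setting is not permitted */

	t->contid = cid;
	t->inherited = false;	/* an inherited ID may be replaced only once */
	return 0;
}
```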

> See: https://github.com/linux-audit/audit-kernel/issues/32
> 
> Signed-off-by: Richard Guy Briggs 
> ---
>  fs/proc/base.c | 37 
>  include/linux/audit.h  | 16 +
>  include/linux/init_task.h  |  4 ++-
>  include/linux/sched.h  |  1 +
>  include/uapi/linux/audit.h |  2 ++
>  kernel/auditsc.c   | 86 
> ++
>  6 files changed, 145 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 60316b5..6ce4fbe 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1299,6 +1299,41 @@ static ssize_t proc_sessionid_read(struct file * file, 
> char __user * buf,
>   .read   = proc_sessionid_read,
>   .llseek = generic_file_llseek,
>  };
> +
> +static ssize_t proc_containerid_write(struct file *file, const char __user 
> *buf,
> +size_t count, loff_t *ppos)
> +{
> + struct inode *inode = file_inode(file);
> + u64 containerid;
> + int rv;
> + struct task_struct *task = get_proc_task(inode);
> +
> + if (!task)
> + return -ESRCH;
> + if (*ppos != 0) {
> + /* No partial writes. */
> + put_task_struct(task);
> + return -EINVAL;
> + }
> +
> + rv = kstrtou64_from_user(buf, count, 10, );
> + if (rv < 0) {
> + put_task_struct(task);
> + return rv;
> + }
> +
> + rv = audit_set_containerid(task, containerid);
> + put_task_struct(task);
> + if (rv < 0)
> + return rv;
> + return count;
> +}
> +
> +static const struct file_operations proc_containerid_operations = {
> + .write  = proc_containerid_write,
> + .llseek = generic_file_llseek,
> +};
> +
>  #endif
>  
>  #ifdef CONFIG_FAULT_INJECTION
> @@ -2961,6 +2996,7 @@ static int proc_pid_patch_state(struct seq_file *m, 
> struct pid_namespace *ns,
>  #ifdef CONFIG_AUDITSYSCALL
>   REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
>   REG("sessionid",  S_IRUGO, proc_sessionid_operations),
> + REG("containerid", S_IWUSR, proc_containerid_operations),
>  #endif
>  #ifdef CONFIG_FAULT_INJECTION
>   REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> @@ -3355,6 +3391,7 @@ static int proc_tid_comm_permission(struct inode 
> *inode, int mask)
>  #ifdef CONFIG_AUDITSYSCALL
>   REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
>   REG("sessionid",  S_IRUGO, proc_sessionid_operations),
> + REG("containerid", S_IWUSR, proc_containerid_operations),
>  #endif
>  #ifdef CONFIG_FAULT_INJECTION
>   REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index af410d9..fe4ba3f 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -29,6 +29,7 @@
>  
>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>  #define AUDIT_DEV_UNSET ((dev_t)-1)
> +#define INVALID_CID AUDIT_CID_UNSET
>  
>  struct audit_sig_info {
>   uid_t   uid;
> @@ -321,6 +322,7 @@ static inline void audit_ptrace(struct task_struct *t)
>  extern int auditsc_get_stamp(struct audit_context *ctx,
> struct timespec64 *t, unsigned int *serial);
>  extern int audit_set_loginuid(kuid_t loginuid);
> +extern int audit_set_containerid(struct task_struct *tsk, u64 containerid);
>

[PATCH net 0/4] net: dsa: Use strncpy() for ethtool::get_strings

2018-03-01 Thread Florian Fainelli

Hi all,

After turning on KASAN on one of my systems, I started getting lots of out of
bounds errors while fetching a given port's statistics, and indeed using
memcpy() is unsafe for copying strings, so let's use strncpy() instead.

Florian Fainelli (4):
  net: dsa: b53: Use strncpy() for ethtool::get_strings
  net: dsa: loop: Use strncpy() for ethtool::get_strings
  net: dsa: microchip: Utilize strncpy() for ethtool::get_strings
  net: dsa: mv88e6xxx: Utilize strncpy() for ethtool::get_strings

 drivers/net/dsa/b53/b53_common.c   | 4 ++--
 drivers/net/dsa/dsa_loop.c | 4 ++--
 drivers/net/dsa/microchip/ksz_common.c | 4 ++--
 drivers/net/dsa/mv88e6xxx/chip.c   | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

-- 
2.14.1

[PATCH net 1/4] net: dsa: b53: Use strncpy() for ethtool::get_strings

2018-03-01 Thread Florian Fainelli

Do not use memcpy() which is not safe, but instead use strncpy() which
will make sure that the string is NUL terminated (in the Linux
implementation) if the string is smaller than the length specified. This
fixes KASAN out of bounds warnings while fetching port statistics.
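
To illustrate the difference outside the kernel, here is a small userspace
sketch. SK_GSTRING_LEN and sk_copy_name() are stand-ins invented for this
example (the value 32 is assumed to match ETH_GSTRING_LEN). With memcpy() and
a fixed length, a short source such as "rx_ok" would be read past its
terminating NUL; strncpy() stops reading at the NUL and zero-fills the rest of
the slot.

```c
#include <assert.h>
#include <string.h>

#define SK_GSTRING_LEN 32	/* assumed to match ETH_GSTRING_LEN */

/*
 * Copy one statistic name into a fixed-width slot. strncpy() reads at most
 * up to the source's NUL and pads the remainder of the slot with zeroes.
 * Note that it does not add a NUL when the name is SK_GSTRING_LEN
 * characters or longer.
 */
static void sk_copy_name(char *slot, const char *name)
{
	assert(slot && name);
	strncpy(slot, name, SK_GSTRING_LEN);
}
```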

Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index db830a1141d9..c64ebb82df83 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -814,8 +814,8 @@ void b53_get_strings(struct dsa_switch *ds, int port, 
uint8_t *data)
unsigned int i;
 
for (i = 0; i < mib_size; i++)
-   memcpy(data + i * ETH_GSTRING_LEN,
-  mibs[i].name, ETH_GSTRING_LEN);
+   strncpy(data + i * ETH_GSTRING_LEN,
+   mibs[i].name, ETH_GSTRING_LEN);
 }
 EXPORT_SYMBOL(b53_get_strings);
 
-- 
2.14.1

[PATCH net 2/4] net: dsa: loop: Use strncpy() for ethtool::get_strings

2018-03-01 Thread Florian Fainelli

Do not use memcpy() which is not safe, but instead use strncpy() which
will make sure that the string is NUL terminated (in the Linux
implementation) if the string is smaller than the length specified. This
fixes KASAN out of bounds warnings while fetching port statistics.

Fixes: 484c01720d84 ("net: dsa: loop: Implement ethtool statistics")
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/dsa_loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c
index 7aa84ee4e771..2e7e1cdbd2e9 100644
--- a/drivers/net/dsa/dsa_loop.c
+++ b/drivers/net/dsa/dsa_loop.c
@@ -97,8 +97,8 @@ static void dsa_loop_get_strings(struct dsa_switch *ds, int 
port, uint8_t *data)
unsigned int i;
 
for (i = 0; i < __DSA_LOOP_CNT_MAX; i++)
-   memcpy(data + i * ETH_GSTRING_LEN,
-  ps->ports[port].mib[i].name, ETH_GSTRING_LEN);
+   strncpy(data + i * ETH_GSTRING_LEN,
+   ps->ports[port].mib[i].name, ETH_GSTRING_LEN);
 }
 
 static void dsa_loop_get_ethtool_stats(struct dsa_switch *ds, int port,
-- 
2.14.1

[PATCH net 3/4] net: dsa: microchip: Utilize strncpy() for ethtool::get_strings

2018-03-01 Thread Florian Fainelli

Do not use memcpy() which is not safe, but instead use strncpy() which
will make sure that the string is NUL terminated (in the Linux
implementation) if the string is smaller than the length specified. This
fixes KASAN out of bounds warnings while fetching port statistics.

Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/microchip/ksz_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz_common.c 
b/drivers/net/dsa/microchip/ksz_common.c
index 663b0d5b982b..db7f5c8c1dcb 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -449,8 +449,8 @@ static void ksz_get_strings(struct dsa_switch *ds, int 
port, uint8_t *buf)
int i;
 
for (i = 0; i < TOTAL_SWITCH_COUNTER_NUM; i++) {
-   memcpy(buf + i * ETH_GSTRING_LEN, mib_names[i].string,
-  ETH_GSTRING_LEN);
+   strncpy(buf + i * ETH_GSTRING_LEN, mib_names[i].string,
+   ETH_GSTRING_LEN);
}
 }
 
-- 
2.14.1

[PATCH net 4/4] net: dsa: mv88e6xxx: Utilize strncpy() for ethtool::get_strings

2018-03-01 Thread Florian Fainelli

Do not use memcpy() which is not safe, but instead use strncpy() which
will make sure that the string is NUL terminated (in the Linux
implementation) if the string is smaller than the length specified. This
fixes KASAN out of bounds warnings while fetching port statistics.

Fixes: f5e2ed022dff ("dsa: mv88e6xxx: Add Second back of statistics")
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index eb328bade225..bec7540aae2b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -636,8 +636,8 @@ static void mv88e6xxx_stats_get_strings(struct 
mv88e6xxx_chip *chip,
for (i = 0, j = 0; i < ARRAY_SIZE(mv88e6xxx_hw_stats); i++) {
stat = _hw_stats[i];
if (stat->type & types) {
-   memcpy(data + j * ETH_GSTRING_LEN, stat->string,
-  ETH_GSTRING_LEN);
+   strncpy(data + j * ETH_GSTRING_LEN, stat->string,
+   ETH_GSTRING_LEN);
j++;
}
}
-- 
2.14.1

[PATCH net-next v3 0/5] net: phy: Reduce duplication

2018-03-01 Thread Florian Fainelli

Hi all,

This patch series reduces the duplication among 10G PHY drivers that just
essentially stub most functions, but do that while replicating what the existing
generic functions do.

Changes in v3:

- removed unused "reg" variable in teranetics.c
- fixed subject for patch 5 since we actually use gen10g_no_soft_reset()

Changes in v2:

- rename gen10g_soft_reset() to gen10g_no_soft_reset() to better illustrate
  what it does (or does not)
- removed stray comment in marvell10g.c

Florian Fainelli (5):
  net: phy: aquantia: Utilize genphy_c45_aneg_done()
  net: phy: Export gen10g_* functions
  net: phy: teranetics: Utilize generic functions
  net: phy: cortina: Utilize generic functions
  net: phy: marvell10g: Utilize gen10g_no_soft_reset()

 drivers/net/phy/aquantia.c   | 20 ++--
 drivers/net/phy/cortina.c| 18 +++---
 drivers/net/phy/marvell10g.c | 11 +--
 drivers/net/phy/phy-c45.c| 20 +---
 drivers/net/phy/teranetics.c | 32 +---
 include/linux/phy.h  |  8 
 6 files changed, 36 insertions(+), 73 deletions(-)

-- 
2.14.1

[PATCH net-next v3 2/5] net: phy: Export gen10g_* functions

2018-03-01 Thread Florian Fainelli

In order to remove a fair amount of duplication in the different 10G PHY
drivers, export all gen10g_* functions to be able to make use of those.
While we are at it, rename gen10g_soft_reset() to gen10g_no_soft_reset()
to illustrate what it does.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy-c45.c | 20 +---
 include/linux/phy.h   |  8 
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index a4576859afae..0017eddc24db 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -268,12 +268,13 @@ EXPORT_SYMBOL_GPL(genphy_c45_read_mdix);
 
 /* The gen10g_* functions are the old Clause 45 stub */
 
-static int gen10g_config_aneg(struct phy_device *phydev)
+int gen10g_config_aneg(struct phy_device *phydev)
 {
return 0;
 }
+EXPORT_SYMBOL_GPL(gen10g_config_aneg);
 
-static int gen10g_read_status(struct phy_device *phydev)
+int gen10g_read_status(struct phy_device *phydev)
 {
u32 mmd_mask = phydev->c45_ids.devices_in_package;
int ret;
@@ -291,14 +292,16 @@ static int gen10g_read_status(struct phy_device *phydev)
 
return 0;
 }
+EXPORT_SYMBOL_GPL(gen10g_read_status);
 
-static int gen10g_soft_reset(struct phy_device *phydev)
+int gen10g_no_soft_reset(struct phy_device *phydev)
 {
/* Do nothing for now */
return 0;
 }
+EXPORT_SYMBOL_GPL(gen10g_no_soft_reset);
 
-static int gen10g_config_init(struct phy_device *phydev)
+int gen10g_config_init(struct phy_device *phydev)
 {
/* Temporarily just say we support everything */
phydev->supported = SUPPORTED_1baseT_Full;
@@ -306,22 +309,25 @@ static int gen10g_config_init(struct phy_device *phydev)
 
return 0;
 }
+EXPORT_SYMBOL_GPL(gen10g_config_init);
 
-static int gen10g_suspend(struct phy_device *phydev)
+int gen10g_suspend(struct phy_device *phydev)
 {
return 0;
 }
+EXPORT_SYMBOL_GPL(gen10g_suspend);
 
-static int gen10g_resume(struct phy_device *phydev)
+int gen10g_resume(struct phy_device *phydev)
 {
return 0;
 }
+EXPORT_SYMBOL_GPL(gen10g_resume);
 
 struct phy_driver genphy_10g_driver = {
.phy_id = 0x,
.phy_id_mask= 0x,
.name   = "Generic 10G PHY",
-   .soft_reset = gen10g_soft_reset,
+   .soft_reset = gen10g_no_soft_reset,
.config_init= gen10g_config_init,
.features   = 0,
.config_aneg= gen10g_config_aneg,
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5a0c3e53e7c2..6e38c699b753 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -994,6 +994,14 @@ int genphy_c45_pma_setup_forced(struct phy_device *phydev);
 int genphy_c45_an_disable_aneg(struct phy_device *phydev);
 int genphy_c45_read_mdix(struct phy_device *phydev);
 
+/* The gen10g_* functions are the old Clause 45 stub */
+int gen10g_config_aneg(struct phy_device *phydev);
+int gen10g_read_status(struct phy_device *phydev);
+int gen10g_no_soft_reset(struct phy_device *phydev);
+int gen10g_config_init(struct phy_device *phydev);
+int gen10g_suspend(struct phy_device *phydev);
+int gen10g_resume(struct phy_device *phydev);
+
 static inline int phy_read_status(struct phy_device *phydev)
 {
if (!phydev->drv)
-- 
2.14.1
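
For readers skimming the series, the effect of exporting the helpers can be
sketched in plain userspace C. This is only an illustration: struct
phy_device and struct phy_driver below are cut-down stand-ins, and
DEMO_10G_FULL, demo_driver and demo_bring_up() are hypothetical names that do
not exist in the kernel. Only the gen10g_* names come from the patch.

```c
/* Userspace stand-ins for the kernel types; not the real definitions. */
struct phy_device {
	unsigned int supported;
	unsigned int advertising;
};

struct phy_driver {
	const char *name;
	int (*soft_reset)(struct phy_device *phydev);
	int (*config_init)(struct phy_device *phydev);
	int (*config_aneg)(struct phy_device *phydev);
};

#define DEMO_10G_FULL 0x1000u	/* hypothetical feature bit for this sketch */

/* Shared helpers: non-static, so any driver can point its ops at them. */
int gen10g_no_soft_reset(struct phy_device *phydev)
{
	(void)phydev;
	return 0;	/* deliberately does nothing */
}

int gen10g_config_init(struct phy_device *phydev)
{
	phydev->supported = DEMO_10G_FULL;
	phydev->advertising = DEMO_10G_FULL;
	return 0;
}

int gen10g_config_aneg(struct phy_device *phydev)
{
	(void)phydev;
	return 0;
}

/* A hypothetical driver that reuses the helpers instead of private copies. */
static const struct phy_driver demo_driver = {
	.name        = "Demo 10G PHY",
	.soft_reset  = gen10g_no_soft_reset,
	.config_init = gen10g_config_init,
	.config_aneg = gen10g_config_aneg,
};

/* Call the callbacks in sequence and return the first error, if any. */
int demo_bring_up(struct phy_device *phydev)
{
	int ret = demo_driver.soft_reset(phydev);

	if (ret)
		return ret;
	ret = demo_driver.config_init(phydev);
	if (ret)
		return ret;
	return demo_driver.config_aneg(phydev);
}
```

Calling demo_bring_up() on a zeroed struct phy_device returns 0 and leaves
both feature fields set to DEMO_10G_FULL, which is roughly what each driver's
private stub used to do on its own.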

[PATCH net-next v3 1/5] net: phy: aquantia: Utilize genphy_c45_aneg_done()

2018-03-01 Thread Florian Fainelli

The driver duplicates what the generic function does, so use the generic
function instead.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/aquantia.c | 20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index e8ae50e1255e..319edc9c8ec7 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -38,14 +38,6 @@ static int aquantia_config_aneg(struct phy_device *phydev)
return 0;
 }
 
-static int aquantia_aneg_done(struct phy_device *phydev)
-{
-   int reg;
-
-   reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
-   return (reg < 0) ? reg : (reg & BMSR_ANEGCOMPLETE);
-}
-
 static int aquantia_config_intr(struct phy_device *phydev)
 {
int err;
@@ -125,7 +117,7 @@ static struct phy_driver aquantia_driver[] = {
.name   = "Aquantia AQ1202",
.features   = PHY_AQUANTIA_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
-   .aneg_done  = aquantia_aneg_done,
+   .aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
.config_intr= aquantia_config_intr,
.ack_interrupt  = aquantia_ack_interrupt,
@@ -137,7 +129,7 @@ static struct phy_driver aquantia_driver[] = {
.name   = "Aquantia AQ2104",
.features   = PHY_AQUANTIA_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
-   .aneg_done  = aquantia_aneg_done,
+   .aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
.config_intr= aquantia_config_intr,
.ack_interrupt  = aquantia_ack_interrupt,
@@ -149,7 +141,7 @@ static struct phy_driver aquantia_driver[] = {
.name   = "Aquantia AQR105",
.features   = PHY_AQUANTIA_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
-   .aneg_done  = aquantia_aneg_done,
+   .aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
.config_intr= aquantia_config_intr,
.ack_interrupt  = aquantia_ack_interrupt,
@@ -161,7 +153,7 @@ static struct phy_driver aquantia_driver[] = {
.name   = "Aquantia AQR106",
.features   = PHY_AQUANTIA_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
-   .aneg_done  = aquantia_aneg_done,
+   .aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
.config_intr= aquantia_config_intr,
.ack_interrupt  = aquantia_ack_interrupt,
@@ -173,7 +165,7 @@ static struct phy_driver aquantia_driver[] = {
.name   = "Aquantia AQR107",
.features   = PHY_AQUANTIA_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
-   .aneg_done  = aquantia_aneg_done,
+   .aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
.config_intr= aquantia_config_intr,
.ack_interrupt  = aquantia_ack_interrupt,
@@ -185,7 +177,7 @@ static struct phy_driver aquantia_driver[] = {
.name   = "Aquantia AQR405",
.features   = PHY_AQUANTIA_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
-   .aneg_done  = aquantia_aneg_done,
+   .aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
.config_intr= aquantia_config_intr,
.ack_interrupt  = aquantia_ack_interrupt,
-- 
2.14.1
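
The logic being removed from the driver is small enough to model outside the
kernel. The sketch below reproduces the shape of the deleted
aquantia_aneg_done() with a fake register read; struct fake_phy,
fake_read_stat1() and demo_aneg_done() are hypothetical names for this
illustration, and the real genphy_c45_aneg_done() may differ in detail.

```c
#define BMSR_ANEGCOMPLETE 0x0020	/* auto-negotiation complete bit */

/* Hypothetical stand-in for a PHY whose status register can be preset. */
struct fake_phy {
	int stat1;	/* register value, or a negative errno on failure */
};

/* Stand-in for phy_read_mmd(): returns the register value or an error. */
static int fake_read_stat1(const struct fake_phy *phy)
{
	return phy->stat1;
}

/*
 * Same shape as the removed driver function: pass read errors through,
 * otherwise report only the "complete" bit.
 */
int demo_aneg_done(const struct fake_phy *phy)
{
	int reg = fake_read_stat1(phy);

	return (reg < 0) ? reg : (reg & BMSR_ANEGCOMPLETE);
}
```

A caller sees a negative value on a failed read, zero while negotiation is
still running, and a non-zero value once the complete bit is set.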

[PATCH net-next v3 4/5] net: phy: cortina: Utilize generic functions

2018-03-01 Thread Florian Fainelli

cortina_soft_reset() does the same thing as gen10g_no_soft_reset(), and
cortina_config_aneg() is actually doing what gen10g_config_init() does
for 10G capable PHYs.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/cortina.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/net/phy/cortina.c b/drivers/net/phy/cortina.c
index 9442db221834..8022cd317f62 100644
--- a/drivers/net/phy/cortina.c
+++ b/drivers/net/phy/cortina.c
@@ -30,14 +30,6 @@ static int cortina_read_reg(struct phy_device *phydev, u16 regnum)
MII_ADDR_C45 | regnum);
 }
 
-static int cortina_config_aneg(struct phy_device *phydev)
-{
-   phydev->supported = SUPPORTED_1baseT_Full;
-   phydev->advertising = SUPPORTED_1baseT_Full;
-
-   return 0;
-}
-
 static int cortina_read_status(struct phy_device *phydev)
 {
int gpio_int_status, ret = 0;
@@ -61,11 +53,6 @@ static int cortina_read_status(struct phy_device *phydev)
return ret;
 }
 
-static int cortina_soft_reset(struct phy_device *phydev)
-{
-   return 0;
-}
-
 static int cortina_probe(struct phy_device *phydev)
 {
u32 phy_id = 0;
@@ -101,9 +88,10 @@ static struct phy_driver cortina_driver[] = {
.phy_id = PHY_ID_CS4340,
.phy_id_mask= 0x,
.name   = "Cortina CS4340",
-   .config_aneg= cortina_config_aneg,
+   .config_init= gen10g_config_init,
+   .config_aneg= gen10g_config_aneg,
.read_status= cortina_read_status,
-   .soft_reset = cortina_soft_reset,
+   .soft_reset = gen10g_no_soft_reset,
.probe  = cortina_probe,
 },
 };
-- 
2.14.1

[PATCH net-next v3 5/5] net: phy: marvell10g: Utilize gen10g_no_soft_reset()

2018-03-01 Thread Florian Fainelli

We do the same thing as the generic function: nothing, so utilize it.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/marvell10g.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
index 8a0bd98fdec7..4f1efa02a930 100644
--- a/drivers/net/phy/marvell10g.c
+++ b/drivers/net/phy/marvell10g.c
@@ -71,15 +71,6 @@ static int mv3310_probe(struct phy_device *phydev)
return 0;
 }
 
-/*
- * Resetting the MV88X3310 causes it to become non-responsive.  Avoid
- * setting the reset bit(s).
- */
-static int mv3310_soft_reset(struct phy_device *phydev)
-{
-   return 0;
-}
-
 static int mv3310_config_init(struct phy_device *phydev)
 {
__ETHTOOL_DECLARE_LINK_MODE_MASK(supported) = { 0, };
@@ -377,7 +368,7 @@ static struct phy_driver mv3310_drivers[] = {
  SUPPORTED_1baseT_Full |
  SUPPORTED_Backplane,
.probe  = mv3310_probe,
-   .soft_reset = mv3310_soft_reset,
+   .soft_reset = gen10g_no_soft_reset,
.config_init= mv3310_config_init,
.config_aneg= mv3310_config_aneg,
.aneg_done  = mv3310_aneg_done,
-- 
2.14.1

[PATCH net-next v3 3/5] net: phy: teranetics: Utilize generic functions

2018-03-01 Thread Florian Fainelli

Update teranetics_aneg_done() to use genphy_c45_aneg_done() instead of
duplicating that code, and switch to gen10g_* functions where
appropriate instead of maintaining identical copies doing nothing.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/teranetics.c | 32 +---
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/drivers/net/phy/teranetics.c b/drivers/net/phy/teranetics.c
index fb2cef764e9a..22f3bdd8206c 100644
--- a/drivers/net/phy/teranetics.c
+++ b/drivers/net/phy/teranetics.c
@@ -34,39 +34,17 @@ MODULE_LICENSE("GPL v2");
MDIO_PHYXS_LNSTAT_SYNC3 | \
MDIO_PHYXS_LNSTAT_ALIGN)
 
-static int teranetics_config_init(struct phy_device *phydev)
-{
-   phydev->supported = SUPPORTED_1baseT_Full;
-   phydev->advertising = SUPPORTED_1baseT_Full;
-
-   return 0;
-}
-
-static int teranetics_soft_reset(struct phy_device *phydev)
-{
-   return 0;
-}
-
 static int teranetics_aneg_done(struct phy_device *phydev)
 {
-   int reg;
-
/* auto negotiation state can only be checked when using copper
 * port, if using fiber port, just lie it's done.
 */
-   if (!phy_read_mmd(phydev, MDIO_MMD_VEND1, 93)) {
-   reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
-   return (reg < 0) ? reg : (reg & BMSR_ANEGCOMPLETE);
-   }
+   if (!phy_read_mmd(phydev, MDIO_MMD_VEND1, 93))
+   return genphy_c45_aneg_done(phydev);
 
return 1;
 }
 
-static int teranetics_config_aneg(struct phy_device *phydev)
-{
-   return 0;
-}
-
 static int teranetics_read_status(struct phy_device *phydev)
 {
int reg;
@@ -102,10 +80,10 @@ static struct phy_driver teranetics_driver[] = {
.phy_id = PHY_ID_TN2020,
.phy_id_mask= 0x,
.name   = "Teranetics TN2020",
-   .soft_reset = teranetics_soft_reset,
+   .soft_reset = gen10g_no_soft_reset,
.aneg_done  = teranetics_aneg_done,
-   .config_init= teranetics_config_init,
-   .config_aneg= teranetics_config_aneg,
+   .config_init= gen10g_config_init,
+   .config_aneg= gen10g_config_aneg,
.read_status= teranetics_read_status,
.match_phy_device = teranetics_match_phy_device,
 },
-- 
2.14.1

Re: [PATCH] pci-iov: Add support for unmanaged SR-IOV

2018-03-01 Thread Alex Williamson

On Thu, 1 Mar 2018 14:42:40 -0800
Alexander Duyck  wrote:

> On Thu, Mar 1, 2018 at 12:22 PM, Alex Williamson
>  wrote:
> > On Wed, 28 Feb 2018 16:36:38 -0800
> > Alexander Duyck  wrote:
> >  
> >> On Wed, Feb 28, 2018 at 2:59 PM, Alex Williamson
> >>  wrote:  
> >> > On Wed, 28 Feb 2018 09:49:21 -0800
> >> > Alexander Duyck  wrote:
> >> >  
> >> >> On Tue, Feb 27, 2018 at 2:25 PM, Alexander Duyck
> >> >>  wrote:  
> >> >> > On Tue, Feb 27, 2018 at 1:40 PM, Alex Williamson
> >> >> >  wrote:  
> >> >> >> On Tue, 27 Feb 2018 11:06:54 -0800
> >> >> >> Alexander Duyck  wrote:
> >> >> >>  
> >> >> >>> From: Alexander Duyck 
> >> >> >>>
> >> >> >>> This patch is meant to add support for SR-IOV on devices when the 
> >> >> >>> VFs are
> >> >> >>> not managed by the kernel. Examples of recent patches attempting to 
> >> >> >>> do this
> >> >> >>> include:  
> >> >> >>
> >> >> >> It appears to enable sriov when the _pf_ is not managed by the
> >> >> >> kernel, but by "managed" we mean that either there is no pf driver or
> >> >> >> the pf driver doesn't provide an sriov_configure callback,
> >> >> >> intentionally or otherwise.
> >> >> >>  
> >> >> >>> virtio - https://patchwork.kernel.org/patch/10241225/
> >> >> >>> pci-stub - https://patchwork.kernel.org/patch/10109935/
> >> >> >>> vfio - https://patchwork.kernel.org/patch/10103353/
> >> >> >>> uio - https://patchwork.kernel.org/patch/9974031/  
> >> >> >>
> >> >> >> So is the goal to get around the issues with enabling sriov on each 
> >> >> >> of
> >> >> >> the above drivers by doing it under the covers or are you really just
> >> >> >> trying to enable sriov for a truly unmanaged (no pf driver) case?  For
> >> >> >> example, should a driver explicitly not wanting sriov enabled 
> >> >> >> implement
> >> >> >> a dummy sriov_configure function?
> >> >> >>  
> >> >> >>> Since this is quickly blowing up into a multi-driver problem it is 
> >> >> >>> probably
> >> >> >>> best to implement this solution in one spot.
> >> >> >>>
> >> >> >>> This patch is an attempt to do that. What we do with this patch is 
> >> >> >>> provide
> >> >> >>> a generic call to enable SR-IOV in the case that the PF driver is 
> >> >> >>> either
> >> >> >>> not present, or the PF driver doesn't support configuring SR-IOV.
> >> >> >>>
> >> >> >>> A new sysfs value called sriov_unmanaged_autoprobe has been added. 
> >> >> >>> This
> >> >> >>> value is used as the drivers_autoprobe setting of the VFs when they 
> >> >> >>> are
> >> >> >>> being managed by an external entity such as userspace or device 
> >> >> >>> firmware
> >> >> >>> instead of being managed by the kernel.  
> >> >> >>
> >> >> >> Documentation/ABI/testing/sysfs-bus-pci update is missing.  
> >> >> >
> >> >> > I can make sure to update that in the next version.
> >> >> >  
> >> >> >>> One side effect of this change is that the sriov_drivers_autoprobe 
> >> >> >>> and
> >> >> >>> sriov_unmanaged_autoprobe will only apply their updates when SR-IOV 
> >> >> >>> is
> >> >> >>> disabled. Attempts to update them when SR-IOV is in use will only 
> >> >> >>> update
> >> >> >>> the local value and will not update sriov->autoprobe.  
> >> >> >>
> >> >> >> And we expect users to understand when sriov_drivers_autoprobe 
> >> >> >> applies
> >> >> >> vs sriov_unmanaged_autoprobe, even though they're using the same
> >> >> >> interfaces to enable sriov?  Are all combinations expected to work, 
> >> >> >> ex.
> >> >> >> unmanaged sriov is enabled, a native pf driver loads, vfs work?  Not
> >> >> >> only does it seem like there's opportunity to use this incorrectly, 
> >> >> >> I
> >> >> >> think maybe it might be difficult to use correctly.
> >> >> >>  
> >> >> >>> I based my patch set originally on the patch by Mark Rustad but 
> >> >> >>> there isn't
> >> >> >>> much left after going through and cleaning out the bits that were 
> >> >> >>> no longer
> >> >> >>> needed, and after incorporating the feedback from David Miller.
> >> >> >>>
> >> >> >>> I have included the authors of the original 4 patches above in the 
> >> >> >>> Cc here.
> >> >> >>> My hope is to get feedback and/or review on if this works for their 
> >> >> >>> use
> >> >> >>> cases.
> >> >> >>>
> >> >> >>> Cc: Mark Rustad 
> >> >> >>> Cc: Maximilian Heyne 
> >> >> >>> Cc: Liang-Min Wang 
> >> >> >>> Cc: David Woodhouse 
> >> >> >>> Signed-off-by: Alexander Duyck 
> >> >> >>> ---
> >> >> >>>  drivers/pci/iov.c|   27 +++-
> >> >> >>>  drivers/pci/pci-driver.c |2 +
> >> >> >>>  drivers/pci/pci-sysfs.c  |   62 
> >> >> >>> +-
> >> >> >>>

Re: [PATCH net 8/9] hv_netvsc: propagate rx filters to VF

2018-03-01 Thread Stephen Hemminger

On Thu,  1 Mar 2018 10:27:55 -0800
Stephen Hemminger  wrote:

> + if (change & IFF_PROMISC)
> + dev_set_promiscuity(net,
> + (net->flags & IFF_PROMISC) ? 1 : -1);

This should be vf_netdev here.
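
To make the point concrete, here is a userspace sketch of the intended
behaviour, assuming the goal is to mirror a promiscuity change on the
synthetic device onto the VF. struct demo_netdev, demo_set_promiscuity() and
demo_propagate_rx_flags() are hypothetical stand-ins, not the driver's code.

```c
#define IFF_PROMISC 0x100	/* same value as in <linux/if.h> */

/* Minimal stand-in for struct net_device. */
struct demo_netdev {
	unsigned int flags;
	int promiscuity;	/* reference count, as in the kernel */
};

/* Stand-in for dev_set_promiscuity(): adjust the counter by inc. */
static void demo_set_promiscuity(struct demo_netdev *dev, int inc)
{
	dev->promiscuity += inc;
	if (dev->promiscuity > 0)
		dev->flags |= IFF_PROMISC;
	else
		dev->flags &= ~IFF_PROMISC;
}

/*
 * When IFF_PROMISC flips on the synthetic device, apply the matching
 * +1/-1 to the VF, not to the synthetic device itself.
 */
void demo_propagate_rx_flags(const struct demo_netdev *net,
			     struct demo_netdev *vf_netdev,
			     unsigned int change)
{
	if (change & IFF_PROMISC)
		demo_set_promiscuity(vf_netdev,
				     (net->flags & IFF_PROMISC) ? 1 : -1);
}
```

Passing the synthetic device as the first argument of the set call, as in the
quoted hunk, would bump the wrong counter and leave the VF untouched.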

[RFC 4/5] fm10k: use seq_open_data()

2018-03-01 Thread Rasmus Villemoes

Simplify the code slightly by having seq_open_data do the ->private assignment.

Signed-off-by: Rasmus Villemoes 
---
 drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
index 14df09e2d964..acf034feb8fa 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
@@ -132,20 +132,13 @@ static int fm10k_dbg_desc_open(struct inode *inode, struct file *filep)
struct fm10k_ring *ring = inode->i_private;
struct fm10k_q_vector *q_vector = ring->q_vector;
const struct seq_operations *desc_seq_ops;
-   int err;
 
if (ring < q_vector->rx.ring)
desc_seq_ops = &fm10k_dbg_tx_desc_seq_ops;
else
desc_seq_ops = &fm10k_dbg_rx_desc_seq_ops;
 
-   err = seq_open(filep, desc_seq_ops);
-   if (err)
-   return err;
-
-   ((struct seq_file *)filep->private_data)->private = ring;
-
-   return 0;
+   return seq_open_data(filep, desc_seq_ops, ring);
 }
 
 static const struct file_operations fm10k_dbg_desc_fops = {
-- 
2.15.1
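
Judging only from the lines this patch removes, the helper presumably wraps
seq_open() and then stores the caller's pointer in ->private. The following
userspace sketch models that guess; the struct definitions are cut-down
stand-ins and demo_seq_open() / demo_seq_open_data() are hypothetical names,
since the helper's actual definition is in another patch of the RFC series.

```c
#include <stdlib.h>

/* Cut-down stand-ins for the kernel structures. */
struct seq_operations { int unused; };

struct seq_file {
	const struct seq_operations *op;
	void *private;
};

struct file {
	void *private_data;
};

/* Stand-in for seq_open(): allocate a seq_file and hang it off the file. */
static int demo_seq_open(struct file *filep, const struct seq_operations *op)
{
	struct seq_file *m = calloc(1, sizeof(*m));

	if (!m)
		return -12;	/* -ENOMEM */
	m->op = op;
	filep->private_data = m;
	return 0;
}

/* Guess at the helper: open, then store the caller's pointer in ->private. */
int demo_seq_open_data(struct file *filep, const struct seq_operations *op,
		       void *data)
{
	int err = demo_seq_open(filep, op);

	if (err)
		return err;
	((struct seq_file *)filep->private_data)->private = data;
	return 0;
}
```

With a helper of this shape, an open callback shrinks to a single return
statement, which is the simplification the diffstat reflects.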

[PATCH net 3/4] net/mac89x0: Fix and modernize log messages

2018-03-01 Thread Finn Thain

Fix log message fragments that no longer produce the desired output
since the behaviour of printk() was changed.
Add missing printk severity levels.
Drop deprecated "out of memory" message as per checkpatch advice.

Signed-off-by: Finn Thain 
---
 drivers/net/ethernet/cirrus/mac89x0.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/cirrus/mac89x0.c b/drivers/net/ethernet/cirrus/mac89x0.c
index 911139abbe20..9a266496a538 100644
--- a/drivers/net/ethernet/cirrus/mac89x0.c
+++ b/drivers/net/ethernet/cirrus/mac89x0.c
@@ -56,6 +56,8 @@
   local_irq_{dis,en}able()
 */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 static const char version[] =
 "cs89x0.c:v1.02 11/26/96 Russell Nelson \n";
 
@@ -245,16 +247,14 @@ static int mac89x0_device_probe(struct platform_device *pdev)
if (net_debug && version_printed++ == 0)
printk(version);
 
-   printk(KERN_INFO "%s: cs89%c0%s rev %c found at %#8lx",
-  dev->name,
-  lp->chip_type==CS8900?'0':'2',
-  lp->chip_type==CS8920M?"M":"",
-  lp->chip_revision,
-  dev->base_addr);
+   pr_info("cs89%c0%s rev %c found at %#8lx\n",
+   lp->chip_type == CS8900 ? '0' : '2',
+   lp->chip_type == CS8920M ? "M" : "",
+   lp->chip_revision, dev->base_addr);
 
/* Try to read the MAC address */
if ((readreg(dev, PP_SelfST) & (EEPROM_PRESENT | EEPROM_OK)) == 0) {
-   printk("\nmac89x0: No EEPROM, giving up now.\n");
+   pr_info("No EEPROM, giving up now.\n");
goto out1;
 } else {
 for (i = 0; i < ETH_ALEN; i += 2) {
@@ -269,7 +269,7 @@ static int mac89x0_device_probe(struct platform_device *pdev)
 
/* print the IRQ and ethernet address. */
 
-   printk(" IRQ %d ADDR %pM\n", dev->irq, dev->dev_addr);
+   pr_info("MAC %pM, IRQ %d\n", dev->dev_addr, dev->irq);
 
dev->netdev_ops = &mac89x0_netdev_ops;
 
@@ -472,7 +472,6 @@ net_rx(struct net_device *dev)
/* Malloc up new buffer. */
skb = alloc_skb(length, GFP_ATOMIC);
if (skb == NULL) {
-   printk("%s: Memory squeeze, dropping packet.\n", dev->name);
dev->stats.rx_dropped++;
return;
}
@@ -560,7 +559,7 @@ static int set_mac_address(struct net_device *dev, void *addr)
return -EADDRNOTAVAIL;
 
memcpy(dev->dev_addr, saddr->sa_data, ETH_ALEN);
-   printk("%s: Setting MAC address to %pM\n", dev->name, dev->dev_addr);
+   netdev_info(dev, "Setting MAC address to %pM\n", dev->dev_addr);
 
/* set the Ethernet address */
for (i=0; i < ETH_ALEN/2; i++)
-- 
2.16.1
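
The pr_fmt() definition added by this patch works through string literal
concatenation, which can be demonstrated in userspace. In the sketch below,
KBUILD_MODNAME is defined by hand (the kernel build normally supplies it) and
demo_format_irq() is a hypothetical helper that formats into a buffer
instead of writing to the kernel log.

```c
#include <stdio.h>

#define KBUILD_MODNAME "mac89x0"	/* normally supplied by the build system */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Format an IRQ report the way a prefixed pr_info() call would. */
int demo_format_irq(char *buf, size_t size, int irq)
{
	/* pr_fmt() glues the module name onto the literal at compile time. */
	return snprintf(buf, size, pr_fmt("IRQ %d\n"), irq);
}
```

Because the prefix is applied by the macro, the individual messages no longer
need to spell out the driver name, which is why the explicit "mac89x0:" text
disappears from the printk() calls in the diff.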

[PATCH net 2/4] net/mac89x0: Convert to platform_driver

2018-03-01 Thread Finn Thain

Apparently these Dayna cards don't have a pseudoslot declaration ROM
which means they can't be probed like NuBus cards.

Cc: Geert Uytterhoeven 
Signed-off-by: Finn Thain 
Acked-by: Geert Uytterhoeven 
---
 arch/m68k/mac/config.c|  4 +++
 drivers/net/Space.c   |  3 --
 drivers/net/ethernet/cirrus/mac89x0.c | 68 +++
 include/net/Space.h   |  1 -
 4 files changed, 33 insertions(+), 43 deletions(-)

diff --git a/arch/m68k/mac/config.c b/arch/m68k/mac/config.c
index d3d435248a24..c73eb8209555 100644
--- a/arch/m68k/mac/config.c
+++ b/arch/m68k/mac/config.c
@@ -1088,6 +1088,10 @@ int __init mac_platform_init(void)
macintosh_config->expansion_type == MAC_EXP_PDS_COMM)
platform_device_register_simple("macsonic", -1, NULL, 0);
 
+   if (macintosh_config->expansion_type == MAC_EXP_PDS ||
+   macintosh_config->expansion_type == MAC_EXP_PDS_COMM)
+   platform_device_register_simple("mac89x0", -1, NULL, 0);
+
if (macintosh_config->ether_type == MAC_ETHER_MACE)
platform_device_register_simple("macmace", -1, NULL, 0);
 
diff --git a/drivers/net/Space.c b/drivers/net/Space.c
index 64333ec999ac..3afda6561434 100644
--- a/drivers/net/Space.c
+++ b/drivers/net/Space.c
@@ -113,9 +113,6 @@ static struct devprobe2 m68k_probes[] __initdata = {
 #endif
 #ifdef CONFIG_MVME147_NET  /* MVME147 internal Ethernet */
{mvme147lance_probe, 0},
-#endif
-#ifdef CONFIG_MAC89x0
-   {mac89x0_probe, 0},
 #endif
{NULL, 0},
 };
diff --git a/drivers/net/ethernet/cirrus/mac89x0.c b/drivers/net/ethernet/cirrus/mac89x0.c
index 4fe0ae93ab36..911139abbe20 100644
--- a/drivers/net/ethernet/cirrus/mac89x0.c
+++ b/drivers/net/ethernet/cirrus/mac89x0.c
@@ -93,6 +93,7 @@ static const char version[] =
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -105,6 +106,10 @@ static const char version[] =
 
 #include "cs89x0.h"
 
+static int debug;
+module_param(debug, int, 0);
+MODULE_PARM_DESC(debug, "CS89[02]0 debug level (0-5)");
+
 static unsigned int net_debug = NET_DEBUG;
 
 /* Information that need to be kept for each board. */
@@ -167,10 +172,9 @@ static const struct net_device_ops mac89x0_netdev_ops = {
 
 /* Probe for the CS8900 card in slot E.  We won't bother looking
anywhere else until we have a really good reason to do so. */
-struct net_device * __init mac89x0_probe(int unit)
+static int mac89x0_device_probe(struct platform_device *pdev)
 {
struct net_device *dev;
-   static int once_is_enough;
struct net_local *lp;
static unsigned version_printed;
int i, slot;
@@ -180,21 +184,11 @@ struct net_device * __init mac89x0_probe(int unit)
int err = -ENODEV;
struct nubus_rsrc *fres;
 
-   if (!MACH_IS_MAC)
-   return ERR_PTR(-ENODEV);
+   net_debug = debug;
 
dev = alloc_etherdev(sizeof(struct net_local));
if (!dev)
-   return ERR_PTR(-ENOMEM);
-
-   if (unit >= 0) {
-   sprintf(dev->name, "eth%d", unit);
-   netdev_boot_setup_check(dev);
-   }
-
-   if (once_is_enough)
-   goto out;
-   once_is_enough = 1;
+   return -ENOMEM;
 
/* We might have to parameterize this later */
slot = 0xE;
@@ -221,6 +215,8 @@ struct net_device * __init mac89x0_probe(int unit)
if (sig != swab16(CHIP_EISA_ID_SIG))
goto out;
 
+   SET_NETDEV_DEV(dev, &pdev->dev);
+
/* Initialize the net_device structure. */
lp = netdev_priv(dev);
 
@@ -280,12 +276,14 @@ struct net_device * __init mac89x0_probe(int unit)
err = register_netdev(dev);
if (err)
goto out1;
-   return NULL;
+
+   platform_set_drvdata(pdev, dev);
+   return 0;
 out1:
nubus_writew(0, dev->base_addr + ADD_PORT);
 out:
free_netdev(dev);
-   return ERR_PTR(err);
+   return err;
 }
 
 /* Open/initialize the board.  This is called (in the current kernel)
@@ -571,32 +569,24 @@ static int set_mac_address(struct net_device *dev, void *addr)
return 0;
 }
 
-#ifdef MODULE
-
-static struct net_device *dev_cs89x0;
-static int debug;
-
-module_param(debug, int, 0);
-MODULE_PARM_DESC(debug, "CS89[02]0 debug level (0-5)");
 MODULE_LICENSE("GPL");
 
-int __init
-init_module(void)
+static int mac89x0_device_remove(struct platform_device *pdev)
 {
-   net_debug = debug;
-dev_cs89x0 = mac89x0_probe(-1);
-   if (IS_ERR(dev_cs89x0)) {
-printk(KERN_WARNING "mac89x0.c: No card found\n");
-   return PTR_ERR(dev_cs89x0);
-   }
+   struct net_device *dev = platform_get_drvdata(pdev);
+
+   unregister_netdev(dev);
+   nubus_writew(0, dev->base_addr + ADD_PORT);
+   free_netdev(dev);
return 0;
 }
 
-void

[PATCH net 0/4] Fixes, cleanup and modernization for mac89x0 driver

2018-03-01 Thread Finn Thain

Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Added acked-by tags from Geert Uytterhoeven.
- Omitted patches unrelated to mac89x0 driver.


Finn Thain (4):
  net/mac89x0: Remove redundant code
  net/mac89x0: Convert to platform_driver
  net/mac89x0: Fix and modernize log messages
  net/mac89x0: Replace custom debug logging with netif_* calls

 arch/m68k/mac/config.c|   4 +
 drivers/net/Space.c   |   3 -
 drivers/net/ethernet/cirrus/mac89x0.c | 158 +++---
 include/net/Space.h   |   1 -
 4 files changed, 53 insertions(+), 113 deletions(-)

-- 
2.16.1

[PATCH net 4/4] net/mac89x0: Replace custom debug logging with netif_* calls

2018-03-01 Thread Finn Thain

Adopt the conventional style of debug logging because it is both
shorter and more flexible.
Remove the 'version_printed' flag as the version will be printed
only once anyway (when the module loads).

Signed-off-by: Finn Thain 
---
 drivers/net/ethernet/cirrus/mac89x0.c | 47 +++
 1 file changed, 15 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/cirrus/mac89x0.c b/drivers/net/ethernet/cirrus/mac89x0.c
index 9a266496a538..3f8fe8fd79cc 100644
--- a/drivers/net/ethernet/cirrus/mac89x0.c
+++ b/drivers/net/ethernet/cirrus/mac89x0.c
@@ -61,18 +61,6 @@
 static const char version[] =
 "cs89x0.c:v1.02 11/26/96 Russell Nelson \n";
 
-/* === configure the driver here === */
-
-/* use 0 for production, 1 for verification, >2 for debug */
-#ifndef NET_DEBUG
-#define NET_DEBUG 0
-#endif
-
-/* === end of configuration === */
-
-
-/* Always include 'config.h' first in case the user wants to turn on
-   or override something. */
 #include 
 
 /*
@@ -108,14 +96,13 @@ static const char version[] =
 
 #include "cs89x0.h"
 
-static int debug;
+static int debug = -1;
 module_param(debug, int, 0);
-MODULE_PARM_DESC(debug, "CS89[02]0 debug level (0-5)");
-
-static unsigned int net_debug = NET_DEBUG;
+MODULE_PARM_DESC(debug, "debug message level");
 
 /* Information that need to be kept for each board. */
 struct net_local {
+   int msg_enable;
int chip_type;  /* one of: CS8900, CS8920, CS8920M */
char chip_revision; /* revision letter of the chip ('A'...) */
int send_cmd;   /* the propercommand used to send a packet. */
@@ -178,7 +165,6 @@ static int mac89x0_device_probe(struct platform_device *pdev)
 {
struct net_device *dev;
struct net_local *lp;
-   static unsigned version_printed;
int i, slot;
unsigned rev_type = 0;
unsigned long ioaddr;
@@ -186,8 +172,6 @@ static int mac89x0_device_probe(struct platform_device *pdev)
int err = -ENODEV;
struct nubus_rsrc *fres;
 
-   net_debug = debug;
-
dev = alloc_etherdev(sizeof(struct net_local));
if (!dev)
return -ENOMEM;
@@ -222,6 +206,8 @@ static int mac89x0_device_probe(struct platform_device *pdev)
/* Initialize the net_device structure. */
lp = netdev_priv(dev);
 
+   lp->msg_enable = netif_msg_init(debug, 0);
+
/* Fill in the 'dev' fields. */
dev->base_addr = ioaddr;
dev->mem_start = (unsigned long)
@@ -244,8 +230,7 @@ static int mac89x0_device_probe(struct platform_device *pdev)
if (lp->chip_type != CS8900 && lp->chip_revision >= 'C')
lp->send_cmd = TX_NOW;
 
-   if (net_debug && version_printed++ == 0)
-   printk(version);
+   netif_dbg(lp, drv, dev, "%s", version);
 
pr_info("cs89%c0%s rev %c found at %#8lx\n",
lp->chip_type == CS8900 ? '0' : '2',
@@ -345,11 +330,9 @@ net_send_packet(struct sk_buff *skb, struct net_device *dev)
struct net_local *lp = netdev_priv(dev);
unsigned long flags;
 
-   if (net_debug > 3)
-   printk("%s: sent %d byte packet of type %x\n",
-  dev->name, skb->len,
-  (skb->data[ETH_ALEN+ETH_ALEN] << 8)
-  | skb->data[ETH_ALEN+ETH_ALEN+1]);
+   netif_dbg(lp, tx_queued, dev, "sent %d byte packet of type %x\n",
+ skb->len, skb->data[ETH_ALEN + ETH_ALEN] << 8 |
+ skb->data[ETH_ALEN + ETH_ALEN + 1]);
 
/* keep the upload from being interrupted, since we
   ask the chip to start transmitting before the
@@ -398,7 +381,7 @@ static irqreturn_t net_interrupt(int irq, void *dev_id)
faster than you can read them off, you're screwed.  Hasta la
vista, baby!  */
while ((status = swab16(nubus_readw(dev->base_addr + ISQ_PORT)))) {
-   if (net_debug > 4)printk("%s: event=%04x\n", dev->name, status);
+   netif_dbg(lp, intr, dev, "status=%04x\n", status);
switch(status & ISQ_EVENT_MASK) {
case ISQ_RECEIVER_EVENT:
/* Got a packet(s). */
@@ -428,7 +411,7 @@ static irqreturn_t net_interrupt(int irq, void *dev_id)
netif_wake_queue(dev);
}
if (status & TX_UNDERRUN) {
-   if (net_debug > 0) printk("%s: transmit underrun\n", dev->name);
+   netif_dbg(lp, tx_err, dev, "transmit underrun\n");
 lp->send_underrun++;
 if (lp->send_underrun == 3) lp->send_cmd = TX_AFTER_381;
 else if (lp->send_underrun == 6) lp->send_cmd = TX_AFTER_ALL;
@@ -449,6 +432,7 @@ static irqreturn_t
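
For context on the 'debug' module parameter, the mapping from a debug level
to a message bitmap can be sketched in userspace. The function below follows
my understanding of how netif_msg_init() behaved around this time and should
be read as an approximation, not a copy; demo_msg_init(), demo_msg_enabled()
and the DEMO_MSG_* constants are hypothetical names (the two bit values are
meant to match NETIF_MSG_DRV and NETIF_MSG_TX_ERR).

```c
#include <stdint.h>

#define DEMO_MSG_DRV    0x0001u	/* intended to match NETIF_MSG_DRV */
#define DEMO_MSG_TX_ERR 0x0080u	/* intended to match NETIF_MSG_TX_ERR */

/* Turn a debug level into a bitmap of enabled message classes. */
uint32_t demo_msg_init(int debug_value, uint32_t default_bits)
{
	if (debug_value < 0 || debug_value >= 32)
		return default_bits;	/* out of range: use the driver default */
	if (debug_value == 0)
		return 0;		/* no output */
	return (UINT32_C(1) << debug_value) - 1;	/* set the low N bits */
}

/* Check whether one message class is enabled in the bitmap. */
int demo_msg_enabled(uint32_t msg_enable, uint32_t type)
{
	return (msg_enable & type) != 0;
}
```

Under this reading, the patch's default of debug = -1 combined with a default
bitmap of 0 leaves every class disabled until the user passes a positive
level, and a level of 8 is the first that enables transmit error messages.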

[PATCH net 1/4] net/mac89x0: Remove redundant code

2018-03-01 Thread Finn Thain

Signed-off-by: Finn Thain 
---
 drivers/net/ethernet/cirrus/mac89x0.c | 32 
 1 file changed, 32 deletions(-)

diff --git a/drivers/net/ethernet/cirrus/mac89x0.c b/drivers/net/ethernet/cirrus/mac89x0.c
index 977d4c2c759d..4fe0ae93ab36 100644
--- a/drivers/net/ethernet/cirrus/mac89x0.c
+++ b/drivers/net/ethernet/cirrus/mac89x0.c
@@ -115,14 +115,9 @@ struct net_local {
int rx_mode;
int curr_rx_cfg;
 int send_underrun;  /* keep track of how many underruns in a row we get */
-   struct sk_buff *skb;
 };
 
 /* Index to functions, as function prototypes. */
-
-#if 0
-extern void reset_chip(struct net_device *dev);
-#endif
 static int net_open(struct net_device *dev);
 static int net_send_packet(struct sk_buff *skb, struct net_device *dev);
 static irqreturn_t net_interrupt(int irq, void *dev_id);
@@ -132,10 +127,6 @@ static int net_close(struct net_device *dev);
 static struct net_device_stats *net_get_stats(struct net_device *dev);
 static int set_mac_address(struct net_device *dev, void *addr);
 
-
-/* Example routines you must write ;->. */
-#define tx_done(dev) 1
-
 /* For reading/writing registers ISA-style */
 static inline int
 readreg_io(struct net_device *dev, int portno)
@@ -297,24 +288,6 @@ struct net_device * __init mac89x0_probe(int unit)
return ERR_PTR(err);
 }
 
-#if 0
-/* This is useful for something, but I don't know what yet. */
-void __init reset_chip(struct net_device *dev)
-{
-   int reset_start_time;
-
-   writereg(dev, PP_SelfCTL, readreg(dev, PP_SelfCTL) | POWER_ON_RESET);
-
-   /* wait 30 ms */
-   msleep_interruptible(30);
-
-   /* Wait until the chip is reset */
-   reset_start_time = jiffies;
-   while( (readreg(dev, PP_SelfST) & INIT_DONE) == 0 && jiffies - reset_start_time < 2)
-   ;
-}
-#endif
-
 /* Open/initialize the board.  This is called (in the current kernel)
sometime after booting when the 'ifconfig' program is run.
 
@@ -416,11 +389,6 @@ static irqreturn_t net_interrupt(int irq, void *dev_id)
struct net_local *lp;
int ioaddr, status;
 
-   if (dev == NULL) {
-   printk ("net_interrupt(): irq %d for unknown device.\n", irq);
-   return IRQ_NONE;
-   }
-
ioaddr = dev->base_addr;
lp = netdev_priv(dev);
 
-- 
2.16.1

Re: [PATCH bpf-next] samples/bpf: detach prog from cgroup

2018-03-01 Thread Daniel Borkmann

On 03/01/2018 06:47 AM, Prashant Bhole wrote:
> test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program
> attached to cgroup even after completion.
> Using detach functionality of test_cgrp2_sock in both scripts.
> 
> Signed-off-by: Prashant Bhole 

Applied to bpf-next, thanks Prashant!

linux-next: manual merge of the net-next tree with the net tree

2018-03-01 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/smc/smc_core.c

between commit:

  2be922f31606 ("net/smc: use link_id of server in confirm link reply")

from the net tree and commit:

  52bedf37bafe ("net/smc: process add/delete link messages")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.



-- 
Cheers,
Stephen Rothwell


pgpT6w9uNnMau.pgp
Description: OpenPGP digital signature

Re: [PATCH net-next v2 1/5] net: fib_rules: support for match on ip_proto, sport and dport

2018-03-01 Thread Roopa Prabhu

On Thu, Mar 1, 2018 at 2:48 PM, Eric Dumazet  wrote:
> On Tue, 2018-02-27 at 19:52 -0800, Roopa Prabhu wrote:
>> From: Roopa Prabhu 
>>
>> uapi for ip_proto, sport and dport range match
>> in fib rules.
>
>
> Hi Roopa
>
> FRA_UNSPEC,
>>   FRA_DST,/* destination address */
>> @@ -59,6 +64,9 @@ enum {
>>   FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
>>   FRA_UID_RANGE,  /* UID range */
>>   FRA_PROTOCOL,   /* Originator of the rule */
>> + FRA_IP_PROTO,   /* ip proto */
>> + FRA_SPORT_RANGE, /* sport */
>> + FRA_DPORT_RANGE, /* dport */
>>   __FRA_MAX
>>  };
>
>
> It seems you forgot to update FRA_GENERIC_POLICY ?
>

indeed. will submit a fix. thanks Eric!.

Re: [PATCH bpf-next] samples/bpf: detach prog from cgroup

2018-03-01 Thread David Ahern

On 2/28/18 10:47 PM, Prashant Bhole wrote:
> test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program
> attached to cgroup even after completion.
> Using detach functionality of test_cgrp2_sock in both scripts.
> 
> Signed-off-by: Prashant Bhole 
> ---
>  samples/bpf/test_cgrp2_sock.sh  | 1 +
>  samples/bpf/test_cgrp2_sock2.sh | 3 +++
>  2 files changed, 4 insertions(+)
> 
Acked-by: David Ahern

Re: [PATCH net-next v2 1/5] net: fib_rules: support for match on ip_proto, sport and dport

2018-03-01 Thread Eric Dumazet

On Tue, 2018-02-27 at 19:52 -0800, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> uapi for ip_proto, sport and dport range match
> in fib rules.


Hi Roopa

FRA_UNSPEC,
>   FRA_DST,/* destination address */
> @@ -59,6 +64,9 @@ enum {
>   FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
>   FRA_UID_RANGE,  /* UID range */
>   FRA_PROTOCOL,   /* Originator of the rule */
> + FRA_IP_PROTO,   /* ip proto */
> + FRA_SPORT_RANGE, /* sport */
> + FRA_DPORT_RANGE, /* dport */
>   __FRA_MAX
>  };


It seems you forgot to update FRA_GENERIC_POLICY ?

Thanks !
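
To illustrate why a new attribute also needs a policy entry, here is a minimal, self-contained sketch. It is not the kernel's nla_policy code: the DEMO_* names, struct demo_policy and demo_validate() are hypothetical stand-ins, and the only assumption carried over from the thread is that attributes are checked against a table indexed by attribute type. An attribute without a table entry is accepted with no length check at all, which is the gap a forgotten policy update leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute ids, loosely modelled on the FRA_* additions. */
enum {
	DEMO_UNSPEC,
	DEMO_IP_PROTO,
	DEMO_SPORT_RANGE,
	DEMO_DPORT_RANGE,
	DEMO_ATTR_COUNT
};
#define DEMO_MAX (DEMO_ATTR_COUNT - 1)

/* Hypothetical payload for a port range attribute. */
struct demo_port_range {
	uint16_t start;
	uint16_t end;
};

/* Simplified policy entry: minimum payload length, 0 means "no entry". */
struct demo_policy {
	size_t min_len;
};

static const struct demo_policy demo_policy[DEMO_MAX + 1] = {
	[DEMO_IP_PROTO]    = { .min_len = sizeof(uint8_t) },
	[DEMO_SPORT_RANGE] = { .min_len = sizeof(struct demo_port_range) },
	[DEMO_DPORT_RANGE] = { .min_len = sizeof(struct demo_port_range) },
};

/* Returns 0 if the attribute passes the table check, -1 if it is too short. */
static int demo_validate(int type, size_t payload_len)
{
	assert(type >= 0);
	if (type > DEMO_MAX)
		return 0;	/* unknown attribute: ignored by this sketch */
	if (demo_policy[type].min_len == 0)
		return 0;	/* no policy entry: accepted unchecked */
	return payload_len >= demo_policy[type].min_len ? 0 : -1;
}
```

With the three entries present, a two-byte DEMO_SPORT_RANGE payload is rejected; delete its entry from the table and the same payload is accepted, so the consumer would read past the end of it.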

Re: [net-next v3 0/2] eBPF seccomp filters

2018-03-01 Thread Sargun Dhillon

On Thu, Mar 1, 2018 at 1:59 PM, Andy Lutomirski  wrote:
> On Thu, Mar 1, 2018 at 9:51 PM, Sargun Dhillon  wrote:
>> On Thu, Mar 1, 2018 at 9:44 AM, Andy Lutomirski  wrote:
>>> On Wed, Feb 28, 2018 at 7:56 PM, Daniel Borkmann wrote:
 On 02/28/2018 12:55 AM, chris hyser wrote:
>> On 02/27/2018 04:58 PM, Daniel Borkmann wrote:
>> On 02/27/2018 05:59 PM, chris hyser wrote:
 On 02/27/2018 11:00 AM, Kees Cook wrote:
> On Tue, Feb 27, 2018 at 6:53 AM, chris hyser wrote:
>> On 02/26/2018 11:38 PM, Kees Cook wrote:
>>> On Mon, Feb 26, 2018 at 8:19 PM, Andy Lutomirski wrote:

 3. Straight-up bugs.  Those are exactly as problematic as verifier
 bugs in any other unprivileged eBPF program type, right?  I don't see
 why seccomp is special here.
>>>
>>> My concern is more about unintended design mistakes or other feature
>>> creep with side-effects, especially when it comes to privileges and
>>> synchronization. Getting no-new-privs done correctly, for example,
>>> took some careful thought and discussion, and I'm shy from how painful
>>> TSYNC was on the process locking side, and eBPF has had some rather
>>> ugly flaws in the past (and recently: it was nice to be able to say
>>> for Spectre that seccomp filters couldn't be constructed to make
>>> attacks but eBPF could). Adding the complexity needs to be worth the
>>>
>>> Well, not really. One part of all the Spectre mitigations that went upstream
>>> from BPF side was to have an option to remove interpreter entirely and that
>>> also relates to seccomp eventually. But other than that an attacker might
>>> potentially find as well useful gadgets inside seccomp or any other code
>>> that is inside the kernel, so it's not a strict necessity either.
>>>
>>> gain. I'm on board for doing it, I just want to be careful. :)
>>
>> Another option might be to remove c/eBPF from the equation altogether.
>> c/eBPF allows flexibility and that almost always comes at the cost of
>> additional security risk. Seccomp is for enhanced security, yes? How about a
>> new seccomp mode that passes in something like a bit vector or hashmap for
>> "simple" white/black list checks validated by kernel code, versus user
>> provided interpreted code? Of course this removes a fair number of things
>> you can currently do or would be able to do with eBPF. Of course, restated
>> from a security point of view, this removes a fair number of things an
>> _attacker_ can do. Presumably the performance improvement would also be
>> significant.
>>>
>>> Good luck with not breaking existing applications relying on seccomp out
>>> there.
>>
>> This wasn't in the context of an implementation proposal, but the 
>> assumption would be to add this in addition to the old way. Now, does 
>> that make sense to do? That is the discussion.

 I see; didn't read that out from the above when you also mentioned removing
 cBPF, but fair enough.

>> Is this an idea worth prototyping?
>
> That was the original prototype for seccomp-filter. :) The discussion
> around that from years ago basically boiled down to it being
> inflexible. Given all the things people want to do at syscall time,
> that continues to be true. So true, in fact, that here we are now,
> trying to move to eBPF from cBPF. ;)
>>>
>>> Right, agree. cBPF is also pretty much frozen these days and aside from
>>> that, seccomp/BPF also just uses a proper subset of it. I wouldn't mind
>>> doing something similar for eBPF side as long as this is reasonably
>>> maintainable and not making BPF core more complex, but most of it can
>>> already be set in the verifier anyway based on prog type. Note, that
>>> performance of seccomp/BPF is definitely a demand as well which is why
>>> people still extend the old remaining cBPF JITs today such that it can
>>> be JITed also from there.
>>>
 I will try to find that discussion. As someone pointed out here 
 though, eBPF is being used by more and more people in areas where 
 security is not the primary concern. Differing objectives will make 
 this a long term continuing issue. We ourselves were looking at eBPF 
 simply as a means to use a hashmap for a white/blacklist, i.e. 
 performance not flexibility.
>>>
>>> Not really, security of
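
The bit vector variant of the "simple" white/black list check that chris hyser floats earlier in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: nothing like it exists as a seccomp mode, and DEMO_NR_SYSCALLS, struct demo_allowlist, demo_allow() and demo_is_allowed() are hypothetical names. The point is that the per-syscall decision becomes one bounds check plus one bit test, with no user-supplied program to verify or interpret.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on syscall numbers covered by the list. */
#define DEMO_NR_SYSCALLS 512
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
	((DEMO_NR_SYSCALLS + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* One bit per syscall number: set means "allowed". */
struct demo_allowlist {
	unsigned long bits[DEMO_WORDS];
};

/* Mark syscall number nr as allowed. */
static void demo_allow(struct demo_allowlist *al, unsigned int nr)
{
	assert(nr < DEMO_NR_SYSCALLS);
	al->bits[nr / DEMO_BITS_PER_WORD] |= 1UL << (nr % DEMO_BITS_PER_WORD);
}

/* Constant-time lookup; numbers outside the table are denied. */
static bool demo_is_allowed(const struct demo_allowlist *al, unsigned int nr)
{
	if (nr >= DEMO_NR_SYSCALLS)
		return false;
	return (al->bits[nr / DEMO_BITS_PER_WORD] >>
		(nr % DEMO_BITS_PER_WORD)) & 1UL;
}
```

As the replies in the thread point out, a table like this cannot inspect syscall arguments, which is the inflexibility that led seccomp-filter to BPF in the first place.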

Re: [PATCH] pci-iov: Add support for unmanaged SR-IOV

2018-03-01 Thread Alexander Duyck

On Thu, Mar 1, 2018 at 12:22 PM, Alex Williamson
 wrote:
> On Wed, 28 Feb 2018 16:36:38 -0800
> Alexander Duyck  wrote:
>
>> On Wed, Feb 28, 2018 at 2:59 PM, Alex Williamson
>>  wrote:
>> > On Wed, 28 Feb 2018 09:49:21 -0800
>> > Alexander Duyck  wrote:
>> >
>> >> On Tue, Feb 27, 2018 at 2:25 PM, Alexander Duyck
>> >>  wrote:
>> >> > On Tue, Feb 27, 2018 at 1:40 PM, Alex Williamson
>> >> >  wrote:
>> >> >> On Tue, 27 Feb 2018 11:06:54 -0800
>> >> >> Alexander Duyck  wrote:
>> >> >>
>> >> >>> From: Alexander Duyck 
>> >> >>>
>> >> >>> This patch is meant to add support for SR-IOV on devices when the VFs are
>> >> >>> not managed by the kernel. Examples of recent patches attempting to do this
>> >> >>> include:
>> >> >>
>> >> >> It appears to enable sriov when the _pf_ is not managed by the
>> >> >> kernel, but by "managed" we mean that either there is no pf driver or
>> >> >> the pf driver doesn't provide an sriov_configure callback,
>> >> >> intentionally or otherwise.
>> >> >>
>> >> >>> virtio - https://patchwork.kernel.org/patch/10241225/
>> >> >>> pci-stub - https://patchwork.kernel.org/patch/10109935/
>> >> >>> vfio - https://patchwork.kernel.org/patch/10103353/
>> >> >>> uio - https://patchwork.kernel.org/patch/9974031/
>> >> >>
>> >> >> So is the goal to get around the issues with enabling sriov on each of
>> >> >> the above drivers by doing it under the covers or are you really just
>> >> >> trying to enable sriov for a truly unmanaged (no pf driver) case?  For
>> >> >> example, should a driver explicitly not wanting sriov enabled implement
>> >> >> a dummy sriov_configure function?
>> >> >>
>> >> >>> Since this is quickly blowing up into a multi-driver problem it is probably
>> >> >>> best to implement this solution in one spot.
>> >> >>>
>> >> >>> This patch is an attempt to do that. What we do with this patch is provide
>> >> >>> a generic call to enable SR-IOV in the case that the PF driver is either
>> >> >>> not present, or the PF driver doesn't support configuring SR-IOV.
>> >> >>>
>> >> >>> A new sysfs value called sriov_unmanaged_autoprobe has been added. This
>> >> >>> value is used as the drivers_autoprobe setting of the VFs when they are
>> >> >>> being managed by an external entity such as userspace or device firmware
>> >> >>> instead of being managed by the kernel.
>> >> >>
>> >> >> Documentation/ABI/testing/sysfs-bus-pci update is missing.
>> >> >
>> >> > I can make sure to update that in the next version.
>> >> >
>> >> >>> One side effect of this change is that the sriov_drivers_autoprobe and
>> >> >>> sriov_unmanaged_autoprobe will only apply their updates when SR-IOV is
>> >> >>> disabled. Attempts to update them when SR-IOV is in use will only update
>> >> >>> the local value and will not update sriov->autoprobe.
>> >> >>
>> >> >> And we expect users to understand when sriov_drivers_autoprobe applies
>> >> >> vs sriov_unmanaged_autoprobe, even though they're using the same
>> >> >> interfaces to enable sriov?  Are all combinations expected to work, ex.
>> >> >> unmanaged sriov is enabled, a native pf driver loads, vfs work?  Not
>> >> >> only does it seem like there's opportunity to use this incorrectly, I
>> >> >> think maybe it might be difficult to use correctly.
>> >> >>
>> >> >>> I based my patch set originally on the patch by Mark Rustad but there isn't
>> >> >>> much left after going through and cleaning out the bits that were no longer
>> >> >>> needed, and after incorporating the feedback from David Miller.
>> >> >>>
>> >> >>> I have included the authors of the original 4 patches above in the Cc here.
>> >> >>> My hope is to get feedback and/or review on if this works for their use
>> >> >>> cases.
>> >> >>>
>> >> >>> Cc: Mark Rustad 
>> >> >>> Cc: Maximilian Heyne 
>> >> >>> Cc: Liang-Min Wang 
>> >> >>> Cc: David Woodhouse 
>> >> >>> Signed-off-by: Alexander Duyck 
>> >> >>> ---
>> >> >>>  drivers/pci/iov.c        |   27 +++-
>> >> >>>  drivers/pci/pci-driver.c |    2 +
>> >> >>>  drivers/pci/pci-sysfs.c  |   62 +-
>> >> >>>  drivers/pci/pci.h        |    4 ++-
>> >> >>>  include/linux/pci.h      |    1 +
>> >> >>>  5 files changed, 86 insertions(+), 10 deletions(-)
>> >> >>>
>> >> >>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> >> >>> index 677924ae0350..7b8858bd8d03 100644
>> >> >>> --- a/drivers/pci/iov.c
>> >> >>> +++ b/drivers/pci/iov.c
>> >> >>> @@ -446,6 +446,7 @@ static int sriov_init(struct pci_dev *dev, int

[PATCH iproute2] libnetlink: __rtnl_talk_iov should only loop max iovlen times

2018-03-01 Thread David Ahern

William reported ip hanging and bisected to a recent commit for batching
allowing more than 1 command to be sent per message. The loop over
recvmsg should never cycle more than iovlen times -- 1 response for
each command in the message.

Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov")
Signed-off-by: David Ahern 
---
 lib/libnetlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 7ca47b22581a..8a7efaeb3cd3 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -670,8 +670,9 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct iovec *iov,
 					free(buf);
 				if (h->nlmsg_seq == seq)
 					return 0;
-				else
+				else if (i < iovlen)
 					goto next;
+				return 0;
 			}
 
 			if (rtnl->proto != NETLINK_SOCK_DIAG &&
-- 
2.11.0
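
The effect of the bound can be shown without a netlink socket. The sketch below is a simplified model, not libnetlink code: struct fake_reply_stream and wait_for_acks() are hypothetical, and an exhausted stream stands in for a recvmsg() call that would block forever. The function stops either when the expected final sequence number arrives or after iovlen replies, one per command in the batch, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the acks the kernel would send back. */
struct fake_reply_stream {
	const unsigned int *seqs;	/* sequence number of each reply */
	size_t count;			/* replies available in total */
	size_t pos;			/* next reply to hand out */
};

/*
 * Consume replies for a batch of iovlen commands whose last sequence
 * number is last_seq. Returns the number of replies consumed, or -1 if
 * the stream ran dry first (the case where a real recvmsg() would hang).
 */
static int wait_for_acks(struct fake_reply_stream *rs,
			 unsigned int last_seq, size_t iovlen)
{
	size_t i = 0;

	assert(iovlen > 0);
	while (rs->pos < rs->count) {
		unsigned int seq = rs->seqs[rs->pos++];

		i++;
		if (seq == last_seq)
			return (int)i;	/* final ack seen */
		if (i >= iovlen)
			return (int)i;	/* one reply per command: stop */
	}
	return -1;
}
```

For a three-command batch whose replies carry sequence numbers 5, 6 and 9 while the caller waits for 7, the bounded loop returns after the third reply. Without the bound it would go back for a fourth reply that never arrives, which matches the hang described in the commit message.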
