Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices

2018-04-03 Thread Rustad, Mark D

On Apr 3, 2018, at 11:27 AM, Michael S. Tsirkin  wrote:


I'm not sure why you would need a feature bit. The capability is
controlled via PCI configuration space. If it is present the device
has the capability. If it is not then it does not.

Basically if the PCI configuration space is not present then the sysfs
entries will not be spawned and nothing will attempt to use this
function.

- ALex


It's about compability with older guests which ignore the
capability.

The feature is thus helpful so host knows whether guest supports VFs.


This is not about a guest creating its own VFs. This is about a host PF  
that happens to have a virtio interface to be able to create virtio VFs  
that can be assigned to guests. Nothing changes at all from a guest  
perspective. Or maybe I am not understanding what you mean by "whether  
guest supports VFs".


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP


Re: [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices

2018-03-28 Thread Rustad, Mark D
On Mar 15, 2018, at 11:42 AM, Alexander Duyck  wrote:

> From: Alexander Duyck 
> 
> Hardware-realized virtio_pci devices can implement SR-IOV, so this
> patch enables its use. The device in question is an upcoming Intel
> NIC that implements both a virtio_net PF and virtio_net VFs. These
> are hardware realizations of what has been up to now been a software
> interface.
> 
> The device in question has the following 4-part PCI IDs:
> 
> PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
> VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe
> 
> The patch currently needs no check for device ID, because the callback
> will never be made for devices that do not assert the capability or
> when run on a platform incapable of SR-IOV.
> 
> One reason for this patch is because the hardware requires the
> vendor ID of a VF to be the same as the vendor ID of the PF that
> created it. So it seemed logical to simply have a fully-functioning
> virtio_net PF create the VFs. This patch makes that possible.
> 
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Mark Rustad 
> Signed-off-by: Alexander Duyck 
> ---
> 
> v4: Dropped call to pci_disable_sriov in virtio_pci_remove function
> v5: Replaced call to pci_sriov_configure_unmanaged with
> pci_sriov_configure_simple
> v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
> v7: No code change, added Reviewed-by
> 
>  drivers/virtio/virtio_pci_common.c |1 +
>  1 file changed, 1 insertion(+)

Tested with the identified device.

Tested-by: Mark Rustad 

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP


Re: [virtio-dev] [pci PATCH v7 1/5] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources

2018-03-28 Thread Rustad, Mark D
On Mar 15, 2018, at 11:41 AM, Alexander Duyck  wrote:

> From: Alexander Duyck 
> 
> This patch adds a common configuration function called
> pci_sriov_configure_simple that will allow for managing VFs on devices
> where the PF is not capable of managing VF resources.
> 
> Signed-off-by: Alexander Duyck 
> ---
> 
> v5: New patch replacing pci_sriov_configure_unmanaged with
>   pci_sriov_configure_simple
> Dropped bits related to autoprobe changes
> v6: Defined pci_sriov_configure_simple as NULL if IOV is disabled
> v7: Updated pci_sriov_configure_simple to drop need for err value
> Fixed comment explaining why pci_sriov_configure_simple is NULL
> 
>  drivers/pci/iov.c   |   31 +++
>  include/linux/pci.h |3 +++
>  2 files changed, 34 insertions(+)

Tested with the device identified in patch #2.

Tested-by: Mark Rustad 

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP


Re: [RFC PATCH V3] virtio_pci: Add SR-IOV support

2018-02-27 Thread Rustad, Mark D
> On Feb 27, 2018, at 7:35 AM, David Miller  wrote:
> 
> I don't like these helpers on many different levels.



> So kill off pci_sriov_enable() helper completely, it is unnecessary,
> and rename the disable helper so that it says something meaningful to
> the reader.

Yes. Once pointed out, I completely agree with your comments and wish that I 
had seen those things myself.

> Thanks.

V3 was junk, but your comments apply to V4 as well, so please ignore it.

Thank you for your valuable review.

-- 
Mark Rustad, Networking Division, Intel Corporation



Re: [RFC PATCH V4] pci: virtio_pci: Add SR-IOV support for virtio_pci devices

2018-02-26 Thread Rustad, Mark D
Alex,

> On Feb 26, 2018, at 7:26 AM, Alexander Duyck  
> wrote:
> 
> Mark,
> 
> In the future please don't put my "Reviewed-by" on a patch that I
> haven't reviewed. I believe I reviewed one of the earlier patches, but
> I hadn't reviewed this version.

I'm very sorry. I completely spaced doing something about that. I think yours 
was the first Reviewed-by I ever had in this way. In the future I will remove 
such things from my changelog right after sending. Thanks for alerting me to 
what I had failed to do.

> Also, after thinking about it over the weekend we may want to look at
> just coming up with a truly "generic" solution that is applied to
> SR-IOV capable devices that don't have a SR-IOV capable driver loaded
> on them. That would allow us to handle the uio, vfio, pci-stub, and
> virtio cases all in one fell swoop. I think us going though and
> modifying one patch at a time to do this kind of thing isn't going to
> scale.

The notion of that kind of troubles me - at least pci-stub does. Having worked 
on ixgbe a bit, I have to wonder what kind of havoc would ensue if an ixgbe 
device were assigned to a guest, and an attempt was made to allocate VFs by the 
pci-stub. The guest could be running any version of the ixgbe driver, possibly 
even an old one that didn't support SR-IOV. Even if it did support SR-IOV, I 
don't know how it would respond to mailbox messages when it doesn't think it 
has VFs.

> I'll try to do some digging and find the VFIO approach we had been
> working on. I think with a couple tweaks we can probably make that
> truly generic and ready for submission.

I'd like to know more about you are thinking about.

-- 
Mark Rustad, Networking Division, Intel Corporation



Re: [RFC PATCH V2] virtio_pci: Add SR-IOV support

2018-02-22 Thread Rustad, Mark D
> On Feb 22, 2018, at 10:26 AM, Christoph Hellwig  wrote:
> 
> Can we move this into common code as a a generic_sriov_configure
> helper?  Nothing is really virtio specific, and it seems like
> some other drivers could also use it, e.g. ena or nvme.

That seems like a good idea to me, especially if PCI developers concur.

-- 
Mark Rustad, Networking Division, Intel Corporation



Re: [Intel-wired-lan] [next-queue 02/10] ixgbe: add ipsec register access routines

2017-12-05 Thread Rustad, Mark D

> On Dec 4, 2017, at 9:35 PM, Shannon Nelson  wrote:
> 
> Add a few routines to make access to the ipsec registers just a little
> easier, and throw in the beginnings of an initialization.
> 
> Signed-off-by: Shannon Nelson 
> ---
> drivers/net/ethernet/intel/ixgbe/Makefile  |   1 +
> drivers/net/ethernet/intel/ixgbe/ixgbe.h   |   6 +
> drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 157 +
> drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h |  50 
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   1 +



> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
> new file mode 100644
> index 000..14dd011
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
> @@ -0,0 +1,157 @@
> +/***
> + *
> + * Intel 10 Gigabit PCI Express Linux driver
> + * Copyright(c) 2017 Oracle and/or its affiliates. All rights reserved.

I don't think that it really makes sense to assert "All rights reserved" in 
something that is GPL. It makes it seem like something is being asserted that 
is counter to the GPL.



> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
> new file mode 100644
> index 000..017b13f
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
> @@ -0,0 +1,50 @@
> +/***
> +
> +  Intel 10 Gigabit PCI Express Linux driver
> +  Copyright(c) 2017 Oracle and/or its affiliates. All rights reserved.

Likewise here.

-- 
Mark Rustad, Networking Division, Intel Corporation



Re: [PATCH net-next] tools: bpf: handle long path in jit disasm

2017-11-02 Thread Rustad, Mark D

> On Nov 2, 2017, at 1:09 AM, Prashant Bhole  
> wrote:
> 
> Use PATH_MAX instead of hardcoded array size 256
> 
> Signed-off-by: Prashant Bhole 
> ---
> tools/bpf/bpf_jit_disasm.c | 3 ++-
> tools/bpf/bpftool/jit_disasm.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> index 422d9abd666a..75bf526a0168 100644
> --- a/tools/bpf/bpf_jit_disasm.c
> +++ b/tools/bpf/bpf_jit_disasm.c
> @@ -27,6 +27,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #define CMD_ACTION_SIZE_BUFFER10
> #define CMD_ACTION_READ_ALL   3
> @@ -51,7 +52,7 @@ static void get_exec_path(char *tpath, size_t size)
> static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
> {
>   int count, i, pc = 0;
> - char tpath[256];
> + char tpath[PATH_MAX];

Seems like such a nice thing, *but* PATH_MAX is 4096. Can things really 
tolerate 4k on the stack here?

>   struct disassemble_info info;
>   disassembler_ftype disassemble;
>   bfd *bfdf;
> diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
> index 5937e134e408..1551d3918d4c 100644
> --- a/tools/bpf/bpftool/jit_disasm.c
> +++ b/tools/bpf/bpftool/jit_disasm.c
> @@ -21,6 +21,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include "json_writer.h"
> #include "main.h"
> @@ -80,7 +81,7 @@ void disasm_print_insn(unsigned char *image, ssize_t len, 
> int opcodes)
>   disassembler_ftype disassemble;
>   struct disassemble_info info;
>   int count, i, pc = 0;
> - char tpath[256];
> + char tpath[PATH_MAX];

Same comment here.

>   bfd *bfdf;
> 
>   if (!len)

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH] tests: Remove bashisms (s/source/.)

2017-10-13 Thread Rustad, Mark D
> On Oct 8, 2017, at 7:39 AM, Petr Vorel  wrote:
> 
> diff --git a/testsuite/tests/ip/route/add_default_route.t 
> b/testsuite/tests/ip/route/add_default_route.t
> index e5ea6473..0b566f1f 100755
> --- a/testsuite/tests/ip/route/add_default_route.t
> +++ b/testsuite/tests/ip/route/add_default_route.t
> @@ -1,6 +1,6 @@
> -#!/bin/sh
> +#!/bin/bash

Funny - ^^^ choosing bash explicitly while
    removing the bashism?

> -source lib/generic.sh
> +. lib/generic.sh
> 
> ts_log "[Testing add default route]"

I noticed a couple other files already specified /bin/bash, yet you removed the 
bashisms. But the above struck me as something that would not seem to have been 
intended.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP


Re: netlink backwards compatibility in userspace tools

2017-09-29 Thread Rustad, Mark D

> On Sep 29, 2017, at 3:22 AM, Jason A. Donenfeld  wrote:
> 
> Hi guys,
> 
> One handy aspect of Netlink is that it's backwards compatible. This
> means that you can run old userspace utilities on new kernels, even if
> the new kernel supports new features and netlink attributes. The wire
> format is stable enough that the data marshaled can be extended
> without breaking compat. Neat.
> 
> I was wondering, though, what you think the best stance is toward
> these old userspace utilities. What should they do if the kernel sends
> it netlink attributes that it does not recognize? At the moment, I'm
> doing something like this:
> 
> static void warn_unrecognized(void)
> {
>static bool once = false;
>if (once)
>return;
>once = true;
>fprintf(stderr,
>"Warning: this program received from your kernel one or more\n"
>"attributes that it did not recognize. It is possible that\n"
>"this version of wg(8) is older than your kernel. You may\n"
>"want to update this program.\n");
> }
> 
> This seems like a somewhat sensible warning, but then I wonder about
> distributions like Debian, which has a long stable life cycle, so it
> frequently has very old tools (ancient iproute2 for example). Then,
> VPS providers have these Debian images run on top of newer kernels.
> People in this situation would undoubtedly see the above warning a lot
> and not be able to do anything about it. Not horrible, but a bit
> annoying. Is this an okay annoyance? Or is it advised to just have no
> warning at all? One idea would be to put it behind an environment
> variable flag, but I don't like too many nobs.
> 
> I'm generally wondering about attitudes toward this kind of userspace
> program behavior in response to newer kernels.
> 
> Thanks,
> Jason

That seems like a bit much. Consider only emitting a message with the use of a 
verbose flag - or two. Even then the message should be shortened - the first 
sentence is entirely adequate even in verbose mode.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH V2] r8152: add Linksys USB3GIGV1 id

2017-09-28 Thread Rustad, Mark D

> On Sep 27, 2017, at 9:39 AM, Grant Grundler  wrote:
> 
> On Wed, Sep 27, 2017 at 12:15 AM, Oliver Neukum  wrote:
>> Am Dienstag, den 26.09.2017, 08:19 -0700 schrieb Doug Anderson:
>>> 
>>> I know that for at least some of the adapters in the CDC Ethernet
>>> blacklist it was claimed that the CDC Ethernet support in the adapter
>>> was kinda broken anyway so the blacklist made sense.  ...but for the
>>> Linksys Gigabit adapter the CDC Ethernet driver seems to work OK, it's
>>> just not quite as full featured / efficient as the R8152 driver.
>>> 
>>> Is that not a concern?  I guess you could tell people in this
>>> situation that they simply need to enable the R8152 driver to get
>>> continued support for their Ethernet adapter?
>> 
>> Hi,
>> 
>> yes, it is a valid concern. An #ifdef will be needed.
> 
> Good idea - I will post V3 shortly.
> 
> I'm assuming you mean to add #ifdef CONFIG_USB_RTL8152 around the
> blacklist entry in cdc_ether driver.

Shouldn't that be an #if IS_ENABLED(...) test, since that seems to be the 
proper way to check configured drivers.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to HNS3 driver

2017-07-24 Thread Rustad, Mark D

> On Jul 23, 2017, at 10:05 AM, Florian Fainelli  wrote:
>> +
>> +strncpy(drvinfo->version, HNAE_DRIVER_VERSION,
>> +sizeof(drvinfo->version));
>> +drvinfo->version[sizeof(drvinfo->version) - 1] = '\0';
> 
> strlcpy() would probably do that for you.

You need to be careful about strlcpy - it does not completely write the 
destination buffer as strncpy does, and so can result in a kernel memory leak 
if the destination is not known to already be cleared.

>> +
>> +strncpy(drvinfo->driver, HNAE_DRIVER_NAME, sizeof(drvinfo->driver));
>> +drvinfo->driver[sizeof(drvinfo->driver) - 1] = '\0';
> 
> Same here

Same here

>> +
>> +strncpy(drvinfo->bus_info, priv->dev->bus->name,
>> +sizeof(drvinfo->bus_info));> +  
>> drvinfo->bus_info[ETHTOOL_BUSINFO_LEN - 1] = '\0';
> 
> And here.

And here too. I haven't looked at this deeply enough to know whether a leak 
could be created by strlcpy here, but I wanted to raise it as something to be 
considered before switching to it. Blindly adopting strlcpy is hazardous as are 
tools that unconditionally recommend it.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH] x86-32: fix tlb flushing when lguest clears PGE

2017-03-13 Thread Rustad, Mark D
On Mar 12, 2017, at 7:02 PM, Kees Cook  wrote:

> Are there nominations for most comprehensive changelog of the year? :)
> This is awesome.

Especially for a one-liner! Truly comprehensive and completely relevant.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP


Re: [Intel-wired-lan] [PATCH 1/1] ixgbe: fcoe: return value of skb_linearize should be handled

2016-12-07 Thread Rustad, Mark D

Zhouyi Zhou  wrote:

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c

index fee1f29..4926d48 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector  
*q_vector,

total_rx_bytes += ddp_bytes;
total_rx_packets += DIV_ROUND_UP(ddp_bytes,
 mss);
-   }
-   if (!ddp_bytes) {
+   } else {
dev_kfree_skb_any(skb);
continue;
}


This is changing the logic by treating a negative ddp_bytes value (an error  
return) the same as a 0 value. This is probably wrong and inappropriate for  
this patch in any case.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Rustad, Mark D

Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:


On Tue, Sep 13, 2016 at 10:41:12PM +, Rustad, Mark D wrote:

That said, I can see that you have tried to keep the original code path
pretty much intact. I would note that you introduced rcu calls into the  
!bpf

path that would never have been done before. While that should be ok, I
would really like to see it tested, at least for the !bpf case, on real
hardware to be sure.


please go ahead and test. rcu_read_lock is zero extra instructions
for everything but preempt or debug kernels.


Well, I don't have any hardware in hand to test with, though my former  
employer would. I guess my current employer would too! :-) FWIW, the kernel  
used in that system I referred to before was a preempt kernel.


The test matrix is large, the tail is long and you can't just gloss these  
things over. I understand that it isn't the focus of your work, just as  
regression testing the e1000 is not the focus of any of our work any more.  
That is precisely why it is a sensitive area.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Rustad, Mark D

Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:


On Tue, Sep 13, 2016 at 07:14:27PM +, Rustad, Mark D wrote:

Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:


On Tue, Sep 13, 2016 at 06:28:03PM +0000, Rustad, Mark D wrote:

Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:


I've looked through qemu and it appears only emulate e1k and tg3.
The latter is still used in the field, so the risk of touching
it is higher.


I have no idea what makes you think that e1k is *not* "used in the  
field".

I grant you it probably is used more virtualized than not these days,
but it
certainly exists and is used. You can still buy them new at Newegg for
goodness sakes!


the point that it's only used virtualized, since PCI (not PCIE) have
been long dead.


My point is precisely the opposite. It is a real device, it exists in real
systems and it is used in those systems. I worked on embedded systems that
ran Linux and used e1000 devices. I am sure they are still out there  
because

customers are still paying for support of those systems.

Yes, PCI(-X) is absent from any current hardware and has been for some  
years

now, but there is an installed base that continues. What part of that
installed base updates software? I don't know, but I would not just assume
that it is 0. I know that I updated the kernel on those embedded systems
that I worked on when I was supporting them. Never at the bleeding edge,  
but

generally hopping from one LTS kernel to another as needed.


I suspect modern linux won't boot on such old pci only systems for other
reasons not related to networking, since no one really cares to test  
kernels there.


Actually it does boot, because although the motherboard was PCIe, the slots  
and the adapters in them were PCI-X. So the core architecture was not so  
stale.



So I think we mostly agree. There is chance that this xdp e1k code will
find a way to that old system. What are the chances those users will
be using xdp there? I think pretty close to zero.


For sure they wouldn't be using XDP, but they could suffer regressions in a  
changed driver that might find its way there. That is the risk.



The pci-e nics integrated into motherboards that pretend to be tg3
(though they're not at all build by broadcom) are significantly more  
common.

That's why I picked e1k instead of tg3.


That may be true (I really don't know anything about tg3 so I certainly  
can't dispute it), so the risk could be smaller with e1k, but there is  
still a regression risk for real existing hardware. That is my concern.



Also note how this patch is not touching anything in the main e1k path
(because I don't have a hw to test and I suspect Intel's driver team
doesn't have it either) to make sure there is no breakage on those
old systems. I created separate e1000_xmit_raw_frame() routine
instead of adding flags into e1000_xmit_frame() for the same reasons:
to make sure there is no breakage.
Same reasoning for not doing an union of page/skb as Alexander suggested.
I wanted minimal change to e1k that allows development xdp programs in kvm
without affecting e1k main path. If you see the actual bug in the patch,
please point out the line.


I can't say that I can, because I am not familiar with the internals of  
e1k. When I was using it, I never had cause to even look at the driver  
because it just worked. My attentions then were properly elsewhere.


My concern is with messing with a driver that probably no one has an active  
testbed routinely running regression tests.


Maybe a new ID should be assigned and the driver forked for this purpose.  
At least then only the virtualization case would have to be tested. Of  
course the hosts would have to offer that ID, but if this is just for  
testing that should be ok, or at least possible.


If someone with more internal knowledge of this driver has enough  
confidence to bless such patches, that might be fine, but it is a fallacy  
to think that e1k is *only* a virtualization driver today. Not yet anyway.  
Maybe around 2020.


That said, I can see that you have tried to keep the original code path  
pretty much intact. I would note that you introduced rcu calls into the  
!bpf path that would never have been done before. While that should be ok,  
I would really like to see it tested, at least for the !bpf case, on real  
hardware to be sure. I really can't comment on the workaround issue brought  
up by Eric, because I just don't know about them. At least that risk seems  
to only be in the bpf case.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets

2016-09-13 Thread Rustad, Mark D

Greg  wrote:


Someday Linux will be a modern OS that just includes IPV6 and forces a
config option to NOT have it.

That'll be great.  All the IS_ENABLED_(CONFIG_IPV6) scattered everywhere
is nuts.




Better wait until everyone at least *has* IPv6! I have yet to have IPv6  
deployed on any of my employer's networks or get IPv6 service from any ISP  
at my home. When I was at Apple in the 90's I was told that Apple needed  
IPv6 by next year or "we were dead". Well Apple nearly died, but IPv6 had  
nothing to do with that! And I still haven't experienced an IPv6  
deployment! Yeah, I have run it a bit point-to-point to resolve technical  
issues, but that isn't a "deployment" and not very interesting.


As much as we would like things to move faster, much of the world just  
doesn't. Witness the e1000 discussion today for example. Hardware doesn't  
vanish overnight, and I know that my ISP has a network full of CPE that  
doesn't do IPv6, so I'm not expecting their status to change any time soon.


It would be great though.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Rustad, Mark D

Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:


On Tue, Sep 13, 2016 at 06:28:03PM +, Rustad, Mark D wrote:

Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:


I've looked through qemu and it appears only emulate e1k and tg3.
The latter is still used in the field, so the risk of touching
it is higher.


I have no idea what makes you think that e1k is *not* "used in the field".
I grant you it probably is used more virtualized than not these days,  
but it

certainly exists and is used. You can still buy them new at Newegg for
goodness sakes!


the point that it's only used virtualized, since PCI (not PCIE) have
been long dead.


My point is precisely the opposite. It is a real device, it exists in real  
systems and it is used in those systems. I worked on embedded systems that  
ran Linux and used e1000 devices. I am sure they are still out there  
because customers are still paying for support of those systems.


Yes, PCI(-X) is absent from any current hardware and has been for some  
years now, but there is an installed base that continues. What part of that  
installed base updates software? I don't know, but I would not just assume  
that it is 0. I know that I updated the kernel on those embedded systems  
that I worked on when I was supporting them. Never at the bleeding edge,  
but generally hopping from one LTS kernel to another as needed.


The day is coming when all the motherboards with PCI(-X) will be gone, but  
I think it is still at least a few years off.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Rustad, Mark D

Alexei Starovoitov  wrote:


I've looked through qemu and it appears only emulate e1k and tg3.
The latter is still used in the field, so the risk of touching
it is higher.


I have no idea what makes you think that e1k is *not* "used in the field".   
I grant you it probably is used more virtualized than not these days, but  
it certainly exists and is used. You can still buy them new at Newegg for  
goodness sakes!


Maybe I'll go home and plug in my old e100 into my machine that still has a  
PCI slot, just for old times sake. Oh darn, I have a SCSI card in that  
slot...


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCHv2 3/4] pci: Determine actual VPD size on first access

2016-08-15 Thread Rustad, Mark D

Benjamin Herrenschmidt  wrote:


Filtering things to work around bugs in existing guests to avoid crashes
is a different kettle of fish and could be justified but keep in mind that
in most cases a malicious guest will be able to exploit those HW flaws.


Bugs in existing guests is an interesting case, but I have been focused on  
getting acceptable behavior from a properly functioning guest, in the face  
of hardware issues that can only be resolved in a single place.


I agree that a malicious guest can cause all kinds of havoc with  
directly-assigned devices. Consider a 4-port PHY chip on a shared MDIO bus,  
for instance. There is really nothing to be done about the potential for  
mischief with that kind of thing.


The VPD problem that I had been concerned about arises from a bad design in  
the PCI spec together with implementations that share the registers across  
functions. The hardware isn't going to change and I really doubt that the  
spec will either, so we address it the only place we can.


I am certain that we agree that not everything can or should be addressed  
in vfio. I did not mean to suggest it should try to address everything, but  
I think it should make it possible for correctly behaving guests to work. I  
think that is not unreasonable.


Perhaps the VPD range check should really just have been implemented for  
the sysfs interface, and left the vfio case unchecked. I don't know because  
I was not involved in that issue. Perhaps someone more intimately involved  
can comment on that notion.



Assuming that a device coming back from a guest is functional and not
completely broken and can be re-used without a full PERST or power cycle
is a wrong assumption. It may or may not work, no amount of "filtering"
will fix the fundamental issue. If your HW won't give you access to PERST
well ... blame Intel for not specifying a standard way to generate it in
the first place :-)


Yeah, I worry about the state that a malicious guest could leave a device  
in, but I consider direct assignment always risky anyway. I would just like  
it to at least work in the non-malicious guest cases.


I guess my previous response was really just too terse, I was just focused  
on unavoidable hangs and data corruption, which even were happening without  
any guest involvement. For me, guests were just an additional exposure of  
the same underlying issue.


With hindsight, it is easy to see that a standard reset would now be a  
pretty useful thing. I am sure that even if it existed, we would now have  
lots and lots of quirks around it as well! :-)


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCHv2 3/4] pci: Determine actual VPD size on first access

2016-08-15 Thread Rustad, Mark D

Benjamin Herrenschmidt  wrote:


We may want some kind of "strict" vs. "relaxed" model here to
differenciate the desktop user wanting to give a function to his/her
windows partition and doesn't care about strict isolation vs. the cloud
data center.


I don't think desktop users appreciate hangs any more than anyone else, and  
that is one of the symptoms that can arise here without the vfio  
coordination.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH net] e1000e: fix PTP on e1000_pch_lpt variants

2016-07-19 Thread Rustad, Mark D

Jarod Wilson  wrote:


I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used
as a PTP slave experiences random ~10 hour clock jumps, which are resolved
if the same workaround for the 82574 and 82583 is employed. Switching from
an if to a select, because the list of NIC types could well grow further
and we'd already have to wrap the conditionals.

CC: Jeff Kirsher 
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c  
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 2b2e2f8..866fea0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4335,7 +4335,10 @@ static cycle_t e1000e_cyclecounter_read(const  
struct cyclecounter *cc)

systim = (cycle_t)systimel;
systim |= (cycle_t)systimeh << 32;

-   if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
+   switch (hw->mac.type) {
+   case e1000_82574:
+   case e1000_82583:
+   case e1000_pch_lpt:
u64 time_delta, rem, temp;
u32 incvalue;
int i;


I don't think that it is acceptable to declare local variables inside a  
switch statement quite like this. At a minimum, a new block needs to be  
opened to allow the declarations.


@@ -4360,6 +4363,9 @@ static cycle_t e1000e_cyclecounter_read(const  
struct cyclecounter *cc)

(rem == 0))
break;
}
+   break;
+   default:
+   break;
}
return systim;
 }


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH] (resend) ixgbe: always initialize setup_fc

2016-07-06 Thread Rustad, Mark D

Patrick McLean  wrote:


Gmail mangled my first message, sorry about that. Second attempt.

In ixgbe_init_mac_link_ops_X550em, the code has a special case for
backplane media type, but does not fall through to the default case,
so the setup_fc never gets initialized. This causes a panic when it
later tries to set up the card, and the kernel dereferences the null
pointer.

This patch lets the the function fall through, which initialized
setup_fc properly.


I don't think that this is the right fix. My memory is that fc autoneg is  
not supported in that configuration, so setup_fc is intended to be NULL. I  
think it is the reference that needs to have a check added.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH] e1000e: prevent division by zero if TIMINCA is zero

2016-05-06 Thread Rustad, Mark D

Denys Vlasenko  wrote:


Users report that under VMWare, er32(TIMINCA) returns zero.
This causes division by zero at init time as follows:

 ==>incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
/* latch SYSTIMH on read of SYSTIML */
systim_next = (cycle_t)er32(SYSTIML);
systim_next |= (cycle_t)er32(SYSTIMH) << 32;

time_delta = systim_next - systim;
temp = time_delta;
 >  rem = do_div(temp, incvalue);

This change makes kernel survive this, and users report that
NIC does work after this change.

Since on real hardware incvalue is never zero, this should not affect
real hardware use case.

Signed-off-by: Denys Vlasenko 
CC: Jeff Kirsher 
CC: "Ruinskiy, Dima" 
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
CC: LKML 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c  
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 269087c..0626935 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4315,7 +4315,8 @@ static cycle_t e1000e_cyclecounter_read(const  
struct cyclecounter *cc)


time_delta = systim_next - systim;
temp = time_delta;
-   rem = do_div(temp, incvalue);
+   /* VMWare users have seen incvalue of zero, don't div / 
0 */
+   rem = incvalue ? do_div(temp, incvalue) : (time_delta 
!= 0);

systim = systim_next;



I seem to recall that this was rejected before because it really is  
VMWare's bug and, if they fix it, any existing VMs that use this will just  
work. Changing the driver will only fix it for vms that install a new  
driver. I don't object to doing it, it just seems like not the most  
effective place to address the issue.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH net-next 2/2] intel: ixgbevf: Support Windows hosts (Hyper-V)

2016-04-14 Thread Rustad, Mark D

KY Srinivasan <k...@microsoft.com> wrote:





-Original Message-
From: Rustad, Mark D [mailto:mark.d.rus...@intel.com]
Sent: Thursday, April 14, 2016 5:07 PM
To: KY Srinivasan <k...@microsoft.com>
Cc: David Miller <da...@davemloft.net>; netdev
<netdev@vger.kernel.org>; linux-ker...@vger.kernel.org;
de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
jasow...@redhat.com; e...@mellanox.com; ja...@mellanox.com;
yevge...@mellanox.com; Ronciak, John <john.ronc...@intel.com>; intel-
wired-...@linuxonhyperv.com
Subject: Re: [PATCH net-next 2/2] intel: ixgbevf: Support Windows hosts
(Hyper-V)

KY Srinivasan <k...@microsoft.com> wrote:


-----Original Message-
From: Rustad, Mark D [mailto:mark.d.rus...@intel.com]
Sent: Thursday, April 14, 2016 4:01 PM
To: KY Srinivasan <k...@microsoft.com>
Cc: David Miller <da...@davemloft.net>; netdev
<netdev@vger.kernel.org>; linux-ker...@vger.kernel.org;
de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
jasow...@redhat.com; e...@mellanox.com; ja...@mellanox.com;
yevge...@mellanox.com; Ronciak, John <john.ronc...@intel.com>;

intel-

wired-...@linuxonhyperv.com
Subject: Re: [PATCH net-next 2/2] intel: ixgbevf: Support Windows hosts
(Hyper-V)

Some comments below:


Mark,

Thank you for the comments. I will address them and repost the patches.

Regards,

K. Y


Please look closely at Alex's comments. I think they are much more
important.


I am looking at Alex's comments as I am writing this.

K. Y

--
Mark Rustad, Networking Division, Intel Corporation


Can you please stop putting that crazy intel-wired-...@linuxonhyperv.com in  
your messages? It doesn't exist.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH net-next 2/2] intel: ixgbevf: Support Windows hosts (Hyper-V)

2016-04-14 Thread Rustad, Mark D

KY Srinivasan <k...@microsoft.com> wrote:





-Original Message-
From: Rustad, Mark D [mailto:mark.d.rus...@intel.com]
Sent: Thursday, April 14, 2016 4:01 PM
To: KY Srinivasan <k...@microsoft.com>
Cc: David Miller <da...@davemloft.net>; netdev
<netdev@vger.kernel.org>; linux-ker...@vger.kernel.org;
de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
jasow...@redhat.com; e...@mellanox.com; ja...@mellanox.com;
yevge...@mellanox.com; Ronciak, John <john.ronc...@intel.com>; intel-
wired-...@linuxonhyperv.com
Subject: Re: [PATCH net-next 2/2] intel: ixgbevf: Support Windows hosts
(Hyper-V)

Some comments below:


Mark,

Thank you for the comments. I will address them and repost the patches.

Regards,

K. Y


Please look closely at Alex's comments. I think they are much more important.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH net-next 2/2] intel: ixgbevf: Support Windows hosts (Hyper-V)

2016-04-14 Thread Rustad, Mark D

Some comments below:

K. Y. Srinivasan  wrote:


On Hyper-V, the VF/PF communication is a via software mediated path
as opposed to the hardware mailbox. Make the necessary
adjustments to support Hyper-V.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |   11 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   56 ++---
 drivers/net/ethernet/intel/ixgbevf/mbx.c  |   12 ++
 drivers/net/ethernet/intel/ixgbevf/vf.c   |  138 +
 drivers/net/ethernet/intel/ixgbevf/vf.h   |2 +
 5 files changed, 201 insertions(+), 18 deletions(-)




diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c  
b/drivers/net/ethernet/intel/ixgbevf/vf.c

index 4d613a4..92397fd 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c





@@ -126,6 +133,23 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
 }

 /**
+ * Hyper-V variant; the VF/PF communication is through the PCI
+ * config space.
+ */
+static s32 ixgbevf_hv_reset_hw_vf(struct ixgbe_hw *hw)
+{
+   int i;
+   struct ixgbevf_adapter *adapter = hw->back;


The two lines above should be swapped so that the longer one is first.


+
+   for (i = 0; i < 6; i++)
+   pci_read_config_byte(adapter->pdev,
+(i + IXGBE_HV_RESET_OFFSET),
+>mac.perm_addr[i]);
+
+   return 0;
+}
+
+/**
  *  ixgbevf_stop_hw_vf - Generic stop Tx/Rx units
  *  @hw: pointer to hardware structure
  *
@@ -656,6 +680,68 @@ out:
 }

 /**
+ * Hyper-V variant; there is no mailbox communication.
+ */
+static s32 ixgbevf_hv_check_mac_link_vf(struct ixgbe_hw *hw,
+   ixgbe_link_speed *speed,
+   bool *link_up,
+   bool autoneg_wait_to_complete)
+{
+   struct ixgbe_mbx_info *mbx = >mbx;
+   struct ixgbe_mac_info *mac = >mac;
+   s32 ret_val = 0;


ret_val here is redundant as this is the only assignment to it.
Please delete it and just return 0 at the end.


+   u32 links_reg;
+
+   /* If we were hit with a reset drop the link */
+   if (!mbx->ops.check_for_rst(hw) || !mbx->timeout)
+   mac->get_link_status = true;
+
+   if (!mac->get_link_status)
+   goto out;
+
+   /* if link status is down no point in checking to see if pf is up */
+   links_reg = IXGBE_READ_REG(hw, IXGBE_VFLINKS);
+   if (!(links_reg & IXGBE_LINKS_UP))
+   goto out;
+
+   /* for SFP+ modules and DA cables on 82599 it can take up to 500usecs
+* before the link status is correct
+*/
+   if (mac->type == ixgbe_mac_82599_vf) {
+   int i;
+
+   for (i = 0; i < 5; i++) {
+   udelay(100);
+   links_reg = IXGBE_READ_REG(hw, IXGBE_VFLINKS);
+
+   if (!(links_reg & IXGBE_LINKS_UP))
+   goto out;
+   }
+   }
+
+   switch (links_reg & IXGBE_LINKS_SPEED_82599) {
+   case IXGBE_LINKS_SPEED_10G_82599:
+   *speed = IXGBE_LINK_SPEED_10GB_FULL;
+   break;
+   case IXGBE_LINKS_SPEED_1G_82599:
+   *speed = IXGBE_LINK_SPEED_1GB_FULL;
+   break;
+   case IXGBE_LINKS_SPEED_100_82599:
+   *speed = IXGBE_LINK_SPEED_100_FULL;
+   break;
+   }
+
+   /* if we passed all the tests above then the link is up and we no
+* longer need to check for link
+*/
+   mac->get_link_status = false;
+
+out:
+   *link_up = !mac->get_link_status;
+   return ret_val;


Just return 0 above.


+}
+
+/**
  *  ixgbevf_rlpml_set_vf - Set the maximum receive packet length
  *  @hw: pointer to the HW structure
  *  @max_size: value to assign to max frame size





@@ -663,6 +749,19 @@ out:
 void ixgbevf_rlpml_set_vf(struct ixgbe_hw *hw, u16 max_size)
 {
u32 msgbuf[2];
+   u32 reg;
+
+   if (x86_hyper == _hyper_ms_hyperv) {
+   /*
+* If we are on Hyper-V, we implement


Please format the comment above netdev-style, /* If we are...
as you did elsewhere.


+* this functionality differently.
+*/
+   reg =  IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(0));
+   /* CRC == 4 */
+   reg |= ((max_size + 4) | IXGBE_RXDCTL_RLPML_EN);
+   IXGBE_WRITE_REG(hw, IXGBE_VFRXDCTL(0), reg);
+   return;
+   }

msgbuf[0] = IXGBE_VF_SET_LPE;
msgbuf[1] = max_size;
@@ -679,6 +778,16 @@ int ixgbevf_negotiate_api_version(struct ixgbe_hw  
*hw, int api)

int err;
u32 msg[3];

+   if (x86_hyper == _hyper_ms_hyperv) {
+   /*
+* Hyper-V only supports api version ixgbe_mbox_api_10


Again, please use netdev-style 

Re: [PATCH] mwifiex: fix possible NULL dereference

2016-04-12 Thread Rustad, Mark D

Andy Shevchenko  wrote:


On Mon, Apr 11, 2016 at 6:27 PM, Sudip Mukherjee
 wrote:

From: Sudip Mukherjee 

We have a check for card just after dereferencing it. So if it is NULL
we have already dereferenced it before its check. Lets dereference it
after checking card for NULL.


IIUC the code does nothing with dereference.

I would have told NAK if I would have been maintainer.


Signed-off-by: Sudip Mukherjee 
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c  
b/drivers/net/wireless/marvell/mwifiex/pcie.c

index edf8b07..84562d0 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct  
mwifiex_adapter *adapter)

 {
struct pcie_service_card *card = adapter->card;


Let's say it's 0.


   const struct mwifiex_pcie_card_reg *reg;
-   struct pci_dev *pdev = card->dev;


This would be equal to offset of dev member in pcie_service_card struct.

Nothing wrong here.


Actually, that is not true. The dereference of card tells the compiler that  
card can't be NULL, so it is free to eliminate the check below.  
Unbelievably, this can even happen for a reference such as >thing  
where the pointer isn't even actually dereferenced at all!



+   struct pci_dev *pdev;
int i;

if (card) {
+   pdev = card->dev;
if (card->msix_enable) {
for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
synchronize_irq(card->msix_entries[i].vector);
--
1.9.1




--
With Best Regards,
Andy Shevchenko



--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe

2016-02-17 Thread Rustad, Mark D

John Fastabend  wrote:

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h  
b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

new file mode 100644
index 000..43ebec4
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -0,0 +1,112 @@
+/***
+ *
+ * Intel 10 Gigabit PCI Express Linux drive
+ * Copyright(c) 2013 - 2015 Intel Corporation.


Those copyright dates don't seem reasonable for a new file, since it  
clearly didn't exist in 2013. IANAL, but it has to be better to at least be  
accurate about when it was created.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH net-next V2 4/6] igb: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-16 Thread Rustad, Mark D

Aaron F  wrote:

From: Intel-wired-lan [intel-wired-lan-boun...@lists.osuosl.org] on  
behalf of Stefan Assmann [sassm...@kpanic.de]

Sent: Wednesday, February 03, 2016 12:20 AM
To: intel-wired-...@lists.osuosl.org
Cc: netdev@vger.kernel.org; da...@davemloft.net; sassm...@kpanic.de
Subject: [Intel-wired-lan] [PATCH net-next V2 4/6] igb: call ndo_stop()  
instead of dev_close() when running offline selftest


Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/igb/igb.h | 2 ++
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c| 8 
 3 files changed, 8 insertions(+), 6 deletions(-)


Checkpatch warns that externs should be avoided in .c files, but they  
pre-existed and are just being flagged due to the name changing, so...


Tested-by: Aaron Brown 


Again, the prototypes in the .c should be deleted in favor of the ones that  
were added to the .h file.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH net-next V2 5/6] e1000: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-16 Thread Rustad, Mark D

Aaron F  wrote:

From: Intel-wired-lan [intel-wired-lan-boun...@lists.osuosl.org] on  
behalf of Stefan Assmann [sassm...@kpanic.de]

Sent: Wednesday, February 03, 2016 12:20 AM
To: intel-wired-...@lists.osuosl.org
Cc: netdev@vger.kernel.org; da...@davemloft.net; sassm...@kpanic.de
Subject: [Intel-wired-lan] [PATCH net-next V2 5/6] e1000: call  
ndo_stop()   instead of dev_close() when running offline selftest


Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 4 ++--
 drivers/net/ethernet/intel/e1000/e1000_main.c| 8 
 3 files changed, 8 insertions(+), 6 deletions(-)


Checkpatch warns that externs should be avoided in .c files, but they  
pre-existed and are just being flagged due to the name changing, so...


Tested-by: Aaron Brown 


Actually, it is the forward declarations in the .c that should be deleted.  
The prototypes should only exist in the .h file.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH V2 5/5] net: can: ifi: Add IFI CANFD IP support

2016-01-22 Thread Rustad, Mark D

Marek Vasut  wrote:


I see, so adding u32 also here works. I'm starting to wonder if the BIT
macro is really that nice and if I shouldn't just use (1 << n) as usual.


Actually, (1 << n) is not so good either when n is 31 - it can trigger  
overflow warnings since it will be presumed to be a signed value. (1U << n)  
should avoid that problem. Unfortunately, BIT() uses 1UL which produces  
64-bit values on 64-bit arches. The bit ops are kind of a mess in that way.  
It would be nicer if BIT was restricted to int-size values and a new BIT_UL  
or something would produce the long values.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH net-next 6/8] net: gre: Implement LCO for GRE over IPv4

2016-01-20 Thread Rustad, Mark D

Alexander Duyck  wrote:


Actually you may want to go the other way on that.  If they weren't
flipping the checksum value for GRE before why should we start doing
that now?  I'm pretty sure the checksum mangling is a very UDP centric
thing.  There is no need to introduce it to other protocols.


If different checksum representations are needed, then there really should  
be an explicit indication of whether it is a UDP-style checksum or other in  
the skb I would think rather than guessing it based on the offset. Of  
course it would be convenient if all the protocols that use a one's  
complement checksum would tolerate the UDP representation. I have a long  
(and now old) history working with real one's complement machines, and so I  
would want to believe that any correct implementation would tolerate it,  
but I don't know for a fact that they do.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH net-next 6/8] net: gre: Implement LCO for GRE over IPv4

2016-01-20 Thread Rustad, Mark D

Alexander Duyck  wrote:


There isn't any need to add such an indication, nor do we need to
start bitflipping the return value from csum_fold in all cases.  I
think there was just some confusion about UDP checksums vs GRE or TCP
checksums.


Yeah. I think I finally got there. The naive software methods will never  
generate a true 0 unless everything was zero. Real one's complement  
machines did addition in terms of subtraction so that 1 + -1 would never  
produce a -0, only a normal 0. Of course a simple adder would produce a -0,  
making it impossible to get back to a normal 0.



I'd say we are better off keeping this simple.  The original patch
just needs to drop the check for the resultant checksum being 0 since
that is not needed for GRE.


I'm all in favor of simple. I had just started to worry about a possible  
change in behavior that might have interoperability problems with some  
implementations. I wonder if any implementation ever did the addition by  
subtraction, but also failed to make 0 compare equal to -0? I guess if they  
knew enough to do the former, they should not have blown the latter.


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH 3/3] ixgbe: synchronize the link_speed and link_up of a slave interface

2015-12-30 Thread Rustad, Mark D
zyjzyj2...@gmail.com wrote:

> From: Zhu Yanjun <yanjun@windriver.com>
> 
> According to the suggestion from Rustad, Mark D, this behavior perhaps
> is more related to the copper phy. But to make fiber phy more robust,
> to all the interfaces as a slave interface, the link_speed and link_up
> is synchronized.
> 
> Signed-off-by: Zhu Yanjun <yanjun@windriver.com>
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |8 +---
> 1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 1bb6056..ce47639 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -6441,10 +6441,12 @@ static void ixgbe_watchdog_link_is_up(struct 
> ixgbe_adapter *adapter)
>* a bonding driver in 802.3ad mode. When X540 NIC acts as an
>* independent interface, it is not necessary to synchronize link_up
>* and link_speed.
> -  * In the end, not continue if (X540 NIC && SLAVE && link_speed UNKNOWN)
> +  * According to the suggestion from Rustad, Mark D, this behavior
> +  * perhaps is related to the copper phy. To make fiber phy more robust,
> +  * To all the interfaces as a slave, the link_speed is checked.
> +  * In the end, not continue if (SLAVE && link_speed UNKNOWN)

There is no need to make reference to my suggestion in the comment, especially 
since that is in the commit message. Please simplify your comment above to be 
something like:
 * For all slave interfaces, wait for the link_speed to be known.

>*/
> - if ((hw->mac.type == ixgbe_mac_X540) &&
> - (netdev->flags & IFF_SLAVE))
> + if (netdev->flags & IFF_SLAVE)
>   if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
>   return;

The above would be better as:
if ((netdev->flags & IFF_SLAVE) &&
link_speed == IXGBE_LINK_SPEED_UNKNOWN)
return;

Please do not send a series of patches - it just adds needless confusion and is 
a bisect hazard. Just send a single patch with the desired change as a V5.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization of link_up and speed

2015-12-29 Thread Rustad, Mark D
Emil S  wrote:

>>  */
>> -if (hw->mac.type == ixgbe_mac_X540)
>> +if ((hw->mac.type == ixgbe_mac_X540) &&
>> +(netdev->flags & IFF_SLAVE))
>>  if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
>>  return;
> 
> If you were to enter ixgbe_watchdog_link_is_up() with unknown speed, then I 
> would
> assume that you also have a dmesg that shows:
> "NIC Link is Up unknown speed"
> 
> by the interface you use in the bond?

It seems odd to be checking the MAC type for this. Is this behavior perhaps 
more related to the copper phy? If so, then the check should be changed. Or 
would the check for unknown link speed be sufficient? It seems like an 
interface as a slave would not work with an unknown link speed, so it may as 
well wait in all cases, not just for X540.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH v5 4/4] net: diag: Support destroying TCP socketsr

2015-12-15 Thread Rustad, Mark D
Maciej Żenczykowski  wrote:

> I'd tend to agree that reset or abort would be preferable to destroy.
> After all... the socket doesn't actually go away.

Or maybe terminate? Reset kind of implies to me that it may resume operation. 
Abort could be ok. I think terminate is somewhat more neutral, if that makes 
sense.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Checksum offload queries

2015-12-10 Thread Rustad, Mark D
Edward Cree  wrote:

> I have just realised something startling.  Assuming the inner protocol uses 
> the ones complement checksum in the way IP, UDP and TCP do, the outer 
> checksum can be computed *without looking at the payload*.  Why?  Because the 
> ones complement sum of (say) a correctly checksummed UDP datagram is simply 
> the complement of the ones complement sum of the pseudo header.  Similarly, 
> the ones complement sum of a correctly checksummed IP header is zero.
> Therefore, the outer checksum depends _only_ on the inner and outer pseudo 
> headers and the encapsulation headers.  For example, with UDP encapsulated in 
> VXLAN, we have the following packet structure:
> ETH IP UDP VXLAN inner-ETH inner-IP inner-UDP PAYLOAD
> and the outer checksum equals
> ~([outer_pseudo] + [UDP] + [VXLAN] + [inner-ETH] + ~[inner_pseudo])
> where [] denotes summation, and all addition is ones complement.
> This can easily be computed in software, especially as the stack already has 
> ~[inner_pseudo]: it's stored in the inner checksum field to help inner 
> checksum offload.
> 
> Have I made a mistake in my ones-complement maths, or is outer checksum 
> offload as unnecessary as IP header checksum offload?

I agree with the overall observation, in that the outer checksum can be derived 
from the inner one. I think that the inner-ip header needs to be added (after 
subtracting out the inner_pseudo as you indicate above), because the entire raw 
inner IP header needs to be included in the outer checksum. I haven't thought 
this all through in detail yet. It would be really nice to have a function that 
implemented something like this. Could one be structured to handle most 
encapsulations?

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [BUG] net: performance regression on ixgbe (Intel 82599EB 10-Gigabit NIC)

2015-12-04 Thread Rustad, Mark D
Otto Sabart  wrote:

> I probably found a performance regression on ixgbe (Intel 82599EB
> 10-Gigabit NIC) on v4.4-rc3. I am able to see this problem since
> v4.4-rc1.
> 
> The bug report you can find here [0].
> 
> Can somebody take a look at it?
> 
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1288124

A recent patch has disabled LRO by default because it is incompatible with 
forwarding. If you aren't interested in forwarding, you might try enabling lro 
with ethtool.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [net-next 13/15] ixgbe: Handle extended IPv6 headers in tx path

2015-12-02 Thread Rustad, Mark D
Alexander Duyck  wrote:

> This doesn't look right.  How come this doesn't match the implementation you 
> did for the ixgbevf driver?  If I am not mistaken this approach had issues 
> where it could spin forever didn't it?

Yes, this is the original version of the patch instead of the V4 of the patch 
which you would recognize and should have seen on the intel-wired-lan list. 
Jeff and I discussed this a little while ago and I sent him a link to the V4 of 
the patch in patchwork.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [net-next 06/16] i40e: Properly cast type for arithmetic

2015-11-25 Thread Rustad, Mark D
Joe Perches  wrote:

> On Wed, 2015-11-25 at 02:46 -0800, Jeff Kirsher wrote:
>> On Tue, 2015-11-24 at 16:43 -0800, Joe Perches wrote:
>>> On Tue, 2015-11-24 at 16:04 -0800, Jeff Kirsher wrote:
 From: Helin Zhang 
 
 Pointer of type void * shouldn't be used in arithmetic, which may
 result in compilation error. Casting of (u8 *) can be added to fix
 that.
>>> 
>>> void * arithmetic is used quite frequently in the kernel.
>>> 
>>> What compiler emits an error?
>> 
>> When you use the gcc -Wpointer-arith, it generates the warning.
> 
> make W=3 does that
> 
> $ make help
> [...]
> make W=n   [targets] Enable extra gcc checks, n=1,2,3 where
>   1: warnings which may be relevant and do not occur too often
>   2: warnings which occur quite often but may still be relevant
>   3: more obscure warnings, can most likely be ignored
> 
> but I think it should be ignored.

Yes. That doesn't mean that -Wpointer-arith can't catch other possible errors 
that may be worth fixing, but this particular type is an accepted usage in the 
kernel and so should be ignored - unless you find something logically wrong, 
which does not seem to be the case here at all.

If one cared about portability to other compilers that treat void * very 
differently in this usage, then you would want to deal with all of these. That 
is not the case with the kernel.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH 13/15] ixgbe: Convert to advertise NETIF_F_HW_CSUM

2015-11-20 Thread Rustad, Mark D
Tom Herbert  wrote:

> The skb_csum_offload_chk is used to resolve checksums that are unable
> to be offloaded to the device.
> 
> Signed-off-by: Tom Herbert 
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 41 +++
> 1 file changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 80823f5..26c1633 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c



> static void ixgbe_tx_csum(struct ixgbe_ring *tx_ring,
> -   struct ixgbe_tx_buffer *first)
> +   struct ixgbe_tx_buffer *first,
> +   struct ixgbe_adapter *adapter)

This added parameter should not be needed. See below.

> {
>   struct sk_buff *skb = first->skb;
>   u32 vlan_macip_lens = 0;
>   u32 mss_l4len_idx = 0;
>   u32 type_tucmd = 0;
> + bool csum_encapped;
> 
> - if (skb->ip_summed != CHECKSUM_PARTIAL) {
> + if (!skb_csum_offload_chk(skb, _offl_spec, _encapped, true)) {
> +no_csum:
>   if (!(first->tx_flags & IXGBE_TX_FLAGS_HW_VLAN) &&
>   !(first->tx_flags & IXGBE_TX_FLAGS_CC))
>   return;
> @@ -7025,7 +7038,15 @@ static void ixgbe_tx_csum(struct ixgbe_ring *tx_ring,
>   u8 *raw;
>   } transport_hdr;
> 
> - if (skb->encapsulation) {
> + if (csum_encapped) {
> + switch (adapter->hw.mac.type) {
> + case ixgbe_mac_X550:
> + case ixgbe_mac_X550EM_x:
> + break;
> + default:
> + skb_checksum_help(skb);
> + goto no_csum;
> + }
>   network_hdr.raw = skb_inner_network_header(skb);
>   transport_hdr.raw = skb_inner_transport_header(skb);
>   vlan_macip_lens = skb_inner_network_offset(skb) <<

The switch checking the mac type above is not needed. All of the ixgbe devices 
can handle encapsulated tx checksum offload. The X550 series
has the ability to offload the rx side as well, but is not a factor
here. That also means that the added adapter parameter would not be
needed.

> @@ -7602,7 +7623,7 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
>   if (tso < 0)
>   goto out_drop;
>   else if (!tso)
> - ixgbe_tx_csum(tx_ring, first);
> + ixgbe_tx_csum(tx_ring, first, adapter);
> 
>   /* add the ATR filter if ATR is on */
>   if (test_bit(__IXGBE_TX_FDIR_INIT_DONE, _ring->state))



> @@ -8794,12 +8814,10 @@ skip_sriov:
> 
>   netdev->vlan_features |= NETIF_F_TSO;
>   netdev->vlan_features |= NETIF_F_TSO6;
> - netdev->vlan_features |= NETIF_F_IP_CSUM;
> - netdev->vlan_features |= NETIF_F_IPV6_CSUM;
> + netdev->vlan_features |= NETIF_F_HW_CSUM;
>   netdev->vlan_features |= NETIF_F_SG;
> 
> - netdev->hw_enc_features |= NETIF_F_SG | NETIF_F_IP_CSUM |
> -NETIF_F_IPV6_CSUM;
> + netdev->hw_enc_features |= NETIF_F_SG | NETIF_F_HW_CSUM;
> 
>   netdev->priv_flags |= IFF_UNICAST_FLT;
>   netdev->priv_flags |= IFF_SUPP_NOFCS;
> @@ -8809,8 +8827,7 @@ skip_sriov:
>   case ixgbe_mac_X550:
>   case ixgbe_mac_X550EM_x:
>   netdev->hw_enc_features |= NETIF_F_RXCSUM |
> -NETIF_F_IP_CSUM |
> -NETIF_F_IPV6_CSUM;
> +NETIF_F_HW_CSUM;

This is actually redundant, because the bit in hw_enc_features was set above. I 
already have a patch in queue that changes the statement above to only set the 
NETIF_F_RXCSUM for the X550 devices because the other bits were set just above. 
So once my patch is applied, this last hunk won't be needed here.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] regression in ixgbe SFP detection patch

2015-11-11 Thread Rustad, Mark D
William,

Emil S  wrote:

>> It also fixes my issue: even if eth{2,3} are still up with no carrier, I
>> don't have any kworker in D state.
> 
> It appears that you have 2 ports with empty cages. If that is the case there
> is no reason to keep the interfaces up. If you bring them down, or plug the 
> SFP+
> modules the kworkers should stop.
> 
>> So, is it something we should consider as a regression, in that case I
>> can send a formal patch, or do you need some more information to help
>> you debug it?
> 
> If the diff above is the patch you are referring to then you will break the
> SFP+ detection in the case where the driver was loaded while there were no
> SFP+ modules present in the cages.

Just so you know, there are patches in queue that will improve this situation 
in two ways:

1) When the I2C probe times out, the code assumes that the cage is empty and 
does not retry the access until the next probe.

2) The driver will use its own private workqueue, so it will not affect the 
system workqueues at all.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH] fm10k:Fix error handling in the function fm10k_setup_tc

2015-10-20 Thread Rustad, Mark D
Nicholas Krause  wrote:

> This fixes error handling in the function fm10k_setup_tc to properly
> check if the call to the function fm10k_open has failed by returning
> a error and if so return immediately to the caller of the function
> fm10k_setup_tc to properly signal this non recoverable failure.
> 
> Signed-off-by: Nicholas Krause 
> ---
> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c 
> b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> index 99228bf..5a4e702 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> @@ -1146,6 +1146,7 @@ static struct rtnl_link_stats64 
> *fm10k_get_stats64(struct net_device *netdev,
> int fm10k_setup_tc(struct net_device *dev, u8 tc)
> {
>   struct fm10k_intfc *interface = netdev_priv(dev);
> + int err;
> 
>   /* Currently only the PF supports priority classes */
>   if (tc && (interface->hw.mac.type != fm10k_mac_pf))
> @@ -1175,7 +1176,9 @@ int fm10k_setup_tc(struct net_device *dev, u8 tc)
>   fm10k_mbx_request_irq(interface);
> 
>   if (netif_running(dev))
> - fm10k_open(dev);
> + err = fm10k_open(dev);
> + if (err)
> + return err;
> 
>   /* flag to indicate SWPRI has yet to be updated */
>   interface->flags |= FM10K_FLAG_SWPRI_CONFIG;

NAK. This will reference an uninitialized err variable when netif_running 
returns false. The compiler should complain about this, suggesting that it 
wasn't compiled.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [patch net-next v2 06/13] rocker: introduce worlds infrastructure

2015-10-06 Thread Rustad, Mark D
Jiri Pirko  wrote:

>>> +   int (*port_init)(struct rocker_port *rocker_port, void *priv,
>>> +void *port_priv);
>> 
>> Yuck, void *.  Can we do better?
> 
> I see nothing wrong with this priv usage. It's done like this on many
> places. I think it is completely legit, since the call points are well
> defined and wrapped.

This particular call is perhaps the most troubling. In general, if there is one 
void parameter you may well get a compile error on a non-void parameter if you 
get them switched around. With two void parameters that is no longer the case, 
making it even more error-prone than the other uses of void *.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-10-05 Thread Rustad, Mark D
> On Oct 5, 2015, at 9:23 AM, Tom Herbert  wrote:
> 
> 4) To help drivers for devices with limited offload capabilities we'll
> define a helper function to check for typical restrictions (.e.g. IPv4
> only, TCP/UDP only. no encapsulation, no IPv6 extension headers,
> etc.). I am working on this helper function and will send RFC shortly.

FWIW, ixgbe probably won't use the IPv6 extension header check, because I am 
working on simply adding offload support for it. It seems to me that if segment 
routing becomes used, in some places there will be enough traffic with it 
present to want it offloaded. Well, maybe ixgbe should use the extension header 
check until I get the offload support done, but that is it.

Are there any good tools for testing IPv6 extension headers? It seems that it 
hasn't been tested up to this point or the ixgbe problem would have been 
noticed, so it would be good to introduce a good tool to test it properly.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH net 3/7] openvswitch: Fix skb leak in ovs_fragment()

2015-09-29 Thread Rustad, Mark D
> On Sep 29, 2015, at 3:39 PM, Joe Stringer  wrote:
> 
> @@ -728,8 +727,14 @@ static void ovs_fragment(struct vport *vport, struct 
> sk_buff *skb, u16 mru,
>   WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
> ovs_vport_name(vport), ntohs(ethertype), mru,
> vport->dev->mtu);
> - kfree_skb(skb);
> + goto out;
>   }
> +
> + skb = NULL;
> +
> +out:
> + if (skb)
> + kfree_skb(skb);
> }
> 
> static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,

Wouldn't that hunk be better as:

@@ -728,8 +727,13 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
  ovs_vport_name(vport), ntohs(ethertype), mru,
  vport->dev->mtu);
-   kfree_skb(skb);
+   goto out;
}
+
+   return;
+
+out:
+   kfree_skb(skb);
}

static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH V4 1/2] pci: Add dev_flags bit to access VPD through function 0

2015-09-15 Thread Rustad, Mark D
> On Sep 15, 2015, at 2:17 PM, Alex Williamson  
> wrote:
> 
> Also, rather than clearing the flag, can we move the tests done by
> pci_vpd_f0_dev_check() into the
> quirk setup function?  It seems like function 0 should be sufficiently
> configured by the time we're probing non-zero functions that we can be
> more selective in setting the flag rather than unsetting it later.

I guess I was being very conservative in not assuming anything about the state 
of other devices at that point. Things seem to be increasingly parallel all the 
time and I am not deeply involved in the evolution of the PCI subsystem. If you 
want to make that assumption, I would suggest that pci_vpd_f0_dev_check remain 
a separate function called by the quirk setup so that it can be used by other 
quirk setup functions as well.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH V4 1/2] pci: Add dev_flags bit to access VPD through function 0

2015-09-15 Thread Rustad, Mark D
> On Sep 15, 2015, at 11:19 AM, Alex Williamson  
> wrote:
> 
> In addition to the (PCI_SLOT() != devfn) issue, I'm concerned about
> topologies like we see on Skylake.  IIRC, the integrated NIC appears at
> something like 00:1f.6.  I don't know if that specific NIC has VPD, nor
> am I sure it really matter because another example or some future
> version might.  So we'll set the PCI_DEV_FLAGS_VPD_REF_F0 because we do
> so for all (PCI_FUNC() != 0) Intel NICs, we'll call
> pci_vpd_f0_dev_check(), which will error because function 0 has a
> different class code and device ID, so we return error and if VPD exists
> on the device, it's now inaccessible.

Yes, that is exactly what would happen.

> I thought there was talk about whitelisting anything on the root bus to
> avoid strange root complex integrated devices (and perhaps avoid the
> general case for assigned devices within a VM), but I don't see anything
> like that here.

I hadn't heard that talk, but I'm not on the PCI list and I guess I wasn't 
copied.

> Perhaps instead of failing and hiding VPD we should fail, clear the
> flag, and allow normal access.  Thanks,

Because the purpose of VPD is to hold information about the device, I would 
suggest that VPD should never be provided for an embedded network device, but 
rather for the device as a whole. So while there may well be VPD for an SOC, 
that VPD should not be associated with one of its embedded devices, but rather 
something more appropriate for the device as a whole. And attaching VPD to a 
whole bunch of internal devices would just be madness.

So I understand the concern, but I don't think that it should really happen in 
real systems. I did think about this case when I was working on the patches. A 
networking device should really only have VPD when it is its own physical 
device, such as a NIC.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH V4 1/2] pci: Add dev_flags bit to access VPD through function 0

2015-09-15 Thread Rustad, Mark D
> On Sep 15, 2015, at 12:04 PM, Alex Williamson  
> wrote:
> 
> 
> FRU-type information is only one of the use cases of VPD, the spec also
> defines (PCI rev 3.0, 6.4):
> 
>... a mechanism for storing information such as performance and
>failure data on the device being monitored.
> 
> That information could very much be function specific.

It is open to interpretation. I guess still I see it as the physical device as 
a whole.

> When I was looking at whether we should provide VPD access of an
> assigned device at all, I ran across this interesting statement in the
> PCI spec (rev 3.0, I.3.1.1):
> 
>CP Extended Capability
> 
>This field allows a new capability to be identified in the VPD
>area. Since dynamic control/status cannot be placed in VPD, the
>data for this field identifies where, in the device’s memory or
>I/O address space, the control/status registers for the
>capability can be found. Location of the control/status
>registers is identified by providing the index (a value between
>0 and 5) of the Base Address register that defines the address
>range that contains the registers, and the offset within that
>Base Address register range where the control/status registers
>reside. The data area for this field is four bytes long. The
>first byte contains the ID of the extended capability. The
>second byte contains the index (zero based) of the Base Address
>register used. The next two bytes contain the offset (in
>little-endian order) within that address range where the
>control/status registers defined for that capability reside.
> 
> Again, this sounds like function specific data, and both here and above,
> blocking access to VPD could affect the functionality of drivers.  It
> may be the case that Intel would find this use to be madness, but
> there's no PCI spec requirement that separate functions are in any way
> similar and we're looking at an interface that may be used by non-Intel
> devices as well.  Thanks,

It isn't an interface as such, it is a quirk to address some widespread design 
problems with multi function devices with VPD. And you are right that functions 
can be different. In fact this quirk is needed only because now they often 
(usually in fact) are not different! I do hope to see some non-Intel devices 
use the quirk, because I'm pretty sure there are other devices that have the 
same issue.

I realize that I covered a pretty wide swath by making the quirk apply to all 
Intel Ethernet devices, but that still seems correct. The Skylake is not an 
issue because it does not have VPD so the pci_find_capability will fail before 
any handling of the quirk is possible. The code that applies the quirk could 
check specific devices, but it would make the code a lot bigger, and I see this 
kind of code as dead weight for so many systems that I tried to make it as 
small as possible. Since all Intel Ethernet seems to be correct now and as far 
as I can see into the future, that is what I did.

Going back to something you mentioned before, I think you are right that the 
failure case for the pci_vpd_f0_dev_check could be made to simply clear the 
quirk and continue, since pci_vpd_f0_dev_check really should not fail in cases 
where the quirk is applicable. That does seem like a reasonable change to me 
the more I think about it.

I think a whitelist would be unnecessary dead weight.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH] igb: don't unmap hw_addr if its NULL

2015-09-10 Thread Rustad, Mark D
> On Sep 9, 2015, at 9:07 PM, Jarod Wilson  wrote:
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index e174fbb..a5e0022 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2823,7 +2823,8 @@ static void igb_remove(struct pci_dev *pdev)
> 
>   igb_clear_interrupt_scheme(adapter);
> 
> - pci_iounmap(pdev, hw->hw_addr);
> + if (hw->hw_addr)
> + pci_iounmap(pdev, hw->hw_addr);
>   if (hw->flash_address)
>   iounmap(hw->flash_address);
>   pci_release_selected_regions(pdev,

I don't think that this is entirely the right solution. In ixgbe we have a 
separate pointer, io_addr, used to manage the resource, so that the space can 
be freed even after hw_addr is cleared. With the approach above, the 
pci_iounmap will not ever be called on the space. You can see how ixgbe is 
doing it.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [GIT] Networking

2015-09-08 Thread Rustad, Mark D
> On Sep 7, 2015, at 4:02 AM, David Laight  wrote:
> 
> Feed:
> int bar(int (*f)[10]) { return sizeof *f; }
> into cc -O2 -S and look at the generated code - returns 40 not 4.

Yes, indeed it does. And with clang too. I guess I was too easily discouraged 
when looking for a workable syntax some years ago. At the time I stopped when 
the typedef worked, which really just encapsulates this. I should have 
recognized that then. Thanks for making it all so clear.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [GIT] Networking

2015-09-04 Thread Rustad, Mark D
> On Sep 4, 2015, at 2:07 AM, David Laight  wrote:
> 
>> I find them useful as syntactic sugar. We have not used them a lot, but 
>> there are cases in our crypto
>> handling code where we have fixed size array inputs/outputs and there we 
>> opted to use them. They make
>> it easy to remember what the expected sizes of input and output are without 
>> having to read through the
>> implementation (of course we never even tried to use sizeof on these 
>> pointers).
>> 
>> static int smp_ah(struct crypto_blkcipher *tfm, const u8 irk[16],
>>  const u8 r[3], u8 res[3])
> 
> Expect that it looks like you are passing arrays by value,
> but instead you are passing by reference.
> 
> Explicitly pass by reference and sizeof works.

It depends on what you mean by works. It at least doesn't look so misleading 
when passing by reference and so works more as expected. The sizeof in either 
case will never return the size of the array. To have sizeof return the size of 
the array would require a typedef of the array to pass by reference. In some 
cases that could be the right thing to do.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [net-next 05/19] ixgbe: Add support for UDP-encapsulated tx checksum offload

2015-09-02 Thread Rustad, Mark D
> On Sep 1, 2015, at 8:17 PM, Tom Herbert  wrote:
> 
> I suspect this is not UDP-encapsulation specific, will it work with
> TCP/IP/IP, TCP/IP/GRE etc.?

It could do more, but this is what has been tested up to this point.

> Isn't there anyway the ixgbe could just be made to NETIF_HW_CSUM? That
> would be so much more straightforward and support nearly all use cases
> without needing to jump through all these hoops.

Well, the description says:

 ---
Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that device can fill TCP/UDP-like checksum anywhere in the packets
whatever headers there might be.
 ---

The device can't do whatever, wherever. There is always a limit to the offset 
to the inner headers that can be handled, for instance.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [net-next 05/19] ixgbe: Add support for UDP-encapsulated tx checksum offload

2015-09-02 Thread Rustad, Mark D
> On Sep 2, 2015, at 4:21 PM, Tom Herbert  wrote:
> 
> Mark, another question in this area of code. Looking at ixgbe_tx_csum,
> I'm wondering what happens with those default cases for the switch
> statements. If those are hit for whatever reason does that mean the
> checksum is never resolved? It seems like if the device couldn't
> handle these cases then skb_checksum_help should be called to set the
> checksum. In particular I am wondering what happens in the case that a
> TCP or UDP packet is sent in IPv6 with an extension header present (so
> default is taken in switch (l4_hdr)). Would the checksum be properly
> set in this case?

I will look further into this, but in a first look it appears that you are 
right and that it has been this way for some time.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH net-next 09/13] vxlan: provide access function for vxlan socket address family

2015-08-24 Thread Rustad, Mark D
 On Aug 18, 2015, at 1:33 PM, Jiri Benc jb...@redhat.com wrote:
 
 Signed-off-by: Jiri Benc jb...@redhat.com
 ---
 drivers/net/vxlan.c | 8 
 include/net/vxlan.h | 5 +
 2 files changed, 9 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
 index e4b8ab63d0fa..d5ca1d7e0b81 100644
 --- a/drivers/net/vxlan.c
 +++ b/drivers/net/vxlan.c
 @@ -236,7 +236,7 @@ static struct vxlan_sock *vxlan_find_sock(struct net 
 *net, sa_family_t family,
 
   hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
   if (inet_sk(vs-sock-sk)-inet_sport == port 
 - inet_sk(vs-sock-sk)-sk.sk_family == family 
 + vxlan_get_sk_family(vs) == family 
   vs-flags == flags)
   return vs;
   }
 @@ -625,7 +625,7 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock 
 *vs)
   struct net_device *dev;
   struct sock *sk = vs-sock-sk;
   struct net *net = sock_net(sk);
 - sa_family_t sa_family = sk-sk_family;
 + sa_family_t sa_family = vxlan_get_sk_family(vs);
   __be16 port = inet_sk(sk)-inet_sport;
   int err;
 
 @@ -650,7 +650,7 @@ static void vxlan_notify_del_rx_port(struct vxlan_sock 
 *vs)
   struct net_device *dev;
   struct sock *sk = vs-sock-sk;
   struct net *net = sock_net(sk);
 - sa_family_t sa_family = sk-sk_family;
 + sa_family_t sa_family = vxlan_get_sk_family(vs);
   __be16 port = inet_sk(sk)-inet_sport;
 
   rcu_read_lock();
 @@ -2390,7 +2390,7 @@ void vxlan_get_rx_port(struct net_device *dev)
   for (i = 0; i  PORT_HASH_SIZE; ++i) {
   hlist_for_each_entry_rcu(vs, vn-sock_list[i], hlist) {
   port = inet_sk(vs-sock-sk)-inet_sport;
 - sa_family = vs-sock-sk-sk_family;
 + sa_family = vxlan_get_sk_family(vs);
   dev-netdev_ops-ndo_add_vxlan_port(dev, sa_family,
   port);
   }
 diff --git a/include/net/vxlan.h b/include/net/vxlan.h
 index e4534f1b2d8c..43677e6b9c43 100644
 --- a/include/net/vxlan.h
 +++ b/include/net/vxlan.h
 @@ -241,3 +241,8 @@ static inline void vxlan_get_rx_port(struct net_device 
 *netdev)
 }
 #endif
 #endif
 +
 +static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
 +{
 + return vs-sock-sk-sk_family;
 +}

This causes build problems because vxlan_get_sk_family is not inside the #endif
protecting the file for multiple inclusion. Please put vxlan_get_sk_family
inside the last #endif.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers

2015-07-23 Thread Rustad, Mark D
 On Jul 23, 2015, at 5:03 PM, Scott Feldman sfel...@gmail.com wrote:
 
 On the CHECKs on space after cast, should we modify checkpatch.pl to
 not flag those for drivers/net?

Please don't. My internal parser really wants to see the cast right up against 
whatever it is casting. Has this practice been changing and I haven't noticed?

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ss -p segfaults

2015-07-15 Thread Rustad, Mark D
 On Jul 15, 2015, at 8:12 AM, Vadim Kochan vadi...@gmail.com wrote:
 Would you please check this fix ?
 
 diff --git a/misc/ss.c b/misc/ss.c
 index 03f92fa..3a826e4 100644
 --- a/misc/ss.c
 +++ b/misc/ss.c
 @@ -683,8 +683,8 @@ static inline void sock_addr_set_str(inet_prefix *prefix, 
 char **ptr)
 
 static inline char *sock_addr_get_str(const inet_prefix *prefix)
 {
 -char *tmp ;
 -memcpy(tmp, prefix-data, sizeof(char *));
 +char *tmp;
 +memcpy(tmp, prefix-data[0], sizeof(char *));
 return tmp;
 }

That surely is not a fix! The destination of the memcpy is the address of an 
uninitialized stack variable! Both versions are equally bad.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ss -p segfaults

2015-07-15 Thread Rustad, Mark D
 On Jul 15, 2015, at 9:49 AM, Rustad, Mark D mark.d.rus...@intel.com wrote:
 
 On Jul 15, 2015, at 8:12 AM, Vadim Kochan vadi...@gmail.com wrote:
 Would you please check this fix ?
 
 diff --git a/misc/ss.c b/misc/ss.c
 index 03f92fa..3a826e4 100644
 --- a/misc/ss.c
 +++ b/misc/ss.c
 @@ -683,8 +683,8 @@ static inline void sock_addr_set_str(inet_prefix 
 *prefix, char **ptr)
 
 static inline char *sock_addr_get_str(const inet_prefix *prefix)
 {
 -char *tmp ;
 -memcpy(tmp, prefix-data, sizeof(char *));
 +char *tmp;
 +memcpy(tmp, prefix-data[0], sizeof(char *));
return tmp;
 }
 
 That surely is not a fix! The destination of the memcpy is the address of an 
 uninitialized stack variable! Both versions are equally bad.

I probably over-reacted, but using memcpy to access a pointer in this way is 
just ugly. For one thing, it circumvents any sanity-checking that the compiler 
can do. And changing the prefix-data to prefix-data[0] should be exactly the 
same thing and therefore should not fix anything. Anyway, never mind that.

Looking at more of the code, it looks to me like the the string pointer in data 
can sometimes point to a literal string instead of allocated memory when proc 
is in use. Free would not be happy with that. Look at the use of variable peer 
in function unix_stats_print.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH V3 0/2] pci: Provide a flag to access VPD through function 0

2015-07-06 Thread Rustad, Mark D
 On Jun 26, 2015, at 11:04 AM, Rustad, Mark D mark.d.rus...@intel.com wrote:
 
 Sorry, Mark, I've just been busy with other issues and haven't had a
 chance to look at this yet.
 
 Is there any chance of this getting into this merge window?

Well, it has missed the merge window, but this really is a bug fix. These 
patches address problems that, under race conditions, can corrupt VPD data and 
under other conditions can cause hangs. In fact I would submit that the reason 
that the VPD operations have been made interruptible is directly related to 
hangs caused by the sharing of VPD capability registers between functions. You 
see, if one function ever performs a VPD write, any subsequent read on any 
other function that shares those registers will definitely hang.

I imagine that there are many devices beyond Intel's Ethernet devices that 
would benefit from using the quirk that these patches introduce.

Please apply it and consider it for -stable.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH V3 0/2] pci: Provide a flag to access VPD through function 0

2015-06-26 Thread Rustad, Mark D
 On Jun 17, 2015, at 9:44 AM, Bjorn Helgaas bhelg...@google.com wrote:
 
 On Wed, Jun 17, 2015 at 11:29 AM, Rustad, Mark D
 mark.d.rus...@intel.com wrote:
 + Alex
 
 On Jun 5, 2015, at 2:59 PM, Rustad, Mark D mark.d.rus...@intel.com wrote:
 
 On Jun 3, 2015, at 11:46 AM, Mark D Rustad mark.d.rus...@intel.com wrote:
 
 Many multi-function devices provide shared registers in extended
 config space for accessing VPD. The behavior of these registers
 means that the state must be tracked and access locked correctly
 for accesses not to hang or worse. One way to meet these needs is
 to always perform the accesses through function 0, thereby using
 the state tracking and mutex that already exists.
 
 To provide this behavior, add a dev_flags bit to indicate that this
 should be done. This bit can then be set for any non-zero function
 that needs to redirect such VPD access to function 0. Do not set
 this bit on the zero function or there will be an infinite recursion.
 
 The second patch uses this new flag to invoke this behavior on all
 multi-function Intel Ethernet devices.
 
 Signed-off-by: Mark Rustad mark.d.rus...@intel.com
 
 ---
 Changes in V2:
 - Corrected a spelling error in a log message
 - Added checks to see that the referenced function 0 is reasonable
 Changes in V3:
 - Don't leak a device reference
 - Check that function 0 has VPD
 - Make a helper for the function 0 checks
 - Moved a multifunction check to the quirk patch
 
 So does this series look acceptable now? I think I addressed the issues 
 that Alex raised. Can these also be considered for -stable?
 
 More than a week has passed without any comment. Is this going to be 
 accepted or is there still an issue?
 
 Sorry, Mark, I've just been busy with other issues and haven't had a
 chance to look at this yet.

Is there any chance of this getting into this merge window?

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH V3 0/2] pci: Provide a flag to access VPD through function 0

2015-06-17 Thread Rustad, Mark D
+ Alex

 On Jun 5, 2015, at 2:59 PM, Rustad, Mark D mark.d.rus...@intel.com wrote:
 
 On Jun 3, 2015, at 11:46 AM, Mark D Rustad mark.d.rus...@intel.com wrote:
 
 Many multi-function devices provide shared registers in extended
 config space for accessing VPD. The behavior of these registers
 means that the state must be tracked and access locked correctly
 for accesses not to hang or worse. One way to meet these needs is
 to always perform the accesses through function 0, thereby using
 the state tracking and mutex that already exists.
 
 To provide this behavior, add a dev_flags bit to indicate that this
 should be done. This bit can then be set for any non-zero function
 that needs to redirect such VPD access to function 0. Do not set
 this bit on the zero function or there will be an infinite recursion.
 
 The second patch uses this new flag to invoke this behavior on all
 multi-function Intel Ethernet devices.
 
 Signed-off-by: Mark Rustad mark.d.rus...@intel.com
 
 ---
 Changes in V2:
 - Corrected a spelling error in a log message
 - Added checks to see that the referenced function 0 is reasonable
 Changes in V3:
 - Don't leak a device reference
 - Check that function 0 has VPD
 - Make a helper for the function 0 checks
 - Moved a multifunction check to the quirk patch
 
 So does this series look acceptable now? I think I addressed the issues that 
 Alex raised. Can these also be considered for -stable?

More than a week has passed without any comment. Is this going to be accepted 
or is there still an issue?

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH V3 0/2] pci: Provide a flag to access VPD through function 0

2015-06-05 Thread Rustad, Mark D
 On Jun 3, 2015, at 11:46 AM, Mark D Rustad mark.d.rus...@intel.com wrote:
 
 Many multi-function devices provide shared registers in extended
 config space for accessing VPD. The behavior of these registers
 means that the state must be tracked and access locked correctly
 for accesses not to hang or worse. One way to meet these needs is
 to always perform the accesses through function 0, thereby using
 the state tracking and mutex that already exists.
 
 To provide this behavior, add a dev_flags bit to indicate that this
 should be done. This bit can then be set for any non-zero function
 that needs to redirect such VPD access to function 0. Do not set
 this bit on the zero function or there will be an infinite recursion.
 
 The second patch uses this new flag to invoke this behavior on all
 multi-function Intel Ethernet devices.
 
 Signed-off-by: Mark Rustad mark.d.rus...@intel.com
 
 ---
 Changes in V2:
 - Corrected a spelling error in a log message
 - Added checks to see that the referenced function 0 is reasonable
 Changes in V3:
 - Don't leak a device reference
 - Check that function 0 has VPD
 - Make a helper for the function 0 checks
 - Moved a multifunction check to the quirk patch

So does this series look acceptable now? I think I addressed the issues that 
Alex raised. Can these also be considered for -stable?

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH V2 1/2] pci: Add dev_flags bit to access VPD through function 0

2015-06-03 Thread Rustad, Mark D
 On Jun 2, 2015, at 7:28 PM, Alexander Duyck alexander.du...@gmail.com wrote:
 
 You can probably combine the dev-multifunction check with the dev_flags 
 check.  After all you don't need this workaround if the device is not 
 multifunction.  It might even make more sense to move the multifunction check 
 to the quirk in patch 2/2.

Yes. I also realized that I really should check that tdev-vpd is not NULL. 
There is no point in referencing it if, for whatever reason, it has no VPD.

 I also believe this leaks a reference to the device.

Yes, and I was thinking that when I wrote the line, but forgot to do anything 
about it. Good catch!

 You should be calling pci_dev_put(tdev) if tdev is not NULL.  As such you 
 probably need to split up the !tdev and the rest of the checks.

Yup. I will have a V3 coming. I still don't have anything to handle PFs 
assigned to guests, but I think that would be best in a separate patch set if 
there is need to fix that. It looks to me like that would involve trapping on 
all config space accesses to such devices and then emulating the behavior of 
the VPD Address/F and Data registers. It may be worth seeing if anyone cares 
before doing anything about it. I haven't seen any reports related to such a 
setup.

Again, thank you for your comments.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH 1/2] pci: Add dev_flags bit to access VPD through function 0

2015-06-02 Thread Rustad, Mark D
 On Jun 2, 2015, at 10:48 AM, Alexander Duyck alexander.h.du...@redhat.com 
 wrote:
 
 I'm pretty sure these could cause some serious errors if you direct assign 
 the device into a VM since you then end up with multiple devices sharing a 
 bus.  Also it would likely have side-effects on a LOM (Lan On Motherboard) as 
 it also shares the bus with multiple non-Ethernet devices.
 
 I believe you still need to add something like a check for 
 !pci_is_root_bus(dev-bus) before you attempt to grab function 0.  It 
 probably also wouldn't hurt to check the dev-multifunction bit before 
 running this code since it wouldn't make sense to go chasing down the VPD on 
 another function if the device doesn't have one.  You could probably do that 
 either as a part of this code, or perhaps put it in the quirk.

I'll look into those. I think you are right about more checks being needed. 
Thanks for the comments.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH] pci: Use a bus-global mutex to protect VPD operations

2015-05-27 Thread Rustad, Mark D
 On May 27, 2015, at 10:27 AM, Bjorn Helgaas bhelg...@google.com wrote:
 
 It sounds like there are real problems here that would be fixed by changing
 the mutex strategy and limiting VPD read lengths, but that we don't quite
 have consensus on how to solve them yet.

I have a new pair of patches that I am getting internal feedback on. I
will be on vacation starting tomorrow and won't be back until Tuesday, so
I think it best to send them when I get back so that I will be available
to respond. I could be convinced to send them now.

 VPD is a bit of a tar pit.

It sure is.

 We've talked about limiting VPD read length
 before (see links below), which requires interpreting the VPD contents
 (just the tags  sizes) the kernel.  I think I'm OK with doing that,
 provided we should audit existing users and make sure we don't break them.
 
 http://lkml.kernel.org/r/c67cd7f4-8d8f-48fc-a63c-dbdafde87...@cmexhtcas1.ad.emulex.com
 http://lkml.kernel.org/r/1f6d7b6c-7189-4fe3-926b-42609724c...@cmexhtcas2.ad.emulex.com
 
 I'd also like to include specifics, e.g., bugzillas with complete dmesg and
 lspci information, for the problems we're fixing.

The issue I am addressing is for when the VPD Address/F and Data registers
are shared across multiple functions. My latest patch addresses this by
always performing the access through function 0. I added a dev_flags bit
to indicate when this should be done, so only devices that have a quirk
that sets that bit will get that treatment.

The patch I have still doesn't address the issue for direct-assigned
devices. I have no answer to that, but will point out that most guests
seem to use virtual machines that don't support extended config space
anyway. Only those that do support extended config space and access VPD
could be a source of trouble.

I have to say that I haven't actually reproduced the failure, but given
the design of the hardware it is clear that somehow a common mutex needs
to protect those shared registers.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH] pci: Use a bus-global mutex to protect VPD operations

2015-05-20 Thread Rustad, Mark D
 On May 19, 2015, at 6:02 PM, Alexander Duyck alexander.h.du...@redhat.com 
 wrote:
 
 My suspicion is that we have a number of bugs floating around out there like 
 the Broadcom issue.  Specifically, one of the ones I found was that the r8169 
 seems to have a similar issue as called out in the email thread at 
 http://permalink.gmane.org/gmane.linux.network/232260.  I'm wondering if we 
 shouldn't add an initializer for the read/write functions that will go 
 through and pull out the 3 or 4 headers from the VPD data needed to get the 
 actual length.  Then it would lock down the VPD and save some serious time on 
 reads since most devices don't have 32K of VPD to read.

That is interesting. I noticed that there are functions already present to find 
VPD tags. If the VPD were invalid, would this block its being read at all, or 
would it default to allow reading/writing anything? I don't know if there might 
be people using Linux to completely write the VPD area. Presumably your idea 
would prevent rewriting the VPD area to something larger.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH] pci: Limit VPD reads for all Intel Ethernet devices

2015-05-19 Thread Rustad, Mark D
 On May 19, 2015, at 10:50 AM, Alexander Duyck alexander.h.du...@redhat.com 
 wrote:
 
 These two patches are very much related.

They are only related because I saw an opportunity to do this while working on 
the other issue. That is the only relationship.

snip material on the other patch

 Artificially limiting the size of the VPD does nothing but cut off possibly 
 useful data, you would be better of providing all of the data on only the 
 first function than providing only partial data on all functions and adding 
 extra lock overhead.
 This limit only limits the maximum that the OS will read to what is 
 architecturally possible in these devices. Yes, PCIe architecturally 
 provides for the possibility of more, but these devices do not. More 
 repeating data can be read, however slowly, but there is no possibility of 
 useful content beyond the first 1K. If this limit were set to 0x100, which 
 is more in line what the actual usage is, it would be an artificial limit, 
 but at 1K it is not. Oh and it does include devices made by others that 
 incorporate Intel Ethernet silicon, not just Intel-built devices.
 
 As per section 3.4.4 of the X540 datasheet the upper addressable range for 
 the VPD section is 0xFFF which means the upper limit for the hardware is 
 0x1000, not 0x400.

Ok. I have no problem changing it to that. I had been looking at other specs, 
but 0x1000 really is a hard limit.

snip more material mostly relating to the other patch

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Intel-wired-lan] [PATCH] pci: Limit VPD reads for all Intel Ethernet devices

2015-05-19 Thread Rustad, Mark D
 On May 19, 2015, at 2:17 PM, Alexander Duyck alexander.du...@gmail.com 
 wrote:
 
 Any chance you could point me toward the software in question?  Just 
 wondering because it seems like what you are fixing with this is an 
 implementation issue in the application since you really shouldn't be 
 accessing areas outside the scope of the VPD data structure, and doing so is 
 undefined in terms of what happens if you do.

I don't have it, but if you dump VPD via sysfs you will see it comes out as 32k 
in size. The kernel just blindly provides access to the full 32K space provided 
by the spec. I'm sure that we agree that the kernel should not go parse it and 
find the actual size. If it is read via stdio, say fread, the read access would 
be whatever buffer size it chooses to use.

If you looked at the quirks, you might have noticed that Broadcom limited the 
VPD access for some devices for functional reasons. That is what gave me the 
idea for limiting access to what was possibly there. With the existing Intel 
Ethernet quirk, it seemed like a simple thing to do.

--
Mark Rustad, Networking Division, Intel Corporation



signature.asc
Description: Message signed with OpenPGP using GPGMail