Re: Soft lockup in tc_classify

2016-12-20 Thread Cong Wang
On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein  wrote:
>
> Tried it with same results

This piece is pretty interesting:

[  408.554689] DEBUGG:SK thread-2853[cpu-1] setting tp_created to 1
tp=94b5b02805a0 back=94b9ea932060
[  408.574258] DEBUGG:SK thread-2853[cpu-1] add/change filter by:
fl_get [cls_flower] tp=94b5b02805a0 tp->next=94b9ea932060
[  408.587849] DEBUGG:SK destroy 94b5b0280780 tcf_destroy:1905
[  408.595862] DEBUGG:SK thread-2845[cpu-1] add/change filter by:
fl_get [cls_flower] tp=94b5b02805a0 tp->next=94b5b02805a0

Looks like you added a debug printk inside tcf_destroy() too. It shows
a destroy that seems to race with filter creation, which should not
happen since we take the RTNL lock in both cases.

I don't know whether changing every RCU_INIT_POINTER in that file to
rcu_assign_pointer would help or not. Mind trying it?
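
For anyone following along, the difference between the two macros is
ordering: rcu_assign_pointer() publishes the pointer with release
semantics, while RCU_INIT_POINTER() is a plain store with no barrier.
Below is a rough userspace sketch of that distinction using C11 atomics.
The names struct tp_node, publish_init(), publish_assign() and
read_slot() are made up for illustration; this is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a classifier entry; not struct tcf_proto. */
struct tp_node {
	int prio;
	struct tp_node *_Atomic next;
};

/* Analogue of RCU_INIT_POINTER(): plain store, no ordering guarantee.
 * Only safe when readers cannot yet see the slot, or when storing NULL. */
static void publish_init(struct tp_node *_Atomic *slot, struct tp_node *n)
{
	atomic_store_explicit(slot, n, memory_order_relaxed);
}

/* Analogue of rcu_assign_pointer(): release store, so every earlier
 * write that initialised *n is visible before the pointer itself. */
static void publish_assign(struct tp_node *_Atomic *slot, struct tp_node *n)
{
	atomic_store_explicit(slot, n, memory_order_release);
}

/* Reader side, loosely analogous to rcu_dereference(). */
static struct tp_node *read_slot(struct tp_node *_Atomic *slot)
{
	return atomic_load_explicit(slot, memory_order_acquire);
}
```

With the relaxed variant, a lockless reader may observe the new pointer
before the fields of the node it points to; RTNL serialises writers but
does not help readers that only hold the RCU read lock.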


Thanks for debugging!


Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Eric Dumazet wrote:
> On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>>                   Pentium 4   Core 2 Duo   Ivy Bridge
>> SipHash-2-4            38.9          8.3          5.8
>> HalfSipHash-2-4        12.7          4.5          3.2
>> MD5                     8.3          5.7          4.7
>
> So definitely not faster.
> 
> 38 cycles per byte is a problem, considering IPV6 is ramping up.

As I said earlier, SipHash performance on 32-bit x86 really sucks,
because it wants an absolute minimum of 9 32-bit registers (8 for the
state plus one temporary for the rotates), and x86 has 7.

> What about SHA performance (syncookies) on P4 ?

I recompiled with -mtune=pentium4 and re-ran.  MD5 time went *up* by
0.3 cycles/byte, HalfSipHash went down by 1 cycle, and SipHash didn't
change:

Cycles per byte on 1024 bytes of data:
                  Pentium 4   Core 2 Duo   Ivy Bridge
SipHash-2-4            38.9          8.3          5.8
HalfSipHash-2-4        11.5          4.5          3.2
MD5                     8.6          5.7          4.7
SHA-1                  19.0          8.0          6.8

(This is with a verbatim copy of the lib/sha1.c code; I might be
able to optimize it with some asm hackery.)

Anyway, you see why we were looking longingly at HalfSipHash.


In fact, I have an idea.  Allow me to make the following concrete
suggestion for using HalfSipHash with 128 bits of key material:

- 64 bits are used as the key.
- The other 64 bits are used as an IV which is prepended to
  the message to be hashed.

As a matter of practical implementation, we precompute the effect
of hashing the IV and store the 128-bit HalfSipHash state, which
is used just like a 128-bit key.
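
A minimal sketch of that precomputation, to make the idea concrete. The
round function, rotation amounts and initialisation constants are written
from my recollection of the HalfSipHash reference code and should be
checked against it; the names hsh_state, hsh_absorb(), hsh_precompute()
and hsh_finish() are invented for this example, and only the 32-bit
output variant is shown.

```c
#include <assert.h>
#include <stdint.h>

struct hsh_state { uint32_t v0, v1, v2, v3; };	/* 128 bits of state */

static uint32_t rotl32(uint32_t x, unsigned r)
{
	return (x << r) | (x >> (32 - r));
}

/* One HalfSipHash round (constants assumed from the reference code). */
static void hsh_round(struct hsh_state *s)
{
	s->v0 += s->v1; s->v1 = rotl32(s->v1, 5);  s->v1 ^= s->v0;
	s->v0 = rotl32(s->v0, 16);
	s->v2 += s->v3; s->v3 = rotl32(s->v3, 8);  s->v3 ^= s->v2;
	s->v0 += s->v3; s->v3 = rotl32(s->v3, 7);  s->v3 ^= s->v0;
	s->v2 += s->v1; s->v1 = rotl32(s->v1, 13); s->v1 ^= s->v2;
	s->v2 = rotl32(s->v2, 16);
}

/* Absorb one 32-bit message word with 2 compression rounds. */
static void hsh_absorb(struct hsh_state *s, uint32_t m)
{
	s->v3 ^= m;
	hsh_round(s);
	hsh_round(s);
	s->v0 ^= m;
}

/* Standard keying from the 64-bit key (k0, k1). */
static void hsh_init(struct hsh_state *s, uint32_t k0, uint32_t k1)
{
	s->v0 = k0;
	s->v1 = k1;
	s->v2 = UINT32_C(0x6c796765) ^ k0;
	s->v3 = UINT32_C(0x74656462) ^ k1;
}

/* The proposal: hash the 64-bit IV once and keep the resulting state,
 * which then plays the role of a 128-bit key. */
static void hsh_precompute(struct hsh_state *s, const uint32_t key[2],
			   const uint32_t iv[2])
{
	hsh_init(s, key[0], key[1]);
	hsh_absorb(s, iv[0]);
	hsh_absorb(s, iv[1]);
}

/* Finish on a copy of the state: absorb the final word (which carries
 * the length byte), then 4 finalisation rounds. */
static uint32_t hsh_finish(struct hsh_state s, uint32_t last_word)
{
	hsh_absorb(&s, last_word);
	s.v2 ^= 0xff;
	for (int i = 0; i < 4; i++)
		hsh_round(&s);
	return s.v1 ^ s.v3;
}
```

Per-message hashing then starts from a copy of the stored state instead
of from hsh_init(), so the IV costs nothing on the hot path.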

Because of the way it is constructed, it is obviously no weaker than
standard HalfSipHash's 64-bit security claim.

I don't know the security of this, and it's almost certainly weaker than
128 bits, but I *hope* it's at least a few bits stronger than 64 bits.
80 would be enough to dissuade any attacker without a six-figure budget
(that's per attack, not a one-time capital investment).  96 would be
ample for our purposes.

What I do know is that it makes a brute-force attack without
significant cryptanalytic effort impossible.

To match the spec exactly, we'd need to add the 8-byte IV length to
the length byte which pads the final block, but from a security point
of view, it does not matter.  As long as we are consistent within any
single key, any unique mapping between padding byte and message length
(mod 256) is equally good.

We may choose based on implementation convenience.

(Also note my earlier comments about when it is okay to omit the padding
length byte entirely: any time all the data to be hashed with a given
key is fixed in format or self-delimiting (e.g. null-terminated).
This applies to many of the networking uses.)


Re: ipv6: handle -EFAULT from skb_copy_bits

2016-12-20 Thread Cong Wang
On Tue, Dec 20, 2016 at 2:12 PM, Dave Jones  wrote:
> fd = socket(AF_INET6, SOCK_RAW, 7);
>
> setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, , 4);
> setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, , LEN);
>

Interesting, you set the checksum offset to be 0, but the packet size
is actually 49 and the transport header is located at offset 48, so
the packet doesn't have room for a 16-bit checksum after the network header.
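
To spell out the arithmetic, here is a small standalone sketch of the
bounds check involved. csum_fits() is a hypothetical helper written for
this mail, not a function in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does a 16-bit checksum stored csum_off bytes
 * into the transport payload fit inside a packet of pkt_len bytes whose
 * transport header starts at transport_off? */
static bool csum_fits(size_t pkt_len, size_t transport_off, size_t csum_off)
{
	size_t payload;

	if (transport_off > pkt_len)
		return false;
	payload = pkt_len - transport_off;
	/* Need two whole bytes starting at csum_off. */
	return csum_off <= payload && payload - csum_off >= 2;
}
```

With the numbers from this report, csum_fits(49, 48, 0) is false because
only one byte follows the network header.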

Your original patch seems reasonable to me, unless there is some
check in __ip6_append_data() which is supposed to catch this, but
CHECKSUM is specific to raw socket only.


Re: HalfSipHash Acceptable Usage

2016-12-20 Thread Eric Dumazet
On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
> > I do not see why SipHash, if faster than MD5 and more secure, would be a
> > problem.
> 
> Because on 32-bit x86, it's slower.
> 
> Cycles per byte on 1024 bytes of data:
>                   Pentium 4   Core 2 Duo   Ivy Bridge
> SipHash-2-4            38.9          8.3          5.8
> HalfSipHash-2-4        12.7          4.5          3.2
> MD5                     8.3          5.7          4.7

So definitely not faster.

38 cycles per byte is a problem, considering IPV6 is ramping up.

But TCP session establishment on P4 is probably not a big deal.
Nobody would expect a P4 to handle gazillions of TCP flows (using a
32-bit kernel).

What about SHA performance (syncookies) on P4 ?

Synfloods are probably the only case we might take care of for 2000-era
cpus.







Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-20 Thread Alexei Starovoitov
On Tue, Dec 20, 2016 at 10:49:25AM -0800, Andy Lutomirski wrote:
> >> FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too.
> >> It doesn't make a semantic difference, except that I dislike
> >> BPF_PROG_DETACH because that particular command isn't BPF-specific at
> >> all.
> >
> > Well, I think it is; it pops the bpf program from a target and drops the
> > reference on it. It's not much code, but it's certainly bpf-specific.
> 
> I mean the interface isn't bpf-specific.  If there was something that
> wasn't bpf attached to the target, you'd still want an API to detach
> it.

This discussion won't go anywhere while you keep thinking that this api
has to be generalized. As I explained several times earlier
BPF_CGROUP_INET_SOCK_CREATE hook is bpf specific. There is nothing
in the kernel that can take advantage of it today, so by definition
the hook is bpf specific. Period. Saying that something in the future
may come along that would want to use that is like saying I want
to design the generic steering wheel for any car that will ever need it.

Hence if you want to change 'target_fd' in BPF_PROG_ATTACH/DETACH cmds
from being fd of open("cgroupdir") to fd of open("cgroupdir/cgroup.bpf")
file inside it then I'm ok with that.
All other proposals, with non-extensible ioctls() and crazy text-based
per-hook permissions, are a NACK.



Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.

Because on 32-bit x86, it's slower.

Cycles per byte on 1024 bytes of data:
                  Pentium 4   Core 2 Duo   Ivy Bridge
SipHash-2-4            38.9          8.3          5.8
HalfSipHash-2-4        12.7          4.5          3.2
MD5                     8.3          5.7          4.7

SipHash is more parallelizable and runs faster on superscalar processors,
but MD5 is optimized for 2000-era processors, and is faster on them than
HalfSipHash even.

Now, in the applications we care about, we're hashing short blocks, and
SipHash has the advantage that it can hash less than 64 bytes.  But it
also pays a penalty on short blocks for the finalization, equivalent to
two words (16 bytes) of input.

It turns out that on both Ivy Bridge and Core 2 Duo, the crossover happens
between 23 (SipHash is faster) and 24 (MD5 is faster) bytes of input.

This is assuming you're adding the 1 byte of length padding to SipHash's
input, so 24 bytes pads to 4 64-bit words, which makes 2*4+4 = 12 rounds,
vs. one block for MD5.  (MD5 takes a similar jump between 55 and 56 bytes.)
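
For reference, the round and block counts can be written out as below.
This is only a sketch of the arithmetic; the function names are mine,
and the MD5 count assumes the standard padding of one 0x80 byte plus an
8-byte length field.

```c
#include <assert.h>
#include <stddef.h>

/* SipHash-2-4: len data bytes plus 1 length byte, padded to 64-bit
 * words; 2 rounds per word plus 4 finalisation rounds. */
static unsigned siphash24_rounds(size_t len)
{
	size_t words = (len + 1 + 7) / 8;

	return (unsigned)(2 * words + 4);
}

/* MD5: len data bytes plus 1 marker byte plus 8 length bytes, padded
 * to 64-byte blocks. */
static unsigned md5_blocks(size_t len)
{
	return (unsigned)((len + 1 + 8 + 63) / 64);
}
```

So 23 bytes cost 10 SipHash rounds and 24 bytes cost 12, while MD5 stays
at one block up to 55 bytes and needs two from 56 bytes on.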

On a P4, SipHash is *never* faster; it takes 2.5x longer than MD5 on a
12-byte block (an IPv4 address/port pair).

This is why there was discussion of using HalfSipHash on these machines.
(On a P4, the HalfSipHash/MD5 crossover is somewhere between 24 and 31
bytes; I haven't benchmarked every possible size.)


Re: [PATCH] staging: octeon: Call SET_NETDEV_DEV()

2016-12-20 Thread David Miller
From: Florian Fainelli 
Date: Tue, 20 Dec 2016 17:02:37 -0800

> On 12/14/2016 05:13 PM, Florian Fainelli wrote:
>> The Octeon driver calls into PHYLIB which now checks for
>> net_device->dev.parent, so make sure we do set it before calling into
>> any MDIO/PHYLIB related function.
>> 
>> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a 
>> different owner")
>> Reported-by: Aaro Koskinen 
>> Signed-off-by: Florian Fainelli 
> 
> Greg, David, since this is a fix for a regression introduced in the net
> tree, it may make sense that David take it via his tree.

Since the change in question is in Linus's tree, it's equally valid
for Greg to take it as well.


Re: [PATCH net-next 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address

2016-12-20 Thread Gao Feng
On Wed, Dec 21, 2016 at 2:30 AM, David Miller  wrote:
> From: f...@ikuai8.com
> Date: Mon, 19 Dec 2016 09:24:05 +0800
>
>>  It is sent again because the first email is sent during net-next closing.
>
> It is still closed, and will not open again for at least one week.

Thanks David.
I thought it only lasted one week.

I will wait for it to reopen and then resend.

Regards
Feng




Re: [PATCH] phy: check if parent device is NULL

2016-12-20 Thread Ruslan Babayev
Yes, I saw that with the staging Octeon driver.
Your patch works for me too.

Thanks Florian!


On Tue, Dec 20, 2016 at 4:33 PM, Florian Fainelli  wrote:
> On 12/20/2016 03:51 PM, Ruslan Babayev wrote:
>> Fixes a crash observed on Octeon.
>>
>> Signed-off-by: Ruslan Babayev 
>> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a
>> different owner")
>
> Assuming you saw this with the staging Octeon driver, a fix has already
> been submitted:
>
> https://lkml.org/lkml/2016/12/14/756
>
> If this is with a different driver, I would rather we fix it in a
> way similar to the fix proposed above.
>
> Thanks
>
>> ---
>>  drivers/net/phy/phy_device.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 9c06f8028f0c..043328b85643 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -905,7 +905,8 @@ EXPORT_SYMBOL(phy_attached_print);
>>  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>> u32 flags, phy_interface_t interface)
>>  {
>> - struct module *ndev_owner = dev->dev.parent->driver->owner;
>> + struct device *parent = dev->dev.parent;
>> + struct module *ndev_owner = parent ? parent->driver->owner : NULL;
>>   struct mii_bus *bus = phydev->mdio.bus;
>>   struct device *d = >mdio.dev;
>>   int err;
>>
>
>
> --
> Florian


Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage

2016-12-20 Thread Mark Greer
On Tue, Dec 20, 2016 at 11:16:31AM -0500, Geoff Lansberry wrote:
> From: Geoff Lansberry 
> 
> The TRF7970A has configuration options for supporting hardware designs
> with 1.8 Volt or 3.3 Volt IO.   This commit adds a device tree option,
> using a fixed regulator binding, for setting the io voltage to match
> the hardware configuration. If no option is supplied it defaults to
> 3.3 volt configuration.

Sign-off ??  Same comment for your other patches.



Okay I see you have it at the end of the patch.  It should be here.
'git commit -s' is your friend.

> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt   |  4 ++--
>  drivers/nfc/trf7970a.c | 28 
> +-
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
> b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> index e262ac1..b5777d8 100644
> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> @@ -21,9 +21,9 @@ Optional SoC Specific Properties:
>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>where an extra byte is returned by Read Multiple Block commands issued
>to Type 5 tags.
> +- vdd-io-supply: Regulator specifying voltage for vdd-io
>  - clock-frequency: Set to specify that the input frequency to the trf7970a 
> is 1356Hz or 2712Hz
>  
> -
>  Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>  
>   {
> @@ -41,11 +41,11 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
> < 5 GPIO_ACTIVE_LOW>;
>   vin-supply = <_reg>;
>   vin-voltage-override = <500>;
> + vdd-io-supply = <_reg>;
>   autosuspend-delay = <3>;
>   irq-status-read-quirk;
>   en2-rf-quirk;
>   t5t-rmb-extra-byte-quirk;
> - vdd_io_1v8;

It was already mentioned but this shouldn't have been added in the
previous patch so it shouldn't be here now.

>   clock-frequency = <2712>;
>   status = "okay";
>   };
> diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
> index 4e051e9..8a88195 100644
> --- a/drivers/nfc/trf7970a.c
> +++ b/drivers/nfc/trf7970a.c

> @@ -2062,6 +2068,7 @@ static int trf7970a_probe(struct spi_device *spi)
>   return ret;
>   }
>  
> +

Please don't add an extra blank line.

>   of_property_read_u32(np, "clock-frequency", _freq);
>   if ((clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY) ||
>   (clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY)) {
> @@ -2105,6 +2112,25 @@ static int trf7970a_probe(struct spi_device *spi)
>   if (uvolts > 400)
>   trf->chip_status_ctrl = TRF7970A_CHIP_STATUS_VRS5_3;
>  
> + trf->regulator = devm_regulator_get(>dev, "vdd-io");
> + if (IS_ERR(trf->regulator)) {
> + ret = PTR_ERR(trf->regulator);
> + dev_err(trf->dev, "Can't get VDD_IO regulator: %d\n", ret);
> + goto err_destroy_lock;
> + }
> +
> + ret = regulator_enable(trf->regulator);
> + if (ret) {
> + dev_err(trf->dev, "Can't enable VDD_IO: %d\n", ret);
> + goto err_destroy_lock;
> + }
> +
> +

Please don't add an extra blank line.

> + if (regulator_get_voltage(trf->regulator) == 180) {
> + trf->io_ctrl = TRF7970A_REG_IO_CTRL_IO_LOW;
> + dev_dbg(trf->dev, "trf7970a config vdd_io to 1.8V\n");
> + }
> +
>   trf->ddev = nfc_digital_allocate_device(_nfc_ops,
>   TRF7970A_SUPPORTED_PROTOCOLS,
>   NFC_DIGITAL_DRV_CAPS_IN_CRC |
> -- 
> Signed-off-by: Geoff Lansberry 

Your 'Signed-off-by:' goes at the end of the commit description not here.

Overall, I think you did the right thing (unless someone disagrees).
Just some minor issues.

Mark
--


Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf

2016-12-20 Thread Alexander Duyck
I find it curious that only the last 4 bytes have data in them.  I'm
wondering if the NIC/driver in the Windows/Nessus system is
interpreting the 4 byte CRC on the end of the frame as padding instead
of stripping it.

Is there any chance you could capture the entire frame instead of just
the padding?  Maybe you could run something like wireshark without
enabling promiscuous mode on the VF and capture the frames it is
trying to send and receive.  What I want to verify is what the actual
amount of padding is that is needed to get to 60 bytes and where the
CRC should start.
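
As a point of comparison, this is roughly what padding to 60 bytes means
in terms of the buffer. It is a userspace sketch only: pad_frame() is a
made-up analogue of what eth_skb_pad() does for an skb, and
MIN_FRAME_LEN mirrors ETH_ZLEN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_LEN 60	/* same value as ETH_ZLEN: minimum frame, no FCS */

/* Zero-fill a short frame up to MIN_FRAME_LEN so that no stale buffer
 * contents go out as padding. Returns the new length, or 0 if the
 * buffer is too small to hold a minimum-size frame. */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	if (len >= MIN_FRAME_LEN)
		return len;
	if (cap < MIN_FRAME_LEN)
		return 0;
	memset(buf + len, 0, MIN_FRAME_LEN - len);
	return MIN_FRAME_LEN;
}
```

A 43-byte frame would get 17 bytes of padding this way, and a 4-byte CRC
would start at offset 60.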

- Alex

On Tue, Dec 20, 2016 at 5:40 PM, Weilong Chen  wrote:
> Thanks for your explanation, it's very professional.
>
> My test is like this:
> Nessus is deployed on a Windows server; the peer is an x86_64 Linux host
> that runs several VMs. The NIC is an Intel 82599 with SR-IOV enabled.
> VFs are passed through to the VMs. No DPDK.
>
> The Nessus server sends small ICMP echo request packets to the VM,
> checks the reply, and reports the error:
>
> "11197 - Multiple Ethernet Driver Frame Padding Information Disclosure
> (Etherleak)"
>
> "Padding observed in one frame :
>
> 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 57 37 28 .W7(
> 0x10: 76 v
>
> Padding observed in another frame :
>
> 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 D3 4D 75 ..Mu
> 0x10: 28 ("
>
> I only have Nessus's windows version, so can't test on linux. Maybe the
> windows server does not pad small packets to 60 bytes on the receive path.
>
>
> On 2016/12/21 0:36, Alexander Duyck wrote:
>>
>> The limit of 17 is just based on the hardware.  Specifically the
>> olinfo field in the Tx descriptor has a minimum length of 17 as a
>> requirement.  The hardware itself is supposed to be capable of padding
>> short frames that are supposed to be transmitted.  The drivers are
>> supposed to pad short frames on receive to get them up to 60 bytes.
>>
>> When you are seeing this issue are you sending frames from the VF to
>> one of the local interfaces on the same port or to an external
>> interface?  Also are you receiving on another linux ixgbevf driver or
>> are you receiving the packet using a different driver interface such
>> as DPDK?  I'm just wanting to verify this as it is possible that the
>> memory leak you are seeing is on the receiver and not on the source if
>> you are transmitting to a local VF or the PF as the receiver will have
>> to pad the frame in such a case to get it up to 60 bytes.
>>
>> - Alex
>>
>> On Tue, Dec 20, 2016 at 3:50 AM, Weilong Chen 
>> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for your reply.
>>> We tested your patch, but the problem is still there; it does not seem to work.
>>>
>>> I'm not sure why ixgbe uses the limit 17. The kernel uses ETH_ZLEN (60)
>>> without FCS. A lot of drivers such as e1000 use it. Any explanation?
>>>
>>> Thanks.
>>>
>>>
>>> On 2016/12/16 0:13, Alexander Duyck wrote:


 On Thu, Dec 15, 2016 at 3:40 AM, Weilong Chen 
 wrote:
>
>
> Nessus report the vf appears to leak memory in network packets.
> Fix this by padding all small packets manually.
>
> And the CVE-2003-0001.
>
>
> https://ofirarkin.files.wordpress.com/2008/11/atstake_etherleak_report.pdf
>
> Signed-off-by: Weilong Chen 
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 6d4bef5..137a154 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3654,6 +3654,13 @@ static int ixgbevf_xmit_frame(struct sk_buff
> *skb,
> struct net_device *netdev)
> return NETDEV_TX_OK;
> }
>
> +   /* On PCI/PCI-X HW, if packet size is less than ETH_ZLEN,
> +* packets may get corrupted during padding by HW.
> +* To WA this issue, pad all small packets manually.
> +*/
> +   if (eth_skb_pad(skb))
> +   return NETDEV_TX_OK;
> +



 So the patch description for this probably isn't correct.  It looks
 like the problem isn't leaking data it is the fact that the frames
 aren't being padded to prevent malicious events.  The only issue is
 the patch is padding by a bit too much.  I would recommend replacing
 this with the following from ixgbe:

 /*
  * The minimum packet size for olinfo paylen is 17 so pad the
 skb
  * in order to meet this minimum size requirement.
  */
 if (skb_put_padto(skb, 17))
 return NETDEV_TX_OK;


> tx_ring = adapter->tx_ring[skb->queue_mapping];

Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage

2016-12-20 Thread Mark Greer
On Tue, Dec 20, 2016 at 11:13:23AM -0500, Geoff Lansberry wrote:
> On Mon, Dec 19, 2016 at 5:35 PM, Rob Herring  wrote:
> > On Thu, Dec 15, 2016 at 05:30:43PM -0500, Geoff Lansberry wrote:
> >> From: Geoff Lansberry 
> >>
> >> ---
> >>  Documentation/devicetree/bindings/net/nfc/trf7970a.txt |  2 ++
> >>  drivers/nfc/trf7970a.c | 13 -
> >>  2 files changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
> >> b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> >> index 9dda879..208f045 100644
> >> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> >> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> >> @@ -21,6 +21,7 @@ Optional SoC Specific Properties:
> >>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
> >>where an extra byte is returned by Read Multiple Block commands issued
> >>to Type 5 tags.
> >> +- vdd_io_1v8: Set to specify that the trf7970a io voltage should be set 
> >> to 1.8V
> >
> > Use the regulator binding and provide a fixed 1.8V supply.
> >
> >>  - crystal_27mhz: Set to specify that the input frequency to the trf7970a 
> >> is 27.12MHz
> >>
> >>
> >> @@ -45,6 +46,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
> >>   irq-status-read-quirk;
> >>   en2-rf-quirk;
> >>   t5t-rmb-extra-byte-quirk;
> >> + vdd_io_1v8;
> >>   crystal_27mhz;
> >>   status = "okay";
> >>   };
> 
> Rob - using the regulator binding is new to me, but I've given it a
> shot and just sent you another set of patches for your inspection.
> Please let me know if this is what you had in mind.

This is my bad.  Geoff followed my example and did something similar to
'vin-voltage-override' which shouldn't have been there in the first place.
I have this fixed (I think) locally and will submit once I'm back from
my holiday travels.

Mark
--


Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf

2016-12-20 Thread Weilong Chen

Thanks for your explanation, it's very professional.

My test is like this:
Nessus is deployed on a Windows server; the peer is an x86_64 Linux
host that runs several VMs. The NIC is an Intel 82599 with SR-IOV
enabled. VFs are passed through to the VMs. No DPDK.


The Nessus server sends small ICMP echo request packets to the VM,
checks the reply, and reports the error:

"11197 - Multiple Ethernet Driver Frame Padding Information Disclosure 
(Etherleak)"


"Padding observed in one frame :

0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 57 37 28 .W7(
0x10: 76 v

Padding observed in another frame :

0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 D3 4D 75 ..Mu
0x10: 28 ("

I only have Nessus's windows version, so can't test on linux. Maybe the 
windows server does not pad small packets to 60 bytes on the receive path.


On 2016/12/21 0:36, Alexander Duyck wrote:

The limit of 17 is just based on the hardware.  Specifically the
olinfo field in the Tx descriptor has a minimum length of 17 as a
requirement.  The hardware itself is supposed to be capable of padding
short frames that are supposed to be transmitted.  The drivers are
supposed to pad short frames on receive to get them up to 60 bytes.

When you are seeing this issue are you sending frames from the VF to
one of the local interfaces on the same port or to an external
interface?  Also are you receiving on another linux ixgbevf driver or
are you receiving the packet using a different driver interface such
as DPDK?  I'm just wanting to verify this as it is possible that the
memory leak you are seeing is on the receiver and not on the source if
you are transmitting to a local VF or the PF as the receiver will have
to pad the frame in such a case to get it up to 60 bytes.

- Alex

On Tue, Dec 20, 2016 at 3:50 AM, Weilong Chen  wrote:

Hi,

Thanks for your reply.
We tested your patch, but the problem is still there; it does not seem to work.

I'm not sure why ixgbe uses the limit 17. The kernel uses ETH_ZLEN (60)
without FCS. A lot of drivers such as e1000 use it. Any explanation?

Thanks.


On 2016/12/16 0:13, Alexander Duyck wrote:


On Thu, Dec 15, 2016 at 3:40 AM, Weilong Chen 
wrote:


Nessus report the vf appears to leak memory in network packets.
Fix this by padding all small packets manually.

And the CVE-2003-0001.

https://ofirarkin.files.wordpress.com/2008/11/atstake_etherleak_report.pdf

Signed-off-by: Weilong Chen 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 6d4bef5..137a154 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3654,6 +3654,13 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb,
struct net_device *netdev)
return NETDEV_TX_OK;
}

+   /* On PCI/PCI-X HW, if packet size is less than ETH_ZLEN,
+* packets may get corrupted during padding by HW.
+* To WA this issue, pad all small packets manually.
+*/
+   if (eth_skb_pad(skb))
+   return NETDEV_TX_OK;
+



So the patch description for this probably isn't correct.  It looks
like the problem isn't leaking data it is the fact that the frames
aren't being padded to prevent malicious events.  The only issue is
the patch is padding by a bit too much.  I would recommend replacing
this with the following from ixgbe:

/*
 * The minimum packet size for olinfo paylen is 17 so pad the skb
 * in order to meet this minimum size requirement.
 */
if (skb_put_padto(skb, 17))
return NETDEV_TX_OK;



tx_ring = adapter->tx_ring[skb->queue_mapping];

/* need: 1 descriptor per page *
PAGE_SIZE/IXGBE_MAX_DATA_PER_TXD,
--
1.7.12








Re: [PATCH] staging: octeon: Call SET_NETDEV_DEV()

2016-12-20 Thread Florian Fainelli
On 12/14/2016 05:13 PM, Florian Fainelli wrote:
> The Octeon driver calls into PHYLIB which now checks for
> net_device->dev.parent, so make sure we do set it before calling into
> any MDIO/PHYLIB related function.
> 
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a 
> different owner")
> Reported-by: Aaro Koskinen 
> Signed-off-by: Florian Fainelli 

Greg, David, since this is a fix for a regression introduced in the net
tree, it may make sense that David take it via his tree.

Thanks

> ---
>  drivers/staging/octeon/ethernet.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/staging/octeon/ethernet.c 
> b/drivers/staging/octeon/ethernet.c
> index 8130dfe89745..4971aa54756a 100644
> --- a/drivers/staging/octeon/ethernet.c
> +++ b/drivers/staging/octeon/ethernet.c
> @@ -770,6 +770,7 @@ static int cvm_oct_probe(struct platform_device *pdev)
>   /* Initialize the device private structure. */
>   struct octeon_ethernet *priv = netdev_priv(dev);
>  
> + SET_NETDEV_DEV(dev, >dev);
>   dev->netdev_ops = _oct_pow_netdev_ops;
>   priv->imode = CVMX_HELPER_INTERFACE_MODE_DISABLED;
>   priv->port = CVMX_PIP_NUM_INPUT_PORTS;
> @@ -816,6 +817,7 @@ static int cvm_oct_probe(struct platform_device *pdev)
>   }
>  
>   /* Initialize the device private structure. */
> + SET_NETDEV_DEV(dev, >dev);
>   priv = netdev_priv(dev);
>   priv->netdev = dev;
>   priv->of_node = cvm_oct_node_for_port(pip, interface,
> 


-- 
Florian


Re: [PATCH] phy: check if parent device is NULL

2016-12-20 Thread Florian Fainelli
On 12/20/2016 03:51 PM, Ruslan Babayev wrote:
> Fixes a crash observed on Octeon.
> 
> Signed-off-by: Ruslan Babayev 
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a
> different owner")

Assuming you saw this with the staging Octeon driver, a fix has already
been submitted:

https://lkml.org/lkml/2016/12/14/756

If this is with a different driver, I would rather we fix it in a
way similar to the fix proposed above.

Thanks

> ---
>  drivers/net/phy/phy_device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 9c06f8028f0c..043328b85643 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -905,7 +905,8 @@ EXPORT_SYMBOL(phy_attached_print);
>  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> u32 flags, phy_interface_t interface)
>  {
> - struct module *ndev_owner = dev->dev.parent->driver->owner;
> + struct device *parent = dev->dev.parent;
> + struct module *ndev_owner = parent ? parent->driver->owner : NULL;
>   struct mii_bus *bus = phydev->mdio.bus;
>   struct device *d = >mdio.dev;
>   int err;
> 


-- 
Florian


Re: [PATCH net-next 00/10] netcp: enhancements and minor fixes

2016-12-20 Thread David Miller

The net-next tree is not open, do not resubmit this series until it
is open again.

Thanks.


Re: HalfSipHash Acceptable Usage

2016-12-20 Thread Eric Dumazet
On Tue, 2016-12-20 at 16:36 -0500, Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
> > 1) Anything that requires actual long-term security will use
> > SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> > things like TCP sequence numbers. This seems pretty uncontroversial to
> > me. Seem okay to you?
> 
> Um, why do TCP sequence numbers need long-term security?  So long as
> you rekey every 5 minutes or so, TCP sequence numbers don't need any
> more security than that, since even if you break the key used to
> generate initial sequence numbers even a minute or two later, any
> pending TCP connections will have timed out long before.
> 
> See the security analysis done in RFC 6528[1], where among other
> things, it points out why MD5 is acceptable with periodic rekeying,
> although there is the concern that this could break certain heuristics
> used when establishing new connections during the TIME-WAIT state.
> 
> [1] https://tools.ietf.org/html/rfc6528


We no longer rekey the TCP ISN, not since commit
6e5714eaf77d79ae1 (where we switched from MD4 to MD5).

It might hurt some common cases and I do not believe it is mandated by a
current (ie not obsolete) RFC.

Our clock has a 64 ns resolution and 274 second period (commit
9b42c336d0641) (compared to 4 usec one in RFC 6528)
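
The 274 second figure follows directly from a 32-bit counter ticking
every 64 ns. A tiny sketch of that arithmetic (isn_clock_period_ns() is
just a name chosen for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-around period, in nanoseconds, of a 32-bit counter that
 * advances once every tick_ns nanoseconds. */
static uint64_t isn_clock_period_ns(uint64_t tick_ns)
{
	return tick_ns << 32;	/* tick_ns * 2^32 */
}
```

For tick_ns = 64 this gives 274877906944 ns, i.e. about 274.9 seconds;
a 4 usec tick gives roughly 17180 seconds.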

I do not see why SipHash, if faster than MD5 and more secure, would be a
problem.

Same for syncookies.

BTW, we probably should add a ratelimit on SYNACK retransmits,
because it seems that attackers have understood that Linux kernels
resist synfloods, and they (the bad guys) use reflection attacks.





[PATCH] phy: check if parent device is NULL

2016-12-20 Thread Ruslan Babayev
Fixes a crash observed on Octeon.

Signed-off-by: Ruslan Babayev 
Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a
different owner")
---
 drivers/net/phy/phy_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 9c06f8028f0c..043328b85643 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -905,7 +905,8 @@ EXPORT_SYMBOL(phy_attached_print);
 int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
  u32 flags, phy_interface_t interface)
 {
-   struct module *ndev_owner = dev->dev.parent->driver->owner;
+   struct device *parent = dev->dev.parent;
+   struct module *ndev_owner = parent ? parent->driver->owner : NULL;
struct mii_bus *bus = phydev->mdio.bus;
struct device *d = >mdio.dev;
int err;
-- 
2.7.4
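
For readers without the tree at hand, the pattern in the patch boils
down to the following. The struct definitions are cut-down stand-ins
invented for this sketch, not the kernel's, and owner_of_parent() is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures (illustration only). */
struct module { const char *name; };
struct device_driver { struct module *owner; };
struct device {
	struct device *parent;
	struct device_driver *driver;
};

/* Return the owner of the parent's driver, or NULL when the device has
 * no parent. Like the patch, this still assumes that a parent, when
 * present, has a driver bound to it. */
static struct module *owner_of_parent(const struct device *dev)
{
	const struct device *parent = dev->parent;

	return parent ? parent->driver->owner : NULL;
}
```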


Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial to
>> me. Seem okay to you?

> Um, why do TCP sequence numbers need long-term security?  So long as
> you rekey every 5 minutes or so, TCP sequence numbers don't need any
> more security than that, since even if you break the key used to
> generate initial sequence numbers even a minute or two later, any
> pending TCP connections will have timed out long before.
> 
> See the security analysis done in RFC 6528[1], where among other
> things, it points out why MD5 is acceptable with periodic rekeying,
> although there is the concern that this could break certain heuristics
> used when establishing new connections during the TIME-WAIT state.

Because we don't rekey TCP sequence numbers, ever.  See commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec

To rekey them requires dividing the sequence number base into a "random"
part and some "generation" msbits.  While we can do better than the
previous 8+24 split (I'd suggest 4+28 or 3+29), only 2 is tricky, and
1 generation bit isn't enough.

So while it helps in the long term, it reduces the security offered by
the random part in the short term.  (If I know 4 bits of your ISN,
I only need to send 256 MB to hit your TCP window.)

At the time, I objected, and suggested doing two hashes, with a fixed
32-bit base plus a split rekeyed portion, but that was vetoed on the
grounds of performance.

On further consideration, the fixed base doesn't help much.
(Details below for anyone that cares.)



Suppose we let the TCP initial sequence number be:

(Hash(<tuple>, fixed_key) & 0xffffffff) +
(i << 28) + (Hash(<tuple>, key[i]) & 0x0fffffff) +
(current_time_in_nanoseconds / 64)

It's not hugely difficult to mount an effective attack against a
64-bit fixed_key.

As an attacker, I can ask the target to send me these numbers for dstPort
values I control and other values I know.  I can (with high probability)
detect the large jumps when the generation changes, so I can make a
significant number of queries with the same generation.  After 23-ish
queries, I have enough information to identify a 64-bit fixed_key.

I don't know the current generation counter "i", but I know it's the
same for all my queries, so for any two queries, the maximum difference
between the 28-bit hash values is 29 bits.  (We can also add a small
margin to allow for timing uncertainty, but that's even less.)

So if I guess a fixed key, hash my known plaintexts with that guess,
subtract the ciphertexts from the observed sequence numbers, and the
difference between the remaining (unknown) 28-bit hash values plus
timestamps exceeds what's possible, my guess is wrong.

I can then repeat with additional known plaintexts, reducing the space
of admissible keys by about 3 bits each time.

Assuming I can rent GPU horsepower from a bitcoin miner to do this in a
reasonable period of time, after 22 known plaintext differences, I have
uniquely identified the key.

Of course, what I'd do in practice is a first pass with maybe 6 plaintexts
on the GPU, and then deal with the candidates found in a second pass.
But either way, it's about 2.3 SipHash evaluations per key tested.
As I noted earlier, a bitcoin blockchain block, worth 25 bitcoins,
currently costs 2^71 evaluations of SHA-2 (2^70 evaluations of double
SHA-2), and that's accomplished every 10 minutes, so this is definitely
practical.
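
The construction and the back-of-the-envelope numbers above can be
sketched in C.  This is a minimal illustration, not code from any kernel
tree: isn_compose(), plaintexts_needed() and bytes_to_cover() are
hypothetical names, the 4+28 split is assumed, and the two keyed hash
outputs are passed in by the caller rather than computed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine the pieces of the hypothetical ISN described above.
 * h_fixed and h_gen are the outputs of the two keyed hashes; the hash
 * function itself is out of scope for this sketch.
 */
static uint32_t isn_compose(uint64_t h_fixed, uint64_t h_gen,
			    unsigned int gen, uint64_t now_ns)
{
	assert(gen < 16);	/* 4 generation bits in a 4+28 split */

	return (uint32_t)(h_fixed & 0xffffffffu) +
	       ((uint32_t)gen << 28) +
	       (uint32_t)(h_gen & 0x0fffffffu) +
	       (uint32_t)(now_ns / 64);
}

/* Known plaintexts needed if each removes bits_each bits of key space. */
static unsigned int plaintexts_needed(unsigned int key_bits,
				      unsigned int bits_each)
{
	assert(bits_each > 0);
	return (key_bits + bits_each - 1) / bits_each;	/* round up */
}

/* Sequence space left to cover when known_bits of a 32-bit ISN are known. */
static uint64_t bytes_to_cover(unsigned int known_bits)
{
	assert(known_bits <= 32);
	return UINT64_C(1) << (32 - known_bits);
}
```

With these helpers, plaintexts_needed(64, 3) gives 22, matching the 22
known plaintext differences estimated above, and bytes_to_cover(4) gives
2^28 bytes, i.e. the 256 MB figure.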


[PATCH] net: qcom/emac: add ethtool support

2016-12-20 Thread Timur Tabi
Add support for some ethtool methods: get/set link settings, get/set
message level, get statistics, get link status, and restart
autonegotiation.

Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/Makefile   |   2 +-
 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 156 ++
 drivers/net/ethernet/qualcomm/emac/emac.c |  51 ---
 drivers/net/ethernet/qualcomm/emac/emac.h |   3 +
 4 files changed, 191 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c

diff --git a/drivers/net/ethernet/qualcomm/emac/Makefile 
b/drivers/net/ethernet/qualcomm/emac/Makefile
index 7a66879..fc57ced 100644
--- a/drivers/net/ethernet/qualcomm/emac/Makefile
+++ b/drivers/net/ethernet/qualcomm/emac/Makefile
@@ -4,6 +4,6 @@
 
 obj-$(CONFIG_QCOM_EMAC) += qcom-emac.o
 
-qcom-emac-objs := emac.o emac-mac.o emac-phy.o emac-sgmii.o \
+qcom-emac-objs := emac.o emac-mac.o emac-phy.o emac-sgmii.o emac-ethtool.o \
  emac-sgmii-fsm9900.o emac-sgmii-qdf2432.o \
  emac-sgmii-qdf2400.o
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c 
b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
new file mode 100644
index 0000000..6de5152
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
@@ -0,0 +1,156 @@
+/* Copyright (c) 2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+#include "emac.h"
+
+static const char * const emac_ethtool_stat_strings[] = {
+   "rx_ok",
+   "rx_bcast",
+   "rx_mcast",
+   "rx_pause",
+   "rx_ctrl",
+   "rx_fcs_err",
+   "rx_len_err",
+   "rx_byte_cnt",
+   "rx_runt",
+   "rx_frag",
+   "rx_sz_64",
+   "rx_sz_65_127",
+   "rx_sz_128_255",
+   "rx_sz_256_511",
+   "rx_sz_512_1023",
+   "rx_sz_1024_1518",
+   "rx_sz_1519_max",
+   "rx_sz_ov",
+   "rx_rxf_ov",
+   "rx_align_err",
+   "rx_bcast_byte_cnt",
+   "rx_mcast_byte_cnt",
+   "rx_err_addr",
+   "rx_crc_align",
+   "rx_jabbers",
+   "tx_ok",
+   "tx_bcast",
+   "tx_mcast",
+   "tx_pause",
+   "tx_exc_defer",
+   "tx_ctrl",
+   "tx_defer",
+   "tx_byte_cnt",
+   "tx_sz_64",
+   "tx_sz_65_127",
+   "tx_sz_128_255",
+   "tx_sz_256_511",
+   "tx_sz_512_1023",
+   "tx_sz_1024_1518",
+   "tx_sz_1519_max",
+   "tx_1_col",
+   "tx_2_col",
+   "tx_late_col",
+   "tx_abort_col",
+   "tx_underrun",
+   "tx_rd_eop",
+   "tx_len_err",
+   "tx_trunc",
+   "tx_bcast_byte",
+   "tx_mcast_byte",
+   "tx_col",
+};
+
+#define EMAC_STATS_LEN ARRAY_SIZE(emac_ethtool_stat_strings)
+
+static u32 emac_get_msglevel(struct net_device *netdev)
+{
+   struct emac_adapter *adpt = netdev_priv(netdev);
+
+   return adpt->msg_enable;
+}
+
+static void emac_set_msglevel(struct net_device *netdev, u32 data)
+{
+   struct emac_adapter *adpt = netdev_priv(netdev);
+
+   adpt->msg_enable = data;
+}
+
+static int emac_get_sset_count(struct net_device *netdev, int sset)
+{
+   switch (sset) {
+   case ETH_SS_STATS:
+   return EMAC_STATS_LEN;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static void emac_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
+{
+   unsigned int i;
+
+   switch (stringset) {
+   case ETH_SS_STATS:
+   for (i = 0; i < EMAC_STATS_LEN; i++) {
+   strlcpy(data, emac_ethtool_stat_strings[i],
+   ETH_GSTRING_LEN);
+   data += ETH_GSTRING_LEN;
+   }
+   break;
+   }
+}
+
+static void emac_get_ethtool_stats(struct net_device *netdev,
+  struct ethtool_stats *stats,
+  u64 *data)
+{
+   struct emac_adapter *adpt = netdev_priv(netdev);
+
+   spin_lock(&adpt->stats.lock);
+
+   emac_update_hw_stats(adpt);
+   memcpy(data, &adpt->stats, EMAC_STATS_LEN * sizeof(u64));
+
+   spin_unlock(&adpt->stats.lock);
+}
+
+static int emac_nway_reset(struct net_device *netdev)
+{
+   struct phy_device *phydev = netdev->phydev;
+
+   if (!phydev)
+   return -ENODEV;
+
+   return genphy_restart_aneg(phydev);
+}
+
+static const struct ethtool_ops emac_ethtool_ops = {
+   .get_link_ksettings = phy_ethtool_get_link_ksettings,
+   .set_link_ksettings = 

[PATCH v5] net: dummy: Introduce dummy virtual functions

2016-12-20 Thread Phil Sutter
The idea for this was born when testing VF support in iproute2 which was
impeded by hardware requirements. In fact, not every VF-capable hardware
driver implements all netdev ops, so testing the interface is still hard
to do even with a well-sorted hardware shelf.

To overcome this and allow for testing the user-kernel interface, this
patch allows turning dummy into a PF with a configurable number of VFs.

Due to the assumption that all PFs are PCI devices, this implementation
is not completely straightforward: In order to allow for
rtnl_fill_ifinfo() to see the dummy VFs, a fake PCI parent device is
attached to the dummy netdev. This has to happen at the right spot so
register_netdevice() does not get confused. This patch abuses the
ndo_fix_features callback for that. In the ndo_uninit callback, the fake
parent is removed again for the same purpose.

Joint work with Sabrina Dubroca.

Signed-off-by: Sabrina Dubroca 
Signed-off-by: Phil Sutter 
---
Changes since v4:
- Initialize pci_pdev.sriov at runtime - older gcc versions don't allow
  initializing fields of anonymous unions at declaration time.
- Rebased onto current net-next/master.
  
Changes since v3:
- Changed type of vf_mac field from unsigned char to u8.
- Column-aligned structs' field names.

Changes since v2:
- Fixed oops on reboot (need to initialize parent device mutex).
- Got rid of potential mem leak noticed by Eric Dumazet.
- Dropped stray newline insertion.

Changes since v1:
- Fixed issues reported by kbuild test robot:
  - pci_dev->sriov is only present if CONFIG_PCI_ATS is active.
  - pci_bus_type does not exist if CONFIG_PCI is not defined.
---
 drivers/net/dummy.c | 205 +++-
 1 file changed, 203 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 6421835f11b7e..7f8d8598bbbfe 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -34,6 +34,8 @@
 #include 
 #include 
 #include 
+#include 
+#include "../pci/pci.h" /* for struct pci_sriov */
 #include 
 #include 
 #include 
@@ -42,6 +44,34 @@
 #define DRV_VERSION "1.0"
 
 static int numdummies = 1;
+static int num_vfs;
+
+static struct pci_sriov pdev_sriov;
+
+static struct pci_dev pci_pdev = {
+   .is_physfn = 0,
+#ifdef CONFIG_PCI
+   .dev.bus = &pci_bus_type,
+#endif
+};
+
+struct vf_data_storage {
+   u8  vf_mac[ETH_ALEN];
+   u16 pf_vlan; /* When set, guest VLAN config not allowed. */
+   u16 pf_qos;
+   __be16  vlan_proto;
+   u16 min_tx_rate;
+   u16 max_tx_rate;
+   u8  spoofchk_enabled;
+   bool rss_query_enabled;
+   u8  trusted;
+   int link_state;
+};
+
+struct dummy_priv {
+   int num_vfs;
+   struct vf_data_storage  *vfinfo;
+};
 
 /* fake multicast ability */
 static void set_multicast_list(struct net_device *dev)
@@ -91,15 +121,31 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct 
net_device *dev)
 
 static int dummy_dev_init(struct net_device *dev)
 {
+   struct dummy_priv *priv = netdev_priv(dev);
+
dev->dstats = netdev_alloc_pcpu_stats(struct pcpu_dstats);
if (!dev->dstats)
return -ENOMEM;
 
+   priv->num_vfs = num_vfs;
+   priv->vfinfo = NULL;
+
+   if (!num_vfs)
+   return 0;
+
+   priv->vfinfo = kcalloc(num_vfs, sizeof(struct vf_data_storage),
+  GFP_KERNEL);
+   if (!priv->vfinfo) {
+   free_percpu(dev->dstats);
+   return -ENOMEM;
+   }
+
return 0;
 }
 
 static void dummy_dev_uninit(struct net_device *dev)
 {
+   dev->dev.parent = NULL;
free_percpu(dev->dstats);
 }
 
@@ -112,6 +158,137 @@ static int dummy_change_carrier(struct net_device *dev, 
bool new_carrier)
return 0;
 }
 
+/* fake, just to set fake PCI parent after netdev_register_kobject() */
+static netdev_features_t dummy_fix_features(struct net_device *dev,
+   netdev_features_t features)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if (priv->num_vfs) {
+#ifdef CONFIG_PCI_ATS
+   pci_pdev.sriov = &pdev_sriov;
+#endif
+   dev->dev.parent = &pci_pdev.dev;
+   if (!pci_pdev.is_physfn) {
+   mutex_init(&pci_pdev.dev.mutex);
+   pci_pdev.is_physfn = 1;
+   }
+   }
+
+   return features;
+}
+
+static int dummy_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if (!is_valid_ether_addr(mac) || (vf >= priv->num_vfs))
+   return -EINVAL;
+
+   memcpy(priv->vfinfo[vf].vf_mac, mac, ETH_ALEN);
+
+   return 0;
+}
+
+static int dummy_set_vf_vlan(struct net_device *dev, int vf,
+u16 vlan, u8 qos, __be16 vlan_proto)
+{
+   struct dummy_priv *priv = 

[PATCH 2/2] net: sfc: falcon: use new api ethtool_{get|set}_link_ksettings

2016-12-20 Thread Philippe Reynes
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/sfc/falcon/efx.c  |2 +-
 drivers/net/ethernet/sfc/falcon/ethtool.c  |   35 ---
 drivers/net/ethernet/sfc/falcon/mdio_10g.c |   44 +++-
 drivers/net/ethernet/sfc/falcon/mdio_10g.h |3 +-
 drivers/net/ethernet/sfc/falcon/net_driver.h   |   12 +++---
 drivers/net/ethernet/sfc/falcon/qt202x_phy.c   |9 +++--
 drivers/net/ethernet/sfc/falcon/tenxpress.c|   22 ++--
 drivers/net/ethernet/sfc/falcon/txc43128_phy.c |9 +++--
 8 files changed, 80 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/sfc/falcon/efx.c 
b/drivers/net/ethernet/sfc/falcon/efx.c
index 5c5cb3c..438ef9e 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.c
+++ b/drivers/net/ethernet/sfc/falcon/efx.c
@@ -986,7 +986,7 @@ void ef4_mac_reconfigure(struct ef4_nic *efx)
 
 /* Push loopback/power/transmit disable settings to the PHY, and reconfigure
  * the MAC appropriately. All other PHY configuration changes are pushed
- * through phy_op->set_settings(), and pushed asynchronously to the MAC
+ * through phy_op->set_link_ksettings(), and pushed asynchronously to the MAC
  * through ef4_monitor().
  *
  * Callers must hold the mac_lock
diff --git a/drivers/net/ethernet/sfc/falcon/ethtool.c 
b/drivers/net/ethernet/sfc/falcon/ethtool.c
index 8e1929b..659ece7 100644
--- a/drivers/net/ethernet/sfc/falcon/ethtool.c
+++ b/drivers/net/ethernet/sfc/falcon/ethtool.c
@@ -115,44 +115,53 @@ static int ef4_ethtool_phys_id(struct net_device *net_dev,
 }
 
 /* This must be called with rtnl_lock held. */
-static int ef4_ethtool_get_settings(struct net_device *net_dev,
-   struct ethtool_cmd *ecmd)
+static int
+ef4_ethtool_get_link_ksettings(struct net_device *net_dev,
+  struct ethtool_link_ksettings *cmd)
 {
struct ef4_nic *efx = netdev_priv(net_dev);
struct ef4_link_state *link_state = &efx->link_state;
+   u32 supported;
+
+   ethtool_convert_link_mode_to_legacy_u32(&supported,
+   cmd->link_modes.supported);
 
mutex_lock(&efx->mac_lock);
-   efx->phy_op->get_settings(efx, ecmd);
+   efx->phy_op->get_link_ksettings(efx, cmd);
mutex_unlock(&efx->mac_lock);
 
/* Both MACs support pause frames (bidirectional and respond-only) */
-   ecmd->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+   supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
 
if (LOOPBACK_INTERNAL(efx)) {
-   ethtool_cmd_speed_set(ecmd, link_state->speed);
-   ecmd->duplex = link_state->fd ? DUPLEX_FULL : DUPLEX_HALF;
+   cmd->base.speed = link_state->speed;
+   cmd->base.duplex = link_state->fd ? DUPLEX_FULL : DUPLEX_HALF;
}
 
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   supported);
+
return 0;
 }
 
 /* This must be called with rtnl_lock held. */
-static int ef4_ethtool_set_settings(struct net_device *net_dev,
-   struct ethtool_cmd *ecmd)
+static int
+ef4_ethtool_set_link_ksettings(struct net_device *net_dev,
+  const struct ethtool_link_ksettings *cmd)
 {
struct ef4_nic *efx = netdev_priv(net_dev);
int rc;
 
/* GMAC does not support 1000Mbps HD */
-   if ((ethtool_cmd_speed(ecmd) == SPEED_1000) &&
-   (ecmd->duplex != DUPLEX_FULL)) {
+   if ((cmd->base.speed == SPEED_1000) &&
+   (cmd->base.duplex != DUPLEX_FULL)) {
netif_dbg(efx, drv, efx->net_dev,
  "rejecting unsupported 1000Mbps HD setting\n");
return -EINVAL;
}
 
mutex_lock(&efx->mac_lock);
-   rc = efx->phy_op->set_settings(efx, ecmd);
+   rc = efx->phy_op->set_link_ksettings(efx, cmd);
mutex_unlock(&efx->mac_lock);
return rc;
 }
@@ -1310,8 +1319,6 @@ static int ef4_ethtool_get_module_info(struct net_device 
*net_dev,
 }
 
 const struct ethtool_ops ef4_ethtool_ops = {
-   .get_settings   = ef4_ethtool_get_settings,
-   .set_settings   = ef4_ethtool_set_settings,
.get_drvinfo= ef4_ethtool_get_drvinfo,
.get_regs_len   = ef4_ethtool_get_regs_len,
.get_regs   = ef4_ethtool_get_regs,
@@ -1340,4 +1347,6 @@ static int ef4_ethtool_get_module_info(struct net_device 
*net_dev,
.set_rxfh   = ef4_ethtool_set_rxfh,
.get_module_info= ef4_ethtool_get_module_info,
.get_module_eeprom  = ef4_ethtool_get_module_eeprom,
+   .get_link_ksettings = ef4_ethtool_get_link_ksettings,
+   .set_link_ksettings = ef4_ethtool_set_link_ksettings,
 };
diff --git 

[PATCH 1/2] net: mdio: add mdio45_ethtool_ksettings_get

2016-12-20 Thread Philippe Reynes
There is a function in mdio for the old ethtool api gset.
We add a new function mdio45_ethtool_ksettings_get for the
new ethtool api glinksettings.

Signed-off-by: Philippe Reynes 
---
 drivers/net/mdio.c   |  178 ++
 include/linux/mdio.h |   21 ++
 2 files changed, 199 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mdio.c b/drivers/net/mdio.c
index 3e027ed..077364c 100644
--- a/drivers/net/mdio.c
+++ b/drivers/net/mdio.c
@@ -342,6 +342,184 @@ void mdio45_ethtool_gset_npage(const struct mdio_if_info 
*mdio,
 EXPORT_SYMBOL(mdio45_ethtool_gset_npage);
 
 /**
+ * mdio45_ethtool_ksettings_get_npage - get settings for ETHTOOL_GLINKSETTINGS
+ * @mdio: MDIO interface
+ * @cmd: Ethtool request structure
+ * @npage_adv: Modes currently advertised on next pages
+ * @npage_lpa: Modes advertised by link partner on next pages
+ *
+ * The @cmd parameter is expected to have been cleared before calling
+ * mdio45_ethtool_ksettings_get_npage().
+ *
+ * Since the CSRs for auto-negotiation using next pages are not fully
+ * standardised, this function does not attempt to decode them.  The
+ * caller must pass them in.
+ */
+void mdio45_ethtool_ksettings_get_npage(const struct mdio_if_info *mdio,
+   struct ethtool_link_ksettings *cmd,
+   u32 npage_adv, u32 npage_lpa)
+{
+   int reg;
+   u32 speed, supported = 0, advertising = 0, lp_advertising = 0;
+
+   BUILD_BUG_ON(MDIO_SUPPORTS_C22 != ETH_MDIO_SUPPORTS_C22);
+   BUILD_BUG_ON(MDIO_SUPPORTS_C45 != ETH_MDIO_SUPPORTS_C45);
+
+   cmd->base.phy_address = mdio->prtad;
+   cmd->base.mdio_support =
+   mdio->mode_support & (MDIO_SUPPORTS_C45 | MDIO_SUPPORTS_C22);
+
+   reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+ MDIO_CTRL2);
+   switch (reg & MDIO_PMA_CTRL2_TYPE) {
+   case MDIO_PMA_CTRL2_10GBT:
+   case MDIO_PMA_CTRL2_1000BT:
+   case MDIO_PMA_CTRL2_100BTX:
+   case MDIO_PMA_CTRL2_10BT:
+   cmd->base.port = PORT_TP;
+   supported = SUPPORTED_TP;
+   reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+ MDIO_SPEED);
+   if (reg & MDIO_SPEED_10G)
+   supported |= SUPPORTED_10000baseT_Full;
+   if (reg & MDIO_PMA_SPEED_1000)
+   supported |= (SUPPORTED_1000baseT_Full |
+   SUPPORTED_1000baseT_Half);
+   if (reg & MDIO_PMA_SPEED_100)
+   supported |= (SUPPORTED_100baseT_Full |
+   SUPPORTED_100baseT_Half);
+   if (reg & MDIO_PMA_SPEED_10)
+   supported |= (SUPPORTED_10baseT_Full |
+   SUPPORTED_10baseT_Half);
+   advertising = ADVERTISED_TP;
+   break;
+
+   case MDIO_PMA_CTRL2_10GBCX4:
+   cmd->base.port = PORT_OTHER;
+   supported = 0;
+   advertising = 0;
+   break;
+
+   case MDIO_PMA_CTRL2_10GBKX4:
+   case MDIO_PMA_CTRL2_10GBKR:
+   case MDIO_PMA_CTRL2_1000BKX:
+   cmd->base.port = PORT_OTHER;
+   supported = SUPPORTED_Backplane;
+   reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+ MDIO_PMA_EXTABLE);
+   if (reg & MDIO_PMA_EXTABLE_10GBKX4)
+   supported |= SUPPORTED_10000baseKX4_Full;
+   if (reg & MDIO_PMA_EXTABLE_10GBKR)
+   supported |= SUPPORTED_10000baseKR_Full;
+   if (reg & MDIO_PMA_EXTABLE_1000BKX)
+   supported |= SUPPORTED_1000baseKX_Full;
+   reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+ MDIO_PMA_10GBR_FECABLE);
+   if (reg & MDIO_PMA_10GBR_FECABLE_ABLE)
+   supported |= SUPPORTED_10000baseR_FEC;
+   advertising = ADVERTISED_Backplane;
+   break;
+
+   /* All the other defined modes are flavours of optical */
+   default:
+   cmd->base.port = PORT_FIBRE;
+   supported = SUPPORTED_FIBRE;
+   advertising = ADVERTISED_FIBRE;
+   break;
+   }
+
+   if (mdio->mmds & MDIO_DEVS_AN) {
+   supported |= SUPPORTED_Autoneg;
+   reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_AN,
+ MDIO_CTRL1);
+   if (reg & MDIO_AN_CTRL1_ENABLE) {
+   cmd->base.autoneg = AUTONEG_ENABLE;
+   advertising |=
+   ADVERTISED_Autoneg |
+   mdio45_get_an(mdio, MDIO_AN_ADVERTISE) |
+ 

[PATCH net-next 06/10] net: netcp: ethss: get phy-handle only if link interface is MAC-to-PHY

2016-12-20 Thread Murali Karicheri
Currently, the driver parses phy-handle without checking whether the
link interface is MAC-to-PHY. This patch adds that check for all
MAC-to-PHY interface types supported by the driver.

Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 drivers/net/ethernet/ti/netcp_ethss.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c 
b/drivers/net/ethernet/ti/netcp_ethss.c
index cb48f88..9266961 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -2956,7 +2956,9 @@ static int init_slave(struct gbe_priv *gbe_dev, struct 
gbe_slave *slave,
}
 
slave->open = false;
-   slave->phy_node = of_parse_phandle(node, "phy-handle", 0);
+   if ((slave->link_interface == SGMII_LINK_MAC_PHY) ||
+   (slave->link_interface == XGMII_LINK_MAC_PHY))
+   slave->phy_node = of_parse_phandle(node, "phy-handle", 0);
slave->port_num = gbe_get_slave_port(gbe_dev, slave->slave_num);
 
if (slave->link_interface >= XGMII_LINK_MAC_PHY)
-- 
1.9.1



[PATCH net-next 00/10] netcp: enhancements and minor fixes

2016-12-20 Thread Murali Karicheri
This series is for net-next. This propagates enhancements and minor
bug fixes from internal version of the driver to keep the upstream
in sync. Please review and apply if this looks good.

Tested on all of K2HK/E/L boards.

Thanks
Murali Karicheri

Michael Scherban (1):
  net: netcp: store network statistics in 64 bits

Murali Karicheri (7):
  net: netcp: extract eflag from desc for rx_hook handling
  net: netcp: remove the redundant memmov()
  net: netcp: ethss: get phy-handle only if link interface is MAC-to-PHY
  net: netcp: use hw capability to remove FCS word from rx packets
  net: netcp: ale: update to support unknown vlan controls for NU switch
  net: netcp: ale: use ale_status to size the ale table
  net: netcp: ale: add proper ale entry mask bits for netcp switch ALE

WingMan Kwok (2):
  net: netcp: ethss: add support of subsystem register region regmap
  net: netcp: ethss: add support of 10gbe pcsr link status

 .../devicetree/bindings/net/keystone-netcp.txt |  19 +-
 drivers/net/ethernet/ti/cpsw_ale.c | 180 ---
 drivers/net/ethernet/ti/cpsw_ale.h |  17 +-
 drivers/net/ethernet/ti/netcp.h|  21 +++
 drivers/net/ethernet/ti/netcp_core.c   | 102 ---
 drivers/net/ethernet/ti/netcp_ethss.c  | 200 +
 include/linux/soc/ti/knav_dma.h|   2 +
 7 files changed, 456 insertions(+), 85 deletions(-)

-- 
1.9.1



[PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status

2016-12-20 Thread Murali Karicheri
From: WingMan Kwok 

The 10GBASE-R Physical Coding Sublayer (PCS-R) module provides
functionality of a physical coding sublayer (PCS) on data being
transferred between a demuxed XGMII and SerDes supporting a 16
or 32 bit interface.  From the driver's point of view, whether
an Ethernet link is up or not depends also on the status of the
block-lock bit of the PCSR.  This patch adds the checking of that
bit in order to determine the link status.

Signed-off-by: WingMan Kwok 
Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 .../devicetree/bindings/net/keystone-netcp.txt |  3 ++
 drivers/net/ethernet/ti/netcp_ethss.c  | 37 --
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt 
b/Documentation/devicetree/bindings/net/keystone-netcp.txt
index 0854a73..57fc13f 100644
--- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
+++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
@@ -75,6 +75,9 @@ Required properties:
 - syscon-subsys:   phandle to syscon node of the switch
subsystem registers.
 
+- syscon-pcsr: (10gbe only) phandle to syscon node of the
+   switch PCSR registers.
+
 - reg: register location and the size for the following register
regions in the specified order.
- switch subsystem registers
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c 
b/drivers/net/ethernet/ti/netcp_ethss.c
index 473edda1..cb48f88 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -63,6 +63,12 @@
 #define GBE13_ALE_OFFSET   0x600
 #define GBE13_HOST_PORT_NUM 0
 #define GBE13_NUM_ALE_ENTRIES  1024
+/* offset relative to PCSR regmap */
+#define XGBE10_PCSR_OFFSET(x)  ((x) * 0x80)
+#define XGBE10_PCSR_RX_STATUS(x)   (XGBE10_PCSR_OFFSET(x) + 0x0C)
+
+#define XGBE10_PCSR_BLOCK_LOCK_MASK    BIT(30)
+#define XGBE10_PCSR_BLOCK_LOCK_SHIFT   30
 
 /* 1G Ethernet NU SS defines */
 #define GBENU_MODULE_NAME  "netcp-gbenu"
@@ -2111,6 +2117,10 @@ static void netcp_ethss_link_state_action(struct 
gbe_priv *gbe_dev,
 
if (phy)
phy_print_status(phy);
+   else if (slave->link_interface == XGMII_LINK_MAC_MAC_FORCED) {
+   netdev_printk(KERN_INFO, ndev,
+ "Link is %s\n", (up ? "Up" : "Down"));
+   }
 }
 
 static bool gbe_phy_link_status(struct gbe_slave *slave)
@@ -2123,18 +2133,29 @@ static void netcp_ethss_update_link_state(struct 
gbe_priv *gbe_dev,
  struct net_device *ndev)
 {
int sp = slave->slave_num;
-   int phy_link_state, sgmii_link_state = 1, link_state;
+   int phy_link_state, sw_link_state = 1, link_state, ret;
+   u32 pcsr_rx_stat;
 
if (!slave->open)
return;
 
if (!SLAVE_LINK_IS_XGMII(slave)) {
-   sgmii_link_state =
+   sw_link_state =
netcp_sgmii_get_port_link(SGMII_BASE(gbe_dev, sp), sp);
+   } else if (slave->link_interface == XGMII_LINK_MAC_MAC_FORCED) {
+   /* read status from pcsr status reg */
+   ret = regmap_read(gbe_dev->pcsr_regmap,
+ XGBE10_PCSR_RX_STATUS(sp), &pcsr_rx_stat);
+
+   if (ret)
+   return;
+
+   sw_link_state = (pcsr_rx_stat & XGBE10_PCSR_BLOCK_LOCK_MASK) >>
+XGBE10_PCSR_BLOCK_LOCK_SHIFT;
}
 
phy_link_state = gbe_phy_link_status(slave);
-   link_state = phy_link_state & sgmii_link_state;
+   link_state = phy_link_state & sw_link_state;
 
if (atomic_xchg(&slave->link_state, link_state) != link_state)
netcp_ethss_link_state_action(gbe_dev, ndev, slave,
@@ -3154,6 +3175,16 @@ static int set_xgbe_ethss10_priv(struct gbe_priv 
*gbe_dev,
return PTR_ERR(gbe_dev->ss_regmap);
}
 
+   gbe_dev->pcsr_regmap = syscon_regmap_lookup_by_phandle(node,
+  "syscon-pcsr");
+
+   if (IS_ERR(gbe_dev->pcsr_regmap)) {
+   dev_err(gbe_dev->dev,
+   "pcsr regmap lookup failed: %ld\n",
+   PTR_ERR(gbe_dev->pcsr_regmap));
+   return PTR_ERR(gbe_dev->pcsr_regmap);
+   }
+
ret = of_address_to_resource(node, XGBE_SM_REG_INDEX, &res);
if (ret) {
dev_err(gbe_dev->dev,
-- 
1.9.1
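
The block-lock test added by this patch can be sketched in standalone C.
The 0x80 port stride, the 0x0C status offset and bit 30 are taken from the
defines in the patch; the helper names below are hypothetical, and a plain
uint32_t stands in for the value the driver reads through regmap.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch: the block-lock flag is bit 30. */
#define PCSR_BLOCK_LOCK_SHIFT	30
#define PCSR_BLOCK_LOCK_MASK	(UINT32_C(1) << PCSR_BLOCK_LOCK_SHIFT)

/* Hypothetical helper: RX status register offset for a given port. */
static uint32_t pcsr_rx_status_offset(unsigned int port)
{
	return port * 0x80 + 0x0C;
}

/* Hypothetical helper: 1 if the block-lock bit is set, else 0. */
static int pcsr_block_locked(uint32_t rx_status)
{
	return (int)((rx_status & PCSR_BLOCK_LOCK_MASK) >>
		     PCSR_BLOCK_LOCK_SHIFT);
}

/* Hypothetical helper: the link is up only if both inputs report up. */
static int combined_link_state(int phy_link_state, uint32_t rx_status)
{
	assert(phy_link_state == 0 || phy_link_state == 1);
	return phy_link_state & pcsr_block_locked(rx_status);
}
```

Masking before shifting keeps the result at 0 or 1, so it can be ANDed
directly with the PHY link state as the patch does.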



[PATCH net-next 05/10] net: netcp: store network statistics in 64 bits

2016-12-20 Thread Murali Karicheri
From: Michael Scherban 

Previously the network statistics were stored in 32-bit variables,
which can cause some stats to roll over after several minutes of
high traffic. This patch implements 64-bit storage so larger numbers
can be stored.

Signed-off-by: Michael Scherban 
Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 drivers/net/ethernet/ti/netcp.h  | 18 ++
 drivers/net/ethernet/ti/netcp_core.c | 68 +---
 2 files changed, 74 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index a92abd6..d243c5d 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -23,6 +23,7 @@
 
 #include 
 #include 
+#include 
 
 /* Maximum Ethernet frame size supported by Keystone switch */
 #define NETCP_MAX_FRAME_SIZE   9504
@@ -68,6 +69,20 @@ struct netcp_addr {
struct list_head node;
 };
 
+struct netcp_stats {
+   struct u64_stats_sync   syncp_rx ____cacheline_aligned_in_smp;
+   u64 rx_packets;
+   u64 rx_bytes;
+   u32 rx_errors;
+   u32 rx_dropped;
+
+   struct u64_stats_sync   syncp_tx ____cacheline_aligned_in_smp;
+   u64 tx_packets;
+   u64 tx_bytes;
+   u32 tx_errors;
+   u32 tx_dropped;
+};
+
 struct netcp_intf {
struct device   *dev;
struct device   *ndev_dev;
@@ -88,6 +103,9 @@ struct netcp_intf {
struct napi_struct  rx_napi;
struct napi_struct  tx_napi;
 
+   /* 64-bit netcp stats */
+   struct netcp_stats  stats;
+
void*rx_channel;
const char  *dma_chan_name;
u32 rx_pool_size;
diff --git a/drivers/net/ethernet/ti/netcp_core.c 
b/drivers/net/ethernet/ti/netcp_core.c
index 286fd8d..b077ed4 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -629,6 +629,7 @@ static void netcp_free_rx_desc_chain(struct netcp_intf 
*netcp,
 
 static void netcp_empty_rx_queue(struct netcp_intf *netcp)
 {
+   struct netcp_stats *rx_stats = &netcp->stats;
struct knav_dma_desc *desc;
unsigned int dma_sz;
dma_addr_t dma;
@@ -642,16 +643,17 @@ static void netcp_empty_rx_queue(struct netcp_intf *netcp)
if (unlikely(!desc)) {
dev_err(netcp->ndev_dev, "%s: failed to unmap Rx desc\n",
__func__);
-   netcp->ndev->stats.rx_errors++;
+   rx_stats->rx_errors++;
continue;
}
netcp_free_rx_desc_chain(netcp, desc);
-   netcp->ndev->stats.rx_dropped++;
+   rx_stats->rx_dropped++;
}
 }
 
 static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 {
+   struct netcp_stats *rx_stats = &netcp->stats;
unsigned int dma_sz, buf_len, org_buf_len;
struct knav_dma_desc *desc, *ndesc;
unsigned int pkt_sz = 0, accum_sz;
@@ -757,8 +759,8 @@ static int netcp_process_one_rx_packet(struct netcp_intf 
*netcp)
if (unlikely(ret)) {
dev_err(netcp->ndev_dev, "RX hook %d failed: %d\n",
rx_hook->order, ret);
-   netcp->ndev->stats.rx_errors++;
/* Free the primary descriptor */
+   rx_stats->rx_dropped++;
knav_pool_desc_put(netcp->rx_pool, desc);
dev_kfree_skb(skb);
return 0;
@@ -767,8 +769,10 @@ static int netcp_process_one_rx_packet(struct netcp_intf 
*netcp)
/* Free the primary descriptor */
knav_pool_desc_put(netcp->rx_pool, desc);
 
-   netcp->ndev->stats.rx_packets++;
-   netcp->ndev->stats.rx_bytes += skb->len;
+   u64_stats_update_begin(&rx_stats->syncp_rx);
+   rx_stats->rx_packets++;
+   rx_stats->rx_bytes += skb->len;
+   u64_stats_update_end(&rx_stats->syncp_rx);
 
/* push skb up the stack */
skb->protocol = eth_type_trans(skb, netcp->ndev);
@@ -777,7 +781,7 @@ static int netcp_process_one_rx_packet(struct netcp_intf 
*netcp)
 
 free_desc:
netcp_free_rx_desc_chain(netcp, desc);
-   netcp->ndev->stats.rx_errors++;
+   rx_stats->rx_errors++;
return 0;
 }
 
@@ -1008,6 +1012,7 @@ static void netcp_free_tx_desc_chain(struct netcp_intf 
*netcp,
 static int netcp_process_tx_compl_packets(struct netcp_intf *netcp,
  unsigned int budget)
 {
+   struct netcp_stats *tx_stats = &netcp->stats;
struct knav_dma_desc *desc;
struct netcp_tx_cb *tx_cb;
struct sk_buff *skb;
@@ 

[PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap

2016-12-20 Thread Murali Karicheri
From: WingMan Kwok 

The 10gbe phy driver needs to access the 10gbe subsystem control
register during phy initialization. To facilitate the shared
access of the subsystem register region between the 10gbe Ethernet
driver and the phy driver, this patch adds support of the
subsystem register region defined by a syscon node in the dts.

Although there is no shared access to the gbe subsystem register
region, syscon is used there as well for the sake of consistency.

This change is backward compatible with previously released gbe
devicetree bindings.

Signed-off-by: WingMan Kwok 
Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 .../devicetree/bindings/net/keystone-netcp.txt |  16 ++-
 drivers/net/ethernet/ti/netcp_ethss.c  | 140 +
 2 files changed, 127 insertions(+), 29 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt 
b/Documentation/devicetree/bindings/net/keystone-netcp.txt
index 04ba1dc..0854a73 100644
--- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
+++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
@@ -72,20 +72,24 @@ Required properties:
"ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2)
"ti,netcp-xgbe" for 10 GbE
 
+- syscon-subsys:   phandle to syscon node of the switch
+   subsystem registers.
+
 - reg: register location and the size for the following register
regions in the specified order.
- switch subsystem registers
+   - sgmii module registers
- sgmii port3/4 module registers (only for NetCP 1.4)
- switch module registers
- serdes registers (only for 10G)
 
NetCP 1.4 ethss, here is the order
-   index #0 - switch subsystem registers
+   index #0 - sgmii module registers
index #1 - sgmii port3/4 module registers
index #2 - switch module registers
 
NetCP 1.5 ethss 9 port, 5 port and 2 port
-   index #0 - switch subsystem registers
+   index #0 - sgmii module registers
index #1 - switch module registers
index #2 - serdes registers
 
@@ -145,6 +149,11 @@ Optional properties:
 
 Example binding:
 
+gbe_subsys: subsys@209 {
+   compatible = "syscon";
+   reg = <0x0209 0x100>;
+};
+
 netcp: netcp@200 {
reg = <0x2620110 0x8>;
reg-names = "efuse";
@@ -163,7 +172,8 @@ netcp: netcp@200 {
ranges;
gbe@9 {
label = "netcp-gbe";
-   reg = <0x9 0x300>, <0x90400 0x400>, <0x90800 0x700>;
+   syscon-subsys = <&gbe_subsys>;
+   reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>;
/* enable-ale; */
tx-queue = <648>;
tx-channel = <8>;
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index c7e547e..473edda1 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -19,9 +19,11 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,7 +45,10 @@
 #define GBE_MODULE_NAME    "netcp-gbe"
 #define GBE_SS_VERSION_14  0x4ed21104
 
+/* for devicetree backward compatible only */
 #define GBE_SS_REG_INDEX   0
+
+#define GBE_SGMII_REG_INDEX    0
 #define GBE_SGMII34_REG_INDEX  1
 #define GBE_SM_REG_INDEX   2
 /* offset relative to base of GBE_SS_REG_INDEX */
@@ -71,9 +76,11 @@
 #define IS_SS_ID_NU(d) \
(GBE_IDENT((d)->ss_version) == GBE_SS_ID_NU)
 
-#define GBENU_SS_REG_INDEX 0
+#define GBENU_SGMII_REG_INDEX  0
 #define GBENU_SM_REG_INDEX 1
+/* offset relative to base of GBE_SS_REG_INDEX */
 #define GBENU_SGMII_MODULE_OFFSET  0x100
+/* offset relative to base of GBENU_SM_REG_INDEX */
 #define GBENU_HOST_PORT_OFFSET 0x1000
 #define GBENU_SLAVE_PORT_OFFSET    0x2000
 #define GBENU_EMAC_OFFSET  0x2330
@@ -82,13 +89,12 @@
 #define GBENU_ALE_OFFSET   0x1e000
 #define GBENU_HOST_PORT_NUM    0
 #define GBENU_NUM_ALE_ENTRIES  1024
-#define GBENU_SGMII_MODULE_SIZE    0x100
 
 /* 10G Ethernet SS defines */
 #define XGBE_MODULE_NAME   "netcp-xgbe"
 #define XGBE_SS_VERSION_10 0x4ee42100
 
-#define XGBE_SS_REG_INDEX  0
+#define XGBE_SGMII_REG_INDEX   0
 #define XGBE_SM_REG_INDEX  1
 #define XGBE_SERDES_REG_INDEX  2
 
@@ -173,6 +179,7 @@
 #define XGBE_SET_REG_OFS(p, rb, rn) p->rb##_ofs.rn = \
  

[PATCH net-next 07/10] net: netcp: use hw capability to remove FCS word from rx packets

2016-12-20 Thread Murali Karicheri
Some of the newer Ethernet switch hw (such as that on k2e/l/g) can
strip the Ethernet FCS from the packet at the port 0 egress of the switch.
So use this capability instead of doing it in software.

Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 drivers/net/ethernet/ti/netcp.h   |  2 ++
 drivers/net/ethernet/ti/netcp_core.c  |  8 ++--
 drivers/net/ethernet/ti/netcp_ethss.c | 10 --
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index d243c5d..8900a6f 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -102,6 +102,8 @@ struct netcp_intf {
void    *rx_fdq[KNAV_DMA_FDQ_PER_CHAN];
struct napi_struct  rx_napi;
struct napi_struct  tx_napi;
+#define ETH_SW_CAN_REMOVE_ETH_FCS  BIT(0)
+   u32 hw_cap;
 
/* 64-bit netcp stats */
struct netcp_stats  stats;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index b077ed4..68a75cc 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -739,8 +739,12 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
dev_dbg(netcp->ndev_dev, "mismatch in packet size(%d) & sum of fragments(%d)\n",
pkt_sz, accum_sz);
 
-   /* Remove ethernet FCS from the packet */
-   __pskb_trim(skb, skb->len - ETH_FCS_LEN);
+   /* Newer version of the Ethernet switch can trim the Ethernet FCS
+* from the packet and is indicated in hw_cap. So trim it only for
+* older h/w
+*/
+   if (!(netcp->hw_cap & ETH_SW_CAN_REMOVE_ETH_FCS))
+   __pskb_trim(skb, skb->len - ETH_FCS_LEN);
 
/* Call each of the RX hooks */
p_info.skb = skb;
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 9266961..4b2a911 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -133,6 +133,7 @@
 #define MACSL_FULLDUPLEX   BIT(0)
 
 #define GBE_CTL_P0_ENABLE  BIT(2)
+#define ETH_SW_CTL_P0_TX_CRC_REMOVE    BIT(13)
 #define GBE13_REG_VAL_STAT_ENABLE_ALL  0xff
 #define XGBE_REG_VAL_STAT_ENABLE_ALL   0xf
 #define GBE_STATS_CD_SEL   BIT(28)
@@ -2847,7 +2848,7 @@ static int gbe_open(void *intf_priv, struct net_device *ndev)
struct netcp_intf *netcp = netdev_priv(ndev);
struct gbe_slave *slave = gbe_intf->slave;
int port_num = slave->port_num;
-   u32 reg;
+   u32 reg, val;
int ret;
 
reg = readl(GBE_REG_ADDR(gbe_dev, switch_regs, id_ver));
@@ -2877,7 +2878,12 @@ static int gbe_open(void *intf_priv, struct net_device *ndev)
writel(0, GBE_REG_ADDR(gbe_dev, switch_regs, ptype));
 
/* Control register */
-   writel(GBE_CTL_P0_ENABLE, GBE_REG_ADDR(gbe_dev, switch_regs, control));
+   val = GBE_CTL_P0_ENABLE;
+   if (IS_SS_ID_MU(gbe_dev)) {
+   val |= ETH_SW_CTL_P0_TX_CRC_REMOVE;
+   netcp->hw_cap = ETH_SW_CAN_REMOVE_ETH_FCS;
+   }
+   writel(val, GBE_REG_ADDR(gbe_dev, switch_regs, control));
 
/* All statistics enabled and STAT AB visible by default */
writel(gbe_dev->stats_en_mask, GBE_REG_ADDR(gbe_dev, switch_regs,
-- 
1.9.1
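
The receive-path decision described in this patch can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: struct rx_intf and rx_payload_len() are hypothetical stand-ins for struct netcp_intf and the trimming done in netcp_process_one_rx_packet(); only the ETH_SW_CAN_REMOVE_ETH_FCS bit and the 4-byte FCS length are taken from the patch and the kernel's Ethernet definitions.

```c
#include <stdint.h>

#define ETH_FCS_LEN               4         /* bytes of frame check sequence */
#define ETH_SW_CAN_REMOVE_ETH_FCS (1u << 0) /* same bit as BIT(0) in the patch */

/* Hypothetical stand-in for the hw_cap field added to struct netcp_intf. */
struct rx_intf {
	uint32_t hw_cap;
};

/* Length passed up the stack for a frame of dma_len bytes from the switch. */
static uint32_t rx_payload_len(const struct rx_intf *intf, uint32_t dma_len)
{
	if (intf->hw_cap & ETH_SW_CAN_REMOVE_ETH_FCS)
		return dma_len;             /* hardware already stripped the FCS */
	return dma_len - ETH_FCS_LEN;       /* older hardware: trim in software */
}
```

With the capability bit clear, a 64-byte frame yields 60 bytes of payload; with it set, the length is left untouched.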



[PATCH net-next 03/10] net: netcp: extract eflag from desc for rx_hook handling

2016-12-20 Thread Murali Karicheri
Extract the eflag bits from the received desc and pass them down
the rx_hook chain so they are available to netcp modules. The
psdata and epib data also have to be inspected by the netcp modules,
so the desc can be freed only after returning from the rx_hook.
Move knav_pool_desc_put() after the rx_hook processing accordingly.

Signed-off-by: Murali Karicheri 
---
 drivers/net/ethernet/ti/netcp.h  |  1 +
 drivers/net/ethernet/ti/netcp_core.c | 20 +---
 include/linux/soc/ti/knav_dma.h  |  2 ++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index 0f58c58..a92abd6 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -115,6 +115,7 @@ struct netcp_packet {
struct sk_buff  *skb;
__le32  *epib;
u32 *psdata;
+   u32 eflags;
unsigned int    psdata_len;
struct netcp_intf   *netcp;
struct netcp_tx_pipe    *tx_pipe;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index c243335..a136c56 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -122,6 +122,13 @@ static void get_pkt_info(dma_addr_t *buff, u32 *buff_len, dma_addr_t *ndesc,
*ndesc = le32_to_cpu(desc->next_desc);
 }
 
+static void get_desc_info(u32 *desc_info, u32 *pkt_info,
+ struct knav_dma_desc *desc)
+{
+   *desc_info = le32_to_cpu(desc->desc_info);
+   *pkt_info = le32_to_cpu(desc->packet_info);
+}
+
 static u32 get_sw_data(int index, struct knav_dma_desc *desc)
 {
/* No Endian conversion needed as this data is untouched by hw */
@@ -653,6 +660,7 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
struct netcp_packet p_info;
struct sk_buff *skb;
void *org_buf_ptr;
+   u32 tmp;
 
dma_desc = knav_queue_pop(netcp->rx_queue, &dma_sz);
if (!dma_desc)
@@ -724,9 +732,6 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
knav_pool_desc_put(netcp->rx_pool, ndesc);
}
 
-   /* Free the primary descriptor */
-   knav_pool_desc_put(netcp->rx_pool, desc);
-
/* check for packet len and warn */
if (unlikely(pkt_sz != accum_sz))
dev_dbg(netcp->ndev_dev, "mismatch in packet size(%d) & sum of fragments(%d)\n",
@@ -739,6 +744,11 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
p_info.skb = skb;
skb->dev = netcp->ndev;
p_info.rxtstamp_complete = false;
+   get_desc_info(&tmp, &p_info.eflags, desc);
+   p_info.epib = desc->epib;
+   p_info.psdata = (u32 __force *)desc->psdata;
+   p_info.eflags = ((p_info.eflags >> KNAV_DMA_DESC_EFLAGS_SHIFT) &
+KNAV_DMA_DESC_EFLAGS_MASK);
list_for_each_entry(rx_hook, &netcp->rxhook_list_head, list) {
int ret;
 
@@ -748,10 +758,14 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
dev_err(netcp->ndev_dev, "RX hook %d failed: %d\n",
rx_hook->order, ret);
netcp->ndev->stats.rx_errors++;
+   /* Free the primary descriptor */
+   knav_pool_desc_put(netcp->rx_pool, desc);
dev_kfree_skb(skb);
return 0;
}
}
+   /* Free the primary descriptor */
+   knav_pool_desc_put(netcp->rx_pool, desc);
 
netcp->ndev->stats.rx_packets++;
netcp->ndev->stats.rx_bytes += skb->len;
diff --git a/include/linux/soc/ti/knav_dma.h b/include/linux/soc/ti/knav_dma.h
index 35cb926..2b78826 100644
--- a/include/linux/soc/ti/knav_dma.h
+++ b/include/linux/soc/ti/knav_dma.h
@@ -41,6 +41,8 @@
 #define KNAV_DMA_DESC_RETQ_SHIFT   0
 #define KNAV_DMA_DESC_RETQ_MASK    MASK(14)
 #define KNAV_DMA_DESC_BUF_LEN_MASK MASK(22)
+#define KNAV_DMA_DESC_EFLAGS_MASK  MASK(4)
+#define KNAV_DMA_DESC_EFLAGS_SHIFT 20
 
 #define KNAV_DMA_NUM_EPIB_WORDS    4
 #define KNAV_DMA_NUM_PS_WORDS  16
-- 
1.9.1
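
The eflags extraction added above is a plain shift-and-mask on the descriptor's packet_info word. A minimal user-space sketch, assuming MASK(n) expands to the low n bits set (the MASK definition here is an assumption for illustration) and using a hypothetical helper name desc_eflags():

```c
#include <stdint.h>

/* Assumed behaviour of MASK(): the low n bits set. */
#define MASK(n)                    ((1u << (n)) - 1)
#define KNAV_DMA_DESC_EFLAGS_MASK  MASK(4)
#define KNAV_DMA_DESC_EFLAGS_SHIFT 20

/* Hypothetical helper: pull the 4 error-flag bits (bits 23..20) out of packet_info. */
static uint32_t desc_eflags(uint32_t packet_info)
{
	return (packet_info >> KNAV_DMA_DESC_EFLAGS_SHIFT) &
	       KNAV_DMA_DESC_EFLAGS_MASK;
}
```

For instance, a packet_info of 0x00a00000 gives eflags 0xa, and bits outside 23..20 are ignored.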



Re: ipv6: handle -EFAULT from skb_copy_bits

2016-12-20 Thread Dave Jones
On Tue, Dec 20, 2016 at 11:31:38AM -0800, Cong Wang wrote:
 > On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones  wrote:
 > > On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
 > >  > From: Dave Jones 
 > >  > Date: Mon, 19 Dec 2016 19:40:13 -0500
 > >  >
 > >  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
 > >  > >
 > >  > >  > Unfortunately, this made no difference.  I spent some time today trying
 > >  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > >  > >  >
 > >  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
 > >  > >  > I can trigger it with Trinity.
 > >  > >
 > >  > > scratch that last part, I finally just repro'd it with a single process.
 > >  >
 > >  > Thanks for the info, I'll try to think about this some more.
 > >
 > > I threw in some debug printks right before that BUG_ON.
 > > it's always this:
 > >
 > > skb->len=31 skb->data_len=0 offset:30 total_len:9
 > 
 > Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here,
 > off-by-one?

Ok, I finally made a messy, albeit good enough reproducer.

/* Header names were stripped by the list archive; these cover what the code uses. */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}



[PATCH net-next 04/10] net: netcp: remove the redundant memmov()

2016-12-20 Thread Murali Karicheri
The psdata is populated with command data by netcp modules
at the tail of the buffer, and set_words() copies the same
to the front of the psdata. So remove the redundant memmove()
function call.

Signed-off-by: Murali Karicheri 
---
 drivers/net/ethernet/ti/netcp_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index a136c56..286fd8d 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1226,9 +1226,9 @@ static int netcp_tx_submit_skb(struct netcp_intf *netcp,
/* psdata points to both native-endian and device-endian data */
__le32 *psdata = (void __force *)p_info.psdata;
 
-   memmove(p_info.psdata, p_info.psdata + p_info.psdata_len,
-   p_info.psdata_len);
-   set_words(p_info.psdata, p_info.psdata_len, psdata);
+   set_words((u32 *)psdata +
+ (KNAV_DMA_NUM_PS_WORDS - p_info.psdata_len),
+ p_info.psdata_len, psdata);
tmp |= (p_info.psdata_len & KNAV_DMA_DESC_PSLEN_MASK) <<
KNAV_DMA_DESC_PSLEN_SHIFT;
}
-- 
1.9.1
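
The pointer arithmetic in the new set_words() call can be illustrated with a small user-space model. This is a sketch, not the driver code: ps_tail_to_front() is a hypothetical name, and the endian conversion that the driver's set_words() performs is left out. It assumes n is at most KNAV_DMA_NUM_PS_WORDS.

```c
#include <stddef.h>
#include <stdint.h>

#define KNAV_DMA_NUM_PS_WORDS 16

/*
 * Hypothetical model: the last n words of ps[] hold the command data;
 * copy them to ps[0..n-1]. Requires n <= KNAV_DMA_NUM_PS_WORDS.
 * The ascending copy is safe even when the ranges overlap, because
 * each source index is never below its destination index.
 */
static void ps_tail_to_front(uint32_t ps[KNAV_DMA_NUM_PS_WORDS], size_t n)
{
	const uint32_t *src = ps + (KNAV_DMA_NUM_PS_WORDS - n);

	for (size_t i = 0; i < n; i++)
		ps[i] = src[i];
}
```

Since the source is read directly from the tail, no intermediate memmove() of the buffer is needed before the copy.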



[PATCH net-next 09/10] net: netcp: ale: use ale_status to size the ale table

2016-12-20 Thread Murali Karicheri
ALE h/w on newer versions of NetCP (K2E/L/G) provides an ALE_STATUS
register that reports the size of the ALE table implemented in h/w. Currently,
for example, we set the ALE table size to 1024 for NetCP ALE on
K2E even though the ALE Status/Documentation shows it has 8192 entries.
So take advantage of this register to read the size of the ALE table supported
and use that value in the driver for the newer versions of NetCP ALE.
For NetCP lite, the ALE table size is much smaller (64) and is indicated by a size
of zero in ALE_STATUS. So use that as a default for now. While at it,
also fix the ALE table size on the 10G switch to 2048 per the user guide
http://www.ti.com/lit/ug/spruhj5/spruhj5.pdf

Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 drivers/net/ethernet/ti/cpsw_ale.c| 31 ++-
 drivers/net/ethernet/ti/netcp_ethss.c |  4 +---
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index e15db39..62a18d6 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -33,6 +33,7 @@
 
 /* ALE Registers */
 #define ALE_IDVER  0x00
+#define ALE_STATUS 0x04
 #define ALE_CONTROL    0x08
 #define ALE_PRESCALE   0x10
 #define ALE_UNKNOWNVLAN    0x18
@@ -58,6 +59,10 @@
 #define ALE_UCAST_OUI  2
 #define ALE_UCAST_TOUCHED  3
 
+#define ALE_TABLE_SIZE_MULTIPLIER  1024
+#define ALE_STATUS_SIZE_MASK   0x1f
+#define ALE_TABLE_SIZE_DEFAULT 64
+
 static inline int cpsw_ale_get_field(u32 *ale_entry, u32 start, u32 bits)
 {
int idx;
@@ -728,7 +733,7 @@ static void cpsw_ale_timer(unsigned long arg)
 
 void cpsw_ale_start(struct cpsw_ale *ale)
 {
-   u32 rev;
+   u32 rev, ale_entries;
 
rev = __raw_readl(ale->params.ale_regs + ALE_IDVER);
if (!ale->params.major_ver_mask)
@@ -740,6 +745,30 @@ void cpsw_ale_start(struct cpsw_ale *ale)
 ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask),
 ALE_VERSION_MINOR(rev));
 
+   if (!ale->params.ale_entries) {
+   ale_entries =
+   __raw_readl(ale->params.ale_regs + ALE_STATUS) &
+   ALE_STATUS_SIZE_MASK;
+   /* ALE available on newer NetCP switches has introduced
+* a register, ALE_STATUS, to indicate the size of ALE
+* table which shows the size as a multiple of 1024 entries.
+* For these, params.ale_entries will be set to zero. So
+* read the register and update the value of ale_entries.
+* ALE table on NetCP lite, is much smaller and is indicated
+* by a value of zero in ALE_STATUS. So use a default value
+* of ALE_TABLE_SIZE_DEFAULT for this. Caller is expected
+* to set the value of ale_entries for all other versions
+* of ALE.
+*/
+   if (!ale_entries)
+   ale_entries = ALE_TABLE_SIZE_DEFAULT;
+   else
+   ale_entries *= ALE_TABLE_SIZE_MULTIPLIER;
+   ale->params.ale_entries = ale_entries;
+   }
+   dev_info(ale->params.dev,
+"ALE Table size %ld\n", ale->params.ale_entries);
+
if (ale->params.nu_switch_ale) {
/* Separate registers for unknown vlan configuration.
 * Also there are N bits, where N is number of ale
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index b37fb73..80d68cb 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -94,7 +94,6 @@
 #define GBENU_CPTS_OFFSET  0x1d000
 #define GBENU_ALE_OFFSET   0x1e000
 #define GBENU_HOST_PORT_NUM    0
-#define GBENU_NUM_ALE_ENTRIES  1024
 
 /* 10G Ethernet SS defines */
 #define XGBE_MODULE_NAME   "netcp-xgbe"
@@ -114,7 +113,7 @@
 #define XGBE10_ALE_OFFSET  0x700
 #define XGBE10_HW_STATS_OFFSET 0x800
 #define XGBE10_HOST_PORT_NUM   0
-#define XGBE10_NUM_ALE_ENTRIES 1024
+#define XGBE10_NUM_ALE_ENTRIES 2048
 
 #define GBE_TIMER_INTERVAL  (HZ / 2)
 
@@ -3548,7 +3547,6 @@ static int set_gbenu_ethss_priv(struct gbe_priv *gbe_dev,
gbe_dev->ale_reg = gbe_dev->switch_regs + GBENU_ALE_OFFSET;
gbe_dev->ale_ports = gbe_dev->max_num_ports;
gbe_dev->host_port = GBENU_HOST_PORT_NUM;
-   gbe_dev->ale_entries = GBE13_NUM_ALE_ENTRIES;
gbe_dev->stats_en_mask = (1 << (gbe_dev->max_num_ports)) - 1;
 
/* Subsystem registers */
-- 
1.9.1
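
The sizing rule from the commit message can be sketched as a pure function. The three constants are the ones added by the patch; ale_table_entries() and its parameters are hypothetical names used only for illustration, with "configured" standing in for params.ale_entries as set by the caller.

```c
#include <stdint.h>

#define ALE_TABLE_SIZE_MULTIPLIER 1024
#define ALE_STATUS_SIZE_MASK      0x1f
#define ALE_TABLE_SIZE_DEFAULT    64

/*
 * Hypothetical helper: return the ALE table size.
 * configured  - non-zero if the caller already fixed the size
 * ale_status  - raw value read from the ALE_STATUS register
 */
static uint32_t ale_table_entries(uint32_t configured, uint32_t ale_status)
{
	uint32_t n;

	if (configured)
		return configured;          /* older ALE: caller knows the size */

	n = ale_status & ALE_STATUS_SIZE_MASK;
	if (!n)
		return ALE_TABLE_SIZE_DEFAULT; /* NetCP lite reports zero */
	return n * ALE_TABLE_SIZE_MULTIPLIER;
}
```

Under this rule a size field of 8 yields the 8192 entries quoted for K2E, and a field of 0 falls back to 64.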



[PATCH net-next 08/10] net: netcp: ale: update to support unknown vlan controls for NU switch

2016-12-20 Thread Murali Karicheri
In the NU Ethernet switch used on some of the Keystone SoCs, there are
separate UNKNOWNVLAN registers for the membership, unreg mcast flood, reg
mcast flood and force untag egress bits in ALE. So control of these
fields requires a different address offset, shift and field size.
As this ALE has the same version number as the ALE in CPSW found on other
SoCs, customization based on version number is not possible. So
use a configuration parameter, nu_switch_ale, to identify the
ALE found in the NU Switch. Different treatment is needed for the NU Switch
ALE due to differences in the ale table bits, separate unknown vlan
registers etc. The register information available in ale_controls
needs to be updated to support the netcp NU switch h/w. So it is no
longer a constant array, since it needs to be updated based
on ALE type. The header of the file is also updated to indicate it
supports N port switch ALE, not just 3 port. The version mask is
3 bits in NU Switch ALE vs 8 bits on other ALE types.

While at it, change the debug print to info print so that ALE
version gets displayed in boot log.

Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 drivers/net/ethernet/ti/cpsw_ale.c| 50 +++
 drivers/net/ethernet/ti/cpsw_ale.h| 13 -
 drivers/net/ethernet/ti/netcp_ethss.c |  5 +++-
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 43b061b..e15db39 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -1,5 +1,5 @@
 /*
- * Texas Instruments 3-Port Ethernet Switch Address Lookup Engine
+ * Texas Instruments N-Port Ethernet Switch Address Lookup Engine
  *
  * Copyright (C) 2012 Texas Instruments
  *
@@ -27,8 +27,9 @@
 
 #define BITMASK(bits)  (BIT(bits) - 1)
 
-#define ALE_VERSION_MAJOR(rev) ((rev >> 8) & 0xff)
+#define ALE_VERSION_MAJOR(rev, mask) (((rev) >> 8) & (mask))
 #define ALE_VERSION_MINOR(rev) (rev & 0xff)
+#define ALE_VERSION_1R4    0x0104
 
 /* ALE Registers */
 #define ALE_IDVER  0x00
@@ -39,6 +40,12 @@
 #define ALE_TABLE  0x34
 #define ALE_PORTCTL    0x40
 
+/* ALE NetCP NU switch specific Registers */
+#define ALE_UNKNOWNVLAN_MEMBER 0x90
+#define ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD  0x94
+#define ALE_UNKNOWNVLAN_REG_MCAST_FLOOD    0x98
+#define ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS 0x9C
+
 #define ALE_TABLE_WRITE    BIT(31)
 
 #define ALE_TYPE_FREE  0
@@ -464,7 +471,7 @@ struct ale_control_info {
int bits;
 };
 
-static const struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = {
+static struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = {
[ALE_ENABLE]= {
.name   = "enable",
.offset = ALE_CONTROL,
@@ -724,8 +731,41 @@ void cpsw_ale_start(struct cpsw_ale *ale)
u32 rev;
 
rev = __raw_readl(ale->params.ale_regs + ALE_IDVER);
-   dev_dbg(ale->params.dev, "initialized cpsw ale revision %d.%d\n",
-   ALE_VERSION_MAJOR(rev), ALE_VERSION_MINOR(rev));
+   if (!ale->params.major_ver_mask)
+   ale->params.major_ver_mask = 0xff;
+   ale->version =
+   (ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask) << 8) |
+ALE_VERSION_MINOR(rev);
+   dev_info(ale->params.dev, "initialized cpsw ale version %d.%d\n",
+ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask),
+ALE_VERSION_MINOR(rev));
+
+   if (ale->params.nu_switch_ale) {
+   /* Separate registers for unknown vlan configuration.
+* Also there are N bits, where N is number of ale
+* ports and shift value should be 0
+*/
+   ale_controls[ALE_PORT_UNKNOWN_VLAN_MEMBER].bits =
+   ale->params.ale_ports;
+   ale_controls[ALE_PORT_UNKNOWN_VLAN_MEMBER].offset =
+   ALE_UNKNOWNVLAN_MEMBER;
+   ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].bits =
+   ale->params.ale_ports;
+   ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].shift = 0;
+   ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].offset =
+   ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD;
+   ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].bits =
+   ale->params.ale_ports;
+   ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].shift = 0;
+   ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].offset =
+   ALE_UNKNOWNVLAN_REG_MCAST_FLOOD;
+   ale_controls[ALE_PORT_UNTAGGED_EGRESS].bits =
+   ale->params.ale_ports;
+  

[PATCH net-next 10/10] net: netcp: ale: add proper ale entry mask bits for netcp switch ALE

2016-12-20 Thread Murali Karicheri
For NetCP NU Switch ALE, some of the mask bits are different from the
defaults used in the driver. Add a new macro, DEFINE_ALE_FIELD1, that takes
a configurable number of mask bits and use it in the driver. These bits are set to
correct values by using the new variables added to cpsw_ale structure
and re-used in the macros. The parameter nu_switch_ale is configured by
the caller driver to indicate the ALE is for that switch and is used in
the ALE driver to do customization as needed.

Signed-off-by: Murali Karicheri 
Signed-off-by: Sekhar Nori 
---
 drivers/net/ethernet/ti/cpsw_ale.c | 99 ++
 drivers/net/ethernet/ti/cpsw_ale.h |  4 ++
 2 files changed, 84 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 62a18d6..ddd43e0 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -29,6 +29,7 @@
 
 #define ALE_VERSION_MAJOR(rev, mask) (((rev) >> 8) & (mask))
 #define ALE_VERSION_MINOR(rev) (rev & 0xff)
+#define ALE_VERSION_1R3    0x0103
 #define ALE_VERSION_1R4    0x0104
 
 /* ALE Registers */
@@ -46,6 +47,7 @@
 #define ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD  0x94
 #define ALE_UNKNOWNVLAN_REG_MCAST_FLOOD    0x98
 #define ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS 0x9C
+#define ALE_VLAN_MASK_MUX(reg) (0xc0 + (0x4 * (reg)))
 
 #define ALE_TABLE_WRITE    BIT(31)
 
@@ -96,20 +98,34 @@ static inline void cpsw_ale_set_field(u32 *ale_entry, u32 start, u32 bits,
cpsw_ale_set_field(ale_entry, start, bits, value);  \
 }
 
+#define DEFINE_ALE_FIELD1(name, start) \
+static inline int cpsw_ale_get_##name(u32 *ale_entry, u32 bits)        \
+{  \
+   return cpsw_ale_get_field(ale_entry, start, bits);  \
+}  \
+static inline void cpsw_ale_set_##name(u32 *ale_entry, u32 value,  \
+   u32 bits)   \
+{  \
+   cpsw_ale_set_field(ale_entry, start, bits, value);  \
+}
+
 DEFINE_ALE_FIELD(entry_type,   60, 2)
 DEFINE_ALE_FIELD(vlan_id,  48, 12)
 DEFINE_ALE_FIELD(mcast_state,  62, 2)
-DEFINE_ALE_FIELD(port_mask,66, 3)
+DEFINE_ALE_FIELD1(port_mask,   66)
 DEFINE_ALE_FIELD(super,65, 1)
 DEFINE_ALE_FIELD(ucast_type,   62, 2)
-DEFINE_ALE_FIELD(port_num, 66, 2)
+DEFINE_ALE_FIELD1(port_num,66)
 DEFINE_ALE_FIELD(blocked,  65, 1)
 DEFINE_ALE_FIELD(secure,   64, 1)
-DEFINE_ALE_FIELD(vlan_untag_force, 24, 3)
-DEFINE_ALE_FIELD(vlan_reg_mcast,   16, 3)
-DEFINE_ALE_FIELD(vlan_unreg_mcast, 8,  3)
-DEFINE_ALE_FIELD(vlan_member_list, 0,  3)
+DEFINE_ALE_FIELD1(vlan_untag_force,24)
+DEFINE_ALE_FIELD1(vlan_reg_mcast,  16)
+DEFINE_ALE_FIELD1(vlan_unreg_mcast,8)
+DEFINE_ALE_FIELD1(vlan_member_list,0)
 DEFINE_ALE_FIELD(mcast,40, 1)
+/* ALE NetCP nu switch specific */
+DEFINE_ALE_FIELD(vlan_unreg_mcast_idx, 20, 3)
+DEFINE_ALE_FIELD(vlan_reg_mcast_idx,   44, 3)
 
 /* The MAC address field in the ALE entry cannot be macroized as above */
 static inline void cpsw_ale_get_addr(u32 *ale_entry, u8 *addr)
@@ -235,14 +251,16 @@ static void cpsw_ale_flush_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 {
int mask;
 
-   mask = cpsw_ale_get_port_mask(ale_entry);
+   mask = cpsw_ale_get_port_mask(ale_entry,
+ ale->port_mask_bits);
if ((mask & port_mask) == 0)
return; /* ports dont intersect, not interested */
mask &= ~port_mask;
 
/* free if only remaining port is host port */
if (mask)
-   cpsw_ale_set_port_mask(ale_entry, mask);
+   cpsw_ale_set_port_mask(ale_entry, mask,
+  ale->port_mask_bits);
else
cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
 }
@@ -303,7 +321,7 @@ int cpsw_ale_add_ucast(struct cpsw_ale *ale, u8 *addr, int port,
cpsw_ale_set_ucast_type(ale_entry, ALE_UCAST_PERSISTANT);
cpsw_ale_set_secure(ale_entry, (flags & ALE_SECURE) ? 1 : 0);
cpsw_ale_set_blocked(ale_entry, (flags & ALE_BLOCKED) ? 1 : 0);
-   cpsw_ale_set_port_num(ale_entry, port);
+   cpsw_ale_set_port_num(ale_entry, port, ale->port_num_bits);
 
idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
if (idx < 0)
@@ -350,9 +368,11 @@ int cpsw_ale_add_mcast(struct cpsw_ale *ale, u8 *addr, int 
port_mask,
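
The idea behind DEFINE_ALE_FIELD1, accessors whose field width is a run-time argument rather than a compile-time constant, can be sketched in user space as below. This is an illustration under assumptions, not the driver's code: the ale_* names are hypothetical, word 0 is taken to hold bits 0..31 of the entry (the driver's word order may differ), fields are assumed not to straddle a 32-bit word, and the 9-bit width in the usage note is an arbitrary example value.

```c
#include <stdint.h>

/* Low `bits` bits set; valid for bits < 32. */
#define BITMASK(bits) ((1u << (bits)) - 1)

/* Read `bits` bits starting at bit `start` of a 3-word entry. */
static uint32_t ale_get_field(const uint32_t *entry, uint32_t start,
			      uint32_t bits)
{
	return (entry[start / 32] >> (start % 32)) & BITMASK(bits);
}

/* Overwrite `bits` bits starting at bit `start` with `value`. */
static void ale_set_field(uint32_t *entry, uint32_t start, uint32_t bits,
			  uint32_t value)
{
	uint32_t idx = start / 32, shift = start % 32;

	entry[idx] &= ~(BITMASK(bits) << shift);
	entry[idx] |= (value & BITMASK(bits)) << shift;
}

/* Same shape as the port_mask accessors: start is fixed, width is passed in. */
static uint32_t ale_get_port_mask(const uint32_t *entry, uint32_t bits)
{
	return ale_get_field(entry, 66, bits);
}

static void ale_set_port_mask(uint32_t *entry, uint32_t value, uint32_t bits)
{
	ale_set_field(entry, 66, bits, value);
}
```

Writing 0x1ff with a width of 3 stores only 0x7, while the same write with a width of 9 stores the full value, which is why the callers in the patch pass ale->port_mask_bits.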

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread Theodore Ts'o
On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
> 1) Anything that requires actual long-term security will use
> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> things like TCP sequence numbers. This seems pretty uncontroversial to
> me. Seem okay to you?

Um, why do TCP sequence numbers need long-term security?  So long as
you rekey every 5 minutes or so, TCP sequence numbers don't need any
more security than that, since even if you break the key used to
generate initial sequence numbers even a minute or two later, any
pending TCP connections will have timed out long before.

See the security analysis done in RFC 6528[1], where among other
things, it points out why MD5 is acceptable with periodic rekeying,
although there is the concern that this could break certain heuristics
used when establishing new connections during the TIME-WAIT state.

[1] https://tools.ietf.org/html/rfc6528

- Ted


[GIT] Networking

2016-12-20 Thread David Miller

1) Use rb_entry() instead of hardcoded container_of(), from Geliang Tang.

2) Use correct memory barriers in stmmac driver, from Pavel Machek.

3) Fix assoc bind address handling in SCTP, from Xin Long.

4) Make the length check for UFO handling consistent between
   __ip_append_data() and ip_finish_output(), from Zheng Li.

5) HSI driver compatible strings were busted for hix5hd2, from Dongpo
   Li.

6) Handle devm_ioremap() errors properly in cavium driver, from Arvind
   Yadav.

Please pull, thanks a lot!

The following changes since commit 52f40e9d657cc126b766304a5dd58ad73b02ff46:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2016-12-17 20:17:04 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to a763f78cea845c91b8d91f93dabf70c407635dc5:

  RDS: use rb_entry() (2016-12-20 14:22:49 -0500)


Arvind Yadav (1):
  net: ethernet: cavium: octeon: octeon_mgmt: Handle return NULL error from devm_ioremap

David S. Miller (4):
  Merge branch 'phy-broken-modes'
  Merge branch 'fsl-fixes'
  Merge branch 'hix5hd2_gmac-compatible-string'
  Merge branch 'sctp-fixes'

Dongpo Li (2):
  net: hix5hd2_gmac: fix compatible strings name
  ARM: dts: hix5hd2: don't change the existing compatible string

Geliang Tang (4):
  net/mlx5: use rb_entry()
  net_sched: sch_fq: use rb_entry()
  net_sched: sch_netem: use rb_entry()
  RDS: use rb_entry()

Jarno Rajahalme (1):
  openvswitch: Add a missing break statement.

Madalin Bucur (4):
  fsl/fman: fix 1G support for QSGMII interfaces
  powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
  fsl/fman: A007273 only applies to PPC SoCs
  fsl/fman: enable compilation on ARM64

Pavel Machek (1):
  stmmac: fix memory barriers

Tobias Klauser (1):
  ethernet: sfc: Add Kconfig entry for vendor Solarflare

WingMan Kwok (2):
  net: netcp: ethss: fix errors in ethtool ops
  net: netcp: ethss: fix 10gbe host port tx pri map configuration

Xin Long (2):
  sctp: reduce indent level in sctp_copy_local_addr_list
  sctp: not copying duplicate addrs to the assoc's bind address list

jbrunet (3):
  net: phy: fix sign type error in genphy_config_eee_advert
  net: phy: use boolean dt properties for eee broken modes
  dt: bindings: net: use boolean dt properties for eee broken modes

zheng li (1):
  ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

 Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt | 13 -
 Documentation/devicetree/bindings/net/phy.txt                    | 10 --
 arch/arm/boot/dts/hisi-x5hd2.dtsi                                |  4 ++--
 arch/powerpc/platforms/85xx/corenet_generic.c                    |  3 ---
 drivers/net/ethernet/Kconfig                                     |  1 -
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c                 |  6 ++
 drivers/net/ethernet/freescale/fman/Kconfig                      |  2 +-
 drivers/net/ethernet/freescale/fman/fman.c                       | 15 +++
 drivers/net/ethernet/freescale/fman/mac.c                        |  1 +
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c                    | 13 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c            |  2 +-
 drivers/net/ethernet/sfc/Kconfig                                 | 21 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c               |  4 ++--
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c                   |  2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c                |  8
 drivers/net/ethernet/ti/netcp_ethss.c                            | 24 ++--
 drivers/net/phy/phy_device.c                                     | 22 +-
 include/dt-bindings/net/mdio.h                                   | 19 ---
 net/ipv4/ip_output.c                                             |  2 +-
 net/openvswitch/flow_netlink.c                                   |  1 +
 net/rds/rdma.c                                                   |  2 +-
 net/sched/sch_fq.c                                               | 14 +++---
 net/sched/sch_netem.c                                            |  2 +-
 net/sctp/bind_addr.c                                             |  3 +++
 net/sctp/protocol.c                                              | 40 ++--
 25 files changed, 148 insertions(+), 86 deletions(-)
 delete mode 100644 include/dt-bindings/net/mdio.h


[ANNOUNCE] nftables 0.7 release

2016-12-20 Thread Pablo Neira Ayuso
Hi!

The Netfilter project proudly presents:

nftables 0.7

This release contains many accumulated bug fixes and new features
available up to the (upcoming) Linux 4.10-rc1 kernel release.

* Facilitate migration from iptables to nftables:

  At compilation time, you have to pass this option.

  # ./configure --with-xtables

  And libxtables needs to be installed in your system. This allows you
  to list a ruleset containing xt extensions loaded through
  iptables-compat-restore tool. The nft tool provides a native
  translation for iptables extensions (if available).

* Add new fib expression, which can be used to obtain the output
  interface from the route table based on either source or destination
  address of a packet. This can be used, for example, to add reverse path
  filtering, e.g. drop if the packet did not arrive on the interface that
  would be used to reach its source:

  # nft add rule x prerouting fib saddr . iif oif eq 0 drop

  Accept only if from eth0:

  # nft add rule x prerouting fib saddr . iif oif eq "eth0" accept

  Accept if from any valid interface:

  # nft add rule x prerouting fib saddr oif accept

  Querying of address type is also supported, this can be used
  to only accept packets to addresses configured in the same
  interface, eg.

  # nft add rule x prerouting fib daddr . iif type local accept

  It's also possible to use mark and verdict map, eg.

  # nft add rule x prerouting \
meta mark set 0xdead fib daddr . mark type vmap {
blackhole : drop,
prohibit : drop,
unicast : accept
}

* Support hashing of any arbitrary key combination, eg.

  # nft add rule x y \
dnat to jhash ip saddr . tcp dport mod 2 map { \
0 : 192.168.20.100, \
1 : 192.168.30.100 \
}

  Another usecase: Set packet marks based on any arbitrary hashing.

* Add number generation support. Useful for round-robin packet mark
  setting, eg.

  # nft add rule filter prerouting meta mark set numgen inc mod 2

  You can also specify an offset to indicate from what value you want
  to start from.

  The modulus provides the scale of the counting sequence. You can
  also use this from maps, eg.

  # nft add rule nat prerouting \
dnat to numgen inc mod 2 map { 0 : 192.168.10.100, 1 : 192.168.20.200 }

  So this is distributing new connections in a round-robin fashion
  between 192.168.10.100 and 192.168.20.200. Don't forget the special NAT
  chain semantics: Only the first packet evaluates the rule, follow up
  packets rely on conntrack to apply the NAT information.

  You can also emulate flow distribution with different backend weights
  using intervals, eg.

  # nft add rule nat prerouting \
dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 }
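
  The weighted distribution above can be modelled in a few lines of C. This is
  a rough sketch, not how nftables implements numgen: struct numgen,
  numgen_inc() and pick_backend() are hypothetical names, and the value the
  real counter starts from is an assumption here.

```c
#include <stdint.h>

/* Hypothetical model of "numgen inc mod N": a counter that wraps at N. */
struct numgen {
	uint32_t next;    /* value returned by the next call */
	uint32_t modulus; /* N in "mod N" */
};

static uint32_t numgen_inc(struct numgen *g)
{
	uint32_t v = g->next;

	g->next = (v + 1) % g->modulus;
	return v;
}

/* Interval map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 } */
static const char *pick_backend(uint32_t n)
{
	return n <= 5 ? "192.168.10.100" : "192.168.20.200";
}
```

  Over any 10 consecutive new connections, 6 go to the first backend and 4 to
  the second, giving a 60/40 split regardless of where the counter starts.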

* Add quota support, eg.

  # nft add rule filter input \
flow table http { ip saddr timeout 60s quota over 50 mbytes } drop

  This creates a flow table, where every flow gets a quota of 50
  mbytes. You can also use simple rules to enforce quotas, of
  course.

* Introduce routing expression, for routing related data with support
  for nexthop (i.e. the directly connected IP address that an outgoing
  packet is sent to), which can be used either for matching or accounting, eg.

 # nft add rule filter postrouting \
  ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop

  This will drop any traffic to 192.168.1.0/24 that is not routed via
  192.168.0.1.

 # nft add rule filter postrouting \
  flow table acct { rt nexthop timeout 600s counter }

 # nft add rule ip6 filter postrouting \
  flow table acct { rt nexthop timeout 600s counter }

  These rules count outgoing traffic per nexthop. Note that the timeout
  releases an entry if no traffic is seen for this nexthop within 10
  minutes.

* Notrack support, to explicitly skip connection tracking for matching
  packets, eg.

 # nft add rule ip raw prerouting tcp dport { 80, 443 } notrack

  So you can skip tracking for http and https traffic.

* Support to set non-byte bound packet header fields, including
  checksum adjustment, eg. ip6 ecn set 1.

* Add 'create set' and 'create element' commands, eg.

 # nft add set x y { type ipv4_addr\; }
 # nft create set x y { type ipv4_addr\; }
 <cmdline>:1:1-35: Error: Could not process rule: File exists
 create set x y { type ipv4_addr; }
 ^^^
 # nft add set x y { type ipv4_addr\; }
 #

  So 'create' bails out if the set already exists, while 'add'
  doesn't, for more ergonomic usage as several users requested on
  the mailing list.
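
  The difference between the two commands can be pictured with a toy
  model. The Ruleset class below is purely hypothetical and ignores
  everything nft actually validates; it only shows that 'add' is
  idempotent while 'create' fails on an existing set.

```python
class Ruleset:
    """Toy container for named sets, keyed by (table, set name)."""

    def __init__(self):
        self.sets = {}

    def add_set(self, table, name, set_type):
        # 'add' succeeds whether or not the set already exists.
        self.sets.setdefault((table, name), set_type)

    def create_set(self, table, name, set_type):
        # 'create' bails out if the set is already there.
        if (table, name) in self.sets:
            raise FileExistsError("Could not process rule: File exists")
        self.sets[(table, name)] = set_type

if __name__ == "__main__":
    rs = Ruleset()
    rs.add_set("x", "y", "ipv4_addr")
    rs.add_set("x", "y", "ipv4_addr")      # no error
    try:
        rs.create_set("x", "y", "ipv4_addr")
    except FileExistsError as err:
        print("Error:", err)
```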

* Allow to use variable reference for set element definitions, eg.

  # cat ruleset.nft
define s-ext-2-int = { 10.10.10.10 . 25, 10.10.10.10 . 143 }

table inet forward {
set s-ext-2-int {
 type ipv4_addr . inet_service
 elements = $s-ext-2-int
}
}
  # 

[PATCH 3/5 net-next] inet: don't check for bind conflicts twice when searching for a port

2016-12-20 Thread Josef Bacik
This is just wasted time, we've already found a tb that doesn't have a bind
conflict, and we don't drop the head lock so scanning again isn't going to give
us a different answer.  Instead move the tb->reuse setting logic outside of the
found_tb path and put it in the success: path.  Then make it so that we don't
goto again if we find a bind conflict in the found_tb path as we won't reach
this anymore when we are scanning for an ephemeral port.

Signed-off-by: Josef Bacik 
---
 net/ipv4/inet_connection_sock.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1a1a94bd..fc9bfe1 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -92,7 +92,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 {
bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-   int ret = 1, attempts = 5, port = snum;
+   int ret = 1, port = snum;
struct inet_bind_hashbucket *head;
struct net *net = sock_net(sk);
int i, low, high, attempt_half;
@@ -100,6 +100,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
kuid_t uid = sock_i_uid(sk);
u32 remaining, offset;
bool reuseport_ok = !!snum;
+   bool empty_tb = true;
 
if (port) {
head = &hinfo->bhash[inet_bhashfn(net, port,
@@ -111,7 +112,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 
goto tb_not_found;
}
-again:
attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
inet_get_local_port_range(net, &low, &high);
@@ -148,8 +148,12 @@ other_parity_scan:
spin_lock_bh(&head->lock);
inet_bind_bucket_for_each(tb, &head->chain)
if (net_eq(ib_net(tb), net) && tb->port == port) {
-   if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
-   goto tb_found;
+   if (hlist_empty(&tb->owners))
+   goto success;
+   if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) {
+   empty_tb = false;
+   goto success;
+   }
goto next_port;
}
goto tb_not_found;
@@ -184,23 +188,12 @@ tb_found:
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
  sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
goto success;
-   if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
-   if ((reuse ||
-(tb->fastreuseport > 0 &&
- sk->sk_reuseport &&
- !rcu_access_pointer(sk->sk_reuseport_cb) &&
- uid_eq(tb->fastuid, uid))) && !snum &&
-   --attempts >= 0) {
-   spin_unlock_bh(&head->lock);
-   goto again;
-   }
+   if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
goto fail_unlock;
-   }
-   if (!reuse)
-   tb->fastreuse = 0;
-   if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
-   tb->fastreuseport = 0;
-   } else {
+   empty_tb = false;
+   }
+success:
+   if (empty_tb) {
tb->fastreuse = reuse;
if (sk->sk_reuseport) {
tb->fastreuseport = 1;
@@ -208,8 +201,12 @@ tb_found:
} else {
tb->fastreuseport = 0;
}
+   } else {
+   if (!reuse)
+   tb->fastreuse = 0;
+   if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+   tb->fastreuseport = 0;
}
-success:
if (!inet_csk(sk)->icsk_bind_hash)
inet_bind_hash(sk, tb, port);
WARN_ON(inet_csk(sk)->icsk_bind_hash != tb);
-- 
2.9.3



[PATCH 5/5 net-next] inet: reset tb->fastreuseport when adding a reuseport sk

2016-12-20 Thread Josef Bacik
If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
never set it again.  Which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add a sock_common to the tb so we know what reuseport sk succeeded last.
Once one sk has made it onto the list we know that there are no potential bind
conflicts on the owners list that match that sk's rcv_addr.  So copy the sk's
common into our tb->fastsock and set tb->fastreuseport to FASTREUSEPORT_STRICT so
we know we have to do an extra check for subsequent reuseport sockets and skip
the expensive bind conflict check.

Signed-off-by: Josef Bacik 
---
 include/net/inet_hashtables.h   |  4 
 net/ipv4/inet_connection_sock.c | 53 +
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 50f635c..b776401 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -74,12 +74,16 @@ struct inet_ehash_bucket {
  * users logged onto your box, isn't it nice to know that new data
  * ports are created in O(1) time?  I thought so. ;-)  -DaveM
  */
+#define FASTREUSEPORT_ANY  1
+#define FASTREUSEPORT_STRICT   2
+
 struct inet_bind_bucket {
possible_net_t  ib_net;
unsigned short  port;
signed char fastreuse;
signed char fastreuseport;
kuid_t  fastuid;
+   struct sock_common  fastsock;
int num_owners;
struct hlist_node   node;
struct hlist_head   owners;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d3ccf62..9e29fad 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -164,6 +164,32 @@ success:
return head;
 }
 
+static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
+struct sock *sk)
+{
+   struct sock *sk2 = (struct sock *)&tb->fastsock;
+   kuid_t uid = sock_i_uid(sk);
+
+   if (tb->fastreuseport <= 0)
+   return 0;
+   if (!sk->sk_reuseport)
+   return 0;
+   if (rcu_access_pointer(sk->sk_reuseport_cb))
+   return 0;
+   if (!uid_eq(tb->fastuid, uid))
+   return 0;
+   /* We only need to check the rcv_saddr if this tb was once marked
+* without fastreuseport and then was reset, as we can only know that
+* the fastsock has no potential bind conflicts with the rest of the
+* possible socks on the owners list.
+*/
+   if (tb->fastreuseport == FASTREUSEPORT_ANY)
+   return 1;
+   if (!inet_csk(sk)->icsk_af_ops->rcv_saddr_equal(sk, sk2, true))
+   return 0;
+   return 1;
+}
+
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
  * We try to allocate an odd port (and leave even ports for connect())
@@ -206,9 +232,7 @@ tb_found:
goto success;
 
if ((tb->fastreuse > 0 && reuse) ||
-(tb->fastreuseport > 0 &&
- !rcu_access_pointer(sk->sk_reuseport_cb) &&
- sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
+   sk_reuseport_match(tb, sk))
goto success;
if (inet_csk_bind_conflict(sk, tb, true, true))
goto fail_unlock;
@@ -220,14 +244,35 @@ success:
if (sk->sk_reuseport) {
tb->fastreuseport = 1;
tb->fastuid = uid;
+   memcpy(&tb->fastsock, &sk->__sk_common,
+  sizeof(struct sock_common));
} else {
tb->fastreuseport = 0;
}
} else {
if (!reuse)
tb->fastreuse = 0;
-   if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+   if (sk->sk_reuseport) {
+   /* We didn't match or we don't have fastreuseport set on
+* the tb, but we have sk_reuseport set on this socket
+* and we know that there are no bind conflicts with
+* this socket in this tb, so reset our tb's reuseport
+* settings so that any subsequent sockets that match
+* our current socket will be put on the fast path.
+*
+* If we reset we need to set FASTREUSEPORT_STRICT so we
+* do extra checking for all subsequent sk_reuseport
+* socks.
+*/
+   if (!sk_reuseport_match(tb, sk)) {
+   

[PATCH 4/5 net-next] inet: split inet_csk_get_port into two functions

2016-12-20 Thread Josef Bacik
inet_csk_get_port does two different things: it either scans for an open port,
or it tries to see if the specified port is available for use.  Since these two
operations have different rules and are basically independent, let's split them
into two different functions to make them both more readable.

Signed-off-by: Josef Bacik 
---
 net/ipv4/inet_connection_sock.c | 72 +++--
 1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index fc9bfe1..d3ccf62 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -84,34 +84,21 @@ static int inet_csk_bind_conflict(const struct sock *sk,
return sk2 != NULL;
 }
 
-/* Obtain a reference to a local port for the given sock,
- * if snum is zero it means select any available local port.
- * We try to allocate an odd port (and leave even ports for connect())
+/*
+ * Find an open port number for the socket.  Returns with the
+ * inet_bind_hashbucket lock held.
  */
-int inet_csk_get_port(struct sock *sk, unsigned short snum)
+static struct inet_bind_hashbucket *
+inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *port_ret)
 {
-   bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-   int ret = 1, port = snum;
+   int port = 0;
struct inet_bind_hashbucket *head;
struct net *net = sock_net(sk);
int i, low, high, attempt_half;
struct inet_bind_bucket *tb;
-   kuid_t uid = sock_i_uid(sk);
u32 remaining, offset;
-   bool reuseport_ok = !!snum;
-   bool empty_tb = true;
 
-   if (port) {
-   head = &hinfo->bhash[inet_bhashfn(net, port,
- hinfo->bhash_size)];
-   spin_lock_bh(&head->lock);
-   inet_bind_bucket_for_each(tb, &head->chain)
-   if (net_eq(ib_net(tb), net) && tb->port == port)
-   goto tb_found;
-
-   goto tb_not_found;
-   }
attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
inet_get_local_port_range(net, &low, &high);
@@ -150,13 +137,12 @@ other_parity_scan:
if (net_eq(ib_net(tb), net) && tb->port == port) {
if (hlist_empty(&tb->owners))
goto success;
-   if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) {
-   empty_tb = false;
+   if (!inet_csk_bind_conflict(sk, tb, false, false))
goto success;
-   }
goto next_port;
}
-   goto tb_not_found;
+   tb = NULL;
+   goto success;
 next_port:
spin_unlock_bh(&head->lock);
cond_resched();
@@ -171,8 +157,44 @@ next_port:
attempt_half = 2;
goto other_half_scan;
}
-   return ret;
+   return NULL;
+success:
+   *port_ret = port;
+   *tb_ret = tb;
+   return head;
+}
 
+/* Obtain a reference to a local port for the given sock,
+ * if snum is zero it means select any available local port.
+ * We try to allocate an odd port (and leave even ports for connect())
+ */
+int inet_csk_get_port(struct sock *sk, unsigned short snum)
+{
+   bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
+   struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
+   int ret = 1, port = snum;
+   struct inet_bind_hashbucket *head;
+   struct net *net = sock_net(sk);
+   struct inet_bind_bucket *tb = NULL;
+   kuid_t uid = sock_i_uid(sk);
+   bool empty_tb = true;
+
+   if (!port) {
+   head = inet_csk_find_open_port(sk, &tb, &port);
+   if (!head)
+   return 1;
+   if (!tb)
+   goto tb_not_found;
+   if (!hlist_empty(&tb->owners))
+   empty_tb = false;
+   goto success;
+   }
+   head = &hinfo->bhash[inet_bhashfn(net, port,
+ hinfo->bhash_size)];
+   spin_lock_bh(&head->lock);
+   inet_bind_bucket_for_each(tb, &head->chain)
+   if (net_eq(ib_net(tb), net) && tb->port == port)
+   goto tb_found;
 tb_not_found:
tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
 net, head, port);
@@ -188,7 +210,7 @@ tb_found:
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
  sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
goto success;
-   if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
+   if 

[PATCH 2/5 net-next] inet: kill smallest_size and smallest_port

2016-12-20 Thread Josef Bacik
In inet_csk_get_port we seem to be using smallest_port to figure out where the
best place to look for a SO_REUSEPORT sk that matches with an existing set of
SO_REUSEPORT's.  However if we get to the logic

if (smallest_size != -1) {
port = smallest_port;
goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
would have gone to found_tb and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work just
delete this code and save us the time.

Signed-off-by: Josef Bacik 
---
 net/ipv4/inet_connection_sock.c | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 74f6a57..1a1a94bd 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -93,7 +93,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
int ret = 1, attempts = 5, port = snum;
-   int smallest_size = -1, smallest_port;
struct inet_bind_hashbucket *head;
struct net *net = sock_net(sk);
int i, low, high, attempt_half;
@@ -103,7 +102,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
bool reuseport_ok = !!snum;
 
if (port) {
-have_port:
head = &hinfo->bhash[inet_bhashfn(net, port,
  hinfo->bhash_size)];
spin_lock_bh(&head->lock);
@@ -137,8 +135,6 @@ other_half_scan:
 * We do the opposite to not pollute connect() users.
 */
offset |= 1U;
-   smallest_size = -1;
-   smallest_port = low; /* avoid compiler warning */
 
 other_parity_scan:
port = low + offset;
@@ -152,15 +148,6 @@ other_parity_scan:
spin_lock_bh(&head->lock);
inet_bind_bucket_for_each(tb, &head->chain)
if (net_eq(ib_net(tb), net) && tb->port == port) {
-   if (((tb->fastreuse > 0 && reuse) ||
-(tb->fastreuseport > 0 &&
- sk->sk_reuseport &&
- !rcu_access_pointer(sk->sk_reuseport_cb) &&
- uid_eq(tb->fastuid, uid))) &&
-   (tb->num_owners < smallest_size || smallest_size == -1)) {
-   smallest_size = tb->num_owners;
-   smallest_port = port;
-   }
if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
goto tb_found;
goto next_port;
@@ -171,10 +158,6 @@ next_port:
cond_resched();
}
 
-   if (smallest_size != -1) {
-   port = smallest_port;
-   goto have_port;
-   }
offset--;
if (!(offset & 1))
goto other_parity_scan;
@@ -196,19 +179,18 @@ tb_found:
if (sk->sk_reuse == SK_FORCE_REUSE)
goto success;
 
-   if (((tb->fastreuse > 0 && reuse) ||
+   if ((tb->fastreuse > 0 && reuse) ||
 (tb->fastreuseport > 0 &&
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
- sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
-   smallest_size == -1)
+ sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
goto success;
if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
if ((reuse ||
 (tb->fastreuseport > 0 &&
  sk->sk_reuseport &&
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
- uid_eq(tb->fastuid, uid))) &&
-   !snum && smallest_size != -1 && --attempts >= 0) {
+ uid_eq(tb->fastuid, uid))) && !snum &&
+   --attempts >= 0) {
spin_unlock_bh(&head->lock);
goto again;
}
-- 
2.9.3



[PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal

2016-12-20 Thread Josef Bacik
The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict
is how they check the rcv_saddr.  Since we want to be able to check the saddr in
other places just drop the protocol specific ->bind_conflict and replace it with
->rcv_saddr_equal, then make inet_csk_bind_conflict the one true bind conflict
function.

Signed-off-by: Josef Bacik 
---
 include/net/inet6_connection_sock.h |  5 -
 include/net/inet_connection_sock.h  |  9 +++--
 net/dccp/ipv4.c |  3 ++-
 net/dccp/ipv6.c |  2 +-
 net/ipv4/inet_connection_sock.c | 22 +++-
 net/ipv4/tcp_ipv4.c |  3 ++-
 net/ipv4/udp.c  |  1 +
 net/ipv6/inet6_connection_sock.c| 40 -
 net/ipv6/tcp_ipv6.c |  4 ++--
 9 files changed, 18 insertions(+), 71 deletions(-)

diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index 3212b39..8ec87b6 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -15,16 +15,11 @@
 
 #include 
 
-struct inet_bind_bucket;
 struct request_sock;
 struct sk_buff;
 struct sock;
 struct sockaddr;
 
-int inet6_csk_bind_conflict(const struct sock *sk,
-   const struct inet_bind_bucket *tb, bool relax,
-   bool soreuseport_ok);
-
 struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6,
  const struct request_sock *req, u8 proto);
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index ec0479a..9cd43c5 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops {
char __user *optval, int __user *optlen);
 #endif
void(*addr2sockaddr)(struct sock *sk, struct sockaddr *);
-   int (*bind_conflict)(const struct sock *sk,
-const struct inet_bind_bucket *tb,
-bool relax, bool soreuseport_ok);
+   int (*rcv_saddr_equal)(const struct sock *sk1,
+  const struct sock *sk2,
+  bool match_wildcard);
void(*mtu_reduced)(struct sock *sk);
 };
 
@@ -261,9 +261,6 @@ inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
 
 struct sock *inet_csk_accept(struct sock *sk, int flags, int *err);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-  const struct inet_bind_bucket *tb, bool relax,
-  bool soreuseport_ok);
 int inet_csk_get_port(struct sock *sk, unsigned short snum);
 
 struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 9c67a96..1931324 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -901,7 +902,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv4_af_ops = {
.getsockopt= ip_getsockopt,
.addr2sockaddr = inet_csk_addr2sockaddr,
.sockaddr_len  = sizeof(struct sockaddr_in),
-   .bind_conflict = inet_csk_bind_conflict,
+   .rcv_saddr_equal   = ipv4_rcv_saddr_equal,
 #ifdef CONFIG_COMPAT
.compat_setsockopt = compat_ip_setsockopt,
.compat_getsockopt = compat_ip_getsockopt,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 4663a01..45242b8 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -926,7 +926,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_af_ops = {
.getsockopt= ipv6_getsockopt,
.addr2sockaddr = inet6_csk_addr2sockaddr,
.sockaddr_len  = sizeof(struct sockaddr_in6),
-   .bind_conflict = inet6_csk_bind_conflict,
+   .rcv_saddr_equal   = ipv6_rcv_saddr_equal,
 #ifdef CONFIG_COMPAT
.compat_setsockopt = compat_ipv6_setsockopt,
.compat_getsockopt = compat_ipv6_getsockopt,
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 5f44fa1..74f6a57 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -44,9 +44,9 @@ void inet_get_local_port_range(struct net *net, int *low, int *high)
 }
 EXPORT_SYMBOL(inet_get_local_port_range);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-  const struct inet_bind_bucket *tb, bool relax,
-  bool reuseport_ok)
+static int inet_csk_bind_conflict(const struct sock *sk,
+ const struct inet_bind_bucket *tb,
+ bool relax, bool reuseport_ok)
 {
struct sock *sk2;
bool reuse = sk->sk_reuse;
@@ -62,7 +62,6 @@ int inet_csk_bind_conflict(const 

[RFC][PATCH 0/5 net-next] Rework inet_csk_get_port

2016-12-20 Thread Josef Bacik
At some point recently the guys working on our load balancer added the ability
to use SO_REUSEPORT.  When they restarted their app with this option enabled
they immediately hit a softlockup on what appeared to be the
inet_bind_bucket->lock.  Eventually what all of our debugging and discussion led
us to was the fact that the application comes up without SO_REUSEPORT, shuts
down which creates around 100k twsk's, and then comes up and tries to open a
bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket
owners list under the lock.  Since this lock is needed for dealing with the
twsk's and basically anything else related to connections we would softlockup,
and sometimes not ever recover.

To solve this problem I did what you see in Patch 5/5.  Once we have a
SO_REUSEPORT socket on the tb->owners list we know that the socket has no
conflicts with any of the other sockets on that list.  So we can add a copy of
the sock_common (really all we need is the rcv_saddr but it seemed ugly to copy
just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've
copied the whole common) in order to check subsequent SO_REUSEPORT sockets.  If
they match the previous one then we can skip the expensive
inet_csk_bind_conflict check.  This is what eliminated the soft lockup that we
were seeing.
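
The fast path described above can be sketched as a small Python model.  This
is not the kernel code: the Sock and BindBucket classes, the remember() helper
and the IPv4-only wildcard comparison are simplifications invented for
illustration.  Only the order of the checks follows sk_reuseport_match() from
Patch 5/5.

```python
from dataclasses import dataclass
from typing import Optional

FASTREUSEPORT_ANY = 1
FASTREUSEPORT_STRICT = 2
WILDCARD = "0.0.0.0"  # assumption: IPv4 only, wildcard matches anything

@dataclass
class Sock:
    reuseport: bool
    uid: int
    rcv_saddr: str
    has_reuseport_cb: bool = False

@dataclass
class BindBucket:
    fastreuseport: int = 0
    fastuid: int = -1
    fastsock: Optional[Sock] = None

def rcv_saddr_equal(a, b):
    """Simplified address comparison with wildcard matching enabled."""
    return a.rcv_saddr == b.rcv_saddr or WILDCARD in (a.rcv_saddr, b.rcv_saddr)

def reuseport_match(tb, sk):
    """True if sk may skip the expensive walk over the owners list."""
    if tb.fastreuseport <= 0 or not sk.reuseport:
        return False
    if sk.has_reuseport_cb or tb.fastuid != sk.uid:
        return False
    if tb.fastreuseport == FASTREUSEPORT_ANY:
        return True
    # STRICT: only sockets matching the cached one are known to be safe.
    return rcv_saddr_equal(sk, tb.fastsock)

def remember(tb, sk):
    """After sk passed the full conflict check, cache it for later sockets."""
    tb.fastreuseport = FASTREUSEPORT_STRICT
    tb.fastuid = sk.uid
    tb.fastsock = sk
```

In this model the first SO_REUSEPORT socket still pays for the full scan, and
every later socket with the same uid and address takes the cheap path.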

Patches 1-4 are cleanups and re-workings.  For instance when we specify port ==
0 we need to find an open port, but we would do two passes through
inet_csk_bind_conflict every time we found a possible port.  We would also keep
track of the smallest_port value in order to try and use it if we found no
port our first run through.  This however made no sense as it would have had to
fail the first pass through inet_csk_bind_conflict, so would not actually pass
the second pass through either.  Finally I split the function into two functions
in order to make it easier to read and to distinguish between the two behaviors.

I have tested this on one of our load balancing boxes during peak traffic and it
hasn't fallen over.  But this is not my area, so obviously feel free to point
out where I'm being stupid and I'll get it fixed up and retested.  Thanks,

Josef


Re: nfc: trf7970a: Prevent repeated polling from crashing the kernel

2016-12-20 Thread Mark Greer
On Tue, Dec 20, 2016 at 02:13:52PM -0500, Justin Bronder wrote:
> On 20/12/16 11:59 -0700, Mark Greer wrote:
> > On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> > > From: Jaret Cantu 
> > > 
> > > Repeated polling attempts cause a NULL dereference error to occur.
> > > This is because the state of the trf7970a is currently reading but
> > > another request has been made to send a command before it has finished.
> > 
> > How is this happening?  Was trf7970a_abort_cmd() called and it didn't
> > work right?  Was it not called at all and there is a bug in the digital
> > layer?  More details please.
> > 
> > > The solution is to properly kill the waiting reading (workqueue)
> > > before failing on the send.
> > 
> > If the bug is in the calling code, then that is what should get fixed.
> > This seems to be a hack to work-around a digital layer bug.
> 
> One of our uses of NFC is to begin polling to read a tag and then stop polling
> (in order to save power) until we know via user interaction that we need to poll
> again.  This is typically many minutes later so the power saving is pretty
> significant.  However, it's possible that a user will remove the tag before
> reading has completed.  We also detect this case and stop polling.  I can go
> more into this if necessary but that is what exposed a panic.
> 
> You can reproduce using neard and python; in our testing it was very likely to
> occur in 10-100 iterations of the following:
> 
> #!/usr/bin/python
> import time
> 
> import dbus
> 
> bus = dbus.SystemBus()
> nfc0 = bus.get_object('org.neard', '/org/neard/nfc0')
> props = dbus.Interface(nfc0, 'org.freedesktop.DBus.Properties')
> 
> try:
>     props.Set('org.neard.Adapter', 'Powered', dbus.Boolean(1))
> except:
>     pass
> 
> adapter = dbus.Interface(nfc0, 'org.neard.Adapter')
> 
> for i in range(1000):
>     adapter.StartPollLoop('Initiator')
>     time.sleep(0.1)
>     adapter.StopPollLoop()
>     print(i)
> 
> I believe the last time we tested this was around the 4.1 release.

Thanks for the info, Justin, but I was also seeking more information
at the kernel NFC subsystem and trf7970a driver level.  This patch
adds code inside an 'if' in the driver whose condition should never
evaluate to true but apparently it did.  How?

Thanks,

Mark
--


Re: ipv6: handle -EFAULT from skb_copy_bits

2016-12-20 Thread Dave Jones
On Tue, Dec 20, 2016 at 01:28:13PM -0500, David Miller wrote:
 
 > This has to do with the SKB buffer layout and geometry, not whether
 > the packet is "fragmented" in the protocol sense.
 > 
 > So no, this isn't a criteria for packets being filtered out by this
 > point.
 > 
 > Can you try to capture what sk->sk_socket->type and
 > inet_sk(sk)->hdrincl are set to at the time of the crash?
 > 

type:3 hdrincl:0

Dave



Re: ipv6: handle -EFAULT from skb_copy_bits

2016-12-20 Thread Cong Wang
On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones  wrote:
> On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
>  > From: Dave Jones 
>  > Date: Mon, 19 Dec 2016 19:40:13 -0500
>  >
>  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
>  > >
>  > >  > Unfortunately, this made no difference.  I spent some time today trying
>  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
>  > >  >
>  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
>  > >  > I can trigger it with Trinity.
>  > >
>  > > scratch that last part, I finally just repro'd it with a single process.
>  >
>  > Thanks for the info, I'll try to think about this some more.
>
> I threw in some debug printks right before that BUG_ON.
> it's always this:
>
> skb->len=31 skb->data_len=0 offset:30 total_len:9

Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here,
off-by-one?
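
The arithmetic can be checked with a tiny sketch.  It assumes the guard at the
top of skb_copy_bits() rejects a copy when offset > skb->len - len, and that
the copy length is 2 bytes, as implied by the "31 - 2" above.  The function
name is made up for illustration.

```python
def copy_would_fault(skb_len, offset, length):
    """Model of the bounds guard: the copy must fit inside the buffer."""
    return offset > skb_len - length

if __name__ == "__main__":
    # Values from the debug printk: skb->len=31, offset=30, 2 bytes wanted.
    print(copy_would_fault(31, 30, 2))
    # One byte earlier the same copy would fit.
    print(copy_would_fault(31, 29, 2))
```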


Re: [PATCH net] be2net: Increase skb headroom size to 256 bytes

2016-12-20 Thread David Miller
From: Suresh Reddy 
Date: Tue, 20 Dec 2016 10:14:30 -0500

> From: Kalesh A P 
> 
> The driver currently allocates 128 bytes of skb headroom.
> This was found to be insufficient with some configurations
> like Geneve tunnels, which resulted in skb head reallocations.
> 
> Increase the headroom to 256 bytes to fix this.
> 
> Signed-off-by: Kalesh A P 
> Signed-off-by: Suresh Reddy 

Adding 128 bytes of headroom just for geneve seems excessive.

Do you really need to add that much?


Re: [mm PATCH 0/3] Page fragment updates

2016-12-20 Thread Alexander Duyck
On Mon, Dec 5, 2016 at 12:11 PM, Andrew Morton
 wrote:
> On Mon, 5 Dec 2016 09:01:12 -0800 Alexander Duyck wrote:
>
>> On Tue, Nov 29, 2016 at 10:23 AM, Alexander Duyck
>>  wrote:
>> > This patch series takes care of a few cleanups for the page fragments API.
>> >
>> > ...
>>
>> It's been about a week since I submitted this series.  Just wanted to
>> check in and see if anyone had any feedback or if this is good to be
>> accepted for 4.10-rc1 with the rest of the set?
>
> Looks good to me.  I have it all queued for post-4.9 processing.

So I guess there is a small bug in the first patch in that I was
comparing a pointer to 0 instead of NULL.  Just wondering if I
should resubmit the first patch, the whole series, or if I need to
just submit an incremental patch.

Thanks.

- Alex


Re: [PATCH] net_sched: sch_netem: use rb_entry()

2016-12-20 Thread David Miller
From: Geliang Tang 
Date: Tue, 20 Dec 2016 22:02:16 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 

Applied.


Re: [PATCH] net_sched: sch_fq: use rb_entry()

2016-12-20 Thread David Miller
From: Geliang Tang 
Date: Tue, 20 Dec 2016 22:02:15 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 

Applied.


Re: [PATCH] net/mlx5: use rb_entry()

2016-12-20 Thread David Miller
From: Geliang Tang 
Date: Tue, 20 Dec 2016 22:02:14 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 

Applied.


Re: [PATCH] RDS: use rb_entry()

2016-12-20 Thread David Miller
From: Geliang Tang 
Date: Tue, 20 Dec 2016 22:02:18 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 

Applied.


Re: [PATCH] ethernet: sfc: Add Kconfig entry for vendor Solarflare

2016-12-20 Thread David Miller
From: Tobias Klauser 
Date: Tue, 20 Dec 2016 14:38:26 +0100

> Since commit
> 
>   5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
> 
> there are two drivers for Solarflare devices, but both still show up
> directly beneath "Ethernet driver support" in the Kconfig. Follow the
> pattern of other vendors and group them beneath an own vendor Kconfig
> entry for Solarflare.
> 
> Cc: Edward Cree 
> Signed-off-by: Tobias Klauser 

Applied.


Re: [GIT PULL 00/29] perf/core improvements and fixes

2016-12-20 Thread Ingo Molnar

* Arnaldo Carvalho de Melo <a...@kernel.org> wrote:

> Hi Ingo,
> 
> Please consider pulling, I had most of this queued before your first
> pull req to Linus for 4.10, most are fixes, with 'perf sched timehist --idle'
> as a followup new feature to the 'perf sched timehist' command introduced in
> this window.
>   
>   One other thing that delayed this was the samples/bpf/ switch to
> tools/lib/bpf/ that involved fixing up merge clashes with net.git and also
> to properly test it, after more rounds than anticipated, but all seems ok
> now and would be good to get these merge issues past us ASAP.
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba:
> 
>   Merge tag 'docs-4.10' of git://git.lwn.net/linux (2016-12-12 21:58:13 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161220
> 
> for you to fetch changes up to 9899694a7f67714216665b87318eb367e2c5c901:
> 
>   samples/bpf: Move open_raw_sock to separate header (2016-12-20 12:00:40 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> New features:
> 
> - Introduce 'perf sched timehist --idle', to analyse processes
>   going to/from idle state (Namhyung Kim)
> 
> Fixes:
> 
> - Allow 'perf record -u user' to continue when facing races with threads
>   going away after having scanned them via /proc (Jiri Olsa)
> 
> - Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)
> 
> - Support jumps with multiple arguments (Ravi Bangoria)
> 
> - Fix jumps to before the function where they are located (Ravi Bangoria)
> 
> - Fix lock-pi help string (Davidlohr Bueso)
> 
> - Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)
> 
> - Do not overwrite valid build id in 'perf diff' (Kan Liang)
> 
> - Don't throw error for zero length symbols, allowing the use of the TUI
>   in PowerPC, where such symbols became more common recently (Ravi Bangoria)
> 
> Infrastructure:
> 
> - Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf
>   duplication (Joe Stringer)
> 
> - Move headers check into bash script (Jiri Olsa)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
> 
> 
> Arnaldo Carvalho de Melo (3):
>   perf tools: Remove some needless __maybe_unused
>   samples/bpf: Make perf_event_read() static
>   samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
> 
> Davidlohr Bueso (1):
>   perf bench futex: Fix lock-pi help string
> 
> Jiri Olsa (7):
>   perf tools: Move headers check into bash script
>   perf mem: Fix --all-user/--all-kernel options
>   perf evsel: Use variable instead of repeating lengthy FD macro
>   perf thread_map: Add thread_map__remove function
>   perf evsel: Allow to ignore missing pid
>   perf record: Force ignore_missing_thread for uid option
>   perf trace: Check if MAP_32BIT is defined (again)
> 
> Joe Stringer (8):
>   tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
>   tools lib bpf: use __u32 from linux/types.h
>   tools lib bpf: Add flags to bpf_create_map()
>   samples/bpf: Make samples more libbpf-centric
>   samples/bpf: Switch over to libbpf
>   tools lib bpf: Add bpf_prog_{attach,detach}
>   samples/bpf: Remove perf_event_open() declaration
>   samples/bpf: Move open_raw_sock to separate header
> 
> Kan Liang (1):
>   perf diff: Do not overwrite valid build id
> 
> Namhyung Kim (6):
>   perf sched timehist: Split is_idle_sample()
>   perf sched timehist: Introduce struct idle_time_data
>   perf sched timehist: Save callchain when entering idle
>   perf sched timehist: Skip non-idle events when necessary
>   perf sched timehist: Add -I/--idle-hist option
>   perf sched timehist: Show callchains for idle stat
> 
> Ravi Bangoria (3):
>   perf annotate: Support jump instruction with target as second operand
>   perf annotate: Fix jump target outside of function address range
>   perf annotate: Don't throw error for zero length symbols
> 
>  samples/bpf/Makefile  |  70 +--
>  samples/bpf/README.rst|   4 +-
>  samples/bpf/bpf_load.c|  21 +-
>  samples/bpf/bpf_load.h|   3 +
>  samples/bpf/fds_example.c |  13 +-
>  samples/bpf/lathis

Re: nfc: trf7970a: Prevent repeated polling from crashing the kernel

2016-12-20 Thread Justin Bronder
On 20/12/16 11:59 -0700, Mark Greer wrote:
> On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> > From: Jaret Cantu 
> > 
> > Repeated polling attempts cause a NULL dereference error to occur.
> > This is because the state of the trf7970a is currently reading but
> > another request has been made to send a command before it has finished.
> 
> How is this happening?  Was trf7970a_abort_cmd() called and it didn't
> work right?  Was it not called at all and there is a bug in the digital
> layer?  More details please.
> 
> > The solution is to properly kill the waiting reading (workqueue)
> > before failing on the send.
> 
> If the bug is in the calling code, then that is what should get fixed.
> This seems to be a hack to work-around a digital layer bug.

One of our uses of NFC is to begin polling to read a tag and then stop polling
(in order to save power) until we know via user interaction that we need to poll
again.  This is typically many minutes later so the power saving is pretty
significant.  However, it's possible that a user will remove the tag before
reading has completed.  We also detect this case and stop polling.  I can go
more into this if necessary but that is what exposed a panic.

You can reproduce this using neard and Python. In our testing it was very likely
to occur within 10-100 iterations of the following:

#!/usr/bin/python
import time

import dbus

bus = dbus.SystemBus()
nfc0 = bus.get_object('org.neard', '/org/neard/nfc0')
props = dbus.Interface(nfc0, 'org.freedesktop.DBus.Properties')

try:
    props.Set('org.neard.Adapter', 'Powered', dbus.Boolean(1))
except:
    pass

adapter = dbus.Interface(nfc0, 'org.neard.Adapter')

for i in range(1000):
    adapter.StartPollLoop('Initiator')
    time.sleep(0.1)
    adapter.StopPollLoop()
    print(i)

I believe the last time we tested this was around the 4.1 release.

-- 
Justin Bronder


Re: [PATCH 0/2] net: hix5hd2_gmac: keep the compatible string not changed

2016-12-20 Thread David Miller
From: Dongpo Li 
Date: Tue, 20 Dec 2016 10:09:27 +0800

> This patch series fix the patch:
> d0fb6ba75dc0 ("net: hix5hd2_gmac: add generic compatible string")
> 
> The SoC hix5hd2 compatible string has the suffix "-gmac" and
> we should not change its compatible string.
> So we should name all the compatible string with the suffix "-gmac".
> Creating a new name suffix "-gemac" is unnecessary.

Series applied.


Re: [PATCH net 2/2] net: netcp: ethss: fix 10gbe host port tx pri map configuration

2016-12-20 Thread David Miller
From: Murali Karicheri 
Date: Mon, 19 Dec 2016 17:55:57 -0500

> From: WingMan Kwok 
> 
> This patch adds the missing 10gbe host port tx priority map
> configurations.
> 
> Signed-off-by: WingMan Kwok 
> Signed-off-by: Murali Karicheri 
> Signed-off-by: Sekhar Nori 

Applied.


Re: [PATCH net] openvswitch: Add a missing break statement.

2016-12-20 Thread David Miller
From: Jarno Rajahalme 
Date: Mon, 19 Dec 2016 17:06:33 -0800

> Add a break statement to prevent fall-through from
> OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
> actions setting ethernet addresses fail to validate with log messages
> complaining about invalid tunnel attributes.
> 
> Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
> Signed-off-by: Jarno Rajahalme 
> Acked-by: Pravin B Shelar 
> Acked-by: Jiri Benc 

Applied.


Re: [PATCH net 1/2] net: netcp: ethss: fix errors in ethtool ops

2016-12-20 Thread David Miller
From: Murali Karicheri 
Date: Mon, 19 Dec 2016 17:55:56 -0500

> From: WingMan Kwok 
> 
> In ethtool ops, it needs to retrieve the corresponding
> ethss module (gbe or xgbe) from the net_device structure.
> Prior to this patch, the retrieving procedure only
> checks for the gbe module.  This patch fixes the issue
> by checking the xgbe module if the net_device structure
> does not correspond to the gbe module.
> 
> Signed-off-by: WingMan Kwok 
> Signed-off-by: Murali Karicheri 
> Signed-off-by: Sekhar Nori 

Applied.


Re: [PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel

2016-12-20 Thread Mark Greer
On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> From: Jaret Cantu 
> 
> Repeated polling attempts cause a NULL dereference error to occur.
> This is because the state of the trf7970a is currently reading but
> another request has been made to send a command before it has finished.

How is this happening?  Was trf7970a_abort_cmd() called and it didn't
work right?  Was it not called at all and there is a bug in the digital
layer?  More details please.

> The solution is to properly kill the waiting reading (workqueue)
> before failing on the send.

If the bug is in the calling code, then that is what should get fixed.
This seems to be a hack to work-around a digital layer bug.

Mark
--


Re: [PATCH net v4 0/4] fsl/fman: fixes for ARM

2016-12-20 Thread David Miller
From: Madalin Bucur 
Date: Mon, 19 Dec 2016 22:42:42 +0200

> The patch set fixes advertised speeds for QSGMII interfaces, disables
> A007273 erratum workaround on non-PowerPC platforms where it does not
> apply, enables compilation on ARM64 and addresses a probing issue on
> non PPC platforms.
> 
> Changes from v3: removed redundant comment, added ack by Scott
> Changes from v2: merged fsl/fman changes to avoid a point of failure
> Changes from v1: unifying probing on all supported platforms

Series applied, thanks.


Re: Soft lockup in tc_classify

2016-12-20 Thread Shahar Klein



On 12/19/2016 7:58 PM, Cong Wang wrote:

Hello,

On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein  wrote:



On 12/13/2016 12:51 AM, Cong Wang wrote:


On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz  wrote:


On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann 
wrote:


Note that there's still the RCU fix missing for the deletion race that
Cong will still send out, but you say that the only thing you do is to
add a single rule, but no other operation is involved during that test?



What's missing to have the deletion race fixed? making a patch or
testing to a patch which was sent?



If you think it would help for this problem, here is my patch rebased
on the latest net-next.

Again, I don't see how it could help this case yet, especially I don't
see how we could have a loop in this singly linked list.



I've applied Cong's patch and hit a different lockup (full log attached):



Are you sure this is really different? For me, it is still inside the loop
in tc_classify(), with only a slightly different offset.




Daniel suggested I add a print:
case RTM_DELTFILTER:
-   err = tp->ops->delete(tp, fh);
+ printk(KERN_ERR "DEBUGG:SK %s:%d\n", __func__, __LINE__);
+ err = tp->ops->delete(tp, fh, );
if (err == 0) {

and I couldn't see this print in the output.


Hmm, that is odd, if this never prints, then my patch should not make any
difference.

There are still two other cases where we could change tp->next, so do you
mind to add two more printk's for debugging?

Attached is the delta patch.

Thanks!



I've added a slightly different debug print:
@@ -368,11 +375,12 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)

        if (tp_created) {
                RCU_INIT_POINTER(tp->next, rtnl_dereference(*back));
                rcu_assign_pointer(*back, tp);
+               printk(KERN_ERR "DEBUGG:SK add/change filter by: %pf tp=%p tp->next=%p\n", tp->ops->get, tp, tp->next);
        }
        tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false);

full output attached:

[  283.290271] Mirror/redirect action on
[  283.305031] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9432d704df60 tp->next=  (null)
[  283.322563] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d240 tp->next=  (null)

[  283.359997] GACT probability on
[  283.365923] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d240
[  283.378725] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  283.391310] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  283.403923] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  283.416542] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  308.538571] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[swapper/0:0]



Thanks
Shahar
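
As an illustration of the state in the log above: once tp->next points back at tp itself, any loop that follows ->next until it reaches NULL can never leave the chain. The sketch below is a toy Python model of that state, not kernel code. `Proto`, `Chain`, `link_at_head` and `walk` are hypothetical names, and linking the same node a second time is only one assumed way to end up with tp->next == tp; it is not offered as a diagnosis.

```python
class Proto:
    """Stand-in for a filter node; only the 'next' link matters here."""

    def __init__(self, name):
        self.name = name
        self.next = None


class Chain:
    """Singly linked chain with a head pointer (the role '*back' plays)."""

    def __init__(self):
        self.head = None

    def link_at_head(self, tp):
        # Same two steps as in the diff: tp->next = *back; *back = tp
        tp.next = self.head
        self.head = tp


def walk(chain, limit=1000):
    """Follow ->next like a classify loop, but give up after `limit` steps.

    Returns (names_visited, terminated). terminated is False when the
    step limit was hit, which here means the chain contains a cycle.
    """
    seen = []
    tp = chain.head
    while tp is not None:
        if len(seen) == limit:
            return seen, False
        seen.append(tp.name)
        tp = tp.next
    return seen, True
```

Linking two distinct nodes gives a chain that `walk` finishes; linking the head node again makes `tp.next is tp`, and `walk` only returns because of its step limit.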



[  283.290271] Mirror/redirect action on
[  283.305031] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9432d704df60 tp->next=  (null)
[  283.322563] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d240 tp->next=  (null)
[  283.359997] GACT probability on
[  283.365923] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d240
[  283.378725] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  283.391310] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  283.403923] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  283.416542] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=9436e718d3c0 tp->next=9436e718d3c0
[  308.538571] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[swapper/0:0]
[  308.547322] Modules linked in: act_gact act_mirred openvswitch 
nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 vfio_pci vfio_virqfd 
vfio_iommu_type1 vfio cls_flower mlx5_ib mlx5_core devlink sch_ingress nfsv3 
nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 
xt_conntrack nf_conntrack tun ebtable_filter ebtables ip6table_filter 
ip6_tables netconsole rpcrdma bridge ib_isert stp iscsi_target_mod llc ib_iser 
libiscsi scsi_transport_iscsi ib_srpt ib_srp scsi_transport_srp ib_ipoib 
rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core intel_rapl 
sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel kvm igb irqbypass 
joydev ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel ptp 
ipmi_si 

Re: [PATCH net 0/3] Fix integration of eee-broken-modes

2016-12-20 Thread David Miller
From: Jerome Brunet 
Date: Mon, 19 Dec 2016 16:05:35 +0100

> The purpose of this series is to fix the integration of the ethernet phy
> property "eee-broken-modes" [0]
> 
> The v3 of this series has been merged, missing a fix (error reported by
> kbuild robot) available in the v4 [1]
> 
> More importantly, Florian opposed adding a DT property mapping a device
> register this directly [2]. The concern was that the property could be
> abused to implement platform configuration policy. After discussing it,
> I think we agreed that such information about the HW (defect) should appear
> in the platform DT. However, the preferred way is to add a boolean property
> for each EEE broken mode.
> 
> [0]: 
> http://lkml.kernel.org/r/1480326409-25419-1-git-send-email-jbru...@baylibre.com
> [1]: 
> http://lkml.kernel.org/r/1480348229-25672-1-git-send-email-jbru...@baylibre.com
> [2]: http://lkml.kernel.org/r/e14a3b0c-dc34-be14-48b3-518a0ad0c...@gmail.com

Series applied, thank you.


Re: [PATCH perf/core REBASE 3/5] tools lib bpf: Add bpf_prog_{attach,detach}

2016-12-20 Thread Joe Stringer
On 20 December 2016 at 06:32, Arnaldo Carvalho de Melo  wrote:
> Em Tue, Dec 20, 2016 at 11:18:51AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Dec 14, 2016 at 02:43:40PM -0800, Joe Stringer escreveu:
>> > Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
>> > eBPF programs to cgroups") added these functions to samples/libbpf, but
>> > during this merge all of the samples libbpf functionality is shifting to
>> > tools/lib/bpf. Shift these functions there.
>> >
>> > Signed-off-by: Joe Stringer 
>> > ---
>> > Arnaldo, this is a new patch you didn't previously review which I've
>> > prepared due to the conflict with net-next. I figured it's better to try
>> > to get samples/bpf properly switched over this window rather than defer the
>> > problem and end up having to deal with another merge problem next time
>> > around. I hope that is fine for you. If not, this patch onwards will need
>> > to be dropped
>> >
>> > It's a simple copy/paste/delete with a minor change for sys_bpf() vs
>> > syscall().
>> > ---
>> >  samples/bpf/libbpf.c | 21 -
>> >  samples/bpf/libbpf.h |  3 ---
>> >  tools/lib/bpf/bpf.c  | 21 +
>> >  tools/lib/bpf/bpf.h  |  3 +++
>> >  4 files changed, 24 insertions(+), 24 deletions(-)
>> >
>> > diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
>> > index 3391225ad7e9..d9af876b4a2c 100644
>> > --- a/samples/bpf/libbpf.c
>> > +++ b/samples/bpf/libbpf.c
>> > @@ -11,27 +11,6 @@
>> >  #include 
>> >  #include "libbpf.h"
>> >
>> > -int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
>> > -{
>> > -   union bpf_attr attr = {
>> > -   .target_fd = target_fd,
>> > -   .attach_bpf_fd = prog_fd,
>> > -   .attach_type = type,
>> > -   };
>> > -
>> > -   return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
>>
>> This one makes it fail for CentOS 5 and 6, others may fail as well,
>> still building, investigating...
>
> Ok, fixed it by making it follow the model of the other sys_bpf wrappers
> setting up that bpf_attr union wrt initializing unnamed struct members:
>
>  int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
>  {
> -   union bpf_attr attr = {
> -   .target_fd = target_fd,
> -   .attach_bpf_fd = prog_fd,
> -   .attach_type = type,
> -   };
> +   union bpf_attr attr;
> +
> +   bzero(&attr, sizeof(attr));
> +   attr.target_fd = target_fd;
> +   attr.attach_bpf_fd = prog_fd;
> +   attr.attach_type   = type;
>
> return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
>  }

Ah, I just shifted these across originally so the delta would be
minimal but now I know why this code is like this. Thanks.


Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-20 Thread Andy Lutomirski
On Tue, Dec 20, 2016 at 10:36 AM, Daniel Mack  wrote:
> Hi,
>
> On 12/20/2016 06:23 PM, Andy Lutomirski wrote:
>> On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack  wrote:
>
>> To clarify, since this thread has gotten excessively long and twisted,
>> I think it's important that, for hooks attached to a cgroup, you be
>> able to tell in a generic way whether something is plugged into the
>> hook.  The natural way to see a cgroup's configuration is to read from
>> cgroupfs, so I think that reading from cgroupfs should show you that a
>> BPF program is attached and also give enough information that, once
>> bpf programs become dumpable, you can dump the program (using the
>> bpf() syscall or whatever).
>
> [...]
>
>> There isn't a big semantic difference between
>> 'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(...,
>> CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file",
>> O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'.  There is, however, a semantic
>> difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY)
>> because the permission check is much weaker.
>
> Okay, if you have such a control file, you can of course do something
> like that. When we discussed things back then with Tejun however, we
> concluded that a controller that is not completely controllable through
> control knobs that can be written and read via cat is meaningless.
> That's why this has become a 'hidden' cgroup feature.
>
> With your proposed API, you'd first go to the bpf(2) syscall in order to
> get a prog fd, and then come back to some sort of cgroup API to put the
> fd in there. That's quite a mix and match, which is why we considered
> the API cleaner in its current form, as everything that is related to
> bpf is encapsulated behind a single syscall.

You already have to do bpf() to get a prog fd, then open() to get a
cgroup fd, then bpf() or ioctl() to attach, so this isn't much
different, and it's exactly the same number of syscalls.

>
>> My preference would be to do an ioctl on a new
>> /cgroup/NAME/network_hooks.inet_ingress file.  Reading that file tells
>> you whether something is attached and hopefully also gives enough
>> information (a hash of the BPF program, perhaps) to dump the actual
>> program using future bpf() interfaces.  write() and ioctl() can be
>> used to configure it as appropriate.
>
> So am I reading this right? You're proposing to add ioctl() hooks to
> kernfs/cgroupfs? That would open more possibilities of course, but I'm
> not sure where that rabbit hole leads us eventually.

Indeed.  I already have a test patch to add ioctl() to kernfs.  Adding
it to cgroupfs shouldn't be much more complicated.

>
>> Another option that I like less would be to have a
>> /cgroup/NAME/cgroup.bpf that lists all the active hooks along with
>> their contents.  You would do an ioctl() on that to program a hook and
>> you could read it to see what's there.
>
> Yes, read() could, in theory, give you similar information to ioctl(),
> but in human-readable form.
>
>> FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too.
>> It doesn't make a semantic difference, except that I dislike
>> BPF_PROG_DETACH because that particular command isn't BPF-specific at
>> all.
>
> Well, I think it is; it pops the bpf program from a target and drops the
> reference on it. It's not much code, but it's certainly bpf-specific.

I mean the interface isn't bpf-specific.  If there was something that
wasn't bpf attached to the target, you'd still want an API to detach
it.

>
>>>> So if I set up a cgroup that's monitored and call it /cgroup/a and
>>>> enable delegation and if the program running there wants to do its own
>>>> monitoring in /cgroup/a/b (via delegation), then you really want the
>>>> outer monitor to silently drop events coming from /cgroup/a/b?
>>>
>>> That's a fair point, and we've discussed it as well. The issue is, as
>>> Alexei already pointed out, that we do not want to traverse the tree up
>>> to the root for nested cgroups due to the runtime costs in the
>>> networking fast-path. After all, we're running the bpf program for each
>>> packet in flight. Hence, we opted for the approach to only look at the
>>> leaf node for now, with the ability to open it up further in the future
>>> using flags during attach etc.
>>
>> Careful here!  You don't look only at the leaf node for now.  You do a
>> fancy traversal and choose the nearest node that has a hook set up.
>
> But we do the 'complex' operation at attach time or when a cgroup is
> created, both of which are slow-path operations. In the fast-path, we
> only look at the leaf, which may or may not have an effective program
> installed. And that's of course much cheaper than doing the traversal
> for each packet.

You would never traverse the full hierarchy for each packet.  You'd
have a linked list of programs that are attached, kind of like how
there's an "effective" array right now.  I sent out 

Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-20 Thread Geoff Lansberry
On Tue, Dec 20, 2016 at 1:11 PM, Mark Greer  wrote:
> Hi Geoff.
>
> Please put the version in your subjects when submitting anything but the
> initial version of a patch (e.g., [PATCH v2 1/3]).
>
> Which series do you want reviewed?
>
> Mark
> --
Sorry about the double posting, I had forgotten to erase the patches I
generated while rebasing and checking, and I'll have to figure out how
to add that v2 line to the automatically generated subject line if I
end up submitting another round.

Please review the three most recent patches, which have the send time
of 17:16.

Best Regards,
Geoff


Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-20 Thread Daniel Mack
Hi,

On 12/20/2016 06:23 PM, Andy Lutomirski wrote:
> On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack  wrote:

> To clarify, since this thread has gotten excessively long and twisted,
> I think it's important that, for hooks attached to a cgroup, you be
> able to tell in a generic way whether something is plugged into the
> hook.  The natural way to see a cgroup's configuration is to read from
> cgroupfs, so I think that reading from cgroupfs should show you that a
> BPF program is attached and also give enough information that, once
> bpf programs become dumpable, you can dump the program (using the
> bpf() syscall or whatever).

[...]

> There isn't a big semantic difference between
> 'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(...,
> CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file",
> O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'.  There is, however, a semantic
> difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY)
> because the permission check is much weaker.

Okay, if you have such a control file, you can of course do something
like that. When we discussed things back then with Tejun however, we
concluded that a controller that is not completely controllable through
control knobs that can be written and read via cat is meaningless.
That's why this has become a 'hidden' cgroup feature.

With your proposed API, you'd first go to the bpf(2) syscall in order to
get a prog fd, and then come back to some sort of cgroup API to put the
fd in there. That's quite a mix and match, which is why we considered
the API cleaner in its current form, as everything that is related to
bpf is encapsulated behind a single syscall.

> My preference would be to do an ioctl on a new
> /cgroup/NAME/network_hooks.inet_ingress file.  Reading that file tells
> you whether something is attached and hopefully also gives enough
> information (a hash of the BPF program, perhaps) to dump the actual
> program using future bpf() interfaces.  write() and ioctl() can be
> used to configure it as appropriate.

So am I reading this right? You're proposing to add ioctl() hooks to
kernfs/cgroupfs? That would open more possibilities of course, but I'm
not sure where that rabbit hole leads us eventually.

> Another option that I like less would be to have a
> /cgroup/NAME/cgroup.bpf that lists all the active hooks along with
> their contents.  You would do an ioctl() on that to program a hook and
> you could read it to see what's there.

Yes, read() could, in theory, give you similar information to ioctl(),
but in human-readable form.

> FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too.
> It doesn't make a semantic difference, except that I dislike
> BPF_PROG_DETACH because that particular command isn't BPF-specific at
> all.

Well, I think it is; it pops the bpf program from a target and drops the
reference on it. It's not much code, but it's certainly bpf-specific.

>>> So if I set up a cgroup that's monitored and call it /cgroup/a and
>>> enable delegation and if the program running there wants to do its own
>>> monitoring in /cgroup/a/b (via delegation), then you really want the
>>> outer monitor to silently drop events coming from /cgroup/a/b?
>>
>> That's a fair point, and we've discussed it as well. The issue is, as
>> Alexei already pointed out, that we do not want to traverse the tree up
>> to the root for nested cgroups due to the runtime costs in the
>> networking fast-path. After all, we're running the bpf program for each
>> packet in flight. Hence, we opted for the approach to only look at the
>> leaf node for now, with the ability to open it up further in the future
>> using flags during attach etc.
> 
> Careful here!  You don't look only at the leaf node for now.  You do a
> fancy traversal and choose the nearest node that has a hook set up.

But we do the 'complex' operation at attach time or when a cgroup is
created, both of which are slow-path operations. In the fast-path, we
only look at the leaf, which may or may not have an effective program
installed. And that's of course much cheaper than doing the traversal
for each packet.
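
The split described above (tree work at attach or cgroup-creation time, a single cached lookup per packet) can be sketched with a toy model. This is a simplified Python illustration, not the kernel implementation: the `Cgroup` class, its `prog` and `effective` fields, and `run_hook` are hypothetical names chosen for the example, and detach is not modeled.

```python
class Cgroup:
    """Toy cgroup node that caches the program in effect for it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.prog = None  # program attached directly to this cgroup
        # Slow path (creation): inherit whatever is in effect for the parent.
        self.effective = parent.effective if parent else None
        if parent:
            parent.children.append(self)

    def attach(self, prog):
        """Slow path: record the program and push it down the subtree."""
        self.prog = prog
        self._propagate(prog)

    def _propagate(self, effective):
        self.effective = effective
        for child in self.children:
            # A child with its own program keeps overriding the inherited one.
            if child.prog is None:
                child._propagate(effective)

    def run_hook(self, packet):
        """Fast path: no tree walk, only the cached effective program."""
        if self.effective is None:
            return True  # nothing attached anywhere above: allow
        return self.effective(packet)
```

In this model, attaching a program to /cgroup/foo constrains a later-created /cgroup/foo/bar through inheritance, while a program attached further down at /cgroup/foo/bar/baz replaces the inherited one for that subtree, which is the override behaviour the quoted security concern is about.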

> mkdir /cgroup/foo
> BPF_PROG_ATTACH(some program to foo)
> mkdir /cgroup/foo/bar
> chown -R some_user /cgroup/foo/bar
> 
> If the kernel only looked at the leaf, then the program that did the
> above would not expect that the program would constrain
> /cgroup/foo/bar's activity.  But, as it stands, the program *would*
> expect /cgroup/foo/bar to be constrained, except that, whenever the
> capable() check changes to ns_capable() (which will happen eventually
> one way or another), then the bad guy can create /cgroup/foo/bar/baz,
> install a new no-op hook there, and break the security assumption.
> 
> IOW, I think that totally non-recursive hooks are okay from a security
> perspective, albeit rather strange, but the current design is not okay
> from a security perspective.

We locked down the ability to override any of 

Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-20 Thread Mark Greer
On Tue, Dec 20, 2016 at 01:29:13PM -0500, Geoff Lansberry wrote:
> On Tue, Dec 20, 2016 at 1:11 PM, Mark Greer  wrote:
> > Hi Geoff.
> >
> > Please put the version in your subjects when submitting anything but the
> > initial version of a patch (e.g., [PATCH v2 1/3]).
> >
> > Which series do you want reviewed?
> >
> > Mark
> > --
> Sorry about the double posting, I had forgotten to erase the patches I
> generated while rebasing and checking, and I'll have to figure out how
> to add that v2 line to the automatically generated subject line if I
> end up submitting another round.

Hint: -v <n> option of 'git format-patch'

> Please review the three most recent patches, which have the send time
> of 17:16.

Okay, thanks.

Mark
--


Re: mlx4: Bug in XDP_TX + 16 rx-queues

2016-12-20 Thread Martin KaFai Lau
On Tue, Dec 20, 2016 at 02:02:05PM +0200, Tariq Toukan wrote:
> Thanks Martin, nice catch!
>
>
> On 20/12/2016 1:37 AM, Martin KaFai Lau wrote:
> >Hi Tariq,
> >
> >On Sat, Dec 17, 2016 at 02:18:03AM -0800, Martin KaFai Lau wrote:
> >>Hi All,
> >>
> >>I have been debugging with XDP_TX and 16 rx-queues.
> >>
> >>1) When 16 rx-queues is used and an XDP prog is doing XDP_TX,
> >>it seems that the packet cannot be XDP_TX out if the pkt
> >>is received from some particular CPUs (/rx-queues).
> >>
> >>2) If 8 rx-queues is used, it does not have problem.
> >>
> >>3) The 16 rx-queues problem also went away after reverting these
> >>two patches:
> >>15fca2c8eb41 net/mlx4_en: Add ethtool statistics for XDP cases
> >>67f8b1dcb9ee net/mlx4_en: Refactor the XDP forwarding rings scheme
> >>
> >After taking a closer look at 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP 
> >forwarding rings scheme")
> >and armed with the fact that '>8 rx-queues does not work', I have
> >made the attached change that fixed the issue.
> >
> >Making change in mlx4_en_fill_qp_context() could be an easier fix
> >but I think this change will be easier for discussion purpose.
> >
> >I don't want to lie that I know anything about how this variable
> >works in CX3.  If this change makes sense, I can cook up a diff.
> >Otherwise, can you shed some light on what could be happening
> >and hopefully can lead to a diff?
> >
> >Thanks
> >--Martin
> >
> >
> >diff --git i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
> >w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> >index bcd955339058..b3bfb987e493 100644
> >--- i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> >+++ w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> >@@ -1638,10 +1638,10 @@ int mlx4_en_start_port(struct net_device *dev)
> >
> > /* Configure tx cq's and rings */
> > for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
> >-u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1;
> The bug lies in this line.
> Number of rings per UP in case of TX_XDP should be priv->tx_ring_num[TX_XDP],
> not 1.
> Please try the following fix.
> I can prepare and send it once the window opens again.
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index bcd955339058..edbe200ac2fa 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1638,7 +1638,8 @@ int mlx4_en_start_port(struct net_device *dev)
>
> /* Configure tx cq's and rings */
> for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
> -   u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up :
> 1;
> +   u8 num_tx_rings_p_up = t == TX ?
> +   priv->num_tx_rings_p_up : priv->tx_ring_num[t];
>
> for (i = 0; i < priv->tx_ring_num[t]; i++) {
> /* Configure cq */
>
Thanks for confirming the bug is related to the user_prio argument.

I have some questions:

1. Just to confirm the intention of the change.  Your change is essentially
   always passing 0 to the user_prio parameter for the TX_XDP type by
   doing (i / priv->tx_ring_num[t])?  If yes, would it be clearer to
   always pass 0 instead?

   And yes, it also works in our test.  Please post an official patch
   if it is the fix.

2. Can you explain a little how user_prio affects the tx behavior?
   e.g. What is the difference between different user_prio values
   like 0, 1, 2...etc?

3. Mostly a follow up on (2).
   In mlx4_en_get_profile(), num_tx_rings_p_up (of the struct mlx4_en_profile)
   depends on mlx4_low_memory_profile() and the number of cpus.  Do
   similar bounds apply to the 'u8 num_tx_rings_p_up' here for the
   TX_XDP type?

Thanks,
Martin

> >-
> > for (i = 0; i < priv->tx_ring_num[t]; i++) {
> > /* Configure cq */
> >+int user_prio;
> >+
> > cq = priv->tx_cq[t][i];
> > err = mlx4_en_activate_cq(priv, cq, i);
> > if (err) {
> >@@ -1660,9 +1660,14 @@ int mlx4_en_start_port(struct net_device *dev)
> >
> > /* Configure ring */
> > tx_ring = priv->tx_ring[t][i];
> >+if (t != TX_XDP)
> >+user_prio = i / priv->num_tx_rings_p_up;
> >+else
> >+user_prio = i & 0x07;
> >+
> > err = mlx4_en_activate_tx_ring(priv, tx_ring,
> >cq->mcq.cqn,
> >-   i / num_tx_rings_p_up);
> >+   user_prio);
> > if (err) {
> > en_err(priv, "Failed allocating Tx ring\n");
> > mlx4_en_deactivate_cq(priv, cq);
> Regards,
> Tariq Toukan.
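
A note on question 1 above: inside the loop, i is always smaller than
priv->tx_ring_num[t], so dividing by that count gives 0 for every XDP ring.
The sketch below only models that arithmetic in userspace; xdp_user_prio()
and all_rings_map_to_prio0() are hypothetical helpers for illustration, not
functions from the mlx4 driver.

```c
/* Hypothetical helper mirroring "i / num_tx_rings_p_up" from the proposed
 * fix, where the divisor for the TX_XDP type is the ring count itself. */
static int xdp_user_prio(int i, int tx_ring_num)
{
	return i / tx_ring_num;
}

/* Returns 1 when every ring index in [0, tx_ring_num) maps to prio 0. */
static int all_rings_map_to_prio0(int tx_ring_num)
{
	int i;

	for (i = 0; i < tx_ring_num; i++)
		if (xdp_user_prio(i, tx_ring_num) != 0)
			return 0;
	return 1;
}
```

If that reading is right, passing a literal 0 for TX_XDP would be
equivalent, which is what the question suggests.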


Re: [PATCH net-next 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address

2016-12-20 Thread David Miller
From: f...@ikuai8.com
Date: Mon, 19 Dec 2016 09:24:05 +0800

>  It is sent again because the first email is sent during net-next closing.

It is still closed, and will not open again for at least one week.


Re: kernel/bpf/verifier.c: 4 * possible unintended fallthrough ?

2016-12-20 Thread Josef Bacik
On Tue, Dec 20, 2016 at 11:34 AM, David Binderman wrote:
> Hello there,
>
> From: Alexei Starovoitov
>> I've tried 4.9 and 5.2 and don't see this warning.
>
> As expected - I used a development version of gcc.
> Latest released version is 6.2
>
>> Is this 6.x gcc?
>
> 7.0 would be more accurate.
>
>> I suspect it will have such warnings all over the kernel.
>
> Indeed it has hundreds, but the subject under discussion is file
> kernel/bpf/verifier.c.
>
> I am still not sure if I have found a fallthrough bug or not.



You haven't, this is intended so is a useless warning.  Thanks,

Josef
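
For readers unfamiliar with the warning: gcc 7 reports switch cases that
run into the next case without a break, and a "fall through" comment placed
right before the next case label marks the fallthrough as intended. The
sketch below is a generic, hypothetical example of that pattern (cost() is
made up for illustration); it is not the code from kernel/bpf/verifier.c.

```c
/* Hypothetical example: kind 2 does its own work and then shares the
 * work of kind 1, so the missing break is deliberate. */
static int cost(int kind)
{
	int total = 0;

	switch (kind) {
	case 2:
		total += 10;
		/* fall through */
	case 1:
		total += 1;
		break;
	default:
		break;
	}
	return total;
}
```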



Re: ipv6: handle -EFAULT from skb_copy_bits

2016-12-20 Thread David Miller
From: Dave Jones 
Date: Tue, 20 Dec 2016 13:17:28 -0500

> On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
>  > From: Dave Jones 
>  > Date: Mon, 19 Dec 2016 19:40:13 -0500
>  > 
>  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
>  > > 
>  > >  > Unfortunately, this made no difference.  I spent some time today trying
>  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
>  > >  > 
>  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
>  > >  > I can trigger it with Trinity.
>  > > 
>  > > scratch that last part, I finally just repro'd it with a single process.
>  > 
>  > Thanks for the info, I'll try to think about this some more.
> 
> I threw in some debug printks right before that BUG_ON.
> it's always this:
> 
> skb->len=31 skb->data_len=0 offset:30 total_len:9
> 
> Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ?

skb->data_len is just the length of any non-linear data in the SKB.

This has to do with the SKB buffer layout and geometry, not whether
the packet is "fragmented" in the protocol sense.

So no, this isn't a criterion for packets being filtered out by this
point.

Can you try to capture what sk->sk_socket->type and
inet_sk(sk)->hdrincl are set to at the time of the crash?

Thanks.
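
A rough userspace model of the numbers in the debug output, assuming the
-EFAULT comes from the range check at the start of skb_copy_bits(). The
struct and helpers below are simplified stand-ins written for illustration,
not kernel code. With len=31 and data_len=0 the whole packet sits in the
linear area, and a 9 byte copy starting at offset 30 runs past the end
regardless of how the buffer is laid out.

```c
/* Simplified userspace model of the two sk_buff length fields. */
struct skb_model {
	unsigned int len;      /* total bytes: linear + non-linear */
	unsigned int data_len; /* bytes held in non-linear fragments only */
};

/* Same arithmetic as skb_headlen(): bytes in the linear area. */
static unsigned int model_headlen(const struct skb_model *skb)
{
	return skb->len - skb->data_len;
}

/* Range check modelled on skb_copy_bits(): copying copy_len bytes
 * starting at offset must stay within skb->len. Returns 1 if in range. */
static int model_copy_in_range(const struct skb_model *skb, int offset,
			       int copy_len)
{
	return offset <= (int)skb->len - copy_len;
}
```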


Re: ipv6: handle -EFAULT from skb_copy_bits

2016-12-20 Thread Dave Jones
On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
 > From: Dave Jones 
 > Date: Mon, 19 Dec 2016 19:40:13 -0500
 > 
 > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
 > > 
 > >  > Unfortunately, this made no difference.  I spent some time today trying
 > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > >  > 
 > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
 > >  > I can trigger it with Trinity.
 > > 
 > > scratch that last part, I finally just repro'd it with a single process.
 > 
 > Thanks for the info, I'll try to think about this some more.

I threw in some debug printks right before that BUG_ON.
it's always this:

skb->len=31 skb->data_len=0 offset:30 total_len:9

Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ?

Dave



Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-20 Thread Mark Greer
Hi Geoff.

Please put the version in your subjects when submitting anything but the
initial version of a patch (e.g., [PATCH v2 1/3]).

Which series do you want reviewed?

Mark
--


Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-20 Thread Jones Desougi
On 2016-12-20 17:16, Geoff Lansberry wrote:
> From: Geoff Lansberry 
> 
> The TRF7970A has configuration options to support hardware designs
> which use a 27.12MHz clock. This commit adds a device tree option
> 'clock-frequency' to support configuring this chip for default
> 13.56MHz clock or the optional 27.12MHz clock.
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt   |  4 ++
>  drivers/nfc/trf7970a.c | 50 
> +-
>  2 files changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
> b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> index 32b35a0..e262ac1 100644
> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> @@ -21,6 +21,8 @@ Optional SoC Specific Properties:
>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>where an extra byte is returned by Read Multiple Block commands issued
>to Type 5 tags.
> +- clock-frequency: Set to specify that the input frequency to the trf7970a 
> is 1356Hz or 2712Hz
> +
You're adding an empty line here that is removed in the next patch.

>  
>  Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>  
> @@ -43,6 +45,8 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>   irq-status-read-quirk;
>   en2-rf-quirk;
>   t5t-rmb-extra-byte-quirk;
> + vdd_io_1v8;
This does not belong here, and so no need to remove in the next patch.

> + clock-frequency = <2712>;
>   status = "okay";
>   };
>  };
> diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
> index 26c9dbb..4e051e9 100644
> --- a/drivers/nfc/trf7970a.c
> +++ b/drivers/nfc/trf7970a.c
> @@ -124,6 +124,9 @@
>NFC_PROTO_ISO15693_MASK | NFC_PROTO_NFC_DEP_MASK)
>  
>  #define TRF7970A_AUTOSUSPEND_DELAY   3 /* 30 seconds */
> +#define TRF7970A_13MHZ_CLOCK_FREQUENCY   1356
> +#define TRF7970A_27MHZ_CLOCK_FREQUENCY   2712
> +
>  
>  #define TRF7970A_RX_SKB_ALLOC_SIZE   256
>  
> @@ -1056,12 +1059,11 @@ static int trf7970a_init(struct trf7970a *trf)
>  
>   trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON;
>  
> - ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0);
> + ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL,
> + trf->modulator_sys_clk_ctrl);
>   if (ret)
>   goto err_out;
>  
> - trf->modulator_sys_clk_ctrl = 0;
> -
>   ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS,
>   TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 |
>   TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32);
> @@ -1181,27 +1183,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a 
> *trf, int tech)
>   switch (tech) {
>   case NFC_DIGITAL_RF_TECH_106A:
>   trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
> - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
> + trf->modulator_sys_clk_ctrl =
> + (trf->modulator_sys_clk_ctrl & 0xF8) |
> + TRF7970A_MODULATOR_DEPTH_OOK;
>   trf->guard_time = TRF7970A_GUARD_TIME_NFCA;
>   break;
>   case NFC_DIGITAL_RF_TECH_106B:
>   trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106;
> - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> + trf->modulator_sys_clk_ctrl =
> + (trf->modulator_sys_clk_ctrl & 0xF8) |
> + TRF7970A_MODULATOR_DEPTH_ASK10;
>   trf->guard_time = TRF7970A_GUARD_TIME_NFCB;
>   break;
>   case NFC_DIGITAL_RF_TECH_212F:
>   trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212;
> - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> + trf->modulator_sys_clk_ctrl =
> + (trf->modulator_sys_clk_ctrl & 0xF8) |
> + TRF7970A_MODULATOR_DEPTH_ASK10;
>   trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
>   break;
>   case NFC_DIGITAL_RF_TECH_424F:
>   trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424;
> - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> + trf->modulator_sys_clk_ctrl =
> + (trf->modulator_sys_clk_ctrl & 0xF8) |
> + TRF7970A_MODULATOR_DEPTH_ASK10;
>   trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
>   break;
>   case NFC_DIGITAL_RF_TECH_ISO15693:
>   trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648;
> - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
> + trf->modulator_sys_clk_ctrl =
> + (trf->modulator_sys_clk_ctrl & 
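
The repeated "(trf->modulator_sys_clk_ctrl & 0xF8) | ..." pattern in the
hunks above keeps the upper five bits of the register value (where the
clock selection presumably lives) and replaces only the low three bits with
the modulation depth. A minimal sketch of that read-modify-write follows;
the constant values are assumptions chosen for illustration and should be
checked against the driver and the datasheet.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEPTH_MASK 0x07u /* low three bits, implied by the 0xF8 mask */
#define DEPTH_OOK  0x01u /* hypothetical depth encoding */
#define CLK_27MHZ  0x80u /* hypothetical clock-select bit */

/* Replace the depth field and leave all other bits untouched. */
static uint8_t set_depth(uint8_t reg, uint8_t depth)
{
	return (uint8_t)((reg & (uint8_t)~DEPTH_MASK) | (depth & DEPTH_MASK));
}
```

A small helper like this would also avoid repeating the mask in every case
of the switch.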

Re: [PATCH] ethernet: sfc: Add Kconfig entry for vendor Solarflare

2016-12-20 Thread Edward Cree
On 20/12/16 13:38, Tobias Klauser wrote:
> Since commit
>
>   5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new 
> sfc-falcon driver")
>
> there are two drivers for Solarflare devices, but both still show up
> directly beneath "Ethernet driver support" in the Kconfig. Follow the
> pattern of other vendors and group them beneath an own vendor Kconfig
> entry for Solarflare.
>
> Cc: Edward Cree 
> Signed-off-by: Tobias Klauser 
Acked-by: Edward Cree 


Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-20 Thread Andy Lutomirski
On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack  wrote:
> Hi,
>
> On 12/20/2016 04:50 AM, Andy Lutomirski wrote:
>> You mean BPF_CGROUP_RUN_PROG_INET_SOCK(sk)?  There is nothing bpf
>> specific about the hook except that the name of this macro has "BPF" in
>> it.  There is nothing whatsoever that's bpf-specific about the context
>> -- sk is not bpf-specific at all.
>>
>> The only thing bpf-specific about it is that it currently only invokes
>> bpf programs.  That could easily change.
>
> I'm not sure if I follow. The code as it currently stands only supports
> attaching bpf programs to cgroups which have been created using
> BPF_PROG_LOAD. If cgroups would support other program types in the
> future, then they would need to be stored in different data types
> anyway, and the bpf syscall multiplexer would be the wrong entry point
> to access them anyway.

To clarify, since this thread has gotten excessively long and twisted,
I think it's important that, for hooks attached to a cgroup, you be
able to tell in a generic way whether something is plugged into the
hook.  The natural way to see a cgroup's configuration is to read from
cgroupfs, so I think that reading from cgroupfs should show you that a
BPF program is attached and also give enough information that, once
bpf programs become dumpable, you can dump the program (using the
bpf() syscall or whatever).

Obviously the interface to *attach* a BPF program to a hook will need
to be at least a little bit BPF-specific.  But there's nothing
particularly BPF-specific about detaching, and if a control file were
to exist, writing "detach" or "none" to it seems natural.

>
> Whether we add bpf-specific code to the cgroup file parsers or
> cgroup-specific code to the bpf layer does not make much of a semantic
> difference, does it? As a matter of fact, my very first implementation
> of this patch set implemented a cgroup controller that would allow
> writing strings like "ing
>
> b) make it possible to extend the functionality in the future by adding
> flags to the command struct etc.
>
> And I hoped we achieved that after discussing it for so long.
>
>> How about slowing down a wee bit and trying to come up with cgroup
>> hook semantics that work for all of these use cases?
>
> I'm all for discussing things, but I don't think this was done in a rush.
>
> I do agree though that adding functionality to cgroups that is not
> limited to resource control is a delicate thing to do, which is why I
> cc'ed cgroups@ in my patches. I should have also added linux-api@ I
> guess, sorry I missed that.
>ress 5" to its control file, where 5 is the fd
> number that came out of BPF_PROG_LOAD. The main reason we decided to
> ditch that was that echoing fd numbers into a text file seemed way worse
> than going through a proper syscall layer with it, and ioctls are
> unavailable on pseudo-fs.

There isn't a big semantic difference between
'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(...,
CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file",
O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'.  There is, however, a semantic
difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY)
because the permission check is much weaker.

The reason I suggest ioctl() and not write() is that write() MUST NOT
make its behavior depend on the caller's credentials, file table, etc.
Imagine what would happen if you did 'sudo -u eviltext
>/cgroup/NAME/control.file'.  (This particular mistake has been
repeated many times in the kernel, in drivers, networking, namespaces,
core code, etc, and it's resulted in a big pile of privilege
escalation bugs.)  So write("bpf:") is safe
(but unusably awkward, I think), whereas write("bpf:fd 5") is unsafe.

>
> The idea was rather to allow attaching bpf programs to other things than
> just cgroups as well, which is why we called the member of 'union
> bpf_attr' 'target_fd', and a cgroup is just one type of target here.

I would make that a separate operation.  If someone adds the ability
to attach an ebpf program to, say, seccomp (I'm quite sure this will
happen eventually), it should be attached using seccomp(), not bpf().
The people writing seccomp filters will thank you for
making the syscall in question reflect what object (the cgroup, for
example) is being modified.

>
>>> i'm assuming 'baadf00d' is bpf program fd expressed a text string?
>>> and kernel needs to parse above? will you allow capital and lower
>>> case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not?
>>> can program fd expressed as decimal or hex or both?
>>> how do you return the error? as a text string for user space
>>> to parse?
>>
>> No.  The kernel does not parse it because you cannot write this to the
>> file.  You set a bpf filter with ioctl and pass an fd.
>
> An ioctl on what file, exactly?

There are at least two plausible models.

My preference would be to do an ioctl on a new
/cgroup/NAME/network_hooks.inet_ingress file.  

Re: wl1251 & mac address & calibration data

2016-12-20 Thread Tony Lindgren
* Kalle Valo  [161220 09:12]:
> Tony Lindgren  writes:
> 
> > * Kalle Valo  [161220 03:47]:
> >> Arend Van Spriel  writes:
> >> 
> >> > On 18-12-2016 13:09, Pali Rohár wrote:
> >> >
> >> >> File wl1251-nvs.bin is provided by linux-firmware package and contains 
> >> >> default data which should be overridden by model specific calibrated 
> >> >> data.
> >> >
> >> > Ah. Someone thought it was a good idea to provide the "one ring to rule
> >> > them all". Nice.
> >> 
> >> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
> >> renamed to wl1251-nvs.bin.example, or something like that, as it should
> >> be only installed to a real system only if there's no real calibration
> >> data available (only for developers to use, not real users).
> >
> > Makes sense to me. Note that with the recent changes to wlcore, we can
> > now easily provide board specific calibration firmware simply by adding a
> > new compatible value. So for n900, we could have something like
> > compatible = "ti,wl1251-n900" and have it point to n900 specific calibration
> > file wl1251-nvs-n900.bin. Of course this won't help with the mac address,
> > or any of the device specific data..
> >
> > That is assuming the calibration values are the same for each similar
> > device and don't have to be generated for each device. And naturally wl1251
> > needs similar changes done to make use of device specific calibration files.
> 
> No, these are unique per each sold device. Every N900 was calibrated at
> the factory and they all have different calibration data which is stored
> to the flash. So when N900 boots (and in _every_ boot) it has to load
> the calibration data from the flash and provide it to the wl1251 driver
> somehow.

Urgh, OK. So much for that idea then.

Thanks,

Tony


Re: wl1251 & mac address & calibration data

2016-12-20 Thread Kalle Valo
Tony Lindgren  writes:

> * Kalle Valo  [161220 03:47]:
>> Arend Van Spriel  writes:
>> 
>> > On 18-12-2016 13:09, Pali Rohár wrote:
>> >
>> >> File wl1251-nvs.bin is provided by linux-firmware package and contains 
>> >> default data which should be overridden by model specific calibrated 
>> >> data.
>> >
>> > Ah. Someone thought it was a good idea to provide the "one ring to rule
>> > them all". Nice.
>> 
>> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
>> renamed to wl1251-nvs.bin.example, or something like that, as it should
>> be only installed to a real system only if there's no real calibration
>> data available (only for developers to use, not real users).
>
> Makes sense to me. Note that with the recent changes to wlcore, we can
> now easily provide board specific calibration firmware simply by adding a
> new compatible value. So for n900, we could have something like
> compatible = "ti,wl1251-n900" and have it point to n900 specific calibration
> file wl1251-nvs-n900.bin. Of course this won't help with the mac address,
> or any of the device specific data..
>
> That is assuming the calibration values are the same for each similar
> device and don't have to be generated for each device. And naturally wl1251
> needs similar changes done to make use of device specific calibration files.

No, these are unique per each sold device. Every N900 was calibrated at
the factory and they all have different calibration data which is stored
to the flash. So when N900 boots (and in _every_ boot) it has to load
the calibration data from the flash and provide it to the wl1251 driver
somehow.

-- 
Kalle Valo


[PATCH 19/29] samples/bpf: Make samples more libbpf-centric

2016-12-20 Thread Arnaldo Carvalho de Melo
From: Joe Stringer 

Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allows the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.

Committer notes:

Testing it:

On a fedora rawhide container, with clang/llvm 3.9, sharing the host
linux kernel git tree:

  # make O=/tmp/build/linux/ headers_install
  # make O=/tmp/build/linux -C samples/bpf/

Since I forgot to make it privileged, just tested it outside the
container, using what it generated:

  # uname -a
  Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 
x86_64 GNU/Linux
  # cd 
/var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
  # ls -la offwaketime
  -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
  # file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically 
linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, 
BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
  # readelf -SW offwaketime_kern.o  | grep PROGBITS
  [ 2] .text PROGBITS 40 00 00  AX  
0   0  4
  [ 3] kprobe/try_to_wake_up PROGBITS 40 d8 00  
AX  0   0  8
  [ 5] tracepoint/sched/sched_switch PROGBITS 000118 
000318 00  AX  0   0  8
  [ 7] maps  PROGBITS 000430 50 00  WA  
0   0  4
  [ 8] license   PROGBITS 000480 04 00  WA  
0   0  1
  [ 9] version   PROGBITS 000484 04 00  WA  
0   0  4
  # ./offwaketime | head -5
  
swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;;
 106
  CPU 
0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3
 2
  
Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh
 5
  
firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer
 13
  JS 
Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox
 2
  #

Signed-off-by: Joe Stringer 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Wang Nan 
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-2-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 samples/bpf/bpf_load.c| 17 +---
 samples/bpf/bpf_load.h|  3 +++
 samples/bpf/fds_example.c |  9 ---
 samples/bpf/lathist_user.c|  2 +-
 samples/bpf/libbpf.c  | 23 
 samples/bpf/libbpf.h  | 18 ++---
 samples/bpf/lwt_len_hist_user.c   |  6 +++--
 samples/bpf/offwaketime_user.c|  8 +++---
 samples/bpf/sampleip_user.c   |  4 +--
 samples/bpf/sock_example.c| 12 +
 samples/bpf/sockex1_user.c|  6 ++---
 samples/bpf/sockex2_user.c|  4 +--
 samples/bpf/sockex3_user.c|  4 +--
 samples/bpf/spintest_user.c   |  8 +++---
 samples/bpf/tc_l2_redirect_user.c |  4 +--
 samples/bpf/test_cgrp2_array_pin.c|  4 +--
 samples/bpf/test_cgrp2_attach.c   | 11 +---
 samples/bpf/test_cgrp2_attach2.c  |  7 +++--
 samples/bpf/test_cgrp2_sock.c |  6 +++--
 samples/bpf/test_current_task_under_cgroup_user.c |  8 +++---
 samples/bpf/test_lru_dist.c   | 32 +++
 samples/bpf/test_probe_write_user_user.c  |  2 +-
 samples/bpf/trace_event_user.c| 14 +-
 samples/bpf/trace_output_user.c   |  2 +-
 samples/bpf/tracex2_user.c| 10 +++
 samples/bpf/tracex3_user.c  

[PATCH v3] stmmac: enable rx queues

2016-12-20 Thread Joao Pinto
When the hardware is synthesized with multiple queues, all queues are
disabled by default. This patch adds the rx queues configuration.
This patch was successfully tested in a Synopsys QoS Reference design.

Signed-off-by: Joao Pinto 
---
changes v2 -> v3 (Seraphin Bonnaffe):
- GMAC_RX_QUEUE_CLEAR macro simplified
changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
- Instead of using number of DMA channels, let's use number of queues
- Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
- Make sure that the RX queue related bits are cleared before setting
- Check if rx_queue_enable is available before executing

 drivers/net/ethernet/stmicro/stmmac/common.h  |  5 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |  8 
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12 
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |  5 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++
 5 files changed, 52 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index b13a144..6c96291 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -323,6 +323,9 @@ struct dma_features {
/* TX and RX number of channels */
unsigned int number_rx_channel;
unsigned int number_tx_channel;
+   /* TX and RX number of queues */
+   unsigned int number_rx_queues;
+   unsigned int number_tx_queues;
/* Alternate (enhanced) DESC mode */
unsigned int enh_desc;
 };
@@ -454,6 +457,8 @@ struct stmmac_ops {
void (*core_init)(struct mac_device_info *hw, int mtu);
/* Enable and verify that the IPC module is supported */
int (*rx_ipc)(struct mac_device_info *hw);
+   /* Enable RX Queues */
+   void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
/* Dump MAC registers */
void (*dump_regs)(struct mac_device_info *hw);
/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 3e8d4fe..b524598 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -22,6 +22,7 @@
 #define GMAC_HASH_TAB_32_630x0014
 #define GMAC_RX_FLOW_CTRL  0x0090
 #define GMAC_QX_TX_FLOW_CTRL(x)(0x70 + x * 4)
+#define GMAC_RXQ_CTRL0 0x00a0
 #define GMAC_INT_STATUS0x00b0
 #define GMAC_INT_EN0x00b4
 #define GMAC_PCS_BASE  0x00e0
@@ -44,6 +45,11 @@
 
 #define GMAC_MAX_PERFECT_ADDRESSES 128
 
+/* MAC RX Queue Enable */
+#define GMAC_RX_QUEUE_CLEAR(queue) ~(GENMASK(1, 0) << ((queue) * 2))
+#define GMAC_RX_AV_QUEUE_ENABLE(queue) BIT((queue) * 2)
+#define GMAC_RX_DCB_QUEUE_ENABLE(queue)BIT(((queue) * 2) + 1)
+
 /* MAC Flow Control RX */
 #define GMAC_RX_FLOW_CTRL_RFE  BIT(0)
 
@@ -133,6 +139,8 @@ enum power_event {
 /* MAC HW features2 bitmap */
 #define GMAC_HW_FEAT_TXCHCNT   GENMASK(21, 18)
 #define GMAC_HW_FEAT_RXCHCNT   GENMASK(15, 12)
+#define GMAC_HW_FEAT_TXQCNTGENMASK(9, 6)
+#define GMAC_HW_FEAT_RXQCNTGENMASK(3, 0)
 
 /* MAC HW ADDR regs */
 #define GMAC_HI_DCSGENMASK(18, 16)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index eaed7cb..ecfbf57 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int 
mtu)
writel(value, ioaddr + GMAC_INT_EN);
 }
 
+static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
+
+   value &= GMAC_RX_QUEUE_CLEAR(queue);
+   value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
+
+   writel(value, ioaddr + GMAC_RXQ_CTRL0);
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw)
 {
void __iomem *ioaddr = hw->pcsr;
@@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct 
stmmac_extra_stats *x)
 static const struct stmmac_ops dwmac4_ops = {
.core_init = dwmac4_core_init,
.rx_ipc = dwmac4_rx_ipc_enable,
+   .rx_queue_enable = dwmac4_rx_queue_enable,
.dump_regs = dwmac4_dump_regs,
.host_irq_status = dwmac4_irq_status,
.flow_ctrl = dwmac4_flow_ctrl,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 8196ab5..377d1b4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -303,6 +303,11 @@ static 
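
The bit arithmetic behind GMAC_RX_QUEUE_CLEAR and GMAC_RX_AV_QUEUE_ENABLE
can be checked in userspace. The sketch below assumes, as the macros in the
patch suggest, that each RX queue owns a two-bit field in GMAC_RXQ_CTRL0
with bit 0 of the field meaning "AV" and bit 1 meaning "DCB". The function
names are hypothetical and plain shifts stand in for the kernel's GENMASK()
and BIT().

```c
#include <stdint.h>

/* Mask that clears the two-bit field of the given queue. */
static uint32_t rx_queue_clear_mask(unsigned int queue)
{
	return ~(UINT32_C(0x3) << (queue * 2));
}

/* Model of dwmac4_rx_queue_enable(): clear the field, then set AV. */
static uint32_t rx_queue_enable_av(uint32_t value, unsigned int queue)
{
	value &= rx_queue_clear_mask(queue);
	value |= UINT32_C(1) << (queue * 2);
	return value;
}
```

One detail worth a second look in the patch itself: GMAC_RX_QUEUE_CLEAR
expands to an expression without outer parentheses, which is fine in
"value &= ..." but could surprise a later user of the macro.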

Re: wl1251 & mac address & calibration data

2016-12-20 Thread Pali Rohár
On Tuesday 20 December 2016 17:56:58 Tony Lindgren wrote:
> * Kalle Valo  [161220 03:47]:
> > Arend Van Spriel  writes:
> > > On 18-12-2016 13:09, Pali Rohár wrote:
> > >> File wl1251-nvs.bin is provided by linux-firmware package and
> > >> contains default data which should be overridden by model
> > >> specific calibrated data.
> > > 
> > > Ah. Someone thought it was a good idea to provide the "one ring
> > > to rule them all". Nice.
> > 
> > Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git
> > should be renamed to wl1251-nvs.bin.example, or something like
> > that, as it should be only installed to a real system only if
> > there's no real calibration data available (only for developers to
> > use, not real users).
> 
> Makes sense to me. Note that with the recent changes to wlcore, we
> can now easily provide board specific calibration firmware simply by
> adding a new compatible value. So for n900, we could have something
> like compatible = "ti,wl1251-n900" and have it point to n900
> specific calibration file wl1251-nvs-n900.bin. Of course this won't
> help with the mac address, or any of the device specific data..
> 
> That is assuming the calibration values are the same for each similar
> device and don't have to be generated for each device. And naturally
> wl1251 needs similar changes done to make use of device specific
> calibration files.
> 
> Regards,
> 
> Tony

As I wrote in another thread, "wl1251 NVS calibration data format", the
calibration data for wl1251 (wl1251-nvs.bin) also contains the MAC address,
which the kernel sends to the wl1251 chip. The kernel just does not use it.

So... my idea now is:

1) extend request_firmware function family with ability to use userspace 
helper first and fallback to VFS

2) teach wl1251.ko to parse the MAC address from wl1251-nvs.bin and use it
(unless it is empty or 00:00:20:07:03:09, which is the value in the example
file from the linux-firmware package)

3) write a Nokia N900 specific userspace helper for providing data when the
kernel requests wl1251-nvs.bin. The userspace helper reads the MAC address
and calibration data from CAL, places the MAC address into the calibration
data and sends it to the kernel.

Are you OK with this idea?

-- 
Pali Rohár
pali.ro...@gmail.com
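
The check described in step 2 could look roughly like the sketch below. It
covers only the "is this MAC usable" decision; where the address sits
inside wl1251-nvs.bin is not stated in this thread, so extracting it is
left out. nvs_mac_is_usable() is a hypothetical name, not an existing
wl1251 function.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the MAC taken from the NVS data should be used, 0 if it
 * is empty or equal to the example address from linux-firmware. */
static int nvs_mac_is_usable(const uint8_t mac[6])
{
	static const uint8_t empty[6] = { 0 };
	static const uint8_t example[6] = {
		0x00, 0x00, 0x20, 0x07, 0x03, 0x09
	};

	if (memcmp(mac, empty, 6) == 0)
		return 0;
	if (memcmp(mac, example, 6) == 0)
		return 0;
	return 1;
}
```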


[PATCH 25/29] samples/bpf: Switch over to libbpf

2016-12-20 Thread Arnaldo Carvalho de Melo
From: Joe Stringer 

Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid of most of the libbpf library here.

Committer notes:

Built it in a docker fedora rawhide container and ran it in the f25 host, seems
to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
doesn't seem to have introduced problems and Joe said he tested it with
all the entries in samples/bpf/ and other code he found:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
  
  [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
CHK include/config/kernel.release
HOSTCC  scripts/basic/fixdep
GEN ./Makefile
CHK include/generated/uapi/linux/version.h
Using /git/linux as source for kernel
CHK include/generated/utsrelease.h
HOSTCC  scripts/basic/bin2c
HOSTCC  arch/x86/tools/relocs_32.o
HOSTCC  arch/x86/tools/relocs_64.o
LD  samples/bpf/built-in.o
  
HOSTCC  samples/bpf/fds_example.o
HOSTCC  samples/bpf/sockex1_user.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 
'bpf_load_program' discards 'const' qualifier from pointer target type 
[-Wdiscarded-qualifiers]
insns, insns_cnt, "GPL", 0,
^
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
   from /git/linux/samples/bpf/bpf_load.h:4,
   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but 
argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
   ^~~~
HOSTCC  samples/bpf/sockex2_user.o
  
HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
  clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include 
-I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi 
-I./arch/x86/include/generated  -I/git/linux/include -I./include 
-I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi 
-I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
  -Wno-compare-distinct-pointer-types \
  -Wno-gnu-variable-sized-type-not-at-end \
  -Wno-address-of-packed-member -Wno-tautological-compare \
  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc 
-march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
HOSTLD  samples/bpf/tc_l2_redirect
  
HOSTLD  samples/bpf/lwt_len_hist
HOSTLD  samples/bpf/xdp_tx_iptunnel
  make[1]: Leaving directory '/tmp/build/linux'
  [root@f5065a7d6272 linux]#

And then, in the host:

  [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
  
/dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9
 on 
/var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9
 type xfs 
(rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
  [root@jouet bpf]# cd 
/var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
  [root@jouet bpf]# file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically 
linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, 
BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
  [root@jouet bpf]# readelf -SW offwaketime
  offwaketime offwaketime_kern.o  offwaketime_user.o
  [root@jouet bpf]# readelf -SW offwaketime_kern.o
  There are 11 section headers, starting at offset 0x700:

  Section Headers:
[Nr] Name  Type  Address  Off  Size  ES Flg Lk Inf Al
[ 0]   NULL 00 00 00   0   0  0
[ 1] .strtab   STRTAB   000658 a8 00   0   0  1
[ 2] .text PROGBITS 40 00 00  AX  0   0  4
[ 3] kprobe/try_to_wake_up PROGBITS 40 d8 00  AX  0   0  8
[ 4] .relkprobe/try_to_wake_up REL  0005a8 20 10 10   3  8
[ 5] tracepoint/sched/sched_switch PROGBITS 000118 000318 00  AX  0   0  8
[ 6] .reltracepoint/sched/sched_switch REL  0005c8 90 10 10   5  8
[ 7] maps  PROGBITS 000430 50 00  WA  0   0  4
[ 8] license   PROGBITS 000480 04 00  WA  0   0  1
[ 9] version   PROGBITS 

[GIT PULL 00/29] perf/core improvements and fixes

2016-12-20 Thread Arnaldo Carvalho de Melo
Hi Ingo,

Please consider pulling. I had most of this queued before your first
pull req to Linus for 4.10; most are fixes, with 'perf sched timehist --idle'
as a follow-up new feature to the 'perf sched timehist' command introduced in
this window.

One other thing that delayed this was the samples/bpf/ switch to
tools/lib/bpf/, which involved fixing up merge clashes with net.git and
testing it properly. That took more rounds than anticipated, but all seems OK
now, and it would be good to get these merge issues behind us ASAP.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba:

  Merge tag 'docs-4.10' of git://git.lwn.net/linux (2016-12-12 21:58:13 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
tags/perf-core-for-mingo-20161220

for you to fetch changes up to 9899694a7f67714216665b87318eb367e2c5c901:

  samples/bpf: Move open_raw_sock to separate header (2016-12-20 12:00:40 -0300)


perf/core improvements and fixes:

New features:

- Introduce 'perf sched timehist --idle', to analyse processes
  going to/from idle state (Namhyung Kim)

Fixes:

- Allow 'perf record -u user' to continue when facing races with threads
  going away after having scanned them via /proc (Jiri Olsa)

- Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)

- Support jumps with multiple arguments (Ravi Bangoria)

- Fix jumps to before the function where they are located (Ravi
  Bangoria)

- Fix lock-pi help string (Davidlohr Bueso)

- Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)

- Do not overwrite valid build id in 'perf diff' (Kan Liang)

- Don't throw error for zero length symbols, allowing the use of the TUI
  in PowerPC, where such symbols became more common recently (Ravi Bangoria)

Infrastructure:

- Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf
  duplication (Joe Stringer)

- Move headers check into bash script (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>


Arnaldo Carvalho de Melo (3):
  perf tools: Remove some needless __maybe_unused
  samples/bpf: Make perf_event_read() static
  samples/bpf: Be consistent with bpf_load_program bpf_insn parameter

Davidlohr Bueso (1):
  perf bench futex: Fix lock-pi help string

Jiri Olsa (7):
  perf tools: Move headers check into bash script
  perf mem: Fix --all-user/--all-kernel options
  perf evsel: Use variable instead of repeating lengthy FD macro
  perf thread_map: Add thread_map__remove function
  perf evsel: Allow to ignore missing pid
  perf record: Force ignore_missing_thread for uid option
  perf trace: Check if MAP_32BIT is defined (again)

Joe Stringer (8):
  tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
  tools lib bpf: use __u32 from linux/types.h
  tools lib bpf: Add flags to bpf_create_map()
  samples/bpf: Make samples more libbpf-centric
  samples/bpf: Switch over to libbpf
  tools lib bpf: Add bpf_prog_{attach,detach}
  samples/bpf: Remove perf_event_open() declaration
  samples/bpf: Move open_raw_sock to separate header

Kan Liang (1):
  perf diff: Do not overwrite valid build id

Namhyung Kim (6):
  perf sched timehist: Split is_idle_sample()
  perf sched timehist: Introduce struct idle_time_data
  perf sched timehist: Save callchain when entering idle
  perf sched timehist: Skip non-idle events when necessary
  perf sched timehist: Add -I/--idle-hist option
  perf sched timehist: Show callchains for idle stat

Ravi Bangoria (3):
  perf annotate: Support jump instruction with target as second operand
  perf annotate: Fix jump target outside of function address range
  perf annotate: Don't throw error for zero length symbols

 samples/bpf/Makefile  |  70 +--
 samples/bpf/README.rst|   4 +-
 samples/bpf/bpf_load.c|  21 +-
 samples/bpf/bpf_load.h|   3 +
 samples/bpf/fds_example.c |  13 +-
 samples/bpf/lathist_user.c|   2 +-
 samples/bpf/libbpf.c  | 176 ---
 samples/bpf/libbpf.h  |  28 +-
 samples/bpf/lwt_len_hist_user.c   |   6 +-
 samples/bpf/offwaketime_user.c|   8 +-
 samples/bpf/sampleip_user.c   |   7 +-
 samples/bpf/sock_example.c|  14 +-
 samples/bpf/sock_example.h|  35 ++
 samples/bpf/sockex1_user.c|   7 +-
 samples/bpf/sockex2_user.c|   5 +-
 samples/bpf/sockex3_user.c   

Re: [PATCH v2] stmmac: enable rx queues

2016-12-20 Thread Joao Pinto
At 4:51 PM on 12/20/2016, Seraphin BONNAFFE wrote:
> Hi Joao,
> 
> Please find two more comments below.
> 
> Regards,
> Séraphin
> 
> 
> On 12/20/2016 05:27 PM, Joao Pinto wrote:
>> When the hardware is synthesized with multiple queues, all queues are
>> disabled by default. This patch adds the rx queues configuration.
>> This patch was successfully tested in a Synopsys QoS Reference design.
>>
>> Signed-off-by: Joao Pinto 
>> ---
>> changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
>> - Instead of using number of DMA channels, lets use number of queues
>> - Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
>> - Make sure that the RX queue related bits are cleared before setting
>> - Check if rx_queue_enable is available before executing
>> stmmac_mac_enable_rx_queues()
>>
>>  drivers/net/ethernet/stmicro/stmmac/common.h  |  5 +
>>  drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |  9 +
>>  drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12 
>>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |  5 +
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 
>> ++
>>  5 files changed, 53 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h
>> b/drivers/net/ethernet/stmicro/stmmac/common.h
>> index b13a144..6c96291 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/common.h
>> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
>> @@ -323,6 +323,9 @@ struct dma_features {
>>  /* TX and RX number of channels */
>>  unsigned int number_rx_channel;
>>  unsigned int number_tx_channel;
>> +/* TX and RX number of queues */
>> +unsigned int number_rx_queues;
>> +unsigned int number_tx_queues;
>>  /* Alternate (enhanced) DESC mode */
>>  unsigned int enh_desc;
>>  };
>> @@ -454,6 +457,8 @@ struct stmmac_ops {
>>  void (*core_init)(struct mac_device_info *hw, int mtu);
>>  /* Enable and verify that the IPC module is supported */
>>  int (*rx_ipc)(struct mac_device_info *hw);
>> +/* Enable RX Queues */
>> +void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
>>  /* Dump MAC registers */
>>  void (*dump_regs)(struct mac_device_info *hw);
>>  /* Handle extra events on specific interrupts hw dependent */
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
>> index 3e8d4fe..7d88517 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
>> @@ -22,6 +22,7 @@
>>  #define GMAC_HASH_TAB_32_63 0x0014
>>  #define GMAC_RX_FLOW_CTRL 0x0090
>>  #define GMAC_QX_TX_FLOW_CTRL(x) (0x70 + x * 4)
>> +#define GMAC_RXQ_CTRL0 0x00a0
>>  #define GMAC_INT_STATUS 0x00b0
>>  #define GMAC_INT_EN 0x00b4
>>  #define GMAC_PCS_BASE 0x00e0
>> @@ -44,6 +45,12 @@
>>
>>  #define GMAC_MAX_PERFECT_ADDRESSES 128
>>
>> +/* MAC RX Queue Enable */
>> +#define GMAC_RX_QUEUE_CLEAR(queue) ~(BIT((queue) * 2) \
>> +| BIT(((queue) * 2) + 1))
> 
> 
> What would you think about ~(GENMASK(1, 0) << ((queue) * 2)) instead?
> Slightly more readable in my humble opinion.

More readable indeed :)

> 
> 
>> +#define GMAC_RX_AV_QUEUE_ENABLE(queue) BIT((queue) * 2)
>> +#define GMAC_RX_DCB_QUEUE_ENABLE(queue) BIT(((queue) * 2) + 1)
>> +
>>  /* MAC Flow Control RX */
>>  #define GMAC_RX_FLOW_CTRL_RFE BIT(0)
>>
>> @@ -133,6 +140,8 @@ enum power_event {
>>  /* MAC HW features2 bitmap */
>>  #define GMAC_HW_FEAT_TXCHCNT GENMASK(21, 18)
>>  #define GMAC_HW_FEAT_RXCHCNT GENMASK(15, 12)
>> +#define GMAC_HW_FEAT_TXQCNT GENMASK(9, 6)
>> +#define GMAC_HW_FEAT_RXQCNT GENMASK(3, 0)
>>
>>  /* MAC HW ADDR regs */
>>  #define GMAC_HI_DCS GENMASK(18, 16)
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
>> index eaed7cb..ecfbf57 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
>> @@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw,
>> int mtu)
>>  writel(value, ioaddr + GMAC_INT_EN);
>>  }
>>
>> +static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
>> +{
>> +void __iomem *ioaddr = hw->pcsr;
>> +u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
>> +
>> +value &= GMAC_RX_QUEUE_CLEAR(queue);
>> +value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
>> +
>> +writel(value, ioaddr + GMAC_RXQ_CTRL0);
>> +}
>> +
>>  static void dwmac4_dump_regs(struct mac_device_info *hw)
>>  {
>>  void __iomem *ioaddr = hw->pcsr;
>> @@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct
>> stmmac_extra_stats *x)
>>  static const struct stmmac_ops dwmac4_ops = {
>>  .core_init = dwmac4_core_init,
>>  

Re: wl1251 & mac address & calibration data

2016-12-20 Thread Tony Lindgren
* Kalle Valo  [161220 03:47]:
> Arend Van Spriel  writes:
> 
> > On 18-12-2016 13:09, Pali Rohár wrote:
> >
> >> File wl1251-nvs.bin is provided by linux-firmware package and contains 
> >> default data which should be overridden by model specific calibrated 
> >> data.
> >
> > Ah. Someone thought it was a good idea to provide the "one ring to rule
> > them all". Nice.
> 
> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
> renamed to wl1251-nvs.bin.example, or something like that, as it should
> be installed on a real system only if there's no real calibration
> data available (only for developers to use, not real users).

Makes sense to me. Note that with the recent changes to wlcore, we can
now easily provide board-specific calibration firmware simply by adding a
new compatible value. So for n900, we could have something like
compatible = "ti,wl1251-n900" and have it point to the n900-specific
calibration file wl1251-nvs-n900.bin. Of course this won't help with the
MAC address, or any of the device-specific data.

That is assuming the calibration values are the same for each similar
device and don't have to be generated for each device. And naturally wl1251
needs similar changes to make use of device-specific calibration files.

Regards,

Tony


Re: [PATCH v2] stmmac: enable rx queues

2016-12-20 Thread Seraphin BONNAFFE

Hi Joao,

Please find two more comments below.

Regards,
Séraphin


On 12/20/2016 05:27 PM, Joao Pinto wrote:

When the hardware is synthesized with multiple queues, all queues are
disabled by default. This patch adds the rx queues configuration.
This patch was successfully tested in a Synopsys QoS Reference design.

Signed-off-by: Joao Pinto 
---
changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
- Instead of using number of DMA channels, lets use number of queues
- Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
- Make sure that the RX queue related bits are cleared before setting
- Check if rx_queue_enable is available before executing
stmmac_mac_enable_rx_queues()

 drivers/net/ethernet/stmicro/stmmac/common.h  |  5 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |  9 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12 
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |  5 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++
 5 files changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index b13a144..6c96291 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -323,6 +323,9 @@ struct dma_features {
/* TX and RX number of channels */
unsigned int number_rx_channel;
unsigned int number_tx_channel;
+   /* TX and RX number of queues */
+   unsigned int number_rx_queues;
+   unsigned int number_tx_queues;
/* Alternate (enhanced) DESC mode */
unsigned int enh_desc;
 };
@@ -454,6 +457,8 @@ struct stmmac_ops {
void (*core_init)(struct mac_device_info *hw, int mtu);
/* Enable and verify that the IPC module is supported */
int (*rx_ipc)(struct mac_device_info *hw);
+   /* Enable RX Queues */
+   void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
/* Dump MAC registers */
void (*dump_regs)(struct mac_device_info *hw);
/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 3e8d4fe..7d88517 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -22,6 +22,7 @@
 #define GMAC_HASH_TAB_32_63 0x0014
 #define GMAC_RX_FLOW_CTRL  0x0090
 #define GMAC_QX_TX_FLOW_CTRL(x) (0x70 + x * 4)
+#define GMAC_RXQ_CTRL0 0x00a0
 #define GMAC_INT_STATUS 0x00b0
 #define GMAC_INT_EN 0x00b4
 #define GMAC_PCS_BASE  0x00e0
@@ -44,6 +45,12 @@

 #define GMAC_MAX_PERFECT_ADDRESSES 128

+/* MAC RX Queue Enable */
+#define GMAC_RX_QUEUE_CLEAR(queue) ~(BIT((queue) * 2) \
+   | BIT(((queue) * 2) + 1))



What would you think about ~(GENMASK(1, 0) << ((queue) * 2)) instead?
Slightly more readable in my humble opinion.
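
A small userspace sketch can confirm that the two spellings of the clear
mask agree. BIT() and GENMASK() below are simplified 32-bit stand-ins for
the kernel macros (an assumption for illustration), the RX_* names are
hypothetical, and a plain value stands in for the register:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel's BIT() and GENMASK(). */
#define BIT(n)        (UINT32_C(1) << (n))
#define GENMASK(h, l) ((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Clear mask as written in the patch: two explicit bits per queue. */
#define RX_QUEUE_CLEAR_BITS(queue) \
	(~(BIT((queue) * 2) | BIT(((queue) * 2) + 1)))

/* Clear mask as suggested in review: a 2-bit field shifted into place. */
#define RX_QUEUE_CLEAR_MASK(queue) \
	(~(GENMASK(1, 0) << ((queue) * 2)))

#define RX_AV_QUEUE_ENABLE(queue) BIT((queue) * 2)

/* Read-modify-write on a plain value standing in for the register. */
static uint32_t rx_queue_enable_av(uint32_t reg, unsigned int queue)
{
	assert(queue < 16);	/* 16 two-bit fields fit in 32 bits */
	reg &= RX_QUEUE_CLEAR_MASK(queue);	/* clear both mode bits first */
	reg |= RX_AV_QUEUE_ENABLE(queue);	/* then select AV mode */
	return reg;
}

/* Returns 1 when both clear masks agree for every queue index. */
static int clear_masks_agree(void)
{
	for (unsigned int q = 0; q < 16; q++)
		if (RX_QUEUE_CLEAR_BITS(q) != RX_QUEUE_CLEAR_MASK(q))
			return 0;
	return 1;
}
```

Clearing before setting matters when a queue was previously left in the
other mode: without the clear step both bits of the field could end up set.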



+#define GMAC_RX_AV_QUEUE_ENABLE(queue) BIT((queue) * 2)
+#define GMAC_RX_DCB_QUEUE_ENABLE(queue) BIT(((queue) * 2) + 1)
+
 /* MAC Flow Control RX */
 #define GMAC_RX_FLOW_CTRL_RFE  BIT(0)

@@ -133,6 +140,8 @@ enum power_event {
 /* MAC HW features2 bitmap */
 #define GMAC_HW_FEAT_TXCHCNT   GENMASK(21, 18)
 #define GMAC_HW_FEAT_RXCHCNT   GENMASK(15, 12)
+#define GMAC_HW_FEAT_TXQCNT GENMASK(9, 6)
+#define GMAC_HW_FEAT_RXQCNT GENMASK(3, 0)

 /* MAC HW ADDR regs */
 #define GMAC_HI_DCS GENMASK(18, 16)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index eaed7cb..ecfbf57 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int 
mtu)
writel(value, ioaddr + GMAC_INT_EN);
 }

+static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
+
+   value &= GMAC_RX_QUEUE_CLEAR(queue);
+   value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
+
+   writel(value, ioaddr + GMAC_RXQ_CTRL0);
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw)
 {
void __iomem *ioaddr = hw->pcsr;
@@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct 
stmmac_extra_stats *x)
 static const struct stmmac_ops dwmac4_ops = {
.core_init = dwmac4_core_init,
.rx_ipc = dwmac4_rx_ipc_enable,
+   .rx_queue_enable = dwmac4_rx_queue_enable,
.dump_regs = dwmac4_dump_regs,
.host_irq_status = dwmac4_irq_status,
.flow_ctrl = dwmac4_flow_ctrl,
diff --git 

RE: [RFC PATCH net-next v4 2/2] macb: Enable 1588 support in SAMA5Dx platforms.

2016-12-20 Thread Rafal Ozieblo
From: Andrei Pistirica [mailto:andrei.pistir...@microchip.com] 
Sent: 14 December 2016 13:56

> This patch does the following:
> - Enable HW time stamp for the following platforms: SAMA5D2, SAMA5D3 and
>   SAMA5D4.
> - HW time stamp capabilities are advertised via ethtool and macb ioctl is
>   updated accordingly.
> - HW time stamp on the PTP Ethernet packets are received using the
>   SO_TIMESTAMPING API. Where timers are obtained from the PTP event/peer
>   registers.
>
> Note: Patch on net-next, on December 7th.
>
> Signed-off-by: Andrei Pistirica
> ---
> Patch history:
>
> Version 1:
> Integration with SAMA5D2 only. This feature wasn't tested on any other
> platform that might use cadence/gem.
>
> Patch is not completely ported to the very latest version of net-next, and it
> will be after review.
>
> Version 2 modifications:
> - add PTP caps for SAMA5D2/3/4 platforms
> - and cosmetic changes
>
> Version 3 modifications:
> - add support for sama5D2/3/4 platforms using GEM-PTP interface.
>
> Version 4 modifications:
> - time stamp only PTP_V2 events
> - maximum adjustment value is set based on Richard's input
>
> Note: Patch on net-next, on December 14th.
>
>  drivers/net/ethernet/cadence/macb.c | 168 ++--
>  1 file changed, 163 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 538544a..8d5c976 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -714,6 +714,8 @@ static void macb_tx_interrupt(struct 

Re: Which ethtool methods should I implement?

2016-12-20 Thread Timur Tabi

On 12/19/2016 07:40 PM, Florian Fainelli wrote:

> Ideally, everything that is supported by your HW, but I would start with the
> basic essential stuff that you would need in case someone reports
> problems with your driver like:
>
> - statistics (MAC for sure) and PHY (if possible), -S
> - ability to restart auto-negotiation (-r)
> - reporting of driver information (-i)
> - support toggling and reporting NETIF_F_* features -k/-K


Thanks, I'll get this done soon.

I'm confused about netdev_set_default_ethtool_ops().  Is this a function 
that drivers are supposed to call?  I only see one driver use it.  Other 
drivers just set netdev->ethtool_ops manually.


--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [PATCH] net_sched: sch_fq: use rb_entry()

2016-12-20 Thread Eric Dumazet
On Tue, 2016-12-20 at 22:02 +0800, Geliang Tang wrote:
> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 
> ---
>  net/sched/sch_fq.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Acked-by: Eric Dumazet 

Thanks.
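
For readers unfamiliar with the two helpers: include/linux/rbtree.h defines
rb_entry() in terms of container_of(), so the change is about readability
rather than behaviour. A userspace sketch, using a simplified container_of()
and a hypothetical struct flow in place of the real sch_fq types:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() is a thin, intention-revealing wrapper around container_of(). */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Minimal stand-in for struct rb_node; only the embedding matters here. */
struct rb_node {
	struct rb_node *rb_left;
	struct rb_node *rb_right;
};

/* Hypothetical structure embedding an rb_node, as rbtree users do. */
struct flow {
	int id;
	struct rb_node node;
};

/* Map a tree node back to the id of the flow that embeds it. */
static int flow_id_from_node(struct rb_node *n)
{
	struct flow *f = rb_entry(n, struct flow, node);

	/* Both spellings resolve to the same enclosing object. */
	assert(f == container_of(n, struct flow, node));
	return f->id;
}
```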




Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf

2016-12-20 Thread Alexander Duyck
The limit of 17 is just based on the hardware.  Specifically, the
olinfo field in the Tx descriptor has a minimum length of 17 as a
requirement.  The hardware itself is supposed to be capable of padding
short frames that are to be transmitted.  The drivers are
supposed to pad short frames on receive to get them up to 60 bytes.
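
To make the padding requirement concrete, here is a userspace sketch of
zero-padding a short frame up to a minimum length. pad_frame() is a
hypothetical helper operating on a plain byte buffer; it is not the skb API
and not what either driver does internally:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60	/* minimum frame length without FCS */

/*
 * Hypothetical helper: pad a frame in place up to min_len bytes.
 * The pad bytes are zeroed explicitly so that stale buffer contents
 * never end up on the wire. Returns the new length, or 0 if the
 * buffer is too small to hold the padded frame.
 */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap,
			size_t min_len)
{
	assert(len <= cap);
	if (len >= min_len)
		return len;		/* already long enough */
	if (cap < min_len)
		return 0;		/* cannot pad within this buffer */
	memset(buf + len, 0, min_len - len);
	return min_len;
}
```

Calling it with min_len set to 17 or to ETH_ZLEN mirrors the two thresholds
discussed in this thread; in both cases the point is that the pad bytes are
written, not inherited from whatever was in the buffer before.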

When you are seeing this issue are you sending frames from the VF to
one of the local interfaces on the same port or to an external
interface?  Also are you receiving on another linux ixgbevf driver or
are you receiving the packet using a different driver interface such
as DPDK?  I'm just wanting to verify this as it is possible that the
memory leak you are seeing is on the receiver and not on the source if
you are transmitting to a local VF or the PF as the receiver will have
to pad the frame in such a case to get it up to 60 bytes.

- Alex

On Tue, Dec 20, 2016 at 3:50 AM, Weilong Chen  wrote:
> Hi,
>
> Thanks for your reply.
> We tested your patch, but the problem is still there; it does not seem to work.
>
> I'm not sure why ixgbe uses the limit of 17. The kernel uses ETH_ZLEN (60),
> without FCS. A lot of drivers such as e1000 use it. Any explanation?
>
> Thanks.
>
>
> On 2016/12/16 0:13, Alexander Duyck wrote:
>>
>> On Thu, Dec 15, 2016 at 3:40 AM, Weilong Chen 
>> wrote:
>>>
>>> Nessus reports that the VF appears to leak memory in network packets.
>>> Fix this by padding all small packets manually.
>>>
>>> See CVE-2003-0001:
>>>
>>> https://ofirarkin.files.wordpress.com/2008/11/atstake_etherleak_report.pdf
>>>
>>> Signed-off-by: Weilong Chen 
>>> ---
>>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>>> b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>>> index 6d4bef5..137a154 100644
>>> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>>> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>>> @@ -3654,6 +3654,13 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb,
>>> struct net_device *netdev)
>>> return NETDEV_TX_OK;
>>> }
>>>
>>> +   /* On PCI/PCI-X HW, if packet size is less than ETH_ZLEN,
>>> +* packets may get corrupted during padding by HW.
>>> +* To WA this issue, pad all small packets manually.
>>> +*/
>>> +   if (eth_skb_pad(skb))
>>> +   return NETDEV_TX_OK;
>>> +
>>
>>
>> So the patch description for this probably isn't correct.  It looks
>> like the problem isn't leaking data; it is the fact that the frames
>> aren't being padded to prevent malicious events.  The only issue is
>> that the patch is padding by a bit too much.  I would recommend replacing
>> this with the following from ixgbe:
>>
>> /*
>>  * The minimum packet size for olinfo paylen is 17 so pad the skb
>>  * in order to meet this minimum size requirement.
>>  */
>> if (skb_put_padto(skb, 17))
>> return NETDEV_TX_OK;
>>
>>
>>> tx_ring = adapter->tx_ring[skb->queue_mapping];
>>>
>>> /* need: 1 descriptor per page *
>>> PAGE_SIZE/IXGBE_MAX_DATA_PER_TXD,
>>> --
>>> 1.7.12
>>>
>>
>> .
>>
>


Re: [PATCH] RDS: use rb_entry()

2016-12-20 Thread Santosh Shilimkar

On 12/20/2016 6:02 AM, Geliang Tang wrote:

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang 
---

Looks fine.
Acked-by: Santosh Shilimkar 


[PATCH v2] stmmac: enable rx queues

2016-12-20 Thread Joao Pinto
When the hardware is synthesized with multiple queues, all queues are
disabled by default. This patch adds the rx queues configuration.
This patch was successfully tested in a Synopsys QoS Reference design.

Signed-off-by: Joao Pinto 
---
changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
- Instead of using number of DMA channels, lets use number of queues
- Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
- Make sure that the RX queue related bits are cleared before setting
- Check if rx_queue_enable is available before executing
stmmac_mac_enable_rx_queues()

 drivers/net/ethernet/stmicro/stmmac/common.h  |  5 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |  9 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12 
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |  5 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++
 5 files changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index b13a144..6c96291 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -323,6 +323,9 @@ struct dma_features {
/* TX and RX number of channels */
unsigned int number_rx_channel;
unsigned int number_tx_channel;
+   /* TX and RX number of queues */
+   unsigned int number_rx_queues;
+   unsigned int number_tx_queues;
/* Alternate (enhanced) DESC mode */
unsigned int enh_desc;
 };
@@ -454,6 +457,8 @@ struct stmmac_ops {
void (*core_init)(struct mac_device_info *hw, int mtu);
/* Enable and verify that the IPC module is supported */
int (*rx_ipc)(struct mac_device_info *hw);
+   /* Enable RX Queues */
+   void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
/* Dump MAC registers */
void (*dump_regs)(struct mac_device_info *hw);
/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 3e8d4fe..7d88517 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -22,6 +22,7 @@
 #define GMAC_HASH_TAB_32_63 0x0014
 #define GMAC_RX_FLOW_CTRL  0x0090
 #define GMAC_QX_TX_FLOW_CTRL(x) (0x70 + x * 4)
+#define GMAC_RXQ_CTRL0 0x00a0
 #define GMAC_INT_STATUS 0x00b0
 #define GMAC_INT_EN 0x00b4
 #define GMAC_PCS_BASE  0x00e0
@@ -44,6 +45,12 @@
 
 #define GMAC_MAX_PERFECT_ADDRESSES 128
 
+/* MAC RX Queue Enable */
+#define GMAC_RX_QUEUE_CLEAR(queue) ~(BIT((queue) * 2) \
+   | BIT(((queue) * 2) + 1))
+#define GMAC_RX_AV_QUEUE_ENABLE(queue) BIT((queue) * 2)
+#define GMAC_RX_DCB_QUEUE_ENABLE(queue) BIT(((queue) * 2) + 1)
+
 /* MAC Flow Control RX */
 #define GMAC_RX_FLOW_CTRL_RFE  BIT(0)
 
@@ -133,6 +140,8 @@ enum power_event {
 /* MAC HW features2 bitmap */
 #define GMAC_HW_FEAT_TXCHCNT   GENMASK(21, 18)
 #define GMAC_HW_FEAT_RXCHCNT   GENMASK(15, 12)
+#define GMAC_HW_FEAT_TXQCNT GENMASK(9, 6)
+#define GMAC_HW_FEAT_RXQCNT GENMASK(3, 0)
 
 /* MAC HW ADDR regs */
 #define GMAC_HI_DCS GENMASK(18, 16)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index eaed7cb..ecfbf57 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int 
mtu)
writel(value, ioaddr + GMAC_INT_EN);
 }
 
+static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
+
+   value &= GMAC_RX_QUEUE_CLEAR(queue);
+   value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
+
+   writel(value, ioaddr + GMAC_RXQ_CTRL0);
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw)
 {
void __iomem *ioaddr = hw->pcsr;
@@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct 
stmmac_extra_stats *x)
 static const struct stmmac_ops dwmac4_ops = {
.core_init = dwmac4_core_init,
.rx_ipc = dwmac4_rx_ipc_enable,
+   .rx_queue_enable = dwmac4_rx_queue_enable,
.dump_regs = dwmac4_dump_regs,
.host_irq_status = dwmac4_irq_status,
.flow_ctrl = dwmac4_flow_ctrl,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 8196ab5..377d1b4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -303,6 +303,11 @@ 

[PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-20 Thread Geoff Lansberry
From: Geoff Lansberry 

The TRF7970A has configuration options to support hardware designs
which use a 27.12MHz clock. This commit adds a device tree option,
'clock-frequency', to support configuring this chip for the default
13.56MHz clock or the optional 27.12MHz clock.
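
The approach can be sketched in userspace as follows. The bit values and
helper names are hypothetical and chosen only for illustration; the one
detail taken from the patch is the '& 0xF8' mask, which keeps the upper five
bits of the register value while the low three depth bits are replaced:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define MOD_DEPTH_MASK	0x07u	/* low three bits: modulation depth */
#define MOD_CLOCK_27MHZ	0x08u	/* assumed flag selecting the 27.12MHz clock */

#define CLOCK_13MHZ	13560000u
#define CLOCK_27MHZ	27120000u

/*
 * Translate a 'clock-frequency' value into the initial register value.
 * Returns 0 on success, -1 for an unsupported frequency.
 */
static int clk_ctrl_from_frequency(uint32_t hz, uint8_t *ctrl)
{
	switch (hz) {
	case CLOCK_13MHZ:
		*ctrl = 0;
		return 0;
	case CLOCK_27MHZ:
		*ctrl = MOD_CLOCK_27MHZ;
		return 0;
	default:
		return -1;
	}
}

/* Replace only the depth bits, keeping the clock selection intact. */
static uint8_t set_modulator_depth(uint8_t ctrl, uint8_t depth)
{
	assert((depth & ~MOD_DEPTH_MASK) == 0);
	return (uint8_t)((ctrl & 0xF8u) | depth);
}
```

Without the mask, every RF technology switch would overwrite the whole
register value and silently drop the clock selection made at probe time.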
---
 .../devicetree/bindings/net/nfc/trf7970a.txt   |  4 ++
 drivers/nfc/trf7970a.c | 50 +-
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 32b35a0..e262ac1 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,8 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- clock-frequency: Set to specify that the input frequency to the trf7970a is 
13560000Hz or 27120000Hz
+
 
 Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 
@@ -43,6 +45,8 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
irq-status-read-quirk;
en2-rf-quirk;
t5t-rmb-extra-byte-quirk;
+   vdd_io_1v8;
+   clock-frequency = <27120000>;
status = "okay";
};
 };
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 26c9dbb..4e051e9 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -124,6 +124,9 @@
 NFC_PROTO_ISO15693_MASK | NFC_PROTO_NFC_DEP_MASK)
 
 #define TRF7970A_AUTOSUSPEND_DELAY 30000 /* 30 seconds */
+#define TRF7970A_13MHZ_CLOCK_FREQUENCY 13560000
+#define TRF7970A_27MHZ_CLOCK_FREQUENCY 27120000
+
 
 #define TRF7970A_RX_SKB_ALLOC_SIZE 256
 
@@ -1056,12 +1059,11 @@ static int trf7970a_init(struct trf7970a *trf)
 
trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON;
 
-   ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0);
+   ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL,
+   trf->modulator_sys_clk_ctrl);
if (ret)
goto err_out;
 
-   trf->modulator_sys_clk_ctrl = 0;
-
ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS,
TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 |
TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32);
@@ -1181,27 +1183,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a 
*trf, int tech)
switch (tech) {
case NFC_DIGITAL_RF_TECH_106A:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xF8) |
+   TRF7970A_MODULATOR_DEPTH_OOK;
trf->guard_time = TRF7970A_GUARD_TIME_NFCA;
break;
case NFC_DIGITAL_RF_TECH_106B:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xF8) |
+   TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCB;
break;
case NFC_DIGITAL_RF_TECH_212F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xF8) |
+   TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
break;
case NFC_DIGITAL_RF_TECH_424F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xF8) |
+   TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
break;
case NFC_DIGITAL_RF_TECH_ISO15693:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xF8) |
+   TRF7970A_MODULATOR_DEPTH_OOK;
trf->guard_time = TRF7970A_GUARD_TIME_15693;
break;
default:
@@ -1571,17 +1583,23 @@ static int trf7970a_tg_config_rf_tech(struct trf7970a 
*trf, int tech)
trf->iso_ctrl_tech 

Re: kernel/bpf/verifier.c: 4 * possible unintended fallthrough ?

2016-12-20 Thread Alexei Starovoitov
On Tue, Dec 20, 2016 at 3:20 AM, David Binderman  wrote:
> Hello there,
>
> I just tried to compile kernel-4.9 with a recent development
> version of gcc. It said
>
> kernel/bpf/verifier.c:1907:23: warning: this statement may fall through 
> [-Wimplicit-fallthrough=]
> kernel/bpf/verifier.c:1918:23: warning: this statement may fall through 
> [-Wimplicit-fallthrough=]
> kernel/bpf/verifier.c:1859:24: warning: this statement may fall through 
> [-Wimplicit-fallthrough=]
> kernel/bpf/verifier.c:1869:24: warning: this statement may fall through 
> [-Wimplicit-fallthrough=]
>
> Source code for the first one is
>
> case BPF_JGT:
> /* Unsigned comparison, the minimum value is 0. */
> true_reg->min_value = 0;
> case BPF_JSGT:
>
> I suggest either adding the missing break or documenting the fallthrough
> with a comment such as /* FALLTHROUGH */

I've tried gcc 4.9 and 5.2 and don't see this warning.
Is this gcc 6.x?
I suspect it will have such warnings all over the kernel.
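
For reference, a small sketch of what a documented fallthrough looks like. This is not the verifier code: the enum, struct and function names are hypothetical stand-ins. It only shows the comment placement that compilers implementing -Wimplicit-fallthrough are expected to recognise at the default warning level.

```c
/* Hypothetical stand-ins for the opcodes and register state. */
enum jmp_op { OP_JGT, OP_JSGT, OP_OTHER };

struct bounds {
	long min_value;
	long max_value;
};

static void narrow_bounds(enum jmp_op op, struct bounds *b, long imm)
{
	switch (op) {
	case OP_JGT:
		/* Unsigned comparison, the minimum value is 0. */
		b->min_value = 0;
		/* fall through */
	case OP_JSGT:
		/* Shared by both cases: cap the maximum at the immediate. */
		b->max_value = imm;
		break;
	default:
		break;
	}
}
```

The comment sits directly before the next case label, so the intent is visible to readers and the warning stays quiet without changing behaviour.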


Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage

2016-12-20 Thread Geoff Lansberry
On Mon, Dec 19, 2016 at 5:35 PM, Rob Herring  wrote:
> On Thu, Dec 15, 2016 at 05:30:43PM -0500, Geoff Lansberry wrote:
>> From: Geoff Lansberry 
>>
>> ---
>>  Documentation/devicetree/bindings/net/nfc/trf7970a.txt |  2 ++
>>  drivers/nfc/trf7970a.c | 13 -
>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
>> b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> index 9dda879..208f045 100644
>> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> @@ -21,6 +21,7 @@ Optional SoC Specific Properties:
>>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>>where an extra byte is returned by Read Multiple Block commands issued
>>to Type 5 tags.
>> +- vdd_io_1v8: Set to specify that the trf7970a io voltage should be set to 
>> 1.8V
>
> Use the regulator binding and provide a fixed 1.8V supply.
>
>>  - crystal_27mhz: Set to specify that the input frequency to the trf7970a is 
>> 27.12MHz
>>
>>
>> @@ -45,6 +46,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>>   irq-status-read-quirk;
>>   en2-rf-quirk;
>>   t5t-rmb-extra-byte-quirk;
>> + vdd_io_1v8;
>>   crystal_27mhz;
>>   status = "okay";
>>   };

Rob - using the regulator binding is new to me, but I've given it a
shot and just sent you another set of patches for your inspection.
Please let me know if this is what you had in mind.

Geoff


Re: [PATCH v2 3/3] arm64: dts: marvell: Add ethernet switch definition for the ESPRESSObin

2016-12-20 Thread Andrew Lunn
> >>+   mdio {
> >>+   #address-cells = <1>;
> >>+   #size-cells = <0>;
> >>+   reg = <1>;
> >
> >what is this reg value for?
> >
> > Andrew
> >
> 
> It was required to avoid a warning thrown by the mdio subsystem

Do you remember what the warning was?

This seems odd to me. I don't see why a reg is needed here.

Thanks
Andrew


[PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage

2016-12-20 Thread Geoff Lansberry
From: Geoff Lansberry 

The TRF7970A has configuration options for supporting hardware designs
with 1.8 V or 3.3 V I/O. This commit adds a device tree option,
using a fixed regulator binding, for setting the I/O voltage to match
the hardware configuration. If no option is supplied, it defaults to
the 3.3 V configuration.
---
 .../devicetree/bindings/net/nfc/trf7970a.txt   |  4 ++--
 drivers/nfc/trf7970a.c | 28 +-
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index e262ac1..b5777d8 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,9 +21,9 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- vdd-io-supply: Regulator specifying voltage for vdd-io
 - clock-frequency: Set to specify that the input frequency to the trf7970a is 
13560000Hz or 27120000Hz
 
-
 Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 
  {
@@ -41,11 +41,11 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
  < 5 GPIO_ACTIVE_LOW>;
vin-supply = <_reg>;
vin-voltage-override = <500>;
+   vdd-io-supply = <_reg>;
autosuspend-delay = <3>;
irq-status-read-quirk;
en2-rf-quirk;
t5t-rmb-extra-byte-quirk;
-   vdd_io_1v8;
clock-frequency = <27120000>;
status = "okay";
};
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 4e051e9..8a88195 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -444,6 +444,7 @@ struct trf7970a {
u8  iso_ctrl_tech;
u8  modulator_sys_clk_ctrl;
u8  special_fcn_reg1;
+   u8  io_ctrl;
unsigned int guard_time;
int technology;
int framing;
@@ -1051,6 +1052,11 @@ static int trf7970a_init(struct trf7970a *trf)
if (ret)
goto err_out;
 
+   ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
+   trf->io_ctrl | TRF7970A_REG_IO_CTRL_VRS(0x1));
+   if (ret)
+   goto err_out;
+
ret = trf7970a_write(trf, TRF7970A_NFC_TARGET_LEVEL, 0);
if (ret)
goto err_out;
@@ -1767,7 +1773,7 @@ static int _trf7970a_tg_listen(struct nfc_digital_dev 
*ddev, u16 timeout,
goto out_err;
 
ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
-   TRF7970A_REG_IO_CTRL_VRS(0x1));
+   trf->io_ctrl | TRF7970A_REG_IO_CTRL_VRS(0x1));
if (ret)
goto out_err;
 
@@ -2062,6 +2068,7 @@ static int trf7970a_probe(struct spi_device *spi)
return ret;
}
 
+
of_property_read_u32(np, "clock-frequency", &clk_freq);
if ((clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY) ||
(clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY)) {
@@ -2105,6 +2112,25 @@ static int trf7970a_probe(struct spi_device *spi)
if (uvolts > 4000000)
trf->chip_status_ctrl = TRF7970A_CHIP_STATUS_VRS5_3;
 
+   trf->regulator = devm_regulator_get(&spi->dev, "vdd-io");
+   if (IS_ERR(trf->regulator)) {
+   ret = PTR_ERR(trf->regulator);
+   dev_err(trf->dev, "Can't get VDD_IO regulator: %d\n", ret);
+   goto err_destroy_lock;
+   }
+
+   ret = regulator_enable(trf->regulator);
+   if (ret) {
+   dev_err(trf->dev, "Can't enable VDD_IO: %d\n", ret);
+   goto err_destroy_lock;
+   }
+
+
+   if (regulator_get_voltage(trf->regulator) == 1800000) {
+   trf->io_ctrl = TRF7970A_REG_IO_CTRL_IO_LOW;
+   dev_dbg(trf->dev, "trf7970a config vdd_io to 1.8V\n");
+   }
+
trf->ddev = nfc_digital_allocate_device(_nfc_ops,
TRF7970A_SUPPORTED_PROTOCOLS,
NFC_DIGITAL_DRV_CAPS_IN_CRC |
-- 
Signed-off-by: Geoff Lansberry 
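
To make the intent of the patch above easier to follow outside the driver, here is a minimal user-space sketch of the voltage-to-register mapping. It assumes the supply voltage is reported in microvolts (as regulator_get_voltage() does) and uses illustrative bit values; the macro and function names are hypothetical, not the driver's.

```c
#include <stdint.h>

/* Illustrative values only; check the datasheet for the real layout. */
#define IO_CTRL_IO_LOW	0x20u		/* hypothetical: selects 1.8 V I/O */
#define IO_CTRL_VRS(v)	((v) & 0x07u)	/* hypothetical: regulator setting */

/* Build the I/O control byte from the supply voltage in microvolts.
 * Anything other than exactly 1.8 V keeps the 3.3 V default. */
static uint8_t io_ctrl_for_supply(int microvolts)
{
	uint8_t io_ctrl = 0;

	if (microvolts == 1800000)
		io_ctrl = IO_CTRL_IO_LOW;

	return (uint8_t)(io_ctrl | IO_CTRL_VRS(0x1u));
}
```

Keeping the voltage-dependent bit in one stored value and OR-ing it in at every write, as the patch does with trf->io_ctrl, means later writes to the same register do not silently drop it.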



[PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel

2016-12-20 Thread Geoff Lansberry
From: Jaret Cantu 

Repeated polling attempts cause a NULL pointer dereference.
This happens because the trf7970a is still in a reading state when
another request to send a command arrives before the read has finished.

The fix is to cancel the pending read timeout work before failing
the send.
---
 drivers/nfc/trf7970a.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 8a88195..5916737 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -1496,6 +1496,10 @@ static int trf7970a_send_cmd(struct nfc_digital_dev 
*ddev,
(trf->state != TRF7970A_ST_IDLE_RX_BLOCKED)) {
dev_err(trf->dev, "%s - Bogus state: %d\n", __func__,
trf->state);
+   if (trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA ||
+   trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA_CONT)
+   trf->ignore_timeout =
+   !cancel_delayed_work(&trf->timeout_work);
ret = -EIO;
goto out_err;
}
-- 
Signed-off-by: Geoff Lansberry 
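
The logic of this fix can be modelled in user space. The sketch below is only a model: the struct, the state names and model_cancel_timeout() are hypothetical stand-ins, with the helper mimicking the documented behaviour of cancel_delayed_work() (it returns true only if the work was still pending and got removed).

```c
#include <stdbool.h>

enum model_state { ST_IDLE, ST_WAIT_FOR_RX_DATA, ST_WAIT_FOR_RX_DATA_CONT };

struct dev_model {
	enum model_state state;
	bool timeout_pending;	/* stand-in for a queued delayed work item */
	bool ignore_timeout;
};

/* Returns true if a pending timeout was removed, false if none was queued. */
static bool model_cancel_timeout(struct dev_model *d)
{
	bool was_pending = d->timeout_pending;

	d->timeout_pending = false;
	return was_pending;
}

/* Returns 0 if a send may proceed, -1 (standing in for -EIO) otherwise. */
static int model_send_cmd(struct dev_model *d)
{
	if (d->state != ST_IDLE) {
		if (d->state == ST_WAIT_FOR_RX_DATA ||
		    d->state == ST_WAIT_FOR_RX_DATA_CONT)
			d->ignore_timeout = !model_cancel_timeout(d);
		return -1;
	}
	return 0;
}
```

If the timeout could not be cancelled because it was no longer queued, ignore_timeout is set so that a handler already in flight can tell its result is stale.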


