Re: Soft lockup in tc_classify
On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein wrote:
>
> Tried it with same results

This piece is pretty interesting:

[ 408.554689] DEBUGG:SK thread-2853[cpu-1] setting tp_created to 1 tp=94b5b02805a0 back=94b9ea932060
[ 408.574258] DEBUGG:SK thread-2853[cpu-1] add/change filter by: fl_get [cls_flower] tp=94b5b02805a0 tp->next=94b9ea932060
[ 408.587849] DEBUGG:SK destroy 94b5b0280780 tcf_destroy:1905
[ 408.595862] DEBUGG:SK thread-2845[cpu-1] add/change filter by: fl_get [cls_flower] tp=94b5b02805a0 tp->next=94b5b02805a0

Looks like you added a debug printk inside tcf_destroy() too, which seems racy with filter creation. That should not happen, since in both cases we take the RTNL lock.

I don't know whether changing all RCU_INIT_POINTER in that file to rcu_assign_pointer would help or not. Mind trying?

Thanks for debugging!
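For readers unfamiliar with the distinction behind that suggestion: in the kernel, rcu_assign_pointer() publishes a pointer with release ordering, while RCU_INIT_POINTER() is a plain store, meant for cases where the value is NULL or readers cannot reach the structure yet. The user-space sketch below only mimics the two with C11 atomics. The names `struct tp`, `publish_release`, `publish_plain`, `read_ptr` and `insert_head` are stand-ins invented for illustration, not kernel code, and the acquire load is a conservative substitute for rcu_dereference().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for a classifier node; not the kernel's struct tcf_proto. */
struct tp {
	int prio;
	struct tp *_Atomic next;
};

/* Analogue of rcu_assign_pointer(): a release store, so the node's
 * fields are visible to any reader that observes the new pointer. */
static void publish_release(struct tp *_Atomic *slot, struct tp *node)
{
	atomic_store_explicit(slot, node, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(): no ordering guarantee. Only safe when
 * no reader can reach *slot yet, or when the stored value is NULL. */
static void publish_plain(struct tp *_Atomic *slot, struct tp *node)
{
	atomic_store_explicit(slot, node, memory_order_relaxed);
}

/* Reader side; acquire is used as a conservative stand-in. */
static struct tp *read_ptr(struct tp *_Atomic *slot)
{
	return atomic_load_explicit(slot, memory_order_acquire);
}

/* Insert 'node' in front of the current head of a list. */
static void insert_head(struct tp *_Atomic *head, struct tp *node)
{
	/* node is not reachable yet, so a plain store is enough here */
	publish_plain(&node->next, read_ptr(head));
	/* readers may follow head immediately, so this store needs release */
	publish_release(head, node);
}
```

Using the plain variant for the second store would let a concurrent reader see the new head before its `next` field is initialised, which is the kind of ordering problem the suggested change would rule out.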
Re: HalfSipHash Acceptable Usage
Eric Dumazet wrote:
> On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>>                    Pentium  Core 2  Ivy
>>                    4        Duo     Bridge
>> SipHash-2-4        38.9     8.3     5.8
>> HalfSipHash-2-4    12.7     4.5     3.2
>> MD5                 8.3     5.7     4.7
>
> So definitely not faster.
>
> 38 cycles per byte is a problem, considering IPV6 is ramping up.

As I said earlier, SipHash performance on 32-bit x86 really sucks, because it wants an absolute minimum of 9 32-bit registers (8 for the state plus one temporary for the rotates), and x86 has 7.

> What about SHA performance (syncookies) on P4 ?

I recompiled with -mtune=pentium4 and re-ran. MD5 time went *up* by 0.3 cycles/byte, HalfSipHash went down by 1 cycle, and SipHash didn't change:

Cycles per byte on 1024 bytes of data:
                   Pentium  Core 2  Ivy
                   4        Duo     Bridge
SipHash-2-4        38.9     8.3     5.8
HalfSipHash-2-4    11.5     4.5     3.2
MD5                 8.6     5.7     4.7
SHA-1              19.0     8.0     6.8

(This is with a verbatim copy of the lib/sha1.c code; I might be able to optimize it with some asm hackery.)

Anyway, you see why we were looking longingly at HalfSipHash.

In fact, I have an idea. Allow me to make the following concrete suggestion for using HalfSipHash with 128 bits of key material:

- 64 bits are used as the key.
- The other 64 bits are used as an IV which is prepended to the message to be hashed.

As a matter of practical implementation, we precompute the effect of hashing the IV and store the 128-bit HalfSipHash state, which is used just like a 128-bit key.

Because of the way it is constructed, it is obviously no weaker than standard HalfSipHash's 64-bit security claim.

I don't know the security of this, and it's almost certainly weaker than 128 bits, but I *hope* it's at least a few bits stronger than 64 bits. 80 would be enough to dissuade any attacker without a six-figure budget (that's per attack, not a one-time capital investment). 96 would be ample for our purposes.

What I do know is that it makes a brute-force attack without significant cryptanalytic effort impossible.

To match the spec exactly, we'd need to add the 8-byte IV length to the length byte which pads the final block, but from a security point of view, it does not matter. As long as we are consistent within any single key, any unique mapping between padding byte and message length (mod 256) is equally good. We may choose based on implementation convenience. (Also note my earlier comments about when it is okay to omit the padding length byte entirely: any time all the data to be hashed with a given key is fixed in format or self-delimiting (e.g. null-terminated). This applies to many of the networking uses.)
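The precomputation described above can be sketched as follows. This is an illustration only: the round function and initialisation constants are written from memory of the HalfSipHash reference code and have not been checked against test vectors, messages are restricted to whole 32-bit words to keep the padding trivial, and every name (`hsip_state`, `hsip_precompute`, `hsip_hash_pre`, ...) is made up for this example. The point it shows is that hashing from the stored state gives the same result as hashing IV || message from the 64-bit key, provided the 8 IV bytes are counted in the length byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hsip_state { uint32_t v0, v1, v2, v3; };

static uint32_t rotl32(uint32_t x, unsigned int b)
{
	return (x << b) | (x >> (32 - b));
}

/* One ARX round in the HalfSipHash style (constants are an assumption). */
static void hsip_round(struct hsip_state *s)
{
	s->v0 += s->v1; s->v1 = rotl32(s->v1, 5);  s->v1 ^= s->v0;
	s->v0 = rotl32(s->v0, 16);
	s->v2 += s->v3; s->v3 = rotl32(s->v3, 8);  s->v3 ^= s->v2;
	s->v0 += s->v3; s->v3 = rotl32(s->v3, 7);  s->v3 ^= s->v0;
	s->v2 += s->v1; s->v1 = rotl32(s->v1, 13); s->v1 ^= s->v2;
	s->v2 = rotl32(s->v2, 16);
}

/* Keyed initialisation from a 64-bit key given as two 32-bit halves. */
static struct hsip_state hsip_init(uint32_t k0, uint32_t k1)
{
	struct hsip_state s = { k0, k1, 0x6c796765u ^ k0, 0x74656462u ^ k1 };
	return s;
}

/* Absorb n whole 32-bit words, two rounds per word. */
static void hsip_absorb(struct hsip_state *s, const uint32_t *w, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		s->v3 ^= w[i];
		hsip_round(s);
		hsip_round(s);
		s->v0 ^= w[i];
	}
}

/* Absorb the final word (length byte only, no tail bytes), then run the
 * four finalisation rounds and fold the state to 32 bits. */
static uint32_t hsip_finish(struct hsip_state s, size_t total_bytes)
{
	uint32_t pad = (uint32_t)(total_bytes & 0xff) << 24;

	hsip_absorb(&s, &pad, 1);
	s.v2 ^= 0xff;
	for (int i = 0; i < 4; i++)
		hsip_round(&s);
	return s.v1 ^ s.v3;
}

/* Done once per key: this 128-bit state is what gets stored. */
static struct hsip_state hsip_precompute(const uint32_t key[2],
					 const uint32_t iv[2])
{
	struct hsip_state s = hsip_init(key[0], key[1]);

	hsip_absorb(&s, iv, 2);
	return s;
}

/* Per message: start from the stored state; the 8 IV bytes are included
 * in the length so the result matches a plain hash of IV || msg. */
static uint32_t hsip_hash_pre(const struct hsip_state *pre,
			      const uint32_t *msg, size_t n)
{
	struct hsip_state s = *pre;

	hsip_absorb(&s, msg, n);
	return hsip_finish(s, 8 + 4 * n);
}
```

Dropping the `8 +` in `hsip_hash_pre` would still give a consistent mapping within one key, as the message notes, but the output would no longer equal the specification's hash of the prefixed message.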
Re: ipv6: handle -EFAULT from skb_copy_bits
On Tue, Dec 20, 2016 at 2:12 PM, Dave Jones wrote:
> fd = socket(AF_INET6, SOCK_RAW, 7);
>
> setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, , 4);
> setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, , LEN);
>

Interesting, you set the checksum offset to be 0, but the packet size is actually 49 and the transport header is located at offset 48, so apparently the packet doesn't have room for a 16-bit checksum after the network header.

Your original patch seems reasonable to me, unless there is some check in __ip6_append_data() which is supposed to catch this, but CHECKSUM is specific to raw sockets only.
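The arithmetic in that reply can be written down as a small bounds check. This is a hypothetical helper for illustration (`csum_fits` is not a kernel function): it asks whether two checksum bytes, placed `csum_offset` bytes into the transport payload, still lie inside the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does a 16-bit checksum at csum_offset bytes past
 * the transport header fit inside a packet of pkt_len bytes? */
static bool csum_fits(size_t pkt_len, size_t transport_off, size_t csum_offset)
{
	size_t payload;

	if (transport_off > pkt_len)
		return false;
	payload = pkt_len - transport_off;
	return csum_offset <= payload && payload - csum_offset >= 2;
}
```

With the numbers from the report, `csum_fits(49, 48, 0)` is false: only one byte follows the transport header offset, so there is no room for the two checksum bytes.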
Re: HalfSipHash Acceptable Usage
On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
> > I do not see why SipHash, if faster than MD5 and more secure, would be a
> > problem.
>
> Because on 32-bit x86, it's slower.
>
> Cycles per byte on 1024 bytes of data:
>                    Pentium  Core 2  Ivy
>                    4        Duo     Bridge
> SipHash-2-4        38.9     8.3     5.8
> HalfSipHash-2-4    12.7     4.5     3.2
> MD5                 8.3     5.7     4.7

So definitely not faster.

38 cycles per byte is a problem, considering IPV6 is ramping up.

But TCP session establishment on P4 is probably not a big deal. Nobody would expect a P4 to handle gazillions of TCP flows (using a 32bit kernel)

What about SHA performance (syncookies) on P4 ?

Synfloods are probably the only case we might take care of for 2000-era cpus.
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Tue, Dec 20, 2016 at 10:49:25AM -0800, Andy Lutomirski wrote: > >> FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too. > >> It doesn't make a semantic difference, except that I dislike > >> BPF_PROG_DETACH because that particular command isn't BPF-specific at > >> all. > > > > Well, I think it is; it pops the bpf program from a target and drops the > > reference on it. It's not much code, but it's certainly bpf-specific. > > I mean the interface isn't bpf-specific. If there was something that > wasn't bpf attached to the target, you'd still want an API to detach > it. This discussion won't go anywhere while you keep thinking that this api has to be generalized. As I explained several times earlier BPF_CGROUP_INET_SOCK_CREATE hook is bpf specific. There is nothing in the kernel that can take advantage of it today, so by definition the hook is bpf specific. Period. Saying that something in the future may come along that would want to use that is like saying I want to design the generic steering wheel for any car that will ever need it. Hence if you want to change 'target_fd' in BPF_PROG_ATTACH/DETACH cmds from being fd of open("cgroupdir") to fd of open("cgroupdir/cgroup.bpf") file inside it then I'm ok with that. All other proposals with non-extensible ioctls() and crazy text based per-hook permissions is nack.
Re: HalfSipHash Acceptable Usage
> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.

Because on 32-bit x86, it's slower.

Cycles per byte on 1024 bytes of data:
                   Pentium  Core 2  Ivy
                   4        Duo     Bridge
SipHash-2-4        38.9     8.3     5.8
HalfSipHash-2-4    12.7     4.5     3.2
MD5                 8.3     5.7     4.7

SipHash is more parallelizable and runs faster on superscalar processors, but MD5 is optimized for 2000-era processors, and is faster on them than HalfSipHash even.

Now, in the applications we care about, we're hashing short blocks, and SipHash has the advantage that it can hash less than 64 bytes. But it also pays a penalty on short blocks for the finalization, equivalent to two words (16 bytes) of input.

It turns out that on both Ivy Bridge and Core 2 Duo, the crossover happens between 23 (SipHash is faster) and 24 (MD5 is faster) bytes of input.

This is assuming you're adding the 1 byte of length padding to SipHash's input, so 24 bytes pads to 4 64-bit words, which makes 2*4+4 = 12 rounds, vs. one block for MD5. (MD5 takes a similar jump between 55 and 56 bytes.)

On a P4, SipHash is *never* faster; it takes 2.5x longer than MD5 on a 12-byte block (an IPv4 address/port pair).

This is why there was discussion of using HalfSipHash on these machines. (On a P4, the HalfSipHash/MD5 crossover is somewhere between 24 and 31 bytes; I haven't benchmarked every possible size.)
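The step counts quoted above follow directly from the padding rules, and a short sketch makes the jumps at 24 and 56 bytes visible. The helper names `siphash24_rounds` and `md5_blocks` are invented for this note; the formulas assume SipHash-2-4 appends one length byte and pads to 8-byte words, and MD5 appends a 0x80 byte plus an 8-byte length and pads to 64-byte blocks.

```c
#include <assert.h>
#include <stddef.h>

/* SipHash-2-4: two rounds per 8-byte word of (input + length byte),
 * plus four finalisation rounds. */
static unsigned int siphash24_rounds(size_t n)
{
	size_t words = (n + 1 + 7) / 8;

	return (unsigned int)(2 * words + 4);
}

/* MD5: compression-function calls for n bytes of input, counting the
 * mandatory 0x80 byte and the 8-byte length field. */
static unsigned int md5_blocks(size_t n)
{
	return (unsigned int)((n + 1 + 8 + 63) / 64);
}
```

So 23 bytes cost 10 SipHash rounds and 24 bytes cost 12, while MD5 stays at one block up to 55 bytes and needs two from 56 bytes on.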
Re: [PATCH] staging: octeon: Call SET_NETDEV_DEV()
From: Florian Fainelli
Date: Tue, 20 Dec 2016 17:02:37 -0800

> On 12/14/2016 05:13 PM, Florian Fainelli wrote:
>> The Octeon driver calls into PHYLIB which now checks for
>> net_device->dev.parent, so make sure we do set it before calling into
>> any MDIO/PHYLIB related function.
>>
>> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a
>> different owner")
>> Reported-by: Aaro Koskinen
>> Signed-off-by: Florian Fainelli
>
> Greg, David, since this is a fix for a regression introduced in the net
> tree, it may make sense that David take it via his tree.

Since the change in question is in Linus's tree, it's equally valid for Greg to take it as well.
Re: [PATCH net-next 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address
On Wed, Dec 21, 2016 at 2:30 AM, David Miller wrote:
> From: f...@ikuai8.com
> Date: Mon, 19 Dec 2016 09:24:05 +0800
>
>> It is sent again because the first email is sent during net-next closing.
>
> It is still closed, and will not open again for at least one week.

Thanks David. I thought it only lasted one week. I will wait for it to reopen and resend then.

Regards
Feng
Re: [PATCH] phy: check if parent device is NULL
Yes, I saw that with the staging Octeon driver. Your patch works for me too. Thanks Florian!

On Tue, Dec 20, 2016 at 4:33 PM, Florian Fainelli wrote:
> On 12/20/2016 03:51 PM, Ruslan Babayev wrote:
>> Fixes a crash observed on Octeon.
>>
>> Signed-off-by: Ruslan Babayev
>> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a
>> different owner")
>
> Assuming you saw this with the staging Octeon driver, a fix has already
> been submitted:
>
> https://lkml.org/lkml/2016/12/14/756
>
> If this is with a different driver, I would rather we fix it in a
> similar way that the fix proposed above.
>
> Thanks
>
>> ---
>> drivers/net/phy/phy_device.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 9c06f8028f0c..043328b85643 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -905,7 +905,8 @@ EXPORT_SYMBOL(phy_attached_print);
>>  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>  u32 flags, phy_interface_t interface)
>>  {
>> - struct module *ndev_owner = dev->dev.parent->driver->owner;
>> + struct device *parent = dev->dev.parent;
>> + struct module *ndev_owner = parent ? parent->driver->owner : NULL;
>>  struct mii_bus *bus = phydev->mdio.bus;
>>  struct device *d = >mdio.dev;
>>  int err;
>>
>
>
> --
> Florian
Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
On Tue, Dec 20, 2016 at 11:16:31AM -0500, Geoff Lansberry wrote:
> From: Geoff Lansberry
>
> The TRF7970A has configuration options for supporting hardware designs
> with 1.8 Volt or 3.3 Volt IO. This commit adds a device tree option,
> using a fixed regulator binding, for setting the io voltage to match
> the hardware configuration. If no option is supplied it defaults to
> 3.3 volt configuration.

Sign-off ?? Same comment for your other patches.

Okay I see you have it at the end of the patch. It should be here. 'git commit -s' is your friend.

> ---
> .../devicetree/bindings/net/nfc/trf7970a.txt | 4 ++--
> drivers/nfc/trf7970a.c | 28
> +-
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> index e262ac1..b5777d8 100644
> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> @@ -21,9 +21,9 @@ Optional SoC Specific Properties:
> - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>where an extra byte is returned by Read Multiple Block commands issued
>to Type 5 tags.
> +- vdd-io-supply: Regulator specifying voltage for vdd-io
> - clock-frequency: Set to specify that the input frequency to the trf7970a
> is 1356Hz or 2712Hz
>
> -
> Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>
> {
> @@ -41,11 +41,11 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
> < 5 GPIO_ACTIVE_LOW>;
> vin-supply = <_reg>;
> vin-voltage-override = <500>;
> + vdd-io-supply = <_reg>;
> autosuspend-delay = <3>;
> irq-status-read-quirk;
> en2-rf-quirk;
> t5t-rmb-extra-byte-quirk;
> - vdd_io_1v8;

It was already mentioned but this shouldn't have been added in the previous patch so it shouldn't be here now.

> clock-frequency = <2712>;
> status = "okay";
> };
> diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
> index 4e051e9..8a88195 100644
> --- a/drivers/nfc/trf7970a.c
> +++ b/drivers/nfc/trf7970a.c
> @@ -2062,6 +2068,7 @@ static int trf7970a_probe(struct spi_device *spi)
> return ret;
> }
>
> +

Please don't add an extra blank line.

> of_property_read_u32(np, "clock-frequency", _freq);
> if ((clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY) ||
> (clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY)) {
> @@ -2105,6 +2112,25 @@ static int trf7970a_probe(struct spi_device *spi)
> if (uvolts > 400)
> trf->chip_status_ctrl = TRF7970A_CHIP_STATUS_VRS5_3;
>
> + trf->regulator = devm_regulator_get(>dev, "vdd-io");
> + if (IS_ERR(trf->regulator)) {
> + ret = PTR_ERR(trf->regulator);
> + dev_err(trf->dev, "Can't get VDD_IO regulator: %d\n", ret);
> + goto err_destroy_lock;
> + }
> +
> + ret = regulator_enable(trf->regulator);
> + if (ret) {
> + dev_err(trf->dev, "Can't enable VDD_IO: %d\n", ret);
> + goto err_destroy_lock;
> + }
> +
> +

Please don't add an extra blank line.

> + if (regulator_get_voltage(trf->regulator) == 180) {
> + trf->io_ctrl = TRF7970A_REG_IO_CTRL_IO_LOW;
> + dev_dbg(trf->dev, "trf7970a config vdd_io to 1.8V\n");
> + }
> +
> trf->ddev = nfc_digital_allocate_device(_nfc_ops,
> TRF7970A_SUPPORTED_PROTOCOLS,
> NFC_DIGITAL_DRV_CAPS_IN_CRC |
> --
> Signed-off-by: Geoff Lansberry

Your 'Signed-off-by:' goes at the end of the commit description, not here.

Overall, I think you did the right thing (unless someone disagrees). Just some minor issues.

Mark
--
Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf
I find it curious that only the last 4 bytes have data in them. I'm wondering if the NIC/driver in the Windows/Nessus system is interpreting the 4 byte CRC on the end of the frame as padding instead of stripping it. Is there any chance you could capture the entire frame instead of just the padding? Maybe you could run something like wireshark without enabling promiscuous mode on the VF and capture the frames it is trying to send and receive. What I want to verify is what the actual amount of padding is that is needed to get to 60 bytes and where the CRC should start. - Alex On Tue, Dec 20, 2016 at 5:40 PM, Weilong Chenwrote: > Thanks for you explanation, it's very professional. > > My test is like this: > The Nessus is deployed on a windows server, the peer is a X86_64 linux host > which run several VMs on it. The nic is Intel 82599 and SRIOV is enabled. > VFs are passthroughed to the VMs. No DPDK. > > The Nessus server send small ICMP echo request packets to the VM, and > then check the reply, and report the error: > > "11197 - Multiple Ethernet Driver Frame Padding Information Disclosure > (Etherleak)" > > "Padding observed in one frame : > > 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 57 37 28 .W7( > 0x10: 76 v > > Padding observed in another frame : > > 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 D3 4D 75 ..Mu > 0x10: 28 (" > > I only have Nessus's windows version, so can't test on linux. Maybe the > windows server does not pad small packets to 60 bytes on the receive path. > > > On 2016/12/21 0:36, Alexander Duyck wrote: >> >> The limit of 17 is just based on the hardware. Specifically the >> olinfo field in the Tx descriptor has a minimum length of 17 has a >> requirement. The hardware itself is supposed to be capable of padding >> short frames that are supposed to be transmitted. The drivers are >> supposed to pad short frames on receive to get them up to 60 bytes. 
>> >> When you are seeing this issue are you sending frames from the VF to >> one of the local interfaces on the same port or to an external >> interface? Also are you receiving on another linux ixgbevf driver or >> are you receiving the packet using a different driver interface such >> as DPDK? I'm just wanting to verify this as it is possible that the >> memory leak you are seeing is on the receiver and not on the source if >> you are transmitting to a local VF or the PF as the receiver will have >> to pad the frame in such a case to get it up to 60 bytes. >> >> - Alex >> >> On Tue, Dec 20, 2016 at 3:50 AM, Weilong Chen >> wrote: >>> >>> Hi, >>> >>> Thanks for you reply. >>> We test you patch, but the problem is still there, it seems do not work. >>> >>> I'm not sure why ixgbe use the limit 17. The kenel use ETH_ZLEN (60) with >>> out FCS. A lot of drivers such as e1000 use it. Any explaination? >>> >>> Thanks. >>> >>> >>> On 2016/12/16 0:13, Alexander Duyck wrote: On Thu, Dec 15, 2016 at 3:40 AM, Weilong Chen wrote: > > > Nessus report the vf appears to leak memory in network packets. > Fix this by padding all small packets manually. > > And the CVE-2003-0001. > > > https://ofirarkin.files.wordpress.com/2008/11/atstake_etherleak_report.pdf > > Signed-off-by: Weilong Chen > --- > drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c > b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c > index 6d4bef5..137a154 100644 > --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c > +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c > @@ -3654,6 +3654,13 @@ static int ixgbevf_xmit_frame(struct sk_buff > *skb, > struct net_device *netdev) > return NETDEV_TX_OK; > } > > + /* On PCI/PCI-X HW, if packet size is less than ETH_ZLEN, > +* packets may get corrupted during padding by HW. > +* To WA this issue, pad all small packets manually. 
> +*/ > + if (eth_skb_pad(skb)) > + return NETDEV_TX_OK; > + So the patch description for this probably isn't correct. It looks like the problem isn't leaking data it is the fact that the frames aren't being padded to prevent malicious events. The only issue is the patch is padding by a bit too much. I would recommend replacing this with the following from ixgbe: /* * The minimum packet size for olinfo paylen is 17 so pad the skb * in order to meet this minimum size requirement. */ if (skb_put_padto(skb, 17)) return NETDEV_TX_OK; > tx_ring = adapter->tx_ring[skb->queue_mapping];
Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
On Tue, Dec 20, 2016 at 11:13:23AM -0500, Geoff Lansberry wrote:
> On Mon, Dec 19, 2016 at 5:35 PM, Rob Herring wrote:
> > On Thu, Dec 15, 2016 at 05:30:43PM -0500, Geoff Lansberry wrote:
> >> From: Geoff Lansberry
> >>
> >> ---
> >> Documentation/devicetree/bindings/net/nfc/trf7970a.txt | 2 ++
> >> drivers/nfc/trf7970a.c | 13 -
> >> 2 files changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> >> b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> >> index 9dda879..208f045 100644
> >> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> >> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> >> @@ -21,6 +21,7 @@ Optional SoC Specific Properties:
> >> - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
> >>where an extra byte is returned by Read Multiple Block commands issued
> >>to Type 5 tags.
> >> +- vdd_io_1v8: Set to specify that the trf7970a io voltage should be set
> >> to 1.8V
> >
> > Use the regulator binding and provide a fixed 1.8V supply.
> >
> >> - crystal_27mhz: Set to specify that the input frequency to the trf7970a
> >> is 27.12MHz
> >>
> >>
> >> @@ -45,6 +46,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
> >> irq-status-read-quirk;
> >> en2-rf-quirk;
> >> t5t-rmb-extra-byte-quirk;
> >> + vdd_io_1v8;
> >> crystal_27mhz;
> >> status = "okay";
> >> };
>
> Rob - using the regulator binding is new to me, but I've given it a
> shot and just sent you another set of patches for your inspection.
> Please let me know if this is what you had in mind.

This is my bad. Geoff followed my example and did something similar to 'vin-voltage-override' which shouldn't have been there in the first place. I have this fixed (I think) locally and will submit once I'm back from my holiday travels.

Mark
--
Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf
Thanks for you explanation, it's very professional. My test is like this: The Nessus is deployed on a windows server, the peer is a X86_64 linux host which run several VMs on it. The nic is Intel 82599 and SRIOV is enabled. VFs are passthroughed to the VMs. No DPDK. The Nessus server send small ICMP echo request packets to the VM, and then check the reply, and report the error: "11197 - Multiple Ethernet Driver Frame Padding Information Disclosure (Etherleak)" "Padding observed in one frame : 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 57 37 28 .W7( 0x10: 76 v Padding observed in another frame : 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 D3 4D 75 ..Mu 0x10: 28 (" I only have Nessus's windows version, so can't test on linux. Maybe the windows server does not pad small packets to 60 bytes on the receive path. On 2016/12/21 0:36, Alexander Duyck wrote: The limit of 17 is just based on the hardware. Specifically the olinfo field in the Tx descriptor has a minimum length of 17 has a requirement. The hardware itself is supposed to be capable of padding short frames that are supposed to be transmitted. The drivers are supposed to pad short frames on receive to get them up to 60 bytes. When you are seeing this issue are you sending frames from the VF to one of the local interfaces on the same port or to an external interface? Also are you receiving on another linux ixgbevf driver or are you receiving the packet using a different driver interface such as DPDK? I'm just wanting to verify this as it is possible that the memory leak you are seeing is on the receiver and not on the source if you are transmitting to a local VF or the PF as the receiver will have to pad the frame in such a case to get it up to 60 bytes. - Alex On Tue, Dec 20, 2016 at 3:50 AM, Weilong Chenwrote: Hi, Thanks for you reply. We test you patch, but the problem is still there, it seems do not work. I'm not sure why ixgbe use the limit 17. The kenel use ETH_ZLEN (60) with out FCS. 
A lot of drivers such as e1000 use it. Any explaination? Thanks. On 2016/12/16 0:13, Alexander Duyck wrote: On Thu, Dec 15, 2016 at 3:40 AM, Weilong Chen wrote: Nessus report the vf appears to leak memory in network packets. Fix this by padding all small packets manually. And the CVE-2003-0001. https://ofirarkin.files.wordpress.com/2008/11/atstake_etherleak_report.pdf Signed-off-by: Weilong Chen --- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index 6d4bef5..137a154 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -3654,6 +3654,13 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev) return NETDEV_TX_OK; } + /* On PCI/PCI-X HW, if packet size is less than ETH_ZLEN, +* packets may get corrupted during padding by HW. +* To WA this issue, pad all small packets manually. +*/ + if (eth_skb_pad(skb)) + return NETDEV_TX_OK; + So the patch description for this probably isn't correct. It looks like the problem isn't leaking data it is the fact that the frames aren't being padded to prevent malicious events. The only issue is the patch is padding by a bit too much. I would recommend replacing this with the following from ixgbe: /* * The minimum packet size for olinfo paylen is 17 so pad the skb * in order to meet this minimum size requirement. */ if (skb_put_padto(skb, 17)) return NETDEV_TX_OK; tx_ring = adapter->tx_ring[skb->queue_mapping]; /* need: 1 descriptor per page * PAGE_SIZE/IXGBE_MAX_DATA_PER_TXD, -- 1.7.12 . .
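Both padding calls discussed in this thread, eth_skb_pad() (pad to ETH_ZLEN) and skb_put_padto(skb, 17), extend a short frame with zero bytes, so nothing stale from the buffer goes out on the wire. A user-space analogue is sketched below; `pad_frame` is a hypothetical helper written for this note, not a kernel API, and it works on a plain byte buffer instead of an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero-pad the frame in buf (capacity cap) from len up to min_len bytes.
 * Returns the resulting length, or 0 if the buffer is too small. */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap,
			size_t min_len)
{
	if (len >= min_len)
		return len;		/* already long enough */
	if (min_len > cap)
		return 0;		/* no room to pad */
	memset(buf + len, 0, min_len - len);
	return min_len;
}
```

Calling it with `min_len` of 60 corresponds to the ETH_ZLEN variant from the original patch, and 17 to the minimum that the reply attributes to the olinfo paylen field.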
Re: [PATCH] staging: octeon: Call SET_NETDEV_DEV()
On 12/14/2016 05:13 PM, Florian Fainelli wrote:
> The Octeon driver calls into PHYLIB which now checks for
> net_device->dev.parent, so make sure we do set it before calling into
> any MDIO/PHYLIB related function.
>
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a
> different owner")
> Reported-by: Aaro Koskinen
> Signed-off-by: Florian Fainelli

Greg, David, since this is a fix for a regression introduced in the net tree, it may make sense that David take it via his tree.

Thanks

> ---
> drivers/staging/octeon/ethernet.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/staging/octeon/ethernet.c
> b/drivers/staging/octeon/ethernet.c
> index 8130dfe89745..4971aa54756a 100644
> --- a/drivers/staging/octeon/ethernet.c
> +++ b/drivers/staging/octeon/ethernet.c
> @@ -770,6 +770,7 @@ static int cvm_oct_probe(struct platform_device *pdev)
>  /* Initialize the device private structure. */
>  struct octeon_ethernet *priv = netdev_priv(dev);
>
> + SET_NETDEV_DEV(dev, >dev);
>  dev->netdev_ops = _oct_pow_netdev_ops;
>  priv->imode = CVMX_HELPER_INTERFACE_MODE_DISABLED;
>  priv->port = CVMX_PIP_NUM_INPUT_PORTS;
> @@ -816,6 +817,7 @@ static int cvm_oct_probe(struct platform_device *pdev)
>  }
>
>  /* Initialize the device private structure. */
> + SET_NETDEV_DEV(dev, >dev);
>  priv = netdev_priv(dev);
>  priv->netdev = dev;
>  priv->of_node = cvm_oct_node_for_port(pip, interface,
>

--
Florian
Re: [PATCH] phy: check if parent device is NULL
On 12/20/2016 03:51 PM, Ruslan Babayev wrote:
> Fixes a crash observed on Octeon.
>
> Signed-off-by: Ruslan Babayev
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a
> different owner")

Assuming you saw this with the staging Octeon driver, a fix has already been submitted:

https://lkml.org/lkml/2016/12/14/756

If this is with a different driver, I would rather we fix it in a similar way that the fix proposed above.

Thanks

> ---
> drivers/net/phy/phy_device.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 9c06f8028f0c..043328b85643 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -905,7 +905,8 @@ EXPORT_SYMBOL(phy_attached_print);
>  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>  u32 flags, phy_interface_t interface)
>  {
> - struct module *ndev_owner = dev->dev.parent->driver->owner;
> + struct device *parent = dev->dev.parent;
> + struct module *ndev_owner = parent ? parent->driver->owner : NULL;
>  struct mii_bus *bus = phydev->mdio.bus;
>  struct device *d = >mdio.dev;
>  int err;
>

--
Florian
Re: [PATCH net-next 00/10] netcp: enhancements and minor fixes
The net-next tree is not open, do not resubmit this series until it is open again. Thanks.
Re: HalfSipHash Acceptable Usage
On Tue, 2016-12-20 at 16:36 -0500, Theodore Ts'o wrote: > On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote: > > 1) Anything that requires actual long-term security will use > > SipHash2-4, with the 64-bit output and the 128-bit key. This includes > > things like TCP sequence numbers. This seems pretty uncontroversial to > > me. Seem okay to you? > > Um, why do TCP sequence numbers need long-term security? So long as > you rekey every 5 minutes or so, TCP sequence numbers don't need any > more security than that, since even if you break the key used to > generate initial sequence numbers seven a minute or two later, any > pending TCP connections will have timed out long before. > > See the security analysis done in RFC 6528[1], where among other > things, it points out why MD5 is acceptable with periodic rekeying, > although there is the concern that this could break certain hueristics > used when establishing new connections during the TIME-WAIT state. > > [1] https://tools.ietf.org/html/rfc6528 We do not use rekeying for TCP ISN, not anymore after commit 6e5714eaf77d79ae1 (where we switched from MD4 to MD5 ) It might hurt some common cases and I do not believe it is mandated by a current (ie not obsolete) RFC. Our clock has a 64 ns resolution and 274 second period (commit 9b42c336d0641) (compared to 4 usec one in RFC 6528) I do not see why SipHash, if faster than MD5 and more secure, would be a problem. Same for syncookies. BTW, we probably should add a ratelimit on SYNACK retransmits, because it seems that attackers understood linux kernels resist to synfloods, and they (the bad guys) use reflection attacks.
[PATCH] phy: check if parent device is NULL
Fixes a crash observed on Octeon.

Signed-off-by: Ruslan Babayev
Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a different owner")
---
drivers/net/phy/phy_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 9c06f8028f0c..043328b85643 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -905,7 +905,8 @@ EXPORT_SYMBOL(phy_attached_print);
 int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 u32 flags, phy_interface_t interface)
 {
- struct module *ndev_owner = dev->dev.parent->driver->owner;
+ struct device *parent = dev->dev.parent;
+ struct module *ndev_owner = parent ? parent->driver->owner : NULL;
 struct mii_bus *bus = phydev->mdio.bus;
 struct device *d = >mdio.dev;
 int err;
--
2.7.4
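The guard in this patch can be tried outside the kernel with a few stand-in structures. The types below model only the fields the lookup touches and are not the kernel definitions; `ndev_owner_of` is a name made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures involved. */
struct module { const char *name; };
struct device_driver { struct module *owner; };
struct device { struct device *parent; struct device_driver *driver; };
struct net_device { struct device dev; };

/* Same shape as the patched lookup: a net_device without a parent
 * yields a NULL owner instead of dereferencing a NULL pointer. */
static struct module *ndev_owner_of(const struct net_device *dev)
{
	const struct device *parent = dev->dev.parent;

	return parent ? parent->driver->owner : NULL;
}
```

Note that, like the patch, this still assumes a non-NULL parent has a bound driver. The alternative fix referenced in the replies avoids the question by making the driver call SET_NETDEV_DEV() so that the parent is always set.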
Re: HalfSipHash Acceptable Usage
Theodore Ts'o wrote: > On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote: >> 1) Anything that requires actual long-term security will use >> SipHash2-4, with the 64-bit output and the 128-bit key. This includes >> things like TCP sequence numbers. This seems pretty uncontroversial to >> me. Seem okay to you? > Um, why do TCP sequence numbers need long-term security? So long as > you rekey every 5 minutes or so, TCP sequence numbers don't need any > more security than that, since even if you break the key used to > generate initial sequence numbers seven a minute or two later, any > pending TCP connections will have timed out long before. > > See the security analysis done in RFC 6528[1], where among other > things, it points out why MD5 is acceptable with periodic rekeying, > although there is the concern that this could break certain hueristics > used when establishing new connections during the TIME-WAIT state. Because we don't rekey TCP sequence numbers, ever. See commit 6e5714eaf77d79ae1c8b47e3e040ff5411b717ec To rekey them requires dividing the sequence number base into a "random" part and some "generation" msbits. While we can do better than the previous 8+24 split (I'd suggest 4+28 or 3+29), only 2 is tricks, and 1 generation bit isn't enough. So while it helps in the long term, it reduces the security offered by the random part in the short term. (If I know 4 bits of your ISN, I only need to send 256 MB to hit your TCP window.) At the time, I objected, and suggested doing two hashes, with a fixed 32-bit base plus a split rekeyed portion, but that was vetoed on the grounds of performance. On further consideration, the fixed base doesn't help much. (Details below for anyone that cares.) Suppose we let the TCP initial sequence number be: (Hash(, fixed_key) & 0x) + (i << 28) + (Hash( , key[i]) & 0x0fff) + (current_time_in_nanoseconds / 64) It's not hugely difficult to mount an effective attack against a 64-bit fixed_key. 
As an attacker, I can ask the target to send me these numbers for dstPort values I control and other values I know. I can (with high probability) detect the large jumps when the generation changes, so I can make a significant number of queries with the same generation. After 23-ish queries, I have enough information to identify a 64-bit fixed_key.

I don't know the current generation counter "i", but I know it's the same for all my queries, so for any two queries, the maximum difference between the 28-bit hash values is 29 bits. (We can also add a small margin to allow for timing uncertainty, but that's even less.)

So if I guess a fixed key, hash my known plaintexts with that guess, subtract the ciphertexts from the observed sequence numbers, and the difference between the remaining (unknown) 28-bit hash values plus timestamps exceeds what's possible, my guess is wrong.

I can then repeat with additional known plaintexts, reducing the space of admissible keys by about 3 bits each time. Assuming I can rent GPU horsepower from a bitcoin miner to do this in a reasonable period of time, after 22 known plaintext differences, I have uniquely identified the key.

Of course, in practice what I'd do is a first pass with maybe 6 plaintexts on the GPU, and then deal with the candidates found in a second pass. But either way, it's about 2.3 SipHash evaluations per key tested.

As I noted earlier, a bitcoin blockchain block, worth 25 bitcoins, currently costs 2^71 evaluations of SHA-2 (2^70 evaluations of double SHA-2), and that's accomplished every 10 minutes, so this is definitely practical.
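The filtering step of this attack can be sketched in a few lines. Everything here is illustrative: the hash outputs are passed in as plain numbers, the mask widths (a full 32-bit fixed base, a 28-bit rekeyed part, a generation counter in the top 4 bits) are assumptions read off the phrases "fixed 32-bit base", "28-bit hash values" and "i << 28", and the names `isn_compose` and `guess_is_consistent` are invented for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Compose an ISN from a fixed-key hash, a generation counter, a rekeyed
 * hash truncated to 28 bits, and a clock that ticks every 64 ns. */
static uint32_t isn_compose(uint32_t fixed_hash, uint32_t gen_hash,
			    unsigned int generation, uint64_t now_ns)
{
	uint32_t gen = (uint32_t)(generation & 0xf) << 28;
	uint32_t rnd = gen_hash & 0x0fffffffu;

	return fixed_hash + gen + rnd + (uint32_t)(now_ns / 64);
}

/* Attacker's filter: strip the guessed fixed hash and the known time
 * from two ISNs observed in the same generation. What remains differs
 * only in the 28-bit parts, so the difference must lie strictly between
 * -2^28 and 2^28 (mod 2^32); otherwise the guess is wrong. */
static int guess_is_consistent(uint32_t isn_a, uint32_t isn_b,
			       uint32_t guess_a, uint32_t guess_b,
			       uint64_t t_a_ns, uint64_t t_b_ns)
{
	const uint32_t lim = 1u << 28;
	uint32_t ra = isn_a - guess_a - (uint32_t)(t_a_ns / 64);
	uint32_t rb = isn_b - guess_b - (uint32_t)(t_b_ns / 64);
	uint32_t d = ra - rb;

	return d < lim || d > (uint32_t)(0u - lim);
}
```

A correct key guess always passes this check, while a wrong one passes with probability of roughly 2^29 / 2^32, which is where the figure of about 3 bits eliminated per known plaintext comes from.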
[PATCH] net: qcom/emac: add ethtool support
Add support for some ethtool methods: get/set link settings, get/set message level, get statistics, get link status, and restart autonegotiation. Signed-off-by: Timur Tabi--- drivers/net/ethernet/qualcomm/emac/Makefile | 2 +- drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 156 ++ drivers/net/ethernet/qualcomm/emac/emac.c | 51 --- drivers/net/ethernet/qualcomm/emac/emac.h | 3 + 4 files changed, 191 insertions(+), 21 deletions(-) create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c diff --git a/drivers/net/ethernet/qualcomm/emac/Makefile b/drivers/net/ethernet/qualcomm/emac/Makefile index 7a66879..fc57ced 100644 --- a/drivers/net/ethernet/qualcomm/emac/Makefile +++ b/drivers/net/ethernet/qualcomm/emac/Makefile @@ -4,6 +4,6 @@ obj-$(CONFIG_QCOM_EMAC) += qcom-emac.o -qcom-emac-objs := emac.o emac-mac.o emac-phy.o emac-sgmii.o \ +qcom-emac-objs := emac.o emac-mac.o emac-phy.o emac-sgmii.o emac-ethtool.o \ emac-sgmii-fsm9900.o emac-sgmii-qdf2432.o \ emac-sgmii-qdf2400.o diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c new file mode 100644 index 000..6de5152 --- /dev/null +++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c @@ -0,0 +1,156 @@ +/* Copyright (c) 2016, The Linux Foundation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 and + * only version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include +#include + +#include "emac.h" + +static const char * const emac_ethtool_stat_strings[] = { + "rx_ok", + "rx_bcast", + "rx_mcast", + "rx_pause", + "rx_ctrl", + "rx_fcs_err", + "rx_len_err", + "rx_byte_cnt", + "rx_runt", + "rx_frag", + "rx_sz_64", + "rx_sz_65_127", + "rx_sz_128_255", + "rx_sz_256_511", + "rx_sz_512_1023", + "rx_sz_1024_1518", + "rx_sz_1519_max", + "rx_sz_ov", + "rx_rxf_ov", + "rx_align_err", + "rx_bcast_byte_cnt", + "rx_mcast_byte_cnt", + "rx_err_addr", + "rx_crc_align", + "rx_jabbers", + "tx_ok", + "tx_bcast", + "tx_mcast", + "tx_pause", + "tx_exc_defer", + "tx_ctrl", + "tx_defer", + "tx_byte_cnt", + "tx_sz_64", + "tx_sz_65_127", + "tx_sz_128_255", + "tx_sz_256_511", + "tx_sz_512_1023", + "tx_sz_1024_1518", + "tx_sz_1519_max", + "tx_1_col", + "tx_2_col", + "tx_late_col", + "tx_abort_col", + "tx_underrun", + "tx_rd_eop", + "tx_len_err", + "tx_trunc", + "tx_bcast_byte", + "tx_mcast_byte", + "tx_col", +}; + +#define EMAC_STATS_LEN ARRAY_SIZE(emac_ethtool_stat_strings) + +static u32 emac_get_msglevel(struct net_device *netdev) +{ + struct emac_adapter *adpt = netdev_priv(netdev); + + return adpt->msg_enable; +} + +static void emac_set_msglevel(struct net_device *netdev, u32 data) +{ + struct emac_adapter *adpt = netdev_priv(netdev); + + adpt->msg_enable = data; +} + +static int emac_get_sset_count(struct net_device *netdev, int sset) +{ + switch (sset) { + case ETH_SS_STATS: + return EMAC_STATS_LEN; + default: + return -EOPNOTSUPP; + } +} + +static void emac_get_strings(struct net_device *netdev, u32 stringset, u8 *data) +{ + unsigned int i; + + switch (stringset) { + case ETH_SS_STATS: + for (i = 0; i < EMAC_STATS_LEN; i++) { + strlcpy(data, emac_ethtool_stat_strings[i], + ETH_GSTRING_LEN); + data += ETH_GSTRING_LEN; + } + break; + } +} + +static void emac_get_ethtool_stats(struct net_device *netdev, + struct ethtool_stats *stats, + u64 *data) +{ + struct emac_adapter *adpt = netdev_priv(netdev); + + spin_lock(>stats.lock); + + 
emac_update_hw_stats(adpt); + memcpy(data, >stats, EMAC_STATS_LEN * sizeof(u64)); + + spin_unlock(>stats.lock); +} + +static int emac_nway_reset(struct net_device *netdev) +{ + struct phy_device *phydev = netdev->phydev; + + if (!phydev) + return -ENODEV; + + return genphy_restart_aneg(phydev); +} + +static const struct ethtool_ops emac_ethtool_ops = { + .get_link_ksettings = phy_ethtool_get_link_ksettings, + .set_link_ksettings =
[PATCH v5] net: dummy: Introduce dummy virtual functions
The idea for this was born when testing VF support in iproute2 which was
impeded by hardware requirements. In fact, not every VF-capable hardware
driver implements all netdev ops, so testing the interface is still hard
to do even with a well-sorted hardware shelf.

To overcome this and allow for testing the user-kernel interface, this
patch allows turning dummy into a PF with a configurable number of VFs.

Due to the assumption that all PFs are PCI devices, this implementation
is not completely straightforward: In order to allow for
rtnl_fill_ifinfo() to see the dummy VFs, a fake PCI parent device is
attached to the dummy netdev. This has to happen at the right spot so
register_netdevice() does not get confused. This patch abuses the
ndo_fix_features callback for that. In the ndo_uninit callback, the fake
parent is removed again for the same purpose.

Joint work with Sabrina Dubroca.

Signed-off-by: Sabrina Dubroca
Signed-off-by: Phil Sutter
---
Changes since v4:
- Initialize pci_pdev.sriov at runtime - older gcc versions don't allow
  initializing fields of anonymous unions at declaration time.
- Rebased onto current net-next/master.

Changes since v3:
- Changed type of vf_mac field from unsigned char to u8.
- Column-aligned structs' field names.

Changes since v2:
- Fixed oops on reboot (need to initialize parent device mutex).
- Got rid of potential mem leak noticed by Eric Dumazet.
- Dropped stray newline insertion.

Changes since v1:
- Fixed issues reported by kbuild test robot:
  - pci_dev->sriov is only present if CONFIG_PCI_ATS is active.
  - pci_bus_type does not exist if CONFIG_PCI is not defined.
--- drivers/net/dummy.c | 205 +++- 1 file changed, 203 insertions(+), 2 deletions(-) diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c index 6421835f11b7e..7f8d8598bbbfe 100644 --- a/drivers/net/dummy.c +++ b/drivers/net/dummy.c @@ -34,6 +34,8 @@ #include #include #include +#include +#include "../pci/pci.h"/* for struct pci_sriov */ #include #include #include @@ -42,6 +44,34 @@ #define DRV_VERSION"1.0" static int numdummies = 1; +static int num_vfs; + +static struct pci_sriov pdev_sriov; + +static struct pci_dev pci_pdev = { + .is_physfn = 0, +#ifdef CONFIG_PCI + .dev.bus = _bus_type, +#endif +}; + +struct vf_data_storage { + u8 vf_mac[ETH_ALEN]; + u16 pf_vlan; /* When set, guest VLAN config not allowed. */ + u16 pf_qos; + __be16 vlan_proto; + u16 min_tx_rate; + u16 max_tx_rate; + u8 spoofchk_enabled; + boolrss_query_enabled; + u8 trusted; + int link_state; +}; + +struct dummy_priv { + int num_vfs; + struct vf_data_storage *vfinfo; +}; /* fake multicast ability */ static void set_multicast_list(struct net_device *dev) @@ -91,15 +121,31 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev) static int dummy_dev_init(struct net_device *dev) { + struct dummy_priv *priv = netdev_priv(dev); + dev->dstats = netdev_alloc_pcpu_stats(struct pcpu_dstats); if (!dev->dstats) return -ENOMEM; + priv->num_vfs = num_vfs; + priv->vfinfo = NULL; + + if (!num_vfs) + return 0; + + priv->vfinfo = kcalloc(num_vfs, sizeof(struct vf_data_storage), + GFP_KERNEL); + if (!priv->vfinfo) { + free_percpu(dev->dstats); + return -ENOMEM; + } + return 0; } static void dummy_dev_uninit(struct net_device *dev) { + dev->dev.parent = NULL; free_percpu(dev->dstats); } @@ -112,6 +158,137 @@ static int dummy_change_carrier(struct net_device *dev, bool new_carrier) return 0; } +/* fake, just to set fake PCI parent after netdev_register_kobject() */ +static netdev_features_t dummy_fix_features(struct net_device *dev, + netdev_features_t features) +{ + struct dummy_priv *priv = 
netdev_priv(dev); + + if (priv->num_vfs) { +#ifdef CONFIG_PCI_ATS + pci_pdev.sriov = _sriov; +#endif + dev->dev.parent = _pdev.dev; + if (!pci_pdev.is_physfn) { + mutex_init(_pdev.dev.mutex); + pci_pdev.is_physfn = 1; + } + } + + return features; +} + +static int dummy_set_vf_mac(struct net_device *dev, int vf, u8 *mac) +{ + struct dummy_priv *priv = netdev_priv(dev); + + if (!is_valid_ether_addr(mac) || (vf >= priv->num_vfs)) + return -EINVAL; + + memcpy(priv->vfinfo[vf].vf_mac, mac, ETH_ALEN); + + return 0; +} + +static int dummy_set_vf_vlan(struct net_device *dev, int vf, +u16 vlan, u8 qos, __be16 vlan_proto) +{ + struct dummy_priv *priv =
[PATCH 2/2] net: sfc: falcon: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes--- drivers/net/ethernet/sfc/falcon/efx.c |2 +- drivers/net/ethernet/sfc/falcon/ethtool.c | 35 --- drivers/net/ethernet/sfc/falcon/mdio_10g.c | 44 +++- drivers/net/ethernet/sfc/falcon/mdio_10g.h |3 +- drivers/net/ethernet/sfc/falcon/net_driver.h | 12 +++--- drivers/net/ethernet/sfc/falcon/qt202x_phy.c |9 +++-- drivers/net/ethernet/sfc/falcon/tenxpress.c| 22 ++-- drivers/net/ethernet/sfc/falcon/txc43128_phy.c |9 +++-- 8 files changed, 80 insertions(+), 56 deletions(-) diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c index 5c5cb3c..438ef9e 100644 --- a/drivers/net/ethernet/sfc/falcon/efx.c +++ b/drivers/net/ethernet/sfc/falcon/efx.c @@ -986,7 +986,7 @@ void ef4_mac_reconfigure(struct ef4_nic *efx) /* Push loopback/power/transmit disable settings to the PHY, and reconfigure * the MAC appropriately. All other PHY configuration changes are pushed - * through phy_op->set_settings(), and pushed asynchronously to the MAC + * through phy_op->set_link_ksettings(), and pushed asynchronously to the MAC * through ef4_monitor(). * * Callers must hold the mac_lock diff --git a/drivers/net/ethernet/sfc/falcon/ethtool.c b/drivers/net/ethernet/sfc/falcon/ethtool.c index 8e1929b..659ece7 100644 --- a/drivers/net/ethernet/sfc/falcon/ethtool.c +++ b/drivers/net/ethernet/sfc/falcon/ethtool.c @@ -115,44 +115,53 @@ static int ef4_ethtool_phys_id(struct net_device *net_dev, } /* This must be called with rtnl_lock held. 
*/ -static int ef4_ethtool_get_settings(struct net_device *net_dev, - struct ethtool_cmd *ecmd) +static int +ef4_ethtool_get_link_ksettings(struct net_device *net_dev, + struct ethtool_link_ksettings *cmd) { struct ef4_nic *efx = netdev_priv(net_dev); struct ef4_link_state *link_state = >link_state; + u32 supported; + + ethtool_convert_link_mode_to_legacy_u32(, + cmd->link_modes.supported); mutex_lock(>mac_lock); - efx->phy_op->get_settings(efx, ecmd); + efx->phy_op->get_link_ksettings(efx, cmd); mutex_unlock(>mac_lock); /* Both MACs support pause frames (bidirectional and respond-only) */ - ecmd->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause; + supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause; if (LOOPBACK_INTERNAL(efx)) { - ethtool_cmd_speed_set(ecmd, link_state->speed); - ecmd->duplex = link_state->fd ? DUPLEX_FULL : DUPLEX_HALF; + cmd->base.speed = link_state->speed; + cmd->base.duplex = link_state->fd ? DUPLEX_FULL : DUPLEX_HALF; } + ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported, + supported); + return 0; } /* This must be called with rtnl_lock held. 
*/ -static int ef4_ethtool_set_settings(struct net_device *net_dev, - struct ethtool_cmd *ecmd) +static int +ef4_ethtool_set_link_ksettings(struct net_device *net_dev, + const struct ethtool_link_ksettings *cmd) { struct ef4_nic *efx = netdev_priv(net_dev); int rc; /* GMAC does not support 1000Mbps HD */ - if ((ethtool_cmd_speed(ecmd) == SPEED_1000) && - (ecmd->duplex != DUPLEX_FULL)) { + if ((cmd->base.speed == SPEED_1000) && + (cmd->base.duplex != DUPLEX_FULL)) { netif_dbg(efx, drv, efx->net_dev, "rejecting unsupported 1000Mbps HD setting\n"); return -EINVAL; } mutex_lock(>mac_lock); - rc = efx->phy_op->set_settings(efx, ecmd); + rc = efx->phy_op->set_link_ksettings(efx, cmd); mutex_unlock(>mac_lock); return rc; } @@ -1310,8 +1319,6 @@ static int ef4_ethtool_get_module_info(struct net_device *net_dev, } const struct ethtool_ops ef4_ethtool_ops = { - .get_settings = ef4_ethtool_get_settings, - .set_settings = ef4_ethtool_set_settings, .get_drvinfo= ef4_ethtool_get_drvinfo, .get_regs_len = ef4_ethtool_get_regs_len, .get_regs = ef4_ethtool_get_regs, @@ -1340,4 +1347,6 @@ static int ef4_ethtool_get_module_info(struct net_device *net_dev, .set_rxfh = ef4_ethtool_set_rxfh, .get_module_info= ef4_ethtool_get_module_info, .get_module_eeprom = ef4_ethtool_get_module_eeprom, + .get_link_ksettings = ef4_ethtool_get_link_ksettings, + .set_link_ksettings = ef4_ethtool_set_link_ksettings, }; diff --git
[PATCH 1/2] net: mdio: add mdio45_ethtool_ksettings_get
There is a function in mdio for the old ethtool api gset. We add a new function mdio45_ethtool_ksettings_get for the new ethtool api glinksettings. Signed-off-by: Philippe Reynes--- drivers/net/mdio.c | 178 ++ include/linux/mdio.h | 21 ++ 2 files changed, 199 insertions(+), 0 deletions(-) diff --git a/drivers/net/mdio.c b/drivers/net/mdio.c index 3e027ed..077364c 100644 --- a/drivers/net/mdio.c +++ b/drivers/net/mdio.c @@ -342,6 +342,184 @@ void mdio45_ethtool_gset_npage(const struct mdio_if_info *mdio, EXPORT_SYMBOL(mdio45_ethtool_gset_npage); /** + * mdio45_ethtool_ksettings_get_npage - get settings for ETHTOOL_GLINKSETTINGS + * @mdio: MDIO interface + * @cmd: Ethtool request structure + * @npage_adv: Modes currently advertised on next pages + * @npage_lpa: Modes advertised by link partner on next pages + * + * The @cmd parameter is expected to have been cleared before calling + * mdio45_ethtool_ksettings_get_npage(). + * + * Since the CSRs for auto-negotiation using next pages are not fully + * standardised, this function does not attempt to decode them. The + * caller must pass them in. 
+ */ +void mdio45_ethtool_ksettings_get_npage(const struct mdio_if_info *mdio, + struct ethtool_link_ksettings *cmd, + u32 npage_adv, u32 npage_lpa) +{ + int reg; + u32 speed, supported = 0, advertising = 0, lp_advertising = 0; + + BUILD_BUG_ON(MDIO_SUPPORTS_C22 != ETH_MDIO_SUPPORTS_C22); + BUILD_BUG_ON(MDIO_SUPPORTS_C45 != ETH_MDIO_SUPPORTS_C45); + + cmd->base.phy_address = mdio->prtad; + cmd->base.mdio_support = + mdio->mode_support & (MDIO_SUPPORTS_C45 | MDIO_SUPPORTS_C22); + + reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD, + MDIO_CTRL2); + switch (reg & MDIO_PMA_CTRL2_TYPE) { + case MDIO_PMA_CTRL2_10GBT: + case MDIO_PMA_CTRL2_1000BT: + case MDIO_PMA_CTRL2_100BTX: + case MDIO_PMA_CTRL2_10BT: + cmd->base.port = PORT_TP; + supported = SUPPORTED_TP; + reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD, + MDIO_SPEED); + if (reg & MDIO_SPEED_10G) + supported |= SUPPORTED_1baseT_Full; + if (reg & MDIO_PMA_SPEED_1000) + supported |= (SUPPORTED_1000baseT_Full | + SUPPORTED_1000baseT_Half); + if (reg & MDIO_PMA_SPEED_100) + supported |= (SUPPORTED_100baseT_Full | + SUPPORTED_100baseT_Half); + if (reg & MDIO_PMA_SPEED_10) + supported |= (SUPPORTED_10baseT_Full | + SUPPORTED_10baseT_Half); + advertising = ADVERTISED_TP; + break; + + case MDIO_PMA_CTRL2_10GBCX4: + cmd->base.port = PORT_OTHER; + supported = 0; + advertising = 0; + break; + + case MDIO_PMA_CTRL2_10GBKX4: + case MDIO_PMA_CTRL2_10GBKR: + case MDIO_PMA_CTRL2_1000BKX: + cmd->base.port = PORT_OTHER; + supported = SUPPORTED_Backplane; + reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD, + MDIO_PMA_EXTABLE); + if (reg & MDIO_PMA_EXTABLE_10GBKX4) + supported |= SUPPORTED_1baseKX4_Full; + if (reg & MDIO_PMA_EXTABLE_10GBKR) + supported |= SUPPORTED_1baseKR_Full; + if (reg & MDIO_PMA_EXTABLE_1000BKX) + supported |= SUPPORTED_1000baseKX_Full; + reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD, + MDIO_PMA_10GBR_FECABLE); + if (reg & MDIO_PMA_10GBR_FECABLE_ABLE) + 
supported |= SUPPORTED_1baseR_FEC; + advertising = ADVERTISED_Backplane; + break; + + /* All the other defined modes are flavours of optical */ + default: + cmd->base.port = PORT_FIBRE; + supported = SUPPORTED_FIBRE; + advertising = ADVERTISED_FIBRE; + break; + } + + if (mdio->mmds & MDIO_DEVS_AN) { + supported |= SUPPORTED_Autoneg; + reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_AN, + MDIO_CTRL1); + if (reg & MDIO_AN_CTRL1_ENABLE) { + cmd->base.autoneg = AUTONEG_ENABLE; + advertising |= + ADVERTISED_Autoneg | + mdio45_get_an(mdio, MDIO_AN_ADVERTISE) | +
[PATCH net-next 06/10] net: netcp: ethss: get phy-handle only if link interface is MAC-to-PHY
Currently, when parsing phy-handle, the driver doesn't check whether the
interface is MAC-to-PHY. This patch adds this check for all MAC-to-PHY
interface types supported by the driver.

Signed-off-by: Murali Karicheri
Signed-off-by: Sekhar Nori
---
 drivers/net/ethernet/ti/netcp_ethss.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index cb48f88..9266961 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -2956,7 +2956,9 @@ static int init_slave(struct gbe_priv *gbe_dev, struct gbe_slave *slave,
 	}

 	slave->open = false;
-	slave->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if ((slave->link_interface == SGMII_LINK_MAC_PHY) ||
+	    (slave->link_interface == XGMII_LINK_MAC_PHY))
+		slave->phy_node = of_parse_phandle(node, "phy-handle", 0);
 	slave->port_num = gbe_get_slave_port(gbe_dev, slave->slave_num);

 	if (slave->link_interface >= XGMII_LINK_MAC_PHY)
--
1.9.1
[PATCH net-next 00/10] netcp: enhancements and minor fixes
This series is for net-next. It propagates enhancements and minor bug
fixes from the internal version of the driver to keep upstream in sync.
Please review and apply if this looks good.

Tested on all of K2HK/E/L boards.

Thanks

Murali Karicheri

Michael Scherban (1):
  net: netcp: store network statistics in 64 bits

Murali Karicheri (7):
  net: netcp: extract eflag from desc for rx_hook handling
  net: netcp: remove the redundant memmov()
  net: netcp: ethss: get phy-handle only if link interface is MAC-to-PHY
  net: netcp: use hw capability to remove FCS word from rx packets
  net: netcp: ale: update to support unknown vlan controls for NU switch
  net: netcp: ale: use ale_status to size the ale table
  net: netcp: ale: add proper ale entry mask bits for netcp switch ALE

WingMan Kwok (2):
  net: netcp: ethss: add support of subsystem register region regmap
  net: netcp: ethss: add support of 10gbe pcsr link status

 .../devicetree/bindings/net/keystone-netcp.txt |  19 +-
 drivers/net/ethernet/ti/cpsw_ale.c             | 180 ---
 drivers/net/ethernet/ti/cpsw_ale.h             |  17 +-
 drivers/net/ethernet/ti/netcp.h                |  21 +++
 drivers/net/ethernet/ti/netcp_core.c           | 102 ---
 drivers/net/ethernet/ti/netcp_ethss.c          | 200 +
 include/linux/soc/ti/knav_dma.h                |   2 +
 7 files changed, 456 insertions(+), 85 deletions(-)

--
1.9.1
[PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status
From: WingMan KwokThe 10GBASE-R Physical Coding Sublayer (PCS-R) module provides functionality of a physical coding sublayer (PCS) on data being transferred between a demuxed XGMII and SerDes supporting a 16 or 32 bit interface. From the driver point of view, whether a ethernet link is up or not depends also on the status of the block-lock bit of the PCSR. This patch adds the checking of that bit in order to determine the link status. Signed-off-by: WingMan Kwok Signed-off-by: Murali Karicheri Signed-off-by: Sekhar Nori --- .../devicetree/bindings/net/keystone-netcp.txt | 3 ++ drivers/net/ethernet/ti/netcp_ethss.c | 37 -- 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt index 0854a73..57fc13f 100644 --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt @@ -75,6 +75,9 @@ Required properties: - syscon-subsys: phandle to syscon node of the switch subsystem registers. +- syscon-pcsr: (10gbe only) phandle to syscon node of the + switch PCSR registers. + - reg: register location and the size for the following register regions in the specified order. 
- switch subsystem registers diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index 473edda1..cb48f88 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -63,6 +63,12 @@ #define GBE13_ALE_OFFSET 0x600 #define GBE13_HOST_PORT_NUM0 #define GBE13_NUM_ALE_ENTRIES 1024 +/* offset relative to PCSR regmap */ +#define XGBE10_PCSR_OFFSET(x) ((x) * 0x80) +#define XGBE10_PCSR_RX_STATUS(x) (XGBE10_PCSR_OFFSET(x) + 0x0C) + +#define XGBE10_PCSR_BLOCK_LOCK_MASKBIT(30) +#define XGBE10_PCSR_BLOCK_LOCK_SHIFT 30 /* 1G Ethernet NU SS defines */ #define GBENU_MODULE_NAME "netcp-gbenu" @@ -2111,6 +2117,10 @@ static void netcp_ethss_link_state_action(struct gbe_priv *gbe_dev, if (phy) phy_print_status(phy); + else if (slave->link_interface == XGMII_LINK_MAC_MAC_FORCED) { + netdev_printk(KERN_INFO, ndev, + "Link is %s\n", (up ? "Up" : "Down")); + } } static bool gbe_phy_link_status(struct gbe_slave *slave) @@ -2123,18 +2133,29 @@ static void netcp_ethss_update_link_state(struct gbe_priv *gbe_dev, struct net_device *ndev) { int sp = slave->slave_num; - int phy_link_state, sgmii_link_state = 1, link_state; + int phy_link_state, sw_link_state = 1, link_state, ret; + u32 pcsr_rx_stat; if (!slave->open) return; if (!SLAVE_LINK_IS_XGMII(slave)) { - sgmii_link_state = + sw_link_state = netcp_sgmii_get_port_link(SGMII_BASE(gbe_dev, sp), sp); + } else if (slave->link_interface == XGMII_LINK_MAC_MAC_FORCED) { + /* read status from pcsr status reg */ + ret = regmap_read(gbe_dev->pcsr_regmap, + XGBE10_PCSR_RX_STATUS(sp), _rx_stat); + + if (ret) + return; + + sw_link_state = (pcsr_rx_stat & XGBE10_PCSR_BLOCK_LOCK_MASK) >> +XGBE10_PCSR_BLOCK_LOCK_SHIFT; } phy_link_state = gbe_phy_link_status(slave); - link_state = phy_link_state & sgmii_link_state; + link_state = phy_link_state & sw_link_state; if (atomic_xchg(>link_state, link_state) != link_state) netcp_ethss_link_state_action(gbe_dev, ndev, slave, @@ -3154,6 
+3175,16 @@ static int set_xgbe_ethss10_priv(struct gbe_priv *gbe_dev, return PTR_ERR(gbe_dev->ss_regmap); } + gbe_dev->pcsr_regmap = syscon_regmap_lookup_by_phandle(node, + "syscon-pcsr"); + + if (IS_ERR(gbe_dev->pcsr_regmap)) { + dev_err(gbe_dev->dev, + "pcsr regmap lookup failed: %ld\n", + PTR_ERR(gbe_dev->pcsr_regmap)); + return PTR_ERR(gbe_dev->pcsr_regmap); + } + ret = of_address_to_resource(node, XGBE_SM_REG_INDEX, ); if (ret) { dev_err(gbe_dev->dev, -- 1.9.1
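The link-state logic in this patch boils down to a bit test and an AND. Below is a small userspace sketch of just that part, with no regmap involved; the helper names and the shortened macro names are mine, not the driver's, and the register value is simply passed in as a plain integer. The offsets and bit position are copied from the defines in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Per-port PCSR register offsets, mirroring the macros in the patch. */
#define PCSR_OFFSET(port)	((port) * 0x80)
#define PCSR_RX_STATUS(port)	(PCSR_OFFSET(port) + 0x0C)

#define PCSR_BLOCK_LOCK_SHIFT	30
#define PCSR_BLOCK_LOCK_MASK	(UINT32_C(1) << PCSR_BLOCK_LOCK_SHIFT)

/* Returns 1 if the block-lock bit is set in an RX status value. */
static int pcsr_block_lock(uint32_t rx_status)
{
	return (int)((rx_status & PCSR_BLOCK_LOCK_MASK) >>
		     PCSR_BLOCK_LOCK_SHIFT);
}

/* The link is reported up only if both the PHY and the PCSR agree. */
static int combined_link_state(int phy_link_state, uint32_t rx_status)
{
	assert(phy_link_state == 0 || phy_link_state == 1);
	return phy_link_state & pcsr_block_lock(rx_status);
}
```

For example, port 1 would read its status at offset 0x8C, and a status value with only bit 30 set counts as locked.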
[PATCH net-next 05/10] net: netcp: store network statistics in 64 bits
From: Michael ScherbanPreviously the network statistics were stored in 32 bit variable which can cause some stats to roll over after several minutes of high traffic. This implements 64 bit storage so larger numbers can be stored. Signed-off-by: Michael Scherban Signed-off-by: Murali Karicheri Signed-off-by: Sekhar Nori --- drivers/net/ethernet/ti/netcp.h | 18 ++ drivers/net/ethernet/ti/netcp_core.c | 68 +--- 2 files changed, 74 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h index a92abd6..d243c5d 100644 --- a/drivers/net/ethernet/ti/netcp.h +++ b/drivers/net/ethernet/ti/netcp.h @@ -23,6 +23,7 @@ #include #include +#include /* Maximum Ethernet frame size supported by Keystone switch */ #define NETCP_MAX_FRAME_SIZE 9504 @@ -68,6 +69,20 @@ struct netcp_addr { struct list_headnode; }; +struct netcp_stats { + struct u64_stats_sync syncp_rx cacheline_aligned_in_smp; + u64 rx_packets; + u64 rx_bytes; + u32 rx_errors; + u32 rx_dropped; + + struct u64_stats_sync syncp_tx cacheline_aligned_in_smp; + u64 tx_packets; + u64 tx_bytes; + u32 tx_errors; + u32 tx_dropped; +}; + struct netcp_intf { struct device *dev; struct device *ndev_dev; @@ -88,6 +103,9 @@ struct netcp_intf { struct napi_struct rx_napi; struct napi_struct tx_napi; + /* 64-bit netcp stats */ + struct netcp_stats stats; + void*rx_channel; const char *dma_chan_name; u32 rx_pool_size; diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c index 286fd8d..b077ed4 100644 --- a/drivers/net/ethernet/ti/netcp_core.c +++ b/drivers/net/ethernet/ti/netcp_core.c @@ -629,6 +629,7 @@ static void netcp_free_rx_desc_chain(struct netcp_intf *netcp, static void netcp_empty_rx_queue(struct netcp_intf *netcp) { + struct netcp_stats *rx_stats = >stats; struct knav_dma_desc *desc; unsigned int dma_sz; dma_addr_t dma; @@ -642,16 +643,17 @@ static void netcp_empty_rx_queue(struct netcp_intf *netcp) if (unlikely(!desc)) { 
dev_err(netcp->ndev_dev, "%s: failed to unmap Rx desc\n", __func__); - netcp->ndev->stats.rx_errors++; + rx_stats->rx_errors++; continue; } netcp_free_rx_desc_chain(netcp, desc); - netcp->ndev->stats.rx_dropped++; + rx_stats->rx_dropped++; } } static int netcp_process_one_rx_packet(struct netcp_intf *netcp) { + struct netcp_stats *rx_stats = >stats; unsigned int dma_sz, buf_len, org_buf_len; struct knav_dma_desc *desc, *ndesc; unsigned int pkt_sz = 0, accum_sz; @@ -757,8 +759,8 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp) if (unlikely(ret)) { dev_err(netcp->ndev_dev, "RX hook %d failed: %d\n", rx_hook->order, ret); - netcp->ndev->stats.rx_errors++; /* Free the primary descriptor */ + rx_stats->rx_dropped++; knav_pool_desc_put(netcp->rx_pool, desc); dev_kfree_skb(skb); return 0; @@ -767,8 +769,10 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp) /* Free the primary descriptor */ knav_pool_desc_put(netcp->rx_pool, desc); - netcp->ndev->stats.rx_packets++; - netcp->ndev->stats.rx_bytes += skb->len; + u64_stats_update_begin(_stats->syncp_rx); + rx_stats->rx_packets++; + rx_stats->rx_bytes += skb->len; + u64_stats_update_end(_stats->syncp_rx); /* push skb up the stack */ skb->protocol = eth_type_trans(skb, netcp->ndev); @@ -777,7 +781,7 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp) free_desc: netcp_free_rx_desc_chain(netcp, desc); - netcp->ndev->stats.rx_errors++; + rx_stats->rx_errors++; return 0; } @@ -1008,6 +1012,7 @@ static void netcp_free_tx_desc_chain(struct netcp_intf *netcp, static int netcp_process_tx_compl_packets(struct netcp_intf *netcp, unsigned int budget) { + struct netcp_stats *tx_stats = >stats; struct knav_dma_desc *desc; struct netcp_tx_cb *tx_cb; struct sk_buff *skb; @@
[PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap
From: WingMan Kwok10gbe phy driver needs to access the 10gbe subsystem control register during phy initialization. To facilitate the shared access of the subsystem register region between the 10gbe Ethernet driver and the phy driver, this patch adds support of the subsystem register region defined by a syscon node in the dts. Although there is no shared access to the gbe subsystem register region, using syscon for that is for the sake of consistency. This change is backward compatible with previously released gbe devicetree bindings. Signed-off-by: WingMan Kwok Signed-off-by: Murali Karicheri Signed-off-by: Sekhar Nori --- .../devicetree/bindings/net/keystone-netcp.txt | 16 ++- drivers/net/ethernet/ti/netcp_ethss.c | 140 + 2 files changed, 127 insertions(+), 29 deletions(-) diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt index 04ba1dc..0854a73 100644 --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt @@ -72,20 +72,24 @@ Required properties: "ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2) "ti,netcp-xgbe" for 10 GbE +- syscon-subsys: phandle to syscon node of the switch + subsystem registers. + - reg: register location and the size for the following register regions in the specified order. 
- switch subsystem registers + - sgmii module registers - sgmii port3/4 module registers (only for NetCP 1.4) - switch module registers - serdes registers (only for 10G) NetCP 1.4 ethss, here is the order - index #0 - switch subsystem registers + index #0 - sgmii module registers index #1 - sgmii port3/4 module registers index #2 - switch module registers NetCP 1.5 ethss 9 port, 5 port and 2 port - index #0 - switch subsystem registers + index #0 - sgmii module registers index #1 - switch module registers index #2 - serdes registers @@ -145,6 +149,11 @@ Optional properties: Example binding: +gbe_subsys: subsys@209 { + compatible = "syscon"; + reg = <0x0209 0x100>; +}; + netcp: netcp@200 { reg = <0x2620110 0x8>; reg-names = "efuse"; @@ -163,7 +172,8 @@ netcp: netcp@200 { ranges; gbe@9 { label = "netcp-gbe"; - reg = <0x9 0x300>, <0x90400 0x400>, <0x90800 0x700>; + syscon-subsys = <_subsys>; + reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>; /* enable-ale; */ tx-queue = <648>; tx-channel = <8>; diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index c7e547e..473edda1 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -19,9 +19,11 @@ */ #include +#include #include #include #include +#include #include #include #include @@ -43,7 +45,10 @@ #define GBE_MODULE_NAME"netcp-gbe" #define GBE_SS_VERSION_14 0x4ed21104 +/* for devicetree backward compatible only */ #define GBE_SS_REG_INDEX 0 + +#define GBE_SGMII_REG_INDEX0 #define GBE_SGMII34_REG_INDEX 1 #define GBE_SM_REG_INDEX 2 /* offset relative to base of GBE_SS_REG_INDEX */ @@ -71,9 +76,11 @@ #define IS_SS_ID_NU(d) \ (GBE_IDENT((d)->ss_version) == GBE_SS_ID_NU) -#define GBENU_SS_REG_INDEX 0 +#define GBENU_SGMII_REG_INDEX 0 #define GBENU_SM_REG_INDEX 1 +/* offset relative to base of GBE_SS_REG_INDEX */ #define GBENU_SGMII_MODULE_OFFSET 0x100 +/* offset relative to base of GBENU_SM_REG_INDEX */ #define GBENU_HOST_PORT_OFFSET 0x1000 
#define GBENU_SLAVE_PORT_OFFSET0x2000 #define GBENU_EMAC_OFFSET 0x2330 @@ -82,13 +89,12 @@ #define GBENU_ALE_OFFSET 0x1e000 #define GBENU_HOST_PORT_NUM0 #define GBENU_NUM_ALE_ENTRIES 1024 -#define GBENU_SGMII_MODULE_SIZE0x100 /* 10G Ethernet SS defines */ #define XGBE_MODULE_NAME "netcp-xgbe" #define XGBE_SS_VERSION_10 0x4ee42100 -#define XGBE_SS_REG_INDEX 0 +#define XGBE_SGMII_REG_INDEX 0 #define XGBE_SM_REG_INDEX 1 #define XGBE_SERDES_REG_INDEX 2 @@ -173,6 +179,7 @@ #define XGBE_SET_REG_OFS(p, rb, rn) p->rb##_ofs.rn = \
[PATCH net-next 07/10] net: netcp: use hw capability to remove FCS word from rx packets
Some of the newer Ethernet switch hw (such as that on k2e/l/g) can
strip the Ethernet FCS from the packet at the port 0 egress of the
switch. So use this capability instead of doing it in software.

Signed-off-by: Murali Karicheri
Signed-off-by: Sekhar Nori
---
 drivers/net/ethernet/ti/netcp.h       |  2 ++
 drivers/net/ethernet/ti/netcp_core.c  |  8 ++--
 drivers/net/ethernet/ti/netcp_ethss.c | 10 --
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index d243c5d..8900a6f 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -102,6 +102,8 @@ struct netcp_intf {
 	void			*rx_fdq[KNAV_DMA_FDQ_PER_CHAN];
 	struct napi_struct	rx_napi;
 	struct napi_struct	tx_napi;
+#define ETH_SW_CAN_REMOVE_ETH_FCS	BIT(0)
+	u32			hw_cap;
 
 	/* 64-bit netcp stats */
 	struct netcp_stats	stats;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index b077ed4..68a75cc 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -739,8 +739,12 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 		dev_dbg(netcp->ndev_dev,
 			"mismatch in packet size(%d) & sum of fragments(%d)\n",
 			pkt_sz, accum_sz);
 
-	/* Remove ethernet FCS from the packet */
-	__pskb_trim(skb, skb->len - ETH_FCS_LEN);
+	/* Newer version of the Ethernet switch can trim the Ethernet FCS
+	 * from the packet and is indicated in hw_cap. So trim it only for
+	 * older h/w
+	 */
+	if (!(netcp->hw_cap & ETH_SW_CAN_REMOVE_ETH_FCS))
+		__pskb_trim(skb, skb->len - ETH_FCS_LEN);
 	/* Call each of the RX hooks */
 	p_info.skb = skb;
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 9266961..4b2a911 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -133,6 +133,7 @@
 #define MACSL_FULLDUPLEX			BIT(0)
 
 #define GBE_CTL_P0_ENABLE			BIT(2)
+#define ETH_SW_CTL_P0_TX_CRC_REMOVE		BIT(13)
 #define GBE13_REG_VAL_STAT_ENABLE_ALL		0xff
 #define XGBE_REG_VAL_STAT_ENABLE_ALL		0xf
 #define GBE_STATS_CD_SEL			BIT(28)
@@ -2847,7 +2848,7 @@ static int gbe_open(void *intf_priv, struct net_device *ndev)
 	struct netcp_intf *netcp = netdev_priv(ndev);
 	struct gbe_slave *slave = gbe_intf->slave;
 	int port_num = slave->port_num;
-	u32 reg;
+	u32 reg, val;
 	int ret;
 
 	reg = readl(GBE_REG_ADDR(gbe_dev, switch_regs, id_ver));
@@ -2877,7 +2878,12 @@ static int gbe_open(void *intf_priv, struct net_device *ndev)
 	writel(0, GBE_REG_ADDR(gbe_dev, switch_regs, ptype));
 
 	/* Control register */
-	writel(GBE_CTL_P0_ENABLE, GBE_REG_ADDR(gbe_dev, switch_regs, control));
+	val = GBE_CTL_P0_ENABLE;
+	if (IS_SS_ID_MU(gbe_dev)) {
+		val |= ETH_SW_CTL_P0_TX_CRC_REMOVE;
+		netcp->hw_cap = ETH_SW_CAN_REMOVE_ETH_FCS;
+	}
+	writel(val, GBE_REG_ADDR(gbe_dev, switch_regs, control));
 
 	/* All statistics enabled and STAT AB visible by default */
 	writel(gbe_dev->stats_en_mask, GBE_REG_ADDR(gbe_dev, switch_regs,
--
1.9.1
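For readers following the rx path change, here is a minimal userspace sketch of the capability gating, not the driver code itself. The struct fake_intf and the helper rx_len_after_fcs() are hypothetical names made up for illustration; only ETH_SW_CAN_REMOVE_ETH_FCS and the 4-byte FCS length come from the patch and the Ethernet frame format. The short-frame guard is an addition of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_FCS_LEN                4          /* Ethernet frame check sequence, in bytes */
#define ETH_SW_CAN_REMOVE_ETH_FCS  (1u << 0)  /* same bit as BIT(0) in the patch */

/* Hypothetical stand-in for the per-interface driver state. */
struct fake_intf {
	uint32_t hw_cap;
};

/* Length handed to the stack for a received frame of rx_len bytes. */
static size_t rx_len_after_fcs(const struct fake_intf *intf, size_t rx_len)
{
	/* Newer switch: the FCS was already stripped at port 0 egress. */
	if (intf->hw_cap & ETH_SW_CAN_REMOVE_ETH_FCS)
		return rx_len;

	/* Older h/w: trim in software (guard against short frames,
	 * which is an assumption of this sketch, not of the patch).
	 */
	return rx_len >= ETH_FCS_LEN ? rx_len - ETH_FCS_LEN : 0;
}
```

The point of keeping the flag in per-interface state is that the rx path only tests one bit per packet, while the decision is made once in the open path.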
[PATCH net-next 03/10] net: netcp: extract eflag from desc for rx_hook handling
Extract the eflag bits from the received desc and pass it down the rx_hook chain to be available for netcp modules. Also the psdata and epib data has to be inspected by the netcp modules. So the desc can be freed only after returning from the rx_hook. So move knav_pool_desc_put() after the rx_hook processing. Signed-off-by: Murali Karicheri--- drivers/net/ethernet/ti/netcp.h | 1 + drivers/net/ethernet/ti/netcp_core.c | 20 +--- include/linux/soc/ti/knav_dma.h | 2 ++ 3 files changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h index 0f58c58..a92abd6 100644 --- a/drivers/net/ethernet/ti/netcp.h +++ b/drivers/net/ethernet/ti/netcp.h @@ -115,6 +115,7 @@ struct netcp_packet { struct sk_buff *skb; __le32 *epib; u32 *psdata; + u32 eflags; unsigned intpsdata_len; struct netcp_intf *netcp; struct netcp_tx_pipe*tx_pipe; diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c index c243335..a136c56 100644 --- a/drivers/net/ethernet/ti/netcp_core.c +++ b/drivers/net/ethernet/ti/netcp_core.c @@ -122,6 +122,13 @@ static void get_pkt_info(dma_addr_t *buff, u32 *buff_len, dma_addr_t *ndesc, *ndesc = le32_to_cpu(desc->next_desc); } +static void get_desc_info(u32 *desc_info, u32 *pkt_info, + struct knav_dma_desc *desc) +{ + *desc_info = le32_to_cpu(desc->desc_info); + *pkt_info = le32_to_cpu(desc->packet_info); +} + static u32 get_sw_data(int index, struct knav_dma_desc *desc) { /* No Endian conversion needed as this data is untouched by hw */ @@ -653,6 +660,7 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp) struct netcp_packet p_info; struct sk_buff *skb; void *org_buf_ptr; + u32 tmp; dma_desc = knav_queue_pop(netcp->rx_queue, _sz); if (!dma_desc) @@ -724,9 +732,6 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp) knav_pool_desc_put(netcp->rx_pool, ndesc); } - /* Free the primary descriptor */ - knav_pool_desc_put(netcp->rx_pool, desc); - /* check 
for packet len and warn */ if (unlikely(pkt_sz != accum_sz)) dev_dbg(netcp->ndev_dev, "mismatch in packet size(%d) & sum of fragments(%d)\n", @@ -739,6 +744,11 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp) p_info.skb = skb; skb->dev = netcp->ndev; p_info.rxtstamp_complete = false; + get_desc_info(, _info.eflags, desc); + p_info.epib = desc->epib; + p_info.psdata = (u32 __force *)desc->psdata; + p_info.eflags = ((p_info.eflags >> KNAV_DMA_DESC_EFLAGS_SHIFT) & +KNAV_DMA_DESC_EFLAGS_MASK); list_for_each_entry(rx_hook, >rxhook_list_head, list) { int ret; @@ -748,10 +758,14 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp) dev_err(netcp->ndev_dev, "RX hook %d failed: %d\n", rx_hook->order, ret); netcp->ndev->stats.rx_errors++; + /* Free the primary descriptor */ + knav_pool_desc_put(netcp->rx_pool, desc); dev_kfree_skb(skb); return 0; } } + /* Free the primary descriptor */ + knav_pool_desc_put(netcp->rx_pool, desc); netcp->ndev->stats.rx_packets++; netcp->ndev->stats.rx_bytes += skb->len; diff --git a/include/linux/soc/ti/knav_dma.h b/include/linux/soc/ti/knav_dma.h index 35cb926..2b78826 100644 --- a/include/linux/soc/ti/knav_dma.h +++ b/include/linux/soc/ti/knav_dma.h @@ -41,6 +41,8 @@ #define KNAV_DMA_DESC_RETQ_SHIFT 0 #define KNAV_DMA_DESC_RETQ_MASKMASK(14) #define KNAV_DMA_DESC_BUF_LEN_MASK MASK(22) +#define KNAV_DMA_DESC_EFLAGS_MASK MASK(4) +#define KNAV_DMA_DESC_EFLAGS_SHIFT 20 #define KNAV_DMA_NUM_EPIB_WORDS4 #define KNAV_DMA_NUM_PS_WORDS 16 -- 1.9.1
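The eflags extraction above is a plain shift-and-mask on the descriptor's packet_info word. A small standalone sketch, assuming MASK(n) in knav_dma.h expands to a mask of the n low bits; the names FIELD_MASK and desc_eflags() are made up here and are not part of the driver:

```c
#include <stdint.h>

/* Assumed to mirror MASK() from knav_dma.h: n low bits set. */
#define FIELD_MASK(bits)    ((1u << (bits)) - 1u)

#define DESC_EFLAGS_SHIFT   20             /* KNAV_DMA_DESC_EFLAGS_SHIFT in the patch */
#define DESC_EFLAGS_MASK    FIELD_MASK(4)  /* KNAV_DMA_DESC_EFLAGS_MASK in the patch */

/* Pull the 4-bit error flag field out of an already CPU-endian word. */
static uint32_t desc_eflags(uint32_t packet_info)
{
	return (packet_info >> DESC_EFLAGS_SHIFT) & DESC_EFLAGS_MASK;
}
```

Note that the real code converts the descriptor word with le32_to_cpu() first; the sketch takes a word that is already in CPU byte order.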
Re: ipv6: handle -EFAULT from skb_copy_bits
On Tue, Dec 20, 2016 at 11:31:38AM -0800, Cong Wang wrote:
 > On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones wrote:
 > > On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
 > >  > From: Dave Jones
 > >  > Date: Mon, 19 Dec 2016 19:40:13 -0500
 > >  >
 > >  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
 > >  > >
 > >  > >  > Unfortunately, this made no difference.  I spent some time today trying
 > >  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > >  > >  >
 > >  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
 > >  > >  > I can trigger it with Trinity.
 > >  > >
 > >  > > scratch that last part, I finally just repro'd it with a single process.
 > >  >
 > >  > Thanks for the info, I'll try to think about this some more.
 > >
 > > I threw in some debug printks right before that BUG_ON.
 > > it's always this:
 > >
 > > skb->len=31 skb->data_len=0 offset:30 total_len:9
 >
 > Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here,
 > off-by-one?

Ok, I finally made a messy, albeit good enough reproducer.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}
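To make the quoted arithmetic ("30 > 31 - 2") concrete: a copy of len bytes starting at offset only fits in a linear buffer when offset <= buf_len - len. The sketch below is a standalone illustration of that bounds test with a made-up helper name, range_fits(); it assumes the 2 in the quoted comparison is the size of the 16-bit checksum being read.

```c
#include <stdbool.h>

/* True when len bytes starting at offset lie inside buf_len bytes. */
static bool range_fits(int buf_len, int offset, int len)
{
	return offset >= 0 && len >= 0 && offset <= buf_len - len;
}
```

With the values from the debug printk, range_fits(31, 30, 2) is false, because the last valid start for a 2-byte read in a 31-byte packet is offset 29.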
[PATCH net-next 04/10] net: netcp: remove the redundant memmov()
The psdata is populated with command data by netcp modules to the tail
of the buffer, and set_words() copies the same to the front of the
psdata. So remove the redundant memmove() function call.

Signed-off-by: Murali Karicheri
---
 drivers/net/ethernet/ti/netcp_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index a136c56..286fd8d 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1226,9 +1226,9 @@ static int netcp_tx_submit_skb(struct netcp_intf *netcp,
 		/* psdata points to both native-endian and device-endian data */
 		__le32 *psdata = (void __force *)p_info.psdata;
 
-		memmove(p_info.psdata, p_info.psdata + p_info.psdata_len,
-			p_info.psdata_len);
-		set_words(p_info.psdata, p_info.psdata_len, psdata);
+		set_words((u32 *)psdata +
+			  (KNAV_DMA_NUM_PS_WORDS - p_info.psdata_len),
+			  p_info.psdata_len, psdata);
 		tmp |= (p_info.psdata_len & KNAV_DMA_DESC_PSLEN_MASK) <<
 			KNAV_DMA_DESC_PSLEN_SHIFT;
 	}
--
1.9.1
[PATCH net-next 09/10] net: netcp: ale: use ale_status to size the ale table
ALE h/w on newer version of NetCP (K2E/L/G) does provide a ALE_STATUS register for the size of the ALE Table implemented in h/w. Currently for example we set ALE Table size to 1024 for NetCP ALE on K2E even though the ALE Status/Documentation shows it has 8192 entries. So take advantage of this register to read the size of ALE table supported and use that value in the driver for the newer version of NetCP ALE. For NetCP lite, ALE Table size is much less (64) and indicated by a size of zero in ALE_STATUS. So use that as a default for now. While at it, also fix the ale table size on 10G switch to 2048 per User guide http://www.ti.com/lit/ug/spruhj5/spruhj5.pdf Signed-off-by: Murali KaricheriSigned-off-by: Sekhar Nori --- drivers/net/ethernet/ti/cpsw_ale.c| 31 ++- drivers/net/ethernet/ti/netcp_ethss.c | 4 +--- 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c index e15db39..62a18d6 100644 --- a/drivers/net/ethernet/ti/cpsw_ale.c +++ b/drivers/net/ethernet/ti/cpsw_ale.c @@ -33,6 +33,7 @@ /* ALE Registers */ #define ALE_IDVER 0x00 +#define ALE_STATUS 0x04 #define ALE_CONTROL0x08 #define ALE_PRESCALE 0x10 #define ALE_UNKNOWNVLAN0x18 @@ -58,6 +59,10 @@ #define ALE_UCAST_OUI 2 #define ALE_UCAST_TOUCHED 3 +#define ALE_TABLE_SIZE_MULTIPLIER 1024 +#define ALE_STATUS_SIZE_MASK 0x1f +#define ALE_TABLE_SIZE_DEFAULT 64 + static inline int cpsw_ale_get_field(u32 *ale_entry, u32 start, u32 bits) { int idx; @@ -728,7 +733,7 @@ static void cpsw_ale_timer(unsigned long arg) void cpsw_ale_start(struct cpsw_ale *ale) { - u32 rev; + u32 rev, ale_entries; rev = __raw_readl(ale->params.ale_regs + ALE_IDVER); if (!ale->params.major_ver_mask) @@ -740,6 +745,30 @@ void cpsw_ale_start(struct cpsw_ale *ale) ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask), ALE_VERSION_MINOR(rev)); + if (!ale->params.ale_entries) { + ale_entries = + __raw_readl(ale->params.ale_regs + ALE_STATUS) & + ALE_STATUS_SIZE_MASK; + /* 
ALE available on newer NetCP switches has introduced +* a register, ALE_STATUS, to indicate the size of ALE +* table which shows the size as a multiple of 1024 entries. +* For these, params.ale_entries will be set to zero. So +* read the register and update the value of ale_entries. +* ALE table on NetCP lite, is much smaller and is indicated +* by a value of zero in ALE_STATUS. So use a default value +* of ALE_TABLE_SIZE_DEFAULT for this. Caller is expected +* to set the value of ale_entries for all other versions +* of ALE. +*/ + if (!ale_entries) + ale_entries = ALE_TABLE_SIZE_DEFAULT; + else + ale_entries *= ALE_TABLE_SIZE_MULTIPLIER; + ale->params.ale_entries = ale_entries; + } + dev_info(ale->params.dev, +"ALE Table size %ld\n", ale->params.ale_entries); + if (ale->params.nu_switch_ale) { /* Separate registers for unknown vlan configuration. * Also there are N bits, where N is number of ale diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index b37fb73..80d68cb 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -94,7 +94,6 @@ #define GBENU_CPTS_OFFSET 0x1d000 #define GBENU_ALE_OFFSET 0x1e000 #define GBENU_HOST_PORT_NUM0 -#define GBENU_NUM_ALE_ENTRIES 1024 /* 10G Ethernet SS defines */ #define XGBE_MODULE_NAME "netcp-xgbe" @@ -114,7 +113,7 @@ #define XGBE10_ALE_OFFSET 0x700 #define XGBE10_HW_STATS_OFFSET 0x800 #define XGBE10_HOST_PORT_NUM 0 -#define XGBE10_NUM_ALE_ENTRIES 1024 +#define XGBE10_NUM_ALE_ENTRIES 2048 #defineGBE_TIMER_INTERVAL (HZ / 2) @@ -3548,7 +3547,6 @@ static int set_gbenu_ethss_priv(struct gbe_priv *gbe_dev, gbe_dev->ale_reg = gbe_dev->switch_regs + GBENU_ALE_OFFSET; gbe_dev->ale_ports = gbe_dev->max_num_ports; gbe_dev->host_port = GBENU_HOST_PORT_NUM; - gbe_dev->ale_entries = GBE13_NUM_ALE_ENTRIES; gbe_dev->stats_en_mask = (1 << (gbe_dev->max_num_ports)) - 1; /* Subsystem registers */ -- 1.9.1
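The sizing rule described in the comment can be read as a small pure function. This is a sketch under the patch's own constants, with a made-up helper name, ale_entries_from_status(); it takes the raw register value as an argument instead of reading hardware.

```c
#include <stdint.h>

#define ALE_TABLE_SIZE_MULTIPLIER  1024u
#define ALE_STATUS_SIZE_MASK       0x1fu
#define ALE_TABLE_SIZE_DEFAULT     64u

/* configured: value supplied by the caller driver (0 means "ask the h/w").
 * ale_status: raw ALE_STATUS register contents.
 */
static uint32_t ale_entries_from_status(uint32_t configured, uint32_t ale_status)
{
	uint32_t n;

	if (configured)
		return configured;	/* older ALE: caller knows the size */

	n = ale_status & ALE_STATUS_SIZE_MASK;

	/* Zero means the small NetCP lite table; otherwise units of 1024. */
	return n ? n * ALE_TABLE_SIZE_MULTIPLIER : ALE_TABLE_SIZE_DEFAULT;
}
```

For example, a size field of 8 gives 8192 entries, which matches the K2E figure quoted in the commit message.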
[PATCH net-next 08/10] net: netcp: ale: update to support unknown vlan controls for NU switch
In NU Ethernet switch used on some of the Keystone SoCs, there is separate UNKNOWNVLAN register for membership, unreg mcast flood, reg mcast flood and force untag egress bits in ALE. So control for these fields require different address offset, shift and size of field. As this ALE has the same version number as ALE in CPSW found on other SoCs, customazation based on version number is not possible. So use a configuration parameter, nu_switch_ale, to identify the ALE ALE found in NU Switch. Different treatment is needed for NU Switch ALE due to difference in the ale table bits, separate unknown vlan registers etc. The register information available in ale_controls, needs to be updated to support the netcp NU switch h/w. So it is not constant array any more since it needs to be updated based on ALE type. The header of the file is also updated to indicate it supports N port switch ALE, not just 3 port. The version mask is 3 bits in NU Switch ALE vs 8 bits on other ALE types. While at it, change the debug print to info print so that ALE version gets displayed in boot log. 
Signed-off-by: Murali KaricheriSigned-off-by: Sekhar Nori --- drivers/net/ethernet/ti/cpsw_ale.c| 50 +++ drivers/net/ethernet/ti/cpsw_ale.h| 13 - drivers/net/ethernet/ti/netcp_ethss.c | 5 +++- 3 files changed, 61 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c index 43b061b..e15db39 100644 --- a/drivers/net/ethernet/ti/cpsw_ale.c +++ b/drivers/net/ethernet/ti/cpsw_ale.c @@ -1,5 +1,5 @@ /* - * Texas Instruments 3-Port Ethernet Switch Address Lookup Engine + * Texas Instruments N-Port Ethernet Switch Address Lookup Engine * * Copyright (C) 2012 Texas Instruments * @@ -27,8 +27,9 @@ #define BITMASK(bits) (BIT(bits) - 1) -#define ALE_VERSION_MAJOR(rev) ((rev >> 8) & 0xff) +#define ALE_VERSION_MAJOR(rev, mask) (((rev) >> 8) & (mask)) #define ALE_VERSION_MINOR(rev) (rev & 0xff) +#define ALE_VERSION_1R40x0104 /* ALE Registers */ #define ALE_IDVER 0x00 @@ -39,6 +40,12 @@ #define ALE_TABLE 0x34 #define ALE_PORTCTL0x40 +/* ALE NetCP NU switch specific Registers */ +#define ALE_UNKNOWNVLAN_MEMBER 0x90 +#define ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD 0x94 +#define ALE_UNKNOWNVLAN_REG_MCAST_FLOOD0x98 +#define ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS 0x9C + #define ALE_TABLE_WRITEBIT(31) #define ALE_TYPE_FREE 0 @@ -464,7 +471,7 @@ struct ale_control_info { int bits; }; -static const struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = { +static struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = { [ALE_ENABLE]= { .name = "enable", .offset = ALE_CONTROL, @@ -724,8 +731,41 @@ void cpsw_ale_start(struct cpsw_ale *ale) u32 rev; rev = __raw_readl(ale->params.ale_regs + ALE_IDVER); - dev_dbg(ale->params.dev, "initialized cpsw ale revision %d.%d\n", - ALE_VERSION_MAJOR(rev), ALE_VERSION_MINOR(rev)); + if (!ale->params.major_ver_mask) + ale->params.major_ver_mask = 0xff; + ale->version = + (ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask) << 8) | +ALE_VERSION_MINOR(rev); + dev_info(ale->params.dev, "initialized cpsw ale 
version %d.%d\n", +ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask), +ALE_VERSION_MINOR(rev)); + + if (ale->params.nu_switch_ale) { + /* Separate registers for unknown vlan configuration. +* Also there are N bits, where N is number of ale +* ports and shift value should be 0 +*/ + ale_controls[ALE_PORT_UNKNOWN_VLAN_MEMBER].bits = + ale->params.ale_ports; + ale_controls[ALE_PORT_UNKNOWN_VLAN_MEMBER].offset = + ALE_UNKNOWNVLAN_MEMBER; + ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].bits = + ale->params.ale_ports; + ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].shift = 0; + ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].offset = + ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD; + ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].bits = + ale->params.ale_ports; + ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].shift = 0; + ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].offset = + ALE_UNKNOWNVLAN_REG_MCAST_FLOOD; + ale_controls[ALE_PORT_UNTAGGED_EGRESS].bits = + ale->params.ale_ports; +
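As a rough illustration of the version handling in this patch, the sketch below composes the 16-bit version from an ID/version register value using a configurable major mask. The helper name ale_version() is made up; the macros follow the patch, with the minor macro parenthesised for safety. The 0x7 mask used in the note below is inferred from the "3 bits" statement in the commit message.

```c
#include <stdint.h>

#define ALE_VERSION_MAJOR(rev, mask)  (((rev) >> 8) & (mask))
#define ALE_VERSION_MINOR(rev)        ((rev) & 0xff)

/* major_ver_mask == 0 selects the default 8-bit major field. */
static uint32_t ale_version(uint32_t idver, uint32_t major_ver_mask)
{
	if (!major_ver_mask)
		major_ver_mask = 0xff;

	return (ALE_VERSION_MAJOR(idver, major_ver_mask) << 8) |
	       ALE_VERSION_MINOR(idver);
}
```

With a 3-bit mask, a register whose bits 15:8 read 0xf9 still yields major version 1, so the result compares equal to a constant such as 0x0104.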
[PATCH net-next 10/10] net: netcp: ale: add proper ale entry mask bits for netcp switch ALE
For NetCP NU Switch ALE, some of the mask bits are different than defaults used in the driver. Add a new macro DEFINE_ALE_FIELD1 that use a configurable mask bits and use it in the driver. These bits are set to correct values by using the new variables added to cpsw_ale structure and re-used in the macros. The parameter nu_switch_ale is configured by the caller driver to indicate the ALE is for that switch and is used in the ALE driver to do customization as needed. Signed-off-by: Murali KaricheriSigned-off-by: Sekhar Nori --- drivers/net/ethernet/ti/cpsw_ale.c | 99 ++ drivers/net/ethernet/ti/cpsw_ale.h | 4 ++ 2 files changed, 84 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c index 62a18d6..ddd43e0 100644 --- a/drivers/net/ethernet/ti/cpsw_ale.c +++ b/drivers/net/ethernet/ti/cpsw_ale.c @@ -29,6 +29,7 @@ #define ALE_VERSION_MAJOR(rev, mask) (((rev) >> 8) & (mask)) #define ALE_VERSION_MINOR(rev) (rev & 0xff) +#define ALE_VERSION_1R30x0103 #define ALE_VERSION_1R40x0104 /* ALE Registers */ @@ -46,6 +47,7 @@ #define ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD 0x94 #define ALE_UNKNOWNVLAN_REG_MCAST_FLOOD0x98 #define ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS 0x9C +#define ALE_VLAN_MASK_MUX(reg) (0xc0 + (0x4 * (reg))) #define ALE_TABLE_WRITEBIT(31) @@ -96,20 +98,34 @@ static inline void cpsw_ale_set_field(u32 *ale_entry, u32 start, u32 bits, cpsw_ale_set_field(ale_entry, start, bits, value); \ } +#define DEFINE_ALE_FIELD1(name, start) \ +static inline int cpsw_ale_get_##name(u32 *ale_entry, u32 bits) \ +{ \ + return cpsw_ale_get_field(ale_entry, start, bits); \ +} \ +static inline void cpsw_ale_set_##name(u32 *ale_entry, u32 value, \ + u32 bits) \ +{ \ + cpsw_ale_set_field(ale_entry, start, bits, value); \ +} + DEFINE_ALE_FIELD(entry_type, 60, 2) DEFINE_ALE_FIELD(vlan_id, 48, 12) DEFINE_ALE_FIELD(mcast_state, 62, 2) -DEFINE_ALE_FIELD(port_mask,66, 3) +DEFINE_ALE_FIELD1(port_mask, 66) DEFINE_ALE_FIELD(super,65, 1) 
DEFINE_ALE_FIELD(ucast_type, 62, 2) -DEFINE_ALE_FIELD(port_num, 66, 2) +DEFINE_ALE_FIELD1(port_num,66) DEFINE_ALE_FIELD(blocked, 65, 1) DEFINE_ALE_FIELD(secure, 64, 1) -DEFINE_ALE_FIELD(vlan_untag_force, 24, 3) -DEFINE_ALE_FIELD(vlan_reg_mcast, 16, 3) -DEFINE_ALE_FIELD(vlan_unreg_mcast, 8, 3) -DEFINE_ALE_FIELD(vlan_member_list, 0, 3) +DEFINE_ALE_FIELD1(vlan_untag_force,24) +DEFINE_ALE_FIELD1(vlan_reg_mcast, 16) +DEFINE_ALE_FIELD1(vlan_unreg_mcast,8) +DEFINE_ALE_FIELD1(vlan_member_list,0) DEFINE_ALE_FIELD(mcast,40, 1) +/* ALE NetCP nu switch specific */ +DEFINE_ALE_FIELD(vlan_unreg_mcast_idx, 20, 3) +DEFINE_ALE_FIELD(vlan_reg_mcast_idx, 44, 3) /* The MAC address field in the ALE entry cannot be macroized as above */ static inline void cpsw_ale_get_addr(u32 *ale_entry, u8 *addr) @@ -235,14 +251,16 @@ static void cpsw_ale_flush_mcast(struct cpsw_ale *ale, u32 *ale_entry, { int mask; - mask = cpsw_ale_get_port_mask(ale_entry); + mask = cpsw_ale_get_port_mask(ale_entry, + ale->port_mask_bits); if ((mask & port_mask) == 0) return; /* ports dont intersect, not interested */ mask &= ~port_mask; /* free if only remaining port is host port */ if (mask) - cpsw_ale_set_port_mask(ale_entry, mask); + cpsw_ale_set_port_mask(ale_entry, mask, + ale->port_mask_bits); else cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE); } @@ -303,7 +321,7 @@ int cpsw_ale_add_ucast(struct cpsw_ale *ale, u8 *addr, int port, cpsw_ale_set_ucast_type(ale_entry, ALE_UCAST_PERSISTANT); cpsw_ale_set_secure(ale_entry, (flags & ALE_SECURE) ? 1 : 0); cpsw_ale_set_blocked(ale_entry, (flags & ALE_BLOCKED) ? 1 : 0); - cpsw_ale_set_port_num(ale_entry, port); + cpsw_ale_set_port_num(ale_entry, port, ale->port_num_bits); idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0); if (idx < 0) @@ -350,9 +368,11 @@ int cpsw_ale_add_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
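The idea behind DEFINE_ALE_FIELD1 is that the field's start bit stays fixed while its width is passed at run time. The sketch below is a simplified userspace model, not the driver code: the helper names are made up, the three-word entry layout with the highest bits in word 0 is an assumption modelled loosely on the driver, and fields are assumed to be narrower than 32 bits and not to straddle a word boundary.

```c
#include <stdint.h>

#define ALE_ENTRY_WORDS  3
#define BITMASK(bits)    ((1u << (bits)) - 1u)

static uint32_t ale_get_field(const uint32_t *entry, uint32_t start, uint32_t bits)
{
	uint32_t idx = start / 32;

	start -= idx * 32;
	idx = (ALE_ENTRY_WORDS - 1) - idx;	/* word 0 holds the highest bits */
	return (entry[idx] >> start) & BITMASK(bits);
}

static void ale_set_field(uint32_t *entry, uint32_t start, uint32_t bits,
			  uint32_t value)
{
	uint32_t idx = start / 32;

	value &= BITMASK(bits);
	start -= idx * 32;
	idx = (ALE_ENTRY_WORDS - 1) - idx;
	entry[idx] &= ~(BITMASK(bits) << start);
	entry[idx] |= value << start;
}

/* port_mask starts at bit 66; the width is chosen by the caller. */
static uint32_t ale_get_port_mask(const uint32_t *entry, uint32_t bits)
{
	return ale_get_field(entry, 66, bits);
}

static void ale_set_port_mask(uint32_t *entry, uint32_t value, uint32_t bits)
{
	ale_set_field(entry, 66, bits, value);
}
```

A 3-port switch would pass a width of 3 here, while a switch with more ports passes a larger width read from its per-instance state, which is what the new port_mask_bits argument in the patch provides.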
Re: HalfSipHash Acceptable Usage
On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
> 1) Anything that requires actual long-term security will use
> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> things like TCP sequence numbers. This seems pretty uncontroversial to
> me. Seem okay to you?

Um, why do TCP sequence numbers need long-term security?  So long as
you rekey every 5 minutes or so, TCP sequence numbers don't need any
more security than that, since even if you break the key used to
generate initial sequence numbers even a minute or two later, any
pending TCP connections will have timed out long before.

See the security analysis done in RFC 6528[1], where among other
things, it points out why MD5 is acceptable with periodic rekeying,
although there is the concern that this could break certain heuristics
used when establishing new connections during the TIME-WAIT state.

[1] https://tools.ietf.org/html/rfc6528

						- Ted
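For illustration only, here is a sketch of the shape of the scheme discussed above: RFC 6528 computes ISN = M + F(localip, localport, remoteip, remoteport, secretkey), where M is a 4 microsecond timer, and the key is replaced every five minutes or so. Everything below is hypothetical: the struct and function names are invented, the clock and the fresh key are passed in by the caller, and toy_prf() is a placeholder mixer that is NOT a cryptographic PRF and must not be used as one.

```c
#include <stdint.h>

#define REKEY_INTERVAL_SEC 300u	/* "every 5 minutes or so" */

struct isn_state {
	uint64_t key;
	uint64_t key_born_sec;
};

struct four_tuple {
	uint32_t laddr, raddr;
	uint16_t lport, rport;
};

/* PLACEHOLDER mixer standing in for F(); it has no security value. */
static uint32_t toy_prf(const struct four_tuple *t, uint64_t key)
{
	uint64_t h = key ^ 0x9e3779b97f4a7c15ull;

	h = (h ^ t->laddr) * 0x100000001b3ull;
	h = (h ^ t->raddr) * 0x100000001b3ull;
	h = (h ^ (((uint32_t)t->lport << 16) | t->rport)) * 0x100000001b3ull;
	return (uint32_t)(h ^ (h >> 32));
}

/* now_usec: caller-supplied clock; fresh_key: caller-supplied random key,
 * consumed only when the current key is older than the rekey interval.
 */
static uint32_t isn_generate(struct isn_state *st, const struct four_tuple *t,
			     uint64_t now_usec, uint64_t fresh_key)
{
	uint64_t now_sec = now_usec / 1000000u;

	if (now_sec - st->key_born_sec >= REKEY_INTERVAL_SEC) {
		st->key = fresh_key;
		st->key_born_sec = now_sec;
	}

	/* M advances once every 4 microseconds. */
	return (uint32_t)(now_usec / 4) + toy_prf(t, st->key);
}
```

Within one key lifetime the ISN for a given 4-tuple simply tracks the timer; after a rekey the offset for every 4-tuple changes at once, which is the interaction with TIME-WAIT handling that the message above mentions.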
[GIT] Networking
1) Use rb_entry() instead of hardcoded container_of(), from Geliang
   Tang.

2) Use correct memory barriers in stmmac driver, from Pavel Machek.

3) Fix assoc bind address handling in SCTP, from Xin Long.

4) Make the length check for UFO handling consistent between
   __ip_append_data() and ip_finish_output(), from Zheng Li.

5) HSI driver compatible strings were busted for hix5hd2, from Dongpo
   Li.

6) Handle devm_ioremap() errors properly in cavium driver, from
   Arvind Yadav.

Please pull, thanks a lot!

The following changes since commit 52f40e9d657cc126b766304a5dd58ad73b02ff46:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2016-12-17 20:17:04 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to a763f78cea845c91b8d91f93dabf70c407635dc5:

  RDS: use rb_entry() (2016-12-20 14:22:49 -0500)

Arvind Yadav (1):
      net: ethernet: cavium: octeon: octeon_mgmt: Handle return NULL error from devm_ioremap

David S. Miller (4):
      Merge branch 'phy-broken-modes'
      Merge branch 'fsl-fixes'
      Merge branch 'hix5hd2_gmac-compatible-string'
      Merge branch 'sctp-fixes'

Dongpo Li (2):
      net: hix5hd2_gmac: fix compatible strings name
      ARM: dts: hix5hd2: don't change the existing compatible string

Geliang Tang (4):
      net/mlx5: use rb_entry()
      net_sched: sch_fq: use rb_entry()
      net_sched: sch_netem: use rb_entry()
      RDS: use rb_entry()

Jarno Rajahalme (1):
      openvswitch: Add a missing break statement.

Madalin Bucur (4):
      fsl/fman: fix 1G support for QSGMII interfaces
      powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
      fsl/fman: A007273 only applies to PPC SoCs
      fsl/fman: enable compilation on ARM64

Pavel Machek (1):
      stmmac: fix memory barriers

Tobias Klauser (1):
      ethernet: sfc: Add Kconfig entry for vendor Solarflare

WingMan Kwok (2):
      net: netcp: ethss: fix errors in ethtool ops
      net: netcp: ethss: fix 10gbe host port tx pri map configuration

Xin Long (2):
      sctp: reduce indent level in sctp_copy_local_addr_list
      sctp: not copying duplicate addrs to the assoc's bind address list

jbrunet (3):
      net: phy: fix sign type error in genphy_config_eee_advert
      net: phy: use boolean dt properties for eee broken modes
      dt: bindings: net: use boolean dt properties for eee broken modes

zheng li (1):
      ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

 Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt | 13 -
 Documentation/devicetree/bindings/net/phy.txt                    | 10 --
 arch/arm/boot/dts/hisi-x5hd2.dtsi                                |  4 ++--
 arch/powerpc/platforms/85xx/corenet_generic.c                    |  3 ---
 drivers/net/ethernet/Kconfig                                     |  1 -
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c                 |  6 ++
 drivers/net/ethernet/freescale/fman/Kconfig                      |  2 +-
 drivers/net/ethernet/freescale/fman/fman.c                       | 15 +++
 drivers/net/ethernet/freescale/fman/mac.c                        |  1 +
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c                    | 13 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c            |  2 +-
 drivers/net/ethernet/sfc/Kconfig                                 | 21 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c               |  4 ++--
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c                   |  2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c                |  8
 drivers/net/ethernet/ti/netcp_ethss.c                            | 24 ++--
 drivers/net/phy/phy_device.c                                     | 22 +-
 include/dt-bindings/net/mdio.h                                   | 19 ---
 net/ipv4/ip_output.c                                             |  2 +-
 net/openvswitch/flow_netlink.c                                   |  1 +
 net/rds/rdma.c                                                   |  2 +-
 net/sched/sch_fq.c                                               | 14 +++---
 net/sched/sch_netem.c                                            |  2 +-
 net/sctp/bind_addr.c                                             |  3 +++
 net/sctp/protocol.c                                              | 40 ++--
 25 files changed, 148 insertions(+), 86 deletions(-)
 delete mode 100644 include/dt-bindings/net/mdio.h
[ANNOUNCE] nftables 0.7 release
Hi!

The Netfilter project proudly presents:

        nftables 0.7

This release contains many accumulated bug fixes and new features
available up to the (upcoming) Linux 4.10-rc1 kernel release.

* Facilitate migration from iptables to nftables:

  At compilation time, you have to pass this option.

  # ./configure --with-xtables

  And libxtables needs to be installed in your system.

  This allows you to list a ruleset containing xt extensions loaded
  through the iptables-compat-restore tool. The nft tool provides a
  native translation for iptables extensions (if available).

* Add new fib expression, which can be used to obtain the output
  interface from the route table based on either source or destination
  address of a packet. This can be used to e.g. add reverse path
  filtering, eg. drop if not coming from the same interface the packet
  arrived on:

  # nft add rule x prerouting fib saddr . iif oif eq 0 drop

  Accept only if from eth0:

  # nft add rule x prerouting fib saddr . iif oif eq "eth0" accept

  Accept if from any valid interface:

  # nft add rule x prerouting fib saddr oif accept

  Querying of address type is also supported. This can be used to only
  accept packets to addresses configured in the same interface, eg.

  # nft add rule x prerouting fib daddr . iif type local accept

  It's also possible to use mark and verdict map, eg.

  # nft add rule x prerouting \
        meta mark set 0xdead fib daddr . mark type vmap {
                blackhole : drop,
                prohibit : drop,
                unicast : accept
        }

* Support hashing of any arbitrary key combination, eg.

  # nft add rule x y \
        dnat to jhash ip saddr . tcp dport mod 2 map { \
                0 : 192.168.20.100, \
                1 : 192.168.30.100 \
        }

  Another usecase: Set packet marks based on any arbitrary hashing.

* Add number generation support. Useful for round-robin packet mark
  setting, eg.

  # nft add rule filter prerouting meta mark set numgen inc mod 2

  You can also specify an offset to indicate from what value you want
  to start from. The modulus provides the scale of the counting
  sequence.

  You can also use this from maps, eg.

  # nft add rule nat prerouting \
        dnat to numgen inc mod 2 map { 0 : 192.168.10.100, 1 : 192.168.20.200 }

  So this is distributing new connections in a round-robin fashion
  between 192.168.10.100 and 192.168.20.200. Don't forget the special
  NAT chain semantics: Only the first packet evaluates the rule, follow
  up packets rely on conntrack to apply the NAT information.

  You can also emulate flow distribution with different backend weights
  using intervals, eg.

  # nft add rule nat prerouting \
        dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 }

* Add quota support, eg.

  # nft add rule filter input \
        flow table http { ip saddr timeout 60s quota over 50 mbytes } drop

  This creates a flow table, where every flow gets a quota of 50 mbytes.

  You can also use simple rules to enforce quotas, of course.

* Introduce routing expression, for routing related data with support
  for nexthop (i.e. the directly connected IP address that an outgoing
  packet is sent to), which can be used either for matching or
  accounting, eg.

  # nft add rule filter postrouting \
        ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop

  This will drop any traffic to 192.168.1.0/24 that is not routed via
  192.168.0.1.

  # nft add rule filter postrouting \
        flow table acct { rt nexthop timeout 600s counter }
  # nft add rule ip6 filter postrouting \
        flow table acct { rt nexthop timeout 600s counter }

  These rules count outgoing traffic per nexthop. Note that the timeout
  releases an entry if no traffic is seen for this nexthop within 10
  minutes.

* Notrack support, to explicitly skip connection tracking for matching
  packets, eg.

  # nft add rule ip raw prerouting tcp dport { 80, 443 } notrack

  So you can skip tracking for http and https traffic.

* Support to set non-byte bound packet header fields, including
  checksum adjustment, eg. ip6 ecn set 1.

* Add 'create set' and 'create element' commands, eg.

  # nft add set x y { type ipv4_addr\; }
  # nft create set x y { type ipv4_addr\; }
  :1:1-35: Error: Could not process rule: File exists
  create set x y { type ipv4_addr; }
  ^^^
  # nft add set x y { type ipv4_addr\; }
  #

  So 'create' bails out if the set already exists, while 'add' doesn't,
  for more ergonomic usage as several users requested on the mailing
  list.

* Allow to use variable reference for set element definitions, eg.

  # cat ruleset.nft
  define s-ext-2-int = { 10.10.10.10 . 25, 10.10.10.10 . 143 }

  table inet forward {
          set s-ext-2-int {
                  type ipv4_addr . inet_service
                  elements = $s-ext-2-int
          }
  }
  #
[PATCH 3/5 net-next] inet: don't check for bind conflicts twice when searching for a port
This is just wasted time, we've already found a tb that doesn't have a bind conflict, and we don't drop the head lock so scanning again isn't going to give us a different answer. Instead move the tb->reuse setting logic outside of the found_tb path and put it in the success: path. Then make it so that we don't goto again if we find a bind conflict in the found_tb path as we won't reach this anymore when we are scanning for an ephemeral port. Signed-off-by: Josef Bacik--- net/ipv4/inet_connection_sock.c | 39 ++- 1 file changed, 18 insertions(+), 21 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 1a1a94bd..fc9bfe1 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -92,7 +92,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) { bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; - int ret = 1, attempts = 5, port = snum; + int ret = 1, port = snum; struct inet_bind_hashbucket *head; struct net *net = sock_net(sk); int i, low, high, attempt_half; @@ -100,6 +100,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) kuid_t uid = sock_i_uid(sk); u32 remaining, offset; bool reuseport_ok = !!snum; + bool empty_tb = true; if (port) { head = >bhash[inet_bhashfn(net, port, @@ -111,7 +112,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) goto tb_not_found; } -again: attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 
1 : 0; other_half_scan: inet_get_local_port_range(net, , ); @@ -148,8 +148,12 @@ other_parity_scan: spin_lock_bh(>lock); inet_bind_bucket_for_each(tb, >chain) if (net_eq(ib_net(tb), net) && tb->port == port) { - if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) - goto tb_found; + if (hlist_empty(>owners)) + goto success; + if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) { + empty_tb = false; + goto success; + } goto next_port; } goto tb_not_found; @@ -184,23 +188,12 @@ tb_found: !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) goto success; - if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) { - if ((reuse || -(tb->fastreuseport > 0 && - sk->sk_reuseport && - !rcu_access_pointer(sk->sk_reuseport_cb) && - uid_eq(tb->fastuid, uid))) && !snum && - --attempts >= 0) { - spin_unlock_bh(>lock); - goto again; - } + if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) goto fail_unlock; - } - if (!reuse) - tb->fastreuse = 0; - if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid)) - tb->fastreuseport = 0; - } else { + empty_tb = false; + } +success: + if (empty_tb) { tb->fastreuse = reuse; if (sk->sk_reuseport) { tb->fastreuseport = 1; @@ -208,8 +201,12 @@ tb_found: } else { tb->fastreuseport = 0; } + } else { + if (!reuse) + tb->fastreuse = 0; + if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid)) + tb->fastreuseport = 0; } -success: if (!inet_csk(sk)->icsk_bind_hash) inet_bind_hash(sk, tb, port); WARN_ON(inet_csk(sk)->icsk_bind_hash != tb); -- 2.9.3
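The reworked success: path boils down to one decision: an empty bucket takes its fast flags from the new socket, a populated bucket can only have them cleared. Here is a standalone sketch of that logic with simplified, made-up types: struct bind_bucket and update_fast_flags() are illustrative names, uids are plain integers, and the fastuid assignment sits in context the diff does not show, so it is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the fast-path fields of a bind bucket. */
struct bind_bucket {
	signed char fastreuse;
	signed char fastreuseport;
	uint32_t fastuid;
};

static void update_fast_flags(struct bind_bucket *tb, bool empty_tb,
			      bool reuse, bool reuseport, uint32_t uid)
{
	if (empty_tb) {
		/* First owner: the bucket inherits the socket's settings. */
		tb->fastreuse = reuse;
		if (reuseport) {
			tb->fastreuseport = 1;
			tb->fastuid = uid;	/* assumed, not visible in the hunk */
		} else {
			tb->fastreuseport = 0;
		}
	} else {
		/* Existing owners: flags may only be downgraded. */
		if (!reuse)
			tb->fastreuse = 0;
		if (!reuseport || tb->fastuid != uid)
			tb->fastreuseport = 0;
	}
}
```

Because both the ephemeral-port scan and the explicit-port lookup now reach this code through the same label, the flag handling exists in one place instead of two.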
[PATCH 5/5 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
If we have non reuseport sockets on a tb we will set tb->fastreuseport to
0 and never set it again.  Which means that in the future if we end up
adding a bunch of reuseport sk's to that tb we'll have to do the expensive
scan every time.  Instead add a sock_common to the tb so we know what
reuseport sk succeeded last.  Once one sk has made it onto the list we know
that there are no potential bind conflicts on the owners list that match
that sk's rcv_saddr.  So copy the sk's common into our tb->fastsock and set
tb->fastreuseport to FASTREUSEPORT_STRICT so we know we have to do an extra
check for subsequent reuseport sockets and skip the expensive bind conflict
check.

Signed-off-by: Josef Bacik
---
 include/net/inet_hashtables.h   |  4
 net/ipv4/inet_connection_sock.c | 53 +
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 50f635c..b776401 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -74,12 +74,16 @@ struct inet_ehash_bucket {
  * users logged onto your box, isn't it nice to know that new data
  * ports are created in O(1) time?  I thought so.
;-) -DaveM */ +#define FASTREUSEPORT_ANY 1 +#define FASTREUSEPORT_STRICT 2 + struct inet_bind_bucket { possible_net_t ib_net; unsigned short port; signed char fastreuse; signed char fastreuseport; kuid_t fastuid; + struct sock_common fastsock; int num_owners; struct hlist_node node; struct hlist_head owners; diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index d3ccf62..9e29fad 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -164,6 +164,32 @@ success: return head; } +static inline int sk_reuseport_match(struct inet_bind_bucket *tb, +struct sock *sk) +{ + struct sock *sk2 = (struct sock *)>fastsock; + kuid_t uid = sock_i_uid(sk); + + if (tb->fastreuseport <= 0) + return 0; + if (!sk->sk_reuseport) + return 0; + if (rcu_access_pointer(sk->sk_reuseport_cb)) + return 0; + if (!uid_eq(tb->fastuid, uid)) + return 0; + /* We only need to check the rcv_saddr if this tb was once marked +* without fastreuseport and then was reset, as we can only know that +* the fastsock has no potential bind conflicts with the rest of the +* possible socks on the owners list. +*/ + if (tb->fastreuseport == FASTREUSEPORT_ANY) + return 1; + if (!inet_csk(sk)->icsk_af_ops->rcv_saddr_equal(sk, sk2, true)) + return 0; + return 1; +} + /* Obtain a reference to a local port for the given sock, * if snum is zero it means select any available local port. 
* We try to allocate an odd port (and leave even ports for connect()) @@ -206,9 +232,7 @@ tb_found: goto success; if ((tb->fastreuse > 0 && reuse) || -(tb->fastreuseport > 0 && - !rcu_access_pointer(sk->sk_reuseport_cb) && - sk->sk_reuseport && uid_eq(tb->fastuid, uid))) + sk_reuseport_match(tb, sk)) goto success; if (inet_csk_bind_conflict(sk, tb, true, true)) goto fail_unlock; @@ -220,14 +244,35 @@ success: if (sk->sk_reuseport) { tb->fastreuseport = 1; tb->fastuid = uid; + memcpy(>fastsock, >__sk_common, + sizeof(struct sock_common)); } else { tb->fastreuseport = 0; } } else { if (!reuse) tb->fastreuse = 0; - if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid)) + if (sk->sk_reuseport) { + /* We didn't match or we don't have fastreuseport set on +* the tb, but we have sk_reuseport set on this socket +* and we know that there are no bind conflicts with +* this socket in this tb, so reset our tb's reuseport +* settings so that any subsequent sockets that match +* our current socket will be put on the fast path. +* +* If we reset we need to set FASTREUSEPORT_STRICT so we +* do extra checking for all subsequent sk_reuseport +* socks. +*/ + if (!sk_reuseport_match(tb, sk)) { +
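The fast-path check described in this patch can be traced outside the kernel with a small user-space model. This is only a sketch under stated assumptions: the FASTREUSEPORT_ANY and FASTREUSEPORT_STRICT values come from the patch, while struct model_sock, struct model_bucket, model_rcv_saddr_equal() and model_reuseport_match() are hypothetical stand-ins. The address comparison is a simplified IPv4-only approximation of what ->rcv_saddr_equal() is assumed to do when match_wildcard is true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch; everything else is a simplified model. */
#define FASTREUSEPORT_ANY    1
#define FASTREUSEPORT_STRICT 2

/* Hypothetical stand-in for the fields of struct sock that the check reads. */
struct model_sock {
	bool     reuseport;        /* models sk->sk_reuseport */
	bool     has_reuseport_cb; /* models sk->sk_reuseport_cb != NULL */
	uint32_t uid;              /* models sock_i_uid(sk) */
	uint32_t rcv_saddr;        /* bound IPv4 address, 0 means wildcard */
};

/* Hypothetical stand-in for the relevant part of struct inet_bind_bucket. */
struct model_bucket {
	signed char       fastreuseport;
	uint32_t          fastuid;
	struct model_sock fastsock; /* copy of the last reuseport sk that got in */
};

/* Assumed semantics: equal addresses match, and a wildcard matches anything. */
static bool model_rcv_saddr_equal(const struct model_sock *a,
				  const struct model_sock *b)
{
	return a->rcv_saddr == b->rcv_saddr ||
	       a->rcv_saddr == 0 || b->rcv_saddr == 0;
}

/* Returns 1 when the expensive owners-list walk can be skipped. */
static int model_reuseport_match(const struct model_bucket *tb,
				 const struct model_sock *sk)
{
	if (tb->fastreuseport <= 0)
		return 0;
	if (!sk->reuseport)
		return 0;
	if (sk->has_reuseport_cb)
		return 0;
	if (tb->fastuid != sk->uid)
		return 0;
	/* ANY: the bucket has only ever held compatible reuseport sockets. */
	if (tb->fastreuseport == FASTREUSEPORT_ANY)
		return 1;
	/* STRICT: only sockets matching the remembered one are known safe. */
	return model_rcv_saddr_equal(sk, &tb->fastsock) ? 1 : 0;
}
```

In this model a socket that fails the match is not rejected; it simply falls back to the full conflict check, which is the behaviour the commit message describes.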
[PATCH 4/5 net-next] inet: split inet_csk_get_port into two functions
inet_csk_get_port does two different things, it either scans for an open port, or it tries to see if the specified port is available for use. Since these two operations have different rules and are basically independent lets split them into two different functions to make them both more readable. Signed-off-by: Josef Bacik--- net/ipv4/inet_connection_sock.c | 72 +++-- 1 file changed, 47 insertions(+), 25 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index fc9bfe1..d3ccf62 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -84,34 +84,21 @@ static int inet_csk_bind_conflict(const struct sock *sk, return sk2 != NULL; } -/* Obtain a reference to a local port for the given sock, - * if snum is zero it means select any available local port. - * We try to allocate an odd port (and leave even ports for connect()) +/* + * Find an open port number for the socket. Returns with the + * inet_bind_hashbucket lock held. */ -int inet_csk_get_port(struct sock *sk, unsigned short snum) +static struct inet_bind_hashbucket * +inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *port_ret) { - bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; - int ret = 1, port = snum; + int port = 0; struct inet_bind_hashbucket *head; struct net *net = sock_net(sk); int i, low, high, attempt_half; struct inet_bind_bucket *tb; - kuid_t uid = sock_i_uid(sk); u32 remaining, offset; - bool reuseport_ok = !!snum; - bool empty_tb = true; - if (port) { - head = >bhash[inet_bhashfn(net, port, - hinfo->bhash_size)]; - spin_lock_bh(>lock); - inet_bind_bucket_for_each(tb, >chain) - if (net_eq(ib_net(tb), net) && tb->port == port) - goto tb_found; - - goto tb_not_found; - } attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 
1 : 0; other_half_scan: inet_get_local_port_range(net, , ); @@ -150,13 +137,12 @@ other_parity_scan: if (net_eq(ib_net(tb), net) && tb->port == port) { if (hlist_empty(>owners)) goto success; - if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) { - empty_tb = false; + if (!inet_csk_bind_conflict(sk, tb, false, false)) goto success; - } goto next_port; } - goto tb_not_found; + tb = NULL; + goto success; next_port: spin_unlock_bh(>lock); cond_resched(); @@ -171,8 +157,44 @@ next_port: attempt_half = 2; goto other_half_scan; } - return ret; + return NULL; +success: + *port_ret = port; + *tb_ret = tb; + return head; +} +/* Obtain a reference to a local port for the given sock, + * if snum is zero it means select any available local port. + * We try to allocate an odd port (and leave even ports for connect()) + */ +int inet_csk_get_port(struct sock *sk, unsigned short snum) +{ + bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; + struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; + int ret = 1, port = snum; + struct inet_bind_hashbucket *head; + struct net *net = sock_net(sk); + struct inet_bind_bucket *tb = NULL; + kuid_t uid = sock_i_uid(sk); + bool empty_tb = true; + + if (!port) { + head = inet_csk_find_open_port(sk, , ); + if (!head) + return 1; + if (!tb) + goto tb_not_found; + if (!hlist_empty(>owners)) + empty_tb = false; + goto success; + } + head = >bhash[inet_bhashfn(net, port, + hinfo->bhash_size)]; + spin_lock_bh(>lock); + inet_bind_bucket_for_each(tb, >chain) + if (net_eq(ib_net(tb), net) && tb->port == port) + goto tb_found; tb_not_found: tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep, net, head, port); @@ -188,7 +210,7 @@ tb_found: !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) goto success; - if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) + if
[PATCH 2/5 net-next] inet: kill smallest_size and smallest_port
In inet_csk_get_port we seem to be using smallest_port to figure out where the best place to look for a SO_REUSEPORT sk that matches with an existing set of SO_REUSEPORT's. However if we get to the logic if (smallest_size != -1) { port = smallest_port; goto have_port; } we will do a useless search, because we would have already done the inet_csk_bind_conflict for that port and it would have returned 1, otherwise we would have gone to found_tb and succeeded. Since this logic makes us do yet another trip through inet_csk_bind_conflict for a port we know won't work just delete this code and save us the time. Signed-off-by: Josef Bacik--- net/ipv4/inet_connection_sock.c | 26 -- 1 file changed, 4 insertions(+), 22 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 74f6a57..1a1a94bd 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -93,7 +93,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; int ret = 1, attempts = 5, port = snum; - int smallest_size = -1, smallest_port; struct inet_bind_hashbucket *head; struct net *net = sock_net(sk); int i, low, high, attempt_half; @@ -103,7 +102,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) bool reuseport_ok = !!snum; if (port) { -have_port: head = >bhash[inet_bhashfn(net, port, hinfo->bhash_size)]; spin_lock_bh(>lock); @@ -137,8 +135,6 @@ other_half_scan: * We do the opposite to not pollute connect() users. 
*/ offset |= 1U; - smallest_size = -1; - smallest_port = low; /* avoid compiler warning */ other_parity_scan: port = low + offset; @@ -152,15 +148,6 @@ other_parity_scan: spin_lock_bh(>lock); inet_bind_bucket_for_each(tb, >chain) if (net_eq(ib_net(tb), net) && tb->port == port) { - if (((tb->fastreuse > 0 && reuse) || -(tb->fastreuseport > 0 && - sk->sk_reuseport && - !rcu_access_pointer(sk->sk_reuseport_cb) && - uid_eq(tb->fastuid, uid))) && - (tb->num_owners < smallest_size || smallest_size == -1)) { - smallest_size = tb->num_owners; - smallest_port = port; - } if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) goto tb_found; goto next_port; @@ -171,10 +158,6 @@ next_port: cond_resched(); } - if (smallest_size != -1) { - port = smallest_port; - goto have_port; - } offset--; if (!(offset & 1)) goto other_parity_scan; @@ -196,19 +179,18 @@ tb_found: if (sk->sk_reuse == SK_FORCE_REUSE) goto success; - if (((tb->fastreuse > 0 && reuse) || + if ((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && - sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && - smallest_size == -1) + sk->sk_reuseport && uid_eq(tb->fastuid, uid))) goto success; if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && - uid_eq(tb->fastuid, uid))) && - !snum && smallest_size != -1 && --attempts >= 0) { + uid_eq(tb->fastuid, uid))) && !snum && + --attempts >= 0) { spin_unlock_bh(>lock); goto again; } -- 2.9.3
[PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict is how they check the rcv_saddr. Since we want to be able to check the saddr in other places just drop the protocol specific ->bind_conflict and replace it with ->rcv_saddr_equal, then make inet_csk_bind_conflict the one true bind conflict function. Signed-off-by: Josef Bacik--- include/net/inet6_connection_sock.h | 5 - include/net/inet_connection_sock.h | 9 +++-- net/dccp/ipv4.c | 3 ++- net/dccp/ipv6.c | 2 +- net/ipv4/inet_connection_sock.c | 22 +++- net/ipv4/tcp_ipv4.c | 3 ++- net/ipv4/udp.c | 1 + net/ipv6/inet6_connection_sock.c| 40 - net/ipv6/tcp_ipv6.c | 4 ++-- 9 files changed, 18 insertions(+), 71 deletions(-) diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h index 3212b39..8ec87b6 100644 --- a/include/net/inet6_connection_sock.h +++ b/include/net/inet6_connection_sock.h @@ -15,16 +15,11 @@ #include -struct inet_bind_bucket; struct request_sock; struct sk_buff; struct sock; struct sockaddr; -int inet6_csk_bind_conflict(const struct sock *sk, - const struct inet_bind_bucket *tb, bool relax, - bool soreuseport_ok); - struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6, const struct request_sock *req, u8 proto); diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index ec0479a..9cd43c5 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops { char __user *optval, int __user *optlen); #endif void(*addr2sockaddr)(struct sock *sk, struct sockaddr *); - int (*bind_conflict)(const struct sock *sk, -const struct inet_bind_bucket *tb, -bool relax, bool soreuseport_ok); + int (*rcv_saddr_equal)(const struct sock *sk1, + const struct sock *sk2, + bool match_wildcard); void(*mtu_reduced)(struct sock *sk); }; @@ -261,9 +261,6 @@ inet_csk_rto_backoff(const struct inet_connection_sock *icsk, struct sock 
*inet_csk_accept(struct sock *sk, int flags, int *err); -int inet_csk_bind_conflict(const struct sock *sk, - const struct inet_bind_bucket *tb, bool relax, - bool soreuseport_ok); int inet_csk_get_port(struct sock *sk, unsigned short snum); struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4, diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 9c67a96..1931324 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -17,6 +17,7 @@ #include #include +#include #include #include #include @@ -901,7 +902,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv4_af_ops = { .getsockopt= ip_getsockopt, .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), - .bind_conflict = inet_csk_bind_conflict, + .rcv_saddr_equal = ipv4_rcv_saddr_equal, #ifdef CONFIG_COMPAT .compat_setsockopt = compat_ip_setsockopt, .compat_getsockopt = compat_ip_getsockopt, diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 4663a01..45242b8 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -926,7 +926,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_af_ops = { .getsockopt= ipv6_getsockopt, .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6), - .bind_conflict = inet6_csk_bind_conflict, + .rcv_saddr_equal = ipv6_rcv_saddr_equal, #ifdef CONFIG_COMPAT .compat_setsockopt = compat_ipv6_setsockopt, .compat_getsockopt = compat_ipv6_getsockopt, diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 5f44fa1..74f6a57 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -44,9 +44,9 @@ void inet_get_local_port_range(struct net *net, int *low, int *high) } EXPORT_SYMBOL(inet_get_local_port_range); -int inet_csk_bind_conflict(const struct sock *sk, - const struct inet_bind_bucket *tb, bool relax, - bool reuseport_ok) +static int inet_csk_bind_conflict(const struct sock *sk, + const struct inet_bind_bucket *tb, + bool relax, bool reuseport_ok) 
{ struct sock *sk2; bool reuse = sk->sk_reuse; @@ -62,7 +62,6 @@ int inet_csk_bind_conflict(const
[RFC][PATCH 0/5 net-next] Rework inet_csk_get_port
At some point recently the guys working on our load balancer added the ability
to use SO_REUSEPORT.  When they restarted their app with this option enabled
they immediately hit a softlockup on what appeared to be the
inet_bind_bucket->lock.  Eventually what all of our debugging and discussion
led us to was the fact that the application comes up without SO_REUSEPORT,
shuts down which creates around 100k twsk's, and then comes up and tries to
open a bunch of sockets using SO_REUSEPORT, which meant traversing the
inet_bind_bucket owners list under the lock.  Since this lock is needed for
dealing with the twsk's and basically anything else related to connections we
would softlockup, and sometimes not ever recover.

To solve this problem I did what you see in Patch 5/5.  Once we have a
SO_REUSEPORT socket on the tb->owners list we know that the socket has no
conflicts with any of the other sockets on that list.  So we can add a copy of
the sock_common (really all we need is the rcv_saddr but it seemed ugly to copy
just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've
copied the whole common) in order to check subsequent SO_REUSEPORT sockets.  If
they match the previous one then we can skip the expensive
inet_csk_bind_conflict check.  This is what eliminated the soft lockup that we
were seeing.

Patches 1-4 are cleanups and re-workings.  For instance when we specify
port == 0 we need to find an open port, but we would do two passes through
inet_csk_bind_conflict every time we found a possible port.  We would also keep
track of the smallest_port value in order to try and use it if we found no port
our first run through.  This however made no sense as it would have had to fail
the first pass through inet_csk_bind_conflict, so would not actually pass the
second pass through either.  Finally I split the function into two functions in
order to make it easier to read and to distinguish between the two behaviors.

I have tested this on one of our load balancing boxes during peak traffic and
it hasn't fallen over.  But this is not my area, so obviously feel free to point
out where I'm being stupid and I'll get it fixed up and retested.

Thanks,

Josef
Re: nfc: trf7970a: Prevent repeated polling from crashing the kernel
On Tue, Dec 20, 2016 at 02:13:52PM -0500, Justin Bronder wrote:
> On 20/12/16 11:59 -0700, Mark Greer wrote:
> > On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> > > From: Jaret Cantu
> > >
> > > Repeated polling attempts cause a NULL dereference error to occur.
> > > This is because the state of the trf7970a is currently reading but
> > > another request has been made to send a command before it has finished.
> >
> > How is this happening?  Was trf7970a_abort_cmd() called and it didn't
> > work right?  Was it not called at all and there is a bug in the digital
> > layer?  More details please.
> >
> > > The solution is to properly kill the waiting reading (workqueue)
> > > before failing on the send.
> >
> > If the bug is in the calling code, then that is what should get fixed.
> > This seems to be a hack to work-around a digital layer bug.
>
> One of our uses of NFC is to begin polling to read a tag and then stop polling
> (in order to save power) until we know via user interaction that we need to
> poll again.  This is typically many minutes later so the power saving is pretty
> significant.  However, it's possible that a user will remove the tag before
> reading has completed.  We also detect this case and stop polling.  I can go
> more into this if necessary but that is what exposed a panic.
>
> You can reproduce using neard and python, in our testing it was very likely to
> occur in 10-100 iterations of the following:
>
>     #!/usr/bin/python
>     import time
>
>     import dbus
>
>     bus = dbus.SystemBus()
>     nfc0 = bus.get_object('org.neard', '/org/neard/nfc0')
>     props = dbus.Interface(nfc0, 'org.freedesktop.DBus.Properties')
>
>     try:
>         props.Set('org.neard.Adapter', 'Powered', dbus.Boolean(1))
>     except:
>         pass
>
>     adapter = dbus.Interface(nfc0, 'org.neard.Adapter')
>
>     for i in range(1000):
>         adapter.StartPollLoop('Initiator')
>         time.sleep(0.1)
>         adapter.StopPollLoop()
>         print(i)
>
> I believe the last time we tested this was around the 4.1 release.

Thanks for the info, Justin, but I was also seeking more information
at the kernel NFC subsystem and trf7970a driver level.  This patch
adds code inside an 'if' in the driver whose condition should never
evaluate to true but apparently it did.  How?

Thanks,

Mark
--
Re: ipv6: handle -EFAULT from skb_copy_bits
On Tue, Dec 20, 2016 at 01:28:13PM -0500, David Miller wrote:
> This has to do with the SKB buffer layout and geometry, not whether
> the packet is "fragmented" in the protocol sense.
>
> So no, this isn't a criteria for packets being filtered out by this
> point.
>
> Can you try to capture what sk->sk_socket->type and
> inet_sk(sk)->hdrincl are set to at the time of the crash?

type:3 hdrincl:0

Dave
Re: ipv6: handle -EFAULT from skb_copy_bits
On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones wrote:
> On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
> > From: Dave Jones
> > Date: Mon, 19 Dec 2016 19:40:13 -0500
> >
> > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
> > >
> > > > Unfortunately, this made no difference.  I spent some time today trying
> > > > to make a better reproducer, but failed. I'll revisit again tomorrow.
> > > >
> > > > Maybe I need >1 process/thread to trigger this.  That would explain why
> > > > I can trigger it with Trinity.
> > >
> > > scratch that last part, I finally just repro'd it with a single process.
> >
> > Thanks for the info, I'll try to think about this some more.
>
> I threw in some debug printks right before that BUG_ON.
> it's always this:
>
> skb->len=31 skb->data_len=0 offset:30 total_len:9

Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here,
off-by-one?
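The arithmetic in the reply above can be checked with a tiny model of the range test that skb_copy_bits() performs on entry. This is a sketch, not the kernel function: copy_range_faults() is a hypothetical name, and the copy length of 2 is taken from the "31 - 2" in the reply rather than from the debug output.

```c
/*
 * Hypothetical model of the entry check in skb_copy_bits():
 * a request for 'len' bytes starting at 'offset' is rejected when it would
 * run past the end of an skb holding 'skb_len' bytes.
 * Returns 1 when the copy would fail with -EFAULT, 0 otherwise.
 */
static int copy_range_faults(unsigned int skb_len, int offset, int len)
{
	return offset > (int)skb_len - len;
}
```

With skb_len = 31, offset = 30 and len = 2 the test is 30 > 29, so the copy is rejected; an offset of 29 with the same length would pass, which is why an off-by-one in 'offset' is a plausible suspect.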
Re: [PATCH net] be2net: Increase skb headroom size to 256 bytes
From: Suresh Reddy
Date: Tue, 20 Dec 2016 10:14:30 -0500

> From: Kalesh A P
>
> The driver currently allocates 128 bytes of skb headroom.
> This was found to be insufficient with some configurations
> like Geneve tunnels, which resulted in skb head reallocations.
>
> Increase the headroom to 256 bytes to fix this.
>
> Signed-off-by: Kalesh A P
> Signed-off-by: Suresh Reddy

Adding 128 bytes of headroom just for geneve seems excessive.

Do you really need to add that much?
Re: [mm PATCH 0/3] Page fragment updates
On Mon, Dec 5, 2016 at 12:11 PM, Andrew Morton wrote:
> On Mon, 5 Dec 2016 09:01:12 -0800 Alexander Duyck wrote:
>
>> On Tue, Nov 29, 2016 at 10:23 AM, Alexander Duyck wrote:
>> > This patch series takes care of a few cleanups for the page fragments API.
>> >
>> > ...
>>
>> It's been about a week since I submitted this series.  Just wanted to
>> check in and see if anyone had any feedback or if this is good to be
>> accepted for 4.10-rc1 with the rest of the set?
>
> Looks good to me.  I have it all queued for post-4.9 processing.

So I guess there is a small bug in the first patch in that I was
comparing a pointer to 0 instead of NULL.  Just wondering if I should
resubmit the first patch, the whole series, or if I need to just
submit an incremental patch.

Thanks.

- Alex
Re: [PATCH] net_sched: sch_netem: use rb_entry()
From: Geliang Tang
Date: Tue, 20 Dec 2016 22:02:16 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang

Applied.
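For context on why this conversion is purely cosmetic: in include/linux/rbtree.h, rb_entry() is a thin wrapper around container_of(), so both spellings recover the enclosing structure from an embedded tree node. The user-space sketch below illustrates that; struct rb_node_model, struct flow and flow_from_node() are hypothetical, and the container_of() here is a simplified version without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified user-space container_of(); the kernel version adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same definition the kernel uses: rb_entry() just names the intent. */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical stand-in for struct rb_node. */
struct rb_node_model {
	struct rb_node_model *left;
	struct rb_node_model *right;
};

/* Hypothetical object kept in a tree through an embedded node. */
struct flow {
	int id;
	struct rb_node_model node;
};

/* Map a tree node back to the flow that embeds it. */
static struct flow *flow_from_node(struct rb_node_model *n)
{
	return rb_entry(n, struct flow, node);
}
```

Since the two macros expand to the same thing, the benefit is readability: rb_entry() tells the reader the pointer came from an rbtree walk.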
Re: [PATCH] net_sched: sch_fq: use rb_entry()
From: Geliang Tang
Date: Tue, 20 Dec 2016 22:02:15 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang

Applied.
Re: [PATCH] net/mlx5: use rb_entry()
From: Geliang Tang
Date: Tue, 20 Dec 2016 22:02:14 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang

Applied.
Re: [PATCH] RDS: use rb_entry()
From: Geliang Tang
Date: Tue, 20 Dec 2016 22:02:18 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang

Applied.
Re: [PATCH] ethernet: sfc: Add Kconfig entry for vendor Solarflare
From: Tobias Klauser
Date: Tue, 20 Dec 2016 14:38:26 +0100

> Since commit
>
>   5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new
>   sfc-falcon driver")
>
> there are two drivers for Solarflare devices, but both still show up
> directly beneath "Ethernet driver support" in the Kconfig. Follow the
> pattern of other vendors and group them beneath an own vendor Kconfig
> entry for Solarflare.
>
> Cc: Edward Cree
> Signed-off-by: Tobias Klauser

Applied.
Re: [GIT PULL 00/29] perf/core improvements and fixes
* Arnaldo Carvalho de Melo <a...@kernel.org> wrote:

> Hi Ingo,
>
> Please consider pulling, I had most of this queued before your first
> pull req to Linus for 4.10, most are fixes, with 'perf sched timehist --idle'
> as a followup new feature to the 'perf sched timehist' command introduced in
> this window.
>
> One other thing that delayed this was the samples/bpf/ switch to
> tools/lib/bpf/ that involved fixing up merge clashes with net.git and also
> to properly test it, after more rounds than anticipated, but all seems ok
> now and would be good to get these merge issues past us ASAP.
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba:
>
>   Merge tag 'docs-4.10' of git://git.lwn.net/linux (2016-12-12 21:58:13 -0800)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161220
>
> for you to fetch changes up to 9899694a7f67714216665b87318eb367e2c5c901:
>
>   samples/bpf: Move open_raw_sock to separate header (2016-12-20 12:00:40 -0300)
>
> perf/core improvements and fixes:
>
> New features:
>
> - Introduce 'perf sched timehist --idle', to analyse processes
>   going to/from idle state (Namhyung Kim)
>
> Fixes:
>
> - Allow 'perf record -u user' to continue when facing races with threads
>   going away after having scanned them via /proc (Jiri Olsa)
>
> - Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)
>
> - Support jumps with multiple arguments (Ravi Bangoria)
>
> - Fix jumps to before the function where they are located (Ravi Bangoria)
>
> - Fix lock-pi help string (Davidlohr Bueso)
>
> - Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)
>
> - Do not overwrite valid build id in 'perf diff' (Kan Liang)
>
> - Don't throw error for zero length symbols, allowing the use of the TUI
>   in PowerPC, where such symbols became more common recently (Ravi Bangoria)
>
> Infrastructure:
>
> - Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf
>   duplication (Joe Stringer)
>
> - Move headers check into bash script (Jiri Olsa)
>
> Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
>
> Arnaldo Carvalho de Melo (3):
>       perf tools: Remove some needless __maybe_unused
>       samples/bpf: Make perf_event_read() static
>       samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
>
> Davidlohr Bueso (1):
>       perf bench futex: Fix lock-pi help string
>
> Jiri Olsa (7):
>       perf tools: Move headers check into bash script
>       perf mem: Fix --all-user/--all-kernel options
>       perf evsel: Use variable instead of repeating lengthy FD macro
>       perf thread_map: Add thread_map__remove function
>       perf evsel: Allow to ignore missing pid
>       perf record: Force ignore_missing_thread for uid option
>       perf trace: Check if MAP_32BIT is defined (again)
>
> Joe Stringer (8):
>       tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
>       tools lib bpf: use __u32 from linux/types.h
>       tools lib bpf: Add flags to bpf_create_map()
>       samples/bpf: Make samples more libbpf-centric
>       samples/bpf: Switch over to libbpf
>       tools lib bpf: Add bpf_prog_{attach,detach}
>       samples/bpf: Remove perf_event_open() declaration
>       samples/bpf: Move open_raw_sock to separate header
>
> Kan Liang (1):
>       perf diff: Do not overwrite valid build id
>
> Namhyung Kim (6):
>       perf sched timehist: Split is_idle_sample()
>       perf sched timehist: Introduce struct idle_time_data
>       perf sched timehist: Save callchain when entering idle
>       perf sched timehist: Skip non-idle events when necessary
>       perf sched timehist: Add -I/--idle-hist option
>       perf sched timehist: Show callchains for idle stat
>
> Ravi Bangoria (3):
>       perf annotate: Support jump instruction with target as second operand
>       perf annotate: Fix jump target outside of function address range
>       perf annotate: Don't throw error for zero length symbols
>
>  samples/bpf/Makefile      | 70 +--
>  samples/bpf/README.rst    |  4 +-
>  samples/bpf/bpf_load.c    | 21 +-
>  samples/bpf/bpf_load.h    |  3 +
>  samples/bpf/fds_example.c | 13 +-
>  samples/bpf/lathis
Re: nfc: trf7970a: Prevent repeated polling from crashing the kernel
On 20/12/16 11:59 -0700, Mark Greer wrote:
> On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> > From: Jaret Cantu
> >
> > Repeated polling attempts cause a NULL dereference error to occur.
> > This is because the state of the trf7970a is currently reading but
> > another request has been made to send a command before it has finished.
>
> How is this happening?  Was trf7970a_abort_cmd() called and it didn't
> work right?  Was it not called at all and there is a bug in the digital
> layer?  More details please.
>
> > The solution is to properly kill the waiting reading (workqueue)
> > before failing on the send.
>
> If the bug is in the calling code, then that is what should get fixed.
> This seems to be a hack to work-around a digital layer bug.

One of our uses of NFC is to begin polling to read a tag and then stop polling
(in order to save power) until we know via user interaction that we need to poll
again.  This is typically many minutes later so the power saving is pretty
significant.  However, it's possible that a user will remove the tag before
reading has completed.  We also detect this case and stop polling.  I can go
more into this if necessary but that is what exposed a panic.

You can reproduce using neard and python, in our testing it was very likely to
occur in 10-100 iterations of the following:

    #!/usr/bin/python
    import time

    import dbus

    bus = dbus.SystemBus()
    nfc0 = bus.get_object('org.neard', '/org/neard/nfc0')
    props = dbus.Interface(nfc0, 'org.freedesktop.DBus.Properties')

    try:
        props.Set('org.neard.Adapter', 'Powered', dbus.Boolean(1))
    except:
        pass

    adapter = dbus.Interface(nfc0, 'org.neard.Adapter')

    for i in range(1000):
        adapter.StartPollLoop('Initiator')
        time.sleep(0.1)
        adapter.StopPollLoop()
        print(i)

I believe the last time we tested this was around the 4.1 release.

--
Justin Bronder
Re: [PATCH 0/2] net: hix5hd2_gmac: keep the compatible string not changed
From: Dongpo Li
Date: Tue, 20 Dec 2016 10:09:27 +0800

> This patch series fix the patch:
> d0fb6ba75dc0 ("net: hix5hd2_gmac: add generic compatible string")
>
> The SoC hix5hd2 compatible string has the suffix "-gmac" and
> we should not change its compatible string.
> So we should name all the compatible string with the suffix "-gmac".
> Creating a new name suffix "-gemac" is unnecessary.

Series applied.
Re: [PATCH net 2/2] net: netcp: ethss: fix 10gbe host port tx pri map configuration
From: Murali Karicheri
Date: Mon, 19 Dec 2016 17:55:57 -0500

> From: WingMan Kwok
>
> This patch adds the missing 10gbe host port tx priority map
> configurations.
>
> Signed-off-by: WingMan Kwok
> Signed-off-by: Murali Karicheri
> Signed-off-by: Sekhar Nori

Applied.
Re: [PATCH net] openvswitch: Add a missing break statement.
From: Jarno Rajahalme
Date: Mon, 19 Dec 2016 17:06:33 -0800

> Add a break statement to prevent fall-through from
> OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
> actions setting ethernet addresses fail to validate with log messages
> complaining about invalid tunnel attributes.
>
> Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
> Signed-off-by: Jarno Rajahalme
> Acked-by: Pravin B Shelar
> Acked-by: Jiri Benc

Applied.
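The class of bug fixed here is ordinary C switch fall-through. The sketch below shows the failure mode in isolation; it is not the openvswitch code, and the enum, validate_buggy() and validate_fixed() are hypothetical names chosen to mirror the description in the commit message.

```c
/* Hypothetical attribute types standing in for the OVS_KEY_ATTR_* values. */
enum key_attr {
	KEY_ATTR_ETHERNET,
	KEY_ATTR_TUNNEL,
	KEY_ATTR_OTHER,
};

/* Without a break, the Ethernet case also runs the tunnel validation. */
static int validate_buggy(enum key_attr attr, int tunnel_ok)
{
	switch (attr) {
	case KEY_ATTR_ETHERNET:
		/* Ethernet-specific checks would go here. */
		/* fall through */
	case KEY_ATTR_TUNNEL:
		if (!tunnel_ok)
			return -1; /* reported as an invalid tunnel attribute */
		break;
	default:
		break;
	}
	return 0;
}

/* With the break, each case validates only its own attribute. */
static int validate_fixed(enum key_attr attr, int tunnel_ok)
{
	switch (attr) {
	case KEY_ATTR_ETHERNET:
		/* Ethernet-specific checks would go here. */
		break;
	case KEY_ATTR_TUNNEL:
		if (!tunnel_ok)
			return -1;
		break;
	default:
		break;
	}
	return 0;
}
```

In the buggy variant an Ethernet attribute with no valid tunnel data is rejected with a tunnel error, which matches the misleading log messages the commit message mentions.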
Re: [PATCH net 1/2] net: netcp: ethss: fix errors in ethtool ops
From: Murali Karicheri
Date: Mon, 19 Dec 2016 17:55:56 -0500

> From: WingMan Kwok
>
> In ethtool ops, it needs to retrieve the corresponding
> ethss module (gbe or xgbe) from the net_device structure.
> Prior to this patch, the retrieving procedure only
> checks for the gbe module.  This patch fixes the issue
> by checking the xgbe module if the net_device structure
> does not correspond to the gbe module.
>
> Signed-off-by: WingMan Kwok
> Signed-off-by: Murali Karicheri
> Signed-off-by: Sekhar Nori

Applied.
Re: [PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel
On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> From: Jaret Cantu
>
> Repeated polling attempts cause a NULL dereference error to occur.
> This is because the state of the trf7970a is currently reading but
> another request has been made to send a command before it has finished.

How is this happening?  Was trf7970a_abort_cmd() called and it didn't
work right?  Was it not called at all and there is a bug in the digital
layer?  More details please.

> The solution is to properly kill the waiting reading (workqueue)
> before failing on the send.

If the bug is in the calling code, then that is what should get fixed.
This seems to be a hack to work-around a digital layer bug.

Mark
--
Re: [PATCH net v4 0/4] fsl/fman: fixes for ARM
From: Madalin Bucur
Date: Mon, 19 Dec 2016 22:42:42 +0200

> The patch set fixes advertised speeds for QSGMII interfaces, disables
> A007273 erratum workaround on non-PowerPC platforms where it does not
> apply, enables compilation on ARM64 and addresses a probing issue on
> non PPC platforms.
>
> Changes from v3: removed redundant comment, added ack by Scott
> Changes from v2: merged fsl/fman changes to avoid a point of failure
> Changes from v1: unifying probing on all supported platforms

Series applied, thanks.
Re: Soft lockup in tc_classify
On 12/19/2016 7:58 PM, Cong Wang wrote: Hello, On Mon, Dec 19, 2016 at 8:39 AM, Shahar Kleinwrote: On 12/13/2016 12:51 AM, Cong Wang wrote: On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote: On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann wrote: Note that there's still the RCU fix missing for the deletion race that Cong will still send out, but you say that the only thing you do is to add a single rule, but no other operation in involved during that test? What's missing to have the deletion race fixed? making a patch or testing to a patch which was sent? If you think it would help for this problem, here is my patch rebased on the latest net-next. Again, I don't see how it could help this case yet, especially I don't see how we could have a loop in this singly linked list. I've applied cong's patch and hit a different lockup(full log attached): Are you sure this is really different? For me, it is still inside the loop in tc_classify(), with only a slightly different offset. Daniel suggested I'll add a print: case RTM_DELTFILTER: - err = tp->ops->delete(tp, fh); + printk(KERN_ERR "DEBUGG:SK %s:%d\n", __func__, __LINE__); + err = tp->ops->delete(tp, fh, ); if (err == 0) { and I couldn't see this print in the output. Hmm, that is odd, if this never prints, then my patch should not make any difference. There are still two other cases where we could change tp->next, so do you mind to add two more printk's for debugging? Attached is the delta patch. Thanks! 
I've added a slightly different debug print: @@ -368,11 +375,12 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n) if (tp_created) { RCU_INIT_POINTER(tp->next, rtnl_dereference(*back)); rcu_assign_pointer(*back, tp); + printk(KERN_ERR "DEBUGG:SK add/change filter by: %pf tp=%p tp->next=%p\n", tp->ops->get, tp, tp->next); } tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false); full output attached: [ 283.290271] Mirror/redirect action on [ 283.305031] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=9432d704df60 tp->next= (null) [ 283.322563] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=9436e718d240 tp->next= (null) [ 283.359997] GACT probability on [ 283.365923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=9436e718d3c0 tp->next=9436e718d240 [ 283.378725] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=9436e718d3c0 tp->next=9436e718d3c0 [ 283.391310] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=9436e718d3c0 tp->next=9436e718d3c0 [ 283.403923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=9436e718d3c0 tp->next=9436e718d3c0 [ 283.416542] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=9436e718d3c0 tp->next=9436e718d3c0 [ 308.538571] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[swapper/0:0]

Thanks
Shahar

[ 308.547322] Modules linked in: act_gact act_mirred openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 vfio_pci vfio_virqfd vfio_iommu_type1 vfio cls_flower mlx5_ib mlx5_core devlink sch_ingress nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun ebtable_filter ebtables ip6table_filter ip6_tables netconsole rpcrdma bridge ib_isert stp iscsi_target_mod llc ib_iser libiscsi scsi_transport_iscsi ib_srpt ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel kvm igb irqbypass joydev ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel ptp ipmi_si
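The log lines where tp and tp->next print the same address are enough to explain a soft lockup in a list walk. The sketch below is not the tc code: `struct demo_tp`, `demo_link` and `demo_walk` are hypothetical, RCU annotations are omitted, and it assumes (as one possible cause, not a confirmed one) that a node already at the head of the chain gets linked a second time.

```c
#include <stddef.h>

/* Minimal stand-in for struct tcf_proto: only the link field matters here. */
struct demo_tp {
	struct demo_tp *next;
};

/* Insert tp at position *back, mirroring the two assignments in the quoted
 * hunk: tp->next = *back; *back = tp; */
static void demo_link(struct demo_tp **back, struct demo_tp *tp)
{
	tp->next = *back;
	*back = tp;
}

/* Walk the chain the way a classify loop would, but give up after `limit`
 * steps. Returns the number of nodes visited, or -1 if the limit was hit. */
static int demo_walk(const struct demo_tp *head, int limit)
{
	int n = 0;

	for (; head; head = head->next) {
		if (n++ >= limit)
			return -1;
	}
	return n;
}
```

Linking two distinct nodes gives a chain of length 2. Calling `demo_link(&head, &b)` again while `b` is already the head sets `b.next = b`, and a walk without the step limit would never terminate, which is consistent with the repeated "tp=... tp->next=<same address>" lines followed by the watchdog report.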
Re: [PATCH net 0/3] Fix integration of eee-broken-modes
From: Jerome Brunet
Date: Mon, 19 Dec 2016 16:05:35 +0100

> The purpose of this series is to fix the integration of the ethernet phy
> property "eee-broken-modes" [0]
>
> The v3 of this series has been merged, missing a fix (error reported by
> kbuild robot) available in the v4 [1]
>
> More importantly, Florian opposed adding a DT property mapping a device
> register this directly [2]. The concern was that the property could be
> abused to implement platform configuration policy. After discussing it,
> I think we agreed that such information about the HW (defect) should appear
> in the platform DT. However, the preferred way is to add a boolean property
> for each EEE broken mode.
>
> [0]: http://lkml.kernel.org/r/1480326409-25419-1-git-send-email-jbru...@baylibre.com
> [1]: http://lkml.kernel.org/r/1480348229-25672-1-git-send-email-jbru...@baylibre.com
> [2]: http://lkml.kernel.org/r/e14a3b0c-dc34-be14-48b3-518a0ad0c...@gmail.com

Series applied, thank you.
Re: [PATCH perf/core REBASE 3/5] tools lib bpf: Add bpf_prog_{attach,detach}
On 20 December 2016 at 06:32, Arnaldo Carvalho de Melowrote: > Em Tue, Dec 20, 2016 at 11:18:51AM -0300, Arnaldo Carvalho de Melo escreveu: >> Em Wed, Dec 14, 2016 at 02:43:40PM -0800, Joe Stringer escreveu: >> > Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching >> > eBPF programs to cgroups") added these functions to samples/libbpf, but >> > during this merge all of the samples libbpf functionality is shifting to >> > tools/lib/bpf. Shift these functions there. >> > >> > Signed-off-by: Joe Stringer >> > --- >> > Arnaldo, this is a new patch you didn't previously review which I've >> > prepared due to the conflict with net-next. I figured it's better to try >> > to get samples/bpf properly switched over this window rather than defer the >> > problem and end up having to deal with another merge problem next time >> > around. I hope that is fine for you. If not, this patch onwards will need >> > to be dropped >> > >> > It's a simple copy/paste/delete with a minor change for sys_bpf() vs >> > syscall(). >> > --- >> > samples/bpf/libbpf.c | 21 - >> > samples/bpf/libbpf.h | 3 --- >> > tools/lib/bpf/bpf.c | 21 + >> > tools/lib/bpf/bpf.h | 3 +++ >> > 4 files changed, 24 insertions(+), 24 deletions(-) >> > >> > diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c >> > index 3391225ad7e9..d9af876b4a2c 100644 >> > --- a/samples/bpf/libbpf.c >> > +++ b/samples/bpf/libbpf.c >> > @@ -11,27 +11,6 @@ >> > #include >> > #include "libbpf.h" >> > >> > -int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type) >> > -{ >> > - union bpf_attr attr = { >> > - .target_fd = target_fd, >> > - .attach_bpf_fd = prog_fd, >> > - .attach_type = type, >> > - }; >> > - >> > - return syscall(__NR_bpf, BPF_PROG_ATTACH, , sizeof(attr)); >> >> This one makes it fail for CentOS 5 and 6, others may fail as well, >> still building, investigating... 
>
> Ok, fixed it by making it follow the model of the other sys_bpf wrappers
> setting up that bpf_attr union wrt initializing unnamed struct members:
>
>  int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
>  {
> -       union bpf_attr attr = {
> -               .target_fd = target_fd,
> -               .attach_bpf_fd = prog_fd,
> -               .attach_type = type,
> -       };
> +       union bpf_attr attr;
> +
> +       bzero(&attr, sizeof(attr));
> +       attr.target_fd = target_fd;
> +       attr.attach_bpf_fd = prog_fd;
> +       attr.attach_type = type;
>
>         return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
>  }

Ah, I just shifted these across originally so the delta would be minimal,
but now I know why this code is like this. Thanks.
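The pattern Arnaldo switched to (zero the whole union, then assign fields one by one) can be shown without kernel headers. This is a sketch under assumptions: `union demo_attr` is a hypothetical stand-in for `union bpf_attr`, `memset` is used in place of `bzero`, and the build failure on older distributions is taken from the report above rather than reproduced here.

```c
#include <string.h>

/* Hypothetical stand-in for union bpf_attr: a union whose fields live in
 * anonymous structs, which is the layout that matters for this pattern. */
union demo_attr {
	struct {                        /* view used by an "attach" command */
		unsigned int target_fd;
		unsigned int attach_bpf_fd;
		unsigned int attach_type;
	};
	struct {                        /* another command's view of the bytes */
		unsigned long long other[4];
	};
};

/* Clear every byte first so fields belonging to other views cannot carry
 * stack garbage, then set only the fields this command uses. No designated
 * initializer for anonymous-struct members is needed. */
static void demo_fill_attach(union demo_attr *attr, int prog_fd,
			     int target_fd, unsigned int type)
{
	memset(attr, 0, sizeof(*attr));
	attr->target_fd = (unsigned int)target_fd;
	attr->attach_bpf_fd = (unsigned int)prog_fd;
	attr->attach_type = type;
}
```

Besides sidestepping the initializer syntax, clearing the full union guarantees that bytes outside the three assigned fields are zero, whichever compiler is used.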
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Tue, Dec 20, 2016 at 10:36 AM, Daniel Mackwrote: > Hi, > > On 12/20/2016 06:23 PM, Andy Lutomirski wrote: >> On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack wrote: > >> To clarify, since this thread has gotten excessively long and twisted, >> I think it's important that, for hooks attached to a cgroup, you be >> able to tell in a generic way whether something is plugged into the >> hook. The natural way to see a cgroup's configuration is to read from >> cgroupfs, so I think that reading from cgroupfs should show you that a >> BPF program is attached and also give enough information that, once >> bpf programs become dumpable, you can dump the program (using the >> bpf() syscall or whatever). > > [...] > >> There isn't a big semantic difference between >> 'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(..., >> CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file", >> O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'. There is, however, a semantic >> difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY) >> because the permission check is much weaker. > > Okay, if you have such a control file, you can of course do something > like that. When we discussed things back then with Tejun however, we > concluded that a controller that is not completely controllable through > control knobs that can be written and read via cat is meaningless. > That's why this has become a 'hidden' cgroup feature. > > With your proposed API, you'd first go to the bpf(2) syscall in order to > get a prog fd, and then come back to some sort of cgroup API to put the > fd in there. That's quite a mix and match, which is why we considered > the API cleaner in its current form, as everything that is related to > bpf is encapsulated behind a single syscall. You already have to do bpf() to get a prog fd, then open() to get a cgroup fd, then bpf() or ioctl() to attach, so this isn't much different, and its exactly the same number of syscalls. 
> >> My preference would be to do an ioctl on a new >> /cgroup/NAME/network_hooks.inet_ingress file. Reading that file tells >> you whether something is attached and hopefully also gives enough >> information (a hash of the BPF program, perhaps) to dump the actual >> program using future bpf() interfaces. write() and ioctl() can be >> used to configure it as appropriate. > > So am I reading this right? You're proposing to add ioctl() hooks to > kernfs/cgroupfs? That would open more possibilities of course, but I'm > not sure where that rabbit hole leads us eventually. Indeed. I already have a test patch to add ioctl() to kernfs. Adding it to cgroupfs shouldn't be much more complicated. > >> Another option that I like less would be to have a >> /cgroup/NAME/cgroup.bpf that lists all the active hooks along with >> their contents. You would do an ioctl() on that to program a hook and >> you could read it to see what's there. > > Yes, read() could, in theory, give you similar information than ioctl(), > but in human-readable form. > >> FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too. >> It doesn't make a semantic difference, except that I dislike >> BPF_PROG_DETACH because that particular command isn't BPF-specific at >> all. > > Well, I think it is; it pops the bpf program from a target and drops the > reference on it. It's not much code, but it's certainly bpf-specific. I mean the interface isn't bpf-specific. If there was something that wasn't bpf attached to the target, you'd still want an API to detach it. > So if I set up a cgroup that's monitored and call it /cgroup/a and enable delegation and if the program running there wants to do its own monitoring in /cgroup/a/b (via delegation), then you really want the outer monitor to silently drop events coming from /cgroup/a/b? >>> >>> That's a fair point, and we've discussed it as well. 
The issue is, as >>> Alexei already pointed out, that we do not want to traverse the tree up >>> to the root for nested cgroups due to the runtime costs in the >>> networking fast-path. After all, we're running the bpf program for each >>> packet in flight. Hence, we opted for the approach to only look at the >>> leaf node for now, with the ability to open it up further in the future >>> using flags during attach etc. >> >> Careful here! You don't look only at the leaf node for now. You do a >> fancy traversal and choose the nearest node that has a hook set up. > > But we do the 'complex' operation at attach time or when a cgroup is > created, both of which are slow-path operations. In the fast-path, we > only look at the leaf, which may or may not have an effective program > installed. And that's of course much cheaper then doing the traversing > for each packet. You would never traverse the full hierarchy for each packet. You'd have a linked list of programs that are attached, kind of like how there's an "effective" array right now. I sent out
Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
On Tue, Dec 20, 2016 at 1:11 PM, Mark Greer wrote:
> Hi Geoff.
>
> Please put the version in your subjects when submitting anything but the
> initial version of a patch (e.g., [PATCH v2 1/3]).
>
> Which series do you want reviewed?
>
> Mark
> --

Sorry about the double posting, I had forgotten to erase the patches I
generated while rebasing and checking, and I'll have to figure out how
to add that v2 line to the automatically generated subject line if I
end up submitting another round.

Please review the three most recent patches, which have the send time
of 17:16.

Best Regards,
Geoff
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
Hi, On 12/20/2016 06:23 PM, Andy Lutomirski wrote: > On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mackwrote: > To clarify, since this thread has gotten excessively long and twisted, > I think it's important that, for hooks attached to a cgroup, you be > able to tell in a generic way whether something is plugged into the > hook. The natural way to see a cgroup's configuration is to read from > cgroupfs, so I think that reading from cgroupfs should show you that a > BPF program is attached and also give enough information that, once > bpf programs become dumpable, you can dump the program (using the > bpf() syscall or whatever). [...] > There isn't a big semantic difference between > 'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(..., > CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file", > O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'. There is, however, a semantic > difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY) > because the permission check is much weaker. Okay, if you have such a control file, you can of course do something like that. When we discussed things back then with Tejun however, we concluded that a controller that is not completely controllable through control knobs that can be written and read via cat is meaningless. That's why this has become a 'hidden' cgroup feature. With your proposed API, you'd first go to the bpf(2) syscall in order to get a prog fd, and then come back to some sort of cgroup API to put the fd in there. That's quite a mix and match, which is why we considered the API cleaner in its current form, as everything that is related to bpf is encapsulated behind a single syscall. > My preference would be to do an ioctl on a new > /cgroup/NAME/network_hooks.inet_ingress file. Reading that file tells > you whether something is attached and hopefully also gives enough > information (a hash of the BPF program, perhaps) to dump the actual > program using future bpf() interfaces. 
write() and ioctl() can be > used to configure it as appropriate. So am I reading this right? You're proposing to add ioctl() hooks to kernfs/cgroupfs? That would open more possibilities of course, but I'm not sure where that rabbit hole leads us eventually. > Another option that I like less would be to have a > /cgroup/NAME/cgroup.bpf that lists all the active hooks along with > their contents. You would do an ioctl() on that to program a hook and > you could read it to see what's there. Yes, read() could, in theory, give you similar information than ioctl(), but in human-readable form. > FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too. > It doesn't make a semantic difference, except that I dislike > BPF_PROG_DETACH because that particular command isn't BPF-specific at > all. Well, I think it is; it pops the bpf program from a target and drops the reference on it. It's not much code, but it's certainly bpf-specific. >>> So if I set up a cgroup that's monitored and call it /cgroup/a and >>> enable delegation and if the program running there wants to do its own >>> monitoring in /cgroup/a/b (via delegation), then you really want the >>> outer monitor to silently drop events coming from /cgroup/a/b? >> >> That's a fair point, and we've discussed it as well. The issue is, as >> Alexei already pointed out, that we do not want to traverse the tree up >> to the root for nested cgroups due to the runtime costs in the >> networking fast-path. After all, we're running the bpf program for each >> packet in flight. Hence, we opted for the approach to only look at the >> leaf node for now, with the ability to open it up further in the future >> using flags during attach etc. > > Careful here! You don't look only at the leaf node for now. You do a > fancy traversal and choose the nearest node that has a hook set up. But we do the 'complex' operation at attach time or when a cgroup is created, both of which are slow-path operations. 
In the fast-path, we only look at the leaf, which may or may not have an effective program installed. And that's of course much cheaper then doing the traversing for each packet. > mkdir /cgroup/foo > BPF_PROG_ATTACH(some program to foo) > mkdir /cgroup/foo/bar > chown -R some_user /cgroup/foo/bar > > If the kernel only looked at the leaf, then the program that did the > above would not expect that the program would constrain > /cgroup/foo/bar's activity. But, as it stands, the program *would* > expect /cgroup/foo/bar to be constrained, except that, whenever the > capable() check changes to ns_capable() (which will happen eventually > one way or another), then the bad guy can create /cgroup/foo/bar/baz, > install a new no-op hook there, and break the security assumption. > > IOW, I think that totally non-recursive hooks are okay from a security > perspective, albeit rather strange, but the current design is not okay > from a security perspective. We locked down the ability to override any of
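The split Daniel describes (resolve the nearest hook on the slow path, read one pointer on the fast path) and the override Andy is worried about can both be seen in a small model. Everything here is hypothetical: `struct demo_cgroup`, `demo_update_effective` and `demo_run` are illustrative names, programs are represented by strings, and the real cgroup-bpf code is not reproduced.

```c
#include <stddef.h>

struct demo_cgroup {
	struct demo_cgroup *parent;
	const char *prog;       /* program attached directly here, or NULL */
	const char *effective;  /* what the per-packet path would run */
};

/* Slow path (attach time or cgroup creation): a cgroup's effective program
 * is its own program if it has one, otherwise whatever its parent resolved
 * to. Parents must be updated before their children. */
static void demo_update_effective(struct demo_cgroup *cg)
{
	if (cg->prog)
		cg->effective = cg->prog;
	else
		cg->effective = cg->parent ? cg->parent->effective : NULL;
}

/* Fast path: a single pointer read, no walk up the hierarchy. */
static const char *demo_run(const struct demo_cgroup *cg)
{
	return cg->effective;
}
```

With a program on foo and none on foo/bar, bar inherits foo's program. Once foo/bar/baz attaches its own no-op, the fast path for baz runs only the no-op and foo's program no longer sees that traffic, which is the scenario from the quoted mkdir/chown example.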
Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
On Tue, Dec 20, 2016 at 01:29:13PM -0500, Geoff Lansberry wrote:
> On Tue, Dec 20, 2016 at 1:11 PM, Mark Greer wrote:
> > Hi Geoff.
> >
> > Please put the version in your subjects when submitting anything but the
> > initial version of a patch (e.g., [PATCH v2 1/3]).
> >
> > Which series do you want reviewed?
> >
> > Mark
> > --
> Sorry about the double posting, I had forgotten to erase the patches I
> generated while rebasing and checking, and I'll have to figure out how
> to add that v2 line to the automatically generated subject line if I
> end up submitting another round.

Hint: -v option of 'git format-patch'

> Please review the three most recent patches, which have the send time
> of 17:16.

Okay, thanks.

Mark
--
Re: mlx4: Bug in XDP_TX + 16 rx-queues
On Tue, Dec 20, 2016 at 02:02:05PM +0200, Tariq Toukan wrote: > Thanks Martin, nice catch! > > > On 20/12/2016 1:37 AM, Martin KaFai Lau wrote: > >Hi Tariq, > > > >On Sat, Dec 17, 2016 at 02:18:03AM -0800, Martin KaFai Lau wrote: > >>Hi All, > >> > >>I have been debugging with XDP_TX and 16 rx-queues. > >> > >>1) When 16 rx-queues is used and an XDP prog is doing XDP_TX, > >>it seems that the packet cannot be XDP_TX out if the pkt > >>is received from some particular CPUs (/rx-queues). > >> > >>2) If 8 rx-queues is used, it does not have problem. > >> > >>3) The 16 rx-queues problem also went away after reverting these > >>two patches: > >>15fca2c8eb41 net/mlx4_en: Add ethtool statistics for XDP cases > >>67f8b1dcb9ee net/mlx4_en: Refactor the XDP forwarding rings scheme > >> > >After taking a closer look at 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP > >forwarding rings scheme") > >and armed with the fact that '>8 rx-queues does not work', I have > >made the attached change that fixed the issue. > > > >Making change in mlx4_en_fill_qp_context() could be an easier fix > >but I think this change will be easier for discussion purpose. > > > >I don't want to lie that I know anything about how this variable > >works in CX3. If this change makes sense, I can cook up a diff. > >Otherwise, can you shed some light on what could be happening > >and hopefully can lead to a diff? > > > >Thanks > >--Martin > > > > > >diff --git i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > >w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > >index bcd955339058..b3bfb987e493 100644 > >--- i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > >+++ w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > >@@ -1638,10 +1638,10 @@ int mlx4_en_start_port(struct net_device *dev) > > > > /* Configure tx cq's and rings */ > > for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) { > >-u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1; > The bug lies in this line. 
> Number of rings per UP in case of TX_XDP should be priv->tx_ring_num[TX_XDP > ], not 1. > Please try the following fix. > I can prepare and send it once the window opens again. > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > index bcd955339058..edbe200ac2fa 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > @@ -1638,7 +1638,8 @@ int mlx4_en_start_port(struct net_device *dev) > > /* Configure tx cq's and rings */ > for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) { > - u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : > 1; > + u8 num_tx_rings_p_up = t == TX ? > + priv->num_tx_rings_p_up : priv->tx_ring_num[t]; > > for (i = 0; i < priv->tx_ring_num[t]; i++) { > /* Configure cq */ > Thanks for confirming the bug is related to the user_prio argument. I have some questions: 1. Just to confirm the intention of the change. Your change is essentially always passing 0 to the user_prio parameter for the TX_XDP type by doing (i / priv->tx_ring_num[t])? If yes, would it be clearer to always pass 0 instead? And yes, it also works in our test. Please post an offical patch if it is the fix. 2. Can you explain a little on how does the user_prio affect the tx behavior? e.g. What is the difference between different user_prio like 0, 1, 2...etc? 3. Mostly a follow up on (2). In mlx4_en_get_profile(), num_tx_rings_p_up (of the struct mlx4_en_profile) depends on mlx4_low_memory_profile() and number of cpu. Does these similar bounds apply to the 'u8 num_tx_rings_p_up' here for TX_XDP type? 
Thanks, Martin > >- > > for (i = 0; i < priv->tx_ring_num[t]; i++) { > > /* Configure cq */ > >+int user_prio; > >+ > > cq = priv->tx_cq[t][i]; > > err = mlx4_en_activate_cq(priv, cq, i); > > if (err) { > >@@ -1660,9 +1660,14 @@ int mlx4_en_start_port(struct net_device *dev) > > > > /* Configure ring */ > > tx_ring = priv->tx_ring[t][i]; > >+if (t != TX_XDP) > >+user_prio = i / priv->num_tx_rings_p_up; > >+else > >+user_prio = i & 0x07; > >+ > > err = mlx4_en_activate_tx_ring(priv, tx_ring, > >cq->mcq.cqn, > >- i / num_tx_rings_p_up); > >+ user_prio); > > if (err) { > > en_err(priv, "Failed allocating Tx ring\n"); > > mlx4_en_deactivate_cq(priv, cq); > Regards, > Tariq Toukan.
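The arithmetic behind the three variants discussed in this thread can be checked in isolation. This sketch makes two assumptions that are not stated outright in the messages: that valid user priorities are 0..7 (inferred from the `i & 0x07` mask and the observation that only more than 8 queues fail), and that there are 16 XDP TX rings, one per RX queue. The `demo_` helpers are hypothetical.

```c
/* Assumed for this sketch: user priorities range over 0..7. */
#define DEMO_NUM_UP 8

/* Original code: the divisor for the TX_XDP type was 1, so the ring index
 * itself was passed as user_prio. */
static int demo_prio_before(int i)
{
	return i / 1;
}

/* Martin's experimental change: mask the ring index into 0..7. */
static int demo_prio_masked(int i)
{
	return i & 0x07;
}

/* Tariq's proposed change: divide by the number of rings of this type,
 * which yields 0 for every valid ring index. */
static int demo_prio_divided(int i, int ring_num)
{
	return i / ring_num;
}

static int demo_prio_valid(int prio)
{
	return prio >= 0 && prio < DEMO_NUM_UP;
}
```

For ring 15 of 16, the original expression yields 15 (outside the assumed range), the mask yields 7, and the division yields 0, which is why Martin asks whether passing a constant 0 would express the intent more directly.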
Re: [PATCH net-next 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address
From: f...@ikuai8.com
Date: Mon, 19 Dec 2016 09:24:05 +0800

> It is sent again because the first email is sent during net-next closing.

It is still closed, and will not open again for at least one week.
Re: kernel/bpf/verifier.c: 4 * possible unintended fallthrough ?
On Tue, Dec 20, 2016 at 11:34 AM, David Binderman wrote:
> Hello there,
>
>> From: Alexei Starovoitov
>> I've tried 4.9 and 5.2 and don't see this warning.
>
> As expected - I used a development version of gcc. Latest released version is 6.2
>
>> Is this 6.x gcc?
>
> 7.0 would be more accurate.
>
>> I suspect it will have such warnings all over the kernel.
>
> Indeed it has hundreds, but the subject under discussion is
> file kernel/bpf/verifier.c.
>
> I am still not sure if I have found a fallthrough bug or not.

You haven't, this is intended so is a useless warning. Thanks,

Josef
Re: ipv6: handle -EFAULT from skb_copy_bits
From: Dave Jones
Date: Tue, 20 Dec 2016 13:17:28 -0500

> On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
> > From: Dave Jones
> > Date: Mon, 19 Dec 2016 19:40:13 -0500
> >
> > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
> > >
> > > > Unfortunately, this made no difference. I spent some time today trying
> > > > to make a better reproducer, but failed. I'll revisit again tomorrow.
> > > >
> > > > Maybe I need >1 process/thread to trigger this. That would explain why
> > > > I can trigger it with Trinity.
> > >
> > > scratch that last part, I finally just repro'd it with a single process.
> >
> > Thanks for the info, I'll try to think about this some more.
>
> I threw in some debug printks right before that BUG_ON.
> it's always this:
>
> skb->len=31 skb->data_len=0 offset:30 total_len:9
>
> Shouldn't we have kicked out data_len=0 skb's somewhere before we got this
> far ?

skb->data_len is just the length of any non-linear data in the SKB.

This has to do with the SKB buffer layout and geometry, not whether
the packet is "fragmented" in the protocol sense.

So no, this isn't a criteria for packets being filtered out by this
point.

Can you try to capture what sk->sk_socket->type and
inet_sk(sk)->hdrincl are set to at the time of the crash?

Thanks.
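The numbers in the debug output can be put through a simple range check to see why a copy of total_len bytes at that offset fails. This is a sketch only: `struct demo_skb` and the helpers are hypothetical, and the check mirrors the arithmetic of a bounds test rather than the kernel's actual skb_copy_bits() implementation.

```c
#include <errno.h>

/* Hypothetical, simplified view of an skb: total length, and the portion
 * of it held in non-linear (paged or fragment) storage. */
struct demo_skb {
	unsigned int len;
	unsigned int data_len;
};

/* Bytes in the linear area. data_len == 0 only means that all bytes are
 * linear; it does not make the packet empty or invalid. */
static unsigned int demo_headlen(const struct demo_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Range check in the spirit of a copy helper: refuse to copy `len` bytes
 * starting at `offset` if that would run past the end of the packet. */
static int demo_copy_check(const struct demo_skb *skb, int offset, int len)
{
	if (offset < 0 || len < 0 || offset > (int)skb->len - len)
		return -EFAULT;
	return 0;
}
```

With len=31 and data_len=0 all 31 bytes are linear, so the skb itself is ordinary. The failure comes from the request: offset 30 plus 9 bytes ends at byte 39, past the 31 available, so the check returns -EFAULT.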
Re: ipv6: handle -EFAULT from skb_copy_bits
On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
> From: Dave Jones
> Date: Mon, 19 Dec 2016 19:40:13 -0500
>
> > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
> >
> > > Unfortunately, this made no difference. I spent some time today trying
> > > to make a better reproducer, but failed. I'll revisit again tomorrow.
> > >
> > > Maybe I need >1 process/thread to trigger this. That would explain why
> > > I can trigger it with Trinity.
> >
> > scratch that last part, I finally just repro'd it with a single process.
>
> Thanks for the info, I'll try to think about this some more.

I threw in some debug printks right before that BUG_ON.
it's always this:

skb->len=31 skb->data_len=0 offset:30 total_len:9

Shouldn't we have kicked out data_len=0 skb's somewhere before we got this
far ?

Dave
Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
Hi Geoff. Please put the version in your subjects when submitting anything but the initial version of a patch (e.g., [PATCH v2 1/3]). Which series do you want reviewed? Mark --
Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
On 2016-12-20 17:16, Geoff Lansberry wrote: > From: Geoff Lansberry> > The TRF7970A has configuration options to support hardware designs > which use a 27.12MHz clock. This commit adds a device tree option > 'clock-frequency' to support configuring the this chip for default > 13.56MHz clock or the optional 27.12MHz clock. > --- > .../devicetree/bindings/net/nfc/trf7970a.txt | 4 ++ > drivers/nfc/trf7970a.c | 50 > +- > 2 files changed, 43 insertions(+), 11 deletions(-) > > diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt > b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt > index 32b35a0..e262ac1 100644 > --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt > +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt > @@ -21,6 +21,8 @@ Optional SoC Specific Properties: > - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum >where an extra byte is returned by Read Multiple Block commands issued >to Type 5 tags. > +- clock-frequency: Set to specify that the input frequency to the trf7970a > is 1356Hz or 2712Hz > + You're adding an empty line here that is removed in the next patch. > > Example (for ARM-based BeagleBone with TRF7970A on SPI1): > > @@ -43,6 +45,8 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1): > irq-status-read-quirk; > en2-rf-quirk; > t5t-rmb-extra-byte-quirk; > + vdd_io_1v8; This does not belong here, and so no need to remove in the next patch. 
> + clock-frequency = <2712>; > status = "okay"; > }; > }; > diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c > index 26c9dbb..4e051e9 100644 > --- a/drivers/nfc/trf7970a.c > +++ b/drivers/nfc/trf7970a.c > @@ -124,6 +124,9 @@ >NFC_PROTO_ISO15693_MASK | NFC_PROTO_NFC_DEP_MASK) > > #define TRF7970A_AUTOSUSPEND_DELAY 3 /* 30 seconds */ > +#define TRF7970A_13MHZ_CLOCK_FREQUENCY 1356 > +#define TRF7970A_27MHZ_CLOCK_FREQUENCY 2712 > + > > #define TRF7970A_RX_SKB_ALLOC_SIZE 256 > > @@ -1056,12 +1059,11 @@ static int trf7970a_init(struct trf7970a *trf) > > trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON; > > - ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0); > + ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, > + trf->modulator_sys_clk_ctrl); > if (ret) > goto err_out; > > - trf->modulator_sys_clk_ctrl = 0; > - > ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS, > TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 | > TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32); > @@ -1181,27 +1183,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a > *trf, int tech) > switch (tech) { > case NFC_DIGITAL_RF_TECH_106A: > trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106; > - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK; > + trf->modulator_sys_clk_ctrl = > + (trf->modulator_sys_clk_ctrl & 0xF8) | > + TRF7970A_MODULATOR_DEPTH_OOK; > trf->guard_time = TRF7970A_GUARD_TIME_NFCA; > break; > case NFC_DIGITAL_RF_TECH_106B: > trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106; > - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10; > + trf->modulator_sys_clk_ctrl = > + (trf->modulator_sys_clk_ctrl & 0xF8) | > + TRF7970A_MODULATOR_DEPTH_ASK10; > trf->guard_time = TRF7970A_GUARD_TIME_NFCB; > break; > case NFC_DIGITAL_RF_TECH_212F: > trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212; > - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10; > + trf->modulator_sys_clk_ctrl = > + (trf->modulator_sys_clk_ctrl & 0xF8) | > + 
TRF7970A_MODULATOR_DEPTH_ASK10; > trf->guard_time = TRF7970A_GUARD_TIME_NFCF; > break; > case NFC_DIGITAL_RF_TECH_424F: > trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424; > - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10; > + trf->modulator_sys_clk_ctrl = > + (trf->modulator_sys_clk_ctrl & 0xF8) | > + TRF7970A_MODULATOR_DEPTH_ASK10; > trf->guard_time = TRF7970A_GUARD_TIME_NFCF; > break; > case NFC_DIGITAL_RF_TECH_ISO15693: > trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648; > - trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK; > + trf->modulator_sys_clk_ctrl = > + (trf->modulator_sys_clk_ctrl &
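The repeated `(trf->modulator_sys_clk_ctrl & 0xF8) | depth` expression in the hunks above is a read-modify-write that replaces only the low three bits of the cached register value. A minimal sketch follows; the constant values and the position of the clock-select flag are assumptions chosen for illustration, not the driver's definitions.

```c
#include <stdint.h>

/* Illustrative values only; the real driver constants are not shown in the
 * quoted hunks. */
#define DEMO_DEPTH_MASK   0x07u   /* low three bits: modulation depth */
#define DEMO_DEPTH_ASK10  0x00u
#define DEMO_DEPTH_OOK    0x01u
#define DEMO_CLK_27MHZ    0x80u   /* assumed position of a clock-select flag */

/* Keep the upper five bits of the cached register value (where a clock
 * setting would live) and replace only the modulation depth field. */
static uint8_t demo_set_depth(uint8_t reg, uint8_t depth)
{
	return (uint8_t)((reg & 0xF8u) | (depth & DEMO_DEPTH_MASK));
}
```

The point of the mask is that a clock option set once at probe time survives each later RF technology change, where the previous plain assignment would have overwritten it.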
Re: [PATCH] ethernet: sfc: Add Kconfig entry for vendor Solarflare
On 20/12/16 13:38, Tobias Klauser wrote:
> Since commit
>
> 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new
> sfc-falcon driver")
>
> there are two drivers for Solarflare devices, but both still show up
> directly beneath "Ethernet driver support" in the Kconfig. Follow the
> pattern of other vendors and group them beneath an own vendor Kconfig
> entry for Solarflare.
>
> Cc: Edward Cree
> Signed-off-by: Tobias Klauser

Acked-by: Edward Cree
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack wrote:
> Hi,
>
> On 12/20/2016 04:50 AM, Andy Lutomirski wrote:
>> You mean BPF_CGROUP_RUN_PROG_INET_SOCK(sk)? There is nothing bpf
>> specific about the hook except that the name of this macro has "BPF" in
>> it. There is nothing whatsoever that's bpf-specific about the context
>> -- sk is not bpf-specific at all.
>>
>> The only thing bpf-specific about it is that it currently only invokes
>> bpf programs. That could easily change.
>
> I'm not sure if I follow. The code as it currently stands only supports
> attaching bpf programs to cgroups which have been created using
> BPF_PROG_LOAD. If cgroups would support other program types in the
> future, then they would need to be stored in different data types
> anyway, and the bpf syscall multiplexer would be the wrong entry point
> to access them anyway.

To clarify, since this thread has gotten excessively long and twisted,
I think it's important that, for hooks attached to a cgroup, you be
able to tell in a generic way whether something is plugged into the
hook. The natural way to see a cgroup's configuration is to read from
cgroupfs, so I think that reading from cgroupfs should show you that a
BPF program is attached and also give enough information that, once
bpf programs become dumpable, you can dump the program (using the
bpf() syscall or whatever).

Obviously the interface to *attach* a BPF program to a hook will need
to be at least a little bit BPF-specific. But there's nothing
particularly BPF-specific about detaching, and if a control file were
to exist, writing "detach" or "none" to it seems natural.

>
> Whether we add bpf-specific code to the cgroup file parsers or
> cgroup-specific code to the bpf layer does not make much of a semantic
> difference, does it?
> As a matter of fact, my very first implementation
> of this patch set implemented a cgroup controller that would allow
> writing strings like "ingress 5" to its control file, where 5 is the fd
> number that came out of BPF_PROG_LOAD. The main reason we decided to
> ditch that was that echoing fd numbers into a text file seemed way worse
> than going through a proper syscall layer with it, and ioctls are
> unavailable on pseudo-fs.

There isn't a big semantic difference between
'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(...,
CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file",
O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'. There is, however, a semantic
difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY)
because the permission check is much weaker.

The reason I suggest ioctl() and not write() is that write() MUST
NOT make its behavior depend on the caller's credentials, file table,
etc. Imagine what would happen if you did
'sudo -u eviltext >/cgroup/NAME/control.file'. (This particular mistake
has been repeated many times in the kernel, in drivers, networking,
namespaces, core code, etc, and it's resulted in a big pile of
privilege escalation bugs.)

So write("bpf:") is safe (but unusably awkward, I think), whereas
write("bpf:fd 5") is unsafe.

> b) make it possible to extend the functionality in the future by adding
> flags to the command struct etc.
>
> And I hoped we achieved that after discussing it for so long.
>
>> How about slowing down a wee bit and trying to come up with cgroup
>> hook semantics that work for all of these use cases?
>
> I'm all for discussing things, but I don't think this was done in a rush.
>
> I do agree though that adding functionality to cgroups that is not
> limited to resource control is a delicate thing to do, which is why I
> cc'ed cgroups@ in my patches. I should have also added linux-api@ I
> guess, sorry I missed that.
>
> The idea was rather to allow attaching bpf programs to other things than
> just cgroups as well, which is why we called the member of 'union
> bpf_attr' 'target_fd', and a cgroup is just one type of target here.

I would make that a separate operation. If someone adds the ability
to attach an ebpf program to, say, seccomp (I'm quite sure this will
happen eventually), it should be attached using seccomp(), not bpf(),
for example. The people writing seccomp filters will thank you for
making the syscall in question reflect what object (the cgroup, for
example) is being modified.

>
>>> i'm assuming 'baadf00d' is bpf program fd expressed a text string?
>>> and kernel needs to parse above? will you allow capital and lower
>>> case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not?
>>> can program fd expressed as decimal or hex or both?
>>> how do you return the error? as a text string for user space
>>> to parse?
>>
>> No. The kernel does not parse it because you cannot write this to the
>> file. You set a bpf filter with ioctl and pass an fd.
>
> An ioctl on what file, exactly?

There are at least two plausible models. My preference would be to do
an ioctl on a new /cgroup/NAME/network_hooks.inet_ingress file.
Re: wl1251 & mac address & calibration data
* Kalle Valo [161220 09:12]:
> Tony Lindgren writes:
>
> > * Kalle Valo [161220 03:47]:
> >> Arend Van Spriel writes:
> >>
> >> > On 18-12-2016 13:09, Pali Rohár wrote:
> >> >
> >> >> File wl1251-nvs.bin is provided by linux-firmware package and contains
> >> >> default data which should be overridden by model specific calibrated
> >> >> data.
> >> >
> >> > Ah. Someone thought it was a good idea to provide the "one ring to rule
> >> > them all". Nice.
> >>
> >> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
> >> renamed to wl1251-nvs.bin.example, or something like that, as it should
> >> be only installed to a real system only if there's no real calibration
> >> data available (only for developers to use, not real users).
> >
> > Makes sense to me. Note that with the recent changes to wlcore, we can
> > now easily provide board specific calibration firmware simply by adding a
> > new compatible value. So for n900, we could have something like
> > compatible = "ti,wl1251-n900" and have it point to n900 specific calibration
> > file wl1251-nvs-n900.bin. Of course this won't help with the mac address,
> > or any of the device specific data..
> >
> > That is assuming the calibration values are the same for each similar
> > device and don't have to be generated for each device. And naturally wl1251
> > needs similar changes done to make use of devices specific calibration files.
>
> No, these are unique per each sold device. Every N900 was calibrated at
> the factory and they all have different calibration data which is stored
> to the flash. So when N900 boots (and in _every_ boot) it has to load
> the calibration data from the flash and provide it to the wl1251 driver
> somehow.

Urgh, OK. So much for that idea then.

Thanks,

Tony
Re: wl1251 & mac address & calibration data
Tony Lindgren writes:

> * Kalle Valo [161220 03:47]:
>> Arend Van Spriel writes:
>>
>> > On 18-12-2016 13:09, Pali Rohár wrote:
>> >
>> >> File wl1251-nvs.bin is provided by linux-firmware package and contains
>> >> default data which should be overridden by model specific calibrated
>> >> data.
>> >
>> > Ah. Someone thought it was a good idea to provide the "one ring to rule
>> > them all". Nice.
>>
>> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
>> renamed to wl1251-nvs.bin.example, or something like that, as it should
>> be only installed to a real system only if there's no real calibration
>> data available (only for developers to use, not real users).
>
> Makes sense to me. Note that with the recent changes to wlcore, we can
> now easily provide board specific calibration firmware simply by adding a
> new compatible value. So for n900, we could have something like
> compatible = "ti,wl1251-n900" and have it point to n900 specific calibration
> file wl1251-nvs-n900.bin. Of course this won't help with the mac address,
> or any of the device specific data..
>
> That is assuming the calibration values are the same for each similar
> device and don't have to be generated for each device. And naturally wl1251
> needs similar changes done to make use of devices specific calibration files.

No, these are unique per each sold device. Every N900 was calibrated at
the factory and they all have different calibration data which is stored
to the flash. So when N900 boots (and in _every_ boot) it has to load
the calibration data from the flash and provide it to the wl1251 driver
somehow.

-- 
Kalle Valo
[PATCH 19/29] samples/bpf: Make samples more libbpf-centric
From: Joe Stringer

Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allows the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.

Committer notes:

Testing it:

On a fedora rawhide container, with clang/llvm 3.9, sharing the host
linux kernel git tree:

  # make O=/tmp/build/linux/ headers_install
  # make O=/tmp/build/linux -C samples/bpf/

Since I forgot to make it privileged, just tested it outside the
container, using what it generated:

  # uname -a
  Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
  # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
  # ls -la offwaketime
  -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
  # file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
  # readelf -SW offwaketime_kern.o | grep PROGBITS
  [ 2] .text PROGBITS 40 00 00 AX 0 0 4
  [ 3] kprobe/try_to_wake_up PROGBITS 40 d8 00 AX 0 0 8
  [ 5] tracepoint/sched/sched_switch PROGBITS 000118 000318 00 AX 0 0 8
  [ 7] maps PROGBITS 000430 50 00 WA 0 0 4
  [ 8] license PROGBITS 000480 04 00 WA 0 0 1
  [ 9] version PROGBITS 000484 04 00 WA 0 0 4
  # ./offwaketime | head -5
  swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
  CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5 firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13 JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2 # Signed-off-by: Joe Stringer Tested-by: Arnaldo Carvalho de Melo Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Wang Nan Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20161214224342.12858-2-...@ovn.org Signed-off-by: Arnaldo Carvalho de Melo --- samples/bpf/bpf_load.c| 17 +--- samples/bpf/bpf_load.h| 3 +++ samples/bpf/fds_example.c | 9 --- samples/bpf/lathist_user.c| 2 +- samples/bpf/libbpf.c | 23 samples/bpf/libbpf.h | 18 ++--- samples/bpf/lwt_len_hist_user.c | 6 +++-- samples/bpf/offwaketime_user.c| 8 +++--- samples/bpf/sampleip_user.c | 4 +-- samples/bpf/sock_example.c| 12 + samples/bpf/sockex1_user.c| 6 ++--- samples/bpf/sockex2_user.c| 4 +-- samples/bpf/sockex3_user.c| 4 +-- samples/bpf/spintest_user.c | 8 +++--- samples/bpf/tc_l2_redirect_user.c | 4 +-- samples/bpf/test_cgrp2_array_pin.c| 4 +-- samples/bpf/test_cgrp2_attach.c | 11 +--- samples/bpf/test_cgrp2_attach2.c | 7 +++-- samples/bpf/test_cgrp2_sock.c | 6 +++-- samples/bpf/test_current_task_under_cgroup_user.c | 8 +++--- samples/bpf/test_lru_dist.c | 32 +++ samples/bpf/test_probe_write_user_user.c | 2 +- samples/bpf/trace_event_user.c| 14 +- samples/bpf/trace_output_user.c | 2 +- samples/bpf/tracex2_user.c| 10 +++ samples/bpf/tracex3_user.c
[PATCH v3] stmmac: enable rx queues
When the hardware is synthesized with multiple queues, all queues are
disabled for default. This patch adds the rx queues configuration.
This patch was successfully tested in a Synopsys QoS Reference design.

Signed-off-by: Joao Pinto
---
changes v2 -> v3 (Seraphin Bonnaffe):
- GMAC_RX_QUEUE_CLEAR macro simplified
changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
- Instead of using number of DMA channels, lets use number of queues
- Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
- Make sure that the RX queue related bits are cleared before setting
- Check if rx_queue_enable is available before executing

 drivers/net/ethernet/stmicro/stmmac/common.h | 5 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h | 8
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c | 5 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++
 5 files changed, 52 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index b13a144..6c96291 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -323,6 +323,9 @@ struct dma_features {
 	/* TX and RX number of channels */
 	unsigned int number_rx_channel;
 	unsigned int number_tx_channel;
+	/* TX and RX number of queues */
+	unsigned int number_rx_queues;
+	unsigned int number_tx_queues;
 	/* Alternate (enhanced) DESC mode */
 	unsigned int enh_desc;
 };
@@ -454,6 +457,8 @@ struct stmmac_ops {
 	void (*core_init)(struct mac_device_info *hw, int mtu);
 	/* Enable and verify that the IPC module is supported */
 	int (*rx_ipc)(struct mac_device_info *hw);
+	/* Enable RX Queues */
+	void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
 	/* Dump MAC registers */
 	void (*dump_regs)(struct mac_device_info *hw);
 	/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h index 3e8d4fe..b524598 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h @@ -22,6 +22,7 @@ #define GMAC_HASH_TAB_32_630x0014 #define GMAC_RX_FLOW_CTRL 0x0090 #define GMAC_QX_TX_FLOW_CTRL(x)(0x70 + x * 4) +#define GMAC_RXQ_CTRL0 0x00a0 #define GMAC_INT_STATUS0x00b0 #define GMAC_INT_EN0x00b4 #define GMAC_PCS_BASE 0x00e0 @@ -44,6 +45,11 @@ #define GMAC_MAX_PERFECT_ADDRESSES 128 +/* MAC RX Queue Enable */ +#define GMAC_RX_QUEUE_CLEAR(queue) ~(GENMASK(1, 0) << ((queue) * 2)) +#define GMAC_RX_AV_QUEUE_ENABLE(queue) BIT((queue) * 2) +#define GMAC_RX_DCB_QUEUE_ENABLE(queue)BIT(((queue) * 2) + 1) + /* MAC Flow Control RX */ #define GMAC_RX_FLOW_CTRL_RFE BIT(0) @@ -133,6 +139,8 @@ enum power_event { /* MAC HW features2 bitmap */ #define GMAC_HW_FEAT_TXCHCNT GENMASK(21, 18) #define GMAC_HW_FEAT_RXCHCNT GENMASK(15, 12) +#define GMAC_HW_FEAT_TXQCNTGENMASK(9, 6) +#define GMAC_HW_FEAT_RXQCNTGENMASK(3, 0) /* MAC HW ADDR regs */ #define GMAC_HI_DCSGENMASK(18, 16) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c index eaed7cb..ecfbf57 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c @@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int mtu) writel(value, ioaddr + GMAC_INT_EN); } +static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue) +{ + void __iomem *ioaddr = hw->pcsr; + u32 value = readl(ioaddr + GMAC_RXQ_CTRL0); + + value &= GMAC_RX_QUEUE_CLEAR(queue); + value |= GMAC_RX_AV_QUEUE_ENABLE(queue); + + writel(value, ioaddr + GMAC_RXQ_CTRL0); +} + static void dwmac4_dump_regs(struct mac_device_info *hw) { void __iomem *ioaddr = hw->pcsr; @@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x) static const struct stmmac_ops dwmac4_ops = { .core_init = 
dwmac4_core_init, .rx_ipc = dwmac4_rx_ipc_enable, + .rx_queue_enable = dwmac4_rx_queue_enable, .dump_regs = dwmac4_dump_regs, .host_irq_status = dwmac4_irq_status, .flow_ctrl = dwmac4_flow_ctrl, diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c index 8196ab5..377d1b4 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c @@ -303,6 +303,11 @@ static
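The dwmac4_rx_queue_enable() hunk above is a read-modify-write: it clears the two-bit field belonging to the queue, then sets the AV enable bit. A userspace sketch of the same arithmetic on a plain 32-bit value follows. BIT() and GENMASK() are re-defined locally because the kernel headers are not available here, the RX_* names are shortened stand-ins for the GMAC_RX_* macros, and the function name and the queue < 16 bound are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and GENMASK() on a 32-bit word. */
#define BIT(n)        (1u << (n))
#define GENMASK(h, l) (((~0u) << (l)) & (~0u >> (31 - (h))))

/* Same shape as the GMAC_RX_* macros in the patch, two bits per queue. */
#define RX_QUEUE_CLEAR(q)      (~(GENMASK(1, 0) << ((q) * 2)))
#define RX_AV_QUEUE_ENABLE(q)  BIT((q) * 2)
#define RX_DCB_QUEUE_ENABLE(q) BIT(((q) * 2) + 1)

/*
 * Compute the new register value: wipe the queue's field first so a
 * previously set DCB bit does not linger, then enable the queue as AV.
 * The caller would do the readl()/writel() around this.
 */
static uint32_t rxq_ctrl0_enable_av(uint32_t value, unsigned int queue)
{
	assert(queue < 16); /* only 16 two-bit fields fit in 32 bits */
	value &= RX_QUEUE_CLEAR(queue);
	value |= RX_AV_QUEUE_ENABLE(queue);
	return value;
}
```

Clearing before setting matters when the field already holds the DCB encoding: starting from 0x8 (queue 1 enabled as DCB), the result for queue 1 is 0x4 and not 0xc.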
Re: wl1251 & mac address & calibration data
On Tuesday 20 December 2016 17:56:58 Tony Lindgren wrote:
> * Kalle Valo [161220 03:47]:
> > Arend Van Spriel writes:
> > > On 18-12-2016 13:09, Pali Rohár wrote:
> > >> File wl1251-nvs.bin is provided by linux-firmware package and
> > >> contains default data which should be overridden by model
> > >> specific calibrated data.
> > >
> > > Ah. Someone thought it was a good idea to provide the "one ring
> > > to rule them all". Nice.
> >
> > Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git
> > should be renamed to wl1251-nvs.bin.example, or something like
> > that, as it should be only installed to a real system only if
> > there's no real calibration data available (only for developers to
> > use, not real users).
>
> Makes sense to me. Note that with the recent changes to wlcore, we
> can now easily provide board specific calibration firmware simply by
> adding a new compatible value. So for n900, we could have something
> like compatible = "ti,wl1251-n900" and have it point to n900
> specific calibration file wl1251-nvs-n900.bin. Of course this won't
> help with the mac address, or any of the device specific data..
>
> That is assuming the calibration values are the same for each similar
> device and don't have to be generated for each device. And naturally
> wl1251 needs similar changes done to make use of devices specific
> calibration files.
>
> Regards,
>
> Tony

As I wrote in another thread, "wl1251 NVS calibration data format", the
calibration data for wl1251 (wl1251-nvs.bin) also contains the MAC
address, which the kernel sends to the wl1251 chip. The kernel just does
not use it.

So... my idea now is:

1) extend the request_firmware function family with the ability to use a
   userspace helper first and fall back to VFS

2) teach wl1251.ko to parse the MAC address from wl1251-nvs.bin and use
   it (in case it is not empty or 00:00:20:07:03:09, which is what the
   example file in the linux-firmware package contains)

3) write a Nokia n900 specific userspace helper for providing data when
   the kernel requests wl1251-nvs.bin. The userspace helper reads the MAC
   address and calibration data from CAL, places the MAC address into
   the calibration data and sends it to the kernel.

Are you OK with this idea?

-- 
Pali Rohár
pali.ro...@gmail.com
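Step 2 of the proposal above depends on deciding whether the address found in wl1251-nvs.bin is worth using. A minimal sketch of that check, assuming the six address bytes have already been extracted in transmission order: the thread does not give the offset or byte order inside the NVS file, so extraction is left out, and the function name is hypothetical rather than something from wl1251.ko.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Placeholder address the thread attributes to the linux-firmware example file. */
static const uint8_t nvs_example_mac[MAC_LEN] = {
	0x00, 0x00, 0x20, 0x07, 0x03, 0x09
};

/*
 * Return true only if the address is neither empty (all zero) nor the
 * placeholder from the example file. Hypothetical helper, not driver code.
 */
static bool nvs_mac_is_usable(const uint8_t mac[MAC_LEN])
{
	static const uint8_t zero_mac[MAC_LEN]; /* zero-initialised */

	assert(mac != NULL);
	if (memcmp(mac, zero_mac, MAC_LEN) == 0)
		return false;
	if (memcmp(mac, nvs_example_mac, MAC_LEN) == 0)
		return false;
	return true;
}
```

A driver following this idea would presumably fall back to its existing address handling whenever the check returns false.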
[PATCH 25/29] samples/bpf: Switch over to libbpf
From: Joe Stringer

Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid of most of the libbpf library here.

Committer notes:

Built it in a docker fedora rawhide container and ran it in the f25 host,
seems to work just like it did before this patch, i.e. the switch to
tools/lib/bpf/ doesn't seem to have introduced problems and Joe said he
tested it with all the entries in samples/bpf/ and other code he found:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
  [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
  CHK include/config/kernel.release
  HOSTCC scripts/basic/fixdep
  GEN ./Makefile
  CHK include/generated/uapi/linux/version.h
  Using /git/linux as source for kernel
  CHK include/generated/utsrelease.h
  HOSTCC scripts/basic/bin2c
  HOSTCC arch/x86/tools/relocs_32.o
  HOSTCC arch/x86/tools/relocs_64.o
  LD samples/bpf/built-in.o
  HOSTCC samples/bpf/fds_example.o
  HOSTCC samples/bpf/sockex1_user.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
  insns, insns_cnt, "GPL", 0,
  ^
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
  from /git/linux/samples/bpf/bpf_load.h:4,
  from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
  int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
  ^~~~
  HOSTCC samples/bpf/sockex2_user.o
  HOSTCC samples/bpf/xdp_tx_iptunnel_user.o
  clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include
-I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ -Wno-compare-distinct-pointer-types \ -Wno-gnu-variable-sized-type-not-at-end \ -Wno-address-of-packed-member -Wno-tautological-compare \ -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o HOSTLD samples/bpf/tc_l2_redirect HOSTLD samples/bpf/lwt_len_hist HOSTLD samples/bpf/xdp_tx_iptunnel make[1]: Leaving directory '/tmp/build/linux' [root@f5065a7d6272 linux]# And then, in the host: [root@jouet bpf]# mount | grep "docker.*devicemapper\/" /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota) [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/ [root@jouet bpf]# file offwaketime offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped [root@jouet bpf]# readelf -SW offwaketime offwaketime offwaketime_kern.o offwaketime_user.o [root@jouet bpf]# readelf -SW offwaketime_kern.o There are 11 section headers, starting at offset 0x700: Section Headers: [Nr] Name TypeAddress OffSize ES Flg Lk Inf Al [ 0] NULL 00 00 00 0 0 0 [ 1] .strtab STRTAB 000658 a8 00 0 0 1 [ 2] .text PROGBITS 40 00 00 AX 0 0 4 [ 3] kprobe/try_to_wake_up PROGBITS 40 d8 00 AX 0 0 8 [ 4] .relkprobe/try_to_wake_up REL 0005a8 20 10 10 3 8 [ 5] tracepoint/sched/sched_switch PROGBITS 000118 000318 00 AX 0 0 8 [ 6] 
.reltracepoint/sched/sched_switch REL 0005c8 90 10 10 5 8 [ 7] maps PROGBITS 000430 50 00 WA 0 0 4 [ 8] license PROGBITS 000480 04 00 WA 0 0 1 [ 9] version PROGBITS
[GIT PULL 00/29] perf/core improvements and fixes
Hi Ingo,

Please consider pulling, I had most of this queued before your first
pull req to Linus for 4.10, most are fixes, with 'perf sched timehist
--idle' as a followup new feature to the 'perf sched timehist' command
introduced in this window.

One other thing that delayed this was the samples/bpf/ switch to
tools/lib/bpf/ that involved fixing up merge clashes with net.git and
also to properly test it, after more rounds than anticipated, but all
seems ok now and it would be good to get these merge issues past us ASAP.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba:

  Merge tag 'docs-4.10' of git://git.lwn.net/linux (2016-12-12 21:58:13 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161220

for you to fetch changes up to 9899694a7f67714216665b87318eb367e2c5c901:

  samples/bpf: Move open_raw_sock to separate header (2016-12-20 12:00:40 -0300)

perf/core improvements and fixes:

New features:

- Introduce 'perf sched timehist --idle', to analyse processes going
  to/from idle state (Namhyung Kim)

Fixes:

- Allow 'perf record -u user' to continue when facing races with threads
  going away after having scanned them via /proc (Jiri Olsa)

- Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)

- Support jumps with multiple arguments (Ravi Bangoria)

- Fix jumps to before the function where they are located (Ravi Bangoria)

- Fix lock-pi help string (Davidlohr Bueso)

- Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)

- Do not overwrite valid build id in 'perf diff' (Kan Liang)

- Don't throw error for zero length symbols, allowing the use of the TUI
  in PowerPC, where such symbols became more common recently (Ravi Bangoria)

Infrastructure:

- Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf
  duplication (Joe Stringer)

- Move headers check into bash script (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> Arnaldo Carvalho de Melo (3): perf tools: Remove some needless __maybe_unused samples/bpf: Make perf_event_read() static samples/bpf: Be consistent with bpf_load_program bpf_insn parameter Davidlohr Bueso (1): perf bench futex: Fix lock-pi help string Jiri Olsa (7): perf tools: Move headers check into bash script perf mem: Fix --all-user/--all-kernel options perf evsel: Use variable instead of repeating lengthy FD macro perf thread_map: Add thread_map__remove function perf evsel: Allow to ignore missing pid perf record: Force ignore_missing_thread for uid option perf trace: Check if MAP_32BIT is defined (again) Joe Stringer (8): tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h tools lib bpf: use __u32 from linux/types.h tools lib bpf: Add flags to bpf_create_map() samples/bpf: Make samples more libbpf-centric samples/bpf: Switch over to libbpf tools lib bpf: Add bpf_prog_{attach,detach} samples/bpf: Remove perf_event_open() declaration samples/bpf: Move open_raw_sock to separate header Kan Liang (1): perf diff: Do not overwrite valid build id Namhyung Kim (6): perf sched timehist: Split is_idle_sample() perf sched timehist: Introduce struct idle_time_data perf sched timehist: Save callchain when entering idle perf sched timehist: Skip non-idle events when necessary perf sched timehist: Add -I/--idle-hist option perf sched timehist: Show callchains for idle stat Ravi Bangoria (3): perf annotate: Support jump instruction with target as second operand perf annotate: Fix jump target outside of function address range perf annotate: Don't throw error for zero length symbols samples/bpf/Makefile | 70 +-- samples/bpf/README.rst| 4 +- samples/bpf/bpf_load.c| 21 +- samples/bpf/bpf_load.h| 3 + samples/bpf/fds_example.c | 13 +- samples/bpf/lathist_user.c| 2 +- samples/bpf/libbpf.c | 176 --- samples/bpf/libbpf.h | 28 +- samples/bpf/lwt_len_hist_user.c | 6 +- samples/bpf/offwaketime_user.c| 8 +- 
samples/bpf/sampleip_user.c | 7 +- samples/bpf/sock_example.c| 14 +- samples/bpf/sock_example.h| 35 ++ samples/bpf/sockex1_user.c| 7 +- samples/bpf/sockex2_user.c| 5 +- samples/bpf/sockex3_user.c
Re: [PATCH v2] stmmac: enable rx queues
On 12/20/2016 at 4:51 PM, Seraphin BONNAFFE wrote:
> Hi Joao,
>
> Please find two more comments below.
>
> Regards,
> Séraphin
>
>
> On 12/20/2016 05:27 PM, Joao Pinto wrote:
>> When the hardware is synthesized with multiple queues, all queues are
>> disabled for default. This patch adds the rx queues configuration.
>> This patch was successfully tested in a Synopsys QoS Reference design.
>>
>> Signed-off-by: Joao Pinto
>> ---
>> changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
>> - Instead of using number of DMA channels, lets use number of queues
>> - Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
>> - Make sure that the RX queue related bits are cleared before setting
>> - Check if rx_queue_enable is available before executing
>> stmmac_mac_enable_rx_queues()
>>
>> drivers/net/ethernet/stmicro/stmmac/common.h | 5 +
>> drivers/net/ethernet/stmicro/stmmac/dwmac4.h | 9 +
>> drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12
>> drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c | 5 +
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22
>> ++
>> 5 files changed, 53 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h
>> b/drivers/net/ethernet/stmicro/stmmac/common.h
>> index b13a144..6c96291 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/common.h
>> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
>> @@ -323,6 +323,9 @@ struct dma_features {
>> /* TX and RX number of channels */
>> unsigned int number_rx_channel;
>> unsigned int number_tx_channel;
>> +/* TX and RX number of queues */
>> +unsigned int number_rx_queues;
>> +unsigned int number_tx_queues;
>> /* Alternate (enhanced) DESC mode */
>> unsigned int enh_desc;
>> };
>> @@ -454,6 +457,8 @@ struct stmmac_ops {
>> void (*core_init)(struct mac_device_info *hw, int mtu);
>> /* Enable and verify that the IPC module is supported */
>> int (*rx_ipc)(struct mac_device_info *hw);
>> +/* Enable RX Queues */
>> +void (*rx_queue_enable)(struct
mac_device_info *hw, u32 queue); >> /* Dump MAC registers */ >> void (*dump_regs)(struct mac_device_info *hw); >> /* Handle extra events on specific interrupts hw dependent */ >> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h >> b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h >> index 3e8d4fe..7d88517 100644 >> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h >> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h >> @@ -22,6 +22,7 @@ >> #define GMAC_HASH_TAB_32_630x0014 >> #define GMAC_RX_FLOW_CTRL0x0090 >> #define GMAC_QX_TX_FLOW_CTRL(x)(0x70 + x * 4) >> +#define GMAC_RXQ_CTRL00x00a0 >> #define GMAC_INT_STATUS0x00b0 >> #define GMAC_INT_EN0x00b4 >> #define GMAC_PCS_BASE0x00e0 >> @@ -44,6 +45,12 @@ >> >> #define GMAC_MAX_PERFECT_ADDRESSES128 >> >> +/* MAC RX Queue Enable */ >> +#define GMAC_RX_QUEUE_CLEAR(queue)~(BIT((queue) * 2) \ >> +| BIT(((queue) * 2) + 1)) > > > What would you think about ~(GENMASK(1, 0) << ((queue) * 2))) instead ? > Slightly more readable in my humble opinion. 
More readable indeed :) > > >> +#define GMAC_RX_AV_QUEUE_ENABLE(queue)BIT((queue) * 2) >> +#define GMAC_RX_DCB_QUEUE_ENABLE(queue)BIT(((queue) * 2) + 1) >> + >> /* MAC Flow Control RX */ >> #define GMAC_RX_FLOW_CTRL_RFEBIT(0) >> >> @@ -133,6 +140,8 @@ enum power_event { >> /* MAC HW features2 bitmap */ >> #define GMAC_HW_FEAT_TXCHCNTGENMASK(21, 18) >> #define GMAC_HW_FEAT_RXCHCNTGENMASK(15, 12) >> +#define GMAC_HW_FEAT_TXQCNTGENMASK(9, 6) >> +#define GMAC_HW_FEAT_RXQCNTGENMASK(3, 0) >> >> /* MAC HW ADDR regs */ >> #define GMAC_HI_DCSGENMASK(18, 16) >> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c >> b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c >> index eaed7cb..ecfbf57 100644 >> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c >> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c >> @@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, >> int mtu) >> writel(value, ioaddr + GMAC_INT_EN); >> } >> >> +static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue) >> +{ >> +void __iomem *ioaddr = hw->pcsr; >> +u32 value = readl(ioaddr + GMAC_RXQ_CTRL0); >> + >> +value &= GMAC_RX_QUEUE_CLEAR(queue); >> +value |= GMAC_RX_AV_QUEUE_ENABLE(queue); >> + >> +writel(value, ioaddr + GMAC_RXQ_CTRL0); >> +} >> + >> static void dwmac4_dump_regs(struct mac_device_info *hw) >> { >> void __iomem *ioaddr = hw->pcsr; >> @@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct >> stmmac_extra_stats *x) >> static const struct stmmac_ops dwmac4_ops = { >> .core_init = dwmac4_core_init, >>
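The review comment in this message proposes writing the clear mask as a shifted GENMASK(1, 0) instead of two OR-ed BIT() terms. A small sketch that checks the two spellings produce the same mask for every queue index that fits in a 32-bit word follows. BIT() and GENMASK() are local stand-ins for the kernel macros, and CLEAR_TWO_BITS, CLEAR_GENMASK and clear_masks_agree are names invented for this comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and GENMASK() on a 32-bit word. */
#define BIT(n)        (1u << (n))
#define GENMASK(h, l) (((~0u) << (l)) & (~0u >> (31 - (h))))

/* Form used in v2 of the patch: two single-bit masks OR-ed together. */
#define CLEAR_TWO_BITS(q) (~(BIT((q) * 2) | BIT(((q) * 2) + 1)))
/* Form suggested in the review: a two-bit mask shifted into place. */
#define CLEAR_GENMASK(q)  (~(GENMASK(1, 0) << ((q) * 2)))

/* Return 1 if both forms agree for queues 0..15, 0 otherwise. */
static int clear_masks_agree(void)
{
	for (unsigned int q = 0; q < 16; q++)
		if (CLEAR_TWO_BITS(q) != CLEAR_GENMASK(q))
			return 0;
	return 1;
}
```

Both macros here wrap the whole expression in parentheses, which keeps the leading ~ from binding unexpectedly if the macro is ever used inside a larger expression.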
Re: wl1251 & mac address & calibration data
* Kalle Valo [161220 03:47]:
> Arend Van Spriel writes:
>
> > On 18-12-2016 13:09, Pali Rohár wrote:
> >
> >> File wl1251-nvs.bin is provided by linux-firmware package and contains
> >> default data which should be overriden by model specific calibrated
> >> data.
> >
> > Ah. Someone thought it was a good idea to provide the "one ring to rule
> > them all". Nice.
>
> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
> renamed to wl1251-nvs.bin.example, or something like that, as it should
> be only installed to a real system only if there's no real calibration
> data available (only for developers to use, not real users).

Makes sense to me. Note that with the recent changes to wlcore, we can now easily provide board specific calibration firmware simply by adding a new compatible value. So for n900, we could have something like compatible = "ti,wl1251-n900" and have it point to the n900 specific calibration file wl1251-nvs-n900.bin.

Of course this won't help with the mac address, or any of the device specific data. That is assuming the calibration values are the same for each similar device and don't have to be generated for each device.

And naturally wl1251 needs similar changes done to make use of device specific calibration files.

Regards,

Tony
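The compatible-to-calibration-file idea above can be sketched as a small lookup table. This is only an illustration in plain C, not the wlcore or wl1251 code: the table, the `nvs_file_for()` helper, the generic `ti,wl1251` entry and the fallback behaviour are all assumptions. Only the strings `ti,wl1251-n900`, `wl1251-nvs-n900.bin` and `wl1251-nvs.bin` come from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a device tree compatible string to the
 * calibration (NVS) file a driver would request for that board. */
struct nvs_map {
	const char *compatible;
	const char *nvs_file;
};

static const struct nvs_map nvs_table[] = {
	{ "ti,wl1251-n900", "wl1251-nvs-n900.bin" },	/* board specific data */
	{ "ti,wl1251",      "wl1251-nvs.bin" },		/* generic default data (assumed entry) */
};

/* Return the board specific file, or the generic one if the
 * compatible string is not in the table. */
static const char *nvs_file_for(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(nvs_table) / sizeof(nvs_table[0]); i++)
		if (strcmp(nvs_table[i].compatible, compatible) == 0)
			return nvs_table[i].nvs_file;
	return "wl1251-nvs.bin";
}
```

As the message notes, a table like this only covers data shared by all boards of one model; per-device values such as the MAC address would still need another source.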
Re: [PATCH v2] stmmac: enable rx queues
Hi Joao,

Please find two more comments below.

Regards,
Séraphin

On 12/20/2016 05:27 PM, Joao Pinto wrote:
> When the hardware is synthesized with multiple queues, all queues are
> disabled for default. This patch adds the rx queues configuration.
> This patch was successfully tested in a Synopsys QoS Reference design.
>
> Signed-off-by: Joao Pinto
> ---
> changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
> - Instead of using number of DMA channels, lets use number of queues
> - Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
> - Make sure that the RX queue related bits are cleared before setting
> - Check if rx_queue_enable is available before executing
> stmmac_mac_enable_rx_queues()
>
>  drivers/net/ethernet/stmicro/stmmac/common.h      |  5 +
>  drivers/net/ethernet/stmicro/stmmac/dwmac4.h      |  9 +
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |  5 +
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++
>  5 files changed, 53 insertions(+)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
> index b13a144..6c96291 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/common.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
> @@ -323,6 +323,9 @@ struct dma_features {
>  	/* TX and RX number of channels */
>  	unsigned int number_rx_channel;
>  	unsigned int number_tx_channel;
> +	/* TX and RX number of queues */
> +	unsigned int number_rx_queues;
> +	unsigned int number_tx_queues;
>  	/* Alternate (enhanced) DESC mode */
>  	unsigned int enh_desc;
>  };
> @@ -454,6 +457,8 @@ struct stmmac_ops {
>  	void (*core_init)(struct mac_device_info *hw, int mtu);
>  	/* Enable and verify that the IPC module is supported */
>  	int (*rx_ipc)(struct mac_device_info *hw);
> +	/* Enable RX Queues */
> +	void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
>  	/* Dump MAC registers */
>  	void (*dump_regs)(struct mac_device_info *hw);
>  	/* Handle extra events on specific interrupts hw dependent */
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
> index 3e8d4fe..7d88517 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
> @@ -22,6 +22,7 @@
>  #define GMAC_HASH_TAB_32_63	0x0014
>  #define GMAC_RX_FLOW_CTRL	0x0090
>  #define GMAC_QX_TX_FLOW_CTRL(x)	(0x70 + x * 4)
> +#define GMAC_RXQ_CTRL0	0x00a0
>  #define GMAC_INT_STATUS	0x00b0
>  #define GMAC_INT_EN	0x00b4
>  #define GMAC_PCS_BASE	0x00e0
> @@ -44,6 +45,12 @@
>
>  #define GMAC_MAX_PERFECT_ADDRESSES	128
>
> +/* MAC RX Queue Enable */
> +#define GMAC_RX_QUEUE_CLEAR(queue)	~(BIT((queue) * 2) \
> +					| BIT(((queue) * 2) + 1))

What would you think about ~(GENMASK(1, 0) << ((queue) * 2))) instead ?
Slightly more readable in my humble opinion.

> +#define GMAC_RX_AV_QUEUE_ENABLE(queue)	BIT((queue) * 2)
> +#define GMAC_RX_DCB_QUEUE_ENABLE(queue)	BIT(((queue) * 2) + 1)
> +
>  /* MAC Flow Control RX */
>  #define GMAC_RX_FLOW_CTRL_RFE	BIT(0)
>
> @@ -133,6 +140,8 @@ enum power_event {
>  /* MAC HW features2 bitmap */
>  #define GMAC_HW_FEAT_TXCHCNT	GENMASK(21, 18)
>  #define GMAC_HW_FEAT_RXCHCNT	GENMASK(15, 12)
> +#define GMAC_HW_FEAT_TXQCNT	GENMASK(9, 6)
> +#define GMAC_HW_FEAT_RXQCNT	GENMASK(3, 0)
>
>  /* MAC HW ADDR regs */
>  #define GMAC_HI_DCS	GENMASK(18, 16)
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
> index eaed7cb..ecfbf57 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
> @@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int mtu)
>  	writel(value, ioaddr + GMAC_INT_EN);
>  }
>
> +static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
> +{
> +	void __iomem *ioaddr = hw->pcsr;
> +	u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
> +
> +	value &= GMAC_RX_QUEUE_CLEAR(queue);
> +	value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
> +
> +	writel(value, ioaddr + GMAC_RXQ_CTRL0);
> +}
> +
>  static void dwmac4_dump_regs(struct mac_device_info *hw)
>  {
>  	void __iomem *ioaddr = hw->pcsr;
> @@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x)
>  static const struct stmmac_ops dwmac4_ops = {
>  	.core_init = dwmac4_core_init,
>  	.rx_ipc = dwmac4_rx_ipc_enable,
> +	.rx_queue_enable = dwmac4_rx_queue_enable,
>  	.dump_regs = dwmac4_dump_regs,
>  	.host_irq_status = dwmac4_irq_status,
>  	.flow_ctrl = dwmac4_flow_ctrl,
> diff --git
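To check that the GENMASK form suggested in the review clears the same two bits as the form in the patch, a minimal userspace sketch can compare them directly. The `BIT()` and `GENMASK()` definitions here are 32-bit stand-ins written for this example, not the kernel headers, and the `RX_QUEUE_CLEAR_*` names are hypothetical. Note that the suggestion as posted carries one extra closing parenthesis; the sketch uses the balanced form.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT() and GENMASK() helpers,
 * restricted to 32-bit values. */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Form used in the patch. */
#define RX_QUEUE_CLEAR_BITS(queue) \
	(~(BIT((queue) * 2) | BIT(((queue) * 2) + 1)))

/* Form suggested in the review, with balanced parentheses. */
#define RX_QUEUE_CLEAR_MASK(queue) \
	(~(GENMASK(1, 0) << ((queue) * 2)))

/* Return 1 if both forms agree for queues 0..15, the range that fits
 * two bits per queue in a 32-bit register; return 0 otherwise. */
static int rx_queue_clear_forms_agree(void)
{
	uint32_t queue;

	for (queue = 0; queue < 16; queue++)
		if (RX_QUEUE_CLEAR_BITS(queue) != RX_QUEUE_CLEAR_MASK(queue))
			return 0;
	return 1;
}
```

Both macros are fully parenthesized here so that they can be used safely inside larger expressions, which the unparenthesized `~(...)` form in the patch does not guarantee.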
RE: [RFC PATCH net-next v4 2/2] macb: Enable 1588 support in SAMA5Dx platforms.
From: Andrei Pistirica [mailto:andrei.pistir...@microchip.com]
Sent: 14 December 2016 13:56
> This patch does the following:
> - Enable HW time stamp for the following platforms: SAMA5D2, SAMA5D3 and
>   SAMA5D4.
> - HW time stamp capabilities are advertised via ethtool and macb ioctl is
>   updated accordingly.
> - HW time stamp on the PTP Ethernet packets are received using the
>   SO_TIMESTAMPING API. Where timers are obtained from the PTP event/peer
>   registers.
>
> Note: Patch on net-next, on December 7th.
>
> Signed-off-by: Andrei Pistirica
> ---
> Patch history:
>
> Version 1:
> Integration with SAMA5D2 only. This feature wasn't tested on any other
> platform that might use cadence/gem.
>
> Patch is not completely ported to the very latest version of net-next, and it
> will be after review.
>
> Version 2 modifications:
> - add PTP caps for SAMA5D2/3/4 platforms
> - and cosmetic changes
>
> Version 3 modifications:
> - add support for sama5D2/3/4 platforms using GEM-PTP interface.
>
> Version 4 modifications:
> - time stamp only PTP_V2 events
> - maximum adjustment value is set based on Richard's input
>
> Note: Patch on net-next, on December 14th.
>
>  drivers/net/ethernet/cadence/macb.c | 168 ++--
>  1 file changed, 163 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 538544a..8d5c976 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -714,6 +714,8 @@ static void macb_tx_interrupt(struct
Re: Which ethtool methods should I implement?
On 12/19/2016 07:40 PM, Florian Fainelli wrote:
> Ideally, everything that is supported by your HW, but I would start with
> the basic essential stuff that you would need in case someone reports
> problems with your driver like:
>
> - statistics (MAC for sure) and PHY (if possible), -S
> - ability to restart auto-negotiation (-r)
> - reporting of driver information (-i)
> - support toggling and reporting NETIF_F_* features -k/-K

Thanks, I'll get this done soon.

I'm confused about netdev_set_default_ethtool_ops(). Is this a function that drivers are supposed to call? I only see one driver use it. Other drivers just set netdev->ethtool_ops manually.

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH] net_sched: sch_fq: use rb_entry()
On Tue, 2016-12-20 at 22:02 +0800, Geliang Tang wrote:
> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang
> ---
>  net/sched/sch_fq.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Acked-by: Eric Dumazet

Thanks.
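For readers unfamiliar with the two helpers, the change is purely about readability: in the kernel, rb_entry() is defined in terms of container_of(). A minimal userspace sketch of the relationship follows. The macros are simplified stand-ins (the kernel's container_of() adds type checking), and `struct flow` and `flow_from_node()` are hypothetical names for illustration, not taken from sch_fq.c.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel helpers. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Minimal node type standing in for the kernel's struct rb_node. */
struct rb_node {
	struct rb_node *left, *right;
};

/* Hypothetical structure embedding an rbtree node. */
struct flow {
	int id;
	struct rb_node node;
};

/* Recover the enclosing flow from a pointer to its embedded node.
 * rb_entry() makes it obvious that the pointer is an rbtree node. */
static struct flow *flow_from_node(struct rb_node *n)
{
	return rb_entry(n, struct flow, node);
}
```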
Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf
The limit of 17 is just based on the hardware. Specifically the olinfo field in the Tx descriptor has a minimum length of 17 as a requirement. The hardware itself is supposed to be capable of padding short frames that are supposed to be transmitted. The drivers are supposed to pad short frames on receive to get them up to 60 bytes.

When you are seeing this issue are you sending frames from the VF to one of the local interfaces on the same port or to an external interface? Also are you receiving on another Linux ixgbevf driver or are you receiving the packet using a different driver interface such as DPDK?

I'm just wanting to verify this as it is possible that the memory leak you are seeing is on the receiver and not on the source if you are transmitting to a local VF or the PF as the receiver will have to pad the frame in such a case to get it up to 60 bytes.

- Alex

On Tue, Dec 20, 2016 at 3:50 AM, Weilong Chen wrote:
> Hi,
>
> Thanks for you reply.
> We test you patch, but the problem is still there, it seems do not work.
>
> I'm not sure why ixgbe use the limit 17. The kenel use ETH_ZLEN (60) with
> out FCS. A lot of drivers such as e1000 use it. Any explaination?
>
> Thanks.
>
>
> On 2016/12/16 0:13, Alexander Duyck wrote:
>>
>> On Thu, Dec 15, 2016 at 3:40 AM, Weilong Chen
>> wrote:
>>>
>>> Nessus report the vf appears to leak memory in network packets.
>>> Fix this by padding all small packets manually.
>>>
>>> And the CVE-2003-0001.
>>>
>>> https://ofirarkin.files.wordpress.com/2008/11/atstake_etherleak_report.pdf
>>>
>>> Signed-off-by: Weilong Chen
>>> ---
>>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>>> index 6d4bef5..137a154 100644
>>> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>>> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
>>> @@ -3654,6 +3654,13 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>>>  		return NETDEV_TX_OK;
>>>  	}
>>>
>>> +	/* On PCI/PCI-X HW, if packet size is less than ETH_ZLEN,
>>> +	 * packets may get corrupted during padding by HW.
>>> +	 * To WA this issue, pad all small packets manually.
>>> +	 */
>>> +	if (eth_skb_pad(skb))
>>> +		return NETDEV_TX_OK;
>>> +
>>
>>
>> So the patch description for this probably isn't correct. It looks
>> like the problem isn't leaking data it is the fact that the frames
>> aren't being padded to prevent malicious events. The only issue is
>> the patch is padding by a bit too much. I would recommend replacing
>> this with the following from ixgbe:
>>
>> 	/*
>> 	 * The minimum packet size for olinfo paylen is 17 so pad the skb
>> 	 * in order to meet this minimum size requirement.
>> 	 */
>> 	if (skb_put_padto(skb, 17))
>> 		return NETDEV_TX_OK;
>>
>>
>>>  	tx_ring = adapter->tx_ring[skb->queue_mapping];
>>>
>>>  	/* need: 1 descriptor per page * PAGE_SIZE/IXGBE_MAX_DATA_PER_TXD,
>>> --
>>> 1.7.12
>>>
>>
>> .
>>
>
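The two padding targets discussed in this thread (17 bytes for the descriptor requirement, 60 bytes for ETH_ZLEN) differ only in the minimum length; the mechanism is the same. A minimal userspace sketch of that mechanism follows. `pad_frame()` is a hypothetical stand-in written for this example and is not the kernel's skb_put_padto() or eth_skb_pad(); it works on a plain byte buffer rather than an skb.

```c
#include <stddef.h>
#include <string.h>

#define MIN_TX_LEN	17	/* minimum length quoted in the thread for olinfo paylen */
#define ETH_ZLEN	60	/* minimum Ethernet frame length without FCS */

/*
 * Pad frame[0..len) with zeros up to min_len, assuming the buffer has
 * room for cap bytes. Returns the new length, or 0 if the padded frame
 * would not fit. The pad bytes are explicitly zeroed, so stale buffer
 * contents are never sent on the wire.
 */
static size_t pad_frame(unsigned char *frame, size_t len, size_t cap,
			size_t min_len)
{
	if (len >= min_len)
		return len;
	if (min_len > cap)
		return 0;
	memset(frame + len, 0, min_len - len);
	return min_len;
}
```

The explicit zeroing is the part that matters for the Etherleak class of issue: whichever side ends up extending a short frame, the added bytes must not come from uninitialized memory.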
Re: [PATCH] RDS: use rb_entry()
On 12/20/2016 6:02 AM, Geliang Tang wrote:
> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang
> ---

Looks fine.

Acked-by: Santosh Shilimkar
[PATCH v2] stmmac: enable rx queues
When the hardware is synthesized with multiple queues, all queues are
disabled for default. This patch adds the rx queues configuration.
This patch was successfully tested in a Synopsys QoS Reference design.

Signed-off-by: Joao Pinto
---
changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
- Instead of using number of DMA channels, lets use number of queues
- Create 2 flavors of RX queue enable Macros: AV and DCB (AV by default)
- Make sure that the RX queue related bits are cleared before setting
- Check if rx_queue_enable is available before executing
stmmac_mac_enable_rx_queues()

 drivers/net/ethernet/stmicro/stmmac/common.h      |  5 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h      |  9 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |  5 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++
 5 files changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index b13a144..6c96291 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -323,6 +323,9 @@ struct dma_features {
 	/* TX and RX number of channels */
 	unsigned int number_rx_channel;
 	unsigned int number_tx_channel;
+	/* TX and RX number of queues */
+	unsigned int number_rx_queues;
+	unsigned int number_tx_queues;
 	/* Alternate (enhanced) DESC mode */
 	unsigned int enh_desc;
 };
@@ -454,6 +457,8 @@ struct stmmac_ops {
 	void (*core_init)(struct mac_device_info *hw, int mtu);
 	/* Enable and verify that the IPC module is supported */
 	int (*rx_ipc)(struct mac_device_info *hw);
+	/* Enable RX Queues */
+	void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
 	/* Dump MAC registers */
 	void (*dump_regs)(struct mac_device_info *hw);
 	/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 3e8d4fe..7d88517 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -22,6 +22,7 @@
 #define GMAC_HASH_TAB_32_63	0x0014
 #define GMAC_RX_FLOW_CTRL	0x0090
 #define GMAC_QX_TX_FLOW_CTRL(x)	(0x70 + x * 4)
+#define GMAC_RXQ_CTRL0	0x00a0
 #define GMAC_INT_STATUS	0x00b0
 #define GMAC_INT_EN	0x00b4
 #define GMAC_PCS_BASE	0x00e0
@@ -44,6 +45,12 @@

 #define GMAC_MAX_PERFECT_ADDRESSES	128

+/* MAC RX Queue Enable */
+#define GMAC_RX_QUEUE_CLEAR(queue)	~(BIT((queue) * 2) \
+					| BIT(((queue) * 2) + 1))
+#define GMAC_RX_AV_QUEUE_ENABLE(queue)	BIT((queue) * 2)
+#define GMAC_RX_DCB_QUEUE_ENABLE(queue)	BIT(((queue) * 2) + 1)
+
 /* MAC Flow Control RX */
 #define GMAC_RX_FLOW_CTRL_RFE	BIT(0)

@@ -133,6 +140,8 @@ enum power_event {
 /* MAC HW features2 bitmap */
 #define GMAC_HW_FEAT_TXCHCNT	GENMASK(21, 18)
 #define GMAC_HW_FEAT_RXCHCNT	GENMASK(15, 12)
+#define GMAC_HW_FEAT_TXQCNT	GENMASK(9, 6)
+#define GMAC_HW_FEAT_RXQCNT	GENMASK(3, 0)

 /* MAC HW ADDR regs */
 #define GMAC_HI_DCS	GENMASK(18, 16)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index eaed7cb..ecfbf57 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int mtu)
 	writel(value, ioaddr + GMAC_INT_EN);
 }

+static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
+{
+	void __iomem *ioaddr = hw->pcsr;
+	u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
+
+	value &= GMAC_RX_QUEUE_CLEAR(queue);
+	value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
+
+	writel(value, ioaddr + GMAC_RXQ_CTRL0);
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw)
 {
 	void __iomem *ioaddr = hw->pcsr;
@@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x)
 static const struct stmmac_ops dwmac4_ops = {
 	.core_init = dwmac4_core_init,
 	.rx_ipc = dwmac4_rx_ipc_enable,
+	.rx_queue_enable = dwmac4_rx_queue_enable,
 	.dump_regs = dwmac4_dump_regs,
 	.host_irq_status = dwmac4_irq_status,
 	.flow_ctrl = dwmac4_flow_ctrl,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 8196ab5..377d1b4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -303,6 +303,11 @@
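The clear-then-set sequence in dwmac4_rx_queue_enable() can be tried out in userspace on a plain 32-bit value. The sketch below reuses the bit layout from the patch (two bits per queue, AV in the even bit, DCB in the odd bit), but `BIT()` is a local stand-in, the macro names drop the GMAC_ prefix, and `rx_queue_enable_av()` is a hypothetical helper that replaces the readl()/writel() pair with a value passed in and returned.

```c
#include <stdint.h>

#define BIT(n)	(UINT32_C(1) << (n))

/* Same bit layout as the macros in the patch: two bits per queue. */
#define RX_QUEUE_CLEAR(queue)		(~(BIT((queue) * 2) | BIT(((queue) * 2) + 1)))
#define RX_AV_QUEUE_ENABLE(queue)	BIT((queue) * 2)
#define RX_DCB_QUEUE_ENABLE(queue)	BIT(((queue) * 2) + 1)

/* Clear both mode bits of the given queue, then select AV.
 * Bits belonging to other queues are left untouched. */
static uint32_t rx_queue_enable_av(uint32_t reg, uint32_t queue)
{
	reg &= RX_QUEUE_CLEAR(queue);
	reg |= RX_AV_QUEUE_ENABLE(queue);
	return reg;
}
```

Clearing first matters when a queue was previously configured as DCB: without it, both bits of the field would end up set.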
[PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Geoff Lansberry

The TRF7970A has configuration options to support hardware designs
which use a 27.12MHz clock. This commit adds a device tree option
'clock-frequency' to support configuring this chip for the default
13.56MHz clock or the optional 27.12MHz clock.
---
 .../devicetree/bindings/net/nfc/trf7970a.txt |  4 ++
 drivers/nfc/trf7970a.c                       | 50 +-
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 32b35a0..e262ac1 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,8 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- clock-frequency: Set to specify that the input frequency to the trf7970a is 1356Hz or 2712Hz
+

 Example (for ARM-based BeagleBone with TRF7970A on SPI1):

@@ -43,6 +45,8 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 		irq-status-read-quirk;
 		en2-rf-quirk;
 		t5t-rmb-extra-byte-quirk;
+		vdd_io_1v8;
+		clock-frequency = <2712>;
 		status = "okay";
 	};
 };
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 26c9dbb..4e051e9 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -124,6 +124,9 @@
 		NFC_PROTO_ISO15693_MASK | NFC_PROTO_NFC_DEP_MASK)

 #define TRF7970A_AUTOSUSPEND_DELAY	3 /* 30 seconds */

+#define TRF7970A_13MHZ_CLOCK_FREQUENCY	1356
+#define TRF7970A_27MHZ_CLOCK_FREQUENCY	2712
+
 #define TRF7970A_RX_SKB_ALLOC_SIZE	256

@@ -1056,12 +1059,11 @@ static int trf7970a_init(struct trf7970a *trf)

 	trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON;

-	ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0);
+	ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL,
+			trf->modulator_sys_clk_ctrl);
 	if (ret)
 		goto err_out;

-	trf->modulator_sys_clk_ctrl = 0;
-
 	ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS,
 			TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 |
 			TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32);
@@ -1181,27 +1183,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a *trf, int tech)
 	switch (tech) {
 	case NFC_DIGITAL_RF_TECH_106A:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_OOK;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCA;
 		break;
 	case NFC_DIGITAL_RF_TECH_106B:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCB;
 		break;
 	case NFC_DIGITAL_RF_TECH_212F:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
 		break;
 	case NFC_DIGITAL_RF_TECH_424F:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
 		break;
 	case NFC_DIGITAL_RF_TECH_ISO15693:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_OOK;
 		trf->guard_time = TRF7970A_GUARD_TIME_15693;
 		break;
 	default:
@@ -1571,17 +1583,23 @@ static int trf7970a_tg_config_rf_tech(struct trf7970a *trf, int tech)
 		trf->iso_ctrl_tech
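The repeated `(trf->modulator_sys_clk_ctrl & 0xF8) | depth` expression in the patch keeps the upper five bits of the register value (where the clock selection would live) and replaces only the low three bits. A minimal sketch of that pattern as a helper follows. The constant values and the `set_modulation_depth()` name are assumptions made for this example; the real bit assignments are defined by the driver and the TRF7970A datasheet.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define DEPTH_MASK	0x07u	/* low three bits: modulation depth */
#define DEPTH_OOK	0x01u
#define DEPTH_ASK10	0x00u
#define CLK_27MHZ_BIT	0x40u	/* stand-in for a clock-select bit in the upper bits */

/* Replace only the modulation depth, preserving the upper five bits. */
static uint8_t set_modulation_depth(uint8_t reg, uint8_t depth)
{
	return (uint8_t)((reg & 0xF8u) | (depth & DEPTH_MASK));
}
```

Factoring the mask into one helper like this would also avoid repeating the literal 0xF8 in every case of the switch, though that is a style choice and not something the patch does.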
Re: kernel/bpf/verifier.c: 4 * possible unintended fallthrough ?
On Tue, Dec 20, 2016 at 3:20 AM, David Binderman wrote:
> Hello there,
>
> I just tried to compile kernel-4.9 with a recent development
> version of gcc. It said
>
> kernel/bpf/verifier.c:1907:23: warning: this statement may fall through
> [-Wimplicit-fallthrough=]
> kernel/bpf/verifier.c:1918:23: warning: this statement may fall through
> [-Wimplicit-fallthrough=]
> kernel/bpf/verifier.c:1859:24: warning: this statement may fall through
> [-Wimplicit-fallthrough=]
> kernel/bpf/verifier.c:1869:24: warning: this statement may fall through
> [-Wimplicit-fallthrough=]
>
> Source code for the first one is
>
>         case BPF_JGT:
>                 /* Unsigned comparison, the minimum value is 0. */
>                 true_reg->min_value = 0;
>         case BPF_JSGT:
>
> Suggest either add the missing break or document the fallthrough
> with a comment something like /* FALLTHROUGH */

I've tried 4.9 and 5.2 and don't see this warning. Is this 6.x gcc? I suspect it will have such warnings all over the kernel.
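The second option suggested in the report, documenting an intentional fallthrough with a comment, looks like this in a self-contained form. The enum, struct and function names are hypothetical and only mirror the shape of the quoted verifier code; they are not the BPF definitions.

```c
/* Illustrative comparison kinds; these are not the BPF opcodes. */
enum cmp_kind { CMP_UNSIGNED_GT, CMP_SIGNED_GT };

struct bounds {
	long min_value;
	int seen_gt;
};

static void apply_gt(struct bounds *b, enum cmp_kind kind)
{
	switch (kind) {
	case CMP_UNSIGNED_GT:
		/* Unsigned comparison, the minimum value is 0. */
		b->min_value = 0;
		/* fall through */
	case CMP_SIGNED_GT:
		b->seen_gt = 1;
		break;
	}
}
```

The comment sits immediately before the next case label, which is where GCC's -Wimplicit-fallthrough comment matching looks for it.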
Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
On Mon, Dec 19, 2016 at 5:35 PM, Rob Herring wrote:
> On Thu, Dec 15, 2016 at 05:30:43PM -0500, Geoff Lansberry wrote:
>> From: Geoff Lansberry
>>
>> ---
>>  Documentation/devicetree/bindings/net/nfc/trf7970a.txt |  2 ++
>>  drivers/nfc/trf7970a.c                                 | 13 -
>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> index 9dda879..208f045 100644
>> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> @@ -21,6 +21,7 @@ Optional SoC Specific Properties:
>>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>>    where an extra byte is returned by Read Multiple Block commands issued
>>    to Type 5 tags.
>> +- vdd_io_1v8: Set to specify that the trf7970a io voltage should be set to 1.8V
>
> Use the regulator binding and provide a fixed 1.8V supply.
>
>>  - crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
>>
>>
>> @@ -45,6 +46,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>>  		irq-status-read-quirk;
>>  		en2-rf-quirk;
>>  		t5t-rmb-extra-byte-quirk;
>> +		vdd_io_1v8;
>>  		crystal_27mhz;
>>  		status = "okay";
>>  	};

Rob - using the regulator binding is new to me, but I've given it a shot and just sent you another set of patches for your inspection. Please let me know if this is what you had in mind.

Geoff
Re: [PATCH v2 3/3] arm64: dts: marvell: Add ethernet switch definition for the ESPRESSObin
> >>+	mdio {
> >>+		#address-cells = <1>;
> >>+		#size-cells = <0>;
> >>+		reg = <1>;
> >
> >what is this reg value for?
> >
> > Andrew
> >
>
> It was required to avoid a warning thrown by the mdio subsystem

Do you remember what the warning was? This seems odd to me. I don't see why a reg is needed here.

Thanks
	Andrew
[PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Geoff Lansberry

The TRF7970A has configuration options for supporting hardware designs
with 1.8 Volt or 3.3 Volt IO. This commit adds a device tree option,
using a fixed regulator binding, for setting the io voltage to match
the hardware configuration. If no option is supplied it defaults to
3.3 volt configuration.
---
 .../devicetree/bindings/net/nfc/trf7970a.txt |  4 ++--
 drivers/nfc/trf7970a.c                       | 28 +-
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index e262ac1..b5777d8 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,9 +21,9 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- vdd-io-supply: Regulator specifying voltage for vdd-io
 - clock-frequency: Set to specify that the input frequency to the trf7970a is 1356Hz or 2712Hz

-
 Example (for ARM-based BeagleBone with TRF7970A on SPI1):

  {
@@ -41,11 +41,11 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 				< 5 GPIO_ACTIVE_LOW>;
 		vin-supply = <_reg>;
 		vin-voltage-override = <500>;
+		vdd-io-supply = <_reg>;
 		autosuspend-delay = <3>;
 		irq-status-read-quirk;
 		en2-rf-quirk;
 		t5t-rmb-extra-byte-quirk;
-		vdd_io_1v8;
 		clock-frequency = <2712>;
 		status = "okay";
 	};
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 4e051e9..8a88195 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -444,6 +444,7 @@ struct trf7970a {
 	u8 iso_ctrl_tech;
 	u8 modulator_sys_clk_ctrl;
 	u8 special_fcn_reg1;
+	u8 io_ctrl;
 	unsigned int guard_time;
 	int technology;
 	int framing;
@@ -1051,6 +1052,11 @@ static int trf7970a_init(struct trf7970a *trf)
 	if (ret)
 		goto err_out;

+	ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
+			trf->io_ctrl | TRF7970A_REG_IO_CTRL_VRS(0x1));
+	if (ret)
+		goto err_out;
+
 	ret = trf7970a_write(trf, TRF7970A_NFC_TARGET_LEVEL, 0);
 	if (ret)
 		goto err_out;
@@ -1767,7 +1773,7 @@ static int _trf7970a_tg_listen(struct nfc_digital_dev *ddev, u16 timeout,
 		goto out_err;

 	ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
-			TRF7970A_REG_IO_CTRL_VRS(0x1));
+			trf->io_ctrl | TRF7970A_REG_IO_CTRL_VRS(0x1));
 	if (ret)
 		goto out_err;

@@ -2062,6 +2068,7 @@ static int trf7970a_probe(struct spi_device *spi)
 		return ret;
 	}

+
 	of_property_read_u32(np, "clock-frequency", _freq);
 	if ((clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY) ||
 		(clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY)) {
@@ -2105,6 +2112,25 @@ static int trf7970a_probe(struct spi_device *spi)
 	if (uvolts > 400)
 		trf->chip_status_ctrl = TRF7970A_CHIP_STATUS_VRS5_3;

+	trf->regulator = devm_regulator_get(>dev, "vdd-io");
+	if (IS_ERR(trf->regulator)) {
+		ret = PTR_ERR(trf->regulator);
+		dev_err(trf->dev, "Can't get VDD_IO regulator: %d\n", ret);
+		goto err_destroy_lock;
+	}
+
+	ret = regulator_enable(trf->regulator);
+	if (ret) {
+		dev_err(trf->dev, "Can't enable VDD_IO: %d\n", ret);
+		goto err_destroy_lock;
+	}
+
+
+	if (regulator_get_voltage(trf->regulator) == 180) {
+		trf->io_ctrl = TRF7970A_REG_IO_CTRL_IO_LOW;
+		dev_dbg(trf->dev, "trf7970a config vdd_io to 1.8V\n");
+	}
+
 	trf->ddev = nfc_digital_allocate_device(_nfc_ops,
 			TRF7970A_SUPPORTED_PROTOCOLS,
 			NFC_DIGITAL_DRV_CAPS_IN_CRC |
--
Signed-off-by: Geoff Lansberry
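The selection logic described in the commit message (use the 1.8 Volt I/O setting when the supply is 1.8 V, otherwise keep the 3.3 Volt default) can be sketched as a small pure function. This is an illustration under assumptions: `IO_CTRL_IO_LOW` is a hypothetical bit value, `io_ctrl_for_supply()` is not a driver function, and the comparison value assumes the supply voltage is expressed in microvolts, which is the unit regulator_get_voltage() reports.

```c
#include <stdint.h>

#define IO_CTRL_IO_LOW	0x01u	/* hypothetical bit selecting 1.8 V I/O */

/* Pick the I/O control bits from the supply voltage in microvolts.
 * Anything other than exactly 1.8 V keeps the 3.3 V default (no bits set). */
static uint8_t io_ctrl_for_supply(int microvolts)
{
	return microvolts == 1800000 ? IO_CTRL_IO_LOW : 0;
}
```

The returned value would then be OR-ed into every write of the I/O control register, which is what the patch does with `trf->io_ctrl`.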
[PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel
From: Jaret Cantu

Repeated polling attempts cause a NULL dereference error to occur.
This is because the state of the trf7970a is currently reading but
another request has been made to send a command before it has finished.

The solution is to properly kill the waiting reading (workqueue)
before failing on the send.
---
 drivers/nfc/trf7970a.c | 4
 1 file changed, 4 insertions(+)

diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 8a88195..5916737 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -1496,6 +1496,10 @@ static int trf7970a_send_cmd(struct nfc_digital_dev *ddev,
 			(trf->state != TRF7970A_ST_IDLE_RX_BLOCKED)) {
 		dev_err(trf->dev, "%s - Bogus state: %d\n", __func__,
 				trf->state);
+		if (trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA ||
+		    trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA_CONT)
+			trf->ignore_timeout =
+				!cancel_delayed_work(>timeout_work);
 		ret = -EIO;
 		goto out_err;
 	}
--
Signed-off-by: Geoff Lansberry