[GIT] Networking
1) Fix NAPI poll list corruption in emac driver, from Christian
   Lamparter.

2) Fix route use after free, from Eric Dumazet.

3) Fix regression in reuseaddr handling, from Josef Bacik.

4) Assert the size of control messages in compat handling since
   we copy it in from userspace twice. From Meng Xu.

5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.)
   from Ursula Braun.

6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn.

7) Don't use ARRAY_SIZE on spinlock array which might have zero
   entries, from Geert Uytterhoeven.

8) Fix miscomputation of checksum in ipv6 udp code, from Subash Abhinov
   Kasiviswanathan.

9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from
   Xin Long.

Please pull, thanks a lot.

The following changes since commit 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e:

  Linux 4.14-rc1 (2017-09-16 15:47:51 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

for you to fetch changes up to 4e683f499a15cd777d3cb51aaebe48d72334c852:

  Merge branch 'net-fix-reuseaddr-regression' (2017-09-22 20:33:18 -0700)

Alex Ng (1):
      hv_netvsc: fix send buffer failure on MTU change

Andreas Gruenbacher (1):
      rhashtable: Documentation tweak

Ariel Elior (1):
      MAINTAINERS: Remove Yuval Mintz from maintainers list

Christian Lamparter (1):
      net: emac: Fix napi poll list corruption

Cong Wang (1):
      net_sched: remove cls_flower idr on failure

Daniel Borkmann (1):
      bpf: fix ri->map_owner pointer on bpf_prog_realloc

David S. Miller (8):
      Merge tag 'mac80211-for-davem-2017-11-19' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'hns3-bug-fixes'
      Merge git://git.kernel.org/.../pablo/nf
      Merge branch 'hns3-tm-fixes'
      Merge branch 'phylib-xcvr-type'
      Merge branch 'lan78xx-fixes'
      Merge branch 'smc-bug-fixes'
      Merge branch 'net-fix-reuseaddr-regression'

Davide Caratti (1):
      net/sched: cls_matchall: fix crash when used with classful qdisc

Edward Cree (1):
      net: change skb->mac_header when Generic XDP calls adjust_head

Eric Dumazet (4):
      8139too: revisit napi_complete_done() usage
      bpf: do not disable/enable BH in bpf_map_free_id()
      tcp: fastopen: fix on syn-data transmit failure
      net: prevent dst uses after free

Fahad Kunnathadi (1):
      net: phy: Fix mask value write on gmii2rgmii converter speed register

Florian Fainelli (3):
      net: systemport: Fix 64-bit statistics dependency
      net: ethtool: Add back transceiver type
      net: phy: Keep reporting transceiver type

Geert Uytterhoeven (2):
      netfilter: nat: Do not use ARRAY_SIZE() on spinlocks to fix zero div
      net: phy: Fix truncation of large IRQ numbers in phy_attached_print()

Hans Wippel (2):
      net/smc: add missing dev_put
      net/smc: add receive timeout check

Jerome Brunet (1):
      net: phy: Kconfig: Fix PHY infrastructure menu in menuconfig

Johannes Berg (1):
      nl80211: fix null-ptr dereference on invalid mesh configuration

Josef Bacik (3):
      net: set tb->fast_sk_family
      net: use inet6_rcv_saddr to compare sockets
      inet: fix improper empty comparison

Konstantin Khlebnikov (2):
      net_sched: always reset qdisc backlog in qdisc_reset()
      net_sched/hfsc: fix curve activation in hfsc_change_class()

Lipeng (6):
      net: hns3: Fixes initialization of phy address from firmware
      net: hns3: Fixes the command used to unmap ring from vector
      net: hns3: Fixes ring-to-vector map-and-unmap command
      net: hns3: Fixes the initialization of MAC address in hardware
      net: hns3: Fixes the default VLAN-id of PF
      net: hns3: Fixes the premature exit of loop when matching clients

Matteo Croce (1):
      ipv6: fix net.ipv6.conf.all interface DAD handlers

Meng Xu (2):
      net: compat: assert the size of cmsg copied in is as expected
      isdn/i4l: fetch the ppp_write buffer in one shot

Mike Manning (1):
      net: ipv6: fix regression of no RTM_DELADDR sent after DAD failure

Nisar Sayed (3):
      lan78xx: Fix for eeprom read/write when device auto suspend
      lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
      lan78xx: Use default values loaded from EEPROM/OTP after reset

Randy Dunlap (1):
      Documentation: networking: fix ASCII art in switchdev.txt

Salil Mehta (1):
      net: hns3: Fixes the ether address copy with appropriate API

Sathya Perla (1):
      bnxt_en: check for ingress qdisc in flower offload

Stefan Schmidt (1):
      MAINTAINERS: update git tree locations for ieee802154 subsystem

Subash Abhinov Kasiviswanathan (1):
      udpv6: Fix the checksum computation when HW checksum does not apply

Thomas Meyer (1):
      net: stmmac: Cocci spatch "of_table"

Timur Tabi (1):
      net: qcom/emac: add software control for pause frame mode

Tobias Klauser (1):
      bpf: devmap: pass on
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On Fri, Sep 22, 2017 at 9:23 AM, Edward Cree wrote:
> On 22/09/17 16:16, Alexei Starovoitov wrote:
>> looks like we're converging on
>> "be16/be32/be64/le16/le32/le64 #register" for BPF_END.
>> I guess I can live with that. I would prefer more C like syntax
>> to match the rest, but llvm parsing point is a strong one.
> Yep, agreed. I'll post a v2 once we've settled BPF_NEG.
>> For BPF_NEG I prefer to do it in C syntax like interpreter does:
>> ALU_NEG:
>>         DST = (u32) -DST;
>> ALU64_NEG:
>>         DST = -DST;
>> Yonghong, does it mean that asmparser will equally suffer?
> Correction to my earlier statements: verifier will currently disassemble
> neg as:
>     (87) r0 neg 0
>     (84) (u32) r0 neg (u32) 0
> because it pretends 'neg' is a compound-assignment operator like +=.
> The analogy with be16 and friends would be to use
>     neg64 r0
>     neg32 r0
> whereas the analogy with everything else would be
>     r0 = -r0
>     r0 = (u32) -r0
> as Alexei says.
> I'm happy to go with Alexei's version if it doesn't cause problems for llvm.

I got some time to do some prototyping in llvm and it looks like
I am able to resolve the issue and we are able to use more C-like syntax.
That is:

for bswap:
    r1 = (be16) (u16) r1
  or
    r1 = (be16) r1
  or
    r1 = be16 r1

for neg:
    r0 = -r0
    (for 32bit support, llvm may output "w0 = -w0" in the future. But since
     it is not enabled yet, you can continue to output "r0 = (u32) -r0".)

Not sure which syntax is best for bswap. The "r1 = (be16) (u16) r1" is most
explicit in its intention.

Attaching my llvm patch as well and cc'ing Jiong and Jakub so they can see my
implementation and the relative discussion here. (In this patch, I did not
implement bswap for little endian yet.) Maybe they can provide additional
comments.

Attachment: 0001-bpf-add-support-for-neg-insn-and-change-format-of-bs.patch
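
To make the candidate output formats concrete, here is a minimal userspace
sketch of a formatter that emits the C-like forms discussed above
("r0 = -r0", "r0 = (u32) -r0", "r1 = (be16) r1"). It is not the verifier's
code: the enum, the function name and its parameters are made up for
illustration, and the "(le16)" form is only an assumed analogue of the
"(be16)" form.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified operation kinds; not the kernel's struct bpf_insn. */
enum sketch_op { SK_NEG64, SK_NEG32, SK_TO_BE, SK_TO_LE };

/*
 * Format one instruction in the C-like syntax from the thread.
 * 'reg' is the register number; 'width' (16/32/64) is only used for the
 * byte-swap forms. Returns what snprintf() returns, or -1 for an unknown op.
 */
static int sketch_disasm(char *buf, size_t len, enum sketch_op op,
                         unsigned int reg, unsigned int width)
{
    switch (op) {
    case SK_NEG64:
        return snprintf(buf, len, "r%u = -r%u", reg, reg);
    case SK_NEG32:
        return snprintf(buf, len, "r%u = (u32) -r%u", reg, reg);
    case SK_TO_BE:
        return snprintf(buf, len, "r%u = (be%u) r%u", reg, width, reg);
    case SK_TO_LE:
        /* Assumed analogue of the big-endian form; not settled in the thread. */
        return snprintf(buf, len, "r%u = (le%u) r%u", reg, width, reg);
    }
    return -1;
}
```

Calling sketch_disasm(buf, sizeof(buf), SK_TO_BE, 1, 16) fills buf with
"r1 = (be16) r1", which is the middle one of the three bswap spellings
listed above.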
Re: pull-request: ieee802154 2017-09-20
From: Stefan Schmidt
Date: Thu, 21 Sep 2017 22:56:07 +0200

> Here comes a pull request for ieee802154 changes I have queued up for
> this merge window.
>
> Normally these have been coming through the bluetooth tree but as these
> three have been falling through the cracks so far and I have to review
> and ack all of them anyway I think it makes sense if I save the
> bluetooth people some work and handle them directly.
>
> It's the first pull request I send to you so please let me know if I did
> something wrong or if you prefer a different format.

Pulled, thanks.
Re: [PATCH net-next v2 0/4] cxgb4: add support to offload tc flower
From: Rahul Lakkireddy
Date: Thu, 21 Sep 2017 23:41:12 +0530

> This series of patches add support to offload tc flower onto Chelsio
> NICs.
>
> Patch 1 adds basic skeleton to prepare for offloading tc flower flows.
>
> Patch 2 adds support to add/remove flows for offload. Flows can have
> accompanying masks. Following match and action are currently supported
> for offload:
> Match: ether-protocol, IPv4/IPv6 addresses, L4 ports (TCP/UDP)
> Action: drop, redirect to another port on the device.
>
> Patch 3 adds support to offload tc-flower flows having
> vlan actions: pop, push, and modify.
>
> Patch 4 adds support to fetch stats for the offloaded tc flower flows
> from hardware.
>
> Support for offloading more match and action types are to be followed
> in subsequent series.

Series applied, thank you.
Re: [PATCH] net: use 32-bit arithmetic while allocating net device
From: Alexey Dobriyan
Date: Thu, 21 Sep 2017 23:33:29 +0300

> Private part of allocation is never big enough to warrant size_t.
>
> Space savings:
>
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
> function                                     old     new   delta
> alloc_netdev_mqs                            1120    1110     -10
>
> Signed-off-by: Alexey Dobriyan

Applied to net-next.
Re: [PATCH net-next v2] net: Remove useless function skb_header_release
From: gfree.w...@vip.163.com
Date: Fri, 22 Sep 2017 10:25:22 +0800

> From: Gao Feng
>
> There is no one which would invoke the function skb_header_release.
> So just remove it now.
>
> Signed-off-by: Gao Feng

Applied, thanks.
Re: [PATCH 0/3] fix reuseaddr regression
From: Josef Bacik
Date: Fri, 22 Sep 2017 20:20:05 -0400

> I introduced a regression when reworking the fastreuse port stuff that allows
> bind conflicts to occur once a reuseaddr successfully opens on an existing tb.
> The root cause is I reversed an if statement which caused us to set the tb as
> if there were no owners on the socket if there were, which obviously is not
> correct.
>
> Dave could you please queue these changes up for -stable, I've run them
> through the net tests and added another test to check for this problem
> specifically.

Series applied and queued up for -stable, thanks.
Re: [PATCH net v2] net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
From: Willem de Bruijn
Date: Fri, 22 Sep 2017 19:42:37 -0400

> Zerocopy skbs frags are copied when the skb is looped to a local sock.
> Commit 1080e512d44d ("net: orphan frags on receive") introduced calls
> to skb_orphan_frags to deliver_skb and __netif_receive_skb for this.
>
> With msg_zerocopy, these skbs can also exist in the tx path and thus
> loop from dev_queue_xmit_nit. This already calls deliver_skb in its
> loop. But it does not orphan before a separate pt_prev->func().
>
> Add the missing skb_orphan_frags_rx.
>
> Changes
>   v1->v2: handle skb_orphan_frags_rx failure
>
> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
> Signed-off-by: Willem de Bruijn

Applied and queued up for -stable, thanks.
Re: [PATCH] MAINTAINERS: update git tree locations for ieee802154 subsystem
From: Stefan Schmidt
Date: Fri, 22 Sep 2017 14:28:46 +0200

> Patches for ieee802154 will go through my new trees towards netdev from
> now on. The 6LoWPAN subsystem will stay as is (shared between ieee802154
> and bluetooth) and go through the bluetooth tree as usual.
>
> Signed-off-by: Stefan Schmidt

Applied.
Re: [PATCH] net: stmmac: Meet alignment requirements for DMA
From: Matt Redfearn
Date: Fri, 22 Sep 2017 12:13:53 +0100

> According to Documentation/DMA-API.txt:
> Warnings: Memory coherency operates at a granularity called the cache
> line width. In order for memory mapped by this API to operate
> correctly, the mapped region must begin exactly on a cache line
> boundary and end exactly on one (to prevent two separately mapped
> regions from sharing a single cache line). Since the cache line size
> may not be known at compile time, the API will not enforce this
> requirement. Therefore, it is recommended that driver writers who
> don't take special care to determine the cache line size at run time
> only map virtual regions that begin and end on page boundaries (which
> are guaranteed also to be cache line boundaries).

This is ridiculous.

You're misreading what this document is trying to explain.

As long as you use the dma_{map,unmap}_single() and sync to/from device
interfaces properly, the cacheline issues will be handled properly and
the cpu and the device will see proper uptodate memory contents.

It is completely ridiculous to require every driver to stash away two
sets of pointers for every packet, and to DMA map the headroom of the SKB
which is wasteful.

I'm not applying this, fix this problem properly, thanks.
Re: [PATCH 0/5] use setup_timer() helper function.
From: Allen Pais
Date: Fri, 22 Sep 2017 16:28:17 +0530

> This series uses setup_timer() helper function. The series
> addresses the files under net/*.

There was a recent change to the nfc code in net-next which causes
your patches to not apply.

Please respin against net-next, thanks.
Re: tools: selftests: psock_tpacket: skip un-supported tpacket_v3 test
From: Orson Zhai
Date: Fri, 22 Sep 2017 18:17:17 +0800

> The TPACKET_V3 test of PACKET_TX_RING will fail with kernel version
> lower than v4.11. Supported code of tx ring was added with commit id
> <7f953ab2ba46: af_packet: TX_RING support for TPACKET_V3> at Jan. 3
> of 2017.
>
> So skip this item test instead of reporting failing for old kernels.
>
> Signed-off-by: Orson Zhai

The whole point is to make sure the kernel in which the selftest code is
present functions properly.

There are many tests in selftests that only work on recent kernels.

I'm not applying this, sorry.
Re: [PATCH net-next] virtio-net: correctly set xdp_xmit for mergeable buffer
From: Jason Wang
Date: Fri, 22 Sep 2017 14:38:58 +0800

> We should set xdp_xmit only when xdp_do_redirect() succeeds.
>
> Cc: John Fastabend
> Signed-off-by: Jason Wang

Applied, thanks Jason.
Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
Hi, Jiri

On 2017/9/23 0:03, Jiri Pirko wrote:
> Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsh...@huawei.com wrote:
>> Hi, Jiri
>>
>>>> -	if (!tc) {
>>>> +	if (if_running) {
>>>> +		(void)hns3_nic_net_stop(netdev);
>>>> +		msleep(100);
>>>> +	}
>>>> +
>>>> +	ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
>>>> +		kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
>>
>>> This is most odd. Why do you call dcb_ops from ndo_setup_tc callback?
>>> Why are you mixing this together? prio->tc mapping can be done
>>> directly in dcbnl
>>
>> Here is what we do in dcb_ops->setup_tc:
>> Firstly, if current tc num is different from the tc num
>> that user provide, then we setup the queues for each
>> tc.
>>
>> Secondly, we tell hardware the pri to tc mapping that
>> the stack is using. In rx direction, our hardware need
>> that mapping to put different packet into different tc'
>> queues according to the priority of the packet, then
>> rss decides which specific queue in the tc should the
>> packet goto.
>>
>> By mixing, I suppose you meant why we need the
>> pri to tc information?
>
> by mixing, I mean what I wrote. You are calling dcb_ops callback from
> ndo_setup_tc callback. So you are mixing DCBNL subsystem and TC
> subsystem. Why? Why do you need sch_mqprio? Why DCBNL is not enough for
> all?

When using lldptool, dcbnl is involved.

But when using tc qdisc, dcbnl is not involved. Below are a few key
calls in the kernel when the tc qdisc cmd is executed.

cmd:
tc qdisc add dev eth0 root handle 1:0 mqprio num_tc 4 map 1 2 3 3 1 3 1 1 hw 1

call graph:
rtnetlink_rcv_msg -> tc_modify_qdisc -> qdisc_create -> mqprio_init ->
hns3_nic_setup_tc

When hns3_nic_setup_tc is called, we need to know the tc num and the
prio_tc mapping from the tc_mqprio_qopt which is provided as a parameter
of the ndo_setup_tc function, and dcb_ops is our hardware-specific
method to set up the tc related parameters in the hardware, so this is
why we call the dcb_ops callback in the ndo_setup_tc callback.

I hope this will answer your question, thanks for your time.

>> I hope I did not misunderstand your question, thanks
>> for your time reviewing.
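
The dispatch being debated (the ndo_setup_tc path forwarding the tc count
and the prio-to-tc map to an optional, hardware-specific dcb_ops->setup_tc,
or returning -EOPNOTSUPP when none is registered) can be sketched in plain
userspace C as follows. All sketch_* names and the struct layout are
hypothetical stand-ins, not the hns3 driver's types.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PRIO 8

struct sketch_handle;

/* Optional hardware-specific operations, modelled on the dcb_ops idea. */
struct sketch_dcb_ops {
    int (*setup_tc)(struct sketch_handle *h, uint8_t tc,
                    const uint8_t *prio_tc);
};

struct sketch_handle {
    const struct sketch_dcb_ops *dcb_ops;
    uint8_t num_tc;
    uint8_t prio_tc[SKETCH_MAX_PRIO];
};

/* Stand-in for the hardware method: record the tc count and the mapping. */
static int sketch_hw_setup_tc(struct sketch_handle *h, uint8_t tc,
                              const uint8_t *prio_tc)
{
    h->num_tc = tc;
    for (size_t i = 0; i < SKETCH_MAX_PRIO; i++)
        h->prio_tc[i] = prio_tc[i];
    return 0;
}

/* Stand-in for the ndo_setup_tc path: forward if supported, else fail. */
static int sketch_nic_setup_tc(struct sketch_handle *h, uint8_t tc,
                               const uint8_t *prio_tc)
{
    return (h->dcb_ops && h->dcb_ops->setup_tc) ?
        h->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
}
```

With the map from the tc command above (1 2 3 3 1 3 1 1) and num_tc 4, a
handle that has the ops registered ends up with num_tc == 4 and the eight
priorities recorded; a handle without them gets -EOPNOTSUPP.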
[PATCH net-next] liquidio: pass date and time info to NIC firmware
From: Veerasenareddy Burru

Signed-off-by: Veerasenareddy Burru
Signed-off-by: Manish Awasthi
Signed-off-by: Felix Manlunas
---
 .../net/ethernet/cavium/liquidio/octeon_console.c | 28 +++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index ec3dd69..eda799b 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -803,15 +803,19 @@ static int octeon_console_read(struct octeon_device *oct, u32 console_num,
 }
 
 #define FBUF_SIZE	(4 * 1024 * 1024)
+#define MAX_DATE_SIZE	30
 
 int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 			     size_t size)
 {
-	int ret = 0;
+	struct octeon_firmware_file_header *h;
+	char date[MAX_DATE_SIZE];
+	struct timeval time;
 	u32 crc32_result;
+	struct tm tm_val;
 	u64 load_addr;
 	u32 image_len;
-	struct octeon_firmware_file_header *h;
+	int ret = 0;
 	u32 i, rem;
 
 	if (size < sizeof(struct octeon_firmware_file_header)) {
@@ -890,11 +894,29 @@ int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 			load_addr += size;
 		}
 	}
+
+	/* Get time of the day */
+	do_gettimeofday(&time);
+	time_to_tm(time.tv_sec, (-sys_tz.tz_minuteswest) * 60, &tm_val);
+	ret = snprintf(date, MAX_DATE_SIZE,
+		       " date=%04ld.%02d.%02d-%02d:%02d:%02d",
+		       tm_val.tm_year + 1900, tm_val.tm_mon + 1, tm_val.tm_mday,
+		       tm_val.tm_hour, tm_val.tm_min, tm_val.tm_sec);
+	if ((sizeof(h->bootcmd) - strnlen(h->bootcmd, sizeof(h->bootcmd))) <
+	    ret) {
+		dev_err(&oct->pci_dev->dev, "Boot command buffer too small\n");
+		return -EINVAL;
+	}
+	strncat(h->bootcmd, date,
+		sizeof(h->bootcmd) - strnlen(h->bootcmd, sizeof(h->bootcmd)));
+
 	dev_info(&oct->pci_dev->dev, "Writing boot command: %s\n",
 		 h->bootcmd);
 
 	/* Invoke the bootcmd */
 	ret = octeon_console_send_cmd(oct, h->bootcmd, 50);
+	if (ret)
+		dev_info(&oct->pci_dev->dev, "Boot command send failed\n");
 
-	return 0;
+	return ret;
 }
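
For readers who want to try the "append a date to a bounded command buffer"
step outside the kernel, here is a small userspace sketch. It is not the
driver's code: sketch_append_date() and its parameters are invented for
illustration, and it takes a caller-supplied struct tm instead of calling
do_gettimeofday()/time_to_tm(). One deliberate difference from the hunk
above: the sketch reserves a byte for the terminating NUL, since strncat()
writes a NUL in addition to the n characters it copies.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Append " date=YYYY.MM.DD-hh:mm:ss" to the NUL-terminated string in 'cmd',
 * whose total capacity (including the NUL) is 'cap' bytes.
 * Returns 0 on success, -1 if the result would not fit; 'cmd' is left
 * untouched on failure.
 */
static int sketch_append_date(char *cmd, size_t cap, const struct tm *tm)
{
    char date[30];
    const char *end = memchr(cmd, '\0', cap);
    size_t used;
    int n;

    if (!end)
        return -1;                      /* cmd is not terminated within cap */
    used = (size_t)(end - cmd);

    n = snprintf(date, sizeof(date), " date=%04d.%02d.%02d-%02d:%02d:%02d",
                 tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                 tm->tm_hour, tm->tm_min, tm->tm_sec);
    if (n < 0 || (size_t)n >= sizeof(date))
        return -1;                      /* formatting failed or truncated */

    if ((size_t)n >= cap - used)
        return -1;                      /* need n bytes plus the NUL */

    memcpy(cmd + used, date, (size_t)n + 1);
    return 0;
}
```

A caller would fill a struct tm (for example via localtime()) and pass the
boot command buffer together with its full size.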
Re: [PATCH 0/3] fix reuseaddr regression
On Tue, Sep 19, 2017 at 01:50:56PM -0700, David Miller wrote:
> From: jo...@toxicpanda.com
> Date: Mon, 18 Sep 2017 12:28:54 -0400
>
> > I introduced a regression when reworking the fastreuse port stuff that
> > allows bind conflicts to occur once a reuseaddr socket successfully opens
> > on an existing tb. The root cause is I reversed an if statement which
> > caused us to set the tb as if there were no owners on the socket if there
> > were, which obviously is not correct.
> >
> > Dave I have follow up patches that will add a selftest for this case and
> > I ran the other reuseport related tests as well. These need to go in
> > pretty quickly as it breaks kvm, I've marked them for stable. Sorry for
> > the regression,
>
> First, please fix your "From: " field so that it actually has your full
> name rather than just your email address. This matters when I apply
> your patches.
>
> Second, remove the stable CC:. For networking changes, you simply ask
> me to queue the changes up for -stable.
>

Sorry Dave, I've fixed my git email settings and I dropped the stable cc and
sent a new round. Didn't see this until just now, my bad.

Josef
[PATCH 1/3] net: set tb->fast_sk_family
From: Josef Bacik

We need to set the tb->fast_sk_family properly so we can use the proper
comparison function for all subsequent reuseport bind requests.

Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk")
Reported-and-tested-by: Cole Robinson
Signed-off-by: Josef Bacik
---
 net/ipv4/inet_connection_sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b9c64b40a83a..f87f4805e244 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -328,6 +328,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			tb->fastuid = uid;
 			tb->fast_rcv_saddr = sk->sk_rcv_saddr;
 			tb->fast_ipv6_only = ipv6_only_sock(sk);
+			tb->fast_sk_family = sk->sk_family;
 #if IS_ENABLED(CONFIG_IPV6)
 			tb->fast_v6_rcv_saddr = sk->sk_v6_rcv_saddr;
 #endif
@@ -354,6 +355,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 				tb->fastuid = uid;
 				tb->fast_rcv_saddr = sk->sk_rcv_saddr;
 				tb->fast_ipv6_only = ipv6_only_sock(sk);
+				tb->fast_sk_family = sk->sk_family;
 #if IS_ENABLED(CONFIG_IPV6)
 				tb->fast_v6_rcv_saddr = sk->sk_v6_rcv_saddr;
 #endif
-- 
2.7.4
[PATCH 0/3] fix reuseaddr regression
I introduced a regression when reworking the fastreuse port stuff that allows bind conflicts to occur once a reuseaddr successfully opens on an existing tb. The root cause is I reversed an if statement which caused us to set the tb as if there were no owners on the socket if there were, which obviously is not correct. Dave could you please queue these changes up for -stable, I've run them through the net tests and added another test to check for this problem specifically. Thanks, Josef
[PATCH 2/3] net: use inet6_rcv_saddr to compare sockets
From: Josef Bacik

In ipv6_rcv_saddr_equal() we need to use inet6_rcv_saddr(sk) for the
ipv6 compare with the fast socket information to make sure we're doing
the proper comparisons.

Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk")
Reported-and-tested-by: Cole Robinson
Signed-off-by: Josef Bacik
---
 net/ipv4/inet_connection_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f87f4805e244..a1bf30438bc5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -266,7 +266,7 @@ static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
 #if IS_ENABLED(CONFIG_IPV6)
 	if (tb->fast_sk_family == AF_INET6)
 		return ipv6_rcv_saddr_equal(&tb->fast_v6_rcv_saddr,
-					    &sk->sk_v6_rcv_saddr,
+					    inet6_rcv_saddr(sk),
 					    tb->fast_rcv_saddr,
 					    sk->sk_rcv_saddr,
 					    tb->fast_ipv6_only,
-- 
2.7.4
[PATCH 3/3] inet: fix improper empty comparison
From: Josef Bacik

When doing my reuseport rework I screwed up and changed a

if (hlist_empty(&tb->owners))

to

if (!hlist_empty(&tb->owners))

This is obviously bad as all of the reuseport/reuse logic was reversed,
which caused weird problems like allowing an ipv4 bind conflict if we
opened an ipv4 only socket on a port followed by an ipv6 only socket on
the same port.

Fixes: b9470c27607b ("inet: kill smallest_size and smallest_port")
Reported-by: Cole Robinson
Signed-off-by: Josef Bacik
---
 net/ipv4/inet_connection_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a1bf30438bc5..c039c937ba90 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -321,7 +321,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			goto fail_unlock;
 	}
 success:
-	if (!hlist_empty(&tb->owners)) {
+	if (hlist_empty(&tb->owners)) {
 		tb->fastreuse = reuse;
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = FASTREUSEPORT_ANY;
-- 
2.7.4
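
Why the direction of that emptiness test matters can be seen in a toy
model. The sketch below is a deliberately simplified assumption about how
such a cached "fast reuse" flag behaves (the first owner seeds it, later
owners may only clear it); struct sketch_bucket and sketch_update() are
hypothetical names and this is not the code in inet_csk_get_port().

```c
#include <stdbool.h>

/* Toy stand-in for a bind bucket: an owner count plus one cached flag. */
struct sketch_bucket {
    int owners;      /* sockets already bound to this port */
    bool fastreuse;  /* cached: every owner so far allowed address reuse */
};

/* Record a new owner; 'reuse' says whether that socket allows reuse. */
static void sketch_update(struct sketch_bucket *tb, bool reuse)
{
    if (tb->owners == 0) {
        /* Empty bucket: the first owner seeds the cached flag. */
        tb->fastreuse = reuse;
    } else if (!reuse) {
        /* Non-empty bucket: a later owner can only clear the flag. */
        tb->fastreuse = false;
    }
    tb->owners++;
}
```

If the emptiness test were inverted, a second socket with reuse set would
overwrite the flag to true even though the first owner had not allowed
reuse, which is the kind of bind conflict described in the commit message.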
[PATCH net-next 2/3] liquidio: verify firmware version when auto-loaded from flash.
From: Rick Farrington

Signed-off-by: Rick Farrington
Signed-off-by: Felix Manlunas
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index ce08f71..a3c9867 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -3303,7 +3303,7 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 {
 	struct lio *lio = NULL;
 	struct net_device *netdev;
-	u8 mac[6], i, j;
+	u8 mac[6], i, j, *fw_ver;
 	struct octeon_soft_command *sc;
 	struct liquidio_if_cfg_context *ctx;
 	struct liquidio_if_cfg_resp *resp;
@@ -3414,6 +3414,22 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 			goto setup_nic_dev_fail;
 		}
 
+		/* Verify f/w version (in case of 'auto' loading from flash) */
+		fw_ver = octeon_dev->fw_info.liquidio_firmware_version;
+		if (memcmp(LIQUIDIO_BASE_VERSION,
+			   fw_ver,
+			   strlen(LIQUIDIO_BASE_VERSION))) {
+			dev_err(&octeon_dev->pci_dev->dev,
+				"Unmatched firmware version. Expected %s.x, got %s.\n",
+				LIQUIDIO_BASE_VERSION, fw_ver);
+			goto setup_nic_dev_fail;
+		} else if (atomic_read(octeon_dev->adapter_fw_state) ==
+			   FW_IS_PRELOADED) {
+			dev_info(&octeon_dev->pci_dev->dev,
+				 "Using auto-loaded firmware version %s.\n",
+				 fw_ver);
+		}
+
 		octeon_swap_8B_data((u64 *)(&resp->cfg_info),
 				    (sizeof(struct liquidio_if_cfg_info)) >> 3);
 
-- 
1.8.3.1
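
The version check in this patch is a prefix comparison against a base
version string. A userspace sketch of the same idea follows; the value of
SKETCH_BASE_VERSION and the function name are made up, and the sketch is
intentionally a little stricter than a bare prefix match: it also requires
the prefix to be followed by '.' or the end of the string, so that a base
of "1.6" does not accept "1.60.0".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BASE_VERSION "1.6"   /* hypothetical value, for illustration */

/*
 * Return true if 'fw_ver' is SKETCH_BASE_VERSION itself or
 * SKETCH_BASE_VERSION followed by ".something".
 */
static bool sketch_fw_version_ok(const char *fw_ver)
{
    size_t n = strlen(SKETCH_BASE_VERSION);

    /* strncmp() stops at the first NUL, so short strings are safe here. */
    if (strncmp(fw_ver, SKETCH_BASE_VERSION, n) != 0)
        return false;

    /* The first n characters matched, so fw_ver[n] is a valid index. */
    return fw_ver[n] == '.' || fw_ver[n] == '\0';
}
```

Whether the stricter boundary check is wanted depends on how the firmware
version strings are actually laid out, which this thread does not spell out.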
[PATCH net-next 3/3] liquidio: update module parameter fw_type to reflect firmware type loaded
From: Rick Farrington

Signed-off-by: Rick Farrington
Signed-off-by: Felix Manlunas
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index a3c9867..963803b 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -1934,10 +1934,12 @@ static int load_firmware(struct octeon_device *oct)
 	char fw_name[LIO_MAX_FW_FILENAME_LEN];
 	char *tmp_fw_type;
 
-	if (fw_type_is_auto())
+	if (fw_type_is_auto()) {
 		tmp_fw_type = LIO_FW_NAME_TYPE_NIC;
-	else
+		strncpy(fw_type, tmp_fw_type, sizeof(fw_type));
+	} else {
 		tmp_fw_type = fw_type;
+	}
 
 	sprintf(fw_name, "%s%s%s_%s%s", LIO_FW_DIR, LIO_FW_BASE_NAME,
 		octeon_get_conf(oct)->card_name, tmp_fw_type,
-- 
1.8.3.1
[PATCH net-next 1/3] liquidio: allow override of firmware present in flash
From: Rick Farrington

Signed-off-by: Rick Farrington
Signed-off-by: Felix Manlunas
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c    | 68 ++
 .../net/ethernet/cavium/liquidio/liquidio_image.h  |  1 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 11 +++-
 .../net/ethernet/cavium/liquidio/octeon_device.h   | 10 
 4 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index e7f5494..ce08f71 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -59,9 +59,9 @@
 module_param(debug, int, 0644);
 MODULE_PARM_DESC(debug, "NETIF_MSG debug bits");
 
-static char fw_type[LIO_MAX_FW_TYPE_LEN] = LIO_FW_NAME_TYPE_NIC;
+static char fw_type[LIO_MAX_FW_TYPE_LEN] = LIO_FW_NAME_TYPE_AUTO;
 module_param_string(fw_type, fw_type, sizeof(fw_type), 0444);
-MODULE_PARM_DESC(fw_type, "Type of firmware to be loaded. Default \"nic\". Use \"none\" to load firmware from flash.");
+MODULE_PARM_DESC(fw_type, "Type of firmware to be loaded (default is \"auto\"), which uses firmware in flash, if present, else loads \"nic\".");
 
 static u32 console_bitmask;
 module_param(console_bitmask, int, 0644);
@@ -1115,10 +1115,10 @@ static int liquidio_watchdog(void *param)
 	return 0;
 }
 
-static bool fw_type_is_none(void)
+static bool fw_type_is_auto(void)
 {
-	return strncmp(fw_type, LIO_FW_NAME_TYPE_NONE,
-		       sizeof(LIO_FW_NAME_TYPE_NONE)) == 0;
+	return strncmp(fw_type, LIO_FW_NAME_TYPE_AUTO,
+		       sizeof(LIO_FW_NAME_TYPE_AUTO)) == 0;
 }
 
 /**
@@ -1302,7 +1302,7 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 		 * Implementation note: only soft-reset the device
 		 * if it is a CN6XXX OR the LAST CN23XX device.
 		 */
-		if (fw_type_is_none())
+		if (atomic_read(oct->adapter_fw_state) == FW_IS_PRELOADED)
 			octeon_pci_flr(oct);
 		else if (OCTEON_CN6XXX(oct) || !refcount)
 			oct->fn_list.soft_reset(oct);
@@ -1934,7 +1934,7 @@ static int load_firmware(struct octeon_device *oct)
 	char fw_name[LIO_MAX_FW_FILENAME_LEN];
 	char *tmp_fw_type;
 
-	if (fw_type[0] == '\0')
+	if (fw_type_is_auto())
 		tmp_fw_type = LIO_FW_NAME_TYPE_NIC;
 	else
 		tmp_fw_type = fw_type;
@@ -3882,9 +3882,9 @@ static void nic_starter(struct work_struct *work)
 static int octeon_device_init(struct octeon_device *octeon_dev)
 {
 	int j, ret;
-	int fw_loaded = 0;
 	char bootcmd[] = "\n";
 	char *dbg_enb = NULL;
+	enum lio_fw_state fw_state;
 	struct octeon_device_priv *oct_priv =
 		(struct octeon_device_priv *)octeon_dev->priv;
 	atomic_set(&octeon_dev->status, OCT_DEV_BEGIN_STATE);
@@ -3916,24 +3916,40 @@ static int octeon_device_init(struct octeon_device *octeon_dev)
 
 	octeon_dev->app_mode = CVM_DRV_INVALID_APP;
 
-	if (OCTEON_CN23XX_PF(octeon_dev)) {
-		if (!cn23xx_fw_loaded(octeon_dev) && !fw_type_is_none()) {
-			fw_loaded = 0;
-			/* Do a soft reset of the Octeon device. */
-			if (octeon_dev->fn_list.soft_reset(octeon_dev))
-				return 1;
-			/* things might have changed */
-			if (!cn23xx_fw_loaded(octeon_dev))
-				fw_loaded = 0;
-			else
-				fw_loaded = 1;
-		} else {
-			fw_loaded = 1;
-		}
-	} else if (octeon_dev->fn_list.soft_reset(octeon_dev)) {
-		return 1;
+	/* CN23XX supports preloaded firmware if the following is true:
+	 *
+	 * The adapter indicates that firmware is currently running AND
+	 * 'fw_type' is 'auto'.
+	 *
+	 * (default state is NEEDS_TO_BE_LOADED, override it if appropriate).
+	 */
+	if (OCTEON_CN23XX_PF(octeon_dev) &&
+	    cn23xx_fw_loaded(octeon_dev) && fw_type_is_auto()) {
+		atomic_cmpxchg(octeon_dev->adapter_fw_state,
+			       FW_NEEDS_TO_BE_LOADED, FW_IS_PRELOADED);
 	}
 
+	/* If loading firmware, only first device of adapter needs to do so. */
+	fw_state = atomic_cmpxchg(octeon_dev->adapter_fw_state,
+				  FW_NEEDS_TO_BE_LOADED,
+				  FW_IS_BEING_LOADED);
+
+	/* Here, [local variable] 'fw_state' is set to one of:
+	 *
+	 *   FW_IS_PRELOADED:       No firmware is to be loaded (see above)
+	 *   FW_NEEDS_TO_BE_LOADED: The driver's first instance will load
+	 *
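
The "only the first device of the adapter loads the firmware" step relies
on a compare-and-exchange that returns the previously observed state. The
following userspace sketch shows that pattern with C11 atomics. The SK_*
state names mirror the three states named in the patch, but the helper
functions are hypothetical and this is not the driver's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch-only copies of the three states named in the patch. */
enum sketch_fw_state {
    SK_FW_NEEDS_TO_BE_LOADED,
    SK_FW_IS_PRELOADED,
    SK_FW_IS_BEING_LOADED,
};

/*
 * Mimic the return convention of the kernel's atomic_cmpxchg(): swap in
 * 'new_val' only if the current value equals 'old', and return the value
 * that was observed before the attempt.
 */
static int sketch_cmpxchg(atomic_int *v, int old, int new_val)
{
    /* On failure, C11 stores the current value into 'old'. */
    atomic_compare_exchange_strong(v, &old, new_val);
    return old;
}

/* Returns true for exactly one caller: the one that must load firmware. */
static bool sketch_claim_loader(atomic_int *adapter_fw_state)
{
    return sketch_cmpxchg(adapter_fw_state, SK_FW_NEEDS_TO_BE_LOADED,
                          SK_FW_IS_BEING_LOADED) == SK_FW_NEEDS_TO_BE_LOADED;
}
```

A state that was already switched to "preloaded" is left untouched by
sketch_claim_loader(), so no instance attempts a load in that case.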
[PATCH net-next 0/3] liquidio: firmware loading
From: Rick Farrington

1. Allow host driver parameter to override auto-loaded firmware (in flash).
2. Verify version of firmware that is auto-loaded from flash.
3. Change value of fw_type module parameter to reflect default firmware
   image name that is loaded by host driver (in /sys/module/liquidio/...)

 drivers/net/ethernet/cavium/liquidio/lio_main.c    | 90 +++---
 .../net/ethernet/cavium/liquidio/liquidio_image.h  |  1 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 11 ++-
 .../net/ethernet/cavium/liquidio/octeon_device.h   | 10 +++
 4 files changed, 84 insertions(+), 28 deletions(-)

-- 
1.8.3.1
Re: [PATCH net v2] net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
On Fri, 2017-09-22 at 19:42 -0400, Willem de Bruijn wrote:
> Zerocopy skbs frags are copied when the skb is looped to a local sock.
> Commit 1080e512d44d ("net: orphan frags on receive") introduced calls
> to skb_orphan_frags to deliver_skb and __netif_receive_skb for this.
>
> With msg_zerocopy, these skbs can also exist in the tx path and thus
> loop from dev_queue_xmit_nit. This already calls deliver_skb in its
> loop. But it does not orphan before a separate pt_prev->func().
>
> Add the missing skb_orphan_frags_rx.
>
> Changes
>   v1->v2: handle skb_orphan_frags_rx failure
>
> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
> Signed-off-by: Willem de Bruijn
> ---
>  net/core/dev.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Eric Dumazet
[PATCH net v2] net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
Zerocopy skbs frags are copied when the skb is looped to a local sock.
Commit 1080e512d44d ("net: orphan frags on receive") introduced calls
to skb_orphan_frags to deliver_skb and __netif_receive_skb for this.

With msg_zerocopy, these skbs can also exist in the tx path and thus
loop from dev_queue_xmit_nit. This already calls deliver_skb in its
loop. But it does not orphan before a separate pt_prev->func().

Add the missing skb_orphan_frags_rx.

Changes
  v1->v2: handle skb_orphan_frags_rx failure

Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
Signed-off-by: Willem de Bruijn
---
 net/core/dev.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9a2254f9802f..588b473194a8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1948,8 +1948,12 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 		goto again;
 	}
 out_unlock:
-	if (pt_prev)
-		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
+	if (pt_prev) {
+		if (!skb_orphan_frags_rx(skb2, GFP_ATOMIC))
+			pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
+		else
+			kfree_skb(skb2);
+	}
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(dev_queue_xmit_nit);
-- 
2.14.1.821.g8fa685d3b7-goog
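
The v2 change boils down to a "deliver or free" rule for the cloned buffer:
if the preparation step fails, the clone must still be released by the
caller. A toy userspace model of that ownership rule is sketched below;
every sketch_* name is hypothetical, the failure is injected through a
parameter, and free() merely stands in for both the handler consuming the
buffer and kfree_skb().

```c
#include <stdlib.h>

/* Toy buffer; 'frags_shared' models frags that still point at user pages. */
struct sketch_skb {
    int frags_shared;
};

/* Counters so a caller can observe which path was taken. */
static int sketch_delivered;
static int sketch_freed;

/* Stand-in for the orphan step: 0 on success, nonzero on failure. */
static int sketch_orphan(struct sketch_skb *skb, int inject_failure)
{
    if (inject_failure)
        return -1;
    skb->frags_shared = 0;
    return 0;
}

/* Final hand-off of a clone: deliver it if it could be prepared, else free. */
static void sketch_final_deliver(struct sketch_skb *clone, int inject_failure)
{
    if (!sketch_orphan(clone, inject_failure)) {
        sketch_delivered++;
        free(clone);    /* models the handler consuming the clone */
    } else {
        sketch_freed++;
        free(clone);    /* models the explicit free on the error path */
    }
}
```

Without the else branch, the failure path would return while still holding
the only reference to the clone, which is the leak raised in review of v1.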
Re: [PATCH net] net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
On Fri, Sep 22, 2017 at 7:04 PM, Eric Dumazet wrote:
> On Fri, 2017-09-22 at 18:51 -0400, Willem de Bruijn wrote:
>> Zerocopy skbs frags are copied when the skb is looped to a local sock.
>> Commit 1080e512d44d ("net: orphan frags on receive") introduced calls
>> to skb_orphan_frags to deliver_skb and __netif_receive_skb.
>>
>> With msg_zerocopy, these skbs can also exist in the tx path and thus
>> loop from dev_queue_xmit_nit. This already calls deliver_skb in its
>> loop. But it does not orphan before a separate pt_prev->func().
>>
>> Add the missing skb_orphan_frags_rx.
>>
>> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
>> Signed-off-by: Willem de Bruijn
>> ---
>>  net/core/dev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 9a2254f9802f..3f5b26ff4f74 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1948,7 +1948,7 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
>>  		goto again;
>>  	}
>>  out_unlock:
>> -	if (pt_prev)
>> +	if (pt_prev && !skb_orphan_frags_rx(skb2, GFP_ATOMIC))
>>  		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
>
> Don't you need to kfree_skb(skb2) in case of failure ?

Oh, yes, of course! :/

Will fix right away.
[PATCH net-next v3 0/2] net: dsa: port enabling
This patchset makes slave open and close symmetrical and provides
helpers for enabling or disabling a given DSA port.

Changes in v3:
  - save the phy_device change for a future patchset

Changes in v2:
  - do not remove the phy argument from port enable/disable

Vivien Didelot (2):
  net: dsa: make slave close symmetrical to open
  net: dsa: add port enable and disable helpers

 net/dsa/dsa_priv.h |  3 ++-
 net/dsa/port.c     | 31 ++-
 net/dsa/slave.c    | 21 ++---
 3 files changed, 38 insertions(+), 17 deletions(-)

-- 
2.14.1
[PATCH net-next v3 1/2] net: dsa: make slave close symmetrical to open
The DSA slave open function configures the unicast MAC addresses on the
master device, enables the switch port, changes its STP state, then
starts the PHY device.

Make the close function symmetric, by first stopping the PHY device,
then changing the STP state, disabling the switch port and restoring
the master device.

Signed-off-by: Vivien Didelot
Reviewed-by: Florian Fainelli
Reviewed-by: Andrew Lunn
---
 net/dsa/slave.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 02ace7d462c4..c2bb48579032 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -133,6 +133,11 @@ static int dsa_slave_close(struct net_device *dev)
 	if (p->phy)
 		phy_stop(p->phy);
 
+	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, p->dp->index, p->phy);
+
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
 	if (dev->flags & IFF_ALLMULTI)
@@ -143,11 +148,6 @@ static int dsa_slave_close(struct net_device *dev)
 	if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
 		dev_uc_del(master, dev->dev_addr);
 
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index, p->phy);
-
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
 	return 0;
 }
 
-- 
2.14.1
[PATCH net-next v3 2/2] net: dsa: add port enable and disable helpers
Provide dsa_port_enable and dsa_port_disable helpers to respectively
enable and disable a switch port. This makes the dsa_port_set_state_now
helper static.

Signed-off-by: Vivien Didelot
Reviewed-by: Florian Fainelli
Reviewed-by: Andrew Lunn
---
 net/dsa/dsa_priv.h |  3 ++-
 net/dsa/port.c     | 31 ++-
 net/dsa/slave.c    | 19 +--
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 9803952a5b40..0298a0f6a349 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -117,7 +117,8 @@ void dsa_master_ethtool_restore(struct net_device *dev);
 /* port.c */
 int dsa_port_set_state(struct dsa_port *dp, u8 state,
 		       struct switchdev_trans *trans);
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state);
+int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
+void dsa_port_disable(struct dsa_port *dp, struct phy_device *phy);
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 76d43a82d397..72c8dbd3d3f2 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -56,7 +56,7 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state,
 	return 0;
 }
 
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
+static void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 {
 	int err;
 
@@ -65,6 +65,35 @@ void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 		pr_err("DSA: failed to set STP state %u (%d)\n", state, err);
 }
 
+int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy)
+{
+	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+	int err;
+
+	if (ds->ops->port_enable) {
+		err = ds->ops->port_enable(ds, port, phy);
+		if (err)
+			return err;
+	}
+
+	dsa_port_set_state_now(dp, stp_state);
+
+	return 0;
+}
+
+void dsa_port_disable(struct dsa_port *dp, struct phy_device *phy)
+{
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+
+	dsa_port_set_state_now(dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, port, phy);
+}
+
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 {
 	struct dsa_notifier_bridge_info info = {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index c2bb48579032..bd51ef56ec5b 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -73,9 +73,7 @@ static int dsa_slave_open(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_port *dp = p->dp;
-	struct dsa_switch *ds = dp->ds;
 	struct net_device *master = dsa_master_netdev(p);
-	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
 	int err;
 
 	if (!(master->flags & IFF_UP))
@@ -98,13 +96,9 @@ static int dsa_slave_open(struct net_device *dev)
 			goto clear_allmulti;
 	}
 
-	if (ds->ops->port_enable) {
-		err = ds->ops->port_enable(ds, p->dp->index, p->phy);
-		if (err)
-			goto clear_promisc;
-	}
-
-	dsa_port_set_state_now(p->dp, stp_state);
+	err = dsa_port_enable(dp, p->phy);
+	if (err)
+		goto clear_promisc;
 
 	if (p->phy)
 		phy_start(p->phy);
@@ -128,15 +122,12 @@ static int dsa_slave_close(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct net_device *master = dsa_master_netdev(p);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_port *dp = p->dp;
 
 	if (p->phy)
 		phy_stop(p->phy);
 
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index, p->phy);
+	dsa_port_disable(dp, p->phy);
 
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
-- 
2.14.1
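The shape of the two helpers (an optional driver callback plus an STP state change, applied in opposite order on enable and disable) can be modelled outside the kernel. The following is a rough userspace sketch, not the DSA code: `struct port`, `struct port_ops`, `port_enable`, `port_disable` and the `hw_*` callbacks are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum stp_state { STP_DISABLED, STP_BLOCKING, STP_FORWARDING };

struct port;

/* Hypothetical driver operations; either callback may be NULL. */
struct port_ops {
	int (*enable)(struct port *p);
	void (*disable)(struct port *p);
};

struct port {
	const struct port_ops *ops;
	int bridged;		/* nonzero when the port is a bridge member */
	int hw_enabled;
	enum stp_state state;
};

/* Enable: driver callback first, then the STP state. */
int port_enable(struct port *p)
{
	if (p->ops->enable) {
		int err = p->ops->enable(p);

		if (err)
			return err;	/* state is left untouched on failure */
	}

	p->state = p->bridged ? STP_BLOCKING : STP_FORWARDING;
	return 0;
}

/* Disable: the reverse order, STP state first, then the driver callback. */
void port_disable(struct port *p)
{
	p->state = STP_DISABLED;

	if (p->ops->disable)
		p->ops->disable(p);
}

int hw_enable(struct port *p)
{
	p->hw_enabled = 1;
	return 0;
}

void hw_disable(struct port *p)
{
	/* By the time the hardware is turned off, forwarding has stopped. */
	assert(p->state == STP_DISABLED);
	p->hw_enabled = 0;
}
```

Keeping both orderings inside one pair of helpers means callers such as an open/close path cannot get the sequence wrong independently of each other.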
Re: [PATCH net] net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
On Fri, 2017-09-22 at 18:51 -0400, Willem de Bruijn wrote:
> Zerocopy skbs frags are copied when the skb is looped to a local sock.
> Commit 1080e512d44d ("net: orphan frags on receive") introduced calls
> to skb_orphan_frags to deliver_skb and __netif_receive_skb.
>
> With msg_zerocopy, these skbs can also exist in the tx path and thus
> loop from dev_queue_xmit_nit. This already calls deliver_skb in its
> loop. But it does not orphan before a separate pt_prev->func().
>
> Add the missing skb_orphan_frags_rx.
>
> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
> Signed-off-by: Willem de Bruijn
> ---
>  net/core/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 9a2254f9802f..3f5b26ff4f74 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1948,7 +1948,7 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct
> net_device *dev)
>  		goto again;
>  	}
>  out_unlock:
> -	if (pt_prev)
> +	if (pt_prev && !skb_orphan_frags_rx(skb2, GFP_ATOMIC))
>  		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);

Don't you need to kfree_skb(skb2) in case of failure ?

>  	rcu_read_unlock();
>  }
[PATCH net] net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
Zerocopy skbs frags are copied when the skb is looped to a local sock.
Commit 1080e512d44d ("net: orphan frags on receive") introduced calls
to skb_orphan_frags to deliver_skb and __netif_receive_skb.

With msg_zerocopy, these skbs can also exist in the tx path and thus
loop from dev_queue_xmit_nit. This already calls deliver_skb in its
loop. But it does not orphan before a separate pt_prev->func().

Add the missing skb_orphan_frags_rx.

Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
Signed-off-by: Willem de Bruijn
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9a2254f9802f..3f5b26ff4f74 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1948,7 +1948,7 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 		goto again;
 	}
 out_unlock:
-	if (pt_prev)
+	if (pt_prev && !skb_orphan_frags_rx(skb2, GFP_ATOMIC))
 		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
 	rcu_read_unlock();
 }
-- 
2.14.1.821.g8fa685d3b7-goog
[PATCH net-next] hv_netvsc: Fix the real number of queues of non-vRSS cases
From: Haiyang Zhang

For older hosts without multi-channel (vRSS) support, and some error
cases, we still need to set the real number of queues to one.

This patch adds this missing setting.

Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Haiyang Zhang
---
 drivers/net/hyperv/netvsc_drv.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index d4902ee5f260..68eac12fbf75 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1929,6 +1929,12 @@ static int netvsc_probe(struct hv_device *dev,
 	/* We always need headroom for rndis header */
 	net->needed_headroom = RNDIS_AND_PPI_SIZE;
 
+	/* Initialize the number of queues to be 1, we may change it if more
+	 * channels are offered later.
+	 */
+	netif_set_real_num_tx_queues(net, 1);
+	netif_set_real_num_rx_queues(net, 1);
+
 	/* Notify the netvsc driver of the new device */
 	memset(&device_info, 0, sizeof(device_info));
 	device_info.ring_size = ring_size;
-- 
2.14.1
Re: [RFC PATCH 00/11] udp: full early demux for unconnected sockets
On Fri, 2017-09-22 at 23:06 +0200, Paolo Abeni wrote:
> This series refactor the UDP early demux code so that:
>
> * full socket lookup is performed for unicast packets
> * a sk is grabbed even for unconnected socket match
> * a dst cache is used even in such scenario
>
> To perform this tasks a couple of facilities are added:
>
> * noref socket references, scoped inside the current RCU section, to be
>   explicitly cleared before leaving such section
> * a dst cache inside the inet and inet6 local addresses tables, caching the
>   related local dst entry
>
> The measured performance gain under small packet UDP flood is as follow:
>
> ingress NIC    vanilla    patched    delta
> rx queues      (kpps)     (kpps)     (%)
> [ipv4]
> 1              2177       2414       10
> 2              2527       2892       14
> 3              3050       3733       22

This is a clear sign your program is not using latest SO_REUSEPORT +
[ec]BPF filter [1]

	return socket[RX_QUEUE# | or CPU#];

If udp_sink uses SO_REUSEPORT with no extra hint, socket selection is
based on a lazy hash, meaning that you do not have proper siloing.

	return socket[hash(skb)];

Multiple cpus can then :
- compete on grabbing same socket refcount
- compete on grabbing the receive queue lock
- compete for releasing lock and socket refcount
- skb freeing done on different cpus than where allocated.

You are adding complexity to the kernel because you are using a
sub-optimal user space program, favoring false sharing.

First solve the false sharing issue.

Performance with 2 rx queues should be almost twice the performance with
1 rx queue.

Then we can see if the gains you claim are still applicable.

Thanks

PS: Wei Wan is about to release the IPV6 changes so that the big
differences you showed are going to disappear soon.
Refs [1]

tools/testing/selftests/net/reuseport_bpf.c

6a5ef90c58daada158ba16ba330558efc3471491 Merge branch 'faster-soreuseport'
3ca8e4029969d40ab90e3f1ecd83ab1cadd60fbb soreuseport: BPF selection functional test
538950a1b7527a0a52ccd9337e3fcd304f027f13 soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
e32ea7e747271a0abcd37e265005e97cc81d9df5 soreuseport: fast reuseport UDP socket selection
ef456144da8ef507c8cf504284b6042e9201a05c soreuseport: define reuseport groups
[PATCH v4 0/9] bring back stack frame warning with KASAN
This is a new version of patches I originally submitted back in March [1], and last time in June [2]. This time I have basically rewritten the entire patch series based on a new approach that came out of GCC PR81715 that I opened[3]. The upcoming gcc-8 release is now much better at consolidating stack slots for inline function arguments and would obsolete most of my workaround patches here, but we still need the workarounds for gcc-5, gcc-6 and gcc-7. Many thanks to Jakub Jelinek for the analysis and the gcc-8 patch! This minimal set of patches only makes sure that we do get frame size warnings in allmodconfig for x86_64 and arm64 again with a 2048 byte limit, even with KASAN enabled, but without the new KASAN_EXTRA option. I set the warning limit with KASAN_EXTRA to 3072, limiting the allmodconfig+KASAN_EXTRA build output to around 50 legitimate warnings. These are for stack frames up to 31KB that will cause an immediate stack overflow, and fixing them would require bringing back my older patches and more. We can debate whether we want to apply those as a follow-up, or instead remove the option entirely. Another follow-up series I have reduces the warning limit with KASAN to 1536, and without KASAN to 1280 for 64-bit architectures. I hope we can get all patches merged for v4.14 and most of them backported into stable kernels. Since we no longer have a dependency on a preparation patch, my preference would be for the respective subsystem maintainers to pick up the individual patches. The last patch introduces a couple of "allmodconfig" build warnings on x86 and arm64 unless the other patches get merged first, I'll send that again separately once everything else has been taken care of. 
The remaining contents are:

- -fsanitize-address-use-after-scope is moved to a separate
  CONFIG_KASAN_EXTRA option that increases the warning limit
- CONFIG_KASAN_EXTRA is disabled with CONFIG_COMPILE_TEST,
  improving compile speed and disabling code that leads to
  valid warnings on gcc-7.0.1
- KMEMCHECK conflicts with CONFIG_KASAN
- my inline function workaround is applied to netlink, one
  ethernet driver and a few media drivers.
- The rework for the brcmsmac driver from previous versions
  is still there.

Changes since v3:
- I dropped all "noinline_if_stackbloat" annotations and used
  a workaround that introduces additional local variables in
  the inline functions to copy the function arguments, resulting
  in much better object code at the expense of having rather
  odd-looking functions.
- The v4 patches now don't help with KASAN_EXTRA any more
  at all, CONFIG_KASAN_EXTRA now depends on CONFIG_DEBUG_KERNEL,
  as it is more dangerous in production systems than it was before
- Rewrote the "em28xx" patch to be small enough for a stable backport.
- The rewritten vt-keyboard patches got merged and are now in
  stable kernels as well.
Changes since v2:
- rewrote the vt-keyboard patch based on feedback
- and made KMEMCHECK mutually exclusive with KASAN
  (rather than KASAN_EXTRA)

Changes since v1:
- dropped patches to fix all the CONFIG_KASAN_EXTRA warnings:
  - READ_ONCE/WRITE_ONCE cause problems in lots of code
  - typecheck() causes huge problems in a few places
  - many more uses of noinline_if_stackbloat

Arnd

[1] https://www.spinics.net/lists/linux-wireless/msg159819.html
[2] https://www.spinics.net/lists/netdev/msg441918.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715

Arnd Bergmann (9):
  brcmsmac: make some local variables 'static const' to reduce stack size
  brcmsmac: split up wlc_phy_workarounds_nphy
  brcmsmac: reindent split functions
  em28xx: fix em28xx_dvb_init for KASAN
  r820t: fix r820t_write_reg for KASAN
  dvb-frontends: fix i2c access helpers for KASAN
  rocker: fix rocker_tlv_put_* functions for KASAN
  netlink: fix nla_put_{u8,u16,u32} for KASAN
  kasan: rework Kconfig settings

 drivers/media/dvb-frontends/ascot2e.c          |    4 +-
 drivers/media/dvb-frontends/cxd2841er.c        |    4 +-
 drivers/media/dvb-frontends/helene.c           |    4 +-
 drivers/media/dvb-frontends/horus3a.c          |    4 +-
 drivers/media/dvb-frontends/itd1000.c          |    5 +-
 drivers/media/dvb-frontends/mt312.c            |    4 +-
 drivers/media/dvb-frontends/stb0899_drv.c      |    3 +-
 drivers/media/dvb-frontends/stb6100.c          |    6 +-
 drivers/media/dvb-frontends/stv0367.c          |    4 +-
 drivers/media/dvb-frontends/stv090x.c          |    4 +-
 drivers/media/dvb-frontends/stv6110x.c         |    4 +-
 drivers/media/dvb-frontends/zl10039.c          |    4 +-
 drivers/media/tuners/r820t.c                   |   13 +-
 drivers/media/usb/em28xx/em28xx-dvb.c          |   30 +-
 drivers/net/ethernet/rocker/rocker_tlv.h       |   48 +-
 .../broadcom/brcm80211/brcmsmac/phy/phy_n.c    | 1856 ++--
 include/net/netlink.h                          |   73 +-
 lib/Kconfig.debug
[PATCH v4 1/9] brcmsmac: make some local variables 'static const' to reduce stack size
With KASAN and a couple of other patches applied, this driver is one
of the few remaining ones that actually use more than 2048 bytes of
kernel stack:

broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 'wlc_phy_workarounds_nphy_gainctrl':
broadcom/brcm80211/brcmsmac/phy/phy_n.c:16065:1: warning: the frame size of 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 'wlc_phy_workarounds_nphy':
broadcom/brcm80211/brcmsmac/phy/phy_n.c:17138:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Here, I'm reducing the stack size by marking as many local variables as
'static const' as I can without changing the actual code.

This is the first of three patches to improve the stack usage in this
driver. It would be good to have this backported to stable kernels
to get all drivers in 'allmodconfig' below the 2048 byte limit so
we can turn on the frame warning again globally, but I realize that
the patch is larger than the normal limit for stable backports.

The other two patches do not need to be backported.
Acked-by: Arend van SprielSigned-off-by: Arnd Bergmann --- .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 197 ++--- 1 file changed, 97 insertions(+), 100 deletions(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c index b3aab2fe96eb..ef685465f80a 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c @@ -14764,8 +14764,8 @@ static void wlc_phy_ipa_restore_tx_digi_filts_nphy(struct brcms_phy *pi) } static void -wlc_phy_set_rfseq_nphy(struct brcms_phy *pi, u8 cmd, u8 *events, u8 *dlys, - u8 len) +wlc_phy_set_rfseq_nphy(struct brcms_phy *pi, u8 cmd, const u8 *events, + const u8 *dlys, u8 len) { u32 t1_offset, t2_offset; u8 ctr; @@ -15240,16 +15240,16 @@ static void wlc_phy_workarounds_nphy_gainctrl_2057_rev5(struct brcms_phy *pi) static void wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi) { u16 currband; - s8 lna1G_gain_db_rev7[] = { 9, 14, 19, 24 }; - s8 *lna1_gain_db = NULL; - s8 *lna1_gain_db_2 = NULL; - s8 *lna2_gain_db = NULL; - s8 tiaA_gain_db_rev7[] = { -9, -6, -3, 0, 3, 3, 3, 3, 3, 3 }; - s8 *tia_gain_db; - s8 tiaA_gainbits_rev7[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4 }; - s8 *tia_gainbits; - u16 rfseqA_init_gain_rev7[] = { 0x624f, 0x624f }; - u16 *rfseq_init_gain; + static const s8 lna1G_gain_db_rev7[] = { 9, 14, 19, 24 }; + const s8 *lna1_gain_db = NULL; + const s8 *lna1_gain_db_2 = NULL; + const s8 *lna2_gain_db = NULL; + static const s8 tiaA_gain_db_rev7[] = { -9, -6, -3, 0, 3, 3, 3, 3, 3, 3 }; + const s8 *tia_gain_db; + static const s8 tiaA_gainbits_rev7[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4 }; + const s8 *tia_gainbits; + static const u16 rfseqA_init_gain_rev7[] = { 0x624f, 0x624f }; + const u16 *rfseq_init_gain; u16 init_gaincode; u16 clip1hi_gaincode; u16 clip1md_gaincode = 0; @@ -15310,10 +15310,9 @@ static void wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy 
*pi) if ((freq <= 5080) || (freq == 5825)) { - s8 lna1A_gain_db_rev7[] = { 11, 16, 20, 24 }; - s8 lna1A_gain_db_2_rev7[] = { - 11, 17, 22, 25}; - s8 lna2A_gain_db_rev7[] = { -1, 6, 10, 14 }; + static const s8 lna1A_gain_db_rev7[] = { 11, 16, 20, 24 }; + static const s8 lna1A_gain_db_2_rev7[] = { 11, 17, 22, 25}; + static const s8 lna2A_gain_db_rev7[] = { -1, 6, 10, 14 }; crsminu_th = 0x3e; lna1_gain_db = lna1A_gain_db_rev7; @@ -15321,10 +15320,9 @@ static void wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi) lna2_gain_db = lna2A_gain_db_rev7; } else if ((freq >= 5500) && (freq <= 5700)) { - s8 lna1A_gain_db_rev7[] = { 11, 17, 21, 25 }; - s8 lna1A_gain_db_2_rev7[] = { - 12, 18, 22, 26}; - s8 lna2A_gain_db_rev7[] = { 1, 8, 12, 16 }; + static const s8 lna1A_gain_db_rev7[] = { 11, 17, 21, 25 }; + static const s8 lna1A_gain_db_2_rev7[] = { 12, 18, 22, 26}; + static const s8 lna2A_gain_db_rev7[] = { 1, 8, 12, 16 }; crsminu_th =
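The effect of the 'static const' conversion in this patch can be seen in a reduced example. This is a sketch with made-up function names (`gain_lookup_stack`, `gain_lookup_rodata`), reusing only the table values { 9, 14, 19, 24 } from the diff. A plain local array is an object with automatic storage that is initialised on every call and counts toward the frame size, while a 'static const' array has static storage and can be placed in read-only data, so it adds nothing to the frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NGAINS 4

/* Automatic array: initialised on each call, counted in the stack frame. */
int8_t gain_lookup_stack(size_t idx)
{
	int8_t gain_db[NGAINS] = { 9, 14, 19, 24 };

	assert(idx < NGAINS);
	return gain_db[idx];
}

/* Static const array: one read-only copy, nothing added to the frame. */
int8_t gain_lookup_rodata(size_t idx)
{
	static const int8_t gain_db[NGAINS] = { 9, 14, 19, 24 };

	assert(idx < NGAINS);
	return gain_db[idx];
}
```

Both functions return the same values; the difference only shows up in the generated code and in the frame size the compiler reports. An optimising compiler may already avoid the stack copy for a table this small, so the saving matters most for the large tables and instrumented builds discussed in this series.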
[PATCH v4 2/9] brcmsmac: split up wlc_phy_workarounds_nphy
The stack consumption in this driver is still relatively high, with one remaining warning if the warning level is lowered to 1536 bytes: drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17135:1: error: the frame size of 1880 bytes is larger than 1536 bytes [-Werror=frame-larger-than=] The affected function is actually a collection of three separate implementations, and each of them is fairly large by itself. Splitting them up is done easily and improves readability at the same time. I'm leaving the original indentation to make the review easier. Acked-by: Arend van SprielSigned-off-by: Arnd Bergmann --- .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 178 - 1 file changed, 104 insertions(+), 74 deletions(-) This one and the following patch could be merged for either v4.14 or v4.15 at this point, whichever the maintainers prefer. No need to backport them to stable kernels. diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c index ef685465f80a..ed409a80f3d2 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c @@ -16061,52 +16061,8 @@ static void wlc_phy_workarounds_nphy_gainctrl(struct brcms_phy *pi) } } -static void wlc_phy_workarounds_nphy(struct brcms_phy *pi) +static void wlc_phy_workarounds_nphy_rev7(struct brcms_phy *pi) { - static const u8 rfseq_rx2tx_events[] = { - NPHY_RFSEQ_CMD_NOP, - NPHY_RFSEQ_CMD_RXG_FBW, - NPHY_RFSEQ_CMD_TR_SWITCH, - NPHY_RFSEQ_CMD_CLR_HIQ_DIS, - NPHY_RFSEQ_CMD_RXPD_TXPD, - NPHY_RFSEQ_CMD_TX_GAIN, - NPHY_RFSEQ_CMD_EXT_PA - }; - u8 rfseq_rx2tx_dlys[] = { 8, 6, 6, 2, 4, 60, 1 }; - static const u8 rfseq_tx2rx_events[] = { - NPHY_RFSEQ_CMD_NOP, - NPHY_RFSEQ_CMD_EXT_PA, - NPHY_RFSEQ_CMD_TX_GAIN, - NPHY_RFSEQ_CMD_RXPD_TXPD, - NPHY_RFSEQ_CMD_TR_SWITCH, - NPHY_RFSEQ_CMD_RXG_FBW, - NPHY_RFSEQ_CMD_CLR_HIQ_DIS - }; - static const u8 rfseq_tx2rx_dlys[] = { 8, 6, 2, 4, 4, 
6, 1 }; - static const u8 rfseq_tx2rx_events_rev3[] = { - NPHY_REV3_RFSEQ_CMD_EXT_PA, - NPHY_REV3_RFSEQ_CMD_INT_PA_PU, - NPHY_REV3_RFSEQ_CMD_TX_GAIN, - NPHY_REV3_RFSEQ_CMD_RXPD_TXPD, - NPHY_REV3_RFSEQ_CMD_TR_SWITCH, - NPHY_REV3_RFSEQ_CMD_RXG_FBW, - NPHY_REV3_RFSEQ_CMD_CLR_HIQ_DIS, - NPHY_REV3_RFSEQ_CMD_END - }; - static const u8 rfseq_tx2rx_dlys_rev3[] = { 8, 4, 2, 2, 4, 4, 6, 1 }; - u8 rfseq_rx2tx_events_rev3[] = { - NPHY_REV3_RFSEQ_CMD_NOP, - NPHY_REV3_RFSEQ_CMD_RXG_FBW, - NPHY_REV3_RFSEQ_CMD_TR_SWITCH, - NPHY_REV3_RFSEQ_CMD_CLR_HIQ_DIS, - NPHY_REV3_RFSEQ_CMD_RXPD_TXPD, - NPHY_REV3_RFSEQ_CMD_TX_GAIN, - NPHY_REV3_RFSEQ_CMD_INT_PA_PU, - NPHY_REV3_RFSEQ_CMD_EXT_PA, - NPHY_REV3_RFSEQ_CMD_END - }; - u8 rfseq_rx2tx_dlys_rev3[] = { 8, 6, 6, 4, 4, 18, 42, 1, 1 }; - static const u8 rfseq_rx2tx_events_rev3_ipa[] = { NPHY_REV3_RFSEQ_CMD_NOP, NPHY_REV3_RFSEQ_CMD_RXG_FBW, @@ -16120,29 +16076,15 @@ static void wlc_phy_workarounds_nphy(struct brcms_phy *pi) }; static const u8 rfseq_rx2tx_dlys_rev3_ipa[] = { 8, 6, 6, 4, 4, 16, 43, 1, 1 }; static const u16 rfseq_rx2tx_dacbufpu_rev7[] = { 0x10f, 0x10f }; - - s16 alpha0, alpha1, alpha2; - s16 beta0, beta1, beta2; - u32 leg_data_weights, ht_data_weights, nss1_data_weights, - stbc_data_weights; + u32 leg_data_weights; u8 chan_freq_range = 0; static const u16 dac_control = 0x0002; u16 aux_adc_vmid_rev7_core0[] = { 0x8e, 0x96, 0x96, 0x96 }; u16 aux_adc_vmid_rev7_core1[] = { 0x8f, 0x9f, 0x9f, 0x96 }; - u16 aux_adc_vmid_rev4[] = { 0xa2, 0xb4, 0xb4, 0x89 }; - u16 aux_adc_vmid_rev3[] = { 0xa2, 0xb4, 0xb4, 0x89 }; - u16 *aux_adc_vmid; u16 aux_adc_gain_rev7[] = { 0x02, 0x02, 0x02, 0x02 }; - u16 aux_adc_gain_rev4[] = { 0x02, 0x02, 0x02, 0x00 }; - u16 aux_adc_gain_rev3[] = { 0x02, 0x02, 0x02, 0x00 }; - u16 *aux_adc_gain; - static const u16 sk_adc_vmid[] = { 0xb4, 0xb4, 0xb4, 0x24 }; - static const u16 sk_adc_gain[] = { 0x02, 0x02, 0x02, 0x02 }; s32 min_nvar_val = 0x18d; s32 min_nvar_offset_6mbps = 20; u8 pdetrange; - u8 triso; - u16 regval; 
u16 afectrl_adc_ctrl1_rev7 = 0x20; u16 afectrl_adc_ctrl2_rev7 = 0x0; u16 rfseq_rx2tx_lpf_h_hpc_rev7 = 0x77; @@ -16171,17 +16113,6 @@ static void
[PATCH v4 8/9] netlink: fix nla_put_{u8,u16,u32} for KASAN
When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
stack frames in some functions. This goes unnoticed normally because
CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit
3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
KASAN=y").

The kernelci.org build bot however has the warning enabled and that led
me to investigate it a little further, as every build produces these warnings:

net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Most of this problem is now solved in gcc-8, which can consolidate
the stack slots for the inline function arguments. On older compilers
we can add a workaround by declaring a local variable in each function
to pass the inline function argument.
Cc: sta...@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann
---
 include/net/netlink.h | 73 ++-
 1 file changed, 55 insertions(+), 18 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index e51cf5f81597..14c289393071 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -773,7 +773,10 @@ static inline int nla_parse_nested(struct nlattr *tb[], int maxtype,
  */
 static inline int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
 {
-	return nla_put(skb, attrtype, sizeof(u8), &value);
+	/* temporary variables to work around GCC PR81715 with asan-stack=1 */
+	u8 tmp = value;
+
+	return nla_put(skb, attrtype, sizeof(u8), &tmp);
 }
 
 /**
@@ -784,7 +787,9 @@ static inline int nla_put_u8(struct sk_buff *skb, int attrtype, u8 value)
  */
 static inline int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value)
 {
-	return nla_put(skb, attrtype, sizeof(u16), &value);
+	u16 tmp = value;
+
+	return nla_put(skb, attrtype, sizeof(u16), &tmp);
 }
 
 /**
@@ -795,7 +800,9 @@ static inline int nla_put_u16(struct sk_buff *skb, int attrtype, u16 value)
  */
 static inline int nla_put_be16(struct sk_buff *skb, int attrtype, __be16 value)
 {
-	return nla_put(skb, attrtype, sizeof(__be16), &value);
+	__be16 tmp = value;
+
+	return nla_put(skb, attrtype, sizeof(__be16), &tmp);
 }
 
 /**
@@ -806,7 +813,9 @@ static inline int nla_put_be16(struct sk_buff *skb, int attrtype, __be16 value)
  */
 static inline int nla_put_net16(struct sk_buff *skb, int attrtype, __be16 value)
 {
-	return nla_put_be16(skb, attrtype | NLA_F_NET_BYTEORDER, value);
+	__be16 tmp = value;
+
+	return nla_put_be16(skb, attrtype | NLA_F_NET_BYTEORDER, tmp);
 }
 
 /**
@@ -817,7 +826,9 @@ static inline int nla_put_net16(struct sk_buff *skb, int attrtype, __be16 value)
  */
 static inline int nla_put_le16(struct sk_buff *skb, int attrtype, __le16 value)
 {
-	return nla_put(skb, attrtype, sizeof(__le16), &value);
+	__le16 tmp = value;
+
+	return nla_put(skb, attrtype, sizeof(__le16), &tmp);
 }
 
 /**
@@ -828,7 +839,9 @@ static inline int nla_put_le16(struct sk_buff *skb, int attrtype, __le16 value)
  */
 static inline int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value)
 {
-	return nla_put(skb, attrtype, sizeof(u32), &value);
+	u32 tmp = value;
+
+	return nla_put(skb, attrtype, sizeof(u32), &tmp);
 }
 
 /**
@@ -839,7 +852,9 @@ static inline int nla_put_u32(struct sk_buff *skb, int attrtype, u32 value)
  */
 static inline int nla_put_be32(struct sk_buff *skb, int attrtype, __be32 value)
 {
-	return nla_put(skb, attrtype, sizeof(__be32), &value);
+	__be32 tmp = value;
+
+	return nla_put(skb, attrtype, sizeof(__be32), &tmp);
 }
 
 /**
@@ -850,7 +865,9 @@ static inline int nla_put_be32(struct sk_buff *skb, int attrtype, __be32 value)
  */
 static inline int nla_put_net32(struct sk_buff *skb, int attrtype, __be32 value)
 {
-	return nla_put_be32(skb, attrtype | NLA_F_NET_BYTEORDER, value);
+	__be32 tmp = value;
+
+	return nla_put_be32(skb, attrtype | NLA_F_NET_BYTEORDER, tmp);
 }
 
 /**
@@ -861,7 +878,9 @@ static inline int nla_put_net32(struct sk_buff *skb, int attrtype, __be32 value)
  */
 static inline int nla_put_le32(struct sk_buff *skb, int attrtype, __le32 value)
 {
-	return nla_put(skb, attrtype, sizeof(__le32), &value);
+	__le32 tmp = value;
+
+	return nla_put(skb, attrtype, sizeof(__le32), &tmp);
 }
 
 /**
@@ -874,7 +893,9 @@ static inline int nla_put_le32(struct sk_buff *skb, int attrtype, __le32 value)
 static inline int nla_put_u64_64bit(struct sk_buff *skb, int attrtype,
 				    u64 value,
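The workaround pattern used throughout this patch (copy the by-value argument into a local, then pass the local's address) can be tried in isolation. Below is a small userspace sketch: `struct msg_buf`, `attr_put` and `attr_put_u16` are hypothetical stand-ins, not the netlink API, and the stack-slot effect itself only appears with the affected gcc versions and asan-stack=1, so this only demonstrates that the rewritten helper behaves the same as before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message buffer standing in for an skb. */
struct msg_buf {
	uint8_t data[64];
	size_t len;
};

/* Stand-in for nla_put(): append attrlen bytes of payload, no header. */
int attr_put(struct msg_buf *m, int attrtype, size_t attrlen, const void *data)
{
	(void)attrtype;	/* a real attribute header would carry the type */

	if (m->len + attrlen > sizeof(m->data))
		return -1;

	memcpy(m->data + m->len, data, attrlen);
	m->len += attrlen;
	return 0;
}

/*
 * Same shape as the patched helpers: the address that escapes to the
 * out-of-line function is that of a local copy, not of the argument.
 */
static inline int attr_put_u16(struct msg_buf *m, int attrtype, uint16_t value)
{
	uint16_t tmp = value;

	return attr_put(m, attrtype, sizeof(tmp), &tmp);
}
```

The copy costs nothing at run time in an optimised build; its purpose, per the commit message, is to let older compilers reuse one stack slot across many inlined calls instead of allocating a separate instrumented slot per call site.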
[PATCH v4 9/9] kasan: rework Kconfig settings
We get a lot of very large stack frames using gcc-7.0.1 with the default
-fsanitize-address-use-after-scope --param asan-stack=1 options, which
can easily cause an overflow of the kernel stack, e.g.

drivers/gpu/drm/i915/gvt/handlers.c:2407:1: error: the frame size of 31216 bytes is larger than 2048 bytes
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:5650:1: error: the frame size of 23632 bytes is larger than 2048 bytes
drivers/scsi/fnic/fnic_trace.c:451:1: error: the frame size of 5152 bytes is larger than 2048 bytes
fs/btrfs/relocation.c:1202:1: error: the frame size of 4256 bytes is larger than 2048 bytes
fs/fscache/stats.c:287:1: error: the frame size of 6552 bytes is larger than 2048 bytes
lib/atomic64_test.c:250:1: error: the frame size of 12616 bytes is larger than 2048 bytes
mm/vmscan.c:1367:1: error: the frame size of 5080 bytes is larger than 2048 bytes
net/wireless/nl80211.c:1905:1: error: the frame size of 4232 bytes is larger than 2048 bytes

To reduce this risk, -fsanitize-address-use-after-scope is now split
out into a separate CONFIG_KASAN_EXTRA Kconfig option, leading to stack
frames that are smaller than 2 kilobytes most of the time on x86_64. An
earlier version of this patch also prevented combining KASAN_EXTRA with
KASAN_INLINE, but that is no longer necessary with gcc-7.0.1.

A lot of warnings with KASAN_EXTRA go away if we disable KMEMCHECK,
as -fsanitize-address-use-after-scope seems to understand the builtin
memcpy, but adds checking code around an extern memcpy call. I had to work
around a circular dependency, as DEBUG_SLAB/SLUB depended on !KMEMCHECK,
while KASAN did it the other way round. Now we handle both the same way
and make KASAN and KMEMCHECK mutually exclusive.

All patches to get the frame size below 2048 bytes with CONFIG_KASAN=y
and CONFIG_KASAN_EXTRA=n have been submitted along with this patch, so
we can bring back that default now.
KASAN_EXTRA=y still causes lots of warnings but now defaults to
!COMPILE_TEST to disable it in allmodconfig, and it remains disabled
in all other defconfigs since it is a new option. I arbitrarily raise
the warning limit for KASAN_EXTRA to 3072 to reduce the noise, but an
allmodconfig kernel still has around 50 warnings on gcc-7.

I experimented a bit more with smaller stack frames and have another
follow-up series that reduces the warning limit for 64-bit architectures
to 1280 bytes (without CONFIG_KASAN).

With earlier versions of this patch series, I also had patches to
address the warnings we get with KASAN and/or KASAN_EXTRA, using a
"noinline_if_stackbloat" annotation. That annotation now got replaced
with a gcc-8 bugfix (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715)
and a workaround for older compilers, which means that KASAN_EXTRA is
now just as bad as before and will lead to an instant stack overflow in
a few extreme cases.

This reverts parts of commit 3f181b4 ("lib/Kconfig.debug: disable
-Wframe-larger-than warnings with KASAN=y").
Signed-off-by: Arnd Bergmann
---
 lib/Kconfig.debug      |  4 ++--
 lib/Kconfig.kasan      | 13 -
 lib/Kconfig.kmemcheck  |  1 +
 scripts/Makefile.kasan |  3 +++
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b19c491cbc4e..5755875d4a80 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -217,7 +217,7 @@ config ENABLE_MUST_CHECK
 config FRAME_WARN
 	int "Warn for stack frames larger than (needs gcc 4.4)"
 	range 0 8192
-	default 0 if KASAN
+	default 3072 if KASAN_EXTRA
 	default 2048 if GCC_PLUGIN_LATENT_ENTROPY
 	default 1024 if !64BIT
 	default 2048 if 64BIT
@@ -503,7 +503,7 @@ config DEBUG_OBJECTS_ENABLE_DEFAULT
 
 config DEBUG_SLAB
 	bool "Debug slab memory allocations"
-	depends on DEBUG_KERNEL && SLAB && !KMEMCHECK
+	depends on DEBUG_KERNEL && SLAB && !KMEMCHECK && !KASAN
 	help
 	  Say Y here to have the kernel do limited verification on memory
 	  allocation as well as poisoning memory on free to catch use of freed
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index bd38aab05929..db799e6e9dba 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -5,7 +5,7 @@ if HAVE_ARCH_KASAN
 
 config KASAN
 	bool "KASan: runtime memory debugger"
-	depends on SLUB || (SLAB && !DEBUG_SLAB)
+	depends on SLUB || SLAB
 	select CONSTRUCTORS
 	select STACKDEPOT
 	help
@@ -20,6 +20,17 @@ config KASAN
 	  Currently CONFIG_KASAN doesn't work with CONFIG_DEBUG_SLAB
 	  (the resulting kernel does not boot).
 
+config KASAN_EXTRA
+	bool "KAsan: extra checks"
+	depends on KASAN && DEBUG_KERNEL && !COMPILE_TEST
+	help
+	  This enables further checks in the kernel address sanitizer, for now
+	  it only includes the address-use-after-scope check that can lead
+	  to excessive kernel stack usage, frame size warnings and longer
+
[PATCH v4 7/9] rocker: fix rocker_tlv_put_* functions for KASAN
Inlining these functions creates lots of stack variables that each take
64 bytes when KASAN is enabled, leading to this warning about potential
stack overflow:

drivers/net/ethernet/rocker/rocker_ofdpa.c: In function 'ofdpa_cmd_flow_tbl_add':
drivers/net/ethernet/rocker/rocker_ofdpa.c:621:1: error: the frame size of 2752 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]

gcc-8 can now consolidate the stack slots itself, but on older versions
we get the same behavior by using a temporary variable that holds a
copy of the inline function argument.

Cc: sta...@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann
---
 drivers/net/ethernet/rocker/rocker_tlv.h | 48
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_tlv.h b/drivers/net/ethernet/rocker/rocker_tlv.h
index a63ef82e7c72..dfae3c9d57c6 100644
--- a/drivers/net/ethernet/rocker/rocker_tlv.h
+++ b/drivers/net/ethernet/rocker/rocker_tlv.h
@@ -139,40 +139,52 @@ rocker_tlv_start(struct rocker_desc_info *desc_info)
 int rocker_tlv_put(struct rocker_desc_info *desc_info,
 		   int attrtype, int attrlen, const void *data);
 
-static inline int rocker_tlv_put_u8(struct rocker_desc_info *desc_info,
-				    int attrtype, u8 value)
+static inline int
+rocker_tlv_put_u8(struct rocker_desc_info *desc_info, int attrtype, u8 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &value);
+	u8 tmp = value; /* work around GCC PR81715 */
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &tmp);
 }
 
-static inline int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
-				     int attrtype, u16 value)
+static inline int
+rocker_tlv_put_u16(struct rocker_desc_info *desc_info, int attrtype, u16 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
+	u16 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &tmp);
 }
 
-static inline int rocker_tlv_put_be16(struct rocker_desc_info *desc_info,
-				      int attrtype, __be16 value)
+static inline int
+rocker_tlv_put_be16(struct rocker_desc_info *desc_info, int attrtype, __be16 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(__be16), &value);
+	__be16 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(__be16), &tmp);
 }
 
-static inline int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
-				     int attrtype, u32 value)
+static inline int
+rocker_tlv_put_u32(struct rocker_desc_info *desc_info, int attrtype, u32 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
+	u32 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &tmp);
 }
 
-static inline int rocker_tlv_put_be32(struct rocker_desc_info *desc_info,
-				      int attrtype, __be32 value)
+static inline int
+rocker_tlv_put_be32(struct rocker_desc_info *desc_info, int attrtype, __be32 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(__be32), &value);
+	__be32 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(__be32), &tmp);
 }
 
-static inline int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
-				     int attrtype, u64 value)
+static inline int
+rocker_tlv_put_u64(struct rocker_desc_info *desc_info, int attrtype, u64 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &value);
+	u64 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &tmp);
 }
 
 static inline struct rocker_tlv *
-- 
2.9.0
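The temporary-variable workaround described in this patch can be tried outside the kernel. The following is only a minimal userspace sketch: `struct tlv_buf`, `tlv_put()` and `tlv_put_u32()` are hypothetical names invented for illustration and are not the rocker driver's API. It shows the shape of the change, where the inline wrapper passes the address of a local copy rather than the address of its by-value parameter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV buffer, standing in for struct rocker_desc_info. */
struct tlv_buf {
	unsigned char data[64];
	size_t used;
};

/* Out-of-line helper that takes the payload by pointer. */
int tlv_put(struct tlv_buf *buf, const void *data, size_t len)
{
	assert(buf != NULL && data != NULL);
	if (len > sizeof(buf->data) - buf->used)
		return -1;	/* no room left in the buffer */
	memcpy(buf->data + buf->used, data, len);
	buf->used += len;
	return 0;
}

/*
 * Inline wrapper: copy the by-value argument into a local and pass the
 * address of that local instead of the address of the parameter.
 */
static inline int tlv_put_u32(struct tlv_buf *buf, uint32_t value)
{
	uint32_t tmp = value;	/* same shape as the PR81715 workaround */

	return tlv_put(buf, &tmp, sizeof(tmp));
}
```

The behavior of the wrapper is unchanged by the extra copy; whether it actually reduces stack usage depends on the compiler version and on the sanitizer options in use.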
[PATCH v4 4/9] em28xx: fix em28xx_dvb_init for KASAN
With CONFIG_KASAN, the init function uses a large amount of kernel stack: drivers/media/usb/em28xx/em28xx-dvb.c: In function 'em28xx_dvb_init.part.4': drivers/media/usb/em28xx/em28xx-dvb.c:2061:1: error: the frame size of 3232 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] It seems that this is triggered in part by using strlcpy(), which the compiler doesn't recognize as copying at most 'len' bytes, since strlcpy is not part of the C standard. It does however recognize the standard strncpy() and optimizes away the extra checks for that, using only 1688 bytes in the end. I have another larger patch that we could use in addition to this one, in order to shrink the stack for -fsanitize-address-use-after-scope (with gcc-7.1.1) as well, but that would not be appropriate for stable backports, so let's focus on this one first. Cc: sta...@vger.kernel.org Signed-off-by: Arnd Bergmann--- drivers/media/usb/em28xx/em28xx-dvb.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/media/usb/em28xx/em28xx-dvb.c b/drivers/media/usb/em28xx/em28xx-dvb.c index 4a7db623fe29..06c363dc55ed 100644 --- a/drivers/media/usb/em28xx/em28xx-dvb.c +++ b/drivers/media/usb/em28xx/em28xx-dvb.c @@ -1440,7 +1440,7 @@ static int em28xx_dvb_init(struct em28xx *dev) tda10071_pdata.pll_multiplier = 20, tda10071_pdata.tuner_i2c_addr = 0x14, memset(_info, 0, sizeof(board_info)); - strlcpy(board_info.type, "tda10071_cx24118", I2C_NAME_SIZE); + strncpy(board_info.type, "tda10071_cx24118", I2C_NAME_SIZE - 1); board_info.addr = 0x55; board_info.platform_data = _pdata; request_module("tda10071"); @@ -1460,7 +1460,7 @@ static int em28xx_dvb_init(struct em28xx *dev) /* attach SEC */ a8293_pdata.dvb_frontend = dvb->fe[0]; memset(_info, 0, sizeof(board_info)); - strlcpy(board_info.type, "a8293", I2C_NAME_SIZE); + strncpy(board_info.type, "a8293", I2C_NAME_SIZE - 1); board_info.addr = 0x08; board_info.platform_data = _pdata; request_module("a8293"); @@ -1643,7 +1643,7 @@ 
static int em28xx_dvb_init(struct em28xx *dev) m88ds3103_pdata.ts_clk_pol = 1; m88ds3103_pdata.agc = 0x99; memset(_info, 0, sizeof(board_info)); - strlcpy(board_info.type, "m88ds3103", I2C_NAME_SIZE); + strncpy(board_info.type, "m88ds3103", I2C_NAME_SIZE - 1); board_info.addr = 0x68; board_info.platform_data = _pdata; request_module("m88ds3103"); @@ -1664,7 +1664,7 @@ static int em28xx_dvb_init(struct em28xx *dev) /* attach tuner */ ts2020_config.fe = dvb->fe[0]; memset(_info, 0, sizeof(board_info)); - strlcpy(board_info.type, "ts2022", I2C_NAME_SIZE); + strncpy(board_info.type, "ts2022", I2C_NAME_SIZE - 1); board_info.addr = 0x60; board_info.platform_data = _config; request_module("ts2020"); @@ -1690,7 +1690,7 @@ static int em28xx_dvb_init(struct em28xx *dev) /* attach SEC */ a8293_pdata.dvb_frontend = dvb->fe[0]; memset(_info, 0, sizeof(board_info)); - strlcpy(board_info.type, "a8293", I2C_NAME_SIZE); + strncpy(board_info.type, "a8293", I2C_NAME_SIZE - 1); board_info.addr = 0x08; board_info.platform_data = _pdata; request_module("a8293"); @@ -1729,7 +1729,7 @@ static int em28xx_dvb_init(struct em28xx *dev) si2168_config.fe = >fe[0]; si2168_config.ts_mode = SI2168_TS_PARALLEL; memset(, 0, sizeof(struct i2c_board_info)); - strlcpy(info.type, "si2168", I2C_NAME_SIZE); + strncpy(info.type, "si2168", I2C_NAME_SIZE - 1); info.addr = 0x64; info.platform_data = _config; request_module(info.type); @@ -1755,7 +1755,7 @@ static int em28xx_dvb_init(struct em28xx *dev) si2157_config.mdev = dev->media_dev; #endif memset(, 0, sizeof(struct i2c_board_info)); - strlcpy(info.type, "si2157", I2C_NAME_SIZE); + strncpy(info.type, "si2157", I2C_NAME_SIZE - 1); info.addr = 0x60; info.platform_data = _config; request_module(info.type); @@ -1793,7 +1793,7 @@ static int em28xx_dvb_init(struct em28xx *dev) si2168_config.fe = >fe[0]; si2168_config.ts_mode = SI2168_TS_PARALLEL; memset(, 0,
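The strncpy() replacement above stays NUL-terminated only because each call is preceded by a memset() of the whole structure and copies at most I2C_NAME_SIZE - 1 bytes. A small userspace sketch of that pairing follows; `struct board`, `NAME_SIZE` and `board_set_type()` are made-up names for illustration, and the size of 20 is an assumption rather than something taken from the patch.

```c
#include <assert.h>
#include <string.h>

#define NAME_SIZE 20	/* illustrative stand-in for I2C_NAME_SIZE */

/* Hypothetical stand-in for struct i2c_board_info. */
struct board {
	char type[NAME_SIZE];
	unsigned short addr;
};

/* Zero the struct first, then copy at most NAME_SIZE - 1 bytes. */
void board_set_type(struct board *b, const char *name)
{
	assert(b != NULL && name != NULL);
	memset(b, 0, sizeof(*b));
	strncpy(b->type, name, NAME_SIZE - 1);
	/* b->type[NAME_SIZE - 1] is still 0 from the memset above */
}
```

Without the memset(), a source string of NAME_SIZE - 1 bytes or more would leave the destination unterminated, so the two calls have to be kept together.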
[PATCH v4 3/9] brcmsmac: reindent split functions
In the previous commit I left the indentation alone to help reviewing the patch, this one now runs the three new functions through 'indent -kr -8' with some manual fixups to avoid silliness. No changes other than whitespace are intended here. Signed-off-by: Arnd BergmannAcked-by: Arend van Spriel --- .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 1507 +--- 1 file changed, 697 insertions(+), 810 deletions(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c index ed409a80f3d2..763e8ba6b178 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c @@ -16074,7 +16074,8 @@ static void wlc_phy_workarounds_nphy_rev7(struct brcms_phy *pi) NPHY_REV3_RFSEQ_CMD_INT_PA_PU, NPHY_REV3_RFSEQ_CMD_END }; - static const u8 rfseq_rx2tx_dlys_rev3_ipa[] = { 8, 6, 6, 4, 4, 16, 43, 1, 1 }; + static const u8 rfseq_rx2tx_dlys_rev3_ipa[] = + { 8, 6, 6, 4, 4, 16, 43, 1, 1 }; static const u16 rfseq_rx2tx_dacbufpu_rev7[] = { 0x10f, 0x10f }; u32 leg_data_weights; u8 chan_freq_range = 0; @@ -16114,526 +16115,452 @@ static void wlc_phy_workarounds_nphy_rev7(struct brcms_phy *pi) int coreNum; - if (NREV_IS(pi->pubpi.phy_rev, 7)) { - mod_phy_reg(pi, 0x221, (0x1 << 4), (1 << 4)); - - mod_phy_reg(pi, 0x160, (0x7f << 0), (32 << 0)); - mod_phy_reg(pi, 0x160, (0x7f << 8), (39 << 8)); - mod_phy_reg(pi, 0x161, (0x7f << 0), (46 << 0)); - mod_phy_reg(pi, 0x161, (0x7f << 8), (51 << 8)); - mod_phy_reg(pi, 0x162, (0x7f << 0), (55 << 0)); - mod_phy_reg(pi, 0x162, (0x7f << 8), (58 << 8)); - mod_phy_reg(pi, 0x163, (0x7f << 0), (60 << 0)); - mod_phy_reg(pi, 0x163, (0x7f << 8), (62 << 8)); - mod_phy_reg(pi, 0x164, (0x7f << 0), (62 << 0)); - mod_phy_reg(pi, 0x164, (0x7f << 8), (63 << 8)); - mod_phy_reg(pi, 0x165, (0x7f << 0), (63 << 0)); - mod_phy_reg(pi, 0x165, (0x7f << 8), (64 << 8)); - mod_phy_reg(pi, 0x166, (0x7f << 0), (64 << 0)); - 
mod_phy_reg(pi, 0x166, (0x7f << 8), (64 << 8)); - mod_phy_reg(pi, 0x167, (0x7f << 0), (64 << 0)); - mod_phy_reg(pi, 0x167, (0x7f << 8), (64 << 8)); - } - - if (NREV_LE(pi->pubpi.phy_rev, 8)) { - write_phy_reg(pi, 0x23f, 0x1b0); - write_phy_reg(pi, 0x240, 0x1b0); - } + if (NREV_IS(pi->pubpi.phy_rev, 7)) { + mod_phy_reg(pi, 0x221, (0x1 << 4), (1 << 4)); + + mod_phy_reg(pi, 0x160, (0x7f << 0), (32 << 0)); + mod_phy_reg(pi, 0x160, (0x7f << 8), (39 << 8)); + mod_phy_reg(pi, 0x161, (0x7f << 0), (46 << 0)); + mod_phy_reg(pi, 0x161, (0x7f << 8), (51 << 8)); + mod_phy_reg(pi, 0x162, (0x7f << 0), (55 << 0)); + mod_phy_reg(pi, 0x162, (0x7f << 8), (58 << 8)); + mod_phy_reg(pi, 0x163, (0x7f << 0), (60 << 0)); + mod_phy_reg(pi, 0x163, (0x7f << 8), (62 << 8)); + mod_phy_reg(pi, 0x164, (0x7f << 0), (62 << 0)); + mod_phy_reg(pi, 0x164, (0x7f << 8), (63 << 8)); + mod_phy_reg(pi, 0x165, (0x7f << 0), (63 << 0)); + mod_phy_reg(pi, 0x165, (0x7f << 8), (64 << 8)); + mod_phy_reg(pi, 0x166, (0x7f << 0), (64 << 0)); + mod_phy_reg(pi, 0x166, (0x7f << 8), (64 << 8)); + mod_phy_reg(pi, 0x167, (0x7f << 0), (64 << 0)); + mod_phy_reg(pi, 0x167, (0x7f << 8), (64 << 8)); + } - if (NREV_GE(pi->pubpi.phy_rev, 8)) - mod_phy_reg(pi, 0xbd, (0xff << 0), (114 << 0)); + if (NREV_LE(pi->pubpi.phy_rev, 8)) { + write_phy_reg(pi, 0x23f, 0x1b0); + write_phy_reg(pi, 0x240, 0x1b0); + } - wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_AFECTRL, 1, 0x00, 16, -_control); - wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_AFECTRL, 1, 0x10, 16, -_control); + if (NREV_GE(pi->pubpi.phy_rev, 8)) + mod_phy_reg(pi, 0xbd, (0xff << 0), (114 << 0)); - wlc_phy_table_read_nphy(pi, NPHY_TBL_ID_CMPMETRICDATAWEIGHTTBL, - 1, 0, 32, _data_weights); - leg_data_weights = leg_data_weights & 0xff; - wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_CMPMETRICDATAWEIGHTTBL, -
[PATCH v4 6/9] dvb-frontends: fix i2c access helpers for KASAN
A typical code fragment was copied across many dvb-frontend drivers and
causes large stack frames when built with CONFIG_KASAN on gcc-5/6/7:

drivers/media/dvb-frontends/cxd2841er.c:3225:1: error: the frame size of 3992 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/cxd2841er.c:3404:1: error: the frame size of 3136 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv0367.c:3143:1: error: the frame size of 4016 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv090x.c:3430:1: error: the frame size of 5312 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv090x.c:4248:1: error: the frame size of 4872 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]

gcc-8 now solves this by consolidating the stack slots for the argument
variables, but on older compilers we can get the same behavior by taking
the address of a local variable rather than of the inline function
argument.

Cc: sta...@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann
---
I'm undecided here whether there should be a comment pointing to PR81715
in each file that has the bogus local variable workaround, to prevent it
from being cleaned up again. It's probably not necessary since anything
that causes actual problems would also trigger a build warning.
--- drivers/media/dvb-frontends/ascot2e.c | 4 +++- drivers/media/dvb-frontends/cxd2841er.c | 4 +++- drivers/media/dvb-frontends/helene.c | 4 +++- drivers/media/dvb-frontends/horus3a.c | 4 +++- drivers/media/dvb-frontends/itd1000.c | 5 +++-- drivers/media/dvb-frontends/mt312.c | 4 +++- drivers/media/dvb-frontends/stb0899_drv.c | 3 ++- drivers/media/dvb-frontends/stb6100.c | 6 -- drivers/media/dvb-frontends/stv0367.c | 4 +++- drivers/media/dvb-frontends/stv090x.c | 4 +++- drivers/media/dvb-frontends/stv6110x.c| 4 +++- drivers/media/dvb-frontends/zl10039.c | 4 +++- 12 files changed, 36 insertions(+), 14 deletions(-) diff --git a/drivers/media/dvb-frontends/ascot2e.c b/drivers/media/dvb-frontends/ascot2e.c index 0ee0df53b91b..1219272ca3f0 100644 --- a/drivers/media/dvb-frontends/ascot2e.c +++ b/drivers/media/dvb-frontends/ascot2e.c @@ -155,7 +155,9 @@ static int ascot2e_write_regs(struct ascot2e_priv *priv, static int ascot2e_write_reg(struct ascot2e_priv *priv, u8 reg, u8 val) { - return ascot2e_write_regs(priv, reg, , 1); + u8 tmp = val; + + return ascot2e_write_regs(priv, reg, , 1); } static int ascot2e_read_regs(struct ascot2e_priv *priv, diff --git a/drivers/media/dvb-frontends/cxd2841er.c b/drivers/media/dvb-frontends/cxd2841er.c index 48ee9bc00c06..b7574deff5c6 100644 --- a/drivers/media/dvb-frontends/cxd2841er.c +++ b/drivers/media/dvb-frontends/cxd2841er.c @@ -257,7 +257,9 @@ static int cxd2841er_write_regs(struct cxd2841er_priv *priv, static int cxd2841er_write_reg(struct cxd2841er_priv *priv, u8 addr, u8 reg, u8 val) { - return cxd2841er_write_regs(priv, addr, reg, , 1); + u8 tmp = val; + + return cxd2841er_write_regs(priv, addr, reg, , 1); } static int cxd2841er_read_regs(struct cxd2841er_priv *priv, diff --git a/drivers/media/dvb-frontends/helene.c b/drivers/media/dvb-frontends/helene.c index 4bf5a551ba40..6e93f2d1575b 100644 --- a/drivers/media/dvb-frontends/helene.c +++ b/drivers/media/dvb-frontends/helene.c @@ -331,7 +331,9 @@ static int 
helene_write_regs(struct helene_priv *priv, static int helene_write_reg(struct helene_priv *priv, u8 reg, u8 val) { - return helene_write_regs(priv, reg, , 1); + u8 tmp = val; + + return helene_write_regs(priv, reg, , 1); } static int helene_read_regs(struct helene_priv *priv, diff --git a/drivers/media/dvb-frontends/horus3a.c b/drivers/media/dvb-frontends/horus3a.c index 68d759c4c52e..fa9e2d373073 100644 --- a/drivers/media/dvb-frontends/horus3a.c +++ b/drivers/media/dvb-frontends/horus3a.c @@ -89,7 +89,9 @@ static int horus3a_write_regs(struct horus3a_priv *priv, static int horus3a_write_reg(struct horus3a_priv *priv, u8 reg, u8 val) { - return horus3a_write_regs(priv, reg, , 1); + u8 tmp = val; + + return horus3a_write_regs(priv, reg, , 1); } static int horus3a_enter_power_save(struct horus3a_priv *priv) diff --git a/drivers/media/dvb-frontends/itd1000.c b/drivers/media/dvb-frontends/itd1000.c index 5bb1e73a10b4..1ac5177162f6 100644 --- a/drivers/media/dvb-frontends/itd1000.c +++ b/drivers/media/dvb-frontends/itd1000.c @@ -95,8 +95,9 @@ static int itd1000_read_reg(struct itd1000_state *state, u8 reg) static inline int itd1000_write_reg(struct itd1000_state *state, u8 r, u8 v) { - int ret = itd1000_write_regs(state, r, , 1); - state->shadow[r] = v; + u8 tmp = v; + int ret =
[PATCH v4 5/9] r820t: fix r820t_write_reg for KASAN
With CONFIG_KASAN, we get an overly long stack frame due to inlining
the register access functions:

drivers/media/tuners/r820t.c: In function 'generic_set_freq.isra.7':
drivers/media/tuners/r820t.c:1334:1: error: the frame size of 2880 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

This is caused by a gcc bug that has now been fixed in gcc-8.
To work around the problem, we can pass the register data
through a local variable that older gcc versions can optimize
out as well.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann
---
 drivers/media/tuners/r820t.c | 13
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/media/tuners/r820t.c b/drivers/media/tuners/r820t.c
index ba80376a3b86..d097eb04a0e9 100644
--- a/drivers/media/tuners/r820t.c
+++ b/drivers/media/tuners/r820t.c
@@ -396,9 +396,11 @@ static int r820t_write(struct r820t_priv *priv, u8 reg, const u8 *val,
 	return 0;
 }
 
-static int r820t_write_reg(struct r820t_priv *priv, u8 reg, u8 val)
+static inline int r820t_write_reg(struct r820t_priv *priv, u8 reg, u8 val)
 {
-	return r820t_write(priv, reg, &val, 1);
+	u8 tmp = val; /* work around GCC PR81715 with asan-stack=1 */
+
+	return r820t_write(priv, reg, &tmp, 1);
 }
 
 static int r820t_read_cache_reg(struct r820t_priv *priv, int reg)
@@ -411,17 +413,18 @@ static int r820t_read_cache_reg(struct r820t_priv *priv, int reg)
 	return -EINVAL;
 }
 
-static int r820t_write_reg_mask(struct r820t_priv *priv, u8 reg, u8 val,
+static inline int r820t_write_reg_mask(struct r820t_priv *priv, u8 reg, u8 val,
 				u8 bit_mask)
 {
+	u8 tmp = val;
 	int rc = r820t_read_cache_reg(priv, reg);
 
 	if (rc < 0)
 		return rc;
 
-	val = (rc & ~bit_mask) | (val & bit_mask);
+	tmp = (rc & ~bit_mask) | (tmp & bit_mask);
 
-	return r820t_write(priv, reg, &val, 1);
+	return r820t_write(priv, reg, &tmp, 1);
 }
 
 static int r820t_read(struct r820t_priv *priv, u8 reg, u8 *val, int len)
-- 
2.9.0
[RFC PATCH 07/11] ipv6/addrconf: add an helper for inet6 address lookup
reduce code duplication and will simplify follow-up patch Signed-off-by: Paolo Abeni--- net/ipv6/addrconf.c | 65 + 1 file changed, 31 insertions(+), 34 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index c2e2a78787ec..5940062cac8d 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1796,35 +1796,46 @@ int ipv6_chk_addr(struct net *net, const struct in6_addr *addr, } EXPORT_SYMBOL(ipv6_chk_addr); +/* called under RCU lock with bh disabled */ +static struct inet6_ifaddr *ipv6_lookup_ifaddr_rcu_bh(struct net *net, + const struct in6_addr *addr) +{ + unsigned int hash = inet6_addr_hash(addr); + struct inet6_ifaddr *ifp; + + hlist_for_each_entry_rcu_bh(ifp, _addr_lst[hash], addr_lst) + if (net_eq(dev_net(ifp->idev->dev), net) && + ipv6_addr_equal(>addr, addr)) + return ifp; + + return NULL; +} + int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr, const struct net_device *dev, int strict, u32 banned_flags) { struct inet6_ifaddr *ifp; - unsigned int hash = inet6_addr_hash(addr); u32 ifp_flags; + int ret = 0; rcu_read_lock_bh(); - hlist_for_each_entry_rcu(ifp, _addr_lst[hash], addr_lst) { - if (!net_eq(dev_net(ifp->idev->dev), net)) - continue; + ifp = ipv6_lookup_ifaddr_rcu_bh(net, addr); + if (ifp) { /* Decouple optimistic from tentative for evaluation here. * Ban optimistic addresses explicitly, when required. */ ifp_flags = (ifp->flags_F_OPTIMISTIC) ? 
(ifp->flags&~IFA_F_TENTATIVE) : ifp->flags; - if (ipv6_addr_equal(>addr, addr) && - !(ifp_flags_flags) && + if (!(ifp_flags_flags) && (!dev || ifp->idev->dev == dev || -!(ifp->scope&(IFA_LINK|IFA_HOST) || strict))) { - rcu_read_unlock_bh(); - return 1; - } +!(ifp->scope&(IFA_LINK|IFA_HOST) || strict))) + ret = 1; } rcu_read_unlock_bh(); - return 0; + return ret; } EXPORT_SYMBOL(ipv6_chk_addr_and_flags); @@ -1900,20 +1911,13 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add struct net_device *dev, int strict) { struct inet6_ifaddr *ifp, *result = NULL; - unsigned int hash = inet6_addr_hash(addr); rcu_read_lock_bh(); - hlist_for_each_entry_rcu_bh(ifp, _addr_lst[hash], addr_lst) { - if (!net_eq(dev_net(ifp->idev->dev), net)) - continue; - if (ipv6_addr_equal(>addr, addr)) { - if (!dev || ifp->idev->dev == dev || - !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) { - result = ifp; - in6_ifa_hold(ifp); - break; - } - } + ifp = ipv6_lookup_ifaddr_rcu_bh(net, addr); + if (ifp && (!dev || ifp->idev->dev == dev || + !(ifp->scope & (IFA_LINK|IFA_HOST) || strict))) { + result = ifp; + in6_ifa_hold(ifp); } rcu_read_unlock_bh(); @@ -4226,20 +4230,13 @@ void if6_proc_exit(void) /* Check if address is a home address configured on any interface. */ int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr) { - int ret = 0; struct inet6_ifaddr *ifp = NULL; - unsigned int hash = inet6_addr_hash(addr); + int ret = 0; rcu_read_lock_bh(); - hlist_for_each_entry_rcu_bh(ifp, _addr_lst[hash], addr_lst) { - if (!net_eq(dev_net(ifp->idev->dev), net)) - continue; - if (ipv6_addr_equal(>addr, addr) && - (ifp->flags & IFA_F_HOMEADDRESS)) { - ret = 1; - break; - } - } + ifp = ipv6_lookup_ifaddr_rcu_bh(net, addr); + if (ifp && ifp->flags & IFA_F_HOMEADDRESS) + ret = 1; rcu_read_unlock_bh(); return ret; } -- 2.13.5
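The flag handling in ipv6_chk_addr_and_flags() treats an optimistic address as usable even though it is still marked tentative, unless the caller bans optimistic addresses explicitly. A minimal sketch of that decision in isolation is shown below; the flag values are defined locally for illustration and `addr_flags_allowed()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits, defined locally to keep the sketch self-contained. */
#define IFA_F_OPTIMISTIC 0x04u
#define IFA_F_TENTATIVE  0x40u

/*
 * Decouple optimistic from tentative: if the address is optimistic,
 * ignore its tentative bit before testing against the banned flags.
 */
bool addr_flags_allowed(uint32_t flags, uint32_t banned_flags)
{
	uint32_t eff = (flags & IFA_F_OPTIMISTIC)
		       ? (flags & ~IFA_F_TENTATIVE)
		       : flags;

	return !(eff & banned_flags);
}
```

With this rule, banning IFA_F_TENTATIVE rejects a plain tentative address but still accepts an optimistic one, while banning IFA_F_OPTIMISTIC rejects the optimistic address.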
[RFC PATCH 01/11] net: add support for noref skb->sk
Noref sk do not carry a socket refcount, are valid only inside
the current RCU section and must be explicitly cleared before
exiting such section.

They will be used in a later patch to allow early demux without
sock refcounting.

Signed-off-by: Paolo Abeni
---
 include/linux/skbuff.h | 31 +++
 net/core/sock.c        |  7 +++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 492828801acb..c3fc32636690 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -922,6 +922,37 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 	return (struct rtable *)skb_dst(skb);
 }
 
+void sock_dummyfree(struct sk_buff *skb);
+
+/* only early demux can set noref socks
+ * noref socks do not carry any refcount and must be
+ * cleared before exiting the current RCU section
+ */
+static inline void skb_set_noref_sk(struct sk_buff *skb, struct sock *sk)
+{
+	skb->sk = sk;
+	skb->destructor = sock_dummyfree;
+}
+
+static inline bool skb_has_noref_sk(struct sk_buff *skb)
+{
+	return skb->destructor == sock_dummyfree;
+}
+
+static inline struct sock *skb_clear_noref_sk(struct sk_buff *skb)
+{
+	struct sock *ret;
+
+	if (!skb_has_noref_sk(skb))
+		return NULL;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	ret = skb->sk;
+	skb->sk = NULL;
+	skb->destructor = NULL;
+	return ret;
+}
+
 /* For mangling skb->pkt_type from user space side from applications
  * such as nft, tc, etc, we only allow a conservative subset of
  * possible pkt_types to be set.
diff --git a/net/core/sock.c b/net/core/sock.c
index 9b7b6bbb2a23..33da8e7e58a0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1893,6 +1893,13 @@ void sock_efree(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sock_efree);
 
+/* dummy destructor used by noref sockets */
+void sock_dummyfree(struct sk_buff *skb)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+}
+EXPORT_SYMBOL(sock_dummyfree);
+
 kuid_t sock_i_uid(struct sock *sk)
 {
 	kuid_t uid;
-- 
2.13.5
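The noref marking in this patch uses the destructor pointer itself as a tag: an skb whose destructor is the dummy function is known to hold an unreferenced socket. The idea can be modelled in userspace as below. This is a sketch only; `struct pkt`, `pkt_dummyfree()` and the `pkt_*_noref_sk()` helpers are hypothetical stand-ins, and the RCU assertions from the patch are left out because there is no RCU here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock {
	int id;
};

/* Hypothetical packet, standing in for struct sk_buff. */
struct pkt {
	struct sock *sk;
	void (*destructor)(struct pkt *p);
};

/* Marker destructor: it does nothing, its address is the tag. */
void pkt_dummyfree(struct pkt *p)
{
	(void)p;
}

static inline void pkt_set_noref_sk(struct pkt *p, struct sock *sk)
{
	p->sk = sk;
	p->destructor = pkt_dummyfree;
}

static inline bool pkt_has_noref_sk(const struct pkt *p)
{
	return p->destructor == pkt_dummyfree;
}

/* Detach and return the noref socket, or NULL if none is attached. */
static inline struct sock *pkt_clear_noref_sk(struct pkt *p)
{
	struct sock *ret;

	if (!pkt_has_noref_sk(p))
		return NULL;

	ret = p->sk;
	p->sk = NULL;
	p->destructor = NULL;
	return ret;
}
```

Because the tag is just a function address, no extra field is needed in the packet structure, at the cost of reserving one destructor value for this purpose.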
[RFC PATCH 05/11] udp: perform full socket lookup in early demux
Since the UDP early demux lookup fetches noref socket references, we can
safely be optimistic about it and set the sk reference even if the skb
is not going to land on that socket, avoiding use of the rx dst cache
for unconnected unicast sockets.

This avoids a second lookup for unconnected sockets and cleans up the
UDP early demux code a bit. After this change, on hosts not acting as
routers, UDP early demux never hurts receive performance, whereas
before this change it caused a measurable performance impact for
unconnected sockets.

Signed-off-by: Paolo Abeni
---
 include/linux/udp.h |  2 ++
 net/ipv4/udp.c      | 62
 net/ipv6/udp.c      | 57
 3 files changed, 38 insertions(+), 83 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index eaea63bc79bb..9c68b57543cc 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -92,6 +92,8 @@ static inline struct udp_sock *udp_sk(const struct sock *sk)
 	return (struct udp_sock *)sk;
 }
 
+void udp_set_skb_rx_dst(struct sock *sk, struct sk_buff *skb, u32 cookie);
+
 static inline void udp_set_no_check6_tx(struct sock *sk, bool val)
 {
 	udp_sk(sk)->no_check6_tx = val;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ba49d5aa9f09..5cbbd78024dc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2043,6 +2043,11 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
 							 inet_compute_pseudo);
 }
 
+static bool udp_use_rx_dst_cache(struct sock *sk, struct sk_buff *skb)
+{
+	return sk->sk_state == TCP_ESTABLISHED || skb->pkt_type != PACKET_HOST;
+}
+
 /*
  *	All we need to do is get the socket, and then do a checksum.
*/ @@ -2088,8 +2093,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, struct dst_entry *dst = skb_dst(skb); int ret; - if (unlikely(sk->sk_rx_dst != dst)) - udp_sk_rx_dst_set(sk, dst); + if (udp_use_rx_dst_cache(sk, skb)) + dst_update(>sk_rx_dst, dst); ret = udp_queue_rcv_skb(sk, skb); if (!noref_sk) @@ -2196,42 +2201,28 @@ static struct sock *__udp4_lib_mcast_demux_lookup(struct net *net, return result; } -/* For unicast we should only early demux connected sockets or we can - * break forwarding setups. The chains here can be long so only check - * if the first socket is an exact match and if not move on. - */ -static struct sock *__udp4_lib_demux_lookup(struct net *net, - __be16 loc_port, __be32 loc_addr, - __be16 rmt_port, __be32 rmt_addr, - int dif, int sdif) +void udp_set_skb_rx_dst(struct sock *sk, struct sk_buff *skb, u32 cookie) { - unsigned short hnum = ntohs(loc_port); - unsigned int hash2 = udp4_portaddr_hash(net, loc_addr, hnum); - unsigned int slot2 = hash2 & udp_table.mask; - struct udp_hslot *hslot2 = _table.hash2[slot2]; - INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr); - const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum); - struct sock *sk; + struct dst_entry *dst = dst_access(>sk_rx_dst, cookie); - udp_portaddr_for_each_entry_rcu(sk, >head) { - if (INET_MATCH(sk, net, acookie, rmt_addr, - loc_addr, ports, dif, sdif)) - return sk; - /* Only check first socket in chain */ - break; + if (dst) { + /* set noref for now. 
+* any place which wants to hold dst has to call +* dst_hold_safe() +*/ + skb_dst_set_noref(skb, dst); } - return NULL; } +EXPORT_SYMBOL_GPL(udp_set_skb_rx_dst); void udp_v4_early_demux(struct sk_buff *skb) { struct net *net = dev_net(skb->dev); + int dif = skb->dev->ifindex; + int sdif = inet_sdif(skb); const struct iphdr *iph; const struct udphdr *uh; struct sock *sk = NULL; - struct dst_entry *dst; - int dif = skb->dev->ifindex; - int sdif = inet_sdif(skb); int ours; /* validate the packet */ @@ -2260,25 +2251,16 @@ void udp_v4_early_demux(struct sk_buff *skb) uh->source, iph->saddr, dif, sdif); } else if (skb->pkt_type == PACKET_HOST) { - sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr, -uh->source, iph->saddr, dif, sdif); + sk = __udp4_lib_lookup(net, iph->saddr, uh->source, iph->daddr, +
[RFC PATCH 02/11] net: allow early demux to fetch noref socket
We must be careful to avoid leaking such sockets outside
the RCU section containing the early demux call; we clear
them on nonlocal delivery.

For ipv4 we clear the noref sk even for multicast traffic entering
the ip_mr_input() path; we will lose the mcast early demux
optimization when the host is acting as a multicast router, but
that helps to keep the code simple.

Also update all iptables/nftables extensions that can run in the
input chain and can transmit the skb outside such path, namely
TEE, nft_dup and nfqueue.

Signed-off-by: Paolo Abeni
---
 net/ipv4/ip_input.c              | 8
 net/ipv4/netfilter/nf_dup_ipv4.c | 3 +++
 net/ipv6/ip6_input.c             | 4
 net/ipv6/netfilter/nf_dup_ipv6.c | 3 +++
 net/netfilter/nf_queue.c         | 3 +++
 5 files changed, 21 insertions(+)

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index fa2dc8f692c6..5690ef09da28 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -351,6 +351,14 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
+	/* Since the sk has no reference to the socket, we must
+	 * clear it before escaping this RCU section.
+	 * The sk is just an hint and we know we are not going to use
+	 * it outside the input path.
+	 */
+	if (skb_dst(skb)->input != ip_local_deliver)
+		skb_clear_noref_sk(skb);
+
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	if (unlikely(skb_dst(skb)->tclassid)) {
 		struct ip_rt_acct *st = this_cpu_ptr(ip_rt_acct);
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index 39895b9ddeb9..bf8b78492fc8 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -71,6 +71,9 @@ void nf_dup_ipv4(struct net *net, struct sk_buff *skb, unsigned int hooknum,
 	nf_reset(skb);
 	nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
 #endif
+	/* Avoid leaking noref sk outside the input path */
+	skb_clear_noref_sk(skb);
+
 	/*
 	 * If we are in PREROUTING/INPUT, decrease the TTL to mitigate potential
 	 * loops between two hosts.
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9ee208a348f5..e15ec2d36b9e 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -68,6 +68,10 @@ int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb_valid_dst(skb))
 		ip6_route_input(skb);
 
+	/* see comment on ipv4 edmux */
+	if (skb_dst(skb)->input != ip6_input)
+		skb_clear_noref_sk(skb);
+
 	return dst_input(skb);
 }
 
diff --git a/net/ipv6/netfilter/nf_dup_ipv6.c b/net/ipv6/netfilter/nf_dup_ipv6.c
index 4a7ddeddbaab..939f6a2238f9 100644
--- a/net/ipv6/netfilter/nf_dup_ipv6.c
+++ b/net/ipv6/netfilter/nf_dup_ipv6.c
@@ -60,6 +60,9 @@ void nf_dup_ipv6(struct net *net, struct sk_buff *skb, unsigned int hooknum,
 	nf_reset(skb);
 	nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
 #endif
+	/* Avoid leaking noref sk outside the input path */
+	skb_clear_noref_sk(skb);
+
 	if (hooknum == NF_INET_PRE_ROUTING ||
 	    hooknum == NF_INET_LOCAL_IN) {
 		struct ipv6hdr *iph = ipv6_hdr(skb);
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index f7e21953b1de..100eff08cb51 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -145,6 +145,9 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
 		.size	= sizeof(*entry) + afinfo->route_key_size,
 	};
 
+	/* Avoid leaking noref sk outside the input path */
+	skb_clear_noref_sk(skb);
+
 	nf_queue_entry_get_refs(entry);
 	skb_dst_force(skb);
 	afinfo->saveroute(skb, entry);
-- 
2.13.5
[RFC PATCH 09/11] route: add ipv4/6 helpers to do partial route lookup vs local dst
For ipv4 also implement the proper source address validation, even
against martian addresses, and return an error code accordingly.

Will be used by later patches to perform dst lookup in early demux for
unconnected sockets.

Signed-off-by: Paolo Abeni
---
 include/net/ip6_route.h |  1 +
 include/net/route.h     |  2 ++
 net/ipv4/route.c        | 43 +++
 net/ipv6/route.c        | 13 +
 4 files changed, 59 insertions(+)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index ee96f402cb75..edb24456a609 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -65,6 +65,7 @@ static inline bool rt6_need_strict(const struct in6_addr *daddr)
 		(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
 }
 
+void ip6_route_try_local_rcu_bh(struct net *net, struct sk_buff *skb);
 void ip6_route_input(struct sk_buff *skb);
 struct dst_entry *ip6_route_input_lookup(struct net *net,
 					 struct net_device *dev,
diff --git a/include/net/route.h b/include/net/route.h
index ec09c3d73581..21927231cc14 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -178,6 +178,8 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
 
 struct rtable *ip_local_route_alloc(struct net_device *dev, unsigned int flags,
 				    u32 itag, unsigned char type, bool docache);
+int ip_route_try_local_rcu(struct net *net, struct sk_buff *skb,
+			   const struct iphdr *iph);
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
 int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 515589f1b3d1..84248dd41da6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2079,6 +2079,49 @@ out:	return err;
 	goto out;
 }
 
+/* try to resolve and set the route for the ingress packet in the local
+ * destination, looking-up the destination address against the local ones
+ * and performing source validation
+ * return an error only if the local look up is successful and validation fails
+ * Called under RCU
+ */
+int ip_route_try_local_rcu(struct net *net, struct sk_buff *skb,
+			   const struct iphdr *iph)
+{
+	__be32 saddr = iph->saddr;
+	struct in_device *in_dev;
+	struct dst_entry *dst;
+	int err = -EINVAL;
+	u32 itag;
+
+	dst = inet_get_ifaddr_dst_rcu(net, iph->daddr);
+	if (!dst)
+		return 0;
+
+	in_dev = __in_dev_get_rcu(skb->dev);
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+		goto martian_source;
+
+	/* check for zeronet only after successful lookup, so that we don't trip
+	 * over limited broadcast destination, see ip_route_input_slow()
+	 */
+	if (ipv4_is_zeronet(saddr) || (ipv4_is_loopback(saddr) &&
+	    !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net)))
+		goto martian_source;
+
+	err = fib_validate_source(skb, saddr, iph->daddr, iph->tos, 0, skb->dev,
+				  in_dev, &itag);
+	if (err < 0)
+		goto martian_source;
+
+	skb_dst_set_noref(skb, dst);
+	return 0;
+
+martian_source:
+	ip_handle_martian_source(skb->dev, in_dev, skb, iph->daddr, iph->saddr);
+	return err;
+}
+
 int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			 u8 tos, struct net_device *dev)
 {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 26cc9f483b6d..d957e30b1cbe 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1283,6 +1283,19 @@ void ip6_route_input(struct sk_buff *skb)
 	skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
 }
 
+/* try to resolve and set the route for the ingress packet in the local
+ * destination
+ * Called under RCU
+ */
+void ip6_route_try_local_rcu_bh(struct net *net, struct sk_buff *skb)
+{
+	struct dst_entry *dst;
+
+	dst = inet6_get_ifaddr_dst_rcu_bh(net, &ipv6_hdr(skb)->daddr);
+	if (dst)
+		skb_dst_set_noref(skb, dst);
+}
+
 static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
 					     struct flowi6 *fl6, int flags)
 {
-- 
2.13.5
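For readers following the source checks in ip_route_try_local_rcu(), here is a rough user-space sketch of the ordering of the address-class tests only. It is an illustration under assumptions, not the kernel code: addresses are in host byte order, the helper names are made up for this example, the route_localnet flag stands in for IN_DEV_NET_ROUTE_LOCALNET(), and the fib_validate_source() step is not modelled at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; addresses are host byte order, unlike __be32. */
static bool is_multicast(uint32_t a) { return (a & 0xf0000000u) == 0xe0000000u; }
static bool is_lbcast(uint32_t a)    { return a == 0xffffffffu; }
static bool is_zeronet(uint32_t a)   { return (a & 0xff000000u) == 0; }
static bool is_loopback(uint32_t a)  { return (a & 0xff000000u) == 0x7f000000u; }

/*
 * Returns true when the source address would be rejected as martian by
 * the address-class tests alone. route_localnet is an assumption that
 * models the per-device "route_localnet" setting.
 */
static bool source_is_martian(uint32_t saddr, bool route_localnet)
{
	/* multicast and limited broadcast are never valid sources */
	if (is_multicast(saddr) || is_lbcast(saddr))
		return true;

	/* 0.0.0.0/8 is rejected once the local lookup has succeeded */
	if (is_zeronet(saddr))
		return true;

	/* loopback sources are accepted only with route_localnet set */
	if (is_loopback(saddr) && !route_localnet)
		return true;

	return false;
}
```

With this model, 127.0.0.1 is martian unless route_localnet is set, while an ordinary unicast source such as 192.168.0.1 passes through to whatever later validation applies.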
[RFC PATCH 06/11] ip/route: factor out helper for local route creation
Will be used by a later patch to build the ifaddr dst cache. No
functional changes are introduced here.

Signed-off-by: Paolo Abeni
---
 include/net/route.h |  2 ++
 net/ipv4/route.c    | 30 ++
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 1b09a9368c68..ec09c3d73581 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -176,6 +176,8 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
 	return ip_route_output_key(net, fl4);
 }
 
+struct rtable *ip_local_route_alloc(struct net_device *dev, unsigned int flags,
+				    u32 itag, unsigned char type, bool docache);
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
 int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2d5ea4..515589f1b3d1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1859,6 +1859,27 @@ static int ip_mkroute_input(struct sk_buff *skb,
 	return __mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 }
 
+struct rtable *ip_local_route_alloc(struct net_device *dev, unsigned int flags,
+				    u32 itag, unsigned char type, bool do_cache)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dev);
+	struct net *net = dev_net(dev);
+	struct rtable *rth;
+
+	rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
+			   flags | RTCF_LOCAL, type,
+			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
+	if (!rth)
+		return NULL;
+
+	rth->dst.output= ip_rt_bug;
+#ifdef CONFIG_IP_ROUTE_CLASSID
+	rth->dst.tclassid = itag;
+#endif
+	rth->rt_is_input = 1;
+	return rth;
+}
+
 /*
  *	NOTE. We drop all the packets that has local source
  *	addresses, because every properly looped back packet
@@ -1996,17 +2017,10 @@ out:	return err;
 		}
 	}
 
-	rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
-			   flags | RTCF_LOCAL, res->type,
-			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
+	rth = ip_local_route_alloc(dev, flags, itag, res->type, do_cache);
 	if (!rth)
 		goto e_nobufs;
 
-	rth->dst.output= ip_rt_bug;
-#ifdef CONFIG_IP_ROUTE_CLASSID
-	rth->dst.tclassid = itag;
-#endif
-	rth->rt_is_input = 1;
 	if (res->table)
 		rth->rt_table_id = res->table->tb_id;
 
-- 
2.13.5
[RFC PATCH 08/11] net: implement local route cache inside ifaddr
Add storage and helpers to associate an ipv{4,6} address with the local
route to self. This will be used by a later patch to implement early
demux for unconnected UDP sockets.

The caches are filled on address creation, with DST_OBSOLETE_NONE.
IPv6 caches are explicitly cleared and refreshed on underlying device
down/up events.

The above scheme is simpler than refreshing the cache every time the
dst expires under the default obsolete scheme.

Signed-off-by: Paolo Abeni
---
 include/linux/inetdevice.h |  4 
 include/net/addrconf.h     |  3 +++
 include/net/if_inet6.h     |  4 
 net/ipv4/devinet.c         | 29 -
 net/ipv6/addrconf.c        | 44 
 5 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 751d051f0bc7..c29982f178bb 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -130,6 +130,8 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_IGNORE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
 
+struct dst_entry;
+
 struct in_ifaddr {
 	struct hlist_node	hash;
 	struct in_ifaddr	*ifa_next;
@@ -149,6 +151,7 @@ struct in_ifaddr {
 	__u32			ifa_preferred_lft;
 	unsigned long		ifa_cstamp; /* created timestamp */
 	unsigned long		ifa_tstamp; /* updated timestamp */
+	struct dst_entry	*dst; /* local route to self */
 };
 
 struct in_validator_info {
@@ -180,6 +183,7 @@ __be32 inet_confirm_addr(struct net *net, struct in_device *in_dev, __be32 dst,
 struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix,
 				    __be32 mask);
 struct in_ifaddr *inet_lookup_ifaddr_rcu(struct net *net, __be32 addr);
+struct dst_entry *inet_get_ifaddr_dst_rcu(struct net *net, __be32 addr);
 static __inline__ bool inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
 {
 	return !((addr^ifa->ifa_address)&ifa->ifa_mask);
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 87981cd63180..bdfa3306a4c5 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -87,6 +87,9 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net,
 				     const struct in6_addr *addr,
 				     struct net_device *dev, int strict);
 
+struct dst_entry *inet6_get_ifaddr_dst_rcu_bh(struct net *net,
+					      const struct in6_addr *addr);
+
 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dev,
 		       const struct in6_addr *daddr, unsigned int srcprefs,
 		       struct in6_addr *saddr);
diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index d4088d1a688d..1dd42e7c17a4 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -39,6 +39,8 @@ enum {
 	INET6_IFADDR_STATE_DEAD,
 };
 
+struct dst_entry;
+
 struct inet6_ifaddr {
 	struct in6_addr		addr;
 	__u32			prefix_len;
@@ -77,6 +79,8 @@ struct inet6_ifaddr {
 
 	struct rcu_head		rcu;
 	struct in6_addr		peer_addr;
+
+	struct dst_entry	*dst; /* local route to self */
 };
 
 struct ip6_sf_socklist {
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7ce22a2c07ce..a7748f787866 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -179,6 +179,17 @@ struct in_ifaddr *inet_lookup_ifaddr_rcu(struct net *net, __be32 addr)
 	return NULL;
 }
 
+/* called under RCU lock */
+struct dst_entry *inet_get_ifaddr_dst_rcu(struct net *net, __be32 addr)
+{
+	struct in_ifaddr *ifa = inet_lookup_ifaddr_rcu(net, addr);
+
+	if (!ifa)
+		return NULL;
+
+	return dst_access(&ifa->dst, 0);
+}
+
 static void rtmsg_ifa(int event, struct in_ifaddr *, struct nlmsghdr *, u32);
 
 static BLOCKING_NOTIFIER_HEAD(inetaddr_chain);
@@ -337,6 +348,7 @@ static void __inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 	struct in_ifaddr *last_prim = in_dev->ifa_list;
 	struct in_ifaddr *prev_prom = NULL;
 	int do_promote = IN_DEV_PROMOTE_SECONDARIES(in_dev);
+	struct dst_entry *dst;
 
 	ASSERT_RTNL();
 
@@ -395,7 +407,12 @@ static void __inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 	*ifap = ifa1->ifa_next;
 	inet_hash_remove(ifa1);
 
-	/* 3. Announce address deletion */
+	/* 3. Clear dst cache */
+
+	dst = xchg(&ifa1->dst, NULL);
+	dst_release(dst);
+
+	/* 4. Announce address deletion */
 
 	/* Send message first, then call notifier.
 	   At first sight, FIB update triggered by notifier
@@ -449,6 +466,7 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, struct
[RFC PATCH 00/11] udp: full early demux for unconnected sockets
This series refactors the UDP early demux code so that:

* full socket lookup is performed for unicast packets
* a sk is grabbed even for unconnected socket match
* a dst cache is used even in such scenario

To perform these tasks a couple of facilities are added:

* noref socket references, scoped inside the current RCU section, to be
  explicitly cleared before leaving such section
* a dst cache inside the inet and inet6 local addresses tables, caching the
  related local dst entry

The measured performance gain under small packet UDP flood is as follows:

ingress NIC     vanilla         patched         delta
rx queues       (kpps)          (kpps)          (%)
[ipv4]
1               2177            2414            10
2               2527            2892            14
3               3050            3733            22
4               3918            4643            18
5               5074            5699            12
6               5654            6869            21
[ipv6]
1               2002            2821            40
2               2087            3148            50
3               2583            4008            55
4               3072            4963            61
5               3719            5992            61
6               4314            6910            60

The number of user space processes in use is equal to the number of
NIC rx queues; when multiple user space processes are in use, the
SO_REUSEPORT option is used, as described below:

ethtool -L em2 combined $n
MASK=1
for I in `seq 0 $((n - 1))`; do
	udp_sink --reuse-port --recvfrom --count 10 --port 9 $1 &
	taskset -p $((MASK << ($I + $n) )) $!
done

Paolo Abeni (11):
  net: add support for noref skb->sk
  net: allow early demux to fetch noref socket
  udp: do not touch socket refcount in early demux
  net: add simple socket-like dst cache helpers
  udp: perform full socket lookup in early demux
  ip/route: factor out helper for local route creation
  ipv6/addrconf: add an helper for inet6 address lookup
  net: implement local route cache inside ifaddr
  route: add ipv4/6 helpers to do partial route lookup vs local dst
  IP: early demux can return an error code
  udp: dst lookup in early demux for unconnected sockets

 include/linux/inetdevice.h       |   4 ++
 include/linux/skbuff.h           |  31 +++
 include/linux/udp.h              |   2 +
 include/net/addrconf.h           |   3 ++
 include/net/dst.h                |  20 +++
 include/net/if_inet6.h           |   4 ++
 include/net/ip6_route.h          |   1 +
 include/net/protocol.h           |   4 +-
 include/net/route.h              |   4 ++
 include/net/tcp.h                |   2 +-
 include/net/udp.h                |   2 +-
 net/core/dst.c                   |  12 +
 net/core/sock.c                  |   7 +++
 net/ipv4/devinet.c               |  29 ++-
 net/ipv4/ip_input.c              |  33 
 net/ipv4/netfilter/nf_dup_ipv4.c |   3 ++
 net/ipv4/route.c                 |  73 +++---
 net/ipv4/tcp_ipv4.c              |   9 ++--
 net/ipv4/udp.c                   |  95 +++---
 net/ipv6/addrconf.c              | 109 +++
 net/ipv6/ip6_input.c             |   4 ++
 net/ipv6/netfilter/nf_dup_ipv6.c |   3 ++
 net/ipv6/route.c                 |  13 +
 net/ipv6/udp.c                   |  72 ++
 net/netfilter/nf_queue.c         |   3 ++
 25 files changed, 383 insertions(+), 159 deletions(-)

-- 
2.13.5
[RFC PATCH 03/11] udp: do not touch socket refcount in early demux
Use noref sockets instead. This gives some small performance
improvements and will allow efficient early demux for unconnected
sockets in a later patch.

Signed-off-by: Paolo Abeni
---
 net/ipv4/udp.c | 18 ++
 net/ipv6/udp.c | 10 ++
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 784ced0b9150..ba49d5aa9f09 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2050,12 +2050,13 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
 int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		   int proto)
 {
-	struct sock *sk;
-	struct udphdr *uh;
-	unsigned short ulen;
+	struct net *net = dev_net(skb->dev);
 	struct rtable *rt = skb_rtable(skb);
+	unsigned short ulen;
 	__be32 saddr, daddr;
-	struct net *net = dev_net(skb->dev);
+	struct udphdr *uh;
+	struct sock *sk;
+	bool noref_sk;
 
 	/*
 	 *  Validate the packet.
@@ -2081,6 +2082,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (udp4_csum_init(skb, uh, proto))
 		goto csum_error;
 
+	noref_sk = skb_has_noref_sk(skb);
 	sk = skb_steal_sock(skb);
 	if (sk) {
 		struct dst_entry *dst = skb_dst(skb);
@@ -2090,7 +2092,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 			udp_sk_rx_dst_set(sk, dst);
 
 		ret = udp_queue_rcv_skb(sk, skb);
-		sock_put(sk);
+		if (!noref_sk)
+			sock_put(sk);
 		/* a return value > 0 means to resubmit the input, but
 		 * it wants the return to be -protocol, or 0
 		 */
@@ -2261,11 +2264,10 @@ void udp_v4_early_demux(struct sk_buff *skb)
 					     uh->source, iph->saddr, dif, sdif);
 	}
 
-	if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
+	if (!sk)
 		return;
 
-	skb->sk = sk;
-	skb->destructor = sock_efree;
+	skb_set_noref_sk(skb, sk);
 	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e2ecfb137297..8f62392c4c35 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -787,6 +787,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	struct net *net = dev_net(skb->dev);
 	struct udphdr *uh;
 	struct sock *sk;
+	bool noref_sk;
 	u32 ulen = 0;
 
 	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
@@ -823,6 +824,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		goto csum_error;
 
 	/* Check if the socket is already available, e.g. due to early demux */
+	noref_sk = skb_has_noref_sk(skb);
 	sk = skb_steal_sock(skb);
 	if (sk) {
 		struct dst_entry *dst = skb_dst(skb);
@@ -832,7 +834,8 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 			udp6_sk_rx_dst_set(sk, dst);
 
 		ret = udpv6_queue_rcv_skb(sk, skb);
-		sock_put(sk);
+		if (!noref_sk)
+			sock_put(sk);
 
 		/* a return value > 0 means to resubmit the input */
 		if (ret > 0)
@@ -948,11 +951,10 @@ static void udp_v6_early_demux(struct sk_buff *skb)
 	else
 		return;
 
-	if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
+	if (!sk)
 		return;
 
-	skb->sk = sk;
-	skb->destructor = sock_efree;
+	skb_set_noref_sk(skb, sk);
 	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
-- 
2.13.5
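The receive-path pattern in this patch (remember whether the socket reference was borrowed, steal the socket, then drop a reference only if one was actually taken) can be sketched in plain user-space C. This is only a toy model under assumptions: the struct and function names below are invented for the example, a plain int stands in for the refcount, and nothing here reflects how skb_set_noref_sk() or skb_has_noref_sk() are really implemented in patch 01 of the series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock and struct sk_buff. */
struct sock_model { int refcnt; };
struct skb_model  { struct sock_model *sk; bool noref; };

/* Attach a socket and take a reference (the pre-patch behaviour). */
static void set_sk_ref(struct skb_model *skb, struct sock_model *sk)
{
	sk->refcnt++;
	skb->sk = sk;
	skb->noref = false;
}

/* Attach a socket without touching its refcount (borrowed reference). */
static void set_sk_noref(struct skb_model *skb, struct sock_model *sk)
{
	skb->sk = sk;
	skb->noref = true;
}

/*
 * Receive path: sample the noref flag before stealing the socket,
 * then release a reference only when one was taken at attach time.
 */
static void deliver(struct skb_model *skb)
{
	bool noref = skb->noref;
	struct sock_model *sk = skb->sk;

	skb->sk = NULL;
	skb->noref = false;
	if (!sk)
		return;

	/* ... the packet would be queued to sk here ... */

	if (!noref)
		sk->refcnt--;
}
```

The point of the model is the ordering: the flag has to be read before the socket pointer is stolen, which is why the patch calls skb_has_noref_sk() immediately ahead of skb_steal_sock().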
[RFC PATCH 04/11] net: add simple socket-like dst cache helpers
It will be used by later patches to reduce code duplication.

Signed-off-by: Paolo Abeni
---
 include/net/dst.h | 20 
 net/core/dst.c    | 12 
 2 files changed, 32 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..4fcca0e368c6 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -485,6 +485,26 @@ static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
 	return dst;
 }
 
+/* update the cache with dst, assuming the latter already carries a refcount */
+static inline bool __dst_update(struct dst_entry **cache, struct dst_entry *dst)
+{
+	struct dst_entry *old = xchg(cache, dst);
+
+	dst_release(old);
+	return old != dst;
+}
+bool dst_update(struct dst_entry **cache, struct dst_entry *dst);
+static inline struct dst_entry *dst_access(struct dst_entry **cache,
+					   u32 cookie)
+{
+	struct dst_entry *dst = READ_ONCE(*cache);
+
+	if (!dst)
+		return NULL;
+
+	return dst_check(dst, cookie);
+}
+
 /* Flags for xfrm_lookup flags argument. */
 enum {
 	XFRM_LOOKUP_ICMP = 1 << 0,
diff --git a/net/core/dst.c b/net/core/dst.c
index a6c47da7d0f8..4076f9af45d7 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -205,6 +205,18 @@ void dst_release_immediate(struct dst_entry *dst)
 }
 EXPORT_SYMBOL(dst_release_immediate);
 
+/* update the cache with dst, assuming the latter does not carry a refcount */
+bool dst_update(struct dst_entry **cache, struct dst_entry *dst)
+{
+	if (likely(*cache == dst))
+		return false;
+
+	if (dst_hold_safe(dst))
+		return __dst_update(cache, dst);
+	return false;
+}
+EXPORT_SYMBOL_GPL(dst_update);
+
 u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
 {
 	struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC);
-- 
2.13.5
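To make the two update flavours easier to compare, here is a user-space approximation written with C11 atomics. It is a sketch under assumptions rather than the kernel helpers themselves: struct entry, entry_slot and the cache_* functions are names invented for this example, atomic_exchange() stands in for xchg(), the hold helper imitates an increment-unless-zero, and releasing never frees anything.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a dst entry. */
struct entry { atomic_int refcnt; };

/* A cache slot is an atomically updated pointer to an entry. */
typedef struct entry *_Atomic entry_slot;

static void entry_release(struct entry *e)
{
	if (e)
		atomic_fetch_sub(&e->refcnt, 1);
}

/* Take a reference unless the count has already dropped to zero. */
static bool entry_hold_safe(struct entry *e)
{
	int old = atomic_load(&e->refcnt);

	while (old > 0) {
		/* on failure, old is reloaded with the current count */
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* new_entry already carries a reference, which the cache takes over. */
static bool cache_update_owned(entry_slot *cache, struct entry *new_entry)
{
	struct entry *old = atomic_exchange(cache, new_entry);

	entry_release(old);
	return old != new_entry;
}

/* new_entry carries no reference: take one before publishing it. */
static bool cache_update(entry_slot *cache, struct entry *new_entry)
{
	if (atomic_load(cache) == new_entry)
		return false;

	if (entry_hold_safe(new_entry))
		return cache_update_owned(cache, new_entry);
	return false;
}
```

In this model a caller that looked up an entry without holding it would use cache_update(), while a caller that just allocated the entry (and so owns its initial reference) would use cache_update_owned(); both return true only when the slot actually changed.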
[PATCH,v3,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
Add a TUN/TAP receive mode that exercises the napi_gro_frags()
interface. This mode is available only in TAP mode, as the interface
expects packets with Ethernet headers.

Furthermore, packets follow the layout of the iovec_iter that was
received. The first iovec is the linear data, and every one after the
first is a fragment. If there are more fragments than the max number,
drop the packet. Additionally, invoke eth_get_headlen() to exercise flow
dissector code and to verify that the header resides in the linear data.

The napi_gro_frags() mode requires setting the IFF_NAPI_FRAGS option.
This is imposed because this mode is intended for testing via tools like
syzkaller and packetdrill, and the increased flexibility it provides can
introduce security vulnerabilities. This flag is accepted only if the
device is in TAP mode and has the IFF_NAPI flag set as well. This is
done because both of these are explicit requirements for correct
operation in this mode.

Signed-off-by: Petar Penkov
Cc: Eric Dumazet
Cc: Mahesh Bandewar
Cc: Willem de Bruijn
Cc: da...@davemloft.net
Cc: ppen...@stanford.edu
---
 drivers/net/tun.c           | 134 ++--
 include/uapi/linux/if_tun.h |   1 +
 2 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f16407242b18..9880b3bc8fa5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -75,6 +75,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -121,7 +122,8 @@ do {								\
 #define TUN_VNET_BE 0x4000
 
 #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
-		      IFF_MULTI_QUEUE | IFF_NAPI)
+		      IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS)
+
 #define GOODCOPY_LEN 128
 
 #define FLT_EXACT_COUNT 8
@@ -173,6 +175,7 @@ struct tun_file {
 		unsigned int ifindex;
 	};
 	struct napi_struct napi;
+	struct mutex napi_mutex;	/* Protects access to the above napi */
 	struct list_head next;
 	struct tun_struct *detached;
 	struct skb_array tx_array;
@@ -277,6 +280,7 @@ static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile,
 		netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll,
 			       NAPI_POLL_WEIGHT);
 		napi_enable(&tfile->napi);
+		mutex_init(&tfile->napi_mutex);
 	}
 }
 
@@ -292,6 +296,11 @@ static void tun_napi_del(struct tun_struct *tun, struct tun_file *tfile)
 		netif_napi_del(&tfile->napi);
 }
 
+static bool tun_napi_frags_enabled(const struct tun_struct *tun)
+{
+	return READ_ONCE(tun->flags) & IFF_NAPI_FRAGS;
+}
+
 #ifdef CONFIG_TUN_VNET_CROSS_LE
 static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
 {
@@ -1036,7 +1045,8 @@ static void tun_poll_controller(struct net_device *dev)
 	 * supports polling, which enables bridge devices in virt setups to
 	 * still use netconsole
 	 * If NAPI is enabled, however, we need to schedule polling for all
-	 * queues.
+	 * queues unless we are using napi_gro_frags(), which we call in
+	 * process context and not in NAPI context.
 	 */
 	struct tun_struct *tun = netdev_priv(dev);
 
@@ -1044,6 +1054,9 @@ static void tun_poll_controller(struct net_device *dev)
 		struct tun_file *tfile;
 		int i;
 
+		if (tun_napi_frags_enabled(tun))
+			return;
+
 		rcu_read_lock();
 		for (i = 0; i < tun->numqueues; i++) {
 			tfile = rcu_dereference(tun->tfiles[i]);
@@ -1266,6 +1279,64 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
 	return mask;
 }
 
+static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
+					    size_t len,
+					    const struct iov_iter *it)
+{
+	struct sk_buff *skb;
+	size_t linear;
+	int err;
+	int i;
+
+	if (it->nr_segs > MAX_SKB_FRAGS + 1)
+		return ERR_PTR(-ENOMEM);
+
+	local_bh_disable();
+	skb = napi_get_frags(&tfile->napi);
+	local_bh_enable();
+	if (!skb)
+		return ERR_PTR(-ENOMEM);
+
+	linear = iov_iter_single_seg_count(it);
+	err = __skb_grow(skb, linear);
+	if (err)
+		goto free;
+
+	skb->len = len;
+	skb->data_len = len - linear;
+	skb->truesize += skb->data_len;
+
+	for (i = 1; i < it->nr_segs; i++) {
+		size_t fragsz = it->iov[i].iov_len;
+		unsigned long offset;
+		struct page *page;
+		void *data;
+
+		if (fragsz == 0 || fragsz > PAGE_SIZE) {
+			err = -EINVAL;
+			goto free;
[PATCH,v3,net-next 1/2] tun: enable NAPI for TUN/TAP driver
Change the TUN driver to use napi_gro_receive() upon receiving packets
rather than netif_rx_ni(). Add the IFF_NAPI flag to enable this path;
operation is not affected if the flag is disabled. SKBs are constructed
upon packet arrival and are queued to be processed later.

The new path was evaluated with a benchmark with the following setup:
Open two tap devices and a receiver thread that reads in a loop for
each device. Start one sender thread and pin all threads to different
CPUs. Send 1M minimum-size UDP packets to each device and measure
sending time for each of the sending methods:

	napi_gro_receive():	4.90s
	netif_rx_ni():		4.90s
	netif_receive_skb():	7.20s

Signed-off-by: Petar Penkov
Cc: Eric Dumazet
Cc: Mahesh Bandewar
Cc: Willem de Bruijn
Cc: da...@davemloft.net
Cc: ppen...@stanford.edu
---
 drivers/net/tun.c           | 133 +++-
 include/uapi/linux/if_tun.h |   1 +
 2 files changed, 119 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3c9985f29950..f16407242b18 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -121,7 +121,7 @@ do {								\
 #define TUN_VNET_BE 0x4000
 
 #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
-		      IFF_MULTI_QUEUE)
+		      IFF_MULTI_QUEUE | IFF_NAPI)
 #define GOODCOPY_LEN 128
 
 #define FLT_EXACT_COUNT 8
@@ -172,6 +172,7 @@ struct tun_file {
 		u16 queue_index;
 		unsigned int ifindex;
 	};
+	struct napi_struct napi;
 	struct list_head next;
 	struct tun_struct *detached;
 	struct skb_array tx_array;
@@ -229,6 +230,68 @@ struct tun_struct {
 	struct bpf_prog __rcu *xdp_prog;
 };
 
+static int tun_napi_receive(struct napi_struct *napi, int budget)
+{
+	struct tun_file *tfile = container_of(napi, struct tun_file, napi);
+	struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
+	struct sk_buff_head process_queue;
+	struct sk_buff *skb;
+	int received = 0;
+
+	__skb_queue_head_init(&process_queue);
+
+	spin_lock(&queue->lock);
+	skb_queue_splice_tail_init(queue, &process_queue);
+	spin_unlock(&queue->lock);
+
+	while (received < budget && (skb = __skb_dequeue(&process_queue))) {
+		napi_gro_receive(napi, skb);
+		++received;
+	}
+
+	if (!skb_queue_empty(&process_queue)) {
+		spin_lock(&queue->lock);
+		skb_queue_splice(&process_queue, queue);
+		spin_unlock(&queue->lock);
+	}
+
+	return received;
+}
+
+static int tun_napi_poll(struct napi_struct *napi, int budget)
+{
+	unsigned int received;
+
+	received = tun_napi_receive(napi, budget);
+
+	if (received < budget)
+		napi_complete_done(napi, received);
+
+	return received;
+}
+
+static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile,
+			  bool napi_en)
+{
+	if (napi_en) {
+		netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll,
+			       NAPI_POLL_WEIGHT);
+		napi_enable(&tfile->napi);
+	}
+}
+
+static void tun_napi_disable(struct tun_struct *tun, struct tun_file *tfile)
+{
+	if (tun->flags & IFF_NAPI)
+		napi_disable(&tfile->napi);
+}
+
+static void tun_napi_del(struct tun_struct *tun, struct tun_file *tfile)
+{
+	if (tun->flags & IFF_NAPI)
+		netif_napi_del(&tfile->napi);
+}
+
 #ifdef CONFIG_TUN_VNET_CROSS_LE
 static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
 {
@@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 
 	tun = rtnl_dereference(tfile->tun);
 
+	if (tun && clean) {
+		tun_napi_disable(tun, tfile);
+		tun_napi_del(tun, tfile);
+	}
+
 	if (tun && !tfile->detached) {
 		u16 index = tfile->queue_index;
 		BUG_ON(index >= tun->numqueues);
@@ -598,6 +666,7 @@ static void tun_detach_all(struct net_device *dev)
 	for (i = 0; i < n; i++) {
 		tfile = rtnl_dereference(tun->tfiles[i]);
 		BUG_ON(!tfile);
+		tun_napi_disable(tun, tfile);
 		tfile->socket.sk->sk_shutdown = RCV_SHUTDOWN;
 		tfile->socket.sk->sk_data_ready(tfile->socket.sk);
 		RCU_INIT_POINTER(tfile->tun, NULL);
@@ -613,6 +682,7 @@ static void tun_detach_all(struct net_device *dev)
 	synchronize_net();
 	for (i = 0; i < n; i++) {
 		tfile = rtnl_dereference(tun->tfiles[i]);
+		tun_napi_del(tun, tfile);
 		/* Drop read queue */
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
@@ -631,7 +701,8 @@ static void tun_detach_all(struct net_device *dev)
[PATCH,v3,net-next 0/2] Improve code coverage of syzkaller
This patch series is intended to improve code coverage of syzkaller on
the early receive path, specifically including flow dissector, GRO,
and GRO with frags parts of the networking stack. Syzkaller exercises
the stack through the TUN driver and this is therefore where changes
reside.

Current coverage through netif_receive_skb() is limited as it does not
touch on any of the aforementioned code paths. Furthermore, for full
coverage, it is necessary to have more flexibility over the linear and
non-linear data of the skbs. The following patches address this by
providing the user (syzkaller) with the ability to send via
napi_gro_receive() and napi_gro_frags(). Additionally, syzkaller can
specify how many fragments there are and how much data per fragment
there is. This is done by exploiting the convenient structure of iovecs.
Finally, this patch series adds support for exercising the flow
dissector during fuzzing.

The code path including napi_gro_receive() can be enabled via the
IFF_NAPI flag. The remainder of the changes in this patch series give
the user significantly more control over packets entering the kernel.
To avoid potential security vulnerabilities, hide the ability to send
custom skbs and the flow dissector code paths behind a
capable(CAP_NET_ADMIN) check to require special user privileges.

Changes since v2 based on feedback from Willem de Bruijn and Mahesh
Bandewar:

Patch 1/ No changes.

Patch 2/ Check if the preconditions for IFF_NAPI_FRAGS (IFF_NAPI and
	 IFF_TAP) are met before opening/attaching rather than after.
	 If they are not, change the behavior from discarding the
	 flag to rejecting the command with EINVAL.

Petar Penkov (2):
  tun: enable NAPI for TUN/TAP driver
  tun: enable napi_gro_frags() for TUN/TAP driver

 drivers/net/tun.c           | 261 +---
 include/uapi/linux/if_tun.h |   2 +
 2 files changed, 245 insertions(+), 18 deletions(-)

-- 
2.11.0
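As a rough illustration of the iovec convention described above (first iovec is the linear data, each later iovec is one fragment), the following user-space sketch mirrors the two layout checks visible in the tun_napi_alloc_frags() excerpt of patch 2/2. It is a model under assumptions: check_frags_layout() is a made-up name, and MODEL_MAX_FRAGS and MODEL_PAGE_SIZE are stand-ins chosen for the example, since the real MAX_SKB_FRAGS and PAGE_SIZE depend on the kernel configuration and architecture.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

#define MODEL_MAX_FRAGS 17	/* assumption: stand-in for MAX_SKB_FRAGS */
#define MODEL_PAGE_SIZE 4096	/* assumption: stand-in for PAGE_SIZE */

/*
 * Returns 0 if the segment layout would pass the checks modelled here,
 * or a negative errno matching the one used in the patch excerpt.
 */
static int check_frags_layout(const struct iovec *iov, size_t nr_segs)
{
	size_t i;

	/* iov[0] is the linear part; every later segment is one fragment */
	if (nr_segs > MODEL_MAX_FRAGS + 1)
		return -ENOMEM;

	for (i = 1; i < nr_segs; i++) {
		size_t fragsz = iov[i].iov_len;

		/* a fragment must be non-empty and fit in a single page */
		if (fragsz == 0 || fragsz > MODEL_PAGE_SIZE)
			return -EINVAL;
	}

	return 0;
}
```

A fuzzer-side helper along these lines could be used to decide in advance whether a given writev() layout is expected to be rejected, though the kernel remains the authority on what it accepts.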
[PATCH] [for 4.14] net: qcom/emac: specify the correct size when mapping a DMA buffer
When mapping the RX DMA buffers, the driver was accidentally specifying
zero for the buffer length. Under normal circumstances, SWIOTLB does not
need to allocate a bounce buffer, so the address is just mapped without
checking the size field. This is why the error was not detected earlier.

Fixes: b9b17debc69d ("net: emac: emac gigabit ethernet controller driver")
Cc: sta...@vger.kernel.org
Signed-off-by: Timur Tabi
---
 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 0ea3ca09c689..3ed9033e56db 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -898,7 +898,8 @@ static void emac_mac_rx_descs_refill(struct emac_adapter *adpt,
 
 		curr_rxbuf->dma_addr =
 			dma_map_single(adpt->netdev->dev.parent, skb->data,
-				       curr_rxbuf->length, DMA_FROM_DEVICE);
+				       adpt->rxbuf_size, DMA_FROM_DEVICE);
+
 		ret = dma_mapping_error(adpt->netdev->dev.parent,
 					curr_rxbuf->dma_addr);
 		if (ret) {
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH net-next v2 1/3] net: dsa: use slave device phydev
Hi Florian,

Florian Fainelli writes:

> On 09/22/2017 12:40 PM, Vivien Didelot wrote:
>> There is no need to store a phy_device in dsa_slave_priv since
>> net_device already provides one. Simply s/p->phy/dev->phydev/.
>
> You can therefore remove the phy_device from dsa_slave_priv, see below
> for more comments. I will have to regress test the heck out of this,
> this should take a few hours.

OK, since this is a sensitive topic, I will respin a v3 without this
patch, so that a future patchset can address your comments below and
also give you time to test this one patch alone.

>>  static int dsa_slave_port_attr_set(struct net_device *dev,
>> @@ -435,12 +433,10 @@ static int
>> dsa_slave_get_link_ksettings(struct net_device *dev,
>> 			     struct ethtool_link_ksettings *cmd)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>
>> -	if (!p->phy)
>> -		return -EOPNOTSUPP;
>> -
>> -	phy_ethtool_ksettings_get(p->phy, cmd);
>> +	phy_ethtool_ksettings_get(dev->phydev, cmd);
>
> This can be replaced by phy_ethtool_get_link_ksettings()
>
>>
>>  	return 0;
>>  }
>> @@ -449,12 +445,10 @@ static int
>> dsa_slave_set_link_ksettings(struct net_device *dev,
>> 			     const struct ethtool_link_ksettings *cmd)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>
>> -	if (p->phy != NULL)
>> -		return phy_ethtool_ksettings_set(p->phy, cmd);
>> -
>> -	return -EOPNOTSUPP;
>> +	return phy_ethtool_ksettings_set(dev->phydev, cmd);
>>  }
>
> This can disappear and you can assign this ethtool operation to
> phy_ethtool_set_link_ksettings()
>
>>
>>  static void dsa_slave_get_drvinfo(struct net_device *dev,
>> @@ -488,24 +482,20 @@ dsa_slave_get_regs(struct net_device *dev, struct
>> ethtool_regs *regs, void *_p)
>>
>>  static int dsa_slave_nway_reset(struct net_device *dev)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>
>> -	if (p->phy != NULL)
>> -		return genphy_restart_aneg(p->phy);
>> -
>> -	return -EOPNOTSUPP;
>> +	return genphy_restart_aneg(dev->phydev);
>>  }
>
> This can now disappear and you can use phy_ethtool_nway_reset() directly
> in ethtool_ops
>
>>
>>  static u32 dsa_slave_get_link(struct net_device *dev)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	if (!dev->phydev)
>> +		return -ENODEV;
>>
>> -	if (p->phy != NULL) {
>> -		genphy_update_link(p->phy);
>> -		return p->phy->link;
>> -	}
>> +	genphy_update_link(dev->phydev);
>>
>> -	return -EOPNOTSUPP;
>> +	return dev->phydev->link;
>>  }
>
> This should certainly be just ethtool_op_get_link(), not sure why we
> kept that around here...

Haaa, good to read that! I wasn't sure about this, but with this patch
the slave phy ethtool functions seemed indeed quite generic...

Thanks,

        Vivien
Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic
> >I'm wondering how this is supposed to work. Please add a good comment
> >here, since the hardware is forcing you to do something odd.
> >
> >Maybe it would be a good idea to save the STP state in chip. And then
> >when chip->is_bridged is set true, change the state in the hardware to
> >the saved value?
> >
> >What happens when port 0 is added to the bridge, there is then a
> >minute pause and then port 1 is added? I would expect that as soon as
> >port 0 is added, the STP state machine for port 0 will start and move
> >into listening and then forwarding. Due to hardware limitations it
> >looks like you cannot do this. So what state is the hardware
> >effectively in? Blocking? Forwarding?
> >
> >Then port 1 is added. You can then can respect the states. port 1 will
> >do blocking->listening->forwarding, but what about port 0? The calls
> >won't get repeated? How does it transition to forwarding?
> >
> >  Andrew
> >
>
> I see your point with the "minute pause" argument. Although a bit
> contrived use case, it is easy to fix by caching the STP state, as
> you suggest. So I can do that.

I don't think it is contrived. I've done bridge configuration by hand
for testing purposes. I've also set the forwarding delay to very small
values, so there is a clear race condition here.

> How does other DSA HW chips handle port separation? Knowing that
> could perhaps help me know what to look for.

They have better hardware :-)

Generally each port is totally independent. You can change the STP
state per port without restrictions.

	Andrew
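The caching idea discussed above (remember the STP state the bridge asked for, and replay it once chip->is_bridged becomes true) might look roughly like the following user-space model. Everything here is an assumption made for illustration: the struct layout, the function names, the two-port count and the hw[] array standing in for the real registers are all hypothetical, and the sketch deliberately leaves open what the hardware does while not bridged, which is exactly the open question in the thread.

```c
#include <stdbool.h>

enum stp_state {
	STP_DISABLED,
	STP_BLOCKING,
	STP_LISTENING,
	STP_LEARNING,
	STP_FORWARDING,
};

#define MODEL_NUM_PORTS 2	/* hypothetical port count for this model */

struct chip_model {
	bool is_bridged;
	enum stp_state cached[MODEL_NUM_PORTS];	/* last state requested */
	enum stp_state hw[MODEL_NUM_PORTS];	/* stand-in for registers */
};

/* Always remember the request; touch the hardware only when bridged. */
static void port_stp_state_set(struct chip_model *chip, int port,
			       enum stp_state state)
{
	chip->cached[port] = state;
	if (chip->is_bridged)
		chip->hw[port] = state;
}

/* On entering bridged mode, replay the cached state of every port. */
static void set_bridged(struct chip_model *chip, bool bridged)
{
	int port;

	chip->is_bridged = bridged;
	if (!bridged)
		return;

	for (port = 0; port < MODEL_NUM_PORTS; port++)
		chip->hw[port] = chip->cached[port];
}
```

In this model, a state change requested for port 0 before port 1 joins is not lost: it is applied as soon as set_bridged(chip, true) runs, so port 0 does not depend on the bridge repeating its calls.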
Re: [PATCH net-next v2 1/3] net: dsa: use slave device phydev
On Fri, Sep 22, 2017 at 03:40:43PM -0400, Vivien Didelot wrote:
> There is no need to store a phy_device in dsa_slave_priv since
> net_device already provides one. Simply s/p->phy/dev->phydev/.
>
> While at it, return -ENODEV when it is NULL instead of -EOPNOTSUPP.

I just did a quick poll for calling phy_mii_ioctl(). ENODEV seems the
most popular, second to EINVAL. Marvell drivers all use EOPNOTSUPP.

>  static int dsa_slave_nway_reset(struct net_device *dev)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>
> -	if (p->phy != NULL)
> -		return genphy_restart_aneg(p->phy);
> -
> -	return -EOPNOTSUPP;
> +	return genphy_restart_aneg(dev->phydev);
>  }

It looks like this can now be replaced with phy_ethtool_nway_reset().

It could be there are other phy_ethtool_ helpers which can be used,
now that we have phydev in ndev.

	Andrew
Re: [PATCH net-next v2 1/3] net: dsa: use slave device phydev
On 09/22/2017 12:40 PM, Vivien Didelot wrote:
> There is no need to store a phy_device in dsa_slave_priv since
> net_device already provides one. Simply s/p->phy/dev->phydev/.

You can therefore remove the phy_device from dsa_slave_priv, see below
for more comments. I will have to regression test the heck out of this,
which should take a few hours.

>
> While at it, return -ENODEV when it is NULL instead of -EOPNOTSUPP.
>
> Signed-off-by: Vivien Didelot
> ---

>  static int dsa_slave_port_attr_set(struct net_device *dev,
> @@ -435,12 +433,10 @@ static int
>  dsa_slave_get_link_ksettings(struct net_device *dev,
>  			     struct ethtool_link_ksettings *cmd)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>
> -	if (!p->phy)
> -		return -EOPNOTSUPP;
> -
> -	phy_ethtool_ksettings_get(p->phy, cmd);
> +	phy_ethtool_ksettings_get(dev->phydev, cmd);

This can be replaced by phy_ethtool_get_link_ksettings()

>
>  	return 0;
>  }
> @@ -449,12 +445,10 @@ static int
>  dsa_slave_set_link_ksettings(struct net_device *dev,
>  			     const struct ethtool_link_ksettings *cmd)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>
> -	if (p->phy != NULL)
> -		return phy_ethtool_ksettings_set(p->phy, cmd);
> -
> -	return -EOPNOTSUPP;
> +	return phy_ethtool_ksettings_set(dev->phydev, cmd);
>  }

This can disappear and you can assign this ethtool operation to
phy_ethtool_set_link_ksettings()

>
>  static void dsa_slave_get_drvinfo(struct net_device *dev,
> @@ -488,24 +482,20 @@ dsa_slave_get_regs(struct net_device *dev, struct
> ethtool_regs *regs, void *_p)
>
>  static int dsa_slave_nway_reset(struct net_device *dev)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>
> -	if (p->phy != NULL)
> -		return genphy_restart_aneg(p->phy);
> -
> -	return -EOPNOTSUPP;
> +	return genphy_restart_aneg(dev->phydev);
>  }

This can now disappear and you can use phy_ethtool_nway_reset() directly
in ethtool_ops

>
>  static u32 dsa_slave_get_link(struct net_device *dev)
>  {
> -	struct dsa_slave_priv *p = netdev_priv(dev);
> +	if (!dev->phydev)
> +		return -ENODEV;
>
> -	if (p->phy != NULL) {
> -		genphy_update_link(p->phy);
> -		return p->phy->link;
> -	}
> +	genphy_update_link(dev->phydev);
>
> -	return -EOPNOTSUPP;
> +	return dev->phydev->link;
>  }

This should certainly be just ethtool_op_get_link(), not sure why we
kept that around here...
--
Florian
[PATCH net-next v2 2/3] net: dsa: make slave close symmetrical to open
The DSA slave open function configures the unicast MAC addresses on the
master device, enables the switch port, changes its STP state, then
starts the PHY device.

Make the close function symmetric, by first stopping the PHY device,
then changing the STP state, disabling the switch port and restoring
the master device.

Signed-off-by: Vivien Didelot
Reviewed-by: Florian Fainelli
Reviewed-by: Andrew Lunn
---
 net/dsa/slave.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3760472bf41d..0aab29928152 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -133,6 +133,11 @@ static int dsa_slave_close(struct net_device *dev)
 	if (dev->phydev)
 		phy_stop(dev->phydev);
 
+	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, p->dp->index, dev->phydev);
+
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
 	if (dev->flags & IFF_ALLMULTI)
@@ -143,11 +148,6 @@ static int dsa_slave_close(struct net_device *dev)
 	if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
 		dev_uc_del(master, dev->dev_addr);
 
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index, dev->phydev);
-
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
 	return 0;
 }
 
-- 
2.14.1
[PATCH net-next v2 3/3] net: dsa: add port enable and disable helpers
Provide dsa_port_enable and dsa_port_disable helpers to respectively
enable and disable a switch port. This makes the dsa_port_set_state_now
helper static.

Signed-off-by: Vivien Didelot
Reviewed-by: Florian Fainelli
Reviewed-by: Andrew Lunn
---
 net/dsa/dsa_priv.h |  3 ++-
 net/dsa/port.c     | 31 ++-
 net/dsa/slave.c    | 19 +--
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 9803952a5b40..0298a0f6a349 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -117,7 +117,8 @@ void dsa_master_ethtool_restore(struct net_device *dev);
 /* port.c */
 int dsa_port_set_state(struct dsa_port *dp, u8 state,
 		       struct switchdev_trans *trans);
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state);
+int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
+void dsa_port_disable(struct dsa_port *dp, struct phy_device *phy);
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 76d43a82d397..72c8dbd3d3f2 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -56,7 +56,7 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state,
 	return 0;
 }
 
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
+static void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 {
 	int err;
 
@@ -65,6 +65,35 @@ void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 		pr_err("DSA: failed to set STP state %u (%d)\n", state, err);
 }
 
+int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy)
+{
+	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+	int err;
+
+	if (ds->ops->port_enable) {
+		err = ds->ops->port_enable(ds, port, phy);
+		if (err)
+			return err;
+	}
+
+	dsa_port_set_state_now(dp, stp_state);
+
+	return 0;
+}
+
+void dsa_port_disable(struct dsa_port *dp, struct phy_device *phy)
+{
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+
+	dsa_port_set_state_now(dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, port, phy);
+}
+
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 {
 	struct dsa_notifier_bridge_info info = {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0aab29928152..4ea1c6eb0da8 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -73,9 +73,7 @@ static int dsa_slave_open(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_port *dp = p->dp;
-	struct dsa_switch *ds = dp->ds;
 	struct net_device *master = dsa_master_netdev(p);
-	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
 	int err;
 
 	if (!(master->flags & IFF_UP))
@@ -98,13 +96,9 @@ static int dsa_slave_open(struct net_device *dev)
 			goto clear_allmulti;
 	}
 
-	if (ds->ops->port_enable) {
-		err = ds->ops->port_enable(ds, p->dp->index, dev->phydev);
-		if (err)
-			goto clear_promisc;
-	}
-
-	dsa_port_set_state_now(p->dp, stp_state);
+	err = dsa_port_enable(dp, dev->phydev);
+	if (err)
+		goto clear_promisc;
 
 	if (dev->phydev)
 		phy_start(dev->phydev);
@@ -128,15 +122,12 @@ static int dsa_slave_close(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct net_device *master = dsa_master_netdev(p);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_port *dp = p->dp;
 
 	if (dev->phydev)
 		phy_stop(dev->phydev);
 
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index, dev->phydev);
+	dsa_port_disable(dp, dev->phydev);
 
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
-- 
2.14.1
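The ordering that the enable and disable helpers in this patch encode can be modelled outside the kernel. The following is a minimal userspace sketch, not the DSA code itself: `struct port`, `struct port_ops`, the `ST_*` states and the `demo_*` callbacks are all hypothetical stand-ins. It shows the optional driver hook running before the STP state is set on enable, and the reverse order on disable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port states for the sketch. */
enum { ST_DISABLED, ST_BLOCKING, ST_FORWARDING };

struct port;

/* Optional driver hooks; either pointer may be NULL. */
struct port_ops {
	int (*enable)(struct port *p);
	void (*disable)(struct port *p);
};

struct port {
	const struct port_ops *ops;
	int bridged;    /* non-zero when the port is a bridge member */
	int state;      /* current STP state */
	int hw_enabled; /* set and cleared by the demo hooks below */
};

/* Enable: driver hook first, then STP state. A hook failure leaves the state alone. */
int port_enable(struct port *p)
{
	if (p->ops && p->ops->enable) {
		int err = p->ops->enable(p);

		if (err)
			return err;
	}

	p->state = p->bridged ? ST_BLOCKING : ST_FORWARDING;
	return 0;
}

/* Disable: mirror image of enable, STP state first, then the driver hook. */
void port_disable(struct port *p)
{
	p->state = ST_DISABLED;

	if (p->ops && p->ops->disable)
		p->ops->disable(p);
}

/* Demo hooks used only to exercise the helpers. */
int demo_enable(struct port *p)
{
	p->hw_enabled = 1;
	return 0;
}

void demo_disable(struct port *p)
{
	p->hw_enabled = 0;
}

int failing_enable(struct port *p)
{
	(void)p;
	return -1;
}
```

Keeping both steps behind one helper pair means callers such as open and close cannot get the order wrong, which is the point of the refactoring.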
[PATCH net-next v2 0/3] net: dsa: use slave device phydev
This patchset removes the private phy_device in favor of the one
provided by the slave net_device, makes slave open and close symmetrical
and finally provides helpers for enabling or disabling a DSA port.

Changes in v2:
  - do not remove the phy argument from port enable/disable

Vivien Didelot (3):
  net: dsa: use slave device phydev
  net: dsa: make slave close symmetrical to open
  net: dsa: add port enable and disable helpers

 net/dsa/dsa_priv.h |   3 +-
 net/dsa/port.c     |  31 +++-
 net/dsa/slave.c    | 143 +++--
 3 files changed, 94 insertions(+), 83 deletions(-)

-- 
2.14.1
[PATCH net-next v2 1/3] net: dsa: use slave device phydev
There is no need to store a phy_device in dsa_slave_priv since net_device already provides one. Simply s/p->phy/dev->phydev/. While at it, return -ENODEV when it is NULL instead of -EOPNOTSUPP. Signed-off-by: Vivien Didelot--- net/dsa/slave.c | 126 ++-- 1 file changed, 58 insertions(+), 68 deletions(-) diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 02ace7d462c4..3760472bf41d 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -99,15 +99,15 @@ static int dsa_slave_open(struct net_device *dev) } if (ds->ops->port_enable) { - err = ds->ops->port_enable(ds, p->dp->index, p->phy); + err = ds->ops->port_enable(ds, p->dp->index, dev->phydev); if (err) goto clear_promisc; } dsa_port_set_state_now(p->dp, stp_state); - if (p->phy) - phy_start(p->phy); + if (dev->phydev) + phy_start(dev->phydev); return 0; @@ -130,8 +130,8 @@ static int dsa_slave_close(struct net_device *dev) struct net_device *master = dsa_master_netdev(p); struct dsa_switch *ds = p->dp->ds; - if (p->phy) - phy_stop(p->phy); + if (dev->phydev) + phy_stop(dev->phydev); dev_mc_unsync(master, dev); dev_uc_unsync(master, dev); @@ -144,7 +144,7 @@ static int dsa_slave_close(struct net_device *dev) dev_uc_del(master, dev->dev_addr); if (ds->ops->port_disable) - ds->ops->port_disable(ds, p->dp->index, p->phy); + ds->ops->port_disable(ds, p->dp->index, dev->phydev); dsa_port_set_state_now(p->dp, BR_STATE_DISABLED); @@ -273,12 +273,10 @@ dsa_slave_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) { - struct dsa_slave_priv *p = netdev_priv(dev); + if (!dev->phydev) + return -ENODEV; - if (p->phy != NULL) - return phy_mii_ioctl(p->phy, ifr, cmd); - - return -EOPNOTSUPP; + return phy_mii_ioctl(dev->phydev, ifr, cmd); } static int dsa_slave_port_attr_set(struct net_device *dev, @@ -435,12 +433,10 @@ static int dsa_slave_get_link_ksettings(struct net_device *dev, struct ethtool_link_ksettings *cmd) { - struct dsa_slave_priv *p 
= netdev_priv(dev); + if (!dev->phydev) + return -ENODEV; - if (!p->phy) - return -EOPNOTSUPP; - - phy_ethtool_ksettings_get(p->phy, cmd); + phy_ethtool_ksettings_get(dev->phydev, cmd); return 0; } @@ -449,12 +445,10 @@ static int dsa_slave_set_link_ksettings(struct net_device *dev, const struct ethtool_link_ksettings *cmd) { - struct dsa_slave_priv *p = netdev_priv(dev); + if (!dev->phydev) + return -ENODEV; - if (p->phy != NULL) - return phy_ethtool_ksettings_set(p->phy, cmd); - - return -EOPNOTSUPP; + return phy_ethtool_ksettings_set(dev->phydev, cmd); } static void dsa_slave_get_drvinfo(struct net_device *dev, @@ -488,24 +482,20 @@ dsa_slave_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p) static int dsa_slave_nway_reset(struct net_device *dev) { - struct dsa_slave_priv *p = netdev_priv(dev); + if (!dev->phydev) + return -ENODEV; - if (p->phy != NULL) - return genphy_restart_aneg(p->phy); - - return -EOPNOTSUPP; + return genphy_restart_aneg(dev->phydev); } static u32 dsa_slave_get_link(struct net_device *dev) { - struct dsa_slave_priv *p = netdev_priv(dev); + if (!dev->phydev) + return -ENODEV; - if (p->phy != NULL) { - genphy_update_link(p->phy); - return p->phy->link; - } + genphy_update_link(dev->phydev); - return -EOPNOTSUPP; + return dev->phydev->link; } static int dsa_slave_get_eeprom_len(struct net_device *dev) @@ -640,7 +630,7 @@ static int dsa_slave_set_eee(struct net_device *dev, struct ethtool_eee *e) int ret; /* Port's PHY and MAC both need to be EEE capable */ - if (!p->phy) + if (!dev->phydev) return -ENODEV; if (!ds->ops->set_mac_eee) @@ -651,12 +641,12 @@ static int dsa_slave_set_eee(struct net_device *dev, struct ethtool_eee *e) return ret; if (e->eee_enabled) { - ret = phy_init_eee(p->phy, 0); + ret = phy_init_eee(dev->phydev, 0); if (ret) return ret; } - return phy_ethtool_set_eee(p->phy, e); + return phy_ethtool_set_eee(dev->phydev, e); } static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e) @@
Re: [PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
> Historical reasons mostly. Considering the complexity of
> dsa_slave_phy_setup(), I would certainly be extremely careful in
> changing any of this, the potential for breakage is pretty big.

Yes, I took a look at this, wondering how to convert to phylink. I
went away and got a stiff drink :-)

Andrew
[PATCH] r8152: add Linksys USB3GIGV1 id
This Linksys dongle by default comes up in cdc_ether mode.
This patch allows r8152 to claim the device:
   Bus 002 Device 002: ID 13b1:0041 Linksys

Signed-off-by: Grant Grundler
---
 drivers/net/usb/r8152.c | 2 ++
 1 file changed, 2 insertions(+)

This was tested on chromeos-3.14, chromeos-3.18, and chromeos-4.4 kernels
with a mix of ARM/x86-64 systems.

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ceb78e2ea4f0..941ece08ba78 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -613,6 +613,7 @@ enum rtl8152_flags {
 #define VENDOR_ID_MICROSOFT		0x045e
 #define VENDOR_ID_SAMSUNG		0x04e8
 #define VENDOR_ID_LENOVO		0x17ef
+#define VENDOR_ID_LINKSYS		0x13b1
 #define VENDOR_ID_NVIDIA		0x0955
 
 #define MCU_TYPE_PLA			0x0100
@@ -5316,6 +5317,7 @@ static const struct usb_device_id rtl8152_table[] = {
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7205)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x720c)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7214)},
+	{REALTEK_USB_DEVICE(VENDOR_ID_LINKSYS, 0x0041)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_NVIDIA,  0x09ff)},
 	{}
 };
-- 
2.14.1.821.g8fa685d3b7-goog
Re: [PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
On 09/22/2017 11:12 AM, Vivien Didelot wrote: > Hi Florian, > > Florian Fainelliwrites: > >> On 09/22/2017 09:17 AM, Vivien Didelot wrote: >>> The .port_enable and .port_disable functions are meant to deal with the >>> switch ports only, and no driver is using the phy argument anyway. >>> Remove it. >> >> I don't think this makes sense, there are perfectly legit reasons why a >> switch driver may have something to do with the PHY device attached to >> its per-port network interface, we should definitively keep that around, >> unless you think we should be accessing the PHY within the switch >> drivers by doing: >> >> struct phy_device *phydev = ds->ports[port].netdev->phydev? > > bcm_sf2 is the only user for this phy argument right now. The reason I'm > doing this is because I prefer to discourage switch drivers to dig into > the phy device themselves while as you said there must be a cleaner > solution. This must be handled somehow elsewhere in the stack. The current approach of passing the phy_device reference as an argument is certainly a cleaner way then. The port_enable caller can provide the correct phy_device and that lifts the switch driver from having to dig it itself from its per-port netdev. > > In the meantime, moving the PHY device up to the dsa_port structure is a > good solution, in order not to expose it in switch ops, but still make > it available to more complex drivers. > > Do you know if netdev->phydev is usable? Why do DSA has its own copy in > dsa_slave_priv then? Historical reasons mostly. Considering the complexity of dsa_slave_phy_setup(), I would certainly be extremely careful in changing any of this, the potential for breakage is pretty big. At first glance, I would say that this is a safe conversion to do, and I can test this on the HW I have here anyway. -- Florian
Re: [PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
Hi Florian,

Florian Fainelli writes:

> On 09/22/2017 09:17 AM, Vivien Didelot wrote:
>> The .port_enable and .port_disable functions are meant to deal with the
>> switch ports only, and no driver is using the phy argument anyway.
>> Remove it.
>
> I don't think this makes sense, there are perfectly legit reasons why a
> switch driver may have something to do with the PHY device attached to
> its per-port network interface, we should definitely keep that around,
> unless you think we should be accessing the PHY within the switch
> drivers by doing:
>
> struct phy_device *phydev = ds->ports[port].netdev->phydev?

bcm_sf2 is the only user for this phy argument right now. The reason I'm
doing this is that I prefer to discourage switch drivers from digging
into the phy device themselves while, as you said, there must be a
cleaner solution. This must be handled somehow elsewhere in the stack.

In the meantime, moving the PHY device up to the dsa_port structure is a
good solution, in order not to expose it in switch ops, but still make
it available to more complex drivers.

Do you know if netdev->phydev is usable? Why does DSA have its own copy
in dsa_slave_priv then?

I'll respin, thanks.

        Vivien
Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver
On Fri, Sep 22, 2017 at 11:03 AM, Willem de Bruijn wrote:
> On Fri, Sep 22, 2017 at 1:11 PM, Mahesh Bandewar (महेश बंडेवार) wrote:
>>>  #ifdef CONFIG_TUN_VNET_CROSS_LE
>>>  static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
>>>  {
>>> @@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>>>
>>>  	tun = rtnl_dereference(tfile->tun);
>>>
>>> +	if (tun && clean) {
>>> +		tun_napi_disable(tun, tfile);
>> are we missing synchronize_net() separating disable and del calls?
>
> That is not needed here. napi_disable has its own mechanism for waiting
> until a napi struct is no longer run. netif_napi_del will call synchronize_net
> if needed.

Yes, that will do. Thanks.

> These two calls are made one after the other in quite a few drivers.
Re: [Intel-wired-lan] [PATCH][V3] e1000: avoid null pointer dereference on invalid stat type
On Fri, Sep 22, 2017 at 10:13 AM, Colin Kingwrote: > From: Colin Ian King > > Currently if the stat type is invalid then data[i] is being set > either by dereferencing a null pointer p, or it is reading from > an incorrect previous location if we had a valid stat type > previously. Fix this by skipping over the read of p on an invalid > stat type. > > Detected by CoverityScan, CID#113385 ("Explicit null dereferenced") > > Signed-off-by: Colin Ian King Looks good to me. Reviewed-by: Alexander Duyck > --- > drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > index ec8aa4562cc9..3b3983a1ffbb 100644 > --- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > +++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c > @@ -1824,11 +1824,12 @@ static void e1000_get_ethtool_stats(struct net_device > *netdev, > { > struct e1000_adapter *adapter = netdev_priv(netdev); > int i; > - char *p = NULL; > const struct e1000_stats *stat = e1000_gstrings_stats; > > e1000_update_stats(adapter); > - for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) { > + for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++, stat++) { > + char *p; > + > switch (stat->type) { > case NETDEV_STATS: > p = (char *)netdev + stat->stat_offset; > @@ -1839,15 +1840,13 @@ static void e1000_get_ethtool_stats(struct net_device > *netdev, > default: > WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n", > stat->type, i); > - break; > + continue; > } > > if (stat->sizeof_stat == sizeof(u64)) > data[i] = *(u64 *)p; > else > data[i] = *(u32 *)p; > - > - stat++; > } > /* BUG_ON(i != E1000_STATS_LEN); */ > } > -- > 2.14.1 > > ___ > Intel-wired-lan mailing list > intel-wired-...@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver
On Fri, Sep 22, 2017 at 1:11 PM, Mahesh Bandewar (महेश बंडेवार) wrote:
>>  #ifdef CONFIG_TUN_VNET_CROSS_LE
>>  static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
>>  {
>> @@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>>
>>  	tun = rtnl_dereference(tfile->tun);
>>
>> +	if (tun && clean) {
>> +		tun_napi_disable(tun, tfile);
> are we missing synchronize_net() separating disable and del calls?

That is not needed here. napi_disable has its own mechanism for waiting
until a napi struct is no longer run. netif_napi_del will call synchronize_net
if needed.

These two calls are made one after the other in quite a few drivers.
Re: [PATCH] Add a driver for Renesas uPD60620 and uPD60620A PHYs
On Fri, Sep 22, 2017 at 05:08:45PM +, Bernd Edlinger wrote:
> Signed-off-by: Bernd Edlinger
> ---
>  drivers/net/phy/Kconfig    |   5 +
>  drivers/net/phy/Makefile   |   1 +
>  drivers/net/phy/uPD60620.c | 226 +
>  3 files changed, 232 insertions(+)
>  create mode 100644 drivers/net/phy/uPD60620.c
>
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index a9d16a3..25089f0 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -287,6 +287,11 @@ config DP83867_PHY
>  	---help---
>  	  Currently supports the DP83867 PHY.
>
> +config RENESAS_PHY
> +	tristate "Driver for Renesas PHYs"
> +	---help---
> +	  Supports the uPD60620 and uPD60620A PHYs.
> +

Hi Bernd

Please call this "Renesas PHYs" and place it in alphabetical order.

>  config FIXED_PHY
>  	tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs"
>  	depends on PHYLIB
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 416df92..1404ad3 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -72,6 +72,7 @@ obj-$(CONFIG_MICROSEMI_PHY)	+= mscc.o
>  obj-$(CONFIG_NATIONAL_PHY)	+= national.o
>  obj-$(CONFIG_QSEMI_PHY)	+= qsemi.o
>  obj-$(CONFIG_REALTEK_PHY)	+= realtek.o
> +obj-$(CONFIG_RENESAS_PHY)	+= uPD60620.o
>  obj-$(CONFIG_ROCKCHIP_PHY)	+= rockchip.o
>  obj-$(CONFIG_SMSC_PHY)	+= smsc.o
>  obj-$(CONFIG_STE10XP)		+= ste10Xp.o
> diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c
> new file mode 100644
> index 000..b3d900c
> --- /dev/null
> +++ b/drivers/net/phy/uPD60620.c
> @@ -0,0 +1,226 @@
> +/*
> + * Driver for the Renesas PHY uPD60620.
> + *
> + * Copyright (C) 2015 Softing Industrial Automation GmbH
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + * > + */ > + > +#include > +#include > +#include > + > +#define UPD60620_PHY_ID0xb8242824 > + > +/* Extended Registers and values */ > +/* PHY Special Control/Status*/ > +#define PHY_PHYSCR 0x1F /* PHY.31 */ > +#define PHY_PHYSCR_10MB0x0004/* PHY speed = 10mb */ > +#define PHY_PHYSCR_100MB 0x0008/* PHY speed = 100mb */ > +#define PHY_PHYSCR_DUPLEX 0x0010/* PHY Duplex */ > +#define PHY_PHYSCR_RSVD5 0x0020/* Reserved Bit 5 */ > +#define PHY_PHYSCR_MIIMOD 0x0040/* Enable 4B5B MII mode */ Are any of these comments actually useful. It seems like the defines are pretty obvious. > +#define PHY_PHYSCR_RSVD7 0x0080/* Reserved Bit 7 */ > +#define PHY_PHYSCR_RSVD8 0x0100/* Reserved Bit 8 */ > +#define PHY_PHYSCR_RSVD9 0x0200/* Reserved Bit 9 */ > +#define PHY_PHYSCR_RSVD10 0x0400/* Reserved Bit 10 */ > +#define PHY_PHYSCR_RSVD11 0x0800/* Reserved Bit 11 */ > +#define PHY_PHYSCR_ANDONE 0x1000/* Auto negotiation done */ > +#define PHY_PHYSCR_RSVD13 0x2000/* Reserved Bit 13 */ > +#define PHY_PHYSCR_RSVD14 0x4000/* Reserved Bit 14 */ > +#define PHY_PHYSCR_RSVD15 0x8000/* Reserved Bit 15 */ It looks like the only register you use is SCR and SPM. Maybe delete all the rest? Or do you plan to add more features making use of these registers? 
> +/* Init PHY */ > + > +static int upd60620_config_init(struct phy_device *phydev) > +{ > + /* Enable support for passive HUBs (could be a strap option) */ > + /* PHYMODE: All speeds, HD in parallel detect */ > + return phy_write(phydev, PHY_SPM, 0x0180 | phydev->mdio.addr); > +} > + > +/* Get PHY status from common registers */ > + > +static int upd60620_read_status(struct phy_device *phydev) > +{ > + int phy_state; > + > + /* Read negotiated state */ > + phy_state = phy_read(phydev, MII_BMSR); > + if (phy_state < 0) > + return phy_state; > + > + phydev->link = 0; > + phydev->lp_advertising = 0; > + phydev->pause = 0; > + phydev->asym_pause = 0; > + > + if (phy_state & BMSR_ANEGCOMPLETE) { It is worth comparing this against genphy_read_status() which is the reference implementation. You would normally check if auto negotiation is enabled, not if it has completed. If it is enabled you read the current negotiated state, even if it is not completed. > + phy_state = phy_read(phydev, PHY_PHYSCR); > + if (phy_state < 0) > + return phy_state; > + > + if (phy_state & (PHY_PHYSCR_10MB | PHY_PHYSCR_100MB)) { > + phydev->link = 1; > + phydev->speed = SPEED_10; > + phydev->duplex = DUPLEX_HALF; > + > + if (phy_state & PHY_PHYSCR_100MB) > + phydev->speed = SPEED_100; > +
Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
On Fri, Sep 22, 2017 at 1:48 PM, Petar Penkov wrote:
> On Fri, Sep 22, 2017 at 9:51 AM, Mahesh Bandewar (महेश बंडेवार) wrote:
>> On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn wrote:
>>>> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>>  	if (tfile->detached)
>>>>  		return -EINVAL;
>>>>
>>>> +	if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
>>>> +		return -EPERM;
>>>> +
>>>
>>> This should perhaps be moved into the !dev branch, directly below the
>>> ns_capable check.
>>>
>> Hmm, does that mean fail only on creation but allow to attach if
>> exists? That would be wrong, isn't it? Correct me if I'm wrong but we
>> want to prevent both these scenarios if user does not have sufficient
>> privileges (i.e. NET_ADMIN in init-ns).

Ok.

>>
> My understanding is we want to protect both scenarios.
>>>>  	dev = __dev_get_by_name(net, ifr->ifr_name);
>>>>  	if (dev) {
>>>>  		if (ifr->ifr_flags & IFF_TUN_EXCL)
>>>> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
>>>>  		(ifr->ifr_flags & TUN_FEATURES);
>>>>
>>>> +	if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != IFF_TAP)
>>>> +		tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
>>>> +
>>>
>>> Similarly, this check only need to be performed in that branch.
>>> Instead of reverting to non-frags mode, a tun_set_iff with the wrong
>>> set of flags should probably fail hard.
>> Yes, agree, wrong set of flags should fail hard and probably be done
>> before attach or open, no?
> Agreed, in v3 I will push this check before the conditional so both
> branches can be rejected with EINVAL.

Sounds great.
Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
On Fri, Sep 22, 2017 at 9:51 AM, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn wrote:
>>> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>  	if (tfile->detached)
>>>  		return -EINVAL;
>>>
>>> +	if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
>>> +		return -EPERM;
>>> +
>>
>> This should perhaps be moved into the !dev branch, directly below the
>> ns_capable check.
>>
> Hmm, does that mean fail only on creation but allow to attach if
> exists? That would be wrong, wouldn't it? Correct me if I'm wrong but we
> want to prevent both these scenarios if user does not have sufficient
> privileges (i.e. NET_ADMIN in init-ns).
>
My understanding is we want to protect both scenarios.
>>>  	dev = __dev_get_by_name(net, ifr->ifr_name);
>>>  	if (dev) {
>>>  		if (ifr->ifr_flags & IFF_TUN_EXCL)
>>> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
>>>  		(ifr->ifr_flags & TUN_FEATURES);
>>>
>>> +	if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != IFF_TAP)
>>> +		tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
>>> +
>>
>> Similarly, this check only needs to be performed in that branch.
>> Instead of reverting to non-frags mode, a tun_set_iff with the wrong
>> set of flags should probably fail hard.
> Yes, agree, wrong set of flags should fail hard and probably be done
> before attach or open, no?
Agreed, in v3 I will push this check before the conditional so both
branches can be rejected with EINVAL.
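The "fail hard before either branch" approach agreed on here can be illustrated with a small userspace sketch. It is an assumption-laden model, not the v3 patch: `tun_check_flags()` and its `has_net_admin` parameter are hypothetical, the flag macros are local definitions made only so the sketch is self-contained, and the order of the permission check versus the flag-combination check is a choice made here, not something the thread settles.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local flag definitions for the sketch only; the kernel defines its own. */
#define IFF_TUN        0x0001
#define IFF_TAP        0x0002
#define IFF_NAPI       0x0010
#define IFF_NAPI_FRAGS 0x0020
#define TUN_TYPE_MASK  0x000f

/*
 * Validate the requested flags once, up front, so that both the
 * "attach to existing device" and the "create new device" paths
 * reject a bad request instead of silently dropping IFF_NAPI_FRAGS.
 */
int tun_check_flags(unsigned int flags, bool has_net_admin)
{
	if (!(flags & IFF_NAPI_FRAGS))
		return 0; /* nothing special requested */

	if (!has_net_admin)
		return -EPERM;

	/* Frags mode is only meaningful for a TAP device with NAPI enabled. */
	if (!(flags & IFF_NAPI) || (flags & TUN_TYPE_MASK) != IFF_TAP)
		return -EINVAL;

	return 0;
}
```

Returning an error rather than masking the flag lets the caller see that the requested mode was not applied.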
Re: [PATCH net-next 4/4] net: dsa: add port enable and disable helpers
On 09/22/2017 09:17 AM, Vivien Didelot wrote:
> Provide dsa_port_enable and dsa_port_disable helpers to respectively
> enable and disable a switch port. This makes the dsa_port_set_state_now
> helper static.
>
> Signed-off-by: Vivien Didelot

Reviewed-by: Florian Fainelli
--
Florian
Re: [PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
On 09/22/2017 09:17 AM, Vivien Didelot wrote: > The .port_enable and .port_disable functions are meant to deal with the > switch ports only, and no driver is using the phy argument anyway. > Remove it. I don't think this makes sense, there are perfectly legit reasons why a switch driver may have something to do with the PHY device attached to its per-port network interface, we should definitively keep that around, unless you think we should be accessing the PHY within the switch drivers by doing: struct phy_device *phydev = ds->ports[port].netdev->phydev? > > Signed-off-by: Vivien Didelot> --- > drivers/net/dsa/b53/b53_common.c | 6 +++--- > drivers/net/dsa/b53/b53_priv.h | 4 ++-- > drivers/net/dsa/bcm_sf2.c | 16 +++- > drivers/net/dsa/lan9303-core.c | 6 ++ > drivers/net/dsa/microchip/ksz_common.c | 6 ++ > drivers/net/dsa/mt7530.c | 8 +++- > drivers/net/dsa/mv88e6xxx/chip.c | 6 ++ > drivers/net/dsa/qca8k.c| 6 ++ > include/net/dsa.h | 6 ++ > net/dsa/slave.c| 4 ++-- > 10 files changed, 27 insertions(+), 41 deletions(-) > > diff --git a/drivers/net/dsa/b53/b53_common.c > b/drivers/net/dsa/b53/b53_common.c > index d4ce092def83..e46eb29d29f0 100644 > --- a/drivers/net/dsa/b53/b53_common.c > +++ b/drivers/net/dsa/b53/b53_common.c > @@ -502,7 +502,7 @@ void b53_imp_vlan_setup(struct dsa_switch *ds, int > cpu_port) > } > EXPORT_SYMBOL(b53_imp_vlan_setup); > > -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy) > +int b53_enable_port(struct dsa_switch *ds, int port) > { > struct b53_device *dev = ds->priv; > unsigned int cpu_port = dev->cpu_port; > @@ -531,7 +531,7 @@ int b53_enable_port(struct dsa_switch *ds, int port, > struct phy_device *phy) > } > EXPORT_SYMBOL(b53_enable_port); > > -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device > *phy) > +void b53_disable_port(struct dsa_switch *ds, int port) > { > struct b53_device *dev = ds->priv; > u8 reg; > @@ -874,7 +874,7 @@ static int b53_setup(struct dsa_switch *ds) > if 
(dsa_is_cpu_port(ds, port)) > b53_enable_cpu_port(dev, port); > else if (!(BIT(port) & ds->enabled_port_mask)) > - b53_disable_port(ds, port, NULL); > + b53_disable_port(ds, port); > } > > return ret; > diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h > index 603c66d240d8..688d02ee6155 100644 > --- a/drivers/net/dsa/b53/b53_priv.h > +++ b/drivers/net/dsa/b53/b53_priv.h > @@ -311,8 +311,8 @@ int b53_mirror_add(struct dsa_switch *ds, int port, > struct dsa_mall_mirror_tc_entry *mirror, bool ingress); > void b53_mirror_del(struct dsa_switch *ds, int port, > struct dsa_mall_mirror_tc_entry *mirror); > -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy); > -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device > *phy); > +int b53_enable_port(struct dsa_switch *ds, int port); > +void b53_disable_port(struct dsa_switch *ds, int port); > void b53_brcm_hdr_setup(struct dsa_switch *ds, int port); > void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable); > int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy); > diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c > index ad96b9725a2c..77e0c43f973b 100644 > --- a/drivers/net/dsa/bcm_sf2.c > +++ b/drivers/net/dsa/bcm_sf2.c > @@ -159,8 +159,7 @@ static inline void bcm_sf2_port_intr_disable(struct > bcm_sf2_priv *priv, > intrl2_1_writel(priv, P_IRQ_MASK(off), INTRL2_CPU_CLEAR); > } > > -static int bcm_sf2_port_setup(struct dsa_switch *ds, int port, > - struct phy_device *phy) > +static int bcm_sf2_port_setup(struct dsa_switch *ds, int port) > { > struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); > unsigned int i; > @@ -191,11 +190,10 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, > int port, > if (port == priv->moca_port) > bcm_sf2_port_intr_enable(priv, port); > > - return b53_enable_port(ds, port, phy); > + return b53_enable_port(ds, port); > } > > -static void bcm_sf2_port_disable(struct dsa_switch 
*ds, int port, > - struct phy_device *phy) > +static void bcm_sf2_port_disable(struct dsa_switch *ds, int port) > { > struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); > u32 off, reg; > @@ -214,7 +212,7 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, > int port, > else > off = CORE_G_PCTL_PORT(port); > > - b53_disable_port(ds, port, phy); > + b53_disable_port(ds, port); > > /* Power down the port memory */ > reg =
Re: [PATCH net-next 3/4] net: dsa: make slave close symmetrical to open
On 09/22/2017 09:17 AM, Vivien Didelot wrote:
> The DSA slave open function configures the unicast MAC addresses on the
> master device, enable the switch port, change its STP state, then start
> the PHY device.
>
> Make the close function symmetric, by first stopping the PHY device,
> then changing the STP state, disabling the switch port and restore the
> master device.
>
> Signed-off-by: Vivien Didelot

Reviewed-by: Florian Fainelli

--
Florian
[PATCH][V3] e1000: avoid null pointer dereference on invalid stat type
From: Colin Ian King

Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously. Fix this by skipping over the read of p on an invalid
stat type.

Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")

Signed-off-by: Colin Ian King
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index ec8aa4562cc9..3b3983a1ffbb 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1824,11 +1824,12 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	int i;
-	char *p = NULL;
 	const struct e1000_stats *stat = e1000_gstrings_stats;
 
 	e1000_update_stats(adapter);
-	for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) {
+	for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++, stat++) {
+		char *p;
+
 		switch (stat->type) {
 		case NETDEV_STATS:
 			p = (char *)netdev + stat->stat_offset;
@@ -1839,15 +1840,13 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
 		default:
 			WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
 				  stat->type, i);
-			break;
+			continue;
 		}
 
 		if (stat->sizeof_stat == sizeof(u64))
 			data[i] = *(u64 *)p;
 		else
 			data[i] = *(u32 *)p;
-
-		stat++;
 	}
 /* BUG_ON(i != E1000_STATS_LEN); */
 }
-- 
2.14.1
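For readers who want to see the pattern outside the driver, here is a minimal userspace sketch of the same idea: advance the descriptor in the for-increment and `continue` on an unknown type so the pointer is never read. The struct names, the enum and `collect_stats()` are made up for illustration and are not the e1000 definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's stat descriptors. */
enum stat_type { NETDEV_STATS, E1000_STATS, BOGUS_STATS };

struct stat_desc {
	enum stat_type type;
	size_t size;	/* size of the counter in bytes: 4 or 8 */
	size_t offset;	/* offset of the counter in its backing struct */
};

struct netdev_counters { uint64_t rx_packets; };
struct adapter_counters { uint32_t tx_timeouts; };

/*
 * Copy each described counter into data[]. An entry with an unknown
 * type is skipped, so its data[] slot is left untouched and p is
 * never dereferenced without having been set.
 */
static void collect_stats(const struct stat_desc *stat, size_t n,
			  const struct netdev_counters *nd,
			  const struct adapter_counters *ad,
			  uint64_t *data)
{
	size_t i;

	/* stat advances in the for-increment, so 'continue' cannot skip it */
	for (i = 0; i < n; i++, stat++) {
		const char *p;

		switch (stat->type) {
		case NETDEV_STATS:
			p = (const char *)nd + stat->offset;
			break;
		case E1000_STATS:
			p = (const char *)ad + stat->offset;
			break;
		default:
			continue;	/* invalid type: do not read p */
		}

		if (stat->size == sizeof(uint64_t)) {
			uint64_t v;

			memcpy(&v, p, sizeof(v));
			data[i] = v;
		} else {
			uint32_t v;

			memcpy(&v, p, sizeof(v));
			data[i] = v;
		}
	}
}
```

Moving the increment into the loop header is what makes `continue` safe here; with the increment at the bottom of the body, skipping an entry would also skip the advance.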
Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver
>  #ifdef CONFIG_TUN_VNET_CROSS_LE
>  static inline bool tun_legacy_is_little_endian(struct tun_struct *tun)
>  {
> @@ -541,6 +604,11 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>
>  	tun = rtnl_dereference(tfile->tun);
>
> +	if (tun && clean) {
> +		tun_napi_disable(tun, tfile);

are we missing synchronize_net() separating disable and del calls?

> +		tun_napi_del(tun, tfile);
> +	}
> +
>  	if (tun && !tfile->detached) {
>  		u16 index = tfile->queue_index;
>  		BUG_ON(index >= tun->numqueues);
Re: [PATCH iproute2 v2] man: fix documentation for range of route table ID
On Fri, 22 Sep 2017 13:28:54 +0200 Thomas Haller wrote:

> Signed-off-by: Thomas Haller
> ---
> Changes in v2:
>  - "0" is not a valid table ID.
>
>  man/man8/ip-route.8.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
> index 803de3b9..705ceb20 100644
> --- a/man/man8/ip-route.8.in
> +++ b/man/man8/ip-route.8.in
> @@ -322,7 +322,7 @@ normal routing tables.
>  .P
>  .B Route tables:
>  Linux-2.x can pack routes into several routing tables identified
> -by a number in the range from 1 to 2^31 or by name from the file
> +by a number in the range from 1 to 2^32-1 or by name from the file
>  .B @SYSCONFDIR@/rt_tables
>  By default all normal routes are inserted into the
>  .B main

Applied
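As an illustration of the documented range (1 to 2^32-1, with 0 rejected), a numeric table ID could be validated roughly as below. This is only a sketch: `parse_table_id()` is a hypothetical helper, not the iproute2 parser, and accepting base-0 input (so that hex such as 0x10 works) is an assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a numeric routing table ID.
 * Returns 0 and stores the ID on success, -1 on any invalid input.
 * Valid IDs are 1 .. 2^32-1; "0" is not a valid table ID.
 */
static int parse_table_id(const char *s, uint32_t *id)
{
	unsigned long long v;
	char *end;

	/* Require a leading digit so signs and whitespace are rejected. */
	if (!s || !isdigit((unsigned char)*s))
		return -1;

	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno || *end != '\0')
		return -1;

	if (v < 1 || v > UINT32_MAX)
		return -1;

	*id = (uint32_t)v;
	return 0;
}
```

Table names from rt_tables would need a separate lookup; this sketch handles only the numeric form.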
[PATCH] Add a driver for Renesas uPD60620 and uPD60620A PHYs
Signed-off-by: Bernd Edlinger--- drivers/net/phy/Kconfig| 5 + drivers/net/phy/Makefile | 1 + drivers/net/phy/uPD60620.c | 226 + 3 files changed, 232 insertions(+) create mode 100644 drivers/net/phy/uPD60620.c diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index a9d16a3..25089f0 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -287,6 +287,11 @@ config DP83867_PHY ---help--- Currently supports the DP83867 PHY. +config RENESAS_PHY + tristate "Driver for Renesas PHYs" + ---help--- + Supports the uPD60620 and uPD60620A PHYs. + config FIXED_PHY tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs" depends on PHYLIB diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index 416df92..1404ad3 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -72,6 +72,7 @@ obj-$(CONFIG_MICROSEMI_PHY) += mscc.o obj-$(CONFIG_NATIONAL_PHY)+= national.o obj-$(CONFIG_QSEMI_PHY) += qsemi.o obj-$(CONFIG_REALTEK_PHY) += realtek.o +obj-$(CONFIG_RENESAS_PHY) += uPD60620.o obj-$(CONFIG_ROCKCHIP_PHY)+= rockchip.o obj-$(CONFIG_SMSC_PHY)+= smsc.o obj-$(CONFIG_STE10XP) += ste10Xp.o diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c new file mode 100644 index 000..b3d900c --- /dev/null +++ b/drivers/net/phy/uPD60620.c @@ -0,0 +1,226 @@ +/* + * Driver for the Renesas PHY uPD60620. + * + * Copyright (C) 2015 Softing Industrial Automation GmbH + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + */ + +#include +#include +#include + +#define UPD60620_PHY_ID0xb8242824 + +/* Extended Registers and values */ +/* PHY Special Control/Status*/ +#define PHY_PHYSCR 0x1F /* PHY.31 */ +#define PHY_PHYSCR_10MB0x0004/* PHY speed = 10mb */ +#define PHY_PHYSCR_100MB 0x0008/* PHY speed = 100mb */ +#define PHY_PHYSCR_DUPLEX 0x0010/* PHY Duplex */ +#define PHY_PHYSCR_RSVD5 0x0020/* Reserved Bit 5 */ +#define PHY_PHYSCR_MIIMOD 0x0040/* Enable 4B5B MII mode */ +#define PHY_PHYSCR_RSVD7 0x0080/* Reserved Bit 7 */ +#define PHY_PHYSCR_RSVD8 0x0100/* Reserved Bit 8 */ +#define PHY_PHYSCR_RSVD9 0x0200/* Reserved Bit 9 */ +#define PHY_PHYSCR_RSVD10 0x0400/* Reserved Bit 10 */ +#define PHY_PHYSCR_RSVD11 0x0800/* Reserved Bit 11 */ +#define PHY_PHYSCR_ANDONE 0x1000/* Auto negotiation done */ +#define PHY_PHYSCR_RSVD13 0x2000/* Reserved Bit 13 */ +#define PHY_PHYSCR_RSVD14 0x4000/* Reserved Bit 14 */ +#define PHY_PHYSCR_RSVD15 0x8000/* Reserved Bit 15 */ + +/* PHY Global Config Mapping */ +#define PHY_GLOBAL_CONFIG 0x07 +/* PHY GPIO Config Register 1 */ +#define PHY_GPIO_CONFIG1 0x01 /* PHY 7.1 */ +#define PHY_GPIO4_INT0 0x000d /* GPIO4 configuration */ +#define PHY_GPIO5_INT1 0x00d0 /* GPIO5 configuration */ + +/* PHY Interrupt Control Register */ +#define PHY_ICR0x1e /* PHY.30 */ +#define PHY_ICR_RSVD0 0x0001/* Reserved bit 0 */ +#define PHY_ICR_ANCPRRN0x0002/* Auto negotiation paged received */ +#define PHY_ICR_PDFEN 0x0004/* Parallel detection fault */ +#define PHY_ICR_ANCLPAEN 0x0008/* Auto negotiation last page ack */ +#define PHY_ICR_LNKINTEN 0x0010/* Link down */ +#define PHY_ICR_REMFD 0x0020/* Remote fault detected */ +#define PHY_ICR_ANCINTEN 0x0040/* Auto negotiation complete */ +#define PHY_ICR_EOEN 0x0080/* Energy on generated */ +#define PHY_ICR_RSVD8 0x0100/* Reserved bit 8 */ +#define PHY_ICR_FEQTRGEN 0x0200/* FEQ Trigger */ +#define PHY_ICR_BERTRGEN 0x0400/* BER Counter Trigger */ +#define PHY_ICR_MLINTEN0x0800/* Maxlvl */ +#define PHY_ICR_CLPINTEN 0x1000/* 
Clipping */ +#define PHY_ICR_RSVD13 0x2000/* Reserved bit 13 */ +#define PHY_ICR_RSVD14 0x4000/* Reserved bit 14 */ +#define PHY_ICR_RSVD15 0x8000/* Reserved bit 15 */ + +/* PHY Interrupt Status Register */ +#define PHY_ISR0x1d /* PHY.29 */ +#define PHY_ISR_DUPINT 0x/* Placeholder for Duplex/Speed intr */ +#define PHY_ISR_RSVD0 0x0001/* Reserved bit 0 */ +#define PHY_ISR_ANCPR 0x0002/* Auto negotiation paged received */ +#define PHY_ISR_PDF0x0004/* Parallel detection fault */ +#define PHY_ISR_ANCLPA 0x0008/* Auto negotiation last page ack */ +#define PHY_ISR_LNKINT 0x0010/* Link down */ +#define PHY_ISR_REMFD 0x0020/* Remote fault detected */ +#define PHY_ISR_ANCINT 0x0040/* Auto negotiation complete */ +#define PHY_ISR_EO
Re: [PATCH iproute2 master 0/2] BPF/XDP json follow-up
On Thu, 21 Sep 2017 10:42:27 +0200 Daniel Borkmann wrote:

> After merging net-next branch into master, Stephen asked to
> fix up json dump for XDP as there were some merge conflicts,
> so here it is.
>
> Thanks!
>
> Daniel Borkmann (2):
>   json: move json printer to common library
>   bpf: properly output json for xdp
>
>  include/json_print.h |  71
>  ip/Makefile          |   2 +-
>  ip/ip_common.h       |  65 ++
>  ip/ip_print.c        | 233 ---
>  ip/iplink_xdp.c      |  74 +---
>  lib/Makefile         |   2 +-
>  lib/bpf.c            |  19 +++--
>  lib/json_print.c     | 231 ++
>  8 files changed, 369 insertions(+), 328 deletions(-)
>  create mode 100644 include/json_print.h
>  delete mode 100644 ip/ip_print.c
>  create mode 100644 lib/json_print.c
>

Applied.
usb/wireless/rsi_91x: use-after-free write in __run_timers
Hi! I've got the following report while fuzzing the kernel with syzkaller. On commit 6e80ecdddf4ea6f3cd84e83720f3d852e6624a68 (Sep 21). == BUG: KASAN: use-after-free in __run_timers+0xc0e/0xd40 Write of size 8 at addr 880069f701b8 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc1-42311-g6e80ecdddf4e #234 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 print_address_description+0x78/0x280 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 kasan_report+0x22f/0x340 mm/kasan/report.c:409 __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435 collect_expired_timers ./include/linux/list.h:729 __run_timers+0xc0e/0xd40 kernel/time/timer.c:1616 run_timer_softirq+0x83/0x140 kernel/time/timer.c:1646 __do_softirq+0x305/0xc2d kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x171/0x1a0 kernel/softirq.c:405 exiting_irq ./arch/x86/include/asm/apic.h:638 smp_apic_timer_interrupt+0x2b9/0x8d0 arch/x86/kernel/apic/apic.c:1048 apic_timer_interrupt+0x9d/0xb0 RIP: 0010:native_safe_halt+0x6/0x10 ./arch/x86/include/asm/irqflags.h:53 RSP: 0018:86607958 EFLAGS: 0282 ORIG_RAX: ff10 RAX: dc20 RBX: 10cc0f2f RCX: RDX: RSI: 0001 RDI: 8662ea64 RBP: 86607958 R08: 813d3501 R09: R10: R11: R12: 10cc0f3b R13: 86607a98 R14: 86fc1628 R15: arch_safe_halt ./arch/x86/include/asm/paravirt.h:93 default_idle+0x127/0x690 arch/x86/kernel/process.c:341 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:332 default_idle_call+0x3b/0x60 kernel/sched/idle.c:98 cpuidle_idle_call kernel/sched/idle.c:156 do_idle+0x35c/0x440 kernel/sched/idle.c:246 cpu_startup_entry+0x1d/0x20 kernel/sched/idle.c:351 rest_init+0xf3/0x100 init/main.c:435 start_kernel+0x782/0x7b0 init/main.c:710 x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:377 x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:358 secondary_startup_64+0xa5/0xa5 
arch/x86/kernel/head_64.S:235 Allocated by task 1845: save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772 kmalloc ./include/linux/slab.h:493 kzalloc ./include/linux/slab.h:666 rsi_91x_init+0x98/0x510 drivers/net/wireless/rsi/rsi_91x_main.c:203 rsi_probe+0xb6/0x13b0 drivers/net/wireless/rsi/rsi_91x_usb.c:665 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932 generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174 usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457 hub_port_connect drivers/usb/core/hub.c:4903 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119 worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x3a1/0x470 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Freed by task 1845: 
save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524 slab_free_hook mm/slub.c:1390 slab_free_freelist_hook mm/slub.c:1412 slab_free mm/slub.c:2988 kfree+0xf6/0x2f0 mm/slub.c:3919 rsi_91x_deinit+0x1e8/0x250 drivers/net/wireless/rsi/rsi_91x_main.c:268 rsi_probe+0xed1/0x13b0 drivers/net/wireless/rsi/rsi_91x_usb.c:709 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core
On 09/22/2017 09:32 AM, Andrew Lunn wrote:
> On Fri, Sep 22, 2017 at 12:17:50PM -0400, Vivien Didelot wrote:
>> bcm_sf2 is currently the only driver using the phy argument passed to
>> .port_enable. It resets the state machine if the phy has been hard
>> reset. This check is generic and can be moved to DSA core.
>>
>>  	dsa_port_set_state_now(p->dp, stp_state);
>>
>> -	if (p->phy)
>> -		phy_start(p->phy);
>> +	if (phy) {
>> +		/* If phy_stop() has been called before, phy will be in
>> +		 * halted state, and phy_start() will call resume.
>> +		 *
>> +		 * The resume path does not configure back autoneg
>> +		 * settings, and since the internal phy may have been
>> +		 * hard reset, we need to reset the state machine also.
>> +		 */
>> +		phy->state = PHY_READY;
>> +		phy_init_hw(phy);
>> +		phy_start(phy);
>> +	}
>
> Hi Vivien
>
> If this is generic, why is it needed at all here? Shouldn't this
> actually be in phylib?

This does not belong in the core logic within net/dsa/slave.c. The
reason why this is necessary here is because we are doing a HW-based
reset of the PHY; as the comment explains, this is specific to how the
HW works.

There may be a cleaner solution to this problem, but in any case, I
don't think other drivers should inherit that logic.
--
Florian
Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core
On 09/22/2017 09:17 AM, Vivien Didelot wrote:
> bcm_sf2 is currently the only driver using the phy argument passed to
> .port_enable. It resets the state machine if the phy has been hard
> reset. This check is generic and can be moved to DSA core.

This is completely specific to bcm_sf2 because it does call
bcm_sf2_gphy_enable_set() which performs a HW reset of the PHY, you
can't move this to the generic portion of net/dsa/slave.c. NACK.

>
> Signed-off-by: Vivien Didelot
> ---
>  drivers/net/dsa/bcm_sf2.c | 16 +---
>  net/dsa/slave.c           | 15 +--
>  2 files changed, 14 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
> index 898d5642b516..ad96b9725a2c 100644
> --- a/drivers/net/dsa/bcm_sf2.c
> +++ b/drivers/net/dsa/bcm_sf2.c
> @@ -184,22 +184,8 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
>  	core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
>
>  	/* Re-enable the GPHY and re-apply workarounds */
> -	if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) {
> +	if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1)
>  		bcm_sf2_gphy_enable_set(ds, true);
> -		if (phy) {
> -			/* if phy_stop() has been called before, phy
> -			 * will be in halted state, and phy_start()
> -			 * will call resume.
> -			 *
> -			 * the resume path does not configure back
> -			 * autoneg settings, and since we hard reset
> -			 * the phy manually here, we need to reset the
> -			 * state machine also.
> -			 */
> -			phy->state = PHY_READY;
> -			phy_init_hw(phy);
> -		}
> -	}
>
>  	/* Enable MoCA port interrupts to get notified */
>  	if (port == priv->moca_port)
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 02ace7d462c4..606812160fd5 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -72,6 +72,7 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
>  static int dsa_slave_open(struct net_device *dev)
>  {
>  	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct phy_device *phy = p->phy;
>  	struct dsa_port *dp = p->dp;
>  	struct dsa_switch *ds = dp->ds;
>  	struct net_device *master = dsa_master_netdev(p);
> @@ -106,8 +107,18 @@ static int dsa_slave_open(struct net_device *dev)
>
>  	dsa_port_set_state_now(p->dp, stp_state);
>
> -	if (p->phy)
> -		phy_start(p->phy);
> +	if (phy) {
> +		/* If phy_stop() has been called before, phy will be in
> +		 * halted state, and phy_start() will call resume.
> +		 *
> +		 * The resume path does not configure back autoneg
> +		 * settings, and since the internal phy may have been
> +		 * hard reset, we need to reset the state machine also.
> +		 */
> +		phy->state = PHY_READY;
> +		phy_init_hw(phy);
> +		phy_start(phy);
> +	}
>
>  	return 0;
>
>

--
Florian
Re: [PATCH] brcm80211: make const array ucode_ofdm_rates static, reduces object code size
Please use 'brcmsmac:' as prefix instead of 'brcm80211:'.

On 22-09-17 16:03, Colin King wrote:

From: Colin Ian King

Don't populate const array ucode_ofdm_rates on the stack, instead make it
static. Makes the object code smaller by 100 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
  39482	    564	      0	  40046	   9c6e	phy_cmn.o

After
   text	   data	    bss	    dec	    hex	filename
  39326	    620	      0	  39946	   9c0a	phy_cmn.o

(gcc 6.3.0, x86-64)

Acked-by: Arend van Spriel

Signed-off-by: Colin Ian King
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
index 1c4e9dd57960..3a13d176b221 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
@@ -1916,7 +1916,7 @@ void wlc_phy_txpower_update_shm(struct brcms_phy *pi)
 				     pi->hwpwr_txcur);
 
 		for (j = TXP_FIRST_OFDM; j <= TXP_LAST_OFDM; j++) {
-			const u8 ucode_ofdm_rates[] = {
+			static const u8 ucode_ofdm_rates[] = {
 				0x0c, 0x12, 0x18, 0x24, 0x30, 0x48, 0x60, 0x6c
 			};
 			offset = wlapi_bmac_rate_shm_offset(
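To make the trade-off concrete, here is a small standalone sketch of the pattern the patch applies. A block-scope `const` array with an initializer is an automatic object, so the compiler may emit code to build it on the stack each time the block is entered; adding `static` gives it static storage, so it is initialized once and typically placed in read-only data. `ofdm_rate_code()` is a hypothetical wrapper used only for this example; the table values are the ones from the diff.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Look up a rate code by index.
 * Returns 0 when idx is out of range (an arbitrary choice for this sketch).
 */
static uint8_t ofdm_rate_code(size_t idx)
{
	/* 'static' keeps the table out of the function's stack frame. */
	static const uint8_t rates[] = {
		0x0c, 0x12, 0x18, 0x24, 0x30, 0x48, 0x60, 0x6c
	};

	if (idx >= sizeof(rates) / sizeof(rates[0]))
		return 0;

	return rates[idx];
}
```

Whether the text segment actually shrinks depends on the compiler and optimization level; the size numbers quoted in the patch are for gcc 6.3.0 on x86-64.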
Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
On Fri, Sep 22, 2017 at 7:06 AM, Willem de Bruijn wrote:
>> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>  	if (tfile->detached)
>>  		return -EINVAL;
>>
>> +	if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
>> +		return -EPERM;
>> +
>
> This should perhaps be moved into the !dev branch, directly below the
> ns_capable check.
>

Hmm, does that mean fail only on creation but allow to attach if
exists? That would be wrong, isn't it? Correct me if I'm wrong but we
want to prevent both these scenarios if user does not have sufficient
privileges (i.e. NET_ADMIN in init-ns).

>>  	dev = __dev_get_by_name(net, ifr->ifr_name);
>>  	if (dev) {
>>  		if (ifr->ifr_flags & IFF_TUN_EXCL)
>> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
>>  		(ifr->ifr_flags & TUN_FEATURES);
>>
>> +	if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != IFF_TAP)
>> +		tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
>> +
>
> Similarly, this check only need to be performed in that branch.
> Instead of reverting to non-frags mode, a tun_set_iff with the wrong
> set of flags should probably fail hard.

Yes, agree, wrong set of flags should fail hard and probably be done
before attach or open, no?
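The "fail hard" suggestion could look roughly like the sketch below: validate the requested flag combination up front and return an error, instead of silently clearing the frags bit afterwards. The DEMO_* constants and `demo_check_flags()` are invented for this example and make no claim about the real IFF_* values or about where such a check would sit in tun_set_iff().

```c
#include <errno.h>

/* Illustrative flag bits only; not the uapi IFF_* definitions. */
#define DEMO_TYPE_MASK	0x000f
#define DEMO_TAP	0x0002
#define DEMO_NAPI	0x0010
#define DEMO_NAPI_FRAGS	0x0020

/*
 * Reject invalid requests before any attach/open work is done:
 *  - frags mode requires the privileged caller (-EPERM otherwise)
 *  - frags mode requires NAPI and a TAP-type device (-EINVAL otherwise)
 * Returns 0 when the flags are acceptable.
 */
static int demo_check_flags(unsigned int flags, int is_admin)
{
	if (!(flags & DEMO_NAPI_FRAGS))
		return 0;

	if (!is_admin)
		return -EPERM;

	if (!(flags & DEMO_NAPI) || (flags & DEMO_TYPE_MASK) != DEMO_TAP)
		return -EINVAL;

	return 0;
}
```

Running the check before the existing-device lookup would cover both the create and the attach paths, which is the concern raised in the reply above.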
Re: [Intel-wired-lan] [PATCH] i40e: make const array patterns static, reduces object code size
On Fri, 22 Sep 2017 15:11:38 +0100 Colin King wrote:

> From: Colin Ian King
>
> Don't populate const array patterns on the stack, instead make it
> static. Makes the object code smaller by over 60 bytes:
>
> Before:
>    text	   data	    bss	    dec	    hex	filename
>    1953	    496	      0	   2449	    991	i40e_diag.o
>
> After:
>    text	   data	    bss	    dec	    hex	filename
>    1798	    584	      0	   2382	    94e	i40e_diag.o
>
> (gcc 6.3.0, x86-64)
>
> Signed-off-by: Colin Ian King

Looks good, thanks Colin!

Acked-by: Jesse Brandeburg
Re: [PATCH net-next 4/4] net: dsa: add port enable and disable helpers
On Fri, Sep 22, 2017 at 12:17:53PM -0400, Vivien Didelot wrote:
> Provide dsa_port_enable and dsa_port_disable helpers to respectively
> enable and disable a switch port. This makes the dsa_port_set_state_now
> helper static.
>
> Signed-off-by: Vivien Didelot

Reviewed-by: Andrew Lunn

    Andrew
Re: [PATCH net-next 3/4] net: dsa: make slave close symmetrical to open
On Fri, Sep 22, 2017 at 12:17:52PM -0400, Vivien Didelot wrote:
> The DSA slave open function configures the unicast MAC addresses on the
> master device, enable the switch port, change its STP state, then start
> the PHY device.
>
> Make the close function symmetric, by first stopping the PHY device,
> then changing the STP state, disabling the switch port and restore the
> master device.
>
> Signed-off-by: Vivien Didelot

Reviewed-by: Andrew Lunn

    Andrew
Re: [PATCH net-next 1/4] net: dsa: move up phy enabling in core
On Fri, Sep 22, 2017 at 12:17:50PM -0400, Vivien Didelot wrote:
> bcm_sf2 is currently the only driver using the phy argument passed to
> .port_enable. It resets the state machine if the phy has been hard
> reset. This check is generic and can be moved to DSA core.
>
>  	dsa_port_set_state_now(p->dp, stp_state);
>
> -	if (p->phy)
> -		phy_start(p->phy);
> +	if (phy) {
> +		/* If phy_stop() has been called before, phy will be in
> +		 * halted state, and phy_start() will call resume.
> +		 *
> +		 * The resume path does not configure back autoneg
> +		 * settings, and since the internal phy may have been
> +		 * hard reset, we need to reset the state machine also.
> +		 */
> +		phy->state = PHY_READY;
> +		phy_init_hw(phy);
> +		phy_start(phy);
> +	}

Hi Vivien

If this is generic, why is it needed at all here? Shouldn't this
actually be in phylib?

Florian ?

    Andrew
Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
On 22/09/17 16:16, Alexei Starovoitov wrote:
> looks like we're converging on
> "be16/be32/be64/le16/le32/le64 #register" for BPF_END.
> I guess it can live with that. I would prefer more C like syntax
> to match the rest, but llvm parsing point is a strong one.

Yep, agreed. I'll post a v2 once we've settled BPF_NEG.

> For BPG_NEG I prefer to do it in C syntax like interpreter does:
> ALU_NEG:
>         DST = (u32) -DST;
> ALU64_NEG:
>         DST = -DST;
> Yonghong, does it mean that asmparser will equally suffer?

Correction to my earlier statements: verifier will currently disassemble
neg as:

    (87) r0 neg 0
    (84) (u32) r0 neg (u32) 0

because it pretends 'neg' is a compound-assignment operator like +=.
The analogy with be16 and friends would be to use

    neg64 r0
    neg32 r0

whereas the analogy with everything else would be

    r0 = -r0
    r0 = (u32) -r0

as Alexei says.
I'm happy to go with Alexei's version if it doesn't cause problems for llvm.
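To make the two candidate syntaxes easier to compare, here is a small userspace sketch that formats BPF_END and BPF_NEG instructions the way the thread proposes ("be16 r1" style for BPF_END, C-like "r0 = -r0" for BPF_NEG). It is not the verifier's print code: `format_insn()` is a hypothetical helper, the opcode constants are redefined locally with the values of the BPF opcode encoding, and the exact output strings are an assumption based on the discussion above.

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of the opcode encoding fields used below. */
#define BPF_CLASS(code)	((code) & 0x07)
#define BPF_OP(code)	((code) & 0xf0)
#define BPF_SRC(code)	((code) & 0x08)
#define BPF_ALU		0x04
#define BPF_ALU64	0x07
#define BPF_NEG		0x80
#define BPF_END		0xd0
#define BPF_TO_BE	0x08	/* source bit set: convert to big endian */

/*
 * Format one ALU instruction into buf.
 * Handles only BPF_END and BPF_NEG; returns -1 for anything else.
 * For BPF_END, imm carries the width in bits (16, 32 or 64).
 */
static int format_insn(uint8_t code, unsigned int dst, int32_t imm,
		       char *buf, size_t len)
{
	uint8_t cls = BPF_CLASS(code);

	if (cls != BPF_ALU && cls != BPF_ALU64)
		return -1;

	switch (BPF_OP(code)) {
	case BPF_END:
		snprintf(buf, len, "%s%d r%u",
			 BPF_SRC(code) == BPF_TO_BE ? "be" : "le",
			 (int)imm, dst);
		return 0;
	case BPF_NEG:
		if (cls == BPF_ALU64)
			snprintf(buf, len, "r%u = -r%u", dst, dst);
		else
			snprintf(buf, len, "r%u = (u32) -r%u", dst, dst);
		return 0;
	default:
		return -1;
	}
}
```

The opcodes 0x87 and 0x84 from the message above decode as ALU64|NEG and ALU|NEG respectively, so they take the two BPF_NEG branches.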
[PATCH net-next 1/4] net: dsa: move up phy enabling in core
bcm_sf2 is currently the only driver using the phy argument passed to
.port_enable. It resets the state machine if the phy has been hard
reset. This check is generic and can be moved to DSA core.

Signed-off-by: Vivien Didelot
---
 drivers/net/dsa/bcm_sf2.c | 16 +---
 net/dsa/slave.c           | 15 +--
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 898d5642b516..ad96b9725a2c 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -184,22 +184,8 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
 	core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
 
 	/* Re-enable the GPHY and re-apply workarounds */
-	if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) {
+	if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1)
 		bcm_sf2_gphy_enable_set(ds, true);
-		if (phy) {
-			/* if phy_stop() has been called before, phy
-			 * will be in halted state, and phy_start()
-			 * will call resume.
-			 *
-			 * the resume path does not configure back
-			 * autoneg settings, and since we hard reset
-			 * the phy manually here, we need to reset the
-			 * state machine also.
-			 */
-			phy->state = PHY_READY;
-			phy_init_hw(phy);
-		}
-	}
 
 	/* Enable MoCA port interrupts to get notified */
 	if (port == priv->moca_port)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 02ace7d462c4..606812160fd5 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -72,6 +72,7 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
 static int dsa_slave_open(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct phy_device *phy = p->phy;
 	struct dsa_port *dp = p->dp;
 	struct dsa_switch *ds = dp->ds;
 	struct net_device *master = dsa_master_netdev(p);
@@ -106,8 +107,18 @@ static int dsa_slave_open(struct net_device *dev)
 
 	dsa_port_set_state_now(p->dp, stp_state);
 
-	if (p->phy)
-		phy_start(p->phy);
+	if (phy) {
+		/* If phy_stop() has been called before, phy will be in
+		 * halted state, and phy_start() will call resume.
+		 *
+		 * The resume path does not configure back autoneg
+		 * settings, and since the internal phy may have been
+		 * hard reset, we need to reset the state machine also.
+		 */
+		phy->state = PHY_READY;
+		phy_init_hw(phy);
+		phy_start(phy);
+	}
 
 	return 0;
 
-- 
2.14.1
[PATCH net-next 2/4] net: dsa: remove phy arg from port enable/disable
The .port_enable and .port_disable functions are meant to deal with the switch ports only, and no driver is using the phy argument anyway. Remove it. Signed-off-by: Vivien Didelot--- drivers/net/dsa/b53/b53_common.c | 6 +++--- drivers/net/dsa/b53/b53_priv.h | 4 ++-- drivers/net/dsa/bcm_sf2.c | 16 +++- drivers/net/dsa/lan9303-core.c | 6 ++ drivers/net/dsa/microchip/ksz_common.c | 6 ++ drivers/net/dsa/mt7530.c | 8 +++- drivers/net/dsa/mv88e6xxx/chip.c | 6 ++ drivers/net/dsa/qca8k.c| 6 ++ include/net/dsa.h | 6 ++ net/dsa/slave.c| 4 ++-- 10 files changed, 27 insertions(+), 41 deletions(-) diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index d4ce092def83..e46eb29d29f0 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -502,7 +502,7 @@ void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port) } EXPORT_SYMBOL(b53_imp_vlan_setup); -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy) +int b53_enable_port(struct dsa_switch *ds, int port) { struct b53_device *dev = ds->priv; unsigned int cpu_port = dev->cpu_port; @@ -531,7 +531,7 @@ int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy) } EXPORT_SYMBOL(b53_enable_port); -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy) +void b53_disable_port(struct dsa_switch *ds, int port) { struct b53_device *dev = ds->priv; u8 reg; @@ -874,7 +874,7 @@ static int b53_setup(struct dsa_switch *ds) if (dsa_is_cpu_port(ds, port)) b53_enable_cpu_port(dev, port); else if (!(BIT(port) & ds->enabled_port_mask)) - b53_disable_port(ds, port, NULL); + b53_disable_port(ds, port); } return ret; diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h index 603c66d240d8..688d02ee6155 100644 --- a/drivers/net/dsa/b53/b53_priv.h +++ b/drivers/net/dsa/b53/b53_priv.h @@ -311,8 +311,8 @@ int b53_mirror_add(struct dsa_switch *ds, int port, struct dsa_mall_mirror_tc_entry *mirror, bool 
ingress); void b53_mirror_del(struct dsa_switch *ds, int port, struct dsa_mall_mirror_tc_entry *mirror); -int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy); -void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy); +int b53_enable_port(struct dsa_switch *ds, int port); +void b53_disable_port(struct dsa_switch *ds, int port); void b53_brcm_hdr_setup(struct dsa_switch *ds, int port); void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable); int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy); diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index ad96b9725a2c..77e0c43f973b 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -159,8 +159,7 @@ static inline void bcm_sf2_port_intr_disable(struct bcm_sf2_priv *priv, intrl2_1_writel(priv, P_IRQ_MASK(off), INTRL2_CPU_CLEAR); } -static int bcm_sf2_port_setup(struct dsa_switch *ds, int port, - struct phy_device *phy) +static int bcm_sf2_port_setup(struct dsa_switch *ds, int port) { struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); unsigned int i; @@ -191,11 +190,10 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port, if (port == priv->moca_port) bcm_sf2_port_intr_enable(priv, port); - return b53_enable_port(ds, port, phy); + return b53_enable_port(ds, port); } -static void bcm_sf2_port_disable(struct dsa_switch *ds, int port, -struct phy_device *phy) +static void bcm_sf2_port_disable(struct dsa_switch *ds, int port) { struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds); u32 off, reg; @@ -214,7 +212,7 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, int port, else off = CORE_G_PCTL_PORT(port); - b53_disable_port(ds, port, phy); + b53_disable_port(ds, port); /* Power down the port memory */ reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL); @@ -613,7 +611,7 @@ static int bcm_sf2_sw_suspend(struct dsa_switch *ds) for (port = 0; port < DSA_MAX_PORTS; port++) { if ((1 << port) & 
ds->enabled_port_mask || dsa_is_cpu_port(ds, port)) - bcm_sf2_port_disable(ds, port, NULL); + bcm_sf2_port_disable(ds, port); } return 0; @@ -636,7 +634,7 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds) for (port = 0; port < DSA_MAX_PORTS; port++) { if ((1 <<
[PATCH net-next 3/4] net: dsa: make slave close symmetrical to open
The DSA slave open function configures the unicast MAC addresses on the
master device, enable the switch port, change its STP state, then start
the PHY device.

Make the close function symmetric, by first stopping the PHY device,
then changing the STP state, disabling the switch port and restore the
master device.

Signed-off-by: Vivien Didelot
---
 net/dsa/slave.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6290741e496a..235a5c95dfcc 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -144,6 +144,11 @@ static int dsa_slave_close(struct net_device *dev)
 	if (p->phy)
 		phy_stop(p->phy);
 
+	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, p->dp->index);
+
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
 	if (dev->flags & IFF_ALLMULTI)
@@ -154,11 +159,6 @@ static int dsa_slave_close(struct net_device *dev)
 	if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
 		dev_uc_del(master, dev->dev_addr);
 
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index);
-
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
 	return 0;
 }
 
-- 
2.14.1
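The ordering argument in the commit message can be checked with a toy model: record the resources touched by open and by close, then verify that close undoes them in exactly the reverse order. Everything below (the enum, `demo_open()`, `demo_close()`, `is_mirror()`) is a hypothetical userspace sketch and does not call any DSA or phylib code.

```c
#include <stddef.h>

/* The four resources named in the commit message, in open order. */
enum step { MASTER_ADDR, PORT_ENABLE, STP_STATE, PHY_RUN };

struct trace {
	enum step steps[8];
	size_t n;
};

static void record(struct trace *t, enum step s)
{
	if (t->n < sizeof(t->steps) / sizeof(t->steps[0]))
		t->steps[t->n++] = s;
}

/* open: master addresses, enable port, set STP state, start PHY */
static void demo_open(struct trace *t)
{
	record(t, MASTER_ADDR);
	record(t, PORT_ENABLE);
	record(t, STP_STATE);
	record(t, PHY_RUN);
}

/* close: stop PHY, disable STP state, disable port, restore master */
static void demo_close(struct trace *t)
{
	record(t, PHY_RUN);
	record(t, STP_STATE);
	record(t, PORT_ENABLE);
	record(t, MASTER_ADDR);
}

/* Returns 1 when b visits the same resources as a in reverse order. */
static int is_mirror(const struct trace *a, const struct trace *b)
{
	size_t i;

	if (a->n != b->n)
		return 0;
	for (i = 0; i < a->n; i++)
		if (a->steps[i] != b->steps[b->n - 1 - i])
			return 0;
	return 1;
}
```

Before the patch, close touched the master device before the port and STP state, which in this model would make `is_mirror()` return 0.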
[PATCH net-next 0/4] net: dsa: simplify port enabling
This patchset removes the unnecessary PHY device argument in port
enable/disable switch operations, makes slave open and close symmetrical
and finally provides helpers for enabling or disabling a DSA port.

Vivien Didelot (4):
  net: dsa: move up phy enabling in core
  net: dsa: remove phy arg from port enable/disable
  net: dsa: make slave close symmetrical to open
  net: dsa: add port enable and disable helpers

 drivers/net/dsa/b53/b53_common.c       |  6 +++---
 drivers/net/dsa/b53/b53_priv.h         |  4 ++--
 drivers/net/dsa/bcm_sf2.c              | 32 --
 drivers/net/dsa/lan9303-core.c         |  6 ++
 drivers/net/dsa/microchip/ksz_common.c |  6 ++
 drivers/net/dsa/mt7530.c               |  8 +++-
 drivers/net/dsa/mv88e6xxx/chip.c       |  6 ++
 drivers/net/dsa/qca8k.c                |  6 ++
 include/net/dsa.h                      |  6 ++
 net/dsa/dsa_priv.h                     |  3 ++-
 net/dsa/port.c                         | 31 -
 net/dsa/slave.c                        | 36 ++
 12 files changed, 77 insertions(+), 73 deletions(-)

-- 
2.14.1
[PATCH net-next 4/4] net: dsa: add port enable and disable helpers
Provide dsa_port_enable and dsa_port_disable helpers to respectively
enable and disable a switch port. This makes the dsa_port_set_state_now
helper static.

Signed-off-by: Vivien Didelot
---
 net/dsa/dsa_priv.h |  3 ++-
 net/dsa/port.c     | 31 ++-
 net/dsa/slave.c    | 19 +--
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 9803952a5b40..6bfff19d1615 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -117,7 +117,8 @@ void dsa_master_ethtool_restore(struct net_device *dev);
 /* port.c */
 int dsa_port_set_state(struct dsa_port *dp, u8 state,
 		       struct switchdev_trans *trans);
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state);
+int dsa_port_enable(struct dsa_port *dp);
+void dsa_port_disable(struct dsa_port *dp);
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 76d43a82d397..50749339e252 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -56,7 +56,7 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state,
 	return 0;
 }
 
-void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
+static void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 {
 	int err;
 
@@ -65,6 +65,35 @@ void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
 		pr_err("DSA: failed to set STP state %u (%d)\n", state, err);
 }
 
+int dsa_port_enable(struct dsa_port *dp)
+{
+	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+	int err;
+
+	if (ds->ops->port_enable) {
+		err = ds->ops->port_enable(ds, port);
+		if (err)
+			return err;
+	}
+
+	dsa_port_set_state_now(dp, stp_state);
+
+	return 0;
+}
+
+void dsa_port_disable(struct dsa_port *dp)
+{
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+
+	dsa_port_set_state_now(dp, BR_STATE_DISABLED);
+
+	if (ds->ops->port_disable)
+		ds->ops->port_disable(ds, port);
+}
+
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 {
 	struct dsa_notifier_bridge_info info = {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 235a5c95dfcc..e40623939323 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -74,9 +74,7 @@ static int dsa_slave_open(struct net_device *dev)
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct phy_device *phy = p->phy;
 	struct dsa_port *dp = p->dp;
-	struct dsa_switch *ds = dp->ds;
 	struct net_device *master = dsa_master_netdev(p);
-	u8 stp_state = dp->bridge_dev ? BR_STATE_BLOCKING : BR_STATE_FORWARDING;
 	int err;
 
 	if (!(master->flags & IFF_UP))
@@ -99,13 +97,9 @@ static int dsa_slave_open(struct net_device *dev)
 			goto clear_allmulti;
 	}
 
-	if (ds->ops->port_enable) {
-		err = ds->ops->port_enable(ds, p->dp->index);
-		if (err)
-			goto clear_promisc;
-	}
-
-	dsa_port_set_state_now(p->dp, stp_state);
+	err = dsa_port_enable(dp);
+	if (err)
+		goto clear_promisc;
 
 	if (phy) {
 		/* If phy_stop() has been called before, phy will be in
@@ -139,15 +133,12 @@ static int dsa_slave_close(struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct net_device *master = dsa_master_netdev(p);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_port *dp = p->dp;
 
 	if (p->phy)
 		phy_stop(p->phy);
 
-	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
-
-	if (ds->ops->port_disable)
-		ds->ops->port_disable(ds, p->dp->index);
+	dsa_port_disable(dp);
 
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
-- 
2.14.1
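The helper pattern introduced by this patch (an optional driver callback, error propagation before any state change, and the reverse order on disable) can be tried out in user space with the sketch below. It is only an illustration under stated assumptions: the mock_* and demo_* types, the enum values and the -5 error code are all hypothetical stand-ins, not the DSA structures or the bridge STP constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the STP states used in the patch. */
enum mock_stp_state { MOCK_DISABLED, MOCK_BLOCKING, MOCK_FORWARDING };

struct mock_port;

/* Driver operations; both callbacks are optional and may be NULL. */
struct mock_ops {
	int (*port_enable)(struct mock_port *p);
	void (*port_disable)(struct mock_port *p);
};

struct mock_port {
	const struct mock_ops *ops;
	int bridged;               /* non-zero when the port is a bridge member */
	enum mock_stp_state state;
	int hw_enabled;            /* toggled by the demo driver callbacks */
};

/* Enable: call the driver hook first, and only then set the STP state. */
static int mock_port_enable(struct mock_port *p)
{
	int err;

	assert(p != NULL && p->ops != NULL);
	if (p->ops->port_enable) {
		err = p->ops->port_enable(p);
		if (err)
			return err; /* state is left untouched on failure */
	}
	p->state = p->bridged ? MOCK_BLOCKING : MOCK_FORWARDING;
	return 0;
}

/* Disable: the reverse order, state first, then the driver hook. */
static void mock_port_disable(struct mock_port *p)
{
	assert(p != NULL && p->ops != NULL);
	p->state = MOCK_DISABLED;
	if (p->ops->port_disable)
		p->ops->port_disable(p);
}

/* Demo driver callbacks (hypothetical). */
static int demo_enable_ok(struct mock_port *p)
{
	p->hw_enabled = 1;
	return 0;
}

static int demo_enable_fail(struct mock_port *p)
{
	(void)p;
	return -5; /* arbitrary error code for the sketch */
}

static void demo_disable(struct mock_port *p)
{
	p->hw_enabled = 0;
}
```

With a failing enable callback the port stays in its previous state, which corresponds to the goto clear_promisc path in dsa_slave_open(): the caller unwinds without having touched the STP state.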
[PATCH] Switch to use the new hashtable implementation. This reduces the code and need for yet another hashtable implementation.
Signed-off-by: Aaron Wood
---
 net/9p/error.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/9p/error.c b/net/9p/error.c
index 126fd0dceea2..2e966fcc5cbb 100644
--- a/net/9p/error.c
+++ b/net/9p/error.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -50,8 +51,8 @@ struct errormap {
 	struct hlist_node list;
 };
 
-#define ERRHASHSZ 32
-static struct hlist_head hash_errmap[ERRHASHSZ];
+#define ERR_HASH_BITS 5
+static DEFINE_HASHTABLE(hash_errmap, ERR_HASH_BITS);
 
 /* FixMe - reduce to a reasonable size */
 static struct errormap errmap[] = {
@@ -193,18 +194,14 @@ static struct errormap errmap[] = {
 int p9_error_init(void)
 {
 	struct errormap *c;
-	int bucket;
-
-	/* initialize hash table */
-	for (bucket = 0; bucket < ERRHASHSZ; bucket++)
-		INIT_HLIST_HEAD(&hash_errmap[bucket]);
+	int key;
 
 	/* load initial error map into hash table */
 	for (c = errmap; c->name != NULL; c++) {
 		c->namelen = strlen(c->name);
-		bucket = jhash(c->name, c->namelen, 0) % ERRHASHSZ;
+		key = jhash(c->name, c->namelen, 0);
 		INIT_HLIST_NODE(&c->list);
-		hlist_add_head(&c->list, &hash_errmap[bucket]);
+		hash_add(hash_errmap, &c->list, key);
 	}
 
 	return 1;
@@ -222,12 +219,12 @@ int p9_errstr2errno(char *errstr, int len)
 {
 	int errno;
 	struct errormap *c;
-	int bucket;
+	int key;
 
 	errno = 0;
 	c = NULL;
-	bucket = jhash(errstr, len, 0) % ERRHASHSZ;
-	hlist_for_each_entry(c, &hash_errmap[bucket], list) {
+	key = jhash(errstr, len, 0);
+	hash_for_each_possible(hash_errmap, c, list, key) {
 		if (c->namelen == len && !memcmp(c->name, errstr, len)) {
 			errno = c->val;
 			break;
-- 
2.11.0
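For readers unfamiliar with the pattern, the following user-space sketch shows what a table sized by a bit count (5 bits, so 32 buckets, matching the old ERRHASHSZ) looks like when the bucket index is derived from the key inside the add and lookup helpers rather than by the caller. Everything here is an assumption made for illustration: the demo_* names are hypothetical, FNV-1a stands in for jhash, a plain mask stands in for whatever reduction the kernel macros apply internally, and the error values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HASH_BITS 5
#define DEMO_HASH_SIZE (1u << DEMO_HASH_BITS) /* 32 buckets */

/* Hypothetical counterpart of struct errormap, chained with a plain pointer. */
struct demo_err {
	const char *name;
	int val;
	size_t namelen;
	struct demo_err *next;
};

/* Zero-initialised by static storage, so no explicit init loop is needed. */
static struct demo_err *demo_table[DEMO_HASH_SIZE];

/* FNV-1a string hash, used here only as a stand-in for jhash. */
static uint32_t demo_hash(const char *s, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 16777619u;
	}
	return h;
}

/* Add an entry; the bucket index is derived from the full key here. */
static void demo_add(struct demo_err *e)
{
	uint32_t bucket;

	assert(e != NULL && e->name != NULL);
	e->namelen = strlen(e->name);
	bucket = demo_hash(e->name, e->namelen) & (DEMO_HASH_SIZE - 1);
	e->next = demo_table[bucket];
	demo_table[bucket] = e;
}

/* Look up an error string; returns its value, or 0 when not found. */
static int demo_lookup(const char *s, size_t len)
{
	uint32_t bucket = demo_hash(s, len) & (DEMO_HASH_SIZE - 1);
	const struct demo_err *e;

	/* Walk only the one bucket the key can live in. */
	for (e = demo_table[bucket]; e != NULL; e = e->next)
		if (e->namelen == len && memcmp(e->name, s, len) == 0)
			return e->val;
	return 0;
}
```

As in the patch, callers pass the whole hash value as the key and never compute a bucket number themselves, and the per-bucket walk still needs the length and memcmp checks because different strings can share a bucket.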