Re: [PATCH net-next] nfp: flower: prioritize stats updates
On Sun, Jan 21, 2018 at 3:10 PM, David Millerwrote: > From: Jakub Kicinski > Date: Fri, 19 Jan 2018 17:54:08 -0800 > >> From: Pieter Jansen van Vuuren >> >> Previously it was possible to interrupt processing stats updates because >> they were handled in a work queue. Interrupting the stats updates could >> lead to a situation where we backup the control message queue. This patch >> moves the stats update processing out of the work queue to be processed as >> soon as hardware sends a request. >> >> Reported-by: Louis Peens >> Signed-off-by: Pieter Jansen van Vuuren >> >> Reviewed-by: Dirk van der Merwe >> Reviewed-by: Jakub Kicinski > > Applied, thanks Jakub. Thank you! > Just a reminder that it really makes things more difficult that one > can't just go: > > make drivers/net/ethernet/netronome/nfp/flower/cmsg.o > > to test a specific NFP object build like one can with pretty much the > rest of the entire kernel tree. Oh, thanks for letting me know, I wasn't aware the build system seems to require a separate Makefile in each directory.. I'll fix it first thing in the morning!
[RFC PATCH net-next] net: aquantia: aq_fw2x_get_mac_permanent() can be static
Fixes: a57d3929b838 ("net: aquantia: Introduce support for new firmware on AQC cards") Signed-off-by: Fengguang Wu--- hw_atl_utils_fw2x.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c index 8cfce95..39cd3a2 100644 --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c @@ -107,7 +107,7 @@ static int aq_fw2x_update_link_status(struct aq_hw_s *self) return 0; } -int aq_fw2x_get_mac_permanent(struct aq_hw_s *self, u8 *mac) +static int aq_fw2x_get_mac_permanent(struct aq_hw_s *self, u8 *mac) { int err = 0; u32 h = 0U;
Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
On 2018/01/20 09:21, Alexander Duyck wrote: > On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier >wrote: > > On 2018/01/20 07:45, Benjamin Poirier wrote: > > [...] > >> > > >> > I'm of the mind that we need to cut down on the code thrash. This > >> > driver is supposed to have been in a "maintenance" mode for the last > >> > year or so as there aren't being any new parts added is my > >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: > >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or > >> > accepted in the first place. I don't see any notes about it fixing any > >> > bug or addressing any issue and it seems like that is the start of all > >> > the issues we have been having recently with RXO triggering more > >> > interrupts, various link issues, and this most recent VMware issue. > >> > >> I'm sorry to say but you're the one who suggested that change: > >> > >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html > >> > >> On 2015/10/28 23:08, Alexander Duyck wrote: > >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote: > >> [...] > >> > > >> > I would argue your first patch probably didn't go far enough to remove > >> > dead > >> > code. Specifically you should only ever get into this function if LSC is > >> > set. There are no other causes that should trigger this. As such you > >> > could > >> > probably remove the ICR read, and instead replace it with an ICR write of > >> > the LSC bit since OTHER is already cleared via EIAC. > >> > > > > > ... The assumption that "There are no other causes that should trigger > > this." turned out to be wrong and that caused the RXO problems. Clearing > > OTHER via EIAC is causing the problems with vmware now. I don't think > > you foresaw those problems back in 2015 and neither did I. > > Well that explains why I felt like I was explaining things to a > younger version of myself. I was a bit more relaxed in terms of being > willing to make arbitrary changes a few years ago. I tend to be a bit > more conservative now, at least as far as having justifications as to > why I want to do things. With any change you end up taking on risk, > and so usually a patch has a justification as to why you are making > the change. > > Unfortunately at the time I didn't have all the information and based > my suggestion on a bad assumption. I would guess at the time I was > thinking of doing general code cleanup. Other drivers such as igb work > this way, but it led us down the path we are on now where we are > having to make one fix after another. It is leading in the opposite > direction of maintainability as this is all becoming more complex. > Suggesting this was a bad decision on my part at the time. I'm only > human, I make mistakes.. :-) Thanks for the introspection. > > With further review of the code I am seeing various other issues that > could still pop up as I am not certain we should even have the "other" > interrupt messing with the NAPI polling or packet accounting logic at > all. The question I would have at this point is if we revert > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1) > and all the related fixes for it, what do we end up with? The patch I submitted for the current vmware issue actually finishes reverting commit 16ecba59bc33. I believe the relevant commits to consider are: 16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1) a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1) 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) 19110cfbb34d e1000e: Separate signaling for link check/link up (v4.15-rc1) 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) 4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return value. (v4.15-rc8) (submitted) e1000e: Remove Other from EIAC. commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of 16ecba59bc33 (ICR read). The submitted patch reverts the rest of 16ecba59bc33 (EIAC clearing of Other). > It seems > like the code is slowly heading back in the direction of where it was > originally anyway since there have been a number of partial reverts. > I'm wondering what would happen if we were to just short-cut that and > audit the patches involved to see what we really need and don't. > > Your patch as proposed is essentially another step in that direction. > I'm thinking we may want to drop my currently proposed fix for now and > instead look at going through and figure out what changes after that > first one are still really needed. If the patch that I submitted for the current vmware issue is merged, the significant commits that are left are: 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) Fixes a problem in the irq disabling of the napi implementation. 19110cfbb34d e1000e: Separate signaling for link check/link up (v4.15-rc1) Fixes link flapping
[RFC PATCH net-next] net: aquantia: hw_atl_utils_mpi_set_speed() can be static
Fixes: 0c58c35f02c2 ("net: aquantia: Introduce firmware ops callbacks") Signed-off-by: Fengguang Wu--- hw_atl_utils.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c index 967f0fd..ad87338 100644 --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c @@ -483,7 +483,7 @@ void hw_atl_utils_mpi_read_stats(struct aq_hw_s *self, err_exit:; } -int hw_atl_utils_mpi_set_speed(struct aq_hw_s *self, u32 speed) +static int hw_atl_utils_mpi_set_speed(struct aq_hw_s *self, u32 speed) { u32 val = aq_hw_read_reg(self, HW_ATL_MPI_CONTROL_ADR);
Re: [PATCHv3] 3c59x: fix missing dma_mapping_error check and bad ring refill logic
On Wed, Jan 3, 2018 at 1:44 PM, David Millerwrote: > From: Neil Horman > Date: Wed, 3 Jan 2018 13:09:23 -0500 > >> A few spots in 3c59x missed calls to dma_mapping_error checks, casuing >> WARN_ONS to trigger. Clean those up. While we're at it, refactor the >> refill code a bit so that if skb allocation or dma mapping fails, we >> recycle the existing buffer. This prevents holes in the rx ring, and >> makes for much simpler logic >> >> Note: This is compile only tested. Ted, if you could run this and >> confirm that it continues to work properly, I would appreciate it, as I >> currently don't have access to this hardware >> Neil, I was able to test this patch. I did not get any WARN_ON messages. However, I am getting a lot of dropped receive packets; uptime is 11 minutes and it has already dropped 214 of 743 receive packets. Admittedly this is on a slow i486 regression testing system, but the drop rate is approximately 30% which seems high even for this system because it is on a very quiet switched network. I enabled some debugging messages by setting msglvl to 4 and recompiling with DYNAMIC_DEBUG=y. I did not see any messages of the form "No memory to allocate a sk_buff of size" so that leaves the following two cases: boomerang_rx() ... newskb = netdev_alloc_skb_ip_align(dev, PKT_BUF_SZ); if (!newskb) { dev->stats.rx_dropped++; goto clear_complete; } newdma = pci_map_single(VORTEX_PCI(vp), newskb->data, PKT_BUF_SZ, PCI_DMA_FROMDEVICE) if (dma_mapping_error(_PCI(vp)->dev, newdma)) { dev->stats.rx_dropped++; consume_skb(newskb); goto clear_complete; } What shall we do to determine if it is hitting the pci_map_single() or netdev_alloc_skb_ip_align() failure? - Matthew
Re: wcn36xx: release resources in case of error
Ramon Friedwrote: > wcn36xx_dxe_init() doesn't check for the return value > of wcn36xx_dxe_init_descs(). > This patch releases the resources in case an error ocurred. > > Signed-off-by: Ramon Fried This is a duplicate. Patch set to Superseded. -- https://patchwork.kernel.org/patch/10173331/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: wcn36xx: release resources in case of error
Ramon Friedwrote: > wcn36xx_dxe_init() doesn't check for the return value > of wcn36xx_dxe_init_descs(). > This patch releases the resources in case an error ocurred. > > Signed-off-by: Ramon Fried This is a duplicate. Patch set to Superseded. -- https://patchwork.kernel.org/patch/10173329/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: wcn36xx: release resources in case of error
Ramon Friedwrote: > wcn36xx_dxe_init() doesn't check for the return value > of wcn36xx_dxe_init_descs(). > This patch releases the resources in case an error ocurred. > > Signed-off-by: Ramon Fried Doesn't compile: drivers/net/wireless/ath/wcn36xx/dxe.c: In function ‘wcn36xx_dxe_deinit_descs’: drivers/net/wireless/ath/wcn36xx/dxe.c:243:15: error: ‘struct wcn36xx_dxe_ch’ has no member named ‘desc_bum’ size = wcn_ch->desc_bum * sizeof(struct wcn36xx_dxe_desc); ^ drivers/net/wireless/ath/wcn36xx/dxe.c:244:20: error: ‘wcn’ undeclared (first use in this function) dma_free_coherent(wcn->dev, size, ^ drivers/net/wireless/ath/wcn36xx/dxe.c:244:20: note: each undeclared identifier is reported only once for each function it appears in drivers/net/wireless/ath/wcn36xx/dxe.c: In function ‘wcn36xx_dxe_init’: drivers/net/wireless/ath/wcn36xx/dxe.c:737:3: error: implicit declaration of function ‘dev_error’ [-Werror=implicit-function-declaration] dev_error("Error allocating descriptor\n"); ^ drivers/net/wireless/ath/wcn36xx/dxe.c:818:2: error: expected ‘;’ before ‘if’ if (ret) { ^ drivers/net/wireless/ath/wcn36xx/dxe.c:860:1: warning: label ‘out_err_txh_ch’ defined but not used [-Wunused-label] out_err_txh_ch: ^ drivers/net/wireless/ath/wcn36xx/dxe.c:856:1: warning: label ‘out_err_rxh_ch’ defined but not used [-Wunused-label] out_err_rxh_ch: ^ drivers/net/wireless/ath/wcn36xx/dxe.c:760:3: error: label ‘our_err_txh_ch’ used but not defined goto our_err_txh_ch; ^ cc1: some warnings being treated as errors make[5]: *** [drivers/net/wireless/ath/wcn36xx/dxe.o] Error 1 make[4]: *** [drivers/net/wireless/ath/wcn36xx] Error 2 make[3]: *** [drivers/net/wireless/ath] Error 2 make[2]: *** [drivers/net/wireless] Error 2 make[1]: *** [drivers/net] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [drivers] Error 2 Patch set to Changes Requested. -- https://patchwork.kernel.org/patch/10173325/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
[PATCH net-next] net: link_watch: mark bonding link events urgent
From: Roopa PrabhuIt takes 1sec for bond link down notification to hit user-space when all slaves of the bond go down. 1sec is too long for protocol daemons in user-space relying on bond link notification to failover/recover (eg: multichassis lag implementations in user-space). Since the link event code already marks team device port link events urgent, this patch does the same for bonding link events. Signed-off-by: Roopa Prabhu --- net/core/link_watch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/link_watch.c b/net/core/link_watch.c index 9828616..63bb2ad 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -92,7 +92,7 @@ static bool linkwatch_urgent_event(struct net_device *dev) if (dev->ifindex != dev_get_iflink(dev)) return true; - if (dev->priv_flags & IFF_TEAM_PORT) + if (dev->priv_flags & (IFF_TEAM_PORT | IFF_BONDING)) return true; return netif_carrier_ok(dev) && qdisc_tx_changing(dev); -- 2.1.4
Re: [v3, 3/6] dt: booting-without-of: DT fix s/#interrupt-cell/#interrupt-cells/
On Fri, 2017-06-02 at 12:38:46 UTC, Geert Uytterhoeven wrote: > Signed-off-by: Geert Uytterhoeven> Acked-by: Rob Herring Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/4be4119d1fbd93c44d5c639735c312 cheers
[PATCH net-next] tun: add missing rcu annotation
This patch fixes the following sparse warnings: drivers/net/tun.c:2241:15: error: incompatible types in comparison expression (different address spaces) Fixes: cd5681d7d890 ("tuntap: rename struct tun_steering_prog to struct tun_prog") Cc: Daniel BorkmannSigned-off-by: Jason Wang --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 6988746..2bc18b1 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2225,7 +2225,8 @@ static void tun_prog_free(struct rcu_head *rcu) kfree(prog); } -static int __tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p, +static int __tun_set_ebpf(struct tun_struct *tun, + struct tun_prog __rcu **prog_p, struct bpf_prog *prog) { struct tun_prog *old, *new = NULL; -- 2.7.4
Re: net merged into net-next
On 2018年01月21日 04:17, Cong Wang wrote: On Fri, Jan 19, 2018 at 8:02 PM, David Millerwrote: Cong, please check my conflict resolution of drivers/net/tun.c, thank you. It looks good to me except I am not sure about the xdp_rxq_info_unreg() inside tun_cleanup_tx_ring(). Looks correct to me. Thanks
Re: [PATCH net] net: validate untrusted gso packets
On 2018年01月19日 22:39, Willem de Bruijn wrote: On Fri, Jan 19, 2018 at 3:19 AM, Jason Wangwrote: On 2018年01月19日 08:53, Willem de Bruijn wrote: And what you propose here is just a very small subset of the necessary checking, more comes at gso header checking. So even if we care performance, it only help for some specific case. It also fixed the bug that Eric sent a separate patch for, as that did not dissect as a valid TCP packet, either. I may miss something but how did this patch protects an evil thoff? Actually, it blocked that specific reproducer because the ip protocol did not match. I see. I think that __skb_flow_dissect_tcp should return a boolean, causing dissection return FLOW_DISSECT_RET_OUT_BAD if the tcph is bad. That would be needed to really catch it with flow dissection at the source. Just sanitize transport to offset_hint (0) in the case of tun. That is the current approach in skb_probe_transport_header, but it opens us up to parsing of garbage headers later in the stack when unconditionally reading tcp_hdrlen. The qdisc_pkt_len_init case is one such example. Seems better to leave transport header unset in such cases and qualify all access to the headers if DODGY. It looks to me flow dissector will return FLOW_DISSECT_RET_OUT_BAD too if it can't recognize the protocol. We can't differ the real failure from unrecognized protocol. (or change the return from bool to int). Unrecognized protocol should imply failure. virtio_net_hdr_to_skb accepts a whitelist of protocols, all of which the flow dissector can verify. Yes and only for gso packets. Another argument for early validation: we cannot currently differentiate tunnel headers inserted by the tun/pf_packet/xen user from those generated by the stack if both are present. The kernel currently does not define tunnel VIRTIO_NET_HDR_GSO types, so should drop packets with tunnel headers. Tunnel gso handlers indeed do drop these if skb->encapsulation is not set. But if a packet travels through a tunnel device and the bit is set, at segmentation time we cannot distinguish trusted stack headers from possibly malformed user supplied ones. Right. Thanks
Re: [PATCH net v3] gso: validate gso_type in GSO handlers
On 2018年01月19日 22:29, Willem de Bruijn wrote: From: Willem de BruijnValidate gso_type during segmentation as SKB_GSO_DODGY sources may pass packets where the gso_type does not match the contents. Syzkaller was able to enter the SCTP gso handler with a packet of gso_type SKB_GSO_TCPV4. On entry of transport layer gso handlers, verify that the gso_type matches the transport protocol. Fixes: 90017accff61 ("sctp: Add GSO support") Link:http://lkml.kernel.org/r/<001a1137452496ffc305617e5...@google.com> Reported-by:syzbot+fee64147a25aecd48...@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn --- Similar checks existed until removed in commit 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO"). But those were limited to the TSO path, not software GSO. I believe that this issue goes back further, hence the Fixes at the first user of virtio_net_hdr. --- net/ipv4/esp4_offload.c | 3 +++ net/ipv4/tcp_offload.c | 3 +++ net/ipv4/udp_offload.c | 3 +++ net/ipv6/esp6_offload.c | 3 +++ net/ipv6/tcpv6_offload.c | 3 +++ net/ipv6/udp_offload.c | 3 +++ net/sctp/offload.c | 3 +++ 7 files changed, 21 insertions(+) Acked-by: Jason Wang Thanks
Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
Hi Eric On 01/22/2018 12:43 AM, Eric Dumazet wrote: > On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote: >> >> On 21/01/2018 11:31 AM, Tariq Toukan wrote: >>> >>> >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: > Hi Tariq > > Very sad that the crash was reproduced again after applied the patch. >> >> Memory barriers vary for different Archs, can you please share more >> details regarding arch and repro steps? > > Yeah, mlx4 NICs in Google fleet receive trillions of packets per > second, and we never noticed an issue. > > Although we are using a slightly different driver, using order-0 pages > and fast pages recycling. > > The driver we use will will set the page reference count to (size of pages)/stride, the pages will be freed by networking stack when the reference become zero, and the order-3 pages maybe allocated soon, this give NIC device a chance to corrupt the pages which have been allocated by others, such as slab. In the current version with order-0 and page recycling, maybe the corruption occurred on the inbound packets sometimes and just cause some bad and invalid packets which will be dropped. Thanks Jianchao
Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
Hi Tariq and all Many thanks for your kindly and detailed response and comment. On 01/22/2018 12:24 AM, Tariq Toukan wrote: > > > On 21/01/2018 11:31 AM, Tariq Toukan wrote: >> >> >> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: Hi Tariq Very sad that the crash was reproduced again after applied the patch. > > Memory barriers vary for different Archs, can you please share more details > regarding arch and repro steps?The hardware is HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 The xen is installed. The crash occurred in DOM0. Regarding to the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. The patch that can fix this issue is as follow: --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -1005,6 +1005,7 @@ out: wmb(); /* ensure HW sees CQ consumer before we post new buffers */ ring->cons = cq->mcq.cons_index; mlx4_en_refill_rx_buffers(priv, ring); + wmb(); mlx4_en_update_rx_prod_db(ring); return polled; } Thanks Jianchao > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring) static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) { + dma_wmb(); >>> >>> So... is wmb() here fixing the issue ? >>> *ring->wqres.db.db = cpu_to_be32(ring->prod & 0x); } I analyzed the kdump, it should be a memory corruption. Thanks Jianchao >> >> Hmm, this is actually consistent with the example below [1]. >> >> AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good >> for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers >> (the descriptors, went through the dma_map_page() API), but not for the >> doorbell (a coherent memory, typically allocated via dma_alloc_coherent) >> that requires using the stronger wmb() barrier. >> >> >> [1] Documentation/memory-barriers.txt >> >> (*) dma_wmb(); >> (*) dma_rmb(); >> >> These are for use with consistent memory to guarantee the ordering >> of writes or reads of shared memory accessible to both the CPU and a >> DMA capable device. >> >> For example, consider a device driver that shares memory with a device >> and uses a descriptor status value to indicate if the descriptor >> belongs >> to the device or the CPU, and a doorbell to notify it when new >> descriptors are available: >> >> if (desc->status != DEVICE_OWN) { >> /* do not read data until we own descriptor */ >> dma_rmb(); >> >> /* read/modify data */ >> read_data = desc->data; >> desc->data = write_data; >> >> /* flush modifications before status update */ >> dma_wmb(); >> >> /* assign ownership */ >> desc->status = DEVICE_OWN; >> >> /* force memory to sync before notifying device via MMIO */ >> wmb(); >> >> /* notify device of new descriptors */ >> writel(DESC_NOTIFY, doorbell); >> } >> >> The dma_rmb() allows us guarantee the device has released ownership >> before we read the data from the descriptor, and the dma_wmb() allows >> us to guarantee the data is written to the descriptor before the device >> can see it now has ownership. The wmb() is needed to guarantee that >> the >> cache coherent memory writes have completed before attempting a write >> to >> the cache incoherent MMIO region. >> >> See Documentation/DMA-API.txt for more information on consistent >> memory. >> >> On 01/15/2018 01:50 PM, jianchao.wang wrote: > Hi Tariq > > Thanks for your kindly response. > > On 01/14/2018 05:47 PM, Tariq Toukan wrote: >> Thanks Jianchao for your patch. >> >> And Thank you guys for your reviews, much appreciated. >> I was off-work on Friday and Saturday. >> >> On 14/01/2018 4:40 AM, jianchao.wang wrote: >>> Dear all >>> >>> Thanks for the kindly response and reviewing. That's really appreciated. >>> >>> On 01/13/2018 12:46 AM, Eric Dumazet wrote: > Does this need to be dma_wmb(), and should it be in > mlx4_en_update_rx_prod_db ? > +1 on dma_wmb() On what architecture bug was observed ? >>> >>> This issue was observed on x86-64. >>> And I will send a new patch, in which replace wmb() with dma_wmb(), to >>> customer >>> to confirm. >> >> +1 on dma_wmb, let us know once customer confirms. >> Please place it within mlx4_en_update_rx_prod_db as suggested. > > Yes, I have recommended it to customer. > Once I get
Re: Memory corruption with r8169 across several device revisions and kernels
Am 22.01.2018 um 01:09 schrieb Francois Romieu: > You said: > > Oliver Freyermuth: > [...] >> The values found in overwritten memory match those contained in >> /proc/self/net/dev for the realtek ethernet device. > > Are you able to retrieve the layout ? That is, does it appear to match: > > - r8169 hardware stats DMA buffer ? > TxOk, RxOk, TxErr, RxErr, ... > > - rtnl_link_stats ? > rx_packets, tx_packets, rx_bytes, tx_bytes, ... > > or something else ? Not cleanly. Since I'm no expert in kernel module development, I can only deduce from what I get in mapped memory, e.g. with memtester. What I found there I found back in /proc/self/net/dev, I'm not sure anymore whether it was RX or TX bytes / packets (but it was none of the error counters). I can try to reproduce to clarify, but it's a somwhat dangerous undertaking. Also, from a time when the physical offset was in low memory, I got the following in syslog: Oct 12 10:05:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065b8ea Oct 12 10:10:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065be39 Oct 12 10:11:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065be8c Oct 12 10:12:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065bef8 Oct 12 10:13:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065bfbe Oct 12 10:18:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065c37a Oct 12 10:19:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065c3db Oct 12 10:31:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065cc48 Oct 12 10:35:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065d402 Oct 12 10:47:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065dcbb Oct 12 10:53:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 0065e0a3 Oct 12 11:39:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 006602f2 Oct 12 11:44:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 phys) = 00661ef0 Also, I'm not sure whether the low memory scanner continues after a single corruption was found, potentially it would only see the first corrupted region. memtester in userspace stops on the first corruption and then tries another pass. At least I only ever saw one corrupted region with the tools I used. The same was true for the corrupted btrfs filesystem: As far as I could tell, there was a single corrupted region, no series of counters, i.e. not a full structure. Cheers, Oliver
Re: Memory corruption with r8169 across several device revisions and kernels
Oliver Freyermuth: > Am 21.01.2018 um 21:48 schrieb Francois Romieu: > > Oliver Freyermuth : [...] > > Is it an AMD based system ? > > > > No, all the systems on which I have observed this up to now are Intel-based. > Two Haswell and one Sandy Bridge system. Ok. You said: Oliver Freyermuth : [...] > The values found in overwritten memory match those contained in > /proc/self/net/dev for the realtek ethernet device. Are you able to retrieve the layout ? That is, does it appear to match: - r8169 hardware stats DMA buffer ? TxOk, RxOk, TxErr, RxErr, ... - rtnl_link_stats ? rx_packets, tx_packets, rx_bytes, tx_bytes, ... or something else ? -- Ueimor
Re: [PATCH net-next 0/7] tcp: implement rb-tree based retransmit queue
On Sun, Jan 21, 2018 at 12:52 PM, Tal Gilboawrote: > Hi Eric, > We have noticed a degradation on both of our drivers (mlx4 and mlx5) when > running TCP. Exact scenario is single stream TCP with 1KB packets. The > degradation is a steady 50% drop. > We tracked the offending commit to be: > 75c119a ("tcp: implement rb-tree based retransmit queue") > > Since mlx4 and mlx5 code base is completely different and by looking at the > changes in this commit, we believe the issue is external to the mlx4/5 > drivers. > > I see in the comment below you anticipated some overhead, but this may be a > too common case to ignore. > > Can you please review and consider reverting/fixing it? > Hi Tal You have to provide way more details than a simple mail, asking for a " revert or a fix " ... On our GFEs, we got a win, while I was expecting a small overhead, given the apparent complexity of dealing with RB tree instead of linear list. And on the stress scenario described in my patch set, the win was absolutely abysmal. A " single strean TCP with 1KB packets" is not something we need to optimize, unless there is some really strange setup for one of your customers ? Here we deal with millions of TCP flows, and this is what we need to take care of. Thanks. > Thanks, > > Tal G. > > > On 10/7/2017 2:31 AM, David Miller wrote: >> >> From: Eric Dumazet >> Date: Thu, 5 Oct 2017 22:21:20 -0700 >> >>> This patch series implement RB-tree based retransmit queue for TCP, >>> to better match modern BDP. >> >> >> Indeed, there was a lot of resistence to this due to the overhead >> for small retransmit queue sizes, but with today's scale this is >> long overdue. >> >> So, series applied, nice work! >> >> Maybe we can look into dynamic schemes where when the queue never >> goes over N entries we elide the rbtree and use a list. I'm not >> so sure how practical that would be. >> >> Thanks! >> >
Re: [patch net-next 0/7] mlxsw: Add support for mirror action with flower
From: Jiri PirkoDate: Fri, 19 Jan 2018 09:24:45 +0100 > From: Jiri Pirko > > Arkadi says: > > Add support for mirror action with flower classifier. The first 3 patches > introduce a generic per-block resource infra. The last 4 patches add > support for flow based span. Series applied, thanks.
Re: [PATCH net-next 00/11] Aquantia atlantic driver new devices support
From: Igor RusskikhDate: Fri, 19 Jan 2018 17:03:17 +0300 > This patchset introduces a support for new Aquantia hardware: > AQC11x family with updated hardware (B1) and firmware (2.x and 3.x branches). > > For that, a number of improvements in overall driver model were done: > - Firmware specific ops tables. Firmware 2.x and 3.x series support >functions are now in separate fw2x module. > - PCI module cleanup and simplification done. > - Verified and tested hardware reset process. Series applied, thanks Igor.
Re: [pull request][net-next 00/15] Mellanox, mlx5 updates 2018-01-19
From: Saeed MahameedDate: Fri, 19 Jan 2018 23:34:42 +0200 > The following series includes updates for mlx5e netdev driver, > for more information please see tag log below. > > Please pull and let me know if there's any problem. Pulled, thanks Saeed.
Re: [PATCH net-next] nfp: flower: prioritize stats updates
From: Jakub KicinskiDate: Fri, 19 Jan 2018 17:54:08 -0800 > From: Pieter Jansen van Vuuren > > Previously it was possible to interrupt processing stats updates because > they were handled in a work queue. Interrupting the stats updates could > lead to a situation where we backup the control message queue. This patch > moves the stats update processing out of the work queue to be processed as > soon as hardware sends a request. > > Reported-by: Louis Peens > Signed-off-by: Pieter Jansen van Vuuren > Reviewed-by: Dirk van der Merwe > Reviewed-by: Jakub Kicinski Applied, thanks Jakub. Just a reminder that it really makes things more difficult that one can't just go: make drivers/net/ethernet/netronome/nfp/flower/cmsg.o to test a specific NFP object build like one can with pretty much the rest of the entire kernel tree.
Re: [PATCH net V2] net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
From: Saeed MahameedDate: Sun, 21 Jan 2018 05:30:42 +0200 > From: Talat Batheesh > > Helmut reported a bug about division by zero while > running traffic and doing physical cable pull test. > > When the cable unplugged the ppms become zero, so when > dividing the current ppms by the previous ppms in the > next dim iteration there is division by zero. > > This patch prevent this division for both ppms and epms. > > Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism") > Reported-by: Helmut Grauer > Signed-off-by: Talat Batheesh > Signed-off-by: Saeed Mahameed > --- > v2: > - Fix spelling in commit message. > > This patch is the equivalent of: > ("net/dim: Fix fixpoint divide exception in net_dim_stats_compare") > which was already submitted to net-next. Since mlx5 irq moderation code > got moved in net-next, this patch should only go to net and skip net-next. > > for -stable: 4.12.y and later. Applied and queue up for -stable, thank you.
Re: [PATCH] net: gemini: Depend on HAS_IOMEM
From: Linus WalleijDate: Sun, 21 Jan 2018 14:15:41 +0100 > The zeroday builder notices that since Usermode Linux does not > have IO memory, the build fails for them when selecting everything > it can enable. > > As the driver is clearly using memory-mapped registers to access > the network adapter, we add depends on HAS_IOMEM to solve this > problem. > > Reported-by: kbuild test robot > Signed-off-by: Linus Walleij Applied, thank you.
Re: [RFC crypto v3 0/9] Chelsio Inline TLS
2017-12-20, 17:03:02 +0530, Atul Gupta wrote: > RFC series for Chelsio Inline TLS driver (chtls.ko) > > Driver use the ULP infrastructure to register chtls as Inline TLS ULP. I don't think drivers should be registering their own ULP. TLS offloading should be transparent to userspace, whatever HW ends up being used. If each driver implements its own ULP, the application has to be aware of what HW and what driver it's running on. I think this offload should rely on a generic infrastructure, not build its own private interface. Look at the current kTLS code, the proposal for an offload infrastructure [0] from Mellanox, and see how you can fit your driver into that, and extend what's missing. [0] https://patchwork.ozlabs.org/patch/849984/ [...] > Atul Gupta (9): > chtls: structure and macro definiton > cxgb4: Inline TLS FW Interface > cxgb4: LLD driver changes to enable TLS > chcr: Key Macro > chtls: Key program > chtls: CPL handler definition > chtls: Inline crypto request Tx/Rx > chtls: Register the ULP > Makefile Kconfig That patchset is split so that each patch touches a separate set of files, and the description of the contents of each patch is very limited. Can you try to group your changes by feature instead? That should help you come up with descriptive commit messages as well. Thanks, -- Sabrina
Re: [Patch net-next 2/3] net_sched: plug in qdisc ops change_tx_queue_len
On 01/19/2018 03:09 PM, Cong Wang wrote: > Introduce a new qdisc ops ->change_tx_queue_len() so that > each qdisc could decide how to implement this if it wants. > Previously we simply read dev->tx_queue_len, after pfifo_fast > switches to skb array, we need this API to resize the skb array > when we change dev->tx_queue_len. > > To avoid handling race conditions with TX BH, we need to > deactivate all TX queues before change the value and bring them > back after we are done, this also makes implementation easier. > > Cc: John Fastabend> Signed-off-by: Cong Wang > --- > include/net/sch_generic.h | 2 ++ > net/core/dev.c| 1 + > net/sched/sch_generic.c | 22 ++ > 3 files changed, 25 insertions(+) > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h > index cd1be1f25c36..aae1baa1c30f 100644 > --- a/include/net/sch_generic.h > +++ b/include/net/sch_generic.h > @@ -200,6 +200,7 @@ struct Qdisc_ops { > struct nlattr *arg, > struct netlink_ext_ack *extack); > void(*attach)(struct Qdisc *sch); > + void(*change_tx_queue_len)(struct Qdisc *, unsigned > int); > > int (*dump)(struct Qdisc *, struct sk_buff *); > int (*dump_stats)(struct Qdisc *, struct gnet_dump > *); > @@ -488,6 +489,7 @@ void qdisc_class_hash_remove(struct Qdisc_class_hash *, > void qdisc_class_hash_grow(struct Qdisc *, struct Qdisc_class_hash *); > void qdisc_class_hash_destroy(struct Qdisc_class_hash *); > > +void dev_qdisc_change_tx_queue_len(struct net_device *dev); > void dev_init_scheduler(struct net_device *dev); > void dev_shutdown(struct net_device *dev); > void dev_activate(struct net_device *dev); > diff --git a/net/core/dev.c b/net/core/dev.c > index 99d353e4cbb2..24809d858a64 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -7058,6 +7058,7 @@ int dev_change_tx_queue_len(struct net_device *dev, > unsigned long new_len) > dev->tx_queue_len = orig_len; > return res; > } > + dev_qdisc_change_tx_queue_len(dev); > } > > return 0; > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index ef8b4ecde2ac..30aaeb3c1bf1 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -1178,6 +1178,28 @@ void dev_deactivate(struct net_device *dev) > } > EXPORT_SYMBOL(dev_deactivate); > > +static void qdisc_change_tx_queue_len(struct net_device *dev, > + struct netdev_queue *dev_queue, > + void *unused) > +{ > + struct Qdisc *qdisc = dev_queue->qdisc_sleeping; > + const struct Qdisc_ops *ops = qdisc->ops; > + > + if (ops->change_tx_queue_len) > + ops->change_tx_queue_len(qdisc, dev->tx_queue_len); hmm what happens if the resize fails in the next patch, > > +static void pfifo_fast_change_tx_queue_len(struct Qdisc *sch, unsigned int > new_len) > +{ > + struct pfifo_fast_priv *priv = qdisc_priv(sch); > + int prio; > + > + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) { > + struct skb_array *q = band2list(priv, prio); > + > + skb_array_resize(q, new_len, GFP_KERNEL); > + } > +} > + Here skb_array_resize() can fail with ENOMEM, do we need to unwind the change and push the error up the stack? Thanks, John
Re: [Patch net-next 1/3] net: introduce helper dev_change_tx_queue_len()
On 01/19/2018 03:09 PM, Cong Wang wrote: > This patch promotes the local change_tx_queue_len() to a core > helper function, dev_change_tx_queue_len(), so that rtnetlink > and net-sysfs could share the code. This also prepares for the > following patch. > > Note, the -EFAULT in the original code doesn't make sense, > we should propagate the errno from notifiers. > > Cc: John Fastabend> Signed-off-by: Cong Wang > --- Acked-by: John Fastabend
Re: [PATCH] rtl8xxxu: Fix trailing semicolon
On 01/17/2018 05:56 AM, Luis de Bethencourt wrote: > The trailing semicolon is an empty statement that does no operation. > Removing it since it doesn't do anything. > > Signed-off-by: Luis de Bethencourt> --- > > Hi, > > After fixing the same thing in drivers/staging/rtl8723bs/, Joe Perches > suggested I fix it treewide [0]. > > Best regards > Luis > > > [0] > http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115410.html > [1] > http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115390.html > > drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Acked-by: Jes Sorensen > diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h > b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h > index 95e3993d8a33..8828baf26e7b 100644 > --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h > +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h > @@ -1172,7 +1172,7 @@ struct rtl8723bu_c2h { > > u8 basic_rate:1; > u8 bt_has_reset:1; > - u8 dummy4_1:1;; > + u8 dummy4_1:1; > u8 ignore_wlan:1; > u8 auto_report:1; > u8 dummy4_2:3; >
Re: ipv6_addrconf: WARNING about suspicious RCU usage
Am 20.01.2018 um 20:19 schrieb Ido Schimmel: > On Sat, Jan 20, 2018 at 10:49:03AM -0800, Eric Dumazet wrote: >> On Sat, 2018-01-20 at 15:37 +0200, Ido Schimmel wrote: >>> On Sat, Jan 20, 2018 at 12:57:01PM +0100, Heiner Kallweit wrote: Since some time (didn't bisect it yet) I get the following warning. Is it a known issue? >>> >>> [...] >>> [86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420 [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2 [86220.127082] 4 locks held by kworker/0:2/1003: [86220.127107] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [] process_one_work+0x1de/0x680 [86220.127179] #1: ((addr_chk_work).work){+.+.}, at: [ ] process_one_work+0x1de/0x680 [86220.127242] #2: (rtnl_mutex){+.+.}, at: [ ] rtnl_lock+0x12/0x20 [86220.127300] #3: (rcu_read_lock_bh){}, at: [ ] addrconf_verify_rtnl+0x1e/0x510 [ipv6] [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7 [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015 [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6] [86220.127568] Call Trace: [86220.127591] dump_stack+0x70/0x9e [86220.127616] ___might_sleep+0x14d/0x240 [86220.127644] __might_sleep+0x45/0x80 [86220.127672] kmem_cache_alloc_trace+0x53/0x250 [86220.127717] ? ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.127762] ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.127807] ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.127854] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.127903] addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.127950] ? addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.127998] addrconf_verify_work+0xe/0x20 [ipv6] [86220.128032] process_one_work+0x258/0x680 [86220.128063] worker_thread+0x35/0x3f0 [86220.128091] kthread+0x124/0x140 [86220.128117] ? process_one_work+0x680/0x680 [86220.128146] ? kthread_create_worker_on_cpu+0x40/0x40 [86220.128180] ? umh_complete+0x40/0x40 [86220.128207] ? call_usermodehelper_exec_async+0x12a/0x160 [86220.128243] ret_from_fork+0x4b/0x60 >>> >>> Can you please try attached patch (untested)? >> >> >> >> I would also/instead break rcu section. > > Thanks Eric, this should work. We can continue to block in > ipv6_create_tempaddr(). > > Heiner, can you try Eric's patch instead? > So far everything looks good with Eric's patch. The warning didn't show up again. >> >> Holding RCU (and BH) for whole hash traversal is a recipe for disaster, >> if we have thousands of IPv6 addresses. >> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c >> index >> ab99cb641b7cccdda0ad4ae553c09274d7dbc047..adda73466ae1dd0f3b700b3db5fbf3065e4d3f7f >> 100644 >> --- a/net/ipv6/addrconf.c >> +++ b/net/ipv6/addrconf.c >> @@ -4356,9 +4356,11 @@ static void addrconf_verify_rtnl(void) >> spin_lock(>lock); >> ifpub->regen_count = 0; >> spin_unlock(>lock); >> +rcu_read_unlock_bh(); >> ipv6_create_tempaddr(ifpub, >> ifp, true); >> in6_ifa_put(ifpub); >> in6_ifa_put(ifp); >> +rcu_read_lock_bh(); >> goto restart; >> } >> } else if (time_before(ifp->tstamp + >> ifp->prefered_lft * HZ - regen_advance * HZ, next)) >> >> >> >> >
Darlehen
Benötigen Sie Privat- oder Geschäftskredite ohne Stress und schnelle Zustimmung? Wenn ja, kontaktieren Sie uns bitte alexgr...@gmail.com
Re: [PATCH net-next 0/7] tcp: implement rb-tree based retransmit queue
Hi Eric, We have noticed a degradation on both of our drivers (mlx4 and mlx5) when running TCP. Exact scenario is single stream TCP with 1KB packets. The degradation is a steady 50% drop. We tracked the offending commit to be: 75c119a ("tcp: implement rb-tree based retransmit queue") Since mlx4 and mlx5 code base is completely different and by looking at the changes in this commit, we believe the issue is external to the mlx4/5 drivers. I see in the comment below you anticipated some overhead, but this may be a too common case to ignore. Can you please review and consider reverting/fixing it? Thanks, Tal G. On 10/7/2017 2:31 AM, David Miller wrote: From: Eric DumazetDate: Thu, 5 Oct 2017 22:21:20 -0700 This patch series implement RB-tree based retransmit queue for TCP, to better match modern BDP. Indeed, there was a lot of resistence to this due to the overhead for small retransmit queue sizes, but with today's scale this is long overdue. So, series applied, nice work! Maybe we can look into dynamic schemes where when the queue never goes over N entries we elide the rbtree and use a list. I'm not so sure how practical that would be. Thanks!
Re: Memory corruption with r8169 across several device revisions and kernels
Hi, Am 21.01.2018 um 21:48 schrieb Francois Romieu: > Oliver Freyermuth: > [...] > > Is it an AMD based system ? > No, all the systems on which I have observed this up to now are Intel-based. Two Haswell and one Sandy Bridge system. Cheers, Oliver
Re: Memory corruption with r8169 across several device revisions and kernels
Oliver Freyermuth: [...] Is it an AMD based system ? -- Ueimor
Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
> Hmm, this is actually consistent with the example below [1]. > > AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good > for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers > (the descriptors, went through the dma_map_page() API), but not for the > doorbell (a coherent memory, typically allocated via dma_alloc_coherent) > that requires using the stronger wmb() barrier. If x86 truely requires a wmb() (aka SFENCE) here then the userspace RDMA stuff is broken too, and that has been tested to death at this point.. I looked into this at one point and I thought I concluded that x86 did not require a SFENCE between a posted PCI write and writes to system memory to guarnetee order with-respect-to the PCI device? Well, so long as non-temporal stores and other specialty accesses are not being used.. Is there a chance a fancy sse optimized memcpy or memset, crypto or something is being involved here? However, Documentation/memory-barriers.txt does seem pretty clear that the kernel definition of wmb() makes it required here, even if it might be overkill for x86? Jason
Re: [Patch net-next 1/3] net: introduce helper dev_change_tx_queue_len()
On Sat, Jan 20, 2018 at 8:52 PM, John Fastabendwrote: > On 01/19/2018 03:09 PM, Cong Wang wrote: >> This patch promotes the local change_tx_queue_len() to a core >> helper function, dev_change_tx_queue_len(), so that rtnetlink >> and net-sysfs could share the code. This also prepares for the >> following patch. >> >> Note, the -EFAULT in the original code doesn't make sense, >> we should propagate the errno from notifiers. >> >> Cc: John Fastabend >> Signed-off-by: Cong Wang >> --- > > > [...] > >> static ssize_t tx_queue_len_store(struct device *dev, >> struct device_attribute *attr, >> const char *buf, size_t len) >> @@ -355,7 +332,7 @@ static ssize_t tx_queue_len_store(struct device *dev, >> if (!capable(CAP_NET_ADMIN)) >> return -EPERM; >> >> - return netdev_store(dev, attr, buf, len, change_tx_queue_len); >> + return netdev_store(dev, attr, buf, len, dev_change_tx_queue_len); >> } > > Is this protected by RTNL lock? If not what happens if this and do_setlink > both try to change tx queue length at the same time? Seems we could get > a race with multiple dev_deactivate/dev_activate sequences in-flight in > the following 2/3 patch. Surely it is protected by RTNL... 96 if (!rtnl_trylock()) 97 return restart_syscall(); 98 99 if (dev_isalive(netdev)) { 100 ret = (*set)(netdev, new); 101 if (ret == 0) 102 ret = len; 103 } 104 rtnl_unlock();
Re: [PATCH net-next v2 2/2] net/sched: act_csum: don't use spinlock in the fast path
On Fri, Jan 19, 2018 at 6:12 AM, Davide Carattiwrote: > static int tcf_csum_dump(struct sk_buff *skb, struct tc_action *a, int bind, > @@ -575,15 +594,18 @@ static int tcf_csum_dump(struct sk_buff *skb, struct > tc_action *a, int bind, > { > unsigned char *b = skb_tail_pointer(skb); > struct tcf_csum *p = to_tcf_csum(a); > + struct tcf_csum_params *params; > struct tc_csum opt = { > - .update_flags = p->update_flags, > .index = p->tcf_index, > - .action = p->tcf_action, > .refcnt = p->tcf_refcnt - ref, > .bindcnt = p->tcf_bindcnt - bind, > }; > struct tcf_t t; > > + params = rcu_dereference(p->params); I think you need rtnl_dereference() here, as we don't have RCU read lock here? > + opt.action = params->action; > + opt.update_flags = params->update_flags; > + Thanks.
Re: [patch iproute2 net-next v12 0/3] tc: shared block support
On 1/20/18 3:00 AM, Jiri Pirko wrote: > From: Jiri Pirko> > Kernel allows to share all filters between qdiscs with use > of shared block. > > Example: > > block number 22. "22" is just an identification: > $ tc qdisc add dev ens7 ingress_block 22 ingress > > $ tc qdisc add dev ens8 ingress_block 22 ingress > > > If we don't specify "block" command line option, no shared block would > be created: > $ tc qdisc add dev ens9 ingress > > Now if we list the qdiscs, we will see the block index in the output: > > $ tc qdisc > qdisc ingress : dev ens7 parent :fff1 ingress_block 22 > qdisc ingress : dev ens8 parent :fff1 ingress_block 22 > qdisc ingress : dev ens9 parent :fff1 > > > To make is more visual, the situation looks like this: > >ens7 ingress qdisc ens7 ingress qdisc > | | > | | > +--> block 22 <--+ > > Unlimited number of qdiscs may share the same block. > > Block sharing is also supported for clsact qdisc: > $ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact > $ tc qdisc show dev ens10 > qdisc clsact : dev ens10 parent :fff1 ingress_block 23 egress_block 24 > > > We can add filter using the block index: > > $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 > action drop > > > Note we cannot use the qdisc for filter manipulations of shared blocks: > > $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip > 192.168.100.2 action drop > Error: This filter block is shared. Please use the block index to manipulate > the filters. > > > We will see the same output if we list filters for ingress qdisc of > ens7 and ens8, also for the block 22: > > $ tc filter show block 22 > filter protocol ip pref 25 flower chain 0 > filter protocol ip pref 25 flower chain 0 handle 0x1 > ... > > $ tc filter show dev ens7 ingress > filter block 22 protocol ip pref 25 flower chain 0 > filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 > ... > > $ tc filter show dev ens8 ingress > filter block 22 protocol ip pref 25 flower chain 0 > filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 > ... > > --- > v11->v12: > - separate iproute2 patchset, kernel part is merged > - original patch 1 is removed as the header changes are already > in iproute2 code > - patch 1: > - fixed return type of tc_qdisc_block_exists > - removed 0 checks for block in tc_qdisc_block_exists > - rever xmas tree variables in tc_qdisc_block_exists > - patch 2 > - fixes error message when both dev and block are on the command line > > --- > Note that if you want me to change the ">=" vs "==" thing, I will do it. > But I think that it is a global iproute2 thing and should be changed in > a separate patch/patchset. IMO, it should be checking for ==. If the payload is not of an expected length, then either the kernel and userspace differ on the API or the parsing has gone off the rails. FWIW, all of the the 'RTA_PAYLOAD >=' checks are under tc/. Series applied to iproute2-next. Thanks, Jiri.
Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support
> However interesting as an example, I'm not convinced this is what we > should aim for. > > ACPI is not a replacement for DT, and it is unlikely that people would > be interested in running ACPI-only distros such as RHEL on their > Ethernet switch. DT is excellent at describing this, and there is no > need to replace it. I however do know of somebody who is building an industrial wireless access point, with a Marvell Ethernet switch. They have chosen to use an Intel CPU, and want to run CentOS on it, because that is what they know. So they will be using ACPI whatever, for the basic board support. They then have a choice of either ACPI for the switch, or having device tree as well as ACPI for the switch. And it will be interesting to see how that works. Can DT reference GPIOs and I2C busses in ACPI? > Also, please bear in mind that the ACPI spec is owned by the UEFI/ACPI > forum, and only members (who are all under a contract regarding > reasonable and non-discriminatory (RAND) licensing terms to IP they > own that is covered by the spec) can contribute. Individuals can > become members for free AFAIR, but that doesn't mean individuals are > typically interested in getting involved in a process that is rather > tedious and bureaucratic. And this is a bit what i'm worried about. How you represent MDIO busses, PHYs, Ethernet switches is implemented once in the core Linux code. So i would be interested in knowing what happens if the Linux community defines and implements something, boards start using it, but a year later the tedious and bureaucratic process rejects it, re-writes it, in an incompatible way. Andrew
Re: [PATCH iproute2 0/6] utils: Get rid of inet_get_addr()
On 1/18/18 11:13 AM, Serhey Popovych wrote: > It looks confusing to have multiple independent > routines to get internet address from it's string > representation: get_addr() and inet_get_addr(). > > Most complicated users of inet_get_addr() is > iplink_geneve.c and iplink_vxlan.c because they > required to handle both AF_INET and AF_INET6 > for their local/remote endpoints. > > On the other hand get_addr() does not provide > additional information like address type: need > to address this. to get rid of current and > possible future code duplications. Note that > this functionality is first step to make proto > independent handling of local/remote endpoints > in ip/tunnel code (there will be additional > series based on this one). > > Also fix get_addr_1() and get_prefix() to make > sure it always provide correct ->family and > ->bitlen. > > As always comments, suggestions and criticism > are welcome. Series applied to iproute2-next. Thanks,
Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support
On 21 January 2018 at 16:13, Andrew Lunnwrote: >> Right. So if you need to have some additional "parameters" with the >> connection, then I suppose you may want to go with the GenericSerialBus >> route. However, looking at the sample device tree description: >> >> davinci_mdio: ethernet@5c03 { >> compatible = "ti,davinci_mdio"; >> reg = <0x5c03 0x1000>; >> #address-cells = <1>; >> #size-cells = <0>; >> >> reset-gpios = < 5 GPIO_ACTIVE_LOW>; >> reset-delay-us = <2>; >> >> ethphy0: ethernet-phy@1 { >> reg = <1>; >> }; >> >> ethphy1: ethernet-phy@3 { >> reg = <3>; >> }; >> }; >> >> would pretty much translate directly to this in ACPI if you don't need >> any additional attributes: >> >> Device (ETH0) { >> Name (_ADR, /* PCI address of the NIC */) >> >> Device (PHY0) { >> Name (_ADR, 1) >> ... >> } >> >> Device (PHY1) { >> Name (_ADR, 3) >> ... >> } >> } >> >> which looks pretty simple to me. You can also use _DSM and _DSD here to >> pass information (like the protocol number) for the PHY devices to Linux. > > I'm not particularly worried about that simple case. Other than, i > don't want people to think that is all that is required. > > For a more full example, take a look at vf610-zii-dev-rev-b.dts. The > Freescale FEC Ethernet controller provides the base MDIO device, > mdio1. On top of this is an MDIO mux, using a few GPIO lines to > enable/disable 3 child MDIO busses. Each of these busses has an > Ethernet Switch. The Ethernet switch exports up to two MDIO busses, > and on these busses are Ethernet PHYs which are embedded inside the > switch. The Ethernet switches are also interrupt controllers, with the > PHYs having interrupt properties which point back to the interrupt > controller in the switch. > > So i'm interested in an ACPI proposal which supports this board. > However interesting as an example, I'm not convinced this is what we should aim for. ACPI is not a replacement for DT, and it is unlikely that people would be interested in running ACPI-only distros such as RHEL on their Ethernet switch. DT is excellent at describing this, and there is no need to replace it. ACPI is about firmware abstractions: you don't need to describe every stacked interrupt controller in minute detail to the OS if the firmware configures it sufficiently. That way, the OS does not need to know all these details, and vendors can update their hardware without having to update the software as well. (Or that is the idea at least, how that works out in practice on arm64 systems remains to be seen) Taking the Marvell 8040 as an example: the ACPI description does not expose the ICU interrupt controllers to the OS, but the firmware configures them and describes their configured state as ordinary GIC interrupts. Also, please bear in mind that the ACPI spec is owned by the UEFI/ACPI forum, and only members (who are all under a contract regarding reasonable and non-discriminatory (RAND) licensing terms to IP they own that is covered by the spec) can contribute. Individuals can become members for free AFAIR, but that doesn't mean individuals are typically interested in getting involved in a process that is rather tedious and bureaucratic. So my suggestion would be to find out to what extent the latest version of the spec can describe non-trivial MDIO topologies, and hopefully that includes the mvpp2 on the Marvell 8040.
Re: [PATCH 00/32] Netfilter/IPVS updates for net-next
From: Pablo Neira AyusoDate: Fri, 19 Jan 2018 20:10:09 +0100 > The following patchset contains Netfilter/IPVS updates for your net-next > tree. Basically, a new extension for ip6tables, simplification work of > nf_tables that saves us 500 LoC, allow raw table registration before > defragmentation, conversion of the SNMP helper to use the ASN.1 code > generator, unique 64-bit handle for all nf_tables objects and fixes to > address fallout from previous nf-next batch. More specifically, they > are: ... > You can pull these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git Pulled, thanks.
Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote: > > On 21/01/2018 11:31 AM, Tariq Toukan wrote: > > > > > > On 19/01/2018 5:49 PM, Eric Dumazet wrote: > > > On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: > > > > Hi Tariq > > > > > > > > Very sad that the crash was reproduced again after applied the patch. > > Memory barriers vary for different Archs, can you please share more > details regarding arch and repro steps? Yeah, mlx4 NICs in Google fleet receive trillions of packets per second, and we never noticed an issue. Although we are using a slightly different driver, using order-0 pages and fast pages recycling.
Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
On 21/01/2018 11:31 AM, Tariq Toukan wrote: On 19/01/2018 5:49 PM, Eric Dumazet wrote: On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: Hi Tariq Very sad that the crash was reproduced again after applied the patch. Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring) static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) { + dma_wmb(); So... is wmb() here fixing the issue ? *ring->wqres.db.db = cpu_to_be32(ring->prod & 0x); } I analyzed the kdump, it should be a memory corruption. Thanks Jianchao Hmm, this is actually consistent with the example below [1]. AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers (the descriptors, went through the dma_map_page() API), but not for the doorbell (a coherent memory, typically allocated via dma_alloc_coherent) that requires using the stronger wmb() barrier. [1] Documentation/memory-barriers.txt (*) dma_wmb(); (*) dma_rmb(); These are for use with consistent memory to guarantee the ordering of writes or reads of shared memory accessible to both the CPU and a DMA capable device. For example, consider a device driver that shares memory with a device and uses a descriptor status value to indicate if the descriptor belongs to the device or the CPU, and a doorbell to notify it when new descriptors are available: if (desc->status != DEVICE_OWN) { /* do not read data until we own descriptor */ dma_rmb(); /* read/modify data */ read_data = desc->data; desc->data = write_data; /* flush modifications before status update */ dma_wmb(); /* assign ownership */ desc->status = DEVICE_OWN; /* force memory to sync before notifying device via MMIO */ wmb(); /* notify device of new descriptors */ writel(DESC_NOTIFY, doorbell); } The dma_rmb() allows us guarantee the device has released ownership before we read the data from the descriptor, and the dma_wmb() allows us to guarantee the data is written to the descriptor before the device can see it now has ownership. The wmb() is needed to guarantee that the cache coherent memory writes have completed before attempting a write to the cache incoherent MMIO region. See Documentation/DMA-API.txt for more information on consistent memory. On 01/15/2018 01:50 PM, jianchao.wang wrote: Hi Tariq Thanks for your kindly response. On 01/14/2018 05:47 PM, Tariq Toukan wrote: Thanks Jianchao for your patch. And Thank you guys for your reviews, much appreciated. I was off-work on Friday and Saturday. On 14/01/2018 4:40 AM, jianchao.wang wrote: Dear all Thanks for the kindly response and reviewing. That's really appreciated. On 01/13/2018 12:46 AM, Eric Dumazet wrote: Does this need to be dma_wmb(), and should it be in mlx4_en_update_rx_prod_db ? +1 on dma_wmb() On what architecture bug was observed ? This issue was observed on x86-64. And I will send a new patch, in which replace wmb() with dma_wmb(), to customer to confirm. +1 on dma_wmb, let us know once customer confirms. Please place it within mlx4_en_update_rx_prod_db as suggested. Yes, I have recommended it to customer. Once I get the result, I will share it here. All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. Thanks, Tariq -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support
> Right. So if you need to have some additional "parameters" with the > connection, then I suppose you may want to go with the GenericSerialBus > route. However, looking at the sample device tree description: > > davinci_mdio: ethernet@5c03 { > compatible = "ti,davinci_mdio"; > reg = <0x5c03 0x1000>; > #address-cells = <1>; > #size-cells = <0>; > > reset-gpios = < 5 GPIO_ACTIVE_LOW>; > reset-delay-us = <2>; > > ethphy0: ethernet-phy@1 { > reg = <1>; > }; > > ethphy1: ethernet-phy@3 { > reg = <3>; > }; > }; > > would pretty much translate directly to this in ACPI if you don't need > any additional attributes: > > Device (ETH0) { > Name (_ADR, /* PCI address of the NIC */) > > Device (PHY0) { > Name (_ADR, 1) > ... > } > > Device (PHY1) { > Name (_ADR, 3) > ... > } > } > > which looks pretty simple to me. You can also use _DSM and _DSD here to > pass information (like the protocol number) for the PHY devices to Linux. I'm not particularly worried about that simple case. Other than, i don't want people to think that is all that is required. For a more full example, take a look at vf610-zii-dev-rev-b.dts. The Freescale FEC Ethernet controller provides the base MDIO device, mdio1. On top of this is an MDIO mux, using a few GPIO lines to enable/disable 3 child MDIO busses. Each of these busses has an Ethernet Switch. The Ethernet switch exports up to two MDIO busses, and on these busses are Ethernet PHYs which are embedded inside the switch. The Ethernet switches are also interrupt controllers, with the PHYs having interrupt properties which point back to the interrupt controller in the switch. So i'm interested in an ACPI proposal which supports this board. Andrew
[PATCH] r8169: improve multicast hash filter handling
The multicast hash filter is a 64 bit value and the code can be simplified by using a u64 variable. Signed-off-by: Heiner Kallweit--- drivers/net/ethernet/realtek/r8169.c | 24 ++-- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 272c5962e..9afe7cd4d 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -5572,7 +5572,7 @@ static void rtl_set_rx_mode(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - u32 mc_filter[2]; /* Multicast hash filter */ + u64 mc_filter; /* Multicast hash filter */ int rx_mode; u32 tmp = 0; @@ -5582,20 +5582,20 @@ static void rtl_set_rx_mode(struct net_device *dev) rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys | AcceptAllPhys; - mc_filter[1] = mc_filter[0] = 0x; + mc_filter = ~0ULL; } else if ((netdev_mc_count(dev) > multicast_filter_limit) || (dev->flags & IFF_ALLMULTI)) { /* Too many to filter perfectly -- accept all multicasts. */ rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys; - mc_filter[1] = mc_filter[0] = 0x; + mc_filter = ~0ULL; } else { struct netdev_hw_addr *ha; rx_mode = AcceptBroadcast | AcceptMyPhys; - mc_filter[1] = mc_filter[0] = 0; + mc_filter = 0ULL; netdev_for_each_mc_addr(ha, dev) { int bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26; - mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31); + mc_filter |= BIT_ULL(bit_nr); rx_mode |= AcceptMulticast; } } @@ -5605,18 +5605,14 @@ static void rtl_set_rx_mode(struct net_device *dev) tmp = (RTL_R32(RxConfig) & ~RX_CONFIG_ACCEPT_MASK) | rx_mode; - if (tp->mac_version > RTL_GIGA_MAC_VER_06) { - u32 data = mc_filter[0]; - - mc_filter[0] = swab32(mc_filter[1]); - mc_filter[1] = swab32(data); - } + if (tp->mac_version > RTL_GIGA_MAC_VER_06) + swab64s(_filter); if (tp->mac_version == RTL_GIGA_MAC_VER_35) - mc_filter[1] = mc_filter[0] = 0x; + mc_filter = ~0ULL; - RTL_W32(MAR0 + 4, mc_filter[1]); - RTL_W32(MAR0 + 0, mc_filter[0]); + RTL_W32(MAR0 + 4, mc_filter >> 32); + RTL_W32(MAR0 + 0, mc_filter & GENMASK_ULL(31, 0)); RTL_W32(RxConfig, tmp); } -- 2.16.0
[PATCH] net: gemini: Depend on HAS_IOMEM
The zeroday builder notices that since Usermode Linux does not have IO memory, the build fails for them when selecting everything it can enable. As the driver is clearly using memory-mapped registers to access the network adapter, we add depends on HAS_IOMEM to solve this problem. Reported-by: kbuild test robotSigned-off-by: Linus Walleij --- drivers/net/ethernet/cortina/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/cortina/Kconfig b/drivers/net/ethernet/cortina/Kconfig index 0df743ea51f1..89bc4579724d 100644 --- a/drivers/net/ethernet/cortina/Kconfig +++ b/drivers/net/ethernet/cortina/Kconfig @@ -14,6 +14,7 @@ if NET_VENDOR_CORTINA config GEMINI_ETHERNET tristate "Gemini Gigabit Ethernet support" depends on OF + depends on HAS_IOMEM select PHYLIB select CRC32 ---help--- -- 2.14.3
Re: [net-next:master 308/357] gemini.c:undefined reference to `devm_ioremap_resource'
On Sat, Jan 20, 2018 at 12:33 PM, kbuild test robotwrote: > tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git > master > head: 8565d26bcb2ff6df646e946d2913fcf706d46b66 > commit: 4d5ae32f5e1e13f7f36d6439ec3257993b9f5b88 [308/357] net: ethernet: Add > a driver for Gemini gigabit ethernet > config: um-allyesconfig (attached as .config) > compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 > reproduce: > git checkout 4d5ae32f5e1e13f7f36d6439ec3257993b9f5b88 > # save the attached .config to linux build tree > make ARCH=um depends on HAS_IOMEM right? Yeah UM is a real special case. I'll send a patch. Yours, Linus Walleij
Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support
On Sun, Jan 21, 2018 at 02:08:40AM +0100, Andrew Lunn wrote: > > I'm not familiar with MDIO bus but an alternative to GeneriSerialBus > > would be to follow what SDIO is doing, e.g have the PHY devices listed > > below the MDIO controller and use _ADR to describe their "address" on > > that bus. You can see how _ADR applies to SDIO bus from ACPI spec. > > Hi Mika > > SDIO is not a serial bus, well it can be in its simplest form, but > high speed implementations have 4 data lines. So i can understand them > not using GenericSerialBus. Well, I think SDIO is more of a serial bus pretty much the same way than SPI but it can support more data lines as needed (in the same way than SPI). But I'm not an expert in SDIO. However, that's not the point here :-) SATA (which is definitely a serial bus) uses the same mechanism and not GenericSerialBus. > MDIO is a serial bus, very similar to SPI, I2C, and UART. > > > If you go with the SDIO way then each PHY is described as normal ACPI > > device and you can use ACPI _HID/_CID to match the device to the > > corresponding driver. > > Just some background here. If you have a plain PHY as a device on an > MDIO bus, you don't need to match it to a driver within ACPI. > Registers 2 and 3 contain a vendor and product ID. That is what it > used to match the device to the driver. > > What you might need to know is the protocol to talk on the bus. Most > devices use clause 22 protocol. A few devices are clause 45. 22 is the > default in Linux, and you need to indicate if 45 should be used. You > can also indicate 22. > > It gets more complex when the device on the bus is not a PHY. It is a > generic bus, you can connect anything to it. Ethernet switches can be > on the bus. They generally cannot be identified using registers 2 and > 3. So you do need to match the device to the driver. Most do have ID > registers, so the driver can work out what specific device is on the > bus. However, Marvell moved the ID registers on there newer generation > of devices, so we need to give the driver a hint where to look. So in > device tree, we have two different compatible string. > > Broadcom really do use it as a generic bus. They have their USB PHYs > and PCIE PHYs on an MDIO bus. In DT, they have compatible strings to > match the device to the driver, as normal. OK, thanks for the explanation. > We need to ensure what we define for ACPI has the same level of > flexibility. Right. So if you need to have some additional "parameters" with the connection, then I suppose you may want to go with the GenericSerialBus route. However, looking at the sample device tree description: davinci_mdio: ethernet@5c03 { compatible = "ti,davinci_mdio"; reg = <0x5c03 0x1000>; #address-cells = <1>; #size-cells = <0>; reset-gpios = < 5 GPIO_ACTIVE_LOW>; reset-delay-us = <2>; ethphy0: ethernet-phy@1 { reg = <1>; }; ethphy1: ethernet-phy@3 { reg = <3>; }; }; would pretty much translate directly to this in ACPI if you don't need any additional attributes: Device (ETH0) { Name (_ADR, /* PCI address of the NIC */) Device (PHY0) { Name (_ADR, 1) ... } Device (PHY1) { Name (_ADR, 3) ... } } which looks pretty simple to me. You can also use _DSM and _DSD here to pass information (like the protocol number) for the PHY devices to Linux.
Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
On 19/01/2018 5:49 PM, Eric Dumazet wrote: On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: Hi Tariq Very sad that the crash was reproduced again after applied the patch. --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring) static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) { + dma_wmb(); So... is wmb() here fixing the issue ? *ring->wqres.db.db = cpu_to_be32(ring->prod & 0x); } I analyzed the kdump, it should be a memory corruption. Thanks Jianchao Hmm, this is actually consistent with the example below [1]. AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers (the descriptors, went through the dma_map_page() API), but not for the doorbell (a coherent memory, typically allocated via dma_alloc_coherent) that requires using the stronger wmb() barrier. [1] Documentation/memory-barriers.txt (*) dma_wmb(); (*) dma_rmb(); These are for use with consistent memory to guarantee the ordering of writes or reads of shared memory accessible to both the CPU and a DMA capable device. For example, consider a device driver that shares memory with a device and uses a descriptor status value to indicate if the descriptor belongs to the device or the CPU, and a doorbell to notify it when new descriptors are available: if (desc->status != DEVICE_OWN) { /* do not read data until we own descriptor */ dma_rmb(); /* read/modify data */ read_data = desc->data; desc->data = write_data; /* flush modifications before status update */ dma_wmb(); /* assign ownership */ desc->status = DEVICE_OWN; /* force memory to sync before notifying device via MMIO */ wmb(); /* notify device of new descriptors */ writel(DESC_NOTIFY, doorbell); } The dma_rmb() allows us guarantee the device has released ownership before we read the data from the descriptor, and the dma_wmb() allows us to guarantee the data is written to the descriptor before the device can see it now has ownership. The wmb() is needed to guarantee that the cache coherent memory writes have completed before attempting a write to the cache incoherent MMIO region. See Documentation/DMA-API.txt for more information on consistent memory. On 01/15/2018 01:50 PM, jianchao.wang wrote: Hi Tariq Thanks for your kindly response. On 01/14/2018 05:47 PM, Tariq Toukan wrote: Thanks Jianchao for your patch. And Thank you guys for your reviews, much appreciated. I was off-work on Friday and Saturday. On 14/01/2018 4:40 AM, jianchao.wang wrote: Dear all Thanks for the kindly response and reviewing. That's really appreciated. On 01/13/2018 12:46 AM, Eric Dumazet wrote: Does this need to be dma_wmb(), and should it be in mlx4_en_update_rx_prod_db ? +1 on dma_wmb() On what architecture bug was observed ? This issue was observed on x86-64. And I will send a new patch, in which replace wmb() with dma_wmb(), to customer to confirm. +1 on dma_wmb, let us know once customer confirms. Please place it within mlx4_en_update_rx_prod_db as suggested. Yes, I have recommended it to customer. Once I get the result, I will share it here. All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. Thanks, Tariq -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html