Re: [PATCH net-next] nfp: flower: prioritize stats updates

2018-01-21 Thread Jakub Kicinski
On Sun, Jan 21, 2018 at 3:10 PM, David Miller  wrote:
> From: Jakub Kicinski 
> Date: Fri, 19 Jan 2018 17:54:08 -0800
>
>> From: Pieter Jansen van Vuuren 
>>
>> Previously it was possible to interrupt processing stats updates because
>> they were handled in a work queue. Interrupting the stats updates could
>> lead to a situation where we backup the control message queue. This patch
>> moves the stats update processing out of the work queue to be processed as
>> soon as hardware sends a request.
>>
>> Reported-by: Louis Peens 
>> Signed-off-by: Pieter Jansen van Vuuren 
>> 
>> Reviewed-by: Dirk van der Merwe 
>> Reviewed-by: Jakub Kicinski 
>
> Applied, thanks Jakub.

Thank you!

> Just a reminder that it really makes things more difficult that one
> can't just go:
>
> make drivers/net/ethernet/netronome/nfp/flower/cmsg.o
>
> to test a specific NFP object build like one can with pretty much the
> rest of the entire kernel tree.

Oh, thanks for letting me know, I wasn't aware the build system seems
to require a separate Makefile in each directory..  I'll fix it first
thing in the morning!


[RFC PATCH net-next] net: aquantia: aq_fw2x_get_mac_permanent() can be static

2018-01-21 Thread kbuild test robot

Fixes: a57d3929b838 ("net: aquantia: Introduce support for new firmware on AQC 
cards")
Signed-off-by: Fengguang Wu 
---
 hw_atl_utils_fw2x.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c 
b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
index 8cfce95..39cd3a2 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
@@ -107,7 +107,7 @@ static int aq_fw2x_update_link_status(struct aq_hw_s *self)
return 0;
 }
 
-int aq_fw2x_get_mac_permanent(struct aq_hw_s *self, u8 *mac)
+static int aq_fw2x_get_mac_permanent(struct aq_hw_s *self, u8 *mac)
 {
int err = 0;
u32 h = 0U;


Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.

2018-01-21 Thread Benjamin Poirier
On 2018/01/20 09:21, Alexander Duyck wrote:
> On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier
>  wrote:
> > On 2018/01/20 07:45, Benjamin Poirier wrote:
> > [...]
> >> >
> >> > I'm of the mind that we need to cut down on the code thrash.  This
> >> > driver is supposed to have been in a "maintenance" mode for the last
> >> > year or so as there aren't being any new parts added is my
> >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
> >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
> >> > accepted in the first place. I don't see any notes about it fixing any
> >> > bug or addressing any issue and it seems like that is the start of all
> >> > the issues we have been having recently with RXO triggering more
> >> > interrupts, various link issues, and this most recent VMware issue.
> >>
> >> I'm sorry to say but you're the one who suggested that change:
> >>
> >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
> >>
> >> On 2015/10/28 23:08, Alexander Duyck wrote:
> >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
> >> [...]
> >> >
> >> > I would argue your first patch probably didn't go far enough to remove 
> >> > dead
> >> > code.  Specifically you should only ever get into this function if LSC is
> >> > set.  There are no other causes that should trigger this.  As such you 
> >> > could
> >> > probably remove the ICR read, and instead replace it with an ICR write of
> >> > the LSC bit since OTHER is already cleared via EIAC.
> >> >
> >
> > ... The assumption that "There are no other causes that should trigger
> > this." turned out to be wrong and that caused the RXO problems. Clearing
> > OTHER via EIAC is causing the problems with vmware now. I don't think
> > you foresaw those problems back in 2015 and neither did I.
> 
> Well that explains why I felt like I was explaining things to a
> younger version of myself. I was a bit more relaxed in terms of being
> willing to make arbitrary changes a few years ago. I tend to be a bit
> more conservative now, at least as far as having justifications as to
> why I want to do things. With any change you end up taking on risk,
> and so usually a patch has a justification as to why you are making
> the change.
> 
> Unfortunately at the time I didn't have all the information and based
> my suggestion on a bad assumption. I would guess at the time I was
> thinking of doing general code cleanup. Other drivers such as igb work
> this way, but it led us down the path we are on now where we are
> having to make one fix after another. It is leading in the opposite
> direction of maintainability as this is all becoming more complex.
> Suggesting this was a bad decision on my part at the time. I'm only
> human, I make mistakes.. :-)

Thanks for the introspection.

> 
> With further review of the code I am seeing various other issues that
> could still pop up as I am not certain we should even have the "other"
> interrupt messing with the NAPI polling or packet accounting logic at
> all. The question I would have at this point is if we revert
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
> and all the related fixes for it, what do we end up with?

The patch I submitted for the current vmware issue actually finishes
reverting commit 16ecba59bc33.

I believe the relevant commits to consider are:

16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1)
a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1)
0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)

19110cfbb34d e1000e: Separate signaling for link check/link up
 (v4.15-rc1)
4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)

4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return
 value. (v4.15-rc8)

(submitted)  e1000e: Remove Other from EIAC.

commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of
16ecba59bc33 (ICR read). The submitted patch reverts the rest of
16ecba59bc33 (EIAC clearing of Other).

> It seems
> like the code is slowly heading back in the direction of where it was
> originally anyway since there have been a number of partial reverts.
> I'm wondering what would happen if we were to just short-cut that and
> audit the patches involved to see what we really need and don't.
> 
> Your patch as proposed is essentially another step in that direction.
> I'm thinking we may want to drop my currently proposed fix for now and
> instead look at going through and figure out what changes after that
> first one are still really needed.

If the patch that I submitted for the current vmware issue is merged,
the significant commits that are left are:

0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
Fixes a problem in the irq disabling of the napi implementation.

19110cfbb34d e1000e: Separate signaling for link check/link up
 (v4.15-rc1)
Fixes link flapping 

[RFC PATCH net-next] net: aquantia: hw_atl_utils_mpi_set_speed() can be static

2018-01-21 Thread kbuild test robot

Fixes: 0c58c35f02c2 ("net: aquantia: Introduce firmware ops callbacks")
Signed-off-by: Fengguang Wu 
---
 hw_atl_utils.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c 
b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 967f0fd..ad87338 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -483,7 +483,7 @@ void hw_atl_utils_mpi_read_stats(struct aq_hw_s *self,
 err_exit:;
 }
 
-int hw_atl_utils_mpi_set_speed(struct aq_hw_s *self, u32 speed)
+static int hw_atl_utils_mpi_set_speed(struct aq_hw_s *self, u32 speed)
 {
u32 val = aq_hw_read_reg(self, HW_ATL_MPI_CONTROL_ADR);
 


Re: [PATCHv3] 3c59x: fix missing dma_mapping_error check and bad ring refill logic

2018-01-21 Thread tedheadster
On Wed, Jan 3, 2018 at 1:44 PM, David Miller  wrote:
> From: Neil Horman 
> Date: Wed,  3 Jan 2018 13:09:23 -0500
>
>> A few spots in 3c59x missed calls to dma_mapping_error checks, casuing
>> WARN_ONS to trigger.  Clean those up.  While we're at it, refactor the
>> refill code a bit so that if skb allocation or dma mapping fails, we
>> recycle the existing buffer.  This prevents holes in the rx ring, and
>> makes for much simpler logic
>>
>> Note: This is compile only tested.  Ted, if you could run this and
>> confirm that it continues to work properly, I would appreciate it, as I
>> currently don't have access to this hardware
>>

Neil,
  I was able to test this patch. I did not get any WARN_ON messages.
However, I am getting a lot of dropped receive packets; uptime is 11
minutes and it has already dropped 214 of 743 receive packets.

Admittedly this is on a slow i486 regression testing system, but the
drop rate is approximately 30% which seems high even for this system
because it is on a very quiet switched network.

I enabled some debugging messages by setting msglvl to 4 and
recompiling with DYNAMIC_DEBUG=y. I did not see any messages of the
form "No memory to allocate a sk_buff of size" so that leaves the
following two cases:

boomerang_rx()
...
newskb = netdev_alloc_skb_ip_align(dev, PKT_BUF_SZ);
if (!newskb) {
  dev->stats.rx_dropped++;
  goto clear_complete;
  }
  newdma = pci_map_single(VORTEX_PCI(vp), newskb->data,
PKT_BUF_SZ, PCI_DMA_FROMDEVICE)
  if (dma_mapping_error(_PCI(vp)->dev, newdma)) {
dev->stats.rx_dropped++;
consume_skb(newskb);
goto clear_complete;
  }

What shall we do to determine if it is hitting the pci_map_single() or
netdev_alloc_skb_ip_align() failure?

- Matthew


Re: wcn36xx: release resources in case of error

2018-01-21 Thread Kalle Valo
Ramon Fried  wrote:

> wcn36xx_dxe_init() doesn't check for the return value
> of wcn36xx_dxe_init_descs().
> This patch releases the resources in case an error ocurred.
> 
> Signed-off-by: Ramon Fried 

This is a duplicate.

Patch set to Superseded.

-- 
https://patchwork.kernel.org/patch/10173331/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: wcn36xx: release resources in case of error

2018-01-21 Thread Kalle Valo
Ramon Fried  wrote:

> wcn36xx_dxe_init() doesn't check for the return value
> of wcn36xx_dxe_init_descs().
> This patch releases the resources in case an error ocurred.
> 
> Signed-off-by: Ramon Fried 

This is a duplicate.

Patch set to Superseded.

-- 
https://patchwork.kernel.org/patch/10173329/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: wcn36xx: release resources in case of error

2018-01-21 Thread Kalle Valo
Ramon Fried  wrote:

> wcn36xx_dxe_init() doesn't check for the return value
> of wcn36xx_dxe_init_descs().
> This patch releases the resources in case an error ocurred.
> 
> Signed-off-by: Ramon Fried 

Doesn't compile:

drivers/net/wireless/ath/wcn36xx/dxe.c: In function ‘wcn36xx_dxe_deinit_descs’:
drivers/net/wireless/ath/wcn36xx/dxe.c:243:15: error: ‘struct wcn36xx_dxe_ch’ 
has no member named ‘desc_bum’
  size = wcn_ch->desc_bum * sizeof(struct wcn36xx_dxe_desc);
   ^
drivers/net/wireless/ath/wcn36xx/dxe.c:244:20: error: ‘wcn’ undeclared (first 
use in this function)
  dma_free_coherent(wcn->dev, size,
^
drivers/net/wireless/ath/wcn36xx/dxe.c:244:20: note: each undeclared identifier 
is reported only once for each function it appears in
drivers/net/wireless/ath/wcn36xx/dxe.c: In function ‘wcn36xx_dxe_init’:
drivers/net/wireless/ath/wcn36xx/dxe.c:737:3: error: implicit declaration of 
function ‘dev_error’ [-Werror=implicit-function-declaration]
   dev_error("Error allocating descriptor\n");
   ^
drivers/net/wireless/ath/wcn36xx/dxe.c:818:2: error: expected ‘;’ before ‘if’
  if (ret) {
  ^
drivers/net/wireless/ath/wcn36xx/dxe.c:860:1: warning: label ‘out_err_txh_ch’ 
defined but not used [-Wunused-label]
 out_err_txh_ch:
 ^
drivers/net/wireless/ath/wcn36xx/dxe.c:856:1: warning: label ‘out_err_rxh_ch’ 
defined but not used [-Wunused-label]
 out_err_rxh_ch:
 ^
drivers/net/wireless/ath/wcn36xx/dxe.c:760:3: error: label ‘our_err_txh_ch’ 
used but not defined
   goto our_err_txh_ch;
   ^
cc1: some warnings being treated as errors
make[5]: *** [drivers/net/wireless/ath/wcn36xx/dxe.o] Error 1
make[4]: *** [drivers/net/wireless/ath/wcn36xx] Error 2
make[3]: *** [drivers/net/wireless/ath] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [drivers] Error 2

Patch set to Changes Requested.

-- 
https://patchwork.kernel.org/patch/10173325/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



[PATCH net-next] net: link_watch: mark bonding link events urgent

2018-01-21 Thread Roopa Prabhu
From: Roopa Prabhu 

It takes 1sec for bond link down notification to hit user-space
when all slaves of the bond go down. 1sec is too long for
protocol daemons in user-space relying on bond link notification
to failover/recover (eg: multichassis lag implementations in user-space).
Since the link event code already marks team device port link events
 urgent, this patch does the same for bonding link events.

Signed-off-by: Roopa Prabhu 
---
 net/core/link_watch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 9828616..63bb2ad 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -92,7 +92,7 @@ static bool linkwatch_urgent_event(struct net_device *dev)
if (dev->ifindex != dev_get_iflink(dev))
return true;
 
-   if (dev->priv_flags & IFF_TEAM_PORT)
+   if (dev->priv_flags & (IFF_TEAM_PORT | IFF_BONDING))
return true;
 
return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
-- 
2.1.4



Re: [v3, 3/6] dt: booting-without-of: DT fix s/#interrupt-cell/#interrupt-cells/

2018-01-21 Thread Michael Ellerman
On Fri, 2017-06-02 at 12:38:46 UTC, Geert Uytterhoeven wrote:
> Signed-off-by: Geert Uytterhoeven 
> Acked-by: Rob Herring 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4be4119d1fbd93c44d5c639735c312

cheers


[PATCH net-next] tun: add missing rcu annotation

2018-01-21 Thread Jason Wang
This patch fixes the following sparse warnings:

drivers/net/tun.c:2241:15: error: incompatible types in comparison expression 
(different address spaces)

Fixes: cd5681d7d890 ("tuntap: rename struct tun_steering_prog to struct 
tun_prog")
Cc: Daniel Borkmann 
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 6988746..2bc18b1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2225,7 +2225,8 @@ static void tun_prog_free(struct rcu_head *rcu)
kfree(prog);
 }
 
-static int __tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p,
+static int __tun_set_ebpf(struct tun_struct *tun,
+ struct tun_prog __rcu **prog_p,
  struct bpf_prog *prog)
 {
struct tun_prog *old, *new = NULL;
-- 
2.7.4



Re: net merged into net-next

2018-01-21 Thread Jason Wang



On 2018年01月21日 04:17, Cong Wang wrote:

On Fri, Jan 19, 2018 at 8:02 PM, David Miller  wrote:

Cong, please check my conflict resolution of drivers/net/tun.c, thank
you.

It looks good to me except I am not sure about the xdp_rxq_info_unreg()
inside tun_cleanup_tx_ring().


Looks correct to me.

Thanks



Re: [PATCH net] net: validate untrusted gso packets

2018-01-21 Thread Jason Wang



On 2018年01月19日 22:39, Willem de Bruijn wrote:

On Fri, Jan 19, 2018 at 3:19 AM, Jason Wang  wrote:


On 2018年01月19日 08:53, Willem de Bruijn wrote:

And what you propose here is just a very small subset of the
necessary checking, more comes at gso header checking. So even if we
care
performance, it only help for some specific case.

It also fixed the bug that Eric sent a separate patch for, as that did
not dissect as a valid TCP packet, either.

I may miss something but how did this patch protects an evil thoff?

Actually, it blocked that specific reproducer because the ip protocol
did not match.


I see.


I think that __skb_flow_dissect_tcp should return a boolean, causing
dissection return FLOW_DISSECT_RET_OUT_BAD if the tcph is bad.
That would be needed to really catch it with flow dissection at the
source.


Just sanitize transport to offset_hint (0) in the case of tun.

That is the current approach in skb_probe_transport_header, but it
opens us up to parsing of garbage headers later in the stack when
unconditionally reading tcp_hdrlen. The qdisc_pkt_len_init case is one
such example. Seems better to leave transport header unset in such
cases and qualify all access to the headers if DODGY.


It looks to
me flow dissector will return FLOW_DISSECT_RET_OUT_BAD too if it can't
recognize the protocol. We can't differ the real failure from unrecognized
protocol. (or change the return from bool to int).

Unrecognized protocol should imply failure. virtio_net_hdr_to_skb accepts
a whitelist of protocols, all of which the flow dissector can verify.


Yes and only for gso packets.



Another argument for early validation: we cannot currently differentiate
tunnel headers inserted by the tun/pf_packet/xen user from those
generated by the stack if both are present.

The kernel currently does not define tunnel VIRTIO_NET_HDR_GSO
types, so should drop packets with tunnel headers. Tunnel gso handlers
indeed do drop these if skb->encapsulation is not set.

But if a packet travels through a tunnel device and the bit is set, at
segmentation time we cannot distinguish trusted stack headers from
possibly malformed user supplied ones.


Right.

Thanks


Re: [PATCH net v3] gso: validate gso_type in GSO handlers

2018-01-21 Thread Jason Wang



On 2018年01月19日 22:29, Willem de Bruijn wrote:

From: Willem de Bruijn

Validate gso_type during segmentation as SKB_GSO_DODGY sources
may pass packets where the gso_type does not match the contents.

Syzkaller was able to enter the SCTP gso handler with a packet of
gso_type SKB_GSO_TCPV4.

On entry of transport layer gso handlers, verify that the gso_type
matches the transport protocol.

Fixes: 90017accff61 ("sctp: Add GSO support")
Link:http://lkml.kernel.org/r/<001a1137452496ffc305617e5...@google.com>
Reported-by:syzbot+fee64147a25aecd48...@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn

---
Similar checks existed until removed in commit 5c7cdf339af5 ("gso:
Remove arbitrary checks for unsupported GSO"). But those were limited
to the TSO path, not software GSO. I believe that this issue goes
back further, hence the Fixes at the first user of virtio_net_hdr.
---
  net/ipv4/esp4_offload.c  | 3 +++
  net/ipv4/tcp_offload.c   | 3 +++
  net/ipv4/udp_offload.c   | 3 +++
  net/ipv6/esp6_offload.c  | 3 +++
  net/ipv6/tcpv6_offload.c | 3 +++
  net/ipv6/udp_offload.c   | 3 +++
  net/sctp/offload.c   | 3 +++
  7 files changed, 21 insertions(+)


Acked-by: Jason Wang 

Thanks


Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread jianchao.wang
Hi Eric

On 01/22/2018 12:43 AM, Eric Dumazet wrote:
> On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote:
>>
>> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
>>>
>>>
>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
 On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> Hi Tariq
>
> Very sad that the crash was reproduced again after applied the patch.
>>
>> Memory barriers vary for different Archs, can you please share more 
>> details regarding arch and repro steps?
> 
> Yeah, mlx4 NICs in Google fleet receive trillions of packets per
> second, and we never noticed an issue.
> 
> Although we are using a slightly different driver, using order-0 pages
> and fast pages recycling.
> 
> 
The driver we use will will set the page reference count to (size of 
pages)/stride, the
pages will be freed by networking stack when the reference become zero, and the 
order-3
pages maybe allocated soon, this give NIC device a chance to corrupt the pages 
which have
 been allocated by others, such as slab.
In the current version with order-0 and page recycling, maybe the corruption 
occurred on
the inbound packets sometimes and just cause some bad and invalid packets which 
will be
dropped.

Thanks
Jianchao


Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread jianchao.wang
Hi Tariq and all

Many thanks for your kindly and detailed response and comment.

On 01/22/2018 12:24 AM, Tariq Toukan wrote:
> 
> 
> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
>>
>>
>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
 Hi Tariq

 Very sad that the crash was reproduced again after applied the patch.
> 
> Memory barriers vary for different Archs, can you please share more details 
> regarding arch and repro steps?The hardware is HP ProLiant DL380 
> Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
The xen is installed. The crash occurred in DOM0.
Regarding to the repro steps, it is a customer's test which does heavy disk I/O 
over NFS storage without any guest.

The patch that can fix this issue is as follow:
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1005,6 +1005,7 @@ out:
wmb(); /* ensure HW sees CQ consumer before we post new buffers */
ring->cons = cq->mcq.cons_index;
mlx4_en_refill_rx_buffers(priv, ring);
+   wmb();
mlx4_en_update_rx_prod_db(ring);
return polled;
 }

Thanks
Jianchao
> 

 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
 +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
 @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct 
 mlx4_en_rx_ring *ring)
   static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring 
 *ring)
   {
 +    dma_wmb();
>>>
>>> So... is wmb() here fixing the issue ?
>>>
   *ring->wqres.db.db = cpu_to_be32(ring->prod & 0x);
   }

 I analyzed the kdump, it should be a memory corruption.

 Thanks
 Jianchao
>>
>> Hmm, this is actually consistent with the example below [1].
>>
>> AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good 
>> for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers 
>> (the descriptors, went through the dma_map_page() API), but not for the 
>> doorbell (a coherent memory, typically allocated via dma_alloc_coherent) 
>> that requires using the stronger wmb() barrier.
>>
>>
>> [1] Documentation/memory-barriers.txt
>>
>>   (*) dma_wmb();
>>   (*) dma_rmb();
>>
>>   These are for use with consistent memory to guarantee the ordering
>>   of writes or reads of shared memory accessible to both the CPU and a
>>   DMA capable device.
>>
>>   For example, consider a device driver that shares memory with a device
>>   and uses a descriptor status value to indicate if the descriptor 
>> belongs
>>   to the device or the CPU, and a doorbell to notify it when new
>>   descriptors are available:
>>
>>  if (desc->status != DEVICE_OWN) {
>>  /* do not read data until we own descriptor */
>>  dma_rmb();
>>
>>  /* read/modify data */
>>  read_data = desc->data;
>>  desc->data = write_data;
>>
>>  /* flush modifications before status update */
>>  dma_wmb();
>>
>>  /* assign ownership */
>>  desc->status = DEVICE_OWN;
>>
>>  /* force memory to sync before notifying device via MMIO */
>>  wmb();
>>
>>  /* notify device of new descriptors */
>>  writel(DESC_NOTIFY, doorbell);
>>  }
>>
>>   The dma_rmb() allows us guarantee the device has released ownership
>>   before we read the data from the descriptor, and the dma_wmb() allows
>>   us to guarantee the data is written to the descriptor before the device
>>   can see it now has ownership.  The wmb() is needed to guarantee that 
>> the
>>   cache coherent memory writes have completed before attempting a write 
>> to
>>   the cache incoherent MMIO region.
>>
>>   See Documentation/DMA-API.txt for more information on consistent 
>> memory.
>>
>>
 On 01/15/2018 01:50 PM, jianchao.wang wrote:
> Hi Tariq
>
> Thanks for your kindly response.
>
> On 01/14/2018 05:47 PM, Tariq Toukan wrote:
>> Thanks Jianchao for your patch.
>>
>> And Thank you guys for your reviews, much appreciated.
>> I was off-work on Friday and Saturday.
>>
>> On 14/01/2018 4:40 AM, jianchao.wang wrote:
>>> Dear all
>>>
>>> Thanks for the kindly response and reviewing. That's really appreciated.
>>>
>>> On 01/13/2018 12:46 AM, Eric Dumazet wrote:
> Does this need to be dma_wmb(), and should it be in
> mlx4_en_update_rx_prod_db ?
>

 +1 on dma_wmb()

 On what architecture bug was observed ?
>>>
>>> This issue was observed on x86-64.
>>> And I will send a new patch, in which replace wmb() with dma_wmb(), to 
>>> customer
>>> to confirm.
>>
>> +1 on dma_wmb, let us know once customer confirms.
>> Please place it within mlx4_en_update_rx_prod_db as suggested.
>
> Yes, I have recommended it to customer.
> Once I get 

Re: Memory corruption with r8169 across several device revisions and kernels

2018-01-21 Thread Oliver Freyermuth
Am 22.01.2018 um 01:09 schrieb Francois Romieu:
> You said:
> 
> Oliver Freyermuth  :
> [...]
>> The values found in overwritten memory match those contained in
>> /proc/self/net/dev for the realtek ethernet device.
> 
> Are you able to retrieve the layout ? That is, does it appear to match:
> 
> - r8169 hardware stats DMA buffer ?
>   TxOk, RxOk, TxErr, RxErr, ...
> 
> - rtnl_link_stats ?
>   rx_packets, tx_packets, rx_bytes, tx_bytes, ...
> 
> or something else ?

Not cleanly. 
Since I'm no expert in kernel module development, I can only deduce from what I 
get in mapped memory,
e.g. with memtester. What I found there I found back in /proc/self/net/dev,
I'm not sure anymore whether it was RX or TX bytes / packets (but it was none 
of the error counters). 
I can try to reproduce to clarify, but it's a somwhat dangerous undertaking. 

Also, from a time when the physical offset was in low memory, I got the 
following in syslog:
Oct 12 10:05:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065b8ea
Oct 12 10:10:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065be39
Oct 12 10:11:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065be8c
Oct 12 10:12:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065bef8
Oct 12 10:13:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065bfbe
Oct 12 10:18:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065c37a
Oct 12 10:19:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065c3db
Oct 12 10:31:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065cc48
Oct 12 10:35:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065d402
Oct 12 10:47:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065dcbb
Oct 12 10:53:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 0065e0a3
Oct 12 11:39:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 006602f2
Oct 12 11:44:02 desktop1 kernel: Corrupted low memory at 88009000 (9000 
phys) = 00661ef0

Also, I'm not sure whether the low memory scanner continues after a single 
corruption was found, potentially it would only see the first corrupted region. 
memtester in userspace stops on the first corruption and then tries another 
pass. At least I only ever saw one corrupted region with the tools I used. 

The same was true for the corrupted btrfs filesystem: As far as I could tell, 
there was a single corrupted region, no series of counters, i.e. not a full 
structure. 

Cheers,
Oliver


Re: Memory corruption with r8169 across several device revisions and kernels

2018-01-21 Thread Francois Romieu
Oliver Freyermuth  :
> Am 21.01.2018 um 21:48 schrieb Francois Romieu:
> > Oliver Freyermuth  :
[...]
> > Is it an AMD based system ?
> > 
> 
> No, all the systems on which I have observed this up to now are Intel-based. 
> Two Haswell and one Sandy Bridge system. 

Ok.

You said:

Oliver Freyermuth  :
[...]
> The values found in overwritten memory match those contained in
> /proc/self/net/dev for the realtek ethernet device.

Are you able to retrieve the layout ? That is, does it appear to match:

- r8169 hardware stats DMA buffer ?
  TxOk, RxOk, TxErr, RxErr, ...

- rtnl_link_stats ?
  rx_packets, tx_packets, rx_bytes, tx_bytes, ...

or something else ?

-- 
Ueimor


Re: [PATCH net-next 0/7] tcp: implement rb-tree based retransmit queue

2018-01-21 Thread Eric Dumazet
On Sun, Jan 21, 2018 at 12:52 PM, Tal Gilboa  wrote:
> Hi Eric,
> We have noticed a degradation on both of our drivers (mlx4 and mlx5) when
> running TCP. Exact scenario is single stream TCP with 1KB packets. The
> degradation is a steady 50% drop.
> We tracked the offending commit to be:
> 75c119a ("tcp: implement rb-tree based retransmit queue")
>
> Since mlx4 and mlx5 code base is completely different and by looking at the
> changes in this commit, we believe the issue is external to the mlx4/5
> drivers.
>
> I see in the comment below you anticipated some overhead, but this may be a
> too common case to ignore.
>
> Can you please review and consider reverting/fixing it?
>

Hi Tal

You have to provide way more details than a simple mail, asking for a
" revert or a fix " ...

On our GFEs, we got a win, while I was expecting a small overhead,
given the apparent complexity of dealing with RB tree instead of
linear list.

And on the stress scenario described in my patch set, the win was
absolutely abysmal.

A " single strean TCP with 1KB packets"  is not something we need to optimize,
unless there is some really strange setup for one of your customers ?

Here we deal with millions of TCP flows, and this is what we need to
take care of.

Thanks.

> Thanks,
>
> Tal G.
>
>
> On 10/7/2017 2:31 AM, David Miller wrote:
>>
>> From: Eric Dumazet 
>> Date: Thu,  5 Oct 2017 22:21:20 -0700
>>
>>> This patch series implement RB-tree based retransmit queue for TCP,
>>> to better match modern BDP.
>>
>>
>> Indeed, there was a lot of resistence to this due to the overhead
>> for small retransmit queue sizes, but with today's scale this is
>> long overdue.
>>
>> So, series applied, nice work!
>>
>> Maybe we can look into dynamic schemes where when the queue never
>> goes over N entries we elide the rbtree and use a list.  I'm not
>> so sure how practical that would be.
>>
>> Thanks!
>>
>


Re: [patch net-next 0/7] mlxsw: Add support for mirror action with flower

2018-01-21 Thread David Miller
From: Jiri Pirko 
Date: Fri, 19 Jan 2018 09:24:45 +0100

> From: Jiri Pirko 
> 
> Arkadi says:
> 
> Add support for mirror action with flower classifier. The first 3 patches
> introduce a generic per-block resource infra. The last 4 patches add
> support for flow based span.

Series applied, thanks.


Re: [PATCH net-next 00/11] Aquantia atlantic driver new devices support

2018-01-21 Thread David Miller
From: Igor Russkikh 
Date: Fri, 19 Jan 2018 17:03:17 +0300

> This patchset introduces a support for new Aquantia hardware:
> AQC11x family with updated hardware (B1) and firmware (2.x and 3.x branches).
> 
> For that, a number of improvements in overall driver model were done:
>  - Firmware specific ops tables. Firmware 2.x and 3.x series support
>functions are now in separate fw2x module.
>  - PCI module cleanup and simplification done.
>  - Verified and tested hardware reset process.

Series applied, thanks Igor.


Re: [pull request][net-next 00/15] Mellanox, mlx5 updates 2018-01-19

2018-01-21 Thread David Miller
From: Saeed Mahameed 
Date: Fri, 19 Jan 2018 23:34:42 +0200

> The following series includes updates for mlx5e netdev driver,
> for more information please see tag log below.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Saeed.


Re: [PATCH net-next] nfp: flower: prioritize stats updates

2018-01-21 Thread David Miller
From: Jakub Kicinski 
Date: Fri, 19 Jan 2018 17:54:08 -0800

> From: Pieter Jansen van Vuuren 
> 
> Previously it was possible to interrupt processing stats updates because
> they were handled in a work queue. Interrupting the stats updates could
> lead to a situation where we backup the control message queue. This patch
> moves the stats update processing out of the work queue to be processed as
> soon as hardware sends a request.
> 
> Reported-by: Louis Peens 
> Signed-off-by: Pieter Jansen van Vuuren 
> Reviewed-by: Dirk van der Merwe 
> Reviewed-by: Jakub Kicinski 

Applied, thanks Jakub.

Just a reminder that it really makes things more difficult that one
can't just go:

make drivers/net/ethernet/netronome/nfp/flower/cmsg.o

to test a specific NFP object build like one can with pretty much the
rest of the entire kernel tree.


Re: [PATCH net V2] net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare

2018-01-21 Thread David Miller
From: Saeed Mahameed 
Date: Sun, 21 Jan 2018 05:30:42 +0200

> From: Talat Batheesh 
> 
> Helmut reported a bug about division by zero while
> running traffic and doing physical cable pull test.
> 
> When the cable unplugged the ppms become zero, so when
> dividing the current ppms by the previous ppms in the
> next dim iteration there is division by zero.
> 
> This patch prevent this division for both ppms and epms.
> 
> Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
> Reported-by: Helmut Grauer 
> Signed-off-by: Talat Batheesh 
> Signed-off-by: Saeed Mahameed 
> ---
> v2:
> - Fix spelling in commit message.
> 
> This patch is the equivalent of:
> ("net/dim: Fix fixpoint divide exception in net_dim_stats_compare")
> which was already submitted to net-next. Since mlx5 irq moderation code
> got moved in net-next, this patch should only go to net and skip net-next.
> 
> for -stable: 4.12.y and later.

Applied and queue up for -stable, thank you.


Re: [PATCH] net: gemini: Depend on HAS_IOMEM

2018-01-21 Thread David Miller
From: Linus Walleij 
Date: Sun, 21 Jan 2018 14:15:41 +0100

> The zeroday builder notices that since Usermode Linux does not
> have IO memory, the build fails for them when selecting everything
> it can enable.
> 
> As the driver is clearly using memory-mapped registers to access
> the network adapter, we add depends on HAS_IOMEM to solve this
> problem.
> 
> Reported-by: kbuild test robot 
> Signed-off-by: Linus Walleij 

Applied, thank you.


Re: [RFC crypto v3 0/9] Chelsio Inline TLS

2018-01-21 Thread Sabrina Dubroca
2017-12-20, 17:03:02 +0530, Atul Gupta wrote:
> RFC series for Chelsio Inline TLS driver (chtls.ko)
> 
> Driver use the ULP infrastructure to register chtls as Inline TLS ULP.

I don't think drivers should be registering their own ULP. TLS
offloading should be transparent to userspace, whatever HW ends up
being used. If each driver implements its own ULP, the application has
to be aware of what HW and what driver it's running on.

I think this offload should rely on a generic infrastructure, not
build its own private interface. Look at the current kTLS code, the
proposal for an offload infrastructure [0] from Mellanox, and see how
you can fit your driver into that, and extend what's missing.

[0] https://patchwork.ozlabs.org/patch/849984/


[...]
> Atul Gupta (9):
>   chtls: structure and macro definiton
>   cxgb4: Inline TLS FW Interface
>   cxgb4: LLD driver changes to enable TLS
>   chcr: Key Macro
>   chtls: Key program
>   chtls: CPL handler definition
>   chtls: Inline crypto request Tx/Rx
>   chtls: Register the ULP
>   Makefile Kconfig

That patchset is split so that each patch touches a separate set of
files, and the description of the contents of each patch is very
limited.  Can you try to group your changes by feature instead?  That
should help you come up with descriptive commit messages as well.


Thanks,

-- 
Sabrina


Re: [Patch net-next 2/3] net_sched: plug in qdisc ops change_tx_queue_len

2018-01-21 Thread John Fastabend
On 01/19/2018 03:09 PM, Cong Wang wrote:
> Introduce a new qdisc ops ->change_tx_queue_len() so that
> each qdisc could decide how to implement this if it wants.
> Previously we simply read dev->tx_queue_len, after pfifo_fast
> switches to skb array, we need this API to resize the skb array
> when we change dev->tx_queue_len.
> 
> To avoid handling race conditions with TX BH, we need to
> deactivate all TX queues before change the value and bring them
> back after we are done, this also makes implementation easier.
> 
> Cc: John Fastabend 
> Signed-off-by: Cong Wang 
> ---
>  include/net/sch_generic.h |  2 ++
>  net/core/dev.c|  1 +
>  net/sched/sch_generic.c   | 22 ++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index cd1be1f25c36..aae1baa1c30f 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -200,6 +200,7 @@ struct Qdisc_ops {
> struct nlattr *arg,
> struct netlink_ext_ack *extack);
>   void(*attach)(struct Qdisc *sch);
> + void(*change_tx_queue_len)(struct Qdisc *, unsigned 
> int);
>  
>   int (*dump)(struct Qdisc *, struct sk_buff *);
>   int (*dump_stats)(struct Qdisc *, struct gnet_dump 
> *);
> @@ -488,6 +489,7 @@ void qdisc_class_hash_remove(struct Qdisc_class_hash *,
>  void qdisc_class_hash_grow(struct Qdisc *, struct Qdisc_class_hash *);
>  void qdisc_class_hash_destroy(struct Qdisc_class_hash *);
>  
> +void dev_qdisc_change_tx_queue_len(struct net_device *dev);
>  void dev_init_scheduler(struct net_device *dev);
>  void dev_shutdown(struct net_device *dev);
>  void dev_activate(struct net_device *dev);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 99d353e4cbb2..24809d858a64 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7058,6 +7058,7 @@ int dev_change_tx_queue_len(struct net_device *dev, 
> unsigned long new_len)
>   dev->tx_queue_len = orig_len;
>   return res;
>   }
> + dev_qdisc_change_tx_queue_len(dev);
>   }
>  
>   return 0;
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index ef8b4ecde2ac..30aaeb3c1bf1 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1178,6 +1178,28 @@ void dev_deactivate(struct net_device *dev)
>  }
>  EXPORT_SYMBOL(dev_deactivate);
>  
> +static void qdisc_change_tx_queue_len(struct net_device *dev,
> +   struct netdev_queue *dev_queue,
> +   void *unused)
> +{
> + struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
> + const struct Qdisc_ops *ops = qdisc->ops;
> +
> + if (ops->change_tx_queue_len)
> + ops->change_tx_queue_len(qdisc, dev->tx_queue_len);

hmm what happens if the resize fails in the next patch,

>  
> +static void pfifo_fast_change_tx_queue_len(struct Qdisc *sch, unsigned int 
> new_len)
> +{
> + struct pfifo_fast_priv *priv = qdisc_priv(sch);
> + int prio;
> +
> + for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
> + struct skb_array *q = band2list(priv, prio);
> +
> + skb_array_resize(q, new_len, GFP_KERNEL);
> + }
> +}
> +

Here skb_array_resize() can fail with ENOMEM, do we need to unwind the
change and push the error up the stack?

Thanks,
John


Re: [Patch net-next 1/3] net: introduce helper dev_change_tx_queue_len()

2018-01-21 Thread John Fastabend
On 01/19/2018 03:09 PM, Cong Wang wrote:
> This patch promotes the local change_tx_queue_len() to a core
> helper function, dev_change_tx_queue_len(), so that rtnetlink
> and net-sysfs could share the code. This also prepares for the
> following patch.
> 
> Note, the -EFAULT in the original code doesn't make sense,
> we should propagate the errno from notifiers.
> 
> Cc: John Fastabend 
> Signed-off-by: Cong Wang 
> ---


Acked-by: John Fastabend 



Re: [PATCH] rtl8xxxu: Fix trailing semicolon

2018-01-21 Thread Jes Sorensen
On 01/17/2018 05:56 AM, Luis de Bethencourt wrote:
> The trailing semicolon is an empty statement that does no operation.
> Removing it since it doesn't do anything.
> 
> Signed-off-by: Luis de Bethencourt 
> ---
> 
> Hi,
> 
> After fixing the same thing in drivers/staging/rtl8723bs/, Joe Perches
> suggested I fix it treewide [0].
> 
> Best regards 
> Luis
> 
> 
> [0] 
> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115410.html
> [1] 
> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115390.html
> 
>  drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: Jes Sorensen 


> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h 
> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
> index 95e3993d8a33..8828baf26e7b 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
> @@ -1172,7 +1172,7 @@ struct rtl8723bu_c2h {
>  
>   u8 basic_rate:1;
>   u8 bt_has_reset:1;
> - u8 dummy4_1:1;;
> + u8 dummy4_1:1;
>   u8 ignore_wlan:1;
>   u8 auto_report:1;
>   u8 dummy4_2:3;
> 



Re: ipv6_addrconf: WARNING about suspicious RCU usage

2018-01-21 Thread Heiner Kallweit
Am 20.01.2018 um 20:19 schrieb Ido Schimmel:
> On Sat, Jan 20, 2018 at 10:49:03AM -0800, Eric Dumazet wrote:
>> On Sat, 2018-01-20 at 15:37 +0200, Ido Schimmel wrote:
>>> On Sat, Jan 20, 2018 at 12:57:01PM +0100, Heiner Kallweit wrote:
 Since some time (didn't bisect it yet) I get the following warning.
 Is it a known issue?
>>>
>>> [...]
>>>
 [86220.126999] BUG: sleeping function called from invalid context at 
 mm/slab.h:420
 [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: 
 kworker/0:2
 [86220.127082] 4 locks held by kworker/0:2/1003:
 [86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: 
 [] process_one_work+0x1de/0x680
 [86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: 
 [] process_one_work+0x1de/0x680
 [86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [] 
 rtnl_lock+0x12/0x20
 [86220.127300]  #3:  (rcu_read_lock_bh){}, at: [] 
 addrconf_verify_rtnl+0x1e/0x510 [ipv6]
 [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 
 4.15.0-rc7-next-20180110+ #7
 [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS 
 B246P105 06/01/2015
 [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
 [86220.127568] Call Trace:
 [86220.127591]  dump_stack+0x70/0x9e
 [86220.127616]  ___might_sleep+0x14d/0x240
 [86220.127644]  __might_sleep+0x45/0x80
 [86220.127672]  kmem_cache_alloc_trace+0x53/0x250
 [86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
 [86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
 [86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
 [86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
 [86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
 [86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
 [86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
 [86220.128032]  process_one_work+0x258/0x680
 [86220.128063]  worker_thread+0x35/0x3f0
 [86220.128091]  kthread+0x124/0x140
 [86220.128117]  ? process_one_work+0x680/0x680
 [86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
 [86220.128180]  ? umh_complete+0x40/0x40
 [86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
 [86220.128243]  ret_from_fork+0x4b/0x60
>>>
>>> Can you please try attached patch (untested)?
>>
>>
>>
>> I would also/instead break rcu section.
> 
> Thanks Eric, this should work. We can continue to block in
> ipv6_create_tempaddr().
> 
> Heiner, can you try Eric's patch instead?
> 
So far everything looks good with Eric's patch. The warning didn't show up 
again.

>>
>> Holding RCU (and BH) for whole hash traversal is a recipe for disaster,
>> if we have thousands of IPv6 addresses.
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 
>> ab99cb641b7cccdda0ad4ae553c09274d7dbc047..adda73466ae1dd0f3b700b3db5fbf3065e4d3f7f
>>  100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -4356,9 +4356,11 @@ static void addrconf_verify_rtnl(void)
>>  spin_lock(>lock);
>>  ifpub->regen_count = 0;
>>  spin_unlock(>lock);
>> +rcu_read_unlock_bh();
>>  ipv6_create_tempaddr(ifpub, 
>> ifp, true);
>>  in6_ifa_put(ifpub);
>>  in6_ifa_put(ifp);
>> +rcu_read_lock_bh();
>>  goto restart;
>>  }
>>  } else if (time_before(ifp->tstamp + 
>> ifp->prefered_lft * HZ - regen_advance * HZ, next))
>>
>>
>>
>>
> 



Darlehen

2018-01-21 Thread defina
Benötigen Sie Privat- oder Geschäftskredite ohne Stress und schnelle 
Zustimmung? Wenn ja, kontaktieren Sie uns bitte alexgr...@gmail.com


Re: [PATCH net-next 0/7] tcp: implement rb-tree based retransmit queue

2018-01-21 Thread Tal Gilboa

Hi Eric,
We have noticed a degradation on both of our drivers (mlx4 and mlx5) 
when running TCP. Exact scenario is single stream TCP with 1KB packets. 
The degradation is a steady 50% drop.

We tracked the offending commit to be:
75c119a ("tcp: implement rb-tree based retransmit queue")

Since mlx4 and mlx5 code base is completely different and by looking at 
the changes in this commit, we believe the issue is external to the 
mlx4/5 drivers.


I see in the comment below you anticipated some overhead, but this may 
be a too common case to ignore.


Can you please review and consider reverting/fixing it?

Thanks,

Tal G.

On 10/7/2017 2:31 AM, David Miller wrote:

From: Eric Dumazet 
Date: Thu,  5 Oct 2017 22:21:20 -0700


This patch series implement RB-tree based retransmit queue for TCP,
to better match modern BDP.


Indeed, there was a lot of resistence to this due to the overhead
for small retransmit queue sizes, but with today's scale this is
long overdue.

So, series applied, nice work!

Maybe we can look into dynamic schemes where when the queue never
goes over N entries we elide the rbtree and use a list.  I'm not
so sure how practical that would be.

Thanks!



Re: Memory corruption with r8169 across several device revisions and kernels

2018-01-21 Thread Oliver Freyermuth
Hi, 

Am 21.01.2018 um 21:48 schrieb Francois Romieu:
> Oliver Freyermuth  :
> [...]
> 
> Is it an AMD based system ?
> 

No, all the systems on which I have observed this up to now are Intel-based. 
Two Haswell and one Sandy Bridge system. 

Cheers,
Oliver


Re: Memory corruption with r8169 across several device revisions and kernels

2018-01-21 Thread Francois Romieu
Oliver Freyermuth  :
[...]

Is it an AMD based system ?

-- 
Ueimor


Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread Jason Gunthorpe
> Hmm, this is actually consistent with the example below [1].
> 
> AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good
> for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers
> (the descriptors, went through the dma_map_page() API), but not for the
> doorbell (a coherent memory, typically allocated via dma_alloc_coherent)
> that requires using the stronger wmb() barrier.

If x86 truely requires a wmb() (aka SFENCE) here then the userspace
RDMA stuff is broken too, and that has been tested to death at this
point..

I looked into this at one point and I thought I concluded that x86 did
not require a SFENCE between a posted PCI write and writes to system
memory to guarnetee order with-respect-to the PCI device?

Well, so long as non-temporal stores and other specialty accesses are
not being used.. Is there a chance a fancy sse optimized memcpy or
memset, crypto or something is being involved here?

However, Documentation/memory-barriers.txt does seem pretty clear that
the kernel definition of wmb() makes it required here, even if it
might be overkill for x86?

Jason


Re: [Patch net-next 1/3] net: introduce helper dev_change_tx_queue_len()

2018-01-21 Thread Cong Wang
On Sat, Jan 20, 2018 at 8:52 PM, John Fastabend
 wrote:
> On 01/19/2018 03:09 PM, Cong Wang wrote:
>> This patch promotes the local change_tx_queue_len() to a core
>> helper function, dev_change_tx_queue_len(), so that rtnetlink
>> and net-sysfs could share the code. This also prepares for the
>> following patch.
>>
>> Note, the -EFAULT in the original code doesn't make sense,
>> we should propagate the errno from notifiers.
>>
>> Cc: John Fastabend 
>> Signed-off-by: Cong Wang 
>> ---
>
>
> [...]
>
>>  static ssize_t tx_queue_len_store(struct device *dev,
>> struct device_attribute *attr,
>> const char *buf, size_t len)
>> @@ -355,7 +332,7 @@ static ssize_t tx_queue_len_store(struct device *dev,
>>   if (!capable(CAP_NET_ADMIN))
>>   return -EPERM;
>>
>> - return netdev_store(dev, attr, buf, len, change_tx_queue_len);
>> + return netdev_store(dev, attr, buf, len, dev_change_tx_queue_len);
>>  }
>
> Is this protected by RTNL lock? If not what happens if this and do_setlink
> both try to change tx queue length at the same time? Seems we could get
> a race with multiple dev_deactivate/dev_activate sequences in-flight in
> the following 2/3 patch.

Surely it is protected by RTNL...

  96 if (!rtnl_trylock())
  97 return restart_syscall();
  98
  99 if (dev_isalive(netdev)) {
 100 ret = (*set)(netdev, new);
 101 if (ret == 0)
 102 ret = len;
 103 }
 104 rtnl_unlock();


Re: [PATCH net-next v2 2/2] net/sched: act_csum: don't use spinlock in the fast path

2018-01-21 Thread Cong Wang
On Fri, Jan 19, 2018 at 6:12 AM, Davide Caratti  wrote:
>  static int tcf_csum_dump(struct sk_buff *skb, struct tc_action *a, int bind,
> @@ -575,15 +594,18 @@ static int tcf_csum_dump(struct sk_buff *skb, struct 
> tc_action *a, int bind,
>  {
> unsigned char *b = skb_tail_pointer(skb);
> struct tcf_csum *p = to_tcf_csum(a);
> +   struct tcf_csum_params *params;
> struct tc_csum opt = {
> -   .update_flags = p->update_flags,
> .index   = p->tcf_index,
> -   .action  = p->tcf_action,
> .refcnt  = p->tcf_refcnt - ref,
> .bindcnt = p->tcf_bindcnt - bind,
> };
> struct tcf_t t;
>
> +   params = rcu_dereference(p->params);

I think you need rtnl_dereference() here, as we don't have RCU read lock here?


> +   opt.action = params->action;
> +   opt.update_flags = params->update_flags;
> +


Thanks.


Re: [patch iproute2 net-next v12 0/3] tc: shared block support

2018-01-21 Thread David Ahern
On 1/20/18 3:00 AM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Kernel allows to share all filters between qdiscs with use
> of shared block.
> 
> Example:
> 
> block number 22. "22" is just an identification:
> $ tc qdisc add dev ens7 ingress_block 22 ingress
> 
> $ tc qdisc add dev ens8 ingress_block 22 ingress
> 
> 
> If we don't specify "block" command line option, no shared block would
> be created:
> $ tc qdisc add dev ens9 ingress
> 
> Now if we list the qdiscs, we will see the block index in the output:
> 
> $ tc qdisc
> qdisc ingress : dev ens7 parent :fff1 ingress_block 22
> qdisc ingress : dev ens8 parent :fff1 ingress_block 22
> qdisc ingress : dev ens9 parent :fff1
> 
> 
> To make is more visual, the situation looks like this:
> 
>ens7 ingress qdisc ens7 ingress qdisc
>   |  |
>   |  |
>   +-->  block 22  <--+
> 
> Unlimited number of qdiscs may share the same block.
> 
> Block sharing is also supported for clsact qdisc:
> $ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
> $ tc qdisc show dev ens10
> qdisc clsact : dev ens10 parent :fff1 ingress_block 23 egress_block 24
> 
> 
> We can add filter using the block index:
> 
> $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 
> action drop
> 
> 
> Note we cannot use the qdisc for filter manipulations of shared blocks:
> 
> $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 
> 192.168.100.2 action drop
> Error: This filter block is shared. Please use the block index to manipulate 
> the filters.
> 
> 
> We will see the same output if we list filters for ingress qdisc of
> ens7 and ens8, also for the block 22:
> 
> $ tc filter show block 22
> filter protocol ip pref 25 flower chain 0
> filter protocol ip pref 25 flower chain 0 handle 0x1
> ...
> 
> $ tc filter show dev ens7 ingress
> filter block 22 protocol ip pref 25 flower chain 0
> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
> ...
> 
> $ tc filter show dev ens8 ingress
> filter block 22 protocol ip pref 25 flower chain 0
> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
> ...
> 
> ---
> v11->v12:
> - separate iproute2 patchset, kernel part is merged
> - original patch 1 is removed as the header changes are already
>   in iproute2 code
> - patch 1:
>  - fixed return type of tc_qdisc_block_exists
>  - removed 0 checks for block in tc_qdisc_block_exists
>  - rever xmas tree variables in tc_qdisc_block_exists
> - patch 2
>  - fixes error message when both dev and block are on the command line
> 
> ---
> Note that if you want me to change the ">=" vs "==" thing, I will do it.
> But I think that it is a global iproute2 thing and should be changed in
> a separate patch/patchset.

IMO, it should be checking for ==. If the payload is not of an expected
length, then either the kernel and userspace differ on the API or the
parsing has gone off the rails.

FWIW, all of the the 'RTA_PAYLOAD >=' checks are under tc/.

Series applied to iproute2-next. Thanks, Jiri.


Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-21 Thread Andrew Lunn
> However interesting as an example, I'm not convinced this is what we
> should aim for.
> 
> ACPI is not a replacement for DT, and it is unlikely that people would
> be interested in running ACPI-only distros such as RHEL on their
> Ethernet switch. DT is excellent at describing this, and there is no
> need to replace it.

I however do know of somebody who is building an industrial wireless
access point, with a Marvell Ethernet switch. They have chosen to use
an Intel CPU, and want to run CentOS on it, because that is what they
know.

So they will be using ACPI whatever, for the basic board support. They
then have a choice of either ACPI for the switch, or having device
tree as well as ACPI for the switch. And it will be interesting to see
how that works. Can DT reference GPIOs and I2C busses in ACPI?

> Also, please bear in mind that the ACPI spec is owned by the UEFI/ACPI
> forum, and only members (who are all under a contract regarding
> reasonable and non-discriminatory (RAND) licensing terms to IP they
> own that is covered by the spec) can contribute. Individuals can
> become members for free AFAIR, but that doesn't mean individuals are
> typically interested in getting involved in a process that is rather
> tedious and bureaucratic.

And this is a bit what i'm worried about. How you represent MDIO
busses, PHYs, Ethernet switches is implemented once in the core Linux
code. So i would be interested in knowing what happens if the Linux
community defines and implements something, boards start using it, but
a year later the tedious and bureaucratic process rejects it,
re-writes it, in an incompatible way.

 Andrew


Re: [PATCH iproute2 0/6] utils: Get rid of inet_get_addr()

2018-01-21 Thread David Ahern
On 1/18/18 11:13 AM, Serhey Popovych wrote:
> It looks confusing to have multiple independent
> routines to get internet address from it's string
> representation: get_addr() and inet_get_addr().
> 
> Most complicated users of inet_get_addr() is
> iplink_geneve.c and iplink_vxlan.c because they
> required to handle both AF_INET and AF_INET6
> for their local/remote endpoints.
> 
> On the other hand get_addr() does not provide
> additional information like address type: need
> to address this. to get rid of current and
> possible future code duplications. Note that
> this functionality is first step to make proto
> independent handling of local/remote endpoints
> in ip/tunnel code (there will be additional
> series based on this one).
> 
> Also fix get_addr_1() and get_prefix() to make
> sure it always provide correct ->family and
> ->bitlen.
> 
> As always comments, suggestions and criticism
> are welcome.

Series applied to iproute2-next. Thanks,



Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-21 Thread Ard Biesheuvel
On 21 January 2018 at 16:13, Andrew Lunn  wrote:
>> Right. So if you need to have some additional "parameters" with the
>> connection, then I suppose you may want to go with the GenericSerialBus
>> route. However, looking at the sample device tree description:
>>
>> davinci_mdio: ethernet@5c03 {
>> compatible = "ti,davinci_mdio";
>> reg = <0x5c03 0x1000>;
>> #address-cells = <1>;
>> #size-cells = <0>;
>>
>> reset-gpios = < 5 GPIO_ACTIVE_LOW>;
>> reset-delay-us = <2>;
>>
>> ethphy0: ethernet-phy@1 {
>> reg = <1>;
>> };
>>
>> ethphy1: ethernet-phy@3 {
>> reg = <3>;
>> };
>> };
>>
>> would pretty much translate directly to this in ACPI if you don't need
>> any additional attributes:
>>
>>   Device (ETH0) {
>>   Name (_ADR, /* PCI address of the NIC */)
>>
>>   Device (PHY0) {
>>   Name (_ADR, 1)
>>   ...
>>   }
>>
>>   Device (PHY1) {
>>   Name (_ADR, 3)
>>   ...
>>   }
>>   }
>>
>> which looks pretty simple to me. You can also use _DSM and _DSD here to
>> pass information (like the protocol number) for the PHY devices to Linux.
>
> I'm not particularly worried about that simple case. Other than, i
> don't want people to think that is all that is required.
>
> For a more full example, take a look at vf610-zii-dev-rev-b.dts. The
> Freescale FEC Ethernet controller provides the base MDIO device,
> mdio1. On top of this is an MDIO mux, using a few GPIO lines to
> enable/disable 3 child MDIO busses. Each of these busses has an
> Ethernet Switch. The Ethernet switch exports up to two MDIO busses,
> and on these busses are Ethernet PHYs which are embedded inside the
> switch. The Ethernet switches are also interrupt controllers, with the
> PHYs having interrupt properties which point back to the interrupt
> controller in the switch.
>
> So i'm interested in an ACPI proposal which supports this board.
>

However interesting as an example, I'm not convinced this is what we
should aim for.

ACPI is not a replacement for DT, and it is unlikely that people would
be interested in running ACPI-only distros such as RHEL on their
Ethernet switch. DT is excellent at describing this, and there is no
need to replace it.

ACPI is about firmware abstractions: you don't need to describe every
stacked interrupt controller in minute detail to the OS if the
firmware configures it sufficiently. That way, the OS does not need to
know all these details, and vendors can update their hardware without
having to update the software as well. (Or that is the idea at least,
how that works out in practice on arm64 systems remains to be seen)

Taking the Marvell 8040 as an example: the ACPI description does not
expose the ICU interrupt controllers to the OS, but the firmware
configures them and describes their configured state as ordinary GIC
interrupts.

Also, please bear in mind that the ACPI spec is owned by the UEFI/ACPI
forum, and only members (who are all under a contract regarding
reasonable and non-discriminatory (RAND) licensing terms to IP they
own that is covered by the spec) can contribute. Individuals can
become members for free AFAIR, but that doesn't mean individuals are
typically interested in getting involved in a process that is rather
tedious and bureaucratic.

So my suggestion would be to find out to what extent the latest
version of the spec can describe non-trivial MDIO topologies, and
hopefully that includes the mvpp2 on the Marvell 8040.


Re: [PATCH 00/32] Netfilter/IPVS updates for net-next

2018-01-21 Thread David Miller
From: Pablo Neira Ayuso 
Date: Fri, 19 Jan 2018 20:10:09 +0100

> The following patchset contains Netfilter/IPVS updates for your net-next
> tree. Basically, a new extension for ip6tables, simplification work of
> nf_tables that saves us 500 LoC, allow raw table registration before
> defragmentation, conversion of the SNMP helper to use the ASN.1 code
> generator, unique 64-bit handle for all nf_tables objects and fixes to
> address fallout from previous nf-next batch.  More specifically, they
> are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks.


Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread Eric Dumazet
On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote:
> 
> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
> > 
> > 
> > On 19/01/2018 5:49 PM, Eric Dumazet wrote:
> > > On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> > > > Hi Tariq
> > > > 
> > > > Very sad that the crash was reproduced again after applied the patch.
> 
> Memory barriers vary for different Archs, can you please share more 
> details regarding arch and repro steps?

Yeah, mlx4 NICs in Google fleet receive trillions of packets per
second, and we never noticed an issue.

Although we are using a slightly different driver, using order-0 pages
and fast pages recycling.



Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread Tariq Toukan



On 21/01/2018 11:31 AM, Tariq Toukan wrote:



On 19/01/2018 5:49 PM, Eric Dumazet wrote:

On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:

Hi Tariq

Very sad that the crash was reproduced again after applied the patch.


Memory barriers vary for different Archs, can you please share more 
details regarding arch and repro steps?




--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct 
mlx4_en_rx_ring *ring)
  static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring 
*ring)

  {
+    dma_wmb();


So... is wmb() here fixing the issue ?


  *ring->wqres.db.db = cpu_to_be32(ring->prod & 0x);
  }

I analyzed the kdump, it should be a memory corruption.

Thanks
Jianchao


Hmm, this is actually consistent with the example below [1].

AIU from the example, it seems that the dma_wmb/dma_rmb barriers are 
good for synchronizing cpu/device accesses to the "Streaming DMA mapped" 
buffers (the descriptors, went through the dma_map_page() API), but not 
for the doorbell (a coherent memory, typically allocated via 
dma_alloc_coherent) that requires using the stronger wmb() barrier.



[1] Documentation/memory-barriers.txt

  (*) dma_wmb();
  (*) dma_rmb();

  These are for use with consistent memory to guarantee the ordering
  of writes or reads of shared memory accessible to both the CPU and a
  DMA capable device.

  For example, consider a device driver that shares memory with a 
device
  and uses a descriptor status value to indicate if the descriptor 
belongs

  to the device or the CPU, and a doorbell to notify it when new
  descriptors are available:

 if (desc->status != DEVICE_OWN) {
     /* do not read data until we own descriptor */
     dma_rmb();

     /* read/modify data */
     read_data = desc->data;
     desc->data = write_data;

     /* flush modifications before status update */
     dma_wmb();

     /* assign ownership */
     desc->status = DEVICE_OWN;

     /* force memory to sync before notifying device via MMIO */
     wmb();

     /* notify device of new descriptors */
     writel(DESC_NOTIFY, doorbell);
 }

  The dma_rmb() allows us guarantee the device has released ownership
  before we read the data from the descriptor, and the dma_wmb() allows
  us to guarantee the data is written to the descriptor before the 
device
  can see it now has ownership.  The wmb() is needed to guarantee 
that the
  cache coherent memory writes have completed before attempting a 
write to

  the cache incoherent MMIO region.

  See Documentation/DMA-API.txt for more information on consistent 
memory.




On 01/15/2018 01:50 PM, jianchao.wang wrote:

Hi Tariq

Thanks for your kindly response.

On 01/14/2018 05:47 PM, Tariq Toukan wrote:

Thanks Jianchao for your patch.

And Thank you guys for your reviews, much appreciated.
I was off-work on Friday and Saturday.

On 14/01/2018 4:40 AM, jianchao.wang wrote:

Dear all

Thanks for the kindly response and reviewing. That's really 
appreciated.


On 01/13/2018 12:46 AM, Eric Dumazet wrote:

Does this need to be dma_wmb(), and should it be in
mlx4_en_update_rx_prod_db ?



+1 on dma_wmb()

On what architecture bug was observed ?


This issue was observed on x86-64.
And I will send a new patch, in which replace wmb() with 
dma_wmb(), to customer

to confirm.


+1 on dma_wmb, let us know once customer confirms.
Please place it within mlx4_en_update_rx_prod_db as suggested.


Yes, I have recommended it to customer.
Once I get the result, I will share it here.
All other calls to mlx4_en_update_rx_prod_db are in control/slow 
path so I prefer being on the safe side, and care less about 
bulking the barrier.


Thanks,
Tariq


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-21 Thread Andrew Lunn
> Right. So if you need to have some additional "parameters" with the
> connection, then I suppose you may want to go with the GenericSerialBus
> route. However, looking at the sample device tree description:
> 
> davinci_mdio: ethernet@5c03 {
> compatible = "ti,davinci_mdio";
> reg = <0x5c03 0x1000>;
> #address-cells = <1>;
> #size-cells = <0>;
> 
> reset-gpios = < 5 GPIO_ACTIVE_LOW>;
> reset-delay-us = <2>;
> 
> ethphy0: ethernet-phy@1 {
> reg = <1>;
> };
> 
> ethphy1: ethernet-phy@3 {
> reg = <3>;
> };
> };
> 
> would pretty much translate directly to this in ACPI if you don't need
> any additional attributes:
> 
>   Device (ETH0) {
>   Name (_ADR, /* PCI address of the NIC */)
> 
>   Device (PHY0) {
>   Name (_ADR, 1)
>   ...
>   } 
> 
>   Device (PHY1) {
>   Name (_ADR, 3)
>   ...
>   } 
>   }
> 
> which looks pretty simple to me. You can also use _DSM and _DSD here to
> pass information (like the protocol number) for the PHY devices to Linux.

I'm not particularly worried about that simple case. Other than, i
don't want people to think that is all that is required. 

For a more full example, take a look at vf610-zii-dev-rev-b.dts. The
Freescale FEC Ethernet controller provides the base MDIO device,
mdio1. On top of this is an MDIO mux, using a few GPIO lines to
enable/disable 3 child MDIO busses. Each of these busses has an
Ethernet Switch. The Ethernet switch exports up to two MDIO busses,
and on these busses are Ethernet PHYs which are embedded inside the
switch. The Ethernet switches are also interrupt controllers, with the
PHYs having interrupt properties which point back to the interrupt
controller in the switch.

So i'm interested in an ACPI proposal which supports this board.

   Andrew


[PATCH] r8169: improve multicast hash filter handling

2018-01-21 Thread Heiner Kallweit
The multicast hash filter is a 64 bit value and the code can be
simplified by using a u64 variable.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/ethernet/realtek/r8169.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 272c5962e..9afe7cd4d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5572,7 +5572,7 @@ static void rtl_set_rx_mode(struct net_device *dev)
 {
struct rtl8169_private *tp = netdev_priv(dev);
void __iomem *ioaddr = tp->mmio_addr;
-   u32 mc_filter[2];   /* Multicast hash filter */
+   u64 mc_filter;  /* Multicast hash filter */
int rx_mode;
u32 tmp = 0;
 
@@ -5582,20 +5582,20 @@ static void rtl_set_rx_mode(struct net_device *dev)
rx_mode =
AcceptBroadcast | AcceptMulticast | AcceptMyPhys |
AcceptAllPhys;
-   mc_filter[1] = mc_filter[0] = 0x;
+   mc_filter = ~0ULL;
} else if ((netdev_mc_count(dev) > multicast_filter_limit) ||
   (dev->flags & IFF_ALLMULTI)) {
/* Too many to filter perfectly -- accept all multicasts. */
rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
-   mc_filter[1] = mc_filter[0] = 0x;
+   mc_filter = ~0ULL;
} else {
struct netdev_hw_addr *ha;
 
rx_mode = AcceptBroadcast | AcceptMyPhys;
-   mc_filter[1] = mc_filter[0] = 0;
+   mc_filter = 0ULL;
netdev_for_each_mc_addr(ha, dev) {
int bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26;
-   mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
+   mc_filter |= BIT_ULL(bit_nr);
rx_mode |= AcceptMulticast;
}
}
@@ -5605,18 +5605,14 @@ static void rtl_set_rx_mode(struct net_device *dev)
 
tmp = (RTL_R32(RxConfig) & ~RX_CONFIG_ACCEPT_MASK) | rx_mode;
 
-   if (tp->mac_version > RTL_GIGA_MAC_VER_06) {
-   u32 data = mc_filter[0];
-
-   mc_filter[0] = swab32(mc_filter[1]);
-   mc_filter[1] = swab32(data);
-   }
+   if (tp->mac_version > RTL_GIGA_MAC_VER_06)
+   swab64s(_filter);
 
if (tp->mac_version == RTL_GIGA_MAC_VER_35)
-   mc_filter[1] = mc_filter[0] = 0x;
+   mc_filter = ~0ULL;
 
-   RTL_W32(MAR0 + 4, mc_filter[1]);
-   RTL_W32(MAR0 + 0, mc_filter[0]);
+   RTL_W32(MAR0 + 4, mc_filter >> 32);
+   RTL_W32(MAR0 + 0, mc_filter & GENMASK_ULL(31, 0));
 
RTL_W32(RxConfig, tmp);
 }
-- 
2.16.0



[PATCH] net: gemini: Depend on HAS_IOMEM

2018-01-21 Thread Linus Walleij
The zeroday builder notices that since Usermode Linux does not
have IO memory, the build fails for them when selecting everything
it can enable.

As the driver is clearly using memory-mapped registers to access
the network adapter, we add depends on HAS_IOMEM to solve this
problem.

Reported-by: kbuild test robot 
Signed-off-by: Linus Walleij 
---
 drivers/net/ethernet/cortina/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cortina/Kconfig 
b/drivers/net/ethernet/cortina/Kconfig
index 0df743ea51f1..89bc4579724d 100644
--- a/drivers/net/ethernet/cortina/Kconfig
+++ b/drivers/net/ethernet/cortina/Kconfig
@@ -14,6 +14,7 @@ if NET_VENDOR_CORTINA
 config GEMINI_ETHERNET
tristate "Gemini Gigabit Ethernet support"
depends on OF
+   depends on HAS_IOMEM
select PHYLIB
select CRC32
---help---
-- 
2.14.3



Re: [net-next:master 308/357] gemini.c:undefined reference to `devm_ioremap_resource'

2018-01-21 Thread Linus Walleij
On Sat, Jan 20, 2018 at 12:33 PM, kbuild test robot
 wrote:

> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> master
> head:   8565d26bcb2ff6df646e946d2913fcf706d46b66
> commit: 4d5ae32f5e1e13f7f36d6439ec3257993b9f5b88 [308/357] net: ethernet: Add 
> a driver for Gemini gigabit ethernet
> config: um-allyesconfig (attached as .config)
> compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
> reproduce:
> git checkout 4d5ae32f5e1e13f7f36d6439ec3257993b9f5b88
> # save the attached .config to linux build tree
> make ARCH=um

depends on HAS_IOMEM right?

Yeah UM is a real special case.

I'll send a patch.

Yours,
Linus Walleij


Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-21 Thread Mika Westerberg
On Sun, Jan 21, 2018 at 02:08:40AM +0100, Andrew Lunn wrote:
> > I'm not familiar with MDIO bus but an alternative to GeneriSerialBus
> > would be to follow what SDIO is doing, e.g have the PHY devices listed
> > below the MDIO controller and use _ADR to describe their "address" on
> > that bus. You can see how _ADR applies to SDIO bus from ACPI spec.
> 
> Hi Mika
> 
> SDIO is not a serial bus, well it can be in its simplest form, but
> high speed implementations have 4 data lines. So i can understand them
> not using GenericSerialBus.

Well, I think SDIO is more of a serial bus pretty much the same way than
SPI but it can support more data lines as needed (in the same way than
SPI). But I'm not an expert in SDIO.

However, that's not the point here :-) SATA (which is definitely a serial
bus) uses the same mechanism and not GenericSerialBus.

> MDIO is a serial bus, very similar to SPI, I2C, and UART.
> 
> > If you go with the SDIO way then each PHY is described as normal ACPI
> > device and you can use ACPI _HID/_CID to match the device to the
> > corresponding driver.
> 
> Just some background here. If you have a plain PHY as a device on an
> MDIO bus, you don't need to match it to a driver within ACPI.
> Registers 2 and 3 contain a vendor and product ID. That is what it
> used to match the device to the driver.
> 
> What you might need to know is the protocol to talk on the bus. Most
> devices use clause 22 protocol. A few devices are clause 45. 22 is the
> default in Linux, and you need to indicate if 45 should be used. You
> can also indicate 22.
> 
> It gets more complex when the device on the bus is not a PHY. It is a
> generic bus, you can connect anything to it. Ethernet switches can be
> on the bus. They generally cannot be identified using registers 2 and
> 3. So you do need to match the device to the driver. Most do have ID
> registers, so the driver can work out what specific device is on the
> bus. However, Marvell moved the ID registers on there newer generation
> of devices, so we need to give the driver a hint where to look. So in
> device tree, we have two different compatible string.
> 
> Broadcom really do use it as a generic bus. They have their USB PHYs
> and PCIE PHYs on an MDIO bus. In DT, they have compatible strings to
> match the device to the driver, as normal.

OK, thanks for the explanation.

> We need to ensure what we define for ACPI has the same level of
> flexibility.

Right. So if you need to have some additional "parameters" with the
connection, then I suppose you may want to go with the GenericSerialBus
route. However, looking at the sample device tree description:

davinci_mdio: ethernet@5c03 {
compatible = "ti,davinci_mdio";
reg = <0x5c03 0x1000>;
#address-cells = <1>;
#size-cells = <0>;

reset-gpios = < 5 GPIO_ACTIVE_LOW>;
reset-delay-us = <2>;

ethphy0: ethernet-phy@1 {
reg = <1>;
};

ethphy1: ethernet-phy@3 {
reg = <3>;
};
};

would pretty much translate directly to this in ACPI if you don't need
any additional attributes:

Device (ETH0) {
Name (_ADR, /* PCI address of the NIC */)

Device (PHY0) {
Name (_ADR, 1)
...
} 

Device (PHY1) {
Name (_ADR, 3)
...
} 
}

which looks pretty simple to me. You can also use _DSM and _DSD here to
pass information (like the protocol number) for the PHY devices to Linux.


Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread Tariq Toukan



On 19/01/2018 5:49 PM, Eric Dumazet wrote:

On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:

Hi Tariq

Very sad that the crash was reproduced again after applied the patch.

--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct 
mlx4_en_rx_ring *ring)
  
  static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)

  {
+   dma_wmb();


So... is wmb() here fixing the issue ?


*ring->wqres.db.db = cpu_to_be32(ring->prod & 0x);
  }

I analyzed the kdump, it should be a memory corruption.

Thanks
Jianchao


Hmm, this is actually consistent with the example below [1].

AIU from the example, it seems that the dma_wmb/dma_rmb barriers are 
good for synchronizing cpu/device accesses to the "Streaming DMA mapped" 
buffers (the descriptors, went through the dma_map_page() API), but not 
for the doorbell (a coherent memory, typically allocated via 
dma_alloc_coherent) that requires using the stronger wmb() barrier.



[1] Documentation/memory-barriers.txt

 (*) dma_wmb();
 (*) dma_rmb();

 These are for use with consistent memory to guarantee the ordering
 of writes or reads of shared memory accessible to both the CPU and a
 DMA capable device.

 For example, consider a device driver that shares memory with a device
 and uses a descriptor status value to indicate if the descriptor 
belongs

 to the device or the CPU, and a doorbell to notify it when new
 descriptors are available:

if (desc->status != DEVICE_OWN) {
/* do not read data until we own descriptor */
dma_rmb();

/* read/modify data */
read_data = desc->data;
desc->data = write_data;

/* flush modifications before status update */
dma_wmb();

/* assign ownership */
desc->status = DEVICE_OWN;

/* force memory to sync before notifying device via MMIO */
wmb();

/* notify device of new descriptors */
writel(DESC_NOTIFY, doorbell);
}

 The dma_rmb() allows us guarantee the device has released ownership
 before we read the data from the descriptor, and the dma_wmb() allows
 us to guarantee the data is written to the descriptor before the 
device
 can see it now has ownership.  The wmb() is needed to guarantee 
that the
 cache coherent memory writes have completed before attempting a 
write to

 the cache incoherent MMIO region.

 See Documentation/DMA-API.txt for more information on consistent 
memory.




On 01/15/2018 01:50 PM, jianchao.wang wrote:

Hi Tariq

Thanks for your kindly response.

On 01/14/2018 05:47 PM, Tariq Toukan wrote:

Thanks Jianchao for your patch.

And Thank you guys for your reviews, much appreciated.
I was off-work on Friday and Saturday.

On 14/01/2018 4:40 AM, jianchao.wang wrote:

Dear all

Thanks for the kindly response and reviewing. That's really appreciated.

On 01/13/2018 12:46 AM, Eric Dumazet wrote:

Does this need to be dma_wmb(), and should it be in
mlx4_en_update_rx_prod_db ?



+1 on dma_wmb()

On what architecture bug was observed ?


This issue was observed on x86-64.
And I will send a new patch, in which replace wmb() with dma_wmb(), to customer
to confirm.


+1 on dma_wmb, let us know once customer confirms.
Please place it within mlx4_en_update_rx_prod_db as suggested.


Yes, I have recommended it to customer.
Once I get the result, I will share it here.

All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I 
prefer being on the safe side, and care less about bulking the barrier.

Thanks,
Tariq


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html