Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Eric Dumazet



On 12/08/2018 12:14 PM, Ilias Apalodimas wrote:
> On Sat, Dec 08, 2018 at 09:11:53PM +0100, Jesper Dangaard Brouer wrote:
>>>  
>>> I want to make sure you guys thought about splice() stuff, and
>>> skb_try_coalesce(), and GRO, and skb cloning, and ...
>>
>> Thanks for the pointers. To Ilias, we need to double check skb_try_coalesce()
>> code path, as it does look like we don't handle this correctly.
>>
> 
> Noted. We did check skb cloning, but not this one indeed

There is also TCP RX zero copy to consider ( tcp_zerocopy_receive() )
but more generally all paths that can grab pages from skbs.
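
To make the concern concrete (a reader's sketch, not code from the
series): any of those paths may take its own reference on a fragment
page and keep it long after the skb is gone, so a recycle scheme cannot
assume it always gets the page back:

	/* illustrative only */
	struct page *page = skb_frag_page(&skb_shinfo(skb)->frags[0]);

	get_page(page);	/* the page now outlives the skb ... */
	/* ... so the altered skb_free_head()/frag freeing must fall back
	 * to plain put_page() semantics whenever the driver is no longer
	 * the sole owner.
	 */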






Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Willy Tarreau
On Sat, Dec 08, 2018 at 10:14:47PM +0200, Ilias Apalodimas wrote:
> On Sat, Dec 08, 2018 at 09:11:53PM +0100, Jesper Dangaard Brouer wrote:
> > >  
> > > I want to make sure you guys thought about splice() stuff, and
> > > skb_try_coalesce(), and GRO, and skb cloning, and ...
> > 
> > Thanks for the pointers. To Ilias, we need to double check 
> > skb_try_coalesce()
> > code path, as it does look like we don't handle this correctly.
> > 
> 
> Noted. We did check skb cloning, but not this one indeed

I'll try to run some tests on my clearfog later, but for now I'm still
100% busy until haproxy 1.9 gets released.

Cheers,
Willy


Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Andrew Lunn
> That is not how XDP works. XDP must run on the "master" physical driver
> interface, as the "slave" interface is a virtual DSA interface.  I did
> mention DSA because I'm handling that on the EspressoBin implementation.

Hi Jesper

It is going to be interesting to see how you do that.

As a user, i want to attach the XDP program to a slave interface. So i
assume you somehow redirect that to the master, with some extra meta
data to allow XDP on the master to detect packets for a specific
slave?

   Andrew


Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Jesper Dangaard Brouer
On Sat, 8 Dec 2018 19:43:38 +0100
Björn Töpel  wrote:

> Den lör 8 dec. 2018 kl 15:52 skrev Jesper Dangaard Brouer :
> >
> > On Fri, 7 Dec 2018 15:01:55 +0100
> > Björn Töpel  wrote:
> >  
> > > Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer 
> > > :  
> > > >
> > > > On Fri,  7 Dec 2018 12:44:24 +0100
> > > > Björn Töpel  wrote:
> > > >  
> > > > > The rationale behind attach is performance and ease of use. Many XDP
> > > > > socket users just need a simple way of creating/binding a socket and
> > > > > receiving frames right away without loading an XDP program.
> > > > >
> > > > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > > > is a kernel provided XDP program that is installed to the netdev when
> > > > > XDP_ATTACH is being passed as a bind() flag.
> > > > >
> > > > > The builtin program is the simplest program possible to redirect a
> > > > > frame to an attached socket. In restricted C it would look like this:
> > > > >
> > > > >   SEC("xdp")
> > > > >   int xdp_prog(struct xdp_md *ctx)
> > > > >   {
> > > > > return bpf_xsk_redirect(ctx);
> > > > >   }
> > > > >
> > > > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > > > from regular XDP programs. The easiest way to look at it is as a
> > > > > 2-level hierarchy, where regular XDP programs have precedence over the
> > > > > builtin one.
> > > > >
> > > > > If no regular XDP program is installed to the netdev, the builtin will
> > > > > be installed. If the builtin program is installed, and a regular one is
> > > > > installed, the regular XDP program will have precedence over the builtin
> > > > > one.
> > > > >
> > > > > Further, if a regular program is installed, and later removed, the
> > > > > builtin one will automatically be installed.
> > > > >
> > > > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which map to the
> > > > > corresponding XDP netlink install flags.
> > > > >
> > > > > The builtin XDP program functionally adds even more complexity to the
> > > > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > > > store the program in the struct net_device together with the install
> > > > > flags instead of calling the ndo_bpf multiple times?  
> > > >
> > > > (As far as I can see from reading the code, correct me if I'm wrong.)
> > > >
> > > > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > > > builtin-program as the XDP program on the "entire" device.  That means
> > > > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > > > actually only relevant for the specific queue_index.  Yes, the helper
> > > > call does check the 'xdp->rxq->queue_index' for an attached 'xsk'
> > > > and returns XDP_PASS if it is NULL:
> > > >  
> > >
> > > Yes, you are correct. The builtin XDP program, just like a regular XDP
> > > program, affects the whole netdev. So, yes the non-AF_XDP queues would
> > > get a performance hit from this. Just to reiterate -- this isn't new
> > > for this series. This has always been the case for XDP when acting on
> > > just one queue.
> > >  
> > > > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > > > +{
> > > > +   struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > > > +   struct xdp_sock *xsk;
> > > > +
> > > > +   xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > > > +   if (xsk) {
> > > > +   ri->xsk = xsk;
> > > > +   return XDP_REDIRECT;
> > > > +   }
> > > > +
> > > > +   return XDP_PASS;
> > > > +}
> > > >
> > > > Why does every normal XDP_PASS packet have to pay this overhead (indirect
> > > > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > > > is tied hard and only relevant to a specific RX-queue (which is why we
> > > > get a performance boost due to SPSC queues).
> > > >
> > > > I acknowledge there is a need for this, but this use-case shows there
> > > > is a need for attaching XDP programs per RX-queue basis.
> > > >  
> > >
> > > From my AF_XDP perspective, having a program per queue would make
> > > sense. The discussion of a per-queue program has come up before, and I think
> > > the conclusion was that it would be too complex from a
> > > configuration/tooling point-of-view. Again, for AF_XDP this would be
> > > great.  
> >
> > I really think it is about time that we introduce these per-queue XDP
> > programs.  From a configuration/tooling point-of-view, your proposed
> > solution has the same issues we have to solve (like listing/introspection
> > needs to show the per-queue XDP/BPF programs). And you have almost
> > solved it, by keeping the per-queue program in the net_device struct.
> >  
> 
> I just want to emphasize: my series is only a convenience method for
> loading a (sticky/hierarchical) XDP program and a new 

Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Jesper Dangaard Brouer
On Sat, 8 Dec 2018 17:55:05 +0100
Andrew Lunn  wrote:

> On Sat, Dec 08, 2018 at 04:12:12PM +0100, Jesper Dangaard Brouer wrote:
> > On Fri, 7 Dec 2018 13:21:08 -0800
> > Alexei Starovoitov  wrote:
> >   
> > > for production I suspect the users would want
> > > an easy way to stay safe when they're playing with AF_XDP.
> > > So another builtin program that redirects ssh and ping traffic
> > > back to the kernel would be a nice addition.  
> > 
> > Are you saying a builtin program that needs to parse different kinds of
> > Eth-type headers (DSA, VLAN, QinQ)  
> 
> Hi Jesper
> 
> Parsing DSA headers is quite hard, since most of them are not
> discoverable, you need to know they are there, and where they actually
> are.
> 
> I also don't think there are any use cases for actually trying to
> parse them on the master interface. You are more likely to want to run
> the eBPF program on the slave interface, once the DSA header has been
> removed, and it is clear what interface the frame is actually for.

That is not how XDP works. XDP must run on the "master" physical driver
interface, as the "slave" interface is a virtual DSA interface.  I did
mention DSA because I'm handling that on the EspressoBin implementation.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Ilias Apalodimas
On Sat, Dec 08, 2018 at 12:21:10PM -0800, David Miller wrote:
> From: Ilias Apalodimas 
> Date: Sat, 8 Dec 2018 16:57:28 +0200
> 
> > The patchset speeds up the mvneta driver on the default network
> > stack. The only change that was needed was to adapt the driver to
> > using the page_pool API. The speed improvements we are seeing on
> > specific workloads (i.e 256b < packet < 400b) are almost 3x.
> > 
> > Lots of high speed drivers are doing similar recycling tricks themselves 
> > (and
> > there's no common code, everyone is doing something similar though). All we 
> > are
> > trying to do is provide a unified API to make that easier for the rest. 
> > Another
> > advantage is that if some drivers switch to the API, adding XDP
> > functionality on them is pretty trivial.
> 
> Yeah this is a very important point moving forward.
> 
> Jesse Brandeberg brought the following up to me at LPC and I'd like to
> develop it further.
> 
> Right now we tell driver authors to write a new driver as SKB based,
> and once they've done all of that work we tell them to basically
> shoe-horn XDP support into that somewhat different framework.
> 
> Instead, the model should be the other way around, because with a raw
> meta-data free set of data buffers we can always construct an SKB or
> pass it to XDP.

Yeah, exactly, and it gets even worse. If the driver writer doesn't go through the
'proper' path, i.e. allocate buffers and use build_skb, you end up having to
rewrite dma/memory management for the normal stack. So it's more than
'shoe-horning' XDP, it's re-writing and re-testing the whole thing.
The API also offers dma mapping capabilities (configurable). So you remove 
potential nasty bugs there as well.

> 
> So drivers should be targeting some raw data buffer kind of interface
> which takes care of all of this stuff.  If the buffers get wrapped
> into an SKB and get pushed into the traditional networking stack, the
> driver shouldn't know or care.  Likewise if it ends up being processed
> with XDP, it should not need to know or care.
> 
> All of those details should be behind a common layer.  Then we can
> control:
> 
> 1) Buffer handling, recycling, "fast paths"
> 
> 2) Statistics
> 
> 3) XDP feature sets
> 
> We can consolidate behavior and semantics across all of the drivers
> if we do this.  No more talk about "supporting all XDP features",
> and the inconsistencies we have because of that.
> 
> The whole common statistics discussion could be resolved with this
> common layer as well.
> 
> We'd be able to control and properly optimize everything.


/Ilias


Re: [PATCH net-next] ip: silence udp zerocopy smatch false positive

2018-12-08 Thread David Miller
From: Willem de Bruijn 
Date: Sat,  8 Dec 2018 06:22:46 -0500

> From: Willem de Bruijn 
> 
> extra_uref is used in __ip(6)_append_data only if uarg is set.
> 
> Smatch sees that the variable is passed to sock_zerocopy_put_abort.
> This function accesses it only when uarg is set, but smatch cannot
> infer this.
> 
> Make this dependency explicit.
> 
> Fixes: 52900d22288e ("udp: elide zerocopy operation in hot path")
> Signed-off-by: Willem de Bruijn 

I looked and can't figure out a better way to fix this :)

Applied, thanks Willem.
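
(A reader's illustration of the class of warning being silenced here;
all names below are made up, this is not the actual patch. The flag is
only meaningful when a companion pointer is set, which some static
checkers cannot correlate on their own, so the initialization is made
unconditional:)

	#include <stdbool.h>
	#include <stddef.h>

	static void put_ref(void *uarg, bool extra_ref)
	{
		if (uarg && extra_ref)
			;	/* drop the extra reference */
	}

	static void append_data(void *uarg)
	{
		bool extra_ref = false;	/* explicit: never read uninitialized */

		if (uarg)
			extra_ref = true;

		put_ref(uarg, extra_ref);
	}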


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread David Miller
From: Ilias Apalodimas 
Date: Sat, 8 Dec 2018 16:57:28 +0200

> The patchset speeds up the mvneta driver on the default network
> stack. The only change that was needed was to adapt the driver to
> using the page_pool API. The speed improvements we are seeing on
> specific workloads (i.e 256b < packet < 400b) are almost 3x.
> 
> Lots of high speed drivers are doing similar recycling tricks themselves (and
> there's no common code, everyone is doing something similar though). All we 
> are
> trying to do is provide a unified API to make that easier for the rest. 
> Another
> advantage is that if some drivers switch to the API, adding XDP
> functionality on them is pretty trivial.

Yeah this is a very important point moving forward.

Jesse Brandeberg brought the following up to me at LPC and I'd like to
develop it further.

Right now we tell driver authors to write a new driver as SKB based,
and once they've done all of that work we tell them to basically
shoe-horn XDP support into that somewhat different framework.

Instead, the model should be the other way around, because with a raw
meta-data free set of data buffers we can always construct an SKB or
pass it to XDP.

So drivers should be targeting some raw data buffer kind of interface
which takes care of all of this stuff.  If the buffers get wrapped
into an SKB and get pushed into the traditional networking stack, the
driver shouldn't know or care.  Likewise if it ends up being processed
with XDP, it should not need to know or care.

All of those details should be behind a common layer.  Then we can
control:

1) Buffer handling, recycling, "fast paths"

2) Statistics

3) XDP feature sets

We can consolidate behavior and semantics across all of the drivers
if we do this.  No more talk about "supporting all XDP features",
and the inconsistencies we have because of that.

The whole common statistics discussion could be resolved with this
common layer as well.

We'd be able to control and properly optimize everything.
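
(As a reader's sketch of the model being argued for -- nothing like this
exists in the tree, every name below is invented for illustration -- the
driver would hand raw buffers to one common layer and stop caring what
happens next:)

	/* hypothetical common RX entry point, not a real kernel API */
	struct raw_rx_buf {
		struct page	*page;	/* backing page, possibly recycled  */
		void		*data;	/* start of the received frame      */
		unsigned int	len;	/* frame length                     */
		dma_addr_t	dma;	/* mapping kept by the buffer layer */
	};

	int common_netdev_rx(struct net_device *dev, struct raw_rx_buf *buf);
	/* Inside: run the attached XDP program, if any; on XDP_PASS (or no
	 * program) wrap the buffer with build_skb() and hand it to
	 * napi_gro_receive().  Buffer recycling, statistics and the XDP
	 * feature set (points 1-3 above) all live behind this one call.
	 */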


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Ilias Apalodimas
On Sat, Dec 08, 2018 at 09:11:53PM +0100, Jesper Dangaard Brouer wrote:
> >  
> > I want to make sure you guys thought about splice() stuff, and
> > skb_try_coalesce(), and GRO, and skb cloning, and ...
> 
> Thanks for the pointers. To Ilias, we need to double check skb_try_coalesce()
> code path, as it does look like we don't handle this correctly.
> 

Noted. We did check skb cloning, but not this one indeed

/Ilias


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Jesper Dangaard Brouer
On Sat, 8 Dec 2018 11:26:56 -0800
Eric Dumazet  wrote:

> On 12/08/2018 06:57 AM, Ilias Apalodimas wrote:
> > Hi Eric,   
>  This patch is changing struct sk_buff, and is thus per-definition
>  controversial.
> 
>  Place a new member 'mem_info' of type struct xdp_mem_info, just after
>  members (flags) head_frag and pfmemalloc, And not in between
>  headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
>  Copying mem_info during skb_clone() is required.  This makes sure that
>  pages are correctly freed or recycled during the altered
>  skb_free_head() invocation.  
> >>>
> >>> I read this to mean that this 'info' isn't accessed/needed until skb
> >>> is freed.  Any reason its not added at the end?
> >>>
> >>> This would avoid moving other fields that are probably accessed
> >>> more frequently during processing.
> >>>  
> >>
> >> But I do not get why the patch is needed.
> >>
> >> Adding extra cost for each skb destruction is costly.
> >>
> >> I thought XDP was all about _not_ having skbs.  
> > 
> > You hit the only point i don't personally like in the code, xdp info in the 
> > skb. Considering the benefits i am convinced this is ok and here's why:
> >   
> >> Please let's not slow down the non-XDP stack only to make XDP more 
> >> appealing.  
> > 
> > We are not trying to do that, on the contrary. The patchset does nothing
> > towards speeding up XDP at the expense of anything else. The patchset speeds up the 
> > mvneta driver on the default network stack. The only change that was needed 
> > was
> > to adapt the driver to using the page_pool API. The speed improvements we 
> > are
> > seeing on specific workloads (i.e 256b < packet < 400b) are almost 3x.
> > 
> > Lots of high speed drivers are doing similar recycling tricks themselves 
> > (and
> > there's no common code, everyone is doing something similar though). All we 
> > are
> > trying to do is provide a unified API to make that easier for the rest. 
> > Another
> > advantage is that if some drivers switch to the API, adding XDP
> > functionality on them is pretty trivial.
> > 
> > Moreover, our tests were only performed on systems without an SMMU, or
> > with the SMMU disabled.
> > On a platform i have access to, enabling and disabling the SMMU has some
> > performance impact. By keeping the buffers mapped we believe this impact
> > will be substantially less (i'll come back with results once i have them on
> > this).
> > 
> > I do understand your point, but in my head the potential advantages
> > outweigh that by a lot.
> > 
> > I got other concerns on the patchset though. Like how much memory is it 
> > 'ok' to
> > keep mapped keeping in mind we are using the streaming DMA API. Are we 
> > going to
> > affect anyone else negatively by doing so ?
> >   
>  
> I want to make sure you guys thought about splice() stuff, and
> skb_try_coalesce(), and GRO, and skb cloning, and ...

Thanks for the pointers. To Ilias, we need to double check skb_try_coalesce()
code path, as it does look like we don't handle this correctly.

> I do not see how an essential property of page fragments would be
> attached to sk_buff, without adding extra code all over the places.
> 
> Page recycling in drivers is fine, but should operate on pages that
> the driver owns.
> 

I think we have addressed this.  We are only recycling pages that the
driver owns.  In the case of fragments, we release the DMA-mapping and
don't recycle the page; instead the normal code path is taken (which is
what is missing in the skb_try_coalesce case).
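
(A reader's sketch of that rule; the recycle helper name is a
placeholder, not the page_pool API from this series:)

	if (page_ref_count(page) == 1) {
		/* driver is the sole owner: keep the DMA mapping, recycle */
		my_pool_recycle(pool, page);
	} else {
		/* someone else still holds a reference: release the
		 * mapping and let the normal page free path run
		 */
		dma_unmap_page(dev, dma_addr, PAGE_SIZE, DMA_FROM_DEVICE);
		put_page(page);
	}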

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Sat, 8 Dec 2018 12:36:10 +0100

> The annoying part is actually that it depends on the kernel config
> options CONFIG_XFRM, CONFIG_NF_CONNTRACK and CONFIG_BRIDGE_NETFILTER
> whether there is a cache-line split, where mem_info gets moved into the
> next cacheline.

Note that Florian Westphal's work (trying to help MP-TCP) would
eliminate this variability.


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Eric Dumazet



On 12/08/2018 06:57 AM, Ilias Apalodimas wrote:
> Hi Eric, 
 This patch is changing struct sk_buff, and is thus per-definition
 controversial.

 Place a new member 'mem_info' of type struct xdp_mem_info, just after
 members (flags) head_frag and pfmemalloc, And not in between
 headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
 Copying mem_info during skb_clone() is required.  This makes sure that
 pages are correctly freed or recycled during the altered
 skb_free_head() invocation.
>>>
>>> I read this to mean that this 'info' isn't accessed/needed until skb
>>> is freed.  Any reason its not added at the end?
>>>
>>> This would avoid moving other fields that are probably accessed
>>> more frequently during processing.
>>>
>>
>> But I do not get why the patch is needed.
>>
>> Adding extra cost for each skb destruction is costly.
>>
>> I thought XDP was all about _not_ having skbs.
> 
> You hit the only point i don't personally like in the code, xdp info in the 
> skb. Considering the benefits i am convinced this is ok and here's why:
> 
>> Please let's not slow down the non-XDP stack only to make XDP more 
>> appealing.
> 
> We are not trying to do that, on the contrary. The patchset does nothing
> towards speeding up XDP at the expense of anything else. The patchset speeds up the 
> mvneta driver on the default network stack. The only change that was needed 
> was
> to adapt the driver to using the page_pool API. The speed improvements we are
> seeing on specific workloads (i.e 256b < packet < 400b) are almost 3x.
> 
> Lots of high speed drivers are doing similar recycling tricks themselves (and
> there's no common code, everyone is doing something similar though). All we 
> are
> trying to do is provide a unified API to make that easier for the rest. 
> Another
> advantage is that if some drivers switch to the API, adding XDP
> functionality on them is pretty trivial.
> 
> Moreover, our tests were only performed on systems without an SMMU, or
> with the SMMU disabled.
> On a platform i have access to, enabling and disabling the SMMU has some
> performance impact. By keeping the buffers mapped we believe this impact
> will be substantially less (i'll come back with results once i have them on
> this).
> 
> I do understand your point, but in my head the potential advantages
> outweigh that by a lot.
> 
> I got other concerns on the patchset though. Like how much memory is it 'ok' 
> to
> keep mapped keeping in mind we are using the streaming DMA API. Are we going 
> to
> affect anyone else negatively by doing so ?
> 


I want to make sure you guys thought about splice() stuff, and
skb_try_coalesce(), and GRO, and skb cloning, and ...

I do not see how an essential property of page fragments would be attached
to sk_buff, without adding extra code all over the places.

Page recycling in drivers is fine, but should operate on pages that the driver 
owns.

Messing with skb->head seems not needed for performance, since skb->head can be
regular slab allocated.



Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Björn Töpel
Den lör 8 dec. 2018 kl 16:12 skrev Jesper Dangaard Brouer :
>
> On Fri, 7 Dec 2018 13:21:08 -0800
> Alexei Starovoitov  wrote:
>
> > for production I suspect the users would want
> > an easy way to stay safe when they're playing with AF_XDP.
> > So another builtin program that redirects ssh and ping traffic
> > back to the kernel would be a nice addition.
>
> Are you saying a builtin program that needs to parse different kinds of
> Eth-type headers (DSA, VLAN, QinQ) and find the TCP port to match port
> 22 to return XDP_PASS, or else call the AF_XDP redirect? That seems to be
> pure overhead for this fast-path builtin program for AF_XDP.
>
> Would a solution be to install a NIC hardware filter that redirects SSH
> port 22 to another RX-queue, and then have a builtin program that
> returns XDP_PASS installed on that RX-queue?  And change Björn's
> semantics, such that RX-queue programs take precedence over the global
> XDP program. This would also be a good fail-safe in general for XDP.
>

Exactly this; I'd say this is the most common way of using AF_XDP,
i.e. steer a certain flow to a Rx queue, and *all* packets on that
queue are pulled to the AF_XDP socket. This is why the builtin is the
way it is, it's what is being used, not only for benchmarking
scenarios.

> If the RX-queues take precedence, I can use this fail-safe approach.
> E.g. when I want to test my new global XDP program, I'll use ethtool to
> match my management IP and send that to a specific RX-queue and my
> fail-safe BPF program.
>

Interesting take to have a *per queue* XDP program that overrides the
regular one. Maybe this is a better way to use builtins?

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Björn Töpel
Den lör 8 dec. 2018 kl 15:52 skrev Jesper Dangaard Brouer :
>
> On Fri, 7 Dec 2018 15:01:55 +0100
> Björn Töpel  wrote:
>
> > Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer 
> > :
> > >
> > > On Fri,  7 Dec 2018 12:44:24 +0100
> > > Björn Töpel  wrote:
> > >
> > > > The rationale behind attach is performance and ease of use. Many XDP
> > > > socket users just need a simple way of creating/binding a socket and
> > > > receiving frames right away without loading an XDP program.
> > > >
> > > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > > is a kernel provided XDP program that is installed to the netdev when
> > > > XDP_ATTACH is being passed as a bind() flag.
> > > >
> > > > The builtin program is the simplest program possible to redirect a
> > > > frame to an attached socket. In restricted C it would look like this:
> > > >
> > > >   SEC("xdp")
> > > >   int xdp_prog(struct xdp_md *ctx)
> > > >   {
> > > > return bpf_xsk_redirect(ctx);
> > > >   }
> > > >
> > > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > > from regular XDP programs. The easiest way to look at it is as a
> > > > 2-level hierarchy, where regular XDP programs have precedence over the
> > > > builtin one.
> > > >
> > > > If no regular XDP program is installed to the netdev, the builtin will
> > > > be installed. If the builtin program is installed, and a regular one is
> > > > installed, the regular XDP program will have precedence over the builtin
> > > > one.
> > > >
> > > > Further, if a regular program is installed, and later removed, the
> > > > builtin one will automatically be installed.
> > > >
> > > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which map to the
> > > > corresponding XDP netlink install flags.
> > > >
> > > > The builtin XDP program functionally adds even more complexity to the
> > > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > > store the program in the struct net_device together with the install
> > > > flags instead of calling the ndo_bpf multiple times?
> > >
> > > (As far as I can see from reading the code, correct me if I'm wrong.)
> > >
> > > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > > builtin-program as the XDP program on the "entire" device.  That means
> > > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > > actually only relevant for the specific queue_index.  Yes, the helper
> > > call does check the 'xdp->rxq->queue_index' for an attached 'xsk'
> > > and returns XDP_PASS if it is NULL:
> > >
> >
> > Yes, you are correct. The builtin XDP program, just like a regular XDP
> > program, affects the whole netdev. So, yes the non-AF_XDP queues would
> > get a performance hit from this. Just to reiterate -- this isn't new
> > for this series. This has always been the case for XDP when acting on
> > just one queue.
> >
> > > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > > +{
> > > +   struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > > +   struct xdp_sock *xsk;
> > > +
> > > +   xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > > +   if (xsk) {
> > > +   ri->xsk = xsk;
> > > +   return XDP_REDIRECT;
> > > +   }
> > > +
> > > +   return XDP_PASS;
> > > +}
> > >
> > > Why does every normal XDP_PASS packet have to pay this overhead (indirect
> > > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > > is tied hard and only relevant to a specific RX-queue (which is why we
> > > get a performance boost due to SPSC queues).
> > >
> > > I acknowledge there is a need for this, but this use-case shows there
> > > is a need for attaching XDP programs per RX-queue basis.
> > >
> >
> > From my AF_XDP perspective, having a program per queue would make
> > sense. The discussion of a per-queue program has come up before, and I think
> > the conclusion was that it would be too complex from a
> > configuration/tooling point-of-view. Again, for AF_XDP this would be
> > great.
>
> I really think it is about time that we introduce these per-queue XDP
> programs.  From a configuration/tooling point-of-view, your proposed
> solution has the same issues we have to solve (like listing/introspection
> needs to show the per-queue XDP/BPF programs). And you have almost
> solved it, by keeping the per-queue program in the net_device struct.
>

I just want to emphasize: my series is only a convenience method for
loading a (sticky/hierarchical) XDP program and a new bpf-function.
Nothing more. I'm not sure what you mean when you say that the "proposed
solution has the same issues". Do you mean XDP in general?

*I'm* all in for a per-queue XDP program -- but I might be biased due
to AF_XDP :-P Let's explore in this thread what that would look like
and 

Re: [PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-08 Thread Florian Fainelli
Le 12/7/18 à 8:25 PM, Marek Vasut a écrit :
> On 12/08/2018 01:52 AM, tristram...@microchip.com wrote:
>>> -   padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
>>> +   padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb->len;
>>
>> The requirement is the tail tag should be at the end of frame before FCS.
>> When the length is less than 60 the MAC controller will pad the frame to
>> legal size.  That is why this function makes sure the padding is done
>> manually.  Increasing the size just increases the length of the frame and the
>> chance to allocate new socket buffer.
>>
>> Example of using ping size of 18 will have the sizes of request and response
>> differ by 4 bytes.  Not that it matters much.
>>
>> Are you concerned the MAC controller will remove the VLAN tag and so the 
>> frame
>> will not be sent? Or the switch removes the VLAN tag and is not able to send?
> 
> With TI CPSW in dual-ethernet configuration, which adds internal VLAN
> tag at the end of the frame, the KSZ switch fails. The CPU will send out
> packets and the switch will reject them as corrupted. It needs this
> extra VLAN tag padding.

Oh so they add the internal VLAN at the end of the frame, not the
beginning? That is quite surprising, but that would not be the only
oddity found with CPSW, I am afraid. The way I would approach this is
with layering where the padding needs to occur:

- within the tag driver you need to make sure there is enough room to
insert the KSZ tag

- within the ethernet MAC driver (which comes last in the transmit
path), you need to make sure there is enough room to insert that trailer
VLAN tag to permit internal transmission
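
(A reader's sketch of that split; the constants are illustrative and this
is not code from either patch:)

	/* tag driver: only guarantee room for the KSZ tail tag itself */
	if (skb_tailroom(skb) < KSZ_INGRESS_TAG_LEN)
		/* reallocate the skb with more tailroom */;

	/* MAC driver, last in the TX path: pad before appending its own
	 * trailer VLAN tag; skb_put_padto() frees the skb on failure.
	 */
	if (skb_put_padto(skb, VLAN_ETH_ZLEN))
		return NETDEV_TX_OK;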
-- 
Florian


Re: [PATCH net-next,v5 11/12] qede: place ethtool_rx_flow_spec after code after TC flower codebase

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:40:01PM CET, pa...@netfilter.org wrote:
>This is a preparation patch to reuse the existing TC flower codebase
>from ethtool_rx_flow_spec.
>
>This patch is merely moving the core ethtool_rx_flow_spec parser after
>tc flower offload driver code so we can skip a few forward function
>declarations in the follow up patch.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 10/12] dsa: bcm_sf2: use flow_rule infrastructure

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:40:00PM CET, pa...@netfilter.org wrote:
>Update this driver to use the flow_rule infrastructure, hence we can use
>the same code to populate hardware IR from ethtool_rx_flow and the
>cls_flower interfaces.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Andrew Lunn
> I got other concerns on the patchset though. Like how much memory is
> it 'ok' to keep mapped keeping in mind we are using the streaming
> DMA API. Are we going to affect anyone else negatively by doing so ?

For mvneta, you can expect the target to have between 512 Mbytes and
3G of RAM. You can take a look at the DT files to get a better feel for this.
All of it should be low enough that it is DMA'able.

These SoCs are often used for WiFi access points, and NAS boxes.  So i
guess the big memory usage on these boxes is the block cache for a NAS
device. So if you want to see the impact of the driver using more
memory, see if disk performance is affected.

Andrew


Re: [PATCH v2 net-next 0/2] platform data controls for mdio-gpio

2018-12-08 Thread Florian Fainelli
Le 12/8/18 à 7:12 AM, Andrew Lunn a écrit :
> Soon to be mainlined is an x86 platform with a Marvell switch, and a
> bit-banging MDIO bus. In order to make this work, the phy_mask of the
> MDIO bus needs to be set to prevent scanning for PHYs, and the
> phy_ignore_ta_mask needs to be set because the switch has broken
> turnaround.
> 
> Add a platform_data structure with these parameters.

Looks good thanks Andrew. I do wonder if we could define a common
mii_bus_platform_data structure eventually which is comprised of these
two members (if nothing else) and maybe update the common
mdiobus_register() code path to look for these members. If a subsequent
platform data/device MDIO bus shows up we could do that at that time.

Thanks!

> 
> Andrew Lunn (2):
>   net: phy: mdio-gpio: Add platform_data support for phy_mask
>   net: phy: mdio-gpio: Add phy_ignore_ta_mask to platform data
> 
>  MAINTAINERS |  1 +
>  drivers/net/phy/mdio-gpio.c |  7 +++
>  include/linux/platform_data/mdio-gpio.h | 14 ++
>  3 files changed, 22 insertions(+)
>  create mode 100644 include/linux/platform_data/mdio-gpio.h
> 


-- 
Florian


Re: [PATCH v2 net-next 2/2] net: phy: mdio-gpio: Add phy_ignore_ta_mask to platform data

2018-12-08 Thread Florian Fainelli
Le 12/8/18 à 7:12 AM, Andrew Lunn a écrit :
> The Marvell 6390 Ethernet switch family does not perform MDIO
> turnaround correctly. Many hardware MDIO bus masters don't care about
> this, but the bitbangging implementation in Linux does by default. Add
> phy_ignore_ta_mask to the platform data so that the bitbangging code
> can be told which devices are known to get TA wrong.
> 
> v2
> --
> int -> u32 in platform data structure
> 
> Signed-off-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2 net-next 1/2] net: phy: mdio-gpio: Add platform_data support for phy_mask

2018-12-08 Thread Florian Fainelli
Le 12/8/18 à 7:12 AM, Andrew Lunn a écrit :
> It is sometimes necessary to instantiate a bit-banging MDIO bus as a
> platform device, without the aid of device tree.
> 
> When device tree is being used, the bus is not scanned for devices,
> only those devices which are in device tree are probed. Without device
> tree, by default, all addresses on the bus are scanned. This may then
> find a device which is not a PHY, e.g. a switch. And the switch may
> have registers containing values which look like a PHY. So during the
> scan, a PHY device is wrongly created.
> 
> After the bus has been registered, a search is made for
> mdio_board_info structures which indicate devices on the bus, and the
> driver which should be used for them. This is typically used to
> instantiate Ethernet switches from platform drivers.  However, if the
> scanning of the bus has created a PHY device at the same location as
> indicated in the board info for a switch, the switch device is not
> created, since the address is already busy.
> 
> This can be avoided by setting the phy_mask of the mdio bus. This mask
> prevents addresses on the bus being scanned.
> 
> v2
> --
> int -> u32 in platform data structure
> 
> Signed-off-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 
-- 
Florian
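
(Reader's sketch of how a board file might use this; the struct and field
names follow the diffstat in the cover letter, the values are only
examples:)

	#include <linux/platform_data/mdio-gpio.h>

	static struct mdio_gpio_platform_data mdio_pdata = {
		/* don't scan any address; devices come from mdio_board_info */
		.phy_mask		= ~0,
		/* tolerate the switch's broken MDIO turnaround */
		.phy_ignore_ta_mask	= ~0,
	};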


Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Andrew Lunn
On Sat, Dec 08, 2018 at 04:12:12PM +0100, Jesper Dangaard Brouer wrote:
> On Fri, 7 Dec 2018 13:21:08 -0800
> Alexei Starovoitov  wrote:
> 
> > for production I suspect the users would want
> > an easy way to stay safe when they're playing with AF_XDP.
> > So another builtin program that redirects ssh and ping traffic
> > back to the kernel would be a nice addition.
> 
> Are you saying a builtin program that needs to parse different kinds of
> Eth-type headers (DSA, VLAN, QinQ)

Hi Jesper

Parsing DSA headers is quite hard, since most of them are not
discoverable, you need to know they are there, and where they actually
are.

I also don't think there are any use cases for actually trying to
parse them on the master interface. You are more likely to want to run
the eBPF program on the slave interface, once the DSA header has been
removed, and it is clear what interface the frame is actually for.

 Andrew


Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Jesper Dangaard Brouer
On Fri, 7 Dec 2018 13:21:08 -0800
Alexei Starovoitov  wrote:

> for production I suspect the users would want
> an easy way to stay safe when they're playing with AF_XDP.
> So another builtin program that redirects ssh and ping traffic
> back to the kernel would be a nice addition.

Are you saying a builtin program that needs to parse different kinds of
Eth-type headers (DSA, VLAN, QinQ) and find the TCP port to match port
22 to return XDP_PASS, or else call the AF_XDP redirect? That seems to be
pure overhead for this fast-path builtin program for AF_XDP.

Would a solution be to install a NIC hardware filter that redirects SSH
port 22 to another RX-queue, and then have a builtin program that
returns XDP_PASS installed on that RX-queue?  And change Björn's
semantics, such that RX-queue programs take precedence over the global
XDP program. This would also be a good fail-safe in general for XDP.

If the RX-queues take precedence, I can use this fail-safe approach.
E.g. when I want to test my new global XDP program, I'll use ethtool to
match my management IP and send that to a specific RX-queue and my
fail-safe BPF program.
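
(For illustration, such a hardware steering rule could look roughly like
this; the interface name and queue index are placeholders:)

	# steer inbound SSH to RX queue 2, which keeps the fail-safe
	# XDP_PASS behaviour while the other queues run the test program
	ethtool -N eth0 flow-type tcp4 dst-port 22 action 2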

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Ilias Apalodimas
Hi Eric, 
> >> This patch is changing struct sk_buff, and is thus per-definition
> >> controversial.
> >>
> >> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> >> members (flags) head_frag and pfmemalloc, And not in between
> >> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> >> Copying mem_info during skb_clone() is required.  This makes sure that
> >> pages are correctly freed or recycled during the altered
> >> skb_free_head() invocation.
> > 
> > I read this to mean that this 'info' isn't accessed/needed until skb
> > is freed.  Any reason its not added at the end?
> > 
> > This would avoid moving other fields that are probably accessed
> > more frequently during processing.
> > 
> 
> But I do not get why the patch is needed.
> 
> Adding extra cost for each skb destruction is costly.
> 
> I thought XDP was all about _not_ having skbs.

You hit the only point i don't personally like in the code, xdp info in the 
skb. Considering the benefits i am convinced this is ok and here's why:

> Please let's not slow down the non-XDP stack only to make XDP more 
> appealing.

We are not trying to do that, on the contrary. The patchset does nothing towards
speeding up XDP at the expense of anything else. The patchset speeds up the 
mvneta driver on the default network stack. The only change that was needed was
to adapt the driver to using the page_pool API. The speed improvements we are
seeing on specific workloads (i.e 256b < packet < 400b) are almost 3x.

Lots of high speed drivers are doing similar recycling tricks themselves (and
there's no common code, everyone is doing something similar though). All we are
trying to do is provide a unified API to make that easier for the rest. Another
advantage is that if some drivers switch to the API, adding XDP
functionality on them is pretty trivial.

Moreover, our tests were only performed on systems without an SMMU, or with the SMMU disabled.
On a platform i have access to, enabling and disabling the SMMU has some
performance impact. By keeping the buffers mapped we believe this impact
will be substantially less (i'll come back with results once i have them on
this).

I do understand your point, but in my head the potential advantages
outweigh that by a lot.

I got other concerns on the patchset though. Like how much memory is it 'ok' to
keep mapped keeping in mind we are using the streaming DMA API. Are we going to
affect anyone else negatively by doing so ?

Thanks
/Ilias


Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Jesper Dangaard Brouer
On Fri, 7 Dec 2018 15:01:55 +0100
Björn Töpel  wrote:

> Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer :
> >
> > On Fri,  7 Dec 2018 12:44:24 +0100
> > Björn Töpel  wrote:
> >  
> > > The rationale behind attach is performance and ease of use. Many XDP
> > > socket users just need a simple way of creating/binding a socket and
> > > receiving frames right away without loading an XDP program.
> > >
> > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > is a kernel provided XDP program that is installed to the netdev when
> > > XDP_ATTACH is being passed as a bind() flag.
> > >
> > > The builtin program is the simplest program possible to redirect a
> > > frame to an attached socket. In restricted C it would look like this:
> > >
> > >   SEC("xdp")
> > >   int xdp_prog(struct xdp_md *ctx)
> > >   {
> > > return bpf_xsk_redirect(ctx);
> > >   }
> > >
> > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > from regular XDP programs. The easiest way to look at it is as a
> > > 2-level hierarchy, where regular XDP programs have precedence over the
> > > builtin one.
> > >
> > > If no regular XDP program is installed to the netdev, the builtin will
> > > be installed. If the builtin program is installed, and a regular one is
> > > installed, the regular XDP program will have precedence over the builtin
> > > one.
> > >
> > > Further, if a regular program is installed, and later removed, the
> > > builtin one will automatically be installed.
> > >
> > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which map to the
> > > corresponding XDP netlink install flags.
> > >
> > > The builtin XDP program functionally adds even more complexity to the
> > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > store the program in the struct net_device together with the install
> > > flags instead of calling the ndo_bpf multiple times?  
> >
> > (As far as I can see from reading the code, correct me if I'm wrong.)
> >
> > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > builtin-program as the XDP program on the "entire" device.  That means
> > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > actually only relevant for the specific queue_index.  Yes, the helper
> > call does check the 'xdp->rxq->queue_index' for an attached 'xsk'
> > and returns XDP_PASS if it is NULL:
> >  
> 
> Yes, you are correct. The builtin XDP program, just like a regular XDP
> program, affects the whole netdev. So, yes the non-AF_XDP queues would
> get a performance hit from this. Just to reiterate -- this isn't new
> for this series. This has always been the case for XDP when acting on
> just one queue.
> 
> > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > +{
> > +   struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > +   struct xdp_sock *xsk;
> > +
> > +   xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > +   if (xsk) {
> > +   ri->xsk = xsk;
> > +   return XDP_REDIRECT;
> > +   }
> > +
> > +   return XDP_PASS;
> > +}
> >
> > Why does every normal XDP_PASS packet have to pay this overhead (indirect
> > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > is tied hard and only relevant to a specific RX-queue (which is why we
> > get a performance boost due to SPSC queues).
> >
> > I acknowledge there is a need for this, but this use-case shows there
> > is a need for attaching XDP programs per RX-queue basis.
> >  
> 
> From my AF_XDP perspective, having a program per queue would make
> sense. The discussion of a per-queue program has come up before, and I think
> the conclusion was that it would be too complex from a
> configuration/tooling point-of-view. Again, for AF_XDP this would be
> great.

I really think it is about time that we introduce these per-queue XDP
programs.  From a configuration/tooling point-of-view, your proposed
solution has the same issues we have to solve (like listing/introspection
needs to show the per-queue XDP/BPF programs). And you have almost
solved it, by keeping the per-queue program in the net_device struct.

Alexei already requested more types of builtin programs.  But as the
XDP-driver API can only load XDP for the entire NIC, then how can you
use two builtin programs at the same time?
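
(To make the 2-level hierarchy discussed above concrete, a reader's
pseudo-code sketch -- the field names are invented, this is not the
series' dev_change_xdp_fd() logic:)

	/* the regular program, when present, always wins; the builtin is a
	 * fallback that is (re)installed once no regular program is left
	 */
	static struct bpf_prog *xdp_effective_prog(struct net_device *dev)
	{
		if (dev->xdp_regular_prog)		/* hypothetical field */
			return dev->xdp_regular_prog;
		return dev->xdp_builtin_prog;		/* hypothetical field */
	}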

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Jesper Dangaard Brouer
On Sat, 8 Dec 2018 04:29:17 -0800
Eric Dumazet  wrote:

> On 12/08/2018 01:57 AM, Florian Westphal wrote:
> > Jesper Dangaard Brouer  wrote:  
> >> From: Ilias Apalodimas 
> >>
> >> This patch is changing struct sk_buff, and is thus per-definition
> >> controversial.
> >>
> >> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> >> members (flags) head_frag and pfmemalloc, And not in between
> >> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> >> Copying mem_info during skb_clone() is required.  This makes sure that
> >> pages are correctly freed or recycled during the altered
> >> skb_free_head() invocation.  
> > 
> > I read this to mean that this 'info' isn't accessed/needed until skb
> > is freed.  Any reason its not added at the end?
> > 
> > This would avoid moving other fields that are probably accessed
> > more frequently during processing.
> >   
> 
> But I do not get why the patch is needed.
> 
> Adding extra cost for each skb destruction is costly.
> 
> I thought XDP was all about _not_ having skbs.
> 
> Please let's not slow down the non-XDP stack only to make XDP more
> appealing.

In general this work is about better cooperation between XDP and
netstack.  The patch is needed because the page_pool is keeping pages
DMA mapped and need a return hook.  I doubt that the extra
compare-and-branch will affect your use-case on super-scalar CPUs.  We
(Ilias and I) are actually testing this stuff on low-end ARM64 in-order
execution CPUs, which is actually nice as performance effects of our
code changes are not hidden by speculative execution units.  I'm
enforcing (and Ilias agrees) that we do benchmark driven development.
I actually invite people to monitor our progress here[1].  So, trust
me, I am as concerned as you about any performance regression, and is
vigilantly measuring this stuff.  (This is more than you can say about
a lot of the other stuff that gets accepted on this list).


[1] https://github.com/xdp-project/xdp-project/blob/master/areas/arm64/
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net-next,v5 09/12] ethtool: add ethtool_rx_flow_spec to flow_rule structure translator

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:59PM CET, pa...@netfilter.org wrote:
>This patch adds a function to translate the ethtool_rx_flow_spec
>structure to the flow_rule representation.
>
>This allows us to reuse code from the driver side given that both flower
>and ethtool_rx_flow interfaces use the same representation.
>
>This patch also includes support for the flow type flags FLOW_EXT,
>FLOW_MAC_EXT and FLOW_RSS.
>
>The ethtool_rx_flow_spec_input wrapper structure is used to convey the
>rss_context field, that is away from the ethtool_rx_flow_spec structure,
>and the ethtool_rx_flow_spec structure.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 08/12] flow_offload: add wake-up-on-lan and queue to flow_action

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:58PM CET, pa...@netfilter.org wrote:
>These actions need to be added to support the ethtool_rx_flow interface.
>The queue action includes a field to specify the RSS context, that is
>set via FLOW_RSS flow type flag and the rss_context field in struct
>ethtool_rxnfc, plus the corresponding queue index. FLOW_RSS implies that
>rss_context is non-zero, therefore, queue.ctx == 0 means that FLOW_RSS
>was not set.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 06/12] drivers: net: use flow action infrastructure

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:56PM CET, pa...@netfilter.org wrote:
>This patch updates drivers to use the new flow action infrastructure.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 07/12] cls_flower: don't expose TC actions to drivers anymore

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:57PM CET, pa...@netfilter.org wrote:
>Now that drivers have been converted to use the flow action
>infrastructure, remove this field from the tc_cls_flower_offload
>structure.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 05/12] flow_offload: add statistics retrieval infrastructure and use it

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:55PM CET, pa...@netfilter.org wrote:
>This patch provides the flow_stats structure that acts as container for
>tc_cls_flower_offload, then we can use to restore the statistics on the
>existing TC actions. Hence, tcf_exts_stats_update() is not used from
>drivers anymore.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Eric Dumazet



On 12/08/2018 04:29 AM, Eric Dumazet wrote:
> 

> But I do not get why the patch is needed.
> 
> Adding extra cost for each skb destruction is costly.
> 
> I thought XDP was all about _not_ having skbs.
> 
> Please let's not slow down the non-XDP stack only to make XDP more 
> appealing.
> 

I have a similar concern with napi_consume_skb() :

This added a cost for locally generated TCP traffic, since most TX packets are 
fast clones.




Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Eric Dumazet



On 12/08/2018 01:57 AM, Florian Westphal wrote:
> Jesper Dangaard Brouer  wrote:
>> From: Ilias Apalodimas 
>>
>> This patch is changing struct sk_buff, and is thus per-definition
>> controversial.
>>
>> Place a new member 'mem_info' of type struct xdp_mem_info, just after
>> members (flags) head_frag and pfmemalloc, And not in between
>> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
>> Copying mem_info during skb_clone() is required.  This makes sure that
>> pages are correctly freed or recycled during the altered
>> skb_free_head() invocation.
> 
> I read this to mean that this 'info' isn't accessed/needed until skb
> is freed.  Any reason its not added at the end?
> 
> This would avoid moving other fields that are probably accessed
> more frequently during processing.
> 

But I do not get why the patch is needed.

Adding extra cost for each skb destruction is costly.

I thought XDP was all about _not_ having skbs.

Please let's not slow down the non-XDP stack only to make XDP more appealing.


Re: [PATCH 5/5] net: dsa: ksz: Add Microchip KSZ8795 DSA driver

2018-12-08 Thread Andrew Lunn
> +static int ksz8795_valid_dyn_entry(struct ksz_device *dev, u8 *data)
> +{
> + int timeout = 100;
> +
> + do {
> + ksz_read8(dev, REG_IND_DATA_CHECK, data);
> + timeout--;
> + } while ((*data & DYNAMIC_MAC_TABLE_NOT_READY) && timeout);

readx_poll_timeout()?
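
(A rough reader's sketch of that conversion; the helper is made up, the
poll/timeout values are guesses, and readx_poll_timeout() comes from
linux/iopoll.h:)

	static u8 ksz8795_dyn_check(struct ksz_device *dev)
	{
		u8 data;

		ksz_read8(dev, REG_IND_DATA_CHECK, &data);
		return data;
	}

	/* in ksz8795_valid_dyn_entry(): */
	u8 data;
	int ret;

	ret = readx_poll_timeout(ksz8795_dyn_check, dev, data,
				 !(data & DYNAMIC_MAC_TABLE_NOT_READY),
				 10, 1000);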

> +static inline void ksz8795_from_vlan(u16 vlan, u8 *fid, u8 *member, u8 *valid)

Please don't use inline in C code, just in headers. Leave the compiler
to decide if it should be inlined.

> +static void ksz8795_r_vlan_table(struct ksz_device *dev, u16 vid, u16 *vlan)
> +{
> + u64 buf;
> + u16 *data = (u16 *)&buf;
> + u16 addr;
> + int index;

The networking code uses reverse christmas tree. So you need to change
the order of these declarations, and do the assignment in the body of
the function. Please review all the functions.
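
(Reader's illustration of the requested ordering -- longest line first,
with the dependent assignment moved into the body:)

	u16 *data;
	int index;
	u16 addr;
	u64 buf;

	data = (u16 *)&buf;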

> +static const u8 stp_multicast_addr[] = {
> + 0x01, 0x80, 0xC2, 0x00, 0x00, 0x00
> +};

include/linux/etherdevice.h defines eth_stp_addr.
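
(Reader's sketch of using that instead of a local copy:)

	if (ether_addr_equal(addr, eth_stp_addr))
		/* this is the 01:80:C2:00:00:00 STP address */;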

Andrew


Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Jesper Dangaard Brouer
On Sat, 8 Dec 2018 10:57:58 +0100
Florian Westphal  wrote:

> Jesper Dangaard Brouer  wrote:
> > From: Ilias Apalodimas 
> > 
> > This patch is changing struct sk_buff, and is thus per-definition
> > controversial.
> > 
> > Place a new member 'mem_info' of type struct xdp_mem_info, just after
> > members (flags) head_frag and pfmemalloc, And not in between
> > headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> > Copying mem_info during skb_clone() is required.  This makes sure that
> > pages are correctly freed or recycled during the altered
> > skb_free_head() invocation.  
> 
> I read this to mean that this 'info' isn't accessed/needed until skb
> is freed.  Any reason its not added at the end?
>
> This would avoid moving other fields that are probably accessed
> more frequently during processing.

It is placed here because it is close to skb->head_frag, which is also
used when the SKB is freed.  This is the reason: both because I think
we can move the skb->head_frag bit into mem_info, and due to cacheline
accesses (e.g. skb->cloned and the destructor pointer are also read, and
they are on that cache-line).

The annoying part is actually that it depends on the kernel config
options CONFIG_XFRM, CONFIG_NF_CONNTRACK and CONFIG_BRIDGE_NETFILTER
whether there is a cache-line split, where mem_info gets moved into the
next cacheline.

I do like the idea of moving it to the end, iff we also move
skb->head_frag into mem_info, as skb->head (on the end-cacheline) is also
accessed during free.  But given we still need to read the cache-line
containing skb->{cloned,destructor}, I don't think it will be a
win.  I think the skb->cloned value is read during the lifetime of the SKB.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH 3/5] net: dsa: ksz: Factor out common tag code

2018-12-08 Thread Andrew Lunn
On Fri, Dec 07, 2018 at 07:18:43PM +0100, Marek Vasut wrote:
> From: Tristram Ha 
> 
> Factor out common code from tag_ksz, so that the code can be used
> with other KSZ family switches which use differently sized tags.

I prefer this implementation over what Tristram recently submitted. It
is also what we suggested a while back. However, since then we have
had Spectre/Meltdown, and we now know a function call through a
pointer is expensive. This is the hot path, every frame comes through
here, so it is worth taking the time to optimize this. Could you try
to remove the ksz_tag_ops structure? xmit looks simple, since it is a
tail call, so you can do that in ksz9477_xmit. Receive looks a bit
more complex.

I also think for the moment we need to ignore PTP until we have the
big picture sorted out. If need be, the code can be refactored yet
again. But i don't want PTP holding up getting more switches
supported.

> Signed-off-by: Tristram Ha 
> Signed-off-by: Marek Vasut 
> Cc: Vivien Didelot 
> Cc: Woojung Huh 
> Cc: David S. Miller 
> ---
>  net/dsa/tag_ksz.c | 125 +-
>  1 file changed, 90 insertions(+), 35 deletions(-)
> 
> diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
> index 036bc62198f28..d94bad1ab7e53 100644
> --- a/net/dsa/tag_ksz.c
> +++ b/net/dsa/tag_ksz.c
> @@ -14,34 +14,30 @@
>  #include 
>  #include "dsa_priv.h"
>  
> -/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
> - * 
> ---
> - * 
> DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
> - * 
> ---
> - * tag0 : Prioritization (not used now)
> - * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
> - *
> - * For Egress (KSZ -> Host), 1 byte is added before FCS.
> - * ---------------------------------------------------------------------------
> - * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|FCS(4bytes)
> - * ---------------------------------------------------------------------------
> - * tag0 : zero-based value represents port
> - * (eg, 0x00=port1, 0x02=port3, 0x06=port7)
> - */
> +struct ksz_tag_ops {
> + void(*get_tag)(u8 *tag, unsigned int *port, unsigned int *len);
> + void(*set_tag)(struct sk_buff *skb, struct net_device *dev);
> +};
>  
> -#define  KSZ_INGRESS_TAG_LEN 2
> -#define  KSZ_EGRESS_TAG_LEN  1
> +/* Typically only one byte is used for tail tag. */
> +#define KSZ_EGRESS_TAG_LEN   1
>  
> -static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
> +/* Frames with the following address may need to be sent even when the port is
> + * closed.
> + */
> +static const u8 special_mult_addr[] = {
> + 0x01, 0x80, 0xC2, 0x00, 0x00, 0x00
> +};
> +

This is new functionality, not part of refactoring the code. Please
add that as a new patch. Also, you can probably make use of
is_link_local_ether_addr().
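
For reference, the open-coded compare against special_mult_addr could
then become something like the sketch below; the override bit name is
just a placeholder, not a real register definition.

	/* link-local (01:80:C2:00:00:0X) frames must go out even when
	 * the port is blocked, so set the (hypothetical) override bit
	 */
	if (is_link_local_ether_addr(skb_mac_header(skb)))
		tag |= KSZ9477_TAIL_TAG_OVERRIDE;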

   Andrew


Re: [PATCH 1/5] net: dsa: ksz: Add MIB counter reading support

2018-12-08 Thread Andrew Lunn
> +static void ksz9477_phy_setup(struct ksz_device *dev, int port,
> +   struct phy_device *phy)
> +{
> + if (port < dev->phy_port_cnt) {
> + /* SUPPORTED_Asym_Pause and SUPPORTED_Pause can be removed to
> +  * disable flow control when rate limiting is used.
> +  */
> + phy->advertising = phy->supported;
> + }
> +}
> +

This has nothing to do with MIB counters.

Other than that, is this the same as Tristram's recent submission?
Maybe once the comments have been addressed, we can merge that
version?

Thanks
Andrew


Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-08 Thread Andrew Lunn
> This actually is an individual patch, it doesn't depend on anything.
> Or do you mean a series with the DT documentation change ?

Yes, i mean together with the DT documentation change. Those two
belong together, they are one functional change.

Part of this is also to do with scalability. It takes less effort to
merge one patchset of two patches than two individual patches. The
truth is, developer time is cheap and maintainer time is expensive, so
the process is optimized towards making the maintainer's life easy.

So sometimes you do combine orthogonal changes together into one
patchset, if there is a high purpose, eg. adding support for a new
device on a new board. However, given the situation of two overlapping
patchsets, it might be better to submit smaller patchsets.

   Andrew


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-08 Thread Andrew Lunn
On Sat, Dec 08, 2018 at 05:23:25AM +0100, Marek Vasut wrote:
> On 12/08/2018 01:13 AM, tristram...@microchip.com wrote:
> >> Do you have a git tree with all the KSZ patches based on -next
> >> somewhere, so I don't have to look for them in random MLs ?
> > 
> > I just sent it this Monday and the subject for that patch is
> > "[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding 
> > support."
> 
> Is all that collected in some git tree somewhere, so I don't have to
> look for various patches in varying states of decay throughout the ML?

Hi Marek, Tristram

It is unfortunate that we have two patchsets being submitted at the
same time, with overlapping functionality. Some degree of cooperation
is needed to handle this.

Could i ask you both to publish a git tree of your patches. And then
figure out how to submit them. Maybe as smaller sets, with the easy,
non-overlapping bits first?

Andrew


RE: [PATCH bpf-next 7/7] samples: bpf: add support for XDP_ATTACH to xdpsock

2018-12-08 Thread Zhang, Qi Z
Hi:

It seems there are only 6 patches of the patch set in patchwork:

https://patchwork.ozlabs.org/project/netdev/list/?submitter=70569
https://patchwork.ozlabs.org/project/netdev/list/?series=80389

Can anyone help check why patch 7/7 is missing?

Thanks
Qi

> -Original Message-
> From: Björn Töpel [mailto:bjorn.to...@gmail.com]
> Sent: Friday, December 7, 2018 7:45 PM
> To: bjorn.to...@gmail.com; Karlsson, Magnus ;
> magnus.karls...@gmail.com; a...@kernel.org; dan...@iogearbox.net;
> netdev@vger.kernel.org
> Cc: Topel, Bjorn ; bro...@redhat.com;
> u9012...@gmail.com; Zhang, Qi Z 
> Subject: [PATCH bpf-next 7/7] samples: bpf: add support for XDP_ATTACH to
> xdpsock
> 
> From: Björn Töpel 
> 
> Teach the sample xdpsock application about XDP_ATTACH.
> 
> Signed-off-by: Björn Töpel 
> ---
>  samples/bpf/xdpsock_user.c | 108 +++--
>  1 file changed, 67 insertions(+), 41 deletions(-)
> 
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index
> 57ecadc58403..12f908b60468 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -633,7 +633,8 @@ static void int_exit(int sig)  {
>   (void)sig;
>   dump_stats();
> - bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
> + if (!(opt_xdp_bind_flags & XDP_ATTACH))
> + bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
>   exit(EXIT_SUCCESS);
>  }
> 
> @@ -650,6 +651,7 @@ static struct option long_options[] = {
>   {"interval", required_argument, 0, 'n'},
>   {"zero-copy", no_argument, 0, 'z'},
>   {"copy", no_argument, 0, 'c'},
> + {"attach", no_argument, 0, 'a'},
>   {0, 0, 0, 0}
>  };
> 
> @@ -670,6 +672,7 @@ static void usage(const char *prog)
>   "  -n, --interval=n Specify statistics update interval 
> (default 1
> sec).\n"
>   "  -z, --zero-copy  Force zero-copy mode.\n"
>   "  -c, --copy   Force copy mode.\n"
> + "  -a, --attach XDP_ATTACH mode.\n"
>   "\n";
>   fprintf(stderr, str, prog);
>   exit(EXIT_FAILURE);
> @@ -682,7 +685,7 @@ static void parse_command_line(int argc, char **argv)
>   opterr = 0;
> 
>   for (;;) {
> - c = getopt_long(argc, argv, "rtli:q:psSNn:cz", long_options,
> + c = getopt_long(argc, argv, "rtli:q:psSNn:cza", long_options,
>   _index);
>   if (c == -1)
>   break;
> @@ -725,11 +728,22 @@ static void parse_command_line(int argc, char
> **argv)
>   case 'c':
>   opt_xdp_bind_flags |= XDP_COPY;
>   break;
> + case 'a':
> + opt_xdp_bind_flags |= XDP_ATTACH;
> + break;
>   default:
>   usage(basename(argv[0]));
>   }
>   }
> 
> + if (opt_xdp_bind_flags & XDP_ATTACH) {
> + if (opt_xdp_flags & XDP_FLAGS_DRV_MODE)
> + opt_xdp_bind_flags |= XDP_BUILTIN_DRV_MODE;
> + if (opt_xdp_flags & XDP_FLAGS_SKB_MODE)
> + opt_xdp_bind_flags |= XDP_BUILTIN_SKB_MODE;
> +
> + }
> +
>   opt_ifindex = if_nametoindex(opt_if);
>   if (!opt_ifindex) {
>   fprintf(stderr, "ERROR: interface \"%s\" does not exist\n", @@
> -903,7 +917,7 @@ int main(int argc, char **argv)
>   struct bpf_prog_load_attr prog_load_attr = {
>   .prog_type  = BPF_PROG_TYPE_XDP,
>   };
> - int prog_fd, qidconf_map, xsks_map;
> + int prog_fd, qidconf_map, xsks_map = -1;
>   struct bpf_object *obj;
>   char xdp_filename[256];
>   struct bpf_map *map;
> @@ -918,59 +932,71 @@ int main(int argc, char **argv)
>   exit(EXIT_FAILURE);
>   }
> 
> - snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
> - prog_load_attr.file = xdp_filename;
> + if (!(opt_xdp_bind_flags & XDP_ATTACH)) {
> + snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o",
> +  argv[0]);
> + prog_load_attr.file = xdp_filename;
> 
> - if (bpf_prog_load_xattr(_load_attr, , _fd))
> - exit(EXIT_FAILURE);
> - if (prog_fd < 0) {
> - fprintf(stderr, "ERROR: no program found: %s\n",
> - strerror(prog_fd));
> - exit(EXIT_FAILURE);
> - }
> + if (bpf_prog_load_xattr(_load_attr, , _fd))
> + exit(EXIT_FAILURE);
> + if (prog_fd < 0) {
> + fprintf(stderr, "ERROR: no program found: %s\n",
> + strerror(prog_fd));
> + exit(EXIT_FAILURE);
> + }
> 
> - map = bpf_object__find_map_by_name(obj, "qidconf_map");
> - qidconf_map = bpf_map__fd(map);
> - if (qidconf_map < 0) {
> - fprintf(stderr, "ERROR: no qidconf map found: %s\n",
> - 

Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-08 Thread Florian Westphal
Jesper Dangaard Brouer  wrote:
> From: Ilias Apalodimas 
> 
> This patch is changing struct sk_buff, and is thus per-definition
> controversial.
> 
> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> members (flags) head_frag and pfmemalloc, And not in between
> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> Copying mem_info during skb_clone() is required.  This makes sure that
> pages are correctly freed or recycled during the altered
> skb_free_head() invocation.

I read this to mean that this 'info' isn't accessed/needed until skb
is freed.  Any reason its not added at the end?

This would avoid moving other fields that are probably accessed
more frequently during processing.


Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-08 Thread Björn Töpel
Den fre 7 dec. 2018 kl 22:21 skrev Alexei Starovoitov
:
>
> On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> > From: Björn Töpel 
> >
> > Hi!
> >
> > This patch set adds support for a new XDP socket bind option,
> > XDP_ATTACH.
> >
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> > return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
>
> The feature makes sense to me.
> Maybe XDP_ATTACH_BUILTIN would be a better name?

Yes, agree, or maybe XDP_BUILTIN_ATTACH? Regardless, I'll change the
name for the next revision.

> Also I think it needs another parameter to say which builtin
> program to use.

Yup, I had a plan to add the parameter when it's actually more than
*one* builtin, but you're right, let's do it right away. I'll add a
builtin prog enum field to the struct sockaddr_xdp.
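
For illustration, the uAPI addition being discussed could be shaped
roughly like the sketch below; the enum values and the
sxdp_builtin_prog member are hypothetical, not a merged interface.

enum xdp_builtin_prog {
	XDP_BUILTIN_XSK_REDIRECT = 0,	/* redirect every frame to the socket */
	XDP_BUILTIN_XSK_REDIRECT_SAFE,	/* e.g. pass ssh/ping back to the kernel */
};

struct sockaddr_xdp {
	__u16 sxdp_family;
	__u16 sxdp_flags;
	__u32 sxdp_ifindex;
	__u32 sxdp_queue_id;
	__u32 sxdp_shared_umem_fd;
	__u32 sxdp_builtin_prog;	/* hypothetical new member */
};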

> This unconditional xsk_redirect is fine for performance
> benchmarking, but for production I suspect the users would want
> an easy way to stay safe when they're playing with AF_XDP.

For setups that don't direct the flows explicitly by HW filters,  yes!

> So another builtin program that redirects ssh and ping traffic
> back to the kernel would be a nice addition.
>

I suspect AF_XDP users would prefer redirecting packets to the kernel
via the CPUMAP instead of XDP_PASS -- not paying for the ipstack on
the AF_XDP core. Another builtin would be a tcpdump-like behavior, but
that would require an XDP clone (which Magnus is actually
experimenting with!).
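
For illustration, a "pass via CPUMAP" builtin could look roughly like
the sketch below, in the same restricted-C style as the builtin quoted
above. The map name, its sizing and the fixed CPU index are
assumptions; user space would still have to populate the CPUMAP entry
with a queue size before it redirects anything.

#include <linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") cpu_map = {
	.type		= BPF_MAP_TYPE_CPUMAP,
	.key_size	= sizeof(__u32),
	.value_size	= sizeof(__u32),
	.max_entries	= 1,
};

SEC("xdp")
int xdp_prog(struct xdp_md *ctx)
{
	__u32 cpu = 0;	/* hand the frame to CPU 0's kernel stack path */

	return bpf_redirect_map(&cpu_map, cpu, 0);
}

char _license[] SEC("license") = "GPL";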

I'll address your input and get back with a new revision. Thanks for
spending time on the series!


Björn


Re: [PATCH net-next,v5 04/12] cls_api: add translator to flow_action representation

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:54PM CET, pa...@netfilter.org wrote:
>This patch implements a new function to translate from native TC action
>to the new flow_action representation. Moreover, this patch also updates
>cls_flower to use this new function.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 03/12] flow_offload: add flow action infrastructure

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:53PM CET, pa...@netfilter.org wrote:
>This new infrastructure defines the nic actions that you can perform
>from existing network drivers. This infrastructure allows us to avoid a
>direct dependency with the native software TC action representation.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 02/12] net/mlx5e: support for two independent packet edit actions

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:52PM CET, pa...@netfilter.org wrote:
>This patch adds pedit_headers_action structure to store the result of
>parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
>populate the mlx5e hardware intermediate representation once all actions
>have been parsed.
>
>This patch comes in preparation for the new flow_action infrastructure,
>where each packet mangling comes in an separated action, ie. not packed
>as in tc pedit.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 01/12] flow_offload: add flow_rule and flow_match structures and use them

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:51PM CET, pa...@netfilter.org wrote:
>This patch wraps the dissector key and mask - that flower uses to
>represent the matching side - around the flow_match structure.
>
>To avoid a follow up patch that would edit the same LoCs in the drivers,
>this patch also wraps this new flow match structure around the flow rule
>object. This new structure will also contain the flow actions in follow
>up patches.
>
>This introduces two new interfaces:
>
>   bool flow_rule_match_key(rule, dissector_id)
>
>that returns true if a given matching key is set on, and:
>
>   flow_rule_match_XYZ(rule, );
>
>To fetch the matching side XYZ into the match container structure, to
>retrieve the key and the mask with one single call.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA

2018-12-07 Thread Ilias Apalodimas
On Fri, Dec 07, 2018 at 11:06:55PM -0800, David Miller wrote:

> This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
> because the page private is unsigned long.
> 
> Grep for PHYS_ADDR_T_64BIT under arch/ to see the vast majority of the
> cases where this happens, then ARCH_DMA_ADDR_T_64BIT.

Noted, thanks for the heads up.

Thanks
/Ilias


Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-07 Thread Ilias Apalodimas
On Fri, Dec 07, 2018 at 11:15:14PM -0800, David Miller wrote:
> From: Jesper Dangaard Brouer 
> Date: Fri, 07 Dec 2018 00:25:47 +0100
> 
> > @@ -744,6 +745,10 @@ struct sk_buff {
> > head_frag:1,
> > xmit_more:1,
> > pfmemalloc:1;
> > +   /* TODO: Future idea, extend mem_info with __u8 flags, and
> > +* move bits head_frag and pfmemalloc there.
> > +*/
> > +   struct xdp_mem_info mem_info;
> 
> This is 4 bytes right?

With this patchset yes

> 
> I guess I can live with this.

Great news!

> 
> Please do some microbenchmarks to make sure this doesn't show any
> obvious regressions.

Will do

Thanks
/Ilias


Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-07 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Fri, 07 Dec 2018 00:25:47 +0100

> @@ -744,6 +745,10 @@ struct sk_buff {
>   head_frag:1,
>   xmit_more:1,
>   pfmemalloc:1;
> + /* TODO: Future idea, extend mem_info with __u8 flags, and
> +  * move bits head_frag and pfmemalloc there.
> +  */
> + struct xdp_mem_info mem_info;

This is 4 bytes right?

I guess I can live with this.

Please do some microbenchmarks to make sure this doesn't show any
obvious regressions.

Thanks.


Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA

2018-12-07 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Fri, 07 Dec 2018 00:25:32 +0100

> From: Ilias Apalodimas 
> 
> Add helper functions for retreiving dma_addr_t stored in page_private and
> unmapping dma addresses, mapped via the page_pool API.
> 
> Signed-off-by: Ilias Apalodimas 
> Signed-off-by: Jesper Dangaard Brouer 

This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
because the page private is unsigned long.

Grep for PHYS_ADDR_T_64BIT under arch/ to see the vast majority of the
cases where this happens, then ARCH_DMA_ADDR_T_64BIT.


Re: [PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"

2018-12-07 Thread David Miller
From: Benjamin Herrenschmidt 
Date: Fri, 07 Dec 2018 15:05:04 +1100

> This reverts commit 624ca9c33c8a853a4a589836e310d776620f4ab9.
> 
> This commit is completely bogus. The STACR register has two formats, old
> and new, depending on the version of the IP block used. There's a pair of
> device-tree properties that can be used to specify the format used:
> 
>   has-inverted-stacr-oc
>   has-new-stacr-staopc
> 
> What this commit did was to change the bit definition used with the old
> parts to match the new parts. This of course breaks the driver on all
> the old ones.
> 
> Instead, the author should have set the appropriate properties in the
> device-tree for the variant used on his board.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
> 
> Found while setting up some old ppc440 boxes for test/CI

Applied, thanks.


Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread David Miller
From: Maciej Żenczykowski 
Date: Fri, 7 Dec 2018 16:46:36 -0800

>> This doesn't apply to the current net tree.
>>
>> Also "net-udp: " is a weird subsystem prefix, just use "udp: ".
>>
>> Thank you.
> 
> Interesting... this patch was on top of net-next/master, and it still
> rebases cleanly on current net-next/master.
> 
> Would you like it on net/master instead?  It indeed doesn't apply
> cleanly there...

Well, it is a bug fix isn't it?  Or is this more like a behavioral feature?


Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut
On 12/08/2018 12:46 AM, David Miller wrote:
> From: Marek Vasut 
> Date: Fri, 7 Dec 2018 23:59:58 +0100
> 
>> On 12/07/2018 11:24 PM, Andrew Lunn wrote:
>>> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
 Add code to handle optional reset GPIO in the KSZ switch driver. The switch
 has a reset GPIO line which can be controlled by the CPU, so make sure it 
 is
 configured correctly in such setups.
>>>
>>> Hi Marek
>>
>> Hi Andrew,
>>
>>> Please make this a patch series, not two individual patches.
>>
>> This actually is an individual patch, it doesn't depend on anything.
>> Or do you mean a series with the DT documentation change ?
> 
> Yes, but all of this stuff is building up for one single purpose,
> and that is to support a new mode of operation with DSA or whatever.

I'll group together the ones which make sense to group together and are
not orthogonal if that's OK with you. The reset handling really is
orthogonal from the rest and can go in independently of the rest.

> So please group them together in a series with an appropriate
> header posting.

Sure

-- 
Best regards,
Marek Vasut


Re: [PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-07 Thread Marek Vasut
On 12/08/2018 01:52 AM, tristram...@microchip.com wrote:
>> -padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
>> +padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb-
>>> len;
> 
> The requirement is that the tail tag should be at the end of the frame,
> before the FCS.  When the length is less than 60 the MAC controller will
> pad the frame to a legal size.  That is why this function makes sure the
> padding is done manually.  Increasing the size just increases the length
> of the frame and the chance of allocating a new socket buffer.
> 
> For example, using a ping size of 18 will make the sizes of the request
> and response differ by 4 bytes.  Not that it matters much.
> 
> Are you concerned the MAC controller will remove the VLAN tag and so the frame
> will not be sent? Or the switch removes the VLAN tag and is not able to send?

With TI CPSW in dual-ethernet configuration, which adds internal VLAN
tag at the end of the frame, the KSZ switch fails. The CPU will send out
packets and the switch will reject them as corrupted. It needs this
extra VLAN tag padding.

-- 
Best regards,
Marek Vasut


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Marek Vasut
On 12/08/2018 01:13 AM, tristram...@microchip.com wrote:
>> Do you have a git tree with all the KSZ patches based on -next
>> somewhere, so I don't have to look for them in random MLs ?
> 
> I just sent it this Monday and the subject for that patch is
> "[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding support."

Is all that collected in some git tree somewhere, so I don't have to
look for various patches in varying states of decay throughout the ML?

-- 
Best regards,
Marek Vasut


RE: [PATCH] net: dsa: ksz: Increase the tag alignment

2018-12-07 Thread Tristram.Ha
> - padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
> + padlen = (skb->len >= VLAN_ETH_ZLEN) ? 0 : VLAN_ETH_ZLEN - skb-
> >len;

The requirement is that the tail tag should be at the end of the frame,
before the FCS.  When the length is less than 60 the MAC controller will
pad the frame to a legal size.  That is why this function makes sure the
padding is done manually.  Increasing the size just increases the length
of the frame and the chance of allocating a new socket buffer.

For example, using a ping size of 18 will make the sizes of the request
and response differ by 4 bytes.  Not that it matters much.

Are you concerned the MAC controller will remove the VLAN tag and so the frame
will not be sent? Or the switch removes the VLAN tag and is not able to send?



Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread Maciej Żenczykowski
> This doesn't apply to the current net tree.
>
> Also "net-udp: " is a weird subsystem prefix, just use "udp: ".
>
> Thank you.

Interesting... this patch was on top of net-next/master, and it still
rebases cleanly on current net-next/master.

Would you like it on net/master instead?  It indeed doesn't apply
cleanly there...


Re: [PATCH net-next 0/4] tc-testing: implement command timeouts and better results tracking

2018-12-07 Thread David Miller
From: Lucas Bates 
Date: Thu,  6 Dec 2018 17:42:23 -0500

> Patch 1 adds a timeout feature for any command tdc launches in a subshell.
> This prevents tdc from hanging indefinitely.
> 
> Patches 2-4 introduce a new method for tracking and generating test case
> results, and implements it across the core script and all applicable
> plugins.

Series applied.


Re: [PATCH net v2 0/2] Fix slab out-of-bounds on insufficient headroom for IPv6 packets

2018-12-07 Thread David Miller
From: Stefano Brivio 
Date: Thu,  6 Dec 2018 19:30:35 +0100

> Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
> IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.
> 
> Patch 2/2 makes sure we avoid writing before the allocated buffer in
> neigh_hh_output() in case the headroom is enough for the unaligned hardware
> header size, but not enough for the aligned one, and that we warn if we hit
> this condition.

Series applied and queued up for -stable, thanks.


Re: [PATCH net] tcp: lack of available data can also cause TSO defer

2018-12-07 Thread David Miller
From: Eric Dumazet 
Date: Thu,  6 Dec 2018 09:58:24 -0800

> tcp_tso_should_defer() can return true in three different cases :
> 
>  1) We are cwnd-limited
>  2) We are rwnd-limited
>  3) We are application limited.
> 
> Neal pointed out that my recent fix went too far, since
> it assumed that if we were not in 1) case, we must be rwnd-limited
> 
> Fix this by properly populating the is_cwnd_limited and
> is_rwnd_limited booleans.
> 
> After this change, we can finally move the silly check for FIN
> flag only for the application-limited case.
> 
> The same move for EOR bit will be handled in net-next,
> since commit 1c09f7d073b1 ("tcp: do not try to defer skbs
> with eor mark (MSG_EOR)") is scheduled for linux-4.21
> 
> Tested by running 200 concurrent netperf -t TCP_RR -- -r 6,100
> and checking none of them was rwnd_limited in the chrono_stat
> output from "ss -ti" command.
> 
> Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited")
> Signed-off-by: Eric Dumazet 
> Suggested-by: Neal Cardwell 
> Reviewed-by: Neal Cardwell 
> Acked-by: Soheil Hassas Yeganeh 
> Reviewed-by: Yuchung Cheng 

Applied.


Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup

2018-12-07 Thread David Miller
From: Maciej Żenczykowski 
Date: Wed,  5 Dec 2018 12:59:17 -0800

> From: Maciej Żenczykowski 
> 
> During udp socket lookup cpu match should be lowest priority,
> hence it should increase score by only 1.
> 
> The next priority is delivering v4 to v4 sockets, and v6 to v6 sockets.
> The v6 code path doesn't have to deal with this so it always gets
> a score of '4'.  The v4 code path uses '4' or '2' depending on
> whether we're delivering to a v4 socket or a dualstack v6 socket.
> 
> This is more important than cpu match, so has to be greater than
> the '1' bump in score from cpu match.
> 
> All other matches (src/dst ip, src port) are even *more* important,
> so need to bump score by 4 for ipv4.
> 
> For ipv6 we could simply bump by 2, but let's keep the two code
> paths as similar as possible.
> 
> (also, while at it, remove two unnecessary unconditional score bumps)
> 
> Signed-off-by: Maciej Żenczykowski 

This doesn't apply to the current net tree.

Also "net-udp: " is a weird subsystem prefix, just use "udp: ".

Thank you.
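
For reference, the priority ordering described in the patch above can
be sketched as follows; this is a simplified illustration of the
intended weighting, not the literal compute_score() from the kernel.

static int udp4_score_sketch(struct sock *sk, __be32 daddr, int dif)
{
	int score = 0;

	/* address and device matches are the most important: +4 each */
	if (sk->sk_rcv_saddr == daddr)
		score += 4;
	if (sk->sk_bound_dev_if == dif)
		score += 4;

	/* a v4 socket beats a dual-stack v6 socket for v4 traffic */
	score += (sk->sk_family == PF_INET) ? 4 : 2;

	/* cpu match is only the lowest-priority tie-breaker */
	if (sk->sk_incoming_cpu == raw_smp_processor_id())
		score += 1;

	return score;
}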


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> Do you have a git tree with all the KSZ patches based on -next
> somewhere, so I don't have to look for them in random MLs ?

I just sent it this Monday and the subject for that patch is
"[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding support."



Re: [Patch v2 net-next] call sk_dst_reset when set SO_DONTROUTE

2018-12-07 Thread David Miller
From: yupeng 
Date: Wed,  5 Dec 2018 18:56:28 -0800

> After setting SO_DONTROUTE to 1, the IP layer should not route packets if
> the dest IP address is not in link scope. But if the socket has cached
> the dst_entry, such packets would be routed until the sk_dst_cache
> expires. So we should clean the sk_dst_cache when a user sets the
> SO_DONTROUTE option. Below are server/client python scripts which
> could reproduce this issue:
 ...
> Signed-off-by: yupeng 

Applied.
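
For context, the change described above essentially amounts to
resetting the cached route in the SO_DONTROUTE case of
sock_setsockopt(), roughly like this sketch (not the verbatim patch):

	case SO_DONTROUTE:
		sock_valbool_flag(sk, SOCK_LOCALROUTE, valbool);
		/* drop any cached dst so the new flag takes effect at once */
		sk_dst_reset(sk);
		break;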


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> > I think if you do this without setting offload_fwd_mark you will
> > receive duplicate frame.
> 
> I don't think it will, at least not in the normal case. The hardware
> should know the egress port, so there is no need to forward a copy to
> the CPU. The only time it should forward to the CPU is when the egress
> port is not known, so it floods. Without offload_fwd_mark set, the SW
> bridge will flood it back out the ports causing duplication. But that
> is not too bad. The Marvell driver did this for a while and nothing
> bad was reported.

For unicast frames it is okay, as the CPU port does not see them after the
first one.  For multicast frames there will be duplicates; is that tolerated?



Re: [PATCH v2 net-next] neighbor: Improve garbage collection

2018-12-07 Thread David Miller
From: David Ahern 
Date: Fri,  7 Dec 2018 12:24:57 -0800

> From: David Ahern 
> 
> The existing garbage collection algorithm has a number of problems:
 ...
> This patch addresses these problems as follows:
> 
> 1. Use of a separate list_head to track entries that can be garbage
>collected along with a separate counter. PERMANENT entries are not
>added to this list.
> 
>The gc_thresh parameters are only compared to the new counter, not the
>total entries in the table. The forced_gc function is updated to only
>walk this new gc_list looking for entries to evict.
> 
> 2. Entries are added to the list head at the tail and removed from the
>front.
> 
> 3. Entries are only evicted if they were last updated more than 5 seconds
>ago, adhering to the original intent of gc_thresh2.
> 
> 4. Forced gc is stopped once the number of gc_entries drops below
>gc_thresh2.
> 
> 5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
>when allocating a new neighbor for a PERMANENT entry. By extension this
>means there are no explicit limits on the number of PERMANENT entries
>that can be created, but this is no different than FIB entries or FDB
>entries.
> 
> Signed-off-by: David Ahern 
> ---
> v2
> - remove on_gc_list boolean in favor of !list_empty
> - fix neigh_alloc to add new entry to tail of list_head

Again, looks great, applied.
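
For illustration, the bookkeeping in points 1 and 2 of the description
boils down to something like the sketch below; the field names are
approximations of what the description implies, not the exact patch.

static void neigh_mark_for_gc(struct neigh_table *tbl, struct neighbour *n)
{
	/* PERMANENT entries never join the gc list (point 5) */
	if (n->nud_state & NUD_PERMANENT)
		return;

	write_lock_bh(&tbl->lock);
	/* added at the tail, evicted from the front (point 2) */
	list_add_tail(&n->gc_list, &tbl->gc_list);
	atomic_inc(&tbl->gc_entries);
	write_unlock_bh(&tbl->lock);
}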


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> >> If two ports are in the same bridge and in forwarding state, the packets
> >> must be able to pass between them in both directions. The current code
> >> only configures this bridge membership for a port newly added to the
> >> bridge, but does not update all the other ports. Thus, ingress packets
> >> on the new port will be forwarded, but ingress packets on other ports
> >> destined for the new port (eg. a reply) will not be forwarded back to
> >> the new port, because they are not configured to do so. This patch fixes
> >> that by updating the membership registers of all ports.
> >>
> >> Signed-off-by: Marek Vasut 
> >> Cc: Vivien Didelot 
> >> Cc: Woojung Huh 
> >> Cc: David S. Miller 
> >> Cc: Tristram Ha 
> >> ---
> >>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/dsa/microchip/ksz9477.c
> >> b/drivers/net/dsa/microchip/ksz9477.c
> >> index 0684657fbf9a9..e24dd14ccde77 100644
> >> --- a/drivers/net/dsa/microchip/ksz9477.c
> >> +++ b/drivers/net/dsa/microchip/ksz9477.c
> >> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
> >> dsa_switch *ds, int port,
> >>struct ksz_device *dev = ds->priv;
> >>struct ksz_port *p = >ports[port];
> >>u8 data;
> >> -  int member = -1;
> >> +  int i, member = -1;
> >>
> >>ksz_pread8(dev, port, P_STP_CTRL, );
> >>data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
> >> PORT_LEARN_DISABLE);
> >> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
> >> dsa_switch *ds, int port,
> >>dev->tx_ports &= ~(1 << port);
> >>
> >>/* Port membership may share register with STP state. */
> >> -  if (member >= 0 && member != p->member)
> >> -  ksz9477_cfg_port_member(dev, port, (u8)member);
> >> +  for (i = 0; i < SWITCH_PORT_NUM; i++)
> >> +  ksz9477_cfg_port_member(dev, i, (u8)member);
> >>
> >>/* Check if forwarding needs to be updated. */
> >>if (state != BR_STATE_FORWARDING) {
> >
> > The original DSA model did not have a way to tell the bridge device not to
> > forward the frame, so the switch driver always setup the membership to
> > disable forwarding between ports.
> >
> > When lan devices are setup they act like individual devices.  A bridge 
> > device
> > adding them under it will forward the frames.
> >
> > The new switchdev model adds the offload_fwd_mark bit to tell the bridge
> not to
> > forward frame.
> >
> > The ksz_update_port_member function in ksz_common.c is doing this
> membership
> > setup for all forwarding ports.  It was finally enabled in one of the RFC
> patches I
> > submitted recently (Add switch forward offloading support).
> >
> > I think if you do this without setting offload_fwd_mark you will receive
> duplicate
> > frame.
> >
> 
> Either I am misreading Marek's patch, or I don't quite understand your
> response, but what is happening when you enslave a switch port into a
> bridge is that you need to make sure that:
> 
> - the switch port being enslaved will be part of the same forwarding
> group as any other switch port already in the bridge
> - any existing switch port already enslaved in the bridge must now also
> be allowed to forward to the port that is being enslaved
> 
> That is to me, exactly what Marek's patch is fixing, your response is
> about something slightly orthogonal here.

I got confused here as the code is obviously wrong and should not work,
so I found out why it works in the bridge device situation.  There is actually
a bug in the driver that enables this behavior.  The port_vlan_filtering 
function
turns off the port membership enforcement.  Fixing this problem should be
easy, but this port_vlan_filtering function is also not implemented right.  It 
treats the
operation as a simple VLAN on/off, but it is more complex than that.

Anyway it seems to work in the bridge device situation, but it does not work
in the default situation:

Assume there are two port devices lan1 and lan2.  The device lan1 is assigned
an IP address and can talk to the outside.  Now enable and disable lan2 by
doing "ifconfig lan2 up" and "ifconfig lan2 down."  After that, lan1 no
longer works.

Create a bridge device and add a child device by doing "brctl addbr br0" and
"brctl addif br0 lan1."  This will call port_vlan_filtering and then the feature
UNICAST_VLAN_BOUNDARY is disabled.  This causes port membership to have no
effect on unicast packets and so it does not matter what member value is used.
The device lan1 can start working again.

The fix is to avoid disabling UNICAST_VLAN_BOUNDARY and it should be set all
the time.  In this switch the default is on.


Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread David Miller
From: Marek Vasut 
Date: Fri, 7 Dec 2018 23:59:58 +0100

> On 12/07/2018 11:24 PM, Andrew Lunn wrote:
>> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
>>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
>>> has a reset GPIO line which can be controlled by the CPU, so make sure it is
>>> configured correctly in such setups.
>> 
>> Hi Marek
> 
> Hi Andrew,
> 
>> Please make this a patch series, not two individual patches.
> 
> This actually is an individual patch, it doesn't depend on anything.
> Or do you mean a series with the DT documentation change ?

Yes, but all of this stuff is building up for one single purpose,
and that is to support a new mode of operation with DSA or whatever.

So please group them together in a series with an appropriate
header posting.


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Miller
From: Eric Dumazet 
Date: Fri, 7 Dec 2018 15:03:04 -0800

> On 12/07/2018 02:24 PM, David Ahern wrote:
>> On 12/7/18 3:20 PM, Eric Dumazet wrote:
>> 
>> /* --- cacheline 3 boundary (192 bytes) --- */
>> struct hh_cachehh;   /*   19248 */
>> 
>> ...
>> 
>> but does not change the actual allocation size which is rounded to 512.
>> 
> 
> I have not talked about the allocation size, but alignment of ->ha field,
> which is kind of assuming long alignment, in a strange way.

Right, neigh->ha[] should probably be kept 8-byte aligned.
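
One minimal way to make that explicit would be something like the line
below (illustrative only, not necessarily what gets merged):

	unsigned char	ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))]
				__aligned(sizeof(unsigned long));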


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread Eric Dumazet



On 12/07/2018 02:24 PM, David Ahern wrote:
> On 12/7/18 3:20 PM, Eric Dumazet wrote:
>>
>>
>> On 12/07/2018 01:49 PM, David Ahern wrote:
>>> From: David Ahern 
>>>
>>> Similar to routes and rules, add protocol attribute to neighbor entries
>>> for easier tracking of how each was created.
>>>
>>> Signed-off-by: David Ahern 
>>> ---
>>>  include/net/neighbour.h|  2 ++
>>>  include/uapi/linux/neighbour.h |  1 +
>>>  net/core/neighbour.c   | 24 +++-
>>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>>> index 6c13072910ab..e93c59df9501 100644
>>> --- a/include/net/neighbour.h
>>> +++ b/include/net/neighbour.h
>>> @@ -149,6 +149,7 @@ struct neighbour {
>>> __u8nud_state;
>>> __u8type;
>>> __u8dead;
>>> +   u8  protocol;
>>> seqlock_t   ha_lock;
>>> unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
>>
>> This looks like ha[] alignment would change, I am not sure how critical it 
>> is.
> 
> Just adds 4 bytes to neighbour:
> 
> ...
> /* --- cacheline 2 boundary (128 bytes) --- */
> long unsigned int  used; /*   128 8 */
> atomic_t   probes;   /*   136 4 */
> __u8   flags;/*   140 1 */
> __u8   nud_state;/*   141 1 */
> __u8   type; /*   142 1 */
> __u8   dead; /*   143 1 */
> u8 protocol; /*   144 1 */
> 
> /* XXX 3 bytes hole, try to pack */
> seqlock_t  ha_lock;  /*   148 8 */
> unsigned char  ha[32];   /*   15632 */
> /* XXX 4 bytes hole, try to pack */
> 
> /* --- cacheline 3 boundary (192 bytes) --- */
> struct hh_cachehh;   /*   19248 */
> 
> ...
> 
> but does not change the actual allocation size which is rounded to 512.
> 

I have not talked about the allocation size, but alignment of ->ha field,
which is kind of assuming long alignment, in a strange way.

As I said, I do not know how performance critical this might be.



Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut
On 12/07/2018 11:24 PM, Andrew Lunn wrote:
> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
>> has a reset GPIO line which can be controlled by the CPU, so make sure it is
>> configured correctly in such setups.
> 
> Hi Marek

Hi Andrew,

> Please make this a patch series, not two individual patches.

This actually is an individual patch, it doesn't depend on anything.
Or do you mean a series with the DT documentation change ?

> And as David has already said, include a cover letter.
> 
> Otherwise, this looks O.K.
> 
> Thanks
>   Andrew
> 


-- 
Best regards,
Marek Vasut


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread David Ahern
On 12/7/18 3:20 PM, Eric Dumazet wrote:
> 
> 
> On 12/07/2018 01:49 PM, David Ahern wrote:
>> From: David Ahern 
>>
>> Similar to routes and rules, add protocol attribute to neighbor entries
>> for easier tracking of how each was created.
>>
>> Signed-off-by: David Ahern 
>> ---
>>  include/net/neighbour.h|  2 ++
>>  include/uapi/linux/neighbour.h |  1 +
>>  net/core/neighbour.c   | 24 +++-
>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>> index 6c13072910ab..e93c59df9501 100644
>> --- a/include/net/neighbour.h
>> +++ b/include/net/neighbour.h
>> @@ -149,6 +149,7 @@ struct neighbour {
>>  __u8nud_state;
>>  __u8type;
>>  __u8dead;
>> +u8  protocol;
>>  seqlock_t   ha_lock;
>>  unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
> 
> This looks like ha[] alignment would change, I am not sure how critical it is.

Just adds 4 bytes to neighbour:

...
/* --- cacheline 2 boundary (128 bytes) --- */
long unsigned int  used; /*   128 8 */
atomic_t   probes;   /*   136 4 */
__u8   flags;/*   140 1 */
__u8   nud_state;/*   141 1 */
__u8   type; /*   142 1 */
__u8   dead; /*   143 1 */
u8 protocol; /*   144 1 */

/* XXX 3 bytes hole, try to pack */
seqlock_t  ha_lock;  /*   148 8 */
unsigned char  ha[32];   /*   15632 */
/* XXX 4 bytes hole, try to pack */

/* --- cacheline 3 boundary (192 bytes) --- */
struct hh_cachehh;   /*   19248 */

...

but does not change the actual allocation size which is rounded to 512.


Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Andrew Lunn
On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote:
> Add code to handle optional reset GPIO in the KSZ switch driver. The switch
> has a reset GPIO line which can be controlled by the CPU, so make sure it is
> configured correctly in such setups.

Hi Marek

Please make this a patch series, not two individual patches.

And as David has already said, include a cover letter.

Otherwise, this looks O.K.

Thanks
Andrew


Re: [PATCH net-next] neighbor: Add protocol attribute

2018-12-07 Thread Eric Dumazet



On 12/07/2018 01:49 PM, David Ahern wrote:
> From: David Ahern 
> 
> Similar to routes and rules, add protocol attribute to neighbor entries
> for easier tracking of how each was created.
> 
> Signed-off-by: David Ahern 
> ---
>  include/net/neighbour.h|  2 ++
>  include/uapi/linux/neighbour.h |  1 +
>  net/core/neighbour.c   | 24 +++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
> index 6c13072910ab..e93c59df9501 100644
> --- a/include/net/neighbour.h
> +++ b/include/net/neighbour.h
> @@ -149,6 +149,7 @@ struct neighbour {
>   __u8nud_state;
>   __u8type;
>   __u8dead;
> + u8  protocol;
>   seqlock_t   ha_lock;
>   unsigned char   ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];

This looks like ha[] alignment would change, I am not sure how critical it is.

>   struct hh_cache hh;
> @@ -173,6 +174,7 @@ struct pneigh_entry {
>   possible_net_t  net;
>   struct net_device   *dev;
>   u8  flags;
> + u8  protocol;
>   u8  key[0];
>  };
>  




Re: [PATCH] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Marek Vasut
On 12/07/2018 08:55 PM, Andrew Lunn wrote:
>> +dev->reset_gpio = -1;
>> +reset_gpio = of_get_named_gpio_flags(np, "reset-gpios", 0,
>> + _gpio_flags);
>> +if (reset_gpio >= 0) {
>> +flags = (reset_gpio_flags == OF_GPIO_ACTIVE_LOW) ?
>> +GPIOF_ACTIVE_LOW : 0;
> 
> Can you use devm_gpiod_get_optional()? It makes this a lot simpler.
> Take a look at mv88e6xxx/chip.c which also uses a GPIO for reset.

Done

> You also need to update the binding documentation for this new
> property.

Will do in a separate patch.

-- 
Best regards,
Marek Vasut


Re: [PATCH bpf 1/2] selftests/bpf: use thoff instead of nhoff in BPF flow dissector

2018-12-07 Thread Alexei Starovoitov
On Wed, Dec 05, 2018 at 08:40:47PM -0800, Stanislav Fomichev wrote:
> We are returning thoff from the flow dissector, not the nhoff. Pass
> thoff along with nhoff to the bpf program (initially thoff == nhoff)
> and expect the flow dissector to amend/return thoff, not nhoff.
> 
> This avoids the confusion where, by the time the bpf flow dissector
> exits, nhoff == thoff, which doesn't make much sense.
> 
> Signed-off-by: Stanislav Fomichev 

applied both to bpf tree. thanks



Re: [PATCH v2 bpf-next 0/7] bpf: support BPF_ALU | BPF_ARSH

2018-12-07 Thread Alexei Starovoitov
On Wed, Dec 05, 2018 at 01:52:29PM -0500, Jiong Wang wrote:
> BPF_ALU | BPF_ARSH | BPF_* were rejected by commit: 7891a87efc71
> ("bpf: arsh is not supported in 32 bit alu thus reject it"). As explained
> in the commit message, this is because there is no complete support for them
> in the interpreter and the various JIT compilation back-ends.
> 
> This patch set is a follow-up which completes the missing bits. It also
> paves the way for running bpf programs compiled with ALU32 instructions
> enabled by specifying -mattr=+alu32 to LLVM, in which case there are likely
> to be more BPF_ALU | BPF_ARSH insns that will trigger the rejection code.
> 
> test_verifier.c is updated accordingly.
> 
> I have tested this patch set on x86-64 and NFP, I need help of review and
> test on the arch changes (mips/ppc/s390).
> 
> Note, there might be a merge conflict on the mips change, which is better
> applied on top of:
> 
>   commit: 20b880a05f06 ("mips: bpf: fix encoding bug for mm_srlv32_op"),
> 
> which is on mips-fixes branch at the moment.
> 
> Thanks.
> 
> v1->v2:
>  - Fix ppc implementation bug. Should zero high bits explicitly.

I've applied this set and earlier commit "mips: bpf: fix encoding bug for 
mm_srlv32_op"
to bpf-next.

Thanks



Re: [PATCH bpf-next 0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

2018-12-07 Thread Alexei Starovoitov
On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> From: Björn Töpel 
> 
> Hi!
> 
> This patch set adds support for a new XDP socket bind option,
> XDP_ATTACH.
> 
> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
> 
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
> return bpf_xsk_redirect(ctx);
>   }
> 
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.

The feature makes sense to me.
Maybe XDP_ATTACH_BUILTIN would be a better name?
Also I think it needs another parameter to say which builtin
program to use.
This unconditional xsk_redirect is fine for performance
benchmarking, but for production I suspect the users would want
an easy way to stay safe when they're playing with AF_XDP.
So another builtin program that redirects ssh and ping traffic
back to the kernel would be a nice addition.



Re: [PATCH iproute2-next 0/2] devlink: Add support for 'fw_load_policy' generic parameter

2018-12-07 Thread David Ahern
On 12/4/18 3:14 AM, Shalom Toledo wrote:
> Patch #1 adds string to uint conversion support for generic parameters.
> Patch #2 adds string to uint support for the 'fw_load_policy' generic parameter
> 
> Shalom Toledo (2):
>   devlink: Add string to uint{8,16,32} conversion for generic parameters
>   devlink: Add support for 'fw_load_policy' generic parameter
> 
>  devlink/devlink.c| 156 ---
>  include/uapi/linux/devlink.h |   5 ++
>  2 files changed, 151 insertions(+), 10 deletions(-)
> 

applied to iproute2-next. Thanks


Re: [PATCH 1/5] net: dsa: ksz: Add MIB counter reading support

2018-12-07 Thread David Miller


Every patch series should have a header posting with Subject of
the form "[PATCH 0/N] ..." explaining what the series does at
a high level, how it does it, and why it does it that way.


Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren
* Russell King - ARM Linux  [181207 19:27]:
> You mentioned that edge mode didn't work as well as level mode on
> duovero smsc controller, I think this may help to solve the same
> issue but for edge IRQs - we need a mask_ack_irq function to avoid
> acking while the edge interrupt is masked.  Let me know if that
> lowers the smsc ping latency while in edge mode.

Looks like the smsc edge interrupt is still producing varying
ping latencies with this. Seems like the mask_ack_irq is
a nice improvement though.

Regards,

Tony


Re: [PATCH v2 net-next 0/4] net: aquantia: add RSS configuration

2018-12-07 Thread David Miller
From: Igor Russkikh 
Date: Fri, 7 Dec 2018 14:00:09 +

> In this patchset few bugs related to RSS are fixed and RSS table and
> hash key configuration is added.
> 
> We also do increase max number of HW rings upto 8.
> 
> v2: removed extra arg check

Series applied.


Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Florian Fainelli
On 12/7/18 9:26 AM, Andrew Lunn wrote:
>> Would you be happier if .ndo_change_carrier() only acted on Fixed PHYs?
> 
> I think it makes sense to allow a fixed phy carrier to be changed from
> user space. However, i don't think you can easily plumb that to
> .ndo_change_carrier(), since that is a MAC feature. You need to change
> the fixed_phy_status to indicate the PHY has lost link, and then let
> the usual mechanisms tell the MAC it is down and change the carrier
> status.

Joakim, I still don't understand what did not work with:

- adding a ndo_change_carrier() interface which keeps a boolean flag
whether the link was up or not

- register a fixed_link_update callback for your fixed PHY, which just
propagates that flag back to the fixed PHY

and that should take care of having the carrier go down, as driven by
the PHY state machine, for that fixed device.
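
For illustration, those two pieces could look roughly like the sketch
below. The fixed_link_up field and the function bodies are assumptions
rather than existing gianfar code, and the callback would be registered
with fixed_phy_set_link_update().

static int gfar_change_carrier(struct net_device *dev, bool new_carrier)
{
	struct gfar_private *priv = netdev_priv(dev);

	/* only remember the requested state here */
	priv->fixed_link_up = new_carrier;
	return 0;
}

static int gfar_fixed_link_update(struct net_device *dev,
				  struct fixed_phy_status *status)
{
	struct gfar_private *priv = netdev_priv(dev);

	/* fed back into the fixed PHY; the PHY state machine then drives
	 * the usual link-change path and the carrier state
	 */
	status->link = priv->fixed_link_up;
	return 0;
}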
-- 
Florian


Re: [PATCH rdma-next 0/3] Packet based credit mode

2018-12-07 Thread Jason Gunthorpe
On Fri, Dec 07, 2018 at 08:04:26AM +0200, Leon Romanovsky wrote:
> On Thu, Dec 06, 2018 at 08:27:06PM -0700, Jason Gunthorpe wrote:
> > On Fri, Nov 30, 2018 at 01:22:03PM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > >From Danit,
> > >
> > > Packet based credit mode is an alternative end-to-end credit mode for QPs
> > > set during their creation. Credits are transported from the responder
> > > to the requester to optimize the use of its receive resources.
> > > In packet-based credit mode, credits are issued on a per packet basis.
> > >
> > > The advantage of this feature comes while sending large RDMA messages
> > > through switches that are short in memory.
> > >
> > > The first commit exposes QP creation flag and the HCA capability. The
> > > second commit adds support for a new DV QP creation flag. The last
> > > commit report packet based credit mode capability via the MLX5DV device
> > > capabilities.
> > >
> > > Thanks
> > >
> > > Danit Goldberg (3):
> > >   net/mlx5: Expose packet based credit mode
> > >   IB/mlx5: Add packet based credit mode support
> > >   IB/mlx5: Report packet based credit mode device capability
> >
> > This looks fine to me, can you update the shared branch please
> 
> Done, thanks
> 3fd3c80acc17 net/mlx5: Expose packet based credit mode

Applied to for-next

Thanks,
Jason


Re: [PATCH net-next v2 0/4] net: mitigate retpoline overhead

2018-12-07 Thread Paolo Abeni
Hi,

On Thu, 2018-12-06 at 22:28 -0800, David Miller wrote:
> From: David Miller 
> Date: Thu, 06 Dec 2018 22:24:09 -0800 (PST)
> 
> > Series applied, thanks!
> 
> Erm... actually reverted.  Please fix these build failures:

oops ...
I'm sorry for the late reply. I'm travelling and I will not be able to
re-post soon.

> ld: net/ipv6/ip6_offload.o: in function `ipv6_gro_receive':
> ip6_offload.c:(.text+0xda2): undefined reference to `udp6_gro_receive'
> ld: ip6_offload.c:(.text+0xdb6): undefined reference to `udp6_gro_receive'
> ld: net/ipv6/ip6_offload.o: in function `ipv6_gro_complete':
> ip6_offload.c:(.text+0x1953): undefined reference to `udp6_gro_complete'
> ld: ip6_offload.c:(.text+0x1966): undefined reference to `udp6_gro_complete'
> make: *** [Makefile:1036: vmlinux] Error 1

Are you building with CONFIG_IPV6=m? I tested against some common configs,
but I omitted that one in my last iteration (my bad). With such a config the
ip6 offloads are built in while the udp6 offloads end up in the ipv6 module,
so I can't use them with the given config.

I'll try to fix the above in v3.

I'm sorry for this mess,

Paolo



Re: [PATCH net] ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output

2018-12-07 Thread David Miller
From: Shmulik Ladkani 
Date: Fri,  7 Dec 2018 09:50:17 +0200

> In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
> initialization.
> 
> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and 
> injection with lwtunnels")
> Signed-off-by: Shmulik Ladkani 

Applied and queued up for -stable, thanks.


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Andrew Lunn
> I think if you do this without setting offload_fwd_mark you will
> receive duplicate frame.

I don't think it will, at least not in the normal case. The hardware
should know the egress port, so there is no need to forward a copy to
the CPU. The only time it should forward to the CPU is when the egress
port is not known, so it floods. Without offload_fwd_mark set, the SW
bridge will flood it back out the ports causing duplication. But that
is not too bad. The Marvell driver did this for a while and nothing
bad was reported.

Andrew


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Marek Vasut
On 12/07/2018 08:37 PM, tristram...@microchip.com wrote:
>> If two ports are in the same bridge and in forwarding state, the packets
>> must be able to pass between them in both directions. The current code
>> only configures this bridge membership for a port newly added to the
>> bridge, but does not update all the other ports. Thus, ingress packets
>> on the new port will be forwarded, but ingress packets on other ports
>> destined for the new port (eg. a reply) will not be forwarded back to
>> the new port, because they are not configured to do so. This patch fixes
>> that by updating the membership registers of all ports.
>>
>> Signed-off-by: Marek Vasut 
>> Cc: Vivien Didelot 
>> Cc: Woojung Huh 
>> Cc: David S. Miller 
>> Cc: Tristram Ha 
>> ---
>>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/dsa/microchip/ksz9477.c
>> b/drivers/net/dsa/microchip/ksz9477.c
>> index 0684657fbf9a9..e24dd14ccde77 100644
>> --- a/drivers/net/dsa/microchip/ksz9477.c
>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  struct ksz_device *dev = ds->priv;
>>  struct ksz_port *p = >ports[port];
>>  u8 data;
>> -int member = -1;
>> +int i, member = -1;
>>
>>  ksz_pread8(dev, port, P_STP_CTRL, );
>>  data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
>> PORT_LEARN_DISABLE);
>> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  dev->tx_ports &= ~(1 << port);
>>
>>  /* Port membership may share register with STP state. */
>> -if (member >= 0 && member != p->member)
>> -ksz9477_cfg_port_member(dev, port, (u8)member);
>> +for (i = 0; i < SWITCH_PORT_NUM; i++)
>> +ksz9477_cfg_port_member(dev, i, (u8)member);
>>
>>  /* Check if forwarding needs to be updated. */
>>  if (state != BR_STATE_FORWARDING) {
> 
> The original DSA model did not have a way to tell the bridge device not to
> forward the frame, so the switch driver always setup the membership to
> disable forwarding between ports.
> 
> When lan devices are setup they act like individual devices.  A bridge device
> adding them under it will forward the frames.
> 
> The new switchdev model adds the offload_fwd_mark bit to tell the bridge not 
> to
> forward frame.
> 
> The ksz_update_port_member function in ksz_common.c is doing this membership
> setup for all forwarding ports.  It was finally enabled in one of the RFC 
> patches I
> submitted recently (Add switch forward offloading support).
> 
> I think if you do this without setting offload_fwd_mark you will receive 
> duplicate
> frame.

Do you have a git tree with all the KSZ patches based on -next
somewhere, so I don't have to look for them in random MLs ?

-- 
Best regards,
Marek Vasut


Re: [PATCH] net: dsa: ksz: Add reset GPIO handling

2018-12-07 Thread Andrew Lunn
> + dev->reset_gpio = -1;
> + reset_gpio = of_get_named_gpio_flags(np, "reset-gpios", 0,
> +  _gpio_flags);
> + if (reset_gpio >= 0) {
> + flags = (reset_gpio_flags == OF_GPIO_ACTIVE_LOW) ?
> + GPIOF_ACTIVE_LOW : 0;

Can you use devm_gpiod_get_optional()? It makes this a lot simpler.
Take a look at mv88e6xxx/chip.c which also uses a GPIO for reset.
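
For reference, a minimal sketch with devm_gpiod_get_optional() could
look like this, assuming ksz_device grows a struct gpio_desc *reset_gpio
member and with the delay being a placeholder:

	dev->reset_gpio = devm_gpiod_get_optional(dev->dev, "reset",
						  GPIOD_OUT_LOW);
	if (IS_ERR(dev->reset_gpio))
		return PTR_ERR(dev->reset_gpio);

	if (dev->reset_gpio) {
		gpiod_set_value_cansleep(dev->reset_gpio, 1);	/* assert */
		usleep_range(1000, 2000);			/* placeholder */
		gpiod_set_value_cansleep(dev->reset_gpio, 0);	/* release */
	}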

You also need to update the binding documentation for this new
property.

Andrew


Re: [PATCH net-next] neighbour: Improve garbage collection

2018-12-07 Thread David Ahern
On 12/6/18 8:59 PM, David Miller wrote:
> But why do you need the on_gc_list boolean state? f

mental blockage.

v2 coming up.


Re: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Florian Fainelli
On 12/7/18 11:37 AM, tristram...@microchip.com wrote:
>> If two ports are in the same bridge and in forwarding state, the packets
>> must be able to pass between them in both directions. The current code
>> only configures this bridge membership for a port newly added to the
>> bridge, but does not update all the other ports. Thus, ingress packets
>> on the new port will be forwarded, but ingress packets on other ports
>> destined for the new port (eg. a reply) will not be forwarded back to
>> the new port, because they are not configured to do so. This patch fixes
>> that by updating the membership registers of all ports.
>>
>> Signed-off-by: Marek Vasut 
>> Cc: Vivien Didelot 
>> Cc: Woojung Huh 
>> Cc: David S. Miller 
>> Cc: Tristram Ha 
>> ---
>>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/dsa/microchip/ksz9477.c
>> b/drivers/net/dsa/microchip/ksz9477.c
>> index 0684657fbf9a9..e24dd14ccde77 100644
>> --- a/drivers/net/dsa/microchip/ksz9477.c
>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  struct ksz_device *dev = ds->priv;
>>  struct ksz_port *p = >ports[port];
>>  u8 data;
>> -int member = -1;
>> +int i, member = -1;
>>
>>  ksz_pread8(dev, port, P_STP_CTRL, );
>>  data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
>> PORT_LEARN_DISABLE);
>> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
>> dsa_switch *ds, int port,
>>  dev->tx_ports &= ~(1 << port);
>>
>>  /* Port membership may share register with STP state. */
>> -if (member >= 0 && member != p->member)
>> -ksz9477_cfg_port_member(dev, port, (u8)member);
>> +for (i = 0; i < SWITCH_PORT_NUM; i++)
>> +ksz9477_cfg_port_member(dev, i, (u8)member);
>>
>>  /* Check if forwarding needs to be updated. */
>>  if (state != BR_STATE_FORWARDING) {
> 
> The original DSA model did not have a way to tell the bridge device not to
> forward the frame, so the switch driver always set up the membership to
> disable forwarding between ports.
> 
> When lan devices are set up, they act like individual devices.  A bridge device
> adding them under it will forward the frames.
> 
> The new switchdev model adds the offload_fwd_mark bit to tell the bridge not to
> forward the frame.
> 
> The ksz_update_port_member function in ksz_common.c is doing this membership
> setup for all forwarding ports.  It was finally enabled in one of the RFC
> patches I submitted recently (Add switch forward offloading support).
> 
> I think if you do this without setting offload_fwd_mark you will receive
> duplicate frames.
> 

Either I am misreading Marek's patch, or I don't quite understand your
response, but what is happening when you enslave a switch port into a
bridge is that you need to make sure that:

- the switch port being enslaved will be part of the same forwarding
group as any other switch port already in the bridge
- any existing switch port already enslaved in the bridge must now also
be allowed to forward to the port that is being enslaved

That is, to me, exactly what Marek's patch is fixing; your response is
about something slightly orthogonal here.
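
As a rough sketch of what that means for the switch driver (illustrative
only; the per-port "bridged" flag and "host_mask" field are assumptions, not
the actual ksz structures), the membership update boils down to recomputing
every bridged port's forwarding mask whenever one port changes state:

	/* Every port in the bridge must be allowed to forward to every
	 * other bridged port (plus the CPU port), in both directions.
	 */
	for (i = 0; i < SWITCH_PORT_NUM; i++) {
		u8 member = dev->host_mask;

		if (dev->ports[i].bridged) {
			for (j = 0; j < SWITCH_PORT_NUM; j++)
				if (j != i && dev->ports[j].bridged)
					member |= BIT(j);
		}

		ksz9477_cfg_port_member(dev, i, member);
	}
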
-- 
Florian


RE: [PATCH] net: dsa: ksz: Fix port membership

2018-12-07 Thread Tristram.Ha
> If two ports are in the same bridge and in forwarding state, the packets
> must be able to pass between them in both directions. The current code
> only configures this bridge membership for a port newly added to the
> bridge, but does not update all the other ports. Thus, ingress packets
> on the new port will be forwarded, but ingress packets on other ports
> destined for the new port (eg. a reply) will not be forwarded back to
> the new port, because they are not configured to do so. This patch fixes
> that by updating the membership registers of all ports.
> 
> Signed-off-by: Marek Vasut 
> Cc: Vivien Didelot 
> Cc: Woojung Huh 
> Cc: David S. Miller 
> Cc: Tristram Ha 
> ---
>  drivers/net/dsa/microchip/ksz9477.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz9477.c
> b/drivers/net/dsa/microchip/ksz9477.c
> index 0684657fbf9a9..e24dd14ccde77 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -396,7 +396,7 @@ static void ksz9477_port_stp_state_set(struct
> dsa_switch *ds, int port,
>   struct ksz_device *dev = ds->priv;
>   struct ksz_port *p = &dev->ports[port];
>   u8 data;
> - int member = -1;
> + int i, member = -1;
> 
>   ksz_pread8(dev, port, P_STP_CTRL, &data);
>   data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE |
> PORT_LEARN_DISABLE);
> @@ -454,8 +454,8 @@ static void ksz9477_port_stp_state_set(struct
> dsa_switch *ds, int port,
>   dev->tx_ports &= ~(1 << port);
> 
>   /* Port membership may share register with STP state. */
> - if (member >= 0 && member != p->member)
> - ksz9477_cfg_port_member(dev, port, (u8)member);
> + for (i = 0; i < SWITCH_PORT_NUM; i++)
> + ksz9477_cfg_port_member(dev, i, (u8)member);
> 
>   /* Check if forwarding needs to be updated. */
>   if (state != BR_STATE_FORWARDING) {

The original DSA model did not have a way to tell the bridge device not to
forward the frame, so the switch driver always set up the membership to
disable forwarding between ports.

When lan devices are set up, they act like individual devices.  A bridge device
adding them under it will forward the frames.

The new switchdev model adds the offload_fwd_mark bit to tell the bridge not to
forward the frame.

The ksz_update_port_member function in ksz_common.c is doing this membership
setup for all forwarding ports.  It was finally enabled in one of the RFC
patches I submitted recently (Add switch forward offloading support).

I think if you do this without setting offload_fwd_mark you will receive
duplicate frames.
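
For reference, offload_fwd_mark is set per skb in the DSA tag receive path,
so the bridge knows the switch already forwarded the frame in hardware and
skips forwarding it again in software. A minimal sketch (illustrative only,
not the actual ksz tagger code):

	static struct sk_buff *example_tag_rcv(struct sk_buff *skb,
					       struct net_device *dev,
					       struct packet_type *pt)
	{
		/* ... decode the switch tag, point skb->dev at the right
		 * slave netdev, trim the tag from the frame ...
		 */

		/* The switch already forwarded this frame in hardware, so
		 * tell the bridge not to forward it again in software.
		 */
		skb->offload_fwd_mark = 1;

		return skb;
	}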



Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Russell King - ARM Linux
On Fri, Dec 07, 2018 at 11:03:12AM -0800, Tony Lindgren wrote:
> * Tony Lindgren  [181207 18:14]:
> > Hi,
> > 
> > * Russell King - ARM Linux  [181207 18:01]:
> > > Hi Tony,
> > > 
> > > You know most of what's been going on from IRC, but here's the patch
> > > which gets me:
> > > 
> > > 1) working interrupts for networking
> > > 2) solves the stuck-wakeup problem
> > > 
> > > It also contains some of the debug bits I added.
> > 
> > This is excellent news :) Will test today.
> 
> Yes your patch seems to work great based on brief testing :)
> 
> > > I think what this means is that we should strip out ec0daae685b2
> > > ("gpio: omap: Add level wakeup handling for omap4 based SoCs").
> > 
> > Yes the only reason for the wakeup quirk was the stuck wakeup
> > state seen on omap4, it can be just dropped if this works.
> > Adding Grygorii to Cc too.
> 
> I'll post a partial revert for commit ec0daae685b2 ("gpio: omap:
> Add level wakeup handling for omap4 based SoCs") shortly.

Hi,

You mentioned that edge mode didn't work as well as level mode on the
duovero smsc controller; I think this may help to solve the same
issue but for edge IRQs - we need a mask_ack_irq function to avoid
acking while the edge interrupt is masked.  Let me know if that
lowers the smsc ping latency while in edge mode.

Thanks.

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index 3d021f648c5d..b1ad6098e894 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -11,7 +11,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-
+#define DEBUG
 #include 
 #include 
 #include 
@@ -366,10 +366,14 @@ static inline void omap_set_gpio_trigger(struct gpio_bank 
*bank, int gpio,
  trigger & IRQ_TYPE_LEVEL_LOW);
omap_gpio_rmw(base, bank->regs->leveldetect1, gpio_bit,
  trigger & IRQ_TYPE_LEVEL_HIGH);
+   /*
+* We need the edge detect enabled for the idle mode detection
+* to function on OMAP4430.
+*/
omap_gpio_rmw(base, bank->regs->risingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_RISING);
+ trigger & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_LEVEL_HIGH));
omap_gpio_rmw(base, bank->regs->fallingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_FALLING);
+ trigger & (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_LEVEL_LOW));
 
bank->context.leveldetect0 =
readl_relaxed(bank->base + bank->regs->leveldetect0);
@@ -899,6 +903,19 @@ static void omap_gpio_mask_irq(struct irq_data *d)
raw_spin_unlock_irqrestore(&bank->lock, flags);
 }
 
+static void omap_gpio_mask_ack_irq(struct irq_data *d)
+{
+   struct gpio_bank *bank = omap_irq_data_get_bank(d);
+   unsigned offset = d->hwirq;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&bank->lock, flags);
+   omap_clear_gpio_irqstatus(bank, offset);
+   omap_set_gpio_irqenable(bank, offset, 0);
+   omap_set_gpio_triggering(bank, offset, IRQ_TYPE_NONE);
+   raw_spin_unlock_irqrestore(&bank->lock, flags);
+}
+
 static void omap_gpio_unmask_irq(struct irq_data *d)
 {
struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -910,14 +927,16 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
if (trigger)
omap_set_gpio_triggering(bank, offset, trigger);
 
+   omap_set_gpio_irqenable(bank, offset, 1);
+
/* For level-triggered GPIOs, the clearing must be done after
-* the HW source is cleared, thus after the handler has run */
-   if (bank->level_mask & BIT(offset)) {
-   omap_set_gpio_irqenable(bank, offset, 0);
+* the HW source is cleared, thus after the handler has run.
+* OMAP4 needs this done _after_ enabling the interrupt to clear
+* the wakeup status.
+*/
+   if (bank->level_mask & BIT(offset))
omap_clear_gpio_irqstatus(bank, offset);
-   }
 
-   omap_set_gpio_irqenable(bank, offset, 1);
raw_spin_unlock_irqrestore(&bank->lock, flags);
 }
 
@@ -1377,6 +1396,7 @@ static int omap_gpio_probe(struct platform_device *pdev)
 
irqc->irq_startup = omap_gpio_irq_startup,
irqc->irq_shutdown = omap_gpio_irq_shutdown,
+   irqc->irq_mask_ack = omap_gpio_mask_ack_irq,
irqc->irq_ack = omap_gpio_ack_irq,
irqc->irq_mask = omap_gpio_mask_irq,
irqc->irq_unmask = omap_gpio_unmask_irq,
@@ -1520,6 +1540,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, bool 
may_lose_context)
struct device *dev = bank->chip.parent;
u32 l1 = 0, l2 = 0;
 
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
+
if (bank->funcs.idle_enable_level_quirk)
bank->funcs.idle_enable_level_quirk(bank);
 
@@ -1553,6 

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren
* Tony Lindgren  [181207 18:14]:
> Hi,
> 
> * Russell King - ARM Linux  [181207 18:01]:
> > Hi Tony,
> > 
> > You know most of what's been going on from IRC, but here's the patch
> > which gets me:
> > 
> > 1) working interrupts for networking
> > 2) solves the stuck-wakeup problem
> > 
> > It also contains some of the debug bits I added.
> 
> This is excellent news :) Will test today.

Yes your patch seems to work great based on brief testing :)

> > I think what this means is that we should strip out ec0daae685b2
> > ("gpio: omap: Add level wakeup handling for omap4 based SoCs").
> 
> Yes the only reason for the wakeup quirk was the stuck wakeup
> state seen on omap4, it can be just dropped if this works.
> Adding Grygorii to Cc too.

I'll post a partial revert for commit ec0daae685b2 ("gpio: omap:
Add level wakeup handling for omap4 based SoCs") shortly.

Thanks,

Tony


Re: [PATCH bpf-next] bpf: relax verifier restriction on BPF_MOV | BPF_ALU

2018-12-07 Thread Alexei Starovoitov
On Fri, Dec 07, 2018 at 05:19:21PM +, Jiong Wang wrote:
> On 06/12/2018 03:13, Alexei Starovoitov wrote:
> > On Wed, Dec 05, 2018 at 03:32:50PM +, Jiong Wang wrote:
> > > On 05/12/2018 14:52, Edward Cree wrote:
> > > > On 05/12/18 09:46, Jiong Wang wrote:
> > > > > There is NO processed instruction number regression, either with or 
> > > > > without
> > > > > -mattr=+alu32.
> > > > 
> > > > > Cilium bpf
> > > > > ===
> > > > > bpf_lb-DLB_L3.o 2110/2110    1730/1733
> > > > That looks like a regression of 3 insns in the 32-bit case.
> > > > May be worth investigating why.
> > > 
> > > Will look into this.
> > > 
> > > > 
> > > > > +  dst_reg = insn->dst_reg;
> > > > > +  regs[dst_reg] = regs[src_reg];
> > > > > +  if (BPF_CLASS(insn->code) == BPF_ALU) {
> > > > > +  /* Update type and range info. */
> > > > > +  regs[dst_reg].type = SCALAR_VALUE;
> > > > > +  coerce_reg_to_size(&regs[dst_reg], 4);
> > > > Won't this break when handed a pointer (as root, so allowed to leak
> > > >   it)?  E.g. (pointer + x) gets turned into scalar x, rather than
> > > >   unknown scalar in range [0, 0xffffffff].
> > > 
> > > Initially I was gating this to scalar_value only; later I was thinking it
> > > could be extended to the ptr case if ptr leak is allowed.
> > > 
> > > But your comment reminds me that min/max value doesn't mean the real min/max
> > > value for ptr type values, it means only the offset, if I am understanding
> > > the issue correctly. So, it will break the pointer case.
> > 
> > correct. In case of is_pointer_value() && root -> mark_reg_unknown() has to 
> > be called
> > 
> > The explanation of additional 3 steps from another email makes sense to me.
> > 
> > Can you take a look why it helps default (llvm alu64) case too ?
> > bpf_overlay.o   3096/2898
> 
> It is embarrassing that I am not able to reproduce this number after trying
> quite a few env configurations. I think the number must be wrong, because the
> llvm alu64 binary doesn't contain alu32 moves and so shouldn't be impacted by
> this patch. Even though I double checked the raw data I collected on llvm
> alu64 and re-calculated the number before this patch, it is still 3096. I
> guess there must be something wrong with the binary I was loading.
> 
> I improved my benchmarking methodology to build all alu64 and alu32
> binaries first, and never change them later. Then I used a script to load and
> collect the processed number. (I borrowed the script from
> https://github.com/4ast/bpf_cilium_test/, only my binaries are built from the
> latest Cilium repo and contain alu32 versions as well.)
> 
> I ran this new benchmarking env for several times, and could get the
> following new results consistently:
> 
> bpf_lb-DLB_L3.o:    2085/2085    1685/1687
> bpf_lb-DLB_L4.o:    2287/2287    1986/1982
> bpf_lb-DUNKNOWN.o:  690/690      622/622
> bpf_lxc.o:          95033/95033  N/A
> bpf_netdev.o:       7245/7245    N/A
> bpf_overlay.o:      2898/2898    3085/2947
> 
> No change on alu64 binary.
> 
> For alu32, bpf_overlay.o still gets a lower processed instruction count; this
> is because there is the following sequence (and another similar one).
> Before this patch, r2 at insn 139 is unknown, so the verifier always explores
> both path-taken and path-fall_through. After this patch, it explores
> path-fall_through only, so some insns are saved.
> 
>   129: (b4) (u32) r7 = (u32) -140
>   ...
>   136: (bc) (u32) r2 = (u32) r7
>   137: (74) (u32) r2 >>= (u32) 31
>   138: (4c) (u32) r2 |= (u32) r1
>   139: (15) if r2 == 0x0 goto pc+342
>   140: (b4) (u32) r1 = (u32) 2
> 
> And the more permissive register value for r2 hasn't enabled more path pruning
> for this test, so all in all, after this patch, there are fewer processed insns.
> 
> I have sent out a v2, gated this change under SCALAR_VALUE, and also
> updated the patch description.

Thanks for the update. Makes sense.



Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Tony Lindgren
Hi,

* Russell King - ARM Linux  [181207 18:01]:
> Hi Tony,
> 
> You know most of what's been going on from IRC, but here's the patch
> which gets me:
> 
> 1) working interrupts for networking
> 2) solves the stuck-wakeup problem
> 
> It also contains some of the debug bits I added.

This is excellent news :) Will test today.

> I think what this means is that we should strip out ec0daae685b2
> ("gpio: omap: Add level wakeup handling for omap4 based SoCs").

Yes the only reason for the wakeup quirk was the stuck wakeup
state seen on omap4, it can be just dropped if this works.
Adding Grygorii to Cc too.

Regards,

Tony

> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index 3d021f648c5d..528ffd1b9832 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -11,7 +11,7 @@
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
> -
> +#define DEBUG
>  #include 
>  #include 
>  #include 
> @@ -366,10 +366,14 @@ static inline void omap_set_gpio_trigger(struct 
> gpio_bank *bank, int gpio,
> trigger & IRQ_TYPE_LEVEL_LOW);
>   omap_gpio_rmw(base, bank->regs->leveldetect1, gpio_bit,
> trigger & IRQ_TYPE_LEVEL_HIGH);
> + /*
> +  * We need the edge detect enabled for the idle mode detection
> +  * to function on OMAP4430.
> +  */
>   omap_gpio_rmw(base, bank->regs->risingdetect, gpio_bit,
> -   trigger & IRQ_TYPE_EDGE_RISING);
> +   trigger & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_LEVEL_HIGH));
>   omap_gpio_rmw(base, bank->regs->fallingdetect, gpio_bit,
> -   trigger & IRQ_TYPE_EDGE_FALLING);
> +   trigger & (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_LEVEL_LOW));
>  
>   bank->context.leveldetect0 =
>   readl_relaxed(bank->base + bank->regs->leveldetect0);
> @@ -910,14 +914,16 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
>   if (trigger)
>   omap_set_gpio_triggering(bank, offset, trigger);
>  
> + omap_set_gpio_irqenable(bank, offset, 1);
> +
>   /* For level-triggered GPIOs, the clearing must be done after
> -  * the HW source is cleared, thus after the handler has run */
> - if (bank->level_mask & BIT(offset)) {
> - omap_set_gpio_irqenable(bank, offset, 0);
> +  * the HW source is cleared, thus after the handler has run.
> +  * OMAP4 needs this done _after_ enabling the interrupt to clear
> +  * the wakeup status.
> +  */
> + if (bank->level_mask & BIT(offset))
>   omap_clear_gpio_irqstatus(bank, offset);
> - }
>  
> - omap_set_gpio_irqenable(bank, offset, 1);
>   raw_spin_unlock_irqrestore(&bank->lock, flags);
>  }
>  
> @@ -1520,6 +1526,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, 
> bool may_lose_context)
>   struct device *dev = bank->chip.parent;
>   u32 l1 = 0, l2 = 0;
>  
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
> +
>   if (bank->funcs.idle_enable_level_quirk)
>   bank->funcs.idle_enable_level_quirk(bank);
>  
> @@ -1553,6 +1563,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, 
> bool may_lose_context)
>   bank->get_context_loss_count(dev);
>  
>   omap_gpio_dbck_disable(bank);
> +
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
>  }
>  
>  static void omap_gpio_init_context(struct gpio_bank *p);
> @@ -1563,6 +1577,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
>   u32 l = 0, gen, gen0, gen1;
>   int c;
>  
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
> +
>   /*
>* On the first resume during the probe, the context has not
>* been initialised and so initialise it now. Also initialise
> @@ -1648,6 +1666,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
>   }
>  
>   bank->workaround_enabled = false;
> +
> + dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
> + bank->context.leveldetect0, bank->context.leveldetect1,
> + bank->context.wake_en);
>  }
>  
>  static void omap_gpio_init_context(struct gpio_bank *p)
> @@ -1720,6 +1742,7 @@ static int __maybe_unused 
> omap_gpio_runtime_suspend(struct device *dev)
>   error = -EBUSY;
>   goto unlock;
>   }
> + dev_dbg(dev, "%s()\n", __func__);
>   omap_gpio_idle(bank, true);
>   bank->is_suspended = true;
>  unlock:
> @@ -1741,6 +1764,7 @@ static int __maybe_unused 
> omap_gpio_runtime_resume(struct device *dev)
>   

RE: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ioana Ciocoi Radulescu
> -Original Message-
> From: Ilias Apalodimas 
> Sent: Friday, December 7, 2018 7:52 PM
> To: Ioana Ciocoi Radulescu 
> Cc: Jesper Dangaard Brouer ;
> netdev@vger.kernel.org; da...@davemloft.net; Ioana Ciornei
> ; dsah...@gmail.com; Camelia Alexandra Groza
> 
> Subject: Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support
> 
> Hi Ioana,
> > > > >
> > > I only did a quick grep around the driver so i might be missing something,
> > > but i can only see allocations via napi_alloc_frag(). XDP requires pages
> > > (either a single page per packet or a driver that does the page
> management
> > > of
> > > its own and fits 2 frames in a single page, assuming 4kb pages).
> > > Am i missing something on the driver?
> >
> > No, I guess I'm the one missing stuff, I didn't realise single page per 
> > packet
> > is a hard requirement for XDP. Could you point me to more info on this?
> >
> 
> Well, if you don't have to use 64kb pages you can use the page_pool API (only
> used from mlx5 atm) and get the xdp recycling for free. The memory 'waste' for
> 4kb pages isn't too much if the platforms the driver sits on have decent
> amounts of memory (and the number of descriptors used is not too high).
> We still have work in progress with Jesper (just posted an RFC) with
> improvements on the API.
> Using it is fairly straightforward. This is a patchset on marvell's mvneta
> driver with the API changes needed:
> https://www.spinics.net/lists/netdev/msg538285.html
> 
> If you need 64kb pages you would have to introduce page recycling and
> sharing in your driver, like the intel/mlx drivers do.

Thanks a lot for the info, will look into this. Do you have any pointers
as to why the full page restriction exists in the first place? Sorry if it's
a dumb question, but I haven't found details on this and I'd really like
to understand it.

Thanks
Ioana


Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-07 Thread Russell King - ARM Linux
Hi Tony,

You know most of what's been going on from IRC, but here's the patch
which gets me:

1) working interrupts for networking
2) solves the stuck-wakeup problem

It also contains some of the debug bits I added.

I think what this means is that we should strip out ec0daae685b2
("gpio: omap: Add level wakeup handling for omap4 based SoCs").

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index 3d021f648c5d..528ffd1b9832 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -11,7 +11,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-
+#define DEBUG
 #include 
 #include 
 #include 
@@ -366,10 +366,14 @@ static inline void omap_set_gpio_trigger(struct gpio_bank 
*bank, int gpio,
  trigger & IRQ_TYPE_LEVEL_LOW);
omap_gpio_rmw(base, bank->regs->leveldetect1, gpio_bit,
  trigger & IRQ_TYPE_LEVEL_HIGH);
+   /*
+* We need the edge detect enabled for the idle mode detection
+* to function on OMAP4430.
+*/
omap_gpio_rmw(base, bank->regs->risingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_RISING);
+ trigger & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_LEVEL_HIGH));
omap_gpio_rmw(base, bank->regs->fallingdetect, gpio_bit,
- trigger & IRQ_TYPE_EDGE_FALLING);
+ trigger & (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_LEVEL_LOW));
 
bank->context.leveldetect0 =
readl_relaxed(bank->base + bank->regs->leveldetect0);
@@ -910,14 +914,16 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
if (trigger)
omap_set_gpio_triggering(bank, offset, trigger);
 
+   omap_set_gpio_irqenable(bank, offset, 1);
+
/* For level-triggered GPIOs, the clearing must be done after
-* the HW source is cleared, thus after the handler has run */
-   if (bank->level_mask & BIT(offset)) {
-   omap_set_gpio_irqenable(bank, offset, 0);
+* the HW source is cleared, thus after the handler has run.
+* OMAP4 needs this done _after_ enabling the interrupt to clear
+* the wakeup status.
+*/
+   if (bank->level_mask & BIT(offset))
omap_clear_gpio_irqstatus(bank, offset);
-   }
 
-   omap_set_gpio_irqenable(bank, offset, 1);
raw_spin_unlock_irqrestore(&bank->lock, flags);
 }
 
@@ -1520,6 +1526,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, bool 
may_lose_context)
struct device *dev = bank->chip.parent;
u32 l1 = 0, l2 = 0;
 
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
+
if (bank->funcs.idle_enable_level_quirk)
bank->funcs.idle_enable_level_quirk(bank);
 
@@ -1553,6 +1563,10 @@ static void omap_gpio_idle(struct gpio_bank *bank, bool 
may_lose_context)
bank->get_context_loss_count(dev);
 
omap_gpio_dbck_disable(bank);
+
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
 }
 
 static void omap_gpio_init_context(struct gpio_bank *p);
@@ -1563,6 +1577,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
u32 l = 0, gen, gen0, gen1;
int c;
 
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
+
/*
 * On the first resume during the probe, the context has not
 * been initialised and so initialise it now. Also initialise
@@ -1648,6 +1666,10 @@ static void omap_gpio_unidle(struct gpio_bank *bank)
}
 
bank->workaround_enabled = false;
+
+   dev_dbg(dev, "%s(): ld 0x%08x 0x%08x we 0x%08x\n", __func__,
+   bank->context.leveldetect0, bank->context.leveldetect1,
+   bank->context.wake_en);
 }
 
 static void omap_gpio_init_context(struct gpio_bank *p)
@@ -1720,6 +1742,7 @@ static int __maybe_unused 
omap_gpio_runtime_suspend(struct device *dev)
error = -EBUSY;
goto unlock;
}
+   dev_dbg(dev, "%s()\n", __func__);
omap_gpio_idle(bank, true);
bank->is_suspended = true;
 unlock:
@@ -1741,6 +1764,7 @@ static int __maybe_unused omap_gpio_runtime_resume(struct 
device *dev)
error = -EBUSY;
goto unlock;
}
+   dev_dbg(dev, "%s()\n", __func__);
omap_gpio_unidle(bank);
bank->is_suspended = false;
 unlock:
@@ -1827,8 +1851,8 @@ static const struct omap_gpio_platform_data omap4_pdata = 
{
.regs = _gpio_regs,
.bank_width = 32,
.dbck_flag = true,
-   .quirks = 

Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ilias Apalodimas
Hi Ioana,
> > > >
> > I only did a quick grep around the driver so i might be missing something,
> > but i can only see allocations via napi_alloc_frag(). XDP requires pages
> > (either a single page per packet or a driver that does the page management
> > of
> > its own and fits 2 frames in a single page, assuming 4kb pages).
> > Am i missing something on the driver?
> 
> No, I guess I'm the one missing stuff, I didn't realise single page per packet
> is a hard requirement for XDP. Could you point me to more info on this?
> 

Well, if you don't have to use 64kb pages you can use the page_pool API (only
used from mlx5 atm) and get the xdp recycling for free. The memory 'waste' for
4kb pages isn't too much if the platforms the driver sits on have decent amounts
of memory (and the number of descriptors used is not too high).
We still have work in progress with Jesper (just posted an RFC) with improvements
on the API.
Using it is fairly straightforward. This is a patchset on marvell's mvneta
driver with the API changes needed:
https://www.spinics.net/lists/netdev/msg538285.html

If you need 64kb pages you would have to introduce page recycling and sharing
in your driver, like the intel/mlx drivers do.
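
To make that a bit more concrete, the driver-side page_pool setup is roughly
the following (a hedged sketch in the spirit of the mvneta RFC above; the
rx_ring_size/port_dev/rxq names are placeholders, not dpaa2-eth code):

	struct page_pool_params pp_params = {
		.order		= 0,			/* one 4K page per frame */
		.flags		= PP_FLAG_DMA_MAP,	/* pool handles DMA mapping */
		.pool_size	= rx_ring_size,
		.nid		= NUMA_NO_NODE,
		.dev		= port_dev,		/* struct device of the NIC */
		.dma_dir	= DMA_FROM_DEVICE,
	};
	struct page_pool *pool = page_pool_create(&pp_params);

	if (IS_ERR(pool))
		return PTR_ERR(pool);

	/* Register the pool as this RX queue's memory model so frames that
	 * come back from XDP (e.g. after XDP_REDIRECT) are recycled into it.
	 */
	err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL,
					 pool);
	if (err)
		return err;

	/* RX refill then hands one full page per descriptor to the HW: */
	page = page_pool_dev_alloc_pages(pool);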

/Ilias


RE: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support

2018-12-07 Thread Ioana Ciocoi Radulescu



> -Original Message-
> From: Ilias Apalodimas 
> Sent: Friday, December 7, 2018 7:20 PM
> To: Ioana Ciocoi Radulescu 
> Cc: Jesper Dangaard Brouer ;
> netdev@vger.kernel.org; da...@davemloft.net; Ioana Ciornei
> ; dsah...@gmail.com; Camelia Alexandra Groza
> 
> Subject: Re: [PATCH v2 net-next 0/8] dpaa2-eth: Introduce XDP support
> 
> Hi Ioana,
> > >
> > > > Add support for XDP programs. Only XDP_PASS, XDP_DROP and
> XDP_TX
> > > > actions are supported for now. Frame header changes are also
> > > > allowed.
> 
> I only did a quick grep around the driver so i might be missing something,
> but i can only see allocations via napi_alloc_frag(). XDP requires pages
> (either a single page per packet or a driver that does the page management
> of
> its own and fits 2 frames in a single page, assuming 4kb pages).
> Am i missing something on the driver?

No, I guess I'm the one missing stuff, I didn't realise single page per packet
is a hard requirement for XDP. Could you point me to more info on this?

Thanks,
Ioana

> 
> > >
> > > Do you have any XDP performance benchmarks on this hardware?
> >
> > We have some preliminary perf data that doesn't look great,
> > but we hope to improve it :)
> 
> As Jesper said we are doing similar work on a cortex a-53 and plan to work on
> a-72 as well. We might be able to help out.
> 
> /Ilias


Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-07 Thread Andrew Lunn
> Would you be happier if .ndo_change_carrier() only acted on Fixed PHYs?

I think it makes sense to allow a fixed phy carrier to be changed from
user space. However, i don't think you can easily plumb that to
.ndo_change_carrier(), since that is a MAC feature. You need to change
the fixed_phy_status to indicate the PHY has lost link, and then let
the usual mechanisms tell the MAC it is down and change the carrier
status.
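
A hedged sketch of that approach (assuming the fixed_phy_set_link_update()
hook fits here; "priv->user_link" is an invented flag that user space would
toggle through some driver-specific knob, not an existing gianfar field):

	static int example_fixed_link_update(struct net_device *ndev,
					     struct fixed_phy_status *status)
	{
		struct example_priv *priv = netdev_priv(ndev);

		/* Report the administratively requested link state; phylib's
		 * polling then tells the MAC and updates the carrier.
		 */
		status->link = priv->user_link;
		return 0;
	}

	/* At probe time, once the fixed PHY is attached: */
	fixed_phy_set_link_update(ndev->phydev, example_fixed_link_update);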

Andrew


  1   2   3   4   5   6   7   8   9   10   >