Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-08 Thread Jesper Dangaard Brouer
On Fri, 9 Nov 2018 04:52:01 +
Saeed Mahameed  wrote:

> On Thu, 2018-11-08 at 17:42 -0700, David Ahern wrote:
> > On 11/8/18 5:40 PM, Paweł Staszewski wrote:  
> > > 
> > > W dniu 08.11.2018 o 17:32, David Ahern pisze:  
> > > > On 11/8/18 9:27 AM, Paweł Staszewski wrote:  
> > > > > > > What hardware is this?
> > > > > > >   
> > > > > mellanox connectx 4
> > > > > ethtool -i enp175s0f0
> > > > > driver: mlx5_core
> > > > > version: 5.0-0
> > > > > firmware-version: 12.21.1000 (SM_200101033)
> > > > > expansion-rom-version:
> > > > > bus-info: :af:00.0
> > > > > supports-statistics: yes
> > > > > supports-test: yes
> > > > > supports-eeprom-access: no
> > > > > supports-register-dump: no
> > > > > supports-priv-flags: yes
> > > > > 
> > > > > ethtool -i enp175s0f1
> > > > > driver: mlx5_core
> > > > > version: 5.0-0
> > > > > firmware-version: 12.21.1000 (SM_200101033)
> > > > > expansion-rom-version:
> > > > > bus-info: :af:00.1
> > > > > supports-statistics: yes
> > > > > supports-test: yes
> > > > > supports-eeprom-access: no
> > > > > supports-register-dump: no
> > > > > supports-priv-flags: yes
> > > > >   
> > > > > > > Start with:
> > > > > > > 
> > > > > > > echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
> > > > > > > cat /sys/kernel/debug/tracing/trace_pipe  
> > > > > >   cat /sys/kernel/debug/tracing/trace_pipe
> > > > > >   -0 [045] ..s. 68469.467752:
> > > > > > xdp_devmap_xmit:
> > > > > > ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0
> > > > > > drops=1
> > > > > > from_ifindex=4 to_ifindex=5 err=-6  
> > > > FIB lookup is good, the redirect is happening, but the mlx5
> > > > driver does
> > > > not like it.
> > > > 
> > > > I think the -6 is coming from the mlx5 driver and the packet is
> > > > getting
> > > > dropped. Perhaps this check in mlx5e_xdp_xmit:
> > > > 
> > > > if (unlikely(sq_num >= priv->channels.num))
> > > >  return -ENXIO;  
> > > I removed that part and recompiled - but after running xdp_fwd now I
> > > have a kernel panic :)  
> >   
> 
> hh, no please don't do such thing :)
> 
> It must be because the tx netdev has fewer tx queues than the rx netdev,
> or the rx netdev rings are bound to high cpu indexes.
> 
> anyway, best practice is to open #cores RX/TX channels on the netdevs on both sides:
> 
> ethtool -L enp175s0f0  combined $(nproc) 
> ethtool -L enp175s0f1  combined $(nproc)
> 
> > Jesper or one of the Mellanox folks needs to respond about the config
> > needed to run XDP with this NIC. I don't have a 40G or 100G card to
> > play with.  

Saeed already answered with a solution... you need to increase the
number of RX/TX queues to be equal to the number of CPUs.

IMHO this again shows that the resource allocation around ndo_xdp_xmit
needs a better API.  The implicit requirement is that once ndo_xdp_xmit
is enabled the driver MUST allocate a dedicated TX queue for XDP for
each CPU.  For mlx5 this seems to be a manual process, and as Pawel
discovered it is hard to troubleshoot, and only via tracepoints.

I think we need to do better in this area, both regarding usability and
more graceful handling when the HW doesn't have the resources.  The
original requirement of an XDP-TX queue per CPU was necessary because
ndo_xdp_xmit was only sending one packet at a time.  After my recent
changes, ndo_xdp_xmit can now send in bulk.  Thus, performance-wise it
is feasible to use an (array of) locks if e.g. the HW cannot allocate
more TX HW queues, or e.g. to allow the sysadmin to set the mode of
operation (if the system as a whole has issues allocating TX completion
IRQs for all these queues).
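
For reference, the reason the queue count matters is that the XDP xmit
path picks the TX queue from the current CPU id.  Roughly (a simplified
sketch, not the exact mlx5 code):

    /* Simplified sketch of the XDP TX queue selection.  The queue index
     * is derived from the CPU id, so every CPU that can redirect needs
     * its own queue, otherwise the bounds check below fails with -ENXIO
     * (the err=-6 seen in the tracepoint output above). */
    int sq_num = smp_processor_id();

    if (unlikely(sq_num >= priv->channels.num))
            return -ENXIO;

    sq = &priv->channels.c[sq_num]->xdpsq;

Hence the need for at least as many TX queues as CPUs, or a locking
scheme as described above.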

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH bpf] bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp

2018-11-08 Thread Daniel Borkmann
On 11/07/2018 10:36 PM, Andrey Ignatov wrote:
> Lookup functions in sk_lookup have different expectations about byte
> order of provided arguments.
> 
> Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup
> expect dport to be in network byte order and do ntohs(dport) internally.
> 
> At the same time __inet6_lookup expects dport to be in host byte order
> and correspondingly names the argument hnum.
> 
> sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and
> __inet6_lookup with regard to dport. But in the __udp6_lib_lookup case
> it uses host instead of the expected network byte order. This makes the
> result returned by bpf_sk_lookup_udp for IPv6 incorrect.
> 
> The patch fixes byte order of dport passed to __udp6_lib_lookup.
> 
> Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84f02a
> fixes TCPv6 but breaks UDPv6.
> 
> Fixes: 5ef0ae84f02a ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup")
> Signed-off-by: Andrey Ignatov 
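
For anyone following along, the fix essentially amounts to passing the
dport field through unconverted.  A rough sketch of the relevant call
(argument list abbreviated/assumed, not the exact diff):

    /* __udp6_lib_lookup() expects dport in network byte order and does
     * ntohs() internally, so pass tuple->ipv6.dport as-is rather than
     * the host-order value. */
    sk = __udp6_lib_lookup(net, src6, tuple->ipv6.sport,
                           dst6, tuple->ipv6.dport,
                           dif, sdif, &udp_table, NULL);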

Applied to bpf, thanks Andrey!


Re: [PATCH bpf v2] tools/bpftool: copy a few net uapi headers to tools directory

2018-11-08 Thread Daniel Borkmann
On 11/08/2018 04:55 AM, Yonghong Song wrote:
> Commit f6f3bac08ff9 ("tools/bpf: bpftool: add net support")
> added certain networking support to bpftool.
> The implementation relies on a relatively recent uapi header file,
> linux/tc_act/tc_bpf.h, on the host which contains the macro
> definition of TCA_ACT_BPF_ID.
> 
> Unfortunately, this is not the case for all distributions.
> See the email message below where rhel-7.2 does not have
> an up-to-date linux/tc_act/tc_bpf.h.
>   https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1799211.html
> Further investigation found that linux/pkt_cls.h is also needed for macro
> TCA_BPF_TAG.
> 
> This patch fixes the issue by copying linux/tc_act/tc_bpf.h
> and linux/pkt_cls.h from the kernel include/uapi directory to the
> tools/include/uapi directory, so building bpftool does not depend
> on the host system for these files.
> 
> Fixes: f6f3bac08ff9 ("tools/bpf: bpftool: add net support")
> Reported-by: kernel test robot 
> Cc: Li Zhijian 
> Signed-off-by: Yonghong Song 

Applied to bpf, thanks Yonghong!


Re: [PATCH bpf 0/4] tools: bpftool: bring several minor fixes to bpftool

2018-11-08 Thread Daniel Borkmann
On 11/08/2018 12:52 PM, Quentin Monnet wrote:
> Hi,
> This set contains minor fixes for bpftool code and documentation.
> Please refer to individual patches for details.
> 
> Quentin Monnet (4):
>   tools: bpftool: prevent infinite loop in get_fdinfo()
>   tools: bpftool: fix plain output and doc for --bpffs option
>   tools: bpftool: pass an argument to silence open_obj_pinned()
>   tools: bpftool: update references to other man pages in documentation
> 
>  tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-map.rst|  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-net.rst|  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-perf.rst   |  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-prog.rst   | 11 +--
>  tools/bpf/bpftool/Documentation/bpftool.rst|  9 +++--
>  tools/bpf/bpftool/common.c | 17 +
>  tools/bpf/bpftool/main.h   |  2 +-
>  tools/bpf/bpftool/prog.c   |  3 +--
>  9 files changed, 55 insertions(+), 19 deletions(-)
> 

Applied to bpf, thanks Quentin!


[PATCH][RFC] udp: cache sock to avoid searching it twice

2018-11-08 Thread Li RongQing
GRO for UDP needs to look up the socket twice, first in gro_receive and
again in gro_complete. Storing the sock in the skb avoids the second
lookup, which gives a small performance boost.

netperf -t UDP_RR -l 10

Before:
Rate per sec: 28746.01
After:
Rate per sec: 29401.67

Signed-off-by: Li RongQing 
---
 net/ipv4/udp_offload.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 0646d61f4fa8..429570112a33 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -408,6 +408,11 @@ struct sk_buff *udp_gro_receive(struct list_head *head, 
struct sk_buff *skb,
 
if (udp_sk(sk)->gro_enabled) {
pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+
+   if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
+   sock_hold(sk);
+   pp->sk = sk;
+   }
rcu_read_unlock();
return pp;
}
@@ -444,6 +449,10 @@ struct sk_buff *udp_gro_receive(struct list_head *head, 
struct sk_buff *skb,
skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
 
+   if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
+   sock_hold(sk);
+   pp->sk = sk;
+   }
 out_unlock:
rcu_read_unlock();
skb_gro_flush_final(skb, pp, flush);
@@ -502,7 +511,9 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
uh->len = newlen;
 
rcu_read_lock();
-   sk = (*lookup)(skb, uh->source, uh->dest);
+   sk = skb->sk;
+   if (!sk)
+   sk = (*lookup)(skb, uh->source, uh->dest);
if (sk && udp_sk(sk)->gro_enabled) {
err = udp_gro_complete_segment(skb);
} else if (sk && udp_sk(sk)->gro_complete) {
@@ -516,6 +527,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
err = udp_sk(sk)->gro_complete(sk, skb,
nhoff + sizeof(struct udphdr));
}
+
+   if (skb->sk) {
+   sock_put(skb->sk);
+   skb->sk = NULL;
+   }
rcu_read_unlock();
 
if (skb->remcsum_offload)
-- 
2.16.2



Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-08 Thread Saeed Mahameed
On Thu, 2018-11-08 at 17:42 -0700, David Ahern wrote:
> On 11/8/18 5:40 PM, Paweł Staszewski wrote:
> > 
> > W dniu 08.11.2018 o 17:32, David Ahern pisze:
> > > On 11/8/18 9:27 AM, Paweł Staszewski wrote:
> > > > > > What hardware is this?
> > > > > > 
> > > > mellanox connectx 4
> > > > ethtool -i enp175s0f0
> > > > driver: mlx5_core
> > > > version: 5.0-0
> > > > firmware-version: 12.21.1000 (SM_200101033)
> > > > expansion-rom-version:
> > > > bus-info: :af:00.0
> > > > supports-statistics: yes
> > > > supports-test: yes
> > > > supports-eeprom-access: no
> > > > supports-register-dump: no
> > > > supports-priv-flags: yes
> > > > 
> > > > ethtool -i enp175s0f1
> > > > driver: mlx5_core
> > > > version: 5.0-0
> > > > firmware-version: 12.21.1000 (SM_200101033)
> > > > expansion-rom-version:
> > > > bus-info: :af:00.1
> > > > supports-statistics: yes
> > > > supports-test: yes
> > > > supports-eeprom-access: no
> > > > supports-register-dump: no
> > > > supports-priv-flags: yes
> > > > 
> > > > > > Start with:
> > > > > > 
> > > > > > echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
> > > > > > cat /sys/kernel/debug/tracing/trace_pipe
> > > > >   cat /sys/kernel/debug/tracing/trace_pipe
> > > > >   -0 [045] ..s. 68469.467752:
> > > > > xdp_devmap_xmit:
> > > > > ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0
> > > > > drops=1
> > > > > from_ifindex=4 to_ifindex=5 err=-6
> > > FIB lookup is good, the redirect is happening, but the mlx5
> > > driver does
> > > not like it.
> > > 
> > > I think the -6 is coming from the mlx5 driver and the packet is
> > > getting
> > > dropped. Perhaps this check in mlx5e_xdp_xmit:
> > > 
> > > if (unlikely(sq_num >= priv->channels.num))
> > >  return -ENXIO;
> > I removed that part and recompiled - but after running xdp_fwd now I
> > have a kernel panic :)
> 

hh, no please don't do such thing :)

It must be because the tx netdev has fewer tx queues than the rx netdev,
or the rx netdev rings are bound to high cpu indexes.

anyway, best practice is to open #cores RX/TX channels on the netdevs on both sides:

ethtool -L enp175s0f0  combined $(nproc) 
ethtool -L enp175s0f1  combined $(nproc)

> Jesper or one of the Mellanox folks needs to respond about the config
> needed to run XDP with this NIC. I don't have a 40G or 100G card to
> play
> with.




Re: [PATCH net-next 0/7] nfp: abm: move code and improve parameter validation

2018-11-08 Thread David Miller
From: Jakub Kicinski 
Date: Thu,  8 Nov 2018 19:50:32 -0800

> This set starts by separating Qdisc handling code into a new file.
> Next two patches allow early access to TLV-based capabilities during
> probe, previously the capabilities were parsed just before netdevs
> were registered, but it's cleaner to do some basic validation earlier
> and avoid cleanup work.
> 
> Next three patches improve RED's parameter validation.  First we provide
> a more precise message about why offload failed (and move the parameter
> validation to a helper).  Next we make sure we don't set the top bit
> in the 32 bit max RED threshold value.  Because the FW treats the value
> as signed, it reportedly causes slowdowns (unnecessary queuing and
> marking) when the top bit is set with recent firmwares.  Last (and perhaps
> least importantly) we offload the harddrop parameter of the Qdisc.
> We don't plan to offload harddrop RED, but it seems prudent to make
> sure the user didn't set that flag, as device behaviour would have differed.

Series applied, thanks Jakub.


Re: [PATCH net-next] tcp_bbr: update comments to reflect pacing_margin_percent

2018-11-08 Thread David Miller
From: Neal Cardwell 
Date: Thu,  8 Nov 2018 21:54:00 -0500

> Recently, in commit ab408b6dc744 ("tcp: switch tcp and sch_fq to new
> earliest departure time model"), the TCP BBR code switched to a new
> approach of using an explicit bbr_pacing_margin_percent for shaving a
> pacing rate "haircut", rather than the previous implicit
> approach. Update an old comment to reflect the new approach.
> 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Soheil Hassas Yeganeh 
> Signed-off-by: Eric Dumazet 

Applied.


Re: [Patch net-next v2] net: move __skb_checksum_complete*() to skbuff.c

2018-11-08 Thread David Miller
From: Cong Wang 
Date: Thu,  8 Nov 2018 14:05:42 -0800

> __skb_checksum_complete_head() and __skb_checksum_complete()
> are both declared in skbuff.h; they fit better in skbuff.c
> than in datagram.c.
> 
> Cc: Stefano Brivio 
> Signed-off-by: Cong Wang 

Applied.


Re: [PATCH 00/20] octeontx2-af: NPC MCAM support and FLR handling

2018-11-08 Thread Sunil Kovvuri
On Fri, Nov 9, 2018 at 2:32 AM Arnd Bergmann  wrote:
>
> On Thu, Nov 8, 2018 at 7:36 PM  wrote:
> >
> > From: Sunil Goutham 
>
> Hmm, I noticed that you use a different address as the patch author
> and the submitter. I'm guessing that "Sunil Goutham" and
> "Sunil Kovvuri" actually refer to the same person, and you just
> need to pick which of the two email addresses you want to use
> for public communication, but that's not obvious here.
>
> However, if there are actually two different Sunils here, then
> you need to add that second Signed-off-by.
>

No, it's just me.
Sometimes code indentation becomes messy and difficult to read if I use
the corporate mail server to submit patches, so I have been using Gmail.

> I've taken a look at all the patches now, and found very little
> sticking out that warranted a comment from my side, and
> no real show-stoppers. That said, I found this series overall
> much harder to understand than the previous ones, and don't
> even know what to ask about it. My feeling is that it's probably
> all fine, but that is  purely based on a review  of the individual
> pieces, not the overall design and how they fit together. With the
> earlier patches that I managed to get a better understanding
> of, that seemed reasonable as well.
>
>   Arnd


Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-08 Thread Alexei Starovoitov
On Thu, Nov 08, 2018 at 10:56:55PM +, Edward Cree wrote:
> On 08/11/18 19:42, Alexei Starovoitov wrote:
> > same link let's continue at 1pm PST. 
> So, one thing we didn't really get onto was maps, and you mentioned that it
>  wasn't really clear what I was proposing there.

let's discuss ground rules first.
1. the patches should be upstreamable in both kernel _and_ llvm

For example you propose 'prog_symbol' down below.
I'm sure you can do it in your assembler. We can also do it in pahole
(because it operates on dwarf inside elf), but there is no way for us
to do it in llvm, because symbol table generation comes last: the dwarf and
btf sections are completed before that, and llvm's 'fixup' logic is also done
before the symbol table.  So adding something like 'prog_symbol' into .BTF.ext
would mean we would need an extra pass after the last one, messing with stuff
that is already finalized.
That will get nacked.
Another example: in our first llvm->btf patchset we were using llvm's dwarf to
generate btf. That got nacked and we had to do a 2k+ line complete rewrite
to generate btf from llvm ir. That forced us to generate btf slightly
differently than from dwarf. The difference is not fundamental (like bitfields).
But it drives the point that elf format is _secondary_ to #1 rule above.

Another example: you're proposing to teach bpf backend to recognize
bpf_map* name. That is not something we can upstream in llvm either.

Similarly on the kernel api side we decided to craft an api in a way
that what is passed to the kernel is returned back in the same way.
Sort of like if kernel understands 'struct bpf_func_info' from patch 5
it should speak back in the same language.
I think it's important to agree on that principle to continue discussion.

Another key point is obvious, but worth repeating.
kernel abi is cast in stone. elf format is not.
which means that what was defined in include/uapi/linux/btf.h is fixed.
That is the format of .BTF section that llvm/pahole emits.
Whereas .BTF.ext section is _not_ defined in kernel uapi.
It defines the elf format and we can change it the future.
The format of .BTF.ext is the protocol between libbpf and llvm.
We define .BTF.ext in libbpf with libbpf license and coding style
and independently define it in llvm with its license and its coding
style.
We need to discuss .BTF.ext and make sure it's extensible, so we can
upgrade libbpf and llvm independently of each other,
but it doesn't have the same requirements as kernel abi.
Like in the kernel abi the extensibility of any structure means
that unknown fields should be zero.
Ex: sys_bpf() accepts 'bpf_attr' and a size.
If the user's size is larger than the kernel size, all unknown fields
must be zero.
In case of libbpf <-> llvm that is not the case.
libbpf _has_ to deal with non-zero unknown fields in some way.
Like it can print warning, but it has to accept the extended format.
Otherwise upgrading llvm will not be possible without upgrading libbpf.
Similarly when both llvm is new and libbpf is new, but kernel is old,
libbpf has to pass to the kernel only the fields it knows about.
(though it may understand all of them).
So libbpf has to deal with all combinations: old/new kernel,
old/new llvm (and corresponding .BTF and .BTF.ext sections).
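
Conceptually, the kernel-side rule looks something like this
(illustrative sketch only, not the actual helper):

    /* Illustrative sketch: if userspace passes a larger struct than the
     * kernel knows about, every byte beyond the known size must be zero,
     * otherwise the syscall is rejected. */
    static int check_tail_is_zero(const char __user *uaddr,
                                  size_t known_size, size_t user_size)
    {
            unsigned char val;
            size_t i;

            for (i = known_size; i < user_size; i++) {
                    if (get_user(val, uaddr + i))
                            return -EFAULT;
                    if (val)
                            return -E2BIG;
            }
            return 0;
    }

libbpf, in contrast, has to accept non-zero unknown fields as described
above.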

Now my understanding is that you're mainly concerned with the elf file format, correct?
I'm making this guess, because you're arguing strongly about KIND_MAP
and blasting our __btf_map* hack.
I'm all for improving this part, but it cannot go into .BTF,
because we already have an api to pass btf_key_type_id, btf_value_type_id
in MAP_CREATE syscall.
Adding new BTF_KIND_MAP into .BTF would be pointless, since we cannot
change MAP_CREATE.
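
For reference, the BTF ids are already carried by the existing map
creation attributes; a sketch of the relevant bpf_attr fields:

    union bpf_attr attr = {
            .map_type          = BPF_MAP_TYPE_HASH,
            .key_size          = sizeof(__u32),
            .value_size        = sizeof(__u64),
            .max_entries       = 1024,
            .btf_fd            = btf_fd,          /* fd of the loaded .BTF */
            .btf_key_type_id   = key_type_id,
            .btf_value_type_id = value_type_id,
    };
    /* ... passed to bpf(BPF_MAP_CREATE, &attr, sizeof(attr)) */

so a new BTF kind would only duplicate information the kernel already
receives.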

As discussed on the call, currently we're designing KIND_VARIABLE
that will describe global/static variables in .BTF and corresponding
prog_load api changes for the kernel. We'll be talking about it
at bpf uconf. Imo current 'struct bpf_map_def map_foo;' hack
that libbpf/iproute2 and other loaders are dealing with
will be cleaned up by this KIND_VARIABLE BTF support.
Because of rule #1 we cannot pattern match 'bpf_map_def' name
in llvm. All global variables have to be dealt with in a common way
by llvm. We can add a new __builtin_ specifically for map creation
and ask folks to start using new interface in .c programs for
map creation. All that is exciting, but I'd like to table that
discussion and focus on this patch set which is about adding
bpf_func_infos, BTF vs BTF.ext split, instances vs types.

> What I have in mind comes in two parts:
> 1) map type.  A new BTF_KIND_MAP with metadata 'key_type', 'value_type'
>  (both are type_ids referencing other BTF type records), describing the
>  type "map from key_type to value_type".
> 2) record in the 'instances' table.  This would have a name_off (the
>  name of the map), a type_id (pointing at a BTF_KIND_MAP in the 'types'
>  table), and potentially also some indication of what symbol (from
>  section 'maps') refers to this map.  This is pretty 

Re: [PATCH 11/20] octeontx2-af: Add support for stripping STAG/CTAG

2018-11-08 Thread Sunil Kovvuri
On Fri, Nov 9, 2018 at 2:17 AM Arnd Bergmann  wrote:
>
> On Thu, Nov 8, 2018 at 7:37 PM  wrote:
>
> > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h 
> > b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
> > index f98b011..3f7e5e6 100644
> > --- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
> > +++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
> > @@ -259,4 +259,34 @@ struct nix_rx_action {
> >  #endif
> >  };
> >
> > +struct nix_rx_vtag_action {
> > +#if defined(__BIG_ENDIAN_BITFIELD)
> > +   u64 rsvd_63_48  :16;
> > +   u64 vtag1_valid :1;
> > +   u64 vtag1_type  :3;
> > +   u64 rsvd_43 :1;
> > +   u64 vtag1_lid   :3;
> > +   u64 vtag1_relptr:8;
> > +   u64 rsvd_31_16  :16;
> > +   u64 vtag0_valid :1;
> > +   u64 vtag0_type  :3;
> > +   u64 rsvd_11 :1;
> > +   u64 vtag0_lid   :3;
> > +   u64 vtag0_relptr:8;
> > +#else
> > +   u64 vtag0_relptr:8;
> > +   u64 vtag0_lid   :3;
> > +   u64 rsvd_11 :1;
> > +   u64 vtag0_type  :3;
> > +   u64 vtag0_valid :1;
> > +   u64 rsvd_31_16  :16;
> > +   u64 vtag1_relptr:8;
> > +   u64 vtag1_lid   :3;
> > +   u64 rsvd_43 :1;
> > +   u64 vtag1_type  :3;
> > +   u64 vtag1_valid :1;
> > +   u64 rsvd_63_48  :16;
> > +#endif
> > +};
>
> Here is another instance of bitfields in an interface structure. As
> before, please try to avoid doing that and use bit shifts and masks
> instead.
>
>Arnd

No, this struct is not part of a communication interface.
It is used to fill in a register in a more readable fashion
than plain bit shifts.

===
struct nix_rx_vtag_action vtag_action;

*(u64 *)&vtag_action = 0;
vtag_action.vtag0_valid = 1;
/* must match type set in NIX_VTAG_CFG */
vtag_action.vtag0_type = 0;
vtag_action.vtag0_lid = NPC_LID_LA;
vtag_action.vtag0_relptr = 12;
entry.vtag_action = *(u64 *)&vtag_action;

/* Set TAG 'action' */
rvu_write64(rvu, blkaddr, NPC_AF_MCAMEX_BANKX_TAG_ACT(index, actbank),
entry->vtag_action);
===
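
FWIW the same register value could also be built with shifts/masks,
roughly like this (a sketch; the mask macro names are invented here to
match the bit layout above, they are not from the driver):

    #define RX_VTAG0_RELPTR_M  GENMASK_ULL(7, 0)
    #define RX_VTAG0_LID_M     GENMASK_ULL(10, 8)
    #define RX_VTAG0_TYPE_M    GENMASK_ULL(14, 12)
    #define RX_VTAG0_VALID_M   BIT_ULL(15)

    u64 vtag_act = FIELD_PREP(RX_VTAG0_VALID_M, 1) |
                   FIELD_PREP(RX_VTAG0_TYPE_M, 0) |
                   FIELD_PREP(RX_VTAG0_LID_M, NPC_LID_LA) |
                   FIELD_PREP(RX_VTAG0_RELPTR_M, 12);

    rvu_write64(rvu, blkaddr, NPC_AF_MCAMEX_BANKX_TAG_ACT(index, actbank),
                vtag_act);

but the bitfield version does read more naturally.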

Thanks,
Sunil.


Re: [PATCH 08/20] octeontx2-af: Alloc and config NPC MCAM entry at a time

2018-11-08 Thread Sunil Kovvuri
On Fri, Nov 9, 2018 at 2:13 AM Arnd Bergmann  wrote:
>
> On Thu, Nov 8, 2018 at 7:37 PM  wrote:
> > @@ -666,4 +668,20 @@ struct npc_mcam_unmap_counter_req {
> > u8  all;   /* Unmap all entries using this counter ? */
> >  };
> >
> > +struct npc_mcam_alloc_and_write_entry_req {
> > +   struct mbox_msghdr hdr;
> > +   struct mcam_entry entry_data;
> > +   u16 ref_entry;
> > +   u8  priority;/* Lower or higher w.r.t ref_entry */
> > +   u8  intf;/* Rx or Tx interface */
> > +   u8  enable_entry;/* Enable this MCAM entry ? */
> > +   u8  alloc_cntr;  /* Allocate counter and map ? */
> > +};
>
> I noticed that this structure requires padding at the end because
> struct mbox_msghdr has a 32-bit alignment requirement. For
> data structures in an interface, I'd recommend avoiding that kind
> of padding and adding reserved fields or widening the types
> accordingly.
>

When there are multiple messages in the mailbox, each message starts
at a 16-byte aligned offset, so struct mbox_msghdr is always aligned.
I think adding reserved fields is not needed here.

===
struct mbox_msghdr *otx2_mbox_alloc_msg_rsp(struct otx2_mbox *mbox, int devid,
int size, int size_rsp)
{
size = ALIGN(size, MBOX_MSG_ALIGN);
===

Is this what you were referring to ?

Sunil.

> I also noticed a similar problem in struct mbox_msghdr. Maybe
> use the 'pahole' tool to check for this kind of padding in the
> API structures.
>
>  Arnd


Re: [PATCH net-next] sfc: use the new __netdev_tx_sent_queue BQL optimisation

2018-11-08 Thread David Miller
From: Edward Cree 
Date: Thu, 8 Nov 2018 19:47:19 +

> As added in 3e59020abf0f ("net: bql: add __netdev_tx_sent_queue()"), which
>  see for performance rationale.
> 
> Signed-off-by: Edward Cree 

Applied.


[PATCH net-next 1/7] nfp: abm: split qdisc offload code into a separate file

2018-11-08 Thread Jakub Kicinski
The Qdisc offload code is logically separate, and we will soon
do significant surgery on it to support more Qdiscs, so move
it to a separate file.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
 drivers/net/ethernet/netronome/nfp/abm/main.c | 266 -
 drivers/net/ethernet/netronome/nfp/abm/main.h |   6 +
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 272 ++
 4 files changed, 279 insertions(+), 266 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/abm/qdisc.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 4afb10375397..190e8b56a41f 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -57,6 +57,7 @@ endif
 ifeq ($(CONFIG_NFP_APP_ABM_NIC),y)
 nfp-objs += \
abm/ctrl.o \
+   abm/qdisc.o \
abm/main.o
 endif
 
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index c0830c0c2c3f..3d15de0ae271 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -7,9 +7,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 
 #include "../nfpcore/nfp.h"
 #include "../nfpcore/nfp_cpp.h"
@@ -27,269 +24,6 @@ static u32 nfp_abm_portid(enum nfp_repr_type rtype, 
unsigned int id)
   FIELD_PREP(NFP_ABM_PORTID_ID, id);
 }
 
-static int
-__nfp_abm_reset_root(struct net_device *netdev, struct nfp_abm_link *alink,
-u32 handle, unsigned int qs, u32 init_val)
-{
-   struct nfp_port *port = nfp_port_from_netdev(netdev);
-   int ret;
-
-   ret = nfp_abm_ctrl_set_all_q_lvls(alink, init_val);
-   memset(alink->qdiscs, 0, sizeof(*alink->qdiscs) * alink->num_qdiscs);
-
-   alink->parent = handle;
-   alink->num_qdiscs = qs;
-   port->tc_offload_cnt = qs;
-
-   return ret;
-}
-
-static void
-nfp_abm_reset_root(struct net_device *netdev, struct nfp_abm_link *alink,
-  u32 handle, unsigned int qs)
-{
-   __nfp_abm_reset_root(netdev, alink, handle, qs, ~0);
-}
-
-static int
-nfp_abm_red_find(struct nfp_abm_link *alink, struct tc_red_qopt_offload *opt)
-{
-   unsigned int i = TC_H_MIN(opt->parent) - 1;
-
-   if (opt->parent == TC_H_ROOT)
-   i = 0;
-   else if (TC_H_MAJ(alink->parent) == TC_H_MAJ(opt->parent))
-   i = TC_H_MIN(opt->parent) - 1;
-   else
-   return -EOPNOTSUPP;
-
-   if (i >= alink->num_qdiscs || opt->handle != alink->qdiscs[i].handle)
-   return -EOPNOTSUPP;
-
-   return i;
-}
-
-static void
-nfp_abm_red_destroy(struct net_device *netdev, struct nfp_abm_link *alink,
-   u32 handle)
-{
-   unsigned int i;
-
-   for (i = 0; i < alink->num_qdiscs; i++)
-   if (handle == alink->qdiscs[i].handle)
-   break;
-   if (i == alink->num_qdiscs)
-   return;
-
-   if (alink->parent == TC_H_ROOT) {
-   nfp_abm_reset_root(netdev, alink, TC_H_ROOT, 0);
-   } else {
-   nfp_abm_ctrl_set_q_lvl(alink, i, ~0);
-   memset(&alink->qdiscs[i], 0, sizeof(*alink->qdiscs));
-   }
-}
-
-static int
-nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink,
-   struct tc_red_qopt_offload *opt)
-{
-   bool existing;
-   int i, err;
-
-   i = nfp_abm_red_find(alink, opt);
-   existing = i >= 0;
-
-   if (opt->set.min != opt->set.max || !opt->set.is_ecn) {
-   nfp_warn(alink->abm->app->cpp,
-"RED offload failed - unsupported parameters\n");
-   err = -EINVAL;
-   goto err_destroy;
-   }
-
-   if (existing) {
-   if (alink->parent == TC_H_ROOT)
-   err = nfp_abm_ctrl_set_all_q_lvls(alink, opt->set.min);
-   else
-   err = nfp_abm_ctrl_set_q_lvl(alink, i, opt->set.min);
-   if (err)
-   goto err_destroy;
-   return 0;
-   }
-
-   if (opt->parent == TC_H_ROOT) {
-   i = 0;
-   err = __nfp_abm_reset_root(netdev, alink, TC_H_ROOT, 1,
-  opt->set.min);
-   } else if (TC_H_MAJ(alink->parent) == TC_H_MAJ(opt->parent)) {
-   i = TC_H_MIN(opt->parent) - 1;
-   err = nfp_abm_ctrl_set_q_lvl(alink, i, opt->set.min);
-   } else {
-   return -EINVAL;
-   }
-   /* Set the handle to try full clean up, in case IO failed */
-   alink->qdiscs[i].handle = opt->handle;
-   if (err)
-   goto err_destroy;
-
-   if (opt->parent == TC_H_ROOT)
-   err = nfp_abm_ctrl_read_stats(alink, &alink->qdiscs[i].stats);
-   else
-  

[PATCH net-next 5/7] nfp: abm: don't set negative threshold

2018-11-08 Thread Jakub Kicinski
Turns out the threshold value is used in signed compares in the FW,
so we should avoid setting the top bit.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/abm/main.h  |  3 +++
 drivers/net/ethernet/netronome/nfp/abm/qdisc.c | 10 --
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h 
b/drivers/net/ethernet/netronome/nfp/abm/main.h
index c617d213e406..3774c063e419 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.h
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -4,9 +4,12 @@
 #ifndef __NFP_ABM_H__
 #define __NFP_ABM_H__ 1
 
+#include 
 #include 
 #include 
 
+#define NFP_ABM_LVL_INFINITY   S32_MAX
+
 struct nfp_app;
 struct nfp_net;
 
diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index 04b91cc12434..979afb3ea855 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -31,7 +31,7 @@ static void
 nfp_abm_reset_root(struct net_device *netdev, struct nfp_abm_link *alink,
   u32 handle, unsigned int qs)
 {
-   __nfp_abm_reset_root(netdev, alink, handle, qs, ~0);
+   __nfp_abm_reset_root(netdev, alink, handle, qs, NFP_ABM_LVL_INFINITY);
 }
 
 static int
@@ -67,7 +67,7 @@ nfp_abm_red_destroy(struct net_device *netdev, struct 
nfp_abm_link *alink,
if (alink->parent == TC_H_ROOT) {
nfp_abm_reset_root(netdev, alink, TC_H_ROOT, 0);
} else {
-   nfp_abm_ctrl_set_q_lvl(alink, i, ~0);
+   nfp_abm_ctrl_set_q_lvl(alink, i, NFP_ABM_LVL_INFINITY);
memset(&alink->qdiscs[i], 0, sizeof(*alink->qdiscs));
}
 }
@@ -88,6 +88,12 @@ nfp_abm_red_check_params(struct nfp_abm_link *alink,
 opt->parent, opt->handle);
return false;
}
+   if (opt->set.min > NFP_ABM_LVL_INFINITY) {
+   nfp_warn(cpp, "RED offload failed - threshold too large %d > %d 
(p:%08x h:%08x)\n",
+opt->set.min, NFP_ABM_LVL_INFINITY, opt->parent,
+opt->handle);
+   return false;
+   }
 
return true;
 }
-- 
2.17.1



[PATCH net-next 7/7] nfp: abm: refuse RED offload with harddrop set

2018-11-08 Thread Jakub Kicinski
RED Qdisc will now inform the drivers about the state of the harddrop
flag.  Refuse to offload in case harddrop is set.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/abm/qdisc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index 979afb3ea855..bb05f9ee0401 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -83,6 +83,11 @@ nfp_abm_red_check_params(struct nfp_abm_link *alink,
 opt->parent, opt->handle);
return false;
}
+   if (opt->set.is_harddrop) {
+   nfp_warn(cpp, "RED offload failed - harddrop is not supported 
(p:%08x h:%08x)\n",
+opt->parent, opt->handle);
+   return false;
+   }
if (opt->set.min != opt->set.max) {
nfp_warn(cpp, "RED offload failed - unsupported min/max 
parameters (p:%08x h:%08x)\n",
 opt->parent, opt->handle);
-- 
2.17.1



[PATCH net-next 3/7] nfp: parse vNIC TLV capabilities at alloc time

2018-11-08 Thread Jakub Kicinski
In certain cases initialization logic which follows allocation of
the vNIC structure may want to validate the capabilities of that vNIC.
Before the vNIC is initialized this is easy for normal capabilities,
which are at fixed offsets in control memory and easy to locate and read,
but it poses a challenge if the capabilities are in the form of TLVs.  Parse
the TLVs early on so other code can just access parsed info, instead
of having to do the parsing by itself.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
Reviewed-by: Quentin Monnet 
---
 .../ethernet/netronome/nfp/nfp_net_common.c| 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 54ce8353715f..e00d5a2a41ee 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3576,6 +3576,7 @@ nfp_net_alloc(struct pci_dev *pdev, void __iomem 
*ctrl_bar, bool needs_netdev,
  unsigned int max_tx_rings, unsigned int max_rx_rings)
 {
struct nfp_net *nn;
+   int err;
 
if (needs_netdev) {
struct net_device *netdev;
@@ -3618,7 +3619,19 @@ nfp_net_alloc(struct pci_dev *pdev, void __iomem 
*ctrl_bar, bool needs_netdev,
 
timer_setup(&nn->reconfig_timer, nfp_net_reconfig_timer, 0);
 
+   err = nfp_net_tlv_caps_parse(&nn->pdev->dev, nn->dp.ctrl_bar,
+&nn->tlv_caps);
+   if (err)
+   goto err_free_nn;
+
return nn;
+
+err_free_nn:
+   if (nn->dp.netdev)
+   free_netdev(nn->dp.netdev);
+   else
+   vfree(nn);
+   return ERR_PTR(err);
 }
 
 /**
@@ -3891,11 +3904,6 @@ int nfp_net_init(struct nfp_net *nn)
nn->dp.ctrl |= NFP_NET_CFG_CTRL_IRQMOD;
}
 
-   err = nfp_net_tlv_caps_parse(&nn->pdev->dev, nn->dp.ctrl_bar,
-&nn->tlv_caps);
-   if (err)
-   return err;
-
if (nn->dp.netdev)
nfp_net_netdev_init(nn);
 
-- 
2.17.1



[PATCH net-next 0/7] nfp: abm: move code and improve parameter validation

2018-11-08 Thread Jakub Kicinski
Hi!

This set starts by separating Qdisc handling code into a new file.
Next two patches allow early access to TLV-based capabilities during
probe, previously the capabilities were parsed just before netdevs
were registered, but it's cleaner to do some basic validation earlier
and avoid cleanup work.

Next three patches improve RED's parameter validation.  First we provide
a more precise message about why offload failed (and move the parameter
validation to a helper).  Next we make sure we don't set the top bit
in the 32 bit max RED threshold value.  Because the FW treats the value
as signed, it reportedly causes slowdowns (unnecessary queuing and
marking) when the top bit is set with recent firmwares.  Last (and perhaps
least importantly) we offload the harddrop parameter of the Qdisc.
We don't plan to offload harddrop RED, but it seems prudent to make
sure the user didn't set that flag, as device behaviour would have differed.

Jakub Kicinski (7):
  nfp: abm: split qdisc offload code into a separate file
  nfp: pass ctrl_bar pointer to nfp_net_alloc
  nfp: parse vNIC TLV capabilities at alloc time
  nfp: abm: provide more precise info about offload parameter validation
  nfp: abm: don't set negative threshold
  net: sched: red: inform offloads about harddrop setting
  nfp: abm: refuse RED offload with harddrop set

 drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
 drivers/net/ethernet/netronome/nfp/abm/main.c | 266 
 drivers/net/ethernet/netronome/nfp/abm/main.h |   9 +
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 301 ++
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |   2 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |  26 +-
 .../net/ethernet/netronome/nfp/nfp_net_main.c |   4 +-
 .../ethernet/netronome/nfp/nfp_netvf_main.c   |   3 +-
 include/net/pkt_cls.h |   1 +
 net/sched/sch_red.c   |   1 +
 10 files changed, 335 insertions(+), 279 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/abm/qdisc.c

-- 
2.17.1



[PATCH net-next 6/7] net: sched: red: inform offloads about harddrop setting

2018-11-08 Thread Jakub Kicinski
To mirror software behaviour on offload more precisely inform
the drivers about the state of the harddrop flag.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
Reviewed-by: Quentin Monnet 
---
 include/net/pkt_cls.h | 1 +
 net/sched/sch_red.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 72ffb3120ced..00f71644fbcd 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -807,6 +807,7 @@ struct tc_red_qopt_offload_params {
u32 max;
u32 probability;
bool is_ecn;
+   bool is_harddrop;
struct gnet_stats_queue *qstats;
 };
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 7682f7a618a1..a1d08bdd9357 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -167,6 +167,7 @@ static int red_offload(struct Qdisc *sch, bool enable)
opt.set.max = q->parms.qth_max >> q->parms.Wlog;
opt.set.probability = q->parms.max_P;
opt.set.is_ecn = red_use_ecn(q);
+   opt.set.is_harddrop = red_use_harddrop(q);
opt.set.qstats = &q->qstats;
} else {
opt.command = TC_RED_DESTROY;
-- 
2.17.1



[PATCH net-next 4/7] nfp: abm: provide more precise info about offload parameter validation

2018-11-08 Thread Jakub Kicinski
Improve log messages printed when RED can't be offloaded because
of Qdisc parameters.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
Reviewed-by: Quentin Monnet 
---
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 24 ---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index f36da95827ee..04b91cc12434 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -72,6 +72,26 @@ nfp_abm_red_destroy(struct net_device *netdev, struct 
nfp_abm_link *alink,
}
 }
 
+static bool
+nfp_abm_red_check_params(struct nfp_abm_link *alink,
+struct tc_red_qopt_offload *opt)
+{
+   struct nfp_cpp *cpp = alink->abm->app->cpp;
+
+   if (!opt->set.is_ecn) {
+   nfp_warn(cpp, "RED offload failed - drop is not supported (ECN 
option required) (p:%08x h:%08x)\n",
+opt->parent, opt->handle);
+   return false;
+   }
+   if (opt->set.min != opt->set.max) {
+   nfp_warn(cpp, "RED offload failed - unsupported min/max 
parameters (p:%08x h:%08x)\n",
+opt->parent, opt->handle);
+   return false;
+   }
+
+   return true;
+}
+
 static int
 nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink,
struct tc_red_qopt_offload *opt)
@@ -82,9 +102,7 @@ nfp_abm_red_replace(struct net_device *netdev, struct 
nfp_abm_link *alink,
i = nfp_abm_red_find(alink, opt);
existing = i >= 0;
 
-   if (opt->set.min != opt->set.max || !opt->set.is_ecn) {
-   nfp_warn(alink->abm->app->cpp,
-"RED offload failed - unsupported parameters\n");
+   if (!nfp_abm_red_check_params(alink, opt)) {
err = -EINVAL;
goto err_destroy;
}
-- 
2.17.1



[PATCH net-next 2/7] nfp: pass ctrl_bar pointer to nfp_net_alloc

2018-11-08 Thread Jakub Kicinski
Move setting ctrl_bar pointer to the nfp_net_alloc function,
to make sure we can parse capabilities early in the following
patch.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h| 2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 8 +---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c   | 4 ++--
 drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c | 3 +--
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 6f0c37d09256..dda02fefc806 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -851,7 +851,7 @@ void nfp_net_get_fw_version(struct nfp_net_fw_version 
*fw_ver,
void __iomem *ctrl_bar);
 
 struct nfp_net *
-nfp_net_alloc(struct pci_dev *pdev, bool needs_netdev,
+nfp_net_alloc(struct pci_dev *pdev, void __iomem *ctrl_bar, bool needs_netdev,
  unsigned int max_tx_rings, unsigned int max_rx_rings);
 void nfp_net_free(struct nfp_net *nn);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 6bddfcfdec34..54ce8353715f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3560,6 +3560,7 @@ void nfp_net_info(struct nfp_net *nn)
 /**
  * nfp_net_alloc() - Allocate netdev and related structure
  * @pdev: PCI device
+ * @ctrl_bar: PCI IOMEM with vNIC config memory
  * @needs_netdev: Whether to allocate a netdev for this vNIC
  * @max_tx_rings: Maximum number of TX rings supported by device
  * @max_rx_rings: Maximum number of RX rings supported by device
@@ -3570,9 +3571,9 @@ void nfp_net_info(struct nfp_net *nn)
  *
  * Return: NFP Net device structure, or ERR_PTR on error.
  */
-struct nfp_net *nfp_net_alloc(struct pci_dev *pdev, bool needs_netdev,
- unsigned int max_tx_rings,
- unsigned int max_rx_rings)
+struct nfp_net *
+nfp_net_alloc(struct pci_dev *pdev, void __iomem *ctrl_bar, bool needs_netdev,
+ unsigned int max_tx_rings, unsigned int max_rx_rings)
 {
struct nfp_net *nn;
 
@@ -3594,6 +3595,7 @@ struct nfp_net *nfp_net_alloc(struct pci_dev *pdev, bool 
needs_netdev,
}
 
nn->dp.dev = &pdev->dev;
+   nn->dp.ctrl_bar = ctrl_bar;
nn->pdev = pdev;
 
nn->max_tx_rings = max_tx_rings;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 1e7d20468a34..08f5fdbd8e41 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -116,13 +116,13 @@ nfp_net_pf_alloc_vnic(struct nfp_pf *pf, bool 
needs_netdev,
n_rx_rings = readl(ctrl_bar + NFP_NET_CFG_MAX_RXRINGS);
 
/* Allocate and initialise the vNIC */
-   nn = nfp_net_alloc(pf->pdev, needs_netdev, n_tx_rings, n_rx_rings);
+   nn = nfp_net_alloc(pf->pdev, ctrl_bar, needs_netdev,
+  n_tx_rings, n_rx_rings);
if (IS_ERR(nn))
return nn;
 
nn->app = pf->app;
nfp_net_get_fw_version(&nn->fw_ver, ctrl_bar);
-   nn->dp.ctrl_bar = ctrl_bar;
nn->tx_bar = qc_bar + tx_base * NFP_QCP_QUEUE_ADDR_SZ;
nn->rx_bar = qc_bar + rx_base * NFP_QCP_QUEUE_ADDR_SZ;
nn->dp.is_vf = 0;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
index d2c1e9ea5668..1145849ca7ba 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
@@ -172,7 +172,7 @@ static int nfp_netvf_pci_probe(struct pci_dev *pdev,
rx_bar_off = NFP_PCIE_QUEUE(startq);
 
/* Allocate and initialise the netdev */
-   nn = nfp_net_alloc(pdev, true, max_tx_rings, max_rx_rings);
+   nn = nfp_net_alloc(pdev, ctrl_bar, true, max_tx_rings, max_rx_rings);
if (IS_ERR(nn)) {
err = PTR_ERR(nn);
goto err_ctrl_unmap;
@@ -180,7 +180,6 @@ static int nfp_netvf_pci_probe(struct pci_dev *pdev,
vf->nn = nn;
 
nn->fw_ver = fw_ver;
-   nn->dp.ctrl_bar = ctrl_bar;
nn->dp.is_vf = 1;
nn->stride_tx = stride;
nn->stride_rx = stride;
-- 
2.17.1



Re: [PATCH v4 bpf-next 7/7] bpftool: support loading flow dissector

2018-11-08 Thread Jakub Kicinski
On Thu,  8 Nov 2018 16:22:13 -0800, Stanislav Fomichev wrote:
> From: Stanislav Fomichev 
> 
> This commit adds support for loading/attaching/detaching flow
> dissector program. The structure of the flow dissector program is
> assumed to be the same as in the selftests:

nit: I don't think we make any assumptions any more?  Since the
 sub-programs are added to the map explicitly by the user?

> * flow_dissector section with the main entry point
> * a bunch of tail call progs
> * a jmp_table map that is populated with the tail call progs
> 

[...]

> @@ -338,7 +339,16 @@ _bpftool()
>  
>  case $prev in
>  type)
> -COMPREPLY=( $( compgen -W "socket kprobe 
> kretprobe classifier action tracepoint raw_tracepoint xdp perf_event 
> cgroup/skb cgroup/sock cgroup/dev lwt_in lwt_out lwt_xmit lwt_seg6local 
> sockops sk_skb sk_msg lirc_mode2 cgroup/bind4 cgroup/bind6 cgroup/connect4 
> cgroup/connect6 cgroup/sendmsg4 cgroup/sendmsg6 cgroup/post_bind4 
> cgroup/post_bind6" -- \
> +COMPREPLY=( $( compgen -W "socket kprobe \
> +kretprobe classifier flow_dissector \
> +action tracepoint raw_tracepoint \
> +xdp perf_event cgroup/skb cgroup/sock \
> +cgroup/dev lwt_in lwt_out lwt_xmit \
> +lwt_seg6local sockops sk_skb sk_msg \
> +lirc_mode2 cgroup/bind4 cgroup/bind6 \
> +cgroup/connect4 cgroup/connect6 \
> +cgroup/sendmsg4 cgroup/sendmsg6 \
> +cgroup/post_bind4 cgroup/post_bind6" -- \
> "$cur" ) )

Thanks! :)

>  return 0
>  ;;
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 4654d9450cd9..b808a67d1d3e 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -81,6 +81,7 @@ static const char * const attach_type_strings[] = {
>   [BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
>   [BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
>   [BPF_SK_MSG_VERDICT] = "msg_verdict",
> + [BPF_FLOW_DISSECTOR] = "flow_dissector",
>   [__MAX_BPF_ATTACH_TYPE] = NULL,
>  };
>  
> @@ -721,30 +722,53 @@ int map_replace_compar(const void *p1, const void *p2)
>   return a->idx - b->idx;
>  }
>  
> -static int do_attach(int argc, char **argv)
> +static int parse_atach_detach_args(int argc, char **argv, int *progfd,
> +enum bpf_attach_type *attach_type,
> +int *mapfd)
>  {
> - enum bpf_attach_type attach_type;
> - int err, mapfd, progfd;
> -
> - if (!REQ_ARGS(5)) {
> - p_err("too few parameters for map attach");
> + if (!REQ_ARGS(3)) {
> + p_err("too few parameters for attach/detach");

I know this is not a new bug, but we didn't catch it when
attach/detach was added :( - REQ_ARGS() already includes a p_err(),
so we would have a duplicate error in JSON.  Please drop the error here
and below.

With that fix:

Acked-by: Jakub Kicinski 

Thanks for the work!

>   return -EINVAL;
>   }
>  
> - progfd = prog_parse_fd(&argc, &argv);
> - if (progfd < 0)
> - return progfd;
> + *progfd = prog_parse_fd(&argc, &argv);
> + if (*progfd < 0)
> + return *progfd;
>  
> - attach_type = parse_attach_type(*argv);
> - if (attach_type == __MAX_BPF_ATTACH_TYPE) {
> - p_err("invalid attach type");
> + *attach_type = parse_attach_type(*argv);
> + if (*attach_type == __MAX_BPF_ATTACH_TYPE) {
> + p_err("invalid attach/detach type");
>   return -EINVAL;
>   }
> +
> + if (*attach_type == BPF_FLOW_DISSECTOR) {
> + *mapfd = -1;
> + return 0;
> + }
> +
>   NEXT_ARG();
> + if (!REQ_ARGS(2)) {
> + p_err("too few parameters for map attach/detach");
> + return -EINVAL;
> + }
>  
> - mapfd = map_parse_fd(&argc, &argv);
> - if (mapfd < 0)
> - return mapfd;
> + *mapfd = map_parse_fd(&argc, &argv);
> + if (*mapfd < 0)
> + return *mapfd;
> +
> + return 0;
> +}
> +



Re: [PATCH net-next 0/4] Remove VLAN_TAG_PRESENT from drivers

2018-11-08 Thread David Miller
From: Michał Mirosław 
Date: Thu, 08 Nov 2018 18:44:46 +0100

> This series removes VLAN_TAG_PRESENT use from network drivers in
> preparation to removing its special meaning.

Series applied, thank you.


Re: [net-next, PATCH 1/2] net: socionext: different approach on DMA

2018-11-08 Thread David Miller
From: Ilias Apalodimas 
Date: Thu,  8 Nov 2018 17:19:54 +0200

> The current driver dynamically allocates an skb and maps it as a DMA Rx
> buffer. In order to prepare for upcoming XDP changes, let's introduce a
> different allocation scheme.
> Buffers are allocated dynamically and mapped into hardware.
> During the Rx operation the driver uses build_skb() to produce the
> necessary buffers for the network stack.
> This change increases performance by ~15% on 64b packets with the SMMU
> disabled and by ~5% with the SMMU enabled.
> 
> Signed-off-by: Ilias Apalodimas 
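
A rough illustration of the scheme described above (names illustrative,
not the actual netsec code):

    /* RX refill: allocate a raw buffer and map it for the device. */
    buf = napi_alloc_frag(total_len);
    dma_addr = dma_map_single(dev, buf + pad, payload_len, DMA_FROM_DEVICE);

    /* RX completion: unmap and hand the filled buffer to the stack. */
    dma_unmap_single(dev, dma_addr, payload_len, DMA_FROM_DEVICE);
    skb = build_skb(buf, total_len);
    skb_reserve(skb, pad);
    skb_put(skb, pkt_len);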

Applied.


Re: [net-next, PATCH 2/2] net: socionext: refactor netsec_alloc_dring()

2018-11-08 Thread David Miller
From: Ilias Apalodimas 
Date: Thu,  8 Nov 2018 17:19:55 +0200

> Return -ENOMEM directly instead of assigning it to a variable.
> 
> Signed-off-by: Ilias Apalodimas 

Applied.


Re: [PATCH v4 bpf-next 6/7] bpftool: add pinmaps argument to the load/loadall

2018-11-08 Thread Jakub Kicinski
On Thu,  8 Nov 2018 16:22:12 -0800, Stanislav Fomichev wrote:
> From: Stanislav Fomichev 
> 
> This new additional argument lets users pin all maps from the object at
> the specified path.
> 
> Signed-off-by: Stanislav Fomichev 

Acked-by: Jakub Kicinski 


Re: [PATCH v4 bpf-next 5/7] bpftool: add loadall command

2018-11-08 Thread Jakub Kicinski
On Thu,  8 Nov 2018 16:22:11 -0800, Stanislav Fomichev wrote:
> @@ -79,8 +80,13 @@ DESCRIPTION
> contain a dot character ('.'), which is reserved for future
> extensions of *bpffs*.
>  
> - **bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** 
> *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> -   Load bpf program from binary *OBJ* and pin as *FILE*.
> + **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
> [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> +   Load bpf program(s) from binary *OBJ* and pin as *FILE*.
> +   Both **bpftool prog load** and **bpftool prog loadall** load
> +   all maps and programs from the *OBJ* and differ only in
> +   pinning. **load** pins only the first program from the *OBJ*
> +   as *FILE*. **loadall** pins all programs from the *OBJ*
> +   under *FILE* directory.
> **type** is optional, if not specified program type will be
> inferred from section names.
> By default bpftool will create new maps as declared in the ELF

As I said, the fact that we always load everything is a libbpf limitation;
I wouldn't put it in the documentation as it may change.

With that removed looks good to me:

Acked-by: Jakub Kicinski 


Re: [PATCH net-next] net: qca_spi: Add available buffer space verification

2018-11-08 Thread David Miller
From: Stefan Wahren 
Date: Thu,  8 Nov 2018 14:38:21 +0100

> Interference on the SPI line could distort the response for
> available buffer space, so at the very least we should check that the
> response doesn't exceed the maximum available buffer space.
> In the error case, increase a new error counter and retry later.
> This behavior avoids buffer errors in the QCA7000, which would
> result in an unnecessary chip reset including packet loss.
> 
> Signed-off-by: Stefan Wahren 
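
The idea, roughly (a sketch; the register and counter names are
assumptions, not the actual driver code):

    /* Read the "write buffer space available" value from the chip and
     * sanity-check it against the known maximum buffer size. */
    qcaspi_read_register(qca, SPI_REG_WRBUF_SPC_AVA, &available);
    if (available > QCASPI_HW_BUF_LEN) {
            /* Likely a corrupted SPI response: count it and retry on
             * the next pass instead of resetting the chip. */
            qca->stats.buf_avail_err++;
            return;
    }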

Applied.


Re: [PATCH v4 bpf-next 4/7] libbpf: add internal pin_name

2018-11-08 Thread Jakub Kicinski
On Thu,  8 Nov 2018 16:22:10 -0800, Stanislav Fomichev wrote:
> @@ -261,6 +266,18 @@ static void bpf_program__exit(struct bpf_program *prog)
>   prog->idx = -1;
>  }
>  
> +static char *__bpf_program__pin_name(struct bpf_program *prog)
> +{
> + char *name;
> +
> + name = strdup(prog->section_name);
> + for (char *p = name; p && *p; p++)

Useful patch!  I'm not sure about libbpf but in the kernel we don't do
C99 variable declarations inside for loop init.  Perhaps better to stick
to kernel rules than invent our own.

Also, I'm tempted to say:

char *name, *p;

name = p = strdup(prog->section_name);
while ((p = strchr(p, '/')))
*p = '_';

;)

> + if (*p == '/')
> + *p = '_';
> +
> + return name;
> +}


Re: [PATCH net 0/4] Slowpath Queue bug fixes

2018-11-08 Thread David Miller
From: Denis Bolotin 
Date: Thu, 8 Nov 2018 16:46:07 +0200

> This patch series fixes several bugs in the SPQ mechanism.
> It deals with SPQ entry management, prevents resource leaks and memory
> corruption, and handles error cases throughout the driver.
> Please consider applying to net.

Series applied, thanks.


Re: [PATCH v4 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-08 Thread Jakub Kicinski
On Thu,  8 Nov 2018 16:22:08 -0800, Stanislav Fomichev wrote:
> + for (map = bpf_map__prev(map, obj);
> +  map != NULL;
> +  map = bpf_map__prev(map, obj)) {

nitpick: if you need to respin, all these for loops on error paths could
  have been more concise as while loops


Re: [PATCH v3 net-next] sock: Reset dst when changing sk_mark via setsockopt

2018-11-08 Thread David Miller
From: David Barmann 
Date: Thu, 8 Nov 2018 08:13:35 -0600

> When setting the SO_MARK socket option, if the mark changes, the dst
> needs to be reset so that a new route lookup is performed.
> 
> This fixes the case where an application wants to change routing by
> setting a new sk_mark.  If this is done after some packets have already
> been sent, the dst is cached and the new mark has no effect.
> 
> Signed-off-by: David Barmann 
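
The change boils down to something like this in the setsockopt handler
(a sketch, not necessarily the exact diff):

    case SO_MARK:
            if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
                    ret = -EPERM;
            } else if (val != sk->sk_mark) {
                    sk->sk_mark = val;
                    sk_dst_reset(sk);  /* drop cached route, force re-lookup */
            }
            break;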

Applied.


Re: [PATCH net-next 0/2] dpaa2-eth: defer probe on object allocate

2018-11-08 Thread David Miller
From: David Miller 
Date: Thu, 08 Nov 2018 19:28:57 -0800 (PST)

> From: Ioana Ciornei 
> Date: Thu, 8 Nov 2018 13:17:46 +
> 
>> Allocatable objects on the fsl-mc bus may be probed by the fsl_mc_allocator
>> after the first attempts of other drivers to use them. Defer the probe when
>> this situation happens.
> 
> Series applied, thanks.

Whoops, I just saw Andrew Lunn's feedback.

I reverted your changes, please address his concerns and resubmit.

Thank you.


Re: [PATCH net-next 0/2] dpaa2-eth: defer probe on object allocate

2018-11-08 Thread David Miller
From: Ioana Ciornei 
Date: Thu, 8 Nov 2018 13:17:46 +

> Allocatable objects on the fsl-mc bus may be probed by the fsl_mc_allocator
> after the first attempts of other drivers to use them. Defer the probe when
> this situation happens.

Series applied, thanks.


Re: [PATCH net-next] dpaa2-eth: Introduce TX congestion management

2018-11-08 Thread David Miller
From: Ioana Ciocoi Radulescu 
Date: Thu, 8 Nov 2018 20:21:15 +

> Today I tried to further coalesce the confirmation frames such that I call
> netdev_tx_completed_queue() only at the end of the NAPI poll, once for each
> confirmation queue that was serviced during that NAPI.

That sounds like exactly what you should do given your design description.

> I need to do more testing, but so far it performs *almost* on par
> with the non-BQL driver version. But it does complicate the fastpath
> code and feels somewhat unnatural.

Well, this is exactly what should happen with BQL and as a result you will
get much better TCP queue utilization and avoidance of bufferbloat.
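
The pattern being described, one netdev_tx_completed_queue() call per
confirmation queue per NAPI poll, would look roughly like this (names
invented for illustration):

    /* Inside the Tx-confirmation processing loop: only accumulate. */
    cleaned_pkts[qid]++;
    cleaned_bytes[qid] += fd_len;

    /* After the loop, one BQL update per serviced confirmation queue. */
    for (qid = 0; qid < num_queues; qid++)
            if (cleaned_pkts[qid])
                    netdev_tx_completed_queue(netdev_get_tx_queue(ndev, qid),
                                              cleaned_pkts[qid],
                                              cleaned_bytes[qid]);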



Should the bridge learn from frames with link local destination MAC address?

2018-11-08 Thread Andrew Lunn
Hi Roopa, Nikolay

br_handle_frame() looks out for frames with a destination MAC
address which is Ethernet link-local, i.e. those which fit
01-80-C2-00-00-XX. It does not normally forward these, but it will
deliver them locally.

Should the bridge perform learning on such frames?

I've got a setup with two bridges connected together with multiple
links between them. STP has done its thing, and blocked one of the
ports to solve the loop.

host0   host1
+-+ +-+
| lan0 forwarding |-| lan0 forwarding |
| | | |
| lan1 forwarding |-| lan1 blocked|
+-+ +-+

I have LLDP running on both systems, and they are sending out periodic
frames on each port.

Now, lan0 and lan1 on host1 use the same MAC address.  So I see the
MAC address bouncing between ports because of the LLDP packets.

# bridge monitor
00:26:55:d2:27:a8 dev lan1 master br0 
00:26:55:d2:27:a8 dev lan0 master br0 
00:26:55:d2:27:a8 dev lan1 master br0 
00:26:55:d2:27:a8 dev lan0 master br0 
00:26:55:d2:27:a8 dev lan1 master br0 

This then results in normal traffic from host0 to host1 being sent to
the blocked port for some of the time.

LLDP is using 01-80-C2-00-00-0E, a link-local MAC address. If the
bridge did not learn from such frames, I think this setup would
work. The bridge would learn from ARP, IP etc, coming from the
forwarding port of host1, and the blocked port would be ignored.

I've tried a similar setup with a hardware switch, Marvell 6352. It
never seems to learn from such frames.
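
If learning were to be skipped for such frames, the change would
presumably be something along these lines in the bridge RX path (purely
illustrative, untested):

    /* Before updating the FDB from the source address, skip learning
     * when the destination is in the 01-80-C2-00-00-XX link-local
     * range. */
    if ((p->state == BR_STATE_LEARNING || p->state == BR_STATE_FORWARDING) &&
        !is_link_local_ether_addr(eth_hdr(skb)->h_dest))
            br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, false);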

Thanks
Andrew


Re: [PATCH] net: Add trace events for all receive exit points

2018-11-08 Thread Genevieve Bastien
On 2018-11-08 03:25 PM, Mathieu Desnoyers wrote:
> - On Nov 8, 2018, at 2:56 PM, Geneviève Bastien gbast...@versatic.net 
> wrote:
>
>> Trace events are already present for the receive entry points, to indicate
>> how the reception entered the stack.
>>
>> This patch adds the corresponding exit trace events that will bound the
>> reception such that all events occurring between the entry and the exit
>> can be considered as part of the reception context. This greatly helps
>> for dependency and root cause analyses.
>>
>> Without this, it is impossible to determine whether a sched_wakeup
>> event following a netif_receive_skb event is the result of the packet
>> reception or a simple coincidence after further processing by the
>> thread.
> As a clarification: it is indeed not possible with tracepoint instrumentation,
> which I think is your point here. It might be possible to do so by using other
> mechanisms like kretprobes, but considering that the "entry" point was deemed
> important enough to have a tracepoint, it would be good to add the matching 
> exit
> events.
>
> Being able to link the packet reception entry/exit with wakeups is key to
> allow tools to perform automated network stack latency analysis, so I think
> the use case justifies adding the missing "exit" events.
Thanks for the clarification. Indeed, so far kretprobes have been used as a
workaround, but they are harder to set up and have considerably more overhead.
After seeing those "entry" points, I thought corresponding "exit" events would
be the best candidates to encapsulate the "network reception" context.

For an example of dependency and critical path analysis of 'wget', see
this screenshot here, arranged from Trace Compass:
https://framapic.org/DgSdNwiuymib/PDPdHJBGCJGR.png

The top view shows the dependency analysis without the "exit" events:
the wakeup is not associated with the packet reception, so the source is
considered to be the process that was running at that time, an IRQ
thread. So wget appears to be blocked by the iwlwifi IRQ thread, which was
itself simply woken up by a hardware interrupt, while in fact wget was
waiting for the network.

The bottom view shows the dependency analysis with the "exit" events:
because the wakeup happens between the "entry" and "exit", we know the
packet reception is the source of the wakeup, and if we happen to know
where that packet came from, we can follow the dependency to the packet
source. So wget was blocked waiting for a network request to another
machine, which sent back the reply; we clearly see the time spent on
the other machine and the network latency during this blockage.
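
As a rough sketch of what the pairing looks like in net/core/dev.c (the dev.c
hunk is not shown above, so this is an approximation based on the tracepoint
names in the patch, not the literal diff):

int netif_receive_skb(struct sk_buff *skb)
{
	int ret;

	trace_netif_receive_skb_entry(skb);	/* existing entry event */

	ret = netif_receive_skb_internal(skb);	/* any wakeup caused by this
						 * packet happens in here */

	trace_netif_receive_skb_exit(skb);	/* new exit event closes the
						 * reception context */
	return ret;
}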

I hope these kinds of analysis possibilities justify adding those
tracepoints.

Thanks,
Geneviève


>
> It might be great if you can provide a glimpse of the wakeup dependency chain
> analysis prototypes you have created, which rely on this new event, in order
> to justify adding it.
>
> Thanks,
>
> Mathieu
>
>> Signed-off-by: Geneviève Bastien 
>> CC: Mathieu Desnoyers 
>> CC: Steven Rostedt 
>> CC: Ingo Molnar 
>> CC: David S. Miller 
>> ---
>> include/trace/events/net.h | 59 ++
>> net/core/dev.c | 30 ---
>> 2 files changed, 85 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
>> index 00aa72ce0e7c..318307511018 100644
>> --- a/include/trace/events/net.h
>> +++ b/include/trace/events/net.h
>> @@ -117,6 +117,23 @@ DECLARE_EVENT_CLASS(net_dev_template,
>>  __get_str(name), __entry->skbaddr, __entry->len)
>> )
>>
>> +DECLARE_EVENT_CLASS(net_dev_template_simple,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb),
>> +
>> +TP_STRUCT__entry(
>> +__field(void *, skbaddr)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->skbaddr = skb;
>> +),
>> +
>> +TP_printk("skbaddr=%p", __entry->skbaddr)
>> +)
>> +
>> DEFINE_EVENT(net_dev_template, net_dev_queue,
>>
>>  TP_PROTO(struct sk_buff *skb),
>> @@ -244,6 +261,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template,
>> netif_rx_ni_entry,
>>  TP_ARGS(skb)
>> );
>>
>> +DEFINE_EVENT(net_dev_template_simple, napi_gro_frags_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_simple, napi_gro_receive_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_simple, netif_rx_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_simple, netif_rx_ni_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb)

Re: [PATCH ipsec-next 00/11] xfrm: policy: add inexact policy search tree

2018-11-08 Thread David Miller
From: Florian Westphal 
Date: Wed,  7 Nov 2018 23:00:30 +0100

> This series attempts to improve xfrm policy lookup performance when
> a lot of (several hundred or even thousands) inexact policies exist
> on a system.
> 
> On insert, a policy is either placed in the hash table (all direct policies,
> i.e. /32 for ipv4 and /128 for ipv6, or all policies matching a
> user-configured threshold).
> All other policies get inserted into the inexact list as per priority.
> 
> Lookup then scans inexact list for first matching entry.
> 
> This series instead makes it so that an inexact policy is added to exactly
> one of four different search list classes.
> 
> 1. "Any:Any" list, containing policies where both saddr and daddr are
>wildcards or have very coarse prefixes, e.g. 10.0.0.0/8 and the like.
> 2. "saddr:any" list, containing policies with a fixed saddr/prefixlen,
>but without destination restrictions.
>These lists are stored in rbtree nodes; each node contains those
>policies matching saddr/prefixlen.
> 3. "Any:daddr" list. Similar to 2), except for policies where only the
>destinations are specified.
> 4. "saddr:daddr" lists, containing policies that match the given
>source/destination network.
> 
>The root of the saddr/daddr tree is stored in the nodes of the
>'daddr' tree.
...
> Comments or questions welcome.

Acked-by: David S. Miller 

This looks really great.  Nice work Florian.
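
To make the four classes concrete, the classification described in the cover
letter can be sketched roughly as follows (invented names; this is not the
actual xfrm_policy.c implementation):

#include <stdbool.h>

/* Invented names; only the decision logic mirrors the cover letter. */
enum example_inexact_class {
	EXAMPLE_CAND_ANY,	/* 1. any:any     - wildcard/coarse on both sides */
	EXAMPLE_CAND_SADDR,	/* 2. saddr:any   - specific source only          */
	EXAMPLE_CAND_DADDR,	/* 3. any:daddr   - specific destination only     */
	EXAMPLE_CAND_BOTH,	/* 4. saddr:daddr - both specific                  */
};

static enum example_inexact_class
example_classify(bool saddr_specific, bool daddr_specific)
{
	if (saddr_specific && daddr_specific)
		return EXAMPLE_CAND_BOTH;
	if (saddr_specific)
		return EXAMPLE_CAND_SADDR;
	if (daddr_specific)
		return EXAMPLE_CAND_DADDR;
	return EXAMPLE_CAND_ANY;
}

A lookup then only needs to scan the candidate lists whose class can match the
packet, instead of walking the whole inexact list.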


[PATCH net-next] tcp_bbr: update comments to reflect pacing_margin_percent

2018-11-08 Thread Neal Cardwell
Recently, in commit ab408b6dc744 ("tcp: switch tcp and sch_fq to new
earliest departure time model"), the TCP BBR code switched to a new
approach of using an explicit bbr_pacing_margin_percent for shaving a
pacing rate "haircut", rather than the previous implicit
approach. Update an old comment to reflect the new approach.

Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_bbr.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 9277abdd822a0..0f497fc49c3fe 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -128,7 +128,12 @@ static const u32 bbr_probe_rtt_mode_ms = 200;
 /* Skip TSO below the following bandwidth (bits/sec): */
 static const int bbr_min_tso_rate = 120;
 
-/* Pace at ~1% below estimated bw, on average, to reduce queue at bottleneck. 
*/
+/* Pace at ~1% below estimated bw, on average, to reduce queue at bottleneck.
+ * In order to help drive the network toward lower queues and low latency while
+ * maintaining high utilization, the average pacing rate aims to be slightly
+ * lower than the estimated bandwidth. This is an important aspect of the
+ * design.
+ */
 static const int bbr_pacing_margin_percent = 1;
 
 /* We use a high_gain value of 2/ln(2) because it's the smallest pacing gain
@@ -247,13 +252,7 @@ static void bbr_init_pacing_rate_from_rtt(struct sock *sk)
sk->sk_pacing_rate = bbr_bw_to_pacing_rate(sk, bw, bbr_high_gain);
 }
 
-/* Pace using current bw estimate and a gain factor. In order to help drive the
- * network toward lower queues while maintaining high utilization and low
- * latency, the average pacing rate aims to be slightly (~1%) lower than the
- * estimated bandwidth. This is an important aspect of the design. In this
- * implementation this slightly lower pacing rate is achieved implicitly by not
- * including link-layer headers in the packet size used for the pacing rate.
- */
+/* Pace using current bw estimate and a gain factor. */
 static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain)
 {
struct tcp_sock *tp = tcp_sk(sk);
-- 
2.19.1.930.g4563a0d9d0-goog
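
To make the "haircut" arithmetic concrete, a small standalone illustration
(plain C, not the tcp_bbr.c code; BBR's real gain values are fixed-point,
which is glossed over here):

#include <stdint.h>
#include <stdio.h>

/* Pace slightly below the estimated bandwidth:
 * rate = bw * gain * (100 - margin) / 100
 */
static uint64_t pacing_rate_bps(uint64_t bw_bps, double gain, int margin_pct)
{
	return (uint64_t)(bw_bps * gain) * (100 - margin_pct) / 100;
}

int main(void)
{
	/* 100 Mbit/s estimate, gain 1.0, 1% margin -> pace at ~99 Mbit/s */
	printf("%llu\n",
	       (unsigned long long)pacing_rate_bps(100000000ULL, 1.0, 1));
	return 0;
}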



Re: [PATCH v2 net] inet: frags: better deal with smp races

2018-11-08 Thread David Miller
From: Eric Dumazet 
Date: Thu,  8 Nov 2018 17:34:27 -0800

> Multiple cpus might attempt to insert a new fragment in rhashtable,
> if for example RPS is buggy, as reported by 배석진 in
> https://patchwork.ozlabs.org/patch/994601/
> 
> We use rhashtable_lookup_get_insert_key() instead of
> rhashtable_insert_fast() to let cpus losing the race
> free their own inet_frag_queue and use the one that
> was inserted by another cpu.
> 
> Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
> Signed-off-by: Eric Dumazet 
> Reported-by: 배석진 

Applied and queued up for -stable, thanks.


RE:(2) (2) (2) (2) (2) [Kernel][NET] Bug report on packet defragmenting

2018-11-08 Thread 배석진
>On 11/08/2018 04:42 PM, 배석진 wrote:
>> Thanks for testing.
>>>
>>> This is not a pristine net-next tree, this dump seems unrelated to the 
>>>patch ?
>> 
>> 
>> yes, looks like that.
>> but the panic only came when using your patch, even right after packet
>> receiving..
>> without that, there's no problem except the defrag issue. it's odd.. :p
>> I couldn't do more debugging since I have other problems.
> 
> 
> You might need to backport some fixes (check all changes to lib/rhashtable.c )
 
I tried to backport the updates to my source tree,
but there are too many files related to lib/rhashtable.c ..
I give up ;(

thank you for your help!


Re: (2) (2) (2) (2) [Kernel][NET] Bug report on packet defragmenting

2018-11-08 Thread Eric Dumazet



On 11/08/2018 04:42 PM, 배석진 wrote:
>> Thanks for testing.
>>
>> This is not a pristine net-next tree, this dump seems unrelated to the patch 
>> ?
> 
> 
> yes, looks like that.
> but the panic only came when using your patch, even right after packet
> receiving..
> without that, there's no problem except the defrag issue. it's odd.. :p
> I couldn't do more debugging since I have other problems.


You might need to backport some fixes (check all changes to lib/rhashtable.c )






[PATCH v2 net] inet: frags: better deal with smp races

2018-11-08 Thread Eric Dumazet
Multiple cpus might attempt to insert a new fragment in rhashtable,
if for example RPS is buggy, as reported by 배석진 in
https://patchwork.ozlabs.org/patch/994601/

We use rhashtable_lookup_get_insert_key() instead of
rhashtable_insert_fast() to let cpus losing the race
free their own inet_frag_queue and use the one that
was inserted by another cpu.

Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet 
Reported-by: 배석진 
---
 net/ipv4/inet_fragment.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 
bcb11f3a27c0c34115af05034a5a20f57842eb0a..760a9e52e02b91b36af323c92f7027e150858f88
 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -178,21 +178,22 @@ static struct inet_frag_queue *inet_frag_alloc(struct 
netns_frags *nf,
 }
 
 static struct inet_frag_queue *inet_frag_create(struct netns_frags *nf,
-   void *arg)
+   void *arg,
+   struct inet_frag_queue **prev)
 {
struct inet_frags *f = nf->f;
struct inet_frag_queue *q;
-   int err;
 
q = inet_frag_alloc(nf, f, arg);
-   if (!q)
+   if (!q) {
+   *prev = ERR_PTR(-ENOMEM);
return NULL;
-
+   }
mod_timer(&q->timer, jiffies + nf->timeout);
 
-   err = rhashtable_insert_fast(&nf->rhashtable, &q->node,
-f->rhash_params);
-   if (err < 0) {
+   *prev = rhashtable_lookup_get_insert_key(&nf->rhashtable, &q->key,
+&q->node, f->rhash_params);
+   if (*prev) {
q->flags |= INET_FRAG_COMPLETE;
inet_frag_kill(q);
inet_frag_destroy(q);
@@ -204,22 +205,22 @@ static struct inet_frag_queue *inet_frag_create(struct 
netns_frags *nf,
 /* TODO : call from rcu_read_lock() and no longer use refcount_inc_not_zero() 
*/
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key)
 {
-   struct inet_frag_queue *fq;
+   struct inet_frag_queue *fq = NULL, *prev;
 
if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh)
return NULL;
 
rcu_read_lock();
 
-   fq = rhashtable_lookup(&nf->rhashtable, key, nf->f->rhash_params);
-   if (fq) {
+   prev = rhashtable_lookup(&nf->rhashtable, key, nf->f->rhash_params);
+   if (!prev)
+   fq = inet_frag_create(nf, key, &prev);
+   if (prev && !IS_ERR(prev)) {
+   fq = prev;
if (!refcount_inc_not_zero(&fq->refcnt))
fq = NULL;
-   rcu_read_unlock();
-   return fq;
}
rcu_read_unlock();
-
-   return inet_frag_create(nf, key);
+   return fq;
 }
 EXPORT_SYMBOL(inet_frag_find);
-- 
2.19.1.930.g4563a0d9d0-goog



Re: [PATCH net] inet: frags: better deal with smp races

2018-11-08 Thread Eric Dumazet



On 11/08/2018 05:02 PM, David Miller wrote:

> GCC is unwilling to see that all paths leading to that final return
> statement do in fact set 'fq' one way or another:
> 
> net/ipv4/inet_fragment.c: In function ‘inet_frag_find’:
> net/ipv4/inet_fragment.c:224:9: warning: ‘fq’ may be used uninitialized in 
> this function [-Wmaybe-uninitialized]
> 
> This is with:
> 
> gcc (GCC) 8.2.1 20181011 (Red Hat 8.2.1-4)
> 
> Please adjust your patch so that the warning is eliminated.
> 

Interesting, I will init *fq to NULL in v2, thanks.



Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-08 Thread Yonghong Song


On 11/8/18 2:56 PM, Edward Cree wrote:
> On 08/11/18 19:42, Alexei Starovoitov wrote:
>> same link let's continue at 1pm PST.
> So, one thing we didn't really get onto was maps, and you mentioned that it
>   wasn't really clear what I was proposing there.
> What I have in mind comes in two parts:
> 1) map type.  A new BTF_KIND_MAP with metadata 'key_type', 'value_type'
>   (both are type_ids referencing other BTF type records), describing the
>   type "map from key_type to value_type".
> 2) record in the 'instances' table.  This would have a name_off (the
>   name of the map), a type_id (pointing at a BTF_KIND_MAP in the 'types'
>   table), and potentially also some indication of what symbol (from
>   section 'maps') refers to this map.  This is pretty much the exact
>   same metadata that a function in the 'instances' table has, the only
>   differences being
>   (a) function's type_id points at a BTF_KIND_FUNC record
>   (b) function's symbol indication refers from .text section
>   (c) in future functions may be nested inside other functions, whereas
>   AIUI a map can't live inside a function.  (But a variable, which is
>   the other thing that would want to go in an 'instances' table, can.)
> So the 'instances' table record structure looks like
> 
> struct btf_instance {
>      __u32 type_id; /* Type of object declared.  An index into type section */
>      __u32 name_off; /* Name of object.  An offset into string section */
>      __u32 parent; /* Containing object if any (else 0).  An index into 
> instance section */
> };
> 
> and we extend the BTF header:
> 
> struct btf_header {
>      __u16   magic;
>      __u8    version;
>      __u8    flags;
>      __u32   hdr_len;
> 
>      /* All offsets are in bytes relative to the end of this header */
>      __u32   type_off;  /* offset of type section   */
>      __u32   type_len;  /* length of type section   */
>      __u32   str_off;   /* offset of string section */
>      __u32   str_len;   /* length of string section */
>      __u32   inst_off;  /* offset of instance section   */
>      __u32   inst_len;  /* length of instance section   */
> };
> 
> Then in the .BTF.ext section, we have both
> 
> struct bpf_func_info {
>      __u32 prog_symbol; /* Index of symbol giving address of subprog */
>      __u32 inst_id; /* Index into instance section */
> }
> 
> struct bpf_map_info {
> {
>      __u32 map_symbol; /* Index of symbol creating this map */
>      __u32 inst_id; /* Index into instance section */
> }
> 
> (either living in different subsections, or in a single table with
>   the addition of a kind field, or in a single table relying on the
>   ultimately referenced type to distinguish funcs from maps).
> 
> Note that the name (in btf_instance) of a map or function need not
>   match the name of the corresponding symbol; we use the .BTF.ext
>   section to tie together btf_instance IDs and symbol IDs.  Then in
>   the case of functions (subprogs), the prog_symbol can be looked
>   up in the ELF symbol table to find the address (== insn_offset)
>   of the subprog, as well as the section containing it (since that
>   might not be .text).  Similarly in the case of maps the BTF info
>   about the map is connected with the info in the maps section.
> 
> Now when the loader has munged this, what it passes to the kernel
>   might not have map_symbol, but instead map_fd.  Instead of
>   prog_symbol it will have whatever identifies the subprog in the
>   blob of stuff it feeds to the kernel (so probably insn_offset).
> 
> All this would of course require a bit more compiler support than
>   the current BPF_ANNOTATE_KV_PAIR, since that just causes the
>   existing BTF machinery to declare a specially constructed struct
>   type.  At the C level you could still have BPF_ANNOTATE_KV_PAIR
>   and the 'bpf_map_foo' name, but then the compiler would
>   recognise that and convert it into an instance record by looking
>   up the name 'foo' in its "maps" section.  That way the special
>   bpf_map_* handling (which ties map names to symbol names,

Compilers in general do not do transformations based on variable
or struct type names by default, so this probably should stay
in the loader.

>   also) would be entirely compiler-internal and not 'leak out' into
>   the definition of the format.  Frontends for other languages
>   which do possess a native map type (e.g. Python dict) might have
>   other ways of indicating the key/value type of a map at source
>   level (e.g. PEP 484) and could directly generate the appropriate
>   BTF_KIND_MAP and bpf_map_info records rather than (as they would
>   with the current design) having to encode the information as a
>   struct bpf_map_foo type-definition.

You mean a python application can generate bpf byte code and
BTFs (including map types)? That would be different from the C/LLVM
use case. The python app probably will be the loader as well.

One option is to pass BPF 

Re: [PATCH net-next 0/8] s390/qeth: updates 2018-11-08

2018-11-08 Thread David Miller
From: Julian Wiedmann 
Date: Thu,  8 Nov 2018 15:06:14 +0100

> please apply the following qeth patches to net-next.
> 
> The first patch allows one more device type to query the FW for a MAC address,
> the others are all basically just removal of duplicated or unused code.

Series applied, thanks.


Re: [PATCH][net-next] openvswitch: remove BUG_ON from get_dpdev

2018-11-08 Thread David Miller
From: Li RongQing 
Date: Thu,  8 Nov 2018 20:40:20 +0800

> if local is a NULL pointer, the following access of local's
> dev will trigger a panic, which is the same as BUG_ON
> 
> Signed-off-by: Li RongQing 

Applied.


Re: [PATCH net-next v2 00/11] ICMP error handling for UDP tunnels

2018-11-08 Thread David Miller
From: Stefano Brivio 
Date: Thu,  8 Nov 2018 12:19:13 +0100

> This series introduces ICMP error handling for UDP tunnels and
> encapsulations and related selftests. We need to handle ICMP errors to
> support PMTU discovery and route redirection -- this support is entirely
> missing right now:
> 
> - patch 1/11 adds a socket lookup for UDP tunnels that use, by design,
>   the same destination port on both endpoints -- i.e. VXLAN and GENEVE
> - patches 2/11 to 7/11 are specific to VxLAN and GENEVE
> - patches 8/11 and 9/11 add infrastructure for lookup of encapsulations
>   where sent packets cannot be matched via receiving socket lookup, i.e.
>   FoU and GUE
> - patches 10/11 and 11/11 are specific to FoU and GUE
> 
> v2: changes are listed in the single patches

Series applied, thanks Stefano.


Re: [PATCH net-next] cxgb4: Add new T6 PCI device ids 0x608a

2018-11-08 Thread David Miller
From: Ganesh Goudar 
Date: Thu,  8 Nov 2018 14:21:07 +0530

> Signed-off-by: Ganesh Goudar 

Applied.


Re: [PATCH][net-next][v2] net/ipv6: compute anycast address hash only if dev is null

2018-11-08 Thread David Miller
From: Li RongQing 
Date: Thu,  8 Nov 2018 14:58:07 +0800

> avoid computing the hash value if dev is not null, since the
> hash value is not used
> 
> Signed-off-by: Li RongQing 

Applied.


Re: [PATCH net] inet: frags: better deal with smp races

2018-11-08 Thread David Miller
From: Eric Dumazet 
Date: Wed,  7 Nov 2018 22:10:53 -0800

> @@ -204,22 +205,22 @@ static struct inet_frag_queue *inet_frag_create(struct 
> netns_frags *nf,
>  /* TODO : call from rcu_read_lock() and no longer use 
> refcount_inc_not_zero() */
>  struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key)
>  {
> - struct inet_frag_queue *fq;
> + struct inet_frag_queue *fq, *prev;
>  
>   if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh)
>   return NULL;
>  
>   rcu_read_lock();
>  
> - fq = rhashtable_lookup(&nf->rhashtable, key, nf->f->rhash_params);
> - if (fq) {
> + prev = rhashtable_lookup(&nf->rhashtable, key, nf->f->rhash_params);
> + if (!prev)
> + fq = inet_frag_create(nf, key, &prev);
> + if (prev && !IS_ERR(prev)) {
> + fq = prev;
>   if (!refcount_inc_not_zero(&fq->refcnt))
>   fq = NULL;
> - rcu_read_unlock();
> - return fq;
>   }
>   rcu_read_unlock();
> -
> - return inet_frag_create(nf, key);
> + return fq;

GCC is unwilling to see that all paths leading to that final return
statement do in fact set 'fq' one way or another:

net/ipv4/inet_fragment.c: In function ‘inet_frag_find’:
net/ipv4/inet_fragment.c:224:9: warning: ‘fq’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]

This is with:

gcc (GCC) 8.2.1 20181011 (Red Hat 8.2.1-4)

Please adjust your patch so that the warning is eliminated.

Thanks.


RE:(2) (2) (2) (2) [Kernel][NET] Bug report on packet defragmenting

2018-11-08 Thread 배석진
>Thanks for testing.
>
>This is not a pristine net-next tree, this dump seems unrelated to the patch ?


yes, looks like that.
but the panic only came when using your patch, even right after packet receiving..
without that, there's no problem except the defrag issue. it's odd.. :p
I couldn't do more debugging since I have other problems.

rcptInfo.txt
Description: Binary data


Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-08 Thread David Ahern
On 11/8/18 5:40 PM, Paweł Staszewski wrote:
> 
> 
> On 08.11.2018 at 17:32, David Ahern wrote:
>> On 11/8/18 9:27 AM, Paweł Staszewski wrote:
> What hardware is this?
>
>>> mellanox connectx 4
>>> ethtool -i enp175s0f0
>>> driver: mlx5_core
>>> version: 5.0-0
>>> firmware-version: 12.21.1000 (SM_200101033)
>>> expansion-rom-version:
>>> bus-info: :af:00.0
>>> supports-statistics: yes
>>> supports-test: yes
>>> supports-eeprom-access: no
>>> supports-register-dump: no
>>> supports-priv-flags: yes
>>>
>>> ethtool -i enp175s0f1
>>> driver: mlx5_core
>>> version: 5.0-0
>>> firmware-version: 12.21.1000 (SM_200101033)
>>> expansion-rom-version:
>>> bus-info: :af:00.1
>>> supports-statistics: yes
>>> supports-test: yes
>>> supports-eeprom-access: no
>>> supports-register-dump: no
>>> supports-priv-flags: yes
>>>
> Start with:
>
> echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
> cat /sys/kernel/debug/tracing/trace_pipe
   cat /sys/kernel/debug/tracing/trace_pipe
   -0 [045] ..s. 68469.467752: xdp_devmap_xmit:
 ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0 drops=1
 from_ifindex=4 to_ifindex=5 err=-6
>> FIB lookup is good, the redirect is happening, but the mlx5 driver does
>> not like it.
>>
>> I think the -6 is coming from the mlx5 driver and the packet is getting
>> dropped. Perhaps this check in mlx5e_xdp_xmit:
>>
>>     if (unlikely(sq_num >= priv->channels.num))
>>  return -ENXIO;
> I removed that part and recompiled - but after running xdp_fwd now I
> have a kernel panic :)

Jesper or one of the Mellanox folks needs to respond about the config
needed to run XDP with this NIC. I don't have a 40G or 100G card to play
with.



Re: [PATCH bpf-next v4 02/13] bpf: btf: Add BTF_KIND_FUNC

2018-11-08 Thread Yonghong Song


On 11/8/18 12:52 PM, Edward Cree wrote:
> On 08/11/18 20:36, Yonghong Song wrote:
>> This patch adds BTF_KIND_FUNC support to the type section.
>> BTF_KIND_FUNC is used to specify the signature of a
>> defined subprogram or the pointee of a function pointer.
>>
>> In BTF, the function type related data structures are
>>struct bpf_param {
>>  __u32 name_off; /* parameter name */
>>  __u32 type; /* parameter type */
>>};
>>struct bpf_type {
>>  __u32 name_off; /* function name */
>>  __u32 info; /* BTF_KIND_FUNC and num of parameters (#vlen) */
>>  __u32 type; /* return type */
>>}
>> The data layout of the function type:
>>struct bpf_type
>>#vlen number of bpf_param's
>>
>> For a defined subprogram with valid function body,
>>. function name and all parameter names except the vararg
>>  must be valid C identifier.
> Given that there's an intention to support other frontends besides
>   C, what's the reason for this restriction?

This (C) is the typical usage today. If later on other frontends
generate bpf programs with more relaxed symbol name requirements,
we can certainly relax the rule.

> 
>> For the pointee of a function pointer,
>>. function name and all parameter names will
>> have name_off = 0 to indicate a non-existing name.
> Why can't function pointer parameters have names?

Currently, neither bcc nor llvm retains function pointer
argument names in dwarf. For LLVM, the IR generation for function pointer
types discards the argument name. So I added the check because
llvm does not generate it.

We can relax the restrictions later if the compiler starts
to keep argument names in the IR.

> E.g. imagine something like struct net_device_ops.  All those
>   function pointers have named parameters and that's relevant info
>   when debugging.
> 
> -Ed
> 


Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-08 Thread Paweł Staszewski




On 08.11.2018 at 17:32, David Ahern wrote:

On 11/8/18 9:27 AM, Paweł Staszewski wrote:

What hardware is this?


mellanox connectx 4
ethtool -i enp175s0f0
driver: mlx5_core
version: 5.0-0
firmware-version: 12.21.1000 (SM_200101033)
expansion-rom-version:
bus-info: :af:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

ethtool -i enp175s0f1
driver: mlx5_core
version: 5.0-0
firmware-version: 12.21.1000 (SM_200101033)
expansion-rom-version:
bus-info: :af:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes


Start with:

echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
cat /sys/kernel/debug/tracing/trace_pipe

  cat /sys/kernel/debug/tracing/trace_pipe
  -0 [045] ..s. 68469.467752: xdp_devmap_xmit:
ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0 drops=1
from_ifindex=4 to_ifindex=5 err=-6

FIB lookup is good, the redirect is happening, but the mlx5 driver does
not like it.

I think the -6 is coming from the mlx5 driver and the packet is getting
dropped. Perhaps this check in mlx5e_xdp_xmit:

if (unlikely(sq_num >= priv->channels.num))
 return -ENXIO;
I removed that part and recompiled - but after running xdp_fwd now I
have a kernel panic :)
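
For context, a hedged sketch of why that check exists (simplified, not the
exact mlx5 source; names below are assumptions): the XDP transmit path picks
a per-CPU send queue by CPU id, so if the device has fewer XDP SQs than
online CPUs, the index has to be rejected instead of being used:

/* Sketch only: the redirect target selects its XDP SQ by the current CPU. */
static int example_xdp_xmit(struct net_device *dev, struct xdp_frame *frame)
{
	struct example_priv *priv = netdev_priv(dev);	/* assumed priv layout */
	int sq_num = smp_processor_id();

	if (unlikely(sq_num >= priv->channels.num))
		return -ENXIO;	/* the err=-6 seen in the tracepoint above */

	/* Without the check, this indexing goes out of bounds -> crash. */
	return example_xmit_on_sq(&priv->channels.c[sq_num]->xdpsq, frame);
}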







swapper 0 [045] 68493.746274: fib:fib_table_lookup: table 254 oif
0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
     7fff818c13b5 fib_table_lookup ([kernel.kallsyms])

swapper 0 [045] 68494.770287: fib:fib_table_lookup: table 254 oif
0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
     7fff818c13b5 fib_table_lookup ([kernel.kallsyms])

swapper 0 [045] 68495.794304: fib:fib_table_lookup: table 254 oif
0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
     7fff818c13b5 fib_table_lookup ([kernel.kallsyms])

swapper 0 [045] 68496.818308: fib:fib_table_lookup: table 254 oif
0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
     7fff818c13b5 fib_table_lookup ([kernel.kallsyms])

swapper 0 [045] 68497.842313: fib:fib_table_lookup: table 254 oif
0 iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0
==> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
     7fff818c13b5 fib_table_lookup ([kernel.kallsyms])




Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code

2018-11-08 Thread Timur Tabi

On 11/8/18 5:23 PM, Andrew Lunn wrote:

I don't know much about ACPI. I do know DT. MDIO busses can have
multiple PHYs on them. Is the following valid to list two PHYs?

  Device (MDIO) {
  Name (_DSD, Package () {
  ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
  Package () { Package () { "ethernet-phy@0", PHY0 }, }
  })
  Name (PHY0, Package() {
  ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
  Package () { Package () { "reg", 0x0 }, }
  })
  Name (_DSD, Package () {
  ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
  Package () { Package () { "ethernet-phy@10", PHY1 }, }
  })
  Name (PHY1, Package() {
  ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
  Package () { Package () { "reg", 0x10 }, }
  })
  }


You can't have the same DSD twice.  It would need to look like this:

 Name (PHY1, Package() {
 ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
 Package () { Package () { "reg", 0, 0x10 }, }
 })


[PATCH v4 bpf-next 1/7] selftests/bpf: rename flow dissector section to flow_dissector

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

Makes it compatible with the logic that derives program type
from section name in libbpf_prog_type_by_name.

Signed-off-by: Stanislav Fomichev 
---
 tools/testing/selftests/bpf/bpf_flow.c | 2 +-
 tools/testing/selftests/bpf/test_flow_dissector.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_flow.c 
b/tools/testing/selftests/bpf/bpf_flow.c
index 107350a7821d..b9798f558ca7 100644
--- a/tools/testing/selftests/bpf/bpf_flow.c
+++ b/tools/testing/selftests/bpf/bpf_flow.c
@@ -116,7 +116,7 @@ static __always_inline int parse_eth_proto(struct __sk_buff 
*skb, __be16 proto)
return BPF_DROP;
 }
 
-SEC("dissect")
+SEC("flow_dissector")
 int _dissect(struct __sk_buff *skb)
 {
if (!skb->vlan_present)
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.sh 
b/tools/testing/selftests/bpf/test_flow_dissector.sh
index c0fb073b5eab..d23d4da66b83 100755
--- a/tools/testing/selftests/bpf/test_flow_dissector.sh
+++ b/tools/testing/selftests/bpf/test_flow_dissector.sh
@@ -59,7 +59,7 @@ else
 fi
 
 # Attach BPF program
-./flow_dissector_load -p bpf_flow.o -s dissect
+./flow_dissector_load -p bpf_flow.o -s flow_dissector
 
 # Setup
 tc qdisc add dev lo ingress
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH v4 bpf-next 4/7] libbpf: add internal pin_name

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

pin_name is the same as section_name where '/' is replaced
by '_'. bpf_object__pin_programs is converted to use pin_name
to avoid the situation where section_name would require creating another
subdirectory for a pin (as, for example, when calling bpf_object__pin_programs
for programs in sections like "cgroup/connect6").

Signed-off-by: Stanislav Fomichev 
---
 tools/lib/bpf/libbpf.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index cfa269c91e11..38dbeb113eeb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -124,6 +124,10 @@ struct bpf_program {
char *name;
int prog_ifindex;
char *section_name;
+   /* section_name with / replaced by _; makes recursive pinning
+* in bpf_object__pin_programs easier
+*/
+   char *pin_name;
struct bpf_insn *insns;
size_t insns_cnt, main_prog_cnt;
enum bpf_prog_type type;
@@ -253,6 +257,7 @@ static void bpf_program__exit(struct bpf_program *prog)
bpf_program__unload(prog);
zfree(&prog->name);
zfree(&prog->section_name);
+   zfree(&prog->pin_name);
zfree(&prog->insns);
zfree(&prog->reloc_desc);
 
@@ -261,6 +266,18 @@ static void bpf_program__exit(struct bpf_program *prog)
prog->idx = -1;
 }
 
+static char *__bpf_program__pin_name(struct bpf_program *prog)
+{
+   char *name;
+
+   name = strdup(prog->section_name);
+   for (char *p = name; p && *p; p++)
+   if (*p == '/')
+   *p = '_';
+
+   return name;
+}
+
 static int
 bpf_program__init(void *data, size_t size, char *section_name, int idx,
  struct bpf_program *prog)
@@ -279,6 +296,13 @@ bpf_program__init(void *data, size_t size, char 
*section_name, int idx,
goto errout;
}
 
+   prog->pin_name = __bpf_program__pin_name(prog);
+   if (!prog->pin_name) {
+   pr_warning("failed to alloc pin name for prog under section(%d) 
%s\n",
+  idx, section_name);
+   goto errout;
+   }
+
prog->insns = malloc(size);
if (!prog->insns) {
pr_warning("failed to alloc insns for prog under section %s\n",
@@ -2008,7 +2032,7 @@ int bpf_object__pin_programs(struct bpf_object *obj, 
const char *path)
int len;
 
len = snprintf(buf, PATH_MAX, "%s/%s", path,
-  prog->section_name);
+  prog->pin_name);
if (len < 0) {
err = -EINVAL;
goto err_unpin_programs;
@@ -2032,7 +2056,7 @@ int bpf_object__pin_programs(struct bpf_object *obj, 
const char *path)
int len;
 
len = snprintf(buf, PATH_MAX, "%s/%s", path,
-  prog->section_name);
+  prog->pin_name);
if (len < 0)
continue;
else if (len >= PATH_MAX)
@@ -2057,7 +2081,7 @@ int bpf_object__unpin_programs(struct bpf_object *obj, 
const char *path)
int len;
 
len = snprintf(buf, PATH_MAX, "%s/%s", path,
-  prog->section_name);
+  prog->pin_name);
if (len < 0)
return -EINVAL;
else if (len >= PATH_MAX)
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH v4 bpf-next 3/7] libbpf: bpf_program__pin: add special case for instances.nr == 1

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

When bpf_program has only one instance, don't create a subdirectory with
per-instance pin files (/0). Instead, just create a single pin file
for that single instance. This simplifies object pinning by not creating
unnecessary subdirectories.

This can potentially break existing users that depend on the case
where '/0' is always created. However, I couldn't find any serious
usage of bpf_program__pin inside the kernel tree and I suppose there
should be none outside.

Signed-off-by: Stanislav Fomichev 
---
 tools/lib/bpf/libbpf.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f8590490a9dd..cfa269c91e11 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1761,6 +1761,11 @@ int bpf_program__pin(struct bpf_program *prog, const 
char *path)
return -EINVAL;
}
 
+   if (prog->instances.nr == 1) {
+   /* don't create subdirs when pinning single instance */
+   return bpf_program__pin_instance(prog, path, 0);
+   }
+
err = make_dir(path);
if (err)
return err;
@@ -1823,6 +1828,11 @@ int bpf_program__unpin(struct bpf_program *prog, const 
char *path)
return -EINVAL;
}
 
+   if (prog->instances.nr == 1) {
+   /* don't create subdirs when pinning single instance */
+   return bpf_program__unpin_instance(prog, path, 0);
+   }
+
for (i = 0; i < prog->instances.nr; i++) {
char buf[PATH_MAX];
int len;
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH v4 bpf-next 0/7] bpftool: support loading flow dissector

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
  Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
  used them in bpf_program__pin
* added new pin_name to bpf_program so bpf_program__pin
  works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
  load/loadall

v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
  instances

v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin

This patch series adds support for loading and attaching flow dissector
programs from the bpftool:

* first patch fixes flow dissector section name in the selftests (so
  libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
  being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
  single instances (we don't create /0 pin anymore, just )
* forth patch adds pin_name to the bpf_program struct
  which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
  the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
  all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
  an example

Stanislav Fomichev (7):
  selftests/bpf: rename flow dissector section to flow_dissector
  libbpf: cleanup after partial failure in bpf_object__pin
  libbpf: bpf_program__pin: add special case for instances.nr == 1
  libbpf: add internal pin_name
  bpftool: add loadall command
  bpftool: add pinmaps argument to the load/loadall
  bpftool: support loading flow dissector

 .../bpftool/Documentation/bpftool-prog.rst|  42 +-
 tools/bpf/bpftool/bash-completion/bpftool |  21 +-
 tools/bpf/bpftool/common.c|  31 +-
 tools/bpf/bpftool/main.h  |   1 +
 tools/bpf/bpftool/prog.c  | 185 ++---
 tools/lib/bpf/libbpf.c| 364 --
 tools/lib/bpf/libbpf.h|  18 +
 tools/testing/selftests/bpf/bpf_flow.c|   2 +-
 .../selftests/bpf/test_flow_dissector.sh  |   2 +-
 9 files changed, 546 insertions(+), 120 deletions(-)

-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH v4 bpf-next 5/7] bpftool: add loadall command

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

This patch adds a new *loadall* command which slightly differs from the
existing *load*. The *load* command loads all programs from the obj file,
but pins only the first program. *loadall* pins all programs from the
obj file under the specified directory.

The intended use case is flow_dissector, where we want to load a bunch
of progs, pin them all and after that construct a jump table.

Signed-off-by: Stanislav Fomichev 
---
 .../bpftool/Documentation/bpftool-prog.rst| 14 +++-
 tools/bpf/bpftool/bash-completion/bpftool |  4 +-
 tools/bpf/bpftool/common.c| 31 
 tools/bpf/bpftool/main.h  |  1 +
 tools/bpf/bpftool/prog.c  | 74 ++-
 5 files changed, 82 insertions(+), 42 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index ac4e904b10fb..d943d9b67a1d 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -15,7 +15,8 @@ SYNOPSIS
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
 
*COMMANDS* :=
-   { **show** | **list** | **dump xlated** | **dump jited** | **pin** | 
**load** | **help** }
+   { **show** | **list** | **dump xlated** | **dump jited** | **pin** | 
**load**
+   | **loadall** | **help** }
 
 MAP COMMANDS
 =
@@ -24,7 +25,7 @@ MAP COMMANDS
 |  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** 
| **visual**}]
 |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
-|  **bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** 
{**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+|  **bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
 |   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
 |   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
 |  **bpftool** **prog help**
@@ -79,8 +80,13 @@ DESCRIPTION
  contain a dot character ('.'), which is reserved for future
  extensions of *bpffs*.
 
-   **bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** 
*IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
- Load bpf program from binary *OBJ* and pin as *FILE*.
+   **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+ Load bpf program(s) from binary *OBJ* and pin as *FILE*.
+ Both **bpftool prog load** and **bpftool prog loadall** load
+ all maps and programs from the *OBJ* and differ only in
+ pinning. **load** pins only the first program from the *OBJ*
+ as *FILE*. **loadall** pins all programs from the *OBJ*
+ under *FILE* directory.
  **type** is optional, if not specified program type will be
  inferred from section names.
  By default bpftool will create new maps as declared in the ELF
diff --git a/tools/bpf/bpftool/bash-completion/bpftool 
b/tools/bpf/bpftool/bash-completion/bpftool
index 3f78e6404589..780ebafb756a 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -243,7 +243,7 @@ _bpftool()
 # Completion depends on object and command in use
 case $object in
 prog)
-if [[ $command != "load" ]]; then
+if [[ $command != "load" && $command != "loadall" ]]; then
 case $prev in
 id)
 _bpftool_get_prog_ids
@@ -309,7 +309,7 @@ _bpftool()
 fi
 return 0
 ;;
-load)
+load|loadall)
 local obj
 
 if [[ ${#words[@]} -lt 6 ]]; then
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 25af85304ebe..21ce556c15e1 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -169,34 +169,23 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type 
exp_type)
return fd;
 }
 
-int do_pin_fd(int fd, const char *name)
+int mount_bpffs_for_pin(const char *name)
 {
char err_str[ERR_MAX_LEN];
char *file;
char *dir;
int err = 0;
 
-   err = bpf_obj_pin(fd, name);
-   if (!err)
-   goto out;
-
file = malloc(strlen(name) + 1);
strcpy(file, name);
dir = dirname(file);
 
-   if (errno != EPERM || is_bpffs(dir)) {
-   p_err("can't pin the object (%s): %s", name, strerror(errno));
+   if (is_bpffs(dir))
+   /* nothing to do if already mounted */
   

[PATCH v4 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

bpftool will use bpf_object__pin in the next commits to pin all programs
and maps from the file; in case of a partial failure, we need to get
back to the clean state (undo previous program/map pins).

As part of a cleanup, I've added and exported separate routines to
pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs)
of an object.

Signed-off-by: Stanislav Fomichev 
---
 tools/lib/bpf/libbpf.c | 328 ++---
 tools/lib/bpf/libbpf.h |  18 +++
 2 files changed, 323 insertions(+), 23 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d6e62e90e8d4..f8590490a9dd 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1699,6 +1699,34 @@ int bpf_program__pin_instance(struct bpf_program *prog, 
const char *path,
return 0;
 }
 
+int bpf_program__unpin_instance(struct bpf_program *prog, const char *path,
+   int instance)
+{
+   int err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (prog == NULL) {
+   pr_warning("invalid program pointer\n");
+   return -EINVAL;
+   }
+
+   if (instance < 0 || instance >= prog->instances.nr) {
+   pr_warning("invalid prog instance %d of prog %s (max %d)\n",
+  instance, prog->section_name, prog->instances.nr);
+   return -EINVAL;
+   }
+
+   err = unlink(path);
+   if (err != 0)
+   return -errno;
+   pr_debug("unpinned program '%s'\n", path);
+
+   return 0;
+}
+
 static int make_dir(const char *path)
 {
char *cp, errmsg[STRERR_BUFSIZE];
@@ -1737,6 +1765,64 @@ int bpf_program__pin(struct bpf_program *prog, const 
char *path)
if (err)
return err;
 
+   for (i = 0; i < prog->instances.nr; i++) {
+   char buf[PATH_MAX];
+   int len;
+
+   len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
+   if (len < 0) {
+   err = -EINVAL;
+   goto err_unpin;
+   } else if (len >= PATH_MAX) {
+   err = -ENAMETOOLONG;
+   goto err_unpin;
+   }
+
+   err = bpf_program__pin_instance(prog, buf, i);
+   if (err)
+   goto err_unpin;
+   }
+
+   return 0;
+
+err_unpin:
+   for (i = i - 1; i >= 0; i--) {
+   char buf[PATH_MAX];
+   int len;
+
+   len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
+   if (len < 0)
+   continue;
+   else if (len >= PATH_MAX)
+   continue;
+
+   bpf_program__unpin_instance(prog, buf, i);
+   }
+
+   rmdir(path);
+
+   return err;
+}
+
+int bpf_program__unpin(struct bpf_program *prog, const char *path)
+{
+   int i, err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (prog == NULL) {
+   pr_warning("invalid program pointer\n");
+   return -EINVAL;
+   }
+
+   if (prog->instances.nr <= 0) {
+   pr_warning("no instances of prog %s to pin\n",
+  prog->section_name);
+   return -EINVAL;
+   }
+
for (i = 0; i < prog->instances.nr; i++) {
char buf[PATH_MAX];
int len;
@@ -1747,11 +1833,15 @@ int bpf_program__pin(struct bpf_program *prog, const 
char *path)
else if (len >= PATH_MAX)
return -ENAMETOOLONG;
 
-   err = bpf_program__pin_instance(prog, buf, i);
+   err = bpf_program__unpin_instance(prog, buf, i);
if (err)
return err;
}
 
+   err = rmdir(path);
+   if (err)
+   return -errno;
+
return 0;
 }
 
@@ -1776,12 +1866,33 @@ int bpf_map__pin(struct bpf_map *map, const char *path)
}
 
pr_debug("pinned map '%s'\n", path);
+
return 0;
 }
 
-int bpf_object__pin(struct bpf_object *obj, const char *path)
+int bpf_map__unpin(struct bpf_map *map, const char *path)
+{
+   int err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (map == NULL) {
+   pr_warning("invalid map pointer\n");
+   return -EINVAL;
+   }
+
+   err = unlink(path);
+   if (err != 0)
+   return -errno;
+   pr_debug("unpinned map '%s'\n", path);
+
+   return 0;
+}
+
+int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
 {
-   struct bpf_program *prog;
struct bpf_map *map;
int err;
 
@@ -1797,6 +1908,55 @@ int bpf_object__pin(struct bpf_object *obj, const char 
*path)
if (err)
return err;
 
+   bpf_map__for_each(map, obj) {
+   char 

[PATCH v4 bpf-next 6/7] bpftool: add pinmaps argument to the load/loadall

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

This new additional argument lets users pin all maps from the object at
the specified path.

Signed-off-by: Stanislav Fomichev 
---
 .../bpftool/Documentation/bpftool-prog.rst|  4 +++-
 tools/bpf/bpftool/bash-completion/bpftool |  3 ++-
 tools/bpf/bpftool/prog.c  | 24 ++-
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index d943d9b67a1d..b04c4a365739 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -80,7 +80,7 @@ DESCRIPTION
  contain a dot character ('.'), which is reserved for future
  extensions of *bpffs*.
 
-   **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+   **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*] [**pinmaps** 
*MAP_DIR*]
  Load bpf program(s) from binary *OBJ* and pin as *FILE*.
  Both **bpftool prog load** and **bpftool prog loadall** load
  all maps and programs from the *OBJ* and differ only in
@@ -98,6 +98,8 @@ DESCRIPTION
  use, referring to it by **id** or through a **pinned** file.
  If **dev** *NAME* is specified program will be loaded onto
  given networking device (offload).
+ Optional **pinmaps** argument can be provided to pin all
+ maps under *MAP_DIR* directory.
 
  Note: *FILE* must be located in *bpffs* mount. It must not
  contain a dot character ('.'), which is reserved for future
diff --git a/tools/bpf/bpftool/bash-completion/bpftool 
b/tools/bpf/bpftool/bash-completion/bpftool
index 780ebafb756a..a05d0071f39f 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -346,7 +346,7 @@ _bpftool()
 _bpftool_get_map_ids
 return 0
 ;;
-pinned)
+pinned|pinmaps)
 _filedir
 return 0
 ;;
@@ -358,6 +358,7 @@ _bpftool()
 COMPREPLY=( $( compgen -W "map" -- "$cur" ) )
 _bpftool_once_attr 'type'
 _bpftool_once_attr 'dev'
+_bpftool_once_attr 'pinmaps'
 return 0
 ;;
 esac
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 751a90ccfdab..4654d9450cd9 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -802,6 +802,7 @@ static int load_with_options(int argc, char **argv, bool 
first_prog_only)
struct map_replace *map_replace = NULL;
struct bpf_program *prog = NULL, *pos;
unsigned int old_map_fds = 0;
+   const char *pinmaps = NULL;
struct bpf_object *obj;
struct bpf_map *map;
const char *pinfile;
@@ -906,6 +907,13 @@ static int load_with_options(int argc, char **argv, bool 
first_prog_only)
goto err_free_reuse_maps;
}
NEXT_ARG();
+   } else if (is_prefix(*argv, "pinmaps")) {
+   NEXT_ARG();
+
+   if (!REQ_ARGS(1))
+   goto err_free_reuse_maps;
+
+   pinmaps = GET_ARG();
} else {
p_err("expected no more arguments, 'type', 'map' or 
'dev', got: '%s'?",
  *argv);
@@ -1026,6 +1034,14 @@ static int load_with_options(int argc, char **argv, bool 
first_prog_only)
}
}
 
+   if (pinmaps) {
+   err = bpf_object__pin_maps(obj, pinmaps);
+   if (err) {
+   p_err("failed to pin all maps");
+   goto err_unpin;
+   }
+   }
+
if (json_output)
jsonw_null(json_wtr);
 
@@ -1036,6 +1052,11 @@ static int load_with_options(int argc, char **argv, bool 
first_prog_only)
 
return 0;
 
+err_unpin:
+   if (first_prog_only)
+   unlink(pinfile);
+   else
+   bpf_object__unpin_programs(obj, pinfile);
 err_close_obj:
bpf_object__close(obj);
 err_free_reuse_maps:
@@ -1069,7 +1090,8 @@ static int do_help(int argc, char **argv)
"   %s %s pin   PROG FILE\n"
"   %s %s { load | loadall } OBJ  FILE \\\n"
" [type TYPE] [dev NAME] \\\n"
-   " 

Re: [PATCH net-next] net: bcmgenet: return correct value 'ret' from bcmgenet_power_down

2018-11-08 Thread David Miller
From: YueHaibing 
Date: Thu, 8 Nov 2018 02:08:43 +

> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/net/ethernet/broadcom/genet/bcmgenet.c: In function 
> 'bcmgenet_power_down':
> drivers/net/ethernet/broadcom/genet/bcmgenet.c:1136:6: warning:
>  variable 'ret' set but not used [-Wunused-but-set-variable]
> 
> bcmgenet_power_down should return 'ret' instead of 0.
> 
> Fixes: ca8cf341903f ("net: bcmgenet: propagate errors from 
> bcmgenet_power_down")
> Signed-off-by: YueHaibing 

Applied, thanks.


[PATCH v4 bpf-next 7/7] bpftool: support loading flow dissector

2018-11-08 Thread Stanislav Fomichev
From: Stanislav Fomichev 

This commit adds support for loading/attaching/detaching flow
dissector program. The structure of the flow dissector program is
assumed to be the same as in the selftests:

* flow_dissector section with the main entry point
* a bunch of tail call progs
* a jmp_table map that is populated with the tail call progs

When `bpftool loadall` is called with a flow_dissector prog (i.e. when the
'type flow_dissector' argument is passed), we load and pin all programs.
The user is responsible for constructing the jump table for the tail calls.

The last argument of `bpftool attach` is made optional for this use
case.

Example:
bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
/sys/fs/bpf/flow type flow_dissector \
pinmaps /sys/fs/bpf/flow

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 0 0 0 0 \
value pinned /sys/fs/bpf/flow/IP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 1 0 0 0 \
value pinned /sys/fs/bpf/flow/IPV6

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 2 0 0 0 \
value pinned /sys/fs/bpf/flow/IPV6OP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 3 0 0 0 \
value pinned /sys/fs/bpf/flow/IPV6FR

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 4 0 0 0 \
value pinned /sys/fs/bpf/flow/MPLS

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
key 5 0 0 0 \
value pinned /sys/fs/bpf/flow/VLAN

bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector

Tested by using the above lines to load the prog in
the test_flow_dissector.sh selftest.

Signed-off-by: Stanislav Fomichev 
---
 .../bpftool/Documentation/bpftool-prog.rst| 26 +++---
 tools/bpf/bpftool/bash-completion/bpftool | 14 ++-
 tools/bpf/bpftool/prog.c  | 87 +++
 3 files changed, 77 insertions(+), 50 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index b04c4a365739..d77885176464 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -26,8 +26,8 @@ MAP COMMANDS
 |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
 |  **bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
-|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
-|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
+|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
+|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
 |  **bpftool** **prog help**
 |
 |  *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -40,7 +40,9 @@ MAP COMMANDS
 |  **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | 
**cgroup/post_bind6** |
 |  **cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** 
| **cgroup/sendmsg6**
 |  }
-|   *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
+|   *ATTACH_TYPE* := {
+|  **msg_verdict** | **skb_verdict** | **skb_parse** | 
**flow_dissector**
+|  }
 
 
 DESCRIPTION
@@ -105,13 +107,17 @@ DESCRIPTION
  contain a dot character ('.'), which is reserved for future
  extensions of *bpffs*.
 
-**bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
-  Attach bpf program *PROG* (with type specified by 
*ATTACH_TYPE*)
-  to the map *MAP*.
-
-**bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
-  Detach bpf program *PROG* (with type specified by 
*ATTACH_TYPE*)
-  from the map *MAP*.
+   **bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
+ Attach bpf program *PROG* (with type specified by
+ *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
+ parameter, with the exception of *flow_dissector* which is
+ attached to current networking name space.
+
+   **bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
+ Detach bpf program *PROG* (with type specified by
+ *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
+ parameter, with the exception of *flow_dissector* which is
+ detached from the current networking name space.
 
**bpftool prog help**
  Print short help message.
diff --git a/tools/bpf/bpftool/bash-completion/bpftool 
b/tools/bpf/bpftool/bash-completion/bpftool
index a05d0071f39f..45c2db257d2b 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -299,7 +299,8 @@ _bpftool()
 fi
 
 if [[ ${#words[@]} == 6 ]]; then
-   

Re: [PATCH net-next 0/7] net: sched: prepare for more Qdisc offloads

2018-11-08 Thread David Miller
From: Jakub Kicinski 
Date: Wed,  7 Nov 2018 17:33:33 -0800

> This series refactors the "switchdev" Qdisc offloads a little.  We have
> a few Qdiscs which can be fully offloaded today to the forwarding plane
> of switching devices.
> 
> First patch adds a helper for handing statistic dumps, the code seems
> to be copy pasted between PRIO and RED.  Second patch removes unnecessary
> parameter from RED offload function.  Third patch makes the MQ offload
> use the dump helper which helps it behave much like PRIO and RED when
> it comes to the TCQ_F_OFFLOADED flag.  Patch 4 adds a graft helper,
> similar to the dump helper.
> 
> Patch 5 is unrelated to offloads, qdisc_graft() code seemed ripe for a
> small refactor - no functional changes there.
> 
> Last two patches move the qdisc_put() call outside of the sch_tree_lock
> section for RED and PRIO.  The child Qdiscs will get removed from the
> hierarchy under the lock, but having the put (and potentially destroy)
> called outside of the lock helps offload which may choose to sleep,
> and it should generally lower the Qdisc change impact.

Series applied, thanks Jakub.


Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code

2018-11-08 Thread Andrew Lunn
On Thu, Nov 08, 2018 at 03:21:29PM +0800, Wang Dongsheng wrote:
> Originally I just pushed "phy-handle" support for ACPI on the QCOM QDF2400
> platform. After some discussion, and following Andrew's advice, I am sending
> out a generic ACPI version.
> 
> Currently there is no clear documentation about MDIO/PHY for ACPI, so while
> reading some documents about ACPI [1], I concluded that we just need to reuse
> the DT binding in ACPI [2]. However, this series of patches is not
> fully compatible with all contents specified in the DT binding.
> 
> The most important thing about this series is linking the phy device to the
> ACPI fwnode. Besides, we need to carry out a bus scan at mdio
> registration. Therefore, this series does not yet cover more of the DT binding
> properties; more support will come in follow-up patches, either from me or
> from others.
> 
> Example:
> Based on ACPI doc:
> Documentation/acpi/dsd/data-node-references.txt
> Documentation/acpi/dsd/graph.txt
> With _DSD device properties we can finally do this:
> Device (MDIO) {
> Name (_DSD, Package () {
> ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> Package () { Package () { "ethernet-phy@0", PHY0 }, }
> })
> Name (PHY0, Package() {
> ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () { Package () { "reg", 0x0 }, }
> })

I don't know much about ACPI. I do know DT. MDIO busses can have
multiple PHYs on them. Is the following valid to list two PHYs?

 Device (MDIO) {
 Name (_DSD, Package () {
 ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
 Package () { Package () { "ethernet-phy@0", PHY0 }, }
 })
 Name (PHY0, Package() {
 ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
 Package () { Package () { "reg", 0x0 }, }
 })
 Name (_DSD, Package () {
 ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
 Package () { Package () { "ethernet-phy@10", PHY1 }, }
 })
 Name (PHY1, Package() {
 ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
 Package () { Package () { "reg", 0x10 }, }
 })
 }


An MDIO bus can also have more than just PHYs on it. There can be
Ethernet switches. Broadcom also has some with generic PHY devices on
them, and other odd things. That means whatever is on an MDIO bus is a
device in the Linux device model. How does that work? Do we need some
form of Device (PHY) {}?

 Device (MDIO) {
 Device (PHY) {
 Name (_DSD, Package () {
 ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
 Package () { Package () { "ethernet-phy@0", PHY0 }, }
 })
 Name (PHY0, Package() {
 ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
 Package () { Package () { "reg", 0x0 }, }
 })
 }
 Device (PHY) {
 Name (_DSD, Package () {
 ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
 Package () { Package () { "ethernet-phy@10", PHY1 }, }
 })
 Name (PHY1, Package() {
 ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
 Package () { Package () { "reg", 0x10 }, }
 })
 Device (SWITCH) {
 Name (_DSD, Package () {
 ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
 Package () { Package () { "switch@11", SWITCH0 }, }
 })
 Name (SWITCH0, Package() {
 ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
 Package () { Package () { "reg", 0x11 }, }
 })
 }
  }

I'm just trying to ensure whatever is defined is flexible enough that
we really can later support everything which DT does. We have PHYs on
MDIO busses, inside switches, which are on MDIO busses, which are
inside Ethernet interfaces, etc.

An MDIO bus is very similar to an i2c bus. How is that described in
ACPI? Anything we can learn from that?

  Thanks
 Andrew


Re: [net-next PATCH v2] net: sched: cls_flower: Classify packets using port ranges

2018-11-08 Thread David Miller
From: Amritha Nambiar 
Date: Wed, 07 Nov 2018 13:22:42 -0800

> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 401d0c1..b63c3cf 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -405,6 +405,11 @@ enum {
>   TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>   TCA_FLOWER_KEY_UDP_DST, /* be16 */
>  
> + TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
> + TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
> + TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
> + TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
> +
>   TCA_FLOWER_FLAGS,
>   TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>   TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
> @@ -518,6 +523,8 @@ enum {

I don't think you can do this without breaking UAPI, this changes the
value of TCA_FLOWER_FLAGS and all subsequent values in this
enumeration.
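As a minimal illustration of the point above (names here are made up, not
taken from pkt_cls.h): the position of an enumerator defines its UAPI value,
so inserting new attributes mid-enum renumbers everything after them.

/* Illustration with made-up names: inserting DEMO_NEW renumbers DEMO_FLAGS
 * from 1 to 2 and breaks every binary compiled against the old numbering.
 */
enum demo_old { DEMO_KEY, DEMO_FLAGS, __DEMO_OLD_MAX };
enum demo_new { DEMO2_KEY, DEMO2_NEW, DEMO2_FLAGS, __DEMO_NEW_MAX };

/* The UAPI-safe pattern is to append new attributes at the end, right before
 * the __*_MAX sentinel, so all existing values keep their numbers.
 */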


Re: [PATCH net-next 1/2] net: phy: use phy_id_mask value zero for exact match

2018-11-08 Thread Florian Fainelli
On 11/8/18 12:53 PM, Andrew Lunn wrote:
>>> Maybe we can find a clever way with a macro to specify only the PHY OUI
>>> and compute a suitable mask automatically?
>>>
>> I don't think so. For Realtek each driver is specific even to a model
>> revision (therefore mask 0xffffffff). Same applies to intel-xway.
>> In broadcom.c we have masks 0xfffffff0, so one entry per model, independent
>> of the revision number. There is no general rule.
>> Also we can't simply check for the first set bit to derive a mask.
> 
> I'm crystal ball gazing, but I think Florian was meaning something like
> 
> #define PHY_ID_UNIQUE(_id) \
>   .phy_id = (_id), \
>   .phy_id_mask = 0xffffffff
> 
> It is the boilerplate setting of .phy_id_mask which you don't like. This
> removes that boilerplate.

Your crystal ball gazing skills are good, that is what I meant, we could
also define another macro which does not match the revision bits, and
that would likely cover everything that is already out there.
-- 
Florian
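
A minimal sketch of what such a pair of helpers could look like (the macro
names are made up; only GENMASK() is an existing kernel macro):

/* match the full 32-bit PHYID, including the revision nibble */
#define PHY_ID_MATCH_EXACT(id)  .phy_id = (id), .phy_id_mask = GENMASK(31, 0)
/* match the model but ignore the 4 revision bits */
#define PHY_ID_MATCH_MODEL(id)  .phy_id = (id), .phy_id_mask = GENMASK(31, 4)

	/* hypothetical usage in a struct phy_driver initializer */
	{
		PHY_ID_MATCH_EXACT(0x001cc916),
		.name = "RTL8211F Gigabit Ethernet",
	},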


Re: [PATCH net-next 0/2] net: phy: use phy_id_mask value zero for exact match

2018-11-08 Thread David Miller
From: Heiner Kallweit 
Date: Wed, 7 Nov 2018 21:52:31 +0100

> A phy_id_mask value zero means every PHYID matches, therefore
> value zero isn't used. So we can safely redefine the semantics
> of value zero to mean "exact match". This allows to avoid some
> boilerplate code in PHY driver configs.
> 
> Realtek PHY driver is the first user of this change.

It looks like there will be some changes to this series, so I will
wait for the next version.
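
For reference, a minimal sketch of how the proposed "phy_id_mask == 0 means
exact match" rule could be applied in the driver-matching path (the function
name is made up; this is not the actual phylib matching code):

static bool phy_id_matches(u32 dev_phy_id, const struct phy_driver *drv)
{
	/* proposed semantics: mask 0 is treated as "compare all 32 bits" */
	u32 mask = drv->phy_id_mask ? drv->phy_id_mask : 0xffffffffu;

	return (dev_phy_id & mask) == (drv->phy_id & mask);
}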


Re: [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine

2018-11-08 Thread David Miller
From: Florian Fainelli 
Date: Thu, 8 Nov 2018 15:00:01 -0800

> On 11/8/18 2:58 PM, David Miller wrote:
>> From: Heiner Kallweit 
>> Date: Wed, 7 Nov 2018 20:41:52 +0100
>> 
>>> This patch series is based on two axioms:
>>>
>>> - During autoneg a PHY always reports the link being down
>>>
>>> - Info in clause 22/45 registers doesn't allow to differentiate between
>>>   these two states:
>>>   1. Link is physically down
>>>   2. A link partner is connected and PHY is autonegotiating
>>>   In both cases "link up" and "aneg finished" bits aren't set.
>>>   One consequence is that having separate states PHY_NOLINK and PHY_AN
>>>   isn't needed.
>>>
>>> By using these two axioms the state machine can be significantly
>>> simplified.
>> 
>> So how are we going to move forward on this?
>> 
>> Maybe we can apply this series and just watch carefully for any
>> problems that get reported or are found?
> 
> Given Heiner is always responsive and taking care of fixing what might
> be/have broken, no objections with me on that.

Great, I've applied this series to net-next then.

Thanks.


Re: [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine

2018-11-08 Thread Andrew Lunn
On Thu, Nov 08, 2018 at 03:00:01PM -0800, Florian Fainelli wrote:
> On 11/8/18 2:58 PM, David Miller wrote:
> > From: Heiner Kallweit 
> > Date: Wed, 7 Nov 2018 20:41:52 +0100
> > 
> >> This patch series is based on two axioms:
> >>
> >> - During autoneg a PHY always reports the link being down
> >>
> >> - Info in clause 22/45 registers doesn't allow to differentiate between
> >>   these two states:
> >>   1. Link is physically down
> >>   2. A link partner is connected and PHY is autonegotiating
> >>   In both cases "link up" and "aneg finished" bits aren't set.
> >>   One consequence is that having separate states PHY_NOLINK and PHY_AN
> >>   isn't needed.
> >>
> >> By using these two axioms the state machine can be significantly
> >> simplified.
> > 
> > So how are we going to move forward on this?
> > 
> > Maybe we can apply this series and just watch carefully for any
> > problems that get reported or are found?
> 
> Given Heiner is always responsive and taking care of fixing what might
> be/have broken, no objections with me on that.

Yes, lets try it.

 Andrew


Re: [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine

2018-11-08 Thread Florian Fainelli
On 11/8/18 2:58 PM, David Miller wrote:
> From: Heiner Kallweit 
> Date: Wed, 7 Nov 2018 20:41:52 +0100
> 
>> This patch series is based on two axioms:
>>
>> - During autoneg a PHY always reports the link being down
>>
>> - Info in clause 22/45 registers doesn't allow to differentiate between
>>   these two states:
>>   1. Link is physically down
>>   2. A link partner is connected and PHY is autonegotiating
>>   In both cases "link up" and "aneg finished" bits aren't set.
>>   One consequence is that having separate states PHY_NOLINK and PHY_AN
>>   isn't needed.
>>
>> By using these two axioms the state machine can be significantly
>> simplified.
> 
> So how are we going to move forward on this?
> 
> Maybe we can apply this series and just watch carefully for any
> problems that get reported or are found?

Given Heiner is always responsive and taking care of fixing what might
be/have broken, no objections with me on that.
-- 
Florian


Re: [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine

2018-11-08 Thread David Miller
From: Heiner Kallweit 
Date: Wed, 7 Nov 2018 20:41:52 +0100

> This patch series is based on two axioms:
> 
> - During autoneg a PHY always reports the link being down
> 
> - Info in clause 22/45 registers doesn't allow to differentiate between
>   these two states:
>   1. Link is physically down
>   2. A link partner is connected and PHY is autonegotiating
>   In both cases "link up" and "aneg finished" bits aren't set.
>   One consequence is that having separate states PHY_NOLINK and PHY_AN
>   isn't needed.
> 
> By using these two axioms the state machine can be significantly
> simplified.

So how are we going to move forward on this?

Maybe we can apply this series and just watch carefully for any
problems that get reported or are found?


Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-08 Thread Edward Cree
On 08/11/18 19:42, Alexei Starovoitov wrote:
> same link let's continue at 1pm PST. 
So, one thing we didn't really get onto was maps, and you mentioned that it
 wasn't really clear what I was proposing there.
What I have in mind comes in two parts:
1) map type.  A new BTF_KIND_MAP with metadata 'key_type', 'value_type'
 (both are type_ids referencing other BTF type records), describing the
 type "map from key_type to value_type".
2) record in the 'instances' table.  This would have a name_off (the
 name of the map), a type_id (pointing at a BTF_KIND_MAP in the 'types'
 table), and potentially also some indication of what symbol (from
 section 'maps') refers to this map.  This is pretty much the exact
 same metadata that a function in the 'instances' table has, the only
 differences being
 (a) function's type_id points at a BTF_KIND_FUNC record
 (b) function's symbol indication refers from .text section
 (c) in future functions may be nested inside other functions, whereas
 AIUI a map can't live inside a function.  (But a variable, which is
 the other thing that would want to go in an 'instances' table, can.)
So the 'instances' table record structure looks like

struct btf_instance {
    __u32 type_id; /* Type of object declared.  An index into type section */
    __u32 name_off; /* Name of object.  An offset into string section */
    __u32 parent; /* Containing object if any (else 0).  An index into instance 
section */
};

and we extend the BTF header:

struct btf_header {
    __u16   magic;
    __u8    version;
    __u8    flags;
    __u32   hdr_len;

    /* All offsets are in bytes relative to the end of this header */
    __u32   type_off;  /* offset of type section   */
    __u32   type_len;  /* length of type section   */
    __u32   str_off;   /* offset of string section */
    __u32   str_len;   /* length of string section */
    __u32   inst_off;  /* offset of instance section   */
    __u32   inst_len;  /* length of instance section   */
};

Then in the .BTF.ext section, we have both

struct bpf_func_info {
    __u32 prog_symbol; /* Index of symbol giving address of subprog */
    __u32 inst_id; /* Index into instance section */
}

struct bpf_map_info {
    __u32 map_symbol; /* Index of symbol creating this map */
    __u32 inst_id; /* Index into instance section */
}

(either living in different subsections, or in a single table with
 the addition of a kind field, or in a single table relying on the
 ultimately referenced type to distinguish funcs from maps).

Note that the name (in btf_instance) of a map or function need not
 match the name of the corresponding symbol; we use the .BTF.ext
 section to tie together btf_instance IDs and symbol IDs.  Then in
 the case of functions (subprogs), the prog_symbol can be looked
 up in the ELF symbol table to find the address (== insn_offset)
 of the subprog, as well as the section containing it (since that
 might not be .text).  Similarly in the case of maps the BTF info
 about the map is connected with the info in the maps section.

Now when the loader has munged this, what it passes to the kernel
 might not have map_symbol, but instead map_fd.  Instead of
 prog_symbol it will have whatever identifies the subprog in the
 blob of stuff it feeds to the kernel (so probably insn_offset).

All this would of course require a bit more compiler support than
 the current BPF_ANNOTATE_KV_PAIR, since that just causes the
 existing BTF machinery to declare a specially constructed struct
 type.  At the C level you could still have BPF_ANNOTATE_KV_PAIR
 and the 'bpf_map_foo' name, but then the compiler would
 recognise that and convert it into an instance record by looking
 up the name 'foo' in its "maps" section.  That way the special
 bpf_map_* handling (which ties map names to symbol names,
 also) would be entirely compiler-internal and not 'leak out' into
 the definition of the format.  Frontends for other languages
 which do possess a native map type (e.g. Python dict) might have
 other ways of indicating the key/value type of a map at source
 level (e.g. PEP 484) and could directly generate the appropriate
 BTF_KIND_MAP and bpf_map_info records rather than (as they would
 with the current design) having to encode the information as a
 struct bpf_map_foo type-definition.


While I realise the desire to concentrate on one topic at once, I
 think this question of maps should be discussed in tomorrow's
 call, since it is when we start having other kinds of instances
 besides functions that the advantages of my design become
 apparent, unifying the process of 'declaration' of functions,
 maps, and (eventually) variables while separating them all from
 the process of 'definition' of the types of all three.

Thank you for your continued patience with me.
-Ed
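
A sketch of what the kind-specific data for the proposed BTF_KIND_MAP
(point 1 above) could look like, following the convention of the existing
BTF kinds; this layout is hypothetical and was not settled in the thread:

/* would follow a struct btf_type whose kind is the proposed BTF_KIND_MAP */
struct btf_map {
    __u32 key_type;   /* type of the map key, an index into the type section   */
    __u32 value_type; /* type of the map value, an index into the type section */
};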


Re: [PATCH net-next 1/2] net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt

2018-11-08 Thread Heiner Kallweit
On 08.11.2018 23:33, Florian Fainelli wrote:
> On 11/8/18 1:55 PM, Heiner Kallweit wrote:
>> Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
>> using interrupts isn't possible if a driver defines neither
>> config_intr nor ack_interrupts callback. So we can replace checking
>> flag PHY_HAS_INTERRUPT with checking for these callbacks.
>>
>> Signed-off-by: Heiner Kallweit 
>> ---
>>  drivers/net/phy/phy_device.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index d165a2c82..33e51b955 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -2104,8 +2104,9 @@ static int phy_probe(struct device *dev)
>>  /* Disable the interrupt if the PHY doesn't support it
>>   * but the interrupt is still a valid one
>>   */
>> -if (!(phydrv->flags & PHY_HAS_INTERRUPT) &&
>> -phy_interrupt_is_valid(phydev))
>> + if (!phydrv->config_intr &&
>> + !phydrv->ack_interrupt &&
>> + phy_interrupt_is_valid(phydev))
>>  phydev->irq = PHY_POLL;
> 
> I would introduce an inline helper function which checks for
> drv->config_intr and config_ack_interrupt, that way if we ever have to
> introduce an additional function in the future, we just update the
> helper to check for that.
> 
> Other than that, LGTM
> 
OK, will add a helper and remove PHY_HAS_INTERRUPT from all drivers.
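
A minimal sketch of the helper being discussed, keeping the semantics of the
patch above (the name is made up and the final version may differ):

static bool phy_drv_supports_irq(const struct phy_driver *phydrv)
{
	return phydrv->config_intr || phydrv->ack_interrupt;
}

	/* phy_probe() would then read: */
	if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
		phydev->irq = PHY_POLL;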


Re: [PATCH net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-08 Thread Heiner Kallweit
On 08.11.2018 23:24, Andrew Lunn wrote:
> On Thu, Nov 08, 2018 at 10:36:33PM +0100, Heiner Kallweit wrote:
>> As a heritage from the very early days of phylib, member "interrupts" is
>> defined as u32 even though it's just a flag indicating whether interrupts
>> are enabled. So we can change it to a bitfield member. In addition, change
>> the code dealing with this member so that it's clear we're
>> dealing with a bool value.
>>
>> Signed-off-by: Heiner Kallweit 
>> ---
>> Actually this member isn't needed at all and could be replaced with
>> a parameter in phy_driver->config_intr. But this would mean an API
>> change, maybe I come up with a proposal later.
>> ---
>>  drivers/net/phy/phy.c |  2 +-
>>  include/linux/phy.h   | 10 +-
>>  2 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index dd5bff955..a5e6acfe6 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -115,7 +115,7 @@ static int phy_clear_interrupt(struct phy_device *phydev)
>>   *
>>   * Returns 0 on success or < 0 on error.
>>   */
>> -static int phy_config_interrupt(struct phy_device *phydev, u32 interrupts)
>> +static int phy_config_interrupt(struct phy_device *phydev, bool interrupts)
>>  {
>>  phydev->interrupts = interrupts;
>>  if (phydev->drv->config_intr)
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 5dd85c441..fc90af152 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -263,8 +263,8 @@ static inline struct mii_bus *devm_mdiobus_alloc(struct 
>> device *dev)
>>  void devm_mdiobus_free(struct device *dev, struct mii_bus *bus);
>>  struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr);
>>  
>> -#define PHY_INTERRUPT_DISABLED  0x0
>> -#define PHY_INTERRUPT_ENABLED   0x80000000
>> +#define PHY_INTERRUPT_DISABLED  0
>> +#define PHY_INTERRUPT_ENABLED   1
> 
> Hi Heiner
> 
> Since this is passed around as a bool, false/true would be better than
> 0/1.
> 
I thought about that before submitting the patch. I preferred to go with
0/1 because we assign this value to a 1-bit bitfield member which supports
only the values 0/1. So neither option is perfect IMO.

>   Andrew
> 
Heiner


Re: [PATCH net-next 0/2] net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt

2018-11-08 Thread Florian Fainelli
On 11/8/18 2:30 PM, Andrew Lunn wrote:
> On Thu, Nov 08, 2018 at 10:54:50PM +0100, Heiner Kallweit wrote:
>> Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
>> using interrupts isn't possible if a driver defines neither
>> config_intr nor ack_interrupts callback. So we can replace checking
>> flag PHY_HAS_INTERRUPT with checking for these callbacks.
>>
>> This allows to remove this flag from a lot of driver configs, let's
>> start with the Realtek driver.
> 
> Hi Heiner
> 
> Ideally, we should do this to all the drivers, not just one. If we
> leave PHY_HAS_INTERRUPT, people are going to use it. That then makes
> the realtek driver then different to all the other drivers, and at
> some point somebody will do something which breaks it because they
> don't know the realtek driver is special.

Agreed.
-- 
Florian


Re: [PATCH net-next 1/2] net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt

2018-11-08 Thread Florian Fainelli
On 11/8/18 1:55 PM, Heiner Kallweit wrote:
> Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
> using interrupts isn't possible if a driver defines neither
> config_intr nor ack_interrupts callback. So we can replace checking
> flag PHY_HAS_INTERRUPT with checking for these callbacks.
> 
> Signed-off-by: Heiner Kallweit 
> ---
>  drivers/net/phy/phy_device.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index d165a2c82..33e51b955 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2104,8 +2104,9 @@ static int phy_probe(struct device *dev)
>   /* Disable the interrupt if the PHY doesn't support it
>* but the interrupt is still a valid one
>*/
> - if (!(phydrv->flags & PHY_HAS_INTERRUPT) &&
> - phy_interrupt_is_valid(phydev))
> +  if (!phydrv->config_intr &&
> +  !phydrv->ack_interrupt &&
> +  phy_interrupt_is_valid(phydev))
>   phydev->irq = PHY_POLL;

I would introduce an inline helper function which checks for
drv->config_intr and config_ack_interrupt, that way if we ever have to
introduce an additional function in the future, we just update the
helper to check for that.

Other than that, LGTM
-- 
Florian


Re: [PATCH net-next 0/2] net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt

2018-11-08 Thread Andrew Lunn
On Thu, Nov 08, 2018 at 10:54:50PM +0100, Heiner Kallweit wrote:
> Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
> using interrupts isn't possible if a driver defines neither
> config_intr nor ack_interrupts callback. So we can replace checking
> flag PHY_HAS_INTERRUPT with checking for these callbacks.
> 
> This allows to remove this flag from a lot of driver configs, let's
> start with the Realtek driver.

Hi Heiner

Ideally, we should do this to all the drivers, not just one. If we
leave PHY_HAS_INTERRUPT, people are going to use it. That then makes
the realtek driver different from all the other drivers, and at
some point somebody will do something which breaks it because they
don't know the realtek driver is special.

Andrew


[Patch net] ip: fix csum_sub() with csum_block_sub()

2018-11-08 Thread Cong Wang
When subtracting the checksum of a block of data,
csum_block_sub() must be used to respect the offset.

We learned this lesson from both commit d55bef5059dd
("net: fix pskb_trim_rcsum_slow() with odd trim offset") and
commit d48051c5b837 ("net/mlx5e: fix csum adjustments caused by RXFCS").

Fixes: ca4ef4574f1e ("ip: fix IP_CHECKSUM handling")
Cc: Paolo Abeni 
Cc: Eric Dumazet 
Cc: Willem de Bruijn 
Signed-off-by: Cong Wang 
---
 net/ipv4/ip_sockglue.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index fffcc130900e..0d69f0823c08 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -122,7 +122,10 @@ static void ip_cmsg_recv_checksum(struct msghdr *msg, 
struct sk_buff *skb,
 
if (offset != 0) {
int tend_off = skb_transport_offset(skb) + tlen;
-   csum = csum_sub(csum, skb_checksum(skb, tend_off, offset, 0));
+
+   csum = csum_block_sub(csum,
+ skb_checksum(skb, tend_off, offset, 0),
+ tend_off);
}
 
put_cmsg(msg, SOL_IP, IP_CHECKSUM, sizeof(__wsum), &csum);
-- 
2.19.1
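
For context, a simplified model of what csum_block_add()/csum_block_sub() do
with the offset argument (a sketch of the include/net/checksum.h helpers, not
necessarily the exact current code): when the block starts at an odd byte
offset, its 16-bit ones'-complement sum is byte-swapped relative to the
enclosing checksum, so it has to be rotated before being added or subtracted.

static __wsum csum_block_add(__wsum csum, __wsum csum2, int offset)
{
	u32 sum = (__force u32)csum2;

	if (offset & 1)		/* partial sum started on an odd byte: realign it */
		sum = ror32(sum, 8);

	return csum_add(csum, (__force __wsum)sum);
}

static __wsum csum_block_sub(__wsum csum, __wsum csum2, int offset)
{
	return csum_block_add(csum, ~csum2, offset);
}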



Re: [PATCH net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-08 Thread Andrew Lunn
On Thu, Nov 08, 2018 at 10:36:33PM +0100, Heiner Kallweit wrote:
> As a heritage from the very early days of phylib, member "interrupts" is
> defined as u32 even though it's just a flag indicating whether interrupts
> are enabled. So we can change it to a bitfield member. In addition, change
> the code dealing with this member so that it's clear we're
> dealing with a bool value.
> 
> Signed-off-by: Heiner Kallweit 
> ---
> Actually this member isn't needed at all and could be replaced with
> a parameter in phy_driver->config_intr. But this would mean an API
> change, maybe I come up with a proposal later.
> ---
>  drivers/net/phy/phy.c |  2 +-
>  include/linux/phy.h   | 10 +-
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index dd5bff955..a5e6acfe6 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -115,7 +115,7 @@ static int phy_clear_interrupt(struct phy_device *phydev)
>   *
>   * Returns 0 on success or < 0 on error.
>   */
> -static int phy_config_interrupt(struct phy_device *phydev, u32 interrupts)
> +static int phy_config_interrupt(struct phy_device *phydev, bool interrupts)
>  {
>   phydev->interrupts = interrupts;
>   if (phydev->drv->config_intr)
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 5dd85c441..fc90af152 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -263,8 +263,8 @@ static inline struct mii_bus *devm_mdiobus_alloc(struct 
> device *dev)
>  void devm_mdiobus_free(struct device *dev, struct mii_bus *bus);
>  struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr);
>  
> -#define PHY_INTERRUPT_DISABLED   0x0
> -#define PHY_INTERRUPT_ENABLED    0x80000000
> +#define PHY_INTERRUPT_DISABLED   0
> +#define PHY_INTERRUPT_ENABLED1

Hi Heiner

Since this is passed around as a bool, false/true would be better than
0/1.

Andrew


Re: [PATCH 04/20] octeontx2-af: NPC MCAM entry alloc/free support

2018-11-08 Thread David Miller
From: sunil.kovv...@gmail.com
Date: Fri,  9 Nov 2018 00:05:45 +0530

> +int rvu_mbox_handler_NPC_MCAM_ALLOC_ENTRY(struct rvu *rvu,
> +   struct npc_mcam_alloc_entry_req *req,
> +   struct npc_mcam_alloc_entry_rsp *rsp);
> +int rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(struct rvu *rvu,
> +  struct npc_mcam_free_entry_req *req,
> +  struct msg_rsp *rsp);

Please don't use capitalized letters or StUdLyCaPs for function and variable
names.

Keep capitalized letters for CPP macros.

This is how programmers visually can see if something is a real C declaration
or some CPP stuff.


Re: [Patch net] ip: fix csum_sub() with csum_block_sub()

2018-11-08 Thread Cong Wang
On Thu, Nov 8, 2018 at 2:17 PM Eric Dumazet  wrote:
>
> On Thu, Nov 8, 2018 at 2:04 PM Cong Wang  wrote:
> >
> > When subtracting the checksum of a block of data,
> > csum_block_sub() must be used to respect the offset.
> >
> > We learned this lesson from both commit d55bef5059dd
> > ("net: fix pskb_trim_rcsum_slow() with odd trim offset") and
> > commit d48051c5b837 ("net/mlx5e: fix csum adjustments caused by RXFCS").
> >
> > Fixes: ca4ef4574f1e ("ip: fix IP_CHECKSUM handling")
>
> I do not believe you fix any bug here...
>
> I do not know of any inet protocol having odd header sizes.

That offset is payload offset, but yeah, the payload offset must be
aligned some way. Good point!

Let's drop it.


Re: [Patch net] ip: fix csum_sub() with csum_block_sub()

2018-11-08 Thread Eric Dumazet
On Thu, Nov 8, 2018 at 2:04 PM Cong Wang  wrote:
>
> When subtracting the checksum of a block of data,
> csum_block_sub() must be used to respect the offset.
>
> We learned this lesson from both commit d55bef5059dd
> ("net: fix pskb_trim_rcsum_slow() with odd trim offset") and
> commit d48051c5b837 ("net/mlx5e: fix csum adjustments caused by RXFCS").
>
> Fixes: ca4ef4574f1e ("ip: fix IP_CHECKSUM handling")

I do not believe you fix any bug here...

I do not know of any inet protocol having odd header sizes.


Re: [PATCH net-next 0/4] Remove VLAN_TAG_PRESENT from drivers

2018-11-08 Thread Michał Mirosław
On Thu, Nov 08, 2018 at 08:50:05PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 08, 2018 at 06:44:46PM +0100, Michał Mirosław wrote:
> > This series removes VLAN_TAG_PRESENT use from network drivers in
> > preparation to removing its special meaning.
> Can you please give an extra explanation why it is removed?
> Such series come out-of-blue, for people who are not following
> netdev mailing list closely (drivers/infiniband/*).

This is one of the steps to remove VLAN_TAG_PRESENT overlap with CFI/DEI
bit of VLAN tag. Currently this overlap causes Linux kernel to always
clear CFI/DEI in packets.

There is skb_vlan_tag_present() that drivers should use to check if
the tag in skb is valid.

Best Regards,
Michał Mirosław
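
A minimal sketch of the accessor-based pattern drivers should use instead of
testing VLAN_TAG_PRESENT directly (the function name is made up and the
surrounding TX path is omitted):

#include <linux/if_vlan.h>

static void example_fill_tx_vlan(struct sk_buff *skb, u16 *desc_vlan_tci)
{
	if (skb_vlan_tag_present(skb))
		*desc_vlan_tci = skb_vlan_tag_get(skb);	/* VID + PCP (+ DEI) */
	else
		*desc_vlan_tci = 0;
}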


[Patch net-next v2] net: move __skb_checksum_complete*() to skbuff.c

2018-11-08 Thread Cong Wang
__skb_checksum_complete_head() and __skb_checksum_complete()
are both declared in skbuff.h, they fit better in skbuff.c
than datagram.c.

Cc: Stefano Brivio 
Signed-off-by: Cong Wang 
---
 net/core/datagram.c | 43 ---
 net/core/skbuff.c   | 43 +++
 2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 57f3a6fcfc1e..07983b90d2bd 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -728,49 +728,6 @@ static int skb_copy_and_csum_datagram(const struct sk_buff 
*skb, int offset,
return -EFAULT;
 }
 
-__sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
-{
-   __sum16 sum;
-
-   sum = csum_fold(skb_checksum(skb, 0, len, skb->csum));
-   if (likely(!sum)) {
-   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
-   !skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
-   }
-   if (!skb_shared(skb))
-   skb->csum_valid = !sum;
-   return sum;
-}
-EXPORT_SYMBOL(__skb_checksum_complete_head);
-
-__sum16 __skb_checksum_complete(struct sk_buff *skb)
-{
-   __wsum csum;
-   __sum16 sum;
-
-   csum = skb_checksum(skb, 0, skb->len, 0);
-
-   /* skb->csum holds pseudo checksum */
-   sum = csum_fold(csum_add(skb->csum, csum));
-   if (likely(!sum)) {
-   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
-   !skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
-   }
-
-   if (!skb_shared(skb)) {
-   /* Save full packet checksum */
-   skb->csum = csum;
-   skb->ip_summed = CHECKSUM_COMPLETE;
-   skb->csum_complete_sw = 1;
-   skb->csum_valid = !sum;
-   }
-
-   return sum;
-}
-EXPORT_SYMBOL(__skb_checksum_complete);
-
 /**
  * skb_copy_and_csum_datagram_msg - Copy and checksum skb to user iovec.
  * @skb: skbuff
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b4ee5c8b928f..5cb4b3440153 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2645,6 +2645,49 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, 
int offset,
 }
 EXPORT_SYMBOL(skb_copy_and_csum_bits);
 
+__sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
+{
+   __sum16 sum;
+
+   sum = csum_fold(skb_checksum(skb, 0, len, skb->csum));
+   if (likely(!sum)) {
+   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
+   !skb->csum_complete_sw)
+   netdev_rx_csum_fault(skb->dev);
+   }
+   if (!skb_shared(skb))
+   skb->csum_valid = !sum;
+   return sum;
+}
+EXPORT_SYMBOL(__skb_checksum_complete_head);
+
+__sum16 __skb_checksum_complete(struct sk_buff *skb)
+{
+   __wsum csum;
+   __sum16 sum;
+
+   csum = skb_checksum(skb, 0, skb->len, 0);
+
+   /* skb->csum holds pseudo checksum */
+   sum = csum_fold(csum_add(skb->csum, csum));
+   if (likely(!sum)) {
+   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
+   !skb->csum_complete_sw)
+   netdev_rx_csum_fault(skb->dev);
+   }
+
+   if (!skb_shared(skb)) {
+   /* Save full packet checksum */
+   skb->csum = csum;
+   skb->ip_summed = CHECKSUM_COMPLETE;
+   skb->csum_complete_sw = 1;
+   skb->csum_valid = !sum;
+   }
+
+   return sum;
+}
+EXPORT_SYMBOL(__skb_checksum_complete);
+
 static __wsum warn_crc32c_csum_update(const void *buff, int len, __wsum sum)
 {
net_warn_ratelimited(
-- 
2.19.1



[PATCH net-next 2/2] net: phy: realtek: remove flag PHY_HAS_INTERRUPT from driver configs

2018-11-08 Thread Heiner Kallweit
Now that flag PHY_HAS_INTERRUPT has been replaced with a check for
callbacks config_intr and ack_interrupt, we can remove setting this
flag from driver configs.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/realtek.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index abff4cdc9..3985b4a4d 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -216,12 +216,10 @@ static struct phy_driver realtek_drvs[] = {
.phy_id = 0x8201,
.name   = "RTL8201CP Ethernet",
.features   = PHY_BASIC_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
}, {
.phy_id = 0x001cc816,
.name   = "RTL8201F Fast Ethernet",
.features   = PHY_BASIC_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
.ack_interrupt  = &rtl8201_ack_interrupt,
.config_intr= &rtl8201_config_intr,
.suspend= genphy_suspend,
@@ -239,7 +237,6 @@ static struct phy_driver realtek_drvs[] = {
.phy_id = 0x001cc912,
.name   = "RTL8211B Gigabit Ethernet",
.features   = PHY_GBIT_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
.ack_interrupt  = &rtl821x_ack_interrupt,
.config_intr= &rtl8211b_config_intr,
.read_mmd   = &genphy_read_mmd_unsupported,
@@ -257,7 +254,6 @@ static struct phy_driver realtek_drvs[] = {
.phy_id = 0x001cc914,
.name   = "RTL8211DN Gigabit Ethernet",
.features   = PHY_GBIT_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
.ack_interrupt  = rtl821x_ack_interrupt,
.config_intr= rtl8211e_config_intr,
.suspend= genphy_suspend,
@@ -266,7 +262,6 @@ static struct phy_driver realtek_drvs[] = {
.phy_id = 0x001cc915,
.name   = "RTL8211E Gigabit Ethernet",
.features   = PHY_GBIT_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
.ack_interrupt  = &rtl821x_ack_interrupt,
.config_intr= &rtl8211e_config_intr,
.suspend= genphy_suspend,
@@ -275,7 +270,6 @@ static struct phy_driver realtek_drvs[] = {
.phy_id = 0x001cc916,
.name   = "RTL8211F Gigabit Ethernet",
.features   = PHY_GBIT_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
.config_init= &rtl8211f_config_init,
.ack_interrupt  = &rtl8211f_ack_interrupt,
.config_intr= &rtl8211f_config_intr,
@@ -287,7 +281,6 @@ static struct phy_driver realtek_drvs[] = {
.phy_id = 0x001cc961,
.name   = "RTL8366RB Gigabit Ethernet",
.features   = PHY_GBIT_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
.config_init= &rtl8366rb_config_init,
.suspend= genphy_suspend,
.resume = genphy_resume,
-- 
2.19.1




[PATCH net-next 1/2] net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt

2018-11-08 Thread Heiner Kallweit
Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
using interrupts isn't possible if a driver defines neither
config_intr nor ack_interrupts callback. So we can replace checking
flag PHY_HAS_INTERRUPT with checking for these callbacks.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/phy_device.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d165a2c82..33e51b955 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2104,8 +2104,9 @@ static int phy_probe(struct device *dev)
/* Disable the interrupt if the PHY doesn't support it
 * but the interrupt is still a valid one
 */
-   if (!(phydrv->flags & PHY_HAS_INTERRUPT) &&
-   phy_interrupt_is_valid(phydev))
+if (!phydrv->config_intr &&
+!phydrv->ack_interrupt &&
+phy_interrupt_is_valid(phydev))
phydev->irq = PHY_POLL;
 
if (phydrv->flags & PHY_IS_INTERNAL)
-- 
2.19.1




Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector

2018-11-08 Thread Jakub Kicinski
On Thu, 8 Nov 2018 13:29:40 -0800, Stanislav Fomichev wrote:
> On 11/08, Jakub Kicinski wrote:
> > On Wed,  7 Nov 2018 21:39:57 -0800, Stanislav Fomichev wrote:  
> > > This commit adds support for loading/attaching/detaching flow
> > > dissector program. The structure of the flow dissector program is
> > > assumed to be the same as in the selftests:
> > > 
> > > * flow_dissector section with the main entry point
> > > * a bunch of tail call progs
> > > * a jmp_table map that is populated with the tail call progs  
> > 
> > Could you split the loadall changes and the flow_dissector changes into
> > two separate patches?  
> Sure, will do, but let's first agree on the semantical differences of
> load vs loadall.
> 
> So far *load* actually loads _all_ progs (via bpf_object__load), but pins
> only the first program. Is that what we want? I wonder whether the
> assumption there was that there is only single program in the object.
> Should we load only the first program in *load*?
> 
> If we add *loadall*, then the difference would be:
> *load*:
>   * loads all maps and only the first program, pins only the first
> program
> *loadall*:
>   * loads all maps and all programs, pins everything (maps and programs)
> 
> Is this the expected behavior?

Loading all programs and maps for "load" is just a libbpf limitation we
can remove at some point.


[PATCH net-next 0/2] net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt

2018-11-08 Thread Heiner Kallweit
Flag PHY_HAS_INTERRUPT is used only here for this small check. I think
using interrupts isn't possible if a driver defines neither
config_intr nor ack_interrupts callback. So we can replace checking
flag PHY_HAS_INTERRUPT with checking for these callbacks.

This allows to remove this flag from a lot of driver configs, let's
start with the Realtek driver.

Heiner Kallweit (2):
  net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and 
ack_interrupt
  net: phy: realtek: remove flag PHY_HAS_INTERRUPT from driver configs

 drivers/net/phy/phy_device.c | 5 +++--
 drivers/net/phy/realtek.c| 7 ---
 2 files changed, 3 insertions(+), 9 deletions(-)

-- 
2.19.1



Re: [net-next 06/12] i40e/ixgbe/igb: fail on new WoL flag setting WAKE_MAGICSECURE

2018-11-08 Thread Jeff Kirsher
On Thu, 2018-11-08 at 07:42 +0100, Michal Kubecek wrote:
> On Thu, Nov 08, 2018 at 06:05:26AM +, Kevin Easton wrote:
> > On Wed, Nov 07, 2018 at 02:48:24PM -0800, Jeff Kirsher wrote:
> > > From: Todd Fujinaka 
> > > 
> > > There's a new flag for setting WoL filters that is only
> > > enabled on one manufacturer's NICs, and it's not ours. Fail
> > > with EOPNOTSUPP.
> > > 
> > > Signed-off-by: Todd Fujinaka 
> > > Tested-by: Andrew Bowers 
> > > Signed-off-by: Jeff Kirsher 
> > > ---
> > >  drivers/net/ethernet/intel/i40e/i40e_ethtool.c   | 3 ++-
> > >  drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 3 ++-
> > >  3 files changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> > > b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> > > index 9f8464f80783..9c1211ad2c6b 100644
> > > --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> > > @@ -2377,7 +2377,8 @@ static int i40e_set_wol(struct net_device
> > > *netdev, struct ethtool_wolinfo *wol)
> > >   return -EOPNOTSUPP;
> > >  
> > >   /* only magic packet is supported */
> > > - if (wol->wolopts && (wol->wolopts != WAKE_MAGIC))
> > > + if (wol->wolopts && (wol->wolopts != WAKE_MAGIC)
> > > +   | (wol->wolopts != WAKE_FILTER))
> > >   return -EOPNOTSUPP;
> > 
> > This doesn't look right.  WAKE_MAGIC and WAKE_FILTER are distinct, so
> > 
> > (wol->wolopts != WAKE_MAGIC) | (wol->wolopts != WAKE_FILTER)
> > 
> > will always be 1.
> 
> Right. Also, using "|" with logical values is rather confusing. While
> the result works as expected, its priority is higher than priority of
> && (which would not be true for ||), making the code counterintuitive.
> 
> BtW, the patch subject is also wrong, the newly added flag it is dealing
> with is WAKE_FILTER, not WAKE_MAGICSECURE.

After looking into this more, this does appear to be incorrect.  The
author of this change is currently out on vacation, so once he gets back, I
will address both Kevin's and your concerns.

> 
> > It looks like the existing test in this driver was fine - it *only*
> > accepted wol->wolopts of either 0 or WAKE_MAGIC, it was already
> > rejecting everything else including WAKE_FILTER.
> 
> Another way to write the check would be
> 
>   if (wol->wolopts & ~WAKE_MAGIC)
> 
> > > diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> > > b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> > > index 5acf3b743876..c57671068245 100644
> > > --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> > > @@ -2113,7 +2113,7 @@ static int igb_set_wol(struct net_device
> > > *netdev, struct ethtool_wolinfo *wol)
> > >  {
> > >   struct igb_adapter *adapter = netdev_priv(netdev);
> > >  
> > > - if (wol->wolopts & (WAKE_ARP | WAKE_MAGICSECURE))
> > > + if (wol->wolopts & (WAKE_ARP | WAKE_MAGICSECURE | WAKE_FILTER))
> > >   return -EOPNOTSUPP;
> > >  
> > >   if (!(adapter->flags & IGB_FLAG_WOL_SUPPORTED))
> 
> I would also suggest taking the opposite approach here, i.e. listing the
> flags which _are_ supported so that we don't have to update the code if
> another wol flag is added (or userspace sends an invalid one):
> 
> #define SUPPORTED_WOL_MODES \
>   (WAKE_PHY | WAKE_UCAST | WAKE_MCAST | WAKE_BCAST | WAKE_MAGIC)
> ...
>   if (wol->wolopts & ~SUPPORTED_WOL_MODES)
>   return -EOPNOTSUPP;
> 
> 
> > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> > > b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> > > index 732b1e6ecc43..acba067cc15a 100644
> > > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> > > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> > > @@ -2206,7 +2206,8 @@ static int ixgbe_set_wol(struct net_device
> > > *netdev, struct ethtool_wolinfo *wol)
> > >  {
> > >   struct ixgbe_adapter *adapter = netdev_priv(netdev);
> > >  
> > > - if (wol->wolopts & (WAKE_PHY | WAKE_ARP | WAKE_MAGICSECURE))
> > > + if (wol->wolopts & (WAKE_PHY | WAKE_ARP | WAKE_MAGICSECURE |
> > > + WAKE_FILTER))
> > >   return -EOPNOTSUPP;
> > >  
> > >   if (ixgbe_wol_exclusion(adapter, wol))
> 
> ...and here.
> 
> Michal Kubecek





Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector

2018-11-08 Thread Jakub Kicinski
On Thu, 8 Nov 2018 13:25:39 -0800, Stanislav Fomichev wrote:
> > > > +   goto err_close_obj;
> > > > +   }
> > > > +
> > > > const char *sec_name = bpf_program__title(prog, false);
> > > >   
> > > > err = libbpf_prog_type_by_name(sec_name, 
> > > > _type,
> > > > @@ -936,8 +958,13 @@ static int do_load(int argc, char **argv)
> > > > goto err_close_obj;
> > > > }
> > > > }
> > > > -   bpf_program__set_type(prog, attr.prog_type);
> > > > -   bpf_program__set_expected_attach_type(prog, 
> > > > expected_attach_type);
> > > > +
> > > > +   bpf_object__for_each_program(pos, obj) {
> > > > +   bpf_program__set_ifindex(pos, ifindex);
> > > > +   bpf_program__set_type(pos, attr.prog_type);
> > > > +   bpf_program__set_expected_attach_type(pos,
> > > > + 
> > > > expected_attach_type);
> > > > +   }
> > > 
> > > I still believe you can have programs of different types here, and be 
> > > able to load them. I tried it and managed to have it working fine. If no 
> > > type is provided from command line we can retrieve types for each 
> > > program from its section name. If a type is provided on the command 
> > > line, we can do the same, but I am not sure we should do it, or impose 
> > > that type for all programs instead.  
> > 
> > attr->prog_type is one per object, though.  How do we set that one?  
> Isn't it used only in __bpf_object__open_xattr to require/not-require kernel
> version in the bpf prog?
> 
> It will probably work quite nicely for both of our options:
> 
> * type not specified: prog_type = BPF_PROG_TYPE_UNSPEC, need kernel
>   version (over cautious?)
> * type specified (and applied to all progs): using bpf_prog_type__needs_kver
>   to require/not requqire kernel version

Right, but they you can't infer it from the program name, since there's
multiple.


Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector

2018-11-08 Thread Jakub Kicinski
On Thu, 8 Nov 2018 13:20:12 -0800, Stanislav Fomichev wrote:
> On 11/08, Jakub Kicinski wrote:
> > On Thu, 8 Nov 2018 18:21:24 +, Quentin Monnet wrote:  
> > > >>> @@ -79,8 +82,11 @@ DESCRIPTION
> > > >>> contain a dot character ('.'), which is reserved for 
> > > >>> future
> > > >>> extensions of *bpffs*.
> > > >>> - **bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** 
> > > >>> {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> > > >>> + **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** 
> > > >>> *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** 
> > > >>> *NAME*]
> > > >>> Load bpf program from binary *OBJ* and pin as *FILE*.
> > > >>> +   **bpftool prog load** will pin only the first bpf 
> > > >>> program
> > > >>> +   from the *OBJ*, **bpftool prog loadall** will pin all 
> > > >>> maps
> > > >>> +   and programs from the *OBJ*.
> > > >>
> > > >> This could be improved regarding maps: with "bpftool prog load" I 
> > > >> think we
> > > >> also load and pin all maps, but your description implies this is only 
> > > >> the
> > > >> case with "loadall"
> > > > I don't think we pin any maps with `bpftool prog load`, we certainly 
> > > > load
> > > > them, but we don't pin any afaict. Can you point me to the code where we
> > > > pin the maps?
> > > > 
> > > 
> > > My bad. I read "pin" but thought "load". It does not pin them indeed,
> > > sorry about that.  
> > 
> > Right, but I don't see much reason why prog loadall should pin maps.  
> It does seem convenient to have an option to pin everything, so we
> don't require anything else to find the ids of the maps.
> If we are pinning all progs, might as well pin the maps, why not? See
> the example in my commit message, where I just hard-code
> the expected map name. Convenient :-)

Sure.  I can see how it's convenient for your use case.  A lot of people
will be very used to finding the maps themselves, because when a program is
loaded with the iproute2 tools neither programs nor maps get pinned.

Please add a "pinmaps MAP_DIR" optional parameter as a separate patch.

> > The reason to pin program(s) is to hold some reference and to be able
> > to find them.  Since we have the programs pinned we should be able to
> > find their maps with relative ease.
> > 
> > $ bpftool prog show pinned /sys/fs/bpf/prog
> > 7: cgroup_skb  tag 2a142ef67aaad174  gpl
> > loaded_at 2018-11-08T11:02:25-0800  uid 0
> > xlated 296B  jited 229B  memlock 4096B  map_ids 6,7
> > 
> > possibly:
> > 
> > $ bpftool -j prog show pinned /sys/fs/bpf/prog | jq '.map_ids[0]'
> > 6
> > 
> > Moreover, I think program and map names may collide making ELFs
> > unloadable..  
> It doesn't sound like a big problem, sounds like a constraint we can live
> with.



[PATCH net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-08 Thread Heiner Kallweit
As a heritage from the very early days of phylib, member "interrupts" is
defined as u32 even though it's just a flag indicating whether interrupts
are enabled. So we can change it to a bitfield member. In addition, change
the code dealing with this member so that it's clear we're
dealing with a bool value.

Signed-off-by: Heiner Kallweit 
---
Actually this member isn't needed at all and could be replaced with
a parameter in phy_driver->config_intr. But this would mean an API
change, maybe I come up with a proposal later.
---
 drivers/net/phy/phy.c |  2 +-
 include/linux/phy.h   | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index dd5bff955..a5e6acfe6 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -115,7 +115,7 @@ static int phy_clear_interrupt(struct phy_device *phydev)
  *
  * Returns 0 on success or < 0 on error.
  */
-static int phy_config_interrupt(struct phy_device *phydev, u32 interrupts)
+static int phy_config_interrupt(struct phy_device *phydev, bool interrupts)
 {
phydev->interrupts = interrupts;
if (phydev->drv->config_intr)
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5dd85c441..fc90af152 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -263,8 +263,8 @@ static inline struct mii_bus *devm_mdiobus_alloc(struct 
device *dev)
 void devm_mdiobus_free(struct device *dev, struct mii_bus *bus);
 struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr);
 
-#define PHY_INTERRUPT_DISABLED 0x0
-#define PHY_INTERRUPT_ENABLED  0x80000000
+#define PHY_INTERRUPT_DISABLED 0
+#define PHY_INTERRUPT_ENABLED  1
 
 /* PHY state machine states:
  *
@@ -410,6 +410,9 @@ struct phy_device {
/* The most recently read link state */
unsigned link:1;
 
+   /* Interrupts are enabled */
+   unsigned interrupts:1;
+
enum phy_state state;
 
u32 dev_flags;
@@ -425,9 +428,6 @@ struct phy_device {
int pause;
int asym_pause;
 
-   /* Enabled Interrupts */
-   u32 interrupts;
-
/* Union of PHY and Attached devices' supported modes */
/* See mii.h for more info */
u32 supported;
-- 
2.19.1



Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector

2018-11-08 Thread Stanislav Fomichev
On 11/08, Jakub Kicinski wrote:
> On Wed,  7 Nov 2018 21:39:57 -0800, Stanislav Fomichev wrote:
> > This commit adds support for loading/attaching/detaching flow
> > dissector program. The structure of the flow dissector program is
> > assumed to be the same as in the selftests:
> > 
> > * flow_dissector section with the main entry point
> > * a bunch of tail call progs
> > * a jmp_table map that is populated with the tail call progs
> 
> Could you split the loadall changes and the flow_dissector changes into
> two separate patches?
Sure, will do, but let's first agree on the semantic differences of
load vs loadall.

So far *load* actually loads _all_ progs (via bpf_object__load), but pins
only the first program. Is that what we want? I wonder whether the
assumption there was that there is only a single program in the object.
Should we load only the first program in *load*?

If we add *loadall*, then the difference would be:
*load*:
  * loads all maps and only the first program, pins only the first
program
*loadall*:
  * loads all maps and all programs, pins everything (maps and programs)

Is this the expected behavior?


Re: [PATCH bpf-next] bpftool: Improve handling of ENOENT on map dumps

2018-11-08 Thread Jakub Kicinski
On Thu,  8 Nov 2018 13:00:07 -0800, David Ahern wrote:
> From: David Ahern 
> 
> bpftool output is not user friendly when dumping a map with only a few
> populated entries:
> 
> $ bpftool map
> 1: devmap  name tx_devmap  flags 0x0
> key 4B  value 4B  max_entries 64  memlock 4096B
> 2: array  name tx_idxmap  flags 0x0
> key 4B  value 4B  max_entries 64  memlock 4096B
> 
> $ bpftool map dump id 1
> key:
> 00 00 00 00
> value:
> No such file or directory
> key:
> 01 00 00 00
> value:
> No such file or directory
> key:
> 02 00 00 00
> value:
> No such file or directory
> key: 03 00 00 00  value: 03 00 00 00
> 
> Handle ENOENT by keeping the line format sane and dumping
> "<no entry>" for the value
> 
> $ bpftool map dump id 1
> key: 00 00 00 00  value: <no entry>
> key: 01 00 00 00  value: <no entry>
> key: 02 00 00 00  value: <no entry>
> key: 03 00 00 00  value: 03 00 00 00
> ...
> 
> Signed-off-by: David Ahern 

Seems good.  I wonder why "fd" maps report all indexes in get_next..

Acked-by: Jakub Kicinski 

> Alternatively, could just omit the value, so:
> key: 00 00 00 00  value:
> key: 01 00 00 00  value:
> key: 02 00 00 00  value:
> key: 03 00 00 00  value: 03 00 00 00
> 
>  tools/bpf/bpftool/map.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index 101b8a881225..1f0060644e0c 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -383,7 +383,10 @@ static void print_entry_plain(struct bpf_map_info *info, 
> unsigned char *key,
>   printf(single_line ? "  " : "\n");
>  
>   printf("value:%c", break_names ? '\n' : ' ');
> - fprint_hex(stdout, value, info->value_size, " ");
> + if (value)
> + fprint_hex(stdout, value, info->value_size, " ");
> + else
> + printf("<no entry>");
>  
>   printf("\n");
>   } else {
> @@ -398,8 +401,12 @@ static void print_entry_plain(struct bpf_map_info *info, 
> unsigned char *key,
>   for (i = 0; i < n; i++) {
>   printf("value (CPU %02d):%c",
>  i, info->value_size > 16 ? '\n' : ' ');
> - fprint_hex(stdout, value + i * step,
> -info->value_size, " ");
> + if (value) {
> + fprint_hex(stdout, value + i * step,
> +info->value_size, " ");
> + } else {
> + printf("<no entry>");
> + }

nit: in other places you don't add { }

>   printf("\n");
>   }
>   }
> @@ -731,7 +738,11 @@ static int dump_map_elem(int fd, void *key, void *value,
>   jsonw_string_field(json_wtr, "error", strerror(lookup_errno));
>   jsonw_end_object(json_wtr);
>   } else {
> - print_entry_error(map_info, key, strerror(lookup_errno));
> + if (errno == ENOENT)
> + print_entry_plain(map_info, key, NULL);
> + else
> + print_entry_error(map_info, key,
> +   strerror(lookup_errno));
>   }
>  
>   return 0;



Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector

2018-11-08 Thread Stanislav Fomichev
On 11/08, Jakub Kicinski wrote:
> On Thu, 8 Nov 2018 11:16:43 +, Quentin Monnet wrote:
> > > - bpf_program__set_ifindex(prog, ifindex);
> > >   if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
> > > + if (!prog) {
> > > + p_err("can not guess program type when loading all 
> > > programs\n");
> 
> No new lines in p_err(), beaks JSON.
Thanks, I'll probably remove that altogether and do auto inference of
prog_type from the section name here.

> > > + goto err_close_obj;
> > > + }
> > > +
> > >   const char *sec_name = bpf_program__title(prog, false);
> > >   
> > >   err = libbpf_prog_type_by_name(sec_name, 
> > > _type,
> > > @@ -936,8 +958,13 @@ static int do_load(int argc, char **argv)
> > >   goto err_close_obj;
> > >   }
> > >   }
> > > - bpf_program__set_type(prog, attr.prog_type);
> > > - bpf_program__set_expected_attach_type(prog, expected_attach_type);
> > > +
> > > + bpf_object__for_each_program(pos, obj) {
> > > + bpf_program__set_ifindex(pos, ifindex);
> > > + bpf_program__set_type(pos, attr.prog_type);
> > > + bpf_program__set_expected_attach_type(pos,
> > > +   expected_attach_type);
> > > + }  
> > 
> > I still believe you can have programs of different types here, and be 
> > able to load them. I tried it and managed to have it working fine. If no 
> > type is provided from command line we can retrieve types for each 
> > program from its section name. If a type is provided on the command 
> > line, we can do the same, but I am not sure we should do it, or impose 
> > that type for all programs instead.
> 
> attr->prog_type is one per object, though.  How do we set that one?
Isn't it used only in __bpf_object__open_xattr to require/not-require kernel
version in the bpf prog?

It will probably work quite nicely for both of our options:

* type not specified: prog_type = BPF_PROG_TYPE_UNSPEC, need kernel
  version (over cautious?)
* type specified (and applied to all progs): using bpf_prog_type__needs_kver
  to require/not require kernel version


Re: [PATCH net-next 0/7] net: sched: prepare for more Qdisc offloads

2018-11-08 Thread Toke Høiland-Jørgensen
Jakub Kicinski  writes:

> On Thu, 08 Nov 2018 12:48:27 +0100, Toke Høiland-Jørgensen wrote:
>> Jakub Kicinski  writes:
>> > Hi!
>> >
>> > This series refactors the "switchdev" Qdisc offloads a little.  We have
>> > a few Qdiscs which can be fully offloaded today to the forwarding plane
>> > of switching devices.
>> >
>> > First patch adds a helper for handing statistic dumps, the code seems
>> > to be copy pasted between PRIO and RED.  
>> 
>> Hi Jakub
>> 
>> I didn't know there was offload capabilities for AQMs, that's cool! Do
>> you have any plans to add offloads for any of the modern AQMs (CoDel or
>> PIE)?
>
> I'd really like to add CoDel offload, but it's not a plan at this
> point :(

Right, too bad. Well, here's hoping that you'll get the chance in the
not too distant future :)

-Toke

