Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-09 Thread Jakub Kicinski
On Thu,  9 Aug 2018 11:01:18 -0400, Eelco Chaudron wrote:
> Add hardware specific counters to TC actions which will be exported
> through the netlink API. This makes troubleshooting TC flower offload
easier, as it is possible to differentiate the packets being offloaded.

It is not immediately clear why this is needed.  The extra memory and
the cost of updating two sets of counters won't come for free, so perhaps a
stronger justification than troubleshooting is due? :S

Netdev has counters for fallback vs forwarded traffic, so you'd know
that traffic hits the SW datapath, plus the rules which are in_hw will
most likely not match as of today for flower (assuming correctness).

I'm slightly concerned about potential performance impact, would you
be able to share some numbers for non-trivial number of flows (100k
active?)?


Re: [PATCH bpf-next 1/3] bpf: add bpf queue map

2018-08-09 Thread Alexei Starovoitov
On Thu, Aug 09, 2018 at 06:41:54PM -0500, Mauricio Vasquez wrote:
> 
> 
> On 08/09/2018 11:23 AM, Alexei Starovoitov wrote:
> > On Thu, Aug 09, 2018 at 09:51:49AM -0500, Mauricio Vasquez wrote:
> > > > Agree that existing ops are not the right alias, but deferring to user
> > > > space as inline function also doesn't really seem like a good fit, imho,
> > > > so I'd prefer rather to have something native. (Aside from that, the
> > > > above inline bpf_pop() would also race between CPUs.)
> > > I think we should have push/pop/peek syscalls as well, having a bpf_pop()
> > > that is race prone would create problems. Users expect map operations
> > > to be safe, so having one that is not will confuse them.
> > agree the races are not acceptable.
> > How about a mixed solution:
> > - introduce bpf_push/pop/peek helpers that programs will use, so
> >   they don't need to pass useless key=NULL
> > - introduce map->ops->lookup_and_delete and map->ops->lookup_or_init
> >   that prog-side helpers can use and syscall has 1-1 mapping for
> I think it is a fair solution.
> > Native lookup_or_init() helper for programs and syscall is badly missing.
> > Most of the bcc scripts use it and bcc has a racy workaround.
> > Similarly lookup_and_delete() syscall is 1-1 to pop() for stack/queue
> > and useful for regular hash maps.
> > 
> > At the end for stack/queue map the programs will use:
> > int bpf_push(map, value);
> 
> Also flags should be passed here.

Good point about flags. Indeed flags should be there for
extensibility, but I cannot think of a use case for them at the moment.
The hash map's exist/noexist flags don't apply here.
It would be good to have at least one bit in use from the start.

> 
> > value_or_null = bpf_pop(map); // guaranteed non-racy for multi-cpu
> > value_or_null = bpf_peek(map); // racy if 2+ cpus doing it
> Is there any reason for it to be racy?

because two cpus will be looking at the same element?
Whether the implementation is array based or a linked list with RCU,
the race is still there.



Re: [PATCH bpf-next] bpf: enable btf for use in all maps

2018-08-09 Thread Alexei Starovoitov
On Fri, Aug 10, 2018 at 12:43:20AM +0200, Daniel Borkmann wrote:
> On 08/09/2018 11:44 PM, Alexei Starovoitov wrote:
> > On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote:
> >> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote:
> >>> On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
>  Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
>  the basic arraymap") enabled support for BTF and dumping via
>  BPF fs for arraymap. However, both can be decoupled from each
>  other such that all BPF maps can be supported for attaching
>  BTF key/value information, while not all maps necessarily
>  need to dump via map_seq_show_elem() callback.
> 
>  The check in array_map_check_btf() can be generalized as
>  ultimately the key and value size is the only constraint
>  that needs to match for the map. The fact that the key needs
>  to be of type int is optional; it could be any data type as
>  long as it matches the 4 byte key size, just like hash table
>  key or others could be of any data type as well.
> 
>  Minimal example of a hash table dump which then works out
>  of the box for bpftool:
> 
>    # bpftool map dump id 19
>    [{
>    "key": {
>    "": {
>    "vip": 0,
>    "vipv6": []
>    },
>    "port": 0,
>    "family": 0,
>    "proto": 0
>    },
>    "value": {
>    "flags": 0,
>    "vip_num": 0
>    }
>    }
>    ]
> 
>  Signed-off-by: Daniel Borkmann 
>  Cc: Yonghong Song 
>  ---
>   include/linux/bpf.h   |  4 +---
>   kernel/bpf/arraymap.c | 27 ---
>   kernel/bpf/inode.c|  3 ++-
>   kernel/bpf/syscall.c  | 24 
>   4 files changed, 23 insertions(+), 35 deletions(-)
> 
>  diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>  index cd8790d..eb76e8e 100644
>  --- a/include/linux/bpf.h
>  +++ b/include/linux/bpf.h
>  @@ -48,8 +48,6 @@ struct bpf_map_ops {
>   u32 (*map_fd_sys_lookup_elem)(void *ptr);
>   void (*map_seq_show_elem)(struct bpf_map *map, void *key,
> struct seq_file *m);
>  -int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
>  - u32 key_type_id, u32 value_type_id);
>   };
>   
>   struct bpf_map {
>  @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
>   
>   static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
>   {
>  -return map->ops->map_seq_show_elem && map->ops->map_check_btf;
>  +return map->btf && map->ops->map_seq_show_elem;
>   }
>   
>   extern const struct bpf_map_ops bpf_map_offload_ops;
>  diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>  index 2aa55d030..67f0bdf 100644
>  --- a/kernel/bpf/arraymap.c
>  +++ b/kernel/bpf/arraymap.c
>  @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, void *key,
>   rcu_read_unlock();
>   }
>   
>  -static int array_map_check_btf(const struct bpf_map *map, const struct btf *btf,
>  -   u32 btf_key_id, u32 btf_value_id)
>  -{
>  -const struct btf_type *key_type, *value_type;
>  -u32 key_size, value_size;
>  -u32 int_data;
>  -
>  -key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
>  -if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
>  -return -EINVAL;
>  -
>  -int_data = *(u32 *)(key_type + 1);
>  -/* bpf array can only take a u32 key.  This check makes
>  - * sure that the btf matches the attr used during map_create.
>  - */
>  -if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
>  -BTF_INT_OFFSET(int_data))
>  -return -EINVAL;
> >>>
> >>> I think most of these checks are still necessary for array type.
> >>> Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
> >>> is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't
> >>> make sense.
> >>
> >> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array,
> >> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY
> >> could be array of u8 foo[4], for example, or u16 foo[2]. But how would
> >> it ultimately be different from e.g. having 'struct a' versus 'struct b'
> >> where both are of same size and while actual key has 'struct a', the one
> >> who writes the prog resp. loads the BTF into the kernel would lie about
> 

[net-next:master 1941/1953] drivers/net/ethernet/qlogic/qede/qede_filter.c:2048:38: sparse: restricted __be16 degrades to integer

2018-08-09 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   36d2f761b5aa688567b6aebdc6d68e73682275d4
commit: 2ce9c93eaca6c67e3fa8828a471738a32cd66770 [1941/1953] qede: Ingress tc flower offload (drop action) support.
reproduce:
# apt-get install sparse
git checkout 2ce9c93eaca6c67e3fa8828a471738a32cd66770
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/linux/overflow.h:220:13: sparse: undefined identifier '__builtin_mul_overflow'
   include/linux/overflow.h:220:13: sparse: incorrect type in conditional
   include/linux/overflow.h:220:13:    got void
>> drivers/net/ethernet/qlogic/qede/qede_filter.c:2048:38: sparse: restricted __be16 degrades to integer
   drivers/net/ethernet/qlogic/qede/qede_filter.c:2049:38: sparse: restricted __be16 degrades to integer
>> drivers/net/ethernet/qlogic/qede/qede_filter.c:2115:38: sparse: restricted __be32 degrades to integer
   drivers/net/ethernet/qlogic/qede/qede_filter.c:2116:38: sparse: restricted __be32 degrades to integer
   include/linux/overflow.h:220:13: sparse: call with no type!

vim +2048 drivers/net/ethernet/qlogic/qede/qede_filter.c

  2032  
  2033  static int
  2034  qede_tc_parse_ports(struct qede_dev *edev,
  2035  struct tc_cls_flower_offload *f,
  2036  struct qede_arfs_tuple *t)
  2037  {
  2038  if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_PORTS)) {
  2039  struct flow_dissector_key_ports *key, *mask;
  2040  
  2041  key = skb_flow_dissector_target(f->dissector,
  2042  FLOW_DISSECTOR_KEY_PORTS,
  2043  f->key);
  2044  mask = skb_flow_dissector_target(f->dissector,
  2045   FLOW_DISSECTOR_KEY_PORTS,
  2046   f->mask);
  2047  
> 2048  if ((key->src && mask->src != U16_MAX) ||
  2049  (key->dst && mask->dst != U16_MAX)) {
  2050  DP_NOTICE(edev, "Do not support ports masks\n");
  2051  return -EINVAL;
  2052  }
  2053  
  2054  t->src_port = key->src;
  2055  t->dst_port = key->dst;
  2056  }
  2057  
  2058  return 0;
  2059  }
  2060  
  2061  static int
  2062  qede_tc_parse_v6_common(struct qede_dev *edev,
  2063  struct tc_cls_flower_offload *f,
  2064  struct qede_arfs_tuple *t)
  2065  {
  2066  struct in6_addr zero_addr, addr;
  2067  
  2068  memset(&zero_addr, 0, sizeof(addr));
  2069  memset(&addr, 0xff, sizeof(addr));
  2070  
  2071  if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_IPV6_ADDRS)) {
  2072  struct flow_dissector_key_ipv6_addrs *key, *mask;
  2073  
  2074  key = skb_flow_dissector_target(f->dissector,
  2075  FLOW_DISSECTOR_KEY_IPV6_ADDRS,
  2076  f->key);
  2077  mask = skb_flow_dissector_target(f->dissector,
  2078   FLOW_DISSECTOR_KEY_IPV6_ADDRS,
  2079   f->mask);
  2080  
  2081  if ((memcmp(&key->src, &zero_addr, sizeof(addr)) &&
  2082   memcmp(&mask->src, &addr, sizeof(addr))) ||
  2083  (memcmp(&key->dst, &zero_addr, sizeof(addr)) &&
  2084   memcmp(&mask->dst, &addr, sizeof(addr)))) {
  2085  DP_NOTICE(edev,
  2086  "Do not support IPv6 address prefix/mask\n");
  2087  return -EINVAL;
  2088  }
  2089  
  2090  memcpy(&t->src_ipv6, &key->src, sizeof(addr));
  2091  memcpy(&t->dst_ipv6, &key->dst, sizeof(addr));
  2092  }
  2093  
  2094  if (qede_tc_parse_ports(edev, f, t))
  2095  return -EINVAL;
  2096  
  2097  return qede_set_v6_tuple_to_profile(edev, t, &zero_addr);
  2098  }
  2099  
  2100  static int
  2101  qede_tc_parse_v4_common(struct qede_dev *edev,
  2102  struct tc_cls_flower_offload *f,
  2103  struct qede_arfs_tuple *t)
  2104  {
  2105  if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_IPV4_ADDRS)) {
  2106  struct flow_dissector_key_ipv4_addrs *key, *mask;
  2107  
  2108  key = skb_flow_dissector_target(f->dissector,
  2109  FLOW_DISSECTOR_KEY_IPV4_ADDRS,
  2110  f->key);
  2111  mask = 

Re: [PATCH 4.9-stable] tcp: add tcp_ooo_try_coalesce() helper

2018-08-09 Thread maowenan



On 2018/8/9 20:52, David Woodhouse wrote:
> On Thu, 2018-08-09 at 14:47 +0200, Greg KH wrote:
>> On Thu, Aug 09, 2018 at 08:37:13PM +0800, maowenan wrote:
>>> There are two patches in stable branch linux-4.4, but I have tested with 
>>> below patches, and found that the cpu usage was very high.
>>> dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
>>> 5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
>>>  
>>> test results:
>>> with fix patch: 78.2%   ksoftirqd
>>> no fix patch:   90% ksoftirqd
>>>  
>>> cpu usage is 0% when there are no attack packets.
>>>  
>>> so please help verify that the fixed patches are enough in linux-stable 4.4.
>>>  
>>
>> I do not know, I am not a network developer.  Please try to reproduce
>> the same thing on a newer kernel release and see if the result is the
>> same or not.  If you can find a change that I missed, please let me know
>> and I will be glad to apply it.
> 
> maowenan, there were five patches in the original upstream set to
> address SegmentSmack:
> 
>   tcp: free batches of packets in tcp_prune_ofo_queue()
>   tcp: avoid collapses in tcp_prune_queue() if possible
>   tcp: detect malicious patterns in tcp_collapse_ofo_queue()
>   tcp: call tcp_drop() from tcp_data_queue_ofo()
>   tcp: add tcp_ooo_try_coalesce() helper
> 
> I believe that the first one, "free batches of packets..." is not
> needed in 4.4 because we only have a simple queue of packets there
> anyway, so we're dropping everything each time and don't need the
> heuristics for how many to drop.
> 
> That leaves two more which have so far not been backported to 4.4; can
> you try applying them and see if it resolves the problem for you?
ok, i will try.
> 
> Thanks.
> 



Re: [PATCH 4.9-stable] tcp: add tcp_ooo_try_coalesce() helper

2018-08-09 Thread maowenan



On 2018/8/9 20:47, Greg KH wrote:
> On Thu, Aug 09, 2018 at 08:37:13PM +0800, maowenan wrote:
>>
>>
>> On 2018/8/7 21:22, Greg KH wrote:
>>> On Sat, Aug 04, 2018 at 10:10:00AM +0100, David Woodhouse wrote:
 From: Eric Dumazet 

 commit 58152ecbbcc6a0ce7fddd5bf5f6ee535834ece0c upstream.

 In case skb in out_or_order_queue is the result of
 multiple skbs coalescing, we would like to get a proper gso_segs
 counter tracking, so that future tcp_drop() can report an accurate
 number.

 I chose to not implement this tracking for skbs in receive queue,
 since they are not dropped, unless socket is disconnected.

 Signed-off-by: Eric Dumazet 
 Acked-by: Soheil Hassas Yeganeh 
 Acked-by: Yuchung Cheng 
 Signed-off-by: David S. Miller 
 Signed-off-by: David Woodhouse 
 ---
  net/ipv4/tcp_input.c | 23 +--
  1 file changed, 21 insertions(+), 2 deletions(-)
>>>
>>> Now applied, thanks,
>>>
>>> greg k-h
>>>
>>> .
>>>
>>
>> Hello,
>>
>> There are two patches in stable branch linux-4.4, but I have tested with 
>> below patches, and found that the cpu usage was very high.
>> dc6ae4d tcp: detect malicious patterns in tcp_collapse_ofo_queue()
>> 5fbec48 tcp: avoid collapses in tcp_prune_queue() if possible
>>
>> test results:
>> with fix patch: 78.2%   ksoftirqd
>> no fix patch:   90% ksoftirqd
>>
>> cpu usage is 0% when there are no attack packets.
>>
>> so please help verify that the fixed patches are enough in linux-stable 4.4.
>>
> 
> I do not know, I am not a network developer.  Please try to reproduce
> the same thing on a newer kernel release and see if the result is the
> same or not.  If you can find a change that I missed, please let me know
> and I will be glad to apply it.

I have verified this on linux 4.18-rc3 (without the fix patches) and
4.18-rc7 (with the 5 fix patches); it works well and cpu usage drops
from 95% to 27%.

> 
> thanks,
> 
> greg k-h
> 
> .
> 



Re: Error running AF_XDP sample application

2018-08-09 Thread Jakub Kicinski
On Thu, 09 Aug 2018 18:18:08 +0200, kdjimeli wrote:
> Hello,
> 
> I have been trying to test a sample AF_XDP program, but I have been
> experiencing some issues.
> After building the sample code
> https://github.com/torvalds/linux/tree/master/samples/bpf,
> when running the xdpsock binary, I get the errors
> "libbpf: failed to create map (name: 'xsks_map'): Invalid argument"
> "libbpf: failed to load object './xdpsock_kern.o"
> 
> I tried to figure out the cause of the error but all I know is that it
> occurs at line 910 with the function
> call "bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)".
> 
> I would like to inquire what could be a possible cause of this error.

which kernel version are you running?


[net-next:master 518/519] drivers/net/virtio_net.c:1910:3: error: implicit declaration of function '__netif_set_xps_queue'

2018-08-09 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   36d2f761b5aa688567b6aebdc6d68e73682275d4
commit: 4d99f6602cb552fb58db0c3b1d935bb6fa017f24 [518/519] net: allow to call netif_reset_xps_queues() under cpus_read_lock
config: i386-randconfig-sb0-08100039 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 4d99f6602cb552fb58db0c3b1d935bb6fa017f24
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/virtio_net.c: In function 'virtnet_set_affinity':
>> drivers/net/virtio_net.c:1910:3: error: implicit declaration of function '__netif_set_xps_queue' [-Werror=implicit-function-declaration]
  __netif_set_xps_queue(vi->dev, mask, i, false);
  ^
   cc1: some warnings being treated as errors

vim +/__netif_set_xps_queue +1910 drivers/net/virtio_net.c

  1888  
  1889  static void virtnet_set_affinity(struct virtnet_info *vi)
  1890  {
  1891  int i;
  1892  int cpu;
  1893  
  1894  /* In multiqueue mode, when the number of cpu is equal to the number of
  1895   * queue pairs, we let the queue pairs to be private to one cpu by
  1896   * setting the affinity hint to eliminate the contention.
  1897   */
  1898  if (vi->curr_queue_pairs == 1 ||
  1899  vi->max_queue_pairs != num_online_cpus()) {
  1900  virtnet_clean_affinity(vi, -1);
  1901  return;
  1902  }
  1903  
  1904  i = 0;
  1905  for_each_online_cpu(cpu) {
  1906  const unsigned long *mask = cpumask_bits(cpumask_of(cpu));
  1907  
  1908  virtqueue_set_affinity(vi->rq[i].vq, cpu);
  1909  virtqueue_set_affinity(vi->sq[i].vq, cpu);
> 1910  __netif_set_xps_queue(vi->dev, mask, i, false);
  1911  i++;
  1912  }
  1913  
  1914  vi->affinity_hint_set = true;
  1915  }
  1916  

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all            Intel Corporation




[PATCH net-next] net: add an empty __netif_set_xps_queue() stub in the !CONFIG_XPS case

2018-08-09 Thread Andrei Vagin
From: Andrei Vagin 

__netif_set_xps_queue() is used in drivers/net/virtio_net.c.

Fixes: 4d99f6602cb5 ("net: allow to call netif_reset_xps_queues() under 
cpus_read_lock")
Signed-off-by: Andrei Vagin 
---
 include/linux/netdevice.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 282e2e95ad5b..a5e4b0a18f90 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3406,6 +3406,13 @@ static inline int netif_attrmask_next_and(int n, const 
unsigned long *src1p,
return n + 1;
 }
 #else
+static inline int __netif_set_xps_queue(struct net_device *dev,
+   const unsigned long *mask,
+   u16 index, bool is_rxqs_map)
+{
+   return 0;
+}
+
 static inline int netif_set_xps_queue(struct net_device *dev,
  const struct cpumask *mask,
  u16 index)
-- 
2.17.1



Re: [PATCH lora-next v2 8/8] net: lora: sx1301: convert driver over to regmap reads and writes

2018-08-09 Thread Andreas Färber
On 10.08.2018 at 00:47, Ben Whitten wrote:
> On Thu, 9 Aug 2018 at 23:34, Andreas Färber  wrote:
>> Applying so that we can continue based on regmap.
> 
> Thanks!

Rebased onto latest linux-next, tested on RAK831 and pushed:

https://git.kernel.org/pub/scm/linux/kernel/git/afaerber/linux-lora.git/log/?h=lora-next

Next steps for me will be to apply your devm_alloc_loradev() to all my
other drivers, and to convert sx1276 to regmap, too.

Cheers,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action

2018-08-09 Thread Cong Wang
On Wed, Aug 8, 2018 at 5:06 AM Vlad Buslov  wrote:
>
>
> On Wed 08 Aug 2018 at 01:20, Cong Wang  wrote:
> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov  wrote:
> >>
> >> Implement function that atomically checks if action exists and either takes
> >> reference to it, or allocates idr slot for action index to prevent
> >> concurrent allocations of actions with same index. Use EBUSY error pointer
> >> to indicate that idr slot is reserved.
> >
> > A dumb question:
> >
> > How could "concurrent allocations of actions with same index" happen
> > as you already take idrinfo->lock for the whole
> > tcf_idr_check_alloc()??
>
> I guess my changelog is not precise enough in this description.
> Let's look into the sequence of events for initialization of a new action:
> 1) tcf_idr_check_alloc() is called by action init.
> 2) idrinfo->lock is taken.
> 3) Lookup in idr is performed to determine if action with specified
> index already exists.
> 4) EBUSY pointer is inserted to indicate that id is taken.
> 5) idrinfo->lock is released.
> 6) tcf_idr_check_alloc() returns to action init code.
> 7) New action is allocated and initialized.
> 8) tcf_idr_insert() is called.
> 9) idrinfo->lock is taken.
> 10) EBUSY pointer is substituted with pointer to new action.
> 11) idrinfo->lock is released.
> 12) tcf_idr_insert() returns.
>
> So in this case "concurrent allocations of actions with same index"
> means not the allocation with same index during tcf_idr_check_alloc(),
> but during the period when idrinfo->lock was released(6-8).

Yes but it is unnecessary:

a) When adding a new action, you can actually allocate and init it before
touching idrinfo, therefore the check and insert can be done in one step
instead of breaking it down into multiple steps, which means you can
acquire idrinfo->lock once.

b) When updating an existing action, it is slightly complicated.
However, you can still allocate a new one first, then find the old one
and copy it into the new one and finally replace it.

In summary, we can do the following:

1. always allocate a new action
2. acquire idrinfo->lock
3a. if it is an add operation: allocate a new ID and insert the new action
3b. if it is a replace operation: find the old one with ID, copy it into the
new one and replace it
4. release idrinfo->lock
5. If 3a or 3b fails, free the allocation. Otherwise succeed.

I know, the locking scope is now per netns rather than per action,
but this can be optimized for replacing, you can hold the old action
and then release the idrinfo->lock, as idr_replace() later doesn't
require idrinfo->lock AFAIK.

Is there anything I miss here?


>
> >
> > For me, it should be only one allocation could succeed, all others
> > should fail.
>
> Correct! And this change is made specifically to enforce that rule.
>
> Otherwise, multiple processes could try to create a new action with the same
> id at the same time, and all processes that executed 3 before any
> process reached 10 would "succeed" by overwriting each other's action in
> idr (and leak memory while doing so).

I know but again it doesn't look necessary to achieve a same goal.


>
> >
> > Maybe you are trying to prevent others treat it like existing one,
> > but in that case you can just hold the idinfo->lock for all idr operations.
> >
> > And more importantly, upper layer is able to tell it is a creation or
> > just replace, you don't have to check this in this complicated way.
> >
> > IOW, all of these complicated code should not exist.
>
> Original code was simpler and didn't involve temporary EBUSY pointer.
> This change was made according to Jiri's request. He wanted to have
> unified API to be used by all actions and suggested this approach
> specifically.

I will work on this, as it is aligned with my work to make
it RCU-complete.


Re: [PATCH bpf-next 1/3] bpf: add bpf queue map

2018-08-09 Thread Mauricio Vasquez




On 08/09/2018 11:23 AM, Alexei Starovoitov wrote:

On Thu, Aug 09, 2018 at 09:51:49AM -0500, Mauricio Vasquez wrote:

Agree that existing ops are not the right alias, but deferring to user
space as inline function also doesn't really seem like a good fit, imho,
so I'd prefer rather to have something native. (Aside from that, the
above inline bpf_pop() would also race between CPUs.)

I think we should have push/pop/peek syscalls as well, having a bpf_pop()
that is race prone would create problems. Users expect map operations to
be safe, so having one that is not will confuse them.

agree the races are not acceptable.
How about a mixed solution:
- introduce bpf_push/pop/peek helpers that programs will use, so
   they don't need to pass useless key=NULL
- introduce map->ops->lookup_and_delete and map->ops->lookup_or_init
   that prog-side helpers can use and syscall has 1-1 mapping for

I think it is a fair solution.

Native lookup_or_init() helper for programs and syscall is badly missing.
Most of the bcc scripts use it and bcc has a racy workaround.
Similarly lookup_and_delete() syscall is 1-1 to pop() for stack/queue
and useful for regular hash maps.

At the end for stack/queue map the programs will use:
int bpf_push(map, value);


Also flags should be passed here.


value_or_null = bpf_pop(map); // guaranteed non-racy for multi-cpu
value_or_null = bpf_peek(map); // racy if 2+ cpus doing it

Is there any reason for it to be racy?



from syscall:
bpf_map_lookup_elem(map, NULL, &value); // returns top of stack
bpf_map_lookup_and_delete_elem(map, NULL, &value); // returns top and deletes top atomically
bpf_map_update_elem(map, NULL, &value); // pushes new value into stack atomically

Eventually hash and other maps will implement bpf_map_lookup_and_delete()
for both bpf progs and syscall.

The main point is that the prog-side api doesn't have to match 1-1 to the syscall-side,
since they're different enough already.
Like lookup_or_init() is badly needed for programs, but unnecessary for syscall.

Thoughts?

I agree with the idea; if there are no more thoughts on this I'd
proceed to the implementation.


[net-next:master 518/519] drivers/net/virtio_net.c:1910:3: error: implicit declaration of function '__netif_set_xps_queue'; did you mean 'netif_set_xps_queue'?

2018-08-09 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   36d2f761b5aa688567b6aebdc6d68e73682275d4
commit: 4d99f6602cb552fb58db0c3b1d935bb6fa017f24 [518/519] net: allow to call netif_reset_xps_queues() under cpus_read_lock
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 4d99f6602cb552fb58db0c3b1d935bb6fa017f24
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh

All errors (new ones prefixed by >>):

   drivers/net/virtio_net.c: In function 'virtnet_set_affinity':
>> drivers/net/virtio_net.c:1910:3: error: implicit declaration of function '__netif_set_xps_queue'; did you mean 'netif_set_xps_queue'? [-Werror=implicit-function-declaration]
  __netif_set_xps_queue(vi->dev, mask, i, false);
  ^
  netif_set_xps_queue
   cc1: some warnings being treated as errors

vim +1910 drivers/net/virtio_net.c

  1888  
  1889  static void virtnet_set_affinity(struct virtnet_info *vi)
  1890  {
  1891  int i;
  1892  int cpu;
  1893  
  1894  /* In multiqueue mode, when the number of cpu is equal to the number of
  1895   * queue pairs, we let the queue pairs to be private to one cpu by
  1896   * setting the affinity hint to eliminate the contention.
  1897   */
  1898  if (vi->curr_queue_pairs == 1 ||
  1899  vi->max_queue_pairs != num_online_cpus()) {
  1900  virtnet_clean_affinity(vi, -1);
  1901  return;
  1902  }
  1903  
  1904  i = 0;
  1905  for_each_online_cpu(cpu) {
  1906  const unsigned long *mask = cpumask_bits(cpumask_of(cpu));
  1907  
  1908  virtqueue_set_affinity(vi->rq[i].vq, cpu);
  1909  virtqueue_set_affinity(vi->sq[i].vq, cpu);
> 1910  __netif_set_xps_queue(vi->dev, mask, i, false);
  1911  i++;
  1912  }
  1913  
  1914  vi->affinity_hint_set = true;
  1915  }
  1916  

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all            Intel Corporation




pull-request: bpf 2018-08-10

2018-08-09 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix cpumap and devmap on teardown as they're under RCU context
   and won't have the same assumptions as running under NAPI protection,
   from Jesper.

2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
   a bug where socket error was not propagated correctly, from Daniel.

3) Fix incompatible libbpf header license for BTF code and match it
   before it gets officially released with the rest of libbpf which
   is LGPL-2.1, from Martin.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 82a40777de12728dedf4075453b694f0d1baee80:

  ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit 
(2018-08-05 17:35:02 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 9c95420117393ed5f76de373e3c6479c21e3e380:

  Merge branch 'bpf-fix-cpu-and-devmap-teardown' (2018-08-09 21:50:45 +0200)


Alexei Starovoitov (1):
  Merge branch 'sockmap-fixes'

Daniel Borkmann (4):
  bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
  bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
  bpf, sockmap: fix cork timeout for select due to epipe
  Merge branch 'bpf-fix-cpu-and-devmap-teardown'

Jesper Dangaard Brouer (3):
  xdp: fix bug in cpumap teardown code path
  samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
  xdp: fix bug in devmap teardown code path

Martin KaFai Lau (1):
  bpf: btf: Change tools/lib/bpf/btf to LGPL

 kernel/bpf/cpumap.c| 15 +--
 kernel/bpf/devmap.c| 14 +-
 kernel/bpf/sockmap.c   |  9 ++---
 samples/bpf/xdp_redirect_cpu_kern.c|  2 +-
 samples/bpf/xdp_redirect_cpu_user.c|  4 ++--
 tools/lib/bpf/btf.c|  2 +-
 tools/lib/bpf/btf.h|  2 +-
 tools/testing/selftests/bpf/test_sockmap.c |  2 +-
 8 files changed, 30 insertions(+), 20 deletions(-)


Re: [PATCH] net: ethernet: cpsw-phy-sel: prefer phandle for phy sel and update binding

2018-08-09 Thread Grygorii Strashko




On 08/09/2018 05:46 AM, Tony Lindgren wrote:

* Tony Lindgren  [180808 13:52]:

* Andrew Lunn  [180808 12:02]:


Do you need to handle EPROBE_DEFER here? The phandle points to a
device which has not yet been loaded? I'm not sure exactly where it
will be returned, maybe it is bus_find_device(), but i expect to see
some handling of it somewhere in this function.


If no device is found the driver just produces a warning currently.
And in that case cpsw attempts to continue with bootloader settings.

And looking at the caller function cpsw_slave_open() it also just
produces warnings for phy_connect() too..

I agree that in general this whole pile of cpsw related drivers sure
could use some better error handling. Starting with making cpsw_slave_open()
and cpsw_phy_sel() return errors instead of just ignoring them might be a
good start.

Grygorii, care to add that note of things to do into your cpsw maintainer
hat?


Right. EPROBE_DEFER not supported for this module as of now.




With the proper interconnect hierarchy in the device tree there should be
no EPROBE_DEFER happening here as the interconnects are probed in the
right order, with the always-on interconnect with the system control module first :)

But then again, adding support for EPROBE_DEFER here won't hurt either,
will take a look.


I'll just add some notes about that to the patch description considering
the above.


thanks Tony.

--
regards,
-grygorii


Re: [PATCH lora-next v2 8/8] net: lora: sx1301: convert driver over to regmap reads and writes

2018-08-09 Thread Ben Whitten
On Thu, 9 Aug 2018 at 23:34, Andreas Färber  wrote:
>
> On 09.08.2018 at 14:33, Ben Whitten wrote:
> > The reads and writes are replaced with regmap versions and unneeded
> > functions, variable, and defines removed.
> >
> > Signed-off-by: Ben Whitten 
> > ---
> >  drivers/net/lora/sx1301.c | 204 
> > +++---
> >  drivers/net/lora/sx1301.h |  30 +++
> >  2 files changed, 95 insertions(+), 139 deletions(-)
> >
> > diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> > index 766df06..4db5a43 100644
> > --- a/drivers/net/lora/sx1301.c
> > +++ b/drivers/net/lora/sx1301.c
> [...]
> > @@ -140,50 +115,9 @@ static int sx1301_write(struct sx1301_priv *priv, u8 
> > reg, u8 val)
> >   return sx1301_write_burst(priv, reg, &val, 1);
> >  }
>
> _write and _read are now unused, causing warnings. Dropping.
>
> The _burst versions are still in use for firmware load, and I saw a
> discussion indicating that regmap is lacking the capability to not
> increment the reg for bulk reads at the moment. So we still can't
> cleanly switch to regmap entirely and thereby remain bound to SPI.
>
> [...]
> > @@ -235,8 +169,8 @@ static int sx1301_radio_spi_transfer_one(struct 
> > spi_controller *ctrl,
> >  {
> >   struct spi_sx1301 *ssx = spi_controller_get_devdata(ctrl);
> >   struct sx1301_priv *priv = spi_get_drvdata(ssx->parent);
> > - const u8 *tx_buf = xfr->tx_buf;
> > - u8 *rx_buf = xfr->rx_buf;
> > + const unsigned int *tx_buf = xfr->tx_buf;
> > + unsigned int *rx_buf = xfr->rx_buf;
>
> These are wrong both for Little Endian and even worse for Big Endian.
>
> >   int ret;
> >
> >   if (xfr->len == 0 || xfr->len > 3)
> > @@ -245,13 +179,13 @@ static int sx1301_radio_spi_transfer_one(struct 
> > spi_controller *ctrl,
> >   dev_dbg(>dev, "transferring one (%u)\n", xfr->len);
> >
> >   if (tx_buf) {
> > - ret = sx1301_page_write(priv, ssx->page, ssx->regs + 
> > REG_RADIO_X_ADDR, tx_buf ? tx_buf[0] : 0);
> > + ret = regmap_write(priv->regmap, ssx->regs + 
> > REG_RADIO_X_ADDR, tx_buf ? tx_buf[0] : 0);
> >   if (ret) {
> >   dev_err(>dev, "SPI radio address write 
> > failed\n");
> >   return ret;
> >   }
> >
> > - ret = sx1301_page_write(priv, ssx->page, ssx->regs + 
> > REG_RADIO_X_DATA, (tx_buf && xfr->len >= 2) ? tx_buf[1] : 0);
> > + ret = regmap_write(priv->regmap, ssx->regs + 
> > REG_RADIO_X_DATA, (tx_buf && xfr->len >= 2) ? tx_buf[1] : 0);
> >   if (ret) {
> >   dev_err(>dev, "SPI radio data write failed\n");
> >   return ret;
> > @@ -271,7 +205,7 @@ static int sx1301_radio_spi_transfer_one(struct 
> > spi_controller *ctrl,
> >   }
> >
> >   if (rx_buf) {
> > - ret = sx1301_page_read(priv, ssx->page, ssx->regs + 
> > REG_RADIO_X_DATA_READBACK, &rx_buf[xfr->len - 1]);
> > + ret = regmap_read(priv->regmap, ssx->regs + 
> > REG_RADIO_X_DATA_READBACK, &rx_buf[xfr->len - 1]);
> >   if (ret) {
> >   dev_err(>dev, "SPI radio data read failed\n");
> >   return ret;
>
> Fixing by adding a local variable instead:
>
> @@ -239,6 +163,7 @@ static int sx1301_radio_spi_transfer_one(struct
> spi_controll
> er *ctrl,
> struct sx1301_priv *priv = netdev_priv(netdev);
> const u8 *tx_buf = xfr->tx_buf;
> u8 *rx_buf = xfr->rx_buf;
> +   unsigned int val;
> int ret;
>
> if (xfr->len == 0 || xfr->len > 3)
> [...]
> @@ -273,27 +198,28 @@ static int sx1301_radio_spi_transfer_one(struct
> spi_controller *ctrl,
> }
>
> if (rx_buf) {
> -   ret = sx1301_page_read(priv, ssx->page, ssx->regs +
> REG_RADIO_X_DATA_READBACK, &rx_buf[xfr->len - 1]);
> +   ret = regmap_read(priv->regmap, ssx->regs +
> REG_RADIO_X_DATA_READBACK, &val);
> if (ret) {
> dev_err(>dev, "SPI radio data read failed\n");
> return ret;
> }
> +   rx_buf[xfr->len - 1] = val & 0xff;
> }
>
> return 0;
>
> [...]
> > diff --git a/drivers/net/lora/sx1301.h b/drivers/net/lora/sx1301.h
> > index 2fc283f..b21e5c6 100644
> > --- a/drivers/net/lora/sx1301.h
> > +++ b/drivers/net/lora/sx1301.h
> > @@ -18,11 +18,41 @@
> >  /* Page independent */
> >  #define SX1301_PAGE 0x00
> >  #define SX1301_VER  0x01
> > +#define SX1301_MPA  0x09
>
> Those are the official register names? I find these much harder to read
> than my guessed names. Could we keep the long names as aliases?

Yes these are the official register names, aliases to improve readability
sound like a good plan as all the official names are terse.

> > +#define SX1301_MPD  0x0A
> > +#define SX1301_GEN  0x10
> > +#define SX1301_CKEN 0x11
> > +#define SX1301_GPSO 0x1C
> > +#define SX1301_GPMODE   0x1D

Re: [PATCH net-next,v4] net/tls: Calculate nsg for zerocopy path without skb_cow_data.

2018-08-09 Thread Doron Roberts-Kedes
On Wed, Aug 08, 2018 at 12:14:30PM -0700, David Miller wrote:
> From: Doron Roberts-Kedes 
> Date: Tue, 7 Aug 2018 11:09:39 -0700
> 
> > +static int __skb_nsg(struct sk_buff *skb, int offset, int len,
> > +unsigned int recursion_level)
> > +{
> > +   int start = skb_headlen(skb);
> > +   int i, copy = start - offset;
> > +   struct sk_buff *frag_iter;
> > +   int elt = 0;
> > +
> > +   if (unlikely(recursion_level >= 24))
> > +   return -EMSGSIZE;
> 
> This recursion is kinda crazy.
> 
> Even skb_cow_data() doesn't recurse like this (of course because it copies
> into linear buffers).
> 
> There has to be a way to simplify this.  Fragment lists are such a rarely
> used SKB geometry, and few if any devices support it for transmission
> (so the fraglist will get undone at transmit time anyways).
> 

Interesting. Just wanted to clarify whether the issue is the use of
recursion or the fact that the function is handling the frag_list at
all. This is the rx path, so my understanding was that we need to handle
the frag_list. Please let me know if I'm misunderstanding your point
about the rare use of fragment lists.

If the issue is the recursion, I can rewrite the function to not use
recursion, but skb_to_sgvec uses a similar pattern and is invoked
immediately afterwards.

Taking a step back, is there an existing solution for what this function
is trying to do? I was surprised to find that there did not seem to
exist a function for determining the number of scatterlist elements
required to map an skb without COW. 


Re: [PATCH bpf-next] bpf: enable btf for use in all maps

2018-08-09 Thread Daniel Borkmann
On 08/09/2018 11:44 PM, Alexei Starovoitov wrote:
> On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote:
>> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote:
>>> On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
 Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
 the basic arraymap") enabled support for BTF and dumping via
 BPF fs for arraymap. However, both can be decoupled from each
 other such that all BPF maps can be supported for attaching
 BTF key/value information, while not all maps necessarily
 need to dump via map_seq_show_elem() callback.

 The check in array_map_check_btf() can be generalized as
 ultimatively the key and value size is the only contraint
 that needs to match for the map. The fact that the key needs
 to be of type int is optional; it could be any data type as
 long as it matches the 4 byte key size, just like hash table
 key or others could be of any data type as well.

 Minimal example of a hash table dump which then works out
 of the box for bpftool:

   # bpftool map dump id 19
   [{
   "key": {
   "": {
   "vip": 0,
   "vipv6": []
   },
   "port": 0,
   "family": 0,
   "proto": 0
   },
   "value": {
   "flags": 0,
   "vip_num": 0
   }
   }
   ]

 Signed-off-by: Daniel Borkmann 
 Cc: Yonghong Song 
 ---
  include/linux/bpf.h   |  4 +---
  kernel/bpf/arraymap.c | 27 ---
  kernel/bpf/inode.c|  3 ++-
  kernel/bpf/syscall.c  | 24 
  4 files changed, 23 insertions(+), 35 deletions(-)

 diff --git a/include/linux/bpf.h b/include/linux/bpf.h
 index cd8790d..eb76e8e 100644
 --- a/include/linux/bpf.h
 +++ b/include/linux/bpf.h
 @@ -48,8 +48,6 @@ struct bpf_map_ops {
u32 (*map_fd_sys_lookup_elem)(void *ptr);
void (*map_seq_show_elem)(struct bpf_map *map, void *key,
  struct seq_file *m);
 -  int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
 -   u32 key_type_id, u32 value_type_id);
  };
  
  struct bpf_map {
 @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const 
 struct bpf_map *map)
  
  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
  {
 -  return map->ops->map_seq_show_elem && map->ops->map_check_btf;
 +  return map->btf && map->ops->map_seq_show_elem;
  }
  
  extern const struct bpf_map_ops bpf_map_offload_ops;
 diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
 index 2aa55d030..67f0bdf 100644
 --- a/kernel/bpf/arraymap.c
 +++ b/kernel/bpf/arraymap.c
 @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map 
 *map, void *key,
rcu_read_unlock();
  }
  
 -static int array_map_check_btf(const struct bpf_map *map, const struct 
 btf *btf,
 - u32 btf_key_id, u32 btf_value_id)
 -{
 -  const struct btf_type *key_type, *value_type;
 -  u32 key_size, value_size;
 -  u32 int_data;
 -
 -  key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
 -  if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
 -  return -EINVAL;
 -
 -  int_data = *(u32 *)(key_type + 1);
 -  /* bpf array can only take a u32 key.  This check makes
 -   * sure that the btf matches the attr used during map_create.
 -   */
 -  if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
 -  BTF_INT_OFFSET(int_data))
 -  return -EINVAL;
>>>
>>> I think most of these checks are still necessary for array type.
>>> Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
>>> is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't makes 
>>> sense.
>>
>> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array,
>> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY
>> could be array of u8 foo[4], for example, or u16 foo[2]. But how would
>> it ultimately be different from e.g. having 'struct a' versus 'struct b'
>> where both are of same size and while actual key has 'struct a', the one
>> who writes the prog resp. loads the BTF into the kernel would lie about
>> it stating it's of type 'struct b' instead? It's basically trusting the
>> app that it advertised sane key types which kernel is propagating back.
> 
> for hash map - yes. the kernel cannot yet catch the lie that
> key == 'struct a' that user said in BTF is not what program used
> (which used 'struct b' of the same size).
> Eventually we will annotate all load/store in the program and will
> make sure that memory access 

Re: [PATCH lora-next v2 8/8] net: lora: sx1301: convert driver over to regmap reads and writes

2018-08-09 Thread Andreas Färber
Am 09.08.2018 um 14:33 schrieb Ben Whitten:
> The reads and writes are replaced with regmap versions and unneeded
> functions, variable, and defines removed.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/sx1301.c | 204 
> +++---
>  drivers/net/lora/sx1301.h |  30 +++
>  2 files changed, 95 insertions(+), 139 deletions(-)
> 
> diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> index 766df06..4db5a43 100644
> --- a/drivers/net/lora/sx1301.c
> +++ b/drivers/net/lora/sx1301.c
[...]
> @@ -140,50 +115,9 @@ static int sx1301_write(struct sx1301_priv *priv, u8 
> reg, u8 val)
>   return sx1301_write_burst(priv, reg, &val, 1);
>  }

_write and _read are now unused, causing warnings. Dropping.

The _burst versions are still in use for firmware load, and I saw a
discussion indicating that regmap is lacking the capability to not
increment the reg for bulk reads at the moment. So we still can't
cleanly switch to regmap entirely and thereby remain bound to SPI.
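To illustrate why an address-incrementing bulk write can't replace the firmware-load burst: the firmware streams through a single FIFO register, so every byte must hit the same address. The toy model below is invented for the sketch (`FIFO_REG`, the array-backed device); it is not the SX1301:

```c
#include <assert.h>
#include <string.h>

#define FIFO_REG 0x28			/* hypothetical data-FIFO address */
#define NREGS    0x40

static unsigned char regs[NREGS];
static unsigned char fifo[16];
static int fifo_pos;

static void dev_write(unsigned int reg, unsigned char val)
{
	if (reg == FIFO_REG)
		fifo[fifo_pos++] = val;	/* FIFO: same address, data streams in */
	else
		regs[reg] = val;
}

/* Non-incrementing burst: what the driver's _write_burst keeps doing. */
static void write_burst_noinc(unsigned int reg, const unsigned char *buf,
			      int len)
{
	for (int i = 0; i < len; i++)
		dev_write(reg, buf[i]);
}

/* Incrementing bulk (regmap_bulk_write-style): wrong for a FIFO, since
 * every byte after the first lands in a neighbouring register. */
static void write_bulk_inc(unsigned int reg, const unsigned char *buf,
			   int len)
{
	for (int i = 0; i < len; i++)
		dev_write(reg + i, buf[i]);
}
```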

[...]
> @@ -235,8 +169,8 @@ static int sx1301_radio_spi_transfer_one(struct 
> spi_controller *ctrl,
>  {
>   struct spi_sx1301 *ssx = spi_controller_get_devdata(ctrl);
>   struct sx1301_priv *priv = spi_get_drvdata(ssx->parent);
> - const u8 *tx_buf = xfr->tx_buf;
> - u8 *rx_buf = xfr->rx_buf;
> + const unsigned int *tx_buf = xfr->tx_buf;
> + unsigned int *rx_buf = xfr->rx_buf;

These are wrong both for Little Endian and even worse for Big Endian.

>   int ret;
>  
>   if (xfr->len == 0 || xfr->len > 3)
> @@ -245,13 +179,13 @@ static int sx1301_radio_spi_transfer_one(struct 
> spi_controller *ctrl,
>   dev_dbg(>dev, "transferring one (%u)\n", xfr->len);
>  
>   if (tx_buf) {
> - ret = sx1301_page_write(priv, ssx->page, ssx->regs + 
> REG_RADIO_X_ADDR, tx_buf ? tx_buf[0] : 0);
> + ret = regmap_write(priv->regmap, ssx->regs + REG_RADIO_X_ADDR, 
> tx_buf ? tx_buf[0] : 0);
>   if (ret) {
>   dev_err(>dev, "SPI radio address write failed\n");
>   return ret;
>   }
>  
> - ret = sx1301_page_write(priv, ssx->page, ssx->regs + 
> REG_RADIO_X_DATA, (tx_buf && xfr->len >= 2) ? tx_buf[1] : 0);
> + ret = regmap_write(priv->regmap, ssx->regs + REG_RADIO_X_DATA, 
> (tx_buf && xfr->len >= 2) ? tx_buf[1] : 0);
>   if (ret) {
>   dev_err(>dev, "SPI radio data write failed\n");
>   return ret;
> @@ -271,7 +205,7 @@ static int sx1301_radio_spi_transfer_one(struct 
> spi_controller *ctrl,
>   }
>  
>   if (rx_buf) {
> - ret = sx1301_page_read(priv, ssx->page, ssx->regs + 
> REG_RADIO_X_DATA_READBACK, &rx_buf[xfr->len - 1]);
> + ret = regmap_read(priv->regmap, ssx->regs + 
> REG_RADIO_X_DATA_READBACK, &rx_buf[xfr->len - 1]);
>   if (ret) {
>   dev_err(>dev, "SPI radio data read failed\n");
>   return ret;

Fixing by adding a local variable instead:

@@ -239,6 +163,7 @@ static int sx1301_radio_spi_transfer_one(struct
spi_controll
er *ctrl,
struct sx1301_priv *priv = netdev_priv(netdev);
const u8 *tx_buf = xfr->tx_buf;
u8 *rx_buf = xfr->rx_buf;
+   unsigned int val;
int ret;

if (xfr->len == 0 || xfr->len > 3)
[...]
@@ -273,27 +198,28 @@ static int sx1301_radio_spi_transfer_one(struct
spi_controller *ctrl,
}

if (rx_buf) {
-   ret = sx1301_page_read(priv, ssx->page, ssx->regs +
REG_RADIO_X_DATA_READBACK, &rx_buf[xfr->len - 1]);
+   ret = regmap_read(priv->regmap, ssx->regs +
REG_RADIO_X_DATA_READBACK, &val);
if (ret) {
dev_err(>dev, "SPI radio data read failed\n");
return ret;
}
+   rx_buf[xfr->len - 1] = val & 0xff;
}

return 0;

[...]
> diff --git a/drivers/net/lora/sx1301.h b/drivers/net/lora/sx1301.h
> index 2fc283f..b21e5c6 100644
> --- a/drivers/net/lora/sx1301.h
> +++ b/drivers/net/lora/sx1301.h
> @@ -18,11 +18,41 @@
>  /* Page independent */
>  #define SX1301_PAGE 0x00
>  #define SX1301_VER  0x01
> +#define SX1301_MPA  0x09

Those are the official register names? I find these much harder to read
than my guessed names. Could we keep the long names as aliases?

> +#define SX1301_MPD  0x0A
> +#define SX1301_GEN  0x10
> +#define SX1301_CKEN 0x11
> +#define SX1301_GPSO 0x1C
> +#define SX1301_GPMODE   0x1D
> +#define SX1301_AGCSTS   0x20
>  
>  #define SX1301_VIRT_BASE0x100
>  #define SX1301_PAGE_LEN 0x80
>  #define SX1301_PAGE_BASE(n) (SX1301_VIRT_BASE + (SX1301_PAGE_LEN * n))
>  
> +/* Page 0 */
> +#define SX1301_CHRS (SX1301_PAGE_BASE(0) + 0x23)
> +#define SX1301_FORCE_CTRL   (SX1301_PAGE_BASE(0) + 0x69)
> +#define SX1301_MCU_CTRL (SX1301_PAGE_BASE(0) + 0x6A)
> +
> +/* Page 2 */
> +#define 

Re: [PATCH net-next v2] net: allow to call netif_reset_xps_queues() under cpus_read_lock

2018-08-09 Thread Michael S. Tsirkin
On Wed, Aug 08, 2018 at 08:07:35PM -0700, Andrei Vagin wrote:
> From: Andrei Vagin 
> 
> The definition of static_key_slow_inc() has cpus_read_lock in place. In the
> virtio_net driver, XPS queues are initialized after setting the queue:cpu
> affinity in virtnet_set_affinity() which is already protected within
> cpus_read_lock. Lockdep prints a warning when we are trying to acquire
> cpus_read_lock when it is already held.
> 
> This patch adds an ability to call __netif_set_xps_queue under
> cpus_read_lock().
> 
> 
> WARNING: possible recursive locking detected
> 4.18.0-rc3-next-20180703+ #1 Not tainted
> 
> swapper/0/1 is trying to acquire lock:
> cf973d46 (cpu_hotplug_lock.rw_sem){}, at: 
> static_key_slow_inc+0xe/0x20
> 
> but task is already holding lock:
> cf973d46 (cpu_hotplug_lock.rw_sem){}, at: init_vqs+0x513/0x5a0
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>CPU0
>
>   lock(cpu_hotplug_lock.rw_sem);
>   lock(cpu_hotplug_lock.rw_sem);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by swapper/0/1:
>  #0: 244bc7da (&dev->mutex){}, at: __driver_attach+0x5a/0x110
>  #1: cf973d46 (cpu_hotplug_lock.rw_sem){}, at: 
> init_vqs+0x513/0x5a0
>  #2: 5cd8463f (xps_map_mutex){+.+.}, at: 
> __netif_set_xps_queue+0x8d/0xc60
> 
> v2: move cpus_read_lock() out of __netif_set_xps_queue()

FYI your change log should go after the "---" below, not before it.

> Cc: "Nambiar, Amritha" 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx 
> queue")
> 
> Signed-off-by: Andrei Vagin 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/net/virtio_net.c |  4 +++-
>  net/core/dev.c   | 20 +++-
>  net/core/net-sysfs.c |  4 
>  3 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 62311dde6e71..39a7f4452587 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1903,9 +1903,11 @@ static void virtnet_set_affinity(struct virtnet_info 
> *vi)
>  
>   i = 0;
>   for_each_online_cpu(cpu) {
> + const unsigned long *mask = cpumask_bits(cpumask_of(cpu));
> +
>   virtqueue_set_affinity(vi->rq[i].vq, cpu);
>   virtqueue_set_affinity(vi->sq[i].vq, cpu);
> - netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
> + __netif_set_xps_queue(vi->dev, mask, i, false);
>   i++;
>   }
>  
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f68122f0ab02..325fc5088370 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2176,6 +2176,7 @@ static void netif_reset_xps_queues(struct net_device 
> *dev, u16 offset,
>   if (!static_key_false(&xps_needed))
>   return;
>  
> + cpus_read_lock();
>   mutex_lock(&xps_map_mutex);
>  
>   if (static_key_false(&xps_rxqs_needed)) {
> @@ -2199,10 +2200,11 @@ static void netif_reset_xps_queues(struct net_device 
> *dev, u16 offset,
>  
>  out_no_maps:
>   if (static_key_enabled(&xps_rxqs_needed))
> - static_key_slow_dec(&xps_rxqs_needed);
> + static_key_slow_dec_cpuslocked(&xps_rxqs_needed);
>  
> - static_key_slow_dec(&xps_needed);
> + static_key_slow_dec_cpuslocked(&xps_needed);
>   mutex_unlock(&xps_map_mutex);
> + cpus_read_unlock();
>  }
>  
>  static void netif_reset_xps_queues_gt(struct net_device *dev, u16 index)
> @@ -2250,6 +2252,7 @@ static struct xps_map *expand_xps_map(struct xps_map 
> *map, int attr_index,
>   return new_map;
>  }
>  
> +/* Must be called under cpus_read_lock */
>  int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
> u16 index, bool is_rxqs_map)
>  {
> @@ -2317,9 +2320,9 @@ int __netif_set_xps_queue(struct net_device *dev, const 
> unsigned long *mask,
>   if (!new_dev_maps)
>   goto out_no_new_maps;
>  
> - static_key_slow_inc(&xps_needed);
> + static_key_slow_inc_cpuslocked(&xps_needed);
>   if (is_rxqs_map)
> - static_key_slow_inc(&xps_rxqs_needed);
> + static_key_slow_inc_cpuslocked(&xps_rxqs_needed);
>  
>   for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
>j < nr_ids;) {
> @@ -2448,11 +2451,18 @@ int __netif_set_xps_queue(struct net_device *dev, 
> const unsigned long *mask,
>   kfree(new_dev_maps);
>   return -ENOMEM;
>  }
> +EXPORT_SYMBOL_GPL(__netif_set_xps_queue);
>  
>  int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
>   u16 index)
>  {
> - return __netif_set_xps_queue(dev, cpumask_bits(mask), index, false);
> + int ret;
> +
> + cpus_read_lock();
> + ret =  __netif_set_xps_queue(dev, cpumask_bits(mask), index, false);
> + 

Re: KCM - recvmsg() mangles packets?

2018-08-09 Thread Dominique Martinet
Tom Herbert wrote on Thu, Aug 09, 2018:
> > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > index 625acb27efcc..348ff5945591 100644
> > --- a/net/strparser/strparser.c
> > +++ b/net/strparser/strparser.c
> > @@ -222,6 +222,16 @@ static int __strp_recv(read_descriptor_t *desc, struct 
> > sk_buff *orig_skb,
> > if (!stm->strp.full_len) {
> > ssize_t len;
> >
> > +   /* Can only parse if there is no offset */
> > +   if (unlikely(stm->strp.offset)) {
> > +   if (!pskb_pull(skb, stm->strp.offset)) {
> > +   
> > STRP_STATS_INCR(strp->stats.mem_fail);
> > +   strp_parser_err(strp, -ENOMEM, 
> > desc);
> > +   break;
> > +   }
> > +   stm->strp.offset = 0;
> > +   }
> > +
> 
> Seems okay to me for a fix.

Hmm, if you say so, I'll send this as a patch for broader comments right
away.

> Looks like strp.offset is only set in one place and read in one
> place. With this pull maybe that just can go away?

Good point, when strp.offset is set the full_len is also being init'd so
we will necessarily do the pull next...

But the way tls uses strparser is also kind of weird, since they modify
the strp_msg's offset and full_len, I wouldn't want to assume we can't
have full_len == 0 *again* later with a non zero offset...
On the other hand they do handle non-zero offset in their parse function
so they'd be ok with that... Ultimately it's probably closer to a design
choice than anything else.


I'll still send a v0 of the patch as is, because I feel it's easier to
understand that we pull because the existing parse_msg functions do not
handle it properly, and will write a note that I intend to move it up a
few lines as a comment.
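A minimal userspace model of the offset bug discussed in this thread: a parse callback that assumes the record starts at data[0] misreads when the record actually begins at a nonzero offset, and pulling the offset first (as in the proposed __strp_recv() change) restores that assumption. `mock_buf` and the 2-byte length header are invented for illustration; the real code deals with strp_msg and skbs:

```c
#include <assert.h>

struct mock_buf {
	const unsigned char *data;
	int len;
	int offset;	/* start of the record within data */
};

/* Naive parse callback: 2-byte big-endian length header at data[0]. */
static int parse_len_at_start(const struct mock_buf *b)
{
	return (b->data[0] << 8) | b->data[1];
}

/* "Pull": drop offset bytes so the record again starts at data[0],
 * analogous to pskb_pull() before calling the parser. */
static void pull_offset(struct mock_buf *b)
{
	b->data += b->offset;
	b->len -= b->offset;
	b->offset = 0;
}
```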


Thanks,
-- 
Dominique Martinet


Re: [PATCH lora-next v2 7/8] net: lora: sx1301: add initial registration for regmap

2018-08-09 Thread Andreas Färber
Am 09.08.2018 um 14:33 schrieb Ben Whitten:
> The register and bit-field definitions are taken from the SX1301
> datasheet version 2.01 dated June 2014 with the revision information
> 'First released version'.
> 
> The reset state and RW capability of each field is not reflected in this
> patch however from the datasheet:
> "Bits and registers that are not documented are reserved. They may
> include calibration values. It is important not to modify these bits and
> registers. If specific bits must be changed in a register with reserved
> bits, the register must be read first, specific bits modified while
> masking reserved bits and then the register can be written."
> 
> Then goes on to state:
> "Reserved bits should be written with their reset state, they may be
> read different states."
> 
> Caching is currently disabled.
> 
> The version is read back using regmap_read to verify regmap operation,
> in doing so needs to be moved after priv and regmap allocation.
> 
> Further registers or fields are added as they are required in conversion.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/Kconfig  |  1 +
>  drivers/net/lora/sx1301.c | 46 ++
>  drivers/net/lora/sx1301.h | 10 ++
>  3 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/lora/Kconfig b/drivers/net/lora/Kconfig
> index bb57a01..79d23f2 100644
> --- a/drivers/net/lora/Kconfig
> +++ b/drivers/net/lora/Kconfig
> @@ -49,6 +49,7 @@ config LORA_SX1301
>   tristate "Semtech SX1301 SPI driver"
>   default y
>   depends on SPI
> + select REGMAP_SPI
>   help
> Semtech SX1301
>  
> diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> index 8e81179..766df06 100644
> --- a/drivers/net/lora/sx1301.c
> +++ b/drivers/net/lora/sx1301.c
> @@ -20,11 +20,11 @@
>  #include 
>  #include 
>  #include 
> +#include 

Misordered.

>  
>  #include "sx1301.h"
>  
>  #define REG_PAGE_RESET   0
> -#define REG_VERSION  1
>  #define REG_MCU_PROM_ADDR9
>  #define REG_MCU_PROM_DATA10
>  #define REG_GPIO_SELECT_INPUT27
> @@ -68,6 +68,35 @@
>  
>  #define REG_EMERGENCY_FORCE_HOST_CTRLBIT(0)
>  
> +static const struct regmap_range_cfg sx1301_ranges[] = {

Let's rename to _regmap_ranges for consistency.
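As an aside, the paged access these ranges describe can be modeled minimally. The constants follow the sx1301.h hunk in this patch; the array-backed "bus", the single shared window, and the translation helper are simplifications for the sketch, not the chip or the regmap core:

```c
#include <assert.h>

#define SX1301_PAGE      0x00
#define SX1301_VIRT_BASE 0x100
#define SX1301_PAGE_LEN  0x80

static unsigned char bus[4][SX1301_PAGE_LEN];	/* page-selected windows */
static int cur_page;

static void bus_write(unsigned int reg, unsigned char val)
{
	if (reg == SX1301_PAGE)
		cur_page = val;			/* page-select register */
	else
		bus[cur_page][reg] = val;	/* access within current window */
}

/* What a range-aware regmap does: translate a flat virtual address
 * into a page select plus an in-window offset. */
static void paged_write(unsigned int vreg, unsigned char val)
{
	if (vreg >= SX1301_VIRT_BASE) {
		unsigned int off = vreg - SX1301_VIRT_BASE;

		bus_write(SX1301_PAGE, off / SX1301_PAGE_LEN);
		bus_write(off % SX1301_PAGE_LEN, val);
	} else {
		bus_write(vreg, val);		/* page-independent register */
	}
}
```

Callers then use flat addresses like SX1301_PAGE_BASE(n) + reg and never touch the page register directly.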

> @@ -81,6 +110,7 @@ struct sx1301_priv {
>   struct gpio_desc *rst_gpio;
>   u8 cur_page;
>   struct spi_controller *radio_a_ctrl, *radio_b_ctrl;
> + struct regmap   *regmap;

Note: We need a consistent style. Either whitespace or tabs, not both
depending on author. Same in an earlier patch. Problem with tabs is that
at some point it's always one tab too little, but we can try it.

Applied.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH bpf-next] bpf: enable btf for use in all maps

2018-08-09 Thread Alexei Starovoitov
On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote:
> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote:
> > On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
> >> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
> >> the basic arraymap") enabled support for BTF and dumping via
> >> BPF fs for arraymap. However, both can be decoupled from each
> >> other such that all BPF maps can be supported for attaching
> >> BTF key/value information, while not all maps necessarily
> >> need to dump via map_seq_show_elem() callback.
> >>
> >> The check in array_map_check_btf() can be generalized as
> >> ultimatively the key and value size is the only contraint
> >> that needs to match for the map. The fact that the key needs
> >> to be of type int is optional; it could be any data type as
> >> long as it matches the 4 byte key size, just like hash table
> >> key or others could be of any data type as well.
> >>
> >> Minimal example of a hash table dump which then works out
> >> of the box for bpftool:
> >>
> >>   # bpftool map dump id 19
> >>   [{
> >>   "key": {
> >>   "": {
> >>   "vip": 0,
> >>   "vipv6": []
> >>   },
> >>   "port": 0,
> >>   "family": 0,
> >>   "proto": 0
> >>   },
> >>   "value": {
> >>   "flags": 0,
> >>   "vip_num": 0
> >>   }
> >>   }
> >>   ]
> >>
> >> Signed-off-by: Daniel Borkmann 
> >> Cc: Yonghong Song 
> >> ---
> >>  include/linux/bpf.h   |  4 +---
> >>  kernel/bpf/arraymap.c | 27 ---
> >>  kernel/bpf/inode.c|  3 ++-
> >>  kernel/bpf/syscall.c  | 24 
> >>  4 files changed, 23 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index cd8790d..eb76e8e 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -48,8 +48,6 @@ struct bpf_map_ops {
> >>u32 (*map_fd_sys_lookup_elem)(void *ptr);
> >>void (*map_seq_show_elem)(struct bpf_map *map, void *key,
> >>  struct seq_file *m);
> >> -  int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
> >> -   u32 key_type_id, u32 value_type_id);
> >>  };
> >>  
> >>  struct bpf_map {
> >> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const 
> >> struct bpf_map *map)
> >>  
> >>  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
> >>  {
> >> -  return map->ops->map_seq_show_elem && map->ops->map_check_btf;
> >> +  return map->btf && map->ops->map_seq_show_elem;
> >>  }
> >>  
> >>  extern const struct bpf_map_ops bpf_map_offload_ops;
> >> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> >> index 2aa55d030..67f0bdf 100644
> >> --- a/kernel/bpf/arraymap.c
> >> +++ b/kernel/bpf/arraymap.c
> >> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map 
> >> *map, void *key,
> >>rcu_read_unlock();
> >>  }
> >>  
> >> -static int array_map_check_btf(const struct bpf_map *map, const struct 
> >> btf *btf,
> >> - u32 btf_key_id, u32 btf_value_id)
> >> -{
> >> -  const struct btf_type *key_type, *value_type;
> >> -  u32 key_size, value_size;
> >> -  u32 int_data;
> >> -
> >> -  key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
> >> -  if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> >> -  return -EINVAL;
> >> -
> >> -  int_data = *(u32 *)(key_type + 1);
> >> -  /* bpf array can only take a u32 key.  This check makes
> >> -   * sure that the btf matches the attr used during map_create.
> >> -   */
> >> -  if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
> >> -  BTF_INT_OFFSET(int_data))
> >> -  return -EINVAL;
> > 
> > I think most of these checks are still necessary for array type.
> > Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
> > is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't makes 
> > sense.
> 
> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array,
> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY
> could be array of u8 foo[4], for example, or u16 foo[2]. But how would
> it ultimately be different from e.g. having 'struct a' versus 'struct b'
> where both are of same size and while actual key has 'struct a', the one
> who writes the prog resp. loads the BTF into the kernel would lie about
> it stating it's of type 'struct b' instead? It's basically trusting the
> app that it advertised sane key types which kernel is propagating back.

for hash map - yes. the kernel cannot yet catch the lie that
key == 'struct a' that user said in BTF is not what program used
(which used 'struct b' of the same size).
Eventually we will annotate all load/store in the program and will
make sure that memory access match what BTF said.
For array we can catch the lie today that key is not 

Re: [Patch net-next] net_sched: fix a potential out-of-bound access

2018-08-09 Thread Cong Wang
On Thu, Aug 9, 2018 at 12:32 AM Vlad Buslov  wrote:
>
> Before version V5 of my action API patchset this functionality was
> implemented in exactly the same way as in your patch. Unfortunately, it
> has a double-free bug. The problem is that if you have multiple
> actions(N) being deleted, and deleted succeeded for first K actions,
> this implementation will try to delete all N actions second time
> (including first K actions that were already deleted). That is why I
> added 'acts_deleted' variable that tracks actual amount of actions that
> were deleted successfully, and only delete last N-K actions in case of
> error.

Interesting, I didn't notice you call it for tcf_del_notify()'s failure too.

But this is easy to resolve, we can just set succeeded ones to NULL
and teach tcf_action_put_many() to scan the whole array but
skip NULL's.
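The NULL-marking scheme suggested above can be sketched in plain C: each slot is NULLed as its delete succeeds, and the put_many()-style cleanup walks the whole array skipping NULLs, so a rollback after partial failure never touches an already-deleted entry twice. struct action, the refcounts, and the failure condition are stand-ins, not the tc action structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct action { int refcnt; };

static int delete_action(struct action *a)
{
	if (a->refcnt != 1)
		return -1;	/* model one entry failing to delete */
	free(a);
	return 0;
}

/* Walk the whole array, skipping slots already NULLed by a
 * successful delete. */
static void put_many_skip_null(struct action **acts, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (acts[i])
			acts[i]->refcnt--;
}

static int delete_many(struct action **acts, int n)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = delete_action(acts[i]);
		if (ret)
			return ret;	/* earlier slots are NULL, rest held */
		acts[i] = NULL;		/* mark deleted so cleanup skips it */
	}
	return 0;
}
```

This avoids the acts_deleted counter entirely: the array itself records how far the delete got.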


>
> In order to fix that issue I did following code changes in V5:
> - Added 'acts_deleted' variable to delete only actions [K, N) in case of
> error.
> - Extended 'actions' array size by one to ensure that it always ends
> with NULL pointer.

Oh, I see, this is not how we use C; you could at least roll back
by passing acts_deleted as a parameter for the start of the array.
You picked the most confusing way to handle it.

I will send an updated patch.


Re: [PATCH bpf-next] bpf: enable btf for use in all maps

2018-08-09 Thread Daniel Borkmann
On 08/09/2018 11:14 PM, Alexei Starovoitov wrote:
> On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
>> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
>> the basic arraymap") enabled support for BTF and dumping via
>> BPF fs for arraymap. However, both can be decoupled from each
>> other such that all BPF maps can be supported for attaching
>> BTF key/value information, while not all maps necessarily
>> need to dump via map_seq_show_elem() callback.
>>
>> The check in array_map_check_btf() can be generalized as
>> ultimatively the key and value size is the only contraint
>> that needs to match for the map. The fact that the key needs
>> to be of type int is optional; it could be any data type as
>> long as it matches the 4 byte key size, just like hash table
>> key or others could be of any data type as well.
>>
>> Minimal example of a hash table dump which then works out
>> of the box for bpftool:
>>
>>   # bpftool map dump id 19
>>   [{
>>   "key": {
>>   "": {
>>   "vip": 0,
>>   "vipv6": []
>>   },
>>   "port": 0,
>>   "family": 0,
>>   "proto": 0
>>   },
>>   "value": {
>>   "flags": 0,
>>   "vip_num": 0
>>   }
>>   }
>>   ]
>>
>> Signed-off-by: Daniel Borkmann 
>> Cc: Yonghong Song 
>> ---
>>  include/linux/bpf.h   |  4 +---
>>  kernel/bpf/arraymap.c | 27 ---
>>  kernel/bpf/inode.c|  3 ++-
>>  kernel/bpf/syscall.c  | 24 
>>  4 files changed, 23 insertions(+), 35 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index cd8790d..eb76e8e 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -48,8 +48,6 @@ struct bpf_map_ops {
>>  u32 (*map_fd_sys_lookup_elem)(void *ptr);
>>  void (*map_seq_show_elem)(struct bpf_map *map, void *key,
>>struct seq_file *m);
>> -int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
>> - u32 key_type_id, u32 value_type_id);
>>  };
>>  
>>  struct bpf_map {
>> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct 
>> bpf_map *map)
>>  
>>  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
>>  {
>> -return map->ops->map_seq_show_elem && map->ops->map_check_btf;
>> +return map->btf && map->ops->map_seq_show_elem;
>>  }
>>  
>>  extern const struct bpf_map_ops bpf_map_offload_ops;
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 2aa55d030..67f0bdf 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map 
>> *map, void *key,
>>  rcu_read_unlock();
>>  }
>>  
>> -static int array_map_check_btf(const struct bpf_map *map, const struct btf 
>> *btf,
>> -   u32 btf_key_id, u32 btf_value_id)
>> -{
>> -const struct btf_type *key_type, *value_type;
>> -u32 key_size, value_size;
>> -u32 int_data;
>> -
>> -key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
>> -if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
>> -return -EINVAL;
>> -
>> -int_data = *(u32 *)(key_type + 1);
>> -/* bpf array can only take a u32 key.  This check makes
>> - * sure that the btf matches the attr used during map_create.
>> - */
>> -if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
>> -BTF_INT_OFFSET(int_data))
>> -return -EINVAL;
> 
> I think most of these checks are still necessary for array type.
> Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
> is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't make
> sense.

Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array,
on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY
could be array of u8 foo[4], for example, or u16 foo[2]. But how would
it ultimately be different from e.g. having 'struct a' versus 'struct b'
where both are of same size and while actual key has 'struct a', the one
who writes the prog resp. loads the BTF into the kernel would lie about
it stating it's of type 'struct b' instead? It's basically trusting the
app that it advertised sane key types which kernel is propagating back.

Thanks,
Daniel


Re: [PATCH net-next] cxgb4: update 1.20.8.0 as the latest firmware supported

2018-08-09 Thread David Miller
From: Ganesh Goudar 
Date: Thu,  9 Aug 2018 12:32:03 +0530

> Change t4fw_version.h to update latest firmware version
> number to 1.20.8.0.
> 
> Signed-off-by: Ganesh Goudar 

Applied, thank you.


Re: [PATCH net-next v2] net: allow to call netif_reset_xps_queues() under cpus_read_lock

2018-08-09 Thread David Miller
From: Andrei Vagin 
Date: Wed,  8 Aug 2018 20:07:35 -0700

> From: Andrei Vagin 
> 
> The definition of static_key_slow_inc() has cpus_read_lock in place. In the
> virtio_net driver, XPS queues are initialized after setting the queue:cpu
> affinity in virtnet_set_affinity() which is already protected within
> cpus_read_lock. Lockdep prints a warning when we are trying to acquire
> cpus_read_lock when it is already held.
> 
> This patch adds an ability to call __netif_set_xps_queue under
> cpus_read_lock().
 ...

Applied, thank you.


Re: [PATCH net-next v2 1/1] net/tls: Combined memory allocation for decryption request

2018-08-09 Thread David Miller
From: Vakul Garg 
Date: Thu,  9 Aug 2018 04:56:23 +0530

> For preparing decryption request, several memory chunks are required
> (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
> an accelerator, it is required that the buffers which are read by the
> accelerator must be dma-able and not come from stack. The buffers for
> aad and iv can be separately kmalloced each, but it is inefficient.
> This patch does a combined allocation for preparing the decryption request
> and then segments it into aead_req || sgin || sgout || iv || aad.
> 
> Signed-off-by: Vakul Garg 
> ---
> This patch needs to be applied over Doron Roberts-Kedes's patch.
>   net/tls: Calculate nsg for zerocopy path without skb_cow_data.

That's going to have many changes, I gave feedback on it yesterday.

Please do not post patches which have pre-requisites which are in
the process of changing or similar as that makes a lot more work
for me and you are also asking people to review changes on top
of code which is going to change.

Thanks.


Re: [PATCH lora-next v2 3/8] net: lora: sx1301: convert to passing priv data throughout

2018-08-09 Thread Andreas Färber
Am 09.08.2018 um 23:06 schrieb Ben Whitten:
> On Thu, 9 Aug 2018 at 21:43, Andreas Färber  wrote:
>> Am 09.08.2018 um 14:33 schrieb Ben Whitten:
>>> @@ -654,22 +646,35 @@ static int sx1301_probe(struct spi_device *spi)
>>>   priv->rst_gpio = rst;
>>>   priv->cur_page = 0xff;
>>>
>>> - spi_set_drvdata(spi, netdev);
>>> + spi_set_drvdata(spi, priv);
>>
>> This change seems unnecessary and counter-productive for unregistration.
>>
>> Otherwise applying.
> 
> This is actually pretty critical, as it stands with the two spi masters we use
> spi_get_drvdata on the parent device of the controller to recover the priv
> struct for regmap.
> 
> We may have to include the netdev in the priv data, or do a container_of
> dance to recover netdev in unregistration.
> That said if we wrap things in devm then really our remove function could
> be empty, as we have done with the allocation.

Thanks for quickly noticing. This should compensate:

diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
index 0ba841f8e7cd..43cd2308e41c 100644
--- a/drivers/net/lora/sx1301.c
+++ b/drivers/net/lora/sx1301.c
@@ -164,7 +164,8 @@ static int sx1301_soft_reset(struct sx1301_priv *priv)
 static int sx1301_radio_set_cs(struct spi_controller *ctrl, bool enable)
 {
struct spi_sx1301 *ssx = spi_controller_get_devdata(ctrl);
-   struct sx1301_priv *priv = spi_get_drvdata(ssx->parent);
+   struct net_device *netdev = spi_get_drvdata(ssx->parent);
+   struct sx1301_priv *priv = netdev_priv(netdev);
u8 cs;
int ret;

@@ -204,7 +205,8 @@ static int sx1301_radio_spi_transfer_one(struct
spi_controller *ctrl,
struct spi_device *spi, struct spi_transfer *xfr)
 {
struct spi_sx1301 *ssx = spi_controller_get_devdata(ctrl);
-   struct sx1301_priv *priv = spi_get_drvdata(ssx->parent);
+   struct net_device *netdev = spi_get_drvdata(ssx->parent);
+   struct sx1301_priv *priv = netdev_priv(netdev);
const u8 *tx_buf = xfr->tx_buf;
u8 *rx_buf = xfr->rx_buf;
int ret;

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 net-next] net: phy: sfp: print debug message with text, not numbers

2018-08-09 Thread David Miller
From: Andrew Lunn 
Date: Thu,  9 Aug 2018 15:00:20 +0200

> Convert the state numbers, device state, etc from numbers to strings
> when printing debug messages.
> 
> Signed-off-by: Andrew Lunn 
> Acked-by: Florian Fainelli 
> ---
> v2: Fixed typo in subject line.
> Add Acked-by from Florian

Grrr, I committed and pushed out the version with the Subject header
typo.

I could revert and apply this one but that would just make things
look worse.

Sorry, will be more careful next time!


Re: [PATCH bpf-next] bpf: enable btf for use in all maps

2018-08-09 Thread Alexei Starovoitov
On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
> the basic arraymap") enabled support for BTF and dumping via
> BPF fs for arraymap. However, both can be decoupled from each
> other such that all BPF maps can be supported for attaching
> BTF key/value information, while not all maps necessarily
> need to dump via map_seq_show_elem() callback.
> 
> The check in array_map_check_btf() can be generalized as
> ultimately the key and value size is the only constraint
> that needs to match for the map. The fact that the key needs
> to be of type int is optional; it could be any data type as
> long as it matches the 4 byte key size, just like hash table
> key or others could be of any data type as well.
> 
> Minimal example of a hash table dump which then works out
> of the box for bpftool:
> 
>   # bpftool map dump id 19
>   [{
>   "key": {
>   "": {
>   "vip": 0,
>   "vipv6": []
>   },
>   "port": 0,
>   "family": 0,
>   "proto": 0
>   },
>   "value": {
>   "flags": 0,
>   "vip_num": 0
>   }
>   }
>   ]
> 
> Signed-off-by: Daniel Borkmann 
> Cc: Yonghong Song 
> ---
>  include/linux/bpf.h   |  4 +---
>  kernel/bpf/arraymap.c | 27 ---
>  kernel/bpf/inode.c|  3 ++-
>  kernel/bpf/syscall.c  | 24 
>  4 files changed, 23 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index cd8790d..eb76e8e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -48,8 +48,6 @@ struct bpf_map_ops {
>   u32 (*map_fd_sys_lookup_elem)(void *ptr);
>   void (*map_seq_show_elem)(struct bpf_map *map, void *key,
> struct seq_file *m);
> - int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
> -  u32 key_type_id, u32 value_type_id);
>  };
>  
>  struct bpf_map {
> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct 
> bpf_map *map)
>  
>  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
>  {
> - return map->ops->map_seq_show_elem && map->ops->map_check_btf;
> + return map->btf && map->ops->map_seq_show_elem;
>  }
>  
>  extern const struct bpf_map_ops bpf_map_offload_ops;
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 2aa55d030..67f0bdf 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, 
> void *key,
>   rcu_read_unlock();
>  }
>  
> -static int array_map_check_btf(const struct bpf_map *map, const struct btf 
> *btf,
> -u32 btf_key_id, u32 btf_value_id)
> -{
> - const struct btf_type *key_type, *value_type;
> - u32 key_size, value_size;
> - u32 int_data;
> -
> - key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
> - if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> - return -EINVAL;
> -
> - int_data = *(u32 *)(key_type + 1);
> - /* bpf array can only take a u32 key.  This check makes
> -  * sure that the btf matches the attr used during map_create.
> -  */
> - if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
> - BTF_INT_OFFSET(int_data))
> - return -EINVAL;

I think most of these checks are still necessary for array type.
Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't make
sense.

For hash maps we probably need hash specific checks too. Otherwise
such sanity checks would need to be in kernel pretty printer and later
in user space too (bpftool and everything that will consume BTF),
since user space won't be able to trust kernel with sane key types.
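[Editor's note: the generalized check being discussed — any BTF key/value type is acceptable as long as its resolved size matches the map's key/value size, while architecture-dependent kinds like pointers remain problematic — can be sketched as a plain C predicate. The names below are illustrative stand-ins, not the kernel's actual BTF API.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BTF kinds; not the kernel's definitions. */
enum btf_kind { KIND_INT, KIND_ENUM, KIND_PTR, KIND_ARRAY, KIND_STRUCT };

struct btf_type_info {
	enum btf_kind kind;
	size_t size;	/* resolved byte size of the type */
};

/* Generalized map_check_btf-style check: the BTF key/value sizes must
 * match the map's key/value sizes.  Pointer keys are rejected because
 * their size is architecture dependent (the 32-bit vs 64-bit concern
 * raised in the thread). */
static int map_check_btf_sketch(const struct btf_type_info *key,
				const struct btf_type_info *value,
				size_t map_key_size, size_t map_value_size)
{
	if (key->kind == KIND_PTR)
		return -1;
	if (key->size != map_key_size || value->size != map_value_size)
		return -1;
	return 0;
}
```

With a 4-byte enum key this accepts, matching the point that relaxing BTF_KIND_INT to BTF_KIND_ENUM is probably fine.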



[PATCH] bonding: avoid repeated display of same link status change

2018-08-09 Thread rama nichanamatlu

From 9927a1c2a632d9479a80c63b7d9fda59ea8374bc Mon Sep 17 00:00:00 2001
From: Rama Nichanamatlu 
Date: Tue, 31 Jul 2018 07:09:52 -0700
Subject: [PATCH] bonding: avoid repeated display of same link status change

When a link status change needs to be committed and the rtnl lock cannot be
taken, avoid re-displaying the same link status change message.

Signed-off-by: Rama Nichanamatlu 
---
 drivers/net/bonding/bond_main.c |    6 --
 include/net/bonding.h   |    1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c 
b/drivers/net/bonding/bond_main.c

index 63e3844..3dd1091 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2086,7 +2086,7 @@ static int bond_miimon_inspect(struct bonding *bond)
    bond_propose_link_state(slave, BOND_LINK_FAIL);
    commit++;
    slave->delay = bond->params.downdelay;
-   if (slave->delay) {
+   if (slave->delay && !bond->rtnl_needed) {
    netdev_info(bond->dev, "link status down for %sinterface %s, disabling it in %d ms\n",
    (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) ?
@@ -2126,7 +2126,7 @@ static int bond_miimon_inspect(struct bonding *bond)
    commit++;
    slave->delay = bond->params.updelay;

-   if (slave->delay) {
+   if (slave->delay && !bond->rtnl_needed) {
    netdev_info(bond->dev, "link status up for interface %s, enabling it in %d ms\n",
    slave->dev->name,
    ignore_updelay ? 0 :
@@ -2300,9 +2300,11 @@ static void bond_mii_monitor(struct work_struct *work)

    if (!rtnl_trylock()) {
    delay = 1;
    should_notify_peers = false;
+   bond->rtnl_needed = true;
    goto re_arm;
    }

+   bond->rtnl_needed = false;
    bond_for_each_slave(bond, slave, iter) {
    bond_commit_link_state(slave, BOND_SLAVE_NOTIFY_LATER);
    }
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 808f1d1..4e76e5d 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -234,6 +234,7 @@ struct bonding {
    struct   dentry *debug_dir;
 #endif /* CONFIG_DEBUG_FS */
    struct rtnl_link_stats64 bond_stats;
+   u8 rtnl_needed;
 };

 #define bond_slave_get_rcu(dev) \
--
1.7.1
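
[Editor's note: the fix above latches a flag while the rtnl lock is unavailable so the same log line is not emitted again on each re-inspection. The pattern can be illustrated in isolation; the userspace sketch below uses hypothetical names and is not the bonding code itself.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bond state: rtnl_needed latches once
 * the commit is deferred, so a re-run of the inspect path does not
 * print the same message again. */
struct bond_state {
	bool rtnl_needed;
	int log_count;
};

static void inspect(struct bond_state *b, bool lock_available)
{
	if (!b->rtnl_needed)
		b->log_count++;		/* netdev_info(...) in the real code */
	if (!lock_available)
		b->rtnl_needed = true;	/* commit deferred; suppress repeats */
	else
		b->rtnl_needed = false;	/* committed; next event may log */
}
```

Retrying while the lock stays unavailable leaves log_count unchanged, which is exactly the duplicate-message suppression the patch aims for.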



Re: [patch net-next] net: sched: fix block->refcnt decrement

2018-08-09 Thread David Miller
From: Jiri Pirko 
Date: Wed,  8 Aug 2018 14:04:13 +0200

> From: Jiri Pirko 
> 
> Currently the refcnt is never decremented in case the value is not 1.
> Fix it by adding decrement in case the refcnt is not 1.
> 
> Reported-by: Vlad Buslov 
> Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation")
> Signed-off-by: Jiri Pirko 

Applied.


Re: [PATCH lora-next v2 6/8] net: lora: sx1301: replace version and size magic numbers with defines

2018-08-09 Thread Andreas Färber
Am 09.08.2018 um 14:33 schrieb Ben Whitten:
> We replace the hard coded numbers for size and version with meaningful
> names.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/sx1301.c | 21 +
>  drivers/net/lora/sx1301.h | 18 ++
>  2 files changed, 31 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/net/lora/sx1301.h
> 
> diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> index 916ee40..8e81179 100644
> --- a/drivers/net/lora/sx1301.c
> +++ b/drivers/net/lora/sx1301.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  
> +#include "sx1301.h"
> +
>  #define REG_PAGE_RESET   0
>  #define REG_VERSION  1
>  #define REG_MCU_PROM_ADDR9
> @@ -293,7 +295,7 @@ static int sx1301_load_firmware(struct sx1301_priv *priv, 
> int mcu, const struct
>   u8 val, rst, select_mux;
>   int ret;
>  
> - if (fw->size != 8192) {
> + if (fw->size != SX1301_MCU_FW_BYTE) {

I think that should be BYTES, but we can still rename it later.

>   dev_err(priv->dev, "Unexpected firmware size\n");
>   return -EINVAL;
>   }
[...]
> diff --git a/drivers/net/lora/sx1301.h b/drivers/net/lora/sx1301.h
> new file mode 100644
> index 000..b37ac56
> --- /dev/null
> +++ b/drivers/net/lora/sx1301.h
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later

Header files must use /* ... */ or checkpatch.pl complains.

> +/*
> + * Semtech SX1301 lora concentrator

LoRa

> + *
> + * Copyright (c) 2018   Ben Whitten

Any reason for the multiple whitespaces?

> + */
> +
> +#ifndef _SX1301_
> +#define _SX1301_
> +
> +#define SX1301_CHIP_VERSION 103
> +
> +#define SX1301_MCU_FW_BYTE 8192
> +#define SX1301_MCU_ARB_FW_VERSION 1
> +#define SX1301_MCU_AGC_FW_VERSION 4
> +#define SX1301_MCU_AGC_CAL_FW_VERSION 2
> +
> +#endif

Applied.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH net-next] net: ipv6_gre: Fix GRO to work on IPv6 over GRE tap

2018-08-09 Thread David Miller
From: Tariq Toukan 
Date: Wed,  8 Aug 2018 11:46:30 +0300

> IPv6 GRO over GRE tap is not working while GRO is not set
> over the native interface.
 ...
> This patch removes the override of the hard_header_len, and
> assigns the calculated value to needed_headroom.
> This way, the comparison in gro_list_prepare is really of
> the mac headers, and if the packets have the same mac headers
> the same_flow will be set to 1.
> 
> Performance testing: 45% higher bandwidth.
> Measuring bandwidth of single-stream IPv4 TCP traffic over IPv6
> GRE tap while GRO is not set on the native.
> NIC: ConnectX-4LX
> Before (GRO not working) : 7.2 Gbits/sec
> After (GRO working): 10.5 Gbits/sec
> 
> Signed-off-by: Maria Pasechnik 
> Signed-off-by: Tariq Toukan 

Applied, thank you.


Re: [PATCH lora-next v2 3/8] net: lora: sx1301: convert to passing priv data throughout

2018-08-09 Thread Ben Whitten
On Thu, 9 Aug 2018 at 21:43, Andreas Färber  wrote:
>
> Am 09.08.2018 um 14:33 schrieb Ben Whitten:
> > Instead of passing around the spi device we instead pass around our
> > driver data directly.
> >
> > Signed-off-by: Ben Whitten 
> > ---
> >  drivers/net/lora/sx1301.c | 305 
> > +++---
> >  1 file changed, 155 insertions(+), 150 deletions(-)
> >
> > diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> > index 3c09f5a..7324001 100644
> > --- a/drivers/net/lora/sx1301.c
> > +++ b/drivers/net/lora/sx1301.c
> > @@ -73,24 +73,26 @@ struct spi_sx1301 {
> >  };
> >
> >  struct sx1301_priv {
> > + struct device   *dev;
> > + struct spi_device   *spi;
>
> Obviously this is not a long-term solution, but as interim step it'll
> have to do.
>
> >   struct lora_priv lora;
> >   struct gpio_desc *rst_gpio;
> >   u8 cur_page;
> >   struct spi_controller *radio_a_ctrl, *radio_b_ctrl;
> >  };
> >
> > -static int sx1301_read_burst(struct spi_device *spi, u8 reg, u8 *val, 
> > size_t len)
> > +static int sx1301_read_burst(struct sx1301_priv *priv, u8 reg, u8 *val, 
> > size_t len)
> >  {
> >   u8 addr = reg & 0x7f;
> > - return spi_write_then_read(spi, &addr, 1, val, len);
> > + return spi_write_then_read(priv->spi, &addr, 1, val, len);
> >  }
> >
> > -static int sx1301_read(struct spi_device *spi, u8 reg, u8 *val)
> > +static int sx1301_read(struct sx1301_priv *priv, u8 reg, u8 *val)
> >  {
> > - return sx1301_read_burst(spi, reg, val, 1);
> > + return sx1301_read_burst(priv, reg, val, 1);
> >  }
> >
> > -static int sx1301_write_burst(struct spi_device *spi, u8 reg, const u8 
> > *val, size_t len)
> > +static int sx1301_write_burst(struct sx1301_priv *priv, u8 reg, const u8 
> > *val, size_t len)
> >  {
> >   u8 addr = reg | BIT(7);
> >   struct spi_transfer xfr[2] = {
>
> This hunk did not apply for some reason, I've manually re-applied it.
>
> [...]
> > @@ -654,22 +646,35 @@ static int sx1301_probe(struct spi_device *spi)
> >   priv->rst_gpio = rst;
> >   priv->cur_page = 0xff;
> >
> > - spi_set_drvdata(spi, netdev);
> > + spi_set_drvdata(spi, priv);
>
> This change seems unnecessary and counter-productive for unregistration.
>
> Otherwise applying.

This is actually pretty critical, as it stands with the two spi masters we use
spi_get_drvdata on the parent device of the controller to recover the priv
struct for regmap.

We may have to include the netdev in the priv data, or do a container_of
dance to recover netdev in unregistration.
That said if we wrap things in devm then really our remove function could
be empty, as we have done with the allocation.

Regards,
Ben


Re: [PATCH net-next 0/3] qed*: Enhancements

2018-08-09 Thread David Miller
From: Manish Chopra 
Date: Thu, 9 Aug 2018 11:13:48 -0700

> This patch series adds following support in drivers -
> 
> 1. Egress mqprio offload.
> 2. Add destination IP based flow profile.
> 3. Ingress flower offload (for drop action).
> 
> Please consider applying this series to "net-next".

Series applied, thank you.


Re: [PATCH net-next 0/7] s390/qeth: updates 2018-08-09

2018-08-09 Thread David Miller
From: Julian Wiedmann 
Date: Thu,  9 Aug 2018 14:47:57 +0200

> one more set of patches for net-next. This is all sorts of cleanups
> and more refactoring on the way to using netdev_priv. Please apply.

Looks simple enough, series applied, thanks.


Re: [PATCH lora-next v2 5/8] net: lora: sx1301: remove duplicate firmware size checks

2018-08-09 Thread Andreas Färber
Am 09.08.2018 um 14:33 schrieb Ben Whitten:
> No need to check the size of the firmware multiple times, just do it once
> in the function responsible for loading as the firmwares are the same size.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/sx1301.c | 21 +++--
>  1 file changed, 3 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> index 3f2a532..916ee40 100644
> --- a/drivers/net/lora/sx1301.c
> +++ b/drivers/net/lora/sx1301.c
> @@ -293,8 +293,10 @@ static int sx1301_load_firmware(struct sx1301_priv 
> *priv, int mcu, const struct
>   u8 val, rst, select_mux;
>   int ret;
>  
> - if (fw->size > 8192)
> + if (fw->size != 8192) {

Note the original intention here was to allow loading firmware smaller
than the maximum size, but we can revisit that later if we ever have
such a firmware.

Applied.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: KCM - recvmsg() mangles packets?

2018-08-09 Thread Tom Herbert
On Sun, Aug 5, 2018 at 4:39 PM, Dominique Martinet
 wrote:
> Dominique Martinet wrote on Sun, Aug 05, 2018:
>> It's getting late but I'll try adding a pskb_pull in there tomorrow, it
>> would be better to make the bpf program start with an offset but I don't
>> think that'll be easy to change...
>
> I can confirm the following patch fixes the issue for me:
> -8<-
> diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> index 625acb27efcc..348ff5945591 100644
> --- a/net/strparser/strparser.c
> +++ b/net/strparser/strparser.c
> @@ -222,6 +222,16 @@ static int __strp_recv(read_descriptor_t *desc, struct 
> sk_buff *orig_skb,
> if (!stm->strp.full_len) {
> ssize_t len;
>
> +   /* Can only parse if there is no offset */
> +   if (unlikely(stm->strp.offset)) {
> +   if (!pskb_pull(skb, stm->strp.offset)) {
> +   STRP_STATS_INCR(strp->stats.mem_fail);
> +   strp_parser_err(strp, -ENOMEM, desc);
> +   break;
> +   }
> +   stm->strp.offset = 0;
> +   }
> +

Seems okay to me for a fix. Looks like strp.offset is only set in one
place and read in one place. With this pull maybe that just can go
away?

Tom


> len = (*strp->cb.parse_msg)(strp, head);
>
> if (!len) {
> 8<--
>
> Now, I was looking at other users of strparser (I see sockmap, kcm and
> tls) and it looks like sockmap does not handle offsets either but tls
> does by using skb_copy_bits -- they're copying the tls header to a
> buffer on the stack.
>
> kcm cannot do that because we do not know how much data the user expects
> to read, and I'm not comfortable doing pskb_pull in the kcm callback
> either, but the cost of this pull is probably non-negligible if some
> user can make do without it...
>
> On the other hand, I do not see how to make the bpf program handle an
> offset in the skb as that offset is strparser-specific.
>
> Maybe add a flag in the cb that specifies whether the callback allows
> non-zero offset?
>
>
> I'll let you see if you can reproduce this and will wait for advices on
> how to solve this properly so we can work on a proper fix.
>
>
> Thanks,
> --
> Dominique
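
[Editor's note: the core of the fix is normalizing away a nonzero message offset before the parser callback runs, so the BPF parser always sees the message starting at offset zero. The control flow can be mimicked with a plain buffer; this is an illustrative sketch, not the kernel skb API.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message state mirroring strp_msg: payload may start at
 * a nonzero offset into buf, but the parser expects offset 0. */
struct msg {
	unsigned char buf[64];
	size_t len;	/* valid bytes in buf, including the offset region */
	size_t offset;	/* payload start within buf */
};

/* Analogous to the pskb_pull() in the patch: shift the payload to the
 * front of the buffer and clear the offset before parsing. */
static int normalize_offset(struct msg *m)
{
	if (m->offset) {
		if (m->offset > m->len)
			return -1;	/* mem_fail path in the real code */
		memmove(m->buf, m->buf + m->offset, m->len - m->offset);
		m->len -= m->offset;
		m->offset = 0;
	}
	return 0;
}
```

After this step the parser callback can assume its data begins at byte 0, which is the invariant __strp_recv restores before calling parse_msg.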


[PATCH net-next] liquidio: copperhead LED identification

2018-08-09 Thread Felix Manlunas
From: Raghu Vatsavayi 

Add LED identification support for liquidio TP copperhead cards.

Signed-off-by: Raghu Vatsavayi 
Acked-by: Derek Chickles 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 27 ++
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  1 +
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c 
b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index 06f7449..807ea2c 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -857,7 +857,14 @@ static int lio_set_phys_id(struct net_device *netdev,
 {
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct = lio->oct_dev;
+   struct oct_link_info *linfo;
int value, ret;
+   u32 cur_ver;
+
   linfo = &oct->linfo;
+   cur_ver = OCT_FW_VER(oct->fw_info.ver.maj,
+oct->fw_info.ver.min,
+oct->fw_info.ver.rev);
 
switch (state) {
case ETHTOOL_ID_ACTIVE:
@@ -896,16 +903,22 @@ static int lio_set_phys_id(struct net_device *netdev,
return ret;
} else if (oct->chip_id == OCTEON_CN23XX_PF_VID) {
octnet_id_active(netdev, LED_IDENTIFICATION_ON);
-
-   /* returns 0 since updates are asynchronous */
-   return 0;
+   if (linfo->link.s.phy_type == LIO_PHY_PORT_TP &&
+   cur_ver > OCT_FW_VER(1, 7, 2))
+   return 2;
+   else
+   return 0;
} else {
return -EINVAL;
}
break;
 
case ETHTOOL_ID_ON:
-   if (oct->chip_id == OCTEON_CN66XX)
+   if (oct->chip_id == OCTEON_CN23XX_PF_VID &&
+   linfo->link.s.phy_type == LIO_PHY_PORT_TP &&
+   cur_ver > OCT_FW_VER(1, 7, 2))
+   octnet_id_active(netdev, LED_IDENTIFICATION_ON);
+   else if (oct->chip_id == OCTEON_CN66XX)
octnet_gpio_access(netdev, VITESSE_PHY_GPIO_CFG,
   VITESSE_PHY_GPIO_HIGH);
else
@@ -914,7 +927,11 @@ static int lio_set_phys_id(struct net_device *netdev,
break;
 
case ETHTOOL_ID_OFF:
-   if (oct->chip_id == OCTEON_CN66XX)
+   if (oct->chip_id == OCTEON_CN23XX_PF_VID &&
+   linfo->link.s.phy_type == LIO_PHY_PORT_TP &&
+   cur_ver > OCT_FW_VER(1, 7, 2))
+   octnet_id_active(netdev, LED_IDENTIFICATION_OFF);
+   else if (oct->chip_id == OCTEON_CN66XX)
octnet_gpio_access(netdev, VITESSE_PHY_GPIO_CFG,
   VITESSE_PHY_GPIO_LOW);
else
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h 
b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 690424b..7407fcd 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -907,6 +907,7 @@ static inline int opcode_slow_path(union octeon_rh *rh)
 #define VITESSE_PHY_GPIO_LOW  0x3
 #define LED_IDENTIFICATION_ON 0x1
 #define LED_IDENTIFICATION_OFF0x0
+#define LIO23XX_COPPERHEAD_LED_GPIO 0x2
 
 struct oct_mdio_cmd {
u64 op;
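
[Editor's note: the version gate above relies on OCT_FW_VER packing maj/min/rev so that packed values compare in version order. The macro below is an illustrative stand-in for that encoding, not the driver's actual definition.]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a packed firmware version in the spirit of
 * OCT_FW_VER(maj, min, rev): each component gets its own byte range,
 * so an ordinary integer comparison orders versions correctly as long
 * as min and rev stay below 256. */
#define FW_VER(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))
```

This is why `cur_ver > OCT_FW_VER(1, 7, 2)` in the patch selects firmware 1.7.3 and later without comparing the components individually.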


Re: [PATCH bpf-next] bpf: enable btf for use in all maps

2018-08-09 Thread Yonghong Song

On 8/9/18 12:42 PM, Daniel Borkmann wrote:

Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") enabled support for BTF and dumping via
BPF fs for arraymap. However, both can be decoupled from each
other such that all BPF maps can be supported for attaching
BTF key/value information, while not all maps necessarily
need to dump via map_seq_show_elem() callback.

The check in array_map_check_btf() can be generalized as
ultimately the key and value size is the only constraint
that needs to match for the map. The fact that the key needs
to be of type int is optional; it could be any data type as
long as it matches the 4 byte key size, just like hash table
key or others could be of any data type as well.

Minimal example of a hash table dump which then works out
of the box for bpftool:

   # bpftool map dump id 19
   [{
   "key": {
   "": {
   "vip": 0,
   "vipv6": []
   },
   "port": 0,
   "family": 0,
   "proto": 0
   },
   "value": {
   "flags": 0,
   "vip_num": 0
   }
   }
   ]

Signed-off-by: Daniel Borkmann 
Cc: Yonghong Song 


LGTM. Thanks!
Acked-by: Yonghong Song 


Re: [PATCH lora-next v2 4/8] net: lora: sx1301: convert load_firmware to take firmware directly

2018-08-09 Thread Andreas Färber
Am 09.08.2018 um 14:33 schrieb Ben Whitten:
> We just pass the pointer to firmware down to the function that loads
> it.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/sx1301.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)

Applied.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH lora-next v2 3/8] net: lora: sx1301: convert to passing priv data throughout

2018-08-09 Thread Andreas Färber
Am 09.08.2018 um 14:33 schrieb Ben Whitten:
> Instead of passing around the spi device we instead pass around our
> driver data directly.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/sx1301.c | 305 
> +++---
>  1 file changed, 155 insertions(+), 150 deletions(-)
> 
> diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> index 3c09f5a..7324001 100644
> --- a/drivers/net/lora/sx1301.c
> +++ b/drivers/net/lora/sx1301.c
> @@ -73,24 +73,26 @@ struct spi_sx1301 {
>  };
>  
>  struct sx1301_priv {
> + struct device   *dev;
> + struct spi_device   *spi;

Obviously this is not a long-term solution, but as interim step it'll
have to do.

>   struct lora_priv lora;
>   struct gpio_desc *rst_gpio;
>   u8 cur_page;
>   struct spi_controller *radio_a_ctrl, *radio_b_ctrl;
>  };
>  
> -static int sx1301_read_burst(struct spi_device *spi, u8 reg, u8 *val, size_t 
> len)
> +static int sx1301_read_burst(struct sx1301_priv *priv, u8 reg, u8 *val, 
> size_t len)
>  {
>   u8 addr = reg & 0x7f;
> - return spi_write_then_read(spi, &addr, 1, val, len);
> + return spi_write_then_read(priv->spi, &addr, 1, val, len);
>  }
>  
> -static int sx1301_read(struct spi_device *spi, u8 reg, u8 *val)
> +static int sx1301_read(struct sx1301_priv *priv, u8 reg, u8 *val)
>  {
> - return sx1301_read_burst(spi, reg, val, 1);
> + return sx1301_read_burst(priv, reg, val, 1);
>  }
>  
> -static int sx1301_write_burst(struct spi_device *spi, u8 reg, const u8 *val, 
> size_t len)
> +static int sx1301_write_burst(struct sx1301_priv *priv, u8 reg, const u8 
> *val, size_t len)
>  {
>   u8 addr = reg | BIT(7);
>   struct spi_transfer xfr[2] = {

This hunk did not apply for some reason, I've manually re-applied it.

[...]
> @@ -654,22 +646,35 @@ static int sx1301_probe(struct spi_device *spi)
>   priv->rst_gpio = rst;
>   priv->cur_page = 0xff;
>  
> - spi_set_drvdata(spi, netdev);
> + spi_set_drvdata(spi, priv);

This change seems unnecessary and counter-productive for unregistration.

Otherwise applying.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [bpf-next PATCH 2/2] samples/bpf: xdp_redirect_cpu load balance like Suricata

2018-08-09 Thread Jesper Dangaard Brouer
On Thu, 9 Aug 2018 22:15:22 +0200
Daniel Borkmann  wrote:

> On 08/09/2018 03:26 PM, Jesper Dangaard Brouer wrote:
> > This implements XDP CPU redirection load-balancing across available
> > CPUs, based on hashing the IP-pairs + L4-protocol.  This is equivalent to
> > xdp-cpu-redirect feature in Suricata, which is inspired by the
> > Suricata 'ippair' hashing code.
> > 
> > An important property is that the hashing is flow symmetric, meaning
> > that if the source and destination get swapped then the selected CPU
> > will remain the same.  This helps locality by placing both directions
> > of a flow on the same CPU, in a forwarding/routing scenario.
> > 
> > The hashing INITVAL (15485863, the 10^6th prime number) was fairly
> > arbitrarily chosen, but experiments with kernel tree pktgen scripts
> > (pktgen_sample04_many_flows.sh +pktgen_sample05_flow_per_thread.sh)
> > showed this improved the distribution.
> > 
> > This patch also changes the default loaded XDP program to be this
> > load-balancer.  Based on feedback from different users, this seems to be
> > the expected behavior of the sample xdp_redirect_cpu.
> > 
> > Link: https://github.com/OISF/suricata/commit/796ec08dd7a63
> > Signed-off-by: Jesper Dangaard Brouer 
> > ---
> >  samples/bpf/xdp_redirect_cpu_kern.c |  103 
> > +++
> >  samples/bpf/xdp_redirect_cpu_user.c |4 +
> >  2 files changed, 105 insertions(+), 2 deletions(-)
> > 
> > diff --git a/samples/bpf/xdp_redirect_cpu_kern.c 
> > b/samples/bpf/xdp_redirect_cpu_kern.c
> > index 0cc3d71057f0..a306d1c75622 100644
> > --- a/samples/bpf/xdp_redirect_cpu_kern.c
> > +++ b/samples/bpf/xdp_redirect_cpu_kern.c
> > @@ -13,6 +13,7 @@
> >  
> >  #include 
> >  #include "bpf_helpers.h"
> > +#include "hash_func01.h"
> >  
> >  #define MAX_CPUS 64 /* WARNING - sync with _user.c */  
> 
> Hmm, this doesn't apply cleanly. I have the following in bpf-next:
> 
> #define MAX_CPUS 12 /* WARNING - sync with _user.c */
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/samples/bpf/xdp_redirect_cpu_kern.c#n17
> 
> Rebase issue? Please respin, thanks.

Ah, this is due to the teardown-fixes patchset for the "bpf" git tree,
which you just applied and which changed MAX_CPUS to 64 (so QA can use
the reproducer).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [bpf-next PATCH 2/2] samples/bpf: xdp_redirect_cpu load balance like Suricata

2018-08-09 Thread Daniel Borkmann
On 08/09/2018 03:26 PM, Jesper Dangaard Brouer wrote:
> This implements XDP CPU redirection load-balancing across available
> CPUs, based on hashing the IP-pairs + L4-protocol.  This is equivalent to
> the xdp-cpu-redirect feature in Suricata, which is inspired by the
> Suricata 'ippair' hashing code.
> 
> An important property is that the hashing is flow symmetric, meaning
> that if the source and destination get swapped then the selected CPU
> will remain the same.  This helps locality by placing both directions
> of a flow on the same CPU, in a forwarding/routing scenario.
> 
> The hashing INITVAL (15485863, the 10^6th prime number) was fairly
> arbitrarily chosen, but experiments with kernel tree pktgen scripts
> (pktgen_sample04_many_flows.sh + pktgen_sample05_flow_per_thread.sh)
> showed this improved the distribution.
> 
> This patch also changes the default loaded XDP program to be this
> load-balancer.  Based on various user feedback, this seems to be
> the expected behavior of the sample xdp_redirect_cpu.
> 
> Link: https://github.com/OISF/suricata/commit/796ec08dd7a63
> Signed-off-by: Jesper Dangaard Brouer 
> ---
>  samples/bpf/xdp_redirect_cpu_kern.c |  103 +++
>  samples/bpf/xdp_redirect_cpu_user.c |4 +
>  2 files changed, 105 insertions(+), 2 deletions(-)
> 
> diff --git a/samples/bpf/xdp_redirect_cpu_kern.c 
> b/samples/bpf/xdp_redirect_cpu_kern.c
> index 0cc3d71057f0..a306d1c75622 100644
> --- a/samples/bpf/xdp_redirect_cpu_kern.c
> +++ b/samples/bpf/xdp_redirect_cpu_kern.c
> @@ -13,6 +13,7 @@
>  
>  #include 
>  #include "bpf_helpers.h"
> +#include "hash_func01.h"
>  
>  #define MAX_CPUS 64 /* WARNING - sync with _user.c */

Hmm, this doesn't apply cleanly. I have the following in bpf-next:

#define MAX_CPUS 12 /* WARNING - sync with _user.c */

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/samples/bpf/xdp_redirect_cpu_kern.c#n17

Rebase issue? Please respin, thanks.


Re: [net PATCH 0/3] Fix two teardown bugs for BPF maps cpumap and devmap

2018-08-09 Thread Daniel Borkmann
On 08/08/2018 11:00 PM, Jesper Dangaard Brouer wrote:
> Removing entries from cpumap and devmap, goes through a number of
> synchronization steps to make sure no new xdp_frames can be enqueued.
> But there is a small chance, that xdp_frames remains which have not
> been flushed/processed yet.  Flushing these during teardown, happens
> from RCU context and not as usual under RX NAPI context.
> 
> The optimization introduced in commit 389ab7f01af9 ("xdp: introduce
> xdp_return_frame_rx_napi"), missed that the flush operation can also
> be called from RCU context.  Thus, we cannot always use the
> xdp_return_frame_rx_napi call, which take advantage of the protection
> provided by XDP RX running under NAPI protection.
> 
> The samples/bpf xdp_redirect_cpu has a --stress-mode, which is
> adjusted to make this easier to reproduce (verified by Red Hat QA).

Applied to bpf, thanks Jesper!


Re: [PATCH bpf] bpf: fix bpffs non-array map seq_show issue

2018-08-09 Thread Daniel Borkmann
On 08/09/2018 07:54 PM, Yonghong Song wrote:
> On 8/9/18 10:02 AM, Daniel Borkmann wrote:
>> On 08/09/2018 06:55 PM, Yonghong Song wrote:
>>> On 8/9/18 8:59 AM, Daniel Borkmann wrote:
 On 08/09/2018 05:15 PM, Yonghong Song wrote:
> On 8/9/18 7:24 AM, Daniel Borkmann wrote:
>> On 08/09/2018 05:55 AM, Yonghong Song wrote:
>>> On 8/8/18 7:25 PM, Alexei Starovoitov wrote:
 On Wed, Aug 08, 2018 at 06:25:19PM -0700, Yonghong Song wrote:
> In function map_seq_next() of kernel/bpf/inode.c,
> the first key will be the "0" regardless of the map type.
> This works for arrays. But for a hash type, if key "0"
> happens to be in the map, the bpffs map show will miss
> some items if key "0" is not the first element of
> the first bucket.
>
> This patch fixes the issue by guaranteeing to get
> the first element, when the seq_show has just started,
> by passing a NULL pointer key to the map_get_next_key() callback.
> This way, no missing elements will occur in the
> bpffs hash table show even if key "0" is in the map.
>>>
>>> Currently, map_seq_show_elem callback is only implemented
>>> for arraymap. So the problem actually is not exposed.
>>>
>>> The issue is discovered when I tried to implement
>>> map_seq_show_elem for hash maps, and I will have followup
>>> patches for it.

 Btw, on that note, I would also prefer if we could decouple
 BTF from the map_seq_show_elem() as there is really no reason
 to have it on a per-map basis. I had a patch below which would enable
 it for all map types generically, and bpftool works out of the
 box with it. Also, the array key doesn't really have to be 'int' type
 enforced as long as it's some data structure with 4 bytes; it's
 all fine, so this can be made fully generic (we only eventually
 care about the match in size).
>>>
>>> I agree with a generic map_check_btf, as mostly we only care about size,
>>> and this change should enable bpftool BTF-based pretty print for
>>> hash/lru_hash tables.
>>
>> Yep, agree, the below output from bpftool is from test_xdp_noinline.o
>> where both work with it.
>>
   From 0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7 Mon Sep 17 00:00:00 2001
 Message-Id: 
 <0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7.1533830053.git.dan...@iogearbox.net>
 From: Daniel Borkmann 
 Date: Thu, 9 Aug 2018 16:50:21 +0200
 Subject: [PATCH bpf-next] bpf, btf: enable for all maps

 # bpftool m dump id 19
 [{
   "key": {
   "": {
   "vip": 0,
   "vipv6": []
   },
   "port": 0,
   "family": 0,
   "proto": 0
   },
   "value": {
   "flags": 0,
   "vip_num": 0
   }
   }
 ]

 Signed-off-by: Daniel Borkmann 
 ---
    include/linux/bpf.h   |  4 +---
    kernel/bpf/arraymap.c | 27 ---
    kernel/bpf/inode.c    |  3 ++-
    kernel/bpf/syscall.c  | 24 
    4 files changed, 23 insertions(+), 35 deletions(-)

 diff --git a/include/linux/bpf.h b/include/linux/bpf.h
 index cd8790d..91aa4be 100644
 --- a/include/linux/bpf.h
 +++ b/include/linux/bpf.h
 @@ -48,8 +48,6 @@ struct bpf_map_ops {
    u32 (*map_fd_sys_lookup_elem)(void *ptr);
    void (*map_seq_show_elem)(struct bpf_map *map, void *key,
  struct seq_file *m);
 -    int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
 - u32 key_type_id, u32 value_type_id);
    };

    struct bpf_map {
 @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const 
 struct bpf_map *map)

    static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
    {
 -    return map->ops->map_seq_show_elem && map->ops->map_check_btf;
 +    return map->ops->map_seq_show_elem;
    }

    extern const struct bpf_map_ops bpf_map_offload_ops;
 diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
 index 2aa55d030..67f0bdf 100644
 --- a/kernel/bpf/arraymap.c
 +++ b/kernel/bpf/arraymap.c
 @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map 
 *map, void *key,
    rcu_read_unlock();
    }

 -static int array_map_check_btf(const struct bpf_map *map, const struct 
 btf *btf,
 -   u32 btf_key_id, u32 btf_value_id)
 -{
 -    const struct btf_type *key_type, *value_type;
 -    u32 key_size, value_size;
 -    u32 int_data;
 -
 -    key_type = btf_type_id_size(btf, _key_id, _size);
 -    if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
 -    return -EINVAL;
 -
 -    int_data = *(u32 

[PATCH bpf-next] bpf: enable btf for use in all maps

2018-08-09 Thread Daniel Borkmann
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") enabled support for BTF and dumping via
BPF fs for arraymap. However, both can be decoupled from each
other such that all BPF maps can be supported for attaching
BTF key/value information, while not all maps necessarily
need to dump via map_seq_show_elem() callback.

The check in array_map_check_btf() can be generalized, as
ultimately the key and value size are the only constraints
that need to match for the map. The fact that the key needs
to be of type int is optional; it could be any data type as
long as it matches the 4 byte key size, just like a hash table
key or others could be of any data type as well.

Minimal example of a hash table dump which then works out
of the box for bpftool:

  # bpftool map dump id 19
  [{
  "key": {
  "": {
  "vip": 0,
  "vipv6": []
  },
  "port": 0,
  "family": 0,
  "proto": 0
  },
  "value": {
  "flags": 0,
  "vip_num": 0
  }
  }
  ]

Signed-off-by: Daniel Borkmann 
Cc: Yonghong Song 
---
 include/linux/bpf.h   |  4 +---
 kernel/bpf/arraymap.c | 27 ---
 kernel/bpf/inode.c|  3 ++-
 kernel/bpf/syscall.c  | 24 
 4 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cd8790d..eb76e8e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -48,8 +48,6 @@ struct bpf_map_ops {
u32 (*map_fd_sys_lookup_elem)(void *ptr);
void (*map_seq_show_elem)(struct bpf_map *map, void *key,
  struct seq_file *m);
-   int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
-u32 key_type_id, u32 value_type_id);
 };
 
 struct bpf_map {
@@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct 
bpf_map *map)
 
 static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
 {
-   return map->ops->map_seq_show_elem && map->ops->map_check_btf;
+   return map->btf && map->ops->map_seq_show_elem;
 }
 
 extern const struct bpf_map_ops bpf_map_offload_ops;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2aa55d030..67f0bdf 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, 
void *key,
rcu_read_unlock();
 }
 
-static int array_map_check_btf(const struct bpf_map *map, const struct btf 
*btf,
-  u32 btf_key_id, u32 btf_value_id)
-{
-   const struct btf_type *key_type, *value_type;
-   u32 key_size, value_size;
-   u32 int_data;
-
-   key_type = btf_type_id_size(btf, _key_id, _size);
-   if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
-   return -EINVAL;
-
-   int_data = *(u32 *)(key_type + 1);
-   /* bpf array can only take a u32 key.  This check makes
-* sure that the btf matches the attr used during map_create.
-*/
-   if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
-   BTF_INT_OFFSET(int_data))
-   return -EINVAL;
-
-   value_type = btf_type_id_size(btf, _value_id, _size);
-   if (!value_type || value_size != map->value_size)
-   return -EINVAL;
-
-   return 0;
-}
-
 const struct bpf_map_ops array_map_ops = {
.map_alloc_check = array_map_alloc_check,
.map_alloc = array_map_alloc,
@@ -394,7 +368,6 @@ const struct bpf_map_ops array_map_ops = {
.map_delete_elem = array_map_delete_elem,
.map_gen_lookup = array_map_gen_lookup,
.map_seq_show_elem = array_map_seq_show_elem,
-   .map_check_btf = array_map_check_btf,
 };
 
 const struct bpf_map_ops percpu_array_map_ops = {
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 76efe9a..400f27d 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -332,7 +332,8 @@ static int bpf_mkmap(struct dentry *dentry, umode_t mode, 
void *arg)
struct bpf_map *map = arg;
 
return bpf_mkobj_ops(dentry, mode, arg, _map_iops,
-map->btf ? _map_fops : _obj_fops);
+bpf_map_support_seq_show(map) ?
+_map_fops : _obj_fops);
 }
 
 static struct dentry *
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5af4e9e..0b6f6e8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -455,6 +455,23 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
return 0;
 }
 
+static int map_check_btf(const struct bpf_map *map, const struct btf *btf,
+u32 btf_key_id, u32 btf_value_id)
+{
+   const struct btf_type *key_type, *value_type;
+   u32 key_size, value_size;
+
+   key_type = btf_type_id_size(btf, _key_id, _size);
+   if (!key_type || key_size 

[PATCH v1 net-next] lan743x: lan743x: Add PTP support

2018-08-09 Thread Bryan Whitehead
PTP support includes:
Ingress and egress timestamping.
One-step timestamping.
PTP clock support.
Periodic output support.

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/Makefile  |2 +-
 drivers/net/ethernet/microchip/lan743x_ethtool.c |   27 +
 drivers/net/ethernet/microchip/lan743x_main.c|   78 +-
 drivers/net/ethernet/microchip/lan743x_main.h|  101 +-
 drivers/net/ethernet/microchip/lan743x_ptp.c | 1164 ++
 drivers/net/ethernet/microchip/lan743x_ptp.h |   76 ++
 6 files changed, 1443 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ptp.c
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ptp.h

diff --git a/drivers/net/ethernet/microchip/Makefile 
b/drivers/net/ethernet/microchip/Makefile
index 43f47cb..538926d 100644
--- a/drivers/net/ethernet/microchip/Makefile
+++ b/drivers/net/ethernet/microchip/Makefile
@@ -6,4 +6,4 @@ obj-$(CONFIG_ENC28J60) += enc28j60.o
 obj-$(CONFIG_ENCX24J600) += encx24j600.o encx24j600-regmap.o
 obj-$(CONFIG_LAN743X) += lan743x.o
 
-lan743x-objs := lan743x_main.o lan743x_ethtool.o
+lan743x-objs := lan743x_main.o lan743x_ethtool.o lan743x_ptp.o
diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index c25b3e9..07c1eb6 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -4,6 +4,7 @@
 #include 
 #include "lan743x_main.h"
 #include "lan743x_ethtool.h"
+#include 
 #include 
 #include 
 
@@ -542,6 +543,31 @@ static int lan743x_ethtool_set_rxfh(struct net_device 
*netdev,
return 0;
 }
 
+static int lan743x_ethtool_get_ts_info(struct net_device *netdev,
+  struct ethtool_ts_info *ts_info)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   ts_info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
+  SOF_TIMESTAMPING_RX_SOFTWARE |
+  SOF_TIMESTAMPING_SOFTWARE |
+  SOF_TIMESTAMPING_TX_HARDWARE |
+  SOF_TIMESTAMPING_RX_HARDWARE |
+  SOF_TIMESTAMPING_RAW_HARDWARE;
+
+   if (adapter->ptp.ptp_clock)
+   ts_info->phc_index = ptp_clock_index(adapter->ptp.ptp_clock);
+   else
+   ts_info->phc_index = -1;
+
+   ts_info->tx_types = BIT(HWTSTAMP_TX_OFF) |
+   BIT(HWTSTAMP_TX_ON) |
+   BIT(HWTSTAMP_TX_ONESTEP_SYNC);
+   ts_info->rx_filters = BIT(HWTSTAMP_FILTER_NONE) |
+ BIT(HWTSTAMP_FILTER_ALL);
+   return 0;
+}
+
 static int lan743x_ethtool_get_eee(struct net_device *netdev,
   struct ethtool_eee *eee)
 {
@@ -685,6 +711,7 @@ const struct ethtool_ops lan743x_ethtool_ops = {
.get_rxfh_indir_size = lan743x_ethtool_get_rxfh_indir_size,
.get_rxfh = lan743x_ethtool_get_rxfh,
.set_rxfh = lan743x_ethtool_set_rxfh,
+   .get_ts_info = lan743x_ethtool_get_ts_info,
.get_eee = lan743x_ethtool_get_eee,
.set_eee = lan743x_ethtool_set_eee,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index cd41911..48da18c 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -267,6 +267,10 @@ static void lan743x_intr_shared_isr(void *context, u32 
int_sts, u32 flags)
lan743x_intr_software_isr(adapter);
int_sts &= ~INT_BIT_SW_GP_;
}
+   if (int_sts & INT_BIT_1588_) {
+   lan743x_ptp_isr(adapter);
+   int_sts &= ~INT_BIT_1588_;
+   }
}
if (int_sts)
lan743x_csr_write(adapter, INT_EN_CLR, int_sts);
@@ -976,6 +980,7 @@ static void lan743x_phy_link_status_change(struct 
net_device *netdev)
   ksettings.base.duplex,
   local_advertisement,
   remote_advertisement);
+   lan743x_ptp_update_latency(adapter, ksettings.base.speed);
}
 }
 
@@ -1226,6 +1231,7 @@ static void lan743x_tx_release_desc(struct lan743x_tx *tx,
struct lan743x_tx_buffer_info *buffer_info = NULL;
struct lan743x_tx_descriptor *descriptor = NULL;
u32 descriptor_type = 0;
+   bool ignore_sync;
 
descriptor = >ring_cpu_ptr[descriptor_index];
buffer_info = >buffer_info[descriptor_index];
@@ -1256,11 +1262,27 @@ static void lan743x_tx_release_desc(struct lan743x_tx 
*tx,
buffer_info->dma_ptr = 0;
buffer_info->buffer_length = 

Re: [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops

2018-08-09 Thread Cong Wang
On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov  wrote:
>
> Extend action ops with a 'delete' function. Each action type implements
> its own delete function that doesn't depend on the rtnl lock.
>
> Implement delete function that is required to delete actions without
> holding rtnl lock. Use action API function that atomically deletes action
> only if it is still in action idr. This implementation prevents concurrent
> threads from deleting same action twice.

I fail to understand why you introduce ops->delete(), it seems all
you want is getting the tn->idrinfo, but you already have tc_action
before calling ops->delete(), and tc_action has ->idrinfo...

Each type of action does the same too, that is, just calling
tcf_idr_delete_index()...

This changelog sucks again; it claims to skip rtnl lock,
but you can skip rtnl lock by just calling tcf_idr_delete_index()
directly too, so that is not the reason for adding ops->delete().


Re: [PATCH lora-next v2 2/8] net: lora: sx1301: convert to devm registration of netdev

2018-08-09 Thread Andreas Färber
On 09.08.2018 at 14:33, Ben Whitten wrote:
> We let the devres framework handle the clean removal of resources on
> teardown of the device, in this case the SPI device, saving lengthy
> unwind code and improving clarity.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/sx1301.c | 87 +++
>  1 file changed, 27 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> index 5342b61..3c09f5a 100644
> --- a/drivers/net/lora/sx1301.c
> +++ b/drivers/net/lora/sx1301.c
[...]
>  static int sx1301_remove(struct spi_device *spi)
>  {
> - struct net_device *netdev = spi_get_drvdata(spi);
> -
> - //unregister_loradev(netdev);

Thanks, this part we'll still need later though.

Applying.

Regards,
Andreas

> - free_loradev(netdev);
> -
>   dev_info(>dev, "SX1301 module removed\n");
>  
>   return 0;
[snip]

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH net-next 5/5] net: aquantia: bump driver version

2018-08-09 Thread Jakub Kicinski
On Thu, 9 Aug 2018 13:10:09 +0300, Igor Russkikh wrote:
> Hi Andrew,
> 
> Thanks for the review, agreed on all your findings,
> we'll address these comments in next version.
> 
> > Driver versions are pretty much useless. Say somebody backports this
> > driver into a vendor kernel. Vendor kernels are typically based on an
> > old kernel, plus thousands of patches. Is 2.0.4 on such a kernel the
> > same as 2.0.4 in 4.19?
> > 
> > You probably want to remove this, just the avoid people thinking is
> > means something.  
> 
> We do distribute our driver as a separate source package; customers on older
> kernels use it and install it without updating their kernels.
> 
> We also in fact always have a gap in feature set between the upstream driver
> and our preview driver releases on GitHub - a lot of people tend to try out
> the 'hot and latest' driver instead of waiting for their distro kernel to
> receive updates.
> 
> All this means we have to understand which version (and feature set)
> users have installed in the field.
> 
> `ethtool -i` and this internal driver version is a huge helper for us.
> 
> You are right that in the event of backporting there could be some
> uncertainty whether a given bugfix is there or not. But having the kernel
> version plus the driver version really helps us to understand the exact
> configuration in the field.
> 
> Moreover, in-kernel driver version bumps are linked with new features
> added (like in this patchset). Thus the overall concept of an internal
> version is still very useful for Aquantia technical and support
> engineers.
> 
> Hope the above are good enough arguments to keep up version bumps.

A reasonably successful strategy is to version your out-of-tree driver
with corresponding kernel release versions.  Flip the equation.

E.g. current net-next will become 4.19, so all the features you're
pushing now would in your GH repo be:

4.19.0.${extra_numbers_if_you_want}-gh

And you still have a single versioning scheme, but the "bump to version
XYZ" commits now go into your local tree, not the kernel.

As Andrew said, backports make driver versions less than useful: not
only feature backports to enterprise kernels, but also bug fixes which
have to go to stable.


Re: [PATCH lora-next v2 1/8] net: lora: add methods for devm registration

2018-08-09 Thread Andreas Färber
On 09.08.2018 at 14:33, Ben Whitten wrote:
> Follow the devm model so that we can avoid lengthy unwind code.
> 
> Signed-off-by: Ben Whitten 
> ---
>  drivers/net/lora/dev.c   | 28 
>  include/linux/lora/dev.h |  1 +
>  2 files changed, 29 insertions(+)
> 
> diff --git a/drivers/net/lora/dev.c b/drivers/net/lora/dev.c
> index 8c01106..e32a870 100644
> --- a/drivers/net/lora/dev.c
> +++ b/drivers/net/lora/dev.c
> @@ -84,6 +84,34 @@ void free_loradev(struct net_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(free_loradev);
>  
> +static void devm_free_loradev(struct device *dev, void *res)
> +{
> + struct net_device *net = (*(struct net_device **)res);
> + free_loradev(net);

This is what I meant by adding a variable:

diff --git a/drivers/net/lora/dev.c b/drivers/net/lora/dev.c
index c1b196cdf835..0d4823de8c06 100644
--- a/drivers/net/lora/dev.c
+++ b/drivers/net/lora/dev.c
@@ -87,8 +87,9 @@ EXPORT_SYMBOL_GPL(free_loradev);

 static void devm_free_loradev(struct device *dev, void *res)
 {
-   struct net_device *net = (*(struct net_device **)res);
-   free_loradev(net);
+   struct net_device **net = res;
+
+   free_loradev(*net);
 }

 struct net_device *devm_alloc_loradev(struct device *dev, size_t priv)

Applying.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [net-next, PATCH 1/2] net: socionext: Use descriptor info instead of MMIO reads on Rx

2018-08-09 Thread Ilias Apalodimas
On Thu, Aug 09, 2018 at 05:37:15PM +0200, Arnd Bergmann wrote:
> On Thu, Aug 9, 2018 at 10:02 AM Ilias Apalodimas
>  wrote:
> >
> > MMIO reads for remaining packets in the queue occur (at least) twice per
> > invocation of netsec_process_rx(). We can use the packet descriptor to
> > identify whether it's owned by the hardware and break out, avoiding the more
> > expensive MMIO read operations. This gives a ~2% increase in pps on the
> > Rx path when tested with 64-byte packets.
> >
> > Signed-off-by: Ilias Apalodimas 
> > ---
> >  drivers/net/ethernet/socionext/netsec.c | 19 +--
> >  1 file changed, 5 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/socionext/netsec.c 
> > b/drivers/net/ethernet/socionext/netsec.c
> > index 01589b6..ae32909 100644
> > --- a/drivers/net/ethernet/socionext/netsec.c
> > +++ b/drivers/net/ethernet/socionext/netsec.c
> > @@ -657,8 +657,6 @@ static struct sk_buff *netsec_get_rx_pkt_data(struct 
> > netsec_priv *priv,
> 
> > +   if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD))
> > +   break;
> > done++;
> 
> Should this use READ_ONCE() to prevent the compiler from moving the
> access around? I see that netsec_get_rx_pkt_data() has a dma_rmb()
> before reading the data, which prevents the CPU from doing something
> wrong here, but not the compiler.
> 
> Arnd
As we discussed, I'll send a V2 with the dma_rmb() right after the desc
status read.

Thanks,
Ilias


[PATCH net] l2tp: use sk_dst_check() to avoid race on sk->sk_dst_cache

2018-08-09 Thread Wei Wang
From: Wei Wang 

In the l2tp code, if it is an L2TP_UDP_ENCAP tunnel, tunnel->sk points to a
UDP socket. A user could call sendmsg() on both this tunnel and the UDP
socket itself concurrently. As l2tp_xmit_skb() holds the socket lock and
calls __sk_dst_check() to refresh sk->sk_dst_cache, while udpv6_sendmsg() is
lockless and calls sk_dst_check() to refresh sk->sk_dst_cache, there
could be a race that causes the dst cache to be freed multiple times.
So we fix the l2tp side to always call sk_dst_check(), which guarantees
xchg() is called when refreshing sk->sk_dst_cache, avoiding the race
condition.

Syzkaller reported stack trace:
BUG: KASAN: use-after-free in atomic_read 
include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: use-after-free in atomic_fetch_add_unless 
include/linux/atomic.h:575 [inline]
BUG: KASAN: use-after-free in atomic_add_unless include/linux/atomic.h:597 
[inline]
BUG: KASAN: use-after-free in dst_hold_safe include/net/dst.h:308 [inline]
BUG: KASAN: use-after-free in ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029
Read of size 4 at addr 8801aea9a880 by task syz-executor129/4829

CPU: 0 PID: 4829 Comm: syz-executor129 Not tainted 4.18.0-rc7-next-20180802+ #30
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
 atomic_fetch_add_unless include/linux/atomic.h:575 [inline]
 atomic_add_unless include/linux/atomic.h:597 [inline]
 dst_hold_safe include/net/dst.h:308 [inline]
 ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029
 rt6_get_pcpu_route net/ipv6/route.c:1249 [inline]
 ip6_pol_route+0x354/0xd20 net/ipv6/route.c:1922
 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2098
 fib6_rule_lookup+0x283/0x890 net/ipv6/fib6_rules.c:122
 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2126
 ip6_dst_lookup_tail+0x1278/0x1da0 net/ipv6/ip6_output.c:978
 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
 ip6_sk_dst_lookup_flow+0x5ed/0xc50 net/ipv6/ip6_output.c:1117
 udpv6_sendmsg+0x2163/0x36b0 net/ipv6/udp.c:1354
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:622 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:632
 ___sys_sendmsg+0x51d/0x930 net/socket.c:2115
 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2210
 __do_sys_sendmmsg net/socket.c:2239 [inline]
 __se_sys_sendmmsg net/socket.c:2236 [inline]
 __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2236
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446a29
Code: e8 ac b8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 
eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f4de5532db8 EFLAGS: 0246 ORIG_RAX: 0133
RAX: ffda RBX: 006dcc38 RCX: 00446a29
RDX: 00b8 RSI: 20001b00 RDI: 0003
RBP: 006dcc30 R08: 7f4de5533700 R09: 
R10:  R11: 0246 R12: 006dcc3c
R13: 7ffe2b830fdf R14: 7f4de55339c0 R15: 0001

Fixes: 71b1391a4128 ("l2tp: ensure sk->dst is still valid")
Reported-by: syzbot+05f840f3b04f211ba...@syzkaller.appspotmail.com
Signed-off-by: Wei Wang 
Signed-off-by: Martin KaFai Lau 
Cc: David Ahern 
Cc: Cong Wang 
---
 net/l2tp/l2tp_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 40261cb68e83..7166b61338d4 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1110,7 +1110,7 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct 
sk_buff *skb, int hdr_len
 
/* Get routing info from the tunnel socket */
skb_dst_drop(skb);
-   skb_dst_set(skb, dst_clone(__sk_dst_check(sk, 0)));
+   skb_dst_set(skb, dst_clone(sk_dst_check(sk, 0)));
 
inet = inet_sk(sk);
fl = >cork.fl;
-- 
2.18.0.597.ga71716f1ad-goog



Re: [PATCH v4 net-next 0/9] Add support for XGMAC2 in stmmac

2018-08-09 Thread David Miller
From: Jose Abreu 
Date: Wed,  8 Aug 2018 09:04:28 +0100

> This series adds support for 10Gigabit IP in stmmac.

Series applied, thanks Jose.



[PATCH net-next 3/3] qede: Ingress tc flower offload (drop action) support.

2018-08-09 Thread Manish Chopra
The main motive of this patch is to put the driver's
tc offload infrastructure in place.

With these changes tc can offload various supported flow
profiles (4-tuple, src-ip, dst-ip, l4 port) for the drop
action. The dropped-flows statistic is a global counter
across all flows offloaded with the drop action, and is
exposed in ethtool statistics as the common "gft_filter_drop".

Examples -

tc qdisc add dev p4p1 ingress
tc filter add dev p4p1 protocol ipv4 parent : flower \
skip_sw ip_proto tcp dst_ip 192.168.40.200 action drop
tc filter add dev p4p1 protocol ipv4 parent : flower \
skip_sw ip_proto udp src_ip 192.168.40.100 action drop
tc filter add dev p4p1 protocol ipv4 parent : flower \
skip_sw ip_proto tcp src_ip 192.168.40.100 dst_ip 192.168.40.200 \
src_port 453 dst_port 876 action drop
tc filter add dev p4p1 protocol ipv4 parent : flower \
skip_sw ip_proto tcp dst_port 98 action drop

Signed-off-by: Manish Chopra 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/qede.h |   7 +-
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |   2 +-
 drivers/net/ethernet/qlogic/qede/qede_filter.c  | 308 +++-
 drivers/net/ethernet/qlogic/qede/qede_main.c|  56 -
 4 files changed, 362 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h 
b/drivers/net/ethernet/qlogic/qede/qede.h
index e90c60a..6a4d266 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -52,6 +52,9 @@
 #include 
 #include 
 
+#include 
+#include 
+
 #define QEDE_MAJOR_VERSION 8
 #define QEDE_MINOR_VERSION 33
 #define QEDE_REVISION_VERSION  0
@@ -469,7 +472,7 @@ int qede_rx_flow_steer(struct net_device *dev, const struct 
sk_buff *skb,
 void qede_free_arfs(struct qede_dev *edev);
 int qede_alloc_arfs(struct qede_dev *edev);
 int qede_add_cls_rule(struct qede_dev *edev, struct ethtool_rxnfc *info);
-int qede_del_cls_rule(struct qede_dev *edev, struct ethtool_rxnfc *info);
+int qede_delete_flow_filter(struct qede_dev *edev, u64 cookie);
 int qede_get_cls_rule_entry(struct qede_dev *edev, struct ethtool_rxnfc *cmd);
 int qede_get_cls_rule_all(struct qede_dev *edev, struct ethtool_rxnfc *info,
  u32 *rule_locs);
@@ -535,6 +538,8 @@ void qede_reload(struct qede_dev *edev,
 int qede_txq_has_work(struct qede_tx_queue *txq);
 void qede_recycle_rx_bd_ring(struct qede_rx_queue *rxq, u8 count);
 void qede_update_rx_prod(struct qede_dev *edev, struct qede_rx_queue *rxq);
+int qede_add_tc_flower_fltr(struct qede_dev *edev, __be16 proto,
+   struct tc_cls_flower_offload *f);
 
 #define RX_RING_SIZE_POW   13
 #define RX_RING_SIZE   ((u16)BIT(RX_RING_SIZE_POW))
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c 
b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 2bd84d6..19652cd 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -1285,7 +1285,7 @@ static int qede_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
rc = qede_add_cls_rule(edev, info);
break;
case ETHTOOL_SRXCLSRLDEL:
-   rc = qede_del_cls_rule(edev, info);
+   rc = qede_delete_flow_filter(edev, info->fs.location);
break;
default:
DP_INFO(edev, "Command parameters not supported\n");
diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index d090257..9673d19 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -83,7 +83,7 @@ struct qede_arfs_fltr_node {
struct qede_arfs_tuple tuple;
 
u32 flow_id;
-   u16 sw_id;
+   u64 sw_id;
u16 rxq_id;
u16 next_rxq_id;
u8 vfid;
@@ -138,7 +138,7 @@ static void qede_configure_arfs_fltr(struct qede_dev *edev,
 
	n->tuple.stringify(&n->tuple, tuple_buffer);
DP_VERBOSE(edev, NETIF_MSG_RX_STATUS,
-  "%s sw_id[0x%x]: %s [vf %u queue %d]\n",
+  "%s sw_id[0x%llx]: %s [vf %u queue %d]\n",
   add_fltr ? "Adding" : "Deleting",
   n->sw_id, tuple_buffer, n->vfid, rxq_id);
}
@@ -152,7 +152,10 @@ static void qede_configure_arfs_fltr(struct qede_dev *edev,
 qede_free_arfs_filter(struct qede_dev *edev,  struct qede_arfs_fltr_node *fltr)
 {
kfree(fltr->data);
-   clear_bit(fltr->sw_id, edev->arfs->arfs_fltr_bmap);
+
+   if (fltr->sw_id < QEDE_RFS_MAX_FLTR)
+   clear_bit(fltr->sw_id, edev->arfs->arfs_fltr_bmap);
+
kfree(fltr);
 }
 
@@ -214,7 +217,7 @@ void qede_arfs_filter_op(void *dev, void *filter, u8 fw_rc)
 
if (fw_rc) {
DP_NOTICE(edev,
- "Failed 

[PATCH net-next 2/3] qede: Add destination ip based flow profile.

2018-08-09 Thread Manish Chopra
This patch adds support for dropping and redirecting
flows based on the destination IP in the packet.

It also moves the profile mode settings into their own
functions, which can be reused by the tc flower code in a
subsequent patch.

For example -

ethtool -N p5p1 flow-type tcp4 dst-ip 192.168.40.100 action -1
ethtool -N p5p1 flow-type udp4 dst-ip 192.168.50.100 action 1
ethtool -N p5p1 flow-type tcp4 dst-ip 192.168.60.100 action 0x1
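The selection logic introduced below (qede_set_v4_tuple_to_profile) boils down to matching which tuple fields are populated. A minimal Python model of that decision table — the mode names mirror the QED_FILTER_CONFIG_MODE_* constants from the patch, but the Python itself is only an illustration, not driver code:

```python
# Simplified model of qede_set_v4_tuple_to_profile(): pick a filter
# profile mode from which tuple fields are non-zero. Zero/None/""
# stand in for "field not specified".
def tuple_to_profile(src_port, dst_port, src_ip, dst_ip):
    if src_port and dst_port and src_ip and dst_ip:
        return "5_TUPLE"
    if not src_port and dst_port and not src_ip and not dst_ip:
        return "L4_PORT"
    if not src_port and not dst_port and not dst_ip and src_ip:
        return "IP_SRC"
    if not src_port and not dst_port and dst_ip and not src_ip:
        return "IP_DEST"          # the new mode added by this patch
    raise ValueError("Invalid N-tuple")
```

For instance, the first ethtool command above specifies only dst-ip, so it selects the new IP_DEST mode.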

Signed-off-by: Manish Chopra 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/qede_filter.c | 114 ++---
 1 file changed, 66 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index f9a327c..d090257 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -1599,6 +1599,69 @@ static int qede_flow_spec_validate_unused(struct qede_dev *edev,
return 0;
 }
 
+static int qede_set_v4_tuple_to_profile(struct qede_dev *edev,
+   struct qede_arfs_tuple *t)
+{
+   /* We must have Only 4-tuples/l4 port/src ip/dst ip
+* as an input.
+*/
+   if (t->src_port && t->dst_port && t->src_ipv4 && t->dst_ipv4) {
+   t->mode = QED_FILTER_CONFIG_MODE_5_TUPLE;
+   } else if (!t->src_port && t->dst_port &&
+  !t->src_ipv4 && !t->dst_ipv4) {
+   t->mode = QED_FILTER_CONFIG_MODE_L4_PORT;
+   } else if (!t->src_port && !t->dst_port &&
+  !t->dst_ipv4 && t->src_ipv4) {
+   t->mode = QED_FILTER_CONFIG_MODE_IP_SRC;
+   } else if (!t->src_port && !t->dst_port &&
+  t->dst_ipv4 && !t->src_ipv4) {
+   t->mode = QED_FILTER_CONFIG_MODE_IP_DEST;
+   } else {
+   DP_INFO(edev, "Invalid N-tuple\n");
+   return -EOPNOTSUPP;
+   }
+
+   t->ip_comp = qede_flow_spec_ipv4_cmp;
+   t->build_hdr = qede_flow_build_ipv4_hdr;
+   t->stringify = qede_flow_stringify_ipv4_hdr;
+
+   return 0;
+}
+
+static int qede_set_v6_tuple_to_profile(struct qede_dev *edev,
+   struct qede_arfs_tuple *t,
+   struct in6_addr *zaddr)
+{
+   /* We must have Only 4-tuples/l4 port/src ip/dst ip
+* as an input.
+*/
+   if (t->src_port && t->dst_port &&
+   memcmp(&t->src_ipv6, zaddr, sizeof(struct in6_addr)) &&
+   memcmp(&t->dst_ipv6, zaddr, sizeof(struct in6_addr))) {
+   t->mode = QED_FILTER_CONFIG_MODE_5_TUPLE;
+   } else if (!t->src_port && t->dst_port &&
+  !memcmp(&t->src_ipv6, zaddr, sizeof(struct in6_addr)) &&
+  !memcmp(&t->dst_ipv6, zaddr, sizeof(struct in6_addr))) {
+   t->mode = QED_FILTER_CONFIG_MODE_L4_PORT;
+   } else if (!t->src_port && !t->dst_port &&
+  !memcmp(&t->dst_ipv6, zaddr, sizeof(struct in6_addr)) &&
+  memcmp(&t->src_ipv6, zaddr, sizeof(struct in6_addr))) {
+   t->mode = QED_FILTER_CONFIG_MODE_IP_SRC;
+   } else if (!t->src_port && !t->dst_port &&
+  memcmp(&t->dst_ipv6, zaddr, sizeof(struct in6_addr)) &&
+  !memcmp(&t->src_ipv6, zaddr, sizeof(struct in6_addr))) {
+   t->mode = QED_FILTER_CONFIG_MODE_IP_DEST;
+   } else {
+   DP_INFO(edev, "Invalid N-tuple\n");
+   return -EOPNOTSUPP;
+   }
+
+   t->ip_comp = qede_flow_spec_ipv6_cmp;
+   t->build_hdr = qede_flow_build_ipv6_hdr;
+
+   return 0;
+}
+
 static int qede_flow_spec_to_tuple_ipv4_common(struct qede_dev *edev,
   struct qede_arfs_tuple *t,
   struct ethtool_rx_flow_spec *fs)
@@ -1638,27 +1701,7 @@ static int qede_flow_spec_to_tuple_ipv4_common(struct qede_dev *edev,
t->src_port = fs->h_u.tcp_ip4_spec.psrc;
t->dst_port = fs->h_u.tcp_ip4_spec.pdst;
 
-   /* We must either have a valid 4-tuple or only dst port
-* or only src ip as an input
-*/
-   if (t->src_port && t->dst_port && t->src_ipv4 && t->dst_ipv4) {
-   t->mode = QED_FILTER_CONFIG_MODE_5_TUPLE;
-   } else if (!t->src_port && t->dst_port &&
-  !t->src_ipv4 && !t->dst_ipv4) {
-   t->mode = QED_FILTER_CONFIG_MODE_L4_PORT;
-   }  else if (!t->src_port && !t->dst_port &&
-   !t->dst_ipv4 && t->src_ipv4) {
-   t->mode = QED_FILTER_CONFIG_MODE_IP_SRC;
-   } else {
-   DP_INFO(edev, "Invalid N-tuple\n");
-   return -EOPNOTSUPP;
-   }
-
-   t->ip_comp = qede_flow_spec_ipv4_cmp;
-   t->build_hdr = qede_flow_build_ipv4_hdr;
-   t->stringify = qede_flow_stringify_ipv4_hdr;
-
-   return 0;
+   return qede_set_v4_tuple_to_profile(edev, t);
 }
 
 static int 

[PATCH net-next 1/3] qed/qede: Multi CoS support.

2018-08-09 Thread Manish Chopra
This patch adds support for tc mqprio offload.
Using this, the different traffic classes on the adapter
can be utilized based on the configured priority-to-tc map.

For example -

tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3

This will cause SKBs with priority 0,1,2,3 to transmit
over tc 0,1,2,3 hardware queues respectively.
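The SKB-to-TX-queue mapping this relies on is encoded in the QEDE_NDEV_TXQ_ID_TO_* macros added to qede.h below. A small model of that arithmetic (ignoring the fp_num_rx offset the driver adds when indexing the fastpath array):

```python
# Simplified model of the QEDE_NDEV_TXQ_ID_TO_* macros: netdev TX
# queue ids are laid out as tc-major blocks of tss_count queues.
def ndev_txq_to_cos(idx, tss_count):
    return idx // tss_count          # QEDE_NDEV_TXQ_ID_TO_TXQ_COS

def ndev_txq_to_index(idx, tss_count):
    return idx % tss_count           # index within the fastpath array

def txq_to_ndev_txq(cos, index, tss_count):
    return tss_count * cos + index   # QEDE_TXQ_TO_NDEV_TXQ_ID
```

So with 8 TX queues per class, netdev queue 10 maps to (tc 1, index 2), and the two directions round-trip.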

Signed-off-by: Manish Chopra 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed_l2.c|   9 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c  |   5 +-
 drivers/net/ethernet/qlogic/qede/qede.h |  13 +++
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  48 ++--
 drivers/net/ethernet/qlogic/qede/qede_fp.c  |  29 +++--
 drivers/net/ethernet/qlogic/qede/qede_main.c| 139 +++-
 include/linux/qed/qed_eth_if.h  |   6 +
 7 files changed, 200 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 5ede640..82a1bd1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -2188,16 +2188,17 @@ int qed_get_queue_coalesce(struct qed_hwfn *p_hwfn, u16 *p_coal, void *handle)
 static int qed_fill_eth_dev_info(struct qed_dev *cdev,
 struct qed_dev_eth_info *info)
 {
+   struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev);
int i;
 
memset(info, 0, sizeof(*info));
 
-   info->num_tc = 1;
-
if (IS_PF(cdev)) {
int max_vf_vlan_filters = 0;
int max_vf_mac_filters = 0;
 
+   info->num_tc = p_hwfn->hw_info.num_hw_tc;
+
if (cdev->int_params.out.int_mode == QED_INT_MODE_MSIX) {
u16 num_queues = 0;
 
@@ -2248,6 +2249,8 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev,
} else {
u16 total_cids = 0;
 
+   info->num_tc = 1;
+
/* Determine queues &  XDP support */
for_each_hwfn(cdev, i) {
			struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
@@ -2554,7 +2557,7 @@ static int qed_start_txq(struct qed_dev *cdev,
 
rc = qed_eth_tx_queue_start(p_hwfn,
p_hwfn->hw_info.opaque_fid,
-   p_params, 0,
+   p_params, p_params->tc,
pbl_addr, pbl_size, ret_params);
 
if (rc) {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index dbe8131..2094d86 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -948,13 +948,14 @@ static void qed_update_pf_params(struct qed_dev *cdev,
params->eth_pf_params.num_arfs_filters = 0;
 
/* In case we might support RDMA, don't allow qede to be greedy
-* with the L2 contexts. Allow for 64 queues [rx, tx, xdp] per hwfn.
+* with the L2 contexts. Allow for 64 queues [rx, tx cos, xdp]
+* per hwfn.
 */
if (QED_IS_RDMA_PERSONALITY(QED_LEADING_HWFN(cdev))) {
u16 *num_cons;
 
		num_cons = &params->eth_pf_params.num_cons;
-   *num_cons = min_t(u16, *num_cons, 192);
+   *num_cons = min_t(u16, *num_cons, QED_MAX_L2_CONS);
}
 
for (i = 0; i < cdev->num_hwfns; i++) {
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index d7ed0d3..e90c60a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -386,6 +386,15 @@ struct qede_tx_queue {
 #define QEDE_TXQ_XDP_TO_IDX(edev, txq) ((txq)->index - \
 QEDE_MAX_TSS_CNT(edev))
 #define QEDE_TXQ_IDX_TO_XDP(edev, idx) ((idx) + QEDE_MAX_TSS_CNT(edev))
+#define QEDE_NDEV_TXQ_ID_TO_FP_ID(edev, idx)   ((edev)->fp_num_rx + \
+((idx) % QEDE_TSS_COUNT(edev)))
+#define QEDE_NDEV_TXQ_ID_TO_TXQ_COS(edev, idx) ((idx) / QEDE_TSS_COUNT(edev))
+#define QEDE_TXQ_TO_NDEV_TXQ_ID(edev, txq) ((QEDE_TSS_COUNT(edev) * \
+(txq)->cos) + (txq)->index)
+#define QEDE_NDEV_TXQ_ID_TO_TXQ(edev, idx) \
+   (&((edev)->fp_array[QEDE_NDEV_TXQ_ID_TO_FP_ID(edev, idx)].txq \
+   [QEDE_NDEV_TXQ_ID_TO_TXQ_COS(edev, idx)]))
+#define QEDE_FP_TC0_TXQ(fp)(&((fp)->txq[0]))
 
/* Regular Tx requires skb + metadata for release purpose,
 * while XDP requires the pages and the mapped address.
@@ -399,6 +408,8 @@ struct qede_tx_queue {
 
/* Slowpath; Should be kept in end [unless missing padding] */
void *handle;
+   u16 cos;
+   u16 ndev_txq_id;
 };
 
 #define BD_UNMAP_ADDR(bd)  HILO_U64(le32_to_cpu((bd)->addr.hi), \
@@ -541,5 +552,7 @@ void qede_reload(struct qede_dev *edev,
 #define QEDE_RX_HDR_SIZE   256
 

[PATCH net-next 0/3] qed*: Enhancements

2018-08-09 Thread Manish Chopra
Hi David,

This patch series adds following support in drivers -

1. Egress mqprio offload.
2. Add destination IP based flow profile.
3. Ingress flower offload (for drop action).

Please consider applying this series to "net-next".

Thanks,
Manish

Manish Chopra (3):
  qed/qede: Multi CoS support.
  qede: Add destination ip based flow profile.
  qede: Ingress tc flower offload (drop action) support.

 drivers/net/ethernet/qlogic/qed/qed_l2.c|   9 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c  |   5 +-
 drivers/net/ethernet/qlogic/qede/qede.h |  20 +-
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  50 ++-
 drivers/net/ethernet/qlogic/qede/qede_filter.c  | 422 
 drivers/net/ethernet/qlogic/qede/qede_fp.c  |  29 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c| 195 +--
 include/linux/qed/qed_eth_if.h  |   6 +
 8 files changed, 628 insertions(+), 108 deletions(-)

-- 
1.8.3.1



Re: [PATCH net-next 00/13] More complete PHYLINK support for mv88e6xxx

2018-08-09 Thread David Miller
From: Andrew Lunn 
Date: Thu,  9 Aug 2018 15:38:36 +0200

> Previous patches added sufficient PHYLINK support to the mv88e6xxx
> that it did not break existing use cases, basically fixed-link phys.
> 
> This patchset builds out the support so that SFP modules, up to
> 2.5Gbps can be supported, on mv88e6390X, on ports 9 and 10. It also
> provides a framework which can be extended to support SFPs on ports
> 2-8 of mv88e6390X, 10Gbps PHYs, and SFP support on the 6352 family.
> 
> Russell King did much of the initial work, implementing the validate
> and mac_link_state calls. However, there is an important TODO in the
> commit message:
> 
> needs to call phylink_mac_change() when the port link comes up/goes down.
> 
> The remaining patches implement this, by adding more support for the
> SERDES interfaces, in particular, interrupt support so we get notified
> when the SERDES gains/looses sync.
> 
> This has been tested on the ZII devel C, using a Clearfog as peer
> device.

Looks good to me, series applied, thanks!


[PATCH ethtool v2 1/3] ethtool-copy.h: sync with net-next

2018-08-09 Thread Florian Fainelli
This covers kernel changes up to commit 6cfef793b558:
   ethtool: Add WAKE_FILTER and RX_CLS_FLOW_WAKE

Signed-off-by: Florian Fainelli 
---
 ethtool-copy.h | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 8cc61e9ab40b..6bfbb85f9402 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -215,12 +215,16 @@ struct ethtool_value {
__u32   data;
 };
 
+#define PFC_STORM_PREVENTION_AUTO  0xffff
+#define PFC_STORM_PREVENTION_DISABLE   0
+
 enum tunable_id {
ETHTOOL_ID_UNSPEC,
ETHTOOL_RX_COPYBREAK,
ETHTOOL_TX_COPYBREAK,
+   ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
/*
-* Add your fresh new tubale attribute above and remember to update
+* Add your fresh new tunable attribute above and remember to update
 * tunable_strings[] in net/core/ethtool.c
 */
__ETHTOOL_TUNABLE_COUNT,
@@ -864,7 +868,8 @@ struct ethtool_flow_ext {
  * includes the %FLOW_EXT or %FLOW_MAC_EXT flag
  * (see  ethtool_flow_ext description).
  * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
- * if packets should be discarded
+ * if packets should be discarded, or %RX_CLS_FLOW_WAKE if the
+ * packets should be used for Wake-on-LAN with %WAKE_FILTER
  * @location: Location of rule in the table.  Locations must be
  * numbered such that a flow matching multiple rules will be
  * classified according to the first (lowest numbered) rule.
@@ -896,13 +901,13 @@ struct ethtool_rx_flow_spec {
 static __inline__ __u64 ethtool_get_flow_spec_ring(__u64 ring_cookie)
 {
return ETHTOOL_RX_FLOW_SPEC_RING & ring_cookie;
-};
+}
 
 static __inline__ __u64 ethtool_get_flow_spec_ring_vf(__u64 ring_cookie)
 {
return (ETHTOOL_RX_FLOW_SPEC_RING_VF & ring_cookie) >>
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
-};
+}
 
 /**
  * struct ethtool_rxnfc - command to get or set RX flow classification rules
@@ -1628,6 +1633,7 @@ static __inline__ int ethtool_validate_duplex(__u8 duplex)
 #define WAKE_ARP   (1 << 4)
 #define WAKE_MAGIC (1 << 5)
 #define WAKE_MAGICSECURE   (1 << 6) /* only meaningful if WAKE_MAGIC */
+#define WAKE_FILTER(1 << 7)
 
 /* L2-L4 network traffic flow types */
 #define TCP_V4_FLOW 0x01/* hash or spec (tcp_ip4_spec) */
@@ -1665,6 +1671,7 @@ static __inline__ int ethtool_validate_duplex(__u8 duplex)
 #define RXH_DISCARD (1 << 31)
 
 #define RX_CLS_FLOW_DISC   0xffffffffffffffffULL
+#define RX_CLS_FLOW_WAKE   0xfffffffffffffffeULL
 
 /* Special RX classification rule insert location values */
 #define RX_CLS_LOC_SPECIAL 0x8000  /* flag */
-- 
2.17.1



[PATCH ethtool v2 2/3] ethtool: Add support for WAKE_FILTER (WoL using filters)

2018-08-09 Thread Florian Fainelli
Add a new character 'f' which can be used to configure an Ethernet
controller to support Wake-on-LAN using filters programmed with the
ethtool::rxnfc and the special action -2 (wake-up filter). This is
useful in particular in the context of devices that must support wake-up
on more complex patterns such as multicast DNS packets.
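A minimal model of the flag handling in parse_wolopts() (see the ethtool.c hunk below); the letter-to-bit mapping follows the WAKE_* definitions from patch 1. Unlike this sketch, the real parser also rejects unknown characters:

```python
# Each wol option character maps to a WAKE_* bit; 'f' (new in this
# series) maps to WAKE_FILTER (1 << 7), and 'd' clears everything.
WAKE_FLAGS = {'p': 1 << 0, 'u': 1 << 1, 'm': 1 << 2, 'b': 1 << 3,
              'a': 1 << 4, 'g': 1 << 5, 's': 1 << 6, 'f': 1 << 7}

def parse_wolopts(optstr):
    data = 0
    for c in optstr:
        if c == 'd':
            data = 0                 # 'd' clears all previous options
        else:
            data |= WAKE_FLAGS[c]
    return data
```

For example, `ethtool -s eth0 wol gf` would set WAKE_MAGIC | WAKE_FILTER.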

Signed-off-by: Florian Fainelli 
---
 ethtool.8.in | 3 ++-
 ethtool.c| 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 0a366aa536ae..3eb9005ada48 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -58,7 +58,7 @@
 .\"
 .\"\(*WO - wol flags
 .\"
-.ds WO \fBp\fP|\fBu\fP|\fBm\fP|\fBb\fP|\fBa\fP|\fBg\fP|\fBs\fP|\fBd\fP...
+.ds WO \fBp\fP|\fBu\fP|\fBm\fP|\fBb\fP|\fBa\fP|\fBg\fP|\fBs\fP|\fBf|\fBd\fP...
 .\"
 .\"\(*FL - flow type values
 .\"
@@ -679,6 +679,7 @@ b   Wake on broadcast messages
 a  Wake on ARP
 g  Wake on MagicPacket\[tm]
 s  Enable SecureOn\[tm] password for MagicPacket\[tm]
+f  Wake on filter(s)
 d  T{
 Disable (wake on nothing).  This option clears all previous options.
 T}
diff --git a/ethtool.c b/ethtool.c
index fb93ae898312..aa2bbe9e4c65 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -931,6 +931,9 @@ static int parse_wolopts(char *optstr, u32 *data)
case 's':
*data |= WAKE_MAGICSECURE;
break;
+   case 'f':
+   *data |= WAKE_FILTER;
+   break;
case 'd':
*data = 0;
break;
@@ -964,6 +967,8 @@ static char *unparse_wolopts(int wolopts)
*p++ = 'g';
if (wolopts & WAKE_MAGICSECURE)
*p++ = 's';
+   if (wolopts & WAKE_FILTER)
+   *p++ = 'f';
} else {
*p = 'd';
}
-- 
2.17.1



[PATCH ethtool v2 0/3] ethtool: Wake-on-LAN using filters

2018-08-09 Thread Florian Fainelli
Hi John,

This patch series syncs up ethtool-copy.h to get the new definitions
required for supporting wake-on-LAN using filters: WAKE_FILTER and
RX_CLS_FLOW_WAKE and then updates the rxclass.c code to allow us to
specify action -2 (RX_CLS_FLOW_WAKE).

Let me know if you would like this to be done differently.

Thanks!

Changes in v2:

- properly put the man page hunk describing action -2 into patch #3

Florian Fainelli (3):
  ethtool-copy.h: sync with net-next
  ethtool: Add support for WAKE_FILTER (WoL using filters)
  ethtool: Add support for action value -2 (wake-up filter)

 ethtool-copy.h | 15 +++
 ethtool.8.in   |  4 +++-
 ethtool.c  |  5 +
 rxclass.c  |  8 +---
 4 files changed, 24 insertions(+), 8 deletions(-)

-- 
2.17.1



[PATCH ethtool v2 3/3] ethtool: Add support for action value -2 (wake-up filter)

2018-08-09 Thread Florian Fainelli
Add the ability to program special filters using ethtool::rxnfc which
are meant to be used for wake-up purposes (in conjunction with
WAKE_FILTER) using the special action value: -2 (RX_CLS_FLOW_WAKE).

Signed-off-by: Florian Fainelli 
---
 ethtool.8.in | 1 +
 rxclass.c| 8 +---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 3eb9005ada48..97c7330fd373 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -871,6 +871,7 @@ Specifies the Rx queue to send packets to, or some other action.
 nokeep;
 lB l.
 -1 Drop the matched flow
+-2 Use the matched flow as a Wake-on-LAN filter
 0 or higherRx queue to route the flow
 .TE
 .TP
diff --git a/rxclass.c b/rxclass.c
index 42d122d1ed86..79972651e706 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -251,7 +251,11 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp,
if (fsp->flow_type & FLOW_RSS)
fprintf(stdout, "\tRSS Context ID: %u\n", rss_context);
 
-   if (fsp->ring_cookie != RX_CLS_FLOW_DISC) {
+   if (fsp->ring_cookie == RX_CLS_FLOW_DISC) {
+   fprintf(stdout, "\tAction: Drop\n");
+   } else if (fsp->ring_cookie == RX_CLS_FLOW_WAKE) {
+   fprintf(stdout, "\tAction: Wake-on-LAN\n");
+   } else {
u64 vf = ethtool_get_flow_spec_ring_vf(fsp->ring_cookie);
u64 queue = ethtool_get_flow_spec_ring(fsp->ring_cookie);
 
@@ -266,8 +270,6 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp,
else
fprintf(stdout, "\tAction: Direct to queue %llu\n",
queue);
-   } else {
-   fprintf(stdout, "\tAction: Drop\n");
}
 
fprintf(stdout, "\n");
-- 
2.17.1



Re: [PATCH bpf] bpf: fix bpffs non-array map seq_show issue

2018-08-09 Thread Yonghong Song




On 8/9/18 10:02 AM, Daniel Borkmann wrote:

On 08/09/2018 06:55 PM, Yonghong Song wrote:

On 8/9/18 8:59 AM, Daniel Borkmann wrote:

On 08/09/2018 05:15 PM, Yonghong Song wrote:

On 8/9/18 7:24 AM, Daniel Borkmann wrote:

On 08/09/2018 05:55 AM, Yonghong Song wrote:

On 8/8/18 7:25 PM, Alexei Starovoitov wrote:

On Wed, Aug 08, 2018 at 06:25:19PM -0700, Yonghong Song wrote:

In function map_seq_next() of kernel/bpf/inode.c,
the first key will be the "0" regardless of the map type.
This works for array. But for hash type, if it happens
key "0" is in the map, the bpffs map show will miss
some items if the key "0" is not the first element of
the first bucket.

This patch fixed the issue by guaranteeing to get
the first element, if the seq_show is just started,
by passing NULL pointer key to map_get_next_key() callback.
This way, no missing elements will occur for
bpffs hash table show even if key "0" is in the map.
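A toy model of the bug (not kernel code): with a bucketed walk order, starting the iteration from key 0 skips everything that precedes key 0, while starting from a NULL key visits every element:

```python
# Toy model: keys as get_next_key() would walk them, bucket by
# bucket. Key 0 sits in the middle of the first bucket, so an
# iteration seeded with key 0 never visits the key 7 ahead of it.
buckets = [[7, 0, 3], [5]]                 # key 0 is mid-bucket
order = [k for b in buckets for k in b]    # get_next_key walk order

def get_next_key(key):
    if key is None:                        # NULL key: true first element
        return order[0]
    i = order.index(key)
    return order[i + 1] if i + 1 < len(order) else None

def iterate(start):
    out, k = [], get_next_key(start)
    while k is not None:
        out.append(k)
        k = get_next_key(k)
    return out

# iterate(None) -> [7, 0, 3, 5]; iterate(0) -> [3, 5] (7 is missed)
```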


Currently, map_seq_show_elem callback is only implemented
for arraymap. So the problem actually is not exposed.

The issue is discovered when I tried to implement
map_seq_show_elem for hash maps, and I will have followup
patches for it.


Btw, on that note, I would also prefer if we could decouple
BTF from the map_seq_show_elem() as there is really no reason
to have it on a per-map. I had a patch below which would enable
it for all map types generically, and bpftool works out of the
box for it. Also, array doesn't really have to be 'int' type
enforced as long as it's some data structure with 4 bytes, it's
all fine, so this can be made fully generic (we only eventually
care about the match in size).


I agree with a generic map_check_btf as mostly we only care about size
and this change should enable bpftool BTF-based pretty print for hash/lru_hash
tables.


Yep, agree, the below output from bpftool is from test_xdp_noinline.o
where both work with it.


  From 0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7 Mon Sep 17 00:00:00 2001
Message-Id: 
<0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7.1533830053.git.dan...@iogearbox.net>
From: Daniel Borkmann 
Date: Thu, 9 Aug 2018 16:50:21 +0200
Subject: [PATCH bpf-next] bpf, btf: enable for all maps

# bpftool m dump id 19
[{
  "key": {
  "": {
  "vip": 0,
  "vipv6": []
  },
  "port": 0,
  "family": 0,
  "proto": 0
  },
  "value": {
  "flags": 0,
  "vip_num": 0
  }
  }
]

Signed-off-by: Daniel Borkmann 
---
   include/linux/bpf.h   |  4 +---
   kernel/bpf/arraymap.c | 27 ---
   kernel/bpf/inode.c    |  3 ++-
   kernel/bpf/syscall.c  | 24 
   4 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cd8790d..91aa4be 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -48,8 +48,6 @@ struct bpf_map_ops {
   u32 (*map_fd_sys_lookup_elem)(void *ptr);
   void (*map_seq_show_elem)(struct bpf_map *map, void *key,
     struct seq_file *m);
-    int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
- u32 key_type_id, u32 value_type_id);
   };

   struct bpf_map {
@@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)

   static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
   {
-    return map->ops->map_seq_show_elem && map->ops->map_check_btf;
+    return map->ops->map_seq_show_elem;
   }

   extern const struct bpf_map_ops bpf_map_offload_ops;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2aa55d030..67f0bdf 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, void *key,
   rcu_read_unlock();
   }

-static int array_map_check_btf(const struct bpf_map *map, const struct btf *btf,
-   u32 btf_key_id, u32 btf_value_id)
-{
-    const struct btf_type *key_type, *value_type;
-    u32 key_size, value_size;
-    u32 int_data;
-
-    key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
-    if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
-    return -EINVAL;
-
-    int_data = *(u32 *)(key_type + 1);
-    /* bpf array can only take a u32 key.  This check makes
- * sure that the btf matches the attr used during map_create.
- */
-    if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
-    BTF_INT_OFFSET(int_data))
-    return -EINVAL;
-
-    value_type = btf_type_id_size(btf, &btf_value_id, &value_size);
-    if (!value_type || value_size != map->value_size)
-    return -EINVAL;
-
-    return 0;
-}
-
   const struct bpf_map_ops array_map_ops = {
   .map_alloc_check = array_map_alloc_check,
   .map_alloc = array_map_alloc,
@@ -394,7 +368,6 @@ const struct bpf_map_ops array_map_ops = {
   .map_delete_elem = array_map_delete_elem,
   

Re: [PATCH net-next 0/7] mlxsw: Various updates

2018-08-09 Thread David Miller
From: Ido Schimmel 
Date: Thu,  9 Aug 2018 11:59:06 +0300

> Patches 1-3 update the driver to use a new firmware version. Due to a
> recently discovered issue, the version (and future ones) does not
> support matching on VLAN ID at egress. This is enforced in the driver
> and reported back to the user via extack.
> 
> Patch 4 adds a new selftest for the recently introduced algorithmic
> TCAM.
> 
> Patch 5 converts the driver to use SPDX identifiers.
> 
> Patches 6-7 fix a bug in ethtool stats reporting and expose counters for
> all 16 TCs, following recent MC-aware changes that utilize TCs 8-15.

Series applied, thanks!


Re: [Query]: DSA Understanding

2018-08-09 Thread Andrew Lunn
> It's coming from the switch lan4. I have attached the png, where
> C4:F3:12:08:FE:7F is the MAC of lan4, which is broadcast to
> ff:ff:ff:ff:ff:ff, causing the rx counter on the PC to go up.

So, big packets are making it from the switch to the PC. But the small
ARP packets are not.

This is what Florian was suggesting.

ARP packets are smaller than 64 bytes, which is the minimum packet
size for Ethernet. Any packets smaller than 64 bytes are called runt
packets. They have to be padded up to 64 bytes in order to make them
valid. Otherwise the destination, or any switch along the path, might
throw them away.

What could be happening is that the cpsw driver or hardware is padding
the packet to 64 bytes. But that packet has a DSA header in it. The
switch removes the header, recalculates the checksum and sends the
packet. It is now either 4 or 8 bytes smaller, depending on which DSA
header was used. It then becomes a runt packet.

Florian had to fix this problem recently.

http://patchwork.ozlabs.org/patch/836534/

You probably need something similar for the cpsw.
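Back-of-the-envelope arithmetic for the problem (ETH_ZLEN is the 60-byte minimum frame size, FCS excluded; the tag length here is assumed to be 4 bytes for illustration):

```python
# The MAC pads the frame to pad_to bytes, then the switch strips the
# DSA tag, so the frame on the wire can end up below the minimum.
ETH_ZLEN = 60

def wire_len(payload_len, tag_len, pad_to):
    framed = max(payload_len + tag_len, pad_to)  # MAC-level padding
    return framed - tag_len                      # switch strips the tag

# Padding only to ETH_ZLEN leaves a runt after the 4-byte tag is gone:
assert wire_len(42, 4, ETH_ZLEN) == 56
# Padding to ETH_ZLEN + tag_len (as in Florian's fix) keeps it legal:
assert wire_len(42, 4, ETH_ZLEN + 4) == 60
```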

Andrew


Re: [PATCH bpf] bpf: fix bpffs non-array map seq_show issue

2018-08-09 Thread Daniel Borkmann
On 08/09/2018 06:55 PM, Yonghong Song wrote:
> On 8/9/18 8:59 AM, Daniel Borkmann wrote:
>> On 08/09/2018 05:15 PM, Yonghong Song wrote:
>>> On 8/9/18 7:24 AM, Daniel Borkmann wrote:
 On 08/09/2018 05:55 AM, Yonghong Song wrote:
> On 8/8/18 7:25 PM, Alexei Starovoitov wrote:
>> On Wed, Aug 08, 2018 at 06:25:19PM -0700, Yonghong Song wrote:
>>> In function map_seq_next() of kernel/bpf/inode.c,
>>> the first key will be the "0" regardless of the map type.
>>> This works for array. But for hash type, if it happens
>>> key "0" is in the map, the bpffs map show will miss
>>> some items if the key "0" is not the first element of
>>> the first bucket.
>>>
>>> This patch fixed the issue by guaranteeing to get
>>> the first element, if the seq_show is just started,
>>> by passing NULL pointer key to map_get_next_key() callback.
>>> This way, no missing elements will occur for
>>> bpffs hash table show even if key "0" is in the map.
>
> Currently, map_seq_show_elem callback is only implemented
> for arraymap. So the problem actually is not exposed.
>
> The issue is discovered when I tried to implement
> map_seq_show_elem for hash maps, and I will have followup
> patches for it.
>>
>> Btw, on that note, I would also prefer if we could decouple
>> BTF from the map_seq_show_elem() as there is really no reason
>> to have it on a per-map. I had a patch below which would enable
>> it for all map types generically, and bpftool works out of the
>> box for it. Also, array doesn't really have to be 'int' type
>> enforced as long as it's some data structure with 4 bytes, it's
>> all fine, so this can be made fully generic (we only eventually
>> care about the match in size).
> 
> I agree with a generic map_check_btf as mostly we only care about size
> and this change should enable bpftool BTF-based pretty print for
> hash/lru_hash tables.

Yep, agree, the below output from bpftool is from test_xdp_noinline.o
where both work with it.

>>  From 0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7 Mon Sep 17 00:00:00 2001
>> Message-Id: 
>> <0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7.1533830053.git.dan...@iogearbox.net>
>> From: Daniel Borkmann 
>> Date: Thu, 9 Aug 2018 16:50:21 +0200
>> Subject: [PATCH bpf-next] bpf, btf: enable for all maps
>>
>> # bpftool m dump id 19
>> [{
>>  "key": {
>>  "": {
>>  "vip": 0,
>>  "vipv6": []
>>  },
>>  "port": 0,
>>  "family": 0,
>>  "proto": 0
>>  },
>>  "value": {
>>  "flags": 0,
>>  "vip_num": 0
>>  }
>>  }
>> ]
>>
>> Signed-off-by: Daniel Borkmann 
>> ---
>>   include/linux/bpf.h   |  4 +---
>>   kernel/bpf/arraymap.c | 27 ---
>>   kernel/bpf/inode.c    |  3 ++-
>>   kernel/bpf/syscall.c  | 24 
>>   4 files changed, 23 insertions(+), 35 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index cd8790d..91aa4be 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -48,8 +48,6 @@ struct bpf_map_ops {
>>   u32 (*map_fd_sys_lookup_elem)(void *ptr);
>>   void (*map_seq_show_elem)(struct bpf_map *map, void *key,
>>     struct seq_file *m);
>> -    int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
>> - u32 key_type_id, u32 value_type_id);
>>   };
>>
>>   struct bpf_map {
>> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
>>
>>   static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
>>   {
>> -    return map->ops->map_seq_show_elem && map->ops->map_check_btf;
>> +    return map->ops->map_seq_show_elem;
>>   }
>>
>>   extern const struct bpf_map_ops bpf_map_offload_ops;
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 2aa55d030..67f0bdf 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, void *key,
>>   rcu_read_unlock();
>>   }
>>
>> -static int array_map_check_btf(const struct bpf_map *map, const struct btf *btf,
>> -   u32 btf_key_id, u32 btf_value_id)
>> -{
>> -    const struct btf_type *key_type, *value_type;
>> -    u32 key_size, value_size;
>> -    u32 int_data;
>> -
>> -    key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
>> -    if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
>> -    return -EINVAL;
>> -
>> -    int_data = *(u32 *)(key_type + 1);
>> -    /* bpf array can only take a u32 key.  This check makes
>> - * sure that the btf matches the attr used during map_create.
>> - */
>> -    if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
>> -    BTF_INT_OFFSET(int_data))
>> -    return -EINVAL;
>> -
>> -    value_type = btf_type_id_size(btf, &btf_value_id, &value_size);
>> -    

Re: [PATCH iproute2/net-next] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-08-09 Thread Eelco Chaudron

Thanks for the quick reply, see inline responses.

On 9 Aug 2018, at 18:07, Stephen Hemminger wrote:


On Thu,  9 Aug 2018 11:16:02 -0400
Eelco Chaudron  wrote:



+static void print_tcstats_basic_hw(struct rtattr **tbs, char *prefix)

+{
+   struct gnet_stats_basic bs = {0};


If not present don't print it rather than printing zero.



This is used to print the separate SW counters below, which are not displayed
if 0, i.e. not present.
However, I will move it under the "if (tbs[TCA_STATS_BASIC])"
statement, so it's more explicit.



+   struct gnet_stats_basic bs_hw = {0};


This initialization is unnecessary since you always overwrite it.



Thanks will remove it in the v2


+
+   if (!tbs[TCA_STATS_BASIC_HW])
+   return;
+
+   memcpy(&bs_hw, RTA_DATA(tbs[TCA_STATS_BASIC_HW]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC_HW]), sizeof(bs_hw)));
+
+   if (bs_hw.bytes == 0 && bs_hw.packets == 0)
+   return;
+
+   if (tbs[TCA_STATS_BASIC]) {
+   memcpy(&bs, RTA_DATA(tbs[TCA_STATS_BASIC]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC]),
+  sizeof(bs)));
+   }
+
+   if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
+   print_string(PRINT_FP, NULL, "\n%s", prefix);


Please use the magic string _SL_ to allow supporting single line 
output mode.




Will do in the V2.


+   print_lluint(PRINT_ANY, "sw_bytes",
+"Sent software %llu bytes",
+bs.bytes - bs_hw.bytes);
+   print_uint(PRINT_ANY, "sw_packets", " %u pkt",
+  bs.packets - bs_hw.packets);
+   }
+
+   print_string(PRINT_FP, NULL, "\n%s", prefix);
+   print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
+bs_hw.bytes);
+   print_uint(PRINT_ANY, "hw_packets", " %u pkt", bs_hw.packets);
+}


Re: [PATCH bpf] bpf: fix bpffs non-array map seq_show issue

2018-08-09 Thread Yonghong Song




On 8/9/18 8:59 AM, Daniel Borkmann wrote:

On 08/09/2018 05:15 PM, Yonghong Song wrote:

On 8/9/18 7:24 AM, Daniel Borkmann wrote:

On 08/09/2018 05:55 AM, Yonghong Song wrote:

On 8/8/18 7:25 PM, Alexei Starovoitov wrote:

On Wed, Aug 08, 2018 at 06:25:19PM -0700, Yonghong Song wrote:

In function map_seq_next() of kernel/bpf/inode.c,
the first key will be the "0" regardless of the map type.
This works for array. But for hash type, if it happens
key "0" is in the map, the bpffs map show will miss
some items if the key "0" is not the first element of
the first bucket.

This patch fixed the issue by guaranteeing to get
the first element, if the seq_show is just started,
by passing NULL pointer key to map_get_next_key() callback.
This way, no missing elements will occur for
bpffs hash table show even if key "0" is in the map.


Currently, map_seq_show_elem callback is only implemented
for arraymap. So the problem actually is not exposed.

The issue is discovered when I tried to implement
map_seq_show_elem for hash maps, and I will have followup
patches for it.


Btw, on that note, I would also prefer if we could decouple
BTF from the map_seq_show_elem() as there is really no reason
to have it on a per-map. I had a patch below which would enable
it for all map types generically, and bpftool works out of the
box for it. Also, array doesn't really have to be 'int' type
enforced as long as it's some data structure with 4 bytes, it's
all fine, so this can be made fully generic (we only eventually
care about the match in size).


I agree with a generic map_check_btf as mostly we only care about size,
and this change should enable bpftool BTF-based pretty print for
hash/lru_hash tables.




 From 0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7 Mon Sep 17 00:00:00 2001
Message-Id: 
<0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7.1533830053.git.dan...@iogearbox.net>
From: Daniel Borkmann 
Date: Thu, 9 Aug 2018 16:50:21 +0200
Subject: [PATCH bpf-next] bpf, btf: enable for all maps

# bpftool m dump id 19
[{
 "key": {
 "": {
 "vip": 0,
 "vipv6": []
 },
 "port": 0,
 "family": 0,
 "proto": 0
 },
 "value": {
 "flags": 0,
 "vip_num": 0
 }
 }
]

Signed-off-by: Daniel Borkmann 
---
  include/linux/bpf.h   |  4 +---
  kernel/bpf/arraymap.c | 27 ---
  kernel/bpf/inode.c|  3 ++-
  kernel/bpf/syscall.c  | 24 
  4 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cd8790d..91aa4be 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -48,8 +48,6 @@ struct bpf_map_ops {
u32 (*map_fd_sys_lookup_elem)(void *ptr);
void (*map_seq_show_elem)(struct bpf_map *map, void *key,
  struct seq_file *m);
-   int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
-u32 key_type_id, u32 value_type_id);
  };

  struct bpf_map {
@@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct 
bpf_map *map)

  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
  {
-   return map->ops->map_seq_show_elem && map->ops->map_check_btf;
+   return map->ops->map_seq_show_elem;
  }

  extern const struct bpf_map_ops bpf_map_offload_ops;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2aa55d030..67f0bdf 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, 
void *key,
rcu_read_unlock();
  }

-static int array_map_check_btf(const struct bpf_map *map, const struct btf 
*btf,
-  u32 btf_key_id, u32 btf_value_id)
-{
-   const struct btf_type *key_type, *value_type;
-   u32 key_size, value_size;
-   u32 int_data;
-
-   key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
-   if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
-   return -EINVAL;
-
-   int_data = *(u32 *)(key_type + 1);
-   /* bpf array can only take a u32 key.  This check makes
-* sure that the btf matches the attr used during map_create.
-*/
-   if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
-   BTF_INT_OFFSET(int_data))
-   return -EINVAL;
-
-   value_type = btf_type_id_size(btf, &btf_value_id, &value_size);
-   if (!value_type || value_size != map->value_size)
-   return -EINVAL;
-
-   return 0;
-}
-
  const struct bpf_map_ops array_map_ops = {
.map_alloc_check = array_map_alloc_check,
.map_alloc = array_map_alloc,
@@ -394,7 +368,6 @@ const struct bpf_map_ops array_map_ops = {
.map_delete_elem = array_map_delete_elem,
.map_gen_lookup = array_map_gen_lookup,
.map_seq_show_elem = array_map_seq_show_elem,
-   .map_check_btf = array_map_check_btf,

Error running AF_XDP sample application

2018-08-09 Thread kdjimeli
Hello,

I have been trying to test a sample AF_XDP program, but I have been
experiencing some issues.
After building the sample code
https://github.com/torvalds/linux/tree/master/samples/bpf,
when running the xdpsock binary, I get the errors
"libbpf: failed to create map (name: 'xsks_map'): Invalid argument"
"libbpf: failed to load object './xdpsock_kern.o"

I tried to figure out the cause of the error but all I know is that it
occurs at line 910 with the function
call "bpf_prog_load_xattr(_load_attr, , _fd)".

I would like to inquire what the possible cause of this error could be.


Thanks
Konrad


Re: [DO NOT MERGE] ARM: dts: vf610-zii-dev-rev-c: add support for SFF modules

2018-08-09 Thread Marek Behún
Hi Andres,

I tried your patches on Turris Mox with one 6190 marvell switch with
port 9 connected to CPU and port 10 to SFP cage.
Seems it works :)
Thank you.

Marek


[PATCH net-next 3/4] tcp: always ACK immediately on hole repairs

2018-08-09 Thread Yuchung Cheng
RFC 5681 sec 4.2:
  To provide feedback to senders recovering from losses, the receiver
  SHOULD send an immediate ACK when it receives a data segment that
  fills in all or part of a gap in the sequence space.

When a gap is partially filled, __tcp_ack_snd_check already checks
the out-of-order queue and correctly sends an immediate ACK. However,
when a gap is fully filled, the previous implementation only resets
pingpong mode, which does not guarantee an immediate ACK because the
quick ACK counter may be zero. This patch addresses this issue by
marking the one-time immediate ACK flag instead.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: Wei Wang 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_input.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b8849588c440..9a09ff3afef2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4735,11 +4735,11 @@ static void tcp_data_queue(struct sock *sk, struct 
sk_buff *skb)
if (!RB_EMPTY_ROOT(>out_of_order_queue)) {
tcp_ofo_queue(sk);
 
-   /* RFC2581. 4.2. SHOULD send immediate ACK, when
+   /* RFC5681. 4.2. SHOULD send immediate ACK, when
 * gap in queue is filled.
 */
if (RB_EMPTY_ROOT(>out_of_order_queue))
-   inet_csk(sk)->icsk_ack.pingpong = 0;
+   inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
}
 
if (tp->rx_opt.num_sacks)
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH net-next 4/4] tcp: avoid resetting ACK timer upon receiving packet with ECN CWR flag

2018-08-09 Thread Yuchung Cheng
Previously, commit 9aee40006190 ("tcp: ack immediately when a cwr
packet arrives") called tcp_enter_quickack_mode to force sending
two immediate ACKs upon receiving a packet with the CWR flag. The side
effect is that it also resets the delayed ACK timer and interactive
session tracking. This patch removes that side effect by using the
new ACK_NOW flag to force an immediate ACK.

Packetdrill to demonstrate:

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < [ect0] SEW 0:0(0) win 32792 
   +0 > SE. 0:0(0) ack 1 
  +.1 < [ect0] . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4

   +0 < [ect0] . 1:1001(1000) ack 1 win 257
   +0 > [ect01] . 1:1(0) ack 1001

   +0 write(4, ..., 1) = 1
   +0 > [ect01] P. 1:2(1) ack 1001

   +0 < [ect0] . 1001:2001(1000) ack 2 win 257
   +0 write(4, ..., 1) = 1
   +0 > [ect01] P. 2:3(1) ack 2001

   +0 < [ect0] . 2001:3001(1000) ack 3 win 257
   +0 < [ect0] . 3001:4001(1000) ack 3 win 257
   // Ack delayed ...

   +.01 < [ce] P. 4001:4501(500) ack 3 win 257
   +0 > [ect01] . 3:3(0) ack 4001
   +0 > [ect01] E. 3:3(0) ack 4501

+.001 read(4, ..., 4500) = 4500
   +0 write(4, ..., 1) = 1
   +0 > [ect01] PE. 3:4(1) ack 4501 win 100

 +.01 < [ect0] W. 4501:5501(1000) ack 4 win 257
   // No delayed ACK on CWR flag
   +0 > [ect01] . 4:4(0) ack 5501

 +.31 < [ect0] . 5501:6501(1000) ack 4 win 257
   +0 > [ect01] . 4:4(0) ack 6501


Fixes: 9aee40006190 ("tcp: ack immediately when a cwr packet arrives")
Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
---
 net/ipv4/tcp_input.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9a09ff3afef2..4c2dd9f863f7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -245,16 +245,16 @@ static void tcp_ecn_queue_cwr(struct tcp_sock *tp)
tp->ecn_flags |= TCP_ECN_QUEUE_CWR;
 }
 
-static void tcp_ecn_accept_cwr(struct tcp_sock *tp, const struct sk_buff *skb)
+static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
 {
if (tcp_hdr(skb)->cwr) {
-   tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
+   tcp_sk(sk)->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
 
/* If the sender is telling us it has entered CWR, then its
 * cwnd may be very low (even just 1 packet), so we should ACK
 * immediately.
 */
-   tcp_enter_quickack_mode((struct sock *)tp, 2);
+   inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
}
 }
 
@@ -4703,7 +4703,7 @@ static void tcp_data_queue(struct sock *sk, struct 
sk_buff *skb)
skb_dst_drop(skb);
__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
-   tcp_ecn_accept_cwr(tp, skb);
+   tcp_ecn_accept_cwr(sk, skb);
 
tp->rx_opt.dsack = 0;
 
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH net-next 2/4] tcp: avoid resetting ACK timer in DCTCP

2018-08-09 Thread Yuchung Cheng
The recent fix of acking immediately in DCTCP on CE status change
has an undesirable side-effect: it also resets TCP ack timer and
disables pingpong mode (interactive session). But the CE status
change has nothing to do with them. This patch addresses that by
using the new one-time immediate ACK flag instead of calling
tcp_enter_quickack_mode().

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: Wei Wang 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_dctcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index 8b637f9f23a2..ca61e2a659e7 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -136,7 +136,7 @@ static void dctcp_ce_state_0_to_1(struct sock *sk)
 */
if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
__tcp_send_ack(sk, ca->prior_rcv_nxt);
-   tcp_enter_quickack_mode(sk, 1);
+   inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
}
 
ca->prior_rcv_nxt = tp->rcv_nxt;
@@ -157,7 +157,7 @@ static void dctcp_ce_state_1_to_0(struct sock *sk)
 */
if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER)
__tcp_send_ack(sk, ca->prior_rcv_nxt);
-   tcp_enter_quickack_mode(sk, 1);
+   inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
}
 
ca->prior_rcv_nxt = tp->rcv_nxt;
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH net-next 0/4] new mechanism to ACK immediately

2018-08-09 Thread Yuchung Cheng
This patch is a follow-up feature improvement to the recent fixes on
the performance issues in ECN (delayed) ACKs. Many of the fixes use
tcp_enter_quickack_mode routine to force immediate ACKs. However the
routine also resets the tracking of interactive sessions. This is not ideal
because these immediate ACKs are required by protocol specifics
unrelated to the interactive nature of the application.

This patch set introduces a new flag to send a one-time immediate ACK
without changing the status of interactive session tracking. With this
patch set the immediate ACKs are generated upon these protocol states:

1) When a hole is repaired
2) When CE status changes between subsequent data packets received
3) When a data packet carries CWR flag

Yuchung Cheng (4):
  tcp: mandate a one-time immediate ACK
  tcp: avoid resetting ACK timer in DCTCP
  tcp: always ACK immediately on hole repairs
  tcp: avoid resetting ACK timer upon receiving packet with ECN CWR flag

 include/net/inet_connection_sock.h |  3 ++-
 net/ipv4/tcp_dctcp.c   |  4 ++--
 net/ipv4/tcp_input.c   | 16 +---
 3 files changed, 13 insertions(+), 10 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog



[PATCH net-next 1/4] tcp: mandate a one-time immediate ACK

2018-08-09 Thread Yuchung Cheng
Add a new flag to indicate a one-time immediate ACK. This flag is
occasionally set under specific TCP protocol states in addition to
the more common quickack mechanism for interactive applications.

In several cases in the TCP code we want to force an immediate ACK
but do not want to call tcp_enter_quickack_mode() because we do
not want to forget the icsk_ack.pingpong or icsk_ack.ato state.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: Wei Wang 
Signed-off-by: Eric Dumazet 
---
 include/net/inet_connection_sock.h | 3 ++-
 net/ipv4/tcp_input.c   | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_connection_sock.h 
b/include/net/inet_connection_sock.h
index 0a6c9e0f2b5a..fa43b82607d9 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -167,7 +167,8 @@ enum inet_csk_ack_state_t {
ICSK_ACK_SCHED  = 1,
ICSK_ACK_TIMER  = 2,
ICSK_ACK_PUSHED = 4,
-   ICSK_ACK_PUSHED2 = 8
+   ICSK_ACK_PUSHED2 = 8,
+   ICSK_ACK_NOW = 16   /* Send the next ACK immediately (once) */
 };
 
 void inet_csk_init_xmit_timers(struct sock *sk,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 715d541b52dd..b8849588c440 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5179,7 +5179,9 @@ static void __tcp_ack_snd_check(struct sock *sk, int 
ofo_possible)
(tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat ||
 __tcp_select_window(sk) >= tp->rcv_wnd)) ||
/* We ACK each frame or... */
-   tcp_in_quickack_mode(sk)) {
+   tcp_in_quickack_mode(sk) ||
+   /* Protocol state mandates a one-time immediate ACK */
+   inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOW) {
 send_now:
tcp_send_ack(sk);
return;
-- 
2.18.0.597.ga71716f1ad-goog



Re: [PATCH net-next v2 1/1] net/tls: Combined memory allocation for decryption request

2018-08-09 Thread Dave Watson
On 08/09/18 04:56 AM, Vakul Garg wrote:
> For preparing decryption request, several memory chunks are required
> (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
> an accelerator, it is required that the buffers which are read by the
> accelerator must be dma-able and not come from stack. The buffers for
> aad and iv can be separately kmalloced each, but it is inefficient.
> This patch does a combined allocation for preparing decryption request
> and then segments into aead_req || sgin || sgout || iv || aad.
> 
> Signed-off-by: Vakul Garg 

Reviewed-by: Dave Watson 

Thanks, looks good to me now.


Re: [PATCH bpf-next 1/3] bpf: add bpf queue map

2018-08-09 Thread Alexei Starovoitov
On Thu, Aug 09, 2018 at 09:51:49AM -0500, Mauricio Vasquez wrote:
> 
> > Agree that existing ops are not the right alias, but deferring to user
> > space as inline function also doesn't really seem like a good fit, imho,
> > so I'd prefer rather to have something native. (Aside from that, the
> > above inline bpf_pop() would also race between CPUs.)
> 
> I think we should have push/pop/peek syscalls as well, having a bpf_pop()
> that is race prone would create problems. Users expected maps operations to
> be safe, so having one that is not will confuse them.

agree the races are not acceptable.
How about a mixed solution:
- introduce bpf_push/pop/peek helpers that programs will use, so
  they don't need to pass useless key=NULL
- introduce map->ops->lookup_and_delete and map->ops->lookup_or_init
  that prog-side helpers can use and syscall has 1-1 mapping for

Native lookup_or_init() helper for programs and syscall is badly missing.
Most of the bcc scripts use it and bcc has a racy workaround.
Similarly lookup_and_delete() syscall is 1-1 to pop() for stack/queue
and useful for regular hash maps.

At the end for stack/queue map the programs will use:
int bpf_push(map, value);
value_or_null = bpf_pop(map); // guaranteed non-racy for multi-cpu
value_or_null = bpf_peek(map); // racy if 2+ cpus doing it

from syscall:
bpf_map_lookup_elem(map, NULL, &value); // returns top of stack
bpf_map_lookup_and_delete_elem(map, NULL, &value); // returns top and deletes top atomically
bpf_map_update_elem(map, NULL, &value); // pushes new value into stack atomically

Eventually hash and other maps will implement bpf_map_lookup_and_delete()
for both bpf progs and syscall.

The main point is that the prog-side API doesn't have to match 1-1 with the syscall side,
since they're different enough already.
Like lookup_or_init() is badly needed for programs, but unnecessary for syscall.

Thoughts?



Re: [PATCH iproute2/net-next] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-08-09 Thread Stephen Hemminger
On Thu,  9 Aug 2018 11:16:02 -0400
Eelco Chaudron  wrote:

>  
> +static void print_tcstats_basic_hw(struct rtattr **tbs, char *prefix)
> +{
> + struct gnet_stats_basic bs = {0};

If not present don't print it rather than printing zero.

> + struct gnet_stats_basic bs_hw = {0};

This initialization is unnecessary since you always overwrite it.

> +
> + if (!tbs[TCA_STATS_BASIC_HW])
> + return;
> +
> + memcpy(&bs_hw, RTA_DATA(tbs[TCA_STATS_BASIC_HW]),
> +MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC_HW]), sizeof(bs_hw)));
> +
> + if (bs_hw.bytes == 0 && bs_hw.packets == 0)
> + return;
> +
> + if (tbs[TCA_STATS_BASIC]) {
> + memcpy(&bs, RTA_DATA(tbs[TCA_STATS_BASIC]),
> +MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC]),
> +sizeof(bs)));
> + }
> +
> + if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
> + print_string(PRINT_FP, NULL, "\n%s", prefix);

Please use the magic string _SL_ to allow supporting single line output mode.

> + print_lluint(PRINT_ANY, "sw_bytes",
> +  "Sent software %llu bytes",
> +  bs.bytes - bs_hw.bytes);
> + print_uint(PRINT_ANY, "sw_packets", " %u pkt",
> +bs.packets - bs_hw.packets);
> + }
> +
> + print_string(PRINT_FP, NULL, "\n%s", prefix);
> + print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
> +  bs_hw.bytes);
> + print_uint(PRINT_ANY, "hw_packets", " %u pkt", bs_hw.packets);
> +}


Re: [PATCH bpf-next] BPF: helpers: New helper to obtain namespace data from current task

2018-08-09 Thread Carlos Neira
Jesper,
Here is the updated patch.
 
>From 92633f6819423093932e8d04aa3dc99a5913f6fd Mon Sep 17 00:00:00 2001
From: Carlos Neira 
Date: Thu, 9 Aug 2018 09:55:32 -0400
Subject: [PATCH bpf-next] BPF: helpers: New helper to obtain namespace
 data from current task

This helper obtains the active namespace from current and returns pid, tgid,
device and namespace id as seen from that namespace, allowing instrumentation
of a process inside a container.
Device is read from /proc/self/ns/pid, as in the future it's possible that
different pid_ns files may belong to different devices, according
to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
conference.

Currently bpf_get_current_pid_tgid() is used to do pid filtering in bcc's
scripts, but this helper returns the pid as seen by the root namespace, which
is fine when a bcc script is not executed inside a container.
When the process of interest is inside a container, pid filtering will not work
if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
by returning the pid as it is seen by the current namespace where the script is
executing.

This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
used to do pid filtering even inside a container.

For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):

u32 pid = bpf_get_current_pid_tgid() >> 32;
if (pid != )
return 0;

Could be modified to use bpf_get_current_pidns_info() as follows:

struct bpf_pidns_info pidns;
bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns_info));
u32 pid = pidns.tgid;
u32 nsid = pidns.nsid;
if ((pid != ) && (nsid != ))
return 0;

To find out the PID namespace id of a process, you could use this command:

$ ps -h -o pidns -p 

Or this other command:

$ ls -Li /proc//ns/pid

Signed-off-by: Carlos Antonio Neira Bustos 
---
 include/linux/bpf.h   |  1 +
 include/uapi/linux/bpf.h  | 24 +++-
 kernel/bpf/core.c |  1 +
 kernel/bpf/helpers.c  | 64 +++
 kernel/trace/bpf_trace.c  |  2 +
 samples/bpf/Makefile  |  3 ++
 samples/bpf/trace_ns_info_user.c  | 35 +
 samples/bpf/trace_ns_info_user_kern.c | 45 ++
 tools/include/uapi/linux/bpf.h| 24 +++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 ++
 10 files changed, 200 insertions(+), 2 deletions(-)
 create mode 100644 samples/bpf/trace_ns_info_user.c
 create mode 100644 samples/bpf/trace_ns_info_user_kern.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cd8790d2c6ed..3f4b999f7c99 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -787,6 +787,7 @@ extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
 extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
 
 extern const struct bpf_func_proto bpf_get_local_storage_proto;
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dd5758dc35d3..8462f9881465 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2113,6 +2113,18 @@ union bpf_attr {
  * the shared data.
  * Return
  * Pointer to the local storage area.
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 
size_of_pidns)
+ * Description
+ * Copies into *pidns* pid, namespace id and tgid as seen by the
+ * current namespace and also device from /proc/self/ns/pid.
+ * *size_of_pidns* must be the size of *pidns*
+ *
+ * This helper is used when pid filtering is needed inside a
+ * container, as the bpf_get_current_pid_tgid() helper always returns
+ * the pid as seen by the root namespace.
+ * Return
+ * 0 on success, -EINVAL on error.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -2196,7 +2208,8 @@ union bpf_attr {
FN(rc_keydown), \
FN(skb_cgroup_id),  \
FN(get_current_cgroup_id),  \
-   FN(get_local_storage),
+   FN(get_local_storage),  \
+   FN(get_current_pidns_info),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2724,4 +2737,13 @@ enum bpf_task_fd_type {
BPF_FD_TYPE_URETPROBE,  /* filename + offset */
 };
 
+/* helper bpf_get_current_pidns_info will store the following
+ * data, dev will contain major/minor from /proc/self/ns/pid.
+ */
+struct bpf_pidns_info {
+   __u32 dev;
+   __u32 nsid;
+   __u32 tgid;
+   __u32 pid;
+};
 #endif /* 

Re: [PATCH bpf] bpf: fix bpffs non-array map seq_show issue

2018-08-09 Thread Daniel Borkmann
On 08/09/2018 05:15 PM, Yonghong Song wrote:
> On 8/9/18 7:24 AM, Daniel Borkmann wrote:
>> On 08/09/2018 05:55 AM, Yonghong Song wrote:
>>> On 8/8/18 7:25 PM, Alexei Starovoitov wrote:
 On Wed, Aug 08, 2018 at 06:25:19PM -0700, Yonghong Song wrote:
> In function map_seq_next() of kernel/bpf/inode.c,
> the first key will be the "0" regardless of the map type.
> This works for array. But for hash type, if it happens
> key "0" is in the map, the bpffs map show will miss
> some items if the key "0" is not the first element of
> the first bucket.
>
> This patch fixed the issue by guaranteeing to get
> the first element, if the seq_show is just started,
> by passing NULL pointer key to map_get_next_key() callback.
> This way, no missing elements will occur for
> bpffs hash table show even if key "0" is in the map.
>>>
>>> Currently, map_seq_show_elem callback is only implemented
>>> for arraymap. So the problem actually is not exposed.
>>>
>>> The issue is discovered when I tried to implement
>>> map_seq_show_elem for hash maps, and I will have followup
>>> patches for it.

Btw, on that note, I would also prefer if we could decouple
BTF from the map_seq_show_elem() as there is really no reason
to have it on a per-map. I had a patch below which would enable
it for all map types generically, and bpftool works out of the
box for it. Also, array doesn't really have to be 'int' type
enforced as long as it's some data structure with 4 bytes, it's
all fine, so this can be made fully generic (we only eventually
care about the match in size).

>From 0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7 Mon Sep 17 00:00:00 2001
Message-Id: 
<0a8be27cbc2ac0c6fc2632865b5afe37222a1fc7.1533830053.git.dan...@iogearbox.net>
From: Daniel Borkmann 
Date: Thu, 9 Aug 2018 16:50:21 +0200
Subject: [PATCH bpf-next] bpf, btf: enable for all maps

# bpftool m dump id 19
[{
"key": {
"": {
"vip": 0,
"vipv6": []
},
"port": 0,
"family": 0,
"proto": 0
},
"value": {
"flags": 0,
"vip_num": 0
}
}
]

Signed-off-by: Daniel Borkmann 
---
 include/linux/bpf.h   |  4 +---
 kernel/bpf/arraymap.c | 27 ---
 kernel/bpf/inode.c|  3 ++-
 kernel/bpf/syscall.c  | 24 
 4 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cd8790d..91aa4be 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -48,8 +48,6 @@ struct bpf_map_ops {
u32 (*map_fd_sys_lookup_elem)(void *ptr);
void (*map_seq_show_elem)(struct bpf_map *map, void *key,
  struct seq_file *m);
-   int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
-u32 key_type_id, u32 value_type_id);
 };

 struct bpf_map {
@@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct 
bpf_map *map)

 static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
 {
-   return map->ops->map_seq_show_elem && map->ops->map_check_btf;
+   return map->ops->map_seq_show_elem;
 }

 extern const struct bpf_map_ops bpf_map_offload_ops;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2aa55d030..67f0bdf 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, 
void *key,
rcu_read_unlock();
 }

-static int array_map_check_btf(const struct bpf_map *map, const struct btf 
*btf,
-  u32 btf_key_id, u32 btf_value_id)
-{
-   const struct btf_type *key_type, *value_type;
-   u32 key_size, value_size;
-   u32 int_data;
-
-   key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
-   if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
-   return -EINVAL;
-
-   int_data = *(u32 *)(key_type + 1);
-   /* bpf array can only take a u32 key.  This check makes
-* sure that the btf matches the attr used during map_create.
-*/
-   if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
-   BTF_INT_OFFSET(int_data))
-   return -EINVAL;
-
-   value_type = btf_type_id_size(btf, &btf_value_id, &value_size);
-   if (!value_type || value_size != map->value_size)
-   return -EINVAL;
-
-   return 0;
-}
-
 const struct bpf_map_ops array_map_ops = {
.map_alloc_check = array_map_alloc_check,
.map_alloc = array_map_alloc,
@@ -394,7 +368,6 @@ const struct bpf_map_ops array_map_ops = {
.map_delete_elem = array_map_delete_elem,
.map_gen_lookup = array_map_gen_lookup,
.map_seq_show_elem = array_map_seq_show_elem,
-   .map_check_btf = array_map_check_btf,
 };

 const struct bpf_map_ops percpu_array_map_ops = {
diff --git a/kernel/bpf/inode.c 

[PATCH bpf-next 1/3] bpf: fix bpffs non-array map seq_show issue

2018-08-09 Thread Yonghong Song
In function map_seq_next() of kernel/bpf/inode.c,
the first key will be "0" regardless of the map type.
This works for arrays. But for hash-type maps, if key
"0" happens to be in the map, the bpffs map show will miss
some items if key "0" is not the first element of
the first bucket.

This patch fixes the issue by guaranteeing to get
the first element, if the seq_show has just started,
by passing a NULL pointer key to the map_get_next_key() callback.
This way, no missing elements will occur for
the bpffs hash table show even if key "0" is in the map.

Fixes: a26ca7c982cb5 ("bpf: btf: Add pretty print support to the basic 
arraymap")
Acked-by: Alexei Starovoitov 
Signed-off-by: Yonghong Song 
---
 kernel/bpf/inode.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 76efe9a183f5..fc5b103512e7 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -196,19 +196,21 @@ static void *map_seq_next(struct seq_file *m, void *v, 
loff_t *pos)
 {
struct bpf_map *map = seq_file_to_map(m);
void *key = map_iter(m)->key;
+   void *prev_key;
 
if (map_iter(m)->done)
return NULL;
 
if (unlikely(v == SEQ_START_TOKEN))
-   goto done;
+   prev_key = NULL;
+   else
+   prev_key = key;
 
-   if (map->ops->map_get_next_key(map, key, key)) {
+   if (map->ops->map_get_next_key(map, prev_key, key)) {
map_iter(m)->done = true;
return NULL;
}
 
-done:
++(*pos);
return key;
 }
-- 
2.14.3



[PATCH bpf-next 2/3] bpf: btf: add pretty print for hash/lru_hash maps

2018-08-09 Thread Yonghong Song
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") added pretty print support to array map.
This patch adds pretty print for hash and lru_hash maps.
The following example shows the pretty-print result of
a pinned hashmap:

struct map_value {
int count_a;
int count_b;
};

cat /sys/fs/bpf/pinned_hash_map:

87907: {87907,87908}
57354: {37354,57355}
76625: {76625,76626}
...

Signed-off-by: Yonghong Song 
---
 kernel/bpf/hashtab.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 513d9dfcf4ee..d6110042e0d9 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -11,9 +11,11 @@
  * General Public License for more details.
  */
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
@@ -1162,6 +1164,44 @@ static void htab_map_free(struct bpf_map *map)
kfree(htab);
 }
 
+static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
+  struct seq_file *m)
+{
+   void *value;
+
+   rcu_read_lock();
+
+   value = htab_map_lookup_elem(map, key);
+   if (!value) {
+   rcu_read_unlock();
+   return;
+   }
+
+   btf_type_seq_show(map->btf, map->btf_key_type_id, key, m);
+   seq_puts(m, ": ");
+   btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
+   seq_puts(m, "\n");
+
+   rcu_read_unlock();
+}
+
+static int htab_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+ u32 btf_key_id, u32 btf_value_id)
+{
+   const struct btf_type *key_type, *value_type;
+   u32 key_size, value_size;
+
+   key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
+   if (!key_type || key_size != map->key_size)
+   return -EINVAL;
+
+   value_type = btf_type_id_size(btf, &btf_value_id, &value_size);
+   if (!value_type || value_size != map->value_size)
+   return -EINVAL;
+
+   return 0;
+}
+
 const struct bpf_map_ops htab_map_ops = {
.map_alloc_check = htab_map_alloc_check,
.map_alloc = htab_map_alloc,
@@ -1171,6 +1211,8 @@ const struct bpf_map_ops htab_map_ops = {
.map_update_elem = htab_map_update_elem,
.map_delete_elem = htab_map_delete_elem,
.map_gen_lookup = htab_map_gen_lookup,
+   .map_seq_show_elem = htab_map_seq_show_elem,
+   .map_check_btf = htab_map_check_btf,
 };
 
 const struct bpf_map_ops htab_lru_map_ops = {
@@ -1182,6 +1224,8 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_update_elem = htab_lru_map_update_elem,
.map_delete_elem = htab_lru_map_delete_elem,
.map_gen_lookup = htab_lru_map_gen_lookup,
+   .map_seq_show_elem = htab_map_seq_show_elem,
+   .map_check_btf = htab_map_check_btf,
 };
 
 /* Called from eBPF program */
-- 
2.14.3



[PATCH bpf-next 0/3] bpf: add bpffs pretty print for hash/lru_hash maps

2018-08-09 Thread Yonghong Song
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") added pretty print support to array map.
This patch adds pretty print for hash and lru_hash maps.

The following example shows the pretty-print result of a pinned hashmap.
Without this patch set, user will get an error instead.

struct map_value {
int count_a;
int count_b;
};

cat /sys/fs/bpf/pinned_hash_map:

87907: {87907,87908}
57354: {37354,57355}
76625: {76625,76626}
...

Patch #1 fixed a bug in bpffs map_seq_next() function so that
all elements in the hash table will be traversed.
Patch #2 implemented map_seq_show_elem() and map_check_btf()
callback functions for hash and lru hash maps.
Patch #3 enhanced tools/testing/selftests/bpf/test_btf.c to
test bpffs hash and lru hash map pretty print.

Yonghong Song (3):
  bpf: fix bpffs non-array map seq_show issue
  bpf: btf: add pretty print for hash/lru_hash maps
  tools/bpf: add bpffs pretty print btf test for hash/lru_hash maps

 kernel/bpf/hashtab.c   | 44 +
 kernel/bpf/inode.c |  8 ++--
 tools/testing/selftests/bpf/test_btf.c | 87 --
 3 files changed, 121 insertions(+), 18 deletions(-)

-- 
2.14.3



[PATCH bpf-next 3/3] tools/bpf: add bpffs pretty print btf test for hash/lru_hash maps

2018-08-09 Thread Yonghong Song
Pretty print tests for hash/lru_hash maps are added in test_btf.c.
The btf type blob is the same as pretty print array map test.
The test result:
  $ mount -t bpf bpf /sys/fs/bpf
  $ ./test_btf -p
BTF pretty print array..OK
BTF pretty print hash..OK
BTF pretty print lru hash..OK
PASS:3 SKIP:0 FAIL:0

Signed-off-by: Yonghong Song 
---
 tools/testing/selftests/bpf/test_btf.c | 87 --
 1 file changed, 72 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c 
b/tools/testing/selftests/bpf/test_btf.c
index ffdd27737c9e..7fa8c800c540 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -131,6 +131,8 @@ struct btf_raw_test {
__u32 max_entries;
bool btf_load_err;
bool map_create_err;
+   bool ordered_map;
+   bool lossless_map;
int hdr_len_delta;
int type_off_delta;
int str_off_delta;
@@ -2093,8 +2095,7 @@ struct pprint_mapv {
} aenum;
 };
 
-static struct btf_raw_test pprint_test = {
-   .descr = "BTF pretty print test #1",
+static struct btf_raw_test pprint_test_template = {
.raw_types = {
/* unsighed char */ /* [1] */
BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 8, 1),
@@ -2146,8 +2147,6 @@ static struct btf_raw_test pprint_test = {
},
.str_sec = "\0unsigned char\0unsigned short\0unsigned 
int\0int\0unsigned long 
long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum",
.str_sec_size = sizeof("\0unsigned char\0unsigned short\0unsigned 
int\0int\0unsigned long 
long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum"),
-   .map_type = BPF_MAP_TYPE_ARRAY,
-   .map_name = "pprint_test",
.key_size = sizeof(unsigned int),
.value_size = sizeof(struct pprint_mapv),
.key_type_id = 3,   /* unsigned int */
@@ -2155,6 +2154,40 @@ static struct btf_raw_test pprint_test = {
.max_entries = 128 * 1024,
 };
 
+static struct btf_pprint_test_meta {
+   const char *descr;
+   enum bpf_map_type map_type;
+   const char *map_name;
+   bool ordered_map;
+   bool lossless_map;
+} pprint_tests_meta[] = {
+{
+   .descr = "BTF pretty print array",
+   .map_type = BPF_MAP_TYPE_ARRAY,
+   .map_name = "pprint_test_array",
+   .ordered_map = true,
+   .lossless_map = true,
+},
+
+{
+   .descr = "BTF pretty print hash",
+   .map_type = BPF_MAP_TYPE_HASH,
+   .map_name = "pprint_test_hash",
+   .ordered_map = false,
+   .lossless_map = true,
+},
+
+{
+   .descr = "BTF pretty print lru hash",
+   .map_type = BPF_MAP_TYPE_LRU_HASH,
+   .map_name = "pprint_test_lru_hash",
+   .ordered_map = false,
+   .lossless_map = false,
+},
+
+};
+
+
 static void set_pprint_mapv(struct pprint_mapv *v, uint32_t i)
 {
v->ui32 = i;
@@ -2166,10 +2199,12 @@ static void set_pprint_mapv(struct pprint_mapv *v, 
uint32_t i)
v->aenum = i & 0x03;
 }
 
-static int test_pprint(void)
+static int do_test_pprint(void)
 {
-   const struct btf_raw_test *test = &pprint_test;
+   const struct btf_raw_test *test = &pprint_test_template;
+   const struct btf_raw_test *test = _test_template;
struct bpf_create_map_attr create_attr = {};
+   unsigned int key, nr_read_elems;
+   bool ordered_map, lossless_map;
int map_fd = -1, btf_fd = -1;
struct pprint_mapv mapv = {};
unsigned int raw_btf_size;
@@ -2178,7 +2213,6 @@ static int test_pprint(void)
char pin_path[255];
size_t line_len = 0;
char *line = NULL;
-   unsigned int key;
uint8_t *raw_btf;
ssize_t nread;
int err, ret;
@@ -2251,14 +2285,18 @@ static int test_pprint(void)
goto done;
}
 
-   key = 0;
+   nr_read_elems = 0;
+   ordered_map = test->ordered_map;
+   lossless_map = test->lossless_map;
do {
ssize_t nexpected_line;
+   unsigned int next_key;
 
-   set_pprint_mapv(&mapv, key);
+   next_key = ordered_map ? nr_read_elems : atoi(line);
+   set_pprint_mapv(&mapv, next_key);
nexpected_line = snprintf(expected_line, sizeof(expected_line),
  "%u: 
{%u,0,%d,0x%x,0x%x,0x%x,{%lu|[%u,%u,%u,%u,%u,%u,%u,%u]},%s}\n",
- key,
+ next_key,
  mapv.ui32, mapv.si32,
  mapv.unused_bits2a, mapv.bits28, 
mapv.unused_bits2b,
  mapv.ui64,
@@ -2281,11 +2319,12 @@ static int test_pprint(void)
   

Re: [net-next, PATCH 1/2] net: socionext: Use descriptor info instead of MMIO reads on Rx

2018-08-09 Thread Arnd Bergmann
On Thu, Aug 9, 2018 at 10:02 AM Ilias Apalodimas
 wrote:
>
> MMIO reads for remaining packets in the queue occur (at least) twice per
> invocation of netsec_process_rx(). We can use the packet descriptor to
> identify if it's owned by the hardware and break out, avoiding the more
> expensive MMIO read operations. This gives a ~2% increase in pps on the
> Rx path when tested with 64-byte packets.
>
> Signed-off-by: Ilias Apalodimas 
> ---
>  drivers/net/ethernet/socionext/netsec.c | 19 +--
>  1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/ethernet/socionext/netsec.c 
> b/drivers/net/ethernet/socionext/netsec.c
> index 01589b6..ae32909 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -657,8 +657,6 @@ static struct sk_buff *netsec_get_rx_pkt_data(struct 
> netsec_priv *priv,

> +   if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD))
> +   break;
> done++;

Should this use READ_ONCE() to prevent the compiler from moving the
access around? I see that netsec_get_rx_pkt_data() has a dma_rmb()
before reading the data, which prevents the CPU from doing something
wrong here, but not the compiler.

Arnd


Re: [Query]: DSA Understanding

2018-08-09 Thread Andrew Lunn
> > > The received packets captured on the PC are MDNS and DHCP; these MDNS
> > > packets are causing the rx packet counter to go up:
> >
> > And where are these packets coming from? The target device? Or some
> > other device on your network?
> >
> AFAIK, mDNS is also a kind of broadcast: it is sending mDNS requests and
> receiving them itself; that’s the reason rx packets are incrementing
> (correct me if I am wrong)

Your Ethernet device should not be receiving its own
transmissions. Looping back for broadcast and multicast packets
happens higher up in the network stack.

Look at the source MAC address for these packets. Where are they coming
from?

> ~$ ethtool -S lan4
> NIC statistics:
...
>  tx_hi: 0
>  tx_late_col: 0
>  tx_pause: 0
>  tx_bcast: 749
>  tx_mcast: 212

This suggest the switch is putting packets onto the wire.

> Only weird thing I notice on target, when its replying for ping
> requests ( (oui Unknown) is that something which is causing issues ?

These are not ping requests. These are ARP requests/replies.

> 08:11:20.230704 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
> VB4-SN tell tango-charlie.local, length 46
> 08:11:20.230749 ARP, Ethernet (len 6), IPv4 (len 4), Reply
> VB4-SN is-at c4:f3:12:08:fe:7f (oui Unknown), length 28
> 08:11:21.230629 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
> VB4-SN tell tango-charlie.local, length 46
> 08:11:21.230657 ARP, Ethernet (len 6), IPv4 (len 4), Reply
> VB4-SN is-at c4:f3:12:08:fe:7f (oui Unknown), length 28
> 08:11:22.247831 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
> VB4-SN tell tango-charlie.local, length 46
> 08:11:22.247857 ARP, Ethernet (len 6), IPv4 (len 4), Reply
> VB4-SN is-at c4:f3:12:08:fe:7f (oui Unknown), length 28

c4:f3:12 is the OUI. It is actually registered to TI:

https://aruljohn.com/mac/C4F312

But tcpdump probably does not know this, or the build you have has the
oui table removed to keep the binary small.

Andrew


Re: C45 support and mdiobus_scan

2018-08-09 Thread Andrew Lunn
> > The PCIe core will look in the device tree and when it creates the
> > platform device for the i210 on the pcie bus, it points
> > pdev->dev.of_node at this node. So long as you are using a platform
> > with DT, you can do this. I hope you are not using x86..
> 
> Yes I am :( Any possible solution for this?

Well, DT can be used with x86. I think Edison did that. But I assume
your PCIe host is in ACPI, not DT. So getting this linking working
will not be easy.

There has been some work to add an ACPI binding for PHYs. I don't know
if it actually got far enough that you can hack your DSDT to add a
PHY. But I'm sure it did not get far enough that you can describe an
MDIO bus in DSDT, so it probably is not going to help you.

> I guess in ultimate case I will have to switch to ARM based setup.

Yes, or MIPS.

 Andrew


Re: [PATCH bpf] bpf: fix bpffs non-array map seq_show issue

2018-08-09 Thread Yonghong Song




On 8/9/18 7:24 AM, Daniel Borkmann wrote:

On 08/09/2018 05:55 AM, Yonghong Song wrote:

On 8/8/18 7:25 PM, Alexei Starovoitov wrote:

On Wed, Aug 08, 2018 at 06:25:19PM -0700, Yonghong Song wrote:

In function map_seq_next() of kernel/bpf/inode.c,
the first key will be "0" regardless of the map type.
This works for arrays. But for hash maps, if key "0"
happens to be in the map, the bpffs map show will miss
some items when key "0" is not the first element of
the first bucket.

This patch fixes the issue by guaranteeing that the
first element is returned, when the seq_show has just
started, by passing a NULL key pointer to the
map_get_next_key() callback. This way, no elements
will be missed in a bpffs hash table show even if
key "0" is in the map.


Currently, map_seq_show_elem callback is only implemented
for arraymap. So the problem actually is not exposed.

The issue was discovered when I tried to implement
map_seq_show_elem for hash maps, and I will have followup
patches for it.

So this patch probably should apply to bpf-next, or
I can include this patch in my later patch set
which implements map_seq_show_elem for hash maps
and can demonstrate the problem.

Please let me know.


Fixes: a26ca7c982cb5 ("bpf: btf: Add pretty print support to the basic 
arraymap")
Signed-off-by: Yonghong Song 


Acked-by: Alexei Starovoitov 


Given this doesn't affect any current code, I think bpf-next
would be fine.

Anyway, this cannot be used as-is, it results in the following compile
warning ...


Thanks and will fix the problem and resubmit the patch set to
bpf-next.



# make -j4 kernel/bpf/
   DESCEND  objtool
   CALLscripts/checksyscalls.sh
   CC  kernel/bpf/verifier.o
   CC  kernel/bpf/inode.o
kernel/bpf/inode.c: In function ‘map_seq_next’:
kernel/bpf/inode.c:214:1: warning: label ‘done’ defined but not used 
[-Wunused-label]
  done:
  ^~~~
   AR  kernel/bpf/built-in.a



[PATCH iproute2/net-next] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-08-09 Thread Eelco Chaudron
Add support for showing hardware specific counters to easily
troubleshoot hardware offload.

$ tc -s filter show dev enp3s0np0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 1 ref 1 bind 1 installed 0 sec used 0 sec
Action statistics:
Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 187542 bytes 4077 pkt
Sent hardware 534697200 bytes 8911620 pkt
backlog 0b 0p requeues 0
cookie 89173e6a7001becfd486bda17e29


Signed-off-by: Eelco Chaudron 
---
 include/uapi/linux/gen_stats.h |1 +
 tc/tc_util.c   |   38 ++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/gen_stats.h b/include/uapi/linux/gen_stats.h
index 24a861c..065408e 100644
--- a/include/uapi/linux/gen_stats.h
+++ b/include/uapi/linux/gen_stats.h
@@ -12,6 +12,7 @@ enum {
TCA_STATS_APP,
TCA_STATS_RATE_EST64,
TCA_STATS_PAD,
+   TCA_STATS_BASIC_HW,
__TCA_STATS_MAX,
 };
 #define TCA_STATS_MAX (__TCA_STATS_MAX - 1)
diff --git a/tc/tc_util.c b/tc/tc_util.c
index d757852..43a2013 100644
--- a/tc/tc_util.c
+++ b/tc/tc_util.c
@@ -800,6 +800,41 @@ void print_tm(FILE *f, const struct tcf_t *tm)
}
 }
 
+static void print_tcstats_basic_hw(struct rtattr **tbs, char *prefix)
+{
+   struct gnet_stats_basic bs = {0};
+   struct gnet_stats_basic bs_hw = {0};
+
+   if (!tbs[TCA_STATS_BASIC_HW])
+   return;
+
+   memcpy(&bs_hw, RTA_DATA(tbs[TCA_STATS_BASIC_HW]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC_HW]), sizeof(bs_hw)));
+
+   if (bs_hw.bytes == 0 && bs_hw.packets == 0)
+   return;
+
+   if (tbs[TCA_STATS_BASIC]) {
+   memcpy(&bs, RTA_DATA(tbs[TCA_STATS_BASIC]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC]),
+  sizeof(bs)));
+   }
+
+   if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
+   print_string(PRINT_FP, NULL, "\n%s", prefix);
+   print_lluint(PRINT_ANY, "sw_bytes",
+"Sent software %llu bytes",
+bs.bytes - bs_hw.bytes);
+   print_uint(PRINT_ANY, "sw_packets", " %u pkt",
+  bs.packets - bs_hw.packets);
+   }
+
+   print_string(PRINT_FP, NULL, "\n%s", prefix);
+   print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
+bs_hw.bytes);
+   print_uint(PRINT_ANY, "hw_packets", " %u pkt", bs_hw.packets);
+}
+
 void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char *prefix, struct 
rtattr **xstats)
 {
SPRINT_BUF(b1);
@@ -826,6 +861,9 @@ void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char 
*prefix, struct rtat
print_uint(PRINT_ANY, "requeues", " requeues %u) ", q.requeues);
}
 
+   if (tbs[TCA_STATS_BASIC_HW])
+   print_tcstats_basic_hw(tbs, prefix);
+
if (tbs[TCA_STATS_RATE_EST64]) {
struct gnet_stats_rate_est64 re = {0};
 



Re: [PATCH] net: macb: do not disable MDIO bus when closing interface

2018-08-09 Thread Andrew Lunn
Hi Anssi

> macb_reset_hw() is called in init path too, though, so maybe clearing
> all bits is intentional / wanted to get the controller to a known state,
> even though the comment only mentions TX/RX?

You need to be careful here. Once of_mdiobus_register() is called, the
MDIO should be usable. If you happen to have an Ethernet switch on the
bus, it could be probed then. The DSA driver will start using the bus.
Or if you have a second PHY, connected to some other MAC, it could be
used by the other MAC.  This all happens in the macb_probe function.

Sometime later, the interface will be up'ed. At this point macb_open()
is called, which calls macb_init_hw(), which calls
macb_reset_hw(). What you don't want happening is changes to the NCR
at this point breaking an MDIO transaction which might be going on.

Ideally, the MPE should be enabled before of_mdiobus_register(), and
left alone until mdiobus_unregister() is called in macb_remove().

 Andrew


Re: C45 support and mdiobus_scan

2018-08-09 Thread Jose Abreu
Hi Andrew,

Thanks for your answer :)

On 09-08-2018 16:03, Andrew Lunn wrote:
> On Thu, Aug 09, 2018 at 02:54:11PM +0100, Jose Abreu wrote:
>> Hi All,
>>
>> I'm preparing to add support for 10G in stmmac and I noticed that
>> Generic 10G PHY needs C45 support. Digging through the
>> registration callbacks for phy that are used in stmmac I reached
>> to mdiobus_scan() and the following call:
>>
>> phydev = get_phy_device(bus, addr, false);
>>
>> The last parameter is "is_c45", and is always being set to false ...
>>
>> Does this mean that I can't use the Generic 10G PHY in stmmac? I
>> don't mind link being fixed for 10G for now.
> Hi Jose
>
> So far, all MACs which support 10G have used phy-handle to point to a
> PHY on an MDIO bus, and that PHY uses .compatible =
> "ethernet-phy-ieee802.3-c45". of_mdiobus_register() will then find the
> PHY and register it. You really should try to follow this, if you can.
>
>> (Notice I'm using a PCI based setup so no DT bindings can help me
>> for this).
> That is not necessarily true. Take a look at:
>
> arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi
>
> &pcie {
> pinctrl-names = "default";
> pinctrl-0 = <&pinctrl_pcie>;
> reset-gpio = < 12 GPIO_ACTIVE_LOW>;
> status = "okay";
>
> host@0 {
> reg = <0 0 0 0 0>;
>
> #address-cells = <3>;
> #size-cells = <2>;
>
> i210: i210@0 {
> reg = <0 0 0 0 0>;
> };
> };
> };
>
> The PCIe core will look in the device tree and when it creates the
> platform device for the i210 on the pcie bus, it points
> pdev->dev.of_node at this node. So long as you are using a platform
> with DT, you can do this. I hope you are not using x86..

Yes I am :( Any possible solution for this?

I guess in ultimate case I will have to switch to ARM based setup.

Thanks and Best Regards,
Jose Miguel Abreu

>
>  Andrew



Re: [Query]: DSA Understanding

2018-08-09 Thread Lad, Prabhakar
Hi,

On Thu, Aug 9, 2018 at 1:56 PM Andrew Lunn  wrote:
>
> On Thu, Aug 09, 2018 at 01:45:52PM +0100, Lad, Prabhakar wrote:
> > On Thu, Aug 9, 2018 at 1:02 PM Andrew Lunn  wrote:
> > >
> > > On Thu, Aug 09, 2018 at 12:31:31PM +0100, Lad, Prabhakar wrote:
> > > > Hi Andrew,
> > > >
> > > > On Thu, Aug 2, 2018 at 5:05 PM Andrew Lunn  wrote:
> > > > >
> > > > > > I dont see any Reply's on the PC with tcpdump on PC
> > > > >
> > > > > So try ethtool -S on the PC. Any packets dropped because of errors?
> > > > >
> > > > I dont see any drops/errors on the PC, following is the dump from PC:
> > > >
> > > > sudo ethtool -S enx00e04c68c229
> > > > [sudo] password for prabhakar:
> > > > NIC statistics:
> > > >  tx_packets: 1659
> > > >  rx_packets: 485
> > > >  tx_errors: 0
> > > >  rx_errors: 0
> > > >  rx_missed: 0
> > > >  align_errors: 0
> > > >  tx_single_collisions: 0
> > > >  tx_multi_collisions: 0
> > > >  rx_unicast: 18
> > > >  rx_broadcast: 295
> > > >  rx_multicast: 172
> > > >  tx_aborted: 0
> > > >  tx_underrun: 0
> > >
> > > So there are received packets at the PC. Not many unicast, mostly
> > > broadcast, which fits with ARP. What does wireshark tell you about
> > > these received packets? Are they ARP replies? Are they something else?
> > > If they are ARP replies, why are they being ignored?  I don't know if
> > > tshark will show you CRC problems. Wireshark does, when you unfold a
> > > packet, and look at the fields in detail.
> > >
> > > > Seems like the packet is not being transmitted from the switch at all
> > > > ? (as ping from switch lan4 to PC fails)
> > >
> > > I don't think you can make that conclusion yet. The PC is receiving
> > > something, rx_packets=485. What are those packets?
> > >
> > The received packets captured on the PC are MDNS and DHCP; these MDNS
> > packets are causing the rx packet counter to go up:
>
> And where are these packets coming from? The target device? Or some
> other device on your network?
>
AFAIK, mDNS is also a kind of broadcast: it is sending mDNS requests and
receiving them itself; that’s the reason rx packets are incrementing
(correct me if I am wrong)

On the PC where lan4 is connected, tx has a high count because of ping requests:
prabhakar@tango-charlie:~/Desktop/test$ ifconfig  enx00e04c68c229
enx00e04c68c229 Link encap:Ethernet  HWaddr 00:e0:4c:68:c2:29
  inet addr:169.254.78.251  Bcast:169.254.255.255  Mask:255.255.0.0
  inet6 addr: fe80::2f12:3d45:7cca:57fa/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:502 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4811 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:79843 (79.8 KB)  TX bytes:252647 (252.6 KB)

> > I don’t see any packets reaching the PC for the ping request. I can see the
> > RX and TX on the switch for lan4 increasing every second. Seems like the
> > switch itself is consuming it and not forwarding (but then lan4 TX
> > shouldn’t have incremented?).
>
> Which lan4 counters are going up? tx_packets, rx_packets, tx_errors,
> rx_errors are software counters, and are incremented by the DSA
> core. Other counters are hardware counters, and the DSA driver will
> read them from the actual switch port. If the hardware counters show
> packets are being transmitted, then the packets are probably on the
> cable.
>

I can see the RX and TX incrementing every second; no error counters go up.

$ watch -n1 ifconfig lan4
lan4  Link encap:Ethernet  HWaddr C4:F3:12:08:FE:7F
  inet addr:169.254.126.126  Bcast:169.254.255.255  Mask:255.255.0.0
  inet6 addr: fe80::18b9:d16c:7ff:ab73%3201178264/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:2168 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1724 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:117816 (115.0 KiB)  TX bytes:99940 (97.5 KiB)

~$ ethtool -S eth1
NIC statistics:
 Good Rx Frames: 800
 Broadcast Rx Frames: 729
 Multicast Rx Frames: 71
 Pause Rx Frames: 0
 Rx CRC Errors: 0
 Rx Align/Code Errors: 0
 Oversize Rx Frames: 0
 Rx Jabbers: 0
 Undersize (Short) Rx Frames: 0
 Rx Fragments: 0
 Rx Octets: 65472
 Good Tx Frames: 369
 Broadcast Tx Frames: 16
 Multicast Tx Frames: 139
 Pause Tx Frames: 0
 Deferred Tx Frames: 0
 Collisions: 0
 Single Collision Tx Frames: 0
 Multiple Collision Tx Frames: 0
 Excessive Collisions: 0
 Late Collisions: 0
 Tx Underrun: 0
 Carrier Sense Errors: 0
 Tx Octets: 40990
 Rx + Tx 64 Octet Frames: 0
 Rx + Tx 65-127 Octet Frames: 1031
 Rx + Tx 128-255 Octet Frames: 73
 Rx + Tx 256-511 Octet Frames: 65
 Rx + Tx 512-1023 Octet Frames: 0
 Rx + Tx 1024-Up Octet Frames: 0
 Net Octets: 106462
 Rx Start of Frame Overruns: 0
 Rx 

Re: C45 support and mdiobus_scan

2018-08-09 Thread Andrew Lunn
On Thu, Aug 09, 2018 at 02:54:11PM +0100, Jose Abreu wrote:
> Hi All,
> 
> I'm preparing to add support for 10G in stmmac and I noticed that
> Generic 10G PHY needs C45 support. Digging through the
> registration callbacks for phy that are used in stmmac I reached
> to mdiobus_scan() and the following call:
> 
> phydev = get_phy_device(bus, addr, false);
> 
> The last parameter is "is_c45", and is always being set to false ...
> 
> Does this mean that I can't use the Generic 10G PHY in stmmac? I
> don't mind link being fixed for 10G for now.

Hi Jose

So far, all MACs which support 10G have used phy-handle to point to a
PHY on an MDIO bus, and that PHY uses .compatible =
"ethernet-phy-ieee802.3-c45". of_mdiobus_register() will then find the
PHY and register it. You really should try to follow this, if you can.

> (Notice I'm using a PCI based setup so no DT bindings can help me
> for this).

That is not necessarily true. Take a look at:

arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi

&pcie {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_pcie>;
reset-gpio = < 12 GPIO_ACTIVE_LOW>;
status = "okay";

host@0 {
reg = <0 0 0 0 0>;

#address-cells = <3>;
#size-cells = <2>;

i210: i210@0 {
reg = <0 0 0 0 0>;
};
};
};

The PCIe core will look in the device tree and when it creates the
platform device for the i210 on the pcie bus, it points
pdev->dev.of_node at this node. So long as you are using a platform
with DT, you can do this. I hope you are not using x86..

 Andrew


[PATCH 2/2] net/sched: Add hardware specific counters to TC actions

2018-08-09 Thread Eelco Chaudron
Add additional counters that will store the bytes/packets processed by
hardware. These will be exported through the netlink interface for
display by the iproute2 tc tool.

Signed-off-by: Eelco Chaudron 
---
 include/net/act_api.h  |8 +---
 include/net/pkt_cls.h  |2 +-
 net/sched/act_api.c|   14 +++---
 net/sched/act_gact.c   |6 +-
 net/sched/act_mirred.c |5 -
 5 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c9bc02..9931d8a 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -33,10 +33,12 @@ struct tc_action {
int tcfa_action;
struct tcf_ttcfa_tm;
struct gnet_stats_basic_packed  tcfa_bstats;
+   struct gnet_stats_basic_packed  tcfa_bstats_hw;
struct gnet_stats_queue tcfa_qstats;
struct net_rate_estimator __rcu *tcfa_rate_est;
spinlock_t  tcfa_lock;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+   struct gnet_stats_basic_cpu __percpu *cpu_bstats_hw;
struct gnet_stats_queue __percpu *cpu_qstats;
struct tc_cookie__rcu *act_cookie;
struct tcf_chain*goto_chain;
@@ -98,7 +100,7 @@ struct tc_action_ops {
struct netlink_callback *, int,
const struct tc_action_ops *,
struct netlink_ext_ack *);
-   void(*stats_update)(struct tc_action *, u64, u32, u64);
+   void(*stats_update)(struct tc_action *, u64, u32, u64, bool);
size_t  (*get_fill_size)(const struct tc_action *act);
struct net_device *(*get_dev)(const struct tc_action *a);
int (*delete)(struct net *net, u32 index);
@@ -189,13 +191,13 @@ int tcf_action_dump(struct sk_buff *skb, struct tc_action 
*actions[], int bind,
 #endif /* CONFIG_NET_CLS_ACT */
 
 static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
-  u64 packets, u64 lastuse)
+  u64 packets, u64 lastuse, bool hw)
 {
 #ifdef CONFIG_NET_CLS_ACT
if (!a->ops->stats_update)
return;
 
-   a->ops->stats_update(a, bytes, packets, lastuse);
+   a->ops->stats_update(a, bytes, packets, lastuse, hw);
 #endif
 }
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ef727f7..de1f06a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -324,7 +324,7 @@ static inline void tcf_exts_to_list(const struct tcf_exts 
*exts,
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];
 
-   tcf_action_stats_update(a, bytes, packets, lastuse);
+   tcf_action_stats_update(a, bytes, packets, lastuse, true);
}
 
preempt_enable();
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 229d63c..9ab3061 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -81,6 +81,7 @@ static void tcf_set_action_cookie(struct tc_cookie __rcu 
**old_cookie,
 static void free_tcf(struct tc_action *p)
 {
free_percpu(p->cpu_bstats);
+   free_percpu(p->cpu_bstats_hw);
free_percpu(p->cpu_qstats);
 
tcf_set_action_cookie(&p->act_cookie, NULL);
@@ -390,9 +391,12 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
p->cpu_bstats = netdev_alloc_pcpu_stats(struct 
gnet_stats_basic_cpu);
if (!p->cpu_bstats)
goto err1;
+   p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct 
gnet_stats_basic_cpu);
+   if (!p->cpu_bstats_hw)
+   goto err2;
p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
if (!p->cpu_qstats)
-   goto err2;
+   goto err3;
}
spin_lock_init(>tcfa_lock);
p->tcfa_index = index;
@@ -404,7 +408,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
&p->tcfa_rate_est,
&p->tcfa_lock, NULL, est);
if (err)
-   goto err3;
+   goto err4;
}
 
p->idrinfo = idrinfo;
@@ -412,8 +416,10 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
INIT_LIST_HEAD(>list);
*a = p;
return 0;
-err3:
+err4:
free_percpu(p->cpu_qstats);
+err3:
+   free_percpu(p->cpu_bstats_hw);
 err2:
free_percpu(p->cpu_bstats);
 err1:
@@ -988,6 +994,8 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct 
tc_action *p,
goto errout;
 
if (gnet_stats_copy_basic(NULL, &d, p->cpu_bstats, &p->tcfa_bstats) < 0 ||
+   gnet_stats_copy_basic_hw(NULL, &d, p->cpu_bstats_hw,
+

[PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-09 Thread Eelco Chaudron
Add hardware specific counters to TC actions which will be exported
through the netlink API. This makes troubleshooting TC flower offload
easier, as it is possible to differentiate the packets being offloaded.

Signed-off-by: Eelco Chaudron 

Eelco Chaudron (2):
  net/core: Add new basic hardware counter
  net/sched: Add hardware specific counters to TC actions


 include/net/act_api.h  |8 +++-
 include/net/gen_stats.h|4 ++
 include/net/pkt_cls.h  |2 +
 include/uapi/linux/gen_stats.h |1 +
 net/core/gen_stats.c   |   73 ++--
 net/sched/act_api.c|   14 ++--
 net/sched/act_gact.c   |6 +++
 net/sched/act_mirred.c |5 ++-
 8 files changed, 85 insertions(+), 28 deletions(-)



[PATCH 1/2] net/core: Add new basic hardware counter

2018-08-09 Thread Eelco Chaudron
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can
be used to count packets/bytes processed by hardware offload.


Signed-off-by: Eelco Chaudron 
---
 include/net/gen_stats.h|4 ++
 include/uapi/linux/gen_stats.h |1 +
 net/core/gen_stats.c   |   73 ++--
 3 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 0304ba2..7e54a9a 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -44,6 +44,10 @@ void __gnet_stats_copy_basic(const seqcount_t *running,
 struct gnet_stats_basic_packed *bstats,
 struct gnet_stats_basic_cpu __percpu *cpu,
 struct gnet_stats_basic_packed *b);
+int gnet_stats_copy_basic_hw(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 struct net_rate_estimator __rcu **ptr);
 int gnet_stats_copy_queue(struct gnet_dump *d,
diff --git a/include/uapi/linux/gen_stats.h b/include/uapi/linux/gen_stats.h
index 24a861c..065408e 100644
--- a/include/uapi/linux/gen_stats.h
+++ b/include/uapi/linux/gen_stats.h
@@ -12,6 +12,7 @@ enum {
TCA_STATS_APP,
TCA_STATS_RATE_EST64,
TCA_STATS_PAD,
+   TCA_STATS_BASIC_HW,
__TCA_STATS_MAX,
 };
 #define TCA_STATS_MAX (__TCA_STATS_MAX - 1)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 188d693..65a2e82 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -162,30 +162,18 @@
 }
 EXPORT_SYMBOL(__gnet_stats_copy_basic);
 
-/**
- * gnet_stats_copy_basic - copy basic statistics into statistic TLV
- * @running: seqcount_t pointer
- * @d: dumping handle
- * @cpu: copy statistic per cpu
- * @b: basic statistics
- *
- * Appends the basic statistics to the top level TLV created by
- * gnet_stats_start_copy().
- *
- * Returns 0 on success or -1 with the statistic lock released
- * if the room in the socket buffer was not sufficient.
- */
 int
-gnet_stats_copy_basic(const seqcount_t *running,
- struct gnet_dump *d,
- struct gnet_stats_basic_cpu __percpu *cpu,
- struct gnet_stats_basic_packed *b)
+___gnet_stats_copy_basic(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b,
+int type)
 {
struct gnet_stats_basic_packed bstats = {0};
 
__gnet_stats_copy_basic(running, &bstats, cpu, b);
 
-   if (d->compat_tc_stats) {
+   if (d->compat_tc_stats && type == TCA_STATS_BASIC) {
d->tc_stats.bytes = bstats.bytes;
d->tc_stats.packets = bstats.packets;
}
@@ -196,14 +184,61 @@
memset(&sb, 0, sizeof(sb));
sb.bytes = bstats.bytes;
sb.packets = bstats.packets;
-   return gnet_stats_copy(d, TCA_STATS_BASIC, &sb, sizeof(sb),
+   return gnet_stats_copy(d, type, &sb, sizeof(sb),
   TCA_STATS_PAD);
}
return 0;
 }
+
+/**
+ * gnet_stats_copy_basic - copy basic statistics into statistic TLV
+ * @running: seqcount_t pointer
+ * @d: dumping handle
+ * @cpu: copy statistic per cpu
+ * @b: basic statistics
+ *
+ * Appends the basic statistics to the top level TLV created by
+ * gnet_stats_start_copy().
+ *
+ * Returns 0 on success or -1 with the statistic lock released
+ * if the room in the socket buffer was not sufficient.
+ */
+int
+gnet_stats_copy_basic(const seqcount_t *running,
+ struct gnet_dump *d,
+ struct gnet_stats_basic_cpu __percpu *cpu,
+ struct gnet_stats_basic_packed *b)
+{
+   return ___gnet_stats_copy_basic(running, d, cpu, b,
+   TCA_STATS_BASIC);
+}
 EXPORT_SYMBOL(gnet_stats_copy_basic);
 
 /**
+ * gnet_stats_copy_basic_hw - copy basic hw statistics into statistic TLV
+ * @running: seqcount_t pointer
+ * @d: dumping handle
+ * @cpu: copy statistic per cpu
+ * @b: basic statistics
+ *
+ * Appends the basic statistics to the top level TLV created by
+ * gnet_stats_start_copy().
+ *
+ * Returns 0 on success or -1 with the statistic lock released
+ * if the room in the socket buffer was not sufficient.
+ */
+int
+gnet_stats_copy_basic_hw(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b)
+{
+   return ___gnet_stats_copy_basic(running, d, cpu, b,
+   TCA_STATS_BASIC_HW);
+}
