date:20180116

[PATCH net-next] net/mlx5: Fix build break

2018-01-16 Thread Saeed Mahameed

The latest merge between net and net-next introduced a compile-time assertion
failure in the mlx5 driver.  In hca_cap_bits, the older fields were kept along
with the newer fields that should have replaced them.

Fixes: c02b3741eb99 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/mlx5_ifc.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 94135c03d52b..acd829d8613b 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1035,8 +1035,6 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 log_max_wq_sz[0x5];
 
u8 nic_vport_change_event[0x1];
-   u8 disable_local_lb[0x1];
-   u8 reserved_at_3e2[0x1];
u8 disable_local_lb_uc[0x1];
u8 disable_local_lb_mc[0x1];
u8 log_min_hairpin_wq_data_sz[0x5];
-- 
2.13.0
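In `mlx5_ifc.h` each `u8 name[N]` member stands for an N-bit device field, so a member's byte offset in the C struct equals its bit offset in the hardware layout. A minimal sketch with hypothetical cut-down structs (not the real `mlx5_ifc_cmd_hca_cap_bits`) shows how the two stale one-bit fields push every later field two bits further along. The exact assert that broke the mlx5 build is not shown in the report, so the C11 `_Static_assert` checks below only illustrate that kind of compile-time check:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/* Hypothetical excerpt after the fix: one byte per device bit. */
struct cap_bits_fixed {
	u8 nic_vport_change_event[0x1];
	u8 disable_local_lb_uc[0x1];
	u8 disable_local_lb_mc[0x1];
	u8 log_min_hairpin_wq_data_sz[0x5];
};

/* Hypothetical excerpt with the stale fields left in by the merge. */
struct cap_bits_broken {
	u8 nic_vport_change_event[0x1];
	u8 disable_local_lb[0x1];
	u8 reserved_at_3e2[0x1];
	u8 disable_local_lb_uc[0x1];
	u8 disable_local_lb_mc[0x1];
	u8 log_min_hairpin_wq_data_sz[0x5];
};

/* u8 arrays have alignment 1, so offsetof() equals the bit offset. */
_Static_assert(offsetof(struct cap_bits_fixed,
			log_min_hairpin_wq_data_sz) == 3,
	       "fixed layout: field starts at bit 3");

/* A check expecting bit 3 would fail here: the field moved to bit 5. */
_Static_assert(offsetof(struct cap_bits_broken,
			log_min_hairpin_wq_data_sz) == 5,
	       "stale fields shift later fields by 2 bits");
```

The fixed excerpt fills exactly one byte (8 bits), while the broken one needs 10, which is the kind of size mismatch a layout check trips over.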

[PATCH iproute2-next] bpf: support map offload

2018-01-16 Thread Jakub Kicinski

When a program is loaded with a specified ifindex, use that
ifindex when creating maps as well.

Signed-off-by: Jakub Kicinski 
---
 lib/bpf.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index d32f1b808180..2db151e4dd3c 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -1208,7 +1208,7 @@ static int bpf_log_realloc(struct bpf_elf_ctx *ctx)
 
 static int bpf_map_create(enum bpf_map_type type, uint32_t size_key,
  uint32_t size_value, uint32_t max_elem,
- uint32_t flags, int inner_fd)
+ uint32_t flags, int inner_fd, uint32_t ifindex)
 {
union bpf_attr attr = {};
 
@@ -1218,6 +1218,7 @@ static int bpf_map_create(enum bpf_map_type type, uint32_t size_key,
attr.max_entries = max_elem;
attr.map_flags = flags;
attr.inner_map_fd = inner_fd;
+   attr.map_ifindex = ifindex;
 
return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
@@ -1632,7 +1633,9 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx,
 
errno = 0;
fd = bpf_map_create(map->type, map->size_key, map->size_value,
-   map->max_elem, map->flags, map_inner_fd);
+   map->max_elem, map->flags, map_inner_fd,
+   ctx->ifindex);
+
if (fd < 0 || ctx->verbose) {
bpf_map_report(fd, name, map, ctx, map_inner_fd);
if (fd < 0)
-- 
2.15.1
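What the patch does for maps of an offloaded program can be sketched in plain user space with the raw bpf(2) syscall. This is a minimal sketch, assuming a `<linux/bpf.h>` new enough to carry the `map_ifindex` field the patch sets; `map_attr_init()` and `map_create()` are hypothetical helpers, not iproute2 functions. A non-zero `map_ifindex` asks the kernel to create the map for that device, and 0 keeps the map on the host:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: fill the BPF_MAP_CREATE part of bpf_attr,
 * including the target device for offload (0 = not offloaded). */
static void map_attr_init(union bpf_attr *attr, enum bpf_map_type type,
			  __u32 key_size, __u32 value_size,
			  __u32 max_entries, __u32 flags, __u32 ifindex)
{
	memset(attr, 0, sizeof(*attr));	/* unused fields must be zero */
	attr->map_type = type;
	attr->key_size = key_size;
	attr->value_size = value_size;
	attr->max_entries = max_entries;
	attr->map_flags = flags;
	attr->map_ifindex = ifindex;
}

/* Hypothetical helper: create the map; returns an fd or -1 (errno set). */
int map_create(const union bpf_attr *attr)
{
	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, attr, sizeof(*attr));
}
```

Calling `map_create()` needs privileges and, for a non-zero ifindex, a device whose driver supports map offload, so the example only exercises the attribute setup.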

Re: WARNING in can_rcv

2018-01-16 Thread Dmitry Vyukov

On Wed, Jan 17, 2018 at 8:12 AM, Eric Biggers  wrote:
> On Wed, Jan 17, 2018 at 07:39:24AM +0100, Oliver Hartkopp wrote:
>>
>>
>> On 01/16/2018 07:11 PM, Dmitry Vyukov wrote:
>> > On Tue, Jan 16, 2018 at 7:07 PM, Marc Kleine-Budde  
>> > wrote:
>> > > On 01/16/2018 06:58 PM, syzbot wrote:
>> > > > Hello,
>> > > >
>> > > > syzkaller hit the following crash on
>> > > > a8750ddca918032d6349adbf9a4b6555e7db20da
>> > > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> > > > compiler: gcc (GCC) 7.1.1 20170620
>> > > > .config is attached
>> > > > Raw console output is attached.
>> > > > C reproducer is attached
>> > > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> > > > for information about syzkaller reproducers
>> > > >
>> > > >
>> > > > IMPORTANT: if you fix the bug, please add the following tag to the 
>> > > > commit:
>> > > > Reported-by: syzbot+4386709c0c1284dca...@syzkaller.appspotmail.com
>> > > > It will help syzbot understand when the bug is fixed. See footer for
>> > > > details.
>> > > > If you forward the report, please keep this part and the footer.
>> > > >
>> > > > device eql entered promiscuous mode
>> > > > [ cut here ]
>> > > > PF_CAN: dropped non conform CAN skbuf: dev type 65534, len 42, datalen 0
>> > > > WARNING: CPU: 0 PID: 3650 at net/can/af_can.c:729 can_rcv+0x1c5/0x200
>> > > > net/can/af_can.c:724
>> > > > Kernel panic - not syncing: panic_on_warn set ...
>> > >
>> > > Invalid packages generate a warning (WARN_ONCE()), and you have
>> > > panic_on_warn active. Should we better silently drop these CAN packages?
>> >
>> > Hi,
>> >
>> > pr_warn_once() will be more appropriate. It prints a single line.
>> >
>>
>> The idea behind this WARN() is to detect really bad things that might have
>> happen on network driver level:
>>
>> The CAN subsystem registers with dev_add_pack() for ETH_P_CAN and
>> ETH_P_CANFD only. These ETH_P_ types are only allowed to be created by CAN
>> network devices (like vcan, vxcan, and real CAN drivers).
>>
>> I don't have any strong opinion on using WARN() or pr_warn_once().
>> Is this detected violation worth using WARN(), as something already must
>> have gone really wrong to trigger this issue?
>>
>
> WARN() indicates a kernel bug.  If it's instead "userspace did something
> stupid", or "someone sent some unexpected network packet", it needs to be
> pr_warn_once(), pr_warn_ratelimited(), or removed entirely.


The packet comes from a tun device. We could change tun to filter out
such packets earlier. However, in the context of the "syzkaller support
for AF_CAN" discussion, it would actually be useful for the fuzzer to be
able to emit CAN packets for testing purposes. For example, for TCP it
does not just emit random packets; it can build complex user<->network
interactions: open a listening socket, connect to it "from outside",
accept the connection, and then exchange some data over the active
connection. It could do the same for CAN.
Is it possible to allow CAN packets via tun? Then we could leave this
WARNING in place. tun/vcan are contained within a net namespace, so
this should not be a security problem, right?
Or is there a way to do the same with vcan? If yes, then the fuzzer could
use vcan. But then we need some fix for this WARNING: either change it
to pr_warn or change tun (I don't have a strong preference which one).
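The logged values explain why the packet is rejected: 65534 is `ARPHRD_NONE`, the device type tun uses, and 42 bytes is not the size of a CAN frame. A minimal user-space sketch of the kind of sanity check behind the message, assuming (as the log line suggests) that it looks at device type, skb length and data length; `can_frame_conforms()` is a hypothetical name and the real condition in `net/can/af_can.c` may differ in detail:

```c
#include <assert.h>
#include <stdbool.h>
#include <net/if_arp.h>		/* ARPHRD_NONE (0xFFFE = 65534) */
#include <linux/can.h>		/* CAN_MTU, CAN_MAX_DLEN */

#ifndef ARPHRD_CAN
#define ARPHRD_CAN 280		/* value from the Linux UAPI <linux/if_arp.h> */
#endif

/* Hypothetical approximation: a classic CAN frame must come from an
 * ARPHRD_CAN device, be exactly CAN_MTU bytes long and carry at most
 * CAN_MAX_DLEN data bytes. */
static bool can_frame_conforms(unsigned int dev_type, unsigned int len,
			       unsigned int datalen)
{
	return dev_type == ARPHRD_CAN && len == CAN_MTU &&
	       datalen <= CAN_MAX_DLEN;
}
```

With the syzbot values (type 65534, len 42, datalen 0) the check fails, which is the path that currently ends in WARN and, with panic_on_warn, a panic.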

Re: [PATCH 32/32] aio: implement io_pgetevents

2018-01-16 Thread Christoph Hellwig

On Tue, Jan 16, 2018 at 07:41:24PM -0500, Jeff Moyer wrote:
> I'd be willing to bet the issue is in your io_syscall6 implementation.
> You pass in arg5 where arg6 should be used.  Don't feel bad, it took me
> the better part of today to figure that out.  :)
> 
> Here's an incremental diff on top of what you've posted.  Feel free to
> fold it into your patch (and format however you like).  You can find the
> libaio changes in my 'aio-poll' branch:
>   https://pagure.io/libaio/commits/aio-poll
> 
> These changes were run through the libaio test harness, 64 bit and 32
> bit, so the compat system call was tested.

Oops, yes.  Although I prefer the copy_from_user version, this is what
I had:


diff --git a/fs/aio.c b/fs/aio.c
index 9fe0a5539596..6c1bbfa9b06a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1984,8 +1984,9 @@ SYSCALL_DEFINE6(io_pgetevents,
long, nr,
struct io_event __user *, events,
struct timespec __user *, timeout,
-   const sigset_t __user *, sigmask)
+   const struct __aio_sigset __user *, usig)
 {
+   struct __aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64   ts;
int ret;
@@ -1993,8 +1994,13 @@ SYSCALL_DEFINE6(io_pgetevents,
if (timeout && unlikely(get_timespec64(&ts, timeout)))
return -EFAULT;
 
-   if (sigmask) {
-   if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
+   if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+   return -EFAULT;
+
+   if (ksig.sigmask) {
+   if (ksig.sigsetsize != sizeof(sigset_t))
+   return -EINVAL;
+   if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
return -EFAULT;
sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
@@ -2002,7 +2008,7 @@ SYSCALL_DEFINE6(io_pgetevents,
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
if (signal_pending(current)) {
-   if (sigmask) {
+   if (ksig.sigmask) {
current->saved_sigmask = sigsaved;
set_restore_sigmask();
}
@@ -2010,7 +2016,7 @@ SYSCALL_DEFINE6(io_pgetevents,
if (!ret)
ret = -ERESTARTNOHAND;
} else {
-   if (sigmask)
+   if (ksig.sigmask)
sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
 
@@ -2036,14 +2042,21 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
return ret;
 }
 
+
+struct __compat_aio_sigset {
+   compat_sigset_t __user  *sigmask;
+   compat_size_t   sigsetsize;
+};
+
 COMPAT_SYSCALL_DEFINE6(io_pgetevents,
compat_aio_context_t, ctx_id,
compat_long_t, min_nr,
compat_long_t, nr,
struct io_event __user *, events,
struct compat_timespec __user *, timeout,
-   const compat_sigset_t __user *, sigmask)
+   const struct __compat_aio_sigset __user *, usig)
 {
+   struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
int ret;
@@ -2051,8 +2064,13 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
if (timeout && compat_get_timespec64(&t, timeout))
return -EFAULT;
 
-   if (sigmask) {
-   if (get_compat_sigset(&ksigmask, sigmask))
+   if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+   return -EFAULT;
+
+   if (ksig.sigmask) {
+   if (ksig.sigsetsize != sizeof(compat_sigset_t))
+   return -EINVAL;
+   if (get_compat_sigset(&ksigmask, ksig.sigmask))
return -EFAULT;
sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
@@ -2060,14 +2078,14 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
if (signal_pending(current)) {
-   if (sigmask) {
+   if (ksig.sigmask) {
current->saved_sigmask = sigsaved;
set_restore_sigmask();
}
if (!ret)
ret = -ERESTARTNOHAND;
} else {
-   if (sigmask)
+   if (ksig.sigmask)
sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
 
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a4cda98073f1..6c04450e961f 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -205,6 +205,7 @@ extern int put_compat_rusage(const struct rusage *,
 struct compat_rusage __user *);
 
 struct compat_siginfo;
+struct __compat_aio_sigset;
 
 extern asmlinkage long compat_sys_waitid(int, compat_pid_t,

RE: [PATCH net-next v2] net: sched: red: don't reset the backlog on every stat dump

2018-01-16 Thread Nogah Frankel

> -Original Message-
> From: Jakub Kicinski [mailto:jakub.kicin...@netronome.com]
> Sent: Monday, January 15, 2018 6:01 AM
> To: da...@davemloft.net; j...@resnulli.us; Nogah Frankel
> 
> Cc: netdev@vger.kernel.org; oss-driv...@netronome.com;
> xiyou.wangc...@gmail.com; eduma...@google.com; Yuval Mintz
> ; Jakub Kicinski 
> Subject: [PATCH net-next v2] net: sched: red: don't reset the backlog on every
> stat dump
> 
> Commit 0dfb33a0d7e2 ("sch_red: report backlog information") copied
> child's backlog into RED's backlog.  Back then RED did not maintain
> its own backlog counts.  This has changed after commit 2f5fb43f
> ("net_sched: update hierarchical backlog too") and commit d7f4f332f082
> ("sch_red: update backlog as well").  Copying is no longer necessary.
> 
> Tested:
> 
> $ tc -s qdisc show dev veth0
> qdisc red 1: root refcnt 2 limit 40b min 3b max 3b ecn
>  Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 1260b 14p requeues 14
>   marked 0 early 0 pdrop 0 other 0
> qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
>  Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
>  backlog 1260b 14p requeues 14
> 
> Recently RED offload was added.  We need to make sure drivers don't
> depend on resetting the stats.  This means backlog should be treated
> like any other statistic:
> 
>   total_stat = new_hw_stat - prev_hw_stat;
> 
> Adjust mlxsw.
> 
> Signed-off-by: Jakub Kicinski 

Acked-by: Nogah Frankel 

Thanks
Nogah

> ---
> v2:
>  - reuse the mlxsw infra added for prio;
>  - align the way qstats are passed with prio.
> 
>  .../net/ethernet/mellanox/mlxsw/spectrum_qdisc.c   | 26 +++---
>  include/net/pkt_cls.h  |  1 +
>  net/sched/sch_red.c|  2 +-
>  3 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> index e11a0abfc663..8cac5202b913 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> @@ -247,6 +247,8 @@ mlxsw_sp_setup_tc_qdisc_red_clean_stats(struct mlxsw_sp_port *mlxsw_sp_port,
> 
>   stats_base->overlimits = red_base->prob_drop + red_base->prob_mark;
>   stats_base->drops = red_base->prob_drop + red_base->pdrop;
> +
> + stats_base->backlog = 0;
>  }
> 
>  static int
> @@ -306,6 +308,19 @@ mlxsw_sp_qdisc_red_replace(struct mlxsw_sp_port *mlxsw_sp_port,
>max, prob, p->is_ecn);
>  }
> 
> +static void
> +mlxsw_sp_qdisc_red_unoffload(struct mlxsw_sp_port *mlxsw_sp_port,
> +  struct mlxsw_sp_qdisc *mlxsw_sp_qdisc,
> +  void *params)
> +{
> + struct tc_red_qopt_offload_params *p = params;
> + u64 backlog;
> +
> + backlog = mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp,
> +mlxsw_sp_qdisc->stats_base.backlog);
> + p->qstats->backlog -= backlog;
> +}
> +
>  static int
>  mlxsw_sp_qdisc_get_red_xstats(struct mlxsw_sp_port *mlxsw_sp_port,
> struct mlxsw_sp_qdisc *mlxsw_sp_qdisc,
> @@ -338,7 +353,7 @@ mlxsw_sp_qdisc_get_red_stats(struct mlxsw_sp_port *mlxsw_sp_port,
>struct mlxsw_sp_qdisc *mlxsw_sp_qdisc,
>struct tc_qopt_offload_stats *stats_ptr)
>  {
> - u64 tx_bytes, tx_packets, overlimits, drops;
> + u64 tx_bytes, tx_packets, overlimits, drops, backlog;
>   u8 tclass_num = mlxsw_sp_qdisc->tclass_num;
>   struct mlxsw_sp_qdisc_stats *stats_base;
>   struct mlxsw_sp_port_xstats *xstats;
> @@ -354,14 +369,18 @@ mlxsw_sp_qdisc_get_red_stats(struct mlxsw_sp_port *mlxsw_sp_port,
>stats_base->overlimits;
>   drops = xstats->wred_drop[tclass_num] + xstats->tail_drop[tclass_num] -
>   stats_base->drops;
> + backlog = xstats->backlog[tclass_num];
> 
>   _bstats_update(stats_ptr->bstats, tx_bytes, tx_packets);
>   stats_ptr->qstats->overlimits += overlimits;
>   stats_ptr->qstats->drops += drops;
>   stats_ptr->qstats->backlog +=
> - mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp,
> -  xstats->backlog[tclass_num]);
> + mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp,
> +  backlog) -
> + mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp,
> +  stats_base->backlog);
> 
> + stats_base->backlog = backlog;
>   stats_base->drops +=  drops;
>   stats_base->overlimits += overlimits;
>   stats_base->tx_bytes +=
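The commit message's rule `total_stat = new_hw_stat - prev_hw_stat`, applied to backlog, can be sketched in plain C. The names are hypothetical and the cell-to-byte conversion done by `mlxsw_sp_cells_bytes()` is left out. In this simplified model, where the hardware is the only contributor, adding just the delta keeps the reported backlog equal to the latest hardware reading even though RED no longer resets it on every dump:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-qdisc snapshot of the last hardware reading. */
struct hw_stat_base {
	uint64_t backlog;
};

/* Fold a fresh hardware backlog reading into the software counter by
 * adding only the change since the previous reading, then remember the
 * reading for the next dump.  When the backlog shrinks, the unsigned
 * subtraction wraps, and adding the wrapped value wraps back, which is
 * the intended modular arithmetic. */
static void fold_backlog(uint64_t *sw_backlog, struct hw_stat_base *base,
			 uint64_t new_hw_backlog)
{
	*sw_backlog += new_hw_backlog - base->backlog;
	base->backlog = new_hw_backlog;
}
```

Resetting `base->backlog` to 0 when the offload is (re)installed, as the patch does in the clean-stats path, makes the first dump after that add the full hardware value.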

[PATCH net-next] cxgb4: IPv6 filter takes 2 tids

2018-01-16 Thread Ganesh Goudar

On T6, an IPv6 filter occupies 2 tids instead of 4.

Signed-off-by: Kumar Sanghvi 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 113 +++---
 1 file changed, 80 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 677a3ba..3177b0c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -439,19 +439,32 @@ int cxgb4_get_free_ftid(struct net_device *dev, int family)
if (ftid >= t->nftids)
ftid = -1;
} else {
-   ftid = bitmap_find_free_region(t->ftid_bmap, t->nftids, 2);
-   if (ftid < 0)
-   goto out_unlock;
+   if (is_t6(adap->params.chip)) {
+   ftid = bitmap_find_free_region(t->ftid_bmap,
+  t->nftids, 1);
+   if (ftid < 0)
+   goto out_unlock;
+
+   /* this is only a lookup, keep the found region
+* unallocated
+*/
+   bitmap_release_region(t->ftid_bmap, ftid, 1);
+   } else {
+   ftid = bitmap_find_free_region(t->ftid_bmap,
+  t->nftids, 2);
+   if (ftid < 0)
+   goto out_unlock;
 
-   /* this is only a lookup, keep the found region unallocated */
-   bitmap_release_region(t->ftid_bmap, ftid, 2);
+   bitmap_release_region(t->ftid_bmap, ftid, 2);
+   }
}
 out_unlock:
spin_unlock_bh(&t->ftid_lock);
return ftid;
 }
 
-static int cxgb4_set_ftid(struct tid_info *t, int fidx, int family)
+static int cxgb4_set_ftid(struct tid_info *t, int fidx, int family,
+ unsigned int chip_ver)
 {
spin_lock_bh(&t->ftid_lock);
 
@@ -460,22 +473,31 @@ static int cxgb4_set_ftid(struct tid_info *t, int fidx, int family)
return -EBUSY;
}
 
-   if (family == PF_INET)
+   if (family == PF_INET) {
__set_bit(fidx, t->ftid_bmap);
-   else
-   bitmap_allocate_region(t->ftid_bmap, fidx, 2);
+   } else {
+   if (chip_ver < CHELSIO_T6)
+   bitmap_allocate_region(t->ftid_bmap, fidx, 2);
+   else
+   bitmap_allocate_region(t->ftid_bmap, fidx, 1);
+   }
 
spin_unlock_bh(&t->ftid_lock);
return 0;
 }
 
-static void cxgb4_clear_ftid(struct tid_info *t, int fidx, int family)
+static void cxgb4_clear_ftid(struct tid_info *t, int fidx, int family,
+unsigned int chip_ver)
 {
spin_lock_bh(&t->ftid_lock);
-   if (family == PF_INET)
+   if (family == PF_INET) {
__clear_bit(fidx, t->ftid_bmap);
-   else
-   bitmap_release_region(t->ftid_bmap, fidx, 2);
+   } else {
+   if (chip_ver < CHELSIO_T6)
+   bitmap_release_region(t->ftid_bmap, fidx, 2);
+   else
+   bitmap_release_region(t->ftid_bmap, fidx, 1);
+   }
spin_unlock_bh(&t->ftid_lock);
 }
 
@@ -1249,23 +1271,42 @@ int __cxgb4_set_filter(struct net_device *dev, int filter_id,
}
}
} else { /* IPv6 */
-   /* Ensure that the IPv6 filter is aligned on a
-* multiple of 4 boundary.
-*/
-   if (filter_id & 0x3) {
-   dev_err(adapter->pdev_dev,
-   "Invalid location. IPv6 must be aligned on a 4-slot boundary\n");
-   return -EINVAL;
-   }
+   if (chip_ver < CHELSIO_T6) {
+   /* Ensure that the IPv6 filter is aligned on a
+* multiple of 4 boundary.
+*/
+   if (filter_id & 0x3) {
+   dev_err(adapter->pdev_dev,
+   "Invalid location. IPv6 must be aligned on a 4-slot boundary\n");
+   return -EINVAL;
+   }
 
-   /* Check all except the base overlapping IPv4 filter slots. */
-   for (fidx = filter_id + 1; fidx < filter_id + 4; fidx++) {
+   /* Check all except the base overlapping IPv4 filter
+* slots.
+*/
+   for (fidx = filter_id + 1; fidx < filter_id + 4;
+fidx++) {
+   f = &adapter->tids.ftid_tab[fidx];
+   if (f->valid) {
+
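The patch hinges on the `order` argument of `bitmap_find_free_region()` and `bitmap_allocate_region()`: a region of order n covers `1 << n` slots aligned to that size, so order 2 gives the 4 slots an IPv6 filter used before T6, and order 1 gives the 2 slots it uses on T6. A user-space sketch with a hypothetical `find_free_region()` standing in for the kernel helper (it allocates the region it finds; the driver's lookup path releases it again right away):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for bitmap_find_free_region(): scan naturally
 * aligned blocks of (1 << order) slots and claim the first fully free
 * one.  Returns the first slot index, or -1 if none is free. */
static int find_free_region(bool *slots, int nslots, int order)
{
	int size = 1 << order;

	for (int pos = 0; pos + size <= nslots; pos += size) {
		bool is_free = true;

		for (int i = 0; i < size; i++) {
			if (slots[pos + i]) {
				is_free = false;
				break;
			}
		}
		if (is_free) {
			for (int i = 0; i < size; i++)
				slots[pos + i] = true;
			return pos;
		}
	}
	return -1;
}

/* Region order for an IPv6 filter: 4 slots before T6, 2 slots on T6. */
static int ipv6_filter_order(bool is_t6)
{
	return is_t6 ? 1 : 2;
}
```

With slot 0 already taken by an IPv4 filter, a T6 IPv6 filter lands at slot 2, while an older chip has to skip to slot 4.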

Re: net merged into net-next

2018-01-16 Thread Saeed Mahameed


Dave,

The resolution of the mlx5_ifc conflict was wrong and it causes a build break 
in mlx5, oops :(.

I hope my resolution instructions in my pull request didn't mislead you.

I will post a patch.

-Saeed.

[PATCH net-next 1/2] cxgb4: update dump collection logic to use compression

2018-01-16 Thread Rahul Lakkireddy

Update the firmware dump collection logic to use compression when available.
Let the collection logic attempt compression instead of returning an
out-of-memory error early.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Vishal Kulkarni 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c  |  24 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h  |   3 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 280 +++--
 .../net/ethernet/chelsio/cxgb4/cudbg_lib_common.h  |   7 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h|  27 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c   |  13 +-
 6 files changed, 207 insertions(+), 147 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c
index f78ba1743b5a..8edc49827af0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c
@@ -19,7 +19,8 @@
 #include "cudbg_if.h"
 #include "cudbg_lib_common.h"
 
-int cudbg_get_buff(struct cudbg_buffer *pdbg_buff, u32 size,
+int cudbg_get_buff(struct cudbg_init *pdbg_init,
+  struct cudbg_buffer *pdbg_buff, u32 size,
   struct cudbg_buffer *pin_buff)
 {
u32 offset;
@@ -28,17 +29,30 @@ int cudbg_get_buff(struct cudbg_buffer *pdbg_buff, u32 size,
if (offset + size > pdbg_buff->size)
return CUDBG_STATUS_NO_MEM;
 
+   if (pdbg_init->compress_type != CUDBG_COMPRESSION_NONE) {
+   if (size > pdbg_init->compress_buff_size)
+   return CUDBG_STATUS_NO_MEM;
+
+   pin_buff->data = (char *)pdbg_init->compress_buff;
+   pin_buff->offset = 0;
+   pin_buff->size = size;
+   return 0;
+   }
+
pin_buff->data = (char *)pdbg_buff->data + offset;
pin_buff->offset = offset;
pin_buff->size = size;
-   pdbg_buff->size -= size;
return 0;
 }
 
-void cudbg_put_buff(struct cudbg_buffer *pin_buff,
-   struct cudbg_buffer *pdbg_buff)
+void cudbg_put_buff(struct cudbg_init *pdbg_init,
+   struct cudbg_buffer *pin_buff)
 {
-   pdbg_buff->size += pin_buff->size;
+   /* Clear compression buffer for re-use */
+   if (pdbg_init->compress_type != CUDBG_COMPRESSION_NONE)
+   memset(pdbg_init->compress_buff, 0,
+  pdbg_init->compress_buff_size);
+
pin_buff->data = NULL;
pin_buff->offset = 0;
pin_buff->size = 0;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 88e740082a02..eb1d2f48ebd3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -87,6 +87,9 @@ struct cudbg_init {
struct adapter *adap; /* Pointer to adapter structure */
void *outbuf; /* Output buffer */
u32 outbuf_size;  /* Output buffer size */
+   u8 compress_type; /* Type of compression to use */
+   void *compress_buff; /* Compression buffer */
+   u32 compress_buff_size; /* Compression buffer size */
 };
 
 static inline unsigned int cudbg_mbytes_to_bytes(unsigned int size)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index 0a3871f10787..8b95117c2923 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -23,12 +23,57 @@
 #include "cudbg_lib_common.h"
 #include "cudbg_entity.h"
 #include "cudbg_lib.h"
+#include "cudbg_zlib.h"
 
-static void cudbg_write_and_release_buff(struct cudbg_buffer *pin_buff,
-struct cudbg_buffer *dbg_buff)
+static int cudbg_do_compression(struct cudbg_init *pdbg_init,
+   struct cudbg_buffer *pin_buff,
+   struct cudbg_buffer *dbg_buff)
 {
-   cudbg_update_buff(pin_buff, dbg_buff);
-   cudbg_put_buff(pin_buff, dbg_buff);
+   struct cudbg_buffer temp_in_buff = { 0 };
+   int bytes_left, bytes_read, bytes;
+   u32 offset = dbg_buff->offset;
+   int rc;
+
+   temp_in_buff.offset = pin_buff->offset;
+   temp_in_buff.data = pin_buff->data;
+   temp_in_buff.size = pin_buff->size;
+
+   bytes_left = pin_buff->size;
+   bytes_read = 0;
+   while (bytes_left > 0) {
+   /* Do compression in smaller chunks */
+   bytes = min_t(unsigned long, bytes_left,
+ (unsigned long)CUDBG_CHUNK_SIZE);
+   temp_in_buff.data = (char *)pin_buff->data + bytes_read;
+   temp_in_buff.size = bytes;
+   rc = cudbg_compress_buff(pdbg_init, &temp_in_buff, dbg_buff);
+   if (rc)
+
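The loop in `cudbg_do_compression()` hands the compressor at most `CUDBG_CHUNK_SIZE` bytes per iteration, and each `cudbg_compress_buff()` call writes its own compression header, so one collected entity becomes a series of independently compressed chunks. The chunking arithmetic alone, as a sketch with a hypothetical `plan_chunks()` helper (the real chunk size and compressor are not reproduced here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: record the chunk sizes a loop of the form
 * bytes = min(bytes_left, chunk_size) would produce for total_len
 * bytes.  Returns the number of chunks, or -1 if chunk_size is 0 or
 * sizes[] cannot hold them all. */
static int plan_chunks(size_t total_len, size_t chunk_size,
		       size_t *sizes, size_t max_chunks)
{
	size_t bytes_left = total_len;
	size_t n = 0;

	if (chunk_size == 0)
		return -1;		/* would never make progress */

	while (bytes_left > 0) {
		size_t bytes = bytes_left < chunk_size ? bytes_left : chunk_size;

		if (n == max_chunks)
			return -1;
		sizes[n++] = bytes;
		bytes_left -= bytes;
	}
	return (int)n;
}
```

Only the final chunk can be shorter than the chunk size; every other chunk is full.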

[PATCH net-next 2/2] cxgb4: use zlib deflate to compress firmware dump

2018-01-16 Thread Rahul Lakkireddy

Use zlib deflate to compress the firmware dump. Collect and compress
as much of the firmware dump as possible into a 32 MB buffer.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Vishal Kulkarni 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/Makefile|  1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h  |  1 +
 .../net/ethernet/chelsio/cxgb4/cudbg_lib_common.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c| 81 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h| 29 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c   | 56 ++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h   |  3 +
 7 files changed, 169 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c

diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index 8c9c6b0d2e5d..5df923798669 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -12,3 +12,4 @@ cxgb4-objs := cxgb4_main.o l2t.o smt.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
 cxgb4-$(CONFIG_CHELSIO_T4_FCOE) +=  cxgb4_fcoe.o
 cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
+cxgb4-$(CONFIG_ZLIB_DEFLATE) += cudbg_zlib.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index eb1d2f48ebd3..8568a51f6414 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -90,6 +90,7 @@ struct cudbg_init {
u8 compress_type; /* Type of compression to use */
void *compress_buff; /* Compression buffer */
u32 compress_buff_size; /* Compression buffer size */
+   void *workspace; /* Workspace for zlib */
 };
 
 static inline unsigned int cudbg_mbytes_to_bytes(unsigned int size)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h
index 2e1c8e87c9bd..8150ea85d6a5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h
@@ -26,6 +26,7 @@ enum cudbg_dump_type {
 
 enum cudbg_compression_type {
CUDBG_COMPRESSION_NONE = 1,
+   CUDBG_COMPRESSION_ZLIB,
 };
 
 struct cudbg_hdr {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c
new file mode 100644
index ..4c3854cbeb6c
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c
@@ -0,0 +1,81 @@
+/*
+ *  Copyright (C) 2018 Chelsio Communications.  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms and conditions of the GNU General Public License,
+ *  version 2, as published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ *  more details.
+ *
+ *  The full GNU General Public License is included in this distribution in
+ *  the file called "COPYING".
+ *
+ */
+
+#include 
+
+#include "cxgb4.h"
+#include "cudbg_if.h"
+#include "cudbg_lib_common.h"
+#include "cudbg_zlib.h"
+
+static int cudbg_get_compress_hdr(struct cudbg_buffer *pdbg_buff,
+ struct cudbg_buffer *pin_buff)
+{
+   if (pdbg_buff->offset + sizeof(struct cudbg_compress_hdr) >
+   pdbg_buff->size)
+   return CUDBG_STATUS_NO_MEM;
+
+   pin_buff->data = (char *)pdbg_buff->data + pdbg_buff->offset;
+   pin_buff->offset = 0;
+   pin_buff->size = sizeof(struct cudbg_compress_hdr);
+   pdbg_buff->offset += sizeof(struct cudbg_compress_hdr);
+   return 0;
+}
+
+int cudbg_compress_buff(struct cudbg_init *pdbg_init,
+   struct cudbg_buffer *pin_buff,
+   struct cudbg_buffer *pout_buff)
+{
+   struct z_stream_s compress_stream = { 0 };
+   struct cudbg_buffer temp_buff = { 0 };
+   struct cudbg_compress_hdr *c_hdr;
+   int rc;
+
+   /* Write compression header to output buffer before compression */
+   rc = cudbg_get_compress_hdr(pout_buff, &temp_buff);
+   if (rc)
+   return rc;
+
+   c_hdr = (struct cudbg_compress_hdr *)temp_buff.data;
+   c_hdr->compress_id = CUDBG_ZLIB_COMPRESS_ID;
+
+   compress_stream.workspace = pdbg_init->workspace;
+   rc = zlib_deflateInit2(&compress_stream, Z_DEFAULT_COMPRESSION,
+  Z_DEFLATED, CUDBG_ZLIB_WIN_BITS,
+  CUDBG_ZLIB_MEM_LVL, Z_DEFAULT_STRATEGY);
+   if (rc != Z_OK)
+   return CUDBG_SYSTEM_ERROR;
+
+   compress_stream.next_in = pin_buff->data;
+
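Each `cudbg_compress_buff()` call first reserves room for a `struct cudbg_compress_hdr` at the current output offset, so a full output buffer is reported as `CUDBG_STATUS_NO_MEM` before deflate is even initialised. The reservation step on its own, sketched with a hypothetical `struct out_buf` in place of `struct cudbg_buffer` and a hypothetical `reserve_hdr()` in place of `cudbg_get_compress_hdr()`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cudbg_buffer. */
struct out_buf {
	char *data;
	size_t size;
	size_t offset;		/* bytes already written */
};

/* Reserve room for a fixed-size header at the current write position:
 * fail without touching the buffer if it would overflow, otherwise
 * return a pointer to the slot and advance the write offset past it. */
static char *reserve_hdr(struct out_buf *out, size_t hdr_size)
{
	char *slot;

	if (out->offset + hdr_size > out->size)
		return NULL;
	slot = out->data + out->offset;
	out->offset += hdr_size;
	return slot;
}
```

The caller fills in the header through the returned pointer and then appends the compressed bytes after it.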

[PATCH net-next 0/2] cxgb4: reduce memory footprint for collecting firmware dump

2018-01-16 Thread Rahul Lakkireddy

The firmware dump can be large (up to 2 GB).  In low-memory conditions,
ethtool fails to allocate that much memory.  So, use zlib deflate
to compress the collected firmware dump.

Patch 1 updates collection logic to use compression.

Patch 2 adds zlib deflate to compress collected firmware dump.

Thanks,
Rahul

Rahul Lakkireddy (2):
  cxgb4: update dump collection logic to use compression
  cxgb4: use zlib deflate to compress firmware dump

 drivers/net/ethernet/chelsio/cxgb4/Makefile|   1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c  |  24 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h  |   4 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 280 +++--
 .../net/ethernet/chelsio/cxgb4/cudbg_lib_common.h  |   8 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c|  81 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h|  56 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c   |  65 -
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h   |   3 +
 9 files changed, 374 insertions(+), 148 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h

-- 
2.14.1

Re: DPAA Ethernet traffic troubles with Linux kernel

2018-01-16 Thread Christian Zigotzky

Hi Skateman,

Fantastic! Many thanks for testing the RC8 of kernel 4.15 without PAMU support.

@All
Further information: 
http://forum.hyperion-entertainment.biz/viewtopic.php?f=58=43706#p43706

Cheers,
Christian

On 16. Jan 2018, at 23:05, mad skateman  wrote:

Fantastic Christian.. 

Your latest kernel makes the NIC work!!!

Few tweaks to be done... like the buffer space 

Brilliant!


On 16 January 2018 at 9:42PM, Christian Zigotzky wrote:
Hi All,

I compiled the RC8 of kernel 4.15 for the X5000 without PAMU support today.

Download: http://www.xenosoft.de/uImage_without_pamu.tar.gz

Please test it on your AmigaOne X5000.

Thanks,
Christian


On 16 January 2018 at 6:33PM, Madalin-cristian Bucur wrote:
The PAMU-related errors may be relevant to the issue: if you have incorrect
settings, you may have no traffic passing through. The PAMU configuration
should be made by the bootloader. Can you try to disable CONFIG_FSL_PAMU?

Madalin

Re: WARNING in can_rcv

2018-01-16 Thread Eric Biggers

On Wed, Jan 17, 2018 at 07:39:24AM +0100, Oliver Hartkopp wrote:
> 
> 
> On 01/16/2018 07:11 PM, Dmitry Vyukov wrote:
> > On Tue, Jan 16, 2018 at 7:07 PM, Marc Kleine-Budde  
> > wrote:
> > > On 01/16/2018 06:58 PM, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzkaller hit the following crash on
> > > > a8750ddca918032d6349adbf9a4b6555e7db20da
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > > > compiler: gcc (GCC) 7.1.1 20170620
> > > > .config is attached
> > > > Raw console output is attached.
> > > > C reproducer is attached
> > > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > > > for information about syzkaller reproducers
> > > > 
> > > > 
> > > > IMPORTANT: if you fix the bug, please add the following tag to the 
> > > > commit:
> > > > Reported-by: syzbot+4386709c0c1284dca...@syzkaller.appspotmail.com
> > > > It will help syzbot understand when the bug is fixed. See footer for
> > > > details.
> > > > If you forward the report, please keep this part and the footer.
> > > > 
> > > > device eql entered promiscuous mode
> > > > [ cut here ]
> > > > PF_CAN: dropped non conform CAN skbuf: dev type 65534, len 42, datalen 0
> > > > WARNING: CPU: 0 PID: 3650 at net/can/af_can.c:729 can_rcv+0x1c5/0x200
> > > > net/can/af_can.c:724
> > > > Kernel panic - not syncing: panic_on_warn set ...
> > > 
> > > Invalid packages generate a warning (WARN_ONCE()), and you have
> > > panic_on_warn active. Should we better silently drop these CAN packages?
> > 
> > Hi,
> > 
> > pr_warn_once() will be more appropriate. It prints a single line.
> > 
> 
> The idea behind this WARN() is to detect really bad things that might have
> happen on network driver level:
> 
> The CAN subsystem registers with dev_add_pack() for ETH_P_CAN and
> ETH_P_CANFD only. These ETH_P_ types are only allowed to be created by CAN
> network devices (like vcan, vxcan, and real CAN drivers).
> 
> I don't have any strong opinion on using WARN() or pr_warn_once().
> Is this detected violation worth using WARN(), as something already must
> have gone really wrong to trigger this issue?
> 

WARN() indicates a kernel bug.  If it's instead "userspace did something
stupid", or "someone sent some unexpected network packet", it needs to be
pr_warn_once(), pr_warn_ratelimited(), or removed entirely.

Eric

Re: [PATCH 32/32] aio: implement io_pgetevents

2018-01-16 Thread Christoph Hellwig

On Wed, Jan 17, 2018 at 04:27:21AM +, Al Viro wrote:
> On Tue, Jan 16, 2018 at 07:41:24PM -0500, Jeff Moyer wrote:
> > if (sigmask) {
> > -   if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
> > +   if (!access_ok(VERIFY_READ, sigmask,
> > +  sizeof(void *) + sizeof(size_t)) ||
> > +   __get_user(up, (sigset_t __user * __user *)sigmask) ||
> > +   __get_user(sigsetsize,
> > +  (size_t __user *)(sigmask + sizeof(void *
> > return -EFAULT;
> 
> How about copy_from_user() on a struct?  Making eyes bleed is fun, but
> people tend to get annoyed when you do it to them...

Above is the copy & paste version from pselect.  I've got both copy_from_user
and that horrible version in my tree, and if we really need this awful
calling convention copy_from_user certainly is much better.  pselect
also should be switched to explicit struct + copy_from_user while
we're at it.  In fact glibc defines a struct for the userland version
to start with.
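Al's suggestion, sketched in userspace: describe the {sigmask pointer, sigset size} pair as one struct and fetch it with a single copy, instead of access_ok() plus two __get_user() calls. Everything here is an assumption for illustration: the struct and helper names are hypothetical, sigset_model_t stands in for the kernel's sigset_t, and copy_from_user_model() is a plain memcpy that only mimics copy_from_user()'s "returns bytes not copied" convention (it does no fault handling).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's sigset_t (a bitmap of signals). */
typedef struct {
	unsigned long sig[1];
} sigset_model_t;

/* Hypothetical argument block: pointer and size travel together. */
struct aio_sigset_arg {
	const sigset_model_t *sigmask;
	size_t sigsetsize;
};

/* Mimics copy_from_user()'s convention: returns the number of bytes
 * NOT copied, 0 on success. No real fault handling is done. */
static size_t copy_from_user_model(void *dst, const void *src, size_t n)
{
	if (!src)
		return n;
	memcpy(dst, src, n);
	return 0;
}

/* One copy for the whole struct, then validate the advertised size. */
static int fetch_sigset_arg(const struct aio_sigset_arg *uarg,
			    const sigset_model_t **mask, size_t *size)
{
	struct aio_sigset_arg ksig;

	if (copy_from_user_model(&ksig, uarg, sizeof(ksig)))
		return -EFAULT;
	if (ksig.sigmask && ksig.sigsetsize != sizeof(sigset_model_t))
		return -EINVAL;
	*mask = ksig.sigmask;
	*size = ksig.sigsetsize;
	return 0;
}
```

Because the layout is a named struct, it can be shared with userspace; as Christoph notes, glibc already defines one for the userland side, and pselect could be converted the same way.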

Re: [PATCH net-next] net: stmmac: Fix reception of Broadcom switches tags

2018-01-16 Thread Giuseppe CAVALLARO


Hi Florian

For gmac4.x and gmac3.x series the ACS bit is Automatic Pad or CRC
Stripping, so the core strips the Pad or FCS on frames if the value of
the length field is < 1536 bytes. For MAC10-100, Bit 8 (ASTP) of reg0
does the same if len is < 46 bytes.

In your patch, I would just suggest adding a new field to strip the
PAD/FCS without passing the whole netdev struct to core_init. In the main
driver, we could manage the pad-strip feature (also via DT) or disable it
when netdev_uses_dsa() is true, then propagate this setting to core_init
or call a new callback. What do you think?

Regards
Peppe
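Why the Broadcom tag trips ACS: a receiver classifies bytes 12-13 of a frame as an EtherType when the value is at least 1536 (0x0600) and as an IEEE 802.3 length otherwise, and Peppe notes that ACS only strips pad/FCS for frames in the latter class. With a 4-byte tag inserted after the source MAC, those two bytes belong to the tag instead. The sketch below models only that classification; the function names are hypothetical and the all-zero tag bytes in the example are illustrative, not a description of the actual Broadcom tag encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_ALEN     6
#define MODEL_ETH_TYPE_MIN 0x0600 /* 1536: smallest value read as EtherType */

/* The 16-bit big-endian field right after the two MAC addresses. */
static uint16_t type_or_length(const uint8_t *frame)
{
	return (uint16_t)((frame[2 * MODEL_ETH_ALEN] << 8) |
			  frame[2 * MODEL_ETH_ALEN + 1]);
}

/* True if the MAC would treat the frame as 802.3 length-encoded, i.e. a
 * frame that ACS-style pad/FCS stripping acts on. */
static bool looks_length_encoded(const uint8_t *frame, size_t len)
{
	if (len < 2 * MODEL_ETH_ALEN + 2)
		return false;
	return type_or_length(frame) < MODEL_ETH_TYPE_MIN;
}
```

An apparent length of 0 is consistent with the "0 length LLC/SNAP packet" decoding that Florian reports from tcpdump and Wireshark.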

On 1/17/2018 12:25 AM, Florian Fainelli wrote:

Broadcom tags inserted by Broadcom switches put a 4 byte header after
the MAC SA and before the EtherType, which may look like some sort of 0
length LLC/SNAP packet (tcpdump and wireshark do think that way). With
ACS enabled in stmmac the packets were truncated to 8 bytes on
reception, whereas clearing this bit allowed normal reception to occur.

In order to make that possible, we need to pass a net_device argument to
the different core_init() functions and we are dependent on the Broadcom
tagger padding packets correctly (which it now does). To keep the change
minimally invasive, this is only done for gmac1000 when the network
device is DSA-enabled (netdev_uses_dsa() returns true).

Signed-off-by: Florian Fainelli 
---
  drivers/net/ethernet/stmicro/stmmac/common.h |  2 +-
  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c|  3 ++-
  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 12 +++-
  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  |  3 ++-
  drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c| 11 ++-
  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c|  2 +-
  6 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index ce2ea2d491ac..2ffe76c0ff74 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -474,7 +474,7 @@ struct mac_device_info;
  /* Helpers to program the MAC core */
  struct stmmac_ops {
/* MAC core initialization */
-   void (*core_init)(struct mac_device_info *hw, int mtu);
+   void (*core_init)(struct mac_device_info *hw, struct net_device *dev);
/* Enable the MAC RX/TX */
void (*set_mac)(void __iomem *ioaddr, bool enable);
/* Enable and verify that the IPC module is supported */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index 9eb7f65d8000..a3fa65b1ca8e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -483,7 +483,8 @@ static int sun8i_dwmac_init(struct platform_device *pdev, void *priv)
return 0;
  }
  
-static void sun8i_dwmac_core_init(struct mac_device_info *hw, int mtu)
+static void sun8i_dwmac_core_init(struct mac_device_info *hw,
+ struct net_device *dev)
  {
void __iomem *ioaddr = hw->pcsr;
u32 v;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index 8a86340ff2d3..540d21786a43 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -25,18 +25,28 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include "stmmac_pcs.h"
  #include "dwmac1000.h"
  
-static void dwmac1000_core_init(struct mac_device_info *hw, int mtu)
+static void dwmac1000_core_init(struct mac_device_info *hw,
+   struct net_device *dev)
  {
void __iomem *ioaddr = hw->pcsr;
u32 value = readl(ioaddr + GMAC_CONTROL);
+   int mtu = dev->mtu;
  
  	/* Configure GMAC core */
value |= GMAC_CORE_INIT;
  
+	/* Clear ACS bit because Ethernet switch tagging formats such as
+* Broadcom tags can look like invalid LLC/SNAP packets and cause the
+* hardware to truncate packets on reception.
+*/
+   if (netdev_uses_dsa(dev))
+   value &= ~GMAC_CONTROL_ACS;
+
if (mtu > 1500)
value |= GMAC_CONTROL_2K;
if (mtu > 2000)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 8ef517356313..c1ee427c42cb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -28,7 +28,8 @@
  #include 
  #include "dwmac100.h"
  
-static void dwmac100_core_init(struct mac_device_info *hw, int mtu)
+static void dwmac100_core_init(struct mac_device_info *hw,
+  struct net_device *dev)
  {
void __iomem *ioaddr = hw->pcsr;
u32 value = readl(ioaddr
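Peppe's alternative can be modeled as a pure function: the main driver decides whether pad/FCS stripping is wanted (for example, not when netdev_uses_dsa() is true, or per a DT property) and core_init() receives only that flag and the MTU instead of the whole net_device. The function name and parameters are hypothetical, the bit values are placeholders standing in for GMAC_CONTROL_ACS and GMAC_CONTROL_2K from dwmac1000.h, and the mtu > 2000 branch is left out because the quoted hunk is cut off there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits; the real values live in dwmac1000.h. */
#define MODEL_GMAC_CONTROL_ACS (1u << 7)  /* automatic pad/CRC stripping */
#define MODEL_GMAC_CONTROL_2K  (1u << 27) /* accept frames up to 2K */

/* Hypothetical core_init() helper: the caller supplies a strip flag
 * (e.g. false when netdev_uses_dsa() is true) instead of a net_device. */
static uint32_t gmac_control_value(uint32_t value, int mtu,
				   bool strip_pad_fcs)
{
	if (strip_pad_fcs)
		value |= MODEL_GMAC_CONTROL_ACS;
	else
		value &= ~MODEL_GMAC_CONTROL_ACS;

	if (mtu > 1500)
		value |= MODEL_GMAC_CONTROL_2K;
	return value;
}
```

Unlike the patch, which only clears ACS in the DSA case, this sketch sets or clears the bit explicitly, so the result does not depend on what GMAC_CORE_INIT already contains.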

[PATCH net 2/2] cxgb4: fix endianness for vlan value in cxgb4_tc_flower

2018-01-16 Thread Rahul Lakkireddy

From: Kumar Sanghvi 

Don't change endianness when assigning vlan value in cxgb4_tc_flower
code when processing flow match parameters. The value gets converted
to network order as part of filtering code in set_filter_wr.

Signed-off-by: Kumar Sanghvi 
Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 276edcbb3259..a452d5a1b0f3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -208,8 +208,8 @@ static void cxgb4_process_flow_match(struct net_device *dev,
   VLAN_PRIO_SHIFT);
vlan_tci_mask = mask->vlan_id | (mask->vlan_priority <<
 VLAN_PRIO_SHIFT);
-   fs->val.ivlan = cpu_to_be16(vlan_tci);
-   fs->mask.ivlan = cpu_to_be16(vlan_tci_mask);
+   fs->val.ivlan = vlan_tci;
+   fs->mask.ivlan = vlan_tci_mask;
 
/* Chelsio adapters use ivlan_vld bit to match vlan packets
 * as 802.1Q. Also, when vlan tag is present in packets,
-- 
2.14.1
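The bug is a double byte-order conversion. The sketch below models it on a little-endian CPU (an assumption: on big-endian, cpu_to_be16() is a no-op and the old and new code behave the same). cpu_to_be16_on_le() reproduces the numeric effect of cpu_to_be16() there, and to_wire() is a hypothetical stand-in for the single conversion that set_filter_wr() performs; make_tci() mirrors the vlan_id | (priority << VLAN_PRIO_SHIFT) expression in the hunk.

```c
#include <stdint.h>

#define MODEL_VLAN_PRIO_SHIFT 13

/* Numeric effect of cpu_to_be16() on a little-endian CPU: a byte swap. */
static uint16_t cpu_to_be16_on_le(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical stand-in for the single conversion in set_filter_wr():
 * write a CPU-order value as two big-endian bytes. */
static void to_wire(uint16_t v, uint8_t out[2])
{
	out[0] = (uint8_t)(v >> 8);
	out[1] = (uint8_t)(v & 0xff);
}

/* Build a TCI from VLAN ID and priority. */
static uint16_t make_tci(uint16_t vid, uint16_t prio)
{
	return (uint16_t)(vid | (prio << MODEL_VLAN_PRIO_SHIFT));
}
```

With priority 3 and VID 100 the TCI is 0x6064; converting once yields the wire bytes 60 64, while the old double conversion yields 64 60.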

[PATCH net 1/2] cxgb4: set filter type to 1 for ETH_P_IPV6

2018-01-16 Thread Rahul Lakkireddy

From: Kumar Sanghvi 

For ethtype_key = ETH_P_IPV6, set filter type to 1 in cxgb4_tc_flower
code when processing flow match parameters.

Signed-off-by: Kumar Sanghvi 
Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index d4a548a6a55c..276edcbb3259 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -111,6 +111,9 @@ static void cxgb4_process_flow_match(struct net_device *dev,
ethtype_mask = 0;
}
 
+   if (ethtype_key == ETH_P_IPV6)
+   fs->type = 1;
+
fs->val.ethtype = ethtype_key;
fs->mask.ethtype = ethtype_mask;
fs->val.proto = key->ip_proto;
-- 
2.14.1

[PATCH net 0/2] cxgb4: fix issues in rule processing for tc-flower offload

2018-01-16 Thread Rahul Lakkireddy

Patch 1 sets filter type to indicate IPv6 when processing flow match
parameters.

Patch 2 fixes endianness issue when processing vlan flow match parameters.

Kumar Sanghvi (2):
  cxgb4: set filter type to 1 for ETH_P_IPV6
  cxgb4: fix endianness for vlan value in cxgb4_tc_flower

 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.14.1

[PATCH v2 net-next] virtio_net: Add ethtool stats

2018-01-16 Thread Toshiaki Makita

The main purpose of this patch is adding a way of checking per-queue stats.
It's useful for debugging performance problems in multiqueue environments.

$ ethtool -S ens10
NIC statistics:
 rx_queue_0_packets: 2090408
 rx_queue_0_bytes: 3164825094
 rx_queue_1_packets: 2082531
 rx_queue_1_bytes: 3152932314
 tx_queue_0_packets: 2770841
 tx_queue_0_bytes: 4194955474
 tx_queue_1_packets: 3084697
 tx_queue_1_bytes: 4670196372

This change converts existing per-cpu stats structure into per-queue one.
This should not impact performance since each queue's counters are not
updated concurrently by multiple CPUs.

Performance numbers:
 - Guest has 2 vcpus and 2 queues
 - Guest runs netserver
 - Host runs 100-flow super_netperf

                        Before      After    Diff
UDP_STREAM 18byte        86.22      87.00  +0.90%
UDP_STREAM 1472byte    4055.27    4042.18  -0.32%
TCP_STREAM            16956.32   16890.63  -0.39%
UDP_RR               178667.11  185862.70  +4.03%
TCP_RR               128473.04  124985.81  -2.71%

Signed-off-by: Toshiaki Makita 
---
v2:
- Removed redundant counters which can be obtained from dev_get_stats.
- Made queue counter structure different for tx and rx so they can be
  easily extended separately, as some additional counters are expected
  like XDP related ones and VM-Exit event.
- Added performance numbers in commitlog.

 drivers/net/virtio_net.c | 191 ++-
 1 file changed, 141 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 12dfc5f..626c273 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -66,16 +66,39 @@
VIRTIO_NET_F_GUEST_UFO
 };
 
-struct virtnet_stats {
-   struct u64_stats_sync tx_syncp;
-   struct u64_stats_sync rx_syncp;
-   u64 tx_bytes;
-   u64 tx_packets;
-
-   u64 rx_bytes;
-   u64 rx_packets;
+struct virtnet_stat_desc {
+   char desc[ETH_GSTRING_LEN];
+   size_t offset;
 };
 
+struct virtnet_sq_stats {
+   struct u64_stats_sync syncp;
+   u64 packets;
+   u64 bytes;
+};
+
+struct virtnet_rq_stats {
+   struct u64_stats_sync syncp;
+   u64 packets;
+   u64 bytes;
+};
+
+#define VIRTNET_SQ_STAT(m) offsetof(struct virtnet_sq_stats, m)
+#define VIRTNET_RQ_STAT(m) offsetof(struct virtnet_rq_stats, m)
+
+static const struct virtnet_stat_desc virtnet_sq_stats_desc[] = {
+   { "packets",VIRTNET_SQ_STAT(packets) },
+   { "bytes",  VIRTNET_SQ_STAT(bytes) },
+};
+
+static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
+   { "packets",VIRTNET_RQ_STAT(packets) },
+   { "bytes",  VIRTNET_RQ_STAT(bytes) },
+};
+
+#define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
+#define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
+
 /* Internal representation of a send virtqueue */
 struct send_queue {
/* Virtqueue associated with this send _queue */
@@ -87,6 +110,8 @@ struct send_queue {
/* Name of the send queue: output.$index */
char name[40];
 
+   struct virtnet_sq_stats stats;
+
struct napi_struct napi;
 };
 
@@ -99,6 +124,8 @@ struct receive_queue {
 
struct bpf_prog __rcu *xdp_prog;
 
+   struct virtnet_rq_stats stats;
+
/* Chain pages by the private ptr. */
struct page *pages;
 
@@ -152,9 +179,6 @@ struct virtnet_info {
/* Packet virtio header size */
u8 hdr_len;
 
-   /* Active statistics */
-   struct virtnet_stats __percpu *stats;
-
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
 
@@ -1127,7 +1151,6 @@ static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit)
struct virtnet_info *vi = rq->vq->vdev->priv;
unsigned int len, received = 0, bytes = 0;
void *buf;
-   struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 
if (!vi->big_packets || vi->mergeable_rx_bufs) {
void *ctx;
@@ -1150,10 +1173,10 @@ static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit)
schedule_delayed_work(>refill, 0);
}
 
-   u64_stats_update_begin(>rx_syncp);
-   stats->rx_bytes += bytes;
-   stats->rx_packets += received;
-   u64_stats_update_end(>rx_syncp);
+   u64_stats_update_begin(>stats.syncp);
+   rq->stats.bytes += bytes;
+   rq->stats.packets += received;
+   u64_stats_update_end(>stats.syncp);
 
return received;
 }
@@ -1162,8 +1185,6 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 {
struct sk_buff *skb;
unsigned int len;
-   struct virtnet_info *vi = sq->vq->vdev->priv;
-   struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
unsigned int packets = 0;
unsigned int bytes = 0;
 
@@ -1182,10 +1203,10 @@ static void
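The descriptor tables in this patch pair each ethtool string with an offsetof() into the per-queue stats struct, so names and values come out of the same loop in the same order. A minimal userspace sketch of that technique follows, assuming ETH_GSTRING_LEN is 32 (as in the ethtool UAPI) and using hypothetical struct and function names; the u64_stats_sync fetch/retry that the driver needs around the reads is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODEL_ETH_GSTRING_LEN 32

/* Hypothetical per-queue counters, like virtnet_rq_stats minus syncp. */
struct rq_stats_model {
	uint64_t packets;
	uint64_t bytes;
};

struct stat_desc {
	char desc[MODEL_ETH_GSTRING_LEN];
	size_t offset;
};

#define RQ_STAT(m) offsetof(struct rq_stats_model, m)

static const struct stat_desc rq_stats_desc[] = {
	{ "packets", RQ_STAT(packets) },
	{ "bytes",   RQ_STAT(bytes) },
};

#define RQ_STATS_LEN (sizeof(rq_stats_desc) / sizeof(rq_stats_desc[0]))

/* One name per (queue, counter), in the order ethtool will show them. */
static void fill_strings(char (*out)[MODEL_ETH_GSTRING_LEN], unsigned int nq)
{
	for (unsigned int q = 0; q < nq; q++)
		for (size_t j = 0; j < RQ_STATS_LEN; j++)
			snprintf(out[q * RQ_STATS_LEN + j], MODEL_ETH_GSTRING_LEN,
				 "rx_queue_%u_%s", q, rq_stats_desc[j].desc);
}

/* Values in the same order, each read through its table offset. */
static void fill_values(uint64_t *out, const struct rq_stats_model *stats,
			unsigned int nq)
{
	for (unsigned int q = 0; q < nq; q++) {
		const char *base = (const char *)&stats[q];

		for (size_t j = 0; j < RQ_STATS_LEN; j++)
			memcpy(&out[q * RQ_STATS_LEN + j],
			       base + rq_stats_desc[j].offset, sizeof(uint64_t));
	}
}
```

Adding a counter (the v2 notes expect XDP-related ones) then means adding one struct field and one table entry; neither loop changes.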

Re: WARNING in can_rcv

2018-01-16 Thread Oliver Hartkopp




On 01/16/2018 07:11 PM, Dmitry Vyukov wrote:

On Tue, Jan 16, 2018 at 7:07 PM, Marc Kleine-Budde  wrote:

On 01/16/2018 06:58 PM, syzbot wrote:

Hello,

syzkaller hit the following crash on
a8750ddca918032d6349adbf9a4b6555e7db20da
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.
C reproducer is attached
syzkaller reproducer is attached. See https://goo.gl/kgGztJ
for information about syzkaller reproducers


IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4386709c0c1284dca...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

device eql entered promiscuous mode
[ cut here ]
PF_CAN: dropped non conform CAN skbuf: dev type 65534, len 42, datalen 0
WARNING: CPU: 0 PID: 3650 at net/can/af_can.c:729 can_rcv+0x1c5/0x200
net/can/af_can.c:724
Kernel panic - not syncing: panic_on_warn set ...


Invalid packets generate a warning (WARN_ONCE()), and you have
panic_on_warn active. Should we rather silently drop these CAN packets?


Hi,

pr_warn_once() will be more appropriate. It prints a single line.



The idea behind this WARN() is to detect really bad things that might
have happened at the network driver level:


The CAN subsystem registers with dev_add_pack() for ETH_P_CAN and 
ETH_P_CANFD only. These ETH_P_ types are only allowed to be created by 
CAN network devices (like vcan, vxcan, and real CAN drivers).


I don't have any strong opinion on using WARN() or pr_warn_once().
Is this detected violation worth using WARN(), as something already must 
have gone really wrong to trigger this issue?


Best regards,
Oliver

Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

2018-01-16 Thread Alexei Starovoitov

On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote:
> 
> > but this program will see only first SG ?
> 
> Correct, to read further into the msg we would need to have a helper
> or some other way to catch reads/writes past the first 4k and read
> the next sg. We have the same limitation in cls_bpf.
> 
> I have started a patch on top of this series with the current working
> title msg_apply_bytes(int bytes). This would let us apply a verdict to
> a set number of bytes instead of the entire msg. By calling
> msg_apply_bytes(data_end - data) a user who needs to read an entire msg
> could do this in 4k chunks until the entire msg is passed through the
> bpf prog.

good idea.
I think it would be good to add this helper as part of this patch set
to make sure there is a way for user to look through the whole
tcp stream if the program really wants to.
I understand that program cannot examine every byte anyway
due to lack of loops and helpers, but this part of sockmap api
should still provide an interface from day one.
One example would be the program parsing http2 or similar
where in the header it sees length. Then it can do
msg_apply_bytes(length)
to skip the bytes it processed, but still continue within
the same 64Kbyte chunk when 0 < length < 64k
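Alexei's example, a parser that reads a length from a header and scopes its verdict to exactly that many bytes before moving on within the same chunk, can be modeled as a walk over length-prefixed records. This is a userspace sketch under stated assumptions: the 2-byte big-endian length prefix is illustrative framing (HTTP/2's real frame header is different), the function name is hypothetical, and msg_apply_bytes() was only a working title in this thread, so the comments describe intent rather than a real helper's semantics.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk length-prefixed records (2-byte big-endian length, then payload).
 * For each complete record the program would scope its verdict to
 * 2 + length bytes (the intent of msg_apply_bytes(length)) and continue
 * in the same buffer. Returns the number of records that got a verdict;
 * *consumed is the number of bytes covered. */
static size_t walk_records(const uint8_t *buf, size_t len, size_t *consumed)
{
	size_t off = 0, records = 0;

	while (len - off >= 2) {
		size_t rec_len = ((size_t)buf[off] << 8) | buf[off + 1];

		if (len - off - 2 < rec_len)
			break;          /* record continues in the next chunk */
		off += 2 + rec_len; /* verdict applied to this record only */
		records++;
	}
	*consumed = off;
	return records;
}
```

A truncated trailing record is left unconsumed (a buffer with two complete records of 3 and 1 payload bytes followed by a partial one stops at byte 8), which is the case where the program would need to see the next chunk before deciding.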

> > and it's typically going to be one page only ?
> 
> yep
> 
> > then what's the value of waiting for MAX_SKB_FRAGS ?
> > 
> It's not waiting for MAX_SKB_FRAGS; it's simply copying up to MAX_SKB_FRAGS
> pages in one call if possible. It seems better to me to run this loop
> over as much data as we can.

agree on trying to do MAX_SKB_FRAGS as a 'processing unit',
but program should still be able to skip or redirect 
parts of the bytes and not the whole 64k chunk.
From the program's point of view, it should never see or worry about
SG list boundaries whereas right now it seems that below code
is dealing with them (though program doesn't know where sg ends):

> +
> + switch (eval) {
> + case __SK_PASS:
> + sg_mark_end(sg + sg_num - 1);
> + err = bpf_tcp_push(sk, sg, _curr, flags, true);
> + if (unlikely(err)) {
> + copied -= free_sg(sk, sg, sg_curr, sg_num);
> + goto out_err;
> + }
> + break;
> + case __SK_REDIRECT:
> + sg_mark_end(sg + sg_num - 1);
> + goto do_redir;
...
> >> +static int bpf_tcp_ulp_register(void)
> >> +{
> >> +  tcp_bpf_proto = tcp_prot;
> >> +  tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
> >> +  tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
> >> +  return tcp_register_ulp(_tcp_ulp_ops);
> > 
> > I don't see corresponding tcp_unregister_ulp().
> > 
> 
> There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of
> available ULPs and never removes it. To remove it we would have to
> keep a ref count on the reg/unreg calls. This would require a couple
> more patches to the ULP infra and not sure it hurts to leave the ULP
> reference around...
> 
> Maybe a follow on patch? Or else it could be a patch in front of this
> patch.

I see. I'm ok with leaving that for later.
It doesn't hurt to keep it registered. Please add a comment though.
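John's remark that unregistering would need a reference count around the reg/unreg calls can be sketched like this: the first register call publishes the ops, later calls only bump a count, and the ops are withdrawn when the last user goes away. All names are hypothetical and the model assumes a single slot; the real tcp_register_ulp() keeps registered ULPs on a lock-protected list, which is left out here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcp_ulp_ops. */
struct ulp_ops_model {
	const char *name;
};

struct ulp_slot {
	const struct ulp_ops_model *ops; /* published ops, NULL if none */
	unsigned int users;              /* outstanding register calls */
};

static int ulp_get(struct ulp_slot *slot, const struct ulp_ops_model *ops)
{
	if (slot->ops && slot->ops != ops)
		return -EEXIST;  /* slot taken by different ops */
	if (slot->users++ == 0)
		slot->ops = ops; /* first user publishes the ops */
	return 0;
}

static void ulp_put(struct ulp_slot *slot)
{
	if (slot->users && --slot->users == 0)
		slot->ops = NULL; /* last user withdraws the ops */
}
```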

[linux-next] kernel Oops when booting powerpc

2018-01-16 Thread Abdul Haleem

Greetings,

linux-next kernel boots with a kernel Oops on a powerpc machine.

Machine Type: Power 8 [bare-metal & PowerVM LPAR]
kernel version: 4.15.0-rc7-next-20180115
test: Boot
config: attached

bootlogs:
-
ses 0:0:3:0: Attached Enclosure device
ses 0:0:4:0: Attached Enclosure device
Rounding down aligned max_sectors from 4294967295 to 4294967168
Loading iSCSI transport class v2.0-870.
iscsi: registered transport (iser)
RPC: Registered rdma transport module.
RPC: Registered rdma backchannel transport module. 
ip6_tables: (C) 2000-2006 Netfilter Core Team
Ebtables v2.0 registered
nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
Unable to handle kernel paging request for data at address 0x0118
Faulting instruction address: 0xd000102d0fa8
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=1024 NUMA PowerNV
Modules linked in: ip6table_mangle ip6table_security ip6table_raw
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw
ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert
iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt
target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 ib_core ses enclosure
scsi_transport_sas i2c_opal ipmi_powernv i2c_core ipmi_devintf
ipmi_msghandler powernv_op_panel nfsd auth_rpcgss kvm_hv nfs_acl kvm_pr
lockd grace kvm sunrpc qla2xxx scsi_transport_fc tg3 ptp pps_core cxgb3
mdio
CPU: 28 PID: 3447 Comm: ip6tables Not tainted 4.15.0-rc7-next-20180115-autotest #1
NIP:  d000102d0fa8 LR: d000102d0f94 CTR: c0138ee0
REGS: c007a41377f0 TRAP: 0300   Not tainted  (4.15.0-rc7-next-20180115-autotest)
MSR:  90010280b033   CR: 42002848 
 XER: 
CFAR: c000884c DAR: 0118 DSISR: 4000 SOFTE: 1 
GPR00: d000102d0f94 c007a4137a70 d000102dd000 0100 
GPR04: 0001 0001   
GPR08: c007a4137880 f000  0005 
GPR12: 2200 cfd53400  10014f80 
GPR16: 0002 7fff9c400518 7c2b0c98  
GPR20: 0003 10013c50 7c2b0ca0 7c2bffd3 
GPR24: 7c2b0890 7c2b088c 7c2b0894 c007a4137d00 
GPR28: 7c2b0894 c007a4137d00 0100 c13ef580 
NIP [d000102d0fa8] get_info+0x98/0x290 [ip6_tables]
LR [d000102d0f94] get_info+0x84/0x290 [ip6_tables]
Call Trace:
[c007a4137a70] [d000102d0f94] get_info+0x84/0x290 [ip6_tables] (unreliable)
[c007a4137bb0] [d000102d2274] do_ip6t_get_ctl+0x94/0x590 [ip6_tables]
[c007a4137c90] [c09d9ee8] nf_getsockopt+0x88/0xd0
[c007a4137ce0] [c0ab2170] ipv6_getsockopt+0x160/0x1f0
[c007a4137d30] [c0abe4c0] rawv6_getsockopt+0x40/0xd0
[c007a4137d50] [c094c7d4] sock_common_getsockopt+0x34/0x50
[c007a4137d70] [c094b228] SyS_getsockopt+0xa8/0x160
[c007a4137dd0] [c094bef8] SyS_socketcall+0x1f8/0x3d0
[c007a4137e30] [c000b8e0] system_call+0x58/0x6c
Instruction dump:
2e3e 98610117 40920190 7fe3fb78 388a 38a100f8 480034f9 e8410018 
3920f000 7fa34840 7c7e1b78 41dd01f4  40920144 3880 38a00054 
---[ end trace ecbb65add1313022 ]---

IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f1: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f1: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f2: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f2: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f3: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f3: link is not ready
IPv6: ADDRCONF(NETDEV_UP): net0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): net0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
qla2xxx [0002:03:00.0]-8038:1: Cable is unplugged...
iw_cxgb3: Chelsio T3 RDMA Driver - version 1.1
ib_srpt MAD registration failed for cxgb3_0-1. 
ib_srpt srpt_add_one(cxgb3_0) failed.
iw_cxgb3: Initialized device :01:00.0




-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre


#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.15.0-rc7 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
# CONFIG_POWER7_CPU is not set
CONFIG_POWER8_CPU=y
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_RADIX_MMU_DEFAULT=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_FORCE_SMP=y
CONFIG_SMP=y
CONFIG_NR_CPUS=1024
CONFIG_PPC_DOORBELL=y
# CONFIG_CPU_BIG_ENDIAN is not set

Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

2018-01-16 Thread John Fastabend

On 01/16/2018 06:25 PM, Alexei Starovoitov wrote:
> On Fri, Jan 12, 2018 at 10:11:11AM -0800, John Fastabend wrote:
>> This implements a BPF ULP layer to allow policy enforcement and
>> monitoring at the socket layer. In order to support this a new
>> program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
>> the sendmsg/sendpage hook. To attach the policy to sockets a
>> sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.

[...]

> 
> overall design looks clean. imo huge improvement from first version.
> 

Great, thanks for the review.

> Few nits:
> 

[...]

>> +
>> +static int bpf_tcp_push(struct sock *sk, struct scatterlist *sg,
>> +int *sg_end, int flags, bool charge)
>> +{
>> +int sendpage_flags = flags | MSG_SENDPAGE_NOTLAST;
>> +int offset, ret = 0;
>> +struct page *p;
>> +size_t size;
>> +
>> +size = sg->length;
>> +offset = sg->offset;
>> +
>> +while (1) {
>> +if (sg_is_last(sg))
>> +sendpage_flags = flags;
>> +
>> +tcp_rate_check_app_limited(sk);
>> +p = sg_page(sg);
>> +retry:
>> +ret = do_tcp_sendpages(sk, p, offset, size, sendpage_flags);
>> +if (ret != size) {
>> +if (ret > 0) {
>> +offset += ret;
>> +size -= ret;
>> +goto retry;
>> +}
>> +
>> +if (charge)
>> +sk_mem_uncharge(sk,
>> +sg->length - size - sg->offset);
> 
> should the bool argument be called 'uncharge' instead ?
> 

Agreed that would be a better name, will update.

>> +
>> +sg->offset = offset;
>> +sg->length = size;
>> +return ret;
>> +}
>> +
>> +put_page(p);
>> +if (charge)
>> +sk_mem_uncharge(sk, sg->length);
>> +*sg_end += 1;
>> +sg = sg_next(sg);
>> +if (!sg)
>> +break;
>> +
>> +offset = sg->offset;
>> +size = sg->length;
>> +}
>> +
>> +return 0;
>> +}
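bpf_tcp_push() follows the usual partial-write pattern: when do_tcp_sendpages() accepts fewer bytes than requested, advance the offset and retry; on an error, record where the element stopped so the caller can account for the rest. A userspace sketch of just that loop, where struct seg stands in for a scatterlist entry and send_fn for do_tcp_sendpages() (both hypothetical), plus an example sender that accepts a few bytes per call and then fails:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist element. */
struct seg {
	size_t offset;
	size_t length;
};

/* Sender callback: returns bytes sent (> 0), 0, or a negative error. */
typedef long (*send_fn)(void *ctx, size_t offset, size_t size);

static long push_seg(struct seg *s, send_fn send, void *ctx)
{
	size_t offset = s->offset;
	size_t size = s->length;

	while (size) {
		long ret = send(ctx, offset, size);

		if (ret <= 0) {
			/* Record the unsent remainder, as the patch does with
			 * sg->offset and sg->length, and report the result. */
			s->offset = offset;
			s->length = size;
			return ret;
		}
		offset += (size_t)ret;  /* short send: advance and retry */
		size -= (size_t)ret;
	}
	s->offset = offset;
	s->length = 0;
	return 0;
}

/* Example sender: accepts at most max_per_call bytes per call and
 * reports -EAGAIN once calls_left is exhausted. */
struct limited_sender {
	size_t max_per_call;
	int calls_left;
	size_t total_sent;
};

static long limited_send(void *ctx, size_t offset, size_t size)
{
	struct limited_sender *ls = ctx;

	(void)offset;
	if (ls->calls_left-- <= 0)
		return -EAGAIN;
	if (size > ls->max_per_call)
		size = ls->max_per_call;
	ls->total_sent += size;
	return (long)size;
}
```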

[...]

>> +static int bpf_tcp_sendmsg_do_redirect(struct scatterlist *sg, int sg_num,
>> +   struct sk_msg_buff *md, int flags)
>> +{
>> +int i, sg_curr = 0, err, free;
>> +struct smap_psock *psock;
>> +struct sock *sk;
>> +
>> +rcu_read_lock();
>> +sk = do_msg_redirect_map(md);
>> +if (unlikely(!sk))
>> +goto out_rcu;
>> +
>> +psock = smap_psock_sk(sk);
>> +if (unlikely(!psock))
>> +goto out_rcu;
>> +
>> +if (!refcount_inc_not_zero(>refcnt))
>> +goto out_rcu;
>> +
>> +rcu_read_unlock();
>> +lock_sock(sk);
>> +err = bpf_tcp_push(sk, sg, _curr, flags, false);
>> +if (unlikely(err))
>> +goto out;
>> +release_sock(sk);
>> +smap_release_sock(psock, sk);
>> +return 0;
>> +out_rcu:
>> +rcu_read_unlock();
>> +out:
>> +for (i = sg_curr; i < sg_num; ++i) {
>> +free += sg[i].length;
>> +put_page(sg_page([i]));
>> +}
>> +return free;
> 
> error path keeps rcu_lock and sk locked?
> 

yep, although it looks like rcu_read_unlock() is OK because it's
released above the call, but the

   if (unlikely(err))
goto err

needs to be moved below the smap_release_sock(). Thanks!

>> +}
>> +
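Following John's reply, the fix is to drop the socket lock and the psock reference before acting on the push result. A userspace sketch of that ordering, with counters standing in for lock_sock()/release_sock() and the psock refcount (all names hypothetical). It also starts the freed-byte count at zero; in the quoted code, free is declared without an initializer before the error loop adds to it.

```c
#include <stddef.h>

/* Hypothetical counters standing in for the socket lock and psock ref. */
struct redirect_model {
	int sock_locked;
	int psock_refs;
};

/* push_result plays the role of bpf_tcp_push()'s return value; lens[]
 * holds the lengths of the elements left to free if it failed.
 * Returns the number of bytes freed (0 on success). */
static size_t do_redirect_model(struct redirect_model *m, int push_result,
				const size_t *lens, size_t nlens)
{
	size_t i, freed = 0;  /* initialized, unlike `free` in the quoted code */
	int err;

	m->psock_refs++;      /* refcount_inc_not_zero() succeeded */
	m->sock_locked = 1;   /* lock_sock(sk) */
	err = push_result;    /* bpf_tcp_push(...) */
	m->sock_locked = 0;   /* release_sock(sk) */
	m->psock_refs--;      /* smap_release_sock(psock, sk) */

	if (!err)
		return 0;
	/* The error check now runs with nothing held. */
	for (i = 0; i < nlens; i++)
		freed += lens[i]; /* put_page() on each remaining element */
	return freed;
}
```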

[...]

>> +while (msg_data_left(msg)) {
>> +int sg_curr;
>> +
>> +if (sk->sk_err) {
>> +err = sk->sk_err;
>> +goto out_err;
>> +}
>> +
>> +copy = msg_data_left(msg);
>> +if (!sk_stream_memory_free(sk))
>> +goto wait_for_sndbuf;
>> +
>> +/* sg_size indicates bytes already allocated and sg_num
>> + * is last sg element used. This is used when alloc_sg
>> + * partially allocates a scatterlist and then is sent
>> + * to wait for memory. In normal case (no memory pressure)
>> + * both sg_num and sg_size are zero.
>> + */
>> +copy = copy - sg_size;
>> +err = sk_alloc_sg(sk, copy, sg, _num, _size, 0);
>> +if (err) {
>> +if (err != -ENOSPC)
>> +goto wait_for_memory;
>> +copy = sg_size;
>> +}
>> +
>> +err = memcopy_from_iter(sk, sg, sg_num, >msg_iter, copy);
>> +if (err < 0) {
>> +free_sg(sk, sg, 0, sg_num);
>> +goto out_err;
>> +}
>> +
>> +copied += copy;
>> +
>> +/* If msg is larger than MAX_SKB_FRAGS we can send multiple
>> + * scatterlists per msg. However BPF decisions apply to the
>> + * entire msg.
>> + */
>> +if (eval

net merged into net-next

2018-01-16 Thread David Miller


Daniel, please double check my merge work especially wrt. your
packet scheduler fix.

Thanks!

Re: [PATCH net] net: validate untrusted gso packets

2018-01-16 Thread Willem de Bruijn

On Tue, Jan 16, 2018 at 11:33 PM, Willem de Bruijn
 wrote:
> On Tue, Jan 16, 2018 at 11:04 PM, Jason Wang  wrote:
>>
>>
>> On 2018年01月17日 04:29, Willem de Bruijn wrote:
>>>
>>> From: Willem de Bruijn
>>>
>>> Validate gso packet type and headers on kernel entry. Reuse the info
>>> gathered by skb_probe_transport_header.
>>>
>>> Syzbot found two bugs by passing bad gso packets in packet sockets.
>>> Untrusted user packets are limited to a small set of gso types in
>>> virtio_net_hdr_to_skb. But segmentation occurs on packet contents.
>>> Syzkaller was able to enter gso callbacks that are not hardened
>>> against untrusted user input.
>>
>>
>> Does this mean something is missing in the existing header check for
>> dodgy packets?
>
> virtio_net_hdr_to_skb checks gso_type, but it does not verify that this
> type correctly describes the actual packet. Segmentation happens based
> on packet contents. So a packet was crafted to enter sctp gso, even
> though no such gso_type exists. This issue is not specific to sctp.
>
>>>
>>> User packets can also have corrupted headers, tripping up segmentation
>>> logic that expects sane packets from the trusted protocol stack.
>>> Hardening all segmentation paths against all bad packets is error
>>> prone and slows down the common path, so validate on kernel entry.
>>
>>
>> I think evil packets should be rare in the common case, so I'm not sure
>> validating them on kernel entry is a good choice, especially considering
>> we already have a header check.
>
> This just makes that check more strict. Frequency of malicious packets is
> not really relevant if a single bad packet can cause damage.
>
> The alternative to validating on kernel entry is to harden the entire
> segmentation layer and the lower part of the stack. That is much harder
> to get right and not
> necessarily cheaper.
>
> As a matter of fact, it incurs a cost on all packets, including the common
> case generated by the protocol stack.

If packets can be fully validated at the source, we can eventually also
get rid of the entire SKB_GSO_DODGY and NETIF_F_GSO_ROBUST
logic. Then virtio packets won't have to enter the segmentation layer
at all for TSO capable devices.

Re: [PATCH net] net: validate untrusted gso packets

2018-01-16 Thread Willem de Bruijn

On Tue, Jan 16, 2018 at 11:04 PM, Jason Wang  wrote:
>
>
> On 2018年01月17日 04:29, Willem de Bruijn wrote:
>>
>> From: Willem de Bruijn
>>
>> Validate gso packet type and headers on kernel entry. Reuse the info
>> gathered by skb_probe_transport_header.
>>
>> Syzbot found two bugs by passing bad gso packets in packet sockets.
>> Untrusted user packets are limited to a small set of gso types in
>> virtio_net_hdr_to_skb. But segmentation occurs on packet contents.
>> Syzkaller was able to enter gso callbacks that are not hardened
>> against untrusted user input.
>
>
> Does this mean something is missing in the existing header check for
> dodgy packets?

virtio_net_hdr_to_skb checks gso_type, but it does not verify that this
type correctly describes the actual packet. Segmentation happens based
on packet contents. So a packet was crafted to enter sctp gso, even
though no such gso_type exists. This issue is not specific to sctp.

>>
>> User packets can also have corrupted headers, tripping up segmentation
>> logic that expects sane packets from the trusted protocol stack.
>> Hardening all segmentation paths against all bad packets is error
>> prone and slows down the common path, so validate on kernel entry.
>
>
> I think evil packets should be rare in the common case, so I'm not sure
> validating them on kernel entry is a good choice, especially considering
> we already have a header check.

This just makes that check more strict. Frequency of malicious packets is
not really relevant if a single bad packet can cause damage.

The alternative to validating on kernel entry is to harden the entire segmentation
layer and the lower part of the stack. That is much harder to get right and not
necessarily cheaper.

As a matter of fact, it incurs a cost on all packets, including the common
case generated by the protocol stack.
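Willem's point is that a declared gso_type must also agree with what the headers actually contain, because segmentation dispatches on packet contents. A minimal userspace sketch of such a consistency check for an untagged Ethernet/IPv4 frame follows; the enum, function name and the restriction to IPv4 TCP/UDP are assumptions for illustration, and the patch itself reuses skb_probe_transport_header() rather than parsing by hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of declared GSO types. */
enum declared_gso { GSO_NONE, GSO_TCPV4, GSO_UDP };

#define MODEL_ETH_HLEN  14
#define MODEL_ETH_P_IP  0x0800
#define MODEL_PROTO_TCP 6
#define MODEL_PROTO_UDP 17

/* True only if the declared type agrees with the frame's headers. */
static bool gso_type_matches(const uint8_t *frame, size_t len,
			     enum declared_gso type)
{
	size_t ihl;
	uint8_t proto;

	if (type == GSO_NONE)
		return true;
	if (len < MODEL_ETH_HLEN + 20)
		return false;  /* no room for a minimal IPv4 header */
	if (((frame[12] << 8) | frame[13]) != MODEL_ETH_P_IP)
		return false;  /* not IPv4 */
	if ((frame[MODEL_ETH_HLEN] >> 4) != 4)
		return false;  /* IP version field disagrees */
	ihl = (size_t)(frame[MODEL_ETH_HLEN] & 0x0f) * 4;
	if (ihl < 20 || len < MODEL_ETH_HLEN + ihl)
		return false;  /* bogus or truncated IP header */
	proto = frame[MODEL_ETH_HLEN + 9];

	switch (type) {
	case GSO_TCPV4:
		/* Require room for a minimal 20-byte TCP header too. */
		return proto == MODEL_PROTO_TCP &&
		       len >= MODEL_ETH_HLEN + ihl + 20;
	case GSO_UDP:
		/* UDP headers are 8 bytes. */
		return proto == MODEL_PROTO_UDP &&
		       len >= MODEL_ETH_HLEN + ihl + 8;
	default:
		return false;
	}
}
```

With this check, an SCTP packet (IP protocol 132) declared as TCPV4 is rejected before it can reach the TCP segmentation path.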

Re: [PATCH 32/32] aio: implement io_pgetevents

2018-01-16 Thread Al Viro

On Tue, Jan 16, 2018 at 07:41:24PM -0500, Jeff Moyer wrote:
>   if (sigmask) {
> - if (copy_from_user(, sigmask, sizeof(ksigmask)))
> + if (!access_ok(VERIFY_READ, sigmask,
> +sizeof(void *) + sizeof(size_t)) ||
> + __get_user(up, (sigset_t __user * __user *)sigmask) ||
> + __get_user(sigsetsize,
> +(size_t __user *)(sigmask + sizeof(void *
>   return -EFAULT;

How about copy_from_user() on a struct?  Making eyes bleed is fun, but
people tend to get annoyed when you do it to them...

Re: [PATCH bpf-next v2 09/11] nfp: bpf: use extack support to improve debugging

2018-01-16 Thread David Ahern

On 1/16/18 12:11 PM, Jakub Kicinski wrote:
> On Tue, 16 Jan 2018 10:36:01 +0100, Jiri Pirko wrote:
>>> @@ -303,7 +305,8 @@ static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog)
>>> /* Load up the JITed code */
>>> err = nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_BPF);
>>> if (err)
>>> -   nn_err(nn, "FW command error while loading BPF: %d\n", err);
>>> +   NL_SET_ERR_MSG_MOD(extack,
>>> +  "FW command error while loading BPF");  
>>
>> One line please. Same for all others. Strings may overflow 80 cols.
> 
> Sorry, but this is the way I want things in the nfp driver.  If the
> string would fit 80 chars placed on a new line, it should be placed 
> on a new line.  If it doesn't fit anyway the new line is unnecessary.
> This rule is adhered to throughout the driver (to the extent I'm able
> to enforce it).
> 

+1

Re: [PATCH net] net: validate untrusted gso packets

2018-01-16 Thread Jason Wang




On 2018-01-17 04:29, Willem de Bruijn wrote:
> From: Willem de Bruijn
>
> Validate gso packet type and headers on kernel entry. Reuse the info
> gathered by skb_probe_transport_header.
>
> Syzbot found two bugs by passing bad gso packets in packet sockets.
> Untrusted user packets are limited to a small set of gso types in
> virtio_net_hdr_to_skb. But segmentation occurs on packet contents.
> Syzkaller was able to enter gso callbacks that are not hardened
> against untrusted user input.

Does this mean something is missing in the existing header check for
dodgy packets?

> User packets can also have corrupted headers, tripping up segmentation
> logic that expects sane packets from the trusted protocol stack.
> Hardening all segmentation paths against all bad packets is error
> prone and slows down the common path, so validate on kernel entry.

I think evil packets should be rare in the common case, so I'm not sure
validating them on kernel entry is a good choice, especially considering
we already have a header check.

> Introduce skb_probe_transport_header_hard to unconditionally probe,
> even if skb_partial_csum_set has already set an offset. That is under
> user control, so do not trust it. I did not see a measurable change
> in TCP_STREAM performance.
>
> Move tpacket probe call after virtio_net_hdr_to_skb has set gso_type.
>
> Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
> Fixes: f43798c27684 ("tun: Allow GSO using virtio_net_hdr")
> Fixes: f942dc2552b8 ("xen network backend driver")
> Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5...@google.com>
> Reported-by: syzbot+fee64147a25aecd48...@syzkaller.appspotmail.com
> Signed-off-by: Willem de Bruijn

...

Thanks
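The core of the disagreement is whether the untrusted gso_type should be checked against what the packet bytes actually contain. Willem's patch does this via skb_probe_transport_header (flow dissection); the simplified userspace sketch below only illustrates the idea. The function name is hypothetical, it assumes `pkt` starts at the IP header, and it ignores IPv6 extension headers. The constants are the values defined in <linux/virtio_net.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from <linux/virtio_net.h>. */
#define VIRTIO_NET_HDR_GSO_NONE  0
#define VIRTIO_NET_HDR_GSO_TCPV4 1
#define VIRTIO_NET_HDR_GSO_UDP   3
#define VIRTIO_NET_HDR_GSO_TCPV6 4
#define VIRTIO_NET_HDR_GSO_ECN   0x80

/* Hypothetical check: the claimed gso_type must agree with the packet's
 * own network header, since segmentation acts on packet contents
 * rather than on the metadata. pkt starts at the network header. */
static bool gso_hdr_matches_packet(uint8_t gso_type, const uint8_t *pkt,
				   size_t len)
{
	uint8_t type = gso_type & ~VIRTIO_NET_HDR_GSO_ECN;
	uint8_t l4proto;

	if (type == VIRTIO_NET_HDR_GSO_NONE)
		return true;                  /* nothing to segment */
	if (len < 1)
		return false;

	switch (pkt[0] >> 4) {                /* IP version nibble */
	case 4:
		if (len < 20 || (pkt[0] & 0x0f) < 5 ||
		    (size_t)(pkt[0] & 0x0f) * 4 > len)
			return false;         /* truncated or bogus IHL */
		l4proto = pkt[9];             /* IPv4 protocol field */
		if (type == VIRTIO_NET_HDR_GSO_TCPV4)
			return l4proto == 6;  /* IPPROTO_TCP */
		if (type == VIRTIO_NET_HDR_GSO_UDP)
			return l4proto == 17; /* IPPROTO_UDP */
		return false;
	case 6:
		if (len < 40)
			return false;
		l4proto = pkt[6];             /* IPv6 next header */
		if (type == VIRTIO_NET_HDR_GSO_TCPV6)
			return l4proto == 6;
		if (type == VIRTIO_NET_HDR_GSO_UDP)
			return l4proto == 17;
		return false;
	default:
		return false;
	}
}
```

A check like this runs once per untrusted packet at entry, which is the cost trade-off both sides of the thread are weighing against hardening every gso callback.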

Re: [PATCH bpf-next 3/3] tools: bpftool: improve architecture detection by using ifindex

2018-01-16 Thread Jakub Kicinski

CC: Francois

On Tue, 16 Jan 2018 19:42:15 -0800, Alexei Starovoitov wrote:
> > +   switch (vendor_id) {
> > +   case 0x19ee:
> > +   device_id = read_sysfs_netdev_hex_int(devname, "device");
> > +   if (device_id != 0x4000 &&
> > +   device_id != 0x6000 &&
> > +   device_id != 0x6003)
> > +   p_info("Unknown NFP device ID, assuming it is NFP-6xxx arch");
> > +   return "NFP-6xxx";  
> 
> is this a canonical name that bfd will understand?

Yes, we started with just "nfp", but tossed "6k" for good measure.

> a link to bfd patches?

Unfortunately not posted, yet.

[PATCH] Bluetooth: 6lowpan: Fix disconnect bug in 6lowpan

2018-01-16 Thread Guo Yi

This patch fixes a Bluetooth 6lowpan disconnect failure.

The same address type has different defined values in the HCI layer
and the L2CAP layer. That makes disconnect fail due to the wrong address
type: the user is not able to disconnect from the console using the
address type that was used to connect.

This patch adds a variable, lookup_type, and converts the channel address
type to the HCI address type. With this change, the user can disconnect
successfully.

Signed-off-by: Guo Yi 
---
 net/bluetooth/6lowpan.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index 795ddd8..332dddb 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -1104,11 +1104,18 @@ static int get_l2cap_conn(char *buf, bdaddr_t *addr, u8 *addr_type,
struct hci_dev *hdev;
bdaddr_t *src = BDADDR_ANY;
int n;
+   u8 lookup_type;
 
n = sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx %hhu",
   >b[5], >b[4], >b[3],
   >b[2], >b[1], >b[0],
   addr_type);
+   /* Convert from L2CAP channel address type to HCI address type
+*/
+   if (*addr_type == BDADDR_LE_PUBLIC)
+   lookup_type = ADDR_LE_DEV_PUBLIC;
+   else
+   lookup_type = ADDR_LE_DEV_RANDOM;
 
if (n < 7)
return -EINVAL;
@@ -1118,7 +1125,7 @@ static int get_l2cap_conn(char *buf, bdaddr_t *addr, u8 *addr_type,
return -ENOENT;
 
hci_dev_lock(hdev);
-   hcon = hci_conn_hash_lookup_le(hdev, addr, *addr_type);
+   hcon = hci_conn_hash_lookup_le(hdev, addr, lookup_type);
hci_dev_unlock(hdev);
 
if (!hcon)
-- 
2.7.4
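The fix boils down to translating the L2CAP-level address type parsed from the user's input into the HCI LE address type before the connection lookup. The sketch below does that conversion in userspace. Unlike the quoted hunk, it only uses addr_type after checking that sscanf() filled all seven fields. The constant values are assumed to mirror the kernel's Bluetooth headers, and parse_peer() and l2cap_to_hci_le_type() are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Values assumed to mirror the kernel's Bluetooth headers. */
#define BDADDR_LE_PUBLIC   0x01 /* L2CAP/mgmt-level address type */
#define BDADDR_LE_RANDOM   0x02
#define ADDR_LE_DEV_PUBLIC 0x00 /* HCI-level address type */
#define ADDR_LE_DEV_RANDOM 0x01

/* Map an L2CAP channel address type to the HCI LE address type. */
static uint8_t l2cap_to_hci_le_type(uint8_t addr_type)
{
	return addr_type == BDADDR_LE_PUBLIC ? ADDR_LE_DEV_PUBLIC
					     : ADDR_LE_DEV_RANDOM;
}

/* Hypothetical parser for "AA:BB:CC:DD:EE:FF <type>". The byte order
 * matches the quoted sscanf() (b[5] first); addr_type is only used
 * once all seven fields were parsed. */
static int parse_peer(const char *buf, uint8_t b[6], uint8_t *hci_type)
{
	uint8_t addr_type;
	int n = sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx %hhu",
		       &b[5], &b[4], &b[3], &b[2], &b[1], &b[0],
		       &addr_type);

	if (n < 7)
		return -EINVAL;

	*hci_type = l2cap_to_hci_le_type(addr_type);
	return 0;
}
```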

Re: pull-request: bpf-next 2018-01-17

2018-01-16 Thread David Miller

From: Daniel Borkmann 
Date: Wed, 17 Jan 2018 02:01:28 +0100

> The following pull-request contains BPF updates for your *net-next* tree.
 ...
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Pulled, thanks Daniel.

Re: [PATCH bpf-next 3/3] tools: bpftool: improve architecture detection by using ifindex

2018-01-16 Thread Alexei Starovoitov

On Tue, Jan 16, 2018 at 04:05:21PM -0800, Jakub Kicinski wrote:
> From: Jiong Wang 
> 
> The current architecture detection method in bpftool is designed for host
> case.
> 
> For offload case, we can't use the architecture of "bpftool" itself.
> Instead, we could call the existing "ifindex_to_name_ns" to get DEVNAME,
> then read pci id from /sys/class/dev/DEVNAME/device/vendor, finally we map
> vendor id to bfd arch name which will finally be used to select bfd backend
> for the disassembler.
> 
> Reviewed-by: Jakub Kicinski 
> Signed-off-by: Jiong Wang 

awesome addition!
Acked-by: Alexei Starovoitov 

> + switch (vendor_id) {
> + case 0x19ee:
> + device_id = read_sysfs_netdev_hex_int(devname, "device");
> + if (device_id != 0x4000 &&
> + device_id != 0x6000 &&
> + device_id != 0x6003)
> + p_info("Unknown NFP device ID, assuming it is NFP-6xxx arch");
> + return "NFP-6xxx";

is this a canonical name that bfd will understand?
a link to bfd patches?
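The detection flow in the commit message (read the PCI vendor ID from sysfs, then map vendor/device IDs to a bfd arch name) can be sketched as below. This assumes the sysfs attributes read as text such as "0x19ee\n" (0x19ee is Netronome's PCI vendor ID); parse_sysfs_hex() and offload_arch_name() are hypothetical helpers, not the patch's read_sysfs_netdev_hex_int(). A NULL return just means no offload arch was found, and what the caller does then is left open.

```c
#include <stdlib.h>

static const char nfp_arch_name[] = "NFP-6xxx";

/* Parse a sysfs hex attribute such as "0x19ee\n"; strtol() with base 16
 * accepts the optional "0x" prefix. Returns -1 if nothing was parsed. */
static long parse_sysfs_hex(const char *s)
{
	char *end;
	long v = strtol(s, &end, 16);

	return end == s ? -1 : v;
}

/* Map PCI vendor/device IDs to a bfd arch name, mirroring the quoted
 * switch: unknown NFP device IDs still map to NFP-6xxx, but the caller
 * learns that the device was not recognised (the patch prints a notice). */
static const char *offload_arch_name(long vendor_id, long device_id,
				     int *known_device)
{
	switch (vendor_id) {
	case 0x19ee:
		*known_device = device_id == 0x4000 ||
				device_id == 0x6000 ||
				device_id == 0x6003;
		return nfp_arch_name;
	default:
		*known_device = 0;
		return NULL;
	}
}
```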

Re: [PATCH bpf-next 1/3] bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info

2018-01-16 Thread Alexei Starovoitov

On Tue, Jan 16, 2018 at 04:05:19PM -0800, Jakub Kicinski wrote:
> From: Jiong Wang 
> 
> For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog
> used by all host JIT targets to get the jited image and its length. For
> offload, however, targets are likely to have different offload mechanisms,
> and this info is kept in device private data fields.
>
> Therefore, the BPF_OBJ_GET_INFO_BY_FD syscall needs a unified way to get
> JIT length and contents info for offload targets.
>
> One way is to introduce a new callback to parse device private data and
> then fill those fields in bpf_prog_info. This might be a little heavy; the
> other way is to add generic fields which will be initialized by all
> offload targets.
>
> This patch follows the second approach, introducing two new fields in
> struct bpf_dev_offload and teaching bpf_prog_get_info_by_fd about them to
> fill in the correct jited_prog_len and jited_prog_insns in bpf_prog_info.
> 
> Reviewed-by: Jakub Kicinski 
> Signed-off-by: Jiong Wang 

Acked-by: Alexei Starovoitov 

initially I wasn't sure that reusing jited_prog_insns field
to return offloaded prog is such a good idea, but since we
fill in ifindex at the same time the usage of the field is not ambiguous,
so I think it's a good approach.
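The info-by-fd convention the patch plugs into (always report the full JIT length, copy at most as many bytes as the caller's buffer holds) can be sketched in userspace as follows. The structs and field names are hypothetical stand-ins for bpf_dev_offload and bpf_prog_info, and memcpy() replaces copy_to_user().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for per-device offload state with the two
 * generic fields the commit message describes. */
struct offload_state {
	uint32_t ifindex;
	uint32_t jited_len;          /* length of the device JIT image */
	const uint8_t *jited_image;  /* device JIT image */
};

/* Hypothetical subset of bpf_prog_info. */
struct prog_info {
	uint32_t ifindex;
	uint32_t jited_prog_len;
};

/* Always report the full JIT length; copy at most user_len bytes so a
 * caller can query the size first with a zero-length buffer. Filling
 * ifindex alongside keeps the meaning of the image unambiguous. */
static void fill_jited_info(const struct offload_state *off,
			    struct prog_info *info,
			    uint8_t *user_buf, uint32_t user_len)
{
	info->ifindex = off->ifindex;
	info->jited_prog_len = off->jited_len;
	if (user_buf && user_len) {
		uint32_t n = user_len < off->jited_len ? user_len
						       : off->jited_len;
		memcpy(user_buf, off->jited_image, n);
	}
}
```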

Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors

2018-01-16 Thread David Ahern

On 1/16/18 4:19 PM, Jamal Hadi Salim wrote:
> On 18-01-16 06:58 PM, David Ahern wrote:
>> On 1/16/18 9:20 AM, Alexander Aring wrote:
> 
> 
>>>   }
>>>     if (n->nlmsg_type != RTM_NEWTFILTER ||
>>>   !(n->nlmsg_flags & NLM_F_CREATE)) {
>>> +    NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and NLM_F_CREATE to create a new filter");
>>
>> that does not seem the right message. tc_ctl_tfilter is overloaded for
>> new, delete and get so the response here needs to reflect that. I
>> believe in this case the user did not specify a valid chain.
>>
> 
> Are you sure you are looking at the correct code?

tp = tcf_chain_tp_find(chain, _info, protocol,
   prio, prio_allocate);
if (IS_ERR(tp)) {
err = PTR_ERR(tp);
goto errout;
}

if (tp == NULL) {
/* Proto-tcf does not exist, create new one */

if (tca[TCA_KIND] == NULL || !protocol) {
err = -EINVAL;
goto errout;
}

if (n->nlmsg_type != RTM_NEWTFILTER ||
!(n->nlmsg_flags & NLM_F_CREATE)) {
err = -ENOENT;
goto errout;
}

Seems like that code path is run for other than RTM_NEWTFILTER. Even the
check there says != is ok -- just error out with an ENOENT.


> It is a create message that is at stake here.
> A create has to have RTM_NEWTFILTER and NLM_F_CREATE
> 
>> Also, the messages are targeted at users not developers, so no code
>> jargon / API references.
> 
> Generally true, but should this rule really be scripture?
> The main user here is tc in user space and it doesn't make mistakes
> in this case, i.e. we will never see this error with tc because a
> create will always have those two set correctly; OTOH, a developer
> writing some new app is more likely to make this mistake (in which
> case this message is very helpful).

argumentative. I have focused on adding specific error messages that
help a user understand why a command failed. It can be done without
referencing API names.
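David's point is that the same tc_ctl_tfilter() path serves new, delete and get requests, so an extack message on the "not found" branch should match the request the user actually made and avoid RTM_/NLM_ jargon. A hypothetical sketch of that idea; the enum, function and message strings are illustrative only, not the kernel's.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds; tc_ctl_tfilter() serves all three. */
enum filter_op { FILTER_NEW, FILTER_DEL, FILTER_GET };

/* Pick a user-facing message for the "no matching filter" branch,
 * worded in terms of what the user asked for. Returns 0 (and a NULL
 * message) when the caller should go on to create the filter. */
static int filter_not_found(enum filter_op op, bool create,
			    const char **msg)
{
	if (op == FILTER_NEW && create)
		*msg = NULL;
	else if (op == FILTER_NEW)
		*msg = "Filter does not exist and creation was not requested";
	else
		*msg = "Filter with specified priority/protocol not found";
	return *msg ? -ENOENT : 0;
}
```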

[PATCH net-next] mlxsw: spectrum: Make function mlxsw_sp_kvdl_part_occ() static

2018-01-16 Thread Wei Yongjun

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning:
 symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
index cfacc17..55f9d2d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
@@ -286,7 +286,7 @@ static void mlxsw_sp_kvdl_parts_fini(struct mlxsw_sp *mlxsw_sp)
mlxsw_sp_kvdl_part_fini(mlxsw_sp, i);
 }
 
-u64 mlxsw_sp_kvdl_part_occ(struct mlxsw_sp_kvdl_part *part)
+static u64 mlxsw_sp_kvdl_part_occ(struct mlxsw_sp_kvdl_part *part)
 {
unsigned int nr_entries;
int bit = -1;

[PATCH net-next] devlink: Make some functions static

2018-01-16 Thread Wei Yongjun

Fixes the following sparse warnings:

net/core/devlink.c:2297:25: warning:
 symbol 'devlink_resource_find' was not declared. Should it be static?
net/core/devlink.c:2322:6: warning:
 symbol 'devlink_resource_validate_children' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---
 net/core/devlink.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index dd7d6dd..66d3670 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2294,7 +2294,7 @@ static int devlink_nl_cmd_dpipe_table_counters_set(struct sk_buff *skb,
counters_enable);
 }
 
-struct devlink_resource *
+static struct devlink_resource *
 devlink_resource_find(struct devlink *devlink,
  struct devlink_resource *resource, u64 resource_id)
 {
@@ -2319,7 +2319,8 @@ struct devlink_resource *
return NULL;
 }
 
-void devlink_resource_validate_children(struct devlink_resource *resource)
+static void
+devlink_resource_validate_children(struct devlink_resource *resource)
 {
struct devlink_resource *child_resource;
bool size_valid = true;

[PATCHv2 net-next 1/1] forcedeth: remove unused variable

2018-01-16 Thread Zhu Yanjun

The variable miistat is assigned but never used, so remove it.

CC: Srinivas Eeda 
CC: Joe Jin 
CC: Junxiao Bi 
Signed-off-by: Zhu Yanjun 
---
v1->v2: Keep readl function
---
 drivers/net/ethernet/nvidia/forcedeth.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 21e15cb..a3f6d51 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -5510,11 +5510,9 @@ static int nv_open(struct net_device *dev)
/* One manual link speed update: Interrupts are enabled, future link
 * speed changes cause interrupts and are handled by nv_link_irq().
 */
-   {
-   u32 miistat;
-   miistat = readl(base + NvRegMIIStatus);
-   writel(NVREG_MIISTAT_MASK_ALL, base + NvRegMIIStatus);
-   }
+   readl(base + NvRegMIIStatus);
+   writel(NVREG_MIISTAT_MASK_ALL, base + NvRegMIIStatus);
+
/* set linkspeed to invalid value, thus force nv_update_linkspeed
 * to init hw */
np->linkspeed = 0;
-- 
2.7.4

Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

2018-01-16 Thread Alexei Starovoitov

On Fri, Jan 12, 2018 at 10:11:11AM -0800, John Fastabend wrote:
> This implements a BPF ULP layer to allow policy enforcement and
> monitoring at the socket layer. In order to support this a new
> program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
> the sendmsg/sendpage hook. To attach the policy to sockets a
> sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.
> 
> Similar to previous sockmap usages when a sock is added to a
> sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT
> program type attached then the BPF ULP layer is created on the
> socket and the attached BPF_PROG_TYPE_SK_MSG program is run for
> every msg in sendmsg case and page/offset in sendpage case.
> 
> BPF_PROG_TYPE_SK_MSG Semantics/API:
> 
> BPF_PROG_TYPE_SK_MSG supports only two return codes, SK_PASS and
> SK_DROP. Returning SK_DROP frees the copied data in the sendmsg
> case and leaves the data untouched in the sendpage case. Both cases
> return -EACCES to the user. Returning SK_PASS allows the msg to
> be sent.
> 
> In the sendmsg case data is copied into kernel space buffers before
> running the BPF program. In the sendpage case data is never copied.
> The implication is that users may change data after BPF programs run in
> the sendpage case. (A flag will be added shortly to always copy
> if the copy must always be performed.)
>
> The verdict from the BPF_PROG_TYPE_SK_MSG program applies to the entire
> msg in the sendmsg() case and the entire page/offset in the sendpage case.
> This avoids ambiguity on how to handle mixed return codes in the
> sendmsg case. The readable/writeable data provided to the program
> in the sendmsg case may not be the entire message; in fact, for
> large sends this is likely the case. The data range that can be
> read is part of the sk_msg_md structure. This is because, similar
> to the tc bpf_cls case, the data is stored in a scatter gather list.
> Future work will address this shortcoming to allow users to pull
> in more data if needed (similar to TC BPF).
> 
> The helper msg_redirect_map() can be used to select the socket to
> send the data on. This is used similarly to existing redirect use
> cases. This allows policy to redirect msgs.
> 
> Pseudo code simple example:
> 
> The basic logic to attach a program to a socket is as follows,
> 
>   // load the programs
>   bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
>   , _prog);
> 
>   // lookup the sockmap
>   bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");
> 
>   // get fd for sockmap
>   map_fd_msg = bpf_map__fd(bpf_map_msg);
> 
>   // attach program to sockmap
>   bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);
> 
> Adding sockets to the map is done in the normal way,
> 
>   // Add a socket 'fd' to sockmap at location 'i'
>   bpf_map_update_elem(map_fd_msg, , fd, BPF_ANY);
> 
> After the above any socket attached to "my_sock_map", in this case
> 'fd', will run the BPF msg verdict program (msg_prog) on every
> sendmsg and sendpage system call.
> 
> For a complete example see BPF selftests bpf/sockmap_tcp_msg_*.c and
> test_maps.c
> 
> Implementation notes:
> 
> It seemed the simplest, to me at least, to use a refcnt to ensure
> psock is not lost across the sendmsg copy into the sg, the bpf program
> running on the data in sg_data, and the final pass to the TCP stack.
> Some performance testing may show a better method to do this and avoid
> the refcnt cost, but for now use the simpler method.
> 
> Another item that will come after basic support is in place is
> supporting MSG_MORE flag. At the moment we call sendpages even if
> the MSG_MORE flag is set. An enhancement would be to collect the
> pages into a larger scatterlist and pass down the stack. Notice that
> bpf_tcp_sendmsg() could support this with some additional state saved
> across sendmsg calls. I built the code to support this without having
> to do refactoring work. Other flags TBD include ZEROCOPY flag.
> 
> Yet another detail that needs some thought is the size of scatterlist.
> Currently, we use MAX_SKB_FRAGS simply because this was being used
> already in the TLS case. Future work to improve the kernel sk APIs to
> tune this depending on workload may be useful. This is a trade-off
> between memory usage and B/s performance.
> 
> Signed-off-by: John Fastabend 

overall design looks clean. imo huge improvement from first version.

Few nits:

> ---
>  include/linux/bpf.h   |1 
>  include/linux/bpf_types.h |1 
>  include/linux/filter.h|   10 +
>  include/net/tcp.h |2 
>  include/uapi/linux/bpf.h  |   28 +++
>  kernel/bpf/sockmap.c  |  485 
> -
>  kernel/bpf/syscall.c  |   14 +
>  kernel/bpf/verifier.c |5 
>  net/core/filter.c |  106 ++
>  9 files changed, 638 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9e03046..14cdb4d
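The sendmsg semantics described in the commit message (copy first, one verdict for the whole message, SK_DROP frees the copy and fails with -EACCES) can be modeled in plain userspace C as below. This is not BPF program code; model_sendmsg() and drop_if_x() are hypothetical, only the SK_DROP/SK_PASS values are taken from the uapi enum, and the sendpage (no copy), scatterlist and redirect paths are not modeled.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the uapi enum sk_action values. */
enum sk_action { SK_DROP = 0, SK_PASS };

typedef enum sk_action (*msg_verdict_fn)(const void *data, size_t len);

/* Hypothetical model of the sendmsg path: copy the user data first,
 * run the verdict once over the whole message, then either hand it on
 * or free the copy and fail with -EACCES. */
static long model_sendmsg(const void *user, size_t len,
			  msg_verdict_fn prog, size_t *sent)
{
	void *copy = malloc(len ? len : 1);

	if (!copy)
		return -ENOMEM;
	memcpy(copy, user, len);        /* program sees a stable copy */
	if (prog(copy, len) != SK_PASS) {
		free(copy);             /* SK_DROP frees the copied data */
		return -EACCES;
	}
	*sent += len;                   /* stand-in for passing to TCP */
	free(copy);
	return (long)len;
}

/* Example verdict: drop any message that starts with 'X'. */
static enum sk_action drop_if_x(const void *data, size_t len)
{
	return (len && ((const char *)data)[0] == 'X') ? SK_DROP : SK_PASS;
}
```

Because the program inspects the kernel-side copy, later changes to the user buffer cannot alter what was judged, which is exactly the guarantee the commit message says sendpage lacks.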

[PATCH net-next] tcp: avoid negotiating ECN for BBR

2018-01-16 Thread Yuchung Cheng

This patch keeps BBR from negotiating ECN if sysctl ECN is
set. Prior to this patch, BBR negotiates ECN if enabled, sends
CWR upon receiving ECE ACKs but does not react to them. This can
cause confusion from the protocol perspective. Therefore this
patch prevents the connection from negotiating ECN if BBR is the
congestion control during the handshake.

Note that after the handshake, the user can still switch to a
different congestion control that supports or even requires ECN
(e.g. DCTCP). In that case, the connection cannot re-negotiate
ECN and has to go with the ECN-free mode in that congestion control.

There are other cases where BBR still responds to ECE ACKs with CWR
but does not otherwise react to them, as it did before this patch. First,
when the user switches to BBR congestion control on a connection that
has already negotiated ECN. Second, when the system has configured
the IP route and/or uses eBPF to enable ECN on a connection that
uses BBR congestion control.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Acked-by: Yousuk Seung 
Acked-by: Eric Dumazet 
---
 include/net/tcp.h | 7 +++
 net/ipv4/tcp_bbr.c| 2 +-
 net/ipv4/tcp_input.c  | 3 ++-
 net/ipv4/tcp_output.c | 6 --
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6939e69d3c37..22345132d969 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -925,6 +925,8 @@ enum tcp_ca_ack_event_flags {
 #define TCP_CONG_NON_RESTRICTED 0x1
 /* Requires ECN/ECT set on all packets */
 #define TCP_CONG_NEEDS_ECN 0x2
+/* Does not use or react to ECN */
+#define TCP_CONG_DONT_USE_ECN  0x4
 
 union tcp_cc_info;
 
@@ -1033,6 +1035,11 @@ static inline bool tcp_ca_needs_ecn(const struct sock *sk)
return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN;
 }
 
+static inline bool tcp_ca_uses_ecn(const struct sock *sk)
+{
+   return !(inet_csk(sk)->icsk_ca_ops->flags & TCP_CONG_DONT_USE_ECN);
+}
+
 static inline void tcp_set_ca_state(struct sock *sk, const u8 ca_state)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 8322f26e770e..27456554b113 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -926,7 +926,7 @@ static void bbr_set_state(struct sock *sk, u8 new_state)
 }
 
 static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = {
-   .flags  = TCP_CONG_NON_RESTRICTED,
+   .flags  = TCP_CONG_NON_RESTRICTED | TCP_CONG_DONT_USE_ECN,
.name   = "bbr",
.owner  = THIS_MODULE,
.init   = bbr_init,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ff71b18d9682..6731d0b9b146 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6090,7 +6090,8 @@ static void tcp_ecn_create_request(struct request_sock *req,
 
ect = !INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield);
ecn_ok_dst = dst_feature(dst, DST_FEATURE_ECN_MASK);
-   ecn_ok = net->ipv4.sysctl_tcp_ecn || ecn_ok_dst;
+   ecn_ok = ecn_ok_dst ||
+(net->ipv4.sysctl_tcp_ecn && tcp_ca_uses_ecn(listen_sk));
 
if ((!ect && ecn_ok) || tcp_ca_needs_ecn(listen_sk) ||
(ecn_ok_dst & DST_FEATURE_ECN_CA) ||
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95461f02ac9a..446cb65090f5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -312,8 +312,10 @@ static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 {
struct tcp_sock *tp = tcp_sk(sk);
bool bpf_needs_ecn = tcp_bpf_ca_needs_ecn(sk);
-   bool use_ecn = sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 ||
-   tcp_ca_needs_ecn(sk) || bpf_needs_ecn;
+   bool use_ecn = tcp_ca_needs_ecn(sk) || bpf_needs_ecn;
+
+   if (sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 && tcp_ca_uses_ecn(sk))
+   use_ecn = true;
 
if (!use_ecn) {
const struct dst_entry *dst = __sk_dst_get(sk);
-- 
2.16.0.rc1.238.g530d649a79-goog
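The two changed expressions in tcp_ecn_send_syn() and tcp_ecn_create_request() can be checked in isolation. The sketch below re-expresses only those expressions as plain C; the function names are hypothetical, and the real active-open path additionally consults the route (dst_feature) when use_ecn is still false, which is omitted here. The flag values are copied from the patch's include/net/tcp.h hunk.

```c
#include <stdbool.h>

/* Flag values from the patch's include/net/tcp.h hunk. */
#define TCP_CONG_NON_RESTRICTED 0x1
#define TCP_CONG_NEEDS_ECN      0x2
#define TCP_CONG_DONT_USE_ECN   0x4

/* Active open (SYN): sysctl_tcp_ecn == 1 only requests ECN when the
 * congestion control does not opt out; a CC that needs ECN, or a BPF
 * program asking for it, still forces it. */
static bool syn_requests_ecn(int sysctl_tcp_ecn, unsigned int ca_flags,
			     bool bpf_needs_ecn)
{
	bool use_ecn = (ca_flags & TCP_CONG_NEEDS_ECN) || bpf_needs_ecn;

	if (sysctl_tcp_ecn == 1 && !(ca_flags & TCP_CONG_DONT_USE_ECN))
		use_ecn = true;
	return use_ecn;
}

/* Passive open: any non-zero sysctl allows ECN unless the listener's
 * CC opts out; a per-route ECN feature still overrides. */
static bool listener_ecn_ok(int sysctl_tcp_ecn, unsigned int ca_flags,
			    bool route_ecn_ok)
{
	return route_ecn_ok ||
	       (sysctl_tcp_ecn && !(ca_flags & TCP_CONG_DONT_USE_ECN));
}
```

With BBR's flags (TCP_CONG_NON_RESTRICTED | TCP_CONG_DONT_USE_ECN), the sysctl alone no longer enables ECN, while route and BPF overrides remain, matching the "other cases" listed in the commit message.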

[PATCH] net:l2tp: Allow MAC to be configured via netlink

2018-01-16 Thread Isaac Lee

The Linux kernel by default uses a random MAC address
for l2tp interfaces. However, there are situations
where it is desirable to have a deterministic MAC address.

A sample scenario is one where the host IP stack is attached
directly to a tunnel, so the "random" address is propagated
via ARP/ND to the other end of the tunnel. If the device reboots,
a new MAC is used and communication over the tunnel is
disrupted until the new MAC address is re-learned by the peer.
Therefore it can be useful to keep the MAC address the same across
reboots.

This patch makes the MAC address configurable via
netlink so that a userspace program can specify which MAC address to
use when the interface is created in the kernel.

Signed-off-by: Isaac Lee 
---
 include/uapi/linux/l2tp.h | 1 +
 net/l2tp/l2tp_core.h  | 1 +
 net/l2tp/l2tp_eth.c   | 7 ++-
 net/l2tp/l2tp_netlink.c   | 3 +++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
index d84ce5c1c9aa..fec15fd774c4 100644
--- a/include/uapi/linux/l2tp.h
+++ b/include/uapi/linux/l2tp.h
@@ -126,6 +126,7 @@ enum {
L2TP_ATTR_IP6_DADDR,/* struct in6_addr */
L2TP_ATTR_UDP_ZERO_CSUM6_TX,/* flag */
L2TP_ATTR_UDP_ZERO_CSUM6_RX,/* flag */
+   L2TP_ATTR_HWADDR,   /* 6 bytes */
L2TP_ATTR_PAD,
__L2TP_ATTR_MAX,
 };
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 9534e16965cc..730021289ce5 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -71,6 +71,7 @@ struct l2tp_session_cfg {
int mtu;
int mru;
char*ifname;
+   unsigned char   hwaddr[ETH_ALEN];
 };
 
 struct l2tp_session {
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 5c366ecfa1cb..0e6ef5379b64 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -58,7 +58,9 @@ struct l2tp_eth_sess {
 
 static int l2tp_eth_dev_init(struct net_device *dev)
 {
-   eth_hw_addr_random(dev);
+   /* Use random MAC only when the interface is created without dev_addr */
+   if (!dev->dev_addr || !is_valid_ether_addr(dev->dev_addr))
+   eth_hw_addr_random(dev);
eth_broadcast_addr(dev->broadcast);
netdev_lockdep_set_classes(dev);
 
@@ -309,6 +311,9 @@ static int l2tp_eth_create(struct net *net, struct l2tp_tunnel *tunnel,
dev->max_mtu = ETH_MAX_MTU;
l2tp_eth_adjust_mtu(tunnel, session, dev);
 
+   if (is_valid_ether_addr(cfg->hwaddr))
+   ether_addr_copy(dev->dev_addr, cfg->hwaddr);
+
priv = netdev_priv(dev);
priv->session = session;
 
diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index a1f24fb2be98..dc2933c32121 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -607,6 +607,9 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf
if (info->attrs[L2TP_ATTR_MRU])
cfg.mru = nla_get_u16(info->attrs[L2TP_ATTR_MRU]);
 
+   if (info->attrs[L2TP_ATTR_HWADDR])
+   memcpy(, nla_data(info->attrs[L2TP_ATTR_HWADDR]), ETH_ALEN);
+
 #ifdef CONFIG_MODULES
if (l2tp_nl_cmd_ops[cfg.pw_type] == NULL) {
genl_unlock();
-- 
2.15.1
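The address selection the patch introduces (copy a valid configured address at create time, otherwise fall back to a random one in l2tp_eth_dev_init()) can be modeled in userspace as below. The helper names are hypothetical, the random bytes are passed in by the caller to keep the example deterministic, and the validity test and bit fixups follow the behavior of the kernel's is_valid_ether_addr() and eth_random_addr().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#ifndef ETH_ALEN
#define ETH_ALEN 6
#endif

/* Same rule as is_valid_ether_addr(): not multicast (bit 0 of the
 * first octet clear) and not all zeros. */
static bool valid_ether_addr(const uint8_t a[ETH_ALEN])
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Use the configured address when valid; an unset (all-zero) address
 * falls back to a random one, made unicast and locally administered
 * the way eth_random_addr() does. */
static void pick_dev_addr(uint8_t dev_addr[ETH_ALEN],
			  const uint8_t cfg_hwaddr[ETH_ALEN],
			  const uint8_t rnd[ETH_ALEN])
{
	if (valid_ether_addr(cfg_hwaddr)) {
		memcpy(dev_addr, cfg_hwaddr, ETH_ALEN);
		return;
	}
	memcpy(dev_addr, rnd, ETH_ALEN);
	dev_addr[0] &= 0xfe;  /* clear multicast bit */
	dev_addr[0] |= 0x02;  /* set locally-administered bit */
}
```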

Re: [PATCH net-next 0/8] net: sched: cls: add extack support

2018-01-16 Thread Daniel Borkmann

On 01/17/2018 01:08 AM, Daniel Borkmann wrote:
> Hey David, and others, [+Alexei]
> 
> On 01/17/2018 12:27 AM, Jamal Hadi Salim wrote:
>> On 18-01-16 05:41 PM, Jakub Kicinski wrote:
>>> On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote:
 On 18-01-16 04:46 PM, Jakub Kicinski wrote:
> On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote:

 [..]

 I would say precedence should be Jiri's patches, Alex's patches
 and then yours:
 Alex's patches fix the core (cls_api.c) area with proper extack
 for the core and then he has one patch to cover a specific
 use case of the u32 classifier extack. Yours is only concerned
 with one use case - bpf which depend on the core (that is in Alex's
 patches)
>>>
>>> Our patches are concerned with propagating the extack to drivers,
>>> and nfp (and netdevsim) make use of it.
>>>
>>> I'm miffed by the fact that you jumped out with this conflicting series
>>> *after* we posted ours, and we got shot down on white space.
> 
> So I've been looking over Quentin's series just now that sits in my
> bucket and it looks fine to me, but merge with this one would probably
> end up badly for David. Therefore I'm proposing the following that
> should hopefully be fine and work out for Alexander and Jakub/Quentin
> as a consensus:
> 
> I'm getting the current bpf-next stuff as PR out in a few minutes, so
> David can pull this in and therefore net-next will also have the
> dependency on nfp for Quentin's series. Then, given this one here
> needs another respin anyway, I would suggest to combine the missing
> patches from Alexander's series, and get it all out in a single patch
> series directly for net-next w/o any interdependency hassle.

Ok, bpf-next PR with the nfp dependencies is now out, so all this can
make progress here. I've therefore purged Jakub's extack series from
bpf queue, so a combined series can target net-next directly then.

Thanks,
Daniel

linux-next: manual merge of the net-next tree with the net tree

2018-01-16 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got conflicts in:

  net/sched/sch_ingress.c
  net/sched/sch_api.c
  include/net/sch_generic.h

between commit:

  81d947e2b8dd ("net, sched: fix panic when updating miniq {b,q}stats")

from the net tree and commits:

  54160ef6ec64 ("net: sched: sch_api: rearrange init handling")
  8d1a77f974ca ("net: sch: api: add extack support in tcf_block_get")
  d0bd684dddab ("net: sch: api: add extack support in qdisc_alloc")

from the net-next tree.

I fixed it up (I think, see below) and can carry the fix as necessary.
This is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/net/sch_generic.h
index becf86aa4ac6,ac029d5d88e4..
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@@ -444,10 -471,11 +471,12 @@@ void qdisc_destroy(struct Qdisc *qdisc)
  void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
  struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
- const struct Qdisc_ops *ops);
+ const struct Qdisc_ops *ops,
+ struct netlink_ext_ack *extack);
 +void qdisc_free(struct Qdisc *qdisc);
  struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
-   const struct Qdisc_ops *ops, u32 parentid);
+   const struct Qdisc_ops *ops, u32 parentid,
+   struct netlink_ext_ack *extack);
  void __qdisc_calculate_pkt_len(struct sk_buff *skb,
   const struct qdisc_size_table *stab);
  int skb_do_redirect(struct sk_buff *);
diff --cc net/sched/sch_api.c
index 52529b7f8d96,0038a1c44ee9..
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@@ -1062,43 -1088,64 +1088,53 @@@ static struct Qdisc *qdisc_create(struc
netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
}
  
-   if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) {
-   if (tca[TCA_STAB]) {
-   stab = qdisc_get_stab(tca[TCA_STAB]);
-   if (IS_ERR(stab)) {
-   err = PTR_ERR(stab);
-   goto err_out4;
-   }
-   rcu_assign_pointer(sch->stab, stab);
-   }
-   if (tca[TCA_RATE]) {
-   seqcount_t *running;
- 
-   err = -EOPNOTSUPP;
-   if (sch->flags & TCQ_F_MQROOT)
-   goto err_out4;
- 
-   if ((sch->parent != TC_H_ROOT) &&
-   !(sch->flags & TCQ_F_INGRESS) &&
-   (!p || !(p->flags & TCQ_F_MQROOT)))
-   running = qdisc_root_sleeping_running(sch);
-   else
-   running = >running;
- 
-   err = gen_new_estimator(>bstats,
-   sch->cpu_bstats,
-   >rate_est,
-   NULL,
-   running,
-   tca[TCA_RATE]);
-   if (err)
-   goto err_out4;
+   if (ops->init) {
+   err = ops->init(sch, tca[TCA_OPTIONS], extack);
+   if (err != 0)
+   goto err_out5;
+   }
+ 
 -  if (qdisc_is_percpu_stats(sch)) {
 -  sch->cpu_bstats =
 -  netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
 -  if (!sch->cpu_bstats)
 -  goto err_out4;
 -
 -  sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
 -  if (!sch->cpu_qstats)
 -  goto err_out4;
 -  }
 -
+   if (tca[TCA_STAB]) {
+   stab = qdisc_get_stab(tca[TCA_STAB], extack);
+   if (IS_ERR(stab)) {
+   err = PTR_ERR(stab);
+   goto err_out4;
}
+   rcu_assign_pointer(sch->stab, stab);
+   }
+   if (tca[TCA_RATE]) {
+   seqcount_t *running;
  
-   qdisc_hash_add(sch, false);
+   err = -EOPNOTSUPP;
+   if (sch->flags & TCQ_F_MQROOT) {
+   NL_SET_ERR_MSG(extack, "Cannot attach rate estimator to a multi-queue root qdisc");
+   goto err_out4;
+   }
  
-   return sch;
+   if (sch->parent != TC_H_ROOT &&
+

Re: [bpf-next PATCH 0/3] libbpf: cleanups to Makefile

2018-01-16 Thread Daniel Borkmann

On 01/17/2018 12:20 AM, Jesper Dangaard Brouer wrote:
> This patchset contains some small improvements and cleanup for
> the Makefile in tools/lib/bpf/.
> 
> It worries me that the libbpf.so shared library is not versioned, but
> it is not addressed in this patchset.

Looks good; applied it to bpf-next, thanks Jesper!

Re: [PATCH bpf-next 0/6] bpf: various fixes and improvements

2018-01-16 Thread Daniel Borkmann

On 01/17/2018 12:51 AM, Jakub Kicinski wrote:
> Hi!
> 
> This series combines a number of random improvements ranging from
> libbpf to nfp driver.  NFP patches make better use of the verifier
> log.  There is a requested adjustment to the map offload code, and
> a warning fix for a W=1 build to the disassembler.  Quentin also
> fixes the libbpf program type detection, while Jiong allows the use
> of libbfd compiled from source.

I did apply the series to bpf-next, thanks guys!

pull-request: bpf-next 2018-01-17

2018-01-16 Thread Daniel Borkmann

Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add initial BPF map offloading for the nfp driver. Previously only
   programs could be offloaded, without being able to access maps.
   Offloaded programs are right now only allowed to perform map
   lookups, and control path is responsible for populating the
   maps. BPF core infrastructure along with nfp implementation is
   provided, from Jakub.

2) Various follow-ups to Josef's BPF error injections. More
   specifically that includes: properly check whether the error
   injectable event is on function entry or not, remove the percpu
   bpf_kprobe_override and rather compare instruction pointer
   with original one, separate error-injection from kprobes since
   it's not limited to it, add injectable error types in order to
   specify what is the expected type of failure, and last but not
   least also support the kernel's fault injection framework, all
   from Masami.

3) Various misc improvements and cleanups to the libbpf Makefile.
   That is, fix permissions when installing BPF header files, remove
   unused variables and functions, and also install the libbpf.h
   header, from Jesper.

4) When offloading to the nfp JIT and a BPF insn is unsupported by the
   JIT, reject the program right at verification time. Also fix libbpf
   with regard to ELF section name matching by properly treating the
   program type as a prefix. Both from Quentin.

5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler.
   This is needed, for example, when building libbfd from source, as
   bpftool doesn't supply a config.h for bfd.h. Fix from Jiong.

6) xdp_convert_ctx_access() is simplified since it doesn't need to
   set target size during verification, from Jesper.

7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE
   program types, from Roman.

8) Various functions in BPF cpumap were not declared static, from Wei.

9) Fix a double semicolon in BPF samples, from Luis.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!



The following changes since commit 6bd39bc3da0f4a301fae69c4a32db2768f5118be:

  Merge branch 'hns3-add-some-new-features-and-fix-some-bugs' (2018-01-12 10:12:33 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to e8a9d9683c8a62f917c19e57f1618363fb9ed04e:

  Merge branch 'bpf-libbpf-cleanups' (2018-01-17 01:18:12 +0100)


Alexei Starovoitov (1):
  Merge branch 'error-injection'

Daniel Borkmann (3):
  Merge branch 'bpf-nfp-map-offload'
  Merge branch 'bpf-various-improvements'
  Merge branch 'bpf-libbpf-cleanups'

Jakub Kicinski (18):
  bpf: add map_alloc_check callback
  bpf: hashtab: move attribute validation before allocation
  bpf: hashtab: move checks out of alloc function
  bpf: add helper for copying attrs to struct bpf_map
  bpf: rename bpf_dev_offload -> bpf_prog_offload
  bpf: offload: factor out netdev checking at allocation time
  bpf: offload: add map offload infrastructure
  nfp: bpf: add map data structure
  nfp: bpf: add basic control channel communication
  nfp: bpf: implement helpers for FW map ops
  nfp: bpf: parse function call and map capabilities
  nfp: bpf: add helpers for updating immediate instructions
  nfp: bpf: add verification and codegen for map lookups
  nfp: bpf: add support for reading map memory
  nfp: bpf: implement bpf map offload
  bpf: offload: make bpf_offload_dev_match() reject host+host case
  bpf: annotate bpf_insn_print_t with __printf
  nfp: bpf: print map lookup problems into verifier log

Jesper Dangaard Brouer (4):
  bpf: simplify xdp_convert_ctx_access for xdp_rxq_info
  libbpf: install the header file libbpf.h
  libbpf: cleanup Makefile, remove unused elements
  libbpf: Makefile set specified permission mode

Jiong Wang (1):
  tools: bpftool: add -DPACKAGE when including bfd.h

Luis de Bethencourt (1):
  samples/bpf: Fix trailing semicolon

Masami Hiramatsu (5):
  tracing/kprobe: bpf: Check error injectable event is on function entry
  tracing/kprobe: bpf: Compare instruction pointer with original one
  error-injection: Separate error-injection from kprobe
  error-injection: Add injectable error types
  error-injection: Support fault injection framework

Quentin Monnet (2):
  libbpf: fix string comparison for guessing eBPF program type
  nfp: bpf: reject program on instructions unknown to the JIT compiler

Roman Gushchin (1):
  bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs

Wei Yongjun (1):
  bpf: cpumap: make some functions static

 Documentation/fault-injection/fault-injection.txt  |

Re: [PATCH 32/32] aio: implement io_pgetevents

2018-01-16 Thread Jeff Moyer

Hi, Christoph,

Christoph Hellwig  writes:

> On Mon, Jan 15, 2018 at 09:53:10AM +0100, Christoph Hellwig wrote:
>> > pselect, as an example, crams the sigmask and size together.  Why not
>> > just do that?  libaio can take care of setting that up.
>> 
>> Yes, I could try that.  It's just another double indirection for no
>> good reason.
>
> I can't get this to work - for some reason I always end up with a NULL
> sigmask in the kernel.  Never mind that it leads to really crap code
> generation.  I guess for select the latter doesn't matter too much as
> everyone sane uses ppoll anyway.

I'd be willing to bet the issue is in your io_syscall6 implementation.
You pass in arg5 where arg6 should be used.  Don't feel bad, it took me
the better part of today to figure that out.  :)

Here's an incremental diff on top of what you've posted.  Feel free to
fold it into your patch (and format however you like).  You can find the
libaio changes in my 'aio-poll' branch:
  https://pagure.io/libaio/commits/aio-poll

These changes were run through the libaio test harness, 64 bit and 32
bit, so the compat system call was tested.

Cheers,
Jeff

diff --git a/fs/aio.c b/fs/aio.c
index 57a4e8d..c6d67d0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1991,9 +1991,11 @@ static long do_io_getevents(aio_context_t ctx_id,
long, nr,
struct io_event __user *, events,
struct timespec __user *, timeout,
-   const sigset_t __user *, sigmask)
+   void __user *, sigmask)
 {
sigset_t        ksigmask, sigsaved;
+   size_t  sigsetsize = 0;
+   sigset_t __user *up = NULL;
struct timespec64   ts;
int ret;
 
@@ -2001,8 +2003,18 @@ static long do_io_getevents(aio_context_t ctx_id,
return -EFAULT;
 
if (sigmask) {
-   if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
+   if (!access_ok(VERIFY_READ, sigmask,
+  sizeof(void *) + sizeof(size_t)) ||
+   __get_user(up, (sigset_t __user * __user *)sigmask) ||
+   __get_user(sigsetsize,
+  (size_t __user *)(sigmask + sizeof(void *))))
return -EFAULT;
+
+   if (sigsetsize != sizeof(sigset_t))
+   return -EINVAL;
+   if (copy_from_user(&ksigmask, up, sizeof(ksigmask)))
+   return -EFAULT;
+
sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
}
@@ -2049,8 +2061,11 @@ static long do_io_getevents(aio_context_t ctx_id,
compat_long_t, nr,
struct io_event __user *, events,
struct compat_timespec __user *, timeout,
-   const compat_sigset_t __user *, sigmask)
+   void __user *, sig)
 {
+   compat_size_t sigsetsize = 0;
+   compat_sigset_t __user *sigmask;
+   compat_uptr_t up = 0;
sigset_t ksigmask, sigsaved;
struct timespec64 t;
int ret;
@@ -2058,8 +2073,17 @@ static long do_io_getevents(aio_context_t ctx_id,
if (timeout && compat_get_timespec64(&t, timeout))
return -EFAULT;
 
-   if (sigmask) {
-   if (get_compat_sigset(&ksigmask, sigmask))
+   if (sig) {
+   if (!access_ok(VERIFY_READ, sig,
+   sizeof(compat_uptr_t) + sizeof(compat_size_t)) ||
+   __get_user(up, (compat_uptr_t __user *)sig) ||
+   __get_user(sigsetsize,
+  (compat_size_t __user *)(sig + sizeof(up))))
+   return -EFAULT;
+
+   if (sigsetsize != sizeof(compat_sigset_t))
+   return -EINVAL;
+   if (get_compat_sigset(&ksigmask, compat_ptr(up)))
return -EFAULT;
sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a4cda98..32412f8 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -541,7 +541,7 @@ asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
compat_long_t nr,
struct io_event __user *events,
struct compat_timespec __user *timeout,
-   const compat_sigset_t __user *sigmask);
+   void __user *sigmask);
 asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr,
 u32 __user *iocb);
 asmlinkage long compat_sys_mount(const char __user *dev_name,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3bc9a13..bc79026 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -544,7 +544,7

Re: [PATCH bpf-next 0/3] bpf: add dumping and disassembler for non-host JITs

2018-01-16 Thread Jakub Kicinski

On Tue, 16 Jan 2018 16:05:18 -0800, Jakub Kicinski wrote:
> Hi,

Ah, forgot to insert "Jiong says:" here :)

> Currently bpftool can disassemble host JITed images, for example x86_64,
> using libbfd. However, it can't disassemble offloaded JITed images.
> 
> There are two reasons:
> 
>   1. bpf_obj_get_info_by_fd/struct bpf_prog_info can't report the address
>      and length of the JITed image.
> 
>   2. Even after issue 1 is resolved, bpftool can't figure out the offload
>      arch from bpf_prog_info, and therefore can't drive the libbfd
>      disassembler correctly.
> 
>   This patch set resolves issue 1 by introducing two new fields "jited_len"
> and "jited_image" in bpf_dev_offload. These two fields serve as the generic
> interface to communicate the JITed image address and length for all offload
> targets to higher-level callers. For example, bpf_obj_get_info_by_fd can
> use them to fill the userspace-visible fields jited_prog_len and
> jited_prog_insns.
> 
>   This patch set resolves issue 2 by getting the bfd backend name through
> "ifindex", i.e. the network interface index.
> 
> v1:
>  - Deduce the bfd arch name through ifindex, i.e. the network interface index.
>    First, map ifindex to devname through ifindex_to_name_ns, then get the
>    PCI id through /sys/class/net/DEVNAME/device/vendor. (Daniel, Alexei)

Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors

2018-01-16 Thread Jamal Hadi Salim


On 18-01-16 06:58 PM, David Ahern wrote:

> On 1/16/18 9:20 AM, Alexander Aring wrote:
>>  	}
>>  
>>  		if (n->nlmsg_type != RTM_NEWTFILTER ||
>>  		    !(n->nlmsg_flags & NLM_F_CREATE)) {
>> +			NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and NLM_F_CREATE to create a new filter");
>
> that does not seem the right message. tc_ctl_tfilter is overloaded for
> new, delete and get so the response here needs to reflect that. I
> believe in this case the user did not specify a valid chain.

Are you sure you are looking at the correct code?
It is a create message that is at stake here.
A create has to have RTM_NEWTFILTER and NLM_F_CREATE.

> Also, the messages are targeted at users not developers, so no code
> jargon / API references.

Generally true, but should this rule really be scripture?
The main user here is tc in user space and it doesn't make mistakes
in this case, i.e. we will never see this error with tc because a
create will always have those two set correctly; OTOH, a developer
writing some new app is more likely to make this mistake (in which
case this message is very helpful).

cheers,
jamal

Re: [PATCH net-next 0/8] net: sched: cls: add extack support

2018-01-16 Thread Daniel Borkmann

Hey David, and others, [+Alexei]

On 01/17/2018 12:27 AM, Jamal Hadi Salim wrote:
> On 18-01-16 05:41 PM, Jakub Kicinski wrote:
>> On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote:
>>> On 18-01-16 04:46 PM, Jakub Kicinski wrote:
>>>> On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote:
>>>
>>> [..]
>>>
>>> I would say precedence should be Jiri's patches, Alex's patches
>>> and then yours:
>>> Alex's patches fix the core (cls_api.c) area with proper extack
>>> for the core and then he has one patch to cover a specific
>>> use case of the u32 classifier extack. Yours is only concerned
>>> with one use case - bpf, which depends on the core (that is in Alex's
>>> patches)
>>
>> Our patches are concerned with propagating the extack to drivers,
>> and nfp (and netdevsim) make use of it.
>>
>> I'm miffed by the fact that you jumped out with this conflicting series
>> *after* we posted ours, and we got shot down on white space.

So I've just been looking over Quentin's series, which sits in my
bucket, and it looks fine to me, but merging it with this one would
probably end badly for David. Therefore I'm proposing the following,
which should hopefully work out for Alexander and Jakub/Quentin as a
consensus:

I'm sending the current bpf-next material out as a PR in a few minutes,
so David can pull it in, and net-next will then also have the nfp
dependency for Quentin's series. Then, given that this one here needs
another respin anyway, I would suggest combining the missing patches
from Alexander's series and getting it all out as a single patch
series directly for net-next, w/o any interdependency hassle.

Thanks,
Daniel

[PATCH bpf-next 1/3] bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info

2018-01-16 Thread Jakub Kicinski

From: Jiong Wang 

For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog
used by all host JIT targets to get the JITed image and its length. For
offload, however, targets are likely to have different offload mechanisms
and keep this information in device-private data fields.

Therefore, the BPF_OBJ_GET_INFO_BY_FD syscall needs a unified way to get
the JIT length and contents for offload targets.

One way is to introduce a new callback to parse the device-private data and
then fill those fields in bpf_prog_info. This might be a little heavy; the
other way is to add generic fields which will be initialized by all offload
targets.

This patch follows the second approach, introducing two new fields in
struct bpf_dev_offload and teaching bpf_prog_get_info_by_fd about them so
that it fills in the correct jited_prog_len and jited_prog_insns in
bpf_prog_info.

Reviewed-by: Jakub Kicinski 
Signed-off-by: Jiong Wang 
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/offload.c | 23 +++
 kernel/bpf/syscall.c | 31 ++-
 3 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5c2c104dc2c5..025b1c2f8053 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -234,6 +234,8 @@ struct bpf_prog_offload {
struct list_head        offloads;
bool                    dev_state;
const struct bpf_prog_offload_ops *dev_ops;
+   void                *jited_image;
+   u32 jited_len;
 };
 
 struct bpf_prog_aux {
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index a88cebf368bf..6c0baa1cf8f8 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -230,9 +230,12 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
.prog   = prog,
.info   = info,
};
+   struct bpf_prog_aux *aux = prog->aux;
struct inode *ns_inode;
struct path ns_path;
+   char __user *uinsns;
void *res;
+   u32 ulen;
 
res = ns_get_path_cb(&ns_path, bpf_prog_offload_info_fill_ns, &args);
if (IS_ERR(res)) {
@@ -241,6 +244,26 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
return PTR_ERR(res);
}
 
+   down_read(&bpf_devs_lock);
+
+   if (!aux->offload) {
+   up_read(&bpf_devs_lock);
+   return -ENODEV;
+   }
+
+   ulen = info->jited_prog_len;
+   info->jited_prog_len = aux->offload->jited_len;
+   if (info->jited_prog_len && ulen) {
+   uinsns = u64_to_user_ptr(info->jited_prog_insns);
+   ulen = min_t(u32, info->jited_prog_len, ulen);
+   if (copy_to_user(uinsns, aux->offload->jited_image, ulen)) {
+   up_read(&bpf_devs_lock);
+   return -EFAULT;
+   }
+   }
+
+   up_read(&bpf_devs_lock);
+
ns_inode = ns_path.dentry->d_inode;
info->netns_dev = new_encode_dev(ns_inode->i_sb->s_dev);
info->netns_ino = ns_inode->i_ino;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c691b9e972e3..c28524483bf4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1724,19 +1724,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
goto done;
}
 
-   ulen = info.jited_prog_len;
-   info.jited_prog_len = prog->jited_len;
-   if (info.jited_prog_len && ulen) {
-   if (bpf_dump_raw_ok()) {
-   uinsns = u64_to_user_ptr(info.jited_prog_insns);
-   ulen = min_t(u32, info.jited_prog_len, ulen);
-   if (copy_to_user(uinsns, prog->bpf_func, ulen))
-   return -EFAULT;
-   } else {
-   info.jited_prog_insns = 0;
-   }
-   }
-
ulen = info.xlated_prog_len;
info.xlated_prog_len = bpf_prog_insn_size(prog);
if (info.xlated_prog_len && ulen) {
@@ -1762,6 +1749,24 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
err = bpf_prog_offload_info_fill(&info, prog);
if (err)
return err;
+   goto done;
+   }
+
+   /* NOTE: the following code is supposed to be skipped for offload.
+* bpf_prog_offload_info_fill() is the place to fill similar fields
+* for offload.
+*/
+   ulen = info.jited_prog_len;
+   info.jited_prog_len = prog->jited_len;
+   if (info.jited_prog_len && ulen) {
+   if (bpf_dump_raw_ok()) {
+   uinsns = u64_to_user_ptr(info.jited_prog_insns);
+   ulen = min_t(u32, info.jited_prog_len, ulen);
+   if (copy_to_user(uinsns, prog->bpf_func, ulen))
+   return -EFAULT;
+   } else {
+   info.jited_prog_insns = 0;
+

[PATCH bpf-next 2/3] nfp: bpf: set new jit info fields

2018-01-16 Thread Jakub Kicinski

From: Jiong Wang 

This patch sets the new JIT info fields introduced earlier in this patch set.

Reviewed-by: Jakub Kicinski 
Signed-off-by: Jiong Wang 
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 9c78a09cda24..4c1cea68f19e 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -127,6 +127,7 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
unsigned int stack_size;
unsigned int max_instr;
+   int err;
 
stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
if (prog->aux->stack_depth > stack_size) {
@@ -143,7 +144,14 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
if (!nfp_prog->prog)
return -ENOMEM;
 
-   return nfp_bpf_jit(nfp_prog);
+   err = nfp_bpf_jit(nfp_prog);
+   if (err)
+   return err;
+
+   prog->aux->offload->jited_len = nfp_prog->prog_len * sizeof(u64);
+   prog->aux->offload->jited_image = nfp_prog->prog;
+
+   return 0;
 }
 
 static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
-- 
2.15.1

[PATCH bpf-next 3/3] tools: bpftool: improve architecture detection by using ifindex

2018-01-16 Thread Jakub Kicinski

From: Jiong Wang 

The current architecture detection method in bpftool is designed for the
host case.

For the offload case, we can't use the architecture of bpftool itself.
Instead, we can call the existing ifindex_to_name_ns() to get DEVNAME,
then read the PCI vendor id from /sys/class/net/DEVNAME/device/vendor, and
finally map the vendor id to a bfd arch name, which is then used to select
the bfd backend for the disassembler.

Reviewed-by: Jakub Kicinski 
Signed-off-by: Jiong Wang 
---
 tools/bpf/bpftool/common.c | 72 ++
 tools/bpf/bpftool/jit_disasm.c | 16 +-
 tools/bpf/bpftool/main.h   |  5 ++-
 tools/bpf/bpftool/prog.c   | 12 ++-
 4 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 6601c95a9258..0b482c0070e0 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -34,6 +34,7 @@
 /* Author: Jakub Kicinski  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -433,6 +434,77 @@ ifindex_to_name_ns(__u32 ifindex, __u32 ns_dev, __u32 ns_ino, char *buf)
return if_indextoname(ifindex, buf);
 }
 
+static int read_sysfs_hex_int(char *path)
+{
+   char vendor_id_buf[8];
+   int len;
+   int fd;
+
+   fd = open(path, O_RDONLY);
+   if (fd < 0) {
+   p_err("Can't open %s: %s", path, strerror(errno));
+   return -1;
+   }
+
+   len = read(fd, vendor_id_buf, sizeof(vendor_id_buf));
+   close(fd);
+   if (len < 0) {
+   p_err("Can't read %s: %s", path, strerror(errno));
+   return -1;
+   }
+   if (len >= (int)sizeof(vendor_id_buf)) {
+   p_err("Value in %s too long", path);
+   return -1;
+   }
+
+   vendor_id_buf[len] = 0;
+
+   return strtol(vendor_id_buf, NULL, 0);
+}
+
+static int read_sysfs_netdev_hex_int(char *devname, const char *entry_name)
+{
+   char full_path[64];
+
+   snprintf(full_path, sizeof(full_path), "/sys/class/net/%s/device/%s",
+devname, entry_name);
+
+   return read_sysfs_hex_int(full_path);
+}
+
+const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino)
+{
+   char devname[IF_NAMESIZE];
+   int vendor_id;
+   int device_id;
+
+   if (!ifindex_to_name_ns(ifindex, ns_dev, ns_ino, devname)) {
+   p_err("Can't get net device name for ifindex %d: %s", ifindex,
+ strerror(errno));
+   return NULL;
+   }
+
+   vendor_id = read_sysfs_netdev_hex_int(devname, "vendor");
+   if (vendor_id < 0) {
+   p_err("Can't get device vendor id for %s", devname);
+   return NULL;
+   }
+
+   switch (vendor_id) {
+   case 0x19ee:
+   device_id = read_sysfs_netdev_hex_int(devname, "device");
+   if (device_id != 0x4000 &&
+   device_id != 0x6000 &&
+   device_id != 0x6003)
+   p_info("Unknown NFP device ID, assuming it is NFP-6xxx arch");
+   return "NFP-6xxx";
+   default:
+   p_err("Can't get bfd arch name for device vendor id 0x%04x",
+ vendor_id);
+   return NULL;
+   }
+}
+
 void print_dev_plain(__u32 ifindex, __u64 ns_dev, __u64 ns_inode)
 {
char name[IF_NAMESIZE];
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index 57d32e8a1391..87439320ef70 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -76,7 +76,8 @@ static int fprintf_json(void *out, const char *fmt, ...)
return 0;
 }
 
-void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
+void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
+  const char *arch)
 {
disassembler_ftype disassemble;
struct disassemble_info info;
@@ -100,6 +101,19 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
else
init_disassemble_info(&info, stdout,
  (fprintf_ftype) fprintf);
+
+   /* Update architecture info for offload. */
+   if (arch) {
+   const bfd_arch_info_type *inf = bfd_scan_arch(arch);
+
+   if (inf) {
+   bfdf->arch_info = inf;
+   } else {
+   p_err("No libbfd support for %s", arch);
+   return;
+   }
+   }
+
info.arch = bfd_get_arch(bfdf);
info.mach = bfd_get_mach(bfdf);
info.buffer = image;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 65b526fe6e7e..b8e9584d6246 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -121,7 +121,10 @@ int do_cgroup(int argc, char **arg);
 
 int

[PATCH bpf-next 0/3] bpf: add dumping and disassembler for non-host JITs

2018-01-16 Thread Jakub Kicinski

Hi,

Currently bpftool can disassemble host JITed images, for example x86_64,
using libbfd. However, it can't disassemble offloaded JITed images.

There are two reasons:

  1. bpf_obj_get_info_by_fd/struct bpf_prog_info can't report the address
     and length of the JITed image.

  2. Even after issue 1 is resolved, bpftool can't figure out the offload
     arch from bpf_prog_info, and therefore can't drive the libbfd
     disassembler correctly.

  This patch set resolves issue 1 by introducing two new fields "jited_len"
and "jited_image" in bpf_dev_offload. These two fields serve as the generic
interface to communicate the JITed image address and length for all offload
targets to higher-level callers. For example, bpf_obj_get_info_by_fd can
use them to fill the userspace-visible fields jited_prog_len and
jited_prog_insns.

  This patch set resolves issue 2 by getting the bfd backend name through
"ifindex", i.e. the network interface index.

v1:
 - Deduce the bfd arch name through ifindex, i.e. the network interface index.
   First, map ifindex to devname through ifindex_to_name_ns, then get the
   PCI id through /sys/class/net/DEVNAME/device/vendor. (Daniel, Alexei)
 
Jiong Wang (3):
  bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info
  nfp: bpf: set new jit info fields
  tools: bpftool: improve architecture detection by using ifindex

 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 10 +++-
 include/linux/bpf.h  |  2 +
 kernel/bpf/offload.c | 23 
 kernel/bpf/syscall.c | 31 +-
 tools/bpf/bpftool/common.c   | 72 
 tools/bpf/bpftool/jit_disasm.c   | 16 +-
 tools/bpf/bpftool/main.h |  5 +-
 tools/bpf/bpftool/prog.c | 12 +++-
 8 files changed, 154 insertions(+), 17 deletions(-)

-- 
2.15.1

Re: [patch net-next v10 02/13] net: sched: introduce shared filter blocks infrastructure

2018-01-16 Thread Cong Wang

On Tue, Jan 16, 2018 at 7:33 AM, Jiri Pirko  wrote:
>  static int __init tc_filter_init(void)
>  {
> +   int err;
> +
> tc_filter_wq = alloc_ordered_workqueue("tc_filter_workqueue", 0);
> if (!tc_filter_wq)
> return -ENOMEM;
>
> +   err = register_pernet_subsys(_net_ops);
> +   if (err)
> +   return err;

Need to destroy the above workqueue on error.

Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors

2018-01-16 Thread David Ahern

On 1/16/18 9:20 AM, Alexander Aring wrote:
> This patch adds extack support for generic cls handling. The extack
> will be set deeper to each called function which is not part of netdev
> core api.
> 
> Cc: David Ahern 
> Signed-off-by: Alexander Aring 
> ---
>  net/sched/cls_api.c | 55 
> +
>  1 file changed, 43 insertions(+), 12 deletions(-)
> 
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 01d09055707d..c25a9b4bcb4b 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -122,7 +122,8 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp)
>  
>  static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
> u32 prio, u32 parent, struct Qdisc *q,
> -   struct tcf_chain *chain)
> +   struct tcf_chain *chain,
> +   struct netlink_ext_ack *extack)
>  {
>   struct tcf_proto *tp;
>   int err;
> @@ -148,6 +149,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
>   module_put(tp->ops->owner);
>   err = -EAGAIN;
>   } else {
> + NL_SET_ERR_MSG(extack, "TC classifier not found");
>   err = -ENOENT;
>   }
>   goto errout;
> @@ -662,7 +664,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
>  static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
> struct nlmsghdr *n, struct tcf_proto *tp,
> struct Qdisc *q, u32 parent,
> -   void *fh, bool unicast, bool *last)
> +   void *fh, bool unicast, bool *last,
> +   struct netlink_ext_ack *extack)
>  {
>   struct sk_buff *skb;
>   u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
> @@ -674,6 +677,7 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
>  
>   if (tcf_fill_node(net, skb, tp, q, parent, fh, portid, n->nlmsg_seq,
> n->nlmsg_flags, RTM_DELTFILTER) <= 0) {
> + NL_SET_ERR_MSG(extack, "Failed to build del event notification");
>   kfree_skb(skb);
>   return -EINVAL;
>   }
> @@ -687,8 +691,11 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
>   if (unicast)
>   return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT);
>  
> - return rtnetlink_send(skb, net, portid, RTNLGRP_TC,
> -   n->nlmsg_flags & NLM_F_ECHO);
> + err = rtnetlink_send(skb, net, portid, RTNLGRP_TC,
> +  n->nlmsg_flags & NLM_F_ECHO);
> + if (err < 0)
> + NL_SET_ERR_MSG(extack, "Failed to send filter delete notification");

not sure we want to do this -- extack for internal failures like this
one or below in tc_ctl_tfilter.


> + return err;
>  }
>  
>  static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
> @@ -749,8 +756,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>   if (prio == 0) {
>   switch (n->nlmsg_type) {
>   case RTM_DELTFILTER:
> - if (protocol || t->tcm_handle || tca[TCA_KIND])
> + if (protocol || t->tcm_handle || tca[TCA_KIND]) {
> + NL_SET_ERR_MSG(extack, "Cannot flush filters with protocol, handle or kind set");
>   return -ENOENT;
> + }
>   break;
>   case RTM_NEWTFILTER:
>   /* If no priority is provided by the user,
> @@ -763,6 +772,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>   }
>   /* fall-through */
>   default:
> + NL_SET_ERR_MSG(extack, "Invalid filter command with priority of zero");
>   return -ENOENT;
>   }
>   }
> @@ -780,23 +790,31 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>   parent = q->handle;
>   } else {
>   q = qdisc_lookup(dev, TC_H_MAJ(t->tcm_parent));
> - if (!q)
> + if (!q) {
> + NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");

Messages should avoid contractions; spell out 'does not'. Please check
all of the patches.

Also, it should be 'exist' (no 's' on the end).


>   return -EINVAL;
> + }
>   }
>  
>   /* Is it classful? */
>   cops = q->ops->cl_ops;
> - if (!cops)
> + if (!cops) {
> + NL_SET_ERR_MSG(extack, "Qdisc not classful");
>   return -EINVAL;
> + }
>  
> - if

[PATCH bpf-next 1/6] bpf: offload: make bpf_offload_dev_match() reject host+host case

2018-01-16 Thread Jakub Kicinski

Daniel suggests it would be more logical for bpf_offload_dev_match()
to return false if either the program or the map is not offloaded,
rather than treating the case where neither is offloaded as a "matching
CPU/host device".

This makes no functional difference today, since the verifier only calls
bpf_offload_dev_match() when one of the objects is offloaded.

Signed-off-by: Jakub Kicinski 
---
 kernel/bpf/offload.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 453785fa1881..a88cebf368bf 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -395,10 +395,8 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map)
struct bpf_prog_offload *offload;
bool ret;
 
-   if (!!bpf_prog_is_dev_bound(prog->aux) != !!bpf_map_is_dev_bound(map))
+   if (!bpf_prog_is_dev_bound(prog->aux) || !bpf_map_is_dev_bound(map))
return false;
-   if (!bpf_prog_is_dev_bound(prog->aux))
-   return true;
 
down_read(_devs_lock);
offload = prog->aux->offload;
-- 
2.15.1

[PATCH bpf-next 0/6] bpf: various fixes and improvements

2018-01-16 Thread Jakub Kicinski

Hi!

This series combines a number of random improvements ranging from
libbpf to nfp driver.  NFP patches make better use of the verifier
log.  There is a requested adjustment to the map offload code, and
a warning fix for a W=1 build to the disassembler.  Quentin also
fixes the libbpf program type detection, while Jiong allows the use
of libbfd compiled from source.

Jakub Kicinski (3):
  bpf: offload: make bpf_offload_dev_match() reject host+host case
  bpf: annotate bpf_insn_print_t with __printf
  nfp: bpf: print map lookup problems into verifier log

Jiong Wang (1):
  tools: bpftool: add -DPACKAGE when including bfd.h

Quentin Monnet (2):
  libbpf: fix string comparison for guessing eBPF program type
  nfp: bpf: reject program on instructions unknown to the JIT compiler

 drivers/net/ethernet/netronome/nfp/bpf/jit.c  |  5 +
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  1 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 20 ++--
 kernel/bpf/disasm.h   |  4 ++--
 kernel/bpf/offload.c  |  4 +---
 tools/bpf/bpftool/Makefile|  2 +-
 tools/build/feature/Makefile  |  2 +-
 tools/lib/bpf/libbpf.c|  2 +-
 8 files changed, 26 insertions(+), 14 deletions(-)

-- 
2.15.1

[PATCH bpf-next 2/6] bpf: annotate bpf_insn_print_t with __printf

2018-01-16 Thread Jakub Kicinski

Functions of type bpf_insn_print_t take a printf-like format string;
mark the type accordingly.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 kernel/bpf/disasm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/disasm.h b/kernel/bpf/disasm.h
index e0857d016f89..266fe8ee542b 100644
--- a/kernel/bpf/disasm.h
+++ b/kernel/bpf/disasm.h
@@ -29,8 +29,8 @@ extern const char *const bpf_class_string[8];
 
 const char *func_id_name(int id);
 
-typedef void (*bpf_insn_print_t)(struct bpf_verifier_env *env,
-const char *, ...);
+typedef __printf(2, 3) void (*bpf_insn_print_t)(struct bpf_verifier_env *env,
+   const char *, ...);
 typedef const char *(*bpf_insn_revmap_call_t)(void *private_data,
  const struct bpf_insn *insn);
 typedef const char *(*bpf_insn_print_imm_t)(void *private_data,
-- 
2.15.1

[PATCH bpf-next 6/6] nfp: bpf: reject program on instructions unknown to the JIT compiler

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

If an eBPF instruction is unknown to the driver JIT compiler, we can
reject the program at verification time.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
Reviewed-by: Jiong Wang 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 5 +
 drivers/net/ethernet/netronome/nfp/bpf/main.h | 1 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 6 ++
 3 files changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index cdc949fabe98..56451edf01c2 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -2907,6 +2907,11 @@ void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt)
}
 }
 
+bool nfp_bpf_supported_opcode(u8 code)
+{
+   return !!instr_cb[code];
+}
+
 void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv)
 {
unsigned int i;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 80855d43b25e..424fe8338105 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -324,6 +324,7 @@ struct nfp_bpf_vnic {
 
 void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt);
 int nfp_bpf_jit(struct nfp_prog *prog);
+bool nfp_bpf_supported_opcode(u8 code);
 
 extern const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops;
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 81dab462456c..479f602887e9 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -290,6 +290,12 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
meta = nfp_bpf_goto_meta(nfp_prog, meta, insn_idx, env->prog->len);
nfp_prog->verifier_meta = meta;
 
+   if (!nfp_bpf_supported_opcode(meta->insn.code)) {
+   pr_vlog(env, "instruction %#02x not supported\n",
+   meta->insn.code);
+   return -EINVAL;
+   }
+
if (meta->insn.src_reg >= MAX_BPF_REG ||
meta->insn.dst_reg >= MAX_BPF_REG) {
pr_vlog(env, "program uses extended registers - jit hardening?\n");
-- 
2.15.1

[PATCH bpf-next 3/6] tools: bpftool: add -DPACKAGE when including bfd.h

2018-01-16 Thread Jakub Kicinski

From: Jiong Wang 

bfd.h requires config.h to be included first unless PACKAGE or
PACKAGE_VERSION is defined.

  /* PR 14072: Ensure that config.h is included first.  */
  #if !defined PACKAGE && !defined PACKAGE_VERSION
  #error config.h must be included before this header
  #endif

This check was introduced in May 2012. It does not appear in bfd.h
on some Linux distributions, probably because the distributions remove it
when building the package.

However, a user might build libbfd from source and then link bpftool
against it. In that case bfd.h is the original, unmodified header, so we
need to define PACKAGE or PACKAGE_VERSION.
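To illustrate, the guard can be reproduced in a standalone file. This is a sketch: the `#if` block copies the check quoted above instead of including a real bfd.h, and `package_name()` is a hypothetical helper. Defining PACKAGE first lets the guard pass, which is what `-DPACKAGE='"bpftool"'` does on the compiler command line:

```c
/* Equivalent to passing -DPACKAGE='"bpftool"' to the compiler. */
#ifndef PACKAGE
#define PACKAGE "bpftool"
#endif

/* Copy of the bfd.h guard (PR 14072): compilation stops here if neither
 * macro has been defined. */
#if !defined PACKAGE && !defined PACKAGE_VERSION
#error config.h must be included before this header
#endif

/* Hypothetical helper so the macro value can be inspected at run time. */
static const char *package_name(void)
{
	return PACKAGE;
}
```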

Acked-by: Jakub Kicinski 
Signed-off-by: Jiong Wang 
---
 tools/bpf/bpftool/Makefile   | 2 +-
 tools/build/feature/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 2237bc43f71c..26901ec87361 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -39,7 +39,7 @@ CC = gcc
 
 CFLAGS += -O2
 CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
-CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
+CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
 CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'
 LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 17f2c73fff8b..bc715f6ac320 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -190,7 +190,7 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
$(BUILD) -DPACKAGE='"perf"' -lbfd -lz -liberty -ldl
 
 $(OUTPUT)test-disassembler-four-args.bin:
-   $(BUILD) -lbfd -lopcodes
+   $(BUILD) -DPACKAGE='"perf"' -lbfd -lopcodes
 
 $(OUTPUT)test-liberty.bin:
$(CC) $(CFLAGS) -Wall -Werror -o $@ test-libbfd.c -DPACKAGE='"perf"' $(LDFLAGS) -lbfd -ldl -liberty
-- 
2.15.1

[PATCH bpf-next 5/6] nfp: bpf: print map lookup problems into verifier log

2018-01-16 Thread Jakub Kicinski

Use the verifier log to output error messages if map lookup
can't be offloaded.
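One of the conditions now reported through the verifier log is the key alignment check. Key pointers live on the stack at negative offsets from the frame pointer, which is why the message prints `-off`. Below is a minimal sketch of that check. `check_key_alignment()` is hypothetical and writes into a caller-supplied buffer instead of calling `pr_vlog()`:

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 0 if the stack key pointer at frame offset 'off' (negative,
 * relative to the frame pointer) is 4-byte aligned; otherwise writes a
 * message into 'log' and returns -1. Hypothetical stand-in for the
 * driver's pr_vlog() path. */
static int check_key_alignment(int64_t off, char *log, size_t len)
{
	if (-off % 4) {
		snprintf(log, len, "map_lookup: unaligned stack pointer %lld\n",
			 (long long)-off);
		return -1;
	}
	return 0;
}
```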

Signed-off-by: Jakub Kicinski 
Acked-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 741438896cc7..81dab462456c 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -132,22 +132,24 @@ nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
 
case BPF_FUNC_map_lookup_elem:
if (!bpf->helpers.map_lookup) {
-   pr_info("map_lookup: not supported by FW\n");
+   pr_vlog(env, "map_lookup: not supported by FW\n");
return -EOPNOTSUPP;
}
if (reg2->type != PTR_TO_STACK) {
-   pr_info("map_lookup: unsupported key ptr type %d\n",
+   pr_vlog(env,
+   "map_lookup: unsupported key ptr type %d\n",
reg2->type);
return -EOPNOTSUPP;
}
if (!tnum_is_const(reg2->var_off)) {
-   pr_info("map_lookup: variable key pointer\n");
+   pr_vlog(env, "map_lookup: variable key pointer\n");
return -EOPNOTSUPP;
}
 
off = reg2->var_off.value + reg2->off;
if (-off % 4) {
-   pr_info("map_lookup: unaligned stack pointer %lld\n",
+   pr_vlog(env,
+   "map_lookup: unaligned stack pointer %lld\n",
-off);
return -EOPNOTSUPP;
}
@@ -160,7 +162,7 @@ nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
meta->arg2_var_off |= off != old_off;
 
if (meta->arg1.map_ptr != reg1->map_ptr) {
-   pr_info("map_lookup: called for different map\n");
+   pr_vlog(env, "map_lookup: called for different map\n");
return -EOPNOTSUPP;
}
break;
@@ -263,7 +265,7 @@ nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
if (reg->type == PTR_TO_MAP_VALUE) {
if (is_mbpf_store(meta)) {
-   pr_info("map writes not supported\n");
+   pr_vlog(env, "map writes not supported\n");
return -EOPNOTSUPP;
}
}
-- 
2.15.1

[PATCH bpf-next 4/6] libbpf: fix string comparison for guessing eBPF program type

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

libbpf is able to deduce the type of a program from the name of the ELF
section in which it is located. However, the comparison is made on the
first n characters, n being determined with sizeof() applied to the
reference string (e.g. "xdp"). When such section names are supposed to
receive a suffix separated with a slash (e.g. "kprobe/"), using sizeof()
takes the terminating NUL character of the reference string into account,
which implies that both strings must be equal. The desired behaviour is
instead to take the length of the string *without* the terminating NUL
character, so that the reference string only has to be a prefix of the
ELF section name.

Subtract 1 from the total size of the string to obtain the length used
for the comparison.
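The difference is easy to see in isolation. This is a sketch and not libbpf's code: `SEC_OLD`, `SEC_NEW`, `struct sec_def` and `sec_matches()` are hypothetical names. The sketch also assumes that the match is a `strncmp()` over `len` bytes:

```c
#include <stddef.h>
#include <string.h>

/* Two variants of the table-entry macro: the old one counts the
 * terminating NUL, the fixed one does not. */
#define SEC_OLD(string) { string, sizeof(string) }
#define SEC_NEW(string) { string, sizeof(string) - 1 }

struct sec_def {
	const char *sec;
	size_t len;
};

static const struct sec_def kprobe_old = SEC_OLD("kprobe/");
static const struct sec_def kprobe_new = SEC_NEW("kprobe/");

/* Prefix match of an ELF section name against a table entry. */
static int sec_matches(const struct sec_def *def, const char *name)
{
	return strncmp(name, def->sec, def->len) == 0;
}
```

With `sizeof`, the entry for `"kprobe/"` has length 8. The eighth byte compared is therefore the NUL terminator, so the entry only matches a section named exactly `kprobe/`.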

Fixes: 583c90097f72 ("libbpf: add ability to guess program type based on section name")
Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e9c4b7cabcf2..30c776375118 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1803,7 +1803,7 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
-#define BPF_PROG_SEC(string, type) { string, sizeof(string), type }
+#define BPF_PROG_SEC(string, type) { string, sizeof(string) - 1, type }
 static const struct {
const char *sec;
size_t len;
-- 
2.15.1

[PATCH net-next] ipv6: mcast: remove dead code

2018-01-16 Thread Eric Dumazet

From: Eric Dumazet 

Since commit 41033f029e39 ("snmp: Remove duplicate OUTMCAST stat
increment"), the payload_len assignment in mld_sendpack() is dead code.

Signed-off-by: Eric Dumazet 
---
 net/ipv6/mcast.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 40b223a930a39e010ac744bc3b4b32b28e9bc5e8..6a5d0e39bb87f98bef7de90ab2fa63d9666c00ce 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1655,8 +1655,6 @@ static void mld_sendpack(struct sk_buff *skb)
if (err)
goto err_out;
 
-   payload_len = skb->len;
-
err = NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
  net, net->ipv6.igmp_sk, skb, NULL, skb->dev,
  dst_output);

Re: [patch net-next v10 00/13] net: sched: allow qdiscs to share filter block instances

2018-01-16 Thread David Ahern

On 1/16/18 7:33 AM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Currently the filters added to qdiscs are independent. So, for example, if you
> have 2 netdevices, create an ingress qdisc on both and want to add
> identical filter rules to both, you need to add them twice. This patchset
> makes this easier and, mainly, saves resources by allowing all filters
> within a qdisc to be shared - I call it a "filter block". This also helps
> save resources when offloading to hardware, for example to expensive TCAM.
> 
> So back to the example. First, we create 2 qdiscs. Both will share
> block number 22. "22" is just an identifier:
> $ tc qdisc add dev ens7 ingress_block 22 ingress
> 
> $ tc qdisc add dev ens8 ingress_block 22 ingress
> 
> 
> If we don't specify the "block" command line option, no shared block is
> created:
> $ tc qdisc add dev ens9 ingress
> 
> Now if we list the qdiscs, we will see the block index in the output:
> 
> $ tc qdisc
> qdisc ingress : dev ens7 parent :fff1 ingress_block 22
> qdisc ingress : dev ens8 parent :fff1 ingress_block 22
> qdisc ingress : dev ens9 parent :fff1
> 
> 
> To make it more visual, the situation looks like this:
> 
>    ens7 ingress qdisc     ens8 ingress qdisc
>             |                      |
>             |                      |
>             +---->  block 22  <----+
> 
> An unlimited number of qdiscs may share the same block.
> 
> Note that this patchset also introduces block sharing support for the clsact
> qdisc:
> $ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
> $ tc qdisc show dev ens10
> qdisc clsact : dev ens10 parent :fff1 ingress_block 23 egress_block 24
> 
> 
> We can add a filter using the block index:
> 
> $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
> 
> 
> Note we cannot use the qdisc for filter manipulations of shared blocks:
> 
> $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
> Error: This filter block is shared. Please use the block index to manipulate the filters.
> 
> 
> We will see the same output if we list filters for the ingress qdiscs of
> ens7 and ens8, as well as for block 22:
> 
> $ tc filter show block 22
> filter block 22 protocol ip pref 25 flower chain 0
> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
> ...
> 
> $ tc filter show dev ens7 ingress
> filter block 22 protocol ip pref 25 flower chain 0
> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
> ...
> 
> $ tc filter show dev ens8 ingress
> filter block 22 protocol ip pref 25 flower chain 0
> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
> ...
> 

API LGTM.

Acked-by: David Ahern

Re: [patch net-next v10 00/13] net: sched: allow qdiscs to share filter block instances

2018-01-16 Thread Jamal Hadi Salim

On 18-01-16 10:33 AM, Jiri Pirko wrote:
> From: Jiri Pirko

For patches 1-9:

Reviewed-by: Jamal Hadi Salim 
Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH net-next 0/8] net: sched: cls: add extack support

2018-01-16 Thread Jamal Hadi Salim

On 18-01-16 05:41 PM, Jakub Kicinski wrote:
> On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote:
> > On 18-01-16 04:46 PM, Jakub Kicinski wrote:
> > > On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote:
> > 
> > [..]
> > 
> > I would say precedence should be Jiri's patches, Alex's patches
> > and then yours:
> > Alex's patches fix the core (cls_api.c) area with proper extack
> > for the core and then he has one patch to cover a specific
> > use case of the u32 classifier extack. Yours is only concerned
> > with one use case - bpf which depend on the core (that is in Alex's
> > patches)
> 
> Our patches are concerned with propagating the extack to drivers,
> and nfp (and netdevsim) make use of it.
> 
> I'm miffed by the fact that you jumped out with this conflicting series
> *after* we posted ours, and we got shot down on white space.

I totally empathize with the general frustration.
The general rule is we fix the core first then add users (classifiers in
this case). Note:
Alex has a _lot_ of patches that he has been trying to send for the
last little while and this one is certainly not a new set (I actually
had reviewed this set). There are others. And the rule of "fix core
first then add users" has been imposed on him as well.

cheers,
jamal

[PATCH net-next] net: stmmac: Fix reception of Broadcom switches tags

2018-01-16 Thread Florian Fainelli

Broadcom tags inserted by Broadcom switches put a 4-byte header after
the MAC SA and before the EtherType, which may look like some sort of
zero-length LLC/SNAP packet (tcpdump and Wireshark interpret it that way).
With ACS enabled in stmmac, the packets were truncated to 8 bytes on
reception, whereas clearing this bit allowed normal reception to occur.

In order to make that possible, we need to pass a net_device argument to
the different core_init() functions, and we depend on the Broadcom
tagger padding packets correctly (which it now does). To be as minimally
invasive as possible, this is only done for gmac1000 when the network
device is DSA-enabled (netdev_uses_dsa() returns true).
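The register update in dwmac1000_core_init() can be modelled as a pure function on the control word. This is a sketch under stated assumptions. The bit positions below are assumed for illustration and should be checked against dwmac1000.h. `CORE_INIT_BITS` is a hypothetical reduced stand-in for GMAC_CORE_INIT, which, as the commit message implies, enables ACS by default. The clear therefore has to come after the defaults are OR-ed in:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values assumed for illustration; see dwmac1000.h for the
 * authoritative definitions. */
#define GMAC_CONTROL_ACS (1u << 7)   /* Automatic Pad/CRC Stripping */
#define GMAC_CONTROL_2K  (1u << 27)  /* IEEE 802.3as 2K packets */

/* Hypothetical: the real GMAC_CORE_INIT sets several other bits too. */
#define CORE_INIT_BITS   GMAC_CONTROL_ACS

/* Mirrors the order in the patch: apply the defaults, clear ACS for
 * DSA-tagged traffic, then apply the MTU-dependent bits. */
static uint32_t core_init_value(uint32_t value, bool uses_dsa, int mtu)
{
	value |= CORE_INIT_BITS;

	if (uses_dsa)
		value &= ~GMAC_CONTROL_ACS;

	if (mtu > 1500)
		value |= GMAC_CONTROL_2K;

	return value;
}
```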

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/stmicro/stmmac/common.h |  2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c|  3 ++-
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 12 +++-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  |  3 ++-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c| 11 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c|  2 +-
 6 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index ce2ea2d491ac..2ffe76c0ff74 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -474,7 +474,7 @@ struct mac_device_info;
 /* Helpers to program the MAC core */
 struct stmmac_ops {
/* MAC core initialization */
-   void (*core_init)(struct mac_device_info *hw, int mtu);
+   void (*core_init)(struct mac_device_info *hw, struct net_device *dev);
/* Enable the MAC RX/TX */
void (*set_mac)(void __iomem *ioaddr, bool enable);
/* Enable and verify that the IPC module is supported */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index 9eb7f65d8000..a3fa65b1ca8e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -483,7 +483,8 @@ static int sun8i_dwmac_init(struct platform_device *pdev, void *priv)
return 0;
 }
 
-static void sun8i_dwmac_core_init(struct mac_device_info *hw, int mtu)
+static void sun8i_dwmac_core_init(struct mac_device_info *hw,
+ struct net_device *dev)
 {
void __iomem *ioaddr = hw->pcsr;
u32 v;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index 8a86340ff2d3..540d21786a43 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -25,18 +25,28 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "stmmac_pcs.h"
 #include "dwmac1000.h"
 
-static void dwmac1000_core_init(struct mac_device_info *hw, int mtu)
+static void dwmac1000_core_init(struct mac_device_info *hw,
+   struct net_device *dev)
 {
void __iomem *ioaddr = hw->pcsr;
u32 value = readl(ioaddr + GMAC_CONTROL);
+   int mtu = dev->mtu;
 
/* Configure GMAC core */
value |= GMAC_CORE_INIT;
 
+   /* Clear ACS bit because Ethernet switch tagging formats such as
+* Broadcom tags can look like invalid LLC/SNAP packets and cause the
+* hardware to truncate packets on reception.
+*/
+   if (netdev_uses_dsa(dev))
+   value &= ~GMAC_CONTROL_ACS;
+
if (mtu > 1500)
value |= GMAC_CONTROL_2K;
if (mtu > 2000)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 8ef517356313..c1ee427c42cb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -28,7 +28,8 @@
 #include 
 #include "dwmac100.h"
 
-static void dwmac100_core_init(struct mac_device_info *hw, int mtu)
+static void dwmac100_core_init(struct mac_device_info *hw,
+  struct net_device *dev)
 {
void __iomem *ioaddr = hw->pcsr;
u32 value = readl(ioaddr + MAC_CONTROL);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index f3ed8f7853eb..6af5100d3cb2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -20,13 +20,22 @@
 #include "stmmac_pcs.h"
 #include "dwmac4.h"
 
-static void dwmac4_core_init(struct mac_device_info *hw, int mtu)
+static void dwmac4_core_init(struct mac_device_info *hw,
+struct net_device *dev)
 {
void __iomem *ioaddr = hw->pcsr;
u32 value = readl(ioaddr + GMAC_CONFIG);
+   int mtu = dev->mtu;
 
value |= GMAC_CORE_INIT;
 
+   /* Clear ACS

Re: [PATCH] samples/bpf: Fix trailing semicolon

2018-01-16 Thread Daniel Borkmann

On 01/16/2018 03:15 PM, Luis de Bethencourt wrote:
> The trailing semicolon is an empty statement that does no operation.
> Removing it since it doesn't do anything.
> 
> Signed-off-by: Luis de Bethencourt 

Applied to bpf-next, thanks Luis!

[bpf-next PATCH 0/3] libbpf: cleanups to Makefile

2018-01-16 Thread Jesper Dangaard Brouer

This patchset contains some small improvements and cleanups for
the Makefile in tools/lib/bpf/.

It worries me that the libbpf.so shared library is not versioned, but
that is not addressed in this patchset.

---

Jesper Dangaard Brouer (3):
  libbpf: install the header file libbpf.h
  libbpf: cleanup Makefile, remove unused elements
  libbpf: Makefile set specified permission mode


 tools/lib/bpf/Makefile |   20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

--

Re: [RFC bpf-next PATCH] bpf: add comments to BPF ld/ldx sizes

2018-01-16 Thread Daniel Borkmann

On 01/16/2018 12:31 PM, Jesper Dangaard Brouer wrote:
> Document the BPF ld/ldx size defines, as this helped me understand the code in filter.c.
> 
> Signed-off-by: Jesper Dangaard Brouer 
> ---
>  0 files changed
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 395d261948de..4729d9a002d4 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -17,7 +17,7 @@
>  #define BPF_ALU64 0x07 /* alu mode in double word width */
>  
>  /* ld/ldx fields */
> -#define BPF_DW   0x18 /* double word */
> +#define BPF_DW   0x18 /* double word (64-bit) */
>  #define BPF_XADD 0xc0 /* exclusive add */
>  
>  /* alu/jmp fields */
> diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
> index 18be90725ab0..ee97668bdadb 100644
> --- a/include/uapi/linux/bpf_common.h
> +++ b/include/uapi/linux/bpf_common.h
> @@ -15,9 +15,10 @@
>  
>  /* ld/ldx fields */
>  #define BPF_SIZE(code)  ((code) & 0x18)
> -#define  BPF_W   0x00
> -#define  BPF_H   0x08
> -#define  BPF_B   0x10
> +#define  BPF_W   0x00 /* 32-bit */
> +#define  BPF_H   0x08 /* 16-bit */
> +#define  BPF_B   0x10 /*  8-bit */
> +/* eBPF  BPF_DW  0x18 64-bit */

Hmm, I don't really mind, but we do have it documented in:

  Documentation/networking/filter.txt +942

Feels like if we put a comment only on BPF_{B,H,W}, then we
might also want to document all the others such as ALU ops,
etc.

>  #define BPF_MODE(code)  ((code) & 0xe0)
>  #define  BPF_IMM 0x00
>  #define  BPF_ABS 0x20
>
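The size field can also be expressed as a small decoder. This sketch re-declares the constants locally instead of including the UAPI headers, and `bpf_size_bytes()` is a hypothetical helper:

```c
#include <stdint.h>

#define BPF_SIZE(code) ((code) & 0x18)
#define BPF_W  0x00 /* 32-bit */
#define BPF_H  0x08 /* 16-bit */
#define BPF_B  0x10 /*  8-bit */
#define BPF_DW 0x18 /* 64-bit, eBPF only */

/* Returns the access width in bytes encoded in a ld/ldx/st/stx opcode. */
static int bpf_size_bytes(uint8_t code)
{
	switch (BPF_SIZE(code)) {
	case BPF_W:
		return 4;
	case BPF_H:
		return 2;
	case BPF_B:
		return 1;
	default:
		return 8; /* BPF_DW: the only remaining value of the 2-bit field */
	}
}
```

For example, `BPF_LDX | BPF_MEM | BPF_DW` is opcode 0x79, and its size bits decode to 8 bytes.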

[bpf-next PATCH 1/3] libbpf: install the header file libbpf.h

2018-01-16 Thread Jesper Dangaard Brouer

It seems like an oversight not to install the header file for libbpf,
given that the libbpf.so and libbpf.a files are installed.

Signed-off-by: Jesper Dangaard Brouer 
---
 tools/lib/bpf/Makefile |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 8ed43ae9db9b..54370654c708 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -192,7 +192,8 @@ install_lib: all_cmd
 
 install_headers:
$(call QUIET_INSTALL, headers) \
-   $(call do_install,bpf.h,$(prefix)/include/bpf,644)
+   $(call do_install,bpf.h,$(prefix)/include/bpf,644); \
+   $(call do_install,libbpf.h,$(prefix)/include/bpf,644);
 
 install: install_lib

[bpf-next PATCH 2/3] libbpf: cleanup Makefile, remove unused elements

2018-01-16 Thread Jesper Dangaard Brouer

The plugin_dir_SQ variable is not used, remove it.
The function update_dir is also unused, remove it.
The variable $VERSION_FILES is empty, remove it.

These all originate from the introduction of the Makefile and are likely a
copy-paste from tools/lib/traceevent/Makefile.

Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Signed-off-by: Jesper Dangaard Brouer 
---
 tools/lib/bpf/Makefile |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 54370654c708..8e15e48cb8f8 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -93,7 +93,6 @@ export prefix libdir src obj
 # Shell quotes
 libdir_SQ = $(subst ','\'',$(libdir))
 libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
-plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
 
 LIB_FILE = libbpf.a libbpf.so
 
@@ -150,7 +149,7 @@ CMD_TARGETS = $(LIB_FILE)
 
 TARGETS = $(CMD_TARGETS)
 
-all: fixdep $(VERSION_FILES) all_cmd
+all: fixdep all_cmd
 
 all_cmd: $(CMD_TARGETS)
 
@@ -169,16 +168,6 @@ $(OUTPUT)libbpf.so: $(BPF_IN)
 $(OUTPUT)libbpf.a: $(BPF_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
 
-define update_dir
-  (echo $1 > $@.tmp;   \
-   if [ -r $@ ] && cmp -s $@ $@.tmp; then  \
- rm -f $@.tmp; \
-   else\
- echo '  UPDATE $@';   \
- mv -f $@.tmp $@;  \
-   fi);
-endef
-
 define do_install
if [ ! -d '$(DESTDIR_SQ)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \
@@ -204,7 +193,7 @@ config-clean:
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
 
 clean:
-   $(call QUIET_CLEAN, libbpf) $(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d .*.cmd \
+   $(call QUIET_CLEAN, libbpf) $(RM) *.o *~ $(TARGETS) *.a *.so .*.d .*.cmd \
$(RM) LIBBPF-CFLAGS
$(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP.libbpf

[bpf-next PATCH 3/3] libbpf: Makefile set specified permission mode

2018-01-16 Thread Jesper Dangaard Brouer

The third parameter to do_install was not passed to the $(INSTALL) command.
Fix this by setting the -m option only when the third parameter is supplied.

The use of a third parameter was introduced in commit eb54e522a000 ("bpf:
install libbpf headers on 'make install'").

Without this change, the header files are installed as executable files (755).

Fixes: eb54e522a000 ("bpf: install libbpf headers on 'make install'")
Signed-off-by: Jesper Dangaard Brouer 
---
 tools/lib/bpf/Makefile |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 8e15e48cb8f8..83714ca1f22b 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -172,7 +172,7 @@ define do_install
if [ ! -d '$(DESTDIR_SQ)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \
fi; \
-   $(INSTALL) $1 '$(DESTDIR_SQ)$2'
+   $(INSTALL) $1 $(if $3,-m $3,) '$(DESTDIR_SQ)$2'
 endef
 
 install_lib: all_cmd

Re: [PATCH net-next 7/8] net: sched: cls: add extack support for tc_setup_cb_call

2018-01-16 Thread Cong Wang

On Tue, Jan 16, 2018 at 9:20 AM, Alexander Aring  wrote:
>  int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
> -enum tc_setup_type type, void *type_data, bool err_stop)
> +enum tc_setup_type type, void *type_data, bool err_stop,
> +struct netlink_ext_ack *extack)
>  {
> int ok_count;
> int ret;
>
> ret = tcf_block_cb_call(block, type, type_data, err_stop);
> -   if (ret < 0)
> +   if (ret < 0) {
> +   NL_SET_ERR_MSG(extack, "Failed to inialize tcf block");


s/inialize/initialize/

> return ret;
> +   }
> ok_count = ret;
>
> if (!exts)
> return ok_count;
> ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop);
> -   if (ret < 0)
> +   if (ret < 0) {
> +   NL_SET_ERR_MSG(extack, "Failed to inialize tcf block extensions");

Ditto.

Re: [PATCH -next] bpf: cpumap: make some functions static

2018-01-16 Thread Daniel Borkmann

On 01/16/2018 12:27 PM, Wei Yongjun wrote:
> Fixes the following sparse warnings:
> 
> kernel/bpf/cpumap.c:146:6: warning:
>  symbol '__cpu_map_queue_destructor' was not declared. Should it be static?
> kernel/bpf/cpumap.c:225:16: warning:
>  symbol 'cpu_map_build_skb' was not declared. Should it be static?
> kernel/bpf/cpumap.c:340:26: warning:
>  symbol '__cpu_map_entry_alloc' was not declared. Should it be static?
> kernel/bpf/cpumap.c:398:6: warning:
>  symbol '__cpu_map_entry_free' was not declared. Should it be static?
> kernel/bpf/cpumap.c:441:6: warning:
>  symbol '__cpu_map_entry_replace' was not declared. Should it be static?
> kernel/bpf/cpumap.c:454:5: warning:
>  symbol 'cpu_map_delete_elem' was not declared. Should it be static?
> kernel/bpf/cpumap.c:467:5: warning:
>  symbol 'cpu_map_update_elem' was not declared. Should it be static?
> kernel/bpf/cpumap.c:505:6: warning:
>  symbol 'cpu_map_free' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied to bpf-next, thanks Wei!

Re: [PATCH bpf] bpf: reject stores into ctx via st and xadd

2018-01-16 Thread Alexei Starovoitov

On Tue, Jan 16, 2018 at 11:30:10PM +0100, Daniel Borkmann wrote:
> Alexei found that verifier does not reject stores into context
> via BPF_ST instead of BPF_STX. And while looking at it, we
> also should not allow XADD variant of BPF_STX.
> 
> The context rewriter is only assuming either BPF_LDX_MEM- or
> BPF_STX_MEM-type operations, thus reject anything other than
> that so that assumptions in the rewriter properly hold. Add
> test cases as well for BPF selftests.
> 
> Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
> Reported-by: Alexei Starovoitov 
> Signed-off-by: Daniel Borkmann 

Applied, thank you Daniel.

all bugs are eventually shallow.
For this one we even had two broken testcases. Ouch.

Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors

2018-01-16 Thread Cong Wang

On Tue, Jan 16, 2018 at 9:20 AM, Alexander Aring  wrote:
> @@ -1117,8 +1146,10 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
> }
>  #else
> if ((exts->action && tb[exts->action]) ||
> -   (exts->police && tb[exts->police]))
> +   (exts->police && tb[exts->police])) {
> +   NL_SET_ERR_MSG(extack, "Actions are not supported. Check compile options");
> return -EOPNOTSUPP;
> +   }
>  #endif

"Check compile options" is confusing; it would be clearer to just
say that CONFIG_NET_CLS_ACT needs to be enabled here.

[PATCH iproute2-next] tc: red: allow setting th_min and th_max to the same value

2018-01-16 Thread Jakub Kicinski

Setting th_min and th_max to the same value may be useful for DCTCP
deployments.  The original DCTCP paper describes it as the simplest way
of implementing ECN threshold marking.  Indeed, there doesn't seem
to be any simpler qdisc in Linux that would allow such a setup today.
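The effect of the change on the return value can be sketched as follows. Only the two guard clauses mirror the patch. The loop after the division is our own illustration of deriving a power-of-two scaling exponent and is not copied from iproute2. `red_eval_plog()` is a hypothetical name:

```c
/* Returns the smallest i for which prob / (qmax - qmin) * 2^i exceeds 1,
 * or -1 on error. Hypothetical reimplementation for illustration. */
static int red_eval_plog(unsigned int qmin, unsigned int qmax, double prob)
{
	long long range = (long long)qmax - (long long)qmin;
	int i;

	if (range == 0)
		return 0;	/* th_min == th_max: now accepted */
	if (range < 0)
		return -1;	/* th_min above th_max is still rejected */

	prob /= range;
	for (i = 0; i < 32; i++) {
		if (prob > 1.0)
			break;
		prob *= 2;
	}
	return i < 32 ? i : -1;
}
```

For example, with thresholds 1000 bytes apart and a probability of 0.02, the per-byte slope is 2e-5. Doubling this value sixteen times (to about 1.31) is the first point at which it exceeds 1, so the sketch returns 16.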

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
Or should I go ahead and add a DCTCP qdisc? :)

 tc/tc_red.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tc/tc_red.c b/tc/tc_red.c
index 1f82ef1aec65..178fe088f732 100644
--- a/tc/tc_red.c
+++ b/tc/tc_red.c
@@ -30,7 +30,9 @@ int tc_red_eval_P(unsigned int qmin, unsigned int qmax, double prob)
 {
int i = qmax - qmin;
 
-   if (i <= 0)
+   if (!i)
+   return 0;
+   if (i < 0)
return -1;
 
prob /= i;
-- 
2.15.1

Re: [PATCH net-next 8/8] net: sched: cls_u32: add extack support

2018-01-16 Thread Cong Wang

On Tue, Jan 16, 2018 at 9:20 AM, Alexander Aring  wrote:
> -   if (root_ht == ht)
> +   if (root_ht == ht) {
> +   NL_SET_ERR_MSG(extack, "Not allowd to delete root node");

s/allowd/allowed/

Re: [PATCH net-next 0/8] net: sched: cls: add extack support

2018-01-16 Thread Jakub Kicinski

On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote:
> On 18-01-16 04:46 PM, Jakub Kicinski wrote:
> > On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote:  
> 
> [..]
> 
> > Ugh, this is going to conflict with our series too :(  (and I CCed you
> > on ours)
> > 
> > Would it be OK for you to hold off until Jiri's code gets merged and
> > ours comes down via bpf-next?  That shouldn't take long at all.  The
> > conflicts between bpf/bpf-next/net-next are really taking their toll
> > on us this release cycle, I would really appreciate if we could make
> > some progress on this relatively simple series at least...
> >   
> 
> I would say precedence should be Jiri's patches, Alex's patches
> and then yours:
> Alex's patches fix the core (cls_api.c) area with proper extack
> for the core and then he has one patch to cover a specific
> use case of the u32 classifier extack. Yours is only concerned
> with one use case - bpf which depend on the core (that is in Alex's
> patches)

Our patches are concerned with propagating the extack to drivers, 
and nfp (and netdevsim) make use of it.

I'm miffed by the fact that you jumped out with this conflicting series
*after* we posted ours, and we got shot down on white space.

[PATCH bpf] bpf: reject stores into ctx via st and xadd

2018-01-16 Thread Daniel Borkmann

Alexei found that the verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow the XADD variant of BPF_STX.

The context rewriter assumes only BPF_LDX_MEM- or
BPF_STX_MEM-type operations, so reject anything else to make
sure its assumptions hold. Add test cases for BPF selftests
as well.
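The rule can be condensed into a predicate on the opcode. This is a simplified sketch and does not follow the verifier's actual structure. The real patch adds separate checks in `check_xadd()` and in the BPF_ST path of `do_check()`. `ctx_access_ok()` is a hypothetical helper that only inspects the opcode of an access whose pointer register is already known to hold the context:

```c
#include <stdbool.h>
#include <stdint.h>

/* eBPF opcode fields (values as in the UAPI headers). */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_MODE(code)  ((code) & 0xe0)
#define BPF_LDX  0x01
#define BPF_ST   0x02
#define BPF_STX  0x03
#define BPF_MEM  0x60
#define BPF_XADD 0xc0

/* True only for plain BPF_LDX_MEM loads and BPF_STX_MEM stores, the two
 * forms the context rewriter knows how to convert. */
static bool ctx_access_ok(uint8_t code)
{
	uint8_t cls = BPF_CLASS(code);

	return (cls == BPF_LDX || cls == BPF_STX) && BPF_MODE(code) == BPF_MEM;
}
```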

Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c   | 19 +++
 tools/testing/selftests/bpf/test_verifier.c | 29 +++--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5423b90..1aff5de 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -978,6 +978,13 @@ static bool is_pointer_value(struct bpf_verifier_env *env, int regno)
return __is_pointer_value(env->allow_ptr_leaks, cur_regs(env) + regno);
 }
 
+static bool is_ctx_reg(struct bpf_verifier_env *env, int regno)
+{
+   const struct bpf_reg_state *reg = cur_regs(env) + regno;
+
+   return reg->type == PTR_TO_CTX;
+}
+
 static int check_pkt_ptr_alignment(struct bpf_verifier_env *env,
   const struct bpf_reg_state *reg,
   int off, int size, bool strict)
@@ -1258,6 +1265,12 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
return -EACCES;
}
 
+   if (is_ctx_reg(env, insn->dst_reg)) {
+   verbose(env, "BPF_XADD stores into R%d context is not allowed\n",
+   insn->dst_reg);
+   return -EACCES;
+   }
+
/* check whether atomic_add can read the memory */
err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
   BPF_SIZE(insn->code), BPF_READ, -1);
@@ -3991,6 +4004,12 @@ static int do_check(struct bpf_verifier_env *env)
if (err)
return err;
 
+   if (is_ctx_reg(env, insn->dst_reg)) {
+   verbose(env, "BPF_ST stores into R%d context is not allowed\n",
+   insn->dst_reg);
+   return -EACCES;
+   }
+
/* check that memory (dst_reg + off) is writeable */
err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
   BPF_SIZE(insn->code), BPF_WRITE,
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 74cb63e..c34d288 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -2593,6 +2593,29 @@ static struct bpf_test tests[] = {
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
+   "context stores via ST",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "BPF_ST stores into R1 context is not allowed",
+   .result = REJECT,
+   .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   },
+   {
+   "context stores via XADD",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_1,
+BPF_REG_0, offsetof(struct __sk_buff, mark), 0),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "BPF_XADD stores into R1 context is not allowed",
+   .result = REJECT,
+   .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   },
+   {
"direct packet access: test1",
.insns = {
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
@@ -4312,7 +4335,8 @@ static struct bpf_test tests[] = {
.fixup_map1 = { 2 },
.errstr_unpriv = "R2 leaks addr into mem",
.result_unpriv = REJECT,
-   .result = ACCEPT,
+   .result = REJECT,
+   .errstr = "BPF_XADD stores into R1 context is not allowed",
},
{
"leak pointer into ctx 2",
@@ -4326,7 +4350,8 @@ static struct bpf_test tests[] = {
},
.errstr_unpriv = "R10 leaks addr into mem",
.result_unpriv = REJECT,
-   .result = ACCEPT,
+   .result = REJECT,
+   .errstr = "BPF_XADD stores into R1 context is not allowed",
},
{
"leak pointer

[PATCH] cfg80211: fix station info handling bugs

2018-01-16 Thread Johannes Berg

From: Johannes Berg 

Fix two places where the structure isn't initialized to zero,
and thus can't be filled properly by the driver.
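The wext-compat half of the bug can be reproduced in userspace. The sketch assumes a cut-down `struct sta_info` with a `filled` bitmask and a hypothetical `driver_get_station()` that, like a driver's get_station callback, only sets the fields it knows about; none of these names are cfg80211 API. A function-local `static` struct keeps `filled` bits from an earlier call, so stale data is reported later, while a zeroed local starts clean every time. The nl80211 half (an uninitialized automatic variable) is undefined behaviour and cannot be shown deterministically. The kernel patch uses `= {}`, a GNU C extension also accepted by C23; the sketch uses `= {0}` for portability.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct station_info. */
#define INFO_TX_BITRATE (1u << 0)

struct sta_info {
	uint32_t filled;   /* bitmask of valid fields, set by the driver */
	int tx_bitrate;
};

/* Hypothetical driver callback: fills only what it currently knows and
 * ORs in the matching bit, trusting the caller to pass a zeroed struct. */
static void driver_get_station(struct sta_info *s, int have_rate, int rate)
{
	if (have_rate) {
		s->filled |= INFO_TX_BITRATE;
		s->tx_bitrate = rate;
	}
}

/* Buggy pattern: the static struct keeps 'filled' bits across calls. */
static int get_rate_static(int have_rate, int rate)
{
	static struct sta_info sinfo;

	driver_get_station(&sinfo, have_rate, rate);
	return (sinfo.filled & INFO_TX_BITRATE) ? sinfo.tx_bitrate : -1;
}

/* Fixed pattern: a zero-initialized local on every call. */
static int get_rate_zeroed(int have_rate, int rate)
{
	struct sta_info sinfo = {0};

	driver_get_station(&sinfo, have_rate, rate);
	return (sinfo.filled & INFO_TX_BITRATE) ? sinfo.tx_bitrate : -1;
}
```

Calling `get_rate_static(1, 54)` and then `get_rate_static(0, 0)` returns 54 both times, even though the second query had no rate; the zeroed variant returns -1 for the second query.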

Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM")
Fixes: 9930380f0bd8 ("cfg80211: implement IWRATE")
Signed-off-by: Johannes Berg 
---
Dave, can you apply this as an exception? I'm not really expecting
any other patches to show up now, and it seems easier to have a single
patch than a whole pull request (especially now that patchwork seems
to be swallowing mine ...)
---
 net/wireless/nl80211.c | 2 +-
 net/wireless/wext-compat.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index c084dd2205ac..91e55bb85416 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -9832,7 +9832,7 @@ static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
 */
if (!wdev->cqm_config->last_rssi_event_value && wdev->current_bss &&
rdev->ops->get_station) {
-   struct station_info sinfo;
+   struct station_info sinfo = {};
u8 *mac_addr;
 
mac_addr = wdev->current_bss->pub.bssid;
diff --git a/net/wireless/wext-compat.c b/net/wireless/wext-compat.c
index 7ca04a7de85a..05186a47878f 100644
--- a/net/wireless/wext-compat.c
+++ b/net/wireless/wext-compat.c
@@ -1254,8 +1254,7 @@ static int cfg80211_wext_giwrate(struct net_device *dev,
 {
struct wireless_dev *wdev = dev->ieee80211_ptr;
struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
-   /* we are under RTNL - globally locked - so can use a static struct */
-   static struct station_info sinfo;
+   struct station_info sinfo = {};
u8 addr[ETH_ALEN];
int err;
 
-- 
2.15.1

Re: [PATCH net-next 0/8] net: sched: cls: add extack support

2018-01-16 Thread Jamal Hadi Salim


On 18-01-16 04:46 PM, Jakub Kicinski wrote:
> On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote:
>> [..]
>
> Ugh, this is going to conflict with our series too :(  (and I CCed you
> on ours)
>
> Would it be OK for you to hold off until Jiri's code gets merged and
> ours comes down via bpf-next?  That shouldn't take long at all.  The
> conflicts between bpf/bpf-next/net-next are really taking their toll
> on us this release cycle, I would really appreciate if we could make
> some progress on this relatively simple series at least...

I would say precedence should be Jiri's patches, Alex's patches
and then yours:
Alex's patches fix the core (cls_api.c) area with proper extack
for the core and then he has one patch to cover a specific
use case of the u32 classifier extack. Yours is only concerned
with one use case, bpf, which depends on the core (that is in Alex's
patches).

cheers,
jamal

[PATCH v3 net-next 2/4] l2tp: remove l2specific_len dependency in l2tp_core

2018-01-16 Thread Lorenzo Bianconi

Remove the l2specific_len dependency when building the L2TPv3 header or
parsing a received frame, since the default L2-Specific Sublayer is
always four bytes long and we don't need to rely on a user-supplied
value.
Moreover, the l2tp netlink code has no sanity checks to
enforce the relation between l2specific_len and l2specific_type,
so a malformed netlink message can set
l2specific_type to L2TP_L2SPECTYPE_DEFAULT (or even
L2TP_L2SPECTYPE_NONE) and l2specific_len to a value greater than
4, leaking memory onto the wire and sending corrupted frames.
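The header-length arithmetic after this patch can be summarised in a small model. `struct session_model`, `enum l2spec_type`, `l2spec_len()` and `l2tpv3_hdr_len()` are hypothetical names used only here; the constants follow the hunks in this patch (4 bytes for the session ID, `cookie_len`, 4 bytes for the default sublayer, and 4 more under UDP encapsulation). Because the sublayer length is now derived from the type, userspace can no longer ask for more header bytes than the code actually writes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the session fields that matter here. */
enum l2spec_type { L2SPECTYPE_NONE, L2SPECTYPE_DEFAULT };

struct session_model {
	enum l2spec_type l2specific_type;
	int cookie_len;    /* 0, 4 or 8 bytes */
	bool udp_encap;    /* tunnel uses UDP encapsulation */
};

/* Sublayer length derived from the type, as l2tp_get_l2specific_len() does. */
static int l2spec_len(const struct session_model *s)
{
	switch (s->l2specific_type) {
	case L2SPECTYPE_DEFAULT:
		return 4;
	case L2SPECTYPE_NONE:
	default:
		return 0;
	}
}

/* L2TPv3 header length: session ID + cookie + sublayer (+ 4 for UDP). */
static int l2tpv3_hdr_len(const struct session_model *s)
{
	int len = 4 + s->cookie_len + l2spec_len(s);

	if (s->udp_encap)
		len += 4;
	return len;
}
```

For instance, a UDP-encapsulated session with an 8-byte cookie and the default sublayer gets a 20-byte header (4 + 8 + 4 + 4).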

Reviewed-by: Guillaume Nault 
Tested-by: Guillaume Nault 
Signed-off-by: Lorenzo Bianconi 
---
 net/l2tp/l2tp_core.c | 34 --
 net/l2tp/l2tp_core.h | 11 +++
 2 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 62285fc6eb59..88efb8b845ca 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -730,11 +730,9 @@ void l2tp_recv_common(struct l2tp_session *session, struct sk_buff *skb,
 "%s: recv data ns=%u, session nr=%u\n",
 session->name, ns, session->nr);
}
+   ptr += 4;
}
 
-   /* Advance past L2-specific header, if present */
-   ptr += session->l2specific_len;
-
if (L2TP_SKB_CB(skb)->has_seq) {
/* Received a packet with sequence numbers. If we're the LNS,
 * check if we sre sending sequence numbers and if not,
@@ -1048,21 +1046,20 @@ static int l2tp_build_l2tpv3_header(struct l2tp_session *session, void *buf)
memcpy(bufp, &session->cookie[0], session->cookie_len);
bufp += session->cookie_len;
}
-   if (session->l2specific_len) {
-   if (session->l2specific_type == L2TP_L2SPECTYPE_DEFAULT) {
-   u32 l2h = 0;
-   if (session->send_seq) {
-   l2h = 0x4000 | session->ns;
-   session->ns++;
-   session->ns &= 0xff;
-   l2tp_dbg(session, L2TP_MSG_SEQ,
-"%s: updated ns to %u\n",
-session->name, session->ns);
-   }
+   if (session->l2specific_type == L2TP_L2SPECTYPE_DEFAULT) {
+   u32 l2h = 0;
 
-   *((__be32 *) bufp) = htonl(l2h);
+   if (session->send_seq) {
+   l2h = 0x4000 | session->ns;
+   session->ns++;
+   session->ns &= 0xff;
+   l2tp_dbg(session, L2TP_MSG_SEQ,
+"%s: updated ns to %u\n",
+session->name, session->ns);
}
-   bufp += session->l2specific_len;
+
+   *((__be32 *)bufp) = htonl(l2h);
+   bufp += 4;
}
 
return bufp - optr;
@@ -1719,7 +1716,7 @@ int l2tp_session_delete(struct l2tp_session *session)
 EXPORT_SYMBOL_GPL(l2tp_session_delete);
 
 /* We come here whenever a session's send_seq, cookie_len or
- * l2specific_len parameters are set.
+ * l2specific_type parameters are set.
  */
 void l2tp_session_set_header_len(struct l2tp_session *session, int version)
 {
@@ -1728,7 +1725,8 @@ void l2tp_session_set_header_len(struct l2tp_session *session, int version)
if (session->send_seq)
session->hdr_len += 4;
} else {
-   session->hdr_len = 4 + session->cookie_len + session->l2specific_len;
+   session->hdr_len = 4 + session->cookie_len;
+   session->hdr_len += l2tp_get_l2specific_len(session);
if (session->tunnel->encap == L2TP_ENCAPTYPE_UDP)
session->hdr_len += 4;
}
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index c2e9bbd79b35..7bef304de4f0 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -302,6 +302,17 @@ static inline void l2tp_session_dec_refcount(struct l2tp_session *session)
l2tp_session_free(session);
 }
 
+static inline int l2tp_get_l2specific_len(struct l2tp_session *session)
+{
+   switch (session->l2specific_type) {
+   case L2TP_L2SPECTYPE_DEFAULT:
+   return 4;
+   case L2TP_L2SPECTYPE_NONE:
+   default:
+   return 0;
+   }
+}
+
 #define l2tp_printk(ptr, type, func, fmt, ...) \
 do {   \
if (((ptr)->debug) & (type))\
-- 
2.13.6

[PATCH v3 net-next 3/4] l2tp: remove l2specific_len configurable parameter

2018-01-16 Thread Lorenzo Bianconi

Remove the l2specific_len configuration parameter, since the L2-Specific
Sublayer length is now computed from the l2specific_type provided by
userspace.

Reviewed-by: Guillaume Nault 
Tested-by: Guillaume Nault 
Signed-off-by: Lorenzo Bianconi 
---
 net/l2tp/l2tp_core.c| 1 -
 net/l2tp/l2tp_core.h| 2 --
 net/l2tp/l2tp_debugfs.c | 2 +-
 net/l2tp/l2tp_netlink.c | 4 
 4 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 88efb8b845ca..194a7483bb93 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1777,7 +1777,6 @@ struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunn
session->lns_mode = cfg->lns_mode;
session->reorder_timeout = cfg->reorder_timeout;
session->l2specific_type = cfg->l2specific_type;
-   session->l2specific_len = cfg->l2specific_len;
session->cookie_len = cfg->cookie_len;
memcpy(&session->cookie[0], &cfg->cookie[0], cfg->cookie_len);
session->peer_cookie_len = cfg->peer_cookie_len;
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 7bef304de4f0..9bbee90e9963 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -59,7 +59,6 @@ struct l2tp_session_cfg {
int debug;  /* bitmask of debug message
 * categories */
u16 vlan_id;/* VLAN pseudowire only */
-   u16 l2specific_len; /* Layer 2 specific length */
u16 l2specific_type; /* Layer 2 specific type */
u8  cookie[8];  /* optional cookie */
int cookie_len; /* 0, 4 or 8 bytes */
@@ -85,7 +84,6 @@ struct l2tp_session {
int cookie_len;
u8  peer_cookie[8];
int peer_cookie_len;
-   u16 l2specific_len;
u16 l2specific_type;
u16 hdr_len;
u32 nr; /* session NR state (receive) */
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index 2c30587d1a14..72e713da4733 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -181,7 +181,7 @@ static void l2tp_dfs_seq_session_show(struct seq_file *m, void *v)
   session->debug,
   jiffies_to_msecs(session->reorder_timeout));
seq_printf(m, "   offset 0 l2specific %hu/%hu\n",
-  session->l2specific_type, session->l2specific_len);
+  session->l2specific_type, l2tp_get_l2specific_len(session));
if (session->cookie_len) {
seq_printf(m, "   cookie %02x%02x%02x%02x",
   session->cookie[0], session->cookie[1],
diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 9ba2b8a68f65..405a5341ed1e 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -561,10 +561,6 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf
cfg.l2specific_type = L2TP_L2SPECTYPE_DEFAULT;
}
 
-   cfg.l2specific_len = 4;
-   if (info->attrs[L2TP_ATTR_L2SPEC_LEN])
-   cfg.l2specific_len = nla_get_u8(info->attrs[L2TP_ATTR_L2SPEC_LEN]);
-
if (info->attrs[L2TP_ATTR_COOKIE]) {
u16 len = nla_len(info->attrs[L2TP_ATTR_COOKIE]);
if (len > 8) {
-- 
2.13.6

[PATCH v3 net-next 4/4] l2tp: mark L2TP_ATTR_L2SPEC_LEN as not used

2018-01-16 Thread Lorenzo Bianconi

Reviewed-by: Guillaume Nault 
Tested-by: Guillaume Nault 
Signed-off-by: Lorenzo Bianconi 
---
 include/uapi/linux/l2tp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
index 71e62795104d..7d570c7bd117 100644
--- a/include/uapi/linux/l2tp.h
+++ b/include/uapi/linux/l2tp.h
@@ -97,7 +97,7 @@ enum {
L2TP_ATTR_OFFSET,   /* u16 (not used) */
L2TP_ATTR_DATA_SEQ, /* u16 */
L2TP_ATTR_L2SPEC_TYPE,  /* u8, enum l2tp_l2spec_type */
-   L2TP_ATTR_L2SPEC_LEN,   /* u8, enum l2tp_l2spec_type */
+   L2TP_ATTR_L2SPEC_LEN,   /* u8 (not used) */
L2TP_ATTR_PROTO_VERSION,/* u8 */
L2TP_ATTR_IFNAME,   /* string */
L2TP_ATTR_CONN_ID,  /* u32 */
-- 
2.13.6

[PATCH v3 net-next 1/4] l2tp: double-check l2specific_type provided by userspace

2018-01-16 Thread Lorenzo Bianconi

Add a sanity check on the l2specific_type provided by userspace in
l2tp_nl_cmd_session_create(), since only L2TP_L2SPECTYPE_DEFAULT and
L2TP_L2SPECTYPE_NONE are currently supported.
Moreover, explicitly set l2specific_type to L2TP_L2SPECTYPE_DEFAULT
only if userspace does not provide a value for it.
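The attribute handling this patch introduces boils down to three cases: attribute absent (use the default type), attribute present and supported (use it), attribute present but unsupported (fail with -EINVAL). The sketch below models only that decision; `parse_l2spec_type()`, its `present`/`value` parameters and `enum l2spec_type` are hypothetical stand-ins for the netlink attribute lookup and the uapi enum, not kernel API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum l2tp_l2spec_type. */
enum l2spec_type { L2SPECTYPE_NONE, L2SPECTYPE_DEFAULT };

/* 'present' says whether the L2SPEC_TYPE attribute was supplied and
 * 'value' is its u8 payload; on success the type is stored in *out. */
static int parse_l2spec_type(bool present, unsigned char value,
			     enum l2spec_type *out)
{
	if (!present) {
		/* Default only when userspace did not specify a type. */
		*out = L2SPECTYPE_DEFAULT;
		return 0;
	}
	if (value != L2SPECTYPE_DEFAULT && value != L2SPECTYPE_NONE)
		return -EINVAL;   /* unsupported sublayer type */
	*out = (enum l2spec_type)value;
	return 0;
}
```

Rejecting unknown values up front means the later length computation never has to guess what an unsupported type would require.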

Reviewed-by: Guillaume Nault 
Tested-by: Guillaume Nault 
Signed-off-by: Lorenzo Bianconi 
---
 net/l2tp/l2tp_netlink.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index e1ca29f79821..9ba2b8a68f65 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -550,9 +550,16 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf
if (info->attrs[L2TP_ATTR_DATA_SEQ])
cfg.data_seq = nla_get_u8(info->attrs[L2TP_ATTR_DATA_SEQ]);
 
-   cfg.l2specific_type = L2TP_L2SPECTYPE_DEFAULT;
-   if (info->attrs[L2TP_ATTR_L2SPEC_TYPE])
+   if (info->attrs[L2TP_ATTR_L2SPEC_TYPE]) {
cfg.l2specific_type = nla_get_u8(info->attrs[L2TP_ATTR_L2SPEC_TYPE]);
+   if (cfg.l2specific_type != L2TP_L2SPECTYPE_DEFAULT &&
+   cfg.l2specific_type != L2TP_L2SPECTYPE_NONE) {
+   ret = -EINVAL;
+   goto out_tunnel;
+   }
+   } else {
+   cfg.l2specific_type = L2TP_L2SPECTYPE_DEFAULT;
+   }
 
cfg.l2specific_len = 4;
if (info->attrs[L2TP_ATTR_L2SPEC_LEN])
-- 
2.13.6

[PATCH v3 net-next 0/4] l2tp: set l2specific_len based on l2specific_type

2018-01-16 Thread Lorenzo Bianconi

Do not rely on the l2specific_len value provided by userspace; instead, set
the sublayer length according to l2specific_type.
Mark the L2TP_ATTR_L2SPEC_LEN attribute as not used.

Changes since v2:
- drop the patch related to a fix in the switch default case in
  l2tp_nl_cmd_session_create()
- use L2SPECTYPE_NONE as default case in l2tp_get_l2specific_len()

Changes since v1:
- remove l2specific_len parameter
- add sanity check on l2specific_type provided by userspace

Lorenzo Bianconi (4):
  l2tp: double-check l2specific_type provided by userspace
  l2tp: remove l2specific_len dependency in l2tp_core
  l2tp: remove l2specific_len configurable parameter
  l2tp: mark L2TP_ATTR_L2SPEC_LEN as not used

 include/uapi/linux/l2tp.h |  2 +-
 net/l2tp/l2tp_core.c  | 35 ---
 net/l2tp/l2tp_core.h  | 13 +++--
 net/l2tp/l2tp_debugfs.c   |  2 +-
 net/l2tp/l2tp_netlink.c   | 15 +--
 5 files changed, 38 insertions(+), 29 deletions(-)

-- 
2.13.6

Re: [PATCH net-next 8/8] net: sched: cls_u32: add extack support

2018-01-16 Thread Jakub Kicinski

On Tue, 16 Jan 2018 12:20:27 -0500, Alexander Aring wrote:
> @@ -780,14 +787,18 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
>   u32 handle = nla_get_u32(tb[TCA_U32_LINK]);
>   struct tc_u_hnode *ht_down = NULL, *ht_old;
>  
> - if (TC_U32_KEY(handle))
> + if (TC_U32_KEY(handle)) {
> + NL_SET_ERR_MSG(extack, "u32 Link handle must be a hash table");
>   return -EINVAL;
> + }

Since classifiers are commonly built as modules would it make more
sense to use NL_SET_ERR_MSG_MOD()?
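As a rough illustration of the suggestion: in the kernel, NL_SET_ERR_MSG_MOD() wraps NL_SET_ERR_MSG() and prepends KBUILD_MODNAME and ": " to the literal message, so the user can tell which module rejected the request. The userspace model below mimics that behaviour; `struct ext_ack`, `MODNAME`, the `SET_ERR_MSG*` macros and `set_link()` are hypothetical stand-ins, not the kernel definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_ext_ack: only the message. */
struct ext_ack {
	const char *msg;
};

/* Hypothetical module name; in the kernel, KBUILD_MODNAME is provided
 * by the build system for every object file. */
#define MODNAME "cls_u32"

/* 'm' must be a string literal so the prefix is pasted on at compile
 * time; a NULL extack is silently ignored. */
#define SET_ERR_MSG(extack, m) do {			\
		static const char err_msg_[] = m;	\
		struct ext_ack *ack_ = (extack);	\
							\
		if (ack_)				\
			ack_->msg = err_msg_;		\
	} while (0)

#define SET_ERR_MSG_MOD(extack, m) SET_ERR_MSG((extack), MODNAME ": " m)

/* Hypothetical caller shaped like the u32_set_parms() hunk quoted above. */
static int set_link(struct ext_ack *extack, int handle_is_key)
{
	if (handle_is_key) {
		SET_ERR_MSG_MOD(extack, "u32 Link handle must be a hash table");
		return -EINVAL;
	}
	return 0;
}
```

With the module variant, the reported string becomes "cls_u32: u32 Link handle must be a hash table", which is what makes it attractive for classifiers built as modules.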

Re: [PATCH net-next 0/8] net: sched: cls: add extack support

2018-01-16 Thread Jakub Kicinski

On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote:
> Hi,
> 
> this patch adds extack support for TC classifier subsystem. The first
> patch fixes some code style issues for this patch series pointed out
> by checkpatch. The other patches until the last one prepares extack
> handling for the TC classifier subsystem and handle generic extack
> errors.
> 
> The last patch is an example for u32 classifier to add extack support
> inside the callbacks delete and change. There exists a init callback as
> well, but most classifier implementation run a kalloc() once to allocate
> something. Not necessary _yet_ to add extack support now.
> 
> I know there are patches around which makes changes to these files.
> I will rebase my stuff on Jiri's patches if they get in before mine.

Ugh, this is going to conflict with our series too :(  (and I CCed you
on ours)

Would it be OK for you to hold off until Jiri's code gets merged and
ours comes down via bpf-next?  That shouldn't take long at all.  The
conflicts between bpf/bpf-next/net-next are really taking their toll 
on us this release cycle, I would really appreciate if we could make
some progress on this relatively simple series at least...

[PATCH bpf-next v3 02/11] net: sched: cls_flower: propagate extack support for filter offload

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_flower. This makes it
possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 net/sched/cls_flower.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 998ee4faf934..ebbaba4a214b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -234,7 +234,8 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
 static int fl_hw_replace_filter(struct tcf_proto *tp,
struct flow_dissector *dissector,
struct fl_flow_key *mask,
-   struct cls_fl_filter *f)
+   struct cls_fl_filter *f,
+   struct netlink_ext_ack *extack)
 {
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = tp->chain->block;
@@ -939,7 +940,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
err = fl_hw_replace_filter(tp,
   >dissector,
   ,
-  fnew);
+  fnew,
+  extack);
if (err)
goto errout_idr;
}
-- 
2.15.1

[PATCH bpf-next v3 08/11] nfp: bpf: plumb extack into functions related to XDP offload

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Pass a pointer to an extack object to nfp_app_xdp_offload() in order to
prepare for extack usage in the nfp driver. Next step will be to forward
this extack pointer to nfp_net_bpf_offload(), once this function is able
to use it for printing error messages.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c   | 4 ++--
 drivers/net/ethernet/netronome/nfp/nfp_app.h| 9 ++---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 8823c8360047..e8816ab8fb63 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -54,7 +54,7 @@ static bool nfp_net_ebpf_capable(struct nfp_net *nn)
 
 static int
 nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn,
-   struct bpf_prog *prog)
+   struct bpf_prog *prog, struct netlink_ext_ack *extack)
 {
bool running, xdp_running;
int ret;
@@ -73,7 +73,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn,
ret = nfp_net_bpf_offload(nn, prog, running);
/* Stop offload if replace not possible */
if (ret && prog)
-   nfp_bpf_xdp_offload(app, nn, NULL);
+   nfp_bpf_xdp_offload(app, nn, NULL, extack);
 
nn->dp.bpf_offload_xdp = prog && !ret;
return ret;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index 6a6eb02b516e..1229a34f8da5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -43,6 +43,7 @@
 struct bpf_prog;
 struct net_device;
 struct netdev_bpf;
+struct netlink_ext_ack;
 struct pci_dev;
 struct sk_buff;
 struct sk_buff;
@@ -134,7 +135,8 @@ struct nfp_app_type {
int (*bpf)(struct nfp_app *app, struct nfp_net *nn,
   struct netdev_bpf *xdp);
int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn,
-  struct bpf_prog *prog);
+  struct bpf_prog *prog,
+  struct netlink_ext_ack *extack);
 
int (*sriov_enable)(struct nfp_app *app, int num_vfs);
void (*sriov_disable)(struct nfp_app *app);
@@ -320,11 +322,12 @@ static inline int nfp_app_bpf(struct nfp_app *app, struct nfp_net *nn,
 }
 
 static inline int nfp_app_xdp_offload(struct nfp_app *app, struct nfp_net *nn,
- struct bpf_prog *prog)
+ struct bpf_prog *prog,
+ struct netlink_ext_ack *extack)
 {
if (!app || !app->type->xdp_offload)
return -EOPNOTSUPP;
-   return app->type->xdp_offload(app, nn, prog);
+   return app->type->xdp_offload(app, nn, prog, extack);
 }
 
 static inline bool __nfp_app_ctrl_tx(struct nfp_app *app, struct sk_buff *skb)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2b5cad3069a7..14f23e8d27fa 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3395,7 +3395,7 @@ nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags,
if (err)
return err;
 
-   err = nfp_app_xdp_offload(nn->app, nn, offload_prog);
+   err = nfp_app_xdp_offload(nn->app, nn, offload_prog, extack);
if (err && flags & XDP_FLAGS_HW_MODE)
return err;
 
-- 
2.15.1

[PATCH bpf-next v3 04/11] net: sched: cls_u32: propagate extack support for filter offload

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_u32. This makes it
possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 net/sched/cls_u32.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3ef5c32741c1..671eb952f6af 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -501,7 +501,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
 }
 
 static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
-   u32 flags)
+   u32 flags, struct netlink_ext_ack *extack)
 {
struct tcf_block *block = tp->chain->block;
struct tc_cls_u32_offload cls_u32 = {};
@@ -542,7 +542,7 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
 }
 
 static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
-   u32 flags)
+   u32 flags, struct netlink_ext_ack *extack)
 {
struct tcf_block *block = tp->chain->block;
struct tc_cls_u32_offload cls_u32 = {};
@@ -943,7 +943,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
return err;
}
 
-   err = u32_replace_hw_knode(tp, new, flags);
+   err = u32_replace_hw_knode(tp, new, flags, extack);
if (err) {
u32_destroy_key(tp, new, false);
return err;
@@ -990,7 +990,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
ht->prio = tp->prio;
idr_init(&ht->handle_idr);
 
-   err = u32_replace_hw_hnode(tp, ht, flags);
+   err = u32_replace_hw_hnode(tp, ht, flags, extack);
if (err) {
idr_remove_ext(&tp_c->handle_idr, handle);
kfree(ht);
@@ -1088,7 +1088,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
struct tc_u_knode __rcu **ins;
struct tc_u_knode *pins;
 
-   err = u32_replace_hw_knode(tp, n, flags);
+   err = u32_replace_hw_knode(tp, n, flags, extack);
if (err)
goto errhw;
 
-- 
2.15.1

[PATCH bpf-next v3 01/11] net: sched: add extack support to change() classifier operation

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Add an extra argument to `->change()` operation for passing a pointer to
a struct netlink_ext_ack. Update the operation for all classifiers
accordingly. Extack is not used at this point.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 include/net/sch_generic.h | 3 ++-
 net/sched/cls_api.c   | 3 ++-
 net/sched/cls_basic.c | 3 ++-
 net/sched/cls_bpf.c   | 2 +-
 net/sched/cls_cgroup.c| 3 ++-
 net/sched/cls_flow.c  | 2 +-
 net/sched/cls_flower.c| 2 +-
 net/sched/cls_fw.c| 2 +-
 net/sched/cls_matchall.c  | 2 +-
 net/sched/cls_route.c | 3 ++-
 net/sched/cls_rsvp.h  | 2 +-
 net/sched/cls_tcindex.c   | 3 ++-
 net/sched/cls_u32.c   | 3 ++-
 13 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ac029d5d88e4..5e77f2639c67 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -232,7 +232,8 @@ struct tcf_proto_ops {
int (*change)(struct net *net, struct sk_buff *,
struct tcf_proto*, unsigned long,
u32 handle, struct nlattr **,
-   void **, bool);
+   void **, bool,
+   struct netlink_ext_ack *);
int (*delete)(struct tcf_proto*, void *, bool*);
void(*walk)(struct tcf_proto*, struct tcf_walker *arg);
void(*bind_class)(void *, u32, unsigned long);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6708b6953bfa..0460cc22d48c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -912,7 +912,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
}
 
err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, ,
- n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE);
+ n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
+ extack);
if (err == 0) {
if (tp_created)
tcf_chain_tp_insert(chain, _info, tp);
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 5f169ded347e..2cc38cd71938 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -175,7 +175,8 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
 
 static int basic_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base, u32 handle,
-   struct nlattr **tca, void **arg, bool ovr)
+   struct nlattr **tca, void **arg, bool ovr,
+   struct netlink_ext_ack *extack)
 {
int err;
struct basic_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 8d78e7f4ecc3..fcb831b3917e 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -449,7 +449,7 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp,
 static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
  struct tcf_proto *tp, unsigned long base,
  u32 handle, struct nlattr **tca,
- void **arg, bool ovr)
+ void **arg, bool ovr, struct netlink_ext_ack *extack)
 {
struct cls_bpf_head *head = rtnl_dereference(tp->root);
struct cls_bpf_prog *oldprog = *arg;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 309d5899265f..b74af0b55820 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -91,7 +91,8 @@ static void cls_cgroup_destroy_rcu(struct rcu_head *root)
 static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 struct tcf_proto *tp, unsigned long base,
 u32 handle, struct nlattr **tca,
-void **arg, bool ovr)
+void **arg, bool ovr,
+struct netlink_ext_ack *extack)
 {
struct nlattr *tb[TCA_CGROUP_MAX + 1];
struct cls_cgroup_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 25c2a888e1f0..e944f01d5394 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -401,7 +401,7 @@ static void flow_destroy_filter(struct rcu_head *head)
 static int flow_change(struct net *net, struct sk_buff *in_skb,
   struct tcf_proto *tp, unsigned long base,
   u32 handle, struct nlattr **tca,
-  void **arg, bool ovr)
+  void **arg, bool ovr, struct netlink_ext_ack *extack)
 {
struct

[PATCH bpf-next v3 07/11] net: sched: create tc_can_offload_extack() wrapper

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Create a wrapper around tc_can_offload() that takes an additional
extack pointer argument in order to output an error message if TC
offload is disabled on the device.

In this way, the error message is handled by the core and can be the
same for all drivers.
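The wrapper pattern can be tried outside the kernel with a small model. Everything here (`struct netdev_model`, `FEATURE_HW_TC`, `struct ext_ack` and both function names) is a hypothetical stand-in for net_device, NETIF_F_HW_TC and netlink_ext_ack; the point is only that the check and its message live in one shared helper, so every caller reports identical text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types used by the wrapper. */
#define FEATURE_HW_TC (1u << 0)   /* plays the role of NETIF_F_HW_TC */

struct ext_ack { const char *msg; };
struct netdev_model { unsigned int features; };

static bool can_offload(const struct netdev_model *dev)
{
	return dev->features & FEATURE_HW_TC;
}

/* Same shape as tc_can_offload_extack(): the helper owns the message,
 * so drivers calling it do not each invent their own wording. */
static bool can_offload_extack(const struct netdev_model *dev,
			       struct ext_ack *extack)
{
	bool can = can_offload(dev);

	if (!can && extack)
		extack->msg = "TC offload is disabled on net device";
	return can;
}
```

A driver then only needs `if (!can_offload_extack(dev, extack)) return -EOPNOTSUPP;`, as the nfp conversion later in this series does with the real helper.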

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 include/net/pkt_cls.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index c88c61234cb3..a3ad6a5a2d12 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -644,6 +644,17 @@ static inline bool tc_can_offload(const struct net_device *dev)
return dev->features & NETIF_F_HW_TC;
 }
 
+static inline bool tc_can_offload_extack(const struct net_device *dev,
+struct netlink_ext_ack *extack)
+{
+   bool can = tc_can_offload(dev);
+
+   if (!can)
+   NL_SET_ERR_MSG(extack, "TC offload is disabled on net device");
+
+   return can;
+}
+
 static inline bool tc_skip_hw(u32 flags)
 {
return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false;
-- 
2.15.1

[PATCH bpf-next v3 09/11] nfp: bpf: use extack support to improve debugging

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Use the recently added extack support for eBPF offload in the driver.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c| 31 ++--
 drivers/net/ethernet/netronome/nfp/bpf/main.h|  2 +-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 24 +++---
 3 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index e8816ab8fb63..a638c3ab6b61 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -70,7 +70,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn,
if (prog && running && !xdp_running)
return -EBUSY;
 
-   ret = nfp_net_bpf_offload(nn, prog, running);
+   ret = nfp_net_bpf_offload(nn, prog, running, extack);
/* Stop offload if replace not possible */
if (ret && prog)
nfp_bpf_xdp_offload(app, nn, NULL, extack);
@@ -125,17 +125,31 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
struct nfp_bpf_vnic *bv;
int err;
 
-   if (type != TC_SETUP_CLSBPF ||
-   !tc_can_offload(nn->dp.netdev) ||
-   !nfp_net_ebpf_capable(nn) ||
-   cls_bpf->common.protocol != htons(ETH_P_ALL) ||
-   cls_bpf->common.chain_index)
+   if (type != TC_SETUP_CLSBPF) {
+   NL_SET_ERR_MSG_MOD(cls_bpf->common.extack,
+  "only offload of BPF classifiers supported");
+   return -EOPNOTSUPP;
+   }
+   if (!tc_can_offload_extack(nn->dp.netdev, cls_bpf->common.extack))
+   return -EOPNOTSUPP;
+   if (!nfp_net_ebpf_capable(nn)) {
+   NL_SET_ERR_MSG_MOD(cls_bpf->common.extack,
+  "NFP firmware does not support eBPF offload");
+   return -EOPNOTSUPP;
+   }
+   if (cls_bpf->common.protocol != htons(ETH_P_ALL)) {
+   NL_SET_ERR_MSG_MOD(cls_bpf->common.extack,
+  "only ETH_P_ALL supported as filter protocol");
+   return -EOPNOTSUPP;
+   }
+   if (cls_bpf->common.chain_index)
return -EOPNOTSUPP;
 
/* Only support TC direct action */
if (!cls_bpf->exts_integrated ||
tcf_exts_has_actions(cls_bpf->exts)) {
-   nn_err(nn, "only direct action with no legacy actions supported\n");
+   NL_SET_ERR_MSG_MOD(cls_bpf->common.extack,
+  "only direct action with no legacy actions supported");
return -EOPNOTSUPP;
}
 
@@ -152,7 +166,8 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
return 0;
}
 
-   err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog);
+   err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog,
+ cls_bpf->common.extack);
if (err)
return err;
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index b80e75a8ecda..80855d43b25e 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -334,7 +334,7 @@ struct nfp_net;
 int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn,
struct netdev_bpf *bpf);
 int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog *prog,
-   bool old_prog);
+   bool old_prog, struct netlink_ext_ack *extack);
 
 struct nfp_insn_meta *
 nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index e2859b2e9c6a..9c78a09cda24 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -271,7 +271,9 @@ int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf)
}
 }
 
-static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog)
+static int
+nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog,
+struct netlink_ext_ack *extack)
 {
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
unsigned int max_mtu;
@@ -281,7 +283,7 @@ static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog)
 
max_mtu = nn_readb(nn, NFP_NET_CFG_BPF_INL_MTU) * 64 - 32;
if (max_mtu < nn->dp.netdev->mtu) {
-   nn_info(nn, "BPF offload not supported with MTU larger than HW packet split boundary\n");
+   NL_SET_ERR_MSG_MOD(extack, "BPF offload not supported with MTU larger than HW packet split boundary");

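A minimal userspace sketch of the MTU check in the last hunk, for illustration only. It assumes a pared-down struct netlink_ext_ack holding just a message pointer, models NL_SET_ERR_MSG() and NL_SET_ERR_MSG_MOD() on their kernel counterparts, and uses a hypothetical MOD_NAME in place of KBUILD_MODNAME. The reg argument stands in for nn_readb(nn, NFP_NET_CFG_BPF_INL_MTU); model_bpf_load() is a hypothetical name, and since the hunk is cut off, the -EOPNOTSUPP return after the message is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Pared-down stand-in for the kernel's struct netlink_ext_ack
 * (assumption: only the message pointer is modeled). */
struct netlink_ext_ack {
	const char *_msg;
};

/* Modeled on the kernel macro: record a pointer to a static string,
 * and do nothing when the caller supplied no extack. */
#define NL_SET_ERR_MSG(extack, msg) do {			\
	static const char __msg[] = msg;			\
	struct netlink_ext_ack *__extack = (extack);		\
								\
	if (__extack)						\
		__extack->_msg = __msg;				\
} while (0)

/* MOD_NAME is a hypothetical stand-in for KBUILD_MODNAME. */
#define MOD_NAME "nfp"
#define NL_SET_ERR_MSG_MOD(extack, msg) \
	NL_SET_ERR_MSG((extack), MOD_NAME ": " msg)

/* Same arithmetic as the hunk: register value * 64 - 32 bytes. */
static unsigned int bpf_inl_max_mtu(unsigned char reg)
{
	return reg * 64 - 32;
}

/* Hypothetical model of the MTU check in nfp_net_bpf_load(). */
static int model_bpf_load(unsigned char reg, unsigned int netdev_mtu,
			  struct netlink_ext_ack *extack)
{
	if (bpf_inl_max_mtu(reg) < netdev_mtu) {
		NL_SET_ERR_MSG_MOD(extack,
				   "BPF offload not supported with MTU larger than HW packet split boundary");
		return -EOPNOTSUPP;	/* assumed: the hunk is truncated */
	}
	return 0;
}
```

With reg = 24 the limit is 24 * 64 - 32 = 1504 bytes, so a 1500-byte MTU passes while a 9000-byte MTU is rejected and user space receives "nfp: BPF offload not supported with MTU larger than HW packet split boundary". Passing NULL for extack still returns the error but records nothing.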
[PATCH bpf-next v3 06/11] net: sched: add extack support for offload via tc_cls_common_offload

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Add extack support for hardware offload of classifiers. In order
to achieve this, a pointer to a struct netlink_ext_ack is added to the
struct tc_cls_common_offload that is passed to the callback for setting
up the classifier. Function tc_cls_common_offload_init() is updated to
support initialization of this new attribute.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 include/net/pkt_cls.h    | 5 ++++-
 net/sched/cls_bpf.c      | 4 ++--
 net/sched/cls_flower.c   | 6 +++---
 net/sched/cls_matchall.c | 4 ++--
 net/sched/cls_u32.c      | 8 ++++----
 5 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0d1343cba84c..c88c61234cb3 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -590,15 +590,18 @@ struct tc_cls_common_offload {
u32 chain_index;
__be16 protocol;
u32 prio;
+   struct netlink_ext_ack *extack;
 };
 
 static inline void
 tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common,
-  const struct tcf_proto *tp)
+  const struct tcf_proto *tp,
+  struct netlink_ext_ack *extack)
 {
cls_common->chain_index = tp->chain->index;
cls_common->protocol = tp->protocol;
cls_common->prio = tp->prio;
+   cls_common->extack = extack;
 }
 
 struct tc_cls_u32_knode {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 70397862da4a..d15ef9ab7243 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -159,7 +159,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
skip_sw = prog && tc_skip_sw(prog->gen_flags);
obj = prog ?: oldprog;
 
-   tc_cls_common_offload_init(&cls_bpf.common, tp);
+   tc_cls_common_offload_init(&cls_bpf.common, tp, extack);
cls_bpf.command = TC_CLSBPF_OFFLOAD;
cls_bpf.exts = &obj->exts;
cls_bpf.prog = prog ? prog->filter : NULL;
@@ -217,7 +217,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
struct tcf_block *block = tp->chain->block;
struct tc_cls_bpf_offload cls_bpf = {};
 
-   tc_cls_common_offload_init(&cls_bpf.common, tp);
+   tc_cls_common_offload_init(&cls_bpf.common, tp, NULL);
cls_bpf.command = TC_CLSBPF_STATS;
cls_bpf.exts = &prog->exts;
cls_bpf.prog = prog->filter;
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index ebbaba4a214b..fe7d96d12435 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -223,7 +223,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = tp->chain->block;
 
-   tc_cls_common_offload_init(&cls_flower.common, tp);
+   tc_cls_common_offload_init(&cls_flower.common, tp, NULL);
cls_flower.command = TC_CLSFLOWER_DESTROY;
cls_flower.cookie = (unsigned long) f;
 
@@ -242,7 +242,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
bool skip_sw = tc_skip_sw(f->flags);
int err;
 
-   tc_cls_common_offload_init(&cls_flower.common, tp);
+   tc_cls_common_offload_init(&cls_flower.common, tp, extack);
cls_flower.command = TC_CLSFLOWER_REPLACE;
cls_flower.cookie = (unsigned long) f;
cls_flower.dissector = dissector;
@@ -271,7 +271,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = tp->chain->block;
 
-   tc_cls_common_offload_init(&cls_flower.common, tp);
+   tc_cls_common_offload_init(&cls_flower.common, tp, NULL);
cls_flower.command = TC_CLSFLOWER_STATS;
cls_flower.cookie = (unsigned long) f;
cls_flower.exts = &f->exts;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 16752abcb76b..fe6b673db5c6 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -76,7 +76,7 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp,
struct tc_cls_matchall_offload cls_mall = {};
struct tcf_block *block = tp->chain->block;
 
-   tc_cls_common_offload_init(&cls_mall.common, tp);
+   tc_cls_common_offload_init(&cls_mall.common, tp, NULL);
cls_mall.command = TC_CLSMATCHALL_DESTROY;
cls_mall.cookie = cookie;
 
@@ -93,7 +93,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
bool skip_sw = tc_skip_sw(head->flags);
int err;
 
-   tc_cls_common_offload_init(&cls_mall.common, tp);
+   tc_cls_common_offload_init(&cls_mall.common, tp, extack);
cls_mall.command = TC_CLSMATCHALL_REPLACE;
cls_mall.exts = &head->exts;
cls_mall.cookie = cookie;
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 671eb952f6af..ef1b746de80b 100644
--- a/net/sched/cls_u32.c
+++

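A rough userspace sketch of what the new extack field enables, under stated assumptions: struct tcf_chain, struct tcf_proto and struct netlink_ext_ack are pared down to the fields the helper touches, and tc_cls_common_offload_init() mirrors the hunk in include/net/pkt_cls.h. demo_driver_cb() and its message are hypothetical; they show a driver explaining a refusal through common->extack on the replace path, while the stats and destroy paths, which pass NULL, stay silent.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pared-down userspace stand-ins (assumption: only the fields the
 * helper reads or writes are modeled). */
struct netlink_ext_ack {
	const char *_msg;
};

struct tcf_chain {
	uint32_t index;
};

struct tcf_proto {
	struct tcf_chain *chain;
	uint16_t protocol;	/* __be16 in the kernel */
	uint32_t prio;
};

struct tc_cls_common_offload {
	uint32_t chain_index;
	uint16_t protocol;
	uint32_t prio;
	struct netlink_ext_ack *extack;	/* field added by this patch */
};

/* Mirrors the updated helper from include/net/pkt_cls.h. */
static inline void
tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common,
			   const struct tcf_proto *tp,
			   struct netlink_ext_ack *extack)
{
	cls_common->chain_index = tp->chain->index;
	cls_common->protocol = tp->protocol;
	cls_common->prio = tp->prio;
	cls_common->extack = extack;
}

/* Hypothetical driver callback: rejects non-zero chains and explains
 * why, but only when the caller supplied an extack. */
static int demo_driver_cb(const struct tc_cls_common_offload *common)
{
	if (common->chain_index) {
		if (common->extack)
			common->extack->_msg = "only chain 0 supported";
		return -EOPNOTSUPP;
	}
	return 0;
}
```

Threading the pointer through the shared common struct, rather than adding a parameter to every driver callback, means each classifier only has to change its call to tc_cls_common_offload_init().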
[PATCH bpf-next v3 10/11] netdevsim: add extack support for TC eBPF offload

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Use the recently added extack support for TC eBPF filters in netdevsim.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/netdevsim/bpf.c | 35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 5134d5c1306c..0de8ba91b262 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -109,17 +109,35 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type,
struct netdevsim *ns = cb_priv;
struct bpf_prog *oldprog;
 
-   if (type != TC_SETUP_CLSBPF ||
-   !tc_can_offload(ns->netdev) ||
-   cls_bpf->common.protocol != htons(ETH_P_ALL) ||
-   cls_bpf->common.chain_index)
+   if (type != TC_SETUP_CLSBPF) {
+   NSIM_EA(cls_bpf->common.extack,
+   "only offload of BPF classifiers supported");
+   return -EOPNOTSUPP;
+   }
+
+   if (!tc_can_offload_extack(ns->netdev, cls_bpf->common.extack))
+   return -EOPNOTSUPP;
+
+   if (cls_bpf->common.protocol != htons(ETH_P_ALL)) {
+   NSIM_EA(cls_bpf->common.extack,
+   "only ETH_P_ALL supported as filter protocol");
+   return -EOPNOTSUPP;
+   }
+
+   if (cls_bpf->common.chain_index)
return -EOPNOTSUPP;
 
-   if (!ns->bpf_tc_accept)
+   if (!ns->bpf_tc_accept) {
+   NSIM_EA(cls_bpf->common.extack,
+   "netdevsim configured to reject BPF TC offload");
return -EOPNOTSUPP;
+   }
/* Note: progs without skip_sw will probably not be dev bound */
-   if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept)
+   if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) {
+   NSIM_EA(cls_bpf->common.extack,
+   "netdevsim configured to reject unbound programs");
return -EOPNOTSUPP;
+   }
 
if (cls_bpf->command != TC_CLSBPF_OFFLOAD)
return -EOPNOTSUPP;
@@ -131,8 +149,11 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type,
oldprog = NULL;
if (!cls_bpf->prog)
return 0;
-   if (ns->bpf_offloaded)
+   if (ns->bpf_offloaded) {
+   NSIM_EA(cls_bpf->common.extack,
+   "driver and netdev offload states mismatch");
return -EBUSY;
+   }
}
 
return nsim_bpf_offload(ns, cls_bpf->prog, oldprog);
-- 
2.15.1
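The reordered checks in nsim_bpf_setup_tc_block_cb() can be modeled in userspace roughly as follows. This is a sketch, not the driver code: struct demo_req and demo_nsim_checks() are hypothetical, DEMO_EA() stands in for NSIM_EA() (whose definition is not part of this excerpt and is assumed to be NULL-safe like NL_SET_ERR_MSG()), and the tc_can_offload_extack() step is omitted. The message strings are copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

struct netlink_ext_ack {
	const char *_msg;
};

/* Hypothetical stand-in for NSIM_EA(): assumed NULL-safe. */
#define DEMO_EA(extack, msg) do {				\
	struct netlink_ext_ack *__extack = (extack);		\
								\
	if (__extack)						\
		__extack->_msg = (msg);				\
} while (0)

/* Hypothetical summary of the state the callback inspects. */
struct demo_req {
	bool is_clsbpf;			/* type == TC_SETUP_CLSBPF */
	bool proto_is_all;		/* protocol == htons(ETH_P_ALL) */
	unsigned int chain_index;
	bool bpf_tc_accept;		/* ns->bpf_tc_accept */
	bool prog_unbound;		/* prog && !prog->aux->offload */
	bool bpf_tc_non_bound_accept;	/* ns->bpf_tc_non_bound_accept */
};

/* Same order as the patched callback: the first failing check wins. */
static int demo_nsim_checks(const struct demo_req *r,
			    struct netlink_ext_ack *extack)
{
	if (!r->is_clsbpf) {
		DEMO_EA(extack, "only offload of BPF classifiers supported");
		return -EOPNOTSUPP;
	}
	if (!r->proto_is_all) {
		DEMO_EA(extack, "only ETH_P_ALL supported as filter protocol");
		return -EOPNOTSUPP;
	}
	if (r->chain_index)	/* still rejected without a message */
		return -EOPNOTSUPP;
	if (!r->bpf_tc_accept) {
		DEMO_EA(extack, "netdevsim configured to reject BPF TC offload");
		return -EOPNOTSUPP;
	}
	if (r->prog_unbound && !r->bpf_tc_non_bound_accept) {
		DEMO_EA(extack, "netdevsim configured to reject unbound programs");
		return -EOPNOTSUPP;
	}
	return 0;
}
```

Because each check returns as soon as it fails, a request that is wrong in several ways only reports the first problem, and a non-zero chain_index is still rejected silently.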

[PATCH bpf-next v3 03/11] net: sched: cls_matchall: propagate extack support for filter offload

2018-01-16 Thread Jakub Kicinski

From: Quentin Monnet 

Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_matchall. This makes
it possible to report netlink extack messages at replacement time for
this filter in the future, although the pointer is not used yet.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 net/sched/cls_matchall.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index dc3c57116bbd..16752abcb76b 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -85,7 +85,8 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp,
 
 static int mall_replace_hw_filter(struct tcf_proto *tp,
  struct cls_mall_head *head,
- unsigned long cookie)
+ unsigned long cookie,
+ struct netlink_ext_ack *extack)
 {
struct tc_cls_matchall_offload cls_mall = {};
struct tcf_block *block = tp->chain->block;
@@ -202,7 +203,8 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
goto err_set_parms;
 
if (!tc_skip_hw(new->flags)) {
-   err = mall_replace_hw_filter(tp, new, (unsigned long) new);
+   err = mall_replace_hw_filter(tp, new, (unsigned long)new,
+extack);
if (err)
goto err_replace_hw_filter;
}
-- 
2.15.1

[PATCH bpf-next v3 00/11] net: sched: add extack support for cls offload

2018-01-16 Thread Jakub Kicinski

Hi!

This series adds extack to cls offloads; as such, it could arguably be
targeted at net-next.  Unfortunately, git am is not able to deal cleanly
with minor conflicts on the nfp patches.  Since the series is really
about cls_bpf, I hope it's OK if it goes via the bpf-next tree.

There is a very minor conflict with Jiri's series, but if this goes
via bpf-next, git will be able to deal with it on merge without a fuss.

Quentin says:

This series tries to improve user experience when eBPF hardware offload
hits error paths at load time. In particular, it introduces netlink
extended ack support in the nfp driver.

To that aim, transmission of the pointer to the extack object is piped
through the `change()` operation of the existing classifiers (patches 1
to 6). Then it is used for offload in the nfp driver (patches 8 and 9)
and in netdevsim (patch 10, selftest in patch 11). Patch 7 adds a helper
to handle extack messages in the core when TC offload is disabled on the
net device.

For completeness, extack is propagated for classifiers other than cls_bpf,
but it is up to the drivers to make use of it.


Quentin Monnet (11):
  net: sched: add extack support to change() classifier operation
  net: sched: cls_flower: propagate extack support for filter offload
  net: sched: cls_matchall: propagate extack support for filter offload
  net: sched: cls_u32: propagate extack support for filter offload
  net: sched: cls_bpf: plumb extack support in filter for hardware
offload
  net: sched: add extack support for offload via tc_cls_common_offload
  net: sched: create tc_can_offload_extack() wrapper
  nfp: bpf: plumb extack into functions related to XDP offload
  nfp: bpf: use extack support to improve debugging
  netdevsim: add extack support for TC eBPF offload
  selftests/bpf: add checks on extack messages for eBPF hw offload tests

 drivers/net/ethernet/netronome/nfp/bpf/main.c  |  35 +--
 drivers/net/ethernet/netronome/nfp/bpf/main.h  |   2 +-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c   |  24 +++--
 drivers/net/ethernet/netronome/nfp/nfp_app.h   |   9 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   2 +-
 drivers/net/netdevsim/bpf.c|  35 +--
 include/net/pkt_cls.h  |  16 +++-
 include/net/sch_generic.h  |   3 +-
 net/sched/cls_api.c|   3 +-
 net/sched/cls_basic.c  |   3 +-
 net/sched/cls_bpf.c|  20 ++--
 net/sched/cls_cgroup.c |   3 +-
 net/sched/cls_flow.c   |   2 +-
 net/sched/cls_flower.c |  14 +--
 net/sched/cls_fw.c |   2 +-
 net/sched/cls_matchall.c   |  12 ++-
 net/sched/cls_route.c  |   3 +-
 net/sched/cls_rsvp.h   |   2 +-
 net/sched/cls_tcindex.c|   3 +-
 net/sched/cls_u32.c|  21 +++--
 tools/testing/selftests/bpf/test_offload.py| 104 +++--
 21 files changed, 221 insertions(+), 97 deletions(-)

-- 
2.15.1
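The helper from patch 7 is not included in this excerpt. As a rough userspace sketch of what such a wrapper could look like, assume a hypothetical struct demo_netdev whose DEMO_F_HW_TC bit plays the role of the NETIF_F_HW_TC feature flag, a pared-down struct netlink_ext_ack, and an illustrative message text that is not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct netlink_ext_ack {
	const char *_msg;
};

/* Modeled on the kernel macro: NULL-safe, stores a static string. */
#define NL_SET_ERR_MSG(extack, msg) do {			\
	static const char __msg[] = msg;			\
	struct netlink_ext_ack *__extack = (extack);		\
								\
	if (__extack)						\
		__extack->_msg = __msg;				\
} while (0)

/* Hypothetical device model: DEMO_F_HW_TC plays the role of the
 * NETIF_F_HW_TC feature bit. */
#define DEMO_F_HW_TC (1u << 0)

struct demo_netdev {
	unsigned int features;
};

static bool demo_tc_can_offload(const struct demo_netdev *dev)
{
	return dev->features & DEMO_F_HW_TC;
}

/* Same answer as demo_tc_can_offload(), plus an explanation for user
 * space when offload is off; the message text is illustrative. */
static bool demo_tc_can_offload_extack(const struct demo_netdev *dev,
				       struct netlink_ext_ack *extack)
{
	bool can = demo_tc_can_offload(dev);

	if (!can)
		NL_SET_ERR_MSG(extack, "TC offload is disabled on net device");

	return can;
}
```

Centralizing the message in the core is what lets the nfp and netdevsim callbacks above simply return -EOPNOTSUPP when the check fails, without writing a message of their own.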
