[PATCH net-next] net/mlx5: Fix build break
The latest merge between net and net-next introduced a complier assert in mlx5 driver. In hca_cap_bits older fields are kept along with newer fields that should have replaced them. Fixes: c02b3741eb99 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Signed-off-by: Saeed Mahameed--- include/linux/mlx5/mlx5_ifc.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 94135c03d52b..acd829d8613b 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1035,8 +1035,6 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 log_max_wq_sz[0x5]; u8 nic_vport_change_event[0x1]; - u8 disable_local_lb[0x1]; - u8 reserved_at_3e2[0x1]; u8 disable_local_lb_uc[0x1]; u8 disable_local_lb_mc[0x1]; u8 log_min_hairpin_wq_data_sz[0x5]; -- 2.13.0
[PATCH iproute2-next] bpf: support map offload
When program is loaded with a specified ifindex, use that ifindex also when creating maps. Signed-off-by: Jakub Kicinski--- lib/bpf.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/bpf.c b/lib/bpf.c index d32f1b808180..2db151e4dd3c 100644 --- a/lib/bpf.c +++ b/lib/bpf.c @@ -1208,7 +1208,7 @@ static int bpf_log_realloc(struct bpf_elf_ctx *ctx) static int bpf_map_create(enum bpf_map_type type, uint32_t size_key, uint32_t size_value, uint32_t max_elem, - uint32_t flags, int inner_fd) + uint32_t flags, int inner_fd, uint32_t ifindex) { union bpf_attr attr = {}; @@ -1218,6 +1218,7 @@ static int bpf_map_create(enum bpf_map_type type, uint32_t size_key, attr.max_entries = max_elem; attr.map_flags = flags; attr.inner_map_fd = inner_fd; + attr.map_ifindex = ifindex; return bpf(BPF_MAP_CREATE, , sizeof(attr)); } @@ -1632,7 +1633,9 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx, errno = 0; fd = bpf_map_create(map->type, map->size_key, map->size_value, - map->max_elem, map->flags, map_inner_fd); + map->max_elem, map->flags, map_inner_fd, + ctx->ifindex); + if (fd < 0 || ctx->verbose) { bpf_map_report(fd, name, map, ctx, map_inner_fd); if (fd < 0) -- 2.15.1
Re: WARNING in can_rcv
On Wed, Jan 17, 2018 at 8:12 AM, Eric Biggerswrote: > On Wed, Jan 17, 2018 at 07:39:24AM +0100, Oliver Hartkopp wrote: >> >> >> On 01/16/2018 07:11 PM, Dmitry Vyukov wrote: >> > On Tue, Jan 16, 2018 at 7:07 PM, Marc Kleine-Budde >> > wrote: >> > > On 01/16/2018 06:58 PM, syzbot wrote: >> > > > Hello, >> > > > >> > > > syzkaller hit the following crash on >> > > > a8750ddca918032d6349adbf9a4b6555e7db20da >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master >> > > > compiler: gcc (GCC) 7.1.1 20170620 >> > > > .config is attached >> > > > Raw console output is attached. >> > > > C reproducer is attached >> > > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> > > > for information about syzkaller reproducers >> > > > >> > > > >> > > > IMPORTANT: if you fix the bug, please add the following tag to the >> > > > commit: >> > > > Reported-by: syzbot+4386709c0c1284dca...@syzkaller.appspotmail.com >> > > > It will help syzbot understand when the bug is fixed. See footer for >> > > > details. >> > > > If you forward the report, please keep this part and the footer. >> > > > >> > > > device eql entered promiscuous mode >> > > > [ cut here ] >> > > > PF_CAN: dropped non conform CAN skbuf: dev type 65534, len 42, datalen >> > > > 0 >> > > > WARNING: CPU: 0 PID: 3650 at net/can/af_can.c:729 can_rcv+0x1c5/0x200 >> > > > net/can/af_can.c:724 >> > > > Kernel panic - not syncing: panic_on_warn set ... >> > > >> > > Invalid packages generate a warning (WARN_ONCE()), and you have >> > > panic_on_warn active. Should we better silently drop these CAN packages? >> > >> > Hi, >> > >> > pr_warn_once() will be more appropriate. It prints a single line. >> > >> >> The idea behind this WARN() is to detect really bad things that might have >> happen on network driver level: >> >> The CAN subsystem registers with dev_add_pack() for ETH_P_CAN and >> ETH_P_CANFD only. These ETH_P_ types are only allowed to be created by CAN >> network devices (like vcan, vxcan, and real CAN drivers). >> >> I don't have any strong opinion on using WARN() or pr_warn_once(). >> Is this detected violation worth using WARN(), as something already must >> have gone really wrong to trigger this issue? >> > > WARN() indicates a kernel bug. If it's instead "userspace did something > stupid", or "someone sent some unexpected network packet", it needs to be > pr_warn_once(), pr_warn_ratelimited(), or removed entirely. The packet comes from tun device. We could change tun to filter out such packages earlier. However, in the context of "syzkaller support for AF_CAN" discussion, it would actually be useful for fuzzer to be able emit can packets for testing purposes. For example, for tcp it can not just emit random packets, it can build complex user<->network interactions, for example, open a listening socket, connect to it "from outside", accept the connection, and then exchange some data over the active connection. It could do the same for can. Is it possible to allow can packets via tun? Then we could leave this WARNING in place. tun/vcan are contained within a net namespace, so this should not be a security problem, right? Or is there a way to do the same with vcan? If yes, then fuzzer could use vcan. But then we need some fix for this WARNING: either change it to pr_warn or change tun (I don't have strong preference which one).
Re: [PATCH 32/32] aio: implement io_pgetevents
On Tue, Jan 16, 2018 at 07:41:24PM -0500, Jeff Moyer wrote: > I'd be willing to bet the issue is in your io_syscall6 implementation. > You pass in arg5 where arg6 should be used. Don't feel bad, it took me > the better part of today to figure that out. :) > > Here's an incremental diff on top of what you've posted. Feel free to > fold it into your patch (and format however you like). You can find the > libaio changes in my 'aio-poll' branch: > https://pagure.io/libaio/commits/aio-poll > > These changes were run through the libaio test harness, 64 bit and 32 > bit, so the compat system call was tested. Oops, yes. Although I prefer the copy_from_user version, this is what I had: diff --git a/fs/aio.c b/fs/aio.c index 9fe0a5539596..6c1bbfa9b06a 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1984,8 +1984,9 @@ SYSCALL_DEFINE6(io_pgetevents, long, nr, struct io_event __user *, events, struct timespec __user *, timeout, - const sigset_t __user *, sigmask) + const struct __aio_sigset __user *, usig) { + struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; int ret; @@ -1993,8 +1994,13 @@ SYSCALL_DEFINE6(io_pgetevents, if (timeout && unlikely(get_timespec64(, timeout))) return -EFAULT; - if (sigmask) { - if (copy_from_user(, sigmask, sizeof(ksigmask))) + if (usig && copy_from_user(, usig, sizeof(ksig))) + return -EFAULT; + + if (ksig.sigmask) { + if (ksig.sigsetsize != sizeof(sigset_t)) + return -EINVAL; + if (copy_from_user(, ksig.sigmask, sizeof(ksigmask))) return -EFAULT; sigdelsetmask(, sigmask(SIGKILL) | sigmask(SIGSTOP)); sigprocmask(SIG_SETMASK, , ); @@ -2002,7 +2008,7 @@ SYSCALL_DEFINE6(io_pgetevents, ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); if (signal_pending(current)) { - if (sigmask) { + if (ksig.sigmask) { current->saved_sigmask = sigsaved; set_restore_sigmask(); } @@ -2010,7 +2016,7 @@ SYSCALL_DEFINE6(io_pgetevents, if (!ret) ret = -ERESTARTNOHAND; } else { - if (sigmask) + if (ksig.sigmask) sigprocmask(SIG_SETMASK, , NULL); } @@ -2036,14 +2042,21 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id, return ret; } + +struct __compat_aio_sigset { + compat_sigset_t __user *sigmask; + compat_size_t sigsetsize; +}; + COMPAT_SYSCALL_DEFINE6(io_pgetevents, compat_aio_context_t, ctx_id, compat_long_t, min_nr, compat_long_t, nr, struct io_event __user *, events, struct compat_timespec __user *, timeout, - const compat_sigset_t __user *, sigmask) + const struct __compat_aio_sigset __user *, usig) { + struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; int ret; @@ -2051,8 +2064,13 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, if (timeout && compat_get_timespec64(, timeout)) return -EFAULT; - if (sigmask) { - if (get_compat_sigset(, sigmask)) + if (usig && copy_from_user(, usig, sizeof(ksig))) + return -EFAULT; + + if (ksig.sigmask) { + if (ksig.sigsetsize != sizeof(compat_sigset_t)) + return -EINVAL; + if (get_compat_sigset(, ksig.sigmask)) return -EFAULT; sigdelsetmask(, sigmask(SIGKILL) | sigmask(SIGSTOP)); sigprocmask(SIG_SETMASK, , ); @@ -2060,14 +2078,14 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); if (signal_pending(current)) { - if (sigmask) { + if (ksig.sigmask) { current->saved_sigmask = sigsaved; set_restore_sigmask(); } if (!ret) ret = -ERESTARTNOHAND; } else { - if (sigmask) + if (ksig.sigmask) sigprocmask(SIG_SETMASK, , NULL); } diff --git a/include/linux/compat.h b/include/linux/compat.h index a4cda98073f1..6c04450e961f 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -205,6 +205,7 @@ extern int put_compat_rusage(const struct rusage *, struct compat_rusage __user *); struct compat_siginfo; +struct __compat_aio_sigset; extern asmlinkage long compat_sys_waitid(int, compat_pid_t,
RE: [PATCH net-next v2] net: sched: red: don't reset the backlog on every stat dump
> -Original Message- > From: Jakub Kicinski [mailto:jakub.kicin...@netronome.com] > Sent: Monday, January 15, 2018 6:01 AM > To: da...@davemloft.net; j...@resnulli.us; Nogah Frankel >> Cc: netdev@vger.kernel.org; oss-driv...@netronome.com; > xiyou.wangc...@gmail.com; eduma...@google.com; Yuval Mintz > ; Jakub Kicinski > Subject: [PATCH net-next v2] net: sched: red: don't reset the backlog on every > stat dump > > Commit 0dfb33a0d7e2 ("sch_red: report backlog information") copied > child's backlog into RED's backlog. Back then RED did not maintain > its own backlog counts. This has changed after commit 2f5fb43f > ("net_sched: update hierarchical backlog too") and commit d7f4f332f082 > ("sch_red: update backlog as well"). Copying is no longer necessary. > > Tested: > > $ tc -s qdisc show dev veth0 > qdisc red 1: root refcnt 2 limit 40b min 3b max 3b ecn > Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0) > backlog 1260b 14p requeues 14 > marked 0 early 0 pdrop 0 other 0 > qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s > Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0) > backlog 1260b 14p requeues 14 > > Recently RED offload was added. We need to make sure drivers don't > depend on resetting the stats. This means backlog should be treated > like any other statistic: > > total_stat = new_hw_stat - prev_hw_stat; > > Adjust mlxsw. > > Signed-off-by: Jakub Kicinski Acked-by: Nogah Frankel Thanks Nogah > --- > v2: > - reuse the mlxsw infra added for prio; > - align the way qstats are passed with prio. > > .../net/ethernet/mellanox/mlxsw/spectrum_qdisc.c | 26 > +++--- > include/net/pkt_cls.h | 1 + > net/sched/sch_red.c| 2 +- > 3 files changed, 25 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c > b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c > index e11a0abfc663..8cac5202b913 100644 > --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c > +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c > @@ -247,6 +247,8 @@ mlxsw_sp_setup_tc_qdisc_red_clean_stats(struct > mlxsw_sp_port *mlxsw_sp_port, > > stats_base->overlimits = red_base->prob_drop + red_base- > >prob_mark; > stats_base->drops = red_base->prob_drop + red_base->pdrop; > + > + stats_base->backlog = 0; > } > > static int > @@ -306,6 +308,19 @@ mlxsw_sp_qdisc_red_replace(struct mlxsw_sp_port > *mlxsw_sp_port, >max, prob, p->is_ecn); > } > > +static void > +mlxsw_sp_qdisc_red_unoffload(struct mlxsw_sp_port *mlxsw_sp_port, > + struct mlxsw_sp_qdisc *mlxsw_sp_qdisc, > + void *params) > +{ > + struct tc_red_qopt_offload_params *p = params; > + u64 backlog; > + > + backlog = mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp, > +mlxsw_sp_qdisc->stats_base.backlog); > + p->qstats->backlog -= backlog; > +} > + > static int > mlxsw_sp_qdisc_get_red_xstats(struct mlxsw_sp_port *mlxsw_sp_port, > struct mlxsw_sp_qdisc *mlxsw_sp_qdisc, > @@ -338,7 +353,7 @@ mlxsw_sp_qdisc_get_red_stats(struct mlxsw_sp_port > *mlxsw_sp_port, >struct mlxsw_sp_qdisc *mlxsw_sp_qdisc, >struct tc_qopt_offload_stats *stats_ptr) > { > - u64 tx_bytes, tx_packets, overlimits, drops; > + u64 tx_bytes, tx_packets, overlimits, drops, backlog; > u8 tclass_num = mlxsw_sp_qdisc->tclass_num; > struct mlxsw_sp_qdisc_stats *stats_base; > struct mlxsw_sp_port_xstats *xstats; > @@ -354,14 +369,18 @@ mlxsw_sp_qdisc_get_red_stats(struct > mlxsw_sp_port *mlxsw_sp_port, >stats_base->overlimits; > drops = xstats->wred_drop[tclass_num] + xstats- > >tail_drop[tclass_num] - > stats_base->drops; > + backlog = xstats->backlog[tclass_num]; > > _bstats_update(stats_ptr->bstats, tx_bytes, tx_packets); > stats_ptr->qstats->overlimits += overlimits; > stats_ptr->qstats->drops += drops; > stats_ptr->qstats->backlog += > - mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp, > - xstats->backlog[tclass_num]); > + mlxsw_sp_cells_bytes(mlxsw_sp_port- > >mlxsw_sp, > + backlog) - > + mlxsw_sp_cells_bytes(mlxsw_sp_port- > >mlxsw_sp, > + stats_base->backlog); > > + stats_base->backlog = backlog; > stats_base->drops += drops; > stats_base->overlimits += overlimits; > stats_base->tx_bytes +=
[PATCH net-next] cxgb4: IPv6 filter takes 2 tids
on T6, IPv6 filter would occupy 2 tids instead of 4. Signed-off-by: Kumar SanghviSigned-off-by: Ganesh Goudar --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 113 +++--- 1 file changed, 80 insertions(+), 33 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c index 677a3ba..3177b0c 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@ -439,19 +439,32 @@ int cxgb4_get_free_ftid(struct net_device *dev, int family) if (ftid >= t->nftids) ftid = -1; } else { - ftid = bitmap_find_free_region(t->ftid_bmap, t->nftids, 2); - if (ftid < 0) - goto out_unlock; + if (is_t6(adap->params.chip)) { + ftid = bitmap_find_free_region(t->ftid_bmap, + t->nftids, 1); + if (ftid < 0) + goto out_unlock; + + /* this is only a lookup, keep the found region +* unallocated +*/ + bitmap_release_region(t->ftid_bmap, ftid, 1); + } else { + ftid = bitmap_find_free_region(t->ftid_bmap, + t->nftids, 2); + if (ftid < 0) + goto out_unlock; - /* this is only a lookup, keep the found region unallocated */ - bitmap_release_region(t->ftid_bmap, ftid, 2); + bitmap_release_region(t->ftid_bmap, ftid, 2); + } } out_unlock: spin_unlock_bh(>ftid_lock); return ftid; } -static int cxgb4_set_ftid(struct tid_info *t, int fidx, int family) +static int cxgb4_set_ftid(struct tid_info *t, int fidx, int family, + unsigned int chip_ver) { spin_lock_bh(>ftid_lock); @@ -460,22 +473,31 @@ static int cxgb4_set_ftid(struct tid_info *t, int fidx, int family) return -EBUSY; } - if (family == PF_INET) + if (family == PF_INET) { __set_bit(fidx, t->ftid_bmap); - else - bitmap_allocate_region(t->ftid_bmap, fidx, 2); + } else { + if (chip_ver < CHELSIO_T6) + bitmap_allocate_region(t->ftid_bmap, fidx, 2); + else + bitmap_allocate_region(t->ftid_bmap, fidx, 1); + } spin_unlock_bh(>ftid_lock); return 0; } -static void cxgb4_clear_ftid(struct tid_info *t, int fidx, int family) +static void cxgb4_clear_ftid(struct tid_info *t, int fidx, int family, +unsigned int chip_ver) { spin_lock_bh(>ftid_lock); - if (family == PF_INET) + if (family == PF_INET) { __clear_bit(fidx, t->ftid_bmap); - else - bitmap_release_region(t->ftid_bmap, fidx, 2); + } else { + if (chip_ver < CHELSIO_T6) + bitmap_release_region(t->ftid_bmap, fidx, 2); + else + bitmap_release_region(t->ftid_bmap, fidx, 1); + } spin_unlock_bh(>ftid_lock); } @@ -1249,23 +1271,42 @@ int __cxgb4_set_filter(struct net_device *dev, int filter_id, } } } else { /* IPv6 */ - /* Ensure that the IPv6 filter is aligned on a -* multiple of 4 boundary. -*/ - if (filter_id & 0x3) { - dev_err(adapter->pdev_dev, - "Invalid location. IPv6 must be aligned on a 4-slot boundary\n"); - return -EINVAL; - } + if (chip_ver < CHELSIO_T6) { + /* Ensure that the IPv6 filter is aligned on a +* multiple of 4 boundary. +*/ + if (filter_id & 0x3) { + dev_err(adapter->pdev_dev, + "Invalid location. IPv6 must be aligned on a 4-slot boundary\n"); + return -EINVAL; + } - /* Check all except the base overlapping IPv4 filter slots. */ - for (fidx = filter_id + 1; fidx < filter_id + 4; fidx++) { + /* Check all except the base overlapping IPv4 filter +* slots. +*/ + for (fidx = filter_id + 1; fidx < filter_id + 4; +fidx++) { + f = >tids.ftid_tab[fidx]; + if (f->valid) { +
Re: net merged into net-next
Dave, The resolution of the mlx5_ifc conflict was wrong and it causes a build break in mlx5, oops :(. I hope my resolution instructions in my pull request didn't mislead you. I will post a patch. -Saeed.
[PATCH net-next 1/2] cxgb4: update dump collection logic to use compression
Update firmware dump collection logic to use compression when available. Let collection logic attempt to do compression, instead of returning out of memory early. Signed-off-by: Rahul LakkireddySigned-off-by: Vishal Kulkarni Signed-off-by: Ganesh Goudar --- drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c | 24 +- drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h | 3 + drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 280 +++-- .../net/ethernet/chelsio/cxgb4/cudbg_lib_common.h | 7 +- drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h| 27 ++ drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 13 +- 6 files changed, 207 insertions(+), 147 deletions(-) create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c index f78ba1743b5a..8edc49827af0 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c @@ -19,7 +19,8 @@ #include "cudbg_if.h" #include "cudbg_lib_common.h" -int cudbg_get_buff(struct cudbg_buffer *pdbg_buff, u32 size, +int cudbg_get_buff(struct cudbg_init *pdbg_init, + struct cudbg_buffer *pdbg_buff, u32 size, struct cudbg_buffer *pin_buff) { u32 offset; @@ -28,17 +29,30 @@ int cudbg_get_buff(struct cudbg_buffer *pdbg_buff, u32 size, if (offset + size > pdbg_buff->size) return CUDBG_STATUS_NO_MEM; + if (pdbg_init->compress_type != CUDBG_COMPRESSION_NONE) { + if (size > pdbg_init->compress_buff_size) + return CUDBG_STATUS_NO_MEM; + + pin_buff->data = (char *)pdbg_init->compress_buff; + pin_buff->offset = 0; + pin_buff->size = size; + return 0; + } + pin_buff->data = (char *)pdbg_buff->data + offset; pin_buff->offset = offset; pin_buff->size = size; - pdbg_buff->size -= size; return 0; } -void cudbg_put_buff(struct cudbg_buffer *pin_buff, - struct cudbg_buffer *pdbg_buff) +void cudbg_put_buff(struct cudbg_init *pdbg_init, + struct cudbg_buffer *pin_buff) { - pdbg_buff->size += pin_buff->size; + /* Clear compression buffer for re-use */ + if (pdbg_init->compress_type != CUDBG_COMPRESSION_NONE) + memset(pdbg_init->compress_buff, 0, + pdbg_init->compress_buff_size); + pin_buff->data = NULL; pin_buff->offset = 0; pin_buff->size = 0; diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h index 88e740082a02..eb1d2f48ebd3 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h @@ -87,6 +87,9 @@ struct cudbg_init { struct adapter *adap; /* Pointer to adapter structure */ void *outbuf; /* Output buffer */ u32 outbuf_size; /* Output buffer size */ + u8 compress_type; /* Type of compression to use */ + void *compress_buff; /* Compression buffer */ + u32 compress_buff_size; /* Compression buffer size */ }; static inline unsigned int cudbg_mbytes_to_bytes(unsigned int size) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c index 0a3871f10787..8b95117c2923 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c @@ -23,12 +23,57 @@ #include "cudbg_lib_common.h" #include "cudbg_entity.h" #include "cudbg_lib.h" +#include "cudbg_zlib.h" -static void cudbg_write_and_release_buff(struct cudbg_buffer *pin_buff, -struct cudbg_buffer *dbg_buff) +static int cudbg_do_compression(struct cudbg_init *pdbg_init, + struct cudbg_buffer *pin_buff, + struct cudbg_buffer *dbg_buff) { - cudbg_update_buff(pin_buff, dbg_buff); - cudbg_put_buff(pin_buff, dbg_buff); + struct cudbg_buffer temp_in_buff = { 0 }; + int bytes_left, bytes_read, bytes; + u32 offset = dbg_buff->offset; + int rc; + + temp_in_buff.offset = pin_buff->offset; + temp_in_buff.data = pin_buff->data; + temp_in_buff.size = pin_buff->size; + + bytes_left = pin_buff->size; + bytes_read = 0; + while (bytes_left > 0) { + /* Do compression in smaller chunks */ + bytes = min_t(unsigned long, bytes_left, + (unsigned long)CUDBG_CHUNK_SIZE); + temp_in_buff.data = (char *)pin_buff->data + bytes_read; + temp_in_buff.size = bytes; + rc = cudbg_compress_buff(pdbg_init, _in_buff, dbg_buff); + if (rc) +
[PATCH net-next 2/2] cxgb4: use zlib deflate to compress firmware dump
Use zlib deflate to compress firmware dump. Collect and compress as much firmware dump as possible into a 32 MB buffer. Signed-off-by: Rahul LakkireddySigned-off-by: Vishal Kulkarni Signed-off-by: Ganesh Goudar --- drivers/net/ethernet/chelsio/cxgb4/Makefile| 1 + drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h | 1 + .../net/ethernet/chelsio/cxgb4/cudbg_lib_common.h | 1 + drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c| 81 ++ drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h| 29 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 56 ++- drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h | 3 + 7 files changed, 169 insertions(+), 3 deletions(-) create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile index 8c9c6b0d2e5d..5df923798669 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/Makefile +++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile @@ -12,3 +12,4 @@ cxgb4-objs := cxgb4_main.o l2t.o smt.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o cxgb4-$(CONFIG_CHELSIO_T4_DCB) += cxgb4_dcb.o cxgb4-$(CONFIG_CHELSIO_T4_FCOE) += cxgb4_fcoe.o cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o +cxgb4-$(CONFIG_ZLIB_DEFLATE) += cudbg_zlib.o diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h index eb1d2f48ebd3..8568a51f6414 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h @@ -90,6 +90,7 @@ struct cudbg_init { u8 compress_type; /* Type of compression to use */ void *compress_buff; /* Compression buffer */ u32 compress_buff_size; /* Compression buffer size */ + void *workspace; /* Workspace for zlib */ }; static inline unsigned int cudbg_mbytes_to_bytes(unsigned int size) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h index 2e1c8e87c9bd..8150ea85d6a5 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib_common.h @@ -26,6 +26,7 @@ enum cudbg_dump_type { enum cudbg_compression_type { CUDBG_COMPRESSION_NONE = 1, + CUDBG_COMPRESSION_ZLIB, }; struct cudbg_hdr { diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c new file mode 100644 index ..4c3854cbeb6c --- /dev/null +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c @@ -0,0 +1,81 @@ +/* + * Copyright (C) 2018 Chelsio Communications. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + */ + +#include + +#include "cxgb4.h" +#include "cudbg_if.h" +#include "cudbg_lib_common.h" +#include "cudbg_zlib.h" + +static int cudbg_get_compress_hdr(struct cudbg_buffer *pdbg_buff, + struct cudbg_buffer *pin_buff) +{ + if (pdbg_buff->offset + sizeof(struct cudbg_compress_hdr) > + pdbg_buff->size) + return CUDBG_STATUS_NO_MEM; + + pin_buff->data = (char *)pdbg_buff->data + pdbg_buff->offset; + pin_buff->offset = 0; + pin_buff->size = sizeof(struct cudbg_compress_hdr); + pdbg_buff->offset += sizeof(struct cudbg_compress_hdr); + return 0; +} + +int cudbg_compress_buff(struct cudbg_init *pdbg_init, + struct cudbg_buffer *pin_buff, + struct cudbg_buffer *pout_buff) +{ + struct z_stream_s compress_stream = { 0 }; + struct cudbg_buffer temp_buff = { 0 }; + struct cudbg_compress_hdr *c_hdr; + int rc; + + /* Write compression header to output buffer before compression */ + rc = cudbg_get_compress_hdr(pout_buff, _buff); + if (rc) + return rc; + + c_hdr = (struct cudbg_compress_hdr *)temp_buff.data; + c_hdr->compress_id = CUDBG_ZLIB_COMPRESS_ID; + + compress_stream.workspace = pdbg_init->workspace; + rc = zlib_deflateInit2(_stream, Z_DEFAULT_COMPRESSION, + Z_DEFLATED, CUDBG_ZLIB_WIN_BITS, + CUDBG_ZLIB_MEM_LVL, Z_DEFAULT_STRATEGY); + if (rc != Z_OK) + return CUDBG_SYSTEM_ERROR; + + compress_stream.next_in = pin_buff->data; +
[PATCH net-next 0/2] cxgb4: reduce memory footprint for collecting firmware dump
Firmware dump can be large (upto 2 GB). In low memory conditions, ethtool fails to allocate such large memory. So, use zlib deflate to compress collected firmware dump. Patch 1 updates collection logic to use compression. Patch 2 adds zlib deflate to compress collected firmware dump. Thanks, Rahul Rahul Lakkireddy (2): cxgb4: update dump collection logic to use compression cxgb4: use zlib deflate to compress firmware dump drivers/net/ethernet/chelsio/cxgb4/Makefile| 1 + drivers/net/ethernet/chelsio/cxgb4/cudbg_common.c | 24 +- drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h | 4 + drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 280 +++-- .../net/ethernet/chelsio/cxgb4/cudbg_lib_common.h | 8 +- drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c| 81 ++ drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h| 56 + drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 65 - drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h | 3 + 9 files changed, 374 insertions(+), 148 deletions(-) create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.c create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cudbg_zlib.h -- 2.14.1
Re: DPAA Ethernet traffice troubles with Linux kernel
Hi Skateman, Fantastic! Many thanks for testing the RC8 of kernel 4.15 without PAMU support. @All Further information: http://forum.hyperion-entertainment.biz/viewtopic.php?f=58=43706#p43706 Cheers, Christian On 16. Jan 2018, at 23:05, mad skatemanwrote: Fantastic Christian.. Your latest kernel makes the NIC work!!! Few tweaks to be done... like the buffer space Brilliant! On 16 January 2018 at 9:42PM, Christian Zigotzky wrote: Hi All, I compiled the RC8 of kernel 4.15 for the X5000 without PAMU support today. Download: http://www.xenosoft.de/uImage_without_pamu.tar.gz Please test it on your AmigaOne X5000. Thanks, Christian On 16 January 2018 at 6:33PM, Madalin-cristian Bucur wrote: The PAMU related errors may be relevant to the issue, if you have incorrect settings you may have no traffic passing through. The PAMU configuration should be made by the bootloader. Can you try to disable CONFIG_FSL_PAMU? Madalin
Re: WARNING in can_rcv
On Wed, Jan 17, 2018 at 07:39:24AM +0100, Oliver Hartkopp wrote: > > > On 01/16/2018 07:11 PM, Dmitry Vyukov wrote: > > On Tue, Jan 16, 2018 at 7:07 PM, Marc Kleine-Budde> > wrote: > > > On 01/16/2018 06:58 PM, syzbot wrote: > > > > Hello, > > > > > > > > syzkaller hit the following crash on > > > > a8750ddca918032d6349adbf9a4b6555e7db20da > > > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master > > > > compiler: gcc (GCC) 7.1.1 20170620 > > > > .config is attached > > > > Raw console output is attached. > > > > C reproducer is attached > > > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ > > > > for information about syzkaller reproducers > > > > > > > > > > > > IMPORTANT: if you fix the bug, please add the following tag to the > > > > commit: > > > > Reported-by: syzbot+4386709c0c1284dca...@syzkaller.appspotmail.com > > > > It will help syzbot understand when the bug is fixed. See footer for > > > > details. > > > > If you forward the report, please keep this part and the footer. > > > > > > > > device eql entered promiscuous mode > > > > [ cut here ] > > > > PF_CAN: dropped non conform CAN skbuf: dev type 65534, len 42, datalen 0 > > > > WARNING: CPU: 0 PID: 3650 at net/can/af_can.c:729 can_rcv+0x1c5/0x200 > > > > net/can/af_can.c:724 > > > > Kernel panic - not syncing: panic_on_warn set ... > > > > > > Invalid packages generate a warning (WARN_ONCE()), and you have > > > panic_on_warn active. Should we better silently drop these CAN packages? > > > > Hi, > > > > pr_warn_once() will be more appropriate. It prints a single line. > > > > The idea behind this WARN() is to detect really bad things that might have > happen on network driver level: > > The CAN subsystem registers with dev_add_pack() for ETH_P_CAN and > ETH_P_CANFD only. These ETH_P_ types are only allowed to be created by CAN > network devices (like vcan, vxcan, and real CAN drivers). > > I don't have any strong opinion on using WARN() or pr_warn_once(). > Is this detected violation worth using WARN(), as something already must > have gone really wrong to trigger this issue? > WARN() indicates a kernel bug. If it's instead "userspace did something stupid", or "someone sent some unexpected network packet", it needs to be pr_warn_once(), pr_warn_ratelimited(), or removed entirely. Eric
Re: [PATCH 32/32] aio: implement io_pgetevents
On Wed, Jan 17, 2018 at 04:27:21AM +, Al Viro wrote: > On Tue, Jan 16, 2018 at 07:41:24PM -0500, Jeff Moyer wrote: > > if (sigmask) { > > - if (copy_from_user(, sigmask, sizeof(ksigmask))) > > + if (!access_ok(VERIFY_READ, sigmask, > > + sizeof(void *) + sizeof(size_t)) || > > + __get_user(up, (sigset_t __user * __user *)sigmask) || > > + __get_user(sigsetsize, > > + (size_t __user *)(sigmask + sizeof(void * > > return -EFAULT; > > How about copy_from_user() on a struct? Making eyes bleed is fun, but > people tend to get annoyed when you do it to them... Above is the copy & paste version from pselect. I've got both copy_from_user and that horrible version in my tree, and if we really need this awfull calling convention copy_from_user certainly is much better. pselect also should be switched to explicit struct + copy_from_user while we're at it. In fact glibc defines a struct for the userland version to start with.
Re: [PATCH net-next] net: stmmac: Fix reception of Broadcom switches tags
Hi Florian for gmac4.x and gmac3.x series the ACS bit is the Automatic Pad or CRC Stripping, so the core strips the Pad or FCS on frames if the value of the length field is < 1536 bytes. For MAC10-100 there is the Bit 8 (ASTP) of the reg0 that does the same if len is < 46bytes. In your patch I can just suggest to add a new field to strip the PAD/FCS w/o passing the whole netdev struct to the core_init. In the main driver, we could manage the pad-strip feature (also by using dt) or disable it in case of netdev_uses_dsa; then propagating this setting to the core_init or calling a new callback. What do you think? Regards Peppe On 1/17/2018 12:25 AM, Florian Fainelli wrote: Broadcom tags inserted by Broadcom switches put a 4 byte header after the MAC SA and before the EtherType, which may look like some sort of 0 length LLC/SNAP packet (tcpdump and wireshark do think that way). With ACS enabled in stmmac the packets were truncated to 8 bytes on reception, whereas clearing this bit allowed normal reception to occur. In order to make that possible, we need to pass a net_device argument to the different core_init() functions and we are dependent on the Broadcom tagger padding packets correctly (which it now does). To be as little invasive as possible, this is only done for gmac1000 when the network device is DSA-enabled (netdev_uses_dsa() returns true). Signed-off-by: Florian Fainelli--- drivers/net/ethernet/stmicro/stmmac/common.h | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c| 3 ++- drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 12 +++- drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 3 ++- drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c| 11 ++- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c| 2 +- 6 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index ce2ea2d491ac..2ffe76c0ff74 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -474,7 +474,7 @@ struct mac_device_info; /* Helpers to program the MAC core */ struct stmmac_ops { /* MAC core initialization */ - void (*core_init)(struct mac_device_info *hw, int mtu); + void (*core_init)(struct mac_device_info *hw, struct net_device *dev); /* Enable the MAC RX/TX */ void (*set_mac)(void __iomem *ioaddr, bool enable); /* Enable and verify that the IPC module is supported */ diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c index 9eb7f65d8000..a3fa65b1ca8e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c @@ -483,7 +483,8 @@ static int sun8i_dwmac_init(struct platform_device *pdev, void *priv) return 0; } -static void sun8i_dwmac_core_init(struct mac_device_info *hw, int mtu) +static void sun8i_dwmac_core_init(struct mac_device_info *hw, + struct net_device *dev) { void __iomem *ioaddr = hw->pcsr; u32 v; diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index 8a86340ff2d3..540d21786a43 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -25,18 +25,28 @@ #include #include #include +#include #include #include "stmmac_pcs.h" #include "dwmac1000.h" -static void dwmac1000_core_init(struct mac_device_info *hw, int mtu) +static void dwmac1000_core_init(struct mac_device_info *hw, + struct net_device *dev) { void __iomem *ioaddr = hw->pcsr; u32 value = readl(ioaddr + GMAC_CONTROL); + int mtu = dev->mtu; /* Configure GMAC core */ value |= GMAC_CORE_INIT; + /* Clear ACS bit because Ethernet switch tagging formats such as +* Broadcom tags can look like invalid LLC/SNAP packets and cause the +* hardware to truncate packets on reception. +*/ + if (netdev_uses_dsa(dev)) + value &= ~GMAC_CONTROL_ACS; + if (mtu > 1500) value |= GMAC_CONTROL_2K; if (mtu > 2000) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c index 8ef517356313..c1ee427c42cb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c @@ -28,7 +28,8 @@ #include #include "dwmac100.h" -static void dwmac100_core_init(struct mac_device_info *hw, int mtu) +static void dwmac100_core_init(struct mac_device_info *hw, + struct net_device *dev) { void __iomem *ioaddr = hw->pcsr; u32 value = readl(ioaddr
[PATCH net 2/2] cxgb4: fix endianness for vlan value in cxgb4_tc_flower
From: Kumar SanghviDon't change endianness when assigning vlan value in cxgb4_tc_flower code when processing flow match parameters. The value gets converted to network order as part of filtering code in set_filter_wr. Signed-off-by: Kumar Sanghvi Signed-off-by: Rahul Lakkireddy Signed-off-by: Ganesh Goudar --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c index 276edcbb3259..a452d5a1b0f3 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c @@ -208,8 +208,8 @@ static void cxgb4_process_flow_match(struct net_device *dev, VLAN_PRIO_SHIFT); vlan_tci_mask = mask->vlan_id | (mask->vlan_priority << VLAN_PRIO_SHIFT); - fs->val.ivlan = cpu_to_be16(vlan_tci); - fs->mask.ivlan = cpu_to_be16(vlan_tci_mask); + fs->val.ivlan = vlan_tci; + fs->mask.ivlan = vlan_tci_mask; /* Chelsio adapters use ivlan_vld bit to match vlan packets * as 802.1Q. Also, when vlan tag is present in packets, -- 2.14.1
[PATCH net 1/2] cxgb4: set filter type to 1 for ETH_P_IPV6
From: Kumar SanghviFor ethtype_key = ETH_P_IPV6, set filter type as 1 in cxgb4_tc_flower code when processing flow match parameters. Signed-off-by: Kumar Sanghvi Signed-off-by: Rahul Lakkireddy Signed-off-by: Ganesh Goudar --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c index d4a548a6a55c..276edcbb3259 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c @@ -111,6 +111,9 @@ static void cxgb4_process_flow_match(struct net_device *dev, ethtype_mask = 0; } + if (ethtype_key == ETH_P_IPV6) + fs->type = 1; + fs->val.ethtype = ethtype_key; fs->mask.ethtype = ethtype_mask; fs->val.proto = key->ip_proto; -- 2.14.1
[PATCH net 0/2] cxgb4: fix issues in rule processing for tc-flower offload
Patch 1 sets filter type to indicate IPv6 when processing flow match parameters. Patch 2 fixes endianness issue when processing vlan flow match parameters. Kumar Sanghvi (2): cxgb4: set filter type to 1 for ETH_P_IPV6 cxgb4: fix endianness for vlan value in cxgb4_tc_flower drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) -- 2.14.1
[PATCH v2 net-next] virtio_net: Add ethtool stats
The main purpose of this patch is adding a way of checking per-queue stats. It's useful to debug performance problems on multiqueue environment. $ ethtool -S ens10 NIC statistics: rx_queue_0_packets: 2090408 rx_queue_0_bytes: 3164825094 rx_queue_1_packets: 2082531 rx_queue_1_bytes: 3152932314 tx_queue_0_packets: 2770841 tx_queue_0_bytes: 4194955474 tx_queue_1_packets: 3084697 tx_queue_1_bytes: 4670196372 This change converts existing per-cpu stats structure into per-queue one. This should not impact on performance since each queue counter is not updated concurrently by multiple cpus. Performance numbers: - Guest has 2 vcpus and 2 queues - Guest runs netserver - Host runs 100-flow super_netperf Before After Diff UDP_STREAM 18byte86.22 87.00 +0.90% UDP_STREAM 1472byte4055.27 4042.18 -0.32% TCP_STREAM16956.3216890.63 -0.39% UDP_RR 178667.11 185862.70 +4.03% TCP_RR 128473.04 124985.81 -2.71% Signed-off-by: Toshiaki Makita--- v2: - Removed redundant counters which can be obtained from dev_get_stats. - Made queue counter structure different for tx and rx so they can be easily extended separately, as some additional counters are expected like XDP related ones and VM-Exit event. - Added performance numbers in commitlog. drivers/net/virtio_net.c | 191 ++- 1 file changed, 141 insertions(+), 50 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 12dfc5f..626c273 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -66,16 +66,39 @@ VIRTIO_NET_F_GUEST_UFO }; -struct virtnet_stats { - struct u64_stats_sync tx_syncp; - struct u64_stats_sync rx_syncp; - u64 tx_bytes; - u64 tx_packets; - - u64 rx_bytes; - u64 rx_packets; +struct virtnet_stat_desc { + char desc[ETH_GSTRING_LEN]; + size_t offset; }; +struct virtnet_sq_stats { + struct u64_stats_sync syncp; + u64 packets; + u64 bytes; +}; + +struct virtnet_rq_stats { + struct u64_stats_sync syncp; + u64 packets; + u64 bytes; +}; + +#define VIRTNET_SQ_STAT(m) offsetof(struct virtnet_sq_stats, m) +#define VIRTNET_RQ_STAT(m) offsetof(struct virtnet_rq_stats, m) + +static const struct virtnet_stat_desc virtnet_sq_stats_desc[] = { + { "packets",VIRTNET_SQ_STAT(packets) }, + { "bytes", VIRTNET_SQ_STAT(bytes) }, +}; + +static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = { + { "packets",VIRTNET_RQ_STAT(packets) }, + { "bytes", VIRTNET_RQ_STAT(bytes) }, +}; + +#define VIRTNET_SQ_STATS_LEN ARRAY_SIZE(virtnet_sq_stats_desc) +#define VIRTNET_RQ_STATS_LEN ARRAY_SIZE(virtnet_rq_stats_desc) + /* Internal representation of a send virtqueue */ struct send_queue { /* Virtqueue associated with this send _queue */ @@ -87,6 +110,8 @@ struct send_queue { /* Name of the send queue: output.$index */ char name[40]; + struct virtnet_sq_stats stats; + struct napi_struct napi; }; @@ -99,6 +124,8 @@ struct receive_queue { struct bpf_prog __rcu *xdp_prog; + struct virtnet_rq_stats stats; + /* Chain pages by the private ptr. */ struct page *pages; @@ -152,9 +179,6 @@ struct virtnet_info { /* Packet virtio header size */ u8 hdr_len; - /* Active statistics */ - struct virtnet_stats __percpu *stats; - /* Work struct for refilling if we run low on memory. */ struct delayed_work refill; @@ -1127,7 +1151,6 @@ static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit) struct virtnet_info *vi = rq->vq->vdev->priv; unsigned int len, received = 0, bytes = 0; void *buf; - struct virtnet_stats *stats = this_cpu_ptr(vi->stats); if (!vi->big_packets || vi->mergeable_rx_bufs) { void *ctx; @@ -1150,10 +1173,10 @@ static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit) schedule_delayed_work(>refill, 0); } - u64_stats_update_begin(>rx_syncp); - stats->rx_bytes += bytes; - stats->rx_packets += received; - u64_stats_update_end(>rx_syncp); + u64_stats_update_begin(>stats.syncp); + rq->stats.bytes += bytes; + rq->stats.packets += received; + u64_stats_update_end(>stats.syncp); return received; } @@ -1162,8 +1185,6 @@ static void free_old_xmit_skbs(struct send_queue *sq) { struct sk_buff *skb; unsigned int len; - struct virtnet_info *vi = sq->vq->vdev->priv; - struct virtnet_stats *stats = this_cpu_ptr(vi->stats); unsigned int packets = 0; unsigned int bytes = 0; @@ -1182,10 +1203,10 @@ static void
Re: WARNING in can_rcv
On 01/16/2018 07:11 PM, Dmitry Vyukov wrote: On Tue, Jan 16, 2018 at 7:07 PM, Marc Kleine-Buddewrote: On 01/16/2018 06:58 PM, syzbot wrote: Hello, syzkaller hit the following crash on a8750ddca918032d6349adbf9a4b6555e7db20da git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. C reproducer is attached syzkaller reproducer is attached. See https://goo.gl/kgGztJ for information about syzkaller reproducers IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+4386709c0c1284dca...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. device eql entered promiscuous mode [ cut here ] PF_CAN: dropped non conform CAN skbuf: dev type 65534, len 42, datalen 0 WARNING: CPU: 0 PID: 3650 at net/can/af_can.c:729 can_rcv+0x1c5/0x200 net/can/af_can.c:724 Kernel panic - not syncing: panic_on_warn set ... Invalid packages generate a warning (WARN_ONCE()), and you have panic_on_warn active. Should we better silently drop these CAN packages? Hi, pr_warn_once() will be more appropriate. It prints a single line. The idea behind this WARN() is to detect really bad things that might have happen on network driver level: The CAN subsystem registers with dev_add_pack() for ETH_P_CAN and ETH_P_CANFD only. These ETH_P_ types are only allowed to be created by CAN network devices (like vcan, vxcan, and real CAN drivers). I don't have any strong opinion on using WARN() or pr_warn_once(). Is this detected violation worth using WARN(), as something already must have gone really wrong to trigger this issue? Best regards, Oliver
Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote: > > > but this program will see only first SG ? > > Correct, to read further into the msg we would need to have a helper > or some other way to catch reads/writes past the first 4k and read > the next sg. We have the same limitation in cls_bpf. > > I have started a patch on top of this series with the current working > title msg_apply_bytes(int bytes). This would let us apply a verdict to > a set number of bytes instead of the entire msg. By calling > msg_apply_bytes(data_end - data) a user who needs to read an entire msg > could do this in 4k chunks until the entire msg is passed through the > bpf prog. good idea. I think would be good to add this helper as part of this patch set to make sure there is a way for user to look through the whole tcp stream if the program really wants to. I understand that program cannot examine every byte anyway due to lack of loops and helpers, but this part of sockmap api should still provide an interface from day one. One example would be the program parsing http2 or similar where in the header it sees length. Then it can do msg_apply_bytes(length) to skip the bytes it processed, but still continue within the same 64Kbyte chunk when 0 < length < 64k > > and it's typically going to be one page only ? > > yep > > > then what's the value of waiting for MAX_SKB_FRAGS ? > > > Its not waiting for MAX_SKB_FRAGS its simple copying up to MAX_SKB_FRAGS > pages in one call if possible. It seems better to me to run this loop > over as much data as we can. agree on trying to do MAX_SKB_FRAGS as a 'processing unit', but program should still be able to skip or redirect parts of the bytes and not the whole 64k chunk. >From program point of view it should never see or worry about SG list boundaries whereas right now it seems that below code is dealing with them (though program doesn't know where sg ends): > + > + switch (eval) { > + case __SK_PASS: > + sg_mark_end(sg + sg_num - 1); > + err = bpf_tcp_push(sk, sg, _curr, flags, true); > + if (unlikely(err)) { > + copied -= free_sg(sk, sg, sg_curr, sg_num); > + goto out_err; > + } > + break; > + case __SK_REDIRECT: > + sg_mark_end(sg + sg_num - 1); > + goto do_redir; ... > >> +static int bpf_tcp_ulp_register(void) > >> +{ > >> + tcp_bpf_proto = tcp_prot; > >> + tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg; > >> + tcp_bpf_proto.sendpage = bpf_tcp_sendpage; > >> + return tcp_register_ulp(_tcp_ulp_ops); > > > > I don't see corresponding tcp_unregister_ulp(). > > > > There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of > available ULPs and never removes it. To remove it we would have to > keep a ref count on the reg/unreg calls. This would require a couple > more patches to the ULP infra and not sure it hurts to leave the ULP > reference around... > > Maybe a follow on patch? Or else it could be a patch in front of this > patch. I see. I'm ok with leaving that for latter. It doesn't hurt to keep it registered. Please add a comment though.
[linux-next] kernel Oops when booting powerpc
Greeting's linux-next kernel booted with kernel Oops on powerpc machine. Machine Type: Power 8 [bare-metal & PowerVM LPAR] kernel version: 4.15.0-rc7-next-20180115 test: Boot config: attached bootlogs: - ses 0:0:3:0: Attached Enclosure device ses 0:0:4:0: Attached Enclosure device Rounding down aligned max_sectors from 4294967295 to 4294967168 Loading iSCSI transport class v2.0-870. iscsi: registered transport (iser) RPC: Registered rdma transport module. RPC: Registered rdma backchannel transport module. ip6_tables: (C) 2000-2006 Netfilter Core Team Ebtables v2.0 registered nf_conntrack version 0.5.0 (65536 buckets, 262144 max) Unable to handle kernel paging request for data at address 0x0118 Faulting instruction address: 0xd000102d0fa8 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=1024 NUMA PowerNV Modules linked in: ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 ib_core ses enclosure scsi_transport_sas i2c_opal ipmi_powernv i2c_core ipmi_devintf ipmi_msghandler powernv_op_panel nfsd auth_rpcgss kvm_hv nfs_acl kvm_pr lockd grace kvm sunrpc qla2xxx scsi_transport_fc tg3 ptp pps_core cxgb3 mdio CPU: 28 PID: 3447 Comm: ip6tables Not tainted 4.15.0-rc7-next-20180115-autotest #1 NIP: d000102d0fa8 LR: d000102d0f94 CTR: c0138ee0 REGS: c007a41377f0 TRAP: 0300 Not tainted (4.15.0-rc7-next-20180115-autotest) MSR: 90010280b033CR: 42002848 XER: CFAR: c000884c DAR: 0118 DSISR: 4000 SOFTE: 1 GPR00: d000102d0f94 c007a4137a70 d000102dd000 0100 GPR04: 0001 0001 GPR08: c007a4137880 f000 0005 GPR12: 2200 cfd53400 10014f80 GPR16: 0002 7fff9c400518 7c2b0c98 GPR20: 0003 10013c50 7c2b0ca0 7c2bffd3 GPR24: 7c2b0890 7c2b088c 7c2b0894 c007a4137d00 GPR28: 7c2b0894 c007a4137d00 0100 c13ef580 NIP [d000102d0fa8] get_info+0x98/0x290 [ip6_tables] LR [d000102d0f94] get_info+0x84/0x290 [ip6_tables] Call Trace: [c007a4137a70] [d000102d0f94] get_info+0x84/0x290 [ip6_tables] (unreliable) [c007a4137bb0] [d000102d2274] do_ip6t_get_ctl+0x94/0x590 [ip6_tables] [c007a4137c90] [c09d9ee8] nf_getsockopt+0x88/0xd0 [c007a4137ce0] [c0ab2170] ipv6_getsockopt+0x160/0x1f0 [c007a4137d30] [c0abe4c0] rawv6_getsockopt+0x40/0xd0 [c007a4137d50] [c094c7d4] sock_common_getsockopt+0x34/0x50 [c007a4137d70] [c094b228] SyS_getsockopt+0xa8/0x160 [c007a4137dd0] [c094bef8] SyS_socketcall+0x1f8/0x3d0 [c007a4137e30] [c000b8e0] system_call+0x58/0x6c Instruction dump: 2e3e 98610117 40920190 7fe3fb78 388a 38a100f8 480034f9 e8410018 3920f000 7fa34840 7c7e1b78 41dd01f4 40920144 3880 38a00054 ---[ end trace ecbb65add1313022 ]--- IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f0: link is not ready IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f0: link is not ready IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f1: link is not ready IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f1: link is not ready IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f2: link is not ready IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f2: link is not ready IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f3: link is not ready IPv6: ADDRCONF(NETDEV_UP): enP1p9s0f3: link is not ready IPv6: ADDRCONF(NETDEV_UP): net0: link is not ready IPv6: ADDRCONF(NETDEV_UP): net0: link is not ready IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready qla2xxx [0002:03:00.0]-8038:1: Cable is unplugged... iw_cxgb3: Chelsio T3 RDMA Driver - version 1.1 ib_srpt MAD registration failed for cxgb3_0-1. ib_srpt srpt_add_one(cxgb3_0) failed. iw_cxgb3: Initialized device :01:00.0 -- Regard's Abdul Haleem IBM Linux Technology Centre # # Automatically generated file; DO NOT EDIT. # Linux/powerpc 4.15.0-rc7 Kernel Configuration # CONFIG_PPC64=y # # Processor support # CONFIG_PPC_BOOK3S_64=y # CONFIG_PPC_BOOK3E_64 is not set # CONFIG_POWER7_CPU is not set CONFIG_POWER8_CPU=y CONFIG_PPC_BOOK3S=y CONFIG_PPC_FPU=y CONFIG_ALTIVEC=y CONFIG_VSX=y CONFIG_PPC_STD_MMU=y CONFIG_PPC_RADIX_MMU=y CONFIG_PPC_RADIX_MMU_DEFAULT=y CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y CONFIG_PPC_MM_SLICES=y CONFIG_PPC_HAVE_PMU_SUPPORT=y CONFIG_PPC_PERF_CTRS=y CONFIG_FORCE_SMP=y CONFIG_SMP=y CONFIG_NR_CPUS=1024 CONFIG_PPC_DOORBELL=y # CONFIG_CPU_BIG_ENDIAN is not set
Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
On 01/16/2018 06:25 PM, Alexei Starovoitov wrote: > On Fri, Jan 12, 2018 at 10:11:11AM -0800, John Fastabend wrote: >> This implements a BPF ULP layer to allow policy enforcement and >> monitoring at the socket layer. In order to support this a new >> program type BPF_PROG_TYPE_SK_MSG is used to run the policy at >> the sendmsg/sendpage hook. To attach the policy to sockets a >> sockmap is used with a new program attach type BPF_SK_MSG_VERDICT. [...] > > overall design looks clean. imo huge improvement from first version. > Great thanks for the review. > Few nits: > [...] >> + >> +static int bpf_tcp_push(struct sock *sk, struct scatterlist *sg, >> +int *sg_end, int flags, bool charge) >> +{ >> +int sendpage_flags = flags | MSG_SENDPAGE_NOTLAST; >> +int offset, ret = 0; >> +struct page *p; >> +size_t size; >> + >> +size = sg->length; >> +offset = sg->offset; >> + >> +while (1) { >> +if (sg_is_last(sg)) >> +sendpage_flags = flags; >> + >> +tcp_rate_check_app_limited(sk); >> +p = sg_page(sg); >> +retry: >> +ret = do_tcp_sendpages(sk, p, offset, size, sendpage_flags); >> +if (ret != size) { >> +if (ret > 0) { >> +offset += ret; >> +size -= ret; >> +goto retry; >> +} >> + >> +if (charge) >> +sk_mem_uncharge(sk, >> +sg->length - size - sg->offset); > > should the bool argument be called 'uncharge' instead ? > Agreed that would be a better name, will update. >> + >> +sg->offset = offset; >> +sg->length = size; >> +return ret; >> +} >> + >> +put_page(p); >> +if (charge) >> +sk_mem_uncharge(sk, sg->length); >> +*sg_end += 1; >> +sg = sg_next(sg); >> +if (!sg) >> +break; >> + >> +offset = sg->offset; >> +size = sg->length; >> +} >> + >> +return 0; >> +} [...] >> +static int bpf_tcp_sendmsg_do_redirect(struct scatterlist *sg, int sg_num, >> + struct sk_msg_buff *md, int flags) >> +{ >> +int i, sg_curr = 0, err, free; >> +struct smap_psock *psock; >> +struct sock *sk; >> + >> +rcu_read_lock(); >> +sk = do_msg_redirect_map(md); >> +if (unlikely(!sk)) >> +goto out_rcu; >> + >> +psock = smap_psock_sk(sk); >> +if (unlikely(!psock)) >> +goto out_rcu; >> + >> +if (!refcount_inc_not_zero(>refcnt)) >> +goto out_rcu; >> + >> +rcu_read_unlock(); >> +lock_sock(sk); >> +err = bpf_tcp_push(sk, sg, _curr, flags, false); >> +if (unlikely(err)) >> +goto out; >> +release_sock(sk); >> +smap_release_sock(psock, sk); >> +return 0; >> +out_rcu: >> +rcu_read_unlock(); >> +out: >> +for (i = sg_curr; i < sg_num; ++i) { >> +free += sg[i].length; >> +put_page(sg_page([i])); >> +} >> +return free; > > erro path keeps rcu_lock and sk locked? > yep, although looks like rcu_read_unlock() is OK because its released above the call but the if (unlikely(err)) goto err needs to be moved below the smap_release_sock(). Thanks! >> +} >> + [...] >> +while (msg_data_left(msg)) { >> +int sg_curr; >> + >> +if (sk->sk_err) { >> +err = sk->sk_err; >> +goto out_err; >> +} >> + >> +copy = msg_data_left(msg); >> +if (!sk_stream_memory_free(sk)) >> +goto wait_for_sndbuf; >> + >> +/* sg_size indicates bytes already allocated and sg_num >> + * is last sg element used. This is used when alloc_sg >> + * partially allocates a scatterlist and then is sent >> + * to wait for memory. In normal case (no memory pressure) >> + * both sg_nun and sg_size are zero. >> + */ >> +copy = copy - sg_size; >> +err = sk_alloc_sg(sk, copy, sg, _num, _size, 0); >> +if (err) { >> +if (err != -ENOSPC) >> +goto wait_for_memory; >> +copy = sg_size; >> +} >> + >> +err = memcopy_from_iter(sk, sg, sg_num, >msg_iter, copy); >> +if (err < 0) { >> +free_sg(sk, sg, 0, sg_num); >> +goto out_err; >> +} >> + >> +copied += copy; >> + >> +/* If msg is larger than MAX_SKB_FRAGS we can send multiple >> + * scatterlists per msg. However BPF decisions apply to the >> + * entire msg. >> + */ >> +if (eval
net merged into net-next
Daniel, please double check my merge work especially wrt. your packet scheduler fix. Thanks!
Re: [PATCH net] net: validate untrusted gso packets
On Tue, Jan 16, 2018 at 11:33 PM, Willem de Bruijnwrote: > On Tue, Jan 16, 2018 at 11:04 PM, Jason Wang wrote: >> >> >> On 2018年01月17日 04:29, Willem de Bruijn wrote: >>> >>> From: Willem de Bruijn >>> >>> Validate gso packet type and headers on kernel entry. Reuse the info >>> gathered by skb_probe_transport_header. >>> >>> Syzbot found two bugs by passing bad gso packets in packet sockets. >>> Untrusted user packets are limited to a small set of gso types in >>> virtio_net_hdr_to_skb. But segmentation occurs on packet contents. >>> Syzkaller was able to enter gso callbacks that are not hardened >>> against untrusted user input. >> >> >> Do this mean there's something missed in exist header check for dodgy >> packets? > > virtio_net_hdr_to_skb checks gso_type, but it does not verify that this > type correctly describes the actual packet. Segmentation happens based > on packet contents. So a packet was crafted to enter sctp gso, even > though no such gso_type exists. This issue is not specific to sctp. > >>> >>> User packets can also have corrupted headers, tripping up segmentation >>> logic that expects sane packets from the trusted protocol stack. >>> Hardening all segmentation paths against all bad packets is error >>> prone and slows down the common path, so validate on kernel entry. >> >> >> I think evil packets should be rare in common case, so I'm not sure validate >> it on kernel entry is a good choice especially consider we've already had >> header check. > > This just makes that check more strict. Frequency of malicious packets is > not really relevant if a single bad packet can cause damage. > > The alternative to validate on kernel entry is to harden the entire > segmentation > layer and lower part of the stack. That is much harder to get right and not > necessarily cheaper. > > As a matter of fact, it incurs a cost on all packets, including the common > case generated by the protocol stack. If packets can be fully validated at the source, we can eventually also get rid of the entire SKB_GSO_DODGY and NETIF_F_GSO_ROBUST logic. Then virtio packets won't have to enter the segmentation layer at all for TSO capable devices.
Re: [PATCH net] net: validate untrusted gso packets
On Tue, Jan 16, 2018 at 11:04 PM, Jason Wangwrote: > > > On 2018年01月17日 04:29, Willem de Bruijn wrote: >> >> From: Willem de Bruijn >> >> Validate gso packet type and headers on kernel entry. Reuse the info >> gathered by skb_probe_transport_header. >> >> Syzbot found two bugs by passing bad gso packets in packet sockets. >> Untrusted user packets are limited to a small set of gso types in >> virtio_net_hdr_to_skb. But segmentation occurs on packet contents. >> Syzkaller was able to enter gso callbacks that are not hardened >> against untrusted user input. > > > Do this mean there's something missed in exist header check for dodgy > packets? virtio_net_hdr_to_skb checks gso_type, but it does not verify that this type correctly describes the actual packet. Segmentation happens based on packet contents. So a packet was crafted to enter sctp gso, even though no such gso_type exists. This issue is not specific to sctp. >> >> User packets can also have corrupted headers, tripping up segmentation >> logic that expects sane packets from the trusted protocol stack. >> Hardening all segmentation paths against all bad packets is error >> prone and slows down the common path, so validate on kernel entry. > > > I think evil packets should be rare in common case, so I'm not sure validate > it on kernel entry is a good choice especially consider we've already had > header check. This just makes that check more strict. Frequency of malicious packets is not really relevant if a single bad packet can cause damage. The alternative to validate on kernel entry is to harden the entire segmentation layer and lower part of the stack. That is much harder to get right and not necessarily cheaper. As a matter of fact, it incurs a cost on all packets, including the common case generated by the protocol stack.
Re: [PATCH 32/32] aio: implement io_pgetevents
On Tue, Jan 16, 2018 at 07:41:24PM -0500, Jeff Moyer wrote: > if (sigmask) { > - if (copy_from_user(, sigmask, sizeof(ksigmask))) > + if (!access_ok(VERIFY_READ, sigmask, > +sizeof(void *) + sizeof(size_t)) || > + __get_user(up, (sigset_t __user * __user *)sigmask) || > + __get_user(sigsetsize, > +(size_t __user *)(sigmask + sizeof(void * > return -EFAULT; How about copy_from_user() on a struct? Making eyes bleed is fun, but people tend to get annoyed when you do it to them...
Re: [PATCH bpf-next v2 09/11] nfp: bpf: use extack support to improve debugging
On 1/16/18 12:11 PM, Jakub Kicinski wrote: > On Tue, 16 Jan 2018 10:36:01 +0100, Jiri Pirko wrote: >>> @@ -303,7 +305,8 @@ static int nfp_net_bpf_load(struct nfp_net *nn, struct >>> bpf_prog *prog) >>> /* Load up the JITed code */ >>> err = nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_BPF); >>> if (err) >>> - nn_err(nn, "FW command error while loading BPF: %d\n", err); >>> + NL_SET_ERR_MSG_MOD(extack, >>> + "FW command error while loading BPF"); >> >> One line please. Same for all others. Strings may overflow 80 cols. > > Sorry, but this is the way I want things in the nfp driver. If the > string would fit 80 chars placed on a new line, it should be placed > on a new line. If it doesn't fit anyway the new line is unnecessary. > This rules is adhered to throughout the driver (to the extent I'm able > to enforce it). > +1
Re: [PATCH net] net: validate untrusted gso packets
On 2018年01月17日 04:29, Willem de Bruijn wrote: From: Willem de BruijnValidate gso packet type and headers on kernel entry. Reuse the info gathered by skb_probe_transport_header. Syzbot found two bugs by passing bad gso packets in packet sockets. Untrusted user packets are limited to a small set of gso types in virtio_net_hdr_to_skb. But segmentation occurs on packet contents. Syzkaller was able to enter gso callbacks that are not hardened against untrusted user input. Do this mean there's something missed in exist header check for dodgy packets? User packets can also have corrupted headers, tripping up segmentation logic that expects sane packets from the trusted protocol stack. Hardening all segmentation paths against all bad packets is error prone and slows down the common path, so validate on kernel entry. I think evil packets should be rare in common case, so I'm not sure validate it on kernel entry is a good choice especially consider we've already had header check. Introduce skb_probe_transport_header_hard to unconditionally probe, even if skb_partial_csum_set has already set an offset. That is under user control, so do not trust it. I did not see a measurable change in TCP_STREAM performance. Move tpacket probe call after virtio_net_hdr_to_skb has set gso_type. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Fixes: f43798c27684 ("tun: Allow GSO using virtio_net_hdr") Fixes: f942dc2552b8 ("xen network backend driver") Link:http://lkml.kernel.org/r/<001a1137452496ffc305617e5...@google.com> Reported-by:syzbot+fee64147a25aecd48...@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn ... Thanks
Re: [PATCH bpf-next 3/3] tools: bpftool: improve architecture detection by using ifindex
CC: Francois On Tue, 16 Jan 2018 19:42:15 -0800, Alexei Starovoitov wrote: > > + switch (vendor_id) { > > + case 0x19ee: > > + device_id = read_sysfs_netdev_hex_int(devname, "device"); > > + if (device_id != 0x4000 && > > + device_id != 0x6000 && > > + device_id != 0x6003) > > + p_info("Unknown NFP device ID, assuming it is NFP-6xxx > > arch"); > > + return "NFP-6xxx"; > > is this a canonical name that bfd will understand? Yes, we started with just "nfp", but tossed "6k" for good measure. > a link to bfd patches? Unfortunately not posted, yet.
[PATCH] Bluetooth: 6lowpan: Fix disconnect bug in 6lowpan
This patch fix the bluetooth 6lowpan disconnect fail bug. The type of the same address type have different define value in HCI layer and L2CAP layer.That makes disconnect fail due to wrong network type.User will not be able to disconnect from console with the network type that used in connect. This patch add a var lookup_type, and covert the channel address type to HCI address type. By these means, user can disconnect successfuly. Signed-off-by: Guo Yi--- net/bluetooth/6lowpan.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c index 795ddd8..332dddb 100644 --- a/net/bluetooth/6lowpan.c +++ b/net/bluetooth/6lowpan.c @@ -1104,11 +1104,18 @@ static int get_l2cap_conn(char *buf, bdaddr_t *addr, u8 *addr_type, struct hci_dev *hdev; bdaddr_t *src = BDADDR_ANY; int n; + u8 lookup_type; n = sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx %hhu", >b[5], >b[4], >b[3], >b[2], >b[1], >b[0], addr_type); + /* Convert from L2CAP channel address type to HCI address type +*/ + if (*addr_type == BDADDR_LE_PUBLIC) + lookup_type = ADDR_LE_DEV_PUBLIC; + else + lookup_type = ADDR_LE_DEV_RANDOM; if (n < 7) return -EINVAL; @@ -1118,7 +1125,7 @@ static int get_l2cap_conn(char *buf, bdaddr_t *addr, u8 *addr_type, return -ENOENT; hci_dev_lock(hdev); - hcon = hci_conn_hash_lookup_le(hdev, addr, *addr_type); + hcon = hci_conn_hash_lookup_le(hdev, addr, lookup_type); hci_dev_unlock(hdev); if (!hcon) -- 2.7.4
Re: pull-request: bpf-next 2018-01-17
From: Daniel BorkmannDate: Wed, 17 Jan 2018 02:01:28 +0100 > The following pull-request contains BPF updates for your *net-next* tree. ... > Please consider pulling these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Pulled, thanks Daniel.
Re: [PATCH bpf-next 3/3] tools: bpftool: improve architecture detection by using ifindex
On Tue, Jan 16, 2018 at 04:05:21PM -0800, Jakub Kicinski wrote: > From: Jiong Wang> > The current architecture detection method in bpftool is designed for host > case. > > For offload case, we can't use the architecture of "bpftool" itself. > Instead, we could call the existing "ifindex_to_name_ns" to get DEVNAME, > then read pci id from /sys/class/dev/DEVNAME/device/vendor, finally we map > vendor id to bfd arch name which will finally be used to select bfd backend > for the disassembler. > > Reviewed-by: Jakub Kicinski > Signed-off-by: Jiong Wang awesome addition! Acked-by: Alexei Starovoitov > + switch (vendor_id) { > + case 0x19ee: > + device_id = read_sysfs_netdev_hex_int(devname, "device"); > + if (device_id != 0x4000 && > + device_id != 0x6000 && > + device_id != 0x6003) > + p_info("Unknown NFP device ID, assuming it is NFP-6xxx > arch"); > + return "NFP-6xxx"; is this a canonical name that bfd will understand? a link to bfd patches?
Re: [PATCH bpf-next 1/3] bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info
On Tue, Jan 16, 2018 at 04:05:19PM -0800, Jakub Kicinski wrote: > From: Jiong Wang> > For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog > used by all host JIT targets to get jited image and it's length. While for > offload, targets are likely to have different offload mechanisms that these > info are kept in device private data fields. > > Therefore, BPF_OBJ_GET_INFO_BY_FD syscall needs an unified way to get JIT > length and contents info for offload targets. > > One way is to introduce new callback to parse device private data then fill > those fields in bpf_prog_info. This might be a little heavy, the other way > is to add generic fields which will be initialized by all offload targets. > > This patch follow the second approach to introduce two new fields in > struct bpf_dev_offload and teach bpf_prog_get_info_by_fd about them to fill > correct jited_prog_len and jited_prog_insns in bpf_prog_info. > > Reviewed-by: Jakub Kicinski > Signed-off-by: Jiong Wang Acked-by: Alexei Starovoitov initially I wasn't sure that reusing jited_prog_insns field to return offloaded prog is such a good idea, but since we fill in ifindex at the same time the usage of the field is not ambiguous, so I think it's a good approach.
Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors
On 1/16/18 4:19 PM, Jamal Hadi Salim wrote: > On 18-01-16 06:58 PM, David Ahern wrote: >> On 1/16/18 9:20 AM, Alexander Aring wrote: > > >>>  } >>>   if (n->nlmsg_type != RTM_NEWTFILTER || >>>  !(n->nlmsg_flags & NLM_F_CREATE)) { >>> +   NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and >>> NLM_F_CREATE to create a new filter"); >> >> that does not seem the right message. tc_ctl_tfilter is overloaded for >> new, delete and get so the response here needs to reflect that. I >> believe in this case the user did not specify a valid chain. >> > > Are you sure you are looking at the correct code? tp = tcf_chain_tp_find(chain, _info, protocol, prio, prio_allocate); if (IS_ERR(tp)) { err = PTR_ERR(tp); goto errout; } if (tp == NULL) { /* Proto-tcf does not exist, create new one */ if (tca[TCA_KIND] == NULL || !protocol) { err = -EINVAL; goto errout; } if (n->nlmsg_type != RTM_NEWTFILTER || !(n->nlmsg_flags & NLM_F_CREATE)) { err = -ENOENT; goto errout; } Seems like that code path is run for other than RTM_NEWTFILTER. Even the check there says != is ok -- just error out with an ENOENT. > It is a create message that is at stake here. > A create has to have RTM_NEWTFILTER and NLM_F_CREATE > >> Also, the messages are targeted at users not developers, so no code >> jargon / API references. > > Generally true, but should this rule really be scripture? > The main user here is tc in user space and it doesnt make mistakes > in this case i.e we will never see this error with tc because a > create will always have those two set correctly; OTOH, a developer > writing some new app is more likely to make this mistake (in which > case this message is very helpful). argumentative. I have focused on adding specific error messages that help a user understand why a command failed. It can be done with referencing API names.
[PATCH net-next] mlxsw: spectrum: Make function mlxsw_sp_kvdl_part_occ() static
Fixes the following sparse warning: drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning: symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static? Signed-off-by: Wei Yongjun--- drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c index cfacc17..55f9d2d 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c @@ -286,7 +286,7 @@ static void mlxsw_sp_kvdl_parts_fini(struct mlxsw_sp *mlxsw_sp) mlxsw_sp_kvdl_part_fini(mlxsw_sp, i); } -u64 mlxsw_sp_kvdl_part_occ(struct mlxsw_sp_kvdl_part *part) +static u64 mlxsw_sp_kvdl_part_occ(struct mlxsw_sp_kvdl_part *part) { unsigned int nr_entries; int bit = -1;
[PATCH net-next] devlink: Make some functions static
Fixes the following sparse warnings: net/core/devlink.c:2297:25: warning: symbol 'devlink_resource_find' was not declared. Should it be static? net/core/devlink.c:2322:6: warning: symbol 'devlink_resource_validate_children' was not declared. Should it be static? Signed-off-by: Wei Yongjun--- net/core/devlink.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/core/devlink.c b/net/core/devlink.c index dd7d6dd..66d3670 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -2294,7 +2294,7 @@ static int devlink_nl_cmd_dpipe_table_counters_set(struct sk_buff *skb, counters_enable); } -struct devlink_resource * +static struct devlink_resource * devlink_resource_find(struct devlink *devlink, struct devlink_resource *resource, u64 resource_id) { @@ -2319,7 +2319,8 @@ struct devlink_resource * return NULL; } -void devlink_resource_validate_children(struct devlink_resource *resource) +static void +devlink_resource_validate_children(struct devlink_resource *resource) { struct devlink_resource *child_resource; bool size_valid = true;
[PATCHv2 net-next 1/1] forcedeth: remove unused variable
The variable miistat is not used. So it is removed. CC: Srinivas EedaCC: Joe Jin CC: Junxiao Bi Signed-off-by: Zhu Yanjun --- v1->v2: Keep readl function --- drivers/net/ethernet/nvidia/forcedeth.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c index 21e15cb..a3f6d51 100644 --- a/drivers/net/ethernet/nvidia/forcedeth.c +++ b/drivers/net/ethernet/nvidia/forcedeth.c @@ -5510,11 +5510,9 @@ static int nv_open(struct net_device *dev) /* One manual link speed update: Interrupts are enabled, future link * speed changes cause interrupts and are handled by nv_link_irq(). */ - { - u32 miistat; - miistat = readl(base + NvRegMIIStatus); - writel(NVREG_MIISTAT_MASK_ALL, base + NvRegMIIStatus); - } + readl(base + NvRegMIIStatus); + writel(NVREG_MIISTAT_MASK_ALL, base + NvRegMIIStatus); + /* set linkspeed to invalid value, thus force nv_update_linkspeed * to init hw */ np->linkspeed = 0; -- 2.7.4
Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
On Fri, Jan 12, 2018 at 10:11:11AM -0800, John Fastabend wrote: > This implements a BPF ULP layer to allow policy enforcement and > monitoring at the socket layer. In order to support this a new > program type BPF_PROG_TYPE_SK_MSG is used to run the policy at > the sendmsg/sendpage hook. To attach the policy to sockets a > sockmap is used with a new program attach type BPF_SK_MSG_VERDICT. > > Similar to previous sockmap usages when a sock is added to a > sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT > program type attached then the BPF ULP layer is created on the > socket and the attached BPF_PROG_TYPE_SK_MSG program is run for > every msg in sendmsg case and page/offset in sendpage case. > > BPF_PROG_TYPE_SK_MSG Semantics/API: > > BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and > SK_DROP. Returning SK_DROP free's the copied data in the sendmsg > case and in the sendpage case leaves the data untouched. Both cases > return -EACESS to the user. Returning SK_PASS will allow the msg to > be sent. > > In the sendmsg case data is copied into kernel space buffers before > running the BPF program. In the sendpage case data is never copied. > The implication being users may change data after BPF programs run in > the sendpage case. (A flag will be added to always copy shortly > if the copy must always be performed). > > The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg > in the sendmsg() case and the entire page/offset in the sendpage case. > This avoid ambiguity on how to handle mixed return codes in the > sendmsg case. The readable/writeable data provided to the program > in the sendmsg case may not be the entire message, in fact for > large sends this is likely the case. The data range that can be > read is part of the sk_msg_md structure. This is because similar > to the tc bpf_cls case the data is stored in a scatter gather list. > Future work will address this short-coming to allow users to pull > in more data if needed (similar to TC BPF). > > The helper msg_redirect_map() can be used to select the socket to > send the data on. This is used similar to existing redirect use > cases. This allows policy to redirect msgs. > > Pseudo code simple example: > > The basic logic to attach a program to a socket is as follows, > > // load the programs > bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG, > , _prog); > > // lookup the sockmap > bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map"); > > // get fd for sockmap > map_fd_msg = bpf_map__fd(bpf_map_msg); > > // attach program to sockmap > bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0); > > Adding sockets to the map is done in the normal way, > > // Add a socket 'fd' to sockmap at location 'i' > bpf_map_update_elem(map_fd_msg, , fd, BPF_ANY); > > After the above any socket attached to "my_sock_map", in this case > 'fd', will run the BPF msg verdict program (msg_prog) on every > sendmsg and sendpage system call. > > For a complete example see BPF selftests bpf/sockmap_tcp_msg_*.c and > test_maps.c > > Implementation notes: > > It seemed the simplest, to me at least, to use a refcnt to ensure > psock is not lost across the sendmsg copy into the sg, the bpf program > running on the data in sg_data, and the final pass to the TCP stack. > Some performance testing may show a better method to do this and avoid > the refcnt cost, but for now use the simpler method. > > Another item that will come after basic support is in place is > supporting MSG_MORE flag. At the moment we call sendpages even if > the MSG_MORE flag is set. An enhancement would be to collect the > pages into a larger scatterlist and pass down the stack. Notice that > bpf_tcp_sendmsg() could support this with some additional state saved > across sendmsg calls. I built the code to support this without having > to do refactoring work. Other flags TBD include ZEROCOPY flag. > > Yet another detail that needs some thought is the size of scatterlist. > Currently, we use MAX_SKB_FRAGS simply because this was being used > already in the TLS case. Future work to improve the kernel sk APIs to > tune this depending on workload may be useful. This is a trade-off > between memory usage and B/s performance. > > Signed-off-by: John Fastabendoverall design looks clean. imo huge improvement from first version. Few nits: > --- > include/linux/bpf.h |1 > include/linux/bpf_types.h |1 > include/linux/filter.h| 10 + > include/net/tcp.h |2 > include/uapi/linux/bpf.h | 28 +++ > kernel/bpf/sockmap.c | 485 > - > kernel/bpf/syscall.c | 14 + > kernel/bpf/verifier.c |5 > net/core/filter.c | 106 ++ > 9 files changed, 638 insertions(+), 14 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 9e03046..14cdb4d
[PATCH net-next] tcp: avoid negotitating ECN for BBR
This patch keeps BBR from negotiating ECN if sysctl ECN is set. Prior to this patch, BBR negotiates ECN if enabled, sends CWR upon receiving ECE ACKs but does not react to them. This can cause confusion from the protocol perspective. Therefore this patch prevents the connection from negotiating ECN if BBR is the congestion control during the handshake. Note that after the handshake, the user can still switch to a different congestion control that supports or even requires ECN (e.g. DCTCP). In that case, the connection can not re-negotiate ECN and has to go with the ECN-free mode in that congestion control. There are other cases BBR would still respond to ECE ACKs with CWR but does not react to it like the behavior before this patch. First, when the user switches to BBR congestion control but the connection has already negotiated ECN before. Second, the system has configured the ip route and/or uses eBPF to enable ECN on the connection that uses BBR congestion control. Signed-off-by: Yuchung ChengSigned-off-by: Neal Cardwell Acked-by: Yousuk Seung Acked-by: Eric Dumazet --- include/net/tcp.h | 7 +++ net/ipv4/tcp_bbr.c| 2 +- net/ipv4/tcp_input.c | 3 ++- net/ipv4/tcp_output.c | 6 -- 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 6939e69d3c37..22345132d969 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -925,6 +925,8 @@ enum tcp_ca_ack_event_flags { #define TCP_CONG_NON_RESTRICTED 0x1 /* Requires ECN/ECT set on all packets */ #define TCP_CONG_NEEDS_ECN 0x2 +/* Does not use or react to ECN */ +#define TCP_CONG_DONT_USE_ECN 0x4 union tcp_cc_info; @@ -1033,6 +1035,11 @@ static inline bool tcp_ca_needs_ecn(const struct sock *sk) return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN; } +static inline bool tcp_ca_uses_ecn(const struct sock *sk) +{ + return !(inet_csk(sk)->icsk_ca_ops->flags & TCP_CONG_DONT_USE_ECN); +} + static inline void tcp_set_ca_state(struct sock *sk, const u8 ca_state) { struct inet_connection_sock *icsk = inet_csk(sk); diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 8322f26e770e..27456554b113 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -926,7 +926,7 @@ static void bbr_set_state(struct sock *sk, u8 new_state) } static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = { - .flags = TCP_CONG_NON_RESTRICTED, + .flags = TCP_CONG_NON_RESTRICTED | TCP_CONG_DONT_USE_ECN, .name = "bbr", .owner = THIS_MODULE, .init = bbr_init, diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ff71b18d9682..6731d0b9b146 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -6090,7 +6090,8 @@ static void tcp_ecn_create_request(struct request_sock *req, ect = !INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield); ecn_ok_dst = dst_feature(dst, DST_FEATURE_ECN_MASK); - ecn_ok = net->ipv4.sysctl_tcp_ecn || ecn_ok_dst; + ecn_ok = ecn_ok_dst || +(net->ipv4.sysctl_tcp_ecn && tcp_ca_uses_ecn(listen_sk)); if ((!ect && ecn_ok) || tcp_ca_needs_ecn(listen_sk) || (ecn_ok_dst & DST_FEATURE_ECN_CA) || diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 95461f02ac9a..446cb65090f5 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -312,8 +312,10 @@ static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); bool bpf_needs_ecn = tcp_bpf_ca_needs_ecn(sk); - bool use_ecn = sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 || - tcp_ca_needs_ecn(sk) || bpf_needs_ecn; + bool use_ecn = tcp_ca_needs_ecn(sk) || bpf_needs_ecn; + + if (sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 && tcp_ca_uses_ecn(sk)) + use_ecn = true; if (!use_ecn) { const struct dst_entry *dst = __sk_dst_get(sk); -- 2.16.0.rc1.238.g530d649a79-goog
[PATCH] net:l2tp: Allow MAC to be configured via netlink
The linux kernel by default uses random MAC address for l2tp interfaces. However, there are situations where it is desirable to have a deterministic MAC address. A sample scenario would be where the host IP stack is attached directly to a tunnel hence the "random" address is now propagated via ARP/ND to the other end of the tunnel. If the device reboots, a new MAC is used and the communication over the tunnel will be disrupted until the new MAC address is re-learned by the peer. Therefore it can be useful to have the mac address the same across reboots. The patch makes the MAC address to be configurable via netlink so that a userspace program can specify what MAC address to use at interface creation time in the kernel. Signed-off-by: Isaac Lee--- include/uapi/linux/l2tp.h | 1 + net/l2tp/l2tp_core.h | 1 + net/l2tp/l2tp_eth.c | 7 ++- net/l2tp/l2tp_netlink.c | 3 +++ 4 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h index d84ce5c1c9aa..fec15fd774c4 100644 --- a/include/uapi/linux/l2tp.h +++ b/include/uapi/linux/l2tp.h @@ -126,6 +126,7 @@ enum { L2TP_ATTR_IP6_DADDR,/* struct in6_addr */ L2TP_ATTR_UDP_ZERO_CSUM6_TX,/* flag */ L2TP_ATTR_UDP_ZERO_CSUM6_RX,/* flag */ + L2TP_ATTR_HWADDR, /* 6 bytes */ L2TP_ATTR_PAD, __L2TP_ATTR_MAX, }; diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h index 9534e16965cc..730021289ce5 100644 --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -71,6 +71,7 @@ struct l2tp_session_cfg { int mtu; int mru; char*ifname; + unsigned char hwaddr[ETH_ALEN]; }; struct l2tp_session { diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c index 5c366ecfa1cb..0e6ef5379b64 100644 --- a/net/l2tp/l2tp_eth.c +++ b/net/l2tp/l2tp_eth.c @@ -58,7 +58,9 @@ struct l2tp_eth_sess { static int l2tp_eth_dev_init(struct net_device *dev) { - eth_hw_addr_random(dev); + /* Use random MAC only when the interface is created without dev_addr */ + if (!dev->dev_addr || !is_valid_ether_addr(dev->dev_addr)) + eth_hw_addr_random(dev); eth_broadcast_addr(dev->broadcast); netdev_lockdep_set_classes(dev); @@ -309,6 +311,9 @@ static int l2tp_eth_create(struct net *net, struct l2tp_tunnel *tunnel, dev->max_mtu = ETH_MAX_MTU; l2tp_eth_adjust_mtu(tunnel, session, dev); + if (is_valid_ether_addr(cfg->hwaddr)) + ether_addr_copy(dev->dev_addr, cfg->hwaddr); + priv = netdev_priv(dev); priv->session = session; diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index a1f24fb2be98..dc2933c32121 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -607,6 +607,9 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf if (info->attrs[L2TP_ATTR_MRU]) cfg.mru = nla_get_u16(info->attrs[L2TP_ATTR_MRU]); + if (info->attrs[L2TP_ATTR_HWADDR]) + memcpy(, nla_data(info->attrs[L2TP_ATTR_HWADDR]), ETH_ALEN); + #ifdef CONFIG_MODULES if (l2tp_nl_cmd_ops[cfg.pw_type] == NULL) { genl_unlock(); -- 2.15.1
Re: [PATCH net-next 0/8] net: sched: cls: add extack support
On 01/17/2018 01:08 AM, Daniel Borkmann wrote: > Hey David, and others, [+Alexei] > > On 01/17/2018 12:27 AM, Jamal Hadi Salim wrote: >> On 18-01-16 05:41 PM, Jakub Kicinski wrote: >>> On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote: On 18-01-16 04:46 PM, Jakub Kicinski wrote: > On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote: [..] I would say precedence should be Jiri's patches, Alex's patches and then yours: Alex's patches fix the core (cls_api.c) area with proper extack for the core and then he has one patch to cover a specific use case of the u32 classifier extack. Yours is only concerned with one use case - bpf which depend on the core (that is in Alex's patches) >>> >>> Our patches are concerned with propagating the extack to drivers, >>> and nfp (and netdevsim) make use of it. >>> >>> I'm miffed by the fact that you jumped out with this conflicting series >>> *after* we posted ours, and we got shot down on white space. > > So I've been looking over Quentin's series just now that sits in my > bucket and it looks fine to me, but merge with this one would probably > end up badly for David. Therefore I'm proposing the following that > should hopefully be fine and work out for Alexander and Jakub/Quentin > as a consensus: > > I'm getting the current bpf-next stuff as PR out in a few minutes, so > David can pull this in and therefore net-next will also have the > dependency on nfp for Quentin's series. Then, given this one here > needs another respin anyway, I would suggest to combine the missing > patches from Alexander's series, and get it all out in a single patch > series directly for net-next w/o any interdependency hassle. Ok, bpf-next PR with the nfp dependencies is now out, so all this can make progress here. I've therefore purged Jakub's extack series from bpf queue, so a combined series can target net-next directly then. Thanks, Daniel
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got conflicts in: net/sched/sch_ingress.c net/sched/sch_api.c include/net/sch_generic.h between commit: 81d947e2b8dd ("net, sched: fix panic when updating miniq {b,q}stats") from the net tree and commits: 54160ef6ec64 ("net: sched: sch_api: rearrange init handling") 8d1a77f974ca ("net: sch: api: add extack support in tcf_block_get") d0bd684dddab ("net: sch: api: add extack support in qdisc_alloc") from the net-next tree. I fixed it up (I think, see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc include/net/sch_generic.h index becf86aa4ac6,ac029d5d88e4.. --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@@ -444,10 -471,11 +471,12 @@@ void qdisc_destroy(struct Qdisc *qdisc) void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n, unsigned int len); struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, - const struct Qdisc_ops *ops); + const struct Qdisc_ops *ops, + struct netlink_ext_ack *extack); +void qdisc_free(struct Qdisc *qdisc); struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue, - const struct Qdisc_ops *ops, u32 parentid); + const struct Qdisc_ops *ops, u32 parentid, + struct netlink_ext_ack *extack); void __qdisc_calculate_pkt_len(struct sk_buff *skb, const struct qdisc_size_table *stab); int skb_do_redirect(struct sk_buff *); diff --cc net/sched/sch_api.c index 52529b7f8d96,0038a1c44ee9.. --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@@ -1062,43 -1088,64 +1088,53 @@@ static struct Qdisc *qdisc_create(struc netdev_info(dev, "Caught tx_queue_len zero misconfig\n"); } - if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) { - if (tca[TCA_STAB]) { - stab = qdisc_get_stab(tca[TCA_STAB]); - if (IS_ERR(stab)) { - err = PTR_ERR(stab); - goto err_out4; - } - rcu_assign_pointer(sch->stab, stab); - } - if (tca[TCA_RATE]) { - seqcount_t *running; - - err = -EOPNOTSUPP; - if (sch->flags & TCQ_F_MQROOT) - goto err_out4; - - if ((sch->parent != TC_H_ROOT) && - !(sch->flags & TCQ_F_INGRESS) && - (!p || !(p->flags & TCQ_F_MQROOT))) - running = qdisc_root_sleeping_running(sch); - else - running = >running; - - err = gen_new_estimator(>bstats, - sch->cpu_bstats, - >rate_est, - NULL, - running, - tca[TCA_RATE]); - if (err) - goto err_out4; + if (ops->init) { + err = ops->init(sch, tca[TCA_OPTIONS], extack); + if (err != 0) + goto err_out5; + } + - if (qdisc_is_percpu_stats(sch)) { - sch->cpu_bstats = - netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); - if (!sch->cpu_bstats) - goto err_out4; - - sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue); - if (!sch->cpu_qstats) - goto err_out4; - } - + if (tca[TCA_STAB]) { + stab = qdisc_get_stab(tca[TCA_STAB], extack); + if (IS_ERR(stab)) { + err = PTR_ERR(stab); + goto err_out4; } + rcu_assign_pointer(sch->stab, stab); + } + if (tca[TCA_RATE]) { + seqcount_t *running; - qdisc_hash_add(sch, false); + err = -EOPNOTSUPP; + if (sch->flags & TCQ_F_MQROOT) { + NL_SET_ERR_MSG(extack, "Cannot attach rate estimator to a multi-queue root qdisc"); + goto err_out4; + } - return sch; + if (sch->parent != TC_H_ROOT && +
Re: [bpf-next PATCH 0/3] libbpf: cleanups to Makefile
On 01/17/2018 12:20 AM, Jesper Dangaard Brouer wrote: > This patchset contains some small improvements and cleanup for > the Makefile in tools/lib/bpf/. > > It worries me that the libbpf.so shared library is not versioned, but > it not addressed in this patchset. Looks good; applied it to bpf-next, thanks Jesper!
Re: [PATCH bpf-next 0/6] bpf: various fixes and improvements
On 01/17/2018 12:51 AM, Jakub Kicinski wrote: > Hi! > > This series combines a number of random improvements ranging from > libbpf to nfp driver. NFP patches make better use of the verifier > log. There is a requested adjustment to the map offload code, and > a warning fix for a W=1 build to the disassembler. Quentin also > fixes the libbpf program type detection, while Jiong allows the use > of libbfd compiled from source. I did apply the series to bpf-next, thanks guys!
pull-request: bpf-next 2018-01-17
Hi David, The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add initial BPF map offloading for nfp driver. Currently only programs were supported so far w/o being able to access maps. Offloaded programs are right now only allowed to perform map lookups, and control path is responsible for populating the maps. BPF core infrastructure along with nfp implementation is provided, from Jakub. 2) Various follow-ups to Josef's BPF error injections. More specifically that includes: properly check whether the error injectable event is on function entry or not, remove the percpu bpf_kprobe_override and rather compare instruction pointer with original one, separate error-injection from kprobes since it's not limited to it, add injectable error types in order to specify what is the expected type of failure, and last but not least also support the kernel's fault injection framework, all from Masami. 3) Various misc improvements and cleanups to the libbpf Makefile. That is, fix permissions when installing BPF header files, remove unused variables and functions, and also install the libbpf.h header, from Jesper. 4) When offloading to nfp JIT and the BPF insn is unsupported in the JIT, then reject right at verification time. Also fix libbpf with regards to ELF section name matching by properly treating the program type as prefix. Both from Quentin. 5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler. This is needed, for example, when building libfd from source as bpftool doesn't supply a config.h for bfd.h. Fix from Jiong. 6) xdp_convert_ctx_access() is simplified since it doesn't need to set target size during verification, from Jesper. 7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE program types, from Roman. 8) Various functions in BPF cpumap were not declared static, from Wei. 9) Fix a double semicolon in BPF samples, from Luis. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot! The following changes since commit 6bd39bc3da0f4a301fae69c4a32db2768f5118be: Merge branch 'hns3-add-some-new-features-and-fix-some-bugs' (2018-01-12 10:12:33 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git for you to fetch changes up to e8a9d9683c8a62f917c19e57f1618363fb9ed04e: Merge branch 'bpf-libbpf-cleanups' (2018-01-17 01:18:12 +0100) Alexei Starovoitov (1): Merge branch 'error-injection' Daniel Borkmann (3): Merge branch 'bpf-nfp-map-offload' Merge branch 'bpf-various-improvements' Merge branch 'bpf-libbpf-cleanups' Jakub Kicinski (18): bpf: add map_alloc_check callback bpf: hashtab: move attribute validation before allocation bpf: hashtab: move checks out of alloc function bpf: add helper for copying attrs to struct bpf_map bpf: rename bpf_dev_offload -> bpf_prog_offload bpf: offload: factor out netdev checking at allocation time bpf: offload: add map offload infrastructure nfp: bpf: add map data structure nfp: bpf: add basic control channel communication nfp: bpf: implement helpers for FW map ops nfp: bpf: parse function call and map capabilities nfp: bpf: add helpers for updating immediate instructions nfp: bpf: add verification and codegen for map lookups nfp: bpf: add support for reading map memory nfp: bpf: implement bpf map offload bpf: offload: make bpf_offload_dev_match() reject host+host case bpf: annotate bpf_insn_print_t with __printf nfp: bpf: print map lookup problems into verifier log Jesper Dangaard Brouer (4): bpf: simplify xdp_convert_ctx_access for xdp_rxq_info libbpf: install the header file libbpf.h libbpf: cleanup Makefile, remove unused elements libbpf: Makefile set specified permission mode Jiong Wang (1): tools: bpftool: add -DPACKAGE when including bfd.h Luis de Bethencourt (1): samples/bpf: Fix trailing semicolon Masami Hiramatsu (5): tracing/kprobe: bpf: Check error injectable event is on function entry tracing/kprobe: bpf: Compare instruction pointer with original one error-injection: Separate error-injection from kprobe error-injection: Add injectable error types error-injection: Support fault injection framework Quentin Monnet (2): libbpf: fix string comparison for guessing eBPF program type nfp: bpf: reject program on instructions unknown to the JIT compiler Roman Gushchin (1): bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs Wei Yongjun (1): bpf: cpumap: make some functions static Documentation/fault-injection/fault-injection.txt |
Re: [PATCH 32/32] aio: implement io_pgetevents
Hi, Christoph, Christoph Hellwigwrites: > On Mon, Jan 15, 2018 at 09:53:10AM +0100, Christoph Hellwig wrote: >> > pselect, as an example, crams the sigmask and size together. Why not >> > just do that? libaio can take care of setting that up. >> >> Yes, I could try that. It's just another double indirection for no >> good reason. > > I cna't get this to work - for some reason I always end up with a NULL > sigmask in the kernel. Nevermind that it leads to really crap code > generation. I guess for select the latter doesn't matter too much as > everyone sane uses ppoll anyway. I'd be willing to bet the issue is in your io_syscall6 implementation. You pass in arg5 where arg6 should be used. Don't feel bad, it took me the better part of today to figure that out. :) Here's an incremental diff on top of what you've posted. Feel free to fold it into your patch (and format however you like). You can find the libaio changes in my 'aio-poll' branch: https://pagure.io/libaio/commits/aio-poll These changes were run through the libaio test harness, 64 bit and 32 bit, so the compat system call was tested. Cheers, Jeff diff --git a/fs/aio.c b/fs/aio.c index 57a4e8d..c6d67d0 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1991,9 +1991,11 @@ static long do_io_getevents(aio_context_t ctx_id, long, nr, struct io_event __user *, events, struct timespec __user *, timeout, - const sigset_t __user *, sigmask) + void __user *, sigmask) { sigset_tksigmask, sigsaved; + size_t sigsetsize = 0; + sigset_t __user *up = NULL; struct timespec64 ts; int ret; @@ -2001,8 +2003,18 @@ static long do_io_getevents(aio_context_t ctx_id, return -EFAULT; if (sigmask) { - if (copy_from_user(, sigmask, sizeof(ksigmask))) + if (!access_ok(VERIFY_READ, sigmask, + sizeof(void *) + sizeof(size_t)) || + __get_user(up, (sigset_t __user * __user *)sigmask) || + __get_user(sigsetsize, + (size_t __user *)(sigmask + sizeof(void * return -EFAULT; + + if (sigsetsize != sizeof(sigset_t)) + return -EINVAL; + if (copy_from_user(, up, sizeof(ksigmask))) + return -EFAULT; + sigdelsetmask(, sigmask(SIGKILL) | sigmask(SIGSTOP)); sigprocmask(SIG_SETMASK, , ); } @@ -2049,8 +2061,11 @@ static long do_io_getevents(aio_context_t ctx_id, compat_long_t, nr, struct io_event __user *, events, struct compat_timespec __user *, timeout, - const compat_sigset_t __user *, sigmask) + void __user *, sig) { + compat_size_t sigsetsize = 0; + compat_sigset_t __user *sigmask; + compat_uptr_t up = 0; sigset_t ksigmask, sigsaved; struct timespec64 t; int ret; @@ -2058,8 +2073,17 @@ static long do_io_getevents(aio_context_t ctx_id, if (timeout && compat_get_timespec64(, timeout)) return -EFAULT; - if (sigmask) { - if (get_compat_sigset(, sigmask)) + if (sig) { + if (!access_ok(VERIFY_READ, sig, + sizeof(compat_uptr_t) + sizeof(compat_size_t)) || + __get_user(up, (compat_uptr_t __user *)sig) || + __get_user(sigsetsize, + (compat_size_t __user *)(sig + sizeof(up + return -EFAULT; + + if (sigsetsize != sizeof(compat_sigset_t)) + return -EINVAL; + if (get_compat_sigset(, compat_ptr(up))) return -EFAULT; sigdelsetmask(, sigmask(SIGKILL) | sigmask(SIGSTOP)); sigprocmask(SIG_SETMASK, , ); diff --git a/include/linux/compat.h b/include/linux/compat.h index a4cda98..32412f8 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -541,7 +541,7 @@ asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id, compat_long_t nr, struct io_event __user *events, struct compat_timespec __user *timeout, - const compat_sigset_t __user *sigmask); + void __user *sigmask); asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr, u32 __user *iocb); asmlinkage long compat_sys_mount(const char __user *dev_name, diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 3bc9a13..bc79026 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -544,7 +544,7
Re: [PATCH bpf-next 0/3] bpf: add dumping and disassembler for non-host JITs
On Tue, 16 Jan 2018 16:05:18 -0800, Jakub Kicinski wrote: > Hi, Ah, forgot to insert "Jiong says:" here :) > Currently bpftool could disassemble host jited image, for example x86_64, > using libbfd. However it couldn't disassemble offload jited image. > > There are two reasons: > > 1. bpf_obj_get_info_by_fd/struct bpf_prog_info couldn't get the address > of jited image and image's length. > > 2. Even after issue 1 resolved, bpftool couldn't figure out what is the > offload arch from bpf_prog_info, therefore can't drive libbfd > disassembler correctly. > > This patch set resolve issue 1 by introducing two new fields "jited_len" > and "jited_image" in bpf_dev_offload. These two fields serve as the generic > interface to communicate the jited image address and length for all offload > targets to higher level caller. For example, bpf_obj_get_info_by_fd could > use them to fill the userspace visible fields jited_prog_len and > jited_prog_insns. > > This patch set resolve issue 2 by getting bfd backend name through > "ifindex", i.e network interface index. > > v1: > - Deduct bfd arch name through ifindex, i.e network interface index. >First, map ifindex to devname through ifindex_to_name_ns, then get >pci id through /sys/class/dev/DEVNAME/device/vendor. (Daniel, Alexei)
Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors
On 18-01-16 06:58 PM, David Ahern wrote: On 1/16/18 9:20 AM, Alexander Aring wrote: } if (n->nlmsg_type != RTM_NEWTFILTER || !(n->nlmsg_flags & NLM_F_CREATE)) { + NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and NLM_F_CREATE to create a new filter"); that does not seem the right message. tc_ctl_tfilter is overloaded for new, delete and get so the response here needs to reflect that. I believe in this case the user did not specify a valid chain. Are you sure you are looking at the correct code? It is a create message that is at stake here. A create has to have RTM_NEWTFILTER and NLM_F_CREATE Also, the messages are targeted at users not developers, so no code jargon / API references. Generally true, but should this rule really be scripture? The main user here is tc in user space and it doesnt make mistakes in this case i.e we will never see this error with tc because a create will always have those two set correctly; OTOH, a developer writing some new app is more likely to make this mistake (in which case this message is very helpful). cheers, jamal
Re: [PATCH net-next 0/8] net: sched: cls: add extack support
Hey David, and others, [+Alexei] On 01/17/2018 12:27 AM, Jamal Hadi Salim wrote: > On 18-01-16 05:41 PM, Jakub Kicinski wrote: >> On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote: >>> On 18-01-16 04:46 PM, Jakub Kicinski wrote: On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote: >>> >>> [..] >>> >>> I would say precedence should be Jiri's patches, Alex's patches >>> and then yours: >>> Alex's patches fix the core (cls_api.c) area with proper extack >>> for the core and then he has one patch to cover a specific >>> use case of the u32 classifier extack. Yours is only concerned >>> with one use case - bpf which depend on the core (that is in Alex's >>> patches) >> >> Our patches are concerned with propagating the extack to drivers, >> and nfp (and netdevsim) make use of it. >> >> I'm miffed by the fact that you jumped out with this conflicting series >> *after* we posted ours, and we got shot down on white space. So I've been looking over Quentin's series just now that sits in my bucket and it looks fine to me, but merge with this one would probably end up badly for David. Therefore I'm proposing the following that should hopefully be fine and work out for Alexander and Jakub/Quentin as a consensus: I'm getting the current bpf-next stuff as PR out in a few minutes, so David can pull this in and therefore net-next will also have the dependency on nfp for Quentin's series. Then, given this one here needs another respin anyway, I would suggest to combine the missing patches from Alexander's series, and get it all out in a single patch series directly for net-next w/o any interdependency hassle. Thanks, Daniel
[PATCH bpf-next 1/3] bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info
From: Jiong WangFor host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog used by all host JIT targets to get jited image and it's length. While for offload, targets are likely to have different offload mechanisms that these info are kept in device private data fields. Therefore, BPF_OBJ_GET_INFO_BY_FD syscall needs an unified way to get JIT length and contents info for offload targets. One way is to introduce new callback to parse device private data then fill those fields in bpf_prog_info. This might be a little heavy, the other way is to add generic fields which will be initialized by all offload targets. This patch follow the second approach to introduce two new fields in struct bpf_dev_offload and teach bpf_prog_get_info_by_fd about them to fill correct jited_prog_len and jited_prog_insns in bpf_prog_info. Reviewed-by: Jakub Kicinski Signed-off-by: Jiong Wang --- include/linux/bpf.h | 2 ++ kernel/bpf/offload.c | 23 +++ kernel/bpf/syscall.c | 31 ++- 3 files changed, 43 insertions(+), 13 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5c2c104dc2c5..025b1c2f8053 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -234,6 +234,8 @@ struct bpf_prog_offload { struct list_headoffloads; booldev_state; const struct bpf_prog_offload_ops *dev_ops; + void*jited_image; + u32 jited_len; }; struct bpf_prog_aux { diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index a88cebf368bf..6c0baa1cf8f8 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -230,9 +230,12 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info, .prog = prog, .info = info, }; + struct bpf_prog_aux *aux = prog->aux; struct inode *ns_inode; struct path ns_path; + char __user *uinsns; void *res; + u32 ulen; res = ns_get_path_cb(_path, bpf_prog_offload_info_fill_ns, ); if (IS_ERR(res)) { @@ -241,6 +244,26 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info, return PTR_ERR(res); } + down_read(_devs_lock); + + if (!aux->offload) { + up_read(_devs_lock); + return -ENODEV; + } + + ulen = info->jited_prog_len; + info->jited_prog_len = aux->offload->jited_len; + if (info->jited_prog_len & ulen) { + uinsns = u64_to_user_ptr(info->jited_prog_insns); + ulen = min_t(u32, info->jited_prog_len, ulen); + if (copy_to_user(uinsns, aux->offload->jited_image, ulen)) { + up_read(_devs_lock); + return -EFAULT; + } + } + + up_read(_devs_lock); + ns_inode = ns_path.dentry->d_inode; info->netns_dev = new_encode_dev(ns_inode->i_sb->s_dev); info->netns_ino = ns_inode->i_ino; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c691b9e972e3..c28524483bf4 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1724,19 +1724,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, goto done; } - ulen = info.jited_prog_len; - info.jited_prog_len = prog->jited_len; - if (info.jited_prog_len && ulen) { - if (bpf_dump_raw_ok()) { - uinsns = u64_to_user_ptr(info.jited_prog_insns); - ulen = min_t(u32, info.jited_prog_len, ulen); - if (copy_to_user(uinsns, prog->bpf_func, ulen)) - return -EFAULT; - } else { - info.jited_prog_insns = 0; - } - } - ulen = info.xlated_prog_len; info.xlated_prog_len = bpf_prog_insn_size(prog); if (info.xlated_prog_len && ulen) { @@ -1762,6 +1749,24 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, err = bpf_prog_offload_info_fill(, prog); if (err) return err; + goto done; + } + + /* NOTE: the following code is supposed to be skipped for offload. +* bpf_prog_offload_info_fill() is the place to fill similar fields +* for offload. +*/ + ulen = info.jited_prog_len; + info.jited_prog_len = prog->jited_len; + if (info.jited_prog_len && ulen) { + if (bpf_dump_raw_ok()) { + uinsns = u64_to_user_ptr(info.jited_prog_insns); + ulen = min_t(u32, info.jited_prog_len, ulen); + if (copy_to_user(uinsns, prog->bpf_func, ulen)) + return -EFAULT; + } else { + info.jited_prog_insns = 0; +
[PATCH bpf-next 2/3] nfp: bpf: set new jit info fields
From: Jiong WangThis patch set those new jit info fields introduced in this patch set. Reviewed-by: Jakub Kicinski Signed-off-by: Jiong Wang --- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c index 9c78a09cda24..4c1cea68f19e 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c @@ -127,6 +127,7 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog) struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv; unsigned int stack_size; unsigned int max_instr; + int err; stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64; if (prog->aux->stack_depth > stack_size) { @@ -143,7 +144,14 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog) if (!nfp_prog->prog) return -ENOMEM; - return nfp_bpf_jit(nfp_prog); + err = nfp_bpf_jit(nfp_prog); + if (err) + return err; + + prog->aux->offload->jited_len = nfp_prog->prog_len * sizeof(u64); + prog->aux->offload->jited_image = nfp_prog->prog; + + return 0; } static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog) -- 2.15.1
[PATCH bpf-next 3/3] tools: bpftool: improve architecture detection by using ifindex
From: Jiong WangThe current architecture detection method in bpftool is designed for host case. For offload case, we can't use the architecture of "bpftool" itself. Instead, we could call the existing "ifindex_to_name_ns" to get DEVNAME, then read pci id from /sys/class/dev/DEVNAME/device/vendor, finally we map vendor id to bfd arch name which will finally be used to select bfd backend for the disassembler. Reviewed-by: Jakub Kicinski Signed-off-by: Jiong Wang --- tools/bpf/bpftool/common.c | 72 ++ tools/bpf/bpftool/jit_disasm.c | 16 +- tools/bpf/bpftool/main.h | 5 ++- tools/bpf/bpftool/prog.c | 12 ++- 4 files changed, 102 insertions(+), 3 deletions(-) diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index 6601c95a9258..0b482c0070e0 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -34,6 +34,7 @@ /* Author: Jakub Kicinski */ #include +#include #include #include #include @@ -433,6 +434,77 @@ ifindex_to_name_ns(__u32 ifindex, __u32 ns_dev, __u32 ns_ino, char *buf) return if_indextoname(ifindex, buf); } +static int read_sysfs_hex_int(char *path) +{ + char vendor_id_buf[8]; + int len; + int fd; + + fd = open(path, O_RDONLY); + if (fd < 0) { + p_err("Can't open %s: %s", path, strerror(errno)); + return -1; + } + + len = read(fd, vendor_id_buf, sizeof(vendor_id_buf)); + close(fd); + if (len < 0) { + p_err("Can't read %s: %s", path, strerror(errno)); + return -1; + } + if (len >= (int)sizeof(vendor_id_buf)) { + p_err("Value in %s too long", path); + return -1; + } + + vendor_id_buf[len] = 0; + + return strtol(vendor_id_buf, NULL, 0); +} + +static int read_sysfs_netdev_hex_int(char *devname, const char *entry_name) +{ + char full_path[64]; + + snprintf(full_path, sizeof(full_path), "/sys/class/net/%s/device/%s", +devname, entry_name); + + return read_sysfs_hex_int(full_path); +} + +const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino) +{ + char devname[IF_NAMESIZE]; + int vendor_id; + int device_id; + + if (!ifindex_to_name_ns(ifindex, ns_dev, ns_ino, devname)) { + p_err("Can't get net device name for ifindex %d: %s", ifindex, + strerror(errno)); + return NULL; + } + + vendor_id = read_sysfs_netdev_hex_int(devname, "vendor"); + if (vendor_id < 0) { + p_err("Can't get device vendor id for %s", devname); + return NULL; + } + + switch (vendor_id) { + case 0x19ee: + device_id = read_sysfs_netdev_hex_int(devname, "device"); + if (device_id != 0x4000 && + device_id != 0x6000 && + device_id != 0x6003) + p_info("Unknown NFP device ID, assuming it is NFP-6xxx arch"); + return "NFP-6xxx"; + default: + p_err("Can't get bfd arch name for device vendor id 0x%04x", + vendor_id); + return NULL; + } +} + void print_dev_plain(__u32 ifindex, __u64 ns_dev, __u64 ns_inode) { char name[IF_NAMESIZE]; diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c index 57d32e8a1391..87439320ef70 100644 --- a/tools/bpf/bpftool/jit_disasm.c +++ b/tools/bpf/bpftool/jit_disasm.c @@ -76,7 +76,8 @@ static int fprintf_json(void *out, const char *fmt, ...) return 0; } -void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes) +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, + const char *arch) { disassembler_ftype disassemble; struct disassemble_info info; @@ -100,6 +101,19 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes) else init_disassemble_info(, stdout, (fprintf_ftype) fprintf); + + /* Update architecture info for offload. */ + if (arch) { + const bfd_arch_info_type *inf = bfd_scan_arch(arch); + + if (inf) { + bfdf->arch_info = inf; + } else { + p_err("No libfd support for %s", arch); + return; + } + } + info.arch = bfd_get_arch(bfdf); info.mach = bfd_get_mach(bfdf); info.buffer = image; diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 65b526fe6e7e..b8e9584d6246 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -121,7 +121,10 @@ int do_cgroup(int argc, char **arg); int
[PATCH bpf-next 0/3] bpf: add dumping and disassembler for non-host JITs
Hi, Currently bpftool could disassemble host jited image, for example x86_64, using libbfd. However it couldn't disassemble offload jited image. There are two reasons: 1. bpf_obj_get_info_by_fd/struct bpf_prog_info couldn't get the address of jited image and image's length. 2. Even after issue 1 resolved, bpftool couldn't figure out what is the offload arch from bpf_prog_info, therefore can't drive libbfd disassembler correctly. This patch set resolve issue 1 by introducing two new fields "jited_len" and "jited_image" in bpf_dev_offload. These two fields serve as the generic interface to communicate the jited image address and length for all offload targets to higher level caller. For example, bpf_obj_get_info_by_fd could use them to fill the userspace visible fields jited_prog_len and jited_prog_insns. This patch set resolve issue 2 by getting bfd backend name through "ifindex", i.e network interface index. v1: - Deduct bfd arch name through ifindex, i.e network interface index. First, map ifindex to devname through ifindex_to_name_ns, then get pci id through /sys/class/dev/DEVNAME/device/vendor. (Daniel, Alexei) Jiong Wang (3): bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info nfp: bpf: set new jit info fields tools: bpftool: improve architecture detection by using ifindex drivers/net/ethernet/netronome/nfp/bpf/offload.c | 10 +++- include/linux/bpf.h | 2 + kernel/bpf/offload.c | 23 kernel/bpf/syscall.c | 31 +- tools/bpf/bpftool/common.c | 72 tools/bpf/bpftool/jit_disasm.c | 16 +- tools/bpf/bpftool/main.h | 5 +- tools/bpf/bpftool/prog.c | 12 +++- 8 files changed, 154 insertions(+), 17 deletions(-) -- 2.15.1
Re: [patch net-next v10 02/13] net: sched: introduce shared filter blocks infrastructure
On Tue, Jan 16, 2018 at 7:33 AM, Jiri Pirkowrote: > static int __init tc_filter_init(void) > { > + int err; > + > tc_filter_wq = alloc_ordered_workqueue("tc_filter_workqueue", 0); > if (!tc_filter_wq) > return -ENOMEM; > > + err = register_pernet_subsys(_net_ops); > + if (err) > + return err; Need to destroy the above workqueue on error.
Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors
On 1/16/18 9:20 AM, Alexander Aring wrote: > This patch adds extack support for generic cls handling. The extack > will be set deeper to each called function which is not part of netdev > core api. > > Cc: David Ahern> Signed-off-by: Alexander Aring > --- > net/sched/cls_api.c | 55 > + > 1 file changed, 43 insertions(+), 12 deletions(-) > > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c > index 01d09055707d..c25a9b4bcb4b 100644 > --- a/net/sched/cls_api.c > +++ b/net/sched/cls_api.c > @@ -122,7 +122,8 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp) > > static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, > u32 prio, u32 parent, struct Qdisc *q, > - struct tcf_chain *chain) > + struct tcf_chain *chain, > + struct netlink_ext_ack *extack) > { > struct tcf_proto *tp; > int err; > @@ -148,6 +149,7 @@ static struct tcf_proto *tcf_proto_create(const char > *kind, u32 protocol, > module_put(tp->ops->owner); > err = -EAGAIN; > } else { > + NL_SET_ERR_MSG(extack, "TC classifier not found"); > err = -ENOENT; > } > goto errout; > @@ -662,7 +664,8 @@ static int tfilter_notify(struct net *net, struct sk_buff > *oskb, > static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, > struct nlmsghdr *n, struct tcf_proto *tp, > struct Qdisc *q, u32 parent, > - void *fh, bool unicast, bool *last) > + void *fh, bool unicast, bool *last, > + struct netlink_ext_ack *extack) > { > struct sk_buff *skb; > u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; > @@ -674,6 +677,7 @@ static int tfilter_del_notify(struct net *net, struct > sk_buff *oskb, > > if (tcf_fill_node(net, skb, tp, q, parent, fh, portid, n->nlmsg_seq, > n->nlmsg_flags, RTM_DELTFILTER) <= 0) { > + NL_SET_ERR_MSG(extack, "Failed to build del event > notification"); > kfree_skb(skb); > return -EINVAL; > } > @@ -687,8 +691,11 @@ static int tfilter_del_notify(struct net *net, struct > sk_buff *oskb, > if (unicast) > return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT); > > - return rtnetlink_send(skb, net, portid, RTNLGRP_TC, > - n->nlmsg_flags & NLM_F_ECHO); > + err = rtnetlink_send(skb, net, portid, RTNLGRP_TC, > + n->nlmsg_flags & NLM_F_ECHO); > + if (err < 0) > + NL_SET_ERR_MSG(extack, "Failed to send filter delete > notification"); not sure we want to do this -- extack for internal failures like this one or below in tc_ctl_tfilter. > + return err; > } > > static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, > @@ -749,8 +756,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct > nlmsghdr *n, > if (prio == 0) { > switch (n->nlmsg_type) { > case RTM_DELTFILTER: > - if (protocol || t->tcm_handle || tca[TCA_KIND]) > + if (protocol || t->tcm_handle || tca[TCA_KIND]) { > + NL_SET_ERR_MSG(extack, "Cannot flush filters > with protocol, handle or kind set"); > return -ENOENT; > + } > break; > case RTM_NEWTFILTER: > /* If no priority is provided by the user, > @@ -763,6 +772,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct > nlmsghdr *n, > } > /* fall-through */ > default: > + NL_SET_ERR_MSG(extack, "Invalid filter command with > priority of zero"); > return -ENOENT; > } > } > @@ -780,23 +790,31 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct > nlmsghdr *n, > parent = q->handle; > } else { > q = qdisc_lookup(dev, TC_H_MAJ(t->tcm_parent)); > - if (!q) > + if (!q) { > + NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists"); Messages should avoid contractions; spell out 'does not'. Please check all of the patches. Also, it should be 'exist' (no 's' on the end). > return -EINVAL; > + } > } > > /* Is it classful? */ > cops = q->ops->cl_ops; > - if (!cops) > + if (!cops) { > + NL_SET_ERR_MSG(extack, "Qdisc not classful"); > return -EINVAL; > + } > > - if
[PATCH bpf-next 1/6] bpf: offload: make bpf_offload_dev_match() reject host+host case
Daniel suggests it would be more logical for bpf_offload_dev_match() to return false is either the program or the map are not offloaded, rather than treating the both not offloaded case as a "matching CPU/host device". This makes no functional difference today, since verifier only calls bpf_offload_dev_match() when one of the objects is offloaded. Signed-off-by: Jakub Kicinski--- kernel/bpf/offload.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 453785fa1881..a88cebf368bf 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -395,10 +395,8 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map) struct bpf_prog_offload *offload; bool ret; - if (!!bpf_prog_is_dev_bound(prog->aux) != !!bpf_map_is_dev_bound(map)) + if (!bpf_prog_is_dev_bound(prog->aux) || !bpf_map_is_dev_bound(map)) return false; - if (!bpf_prog_is_dev_bound(prog->aux)) - return true; down_read(_devs_lock); offload = prog->aux->offload; -- 2.15.1
[PATCH bpf-next 0/6] bpf: various fixes and improvements
Hi! This series combines a number of random improvements ranging from libbpf to nfp driver. NFP patches make better use of the verifier log. There is a requested adjustment to the map offload code, and a warning fix for a W=1 build to the disassembler. Quentin also fixes the libbpf program type detection, while Jiong allows the use of libbfd compiled from source. Jakub Kicinski (3): bpf: offload: make bpf_offload_dev_match() reject host+host case bpf: annotate bpf_insn_print_t with __printf nfp: bpf: print map lookup problems into verifier log Jiong Wang (1): tools: bpftool: add -DPACKAGE when including bfd.h Quentin Monnet (2): libbpf: fix string comparison for guessing eBPF program type nfp: bpf: reject program on instructions unknown to the JIT compiler drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 + drivers/net/ethernet/netronome/nfp/bpf/main.h | 1 + drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 20 ++-- kernel/bpf/disasm.h | 4 ++-- kernel/bpf/offload.c | 4 +--- tools/bpf/bpftool/Makefile| 2 +- tools/build/feature/Makefile | 2 +- tools/lib/bpf/libbpf.c| 2 +- 8 files changed, 26 insertions(+), 14 deletions(-) -- 2.15.1
[PATCH bpf-next 2/6] bpf: annotate bpf_insn_print_t with __printf
Functions of type bpf_insn_print_t take printf-like format string, mark the type accordingly. Signed-off-by: Jakub KicinskiReviewed-by: Quentin Monnet --- kernel/bpf/disasm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/disasm.h b/kernel/bpf/disasm.h index e0857d016f89..266fe8ee542b 100644 --- a/kernel/bpf/disasm.h +++ b/kernel/bpf/disasm.h @@ -29,8 +29,8 @@ extern const char *const bpf_class_string[8]; const char *func_id_name(int id); -typedef void (*bpf_insn_print_t)(struct bpf_verifier_env *env, -const char *, ...); +typedef __printf(2, 3) void (*bpf_insn_print_t)(struct bpf_verifier_env *env, + const char *, ...); typedef const char *(*bpf_insn_revmap_call_t)(void *private_data, const struct bpf_insn *insn); typedef const char *(*bpf_insn_print_imm_t)(void *private_data, -- 2.15.1
[PATCH bpf-next 6/6] nfp: bpf: reject program on instructions unknown to the JIT compiler
From: Quentin MonnetIf an eBPF instruction is unknown to the driver JIT compiler, we can reject the program at verification time. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski Reviewed-by: Jiong Wang --- drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 + drivers/net/ethernet/netronome/nfp/bpf/main.h | 1 + drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 6 ++ 3 files changed, 12 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c index cdc949fabe98..56451edf01c2 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c @@ -2907,6 +2907,11 @@ void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt) } } +bool nfp_bpf_supported_opcode(u8 code) +{ + return !!instr_cb[code]; +} + void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv) { unsigned int i; diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h index 80855d43b25e..424fe8338105 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h @@ -324,6 +324,7 @@ struct nfp_bpf_vnic { void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt); int nfp_bpf_jit(struct nfp_prog *prog); +bool nfp_bpf_supported_opcode(u8 code); extern const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops; diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c index 81dab462456c..479f602887e9 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c @@ -290,6 +290,12 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx) meta = nfp_bpf_goto_meta(nfp_prog, meta, insn_idx, env->prog->len); nfp_prog->verifier_meta = meta; + if (!nfp_bpf_supported_opcode(meta->insn.code)) { + pr_vlog(env, "instruction %#02x not supported\n", + meta->insn.code); + return -EINVAL; + } + if (meta->insn.src_reg >= MAX_BPF_REG || meta->insn.dst_reg >= MAX_BPF_REG) { pr_vlog(env, "program uses extended registers - jit hardening?\n"); -- 2.15.1
[PATCH bpf-next 3/6] tools: bpftool: add -DPACKAGE when including bfd.h
From: Jiong Wangbfd.h is requiring including of config.h except when PACKAGE or PACKAGE_VERSION are defined. /* PR 14072: Ensure that config.h is included first. */ #if !defined PACKAGE && !defined PACKAGE_VERSION #error config.h must be included before this header #endif This check has been introduced since May-2012. It doesn't show up in bfd.h on some Linux distribution, probably because distributions have remove it when building the package. However, sometimes the user might just build libfd from source code then link bpftool against it. For this case, bfd.h will be original that we need to define PACKAGE or PACKAGE_VERSION. Acked-by: Jakub Kicinski Signed-off-by: Jiong Wang --- tools/bpf/bpftool/Makefile | 2 +- tools/build/feature/Makefile | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile index 2237bc43f71c..26901ec87361 100644 --- a/tools/bpf/bpftool/Makefile +++ b/tools/bpf/bpftool/Makefile @@ -39,7 +39,7 @@ CC = gcc CFLAGS += -O2 CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow -CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/ +CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/ CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"' LIBS = -lelf -lbfd -lopcodes $(LIBBPF) diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 17f2c73fff8b..bc715f6ac320 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -190,7 +190,7 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS) $(BUILD) -DPACKAGE='"perf"' -lbfd -lz -liberty -ldl $(OUTPUT)test-disassembler-four-args.bin: - $(BUILD) -lbfd -lopcodes + $(BUILD) -DPACKAGE='"perf"' -lbfd -lopcodes $(OUTPUT)test-liberty.bin: $(CC) $(CFLAGS) -Wall -Werror -o $@ test-libbfd.c -DPACKAGE='"perf"' $(LDFLAGS) -lbfd -ldl -liberty -- 2.15.1
[PATCH bpf-next 5/6] nfp: bpf: print map lookup problems into verifier log
Use the verifier log to output error messages if map lookup can't be offloaded. Signed-off-by: Jakub KicinskiAcked-by: Quentin Monnet --- drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c index 741438896cc7..81dab462456c 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c @@ -132,22 +132,24 @@ nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env, case BPF_FUNC_map_lookup_elem: if (!bpf->helpers.map_lookup) { - pr_info("map_lookup: not supported by FW\n"); + pr_vlog(env, "map_lookup: not supported by FW\n"); return -EOPNOTSUPP; } if (reg2->type != PTR_TO_STACK) { - pr_info("map_lookup: unsupported key ptr type %d\n", + pr_vlog(env, + "map_lookup: unsupported key ptr type %d\n", reg2->type); return -EOPNOTSUPP; } if (!tnum_is_const(reg2->var_off)) { - pr_info("map_lookup: variable key pointer\n"); + pr_vlog(env, "map_lookup: variable key pointer\n"); return -EOPNOTSUPP; } off = reg2->var_off.value + reg2->off; if (-off % 4) { - pr_info("map_lookup: unaligned stack pointer %lld\n", + pr_vlog(env, + "map_lookup: unaligned stack pointer %lld\n", -off); return -EOPNOTSUPP; } @@ -160,7 +162,7 @@ nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env, meta->arg2_var_off |= off != old_off; if (meta->arg1.map_ptr != reg1->map_ptr) { - pr_info("map_lookup: called for different map\n"); + pr_vlog(env, "map_lookup: called for different map\n"); return -EOPNOTSUPP; } break; @@ -263,7 +265,7 @@ nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, if (reg->type == PTR_TO_MAP_VALUE) { if (is_mbpf_store(meta)) { - pr_info("map writes not supported\n"); + pr_vlog(env, "map writes not supported\n"); return -EOPNOTSUPP; } } -- 2.15.1
[PATCH bpf-next 4/6] libbpf: fix string comparison for guessing eBPF program type
From: Quentin Monnetlibbpf is able to deduce the type of a program from the name of the ELF section in which it is located. However, the comparison is made on the first n characters, n being determined with sizeof() applied to the reference string (e.g. "xdp"). When such section names are supposed to receive a suffix separated with a slash (e.g. "kprobe/"), using sizeof() takes the final NUL character of the reference string into account, which implies that both strings must be equal. Instead, the desired behaviour would consist in taking the length of the string, *without* accounting for the ending NUL character, and to make sure the reference string is a prefix to the ELF section name. Subtract 1 to the total size of the string for obtaining the length for the comparison. Fixes: 583c90097f72 ("libbpf: add ability to guess program type based on section name") Signed-off-by: Quentin Monnet Acked-by: Jakub Kicinski --- tools/lib/bpf/libbpf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e9c4b7cabcf2..30c776375118 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -1803,7 +1803,7 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT); BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP); BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT); -#define BPF_PROG_SEC(string, type) { string, sizeof(string), type } +#define BPF_PROG_SEC(string, type) { string, sizeof(string) - 1, type } static const struct { const char *sec; size_t len; -- 2.15.1
[PATCH net-next] ipv6: mcast: remove dead code
From: Eric DumazetSince commit 41033f029e39 ("snmp: Remove duplicate OUTMCAST stat increment") one line of code became unneeded. Signed-off-by: Eric Dumazet --- Â net/ipv6/mcast.c |2 -- Â 1 file changed, 2 deletions(-) diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 40b223a930a39e010ac744bc3b4b32b28e9bc5e8..6a5d0e39bb87f98bef7de90ab2fa63d9666c00ce 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1655,8 +1655,6 @@ static void mld_sendpack(struct sk_buff *skb) if (err) goto err_out; - payload_len = skb->len; - err = NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, net->ipv6.igmp_sk, skb, NULL, skb->dev, dst_output);
Re: [patch net-next v10 00/13] net: sched: allow qdiscs to share filter block instances
On 1/16/18 7:33 AM, Jiri Pirko wrote: > From: Jiri Pirko> > Currently the filters added to qdiscs are independent. So for example if you > have 2 netdevices and you create ingress qdisc on both and you want to add > identical filter rules both, you need to add them twice. This patchset > makes this easier and mainly saves resources allowing to share all filters > within a qdisc - I call it a "filter block". Also this helps to save > resources when we do offload to hw for example to expensive TCAM. > > So back to the example. First, we create 2 qdiscs. Both will share > block number 22. "22" is just an identification: > $ tc qdisc add dev ens7 ingress_block 22 ingress > > $ tc qdisc add dev ens8 ingress_block 22 ingress > > > If we don't specify "block" command line option, no shared block would > be created: > $ tc qdisc add dev ens9 ingress > > Now if we list the qdiscs, we will see the block index in the output: > > $ tc qdisc > qdisc ingress : dev ens7 parent :fff1 ingress_block 22 > qdisc ingress : dev ens8 parent :fff1 ingress_block 22 > qdisc ingress : dev ens9 parent :fff1 > > > To make is more visual, the situation looks like this: > >ens7 ingress qdisc ens7 ingress qdisc > | | > | | > +--> block 22 <--+ > > Unlimited number of qdiscs may share the same block. > > Note that this patchset introduces block sharing support also for clsact > qdisc: > $ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact > $ tc qdisc show dev ens10 > qdisc clsact : dev ens10 parent :fff1 ingress_block 23 egress_block 24 > > > We can add filter using the block index: > > $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 > action drop > > > Note we cannot use the qdisc for filter manipulations of shared blocks: > > $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip > 192.168.100.2 action drop > Error: This filter block is shared. Please use the block index to manipulate > the filters. > > > We will see the same output if we list filters for ingress qdisc of > ens7 and ens8, also for the block 22: > > $ tc filter show block 22 > filter block 22 protocol ip pref 25 flower chain 0 > filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 > ... > > $ tc filter show dev ens7 ingress > filter block 22 protocol ip pref 25 flower chain 0 > filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 > ... > > $ tc filter show dev ens8 ingress > filter block 22 protocol ip pref 25 flower chain 0 > filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 > ... > API LGTM. Acked-by: David Ahern
Re: [patch net-next v10 00/13] net: sched: allow qdiscs to share filter block instances
On 18-01-16 10:33 AM, Jiri Pirko wrote: From: Jiri PirkoFor patches 1-9: Reviewed-by: Jamal Hadi Salim Acked-by: Jamal Hadi Salim cheers, jamal
Re: [PATCH net-next 0/8] net: sched: cls: add extack support
On 18-01-16 05:41 PM, Jakub Kicinski wrote: On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote: On 18-01-16 04:46 PM, Jakub Kicinski wrote: On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote: [..] I would say precedence should be Jiri's patches, Alex's patches and then yours: Alex's patches fix the core (cls_api.c) area with proper extack for the core and then he has one patch to cover a specific use case of the u32 classifier extack. Yours is only concerned with one use case - bpf which depend on the core (that is in Alex's patches) Our patches are concerned with propagating the extack to drivers, and nfp (and netdevsim) make use of it. I'm miffed by the fact that you jumped out with this conflicting series *after* we posted ours, and we got shot down on white space. I totally empathize with the general frustration. The general rule is we fix the core first then add users (classifiers in this case). Note: Alex has a _lot_ of patches that he has been trying to send for the last little while and this one is certainly not a new set (I actually had reviewed this set). There are others. And the rule of "fix core first then add users" has been imposed on him as well. cheers, jamal
[PATCH net-next] net: stmmac: Fix reception of Broadcom switches tags
Broadcom tags inserted by Broadcom switches put a 4 byte header after the MAC SA and before the EtherType, which may look like some sort of 0 length LLC/SNAP packet (tcpdump and wireshark do think that way). With ACS enabled in stmmac the packets were truncated to 8 bytes on reception, whereas clearing this bit allowed normal reception to occur. In order to make that possible, we need to pass a net_device argument to the different core_init() functions and we are dependent on the Broadcom tagger padding packets correctly (which it now does). To be as little invasive as possible, this is only done for gmac1000 when the network device is DSA-enabled (netdev_uses_dsa() returns true). Signed-off-by: Florian Fainelli--- drivers/net/ethernet/stmicro/stmmac/common.h | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c| 3 ++- drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 12 +++- drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 3 ++- drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c| 11 ++- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c| 2 +- 6 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index ce2ea2d491ac..2ffe76c0ff74 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -474,7 +474,7 @@ struct mac_device_info; /* Helpers to program the MAC core */ struct stmmac_ops { /* MAC core initialization */ - void (*core_init)(struct mac_device_info *hw, int mtu); + void (*core_init)(struct mac_device_info *hw, struct net_device *dev); /* Enable the MAC RX/TX */ void (*set_mac)(void __iomem *ioaddr, bool enable); /* Enable and verify that the IPC module is supported */ diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c index 9eb7f65d8000..a3fa65b1ca8e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c @@ -483,7 +483,8 @@ static int sun8i_dwmac_init(struct platform_device *pdev, void *priv) return 0; } -static void sun8i_dwmac_core_init(struct mac_device_info *hw, int mtu) +static void sun8i_dwmac_core_init(struct mac_device_info *hw, + struct net_device *dev) { void __iomem *ioaddr = hw->pcsr; u32 v; diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index 8a86340ff2d3..540d21786a43 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -25,18 +25,28 @@ #include #include #include +#include #include #include "stmmac_pcs.h" #include "dwmac1000.h" -static void dwmac1000_core_init(struct mac_device_info *hw, int mtu) +static void dwmac1000_core_init(struct mac_device_info *hw, + struct net_device *dev) { void __iomem *ioaddr = hw->pcsr; u32 value = readl(ioaddr + GMAC_CONTROL); + int mtu = dev->mtu; /* Configure GMAC core */ value |= GMAC_CORE_INIT; + /* Clear ACS bit because Ethernet switch tagging formats such as +* Broadcom tags can look like invalid LLC/SNAP packets and cause the +* hardware to truncate packets on reception. +*/ + if (netdev_uses_dsa(dev)) + value &= ~GMAC_CONTROL_ACS; + if (mtu > 1500) value |= GMAC_CONTROL_2K; if (mtu > 2000) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c index 8ef517356313..c1ee427c42cb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c @@ -28,7 +28,8 @@ #include #include "dwmac100.h" -static void dwmac100_core_init(struct mac_device_info *hw, int mtu) +static void dwmac100_core_init(struct mac_device_info *hw, + struct net_device *dev) { void __iomem *ioaddr = hw->pcsr; u32 value = readl(ioaddr + MAC_CONTROL); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c index f3ed8f7853eb..6af5100d3cb2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c @@ -20,13 +20,22 @@ #include "stmmac_pcs.h" #include "dwmac4.h" -static void dwmac4_core_init(struct mac_device_info *hw, int mtu) +static void dwmac4_core_init(struct mac_device_info *hw, +struct net_device *dev) { void __iomem *ioaddr = hw->pcsr; u32 value = readl(ioaddr + GMAC_CONFIG); + int mtu = dev->mtu; value |= GMAC_CORE_INIT; + /* Clear ACS
Re: [PATCH] samples/bpf: Fix trailing semicolon
On 01/16/2018 03:15 PM, Luis de Bethencourt wrote: > The trailing semicolon is an empty statement that does no operation. > Removing it since it doesn't do anything. > > Signed-off-by: Luis de BethencourtApplied to bpf-next, thanks Luis!
[bpf-next PATCH 0/3] libbpf: cleanups to Makefile
This patchset contains some small improvements and cleanup for the Makefile in tools/lib/bpf/. It worries me that the libbpf.so shared library is not versioned, but it not addressed in this patchset. --- Jesper Dangaard Brouer (3): libbpf: install the header file libbpf.h libbpf: cleanup Makefile, remove unused elements libbpf: Makefile set specified permission mode tools/lib/bpf/Makefile | 20 +--- 1 file changed, 5 insertions(+), 15 deletions(-) --
Re: [RFC bpf-next PATCH] bpf: add comments to BPF ld/ldx sizes
On 01/16/2018 12:31 PM, Jesper Dangaard Brouer wrote: > Doc BPF ld/ldx size defines, as it help me understand the code in filter.c. > > Signed-off-by: Jesper Dangaard Brouer> --- > 0 files changed > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 395d261948de..4729d9a002d4 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -17,7 +17,7 @@ > #define BPF_ALU640x07/* alu mode in double word width */ > > /* ld/ldx fields */ > -#define BPF_DW 0x18/* double word */ > +#define BPF_DW 0x18/* double word (64-bit) */ > #define BPF_XADD 0xc0/* exclusive add */ > > /* alu/jmp fields */ > diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h > index 18be90725ab0..ee97668bdadb 100644 > --- a/include/uapi/linux/bpf_common.h > +++ b/include/uapi/linux/bpf_common.h > @@ -15,9 +15,10 @@ > > /* ld/ldx fields */ > #define BPF_SIZE(code) ((code) & 0x18) > -#define BPF_W 0x00 > -#define BPF_H 0x08 > -#define BPF_B 0x10 > +#define BPF_W 0x00 /* 32-bit */ > +#define BPF_H 0x08 /* 16-bit */ > +#define BPF_B 0x10 /* 8-bit */ > +/* eBPF BPF_DW 0x1864-bit */ Hmm, I don't really mind, but we do have it documented in: Documentation/networking/filter.txt +942 Feels like if we put a comment only on BPF_{B,H,W}, then we might also want to document all the others such as ALU ops, etc. > #define BPF_MODE(code) ((code) & 0xe0) > #define BPF_IMM 0x00 > #define BPF_ABS 0x20 >
[bpf-next PATCH 1/3] libbpf: install the header file libbpf.h
It seems like an oversight not to install the header file for libbpf, given the libbpf.so + libbpf.a files are installed. Signed-off-by: Jesper Dangaard Brouer--- tools/lib/bpf/Makefile |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 8ed43ae9db9b..54370654c708 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -192,7 +192,8 @@ install_lib: all_cmd install_headers: $(call QUIET_INSTALL, headers) \ - $(call do_install,bpf.h,$(prefix)/include/bpf,644) + $(call do_install,bpf.h,$(prefix)/include/bpf,644); \ + $(call do_install,libbpf.h,$(prefix)/include/bpf,644); install: install_lib
[bpf-next PATCH 2/3] libbpf: cleanup Makefile, remove unused elements
The plugin_dir_SQ variable is not used, remove it. The function update_dir is also unused, remove it. The variable $VERSION_FILES is empty, remove it. These all originates from the introduction of the Makefile, and is likely a copy paste from tools/lib/traceevent/Makefile. Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature check") Signed-off-by: Jesper Dangaard Brouer--- tools/lib/bpf/Makefile | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 54370654c708..8e15e48cb8f8 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -93,7 +93,6 @@ export prefix libdir src obj # Shell quotes libdir_SQ = $(subst ','\'',$(libdir)) libdir_relative_SQ = $(subst ','\'',$(libdir_relative)) -plugin_dir_SQ = $(subst ','\'',$(plugin_dir)) LIB_FILE = libbpf.a libbpf.so @@ -150,7 +149,7 @@ CMD_TARGETS = $(LIB_FILE) TARGETS = $(CMD_TARGETS) -all: fixdep $(VERSION_FILES) all_cmd +all: fixdep all_cmd all_cmd: $(CMD_TARGETS) @@ -169,16 +168,6 @@ $(OUTPUT)libbpf.so: $(BPF_IN) $(OUTPUT)libbpf.a: $(BPF_IN) $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^ -define update_dir - (echo $1 > $@.tmp; \ - if [ -r $@ ] && cmp -s $@ $@.tmp; then \ - rm -f $@.tmp; \ - else\ - echo ' UPDATE $@'; \ - mv -f $@.tmp $@; \ - fi); -endef - define do_install if [ ! -d '$(DESTDIR_SQ)$2' ]; then \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \ @@ -204,7 +193,7 @@ config-clean: $(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null clean: - $(call QUIET_CLEAN, libbpf) $(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d .*.cmd \ + $(call QUIET_CLEAN, libbpf) $(RM) *.o *~ $(TARGETS) *.a *.so .*.d .*.cmd \ $(RM) LIBBPF-CFLAGS $(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP.libbpf
[bpf-next PATCH 3/3] libbpf: Makefile set specified permission mode
The third parameter to do_install was not used by $(INSTALL) command. Fix this by only setting the -m option when the third parameter is supplied. The use of a third parameter was introduced in commit eb54e522a000 ("bpf: install libbpf headers on 'make install'"). Without this change, the header files are install as executables files (755). Fixes: eb54e522a000 ("bpf: install libbpf headers on 'make install'") Signed-off-by: Jesper Dangaard Brouer--- tools/lib/bpf/Makefile |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 8e15e48cb8f8..83714ca1f22b 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -172,7 +172,7 @@ define do_install if [ ! -d '$(DESTDIR_SQ)$2' ]; then \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \ fi; \ - $(INSTALL) $1 '$(DESTDIR_SQ)$2' + $(INSTALL) $1 $(if $3,-m $3,) '$(DESTDIR_SQ)$2' endef install_lib: all_cmd
Re: [PATCH net-next 7/8] net: sched: cls: add extack support for tc_setup_cb_call
On Tue, Jan 16, 2018 at 9:20 AM, Alexander Aringwrote: > int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts, > -enum tc_setup_type type, void *type_data, bool err_stop) > +enum tc_setup_type type, void *type_data, bool err_stop, > +struct netlink_ext_ack *extack) > { > int ok_count; > int ret; > > ret = tcf_block_cb_call(block, type, type_data, err_stop); > - if (ret < 0) > + if (ret < 0) { > + NL_SET_ERR_MSG(extack, "Failed to inialize tcf block"); s/inialize/initialize/ > return ret; > + } > ok_count = ret; > > if (!exts) > return ok_count; > ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop); > - if (ret < 0) > + if (ret < 0) { > + NL_SET_ERR_MSG(extack, "Failed to inialize tcf block > extensions"); Ditto.
Re: [PATCH -next] bpf: cpumap: make some functions static
On 01/16/2018 12:27 PM, Wei Yongjun wrote: > Fixes the following sparse warnings: > > kernel/bpf/cpumap.c:146:6: warning: > symbol '__cpu_map_queue_destructor' was not declared. Should it be static? > kernel/bpf/cpumap.c:225:16: warning: > symbol 'cpu_map_build_skb' was not declared. Should it be static? > kernel/bpf/cpumap.c:340:26: warning: > symbol '__cpu_map_entry_alloc' was not declared. Should it be static? > kernel/bpf/cpumap.c:398:6: warning: > symbol '__cpu_map_entry_free' was not declared. Should it be static? > kernel/bpf/cpumap.c:441:6: warning: > symbol '__cpu_map_entry_replace' was not declared. Should it be static? > kernel/bpf/cpumap.c:454:5: warning: > symbol 'cpu_map_delete_elem' was not declared. Should it be static? > kernel/bpf/cpumap.c:467:5: warning: > symbol 'cpu_map_update_elem' was not declared. Should it be static? > kernel/bpf/cpumap.c:505:6: warning: > symbol 'cpu_map_free' was not declared. Should it be static? > > Signed-off-by: Wei YongjunApplied to bpf-next, thanks Wei!
Re: [PATCH bpf] bpf: reject stores into ctx via st and xadd
On Tue, Jan 16, 2018 at 11:30:10PM +0100, Daniel Borkmann wrote: > Alexei found that verifier does not reject stores into context > via BPF_ST instead of BPF_STX. And while looking at it, we > also should not allow XADD variant of BPF_STX. > > The context rewriter is only assuming either BPF_LDX_MEM- or > BPF_STX_MEM-type operations, thus reject anything other than > that so that assumptions in the rewriter properly hold. Add > test cases as well for BPF selftests. > > Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") > Reported-by: Alexei Starovoitov> Signed-off-by: Daniel Borkmann Applied, thank you Daniel. all bugs are eventually shallow. For this one we even had two broken testcases. Ouch.
Re: [PATCH net-next 2/8] net: sched: cls_api: handle generic cls errors
On Tue, Jan 16, 2018 at 9:20 AM, Alexander Aringwrote: > @@ -1117,8 +1146,10 @@ int tcf_exts_validate(struct net *net, struct > tcf_proto *tp, struct nlattr **tb, > } > #else > if ((exts->action && tb[exts->action]) || > - (exts->police && tb[exts->police])) > + (exts->police && tb[exts->police])) { > + NL_SET_ERR_MSG(extack, "Actions are not supported. Check > compile options"); > return -EOPNOTSUPP; > + } > #endif "Check compile options" is confusing, it is clearer if we can just say we need to enable CONFIG_NET_CLS_ACT here.
[PATCH iproute2-next] tc: red: allow setting th_min and th_max to the same value
Setting th_min and th_max to the same value may be useful for DCTCP deployments. The original DCTCP paper describes it as a simplest way of achieving simple ECN threshold marking. Indeed, there doesn't seem to be any simpler qdisc in Linux which would allow such a setup today. Signed-off-by: Jakub KicinskiReviewed-by: Dirk van der Merwe --- Or should I go ahead and add a DCTCP qdisc? :) tc/tc_red.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tc/tc_red.c b/tc/tc_red.c index 1f82ef1aec65..178fe088f732 100644 --- a/tc/tc_red.c +++ b/tc/tc_red.c @@ -30,7 +30,9 @@ int tc_red_eval_P(unsigned int qmin, unsigned int qmax, double prob) { int i = qmax - qmin; - if (i <= 0) + if (!i) + return 0; + if (i < 0) return -1; prob /= i; -- 2.15.1
Re: [PATCH net-next 8/8] net: sched: cls_u32: add extack support
On Tue, Jan 16, 2018 at 9:20 AM, Alexander Aringwrote: > - if (root_ht == ht) > + if (root_ht == ht) { > + NL_SET_ERR_MSG(extack, "Not allowd to delete root node"); s/allowd/allowed/
Re: [PATCH net-next 0/8] net: sched: cls: add extack support
On Tue, 16 Jan 2018 17:12:57 -0500, Jamal Hadi Salim wrote: > On 18-01-16 04:46 PM, Jakub Kicinski wrote: > > On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote: > > [..] > > > Ugh, this is going to conflict with our series too :( (and I CCed you > > on ours) > > > > Would it be OK for you to hold off until Jiri's code gets merged and > > ours comes down via bpf-next? That shouldn't take long at all. The > > conflicts between bpf/bpf-next/net-next are really taking their toll > > on us this release cycles, I would really appreciate if we could make > > some progress on this relatively simple series at least... > > > > I would say precedence should be Jiri's patches, Alex's patches > and then yours: > Alex's patches fix the core (cls_api.c) area with proper extack > for the core and then he has one patch to cover a specific > use case of the u32 classifier extack. Yours is only concerned > with one use case - bpf which depend on the core (that is in Alex's > patches) Our patches are concerned with propagating the extack to drivers, and nfp (and netdevsim) make use of it. I'm miffed by the fact that you jumped out with this conflicting series *after* we posted ours, and we got shot down on white space.
[PATCH bpf] bpf: reject stores into ctx via st and xadd
Alexei found that verifier does not reject stores into context via BPF_ST instead of BPF_STX. And while looking at it, we also should not allow XADD variant of BPF_STX. The context rewriter is only assuming either BPF_LDX_MEM- or BPF_STX_MEM-type operations, thus reject anything other than that so that assumptions in the rewriter properly hold. Add test cases as well for BPF selftests. Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") Reported-by: Alexei StarovoitovSigned-off-by: Daniel Borkmann --- kernel/bpf/verifier.c | 19 +++ tools/testing/selftests/bpf/test_verifier.c | 29 +++-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 5423b90..1aff5de 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -978,6 +978,13 @@ static bool is_pointer_value(struct bpf_verifier_env *env, int regno) return __is_pointer_value(env->allow_ptr_leaks, cur_regs(env) + regno); } +static bool is_ctx_reg(struct bpf_verifier_env *env, int regno) +{ + const struct bpf_reg_state *reg = cur_regs(env) + regno; + + return reg->type == PTR_TO_CTX; +} + static int check_pkt_ptr_alignment(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int off, int size, bool strict) @@ -1258,6 +1265,12 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins return -EACCES; } + if (is_ctx_reg(env, insn->dst_reg)) { + verbose(env, "BPF_XADD stores into R%d context is not allowed\n", + insn->dst_reg); + return -EACCES; + } + /* check whether atomic_add can read the memory */ err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, BPF_SIZE(insn->code), BPF_READ, -1); @@ -3991,6 +4004,12 @@ static int do_check(struct bpf_verifier_env *env) if (err) return err; + if (is_ctx_reg(env, insn->dst_reg)) { + verbose(env, "BPF_ST stores into R%d context is not allowed\n", + insn->dst_reg); + return -EACCES; + } + /* check that memory (dst_reg + off) is writeable */ err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, BPF_SIZE(insn->code), BPF_WRITE, diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 74cb63e..c34d288 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -2593,6 +2593,29 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, { + "context stores via ST", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0), + BPF_EXIT_INSN(), + }, + .errstr = "BPF_ST stores into R1 context is not allowed", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { + "context stores via XADD", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_1, +BPF_REG_0, offsetof(struct __sk_buff, mark), 0), + BPF_EXIT_INSN(), + }, + .errstr = "BPF_XADD stores into R1 context is not allowed", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { "direct packet access: test1", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, @@ -4312,7 +4335,8 @@ static struct bpf_test tests[] = { .fixup_map1 = { 2 }, .errstr_unpriv = "R2 leaks addr into mem", .result_unpriv = REJECT, - .result = ACCEPT, + .result = REJECT, + .errstr = "BPF_XADD stores into R1 context is not allowed", }, { "leak pointer into ctx 2", @@ -4326,7 +4350,8 @@ static struct bpf_test tests[] = { }, .errstr_unpriv = "R10 leaks addr into mem", .result_unpriv = REJECT, - .result = ACCEPT, + .result = REJECT, + .errstr = "BPF_XADD stores into R1 context is not allowed", }, { "leak pointer
[PATCH] cfg80211: fix station info handling bugs
From: Johannes BergFix two places where the structure isn't initialized to zero, and thus can't be filled properly by the driver. Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM") Fixes: 9930380f0bd8 ("cfg80211: implement IWRATE") Signed-off-by: Johannes Berg --- Dave, can you apply this as an exception? I'm not really expecting any other patches to show up now, and seems easier to have a single patch than a whole pull request (especially now that patchwork seems to be swallowing mine ...) --- net/wireless/nl80211.c | 2 +- net/wireless/wext-compat.c | 3 +-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index c084dd2205ac..91e55bb85416 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -9832,7 +9832,7 @@ static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev, */ if (!wdev->cqm_config->last_rssi_event_value && wdev->current_bss && rdev->ops->get_station) { - struct station_info sinfo; + struct station_info sinfo = {}; u8 *mac_addr; mac_addr = wdev->current_bss->pub.bssid; diff --git a/net/wireless/wext-compat.c b/net/wireless/wext-compat.c index 7ca04a7de85a..05186a47878f 100644 --- a/net/wireless/wext-compat.c +++ b/net/wireless/wext-compat.c @@ -1254,8 +1254,7 @@ static int cfg80211_wext_giwrate(struct net_device *dev, { struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); - /* we are under RTNL - globally locked - so can use a static struct */ - static struct station_info sinfo; + struct station_info sinfo = {}; u8 addr[ETH_ALEN]; int err; -- 2.15.1
Re: [PATCH net-next 0/8] net: sched: cls: add extack support
On 18-01-16 04:46 PM, Jakub Kicinski wrote: On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote: [..] Ugh, this is going to conflict with our series too :( (and I CCed you on ours) Would it be OK for you to hold off until Jiri's code gets merged and ours comes down via bpf-next? That shouldn't take long at all. The conflicts between bpf/bpf-next/net-next are really taking their toll on us this release cycles, I would really appreciate if we could make some progress on this relatively simple series at least... I would say precedence should be Jiri's patches, Alex's patches and then yours: Alex's patches fix the core (cls_api.c) area with proper extack for the core and then he has one patch to cover a specific use case of the u32 classifier extack. Yours is only concerned with one use case - bpf which depend on the core (that is in Alex's patches) cheers, jamal
[PATCH v3 net-next 2/4] l2tp: remove l2specific_len dependency in l2tp_core
Remove l2specific_len dependency while building l2tpv3 header or parsing the received frame since default L2-Specific Sublayer is always four bytes long and we don't need to rely on a user supplied value. Moreover in l2tp netlink code there are no sanity checks to enforce the relation between l2specific_len and l2specific_type, so sending a malformed netlink message is possible to set l2specific_type to L2TP_L2SPECTYPE_DEFAULT (or even L2TP_L2SPECTYPE_NONE) and set l2specific_len to a value greater than 4 leaking memory on the wire and sending corrupted frames. Reviewed-by: Guillaume NaultTested-by: Guillaume Nault Signed-off-by: Lorenzo Bianconi --- net/l2tp/l2tp_core.c | 34 -- net/l2tp/l2tp_core.h | 11 +++ 2 files changed, 27 insertions(+), 18 deletions(-) diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 62285fc6eb59..88efb8b845ca 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -730,11 +730,9 @@ void l2tp_recv_common(struct l2tp_session *session, struct sk_buff *skb, "%s: recv data ns=%u, session nr=%u\n", session->name, ns, session->nr); } + ptr += 4; } - /* Advance past L2-specific header, if present */ - ptr += session->l2specific_len; - if (L2TP_SKB_CB(skb)->has_seq) { /* Received a packet with sequence numbers. If we're the LNS, * check if we sre sending sequence numbers and if not, @@ -1048,21 +1046,20 @@ static int l2tp_build_l2tpv3_header(struct l2tp_session *session, void *buf) memcpy(bufp, >cookie[0], session->cookie_len); bufp += session->cookie_len; } - if (session->l2specific_len) { - if (session->l2specific_type == L2TP_L2SPECTYPE_DEFAULT) { - u32 l2h = 0; - if (session->send_seq) { - l2h = 0x4000 | session->ns; - session->ns++; - session->ns &= 0xff; - l2tp_dbg(session, L2TP_MSG_SEQ, -"%s: updated ns to %u\n", -session->name, session->ns); - } + if (session->l2specific_type == L2TP_L2SPECTYPE_DEFAULT) { + u32 l2h = 0; - *((__be32 *) bufp) = htonl(l2h); + if (session->send_seq) { + l2h = 0x4000 | session->ns; + session->ns++; + session->ns &= 0xff; + l2tp_dbg(session, L2TP_MSG_SEQ, +"%s: updated ns to %u\n", +session->name, session->ns); } - bufp += session->l2specific_len; + + *((__be32 *)bufp) = htonl(l2h); + bufp += 4; } return bufp - optr; @@ -1719,7 +1716,7 @@ int l2tp_session_delete(struct l2tp_session *session) EXPORT_SYMBOL_GPL(l2tp_session_delete); /* We come here whenever a session's send_seq, cookie_len or - * l2specific_len parameters are set. + * l2specific_type parameters are set. */ void l2tp_session_set_header_len(struct l2tp_session *session, int version) { @@ -1728,7 +1725,8 @@ void l2tp_session_set_header_len(struct l2tp_session *session, int version) if (session->send_seq) session->hdr_len += 4; } else { - session->hdr_len = 4 + session->cookie_len + session->l2specific_len; + session->hdr_len = 4 + session->cookie_len; + session->hdr_len += l2tp_get_l2specific_len(session); if (session->tunnel->encap == L2TP_ENCAPTYPE_UDP) session->hdr_len += 4; } diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h index c2e9bbd79b35..7bef304de4f0 100644 --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -302,6 +302,17 @@ static inline void l2tp_session_dec_refcount(struct l2tp_session *session) l2tp_session_free(session); } +static inline int l2tp_get_l2specific_len(struct l2tp_session *session) +{ + switch (session->l2specific_type) { + case L2TP_L2SPECTYPE_DEFAULT: + return 4; + case L2TP_L2SPECTYPE_NONE: + default: + return 0; + } +} + #define l2tp_printk(ptr, type, func, fmt, ...) \ do { \ if (((ptr)->debug) & (type))\ -- 2.13.6
[PATCH v3 net-next 3/4] l2tp: remove l2specific_len configurable parameter
Remove l2specific_len configuration parameter since now L2-Specific Sublayer length is computed according to l2specific_type provided by userspace. Reviewed-by: Guillaume NaultTested-by: Guillaume Nault Signed-off-by: Lorenzo Bianconi --- net/l2tp/l2tp_core.c| 1 - net/l2tp/l2tp_core.h| 2 -- net/l2tp/l2tp_debugfs.c | 2 +- net/l2tp/l2tp_netlink.c | 4 4 files changed, 1 insertion(+), 8 deletions(-) diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 88efb8b845ca..194a7483bb93 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1777,7 +1777,6 @@ struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunn session->lns_mode = cfg->lns_mode; session->reorder_timeout = cfg->reorder_timeout; session->l2specific_type = cfg->l2specific_type; - session->l2specific_len = cfg->l2specific_len; session->cookie_len = cfg->cookie_len; memcpy(>cookie[0], >cookie[0], cfg->cookie_len); session->peer_cookie_len = cfg->peer_cookie_len; diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h index 7bef304de4f0..9bbee90e9963 100644 --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -59,7 +59,6 @@ struct l2tp_session_cfg { int debug; /* bitmask of debug message * categories */ u16 vlan_id;/* VLAN pseudowire only */ - u16 l2specific_len; /* Layer 2 specific length */ u16 l2specific_type; /* Layer 2 specific type */ u8 cookie[8]; /* optional cookie */ int cookie_len; /* 0, 4 or 8 bytes */ @@ -85,7 +84,6 @@ struct l2tp_session { int cookie_len; u8 peer_cookie[8]; int peer_cookie_len; - u16 l2specific_len; u16 l2specific_type; u16 hdr_len; u32 nr; /* session NR state (receive) */ diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c index 2c30587d1a14..72e713da4733 100644 --- a/net/l2tp/l2tp_debugfs.c +++ b/net/l2tp/l2tp_debugfs.c @@ -181,7 +181,7 @@ static void l2tp_dfs_seq_session_show(struct seq_file *m, void *v) session->debug, jiffies_to_msecs(session->reorder_timeout)); seq_printf(m, " offset 0 l2specific %hu/%hu\n", - session->l2specific_type, session->l2specific_len); + session->l2specific_type, l2tp_get_l2specific_len(session)); if (session->cookie_len) { seq_printf(m, " cookie %02x%02x%02x%02x", session->cookie[0], session->cookie[1], diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index 9ba2b8a68f65..405a5341ed1e 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -561,10 +561,6 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf cfg.l2specific_type = L2TP_L2SPECTYPE_DEFAULT; } - cfg.l2specific_len = 4; - if (info->attrs[L2TP_ATTR_L2SPEC_LEN]) - cfg.l2specific_len = nla_get_u8(info->attrs[L2TP_ATTR_L2SPEC_LEN]); - if (info->attrs[L2TP_ATTR_COOKIE]) { u16 len = nla_len(info->attrs[L2TP_ATTR_COOKIE]); if (len > 8) { -- 2.13.6
[PATCH v3 net-next 4/4] l2tp: mark L2TP_ATTR_L2SPEC_LEN as not used
Reviewed-by: Guillaume NaultTested-by: Guillaume Nault Signed-off-by: Lorenzo Bianconi --- include/uapi/linux/l2tp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h index 71e62795104d..7d570c7bd117 100644 --- a/include/uapi/linux/l2tp.h +++ b/include/uapi/linux/l2tp.h @@ -97,7 +97,7 @@ enum { L2TP_ATTR_OFFSET, /* u16 (not used) */ L2TP_ATTR_DATA_SEQ, /* u16 */ L2TP_ATTR_L2SPEC_TYPE, /* u8, enum l2tp_l2spec_type */ - L2TP_ATTR_L2SPEC_LEN, /* u8, enum l2tp_l2spec_type */ + L2TP_ATTR_L2SPEC_LEN, /* u8 (not used) */ L2TP_ATTR_PROTO_VERSION,/* u8 */ L2TP_ATTR_IFNAME, /* string */ L2TP_ATTR_CONN_ID, /* u32 */ -- 2.13.6
[PATCH v3 net-next 1/4] l2tp: double-check l2specific_type provided by userspace
Add sanity check on l2specific_type provided by userspace in l2tp_nl_cmd_session_create() since just L2TP_L2SPECTYPE_DEFAULT and L2TP_L2SPECTYPE_NONE are currently supported. Moreover explicitly set l2specific_type to L2TP_L2SPECTYPE_DEFAULT only if the userspace does not provide a value for it Reviewed-by: Guillaume NaultTested-by: Guillaume Nault Signed-off-by: Lorenzo Bianconi --- net/l2tp/l2tp_netlink.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index e1ca29f79821..9ba2b8a68f65 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -550,9 +550,16 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf if (info->attrs[L2TP_ATTR_DATA_SEQ]) cfg.data_seq = nla_get_u8(info->attrs[L2TP_ATTR_DATA_SEQ]); - cfg.l2specific_type = L2TP_L2SPECTYPE_DEFAULT; - if (info->attrs[L2TP_ATTR_L2SPEC_TYPE]) + if (info->attrs[L2TP_ATTR_L2SPEC_TYPE]) { cfg.l2specific_type = nla_get_u8(info->attrs[L2TP_ATTR_L2SPEC_TYPE]); + if (cfg.l2specific_type != L2TP_L2SPECTYPE_DEFAULT && + cfg.l2specific_type != L2TP_L2SPECTYPE_NONE) { + ret = -EINVAL; + goto out_tunnel; + } + } else { + cfg.l2specific_type = L2TP_L2SPECTYPE_DEFAULT; + } cfg.l2specific_len = 4; if (info->attrs[L2TP_ATTR_L2SPEC_LEN]) -- 2.13.6
[PATCH v3 net-next 0/4] l2tp: set l2specific_len based on l2specific_type
Do not rely on l2specific_len value provided by userspace but set sublayer length according to l2specific_type. Mark L2TP_ATTR_L2SPEC_LEN attribute as not used Changes since v2: - drop the patch related to a fix in the switch default case in l2tp_nl_cmd_session_create() - use L2SPECTYPE_NONE as default case in l2tp_get_l2specific_len() Changes since v1: - remove l2specific_len parameter - add sanity check on l2specific_type provided by userspace Lorenzo Bianconi (4): l2tp: double-check l2specific_type provided by userspace l2tp: remove l2specific_len dependency in l2tp_core l2tp: remove l2specific_len configurable parameter l2tp: mark L2TP_ATTR_L2SPEC_LEN as not used include/uapi/linux/l2tp.h | 2 +- net/l2tp/l2tp_core.c | 35 --- net/l2tp/l2tp_core.h | 13 +++-- net/l2tp/l2tp_debugfs.c | 2 +- net/l2tp/l2tp_netlink.c | 15 +-- 5 files changed, 38 insertions(+), 29 deletions(-) -- 2.13.6
Re: [PATCH net-next 8/8] net: sched: cls_u32: add extack support
On Tue, 16 Jan 2018 12:20:27 -0500, Alexander Aring wrote: > @@ -780,14 +787,18 @@ static int u32_set_parms(struct net *net, struct > tcf_proto *tp, > u32 handle = nla_get_u32(tb[TCA_U32_LINK]); > struct tc_u_hnode *ht_down = NULL, *ht_old; > > - if (TC_U32_KEY(handle)) > + if (TC_U32_KEY(handle)) { > + NL_SET_ERR_MSG(extack, "u32 Link handle must be a hash > table"); > return -EINVAL; > + } Since classifiers are commonly built as modules would it make more sense to use NL_SET_ERR_MSG_MOD()?
Re: [PATCH net-next 0/8] net: sched: cls: add extack support
On Tue, 16 Jan 2018 12:20:19 -0500, Alexander Aring wrote: > Hi, > > this patch adds extack support for TC classifier subsystem. The first > patch fixes some code style issues for this patch series pointed out > by checkpatch. The other patches until the last one prepares extack > handling for the TC classifier subsystem and handle generic extack > errors. > > The last patch is an example for u32 classifier to add extack support > inside the callbacks delete and change. There exists a init callback as > well, but most classifier implementation run a kalloc() once to allocate > something. Not necessary _yet_ to add extack support now. > > I know there are patches around which makes changes to these files. > I will rebase my stuff on Jiri's patches if they get in before mine. Ugh, this is going to conflict with our series too :( (and I CCed you on ours) Would it be OK for you to hold off until Jiri's code gets merged and ours comes down via bpf-next? That shouldn't take long at all. The conflicts between bpf/bpf-next/net-next are really taking their toll on us this release cycles, I would really appreciate if we could make some progress on this relatively simple series at least...
[PATCH bpf-next v3 02/11] net: sched: cls_flower: propagate extack support for filter offload
From: Quentin MonnetPropagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_flower. This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_flower.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 998ee4faf934..ebbaba4a214b 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -234,7 +234,8 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f) static int fl_hw_replace_filter(struct tcf_proto *tp, struct flow_dissector *dissector, struct fl_flow_key *mask, - struct cls_fl_filter *f) + struct cls_fl_filter *f, + struct netlink_ext_ack *extack) { struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; @@ -939,7 +940,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, err = fl_hw_replace_filter(tp, >dissector, , - fnew); + fnew, + extack); if (err) goto errout_idr; } -- 2.15.1
[PATCH bpf-next v3 08/11] nfp: bpf: plumb extack into functions related to XDP offload
From: Quentin MonnetPass a pointer to an extack object to nfp_app_xdp_offload() in order to prepare for extack usage in the nfp driver. Next step will be to forward this extack pointer to nfp_net_bpf_offload(), once this function is able to use it for printing error messages. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/bpf/main.c | 4 ++-- drivers/net/ethernet/netronome/nfp/nfp_app.h| 9 ++--- drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index 8823c8360047..e8816ab8fb63 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -54,7 +54,7 @@ static bool nfp_net_ebpf_capable(struct nfp_net *nn) static int nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog) + struct bpf_prog *prog, struct netlink_ext_ack *extack) { bool running, xdp_running; int ret; @@ -73,7 +73,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, ret = nfp_net_bpf_offload(nn, prog, running); /* Stop offload if replace not possible */ if (ret && prog) - nfp_bpf_xdp_offload(app, nn, NULL); + nfp_bpf_xdp_offload(app, nn, NULL, extack); nn->dp.bpf_offload_xdp = prog && !ret; return ret; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index 6a6eb02b516e..1229a34f8da5 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -43,6 +43,7 @@ struct bpf_prog; struct net_device; struct netdev_bpf; +struct netlink_ext_ack; struct pci_dev; struct sk_buff; struct sk_buff; @@ -134,7 +135,8 @@ struct nfp_app_type { int (*bpf)(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *xdp); int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog); + struct bpf_prog *prog, + struct netlink_ext_ack *extack); int (*sriov_enable)(struct nfp_app *app, int num_vfs); void (*sriov_disable)(struct nfp_app *app); @@ -320,11 +322,12 @@ static inline int nfp_app_bpf(struct nfp_app *app, struct nfp_net *nn, } static inline int nfp_app_xdp_offload(struct nfp_app *app, struct nfp_net *nn, - struct bpf_prog *prog) + struct bpf_prog *prog, + struct netlink_ext_ack *extack) { if (!app || !app->type->xdp_offload) return -EOPNOTSUPP; - return app->type->xdp_offload(app, nn, prog); + return app->type->xdp_offload(app, nn, prog, extack); } static inline bool __nfp_app_ctrl_tx(struct nfp_app *app, struct sk_buff *skb) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 2b5cad3069a7..14f23e8d27fa 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -3395,7 +3395,7 @@ nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags, if (err) return err; - err = nfp_app_xdp_offload(nn->app, nn, offload_prog); + err = nfp_app_xdp_offload(nn->app, nn, offload_prog, extack); if (err && flags & XDP_FLAGS_HW_MODE) return err; -- 2.15.1
[PATCH bpf-next v3 04/11] net: sched: cls_u32: propagate extack support for filter offload
From: Quentin MonnetPropagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_u32. This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_u32.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 3ef5c32741c1..671eb952f6af 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -501,7 +501,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h) } static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h, - u32 flags) + u32 flags, struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_u32_offload cls_u32 = {}; @@ -542,7 +542,7 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle) } static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n, - u32 flags) + u32 flags, struct netlink_ext_ack *extack) { struct tcf_block *block = tp->chain->block; struct tc_cls_u32_offload cls_u32 = {}; @@ -943,7 +943,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return err; } - err = u32_replace_hw_knode(tp, new, flags); + err = u32_replace_hw_knode(tp, new, flags, extack); if (err) { u32_destroy_key(tp, new, false); return err; @@ -990,7 +990,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, ht->prio = tp->prio; idr_init(>handle_idr); - err = u32_replace_hw_hnode(tp, ht, flags); + err = u32_replace_hw_hnode(tp, ht, flags, extack); if (err) { idr_remove_ext(_c->handle_idr, handle); kfree(ht); @@ -1088,7 +1088,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, struct tc_u_knode __rcu **ins; struct tc_u_knode *pins; - err = u32_replace_hw_knode(tp, n, flags); + err = u32_replace_hw_knode(tp, n, flags, extack); if (err) goto errhw; -- 2.15.1
[PATCH bpf-next v3 01/11] net: sched: add extack support to change() classifier operation
From: Quentin MonnetAdd an extra argument to `->change()` operation for passing a pointer to a struct netlink_ext_ack. Update the operation for all classifiers accordingly. Extack is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/sch_generic.h | 3 ++- net/sched/cls_api.c | 3 ++- net/sched/cls_basic.c | 3 ++- net/sched/cls_bpf.c | 2 +- net/sched/cls_cgroup.c| 3 ++- net/sched/cls_flow.c | 2 +- net/sched/cls_flower.c| 2 +- net/sched/cls_fw.c| 2 +- net/sched/cls_matchall.c | 2 +- net/sched/cls_route.c | 3 ++- net/sched/cls_rsvp.h | 2 +- net/sched/cls_tcindex.c | 3 ++- net/sched/cls_u32.c | 3 ++- 13 files changed, 20 insertions(+), 13 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index ac029d5d88e4..5e77f2639c67 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -232,7 +232,8 @@ struct tcf_proto_ops { int (*change)(struct net *net, struct sk_buff *, struct tcf_proto*, unsigned long, u32 handle, struct nlattr **, - void **, bool); + void **, bool, + struct netlink_ext_ack *); int (*delete)(struct tcf_proto*, void *, bool*); void(*walk)(struct tcf_proto*, struct tcf_walker *arg); void(*bind_class)(void *, u32, unsigned long); diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 6708b6953bfa..0460cc22d48c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -912,7 +912,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, } err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, , - n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE); + n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE, + extack); if (err == 0) { if (tp_created) tcf_chain_tp_insert(chain, _info, tp); diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c index 5f169ded347e..2cc38cd71938 100644 --- a/net/sched/cls_basic.c +++ b/net/sched/cls_basic.c @@ -175,7 +175,8 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp, static int basic_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, - struct nlattr **tca, void **arg, bool ovr) + struct nlattr **tca, void **arg, bool ovr, + struct netlink_ext_ack *extack) { int err; struct basic_head *head = rtnl_dereference(tp->root); diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 8d78e7f4ecc3..fcb831b3917e 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -449,7 +449,7 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp, static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, - void **arg, bool ovr) + void **arg, bool ovr, struct netlink_ext_ack *extack) { struct cls_bpf_head *head = rtnl_dereference(tp->root); struct cls_bpf_prog *oldprog = *arg; diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c index 309d5899265f..b74af0b55820 100644 --- a/net/sched/cls_cgroup.c +++ b/net/sched/cls_cgroup.c @@ -91,7 +91,8 @@ static void cls_cgroup_destroy_rcu(struct rcu_head *root) static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, -void **arg, bool ovr) +void **arg, bool ovr, +struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_CGROUP_MAX + 1]; struct cls_cgroup_head *head = rtnl_dereference(tp->root); diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c index 25c2a888e1f0..e944f01d5394 100644 --- a/net/sched/cls_flow.c +++ b/net/sched/cls_flow.c @@ -401,7 +401,7 @@ static void flow_destroy_filter(struct rcu_head *head) static int flow_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, - void **arg, bool ovr) + void **arg, bool ovr, struct netlink_ext_ack *extack) { struct
[PATCH bpf-next v3 07/11] net: sched: create tc_can_offload_extack() wrapper
From: Quentin MonnetCreate a wrapper around tc_can_offload() that takes an additional extack pointer argument in order to output an error message if TC offload is disabled on the device. In this way, the error message is handled by the core and can be the same for all drivers. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/pkt_cls.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index c88c61234cb3..a3ad6a5a2d12 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -644,6 +644,17 @@ static inline bool tc_can_offload(const struct net_device *dev) return dev->features & NETIF_F_HW_TC; } +static inline bool tc_can_offload_extack(const struct net_device *dev, +struct netlink_ext_ack *extack) +{ + bool can = tc_can_offload(dev); + + if (!can) + NL_SET_ERR_MSG(extack, "TC offload is disabled on net device"); + + return can; +} + static inline bool tc_skip_hw(u32 flags) { return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false; -- 2.15.1
[PATCH bpf-next v3 09/11] nfp: bpf: use extack support to improve debugging
From: Quentin MonnetUse the recently added extack support for eBPF offload in the driver. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/bpf/main.c| 31 ++-- drivers/net/ethernet/netronome/nfp/bpf/main.h| 2 +- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 24 +++--- 3 files changed, 39 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index e8816ab8fb63..a638c3ab6b61 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -70,7 +70,7 @@ nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn, if (prog && running && !xdp_running) return -EBUSY; - ret = nfp_net_bpf_offload(nn, prog, running); + ret = nfp_net_bpf_offload(nn, prog, running, extack); /* Stop offload if replace not possible */ if (ret && prog) nfp_bpf_xdp_offload(app, nn, NULL, extack); @@ -125,17 +125,31 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type, struct nfp_bpf_vnic *bv; int err; - if (type != TC_SETUP_CLSBPF || - !tc_can_offload(nn->dp.netdev) || - !nfp_net_ebpf_capable(nn) || - cls_bpf->common.protocol != htons(ETH_P_ALL) || - cls_bpf->common.chain_index) + if (type != TC_SETUP_CLSBPF) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only offload of BPF classifiers supported"); + return -EOPNOTSUPP; + } + if (!tc_can_offload_extack(nn->dp.netdev, cls_bpf->common.extack)) + return -EOPNOTSUPP; + if (!nfp_net_ebpf_capable(nn)) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "NFP firmware does not support eBPF offload"); + return -EOPNOTSUPP; + } + if (cls_bpf->common.protocol != htons(ETH_P_ALL)) { + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only ETH_P_ALL supported as filter protocol"); + return -EOPNOTSUPP; + } + if (cls_bpf->common.chain_index) return -EOPNOTSUPP; /* Only support TC direct action */ if (!cls_bpf->exts_integrated || tcf_exts_has_actions(cls_bpf->exts)) { - nn_err(nn, "only direct action with no legacy actions supported\n"); + NL_SET_ERR_MSG_MOD(cls_bpf->common.extack, + "only direct action with no legacy actions supported"); return -EOPNOTSUPP; } @@ -152,7 +166,8 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type, return 0; } - err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog); + err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog, + cls_bpf->common.extack); if (err) return err; diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h index b80e75a8ecda..80855d43b25e 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h @@ -334,7 +334,7 @@ struct nfp_net; int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf); int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog *prog, - bool old_prog); + bool old_prog, struct netlink_ext_ack *extack); struct nfp_insn_meta * nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c index e2859b2e9c6a..9c78a09cda24 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c @@ -271,7 +271,9 @@ int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf) } } -static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog) +static int +nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog, +struct netlink_ext_ack *extack) { struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv; unsigned int max_mtu; @@ -281,7 +283,7 @@ static int nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog) max_mtu = nn_readb(nn, NFP_NET_CFG_BPF_INL_MTU) * 64 - 32; if (max_mtu < nn->dp.netdev->mtu) { - nn_info(nn, "BPF offload not supported with MTU larger than HW packet split boundary\n"); + NL_SET_ERR_MSG_MOD(extack, "BPF offload not supported with MTU larger than HW packet split boundary");
[PATCH bpf-next v3 06/11] net: sched: add extack support for offload via tc_cls_common_offload
From: Quentin MonnetAdd extack support for hardware offload of classifiers. In order to achieve this, a pointer to a struct netlink_ext_ack is added to the struct tc_cls_common_offload that is passed to the callback for setting up the classifier. Function tc_cls_common_offload_init() is updated to support initialization of this new attribute. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- include/net/pkt_cls.h| 5 - net/sched/cls_bpf.c | 4 ++-- net/sched/cls_flower.c | 6 +++--- net/sched/cls_matchall.c | 4 ++-- net/sched/cls_u32.c | 8 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 0d1343cba84c..c88c61234cb3 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -590,15 +590,18 @@ struct tc_cls_common_offload { u32 chain_index; __be16 protocol; u32 prio; + struct netlink_ext_ack *extack; }; static inline void tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common, - const struct tcf_proto *tp) + const struct tcf_proto *tp, + struct netlink_ext_ack *extack) { cls_common->chain_index = tp->chain->index; cls_common->protocol = tp->protocol; cls_common->prio = tp->prio; + cls_common->extack = extack; } struct tc_cls_u32_knode { diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 70397862da4a..d15ef9ab7243 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -159,7 +159,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog, skip_sw = prog && tc_skip_sw(prog->gen_flags); obj = prog ?: oldprog; - tc_cls_common_offload_init(_bpf.common, tp); + tc_cls_common_offload_init(_bpf.common, tp, extack); cls_bpf.command = TC_CLSBPF_OFFLOAD; cls_bpf.exts = >exts; cls_bpf.prog = prog ? prog->filter : NULL; @@ -217,7 +217,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp, struct tcf_block *block = tp->chain->block; struct tc_cls_bpf_offload cls_bpf = {}; - tc_cls_common_offload_init(_bpf.common, tp); + tc_cls_common_offload_init(_bpf.common, tp, NULL); cls_bpf.command = TC_CLSBPF_STATS; cls_bpf.exts = >exts; cls_bpf.prog = prog->filter; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index ebbaba4a214b..fe7d96d12435 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -223,7 +223,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f) struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(_flower.common, tp); + tc_cls_common_offload_init(_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_DESTROY; cls_flower.cookie = (unsigned long) f; @@ -242,7 +242,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp, bool skip_sw = tc_skip_sw(f->flags); int err; - tc_cls_common_offload_init(_flower.common, tp); + tc_cls_common_offload_init(_flower.common, tp, extack); cls_flower.command = TC_CLSFLOWER_REPLACE; cls_flower.cookie = (unsigned long) f; cls_flower.dissector = dissector; @@ -271,7 +271,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f) struct tc_cls_flower_offload cls_flower = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(_flower.common, tp); + tc_cls_common_offload_init(_flower.common, tp, NULL); cls_flower.command = TC_CLSFLOWER_STATS; cls_flower.cookie = (unsigned long) f; cls_flower.exts = >exts; diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index 16752abcb76b..fe6b673db5c6 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -76,7 +76,7 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp, struct tc_cls_matchall_offload cls_mall = {}; struct tcf_block *block = tp->chain->block; - tc_cls_common_offload_init(_mall.common, tp); + tc_cls_common_offload_init(_mall.common, tp, NULL); cls_mall.command = TC_CLSMATCHALL_DESTROY; cls_mall.cookie = cookie; @@ -93,7 +93,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp, bool skip_sw = tc_skip_sw(head->flags); int err; - tc_cls_common_offload_init(_mall.common, tp); + tc_cls_common_offload_init(_mall.common, tp, extack); cls_mall.command = TC_CLSMATCHALL_REPLACE; cls_mall.exts = >exts; cls_mall.cookie = cookie; diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 671eb952f6af..ef1b746de80b 100644 --- a/net/sched/cls_u32.c +++
[PATCH bpf-next v3 10/11] netdevsim: add extack support for TC eBPF offload
From: Quentin MonnetUse the recently added extack support for TC eBPF filters in netdevsim. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- drivers/net/netdevsim/bpf.c | 35 --- 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c index 5134d5c1306c..0de8ba91b262 100644 --- a/drivers/net/netdevsim/bpf.c +++ b/drivers/net/netdevsim/bpf.c @@ -109,17 +109,35 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, struct netdevsim *ns = cb_priv; struct bpf_prog *oldprog; - if (type != TC_SETUP_CLSBPF || - !tc_can_offload(ns->netdev) || - cls_bpf->common.protocol != htons(ETH_P_ALL) || - cls_bpf->common.chain_index) + if (type != TC_SETUP_CLSBPF) { + NSIM_EA(cls_bpf->common.extack, + "only offload of BPF classifiers supported"); + return -EOPNOTSUPP; + } + + if (!tc_can_offload_extack(ns->netdev, cls_bpf->common.extack)) + return -EOPNOTSUPP; + + if (cls_bpf->common.protocol != htons(ETH_P_ALL)) { + NSIM_EA(cls_bpf->common.extack, + "only ETH_P_ALL supported as filter protocol"); + return -EOPNOTSUPP; + } + + if (cls_bpf->common.chain_index) return -EOPNOTSUPP; - if (!ns->bpf_tc_accept) + if (!ns->bpf_tc_accept) { + NSIM_EA(cls_bpf->common.extack, + "netdevsim configured to reject BPF TC offload"); return -EOPNOTSUPP; + } /* Note: progs without skip_sw will probably not be dev bound */ - if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) + if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept) { + NSIM_EA(cls_bpf->common.extack, + "netdevsim configured to reject unbound programs"); return -EOPNOTSUPP; + } if (cls_bpf->command != TC_CLSBPF_OFFLOAD) return -EOPNOTSUPP; @@ -131,8 +149,11 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type, oldprog = NULL; if (!cls_bpf->prog) return 0; - if (ns->bpf_offloaded) + if (ns->bpf_offloaded) { + NSIM_EA(cls_bpf->common.extack, + "driver and netdev offload states mismatch"); return -EBUSY; + } } return nsim_bpf_offload(ns, cls_bpf->prog, oldprog); -- 2.15.1
[PATCH bpf-next v3 03/11] net: sched: cls_matchall: propagate extack support for filter offload
From: Quentin MonnetPropagate the extack pointer from the `->change()` classifier operation to the function used for filter replacement in cls_matchall. This makes it possible to use netlink extack messages in the future at replacement time for this filter, although it is not used at this point. Signed-off-by: Quentin Monnet Reviewed-by: Jakub Kicinski --- net/sched/cls_matchall.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index dc3c57116bbd..16752abcb76b 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -85,7 +85,8 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp, static int mall_replace_hw_filter(struct tcf_proto *tp, struct cls_mall_head *head, - unsigned long cookie) + unsigned long cookie, + struct netlink_ext_ack *extack) { struct tc_cls_matchall_offload cls_mall = {}; struct tcf_block *block = tp->chain->block; @@ -202,7 +203,8 @@ static int mall_change(struct net *net, struct sk_buff *in_skb, goto err_set_parms; if (!tc_skip_hw(new->flags)) { - err = mall_replace_hw_filter(tp, new, (unsigned long) new); + err = mall_replace_hw_filter(tp, new, (unsigned long)new, +extack); if (err) goto err_replace_hw_filter; } -- 2.15.1
[PATCH bpf-next v3 00/11] net: sched: add extack support for cls offload
Hi! This series adds extack to cls offloads, as such it could arguably be targeted at net-next. Unfortunately, git am is not able to deal cleanly with minor conflicts on the nfp patches.. Since the series is really about cls_bpf I hope it's OK if it went via the bpf-next tree. There is a very minor conflict with Jiri's series, but if this goes via bpf-next, git will be able to deal with it on merge without a fuss. Quentin says: This series tries to improve user experience when eBPF hardware offload hits error paths at load time. In particular, it introduces netlink extended ack support in the nfp driver. To that aim, transmission of the pointer to the extack object is piped through the `change()` operation of the existing classifiers (patch 1 to 6). Then it is used for TC offload in the nfp driver (patch 8) and in netdevsim (patch 9, selftest in patch 10). Patch 7 adds a helper to handle extack messages in the core when TC offload is disabled on the net device. For completeness extack is propagated for classifiers other than cls_bpf, but it's up to the drivers to make use of it. Quentin Monnet (11): net: sched: add extack support to change() classifier operation net: sched: cls_flower: propagate extack support for filter offload net: sched: cls_matchall: propagate extack support for filter offload net: sched: cls_u32: propagate extack support for filter offload net: sched: cls_bpf: plumb extack support in filter for hardware offload net: sched: add extack support for offload via tc_cls_common_offload net: sched: create tc_can_offload_extack() wrapper nfp: bpf: plumb extack into functions related to XDP offload nfp: bpf: use extack support to improve debugging netdevsim: add extack support for TC eBPF offload selftests/bpf: add checks on extack messages for eBPF hw offload tests drivers/net/ethernet/netronome/nfp/bpf/main.c | 35 +-- drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 24 +++-- drivers/net/ethernet/netronome/nfp/nfp_app.h | 9 +- .../net/ethernet/netronome/nfp/nfp_net_common.c| 2 +- drivers/net/netdevsim/bpf.c| 35 +-- include/net/pkt_cls.h | 16 +++- include/net/sch_generic.h | 3 +- net/sched/cls_api.c| 3 +- net/sched/cls_basic.c | 3 +- net/sched/cls_bpf.c| 20 ++-- net/sched/cls_cgroup.c | 3 +- net/sched/cls_flow.c | 2 +- net/sched/cls_flower.c | 14 +-- net/sched/cls_fw.c | 2 +- net/sched/cls_matchall.c | 12 ++- net/sched/cls_route.c | 3 +- net/sched/cls_rsvp.h | 2 +- net/sched/cls_tcindex.c| 3 +- net/sched/cls_u32.c| 21 +++-- tools/testing/selftests/bpf/test_offload.py| 104 +++-- 21 files changed, 221 insertions(+), 97 deletions(-) -- 2.15.1