Re: [bpf-next V1 PATCH 09/15] mlx5: register a memory model when XDP is enabled
On Wed, 7 Mar 2018 13:50:19 +0200 Tariq Toukanwrote: > On 06/03/2018 11:48 PM, Jesper Dangaard Brouer wrote: > > Now all the users of ndo_xdp_xmit have been converted to use > > xdp_return_frame. > > This enable a different memory model, thus activating another code path > > in the xdp_return_frame API. > > > > Signed-off-by: Jesper Dangaard Brouer > > --- > > drivers/net/ethernet/mellanox/mlx5/core/en_main.c |7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > > index da94c8cba5ee..51482943c583 100644 > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > > @@ -506,6 +506,13 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, > > rq->mkey_be = c->mkey_be; > > } > > > > + /* This must only be activate for order-0 pages */ > > + if (rq->xdp_prog) > > + err = xdp_rxq_info_reg_mem_model(>xdp_rxq, > > +MEM_TYPE_PAGE_ORDER0, NULL); > > + if (err < 0) > > + goto err_rq_wq_destroy; > > + > > Use "if (err)" here, instead of changing this in next patch. > Also, get it into the "if (rq->xdp_prog)" block. Good point, fixed! -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
On Wed, Mar 07, 2018 at 11:24:16AM -0800, Greg Hackmann wrote: > f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a > __this_cpu_read() call inside ipcomp_alloc_tfms(). Since this call was > introduced, the rules around per-cpu accessors have been tightened and > __this_cpu_read() cannot be used in a preemptible context. > > syzkaller reported this leading to the following kernel BUG while > fuzzing sendmsg: How about reverting f7c83bcbfaf5 instead? Thanks, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
,Your urgent confirmation
Attn: Beneficiary, We have contacted the Federal Ministry of Finance on your Behalf and they have brought a solution to your problem by coordinating your payment in total (10,000,000.00) Ten Million Dollars in an atm card which you can use to withdraw money from any ATM MACHINE CENTER anywhere in the world with a maximum of 1 Dollars daily. You now have the lawful right to claim your fund in an atm card. Since the Federal Bureau of Investigation is involved in this transaction, you have to be rest assured for this is 100% risk free it is our duty to protect the American Citizens, European Citizens, Asian Citizen. All I want you to do is to contact the atm card CENTER Via email or call the office telephone number one of the Consultant will assist you for their requirements to proceed and procure your Approval Slip on your behalf. CONTACT INFORMATION NAME: James Williams EMAIL: paymasterofficed...@gmail.com Do contact us with your details: Full name// Address// Age// Telephone Numbers// Occupation// Your Country// Bank Details// So your files would be updated after which the Delivery of your atm card will be affected to your designated home Address without any further delay and the bank will transfer your funds in total (10,000,000.00) Ten Million Dollars to your Bank account. We will reply you with the secret code (1600 atm card). We advice you get back to the payment office after you have contacted the ATM SWIFT CARD CENTER and we do await your response so we can move on with our Investigation and make sure your ATM SWIFT CARD gets to you. Best Regards James Williams Paymaster General Federal Republic Of Nigeri
[PATCH net-next] liquidio: fix ndo_change_mtu to always return correct status to the caller
From: Veerasenareddy BurruIn a scenario where the command queued to firmware get dropped or times out, MTU change from host will not propagate to firmware. So, it is required for host driver to wait for response from firmware or timeout and then return correct status to caller of ndo_change_mtu. Also moved the common code for MTU change from PF and VF driver files to common file lio_core.c Signed-off-by: Veerasenareddy Burru Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/lio_core.c| 96 -- drivers/net/ethernet/cavium/liquidio/lio_main.c| 70 ++-- drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 67 +++ .../net/ethernet/cavium/liquidio/liquidio_common.h | 3 +- .../net/ethernet/cavium/liquidio/octeon_network.h | 20 + 5 files changed, 143 insertions(+), 113 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c index 8b1ee83..ff4bfb28 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_core.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c @@ -164,15 +164,6 @@ void liquidio_link_ctrl_cmd_completion(void *nctrl_ptr) } break; - case OCTNET_CMD_CHANGE_MTU: - /* If command is successful, change the MTU. */ - netif_info(lio, probe, lio->netdev, "MTU Changed from %d to %d\n", - netdev->mtu, nctrl->ncmd.s.param1); - netdev->mtu = nctrl->ncmd.s.param1; - queue_delayed_work(lio->link_status_wq.wq, - >link_status_wq.wk.work, 0); - break; - case OCTNET_CMD_GPIO_ACCESS: netif_info(lio, probe, lio->netdev, "LED Flashing visual identification\n"); @@ -1081,3 +1072,90 @@ int octeon_setup_interrupt(struct octeon_device *oct, u32 num_ioqs) } return 0; } + +static void liquidio_change_mtu_completion(struct octeon_device *oct, + u32 status, void *buf) +{ + struct octeon_soft_command *sc = (struct octeon_soft_command *)buf; + struct liquidio_if_cfg_context *ctx; + + ctx = (struct liquidio_if_cfg_context *)sc->ctxptr; + + if (status) { + dev_err(>pci_dev->dev, "MTU change failed. Status: %llx\n", + CVM_CAST64(status)); + WRITE_ONCE(ctx->cond, LIO_CHANGE_MTU_FAIL); + } else { + WRITE_ONCE(ctx->cond, LIO_CHANGE_MTU_SUCCESS); + } + + /* This barrier is required to be sure that the response has been +* written fully before waking up the handler +*/ + wmb(); + + wake_up_interruptible(>wc); +} + +/** + * \brief Net device change_mtu + * @param netdev network device + */ +int liquidio_change_mtu(struct net_device *netdev, int new_mtu) +{ + struct lio *lio = GET_LIO(netdev); + struct octeon_device *oct = lio->oct_dev; + struct liquidio_if_cfg_context *ctx; + struct octeon_soft_command *sc; + union octnet_cmd *ncmd; + int ctx_size; + int ret = 0; + + ctx_size = sizeof(struct liquidio_if_cfg_context); + sc = (struct octeon_soft_command *) + octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE, 16, ctx_size); + + ncmd = (union octnet_cmd *)sc->virtdptr; + ctx = (struct liquidio_if_cfg_context *)sc->ctxptr; + + WRITE_ONCE(ctx->cond, 0); + ctx->octeon_id = lio_get_device_id(oct); + init_waitqueue_head(>wc); + + ncmd->u64 = 0; + ncmd->s.cmd = OCTNET_CMD_CHANGE_MTU; + ncmd->s.param1 = new_mtu; + + octeon_swap_8B_data((u64 *)ncmd, (OCTNET_CMD_SIZE >> 3)); + + sc->iq_no = lio->linfo.txpciq[0].s.q_no; + + octeon_prepare_soft_command(oct, sc, OPCODE_NIC, + OPCODE_NIC_CMD, 0, 0, 0); + + sc->callback = liquidio_change_mtu_completion; + sc->callback_arg = sc; + sc->wait_time = 100; + + ret = octeon_send_soft_command(oct, sc); + if (ret == IQ_SEND_FAILED) { + netif_info(lio, rx_err, lio->netdev, "Failed to change MTU\n"); + return -EINVAL; + } + /* Sleep on a wait queue till the cond flag indicates that the +* response arrived or timed-out. +*/ + if (sleep_cond(>wc, >cond) == -EINTR || + ctx->cond == LIO_CHANGE_MTU_FAIL) { + octeon_free_soft_command(oct, sc); + return -EINVAL; + } + /* command is successful, change the MTU. */ + netif_info(lio, probe, lio->netdev, "MTU changed from %d to %d\n", + netdev->mtu, new_mtu); + netdev->mtu = new_mtu; + lio->mtu = new_mtu; + + octeon_free_soft_command(oct, sc); + return 0; +} diff --git
Please add "NFC: llcp: Limit size of SDP URI" for stable
Hi, I don't see fe9c842695e2 ("NFC: llcp: Limit size of SDP URI") queued up for stable. Can this one be added please? This is a buffer overflow fix. Thanks! -Kees -- Kees Cook Pixel Security
Re: [PATCH net-next 00/23] net: hns3: HNS3 bug fixes & code improvements
Sorry, this is way too large of a patch series. Please keep your series to about a dozen or so changes. Anything longer puts an unreasonable burdon upon patch reviewers, and such a large series will often make it so that nearly all reviewers are discouraged from taking a look at all. Thank you.
[PATCH AUTOSEL for 4.9 032/190] time: Change posix clocks ops interfaces to use timespec64
From: Deepa Dinamani[ Upstream commit d340266e19ddb70dbd608f9deedcfb35fdb9d419 ] struct timespec is not y2038 safe on 32 bit machines. The posix clocks apis use struct timespec directly and through struct itimerspec. Replace the posix clock interfaces to use struct timespec64 and struct itimerspec64 instead. Also fix up their implementations accordingly. Note that the clock_getres() interface has also been changed to use timespec64 even though this particular interface is not affected by the y2038 problem. This helps verification for internal kernel code for y2038 readiness by getting rid of time_t/ timeval/ timespec. Signed-off-by: Deepa Dinamani Cc: a...@arndb.de Cc: y2...@lists.linaro.org Cc: netdev@vger.kernel.org Cc: Richard Cochran Cc: john.stu...@linaro.org Link: http://lkml.kernel.org/r/1490555058-4603-3-git-send-email-deepa.ker...@gmail.com Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin --- drivers/ptp/ptp_clock.c | 18 +++--- include/linux/posix-clock.h | 10 +- kernel/time/posix-clock.c | 34 -- 3 files changed, 36 insertions(+), 26 deletions(-) diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c index 86280b7e41f3..2aa5b37cc6d2 100644 --- a/drivers/ptp/ptp_clock.c +++ b/drivers/ptp/ptp_clock.c @@ -97,30 +97,26 @@ static s32 scaled_ppm_to_ppb(long ppm) /* posix clock implementation */ -static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp) +static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp) { tp->tv_sec = 0; tp->tv_nsec = 1; return 0; } -static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp) +static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp) { struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock); - struct timespec64 ts = timespec_to_timespec64(*tp); - return ptp->info->settime64(ptp->info, ); + return ptp->info->settime64(ptp->info, tp); } -static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp) +static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp) { struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock); - struct timespec64 ts; int err; - err = ptp->info->gettime64(ptp->info, ); - if (!err) - *tp = timespec64_to_timespec(ts); + err = ptp->info->gettime64(ptp->info, tp); return err; } @@ -133,7 +129,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx) ops = ptp->info; if (tx->modes & ADJ_SETOFFSET) { - struct timespec ts; + struct timespec64 ts; ktime_t kt; s64 delta; @@ -146,7 +142,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx) if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC) return -EINVAL; - kt = timespec_to_ktime(ts); + kt = timespec64_to_ktime(ts); delta = ktime_to_ns(kt); err = ops->adjtime(ops, delta); } else if (tx->modes & ADJ_FREQUENCY) { diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h index 34c4498b800f..83b22ae9ae12 100644 --- a/include/linux/posix-clock.h +++ b/include/linux/posix-clock.h @@ -59,23 +59,23 @@ struct posix_clock_operations { int (*clock_adjtime)(struct posix_clock *pc, struct timex *tx); - int (*clock_gettime)(struct posix_clock *pc, struct timespec *ts); + int (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts); - int (*clock_getres) (struct posix_clock *pc, struct timespec *ts); + int (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts); int (*clock_settime)(struct posix_clock *pc, - const struct timespec *ts); + const struct timespec64 *ts); int (*timer_create) (struct posix_clock *pc, struct k_itimer *kit); int (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit); void (*timer_gettime)(struct posix_clock *pc, - struct k_itimer *kit, struct itimerspec *tsp); + struct k_itimer *kit, struct itimerspec64 *tsp); int (*timer_settime)(struct posix_clock *pc, struct k_itimer *kit, int flags, - struct itimerspec *tsp, struct itimerspec *old); + struct itimerspec64 *tsp, struct itimerspec64 *old); /* * Optional character device methods: */ diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c index 9cff0ab82b63..e24008c098c6 100644 ---
[PATCH net-next] liquidio: avoid doing useless work
From: Prasad KannegantiAvoid doing useless work by making sure that the response_list is not empty before scheduling work to process it. Signed-off-by: Prasad Kanneganti Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/request_manager.c | 5 + drivers/net/ethernet/cavium/liquidio/response_manager.c | 6 -- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c index e07d209..2766af0 100644 --- a/drivers/net/ethernet/cavium/liquidio/request_manager.c +++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c @@ -366,6 +366,7 @@ int lio_process_iq_request_list(struct octeon_device *oct, struct octeon_instr_queue *iq, u32 napi_budget) { + struct cavium_wq *cwq = >dma_comp_wq; int reqtype; void *buf; u32 old = iq->flush_index; @@ -450,6 +451,10 @@ lio_process_iq_request_list(struct octeon_device *oct, bytes_compl); iq->flush_index = old; + if (atomic_read(>response_list + [OCTEON_ORDERED_SC_LIST].pending_req_count)) + queue_delayed_work(cwq->wq, >wk.work, msecs_to_jiffies(1)); + return inst_count; } diff --git a/drivers/net/ethernet/cavium/liquidio/response_manager.c b/drivers/net/ethernet/cavium/liquidio/response_manager.c index 3d691c6..fe5b537 100644 --- a/drivers/net/ethernet/cavium/liquidio/response_manager.c +++ b/drivers/net/ethernet/cavium/liquidio/response_manager.c @@ -49,7 +49,6 @@ int octeon_setup_response_list(struct octeon_device *oct) INIT_DELAYED_WORK(>wk.work, oct_poll_req_completion); cwq->wk.ctxptr = oct; oct->cmd_resp_state = OCT_DRV_ONLINE; - queue_delayed_work(cwq->wq, >wk.work, msecs_to_jiffies(50)); return ret; } @@ -164,5 +163,8 @@ static void oct_poll_req_completion(struct work_struct *work) struct cavium_wq *cwq = >dma_comp_wq; lio_process_ordered_list(oct, 0); - queue_delayed_work(cwq->wq, >wk.work, msecs_to_jiffies(50)); + + if (atomic_read(>response_list + [OCTEON_ORDERED_SC_LIST].pending_req_count)) + queue_delayed_work(cwq->wq, >wk.work, msecs_to_jiffies(1)); }
[PATCH net-next] liquidio: Resolved mbox read issue while reading more than one 64bit data
From: Intiyaz BashaCorrected length check when data received in the mbox is more than one 64 bit data value Signed-off-by: Intiyaz Basha Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c index 57af7df..28e74ee 100644 --- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c @@ -87,7 +87,7 @@ int octeon_mbox_read(struct octeon_mbox *mbox) } if (mbox->state & OCTEON_MBOX_STATE_REQUEST_RECEIVING) { - if (mbox->mbox_req.recv_len < msg.s.len) { + if (mbox->mbox_req.recv_len < mbox->mbox_req.msg.s.len) { ret = 0; } else { mbox->state &= ~OCTEON_MBOX_STATE_REQUEST_RECEIVING; @@ -96,7 +96,8 @@ int octeon_mbox_read(struct octeon_mbox *mbox) } } else { if (mbox->state & OCTEON_MBOX_STATE_RESPONSE_RECEIVING) { - if (mbox->mbox_resp.recv_len < msg.s.len) { + if (mbox->mbox_resp.recv_len < + mbox->mbox_resp.msg.s.len) { ret = 0; } else { mbox->state &= -- 1.8.3.1
[PATCH AUTOSEL for 4.4 016/101] time: Change posix clocks ops interfaces to use timespec64
From: Deepa Dinamani[ Upstream commit d340266e19ddb70dbd608f9deedcfb35fdb9d419 ] struct timespec is not y2038 safe on 32 bit machines. The posix clocks apis use struct timespec directly and through struct itimerspec. Replace the posix clock interfaces to use struct timespec64 and struct itimerspec64 instead. Also fix up their implementations accordingly. Note that the clock_getres() interface has also been changed to use timespec64 even though this particular interface is not affected by the y2038 problem. This helps verification for internal kernel code for y2038 readiness by getting rid of time_t/ timeval/ timespec. Signed-off-by: Deepa Dinamani Cc: a...@arndb.de Cc: y2...@lists.linaro.org Cc: netdev@vger.kernel.org Cc: Richard Cochran Cc: john.stu...@linaro.org Link: http://lkml.kernel.org/r/1490555058-4603-3-git-send-email-deepa.ker...@gmail.com Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin --- drivers/ptp/ptp_clock.c | 18 +++--- include/linux/posix-clock.h | 10 +- kernel/time/posix-clock.c | 34 -- 3 files changed, 36 insertions(+), 26 deletions(-) diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c index 2e481b9e8ea5..60a5e0c63a13 100644 --- a/drivers/ptp/ptp_clock.c +++ b/drivers/ptp/ptp_clock.c @@ -97,30 +97,26 @@ static s32 scaled_ppm_to_ppb(long ppm) /* posix clock implementation */ -static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp) +static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp) { tp->tv_sec = 0; tp->tv_nsec = 1; return 0; } -static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp) +static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp) { struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock); - struct timespec64 ts = timespec_to_timespec64(*tp); - return ptp->info->settime64(ptp->info, ); + return ptp->info->settime64(ptp->info, tp); } -static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp) +static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp) { struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock); - struct timespec64 ts; int err; - err = ptp->info->gettime64(ptp->info, ); - if (!err) - *tp = timespec64_to_timespec(ts); + err = ptp->info->gettime64(ptp->info, tp); return err; } @@ -133,7 +129,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx) ops = ptp->info; if (tx->modes & ADJ_SETOFFSET) { - struct timespec ts; + struct timespec64 ts; ktime_t kt; s64 delta; @@ -146,7 +142,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx) if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC) return -EINVAL; - kt = timespec_to_ktime(ts); + kt = timespec64_to_ktime(ts); delta = ktime_to_ns(kt); err = ops->adjtime(ops, delta); } else if (tx->modes & ADJ_FREQUENCY) { diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h index 34c4498b800f..83b22ae9ae12 100644 --- a/include/linux/posix-clock.h +++ b/include/linux/posix-clock.h @@ -59,23 +59,23 @@ struct posix_clock_operations { int (*clock_adjtime)(struct posix_clock *pc, struct timex *tx); - int (*clock_gettime)(struct posix_clock *pc, struct timespec *ts); + int (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts); - int (*clock_getres) (struct posix_clock *pc, struct timespec *ts); + int (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts); int (*clock_settime)(struct posix_clock *pc, - const struct timespec *ts); + const struct timespec64 *ts); int (*timer_create) (struct posix_clock *pc, struct k_itimer *kit); int (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit); void (*timer_gettime)(struct posix_clock *pc, - struct k_itimer *kit, struct itimerspec *tsp); + struct k_itimer *kit, struct itimerspec64 *tsp); int (*timer_settime)(struct posix_clock *pc, struct k_itimer *kit, int flags, - struct itimerspec *tsp, struct itimerspec *old); + struct itimerspec64 *tsp, struct itimerspec64 *old); /* * Optional character device methods: */ diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c index 9cff0ab82b63..e24008c098c6 100644 ---
[PATCH iproute2 net-next v4] iprule: support for ip_proto, sport and dport match options
From: Roopa Prabhuadd support to match on ip_proto, sport and dport ranges. For ip_proto, this patch currently enumerates, tcp, udp and sctp. This list can be extended in the future. example: $ip rule add sport 666-777 dport 999 ip_proto tcp table 100 $ip rule show 0: from all lookup local 32765: from all ip_proto 6 sport 666-777 dport 999 lookup 100 32766: from all lookup main 32767: from all lookup default Signed-off-by: Roopa Prabhu --- v2: use inet_proto_* as suggested by David Ahern v3: fix newlines in usage (feedback from David Ahern) v4: fixes for json (feedback from Stephen H). include/uapi/linux/fib_rules.h | 8 + ip/iprule.c| 67 ++ man/man8/ip-rule.8 | 32 +++- 3 files changed, 106 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h index 77d90ae..1809af5 100644 --- a/include/uapi/linux/fib_rules.h +++ b/include/uapi/linux/fib_rules.h @@ -35,6 +35,11 @@ struct fib_rule_uid_range { __u32 end; }; +struct fib_rule_port_range { + __u16 start; + __u16 end; +}; + enum { FRA_UNSPEC, FRA_DST,/* destination address */ @@ -59,6 +64,9 @@ enum { FRA_L3MDEV, /* iif or oif is l3mdev goto its table */ FRA_UID_RANGE, /* UID range */ FRA_PROTOCOL, /* Originator of the rule */ + FRA_IP_PROTO, /* ip proto */ + FRA_SPORT_RANGE,/* sport range */ + FRA_DPORT_RANGE,/* dport range */ __FRA_MAX }; diff --git a/ip/iprule.c b/ip/iprule.c index a49753e..3520544 100644 --- a/ip/iprule.c +++ b/ip/iprule.c @@ -47,6 +47,9 @@ static void usage(void) "SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK[/MASK] ]\n" "[ iif STRING ] [ oif STRING ] [ pref NUMBER ] [ l3mdev ]\n" "[ uidrange NUMBER-NUMBER ]\n" + "[ ip_proto PROTOCOL ]\n" + "[ sport [ NUMBER | NUMBER-NUMBER ]\n" + "[ dport [ NUMBER | NUMBER-NUMBER ] ]\n" "ACTION := [ table TABLE_ID ]\n" " [ protocol PROTO ]\n" " [ nat ADDRESS ]\n" @@ -306,6 +309,37 @@ int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) print_uint(PRINT_ANY, "uid_end", "-%u ", r->end); } + if (tb[FRA_IP_PROTO]) { + SPRINT_BUF(pbuf); + print_string(PRINT_ANY, "ip_proto", "ip_proto %s ", +inet_proto_n2a(rta_getattr_u8(tb[FRA_IP_PROTO]), + pbuf, sizeof(pbuf))); + } + + if (tb[FRA_SPORT_RANGE]) { + struct fib_rule_port_range *r = RTA_DATA(tb[FRA_SPORT_RANGE]); + + if (r->start == r->end) { + print_uint(PRINT_ANY, "sport", "sport %u ", r->start); + } else { + print_uint(PRINT_ANY, "sport_start", "sport %u", + r->start); + print_uint(PRINT_ANY, "sport_end", "-%u ", r->end); + } + } + + if (tb[FRA_DPORT_RANGE]) { + struct fib_rule_port_range *r = RTA_DATA(tb[FRA_DPORT_RANGE]); + + if (r->start == r->end) { + print_uint(PRINT_ANY, "dport", "dport %u ", r->start); + } else { + print_uint(PRINT_ANY, "dport_start", "dport %u", + r->start); + print_uint(PRINT_ANY, "dport_end", "-%u ", r->end); + } + } + table = frh_get_table(frh, tb); if (table) { print_string(PRINT_ANY, "table", @@ -802,6 +836,39 @@ static int iprule_modify(int cmd, int argc, char **argv) addattr32(, sizeof(req), RTA_GATEWAY, get_addr32(*argv)); req.frh.action = RTN_NAT; + } else if (strcmp(*argv, "ip_proto") == 0) { + __u8 ip_proto; + + NEXT_ARG(); + ip_proto = inet_proto_a2n(*argv); + if (ip_proto < 0) + invarg("Invalid \"ip_proto\" value\n", + *argv); + addattr8(, sizeof(req), FRA_IP_PROTO, ip_proto); + } else if (strcmp(*argv, "sport") == 0) { + struct fib_rule_port_range r; + int ret = 0; + + NEXT_ARG(); + ret = sscanf(*argv, "%hu-%hu", , ); + if (ret == 1) + r.end = r.start; + else if (ret
[PATCH 2/3] net: Remove accidental VLAs from proc buffers
In the quest to remove all stack VLAs from the kernel[1], this refactors the stack array size calculation to avoid using max(), which makes the compiler think the size isn't fixed. [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Kees Cook--- net/ipv4/proc.c | 10 -- net/ipv6/proc.c | 10 -- 2 files changed, 8 insertions(+), 12 deletions(-) diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index dc5edc8f7564..c23c43803435 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -46,8 +46,6 @@ #include #include -#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX) - /* * Report socket allocation statistics [m...@utu.fi] */ @@ -400,11 +398,11 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, void *v) static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) { - unsigned long buff[TCPUDP_MIB_MAX]; + unsigned long buff[SIMPLE_MAX(UDP_MIB_MAX, TCP_MIB_MAX)]; struct net *net = seq->private; int i; - memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); + memset(buff, 0, sizeof(buff)); seq_puts(seq, "\nTcp:"); for (i = 0; snmp4_tcp_list[i].name; i++) @@ -421,7 +419,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) seq_printf(seq, " %lu", buff[i]); } - memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); + memset(buff, 0, sizeof(buff)); snmp_get_cpu_field_batch(buff, snmp4_udp_list, net->mib.udp_statistics); @@ -432,7 +430,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) for (i = 0; snmp4_udp_list[i].name; i++) seq_printf(seq, " %lu", buff[i]); - memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); + memset(buff, 0, sizeof(buff)); /* the UDP and UDP-Lite MIBs are the same */ seq_puts(seq, "\nUdpLite:"); diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c index b67814242f78..5b0874c26802 100644 --- a/net/ipv6/proc.c +++ b/net/ipv6/proc.c @@ -30,10 +30,8 @@ #include #include -#define MAX4(a, b, c, d) \ - max_t(u32, max_t(u32, a, b), max_t(u32, c, d)) -#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \ - IPSTATS_MIB_MAX, ICMP_MIB_MAX) +#define SNMP_MIB_MAX SIMPLE_MAX(SIMPLE_MAX(UDP_MIB_MAX, TCP_MIB_MAX), \ + SIMPLE_MAX(IPSTATS_MIB_MAX, ICMP_MIB_MAX)) static int sockstat6_seq_show(struct seq_file *seq, void *v) { @@ -199,7 +197,7 @@ static void snmp6_seq_show_item(struct seq_file *seq, void __percpu *pcpumib, int i; if (pcpumib) { - memset(buff, 0, sizeof(unsigned long) * SNMP_MIB_MAX); + memset(buff, 0, sizeof(buff)); snmp_get_cpu_field_batch(buff, itemlist, pcpumib); for (i = 0; itemlist[i].name; i++) @@ -218,7 +216,7 @@ static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu *mib, u64 buff64[SNMP_MIB_MAX]; int i; - memset(buff64, 0, sizeof(u64) * SNMP_MIB_MAX); + memset(buff64, 0, sizeof(buff64)); snmp_get_cpu_field64_batch(buff64, itemlist, mib, syncpoff); for (i = 0; itemlist[i].name; i++) -- 2.7.4
[PATCH 3/3] btrfs: tree-checker: Avoid accidental stack VLA
In the quest to remove all stack VLAs from the kernel[1], this refactors the stack array size calculation to avoid using max(), which makes the compiler think the size isn't fixed. [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Kees Cook--- fs/btrfs/tree-checker.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index c3c8d48f6618..59bd07694118 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -341,7 +341,8 @@ static int check_dir_item(struct btrfs_root *root, */ if (key->type == BTRFS_DIR_ITEM_KEY || key->type == BTRFS_XATTR_ITEM_KEY) { - char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)]; + char namebuf[SIMPLE_MAX(BTRFS_NAME_LEN, + XATTR_NAME_MAX)]; read_extent_buffer(leaf, namebuf, (unsigned long)(di + 1), name_len); -- 2.7.4
[PATCH 0/3] Remove accidental VLA usage
This series adds SIMPLE_MAX() to be used in places where a stack array is actually fixed, but the compiler still warns about VLA usage due to confusion caused by the safety checks in the max() macro. I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(), and they should all have no operational differences. -Kees
[PATCH v2 1/3] vsprintf: Remove accidental VLA usage
In the quest to remove all stack VLAs from the kernel[1], this introduces a new "simple max" macro, and changes the "sym" array size calculation to use it. The value is actually a fixed size, but since the max() macro uses some extensive tricks for safety, it ends up looking like a variable size to the compiler. [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Kees Cook--- include/linux/kernel.h | 11 +++ lib/vsprintf.c | 4 ++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 3fd291503576..1da554e9997f 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -820,6 +820,17 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { } x, y) /** + * SIMPLE_MAX - return maximum of two values without any type checking + * @x: first value + * @y: second value + * + * This should only be used in stack array sizes, since the type-checking + * from max() confuses the compiler into thinking a VLA is being used. + */ +#define SIMPLE_MAX(x, y) ((size_t)(x) > (size_t)(y) ? (size_t)(x) \ + : (size_t)(y)) + +/** * min3 - return minimum of three values * @x: first value * @y: second value diff --git a/lib/vsprintf.c b/lib/vsprintf.c index d7a708f82559..50cce36e1cdc 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -744,8 +744,8 @@ char *resource_string(char *buf, char *end, struct resource *res, #define FLAG_BUF_SIZE (2 * sizeof(res->flags)) #define DECODED_BUF_SIZE sizeof("[mem - 64bit pref window disabled]") #define RAW_BUF_SIZE sizeof("[mem - flags 0x]") - char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE, -2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)]; + char sym[SIMPLE_MAX(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE, + 2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)]; char *p = sym, *pend = sym + sizeof(sym); int decode = (fmt[0] == 'R') ? 1 : 0; -- 2.7.4
[PATCH net-next 01/23] {topost} net: hns3: VF should get the real rss_size instead of rss_size_max
VF driver should get the real rss_size which is assigned by host PF, not rss_size_max. Signed-off-by: Peng Li--- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index f38fc5c..31383a6 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -291,7 +291,7 @@ static int hclge_get_vf_queue_info(struct hclge_vport *vport, /* get the queue related info */ memcpy(_data[0], >alloc_tqps, sizeof(u16)); - memcpy(_data[2], >rss_size_max, sizeof(u16)); + memcpy(_data[2], >nic.kinfo.rss_size, sizeof(u16)); memcpy(_data[4], >num_desc, sizeof(u16)); memcpy(_data[6], >rx_buf_len, sizeof(u16)); -- 2.9.3
[PATCH net-next 06/23] {topost} net: hns3: fix for ipv6 address loss problem after setting channels
From: Fuyun LiangThe function of dev_close and dev_open is just likes ifconfig down and ifconfig up. The ipv6 address will be lost after dev_close and dev_open are called. This patch uses hns3_nic_net_stop to replace dev_close and uses hns3_nic_net_open to replace dev_open. Signed-off-by: Fuyun Liang Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 1bebfd9..83f4b36 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3411,7 +3411,7 @@ int hns3_set_channels(struct net_device *netdev, return 0; if (if_running) - dev_close(netdev); + hns3_nic_net_stop(netdev); hns3_clear_all_ring(h); @@ -3440,7 +3440,7 @@ int hns3_set_channels(struct net_device *netdev, open_netdev: if (if_running) - dev_open(netdev); + hns3_nic_net_open(netdev); return ret; } -- 2.9.3
[PATCH net-next 12/23] {topost} net: hns3: fix for RSS configuration loss problem during reset
From: Yunsheng LinRSS configuration will be set to default value by hclge_rss_init_hw during reset, which causes the RSS configuration loss problem. This patch fixes it by setting the default value in hclge_rss_init_cfg function, which will not be called in the reset process. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 2 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 107 ++--- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h| 1 + 3 files changed, 56 insertions(+), 54 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c index c5270b5..955f0e3 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c @@ -144,6 +144,8 @@ static int hclge_map_update(struct hnae3_handle *h) if (ret) return ret; + hclge_rss_indir_init_cfg(hdev); + return hclge_rss_init_hw(hdev); } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 0271960..1d69470 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3329,67 +3329,28 @@ static int hclge_get_tc_size(struct hnae3_handle *handle) int hclge_rss_init_hw(struct hclge_dev *hdev) { - const u8 hfunc = HCLGE_RSS_HASH_ALGO_TOEPLITZ; struct hclge_vport *vport = hdev->vport; + u8 *rss_indir = vport[0].rss_indirection_tbl; + u16 rss_size = vport[0].alloc_rss_size; + u8 *key = vport[0].rss_hash_key; + u8 hfunc = vport[0].rss_algo; u16 tc_offset[HCLGE_MAX_TC_NUM]; - u8 rss_key[HCLGE_RSS_KEY_SIZE]; u16 tc_valid[HCLGE_MAX_TC_NUM]; u16 tc_size[HCLGE_MAX_TC_NUM]; - u32 *rss_indir = NULL; - u16 rss_size = 0, roundup_size; - const u8 *key; - int i, ret, j; - - rss_indir = kcalloc(HCLGE_RSS_IND_TBL_SIZE, sizeof(u32), GFP_KERNEL); - if (!rss_indir) - return -ENOMEM; - - /* Get default RSS key */ - netdev_rss_key_fill(rss_key, HCLGE_RSS_KEY_SIZE); - - /* Initialize RSS indirect table for each vport */ - for (j = 0; j < hdev->num_vmdq_vport + 1; j++) { - vport[j].rss_tuple_sets.ipv4_tcp_en = - HCLGE_RSS_INPUT_TUPLE_OTHER; - vport[j].rss_tuple_sets.ipv4_udp_en = - HCLGE_RSS_INPUT_TUPLE_OTHER; - vport[j].rss_tuple_sets.ipv4_sctp_en = - HCLGE_RSS_INPUT_TUPLE_SCTP; - vport[j].rss_tuple_sets.ipv4_fragment_en = - HCLGE_RSS_INPUT_TUPLE_OTHER; - vport[j].rss_tuple_sets.ipv6_tcp_en = - HCLGE_RSS_INPUT_TUPLE_OTHER; - vport[j].rss_tuple_sets.ipv6_udp_en = - HCLGE_RSS_INPUT_TUPLE_OTHER; - vport[j].rss_tuple_sets.ipv6_sctp_en = - HCLGE_RSS_INPUT_TUPLE_SCTP; - vport[j].rss_tuple_sets.ipv6_fragment_en = - HCLGE_RSS_INPUT_TUPLE_OTHER; - - for (i = 0; i < HCLGE_RSS_IND_TBL_SIZE; i++) { - vport[j].rss_indirection_tbl[i] = - i % vport[j].alloc_rss_size; - - /* vport 0 is for PF */ - if (j != 0) - continue; + u16 roundup_size; + int i, ret; - rss_size = vport[j].alloc_rss_size; - rss_indir[i] = vport[j].rss_indirection_tbl[i]; - } - } ret = hclge_set_rss_indir_table(hdev, rss_indir); if (ret) - goto err; + return ret; - key = rss_key; ret = hclge_set_rss_algo_key(hdev, hfunc, key); if (ret) - goto err; + return ret; ret = hclge_set_rss_input_tuple(hdev); if (ret) - goto err; + return ret; /* Each TC have the same queue size, and tc_size set to hardware is * the log2 of roundup power of two of rss_size, the acutal queue @@ -3399,8 +3360,7 @@ int hclge_rss_init_hw(struct hclge_dev *hdev) dev_err(>pdev->dev, "Configure rss tc size failed, invalid TC_SIZE = %d\n", rss_size); - ret = -EINVAL; - goto err; + return -EINVAL; } roundup_size = roundup_pow_of_two(rss_size); @@ -3417,12 +3377,50 @@ int hclge_rss_init_hw(struct hclge_dev *hdev) tc_offset[i] = rss_size * i; } - ret = hclge_set_rss_tc_mode(hdev, tc_valid, tc_size, tc_offset); +
Re: [PATCHv2 net-next] openvswitch: fix vport packet length check.
On Wed, Mar 7, 2018 at 3:38 PM, William Tuwrote: > When sending a packet to a tunnel device, the dev's hard_header_len > could be larger than the skb->len in function packet_length(). > In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen, > which is around 180, and an ARP packet sent to this tunnel has > skb->len = 42. This causes the 'unsign int length' to become super > large because it is negative value, causing the later ovs_vport_send > to drop it due to over-mtu size. The patch fixes it by setting it to 0. > > Signed-off-by: William Tu > --- > v1->v2: > replace the return type from unsigned int to int > --- Acked-by: Pravin B Shelar
[PATCH net-next 05/23] {topost} net: hns3: fix for netdev not running problem after calling net_stop and net_open
From: Fuyun LiangThe link status update function is called by timer every second. But net_stop and net_open may be called with very short intervals. The link status update function can not detect the link state has changed. It causes the netdev not running problem. This patch fixes it by updating the link state in ae_stop function. Signed-off-by: Fuyun Liang Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 +++ drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 0b74461..83be4d5 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3770,6 +3770,9 @@ static void hclge_ae_stop(struct hnae3_handle *handle) /* reset tqp stats */ hclge_reset_tqp_stats(handle); + del_timer_sync(>service_timer); + cancel_work_sync(>service_task); + hclge_update_link_status(hdev); } static int hclge_get_mac_vlan_cmd_status(struct hclge_vport *vport, diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index ccb6756..eee5e20 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -1061,6 +1061,9 @@ static void hclgevf_ae_stop(struct hnae3_handle *handle) /* reset tqp stats */ hclgevf_reset_tqp_stats(handle); + del_timer_sync(>service_timer); + cancel_work_sync(>service_task); + hclgevf_update_link_status(hdev, 0); } static void hclgevf_state_init(struct hclgevf_dev *hdev) -- 2.9.3
[PATCH net-next 00/23] net: hns3: HNS3 bug fixes & code improvements
This patch-set introduces various HNS3 bug fixes, optimizations and code improvements. Fuyun Liang (4): {topost} net: hns3: add existence check when remove old uc mac address {topost} net: hns3: fix for netdev not running problem after calling net_stop and net_open {topost} net: hns3: fix for ipv6 address loss problem after setting channels {topost} net: hns3: unify the pause params setup function Peng Li (8): {topost} net: hns3: VF should get the real rss_size instead of rss_size_max {topost} net: hns3: set the cmdq out_vld bit to 0 after used {topost} net: hns3: fix endian issue when PF get mbx message flag {topost} net: hns3: fix rx path skb->truesize reporting bug {topost} net: hns3: Add support for querying pfc puase packets statistic {topost} net: hns3: fix the queue id for tqp enable& {topost} net: hns3: set the max ring num when alloc netdev {topost} net: hns3: add support for VF driver inner interface hclgevf_ops.get_tqps_and_rss_info Yunsheng Lin (11): {topost} net: hns3: Refactor the hclge_get/set_rss function {topost} net: hns3: Refactor the hclge_get/set_rss_tuple function {topost} net: hns3: Fix for RSS configuration loss problem during reset {topost} net: hns3: Fix for pause configuration lost during reset {topost} net: hns3: Fix for use-after-free when setting ring parameter {topost} net: hns3: Refactor the get/put_vector function {topost} net: hns3: Fix for coalesce configuration lost during reset {topost} net: hns3: Refactor the coalesce related struct {topost} net: hns3: Fix for coal configuation lost when setting the channel {topost} net: hns3: Fix for loopback failure when vlan filter is enable {topost} net: hns3: Fix for buffer overflow smatch warning drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h| 2 + drivers/net/ethernet/hisilicon/hns3/hnae3.h| 6 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 286 +-- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h| 10 +- drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 42 ++- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 16 ++ .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 307 +++-- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h| 16 ++ .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 31 ++- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 76 - .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h | 8 +- .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 95 --- .../ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c | 1 + 13 files changed, 574 insertions(+), 322 deletions(-) -- 2.9.3
[PATCH net-next 08/23] {topost} net: hns3: fix rx path skb->truesize reporting bug
Original skb->truesize reports the received packet size, not the actual buffer size NIC driver allocated(1 Page). The linux net protocol will misjudge the true size of rx queue. Signed-off-by: Peng Li--- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 83f4b36..f50245d 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -2064,15 +2064,13 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i, desc = >desc[ring->next_to_clean]; size = le16_to_cpu(desc->rx.size); - if (twobufs) { - truesize = hnae_buf_size(ring); - } else { - truesize = ALIGN(size, L1_CACHE_BYTES); + truesize = hnae_buf_size(ring); + + if (!twobufs) last_offset = hnae_page_size(ring) - hnae_buf_size(ring); - } skb_add_rx_frag(skb, i, desc_cb->priv, desc_cb->page_offset + pull_len, - size - pull_len, truesize - pull_len); + size - pull_len, truesize); /* Avoid re-using remote pages,flag default unreuse */ if (unlikely(page_to_nid(desc_cb->priv) != numa_node_id())) -- 2.9.3
[PATCH net-next 14/23] {topost} net: hns3: fix for use-after-free when setting ring parameter
From: Yunsheng LinIn hns3_set_ringparam, hns3_uninit_all_ring frees the memory pointed by priv->ring_data[i].ring, and hns3_change_all_ring_bd_num use that pointer without mallocing, which will cause a use-after-free problem. The patch fixes it by not freeing the memory in hns3_uninit_all_ring, and uses hns3_put_ring_config to free it when necessary. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index f50245d..2bed73e 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -2955,13 +2955,8 @@ int hns3_uninit_all_ring(struct hns3_nic_priv *priv) h->ae_algo->ops->reset_queue(h, i); hns3_fini_ring(priv->ring_data[i].ring); - devm_kfree(priv->dev, priv->ring_data[i].ring); hns3_fini_ring(priv->ring_data[i + h->kinfo.num_tqps].ring); - devm_kfree(priv->dev, - priv->ring_data[i + h->kinfo.num_tqps].ring); } - devm_kfree(priv->dev, priv->ring_data); - return 0; } @@ -3099,6 +3094,8 @@ static void hns3_client_uninit(struct hnae3_handle *handle, bool reset) if (ret) netdev_err(netdev, "uninit ring error\n"); + hns3_put_ring_config(priv); + priv->ring_data = NULL; free_netdev(netdev); @@ -3304,6 +3301,8 @@ static int hns3_reset_notify_uninit_enet(struct hnae3_handle *handle) if (ret) netdev_err(netdev, "uninit ring error\n"); + hns3_put_ring_config(priv); + priv->ring_data = NULL; return ret; @@ -3421,6 +3420,7 @@ int hns3_set_channels(struct net_device *netdev, } hns3_uninit_all_ring(priv); + hns3_put_ring_config(priv); org_tqp_num = h->kinfo.num_tqps; ret = hns3_modify_tqp_num(netdev, new_tqp_num); -- 2.9.3
[PATCH net-next 11/23] {topost} net: hns3: refactor the hclge_get/set_rss_tuple function
From: Yunsheng LinThis patch refactors the hclge_get/set_rss_tuple function in order to fix the rss configuration loss problem during reset process. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 91 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h| 13 2 files changed, 67 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 9ba012b..0271960 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3091,14 +3091,16 @@ static int hclge_set_rss_input_tuple(struct hclge_dev *hdev) hclge_cmd_setup_basic_desc(, HCLGE_OPC_RSS_INPUT_TUPLE, false); req = (struct hclge_rss_input_tuple_cmd *)desc.data; - req->ipv4_tcp_en = HCLGE_RSS_INPUT_TUPLE_OTHER; - req->ipv4_udp_en = HCLGE_RSS_INPUT_TUPLE_OTHER; - req->ipv4_sctp_en = HCLGE_RSS_INPUT_TUPLE_SCTP; - req->ipv4_fragment_en = HCLGE_RSS_INPUT_TUPLE_OTHER; - req->ipv6_tcp_en = HCLGE_RSS_INPUT_TUPLE_OTHER; - req->ipv6_udp_en = HCLGE_RSS_INPUT_TUPLE_OTHER; - req->ipv6_sctp_en = HCLGE_RSS_INPUT_TUPLE_SCTP; - req->ipv6_fragment_en = HCLGE_RSS_INPUT_TUPLE_OTHER; + + /* Get the tuple cfg from pf */ + req->ipv4_tcp_en = hdev->vport[0].rss_tuple_sets.ipv4_tcp_en; + req->ipv4_udp_en = hdev->vport[0].rss_tuple_sets.ipv4_udp_en; + req->ipv4_sctp_en = hdev->vport[0].rss_tuple_sets.ipv4_sctp_en; + req->ipv4_fragment_en = hdev->vport[0].rss_tuple_sets.ipv4_fragment_en; + req->ipv6_tcp_en = hdev->vport[0].rss_tuple_sets.ipv6_tcp_en; + req->ipv6_udp_en = hdev->vport[0].rss_tuple_sets.ipv6_udp_en; + req->ipv6_sctp_en = hdev->vport[0].rss_tuple_sets.ipv6_sctp_en; + req->ipv6_fragment_en = hdev->vport[0].rss_tuple_sets.ipv6_fragment_en; ret = hclge_cmd_send(>hw, , 1); if (ret) { dev_err(>pdev->dev, @@ -3204,15 +3206,16 @@ static int hclge_set_rss_tuple(struct hnae3_handle *handle, return -EINVAL; req = (struct hclge_rss_input_tuple_cmd *)desc.data; - hclge_cmd_setup_basic_desc(, HCLGE_OPC_RSS_INPUT_TUPLE, true); - ret = hclge_cmd_send(>hw, , 1); - if (ret) { - dev_err(>pdev->dev, - "Read rss tuple fail, status = %d\n", ret); - return ret; - } + hclge_cmd_setup_basic_desc(, HCLGE_OPC_RSS_INPUT_TUPLE, false); - hclge_cmd_reuse_desc(, false); + req->ipv4_tcp_en = vport->rss_tuple_sets.ipv4_tcp_en; + req->ipv4_udp_en = vport->rss_tuple_sets.ipv4_udp_en; + req->ipv4_sctp_en = vport->rss_tuple_sets.ipv4_sctp_en; + req->ipv4_fragment_en = vport->rss_tuple_sets.ipv4_fragment_en; + req->ipv6_tcp_en = vport->rss_tuple_sets.ipv6_tcp_en; + req->ipv6_udp_en = vport->rss_tuple_sets.ipv6_udp_en; + req->ipv6_sctp_en = vport->rss_tuple_sets.ipv6_sctp_en; + req->ipv6_fragment_en = vport->rss_tuple_sets.ipv6_fragment_en; tuple_sets = hclge_get_rss_hash_bits(nfc); switch (nfc->flow_type) { @@ -3249,52 +3252,49 @@ static int hclge_set_rss_tuple(struct hnae3_handle *handle, } ret = hclge_cmd_send(>hw, , 1); - if (ret) + if (ret) { dev_err(>pdev->dev, "Set rss tuple fail, status = %d\n", ret); + return ret; + } - return ret; + vport->rss_tuple_sets.ipv4_tcp_en = req->ipv4_tcp_en; + vport->rss_tuple_sets.ipv4_udp_en = req->ipv4_udp_en; + vport->rss_tuple_sets.ipv4_sctp_en = req->ipv4_sctp_en; + vport->rss_tuple_sets.ipv4_fragment_en = req->ipv4_fragment_en; + vport->rss_tuple_sets.ipv6_tcp_en = req->ipv6_tcp_en; + vport->rss_tuple_sets.ipv6_udp_en = req->ipv6_udp_en; + vport->rss_tuple_sets.ipv6_sctp_en = req->ipv6_sctp_en; + vport->rss_tuple_sets.ipv6_fragment_en = req->ipv6_fragment_en; + return 0; } static int hclge_get_rss_tuple(struct hnae3_handle *handle, struct ethtool_rxnfc *nfc) { struct hclge_vport *vport = hclge_get_vport(handle); - struct hclge_dev *hdev = vport->back; - struct hclge_rss_input_tuple_cmd *req; - struct hclge_desc desc; u8 tuple_sets; - int ret; nfc->data = 0; - req = (struct hclge_rss_input_tuple_cmd *)desc.data; - hclge_cmd_setup_basic_desc(, HCLGE_OPC_RSS_INPUT_TUPLE, true); - ret = hclge_cmd_send(>hw, , 1); - if (ret) { - dev_err(>pdev->dev, - "Read rss tuple fail, status = %d\n", ret); - return ret; - } - switch (nfc->flow_type) { case TCP_V4_FLOW: -
[PATCH net-next 07/23] {topost} net: hns3: unify the pause params setup function
From: Fuyun LiangSince the firmware cmd to setup mac pause params is the same as the firmware cmd to pfc pause params, this patch unifies the pause params setup function. Signed-off-by: Fuyun Liang Signed-off-by: Peng Li --- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 2 +- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 23 +++--- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h | 2 +- 3 files changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 83be4d5..cc72ed8 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -4431,7 +4431,7 @@ static int hclge_set_mac_addr(struct hnae3_handle *handle, void *p, return -EIO; } - ret = hclge_mac_pause_addr_cfg(hdev, new_addr); + ret = hclge_pause_addr_cfg(hdev, new_addr); if (ret) { dev_err(>pdev->dev, "configure mac pause address fail, ret =%d.\n", diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c index 36bd79a..4134a82 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c @@ -138,8 +138,8 @@ static int hclge_pfc_pause_en_cfg(struct hclge_dev *hdev, u8 tx_rx_bitmap, return hclge_cmd_send(>hw, , 1); } -static int hclge_mac_pause_param_cfg(struct hclge_dev *hdev, const u8 *addr, -u8 pause_trans_gap, u16 pause_trans_time) +static int hclge_pause_param_cfg(struct hclge_dev *hdev, const u8 *addr, +u8 pause_trans_gap, u16 pause_trans_time) { struct hclge_cfg_pause_param_cmd *pause_param; struct hclge_desc desc; @@ -155,7 +155,7 @@ static int hclge_mac_pause_param_cfg(struct hclge_dev *hdev, const u8 *addr, return hclge_cmd_send(>hw, , 1); } -int hclge_mac_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr) +int hclge_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr) { struct hclge_cfg_pause_param_cmd *pause_param; struct hclge_desc desc; @@ -174,7 +174,7 @@ int hclge_mac_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr) trans_gap = pause_param->pause_trans_gap; trans_time = le16_to_cpu(pause_param->pause_trans_time); - return hclge_mac_pause_param_cfg(hdev, mac_addr, trans_gap, + return hclge_pause_param_cfg(hdev, mac_addr, trans_gap, trans_time); } @@ -1096,11 +1096,11 @@ static int hclge_tm_schd_setup_hw(struct hclge_dev *hdev) return hclge_tm_schd_mode_hw(hdev); } -static int hclge_mac_pause_param_setup_hw(struct hclge_dev *hdev) +static int hclge_pause_param_setup_hw(struct hclge_dev *hdev) { struct hclge_mac *mac = >hw.mac; - return hclge_mac_pause_param_cfg(hdev, mac->mac_addr, + return hclge_pause_param_cfg(hdev, mac->mac_addr, HCLGE_DEFAULT_PAUSE_TRANS_GAP, HCLGE_DEFAULT_PAUSE_TRANS_TIME); } @@ -1151,13 +1151,12 @@ int hclge_pause_setup_hw(struct hclge_dev *hdev) int ret; u8 i; - if (hdev->tm_info.fc_mode != HCLGE_FC_PFC) { - ret = hclge_mac_pause_setup_hw(hdev); - if (ret) - return ret; + ret = hclge_pause_param_setup_hw(hdev); + if (ret) + return ret; - return hclge_mac_pause_param_setup_hw(hdev); - } + if (hdev->tm_info.fc_mode != HCLGE_FC_PFC) + return hclge_mac_pause_setup_hw(hdev); /* Only DCB-supported dev supports qset back pressure and pfc cmd */ if (!hnae3_dev_dcb_supported(hdev)) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h index 5401e75..c30c85b 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h @@ -129,5 +129,5 @@ int hclge_tm_dwrr_cfg(struct hclge_dev *hdev); int hclge_tm_map_cfg(struct hclge_dev *hdev); int hclge_tm_init_hw(struct hclge_dev *hdev); int hclge_mac_pause_en_cfg(struct hclge_dev *hdev, bool tx, bool rx); -int hclge_mac_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr); +int hclge_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr); #endif -- 2.9.3
[PATCH net-next 13/23] {topost} net: hns3: fix for pause configuration lost during reset
From: Yunsheng LinPause configuration will be set to default value by hclge_tm_schd_init during reset, which causes the RSS configuration loss problem. This patch fixes it by calling hclge_tm_init_hw during reset process , which will set the pause configuration to default value. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 1d69470..c0f6939 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -5495,9 +5495,9 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev) return ret; } - ret = hclge_tm_schd_init(hdev); + ret = hclge_tm_init_hw(hdev); if (ret) { - dev_err(>dev, "tm schd init fail, ret =%d\n", ret); + dev_err(>dev, "tm init hw fail, ret =%d\n", ret); return ret; } -- 2.9.3
[PATCH net-next 02/23] {topost} net: hns3: add existence check when remove old uc mac address
From: Fuyun LiangWhen driver is in initial state, the mac_vlan table table is empty. So the delete operation for mac address must fail. Existence check is needed here. Otherwise, the error message will make user confused. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hnae3.h | 3 ++- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++-- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 17 +++-- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 2 ++ .../net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 10 +++--- 5 files changed, 20 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h index fd06bc7..3c653eb 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h @@ -336,7 +336,8 @@ struct hnae3_ae_ops { u32 *tx_usecs_high, u32 *rx_usecs_high); void (*get_mac_addr)(struct hnae3_handle *handle, u8 *p); - int (*set_mac_addr)(struct hnae3_handle *handle, void *p); + int (*set_mac_addr)(struct hnae3_handle *handle, void *p, + bool is_first); int (*add_uc_addr)(struct hnae3_handle *handle, const unsigned char *addr); int (*rm_uc_addr)(struct hnae3_handle *handle, diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 601b629..1bebfd9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -1104,7 +1104,7 @@ static int hns3_nic_net_set_mac_address(struct net_device *netdev, void *p) if (!mac_addr || !is_valid_ether_addr((const u8 *)mac_addr->sa_data)) return -EADDRNOTAVAIL; - ret = h->ae_algo->ops->set_mac_addr(h, mac_addr->sa_data); + ret = h->ae_algo->ops->set_mac_addr(h, mac_addr->sa_data, false); if (ret) { netdev_err(netdev, "set_mac_address fail, ret=%d!\n", ret); return ret; @@ -2987,7 +2987,7 @@ static void hns3_init_mac_addr(struct net_device *netdev) } if (h->ae_algo->ops->set_mac_addr) - h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr); + h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr, true); } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 32bc6f6..0b74461 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -4392,7 +4392,8 @@ static void hclge_get_mac_addr(struct hnae3_handle *handle, u8 *p) ether_addr_copy(p, hdev->hw.mac.mac_addr); } -static int hclge_set_mac_addr(struct hnae3_handle *handle, void *p) +static int hclge_set_mac_addr(struct hnae3_handle *handle, void *p, + bool is_first) { const unsigned char *new_addr = (const unsigned char *)p; struct hclge_vport *vport = hclge_get_vport(handle); @@ -4409,11 +4410,9 @@ static int hclge_set_mac_addr(struct hnae3_handle *handle, void *p) return -EINVAL; } - ret = hclge_rm_uc_addr(handle, hdev->hw.mac.mac_addr); - if (ret) + if (!is_first && hclge_rm_uc_addr(handle, hdev->hw.mac.mac_addr)) dev_warn(>pdev->dev, -"remove old uc mac address fail, ret =%d.\n", -ret); +"remove old uc mac address fail.\n"); ret = hclge_add_uc_addr(handle, new_addr); if (ret) { @@ -4421,12 +4420,10 @@ static int hclge_set_mac_addr(struct hnae3_handle *handle, void *p) "add uc mac address fail, ret =%d.\n", ret); - ret = hclge_add_uc_addr(handle, hdev->hw.mac.mac_addr); - if (ret) { + if (!is_first && + hclge_add_uc_addr(handle, hdev->hw.mac.mac_addr)) dev_err(>pdev->dev, - "restore uc mac address fail, ret =%d.\n", - ret); - } + "restore uc mac address fail.\n"); return -EIO; } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index 31383a6..ea78a99 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -196,6 +196,8 @@ static int hclge_set_vf_uc_mac_addr(struct hclge_vport *vport,
[PATCH net-next 10/23] {topost} net: hns3: refactor the hclge_get/set_rss function
From: Yunsheng LinThis patch refactors the hclge_get/set_rss function in order to fix the rss configuration loss problem during reset process. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 39 -- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h| 2 ++ 2 files changed, 9 insertions(+), 32 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index cc72ed8..9ba012b 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -2979,31 +2979,6 @@ static u32 hclge_get_rss_indir_size(struct hnae3_handle *handle) return HCLGE_RSS_IND_TBL_SIZE; } -static int hclge_get_rss_algo(struct hclge_dev *hdev) -{ - struct hclge_rss_config_cmd *req; - struct hclge_desc desc; - int rss_hash_algo; - int ret; - - hclge_cmd_setup_basic_desc(, HCLGE_OPC_RSS_GENERIC_CONFIG, true); - - ret = hclge_cmd_send(>hw, , 1); - if (ret) { - dev_err(>pdev->dev, - "Get link status error, status =%d\n", ret); - return ret; - } - - req = (struct hclge_rss_config_cmd *)desc.data; - rss_hash_algo = (req->hash_config & HCLGE_RSS_HASH_ALGO_MASK); - - if (rss_hash_algo == HCLGE_RSS_HASH_ALGO_TOEPLITZ) - return ETH_RSS_HASH_TOP; - - return -EINVAL; -} - static int hclge_set_rss_algo_key(struct hclge_dev *hdev, const u8 hfunc, const u8 *key) { @@ -3042,7 +3017,7 @@ static int hclge_set_rss_algo_key(struct hclge_dev *hdev, return 0; } -static int hclge_set_rss_indir_table(struct hclge_dev *hdev, const u32 *indir) +static int hclge_set_rss_indir_table(struct hclge_dev *hdev, const u8 *indir) { struct hclge_rss_indirection_table_cmd *req; struct hclge_desc desc; @@ -3138,12 +3113,11 @@ static int hclge_get_rss(struct hnae3_handle *handle, u32 *indir, u8 *key, u8 *hfunc) { struct hclge_vport *vport = hclge_get_vport(handle); - struct hclge_dev *hdev = vport->back; int i; /* Get hash algorithm */ if (hfunc) - *hfunc = hclge_get_rss_algo(hdev); + *hfunc = vport->rss_algo; /* Get the RSS Key required by the user */ if (key) @@ -3167,8 +3141,6 @@ static int hclge_set_rss(struct hnae3_handle *handle, const u32 *indir, /* Set the RSS Hash Key if specififed by the user */ if (key) { - /* Update the shadow RSS key with user specified qids */ - memcpy(vport->rss_hash_key, key, HCLGE_RSS_KEY_SIZE); if (hfunc == ETH_RSS_HASH_TOP || hfunc == ETH_RSS_HASH_NO_CHANGE) @@ -3178,6 +3150,10 @@ static int hclge_set_rss(struct hnae3_handle *handle, const u32 *indir, ret = hclge_set_rss_algo_key(hdev, hash_algo, key); if (ret) return ret; + + /* Update the shadow RSS key with user specified qids */ + memcpy(vport->rss_hash_key, key, HCLGE_RSS_KEY_SIZE); + vport->rss_algo = hash_algo; } /* Update the shadow RSS table with user specified qids */ @@ -3185,8 +3161,7 @@ static int hclge_set_rss(struct hnae3_handle *handle, const u32 *indir, vport->rss_indirection_tbl[i] = indir[i]; /* Update the hardware */ - ret = hclge_set_rss_indir_table(hdev, indir); - return ret; + return hclge_set_rss_indir_table(hdev, vport->rss_indirection_tbl); } static u8 hclge_get_rss_hash_bits(struct ethtool_rxnfc *nfc) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h index d99a76a..7e762c4 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h @@ -579,6 +579,8 @@ struct hclge_vport { u8 rss_hash_key[HCLGE_RSS_KEY_SIZE]; /* User configured hash keys */ /* User configured lookup table entries */ u8 rss_indirection_tbl[HCLGE_RSS_IND_TBL_SIZE]; + int rss_algo; /* User configured hash algorithm */ + u16 alloc_rss_size; u16 qs_offset; -- 2.9.3
[PATCH net-next 09/23] {topost} net: hns3: add support for querying pfc puase packets statistic
This patch add support for querying pfc puase packets statistic in hclge_ieee_getpfc, which is used to tell user how many pfc puase packets have been sent and received by this mac port. Signed-off-by: Yunsheng LinSigned-off-by: Peng Li --- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 14 ++ .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 53 ++ .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h | 6 +++ 3 files changed, 73 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c index 5018d66..c5270b5 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c @@ -203,9 +203,11 @@ static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets) static int hclge_ieee_getpfc(struct hnae3_handle *h, struct ieee_pfc *pfc) { + u64 requests[HNAE3_MAX_TC], indications[HNAE3_MAX_TC]; struct hclge_vport *vport = hclge_get_vport(h); struct hclge_dev *hdev = vport->back; u8 i, j, pfc_map, *prio_tc; + int ret; memset(pfc, 0, sizeof(*pfc)); pfc->pfc_cap = hdev->pfc_max; @@ -220,6 +222,18 @@ static int hclge_ieee_getpfc(struct hnae3_handle *h, struct ieee_pfc *pfc) } } + ret = hclge_pfc_tx_stats_get(hdev, requests); + if (ret) + return ret; + + ret = hclge_pfc_rx_stats_get(hdev, indications); + if (ret) + return ret; + + for (i = 0; i < HCLGE_MAX_TC_NUM; i++) { + pfc->requests[i] = requests[i]; + pfc->indications[i] = indications[i]; + } return 0; } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c index 4134a82..885f25c 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c @@ -23,6 +23,9 @@ enum hclge_shaper_level { HCLGE_SHAPER_LVL_PF = 1, }; +#define HCLGE_TM_PFC_PKT_GET_CMD_NUM 3 +#define HCLGE_TM_PFC_NUM_GET_PER_CMD 3 + #define HCLGE_SHAPER_BS_U_DEF 5 #define HCLGE_SHAPER_BS_S_DEF 20 @@ -112,6 +115,56 @@ static int hclge_shaper_para_calc(u32 ir, u8 shaper_level, return 0; } +static int hclge_pfc_stats_get(struct hclge_dev *hdev, + enum hclge_opcode_type opcode, u64 *stats) +{ + struct hclge_desc desc[HCLGE_TM_PFC_PKT_GET_CMD_NUM]; + int ret, i, j; + + if (!(opcode == HCLGE_OPC_QUERY_PFC_RX_PKT_CNT || + opcode == HCLGE_OPC_QUERY_PFC_TX_PKT_CNT)) + return -EINVAL; + + for (i = 0; i < HCLGE_TM_PFC_PKT_GET_CMD_NUM; i++) { + hclge_cmd_setup_basic_desc([i], opcode, true); + if (i != (HCLGE_TM_PFC_PKT_GET_CMD_NUM - 1)) + desc[i].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT); + else + desc[i].flag &= ~cpu_to_le16(HCLGE_CMD_FLAG_NEXT); + } + + ret = hclge_cmd_send(>hw, desc, HCLGE_TM_PFC_PKT_GET_CMD_NUM); + if (ret) { + dev_err(>pdev->dev, + "Get pfc pause stats fail, ret = %d.\n", ret); + return ret; + } + + for (i = 0; i < HCLGE_TM_PFC_PKT_GET_CMD_NUM; i++) { + struct hclge_pfc_stats_cmd *pfc_stats = + (struct hclge_pfc_stats_cmd *)desc[i].data; + + for (j = 0; j < HCLGE_TM_PFC_NUM_GET_PER_CMD; j++) { + u32 index = i * HCLGE_TM_PFC_PKT_GET_CMD_NUM + j; + + if (index < HCLGE_MAX_TC_NUM) + stats[index] = + le64_to_cpu(pfc_stats->pkt_num[j]); + } + } + return 0; +} + +int hclge_pfc_rx_stats_get(struct hclge_dev *hdev, u64 *stats) +{ + return hclge_pfc_stats_get(hdev, HCLGE_OPC_QUERY_PFC_RX_PKT_CNT, stats); +} + +int hclge_pfc_tx_stats_get(struct hclge_dev *hdev, u64 *stats) +{ + return hclge_pfc_stats_get(hdev, HCLGE_OPC_QUERY_PFC_TX_PKT_CNT, stats); +} + int hclge_mac_pause_en_cfg(struct hclge_dev *hdev, bool tx, bool rx) { struct hclge_desc desc; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h index c30c85b..2dbe177 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h @@ -109,6 +109,10 @@ struct hclge_cfg_pause_param_cmd { __le16 pause_trans_time; }; +struct hclge_pfc_stats_cmd { + __le64 pkt_num[3]; +}; + struct hclge_port_shapping_cmd { __le32 port_shapping_para; }; @@ -130,4 +134,6 @@ int hclge_tm_map_cfg(struct hclge_dev *hdev); int hclge_tm_init_hw(struct hclge_dev *hdev); int
[PATCH net-next 19/23] {topost} net: hns3: fix for loopback failure when vlan filter is enable
From: Yunsheng LinWhen vlan ctag filter is enabled, the loopback selftest fails because loopback selftest does not support vlan. This patch fixes it by disabling the vlan ctag filter when runnig loopback selftest. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c index 26274bc..2db127c 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c @@ -309,6 +309,9 @@ static void hns3_self_test(struct net_device *ndev, struct hnae3_handle *h = priv->ae_handle; int st_param[HNS3_SELF_TEST_TPYE_NUM][2]; bool if_running = netif_running(ndev); +#if IS_ENABLED(CONFIG_VLAN_8021Q) + bool dis_vlan_filter; +#endif int test_index = 0; u32 i; @@ -323,6 +326,14 @@ static void hns3_self_test(struct net_device *ndev, if (if_running) dev_close(ndev); +#if IS_ENABLED(CONFIG_VLAN_8021Q) + /* Disable the vlan filter for selftest does not support it */ + dis_vlan_filter = (ndev->features & NETIF_F_HW_VLAN_CTAG_FILTER) && + h->ae_algo->ops->enable_vlan_filter; + if (dis_vlan_filter) + h->ae_algo->ops->enable_vlan_filter(h, false); +#endif + set_bit(HNS3_NIC_STATE_TESTING, >state); for (i = 0; i < HNS3_SELF_TEST_TPYE_NUM; i++) { @@ -345,6 +356,11 @@ static void hns3_self_test(struct net_device *ndev, clear_bit(HNS3_NIC_STATE_TESTING, >state); +#if IS_ENABLED(CONFIG_VLAN_8021Q) + if (dis_vlan_filter) + h->ae_algo->ops->enable_vlan_filter(h, true); +#endif + if (if_running) dev_open(ndev); } -- 2.9.3
[PATCH net-next 03/23] {topost} net: hns3: set the cmdq out_vld bit to 0 after used
Driver check the out_vld bit when get a new cmdq BD, if the bit is 1, the BD is valid. driver Should set the bit 0 after used and hw will set the bit 1 if get a valid BD. Signed-off-by: Peng Li--- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 1 + drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index ea78a99..3a2c174 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -412,6 +412,7 @@ void hclge_mbx_handler(struct hclge_dev *hdev) req->msg[0]); break; } + crq->desc[crq->next_to_use].flag = 0; hclge_mbx_ring_ptr_move_crq(crq); } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c index e39cad2..18283ef 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c @@ -171,6 +171,7 @@ void hclgevf_mbx_handler(struct hclgevf_dev *hdev) req->msg[0]); break; } + crq->desc[crq->next_to_use].flag = 0; hclge_mbx_ring_ptr_move_crq(crq); flag = le16_to_cpu(crq->desc[crq->next_to_use].flag); } -- 2.9.3
[PATCH net-next 18/23] {topost} net: hns3: fix for coal configuation lost when setting the channel
From: Yunsheng LinThis patch fixes the coalesce configuation lost problem when setting the channel number by restoring all vectors's coalesce configuation to vector 0's, because all vectors belonging to the same netdev have the same coalesce configuation for now. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 37 +++-- 1 file changed, 34 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index a0ba25f..b02f3ff 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3412,7 +3412,24 @@ static u16 hns3_get_max_available_channels(struct net_device *netdev) return min_t(u16, max_tqps, (free_tqps + h->kinfo.num_tqps)); } -static int hns3_modify_tqp_num(struct net_device *netdev, u16 new_tqp_num) +static void hns3_restore_coal(struct hns3_nic_priv *priv, + struct hns3_enet_coalesce *tx, + struct hns3_enet_coalesce *rx) +{ + u16 vector_num = priv->vector_num; + int i; + + for (i = 0; i < vector_num; i++) { + memcpy(>tqp_vector[i].tx_group.coal, tx, + sizeof(struct hns3_enet_coalesce)); + memcpy(>tqp_vector[i].rx_group.coal, rx, + sizeof(struct hns3_enet_coalesce)); + } +} + +static int hns3_modify_tqp_num(struct net_device *netdev, u16 new_tqp_num, + struct hns3_enet_coalesce *tx, + struct hns3_enet_coalesce *rx) { struct hns3_nic_priv *priv = netdev_priv(netdev); struct hnae3_handle *h = hns3_get_handle(netdev); @@ -3430,6 +3447,8 @@ static int hns3_modify_tqp_num(struct net_device *netdev, u16 new_tqp_num) if (ret) goto err_alloc_vector; + hns3_restore_coal(priv, tx, rx); + ret = hns3_nic_init_vector_data(priv); if (ret) goto err_uninit_vector; @@ -3460,6 +3479,7 @@ int hns3_set_channels(struct net_device *netdev, struct hns3_nic_priv *priv = netdev_priv(netdev); struct hnae3_handle *h = hns3_get_handle(netdev); struct hnae3_knic_private_info *kinfo = >kinfo; + struct hns3_enet_coalesce tx_coal, rx_coal; bool if_running = netif_running(netdev); u32 new_tqp_num = ch->combined_count; u16 org_tqp_num; @@ -3493,15 +3513,26 @@ int hns3_set_channels(struct net_device *netdev, goto open_netdev; } + /* Changing the tqp num may also change the vector num, +* ethtool only support setting and querying one coal +* configuation for now, so save the vector 0' coal +* configuation here in order to restore it. +*/ + memcpy(_coal, >tqp_vector[0].tx_group.coal, + sizeof(struct hns3_enet_coalesce)); + memcpy(_coal, >tqp_vector[0].rx_group.coal, + sizeof(struct hns3_enet_coalesce)); + hns3_nic_dealloc_vector_data(priv); hns3_uninit_all_ring(priv); hns3_put_ring_config(priv); org_tqp_num = h->kinfo.num_tqps; - ret = hns3_modify_tqp_num(netdev, new_tqp_num); + ret = hns3_modify_tqp_num(netdev, new_tqp_num, _coal, _coal); if (ret) { - ret = hns3_modify_tqp_num(netdev, org_tqp_num); + ret = hns3_modify_tqp_num(netdev, org_tqp_num, + _coal, _coal); if (ret) { /* If revert to old tqp failed, fatal error occurred */ dev_err(>dev, -- 2.9.3
[PATCH net-next 17/23] {topost} net: hns3: refactor the coalesce related struct
From: Yunsheng LinThis patch refoctors the coalesce related struct by introducing the hns3_enet_coalesce struct, in order to fix the coalesce configuation lost problem when changing the channel number. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 46 +++--- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h| 10 +++-- drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 26 +++- 3 files changed, 46 insertions(+), 36 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 453f509..a0ba25f 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -168,8 +168,8 @@ void hns3_set_vector_coalesce_rl(struct hns3_enet_tqp_vector *tqp_vector, * GL and RL(Rate Limiter) are 2 ways to acheive interrupt coalescing */ - if (rl_reg > 0 && !tqp_vector->tx_group.gl_adapt_enable && - !tqp_vector->rx_group.gl_adapt_enable) + if (rl_reg > 0 && !tqp_vector->tx_group.coal.gl_adapt_enable && + !tqp_vector->rx_group.coal.gl_adapt_enable) /* According to the hardware, the range of rl_reg is * 0-59 and the unit is 4. */ @@ -205,17 +205,17 @@ static void hns3_vector_gl_rl_init(struct hns3_enet_tqp_vector *tqp_vector, */ /* Default: enable interrupt coalescing self-adaptive and GL */ - tqp_vector->tx_group.gl_adapt_enable = 1; - tqp_vector->rx_group.gl_adapt_enable = 1; + tqp_vector->tx_group.coal.gl_adapt_enable = 1; + tqp_vector->rx_group.coal.gl_adapt_enable = 1; - tqp_vector->tx_group.int_gl = HNS3_INT_GL_50K; - tqp_vector->rx_group.int_gl = HNS3_INT_GL_50K; + tqp_vector->tx_group.coal.int_gl = HNS3_INT_GL_50K; + tqp_vector->rx_group.coal.int_gl = HNS3_INT_GL_50K; /* Default: disable RL */ h->kinfo.int_rl_setting = 0; - tqp_vector->rx_group.flow_level = HNS3_FLOW_LOW; - tqp_vector->tx_group.flow_level = HNS3_FLOW_LOW; + tqp_vector->rx_group.coal.flow_level = HNS3_FLOW_LOW; + tqp_vector->tx_group.coal.flow_level = HNS3_FLOW_LOW; } static void hns3_vector_gl_rl_init_hw(struct hns3_enet_tqp_vector *tqp_vector, @@ -224,9 +224,9 @@ static void hns3_vector_gl_rl_init_hw(struct hns3_enet_tqp_vector *tqp_vector, struct hnae3_handle *h = priv->ae_handle; hns3_set_vector_coalesce_tx_gl(tqp_vector, - tqp_vector->tx_group.int_gl); + tqp_vector->tx_group.coal.int_gl); hns3_set_vector_coalesce_rx_gl(tqp_vector, - tqp_vector->rx_group.int_gl); + tqp_vector->rx_group.coal.int_gl); hns3_set_vector_coalesce_rl(tqp_vector, h->kinfo.int_rl_setting); } @@ -2381,12 +2381,12 @@ static bool hns3_get_new_int_gl(struct hns3_enet_ring_group *ring_group) u16 new_int_gl; int usecs; - if (!ring_group->int_gl) + if (!ring_group->coal.int_gl) return false; if (ring_group->total_packets == 0) { - ring_group->int_gl = HNS3_INT_GL_50K; - ring_group->flow_level = HNS3_FLOW_LOW; + ring_group->coal.int_gl = HNS3_INT_GL_50K; + ring_group->coal.flow_level = HNS3_FLOW_LOW; return true; } @@ -2396,10 +2396,10 @@ static bool hns3_get_new_int_gl(struct hns3_enet_ring_group *ring_group) * 20-1249MB/s high (18000 ints/s) * > 4pps ultra (8000 ints/s) */ - new_flow_level = ring_group->flow_level; - new_int_gl = ring_group->int_gl; + new_flow_level = ring_group->coal.flow_level; + new_int_gl = ring_group->coal.int_gl; tqp_vector = ring_group->ring->tqp_vector; - usecs = (ring_group->int_gl << 1); + usecs = (ring_group->coal.int_gl << 1); bytes_per_usecs = ring_group->total_bytes / usecs; /* 100 microseconds */ packets_per_secs = ring_group->total_packets * 100 / usecs; @@ -2446,9 +2446,9 @@ static bool hns3_get_new_int_gl(struct hns3_enet_ring_group *ring_group) ring_group->total_bytes = 0; ring_group->total_packets = 0; - ring_group->flow_level = new_flow_level; - if (new_int_gl != ring_group->int_gl) { - ring_group->int_gl = new_int_gl; + ring_group->coal.flow_level = new_flow_level; + if (new_int_gl != ring_group->coal.int_gl) { + ring_group->coal.int_gl = new_int_gl; return true; } return false; @@ -2460,18 +2460,18 @@ static void hns3_update_new_int_gl(struct hns3_enet_tqp_vector *tqp_vector) struct
[PATCH net-next 23/23] {topost} net: hns3: add support for VF driver inner interface hclgevf_ops.get_tqps_and_rss_info
This patch adds support for VF driver inner interface hclgevf_ops.get_tqps_and_rss_info. This interface will be used in the initialization process. Signed-off-by: Peng Li--- drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index b5cb8fb..6c240d6 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -1451,6 +1451,15 @@ static void hclgevf_get_channels(struct hnae3_handle *handle, ch->combined_count = hdev->num_tqps; } +static void hclgevf_get_tqps_and_rss_info(struct hnae3_handle *handle, + u16 *free_tqps, u16 *max_rss_size) +{ + struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle); + + *free_tqps = 0; + *max_rss_size = hdev->rss_size_max; +} + static const struct hnae3_ae_ops hclgevf_ops = { .init_ae_dev = hclgevf_init_ae_dev, .uninit_ae_dev = hclgevf_uninit_ae_dev, @@ -1482,6 +1491,7 @@ static const struct hnae3_ae_ops hclgevf_ops = { .get_fw_version = hclgevf_get_fw_version, .set_vlan_filter = hclgevf_set_vlan_filter, .get_channels = hclgevf_get_channels, + .get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info, }; static struct hnae3_ae_algo ae_algovf = { -- 2.9.3
[PATCH net-next 20/23] {topost} net: hns3: fix for buffer overflow smatch warning
From: Yunsheng LinThis patch fixes the buffer overflow warning by refactoring hclgevf_bind_ring_to_vector and hclge_get_ring_chain_from_mbx. Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Fixes: dde1a86e93ca ("net: hns3: Add mailbox support to PF driver") Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h| 2 + .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 19 --- .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 60 ++ 3 files changed, 39 insertions(+), 42 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h index 3e9203e..e6e1d22 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h +++ b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h @@ -57,6 +57,8 @@ enum hclge_mbx_vlan_cfg_subcode { #define HCLGE_MBX_MAX_MSG_SIZE 16 #define HCLGE_MBX_MAX_RESP_DATA_SIZE 8 +#define HCLGE_MBX_RING_MAP_BASIC_MSG_NUM 3 +#define HCLGE_MBX_RING_NODE_VARIABLE_NUM 3 struct hclgevf_mbx_resp_status { struct mutex mbx_mutex; /* protects against contending sync cmd resp */ diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index ed34ca3..e3e4ded 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -105,14 +105,17 @@ static int hclge_get_ring_chain_from_mbx( struct hnae3_ring_chain_node *ring_chain, struct hclge_vport *vport) { -#define HCLGE_RING_NODE_VARIABLE_NUM 3 -#define HCLGE_RING_MAP_MBX_BASIC_MSG_NUM 3 struct hnae3_ring_chain_node *cur_chain, *new_chain; int ring_num; int i; ring_num = req->msg[2]; + if (ring_num > ((HCLGE_MBX_VF_MSG_DATA_NUM - + HCLGE_MBX_RING_MAP_BASIC_MSG_NUM) / + HCLGE_MBX_RING_NODE_VARIABLE_NUM)) + return -ENOMEM; + hnae_set_bit(ring_chain->flag, HNAE3_RING_TYPE_B, req->msg[3]); ring_chain->tqp_index = hclge_get_queue_id(vport->nic.kinfo.tqp[req->msg[4]]); @@ -128,18 +131,18 @@ static int hclge_get_ring_chain_from_mbx( goto err; hnae_set_bit(new_chain->flag, HNAE3_RING_TYPE_B, -req->msg[HCLGE_RING_NODE_VARIABLE_NUM * i + -HCLGE_RING_MAP_MBX_BASIC_MSG_NUM]); +req->msg[HCLGE_MBX_RING_NODE_VARIABLE_NUM * i + +HCLGE_MBX_RING_MAP_BASIC_MSG_NUM]); new_chain->tqp_index = hclge_get_queue_id(vport->nic.kinfo.tqp - [req->msg[HCLGE_RING_NODE_VARIABLE_NUM * i + - HCLGE_RING_MAP_MBX_BASIC_MSG_NUM + 1]]); + [req->msg[HCLGE_MBX_RING_NODE_VARIABLE_NUM * i + + HCLGE_MBX_RING_MAP_BASIC_MSG_NUM + 1]]); hnae_set_field(new_chain->int_gl_idx, HCLGE_INT_GL_IDX_M, HCLGE_INT_GL_IDX_S, - req->msg[HCLGE_RING_NODE_VARIABLE_NUM * i + - HCLGE_RING_MAP_MBX_BASIC_MSG_NUM + 2]); + req->msg[HCLGE_MBX_RING_NODE_VARIABLE_NUM * i + + HCLGE_MBX_RING_MAP_BASIC_MSG_NUM + 2]); cur_chain->next = new_chain; cur_chain = new_chain; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index 6bce99a..b5cb8fb 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -533,13 +533,11 @@ static int hclgevf_bind_ring_to_vector(struct hnae3_handle *handle, bool en, int vector, struct hnae3_ring_chain_node *ring_chain) { -#define HCLGEVF_RING_NODE_VARIABLE_NUM 3 -#define HCLGEVF_RING_MAP_MBX_BASIC_MSG_NUM 3 struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle); struct hnae3_ring_chain_node *node; struct hclge_mbx_vf_to_pf_cmd *req; struct hclgevf_desc desc; - int i, vector_id; + int i = 0, vector_id; int status; u8 type; @@ -551,28 +549,33 @@ static int hclgevf_bind_ring_to_vector(struct hnae3_handle *handle, bool en, return vector_id; } - hclgevf_cmd_setup_basic_desc(, HCLGEVF_OPC_MBX_VF_TO_PF, false); - type = en ? - HCLGE_MBX_MAP_RING_TO_VECTOR : HCLGE_MBX_UNMAP_RING_TO_VECTOR; - req->msg[0] = type; - req->msg[1] = vector_id; /* vector_id should be id in VF */ -
[PATCH net-next 15/23] {topost} net: hns3: refactor the get/put_vector function
From: Yunsheng LinThere is a get_vector function, which allocate the vectors for a client, but there is not a put_vector to free the vector. This patch introduces the put_vector function in order to fix the coalesce configuration lost problem during reset process. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hnae3.h| 3 +++ drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 4 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 28 -- .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 12 +++--- 4 files changed, 37 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h index 3c653eb..70441d2 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h @@ -265,6 +265,8 @@ struct hnae3_ae_dev { * Get tc size of handle * get_vector() * Get vector number and vector information + * put_vector() + * Put the vector in hdev * map_ring_to_vector() * Map rings to vector * unmap_ring_from_vector() @@ -376,6 +378,7 @@ struct hnae3_ae_ops { int (*get_vector)(struct hnae3_handle *handle, u16 vector_num, struct hnae3_vector_info *vector_info); + int (*put_vector)(struct hnae3_handle *handle, int vector_num); int (*map_ring_to_vector)(struct hnae3_handle *handle, int vector_num, struct hnae3_ring_chain_node *vr_chain); diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 2bed73e..fef65b9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -2709,6 +2709,10 @@ static int hns3_nic_uninit_vector_data(struct hns3_nic_priv *priv) if (ret) return ret; + ret = h->ae_algo->ops->put_vector(h, tqp_vector->vector_irq); + if (ret) + return ret; + hns3_free_vector_ring_chain(tqp_vector, _ring_chain); if (priv->tqp_vector[i].irq_init_flag == HNS3_VECTOR_INITED) { diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index c0f6939..323f95b 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -2969,6 +2969,24 @@ static int hclge_get_vector_index(struct hclge_dev *hdev, int vector) return -EINVAL; } +static int hclge_put_vector(struct hnae3_handle *handle, int vector) +{ + struct hclge_vport *vport = hclge_get_vport(handle); + struct hclge_dev *hdev = vport->back; + int vector_id; + + vector_id = hclge_get_vector_index(hdev, vector); + if (vector_id < 0) { + dev_err(>pdev->dev, + "Get vector index fail. vector_id =%d\n", vector_id); + return vector_id; + } + + hclge_free_vector(hdev, vector_id); + + return 0; +} + static u32 hclge_get_rss_key_size(struct hnae3_handle *handle) { return HCLGE_RSS_KEY_SIZE; @@ -3523,18 +3541,13 @@ static int hclge_unmap_ring_frm_vector(struct hnae3_handle *handle, } ret = hclge_bind_ring_with_vector(vport, vector_id, false, ring_chain); - if (ret) { + if (ret) dev_err(>pdev->dev, "Unmap ring from vector fail. vectorid=%d, ret =%d\n", vector_id, ret); - return ret; - } - - /* Free this MSIX or MSI vector */ - hclge_free_vector(hdev, vector_id); - return 0; + return ret; } int hclge_cmd_set_promisc_mode(struct hclge_dev *hdev, @@ -5996,6 +6009,7 @@ static const struct hnae3_ae_ops hclge_ops = { .map_ring_to_vector = hclge_map_ring_to_vector, .unmap_ring_from_vector = hclge_unmap_ring_frm_vector, .get_vector = hclge_get_vector, + .put_vector = hclge_put_vector, .set_promisc_mode = hclge_set_promisc_mode, .set_loopback = hclge_set_loopback, .start = hclge_ae_start, diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index eee5e20..6bce99a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -627,13 +627,18 @@ static int hclgevf_unmap_ring_from_vector( } ret = hclgevf_bind_ring_to_vector(handle, false, vector, ring_chain); - if (ret) { + if (ret) dev_err(>pdev->dev, "Unmap ring from vector fail. vector=%d, ret
[PATCH net-next 21/23] {topost} net: hns3: fix the queue id for tqp enable&
Command HCLGE_OPC_CFG_COM_TQP_QUEUE should use queue id in the function, but command HCLGE_OPC_RESET_TQP_QUEUE should use global queue id. This patch fixes the queue id about queue enable/disable/reset. Signed-off-by: Peng Li--- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 50 +++--- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 323f95b..ea33cc5 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3720,20 +3720,11 @@ static int hclge_ae_start(struct hnae3_handle *handle) { struct hclge_vport *vport = hclge_get_vport(handle); struct hclge_dev *hdev = vport->back; - int i, queue_id, ret; + int i, ret; - for (i = 0; i < vport->alloc_tqps; i++) { - /* todo clear interrupt */ - /* ring enable */ - queue_id = hclge_get_queue_id(handle->kinfo.tqp[i]); - if (queue_id < 0) { - dev_warn(>pdev->dev, -"Get invalid queue id, ignore it\n"); - continue; - } + for (i = 0; i < vport->alloc_tqps; i++) + hclge_tqp_enable(hdev, i, 0, true); - hclge_tqp_enable(hdev, queue_id, 0, true); - } /* mac enable */ hclge_cfg_mac_mode(hdev, true); clear_bit(HCLGE_STATE_DOWN, >state); @@ -3753,19 +3744,11 @@ static void hclge_ae_stop(struct hnae3_handle *handle) { struct hclge_vport *vport = hclge_get_vport(handle); struct hclge_dev *hdev = vport->back; - int i, queue_id; + int i; - for (i = 0; i < vport->alloc_tqps; i++) { - /* Ring disable */ - queue_id = hclge_get_queue_id(handle->kinfo.tqp[i]); - if (queue_id < 0) { - dev_warn(>pdev->dev, -"Get invalid queue id, ignore it\n"); - continue; - } + for (i = 0; i < vport->alloc_tqps; i++) + hclge_tqp_enable(hdev, i, 0, false); - hclge_tqp_enable(hdev, queue_id, 0, false); - } /* Mac disable */ hclge_cfg_mac_mode(hdev, false); @@ -4851,21 +4834,36 @@ static int hclge_get_reset_status(struct hclge_dev *hdev, u16 queue_id) return hnae_get_bit(req->ready_to_reset, HCLGE_TQP_RESET_B); } +static u16 hclge_covert_handle_qid_global(struct hnae3_handle *handle, + u16 queue_id) +{ + struct hnae3_queue *queue; + struct hclge_tqp *tqp; + + queue = handle->kinfo.tqp[queue_id]; + tqp = container_of(queue, struct hclge_tqp, q); + + return tqp->index; +} + void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id) { struct hclge_vport *vport = hclge_get_vport(handle); struct hclge_dev *hdev = vport->back; int reset_try_times = 0; int reset_status; + u16 queue_gid; int ret; + queue_gid = hclge_covert_handle_qid_global(handle, queue_id); + ret = hclge_tqp_enable(hdev, queue_id, 0, false); if (ret) { dev_warn(>pdev->dev, "Disable tqp fail, ret = %d\n", ret); return; } - ret = hclge_send_reset_tqp_cmd(hdev, queue_id, true); + ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, true); if (ret) { dev_warn(>pdev->dev, "Send reset tqp cmd fail, ret = %d\n", ret); @@ -4876,7 +4874,7 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id) while (reset_try_times++ < HCLGE_TQP_RESET_TRY_TIMES) { /* Wait for tqp hw reset */ msleep(20); - reset_status = hclge_get_reset_status(hdev, queue_id); + reset_status = hclge_get_reset_status(hdev, queue_gid); if (reset_status) break; } @@ -4886,7 +4884,7 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id) return; } - ret = hclge_send_reset_tqp_cmd(hdev, queue_id, false); + ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, false); if (ret) { dev_warn(>pdev->dev, "Deassert the soft reset fail, ret = %d\n", ret); -- 2.9.3
[PATCH net-next 16/23] {topost} net: hns3: fix for coalesce configuration lost during reset
From: Yunsheng LinCoalesce configuration will be set to default value by hns3_nic_init_vector_data during reset, which causes the coalesce configuration loss problem. This patch fixes it by setting the default value in hns3_nic_alloc_vector_data, which will not be called in the reset process. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 156 +--- 1 file changed, 114 insertions(+), 42 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index fef65b9..453f509 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -211,19 +211,25 @@ static void hns3_vector_gl_rl_init(struct hns3_enet_tqp_vector *tqp_vector, tqp_vector->tx_group.int_gl = HNS3_INT_GL_50K; tqp_vector->rx_group.int_gl = HNS3_INT_GL_50K; - hns3_set_vector_coalesce_tx_gl(tqp_vector, - tqp_vector->tx_group.int_gl); - hns3_set_vector_coalesce_rx_gl(tqp_vector, - tqp_vector->rx_group.int_gl); - /* Default: disable RL */ h->kinfo.int_rl_setting = 0; - hns3_set_vector_coalesce_rl(tqp_vector, h->kinfo.int_rl_setting); tqp_vector->rx_group.flow_level = HNS3_FLOW_LOW; tqp_vector->tx_group.flow_level = HNS3_FLOW_LOW; } +static void hns3_vector_gl_rl_init_hw(struct hns3_enet_tqp_vector *tqp_vector, + struct hns3_nic_priv *priv) +{ + struct hnae3_handle *h = priv->ae_handle; + + hns3_set_vector_coalesce_tx_gl(tqp_vector, + tqp_vector->tx_group.int_gl); + hns3_set_vector_coalesce_rx_gl(tqp_vector, + tqp_vector->rx_group.int_gl); + hns3_set_vector_coalesce_rl(tqp_vector, h->kinfo.int_rl_setting); +} + static int hns3_nic_set_real_num_queue(struct net_device *netdev) { struct hnae3_handle *h = hns3_get_handle(netdev); @@ -2613,32 +2619,18 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv) struct hnae3_ring_chain_node vector_ring_chain; struct hnae3_handle *h = priv->ae_handle; struct hns3_enet_tqp_vector *tqp_vector; - struct hnae3_vector_info *vector; - struct pci_dev *pdev = h->pdev; - u16 tqp_num = h->kinfo.num_tqps; - u16 vector_num; int ret = 0; u16 i; - /* RSS size, cpu online and vector_num should be the same */ - /* Should consider 2p/4p later */ - vector_num = min_t(u16, num_online_cpus(), tqp_num); - vector = devm_kcalloc(>dev, vector_num, sizeof(*vector), - GFP_KERNEL); - if (!vector) - return -ENOMEM; - - vector_num = h->ae_algo->ops->get_vector(h, vector_num, vector); - - priv->vector_num = vector_num; - priv->tqp_vector = (struct hns3_enet_tqp_vector *) - devm_kcalloc(>dev, vector_num, sizeof(*priv->tqp_vector), -GFP_KERNEL); - if (!priv->tqp_vector) - return -ENOMEM; + for (i = 0; i < priv->vector_num; i++) { + tqp_vector = >tqp_vector[i]; + hns3_vector_gl_rl_init_hw(tqp_vector, priv); + tqp_vector->num_tqps = 0; + } - for (i = 0; i < tqp_num; i++) { - u16 vector_i = i % vector_num; + for (i = 0; i < h->kinfo.num_tqps; i++) { + u16 vector_i = i % priv->vector_num; + u16 tqp_num = h->kinfo.num_tqps; tqp_vector = >tqp_vector[vector_i]; @@ -2648,52 +2640,94 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv) hns3_add_ring_to_group(_vector->rx_group, priv->ring_data[i + tqp_num].ring); - tqp_vector->idx = vector_i; - tqp_vector->mask_addr = vector[vector_i].io_addr; - tqp_vector->vector_irq = vector[vector_i].vector; - tqp_vector->num_tqps++; - priv->ring_data[i].ring->tqp_vector = tqp_vector; priv->ring_data[i + tqp_num].ring->tqp_vector = tqp_vector; + tqp_vector->num_tqps++; } - for (i = 0; i < vector_num; i++) { + for (i = 0; i < priv->vector_num; i++) { tqp_vector = >tqp_vector[i]; tqp_vector->rx_group.total_bytes = 0; tqp_vector->rx_group.total_packets = 0; tqp_vector->tx_group.total_bytes = 0; tqp_vector->tx_group.total_packets = 0; - hns3_vector_gl_rl_init(tqp_vector, priv); tqp_vector->handle = h; ret = hns3_get_vector_ring_chain(tqp_vector,
[PATCH net-next 22/23] {topost} net: hns3: set the max ring num when alloc netdev
HNS3 driver should alloc netdev with max support ring num, as driver support change netdev count by ethtool -L. Signed-off-by: Peng Li--- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 27 - 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index b02f3ff..94f0b92 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -255,6 +255,16 @@ static int hns3_nic_set_real_num_queue(struct net_device *netdev) return 0; } +static u16 hns3_get_max_available_channels(struct hnae3_handle *h) +{ + u16 free_tqps, max_rss_size, max_tqps; + + h->ae_algo->ops->get_tqps_and_rss_info(h, _tqps, _rss_size); + max_tqps = h->kinfo.num_tc * max_rss_size; + + return min_t(u16, max_tqps, (free_tqps + h->kinfo.num_tqps)); +} + static int hns3_nic_net_up(struct net_device *netdev) { struct hns3_nic_priv *priv = netdev_priv(netdev); @@ -3062,7 +3072,7 @@ static int hns3_client_init(struct hnae3_handle *handle) int ret; netdev = alloc_etherdev_mq(sizeof(struct hns3_nic_priv), - handle->kinfo.num_tqps); + hns3_get_max_available_channels(handle)); if (!netdev) return -ENOMEM; @@ -3401,17 +3411,6 @@ static int hns3_reset_notify(struct hnae3_handle *handle, return ret; } -static u16 hns3_get_max_available_channels(struct net_device *netdev) -{ - struct hnae3_handle *h = hns3_get_handle(netdev); - u16 free_tqps, max_rss_size, max_tqps; - - h->ae_algo->ops->get_tqps_and_rss_info(h, _tqps, _rss_size); - max_tqps = h->kinfo.num_tc * max_rss_size; - - return min_t(u16, max_tqps, (free_tqps + h->kinfo.num_tqps)); -} - static void hns3_restore_coal(struct hns3_nic_priv *priv, struct hns3_enet_coalesce *tx, struct hns3_enet_coalesce *rx) @@ -3488,12 +3487,12 @@ int hns3_set_channels(struct net_device *netdev, if (ch->rx_count || ch->tx_count) return -EINVAL; - if (new_tqp_num > hns3_get_max_available_channels(netdev) || + if (new_tqp_num > hns3_get_max_available_channels(h) || new_tqp_num < kinfo->num_tc) { dev_err(>dev, "Change tqps fail, the tqp range is from %d to %d", kinfo->num_tc, - hns3_get_max_available_channels(netdev)); + hns3_get_max_available_channels(h)); return -EINVAL; } -- 2.9.3
[PATCH net-next 04/23] {topost} net: hns3: fix endian issue when PF get mbx message flag
This patch fixes the endian issue when PF get mbx message flag. Signed-off-by: Peng Li--- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index 3a2c174..ed34ca3 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -335,11 +335,11 @@ void hclge_mbx_handler(struct hclge_dev *hdev) struct hclge_mbx_vf_to_pf_cmd *req; struct hclge_vport *vport; struct hclge_desc *desc; - int ret; + int ret, flag; + flag = le16_to_cpu(crq->desc[crq->next_to_use].flag); /* handle all the mailbox requests in the queue */ - while (hnae_get_bit(crq->desc[crq->next_to_use].flag, - HCLGE_CMDQ_RX_OUTVLD_B)) { + while (hnae_get_bit(flag, HCLGE_CMDQ_RX_OUTVLD_B)) { desc = >desc[crq->next_to_use]; req = (struct hclge_mbx_vf_to_pf_cmd *)desc->data; @@ -414,6 +414,7 @@ void hclge_mbx_handler(struct hclge_dev *hdev) } crq->desc[crq->next_to_use].flag = 0; hclge_mbx_ring_ptr_move_crq(crq); + flag = le16_to_cpu(crq->desc[crq->next_to_use].flag); } /* Write back CMDQ_RQ header pointer, M7 need this pointer */ -- 2.9.3
[net-next:master 178/193] drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison expression (different address spaces)
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: a366e300ae9fc466d333e6d8f2bc5d58ed248041 commit: 1ec54cb44e6731c3cb251bcf9251d65a4b4f6306 [178/193] net: unpollute priv_flags space reproduce: # apt-get install sparse git checkout 1ec54cb44e6731c3cb251bcf9251d65a4b4f6306 make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) drivers/net/ipvlan/ipvlan_core.c:57:36: sparse: incorrect type in argument 1 (different base types) @@expected unsigned int [unsigned] [usertype] a @@ got unsigned int [unsigned] [usertype] a @@ drivers/net/ipvlan/ipvlan_core.c:57:36:expected unsigned int [unsigned] [usertype] a drivers/net/ipvlan/ipvlan_core.c:57:36:got restricted __be32 const [usertype] s_addr >> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison >> expression (different address spaces) -- >> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison >> expression (different address spaces) >> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison >> expression (different address spaces) >> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison >> expression (different address spaces) vim +183 drivers/net/ipvlan/ipvlan.h 180 181 static inline bool netif_is_ipvlan_port(const struct net_device *dev) 182 { > 183 return dev->rx_handler == ipvlan_handle_frame; 184 } 185 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: linux-next: manual merge of the selinux tree with the net-next tree
Hi all, On Mon, 5 Mar 2018 12:40:54 +1100 Stephen Rothwellwrote: > > Today's linux-next merge of the selinux tree got a conflict in: > > net/sctp/socket.c > > between several refactoring commits from the net-next tree and commit: > > 2277c7cd75e3 ("sctp: Add LSM hooks") > > from the selinux tree. > > I fixed it up (I think - see below) and can carry the fix as > necessary. This is now fixed as far as linux-next is concerned, but any > non trivial conflicts should be mentioned to your upstream maintainer > when your tree is submitted for merging. You may also want to consider > cooperating with the maintainer of the conflicting tree to minimise any > particularly complex conflicts. > > -- > Cheers, > Stephen Rothwell The resolution now looks like below (there were more changes to this file in the net-next tree). It will keep changing every time this file is touched :-( -- Cheers, Stephen Rothwell diff --cc net/sctp/socket.c index 7d3476a4860d,73b34a6b5b09.. --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@@ -1606,200 -1622,308 +1622,216 @@@ static int sctp_error(struct sock *sk, static int sctp_msghdr_parse(const struct msghdr *msg, struct sctp_cmsgs *cmsgs); -static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len) +static int sctp_sendmsg_parse(struct sock *sk, struct sctp_cmsgs *cmsgs, +struct sctp_sndrcvinfo *srinfo, +const struct msghdr *msg, size_t msg_len) { - struct net *net = sock_net(sk); - struct sctp_sock *sp; - struct sctp_endpoint *ep; - struct sctp_association *new_asoc = NULL, *asoc = NULL; - struct sctp_transport *transport, *chunk_tp; - struct sctp_chunk *chunk; - union sctp_addr to; - struct sctp_af *af; - struct sockaddr *msg_name = NULL; - struct sctp_sndrcvinfo default_sinfo; - struct sctp_sndrcvinfo *sinfo; - struct sctp_initmsg *sinit; - sctp_assoc_t associd = 0; - struct sctp_cmsgs cmsgs = { NULL }; - enum sctp_scope scope; - bool fill_sinfo_ttl = false, wait_connect = false; - struct sctp_datamsg *datamsg; - int msg_flags = msg->msg_flags; - __u16 sinfo_flags = 0; - long timeo; + __u16 sflags; int err; - err = 0; - sp = sctp_sk(sk); - ep = sp->ep; + if (sctp_sstate(sk, LISTENING) && sctp_style(sk, TCP)) + return -EPIPE; - pr_debug("%s: sk:%p, msg:%p, msg_len:%zu ep:%p\n", __func__, sk, - msg, msg_len, ep); - - /* We cannot send a message over a TCP-style listening socket. */ - if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING)) { - err = -EPIPE; - goto out_nounlock; - } + if (msg_len > sk->sk_sndbuf) + return -EMSGSIZE; - /* Parse out the SCTP CMSGs. */ - err = sctp_msghdr_parse(msg, ); + memset(cmsgs, 0, sizeof(*cmsgs)); + err = sctp_msghdr_parse(msg, cmsgs); if (err) { pr_debug("%s: msghdr parse err:%x\n", __func__, err); - goto out_nounlock; + return err; } - /* Fetch the destination address for this packet. This - * address only selects the association--it is not necessarily - * the address we will send to. - * For a peeled-off socket, msg_name is ignored. - */ - if (!sctp_style(sk, UDP_HIGH_BANDWIDTH) && msg->msg_name) { - int msg_namelen = msg->msg_namelen; + memset(srinfo, 0, sizeof(*srinfo)); + if (cmsgs->srinfo) { + srinfo->sinfo_stream = cmsgs->srinfo->sinfo_stream; + srinfo->sinfo_flags = cmsgs->srinfo->sinfo_flags; + srinfo->sinfo_ppid = cmsgs->srinfo->sinfo_ppid; + srinfo->sinfo_context = cmsgs->srinfo->sinfo_context; + srinfo->sinfo_assoc_id = cmsgs->srinfo->sinfo_assoc_id; + srinfo->sinfo_timetolive = cmsgs->srinfo->sinfo_timetolive; + } - err = sctp_verify_addr(sk, (union sctp_addr *)msg->msg_name, - msg_namelen); - if (err) - return err; + if (cmsgs->sinfo) { + srinfo->sinfo_stream = cmsgs->sinfo->snd_sid; + srinfo->sinfo_flags = cmsgs->sinfo->snd_flags; + srinfo->sinfo_ppid = cmsgs->sinfo->snd_ppid; + srinfo->sinfo_context = cmsgs->sinfo->snd_context; + srinfo->sinfo_assoc_id = cmsgs->sinfo->snd_assoc_id; + } - if (msg_namelen > sizeof(to)) - msg_namelen = sizeof(to); - memcpy(, msg->msg_name, msg_namelen); - msg_name = msg->msg_name; + if (cmsgs->prinfo) { + srinfo->sinfo_timetolive = cmsgs->prinfo->pr_value; +
Re: [PATCH bpf-next 0/2] bpf: add support for bpf program to read perf event sample address
On 03/06/2018 07:55 PM, Teng Qin wrote: > These patches add support that allows bpf programs attached to perf events to > read the address values recorded with the perf events. These values are > requested by specifying sample_type with PERF_SAMPLE_ADDR when calling > perf_event_open(). > > The main motivation for these changes is to support building memory or lock > access profiling and tracing tools. For example on Intel CPUs, the recorded > address values for supported memory or lock access perf events would be > the access or lock target addresses from PEBS buffer. Such information would > be very valuable for building tools that help understand memory access or > lock acquire pattern. Series applied to bpf-next, thanks Teng!
Re: pull-request: bpf 2018-03-08
On 03/08/2018 02:31 AM, David Miller wrote: > From: Daniel Borkmann> Date: Thu, 8 Mar 2018 02:17:16 +0100 > >> The following pull-request contains BPF updates for your *net* tree. >> >> The main changes are: >> >> 1) Fix various BPF helpers which adjust the skb and its GSO information >>with regards to SCTP GSO. The latter is a special case where gso_size >>is of value GSO_BY_FRAGS, so mangling that will end up corrupting >>the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s). >> >> 2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined >>due to too old kernel headers in the system, from Jiri. >> >> 3) Increase the number of x64 JIT passes in order to allow larger images >>to converge instead of punting them to interpreter or having them >>rejected when the interpreter is not built into the kernel, from Daniel. >> >> Please consider pulling these changes from: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git > > Pulled, thanks Daniel. Thanks! > About that x86 JIT passes thing... > > I think since you now have a scheduling point in there, you can be > even more liberal with the limit if necessary. Agree, if needed we can always adapt it further in future, I think 20 should be good for now. Thanks, Daniel
Re: pull-request: bpf 2018-03-08
From: Daniel BorkmannDate: Thu, 8 Mar 2018 02:17:16 +0100 > The following pull-request contains BPF updates for your *net* tree. > > The main changes are: > > 1) Fix various BPF helpers which adjust the skb and its GSO information >with regards to SCTP GSO. The latter is a special case where gso_size >is of value GSO_BY_FRAGS, so mangling that will end up corrupting >the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s). > > 2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined >due to too old kernel headers in the system, from Jiri. > > 3) Increase the number of x64 JIT passes in order to allow larger images >to converge instead of punting them to interpreter or having them >rejected when the interpreter is not built into the kernel, from Daniel. > > Please consider pulling these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Pulled, thanks Daniel. About that x86 JIT passes thing... I think since you now have a scheduling point in there, you can be even more liberal with the limit if necessary.
[for-next 02/11] net/mlx5: IPSec, Generalize sandbox QP commands
From: Yossi KupermanThe current code assume only SA QP commands. Refactor in order to pave the way for new QP commands: 1. Generic cmd response format. 2. SA cmd checks are in dedicated functions. 3. Aligned debug prints. Signed-off-by: Yossi Kuperman Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 116 - include/linux/mlx5/mlx5_ifc_fpga.h | 16 +++ 2 files changed, 81 insertions(+), 51 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c index 95f9c5a8619b..e0f32b025e06 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c @@ -41,35 +41,23 @@ #define SBU_QP_QUEUE_SIZE 8 #define MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC (60 * 1000) -enum mlx5_ipsec_response_syndrome { - MLX5_IPSEC_RESPONSE_SUCCESS = 0, - MLX5_IPSEC_RESPONSE_ILLEGAL_REQUEST = 1, - MLX5_IPSEC_RESPONSE_SADB_ISSUE = 2, - MLX5_IPSEC_RESPONSE_WRITE_RESPONSE_ISSUE = 3, -}; - -enum mlx5_fpga_ipsec_sacmd_status { - MLX5_FPGA_IPSEC_SACMD_PENDING, - MLX5_FPGA_IPSEC_SACMD_SEND_FAIL, - MLX5_FPGA_IPSEC_SACMD_COMPLETE, +enum mlx5_fpga_ipsec_cmd_status { + MLX5_FPGA_IPSEC_CMD_PENDING, + MLX5_FPGA_IPSEC_CMD_SEND_FAIL, + MLX5_FPGA_IPSEC_CMD_COMPLETE, }; struct mlx5_ipsec_command_context { struct mlx5_fpga_dma_buf buf; - struct mlx5_accel_ipsec_sa sa; - enum mlx5_fpga_ipsec_sacmd_status status; + enum mlx5_fpga_ipsec_cmd_status status; + struct mlx5_ifc_fpga_ipsec_cmd_resp resp; int status_code; struct completion complete; struct mlx5_fpga_device *dev; struct list_head list; /* Item in pending_cmds */ + u8 command[0]; }; -struct mlx5_ipsec_sadb_resp { - __be32 syndrome; - __be32 sw_sa_handle; - u8 reserved[24]; -} __packed; - struct mlx5_fpga_ipsec { struct list_head pending_cmds; spinlock_t pending_cmds_lock; /* Protects pending_cmds */ @@ -105,21 +93,22 @@ static void mlx5_fpga_ipsec_send_complete(struct mlx5_fpga_conn *conn, buf); mlx5_fpga_warn(fdev, "IPSec command send failed with status %u\n", status); - context->status = MLX5_FPGA_IPSEC_SACMD_SEND_FAIL; + context->status = MLX5_FPGA_IPSEC_CMD_SEND_FAIL; complete(>complete); } } -static inline int syndrome_to_errno(enum mlx5_ipsec_response_syndrome syndrome) +static inline +int syndrome_to_errno(enum mlx5_ifc_fpga_ipsec_response_syndrome syndrome) { switch (syndrome) { - case MLX5_IPSEC_RESPONSE_SUCCESS: + case MLX5_FPGA_IPSEC_RESPONSE_SUCCESS: return 0; - case MLX5_IPSEC_RESPONSE_SADB_ISSUE: + case MLX5_FPGA_IPSEC_RESPONSE_SADB_ISSUE: return -EEXIST; - case MLX5_IPSEC_RESPONSE_ILLEGAL_REQUEST: + case MLX5_FPGA_IPSEC_RESPONSE_ILLEGAL_REQUEST: return -EINVAL; - case MLX5_IPSEC_RESPONSE_WRITE_RESPONSE_ISSUE: + case MLX5_FPGA_IPSEC_RESPONSE_WRITE_RESPONSE_ISSUE: return -EIO; } return -EIO; @@ -127,9 +116,9 @@ static inline int syndrome_to_errno(enum mlx5_ipsec_response_syndrome syndrome) static void mlx5_fpga_ipsec_recv(void *cb_arg, struct mlx5_fpga_dma_buf *buf) { - struct mlx5_ipsec_sadb_resp *resp = buf->sg[0].data; + struct mlx5_ifc_fpga_ipsec_cmd_resp *resp = buf->sg[0].data; struct mlx5_ipsec_command_context *context; - enum mlx5_ipsec_response_syndrome syndrome; + enum mlx5_ifc_fpga_ipsec_response_syndrome syndrome; struct mlx5_fpga_device *fdev = cb_arg; unsigned long flags; @@ -139,8 +128,8 @@ static void mlx5_fpga_ipsec_recv(void *cb_arg, struct mlx5_fpga_dma_buf *buf) return; } - mlx5_fpga_dbg(fdev, "mlx5_ipsec recv_cb syndrome %08x sa_id %x\n", - ntohl(resp->syndrome), ntohl(resp->sw_sa_handle)); + mlx5_fpga_dbg(fdev, "mlx5_ipsec recv_cb syndrome %08x\n", + ntohl(resp->syndrome)); spin_lock_irqsave(>ipsec->pending_cmds_lock, flags); context = list_first_entry_or_null(>ipsec->pending_cmds, @@ -156,51 +145,48 @@ static void mlx5_fpga_ipsec_recv(void *cb_arg, struct mlx5_fpga_dma_buf *buf) } mlx5_fpga_dbg(fdev, "Handling response for %p\n", context); - if (context->sa.sw_sa_handle != resp->sw_sa_handle) { - mlx5_fpga_err(fdev, "mismatch SA handle. cmd 0x%08x vs resp 0x%08x\n", - ntohl(context->sa.sw_sa_handle), - ntohl(resp->sw_sa_handle)); -
[for-next 08/11] net/mlx5: Add flow-steering commands for FPGA IPSec implementation
From: Aviad YehezkelIn order to add a context to the FPGA, we need to get both the software transform context (which includes the keys, etc) and the source/destination IPs (which are included in the steering rule). Therefore, we register new set of firmware like commands for the FPGA. Each time a rule is added, the steering core infrastructure calls the FPGA command layer. If the rule is intended for the FPGA, it combines the IPs information with the software transformation context and creates the respective hardware transform. Afterwards, it calls the standard steering command layer. Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c | 7 + .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 724 + .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h | 24 + drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 5 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 + include/linux/mlx5/accel.h | 5 + include/linux/mlx5/fs.h| 3 + 7 files changed, 770 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c index ab5bc82855fd..9f1b1939716a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c @@ -100,3 +100,10 @@ void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm) mlx5_fpga_esp_destroy_xfrm(xfrm); } EXPORT_SYMBOL_GPL(mlx5_accel_esp_destroy_xfrm); + +int mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, + const struct mlx5_accel_esp_xfrm_attrs *attrs) +{ + return mlx5_fpga_esp_modify_xfrm(xfrm, attrs); +} +EXPORT_SYMBOL_GPL(mlx5_accel_esp_modify_xfrm); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c index daae44c937f0..7b43fa269117 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c @@ -33,8 +33,12 @@ #include #include +#include +#include +#include #include "mlx5_core.h" +#include "fs_cmd.h" #include "fpga/ipsec.h" #include "fpga/sdk.h" #include "fpga/core.h" @@ -75,6 +79,12 @@ struct mlx5_fpga_esp_xfrm { struct mlx5_accel_esp_xfrm accel_xfrm; }; +struct mlx5_fpga_ipsec_rule { + struct rb_node node; + struct fs_fte *fte; + struct mlx5_fpga_ipsec_sa_ctx *ctx; +}; + static const struct rhashtable_params rhash_sa = { .key_len = FIELD_SIZEOF(struct mlx5_fpga_ipsec_sa_ctx, hw_sa), .key_offset = offsetof(struct mlx5_fpga_ipsec_sa_ctx, hw_sa), @@ -84,11 +94,15 @@ static const struct rhashtable_params rhash_sa = { }; struct mlx5_fpga_ipsec { + struct mlx5_fpga_device *fdev; struct list_head pending_cmds; spinlock_t pending_cmds_lock; /* Protects pending_cmds */ u32 caps[MLX5_ST_SZ_DW(ipsec_extended_cap)]; struct mlx5_fpga_conn *conn; + struct notifier_block fs_notifier_ingress_bypass; + struct notifier_block fs_notifier_egress; + /* Map hardware SA --> SA context * (mlx5_fpga_ipsec_sa) (mlx5_fpga_ipsec_sa_ctx) * We will use this hash to avoid SAs duplication in fpga which @@ -96,6 +110,12 @@ struct mlx5_fpga_ipsec { */ struct rhashtable sa_hash; /* hw_sa -> mlx5_fpga_ipsec_sa_ctx */ struct mutex sa_hash_lock; + + /* Tree holding all rules for this fpga device +* Key for searching a rule (mlx5_fpga_ipsec_rule) is (ft, id) +*/ + struct rb_root rules_rb; + struct mutex rules_rb_lock; /* rules lock */ }; static bool mlx5_fpga_is_ipsec_device(struct mlx5_core_dev *mdev) @@ -498,6 +518,127 @@ mlx5_fpga_ipsec_build_hw_sa(struct mlx5_core_dev *mdev, hw_sa->ipsec_sa_v1.flags |= MLX5_FPGA_IPSEC_SA_IPV6; } +static bool is_full_mask(const void *p, size_t len) +{ + WARN_ON(len % 4); + + return !memchr_inv(p, 0xff, len); +} + +static bool validate_fpga_full_mask(struct mlx5_core_dev *dev, + const u32 *match_c, + const u32 *match_v) +{ + const void *misc_params_c = MLX5_ADDR_OF(fte_match_param, +match_c, +misc_parameters); + const void *headers_c = MLX5_ADDR_OF(fte_match_param, +match_c, +outer_headers); + const void *headers_v = MLX5_ADDR_OF(fte_match_param, +match_v, +outer_headers); + +
[for-next 07/11] net/mlx5: Refactor accel IPSec code
From: Aviad YehezkelThe current code has one layer that executed FPGA commands and the Ethernet part directly used this code. Since downstream patches introduces support for IPSec in mlx5_ib, we need to provide some abstractions. This patch refactors the accel code into one layer that creates a software IPSec transformation and another one which creates the actual hardware context. The internal command implementation is now hidden in the FPGA core layer. The code also adds the ability to share FPGA hardware contexts. If two contexts are the same, only a reference count is taken. Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c | 58 +-- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h | 97 ++--- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 150 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 391 +++-- .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h | 55 ++- include/linux/mlx5/accel.h | 83 - include/linux/mlx5/mlx5_ifc_fpga.h | 59 7 files changed, 668 insertions(+), 225 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c index 375ba438e7cf..ab5bc82855fd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c @@ -37,27 +37,6 @@ #include "mlx5_core.h" #include "fpga/ipsec.h" -void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev, - struct mlx5_accel_ipsec_sa *cmd) -{ - int cmd_size; - - if (!MLX5_IPSEC_DEV(mdev)) - return ERR_PTR(-EOPNOTSUPP); - - if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_V2_CMD) - cmd_size = sizeof(*cmd); - else - cmd_size = sizeof(cmd->ipsec_sa_v1); - - return mlx5_fpga_ipsec_sa_cmd_exec(mdev, cmd, cmd_size); -} - -int mlx5_accel_ipsec_sa_cmd_wait(void *ctx) -{ - return mlx5_fpga_ipsec_sa_cmd_wait(ctx); -} - u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev) { return mlx5_fpga_ipsec_device_caps(mdev); @@ -75,6 +54,21 @@ int mlx5_accel_ipsec_counters_read(struct mlx5_core_dev *mdev, u64 *counters, return mlx5_fpga_ipsec_counters_read(mdev, counters, count); } +void *mlx5_accel_esp_create_hw_context(struct mlx5_core_dev *mdev, + struct mlx5_accel_esp_xfrm *xfrm, + const __be32 saddr[4], + const __be32 daddr[4], + const __be32 spi, bool is_ipv6) +{ + return mlx5_fpga_ipsec_create_sa_ctx(mdev, xfrm, saddr, daddr, +spi, is_ipv6); +} + +void mlx5_accel_esp_free_hw_context(void *context) +{ + mlx5_fpga_ipsec_delete_sa_ctx(context); +} + int mlx5_accel_ipsec_init(struct mlx5_core_dev *mdev) { return mlx5_fpga_ipsec_init(mdev); @@ -84,3 +78,25 @@ void mlx5_accel_ipsec_cleanup(struct mlx5_core_dev *mdev) { mlx5_fpga_ipsec_cleanup(mdev); } + +struct mlx5_accel_esp_xfrm * +mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, + const struct mlx5_accel_esp_xfrm_attrs *attrs, + u32 flags) +{ + struct mlx5_accel_esp_xfrm *xfrm; + + xfrm = mlx5_fpga_esp_create_xfrm(mdev, attrs, flags); + if (IS_ERR(xfrm)) + return xfrm; + + xfrm->mdev = mdev; + return xfrm; +} +EXPORT_SYMBOL_GPL(mlx5_accel_esp_create_xfrm); + +void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm) +{ + mlx5_fpga_esp_destroy_xfrm(xfrm); +} +EXPORT_SYMBOL_GPL(mlx5_accel_esp_destroy_xfrm); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h index 421ed71a029b..024dbd22a89b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h @@ -39,89 +39,20 @@ #ifdef CONFIG_MLX5_ACCEL -#define MLX5_IPSEC_SADB_IP_AH BIT(7) -#define MLX5_IPSEC_SADB_IP_ESP BIT(6) -#define MLX5_IPSEC_SADB_SA_VALIDBIT(5) -#define MLX5_IPSEC_SADB_SPI_EN BIT(4) -#define MLX5_IPSEC_SADB_DIR_SX BIT(3) -#define MLX5_IPSEC_SADB_IPV6BIT(2) - -enum { - MLX5_IPSEC_CMD_ADD_SA = 0, - MLX5_IPSEC_CMD_DEL_SA = 1, - MLX5_IPSEC_CMD_ADD_SA_V2 = 2, - MLX5_IPSEC_CMD_DEL_SA_V2 = 3, - MLX5_IPSEC_CMD_MOD_SA_V2 = 4, - MLX5_IPSEC_CMD_SET_CAP = 5, -}; - -enum mlx5_accel_ipsec_enc_mode { - MLX5_IPSEC_SADB_MODE_NONE = 0, - MLX5_IPSEC_SADB_MODE_AES_GCM_128_AUTH_128 = 1, - MLX5_IPSEC_SADB_MODE_AES_GCM_256_AUTH_128 = 3, -}; - #define MLX5_IPSEC_DEV(mdev)
[for-next 11/11] net/mlx5: Fix wrongly assigned CQ reference counter
From: Leon RomanovskyThe kernel compiled with CONFIG_REFCOUNT_FULL produces the following error. The reason to it that initial value of refcount_t is supposed to be more than 0, change it. [3.106634] [ cut here ] [3.107756] refcount_t: increment on 0; use-after-free. [3.109130] WARNING: CPU: 0 PID: 1 at lib/refcount.c:153 refcount_inc+0x27/0x30 [3.110085] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00028-gf683e04bdccc #137 [3.110085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [3.110085] RIP: 0010:refcount_inc+0x27/0x30 [3.110085] RSP: :aa62fba0 EFLAGS: 00010286 [3.110085] RAX: RBX: 9a6d1a1821c8 RCX: 98a50f48 [3.110085] RDX: 0001 RSI: 0086 RDI: 0246 [3.110085] RBP: 9a6d1ac800a0 R08: 0289 R09: 000a [3.110085] R10: f03bc0682840 R11: 9949856d R12: 9a6d1b4a4000 [3.110085] R13: R14: 9a6d1a0a6c00 R15: aa62fc5c [3.110085] FS: () GS:9a6d1fc0() knlGS: [3.110085] CS: 0010 DS: ES: CR0: 80050033 [3.110085] CR2: CR3: 0ba0a000 CR4: 06b0 [3.110085] Call Trace: [3.110085] mlx5_core_create_cq+0xde/0x250 [3.110085] ? __kmalloc+0x1ce/0x1e0 [3.110085] mlx5e_create_cq+0x15c/0x1e0 [3.110085] mlx5e_open_drop_rq+0xea/0x190 [3.110085] mlx5e_attach_netdev+0x53/0x140 [3.110085] mlx5e_attach+0x3d/0x60 [3.110085] mlx5e_add+0x11d/0x2f0 [3.110085] mlx5_add_device+0x77/0x170 [3.110085] mlx5_register_interface+0x74/0xc0 [3.110085] ? set_debug_rodata+0x11/0x11 [3.110085] init+0x67/0x72 [3.110085] ? mlx4_en_init_ptys2ethtool_map+0x346/0x346 [3.110085] do_one_initcall+0x98/0x147 [3.110085] ? set_debug_rodata+0x11/0x11 [3.110085] kernel_init_freeable+0x164/0x1e0 [3.110085] ? rest_init+0xb0/0xb0 [3.110085] kernel_init+0xa/0x100 [3.110085] ret_from_fork+0x35/0x40 [3.110085] Code: 00 00 00 00 e8 ab ff ff ff 84 c0 74 02 f3 c3 80 3d 3b c3 64 01 00 75 f5 48 c7 c7 68 0b 81 98 c6 05 2b c3 64 01 01 e8 79 d7 a3 ff <0f> ff c3 66 0f 1f 44 00 00 8b 06 83 f8 ff 74 39 31 c9 39 f8 89 [3.110085] ---[ end trace a0068e1c68438a74 ]--- Fixes: f105b45bf77c ("net/mlx5: CQ hold/put API") Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/cq.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c index 669ed16938b3..a4179122a279 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c @@ -109,8 +109,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, cq->cons_index = 0; cq->arm_sn = 0; cq->eq = eq; - refcount_set(>refcount, 0); - mlx5_cq_hold(cq); + refcount_set(>refcount, 1); init_completion(>free); if (!cq->comp) cq->comp = mlx5_add_cq_to_tasklet; -- 2.14.3
[for-next 10/11] net/mlx5: IPSec, Add support for ESN
From: Aviad YehezkelCurrently ESN is not supported with IPSec device offload. This patch adds ESN support to IPsec device offload. Implementing new xfrm device operation to synchronize offloading device ESN with xfrm received SN. New QP command to update SA state at the following: ESN 1ESN 2 ESN 3 |---*---|---*---|---* ^ ^ ^ ^ ^ ^ ^ - marks where QP command invoked to update the SA ESN state machine. | - marks the start of the ESN scope (0-2^32-1). At this point move SA ESN overlap bit to zero and increment ESN. * - marks the middle of the ESN scope (2^31). At this point move SA ESN overlap bit to one. Signed-off-by: Aviad Yehezkel Signed-off-by: Yossef Efraim Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 118 +++-- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 23 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 29 - .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h | 5 + .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 22 include/linux/mlx5/accel.h | 2 + include/linux/mlx5/mlx5_ifc_fpga.h | 2 + 7 files changed, 189 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index f5b1d60f96f5..cf58c9637904 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -38,18 +38,9 @@ #include #include "en.h" -#include "accel/ipsec.h" #include "en_accel/ipsec.h" #include "en_accel/ipsec_rxtx.h" -struct mlx5e_ipsec_sa_entry { - struct hlist_node hlist; /* Item in SADB_RX hashtable */ - unsigned int handle; /* Handle in SADB_RX */ - struct xfrm_state *x; - struct mlx5e_ipsec *ipsec; - struct mlx5_accel_esp_xfrm *xfrm; - void *hw_context; -}; static struct mlx5e_ipsec_sa_entry *to_ipsec_sa_entry(struct xfrm_state *x) { @@ -121,6 +112,40 @@ static void mlx5e_ipsec_sadb_rx_free(struct mlx5e_ipsec_sa_entry *sa_entry) ida_simple_remove(>halloc, sa_entry->handle); } +static bool mlx5e_ipsec_update_esn_state(struct mlx5e_ipsec_sa_entry *sa_entry) +{ + struct xfrm_replay_state_esn *replay_esn; + u32 seq_bottom; + u8 overlap; + u32 *esn; + + if (!(sa_entry->x->props.flags & XFRM_STATE_ESN)) { + sa_entry->esn_state.trigger = 0; + return false; + } + + replay_esn = sa_entry->x->replay_esn; + seq_bottom = replay_esn->seq - replay_esn->replay_window + 1; + overlap = sa_entry->esn_state.overlap; + + sa_entry->esn_state.esn = xfrm_replay_seqhi(sa_entry->x, + htonl(seq_bottom)); + esn = _entry->esn_state.esn; + + sa_entry->esn_state.trigger = 1; + if (unlikely(overlap && seq_bottom < MLX5E_IPSEC_ESN_SCOPE_MID)) { + ++(*esn); + sa_entry->esn_state.overlap = 0; + return true; + } else if (unlikely(!overlap && + (seq_bottom >= MLX5E_IPSEC_ESN_SCOPE_MID))) { + sa_entry->esn_state.overlap = 1; + return true; + } + + return false; +} + static void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry, struct mlx5_accel_esp_xfrm_attrs *attrs) @@ -152,6 +177,14 @@ mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry, /* iv len */ aes_gcm->icv_len = x->aead->alg_icv_len; + /* esn */ + if (sa_entry->esn_state.trigger) { + attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_TRIGGERED; + attrs->esn = sa_entry->esn_state.esn; + if (sa_entry->esn_state.overlap) + attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP; + } + /* rx handle */ attrs->sa_handle = sa_entry->handle; @@ -187,7 +220,9 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x) netdev_info(netdev, "Cannot offload compressed xfrm states\n"); return -EINVAL; } - if (x->props.flags & XFRM_STATE_ESN) { + if (x->props.flags & XFRM_STATE_ESN && + !(mlx5_accel_ipsec_device_caps(priv->mdev) & + MLX5_ACCEL_IPSEC_CAP_ESN)) { netdev_info(netdev, "Cannot offload ESN xfrm states\n"); return -EINVAL; } @@ -277,8 +312,14 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x) netdev_info(netdev, "Failed adding to SADB_RX: %d\n", err); goto err_entry;
[for-next 06/11] net/mlx5: Added required metadata capability for ipsec
From: Aviad YehezkelCurrently our device requires additional metadata in packet to perform ipsec crypto offload. Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 6 -- include/linux/mlx5/accel.h | 1 + 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c index e7e28277733d..8de992ba7230 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c @@ -256,10 +256,12 @@ u32 mlx5_fpga_ipsec_device_caps(struct mlx5_core_dev *mdev) struct mlx5_fpga_device *fdev = mdev->fpga; u32 ret = 0; - if (mlx5_fpga_is_ipsec_device(mdev)) + if (mlx5_fpga_is_ipsec_device(mdev)) { ret |= MLX5_ACCEL_IPSEC_CAP_DEVICE; - else + ret |= MLX5_ACCEL_IPSEC_CAP_REQUIRED_METADATA; + } else { return ret; + } if (!fdev->ipsec) return ret; diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h index 601280c782d3..b674af63689b 100644 --- a/include/linux/mlx5/accel.h +++ b/include/linux/mlx5/accel.h @@ -38,6 +38,7 @@ enum mlx5_accel_ipsec_caps { MLX5_ACCEL_IPSEC_CAP_DEVICE = 1 << 0, + MLX5_ACCEL_IPSEC_CAP_REQUIRED_METADATA = 1 << 1, MLX5_ACCEL_IPSEC_CAP_ESP= 1 << 2, MLX5_ACCEL_IPSEC_CAP_IPV6 = 1 << 3, MLX5_ACCEL_IPSEC_CAP_LSO= 1 << 4, -- 2.14.3
[for-next 05/11] net/mlx5: Export ipsec capabilities
From: Aviad YehezkelWe will need that for ipsec verbs. Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c | 3 +- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h | 14 +- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 9 ++-- .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 14 +++--- include/linux/mlx5/accel.h | 57 ++ 5 files changed, 73 insertions(+), 24 deletions(-) create mode 100644 include/linux/mlx5/accel.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c index b88ae12d9066..375ba438e7cf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c @@ -45,7 +45,7 @@ void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev, if (!MLX5_IPSEC_DEV(mdev)) return ERR_PTR(-EOPNOTSUPP); - if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_V2_CMD) + if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_V2_CMD) cmd_size = sizeof(*cmd); else cmd_size = sizeof(cmd->ipsec_sa_v1); @@ -62,6 +62,7 @@ u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev) { return mlx5_fpga_ipsec_device_caps(mdev); } +EXPORT_SYMBOL_GPL(mlx5_accel_ipsec_device_caps); unsigned int mlx5_accel_ipsec_counters_count(struct mlx5_core_dev *mdev) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h index 14a2e95e82c3..421ed71a029b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h @@ -35,18 +35,10 @@ #define __MLX5_ACCEL_IPSEC_H__ #include +#include #ifdef CONFIG_MLX5_ACCEL -enum { - MLX5_ACCEL_IPSEC_DEVICE = BIT(1), - MLX5_ACCEL_IPSEC_IPV6 = BIT(2), - MLX5_ACCEL_IPSEC_ESP = BIT(3), - MLX5_ACCEL_IPSEC_LSO = BIT(4), - MLX5_ACCEL_IPSEC_NO_TRAILER = BIT(5), - MLX5_ACCEL_IPSEC_V2_CMD = BIT(7), -}; - #define MLX5_IPSEC_SADB_IP_AH BIT(7) #define MLX5_IPSEC_SADB_IP_ESP BIT(6) #define MLX5_IPSEC_SADB_SA_VALIDBIT(5) @@ -70,7 +62,7 @@ enum mlx5_accel_ipsec_enc_mode { }; #define MLX5_IPSEC_DEV(mdev) (mlx5_accel_ipsec_device_caps(mdev) & \ - MLX5_ACCEL_IPSEC_DEVICE) + MLX5_ACCEL_IPSEC_CAP_DEVICE) struct mlx5_accel_ipsec_sa_v1 { __be32 cmd; @@ -126,8 +118,6 @@ void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev, */ int mlx5_accel_ipsec_sa_cmd_wait(void *context); -u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev); - unsigned int mlx5_accel_ipsec_counters_count(struct mlx5_core_dev *mdev); int mlx5_accel_ipsec_counters_read(struct mlx5_core_dev *mdev, u64 *counters, unsigned int count); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index a8c3fe7cff0f..6f4a01620cc3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -242,7 +242,8 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x) return -EINVAL; } if (x->props.family == AF_INET6 && - !(mlx5_accel_ipsec_device_caps(priv->mdev) & MLX5_ACCEL_IPSEC_IPV6)) { + !(mlx5_accel_ipsec_device_caps(priv->mdev) & +MLX5_ACCEL_IPSEC_CAP_IPV6)) { netdev_info(netdev, "IPv6 xfrm state offload is not supported by this device\n"); return -EINVAL; } @@ -375,7 +376,7 @@ int mlx5e_ipsec_init(struct mlx5e_priv *priv) ipsec->en_priv = priv; ipsec->en_priv->ipsec = ipsec; ipsec->no_trailer = !!(mlx5_accel_ipsec_device_caps(priv->mdev) & - MLX5_ACCEL_IPSEC_NO_TRAILER); + MLX5_ACCEL_IPSEC_CAP_RX_NO_TRAILER); netdev_dbg(priv->netdev, "IPSec attached to netdevice\n"); return 0; } @@ -422,7 +423,7 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv) if (!priv->ipsec) return; - if (!(mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_ESP) || + if (!(mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_ESP) || !MLX5_CAP_ETH(mdev, swp)) { mlx5_core_dbg(mdev, "mlx5e: ESP and SWP offload not supported\n"); return; @@ -441,7 +442,7 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv) netdev->features |= NETIF_F_HW_ESP_TX_CSUM; netdev->hw_enc_features |= NETIF_F_HW_ESP_TX_CSUM; - if (!(mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_LSO)
[for-next 09/11] net/mlx5e: Added common function for to_ipsec_sa_entry
From: Aviad YehezkelNew function for getting driver internal sa entry from xfrm state. All checks are done in one function. Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 29 ++ 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 59df3dbd2e65..f5b1d60f96f5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -51,6 +51,21 @@ struct mlx5e_ipsec_sa_entry { void *hw_context; }; +static struct mlx5e_ipsec_sa_entry *to_ipsec_sa_entry(struct xfrm_state *x) +{ + struct mlx5e_ipsec_sa_entry *sa; + + if (!x) + return NULL; + + sa = (struct mlx5e_ipsec_sa_entry *)x->xso.offload_handle; + if (!sa) + return NULL; + + WARN_ON(sa->x != x); + return sa; +} + struct xfrm_state *mlx5e_ipsec_sadb_rx_lookup(struct mlx5e_ipsec *ipsec, unsigned int handle) { @@ -312,28 +327,22 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x) static void mlx5e_xfrm_del_state(struct xfrm_state *x) { - struct mlx5e_ipsec_sa_entry *sa_entry; + struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x); - if (!x->xso.offload_handle) + if (!sa_entry) return; - sa_entry = (struct mlx5e_ipsec_sa_entry *)x->xso.offload_handle; - WARN_ON(sa_entry->x != x); - if (x->xso.flags & XFRM_OFFLOAD_INBOUND) mlx5e_ipsec_sadb_rx_del(sa_entry); } static void mlx5e_xfrm_free_state(struct xfrm_state *x) { - struct mlx5e_ipsec_sa_entry *sa_entry; + struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x); - if (!x->xso.offload_handle) + if (!sa_entry) return; - sa_entry = (struct mlx5e_ipsec_sa_entry *)x->xso.offload_handle; - WARN_ON(sa_entry->x != x); - if (sa_entry->hw_context) { mlx5_accel_esp_free_hw_context(sa_entry->hw_context); mlx5_accel_esp_destroy_xfrm(sa_entry->xfrm); -- 2.14.3
[for-next 04/11] net/mlx5: IPSec, Add command V2 support
From: Aviad YehezkelThis patch adds V2 command support. New fpga devices support extended features (udp encap, esn etc...), this features require new hardware sadb format therefore we have a new version of commands to manipulate it. Signed-off-by: Yossef Efraim Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c | 9 +++- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h | 21 ++-- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 60 ++ .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 11 ++-- .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h | 5 +- include/linux/mlx5/mlx5_ifc_fpga.h | 4 +- 6 files changed, 66 insertions(+), 44 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c index 53e69edaedde..b88ae12d9066 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c @@ -40,10 +40,17 @@ void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev, struct mlx5_accel_ipsec_sa *cmd) { + int cmd_size; + if (!MLX5_IPSEC_DEV(mdev)) return ERR_PTR(-EOPNOTSUPP); - return mlx5_fpga_ipsec_sa_cmd_exec(mdev, cmd); + if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_V2_CMD) + cmd_size = sizeof(*cmd); + else + cmd_size = sizeof(cmd->ipsec_sa_v1); + + return mlx5_fpga_ipsec_sa_cmd_exec(mdev, cmd, cmd_size); } int mlx5_accel_ipsec_sa_cmd_wait(void *ctx) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h index 4da9611a753d..14a2e95e82c3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h @@ -44,6 +44,7 @@ enum { MLX5_ACCEL_IPSEC_ESP = BIT(3), MLX5_ACCEL_IPSEC_LSO = BIT(4), MLX5_ACCEL_IPSEC_NO_TRAILER = BIT(5), + MLX5_ACCEL_IPSEC_V2_CMD = BIT(7), }; #define MLX5_IPSEC_SADB_IP_AH BIT(7) @@ -56,6 +57,9 @@ enum { enum { MLX5_IPSEC_CMD_ADD_SA = 0, MLX5_IPSEC_CMD_DEL_SA = 1, + MLX5_IPSEC_CMD_ADD_SA_V2 = 2, + MLX5_IPSEC_CMD_DEL_SA_V2 = 3, + MLX5_IPSEC_CMD_MOD_SA_V2 = 4, MLX5_IPSEC_CMD_SET_CAP = 5, }; @@ -68,7 +72,7 @@ enum mlx5_accel_ipsec_enc_mode { #define MLX5_IPSEC_DEV(mdev) (mlx5_accel_ipsec_device_caps(mdev) & \ MLX5_ACCEL_IPSEC_DEVICE) -struct mlx5_accel_ipsec_sa { +struct mlx5_accel_ipsec_sa_v1 { __be32 cmd; u8 key_enc[32]; u8 key_auth[32]; @@ -88,10 +92,19 @@ struct mlx5_accel_ipsec_sa { __be32 sw_sa_handle; __be16 tfclen; u8 enc_mode; - u8 sip_masklen; - u8 dip_masklen; + u8 reserved1[2]; u8 flags; - u8 reserved[2]; + u8 reserved2[2]; +}; + +struct mlx5_accel_ipsec_sa { + struct mlx5_accel_ipsec_sa_v1 ipsec_sa_v1; + __be16 udp_sp; + __be16 udp_dp; + u8 reserved1[4]; + __be32 esn; + __be16 vid; /* only 12 bits, rest is reserved */ + __be16 reserved2; } __packed; /** diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 460a613059fe..a8c3fe7cff0f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -133,50 +133,46 @@ static void mlx5e_ipsec_build_hw_sa(u32 op, struct mlx5e_ipsec_sa_entry *sa_entr memset(hw_sa, 0, sizeof(*hw_sa)); - if (op == MLX5_IPSEC_CMD_ADD_SA) { - crypto_data_len = (x->aead->alg_key_len + 7) / 8; - key_len = crypto_data_len - 4; /* 4 bytes salt at end */ - aead = x->data; - geniv_ctx = crypto_aead_ctx(aead); - ivsize = crypto_aead_ivsize(aead); - - memcpy(_sa->key_enc, x->aead->alg_key, key_len); - /* Duplicate 128 bit key twice according to HW layout */ - if (key_len == 16) - memcpy(_sa->key_enc[16], x->aead->alg_key, key_len); - memcpy(_sa->gcm.salt_iv, geniv_ctx->salt, ivsize); - hw_sa->gcm.salt = *((__be32 *)(x->aead->alg_key + key_len)); - } - - hw_sa->cmd = htonl(op); - hw_sa->flags |= MLX5_IPSEC_SADB_SA_VALID | MLX5_IPSEC_SADB_SPI_EN; + crypto_data_len = (x->aead->alg_key_len + 7) / 8; + key_len = crypto_data_len - 4; /* 4 bytes salt at end */ + aead = x->data; + geniv_ctx = crypto_aead_ctx(aead); + ivsize = crypto_aead_ivsize(aead); + + memcpy(_sa->ipsec_sa_v1.key_enc, x->aead->alg_key, key_len); + /*
[for-next 01/11] net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps
Fix build break of mlx5_accel_ipsec_device_caps is not defined when MLX5_ACCEL is not selected, use MLX5_IPSEC_DEV instead which handles such case. Signed-off-by: Saeed MahameedReported-by: Doug Ledford --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 4e456c292ce4..f836e6b76f65 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -2642,8 +2642,7 @@ int mlx5_init_fs(struct mlx5_core_dev *dev) goto err; } - if (mlx5_accel_ipsec_device_caps(steering->dev) & - MLX5_ACCEL_IPSEC_DEVICE) { + if (MLX5_IPSEC_DEV(dev)) { err = init_egress_root_ns(steering); if (err) goto err; -- 2.14.3
Re: [PATCH] net: remove VLA usage
From: Laszlo TothDate: Thu, 8 Mar 2018 01:19:53 +0100 > Separated snmp_seq_show_tcp_udp() to tcp and udp variants, > so the usage of max_t() for the array size can be emitted. > > Signed-off-by: Laszlo Toth But it's a max on a constant value, computed at compile time. I don't see at all why this is necessary. If the compiler can't figure this out, fix it. If the compiler warns on this with -Wvla, fix it. Because there is no reason to have two separate routines for this. Thank you.
[for-next 03/11] net/mlx5e: IPSec, Add support for ESP trailer removal by hardware
From: Yossi KupermanCurrent hardware decrypts and authenticates incoming ESP packets. Subsequently, the software extracts the nexthdr field, truncates the trailer and adjusts csum accordingly. With this patch and a capable device, the trailer is being removed by the hardware and the nexthdr field is conveyed via PET. This way we avoid both the need to access the trailer (cache miss) and to compute its relative checksum, which significantly improve the performance. Experiment shows that trailer removal improves the performance by 2Gbps, (netperf). Both forwarding and host-to-host configurations. Signed-off-by: Yossi Kuperman Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h | 2 + .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 2 + .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 1 + .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 10 +++- .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 54 ++ include/linux/mlx5/mlx5_ifc_fpga.h | 13 +- 6 files changed, 80 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h index 67cda8871f5a..4da9611a753d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h @@ -43,6 +43,7 @@ enum { MLX5_ACCEL_IPSEC_IPV6 = BIT(2), MLX5_ACCEL_IPSEC_ESP = BIT(3), MLX5_ACCEL_IPSEC_LSO = BIT(4), + MLX5_ACCEL_IPSEC_NO_TRAILER = BIT(5), }; #define MLX5_IPSEC_SADB_IP_AH BIT(7) @@ -55,6 +56,7 @@ enum { enum { MLX5_IPSEC_CMD_ADD_SA = 0, MLX5_IPSEC_CMD_DEL_SA = 1, + MLX5_IPSEC_CMD_SET_CAP = 5, }; enum mlx5_accel_ipsec_enc_mode { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 1b49afca65c0..460a613059fe 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -378,6 +378,8 @@ int mlx5e_ipsec_init(struct mlx5e_priv *priv) ida_init(>halloc); ipsec->en_priv = priv; ipsec->en_priv->ipsec = ipsec; + ipsec->no_trailer = !!(mlx5_accel_ipsec_device_caps(priv->mdev) & + MLX5_ACCEL_IPSEC_NO_TRAILER); netdev_dbg(priv->netdev, "IPSec attached to netdevice\n"); return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h index 56e00baf16cc..bffc3ed0574a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h @@ -77,6 +77,7 @@ struct mlx5e_ipsec_stats { struct mlx5e_ipsec { struct mlx5e_priv *en_priv; DECLARE_HASHTABLE(sadb_rx, MLX5E_IPSEC_SADB_RX_BITS); + bool no_trailer; spinlock_t sadb_rx_lock; /* Protects sadb_rx and halloc */ struct ida halloc; struct mlx5e_ipsec_sw_stats sw_stats; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c index 6a7c8b04447e..64c549a06678 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c @@ -42,10 +42,11 @@ enum { MLX5E_IPSEC_RX_SYNDROME_DECRYPTED = 0x11, MLX5E_IPSEC_RX_SYNDROME_AUTH_FAILED = 0x12, + MLX5E_IPSEC_RX_SYNDROME_BAD_PROTO = 0x17, }; struct mlx5e_ipsec_rx_metadata { - unsigned char reserved; + unsigned char nexthdr; __be32 sa_handle; } __packed; @@ -301,10 +302,17 @@ mlx5e_ipsec_build_sp(struct net_device *netdev, struct sk_buff *skb, switch (mdata->syndrome) { case MLX5E_IPSEC_RX_SYNDROME_DECRYPTED: xo->status = CRYPTO_SUCCESS; + if (likely(priv->ipsec->no_trailer)) { + xo->flags |= XFRM_ESP_NO_TRAILER; + xo->proto = mdata->content.rx.nexthdr; + } break; case MLX5E_IPSEC_RX_SYNDROME_AUTH_FAILED: xo->status = CRYPTO_TUNNEL_ESP_AUTH_FAILED; break; + case MLX5E_IPSEC_RX_SYNDROME_BAD_PROTO: + xo->status = CRYPTO_INVALID_PROTOCOL; + break; default: atomic64_inc(>ipsec->sw_stats.ipsec_rx_drop_syndrome); return NULL; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c index e0f32b025e06..3b10d46dc821 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c @@ -273,6 +273,9
[pull request][for-next 00/11] Mellanox, mlx5 IPSec updates 2018-02-28-2 (Part 2)
Hi Dave and Doug, This series includes shared code updates (IPSec part2) for mlx5 core driver for both netdev and rdma subsystems. This series should be pulled to both trees so we can continue netdev and rdma specific submissions separately. Mainly it includes two fixes for previous pull requests: 1. net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps - Fixes Build issue when MLX5_ACCEL is not selected 2. net/mlx5: Fix wrongly assigned CQ reference counter - Fixes a call trace warning when CONFIG_REFCOUNT_FULL is selected And IPsec ESN netdev and user-space foundation and support, For more information please see tag log below. The series doesn't cause any conflict with the latest mlx5 rc fixes. Thanks, Saeed. --- The following changes since commit e810bf5e96e327500cc6334f9d56c8047aaabcff: net/mlx5: Flow steering cmd interface should get the fte when deleting (2018-03-06 22:20:15 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git tags/mlx5-updates-2018-02-28-2 for you to fetch changes up to 31135eb3887daf2ed3e88fbefc36243357a9008f: net/mlx5: Fix wrongly assigned CQ reference counter (2018-03-07 15:54:36 -0800) mlx5-updates-2018-02-28-2 (IPSec-2) This series follows our previous one to lay out the foundations for IPSec in user-space and extend current kernel netdev IPSec support. As noted in our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)", the IPSec mechanism will be supported through our flow steering mechanism. Therefore, we need to change the initialization order. Furthermore, IPsec is also supported in both egress and ingress. Since our current flow steering is egress only, we add an empty (only implemented through FPGA steering ops) egress namespace to handle that case. We also implement the required flow steering callbacks and logic in our FPGA driver. We extend the FPGA support for ESN and modifying a xfrm too. Therefore, we add support for some new FPGA command interface that supports them. The other required bits are added too. The new features and requirements are advertised via cap bits. Last but not least, we revise our driver's accel_esp API. This API will be shared between our netdev and IB driver, so we need to have all the required functionality from both worlds. Regards, Aviad and Matan Aviad Yehezkel (7): net/mlx5: IPSec, Add command V2 support net/mlx5: Export ipsec capabilities net/mlx5: Added required metadata capability for ipsec net/mlx5: Refactor accel IPSec code net/mlx5: Add flow-steering commands for FPGA IPSec implementation net/mlx5e: Added common function for to_ipsec_sa_entry net/mlx5: IPSec, Add support for ESN Leon Romanovsky (1): net/mlx5: Fix wrongly assigned CQ reference counter Saeed Mahameed (1): net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps Yossi Kuperman (2): net/mlx5: IPSec, Generalize sandbox QP commands net/mlx5e: IPSec, Add support for ESP trailer removal by hardware .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c | 59 +- .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h | 96 +- drivers/net/ethernet/mellanox/mlx5/core/cq.c |3 +- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 306 +++-- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 24 + .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 39 +- .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h |5 + .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 1280 +++- .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h | 76 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |8 +- drivers/net/ethernet/mellanox/mlx5/core/main.c |2 + include/linux/mlx5/accel.h | 144 +++ include/linux/mlx5/fs.h|3 + include/linux/mlx5/mlx5_ifc_fpga.h | 92 +- 14 files changed, 1858 insertions(+), 279 deletions(-) create mode 100644 include/linux/mlx5/accel.h
Re: [PATCH 0/3] ibmvnic: Clean up net close and fix reset bug
From: Thomas FalconDate: Wed, 7 Mar 2018 17:43:06 -0600 > Crud, this series is meant for the net-next tree, but I forgot to > include it in the patch tag. I know I'm really stubborn about a lot of things :-), but next time you can just reply like this explaining things and that is enough for me. Thanks.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Mon, Mar 05, 2018 at 05:34:57PM -0800, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] So meta :) The URL refers an lwn article, which in turn refers to this effort's first RFC. As someone only getting *one* of these patches in emails, It would be useful if the cover letter referenced instead an optional git tree and branch so one could easily get the patches for more careful inspection. In the meantime, can you make such tree available with a branch? Also, since kernel/module.c is involved it would be wise to include Jessica, which I've Cc'd. I'm going to guess Kees may like to review this too. Mimi might want to help review as well. Rafael may care about suspend/resume implications of these "umh modules" as you put it. > the request_module() > code is extended to allow user mode helpers to be invoked. Upon inspection you never touch or use request_module() so this is false and misleading. You don't use or extend request_module() at all, you rely on extending finit_module() so that load_module() itself will now execute (via modified do_execve_file()) the same file which was loaded as an module. This is *very* different given request_module() has its own full magic and is in and of itself a UMH of the kernel implemented in kernel/kmod.c. Nevertheless, why? > Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. It sounds like finit_module() module loading is used as a convenience factor simply to take advantage of being able to ship / maintain/ compile these umh programs as part of the kernel. Is that right? So the finit_module() interface was just a convenient mechanism. Is that right? Ie, if folks had these binaries in place the regular UMH interface / API could be used so that these could be looked for, but instead we want to carry these in tandem with the kernel? If so this still seems like an overly complex way to deal with this. > Thus, allow request_module() logic to load such > user mode helper (umh) modules via: > > request_module("foo") -> > call_umh("modprobe foo") -> > sys_finit_module(FD of /lib/modules/.../foo.ko) -> > call_umh(struct file) OK so the use case envisioned here was for networking code to do something like: if (!loaded) { err = request_module("bpfilter"); ... } This is visible on your third patch (this is from your RFC, not this series): https://www.mail-archive.com/netfilter-devel@vger.kernel.org/msg11129.html So indeed all this patch does in the end is just putting tons of wrappers in place so that kernel code can load certain trusted UMH programs we ship, and maintain in the kernel. request_module() has its own world though too. How often in your proof of concept is request_module() called? How many times do you envision it being called? Please review lib/test_kmod.c and tools/testing/selftests/kmod/ for testing your stuff too or consider extending appropriately. Are aliases something which you expect we'll need to support for these userspace... modules? > Such approach enables kernel to delegate functionality traditionally done > by kernel modules into user space processes (either root or !root) and > reduces security attack surface of such new code, meaning in case of > potential bugs only the umh would crash but not the kernel. Now, this sounds great, however I think that the proof of concept chosen is pretty complex to start off with. Even if its not designed to be a real world life use case, a much simpler proof of concept to do something more simple may be useful, if possible. One wouldn't need to to have it replace a kernel functionality in real life. lib/ is full of CONFIG_TEST_* examples, a simple new stupid kernel functionality which can in turn be replaced with a respective userspace counterpart may be useful, and both kconfig entries would be mutually exclusive. > Another > advantage coming with that would be that bpfilter.ko You mean foo.ko > can be debugged and > tested out of user space as well (e.g. opening the possibility to run > all clang sanitizers, fuzzers or test suites for checking translation). Great too. > Also, such architecture makes the kernel/user boundary very precise: > control plane is done by the user space while data plane stays in the kernel. I don't see how this is defining any boundary, I see just a loader for a userspace program, and re-using a kernel interface known, finit_module() which makes it convenient for us to load pre-compiled kernel junk. I'm still not convinced this is the right approach. > It's easy to distinguish "umh module" from traditional kernel module: Ah you said it, "umh module". I don't see what makes it a "umh module" so far, all we are doing is executing a
pull-request: bpf 2018-03-08
Hi David, The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix various BPF helpers which adjust the skb and its GSO information with regards to SCTP GSO. The latter is a special case where gso_size is of value GSO_BY_FRAGS, so mangling that will end up corrupting the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s). 2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined due to too old kernel headers in the system, from Jiri. 3) Increase the number of x64 JIT passes in order to allow larger images to converge instead of punting them to interpreter or having them rejected when the interpreter is not built into the kernel, from Daniel. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! The following changes since commit 4a0c7191c75e94b052855bf44d94bfe97403: Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf (2018-03-02 20:32:15 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git for you to fetch changes up to 6007b080d2e2adb7af22bf29165f0594ea12b34c: bpf, x64: increase number of passes (2018-03-07 14:47:24 -0800) Daniel Axtens (1): bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs Daniel Borkmann (1): bpf, x64: increase number of passes Jiri Benc (1): tools: bpftool: fix compilation with older headers Documentation/networking/segmentation-offloads.txt | 11 +++- arch/x86/net/bpf_jit_comp.c| 3 +- include/linux/skbuff.h | 22 net/core/filter.c | 60 +++--- tools/bpf/bpftool/common.c | 4 ++ 5 files changed, 79 insertions(+), 21 deletions(-)
[next-queue PATCH v4 6/8] igb: Add MAC address support for ethtool nftuple filters
This adds the capability of configuring the queue steering of arriving packets based on their source and destination MAC addresses. In practical terms this adds support for the following use cases, characterized by these examples: $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0 (this will direct packets with destination address "aa:aa:aa:aa:aa:aa" to the RX queue 0) $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3 (this will direct packets with source address "44:44:44:44:44:44" to the RX queue 3) Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb_ethtool.c | 35 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 94fc9a4bed8b..3f98299d4cd0 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -2494,6 +2494,23 @@ static int igb_get_ethtool_nfc_entry(struct igb_adapter *adapter, fsp->h_ext.vlan_tci = rule->filter.vlan_tci; fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK); } + if (rule->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) { + ether_addr_copy(fsp->h_u.ether_spec.h_dest, + rule->filter.dst_addr); + /* As we only support matching by the full +* mask, return the mask to userspace +*/ + eth_broadcast_addr(fsp->m_u.ether_spec.h_dest); + } + if (rule->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) { + ether_addr_copy(fsp->h_u.ether_spec.h_source, + rule->filter.src_addr); + /* As we only support matching by the full +* mask, return the mask to userspace +*/ + eth_broadcast_addr(fsp->m_u.ether_spec.h_source); + } + return 0; } return -EINVAL; @@ -2932,10 +2949,6 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter *adapter, if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW) return -EINVAL; - if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK && - fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) - return -EINVAL; - input = kzalloc(sizeof(*input), GFP_KERNEL); if (!input) return -ENOMEM; @@ -2945,6 +2958,20 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter *adapter, input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE; } + /* Only support matching addresses by the full mask */ + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) { + input->filter.match_flags |= IGB_FILTER_FLAG_SRC_MAC_ADDR; + ether_addr_copy(input->filter.src_addr, + fsp->h_u.ether_spec.h_source); + } + + /* Only support matching addresses by the full mask */ + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) { + input->filter.match_flags |= IGB_FILTER_FLAG_DST_MAC_ADDR; + ether_addr_copy(input->filter.dst_addr, + fsp->h_u.ether_spec.h_dest); + } + if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) { if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) { err = -EINVAL; -- 2.16.2
[next-queue PATCH v3 0/8] igb: offloading of receive filters
Hi, Changes from v3: - Addressed review comments from Aaron F. Brown and Jakub Kicinski; Changes from v2: - Addressed review comments from Jakub Kicinski, mostly about coding style adjustments and more consistent error reporting; Changes from v1: - Addressed review comments from Alexander Duyck and Florian Fainelli; - Adding and removing cls_flower filters are now proposed in the same patch; - cls_flower filters are kept in a separated list from "ethtool" filters (so that section of the original cover letter is no longer valid); - The patch adding support for ethtool filters is now independent from the rest of the series; Original cover letter: This series enables some ethtool and tc-flower filters to be offloaded to igb-based network controllers. This is useful when the system configurator want to steer kinds of traffic to a specific hardware queue. The first two commits are bug fixes. The basis of this series is to export the internal API used to configure address filters, so they can be used by ethtool, and extending the functionality so an source address can be handled. Then, we enable the tc-flower offloading implementation to re-use the same infrastructure as ethtool, and storing them in the per-adapter "nfc" (Network Filter Config?) list. But for consistency, for destructive access they are separated, i.e. an filter added by tc-flower can only be removed by tc-flower, but ethtool can read them all. Only support for VLAN Prio, Source and Destination MAC Address, and Ethertype is enabled for now. Open question: - igb is initialized with the number of traffic classes as 1, if we want to use multiple traffic classes we need to increase this value, the only way I could find is to use mqprio (for example). Should igb be initialized with, say, the number of queues as its "num_tc"? Vinicius Costa Gomes (8): igb: Fix not adding filter elements to the list igb: Fix queue selection on MAC filters on i210 and i211 igb: Enable the hardware traffic class feature bit for igb models igb: Add support for MAC address filters specifying source addresses igb: Enable nfc filters to specify MAC addresses igb: Add MAC address support for ethtool nftuple filters igb: Add the skeletons for tc-flower offloading igb: Add support for adding offloaded clsflower filters drivers/net/ethernet/intel/igb/e1000_defines.h | 2 + drivers/net/ethernet/intel/igb/igb.h | 12 + drivers/net/ethernet/intel/igb/igb_ethtool.c | 65 +- drivers/net/ethernet/intel/igb/igb_main.c | 306 - 4 files changed, 371 insertions(+), 14 deletions(-) -- 2.16.2
[next-queue PATCH v4 4/8] igb: Add support for MAC address filters specifying source addresses
Makes it possible to direct packets to queues based on their source address. Documents the expected usage of the 'flags' parameter. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/e1000_defines.h | 1 + drivers/net/ethernet/intel/igb/igb.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 38 ++ 3 files changed, 35 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index 573bf177fd08..c6f552de30dd 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -490,6 +490,7 @@ * manageability enabled, allowing us room for 15 multicast addresses. */ #define E1000_RAH_AV 0x8000/* Receive descriptor valid */ +#define E1000_RAH_ASEL_SRC_ADDR 0x0001 #define E1000_RAH_QSEL_ENABLE 0x1000 #define E1000_RAL_MAC_ADDR_LEN 4 #define E1000_RAH_MAC_ADDR_LEN 2 diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 55d6f17d5799..4501b28ff7c5 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -473,6 +473,7 @@ struct igb_mac_addr { #define IGB_MAC_STATE_DEFAULT 0x1 #define IGB_MAC_STATE_IN_USE 0x2 +#define IGB_MAC_STATE_SRC_ADDR 0x4 /* board specific private data structure */ struct igb_adapter { diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 8684d5ed56e1..76969467de31 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6843,8 +6843,13 @@ static void igb_set_default_mac_filter(struct igb_adapter *adapter) igb_rar_set_index(adapter, 0); } -static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, - const u8 queue) +/* Add a MAC filter for 'addr' directing matching traffic to 'queue', + * 'flags' is used to indicate what kind of match is made, match is by + * default for the destination address, if matching by source address + * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used. + */ +static int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr, + const u8 queue, const u8 flags) { struct e1000_hw *hw = >hw; int rar_entries = hw->mac.rar_entry_count - @@ -6864,7 +6869,7 @@ static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, ether_addr_copy(adapter->mac_table[i].addr, addr); adapter->mac_table[i].queue = queue; - adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE; + adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE | flags; igb_rar_set_index(adapter, i); return i; @@ -6873,8 +6878,20 @@ static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, return -ENOSPC; } -static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, +static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, const u8 queue) +{ + return igb_add_mac_filter_flags(adapter, addr, queue, 0); +} + +/* Remove a MAC filter for 'addr' directing matching traffic to + * 'queue', 'flags' is used to indicate what kind of match need to be + * removed, match is by default for the destination address, if + * matching by source address is to be removed the flag + * IGB_MAC_STATE_SRC_ADDR can be used. + */ +static int igb_del_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr, + const u8 queue, const u8 flags) { struct e1000_hw *hw = >hw; int rar_entries = hw->mac.rar_entry_count - @@ -6891,12 +6908,14 @@ static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, for (i = 0; i < rar_entries; i++) { if (!(adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE)) continue; + if ((adapter->mac_table[i].state & flags) != flags) + continue; if (adapter->mac_table[i].queue != queue) continue; if (!ether_addr_equal(adapter->mac_table[i].addr, addr)) continue; - adapter->mac_table[i].state &= ~IGB_MAC_STATE_IN_USE; + adapter->mac_table[i].state = 0; memset(adapter->mac_table[i].addr, 0, ETH_ALEN); adapter->mac_table[i].queue = 0; @@ -6907,6 +6926,12 @@ static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, return -ENOENT; } +static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, + const u8 queue) +{ + return igb_del_mac_filter_flags(adapter, addr, queue, 0); +} + static int
[next-queue PATCH v4 7/8] igb: Add the skeletons for tc-flower offloading
This adds basic functions needed to implement offloading for filters created by tc-flower. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb_main.c | 66 +++ 1 file changed, 66 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 04307ef07e5a..7762dd5f270d 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -2497,6 +2498,69 @@ static int igb_offload_cbs(struct igb_adapter *adapter, return 0; } +static int igb_configure_clsflower(struct igb_adapter *adapter, + struct tc_cls_flower_offload *cls_flower) +{ + return -EOPNOTSUPP; +} + +static int igb_delete_clsflower(struct igb_adapter *adapter, + struct tc_cls_flower_offload *cls_flower) +{ + return -EOPNOTSUPP; +} + +static int igb_setup_tc_cls_flower(struct igb_adapter *adapter, + struct tc_cls_flower_offload *cls_flower) +{ + switch (cls_flower->command) { + case TC_CLSFLOWER_REPLACE: + return igb_configure_clsflower(adapter, cls_flower); + case TC_CLSFLOWER_DESTROY: + return igb_delete_clsflower(adapter, cls_flower); + case TC_CLSFLOWER_STATS: + return -EOPNOTSUPP; + default: + return -EINVAL; + } +} + +static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data, +void *cb_priv) +{ + struct igb_adapter *adapter = cb_priv; + + if (!tc_cls_can_offload_and_chain0(adapter->netdev, type_data)) + return -EOPNOTSUPP; + + switch (type) { + case TC_SETUP_CLSFLOWER: + return igb_setup_tc_cls_flower(adapter, type_data); + + default: + return -EOPNOTSUPP; + } +} + +static int igb_setup_tc_block(struct igb_adapter *adapter, + struct tc_block_offload *f) +{ + if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) + return -EOPNOTSUPP; + + switch (f->command) { + case TC_BLOCK_BIND: + return tcf_block_cb_register(f->block, igb_setup_tc_block_cb, +adapter, adapter); + case TC_BLOCK_UNBIND: + tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb, + adapter); + return 0; + default: + return -EOPNOTSUPP; + } +} + static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type, void *type_data) { @@ -2505,6 +2569,8 @@ static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type, switch (type) { case TC_SETUP_QDISC_CBS: return igb_offload_cbs(adapter, type_data); + case TC_SETUP_BLOCK: + return igb_setup_tc_block(adapter, type_data); default: return -EOPNOTSUPP; -- 2.16.2
[next-queue PATCH v4 8/8] igb: Add support for adding offloaded clsflower filters
This allows filters added by tc-flower and specifying MAC addresses, Ethernet types, and the VLAN priority field, to be offloaded to the controller. This reuses most of the infrastructure used by ethtool, but clsflower filters are kept in a separated list, so they are invisible to ethtool. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb.h | 2 + drivers/net/ethernet/intel/igb/igb_main.c | 188 +- 2 files changed, 188 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 3b310b16e1d1..1874c4635d54 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -464,6 +464,7 @@ struct igb_nfc_input { struct igb_nfc_filter { struct hlist_node nfc_node; struct igb_nfc_input filter; + unsigned long cookie; u16 etype_reg_index; u16 sw_idx; u16 action; @@ -602,6 +603,7 @@ struct igb_adapter { /* RX network flow classification support */ struct hlist_head nfc_filter_list; + struct hlist_head cls_flower_list; unsigned int nfc_filter_count; /* lock for RX network flow classification filter */ spinlock_t nfc_lock; diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 7762dd5f270d..d7d86a7955bd 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2498,16 +2498,197 @@ static int igb_offload_cbs(struct igb_adapter *adapter, return 0; } +#define ETHER_TYPE_FULL_MASK ((__force __be16)~0) +#define VLAN_PRIO_FULL_MASK (0x07) + +static int igb_parse_cls_flower(struct igb_adapter *adapter, + struct tc_cls_flower_offload *f, + int traffic_class, + struct igb_nfc_filter *input) +{ + struct netlink_ext_ack *extack = f->common.extack; + + if (f->dissector->used_keys & + ~(BIT(FLOW_DISSECTOR_KEY_BASIC) | + BIT(FLOW_DISSECTOR_KEY_CONTROL) | + BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | + BIT(FLOW_DISSECTOR_KEY_VLAN))) { + NL_SET_ERR_MSG_MOD(extack, + "Unsupported key used, only BASIC, CONTROL, ETH_ADDRS and VLAN are supported"); + return -EOPNOTSUPP; + } + + if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) { + struct flow_dissector_key_eth_addrs *key, *mask; + + key = skb_flow_dissector_target(f->dissector, + FLOW_DISSECTOR_KEY_ETH_ADDRS, + f->key); + mask = skb_flow_dissector_target(f->dissector, +FLOW_DISSECTOR_KEY_ETH_ADDRS, +f->mask); + + if (!is_zero_ether_addr(mask->dst)) { + if (!is_broadcast_ether_addr(mask->dst)) { + NL_SET_ERR_MSG_MOD(extack, "Only full masks are supported for destination MAC address"); + return -EINVAL; + } + + input->filter.match_flags |= + IGB_FILTER_FLAG_DST_MAC_ADDR; + ether_addr_copy(input->filter.dst_addr, key->dst); + } + + if (!is_zero_ether_addr(mask->src)) { + if (!is_broadcast_ether_addr(mask->src)) { + NL_SET_ERR_MSG_MOD(extack, "Only full masks are supported for source MAC address"); + return -EINVAL; + } + + input->filter.match_flags |= + IGB_FILTER_FLAG_SRC_MAC_ADDR; + ether_addr_copy(input->filter.src_addr, key->src); + } + } + + if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) { + struct flow_dissector_key_basic *key, *mask; + + key = skb_flow_dissector_target(f->dissector, + FLOW_DISSECTOR_KEY_BASIC, + f->key); + mask = skb_flow_dissector_target(f->dissector, +FLOW_DISSECTOR_KEY_BASIC, +f->mask); + + if (mask->n_proto) { + if (mask->n_proto != ETHER_TYPE_FULL_MASK) { + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for EtherType filter"); + return -EINVAL; + } + + input->filter.match_flags |= IGB_FILTER_FLAG_ETHER_TYPE; +
[next-queue PATCH v4 1/8] igb: Fix not adding filter elements to the list
Because the order of the parameters passes to 'hlist_add_behind()' was inverted, the 'parent' node was added "behind" the 'input', as input is not in the list, this causes the 'input' node to be lost. Fixes: 0e71def25281 ("igb: add support of RX network flow classification") Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 606e6761758f..143f0bb34e4d 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -2864,7 +2864,7 @@ static int igb_update_ethtool_nfc_entry(struct igb_adapter *adapter, /* add filter to the list */ if (parent) - hlist_add_behind(>nfc_node, >nfc_node); + hlist_add_behind(>nfc_node, >nfc_node); else hlist_add_head(>nfc_node, >nfc_filter_list); -- 2.16.2
[next-queue PATCH v4 3/8] igb: Enable the hardware traffic class feature bit for igb models
This will allow functionality depending on the hardware being traffic class aware to work. In particular the tc-flower offloading checks verifies that this bit is set. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index eabedc6b6518..8684d5ed56e1 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2806,6 +2806,9 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (hw->mac.type >= e1000_82576) netdev->features |= NETIF_F_SCTP_CRC; + if (hw->mac.type >= e1000_i350) + netdev->features |= NETIF_F_HW_TC; + #define IGB_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \ NETIF_F_GSO_GRE_CSUM | \ NETIF_F_GSO_IPXIP4 | \ -- 2.16.2
[next-queue PATCH v4 5/8] igb: Enable nfc filters to specify MAC addresses
This allows igb_add_filter()/igb_erase_filter() to work on filters that include MAC addresses (both source and destination). For now, this only exposes the functionality, the next commit glues ethtool into this. Later in this series, these APIs are used to allow offloading of cls_flower filters. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb.h | 9 + drivers/net/ethernet/intel/igb/igb_ethtool.c | 28 drivers/net/ethernet/intel/igb/igb_main.c| 8 3 files changed, 41 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 4501b28ff7c5..3b310b16e1d1 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -441,6 +441,8 @@ struct hwmon_buff { enum igb_filter_match_flags { IGB_FILTER_FLAG_ETHER_TYPE = 0x1, IGB_FILTER_FLAG_VLAN_TCI = 0x2, + IGB_FILTER_FLAG_SRC_MAC_ADDR = 0x4, + IGB_FILTER_FLAG_DST_MAC_ADDR = 0x8, }; #define IGB_MAX_RXNFC_FILTERS 16 @@ -455,6 +457,8 @@ struct igb_nfc_input { u8 match_flags; __be16 etype; __be16 vlan_tci; + u8 src_addr[ETH_ALEN]; + u8 dst_addr[ETH_ALEN]; }; struct igb_nfc_filter { @@ -739,4 +743,9 @@ int igb_add_filter(struct igb_adapter *adapter, int igb_erase_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input); +int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr, +const u8 queue, const u8 flags); +int igb_del_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr, +const u8 queue, const u8 flags); + #endif /* _IGB_H_ */ diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 143f0bb34e4d..94fc9a4bed8b 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -2775,6 +2775,25 @@ int igb_add_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input) return err; } + if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) { + err = igb_add_mac_filter_flags(adapter, + input->filter.dst_addr, + input->action, 0); + err = min_t(int, err, 0); + if (err) + return err; + } + + if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) { + err = igb_add_mac_filter_flags(adapter, + input->filter.src_addr, + input->action, + IGB_MAC_STATE_SRC_ADDR); + err = min_t(int, err, 0); + if (err) + return err; + } + if (input->filter.match_flags & IGB_FILTER_FLAG_VLAN_TCI) err = igb_rxnfc_write_vlan_prio_filter(adapter, input); @@ -2823,6 +2842,15 @@ int igb_erase_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input) igb_clear_vlan_prio_filter(adapter, ntohs(input->filter.vlan_tci)); + if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) + igb_del_mac_filter_flags(adapter, input->filter.src_addr, +input->action, +IGB_MAC_STATE_SRC_ADDR); + + if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) + igb_del_mac_filter_flags(adapter, input->filter.dst_addr, +input->action, 0); + return 0; } diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 76969467de31..04307ef07e5a 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6848,8 +6848,8 @@ static void igb_set_default_mac_filter(struct igb_adapter *adapter) * default for the destination address, if matching by source address * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used. */ -static int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr, - const u8 queue, const u8 flags) +int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr, +const u8 queue, const u8 flags) { struct e1000_hw *hw = >hw; int rar_entries = hw->mac.rar_entry_count - @@ -6890,8 +6890,8 @@ static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, * matching by source address is to be removed the flag * IGB_MAC_STATE_SRC_ADDR can be used. */ -static int igb_del_mac_filter_flags(struct igb_adapter
[next-queue PATCH v4 2/8] igb: Fix queue selection on MAC filters on i210 and i211
On the RAH registers there are semantic differences on the meaning of the "queue" parameter for traffic steering depending on the controller model: there is the 82575 meaning, which "queue" means a RX Hardware Queue, and the i350 meaning, where it is a reception pool. The previous behaviour was having no effect for i210 and i211 based controllers because the QSEL bit of the RAH register wasn't being set. This patch separates the condition in discrete cases, so the different handling is clearer. Fixes: 83c21335c876 ("igb: improve MAC filter handling") Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/e1000_defines.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 15 +++ 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index 83cabff1e0ab..573bf177fd08 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -490,6 +490,7 @@ * manageability enabled, allowing us room for 15 multicast addresses. */ #define E1000_RAH_AV 0x8000/* Receive descriptor valid */ +#define E1000_RAH_QSEL_ENABLE 0x1000 #define E1000_RAL_MAC_ADDR_LEN 4 #define E1000_RAH_MAC_ADDR_LEN 2 #define E1000_RAH_POOL_MASK 0x03FC diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 715bb32e6901..eabedc6b6518 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -8747,12 +8747,19 @@ static void igb_rar_set_index(struct igb_adapter *adapter, u32 index) if (is_valid_ether_addr(addr)) rar_high |= E1000_RAH_AV; - if (hw->mac.type == e1000_82575) + switch (hw->mac.type) { + case e1000_82575: + case e1000_i210: + case e1000_i211: + rar_high |= E1000_RAH_QSEL_ENABLE; rar_high |= E1000_RAH_POOL_1 * - adapter->mac_table[index].queue; - else + adapter->mac_table[index].queue; + break; + default: rar_high |= E1000_RAH_POOL_1 << - adapter->mac_table[index].queue; + adapter->mac_table[index].queue; + break; + } } wr32(E1000_RAL(index), rar_low); -- 2.16.2
[PATCH] net: remove VLA usage
Separated snmp_seq_show_tcp_udp() to tcp and udp variants, so the usage of max_t() for the array size can be emitted. Signed-off-by: Laszlo Toth--- net/ipv4/proc.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index dc5edc8..67f7d76 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -46,8 +46,6 @@ #include #include -#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX) - /* * Report socket allocation statistics [m...@utu.fi] */ @@ -398,13 +396,13 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, void *v) return 0; } -static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) +static int snmp_seq_show_tcp(struct seq_file *seq, void *v) { - unsigned long buff[TCPUDP_MIB_MAX]; + unsigned long buff[TCP_MIB_MAX]; struct net *net = seq->private; int i; - memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); + memset(buff, 0, TCP_MIB_MAX * sizeof(unsigned long)); seq_puts(seq, "\nTcp:"); for (i = 0; snmp4_tcp_list[i].name; i++) @@ -421,7 +419,16 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) seq_printf(seq, " %lu", buff[i]); } - memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); + return 0; +} + +static int snmp_seq_show_udp(struct seq_file *seq, void *v) +{ + unsigned long buff[UDP_MIB_MAX]; + struct net *net = seq->private; + int i; + + memset(buff, 0, UDP_MIB_MAX * sizeof(unsigned long)); snmp_get_cpu_field_batch(buff, snmp4_udp_list, net->mib.udp_statistics); @@ -432,7 +439,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) for (i = 0; snmp4_udp_list[i].name; i++) seq_printf(seq, " %lu", buff[i]); - memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); + memset(buff, 0, UDP_MIB_MAX * sizeof(unsigned long)); /* the UDP and UDP-Lite MIBs are the same */ seq_puts(seq, "\nUdpLite:"); @@ -455,7 +462,8 @@ static int snmp_seq_show(struct seq_file *seq, void *v) icmp_put(seq); /* RFC 2011 compatibility */ icmpmsg_put(seq); - snmp_seq_show_tcp_udp(seq, v); + snmp_seq_show_tcp(seq, v); + snmp_seq_show_udp(seq, v); return 0; } -- 2.7.4
RE: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address support for ethtool nftuple filters
Hi, "Brown, Aaron F"writes: >> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On >> Behalf Of Vinicius Costa Gomes >> Sent: Tuesday, March 6, 2018 5:30 PM >> To: intel-wired-...@lists.osuosl.org >> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus > palen...@intel.com> >> Subject: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address >> support for ethtool nftuple filters >> >> This adds the capability of configuring the queue steering of arriving >> packets based on their source and destination MAC addresses. >> >> In practical terms this adds support for the following use cases, >> characterized by these examples: >> >> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0 >> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa" >> to the RX queue 0) >> >> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3 >> (this will direct packets with destination address "44:44:44:44:44:44" >> to the RX queue 3) > > I assume this example should read "... source address" rather than > "...destination". Ugh, yeah. Will be fixed on v4. Thank you, -- Vinicius
[RESEND PATCH 0/3 net-next] ibmvnic: Clean up net close and fix reset bug
This patch set cleans up and reorganizes the driver's net_device close function and leverages that to fix up a bug that can occur during some device resets. Some reset cases require the backing adapter to be disabled before continuing, but other cases, such as during a device failover or partition migration, do not require this step. Since the device will not be initialized at this stage and its command-processing queue is closed, do not send the request to disable the device as it could result in an error or timeout disrupting the reset. Thomas Falcon (3): ibmvnic: Clean up device close ibmvnic: Reorganize device close ibmvnic: Do not disable device during failover or partition migration drivers/net/ethernet/ibm/ibmvnic.c | 48 ++ 1 file changed, 23 insertions(+), 25 deletions(-) -- 1.8.3.1
[RESEND PATCH 1/3 net-next] ibmvnic: Clean up device close
Remove some dead code now that RX pools are being cleaned. This was included to wait until any pending RX queue interrupts are processed, but NAPI polling should be disabled by this point. Another minor change is to use the net device parameter for any print functions instead of accessing it from the adapter structure. Signed-off-by: Thomas Falcon--- drivers/net/ethernet/ibm/ibmvnic.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 7654071..fca0533 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1162,7 +1162,7 @@ static int __ibmvnic_close(struct net_device *netdev) if (adapter->tx_scrq) { for (i = 0; i < adapter->req_tx_queues; i++) if (adapter->tx_scrq[i]->irq) { - netdev_dbg(adapter->netdev, + netdev_dbg(netdev, "Disabling tx_scrq[%d] irq\n", i); disable_irq(adapter->tx_scrq[i]->irq); } @@ -1174,18 +1174,8 @@ static int __ibmvnic_close(struct net_device *netdev) if (adapter->rx_scrq) { for (i = 0; i < adapter->req_rx_queues; i++) { - int retries = 10; - - while (pending_scrq(adapter, adapter->rx_scrq[i])) { - retries--; - mdelay(100); - - if (retries == 0) - break; - } - if (adapter->rx_scrq[i]->irq) { - netdev_dbg(adapter->netdev, + netdev_dbg(netdev, "Disabling rx_scrq[%d] irq\n", i); disable_irq(adapter->rx_scrq[i]->irq); } -- 1.8.3.1
[RESEND PATCH 3/3 net-next] ibmvnic: Do not disable device during failover or partition migration
During a device failover or partition migration reset, it is not necessary to disable the backing adapter since it should not be running yet and its Command-Response Queue is closed. Sending device commands during this time could result in an error or timeout disrupting the reset process. In these cases, just halt transmissions, clean up resources, and continue with reset. Signed-off-by: Thomas Falcon--- drivers/net/ethernet/ibm/ibmvnic.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d93f286..7be4b06 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1653,12 +1653,15 @@ static int do_reset(struct ibmvnic_adapter *adapter, rc = ibmvnic_reenable_crq_queue(adapter); if (rc) return 0; + ibmvnic_cleanup(netdev); + } else if (rwi->reset_reason == VNIC_RESET_FAILOVER) { + ibmvnic_cleanup(netdev); + } else { + rc = __ibmvnic_close(netdev); + if (rc) + return rc; } - rc = __ibmvnic_close(netdev); - if (rc) - return rc; - if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM || adapter->wait_for_reset) { release_resources(adapter); -- 1.8.3.1
[RESEND PATCH 2/3 net-next] ibmvnic: Reorganize device close
Introduce a function to halt network operations and clean up any unused or outstanding socket buffers. Then, during device close, disable backing adapter before halting all queues and performing cleanup. This ensures all backing device operations will be stopped before the driver cleans up shared resources. Signed-off-by: Thomas Falcon--- drivers/net/ethernet/ibm/ibmvnic.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index fca0533..d93f286 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1143,14 +1143,11 @@ static void clean_tx_pools(struct ibmvnic_adapter *adapter) } } -static int __ibmvnic_close(struct net_device *netdev) +static void ibmvnic_cleanup(struct net_device *netdev) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); - int rc = 0; int i; - adapter->state = VNIC_CLOSING; - /* ensure that transmissions are stopped if called by do_reset */ if (adapter->resetting) netif_tx_disable(netdev); @@ -1168,10 +1165,6 @@ static int __ibmvnic_close(struct net_device *netdev) } } - rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN); - if (rc) - return rc; - if (adapter->rx_scrq) { for (i = 0; i < adapter->req_rx_queues; i++) { if (adapter->rx_scrq[i]->irq) { @@ -1183,8 +1176,20 @@ static int __ibmvnic_close(struct net_device *netdev) } clean_rx_pools(adapter); clean_tx_pools(adapter); +} + +static int __ibmvnic_close(struct net_device *netdev) +{ + struct ibmvnic_adapter *adapter = netdev_priv(netdev); + int rc = 0; + + adapter->state = VNIC_CLOSING; + rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN); + if (rc) + return rc; + ibmvnic_cleanup(netdev); adapter->state = VNIC_CLOSED; - return rc; + return 0; } static int ibmvnic_close(struct net_device *netdev) -- 1.8.3.1
Re: [PATCH net 4/5] tcp: prevent bogus undos when SACK is not enabled
On Wed, Mar 7, 2018 at 12:19 PM, Neal Cardwellwrote: > > On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen > wrote: > > A bogus undo may/will trigger when the loss recovery state is > > kept until snd_una is above high_seq. If tcp_any_retrans_done > > is zero, retrans_stamp is cleared in this transient state. On > > the next ACK, tcp_try_undo_recovery again executes and > > tcp_may_undo will always return true because tcp_packet_delayed > > has this condition: > > return !tp->retrans_stamp || ... > > > > Check for the false fast retransmit transient condition in > > tcp_packet_delayed to avoid bogus undos. Since snd_una may have > > advanced on this ACK but CA state still remains unchanged, > > prior_snd_una needs to be passed instead of tp->snd_una. > > This one also seems like a case where it would be nice to have a > specific packet-by-packet example, or trace, or packetdrill scenario. > Something that we might be able to translate into a test, or at least > to document the issue more explicitly. I am hesitate for further logic to make undo "perfect" on non-sack cases b/c undo is very complicated and SACK is extremely well-supported today. so a trace to demonstrate how severe this issue is appreciated. > > Thanks! > neal
Re: [PATCH 0/3] ibmvnic: Clean up net close and fix reset bug
On 03/07/2018 05:41 PM, Thomas Falcon wrote: > This patch set cleans up and reorganizes the driver's net_device > close function and leverages that to fix up a bug that can occur > during some device resets. Some reset cases require the backing > adapter to be disabled before continuing, but other cases, such as > during a device failover or partition migration, do not require this > step. Since the device will not be initialized at this stage and > its command-processing queue is closed, do not send the request to > disable the device as it could result in an error or timeout > disrupting the reset. > > Thomas Falcon (3): > ibmvnic: Clean up device close > ibmvnic: Reorganize device close > ibmvnic: Do not disable device during failover or partition migration > > drivers/net/ethernet/ibm/ibmvnic.c | 48 > ++ > 1 file changed, 23 insertions(+), 25 deletions(-) > Crud, this series is meant for the net-next tree, but I forgot to include it in the patch tag.
[PATCH 3/3] ibmvnic: Do not disable device during failover or partition migration
During a device failover or partition migration reset, it is not necessary to disable the backing adapter since it should not be running yet and its Command-Response Queue is closed. Sending device commands during this time could result in an error or timeout disrupting the reset process. In these cases, just halt transmissions, clean up resources, and continue with reset. Signed-off-by: Thomas Falcon--- drivers/net/ethernet/ibm/ibmvnic.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d93f286..7be4b06 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1653,12 +1653,15 @@ static int do_reset(struct ibmvnic_adapter *adapter, rc = ibmvnic_reenable_crq_queue(adapter); if (rc) return 0; + ibmvnic_cleanup(netdev); + } else if (rwi->reset_reason == VNIC_RESET_FAILOVER) { + ibmvnic_cleanup(netdev); + } else { + rc = __ibmvnic_close(netdev); + if (rc) + return rc; } - rc = __ibmvnic_close(netdev); - if (rc) - return rc; - if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM || adapter->wait_for_reset) { release_resources(adapter); -- 1.8.3.1
[PATCH 0/3] ibmvnic: Clean up net close and fix reset bug
This patch set cleans up and reorganizes the driver's net_device close function and leverages that to fix up a bug that can occur during some device resets. Some reset cases require the backing adapter to be disabled before continuing, but other cases, such as during a device failover or partition migration, do not require this step. Since the device will not be initialized at this stage and its command-processing queue is closed, do not send the request to disable the device as it could result in an error or timeout disrupting the reset. Thomas Falcon (3): ibmvnic: Clean up device close ibmvnic: Reorganize device close ibmvnic: Do not disable device during failover or partition migration drivers/net/ethernet/ibm/ibmvnic.c | 48 ++ 1 file changed, 23 insertions(+), 25 deletions(-) -- 1.8.3.1
[PATCH 2/3] ibmvnic: Reorganize device close
Introduce a function to halt network operations and clean up any unused or outstanding socket buffers. Then, during device close, disable backing adapter before halting all queues and performing cleanup. This ensures all backing device operations will be stopped before the driver cleans up shared resources. Signed-off-by: Thomas Falcon--- drivers/net/ethernet/ibm/ibmvnic.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index fca0533..d93f286 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1143,14 +1143,11 @@ static void clean_tx_pools(struct ibmvnic_adapter *adapter) } } -static int __ibmvnic_close(struct net_device *netdev) +static void ibmvnic_cleanup(struct net_device *netdev) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); - int rc = 0; int i; - adapter->state = VNIC_CLOSING; - /* ensure that transmissions are stopped if called by do_reset */ if (adapter->resetting) netif_tx_disable(netdev); @@ -1168,10 +1165,6 @@ static int __ibmvnic_close(struct net_device *netdev) } } - rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN); - if (rc) - return rc; - if (adapter->rx_scrq) { for (i = 0; i < adapter->req_rx_queues; i++) { if (adapter->rx_scrq[i]->irq) { @@ -1183,8 +1176,20 @@ static int __ibmvnic_close(struct net_device *netdev) } clean_rx_pools(adapter); clean_tx_pools(adapter); +} + +static int __ibmvnic_close(struct net_device *netdev) +{ + struct ibmvnic_adapter *adapter = netdev_priv(netdev); + int rc = 0; + + adapter->state = VNIC_CLOSING; + rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN); + if (rc) + return rc; + ibmvnic_cleanup(netdev); adapter->state = VNIC_CLOSED; - return rc; + return 0; } static int ibmvnic_close(struct net_device *netdev) -- 1.8.3.1
[PATCH 1/3] ibmvnic: Clean up device close
Remove some dead code now that RX pools are being cleaned. This was included to wait until any pending RX queue interrupts are processed, but NAPI polling should be disabled by this point. Another minor change is to use the net device parameter for any print functions instead of accessing it from the adapter structure. Signed-off-by: Thomas Falcon--- drivers/net/ethernet/ibm/ibmvnic.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 7654071..fca0533 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1162,7 +1162,7 @@ static int __ibmvnic_close(struct net_device *netdev) if (adapter->tx_scrq) { for (i = 0; i < adapter->req_tx_queues; i++) if (adapter->tx_scrq[i]->irq) { - netdev_dbg(adapter->netdev, + netdev_dbg(netdev, "Disabling tx_scrq[%d] irq\n", i); disable_irq(adapter->tx_scrq[i]->irq); } @@ -1174,18 +1174,8 @@ static int __ibmvnic_close(struct net_device *netdev) if (adapter->rx_scrq) { for (i = 0; i < adapter->req_rx_queues; i++) { - int retries = 10; - - while (pending_scrq(adapter, adapter->rx_scrq[i])) { - retries--; - mdelay(100); - - if (retries == 0) - break; - } - if (adapter->rx_scrq[i]->irq) { - netdev_dbg(adapter->netdev, + netdev_dbg(netdev, "Disabling rx_scrq[%d] irq\n", i); disable_irq(adapter->rx_scrq[i]->irq); } -- 1.8.3.1
[PATCHv2 net-next] openvswitch: fix vport packet length check.
When sending a packet to a tunnel device, the dev's hard_header_len could be larger than the skb->len in function packet_length(). In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen, which is around 180, and an ARP packet sent to this tunnel has skb->len = 42. This causes the 'unsign int length' to become super large because it is negative value, causing the later ovs_vport_send to drop it due to over-mtu size. The patch fixes it by setting it to 0. Signed-off-by: William Tu--- v1->v2: replace the return type from unsigned int to int --- net/openvswitch/vport.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c index b6c8524032a0..f81c1d0ddff4 100644 --- a/net/openvswitch/vport.c +++ b/net/openvswitch/vport.c @@ -464,10 +464,10 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb, return 0; } -static unsigned int packet_length(const struct sk_buff *skb, - struct net_device *dev) +static int packet_length(const struct sk_buff *skb, +struct net_device *dev) { - unsigned int length = skb->len - dev->hard_header_len; + int length = skb->len - dev->hard_header_len; if (!skb_vlan_tag_present(skb) && eth_type_vlan(skb->protocol)) @@ -478,7 +478,7 @@ static unsigned int packet_length(const struct sk_buff *skb, * account for 802.1ad. e.g. is_skb_forwardable(). */ - return length; + return length > 0 ? length : 0; } void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto) -- 2.7.4
Re: linux-next: manual merge of the selinux tree with the net-next tree
On Wed, Mar 7, 2018 at 3:26 PM, David Millerwrote: > From: Paul Moore > Date: Wed, 7 Mar 2018 15:20:33 -0500 > >>> So you would only have to wait until my tree went in before >>> sending your pull request. >> >> So you would want me to rebase selinux/next on top of Linus' tree in >> the middle of the merge window? I'm sure that isn't what you meant, >> but that's how I keep reading the above ... which can't be right, >> because in my experience that's one way to piss off Linus. Help me >> understand what you are saying. > > I never said you rebase anything. I wonder where you get that from. As I said, I was just trying to figure out what you were suggesting. Your email was not very clear in my opinion. > I'm saying, you just defer your pull request until Linus takes my > networking tree in. > > No changes or rebasing of your tree is necessary whatsoever. You just > ask him to pull your tree as-is. > > Again, this is what other smaller subsystem trees do when they have a > situation like this. Which gets us back to what I originally suggested in my first email of this thread: linux-next carries the fixup patch and when we send the pull requests to Linus we mention this fixup/thread. For what it's worth, if you mention the potential merge conflict, and the fixup that Stephen provided, it shouldn't matter when the pull requests are sent to Linus; he's a smart guy, he'll merge things in the order he wants. I've seen more than a few people get burned by deferring pull requests, I don't intend to have SELinux, or audit for that matter, run into the same problem. -- paul moore www.paul-moore.com
Re: [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE
From: Eric DumazetDate: Wed, 07 Mar 2018 08:43:19 -0800 > From: Eric Dumazet > > Kirill found that recently added synchronize_rcu() call in > ip6mr_sk_done() > was slowing down netns dismantle and posted a patch to use it only if > the socket > was found. > > I instead suggested to get rid of this call, and use instead > SOCK_RCU_FREE > > We might later change IPv4 side to use the same technique and unify > both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu() > that could be replaced by SOCK_RCU_FREE. > > Tested: > time for i in {1..1000}; do unshare -n /bin/false;done > > Before : real 7m18.911s > After : real 10.187s > > Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based") > Signed-off-by: Eric Dumazet > Reported-by: Kirill Tkhai > Cc: Yuval Mintz Looks great, applied, thanks everyone.
Re: [pull request][for-next V2 00/13] Mellanox, mlx5 IPSec updates 2018-02-28-1
On Wed, 2018-03-07 at 15:57 -0500, Doug Ledford wrote: > On Wed, 2018-03-07 at 15:41 -0500, Doug Ledford wrote: > > On Wed, 2018-03-07 at 15:31 -0500, David Miller wrote: > > > From: Saeed Mahameed> > > Date: Tue, 6 Mar 2018 22:35:03 -0800 > > > > > > > This series includes shared code updates for mlx5 core driver > > > > for both > > > > netdev and rdma subsystems. This series should be pulled to > > > > both > > > > trees so we can continue netdev and rdma specific submissions > > > > separately. > > > > > > > > For more information please see tag log below. > > > > > > > > The series doesn't cause any conflict with the latest mlx5 rc > > > > fixes. > > > > > > > > v1->v2: > > > > - Drop sparse fixes patch > > > > - Updated commit message of "net/mlx5: Add has_tag to > > > > mlx5_flow_act" > > > > - Add const to static mlx5_flow_cmd structs where needed. > > > > > > Pulled, thanks Saeed. > > > > Thanks, pulled here as well. > > > > Just FYI, > > My .config might have been in an unreasonable state (I had jumped > from > for-next to for-rc, built a kernel which ran a make oldconfig, then > jumped back to for-next and tried to build with this series applied > but > without making any changes to the .config file), but I got a build > error. My .config had both innova and the new ipsec accelerator > turned > off or something like that, and I got this error: > > CC [M] drivers/net/ethernet/mellanox/mlx5/core/en_main.o > drivers/net/ethernet/mellanox/mlx5/core/fs_core.c: In function > ‘mlx5_init_fs’: > drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2645:6: error: > implicit declaration of function ‘mlx5_accel_ipsec_device_caps’; did > you > mean ‘mlx5_accel_ipsec_cleanup’? [-Werror=implicit-function- > declaration] > if (mlx5_accel_ipsec_device_caps(steering->dev) & > ^~~~ > mlx5_accel_ipsec_cleanup > CC [M] drivers/net/ethernet/silan/sc92031.o > CC [M] drivers/w1/slaves/w1_smem.o > CC [M] drivers/net/ethernet/mellanox/mlx5/core/en_common.o > CC [M] drivers/net/ethernet/sfc/falcon/nic.o > drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2646:6: error: > ‘MLX5_ACCEL_IPSEC_DEVICE’ undeclared (first use in this function); > did > you mean ‘__MLX5_ACCEL_IPSEC_H__’? > MLX5_ACCEL_IPSEC_DEVICE) { > ^~~ > __MLX5_ACCEL_IPSEC_H__ > drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2646:6: note: each > undeclared identifier is reported only once for each function it > appears > in > CC [M] drivers/w1/w1.o > > Running make config and enabling innova support and ipsec accelerator > support fixed it. > hmm, i think there is a missing include somewhere, will check this out now and send the fix in the next pull request, later today or tomorrow. Thanks Doug !
Re: [PATCH net-next 0/2] RDS: zerocopy code enhancements
From: Sowmini VaradhanDate: Tue, 6 Mar 2018 07:22:32 -0800 > A couple of enhancements to the rds zerocop code > - patch 1 refactors rds_message_copy_from_user to pull the zcopy logic > into its own function > - patch 2 drops the usage sk_buff to track MSG_ZEROCOPY cookies and > uses a simple linked list (enhancement suggested by willemb during > code review) Series applied, thanks.
Re: [RFC v3 net-next 08/18] net: SO_TXTIME: Add clockid and drop_if_late params
From: Eric DumazetDate: Wed, 07 Mar 2018 14:45:45 -0800 > No, we need to be extra careful. +1
Re: [RFC v3 iproute2 3/3] tc: Add support for the TBS Qdisc
On Wed, 7 Mar 2018 14:29:23 -0800 Jesus Sanchez-Palenciawrote: > Hi, > > > On 03/06/2018 05:51 PM, Stephen Hemminger wrote: > > On Tue, 6 Mar 2018 17:16:08 -0800 > > Jesus Sanchez-Palencia wrote: > > > >> atic int tbs_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) > >> +{ > >> + struct rtattr *tb[TCA_TBS_MAX+1]; > >> + struct tc_tbs_qopt *qopt; > >> + > >> + if (opt == NULL) > >> + return 0; > >> + > >> + parse_rtattr_nested(tb, TCA_TBS_MAX, opt); > >> + > >> + if (tb[TCA_TBS_PARMS] == NULL) > >> + return -1; > >> + > >> + qopt = RTA_DATA(tb[TCA_TBS_PARMS]); > >> + if (RTA_PAYLOAD(tb[TCA_TBS_PARMS]) < sizeof(*qopt)) > >> + return -1; > >> + > >> + fprintf(f, "clockid "); > >> + if (qopt->clockid == CLOCKID_INVALID) > >> + fprintf(f, "invalid "); > >> + else > >> + fprintf(f, "%d ", qopt->clockid); > >> + > >> + fprintf(f, "delta %d ", qopt->delta); > >> + fprintf(f, "offload %s ", (qopt->flags & TC_TBS_OFFLOAD_ON) ? > >> + "on" : "off"); > >> + fprintf(f, "sorting %s", (qopt->flags & TC_TBS_SORTING_ON) ? > >> + "on" : "off"); > >> + > >> + return 0; > >> +} > > > > All new print code in iproute2 should support JSON output. > > Look at other code using json_print.h for simple way to handle this. > > > > > Fixed, thanks. I'm assuming that only applies to print code from print_qopt() > implementations. Please let me know if otherwise. > Yes. that is what gets invoked by 'tc qdisc show'. Not everything is updated, but want to get there soon.
[PATCH] net: phy: Move interrupt check from phy_check to phy_interrupt
If multiple phys share the same interrupt (e.g. a multi-phy chip), the first device registered is the only one checked as phy_interrupt will always return IRQ_HANDLED if the first phydev is not halted. Move the interrupt check into phy_interrupt and, if it was not this phydev, return IRQ_NONE to allow other devices on this irq a chance to check if it was their interrupt. Signed-off-by: Brad Mouring--- drivers/net/phy/phy.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index e3e29c2b028b..ff1aa815568f 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -632,6 +632,12 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat) if (PHY_HALTED == phydev->state) return IRQ_NONE;/* It can't be ours. */ + if (phy_interrupt_is_valid(phydev)) { + if (phydev->drv->did_interrupt && + !phydev->drv->did_interrupt(phydev)) + return IRQ_NONE; + } + phy_change(phydev); return IRQ_HANDLED; @@ -725,16 +731,6 @@ EXPORT_SYMBOL(phy_stop_interrupts); */ void phy_change(struct phy_device *phydev) { - if (phy_interrupt_is_valid(phydev)) { - if (phydev->drv->did_interrupt && - !phydev->drv->did_interrupt(phydev)) - return; - - if (phydev->state == PHY_HALTED) - if (phy_disable_interrupts(phydev)) - goto phy_err; - } - mutex_lock(>lock); if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state)) phydev->state = PHY_CHANGELINK; -- 2.16.2
Re: [PATCH bpf] bpf, x64: increase number of passes
On Wed, Mar 07, 2018 at 10:10:01PM +0100, Daniel Borkmann wrote: > In Cilium some of the main programs we run today are hitting 9 passes > on x64's JIT compiler, and we've had cases already where we surpassed > the limit where the JIT then punts the program to the interpreter > instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON > or insertion failures due to the prog array owner being JITed but the > program to insert not (both must have the same JITed/non-JITed property). > > One concrete case the program image shrunk from 12,767 bytes down to > 10,288 bytes where the image converged after 16 steps. I've measured > that this took 340us in the JIT until it converges on my i7-6600U. Thus, > increase the original limit we had from day one where the JIT covered > cBPF only back then before we run into the case (as similar with the > complexity limit) where we trip over this and hit program rejections. > Also add a cond_resched() into the compilation loop, the JIT process > runs without any locks and may sleep anyway. > > Signed-off-by: Daniel Borkmann> Acked-by: Alexei Starovoitov Applied to bpf tree, Thanks Daniel!
Re: [RFC v3 net-next 08/18] net: SO_TXTIME: Add clockid and drop_if_late params
On Wed, 2018-03-07 at 13:52 -0800, Jesus Sanchez-Palencia wrote: > Hi, ... > I should have mentioned on the commit msg, but the tc_drop_if_late is > actually > filling a 1 bit hole that was already there. > > > > > > Do we really need 32 bits for a clockid_t ? > > There is a 2 bytes hole just after tc_index, so a u16 clockid would > fit > perfectly without increasing the skbuffs size / cachelines any > further. > > From Richard's reply, it seems safe to just change the definition > here if we > make it explicit on the SCM_CLOCKID documentation the caveat about > the max > possible fd count for dynamic clocks. > > How does that sound? Not convincing really :/ Next big feature needing one bit in sk_buff will add it, and add a 63bit hole. Then next feature(s) will happily consume 'because there are holes anyway'. Then at some point we will cross cache line boundary and performance will take a 10 % hit. It is a never ending trend. If you really need 33 bits, then maybe we'll ask you to guard the new bits with some #if IS_ENABLED(CONFIG_...) so that we can opt-out. Why do we _really_ need dynamic clocks being supported in core networking stack, other than 'that is needed to send 2 packets per second with precise departure time and arbitrary user defined clocks, so lets do that, and do not care of the other 10,000,000 packets we receive/send per second' I have one patch (TXCS, something that I called XPS in the past) implementing the remote-freeing of skbs that help workloads where skb are produced on cpu A and consumed on cpu B, using an additional 16bit field that I have not upstreamed yet (even if Mellanox folks want that), simply because of this additional field... Maybe I should eat this hole before you take it ? No, we need to be extra careful.
[PATCH 1/2] net: macb: Add phy-handle DT support
This optional binding (as described in the ethernet DT bindings doc) directs the netdev to the phydev to use. This is useful for a phy chip that has >1 phy in it, and two netdevs are using the same phy chip (i.e. the second mac's phy lives on the first mac's MDIO bus) The devicetree snippet would look something like this: ethernet@feedf00d { ... phy-handle = <> // the first netdev is physically wired to phy0 ... phy0: phy@0 { ... reg = <0x0> // MDIO address 0 ... } phy1: phy@1 { ... reg = <0x1> // MDIO address 1 ... } ... } ethernet@deadbeef { ... phy-handle = <> // tells the driver to use phy1 on the // first mac's mdio bus (it's wired thusly) ... } The work done to add the phy_node in the first place (dacdbb4dfc1a1: "net: macb: add fixed-link node support") will consume the device_node (if found). Signed-off-by: Brad Mouring--- drivers/net/ethernet/cadence/macb_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index e84afcf1ecb5..cc5b9e6e3526 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -567,6 +567,9 @@ static int macb_mii_init(struct macb *bp) err = mdiobus_register(bp->mii_bus); } else { + /* attempt to find a phy-handle */ + bp->phy_node = of_parse_phandle(np, "phy-handle", 0); + /* try dt phy registration */ err = of_mdiobus_register(bp->mii_bus, np); -- 2.16.2
[PATCH v2 net] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a __this_cpu_read() call inside ipcomp_alloc_tfms(). At the time, __this_cpu_read() required the caller to either not care about races or to handle preemption/interrupt issues. 3.15 tightened the rules around some per-cpu operations, and now __this_cpu_read() should never be used in a preemptible context. On 3.15 and later, we need to use this_cpu_read() instead. syzkaller reported this leading to the following kernel BUG while fuzzing sendmsg: BUG: using __this_cpu_read() in preemptible [] code: repro/3101 caller is ipcomp_init_state+0x185/0x990 CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0xb9/0x115 check_preemption_disabled+0x1cb/0x1f0 ipcomp_init_state+0x185/0x990 ? __xfrm_init_state+0x876/0xc20 ? lock_downgrade+0x5e0/0x5e0 ipcomp4_init_state+0xaa/0x7c0 __xfrm_init_state+0x3eb/0xc20 xfrm_init_state+0x19/0x60 pfkey_add+0x20df/0x36f0 ? pfkey_broadcast+0x3dd/0x600 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_seq_stop+0x80/0x80 ? __skb_clone+0x236/0x750 ? kmem_cache_alloc+0x1f6/0x260 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_process+0x62a/0x6f0 pfkey_process+0x62a/0x6f0 ? pfkey_send_new_mapping+0x11c0/0x11c0 ? mutex_lock_io_nested+0x1390/0x1390 pfkey_sendmsg+0x383/0x750 ? dump_sp+0x430/0x430 sock_sendmsg+0xc0/0x100 ___sys_sendmsg+0x6c8/0x8b0 ? copy_msghdr_from_user+0x3b0/0x3b0 ? pagevec_lru_move_fn+0x144/0x1f0 ? find_held_lock+0x32/0x1c0 ? do_huge_pmd_anonymous_page+0xc43/0x11e0 ? lock_downgrade+0x5e0/0x5e0 ? get_kernel_page+0xb0/0xb0 ? _raw_spin_unlock+0x29/0x40 ? do_huge_pmd_anonymous_page+0x400/0x11e0 ? __handle_mm_fault+0x553/0x2460 ? __fget_light+0x163/0x1f0 ? __sys_sendmsg+0xc7/0x170 __sys_sendmsg+0xc7/0x170 ? SyS_shutdown+0x1a0/0x1a0 ? __do_page_fault+0x5a0/0xca0 ? lock_downgrade+0x5e0/0x5e0 SyS_sendmsg+0x27/0x40 ? __sys_sendmsg+0x170/0x170 do_syscall_64+0x19f/0x640 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x7f0ee73dfb79 RSP: 002b:7ffe14fc15a8 EFLAGS: 0207 ORIG_RAX: 002e RAX: ffda RBX: RCX: 7f0ee73dfb79 RDX: RSI: 208befc8 RDI: 0004 RBP: 7ffe14fc15b0 R08: 7ffe14fc15c0 R09: 7ffe14fc15c0 R10: R11: 0207 R12: 00400440 R13: 7ffe14fc16b0 R14: R15: Signed-off-by: Greg Hackmann--- v2: expand commit log to clarify that kernel 3.15 and later are impacted net/xfrm/xfrm_ipcomp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c index ccfdc7115a83..a00ec715aa46 100644 --- a/net/xfrm/xfrm_ipcomp.c +++ b/net/xfrm/xfrm_ipcomp.c @@ -283,7 +283,7 @@ static struct crypto_comp * __percpu *ipcomp_alloc_tfms(const char *alg_name) struct crypto_comp *tfm; /* This can be any valid CPU ID so we don't need locking. */ - tfm = __this_cpu_read(*pos->tfms); + tfm = this_cpu_read(*pos->tfms); if (!strcmp(crypto_comp_name(tfm), alg_name)) { pos->users++; -- 2.16.2.395.g2e18187dfd-goog
[PATCH 2/2] Documentation: macb: Document phy-handle optional binding
Signed-off-by: Brad Mouring--- Documentation/devicetree/bindings/net/macb.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt index 27966ae741e0..457d5ae16f23 100644 --- a/Documentation/devicetree/bindings/net/macb.txt +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -29,6 +29,7 @@ Optional properties for PHY child node: - reset-gpios : Should specify the gpio for phy reset - magic-packet : If present, indicates that the hardware supports waking up via magic packet. +- phy-handle : see ethernet.txt file in the same directory Examples: -- 2.16.2
Re: [PATCH net-next] openvswitch: fix vport packet length check.
On Wed, Mar 7, 2018 at 1:18 PM, Pravin Shelarwrote: > On Tue, Mar 6, 2018 at 5:56 PM, William Tu wrote: >> When sending a packet to a tunnel device, the dev's hard_header_len >> could be larger than the skb->len in function packet_length(). >> In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen, >> which is around 180, and an ARP packet sent to this tunnel has >> skb->len = 42. This causes the 'unsign int length' to become super >> large because it is negative value, causing the later ovs_vport_send >> to drop it due to over-mtu size. The patch fixes it by setting it to 0. >> >> Signed-off-by: William Tu >> --- >> net/openvswitch/vport.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c >> index b6c8524032a0..7718d5b4cf8a 100644 >> --- a/net/openvswitch/vport.c >> +++ b/net/openvswitch/vport.c >> @@ -467,7 +467,7 @@ int ovs_vport_receive(struct vport *vport, struct >> sk_buff *skb, >> static unsigned int packet_length(const struct sk_buff *skb, >> struct net_device *dev) > Can you also change return type of this function? > OK, I will change to int. Thanks
Re: [RFC v3 iproute2 3/3] tc: Add support for the TBS Qdisc
Hi, On 03/06/2018 05:51 PM, Stephen Hemminger wrote: > On Tue, 6 Mar 2018 17:16:08 -0800 > Jesus Sanchez-Palenciawrote: > >> atic int tbs_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) >> +{ >> +struct rtattr *tb[TCA_TBS_MAX+1]; >> +struct tc_tbs_qopt *qopt; >> + >> +if (opt == NULL) >> +return 0; >> + >> +parse_rtattr_nested(tb, TCA_TBS_MAX, opt); >> + >> +if (tb[TCA_TBS_PARMS] == NULL) >> +return -1; >> + >> +qopt = RTA_DATA(tb[TCA_TBS_PARMS]); >> +if (RTA_PAYLOAD(tb[TCA_TBS_PARMS]) < sizeof(*qopt)) >> +return -1; >> + >> +fprintf(f, "clockid "); >> +if (qopt->clockid == CLOCKID_INVALID) >> +fprintf(f, "invalid "); >> +else >> +fprintf(f, "%d ", qopt->clockid); >> + >> +fprintf(f, "delta %d ", qopt->delta); >> +fprintf(f, "offload %s ", (qopt->flags & TC_TBS_OFFLOAD_ON) ? >> +"on" : "off"); >> +fprintf(f, "sorting %s", (qopt->flags & TC_TBS_SORTING_ON) ? >> +"on" : "off"); >> + >> +return 0; >> +} > > All new print code in iproute2 should support JSON output. > Look at other code using json_print.h for simple way to handle this. > Fixed, thanks. I'm assuming that only applies to print code from print_qopt() implementations. Please let me know if otherwise.
RE: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address support for ethtool nftuple filters
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On > Behalf Of Vinicius Costa Gomes > Sent: Tuesday, March 6, 2018 5:30 PM > To: intel-wired-...@lists.osuosl.org > Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus palen...@intel.com> > Subject: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address > support for ethtool nftuple filters > > This adds the capability of configuring the queue steering of arriving > packets based on their source and destination MAC addresses. > > In practical terms this adds support for the following use cases, > characterized by these examples: > > $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0 > (this will direct packets with destination address "aa:aa:aa:aa:aa:aa" > to the RX queue 0) > > $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3 > (this will direct packets with destination address "44:44:44:44:44:44" > to the RX queue 3) I assume this example should read "... source address" rather than "...destination". > > Signed-off-by: Vinicius Costa Gomes> --- > drivers/net/ethernet/intel/igb/igb_ethtool.c | 35 > > 1 file changed, 31 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c > b/drivers/net/ethernet/intel/igb/igb_ethtool.c > index 94fc9a4bed8b..3f98299d4cd0 100644 > --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c > +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c > @@ -2494,6 +2494,23 @@ static int igb_get_ethtool_nfc_entry(struct > igb_adapter *adapter, > fsp->h_ext.vlan_tci = rule->filter.vlan_tci; > fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK); > } > + if (rule->filter.match_flags & > IGB_FILTER_FLAG_DST_MAC_ADDR) { > + ether_addr_copy(fsp->h_u.ether_spec.h_dest, > + rule->filter.dst_addr); > + /* As we only support matching by the full > + * mask, return the mask to userspace > + */ > + eth_broadcast_addr(fsp->m_u.ether_spec.h_dest); > + } > + if (rule->filter.match_flags & > IGB_FILTER_FLAG_SRC_MAC_ADDR) { > + ether_addr_copy(fsp->h_u.ether_spec.h_source, > + rule->filter.src_addr); > + /* As we only support matching by the full > + * mask, return the mask to userspace > + */ > + eth_broadcast_addr(fsp- > >m_u.ether_spec.h_source); > + } > + > return 0; > } > return -EINVAL; > @@ -2932,10 +2949,6 @@ static int igb_add_ethtool_nfc_entry(struct > igb_adapter *adapter, > if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW) > return -EINVAL; > > - if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK && > - fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) > - return -EINVAL; > - > input = kzalloc(sizeof(*input), GFP_KERNEL); > if (!input) > return -ENOMEM; > @@ -2945,6 +2958,20 @@ static int igb_add_ethtool_nfc_entry(struct > igb_adapter *adapter, > input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE; > } > > + /* Only support matching addresses by the full mask */ > + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) { > + input->filter.match_flags |= > IGB_FILTER_FLAG_SRC_MAC_ADDR; > + ether_addr_copy(input->filter.src_addr, > + fsp->h_u.ether_spec.h_source); > + } > + > + /* Only support matching addresses by the full mask */ > + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) { > + input->filter.match_flags |= > IGB_FILTER_FLAG_DST_MAC_ADDR; > + ether_addr_copy(input->filter.dst_addr, > + fsp->h_u.ether_spec.h_dest); > + } > + > if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) { > if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) { > err = -EINVAL; > -- > 2.16.2 > > ___ > Intel-wired-lan mailing list > intel-wired-...@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Re: [next-queue PATCH v3 8/8] igb: Add support for adding offloaded clsflower filters
Hi, Jakub Kicinskiwrites: > On Tue, 6 Mar 2018 17:29:57 -0800, Vinicius Costa Gomes wrote: >> This allows filters added by tc-flower and specifying MAC addresses, >> Ethernet types, and the VLAN priority field, to be offloaded to the >> controller. >> >> This reuses most of the infrastructure used by ethtool, but clsflower >> filters are kept in a separated list, so they are invisible to >> ethtool. >> >> Signed-off-by: Vinicius Costa Gomes > > LGTM, thanks! > >> +NL_SET_ERR_MSG(extack, > > One nit: consider using NL_SET_ERR_MSG_MOD to prefix the message with > driver name. Sure thing. Will send a v4 with this shortly. Cheers, -- Vinicius
Re: [PATCH net 2/5] tcp: prevent bogus FRTO undos with non-SACK flows
On Wed, 7 Mar 2018, Yuchung Cheng wrote: > On Wed, Mar 7, 2018 at 11:24 AM, Neal Cardwellwrote: > > On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen > > wrote: > > > > > > In a non-SACK case, any non-retransmitted segment acknowledged will > > > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is > > > no indication that it would have been delivered for real (the > > > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK > > > case). This causes bogus undos in ordinary RTO recoveries where > > > segments are lost here and there, with a few delivered segments in > > > between losses. A cumulative ACKs will cover retransmitted ones at > > > the bottom and the non-retransmitted ones following that causing > > > FLAG_ORIG_SACK_ACKED to be set in tcp_clean_rtx_queue and results > > > in a spurious FRTO undo. > > > > > > We need to make the check more strict for non-SACK case and check > > > that none of the cumulatively ACKed segments were retransmitted, > > > which would be the case for the last step of FRTO algorithm as we > > > sent out only new segments previously. Only then, allow FRTO undo > > > to proceed in non-SACK case. > > > > Hi Ilpo - Do you have a packet trace or (even better) packetdrill > > script illustrating this issue? It would be nice to have a test case > > or at least concrete example of this. > > a packetdrill or even a contrived example would be good ... I've seen all but this for sure in packet traces. But I'm somewhat old-school that while looking for the burst issue I discovered this issue by reading the code only (making it more than _one_ issue). However, I think that I later on saw also this issue from the traces (as it seemed to not match to any of the other burst issues this whole series is trying to fix). But finding that dump afterwards would take really long time, I've more than enough of them from our recent tests ;-)). But anyway, that was before the recent moving for the condition into tp->frto block so it might no longer be triggerable. It clearly was triggerable beforehand without tp->frto guard (and I just forward-ported past that recent change without thinking it much). To trigger it, ever-R and !ever-R skb would need to be cumulatively ACKed when tp->frto is non-zero. Do you think that is still possible with FRTO? E.g., after some undo leaving some ever-R and then RTO resulting in FRTO procedure? > also why not just avoid setting FLAG_ORIG_SACK_ACKED on non-sack? seems > a much clean fix. I guess that would work now that the relevant FRTO condition got moved into the tp->frto block. It wouldn't have been that simple earlier as SACK wanted FLAG_ORIG_SACK_ACKED while non-SACK wants FLAG_ONLY_ORIG_ACKED (that was already available through a combination of the existing FLAGs). -- i.