Re: [bpf-next V1 PATCH 09/15] mlx5: register a memory model when XDP is enabled

2018-03-07 Thread Jesper Dangaard Brouer
On Wed, 7 Mar 2018 13:50:19 +0200
Tariq Toukan  wrote:

> On 06/03/2018 11:48 PM, Jesper Dangaard Brouer wrote:
> > Now all the users of ndo_xdp_xmit have been converted to use 
> > xdp_return_frame.
> > This enables a different memory model, thus activating another code path
> > in the xdp_return_frame API.
> > 
> > Signed-off-by: Jesper Dangaard Brouer 
> > ---
> >   drivers/net/ethernet/mellanox/mlx5/core/en_main.c |7 +++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index da94c8cba5ee..51482943c583 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -506,6 +506,13 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> > rq->mkey_be = c->mkey_be;
> > }
> >   
> > +   /* This must only be activate for order-0 pages */
> > +   if (rq->xdp_prog)
> > +   err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
> > +MEM_TYPE_PAGE_ORDER0, NULL);
> > +   if (err < 0)
> > +   goto err_rq_wq_destroy;
> > +  
> 
> Use "if (err)" here, instead of changing this in next patch.
> Also, get it into the "if (rq->xdp_prog)" block.

Good point, fixed!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()

2018-03-07 Thread Herbert Xu
On Wed, Mar 07, 2018 at 11:24:16AM -0800, Greg Hackmann wrote:
> f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
> __this_cpu_read() call inside ipcomp_alloc_tfms().  Since this call was
> introduced, the rules around per-cpu accessors have been tightened and
> __this_cpu_read() cannot be used in a preemptible context.
> 
> syzkaller reported this leading to the following kernel BUG while
> fuzzing sendmsg:

How about reverting f7c83bcbfaf5 instead?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH net-next] liquidio: fix ndo_change_mtu to always return correct status to the caller

2018-03-07 Thread Felix Manlunas
From: Veerasenareddy Burru 

In a scenario where the command queued to firmware gets dropped or times
out, the MTU change from the host will not propagate to firmware. So the
host driver must wait for a response from firmware, or for a timeout,
and then return the correct status to the caller of ndo_change_mtu.

Also moved the common code for MTU change from PF and VF driver files to
common file lio_core.c

Signed-off-by: Veerasenareddy Burru 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c| 96 --
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 70 ++--
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 67 +++
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  3 +-
 .../net/ethernet/cavium/liquidio/octeon_network.h  | 20 +
 5 files changed, 143 insertions(+), 113 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 8b1ee83..ff4bfb28 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -164,15 +164,6 @@ void liquidio_link_ctrl_cmd_completion(void *nctrl_ptr)
}
break;
 
-   case OCTNET_CMD_CHANGE_MTU:
-   /* If command is successful, change the MTU. */
-   netif_info(lio, probe, lio->netdev, "MTU Changed from %d to %d\n",
-  netdev->mtu, nctrl->ncmd.s.param1);
-   netdev->mtu = nctrl->ncmd.s.param1;
-   queue_delayed_work(lio->link_status_wq.wq,
-  &lio->link_status_wq.wk.work, 0);
-   break;
-
case OCTNET_CMD_GPIO_ACCESS:
netif_info(lio, probe, lio->netdev, "LED Flashing visual identification\n");
 
@@ -1081,3 +1072,90 @@ int octeon_setup_interrupt(struct octeon_device *oct, u32 num_ioqs)
}
return 0;
 }
+
+static void liquidio_change_mtu_completion(struct octeon_device *oct,
+  u32 status, void *buf)
+{
+   struct octeon_soft_command *sc = (struct octeon_soft_command *)buf;
+   struct liquidio_if_cfg_context *ctx;
+
+   ctx  = (struct liquidio_if_cfg_context *)sc->ctxptr;
+
+   if (status) {
+   dev_err(&oct->pci_dev->dev, "MTU change failed. Status: %llx\n",
+   CVM_CAST64(status));
+   WRITE_ONCE(ctx->cond, LIO_CHANGE_MTU_FAIL);
+   } else {
+   WRITE_ONCE(ctx->cond, LIO_CHANGE_MTU_SUCCESS);
+   }
+
+   /* This barrier is required to be sure that the response has been
+* written fully before waking up the handler
+*/
+   wmb();
+
+   wake_up_interruptible(&ctx->wc);
+}
+
+/**
+ * \brief Net device change_mtu
+ * @param netdev network device
+ */
+int liquidio_change_mtu(struct net_device *netdev, int new_mtu)
+{
+   struct lio *lio = GET_LIO(netdev);
+   struct octeon_device *oct = lio->oct_dev;
+   struct liquidio_if_cfg_context *ctx;
+   struct octeon_soft_command *sc;
+   union octnet_cmd *ncmd;
+   int ctx_size;
+   int ret = 0;
+
+   ctx_size = sizeof(struct liquidio_if_cfg_context);
+   sc = (struct octeon_soft_command *)
+   octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE, 16, ctx_size);
+
+   ncmd = (union octnet_cmd *)sc->virtdptr;
+   ctx  = (struct liquidio_if_cfg_context *)sc->ctxptr;
+
+   WRITE_ONCE(ctx->cond, 0);
+   ctx->octeon_id = lio_get_device_id(oct);
+   init_waitqueue_head(&ctx->wc);
+
+   ncmd->u64 = 0;
+   ncmd->s.cmd = OCTNET_CMD_CHANGE_MTU;
+   ncmd->s.param1 = new_mtu;
+
+   octeon_swap_8B_data((u64 *)ncmd, (OCTNET_CMD_SIZE >> 3));
+
+   sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
+   octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
+   OPCODE_NIC_CMD, 0, 0, 0);
+
+   sc->callback = liquidio_change_mtu_completion;
+   sc->callback_arg = sc;
+   sc->wait_time = 100;
+
+   ret = octeon_send_soft_command(oct, sc);
+   if (ret == IQ_SEND_FAILED) {
+   netif_info(lio, rx_err, lio->netdev, "Failed to change MTU\n");
+   return -EINVAL;
+   }
+   /* Sleep on a wait queue till the cond flag indicates that the
+* response arrived or timed-out.
+*/
+   if (sleep_cond(&ctx->wc, &ctx->cond) == -EINTR ||
+   ctx->cond == LIO_CHANGE_MTU_FAIL) {
+   octeon_free_soft_command(oct, sc);
+   return -EINVAL;
+   }
+   /* command is successful, change the MTU. */
+   netif_info(lio, probe, lio->netdev, "MTU changed from %d to %d\n",
+  netdev->mtu, new_mtu);
+   netdev->mtu = new_mtu;
+   lio->mtu = new_mtu;
+
+   octeon_free_soft_command(oct, sc);
+   return 0;
+}
diff --git 
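
The status handling described in the commit message above can be sketched in
plain user-space C. This is an illustration only, not the driver code:
struct mtu_request, poll_firmware() and change_mtu() are hypothetical names,
and a bounded polling loop stands in for the wait queue and the
sc->wait_time timeout. The point it shows is that the caller sees 0 only
after a confirmed success, and an error on a firmware failure or a timeout.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical completion states, loosely modelled on LIO_CHANGE_MTU_*. */
enum mtu_cond { MTU_PENDING = 0, MTU_SUCCESS, MTU_FAIL };

/* Hypothetical request context; the simulation fields are not in the driver. */
struct mtu_request {
	enum mtu_cond cond;     /* written by the "completion handler" */
	int polls_until_done;   /* simulation: polls before firmware answers */
	enum mtu_cond outcome;  /* simulation: what firmware will answer */
};

/* Stand-in for the firmware completing the command after some delay. */
static void poll_firmware(struct mtu_request *req)
{
	if (req->polls_until_done > 0 && --req->polls_until_done == 0)
		req->cond = req->outcome;
}

/* Returns 0 and updates *cur_mtu only on a confirmed success. */
static int change_mtu(int *cur_mtu, int new_mtu, struct mtu_request *req,
		      int max_polls)
{
	int i;

	assert(max_polls > 0);
	req->cond = MTU_PENDING;

	/* Bounded wait: stop as soon as a response arrives or polls run out. */
	for (i = 0; i < max_polls && req->cond == MTU_PENDING; i++)
		poll_firmware(req);

	if (req->cond != MTU_SUCCESS)
		return -EINVAL; /* failure or timeout: MTU left untouched */

	*cur_mtu = new_mtu;
	return 0;
}
```

With this shape the cached MTU is updated only after the firmware has
acknowledged the change, so a dropped command cannot leave the host and the
firmware with different values.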

Please add "NFC: llcp: Limit size of SDP URI" for stable

2018-03-07 Thread Kees Cook
Hi,

I don't see fe9c842695e2 ("NFC: llcp: Limit size of SDP URI") queued
up for stable. Can this one be added please? This is a buffer overflow
fix.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net-next 00/23] net: hns3: HNS3 bug fixes & code improvements

2018-03-07 Thread David Miller

Sorry, this is way too large of a patch series.

Please keep your series to about a dozen or so changes.

Anything longer puts an unreasonable burden upon patch
reviewers, and such a large series will often make it
so that nearly all reviewers are discouraged from taking
a look at all.

Thank you.


[PATCH AUTOSEL for 4.9 032/190] time: Change posix clocks ops interfaces to use timespec64

2018-03-07 Thread Sasha Levin
From: Deepa Dinamani 

[ Upstream commit d340266e19ddb70dbd608f9deedcfb35fdb9d419 ]

struct timespec is not y2038 safe on 32 bit machines.

The posix clocks apis use struct timespec directly and through struct
itimerspec.

Replace the posix clock interfaces to use struct timespec64 and struct
itimerspec64 instead.  Also fix up their implementations accordingly.

Note that the clock_getres() interface has also been changed to use
timespec64 even though this particular interface is not affected by the
y2038 problem. This helps verification for internal kernel code for y2038
readiness by getting rid of time_t/ timeval/ timespec.

Signed-off-by: Deepa Dinamani 
Cc: a...@arndb.de
Cc: y2...@lists.linaro.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran 
Cc: john.stu...@linaro.org
Link: http://lkml.kernel.org/r/1490555058-4603-3-git-send-email-deepa.ker...@gmail.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sasha Levin 
---
 drivers/ptp/ptp_clock.c | 18 +++---
 include/linux/posix-clock.h | 10 +-
 kernel/time/posix-clock.c   | 34 --
 3 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 86280b7e41f3..2aa5b37cc6d2 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,30 +97,26 @@ static s32 scaled_ppm_to_ppb(long ppm)
 
 /* posix clock implementation */
 
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
 {
tp->tv_sec = 0;
tp->tv_nsec = 1;
return 0;
 }
 
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp)
 {
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts = timespec_to_timespec64(*tp);
 
-   return  ptp->info->settime64(ptp->info, &ts);
+   return  ptp->info->settime64(ptp->info, tp);
 }
 
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
 {
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts;
int err;
 
-   err = ptp->info->gettime64(ptp->info, &ts);
-   if (!err)
-   *tp = timespec64_to_timespec(ts);
+   err = ptp->info->gettime64(ptp->info, tp);
return err;
 }
 
@@ -133,7 +129,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
ops = ptp->info;
 
if (tx->modes & ADJ_SETOFFSET) {
-   struct timespec ts;
+   struct timespec64 ts;
ktime_t kt;
s64 delta;
 
@@ -146,7 +142,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
return -EINVAL;
 
-   kt = timespec_to_ktime(ts);
+   kt = timespec64_to_ktime(ts);
delta = ktime_to_ns(kt);
err = ops->adjtime(ops, delta);
} else if (tx->modes & ADJ_FREQUENCY) {
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 34c4498b800f..83b22ae9ae12 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
 
int  (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
 
-   int  (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
 
-   int  (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts);
 
int  (*clock_settime)(struct posix_clock *pc,
- const struct timespec *ts);
+ const struct timespec64 *ts);
 
int  (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
 
int  (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
 
void (*timer_gettime)(struct posix_clock *pc,
- struct k_itimer *kit, struct itimerspec *tsp);
+ struct k_itimer *kit, struct itimerspec64 *tsp);
 
int  (*timer_settime)(struct posix_clock *pc,
  struct k_itimer *kit, int flags,
- struct itimerspec *tsp, struct itimerspec *old);
+ struct itimerspec64 *tsp, struct itimerspec64 *old);
/*
 * Optional character device methods:
 */
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index 9cff0ab82b63..e24008c098c6 100644
--- 
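
The y2038 remark in the commit message above can be shown in user space. A
minimal sketch, assuming a signed 32-bit seconds counter like the tv_sec of
struct timespec on 32-bit machines; ts32_add_sec() and ts64_add_sec() are
hypothetical helpers written for this example, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int64_t) > sizeof(int32_t), "64-bit seconds are wider");

/* Advance a 32-bit seconds counter the way a 32-bit time_t behaves.
 * The addition is done unsigned to avoid signed-overflow undefined
 * behaviour; converting back is implementation-defined and wraps on
 * common compilers such as GCC and Clang.
 */
static int32_t ts32_add_sec(int32_t sec, uint32_t delta)
{
	return (int32_t)((uint32_t)sec + delta);
}

/* The same step with a 64-bit seconds counter, as in struct timespec64. */
static int64_t ts64_add_sec(int64_t sec, uint32_t delta)
{
	return sec + delta;
}
```

INT32_MAX seconds after the epoch is 03:14:07 UTC on 19 January 2038. One
second later the 32-bit counter becomes negative, while the 64-bit counter
simply keeps counting.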

[PATCH net-next] liquidio: avoid doing useless work

2018-03-07 Thread Felix Manlunas
From: Prasad Kanneganti 

Avoid doing useless work by making sure that the response_list is not empty
before scheduling work to process it.

Signed-off-by: Prasad Kanneganti 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/request_manager.c  | 5 +
 drivers/net/ethernet/cavium/liquidio/response_manager.c | 6 --
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index e07d209..2766af0 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -366,6 +366,7 @@ int
 lio_process_iq_request_list(struct octeon_device *oct,
struct octeon_instr_queue *iq, u32 napi_budget)
 {
+   struct cavium_wq *cwq = &oct->dma_comp_wq;
int reqtype;
void *buf;
u32 old = iq->flush_index;
@@ -450,6 +451,10 @@ lio_process_iq_request_list(struct octeon_device *oct,
   bytes_compl);
iq->flush_index = old;
 
+   if (atomic_read(&oct->response_list
+   [OCTEON_ORDERED_SC_LIST].pending_req_count))
+   queue_delayed_work(cwq->wq, &cwq->wk.work, msecs_to_jiffies(1));
+
return inst_count;
 }
 
diff --git a/drivers/net/ethernet/cavium/liquidio/response_manager.c b/drivers/net/ethernet/cavium/liquidio/response_manager.c
index 3d691c6..fe5b537 100644
--- a/drivers/net/ethernet/cavium/liquidio/response_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/response_manager.c
@@ -49,7 +49,6 @@ int octeon_setup_response_list(struct octeon_device *oct)
INIT_DELAYED_WORK(&cwq->wk.work, oct_poll_req_completion);
cwq->wk.ctxptr = oct;
oct->cmd_resp_state = OCT_DRV_ONLINE;
-   queue_delayed_work(cwq->wq, &cwq->wk.work, msecs_to_jiffies(50));
 
return ret;
 }
@@ -164,5 +163,8 @@ static void oct_poll_req_completion(struct work_struct *work)
struct cavium_wq *cwq = &oct->dma_comp_wq;
 
lio_process_ordered_list(oct, 0);
-   queue_delayed_work(cwq->wq, &cwq->wk.work, msecs_to_jiffies(50));
+
+   if (atomic_read(&oct->response_list
+   [OCTEON_ORDERED_SC_LIST].pending_req_count))
+   queue_delayed_work(cwq->wq, &cwq->wk.work, msecs_to_jiffies(1));
 }
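
The rescheduling rule in the patch above can be sketched with C11 atomics in
user space. The names here are assumptions made for the example: struct
response_list, complete_one() and poll_should_requeue() are hypothetical
stand-ins, whereas the driver itself uses atomic_read() and
queue_delayed_work().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ordered soft-command response list. */
struct response_list {
	atomic_int pending_req_count;
};

/* Simulate one pending response being completed, if there is one. */
static void complete_one(struct response_list *rl)
{
	assert(rl);
	if (atomic_load(&rl->pending_req_count) > 0)
		atomic_fetch_sub(&rl->pending_req_count, 1);
}

/* One poll pass: process the list, then report whether another pass
 * should be scheduled. An empty list means no further work is queued.
 */
static bool poll_should_requeue(struct response_list *rl)
{
	complete_one(rl);
	return atomic_load(&rl->pending_req_count) != 0;
}
```

The poller stops rearming itself once the count reaches zero, which is why
the request path also has to queue the work when it adds a pending request.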


[PATCH net-next] liquidio: Resolved mbox read issue while reading more than one 64bit data

2018-03-07 Thread Felix Manlunas
From: Intiyaz Basha 

Corrected length check when data received in the mbox is more than one
64 bit data value

Signed-off-by: Intiyaz Basha 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
index 57af7df..28e74ee 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
@@ -87,7 +87,7 @@ int octeon_mbox_read(struct octeon_mbox *mbox)
}
 
if (mbox->state & OCTEON_MBOX_STATE_REQUEST_RECEIVING) {
-   if (mbox->mbox_req.recv_len < msg.s.len) {
+   if (mbox->mbox_req.recv_len < mbox->mbox_req.msg.s.len) {
ret = 0;
} else {
mbox->state &= ~OCTEON_MBOX_STATE_REQUEST_RECEIVING;
@@ -96,7 +96,8 @@ int octeon_mbox_read(struct octeon_mbox *mbox)
}
} else {
if (mbox->state & OCTEON_MBOX_STATE_RESPONSE_RECEIVING) {
-   if (mbox->mbox_resp.recv_len < msg.s.len) {
+   if (mbox->mbox_resp.recv_len <
+   mbox->mbox_resp.msg.s.len) {
ret = 0;
} else {
mbox->state &=
-- 
1.8.3.1



[PATCH AUTOSEL for 4.4 016/101] time: Change posix clocks ops interfaces to use timespec64

2018-03-07 Thread Sasha Levin
From: Deepa Dinamani 

[ Upstream commit d340266e19ddb70dbd608f9deedcfb35fdb9d419 ]

struct timespec is not y2038 safe on 32 bit machines.

The posix clocks apis use struct timespec directly and through struct
itimerspec.

Replace the posix clock interfaces to use struct timespec64 and struct
itimerspec64 instead.  Also fix up their implementations accordingly.

Note that the clock_getres() interface has also been changed to use
timespec64 even though this particular interface is not affected by the
y2038 problem. This helps verification for internal kernel code for y2038
readiness by getting rid of time_t/ timeval/ timespec.

Signed-off-by: Deepa Dinamani 
Cc: a...@arndb.de
Cc: y2...@lists.linaro.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran 
Cc: john.stu...@linaro.org
Link: http://lkml.kernel.org/r/1490555058-4603-3-git-send-email-deepa.ker...@gmail.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sasha Levin 
---
 drivers/ptp/ptp_clock.c | 18 +++---
 include/linux/posix-clock.h | 10 +-
 kernel/time/posix-clock.c   | 34 --
 3 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 2e481b9e8ea5..60a5e0c63a13 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,30 +97,26 @@ static s32 scaled_ppm_to_ppb(long ppm)
 
 /* posix clock implementation */
 
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
 {
tp->tv_sec = 0;
tp->tv_nsec = 1;
return 0;
 }
 
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp)
 {
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts = timespec_to_timespec64(*tp);
 
-   return  ptp->info->settime64(ptp->info, &ts);
+   return  ptp->info->settime64(ptp->info, tp);
 }
 
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
 {
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts;
int err;
 
-   err = ptp->info->gettime64(ptp->info, &ts);
-   if (!err)
-   *tp = timespec64_to_timespec(ts);
+   err = ptp->info->gettime64(ptp->info, tp);
return err;
 }
 
@@ -133,7 +129,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
ops = ptp->info;
 
if (tx->modes & ADJ_SETOFFSET) {
-   struct timespec ts;
+   struct timespec64 ts;
ktime_t kt;
s64 delta;
 
@@ -146,7 +142,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
return -EINVAL;
 
-   kt = timespec_to_ktime(ts);
+   kt = timespec64_to_ktime(ts);
delta = ktime_to_ns(kt);
err = ops->adjtime(ops, delta);
} else if (tx->modes & ADJ_FREQUENCY) {
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 34c4498b800f..83b22ae9ae12 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
 
int  (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
 
-   int  (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
 
-   int  (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts);
 
int  (*clock_settime)(struct posix_clock *pc,
- const struct timespec *ts);
+ const struct timespec64 *ts);
 
int  (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
 
int  (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
 
void (*timer_gettime)(struct posix_clock *pc,
- struct k_itimer *kit, struct itimerspec *tsp);
+ struct k_itimer *kit, struct itimerspec64 *tsp);
 
int  (*timer_settime)(struct posix_clock *pc,
  struct k_itimer *kit, int flags,
- struct itimerspec *tsp, struct itimerspec *old);
+ struct itimerspec64 *tsp, struct itimerspec64 *old);
/*
 * Optional character device methods:
 */
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index 9cff0ab82b63..e24008c098c6 100644
--- 

[PATCH iproute2 net-next v4] iprule: support for ip_proto, sport and dport match options

2018-03-07 Thread Roopa Prabhu
From: Roopa Prabhu 

Add support to match on ip_proto, sport and dport ranges.
For ip_proto, this patch currently enumerates tcp, udp and sctp.
This list can be extended in the future.

example:
$ip rule add sport 666-777 dport 999 ip_proto tcp table 100
$ip rule show
0:  from all lookup local
32765:  from all ip_proto 6 sport 666-777 dport 999 lookup 100
32766:  from all lookup main
32767:  from all lookup default

Signed-off-by: Roopa Prabhu 
---
v2: use inet_proto_* as suggested by David Ahern

v3: fix newlines in usage (feedback from David Ahern)

v4: fixes for json (feedback from Stephen H).

 include/uapi/linux/fib_rules.h |  8 +
 ip/iprule.c| 67 ++
 man/man8/ip-rule.8 | 32 +++-
 3 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 77d90ae..1809af5 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -35,6 +35,11 @@ struct fib_rule_uid_range {
__u32   end;
 };
 
+struct fib_rule_port_range {
+   __u16   start;
+   __u16   end;
+};
+
 enum {
FRA_UNSPEC,
FRA_DST,/* destination address */
@@ -59,6 +64,9 @@ enum {
FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
FRA_UID_RANGE,  /* UID range */
FRA_PROTOCOL,   /* Originator of the rule */
+   FRA_IP_PROTO,   /* ip proto */
+   FRA_SPORT_RANGE,/* sport range */
+   FRA_DPORT_RANGE,/* dport range */
__FRA_MAX
 };
 
diff --git a/ip/iprule.c b/ip/iprule.c
index a49753e..3520544 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -47,6 +47,9 @@ static void usage(void)
"SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK[/MASK] ]\n"
"[ iif STRING ] [ oif STRING ] [ pref NUMBER ] [ l3mdev ]\n"
"[ uidrange NUMBER-NUMBER ]\n"
+   "[ ip_proto PROTOCOL ]\n"
+   "[ sport [ NUMBER | NUMBER-NUMBER ]\n"
+   "[ dport [ NUMBER | NUMBER-NUMBER ] ]\n"
"ACTION := [ table TABLE_ID ]\n"
"  [ protocol PROTO ]\n"
"  [ nat ADDRESS ]\n"
@@ -306,6 +309,37 @@ int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
print_uint(PRINT_ANY, "uid_end", "-%u ", r->end);
}
 
+   if (tb[FRA_IP_PROTO]) {
+   SPRINT_BUF(pbuf);
+   print_string(PRINT_ANY, "ip_proto", "ip_proto %s ",
+inet_proto_n2a(rta_getattr_u8(tb[FRA_IP_PROTO]),
+   pbuf, sizeof(pbuf)));
+   }
+
+   if (tb[FRA_SPORT_RANGE]) {
+   struct fib_rule_port_range *r = RTA_DATA(tb[FRA_SPORT_RANGE]);
+
+   if (r->start == r->end) {
+   print_uint(PRINT_ANY, "sport", "sport %u ", r->start);
+   } else {
+   print_uint(PRINT_ANY, "sport_start", "sport %u",
+  r->start);
+   print_uint(PRINT_ANY, "sport_end", "-%u ", r->end);
+   }
+   }
+
+   if (tb[FRA_DPORT_RANGE]) {
+   struct fib_rule_port_range *r = RTA_DATA(tb[FRA_DPORT_RANGE]);
+
+   if (r->start == r->end) {
+   print_uint(PRINT_ANY, "dport", "dport %u ", r->start);
+   } else {
+   print_uint(PRINT_ANY, "dport_start", "dport %u",
+  r->start);
+   print_uint(PRINT_ANY, "dport_end", "-%u ", r->end);
+   }
+   }
+
table = frh_get_table(frh, tb);
if (table) {
print_string(PRINT_ANY, "table",
@@ -802,6 +836,39 @@ static int iprule_modify(int cmd, int argc, char **argv)
addattr32(&req.n, sizeof(req), RTA_GATEWAY,
  get_addr32(*argv));
req.frh.action = RTN_NAT;
+   } else if (strcmp(*argv, "ip_proto") == 0) {
+   __u8 ip_proto;
+
+   NEXT_ARG();
+   ip_proto = inet_proto_a2n(*argv);
+   if (ip_proto < 0)
+   invarg("Invalid \"ip_proto\" value\n",
+  *argv);
+   addattr8(&req.n, sizeof(req), FRA_IP_PROTO, ip_proto);
+   } else if (strcmp(*argv, "sport") == 0) {
+   struct fib_rule_port_range r;
+   int ret = 0;
+
+   NEXT_ARG();
+   ret = sscanf(*argv, "%hu-%hu", &r.start, &r.end);
+   if (ret == 1)
+   r.end = r.start;
+   else if (ret 

[PATCH 2/3] net: Remove accidental VLAs from proc buffers

2018-03-07 Thread Kees Cook
In the quest to remove all stack VLAs from the kernel[1], this refactors
the stack array size calculation to avoid using max(), which makes the
compiler think the size isn't fixed.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook 
---
 net/ipv4/proc.c | 10 --
 net/ipv6/proc.c | 10 --
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index dc5edc8f7564..c23c43803435 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,8 +46,6 @@
 #include 
 #include 
 
-#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
-
 /*
  * Report socket allocation statistics [m...@utu.fi]
  */
@@ -400,11 +398,11 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, void *v)
 
 static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
 {
-   unsigned long buff[TCPUDP_MIB_MAX];
+   unsigned long buff[SIMPLE_MAX(UDP_MIB_MAX, TCP_MIB_MAX)];
struct net *net = seq->private;
int i;
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
seq_puts(seq, "\nTcp:");
for (i = 0; snmp4_tcp_list[i].name; i++)
@@ -421,7 +419,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
seq_printf(seq, " %lu", buff[i]);
}
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
 net->mib.udp_statistics);
@@ -432,7 +430,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
/* the UDP and UDP-Lite MIBs are the same */
seq_puts(seq, "\nUdpLite:");
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index b67814242f78..5b0874c26802 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -30,10 +30,8 @@
 #include 
 #include 
 
-#define MAX4(a, b, c, d) \
-   max_t(u32, max_t(u32, a, b), max_t(u32, c, d))
-#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \
-   IPSTATS_MIB_MAX, ICMP_MIB_MAX)
+#define SNMP_MIB_MAX SIMPLE_MAX(SIMPLE_MAX(UDP_MIB_MAX, TCP_MIB_MAX), \
+   SIMPLE_MAX(IPSTATS_MIB_MAX, ICMP_MIB_MAX))
 
 static int sockstat6_seq_show(struct seq_file *seq, void *v)
 {
@@ -199,7 +197,7 @@ static void snmp6_seq_show_item(struct seq_file *seq, void __percpu *pcpumib,
int i;
 
if (pcpumib) {
-   memset(buff, 0, sizeof(unsigned long) * SNMP_MIB_MAX);
+   memset(buff, 0, sizeof(buff));
 
snmp_get_cpu_field_batch(buff, itemlist, pcpumib);
for (i = 0; itemlist[i].name; i++)
@@ -218,7 +216,7 @@ static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu *mib,
u64 buff64[SNMP_MIB_MAX];
int i;
 
-   memset(buff64, 0, sizeof(u64) * SNMP_MIB_MAX);
+   memset(buff64, 0, sizeof(buff64));
 
snmp_get_cpu_field64_batch(buff64, itemlist, mib, syncpoff);
for (i = 0; itemlist[i].name; i++)
-- 
2.7.4



[PATCH 3/3] btrfs: tree-checker: Avoid accidental stack VLA

2018-03-07 Thread Kees Cook
In the quest to remove all stack VLAs from the kernel[1], this refactors
the stack array size calculation to avoid using max(), which makes the
compiler think the size isn't fixed.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook 
---
 fs/btrfs/tree-checker.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c3c8d48f6618..59bd07694118 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -341,7 +341,8 @@ static int check_dir_item(struct btrfs_root *root,
 */
if (key->type == BTRFS_DIR_ITEM_KEY ||
key->type == BTRFS_XATTR_ITEM_KEY) {
-   char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
+   char namebuf[SIMPLE_MAX(BTRFS_NAME_LEN,
+   XATTR_NAME_MAX)];
 
read_extent_buffer(leaf, namebuf,
(unsigned long)(di + 1), name_len);
-- 
2.7.4



[PATCH 0/3] Remove accidental VLA usage

2018-03-07 Thread Kees Cook
This series adds SIMPLE_MAX() to be used in places where a stack array
is actually fixed, but the compiler still warns about VLA usage due to
confusion caused by the safety checks in the max() macro.

I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
and they should all have no operational differences.

-Kees



[PATCH v2 1/3] vsprintf: Remove accidental VLA usage

2018-03-07 Thread Kees Cook
In the quest to remove all stack VLAs from the kernel[1], this introduces
a new "simple max" macro, and changes the "sym" array size calculation to
use it. The value is actually a fixed size, but since the max() macro uses
some extensive tricks for safety, it ends up looking like a variable size
to the compiler.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook 
---
 include/linux/kernel.h | 11 +++
 lib/vsprintf.c |  4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..1da554e9997f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -820,6 +820,17 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
  x, y)
 
 /**
+ * SIMPLE_MAX - return maximum of two values without any type checking
+ * @x: first value
+ * @y: second value
+ *
+ * This should only be used in stack array sizes, since the type-checking
+ * from max() confuses the compiler into thinking a VLA is being used.
+ */
+#define SIMPLE_MAX(x, y)   ((size_t)(x) > (size_t)(y) ? (size_t)(x) \
+  : (size_t)(y))
+
+/**
  * min3 - return minimum of three values
  * @x: first value
  * @y: second value
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index d7a708f82559..50cce36e1cdc 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -744,8 +744,8 @@ char *resource_string(char *buf, char *end, struct resource *res,
 #define FLAG_BUF_SIZE  (2 * sizeof(res->flags))
 #define DECODED_BUF_SIZE   sizeof("[mem - 64bit pref window disabled]")
 #define RAW_BUF_SIZE   sizeof("[mem - flags 0x]")
-   char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
-2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
+   char sym[SIMPLE_MAX(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
+   2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
 
char *p = sym, *pend = sym + sizeof(sym);
int decode = (fmt[0] == 'R') ? 1 : 0;
-- 
2.7.4
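
The effect of SIMPLE_MAX() can be checked outside the kernel. A minimal
sketch, assuming a C11 compiler: the macro has the same shape as the one in
the patch above, while NAME_LEN, XATTR_LEN and scratch_size() are
hypothetical names picked for the example. Because the macro expands to
casts and a conditional over constants, the array size is an integer
constant expression and the array is an ordinary fixed-size one.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the macro added by the patch. */
#define SIMPLE_MAX(x, y)	((size_t)(x) > (size_t)(y) ? (size_t)(x) \
							   : (size_t)(y))

/* Hypothetical sizes, chosen only for this example. */
#define NAME_LEN	255
#define XATTR_LEN	64

static size_t scratch_size(void)
{
	char buf[SIMPLE_MAX(NAME_LEN, XATTR_LEN)];

	/* This compiles only if sizeof(buf) is a compile-time constant,
	 * which would not be the case for a variable-length array.
	 */
	static_assert(sizeof(buf) == NAME_LEN, "buf must not be a VLA");

	buf[0] = '\0';
	return sizeof(buf);
}
```

The static_assert is the useful part of the check: sizeof applied to a VLA is
evaluated at run time, so the assertion would be rejected if the compiler
treated buf as variable-length.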



[PATCH net-next 01/23] {topost} net: hns3: VF should get the real rss_size instead of rss_size_max

2018-03-07 Thread Peng Li
VF driver should get the real rss_size which is assigned
by host PF, not rss_size_max.

Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index f38fc5c..31383a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -291,7 +291,7 @@ static int hclge_get_vf_queue_info(struct hclge_vport *vport,
 
/* get the queue related info */
memcpy(&resp_data[0], &vport->alloc_tqps, sizeof(u16));
-   memcpy(&resp_data[2], &hdev->rss_size_max, sizeof(u16));
+   memcpy(&resp_data[2], &vport->nic.kinfo.rss_size, sizeof(u16));
memcpy(&resp_data[4], &hdev->num_desc, sizeof(u16));
memcpy(&resp_data[6], &hdev->rx_buf_len, sizeof(u16));
 
-- 
2.9.3
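
As a rough userspace sketch of the reply layout this hunk touches: four 16-bit fields are copied into a byte buffer at offsets 0, 2, 4 and 6, and the second slot carries the assigned rss_size rather than the maximum. struct queue_info, pack_queue_info() and unpack_u16() are hypothetical names, not driver API, and the sketch ignores endianness handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-VF queue information. */
struct queue_info {
    uint16_t alloc_tqps;
    uint16_t rss_size;      /* size actually assigned by the PF */
    uint16_t rss_size_max;  /* upper bound, no longer reported to the VF */
    uint16_t num_desc;
    uint16_t rx_buf_len;
};

#define QUEUE_INFO_RESP_LEN 8

/* Pack four 16-bit fields at byte offsets 0, 2, 4 and 6 (host byte order). */
void pack_queue_info(uint8_t resp[QUEUE_INFO_RESP_LEN],
                     const struct queue_info *qi)
{
    memcpy(&resp[0], &qi->alloc_tqps, sizeof(uint16_t));
    memcpy(&resp[2], &qi->rss_size, sizeof(uint16_t));
    memcpy(&resp[4], &qi->num_desc, sizeof(uint16_t));
    memcpy(&resp[6], &qi->rx_buf_len, sizeof(uint16_t));
}

/* Read one 16-bit field back out of the reply buffer. */
uint16_t unpack_u16(const uint8_t *resp, size_t offset)
{
    uint16_t v;

    memcpy(&v, &resp[offset], sizeof(v));
    return v;
}
```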



[PATCH net-next 06/23] {topost} net: hns3: fix for ipv6 address loss problem after setting channels

2018-03-07 Thread Peng Li
From: Fuyun Liang 

dev_close and dev_open behave just like ifconfig down and ifconfig up,
so the ipv6 address is lost after they are called. This patch replaces
dev_close with hns3_nic_net_stop and dev_open with hns3_nic_net_open.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 1bebfd9..83f4b36 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3411,7 +3411,7 @@ int hns3_set_channels(struct net_device *netdev,
return 0;
 
if (if_running)
-   dev_close(netdev);
+   hns3_nic_net_stop(netdev);
 
hns3_clear_all_ring(h);
 
@@ -3440,7 +3440,7 @@ int hns3_set_channels(struct net_device *netdev,
 
 open_netdev:
if (if_running)
-   dev_open(netdev);
+   hns3_nic_net_open(netdev);
 
return ret;
 }
-- 
2.9.3



[PATCH net-next 12/23] {topost} net: hns3: fix for RSS configuration loss problem during reset

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

RSS configuration will be set to default value by hclge_rss_init_hw
during reset, which causes the RSS configuration loss problem.

This patch fixes it by setting the default value in
hclge_rss_init_cfg function, which will not be called in the reset
process.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c |   2 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 107 ++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|   1 +
 3 files changed, 56 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index c5270b5..955f0e3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -144,6 +144,8 @@ static int hclge_map_update(struct hnae3_handle *h)
if (ret)
return ret;
 
+   hclge_rss_indir_init_cfg(hdev);
+
return hclge_rss_init_hw(hdev);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 0271960..1d69470 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3329,67 +3329,28 @@ static int hclge_get_tc_size(struct hnae3_handle 
*handle)
 
 int hclge_rss_init_hw(struct hclge_dev *hdev)
 {
-   const  u8 hfunc = HCLGE_RSS_HASH_ALGO_TOEPLITZ;
struct hclge_vport *vport = hdev->vport;
+   u8 *rss_indir = vport[0].rss_indirection_tbl;
+   u16 rss_size = vport[0].alloc_rss_size;
+   u8 *key = vport[0].rss_hash_key;
+   u8 hfunc = vport[0].rss_algo;
u16 tc_offset[HCLGE_MAX_TC_NUM];
-   u8 rss_key[HCLGE_RSS_KEY_SIZE];
u16 tc_valid[HCLGE_MAX_TC_NUM];
u16 tc_size[HCLGE_MAX_TC_NUM];
-   u32 *rss_indir = NULL;
-   u16 rss_size = 0, roundup_size;
-   const u8 *key;
-   int i, ret, j;
-
-   rss_indir = kcalloc(HCLGE_RSS_IND_TBL_SIZE, sizeof(u32), GFP_KERNEL);
-   if (!rss_indir)
-   return -ENOMEM;
-
-   /* Get default RSS key */
-   netdev_rss_key_fill(rss_key, HCLGE_RSS_KEY_SIZE);
-
-   /* Initialize RSS indirect table for each vport */
-   for (j = 0; j < hdev->num_vmdq_vport + 1; j++) {
-   vport[j].rss_tuple_sets.ipv4_tcp_en =
-   HCLGE_RSS_INPUT_TUPLE_OTHER;
-   vport[j].rss_tuple_sets.ipv4_udp_en =
-   HCLGE_RSS_INPUT_TUPLE_OTHER;
-   vport[j].rss_tuple_sets.ipv4_sctp_en =
-   HCLGE_RSS_INPUT_TUPLE_SCTP;
-   vport[j].rss_tuple_sets.ipv4_fragment_en =
-   HCLGE_RSS_INPUT_TUPLE_OTHER;
-   vport[j].rss_tuple_sets.ipv6_tcp_en =
-   HCLGE_RSS_INPUT_TUPLE_OTHER;
-   vport[j].rss_tuple_sets.ipv6_udp_en =
-   HCLGE_RSS_INPUT_TUPLE_OTHER;
-   vport[j].rss_tuple_sets.ipv6_sctp_en =
-   HCLGE_RSS_INPUT_TUPLE_SCTP;
-   vport[j].rss_tuple_sets.ipv6_fragment_en =
-   HCLGE_RSS_INPUT_TUPLE_OTHER;
-
-   for (i = 0; i < HCLGE_RSS_IND_TBL_SIZE; i++) {
-   vport[j].rss_indirection_tbl[i] =
-   i % vport[j].alloc_rss_size;
-
-   /* vport 0 is for PF */
-   if (j != 0)
-   continue;
+   u16 roundup_size;
+   int i, ret;
 
-   rss_size = vport[j].alloc_rss_size;
-   rss_indir[i] = vport[j].rss_indirection_tbl[i];
-   }
-   }
ret = hclge_set_rss_indir_table(hdev, rss_indir);
if (ret)
-   goto err;
+   return ret;
 
-   key = rss_key;
ret = hclge_set_rss_algo_key(hdev, hfunc, key);
if (ret)
-   goto err;
+   return ret;
 
ret = hclge_set_rss_input_tuple(hdev);
if (ret)
-   goto err;
+   return ret;
 
/* Each TC have the same queue size, and tc_size set to hardware is
 * the log2 of roundup power of two of rss_size, the acutal queue
@@ -3399,8 +3360,7 @@ int hclge_rss_init_hw(struct hclge_dev *hdev)
dev_err(&hdev->pdev->dev,
"Configure rss tc size failed, invalid TC_SIZE = %d\n",
rss_size);
-   ret = -EINVAL;
-   goto err;
+   return -EINVAL;
}
 
roundup_size = roundup_pow_of_two(rss_size);
@@ -3417,12 +3377,50 @@ int hclge_rss_init_hw(struct hclge_dev *hdev)
tc_offset[i] = rss_size * i;
}
 
-   ret = hclge_set_rss_tc_mode(hdev, tc_valid, tc_size, tc_offset);
+   
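
The split described in this patch can be sketched in userspace as follows: defaults are written into a software shadow copy once, and the hardware-programming step only replays that copy, so a reset followed by a replay keeps user changes. This is a simplified model under stated assumptions; struct vport_cfg, struct fake_hw, IND_TBL_SIZE and the function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IND_TBL_SIZE 16   /* hypothetical; far smaller than the real table */

struct vport_cfg {
    uint16_t alloc_rss_size;
    uint8_t  indir_tbl[IND_TBL_SIZE];  /* software shadow copy */
};

struct fake_hw {
    uint8_t indir_tbl[IND_TBL_SIZE];   /* contents are lost on reset */
};

/* Called once at init time: fill the shadow copy with defaults. */
void rss_indir_init_cfg(struct vport_cfg *vp)
{
    assert(vp->alloc_rss_size > 0);
    for (int i = 0; i < IND_TBL_SIZE; i++)
        vp->indir_tbl[i] = (uint8_t)(i % vp->alloc_rss_size);
}

/* Called at init and after every reset: replay the shadow copy only. */
void rss_init_hw(struct fake_hw *hw, const struct vport_cfg *vp)
{
    memcpy(hw->indir_tbl, vp->indir_tbl, sizeof(hw->indir_tbl));
}

/* Model a reset wiping the hardware state. */
void hw_reset(struct fake_hw *hw)
{
    memset(hw, 0, sizeof(*hw));
}
```

Because rss_init_hw() never writes defaults, calling it after hw_reset() restores whatever the shadow copy currently holds.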

Re: [PATCHv2 net-next] openvswitch: fix vport packet length check.

2018-03-07 Thread Pravin Shelar
On Wed, Mar 7, 2018 at 3:38 PM, William Tu  wrote:
> When sending a packet to a tunnel device, the dev's hard_header_len
> could be larger than the skb->len in function packet_length().
> In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
> which is around 180, and an ARP packet sent to this tunnel has
> skb->len = 42.  This causes the 'unsign int length' to become super
> large because it is negative value, causing the later ovs_vport_send
> to drop it due to over-mtu size.  The patch fixes it by setting it to 0.
>
> Signed-off-by: William Tu 
> ---
> v1->v2:
>   replace the return type from unsigned int to int
> ---
Acked-by: Pravin B Shelar 
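
To make the wrap-around concrete, here is a small userspace sketch using the numbers quoted above (skb->len = 42, hard_header_len around 180). The two function names are hypothetical, and the sketch ignores everything else the real packet_length() does; it only contrasts an unsigned subtraction with a signed one clamped at 0.

```c
#include <assert.h>
#include <limits.h>

/* Buggy shape: an unsigned result wraps when hdr_len > skb_len. */
unsigned int packet_length_unsigned(unsigned int skb_len, unsigned int hdr_len)
{
    return skb_len - hdr_len;
}

/* Fixed shape described above: compute in a signed int and clamp at 0. */
int packet_length_clamped(unsigned int skb_len, unsigned int hdr_len)
{
    int length = (int)skb_len - (int)hdr_len;

    return length > 0 ? length : 0;
}
```

With 42 and 180 the unsigned version yields a value close to UINT_MAX, which any MTU comparison treats as oversized, while the clamped version yields 0.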


[PATCH net-next 05/23] {topost} net: hns3: fix for netdev not running problem after calling net_stop and net_open

2018-03-07 Thread Peng Li
From: Fuyun Liang 

The link status update function is called by a timer once every second,
but net_stop and net_open may be called within a much shorter interval.
In that case the link status update function cannot detect that the link
state has changed, which leaves the netdev in a not-running state.

This patch fixes it by updating the link state in ae_stop function.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   | 3 +++
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 0b74461..83be4d5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3770,6 +3770,9 @@ static void hclge_ae_stop(struct hnae3_handle *handle)
 
/* reset tqp stats */
hclge_reset_tqp_stats(handle);
+   del_timer_sync(&hdev->service_timer);
+   cancel_work_sync(&hdev->service_task);
+   hclge_update_link_status(hdev);
 }
 
 static int hclge_get_mac_vlan_cmd_status(struct hclge_vport *vport,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index ccb6756..eee5e20 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1061,6 +1061,9 @@ static void hclgevf_ae_stop(struct hnae3_handle *handle)
 
/* reset tqp stats */
hclgevf_reset_tqp_stats(handle);
+   del_timer_sync(&hdev->service_timer);
+   cancel_work_sync(&hdev->service_task);
+   hclgevf_update_link_status(hdev, 0);
 }
 
 static void hclgevf_state_init(struct hclgevf_dev *hdev)
-- 
2.9.3
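
One way to picture the race is the toy model below. It is an assumption-laden sketch, not the driver's logic: struct fake_dev and all function names are hypothetical, and it assumes the periodic update only acts when the observed state differs from a cached one. A stop and open inside one timer period then leaves the cache stale unless the stop path refreshes it.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_dev {
    bool mac_link_up;     /* what the hardware reports */
    bool cached_link_up;  /* last state seen by the periodic update */
    bool carrier_on;      /* what the stack believes */
    bool opened;
};

/* Periodic update: acts only when the observed state differs from the cache. */
void update_link_status(struct fake_dev *d)
{
    bool state = d->opened && d->mac_link_up;

    if (state != d->cached_link_up) {
        d->carrier_on = state;
        d->cached_link_up = state;
    }
}

/* Stop path; update_in_stop models the behaviour added by the patch. */
void dev_stop(struct fake_dev *d, bool update_in_stop)
{
    d->opened = false;
    d->carrier_on = false;
    if (update_in_stop)
        update_link_status(d);
}

void dev_open(struct fake_dev *d)
{
    d->opened = true;
}
```

Without the refresh in dev_stop(), the next periodic update sees "up" in both the hardware and the cache and never turns the carrier back on.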



[PATCH net-next 00/23] net: hns3: HNS3 bug fixes & code improvements

2018-03-07 Thread Peng Li
This patch-set introduces various HNS3 bug fixes, optimizations and code
improvements.

Fuyun Liang (4):
  {topost} net: hns3: add existence check when remove old uc mac address
  {topost} net: hns3: fix for netdev not running problem after calling
net_stop and net_open
  {topost} net: hns3: fix for ipv6 address loss problem after setting
channels
  {topost} net: hns3: unify the pause params setup function

Peng Li (8):
  {topost} net: hns3: VF should get the real rss_size instead of
rss_size_max
  {topost} net: hns3: set the cmdq out_vld bit to 0 after used
  {topost} net: hns3: fix endian issue when PF get mbx message flag
  {topost} net: hns3: fix rx path skb->truesize reporting bug
  {topost} net: hns3: Add support for querying pfc pause packets
statistic
  {topost} net: hns3: fix the queue id for tqp enable&
  {topost} net: hns3: set the max ring num when alloc netdev
  {topost} net: hns3: add support for VF driver inner interface
hclgevf_ops.get_tqps_and_rss_info

Yunsheng Lin (11):
  {topost} net: hns3: Refactor the hclge_get/set_rss function
  {topost} net: hns3: Refactor the hclge_get/set_rss_tuple function
  {topost} net: hns3: Fix for RSS configuration loss problem during
reset
  {topost} net: hns3: Fix for pause configuration lost during reset
  {topost} net: hns3: Fix for use-after-free when setting ring parameter
  {topost} net: hns3: Refactor the get/put_vector function
  {topost} net: hns3: Fix for coalesce configuration lost during reset
  {topost} net: hns3: Refactor the coalesce related struct
  {topost} net: hns3: Fix for coal configuration lost when setting the
channel
  {topost} net: hns3: Fix for loopback failure when vlan filter is
enable
  {topost} net: hns3: Fix for buffer overflow smatch warning

 drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h|   2 +
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|   6 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 286 +--
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h|  10 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  42 ++-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c |  16 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 307 +++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  16 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c |  31 ++-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  |  76 -
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h  |   8 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  |  95 ---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c   |   1 +
 13 files changed, 574 insertions(+), 322 deletions(-)

-- 
2.9.3



[PATCH net-next 08/23] {topost} net: hns3: fix rx path skb->truesize reporting bug

2018-03-07 Thread Peng Li
Originally, skb->truesize reported the received packet size rather than
the actual size of the buffer the NIC driver allocated (1 page).
As a result, the Linux network stack would misjudge the true size of the
rx queue.

Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 83f4b36..f50245d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2064,15 +2064,13 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, 
int i,
desc = &ring->desc[ring->next_to_clean];
size = le16_to_cpu(desc->rx.size);
 
-   if (twobufs) {
-   truesize = hnae_buf_size(ring);
-   } else {
-   truesize = ALIGN(size, L1_CACHE_BYTES);
+   truesize = hnae_buf_size(ring);
+
+   if (!twobufs)
last_offset = hnae_page_size(ring) - hnae_buf_size(ring);
-   }
 
skb_add_rx_frag(skb, i, desc_cb->priv, desc_cb->page_offset + pull_len,
-   size - pull_len, truesize - pull_len);
+   size - pull_len, truesize);
 
 /* Avoid re-using remote pages,flag default unreuse */
if (unlikely(page_to_nid(desc_cb->priv) != numa_node_id()))
-- 
2.9.3
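
A small sketch of the accounting change, assuming a 64-byte cache line and a 2048-byte receive buffer purely as example values. frag_truesize_old() and frag_truesize_new() are hypothetical helpers that mirror the two branches in the hunk above; they are not driver functions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed value, for illustration only */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Old accounting: in the single-buffer case it followed the packet length. */
size_t frag_truesize_old(size_t pkt_len, size_t buf_size, int twobufs)
{
    return twobufs ? buf_size : ALIGN_UP(pkt_len, CACHE_LINE);
}

/* New accounting: always charge the full buffer backing the fragment. */
size_t frag_truesize_new(size_t pkt_len, size_t buf_size)
{
    (void)pkt_len;  /* the received length no longer matters */
    return buf_size;
}
```

For a 60-byte packet the old single-buffer path charges 64 bytes while the buffer actually consumed is 2048 bytes, which is the mismatch the commit message describes.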



[PATCH net-next 14/23] {topost} net: hns3: fix for use-after-free when setting ring parameter

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

In hns3_set_ringparam, hns3_uninit_all_ring frees the
memory pointed to by priv->ring_data[i].ring, and
hns3_change_all_ring_bd_num then uses that pointer without
reallocating it, which causes a use-after-free problem.

The patch fixes it by not freeing the memory in
hns3_uninit_all_ring, and uses hns3_put_ring_config to free it
when necessary.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index f50245d..2bed73e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2955,13 +2955,8 @@ int hns3_uninit_all_ring(struct hns3_nic_priv *priv)
h->ae_algo->ops->reset_queue(h, i);
 
hns3_fini_ring(priv->ring_data[i].ring);
-   devm_kfree(priv->dev, priv->ring_data[i].ring);
hns3_fini_ring(priv->ring_data[i + h->kinfo.num_tqps].ring);
-   devm_kfree(priv->dev,
-  priv->ring_data[i + h->kinfo.num_tqps].ring);
}
-   devm_kfree(priv->dev, priv->ring_data);
-
return 0;
 }
 
@@ -3099,6 +3094,8 @@ static void hns3_client_uninit(struct hnae3_handle 
*handle, bool reset)
if (ret)
netdev_err(netdev, "uninit ring error\n");
 
+   hns3_put_ring_config(priv);
+
priv->ring_data = NULL;
 
free_netdev(netdev);
@@ -3304,6 +3301,8 @@ static int hns3_reset_notify_uninit_enet(struct 
hnae3_handle *handle)
if (ret)
netdev_err(netdev, "uninit ring error\n");
 
+   hns3_put_ring_config(priv);
+
priv->ring_data = NULL;
 
return ret;
@@ -3421,6 +3420,7 @@ int hns3_set_channels(struct net_device *netdev,
}
 
hns3_uninit_all_ring(priv);
+   hns3_put_ring_config(priv);
 
org_tqp_num = h->kinfo.num_tqps;
ret = hns3_modify_tqp_num(netdev, new_tqp_num);
-- 
2.9.3
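
The ownership split can be sketched in userspace like this: the uninit step only tears down state and leaves the allocation alone, and a separate put step is the only place that frees. All names here (struct ring, struct priv and the four functions) are hypothetical and use malloc-family calls instead of the devm API.

```c
#include <assert.h>
#include <stdlib.h>

struct ring {
    int desc_num;
    int initialized;
};

struct priv {
    struct ring *ring;
};

/* Allocate the ring container; returns 0 on success, -1 on failure. */
int get_ring_config(struct priv *p)
{
    p->ring = calloc(1, sizeof(*p->ring));
    return p->ring ? 0 : -1;
}

/* (Re)initialize the ring with a given descriptor count. */
void init_ring(struct priv *p, int desc_num)
{
    p->ring->desc_num = desc_num;
    p->ring->initialized = 1;
}

/* Tear down ring state but keep the allocation, so it can be reused. */
void uninit_ring(struct priv *p)
{
    p->ring->initialized = 0;
}

/* The only place that frees the ring container. */
void put_ring_config(struct priv *p)
{
    free(p->ring);
    p->ring = NULL;
}
```

A resize then becomes uninit_ring() followed by init_ring() on the same pointer, and callers that really are done with the rings call put_ring_config() afterwards.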



[PATCH net-next 11/23] {topost} net: hns3: refactor the hclge_get/set_rss_tuple function

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

This patch refactors the hclge_get/set_rss_tuple function
in order to fix the rss configuration loss problem during
reset process.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 91 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h| 13 
 2 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 9ba012b..0271960 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3091,14 +3091,16 @@ static int hclge_set_rss_input_tuple(struct hclge_dev 
*hdev)
hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_RSS_INPUT_TUPLE, false);
 
req = (struct hclge_rss_input_tuple_cmd *)desc.data;
-   req->ipv4_tcp_en = HCLGE_RSS_INPUT_TUPLE_OTHER;
-   req->ipv4_udp_en = HCLGE_RSS_INPUT_TUPLE_OTHER;
-   req->ipv4_sctp_en = HCLGE_RSS_INPUT_TUPLE_SCTP;
-   req->ipv4_fragment_en = HCLGE_RSS_INPUT_TUPLE_OTHER;
-   req->ipv6_tcp_en = HCLGE_RSS_INPUT_TUPLE_OTHER;
-   req->ipv6_udp_en = HCLGE_RSS_INPUT_TUPLE_OTHER;
-   req->ipv6_sctp_en = HCLGE_RSS_INPUT_TUPLE_SCTP;
-   req->ipv6_fragment_en = HCLGE_RSS_INPUT_TUPLE_OTHER;
+
+   /* Get the tuple cfg from pf */
+   req->ipv4_tcp_en = hdev->vport[0].rss_tuple_sets.ipv4_tcp_en;
+   req->ipv4_udp_en = hdev->vport[0].rss_tuple_sets.ipv4_udp_en;
+   req->ipv4_sctp_en = hdev->vport[0].rss_tuple_sets.ipv4_sctp_en;
+   req->ipv4_fragment_en = hdev->vport[0].rss_tuple_sets.ipv4_fragment_en;
+   req->ipv6_tcp_en = hdev->vport[0].rss_tuple_sets.ipv6_tcp_en;
+   req->ipv6_udp_en = hdev->vport[0].rss_tuple_sets.ipv6_udp_en;
+   req->ipv6_sctp_en = hdev->vport[0].rss_tuple_sets.ipv6_sctp_en;
+   req->ipv6_fragment_en = hdev->vport[0].rss_tuple_sets.ipv6_fragment_en;
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret) {
dev_err(&hdev->pdev->dev,
@@ -3204,15 +3206,16 @@ static int hclge_set_rss_tuple(struct hnae3_handle 
*handle,
return -EINVAL;
 
req = (struct hclge_rss_input_tuple_cmd *)desc.data;
-   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_RSS_INPUT_TUPLE, true);
-   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
-   if (ret) {
-   dev_err(&hdev->pdev->dev,
-   "Read rss tuple fail, status = %d\n", ret);
-   return ret;
-   }
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_RSS_INPUT_TUPLE, false);
 
-   hclge_cmd_reuse_desc(&desc, false);
+   req->ipv4_tcp_en = vport->rss_tuple_sets.ipv4_tcp_en;
+   req->ipv4_udp_en = vport->rss_tuple_sets.ipv4_udp_en;
+   req->ipv4_sctp_en = vport->rss_tuple_sets.ipv4_sctp_en;
+   req->ipv4_fragment_en = vport->rss_tuple_sets.ipv4_fragment_en;
+   req->ipv6_tcp_en = vport->rss_tuple_sets.ipv6_tcp_en;
+   req->ipv6_udp_en = vport->rss_tuple_sets.ipv6_udp_en;
+   req->ipv6_sctp_en = vport->rss_tuple_sets.ipv6_sctp_en;
+   req->ipv6_fragment_en = vport->rss_tuple_sets.ipv6_fragment_en;
 
tuple_sets = hclge_get_rss_hash_bits(nfc);
switch (nfc->flow_type) {
@@ -3249,52 +3252,49 @@ static int hclge_set_rss_tuple(struct hnae3_handle 
*handle,
}
 
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
-   if (ret)
+   if (ret) {
dev_err(&hdev->pdev->dev,
"Set rss tuple fail, status = %d\n", ret);
+   return ret;
+   }
 
-   return ret;
+   vport->rss_tuple_sets.ipv4_tcp_en = req->ipv4_tcp_en;
+   vport->rss_tuple_sets.ipv4_udp_en = req->ipv4_udp_en;
+   vport->rss_tuple_sets.ipv4_sctp_en = req->ipv4_sctp_en;
+   vport->rss_tuple_sets.ipv4_fragment_en = req->ipv4_fragment_en;
+   vport->rss_tuple_sets.ipv6_tcp_en = req->ipv6_tcp_en;
+   vport->rss_tuple_sets.ipv6_udp_en = req->ipv6_udp_en;
+   vport->rss_tuple_sets.ipv6_sctp_en = req->ipv6_sctp_en;
+   vport->rss_tuple_sets.ipv6_fragment_en = req->ipv6_fragment_en;
+   return 0;
 }
 
 static int hclge_get_rss_tuple(struct hnae3_handle *handle,
   struct ethtool_rxnfc *nfc)
 {
struct hclge_vport *vport = hclge_get_vport(handle);
-   struct hclge_dev *hdev = vport->back;
-   struct hclge_rss_input_tuple_cmd *req;
-   struct hclge_desc desc;
u8 tuple_sets;
-   int ret;
 
nfc->data = 0;
 
-   req = (struct hclge_rss_input_tuple_cmd *)desc.data;
-   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_RSS_INPUT_TUPLE, true);
-   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
-   if (ret) {
-   dev_err(&hdev->pdev->dev,
-   "Read rss tuple fail, status = %d\n", ret);
-   return ret;
-   }
-
switch (nfc->flow_type) {
case TCP_V4_FLOW:
-   

[PATCH net-next 07/23] {topost} net: hns3: unify the pause params setup function

2018-03-07 Thread Peng Li
From: Fuyun Liang 

Since the firmware cmd to setup mac pause params is the same as the
firmware cmd to setup pfc pause params, this patch unifies the pause
params setup function.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c|  2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 23 +++---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h  |  2 +-
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 83be4d5..cc72ed8 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -4431,7 +4431,7 @@ static int hclge_set_mac_addr(struct hnae3_handle 
*handle, void *p,
return -EIO;
}
 
-   ret = hclge_mac_pause_addr_cfg(hdev, new_addr);
+   ret = hclge_pause_addr_cfg(hdev, new_addr);
if (ret) {
dev_err(&hdev->pdev->dev,
"configure mac pause address fail, ret =%d.\n",
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index 36bd79a..4134a82 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -138,8 +138,8 @@ static int hclge_pfc_pause_en_cfg(struct hclge_dev *hdev, 
u8 tx_rx_bitmap,
return hclge_cmd_send(&hdev->hw, &desc, 1);
 }
 
-static int hclge_mac_pause_param_cfg(struct hclge_dev *hdev, const u8 *addr,
-u8 pause_trans_gap, u16 pause_trans_time)
+static int hclge_pause_param_cfg(struct hclge_dev *hdev, const u8 *addr,
+u8 pause_trans_gap, u16 pause_trans_time)
 {
struct hclge_cfg_pause_param_cmd *pause_param;
struct hclge_desc desc;
@@ -155,7 +155,7 @@ static int hclge_mac_pause_param_cfg(struct hclge_dev 
*hdev, const u8 *addr,
return hclge_cmd_send(&hdev->hw, &desc, 1);
 }
 
-int hclge_mac_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr)
+int hclge_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr)
 {
struct hclge_cfg_pause_param_cmd *pause_param;
struct hclge_desc desc;
@@ -174,7 +174,7 @@ int hclge_mac_pause_addr_cfg(struct hclge_dev *hdev, const 
u8 *mac_addr)
trans_gap = pause_param->pause_trans_gap;
trans_time = le16_to_cpu(pause_param->pause_trans_time);
 
-   return hclge_mac_pause_param_cfg(hdev, mac_addr, trans_gap,
+   return hclge_pause_param_cfg(hdev, mac_addr, trans_gap,
 trans_time);
 }
 
@@ -1096,11 +1096,11 @@ static int hclge_tm_schd_setup_hw(struct hclge_dev 
*hdev)
return hclge_tm_schd_mode_hw(hdev);
 }
 
-static int hclge_mac_pause_param_setup_hw(struct hclge_dev *hdev)
+static int hclge_pause_param_setup_hw(struct hclge_dev *hdev)
 {
struct hclge_mac *mac = &hdev->hw.mac;
 
-   return hclge_mac_pause_param_cfg(hdev, mac->mac_addr,
+   return hclge_pause_param_cfg(hdev, mac->mac_addr,
 HCLGE_DEFAULT_PAUSE_TRANS_GAP,
 HCLGE_DEFAULT_PAUSE_TRANS_TIME);
 }
@@ -1151,13 +1151,12 @@ int hclge_pause_setup_hw(struct hclge_dev *hdev)
int ret;
u8 i;
 
-   if (hdev->tm_info.fc_mode != HCLGE_FC_PFC) {
-   ret = hclge_mac_pause_setup_hw(hdev);
-   if (ret)
-   return ret;
+   ret = hclge_pause_param_setup_hw(hdev);
+   if (ret)
+   return ret;
 
-   return hclge_mac_pause_param_setup_hw(hdev);
-   }
+   if (hdev->tm_info.fc_mode != HCLGE_FC_PFC)
+   return hclge_mac_pause_setup_hw(hdev);
 
/* Only DCB-supported dev supports qset back pressure and pfc cmd */
if (!hnae3_dev_dcb_supported(hdev))
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
index 5401e75..c30c85b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
@@ -129,5 +129,5 @@ int hclge_tm_dwrr_cfg(struct hclge_dev *hdev);
 int hclge_tm_map_cfg(struct hclge_dev *hdev);
 int hclge_tm_init_hw(struct hclge_dev *hdev);
 int hclge_mac_pause_en_cfg(struct hclge_dev *hdev, bool tx, bool rx);
-int hclge_mac_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr);
+int hclge_pause_addr_cfg(struct hclge_dev *hdev, const u8 *mac_addr);
 #endif
-- 
2.9.3



[PATCH net-next 13/23] {topost} net: hns3: fix for pause configuration lost during reset

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

Pause configuration will be set to default value by hclge_tm_schd_init
during reset, which causes the pause configuration loss problem.

This patch fixes it by calling hclge_tm_init_hw during the reset process,
which will set the pause configuration to default value.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 1d69470..c0f6939 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5495,9 +5495,9 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
return ret;
}
 
-   ret = hclge_tm_schd_init(hdev);
+   ret = hclge_tm_init_hw(hdev);
if (ret) {
-   dev_err(>dev, "tm schd init fail, ret =%d\n", ret);
+   dev_err(>dev, "tm init hw fail, ret =%d\n", ret);
return ret;
}
 
-- 
2.9.3



[PATCH net-next 02/23] {topost} net: hns3: add existence check when remove old uc mac address

2018-03-07 Thread Peng Li
From: Fuyun Liang 

When the driver is in its initial state, the mac_vlan table is empty,
so the delete operation for the old mac address is bound to fail. An
existence check is needed here; otherwise, the error message will confuse
the user.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility 
Layer Support")
Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h |  3 ++-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c |  4 ++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 17 +++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c  |  2 ++
 .../net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c   | 10 +++---
 5 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index fd06bc7..3c653eb 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -336,7 +336,8 @@ struct hnae3_ae_ops {
   u32 *tx_usecs_high, u32 *rx_usecs_high);
 
void (*get_mac_addr)(struct hnae3_handle *handle, u8 *p);
-   int (*set_mac_addr)(struct hnae3_handle *handle, void *p);
+   int (*set_mac_addr)(struct hnae3_handle *handle, void *p,
+   bool is_first);
int (*add_uc_addr)(struct hnae3_handle *handle,
   const unsigned char *addr);
int (*rm_uc_addr)(struct hnae3_handle *handle,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 601b629..1bebfd9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1104,7 +1104,7 @@ static int hns3_nic_net_set_mac_address(struct net_device 
*netdev, void *p)
if (!mac_addr || !is_valid_ether_addr((const u8 *)mac_addr->sa_data))
return -EADDRNOTAVAIL;
 
-   ret = h->ae_algo->ops->set_mac_addr(h, mac_addr->sa_data);
+   ret = h->ae_algo->ops->set_mac_addr(h, mac_addr->sa_data, false);
if (ret) {
netdev_err(netdev, "set_mac_address fail, ret=%d!\n", ret);
return ret;
@@ -2987,7 +2987,7 @@ static void hns3_init_mac_addr(struct net_device *netdev)
}
 
if (h->ae_algo->ops->set_mac_addr)
-   h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr);
+   h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr, true);
 
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 32bc6f6..0b74461 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -4392,7 +4392,8 @@ static void hclge_get_mac_addr(struct hnae3_handle 
*handle, u8 *p)
ether_addr_copy(p, hdev->hw.mac.mac_addr);
 }
 
-static int hclge_set_mac_addr(struct hnae3_handle *handle, void *p)
+static int hclge_set_mac_addr(struct hnae3_handle *handle, void *p,
+ bool is_first)
 {
const unsigned char *new_addr = (const unsigned char *)p;
struct hclge_vport *vport = hclge_get_vport(handle);
@@ -4409,11 +4410,9 @@ static int hclge_set_mac_addr(struct hnae3_handle 
*handle, void *p)
return -EINVAL;
}
 
-   ret = hclge_rm_uc_addr(handle, hdev->hw.mac.mac_addr);
-   if (ret)
+   if (!is_first && hclge_rm_uc_addr(handle, hdev->hw.mac.mac_addr))
dev_warn(&hdev->pdev->dev,
-"remove old uc mac address fail, ret =%d.\n",
-ret);
+"remove old uc mac address fail.\n");
 
ret = hclge_add_uc_addr(handle, new_addr);
if (ret) {
@@ -4421,12 +4420,10 @@ static int hclge_set_mac_addr(struct hnae3_handle 
*handle, void *p)
"add uc mac address fail, ret =%d.\n",
ret);
 
-   ret = hclge_add_uc_addr(handle, hdev->hw.mac.mac_addr);
-   if (ret) {
+   if (!is_first &&
+   hclge_add_uc_addr(handle, hdev->hw.mac.mac_addr))
dev_err(&hdev->pdev->dev,
-   "restore uc mac address fail, ret =%d.\n",
-   ret);
-   }
+   "restore uc mac address fail.\n");
 
return -EIO;
}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index 31383a6..ea78a99 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -196,6 +196,8 @@ static int hclge_set_vf_uc_mac_addr(struct hclge_vport 
*vport,
 

[PATCH net-next 10/23] {topost} net: hns3: refactor the hclge_get/set_rss function

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

This patch refactors the hclge_get/set_rss function in
order to fix the rss configuration loss problem during
reset process.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 39 --
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  2 ++
 2 files changed, 9 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index cc72ed8..9ba012b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2979,31 +2979,6 @@ static u32 hclge_get_rss_indir_size(struct hnae3_handle 
*handle)
return HCLGE_RSS_IND_TBL_SIZE;
 }
 
-static int hclge_get_rss_algo(struct hclge_dev *hdev)
-{
-   struct hclge_rss_config_cmd *req;
-   struct hclge_desc desc;
-   int rss_hash_algo;
-   int ret;
-
-   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_RSS_GENERIC_CONFIG, true);
-
-   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
-   if (ret) {
-   dev_err(&hdev->pdev->dev,
-   "Get link status error, status =%d\n", ret);
-   return ret;
-   }
-
-   req = (struct hclge_rss_config_cmd *)desc.data;
-   rss_hash_algo = (req->hash_config & HCLGE_RSS_HASH_ALGO_MASK);
-
-   if (rss_hash_algo == HCLGE_RSS_HASH_ALGO_TOEPLITZ)
-   return ETH_RSS_HASH_TOP;
-
-   return -EINVAL;
-}
-
 static int hclge_set_rss_algo_key(struct hclge_dev *hdev,
  const u8 hfunc, const u8 *key)
 {
@@ -3042,7 +3017,7 @@ static int hclge_set_rss_algo_key(struct hclge_dev *hdev,
return 0;
 }
 
-static int hclge_set_rss_indir_table(struct hclge_dev *hdev, const u32 *indir)
+static int hclge_set_rss_indir_table(struct hclge_dev *hdev, const u8 *indir)
 {
struct hclge_rss_indirection_table_cmd *req;
struct hclge_desc desc;
@@ -3138,12 +3113,11 @@ static int hclge_get_rss(struct hnae3_handle *handle, 
u32 *indir,
 u8 *key, u8 *hfunc)
 {
struct hclge_vport *vport = hclge_get_vport(handle);
-   struct hclge_dev *hdev = vport->back;
int i;
 
/* Get hash algorithm */
if (hfunc)
-   *hfunc = hclge_get_rss_algo(hdev);
+   *hfunc = vport->rss_algo;
 
/* Get the RSS Key required by the user */
if (key)
@@ -3167,8 +3141,6 @@ static int hclge_set_rss(struct hnae3_handle *handle, 
const u32 *indir,
 
/* Set the RSS Hash Key if specififed by the user */
if (key) {
-   /* Update the shadow RSS key with user specified qids */
-   memcpy(vport->rss_hash_key, key, HCLGE_RSS_KEY_SIZE);
 
if (hfunc == ETH_RSS_HASH_TOP ||
hfunc == ETH_RSS_HASH_NO_CHANGE)
@@ -3178,6 +3150,10 @@ static int hclge_set_rss(struct hnae3_handle *handle, 
const u32 *indir,
ret = hclge_set_rss_algo_key(hdev, hash_algo, key);
if (ret)
return ret;
+
+   /* Update the shadow RSS key with user specified qids */
+   memcpy(vport->rss_hash_key, key, HCLGE_RSS_KEY_SIZE);
+   vport->rss_algo = hash_algo;
}
 
/* Update the shadow RSS table with user specified qids */
@@ -3185,8 +3161,7 @@ static int hclge_set_rss(struct hnae3_handle *handle, 
const u32 *indir,
vport->rss_indirection_tbl[i] = indir[i];
 
/* Update the hardware */
-   ret = hclge_set_rss_indir_table(hdev, indir);
-   return ret;
+   return hclge_set_rss_indir_table(hdev, vport->rss_indirection_tbl);
 }
 
 static u8 hclge_get_rss_hash_bits(struct ethtool_rxnfc *nfc)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index d99a76a..7e762c4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -579,6 +579,8 @@ struct hclge_vport {
u8  rss_hash_key[HCLGE_RSS_KEY_SIZE]; /* User configured hash keys */
/* User configured lookup table entries */
u8  rss_indirection_tbl[HCLGE_RSS_IND_TBL_SIZE];
+   int rss_algo;   /* User configured hash algorithm */
+
u16 alloc_rss_size;
 
u16 qs_offset;
-- 
2.9.3
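
Editorial note on the hclge_set_rss() hunk in this patch: the diff moves the update of the cached ("shadow") RSS key and algorithm so that it happens only after the hardware command has succeeded. A minimal userspace sketch of that ordering follows. The struct port, the hw_set_key() stub, the hw_fail knob and KEY_SIZE are hypothetical stand-ins for illustration, not the hclge API.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define KEY_SIZE 4   /* illustrative; far smaller than a real RSS key */

/* Hypothetical, reduced per-port state. */
struct port {
	unsigned char shadow_key[KEY_SIZE];  /* cached copy of the key */
	int           shadow_algo;           /* cached hash algorithm */
	int           hw_fail;               /* test knob: force a failure */
};

/* Stand-in for the firmware command; fails when hw_fail is set. */
static int hw_set_key(struct port *p, int algo, const unsigned char *key)
{
	(void)algo;
	(void)key;
	return p->hw_fail ? -EIO : 0;
}

static int port_set_rss(struct port *p, int algo, const unsigned char *key)
{
	int ret;

	assert(p && key);

	ret = hw_set_key(p, algo, key);
	if (ret)
		return ret;   /* shadow state is left untouched */

	/* Hardware accepted the values, so the cache may follow. */
	memcpy(p->shadow_key, key, KEY_SIZE);
	p->shadow_algo = algo;
	return 0;
}
```

With this ordering, a later read of the cached values can never report a key or algorithm that the hardware rejected.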



[PATCH net-next 09/23] {topost} net: hns3: add support for querying pfc pause packets statistic

2018-03-07 Thread Peng Li
This patch adds support for querying PFC pause packet statistics
in hclge_ieee_getpfc, which tells the user how many PFC pause
packets have been sent and received by this MAC port.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 14 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 53 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h  |  6 +++
 3 files changed, 73 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index 5018d66..c5270b5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -203,9 +203,11 @@ static int hclge_ieee_setets(struct hnae3_handle *h, 
struct ieee_ets *ets)
 
 static int hclge_ieee_getpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 {
+   u64 requests[HNAE3_MAX_TC], indications[HNAE3_MAX_TC];
struct hclge_vport *vport = hclge_get_vport(h);
struct hclge_dev *hdev = vport->back;
u8 i, j, pfc_map, *prio_tc;
+   int ret;
 
memset(pfc, 0, sizeof(*pfc));
pfc->pfc_cap = hdev->pfc_max;
@@ -220,6 +222,18 @@ static int hclge_ieee_getpfc(struct hnae3_handle *h, 
struct ieee_pfc *pfc)
}
}
 
+   ret = hclge_pfc_tx_stats_get(hdev, requests);
+   if (ret)
+   return ret;
+
+   ret = hclge_pfc_rx_stats_get(hdev, indications);
+   if (ret)
+   return ret;
+
+   for (i = 0; i < HCLGE_MAX_TC_NUM; i++) {
+   pfc->requests[i] = requests[i];
+   pfc->indications[i] = indications[i];
+   }
return 0;
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index 4134a82..885f25c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -23,6 +23,9 @@ enum hclge_shaper_level {
HCLGE_SHAPER_LVL_PF = 1,
 };
 
+#define HCLGE_TM_PFC_PKT_GET_CMD_NUM   3
+#define HCLGE_TM_PFC_NUM_GET_PER_CMD   3
+
 #define HCLGE_SHAPER_BS_U_DEF  5
 #define HCLGE_SHAPER_BS_S_DEF  20
 
@@ -112,6 +115,56 @@ static int hclge_shaper_para_calc(u32 ir, u8 shaper_level,
return 0;
 }
 
+static int hclge_pfc_stats_get(struct hclge_dev *hdev,
+  enum hclge_opcode_type opcode, u64 *stats)
+{
+   struct hclge_desc desc[HCLGE_TM_PFC_PKT_GET_CMD_NUM];
+   int ret, i, j;
+
+   if (!(opcode == HCLGE_OPC_QUERY_PFC_RX_PKT_CNT ||
+ opcode == HCLGE_OPC_QUERY_PFC_TX_PKT_CNT))
+   return -EINVAL;
+
+   for (i = 0; i < HCLGE_TM_PFC_PKT_GET_CMD_NUM; i++) {
+   hclge_cmd_setup_basic_desc(&desc[i], opcode, true);
+   if (i != (HCLGE_TM_PFC_PKT_GET_CMD_NUM - 1))
+   desc[i].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
+   else
+   desc[i].flag &= ~cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
+   }
+
+   ret = hclge_cmd_send(&hdev->hw, desc, HCLGE_TM_PFC_PKT_GET_CMD_NUM);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "Get pfc pause stats fail, ret = %d.\n", ret);
+   return ret;
+   }
+
+   for (i = 0; i < HCLGE_TM_PFC_PKT_GET_CMD_NUM; i++) {
+   struct hclge_pfc_stats_cmd *pfc_stats =
+   (struct hclge_pfc_stats_cmd *)desc[i].data;
+
+   for (j = 0; j < HCLGE_TM_PFC_NUM_GET_PER_CMD; j++) {
+   u32 index = i * HCLGE_TM_PFC_PKT_GET_CMD_NUM + j;
+
+   if (index < HCLGE_MAX_TC_NUM)
+   stats[index] =
+   le64_to_cpu(pfc_stats->pkt_num[j]);
+   }
+   }
+   return 0;
+}
+
+int hclge_pfc_rx_stats_get(struct hclge_dev *hdev, u64 *stats)
+{
+   return hclge_pfc_stats_get(hdev, HCLGE_OPC_QUERY_PFC_RX_PKT_CNT, stats);
+}
+
+int hclge_pfc_tx_stats_get(struct hclge_dev *hdev, u64 *stats)
+{
+   return hclge_pfc_stats_get(hdev, HCLGE_OPC_QUERY_PFC_TX_PKT_CNT, stats);
+}
+
 int hclge_mac_pause_en_cfg(struct hclge_dev *hdev, bool tx, bool rx)
 {
struct hclge_desc desc;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
index c30c85b..2dbe177 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
@@ -109,6 +109,10 @@ struct hclge_cfg_pause_param_cmd {
__le16 pause_trans_time;
 };
 
+struct hclge_pfc_stats_cmd {
+   __le64 pkt_num[3];
+};
+
 struct hclge_port_shapping_cmd {
__le32 port_shapping_para;
 };
@@ -130,4 +134,6 @@ int hclge_tm_map_cfg(struct hclge_dev *hdev);
 int hclge_tm_init_hw(struct hclge_dev *hdev);
 int 
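
Editorial note on hclge_pfc_stats_get() in this patch: the function reads three descriptors of three counters each and flattens them into a per-TC array, dropping the ninth slot. A minimal userspace sketch of that flattening follows. The struct pfc_stats_payload type and the MAX_TC_NUM value of 8 are assumptions for illustration, and the little-endian conversion done by le64_to_cpu() in the patch is left out. The sketch uses the number of counters per descriptor as the stride; the patch uses HCLGE_TM_PFC_PKT_GET_CMD_NUM, which gives the same result only because both constants are 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes, mirroring the values quoted in the patch. */
#define PFC_CMD_NUM      3   /* descriptors per query */
#define PFC_NUM_PER_CMD  3   /* counters carried by each descriptor */
#define MAX_TC_NUM       8   /* assumed number of traffic classes */

/* Hypothetical stand-in for one descriptor's payload. */
struct pfc_stats_payload {
	uint64_t pkt_num[PFC_NUM_PER_CMD];
};

/*
 * Copy per-descriptor counters into a flat per-TC array.
 * The stride is the number of counters per descriptor, so the
 * mapping stays correct even if the two constants ever differ.
 */
static void pfc_stats_flatten(const struct pfc_stats_payload *desc,
			      uint64_t *stats)
{
	size_t i, j;

	assert(desc && stats);

	for (i = 0; i < PFC_CMD_NUM; i++) {
		for (j = 0; j < PFC_NUM_PER_CMD; j++) {
			size_t index = i * PFC_NUM_PER_CMD + j;

			/* 3 * 3 = 9 slots, but only MAX_TC_NUM are used. */
			if (index < MAX_TC_NUM)
				stats[index] = desc[i].pkt_num[j];
		}
	}
}
```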

[PATCH net-next 19/23] {topost} net: hns3: fix for loopback failure when vlan filter is enabled

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

When the VLAN ctag filter is enabled, the loopback selftest fails because
the loopback selftest does not support VLAN.

This patch fixes it by disabling the VLAN ctag filter while running
the loopback selftest.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 26274bc..2db127c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -309,6 +309,9 @@ static void hns3_self_test(struct net_device *ndev,
struct hnae3_handle *h = priv->ae_handle;
int st_param[HNS3_SELF_TEST_TPYE_NUM][2];
bool if_running = netif_running(ndev);
+#if IS_ENABLED(CONFIG_VLAN_8021Q)
+   bool dis_vlan_filter;
+#endif
int test_index = 0;
u32 i;
 
@@ -323,6 +326,14 @@ static void hns3_self_test(struct net_device *ndev,
if (if_running)
dev_close(ndev);
 
+#if IS_ENABLED(CONFIG_VLAN_8021Q)
+   /* Disable the vlan filter for selftest does not support it */
+   dis_vlan_filter = (ndev->features & NETIF_F_HW_VLAN_CTAG_FILTER) &&
+   h->ae_algo->ops->enable_vlan_filter;
+   if (dis_vlan_filter)
+   h->ae_algo->ops->enable_vlan_filter(h, false);
+#endif
+
set_bit(HNS3_NIC_STATE_TESTING, &priv->state);
 
for (i = 0; i < HNS3_SELF_TEST_TPYE_NUM; i++) {
@@ -345,6 +356,11 @@ static void hns3_self_test(struct net_device *ndev,
 
clear_bit(HNS3_NIC_STATE_TESTING, &priv->state);
 
+#if IS_ENABLED(CONFIG_VLAN_8021Q)
+   if (dis_vlan_filter)
+   h->ae_algo->ops->enable_vlan_filter(h, true);
+#endif
+
if (if_running)
dev_open(ndev);
 }
-- 
2.9.3
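
Editorial note: the patch follows a save, disable, test, restore pattern and only restores what it actually changed. A minimal userspace sketch of that pattern follows. struct fake_dev, its fields and the fake_* helpers are hypothetical and only model the behaviour described in the commit message; they are not the hns3 types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device model; not the driver's real types. */
struct fake_dev {
	bool vlan_filter_feature;  /* filter feature requested by the user */
	bool hw_filter_on;         /* current hardware filter state */
	/* optional hook, may be NULL when the backend lacks it */
	void (*enable_vlan_filter)(struct fake_dev *dev, bool enable);
};

static void fake_enable_vlan_filter(struct fake_dev *dev, bool enable)
{
	dev->hw_filter_on = enable;
}

/* In this model the loopback test passes only while the filter is off. */
static bool fake_loopback_test(const struct fake_dev *dev)
{
	return !dev->hw_filter_on;
}

static bool run_selftest(struct fake_dev *dev)
{
	bool dis_vlan_filter;
	bool passed;

	assert(dev);

	/* Only touch the filter if it is on and the hook exists. */
	dis_vlan_filter = dev->vlan_filter_feature &&
			  dev->enable_vlan_filter != NULL;
	if (dis_vlan_filter)
		dev->enable_vlan_filter(dev, false);

	passed = fake_loopback_test(dev);

	/* Restore exactly what was changed, nothing more. */
	if (dis_vlan_filter)
		dev->enable_vlan_filter(dev, true);

	return passed;
}
```

Keeping the decision in one boolean means the restore path cannot enable a filter that was off before the test started.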



[PATCH net-next 03/23] {topost} net: hns3: set the cmdq out_vld bit to 0 after used

2018-03-07 Thread Peng Li
The driver checks the out_vld bit when it gets a new cmdq BD; if the bit
is 1, the BD is valid. The driver should set the bit to 0 after the BD has
been used, and the hardware will set it to 1 again when it posts a valid BD.

Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c   | 1 +
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index ea78a99..3a2c174 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -412,6 +412,7 @@ void hclge_mbx_handler(struct hclge_dev *hdev)
req->msg[0]);
break;
}
+   crq->desc[crq->next_to_use].flag = 0;
hclge_mbx_ring_ptr_move_crq(crq);
}
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c
index e39cad2..18283ef 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c
@@ -171,6 +171,7 @@ void hclgevf_mbx_handler(struct hclgevf_dev *hdev)
req->msg[0]);
break;
}
+   crq->desc[crq->next_to_use].flag = 0;
hclge_mbx_ring_ptr_move_crq(crq);
flag = le16_to_cpu(crq->desc[crq->next_to_use].flag);
}
-- 
2.9.3
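
Editorial note: the valid-bit handshake described in the commit message can be sketched in userspace as a small ring whose consumer clears each descriptor's flag before advancing. struct ring, FLAG_VALID and RING_SIZE are hypothetical names for illustration; the real driver uses its own descriptor layout and ring helpers.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE   4
#define FLAG_VALID  0x1   /* hypothetical stand-in for the out_vld bit */

struct ring_desc {
	uint16_t flag;
	int      msg;
};

struct ring {
	struct ring_desc desc[RING_SIZE];
	unsigned int next_to_use;
};

/* Consume all valid descriptors; returns how many were handled. */
static int ring_consume(struct ring *r, int *sum)
{
	int handled = 0;

	assert(r && sum);

	while (r->desc[r->next_to_use].flag & FLAG_VALID) {
		*sum += r->desc[r->next_to_use].msg;
		handled++;

		/* Hand the slot back: clear the flag before moving on. */
		r->desc[r->next_to_use].flag = 0;
		r->next_to_use = (r->next_to_use + 1) % RING_SIZE;
	}
	return handled;
}
```

Without the clearing step, a completely full ring would make the loop wrap around and process the same stale descriptors again.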



[PATCH net-next 18/23] {topost} net: hns3: fix for coal configuration lost when setting the channel

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

This patch fixes the loss of the coalesce configuration when
setting the channel number by restoring every vector's coalesce
configuration to vector 0's, because all vectors belonging to
the same netdev have the same coalesce configuration for now.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 37 +++--
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index a0ba25f..b02f3ff 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3412,7 +3412,24 @@ static u16 hns3_get_max_available_channels(struct 
net_device *netdev)
return min_t(u16, max_tqps, (free_tqps + h->kinfo.num_tqps));
 }
 
-static int hns3_modify_tqp_num(struct net_device *netdev, u16 new_tqp_num)
+static void hns3_restore_coal(struct hns3_nic_priv *priv,
+ struct hns3_enet_coalesce *tx,
+ struct hns3_enet_coalesce *rx)
+{
+   u16 vector_num = priv->vector_num;
+   int i;
+
+   for (i = 0; i < vector_num; i++) {
+   memcpy(&priv->tqp_vector[i].tx_group.coal, tx,
+  sizeof(struct hns3_enet_coalesce));
+   memcpy(&priv->tqp_vector[i].rx_group.coal, rx,
+  sizeof(struct hns3_enet_coalesce));
+   }
+}
+
+static int hns3_modify_tqp_num(struct net_device *netdev, u16 new_tqp_num,
+  struct hns3_enet_coalesce *tx,
+  struct hns3_enet_coalesce *rx)
 {
struct hns3_nic_priv *priv = netdev_priv(netdev);
struct hnae3_handle *h = hns3_get_handle(netdev);
@@ -3430,6 +3447,8 @@ static int hns3_modify_tqp_num(struct net_device *netdev, 
u16 new_tqp_num)
if (ret)
goto err_alloc_vector;
 
+   hns3_restore_coal(priv, tx, rx);
+
ret = hns3_nic_init_vector_data(priv);
if (ret)
goto err_uninit_vector;
@@ -3460,6 +3479,7 @@ int hns3_set_channels(struct net_device *netdev,
struct hns3_nic_priv *priv = netdev_priv(netdev);
struct hnae3_handle *h = hns3_get_handle(netdev);
struct hnae3_knic_private_info *kinfo = &h->kinfo;
+   struct hns3_enet_coalesce tx_coal, rx_coal;
bool if_running = netif_running(netdev);
u32 new_tqp_num = ch->combined_count;
u16 org_tqp_num;
@@ -3493,15 +3513,26 @@ int hns3_set_channels(struct net_device *netdev,
goto open_netdev;
}
 
+   /* Changing the tqp num may also change the vector num,
+* ethtool only support setting and querying one coal
+* configuation for now, so save the vector 0' coal
+* configuation here in order to restore it.
+*/
+   memcpy(&tx_coal, &priv->tqp_vector[0].tx_group.coal,
+  sizeof(struct hns3_enet_coalesce));
+   memcpy(&rx_coal, &priv->tqp_vector[0].rx_group.coal,
+  sizeof(struct hns3_enet_coalesce));
+
hns3_nic_dealloc_vector_data(priv);
 
hns3_uninit_all_ring(priv);
hns3_put_ring_config(priv);
 
org_tqp_num = h->kinfo.num_tqps;
-   ret = hns3_modify_tqp_num(netdev, new_tqp_num);
+   ret = hns3_modify_tqp_num(netdev, new_tqp_num, &tx_coal, &rx_coal);
if (ret) {
-   ret = hns3_modify_tqp_num(netdev, org_tqp_num);
+   ret = hns3_modify_tqp_num(netdev, org_tqp_num,
+ &tx_coal, &rx_coal);
if (ret) {
/* If revert to old tqp failed, fatal error occurred */
dev_err(&netdev->dev,
-- 
2.9.3
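
Editorial note: the fix saves vector 0's coalesce settings before the vectors are torn down and copies them into every vector after reallocation. A minimal userspace sketch of that sequence follows. struct coalesce, struct vector, struct nic and the nic_* helpers are hypothetical reductions of the driver structures, and calloc()/free() stand in for the driver's vector allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced versions of the driver structures. */
struct coalesce {
	unsigned short int_gl;
	unsigned char  gl_adapt_enable;
};

struct vector {
	struct coalesce tx;
	struct coalesce rx;
};

struct nic {
	struct vector *vectors;
	unsigned int   vector_num;
};

/* Reallocate the vector array; fresh vectors come back zeroed. */
static int nic_realloc_vectors(struct nic *nic, unsigned int new_num)
{
	struct vector *v = calloc(new_num, sizeof(*v));

	if (!v)
		return -1;
	free(nic->vectors);
	nic->vectors = v;
	nic->vector_num = new_num;
	return 0;
}

/* Change the vector count while carrying vector 0's settings over. */
static int nic_set_channels(struct nic *nic, unsigned int new_num)
{
	struct coalesce tx, rx;
	unsigned int i;

	assert(nic && nic->vector_num > 0 && new_num > 0);

	/* Save before the old array is freed. */
	tx = nic->vectors[0].tx;
	rx = nic->vectors[0].rx;

	if (nic_realloc_vectors(nic, new_num))
		return -1;

	/* Restore the saved settings into every new vector. */
	for (i = 0; i < nic->vector_num; i++) {
		nic->vectors[i].tx = tx;
		nic->vectors[i].rx = rx;
	}
	return 0;
}
```

The sketch uses struct assignment where the patch uses memcpy(); both copy the whole coalesce struct as a unit, which is what grouping the fields into one struct makes possible.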



[PATCH net-next 17/23] {topost} net: hns3: refactor the coalesce related struct

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

This patch refactors the coalesce related struct by introducing
the hns3_enet_coalesce struct, in order to fix the coalesce
configuration lost problem when changing the channel number.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 46 +++---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h| 10 +++--
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 26 +++-
 3 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 453f509..a0ba25f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -168,8 +168,8 @@ void hns3_set_vector_coalesce_rl(struct 
hns3_enet_tqp_vector *tqp_vector,
 * GL and RL(Rate Limiter) are 2 ways to acheive interrupt coalescing
 */
 
-   if (rl_reg > 0 && !tqp_vector->tx_group.gl_adapt_enable &&
-   !tqp_vector->rx_group.gl_adapt_enable)
+   if (rl_reg > 0 && !tqp_vector->tx_group.coal.gl_adapt_enable &&
+   !tqp_vector->rx_group.coal.gl_adapt_enable)
/* According to the hardware, the range of rl_reg is
 * 0-59 and the unit is 4.
 */
@@ -205,17 +205,17 @@ static void hns3_vector_gl_rl_init(struct 
hns3_enet_tqp_vector *tqp_vector,
 */
 
/* Default: enable interrupt coalescing self-adaptive and GL */
-   tqp_vector->tx_group.gl_adapt_enable = 1;
-   tqp_vector->rx_group.gl_adapt_enable = 1;
+   tqp_vector->tx_group.coal.gl_adapt_enable = 1;
+   tqp_vector->rx_group.coal.gl_adapt_enable = 1;
 
-   tqp_vector->tx_group.int_gl = HNS3_INT_GL_50K;
-   tqp_vector->rx_group.int_gl = HNS3_INT_GL_50K;
+   tqp_vector->tx_group.coal.int_gl = HNS3_INT_GL_50K;
+   tqp_vector->rx_group.coal.int_gl = HNS3_INT_GL_50K;
 
/* Default: disable RL */
h->kinfo.int_rl_setting = 0;
 
-   tqp_vector->rx_group.flow_level = HNS3_FLOW_LOW;
-   tqp_vector->tx_group.flow_level = HNS3_FLOW_LOW;
+   tqp_vector->rx_group.coal.flow_level = HNS3_FLOW_LOW;
+   tqp_vector->tx_group.coal.flow_level = HNS3_FLOW_LOW;
 }
 
 static void hns3_vector_gl_rl_init_hw(struct hns3_enet_tqp_vector *tqp_vector,
@@ -224,9 +224,9 @@ static void hns3_vector_gl_rl_init_hw(struct 
hns3_enet_tqp_vector *tqp_vector,
struct hnae3_handle *h = priv->ae_handle;
 
hns3_set_vector_coalesce_tx_gl(tqp_vector,
-  tqp_vector->tx_group.int_gl);
+  tqp_vector->tx_group.coal.int_gl);
hns3_set_vector_coalesce_rx_gl(tqp_vector,
-  tqp_vector->rx_group.int_gl);
+  tqp_vector->rx_group.coal.int_gl);
hns3_set_vector_coalesce_rl(tqp_vector, h->kinfo.int_rl_setting);
 }
 
@@ -2381,12 +2381,12 @@ static bool hns3_get_new_int_gl(struct 
hns3_enet_ring_group *ring_group)
u16 new_int_gl;
int usecs;
 
-   if (!ring_group->int_gl)
+   if (!ring_group->coal.int_gl)
return false;
 
if (ring_group->total_packets == 0) {
-   ring_group->int_gl = HNS3_INT_GL_50K;
-   ring_group->flow_level = HNS3_FLOW_LOW;
+   ring_group->coal.int_gl = HNS3_INT_GL_50K;
+   ring_group->coal.flow_level = HNS3_FLOW_LOW;
return true;
}
 
@@ -2396,10 +2396,10 @@ static bool hns3_get_new_int_gl(struct 
hns3_enet_ring_group *ring_group)
 * 20-1249MB/s high  (18000 ints/s)
 * > 4pps  ultra (8000 ints/s)
 */
-   new_flow_level = ring_group->flow_level;
-   new_int_gl = ring_group->int_gl;
+   new_flow_level = ring_group->coal.flow_level;
+   new_int_gl = ring_group->coal.int_gl;
tqp_vector = ring_group->ring->tqp_vector;
-   usecs = (ring_group->int_gl << 1);
+   usecs = (ring_group->coal.int_gl << 1);
bytes_per_usecs = ring_group->total_bytes / usecs;
/* 100 microseconds */
packets_per_secs = ring_group->total_packets * 100 / usecs;
@@ -2446,9 +2446,9 @@ static bool hns3_get_new_int_gl(struct 
hns3_enet_ring_group *ring_group)
 
ring_group->total_bytes = 0;
ring_group->total_packets = 0;
-   ring_group->flow_level = new_flow_level;
-   if (new_int_gl != ring_group->int_gl) {
-   ring_group->int_gl = new_int_gl;
+   ring_group->coal.flow_level = new_flow_level;
+   if (new_int_gl != ring_group->coal.int_gl) {
+   ring_group->coal.int_gl = new_int_gl;
return true;
}
return false;
@@ -2460,18 +2460,18 @@ static void hns3_update_new_int_gl(struct 
hns3_enet_tqp_vector *tqp_vector)
struct 

[PATCH net-next 23/23] {topost} net: hns3: add support for VF driver inner interface hclgevf_ops.get_tqps_and_rss_info

2018-03-07 Thread Peng Li
This patch adds support for VF driver inner interface
hclgevf_ops.get_tqps_and_rss_info. This interface will be
used in the initialization process.

Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index b5cb8fb..6c240d6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1451,6 +1451,15 @@ static void hclgevf_get_channels(struct hnae3_handle 
*handle,
ch->combined_count = hdev->num_tqps;
 }
 
+static void hclgevf_get_tqps_and_rss_info(struct hnae3_handle *handle,
+ u16 *free_tqps, u16 *max_rss_size)
+{
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+
+   *free_tqps = 0;
+   *max_rss_size = hdev->rss_size_max;
+}
+
 static const struct hnae3_ae_ops hclgevf_ops = {
.init_ae_dev = hclgevf_init_ae_dev,
.uninit_ae_dev = hclgevf_uninit_ae_dev,
@@ -1482,6 +1491,7 @@ static const struct hnae3_ae_ops hclgevf_ops = {
.get_fw_version = hclgevf_get_fw_version,
.set_vlan_filter = hclgevf_set_vlan_filter,
.get_channels = hclgevf_get_channels,
+   .get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info,
 };
 
 static struct hnae3_ae_algo ae_algovf = {
-- 
2.9.3



[PATCH net-next 20/23] {topost} net: hns3: fix for buffer overflow smatch warning

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

This patch fixes the buffer overflow warning by refactoring
hclgevf_bind_ring_to_vector and hclge_get_ring_chain_from_mbx.

Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Fixes: dde1a86e93ca ("net: hns3: Add mailbox support to PF driver")
Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h|  2 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 19 ---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 60 ++
 3 files changed, 39 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h 
b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h
index 3e9203e..e6e1d22 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h
@@ -57,6 +57,8 @@ enum hclge_mbx_vlan_cfg_subcode {
 
 #define HCLGE_MBX_MAX_MSG_SIZE 16
 #define HCLGE_MBX_MAX_RESP_DATA_SIZE   8
+#define HCLGE_MBX_RING_MAP_BASIC_MSG_NUM   3
+#define HCLGE_MBX_RING_NODE_VARIABLE_NUM   3
 
 struct hclgevf_mbx_resp_status {
struct mutex mbx_mutex; /* protects against contending sync cmd resp */
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index ed34ca3..e3e4ded 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -105,14 +105,17 @@ static int hclge_get_ring_chain_from_mbx(
struct hnae3_ring_chain_node *ring_chain,
struct hclge_vport *vport)
 {
-#define HCLGE_RING_NODE_VARIABLE_NUM   3
-#define HCLGE_RING_MAP_MBX_BASIC_MSG_NUM   3
struct hnae3_ring_chain_node *cur_chain, *new_chain;
int ring_num;
int i;
 
ring_num = req->msg[2];
 
+   if (ring_num > ((HCLGE_MBX_VF_MSG_DATA_NUM -
+   HCLGE_MBX_RING_MAP_BASIC_MSG_NUM) /
+   HCLGE_MBX_RING_NODE_VARIABLE_NUM))
+   return -ENOMEM;
+
hnae_set_bit(ring_chain->flag, HNAE3_RING_TYPE_B, req->msg[3]);
ring_chain->tqp_index =
hclge_get_queue_id(vport->nic.kinfo.tqp[req->msg[4]]);
@@ -128,18 +131,18 @@ static int hclge_get_ring_chain_from_mbx(
goto err;
 
hnae_set_bit(new_chain->flag, HNAE3_RING_TYPE_B,
-req->msg[HCLGE_RING_NODE_VARIABLE_NUM * i +
-HCLGE_RING_MAP_MBX_BASIC_MSG_NUM]);
+req->msg[HCLGE_MBX_RING_NODE_VARIABLE_NUM * i +
+HCLGE_MBX_RING_MAP_BASIC_MSG_NUM]);
 
new_chain->tqp_index =
hclge_get_queue_id(vport->nic.kinfo.tqp
-   [req->msg[HCLGE_RING_NODE_VARIABLE_NUM * i +
-   HCLGE_RING_MAP_MBX_BASIC_MSG_NUM + 1]]);
+   [req->msg[HCLGE_MBX_RING_NODE_VARIABLE_NUM * i +
+   HCLGE_MBX_RING_MAP_BASIC_MSG_NUM + 1]]);
 
hnae_set_field(new_chain->int_gl_idx, HCLGE_INT_GL_IDX_M,
   HCLGE_INT_GL_IDX_S,
-  req->msg[HCLGE_RING_NODE_VARIABLE_NUM * i +
-  HCLGE_RING_MAP_MBX_BASIC_MSG_NUM + 2]);
+  req->msg[HCLGE_MBX_RING_NODE_VARIABLE_NUM * i +
+  HCLGE_MBX_RING_MAP_BASIC_MSG_NUM + 2]);
 
cur_chain->next = new_chain;
cur_chain = new_chain;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 6bce99a..b5cb8fb 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -533,13 +533,11 @@ static int hclgevf_bind_ring_to_vector(struct 
hnae3_handle *handle, bool en,
   int vector,
   struct hnae3_ring_chain_node *ring_chain)
 {
-#define HCLGEVF_RING_NODE_VARIABLE_NUM 3
-#define HCLGEVF_RING_MAP_MBX_BASIC_MSG_NUM 3
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
struct hnae3_ring_chain_node *node;
struct hclge_mbx_vf_to_pf_cmd *req;
struct hclgevf_desc desc;
-   int i, vector_id;
+   int i = 0, vector_id;
int status;
u8 type;
 
@@ -551,28 +549,33 @@ static int hclgevf_bind_ring_to_vector(struct 
hnae3_handle *handle, bool en,
return vector_id;
}
 
-   hclgevf_cmd_setup_basic_desc(&desc, HCLGEVF_OPC_MBX_VF_TO_PF, false);
-   type = en ?
-   HCLGE_MBX_MAP_RING_TO_VECTOR : HCLGE_MBX_UNMAP_RING_TO_VECTOR;
-   req->msg[0] = type;
-   req->msg[1] = vector_id; /* vector_id should be id in VF */
-
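
Editorial note on the bound check added to hclge_get_ring_chain_from_mbx(): the ring count comes from the mailbox message itself, so it has to be validated before it is used to index the message array. A minimal userspace sketch follows. MSG_DATA_NUM = 16 is an assumption (the value of HCLGE_MBX_VF_MSG_DATA_NUM is not shown in this patch), and parse_ring_nodes() is a hypothetical helper that only sums the queue index byte of each node.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MSG_DATA_NUM        16  /* assumed size of the msg[] array */
#define RING_MAP_BASIC_NUM   3  /* header bytes before the first ring node */
#define RING_NODE_VAR_NUM    3  /* bytes per ring node */
#define MAX_RINGS ((MSG_DATA_NUM - RING_MAP_BASIC_NUM) / RING_NODE_VAR_NUM)

/*
 * Validate the ring count in msg[2], then sum the queue index byte of
 * every ring node. Returns the ring count, or -ENOMEM (the error code
 * the patch returns) when the count would overrun the message.
 */
static int parse_ring_nodes(const uint8_t *msg, int *tqp_sum)
{
	int ring_num, i;

	assert(msg && tqp_sum);

	ring_num = msg[2];
	if (ring_num > MAX_RINGS)
		return -ENOMEM;

	*tqp_sum = 0;
	for (i = 0; i < ring_num; i++)
		*tqp_sum += msg[RING_NODE_VAR_NUM * i + RING_MAP_BASIC_NUM + 1];
	return ring_num;
}
```

With these sizes MAX_RINGS is 4, so the highest byte a node can touch is index 3 * 3 + 3 + 2 = 14, which stays inside the 16-byte array.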

[PATCH net-next 15/23] {topost} net: hns3: refactor the get/put_vector function

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

There is a get_vector function, which allocates the vectors
for a client, but there is no put_vector function to free
them.

This patch introduces the put_vector function in order to
fix the loss of coalesce configuration during the reset
process.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  3 +++
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c|  4 
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 28 --
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 12 +++---
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 3c653eb..70441d2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -265,6 +265,8 @@ struct hnae3_ae_dev {
  *   Get tc size of handle
  * get_vector()
  *   Get vector number and vector information
+ * put_vector()
+ *   Put the vector in hdev
  * map_ring_to_vector()
  *   Map rings to vector
  * unmap_ring_from_vector()
@@ -376,6 +378,7 @@ struct hnae3_ae_ops {
 
int (*get_vector)(struct hnae3_handle *handle, u16 vector_num,
  struct hnae3_vector_info *vector_info);
+   int (*put_vector)(struct hnae3_handle *handle, int vector_num);
int (*map_ring_to_vector)(struct hnae3_handle *handle,
  int vector_num,
  struct hnae3_ring_chain_node *vr_chain);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 2bed73e..fef65b9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2709,6 +2709,10 @@ static int hns3_nic_uninit_vector_data(struct 
hns3_nic_priv *priv)
if (ret)
return ret;
 
+   ret = h->ae_algo->ops->put_vector(h, tqp_vector->vector_irq);
+   if (ret)
+   return ret;
+
hns3_free_vector_ring_chain(tqp_vector, &vector_ring_chain);
 
if (priv->tqp_vector[i].irq_init_flag == HNS3_VECTOR_INITED) {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c0f6939..323f95b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2969,6 +2969,24 @@ static int hclge_get_vector_index(struct hclge_dev 
*hdev, int vector)
return -EINVAL;
 }
 
+static int hclge_put_vector(struct hnae3_handle *handle, int vector)
+{
+   struct hclge_vport *vport = hclge_get_vport(handle);
+   struct hclge_dev *hdev = vport->back;
+   int vector_id;
+
+   vector_id = hclge_get_vector_index(hdev, vector);
+   if (vector_id < 0) {
+   dev_err(&hdev->pdev->dev,
+   "Get vector index fail. vector_id =%d\n", vector_id);
+   return vector_id;
+   }
+
+   hclge_free_vector(hdev, vector_id);
+
+   return 0;
+}
+
 static u32 hclge_get_rss_key_size(struct hnae3_handle *handle)
 {
return HCLGE_RSS_KEY_SIZE;
@@ -3523,18 +3541,13 @@ static int hclge_unmap_ring_frm_vector(struct 
hnae3_handle *handle,
}
 
ret = hclge_bind_ring_with_vector(vport, vector_id, false, ring_chain);
-   if (ret) {
+   if (ret)
dev_err(&hdev->pdev->dev,
"Unmap ring from vector fail. vectorid=%d, ret =%d\n",
vector_id,
ret);
-   return ret;
-   }
-
-   /* Free this MSIX or MSI vector */
-   hclge_free_vector(hdev, vector_id);
 
-   return 0;
+   return ret;
 }
 
 int hclge_cmd_set_promisc_mode(struct hclge_dev *hdev,
@@ -5996,6 +6009,7 @@ static const struct hnae3_ae_ops hclge_ops = {
.map_ring_to_vector = hclge_map_ring_to_vector,
.unmap_ring_from_vector = hclge_unmap_ring_frm_vector,
.get_vector = hclge_get_vector,
+   .put_vector = hclge_put_vector,
.set_promisc_mode = hclge_set_promisc_mode,
.set_loopback = hclge_set_loopback,
.start = hclge_ae_start,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index eee5e20..6bce99a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -627,13 +627,18 @@ static int hclgevf_unmap_ring_from_vector(
}
 
ret = hclgevf_bind_ring_to_vector(handle, false, vector, ring_chain);
-   if (ret) {
+   if (ret)
dev_err(&hdev->pdev->dev,
"Unmap ring from vector fail. vector=%d, ret 

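Editorial note: the patch separates "unmap the rings" from "free the vector" by adding an explicit put operation that mirrors get. A minimal userspace sketch of such a symmetric pair follows, with the put side looking the vector up by its IRQ number as hclge_put_vector() does. struct vector_pool and the pool_* helpers are hypothetical and much simpler than the driver's bookkeeping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_VECTORS 4

/* Hypothetical vector pool keyed by IRQ number. */
struct vector_pool {
	int  irq[NUM_VECTORS];     /* IRQ number backing each slot */
	bool in_use[NUM_VECTORS];
};

/* Hand out a free vector; returns its IRQ number or -ENOSPC. */
static int pool_get_vector(struct vector_pool *p)
{
	int i;

	assert(p);
	for (i = 0; i < NUM_VECTORS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return p->irq[i];
		}
	}
	return -ENOSPC;
}

/* Release the vector identified by its IRQ number. */
static int pool_put_vector(struct vector_pool *p, int irq)
{
	int i;

	assert(p);
	for (i = 0; i < NUM_VECTORS; i++) {
		if (p->irq[i] == irq && p->in_use[i]) {
			p->in_use[i] = false;
			return 0;
		}
	}
	return -EINVAL;   /* unknown or already free */
}
```

Because unmapping no longer frees the vector as a side effect, a caller can unmap and remap rings across a reset while keeping the same vector.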
[PATCH net-next 21/23] {topost} net: hns3: fix the queue id for tqp enable&

2018-03-07 Thread Peng Li
Command HCLGE_OPC_CFG_COM_TQP_QUEUE should use the queue id local to the
function, but command HCLGE_OPC_RESET_TQP_QUEUE should use the global
queue id.
This patch fixes the queue id used for queue enable/disable/reset.

Signed-off-by: Peng Li 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 50 +++---
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 323f95b..ea33cc5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3720,20 +3720,11 @@ static int hclge_ae_start(struct hnae3_handle *handle)
 {
struct hclge_vport *vport = hclge_get_vport(handle);
struct hclge_dev *hdev = vport->back;
-   int i, queue_id, ret;
+   int i, ret;
 
-   for (i = 0; i < vport->alloc_tqps; i++) {
-   /* todo clear interrupt */
-   /* ring enable */
-   queue_id = hclge_get_queue_id(handle->kinfo.tqp[i]);
-   if (queue_id < 0) {
-   dev_warn(&hdev->pdev->dev,
-"Get invalid queue id, ignore it\n");
-   continue;
-   }
+   for (i = 0; i < vport->alloc_tqps; i++)
+   hclge_tqp_enable(hdev, i, 0, true);
 
-   hclge_tqp_enable(hdev, queue_id, 0, true);
-   }
/* mac enable */
hclge_cfg_mac_mode(hdev, true);
clear_bit(HCLGE_STATE_DOWN, &hdev->state);
@@ -3753,19 +3744,11 @@ static void hclge_ae_stop(struct hnae3_handle *handle)
 {
struct hclge_vport *vport = hclge_get_vport(handle);
struct hclge_dev *hdev = vport->back;
-   int i, queue_id;
+   int i;
 
-   for (i = 0; i < vport->alloc_tqps; i++) {
-   /* Ring disable */
-   queue_id = hclge_get_queue_id(handle->kinfo.tqp[i]);
-   if (queue_id < 0) {
-   dev_warn(&hdev->pdev->dev,
-"Get invalid queue id, ignore it\n");
-   continue;
-   }
+   for (i = 0; i < vport->alloc_tqps; i++)
+   hclge_tqp_enable(hdev, i, 0, false);
 
-   hclge_tqp_enable(hdev, queue_id, 0, false);
-   }
/* Mac disable */
hclge_cfg_mac_mode(hdev, false);
 
@@ -4851,21 +4834,36 @@ static int hclge_get_reset_status(struct hclge_dev 
*hdev, u16 queue_id)
return hnae_get_bit(req->ready_to_reset, HCLGE_TQP_RESET_B);
 }
 
+static u16 hclge_covert_handle_qid_global(struct hnae3_handle *handle,
+ u16 queue_id)
+{
+   struct hnae3_queue *queue;
+   struct hclge_tqp *tqp;
+
+   queue = handle->kinfo.tqp[queue_id];
+   tqp = container_of(queue, struct hclge_tqp, q);
+
+   return tqp->index;
+}
+
 void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 {
struct hclge_vport *vport = hclge_get_vport(handle);
struct hclge_dev *hdev = vport->back;
int reset_try_times = 0;
int reset_status;
+   u16 queue_gid;
int ret;
 
+   queue_gid = hclge_covert_handle_qid_global(handle, queue_id);
+
ret = hclge_tqp_enable(hdev, queue_id, 0, false);
if (ret) {
dev_warn(&hdev->pdev->dev, "Disable tqp fail, ret = %d\n", ret);
return;
}
 
-   ret = hclge_send_reset_tqp_cmd(hdev, queue_id, true);
+   ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, true);
if (ret) {
dev_warn(&hdev->pdev->dev,
 "Send reset tqp cmd fail, ret = %d\n", ret);
@@ -4876,7 +4874,7 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 
queue_id)
while (reset_try_times++ < HCLGE_TQP_RESET_TRY_TIMES) {
/* Wait for tqp hw reset */
msleep(20);
-   reset_status = hclge_get_reset_status(hdev, queue_id);
+   reset_status = hclge_get_reset_status(hdev, queue_gid);
if (reset_status)
break;
}
@@ -4886,7 +4884,7 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 
queue_id)
return;
}
 
-   ret = hclge_send_reset_tqp_cmd(hdev, queue_id, false);
+   ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, false);
if (ret) {
dev_warn(&hdev->pdev->dev,
 "Deassert the soft reset fail, ret = %d\n", ret);
-- 
2.9.3
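
Editorial note on hclge_covert_handle_qid_global(): the helper turns a function-local queue id into the device-global one by walking from the generic queue embedded in a hclge_tqp back to its container. A minimal userspace sketch of that lookup follows. The container_of() macro is re-created with offsetof(), and struct queue, struct tqp and struct handle are hypothetical reductions of the driver types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, reduced structures. */
struct queue {
	int dummy;
};

struct tqp {
	uint16_t     index;   /* device-global queue id */
	struct queue q;       /* embedded generic queue */
};

struct handle {
	struct queue **tqp;   /* per-function queue table */
	uint16_t       num_tqps;
};

/* Map a function-local queue id to the device-global id. */
static uint16_t handle_qid_to_global(const struct handle *h, uint16_t qid)
{
	struct tqp *t;

	assert(h && qid < h->num_tqps);
	t = container_of(h->tqp[qid], struct tqp, q);
	return t->index;
}
```

In the usage below, a function that owns global queues 2 and 3 sees them as local ids 0 and 1, which is why the reset command needs the converted id.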



[PATCH net-next 16/23] {topost} net: hns3: fix for coalesce configuration lost during reset

2018-03-07 Thread Peng Li
From: Yunsheng Lin 

The coalesce configuration is set back to its default value by
hns3_nic_init_vector_data during reset, so any user configured
coalesce settings are lost.

This patch fixes it by setting the default value in
hns3_nic_alloc_vector_data, which is not called in the
reset process.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 156 +---
 1 file changed, 114 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index fef65b9..453f509 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -211,19 +211,25 @@ static void hns3_vector_gl_rl_init(struct 
hns3_enet_tqp_vector *tqp_vector,
tqp_vector->tx_group.int_gl = HNS3_INT_GL_50K;
tqp_vector->rx_group.int_gl = HNS3_INT_GL_50K;
 
-   hns3_set_vector_coalesce_tx_gl(tqp_vector,
-  tqp_vector->tx_group.int_gl);
-   hns3_set_vector_coalesce_rx_gl(tqp_vector,
-  tqp_vector->rx_group.int_gl);
-
/* Default: disable RL */
h->kinfo.int_rl_setting = 0;
-   hns3_set_vector_coalesce_rl(tqp_vector, h->kinfo.int_rl_setting);
 
tqp_vector->rx_group.flow_level = HNS3_FLOW_LOW;
tqp_vector->tx_group.flow_level = HNS3_FLOW_LOW;
 }
 
+static void hns3_vector_gl_rl_init_hw(struct hns3_enet_tqp_vector *tqp_vector,
+ struct hns3_nic_priv *priv)
+{
+   struct hnae3_handle *h = priv->ae_handle;
+
+   hns3_set_vector_coalesce_tx_gl(tqp_vector,
+  tqp_vector->tx_group.int_gl);
+   hns3_set_vector_coalesce_rx_gl(tqp_vector,
+  tqp_vector->rx_group.int_gl);
+   hns3_set_vector_coalesce_rl(tqp_vector, h->kinfo.int_rl_setting);
+}
+
 static int hns3_nic_set_real_num_queue(struct net_device *netdev)
 {
struct hnae3_handle *h = hns3_get_handle(netdev);
@@ -2613,32 +2619,18 @@ static int hns3_nic_init_vector_data(struct 
hns3_nic_priv *priv)
struct hnae3_ring_chain_node vector_ring_chain;
struct hnae3_handle *h = priv->ae_handle;
struct hns3_enet_tqp_vector *tqp_vector;
-   struct hnae3_vector_info *vector;
-   struct pci_dev *pdev = h->pdev;
-   u16 tqp_num = h->kinfo.num_tqps;
-   u16 vector_num;
int ret = 0;
u16 i;
 
-   /* RSS size, cpu online and vector_num should be the same */
-   /* Should consider 2p/4p later */
-   vector_num = min_t(u16, num_online_cpus(), tqp_num);
-   vector = devm_kcalloc(&pdev->dev, vector_num, sizeof(*vector),
- GFP_KERNEL);
-   if (!vector)
-   return -ENOMEM;
-
-   vector_num = h->ae_algo->ops->get_vector(h, vector_num, vector);
-
-   priv->vector_num = vector_num;
-   priv->tqp_vector = (struct hns3_enet_tqp_vector *)
-   devm_kcalloc(&pdev->dev, vector_num, sizeof(*priv->tqp_vector),
-GFP_KERNEL);
-   if (!priv->tqp_vector)
-   return -ENOMEM;
+   for (i = 0; i < priv->vector_num; i++) {
+   tqp_vector = &priv->tqp_vector[i];
+   hns3_vector_gl_rl_init_hw(tqp_vector, priv);
+   tqp_vector->num_tqps = 0;
+   }
 
-   for (i = 0; i < tqp_num; i++) {
-   u16 vector_i = i % vector_num;
+   for (i = 0; i < h->kinfo.num_tqps; i++) {
+   u16 vector_i = i % priv->vector_num;
+   u16 tqp_num = h->kinfo.num_tqps;
 
tqp_vector = &priv->tqp_vector[vector_i];
 
@@ -2648,52 +2640,94 @@ static int hns3_nic_init_vector_data(struct 
hns3_nic_priv *priv)
hns3_add_ring_to_group(&tqp_vector->rx_group,
   priv->ring_data[i + tqp_num].ring);
 
-   tqp_vector->idx = vector_i;
-   tqp_vector->mask_addr = vector[vector_i].io_addr;
-   tqp_vector->vector_irq = vector[vector_i].vector;
-   tqp_vector->num_tqps++;
-
priv->ring_data[i].ring->tqp_vector = tqp_vector;
priv->ring_data[i + tqp_num].ring->tqp_vector = tqp_vector;
+   tqp_vector->num_tqps++;
}
 
-   for (i = 0; i < vector_num; i++) {
+   for (i = 0; i < priv->vector_num; i++) {
tqp_vector = &priv->tqp_vector[i];
 
tqp_vector->rx_group.total_bytes = 0;
tqp_vector->rx_group.total_packets = 0;
tqp_vector->tx_group.total_bytes = 0;
tqp_vector->tx_group.total_packets = 0;
-   hns3_vector_gl_rl_init(tqp_vector, priv);
tqp_vector->handle = h;
 
ret = hns3_get_vector_ring_chain(tqp_vector,
 

[PATCH net-next 22/23] {topost} net: hns3: set the max ring num when alloc netdev

2018-03-07 Thread Peng Li
The HNS3 driver should allocate the netdev with the maximum supported
number of rings, as the driver supports changing the ring count via
ethtool -L.

Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 27 -
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index b02f3ff..94f0b92 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -255,6 +255,16 @@ static int hns3_nic_set_real_num_queue(struct net_device 
*netdev)
return 0;
 }
 
+static u16 hns3_get_max_available_channels(struct hnae3_handle *h)
+{
+   u16 free_tqps, max_rss_size, max_tqps;
+
+   h->ae_algo->ops->get_tqps_and_rss_info(h, &free_tqps, &max_rss_size);
+   max_tqps = h->kinfo.num_tc * max_rss_size;
+
+   return min_t(u16, max_tqps, (free_tqps + h->kinfo.num_tqps));
+}
+
 static int hns3_nic_net_up(struct net_device *netdev)
 {
struct hns3_nic_priv *priv = netdev_priv(netdev);
@@ -3062,7 +3072,7 @@ static int hns3_client_init(struct hnae3_handle *handle)
int ret;
 
netdev = alloc_etherdev_mq(sizeof(struct hns3_nic_priv),
-  handle->kinfo.num_tqps);
+  hns3_get_max_available_channels(handle));
if (!netdev)
return -ENOMEM;
 
@@ -3401,17 +3411,6 @@ static int hns3_reset_notify(struct hnae3_handle *handle,
return ret;
 }
 
-static u16 hns3_get_max_available_channels(struct net_device *netdev)
-{
-   struct hnae3_handle *h = hns3_get_handle(netdev);
-   u16 free_tqps, max_rss_size, max_tqps;
-
-   h->ae_algo->ops->get_tqps_and_rss_info(h, &free_tqps, &max_rss_size);
-   max_tqps = h->kinfo.num_tc * max_rss_size;
-
-   return min_t(u16, max_tqps, (free_tqps + h->kinfo.num_tqps));
-}
-
 static void hns3_restore_coal(struct hns3_nic_priv *priv,
  struct hns3_enet_coalesce *tx,
  struct hns3_enet_coalesce *rx)
@@ -3488,12 +3487,12 @@ int hns3_set_channels(struct net_device *netdev,
if (ch->rx_count || ch->tx_count)
return -EINVAL;
 
-   if (new_tqp_num > hns3_get_max_available_channels(netdev) ||
+   if (new_tqp_num > hns3_get_max_available_channels(h) ||
new_tqp_num < kinfo->num_tc) {
dev_err(>dev,
"Change tqps fail, the tqp range is from %d to %d",
kinfo->num_tc,
-   hns3_get_max_available_channels(netdev));
+   hns3_get_max_available_channels(h));
return -EINVAL;
}
 
-- 
2.9.3
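
For readers following along, the limit computed by hns3_get_max_available_channels() in the diff above can be sketched in plain C as below. This is only a model of the arithmetic: struct chan_info and max_available_channels() are hypothetical names, and the values would come from get_tqps_and_rss_info() and kinfo in the real driver.

```c
#include <stdint.h>

/* Hypothetical stand-in for the values the driver reads. */
struct chan_info {
	uint16_t num_tc;       /* traffic classes in use */
	uint16_t num_tqps;     /* queue pairs currently assigned */
	uint16_t free_tqps;    /* queue pairs still unassigned */
	uint16_t max_rss_size; /* maximum RSS size reported by the device */
};

/* Upper bound on channels: limited both by num_tc * max_rss_size
 * and by the queue pairs that are assigned or still free.
 * The casts mirror the u16 truncation of min_t(u16, ...).
 */
static uint16_t max_available_channels(const struct chan_info *ci)
{
	uint16_t max_tqps = (uint16_t)(ci->num_tc * ci->max_rss_size);
	uint16_t avail = (uint16_t)(ci->free_tqps + ci->num_tqps);

	return max_tqps < avail ? max_tqps : avail;
}
```

Allocating the netdev with this bound rather than the current num_tqps leaves room for a later increase of the channel count.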



[PATCH net-next 04/23] {topost} net: hns3: fix endian issue when PF get mbx message flag

2018-03-07 Thread Peng Li
This patch fixes an endianness issue when the PF reads the mailbox (mbx)
message flag.

Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index 3a2c174..ed34ca3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -335,11 +335,11 @@ void hclge_mbx_handler(struct hclge_dev *hdev)
struct hclge_mbx_vf_to_pf_cmd *req;
struct hclge_vport *vport;
struct hclge_desc *desc;
-   int ret;
+   int ret, flag;
 
+   flag = le16_to_cpu(crq->desc[crq->next_to_use].flag);
/* handle all the mailbox requests in the queue */
-   while (hnae_get_bit(crq->desc[crq->next_to_use].flag,
-   HCLGE_CMDQ_RX_OUTVLD_B)) {
+   while (hnae_get_bit(flag, HCLGE_CMDQ_RX_OUTVLD_B)) {
desc = &crq->desc[crq->next_to_use];
req = (struct hclge_mbx_vf_to_pf_cmd *)desc->data;
 
@@ -414,6 +414,7 @@ void hclge_mbx_handler(struct hclge_dev *hdev)
}
crq->desc[crq->next_to_use].flag = 0;
hclge_mbx_ring_ptr_move_crq(crq);
+   flag = le16_to_cpu(crq->desc[crq->next_to_use].flag);
}
 
/* Write back CMDQ_RQ header pointer, M7 need this pointer */
-- 
2.9.3
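
As a side note, the reason the le16_to_cpu() conversion matters can be shown with a small userspace sketch. It assumes the flag field is stored as little-endian 16-bit data, as the patch implies; le16_decode() and flag_bit_set() are hypothetical helpers, not kernel APIs.

```c
#include <stdint.h>

/* Decode a little-endian 16-bit field byte by byte, so the result is
 * the same on little-endian and big-endian hosts.
 */
static uint16_t le16_decode(const void *p)
{
	const uint8_t *b = p;

	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Test a bit in the decoded value rather than in the raw memory. */
static int flag_bit_set(const void *le_flag, unsigned int bit)
{
	return (le16_decode(le_flag) >> bit) & 1;
}
```

Testing the bit on the raw field would give a different answer on a big-endian host, which is what the conversion before hnae_get_bit() avoids.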



[net-next:master 178/193] drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison expression (different address spaces)

2018-03-07 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   a366e300ae9fc466d333e6d8f2bc5d58ed248041
commit: 1ec54cb44e6731c3cb251bcf9251d65a4b4f6306 [178/193] net: unpollute 
priv_flags space
reproduce:
# apt-get install sparse
git checkout 1ec54cb44e6731c3cb251bcf9251d65a4b4f6306
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   drivers/net/ipvlan/ipvlan_core.c:57:36: sparse: incorrect type in argument 1 
(different base types) @@expected unsigned int [unsigned] [usertype] a @@   
 got unsigned int [unsigned] [usertype] a @@
   drivers/net/ipvlan/ipvlan_core.c:57:36:expected unsigned int [unsigned] 
[usertype] a
   drivers/net/ipvlan/ipvlan_core.c:57:36:got restricted __be32 const 
[usertype] s_addr
>> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison 
>> expression (different address spaces)
--
>> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison 
>> expression (different address spaces)
>> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison 
>> expression (different address spaces)
>> drivers/net/ipvlan/ipvlan.h:183:32: sparse: incompatible types in comparison 
>> expression (different address spaces)

vim +183 drivers/net/ipvlan/ipvlan.h

   180  
   181  static inline bool netif_is_ipvlan_port(const struct net_device *dev)
   182  {
 > 183  return dev->rx_handler == ipvlan_handle_frame;
   184  }
   185  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: linux-next: manual merge of the selinux tree with the net-next tree

2018-03-07 Thread Stephen Rothwell
Hi all,

On Mon, 5 Mar 2018 12:40:54 +1100 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the selinux tree got a conflict in:
> 
>   net/sctp/socket.c
> 
> between several refactoring commits from the net-next tree and commit:
> 
>   2277c7cd75e3 ("sctp: Add LSM hooks")
> 
> from the selinux tree.
> 
> I fixed it up (I think - see below) and can carry the fix as
> necessary. This is now fixed as far as linux-next is concerned, but any
> non trivial conflicts should be mentioned to your upstream maintainer
> when your tree is submitted for merging.  You may also want to consider
> cooperating with the maintainer of the conflicting tree to minimise any
> particularly complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell

The resolution now looks like below (there were more changes to this
file in the net-next tree).  It will keep changing every time this file
is touched :-(

-- 
Cheers,
Stephen Rothwell

diff --cc net/sctp/socket.c
index 7d3476a4860d,73b34a6b5b09..
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@@ -1606,200 -1622,308 +1622,216 @@@ static int sctp_error(struct sock *sk, 
  static int sctp_msghdr_parse(const struct msghdr *msg,
 struct sctp_cmsgs *cmsgs);
  
 -static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 +static int sctp_sendmsg_parse(struct sock *sk, struct sctp_cmsgs *cmsgs,
 +struct sctp_sndrcvinfo *srinfo,
 +const struct msghdr *msg, size_t msg_len)
  {
 -  struct net *net = sock_net(sk);
 -  struct sctp_sock *sp;
 -  struct sctp_endpoint *ep;
 -  struct sctp_association *new_asoc = NULL, *asoc = NULL;
 -  struct sctp_transport *transport, *chunk_tp;
 -  struct sctp_chunk *chunk;
 -  union sctp_addr to;
 -  struct sctp_af *af;
 -  struct sockaddr *msg_name = NULL;
 -  struct sctp_sndrcvinfo default_sinfo;
 -  struct sctp_sndrcvinfo *sinfo;
 -  struct sctp_initmsg *sinit;
 -  sctp_assoc_t associd = 0;
 -  struct sctp_cmsgs cmsgs = { NULL };
 -  enum sctp_scope scope;
 -  bool fill_sinfo_ttl = false, wait_connect = false;
 -  struct sctp_datamsg *datamsg;
 -  int msg_flags = msg->msg_flags;
 -  __u16 sinfo_flags = 0;
 -  long timeo;
 +  __u16 sflags;
int err;
  
 -  err = 0;
 -  sp = sctp_sk(sk);
 -  ep = sp->ep;
 +  if (sctp_sstate(sk, LISTENING) && sctp_style(sk, TCP))
 +  return -EPIPE;
  
 -  pr_debug("%s: sk:%p, msg:%p, msg_len:%zu ep:%p\n", __func__, sk,
 -   msg, msg_len, ep);
 -
 -  /* We cannot send a message over a TCP-style listening socket. */
 -  if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING)) {
 -  err = -EPIPE;
 -  goto out_nounlock;
 -  }
 +  if (msg_len > sk->sk_sndbuf)
 +  return -EMSGSIZE;
  
 -  /* Parse out the SCTP CMSGs.  */
 -  err = sctp_msghdr_parse(msg, &cmsgs);
 +  memset(cmsgs, 0, sizeof(*cmsgs));
 +  err = sctp_msghdr_parse(msg, cmsgs);
if (err) {
pr_debug("%s: msghdr parse err:%x\n", __func__, err);
 -  goto out_nounlock;
 +  return err;
}
  
 -  /* Fetch the destination address for this packet.  This
 -   * address only selects the association--it is not necessarily
 -   * the address we will send to.
 -   * For a peeled-off socket, msg_name is ignored.
 -   */
 -  if (!sctp_style(sk, UDP_HIGH_BANDWIDTH) && msg->msg_name) {
 -  int msg_namelen = msg->msg_namelen;
 +  memset(srinfo, 0, sizeof(*srinfo));
 +  if (cmsgs->srinfo) {
 +  srinfo->sinfo_stream = cmsgs->srinfo->sinfo_stream;
 +  srinfo->sinfo_flags = cmsgs->srinfo->sinfo_flags;
 +  srinfo->sinfo_ppid = cmsgs->srinfo->sinfo_ppid;
 +  srinfo->sinfo_context = cmsgs->srinfo->sinfo_context;
 +  srinfo->sinfo_assoc_id = cmsgs->srinfo->sinfo_assoc_id;
 +  srinfo->sinfo_timetolive = cmsgs->srinfo->sinfo_timetolive;
 +  }
  
 -  err = sctp_verify_addr(sk, (union sctp_addr *)msg->msg_name,
 - msg_namelen);
 -  if (err)
 -  return err;
 +  if (cmsgs->sinfo) {
 +  srinfo->sinfo_stream = cmsgs->sinfo->snd_sid;
 +  srinfo->sinfo_flags = cmsgs->sinfo->snd_flags;
 +  srinfo->sinfo_ppid = cmsgs->sinfo->snd_ppid;
 +  srinfo->sinfo_context = cmsgs->sinfo->snd_context;
 +  srinfo->sinfo_assoc_id = cmsgs->sinfo->snd_assoc_id;
 +  }
  
 -  if (msg_namelen > sizeof(to))
 -  msg_namelen = sizeof(to);
 -  memcpy(&to, msg->msg_name, msg_namelen);
 -  msg_name = msg->msg_name;
 +  if (cmsgs->prinfo) {
 +  srinfo->sinfo_timetolive = cmsgs->prinfo->pr_value;
 +  

Re: [PATCH bpf-next 0/2] bpf: add support for bpf program to read perf event sample address

2018-03-07 Thread Daniel Borkmann
On 03/06/2018 07:55 PM, Teng Qin wrote:
> These patches add support that allows bpf programs attached to perf events to
> read the address values recorded with the perf events. These values are
> requested by specifying sample_type with PERF_SAMPLE_ADDR when calling
> perf_event_open().
> 
> The main motivation for these changes is to support building memory or lock
> access profiling and tracing tools. For example on Intel CPUs, the recorded
> address values for supported memory or lock access perf events would be
> the access or lock target addresses from PEBS buffer. Such information would
> be very valuable for building tools that help understand memory access or
> lock acquire pattern.

Series applied to bpf-next, thanks Teng!


Re: pull-request: bpf 2018-03-08

2018-03-07 Thread Daniel Borkmann
On 03/08/2018 02:31 AM, David Miller wrote:
> From: Daniel Borkmann 
> Date: Thu,  8 Mar 2018 02:17:16 +0100
> 
>> The following pull-request contains BPF updates for your *net* tree.
>>
>> The main changes are:
>>
>> 1) Fix various BPF helpers which adjust the skb and its GSO information
>>with regards to SCTP GSO. The latter is a special case where gso_size
>>is of value GSO_BY_FRAGS, so mangling that will end up corrupting
>>the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s).
>>
>> 2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined
>>due to too old kernel headers in the system, from Jiri.
>>
>> 3) Increase the number of x64 JIT passes in order to allow larger images
>>to converge instead of punting them to interpreter or having them
>>rejected when the interpreter is not built into the kernel, from Daniel.
>>
>> Please consider pulling these changes from:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
> 
> Pulled, thanks Daniel.

Thanks!

> About that x86 JIT passes thing...
> 
> I think since you now have a scheduling point in there, you can be
> even more liberal with the limit if necessary.

Agree, if needed we can always adapt it further in future, I think 20
should be good for now.

Thanks,
Daniel


Re: pull-request: bpf 2018-03-08

2018-03-07 Thread David Miller
From: Daniel Borkmann 
Date: Thu,  8 Mar 2018 02:17:16 +0100

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Fix various BPF helpers which adjust the skb and its GSO information
>with regards to SCTP GSO. The latter is a special case where gso_size
>is of value GSO_BY_FRAGS, so mangling that will end up corrupting
>the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s).
> 
> 2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined
>due to too old kernel headers in the system, from Jiri.
> 
> 3) Increase the number of x64 JIT passes in order to allow larger images
>to converge instead of punting them to interpreter or having them
>rejected when the interpreter is not built into the kernel, from Daniel.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.

About that x86 JIT passes thing...

I think since you now have a scheduling point in there, you can be
even more liberal with the limit if necessary.


[for-next 02/11] net/mlx5: IPSec, Generalize sandbox QP commands

2018-03-07 Thread Saeed Mahameed
From: Yossi Kuperman 

The current code assumes only SA QP commands.
Refactor in order to pave the way for new QP commands:
1. Generic cmd response format.
2. SA cmd checks are in dedicated functions.
3. Aligned debug prints.

Signed-off-by: Yossi Kuperman 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   | 116 -
 include/linux/mlx5/mlx5_ifc_fpga.h |  16 +++
 2 files changed, 81 insertions(+), 51 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
index 95f9c5a8619b..e0f32b025e06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
@@ -41,35 +41,23 @@
 #define SBU_QP_QUEUE_SIZE 8
 #define MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC   (60 * 1000)
 
-enum mlx5_ipsec_response_syndrome {
-   MLX5_IPSEC_RESPONSE_SUCCESS = 0,
-   MLX5_IPSEC_RESPONSE_ILLEGAL_REQUEST = 1,
-   MLX5_IPSEC_RESPONSE_SADB_ISSUE = 2,
-   MLX5_IPSEC_RESPONSE_WRITE_RESPONSE_ISSUE = 3,
-};
-
-enum mlx5_fpga_ipsec_sacmd_status {
-   MLX5_FPGA_IPSEC_SACMD_PENDING,
-   MLX5_FPGA_IPSEC_SACMD_SEND_FAIL,
-   MLX5_FPGA_IPSEC_SACMD_COMPLETE,
+enum mlx5_fpga_ipsec_cmd_status {
+   MLX5_FPGA_IPSEC_CMD_PENDING,
+   MLX5_FPGA_IPSEC_CMD_SEND_FAIL,
+   MLX5_FPGA_IPSEC_CMD_COMPLETE,
 };
 
 struct mlx5_ipsec_command_context {
struct mlx5_fpga_dma_buf buf;
-   struct mlx5_accel_ipsec_sa sa;
-   enum mlx5_fpga_ipsec_sacmd_status status;
+   enum mlx5_fpga_ipsec_cmd_status status;
+   struct mlx5_ifc_fpga_ipsec_cmd_resp resp;
int status_code;
struct completion complete;
struct mlx5_fpga_device *dev;
struct list_head list; /* Item in pending_cmds */
+   u8 command[0];
 };
 
-struct mlx5_ipsec_sadb_resp {
-   __be32 syndrome;
-   __be32 sw_sa_handle;
-   u8 reserved[24];
-} __packed;
-
 struct mlx5_fpga_ipsec {
struct list_head pending_cmds;
spinlock_t pending_cmds_lock; /* Protects pending_cmds */
@@ -105,21 +93,22 @@ static void mlx5_fpga_ipsec_send_complete(struct 
mlx5_fpga_conn *conn,
   buf);
mlx5_fpga_warn(fdev, "IPSec command send failed with status 
%u\n",
   status);
-   context->status = MLX5_FPGA_IPSEC_SACMD_SEND_FAIL;
+   context->status = MLX5_FPGA_IPSEC_CMD_SEND_FAIL;
complete(&context->complete);
}
 }
 
-static inline int syndrome_to_errno(enum mlx5_ipsec_response_syndrome syndrome)
+static inline
+int syndrome_to_errno(enum mlx5_ifc_fpga_ipsec_response_syndrome syndrome)
 {
switch (syndrome) {
-   case MLX5_IPSEC_RESPONSE_SUCCESS:
+   case MLX5_FPGA_IPSEC_RESPONSE_SUCCESS:
return 0;
-   case MLX5_IPSEC_RESPONSE_SADB_ISSUE:
+   case MLX5_FPGA_IPSEC_RESPONSE_SADB_ISSUE:
return -EEXIST;
-   case MLX5_IPSEC_RESPONSE_ILLEGAL_REQUEST:
+   case MLX5_FPGA_IPSEC_RESPONSE_ILLEGAL_REQUEST:
return -EINVAL;
-   case MLX5_IPSEC_RESPONSE_WRITE_RESPONSE_ISSUE:
+   case MLX5_FPGA_IPSEC_RESPONSE_WRITE_RESPONSE_ISSUE:
return -EIO;
}
return -EIO;
@@ -127,9 +116,9 @@ static inline int syndrome_to_errno(enum 
mlx5_ipsec_response_syndrome syndrome)
 
 static void mlx5_fpga_ipsec_recv(void *cb_arg, struct mlx5_fpga_dma_buf *buf)
 {
-   struct mlx5_ipsec_sadb_resp *resp = buf->sg[0].data;
+   struct mlx5_ifc_fpga_ipsec_cmd_resp *resp = buf->sg[0].data;
struct mlx5_ipsec_command_context *context;
-   enum mlx5_ipsec_response_syndrome syndrome;
+   enum mlx5_ifc_fpga_ipsec_response_syndrome syndrome;
struct mlx5_fpga_device *fdev = cb_arg;
unsigned long flags;
 
@@ -139,8 +128,8 @@ static void mlx5_fpga_ipsec_recv(void *cb_arg, struct 
mlx5_fpga_dma_buf *buf)
return;
}
 
-   mlx5_fpga_dbg(fdev, "mlx5_ipsec recv_cb syndrome %08x sa_id %x\n",
- ntohl(resp->syndrome), ntohl(resp->sw_sa_handle));
+   mlx5_fpga_dbg(fdev, "mlx5_ipsec recv_cb syndrome %08x\n",
+ ntohl(resp->syndrome));
 
spin_lock_irqsave(&fdev->ipsec->pending_cmds_lock, flags);
context = list_first_entry_or_null(&fdev->ipsec->pending_cmds,
@@ -156,51 +145,48 @@ static void mlx5_fpga_ipsec_recv(void *cb_arg, struct 
mlx5_fpga_dma_buf *buf)
}
mlx5_fpga_dbg(fdev, "Handling response for %p\n", context);
 
-   if (context->sa.sw_sa_handle != resp->sw_sa_handle) {
-   mlx5_fpga_err(fdev, "mismatch SA handle. cmd 0x%08x vs resp 
0x%08x\n",
- ntohl(context->sa.sw_sa_handle),
- ntohl(resp->sw_sa_handle));
-   

[for-next 08/11] net/mlx5: Add flow-steering commands for FPGA IPSec implementation

2018-03-07 Thread Saeed Mahameed
From: Aviad Yehezkel 

In order to add a context to the FPGA, we need to get both the software
transform context (which includes the keys, etc.) and the
source/destination IPs (which are included in the steering
rule). Therefore, we register a new set of firmware-like commands for
the FPGA. Each time a rule is added, the steering core infrastructure
calls the FPGA command layer. If the rule is intended for the FPGA,
it combines the IP information with the software transformation
context and creates the respective hardware transform.
Afterwards, it calls the standard steering command layer.

Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c  |   7 +
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   | 724 +
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h   |  24 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   2 +
 include/linux/mlx5/accel.h |   5 +
 include/linux/mlx5/fs.h|   3 +
 7 files changed, 770 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
index ab5bc82855fd..9f1b1939716a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
@@ -100,3 +100,10 @@ void mlx5_accel_esp_destroy_xfrm(struct 
mlx5_accel_esp_xfrm *xfrm)
mlx5_fpga_esp_destroy_xfrm(xfrm);
 }
 EXPORT_SYMBOL_GPL(mlx5_accel_esp_destroy_xfrm);
+
+int mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm,
+  const struct mlx5_accel_esp_xfrm_attrs *attrs)
+{
+   return mlx5_fpga_esp_modify_xfrm(xfrm, attrs);
+}
+EXPORT_SYMBOL_GPL(mlx5_accel_esp_modify_xfrm);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
index daae44c937f0..7b43fa269117 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
@@ -33,8 +33,12 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "mlx5_core.h"
+#include "fs_cmd.h"
 #include "fpga/ipsec.h"
 #include "fpga/sdk.h"
 #include "fpga/core.h"
@@ -75,6 +79,12 @@ struct mlx5_fpga_esp_xfrm {
struct mlx5_accel_esp_xfrm  accel_xfrm;
 };
 
+struct mlx5_fpga_ipsec_rule {
+   struct rb_node  node;
+   struct fs_fte   *fte;
+   struct mlx5_fpga_ipsec_sa_ctx   *ctx;
+};
+
 static const struct rhashtable_params rhash_sa = {
.key_len = FIELD_SIZEOF(struct mlx5_fpga_ipsec_sa_ctx, hw_sa),
.key_offset = offsetof(struct mlx5_fpga_ipsec_sa_ctx, hw_sa),
@@ -84,11 +94,15 @@ static const struct rhashtable_params rhash_sa = {
 };
 
 struct mlx5_fpga_ipsec {
+   struct mlx5_fpga_device *fdev;
struct list_head pending_cmds;
spinlock_t pending_cmds_lock; /* Protects pending_cmds */
u32 caps[MLX5_ST_SZ_DW(ipsec_extended_cap)];
struct mlx5_fpga_conn *conn;
 
+   struct notifier_block   fs_notifier_ingress_bypass;
+   struct notifier_block   fs_notifier_egress;
+
/* Map hardware SA   -->  SA context
 * (mlx5_fpga_ipsec_sa)   (mlx5_fpga_ipsec_sa_ctx)
 * We will use this hash to avoid SAs duplication in fpga which
@@ -96,6 +110,12 @@ struct mlx5_fpga_ipsec {
 */
struct rhashtable sa_hash;  /* hw_sa -> mlx5_fpga_ipsec_sa_ctx */
struct mutex sa_hash_lock;
+
+   /* Tree holding all rules for this fpga device
+* Key for searching a rule (mlx5_fpga_ipsec_rule) is (ft, id)
+*/
+   struct rb_root rules_rb;
+   struct mutex rules_rb_lock; /* rules lock */
 };
 
 static bool mlx5_fpga_is_ipsec_device(struct mlx5_core_dev *mdev)
@@ -498,6 +518,127 @@ mlx5_fpga_ipsec_build_hw_sa(struct mlx5_core_dev *mdev,
hw_sa->ipsec_sa_v1.flags |= MLX5_FPGA_IPSEC_SA_IPV6;
 }
 
+static bool is_full_mask(const void *p, size_t len)
+{
+   WARN_ON(len % 4);
+
+   return !memchr_inv(p, 0xff, len);
+}
+
+static bool validate_fpga_full_mask(struct mlx5_core_dev *dev,
+   const u32 *match_c,
+   const u32 *match_v)
+{
+   const void *misc_params_c = MLX5_ADDR_OF(fte_match_param,
+match_c,
+misc_parameters);
+   const void *headers_c = MLX5_ADDR_OF(fte_match_param,
+match_c,
+outer_headers);
+   const void *headers_v = MLX5_ADDR_OF(fte_match_param,
+match_v,
+outer_headers);
+
+ 

[for-next 07/11] net/mlx5: Refactor accel IPSec code

2018-03-07 Thread Saeed Mahameed
From: Aviad Yehezkel 

The current code has one layer that executes FPGA commands, and
the Ethernet part uses this code directly. Since downstream patches
introduce support for IPSec in mlx5_ib, we need to provide some
abstractions. This patch refactors the accel code into one layer
that creates a software IPSec transformation and another one that
creates the actual hardware context.
The internal command implementation is now hidden in the FPGA
core layer. The code also adds the ability to share FPGA hardware
contexts. If two contexts are the same, only a reference count
is taken.
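
The sharing behaviour described above can be sketched roughly as follows. This is a userspace toy, not the mlx5 implementation: it uses a small fixed array and a linear search where a real driver would use a hash table, and struct hw_key, struct hw_ctx, ctx_get() and ctx_put() are all hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CTX 8

/* Hypothetical description of a hardware context (keys, IPs, SPI...). */
struct hw_key {
	unsigned char bytes[16];
};

struct hw_ctx {
	struct hw_key key;
	unsigned int refs; /* 0 means the slot is free */
};

static struct hw_ctx ctx_table[MAX_CTX];

/* Return an existing context with the same key (taking a reference),
 * or claim a free slot. Returns NULL when the table is full.
 */
static struct hw_ctx *ctx_get(const struct hw_key *key)
{
	struct hw_ctx *free_slot = NULL;
	size_t i;

	for (i = 0; i < MAX_CTX; i++) {
		struct hw_ctx *c = &ctx_table[i];

		if (c->refs && !memcmp(&c->key, key, sizeof(*key))) {
			c->refs++; /* identical context: share it */
			return c;
		}
		if (!c->refs && !free_slot)
			free_slot = c;
	}
	if (!free_slot)
		return NULL;
	free_slot->key = *key;
	free_slot->refs = 1;
	return free_slot;
}

/* Drop one reference; the slot becomes reusable at zero. */
static void ctx_put(struct hw_ctx *c)
{
	if (c->refs)
		c->refs--;
}
```

Two callers asking for the same key get the same pointer, and the context only goes away when both have dropped it.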

Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c  |  58 +--
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h  |  97 ++---
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c   | 150 
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   | 391 +++--
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h   |  55 ++-
 include/linux/mlx5/accel.h |  83 -
 include/linux/mlx5/mlx5_ifc_fpga.h |  59 
 7 files changed, 668 insertions(+), 225 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
index 375ba438e7cf..ab5bc82855fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
@@ -37,27 +37,6 @@
 #include "mlx5_core.h"
 #include "fpga/ipsec.h"
 
-void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev,
-  struct mlx5_accel_ipsec_sa *cmd)
-{
-   int cmd_size;
-
-   if (!MLX5_IPSEC_DEV(mdev))
-   return ERR_PTR(-EOPNOTSUPP);
-
-   if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_V2_CMD)
-   cmd_size = sizeof(*cmd);
-   else
-   cmd_size = sizeof(cmd->ipsec_sa_v1);
-
-   return mlx5_fpga_ipsec_sa_cmd_exec(mdev, cmd, cmd_size);
-}
-
-int mlx5_accel_ipsec_sa_cmd_wait(void *ctx)
-{
-   return mlx5_fpga_ipsec_sa_cmd_wait(ctx);
-}
-
 u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev)
 {
return mlx5_fpga_ipsec_device_caps(mdev);
@@ -75,6 +54,21 @@ int mlx5_accel_ipsec_counters_read(struct mlx5_core_dev 
*mdev, u64 *counters,
return mlx5_fpga_ipsec_counters_read(mdev, counters, count);
 }
 
+void *mlx5_accel_esp_create_hw_context(struct mlx5_core_dev *mdev,
+  struct mlx5_accel_esp_xfrm *xfrm,
+  const __be32 saddr[4],
+  const __be32 daddr[4],
+  const __be32 spi, bool is_ipv6)
+{
+   return mlx5_fpga_ipsec_create_sa_ctx(mdev, xfrm, saddr, daddr,
+spi, is_ipv6);
+}
+
+void mlx5_accel_esp_free_hw_context(void *context)
+{
+   mlx5_fpga_ipsec_delete_sa_ctx(context);
+}
+
 int mlx5_accel_ipsec_init(struct mlx5_core_dev *mdev)
 {
return mlx5_fpga_ipsec_init(mdev);
@@ -84,3 +78,25 @@ void mlx5_accel_ipsec_cleanup(struct mlx5_core_dev *mdev)
 {
mlx5_fpga_ipsec_cleanup(mdev);
 }
+
+struct mlx5_accel_esp_xfrm *
+mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev,
+  const struct mlx5_accel_esp_xfrm_attrs *attrs,
+  u32 flags)
+{
+   struct mlx5_accel_esp_xfrm *xfrm;
+
+   xfrm = mlx5_fpga_esp_create_xfrm(mdev, attrs, flags);
+   if (IS_ERR(xfrm))
+   return xfrm;
+
+   xfrm->mdev = mdev;
+   return xfrm;
+}
+EXPORT_SYMBOL_GPL(mlx5_accel_esp_create_xfrm);
+
+void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm)
+{
+   mlx5_fpga_esp_destroy_xfrm(xfrm);
+}
+EXPORT_SYMBOL_GPL(mlx5_accel_esp_destroy_xfrm);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
index 421ed71a029b..024dbd22a89b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
@@ -39,89 +39,20 @@
 
 #ifdef CONFIG_MLX5_ACCEL
 
-#define MLX5_IPSEC_SADB_IP_AH   BIT(7)
-#define MLX5_IPSEC_SADB_IP_ESP  BIT(6)
-#define MLX5_IPSEC_SADB_SA_VALIDBIT(5)
-#define MLX5_IPSEC_SADB_SPI_EN  BIT(4)
-#define MLX5_IPSEC_SADB_DIR_SX  BIT(3)
-#define MLX5_IPSEC_SADB_IPV6BIT(2)
-
-enum {
-   MLX5_IPSEC_CMD_ADD_SA = 0,
-   MLX5_IPSEC_CMD_DEL_SA = 1,
-   MLX5_IPSEC_CMD_ADD_SA_V2 = 2,
-   MLX5_IPSEC_CMD_DEL_SA_V2 = 3,
-   MLX5_IPSEC_CMD_MOD_SA_V2 = 4,
-   MLX5_IPSEC_CMD_SET_CAP = 5,
-};
-
-enum mlx5_accel_ipsec_enc_mode {
-   MLX5_IPSEC_SADB_MODE_NONE = 0,
-   MLX5_IPSEC_SADB_MODE_AES_GCM_128_AUTH_128 = 1,
-   MLX5_IPSEC_SADB_MODE_AES_GCM_256_AUTH_128 = 3,
-};
-
 #define MLX5_IPSEC_DEV(mdev) 

[for-next 11/11] net/mlx5: Fix wrongly assigned CQ reference counter

2018-03-07 Thread Saeed Mahameed
From: Leon Romanovsky 

A kernel compiled with CONFIG_REFCOUNT_FULL produces the following
warning. The reason is that the initial value of a refcount_t is supposed
to be greater than 0, so set it to 1 directly.

[3.106634] [ cut here ]
[3.107756] refcount_t: increment on 0; use-after-free.
[3.109130] WARNING: CPU: 0 PID: 1 at lib/refcount.c:153 
refcount_inc+0x27/0x30
[3.110085] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.16.0-rc1-00028-gf683e04bdccc #137
[3.110085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[3.110085] RIP: 0010:refcount_inc+0x27/0x30
[3.110085] RSP: :aa62fba0 EFLAGS: 00010286
[3.110085] RAX:  RBX: 9a6d1a1821c8 RCX: 98a50f48
[3.110085] RDX: 0001 RSI: 0086 RDI: 0246
[3.110085] RBP: 9a6d1ac800a0 R08: 0289 R09: 000a
[3.110085] R10: f03bc0682840 R11: 9949856d R12: 9a6d1b4a4000
[3.110085] R13:  R14: 9a6d1a0a6c00 R15: aa62fc5c
[3.110085] FS:  () GS:9a6d1fc0() 
knlGS:
[3.110085] CS:  0010 DS:  ES:  CR0: 80050033
[3.110085] CR2:  CR3: 0ba0a000 CR4: 06b0
[3.110085] Call Trace:
[3.110085]  mlx5_core_create_cq+0xde/0x250
[3.110085]  ? __kmalloc+0x1ce/0x1e0
[3.110085]  mlx5e_create_cq+0x15c/0x1e0
[3.110085]  mlx5e_open_drop_rq+0xea/0x190
[3.110085]  mlx5e_attach_netdev+0x53/0x140
[3.110085]  mlx5e_attach+0x3d/0x60
[3.110085]  mlx5e_add+0x11d/0x2f0
[3.110085]  mlx5_add_device+0x77/0x170
[3.110085]  mlx5_register_interface+0x74/0xc0
[3.110085]  ? set_debug_rodata+0x11/0x11
[3.110085]  init+0x67/0x72
[3.110085]  ? mlx4_en_init_ptys2ethtool_map+0x346/0x346
[3.110085]  do_one_initcall+0x98/0x147
[3.110085]  ? set_debug_rodata+0x11/0x11
[3.110085]  kernel_init_freeable+0x164/0x1e0
[3.110085]  ? rest_init+0xb0/0xb0
[3.110085]  kernel_init+0xa/0x100
[3.110085]  ret_from_fork+0x35/0x40
[3.110085] Code: 00 00 00 00 e8 ab ff ff ff 84 c0 74 02 f3 c3 80 3d 3b c3 
64 01 00 75 f5 48 c7 c7 68 0b 81 98 c6 05 2b c3 64 01 01 e8 79 d7 a3 ff <0f> ff 
c3 66 0f 1f 44 00 00 8b 06 83 f8 ff 74 39 31 c9 39 f8 89
[3.110085] ---[ end trace a0068e1c68438a74 ]---

Fixes: f105b45bf77c ("net/mlx5: CQ hold/put API")
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 669ed16938b3..a4179122a279 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -109,8 +109,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq,
cq->cons_index = 0;
cq->arm_sn = 0;
cq->eq = eq;
-   refcount_set(&cq->refcount, 0);
-   mlx5_cq_hold(cq);
+   refcount_set(&cq->refcount, 1);
init_completion(&cq->free);
if (!cq->comp)
cq->comp = mlx5_add_cq_to_tasklet;
-- 
2.14.3
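
For context, the rule that triggers the warning can be modelled with a toy counter. This is a simplified sketch, not the kernel's refcount_t: struct toy_ref and the toy_ref_* helpers are hypothetical, and the real implementation is atomic and also handles saturation.

```c
#include <stdbool.h>

struct toy_ref {
	unsigned int count;
};

static void toy_ref_set(struct toy_ref *r, unsigned int n)
{
	r->count = n;
}

/* An increment from 0 is refused: a counter at 0 is treated as an
 * object that has already been released.
 */
static bool toy_ref_inc(struct toy_ref *r)
{
	if (r->count == 0)
		return false;
	r->count++;
	return true;
}

/* Returns true when the last reference was dropped. */
static bool toy_ref_dec_and_test(struct toy_ref *r)
{
	return --r->count == 0;
}
```

With this model, "set to 0, then take a reference" fails on the increment, while "set to 1" gives the creator its initial reference in a single step, which is what the patch does.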



[for-next 10/11] net/mlx5: IPSec, Add support for ESN

2018-03-07 Thread Saeed Mahameed
From: Aviad Yehezkel 

Currently ESN is not supported with IPSec device offload.

This patch adds ESN support to IPSec device offload.
It implements a new xfrm device operation to synchronize the offloading
device's ESN with the SN received by xfrm, and adds a new QP command to
update the SA state at the following points:

  ESN 1   ESN 2   ESN 3
|---*---|---*---|---*
^   ^   ^   ^   ^   ^

^ - marks where the QP command is invoked to update the SA ESN state
machine.
| - marks the start of the ESN scope (0-2^32-1). At this point move SA
ESN overlap bit to zero and increment ESN.
* - marks the middle of the ESN scope (2^31). At this point move SA
ESN overlap bit to one.
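
The state machine described by the legend above can be modelled in plain
user-space C. This is only a sketch, not the driver code: esn_model,
esn_model_update and ESN_SCOPE_MID are hypothetical names, the value of the
midpoint is assumed to be 2^31, and the model keeps the ESN itself instead
of reading the high bits from xfrm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Middle of the 32-bit sequence-number scope (2^31); assumed value. */
#define ESN_SCOPE_MID 0x80000000u

/* Hypothetical model of the per-SA ESN state. */
struct esn_model {
	uint32_t esn;    /* high 32 bits of the extended sequence number */
	uint8_t overlap; /* 1 once the low 32 bits passed the middle of the scope */
};

/*
 * Feed the bottom of the replay window (low 32 bits) into the model.
 * Returns true when the state changed, i.e. at the points marked '^',
 * where the device would have to be told about the new SA ESN state.
 */
static bool esn_model_update(struct esn_model *st, uint32_t seq_bottom)
{
	assert(st != NULL);

	if (st->overlap && seq_bottom < ESN_SCOPE_MID) {
		/* '|': wrapped into a new scope, bump the ESN and clear overlap */
		st->esn++;
		st->overlap = 0;
		return true;
	}
	if (!st->overlap && seq_bottom >= ESN_SCOPE_MID) {
		/* '*': crossed the middle of the scope, set overlap */
		st->overlap = 1;
		return true;
	}
	return false;
}
```

Calling esn_model_update() with steadily growing sequence numbers reports a
change twice per scope, once at the midpoint and once at the wrap, which
matches the six '^' marks over three scopes in the diagram.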

Signed-off-by: Aviad Yehezkel 
Signed-off-by: Yossef Efraim 
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c   | 118 +++--
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h   |  23 
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c   |  29 -
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h   |   5 +
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   |  22 
 include/linux/mlx5/accel.h |   2 +
 include/linux/mlx5/mlx5_ifc_fpga.h |   2 +
 7 files changed, 189 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index f5b1d60f96f5..cf58c9637904 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -38,18 +38,9 @@
 #include 
 
 #include "en.h"
-#include "accel/ipsec.h"
 #include "en_accel/ipsec.h"
 #include "en_accel/ipsec_rxtx.h"
 
-struct mlx5e_ipsec_sa_entry {
-   struct hlist_node hlist; /* Item in SADB_RX hashtable */
-   unsigned int handle; /* Handle in SADB_RX */
-   struct xfrm_state *x;
-   struct mlx5e_ipsec *ipsec;
-   struct mlx5_accel_esp_xfrm *xfrm;
-   void *hw_context;
-};
 
 static struct mlx5e_ipsec_sa_entry *to_ipsec_sa_entry(struct xfrm_state *x)
 {
@@ -121,6 +112,40 @@ static void mlx5e_ipsec_sadb_rx_free(struct 
mlx5e_ipsec_sa_entry *sa_entry)
ida_simple_remove(&ipsec->halloc, sa_entry->handle);
 }
 
+static bool mlx5e_ipsec_update_esn_state(struct mlx5e_ipsec_sa_entry *sa_entry)
+{
+   struct xfrm_replay_state_esn *replay_esn;
+   u32 seq_bottom;
+   u8 overlap;
+   u32 *esn;
+
+   if (!(sa_entry->x->props.flags & XFRM_STATE_ESN)) {
+   sa_entry->esn_state.trigger = 0;
+   return false;
+   }
+
+   replay_esn = sa_entry->x->replay_esn;
+   seq_bottom = replay_esn->seq - replay_esn->replay_window + 1;
+   overlap = sa_entry->esn_state.overlap;
+
+   sa_entry->esn_state.esn = xfrm_replay_seqhi(sa_entry->x,
+   htonl(seq_bottom));
+   esn = &sa_entry->esn_state.esn;
+
+   sa_entry->esn_state.trigger = 1;
+   if (unlikely(overlap && seq_bottom < MLX5E_IPSEC_ESN_SCOPE_MID)) {
+   ++(*esn);
+   sa_entry->esn_state.overlap = 0;
+   return true;
+   } else if (unlikely(!overlap &&
+   (seq_bottom >= MLX5E_IPSEC_ESN_SCOPE_MID))) {
+   sa_entry->esn_state.overlap = 1;
+   return true;
+   }
+
+   return false;
+}
+
 static void
 mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
   struct mlx5_accel_esp_xfrm_attrs *attrs)
@@ -152,6 +177,14 @@ mlx5e_ipsec_build_accel_xfrm_attrs(struct 
mlx5e_ipsec_sa_entry *sa_entry,
/* iv len */
aes_gcm->icv_len = x->aead->alg_icv_len;
 
+   /* esn */
+   if (sa_entry->esn_state.trigger) {
+   attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_TRIGGERED;
+   attrs->esn = sa_entry->esn_state.esn;
+   if (sa_entry->esn_state.overlap)
+   attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP;
+   }
+
/* rx handle */
attrs->sa_handle = sa_entry->handle;
 
@@ -187,7 +220,9 @@ static inline int mlx5e_xfrm_validate_state(struct 
xfrm_state *x)
netdev_info(netdev, "Cannot offload compressed xfrm states\n");
return -EINVAL;
}
-   if (x->props.flags & XFRM_STATE_ESN) {
+   if (x->props.flags & XFRM_STATE_ESN &&
+   !(mlx5_accel_ipsec_device_caps(priv->mdev) &
+   MLX5_ACCEL_IPSEC_CAP_ESN)) {
netdev_info(netdev, "Cannot offload ESN xfrm states\n");
return -EINVAL;
}
@@ -277,8 +312,14 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x)
netdev_info(netdev, "Failed adding to SADB_RX: %d\n", 
err);
goto err_entry;

[for-next 06/11] net/mlx5: Added required metadata capability for ipsec

2018-03-07 Thread Saeed Mahameed
From: Aviad Yehezkel 

Currently our device requires additional metadata in packet
to perform ipsec crypto offload.
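
A minimal sketch of how a consumer might test for this capability bit. The
bit positions are taken from the enum in this patch; the shortened names,
model_device_caps() and needs_metadata() are hypothetical stand-ins and not
part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in the enum of this patch; names shortened for the sketch. */
enum ipsec_caps {
	CAP_DEVICE            = 1u << 0,
	CAP_REQUIRED_METADATA = 1u << 1,
	CAP_ESP               = 1u << 2,
};

/* Hypothetical stand-in for the device capability query. */
static uint32_t model_device_caps(bool is_ipsec_device)
{
	uint32_t caps = 0;

	if (!is_ipsec_device)
		return caps;

	caps |= CAP_DEVICE;
	caps |= CAP_REQUIRED_METADATA;
	return caps;
}

/* Hypothetical consumer: must the packet carry the extra metadata? */
static bool needs_metadata(uint32_t caps)
{
	/* The metadata bit only makes sense on an IPsec capable device. */
	assert(!(caps & CAP_REQUIRED_METADATA) || (caps & CAP_DEVICE));
	return (caps & CAP_REQUIRED_METADATA) != 0;
}
```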

Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 6 --
 include/linux/mlx5/accel.h   | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
index e7e28277733d..8de992ba7230 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
@@ -256,10 +256,12 @@ u32 mlx5_fpga_ipsec_device_caps(struct mlx5_core_dev 
*mdev)
struct mlx5_fpga_device *fdev = mdev->fpga;
u32 ret = 0;
 
-   if (mlx5_fpga_is_ipsec_device(mdev))
+   if (mlx5_fpga_is_ipsec_device(mdev)) {
ret |= MLX5_ACCEL_IPSEC_CAP_DEVICE;
-   else
+   ret |= MLX5_ACCEL_IPSEC_CAP_REQUIRED_METADATA;
+   } else {
return ret;
+   }
 
if (!fdev->ipsec)
return ret;
diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h
index 601280c782d3..b674af63689b 100644
--- a/include/linux/mlx5/accel.h
+++ b/include/linux/mlx5/accel.h
@@ -38,6 +38,7 @@
 
 enum mlx5_accel_ipsec_caps {
MLX5_ACCEL_IPSEC_CAP_DEVICE = 1 << 0,
+   MLX5_ACCEL_IPSEC_CAP_REQUIRED_METADATA  = 1 << 1,
MLX5_ACCEL_IPSEC_CAP_ESP= 1 << 2,
MLX5_ACCEL_IPSEC_CAP_IPV6   = 1 << 3,
MLX5_ACCEL_IPSEC_CAP_LSO= 1 << 4,
-- 
2.14.3



[for-next 05/11] net/mlx5: Export ipsec capabilities

2018-03-07 Thread Saeed Mahameed
From: Aviad Yehezkel 

Export the IPsec capabilities; they will be needed for IPsec verbs.

Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c  |  3 +-
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h  | 14 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c   |  9 ++--
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   | 14 +++---
 include/linux/mlx5/accel.h | 57 ++
 5 files changed, 73 insertions(+), 24 deletions(-)
 create mode 100644 include/linux/mlx5/accel.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
index b88ae12d9066..375ba438e7cf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
@@ -45,7 +45,7 @@ void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev,
if (!MLX5_IPSEC_DEV(mdev))
return ERR_PTR(-EOPNOTSUPP);
 
-   if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_V2_CMD)
+   if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_V2_CMD)
cmd_size = sizeof(*cmd);
else
cmd_size = sizeof(cmd->ipsec_sa_v1);
@@ -62,6 +62,7 @@ u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev)
 {
return mlx5_fpga_ipsec_device_caps(mdev);
 }
+EXPORT_SYMBOL_GPL(mlx5_accel_ipsec_device_caps);
 
 unsigned int mlx5_accel_ipsec_counters_count(struct mlx5_core_dev *mdev)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
index 14a2e95e82c3..421ed71a029b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
@@ -35,18 +35,10 @@
 #define __MLX5_ACCEL_IPSEC_H__
 
 #include 
+#include 
 
 #ifdef CONFIG_MLX5_ACCEL
 
-enum {
-   MLX5_ACCEL_IPSEC_DEVICE = BIT(1),
-   MLX5_ACCEL_IPSEC_IPV6 = BIT(2),
-   MLX5_ACCEL_IPSEC_ESP = BIT(3),
-   MLX5_ACCEL_IPSEC_LSO = BIT(4),
-   MLX5_ACCEL_IPSEC_NO_TRAILER = BIT(5),
-   MLX5_ACCEL_IPSEC_V2_CMD = BIT(7),
-};
-
 #define MLX5_IPSEC_SADB_IP_AH   BIT(7)
 #define MLX5_IPSEC_SADB_IP_ESP  BIT(6)
 #define MLX5_IPSEC_SADB_SA_VALIDBIT(5)
@@ -70,7 +62,7 @@ enum mlx5_accel_ipsec_enc_mode {
 };
 
 #define MLX5_IPSEC_DEV(mdev) (mlx5_accel_ipsec_device_caps(mdev) & \
- MLX5_ACCEL_IPSEC_DEVICE)
+ MLX5_ACCEL_IPSEC_CAP_DEVICE)
 
 struct mlx5_accel_ipsec_sa_v1 {
__be32 cmd;
@@ -126,8 +118,6 @@ void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev 
*mdev,
  */
 int mlx5_accel_ipsec_sa_cmd_wait(void *context);
 
-u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev);
-
 unsigned int mlx5_accel_ipsec_counters_count(struct mlx5_core_dev *mdev);
 int mlx5_accel_ipsec_counters_read(struct mlx5_core_dev *mdev, u64 *counters,
   unsigned int count);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index a8c3fe7cff0f..6f4a01620cc3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -242,7 +242,8 @@ static inline int mlx5e_xfrm_validate_state(struct 
xfrm_state *x)
return -EINVAL;
}
if (x->props.family == AF_INET6 &&
-   !(mlx5_accel_ipsec_device_caps(priv->mdev) & 
MLX5_ACCEL_IPSEC_IPV6)) {
+   !(mlx5_accel_ipsec_device_caps(priv->mdev) &
+MLX5_ACCEL_IPSEC_CAP_IPV6)) {
netdev_info(netdev, "IPv6 xfrm state offload is not supported 
by this device\n");
return -EINVAL;
}
@@ -375,7 +376,7 @@ int mlx5e_ipsec_init(struct mlx5e_priv *priv)
ipsec->en_priv = priv;
ipsec->en_priv->ipsec = ipsec;
ipsec->no_trailer = !!(mlx5_accel_ipsec_device_caps(priv->mdev) &
-  MLX5_ACCEL_IPSEC_NO_TRAILER);
+  MLX5_ACCEL_IPSEC_CAP_RX_NO_TRAILER);
netdev_dbg(priv->netdev, "IPSec attached to netdevice\n");
return 0;
 }
@@ -422,7 +423,7 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv)
if (!priv->ipsec)
return;
 
-   if (!(mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_ESP) ||
+   if (!(mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_ESP) ||
!MLX5_CAP_ETH(mdev, swp)) {
mlx5_core_dbg(mdev, "mlx5e: ESP and SWP offload not 
supported\n");
return;
@@ -441,7 +442,7 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv)
netdev->features |= NETIF_F_HW_ESP_TX_CSUM;
netdev->hw_enc_features |= NETIF_F_HW_ESP_TX_CSUM;
 
-   if (!(mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_LSO) 

[for-next 09/11] net/mlx5e: Added common function for to_ipsec_sa_entry

2018-03-07 Thread Saeed Mahameed
From: Aviad Yehezkel 

Add a common function that returns the driver-internal SA entry for an
xfrm state, so that all checks are done in one place.

Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c   | 29 ++
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 59df3dbd2e65..f5b1d60f96f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -51,6 +51,21 @@ struct mlx5e_ipsec_sa_entry {
void *hw_context;
 };
 
+static struct mlx5e_ipsec_sa_entry *to_ipsec_sa_entry(struct xfrm_state *x)
+{
+   struct mlx5e_ipsec_sa_entry *sa;
+
+   if (!x)
+   return NULL;
+
+   sa = (struct mlx5e_ipsec_sa_entry *)x->xso.offload_handle;
+   if (!sa)
+   return NULL;
+
+   WARN_ON(sa->x != x);
+   return sa;
+}
+
 struct xfrm_state *mlx5e_ipsec_sadb_rx_lookup(struct mlx5e_ipsec *ipsec,
  unsigned int handle)
 {
@@ -312,28 +327,22 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x)
 
 static void mlx5e_xfrm_del_state(struct xfrm_state *x)
 {
-   struct mlx5e_ipsec_sa_entry *sa_entry;
+   struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
 
-   if (!x->xso.offload_handle)
+   if (!sa_entry)
return;
 
-   sa_entry = (struct mlx5e_ipsec_sa_entry *)x->xso.offload_handle;
-   WARN_ON(sa_entry->x != x);
-
if (x->xso.flags & XFRM_OFFLOAD_INBOUND)
mlx5e_ipsec_sadb_rx_del(sa_entry);
 }
 
 static void mlx5e_xfrm_free_state(struct xfrm_state *x)
 {
-   struct mlx5e_ipsec_sa_entry *sa_entry;
+   struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
 
-   if (!x->xso.offload_handle)
+   if (!sa_entry)
return;
 
-   sa_entry = (struct mlx5e_ipsec_sa_entry *)x->xso.offload_handle;
-   WARN_ON(sa_entry->x != x);
-
if (sa_entry->hw_context) {
mlx5_accel_esp_free_hw_context(sa_entry->hw_context);
mlx5_accel_esp_destroy_xfrm(sa_entry->xfrm);
-- 
2.14.3



[for-next 04/11] net/mlx5: IPSec, Add command V2 support

2018-03-07 Thread Saeed Mahameed
From: Aviad Yehezkel 

This patch adds V2 command support.
New FPGA devices support extended features (UDP encap, ESN, etc.). These
features require a new hardware SADB format, so there is a new version
of the commands to manipulate it.
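
The size selection used for the two command versions can be sketched as
follows. This is a reduced user-space model: sa_cmd_v1, sa_cmd and
sa_cmd_size() are hypothetical, shortened stand-ins for the real structures,
and only BIT(7) for the V2 capability is taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_V2_CMD (1u << 7) /* BIT(7), as in the enum of this patch */

/* Shortened, hypothetical stand-ins for the two command layouts. */
struct sa_cmd_v1 {
	uint32_t cmd;
	uint8_t key_enc[32];
};

struct sa_cmd {
	struct sa_cmd_v1 v1; /* kept first so a V1 device sees a valid prefix */
	uint16_t udp_sp;
	uint16_t udp_dp;
	uint32_t esn;
};

/* Number of bytes to hand to the device for the given capability mask. */
static size_t sa_cmd_size(const struct sa_cmd *cmd, uint32_t caps)
{
	assert(offsetof(struct sa_cmd, v1) == 0);

	if (caps & CAP_V2_CMD)
		return sizeof(*cmd);
	return sizeof(cmd->v1);
}
```

Embedding the V1 layout as the first member means the same buffer can be
sent to either kind of device; only the length differs.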

Signed-off-by: Yossef Efraim 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c  |  9 +++-
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h  | 21 ++--
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c   | 60 ++
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   | 11 ++--
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h   |  5 +-
 include/linux/mlx5/mlx5_ifc_fpga.h |  4 +-
 6 files changed, 66 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
index 53e69edaedde..b88ae12d9066 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c
@@ -40,10 +40,17 @@
 void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev,
   struct mlx5_accel_ipsec_sa *cmd)
 {
+   int cmd_size;
+
if (!MLX5_IPSEC_DEV(mdev))
return ERR_PTR(-EOPNOTSUPP);
 
-   return mlx5_fpga_ipsec_sa_cmd_exec(mdev, cmd);
+   if (mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_V2_CMD)
+   cmd_size = sizeof(*cmd);
+   else
+   cmd_size = sizeof(cmd->ipsec_sa_v1);
+
+   return mlx5_fpga_ipsec_sa_cmd_exec(mdev, cmd, cmd_size);
 }
 
 int mlx5_accel_ipsec_sa_cmd_wait(void *ctx)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
index 4da9611a753d..14a2e95e82c3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
@@ -44,6 +44,7 @@ enum {
MLX5_ACCEL_IPSEC_ESP = BIT(3),
MLX5_ACCEL_IPSEC_LSO = BIT(4),
MLX5_ACCEL_IPSEC_NO_TRAILER = BIT(5),
+   MLX5_ACCEL_IPSEC_V2_CMD = BIT(7),
 };
 
 #define MLX5_IPSEC_SADB_IP_AH   BIT(7)
@@ -56,6 +57,9 @@ enum {
 enum {
MLX5_IPSEC_CMD_ADD_SA = 0,
MLX5_IPSEC_CMD_DEL_SA = 1,
+   MLX5_IPSEC_CMD_ADD_SA_V2 = 2,
+   MLX5_IPSEC_CMD_DEL_SA_V2 = 3,
+   MLX5_IPSEC_CMD_MOD_SA_V2 = 4,
MLX5_IPSEC_CMD_SET_CAP = 5,
 };
 
@@ -68,7 +72,7 @@ enum mlx5_accel_ipsec_enc_mode {
 #define MLX5_IPSEC_DEV(mdev) (mlx5_accel_ipsec_device_caps(mdev) & \
  MLX5_ACCEL_IPSEC_DEVICE)
 
-struct mlx5_accel_ipsec_sa {
+struct mlx5_accel_ipsec_sa_v1 {
__be32 cmd;
u8 key_enc[32];
u8 key_auth[32];
@@ -88,10 +92,19 @@ struct mlx5_accel_ipsec_sa {
__be32 sw_sa_handle;
__be16 tfclen;
u8 enc_mode;
-   u8 sip_masklen;
-   u8 dip_masklen;
+   u8 reserved1[2];
u8 flags;
-   u8 reserved[2];
+   u8 reserved2[2];
+};
+
+struct mlx5_accel_ipsec_sa {
+   struct mlx5_accel_ipsec_sa_v1 ipsec_sa_v1;
+   __be16 udp_sp;
+   __be16 udp_dp;
+   u8 reserved1[4];
+   __be32 esn;
+   __be16 vid; /* only 12 bits, rest is reserved */
+   __be16 reserved2;
 } __packed;
 
 /**
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 460a613059fe..a8c3fe7cff0f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -133,50 +133,46 @@ static void mlx5e_ipsec_build_hw_sa(u32 op, struct 
mlx5e_ipsec_sa_entry *sa_entr
 
memset(hw_sa, 0, sizeof(*hw_sa));
 
-   if (op == MLX5_IPSEC_CMD_ADD_SA) {
-   crypto_data_len = (x->aead->alg_key_len + 7) / 8;
-   key_len = crypto_data_len - 4; /* 4 bytes salt at end */
-   aead = x->data;
-   geniv_ctx = crypto_aead_ctx(aead);
-   ivsize = crypto_aead_ivsize(aead);
-
-   memcpy(&hw_sa->key_enc, x->aead->alg_key, key_len);
-   /* Duplicate 128 bit key twice according to HW layout */
-   if (key_len == 16)
-   memcpy(&hw_sa->key_enc[16], x->aead->alg_key, key_len);
-   memcpy(&hw_sa->gcm.salt_iv, geniv_ctx->salt, ivsize);
-   hw_sa->gcm.salt = *((__be32 *)(x->aead->alg_key + key_len));
-   }
-
-   hw_sa->cmd = htonl(op);
-   hw_sa->flags |= MLX5_IPSEC_SADB_SA_VALID | MLX5_IPSEC_SADB_SPI_EN;
+   crypto_data_len = (x->aead->alg_key_len + 7) / 8;
+   key_len = crypto_data_len - 4; /* 4 bytes salt at end */
+   aead = x->data;
+   geniv_ctx = crypto_aead_ctx(aead);
+   ivsize = crypto_aead_ivsize(aead);
+
+   memcpy(&hw_sa->ipsec_sa_v1.key_enc, x->aead->alg_key, key_len);
+   /* 

[for-next 01/11] net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps

2018-03-07 Thread Saeed Mahameed
Fix a build break: mlx5_accel_ipsec_device_caps is not defined when
MLX5_ACCEL is not selected. Use MLX5_IPSEC_DEV instead, which handles
that case.
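
The general pattern can be sketched like this: a macro that calls the helper
when the feature is built in and collapses to a constant otherwise, so
callers compile in both configurations. MODEL_ACCEL, MODEL_IPSEC_DEV,
model_device_caps() and wants_egress_ns() are hypothetical names; how the
real header defines the fallback is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_DEVICE (1u << 0)

struct dev {
	uint32_t caps;
};

/* Define MODEL_ACCEL to model a build with the accel code selected. */
#ifdef MODEL_ACCEL
static uint32_t model_device_caps(const struct dev *d)
{
	return d->caps;
}
#define MODEL_IPSEC_DEV(d) ((model_device_caps(d) & CAP_DEVICE) != 0)
#else
/* The helper does not exist in this build, so the macro must not use it. */
#define MODEL_IPSEC_DEV(d) ((void)(d), false)
#endif

/* Caller that builds regardless of whether MODEL_ACCEL is defined. */
static bool wants_egress_ns(const struct dev *d)
{
	assert(d);
	return MODEL_IPSEC_DEV(d);
}
```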

Signed-off-by: Saeed Mahameed 
Reported-by: Doug Ledford 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 4e456c292ce4..f836e6b76f65 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -2642,8 +2642,7 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
goto err;
}
 
-   if (mlx5_accel_ipsec_device_caps(steering->dev) &
-   MLX5_ACCEL_IPSEC_DEVICE) {
+   if (MLX5_IPSEC_DEV(dev)) {
err = init_egress_root_ns(steering);
if (err)
goto err;
-- 
2.14.3



Re: [PATCH] net: remove VLA usage

2018-03-07 Thread David Miller
From: Laszlo Toth 
Date: Thu, 8 Mar 2018 01:19:53 +0100

> Separated snmp_seq_show_tcp_udp() to tcp and udp variants,
> so the usage of max_t() for the array size can be omitted.
> 
> Signed-off-by: Laszlo Toth 

But it's a max on a constant value, computed at compile
time.

I don't see at all why this is necessary.

If the compiler can't figure this out, fix it.

If the compiler warns on this with -Wvla, fix it.

Because there is no reason to have two separate routines for this.

Thank you.
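
To illustrate the point about a max over constants, here is a small sketch in
standard C. MODEL_TCP_MAX, MODEL_UDP_MAX and CONST_MAX are hypothetical names
with made-up values, not the kernel's definitions. A plain conditional over
constants is an integer constant expression, so the array below has a fixed
size; a file-scope array could not be a VLA at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table sizes; the real kernel constants are not reproduced. */
#define MODEL_TCP_MAX 16
#define MODEL_UDP_MAX 10

/* A conditional on constants is itself an integer constant expression. */
#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b))

/* One buffer sized for the larger table: a fixed-size array, not a VLA. */
static unsigned long model_buf[CONST_MAX(MODEL_TCP_MAX, MODEL_UDP_MAX)];

/* Checked at compile time, so no runtime size computation is involved. */
static_assert(sizeof(model_buf) / sizeof(model_buf[0]) == MODEL_TCP_MAX,
	      "buffer must fit the larger table");

static size_t model_buf_len(void)
{
	return sizeof(model_buf) / sizeof(model_buf[0]);
}
```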


[for-next 03/11] net/mlx5e: IPSec, Add support for ESP trailer removal by hardware

2018-03-07 Thread Saeed Mahameed
From: Yossi Kuperman 

Current hardware decrypts and authenticates incoming ESP packets.
Subsequently, the software extracts the nexthdr field, truncates the
trailer and adjusts csum accordingly.

With this patch and a capable device, the trailer is removed by the
hardware and the nexthdr field is conveyed via PET. This way we avoid
both the need to access the trailer (cache miss) and the need to
compute its relative checksum, which significantly improves
performance.

An experiment with netperf shows that trailer removal improves
performance by 2 Gbps in both forwarding and host-to-host configurations.
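
The receive-side handling can be modelled roughly as below. The syndrome
values are the ones listed in this patch; rx_status, rx_result and model_rx()
are hypothetical stand-ins for the xfrm offload state, so treat this as a
sketch of the control flow only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Syndrome values as listed in this patch. */
enum rx_syndrome {
	RX_SYNDROME_DECRYPTED   = 0x11,
	RX_SYNDROME_AUTH_FAILED = 0x12,
	RX_SYNDROME_BAD_PROTO   = 0x17,
};

/* Hypothetical result codes standing in for the xfrm offload status. */
enum rx_status { RX_OK, RX_AUTH_FAILED, RX_INVALID_PROTO, RX_DROP };

struct rx_result {
	enum rx_status status;
	bool no_trailer; /* trailer already stripped by the device */
	uint8_t proto;   /* next header, valid only when no_trailer is set */
};

static struct rx_result model_rx(uint8_t syndrome, uint8_t nexthdr,
				 bool dev_no_trailer)
{
	struct rx_result r = { RX_DROP, false, 0 };

	switch (syndrome) {
	case RX_SYNDROME_DECRYPTED:
		r.status = RX_OK;
		if (dev_no_trailer) {
			/* Take nexthdr from the metadata, not the trailer. */
			r.no_trailer = true;
			r.proto = nexthdr;
		}
		break;
	case RX_SYNDROME_AUTH_FAILED:
		r.status = RX_AUTH_FAILED;
		break;
	case RX_SYNDROME_BAD_PROTO:
		r.status = RX_INVALID_PROTO;
		break;
	default:
		/* Unknown syndrome: the packet is dropped. */
		break;
	}

	assert(!r.no_trailer || r.status == RX_OK);
	return r;
}
```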

Signed-off-by: Yossi Kuperman 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h  |  2 +
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c   |  2 +
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h   |  1 +
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c   | 10 +++-
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   | 54 ++
 include/linux/mlx5/mlx5_ifc_fpga.h | 13 +-
 6 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
index 67cda8871f5a..4da9611a753d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.h
@@ -43,6 +43,7 @@ enum {
MLX5_ACCEL_IPSEC_IPV6 = BIT(2),
MLX5_ACCEL_IPSEC_ESP = BIT(3),
MLX5_ACCEL_IPSEC_LSO = BIT(4),
+   MLX5_ACCEL_IPSEC_NO_TRAILER = BIT(5),
 };
 
 #define MLX5_IPSEC_SADB_IP_AH   BIT(7)
@@ -55,6 +56,7 @@ enum {
 enum {
MLX5_IPSEC_CMD_ADD_SA = 0,
MLX5_IPSEC_CMD_DEL_SA = 1,
+   MLX5_IPSEC_CMD_SET_CAP = 5,
 };
 
 enum mlx5_accel_ipsec_enc_mode {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 1b49afca65c0..460a613059fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -378,6 +378,8 @@ int mlx5e_ipsec_init(struct mlx5e_priv *priv)
ida_init(&ipsec->halloc);
ipsec->en_priv = priv;
ipsec->en_priv->ipsec = ipsec;
+   ipsec->no_trailer = !!(mlx5_accel_ipsec_device_caps(priv->mdev) &
+  MLX5_ACCEL_IPSEC_NO_TRAILER);
netdev_dbg(priv->netdev, "IPSec attached to netdevice\n");
return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index 56e00baf16cc..bffc3ed0574a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -77,6 +77,7 @@ struct mlx5e_ipsec_stats {
 struct mlx5e_ipsec {
struct mlx5e_priv *en_priv;
DECLARE_HASHTABLE(sadb_rx, MLX5E_IPSEC_SADB_RX_BITS);
+   bool no_trailer;
spinlock_t sadb_rx_lock; /* Protects sadb_rx and halloc */
struct ida halloc;
struct mlx5e_ipsec_sw_stats sw_stats;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index 6a7c8b04447e..64c549a06678 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -42,10 +42,11 @@
 enum {
MLX5E_IPSEC_RX_SYNDROME_DECRYPTED = 0x11,
MLX5E_IPSEC_RX_SYNDROME_AUTH_FAILED = 0x12,
+   MLX5E_IPSEC_RX_SYNDROME_BAD_PROTO = 0x17,
 };
 
 struct mlx5e_ipsec_rx_metadata {
-   unsigned char   reserved;
+   unsigned char   nexthdr;
__be32  sa_handle;
 } __packed;
 
@@ -301,10 +302,17 @@ mlx5e_ipsec_build_sp(struct net_device *netdev, struct 
sk_buff *skb,
switch (mdata->syndrome) {
case MLX5E_IPSEC_RX_SYNDROME_DECRYPTED:
xo->status = CRYPTO_SUCCESS;
+   if (likely(priv->ipsec->no_trailer)) {
+   xo->flags |= XFRM_ESP_NO_TRAILER;
+   xo->proto = mdata->content.rx.nexthdr;
+   }
break;
case MLX5E_IPSEC_RX_SYNDROME_AUTH_FAILED:
xo->status = CRYPTO_TUNNEL_ESP_AUTH_FAILED;
break;
+   case MLX5E_IPSEC_RX_SYNDROME_BAD_PROTO:
+   xo->status = CRYPTO_INVALID_PROTOCOL;
+   break;
default:
atomic64_inc(&priv->ipsec->sw_stats.ipsec_rx_drop_syndrome);
return NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
index e0f32b025e06..3b10d46dc821 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
@@ -273,6 +273,9 

[pull request][for-next 00/11] Mellanox, mlx5 IPSec updates 2018-02-28-2 (Part 2)

2018-03-07 Thread Saeed Mahameed
Hi Dave and Doug,

This series includes shared code updates (IPSec part2) for mlx5 core 
driver for both netdev and rdma subsystems.  This series should be pulled
to both trees so we can continue netdev and rdma specific submissions
separately.

Mainly it includes two fixes for previous pull requests:
1. net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps
   - Fixes a build issue when MLX5_ACCEL is not selected
2. net/mlx5: Fix wrongly assigned CQ reference counter
   - Fixes a call trace warning when CONFIG_REFCOUNT_FULL is selected

It also adds the foundation and support for IPsec ESN in netdev and
user space. For more information, please see the tag log below.

The series doesn't cause any conflict with the latest mlx5 rc fixes.

Thanks,
Saeed.

---

The following changes since commit e810bf5e96e327500cc6334f9d56c8047aaabcff:

  net/mlx5: Flow steering cmd interface should get the fte when deleting 
(2018-03-06 22:20:15 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
tags/mlx5-updates-2018-02-28-2

for you to fetch changes up to 31135eb3887daf2ed3e88fbefc36243357a9008f:

  net/mlx5: Fix wrongly assigned CQ reference counter (2018-03-07 15:54:36 
-0800)


mlx5-updates-2018-02-28-2 (IPSec-2)

This series follows our previous one to lay out the foundations for IPSec
in user-space and extend current kernel netdev IPSec support. As noted in
our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)",
the IPSec mechanism will be supported through our flow steering mechanism.
Therefore, we need to change the initialization order. Furthermore, IPsec
is supported in both egress and ingress. Since our current flow
steering is ingress only, we add an empty (only implemented through FPGA
steering ops) egress namespace to handle that case. We also implement
the required flow steering callbacks and logic in our FPGA driver.

We also extend the FPGA support to cover ESN and modifying an xfrm.
Therefore, we add support for a new FPGA command interface that supports
them. The other required bits are added too. The new features and
requirements are advertised via cap bits.

Last but not least, we revise our driver's accel_esp API. This API will be
shared between our netdev and IB driver, so we need to have all the required
functionality from both worlds.

Regards,
Aviad and Matan


Aviad Yehezkel (7):
  net/mlx5: IPSec, Add command V2 support
  net/mlx5: Export ipsec capabilities
  net/mlx5: Added required metadata capability for ipsec
  net/mlx5: Refactor accel IPSec code
  net/mlx5: Add flow-steering commands for FPGA IPSec implementation
  net/mlx5e: Added common function for to_ipsec_sa_entry
  net/mlx5: IPSec, Add support for ESN

Leon Romanovsky (1):
  net/mlx5: Fix wrongly assigned CQ reference counter

Saeed Mahameed (1):
  net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps

Yossi Kuperman (2):
  net/mlx5: IPSec, Generalize sandbox QP commands
  net/mlx5e: IPSec, Add support for ESP trailer removal by hardware

 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.c  |   59 +-
 .../net/ethernet/mellanox/mlx5/core/accel/ipsec.h  |   96 +-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |3 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c   |  306 +++--
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h   |   24 +
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c   |   39 +-
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h   |5 +
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   | 1280 +++-
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.h   |   76 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |8 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |2 +
 include/linux/mlx5/accel.h |  144 +++
 include/linux/mlx5/fs.h|3 +
 include/linux/mlx5/mlx5_ifc_fpga.h |   92 +-
 14 files changed, 1858 insertions(+), 279 deletions(-)
 create mode 100644 include/linux/mlx5/accel.h


Re: [PATCH 0/3] ibmvnic: Clean up net close and fix reset bug

2018-03-07 Thread David Miller
From: Thomas Falcon 
Date: Wed, 7 Mar 2018 17:43:06 -0600

> Crud, this series is meant for the net-next tree, but I forgot to
> include it in the patch tag.

I know I'm really stubborn about a lot of things :-), but next
time you can just reply like this explaining things and that
is enough for me.

Thanks.


Re: [PATCH net-next] modules: allow modprobe load regular elf binaries

2018-03-07 Thread Luis R. Rodriguez
On Mon, Mar 05, 2018 at 05:34:57PM -0800, Alexei Starovoitov wrote:
> As the first step in development of bpfilter project [1]

So meta :) The URL refers to an LWN article, which in turn refers to this
effort's first RFC.  As someone only getting *one* of these patches by email,
it would be useful if the cover letter instead referenced a git tree and
branch, so one could easily get the patches for more careful inspection. In
the meantime, can you make such a tree available with a branch?

Also, since kernel/module.c is involved it would be wise to include Jessica,
which I've Cc'd. I'm going to guess Kees may like to review this too. Mimi
might want to help review as well.

Rafael may care about suspend/resume implications of these "umh modules" as
you put it.

> the request_module()
> code is extended to allow user mode helpers to be invoked. 

Upon inspection you never touch or use request_module(), so this is
false and misleading. You don't use or extend request_module() at all. You rely
on extending finit_module() so that load_module() itself will now execute (via
modified do_execve_file()) the same file which was loaded as a module.

This is *very* different given request_module() has its own full magic and is
in and of itself a UMH of the kernel implemented in kernel/kmod.c.

Nevertheless, why?

> Idea is that
> user mode helpers are built as part of the kernel build and installed as
> traditional kernel modules with .ko file extension into distro specified
> location, such that from a distribution point of view, they are no different
> than regular kernel modules.

It sounds like finit_module() module loading is used as a convenience factor
simply to take advantage of being able to ship / maintain/ compile these umh
programs as part of the kernel. Is that right?

So the finit_module() interface was just a convenient mechanism. Is that right?

I.e., if folks had these binaries in place, the regular UMH interface / API
could be used so that these could be looked for, but instead we want to carry
these in tandem with the kernel?

If so this still seems like an overly complex way to deal with this.

> Thus, allow request_module() logic to load such
> user mode helper (umh) modules via:
> 
>   request_module("foo") ->
> call_umh("modprobe foo") ->
>   sys_finit_module(FD of /lib/modules/.../foo.ko) ->
> call_umh(struct file)

OK so the use case envisioned here was for networking code to do
something like:

if (!loaded) {
err = request_module("bpfilter");
...
}

This is visible on your third patch (this is from your RFC, not this series):

https://www.mail-archive.com/netfilter-devel@vger.kernel.org/msg11129.html

So indeed all this patch does in the end is just putting tons of wrappers in
place so that kernel code can load certain trusted UMH programs we ship, and
maintain in the kernel.

request_module() has its own world though too. How often in your proof of
concept is request_module() called? How many times do you envision it being
called?

Please review lib/test_kmod.c and tools/testing/selftests/kmod/ for testing
your stuff too or consider extending appropriately.

Are aliases something which you expect we'll need to support for these
userspace... modules?

> Such approach enables kernel to delegate functionality traditionally done
> by kernel modules into user space processes (either root or !root) and
> reduces security attack surface of such new code, meaning in case of
> potential bugs only the umh would crash but not the kernel.

Now, this sounds great, however I think that the proof of concept chosen is
pretty complex to start off with. Even if it's not designed to be a real-world
use case, a much simpler proof of concept may be useful, if possible. It
wouldn't need to replace a kernel functionality in real life. lib/ is full of
CONFIG_TEST_* examples; a simple new stupid kernel functionality which can in
turn be replaced with a respective userspace counterpart may be useful, and
both kconfig entries would be mutually exclusive.

> Another
> advantage coming with that would be that bpfilter.ko 

You mean foo.ko

> can be debugged and
> tested out of user space as well (e.g. opening the possibility to run
> all clang sanitizers, fuzzers or test suites for checking translation).

Great too.

> Also, such architecture makes the kernel/user boundary very precise:
> control plane is done by the user space while data plane stays in the kernel.

I don't see how this is defining any boundary. I see just a loader for a
userspace program, re-using a known kernel interface, finit_module(), which
makes it convenient for us to load pre-compiled kernel junk. I'm still
not convinced this is the right approach.

> It's easy to distinguish "umh module" from traditional kernel module:

Ah you said it, "umh module". I don't see what makes it a "umh module" so far,
all we are doing is executing a 

pull-request: bpf 2018-03-08

2018-03-07 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix various BPF helpers which adjust the skb and its GSO information
   with regards to SCTP GSO. The latter is a special case where gso_size
   is of value GSO_BY_FRAGS, so mangling that will end up corrupting
   the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s).

2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined
   due to too old kernel headers in the system, from Jiri.

3) Increase the number of x64 JIT passes in order to allow larger images
   to converge instead of punting them to interpreter or having them
   rejected when the interpreter is not built into the kernel, from Daniel.
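
To illustrate item 1, here is a minimal userspace sketch of the "bail out
when gso_size is a marker" idea. It is not the code from the pulled commit:
the struct, the function names, the marker value and the use of -ENOTSUP are
all assumptions made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; the real definitions
 * live in the kernel's skb headers and are not reproduced here.
 */
#define DEMO_GSO_BY_FRAGS 0xFFFF	/* assumed marker value */

struct demo_gso_info {
	unsigned short gso_size;	/* 0 means "not a GSO packet" */
};

/* True when gso_size holds the special marker rather than a byte count. */
static bool demo_is_gso_by_frags(const struct demo_gso_info *gi)
{
	return gi->gso_size == DEMO_GSO_BY_FRAGS;
}

/* Shrink gso_size by 'delta' bytes, refusing to touch the marker value. */
static int demo_shrink_gso_size(struct demo_gso_info *gi, unsigned short delta)
{
	if (gi->gso_size == 0)
		return 0;		/* nothing to adjust */
	if (demo_is_gso_by_frags(gi))
		return -ENOTSUP;	/* bail out instead of corrupting it */
	if (delta >= gi->gso_size)
		return -EINVAL;		/* would leave no payload per segment */
	gi->gso_size -= delta;
	return 0;
}
```

A caller would check the return value and leave the packet untouched on
error, rather than doing arithmetic on a value that is not a size.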

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 4a0c7191c75e94b052855bf44d94bfe97403:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf (2018-03-02 
20:32:15 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 6007b080d2e2adb7af22bf29165f0594ea12b34c:

  bpf, x64: increase number of passes (2018-03-07 14:47:24 -0800)


Daniel Axtens (1):
  bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs

Daniel Borkmann (1):
  bpf, x64: increase number of passes

Jiri Benc (1):
  tools: bpftool: fix compilation with older headers

 Documentation/networking/segmentation-offloads.txt | 11 +++-
 arch/x86/net/bpf_jit_comp.c                        |  3 +-
 include/linux/skbuff.h                             | 22 
 net/core/filter.c                                  | 60 +++---
 tools/bpf/bpftool/common.c                         |  4 ++
 5 files changed, 79 insertions(+), 21 deletions(-)


[next-queue PATCH v4 6/8] igb: Add MAC address support for ethtool nftuple filters

2018-03-07 Thread Vinicius Costa Gomes
This adds the capability of configuring the queue steering of arriving
packets based on their source and destination MAC addresses.

In practical terms this adds support for the following use cases,
characterized by these examples:

$ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
(this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
to the RX queue 0)

$ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
(this will direct packets with source address "44:44:44:44:44:44" to
the RX queue 3)
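
As a rough userspace model of how such a rule could be turned into match
flags, the sketch below accepts an address only when its mask is all ones,
and otherwise ignores it. The struct layouts and every demo_* name are
invented for illustration; the flag values mirror the ones used elsewhere in
this series, but nothing here is the driver code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6
#define DEMO_FLAG_SRC_MAC 0x4	/* illustrative flag values */
#define DEMO_FLAG_DST_MAC 0x8

/* Hypothetical value/mask pair as it might arrive from userspace. */
struct demo_ether_spec {
	uint8_t h_dest[DEMO_ETH_ALEN];
	uint8_t h_source[DEMO_ETH_ALEN];
};

struct demo_filter {
	uint8_t match_flags;
	uint8_t src_addr[DEMO_ETH_ALEN];
	uint8_t dst_addr[DEMO_ETH_ALEN];
};

/* A "full" mask is ff:ff:ff:ff:ff:ff, i.e. every bit must match. */
static bool demo_is_full_mask(const uint8_t *mask)
{
	for (int i = 0; i < DEMO_ETH_ALEN; i++)
		if (mask[i] != 0xff)
			return false;
	return true;
}

/* Record src/dst matches only for fully masked addresses. */
static void demo_parse_mac_match(const struct demo_ether_spec *val,
				 const struct demo_ether_spec *mask,
				 struct demo_filter *out)
{
	memset(out, 0, sizeof(*out));
	if (demo_is_full_mask(mask->h_source)) {
		out->match_flags |= DEMO_FLAG_SRC_MAC;
		memcpy(out->src_addr, val->h_source, DEMO_ETH_ALEN);
	}
	if (demo_is_full_mask(mask->h_dest)) {
		out->match_flags |= DEMO_FLAG_DST_MAC;
		memcpy(out->dst_addr, val->h_dest, DEMO_ETH_ALEN);
	}
}
```

Note that in this model a partial mask is silently skipped rather than
rejected; whether that is the desired behaviour is a design choice.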

Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 35 
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 94fc9a4bed8b..3f98299d4cd0 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2494,6 +2494,23 @@ static int igb_get_ethtool_nfc_entry(struct igb_adapter 
*adapter,
fsp->h_ext.vlan_tci = rule->filter.vlan_tci;
fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
}
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_dest,
+   rule->filter.dst_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_dest);
+   }
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_source,
+   rule->filter.src_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_source);
+   }
+
return 0;
}
return -EINVAL;
@@ -2932,10 +2949,6 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW)
return -EINVAL;
 
-   if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK &&
-   fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK))
-   return -EINVAL;
-
input = kzalloc(sizeof(*input), GFP_KERNEL);
if (!input)
return -ENOMEM;
@@ -2945,6 +2958,20 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE;
}
 
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_SRC_MAC_ADDR;
+   ether_addr_copy(input->filter.src_addr,
+   fsp->h_u.ether_spec.h_source);
+   }
+
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr,
+   fsp->h_u.ether_spec.h_dest);
+   }
+
if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) {
err = -EINVAL;
-- 
2.16.2



[next-queue PATCH v3 0/8] igb: offloading of receive filters

2018-03-07 Thread Vinicius Costa Gomes
Hi,

Changes from v3:
 - Addressed review comments from Aaron F. Brown and
   Jakub Kicinski;

Changes from v2:
 - Addressed review comments from Jakub Kicinski, mostly about coding
   style adjustments and more consistent error reporting;

Changes from v1:
 - Addressed review comments from Alexander Duyck and Florian
   Fainelli;
 - Adding and removing cls_flower filters are now proposed in the same
   patch;
 - cls_flower filters are kept in a separated list from "ethtool"
   filters (so that section of the original cover letter is no longer
   valid);
 - The patch adding support for ethtool filters is now independent from
   the rest of the series;

Original cover letter:

This series enables some ethtool and tc-flower filters to be offloaded
to igb-based network controllers. This is useful when the system
configurator wants to steer certain kinds of traffic to a specific
hardware queue.

The first two commits are bug fixes.

The basis of this series is to export the internal API used to
configure address filters, so that it can be used by ethtool, and to
extend its functionality so that a source address can be handled.

Then, we enable the tc-flower offloading implementation to re-use the
same infrastructure as ethtool, storing its filters in the per-adapter
"nfc" (Network Filter Config?) list. But for consistency, for
destructive access they are separated, i.e. a filter added by
tc-flower can only be removed by tc-flower, but ethtool can read them
all.

Only support for VLAN Prio, Source and Destination MAC Address, and
Ethertype is enabled for now.

Open question:
  - igb is initialized with the number of traffic classes set to 1. If we
  want to use multiple traffic classes we need to increase this value,
  and the only way I could find is to use mqprio (for example). Should igb
  be initialized with, say, the number of queues as its "num_tc"?

Vinicius Costa Gomes (8):
  igb: Fix not adding filter elements to the list
  igb: Fix queue selection on MAC filters on i210 and i211
  igb: Enable the hardware traffic class feature bit for igb models
  igb: Add support for MAC address filters specifying source addresses
  igb: Enable nfc filters to specify MAC addresses
  igb: Add MAC address support for ethtool nftuple filters
  igb: Add the skeletons for tc-flower offloading
  igb: Add support for adding offloaded clsflower filters

 drivers/net/ethernet/intel/igb/e1000_defines.h |   2 +
 drivers/net/ethernet/intel/igb/igb.h   |  12 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c   |  65 +-
 drivers/net/ethernet/intel/igb/igb_main.c  | 306 -
 4 files changed, 371 insertions(+), 14 deletions(-)

--
2.16.2


[next-queue PATCH v4 4/8] igb: Add support for MAC address filters specifying source addresses

2018-03-07 Thread Vinicius Costa Gomes
Makes it possible to direct packets to queues based on their source
address. Documents the expected usage of the 'flags' parameter.
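
The intended use of 'flags' can be modelled in userspace roughly as below.
This is only a sketch under assumptions: the table size, the entry layout and
all demo_* names are made up, and the real functions also program hardware
registers, which is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6
#define DEMO_TABLE_SIZE 4		/* hypothetical table size */
#define DEMO_STATE_IN_USE 0x2
#define DEMO_STATE_SRC_ADDR 0x4

struct demo_mac_entry {
	uint8_t addr[DEMO_ETH_ALEN];
	uint8_t queue;
	uint8_t state;
};

/* Claim the first free slot; 'flags' is OR-ed into the entry state.
 * Returns the slot index, or -ENOSPC when the table is full.
 */
static int demo_add_mac_filter(struct demo_mac_entry *t, const uint8_t *addr,
			       uint8_t queue, uint8_t flags)
{
	for (int i = 0; i < DEMO_TABLE_SIZE; i++) {
		if (t[i].state & DEMO_STATE_IN_USE)
			continue;
		memcpy(t[i].addr, addr, DEMO_ETH_ALEN);
		t[i].queue = queue;
		t[i].state |= DEMO_STATE_IN_USE | flags;
		return i;
	}
	return -ENOSPC;
}

/* Remove the first in-use entry whose state contains all of 'flags'
 * and whose queue and address match. Returns 0 or -ENOENT.
 */
static int demo_del_mac_filter(struct demo_mac_entry *t, const uint8_t *addr,
			       uint8_t queue, uint8_t flags)
{
	for (int i = 0; i < DEMO_TABLE_SIZE; i++) {
		if (!(t[i].state & DEMO_STATE_IN_USE))
			continue;
		if ((t[i].state & flags) != flags)
			continue;
		if (t[i].queue != queue)
			continue;
		if (memcmp(t[i].addr, addr, DEMO_ETH_ALEN) != 0)
			continue;
		memset(&t[i], 0, sizeof(t[i]));
		return 0;
	}
	return -ENOENT;
}
```

One consequence of the "(state & flags) != flags" test in this model is that
a delete with flags == 0 matches source-address entries as well, since every
state contains the empty flag set.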

Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  1 +
 drivers/net/ethernet/intel/igb/igb.h   |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c  | 38 ++
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 573bf177fd08..c6f552de30dd 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -490,6 +490,7 @@
  * manageability enabled, allowing us room for 15 multicast addresses.
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
+#define E1000_RAH_ASEL_SRC_ADDR 0x0001
 #define E1000_RAH_QSEL_ENABLE 0x1000
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 55d6f17d5799..4501b28ff7c5 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -473,6 +473,7 @@ struct igb_mac_addr {
 
 #define IGB_MAC_STATE_DEFAULT  0x1
 #define IGB_MAC_STATE_IN_USE   0x2
+#define IGB_MAC_STATE_SRC_ADDR  0x4
 
 /* board specific private data structure */
 struct igb_adapter {
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 8684d5ed56e1..76969467de31 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6843,8 +6843,13 @@ static void igb_set_default_mac_filter(struct 
igb_adapter *adapter)
igb_rar_set_index(adapter, 0);
 }
 
-static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
- const u8 queue)
+/* Add a MAC filter for 'addr' directing matching traffic to 'queue',
+ * 'flags' is used to indicate what kind of match is made, match is by
+ * default for the destination address, if matching by source address
+ * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 
*addr,
+   const u8 queue, const u8 flags)
 {
struct e1000_hw *hw = &adapter->hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6864,7 +6869,7 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
 
ether_addr_copy(adapter->mac_table[i].addr, addr);
adapter->mac_table[i].queue = queue;
-   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE | flags;
 
igb_rar_set_index(adapter, i);
return i;
@@ -6873,8 +6878,20 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOSPC;
 }
 
-static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
  const u8 queue)
+{
+   return igb_add_mac_filter_flags(adapter, addr, queue, 0);
+}
+
+/* Remove a MAC filter for 'addr' directing matching traffic to
+ * 'queue', 'flags' is used to indicate what kind of match need to be
+ * removed, match is by default for the destination address, if
+ * matching by source address is to be removed the flag
+ * IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_del_mac_filter_flags(struct igb_adapter *adapter, const u8 
*addr,
+   const u8 queue, const u8 flags)
 {
struct e1000_hw *hw = &adapter->hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6891,12 +6908,14 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
for (i = 0; i < rar_entries; i++) {
if (!(adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE))
continue;
+   if ((adapter->mac_table[i].state & flags) != flags)
+   continue;
if (adapter->mac_table[i].queue != queue)
continue;
if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
continue;
 
-   adapter->mac_table[i].state &= ~IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state = 0;
memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
adapter->mac_table[i].queue = 0;
 
@@ -6907,6 +6926,12 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOENT;
 }
 
+static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+ const u8 queue)
+{
+   return igb_del_mac_filter_flags(adapter, addr, queue, 0);
+}
+
 static int 

[next-queue PATCH v4 7/8] igb: Add the skeletons for tc-flower offloading

2018-03-07 Thread Vinicius Costa Gomes
This adds basic functions needed to implement offloading for filters
created by tc-flower.

Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 04307ef07e5a..7762dd5f270d 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2497,6 +2498,69 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+static int igb_configure_clsflower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_delete_clsflower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_setup_tc_cls_flower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   switch (cls_flower->command) {
+   case TC_CLSFLOWER_REPLACE:
+   return igb_configure_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_DESTROY:
+   return igb_delete_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_STATS:
+   return -EOPNOTSUPP;
+   default:
+   return -EINVAL;
+   }
+}
+
+static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+void *cb_priv)
+{
+   struct igb_adapter *adapter = cb_priv;
+
+   if (!tc_cls_can_offload_and_chain0(adapter->netdev, type_data))
+   return -EOPNOTSUPP;
+
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return igb_setup_tc_cls_flower(adapter, type_data);
+
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int igb_setup_tc_block(struct igb_adapter *adapter,
+ struct tc_block_offload *f)
+{
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
+adapter, adapter);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
+   adapter);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
 {
@@ -2505,6 +2569,8 @@ static int igb_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
switch (type) {
case TC_SETUP_QDISC_CBS:
return igb_offload_cbs(adapter, type_data);
+   case TC_SETUP_BLOCK:
+   return igb_setup_tc_block(adapter, type_data);
 
default:
return -EOPNOTSUPP;
-- 
2.16.2



[next-queue PATCH v4 8/8] igb: Add support for adding offloaded clsflower filters

2018-03-07 Thread Vinicius Costa Gomes
This allows filters added by tc-flower that specify MAC addresses,
Ethernet types, and the VLAN priority field to be offloaded to the
controller.

This reuses most of the infrastructure used by ethtool, but clsflower
filters are kept in a separate list, so they are invisible to
ethtool.

Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/igb.h  |   2 +
 drivers/net/ethernet/intel/igb/igb_main.c | 188 +-
 2 files changed, 188 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 3b310b16e1d1..1874c4635d54 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -464,6 +464,7 @@ struct igb_nfc_input {
 struct igb_nfc_filter {
struct hlist_node nfc_node;
struct igb_nfc_input filter;
+   unsigned long cookie;
u16 etype_reg_index;
u16 sw_idx;
u16 action;
@@ -602,6 +603,7 @@ struct igb_adapter {
 
/* RX network flow classification support */
struct hlist_head nfc_filter_list;
+   struct hlist_head cls_flower_list;
unsigned int nfc_filter_count;
/* lock for RX network flow classification filter */
spinlock_t nfc_lock;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 7762dd5f270d..d7d86a7955bd 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2498,16 +2498,197 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+#define ETHER_TYPE_FULL_MASK ((__force __be16)~0)
+#define VLAN_PRIO_FULL_MASK (0x07)
+
+static int igb_parse_cls_flower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *f,
+   int traffic_class,
+   struct igb_nfc_filter *input)
+{
+   struct netlink_ext_ack *extack = f->common.extack;
+
+   if (f->dissector->used_keys &
+   ~(BIT(FLOW_DISSECTOR_KEY_BASIC) |
+ BIT(FLOW_DISSECTOR_KEY_CONTROL) |
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
+ BIT(FLOW_DISSECTOR_KEY_VLAN))) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "Unsupported key used, only BASIC, CONTROL, 
ETH_ADDRS and VLAN are supported");
+   return -EOPNOTSUPP;
+   }
+
+   if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+   struct flow_dissector_key_eth_addrs *key, *mask;
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_ETH_ADDRS,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_ETH_ADDRS,
+f->mask);
+
+   if (!is_zero_ether_addr(mask->dst)) {
+   if (!is_broadcast_ether_addr(mask->dst)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for destination MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+   IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr, key->dst);
+   }
+
+   if (!is_zero_ether_addr(mask->src)) {
+   if (!is_broadcast_ether_addr(mask->src)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for source MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+   IGB_FILTER_FLAG_SRC_MAC_ADDR;
+   ether_addr_copy(input->filter.src_addr, key->src);
+   }
+   }
+
+   if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+   struct flow_dissector_key_basic *key, *mask;
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_BASIC,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_BASIC,
+f->mask);
+
+   if (mask->n_proto) {
+   if (mask->n_proto != ETHER_TYPE_FULL_MASK) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full mask is 
supported for EtherType filter");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |= IGB_FILTER_FLAG_ETHER_TYPE;
+ 

[next-queue PATCH v4 1/8] igb: Fix not adding filter elements to the list

2018-03-07 Thread Vinicius Costa Gomes
Because the order of the parameters passed to 'hlist_add_behind()' was
inverted, the 'parent' node was added "behind" the 'input' node. As 'input'
is not in the list, this caused the 'input' node to be lost.
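
The effect of swapping the two arguments can be reproduced with a tiny
userspace list. The sketch below is not the kernel's hlist implementation; it
is a hypothetical singly linked list whose demo_add_behind() takes its
arguments in the same (new node, existing node) order.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a list node; not the kernel's struct hlist_node. */
struct demo_node {
	int id;
	struct demo_node *next;
};

/* Insert 'n' directly after 'prev', which must already be in the list. */
static void demo_add_behind(struct demo_node *n, struct demo_node *prev)
{
	n->next = prev->next;
	prev->next = n;
}

/* Count the nodes reachable from 'head'. */
static int demo_list_len(const struct demo_node *head)
{
	int len = 0;

	for (; head; head = head->next)
		len++;
	return len;
}
```

With 'parent' as the only list member, demo_add_behind(&input, &parent)
makes 'input' reachable from 'parent', while the swapped call links the list
member behind the node that nobody points to, so walking from 'parent' never
finds 'input'.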

Fixes: 0e71def25281 ("igb: add support of RX network flow classification")
Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 606e6761758f..143f0bb34e4d 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2864,7 +2864,7 @@ static int igb_update_ethtool_nfc_entry(struct 
igb_adapter *adapter,
 
/* add filter to the list */
if (parent)
-   hlist_add_behind(&parent->nfc_node, &input->nfc_node);
+   hlist_add_behind(&input->nfc_node, &parent->nfc_node);
else
hlist_add_head(&input->nfc_node, &adapter->nfc_filter_list);
 
-- 
2.16.2



[next-queue PATCH v4 3/8] igb: Enable the hardware traffic class feature bit for igb models

2018-03-07 Thread Vinicius Costa Gomes
This will allow functionality that depends on the hardware being traffic
class aware to work. In particular, the tc-flower offloading checks
verify that this bit is set.

Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index eabedc6b6518..8684d5ed56e1 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2806,6 +2806,9 @@ static int igb_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
if (hw->mac.type >= e1000_82576)
netdev->features |= NETIF_F_SCTP_CRC;
 
+   if (hw->mac.type >= e1000_i350)
+   netdev->features |= NETIF_F_HW_TC;
+
 #define IGB_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
  NETIF_F_GSO_GRE_CSUM | \
  NETIF_F_GSO_IPXIP4 | \
-- 
2.16.2



[next-queue PATCH v4 5/8] igb: Enable nfc filters to specify MAC addresses

2018-03-07 Thread Vinicius Costa Gomes
This allows igb_add_filter()/igb_erase_filter() to work on filters
that include MAC addresses (both source and destination).

For now, this only exposes the functionality; the next commit glues
ethtool into this. Later in this series, these APIs are used to allow
offloading of cls_flower filters.

Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/igb.h |  9 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 28 
 drivers/net/ethernet/intel/igb/igb_main.c|  8 
 3 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 4501b28ff7c5..3b310b16e1d1 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -441,6 +441,8 @@ struct hwmon_buff {
 enum igb_filter_match_flags {
IGB_FILTER_FLAG_ETHER_TYPE = 0x1,
IGB_FILTER_FLAG_VLAN_TCI   = 0x2,
+   IGB_FILTER_FLAG_SRC_MAC_ADDR   = 0x4,
+   IGB_FILTER_FLAG_DST_MAC_ADDR   = 0x8,
 };
 
 #define IGB_MAX_RXNFC_FILTERS 16
@@ -455,6 +457,8 @@ struct igb_nfc_input {
u8 match_flags;
__be16 etype;
__be16 vlan_tci;
+   u8 src_addr[ETH_ALEN];
+   u8 dst_addr[ETH_ALEN];
 };
 
 struct igb_nfc_filter {
@@ -739,4 +743,9 @@ int igb_add_filter(struct igb_adapter *adapter,
 int igb_erase_filter(struct igb_adapter *adapter,
 struct igb_nfc_filter *input);
 
+int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr,
+const u8 queue, const u8 flags);
+int igb_del_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr,
+const u8 queue, const u8 flags);
+
 #endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 143f0bb34e4d..94fc9a4bed8b 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2775,6 +2775,25 @@ int igb_add_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
return err;
}
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   err = igb_add_mac_filter_flags(adapter,
+  input->filter.dst_addr,
+  input->action, 0);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   err = igb_add_mac_filter_flags(adapter,
+  input->filter.src_addr,
+  input->action,
+  IGB_MAC_STATE_SRC_ADDR);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
if (input->filter.match_flags & IGB_FILTER_FLAG_VLAN_TCI)
err = igb_rxnfc_write_vlan_prio_filter(adapter, input);
 
@@ -2823,6 +2842,15 @@ int igb_erase_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
igb_clear_vlan_prio_filter(adapter,
   ntohs(input->filter.vlan_tci));
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR)
+   igb_del_mac_filter_flags(adapter, input->filter.src_addr,
+input->action,
+IGB_MAC_STATE_SRC_ADDR);
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR)
+   igb_del_mac_filter_flags(adapter, input->filter.dst_addr,
+input->action, 0);
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 76969467de31..04307ef07e5a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6848,8 +6848,8 @@ static void igb_set_default_mac_filter(struct igb_adapter 
*adapter)
  * default for the destination address, if matching by source address
  * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used.
  */
-static int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 
*addr,
-   const u8 queue, const u8 flags)
+int igb_add_mac_filter_flags(struct igb_adapter *adapter, const u8 *addr,
+const u8 queue, const u8 flags)
 {
struct e1000_hw *hw = &adapter->hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6890,8 +6890,8 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
  * matching by source address is to be removed the flag
  * IGB_MAC_STATE_SRC_ADDR can be used.
  */
-static int igb_del_mac_filter_flags(struct igb_adapter 

[next-queue PATCH v4 2/8] igb: Fix queue selection on MAC filters on i210 and i211

2018-03-07 Thread Vinicius Costa Gomes
On the RAH registers, the meaning of the "queue" parameter for traffic
steering differs depending on the controller model: there is the 82575
meaning, in which "queue" means an RX hardware queue, and the i350
meaning, in which it is a reception pool.

The previous behaviour was having no effect for i210 and i211 based
controllers because the QSEL bit of the RAH register wasn't being set.

This patch separates the condition into discrete cases, so the different
handling is clearer.
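
The two interpretations can be sketched in plain C as follows. The enum, the
function name and in particular the two constant values are placeholders
chosen for illustration (the values visible in the quoted diff appear
truncated), so this shows only the shape of the logic: a queue index encoded
as a number plus an enable bit, versus a one-bit-per-pool mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical controller models for this sketch. */
enum demo_mac_type { DEMO_82575, DEMO_I210, DEMO_I211, DEMO_I350 };

/* Placeholder bit values, assumed for illustration only. */
#define DEMO_RAH_QSEL_ENABLE 0x10000000u
#define DEMO_RAH_POOL_1      0x00040000u

/* Return the queue/pool bits to OR into the high address register. */
static uint32_t demo_rah_queue_bits(enum demo_mac_type type, uint8_t queue)
{
	switch (type) {
	case DEMO_82575:
	case DEMO_I210:
	case DEMO_I211:
		/* "queue" is a queue number: encode it and enable selection */
		return DEMO_RAH_QSEL_ENABLE | (DEMO_RAH_POOL_1 * queue);
	default:
		/* "queue" is a pool index: set the bit for that pool */
		return DEMO_RAH_POOL_1 << queue;
	}
}
```

For queue 3, the first branch yields the number 3 in the queue field plus the
enable bit, while the second sets only the fourth pool bit.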

Fixes: 83c21335c876 ("igb: improve MAC filter handling")
Signed-off-by: Vinicius Costa Gomes 
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c  | 15 +++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 83cabff1e0ab..573bf177fd08 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -490,6 +490,7 @@
  * manageability enabled, allowing us room for 15 multicast addresses.
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
+#define E1000_RAH_QSEL_ENABLE 0x1000
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
 #define E1000_RAH_POOL_MASK 0x03FC
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 715bb32e6901..eabedc6b6518 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8747,12 +8747,19 @@ static void igb_rar_set_index(struct igb_adapter 
*adapter, u32 index)
if (is_valid_ether_addr(addr))
rar_high |= E1000_RAH_AV;
 
-   if (hw->mac.type == e1000_82575)
+   switch (hw->mac.type) {
+   case e1000_82575:
+   case e1000_i210:
+   case e1000_i211:
+   rar_high |= E1000_RAH_QSEL_ENABLE;
rar_high |= E1000_RAH_POOL_1 *
-   adapter->mac_table[index].queue;
-   else
+ adapter->mac_table[index].queue;
+   break;
+   default:
rar_high |= E1000_RAH_POOL_1 <<
-   adapter->mac_table[index].queue;
+   adapter->mac_table[index].queue;
+   break;
+   }
}
 
wr32(E1000_RAL(index), rar_low);
-- 
2.16.2



[PATCH] net: remove VLA usage

2018-03-07 Thread Laszlo Toth
Split snmp_seq_show_tcp_udp() into TCP and UDP variants, so that
max_t() no longer has to be used for the array size.
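
A minimal userspace sketch of the idea, assuming the problem is that a
macro-computed maximum is not an integer constant expression: each helper
gets its own buffer sized by a plain constant. The two sizes and the demo_*
names are hypothetical, and the summing merely stands in for the real
per-counter printing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical counter counts; the real values come from kernel headers. */
enum {
	DEMO_TCP_MIB_MAX = 16,
	DEMO_UDP_MIB_MAX = 9,
};

/* Buffer sized by a plain constant, so it is an ordinary array, not a VLA. */
static unsigned long demo_sum_tcp(const unsigned long *counters)
{
	unsigned long buff[DEMO_TCP_MIB_MAX];
	unsigned long sum = 0;
	int i;

	memset(buff, 0, sizeof(buff));
	for (i = 0; i < DEMO_TCP_MIB_MAX; i++) {
		buff[i] = counters[i];
		sum += buff[i];
	}
	return sum;
}

/* Same pattern for UDP, with its own, smaller buffer. */
static unsigned long demo_sum_udp(const unsigned long *counters)
{
	unsigned long buff[DEMO_UDP_MIB_MAX];
	unsigned long sum = 0;
	int i;

	memset(buff, 0, sizeof(buff));
	for (i = 0; i < DEMO_UDP_MIB_MAX; i++) {
		buff[i] = counters[i];
		sum += buff[i];
	}
	return sum;
}
```

Splitting costs a little duplication but lets each buffer be exactly as large
as its protocol needs.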

Signed-off-by: Laszlo Toth 
---
 net/ipv4/proc.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index dc5edc8..67f7d76 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,8 +46,6 @@
 #include 
 #include 
 
-#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
-
 /*
  * Report socket allocation statistics [m...@utu.fi]
  */
@@ -398,13 +396,13 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, 
void *v)
return 0;
 }
 
-static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
+static int snmp_seq_show_tcp(struct seq_file *seq, void *v)
 {
-   unsigned long buff[TCPUDP_MIB_MAX];
+   unsigned long buff[TCP_MIB_MAX];
struct net *net = seq->private;
int i;
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, TCP_MIB_MAX * sizeof(unsigned long));
 
seq_puts(seq, "\nTcp:");
for (i = 0; snmp4_tcp_list[i].name; i++)
@@ -421,7 +419,16 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, 
void *v)
seq_printf(seq, " %lu", buff[i]);
}
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   return 0;
+}
+
+static int snmp_seq_show_udp(struct seq_file *seq, void *v)
+{
+   unsigned long buff[UDP_MIB_MAX];
+   struct net *net = seq->private;
+   int i;
+
+   memset(buff, 0, UDP_MIB_MAX * sizeof(unsigned long));
 
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
 net->mib.udp_statistics);
@@ -432,7 +439,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, UDP_MIB_MAX * sizeof(unsigned long));
 
/* the UDP and UDP-Lite MIBs are the same */
seq_puts(seq, "\nUdpLite:");
@@ -455,7 +462,8 @@ static int snmp_seq_show(struct seq_file *seq, void *v)
icmp_put(seq);  /* RFC 2011 compatibility */
icmpmsg_put(seq);
 
-   snmp_seq_show_tcp_udp(seq, v);
+   snmp_seq_show_tcp(seq, v);
+   snmp_seq_show_udp(seq, v);
 
return 0;
 }
-- 
2.7.4



RE: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address support for ethtool nftuple filters

2018-03-07 Thread Vinicius Costa Gomes
Hi,

"Brown, Aaron F"  writes:

>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
>> Behalf Of Vinicius Costa Gomes
>> Sent: Tuesday, March 6, 2018 5:30 PM
>> To: intel-wired-...@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus > palen...@intel.com>
>> Subject: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address
>> support for ethtool nftuple filters
>> 
>> This adds the capability of configuring the queue steering of arriving
>> packets based on their source and destination MAC addresses.
>> 
>> In practical terms this adds support for the following use cases,
>> characterized by these examples:
>> 
>> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
>> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
>> to the RX queue 0)
>> 
>> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
>> (this will direct packets with destination address "44:44:44:44:44:44"
>> to the RX queue 3)
>
> I assume this example should read "... source address"  rather than
> "...destination".

Ugh, yeah. Will be fixed on v4.


Thank you,
--
Vinicius


[RESEND PATCH 0/3 net-next] ibmvnic: Clean up net close and fix reset bug

2018-03-07 Thread Thomas Falcon
This patch set cleans up and reorganizes the driver's net_device
close function and leverages that to fix up a bug that can occur
during some device resets. Some reset cases require the backing
adapter to be disabled before continuing, but other cases, such as 
during a device failover or partition migration, do not require this
step. Since the device will not be initialized at this stage and
its command-processing queue is closed, do not send the request to
disable the device as it could result in an error or timeout
disrupting the reset.

Thomas Falcon (3):
  ibmvnic: Clean up device close
  ibmvnic: Reorganize device close
  ibmvnic: Do not disable device during failover or partition migration

 drivers/net/ethernet/ibm/ibmvnic.c | 48 ++
 1 file changed, 23 insertions(+), 25 deletions(-)

-- 
1.8.3.1



[RESEND PATCH 1/3 net-next] ibmvnic: Clean up device close

2018-03-07 Thread Thomas Falcon
Remove some dead code now that RX pools are being cleaned. This
was included to wait until any pending RX queue interrupts are
processed, but NAPI polling should be disabled by this point.

Another minor change is to use the net device parameter for any
print functions instead of accessing it from the adapter structure.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 7654071..fca0533 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1162,7 +1162,7 @@ static int __ibmvnic_close(struct net_device *netdev)
if (adapter->tx_scrq) {
for (i = 0; i < adapter->req_tx_queues; i++)
if (adapter->tx_scrq[i]->irq) {
-   netdev_dbg(adapter->netdev,
+   netdev_dbg(netdev,
   "Disabling tx_scrq[%d] irq\n", i);
disable_irq(adapter->tx_scrq[i]->irq);
}
@@ -1174,18 +1174,8 @@ static int __ibmvnic_close(struct net_device *netdev)
 
if (adapter->rx_scrq) {
for (i = 0; i < adapter->req_rx_queues; i++) {
-   int retries = 10;
-
-   while (pending_scrq(adapter, adapter->rx_scrq[i])) {
-   retries--;
-   mdelay(100);
-
-   if (retries == 0)
-   break;
-   }
-
if (adapter->rx_scrq[i]->irq) {
-   netdev_dbg(adapter->netdev,
+   netdev_dbg(netdev,
   "Disabling rx_scrq[%d] irq\n", i);
disable_irq(adapter->rx_scrq[i]->irq);
}
-- 
1.8.3.1



[RESEND PATCH 3/3 net-next] ibmvnic: Do not disable device during failover or partition migration

2018-03-07 Thread Thomas Falcon
During a device failover or partition migration reset, it is not
necessary to disable the backing adapter since it should not be
running yet and its Command-Response Queue is closed. Sending
device commands during this time could result in an error or
timeout disrupting the reset process. In these cases, just halt
transmissions, clean up resources, and continue with reset.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index d93f286..7be4b06 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1653,12 +1653,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
rc = ibmvnic_reenable_crq_queue(adapter);
if (rc)
return 0;
+   ibmvnic_cleanup(netdev);
+   } else if (rwi->reset_reason == VNIC_RESET_FAILOVER) {
+   ibmvnic_cleanup(netdev);
+   } else {
+   rc = __ibmvnic_close(netdev);
+   if (rc)
+   return rc;
}
 
-   rc = __ibmvnic_close(netdev);
-   if (rc)
-   return rc;
-
if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM ||
adapter->wait_for_reset) {
release_resources(adapter);
-- 
1.8.3.1
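
Editor's sketch: the reset handling described in the patch above can be modelled in plain user-space C. This is only an illustration under stated assumptions, not the driver's code: the enum values, struct fields and sketch_* functions are all hypothetical stand-ins. The "full close" path first sends a disable command to the device and only then cleans up, while the failover and migration paths skip the command and go straight to cleanup.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's reset reasons. */
enum sketch_reset_reason {
	SKETCH_RESET_MOBILITY,	/* partition migration */
	SKETCH_RESET_FAILOVER,
	SKETCH_RESET_OTHER
};

/* Hypothetical adapter state, just enough to observe the ordering. */
struct sketch_adapter {
	bool link_down_sent;	/* a "disable" command went to the device */
	bool cleaned;		/* queues halted, buffers released */
	int link_cmd_rc;	/* result the fake device returns for it */
};

/* Halt traffic and release resources; never talks to the device. */
void sketch_cleanup(struct sketch_adapter *a)
{
	a->cleaned = true;
}

/* Disable the backing device first, clean up only if that worked. */
int sketch_full_close(struct sketch_adapter *a)
{
	a->link_down_sent = true;
	if (a->link_cmd_rc)
		return a->link_cmd_rc;
	sketch_cleanup(a);
	return 0;
}

/* Choose the close path based on why the reset was requested. */
int sketch_prepare_reset(struct sketch_adapter *a,
			 enum sketch_reset_reason why)
{
	switch (why) {
	case SKETCH_RESET_MOBILITY:
	case SKETCH_RESET_FAILOVER:
		/* Command queue is closed: do not send anything. */
		sketch_cleanup(a);
		return 0;
	default:
		return sketch_full_close(a);
	}
}
```

With a fake device that fails every command (link_cmd_rc set to a negative value), a failover reset still completes its cleanup, whereas any other reset reason returns the error before cleanup runs.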



[RESEND PATCH 2/3 net-next] ibmvnic: Reorganize device close

2018-03-07 Thread Thomas Falcon
Introduce a function to halt network operations and clean up any
unused or outstanding socket buffers. Then, during device close,
disable backing adapter before halting all queues and performing
cleanup. This ensures all backing device operations will be
stopped before the driver cleans up shared resources.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index fca0533..d93f286 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1143,14 +1143,11 @@ static void clean_tx_pools(struct ibmvnic_adapter *adapter)
}
 }
 
-static int __ibmvnic_close(struct net_device *netdev)
+static void ibmvnic_cleanup(struct net_device *netdev)
 {
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
-   int rc = 0;
int i;
 
-   adapter->state = VNIC_CLOSING;
-
/* ensure that transmissions are stopped if called by do_reset */
if (adapter->resetting)
netif_tx_disable(netdev);
@@ -1168,10 +1165,6 @@ static int __ibmvnic_close(struct net_device *netdev)
}
}
 
-   rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
-   if (rc)
-   return rc;
-
if (adapter->rx_scrq) {
for (i = 0; i < adapter->req_rx_queues; i++) {
if (adapter->rx_scrq[i]->irq) {
@@ -1183,8 +1176,20 @@ static int __ibmvnic_close(struct net_device *netdev)
}
clean_rx_pools(adapter);
clean_tx_pools(adapter);
+}
+
+static int __ibmvnic_close(struct net_device *netdev)
+{
+   struct ibmvnic_adapter *adapter = netdev_priv(netdev);
+   int rc = 0;
+
+   adapter->state = VNIC_CLOSING;
+   rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
+   if (rc)
+   return rc;
+   ibmvnic_cleanup(netdev);
adapter->state = VNIC_CLOSED;
-   return rc;
+   return 0;
 }
 
 static int ibmvnic_close(struct net_device *netdev)
-- 
1.8.3.1



Re: [PATCH net 4/5] tcp: prevent bogus undos when SACK is not enabled

2018-03-07 Thread Yuchung Cheng
On Wed, Mar 7, 2018 at 12:19 PM, Neal Cardwell  wrote:
>
> On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen  
> wrote:
> > A bogus undo may/will trigger when the loss recovery state is
> > kept until snd_una is above high_seq. If tcp_any_retrans_done
> > is zero, retrans_stamp is cleared in this transient state. On
> > the next ACK, tcp_try_undo_recovery again executes and
> > tcp_may_undo will always return true because tcp_packet_delayed
> > has this condition:
> > return !tp->retrans_stamp || ...
> >
> > Check for the false fast retransmit transient condition in
> > tcp_packet_delayed to avoid bogus undos. Since snd_una may have
> > advanced on this ACK but CA state still remains unchanged,
> > prior_snd_una needs to be passed instead of tp->snd_una.
>
> This one also seems like a case where it would be nice to have a
> specific packet-by-packet example, or trace, or packetdrill scenario.
> Something that we might be able to translate into a test, or at least
> to document the issue more explicitly.
I am hesitant to add further logic to make undo "perfect" in non-SACK
cases, because undo is already very complicated and SACK is extremely
well supported today. So a trace demonstrating how severe this issue
is would be appreciated.

>
> Thanks!
> neal


Re: [PATCH 0/3] ibmvnic: Clean up net close and fix reset bug

2018-03-07 Thread Thomas Falcon
On 03/07/2018 05:41 PM, Thomas Falcon wrote:
> This patch set cleans up and reorganizes the driver's net_device
> close function and leverages that to fix up a bug that can occur
> during some device resets. Some reset cases require the backing
> adapter to be disabled before continuing, but other cases, such as 
> during a device failover or partition migration, do not require this
> step. Since the device will not be initialized at this stage and
> its command-processing queue is closed, do not send the request to
> disable the device as it could result in an error or timeout
> disrupting the reset.
>
> Thomas Falcon (3):
>   ibmvnic: Clean up device close
>   ibmvnic: Reorganize device close
>   ibmvnic: Do not disable device during failover or partition migration
>
>  drivers/net/ethernet/ibm/ibmvnic.c | 48 ++
>  1 file changed, 23 insertions(+), 25 deletions(-)
>
Crud, this series is meant for the net-next tree, but I forgot to include it in 
the patch tag.



[PATCH 3/3] ibmvnic: Do not disable device during failover or partition migration

2018-03-07 Thread Thomas Falcon
During a device failover or partition migration reset, it is not
necessary to disable the backing adapter since it should not be
running yet and its Command-Response Queue is closed. Sending
device commands during this time could result in an error or
timeout disrupting the reset process. In these cases, just halt
transmissions, clean up resources, and continue with reset.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index d93f286..7be4b06 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1653,12 +1653,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
rc = ibmvnic_reenable_crq_queue(adapter);
if (rc)
return 0;
+   ibmvnic_cleanup(netdev);
+   } else if (rwi->reset_reason == VNIC_RESET_FAILOVER) {
+   ibmvnic_cleanup(netdev);
+   } else {
+   rc = __ibmvnic_close(netdev);
+   if (rc)
+   return rc;
}
 
-   rc = __ibmvnic_close(netdev);
-   if (rc)
-   return rc;
-
if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM ||
adapter->wait_for_reset) {
release_resources(adapter);
-- 
1.8.3.1



[PATCH 0/3] ibmvnic: Clean up net close and fix reset bug

2018-03-07 Thread Thomas Falcon
This patch set cleans up and reorganizes the driver's net_device
close function and leverages that to fix up a bug that can occur
during some device resets. Some reset cases require the backing
adapter to be disabled before continuing, but other cases, such as 
during a device failover or partition migration, do not require this
step. Since the device will not be initialized at this stage and
its command-processing queue is closed, do not send the request to
disable the device as it could result in an error or timeout
disrupting the reset.

Thomas Falcon (3):
  ibmvnic: Clean up device close
  ibmvnic: Reorganize device close
  ibmvnic: Do not disable device during failover or partition migration

 drivers/net/ethernet/ibm/ibmvnic.c | 48 ++
 1 file changed, 23 insertions(+), 25 deletions(-)

-- 
1.8.3.1



[PATCH 2/3] ibmvnic: Reorganize device close

2018-03-07 Thread Thomas Falcon
Introduce a function to halt network operations and clean up any
unused or outstanding socket buffers. Then, during device close,
disable backing adapter before halting all queues and performing
cleanup. This ensures all backing device operations will be
stopped before the driver cleans up shared resources.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index fca0533..d93f286 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1143,14 +1143,11 @@ static void clean_tx_pools(struct ibmvnic_adapter *adapter)
}
 }
 
-static int __ibmvnic_close(struct net_device *netdev)
+static void ibmvnic_cleanup(struct net_device *netdev)
 {
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
-   int rc = 0;
int i;
 
-   adapter->state = VNIC_CLOSING;
-
/* ensure that transmissions are stopped if called by do_reset */
if (adapter->resetting)
netif_tx_disable(netdev);
@@ -1168,10 +1165,6 @@ static int __ibmvnic_close(struct net_device *netdev)
}
}
 
-   rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
-   if (rc)
-   return rc;
-
if (adapter->rx_scrq) {
for (i = 0; i < adapter->req_rx_queues; i++) {
if (adapter->rx_scrq[i]->irq) {
@@ -1183,8 +1176,20 @@ static int __ibmvnic_close(struct net_device *netdev)
}
clean_rx_pools(adapter);
clean_tx_pools(adapter);
+}
+
+static int __ibmvnic_close(struct net_device *netdev)
+{
+   struct ibmvnic_adapter *adapter = netdev_priv(netdev);
+   int rc = 0;
+
+   adapter->state = VNIC_CLOSING;
+   rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
+   if (rc)
+   return rc;
+   ibmvnic_cleanup(netdev);
adapter->state = VNIC_CLOSED;
-   return rc;
+   return 0;
 }
 
 static int ibmvnic_close(struct net_device *netdev)
-- 
1.8.3.1



[PATCH 1/3] ibmvnic: Clean up device close

2018-03-07 Thread Thomas Falcon
Remove some dead code now that RX pools are being cleaned. This
was included to wait until any pending RX queue interrupts are
processed, but NAPI polling should be disabled by this point.

Another minor change is to use the net device parameter for any
print functions instead of accessing it from the adapter structure.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 7654071..fca0533 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1162,7 +1162,7 @@ static int __ibmvnic_close(struct net_device *netdev)
if (adapter->tx_scrq) {
for (i = 0; i < adapter->req_tx_queues; i++)
if (adapter->tx_scrq[i]->irq) {
-   netdev_dbg(adapter->netdev,
+   netdev_dbg(netdev,
   "Disabling tx_scrq[%d] irq\n", i);
disable_irq(adapter->tx_scrq[i]->irq);
}
@@ -1174,18 +1174,8 @@ static int __ibmvnic_close(struct net_device *netdev)
 
if (adapter->rx_scrq) {
for (i = 0; i < adapter->req_rx_queues; i++) {
-   int retries = 10;
-
-   while (pending_scrq(adapter, adapter->rx_scrq[i])) {
-   retries--;
-   mdelay(100);
-
-   if (retries == 0)
-   break;
-   }
-
if (adapter->rx_scrq[i]->irq) {
-   netdev_dbg(adapter->netdev,
+   netdev_dbg(netdev,
   "Disabling rx_scrq[%d] irq\n", i);
disable_irq(adapter->rx_scrq[i]->irq);
}
-- 
1.8.3.1



[PATCHv2 net-next] openvswitch: fix vport packet length check.

2018-03-07 Thread William Tu
When sending a packet to a tunnel device, the dev's hard_header_len
could be larger than the skb->len in function packet_length().
In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
which is around 180, and an ARP packet sent to this tunnel has
skb->len = 42.  The subtraction result is negative, so the 'unsigned int
length' wraps around to a very large value, and the later ovs_vport_send
drops the packet for exceeding the MTU.  The patch fixes it by clamping
the length to 0.

Signed-off-by: William Tu 
---
v1->v2:
  replace the return type from unsigned int to int
---
 net/openvswitch/vport.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index b6c8524032a0..f81c1d0ddff4 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -464,10 +464,10 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
return 0;
 }
 
-static unsigned int packet_length(const struct sk_buff *skb,
- struct net_device *dev)
+static int packet_length(const struct sk_buff *skb,
+struct net_device *dev)
 {
-   unsigned int length = skb->len - dev->hard_header_len;
+   int length = skb->len - dev->hard_header_len;
 
if (!skb_vlan_tag_present(skb) &&
eth_type_vlan(skb->protocol))
@@ -478,7 +478,7 @@ static unsigned int packet_length(const struct sk_buff *skb,
 * account for 802.1ad. e.g. is_skb_forwardable().
 */
 
-   return length;
+   return length > 0 ? length : 0;
 }
 
 void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
-- 
2.7.4
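
Editor's sketch: the wrap-around described in the commit message can be reproduced in user space with the numbers it quotes (skb->len = 42, hard_header_len around 180). The two functions below are simplified stand-ins written for illustration, not the kernel's packet_length(); they take plain integers instead of an skb and a net_device, and they leave out the VLAN adjustment.

```c
#include <assert.h>

/*
 * Pre-patch behaviour: the subtraction happens in unsigned arithmetic,
 * so a header longer than the packet wraps to a huge value.
 */
unsigned int sketch_length_unsigned(unsigned int skb_len,
				    unsigned int hard_header_len)
{
	return skb_len - hard_header_len;
}

/* Post-patch behaviour: compute in a signed int and clamp at zero. */
int sketch_length_clamped(unsigned int skb_len,
			  unsigned int hard_header_len)
{
	int length = (int)skb_len - (int)hard_header_len;

	return length > 0 ? length : 0;
}
```

For a 42-byte ARP packet and a 180-byte header, the unsigned version returns a value far above any realistic MTU, while the clamped version returns 0; an ordinary 1514-byte frame with a 14-byte header yields 1500 in both.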



Re: linux-next: manual merge of the selinux tree with the net-next tree

2018-03-07 Thread Paul Moore
On Wed, Mar 7, 2018 at 3:26 PM, David Miller  wrote:
> From: Paul Moore 
> Date: Wed, 7 Mar 2018 15:20:33 -0500
>
>>> So you would only have to wait until my tree went in before
>>> sending your pull request.
>>
>> So you would want me to rebase selinux/next on top of Linus' tree in
>> the middle of the merge window?  I'm sure that isn't what you meant,
>> but that's how I keep reading the above ... which can't be right,
>> because in my experience that's one way to piss off Linus.  Help me
>> understand what you are saying.
>
> I never said you rebase anything.  I wonder where you get that from.

As I said, I was just trying to figure out what you were suggesting.
Your email was not very clear in my opinion.

> I'm saying, you just defer your pull request until Linus takes my
> networking tree in.
>
> No changes or rebasing of your tree is necessary whatsoever.  You just
> ask him to pull your tree as-is.
>
> Again, this is what other smaller subsystem trees do when they have a
> situation like this.

Which gets us back to what I originally suggested in my first email of
this thread: linux-next carries the fixup patch and when we send the
pull requests to Linus we mention this fixup/thread.

For what it's worth, if you mention the potential merge conflict, and
the fixup that Stephen provided, it shouldn't matter when the pull
requests are sent to Linus; he's a smart guy, he'll merge things in
the order he wants.  I've seen more than a few people get burned by
deferring pull requests, I don't intend to have SELinux, or audit for
that matter, run into the same problem.

-- 
paul moore
www.paul-moore.com


Re: [PATCH net-next] ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE

2018-03-07 Thread David Miller
From: Eric Dumazet 
Date: Wed, 07 Mar 2018 08:43:19 -0800

> From: Eric Dumazet 
> 
> Kirill found that recently added synchronize_rcu() call in ip6mr_sk_done()
> was slowing down netns dismantle and posted a patch to use it only if the
> socket was found.
> 
> I instead suggested to get rid of this call, and use instead SOCK_RCU_FREE
> 
> We might later change IPv4 side to use the same technique and unify
> both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
> that could be replaced by SOCK_RCU_FREE.
> 
> Tested:
>  time for i in {1..1000}; do unshare -n /bin/false;done
> 
>  Before : real 7m18.911s
>  After : real 10.187s
> 
> Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based")
> Signed-off-by: Eric Dumazet 
> Reported-by: Kirill Tkhai 
> Cc: Yuval Mintz 

Looks great, applied, thanks everyone.


Re: [pull request][for-next V2 00/13] Mellanox, mlx5 IPSec updates 2018-02-28-1

2018-03-07 Thread Saeed Mahameed
On Wed, 2018-03-07 at 15:57 -0500, Doug Ledford wrote:
> On Wed, 2018-03-07 at 15:41 -0500, Doug Ledford wrote:
> > On Wed, 2018-03-07 at 15:31 -0500, David Miller wrote:
> > > From: Saeed Mahameed 
> > > Date: Tue,  6 Mar 2018 22:35:03 -0800
> > > 
> > > > > This series includes shared code updates for mlx5 core driver for
> > > > > both netdev and rdma subsystems.  This series should be pulled to
> > > > > both trees so we can continue netdev and rdma specific submissions
> > > > > separately.
> > > > > 
> > > > > For more information please see tag log below.
> > > > > 
> > > > > The series doesn't cause any conflict with the latest mlx5 rc fixes.
> > > > > 
> > > > > v1->v2:
> > > > >   - Drop sparse fixes patch
> > > > >   - Updated commit message of "net/mlx5: Add has_tag to mlx5_flow_act"
> > > > >   - Add const to  static mlx5_flow_cmd structs where needed.
> > > 
> > > Pulled, thanks Saeed.
> > 
> > Thanks, pulled here as well.
> > 
> 
> Just FYI,
> 
> My .config might have been in an unreasonable state (I had jumped from
> for-next to for-rc, built a kernel which ran a make oldconfig, then
> jumped back to for-next and tried to build with this series applied but
> without making any changes to the .config file), but I got a build
> error.  My .config had both innova and the new ipsec accelerator turned
> off or something like that, and I got this error:
> 
>   CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_main.o
> drivers/net/ethernet/mellanox/mlx5/core/fs_core.c: In function ‘mlx5_init_fs’:
> drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2645:6: error: implicit declaration of function ‘mlx5_accel_ipsec_device_caps’; did you mean ‘mlx5_accel_ipsec_cleanup’? [-Werror=implicit-function-declaration]
>   if (mlx5_accel_ipsec_device_caps(steering->dev) &
>   ^~~~
>   mlx5_accel_ipsec_cleanup
>   CC [M]  drivers/net/ethernet/silan/sc92031.o
>   CC [M]  drivers/w1/slaves/w1_smem.o
>   CC [M]  drivers/net/ethernet/mellanox/mlx5/core/en_common.o
>   CC [M]  drivers/net/ethernet/sfc/falcon/nic.o
> drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2646:6: error: ‘MLX5_ACCEL_IPSEC_DEVICE’ undeclared (first use in this function); did you mean ‘__MLX5_ACCEL_IPSEC_H__’?
>   MLX5_ACCEL_IPSEC_DEVICE) {
>   ^~~
>   __MLX5_ACCEL_IPSEC_H__
> drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2646:6: note: each undeclared identifier is reported only once for each function it appears in
>   CC [M]  drivers/w1/w1.o
> 
> Running make config and enabling innova support and ipsec accelerator
> support fixed it.
> 

hmm, i think there is a missing include somewhere, will check this out
now and send the fix in the next pull request, later today or tomorrow.

Thanks Doug !


Re: [PATCH net-next 0/2] RDS: zerocopy code enhancements

2018-03-07 Thread David Miller
From: Sowmini Varadhan 
Date: Tue,  6 Mar 2018 07:22:32 -0800

> A couple of enhancements to the rds zerocopy code
> - patch 1 refactors rds_message_copy_from_user to pull the zcopy logic
>   into its own function
> - patch 2 drops the usage of sk_buff to track MSG_ZEROCOPY cookies and
>   uses a simple linked list (enhancement suggested by willemb during
>   code review)

Series applied, thanks.


Re: [RFC v3 net-next 08/18] net: SO_TXTIME: Add clockid and drop_if_late params

2018-03-07 Thread David Miller
From: Eric Dumazet 
Date: Wed, 07 Mar 2018 14:45:45 -0800

> No, we need to be extra careful.

+1


Re: [RFC v3 iproute2 3/3] tc: Add support for the TBS Qdisc

2018-03-07 Thread Stephen Hemminger
On Wed, 7 Mar 2018 14:29:23 -0800
Jesus Sanchez-Palencia  wrote:

> Hi,
> 
> 
> On 03/06/2018 05:51 PM, Stephen Hemminger wrote:
> > On Tue,  6 Mar 2018 17:16:08 -0800
> > Jesus Sanchez-Palencia  wrote:
> >   
> >> +static int tbs_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
> >> +{
> >> +  struct rtattr *tb[TCA_TBS_MAX+1];
> >> +  struct tc_tbs_qopt *qopt;
> >> +
> >> +  if (opt == NULL)
> >> +  return 0;
> >> +
> >> +  parse_rtattr_nested(tb, TCA_TBS_MAX, opt);
> >> +
> >> +  if (tb[TCA_TBS_PARMS] == NULL)
> >> +  return -1;
> >> +
> >> +  qopt = RTA_DATA(tb[TCA_TBS_PARMS]);
> >> +  if (RTA_PAYLOAD(tb[TCA_TBS_PARMS])  < sizeof(*qopt))
> >> +  return -1;
> >> +
> >> +  fprintf(f, "clockid ");
> >> +  if (qopt->clockid == CLOCKID_INVALID)
> >> +  fprintf(f, "invalid ");
> >> +  else
> >> +  fprintf(f, "%d ", qopt->clockid);
> >> +
> >> +  fprintf(f, "delta %d ", qopt->delta);
> >> +  fprintf(f, "offload %s ", (qopt->flags & TC_TBS_OFFLOAD_ON) ?
> >> +  "on" : "off");
> >> +  fprintf(f, "sorting %s", (qopt->flags & TC_TBS_SORTING_ON) ?
> >> +  "on" : "off");
> >> +
> >> +  return 0;
> >> +}  
> > 
> > All new print code in iproute2 should support JSON output.
> > Look at other code using json_print.h for simple way to handle this.
> >   
> 
> 
> Fixed, thanks. I'm assuming that only applies to print code from print_qopt()
> implementations. Please let me know if otherwise.
> 

Yes. that is what gets invoked by 'tc qdisc show'.
Not everything is updated, but want to get there soon.


[PATCH] net: phy: Move interrupt check from phy_change to phy_interrupt

2018-03-07 Thread Brad Mouring
If multiple phys share the same interrupt (e.g. a multi-phy chip),
the first device registered is the only one checked as phy_interrupt
will always return IRQ_HANDLED if the first phydev is not halted.
Move the interrupt check into phy_interrupt and, if it was not this
phydev, return IRQ_NONE to allow other devices on this irq a chance
to check if it was their interrupt.

Signed-off-by: Brad Mouring 
---
 drivers/net/phy/phy.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index e3e29c2b028b..ff1aa815568f 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -632,6 +632,12 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
if (PHY_HALTED == phydev->state)
return IRQ_NONE;/* It can't be ours.  */
 
+   if (phy_interrupt_is_valid(phydev)) {
+   if (phydev->drv->did_interrupt &&
+   !phydev->drv->did_interrupt(phydev))
+   return IRQ_NONE;
+   }
+
phy_change(phydev);
 
return IRQ_HANDLED;
@@ -725,16 +731,6 @@ EXPORT_SYMBOL(phy_stop_interrupts);
  */
 void phy_change(struct phy_device *phydev)
 {
-   if (phy_interrupt_is_valid(phydev)) {
-   if (phydev->drv->did_interrupt &&
-   !phydev->drv->did_interrupt(phydev))
-   return;
-
-   if (phydev->state == PHY_HALTED)
-   if (phy_disable_interrupts(phydev))
-   goto phy_err;
-   }
-
mutex_lock(&phydev->lock);
if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state))
phydev->state = PHY_CHANGELINK;
-- 
2.16.2
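
Editor's sketch: the idea in the patch above, that a handler on a shared interrupt line should report "not mine" when its own device did not raise the interrupt, can be modelled in user-space C. Everything here is a hypothetical stand-in: struct sketch_phy, its fields, the SKETCH_IRQ_* values and the simplified dispatcher are invented for illustration and are not the kernel's phylib or IRQ core.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel's handler return codes. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

/* Hypothetical PHY state: just enough to observe who handled what. */
struct sketch_phy {
	bool halted;		/* device is stopped */
	bool irq_pending;	/* what a did_interrupt() hook would say */
	int changes;		/* times the link-change work ran */
};

/* Per-device handler: claim the interrupt only if it is really ours. */
enum sketch_irqreturn sketch_phy_interrupt(struct sketch_phy *phy)
{
	if (phy->halted)
		return SKETCH_IRQ_NONE;	/* it can't be ours */
	if (!phy->irq_pending)
		return SKETCH_IRQ_NONE;	/* another device on this line */

	phy->irq_pending = false;
	phy->changes++;			/* stands in for phy_change() */
	return SKETCH_IRQ_HANDLED;
}

/* Simplified dispatcher: offer the interrupt to every handler. */
int sketch_dispatch_shared_irq(struct sketch_phy *phys, size_t n)
{
	int handled = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (sketch_phy_interrupt(&phys[i]) == SKETCH_IRQ_HANDLED)
			handled++;
	return handled;
}
```

With two PHYs on one line and only the second one pending, the first handler returns SKETCH_IRQ_NONE without running any link-change work, and the second one claims the interrupt.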



Re: [PATCH bpf] bpf, x64: increase number of passes

2018-03-07 Thread Alexei Starovoitov
On Wed, Mar 07, 2018 at 10:10:01PM +0100, Daniel Borkmann wrote:
> In Cilium some of the main programs we run today are hitting 9 passes
> on x64's JIT compiler, and we've had cases already where we surpassed
> the limit where the JIT then punts the program to the interpreter
> instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON
> or insertion failures due to the prog array owner being JITed but the
> program to insert not (both must have the same JITed/non-JITed property).
> 
> In one concrete case the program image shrunk from 12,767 bytes down to
> 10,288 bytes where the image converged after 16 steps. I've measured
> that this took 340us in the JIT until it converges on my i7-6600U. Thus,
> increase the original limit we had from day one where the JIT covered
> cBPF only back then before we run into the case (as similar with the
> complexity limit) where we trip over this and hit program rejections.
> Also add a cond_resched() into the compilation loop, the JIT process
> runs without any locks and may sleep anyway.
> 
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Applied to bpf tree, Thanks Daniel!



Re: [RFC v3 net-next 08/18] net: SO_TXTIME: Add clockid and drop_if_late params

2018-03-07 Thread Eric Dumazet
On Wed, 2018-03-07 at 13:52 -0800, Jesus Sanchez-Palencia wrote:
> Hi,
...
> I should have mentioned on the commit msg, but the tc_drop_if_late is
> actually filling a 1 bit hole that was already there.
> 
> 
> > 
> > Do we really need 32 bits for a clockid_t ?
> 
> There is a 2 bytes hole just after tc_index, so a u16 clockid would fit
> perfectly without increasing the skbuffs size / cachelines any further.
> 
> From Richard's reply, it seems safe to just change the definition here
> if we make it explicit on the SCM_CLOCKID documentation the caveat about
> the max possible fd count for dynamic clocks.
> 
> How does that sound?

Not convincing really :/

Next big feature needing one bit in sk_buff will add it, and add a
63bit hole.

Then next feature(s) will happily consume 'because there are holes
anyway'.

Then at some point we will cross cache line boundary and performance
will take a 10 % hit.

It is a never ending trend.

If you really need 33 bits, then maybe we'll ask you to guard the new
bits with some #if IS_ENABLED(CONFIG_...) so that we can opt-out.

Why do we _really_ need dynamic clocks being supported in core
networking stack, other than 'that is needed to send 2 packets per
second with precise departure time and arbitrary user defined clocks,
so let's do that, and not care about the other 10,000,000 packets we
receive/send per second'?

I have one patch (TXCS, something that I called XPS in the past)
implementing the remote-freeing of skbs that help workloads where skb
are produced on cpu A and consumed on cpu B,
using an additional 16bit field that I have not upstreamed yet (even if
Mellanox folks want that), simply because of this additional field...

Maybe I should eat this hole before you take it ?

No, we need to be extra careful.
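
Editor's sketch: the "hole" argument in this thread can be checked with offsetof() and sizeof() on a toy struct. The two structs below are hypothetical and much smaller than the real sk_buff; only the tc_index and clockid names come from the discussion, and the trailing hash field is invented to give the struct a 4-byte-aligned neighbour. The sketch assumes a mainstream ABI where uint32_t is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit clockid: slots into the space right after tc_index. */
struct sketch_tail_u16 {
	uint16_t tc_index;
	uint16_t clockid;
	uint32_t hash;		/* hypothetical neighbouring field */
};

/* 32-bit clockid: alignment forces padding after tc_index. */
struct sketch_tail_u32 {
	uint16_t tc_index;
	uint32_t clockid;
	uint32_t hash;		/* hypothetical neighbouring field */
};

/* Bytes of padding between tc_index and clockid, 16-bit variant. */
size_t sketch_hole_u16(void)
{
	return offsetof(struct sketch_tail_u16, clockid) - sizeof(uint16_t);
}

/* Bytes of padding between tc_index and clockid, 32-bit variant. */
size_t sketch_hole_u32(void)
{
	return offsetof(struct sketch_tail_u32, clockid) - sizeof(uint16_t);
}
```

Under that assumption the 16-bit variant has no padding and occupies 8 bytes, while the 32-bit variant gains a 2-byte hole and grows to 12 bytes. On the real structure, pahole is the usual tool for inspecting such layouts.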



[PATCH 1/2] net: macb: Add phy-handle DT support

2018-03-07 Thread Brad Mouring
This optional binding (as described in the ethernet DT bindings doc)
directs the netdev to the phydev to use. This is useful for a phy
chip that has >1 phy in it, and two netdevs are using the same phy
chip (i.e. the second mac's phy lives on the first mac's MDIO bus)

The devicetree snippet would look something like this:

ethernet@feedf00d {
	...
	phy-handle = <&phy0> // the first netdev is physically wired to phy0
	...
	phy0: phy@0 {
		...
		reg = <0x0> // MDIO address 0
		...
	}
	phy1: phy@1 {
		...
		reg = <0x1> // MDIO address 1
		...
	}
	...
}

ethernet@deadbeef {
	...
	phy-handle = <&phy1> // tells the driver to use phy1 on the
			     // first mac's mdio bus (it's wired thusly)
	...
}

The work done to add the phy_node in the first place (dacdbb4dfc1a1:
"net: macb: add fixed-link node support") will consume the
device_node (if found).

Signed-off-by: Brad Mouring 
---
 drivers/net/ethernet/cadence/macb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e84afcf1ecb5..cc5b9e6e3526 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -567,6 +567,9 @@ static int macb_mii_init(struct macb *bp)
 
err = mdiobus_register(bp->mii_bus);
} else {
+   /* attempt to find a phy-handle */
+   bp->phy_node = of_parse_phandle(np, "phy-handle", 0);
+
/* try dt phy registration */
err = of_mdiobus_register(bp->mii_bus, np);
 
-- 
2.16.2



[PATCH v2 net] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()

2018-03-07 Thread Greg Hackmann
f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
__this_cpu_read() call inside ipcomp_alloc_tfms().

At the time, __this_cpu_read() required the caller to either not care
about races or to handle preemption/interrupt issues.  3.15 tightened
the rules around some per-cpu operations, and now __this_cpu_read()
should never be used in a preemptible context.  On 3.15 and later, we
need to use this_cpu_read() instead.

syzkaller reported this leading to the following kernel BUG while
fuzzing sendmsg:

BUG: using __this_cpu_read() in preemptible [] code: repro/3101
caller is ipcomp_init_state+0x185/0x990
CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0xb9/0x115
 check_preemption_disabled+0x1cb/0x1f0
 ipcomp_init_state+0x185/0x990
 ? __xfrm_init_state+0x876/0xc20
 ? lock_downgrade+0x5e0/0x5e0
 ipcomp4_init_state+0xaa/0x7c0
 __xfrm_init_state+0x3eb/0xc20
 xfrm_init_state+0x19/0x60
 pfkey_add+0x20df/0x36f0
 ? pfkey_broadcast+0x3dd/0x600
 ? pfkey_sock_destruct+0x340/0x340
 ? pfkey_seq_stop+0x80/0x80
 ? __skb_clone+0x236/0x750
 ? kmem_cache_alloc+0x1f6/0x260
 ? pfkey_sock_destruct+0x340/0x340
 ? pfkey_process+0x62a/0x6f0
 pfkey_process+0x62a/0x6f0
 ? pfkey_send_new_mapping+0x11c0/0x11c0
 ? mutex_lock_io_nested+0x1390/0x1390
 pfkey_sendmsg+0x383/0x750
 ? dump_sp+0x430/0x430
 sock_sendmsg+0xc0/0x100
 ___sys_sendmsg+0x6c8/0x8b0
 ? copy_msghdr_from_user+0x3b0/0x3b0
 ? pagevec_lru_move_fn+0x144/0x1f0
 ? find_held_lock+0x32/0x1c0
 ? do_huge_pmd_anonymous_page+0xc43/0x11e0
 ? lock_downgrade+0x5e0/0x5e0
 ? get_kernel_page+0xb0/0xb0
 ? _raw_spin_unlock+0x29/0x40
 ? do_huge_pmd_anonymous_page+0x400/0x11e0
 ? __handle_mm_fault+0x553/0x2460
 ? __fget_light+0x163/0x1f0
 ? __sys_sendmsg+0xc7/0x170
 __sys_sendmsg+0xc7/0x170
 ? SyS_shutdown+0x1a0/0x1a0
 ? __do_page_fault+0x5a0/0xca0
 ? lock_downgrade+0x5e0/0x5e0
 SyS_sendmsg+0x27/0x40
 ? __sys_sendmsg+0x170/0x170
 do_syscall_64+0x19f/0x640
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f0ee73dfb79
RSP: 002b:7ffe14fc15a8 EFLAGS: 0207 ORIG_RAX: 002e
RAX: ffda RBX:  RCX: 7f0ee73dfb79
RDX:  RSI: 208befc8 RDI: 0004
RBP: 7ffe14fc15b0 R08: 7ffe14fc15c0 R09: 7ffe14fc15c0
R10:  R11: 0207 R12: 00400440
R13: 7ffe14fc16b0 R14:  R15: 

Signed-off-by: Greg Hackmann 
---
v2: expand commit log to clarify that kernel 3.15 and later are impacted

 net/xfrm/xfrm_ipcomp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index ccfdc7115a83..a00ec715aa46 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -283,7 +283,7 @@ static struct crypto_comp * __percpu *ipcomp_alloc_tfms(const char *alg_name)
struct crypto_comp *tfm;
 
/* This can be any valid CPU ID so we don't need locking. */
-   tfm = __this_cpu_read(*pos->tfms);
+   tfm = this_cpu_read(*pos->tfms);
 
if (!strcmp(crypto_comp_name(tfm), alg_name)) {
pos->users++;
-- 
2.16.2.395.g2e18187dfd-goog
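
Editor's sketch: a toy user-space model of the difference the commit message describes between the checked accessor and the preempt-safe one. This is not kernel code and does not use the real per-cpu API. The struct, the counter standing in for the debug report, and both sketch_* functions are hypothetical. The "safe" variant follows the idea of disabling preemption around the access, which is, as far as I understand it, how the generic fallback behaves; some architectures instead implement the access as a single instruction.

```c
#define SKETCH_NR_CPUS 4

/* Hypothetical scheduler view of the current task. */
struct sketch_task {
	int cpu;		/* CPU the task currently runs on */
	int preempt_count;	/* > 0 means preemption is disabled */
	int debug_reports;	/* times the debug check complained */
};

/* One slot per fake CPU, standing in for a per-cpu variable. */
int sketch_percpu[SKETCH_NR_CPUS] = { 10, 11, 12, 13 };

/*
 * Model of the double-underscore accessor: it expects the caller to
 * have disabled preemption already, and a debug build reports it
 * when that is not the case.
 */
int sketch_raw_cpu_read(struct sketch_task *t)
{
	if (t->preempt_count == 0)
		t->debug_reports++;	/* "BUG: using ... in preemptible" */
	return sketch_percpu[t->cpu];
}

/* Model of the preempt-safe accessor: pin the task around the read. */
int sketch_safe_cpu_read(struct sketch_task *t)
{
	int val;

	t->preempt_count++;		/* stands in for preempt_disable() */
	val = sketch_percpu[t->cpu];
	t->preempt_count--;		/* stands in for preempt_enable() */
	return val;
}
```

Both functions return the same value here; the difference is that the first one trips the debug check when called from a preemptible context, which corresponds to the report in the trace above.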



[PATCH 2/2] Documentation: macb: Document phy-handle optional binding

2018-03-07 Thread Brad Mouring
Signed-off-by: Brad Mouring 
---
 Documentation/devicetree/bindings/net/macb.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 27966ae741e0..457d5ae16f23 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -29,6 +29,7 @@ Optional properties for PHY child node:
 - reset-gpios : Should specify the gpio for phy reset
 - magic-packet : If present, indicates that the hardware supports waking
   up via magic packet.
+- phy-handle : see ethernet.txt file in the same directory
 
 Examples:
 
-- 
2.16.2



Re: [PATCH net-next] openvswitch: fix vport packet length check.

2018-03-07 Thread William Tu
On Wed, Mar 7, 2018 at 1:18 PM, Pravin Shelar  wrote:
> On Tue, Mar 6, 2018 at 5:56 PM, William Tu  wrote:
>> When sending a packet to a tunnel device, the dev's hard_header_len
>> could be larger than the skb->len in function packet_length().
>> In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
>> which is around 180, and an ARP packet sent to this tunnel has
>> skb->len = 42.  This causes the 'unsigned int length' to become very
>> large because the subtraction yields a negative value, causing the later
>> ovs_vport_send to drop the packet for exceeding the MTU.  The patch fixes
>> this by clamping the length to 0.
>>
>> Signed-off-by: William Tu 
>> ---
>>  net/openvswitch/vport.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
>> index b6c8524032a0..7718d5b4cf8a 100644
>> --- a/net/openvswitch/vport.c
>> +++ b/net/openvswitch/vport.c
>> @@ -467,7 +467,7 @@ int ovs_vport_receive(struct vport *vport, struct 
>> sk_buff *skb,
>>  static unsigned int packet_length(const struct sk_buff *skb,
>>   struct net_device *dev)
> Can you also change return type of this function?
>

OK, I will change to int. Thanks
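
For reference, a minimal userspace sketch of the wrap-around described in
the commit log, assuming a signed intermediate and a clamp at 0. The
function and parameter names below are made up for illustration and only
model skb->len and dev->hard_header_len; this is not the actual patch.

```c
/* Hypothetical model of the old behaviour: the subtraction is done
 * in unsigned arithmetic and wraps when hard_header_len > skb_len. */
static unsigned int packet_length_unsigned(unsigned int skb_len,
					   unsigned int hard_header_len)
{
	return skb_len - hard_header_len;
}

/* Hypothetical model of the fix: compute in a signed int and clamp,
 * so a short packet yields 0 instead of a huge length. */
static int packet_length_model(unsigned int skb_len,
			       unsigned int hard_header_len)
{
	int length = (int)skb_len - (int)hard_header_len;

	return length > 0 ? length : 0;
}
```

With the numbers from the commit log (skb_len = 42, hard_header_len = 180),
the unsigned variant wraps to a value far above any MTU, while the signed
variant returns 0.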


Re: [RFC v3 iproute2 3/3] tc: Add support for the TBS Qdisc

2018-03-07 Thread Jesus Sanchez-Palencia
Hi,


On 03/06/2018 05:51 PM, Stephen Hemminger wrote:
> On Tue,  6 Mar 2018 17:16:08 -0800
> Jesus Sanchez-Palencia  wrote:
> 
>> +static int tbs_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>> +{
>> +struct rtattr *tb[TCA_TBS_MAX+1];
>> +struct tc_tbs_qopt *qopt;
>> +
>> +if (opt == NULL)
>> +return 0;
>> +
>> +parse_rtattr_nested(tb, TCA_TBS_MAX, opt);
>> +
>> +if (tb[TCA_TBS_PARMS] == NULL)
>> +return -1;
>> +
>> +qopt = RTA_DATA(tb[TCA_TBS_PARMS]);
>> +if (RTA_PAYLOAD(tb[TCA_TBS_PARMS])  < sizeof(*qopt))
>> +return -1;
>> +
>> +fprintf(f, "clockid ");
>> +if (qopt->clockid == CLOCKID_INVALID)
>> +fprintf(f, "invalid ");
>> +else
>> +fprintf(f, "%d ", qopt->clockid);
>> +
>> +fprintf(f, "delta %d ", qopt->delta);
>> +fprintf(f, "offload %s ", (qopt->flags & TC_TBS_OFFLOAD_ON) ?
>> +"on" : "off");
>> +fprintf(f, "sorting %s", (qopt->flags & TC_TBS_SORTING_ON) ?
>> +"on" : "off");
>> +
>> +return 0;
>> +}
> 
> All new print code in iproute2 should support JSON output.
> Look at other code using json_print.h for simple way to handle this.
> 


Fixed, thanks. I'm assuming that only applies to print code from print_qopt()
implementations. Please let me know if otherwise.
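
Roughly, the idea is that one call site feeds both output modes. The
standalone sketch below only illustrates that pattern; it does not use the
real json_print.h helpers, and emit_int() and its key names are
hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for a json_print.h-style helper: a single
 * call produces either a JSON member or the legacy "key value " text,
 * depending on the json flag. Returns what snprintf() returns. */
static int emit_int(char *buf, size_t len, int json,
		    const char *key, int value)
{
	if (json)
		return snprintf(buf, len, "\"%s\":%d", key, value);
	return snprintf(buf, len, "%s %d ", key, value);
}
```

For example, emit_int(buf, sizeof(buf), 0, "delta", 5) keeps the existing
"delta 5 " text form, while passing json = 1 yields a "delta" member
suitable for a JSON object.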



RE: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address support for ethtool nftuple filters

2018-03-07 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Vinicius Costa Gomes
> Sent: Tuesday, March 6, 2018 5:30 PM
> To: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus  palen...@intel.com>
> Subject: [Intel-wired-lan] [next-queue PATCH v3 6/8] igb: Add MAC address
> support for ethtool nftuple filters
> 
> This adds the capability of configuring the queue steering of arriving
> packets based on their source and destination MAC addresses.
> 
> In practical terms this adds support for the following use cases,
> characterized by these examples:
> 
> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
> to the RX queue 0)
> 
> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
> (this will direct packets with destination address "44:44:44:44:44:44"
> to the RX queue 3)

I assume this example should read "... source address" rather than
"... destination".

> 
> Signed-off-by: Vinicius Costa Gomes 
> ---
>  drivers/net/ethernet/intel/igb/igb_ethtool.c | 35
> 
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> index 94fc9a4bed8b..3f98299d4cd0 100644
> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> @@ -2494,6 +2494,23 @@ static int igb_get_ethtool_nfc_entry(struct
> igb_adapter *adapter,
>   fsp->h_ext.vlan_tci = rule->filter.vlan_tci;
>   fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
>   }
> + if (rule->filter.match_flags &
> IGB_FILTER_FLAG_DST_MAC_ADDR) {
> + ether_addr_copy(fsp->h_u.ether_spec.h_dest,
> + rule->filter.dst_addr);
> + /* As we only support matching by the full
> +  * mask, return the mask to userspace
> +  */
> + eth_broadcast_addr(fsp->m_u.ether_spec.h_dest);
> + }
> + if (rule->filter.match_flags &
> IGB_FILTER_FLAG_SRC_MAC_ADDR) {
> + ether_addr_copy(fsp->h_u.ether_spec.h_source,
> + rule->filter.src_addr);
> + /* As we only support matching by the full
> +  * mask, return the mask to userspace
> +  */
> + eth_broadcast_addr(fsp-
> >m_u.ether_spec.h_source);
> + }
> +
>   return 0;
>   }
>   return -EINVAL;
> @@ -2932,10 +2949,6 @@ static int igb_add_ethtool_nfc_entry(struct
> igb_adapter *adapter,
>   if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW)
>   return -EINVAL;
> 
> - if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK &&
> - fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK))
> - return -EINVAL;
> -
>   input = kzalloc(sizeof(*input), GFP_KERNEL);
>   if (!input)
>   return -ENOMEM;
> @@ -2945,6 +2958,20 @@ static int igb_add_ethtool_nfc_entry(struct
> igb_adapter *adapter,
>   input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE;
>   }
> 
> + /* Only support matching addresses by the full mask */
> + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) {
> + input->filter.match_flags |=
> IGB_FILTER_FLAG_SRC_MAC_ADDR;
> + ether_addr_copy(input->filter.src_addr,
> + fsp->h_u.ether_spec.h_source);
> + }
> +
> + /* Only support matching addresses by the full mask */
> + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) {
> + input->filter.match_flags |=
> IGB_FILTER_FLAG_DST_MAC_ADDR;
> + ether_addr_copy(input->filter.dst_addr,
> + fsp->h_u.ether_spec.h_dest);
> + }
> +
>   if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
>   if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) {
>   err = -EINVAL;
> --
> 2.16.2
> 
> ___
> Intel-wired-lan mailing list
> intel-wired-...@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
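
As a side note on the "full mask only" checks in the quoted hunks, here is a
minimal userspace sketch of the same idea. It is not the driver code: the
flag values and function names are hypothetical, and it assumes that a mask
counts only when it is ff:ff:ff:ff:ff:ff, with any other mask leaving the
corresponding flag unset.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical flag values, for illustration only. */
#define FLAG_SRC_MAC 0x1u
#define FLAG_DST_MAC 0x2u

/* True only for ff:ff:ff:ff:ff:ff, i.e. "match every bit". */
static bool is_full_mask(const uint8_t mask[ETH_ALEN])
{
	static const uint8_t full[ETH_ALEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	return memcmp(mask, full, ETH_ALEN) == 0;
}

/* Decide which address fields a rule matches on, given its masks.
 * A partial or empty mask simply leaves the flag unset. */
static unsigned int match_flags_from_masks(const uint8_t src_mask[ETH_ALEN],
					   const uint8_t dst_mask[ETH_ALEN])
{
	unsigned int flags = 0;

	if (is_full_mask(src_mask))
		flags |= FLAG_SRC_MAC;
	if (is_full_mask(dst_mask))
		flags |= FLAG_DST_MAC;
	return flags;
}
```

In this model a rule with a full source mask and an all-zero destination
mask matches on the source address only, which corresponds to the
"src 44:44:44:44:44:44" example above.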


Re: [next-queue PATCH v3 8/8] igb: Add support for adding offloaded clsflower filters

2018-03-07 Thread Vinicius Costa Gomes
Hi,

Jakub Kicinski  writes:

> On Tue,  6 Mar 2018 17:29:57 -0800, Vinicius Costa Gomes wrote:
>> This allows filters added by tc-flower and specifying MAC addresses,
>> Ethernet types, and the VLAN priority field, to be offloaded to the
>> controller.
>> 
>> This reuses most of the infrastructure used by ethtool, but clsflower
>> filters are kept in a separated list, so they are invisible to
>> ethtool.
>> 
>> Signed-off-by: Vinicius Costa Gomes 
>
> LGTM, thanks!
>
>> +NL_SET_ERR_MSG(extack,
>
> One nit: consider using NL_SET_ERR_MSG_MOD to prefix the message with
> driver name.

Sure thing. Will send a v4 with this shortly.


Cheers,
--
Vinicius


Re: [PATCH net 2/5] tcp: prevent bogus FRTO undos with non-SACK flows

2018-03-07 Thread Ilpo Järvinen
On Wed, 7 Mar 2018, Yuchung Cheng wrote:
> On Wed, Mar 7, 2018 at 11:24 AM, Neal Cardwell  wrote:
> > On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen  
> > wrote:
> > >
> > > In a non-SACK case, any non-retransmitted segment acknowledged will
> > > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > > no indication that it would have been delivered for real (the
> > > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > > case). This causes bogus undos in ordinary RTO recoveries where
> > > segments are lost here and there, with a few delivered segments in
> > > > between losses. A cumulative ACK will cover retransmitted ones at
> > > > the bottom and the non-retransmitted ones following that, causing
> > > > FLAG_ORIG_SACK_ACKED to be set in tcp_clean_rtx_queue, which results
> > > > in a spurious FRTO undo.
> > >
> > > We need to make the check more strict for non-SACK case and check
> > > that none of the cumulatively ACKed segments were retransmitted,
> > > which would be the case for the last step of FRTO algorithm as we
> > > sent out only new segments previously. Only then, allow FRTO undo
> > > to proceed in non-SACK case.
> >
> > Hi Ilpo - Do you have a packet trace or (even better) packetdrill
> > script illustrating this issue? It would be nice to have a test case
> > or at least concrete example of this.
>
> a packetdrill or even a contrived example would be good ...

I've seen all of the issues except this one for sure in packet traces.
I'm somewhat old-school, though: while looking for the burst issue I
found this one purely by reading the code (which made it more than
_one_ issue). I think I later saw this issue in the traces as well
(it did not seem to match any of the other burst issues this whole
series is trying to fix), but finding that dump again would take a
really long time; I have more than enough of them from our recent
tests ;-).

But anyway, that was before the condition was recently moved into the
tp->frto block, so it might no longer be triggerable. It clearly was
triggerable beforehand, without the tp->frto guard (and I just
forward-ported past that recent change without thinking about it much).

To trigger it, ever-R and !ever-R skb would need to be cumulatively 
ACKed when tp->frto is non-zero. Do you think that is still possible
with FRTO? E.g., after some undo leaving some ever-R and then RTO 
resulting in FRTO procedure?

> also why not just avoid setting FLAG_ORIG_SACK_ACKED on non-sack? seems
> a much cleaner fix.

I guess that would work now that the relevant FRTO condition got moved
into the tp->frto block. It wouldn't have been that simple earlier
as SACK wanted FLAG_ORIG_SACK_ACKED while non-SACK wants
FLAG_ONLY_ORIG_ACKED (that was already available through a combination
of the existing FLAGs).
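
To make the stricter non-SACK condition concrete, here is a standalone
model of it. This is not kernel code: struct seg and only_orig_acked() are
names made up for illustration, with ever_retrans standing in for "ever-R",
and it assumes the undo should be allowed only when a cumulative ACK covers
at least one segment and none of the covered segments was ever
retransmitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment state; ever_retrans models "ever-R". */
struct seg {
	bool ever_retrans;
};

/* Returns true only if at least one segment was cumulatively ACKed
 * and none of the ACKed segments was ever retransmitted. */
static bool only_orig_acked(const struct seg *acked, size_t n)
{
	size_t i;

	if (n == 0)
		return false;
	for (i = 0; i < n; i++)
		if (acked[i].ever_retrans)
			return false;
	return true;
}
```

A cumulative ACK covering a mix of ever-R and !ever-R segments returns
false here, which is the case that currently leads to the bogus undo.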


-- 
 i.
