Re: [PATCH rdma-next 0/3] Packet based credit mode

2018-12-07 Thread Jason Gunthorpe
On Fri, Dec 07, 2018 at 08:04:26AM +0200, Leon Romanovsky wrote:
> On Thu, Dec 06, 2018 at 08:27:06PM -0700, Jason Gunthorpe wrote:
> > On Fri, Nov 30, 2018 at 01:22:03PM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > >From Danit,
> > >
> > > Packet based credit mode is an alternative end-to-end credit mode for QPs
> > > set during their creation. Credits are transported from the responder
> > > to the requester to optimize the use of its receive resources.
> > > In packet-based credit mode, credits are issued on a per packet basis.
> > >
> > > The advantage of this feature comes while sending large RDMA messages
> > > through switches that are short in memory.
> > >
> > > The first commit exposes QP creation flag and the HCA capability. The
> > > second commit adds support for a new DV QP creation flag. The last
> > > commit report packet based credit mode capability via the MLX5DV device
> > > capabilities.
> > >
> > > Thanks
> > >
> > > Danit Goldberg (3):
> > >   net/mlx5: Expose packet based credit mode
> > >   IB/mlx5: Add packet based credit mode support
> > >   IB/mlx5: Report packet based credit mode device capability
> >
> > This looks fine to me, can you update the shared branch please
> 
> Done, thanks
> 3fd3c80acc17 net/mlx5: Expose packet based credit mode

Applied to for-next

Thanks,
Jason


Re: [PATCH rdma-next 0/3] Packet based credit mode

2018-12-06 Thread Jason Gunthorpe
On Fri, Nov 30, 2018 at 01:22:03PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> >From Danit,
> 
> Packet based credit mode is an alternative end-to-end credit mode for QPs
> set during their creation. Credits are transported from the responder
> to the requester to optimize the use of its receive resources.
> In packet-based credit mode, credits are issued on a per packet basis.
> 
> The advantage of this feature comes while sending large RDMA messages
> through switches that are short in memory.
> 
> The first commit exposes QP creation flag and the HCA capability. The
> second commit adds support for a new DV QP creation flag. The last
> commit report packet based credit mode capability via the MLX5DV device
> capabilities.
> 
> Thanks
> 
> Danit Goldberg (3):
>   net/mlx5: Expose packet based credit mode
>   IB/mlx5: Add packet based credit mode support
>   IB/mlx5: Report packet based credit mode device capability

This looks fine to me, can you update the shared branch please

Thanks,
Jason


Re: [PATCH rdma-next 0/7] Enrich DEVX support

2018-12-04 Thread Jason Gunthorpe
On Tue, Dec 4, 2018 at 12:02 PM Doug Ledford  wrote:
>
> On Mon, 2018-11-26 at 08:28 +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> >
> > From Yishai,
> > ---
> > This series enriches DEVX support in few aspects: it enables 
> > interoperability
> > between DEVX and verbs and improves mechanism for controlling privileged 
> > DEVX
> > commands.
> >
> > The first patch updates mlx5 ifc file.
> >
> > Next 3 patches enable modifying and querying verbs objects via the DEVX
> > interface.
> >
> > To achieve that the core layer introduced the 'UVERBS_IDR_ANY_OBJECT' type
> > to match any IDR object. Once it's used by some driver's method, the
> > infrastructure skips checking for the IDR type and it becomes the driver
> > handler responsibility.
> >
> > The DEVX methods of modify and query were changed to get any object type via
> > the 'UVERBS_IDR_ANY_OBJECT' mechanism. The type checking is done per object 
> > as
> > part of the driver code.
> >
> > The next 3 patches introduce more robust mechanism for controlling 
> > privileged
> > DEVX commands. The responsibility to block/allow per command was moved to be
> > done in the firmware based on the UID credentials that the driver reports 
> > upon
> > user context creation. This enables more granularity per command based on 
> > the
> > device security model and the user credentials.
> >
> > In addition, by introducing a valid range for 'general commands' we prevent 
> > the
> > need to touch the driver's code any time that a new future command will be
> > added.
> >
> > The last patch fixes the XRC verbs flow once a DEVX context is used. This is
> > needed as XRCD is some shared kernel resource and as such a kernel UID (=0)
> > should be used in its related resources.
> >
> > Thanks
> >
> > Yishai Hadas (7):
> >   net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits
> >   IB/core: Introduce UVERBS_IDR_ANY_OBJECT
> >   IB/core: Enable getting an object type from a given uobject
> >   IB/mlx5: Enable modify and query verbs objects via DEVX
> >   IB/mlx5: Enforce DEVX privilege by firmware
> >   IB/mlx5: Update the supported DEVX commands
> >   IB/mlx5: Allow XRC usage via verbs in DEVX context
> >
> >  drivers/infiniband/core/rdma_core.c   |  27 +++--
> >  drivers/infiniband/core/rdma_core.h   |  21 ++--
> >  drivers/infiniband/core/uverbs_uapi.c |  10 +-
> >  drivers/infiniband/hw/mlx5/devx.c | 142 ++
> >  drivers/infiniband/hw/mlx5/main.c |   4 +-
> >  drivers/infiniband/hw/mlx5/mlx5_ib.h  |   6 +-
> >  drivers/infiniband/hw/mlx5/qp.c   |  12 +--
> >  drivers/infiniband/hw/mlx5/srq.c  |   2 +-
> >  include/linux/mlx5/mlx5_ifc.h |  26 -
> >  include/rdma/uverbs_ioctl.h   |   6 ++
> >  include/rdma/uverbs_std_types.h   |  12 +++
> >  11 files changed, 215 insertions(+), 53 deletions(-)
> >
> > --
> > 2.19.1
> >
>
> Hi Leon,
>
> I've put this into my wip/dl-for-next branch.  Because it pulled in the
> mlx5-next tree (which was significant because the patch we wanted was at
> the tip and a whole slew of other patches were before it),

Yes, this is how the shared branch works, DaveM merges it, we don't
have to unless we have a dependency, so it can accumulate a bit.

> the changeset
> for this patch looks much bigger.  And it had the merge conflict you
> mentioned.  You might want to double check my work before it gets moved
> over to the official for-next branch.

It is a bit easier to follow the git log if merges are done with the
--log option to summarize what was pulled in.

Jason


Re: [PATCH RFC iproute2-next 1/2] rdma: add 'link add/delete' commands

2018-11-28 Thread Jason Gunthorpe
On Wed, Nov 28, 2018 at 04:21:48PM -0600, Steve Wise wrote:

> >> It does make sense to not require type.  The name must be unique so that
> >> should be enough.  I'll have to respin the kernel side though...
> > The delete_link really should be an operation on the ib_device, not
> > the link_ops thing. 
> >
> > That directly prevents mis-matching function callbacks..
> >
> > Jason
> Looking at the rtnetlink newlink/dellink, I see they cache the link_ops
> ptr in the net_device struct.  So when the link is deleted, then
> appropriate driver-specific dellink function can be called after finding
> the device to be deleted.  Should I do something along these lines?  IE
> add a struct rdma_link_ops pointer to struct ib_device.

I don't see a problem with that either..

Jason


Re: [PATCH RFC iproute2-next 1/2] rdma: add 'link add/delete' commands

2018-11-28 Thread Jason Gunthorpe
On Wed, Nov 28, 2018 at 02:18:55PM -0600, Steve Wise wrote:
> 
> 
> On 11/28/2018 2:13 PM, Leon Romanovsky wrote:
> > On Wed, Nov 28, 2018 at 02:07:29PM -0600, Steve Wise wrote:
> >>
> >> On 11/28/2018 2:04 PM, Leon Romanovsky wrote:
> >>> On Wed, Nov 28, 2018 at 01:08:05PM -0600, Steve Wise wrote:
>  On 11/28/2018 12:26 PM, Leon Romanovsky wrote:
> > On Thu, Sep 13, 2018 at 10:19:21AM -0700, Steve Wise wrote:
> >> Add new 'link' subcommand 'add' and 'delete' to allow binding a 
> >> soft-rdma
> >> device to a netdev interface.
> >>
> >> EG:
> >>
> >> rdma link add rxe_eth0 type rxe dev eth0
> >> rdma link delete rxe_eth0
> >>
> >> Signed-off-by: Steve Wise 
> >>  rdma/link.c  | 106 
> >> +++
> >>  rdma/rdma.h  |   1 +
> >>  rdma/utils.c |   2 +-
> >>  3 files changed, 108 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/rdma/link.c b/rdma/link.c
> >> index 7a6d4b7e356d..d4f76b0ce11f 100644
> >> +++ b/rdma/link.c
> >> @@ -14,6 +14,8 @@
> >>  static int link_help(struct rd *rd)
> >>  {
> >>pr_out("Usage: %s link show [DEV/PORT_INDEX]\n", rd->filename);
> >> +  pr_out("Usage: %s link add NAME type TYPE dev DEV\n", 
> >> rd->filename);
> > I suggest to rename "dev" to be "netdev", because we are using "dev" for
> > ib devices.
>  Yea ok.
> 
> >> +  pr_out("Usage: %s link delete NAME type TYPE\n", rd->filename);
> > Why do you need "type" for "delete" command?
>  Because the type is used in the kernel to find the appropriate link
>  ops.  I could change the kernel side to search all types for the device
>  name to delete? 
> >>> I would say, yes.
> >>> It makes "delete" operation more natural.
> >>>
> >>> Thanks
> >> Perhaps.
> >>
> >> Note: 'ip link delete' takes a type as well...
> > According to man section, yes.
> > According to various guides, no.
> > https://docs.fedoraproject.org/en-US/Fedora/20/html/Networking_Guide/sec-Configure_802_1Q_VLAN_Tagging_ip_Commands.html
> >
> > Thanks
> 
> It does make sense to not require type.  The name must be unique so that
> should be enough.  I'll have to respin the kernel side though...

The delete_link really should be an operation on the ib_device, not
the link_ops thing. 

That directly prevents mis-matching function callbacks..

Jason


Re: [PATCH V2 mlx5-next 00/12] mlx5 core generic EQ API for RDMA ODP

2018-11-21 Thread Jason Gunthorpe
On Tue, Nov 20, 2018 at 11:11:16AM -0700, Leon Romanovsky wrote:
> On Mon, Nov 19, 2018 at 10:52:30AM -0800, Saeed Mahameed wrote:
> > Hi,
> >
> > This patchset is for mlx5-next shared branch, and will be applied there
> > once the review is done.
> >
> > This patchset introduces mostly refactoring work and EQ related code 
> > updates to
> > allow moving the ODP rdma only logic from mlx5_core into mlx5 ib where it
> > belongs, and will allow future updates and optimizations for the rdma ODP
> > (On Demand Paging) feature to go only to rdma tree.
> >
> > Patch #1: Fixes the offsets of stored irq affinity hints inside mlx5
> > irq info array.
> >
> > Patch #2,3,4: Remove unused fields, code and logic
> >
> > Patch #5: Move all EQ related logic from main.c to eq.c to allow clear
> > and seamless refactoring for creating generic EQ management API.
> >
> > Patch #6: Create mlx5 core EQs in one place, in order to have one entry
> > point to call from main file.
> >
> > Patch #7,8: Move EQ related structures into eq_table mlx5 structure and
> > make eq_table fields and logic private to eq.c file.
> >
> > Patch #9,10: Create one generic EQ struct and use it in different
> > EQ types (usages) e.g. (Async, Command, FW pages, completion and ODP)
> > Introduce generic EQ API to allow creating Generic EQs regardless of
> > their types, will be uesd to create all mlx5 core EQs in mlx5_core and
> > ODP EQ in mlx5_ib.
> >
> > Patch #11: Move ODP logic out from mlx5_core eq.c into mlx5 rdma driver.
> > odp.c file.
> >
> > Patch #12: Make the trivial EQE access methods inline.
> >
> > v1->v2:
> > - Remove vertical alignment
> > - Fix spilling "Chip" -> "Cheap"
> >
> > Thanks,
> > Saeed.
> >
> >
> > Saeed Mahameed (12):
> >   net/mlx5: EQ, Use the right place to store/read IRQ affinity hint
> >   net/mlx5: EQ, Remove unused fields and structures
> >   net/mlx5: EQ, No need to store eq index as a field
> >   net/mlx5: EQ, Remove redundant completion EQ list lock
> >   net/mlx5: EQ, Move all EQ logic to eq.c
> >   net/mlx5: EQ, Create all EQs in one place
> >   net/mlx5: EQ, irq_info and rmap belong to eq_table
> >   net/mlx5: EQ, Privatize eq_table and friends
> >   net/mlx5: EQ, Different EQ types
> >   net/mlx5: EQ, Generic EQ
> >   {net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA
> >   net/mlx5: EQ, Make EQE access methods inline
> >
> 
> Pushed to shared mlx5-next:
> 
> 6d2d6fc83a28 net/mlx5: EQ, Make EQE access methods inline
> d5d284b829a6 {net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA
> 7701707cb94e net/mlx5: EQ, Generic EQ
> 16d760839cee net/mlx5: EQ, Different EQ types
> f2f3df550139 net/mlx5: EQ, Privatize eq_table and friends
> d674a9aa4344 net/mlx5: EQ, irq_info and rmap belong to eq_table
> c8e21b3b576b net/mlx5: EQ, Create all EQs in one place
> ca828cb4686f net/mlx5: EQ, Move all EQ logic to eq.c
> aaa553a64438 net/mlx5: EQ, Remove redundant completion EQ list lock
> 2883f352571b net/mlx5: EQ, No need to store eq index as a field
> 4de45c758636 net/mlx5: EQ, Remove unused fields and structures
> 1e86ace4c140 net/mlx5: EQ, Use the right place to store/read IRQ affinity hint

Okay, merged to rdma-next now

Thanks,
Jason


Re: [PATCH V2 mlx5-next 11/12] {net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA

2018-11-20 Thread Jason Gunthorpe
On Mon, Nov 19, 2018 at 10:52:41AM -0800, Saeed Mahameed wrote:
> Use the new generic EQ API to move all ODP RDMA data structures and logic
> form mlx5 core driver into mlx5_ib driver.
> 
> Signed-off-by: Saeed Mahameed 
> Reviewed-by: Leon Romanovsky 
> Reviewed-by: Tariq Toukan 
> ---
>  drivers/infiniband/hw/mlx5/main.c |  10 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h  |  15 +-
>  drivers/infiniband/hw/mlx5/odp.c  | 281 +-
>  drivers/net/ethernet/mellanox/mlx5/core/dev.c |  34 ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 252 
>  .../net/ethernet/mellanox/mlx5/core/lib/eq.h  |   8 -
>  .../net/ethernet/mellanox/mlx5/core/main.c|  17 +-
>  .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 -
>  include/linux/mlx5/driver.h   |  49 ---
>  include/linux/mlx5/eq.h   |  21 ++
>  10 files changed, 308 insertions(+), 381 deletions(-)

Acked-by: Jason Gunthorpe 

Can you let me know when these land in the shared branch please?

Thanks,
Jason


Re: [PATCH mlx5-next 00/10] Collection of ODP fixes

2018-11-19 Thread Jason Gunthorpe
On Mon, Nov 12, 2018 at 01:43:15PM -0700, Leon Romanovsky wrote:
> On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> >
> > Hi,
> >
> > This is collection of fixes to mlx5_core and mlx5_ib ODP logic.
> > There are two main reasons why we decided to forward it to mlx5-next
> > and not to rdma-rc or net like our other fixes.
> >
> > 1. We have large number of patches exactly in that area that are ready
> > for submission and we don't want to create extra merge work for
> > maintainers due to that. Among those future series (already ready and
> > tested): internal change of mlx5_core pagefaults handling, moving ODP
> > code to RDMA, change in EQ mechanism, simplification and moving SRQ QP
> > to RDMA, extra fixes to mlx5_ib ODP and ODP SRQ support.
> >
> > 2. Most of those fixes are for old bugs and there is no rush to fix them
> > right now (in -rc1/-rc2).
> >
> > Thanks
> >
> > Moni Shoua (10):
> >   net/mlx5: Release resource on error flow
> >   net/mlx5: Add interface to hold and release core resources
> >   net/mlx5: Enumerate page fault types
> >   IB/mlx5: Lock QP during page fault handling
> >   net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
> >   net/mlx5: Use multi threaded workqueue for page fault handling
> >   IB/mlx5: Improve ODP debugging messages
> 
> Applied to mlx5-next
> b02394aa75e3 IB/mlx5: Improve ODP debugging messages
> 90290db7669b net/mlx5: Use multi threaded workqueue for page fault handling
> ef90c5e9757d net/mlx5: Return success for PAGE_FAULT_RESUME in internal error 
> state
> 032080ab43ac IB/mlx5: Lock QP during page fault handling
> c99fefea2cc9 net/mlx5: Enumerate page fault types
> 27e95603f4df net/mlx5: Add interface to hold and release core resources
> 698114968a22 net/mlx5: Release resource on error flow
> 
> >   IB/mlx5: Avoid hangs due to a missing completion
> >   IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
> 
> Dropped to address feedback.
> 
> >   net/mlx5: Remove unused function
> 
> Dropped, because it depends on dropped commit.

Okay, merged into for-next

Jason


Re: [PATCH mlx5-next 11/12] {net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA

2018-11-17 Thread Jason Gunthorpe
On Fri, Nov 16, 2018 at 01:59:00PM -0800, Saeed Mahameed wrote:
> Use the new generic EQ API to move all ODP RDMA data structures and logic
> form mlx5 core driver into mlx5_ib driver.
> 
> Signed-off-by: Saeed Mahameed 
> Reviewed-by: Leon Romanovsky 
> Reviewed-by: Tariq Toukan 
>  drivers/infiniband/hw/mlx5/main.c |  10 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h  |  15 +-
>  drivers/infiniband/hw/mlx5/odp.c  | 281 +-
>  drivers/net/ethernet/mellanox/mlx5/core/dev.c |  34 ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 252 
>  .../net/ethernet/mellanox/mlx5/core/lib/eq.h  |   8 -
>  .../net/ethernet/mellanox/mlx5/core/main.c|  17 +-
>  .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 -
>  include/linux/mlx5/driver.h   |  49 ---
>  include/linux/mlx5/eq.h   |  21 ++
>  10 files changed, 308 insertions(+), 381 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c 
> b/drivers/infiniband/hw/mlx5/main.c
> index 6fbc0cba1bac..fcf4a0328a90 100644
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -6040,6 +6040,11 @@ static int mlx5_ib_stage_odp_init(struct mlx5_ib_dev 
> *dev)
>   return mlx5_ib_odp_init_one(dev);
>  }
>  
> +void mlx5_ib_stage_odp_cleanup(struct mlx5_ib_dev *dev)
> +{
> + mlx5_ib_odp_cleanup_one(dev);
> +}
> +
>  int mlx5_ib_stage_counters_init(struct mlx5_ib_dev *dev)
>  {
>   if (MLX5_CAP_GEN(dev->mdev, max_qp_cnt)) {
> @@ -6225,7 +6230,7 @@ static const struct mlx5_ib_profile pf_profile = {
>mlx5_ib_stage_dev_res_cleanup),
>   STAGE_CREATE(MLX5_IB_STAGE_ODP,
>mlx5_ib_stage_odp_init,
> -  NULL),
> +  mlx5_ib_stage_odp_cleanup),
>   STAGE_CREATE(MLX5_IB_STAGE_COUNTERS,
>mlx5_ib_stage_counters_init,
>mlx5_ib_stage_counters_cleanup),
> @@ -6395,9 +6400,6 @@ static struct mlx5_interface mlx5_ib_interface = {
>   .add= mlx5_ib_add,
>   .remove = mlx5_ib_remove,
>   .event  = mlx5_ib_event,
> -#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> - .pfault = mlx5_ib_pfault,
> -#endif
>   .protocol   = MLX5_INTERFACE_PROTOCOL_IB,
>  };
>  
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
> b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index b651a7a6fde9..d01af2d829b8 100644
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -880,6 +880,15 @@ struct mlx5_ib_lb_state {
>   boolenabled;
>  };
>  
> +struct mlx5_ib_pf_eq {
> + struct mlx5_ib_dev  *dev;
> + struct mlx5_eq  *core;
> + struct work_struct   work;
> + spinlock_t   lock; /* Pagefaults spinlock */
> + struct workqueue_struct  *wq;
> + mempool_t*pool;
> +};

I know this is being copied, but can we please not do this vertical
alignment madness in RDMA? It is such a nightmare to maintain this..

> +/* mempool_refill() was proposed but unfortunately wasn't accepted
> + * http://lkml.iu.edu/hypermail/linux/kernel/1512.1/05073.html
> + * Chip workaround.

'cheap workaround'

Jason


Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously

2018-11-09 Thread Jason Gunthorpe
On Fri, Nov 09, 2018 at 06:26:22PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 08, 2018 at 07:49:03PM +0000, Jason Gunthorpe wrote:
> > On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> > > From: Moni Shoua 
> > >
> > > Telling the HCA that page fault handling is done and QP can resume
> > > its flow is done in the context of the page fault handler. This blocks
> > > the handling of the next work in queue without a need.
> > > Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> > > the workqueue to pick the next work item for handling. All tasks that
> > > were executed after PAGE_FAULT_RESUME need to be done now
> > > in the callback of the asynchronous command mechanism.
> > >
> > > Signed-off-by: Moni Shoua 
> > > Signed-off-by: Leon Romanovsky 
> > >  drivers/infiniband/hw/mlx5/odp.c | 110 +--
> > >  include/linux/mlx5/driver.h  |   3 +
> > >  2 files changed, 94 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/mlx5/odp.c 
> > > b/drivers/infiniband/hw/mlx5/odp.c
> > > index abce55b8b9ba..0c4f469cdd5b 100644
> > > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > > @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct 
> > > mlx5_ib_dev *dev)
> > >   return;
> > >  }
> > >
> > > +struct pfault_resume_cb_ctx {
> > > + struct mlx5_ib_dev *dev;
> > > + struct mlx5_core_rsc_common *res;
> > > + struct mlx5_pagefault *pfault;
> > > +};
> > > +
> > > +static void page_fault_resume_callback(int status, void *context)
> > > +{
> > > + struct pfault_resume_cb_ctx *ctx = context;
> > > + struct mlx5_pagefault *pfault = ctx->pfault;
> > > +
> > > + if (status)
> > > + mlx5_ib_err(ctx->dev, "Resolve the page fault failed with 
> > > status %d\n",
> > > + status);
> > > +
> > > + if (ctx->res)
> > > + mlx5_core_res_put(ctx->res);
> > > + kfree(pfault);
> > > + kfree(ctx);
> > > +}
> > > +
> > >  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> > > +   struct mlx5_core_rsc_common *res,
> > > struct mlx5_pagefault *pfault,
> > > -   int error)
> > > +   int error,
> > > +   bool async)
> > >  {
> > > + int ret = 0;
> > > + u32 *out = pfault->out_pf_resume;
> > > + u32 *in = pfault->in_pf_resume;
> > > + u32 token = pfault->token;
> > >   int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> > > -  pfault->wqe.wq_num : pfault->token;
> > > - int ret = mlx5_core_page_fault_resume(dev->mdev,
> > > -   pfault->token,
> > > -   wq_num,
> > > -   pfault->type,
> > > -   error);
> > > - if (ret)
> > > - mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 
> > > 0x%x\n",
> > > - wq_num);
> > > + pfault->wqe.wq_num : pfault->token;
> > > + u8 type = pfault->type;
> > > + struct pfault_resume_cb_ctx *ctx = NULL;
> > > +
> > > + if (async)
> > > + ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> >
> > Why not allocate this ctx ast part of the mlx5_pagefault and avoid
> > this allocation failure strategy?
> 
> It is another way to implement it, both of them are correct.

.. I think it is alot better to move this allocation, it gets rid of
this ugly duplicated code

> Can I assume that we can progress with patches except patch #2?

Lets drop this one too..

Jason


Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously

2018-11-08 Thread Jason Gunthorpe
On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> From: Moni Shoua 
> 
> Telling the HCA that page fault handling is done and QP can resume
> its flow is done in the context of the page fault handler. This blocks
> the handling of the next work in queue without a need.
> Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> the workqueue to pick the next work item for handling. All tasks that
> were executed after PAGE_FAULT_RESUME need to be done now
> in the callback of the asynchronous command mechanism.
> 
> Signed-off-by: Moni Shoua 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/odp.c | 110 +--
>  include/linux/mlx5/driver.h  |   3 +
>  2 files changed, 94 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/odp.c 
> b/drivers/infiniband/hw/mlx5/odp.c
> index abce55b8b9ba..0c4f469cdd5b 100644
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev 
> *dev)
>   return;
>  }
>  
> +struct pfault_resume_cb_ctx {
> + struct mlx5_ib_dev *dev;
> + struct mlx5_core_rsc_common *res;
> + struct mlx5_pagefault *pfault;
> +};
> +
> +static void page_fault_resume_callback(int status, void *context)
> +{
> + struct pfault_resume_cb_ctx *ctx = context;
> + struct mlx5_pagefault *pfault = ctx->pfault;
> +
> + if (status)
> + mlx5_ib_err(ctx->dev, "Resolve the page fault failed with 
> status %d\n",
> + status);
> +
> + if (ctx->res)
> + mlx5_core_res_put(ctx->res);
> + kfree(pfault);
> + kfree(ctx);
> +}
> +
>  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> +   struct mlx5_core_rsc_common *res,
> struct mlx5_pagefault *pfault,
> -   int error)
> +   int error,
> +   bool async)
>  {
> + int ret = 0;
> + u32 *out = pfault->out_pf_resume;
> + u32 *in = pfault->in_pf_resume;
> + u32 token = pfault->token;
>   int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> -  pfault->wqe.wq_num : pfault->token;
> - int ret = mlx5_core_page_fault_resume(dev->mdev,
> -   pfault->token,
> -   wq_num,
> -   pfault->type,
> -   error);
> - if (ret)
> - mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 
> 0x%x\n",
> - wq_num);
> + pfault->wqe.wq_num : pfault->token;
> + u8 type = pfault->type;
> + struct pfault_resume_cb_ctx *ctx = NULL;
> +
> + if (async)
> + ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);

Why not allocate this ctx ast part of the mlx5_pagefault and avoid
this allocation failure strategy?

Jason


Re: [PATCH mlx5-next 00/10] Collection of ODP fixes

2018-11-08 Thread Jason Gunthorpe
On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Hi,
> 
> This is collection of fixes to mlx5_core and mlx5_ib ODP logic.
> There are two main reasons why we decided to forward it to mlx5-next
> and not to rdma-rc or net like our other fixes.
> 
> 1. We have large number of patches exactly in that area that are ready
> for submission and we don't want to create extra merge work for
> maintainers due to that. Among those future series (already ready and
> tested): internal change of mlx5_core pagefaults handling, moving ODP
> code to RDMA, change in EQ mechanism, simplification and moving SRQ QP
> to RDMA, extra fixes to mlx5_ib ODP and ODP SRQ support.
> 
> 2. Most of those fixes are for old bugs and there is no rush to fix them
> right now (in -rc1/-rc2).
> 
> Thanks
> 
> Moni Shoua (10):
>   net/mlx5: Release resource on error flow
>   IB/mlx5: Avoid hangs due to a missing completion
>   net/mlx5: Add interface to hold and release core resources
>   net/mlx5: Enumerate page fault types
>   IB/mlx5: Lock QP during page fault handling
>   net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
>   net/mlx5: Use multi threaded workqueue for page fault handling
>   IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously
>   net/mlx5: Remove unused function
>   IB/mlx5: Improve ODP debugging messages
> 
>  drivers/infiniband/core/umem_odp.c|  14 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h  |   1 +
>  drivers/infiniband/hw/mlx5/mr.c   |  15 +-
>  drivers/infiniband/hw/mlx5/odp.c  | 158 ++
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  21 +--
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  20 ++-

there is alot of ethernet files here, parts of this should probably go
through the shared branch?

Jason


Re: [PATCH mlx5-next 02/10] IB/mlx5: Avoid hangs due to a missing completion

2018-11-08 Thread Jason Gunthorpe
On Thu, Nov 08, 2018 at 09:10:09PM +0200, Leon Romanovsky wrote:
> From: Moni Shoua 
> 
> Fix 2 flows that may cause a process to hang on wait_for_completion():
> 
> 1. When callback for create MKEY command returns with bad status
> 2. When callback for create MKEY command is called before executer of
> command calls wait_for_completion()
> 
> The following call trace might be triggered in the above flows:
> 
> INFO: task echo_server:1655 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> echo_server D 880813fb6898 0  1655  1 0x0004
>  880423f5b880 0086 880402290fd0 880423f5bfd8
>  880423f5bfd8 880423f5bfd8 880402290fd0 880813fb68a0
>  7fff 880813fb6898 880402290fd0 880813fb6898
> Call Trace:
>  [] schedule+0x29/0x70
>  [] schedule_timeout+0x239/0x2c0
>  [] ? mlx5_cmd_exec_cb+0x22/0x30 [mlx5_core]
>  [] ? mlx5_core_create_mkey_cb+0xb7/0x220 [mlx5_core]
>  [] ? mm_drop_all_locks+0xd7/0x110
>  [] wait_for_completion+0xfd/0x140
>  [] ? wake_up_state+0x20/0x20
>  [] mlx5_mr_cache_alloc+0xa8/0x170 [mlx5_ib]
>  [] implicit_mr_alloc+0x36/0x190 [mlx5_ib]
>  [] mlx5_ib_alloc_implicit_mr+0x4e/0xa0 [mlx5_ib]
>  [] mlx5_ib_reg_user_mr+0x93/0x6a0 [mlx5_ib]
>  [] ? mlx5_ib_exp_query_device+0xab0/0xbc0 [mlx5_ib]
>  [] ib_uverbs_exp_reg_mr+0x2fe/0x550 [ib_uverbs]
>  [] ? do_huge_pmd_anonymous_page+0x2bf/0x530
>  [] ib_uverbs_write+0x3ec/0x490 [ib_uverbs]
>  [] vfs_write+0xbd/0x1e0
>  [] SyS_write+0x7f/0xe0
>  [] system_call_fastpath+0x16/0x1b
> 
> Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
> Signed-off-by: Moni Shoua 
> Reviewed-by: Majd Dibbiny 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
>  drivers/infiniband/hw/mlx5/mr.c  | 15 ---
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
> b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index b651a7a6fde9..cd9335e368bd 100644
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -644,6 +644,7 @@ struct mlx5_cache_ent {
>   struct delayed_work dwork;
>   int pending;
>   struct completion   compl;
> + atomic_tdo_complete;
>  };
> 
>  struct mlx5_mr_cache {
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 9b195d65a13e..259dd49c6874 100644
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -143,7 +143,7 @@ static void reg_mr_callback(int status, void *context)
>   kfree(mr);
>   dev->fill_delay = 1;
>   mod_timer(>delay_timer, jiffies + HZ);
> - return;
> + goto do_complete;
>   }
> 
>   mr->mmkey.type = MLX5_MKEY_MR;
> @@ -167,8 +167,13 @@ static void reg_mr_callback(int status, void *context)
>   pr_err("Error inserting to mkey tree. 0x%x\n", -err);
>   write_unlock_irqrestore(>lock, flags);
> 
> - if (!completion_done(>compl))
> +do_complete:
> + spin_lock_irqsave(>lock, flags);
> + if (atomic_read(>do_complete)) {
>   complete(>compl);
> + atomic_dec(>do_complete);
> + }
> + spin_unlock_irqrestore(>lock, flags);

Oh, this is quite an ugly way to use completions, I think this has
veered into misusing completion territory.. The completion_done was
never right...

add_keys should accept a flag indicating that this MR has a completor
waiting and should trigger complete() on CB finishing... Can probably
store the flag someplace in the MR.

Jason


Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-07 Thread Jason Gunthorpe
On Wed, Nov 07, 2018 at 09:34:34AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 06, 2018 at 03:11:53PM -0700, Jason Gunthorpe wrote:
> > On Tue, Nov 06, 2018 at 05:10:53PM -0500, Doug Ledford wrote:
> > > On Tue, 2018-11-06 at 22:02 +0000, Jason Gunthorpe wrote:
> > > > On Tue, Nov 06, 2018 at 04:31:08PM -0500, Doug Ledford wrote:
> > > > > On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> > > > > > From: Yishai Hadas 
> > > > > >
> > > > > > Adapt XRC SRQ to the latest HW specification with fixed definition
> > > > > > around umem valid bits. The previous definition relied on a bit 
> > > > > > which
> > > > > > was taken for other purposes in legacy FW.
> > > > > >
> > > > > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> > > > > > Signed-off-by: Yishai Hadas 
> > > > > > Reviewed-by: Artemy Kovalyov 
> > > > > > Signed-off-by: Leon Romanovsky 
> > > > > > Hi Doug, Jason
> > > > > >
> > > > > > This commit fixes code sent in this merge window, so I'm not 
> > > > > > marking it
> > > > > > with any rdma-rc/rdma-next. It will be better to be sent during 
> > > > > > this merge
> > > > > > window if you have extra pull request to issue, or as a -rc 
> > > > > > material, if
> > > > > > not.
> > > > > >
> > > > > > BTW, we didn't combine reserved fields, because our convention is 
> > > > > > to align such
> > > > > > fields to 32 bits for better readability.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > This looks fine.  Let me know when it's in the mlx5-next tree to pull.
> > > >
> > > > It needs to go to -rc...
> > > >
> > > > This needs a mlx5-rc branch for this I guess?
> > >
> > > I don't think so.  As long as it's the first commit in mlx5-next, and
> > > mlx5-next is 4.20-rc1 based, then pulling this commit into the -rc tree
> > > will only pull the single commit.  Then when we pull into for-next for
> > > the first time, we will get this in for-next too.  That seems best to
> > > me.
> >
> > That works too, if Leon is fast :)
> 
> Thank you both for suggestion.
> 
> I did it.
> 99b77fef3c6c net/mlx5: Fix XRC SRQ umem valid bits
> 
> It is first commit and it is based on -rc1.

Okay, I put this branch in -rc

Jason


Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-06 Thread Jason Gunthorpe
On Tue, Nov 06, 2018 at 05:10:53PM -0500, Doug Ledford wrote:
> On Tue, 2018-11-06 at 22:02 +0000, Jason Gunthorpe wrote:
> > On Tue, Nov 06, 2018 at 04:31:08PM -0500, Doug Ledford wrote:
> > > On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> > > > From: Yishai Hadas 
> > > > 
> > > > Adapt XRC SRQ to the latest HW specification with fixed definition
> > > > around umem valid bits. The previous definition relied on a bit which
> > > > was taken for other purposes in legacy FW.
> > > > 
> > > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> > > > Signed-off-by: Yishai Hadas 
> > > > Reviewed-by: Artemy Kovalyov 
> > > > Signed-off-by: Leon Romanovsky 
> > > > Hi Doug, Jason
> > > > 
> > > > This commit fixes code sent in this merge window, so I'm not marking it
> > > > with any rdma-rc/rdma-next. It will be better to be sent during this 
> > > > merge
> > > > window if you have extra pull request to issue, or as a -rc material, if
> > > > not.
> > > > 
> > > > BTW, we didn't combine reserved fields, because our convention is to 
> > > > align such
> > > > fields to 32 bits for better readability.
> > > > 
> > > > Thanks
> > > 
> > > This looks fine.  Let me know when it's in the mlx5-next tree to pull.
> > 
> > It needs to go to -rc... 
> > 
> > This needs a mlx5-rc branch for this I guess?
> 
> I don't think so.  As long as it's the first commit in mlx5-next, and
> mlx5-next is 4.20-rc1 based, then pulling this commit into the -rc tree
> will only pull the single commit.  Then when we pull into for-next for
> the first time, we will get this in for-next too.  That seems best to
> me.

That works too, if Leon is fast :)

Jason


Re: [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits

2018-11-06 Thread Jason Gunthorpe
On Tue, Nov 06, 2018 at 04:31:08PM -0500, Doug Ledford wrote:
> On Wed, 2018-10-31 at 12:20 +0200, Leon Romanovsky wrote:
> > From: Yishai Hadas 
> > 
> > Adapt XRC SRQ to the latest HW specification with fixed definition
> > around umem valid bits. The previous definition relied on a bit which
> > was taken for other purposes in legacy FW.
> > 
> > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
> > Signed-off-by: Yishai Hadas 
> > Reviewed-by: Artemy Kovalyov 
> > Signed-off-by: Leon Romanovsky 
> > Hi Doug, Jason
> > 
> > This commit fixes code sent in this merge window, so I'm not marking it
> > with any rdma-rc/rdma-next. It will be better to be sent during this merge
> > window if you have extra pull request to issue, or as a -rc material, if
> > not.
> > 
> > BTW, we didn't combine reserved fields, because our convention is to align 
> > such
> > fields to 32 bits for better readability.
> > 
> > Thanks
> 
> This looks fine.  Let me know when it's in the mlx5-next tree to pull.

It needs to go to -rc... 

This needs a mlx5-rc branch for this I guess?

Jason 



Re: [PATCH rdma-next 3/4] IB/mlx5: Verify that driver supports user flags

2018-10-07 Thread Jason Gunthorpe
On Sun, Oct 07, 2018 at 12:03:36PM +0300, Leon Romanovsky wrote:
> From: Yonatan Cohen 
> 
> Flags sent down from user might not be supported by
> running driver.
> This might lead to unwanted bugs.
> To solve this, added macro to test for unsupported flags.
> 
> Signed-off-by: Yonatan Cohen 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/qp.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> index bae48bdf281c..17c4b6641933 100644
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -1728,6 +1728,15 @@ static void configure_requester_scat_cqe(struct 
> mlx5_ib_dev *dev,
>   MLX5_SET(qpc, qpc, cs_req, MLX5_REQ_SCAT_DATA32_CQE);
>  }
>  
> +#define MLX5_QP_CREATE_FLAGS_NOT_SUPPORTED(flags) \
> +  ((flags) & ~(\

This needs a cast, it would be better to add something like the check
comp mask function in rdma-core than this goofy macro thing.

Jason


Re: [PATCH rdma-next v1 0/7] Preparation to DevX extension series

2018-09-22 Thread Jason Gunthorpe
On Thu, Sep 20, 2018 at 09:35:19PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Changelog v0->v1:
>  * Update commit messages
>  * Split DevX series to small sub-series.
>  * Change static initialization from {0} to be {}
> 
> 
> Set uid as part of various IB commands so that the firmware can manage
> the IB object in a secured way.
> 
> The firmware should mark this IB object with the given uid so that it
> can be used later on only by objects with the same uid.
> 
> Upon DEVX flows that use this objec, the pointed object must have
> the same uid as of the issuer uid command.
> 
> When a command is issued with uid=0 it means that the issuer of the
> command is trusted (i.e. kernel), in that case any pointed object
> can be used regardless of its uid.
> 
> Thanks
> 
> Leon Romanovsky (1):
>   net/mlx5: Update mlx5_ifc with DEVX UID bits
> 
> Yishai Hadas (6):
>   net/mlx5: Set uid as part of CQ commands
>   net/mlx5: Set uid as part of QP commands
>   net/mlx5: Set uid as part of RQ commands
>   net/mlx5: Set uid as part of SQ commands
>   net/mlx5: Set uid as part of SRQ commands
>   net/mlx5: Set uid as part of DCT commands

This and the next series look OK to me. Let me know when it is applied
to the mlx branch

Thanks,
Jason


Re: [PATCH rdma-next 0/4] mlx5 vport loopback

2018-09-21 Thread Jason Gunthorpe
On Fri, Sep 21, 2018 at 03:14:36PM -0400, Doug Ledford wrote:
> On Mon, 2018-09-17 at 13:30 +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> > 
> > Hi,
> > 
> > This is short series from Mark which extends handling of loopback
> > traffic. Originally mlx5 IB dynamically enabled/disabled both unicast
> > and multicast based on number of users. However RAW ethernet QPs need
> > more granular access.
> > 
> > Thanks
> > 
> > Mark Bloch (4):
> >   net/mlx5: Rename incorrect naming in IFC file
> >   RDMA/mlx5: Refactor transport domain bookkeeping logic
> >   RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
> >   RDMA/mlx5: Enable vport loopback when user context or QP mandate
> 
> I've reviewed this series and I'm OK with it, but the first patch is for
> net/mlx5.  How are you expecting the series to be applied?  Are you
> wanting me or Jason to take the entire series, or does the first patch
> need to go through the mlx5 tree and get picked up by Dave and us, and
> then we take the rest?  This is unclear to me...

Saeed or Leon will apply the net/mlx5 patches when netdev and RDMA
approves the series, such as with the above approval.

Our job is to take the branch from

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mellanox/mlx5-next

So it is a two step process where it is approved on the mailing list
then Saeed/Leon will respond with the branch commits.

Depending on need I've been doing either a 'series merge' which looks
like this:

commit 8193abb6a8171c775503acd041eb957cc7e9e7f4
Merge: c1dfc0114c901b 25bb36e75d7d62
Author: Jason Gunthorpe 
Date:   Wed Jul 4 13:19:46 2018 -0600

Merge branch 'mlx5-dump-fill-mkey' into rdma.git for-next

For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Dump and fill MKEY from Leon Romanovsky:


MLX5 IB HCA offers the memory key, dump_fill_mkey to increase performance,
when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access, while
keeping track of the processed length in the ibv_sge handling.

In this three patch series, we expose various bits in our HW spec
file (mlx5_ifc.h), move unneeded for mlx5_core FW command and export such
memory key to user space thought our mlx5-abi header file.


Botched auto-merge in mlx5_ib_alloc_ucontext() resolved by hand.

* branch 'mlx5-dump-fill-mkey':
  IB/mlx5: Expose dump and fill memory key
  net/mlx5: Add hardware definitions for dump_fill_mkey
  net/mlx5: Limit scope of dump_fill_mkey function
  net/mlx5: Rate limit errors in command interface

Signed-off-by: Jason Gunthorpe 

Which preserves the cover letter, so I prefer it.

 This only works if the RDMA patches have no dependency on the latest
RDMA tree.

The Mellanox branch may have additional patches beyond the series in
question, this just means they have progressed on the netdev side, I
usually trim them out of the 'git merge --log' section for greater
clarity.

Otherwise, merge the Mellanox branch with the right commit IDs and
then apply the RDMA patches, such as here:

commit eda98779f7d318cf96f030bbc5b23f034b69b80a
Merge: 4fca037783512c 664000b6bb4352
Author: Jason Gunthorpe 
Date:   Tue Jul 24 13:10:23 2018 -0600

Merge branch 'mellanox/mlx5-next' into rdma.git for-next

From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

This is required to resolve dependencies of the next series of RDMA
patches.

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support for flow table destination number
  net/mlx5: Add forward compatible support for the FTE match data
  net/mlx5: Fix tristate and description for MLX5 module
  net/mlx5: Better return types for CQE API
  net/mlx5: Use ERR_CAST() instead of coding it
  net/mlx5: Add missing SET_DRIVER_VERSION command translation
  net/mlx5: Add XRQ commands definitions
  net/mlx5: Add core support for double vlan push/pop steering action
  net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
  net/mlx5: FW tracer, add hardware structures
  net/mlx5: fix uaccess beyond "count" in debugfs read/write handlers

Signed-off-by: Jason Gunthorpe 

It is up to Saeed to sync this branch with netdev.

Neither RDMA or netdev should apply patches marked for net/mlx5 - they
must go through the shared branch to avoid conflicts.

Jason


Re: [PATCH rdma-next 1/5] RDMA/core: Provide getter and setter to access IB device name

2018-09-20 Thread Jason Gunthorpe
On Thu, Sep 20, 2018 at 07:40:39PM +0300, Leon Romanovsky wrote:

> > > The protection is done with global device_lock because it is used in
> > > allocation and deallocation phases. At this stage, this lock is not
> > > busy and easily can be moved to be per-device, once it will be needed.
> > >
> > > Signed-off-by: Leon Romanovsky 
> > >  drivers/infiniband/core/device.c | 24 +++-
> > >  include/rdma/ib_verbs.h  |  8 +++-
> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/device.c 
> > > b/drivers/infiniband/core/device.c
> > > index 5a680a88aa87..3270cde6d806 100644
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -170,6 +170,14 @@ static struct ib_device 
> > > *__ib_device_get_by_name(const char *name)
> > >   return NULL;
> > >  }
> > >
> > > +void ib_device_get_name(struct ib_device *ibdev, char *name)
> > > +{
> > > + down_read(_rwsem);
> > > + strlcpy(name, ibdev->name, IB_DEVICE_NAME_MAX);
> > > + up_read(_rwsem);
> > > +}
> > > +EXPORT_SYMBOL(ib_device_get_name);
> >
> > I think we have to follow netdev and just rely on device_rename()
> > being 'good enough'.
> >
> > Switch everything to use dev_name()/etc rather than try and do
> > something like this so the responsibility is on the device core to
> > keep this working, not us.
> >
> > Turns out I have a series for that for unrelated reasons..
> 
> And what should I do now with this knowledge?

Maybe I can send it today..

Jason


Re: [PATCH rdma-next 1/5] RDMA/core: Provide getter and setter to access IB device name

2018-09-20 Thread Jason Gunthorpe
On Thu, Sep 20, 2018 at 02:21:58PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Prepare IB device name field to rename operation by ensuring that all
> accesses to it are protected with lock and users don't see part of name.

Oh dear, no, that isn't going to work, there is too much stuff using
dev_name.. Did you read the comment on device_rename??

https://elixir.bootlin.com/linux/v4.19-rc4/source/drivers/base/core.c#L2715

> The protection is done with global device_lock because it is used in
> allocation and deallocation phases. At this stage, this lock is not
> busy and easily can be moved to be per-device, once it will be needed.
> 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/core/device.c | 24 +++-
>  include/rdma/ib_verbs.h  |  8 +++-
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/device.c 
> b/drivers/infiniband/core/device.c
> index 5a680a88aa87..3270cde6d806 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -170,6 +170,14 @@ static struct ib_device *__ib_device_get_by_name(const 
> char *name)
>   return NULL;
>  }
> 
> +void ib_device_get_name(struct ib_device *ibdev, char *name)
> +{
> + down_read(_rwsem);
> + strlcpy(name, ibdev->name, IB_DEVICE_NAME_MAX);
> + up_read(_rwsem);
> +}
> +EXPORT_SYMBOL(ib_device_get_name);

I think we have to follow netdev and just rely on device_rename()
being 'good enough'.  

Switch everything to use dev_name()/etc rather than try and do
something like this so the responsibility is on the device core to
keep this working, not us.

Turns out I have a series for that for unrelated reasons..

>  static int alloc_name(char *name)
>  {
>   unsigned long *inuse;
> @@ -202,6 +210,21 @@ static int alloc_name(char *name)
>   return 0;
>  }
> 
> +int ib_device_alloc_name(struct ib_device *ibdev, const char *pattern)
> +{
> + int ret = 0;
> +
> + mutex_lock(_mutex);
> + strlcpy(ibdev->name, pattern, IB_DEVICE_NAME_MAX);
> + if (strchr(ibdev->name, '%'))
> + ret = alloc_name(ibdev->name);
> +
> + mutex_unlock(_mutex);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(ib_device_alloc_name);

Can't call alloc_name() without also adding to the list, this will
allow duplicates.

Jason


Re: [PATCH mlx5-next 03/25] net/mlx5: Set uid as part of RQ commands

2018-09-19 Thread Jason Gunthorpe
On Wed, Sep 19, 2018 at 11:40:45AM -0700, Saeed Mahameed wrote:
> On Wed, Sep 19, 2018 at 10:28 AM Jason Gunthorpe  wrote:
> >
> > On Mon, Sep 17, 2018 at 02:03:56PM +0300, Leon Romanovsky wrote:
> > > From: Yishai Hadas 
> > >
> > > Set uid as part of RQ commands so that the firmware can manage the
> > > RQ object in a secured way.
> > >
> > > That will enable using an RQ that was created by verbs application
> > > to be used by the DEVX flow in case the uid is equal.
> > >
> > > Signed-off-by: Yishai Hadas 
> > > Signed-off-by: Leon Romanovsky 
> > >  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 16 ++--
> > >  include/linux/mlx5/mlx5_ifc.h|  6 +++---
> > >  2 files changed, 17 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
> > > b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
> > > index 04f72a1cdbcc..0ca68ef54d93 100644
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
> > > @@ -540,6 +540,17 @@ int mlx5_core_xrcd_dealloc(struct mlx5_core_dev 
> > > *dev, u32 xrcdn)
> > >  }
> > >  EXPORT_SYMBOL_GPL(mlx5_core_xrcd_dealloc);
> > >
> > > +static void destroy_rq_tracked(struct mlx5_core_dev *dev, u32 rqn, u16 
> > > uid)
> > > +{
> > > + u32 in[MLX5_ST_SZ_DW(destroy_rq_in)]   = {0};
> > > + u32 out[MLX5_ST_SZ_DW(destroy_rq_out)] = {0};
> >
> > = {} is the preferred version of this, right?
> >
> > {0} explicitly initializes the first element to zero and only works if
> > the first element happens to be something integral..
> >
> 
> Both are perfectly ok in our scenarios.
> I remember one of the syntaxes yielded a statistic checker warning, i
> don't remember which syntax and what static checker :) ..

{0} will throw a 'missing-field-initializers' compiler warning,
however it recognizes '= {}' as an idom meaning 'zero everything' and
does not throw a warning.

Jason


Re: [PATCH rdma-next 00/24] Extend DEVX functionality

2018-09-19 Thread Jason Gunthorpe
On Mon, Sep 17, 2018 at 02:03:53PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> From Yishai,
> 
> This series comes to enable the DEVX functionality in some wider scope,
> specifically,
> - It enables using kernel objects that were created by the verbs
>   API in the DEVX flow.
> - It enables white list commands without DEVX user context.
> - It enables the IB link layer under CAP_NET_RAW capabilities.
> - It exposes the PRM handles for RAW QP (i.e. TIRN, TISN, RQN, SQN)
>   to be used later on directly by the DEVX interface.
> 
> In General,
> Each object that is created/destroyed/modified via verbs will be stamped
> with a UID based on its user context. This is already done for DEVX objects
> commands.
> 
> This will enable the firmware to enforce the usage of kernel objects
> from the DEVX flow by validating that the same UID is used and the resources 
> are
> really related to the same user.
> 
> For example in case a CQ was created with verbs it will be stamped with
> UID and once will be pointed by a DEVX create QP command the firmware will
> validate that the input CQN really belongs to the UID which issues the create 
> QP
> command.
> 
> As of the above, all the PRM objects (except of the public ones which
> are managed by the kernel e.g. FLOW, etc.) will have a UID upon their
> create/modify/destroy commands. The detection of UMEM / physical
> addressed in the relevant commands will be done by firmware according to a 
> 'umem
> valid bit' as the UID may be used in both cases.
> 
> The series also enables white list commands which don't require a
> specific DEVX context, instead of this a device UID is used so that
> the firmware will mask un-privileged functionality. The IB link layer
> is also enabled once CAP_NET_RAW permission exists.
> 
> To enable using the RAW QP underlay objects (e.g. TIRN, RQN, etc.) later
> on by DEVX commands the UHW output for this case was extended to return this
> data when a DEVX context is used.
> 
> Thanks
>
> Leon Romanovsky (1):
>   net/mlx5: Update mlx5_ifc with DEVX UID bits
> 
> Yishai Hadas (24):
>   net/mlx5: Set uid as part of CQ commands
>   net/mlx5: Set uid as part of QP commands
>   net/mlx5: Set uid as part of RQ commands
>   net/mlx5: Set uid as part of SQ commands
>   net/mlx5: Set uid as part of SRQ commands
>   net/mlx5: Set uid as part of DCT commands
>   IB/mlx5: Set uid as part of CQ creation
>   IB/mlx5: Set uid as part of QP creation
>   IB/mlx5: Set uid as part of RQ commands
>   IB/mlx5: Set uid as part of SQ commands
>   IB/mlx5: Set uid as part of TIR commands
>   IB/mlx5: Set uid as part of TIS commands
>   IB/mlx5: Set uid as part of RQT commands
>   IB/mlx5: Set uid as part of PD commands
>   IB/mlx5: Set uid as part of TD commands
>   IB/mlx5: Set uid as part of SRQ commands
>   IB/mlx5: Set uid as part of DCT commands
>   IB/mlx5: Set uid as part of XRCD commands
>   IB/mlx5: Set uid as part of MCG commands

This is really too many patches.. They are small and not too hard to
review, but it is well beyond the guideline.

And I'm not totally happy with the extensive use of ucontext in the IB
portions, it is problematic looking into the future, and uboject is
really not supposed to be used in the drivers.

The driver needs to store the uid in the PD (copied from the ucontext
that created it) and use that in all the dependent places, not use
pd->uobject->ucontext->devx_uid or some other convoluted way to get
to it. 

The ucontext variable should only be used when creating the PD, CQ and
devx objects.

This detail becomes quite important, for instance, if we get to the
'shared pd' that has been talked about at conference. In this case
when the 'receiver' of the 'shared pd' creates a child object, like a
MR, the MR must be stamped with the devx_uid of the PD (ie the
originating context's devx_uid), not the dev_uid of its local ufile!

If we do that, then the series can be split, so long as pd->devx_uid ==
0 until the entire series is applied. uid tagging is an all-or-nothing
thing, as partial tagging will break verbs. So breaking it up also
makes it more bi-section safe.

Something like these patches:

>   net/mlx5: Update mlx5_ifc with DEVX UID bits
>   net/mlx5: Set uid as part of CQ commands
>   net/mlx5: Set uid as part of QP commands
>   net/mlx5: Set uid as part of RQ commands
>   net/mlx5: Set uid as part of SQ commands
>   net/mlx5: Set uid as part of SRQ commands
>   net/mlx5: Set uid as part of DCT commands
>   IB/mlx5: Set uid as part of PD commands
>   IB/mlx5: Set uid as part of QP creation
>   IB/mlx5: Set uid as part of RQ commands
>   IB/mlx5: Set uid as part of SQ commands
>   IB/mlx5: Set uid as part of SRQ commands
>   IB/mlx5: Set uid as part of DCT commands

(13 patches)

Followed by the rest of the IB uid patches

Followed by a patch to make pd->uid != 0 along with these:

>   IB/mlx5: Set uid as part of CQ creation
>   IB/mlx5: Set valid umem bit on DEVX
>   IB/mlx5: Expose RAW QP device handles to 

Re: [PATCH mlx5-next 07/25] net/mlx5: Update mlx5_ifc with DEVX UID bits

2018-09-19 Thread Jason Gunthorpe
On Mon, Sep 17, 2018 at 02:04:00PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Add DEVX information to WQ, SRQ, CQ, TRI, TIS, QP,
> RQ, XRCD, PD, MKEY and MCG.
> 
> Signed-off-by: Leon Romanovsky 
> ---
>  include/linux/mlx5/mlx5_ifc.h | 67 
> +++
>  1 file changed, 43 insertions(+), 24 deletions(-)

It is weird that we sometimes have these IFC bundle updates and
sometimes the IFC is inlined in the patch.. 

Why not just do one big IFC only patch with everything the series
needs?

Jason


Re: [PATCH mlx5-next 03/25] net/mlx5: Set uid as part of RQ commands

2018-09-19 Thread Jason Gunthorpe
On Mon, Sep 17, 2018 at 02:03:56PM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas 
> 
> Set uid as part of RQ commands so that the firmware can manage the
> RQ object in a secured way.
> 
> That will enable using an RQ that was created by verbs application
> to be used by the DEVX flow in case the uid is equal.
> 
> Signed-off-by: Yishai Hadas 
> Signed-off-by: Leon Romanovsky 
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 16 ++--
>  include/linux/mlx5/mlx5_ifc.h|  6 +++---
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
> index 04f72a1cdbcc..0ca68ef54d93 100644
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
> @@ -540,6 +540,17 @@ int mlx5_core_xrcd_dealloc(struct mlx5_core_dev *dev, 
> u32 xrcdn)
>  }
>  EXPORT_SYMBOL_GPL(mlx5_core_xrcd_dealloc);
>  
> +static void destroy_rq_tracked(struct mlx5_core_dev *dev, u32 rqn, u16 uid)
> +{
> + u32 in[MLX5_ST_SZ_DW(destroy_rq_in)]   = {0};
> + u32 out[MLX5_ST_SZ_DW(destroy_rq_out)] = {0};

= {} is the preferred version of this, right? 

{0} explicitly initializes the first element to zero and only works if
the first element happens to be something integral..

Jason


Re: [PATCH mlx5-next 02/25] net/mlx5: Set uid as part of QP commands

2018-09-19 Thread Jason Gunthorpe
On Mon, Sep 17, 2018 at 02:03:55PM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas 
> 
> Set uid as part of QP commands so that the firmware can manage the
> QP object in a secured way.
> 
> That will enable using a QP that was created by verbs application to
> be used by the DEVX flow in case the uid is equal.
> 
> Signed-off-by: Yishai Hadas 
> Signed-off-by: Leon Romanovsky 
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 45 
> +---
>  include/linux/mlx5/mlx5_ifc.h| 22 +++---
>  include/linux/mlx5/qp.h  |  1 +
>  3 files changed, 39 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
> index 4ca07bfb6b14..04f72a1cdbcc 100644
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
> @@ -240,6 +240,7 @@ int mlx5_core_create_qp(struct mlx5_core_dev *dev,
>   if (err)
>   return err;
>  
> + qp->uid = MLX5_GET(create_qp_in, in, uid);
>   qp->qpn = MLX5_GET(create_qp_out, out, qpn);
>   mlx5_core_dbg(dev, "qpn = 0x%x\n", qp->qpn);
>  
> @@ -261,6 +262,7 @@ int mlx5_core_create_qp(struct mlx5_core_dev *dev,
>   memset(dout, 0, sizeof(dout));
>   MLX5_SET(destroy_qp_in, din, opcode, MLX5_CMD_OP_DESTROY_QP);
>   MLX5_SET(destroy_qp_in, din, qpn, qp->qpn);
> + MLX5_SET(destroy_qp_in, din, uid, qp->uid);
>   mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout));
>   return err;
>  }
> @@ -320,6 +322,7 @@ int mlx5_core_destroy_qp(struct mlx5_core_dev *dev,
>  
>   MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
>   MLX5_SET(destroy_qp_in, in, qpn, qp->qpn);
> + MLX5_SET(destroy_qp_in, in, uid, qp->uid);
>   err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
>   if (err)
>   return err;
> @@ -373,7 +376,7 @@ static void mbox_free(struct mbox_info *mbox)
>  
>  static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int 
> qpn,
>   u32 opt_param_mask, void *qpc,
> - struct mbox_info *mbox)
> + struct mbox_info *mbox, u16 uid)
>  {
>   mbox->out = NULL;
>   mbox->in = NULL;
> @@ -381,26 +384,32 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev 
> *dev, u16 opcode, int qpn,
>  #define MBOX_ALLOC(mbox, typ)  \
>   mbox_alloc(mbox, MLX5_ST_SZ_BYTES(typ##_in), 
> MLX5_ST_SZ_BYTES(typ##_out))
>  
> -#define MOD_QP_IN_SET(typ, in, _opcode, _qpn) \
> - MLX5_SET(typ##_in, in, opcode, _opcode); \
> - MLX5_SET(typ##_in, in, qpn, _qpn)
> -
> -#define MOD_QP_IN_SET_QPC(typ, in, _opcode, _qpn, _opt_p, _qpc) \
> - MOD_QP_IN_SET(typ, in, _opcode, _qpn); \
> - MLX5_SET(typ##_in, in, opt_param_mask, _opt_p); \
> - memcpy(MLX5_ADDR_OF(typ##_in, in, qpc), _qpc, MLX5_ST_SZ_BYTES(qpc))
> +#define MOD_QP_IN_SET(typ, in, _opcode, _qpn, _uid) \
> + do { \
> + MLX5_SET(typ##_in, in, opcode, _opcode); \
> + MLX5_SET(typ##_in, in, qpn, _qpn); \
> + MLX5_SET(typ##_in, in, uid, _uid); \
> + } while (0)
> +
> +#define MOD_QP_IN_SET_QPC(typ, in, _opcode, _qpn, _opt_p, _qpc, _uid) \
> + do { \
> + MOD_QP_IN_SET(typ, in, _opcode, _qpn, _uid); \
> + MLX5_SET(typ##_in, in, opt_param_mask, _opt_p); \
> + memcpy(MLX5_ADDR_OF(typ##_in, in, qpc), \
> +_qpc, MLX5_ST_SZ_BYTES(qpc)); \
> + } while (0)

These should get touched by clang-format to follow the usual
formatting for macros..

Jason


Re: [PATCH] cxgb4: fix abort_req_rss6 struct

2018-09-11 Thread Jason Gunthorpe
On Fri, Aug 31, 2018 at 11:52:00AM -0700, Steve Wise wrote:
> Remove the incorrect WR_HDR field which can cause a misinterpretation
> of this CPL by ULDs.
> 
> Fixes: a3cdaa69e4ae ("cxgb4: Adds CPL support for Shared Receive Queues")
> Signed-off-by: Steve Wise 
> ---
> 
> Dave, Doug, and Jason,
> 
> I request this merge through the rdma repo since the only user of this
> structure is iw_cxgb4.

Applied to for-rc, thanks

Jason


Re: [PATCH v4 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace

2018-09-07 Thread Jason Gunthorpe
On Fri, Sep 07, 2018 at 01:14:51PM -0400, Doug Ledford wrote:
> On Fri, 2018-09-07 at 09:43 -0600, Jason Gunthorpe wrote:
> > On Thu, Sep 06, 2018 at 05:51:12PM +0300, Arseny Maslennikov wrote:
> > > Some tools may currently be using only the deprecated attribute;
> > > let's print an elaborate and clear deprecation notice to kmsg.
> > > 
> > > To do that, we have to replace the whole sysfs file, since we inherit
> > > the original one from netdev.
> > > 
> > > Signed-off-by: Arseny Maslennikov 
> > >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 31 +++
> > >  1 file changed, 31 insertions(+)
> > > 
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> > > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > index 30f840f874b3..74732726ec6f 100644
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > @@ -2386,6 +2386,35 @@ int ipoib_add_pkey_attr(struct net_device *dev)
> > >   return device_create_file(>dev, _attr_pkey);
> > >  }
> > >  
> > > +/*
> > > + * We erroneously exposed the iface's port number in the dev_id
> > > + * sysfs field long after dev_port was introduced for that purpose[1],
> > > + * and we need to stop everyone from relying on that.
> > > + * Let's overload the shower routine for the dev_id file here
> > > + * to gently bring the issue up.
> > > + *
> > > + * [1] https://www.spinics.net/lists/netdev/msg272123.html
> > > + */
> > > +static ssize_t dev_id_show(struct device *dev,
> > > +struct device_attribute *attr, char *buf)
> > > +{
> > > + struct net_device *ndev = to_net_dev(dev);
> > > +
> > > + if (ndev->dev_id == ndev->dev_port)
> > > + netdev_info_once(ndev,
> > > + "\"%s\" wants to know my dev_id. Should it look at 
> > > dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more 
> > > info.\n",
> > > + current->comm);
> > > +
> > > + return sprintf(buf, "%#x\n", ndev->dev_id);
> > > +}
> > > +static DEVICE_ATTR_RO(dev_id);
> > > +
> > > +int ipoib_intercept_dev_id_attr(struct net_device *dev)
> > > +{
> > > + device_remove_file(>dev, _attr_dev_id);
> > > + return device_create_file(>dev, _attr_dev_id);
> > > +}
> > 
> > Isn't this racey with userspace? Ie what happens if udev is querying
> > the dev_id right here?
> > 
> > Do we know there is no userspace doing this?
> 
> I don't think that it can race (or reasonably can).  The intercept
> function is done as part of ipoib_add_port() so the port itself isn't
> live yet.

The above code is after register_netdev, so the port itself is
certainly live as far as userspace is concerned..

All the other sysfs stuff in add_port is already wrong/racy.. See
Parav's recent series fixing this for the main devices..

Jason


Re: [PATCH v4 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace

2018-09-07 Thread Jason Gunthorpe
On Thu, Sep 06, 2018 at 05:51:12PM +0300, Arseny Maslennikov wrote:
> Some tools may currently be using only the deprecated attribute;
> let's print an elaborate and clear deprecation notice to kmsg.
> 
> To do that, we have to replace the whole sysfs file, since we inherit
> the original one from netdev.
> 
> Signed-off-by: Arseny Maslennikov 
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 31 +++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 30f840f874b3..74732726ec6f 100644
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -2386,6 +2386,35 @@ int ipoib_add_pkey_attr(struct net_device *dev)
>   return device_create_file(>dev, _attr_pkey);
>  }
>  
> +/*
> + * We erroneously exposed the iface's port number in the dev_id
> + * sysfs field long after dev_port was introduced for that purpose[1],
> + * and we need to stop everyone from relying on that.
> + * Let's overload the shower routine for the dev_id file here
> + * to gently bring the issue up.
> + *
> + * [1] https://www.spinics.net/lists/netdev/msg272123.html
> + */
> +static ssize_t dev_id_show(struct device *dev,
> +struct device_attribute *attr, char *buf)
> +{
> + struct net_device *ndev = to_net_dev(dev);
> +
> + if (ndev->dev_id == ndev->dev_port)
> + netdev_info_once(ndev,
> + "\"%s\" wants to know my dev_id. Should it look at 
> dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more 
> info.\n",
> + current->comm);
> +
> + return sprintf(buf, "%#x\n", ndev->dev_id);
> +}
> +static DEVICE_ATTR_RO(dev_id);
> +
> +int ipoib_intercept_dev_id_attr(struct net_device *dev)
> +{
> + device_remove_file(>dev, _attr_dev_id);
> + return device_create_file(>dev, _attr_dev_id);
> +}

Isn't this racey with userspace? Ie what happens if udev is querying
the dev_id right here?

Do we know there is no userspace doing this?

>  static struct net_device *ipoib_add_port(const char *format,
>struct ib_device *hca, u8 port)
>  {
> @@ -2427,6 +2456,8 @@ static struct net_device *ipoib_add_port(const char 
> *format,
>*/
>   ndev->priv_destructor = ipoib_intf_free;
>  
> + if (ipoib_intercept_dev_id_attr(ndev))
> + goto sysfs_failed;

No device_remove_file needed?

Jason


Re: [PATCH] cxgb4: fix abort_req_rss6 struct

2018-09-05 Thread Jason Gunthorpe
On Fri, Aug 31, 2018 at 11:52:00AM -0700, Steve Wise wrote:
> Remove the incorrect WR_HDR field which can cause a misinterpretation
> of this CPL by ULDs.

What does that mean?

Is this an -rc patch?

Jason


Re: [PATCH rdma-next v1 00/15] Flow actions to mutate packets

2018-09-05 Thread Jason Gunthorpe
On Wed, Sep 05, 2018 at 08:14:58AM +0300, Leon Romanovsky wrote:

> > This looks OK to me, can you make the shared commit please?
> 
> Thanks, I pushed to mlx5-next
> 
> 50acec06f392 net/mlx5: Export packet reformat alloc/dealloc functions
> 31ca3648f01b net/mlx5: Pass a namespace for packet reformat ID allocation
> bea4e1f6c6c5 net/mlx5: Expose new packet reformat capabilities
> 60786f0987c0 {net, RDMA}/mlx5: Rename encap to reformat packet
> e0e7a3861b6c net/mlx5: Move header encap type to IFC header file
> 61444b458b01 net/mlx5: Break encap/decap into two separated flow table 
> creation flags
> c3c062f80665 net/mlx5: Add support for more namespaces when allocating modify 
> header
> 90c1d1b8da67 net/mlx5: Export modify header alloc/dealloc functions
> 8ce78257965e net/mlx5: Add proper NIC TX steering flow tables support
> 2226dcb424bf net/mlx5: Cleanup flow namespace getter switch logic

Okay, done thanks

Jason


Re: [PATCH rdma-next v1 12/15] RDMA/mlx5: Add a new flow action verb - modify header

2018-09-05 Thread Jason Gunthorpe
On Wed, Sep 05, 2018 at 08:20:44AM +0300, Leon Romanovsky wrote:
> On Tue, Sep 04, 2018 at 03:58:23PM -0600, Jason Gunthorpe wrote:
> > On Tue, Aug 28, 2018 at 02:18:51PM +0300, Leon Romanovsky wrote:
> >
> > > +static int 
> > > UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_ACTION_CREATE_MODIFY_HEADER)(
> > > + struct ib_uverbs_file *file,
> > > + struct uverbs_attr_bundle *attrs)
> > > +{
> > > + struct ib_uobject *uobj = uverbs_attr_get_uobject(
> > > + attrs, MLX5_IB_ATTR_CREATE_MODIFY_HEADER_HANDLE);
> > > + struct mlx5_ib_dev *mdev = to_mdev(uobj->context->device);
> > > + enum mlx5_ib_uapi_flow_table_type ft_type;
> > > + struct ib_flow_action *action;
> > > + size_t num_actions;
> > > + void *in;
> > > + int len;
> > > + int ret;
> > > +
> > > + if (!mlx5_ib_modify_header_supported(mdev))
> > > + return -EOPNOTSUPP;
> > > +
> > > + in = uverbs_attr_get_alloced_ptr(attrs,
> > > + MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> > > + len = uverbs_attr_get_len(attrs,
> > > + MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> > > +
> > > + if (len % MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto))
> > > + return -EINVAL;
> > > +
> > > + ret = uverbs_get_const(_type, attrs,
> > > +MLX5_IB_ATTR_CREATE_MODIFY_HEADER_FT_TYPE);
> > > + if (ret)
> > > + return -EINVAL;
> >
> > This should be
> >
> > if (ret)
> > return ret;
> >
> > Every call to uverbs_get_const is wrong in this same way..
> 
> Right, from technical point of view uverbs_get_const can return EINVAL
> only, and it is correct for now, but need to be changed to proper
> "return ret".

No, it can return ENOENT as well.

Jason


Re: [PATCH mlx5-next v1 05/15] net/mlx5: Break encap/decap into two separated flow table creation flags

2018-09-05 Thread Jason Gunthorpe
On Wed, Sep 05, 2018 at 08:10:25AM +0300, Leon Romanovsky wrote:
> > > - int en_encap_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN);
> > > + int en_encap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP);
> > > + int en_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
> >
> > Yuk, please don't use !!.
> >
> > bool en_decap = flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP;
> 
> We need to provide en_encap and en_decap as an input to MLX5_SET(...)
> which is passed to FW as 0 or 1. 
>
> Boolean type is declared in C as int and treated as zero for false
> and any other value for true,

No, that isn't right, the kernel uses C99's _Bool intrinsic type, which
is guaranteed to only hold 0 or 1 by the compiler.

See types.h:

typedef _Bool   bool;

Jason


Re: [PATCH rdma-next v1 00/15] Flow actions to mutate packets

2018-09-04 Thread Jason Gunthorpe
On Tue, Aug 28, 2018 at 02:18:39PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> >From Mark,
> 
> This series exposes the ability to create flow actions which can
> mutate packet headers. We do that by exposing two new verbs:
>  * modify header - can change existing packet headers. packet
>  * reformat - can encapsulate or decapsulate a packet.
>   Once created a flow action must be attached to a steering
>   rule for it to take effect.
> 
> The first 10 patches refactor mlx5_core code, rename internal structures
> to better reflect their operation and export needed functions so the
> RDMA side can allocate the action.
> 
> The last 5 patches expose via the IOCTL infrastructure mlx5_ib methods
> which do the actual allocation of resources and return an handle to the
> user. A user of this API is expected to know how to work with the
> device's spec as the input to those function is HW depended.
> 
> An example usage of the modify header action is routing, A user can
> create an action which edits the L2 header and decrease the TTL.
> 
> An example usage of the packet reformat action is VXLAN encap/decap
> which is done by the HW.
> 
> Changelog:
>  v0 -> v1:
>   * Patch 1: Addressed Saeed's comments and simplified the logic.
>   * Patch 2: Changed due to changes in patch 1.
> 
>  Split the 27 patch series into 3, this is the first one
>  which just lets the user create flow action.
>  Other than that styling fixes mainly in the RDMA patches
>  to make sure 80 chars limit isn't exceeded.
> 
>  RFC -> v0:
>   * Patch 1 a new patch which refactors the logic
> when getting a flow namespace.
>   * Patch 2 was split into two.
>   * Patch 3: Fixed a typo in commit message
>   * Patch 5: Updated commit message
>   * Patch 7: Updated commit message
> Renamed:
>   - MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT_ID to
> MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT
>   - packet_reformat_id to reformat_id in struct mlx5_flow_act
>   - packet_reformat_id to encap_id in struct mlx5_esw_flow_attr
>   - packet_reformat_id to encap_id in struct mlx5e_encap_entry
>   - PACKET_REFORMAT to REFORMAT when printing trace points
>   * Patch 9: Updated commit message
> Updated function declaration in mlx5_core.h, could of lead
> to compile error on bisection.
>   * Patch 11: Disallow egress rules insertion when in switchdev mode
>   * Patch 12: A new patch to deal with passing enum values using
> the IOCTL infrastructure.
>   * Patch 13: Use new enum value attribute when passing enum
> mlx5_ib_uapi_flow_table_type
>   * Patch 15: Don't set encap flags on flow tables if in switchdev mode
>   * Patch 17: Use new enum value attribute when passing enum
> mlx5_ib_uapi_flow_table_type and enum
> mlx5_ib_uapi_flow_action_packet_reformat_type
>   * Patch 19: Allow creation of both
> MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L2_TO_L3_TUNNEL
> and MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L3_TUNNEL_TO_L2
> packet
> reformat actions.
>   * Patch 20: A new patch which allows attaching packet reformat
> actions to flow tables on NIC RX.
> 
> Thanks
> 
> Mark Bloch (15):
>   net/mlx5: Cleanup flow namespace getter switch logic
>   net/mlx5: Add proper NIC TX steering flow tables support
>   net/mlx5: Export modify header alloc/dealloc functions
>   net/mlx5: Add support for more namespaces when allocating modify
> header
>   net/mlx5: Break encap/decap into two separated flow table creation
> flags
>   net/mlx5: Move header encap type to IFC header file
>   {net, RDMA}/mlx5: Rename encap to reformat packet
>   net/mlx5: Expose new packet reformat capabilities
>   net/mlx5: Pass a namespace for packet reformat ID allocation
>   net/mlx5: Export packet reformat alloc/dealloc functions
>   RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language
>   RDMA/mlx5: Add a new flow action verb - modify header
>   RDMA/uverbs: Add generic function to fill in flow action object
>   RDMA/mlx5: Add new flow action verb - packet reformat
>   RDMA/mlx5: Extend packet reformat verbs
> 
>  drivers/infiniband/core/uverbs_ioctl.c |  23 ++
>  .../infiniband/core/uverbs_std_types_flow_action.c |   7 +-
>  drivers/infiniband/hw/mlx5/devx.c  |   6 +-
>  drivers/infiniband/hw/mlx5/flow.c  | 301 
> +
>  drivers/infiniband/hw/mlx5/main.c  |   3 +
>  drivers/infiniband/hw/mlx5/mlx5_ib.h   |  19 +-
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |   8 +-
>  .../mellanox/mlx5/core/diag/fs_tracepoint.h|   2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  51 ++--
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   2 +-
>  .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |   9 +-
>  drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  87 +++---
>  drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  73 +++--
>  

Re: [PATCH mlx5-next v1 05/15] net/mlx5: Break encap/decap into two separated flow table creation flags

2018-09-04 Thread Jason Gunthorpe
On Tue, Aug 28, 2018 at 02:18:44PM +0300, Leon Romanovsky wrote:
> From: Mark Bloch 
> 
> Today we are able to attach encap and decap actions only to the FDB. In
> preparation to enable those actions on the NIC flow tables, break the
> single flag into two. Those flags control whatever a decap or encap
> operations can be attached to the flow table created. For FDB, if
> encapsulation is required, we set both of them.
> 
> Signed-off-by: Mark Bloch 
> Reviewed-by: Saeed Mahameed 
> Reviewed-by: Or Gerlitz 
> Signed-off-by: Leon Romanovsky 
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 3 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   | 7 ---
>  include/linux/mlx5/fs.h| 3 ++-
>  3 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index f72b5c9dcfe9..ff21807a0c4b 100644
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -529,7 +529,8 @@ static int esw_create_offloads_fast_fdb_table(struct 
> mlx5_eswitch *esw)
>   esw_size >>= 1;
>  
>   if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
> - flags |= MLX5_FLOW_TABLE_TUNNEL_EN;
> + flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP |
> +   MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>  
>   fdb = mlx5_create_auto_grouped_flow_table(root_ns, FDB_FAST_PATH,
> esw_size,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
> index 9ae777e56529..1698f325a21e 100644
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
> @@ -152,7 +152,8 @@ static int mlx5_cmd_create_flow_table(struct 
> mlx5_core_dev *dev,
> struct mlx5_flow_table *next_ft,
> unsigned int *table_id, u32 flags)
>  {
> - int en_encap_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN);
> + int en_encap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP);
> + int en_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);

Yuk, please don't use !!.

bool en_decap = flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP;

Jason


Re: [PATCH rdma-next v1 12/15] RDMA/mlx5: Add a new flow action verb - modify header

2018-09-04 Thread Jason Gunthorpe
On Tue, Aug 28, 2018 at 02:18:51PM +0300, Leon Romanovsky wrote:

> +static int UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_ACTION_CREATE_MODIFY_HEADER)(
> + struct ib_uverbs_file *file,
> + struct uverbs_attr_bundle *attrs)
> +{
> + struct ib_uobject *uobj = uverbs_attr_get_uobject(
> + attrs, MLX5_IB_ATTR_CREATE_MODIFY_HEADER_HANDLE);
> + struct mlx5_ib_dev *mdev = to_mdev(uobj->context->device);
> + enum mlx5_ib_uapi_flow_table_type ft_type;
> + struct ib_flow_action *action;
> + size_t num_actions;
> + void *in;
> + int len;
> + int ret;
> +
> + if (!mlx5_ib_modify_header_supported(mdev))
> + return -EOPNOTSUPP;
> +
> + in = uverbs_attr_get_alloced_ptr(attrs,
> + MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> + len = uverbs_attr_get_len(attrs,
> + MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> +
> + if (len % MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto))
> + return -EINVAL;
> +
> + ret = uverbs_get_const(_type, attrs,
> +MLX5_IB_ATTR_CREATE_MODIFY_HEADER_FT_TYPE);
> + if (ret)
> + return -EINVAL;

This should be

if (ret)
return ret;

Every call to uverbs_get_const is wrong in this same way..

I can probably fix it if this is the only thing though..

Jason


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-08-17 Thread Jason Gunthorpe
On Fri, Aug 17, 2018 at 01:03:20PM -0700, Sagi Grimberg wrote:
> 
> > Hey Sagi,
> > 
> > The patch works allowing connections for the various affinity mappings 
> > below:
> > 
> > One comp_vector per core across all cores, starting with numa-local cores:
> 
> Thanks Steve, is this your "Tested by:" tag?

The new patchworks doesn't grab patches inlined in messages, so you
will need to resend it.

Also, can someone remind me what the outcome is here? Does it
supersede Leon's patch:

https://patchwork.kernel.org/patch/10526167/

?

Thanks,
Jason


[PATCH] Revert "net/smc: Replace ib_query_gid with rdma_get_gid_attr"

2018-08-16 Thread Jason Gunthorpe
This reverts commit ddb457c6993babbcdd41fca638b870d2a2fc3941.

The include rdma/ib_cache.h is kept, and we have to add a memset
to the compat wrapper to avoid compiler warnings in gcc-7

This revert is done to avoid extensive merge conflicts with SMC
changes in netdev during the 4.19 merge window.

Signed-off-by: Jason Gunthorpe 
---
 include/rdma/ib_cache.h |  1 +
 net/smc/smc_core.c  | 19 ++-
 net/smc/smc_ib.c| 24 ++--
 3 files changed, 21 insertions(+), 23 deletions(-)

As discussed before, the above patch to SMC in the rdma.git causes too
many merge conflicts, I am reverting it prior to sending the pull
request for RDMA and instead relying on the ibv_query_gid() compat
wrapper that has been in linux-next for some time.

Parav, please respin this patch against this branch:

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next

Thanks,
Jason

diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index a4ce441f36f0ad..3e11e7cc60b745 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -143,6 +143,7 @@ static inline __deprecated int ib_query_gid(struct 
ib_device *device,
 {
const struct ib_gid_attr *attr;
 
+   memset(attr_out, 0, sizeof(*attr_out));
attr = rdma_get_gid_attr(device, port_num, index);
if (IS_ERR(attr))
return PTR_ERR(attr);
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d99a75f75e42be..15bad268f37d8b 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -451,7 +451,8 @@ static int smc_vlan_by_tcpsk(struct socket *clcsock, 
unsigned short *vlan_id)
 static int smc_link_determine_gid(struct smc_link_group *lgr)
 {
struct smc_link *lnk = >lnk[SMC_SINGLE_LINK];
-   const struct ib_gid_attr *gattr;
+   struct ib_gid_attr gattr;
+   union ib_gid gid;
int i;
 
if (!lgr->vlan_id) {
@@ -461,18 +462,18 @@ static int smc_link_determine_gid(struct smc_link_group 
*lgr)
 
for (i = 0; i < lnk->smcibdev->pattr[lnk->ibport - 1].gid_tbl_len;
 i++) {
-   gattr = rdma_get_gid_attr(lnk->smcibdev->ibdev, lnk->ibport, i);
-   if (IS_ERR(gattr))
+   if (ib_query_gid(lnk->smcibdev->ibdev, lnk->ibport, i, ,
+))
continue;
-   if (gattr->ndev) {
-   if (is_vlan_dev(gattr->ndev) &&
-   vlan_dev_vlan_id(gattr->ndev) == lgr->vlan_id) {
-   lnk->gid = gattr->gid;
-   rdma_put_gid_attr(gattr);
+   if (gattr.ndev) {
+   if (is_vlan_dev(gattr.ndev) &&
+   vlan_dev_vlan_id(gattr.ndev) == lgr->vlan_id) {
+   lnk->gid = gid;
+   dev_put(gattr.ndev);
return 0;
}
+   dev_put(gattr.ndev);
}
-   rdma_put_gid_attr(gattr);
}
return -ENODEV;
 }
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 74f29f814ec1f9..117b05f1a49475 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -373,21 +373,17 @@ void smc_ib_buf_unmap_sg(struct smc_ib_device *smcibdev,
 
 static int smc_ib_fill_gid_and_mac(struct smc_ib_device *smcibdev, u8 ibport)
 {
-   const struct ib_gid_attr *gattr;
-   int rc = 0;
+   struct ib_gid_attr gattr;
+   int rc;
 
-   gattr = rdma_get_gid_attr(smcibdev->ibdev, ibport, 0);
-   if (IS_ERR(gattr))
-   return PTR_ERR(gattr);
-   if (!gattr->ndev) {
-   rc = -ENODEV;
-   goto done;
-   }
-   smcibdev->gid[ibport - 1] = gattr->gid;
-   memcpy(smcibdev->mac[ibport - 1], gattr->ndev->dev_addr, ETH_ALEN);
-done:
-   rdma_put_gid_attr(gattr);
-   return rc;
+   rc = ib_query_gid(smcibdev->ibdev, ibport, 0,
+ >gid[ibport - 1], );
+   if (rc || !gattr.ndev)
+   return -ENODEV;
+
+   memcpy(smcibdev->mac[ibport - 1], gattr.ndev->dev_addr, ETH_ALEN);
+   dev_put(gattr.ndev);
+   return 0;
 }
 
 /* Create an identifier unique for this instance of SMC-R.
-- 
2.18.0



Re: [PATCH 0/3] net, IB/ipoib: Use dev_port to disambiguate

2018-08-13 Thread Jason Gunthorpe
On Mon, Aug 13, 2018 at 02:42:21PM +0300, Arseny Maslennikov wrote:
> Pre-3.15 userspace had trouble distinguishing different ports of a NIC
> on a single PCI bus/device/function. To solve this, a sysfs field `dev_port'
> was introduced quite a while ago (commit v3.14-rc3-739-g3f85944fe207), and
> some relevant device drivers were fixed to use it, but not in case of IPoIB.
> 
> The convention for some reason never got documented in the kernel, but
> was immediately adopted by userspace (notably udev[1][2], biosdevname[3])
> 
> 3/3 documents the sysfs field — that's why I'm CC-ing netdev.
> 
> This series was tested on current LTS and 4.18.
> 
> [1] https://lists.freedesktop.org/archives/systemd-devel/2014-June/020788.html
> [2] https://lists.freedesktop.org/archives/systemd-devel/2014-July/020804.html
> [3] 
> https://github.com/CloudAutomationNTools/biosdevname/blob/c795d51dd93a5309652f0d635f12a3ecfabfaa72/src/eths.c#L38
> 
> Arseny Maslennikov (3):
>   IB/ipoib: Use dev_port to expose network interface port numbers
>   IB/ipoib: Stop using dev_id to expose port numbers
>   Documentation/ABI: document /sys/class/net/*/dev_port
> 
>  Documentation/ABI/testing/sysfs-class-net | 10 ++
>  drivers/infiniband/ulp/ipoib/ipoib_main.c |  2 +-
>  2 files changed, 11 insertions(+), 1 deletion(-)

This series doesn't apply to rdma for-next, and it is the merge window
now.

Can you resubmit this aginst 4.19-rc1 in two weeks? Thanks

Jason


Re: [PATCH rdma-next 00/10] IPoIB uninit

2018-08-02 Thread Jason Gunthorpe
On Sun, Jul 29, 2018 at 11:34:50AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> IP link was broken due to the changes in IPoIB for the rdma_netdev
> support after commit cd565b4b51e5 ("IB/IPoIB: Support acceleration options 
> callbacks").
> 
> This patchset restores IPoIB pkey creation and removal using rtnetlink.
> 
> It is completely rewritten variant of
> https://marc.info/?l=linux-rdma=151553425830918=2 patch series.
> 
> Thanks
> 
> Erez Shitrit (2):
>   IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task
>   IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work
> 
> Jason Gunthorpe (8):
>   IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN
>   IB/ipoib: Move all uninit code into ndo_uninit
>   IB/ipoib: Move init code to ndo_init
>   RDMA/netdev: Use priv_destructor for netdev cleanup
>   IB/ipoib: Get rid of the sysfs_mutex
>   IB/ipoib: Do not remove child devices from within the ndo_uninit
>   IB/ipoib: Maintain the child_intfs list from ndo_init/uninit
>   IB/ipoib: Consolidate checking of the proposed child interface

Applied to for-next, finally this bug is fixed..

I squashed these two patches though:

   IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task
   IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work

Thanks,
Jason


Re: [PATCH rdma-next 00/27] Flow actions to mutate packets

2018-07-30 Thread Jason Gunthorpe
On Sun, Jul 29, 2018 at 03:58:38PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Hi,
> 
> This is PATCH variant of RFC posted in previous week to the ML.
> https://patchwork.ozlabs.org/cover/944184/
> 
> Changelog:
>  RFC -> v0:
>   * Patch 1 a new patch which refactors the logic
> when getting a flow namespace.
>   * Patch 2 was split into two.
>   * Patch 3: Fixed a typo in commit message
>   * Patch 5: Updated commit message
>   * Patch 7: Updated commit message
> Renamed:
>   - MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT_ID to
> MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT
>   - packet_reformat_id to reformat_id in struct mlx5_flow_act
>   - packet_reformat_id to encap_id in struct mlx5_esw_flow_attr
>   - packet_reformat_id to encap_id in struct mlx5e_encap_entry
>   - PACKET_REFORMAT to REFORMAT when printing trace points
>   * Patch 9: Updated commit message
> Updated function declaration in mlx5_core.h, could of lead
> to compile error on bisection.
>   * Patch 11: Disallow egress rules insertion when in switchdev mode
>   * Patch 12: A new patch to deal with passing enum values using
> the IOCTL infrastructure.
>   * Patch 13: Use new enum value attribute when passing enum
> mlx5_ib_uapi_flow_table_type
>   * Patch 15: Don't set encap flags on flow tables if in switchdev mode
>   * Patch 17: Use new enum value attribute when passing enum
> mlx5_ib_uapi_flow_table_type and enum
> mlx5_ib_uapi_flow_action_packet_reformat_type
>   * Patch 19: Allow creation of both
> MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L2_TO_L3_TUNNEL
> and MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L3_TUNNEL_TO_L2 packet
> reformat actions.
>   * Patch 20: A new patch which allows attaching packet reformat
> actions to flow tables on NIC RX.
> 
> Thanks
> 
> From Mark:
> This series exposes the ability to create flow actions which can
> mutate packet headers. We do that by exposing two new verbs:
>  * modify header - can change existing packet headers. packet
>  * reformat - can encapsulate or decapsulate a packet.
>   Once created a flow action must be attached to a steering
>   rule for it to take effect.

Mark, this got a bit big in terms of patch count, however most of the
patches fit on one screen and the overall line count isn't too bad.

Can this be split somehow?

Thanks,
Jason


Re: [PATCH rdma-next v2 0/8] Support mlx5 flow steering with RAW data

2018-07-26 Thread Jason Gunthorpe
On Thu, Jul 26, 2018 at 07:35:49AM +0300, Leon Romanovsky wrote:
> On Wed, Jul 25, 2018 at 08:35:17AM -0600, Jason Gunthorpe wrote:
> > On Wed, Jul 25, 2018 at 08:37:03AM +0300, Leon Romanovsky wrote:
> >
> > > > Also, I would like to keep the specs consistently formatted according
> > > > to clang-format with 'BinPackParameters: true', so I reflowed them as
> > > > well.
> > >
> > > I'm using default VIM clang-format.py without anything in .clang-format.
> > > Do you have an extra definitions there, except BinPackParameters?
> >
> > These days Linux includes a top level .clang-format that does a
> > pretty good job.
> >
> > I have to manually switch BinPackParameters on when working with these
> > specs to get the right indenting.. A pain, but maybe there is a better
> > way someday..
> 
> I don't think that it is feasible to ask from people to change some
> defaults only for patches that touch those specs. Any change in this
> area will change formatting back.

Eventually I think we might be able toadd a code comment to tell
clang-format, but that would be down the road a bit..

> Jason, bottom line, I won't use BinPackParameters for my patches.

Well, you can make sure the specs macro follows the required
formatting code style by hand if you prefer..

But, I want to see them in this layout, so they are easier to
maintain, not the haphazard layout we had before.

Jason


Re: [PATCH rdma-next v2 0/8] Support mlx5 flow steering with RAW data

2018-07-25 Thread Jason Gunthorpe
On Wed, Jul 25, 2018 at 08:37:03AM +0300, Leon Romanovsky wrote:

> > Also, I would like to keep the specs consistently formatted according
> > to clang-format with 'BinPackParameters: true', so I reflowed them as
> > well.
> 
> I'm using default VIM clang-format.py without anything in .clang-format.
> Do you have an extra definitions there, except BinPackParameters?

These days Linux includes a top level .clang-format that does a
pretty good job.

I have to manually switch BinPackParameters on when working with these
specs to get the right indenting.. A pain, but maybe there is a better
way someday..

Jason


Re: [PATCH rdma-next v2 0/8] Support mlx5 flow steering with RAW data

2018-07-24 Thread Jason Gunthorpe
On Tue, Jul 24, 2018 at 02:13:00PM -0600, Jason Gunthorpe wrote:

> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
> b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 5f08b69f8a4f60..ec8410d3c4eb2a 100644
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -1232,11 +1232,9 @@ int mlx5_ib_devx_create(struct mlx5_ib_dev *dev,
>  void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev,
> struct mlx5_ib_ucontext *context);
>  const struct uverbs_object_tree_def *mlx5_ib_get_devx_tree(void);
> -struct mlx5_ib_flow_handler *mlx5_ib_raw_fs_rule_add(struct mlx5_ib_dev *dev,
> -  struct 
> mlx5_ib_flow_matcher *fs_matcher,
> -  void *cmd_in,
> -  int inlen, int dest_id,
> -  int dest_type);
> +struct mlx5_ib_flow_handler *mlx5_ib_raw_fs_rule_add(
> + struct mlx5_ib_dev *dev, struct mlx5_ib_flow_matcher *fs_matcher,
> + void *cmd_in, int inlen, int dest_id, int dest_type);
>  bool mlx5_ib_devx_is_flow_dest(void *obj, int *dest_id, int *dest_type);
>  int mlx5_ib_get_flow_trees(const struct uverbs_object_tree_def **root);
>  #else
> @@ -1247,17 +1245,22 @@ static inline void mlx5_ib_devx_destroy(struct 
> mlx5_ib_dev *dev,
>   struct mlx5_ib_ucontext *context) {}
>  static inline const struct uverbs_object_tree_def *
>  mlx5_ib_get_devx_tree(void) { return NULL; }
> -static inline struct mlx5_ib_flow_handler *
> -mlx5_ib_raw_fs_rule_add(struct mlx5_ib_dev *dev,
> - struct mlx5_ib_flow_matcher *fs_matcher,
> - void *cmd_in,
> - int inlen, int dest_id,
> - int dest_type) { return -EOPNOTSUPP; };
> -static inline bool
> -mlx5_ib_devx_is_flow_dest(void *obj, int *dest_id,
> -   int *dest_type) { return false; };
> +static inline struct mlx5_ib_flow_handler *mlx5_ib_raw_fs_rule_add(
> + struct mlx5_ib_dev *dev, struct mlx5_ib_flow_matcher *fs_matcher,
> + void *cmd_in, int inlen, int dest_id, int dest_type)
> +{
> + return ERR_PTR(-EOPNOTSUPP);
> +}

Bah, I botched the compile test for this - this entire static inline
should just be removed because the function is defined in main.c which
is always compiled.

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index ec8410d3c4eb2a..462505c8fa25a0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1245,12 +1245,6 @@ static inline void mlx5_ib_devx_destroy(struct 
mlx5_ib_dev *dev,
struct mlx5_ib_ucontext *context) {}
 static inline const struct uverbs_object_tree_def *
 mlx5_ib_get_devx_tree(void) { return NULL; }
-static inline struct mlx5_ib_flow_handler *mlx5_ib_raw_fs_rule_add(
-   struct mlx5_ib_dev *dev, struct mlx5_ib_flow_matcher *fs_matcher,
-   void *cmd_in, int inlen, int dest_id, int dest_type)
-{
-   return ERR_PTR(-EOPNOTSUPP);
-}
 static inline bool mlx5_ib_devx_is_flow_dest(void *obj, int *dest_id,
 int *dest_type)
 {

Jason


Re: [PATCH rdma-next v2 0/8] Support mlx5 flow steering with RAW data

2018-07-24 Thread Jason Gunthorpe
On Tue, Jul 24, 2018 at 08:56:09AM +0300, Leon Romanovsky wrote:
> On Mon, Jul 23, 2018 at 08:42:36PM -0600, Jason Gunthorpe wrote:
> > On Mon, Jul 23, 2018 at 03:25:04PM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > Changelog:
> > > v1->v2:
> > >  * Fix matcher to use the correct size.
> > >  * Rephrase commit log of the first patch.
> > > v0->v1:
> > >  * Fixed ADD_UVERBS_ATTRIBUTES_SIMPLE macro to pass the real address.
> > >  ?* Replaced UA_ALLOC_AND_COPY to regular copy_from
> > >  * Added UVERBS_ATTR_NO_DATA new macro for cleaner code.
> > >  * Used ib_dev from uobj when it exists.
> > >  * ib_is_destroy_retryable was replaced by ib_destroy_usecnt
> > >
> > > >From Yishai:
> > >
> > > This series introduces vendor create and destroy flow methods on the
> > > uverbs flow object by using the KABI infra-structure.
> > >
> > > It's done in a way that enables the driver to get its specific device
> > > attributes in a raw data to match its underlay specification while still
> > > using the generic ib_flow object for cleanup and code sharing.
> > >
> > > In addition, a specific mlx5 matcher object and its create/destroy
> > > methods were introduced. This object matches the underlay flow steering
> > > mask specification and is used as part of mlx5 create flow input data.
> > >
> > > This series supports IB_QP/TIR as its flow steering destination as
> > > applicable today via the ib_create_flow API, however, it adds also an
> > > option to work with DEVX object which its destination can be both TIR
> > > and flow table.
> > >
> > > Few changes were done in the mlx5 core layer to support forward
> > > compatible for the device specification raw data and to support flow
> > > table when the DEVX destination is used.
> > >
> > > As part of this series the default IB destroy handler
> > > (i.e. uverbs_destroy_def_handler()) was exposed from IB core to be
> > > used by the drivers and existing code was refactored to use it.
> > >
> > > Thanks
> > >
> > > Yishai Hadas (8):
> > >   net/mlx5: Add forward compatible support for the FTE match data
> > >   net/mlx5: Add support for flow table destination number
> > >   IB/mlx5: Introduce flow steering matcher object
> > >   IB: Consider ib_flow creation by the KABI infrastructure
> > >   IB/mlx5: Introduce vendor create and destroy flow methods
> > >   IB/mlx5: Support adding flow steering rule by raw data
> > >   IB/mlx5: Add support for a flow table destination
> > >   IB/mlx5: Expose vendor flow trees
> >
> > This seems fine to me. Can you send the mlx5 shared branch for the
> > first two patches?
> 
> I applied two first patches with Acked-by from Saeed to mlx5-next
> 
> 664000b6bb43 net/mlx5: Add support for flow table destination number
> 2aada6c0c96e net/mlx5: Add forward compatible support for the FTE match data

Okay, I merged the mlx5 branch and applied the series to for-next.

There was a trivial build failure with !CONFIG_INFINIBAND_USER_ACCESS
and a bunch of annoying check patch warnings regarding tabs and
leading white space. While I was fixing those I fixed the long lines
too, they had no reason to be long.

Also, I would like to keep the specs consistently formatted according
to clang-format with 'BinPackParameters: true', so I reflowed them as
well.

Please check and let me know if I made an error, diff is below.

Jason

diff --git a/drivers/infiniband/hw/mlx5/flow.c 
b/drivers/infiniband/hw/mlx5/flow.c
index c94ee1a43e2c3f..ee398a9b5f26b0 100644
--- a/drivers/infiniband/hw/mlx5/flow.c
+++ b/drivers/infiniband/hw/mlx5/flow.c
@@ -50,8 +50,8 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_CREATE_FLOW)(
int inlen;
bool dest_devx, dest_qp;
struct ib_qp *qp = NULL;
-   struct ib_uobject *uobj = uverbs_attr_get_uobject(attrs,
- MLX5_IB_ATTR_CREATE_FLOW_HANDLE);
+   struct ib_uobject *uobj =
+   uverbs_attr_get_uobject(attrs, MLX5_IB_ATTR_CREATE_FLOW_HANDLE);
struct mlx5_ib_dev *dev = to_mdev(uobj->context->device);
 
if (!capable(CAP_NET_RAW))
@@ -66,7 +66,8 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_CREATE_FLOW)(
return -EINVAL;
 
if (dest_devx) {
-   devx_obj = uverbs_attr_get_obj(attrs, 
MLX5_IB_ATTR_CREATE_FLOW_DEST_DEVX);
+   devx_obj = uverbs_attr_get_obj(
+   attrs, MLX5_IB_ATTR_CREATE_FLOW_DEST_DEVX);
if (IS_ERR(devx_obj))
return PTR_E

Re: [PATCH rdma-next v2 0/8] Support mlx5 flow steering with RAW data

2018-07-23 Thread Jason Gunthorpe
On Mon, Jul 23, 2018 at 03:25:04PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Changelog:
> v1->v2:
>  * Fix matcher to use the correct size.
>  * Rephrase commit log of the first patch.
> v0->v1:
>  * Fixed ADD_UVERBS_ATTRIBUTES_SIMPLE macro to pass the real address.
>  ?* Replaced UA_ALLOC_AND_COPY to regular copy_from
>  * Added UVERBS_ATTR_NO_DATA new macro for cleaner code.
>  * Used ib_dev from uobj when it exists.
>  * ib_is_destroy_retryable was replaced by ib_destroy_usecnt
> 
> >From Yishai:
> 
> This series introduces vendor create and destroy flow methods on the
> uverbs flow object by using the KABI infra-structure.
> 
> It's done in a way that enables the driver to get its specific device
> attributes in a raw data to match its underlay specification while still
> using the generic ib_flow object for cleanup and code sharing.
> 
> In addition, a specific mlx5 matcher object and its create/destroy
> methods were introduced. This object matches the underlay flow steering
> mask specification and is used as part of mlx5 create flow input data.
> 
> This series supports IB_QP/TIR as its flow steering destination as
> applicable today via the ib_create_flow API, however, it adds also an
> option to work with DEVX object which its destination can be both TIR
> and flow table.
> 
> Few changes were done in the mlx5 core layer to support forward
> compatible for the device specification raw data and to support flow
> table when the DEVX destination is used.
> 
> As part of this series the default IB destroy handler
> (i.e. uverbs_destroy_def_handler()) was exposed from IB core to be
> used by the drivers and existing code was refactored to use it.
> 
> Thanks
> 
> Yishai Hadas (8):
>   net/mlx5: Add forward compatible support for the FTE match data
>   net/mlx5: Add support for flow table destination number
>   IB/mlx5: Introduce flow steering matcher object
>   IB: Consider ib_flow creation by the KABI infrastructure
>   IB/mlx5: Introduce vendor create and destroy flow methods
>   IB/mlx5: Support adding flow steering rule by raw data
>   IB/mlx5: Add support for a flow table destination
>   IB/mlx5: Expose vendor flow trees

This seems fine to me. Can you send the mlx5 shared branch for the
first two patches?

Thanks,
Jason


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-07-23 Thread Jason Gunthorpe
On Fri, Jul 20, 2018 at 04:25:32AM +0300, Max Gurtovoy wrote:
> 
> >>>[ 2032.194376] nvme nvme0: failed to connect queue: 9 ret=-18
> >>
> >>queue 9 is not mapped (overlap).
> >>please try the bellow:
> >>
> >
> >This seems to work.  Here are three mapping cases:  each vector on its
> >own cpu, each vector on 1 cpu within the local numa node, and each
> >vector having all cpus in its numa node.  The 2nd mapping looks kinda
> >funny, but I think it achieved what you wanted?  And all the cases
> >resulted in successful connections.
> >
> 
> Thanks for testing this.
> I slightly improved the setting of the left CPUs and actually used Sagi's
> initial proposal.
> 
> Sagi,
> please review the attached patch and let me know if I should add your
> signature on it.
> I'll run some perf test early next week on it (meanwhile I run login/logout
> with different num_queues successfully and irq settings).
> 
> Steve,
> It will be great if you can apply the attached in your system and send your
> findings.
> 
> Regards,
> Max,

So the conlusion to this thread is that Leon's mlx5 patch needs to wait
until this block-mq patch is accepted?

Thanks,
Jason


Re: [PATCH mlx5-next v1 2/8] net/mlx5: Add support for flow table destination number

2018-07-12 Thread Jason Gunthorpe
On Fri, Jul 13, 2018 at 12:51:10AM +0300, Or Gerlitz wrote:
> On Fri, Jul 13, 2018 at 12:26 AM, Jason Gunthorpe  wrote:
> > On Fri, Jul 13, 2018 at 12:00:41AM +0300, Or Gerlitz wrote:
> >> On Wed, Jul 11, 2018 at 2:10 PM, Leon Romanovsky  wrote:
> >> > From: Yishai Hadas 
> >> >
> >> > Add support to set a destination from a flow table number.
> >> > This functionality will be used in downstream patches from this
> >> > series by the DEVX stuff.
> >>
> >> Reading your cover letter, I still don't understand what is missing
> >> in the current mlx5 fs core API for your needs. After all, you do
> >> create flow tables from the IB driver through fs core calls, right?
> >> so @ the end of the day, you have the FT pointer to provide the
> >> core, why you need the FT number?
> >
> > Via the devx API userspace can create flow tables directly without
> > going to the driver's flow steering core.
> 
> so why you change the core?

User space flow tables don't get any traffic until they are linked
into the main steering. The only ID the kernel gets for them when
adding this link is the actual PRM handle, not a pointer - hence the
change.

Jason


Re: [PATCH mlx5-next v1 2/8] net/mlx5: Add support for flow table destination number

2018-07-12 Thread Jason Gunthorpe
On Fri, Jul 13, 2018 at 12:00:41AM +0300, Or Gerlitz wrote:
> On Wed, Jul 11, 2018 at 2:10 PM, Leon Romanovsky  wrote:
> > From: Yishai Hadas 
> >
> > Add support to set a destination from a flow table number.
> > This functionality will be used in downstream patches from this
> > series by the DEVX stuff.
> 
> Reading your cover letter, I still don't understand what is missing
> in the current mlx5 fs core API for your needs. After all, you do
> create flow tables from the IB driver through fs core calls, right?
> so @ the end of the day, you have the FT pointer to provide the
> core, why you need the FT number?

Via the devx API userspace can create flow tables directly without
going to the driver's flow steering core.

Jason


Re: [PATCH rdma-next 4/9] IB/mlx5: Introduce flow steering matcher object

2018-07-11 Thread Jason Gunthorpe
On Wed, Jul 11, 2018 at 12:32:56PM +0300, Yishai Hadas wrote:
> >>+static int UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_MATCHER_CREATE)(struct 
> >>ib_device *ib_dev,
> >>+  struct ib_uverbs_file *file,
> >>+  struct uverbs_attr_bundle *attrs)
> >>+{
> >>+   struct mlx5_ib_dev *dev = to_mdev(ib_dev);
> >
> >I have a patch changing these - when working with uobj's the ib_dev
> >argument should not be used, the ib_dev must come from the ucontext
> >instead.
> >
> 
> OK, V1 will take the ib_dev from the uobj.
> 
> However, does it mean that you are going to change the signatures of all the
> handlers to *not* get ib_dev ?

Yes.

Jason


Re: [PATCH rdma-next 6/9] IB/mlx5: Introduce vendor create and destroy flow methods

2018-07-10 Thread Jason Gunthorpe
On Sun, Jul 08, 2018 at 01:24:42PM +0300, Leon Romanovsky wrote:

> +static int UVERBS_HANDLER(MLX5_IB_METHOD_CREATE_FLOW)(struct ib_device 
> *ib_dev,
> +   struct ib_uverbs_file 
> *file,
> +   struct uverbs_attr_bundle 
> *attrs)
> +{
> + struct mlx5_ib_dev *dev = to_mdev(ib_dev);

Same comment as before, the dev needs to come from uboj->context->device

> -/* Used by drivers to declare a complete parsing tree for a single method 
> that
> - * differs only in having additional driver specific attributes.
> +/* Used by drivers to declare a complete parsing tree for new methods
>   */
> -#define ADD_UVERBS_ATTRIBUTES_SIMPLE(_name, _object_id, _method_id, ...) 
>   \
> - static const struct uverbs_attr_def *const UVERBS_METHOD_ATTRS(\
> - _method_id)[] = { __VA_ARGS__ };   \
> - static const struct uverbs_method_def UVERBS_METHOD(_method_id) = {\
> - .id = _method_id,  \
> - .num_attrs = ARRAY_SIZE(UVERBS_METHOD_ATTRS(_method_id)),  \
> - .attrs = _METHOD_ATTRS(_method_id), \
> - }; \
> +#define ADD_UVERBS_METHODS(_name, _object_id, ...)   
>   \
>   static const struct uverbs_method_def *const UVERBS_OBJECT_METHODS(\
> - _object_id)[] = { _METHOD(_method_id) };\
> + _object_id)[] = { __VA_ARGS__ };   \
>   static const struct uverbs_object_def _name##_struct = {   \
>   .id = _object_id,  \
> - .num_methods = 1,  \
> + .num_methods = ARRAY_SIZE(UVERBS_OBJECT_METHODS(_object_id)),  \
>   .methods = _OBJECT_METHODS(_object_id)  \
>   }; \
>   static const struct uverbs_object_def *const _name##_ptrs[] = {\
> @@ -123,4 +115,17 @@
>   .objects = &_name##_ptrs,  \
>   }
>  
> +/* Used by drivers to declare a complete parsing tree for a single method 
> that
> + * differs only in having additional driver specific attributes.
> + */
> +#define ADD_UVERBS_ATTRIBUTES_SIMPLE(_name, _object_id, _method_id, ...) 
>   \
> + static const struct uverbs_attr_def *const UVERBS_METHOD_ATTRS(\
> + _method_id)[] = { __VA_ARGS__ };   \
> + static const struct uverbs_method_def UVERBS_METHOD(_method_id) = {\
> + .id = _method_id,  \
> + .num_attrs = ARRAY_SIZE(UVERBS_METHOD_ATTRS(_method_id)),  \
> + .attrs = _METHOD_ATTRS(_method_id), \
> + }; \
> + ADD_UVERBS_METHODS(_name, _object_id, _method_id)

Wow. How does that even compile? Oh I see, the only two users are
passing in a 0 constant which the compiler will understand as NULL
without a warning.

I guess this is an instant crash at runtime?

Should be:

ADD_UVERBS_METHODS(_name, _object_id, _METHOD(_method_id)

Jason


Re: [PATCH rdma-next 4/9] IB/mlx5: Introduce flow steering matcher object

2018-07-10 Thread Jason Gunthorpe
On Sun, Jul 08, 2018 at 01:24:40PM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas 
> 
> Introduce flow steering matcher object and its create and destroy
> methods.
> 
> This matcher object holds some mlx5 specific driver properties that
> matches the underlay device specification when an mlx5 flow steering
> group is created.
> 
> It will be used in downstream patches to be part of mlx5 specific create
> flow method.
> 
> Signed-off-by: Yishai Hadas 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/Makefile  |   1 +
>  drivers/infiniband/hw/mlx5/flow.c| 132 
> +++
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  11 +++
>  include/uapi/rdma/mlx5_user_ioctl_cmds.h |  33 +++-
>  4 files changed, 176 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/infiniband/hw/mlx5/flow.c
> 
> diff --git a/drivers/infiniband/hw/mlx5/Makefile 
> b/drivers/infiniband/hw/mlx5/Makefile
> index 577e4c418bae..b8e4b15e2674 100644
> +++ b/drivers/infiniband/hw/mlx5/Makefile
> @@ -4,3 +4,4 @@ mlx5_ib-y :=  main.o cq.o doorbell.o qp.o mem.o srq.o mr.o 
> ah.o mad.o gsi.o ib_vi
>  mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
>  mlx5_ib-$(CONFIG_MLX5_ESWITCH) += ib_rep.o
>  mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += devx.o
> +mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += flow.o
> diff --git a/drivers/infiniband/hw/mlx5/flow.c 
> b/drivers/infiniband/hw/mlx5/flow.c
> new file mode 100644
> index ..99409e516c7f
> +++ b/drivers/infiniband/hw/mlx5/flow.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "mlx5_ib.h"
> +
> +#define UVERBS_MODULE_NAME mlx5_ib
> +#include 
> +
> +static const struct uverbs_attr_spec mlx5_ib_flow_type[] = {
> + [MLX5_IB_FLOW_TYPE_NORMAL] = {
> + .type = UVERBS_ATTR_TYPE_PTR_IN,
> + UVERBS_ATTR_TYPE(u16),
> + },
> + [MLX5_IB_FLOW_TYPE_SNIFFER] = {
> + /* No need to specify any data */
> + .type = UVERBS_ATTR_TYPE_PTR_IN,

I think this deserves a macro rather than a repeated
comment.. Especially since we now have have a different version in
uverbs_flow_action_esp_replay:

.type = UVERBS_ATTR_TYPE_PTR_IN,
/* No need to specify any data */
UVERBS_ATTR_SIZE(0, 0),

Something simple like:

#define UVERBS_ATTR_NO_DATA() UVERBS_ATTR_SIZE(0, 0)

> +static int flow_matcher_cleanup(struct ib_uobject *uobject,
> + enum rdma_remove_reason why)
> +{
> + struct mlx5_ib_flow_matcher *obj = uobject->object;
> + int ret = atomic_read(>usecnt) ? -EBUSY : 0;
> +
> + if (ib_is_destroy_retryable(ret, why, uobject))
> + return ret;

This is ib_destroy_usecnt() now

> +static int UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_MATCHER_CREATE)(struct 
> ib_device *ib_dev,
> +struct ib_uverbs_file *file,
> +struct uverbs_attr_bundle *attrs)
> +{
> + struct mlx5_ib_dev *dev = to_mdev(ib_dev);

I have a patch changing these - when working with uobj's the ib_dev
argument should not be used, the ib_dev must come from the ucontext
instead.

> + void *cmd_in = uverbs_attr_get_alloced_ptr(attrs,
> +
> MLX5_IB_ATTR_FLOW_MATCHER_MATCH_MASK);
> + struct ib_uobject *uobj;
> + struct mlx5_ib_flow_matcher *obj;
> + int mask_len;
> + int err;

So this is to be written as
struct ib_uobject *uobj = uverbs_attr_get_uobject(attrs,
 MLX5_IB_ATTR_FLOW_MATCHER_CREATE_HANDLE);
struct mlx5_ib_dev *dev = to_mdev(uobj->context->device);

> +
> + obj = kzalloc(sizeof(struct mlx5_ib_flow_matcher), GFP_KERNEL);
> + if (!obj)
> + return -ENOMEM;
> +
> + obj->mask_len = uverbs_attr_get_len(attrs,
> + 
> MLX5_IB_ATTR_FLOW_MATCHER_MATCH_MASK);
> + memcpy(obj->matcher_mask.match_params, cmd_in, obj->mask_len);

As noted before, memcpying an alloced_ptr doesn't make sense, use the
copy from user version instead.

Jason


Re: [PATCH rdma-next 0/9] Support mlx5 flow steering with RAW data

2018-07-10 Thread Jason Gunthorpe
On Sun, Jul 08, 2018 at 01:24:36PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> >From Yishai:
> 
> This series introduces vendor create and destroy flow methods on the
> uverbs flow object by using the KABI infra-structure.
> 
> It's done in a way that enables the driver to get its specific device
> attributes in a raw data to match its underlay specification while still
> using the generic ib_flow object for cleanup and code sharing.
> 
> In addition, a specific mlx5 matcher object and its create/destroy
> methods were introduced. This object matches the underlay flow steering
> mask specification and is used as part of mlx5 create flow input data.
> 
> This series supports IB_QP/TIR as its flow steering destination as
> applicable today via the ib_create_flow API, however, it adds also an
> option to work with DEVX object which its destination can be both TIR
> and flow table.
> 
> Few changes were done in the mlx5 core layer to support forward
> compatible for the device specification raw data and to support flow
> table when the DEVX destination is used.
> 
> As part of this series the default IB destroy handler
> (i.e. uverbs_destroy_def_handler()) was exposed from IB core to be
> used by the drivers and existing code was refactored to use it.
> 
> Thanks

>   IB: Enable uverbs_destroy_def_handler to be used by drivers

I applied this one

> Yishai Hadas (9):
>   net/mlx5: Add forward compatible support for the FTE match data
>   net/mlx5: Add support for flow table destination number
>   IB/mlx5: Introduce flow steering matcher object
>   IB: Consider ib_flow creation by the KABI infrastructure
>   IB/mlx5: Introduce vendor create and destroy flow methods
>   IB/mlx5: Support adding flow steering rule by raw data
>   IB/mlx5: Add support for a flow table destination
>   IB/mlx5: Expose vendor flow trees

The rest will need to be resent after the comments are addressed.

Thanks,
Jason


Re: [PATCH net-next] qed: Add srq core support for RoCE and iWARP

2018-07-09 Thread Jason Gunthorpe
On Wed, May 30, 2018 at 04:11:37PM +0300, Yuval Bason wrote:
> This patch adds support for configuring SRQ and provides the necessary
> APIs for rdma upper layer driver (qedr) to enable the SRQ feature.
> 
> Signed-off-by: Michal Kalderon 
> Signed-off-by: Ariel Elior 
> Signed-off-by: Yuval Bason 
> ---
>  drivers/net/ethernet/qlogic/qed/qed_cxt.c   |   5 +-
>  drivers/net/ethernet/qlogic/qed/qed_cxt.h   |   1 +
>  drivers/net/ethernet/qlogic/qed/qed_hsi.h   |   2 +
>  drivers/net/ethernet/qlogic/qed/qed_iwarp.c |  23 
>  drivers/net/ethernet/qlogic/qed/qed_main.c  |   2 +
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c  | 179 
> +++-
>  drivers/net/ethernet/qlogic/qed/qed_rdma.h  |   2 +
>  drivers/net/ethernet/qlogic/qed/qed_roce.c  |  17 ++-
>  include/linux/qed/qed_rdma_if.h |  12 +-
>  9 files changed, 235 insertions(+), 8 deletions(-)

Is this a pre-requisite for your related RDMA patches?

If yes, are you proposing that this patch should go via the RDMA tree?

Jason


Re: [PATCH rdma-next 0/3] Dump and fill MKEY

2018-07-04 Thread Jason Gunthorpe
On Wed, Jul 04, 2018 at 09:54:59PM +0300, Leon Romanovsky wrote:
> On Wed, Jul 04, 2018 at 11:47:39AM -0600, Jason Gunthorpe wrote:
> > On Tue, Jun 19, 2018 at 08:47:21AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > MLX5 IB HCA offers the memory key, dump_fill_mkey to increase
> > > performance, when used in a send or receive operations.
> > >
> > > It is used to force local HCA operations to skip the PCI bus access,
> > > while keeping track of the processed length in the ibv_sge handling.
> > >
> > > In this three patch series, we expose various bits in our HW
> > > spec file (mlx5_ifc.h), move unneeded for mlx5_core FW command and
> > > export such memory key to user space thought our mlx5-abi header file.
> > >
> > > Thanks
> >
> > This looks fine, can you send a pull request off the mlx5 branch
> > please?
> 
> Updated mlx5-next with first two commits,
> b183ee27f5fb net/mlx5: Add hardware definitions for dump_fill_mkey
> 4d4fb5dc988a net/mlx5: Limit scope of dump_fill_mkey function

Okay, applied to for-next, with the missing 'if (err)' fixed.

Thanks,
Jason


Re: [PATCH rdma-next 3/3] IB/mlx5: Expose dump and fill memory key

2018-07-04 Thread Jason Gunthorpe
On Tue, Jun 19, 2018 at 08:47:24AM +0300, Leon Romanovsky wrote:
> From: Yonatan Cohen 
> 
> MLX5 IB HCA offers the memory key, dump_fill_mkey to boost
> performance, when used in a send or receive operations.
> 
> It is used to force local HCA operations to skip the PCI bus access,
> while keeping track of the processed length in the ibv_sge handling.
> 
> Meaning, instead of a PCI write access the HCA leaves the target
> memory untouched, and skips filling that packet section. Similar
> behavior is done upon send, the HCA skips data in memory relevant
> to this key and saves PCI bus access.
> 
> This functionality saves PCI read/write operations.
> 
> Signed-off-by: Yonatan Cohen 
> Reviewed-by: Yishai Hadas 
> Reviewed-by: Guy Levi 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/main.c | 16 +++-
>  include/uapi/rdma/mlx5-abi.h  |  3 ++-
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c 
> b/drivers/infiniband/hw/mlx5/main.c
> index c29c7c838980..97113957398d 100644
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -1634,6 +1634,7 @@ static struct ib_ucontext 
> *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
>   int err;
>   size_t min_req_v2 = offsetof(struct mlx5_ib_alloc_ucontext_req_v2,
>max_cqe_version);
> + u32 dump_fill_mkey;
>   bool lib_uar_4k;
>  
>   if (!dev->ib_active)
> @@ -1743,8 +1744,12 @@ static struct ib_ucontext 
> *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
>   }
>  
>   err = mlx5_ib_devx_create(dev, context);
> + }
> +
> + if (MLX5_CAP_GEN(dev->mdev, dump_fill_mkey)) {
> + err = mlx5_cmd_dump_fill_mkey(dev->mdev, _fill_mkey);
>   if (err)
> - goto out_td;
> + goto out_mdev;
>   }

Dropping the if (err) after mlx5_ib_devx_create is a rebasing error,
right?

Jason


Re: [PATCH rdma-next 0/3] Dump and fill MKEY

2018-07-04 Thread Jason Gunthorpe
On Tue, Jun 19, 2018 at 08:47:21AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> MLX5 IB HCA offers the memory key, dump_fill_mkey to increase
> performance, when used in a send or receive operations.
> 
> It is used to force local HCA operations to skip the PCI bus access,
> while keeping track of the processed length in the ibv_sge handling.
> 
> In this three patch series, we expose various bits in our HW
> spec file (mlx5_ifc.h), move unneeded for mlx5_core FW command and
> export such memory key to user space thought our mlx5-abi header file.
> 
> Thanks

This looks fine, can you send a pull request off the mlx5 branch
please?

Thanks,
Jason


Re: [PATCH rdma-next 00/12] RDMA fixes 2018-06-24

2018-06-26 Thread Jason Gunthorpe
On Tue, Jun 26, 2018 at 07:21:26AM +0300, Leon Romanovsky wrote:
> On Mon, Jun 25, 2018 at 03:34:38PM -0600, Jason Gunthorpe wrote:
> > On Sun, Jun 24, 2018 at 11:23:41AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > Hi,
> > >
> > > This is bunch of patches trigged by running syzkaller internally.
> > >
> > > I'm sending them based on rdma-next mainly for two reasons:
> > > 1, Most of the patches fix the old issues and it doesn't matter when
> > > they will hit the Linus's tree: now or later in a couple of weeks
> > > during merge window.
> > > 2. They interleave with code cleanup, mlx5-next patches and Michael's
> > > feedback on flow counters series.
> > >
> > > Thanks
> > >
> > > Leon Romanovsky (12):
> > >   RDMA/uverbs: Protect from attempts to create flows on unsupported QP
> > >   RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
> >
> > I applied these two to for-rc
> >
> > >   RDMA/uverbs: Check existence of create_flow callback
> > >   RDMA/verbs: Drop kernel variant of create_flow
> > >   RDMA/verbs: Drop kernel variant of destroy_flow
> > >   net/mlx5: Rate limit errors in command interface
> > >   RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR
> > >   RDMA/umem: Don't check for negative return value of dma_map_sg_attrs()
> > >   RDMA/uverbs: Remove redundant check
> >
> > These to for-next
> 
> Jason,
> 
> We would like to see patch "[PATCH mlx5-next 05/12] net/mlx5:
> Rate limit errors in command interface" in out mlx5-next. Is it possible
> at this point to drop it from for-next, so I'll be able to take it into
> mlx5-next?

Okay, you got to this while it was still 'wip', so it is dropped. Add
it to the mlx5-next branch and netdev or rdma can pull it next time
there is some reason to pull the branch..

Jason


Re: [PATCH rdma-next 00/12] RDMA fixes 2018-06-24

2018-06-25 Thread Jason Gunthorpe
On Sun, Jun 24, 2018 at 11:23:41AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Hi,
> 
> This is bunch of patches trigged by running syzkaller internally.
> 
> I'm sending them based on rdma-next mainly for two reasons:
> 1, Most of the patches fix the old issues and it doesn't matter when
> they will hit the Linus's tree: now or later in a couple of weeks
> during merge window.
> 2. They interleave with code cleanup, mlx5-next patches and Michael's
> feedback on flow counters series.
> 
> Thanks
> 
> Leon Romanovsky (12):
>   RDMA/uverbs: Protect from attempts to create flows on unsupported QP
>   RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow

I applied these two to for-rc

>   RDMA/uverbs: Check existence of create_flow callback
>   RDMA/verbs: Drop kernel variant of create_flow
>   RDMA/verbs: Drop kernel variant of destroy_flow
>   net/mlx5: Rate limit errors in command interface
>   RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR
>   RDMA/umem: Don't check for negative return value of dma_map_sg_attrs()
>   RDMA/uverbs: Remove redundant check

These to for-next

>   overflow.h: Add arithmetic shift helper
>   RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq
>   RDMA/mlx5: Reuse existed shift_overlow helper

And these will have to be respun.

Thanks,
Jason


Re: [PATCH rdma-next 01/12] RDMA/uverbs: Protect from attempts to create flows on unsupported QP

2018-06-25 Thread Jason Gunthorpe
On Sun, Jun 24, 2018 at 11:23:42AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
> other QP types as an input causes to various unpredictable failures.
> 
> The reason to it that in order to support all various types (e.g. XRC),
> we are supposed to use real_qp handle and not qp handle and give to
> driver/FW to fail such (XRC) flows. Being valuable solution, the simpler
> and safer variant is to ban all QP types except UD and RAW_PACKET,
> instead of relying on driver/FW.
> 
> Cc:  # 3.11
> Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
> Cc: syzkaller 
> Reported-by: Noa Osherovich 
> Signed-off-by: Leon Romanovsky 
> ---
>  drivers/infiniband/core/uverbs_cmd.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> b/drivers/infiniband/core/uverbs_cmd.c
> index 779892b63729..c842a9423fbf 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -3553,14 +3553,20 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file 
> *file,
>   goto err_free_attr;
>   }
>  
> - qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, 
> file->ucontext);
> + qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle,
> +file->ucontext);

This hunk is just whitespace changing

>   if (!qp) {
>   err = -EINVAL;
>   goto err_uobj;
>   }
>  
> + if (qp->qp_type != IB_QPT_UD && qp->qp_type != IB_QPT_RAW_PACKET) {
> + err = -EINVAL;
> + goto err_put;
> + }
> +
>   flow_attr = kzalloc(struct_size(flow_attr, flows,
> - cmd.flow_attr.num_of_specs), GFP_KERNEL);
> + cmd.flow_attr.num_of_specs), 
> GFP_KERNEL);

Same here.

I dropped the two hunks and applied this to for-rc since it has
stable tags.

Jason


Re: [PATCH rdma-next 09/12] RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq

2018-06-25 Thread Jason Gunthorpe
On Mon, Jun 25, 2018 at 11:10:41AM +0300, Leon Romanovsky wrote:
> On Sun, Jun 24, 2018 at 01:56:24PM -0600, Jason Gunthorpe wrote:
> > On Sun, Jun 24, 2018 at 11:23:50AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > [   61.182439] UBSAN: Undefined behaviour in 
> > > drivers/infiniband/hw/mlx5/qp.c:5366:34
> > > [   61.183673] shift exponent 4294967288 is too large for 32-bit type 
> > > 'unsigned int'
> > > [   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 
> > > 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
> > > [   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > BIOS 1.10.2-2.fc27 04/01/2014
> > > [   61.188315] Call Trace:
> > > [   61.188661]  dump_stack+0xc7/0x13b
> > > [   61.190427]  ubsan_epilogue+0x9/0x49
> > > [   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
> > > [   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
> > > [   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
> > > [   61.213892]  ib_uverbs_write+0x77e/0xae0
> > > [   61.248018]  vfs_write+0x121/0x3b0
> > > [   61.249831]  ksys_write+0xa1/0x120
> > > [   61.254024]  do_syscall_64+0x7c/0x2a0
> > > [   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [   61.259211] RIP: 0033:0x7f54bab70e99
> > > [   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 
> > > 48 89 f8 48 89 f7 48 89 d6 48 89
> > > [   61.268678] RSP: 002b:7ffe1541c318 EFLAGS: 0246 ORIG_RAX: 
> > > 0001
> > > [   61.271076] RAX: ffda RBX:  RCX: 
> > > 7f54bab70e99
> > > [   61.273795] RDX: 0070 RSI: 2240 RDI: 
> > > 0003
> > > [   61.276982] RBP: 7ffe1541c330 R08: 200078e0 R09: 
> > > 0002
> > > [   61.280035] R10:  R11: 0246 R12: 
> > > 004005c0
> > > [   61.283279] R13: 7ffe1541c420 R14:  R15: 
> > > 
> > >
> > > Cc:  # 4.7
> > > Fixes: 79b20a6c3014 ("IB/mlx5: Add receive Work Queue verbs")
> > > Cc: syzkaller 
> > > Reported-by: Noa Osherovich 
> > > Signed-off-by: Leon Romanovsky 
> > >  drivers/infiniband/hw/mlx5/qp.c | 6 +-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/infiniband/hw/mlx5/qp.c 
> > > b/drivers/infiniband/hw/mlx5/qp.c
> > > index 6034a670859f..8e40263fd40e 100644
> > > +++ b/drivers/infiniband/hw/mlx5/qp.c
> > > @@ -5377,7 +5377,11 @@ static int set_user_rq_size(struct mlx5_ib_dev 
> > > *dev,
> > >
> > >   rwq->wqe_count = ucmd->rq_wqe_count;
> > >   rwq->wqe_shift = ucmd->rq_wqe_shift;
> > > - rwq->buf_size = (rwq->wqe_count << rwq->wqe_shift);
> > > + rwq->buf_size =
> > > + shift_overflow((size_t)rwq->wqe_count, (size_t)rwq->wqe_shift);
> >
> > The casts are redundant, the function argument is already size_t so
> > implicit promotion is guaranteed.
> 
> rwq->wqe_count and rwq->wqe_shift are declared as u32 and not as size_t.
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/hw/mlx5/mlx5_ib.h#L296

It doesn't matter, passing them to a function accepting size_t does
implicit promotion, the same as the explicit cast.

Jason


Re: [PATCH rdma-next 06/12] RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR

2018-06-24 Thread Jason Gunthorpe
On Sun, Jun 24, 2018 at 11:23:47AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Number of specs is provided by user and in valid case can be equal to zero.
> Such argument causes to call to kcalloc() with zero-length request and in
> return the ZERO_SIZE_PTR is assigned. This pointer is different from NULL
> and makes various if (..) checks to success.

The one seems really weird. There is nothing wrong with ZERO_SIZE_PTR,
but this description and fix suggest that something did

ptr = kalloc(0);
ptr[0] = ...;

Which is not allowed of course. Doesn't this mean there is also a
missing range check someplace?

Jason


Re: [PATCH rdma-next 09/12] RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq

2018-06-24 Thread Jason Gunthorpe
On Sun, Jun 24, 2018 at 11:23:50AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> [   61.182439] UBSAN: Undefined behaviour in 
> drivers/infiniband/hw/mlx5/qp.c:5366:34
> [   61.183673] shift exponent 4294967288 is too large for 32-bit type 
> 'unsigned int'
> [   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 
> 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
> [   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.10.2-2.fc27 04/01/2014
> [   61.188315] Call Trace:
> [   61.188661]  dump_stack+0xc7/0x13b
> [   61.190427]  ubsan_epilogue+0x9/0x49
> [   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
> [   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
> [   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
> [   61.213892]  ib_uverbs_write+0x77e/0xae0
> [   61.248018]  vfs_write+0x121/0x3b0
> [   61.249831]  ksys_write+0xa1/0x120
> [   61.254024]  do_syscall_64+0x7c/0x2a0
> [   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   61.259211] RIP: 0033:0x7f54bab70e99
> [   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 
> f8 48 89 f7 48 89 d6 48 89
> [   61.268678] RSP: 002b:7ffe1541c318 EFLAGS: 0246 ORIG_RAX: 
> 0001
> [   61.271076] RAX: ffda RBX:  RCX: 
> 7f54bab70e99
> [   61.273795] RDX: 0070 RSI: 2240 RDI: 
> 0003
> [   61.276982] RBP: 7ffe1541c330 R08: 200078e0 R09: 
> 0002
> [   61.280035] R10:  R11: 0246 R12: 
> 004005c0
> [   61.283279] R13: 7ffe1541c420 R14:  R15: 
> 
> 
> Cc:  # 4.7
> Fixes: 79b20a6c3014 ("IB/mlx5: Add receive Work Queue verbs")
> Cc: syzkaller 
> Reported-by: Noa Osherovich 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/qp.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> index 6034a670859f..8e40263fd40e 100644
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -5377,7 +5377,11 @@ static int set_user_rq_size(struct mlx5_ib_dev *dev,
>  
>   rwq->wqe_count = ucmd->rq_wqe_count;
>   rwq->wqe_shift = ucmd->rq_wqe_shift;
> - rwq->buf_size = (rwq->wqe_count << rwq->wqe_shift);
> + rwq->buf_size =
> + shift_overflow((size_t)rwq->wqe_count, (size_t)rwq->wqe_shift);

The casts are redundant, the function argument is already size_t so
implicit promotion is guaranteed.

Jason


Re: [PATCH rdma-next 0/2] RoCE ICRC counter

2018-06-22 Thread Jason Gunthorpe
On Thu, Jun 21, 2018 at 03:37:54PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Hi,
> 
> This series exposes RoCE ICRC counter through existing RDMA hw_counters
> sysfs interface.
> 
> First patch has all HW definitions in mlx5_ifc.h file and second patch is
> actual counter implementation.
> 
> Thanks
> 
> Talat Batheesh (2):
>   net/mlx5: Add RoCE RX ICRC encapsulated counter
>   IB/mlx5: Support RoCE ICRC encapsulated error counter
> 
>  drivers/infiniband/hw/mlx5/cmd.c | 12 +++
>  drivers/infiniband/hw/mlx5/cmd.h |  1 +
>  drivers/infiniband/hw/mlx5/main.c| 62 
> ++--
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
>  include/linux/mlx5/mlx5_ifc.h| 11 +--
>  5 files changed, 81 insertions(+), 6 deletions(-)

Applied to rdma for-next with the mellanox/mlx5-next branch

Thanks,
Jason


Re: [PATCH rdma-next 0/2] RoCE ICRC counter

2018-06-21 Thread Jason Gunthorpe
On Thu, Jun 21, 2018 at 03:37:54PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Hi,
> 
> This series exposes RoCE ICRC counter through existing RDMA hw_counters
> sysfs interface.
> 
> First patch has all HW definitions in mlx5_ifc.h file and second patch is
> actual counter implementation.

The RDMA parts are OK, can you please send me the commit for the mlx5
patch when applied?

Thanks,
Jason


Re: [PATCH rdma-next 0/3] Dump and fill MKEY

2018-06-19 Thread Jason Gunthorpe
On Tue, Jun 19, 2018 at 08:47:21AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> MLX5 IB HCA offers the memory key, dump_fill_mkey to increase
> performance, when used in a send or receive operations.
> 
> It is used to force local HCA operations to skip the PCI bus access,
> while keeping track of the processed length in the ibv_sge handling.
> 
> In this three patch series, we expose various bits in our HW
> spec file (mlx5_ifc.h), move unneeded for mlx5_core FW command and
> export such memory key to user space thought our mlx5-abi header file.

Where is the user space for this?

Jason


Re: [PATCH rdma-next v2 00/20] Introduce mlx5 DEVX interface

2018-06-18 Thread Jason Gunthorpe
On Sun, Jun 17, 2018 at 12:59:46PM +0300, Leon Romanovsky wrote:
 
> Leon Romanovsky (2):
>   drm/i915: Move u64-to-ptr helpers to general header
>   kernel.h: Reuse u64_to_ptr macro to cast __user pointers

I dropped these since they are not needed by this series when using a
union.

> Matan Barak (5):
>   IB/uverbs: Export uverbs idr and fd types
>   IB/uverbs: Add PTR_IN attributes that are allocated/copied
> automatically

Revised this one, as noted

>   IB/uverbs: Add a macro to define a type with no kernel known size
>   IB/uverbs: Allow an empty namespace in ioctl() framework
>   IB/uverbs: Refactor uverbs_finalize_objects

I put the above in a branch and can apply them if you ack my revisions..

>   net/mlx5_core: Prevent warns in dmesg upon firmware commands
>   IB/core: Improve uverbs_cleanup_ucontext algorithm

I dropped these two (they are linked), need comments addressed and
resent.

> Yishai Hadas (13):
>   net/mlx5: Expose DEVX ifc structures
>   IB/mlx5: Introduce DEVX
>   IB/core: Introduce DECLARE_UVERBS_GLOBAL_METHODS
>   IB: Expose ib_ucontext from a given ib_uverbs_file
>   IB/mlx5: Add support for DEVX general command
>   IB/mlx5: Add obj create and destroy functionality
>   IB/mlx5: Add DEVX support for modify and query commands
>   IB/mlx5: Add support for DEVX query UAR
>   IB/mlx5: Add DEVX support for memory registration
>   IB/mlx5: Add DEVX query EQN support
>   IB/mlx5: Expose DEVX tree

I put these in a branch also and can apply them, but I need the first
two patches in the mlx5 core branch first please, thanks.

Since this requires so many core patches I think I prefer to merge the
mlx core branch then apply rather merge a branch.

Jason


Re: [PATCH rdma-next v2 06/20] IB/uverbs: Add PTR_IN attributes that are allocated/copied automatically

2018-06-18 Thread Jason Gunthorpe
On Sun, Jun 17, 2018 at 12:59:52PM +0300, Leon Romanovsky wrote:
> From: Matan Barak 
> 
> Adding UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY flag to PTR_IN attributes.
> By using this flag, the parse automatically allocates and copies the
> user-space data. This data is accessible by using uverbs_attr_get_len
> and uverbs_attr_get_alloced_ptr inline accessor functions from the
> handler.
> 
> Signed-off-by: Matan Barak 
> Signed-off-by: Yishai Hadas 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/core/uverbs_ioctl.c | 25 -
>  include/rdma/uverbs_ioctl.h| 25 +
>  2 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_ioctl.c 
> b/drivers/infiniband/core/uverbs_ioctl.c
> index 8cc3e8dad9b5..ee15c9ca788b 100644
> +++ b/drivers/infiniband/core/uverbs_ioctl.c
> @@ -114,7 +114,26 @@ static int uverbs_process_attr(struct ib_device *ibdev,
>   uattr->attr_data.reserved)
>   return -EINVAL;
>  
> - e->ptr_attr.data = uattr->data;
> + if (val_spec->flags & UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY &&
> + uattr->len > sizeof(((struct ib_uverbs_attr *)0)->data)) {

Why open-code uverbs_attr_ptr_is_inline() ?

> diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
> index bd6bba3a6e04..0e6f782727bd 100644
> +++ b/include/rdma/uverbs_ioctl.h
> @@ -65,6 +65,8 @@ enum {
>   UVERBS_ATTR_SPEC_F_MANDATORY= 1U << 0,
>   /* Support extending attributes by length, validate all unknown size == 
> zero  */
>   UVERBS_ATTR_SPEC_F_MIN_SZ_OR_ZERO = 1U << 1,
> + /* Valid only for PTR_IN. Allocate and copy the data inside the parser 
> */
> + UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY = 1U << 2,
>  };
>  
>  /* Specification of a single attribute inside the ioctl message */
> @@ -431,6 +433,17 @@ static inline struct ib_uobject 
> *uverbs_attr_get_uobject(const struct uverbs_att
>   return attr->obj_attr.uobject;
>  }
>  
> +static inline int uverbs_attr_get_len(const struct uverbs_attr_bundle 
> *attrs_bundle,
> +   u16 idx)
> +{
> + const struct uverbs_attr *attr = uverbs_attr_get(attrs_bundle, idx);
> +
> + if (IS_ERR(attr))
> + return PTR_ERR(attr);
> +
> + return attr->ptr_attr.len;
> +}
> +
>  static inline int uverbs_copy_to(const struct uverbs_attr_bundle 
> *attrs_bundle,
>size_t idx, const void *from, size_t size)
>  {
> @@ -457,6 +470,18 @@ static inline bool uverbs_attr_ptr_is_inline(const 
> struct uverbs_attr *attr)
>   return attr->ptr_attr.len <= sizeof(attr->ptr_attr.data);
>  }
>  
> +static inline void *uverbs_attr_get_alloced_ptr(const struct 
> uverbs_attr_bundle *attrs_bundle,
> + u16 idx)
> +{
> + const struct uverbs_attr *attr = uverbs_attr_get(attrs_bundle, idx);
> +
> + if (IS_ERR(attr))
> + return (void *)attr;
> +
> + return uverbs_attr_ptr_is_inline(attr) ? u64_to_ptr(void *, 
> attr->ptr_attr.data) :
> + u64_to_ptr(void, attr->ptr_attr.data);

WTF is this:

   u64_to_ptr(void *, attr->ptr_attr.data) 

That returns attr->ptr_attr.data casted to a void **, then casts it to
a void * - which is identical to u64_to_ptr(void, attr->ptr_attr.data)

It should be >ptr_attr.data. And the return should be const, the
caller shouldn't be mutating the copy.

All this use of u64_to_ptr is ugly needless obfuscation here, and
look, it causes bugs. Use a union.

Like this:

diff --git a/drivers/infiniband/core/uverbs_ioctl.c 
b/drivers/infiniband/core/uverbs_ioctl.c
index 8cc3e8dad9b506..82c5d33195dfc7 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -114,9 +114,27 @@ static int uverbs_process_attr(struct ib_device *ibdev,
uattr->attr_data.reserved)
return -EINVAL;
 
-   e->ptr_attr.data = uattr->data;
e->ptr_attr.len = uattr->len;
e->ptr_attr.flags = uattr->flags;
+
+   if (val_spec->flags & UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY &&
+   !uverbs_attr_ptr_is_inline(e)) {
+   void *p;
+
+   p = kvmalloc(uattr->len, GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   e->ptr_attr.ptr = p;
+
+   if (copy_from_user(p, u64_to_user_ptr(uattr->data),
+  uattr->len)) {
+   kvfree(p);
+   return -EFAULT;
+   }
+   } else {
+   e->ptr_attr.data = uattr->data;
+   }
break;
 
case UVERBS_ATTR_TYPE_IDR:
@@ -201,6 +219,10 @@ static int uverbs_finalize_attrs(struct uverbs_attr_bundle 
*attrs_bundle,
   

Re: [PATCH rdma-next v2 09/20] IB/core: Improve uverbs_cleanup_ucontext algorithm

2018-06-17 Thread Jason Gunthorpe
On Sun, Jun 17, 2018 at 12:59:55PM +0300, Leon Romanovsky wrote:

> +void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool 
> device_removed)
> +{
>   /*
>* Waits for all remove_commit and alloc_commit to finish. Logically, We
>* want to hold this forever as the context is going to be destroyed,
>* but we'll release it since it causes a "held lock freed" BUG message.
>*/
>   down_write(>cleanup_rwsem);
> + while (!list_empty(>uobjects))
> + if (__uverbs_cleanup_ucontext(ucontext, RDMA_REMOVE_DESTROY))
> + /* No entry was cleaned-up successfully during this 
> iteration */
> + break;

No, this isn't right, it must remain REMOVE or CLOSE here. The enum is
a signal to the driver what is going on. DESTROY is only for user
triggered destroy called in a user context.

There needs to be some kind of guarenteed return from the driver that
destroy is failing due to elevated refcounts, and not some other
reason.. This is just checking for any ret?

> - while (!list_empty(>uobjects)) {
> - struct ib_uobject *obj, *next_obj;
> - unsigned int next_order = UINT_MAX;
> + if (!list_empty(>uobjects))
> + __uverbs_cleanup_ucontext(ucontext, device_removed ?
> + RDMA_REMOVE_DRIVER_REMOVE : RDMA_REMOVE_CLOSE);

Failure to cleanup is a driver bug, and should be reported with
WARN_ON. This is also mis using the remove enum, CLOSE is not a
'bigger hammer'

Jason


Re: [PATCH rdma-next v3 00/14] Verbs flow counters support

2018-06-04 Thread Jason Gunthorpe
On Sat, Jun 02, 2018 at 08:04:19AM +0300, Leon Romanovsky wrote:
> On Fri, Jun 01, 2018 at 03:11:49PM -0600, Jason Gunthorpe wrote:
> > On Thu, May 31, 2018 at 04:43:27PM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky 
> > >
> > > Changelog:
> > > v2->v3:
> > >  * Change function mlx5_fc_query signature to hide the details of
> > >internal core driver struct mlx5_fc
> > >  * Add commen to data[] field at struct mlx5_ib_flow_counters_data 
> > > (mlx5-abi.h)
> > >  * Use array of struct mlx5_ib_flow_counters_desc to clarify the output
> > > v1->v2:
> > >  * Removed conversion from struct mlx5_fc* to void*
> > >  * Fixed one place with double space in it
> > >  * Balanced release of hardware handler in case of counters allocation 
> > > failure
> > >  * Added Tested-by
> > >  * Minimize time spent holding mutex lock
> > >  * Fixed deadlock caused by nested lock in error path
> > >  * Protect from handler pointer derefence in the error paths
> >
> > Okay,
> >
> > Acked-by: Jason Gunthorpe 
> >
> > I've revised some of the commit messages, fixed the two bad
> > check-patch warnings, and fixed the patch ordering..
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-counters
> >
> > Please send a PR with the mlx-core bits and above commits.
> 
> Hi,
> 
> I applied two mlx5-next commits to the relevant tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?h=mlx5-next=930821e39d0a5f91ed58fea1692afe04f0fe0e1f
> https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?h=mlx5-next=5f9bf63ae80c4d0e5e986b6c1280bf8174978545
> 
> In first commit, I dropped the words "as used to be", per-Saeed's request.
> 
> The proper signed tag for whole the series is: verbs_flow_counters
> git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git 
> tags/verbs_flow_counters

Okay pulled, thanks

Jason


Re: [PATCH rdma-next v3 00/14] Verbs flow counters support

2018-06-01 Thread Jason Gunthorpe
On Thu, May 31, 2018 at 04:43:27PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Changelog:
> v2->v3:
>  * Change function mlx5_fc_query signature to hide the details of
>internal core driver struct mlx5_fc
>  * Add commen to data[] field at struct mlx5_ib_flow_counters_data 
> (mlx5-abi.h)
>  * Use array of struct mlx5_ib_flow_counters_desc to clarify the output
> v1->v2:
>  * Removed conversion from struct mlx5_fc* to void*
>  * Fixed one place with double space in it
>  * Balanced release of hardware handler in case of counters allocation failure
>  * Added Tested-by
>  * Minimize time spent holding mutex lock
>  * Fixed deadlock caused by nested lock in error path
>  * Protect from handler pointer derefence in the error paths

Okay,

Acked-by: Jason Gunthorpe 

I've revised some of the commit messages, fixed the two bad
check-patch warnings, and fixed the patch ordering..

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-counters

Please send a PR with the mlx-core bits and above commits.

Thanks,
Jason


Re: [PATCH rdma-next v3 05/14] IB/uverbs: Add create/destroy counters support

2018-06-01 Thread Jason Gunthorpe
> diff --git a/drivers/infiniband/core/uverbs_std_types_counters.c 
> b/drivers/infiniband/core/uverbs_std_types_counters.c
> new file mode 100644
> index ..a5bc50ceee13
> +++ b/drivers/infiniband/core/uverbs_std_types_counters.c
> @@ -0,0 +1,100 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
> BSD-2-Clause) */

Check patch tells me this is malformed should be:

// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB

Apparently the WITH Linux-syscall-note is only used in uapi header
files.

> +/*
> + * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + *  - Redistributions of source code must retain the above
> + *copyright notice, this list of conditions and the following
> + *disclaimer.
> + *
> + *  - Redistributions in binary form must reproduce the above
> + *copyright notice, this list of conditions and the following
> + *disclaimer in the documentation and/or other materials
> + *provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */

And this is not a SPDX BSD-2-Clause license, this is the SPDX
Linux-OpenIB license.

Please be careful to use the correct tag with SPDX..

Also can you check if these SPDX tags are what are intended:

include/rdma/restrack.h:/* SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause) 
*/
drivers/infiniband/core/restrack.c:/* SPDX-License-Identifier: (GPL-2.0+ OR 
BSD-3-Clause) */
drivers/infiniband/hw/mlx5/ib_rep.c:/* SPDX-License-Identifier: (GPL-2.0+ OR 
BSD-3-Clause) */
drivers/infiniband/hw/mlx5/ib_rep.h:/* SPDX-License-Identifier: (GPL-2.0+ OR 
BSD-3-Clause) */

I'm not super excited about the license proliferation, so if they
should have been OR Linux-OpenIB as well then please send a patch.

Jason


Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support

2018-05-30 Thread Jason Gunthorpe
On Wed, May 30, 2018 at 06:24:00PM +0300, Yishai Hadas wrote:
> On 5/30/2018 6:06 PM, Jason Gunthorpe wrote:
> >On Wed, May 30, 2018 at 03:31:34PM +0300, Yishai Hadas wrote:
> >>On 5/29/2018 10:56 PM, Jason Gunthorpe wrote:
> >>>On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
> >>>>diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
> >>>>index 508ea8c82da7..ef3f430a7050 100644
> >>>>+++ b/include/uapi/rdma/mlx5-abi.h
> >>>>@@ -443,4 +443,18 @@ enum {
> >>>>  enum {
> >>>>  MLX5_IB_CLOCK_INFO_V1  = 0,
> >>>>  };
> >>>>+
> >>>>+struct mlx5_ib_flow_counters_data {
> >>>>+ __aligned_u64   counters_data;
> >>>>+ __u32   ncounters;
> >>>>+ __u32   reserved;
> >>>>+};
> >>>>+
> >>>>+struct mlx5_ib_create_flow {
> >>>>+ __u32   ncounters_data;
> >>>>+ __u32   reserved;
> >>>>+ /* Following are counters data based on ncounters_data */
> >>>>+ struct mlx5_ib_flow_counters_data data[];
> >>>>+};
> >>>>+
> >>>>  #endif /* MLX5_ABI_USER_H */
> >>>
> >>>This uapi thing still needs to be fixed as I pointed out before.
> >>
> >>In V3 we can go with below, no change in memory layout but it can clarify
> >>the code/usage.
> >>
> >>struct mlx5_ib_flow_counters_desc {
> >> __u32   description;
> >> __u32   index;
> >>};
> >>
> >>struct mlx5_ib_flow_counters_data {
> >> RDMA_UAPI_PTR(struct mlx5_ib_flow_counters_desc *, counters_data);
> >> __u32   ncounters;
> >> __u32   reserved;
> >>};
> >
> >OK, this is what I asked for originally..
> >
> >>struct mlx5_ib_create_flow {
> >> __u32   ncounters_data;
> >> __u32   reserved;
> >> /* Following are counters data based on ncounters_data */
> >> struct mlx5_ib_flow_counters_data data[];
> >>
> >>
> >>>I still can't figure out why this should be a 2d array.
> >>
> >>This comes to support the future case of multiple counters objects/specs
> >>passed with the same flow. There is a need to differentiate mapping data for
> >>each counters object and that is done via the 'ncounters_data' field and the
> >>2d array.
> >
> >This still doesn't make any sense to me. How are these multiple
> >counters objects/specs going to be identified?
> >
> >Basically, what does the array index for data[] mean? Should it match
> >the spec index from the main verb or something?
> >
> 
> Each entry in the data[] should match a corresponding counter object that
> was pointed by a counters spec upon the flow creation.

What if there are a mixture of specs, some with counters and some
without?

The index is just matching the index of the spec? That makes sense.

> >This is a good place for a comment, since the intention is completely
> >opaque here.
> 
> Sure, we'll add comment to clarify the above.

Sure, can leave the flex array then too

Jason


Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support

2018-05-30 Thread Jason Gunthorpe
On Wed, May 30, 2018 at 03:31:34PM +0300, Yishai Hadas wrote:
> On 5/29/2018 10:56 PM, Jason Gunthorpe wrote:
> >On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
> >>diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
> >>index 508ea8c82da7..ef3f430a7050 100644
> >>+++ b/include/uapi/rdma/mlx5-abi.h
> >>@@ -443,4 +443,18 @@ enum {
> >>  enum {
> >>MLX5_IB_CLOCK_INFO_V1  = 0,
> >>  };
> >>+
> >>+struct mlx5_ib_flow_counters_data {
> >>+   __aligned_u64   counters_data;
> >>+   __u32   ncounters;
> >>+   __u32   reserved;
> >>+};
> >>+
> >>+struct mlx5_ib_create_flow {
> >>+   __u32   ncounters_data;
> >>+   __u32   reserved;
> >>+   /* Following are counters data based on ncounters_data */
> >>+   struct mlx5_ib_flow_counters_data data[];
> >>+};
> >>+
> >>  #endif /* MLX5_ABI_USER_H */
> >
> >This uapi thing still needs to be fixed as I pointed out before.
> 
> In V3 we can go with below, no change in memory layout but it can clarify
> the code/usage.
> 
> struct mlx5_ib_flow_counters_desc {
> __u32   description;
> __u32   index;
> };
> 
> struct mlx5_ib_flow_counters_data {
> RDMA_UAPI_PTR(struct mlx5_ib_flow_counters_desc *, counters_data);
> __u32   ncounters;
> __u32   reserved;
> };

OK, this is what I asked for originally..

> struct mlx5_ib_create_flow {
> __u32   ncounters_data;
> __u32   reserved;
> /* Following are counters data based on ncounters_data */
> struct mlx5_ib_flow_counters_data data[];
> 
> 
> >I still can't figure out why this should be a 2d array.
> 
> This comes to support the future case of multiple counters objects/specs
> passed with the same flow. There is a need to differentiate mapping data for
> each counters object and that is done via the 'ncounters_data' field and the
> 2d array.

This still doesn't make any sense to me. How are these multiple
counters objects/specs going to be identified?

Basically, what does the array index for data[] mean? Should it match
the spec index from the main verb or something?

This is a good place for a comment, since the intention is completely
opaque here.

> >A flex array at the end of a struct means that the struct can never be
> >extended again which seems like a terrible idea,
> 
> The header [1] has a fixed size and will always exist even if there will be
> no counters. Future extensions [2] will be added in the memory post the flex
> array which its size depends on 'ncounters_data'. This pattern is used also
> in other extended APIs. [3]
> 
> struct mlx5_ib_create_flow {
> __u32   ncounters_data;
> __u32   reserved;
> [1] /* Header is above 
> 
> /* Following are counters data based on ncounters_data */
> struct mlx5_ib_flow_counters_data data[];
> 
> [2] Future fields.

We could do that.. But we won't - if it comes to it this will have to
move to the new kabi.

> [3] 
> https://elixir.bootlin.com/linux/latest/source/include/uapi/rdma/ib_user_verbs.h#L1145

?? That looks like a normal flex array to me.

Jason


Re: [PATCH mlx5-next 1/3] net/mlx5: Exposing a new mini-CQE format

2018-05-29 Thread Jason Gunthorpe
On Tue, May 29, 2018 at 03:01:27PM -0600, Saeed Mahameed wrote:
> On Sun, 2018-05-27 at 13:42 +0300, Leon Romanovsky wrote:
> > From: Yonatan Cohen 
> > 
> > The new mini-CQE format includes byte-count, checksum
> > and stride index.
> > 
> > Reviewed-by: Yishai Hadas 
> > Reviewed-by: Guy Levi 
> > Signed-off-by: Yonatan Cohen 
> > Signed-off-by: Leon Romanovsky 
> 
> 
> Applied to mlx5-next.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git  mlx5-
> next
> 
> commit-id: ab741b2eed3e456cebd2240d4c9c6be003d5ae72

Thanks, everything is now merged as:

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/jgg-for-next=f3ca0ab114e0de3bbad4c4a537d32fb57aa42f81

Jason


Re: [PATCH rdma-next v2 01/13] IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure

2018-05-29 Thread Jason Gunthorpe
On Tue, May 29, 2018 at 08:49:58PM +, Ruhl, Michael J wrote:
> >From: Jason Gunthorpe [mailto:j...@mellanox.com]
> >Sent: Tuesday, May 29, 2018 4:21 PM
> >To: Ruhl, Michael J 
> >Cc: Leon Romanovsky ; Doug Ledford
> >; Leon Romanovsky ; RDMA
> >mailing list ; Boris Pismenny
> >; Matan Barak ; Raed
> >Salem ; Yishai Hadas ; Saeed
> >Mahameed ; linux-netdev
> >
> >Subject: Re: [PATCH rdma-next v2 01/13] IB/uverbs: Add an ib_uobject getter
> >to ioctl() infrastructure
> >
> >On Tue, May 29, 2018 at 07:31:22PM +, Ruhl, Michael J wrote:
> >> >- struct ib_uverbs_destroy_cq_resp resp;
> >> >  struct ib_uobject *uobj =
> >> >- uverbs_attr_get(attrs,
> >> >UVERBS_ATTR_DESTROY_CQ_HANDLE)->obj_attr.uobject;
> >> >- struct ib_ucq_object *obj = container_of(uobj, struct ib_ucq_object,
> >> >-  uobject);
> >> >+ uverbs_attr_get_uobject(attrs,
> >> >UVERBS_ATTR_DESTROY_CQ_HANDLE);
> >> >+ struct ib_uverbs_destroy_cq_resp resp;
> >> >+ struct ib_ucq_object *obj;
> >> >  int ret;
> >> >
> >> >+ if (IS_ERR(uobj))
> >> >+ return PTR_ERR(uobj);
> >> >+
> >>
> >> I remember a conversation that if an method attribute was mandatory, that
> >you did not need to
> >> test the uobj for error (since it was checked in the infrastructure).
> >
> >Yes.
> >
> >> Is this error check necessary?
> >
> >No
> >
> >But there is no way to check one way or the other at compile time
> >right now, and omitting the check makes smatch mad.
> 
> Is smatch going to get mad at (same patch):

Yes, this is where it already got mad, IIRC :( 

Fixing this whole thing is a todo on my list..

Jason


Re: [PATCH rdma-next v2 01/13] IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure

2018-05-29 Thread Jason Gunthorpe
On Tue, May 29, 2018 at 07:31:22PM +, Ruhl, Michael J wrote:
> >-struct ib_uverbs_destroy_cq_resp resp;
> > struct ib_uobject *uobj =
> >-uverbs_attr_get(attrs,
> >UVERBS_ATTR_DESTROY_CQ_HANDLE)->obj_attr.uobject;
> >-struct ib_ucq_object *obj = container_of(uobj, struct ib_ucq_object,
> >- uobject);
> >+uverbs_attr_get_uobject(attrs,
> >UVERBS_ATTR_DESTROY_CQ_HANDLE);
> >+struct ib_uverbs_destroy_cq_resp resp;
> >+struct ib_ucq_object *obj;
> > int ret;
> >
> >+if (IS_ERR(uobj))
> >+return PTR_ERR(uobj);
> >+
> 
> I remember a conversation that if an method attribute was mandatory, that you 
> did not need to
> test the uobj for error (since it was checked in the infrastructure).

Yes.

> Is this error check necessary?

No

But there is no way to check one way or the other at compile time
right now, and omitting the check makes smatch mad.

We need some more patches to be able to safely omit the check...

Jason


Re: [PATCH rdma-next 0/3] Introduce new mlx5 CQE format

2018-05-29 Thread Jason Gunthorpe
On Sun, May 27, 2018 at 01:42:31PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Introduce new internal to mlx5 CQE format - mini-CQE. It is a CQE in
> compressed form that holds data needed to extra a single full CQE.
> 
> It stride index, byte count and packet checksum.
> 
> Thanks
> 
> Yonatan Cohen (3):
>   net/mlx5: Exposing a new mini-CQE format
>   IB/mlx5: Refactor CQE compression response
>   IB/mlx5: Introduce a new mini-CQE format

Applied to for-next.

Generally taking new uapi patches that are first the list should have
a few weeks of comment period, but since this is just adding a new bit
to an existing driver private api it seems OK to go this merge window.

Thanks,
Jason


Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support

2018-05-29 Thread Jason Gunthorpe
On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
> diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
> index 508ea8c82da7..ef3f430a7050 100644
> +++ b/include/uapi/rdma/mlx5-abi.h
> @@ -443,4 +443,18 @@ enum {
>  enum {
>   MLX5_IB_CLOCK_INFO_V1  = 0,
>  };
> +
> +struct mlx5_ib_flow_counters_data {
> + __aligned_u64   counters_data;
> + __u32   ncounters;
> + __u32   reserved;
> +};
> +
> +struct mlx5_ib_create_flow {
> + __u32   ncounters_data;
> + __u32   reserved;
> + /* Following are counters data based on ncounters_data */
> + struct mlx5_ib_flow_counters_data data[];
> +};
> +
>  #endif /* MLX5_ABI_USER_H */

This uapi thing still needs to be fixed as I pointed out before.

I still can't figure out why this should be a 2d array. I think it
should be written simply as:

struct mlx5_ib_flow_counter_desc {
__u32 description;
__u32 index;
};

struct mlx5_ib_create_flow {
RDMA_UAPI_PTR(struct mlx5_ib_flow_counter_desc, counters_data);
__u32   ncounters;
__u32   reserved;
};

With the corresponding changes elsewhere.

A flex array at the end of a struct means that the struct can never be
extended again which seems like a terrible idea, especially since I
can't fathom why we'd need more that one array of counters and the
current code doesn't even support more than one..

Jason


Re: [PATCH rdma-next v1 11/13] IB/mlx5: Add flow counters binding support

2018-05-28 Thread Jason Gunthorpe
On Sun, May 27, 2018 at 01:23:44PM +0300, Leon Romanovsky wrote:

> + if (!mcounters->hw_cntrs_hndl) {
> + mcounters->hw_cntrs_hndl =
> + (void 
> *)mlx5_fc_create(to_mdev(ibcounters->device)->mdev,
> +false);

No unnecessary casts

> +struct mlx5_ib_flow_counters_data {
> + __aligned_u64   counters_data;

I think this is supposed to use RDMA_UAPI_PTR() these days.

Jason


Re: [PATCH rdma-next 3/3] IB/mlx5: Introduce a new mini-CQE format

2018-05-28 Thread Jason Gunthorpe
On Mon, May 28, 2018 at 07:52:03PM +0300, Yishai Hadas wrote:
> On 5/28/2018 7:11 PM, Jason Gunthorpe wrote:
> >On Sun, May 27, 2018 at 01:42:34PM +0300, Leon Romanovsky wrote:
> >>From: Yonatan Cohen 
> >>
> >>The new mini-CQE format includes the stride index, byte count and
> >>packet checksum.
> >>Stride index is needed for striding WQ feature.
> >>This patch exposes this capability and enables its setting
> >>via mlx5 UHW data as part of query device and cq creation.
> >>
> >>Reviewed-by: Yishai Hadas 
> >>Reviewed-by: Guy Levi 
> >>Signed-off-by: Yonatan Cohen 
> >>Signed-off-by: Leon Romanovsky 
> >>  drivers/infiniband/hw/mlx5/cq.c   | 42 
> >> +--
> >>  drivers/infiniband/hw/mlx5/main.c |  4 
> >>  include/uapi/rdma/mlx5-abi.h  |  2 +-
> >>  3 files changed, 37 insertions(+), 11 deletions(-)
> >>
> >>diff --git a/drivers/infiniband/hw/mlx5/cq.c 
> >>b/drivers/infiniband/hw/mlx5/cq.c
> >>index 7b4ce1a19de0..ad39d64b8108 100644
> >>+++ b/drivers/infiniband/hw/mlx5/cq.c
> >>@@ -751,6 +751,28 @@ static int alloc_cq_frag_buf(struct mlx5_ib_dev *dev,
> >>return 0;
> >>  }
> >>+enum {
> >>+   MLX5_CQE_RES_FORMAT_HASH = 0,
> >>+   MLX5_CQE_RES_FORMAT_CSUM = 1,
> >>+   MLX5_CQE_RES_FORMAT_CSUM_STRIDX = 3,
> >>+};
> >
> >What is this??
> 
> Those are mlx5 device values not uapi.
> 
> >>+static int mini_cqe_res_format_to_hw(struct mlx5_ib_dev *dev, u8 format)
> >>+{
> >>+   switch (format) {
> >>+   case MLX5_IB_CQE_RES_FORMAT_HASH:
> >>+   return MLX5_CQE_RES_FORMAT_HASH;
> >
> >Used here..
> 
> This is some conversion between the uapi to the device values.
> 
> >>+   mini_cqe_format =
> >>+   mini_cqe_res_format_to_hw(dev,
> >>+ ucmd.cqe_comp_res_format);
> >
> >And format comes from a ucmd, so that enum is upai.
> 
> Correct, see mlx5-abi.h as part of this patch.
> 
> >Put it in the right place and put the right comment beside
> >struct mlx5_ib_create_cq's cqe_comp_res_format..
> >
> >And what is wrong with the user space patches? Where is the update to
> >enum mlx5dv_cqe_comp_res_format ? And why is this wrong?
> >
> 
> See the first patch from below PR [1], it brings the new enum value to the
> user area as part of kernel-headers/rdma/mlx5-abi.h.
> 
> [1] https://github.com/linux-rdma/rdma-core/pull/337
> 
> >struct mlx5dv_cq_init_attr {
> > uint64_t comp_mask; /* Use enum mlx5dv_cq_init_attr_mask */
> > uint8_t cqe_comp_res_format; /* Use enum mlx5dv_cqe_comp_res_format 
> > */
> >  ^^
> >
> 
> The user space uses the DV prefix (e.g. MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX),
> no change from previous flags around enum mlx5dv_cqe_comp_res_format.

This still needs eventual cleaning up in the verbs_abi.h way.

But OK, this is not what I thought.

Jason


Re: [PATCH] IB: Revert "remove redundant INFINIBAND kconfig dependencies"

2018-05-28 Thread Jason Gunthorpe
On Fri, May 25, 2018 at 11:29:59PM +0200, Arnd Bergmann wrote:
> Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
> on INFINIBAND. However, when with CONFIG_INIFIBAND=m, this leads to a
> link error when another driver using it is built-in. The
> INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
> a 'bool' symbol that does not force anything to be a module in turn.
> 
> fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
> smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
> net/9p/trans_rdma.o: In function `rdma_request':
> trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
> net/9p/trans_rdma.o: In function `rdma_destroy_trans':
> trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
> trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
> 
> Fixes: 9533b292a7ac ("IB: remove redundant INFINIBAND kconfig dependencies")
> Signed-off-by: Arnd Bergmann 
> Acked-by: Greg Thelen 
> ---
> The patch that introduced the problem has been queued in the
> rdma-fixes/for-rc tree. Please revert the patch before sending
> the branch to Linus.
> ---
>  drivers/infiniband/ulp/srpt/Kconfig | 2 +-
>  drivers/nvme/host/Kconfig   | 2 +-
>  drivers/nvme/target/Kconfig | 2 +-
>  drivers/staging/lustre/lnet/Kconfig | 2 +-
>  fs/cifs/Kconfig | 2 +-
>  net/9p/Kconfig  | 2 +-
>  net/rds/Kconfig | 2 +-
>  net/sunrpc/Kconfig  | 2 +-
>  8 files changed, 8 insertions(+), 8 deletions(-)

Applied to for-rc, thanks

Jason


Re: [PATCH] IB: Revert "remove redundant INFINIBAND kconfig dependencies"

2018-05-28 Thread Jason Gunthorpe
On Fri, May 25, 2018 at 05:32:52PM -0700, Greg Thelen wrote:
> On Fri, May 25, 2018 at 2:32 PM Arnd Bergmann  wrote:
> 
> > Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
> > on INFINIBAND. However, when with CONFIG_INIFIBAND=m, this leads to a
> > link error when another driver using it is built-in. The
> > INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
> > a 'bool' symbol that does not force anything to be a module in turn.
> 
> > fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
> > smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
> > net/9p/trans_rdma.o: In function `rdma_request':
> > trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
> > net/9p/trans_rdma.o: In function `rdma_destroy_trans':
> > trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
> > trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
> 
> > Fixes: 9533b292a7ac ("IB: remove redundant INFINIBAND kconfig
> dependencies")
> > Signed-off-by: Arnd Bergmann 
> 
> Acked-by: Greg Thelen 
> 
> Sorry for the 9533b292a7ac problem.
> At this point the in release cycle, I think Arnd's revert is best.
> 
> If there is interest, I've put a little thought into an alternative fix:
> making INFINIBAND_ADDR_TRANS tristate.  But it's nontrivial.
> So I prefer this simple revert for now.

Is that a normal thing to do?

> Doug: do you need anything from me on this?

I can take the revert..

Jason


Re: [PATCH rdma-next 3/3] IB/mlx5: Introduce a new mini-CQE format

2018-05-28 Thread Jason Gunthorpe
On Sun, May 27, 2018 at 01:42:34PM +0300, Leon Romanovsky wrote:
> From: Yonatan Cohen 
> 
> The new mini-CQE format includes the stride index, byte count and
> packet checksum.
> Stride index is needed for striding WQ feature.
> This patch exposes this capability and enables its setting
> via mlx5 UHW data as part of query device and cq creation.
> 
> Reviewed-by: Yishai Hadas 
> Reviewed-by: Guy Levi 
> Signed-off-by: Yonatan Cohen 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/cq.c   | 42 
> +--
>  drivers/infiniband/hw/mlx5/main.c |  4 
>  include/uapi/rdma/mlx5-abi.h  |  2 +-
>  3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> index 7b4ce1a19de0..ad39d64b8108 100644
> +++ b/drivers/infiniband/hw/mlx5/cq.c
> @@ -751,6 +751,28 @@ static int alloc_cq_frag_buf(struct mlx5_ib_dev *dev,
>   return 0;
>  }
>  
> +enum {
> + MLX5_CQE_RES_FORMAT_HASH = 0,
> + MLX5_CQE_RES_FORMAT_CSUM = 1,
> + MLX5_CQE_RES_FORMAT_CSUM_STRIDX = 3,
> +};

What is this??

> +static int mini_cqe_res_format_to_hw(struct mlx5_ib_dev *dev, u8 format)
> +{
> + switch (format) {
> + case MLX5_IB_CQE_RES_FORMAT_HASH:
> + return MLX5_CQE_RES_FORMAT_HASH;

Used here..

> + mini_cqe_format =
> + mini_cqe_res_format_to_hw(dev,
> +   ucmd.cqe_comp_res_format);

And format comes from a ucmd, so that enum is upai.

Put it in the right place and put the right comment beside
struct mlx5_ib_create_cq's cqe_comp_res_format..

And what is wrong with the user space patches? Where is the update to
enum mlx5dv_cqe_comp_res_format ? And why is this wrong?

struct mlx5dv_cq_init_attr {
uint64_t comp_mask; /* Use enum mlx5dv_cq_init_attr_mask */
uint8_t cqe_comp_res_format; /* Use enum mlx5dv_cqe_comp_res_format */
 ^^

No, it isn't, and there isn't even an enum for it. Are you sure this is
designed right? Looks pretty wrong to me.

Fix it all please, and you need to arrange things to share the uapi
header with dv just like verbs is doing.

No more of this lax attitude toward uapi!

Jason


Re: [net-next] i40iw/i40e: Remove link dependency on i40e

2018-05-23 Thread Jason Gunthorpe
On Wed, May 23, 2018 at 08:03:44AM -0700, Alexander Duyck wrote:
> On Tue, May 22, 2018 at 11:19 PM, Christoph Hellwig  
> wrote:
> > On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> >> > Why would you want to do this? The rdma driver is non-functional
> >> > without the ethernet driver, so why on earth would we want to defeat
> >> > the module dependency mechanism?
> >>
> >> This change is driven by the OSV's like Red Hat, where customer's were
> >> updating the i40e driver, which in turn broke i40iw.
> >
> > Doctor it hurts when I do this..
> >
> > There is no reason to make a mess of our drivers because people are
> > doing things they should haver never done and that aren't supported
> > in Linux.
> >
> > If Intel didn;t offer any out of tree drivers I'm pretty sure no
> > customer would even attempt this.  So fix this where the problem is.
> 
> Are you serious? You are never going to see out-of-tree drivers go
> away. They exist for the simple reason that most customers/OSVs are
> slow to upgrade their kernels so we have people running on a 3.10
> something kernel on their RHEL 7.X and want to use the latest greatest
> hardware.

So provide the i40iw module when providing the i40e upgrade module?

I still can't understand why this is a problem that needs to be
solved in mainline, or why it deserves a special and unique fix to
i40e, or even what the *actual* problem is..

Jason


Re: [net-next] i40iw/i40e: Remove link dependency on i40e

2018-05-22 Thread Jason Gunthorpe
On Tue, May 22, 2018 at 02:50:32PM -0700, Jeff Kirsher wrote:
> On Tue, 2018-05-22 at 15:33 -0600, Jason Gunthorpe wrote:
> > On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> > > On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> > > > On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > > > > From: Sindhu Devale <sindhu.dev...@intel.com>
> > > > > 
> > > > > Currently i40iw is dependent on i40e symbols
> > > > > i40e_register_client and i40e_unregister_client due to
> > > > > which i40iw cannot be loaded without i40e being loaded.
> > > > > 
> > > > > This patch allows RDMA driver to build and load without
> > > > > linking to LAN driver and without LAN driver being loaded
> > > > > first. Once the LAN driver is loaded, the RDMA driver
> > > > > is notified through the netdevice notifiers to register
> > > > > as client to the LAN driver. Add function pointers to IDC
> > > > > register/unregister in the private VSI structure. This
> > > > > allows a RDMA driver to build without linking to i40e.
> > > > 
> > > > Why would you want to do this? The rdma driver is non-functional
> > > > without the ethernet driver, so why on earth would we want to
> > > > defeat
> > > > the module dependency mechanism?
> > > 
> > > This change is driven by the OSV's like Red Hat, where customer's
> > > were
> > > updating the i40e driver, which in turn broke i40iw.
> > 
> > So that isn't a reason to put something into the main line kernel,
> > and
> > I'm deeply skeptical this change is even sane.
> > 
> > It has been a while since I've looked at RH's kernel, but AFAIK,
> > breakage should only happen if the ABIs around
> > i40e_unregister_client/etc change..
> > 
> > So if the i40e module updates breaks the ABI, then stuffing that same
> > broken ABI through a function pointer is *totally* wrong.
> > 
> > Looks like a NAK for the RDMA side.
> 
> The ABI rarely changes, if at all.  The issue OSV's are seeing is that
> upgrading i40e, requires that i40iw be recompiled even though there
> were no updates/changes to the ABI.

Why does it need recompiling? Details please.

Jason


Re: [net-next] i40iw/i40e: Remove link dependency on i40e

2018-05-22 Thread Jason Gunthorpe
On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> > On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > > From: Sindhu Devale <sindhu.dev...@intel.com>
> > > 
> > > Currently i40iw is dependent on i40e symbols
> > > i40e_register_client and i40e_unregister_client due to
> > > which i40iw cannot be loaded without i40e being loaded.
> > > 
> > > This patch allows RDMA driver to build and load without
> > > linking to LAN driver and without LAN driver being loaded
> > > first. Once the LAN driver is loaded, the RDMA driver
> > > is notified through the netdevice notifiers to register
> > > as client to the LAN driver. Add function pointers to IDC
> > > register/unregister in the private VSI structure. This
> > > allows a RDMA driver to build without linking to i40e.
> > 
> > Why would you want to do this? The rdma driver is non-functional
> > without the ethernet driver, so why on earth would we want to defeat
> > the module dependency mechanism?
> 
> This change is driven by the OSV's like Red Hat, where customer's were
> updating the i40e driver, which in turn broke i40iw.

So that isn't a reason to put something into the main line kernel, and
I'm deeply skeptical this change is even sane.

It has been a while since I've looked at RH's kernel, but AFAIK,
breakage should only happen if the ABIs around
i40e_unregister_client/etc change..

So if the i40e module updates breaks the ABI, then stuffing that same
broken ABI through a function pointer is *totally* wrong.

Looks like a NAK for the RDMA side.

Jason


Re: [net-next] i40iw/i40e: Remove link dependency on i40e

2018-05-22 Thread Jason Gunthorpe
On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> From: Sindhu Devale 
> 
> Currently i40iw is dependent on i40e symbols
> i40e_register_client and i40e_unregister_client due to
> which i40iw cannot be loaded without i40e being loaded.
> 
> This patch allows RDMA driver to build and load without
> linking to LAN driver and without LAN driver being loaded
> first. Once the LAN driver is loaded, the RDMA driver
> is notified through the netdevice notifiers to register
> as client to the LAN driver. Add function pointers to IDC
> register/unregister in the private VSI structure. This
> allows a RDMA driver to build without linking to i40e.

Why would you want to do this? The rdma driver is non-functional
without the ethernet driver, so why on earth would we want to defeat
the module dependency mechanism?

Jason


Re: [pull request][for-next 00/15] Mellanox, mlx5 core and netdev updates 2018-05-17

2018-05-18 Thread Jason Gunthorpe
On Fri, May 18, 2018 at 01:03:51PM -0400, David Miller wrote:
> From: Saeed Mahameed 
> Date: Thu, 17 May 2018 18:22:43 -0700
> 
> > Below you can find two pull requests,
> > 
> > 1. mlx5 core updates to be shared for both netdev and RDMA, (patches 1..9)
> >  which is based on the last mlx5-next pull request
> >  
> > The following changes since commit a8408f4e6db775e245f20edf12b13fd58cc03a1c:
> > 
> >   net/mlx5: fix spelling mistake: "modfiy" -> "modify" (2018-05-04 12:11:51 
> > -0700)
> > 
> > are available in the Git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
> > tags/mlx5-updates-2018-05-17
> > 
> > for you to fetch changes up to 10ff5359f883412728ba816046ee3a696625ca02:
> > 
> >   net/mlx5e: Explicitly set source e-switch in offloaded TC rules 
> > (2018-05-17 14:17:35 -0700)
> > 
> > 2. mlx5e netdev updates only for net-next branch (patches 10..15) based on 
> > net-next
> > and the above pull request.
> > 
> > The following changes since commit 538e2de104cfb4ef1acb35af42427bff42adbe4d:
> > 
> >   Merge branch 'net-Allow-more-drivers-with-COMPILE_TEST' (2018-05-17 
> > 17:11:07 -0400)
> > 
> > are available in the Git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
> > tags/mlx5e-updates-2018-05-17
> > 
> > for you to fetch changes up to a228060a7c9ab88597eeac131e4578595d5d46ae:
> > 
> >   net/mlx5e: Add HW vport counters to representor ethtool stats (2018-05-17 
> > 17:48:54 -0700)
> > 
> > Dave, for your convenience you can either pull 1. and then 2. or pull 2.
> > directly.
> 
> Looks good.
> 
> I pulled 1 then I pulled 2.  That seemed to work well.  Particularly
> it allowed me to capture the two different merge commit messages one
> by one.

Does this double up the merge commit though? I see this in Saeed's
tags/mlx5e-updates-2018-05-17 ?

commit 260ab7042e24ccd4407985c6e775e39d064fab2b
Merge: 538e2de104cfb4 10ff5359f88341
Author: Saeed Mahameed 
Date:   Thu May 17 17:47:09 2018 -0700

Merge tag 'mlx5-updates-2018-05-17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patches to use kvfree where needed.

From: Or Gerlitz 

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.

Signed-off-by: Saeed Mahameed 

And this in your tree:

commit 3888ea4e2f1fb2f61e5418adf4b8332107ac0c8f
Merge: 2c47a65b7009eb 10ff5359f88341
Author: David S. Miller 
Date:   Fri May 18 13:00:08 2018 -0400

Merge tag 'mlx5-updates-2018-05-17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:


mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz 

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.


Signed-off-by: David S. Miller 

I think the trouble is the Saeed needs to merge the 'core' stuff to
create the non-core patches for netdev (just like we want to do for
rdma)

So maybe netdev should take the #2 pull request and rdma should
take number #1?

This seems to be working OK from RDMA's side, we have much less netdev
stuff in our tree now which seems good!

Thanks,
Jason


Re: [PATCH v2] {net, IB}/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'

2018-05-15 Thread Jason Gunthorpe
On Sun, May 13, 2018 at 09:00:41AM +0200, Christophe JAILLET wrote:
> When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
> free it.
> 
> Signed-off-by: Christophe JAILLET <christophe.jail...@wanadoo.fr>
> ---
> v1 -> v2: More places to update have been added to the patch
> ---
>  drivers/infiniband/hw/mlx5/cq.c| 2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/vport.c| 6 +++---
>  3 files changed, 5 insertions(+), 5 deletions(-)

I agree with Eric on the need for fixes lines in v3..

> diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> index 77d257ec899b..6d52ea03574e 100644
> --- a/drivers/infiniband/hw/mlx5/cq.c
> +++ b/drivers/infiniband/hw/mlx5/cq.c
> @@ -849,7 +849,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
> ib_udata *udata,
>   return 0;
>  
>  err_cqb:
> - kfree(*cqb);
> + kvfree(*cqb);

For the infiniband part:

Acked-by: Jason Gunthorpe <j...@mellanox.com>

Since this is mostly ethernet, can it go through netdev? thanks

Jason


Re: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes

2018-05-15 Thread Jason Gunthorpe
On Tue, May 15, 2018 at 10:02:08AM -0500, Steve Wise wrote:
> > On Tue, May 15, 2018 at 09:31:27AM -0500, Steve Wise wrote:
> > > > cap net admin is not high enough privledge to see unhashed kernel
> > > > pointers. CAP_RAW_IO? Or follow what printk does?
> > > >
> > >
> > > Do you mean CAP_NET_RAW?  Here's the comments for it:
> > 
> > Nope..
> > 
> > > Func restricted_pointer() from lib/vsprintf.c uses CAP_SYSLOG.  The
> > comment for CAP_SYSLOG:
> > 
> > Yikes, yes, that is probably the required logic here, including the
> > kptr_restrict = 0 thing
> > 
> 
> Let's defer the ktpr_restrict issue for now; I want to finish the initial
> work this cycle, and adding that will likely take too much time.   I'll use
> CAP_SYSLOG and add a FIXME comment.  Ok? 

Lets just drop wr_id instead...

Jason


Re: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes

2018-05-15 Thread Jason Gunthorpe
On Tue, May 15, 2018 at 09:31:27AM -0500, Steve Wise wrote:
> > cap net admin is not high enough privledge to see unhashed kernel
> > pointers. CAP_RAW_IO? Or follow what printk does?
> > 
> 
> Do you mean CAP_NET_RAW?  Here's the comments for it:

Nope..

> Func restricted_pointer() from lib/vsprintf.c uses CAP_SYSLOG.  The comment 
> for CAP_SYSLOG:

Yikes, yes, that is probably the required logic here, including the
kptr_restrict = 0 thing

Jason


Re: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes

2018-05-15 Thread Jason Gunthorpe
On Tue, May 15, 2018 at 08:18:51AM -0500, Steve Wise wrote:
>  
> > > On Mon, May 14, 2018 at 05:04:26PM -0500, Steve Wise wrote:
> > > >
> > > >
> > > > On 5/14/2018 3:41 PM, Jason Gunthorpe wrote:
> > > > > On Mon, May 07, 2018 at 08:53:16AM -0700, Steve Wise wrote:
> > > > >> This enhancement allows printing rdma device-specific state, if
> > provided
> > > > >> by the kernel.  This is done in a generic manner, so rdma tool
> doesn't
> > > > >> need to know about the details of every type of rdma device.
> > > > >>
> > > > >> Driver attributes for a rdma resource are in the form of <key,
> > > > >> [print_type], value> tuples, where the key is a string and the
> value can
> > > > >> be any supported driver attribute.  The print_type attribute, if
> present,
> > > > >> provides a print format to use vs the standard print format for the
> > type.
> > > > >> For example, the default print type for a PROVIDER_S32 value is "%d
> ",
> > > > >> but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe
> > tuple.
> > > > >>
> > > > >> Driver resources are only printed when the -dd flag is present.
> > > > >> If -p is present, then the output is formatted to not exceed 80
> > columns,
> > > > >> otherwise it is printed as a single row to be grep/awk friendly.
> > > > >>
> > > > >> Example output:
> > > > >>
> > > > >> # rdma resource show qp lqpn 1028 -dd -p
> > > > >> link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0
> > > path-mig-state MIGRATED pid 0 comm [nvme_rdma]
> > > > >> sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106
> > > flush_cidx 85 in_use 0
> > > > >> size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41
> wq_pidx
> > > 171 msn 44 rqt_hwaddr 0x2a8a5d00
> > > > >> rqt_size 256 in_use 128 size 130 idx 43 wr_id
> 0x881057c03408 idx
> > > 40 wr_id 0x881057c033f0
> > > > > Hey some of these look like kernel pointers.. That is a no-no.. What
> > > > > is up there?
> > > >
> > > > Nothing is defined as a kernel pointer.  But wr_id is often a pointer
> to
> > > > the kernel rdma application's context...
> > > >
> > > > > The wr_id often contains a pointer, right? So we cannot just pass it
> > > > > to user space..
> > > >
> > > > Hmm.  It is useful for debugging kernel rdma applications.  Perhaps
> > > > these attrs can be only be sent up by the kernel if the capabilities
> > > > allow.  But previous review comments of the kernel series, which is
> now
> > > > merged, forced me to remove passing the capabilities information to
> the
> > > > driver resource fill functions.
> > > >
> > > > So what's the right way to do this?
> > >
> > > The reviewer asked do not pass to drivers whole CAP_.. bits, because
> > > they anyway don't need such granularity.
> > >
> > 
> > Ok thanks.
> 
> How's this?
> 
> diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
> index 6379685..2cf9c5c 100644
> +++ b/include/rdma/restrack.h
> @@ -66,7 +66,8 @@ struct rdma_restrack_root {
>  * Allows rdma drivers to add their own restrack attributes.
>  */
> int (*fill_res_entry)(struct sk_buff *msg,
> - struct rdma_restrack_entry *entry);
> + struct rdma_restrack_entry *entry,
> + bool net_admin_capable);
>  };

cap net admin is not high enough privledge to see unhashed kernel
pointers. CAP_RAW_IO? Or follow what printk does?

Honestly, I don't really know, this hashed pointer stuff is new..

Jason


Re: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes

2018-05-14 Thread Jason Gunthorpe
On Mon, May 07, 2018 at 08:53:16AM -0700, Steve Wise wrote:
> This enhancement allows printing rdma device-specific state, if provided
> by the kernel.  This is done in a generic manner, so rdma tool doesn't
> need to know about the details of every type of rdma device.
> 
> Driver attributes for a rdma resource are in the form of  [print_type], value> tuples, where the key is a string and the value can
> be any supported driver attribute.  The print_type attribute, if present,
> provides a print format to use vs the standard print format for the type.
> For example, the default print type for a PROVIDER_S32 value is "%d ",
> but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe tuple.
> 
> Driver resources are only printed when the -dd flag is present.
> If -p is present, then the output is formatted to not exceed 80 columns,
> otherwise it is printed as a single row to be grep/awk friendly.
> 
> Example output:
> 
> # rdma resource show qp lqpn 1028 -dd -p
> link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0 
> path-mig-state MIGRATED pid 0 comm [nvme_rdma]
> sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106 flush_cidx 
> 85 in_use 0
> size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41 wq_pidx 171 
> msn 44 rqt_hwaddr 0x2a8a5d00
> rqt_size 256 in_use 128 size 130 idx 43 wr_id 0x881057c03408 idx 40 
> wr_id 0x881057c033f0

Hey some of these look like kernel pointers.. That is a no-no.. What
is up there?

The wr_id often contains a pointer, right? So we cannot just pass it
to user space..

Jason


[PATCH for-rc] uapi: Fix SPDX tags for files referring to the 'OpenIB.org' license

2018-04-20 Thread Jason Gunthorpe
Based on discussion with Kate Stewart this license is not a
BSD-2-Clause, but is now formally identified as Linux-OpenIB
by SPDX.

The key difference between the licenses is in the 'warranty'
paragraph.

if_infiniband.h refers to the 'OpenIB.org' license, but
does not include the text, instead it links to an obsolete
web site that contains a license that matches the BSD-2-Clause
SPX. There is no 'three clause' version of the OpenIB.org
license.

Signed-off-by: Jason Gunthorpe <j...@mellanox.com>
---
 include/uapi/linux/if_infiniband.h  | 2 +-
 include/uapi/linux/rds.h| 2 +-
 include/uapi/linux/tls.h| 2 +-
 include/uapi/rdma/cxgb3-abi.h   | 2 +-
 include/uapi/rdma/cxgb4-abi.h   | 2 +-
 include/uapi/rdma/hns-abi.h | 2 +-
 include/uapi/rdma/ib_user_cm.h  | 2 +-
 include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +-
 include/uapi/rdma/ib_user_mad.h | 2 +-
 include/uapi/rdma/ib_user_sa.h  | 2 +-
 include/uapi/rdma/ib_user_verbs.h   | 2 +-
 include/uapi/rdma/mlx4-abi.h| 2 +-
 include/uapi/rdma/mlx5-abi.h| 2 +-
 include/uapi/rdma/mthca-abi.h   | 2 +-
 include/uapi/rdma/nes-abi.h | 2 +-
 include/uapi/rdma/qedr-abi.h| 2 +-
 include/uapi/rdma/rdma_user_cm.h| 2 +-
 include/uapi/rdma/rdma_user_ioctl.h | 2 +-
 include/uapi/rdma/rdma_user_rxe.h   | 2 +-
 19 files changed, 19 insertions(+), 19 deletions(-)

I propose to send this patch through the RDMA tree.

diff --git a/include/uapi/linux/if_infiniband.h 
b/include/uapi/linux/if_infiniband.h
index 050b92dcf8cf40..0fc33bf30e45a1 100644
--- a/include/uapi/linux/if_infiniband.h
+++ b/include/uapi/linux/if_infiniband.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-3-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
 /*
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index a66b213de3d7a4..20c6bd0b00079e 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
Linux-OpenIB) */
 /*
  * Copyright (c) 2008 Oracle.  All rights reserved.
  *
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index c6633e97eca40b..ff02287495ac56 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
Linux-OpenIB) */
 /*
  * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
  *
diff --git a/include/uapi/rdma/cxgb3-abi.h b/include/uapi/rdma/cxgb3-abi.h
index 9acb4b7a624633..85aed672f43e65 100644
--- a/include/uapi/rdma/cxgb3-abi.h
+++ b/include/uapi/rdma/cxgb3-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
Linux-OpenIB) */
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
  *
diff --git a/include/uapi/rdma/cxgb4-abi.h b/include/uapi/rdma/cxgb4-abi.h
index 1fefd0140c26f6..a159ba8dcf8f13 100644
--- a/include/uapi/rdma/cxgb4-abi.h
+++ b/include/uapi/rdma/cxgb4-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
Linux-OpenIB) */
 /*
  * Copyright (c) 2009-2010 Chelsio, Inc. All rights reserved.
  *
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index 7092c8de4bd883..78613b609fa846 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
Linux-OpenIB) */
 /*
  * Copyright (c) 2016 Hisilicon Limited.
  *
diff --git a/include/uapi/rdma/ib_user_cm.h b/include/uapi/rdma/ib_user_cm.h
index 4a8f9562f7cd9b..e2709bb8cb1802 100644
--- a/include/uapi/rdma/ib_user_cm.h
+++ b/include/uapi/rdma/ib_user_cm.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
Linux-OpenIB) */
 /*
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Intel Corporation.  All rights reserved.
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h 
b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 04e46ea517d328..625545d862d7e4 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -1,

Re: [PATCH] net/mlx5: remove some extraneous spaces in indentations

2018-04-10 Thread Jason Gunthorpe
On Tue, Apr 10, 2018 at 05:22:54PM -0600, Jason Gunthorpe wrote:
> On Mon, Apr 09, 2018 at 01:43:36PM +0100, Colin Ian King wrote:
> > From: Colin Ian King <colin.k...@canonical.com>
> > 
> > There are several lines where there is an extraneous space causing
> > indentation misalignment. Remove them.
> > 
> > Cleans up Cocconelle warnings:
> > 
> > ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:409:3-18: code aligned
> > with following code on line 410
> > ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:415:3-18: code aligned
> > with following code on line 416
> > ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:421:3-18: code aligned
> > with following code on line 422
> > 
> > Signed-off-by: Colin Ian King <colin.k...@canonical.com>
> >  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 18 +-
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> applied to for-next thanks

Oh wait, this is for netdev, not rdma sorry. Never mind, DaveM should
pick it up.

Jason


Re: [PATCH] net/mlx5: remove some extraneous spaces in indentations

2018-04-10 Thread Jason Gunthorpe
On Mon, Apr 09, 2018 at 01:43:36PM +0100, Colin Ian King wrote:
> From: Colin Ian King 
> 
> There are several lines where there is an extraneous space causing
> indentation misalignment. Remove them.
> 
> Cleans up Cocconelle warnings:
> 
> ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:409:3-18: code aligned
> with following code on line 410
> ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:415:3-18: code aligned
> with following code on line 416
> ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:421:3-18: code aligned
> with following code on line 422
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)

applied to for-next thanks

Jason


Re: [PATCH v2 net-next] qed*: Utilize FW 8.33.11.0

2018-03-28 Thread Jason Gunthorpe
On Wed, Mar 28, 2018 at 11:42:16AM +0300, Michal Kalderon wrote:
> This FW contains several fixes and features
> 
> RDMA Features
> - SRQ support
> - XRC support
> - Memory window support
> - RDMA low latency queue support
> - RDMA bonding support
> 
> RDMA bug fixes
> - RDMA remote invalidate during retransmit fix
> - iWARP MPA connect interop issue with RTR fix
> - iWARP Legacy DPM support
> - Fix MPA reject flow
> - iWARP error handling
> - RQ WQE validation checks
> 
> MISC
> - Fix some HSI types endianity
> - New Restriction: vlan insertion in core_tx_bd_data can't be set
>   for LB packets
> 
> ETH
> - HW QoS offload support
> - Fix vlan, dcb and sriov flow of VF sending a packet with
>   inband VLAN tag instead of default VLAN
> - Allow GRE version 1 offloads in RX flow
> - Allow VXLAN steering
> 
> iSCSI / FcoE
> - Fix bd availability checking flow
> - Support 256th sge proerly in iscsi/fcoe retransmit
> - Performance improvement
> - Fix handle iSCSI command arrival with AHS and with immediate
> - Fix ipv6 traffic class configuration
> 
> DEBUG
> - Update debug utilities
> 
> Signed-off-by: Michal Kalderon <michal.kalde...@cavium.com>
> Signed-off-by: Tomer Tayar <tomer.ta...@cavium.com>
> Signed-off-by: Manish Rangankar <manish.rangan...@cavium.com>
> Signed-off-by: Ariel Elior <ariel.el...@cavium.com>
> ---
> Changes from v1
>   Remove version bump and qedr module version addition
> ---
>  drivers/infiniband/hw/qedr/qedr_hsi_rdma.h |4 +-
>  drivers/infiniband/hw/qedr/verbs.c |4 +-
>  drivers/net/ethernet/qlogic/qed/qed_debug.c|  415 +++--
>  drivers/net/ethernet/qlogic/qed/qed_dev.c  |4 +-
>  drivers/net/ethernet/qlogic/qed/qed_hsi.h  | 1892 
> ++--
>  .../net/ethernet/qlogic/qed/qed_init_fw_funcs.c|  103 +-
>  drivers/net/ethernet/qlogic/qed/qed_iwarp.c|7 -
>  drivers/net/ethernet/qlogic/qed/qed_l2.c   |2 +-
>  drivers/net/ethernet/qlogic/qed/qed_ll2.c  |   13 -
>  include/linux/qed/common_hsi.h |2 +-
>  include/linux/qed/eth_common.h |2 +-
>  include/linux/qed/iscsi_common.h   |4 +-
>  include/linux/qed/rdma_common.h|2 +
>  include/linux/qed/roce_common.h|3 +
>  14 files changed, 1310 insertions(+), 1147 deletions(-)

For drivers/infiniband:

Acked-by: Jason Gunthorpe <j...@mellanox.com>

Jason


Re: [PATCH net-next] qed*: Utilize FW 8.33.11.0

2018-03-27 Thread Jason Gunthorpe
On Tue, Mar 27, 2018 at 08:50:24PM +0300, Leon Romanovsky wrote:
> On Tue, Mar 27, 2018 at 05:41:51PM +, Kalderon, Michal wrote:
> > > From: Jason Gunthorpe [mailto:j...@ziepe.ca]
> > > Sent: Tuesday, March 27, 2018 12:18 AM
> > > > diff --git a/drivers/infiniband/hw/qedr/main.c
> > > > b/drivers/infiniband/hw/qedr/main.c
> > > > index db4bf97..7dbbe6d 100644
> > > > +++ b/drivers/infiniband/hw/qedr/main.c
> > > > @@ -51,6 +51,7 @@
> > > >  MODULE_DESCRIPTION("QLogic 40G/100G ROCE Driver");
> > > > MODULE_AUTHOR("QLogic Corporation");  MODULE_LICENSE("Dual
> > > BSD/GPL");
> > > > +MODULE_VERSION(QEDR_MODULE_VERSION);
> > > >
> > > >  #define QEDR_WQ_MULTIPLIER_DFT (3)
> > > >
> > > > diff --git a/drivers/infiniband/hw/qedr/qedr.h
> > > > b/drivers/infiniband/hw/qedr/qedr.h
> > > > index 86d4511..ab0d411 100644
> > > > +++ b/drivers/infiniband/hw/qedr/qedr.h
> > > > @@ -43,6 +43,8 @@
> > > >  #include "qedr_hsi_rdma.h"
> > > >
> > > >  #define QEDR_NODE_DESC "QLogic 579xx RoCE HCA"
> > > > +#define QEDR_MODULE_VERSION "8.33.11.20"
> > > > +
> > >
> > > I thought we had a general prohibition against versions like
> > > this in mainline drivers? And what does this hunk have to do
> > > with supporting new firmware?
> > >
> > I'm assuming you refer only to rdma in regards to version
> > prohibition right ? as looking at all other vendors (including
> > Mellanox) all have module versions under net/ why is rdma
> > different in this way ?  I now searched back mails on the topic
> > and found an email from Leon where he stated: " I am strongly
> > against module versions. You should rely on official kernel
> > version."  But it's not always the inbox driver that is installed
> > or probed, the kernel version is not enough.  Given different
> > distros, vanilla kernels, out of box drivers, etc... it is
> > essential for us that based on logs And modinfo we can determine
> > the qed* drivers that are running.
> 
> We actually stopped to maintain driver versions, just ensure that inbox,
> upstream and MLNX_OFED have different names.
> 
> The discussion thread is here
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-June/004426.html
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-June/004441.html

Hmm, Linus pretty clearly said No to MODULE_VERSION and related.

So I can't take this hunk, and you shouldn't do in ethernet either, I
guess.

Honestly the idea that this version will somehow have meaning in the
distro kernels is pretty far fetched. You think distros will backport
patches changing version # in any way that will make some kind of
sense?

Jason


Re: [PATCH v2 iproute2-next 3/6] rdma: Add CM_ID resource tracking information

2018-03-27 Thread Jason Gunthorpe
On Tue, Mar 27, 2018 at 06:15:44PM +0300, Leon Romanovsky wrote:
> On Tue, Mar 27, 2018 at 08:44:55AM -0600, Jason Gunthorpe wrote:
> > On Tue, Mar 27, 2018 at 06:21:41AM +0300, Leon Romanovsky wrote:
> > > On Mon, Mar 26, 2018 at 04:30:33PM -0600, Jason Gunthorpe wrote:
> > > > On Mon, Mar 26, 2018 at 04:34:44PM -0500, Steve Wise wrote:
> > > > >
> > > > > On 3/26/2018 4:15 PM, Jason Gunthorpe wrote:
> > > > > > On Mon, Mar 26, 2018 at 09:30:41AM -0500, Steve Wise wrote:
> > > > > >>
> > > > > >> On 3/26/2018 9:17 AM, David Ahern wrote:
> > > > > >>> On 2/27/18 9:07 AM, Steve Wise wrote:
> > > > > >>>> diff --git a/rdma/rdma.h b/rdma/rdma.h
> > > > > >>>> index 5809f70..e55205b 100644
> > > > > >>>> +++ b/rdma/rdma.h
> > > > > >>>> @@ -18,10 +18,12 @@
> > > > > >>>>  #include 
> > > > > >>>>  #include 
> > > > > >>>>  #include 
> > > > > >>>> +#include 
> > > > > >>>>
> > > > > >>>>  #include "list.h"
> > > > > >>>>  #include "utils.h"
> > > > > >>>>  #include "json_writer.h"
> > > > > >>>> +#include 
> > > > > >>>>
> > > > > >>> did you forget to add rdma_cma.h? I don't see that file in my 
> > > > > >>> repo.
> > > > > >> It is provided by the rdma-core package, upon which rdma tool now
> > > > > >> depends for the rdma_port_space enum.
> > > > > > It is a kernel bug that enum is not in an include/uapi/rdma header
> > > > > >
> > > > > > Fix it there and don't try to use rdma-core headers to get kernel 
> > > > > > ABI.
> > > > > >
> > > > > > Jason
> > > > >
> > > > > I wish you'd commented on this just a little sooner.  I just resent v3
> > > > > of this series... with rdma_cma.h included. :)
> > > > >
> > > > > How about the restrack/nldev code just translates the port space from
> > > > > enum rdma_port_space to a new ABI enum, say nldev_rdma_port_space, 
> > > > > that
> > > > > i add to rdma_netlink.h?  I'd hate to open the can of worms of trying 
> > > > > to
> > > > > split rdma_cma.h into uabi and no uabi headers. :(
> > > >
> > > > If port space is already part of the ABI there isn't much reason to
> > > > translate it.
> > > >
> > > > You just need to pick the right header to put it in, since it is a verbs
> > > > define it doesn't belong in the netlink header.
> > >
> > > I completely understand Steve's concerns.
> > >
> > > I tried to do such thing (expose kernel headers) in first incarnation of
> > > rdmatool with attempt to clean IB/core as well to ensure that we won't 
> > > expose
> > > anything that is not implemented. It didn't go well.
> >
> > rdma-core is now using the kernel uapi/ headers natively, seems to be
> > going OK. What problem did you face?
> 
> I didn't agree to move to UAPI defines which are not implemented and not
> used in the kernel, so I sent small number of patches similar to those [1, 2].
> 
> Those patches were rejected.
> 
> So please don't mix Steve's need to use 3 defines with very large and
> painful task to expose proper UAPIs.

Steve can just move the 3 defines he needs to the uapi, we are doing
this incrementally..

rdma_core does not define kernel ABI and it is totally wrong to use
random constants from rdma_cma.h as kernel ABI.

Jason


  1   2   3   >