RE: Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Hefty, Sean
> It is acking that message from user was received by kernel and now
> processing. The message from kernel to user are anyway unreliable [1],
> so I don't understand on which handshake you are talking.
> 
> [1] "reliable transmissions from kernel to user are impossible in any
> case"
> https://linux.die.net/man/7/netlink

The SA messages are kernel initiated messages that target a user space daemon 
(e.g. ibacm) for cached PR data.  If that fails because the daemon isn't 
running or the message is lost, the normal retry mechanism in the kernel should 
kick in and try sending the SA message to the remote (i.e. real) SA.

- Sean


RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-08 Thread Hefty, Sean
> > Even reading your statement twice did not make me any wiser.
> > You mentioned "better hardware resource usage". Compared to what? Is
> that
> > perhaps compared to IPoIB?  Since Ethernet frames have an extra
> header and
> > are larger than IPoIB frames, how can larger frames result in better
> hardware
> > resource usage?
> 
> Yes, as compared to IPoIB.  The problem with IPoIB is it introduces a
> significant amount of Verbs overhead which is not needed for Ethernet
> encapsulation.  Especially on hardware such as ours.  As Jason has
> mentioned having a more generic "skb_send" or "skb_qp" has been
> discussed in the past.

Let's start discussing whether ipoib should be in the upstream kernel while 
we're at this nonsense.

The IBTA chose to encapsulate IP over IB transport as their mechanism for 
supporting traditional socket based applications.  OPA chose Ethernet 
encapsulated over OPA link layer.  RDMA isn't involved with OPA.  This is a 
pointless discussion over architecture.



RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Hefty, Sean
> This may have been stated before, but what is missing from this
> description
> is an explanation of why accepting an Ethernet over RDMA driver in the
> upstream kernel is considered useful. We already have an IPoIB driver,
> so
> why to add another driver to the kernel tree for communicating IP
> packets
> over an RDMA network?

This is Ethernet - not IP - encapsulation over a non-InfiniBand device/protocol.


RE: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Hefty, Sean
> I didn't read patches yet, and prefer to ask it in advance. Does this
> new ULP work with all
> drivers/infiniband/hw/* devices as it is expected from ULP?

Like the way ipoib or srp work with all hw devices?  What is the real point of 
this question?


RE: [RFC v2 03/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) netdev

2016-12-15 Thread Hefty, Sean
> This goes on the network too? Also looks like it has endian problems.

I don't think OPA supports BE systems, and I think it uses LE on the wire for 
at least some portions of its protocol.


RE: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver

2016-11-29 Thread Hefty, Sean
> You are not making a subsystem. Don't overcomplicate things. A
> multi-part device device can just directly link.

The VNIC may be usable over multiple generations of HFIs, but I don't know if 
that is the intent.


RE: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events

2015-06-16 Thread Hefty, Sean
> The idea is to allow SIDR request to be sorted by the GID, when we will
> have alias GIDs for IPoIB.

Please limit this series, or at least the early patches in this series, to 
simply moving the de-mux out of the ib_cm and into the rdma_cm.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 07/11] IB/cma: Helper functions to access port space IDRs

2015-06-15 Thread Hefty, Sean
> Add helper functions to access the IDRs by port-space and port number.
> 
> Pass around the port-space enum in cma.c instead of using pointers to
> port-space IDRs.

What is the motivation for this change?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 06/11] IB/cma: Refactor RDMA IP CM private-data parsing code

2015-06-15 Thread Hefty, Sean
> -static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id
> *listen_id,
> -  struct ib_cm_event *ib_event)
> +static u16 cma_port_from_service_id(__be64 service_id)
>  {
> - struct cma_hdr *hdr;
> + return be64_to_cpu(service_id);
> +}

Nit - Does the compiler not complain about the cast from u64 to u16?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 05/11] IB/cm: Share listening CM IDs

2015-06-15 Thread Hefty, Sean
> @@ -722,6 +725,7 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device
> *device,
>   INIT_LIST_HEAD(&cm_id_priv->work_list);
>   atomic_set(&cm_id_priv->work_count, -1);
>   atomic_set(&cm_id_priv->refcount, 1);
> + cm_id_priv->listen_sharecount = 1;

This is setting the listen count before we know whether the cm_id will actually 
be used to listen.


>   return &cm_id_priv->id;
> 
>  error:
> @@ -847,11 +851,21 @@ retest:
>   spin_lock_irq(&cm_id_priv->lock);
>   switch (cm_id->state) {
>   case IB_CM_LISTEN:
> - cm_id->state = IB_CM_IDLE;
>   spin_unlock_irq(&cm_id_priv->lock);
> +
>   spin_lock_irq(&cm.lock);
> + if (--cm_id_priv->listen_sharecount > 0) {
> + /* The id is still shared. */
> + atomic_dec(&cm_id_priv->refcount);
> + spin_unlock_irq(&cm.lock);
> + return;
> + }
>   rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
>   spin_unlock_irq(&cm.lock);
> +
> + spin_lock_irq(&cm_id_priv->lock);
> + cm_id->state = IB_CM_IDLE;
> + spin_unlock_irq(&cm_id_priv->lock);

Why is the state being changed?  The cm_id is about to be freed anyway.


>   break;
>   case IB_CM_SIDR_REQ_SENT:
>   cm_id->state = IB_CM_IDLE;
> @@ -929,11 +943,32 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
>  }
>  EXPORT_SYMBOL(ib_destroy_cm_id);
> 
> -int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64
> service_mask,
> -  struct ib_cm_compare_data *compare_data)
> +/**
> + * __ib_cm_listen - Initiates listening on the specified service ID for
> + *   connection and service ID resolution requests.
> + * @cm_id: Connection identifier associated with the listen request.
> + * @service_id: Service identifier matched against incoming connection
> + *   and service ID resolution requests.  The service ID should be
> specified
> + *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
> + *   assign a service ID to the caller.
> + * @service_mask: Mask applied to service ID used to listen across a
> + *   range of service IDs.  If set to 0, the service ID is matched
> + *   exactly.  This parameter is ignored if %service_id is set to
> + *   IB_CM_ASSIGN_SERVICE_ID.
> + * @compare_data: This parameter is optional.  It specifies data that
> must
> + *   appear in the private data of a connection request for the specified
> + *   listen request.
> + * @lock: If set, lock the cm.lock spin-lock when adding the id to the
> + *   listener tree. When false, the caller must already hold the spin-
> lock,
> + *   and compare_data must be NULL.
> + */
> +static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
> +   __be64 service_mask,
> +   struct ib_cm_compare_data *compare_data,
> +   bool lock)
>  {
>   struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
> - unsigned long flags;
> + unsigned long flags = 0;
>   int ret = 0;
> 
>   service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
> @@ -959,7 +994,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64
> service_id, __be64 service_mask,
> 
>   cm_id->state = IB_CM_LISTEN;
> 
> - spin_lock_irqsave(&cm.lock, flags);
> + if (lock)
> + spin_lock_irqsave(&cm.lock, flags);

I'm not a fan of this sort of locking structure.  Why not just move the locking 
into the outside calls completely?  I.e. move to ib_cm_listen() instead of 
passing in true.


>   if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
>   cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
>   cm_id->service_mask = ~cpu_to_be64(0);
> @@ -968,7 +1004,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64
> service_id, __be64 service_mask,
>   cm_id->service_mask = service_mask;
>   }
>   cur_cm_id_priv = cm_insert_listen(cm_id_priv);
> - spin_unlock_irqrestore(&cm.lock, flags);
> + if (lock)
> + spin_unlock_irqrestore(&cm.lock, flags);
> 
>   if (cur_cm_id_priv) {
>   cm_id->state = IB_CM_IDLE;
> @@ -978,8 +1015,86 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64
> service_id, __be64 service_mask,
>   }
>   return ret;
>  }
> +
> +int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64
> service_mask,
> +  struct ib_cm_compare_data *compare_data)
> +{
> + return __ib_cm_listen(cm_id, service_id, service_mask, compare_data,
> +   true);
> +}
>  EXPORT_SYMBOL(ib_cm_listen);
> 
> +/**
> + * Create a new listening ib_cm_id and listen on the given service ID.
> + *
> + * If there's an existing ID listening on that same device and service
> ID,
> + * return it.
> + *
> + * @device: Device associated with the cm_id.  All related communication
> will
> + * be asso

RE: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events

2015-06-15 Thread Hefty, Sean
>  drivers/infiniband/core/cm.c | 7 +++
>  include/rdma/ib_cm.h | 2 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index c5f5f89e274a..46f99ec4080a 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work
> *work,
>   param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
>   param->listen_id = listen_id;
>   param->service_id = sidr_req_msg->service_id;
> + if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) {
> + param->grh = 1;
> + memcpy(¶m->dgid, &work->mad_recv_wc->recv_buf.grh->dgid,
> +sizeof(param->dgid));
> + } else {
> + param->grh = 0;

What is the use case here?  Are you trying to sort by device?  How does the GID 
of the GMP relate to the listen?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 03/11] IB/cm: Expose service ID in request events

2015-06-15 Thread Hefty, Sean
> Expose the service ID on an incoming CM or SIDR request to the event
> handler. This will allow the RDMA CM module to de-multiplex connection
> requests based on the information encoded in the service ID.
> 
> Signed-off-by: Haggai Eran 

Acked-by: Sean Hefty 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids

2015-05-21 Thread Hefty, Sean
> It remains to clean up ib_cm's ib_cm_listen interface now that
> compare_data isn't used, but I'm not sure this belongs in this series.

This patch series is changing the behavior that the compare data solves.  
Currently, the ib_cm handles all of the multiplexing for the rdma_cm -- that's 
the reason for the compare data.  This series changes that such that the ib_cm 
would handle half the multiplexing, with the other half handled by the rdma_cm. 
 We should not insert that sort of split.  So, I disagree that this isn't part 
of this series.

Either do all of the multiplexing in the ib_cm -- without exposing it to inet 
(add port/pkey filtering if that works) -- or move all of it out.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids

2015-05-19 Thread Hefty, Sean
> I find Haggai's argument compelling, it is a very small amount of code
> and data to add a sharing count, and a very large amount to duplicate
> the whole service id map into cma.c.

I get wanting to share the listen list, but we end up with this:

> drivers/infiniband/core/cm.c   | 129 +-

that's more code than what the ib_cm module uses to track listens, and the 
result is a solution that ends up being split in a weird fashion between the 
ib_cm, iw_cm (TBD), and rdma_cm.

I wonder if the existing ib_cm interface is even what we want.  Currently, the 
rdma_cm pushes the private data (i.e. IP address) comparison into the ib_cm.  
This is only used by the rdma_cm.  Should that instead be moved out of the 
ib_cm and handled in the rdma_cm?  And then update the ib_cm to support 
multiple listens on the same service id.

For example, the ib_cm could simply queue all listen requests on the same 
service id.  When a REQ arrives, it just calls each listener back until one 
'claims' the REQ.  The destruction of a listener could then be synced with the 
callback.  From what I can tell, the current proposal requires that the ib_cm 
user be prepared to receive a REQ callback for a listen that it has already 
destroyed.  If needed, a flag can be added to the ib_cm_listen to indicate if 
the service id is shared/exclusive.

- Sean 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html