Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-17 Thread Liran Liss
If we have a dedicated ABI call for this mapping, then it seems reasonable to 
have it device independent.
However, this mapping is really only used when creating address handles.

So, we can base the mapping on the (device specific) create_ah() flow, but 
provide generic mapping functions for all devices to use (this is kind of what 
happens now).
Also, using create_ah() doesn't introduce an ABI call that is specific to 
ib--eth mappings.

This is similar to how device-specific ib_reg_user_mr() functions call the 
generic ib_umem_get()...

-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Roland Dreier
Sent: Thursday, May 13, 2010 10:18 PM
To: Sean Hefty
Cc: 'Eli Cohen'; Eli Cohen; Linux RDMA list; ewg
Subject: Re: [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

  Basically, what I want to understand is why does this change make sense?
 
  @@ -1139,6 +1139,10 @@ struct ib_device {
 struct ib_grh *in_grh,
 struct ib_mad *in_mad,
 struct ib_mad *out_mad);
  +int(*get_eth_l2_addr)(struct ib_device *device,
  u8 port,
  +  union ib_gid *dgid, int
  sgid_idx,
  +  u8 *mac, u16 *vlan_id, u8
  *tagged);
  +

Yes, that was pretty much my original question.  Why do we have a verb for 
userspace to call a device-specific method to do the mapping?  The layering 
seems wrong somewhere if we have a generic verb to do this mapping, but then 
put the mapping in device-specific code.

 - R.
--
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-13 Thread Eli Cohen
On Wed, May 12, 2010 at 01:28:31PM -0700, Roland Dreier wrote:
   Add ib_uverbs_get_eth_l2_addr() to allow ibv_create_ah() to resolve sgid,
   dgid to vlan, dmac for any gid type.  Although user-space might bypass 
 this
   call for link-local gids, it is better not to replicate the kernel 
 resolution
   policy.  Port link layer is also returned by ibv_query_port().
 
 A high-level comment/question, followed by some notes about the specific 
 patch.
 
 At the highest level, is having this very low-level command exposed as
 part of the kernel uverbs - userspace API the right place to split
 things?  Making the Ethernet address resolution part of the low-level
 driver implies that it's not really a generic part of the verbs interface.

Correct - Ethernet address resolution really isn't part of the verbs
interface accoring to the RoCE specification; it should happen below
the verbs.

 
 Maybe it is generic, and we should have a generic function instead of
 calling into the low-level driver.  I see the argument that we should
 keep the policy in the kernel, although I'm not sure how strong that
 argument is -- once we start shipping a kernel with a certain policy
 (and I guess OFED has in effect already done that), how could we ever
 change that policy?  We'll have interoperability issues anyway, so it
 seems having userspace and kernel use different policies doesn't cause
 much further problems anyway.

Address mapping policies depend on the address type. This patch only
defines a policy for mapping link-local addresses, and we should
indeed take care not to change it (if possible).
Later on, we can add more policies for other address types (e.g.,
normal IPv6 addresses, mapped IPv4 addresses, etc.)
 
 Or maybe it is device-specific, and we could wrap it up into the create
 AH uverbs call we already have?

Interesting idea. Not sure what is better here: a seperate ABI
call or some additional 'u8 ctx[32]' field in struct
ib_uverbs_create_ah_resp that will be interpreted by the hw-specific
user-space driver.

 
 Low-level comments:
 
   +ssize_t ib_uverbs_get_eth_l2_addr(struct ib_uverbs_file *file, const char 
 __user *buf,
   +int in_len, int out_len)
   +{
   +  struct ib_uverbs_get_eth_l2_addr   cmd;
   +  struct ib_uverbs_get_eth_l2_addr_resp  resp;
   +  int  ret;
   +  struct ib_pd*pd;
   +
   +  if (out_len  sizeof resp)
   +  return -ENOSPC;
   +
   +  if (copy_from_user(cmd, buf, sizeof cmd))
   +  return -EFAULT;
   +
   +  pd = idr_read_pd(cmd.pd_handle, file-ucontext);
   +  if (!pd)
   +  return -EINVAL;
   +
   +  ret = ib_get_eth_l2_addr(pd-device, cmd.port, (union ib_gid *)cmd.gid,
   +   cmd.sgid_idx, resp.mac, resp.vlan_id, 
 resp.tagged);
   +  put_pd_read(pd);
   +  if (!ret) {
   +  if (copy_to_user((void __user *) (unsigned long) cmd.response,
   +   resp, sizeof resp))
   +  return -EFAULT;
 
 This leaks kernel memory contents to userspace since the stack variable
 resp is never cleared.  Also will cause problems if we ever need to use
 the reserved fields for anything.

I see, I missed that subtle detail.


 
 Also I'm not sure I understand why you pass the PD into this method?  It
 seems you just use it to get a pointer to the device, but you already
 have that from the uverbs_file structure that's passed into all commands.

True, missed that too.

 
   +int ib_get_eth_l2_addr(struct ib_device *device, u8 port, union ib_gid 
 *gid,
   + int sgid_idx, u8 *mac, __u16 *vlan_id, u8 *tagged)
   +{
   +  if (!device-get_eth_l2_addr)
   +  return -ENOSYS;
   +
   +  return device-get_eth_l2_addr(device, port, gid, sgid_idx, mac, 
 vlan_id, tagged);
   +}
   +EXPORT_SYMBOL(ib_get_eth_l2_addr);
 
 I don't think we need this wrapper, since uverbs can just call the
 get_eth_l2_addr method directly (we already do that for quite a few
 other methods, eg alloc_ucontext is a uverbs-only method that has no
 in-kernel wrapper).  Also the -ENOSYS test isn't necessary, since a
 device-driver shouldn't allow this method unless it actually implements
 it.
 
I agree.

By the way, do you prefer that I re-create the patch with the fixes or
would you make the necessary changes.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-13 Thread Sean Hefty
Address mapping policies depend on the address type. This patch only
defines a policy for mapping link-local addresses, and we should
indeed take care not to change it (if possible).
Later on, we can add more policies for other address types (e.g.,
normal IPv6 addresses, mapped IPv4 addresses, etc.)

 Or maybe it is device-specific, and we could wrap it up into the create
 AH uverbs call we already have?

Interesting idea. Not sure what is better here: a seperate ABI
call or some additional 'u8 ctx[32]' field in struct
ib_uverbs_create_ah_resp that will be interpreted by the hw-specific
user-space driver.

I don't understand why mapping remote addresses should be driver specific.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-13 Thread Eli Cohen
On Thu, May 13, 2010 at 08:55:13AM -0700, Sean Hefty wrote:
 
 Interesting idea. Not sure what is better here: a seperate ABI
 call or some additional 'u8 ctx[32]' field in struct
 ib_uverbs_create_ah_resp that will be interpreted by the hw-specific
 user-space driver.
 
 I don't understand why mapping remote addresses should be driver specific.

In this case it will not be just mapping remote addresses but creating
the required AH information which is unique to each device.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-13 Thread Sean Hefty
In this case it will not be just mapping remote addresses but creating
the required AH information which is unique to each device.

I understand that the AH is per device.  What I don't get is why we would want
each device to perform the mapping.  We don't expect the device to map GIDs to
LIDs when creating an AH.  The user must have done the mapping beforehand.  The
mapping does not depend on which device I'm using, it should be the same.

Basically, what I want to understand is why does this change make sense?

@@ -1139,6 +1139,10 @@ struct ib_device {
  struct ib_grh *in_grh,
  struct ib_mad *in_mad,
  struct ib_mad *out_mad);
+   int(*get_eth_l2_addr)(struct ib_device *device,
u8 port,
+ union ib_gid *dgid, int
sgid_idx,
+ u8 *mac, u16 *vlan_id, u8
*tagged);
+

- Sean

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-13 Thread Roland Dreier
  Basically, what I want to understand is why does this change make sense?
  
  @@ -1139,6 +1139,10 @@ struct ib_device {
 struct ib_grh *in_grh,
 struct ib_mad *in_mad,
 struct ib_mad *out_mad);
  +int(*get_eth_l2_addr)(struct ib_device *device,
  u8 port,
  +  union ib_gid *dgid, int
  sgid_idx,
  +  u8 *mac, u16 *vlan_id, u8
  *tagged);
  +

Yes, that was pretty much my original question.  Why do we have a verb
for userspace to call a device-specific method to do the mapping?  The
layering seems wrong somewhere if we have a generic verb to do this
mapping, but then put the mapping in device-specific code.

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-12 Thread Roland Dreier
  Add ib_uverbs_get_eth_l2_addr() to allow ibv_create_ah() to resolve sgid,
  dgid to vlan, dmac for any gid type.  Although user-space might bypass 
  this
  call for link-local gids, it is better not to replicate the kernel resolution
  policy.  Port link layer is also returned by ibv_query_port().

A high-level comment/question, followed by some notes about the specific patch.

At the highest level, is having this very low-level command exposed as
part of the kernel uverbs - userspace API the right place to split
things?  Making the Ethernet address resolution part of the low-level
driver implies that it's not really a generic part of the verbs interface.

Maybe it is generic, and we should have a generic function instead of
calling into the low-level driver.  I see the argument that we should
keep the policy in the kernel, although I'm not sure how strong that
argument is -- once we start shipping a kernel with a certain policy
(and I guess OFED has in effect already done that), how could we ever
change that policy?  We'll have interoperability issues anyway, so it
seems having userspace and kernel use different policies doesn't cause
much further problems anyway.

Or maybe it is device-specific, and we could wrap it up into the create
AH uverbs call we already have?

Low-level comments:

  +ssize_t ib_uverbs_get_eth_l2_addr(struct ib_uverbs_file *file, const char 
  __user *buf,
  +  int in_len, int out_len)
  +{
  +struct ib_uverbs_get_eth_l2_addr   cmd;
  +struct ib_uverbs_get_eth_l2_addr_resp  resp;
  +int  ret;
  +struct ib_pd*pd;
  +
  +if (out_len  sizeof resp)
  +return -ENOSPC;
  +
  +if (copy_from_user(cmd, buf, sizeof cmd))
  +return -EFAULT;
  +
  +pd = idr_read_pd(cmd.pd_handle, file-ucontext);
  +if (!pd)
  +return -EINVAL;
  +
  +ret = ib_get_eth_l2_addr(pd-device, cmd.port, (union ib_gid *)cmd.gid,
  + cmd.sgid_idx, resp.mac, resp.vlan_id, 
  resp.tagged);
  +put_pd_read(pd);
  +if (!ret) {
  +if (copy_to_user((void __user *) (unsigned long) cmd.response,
  + resp, sizeof resp))
  +return -EFAULT;

This leaks kernel memory contents to userspace since the stack variable
resp is never cleared.  Also will cause problems if we ever need to use
the reserved fields for anything.

Also I'm not sure I understand why you pass the PD into this method?  It
seems you just use it to get a pointer to the device, but you already
have that from the uverbs_file structure that's passed into all commands.

  +int ib_get_eth_l2_addr(struct ib_device *device, u8 port, union ib_gid *gid,
  +   int sgid_idx, u8 *mac, __u16 *vlan_id, u8 *tagged)
  +{
  +if (!device-get_eth_l2_addr)
  +return -ENOSYS;
  +
  +return device-get_eth_l2_addr(device, port, gid, sgid_idx, mac, 
  vlan_id, tagged);
  +}
  +EXPORT_SYMBOL(ib_get_eth_l2_addr);

I don't think we need this wrapper, since uverbs can just call the
get_eth_l2_addr method directly (we already do that for quite a few
other methods, eg alloc_ucontext is a uverbs-only method that has no
in-kernel wrapper).  Also the -ENOSYS test isn't necessary, since a
device-driver shouldn't allow this method unless it actually implements
it.

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-02-18 Thread Eli Cohen
Add ib_uverbs_get_eth_l2_addr() to allow ibv_create_ah() to resolve sgid,
dgid to vlan, dmac for any gid type.  Although user-space might bypass this
call for link-local gids, it is better not to replicate the kernel resolution
policy.  Port link layer is also returned by ibv_query_port().

Signed-off-by: Eli Cohen e...@mellanox.co.il
---

Changes from v7:

1. ib_uverbs_get_mac() was renamed to ib_uverbs_get_eth_l2_addr() and it now
   returns both MAC, VLAN ID and a tagged indication to indicate if packets
   should go out tagged..

 drivers/infiniband/core/uverbs.h  |1 +
 drivers/infiniband/core/uverbs_cmd.c  |   33 +
 drivers/infiniband/core/uverbs_main.c |1 +
 drivers/infiniband/core/verbs.c   |   10 ++
 include/rdma/ib_user_verbs.h  |   22 --
 include/rdma/ib_verbs.h   |   17 +
 6 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b3ea958..79359f6 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -194,5 +194,6 @@ IB_UVERBS_DECLARE_CMD(create_srq);
 IB_UVERBS_DECLARE_CMD(modify_srq);
 IB_UVERBS_DECLARE_CMD(query_srq);
 IB_UVERBS_DECLARE_CMD(destroy_srq);
+IB_UVERBS_DECLARE_CMD(get_eth_l2_addr);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 112d397..19b4827 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -452,6 +452,7 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
resp.active_width= attr.active_width;
resp.active_speed= attr.active_speed;
resp.phys_state  = attr.phys_state;
+   resp.link_layer  = attr.link_layer;
 
if (copy_to_user((void __user *) (unsigned long) cmd.response,
 resp, sizeof resp))
@@ -1824,6 +1825,38 @@ err:
return ret;
 }
 
+ssize_t ib_uverbs_get_eth_l2_addr(struct ib_uverbs_file *file, const char 
__user *buf,
+ int in_len, int out_len)
+{
+   struct ib_uverbs_get_eth_l2_addr   cmd;
+   struct ib_uverbs_get_eth_l2_addr_resp  resp;
+   int  ret;
+   struct ib_pd*pd;
+
+   if (out_len  sizeof resp)
+   return -ENOSPC;
+
+   if (copy_from_user(cmd, buf, sizeof cmd))
+   return -EFAULT;
+
+   pd = idr_read_pd(cmd.pd_handle, file-ucontext);
+   if (!pd)
+   return -EINVAL;
+
+   ret = ib_get_eth_l2_addr(pd-device, cmd.port, (union ib_gid *)cmd.gid,
+cmd.sgid_idx, resp.mac, resp.vlan_id, 
resp.tagged);
+   put_pd_read(pd);
+   if (!ret) {
+   if (copy_to_user((void __user *) (unsigned long) cmd.response,
+resp, sizeof resp))
+   return -EFAULT;
+
+   return in_len;
+   }
+
+   return ret;
+}
+
 ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 const char __user *buf, int in_len, int out_len)
 {
diff --git a/drivers/infiniband/core/uverbs_main.c 
b/drivers/infiniband/core/uverbs_main.c
index 5f284ff..ef9eaa5 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -109,6 +109,7 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file 
*file,
[IB_USER_VERBS_CMD_MODIFY_SRQ]  = ib_uverbs_modify_srq,
[IB_USER_VERBS_CMD_QUERY_SRQ]   = ib_uverbs_query_srq,
[IB_USER_VERBS_CMD_DESTROY_SRQ] = ib_uverbs_destroy_srq,
+   [IB_USER_VERBS_CMD_GET_ETH_L2_ADDR] = ib_uverbs_get_eth_l2_addr,
 };
 
 static struct vfsmount *uverbs_event_mnt;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f9cbdb6..f586702 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -920,3 +920,13 @@ int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, 
u16 lid)
return qp-device-detach_mcast(qp, gid, lid);
 }
 EXPORT_SYMBOL(ib_detach_mcast);
+
+int ib_get_eth_l2_addr(struct ib_device *device, u8 port, union ib_gid *gid,
+  int sgid_idx, u8 *mac, __u16 *vlan_id, u8 *tagged)
+{
+   if (!device-get_eth_l2_addr)
+   return -ENOSYS;
+
+   return device-get_eth_l2_addr(device, port, gid, sgid_idx, mac, 
vlan_id, tagged);
+}
+EXPORT_SYMBOL(ib_get_eth_l2_addr);
diff --git a/include/rdma/ib_user_verbs.h b/include/rdma/ib_user_verbs.h
index a17f771..09f38df 100644
--- a/include/rdma/ib_user_verbs.h
+++ b/include/rdma/ib_user_verbs.h
@@ -81,7 +81,8 @@ enum {
IB_USER_VERBS_CMD_MODIFY_SRQ,
IB_USER_VERBS_CMD_QUERY_SRQ,
IB_USER_VERBS_CMD_DESTROY_SRQ,
-   IB_USER_VERBS_CMD_POST_SRQ_RECV
+   IB_USER_VERBS_CMD_POST_SRQ_RECV,
+   IB_USER_VERBS_CMD_GET_ETH_L2_ADDR
 };
 
 /*