Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-14 Thread Eli Cohen
On Thu, Dec 10, 2009 at 02:49:45PM -0700, Jason Gunthorpe wrote:
 
 - Change the library API for ibv_port_attr to include a link_layer
 - Change the kernel API to retrieve link_layer
 - Add ibv_cmd_get_mac and other stuff to support RDMAoE
 - Update verbs examples to support RDMAoE
 - Update man pages (you missed these)
 

Will be addressed in the next patch set.
 --- a/include/infiniband/kern-abi.h
 +++ b/include/infiniband/kern-abi.h
 @@ -46,7 +46,7 @@
   * The minimum and maximum kernel ABI that we can handle.
   */
   #define IB_USER_VERBS_MIN_ABI_VERSION  1
  -#define IB_USER_VERBS_MAX_ABI_VERSION  6
  +#define IB_USER_VERBS_MAX_ABI_VERSION  7
  
 Whats this about? That seems like it needs a much bigger review,
 changing the kernel ABI version instantly breaks every existing
 libibverbs, shouldn't be done without alot of discussion!!
I think we can do without chagning the ABI version so I am going to
ommit it in the next patch set.

 
 Extra include?

Yes, thanks.


 
 @@ -86,6 +86,7 @@ default_symver(__ibv_query_device, ibv_query_device);
  int __ibv_query_port(struct ibv_context *context, uint8_t port_num,
  struct ibv_port_attr *port_attr)
  {
 +   port_attr-link_layer = IBV_LINK_LAYER_UNSPECIFIED;
 return context-ops.query_port(context, port_num, port_attr);
  }
 
 Seems like this should be just
  return ___ibv_query_port(context,port_num,port_attr);
 

A leftover, thanks.

  diff --git a/drivers/infiniband/core/uverbs_cmd.c 
  b/drivers/infiniband/core/uverbs_cmd.c
  index 012aadf..d592bd2 100644
  +++ b/drivers/infiniband/core/uverbs_cmd.c
  @@ -452,7 +452,8 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file 
  *file,
  resp.active_width= attr.active_width;
  resp.active_speed= attr.active_speed;
  resp.phys_state  = attr.phys_state;
  -   resp.transport   = attr.transport;
  +   resp.transport   = attr.transport == RDMA_TRANSPORT_RDMAOE ?
  +   IB_LINK_LAYER_ETHERNET : IB_LINK_LAYER_INFINIBAND;
 
 Are you going to change the kernel patches to use the new link_layer
 name?
 

Yes, in the next patch set.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-14 Thread Eli Cohen
On Thu, Dec 10, 2009 at 01:50:00PM -0800, Sean Hefty wrote:
 @@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct
 ibv_device *ib_dev, int size,
  return NULL;
  }
 
 -memset(ctx-buf, 0, size);
 +memset(ctx-buf, 0x7b + is_server, size);
 
  ctx-context = ibv_open_device(ib_dev);
  if (!ctx-context) {
 @@ -481,6 +489,7 @@ static void usage(const char *argv0)
  printf(  -n, --iters=itersnumber of exchanges (default 1000)\n);
  printf(  -l, --sl=sl  service level value\n);
  printf(  -e, --events   sleep on CQ events (default poll)\n);
 +printf(  -g, --gid=remote gid gid of the other port\n);
 
 It seems easier for the user if the tests discovered its local GID and 
 exchanged
 it with the remote side like it does with the LID and QPN.
  

Will be changed in the next patch set -- the user will specify the
index to the GID table of the local port.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-14 Thread Jason Gunthorpe
On Mon, Dec 14, 2009 at 10:41:27AM +0200, Eli Cohen wrote:
 On Thu, Dec 10, 2009 at 01:50:00PM -0800, Sean Hefty wrote:
  @@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct
  ibv_device *ib_dev, int size,
 return NULL;
 }
  
  -  memset(ctx-buf, 0, size);
  +  memset(ctx-buf, 0x7b + is_server, size);
  
 ctx-context = ibv_open_device(ib_dev);
 if (!ctx-context) {
  @@ -481,6 +489,7 @@ static void usage(const char *argv0)
 printf(  -n, --iters=itersnumber of exchanges (default 1000)\n);
 printf(  -l, --sl=sl  service level value\n);
 printf(  -e, --events   sleep on CQ events (default poll)\n);
  +  printf(  -g, --gid=remote gid gid of the other port\n);
  
  It seems easier for the user if the tests discovered its local GID and 
  exchanged
  it with the remote side like it does with the LID and QPN.
   
 
 Will be changed in the next patch set -- the user will specify the
 index to the GID table of the local port.

It would be nice if the tools would consistently let you choose the
source device based on gid.. I've been meaning to make a patch for
verbs to add a ibv_open_device_by_gid() function or something like
that.

Using gid idx is pretty unfriendly ...

But it is just a test prorgam, no worries.

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-10 Thread Jason Gunthorpe
On Thu, Dec 10, 2009 at 07:05:36PM +0200, Eli Cohen wrote:

 here is the patch I prepared based on the discussions we had. The patch is 
 based on the last
 rdmaoe/libibverbs patch I sent. libmlx4 was modified too, a trivial
 change that changes name. Both fixes were push to OFED. I will send a
 new patch set in a few days.

Could you prepare this based on Roland's tree? This patch won't apply.

Why are things still going into OFED so quickly? Shouldn't Roland
accept this stuff before it gets to OFED - or at least review it once..

 +static inline int ___ibv_query_port(struct ibv_context *context,
 + uint8_t port_num,
 + struct ibv_port_attr *port_attr)
 +{
 + uint8_t *padp;
 + int padsize;
 +
 + port_attr-link_layer = IBV_LINK_LAYER_UNSPECIFIED;
 + padp = port_attr-link_layer + sizeof port_attr-link_layer;
 + padsize = sizeof(int) - ((unsigned long)padp  (sizeof(int) - 1));
 + memset(padp, 0, padsize);
 +
 + return context-ops.query_port(context, port_num, port_attr);
 +}

You should explicity declare the padding, like ibv_query_port_resp
does, the above is very fragile.

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-10 Thread Eli Cohen
On Thu, Dec 10, 2009 at 10:33:53AM -0700, Jason Gunthorpe wrote:
 Could you prepare this based on Roland's tree? This patch won't apply.
 

I quote two patches, one for libibverbs based on 74638ac, and the
other for libmlx4 based on 444f634. I changed the padding handling as
you requested for libibverbs. You also need to apply a patch to the
kernel driver to match the new values for link_layer. I put it here
too.

libibverbs:


diff --git a/examples/devinfo.c b/examples/devinfo.c
index 84f95c7..393ec04 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -184,6 +184,19 @@ static int print_all_port_gids(struct ibv_context *ctx, 
uint8_t port_num, int tb
return rc;
 }
 
+static const char *link_layer_str(uint8_t link_layer)
+{
+   switch (link_layer) {
+   case IBV_LINK_LAYER_UNSPECIFIED:
+   case IBV_LINK_LAYER_INFINIBAND:
+   return IB;
+   case IBV_LINK_LAYER_ETHERNET:
+   return Ethernet;
+   default:
+   return Unknown;
+   }
+}
+
 static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 {
struct ibv_context *ctx;
@@ -284,6 +297,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
+   printf(\t\t\tlink_layer:\t\t%s\n, 
link_layer_str(port_attr.link_layer));
 
if (verbose) {
printf(\t\t\tmax_msg_sz:\t\t0x%x\n, 
port_attr.max_msg_sz);
diff --git a/examples/pingpong.c b/examples/pingpong.c
index b916f59..d4a46e4 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -31,6 +31,8 @@
  */
 
 #include pingpong.h
+#include arpa/inet.h
+#include stdlib.h
 
 enum ibv_mtu pp_mtu_to_enum(int mtu)
 {
@@ -53,3 +55,10 @@ uint16_t pp_get_local_lid(struct ibv_context *context, int 
port)
 
return attr.lid;
 }
+
+int pp_get_port_info(struct ibv_context *context, int port,
+struct ibv_port_attr *attr)
+{
+   return ibv_query_port(context, port, attr);
+}
+
diff --git a/examples/pingpong.h b/examples/pingpong.h
index 71d7c3f..16d3466 100644
--- a/examples/pingpong.h
+++ b/examples/pingpong.h
@@ -37,5 +37,7 @@
 
 enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
+int pp_get_port_info(struct ibv_context *context, int port,
+struct ibv_port_attr *attr);
 
 #endif /* IBV_PINGPONG_H */
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index fa969e0..4d0bd0d 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -67,6 +67,8 @@ struct pingpong_context {
int  size;
int  rx_depth;
int  pending;
+   struct ibv_port_attr portinfo;
+   union ibv_giddgid;
 };
 
 struct pingpong_dest {
@@ -94,6 +96,12 @@ static int pp_connect_ctx(struct pingpong_context *ctx, int 
port, int my_psn,
.port_num   = port
}
};
+
+   if (ctx-dgid.global.interface_id) {
+   attr.ah_attr.is_global = 1;
+   attr.ah_attr.grh.hop_limit = 1;
+   attr.ah_attr.grh.dgid = ctx-dgid;
+   }
if (ibv_modify_qp(ctx-qp, attr,
  IBV_QP_STATE  |
  IBV_QP_AV |
@@ -289,11 +297,11 @@ out:
 
 static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int 
size,
int rx_depth, int port,
-   int use_event)
+   int use_event, int is_server)
 {
struct pingpong_context *ctx;
 
-   ctx = malloc(sizeof *ctx);
+   ctx = calloc(1, sizeof *ctx);
if (!ctx)
return NULL;
 
@@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
return NULL;
}
 
-   memset(ctx-buf, 0, size);
+   memset(ctx-buf, 0x7b + is_server, size);
 
ctx-context = ibv_open_device(ib_dev);
if (!ctx-context) {
@@ -481,6 +489,7 @@ static void usage(const char *argv0)
printf(  -n, --iters=itersnumber of exchanges (default 1000)\n);
printf(  -l, --sl=sl  service level value\n);
printf(  -e, --events   sleep on CQ events (default poll)\n);
+   printf(  -g, --gid=remote gid gid of the other port\n);
 }
 
 int main(int argc, char *argv[])
@@ -504,6 +513,7 @@ int main(int argc, char *argv[])
int  rcnt, scnt;
int  num_cq_events = 0;
int  sl = 0;
+   char*grh = NULL;
 
srand48(getpid() * time(NULL));
 
@@ -520,10 +530,11 @@ int 

Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-10 Thread Jason Gunthorpe
On Thu, Dec 10, 2009 at 11:14:55PM +0200, Eli Cohen wrote:
 On Thu, Dec 10, 2009 at 10:33:53AM -0700, Jason Gunthorpe wrote:
  Could you prepare this based on Roland's tree? This patch won't apply.
  
 
 I quote two patches, one for libibverbs based on 74638ac, and the
 other for libmlx4 based on 444f634. I changed the padding handling as
 you requested for libibverbs. You also need to apply a patch to the
 kernel driver to match the new values for link_layer. I put it here
 too.

Seems like this will work to me. I think you need to split this patch
if you want Roland to apply it. I'd suggest

- Change the library API for ibv_port_attr to include a link_layer
- Change the kernel API to retrieve link_layer
- Add ibv_cmd_get_mac and other stuff to support RDMAoE
- Update verbs examples to support RDMAoE
- Update man pages (you missed these)

--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -46,7 +46,7 @@
  * The minimum and maximum kernel ABI that we can handle.
  */
  #define IB_USER_VERBS_MIN_ABI_VERSION  1
 -#define IB_USER_VERBS_MAX_ABI_VERSION  6
 +#define IB_USER_VERBS_MAX_ABI_VERSION  7
 
Whats this about? That seems like it needs a much bigger review,
changing the kernel ABI version instantly breaks every existing
libibverbs, shouldn't be done without alot of discussion!!

 +enum {
 + IBV_LINK_LAYER_UNSPECIFIED,
 + IBV_LINK_LAYER_INFINIBAND,
 + IBV_LINK_LAYER_ETHERNET,
 +};

Why do you have IBV_LINK_LAYER_UNSPECIFIED ? That seems pointless. Why
should existing kernels return UNSPECIFIED when we know they are
always IB. I'd say drop IBV_LINK_LAYER_UNSPECIFIED and set
IBV_LINK_LAYER_INFINIBAND to 0.

What are iWarp devices going to return? Seems like the verbs library
should probably force link_layer to ethernet for iwarp devices, for
compatability.

 diff --git a/src/cmd.c b/src/cmd.c
 index cbd5288..5183d59 100644
 +++ b/src/cmd.c
 @@ -162,6 +162,7 @@ int ibv_cmd_query_device(struct ibv_context *context,
   return 0;
  }
  
 +#include stdio.h
  int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num,
  struct ibv_port_attr *port_attr,
  struct ibv_query_port *cmd, size_t cmd_size)

Extra include?

@@ -86,6 +86,7 @@ default_symver(__ibv_query_device, ibv_query_device);
 int __ibv_query_port(struct ibv_context *context, uint8_t port_num,
 struct ibv_port_attr *port_attr)
 {
+   port_attr-link_layer = IBV_LINK_LAYER_UNSPECIFIED;
return context-ops.query_port(context, port_num, port_attr);
 }

Seems like this should be just
 return ___ibv_query_port(context,port_num,port_attr);

 diff --git a/drivers/infiniband/core/uverbs_cmd.c 
 b/drivers/infiniband/core/uverbs_cmd.c
 index 012aadf..d592bd2 100644
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -452,7 +452,8 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
   resp.active_width= attr.active_width;
   resp.active_speed= attr.active_speed;
   resp.phys_state  = attr.phys_state;
 - resp.transport   = attr.transport;
 + resp.transport   = attr.transport == RDMA_TRANSPORT_RDMAOE ?
 + IB_LINK_LAYER_ETHERNET : IB_LINK_LAYER_INFINIBAND;

Are you going to change the kernel patches to use the new link_layer
name?

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


RE: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-10 Thread Sean Hefty
@@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct
ibv_device *ib_dev, int size,
   return NULL;
   }

-  memset(ctx-buf, 0, size);
+  memset(ctx-buf, 0x7b + is_server, size);

   ctx-context = ibv_open_device(ib_dev);
   if (!ctx-context) {
@@ -481,6 +489,7 @@ static void usage(const char *argv0)
   printf(  -n, --iters=itersnumber of exchanges (default 1000)\n);
   printf(  -l, --sl=sl  service level value\n);
   printf(  -e, --events   sleep on CQ events (default poll)\n);
+  printf(  -g, --gid=remote gid gid of the other port\n);

It seems easier for the user if the tests discovered its local GID and exchanged
it with the remote side like it does with the LID and QPN.
 
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 0db083a..8ef8844 100644


 /*
@@ -223,7 +224,8 @@ struct ibv_query_port_resp {
   __u8  active_width;
   __u8  active_speed;
   __u8  phys_state;
-  __u8  reserved[3];
+  __u8  link_layer;
+  __u8  reserved[2];
 };

Should we define the network layer too?  Right now we have IB transport, which
assumes IB network and link; iWarp transport, which almost assumes IP network
and Ethernet; and RDMAoE, which may or may not have a network (but requires the
L3 address) and Ethernet (or is it DCB) link.

- Sean

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-10 Thread Roland Dreier

#define IB_USER_VERBS_MIN_ABI_VERSION  1
   -#define IB_USER_VERBS_MAX_ABI_VERSION  6
   +#define IB_USER_VERBS_MAX_ABI_VERSION  7
   
  Whats this about? That seems like it needs a much bigger review,
  changing the kernel ABI version instantly breaks every existing
  libibverbs, shouldn't be done without alot of discussion!!

Yes, I've been meaning to bring this up earlier.

There was a time, long ago, when changing this ABI was maybe OK.  I
think even when I first designed the uverbs ABI, having this version
instead of capability bits was a goof.  But I don't have a time machine.

In any case I think at this point we need to stick with ABI version 6
will full backwards compat unless we really really can't.  And I don't
think the changes here are nearly drastic enough that we can't figure
out a way forward without breaking things.

  Why do you have IBV_LINK_LAYER_UNSPECIFIED ? That seems pointless. Why
  should existing kernels return UNSPECIFIED when we know they are
  always IB. I'd say drop IBV_LINK_LAYER_UNSPECIFIED and set
  IBV_LINK_LAYER_INFINIBAND to 0.

I prefer having the UNSPECIFIED.  Not a strong preference but my
reasoning is that if you have an old kernel that doesn't answer
anything, you're better off letting the app see that.  And there's no
reason why iWARP has to run over ethernet in principle.

Maybe I'm wrong but I don't like setting don't know magically to IB
behind the scenes.

 - R.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-10 Thread Jason Gunthorpe
On Thu, Dec 10, 2009 at 01:57:11PM -0800, Roland Dreier wrote:

 Maybe I'm wrong but I don't like setting don't know magically to IB
 behind the scenes.

Well, it isn't just don't know it also means the kernel doesn't
support the link_layer query. The kernels that don't support
link_layer also only support link_layer == IB for
IBV_TRANSPORT_IB. One proves the other..

So IBV_TRANSPORT_IB ports should always return
IBV_LINK_LAYER_INFINIBAND unless the kernel supports the new API and
says otherwise. You can still have IBV_LINK_LAYER_UNSPECIFIED, but it
can't be 0.

If this isn't done then again compatability with apps is compromised,
what should an app do when it sees IBV_LINK_LAYER_UNSPECIFIED due to
an older kernel and newer libibverbs?

 And there's no reason why iWARP has to run over ethernet in
 principle.

Right.. so existing kernels with the new library should return
IBV_LINK_LAYER_INFINIBAND for ports on IBV_TRANSPORT_IB devices and
IBV_LINK_LAYER_UNSPECIFIED for ports on everything else
(IBV_TRANSPORT_IWARP, IBV_TRANSPORT_UNKNOWN)

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg