[ewg] ofa_1_5_kernel 20100218-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
Build failed on x86_64 with linux-2.6.18-164.el5
Log:
/home/vlad/tmp/ofa_1_5_kernel-20100218-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1832: warning: assignment from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100218-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c: In function 'iscsi_transport_init':
/home/vlad/tmp/ofa_1_5_kernel-20100218-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1935: warning: passing argument 3 of 'netlink_kernel_create' from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100218-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1949: error: implicit declaration of function 'netlink_kernel_release'
make[3]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100218-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100218-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_5_kernel-20100218-0200_linux-2.6.18-164.el5_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.18-164.el5'
make: *** [kernel] Error 2
--
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] [PATCH OFED-151] ehca forward ports
Hi Vlad,

please apply for OFED-151.

Forward ports for ehca driver to enable compilation on 2.6.32 and 2.6.31.

Signed-off-by: Alexander Schmidt <al...@linux.vnet.ibm.com>
---
 kernel_patches/backport/2.6.32/ehca-010-remove_driver_data.patch | 60 ++
 kernel_patches/backport/2.6.32/ehca-020-fix_buswalk.patch        | 17 ++
 2 files changed, 77 insertions(+)

--- /dev/null
+++ ofed_kernel-1.5/kernel_patches/backport/2.6.32/ehca-010-remove_driver_data.patch
@@ -0,0 +1,60 @@
+commit f899c2ddd45f2515deb446e2b143e4a686a49aee
+Author: Greg Kroah-Hartman <gre...@suse.de>
+Date:   Mon May 4 12:40:54 2009 -0700
+
+    infiniband: ehca: remove driver_data direct access of struct device
+
+    In the near future, the driver core is going to not allow direct access
+    to the driver_data pointer in struct device.  Instead, the functions
+    dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
+    have been around since the beginning, so are backwards compatible with
+    all older kernel versions.
+
+    Cc: Sean Hefty <sean.he...@intel.com>
+    Cc: Roland Dreier <rola...@cisco.com>
+    Cc: Hal Rosenstock <hal.rosenst...@gmail.com>
+    Cc: gene...@lists.openfabrics.org
+    Cc: Christoph Raisch <rai...@de.ibm.com>
+    Acked-by: Hoang-Nam Nguyen <hngu...@de.ibm.com>
+    Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
+
+diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
+index 85905ab..ce4e6ef 100644
+--- a/drivers/infiniband/hw/ehca/ehca_main.c
++++ b/drivers/infiniband/hw/ehca/ehca_main.c
+@@ -636,7 +636,7 @@ static ssize_t ehca_show_##name(struct device *dev, \
+ 	struct hipz_query_hca *rblock; \
+ 	int data; \
+ 	\
+-	shca = dev->driver_data; \
++	shca = dev_get_drvdata(dev); \
+ 	\
+ 	rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL); \
+ 	if (!rblock) { \
+@@ -680,7 +680,7 @@ static ssize_t ehca_show_adapter_handle(struct device *dev,
+ 					struct device_attribute *attr,
+ 					char *buf)
+ {
+-	struct ehca_shca *shca = dev->driver_data;
++	struct ehca_shca *shca = dev_get_drvdata(dev);
+ 
+ 	return sprintf(buf, "%llx\n", shca->ipz_hca_handle.handle);
+ 
+@@ -749,7 +749,7 @@ static int __devinit ehca_probe(struct of_device *dev,
+ 
+ 	shca->ofdev = dev;
+ 	shca->ipz_hca_handle.handle = *handle;
+-	dev->dev.driver_data = shca;
++	dev_set_drvdata(&dev->dev, shca);
+ 
+ 	ret = ehca_sense_attributes(shca);
+ 	if (ret < 0) {
+@@ -878,7 +878,7 @@ probe1:
+ 
+ static int __devexit ehca_remove(struct of_device *dev)
+ {
+-	struct ehca_shca *shca = dev->dev.driver_data;
++	struct ehca_shca *shca = dev_get_drvdata(&dev->dev);
+ 	unsigned long flags;
+ 	int ret;
+ 
--- /dev/null
+++ ofed_kernel-1.5/kernel_patches/backport/2.6.32/ehca-020-fix_buswalk.patch
@@ -0,0 +1,17 @@
+---
+ drivers/infiniband/hw/ehca/ehca_mrmw.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+Index: ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_mrmw.c
+===
+--- ofa_kernel-1.5.1.orig/drivers/infiniband/hw/ehca/ehca_mrmw.c
++++ ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_mrmw.c
+@@ -2463,7 +2463,7 @@ int ehca_create_busmap(void)
+ 	int ret;
+ 
+ 	ehca_mr_len = 0;
+-	ret = walk_memory_resource(0, 1ULL << MAX_PHYSMEM_BITS, NULL,
++	ret = walk_system_ram_range(0, 1ULL << MAX_PHYSMEM_BITS, NULL,
+ 				   ehca_create_busmap_callback);
+ 	return ret;
+ }
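The accessor pattern that the first backport adopts can be illustrated outside the kernel. What follows is a minimal user-space sketch, not kernel code: `struct mock_device`, `mock_set_drvdata()`, `mock_get_drvdata()` and `struct mock_shca` are hypothetical stand-ins for `struct device`, `dev_set_drvdata()`, `dev_get_drvdata()` and `struct ehca_shca`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device. */
struct mock_device {
	void *driver_data;	/* private: callers go through the accessors */
};

/* Plays the role of dev_set_drvdata(): store the driver's private pointer. */
static void mock_set_drvdata(struct mock_device *dev, void *data)
{
	dev->driver_data = data;
}

/* Plays the role of dev_get_drvdata(); this mock tolerates a NULL device. */
static void *mock_get_drvdata(const struct mock_device *dev)
{
	return dev ? dev->driver_data : NULL;
}

/* Hypothetical driver-private state, standing in for struct ehca_shca. */
struct mock_shca {
	unsigned long long handle;
};

/* What a probe/show pair does: stash the state, then read it back. */
static unsigned long long mock_probe_then_show(struct mock_device *dev,
					       struct mock_shca *shca)
{
	struct mock_shca *found;

	mock_set_drvdata(dev, shca);
	found = mock_get_drvdata(dev);
	return found->handle;
}
```

Because callers never touch the field directly, the storage behind the accessors can move without another round of driver changes, which appears to be the motivation given in the commit message above.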
[ewg] [PATCH OFED-151] ehca in install.pl
Hi Vlad,

another patch for OFED-1.5.1...

Signed-off-by: Alexander Schmidt <al...@linux.vnet.ibm.com>
---
 install.pl |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- OFED-1.5.1-20100217-0757.orig/install.pl
+++ OFED-1.5.1-20100217-0757/install.pl
@@ -1658,7 +1658,7 @@ sub set_availability
 
     # Ehca
     if ($arch =~ m/ppc64|powerpc/ and
-            $kernel =~ m/2.6.1[6-9]|2.6.2[0-9]|2.6.30/) {
+            $kernel =~ m/2.6.1[6-9]|2.6.2[0-9]|2.6.3[0-2]/) {
         $kernel_modules_info{'ehca'}{'available'} = 1;
         $packages_info{'libehca'}{'available'} = 1;
         $packages_info{'libehca-devel-static'}{'available'} = 1;
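To see which kernel version strings the widened pattern accepts, the alternation can be tried with POSIX extended regular expressions, which treat this particular pattern the same way Perl does. This is only an illustrative sketch: `ehca_kernel_supported()` is a hypothetical helper, not part of install.pl.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The alternation from the patched install.pl. As in the Perl original,
 * the dots are unescaped and the pattern is not anchored. */
static const char *ehca_kernel_pattern =
	"2.6.1[6-9]|2.6.2[0-9]|2.6.3[0-2]";

/* Hypothetical helper: true if the kernel version string matches. */
static bool ehca_kernel_supported(const char *kernel)
{
	regex_t re;
	bool ok;

	if (regcomp(&re, ehca_kernel_pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	ok = regexec(&re, kernel, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```

With this pattern 2.6.31 and 2.6.32 are accepted while 2.6.33 and the 2.6.9 series are not. Since nothing anchors the match, a string that merely contains an accepted version, such as "2.6.320", also matches.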
[ewg] [PATCHv8 0/11] IBoE support to Infiniband
IBoE allows running the IB transport protocol using Ethernet frames,
enabling the deployment of IB semantics on lossless Ethernet fabrics.
IBoE packets are standard Ethernet frames with an IEEE assigned
Ethertype, a GRH, unmodified IB transport headers and payload. IB subnet
management and SA services are not required for IBoE operation; Ethernet
management practices are used instead. IBoE encodes IP addresses into its
GIDs and resolves MAC addresses using the host IP stack. For multicast
GIDs, standard IP to MAC mappings apply.

The OFA RDMA Verbs API is syntactically unmodified. The CMA is adapted to
support IBoE ports, allowing existing RDMA applications to run over IBoE
with no changes. Address handles for IBoE are required to contain valid
L3 addresses (GIDs) and the IB L2 address fields become reserved. The
complementary Ethernet L2 address information is subsequently resolved
below the API. As there is no SA in IBoE, the CMA code is adapted to
locally fill in the corresponding path record attributes for IBoE address
handles. Also, the CMA provides the required address handle attributes
for SIDR requests and joining of multicast groups.

With this patch set, each IBoE port is assigned a GID equal to the link
local address of its corresponding net device, and one more GID for each
of the VLAN devices derived from it. IBoE packets are tagged with the
VLAN ID of the netdevice through which they are generated. The priority
field in the 802.1q header of IBoE packets is derived from the SL field
in the address vector. rdma_cm applications can set the TOS value of the
rdma_cm_id object through the rdma_set_option() API, which then maps to
SL. With these patches, IBoE multicast frames may be broadcast, as there
is currently no use of an L2 multicast group membership protocol.

To enable IBoE with the mlx4 driver stack, both the mlx4_en and mlx4_ib
drivers must be loaded, and the netdevice for the corresponding IBoE port
must be running. Individual ports of a multi port HCA can be
independently configured as Ethernet (with support for IBoE) or as IB, as
was already the case.

We have successfully tested MPI, SDP, RDS, and native Verbs applications
over IBoE.

Following is a series of 11 patches based on Roland's for-next branch.
This new series reflects changes based on feedback from the community on
the previous patch set.

Changes from v7
1. Rebase on 2.6.33-rc3
2. Add VLAN support
3. Bug fixes and improvements (see the changelog in each patch).

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 drivers/infiniband/core/agent.c           |   37 +
 drivers/infiniband/core/cm.c              |    5
 drivers/infiniband/core/cma.c             |  287 ++-
 drivers/infiniband/core/mad.c             |   27 +
 drivers/infiniband/core/multicast.c       |   25 +
 drivers/infiniband/core/sa_query.c        |   46 +-
 drivers/infiniband/core/ucma.c            |   54 ++
 drivers/infiniband/core/ud_header.c       |  129 +-
 drivers/infiniband/core/user_mad.c        |   11
 drivers/infiniband/core/uverbs.h          |    1
 drivers/infiniband/core/uverbs_cmd.c      |   33 +
 drivers/infiniband/core/uverbs_main.c     |    1
 drivers/infiniband/core/verbs.c           |   26 +
 drivers/infiniband/hw/mlx4/ah.c           |  196 --
 drivers/infiniband/hw/mlx4/mad.c          |   32 +
 drivers/infiniband/hw/mlx4/main.c         |  557 --
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |   35 +
 drivers/infiniband/hw/mlx4/qp.c           |  180 +++--
 drivers/infiniband/hw/mthca/mthca_qp.c    |    2
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    7
 drivers/net/mlx4/en_main.c                |   15
 drivers/net/mlx4/en_netdev.c              |   10
 drivers/net/mlx4/en_port.c                |    4
 drivers/net/mlx4/en_port.h                |    3
 drivers/net/mlx4/fw.c                     |    3
 drivers/net/mlx4/intf.c                   |   20 +
 drivers/net/mlx4/main.c                   |    6
 drivers/net/mlx4/mlx4.h                   |    1
 drivers/net/mlx4/mlx4_en.h                |    1
 drivers/net/mlx4/port.c                   |   19 +
 include/linux/mlx4/cmd.h                  |    1
 include/linux/mlx4/device.h               |   32 +
 include/linux/mlx4/driver.h               |   16
 include/linux/mlx4/qp.h                   |    9
 include/rdma/ib_addr.h                    |  139 +++
 include/rdma/ib_pack.h                    |   28 +
 include/rdma/ib_sa.h                      |    3
 include/rdma/ib_user_verbs.h              |   22 +
 include/rdma/ib_verbs.h                   |   29 +
 39 files changed, 1813 insertions(+), 239 deletions(-)
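The statement that each IBoE port gets a GID equal to the link local address of its net device can be made concrete with a small sketch. It assumes the link local address is the usual IPv6 one, that is fe80::/64 followed by the modified EUI-64 interface identifier of RFC 4291 Appendix A; the helper name `mac_to_link_local_gid()` is hypothetical and the patch set's own helpers are not shown in this cover letter.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the IPv6 link-local address for a 48-bit MAC
 * (fe80::/64 plus a modified EUI-64 interface ID) and store it as a
 * 16-byte GID. */
static void mac_to_link_local_gid(const uint8_t mac[6], uint8_t gid[16])
{
	memset(gid, 0, 16);
	gid[0] = 0xfe;			/* fe80::/64 link-local prefix */
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02;		/* flip the universal/local bit */
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;			/* ff:fe inserted in the middle */
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}
```

Under that assumption, a port with MAC 00:02:c9:01:02:03 would carry the GID fe80::202:c9ff:fe01:203.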
[ewg] [PATCHv8 01/11] ib core: Add link layer property to ports
This patch adds the infrastructure for querying the link layer of a port,
which can be either IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET.
This is required for adding IBoE support to Infiniband drivers so that
branching decisions can be made according to the value of this property.
For devices that do not provide an implementation for querying the link
layer property of a port, the returned value depends on the node transport:
RDMA_TRANSPORT_IB nodes return IB_LINK_LAYER_INFINIBAND and
RDMA_TRANSPORT_IWARP nodes return IB_LINK_LAYER_ETHERNET.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 drivers/infiniband/core/verbs.c |   16
 include/rdma/ib_verbs.h         |   12
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a7da9be..f9cbdb6 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -94,6 +94,22 @@ rdma_node_get_transport(enum rdma_node_type node_type)
 }
 EXPORT_SYMBOL(rdma_node_get_transport);
 
+enum rdma_link_layer rdma_port_link_layer(struct ib_device *device, u8 port_num)
+{
+	if (device->get_link_layer)
+		return device->get_link_layer(device, port_num);
+
+	switch (rdma_node_get_transport(device->node_type)) {
+	case RDMA_TRANSPORT_IB:
+		return IB_LINK_LAYER_INFINIBAND;
+	case RDMA_TRANSPORT_IWARP:
+		return IB_LINK_LAYER_ETHERNET;
+	default:
+		return IB_LINK_LAYER_UNSPECIFIED;
+	}
+}
+EXPORT_SYMBOL(rdma_port_link_layer);
+
 /* Protection domains */
 
 struct ib_pd *ib_alloc_pd(struct ib_device *device)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 09509ed..bbfe315 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -75,6 +75,12 @@ enum rdma_transport_type {
 enum rdma_transport_type
 rdma_node_get_transport(enum rdma_node_type node_type) __attribute_const__;
 
+enum rdma_link_layer {
+	IB_LINK_LAYER_UNSPECIFIED,
+	IB_LINK_LAYER_INFINIBAND,
+	IB_LINK_LAYER_ETHERNET,
+};
+
 enum ib_device_cap_flags {
 	IB_DEVICE_RESIZE_MAX_WR		= 1,
 	IB_DEVICE_BAD_PKEY_CNTR		= (1<<1),
@@ -298,6 +304,7 @@ struct ib_port_attr {
 	u8			active_width;
 	u8			active_speed;
 	u8			phys_state;
+	enum rdma_link_layer	link_layer;
 };
 
 enum ib_device_modify_flags {
@@ -1003,6 +1010,8 @@ struct ib_device {
 	int		           (*query_port)(struct ib_device *device,
 						 u8 port_num,
 						 struct ib_port_attr *port_attr);
+	enum rdma_link_layer	   (*get_link_layer)(struct ib_device *device,
+						     u8 port_num);
 	int		           (*query_gid)(struct ib_device *device,
 						u8 port_num, int index,
 						union ib_gid *gid);
@@ -1213,6 +1222,9 @@ int ib_query_device(struct ib_device *device,
 int ib_query_port(struct ib_device *device,
 		  u8 port_num, struct ib_port_attr *port_attr);
 
+enum rdma_link_layer rdma_port_link_layer(struct ib_device *device,
+					  u8 port_num);
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
-- 
1.7.0
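The fallback rule described in this patch (ask the driver if it implements the query, otherwise infer the link layer from the node transport) can be exercised in user space. The sketch below mirrors the shape of `rdma_port_link_layer()` but is not the kernel code: every `mock_` and `MOCK_` name is a hypothetical stand-in.

```c
#include <stddef.h>

enum mock_transport { MOCK_TRANSPORT_IB, MOCK_TRANSPORT_IWARP, MOCK_TRANSPORT_OTHER };
enum mock_link_layer { MOCK_LL_UNSPECIFIED, MOCK_LL_INFINIBAND, MOCK_LL_ETHERNET };

/* Hypothetical stand-in for struct ib_device. */
struct mock_ib_device {
	enum mock_transport transport;
	/* Optional per-port query; NULL when the driver does not provide one. */
	enum mock_link_layer (*get_link_layer)(struct mock_ib_device *dev,
					       unsigned char port_num);
};

/* Prefer the driver's answer; otherwise derive it from the transport. */
static enum mock_link_layer mock_port_link_layer(struct mock_ib_device *dev,
						 unsigned char port_num)
{
	if (dev->get_link_layer)
		return dev->get_link_layer(dev, port_num);

	switch (dev->transport) {
	case MOCK_TRANSPORT_IB:
		return MOCK_LL_INFINIBAND;
	case MOCK_TRANSPORT_IWARP:
		return MOCK_LL_ETHERNET;
	default:
		return MOCK_LL_UNSPECIFIED;
	}
}

/* Example driver callback: port 1 is IB, every other port is Ethernet. */
static enum mock_link_layer mixed_hca_link_layer(struct mock_ib_device *dev,
						 unsigned char port_num)
{
	(void)dev;
	return port_num == 1 ? MOCK_LL_INFINIBAND : MOCK_LL_ETHERNET;
}
```

A device with the callback can report a different link layer per port, which is what a multi port HCA with mixed IB and Ethernet ports would need; devices without it keep answering per node type.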
[ewg] [PATCHv8 02/11] ib_core: IBoE support only QP1
Since IBoE is using Ethernet as its link layer, there is no central
management entity, so there is no need for QP0. QP1 is still needed since
it handles communications between CM agents. This patch will create only
QP1 for IBoE ports.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
Changes from v7:
1. Remove always true code
2. Fix failure to initialize port ah_lock in ib_sa_add_one

 drivers/infiniband/core/agent.c     |   37 +++
 drivers/infiniband/core/mad.c       |   27 +++---
 drivers/infiniband/core/multicast.c |   25 ++---
 drivers/infiniband/core/sa_query.c  |   41 ++
 4 files changed, 93 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index ae7c288..964f4fb 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -48,6 +48,8 @@
 struct ib_agent_port_private {
 	struct list_head port_list;
 	struct ib_mad_agent *agent[2];
+	struct ib_device    *device;
+	u8                   port_num;
 };
 
 static DEFINE_SPINLOCK(ib_agent_port_list_lock);
@@ -58,11 +60,10 @@ __ib_get_agent_port(struct ib_device *device, int port_num)
 {
 	struct ib_agent_port_private *entry;
 
-	list_for_each_entry(entry, &ib_agent_port_list, port_list) {
-		if (entry->agent[0]->device == device &&
-		    entry->agent[0]->port_num == port_num)
+	list_for_each_entry(entry, &ib_agent_port_list, port_list)
+		if (entry->device == device && entry->port_num == port_num)
 			return entry;
-	}
+
 	return NULL;
 }
 
@@ -155,14 +156,16 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 		goto error1;
 	}
 
-	/* Obtain send only MAD agent for SMI QP */
-	port_priv->agent[0] = ib_register_mad_agent(device, port_num,
-						    IB_QPT_SMI, NULL, 0,
-						    &agent_send_handler,
-						    NULL, NULL);
-	if (IS_ERR(port_priv->agent[0])) {
-		ret = PTR_ERR(port_priv->agent[0]);
-		goto error2;
+	if (rdma_port_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) {
+		/* Obtain send only MAD agent for SMI QP */
+		port_priv->agent[0] = ib_register_mad_agent(device, port_num,
+							    IB_QPT_SMI, NULL, 0,
+							    &agent_send_handler,
+							    NULL, NULL);
+		if (IS_ERR(port_priv->agent[0])) {
+			ret = PTR_ERR(port_priv->agent[0]);
+			goto error2;
+		}
 	}
 
 	/* Obtain send only MAD agent for GSI QP */
@@ -175,6 +178,9 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 		goto error3;
 	}
 
+	port_priv->device = device;
+	port_priv->port_num = port_num;
+
 	spin_lock_irqsave(&ib_agent_port_list_lock, flags);
 	list_add_tail(&port_priv->port_list, &ib_agent_port_list);
 	spin_unlock_irqrestore(&ib_agent_port_list_lock, flags);
@@ -182,7 +188,8 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 	return 0;
 
 error3:
-	ib_unregister_mad_agent(port_priv->agent[0]);
+	if (rdma_port_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND)
+		ib_unregister_mad_agent(port_priv->agent[0]);
 error2:
 	kfree(port_priv);
 error1:
@@ -205,7 +212,9 @@ int ib_agent_port_close(struct ib_device *device, int port_num)
 	spin_unlock_irqrestore(&ib_agent_port_list_lock, flags);
 
 	ib_unregister_mad_agent(port_priv->agent[1]);
-	ib_unregister_mad_agent(port_priv->agent[0]);
+	if (rdma_port_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND)
+		ib_unregister_mad_agent(port_priv->agent[0]);
+
 	kfree(port_priv);
 	return 0;
 }
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 7522008..f546ab7 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2610,6 +2610,9 @@ static void cleanup_recv_queue(struct ib_mad_qp_info *qp_info)
 	struct ib_mad_private *recv;
 	struct ib_mad_list_head *mad_list;
 
+	if (!qp_info->qp)
+		return;
+
 	while (!list_empty(&qp_info->recv_queue.list)) {
 
 		mad_list = list_entry(qp_info->recv_queue.list.next,
@@ -2651,6 +2654,9 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 
 	for (i = 0; i < IB_MAD_QPS_CORE; i++) {
 		qp = port_priv->qp_info[i].qp;
+		if (!qp)
+			continue;
+
 		/*
 		 * PKey index for QP1 is irrelevant but
[ewg] [PATCHv8 03/11] IB/umad: Enable support only for IB ports
Initialize umad context only for Infiniband (as opposed to Ethernet) ports.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 drivers/infiniband/core/user_mad.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 7de0296..e962c5a 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1138,8 +1138,9 @@ static void ib_umad_add_one(struct ib_device *device)
 	for (i = s; i <= e; ++i) {
 		umad_dev->port[i - s].umad_dev = umad_dev;
 
-		if (ib_umad_init_port(device, i, &umad_dev->port[i - s]))
-			goto err;
+		if (rdma_port_link_layer(device, i) == IB_LINK_LAYER_INFINIBAND)
+			if (ib_umad_init_port(device, i, &umad_dev->port[i - s]))
+				goto err;
 	}
 
 	ib_set_client_data(device, &umad_client, umad_dev);
@@ -1148,7 +1149,8 @@ static void ib_umad_add_one(struct ib_device *device)
 
 err:
 	while (--i >= s)
-		ib_umad_kill_port(&umad_dev->port[i - s]);
+		if (rdma_port_link_layer(device, i) == IB_LINK_LAYER_INFINIBAND)
+			ib_umad_kill_port(&umad_dev->port[i - s]);
 
 	kref_put(&umad_dev->ref, ib_umad_release_dev);
 }
@@ -1162,7 +1164,8 @@ static void ib_umad_remove_one(struct ib_device *device)
 		return;
 
 	for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i)
-		ib_umad_kill_port(&umad_dev->port[i]);
+		if (rdma_port_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+			ib_umad_kill_port(&umad_dev->port[i]);
 
 	kref_put(&umad_dev->ref, ib_umad_release_dev);
 }
-- 
1.7.0
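One detail of this patch is that the error path has to apply the same link-layer test as the setup path, so that only ports that were actually initialized get torn down. The following user-space sketch models that loop structure; all `mock_` names are hypothetical, and the failure injection exists only to exercise the unwind.

```c
#include <stdbool.h>

#define MOCK_MAX_PORTS 8

/* Hypothetical model of a device with per-port umad state. */
struct mock_umad_dev {
	int start_port, end_port;
	bool is_ib[MOCK_MAX_PORTS + 1];		/* indexed by port number */
	bool fail_init[MOCK_MAX_PORTS + 1];	/* inject an init failure */
	bool initialized[MOCK_MAX_PORTS + 1];
	int kill_calls;				/* how many ports were torn down */
};

static int mock_init_port(struct mock_umad_dev *d, int port)
{
	if (d->fail_init[port])
		return -1;
	d->initialized[port] = true;
	return 0;
}

static void mock_kill_port(struct mock_umad_dev *d, int port)
{
	d->initialized[port] = false;
	d->kill_calls++;
}

/* Returns 0 on success, or -1 after unwinding the ports set up so far. */
static int mock_add_one(struct mock_umad_dev *d)
{
	int i;

	for (i = d->start_port; i <= d->end_port; ++i)
		if (d->is_ib[i] && mock_init_port(d, i))
			goto err;
	return 0;

err:
	/* Walk back over earlier ports, skipping those never initialized. */
	while (--i >= d->start_port)
		if (d->is_ib[i])
			mock_kill_port(d, i);
	return -1;
}
```

If the unwind loop omitted the `is_ib` test, it would call the teardown routine on Ethernet ports whose state was never set up.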
[ewg] [PATCHv8 04/11] ib_core: IBoE CMA device binding
Add support for IBoE device binding and IP --> GID resolution. Path
resolving and multicast joining are implemented within cma.c by filling
the responses and pushing the callbacks to the cma work queue. IP->GID
resolution always yields IPv6 link local addresses - remote GIDs are
derived from the destination MAC address of the remote port. Multicast
GIDs are always mapped to multicast MACs as is done in IPv6. Some helper
functions are added to ib_addr.h. IPv4 multicast is enabled by
translating IPv4 multicast addresses to IPv6 multicast as described in
http://www.mail-archive.com/i...@sunroof.eng.sun.com/msg02134.html.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
Changes from v7:
1. Add force_grh flag to ib_init_ah_from_path() to request IB_AH_GRH for
   IB_LINK_LAYER_ETHERNET ports, thus allowing the use of hop limit 1 in
   path records.
2. cma_acquire_dev() finds the cma_dev by first assuming an IBoE type
   device for a non-ARPHRD_INFINIBAND dev type. If that fails, it falls
   back to the old method.

 drivers/infiniband/core/cm.c              |    5 +-
 drivers/infiniband/core/cma.c             |  283 +++--
 drivers/infiniband/core/sa_query.c        |    5 +-
 drivers/infiniband/core/ucma.c            |   45 -
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    2 +-
 include/rdma/ib_addr.h                    |   98 ++-
 include/rdma/ib_sa.h                      |    3 +-
 7 files changed, 412 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5130fc5..6513b1c 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -351,6 +351,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	unsigned long flags;
 	int ret;
 	u8 p;
+	int force_grh;
 
 	read_lock_irqsave(&cm.device_lock, flags);
 	list_for_each_entry(cm_dev, &cm.device_list, list) {
@@ -371,8 +372,10 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 		return ret;
 
 	av->port = port;
+	force_grh = rdma_port_link_layer(cm_dev->ib_device, port->port_num) ==
+		IB_LINK_LAYER_ETHERNET ? 1 : 0;
 	ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
-			     &av->ah_attr);
+			     &av->ah_attr, force_grh);
 	av->timeout = path->packet_life_time + 1;
 	return 0;
 }
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..df5f636 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -58,6 +58,7 @@ MODULE_LICENSE("Dual BSD/GPL");
 #define CMA_CM_RESPONSE_TIMEOUT 20
 #define CMA_MAX_CM_RETRIES 15
 #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
+#define IBOE_PACKET_LIFETIME 18
 
 static void cma_add_one(struct ib_device *device);
 static void cma_remove_one(struct ib_device *device);
@@ -157,6 +158,7 @@ struct cma_multicast {
 	struct list_head	list;
 	void			*context;
 	struct sockaddr_storage	addr;
+	struct kref		mcref;
 };
 
 struct cma_work {
@@ -173,6 +175,12 @@ struct cma_ndev_work {
 	struct rdma_cm_event	event;
 };
 
+struct iboe_mcast_work {
+	struct work_struct	 work;
+	struct rdma_id_private	*id;
+	struct cma_multicast	*mc;
+};
+
 union cma_ip_addr {
 	struct in6_addr ip6;
 	struct {
@@ -281,6 +289,8 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 	atomic_inc(&cma_dev->refcount);
 	id_priv->cma_dev = cma_dev;
 	id_priv->id.device = cma_dev->device;
+	id_priv->id.route.addr.dev_addr.transport =
+		rdma_node_get_transport(cma_dev->device->node_type);
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
@@ -290,6 +300,14 @@ static inline void cma_deref_dev(struct cma_device *cma_dev)
 		complete(&cma_dev->comp);
 }
 
+static inline void release_mc(struct kref *kref)
+{
+	struct cma_multicast *mc = container_of(kref, struct cma_multicast, mcref);
+
+	kfree(mc->multicast.ib);
+	kfree(mc);
+}
+
 static void cma_detach_from_dev(struct rdma_id_private *id_priv)
 {
 	list_del(&id_priv->list);
@@ -330,15 +348,29 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 	union ib_gid gid;
 	int ret = -ENODEV;
 
-	rdma_addr_get_sgid(dev_addr, &gid);
+	if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
+		iboe_addr_get_sgid(dev_addr, &gid);
+		list_for_each_entry(cma_dev, &dev_list, list) {
+			ret = ib_find_cached_gid(cma_dev->device, &gid,
+						 &id_priv->id.port_num, NULL);
+			if (!ret)
+				goto out;
+		}
+	}
+
+	memcpy(&gid, dev_addr->src_dev_addr +
+	       rdma_addr_gid_offset(dev_addr), sizeof gid);
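The description above says multicast GIDs are mapped to multicast MACs "as is done in IPv6". Assuming this refers to the mapping of RFC 2464 section 7 (33:33 followed by the low-order 32 bits of the address), it can be sketched as below. The helpers added to ib_addr.h are not visible in this excerpt, so `gid_is_multicast()` and `mcast_gid_to_mac()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* A GID read as an IPv6 address is multicast when it lies in ff00::/8. */
static bool gid_is_multicast(const uint8_t gid[16])
{
	return gid[0] == 0xff;
}

/* Assumed mapping: 33:33 followed by the last four bytes of the GID. */
static void mcast_gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	mac[2] = gid[12];
	mac[3] = gid[13];
	mac[4] = gid[14];
	mac[5] = gid[15];
}
```

Only 32 bits of the group address survive this mapping, so distinct multicast GIDs can share a MAC and receivers still need to filter on the full GID.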
[ewg] [PATCHv8 05/11] ib_core: IBoE UD packet packing support
Add support functions to aid in packing IBoE packets.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 drivers/infiniband/core/ud_header.c |  100 ++
 include/rdma/ib_pack.h              |   29 +-
 2 files changed, 92 insertions(+), 37 deletions(-)

Changes from v7:
1. Re-work the changes so they extend the original idea behind these
   functions.
2. Fix wrong implementation of ib_ud_header_init(). A different patch was
   sent to Roland.

diff --git a/drivers/infiniband/core/ud_header.c b/drivers/infiniband/core/ud_header.c
index 8ec7876..7650313 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -80,6 +80,29 @@ static const struct ib_field lrh_table[]  = {
 	  .size_bits    = 16 }
 };
 
+static const struct ib_field eth_table[]  = {
+	{ STRUCT_FIELD(eth, dmac_h),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 },
+	{ STRUCT_FIELD(eth, dmac_l),
+	  .offset_words = 1,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(eth, smac_h),
+	  .offset_words = 1,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(eth, smac_l),
+	  .offset_words = 2,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 },
+	{ STRUCT_FIELD(eth, type),
+	  .offset_words = 3,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 }
+};
+
 static const struct ib_field grh_table[]  = {
 	{ STRUCT_FIELD(grh, ip_version),
 	  .offset_words = 0,
@@ -180,56 +203,51 @@ static const struct ib_field deth_table[] = {
 /**
  * ib_ud_header_init - Initialize UD header structure
  * @payload_bytes:Length of packet payload
+ * @lrh_present: specify if LRH is present
+ * @eth_present: specify if Eth header is present
  * @grh_present:GRH flag (if non-zero, GRH will be included)
+ * @immediate_present: specify if immediate data is present
  * @header:Structure to initialize
- *
- * ib_ud_header_init() initializes the lrh.link_version, lrh.link_next_header,
- * lrh.packet_length, grh.ip_version, grh.payload_length,
- * grh.next_header, bth.opcode, bth.pad_count and
- * bth.transport_header_version fields of a struct ib_ud_header given
- * the payload length and whether a GRH will be included.
  */
 void ib_ud_header_init(int		    payload_bytes,
+		       int		    lrh_present,
+		       int		    eth_present,
 		       int		    grh_present,
+		       int		    immediate_present,
 		       struct ib_ud_header *header)
 {
-	int header_len;
 	u16 packet_length;
 
 	memset(header, 0, sizeof *header);
 
-	header_len =
-		IB_LRH_BYTES  +
-		IB_BTH_BYTES  +
-		IB_DETH_BYTES;
-	if (grh_present) {
-		header_len += IB_GRH_BYTES;
+	if (lrh_present) {
+		header->lrh.link_version     = 0;
+		header->lrh.link_next_header =
+			grh_present ? IB_LNH_IBA_GLOBAL : IB_LNH_IBA_LOCAL;
+		packet_length = IB_LRH_BYTES;
 	}
 
-	header->lrh.link_version     = 0;
-	header->lrh.link_next_header =
-		grh_present ? IB_LNH_IBA_GLOBAL : IB_LNH_IBA_LOCAL;
-	packet_length		     = (IB_LRH_BYTES     +
-					IB_BTH_BYTES     +
-					IB_DETH_BYTES    +
-					payload_bytes    +
-					4                + /* ICRC     */
-					3) / 4;            /* round up */
-
-	header->grh_present          = grh_present;
+	if (eth_present)
+		packet_length = IB_ETH_BYTES;
+
+	packet_length += IB_BTH_BYTES + IB_DETH_BYTES + payload_bytes +
+		4	+ /* ICRC     */
+		3;	  /* round up */
+	packet_length /= 4;
 	if (grh_present) {
-		packet_length		   += IB_GRH_BYTES / 4;
-		header->grh.ip_version      = 6;
-		header->grh.payload_length  =
-			cpu_to_be16((IB_BTH_BYTES     +
-				     IB_DETH_BYTES    +
-				     payload_bytes    +
-				     4                + /* ICRC     */
-				     3) & ~3);          /* round up */
+		packet_length += IB_GRH_BYTES / 4;
+		header->grh.ip_version = 6;
+		header->grh.payload_length =
+			cpu_to_be16((IB_BTH_BYTES +
+				     IB_DETH_BYTES +
+				     payload_bytes +
+				     4 + /* ICRC */
+				     3) & ~3); /*
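The length arithmetic in the reworked ib_ud_header_init() can be checked with a short user-space calculation. This sketch reproduces only the arithmetic, under stated assumptions: the `MOCK_` sizes for LRH, GRH, BTH and DETH follow the InfiniBand wire format, `MOCK_ETH_BYTES` is assumed to be 14 (the 112 bits covered by eth_table above), and, unlike the patch, the running length starts at zero so the case with neither LRH nor Ethernet header is well defined.

```c
#include <stdint.h>

/* Header sizes in bytes; MOCK_ETH_BYTES is an assumption (6 + 6 + 2). */
enum {
	MOCK_LRH_BYTES  = 8,
	MOCK_ETH_BYTES  = 14,
	MOCK_GRH_BYTES  = 40,
	MOCK_BTH_BYTES  = 12,
	MOCK_DETH_BYTES = 8
};

/* Packet length in 4-byte words, including the 4-byte ICRC, rounded up. */
static uint16_t ud_packet_words(int payload_bytes, int lrh_present,
				int eth_present, int grh_present)
{
	unsigned int len = 0;

	if (lrh_present)
		len = MOCK_LRH_BYTES;
	if (eth_present)
		len = MOCK_ETH_BYTES;

	len += MOCK_BTH_BYTES + MOCK_DETH_BYTES + payload_bytes +
	       4 +	/* ICRC */
	       3;	/* round up */
	len /= 4;
	if (grh_present)
		len += MOCK_GRH_BYTES / 4;
	return (uint16_t)len;
}

/* GRH payload length in bytes: everything after the GRH, rounded up to 4. */
static uint16_t grh_payload_bytes(int payload_bytes)
{
	return (uint16_t)((MOCK_BTH_BYTES + MOCK_DETH_BYTES +
			   payload_bytes + 4 + 3) & ~3);
}
```

For a 100-byte payload with an LRH and no GRH this gives (8 + 12 + 8 + 100 + 4 + 3) / 4 = 33 words; adding a GRH contributes another 40 / 4 = 10 words, and the GRH payload length is 127 rounded down to a multiple of 4, that is 124 bytes.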
[ewg] [PATCHv8 06/11] ipoib: avoid ipoib over IBoE
IPoIB is an implementation of IP over the Infiniband transport. In the
case of IBoE, the link layer is Ethernet, so IP can work directly over
Ethernet; disable IPoIB for non-IB_LINK_LAYER_INFINIBAND ports.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 06014d2..5e6c2de 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1362,6 +1362,8 @@ static void ipoib_add_one(struct ib_device *device)
 	}
 
 	for (p = s; p <= e; ++p) {
+		if (rdma_port_link_layer(device, p) != IB_LINK_LAYER_INFINIBAND)
+			continue;
 		dev = ipoib_add_port("ib%d", device, p);
 		if (!IS_ERR(dev)) {
 			priv = netdev_priv(dev);
@@ -1383,6 +1385,9 @@ static void ipoib_remove_one(struct ib_device *device)
 	dev_list = ib_get_client_data(device, &ipoib_client);
 
 	list_for_each_entry_safe(priv, tmp, dev_list, list) {
+		if (rdma_port_link_layer(device, priv->port) != IB_LINK_LAYER_INFINIBAND)
+			continue;
+
 		ib_unregister_event_handler(&priv->event_handler);
 
 		rtnl_lock();
-- 
1.7.0
[ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace
Add ib_uverbs_get_eth_l2_addr() to allow ibv_create_ah() to resolve an (sgid, dgid) pair to a VLAN ID and destination MAC for any GID type. Although user space might bypass this call for link-local GIDs, it is better not to replicate the kernel resolution policy there. The port link layer is also returned by ibv_query_port().

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
Changes from v7:
1. ib_uverbs_get_mac() was renamed to ib_uverbs_get_eth_l2_addr(); it now returns the MAC, the VLAN ID, and a "tagged" flag indicating whether packets should go out tagged.

 drivers/infiniband/core/uverbs.h      |    1 +
 drivers/infiniband/core/uverbs_cmd.c  |   33 +
 drivers/infiniband/core/uverbs_main.c |    1 +
 drivers/infiniband/core/verbs.c       |   10 ++
 include/rdma/ib_user_verbs.h          |   22 --
 include/rdma/ib_verbs.h               |   17 +
 6 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b3ea958..79359f6 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -194,5 +194,6 @@ IB_UVERBS_DECLARE_CMD(create_srq);
 IB_UVERBS_DECLARE_CMD(modify_srq);
 IB_UVERBS_DECLARE_CMD(query_srq);
 IB_UVERBS_DECLARE_CMD(destroy_srq);
+IB_UVERBS_DECLARE_CMD(get_eth_l2_addr);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 112d397..19b4827 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -452,6 +452,7 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
 	resp.active_width    = attr.active_width;
 	resp.active_speed    = attr.active_speed;
 	resp.phys_state      = attr.phys_state;
+	resp.link_layer      = attr.link_layer;
 
 	if (copy_to_user((void __user *) (unsigned long) cmd.response,
 			 &resp, sizeof resp))
@@ -1824,6 +1825,38 @@ err:
 	return ret;
 }
 
+ssize_t ib_uverbs_get_eth_l2_addr(struct ib_uverbs_file *file, const char __user *buf,
+				  int in_len, int out_len)
+{
+	struct ib_uverbs_get_eth_l2_addr cmd;
+	struct ib_uverbs_get_eth_l2_addr_resp resp;
+	int ret;
+	struct ib_pd *pd;
+
+	if (out_len < sizeof resp)
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof cmd))
+		return -EFAULT;
+
+	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
+	if (!pd)
+		return -EINVAL;
+
+	ret = ib_get_eth_l2_addr(pd->device, cmd.port, (union ib_gid *)cmd.gid,
+				 cmd.sgid_idx, resp.mac, &resp.vlan_id, &resp.tagged);
+	put_pd_read(pd);
+	if (!ret) {
+		if (copy_to_user((void __user *) (unsigned long) cmd.response,
+				 &resp, sizeof resp))
+			return -EFAULT;
+
+		return in_len;
+	}
+
+	return ret;
+}
+
 ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 			     const char __user *buf, int in_len, int out_len)
 {
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 5f284ff..ef9eaa5 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -109,6 +109,7 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_CMD_MODIFY_SRQ]		= ib_uverbs_modify_srq,
 	[IB_USER_VERBS_CMD_QUERY_SRQ]		= ib_uverbs_query_srq,
 	[IB_USER_VERBS_CMD_DESTROY_SRQ]		= ib_uverbs_destroy_srq,
+	[IB_USER_VERBS_CMD_GET_ETH_L2_ADDR]	= ib_uverbs_get_eth_l2_addr,
 };
 
 static struct vfsmount *uverbs_event_mnt;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f9cbdb6..f586702 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -920,3 +920,13 @@ int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
 	return qp->device->detach_mcast(qp, gid, lid);
 }
 EXPORT_SYMBOL(ib_detach_mcast);
+
+int ib_get_eth_l2_addr(struct ib_device *device, u8 port, union ib_gid *gid,
+		       int sgid_idx, u8 *mac, __u16 *vlan_id, u8 *tagged)
+{
+	if (!device->get_eth_l2_addr)
+		return -ENOSYS;
+
+	return device->get_eth_l2_addr(device, port, gid, sgid_idx, mac, vlan_id, tagged);
+}
+EXPORT_SYMBOL(ib_get_eth_l2_addr);
diff --git a/include/rdma/ib_user_verbs.h b/include/rdma/ib_user_verbs.h
index a17f771..09f38df 100644
--- a/include/rdma/ib_user_verbs.h
+++ b/include/rdma/ib_user_verbs.h
@@ -81,7 +81,8 @@ enum {
 	IB_USER_VERBS_CMD_MODIFY_SRQ,
 	IB_USER_VERBS_CMD_QUERY_SRQ,
 	IB_USER_VERBS_CMD_DESTROY_SRQ,
-	IB_USER_VERBS_CMD_POST_SRQ_RECV
+	IB_USER_VERBS_CMD_POST_SRQ_RECV,
+	IB_USER_VERBS_CMD_GET_ETH_L2_ADDR
 };
 
 /*
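Illustrative note, not part of the patch: the uverbs change above follows a common pattern in which a new command gets an enum value appended at the end (so existing values keep their numbers) plus a slot in a table of handlers indexed by that value. The sketch below shows that pattern in plain user-space C. All `demo_*` names, the `DEMO_RESP_SIZE` constant, and the handler bodies are hypothetical stand-ins, not the kernel's code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical command numbers. New commands are appended at the end so
 * that values already compiled into existing binaries do not change. */
enum demo_cmd {
	DEMO_CMD_QUERY_PORT,
	DEMO_CMD_DESTROY_SRQ,
	DEMO_CMD_GET_ETH_L2_ADDR,	/* newly appended command */
	DEMO_CMD_MAX
};

/* Assumed response size for the sketch: 6-byte MAC, 2-byte VLAN ID,
 * 1-byte tagged flag, 3 reserved bytes. */
#define DEMO_RESP_SIZE 12

typedef ssize_t (*demo_handler)(const char *buf, int in_len, int out_len);

static ssize_t demo_query_port(const char *buf, int in_len, int out_len)
{
	(void)buf;
	(void)out_len;
	return in_len;		/* success: report the bytes consumed */
}

static ssize_t demo_get_eth_l2_addr(const char *buf, int in_len, int out_len)
{
	(void)buf;
	if (out_len < DEMO_RESP_SIZE)
		return -ENOSPC;	/* caller's response buffer is too small */
	return in_len;
}

/* Designated initializers leave unlisted slots NULL. */
static const demo_handler demo_cmd_table[DEMO_CMD_MAX] = {
	[DEMO_CMD_QUERY_PORT]      = demo_query_port,
	[DEMO_CMD_GET_ETH_L2_ADDR] = demo_get_eth_l2_addr,
};

static ssize_t demo_dispatch(unsigned int cmd, const char *buf,
			     int in_len, int out_len)
{
	/* Reject out-of-range commands and commands with no handler. */
	if (cmd >= DEMO_CMD_MAX || !demo_cmd_table[cmd])
		return -EINVAL;
	return demo_cmd_table[cmd](buf, in_len, out_len);
}
```

With this layout, a command that has no table entry (here `DEMO_CMD_DESTROY_SRQ`) is rejected with `-EINVAL` rather than dereferencing a NULL pointer.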
[ewg] [PATCHv8 09/11] mlx4: Add support for IBoE - address resolution
The following patch handles address vector creation for IBoE ports. mlx4 needs the MAC address of the remote node so that it can include it in the WQE of a UD QP or in the QP context of connected QPs. Address resolution is done atomically when the destination is a link-local address or a multicast GID; otherwise -EINVAL is returned. mlx4 transport packets were changed too, to accommodate IBoE.

Multicast group attach/detach calls dev_mc_add/remove to update the NIC's multicast filters. Attaching a QP to a multicast group does not require the QP to be in any state other than INIT, which is fine for IB. For IBoE, however, we need the port assigned to the QP in order to call dev_mc_add() for the correct netdevice, and the port is assigned only when moving from INIT to RTR. Hence, we must keep track of all the multicast groups attached to a QP and call dev_mc_add() when the port becomes available.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
Changes from v7:
1. Fix failure to initialize gid_index in create_iboe_ah().
2. Move the call to register_netdevice_notifier() after the call to ib_register_device().
3. Call flush_workqueue() after unregistering the notifier to flush any pending work requests.
4. Change build_mlx_header to match changes to ud_header.c.

 drivers/infiniband/hw/mlx4/ah.c        |  182 ++---
 drivers/infiniband/hw/mlx4/mad.c       |   32 ++-
 drivers/infiniband/hw/mlx4/main.c      |  497 +---
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |   35 +++-
 drivers/infiniband/hw/mlx4/qp.c        |  139 +++--
 drivers/infiniband/hw/mthca/mthca_qp.c |    2 +
 drivers/net/mlx4/en_port.c             |    4 +-
 drivers/net/mlx4/en_port.h             |    3 +-
 drivers/net/mlx4/fw.c                  |    3 +-
 include/linux/mlx4/cmd.h               |    1 +
 include/linux/mlx4/device.h            |   31 ++-
 include/linux/mlx4/qp.h                |    7 +-
 12 files changed, 819 insertions(+), 117 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index c75ac94..0a2f1fb 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -31,63 +31,158 @@
  */
 
 #include "mlx4_ib.h"
+#include <rdma/ib_addr.h>
+#include <linux/inet.h>
+#include <linux/string.h>
 
-struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+int mlx4_ib_resolve_grh(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah_attr,
+			u8 *mac, int *is_mcast, u8 port)
 {
-	struct mlx4_dev *dev = to_mdev(pd->device)->dev;
-	struct mlx4_ib_ah *ah;
+	struct mlx4_ib_iboe *iboe = &dev->iboe;
+	struct in6_addr in6;
 
-	ah = kmalloc(sizeof *ah, GFP_ATOMIC);
-	if (!ah)
-		return ERR_PTR(-ENOMEM);
+	*is_mcast = 0;
+	spin_lock(&iboe->lock);
+	if (!iboe->netdevs[port - 1]) {
+		spin_unlock(&iboe->lock);
+		return -EINVAL;
+	}
+	spin_unlock(&iboe->lock);
 
-	memset(&ah->av, 0, sizeof ah->av);
+	memcpy(&in6, ah_attr->grh.dgid.raw, sizeof in6);
+	if (rdma_link_local_addr(&in6))
+		rdma_get_ll_mac(&in6, mac);
+	else if (rdma_is_multicast_addr(&in6)) {
+		rdma_get_mcast_mac(&in6, mac);
+		*is_mcast = 1;
+	} else
+		return -EINVAL;
 
-	ah->av.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
-	ah->av.g_slid  = ah_attr->src_path_bits;
-	ah->av.dlid    = cpu_to_be16(ah_attr->dlid);
-	if (ah_attr->static_rate) {
-		ah->av.stat_rate = ah_attr->static_rate + MLX4_STAT_RATE_OFFSET;
-		while (ah->av.stat_rate > IB_RATE_2_5_GBPS + MLX4_STAT_RATE_OFFSET &&
-		       !(1 << ah->av.stat_rate & dev->caps.stat_rate_support))
-			--ah->av.stat_rate;
-	}
-	ah->av.sl_tclass_flowlabel = cpu_to_be32(ah_attr->sl << 28);
+	return 0;
+}
+
+static struct ib_ah *create_ib_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+				  struct mlx4_ib_ah *ah)
+{
+	struct mlx4_dev *dev = to_mdev(pd->device)->dev;
+
+	ah->av.ib.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
+	ah->av.ib.g_slid  = ah_attr->src_path_bits;
 	if (ah_attr->ah_flags & IB_AH_GRH) {
-		ah->av.g_slid   |= 0x80;
-		ah->av.gid_index = ah_attr->grh.sgid_index;
-		ah->av.hop_limit = ah_attr->grh.hop_limit;
-		ah->av.sl_tclass_flowlabel |=
+		ah->av.ib.g_slid   |= 0x80;
+		ah->av.ib.gid_index = ah_attr->grh.sgid_index;
+		ah->av.ib.hop_limit = ah_attr->grh.hop_limit;
+		ah->av.ib.sl_tclass_flowlabel |=
 			cpu_to_be32((ah_attr->grh.traffic_class << 20) |
 				    ah_attr->grh.flow_label);
-		memcpy(ah->av.dgid, ah_attr->grh.dgid.raw, 16);
+		memcpy(ah->av.ib.dgid, ah_attr->grh.dgid.raw, 16);
+	}
+
+	ah->av.ib.dlid    =
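Illustrative note, not part of the patch: the resolution step described above can be done without any lookup, because the destination MAC is derived from the destination GID itself. The sketch below is a user-space approximation under stated assumptions: a link-local GID (fe80::/64) carries a modified EUI-64 interface ID, and a multicast GID (ff00::/8) maps to the IPv6-style `33:33:xx:xx:xx:xx` Ethernet address built from its last four bytes. The name `demo_gid_to_mac` is hypothetical; the kernel helpers called in the diff (rdma_get_ll_mac(), rdma_get_mcast_mac()) are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Derive a destination MAC from a 16-byte GID.
 * Returns 0 and fills mac[6] on success, or -1 when the GID is neither
 * link-local nor multicast (the case where the patch returns -EINVAL). */
static int demo_gid_to_mac(const uint8_t gid[16], uint8_t mac[6], int *is_mcast)
{
	static const uint8_t ll_prefix[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

	*is_mcast = 0;

	if (memcmp(gid, ll_prefix, sizeof ll_prefix) == 0) {
		/* Modified EUI-64: MAC bytes 0-2 sit in gid[8..10] and
		 * MAC bytes 3-5 in gid[13..15]; gid[11..12] are filler. */
		memcpy(mac, gid + 8, 3);
		memcpy(mac + 3, gid + 13, 3);
		mac[0] ^= 2;	/* undo the universal/local bit flip */
		return 0;
	}

	if (gid[0] == 0xff) {
		/* IPv6-style multicast mapping: 33:33 + low 32 bits. */
		mac[0] = 0x33;
		mac[1] = 0x33;
		memcpy(mac + 2, gid + 12, 4);
		*is_mcast = 1;
		return 0;
	}

	return -1;
}
```

For example, the link-local GID fe80::0211:22ff:fe33:4455 yields the MAC 00:11:22:33:44:55 under these assumptions, while a global-scope GID is rejected.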
[ewg] [PATCHv8 10/11] ib_core: Add VLAN support to IBoE
Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the GID derived from a link-local address in the following way: GID[11] and GID[12] contain the VLAN ID. The 3-bit user priority field is identical to the 3 bits of the SL. In the case of rdma_cm apps, the TOS field is used to generate the SL field by shifting it right by 5 bits, effectively taking the 3 most significant bits of the TOS field.

In order to support userspace verbs consumers, ib_uverbs_get_mac has been changed into ib_uverbs_get_eth_l2_addr and now returns both MAC and VLAN information.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 drivers/infiniband/core/cma.c       |   20 -
 drivers/infiniband/core/ucma.c      |   13 -
 drivers/infiniband/core/ud_header.c |   31 -
 include/rdma/ib_addr.h              |   49 ---
 include/rdma/ib_pack.h              |   19 ++---
 5 files changed, 106 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index df5f636..108d1bb 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1763,6 +1763,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	struct sockaddr_in *src_addr = (struct sockaddr_in *)&route->addr.src_addr;
 	struct sockaddr_in *dst_addr = (struct sockaddr_in *)&route->addr.dst_addr;
 	struct net_device *ndev = NULL;
+	u16 vid;
 
 	if (src_addr->sin_family != dst_addr->sin_family)
 		return -EINVAL;
@@ -1782,14 +1783,6 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 
 	route->num_paths = 1;
 
-	iboe_mac_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr);
-	iboe_mac_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr);
-
-	route->path_rec->hop_limit = 1;
-	route->path_rec->reversible = 1;
-	route->path_rec->pkey = cpu_to_be16(0x);
-	route->path_rec->mtu_selector = IB_SA_EQ;
-
 	if (addr->dev_addr.bound_dev_if)
 		ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if);
 	if (!ndev) {
@@ -1797,6 +1790,17 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		goto err2;
 	}
 
+	vid = rdma_vlan_dev_vlan_id(ndev);
+
+	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr, vid);
+	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr, vid);
+
+	route->path_rec->hop_limit = 1;
+	route->path_rec->reversible = 1;
+	route->path_rec->pkey = cpu_to_be16(0x);
+	route->path_rec->mtu_selector = IB_SA_EQ;
+	route->path_rec->sl = id_priv->tos >> 5;
+
 	route->path_rec->mtu = iboe_get_mtu(ndev->mtu);
 	route->path_rec->rate_selector = IB_SA_EQ;
 	route->path_rec->rate = iboe_get_rate(ndev);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index fcc27bc..ed670f5 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -586,13 +586,22 @@ static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 				 struct rdma_route *route)
 {
 	struct rdma_dev_addr *dev_addr;
+	struct net_device *dev;
+	u16 vid = 0;
 
 	resp->num_paths = route->num_paths;
 	switch (route->num_paths) {
 	case 0:
 		dev_addr = &route->addr.dev_addr;
-		iboe_mac_to_ll((union ib_gid *) &resp->ib_route[0].dgid,
-			       dev_addr->dst_dev_addr);
+		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+		if (dev) {
+			vid = rdma_vlan_dev_vlan_id(dev);
+			dev_put(dev);
+		}
+
+
+		iboe_mac_vlan_to_ll((union ib_gid *) &resp->ib_route[0].dgid,
+				    dev_addr->dst_dev_addr, vid);
 		iboe_addr_get_sgid(dev_addr,
 				   (union ib_gid *) &resp->ib_route[0].sgid);
 		resp->ib_route[0].pkey = cpu_to_be16(0x);
diff --git a/drivers/infiniband/core/ud_header.c b/drivers/infiniband/core/ud_header.c
index 7650313..7d03cf1 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -33,6 +33,7 @@
 
 #include <linux/errno.h>
 #include <linux/string.h>
+#include <linux/if_ether.h>
 
 #include <rdma/ib_pack.h>
 
@@ -103,6 +104,17 @@ static const struct ib_field eth_table[] = {
 	  .size_bits    = 16 }
 };
 
+static const struct ib_field vlan_table[] = {
+	{ STRUCT_FIELD(vlan, tag),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(vlan, type),
+	  .offset_words = 0,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 }
+};
+
 static const struct ib_field grh_table[] = {
 	{ STRUCT_FIELD(grh, ip_version),
 	  .offset_words = 0,
@@ -205,6 +217,7 @@
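Illustrative note, not part of the patch: the commit message describes two small encodings, the VLAN ID stored in GID[11] and GID[12] of a MAC-derived link-local GID, and the SL taken from the top three bits of the TOS. The sketch below shows one way these could look in user-space C. The byte order of the VLAN ID (high byte in GID[11]) and the handling of an untagged port are assumptions, since the body of iboe_mac_vlan_to_ll() is not shown in this excerpt; all `demo_*` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build a link-local GID from a MAC and a VLAN ID.
 * Assumption: VLAN ID high byte goes to gid[11], low byte to gid[12]. */
static void demo_mac_vlan_to_ll_gid(uint8_t gid[16], const uint8_t mac[6],
				    uint16_t vid)
{
	memset(gid, 0, 16);
	gid[0] = 0xfe;			/* fe80::/64 link-local prefix */
	gid[1] = 0x80;
	memcpy(gid + 8, mac, 3);	/* MAC bytes 0-2 */
	gid[8] ^= 2;			/* flip the universal/local bit */
	gid[11] = (uint8_t)(vid >> 8);
	gid[12] = (uint8_t)(vid & 0xff);
	memcpy(gid + 13, mac + 3, 3);	/* MAC bytes 3-5 */
}

/* Recover the VLAN ID from a GID built by the function above. */
static uint16_t demo_gid_vlan_id(const uint8_t gid[16])
{
	return (uint16_t)((gid[11] << 8) | gid[12]);
}

/* SL is the three most significant bits of the 8-bit TOS value. */
static uint8_t demo_tos_to_sl(uint8_t tos)
{
	return (uint8_t)(tos >> 5);
}
```

Under these assumptions, MAC 00:02:c9:01:02:03 on VLAN 6 gives the GID fe80::0202:c900:0601:0203, and a TOS of 0xe0 maps to SL 7.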
[ewg] [PATCHv8 11/11] mlx4: Add vlan support to IBoE
This patch allows IBoE traffic to be encapsulated in 802.1q tagged VLAN frames. The VLAN tag is encoded in the GID and derived from it by a simple computation. The netdev notifier callback is modified to catch the addition and removal of VLAN devices, and the port's GID table is updated to reflect the change so that there is an entry in the GID table for each netdevice. When the port's GID table is exhausted, no further GID entries are added. Only children of the main interface can add entries to the GID table. If a VLAN interface is added on top of another VLAN interface (e.g. vconfig add eth2.6 8), that interface will not add an entry to the GID table.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 drivers/infiniband/hw/mlx4/ah.c        |   22 +++--
 drivers/infiniband/hw/mlx4/main.c      |   84 +++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |    4 +-
 drivers/infiniband/hw/mlx4/qp.c        |   49 +++---
 drivers/infiniband/hw/mthca/mthca_qp.c |    2 +-
 drivers/net/mlx4/en_netdev.c           |   10
 drivers/net/mlx4/mlx4_en.h             |    1 +
 drivers/net/mlx4/port.c                |   19 +++
 include/linux/mlx4/device.h            |    1 +
 include/linux/mlx4/qp.h                |    2 +-
 10 files changed, 166 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index 0a2f1fb..32911c0 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -34,6 +34,7 @@
 #include <rdma/ib_addr.h>
 #include <linux/inet.h>
 #include <linux/string.h>
+#include <rdma/ib_cache.h>
 
 int mlx4_ib_resolve_grh(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah_attr,
 			u8 *mac, int *is_mcast, u8 port)
@@ -98,6 +99,8 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 	u8 mac[6];
 	int err;
 	int is_mcast;
+	u16 vlan_tag;
+	union ib_gid sgid;
 
 	err = mlx4_ib_resolve_grh(ibdev, ah_attr, mac, &is_mcast, ah_attr->port_num);
 	if (err)
@@ -105,8 +108,14 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 
 	memcpy(ah->av.eth.mac_0_1, mac, 2);
 	memcpy(ah->av.eth.mac_2_5, mac + 2, 4);
+	err = ib_get_cached_gid(pd->device, ah_attr->port_num, ah_attr->grh.sgid_index, &sgid);
+	if (err)
+		return ERR_PTR(err);
+	vlan_tag = rdma_get_vlan_id(&sgid);
+	vlan_tag |= (ah_attr->sl & 7) << 13;
 	ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
 	ah->av.eth.gid_index = ah_attr->grh.sgid_index;
+	ah->av.eth.vlan = cpu_to_be16(vlan_tag);
 	if (ah_attr->static_rate) {
 		ah->av.eth.stat_rate = ah_attr->static_rate + MLX4_STAT_RATE_OFFSET;
 		while (ah->av.eth.stat_rate > IB_RATE_2_5_GBPS + MLX4_STAT_RATE_OFFSET &&
@@ -194,8 +203,8 @@ int mlx4_ib_destroy_ah(struct ib_ah *ah)
 	return 0;
 }
 
-int mlx4_ib_get_eth_l2_addr(struct ib_device *device, u8 port, union ib_gid *gid,
-			    int sgid_idx, u8 *mac, u16 *vlan_id)
+int mlx4_ib_get_eth_l2_addr(struct ib_device *device, u8 port, union ib_gid *dgid,
+			    int sgid_idx, u8 *mac, u16 *vlan_id, u8 *tagged)
 {
 	int err;
 	struct mlx4_ib_dev *ibdev = to_mdev(device);
@@ -203,13 +212,18 @@ int mlx4_ib_get_eth_l2_addr(struct ib_device *device, u8 port, union ib_gid *gid
 		.port_num = port,
 	};
 	int is_mcast;
+	union ib_gid sgid;
 
-	memcpy(ah_attr.grh.dgid.raw, gid, 16);
+	memcpy(ah_attr.grh.dgid.raw, dgid, 16);
 	err = mlx4_ib_resolve_grh(ibdev, &ah_attr, mac, &is_mcast, port);
 	if (err)
 		ERR_PTR(err);
 
-	*vlan_id = 0;
+	err = ib_get_cached_gid(device, port, sgid_idx, &sgid);
+	if (err)
+		return err;
+	*vlan_id = rdma_get_vlan_id(&sgid);
+	*tagged = !!(*vlan_id);
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 3b8ab83..f02897d 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -37,6 +37,7 @@
 #include <linux/netdevice.h>
 #include <linux/inetdevice.h>
 #include <linux/rtnetlink.h>
+#include <linux/if_vlan.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -78,6 +79,8 @@ static void init_query_mad(struct ib_smp *mad)
 	mad->method	   = IB_MGMT_METHOD_GET;
 }
 
+static union ib_gid zgid;
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props)
 {
@@ -800,12 +803,17 @@ static struct device_attribute *mlx4_class_attributes[] = {
 	&dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
+static void mlx4_addrconf_ifid_eui48(u8 *eui, int is_vlan, u16 vlan_id, struct net_device *dev)
 {
 	memcpy(eui, dev->dev_addr, 3);
 	memcpy(eui + 5,
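Illustrative note, not part of the patch: create_iboe_ah() above combines the VLAN ID with the low three bits of the SL shifted into bits 13 to 15, which matches the layout of the 16-bit 802.1Q tag control field (priority in the top three bits, VLAN ID in the low twelve). The sketch below packs and unpacks that field in host byte order. The `demo_*` names are hypothetical, and masking the VLAN ID to 12 bits is an addition of this sketch rather than something the patch does.

```c
#include <stdint.h>

/* Pack a VLAN ID and a service level into an 802.1Q-style tag control
 * field: bits 15-13 priority, bit 12 left clear, bits 11-0 VLAN ID. */
static uint16_t demo_build_tci(uint16_t vlan_id, uint8_t sl)
{
	return (uint16_t)((vlan_id & 0x0fff) | ((sl & 7) << 13));
}

/* Extract the 12-bit VLAN ID from a tag control field. */
static uint16_t demo_tci_vlan_id(uint16_t tci)
{
	return (uint16_t)(tci & 0x0fff);
}

/* Extract the 3-bit priority from a tag control field. */
static uint8_t demo_tci_priority(uint16_t tci)
{
	return (uint8_t)(tci >> 13);
}
```

The value would still need a host-to-big-endian conversion (cpu_to_be16() in the kernel, htons() in user space) before being written into a header or address vector, as the patch does.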
[ewg] [PATCHv8 1/4] libibverbs: Add link layer field to ibv_port_attr
This field can have one of the following values: IBV_LINK_LAYER_UNSPECIFIED, IBV_LINK_LAYER_INFINIBAND, IBV_LINK_LAYER_ETHERNET. Applications can use it to learn the link layer used by the port, which can be either InfiniBand or Ethernet. The addition of the new field does not change the size of struct ibv_port_attr because of the alignment padding after the preceding field. Binary compatibility is not compromised either: new applications running with old libraries will determine the link layer to be IB, while old applications running with a new library do not read this field.

Solution suggested by:
    Roland Dreier <rola...@cisco.com>
    Jason Gunthorpe <jguntho...@obsidianresearch.com>

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 include/infiniband/verbs.h |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 0f1cb2e..17df3ff 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -161,6 +161,12 @@ enum ibv_port_state {
 	IBV_PORT_ACTIVE_DEFER	= 5
 };
 
+enum {
+	IBV_LINK_LAYER_UNSPECIFIED,
+	IBV_LINK_LAYER_INFINIBAND,
+	IBV_LINK_LAYER_ETHERNET,
+};
+
 struct ibv_port_attr {
 	enum ibv_port_state	state;
 	enum ibv_mtu		max_mtu;
@@ -181,6 +187,8 @@ struct ibv_port_attr {
 	uint8_t			active_width;
 	uint8_t			active_speed;
 	uint8_t			phys_state;
+	uint8_t			link_layer;
+	uint8_t			pad;
 };
 
 enum ibv_event_type {
@@ -693,6 +701,16 @@ struct ibv_context {
 	void		       *abi_compat;
 };
 
+static inline int ___ibv_query_port(struct ibv_context *context,
+				    uint8_t port_num,
+				    struct ibv_port_attr *port_attr)
+{
+	port_attr->link_layer = IBV_LINK_LAYER_UNSPECIFIED;
+	port_attr->pad = 0;
+
+	return context->ops.query_port(context, port_num, port_attr);
+}
+
 /**
  * ibv_get_device_list - Get list of IB devices currently available
  * @num_devices: optional.  if non-NULL, set to the number of devices
@@ -1097,4 +1115,7 @@ END_C_DECLS
 
 #  undef __attribute_const
 
+#define ibv_query_port(context, port_num, port_attr) \
+	___ibv_query_port(context, port_num, port_attr)
+
 #endif /* INFINIBAND_VERBS_H */
-- 
1.7.0
_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
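Illustrative note, not part of the patch: the claim that the two new one-byte fields do not change the structure size can be checked with a small model. The two structures below are hypothetical stand-ins that only mirror the field widths assumed for struct ibv_port_attr (eight 32-bit members, three 16-bit members, eight 8-bit members); they are not the library's definitions, and the result holds on ABIs where 32-bit integers are 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the layout before the patch: 32 + 6 + 8 = 46 bytes of
 * members, rounded up to 48 by the 4-byte alignment of the struct. */
struct demo_port_attr_old {
	uint32_t words[8];	/* stands in for the enum/int/uint32_t fields */
	uint16_t halves[3];	/* stands in for the uint16_t fields */
	uint8_t  bytes[8];	/* stands in for the uint8_t fields */
};

/* Model of the layout after the patch: the two new bytes occupy what
 * used to be tail padding, so the total stays at 48 bytes. */
struct demo_port_attr_new {
	uint32_t words[8];
	uint16_t halves[3];
	uint8_t  bytes[8];
	uint8_t  link_layer;
	uint8_t  pad;
};
```

Comparing `sizeof(struct demo_port_attr_old)` with `sizeof(struct demo_port_attr_new)` shows the same value in this model, which is why a caller built against the new header can pass its buffer to an older library without overflow.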
[ewg] [PATCHv8 2/4] libibverbs: change kernel API to accept link layer
Modify the code to allow passing the link layer of a port from the kernel to user space. Update the ibv_query_port.3 man page with the change.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 include/infiniband/kern-abi.h |    3 ++-
 man/ibv_query_port.3          |    1 +
 src/cmd.c                     |    1 +
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 0db083a..619ea7e 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -223,7 +223,8 @@ struct ibv_query_port_resp {
 	__u8  active_width;
 	__u8  active_speed;
 	__u8  phys_state;
-	__u8  reserved[3];
+	__u8  link_layer;
+	__u8  reserved[2];
 };
 
 struct ibv_alloc_pd {
diff --git a/man/ibv_query_port.3 b/man/ibv_query_port.3
index 882470d..6d8b873 100644
--- a/man/ibv_query_port.3
+++ b/man/ibv_query_port.3
@@ -44,6 +44,7 @@ uint8_t                 init_type_reply;/* Type of initialization performed by S
 uint8_t                 active_width;   /* Currently active link width */
 uint8_t                 active_speed;   /* Currently active link speed */
 uint8_t                 phys_state;     /* Physical port state */
+uint8_t                 link_layer;     /* link layer protocol of the port */
 .in -8
 };
 .sp
diff --git a/src/cmd.c b/src/cmd.c
index cbd5288..39af833 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -196,6 +196,7 @@ int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num,
 	port_attr->active_width    = resp.active_width;
 	port_attr->active_speed    = resp.active_speed;
 	port_attr->phys_state      = resp.phys_state;
+	port_attr->link_layer      = resp.link_layer;
 
 	return 0;
 }
-- 
1.7.0
[ewg] [PATCHv8 3/4] libibverbs: Add API to retrieve eth link layer address
Add a command to retrieve the layer 2 address of an Ethernet port. The layer 2 address consists of the port's MAC address and the VLAN ID. This is required by libraries to build work requests when the port's link layer is Ethernet.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 include/infiniband/driver.h   |    2 ++
 include/infiniband/kern-abi.h |   23 ++-
 src/cmd.c                     |   24
 src/libibverbs.map            |    1 +
 4 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 9a81416..3e09548 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -131,6 +131,8 @@ int ibv_cmd_create_ah(struct ibv_pd *pd, struct ibv_ah *ah,
 int ibv_cmd_destroy_ah(struct ibv_ah *ah);
 int ibv_cmd_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid);
 int ibv_cmd_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid);
+int ibv_cmd_get_eth_l2_addr(struct ibv_pd *pd, uint8_t port, const union ibv_gid *gid,
+			    int sgid_idx, uint8_t *mac, uint16_t *vlan_id, uint8_t *tagged);
 
 int ibv_dontfork_range(void *base, size_t size);
 int ibv_dofork_range(void *base, size_t size);
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 619ea7e..642c7db 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -85,7 +85,8 @@ enum {
 	IB_USER_VERBS_CMD_MODIFY_SRQ,
 	IB_USER_VERBS_CMD_QUERY_SRQ,
 	IB_USER_VERBS_CMD_DESTROY_SRQ,
-	IB_USER_VERBS_CMD_POST_SRQ_RECV
+	IB_USER_VERBS_CMD_POST_SRQ_RECV,
+	IB_USER_VERBS_CMD_GET_ETH_L2_ADDR,
 };
 
 /*
@@ -804,6 +805,7 @@ enum {
 	 * trick opcodes in IBV_INIT_CMD() doesn't break.
 	 */
 	IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL_V2 = -1,
+	IB_USER_VERBS_CMD_GET_ETH_L2_ADDR_V2 = -1,
 };
 
 struct ibv_destroy_cq_v1 {
@@ -879,4 +881,23 @@ struct ibv_create_srq_resp_v5 {
 	__u32 srq_handle;
 };
 
+struct ibv_get_eth_l2_addr {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u64 response;
+	__u32 pd_handle;
+	__u8  port;
+	__u8  sgid_idx;
+	__u8  reserved[2];
+	__u8  dgid[16];
+};
+
+struct ibv_get_eth_l2_addr_resp {
+	__u8	mac[6];
+	__u16	vlan_id;
+	__u8	tagged;
+	__u8	reserved[3];
+};
+
 #endif /* KERN_ABI_H */
diff --git a/src/cmd.c b/src/cmd.c
index 39af833..6a3c101 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -1123,3 +1123,27 @@ int ibv_cmd_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t l
 
 	return 0;
 }
+
+int ibv_cmd_get_eth_l2_addr(struct ibv_pd *pd, uint8_t port, const union ibv_gid *gid,
+			    int sgid_idx, uint8_t *mac, uint16_t *vlan_id, uint8_t *tagged)
+
+{
+	struct ibv_get_eth_l2_addr cmd;
+	struct ibv_get_eth_l2_addr_resp resp;
+
+	IBV_INIT_CMD_RESP(&cmd, sizeof cmd, GET_ETH_L2_ADDR, &resp, sizeof resp);
+	memcpy(cmd.dgid, gid, sizeof cmd.dgid);
+	cmd.pd_handle = pd->handle;
+	cmd.port = port;
+	cmd.sgid_idx = sgid_idx;
+
+	if (write(pd->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd)
+		return errno;
+
+	memcpy(mac, resp.mac, 6);
+	*vlan_id = resp.vlan_id;
+	*tagged = resp.tagged;
+
+	return 0;
+}
+
diff --git a/src/libibverbs.map b/src/libibverbs.map
index 1827da0..af52d0d 100644
--- a/src/libibverbs.map
+++ b/src/libibverbs.map
@@ -64,6 +64,7 @@ IBVERBS_1.0 {
 		ibv_cmd_destroy_ah;
 		ibv_cmd_attach_mcast;
 		ibv_cmd_detach_mcast;
+		ibv_cmd_get_eth_l2_addr;
 		ibv_copy_qp_attr_from_kern;
 		ibv_copy_path_rec_from_kern;
 		ibv_copy_path_rec_to_kern;
-- 
1.7.0
[ewg] [PATCHv8 4/4] libibverbs: Update examples for IBoE
Since IBoE requires the use of a GRH, update the ibv_*_pingpong examples to accept GIDs. GIDs are given as an index into the local port's GID table and are exchanged between the client and the server through the socket connection. The examples are also modified to pass the GID index to the code that creates the address vector, in preparation for using GIDs other than the one at index 0.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 examples/devinfo.c      |   14 +++
 examples/pingpong.c     |   31
 examples/pingpong.h     |    4 ++
 examples/rc_pingpong.c  |   91 ++
 examples/srq_pingpong.c |   84 ---
 examples/uc_pingpong.c  |   82 +++---
 examples/ud_pingpong.c  |   81 ++
 7 files changed, 297 insertions(+), 90 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index 84f95c7..393ec04 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -184,6 +184,19 @@ static int print_all_port_gids(struct ibv_context *ctx, uint8_t port_num, int tb
 	return rc;
 }
 
+static const char *link_layer_str(uint8_t link_layer)
+{
+	switch (link_layer) {
+	case IBV_LINK_LAYER_UNSPECIFIED:
+	case IBV_LINK_LAYER_INFINIBAND:
+		return "IB";
+	case IBV_LINK_LAYER_ETHERNET:
+		return "Ethernet";
+	default:
+		return "Unknown";
+	}
+}
+
 static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 {
 	struct ibv_context *ctx;
@@ -284,6 +297,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 		printf("\t\t\tsm_lid:\t\t\t%d\n", port_attr.sm_lid);
 		printf("\t\t\tport_lid:\t\t%d\n", port_attr.lid);
 		printf("\t\t\tport_lmc:\t\t0x%02x\n", port_attr.lmc);
+		printf("\t\t\tlink_layer:\t\t%s\n", link_layer_str(port_attr.link_layer));
 
 		if (verbose) {
 			printf("\t\t\tmax_msg_sz:\t\t0x%x\n", port_attr.max_msg_sz);
diff --git a/examples/pingpong.c b/examples/pingpong.c
index b916f59..806f446 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -31,6 +31,10 @@
  */
 
 #include "pingpong.h"
+#include <arpa/inet.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
 
 enum ibv_mtu pp_mtu_to_enum(int mtu)
 {
@@ -53,3 +57,30 @@ uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 
 	return attr.lid;
 }
+
+int pp_get_port_info(struct ibv_context *context, int port,
+		     struct ibv_port_attr *attr)
+{
+	return ibv_query_port(context, port, attr);
+}
+
+void wire_gid_to_gid(const char *wgid, union ibv_gid *gid)
+{
+	char tmp[9];
+	uint32_t v32;
+	int i;
+
+	for (tmp[8] = 0, i = 0; i < 4; ++i) {
+		memcpy(tmp, wgid + i * 8, 8);
+		sscanf(tmp, "%x", &v32);
+		*(uint32_t *)(&gid->raw[i * 4]) = ntohl(v32);
+	}
+}
+
+void gid_to_wire_gid(const union ibv_gid *gid, char wgid[])
+{
+	int i;
+
+	for (i = 0; i < 4; ++i)
+		sprintf(&wgid[i * 8], "%08x", htonl(*(uint32_t *)(gid->raw + i * 4)));
+}
diff --git a/examples/pingpong.h b/examples/pingpong.h
index 71d7c3f..9cdc03e 100644
--- a/examples/pingpong.h
+++ b/examples/pingpong.h
@@ -37,5 +37,9 @@
 
 enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
+int pp_get_port_info(struct ibv_context *context, int port,
+		     struct ibv_port_attr *attr);
+void wire_gid_to_gid(const char *wgid, union ibv_gid *gid);
+void gid_to_wire_gid(const union ibv_gid *gid, char wgid[]);
 
 #endif /* IBV_PINGPONG_H */
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index fa969e0..a63905d 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -67,17 +67,19 @@ struct pingpong_context {
 	int			 size;
 	int			 rx_depth;
 	int			 pending;
+	struct ibv_port_attr     portinfo;
 };
 
 struct pingpong_dest {
 	int lid;
 	int qpn;
 	int psn;
+	union ibv_gid gid;
 };
 
 static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn,
 			  enum ibv_mtu mtu, int sl,
-			  struct pingpong_dest *dest)
+			  struct pingpong_dest *dest, int sgid_idx)
 {
 	struct ibv_qp_attr attr = {
 		.qp_state		= IBV_QPS_RTR,
@@ -94,6 +96,13 @@ static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn,
 			.port_num	= port
 		}
 	};
+
+	if (dest->gid.global.interface_id) {
+		attr.ah_attr.is_global = 1;
+		attr.ah_attr.grh.hop_limit = 1;
+		attr.ah_attr.grh.dgid = dest->gid;
+		attr.ah_attr.grh.sgid_index = sgid_idx;
+	}
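Illustrative note, not part of the patch: the pingpong examples exchange GIDs over the socket as 32 hexadecimal characters, converting the GID four bytes at a time. The sketch below is a self-contained variant of that conversion using a plain 16-byte array instead of union ibv_gid and memcpy() instead of pointer casts, so it builds without libibverbs. The `demo_*` names and `DEMO_WIRE_GID_LEN` are hypothetical; only htonl()/ntohl(), sprintf() and sscanf() are relied on.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_WIRE_GID_LEN 32	/* 16 bytes as hex digits, without the NUL */

/* Encode a raw GID as 32 hex characters plus a terminating NUL. */
static void demo_gid_to_wire(const uint8_t gid[16],
			     char wgid[DEMO_WIRE_GID_LEN + 1])
{
	int i;

	for (i = 0; i < 4; ++i) {
		uint32_t v32;

		/* Load one 32-bit word in host order, then print it in
		 * network order so the text matches the byte sequence. */
		memcpy(&v32, gid + i * 4, sizeof v32);
		sprintf(wgid + i * 8, "%08x", (unsigned int)htonl(v32));
	}
}

/* Decode 32 hex characters back into a raw GID. */
static void demo_wire_to_gid(const char *wgid, uint8_t gid[16])
{
	char tmp[9];
	int i;

	tmp[8] = '\0';
	for (i = 0; i < 4; ++i) {
		unsigned int v = 0;
		uint32_t v32;

		memcpy(tmp, wgid + i * 8, 8);
		sscanf(tmp, "%x", &v);
		v32 = ntohl((uint32_t)v);
		memcpy(gid + i * 4, &v32, sizeof v32);
	}
}
```

Because each word passes through htonl() on the way out and ntohl() on the way in, the text form is the same on little-endian and big-endian hosts, which is what allows a client and server of different endianness to exchange it.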
[ewg] [PATCHv8 2/2] libmlx4: Add Eth devices to ib devices list
With the new IBoE implementation, Ethernet devices also expose IB devices. Update the list of supported devices to match that of the kernel.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 src/mlx4.c |    7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/mlx4.c b/src/mlx4.c
index 1295c53..6068208 100644
--- a/src/mlx4.c
+++ b/src/mlx4.c
@@ -66,6 +66,13 @@ struct {
 	HCA(MELLANOX, 0x6354),	/* MT25408 Hermon QDR */
 	HCA(MELLANOX, 0x6732),	/* MT25408 Hermon DDR PCIe gen2 */
 	HCA(MELLANOX, 0x673c),	/* MT25408 Hermon QDR PCIe gen2 */
+	HCA(MELLANOX, 0x6368),	/* MT25448 [ConnectX EN 10GigE, PCIe 2.0 2.5GT/s] */
+	HCA(MELLANOX, 0x6750),	/* MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] */
+	HCA(MELLANOX, 0x6372),	/* MT25408 [ConnectX EN 10GigE 10GBaseT, PCIe 2.0 2.5GT/s] */
+	HCA(MELLANOX, 0x675a),	/* MT25408 [ConnectX EN 10GigE 10GBaseT, PCIe Gen2 5GT/s] */
+	HCA(MELLANOX, 0x6764),	/* MT26468 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] */
+	HCA(MELLANOX, 0x6746),	/* MT26438 ConnectX EN 40GigE PCIe gen2 5GT/s */
+	HCA(MELLANOX, 0x676e),	/* MT26478 ConnectX2 40GigE PCIe gen2 */
 };
 
 static struct ibv_context_ops mlx4_ctx_ops = {
-- 
1.7.0
[ewg] [PATCHv8 1/2] libmlx4: Add IBoE support
Modify libmlx4 to support IBoE. The change involves retrieving the Ethernet layer 2 address of a port, based on its GID and source GID index, through a new command, ibv_cmd_get_eth_l2_addr(), and embedding the layer 2 information in mlx4's address vector representation.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
 src/mlx4.h  |    4
 src/qp.c    |    8 +++-
 src/verbs.c |   34 ++
 src/wqe.h   |    6 --
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/src/mlx4.h b/src/mlx4.h
index 4445998..4b12456 100644
--- a/src/mlx4.h
+++ b/src/mlx4.h
@@ -236,11 +236,15 @@ struct mlx4_av {
 	uint8_t				hop_limit;
 	uint32_t			sl_tclass_flowlabel;
 	uint8_t				dgid[16];
+	uint8_t				mac[8];
 };
 
 struct mlx4_ah {
 	struct ibv_ah			ibv_ah;
 	struct mlx4_av			av;
+	uint16_t			vlan;
+	uint8_t				mac[6];
+	uint8_t				tagged;
 };
 
 static inline unsigned long align(unsigned long val, unsigned long align)
diff --git a/src/qp.c b/src/qp.c
index d194ae3..fa70889 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -143,6 +143,8 @@ static void set_datagram_seg(struct mlx4_wqe_datagram_seg *dseg,
 	memcpy(dseg->av, &to_mah(wr->wr.ud.ah)->av, sizeof (struct mlx4_av));
 	dseg->dqpn = htonl(wr->wr.ud.remote_qpn);
 	dseg->qkey = htonl(wr->wr.ud.remote_qkey);
+	dseg->vlan = htons(to_mah(wr->wr.ud.ah)->vlan);
+	memcpy(dseg->mac, to_mah(wr->wr.ud.ah)->mac, 6);
 }
 
 static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ibv_sge *sg)
@@ -281,6 +283,10 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 			set_datagram_seg(wqe, wr);
 			wqe  += sizeof (struct mlx4_wqe_datagram_seg);
 			size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
+			if (to_mah(wr->wr.ud.ah)->tagged) {
+				ctrl->ins_vlan = 1 << 6;
+				ctrl->vlan_tag = htons(to_mah(wr->wr.ud.ah)->vlan);
+			}
 			break;
 
 		default:
@@ -393,7 +399,7 @@ out:
 
 	if (nreq == 1 && inl && size > 1 && size < ctx->bf_buf_size / 16) {
 		ctrl->owner_opcode |= htonl((qp->sq.head & 0x) << 8);
-		*(uint32_t *) ctrl->reserved |= qp->doorbell_qpn;
+		*(uint32_t *) (&ctrl->vlan_tag) |= qp->doorbell_qpn;
 		/*
 		 * Make sure that descriptor is written to memory
 		 * before writing to BlueFlame page.
diff --git a/src/verbs.c b/src/verbs.c
index 1ac1362..48731a7 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -614,9 +614,21 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
 	return 0;
 }
 
+static int mcast_mac(uint8_t *mac)
+{
+	int i;
+	uint8_t val = 0xff;
+
+	for (i = 0; i < 6; ++i)
+		val &= mac[i];
+
+	return val == 0xff;
+}
+
 struct ibv_ah *mlx4_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr)
 {
 	struct mlx4_ah *ah;
+	struct ibv_port_attr port_attr;
 
 	ah = malloc(sizeof *ah);
 	if (!ah)
@@ -642,7 +654,29 @@ struct ibv_ah *mlx4_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr)
 		memcpy(ah->av.dgid, attr->grh.dgid.raw, 16);
 	}
 
+	if (ibv_query_port(pd->context, attr->port_num, &port_attr))
+		goto err;
+
+	if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
+		if (ibv_cmd_get_eth_l2_addr(pd, attr->port_num,
+					    (const union ibv_gid *)ah->av.dgid,
+					    attr->grh.sgid_index,
+					    ah->mac, &ah->vlan, &ah->tagged))
+			goto err;
+
+		if (mcast_mac(ah->mac))
+			ah->av.dlid = htons(0xc000);
+		if (ah->tagged) {
+			ah->av.port_pd |= htonl(1 << 29);
+			ah->vlan |= (attr->sl & 7) << 13;
+		}
+	}
+
+
 	return &ah->ibv_ah;
+err:
+	free(ah);
+	return NULL;
 }
 
 int mlx4_destroy_ah(struct ibv_ah *ah)
diff --git a/src/wqe.h b/src/wqe.h
index 6f7f309..1e6159c 100644
--- a/src/wqe.h
+++ b/src/wqe.h
@@ -54,7 +54,8 @@ enum {
 
 struct mlx4_wqe_ctrl_seg {
 	uint32_t		owner_opcode;
-	uint8_t			reserved[3];
+	uint16_t		vlan_tag;
+	uint8_t			ins_vlan;
 	uint8_t			fence_size;
 	/*
 	 * High 24 bits are SRC remote buffer; low 8 bits are flags:
@@ -78,7 +79,8 @@ struct mlx4_wqe_datagram_seg {
 	uint32_t		av[8];
 	uint32_t		dqpn;
 	uint32_t		qkey;
-	uint32_t		reserved[2];
+	uint16_t
Re: [ewg] [PATCH OFED-151] ehca forward ports
Alexander Schmidt wrote: Hi Vlad, please apply for OFED-151. Forward ports for ehca driver to enable compilation on 2.6.32 and 2.6.31. Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com --- kernel_patches/backport/2.6.32/ehca-010-remove_driver_data.patch | 60 ++ kernel_patches/backport/2.6.32/ehca-020-fix_buswalk.patch| 17 ++ 2 files changed, 77 insertions(+) Hi Alex, I don't see patches for 2.6.31. Should they be here? Regards, Vladimir --- /dev/null +++ ofed_kernel-1.5/kernel_patches/backport/2.6.32/ehca-010-remove_driver_data.patch @@ -0,0 +1,60 @@ +commit f899c2ddd45f2515deb446e2b143e4a686a49aee +Author: Greg Kroah-Hartman gre...@suse.de +Date: Mon May 4 12:40:54 2009 -0700 + +infiniband: ehca: remove driver_data direct access of struct device + +In the near future, the driver core is going to not allow direct access +to the driver_data pointer in struct device. Instead, the functions +dev_get_drvdata() and dev_set_drvdata() should be used. These functions +have been around since the beginning, so are backwards compatible with +all older kernel versions. 
+
+Cc: Sean Hefty sean.he...@intel.com
+Cc: Roland Dreier rola...@cisco.com
+Cc: Hal Rosenstock hal.rosenst...@gmail.com
+Cc: gene...@lists.openfabrics.org
+Cc: Christoph Raisch rai...@de.ibm.com
+Acked-by: Hoang-Nam Nguyen hngu...@de.ibm.com
+Signed-off-by: Greg Kroah-Hartman gre...@suse.de
+
+diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
+index 85905ab..ce4e6ef 100644
+--- a/drivers/infiniband/hw/ehca/ehca_main.c
++++ b/drivers/infiniband/hw/ehca/ehca_main.c
+@@ -636,7 +636,7 @@ static ssize_t ehca_show_##name(struct device *dev, \
+ 	struct hipz_query_hca *rblock; \
+ 	int data; \
+ \
+-	shca = dev->driver_data; \
++	shca = dev_get_drvdata(dev); \
+ \
+ 	rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL); \
+ 	if (!rblock) { \
+@@ -680,7 +680,7 @@ static ssize_t ehca_show_adapter_handle(struct device *dev,
+ 					struct device_attribute *attr,
+ 					char *buf)
+ {
+-	struct ehca_shca *shca = dev->driver_data;
++	struct ehca_shca *shca = dev_get_drvdata(dev);
+ 
+ 	return sprintf(buf, "%llx\n", shca->ipz_hca_handle.handle);
+ 
+@@ -749,7 +749,7 @@ static int __devinit ehca_probe(struct of_device *dev,
+ 
+ 	shca->ofdev = dev;
+ 	shca->ipz_hca_handle.handle = *handle;
+-	dev->dev.driver_data = shca;
++	dev_set_drvdata(&dev->dev, shca);
+ 
+ 	ret = ehca_sense_attributes(shca);
+ 	if (ret < 0) {
+@@ -878,7 +878,7 @@ probe1:
+ 
+ static int __devexit ehca_remove(struct of_device *dev)
+ {
+-	struct ehca_shca *shca = dev->dev.driver_data;
++	struct ehca_shca *shca = dev_get_drvdata(&dev->dev);
+ 	unsigned long flags;
+ 	int ret;
+ 
--- /dev/null
+++ ofed_kernel-1.5/kernel_patches/backport/2.6.32/ehca-020-fix_buswalk.patch
@@ -0,0 +1,17 @@
+---
+ drivers/infiniband/hw/ehca/ehca_mrmw.c |2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+Index: ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_mrmw.c
+===
+--- ofa_kernel-1.5.1.orig/drivers/infiniband/hw/ehca/ehca_mrmw.c
++++ ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_mrmw.c
+@@ -2463,7 +2463,7 @@ int ehca_create_busmap(void)
+ 	int ret;
+ 
+ 	ehca_mr_len = 0;
+-	ret = walk_memory_resource(0, 1ULL << MAX_PHYSMEM_BITS, NULL,
++	ret = walk_system_ram_range(0, 1ULL << MAX_PHYSMEM_BITS, NULL,
+ 				   ehca_create_busmap_callback);
+ 	return ret;
+ }
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
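The ehca_main.c hunks above replace direct reads and writes of the driver_data field with dev_get_drvdata() and dev_set_drvdata(). As a rough illustration of why that indirection helps, here is a minimal userspace sketch, not kernel code. The mock_device and mock_shca structures and every mock_* function are hypothetical stand-ins invented for this example; only the accessor pattern itself is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device (illustration only). */
struct mock_device {
	void *private_data;	/* where the pointer lives is an internal detail */
};

/* Hypothetical analogue of dev_set_drvdata(): store the driver's pointer. */
static void mock_set_drvdata(struct mock_device *dev, void *data)
{
	assert(dev != NULL);
	dev->private_data = data;
}

/* Hypothetical analogue of dev_get_drvdata(): fetch the pointer back. */
static void *mock_get_drvdata(const struct mock_device *dev)
{
	return dev ? dev->private_data : NULL;
}

/* Stand-in for the driver's per-adapter state (struct ehca_shca in the patch). */
struct mock_shca {
	unsigned long long handle;
};

/* Probe-style helper: fill in the driver state and attach it to the device. */
static void mock_probe(struct mock_device *dev, struct mock_shca *shca,
		       unsigned long long handle)
{
	shca->handle = handle;
	mock_set_drvdata(dev, shca);
}

/* Show-style helper: read the handle back through the accessor only. */
static unsigned long long mock_show_handle(const struct mock_device *dev)
{
	const struct mock_shca *shca = mock_get_drvdata(dev);

	return shca ? shca->handle : 0;
}
```

Because mock_probe() and mock_show_handle() never name private_data, the field could be renamed or moved into another structure by changing only the two accessors. That is the property the backport patch relies on.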
[ewg] [PATCH] IB/qib: update driver for OFED 1.5.1
Vlad,

Please pull from:

git://git.openfabrics.org/~ralphc/linux-2.6 ofed_kernel_1_5

commit bbf2471eac44a9cf2db05803a212162da3898ca4
Author: Ralph Campbell (QLogic) ral...@lists.openfabrics.org
Date: Thu Feb 18 13:53:24 2010 -0800

    IB/qib: update driver for OFED 1.5.1

    This patch rolls up several fixes for the QIB driver to improve serdes
    settings, minor bug fixes, and copyright updates. It also adds a vendor
    specific performance MAD for returning some congestion statistics.

    Signed-off-by: Ralph Campbell ralph.campb...@qlogic.com
Re: [ewg] MLX4 Strangeness
Hi Tom,

Status 12 = IB_WC_RETRY_EXC_ERR
Vendor_err = 129 -- Timeout and transport error counter exceeded

This indicates that we lost the connection to the client, i.e. something went wrong on the client side (a bad operation causing a QP error...). Please try to catch any error on the client (QP async event, CQ error status and vendor_err...).

Today I just ran vdbench on a big file and got the error right away (lost connection, and nfsrdma cannot recover from there).

Thanks,
-vu

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tom Tucker
Sent: Wednesday, February 17, 2010 10:07 AM
To: Tziporet Koren
Cc: linux-r...@vger.kernel.org; ewg@lists.openfabrics.org
Subject: Re: [ewg] MLX4 Strangeness

Hi Tziporet:

Here is a trace with the data for the WR failing with status 12. The vendor error is 129.

Feb 17 12:27:33 vic10 kernel: rpcrdma_event_process:154 wr_id status 12 opcode 0 vendor_err 129 byte_len 0 qp 81002a13ec00 ex src_qp wc_flags, 0 pkey_index
Feb 17 12:27:33 vic10 kernel: rpcrdma_event_process:154 wr_id 81002878d800 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81002a13ec00 ex src_qp wc_flags, 0 pkey_index
Feb 17 12:27:33 vic10 kernel: rpcrdma_event_process:167 wr_id 81002878d800 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81002a13ec00 ex src_qp wc_flags, 0 pkey_index

Any thoughts?
Tom

Tom Tucker wrote:
> Tom Tucker wrote:
>> Tziporet Koren wrote:
>>> On 2/15/2010 10:24 PM, Tom Tucker wrote:
>>>> Hello,
>>>>
>>>> I am seeing some very strange behavior on my MLX4 adapters running 2.7 firmware and the latest OFED 1.5.1. Two systems are involved, and each has a dual-ported MTHCA DDR adapter and MLX4 adapters.
>>>>
>>>> The scenario starts with NFSRDMA stress testing between the two systems, running bonnie++ and iozone concurrently. The test completes and there is no issue. Then 6 minutes pass and the server times out the connection and shuts down the RC connection to the client.
>>>> From this point on, using the RDMA CM, a new RC QP can be brought up and moved to RTS; however, the first RDMA_SEND to the NFS SERVER system fails with IB_WC_RETRY_EXC_ERR. I have confirmed:
>>>>
>>>> - that arp completed successfully and the neighbor entries are populated on both the client and server
>>>> - that the QPs are in the RTS state on both the client and server
>>>> - that there are RECV WRs posted to the RQ on the server and they did not error out
>>>> - that no RECV WR completed successfully or in error on the server
>>>> - that there are SEND WRs posted to the QP on the client
>>>> - that the client-side SEND_WR fails with error 12 as mentioned above
>>>>
>>>> I have also confirmed the following with a different application (i.e. rping):
>>>>
>>>> server# rping -s
>>>> client# rping -c -a 192.168.80.129
>>>>
>>>> fails with the exact same error, i.e.
>>>>
>>>> client# rping -c -a 192.168.80.129
>>>> cq completion failed status 12
>>>> wait for RDMA_WRITE_ADV state 10
>>>> client DISCONNECT EVENT...
>>>>
>>>> However, if I run rping the other way, it works fine, that is,
>>>>
>>>> client# rping -s
>>>> server# rping -c -a 192.168.80.135
>>>>
>>>> It runs without error until I stop it.
>>>>
>>>> Does anyone have any ideas on how I might debug this?
>>>>
>>>> Tom
>>>
>>> What is the vendor syndrome error when you get a completion with error?
>>
>> Feb 16 15:08:29 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 closed (-103)
>> Feb 16 15:51:27 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 on mlx4_0, memreg 5 slots 32 ird 16
>> Feb 16 15:52:01 vic10 kernel: rpcrdma_event_process:160 wr_id 81002879a000 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81003c9e3200 ex src_qp wc_flags, 0 pkey_index
>> Feb 16 15:52:06 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 closed (-103)
>> Feb 16 15:52:06 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 on mlx4_0, memreg 5 slots 32 ird 16
>> Feb 16 15:52:40 vic10 kernel: rpcrdma_event_process:160 wr_id 81002879a000 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81002f2d8400 ex src_qp wc_flags, 0 pkey_index
>>
>> Repeat forever
>>
>> So the vendor err is 244.
> Please ignore this. This log skips the failing WR (:-\). I need to do another trace.

>>> Does the issue occur only on the ConnectX cards (mlx4) or also on the InfiniHost cards (mthca)?
>>>
>>> Tziporet
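The traces in this thread report completion status as a bare number (12, 5). The sketch below shows one way to pull that number out of such a log line and map it to a name. It assumes the values follow the declaration order of enum ib_wc_status in the kernel's rdma/ib_verbs.h, of which only the first fourteen entries are listed. The helper names wc_status_name() and parse_wc_status() are hypothetical and not part of any OFED tool, and the vendor_err values (129, 244) are vendor specific and deliberately not decoded here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Status names indexed by numeric value (assumed to match enum ib_wc_status). */
static const char *const wc_status_names[] = {
	"IB_WC_SUCCESS",		/* 0 */
	"IB_WC_LOC_LEN_ERR",		/* 1 */
	"IB_WC_LOC_QP_OP_ERR",		/* 2 */
	"IB_WC_LOC_EEC_OP_ERR",		/* 3 */
	"IB_WC_LOC_PROT_ERR",		/* 4 */
	"IB_WC_WR_FLUSH_ERR",		/* 5 */
	"IB_WC_MW_BIND_ERR",		/* 6 */
	"IB_WC_BAD_RESP_ERR",		/* 7 */
	"IB_WC_LOC_ACCESS_ERR",		/* 8 */
	"IB_WC_REM_INV_REQ_ERR",	/* 9 */
	"IB_WC_REM_ACCESS_ERR",		/* 10 */
	"IB_WC_REM_OP_ERR",		/* 11 */
	"IB_WC_RETRY_EXC_ERR",		/* 12 */
	"IB_WC_RNR_RETRY_EXC_ERR",	/* 13 */
};

/* Hypothetical helper: return the name for a status, or "unknown". */
static const char *wc_status_name(int status)
{
	size_t count = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

	if (status < 0 || (size_t)status >= count)
		return "unknown";
	return wc_status_names[status];
}

/*
 * Hypothetical helper: find the first " status <n>" field in a log line
 * and return <n>, or -1 if the line carries no such field.
 */
static int parse_wc_status(const char *line)
{
	static const char key[] = " status ";
	const char *field;
	int value;

	assert(line != NULL);
	field = strstr(line, key);
	if (field == NULL)
		return -1;
	if (sscanf(field + strlen(key), "%d", &value) != 1)
		return -1;
	return value;
}
```

Applied to the Feb 17 trace, the first entry decodes to IB_WC_RETRY_EXC_ERR and the two that follow to IB_WC_WR_FLUSH_ERR. Flush errors are what remaining work requests normally report once a QP has moved to the error state, so the status 12 entry is the one worth chasing.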