[ewg] [GIT PULL] RDMA/nes: update for 1.5.2 RC2
Vlad,

Please pull from:

ssh://v...@sofa.openfabrics.org/home/ctung/scm/ofed-1.5.git ofed_kernel_1_5

for:

Chien Tung (1):
      RDMA/nes: get and print eeprom version number

Mirek Walukiewicz (1):
      RDMA/nes: Added missing mutex during memory registration

 kernel_patches/fixes/nes_0035_eeprom_version.patch | 34
 kernel_patches/fixes/nes_0036_ima_mutex_fix.patch  | 22 +
 2 files changed, 56 insertions(+), 0 deletions(-)
 create mode 100644 kernel_patches/fixes/nes_0035_eeprom_version.patch
 create mode 100644 kernel_patches/fixes/nes_0036_ima_mutex_fix.patch

Also, please pull in Aleksey's RAW ETH support series and "New RAW ETH QP type v2 [ PATCH 1/1 ]", specific for nes.

Thanks,
Chien
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] EWG/OFED meeting June 7, 2010 meeting minutes
Tziporet -

The fix so PSM doesn't try to build on unsupported systems was submitted last week, so that should be covered. I was a bit surprised to see it raised as an issue here, but you might not have had a chance to catch up.

Also, we have done some testing on OFED 1.5.2, including with PSM, and that is going well.

- Betsy

On Tue, 2010-06-08 at 01:44 -0700, Tziporet Koren wrote:
> Meeting summary:
>
> 1. OFED 1.5.2 progress as planned
> 2. We plan to have RC2 on Thursday this week (Jun 10, 2010)
>
> Meeting details:
>
> 1. OFED 1.5.2 - features status
>   - Add new OSes:
>     - RHEL 5.5 - done
>     - SLES11 SP1 - Jeff Backer volunteered to do it
>     - Add RHEL6 beta - done
>   - Update the management package - new package was provided (not final)
>   - Update with new libibverbs 1.1.4 from Roland - in progress
>   - Add-on packages that do not touch the core:
>     - QLogic wishes to add the PSM library - need to fix the PSM library not to
>       build on systems that are not supported (ia64 and PPC) - Betsy should be
>       responsible for this
>     - New libehca tarball - done
>     - iWARP Multicast Acceleration (IBV_QPT_RAW_ETH) - done
>     - Add IBV_QPT_RAW_ETH for mlx4 - Voltaire - in discussion between Voltaire &
>       Mellanox. Moni to coordinate change for the nes driver on RAW Eth QP
>     - ACM - Sean - done
>     - uDAPL package with bug fixes - better support for RoCE - done
>     - SDP Zcopy in GA - in progress - toward completion
>   - Critical bug fixes - ongoing
>
> 2. OFED 1.5.2 testing status - all
>   Voltaire will start more testing after RC2
>   Intel - Woody - continue testing - all good so far
>   Nes - have one issue that they should fix
>   IBM - not much testing. Will start on RC2
>   QLogic - no info
>   HP - not testing
>   Mellanox - regression is running, focused on SDP
>
> Open: Is anyone interested in adding kernel.org support beyond the 2.6.32 we
> already support?
>
> 3. Schedule:
>   Beta - May 3 - done - used in the interop. Report on OFED issues will be provided.
>   RC1 - May 31 - done
>   RC2 - Jun 10
>   RC3 - Jun 22
>   GA - Jun 29
>
> Tziporet
Re: [ewg] [PATCH v2] libibverbs: ibv_fork_init() and libhugetlbfs
Thanks, nice work. I like this approach. Alex (Vainman) any comments on this? - R. -- Roland Dreier || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html
Re: [ewg] RAW ETH support for Mellanox v1 [PATCH 1/1]
> Did you check that counters are still working for RoCEE after this
> patch?

I'll check the counters issue, but today's patches were created against OFED-1.5.2-20100607-0636 and should be installed under the kernel_patches/fixes directory.
Re: [ewg] RAW ETH support for Mellanox v1 [PATCH 1/1]
On Wed, Jun 09, 2010 at 02:29:57PM +0300, Aleksey Senin wrote:
>      err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw,
>                      !!(mqp->flags &
> -                       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK));
> +                       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
> +                    (ibqp->qp_type == IB_QPT_RAW_ETH) ?
> +                    MLX4_PROT_EN : MLX4_PROT_IB);
>      if (err)
>          return err;

Usage of MLX4_PROT_EN and MLX4_PROT_IB is wrong in this context, since they are used for a totally different purpose. You need to define a new enum and explicitly set its values to reflect the hardware definitions.

> @@ -1237,12 +1240,16 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
>      if (cur_state == IB_QPS_INIT &&
>          new_state == IB_QPS_RTR &&
>          (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
> -         ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY)) {
> +         ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY ||
> +         ibqp->qp_type == IB_QPT_RAW_ETH)) {
>          context->pri_path.sched_queue = (qp->port - 1) << 6;
>          if (is_qp0(dev, qp))
>              context->pri_path.sched_queue |=
>                  MLX4_IB_DEFAULT_QP0_SCHED_QUEUE;
>          else
>              context->pri_path.sched_queue |=
>                  MLX4_IB_DEFAULT_SCHED_QUEUE;
> +
> +        /* Default counter for non-RC QPs */
> +        context->pri_path.counter_index = 0xff;

Looks like this breaks hardware counters. Why are you using this statement?

Also, it appears that the patches were not created against the latest OFED 1.5.2 sources. Did you check that counters are still working for RoCEE after this patch?
[ewg] RAW ETH support for Mellanox v1 [PATCH 1/1]
Add RAW ETH QP support for Mellanox adapters.

Signed-off-by: Aleksey Senin
---
 drivers/infiniband/hw/mlx4/main.c |   13 +
 drivers/infiniband/hw/mlx4/qp.c   |   25 +
 drivers/net/mlx4/mcg.c            |   22 +-
 include/linux/mlx4/device.h       |    7 +--
 4 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index c146b84..6841dc7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -684,7 +684,9 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
     struct mlx4_ib_qp *mqp = to_mqp(ibqp);
 
     err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw,
                     !!(mqp->flags &
-                       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK));
+                       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
+                    (ibqp->qp_type == IB_QPT_RAW_ETH) ?
+                    MLX4_PROT_EN : MLX4_PROT_IB);
     if (err)
         return err;
@@ -695,7 +697,9 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
     return 0;
 
 err_add:
-    mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw);
+    mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
+                  (ibqp->qp_type == IB_QPT_RAW_ETH) ?
+                  MLX4_PROT_EN : MLX4_PROT_IB);
     return err;
 }
 
@@ -724,8 +728,9 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
     struct net_device *ndev;
     struct gid_entry *ge;
 
-    err = mlx4_multicast_detach(mdev->dev,
-                    &mqp->mqp, gid->raw);
+    err = mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
+                    (ibqp->qp_type == IB_QPT_RAW_ETH) ?
+                    MLX4_PROT_EN : MLX4_PROT_IB);
     if (err)
         return err;
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 422d367..b6b484d 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -811,6 +811,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
     case IB_QPT_RC:
     case IB_QPT_UC:
     case IB_QPT_UD:
+    case IB_QPT_RAW_ETH:
     {
         qp = kzalloc(sizeof *qp, GFP_KERNEL);
         if (!qp)
@@ -902,6 +903,7 @@ static int to_mlx4_st(enum ib_qp_type type)
     case IB_QPT_RAW_ETY:
     case IB_QPT_SMI:
     case IB_QPT_GSI:    return MLX4_QP_ST_MLX;
+    case IB_QPT_RAW_ETH:    return MLX4_QP_ST_MLX;
     default:        return -1;
     }
 }
@@ -1064,8 +1066,9 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
             break;
         }
     }
-
-    if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
+    if (ibqp->qp_type == IB_QPT_RAW_ETH)
+        context->mtu_msgmax = 0xff;
+    else if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
         ibqp->qp_type == IB_QPT_RAW_ETY)
         context->mtu_msgmax = (IB_MTU_4096 << 5) | 11;
     else if (ibqp->qp_type == IB_QPT_UD) {
@@ -1237,12 +1240,16 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
     if (cur_state == IB_QPS_INIT &&
         new_state == IB_QPS_RTR &&
         (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
-         ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY)) {
+         ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY ||
+         ibqp->qp_type == IB_QPT_RAW_ETH)) {
         context->pri_path.sched_queue = (qp->port - 1) << 6;
         if (is_qp0(dev, qp))
             context->pri_path.sched_queue |= MLX4_IB_DEFAULT_QP0_SCHED_QUEUE;
         else
             context->pri_path.sched_queue |= MLX4_IB_DEFAULT_SCHED_QUEUE;
+
+        /* Default counter for non-RC QPs */
+        context->pri_path.counter_index = 0xff;
     }
 
     if (cur_state == IB_QPS_RTS && new_state == IB_QPS_SQD &&
@@ -1356,7 +1363,7 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
         goto out;
     }
-    if ((attr_mask & IB_QP_PORT) &&
+    if ((attr_mask & IB_QP_PORT) && (ibqp->qp_type != IB_QPT_RAW_ETH) &&
         (attr->port_num == 0 || attr->port_num > dev->num_ports)) {
         mlx4_ib_dbg("qpn 0x%x: invalid port number (%d) specified "
                 "for transition %d to %d. qp_type %d",
@@ -1365,6 +1372,16 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
         goto out;
     }
+    if ((attr_mask & IB_QP_PORT) && (ibqp->qp_type == IB_QPT_RAW_ETH) &&
+        (rdma_port_link_layer(&de
[ewg] New RAW ETH QP type v2 [ PATCH 1/1 ]
The previous v1 missed the implementation in the verbs.c file.

Add RAW ETH functionality to the verbs layer. This QP type is used to create raw Ethernet packets over the iWARP and RDMAoE protocols.

Signed-off-by: Aleksey Senin
---
 drivers/infiniband/core/verbs.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 881850e..bb4dcd5 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -382,6 +382,7 @@ static const struct {
     [IB_QPT_UD]  = (IB_QP_PKEY_INDEX |
             IB_QP_PORT       |
             IB_QP_QKEY),
+    [IB_QPT_RAW_ETH] = IB_QP_PORT,
     [IB_QPT_UC]  = (IB_QP_PKEY_INDEX |
             IB_QP_PORT       |
             IB_QP_ACCESS_FLAGS),
@@ -1004,7 +1005,11 @@ int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
 
     switch (rdma_node_get_transport(qp->device->node_type)) {
     case RDMA_TRANSPORT_IB:
-        if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
+        if (qp->qp_type == IB_QPT_RAW_ETH) {
+            /* In raw Etherent mgids the 63 msb's should be 0 */
+            if (gid->global.subnet_prefix & cpu_to_be64(~1ULL))
+                return -EINVAL;
+        } else if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
             return -EINVAL;
         break;
     case RDMA_TRANSPORT_IWARP:
@@ -1023,7 +1028,11 @@ int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
 
     switch (rdma_node_get_transport(qp->device->node_type)) {
     case RDMA_TRANSPORT_IB:
-        if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
+        if (qp->qp_type == IB_QPT_RAW_ETH) {
+            /* In raw Etherent mgids the 63 msb's should be 0 */
+            if (gid->global.subnet_prefix & cpu_to_be64(~1ULL))
+                return -EINVAL;
+        } else if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
             return -EINVAL;
         break;
     case RDMA_TRANSPORT_IWARP:
-- 
1.6.5.2
[ewg] ofa_1_5_kernel 20100609-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-194.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
Re: [ewg] [PATCH v2] libibverbs: ibv_fork_init() and libhugetlbfs
On Wed, 02 Jun 2010 14:49:37 -0700 Roland Dreier wrote:

> So if I read this correctly this patch introduces almost a 50% overhead
> in the 1M case... and probably much worse (as a fraction) in say the 64K
> or 4K case. I wonder if that's acceptable?

We don't think this is acceptable, so we like the third approach you suggested very much. I've written the code and attached it below. This third version does not introduce additional overhead when not using huge pages (verified with 4 KB, 64 KB, 1 MB and 16 MB memory regions).

Problem description:

When fork support is enabled in libibverbs, madvise() is called for every memory page that is registered as a memory region. Memory ranges passed to madvise() must be page aligned, and their size must be a multiple of the page size. libibverbs uses sysconf(_SC_PAGESIZE) to find out the system page size and rounds all ranges passed to reg_mr() according to this page size. When memory from libhugetlbfs is passed to reg_mr(), this does not work, because the page size for this memory range might be different (e.g. 16 MB). So libibverbs would have to use the huge page size to calculate a page-aligned range for madvise().

As huge pages are provided to the application "under the hood" when preloading libhugetlbfs, the application has no way of knowing whether it is registering a huge page or a normal page.

To work around this issue, detect the use of huge pages in libibverbs and align memory ranges passed to madvise() according to the huge page size.
Changes since v1:
- detect use of huge pages at ibv_fork_init() time by walking through /sys/kernel/mm/hugepages/
- read the huge page size from /proc/pid/smaps, which contains the page size of the mapping (thereby enabling support for multiple huge page sizes)
- code is now independent of libhugetlbfs, so huge pages can be provided to the application by any library

Changes since v2:
- reading from /proc/ to determine the huge page size is now only done when a call to madvise() using the system page size fails, so no overhead is introduced when registering non-huge-page memory.

Signed-off-by: Alexander Schmidt
---
 src/memory.c |   96 +++
 1 file changed, 90 insertions(+), 6 deletions(-)

--- libibverbs.git.orig/src/memory.c
+++ libibverbs.git/src/memory.c
@@ -40,6 +40,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "ibverbs.h"
 
@@ -70,10 +72,64 @@ static pthread_mutex_t mm_mutex = PTHREA
 static int page_size;
 static int too_late;
 
+static unsigned long smaps_page_size(FILE *file)
+{
+    int n;
+    unsigned long size = 0;
+    char buf[1024];
+
+    while (fgets(buf, sizeof(buf), file) != NULL) {
+        if (!strstr(buf, "KernelPageSize:"))
+            continue;
+
+        n = sscanf(buf, "%*s %lu", &size);
+        if (n < 1)
+            continue;
+
+        /* page size is printed in Kb */
+        size = size * 1024;
+
+        break;
+    }
+
+    return size;
+}
+
+static unsigned long get_page_size(void *base)
+{
+    unsigned long ret = 0;
+    FILE *file;
+    char buf[1024];
+
+    file = fopen("/proc/self/smaps", "r");
+    if (!file)
+        goto out;
+
+    while (fgets(buf, sizeof(buf), file) != NULL) {
+        int n;
+        uintptr_t range_start, range_end;
+
+        n = sscanf(buf, "%lx-%lx", &range_start, &range_end);
+
+        if (n < 2)
+            continue;
+
+        if ((uintptr_t) base >= range_start && (uintptr_t) base < range_end) {
+            ret = smaps_page_size(file);
+            break;
+        }
+    }
+    fclose(file);
+
+out:
+    return ret;
+}
+
 int ibv_fork_init(void)
 {
-    void *tmp;
+    void *tmp, *tmp_aligned;
     int ret;
+    unsigned long size;
 
     if (mm_root)
         return 0;
@@ -88,8 +144,17 @@ int ibv_fork_init(void)
     if (posix_memalign(&tmp, page_size, page_size))
         return ENOMEM;
 
-    ret = madvise(tmp, page_size, MADV_DONTFORK) ||
-          madvise(tmp, page_size, MADV_DOFORK);
+    size = get_page_size(tmp);
+
+    if (size)
+        tmp_aligned = (void *)((uintptr_t)tmp & ~(size - 1));
+    else {
+        size = page_size;
+        tmp_aligned = tmp;
+    }
+
+    ret = madvise(tmp_aligned, size, MADV_DONTFORK) ||
+          madvise(tmp_aligned, size, MADV_DOFORK);
 
     free(tmp);
 
@@ -522,7 +587,8 @@ static struct ibv_mem_node *undo_node(st
     return node;
 }
 
-static int ibv_madvise_range(void *base, size_t size, int advice)
+static int ibv_madvise_range(void *base, size_t size, int advice,
+                 unsigned long page_size)
 {
     uintptr_t start, end;
     struct ibv_mem_node *node, *tmp;
@@ -612,10 +678,28 @@ out:
     return ret
Re: [ewg] RAW_ETH support [PATCH 0/2]
Aleksey Senin wrote:
> Those patches add new RAW_ETH QP type to the kernel in order to support
> creation of RAW Ethernet packets for iWARP and RDMAOE protocols.
> The reason for new type is that RAW_ETY QP already used by Mellanox
> drivers for another purpose. Another reason, that there is RAW_ETH QP
> type already present in userspace, but it mapped to RAW_ETY type in the
> kernel and cause to confusion when dealing with code.

Vlad,

With Mirek's approval, I think that nothing prevents accepting these changes.

thanks