[ewg] Generating debuginfo for a build of OFED?

2012-02-08 Thread Mike Heinz
I'm trying to track down a problem by using systemtap - but it needs the debuginfo for the affected modules, and the OFED installer does not create a debuginfo for the kernel modules. Is there a way to turn the creation of debuginfo files on?

[ewg] ibdiagpath broken with TCL 8.5

2011-03-01 Thread Mike Heinz
...@lists.openfabrics.org [mailto:ewg- boun...@lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Monday, February 21, 2011 11:55 AM To: klit...@dev.mellanox.co.il Cc: Linux RDMA; ewg@lists.openfabrics.org Subject: Re: [ewg] Patch breaks OFED 1.5.3: [PATCH] ibdiagpath: Properly index VlArbTable during QoS test

Re: [ewg] Patch breaks OFED 1.5.3: [PATCH] ibdiagpath: Properly index VlArbTable during QoS test

2011-02-21 Thread Mike Heinz
, I'll see if it makes a difference. -Original Message- From: Yevgeny Kliteynik [mailto:klit...@dev.mellanox.co.il] Sent: Sunday, February 20, 2011 9:05 AM To: Mike Heinz; John Jolly Cc: ewg@lists.openfabrics.org; Linux RDMA; Todd Rimmer; Eli Dorfman (Voltaire) Subject: Re: Patch breaks OFED

Re: [ewg] Patch breaks OFED 1.5.3: [PATCH] ibdiagpath: Properly index VlArbTable during QoS test

2011-02-21 Thread Mike Heinz
- From: Mike Heinz Sent: Monday, February 21, 2011 10:40 AM To: 'klit...@dev.mellanox.co.il'; John Jolly Cc: ewg@lists.openfabrics.org; Linux RDMA; Todd Rimmer; Eli Dorfman (Voltaire) Subject: RE: Patch breaks OFED 1.5.3: [ewg] [PATCH] ibdiagpath: Properly index VlArbTable during QoS test Yevgeny

[ewg] [PATCH] umad_send.3 (man page)

2011-02-07 Thread Mike Heinz
The man page for umad_send() does not match the source code. Signed-off-by: Michael Heinz michael.he...@qlogic.com --- diff --git a/libibumad/man/umad_send.3 b/libibumad/man/umad_send.3 index 2d84f57..c4a617a 100644 --- a/libibumad/man/umad_send.3 +++ b/libibumad/man/umad_send.3 @@ -7,11 +7,13 @@

[ewg] Patch breaks OFED 1.5.3: [PATCH] ibdiagpath: Properly index VlArbTable during QoS test

2011-02-07 Thread Mike Heinz
The version of ibdiagpath included with OFED 1.5.3-rc3 contains syntax errors which prevent it from executing on the systems I've tested (using TCL 8.4). Attempts to use ibdiagpath fail with an error message: -I--- -I- QoS on Path Check

Re: [ewg] [PATCH] IB/core: Control number of retries for SA to leave an MCG

2011-02-02 Thread Mike Heinz
Wouldn't the BUSY patch I proposed last year deal with this situation? -Original Message- From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Moni Shoua Sent: Wednesday, February 02, 2011 10:10 AM To: Vlad Cc: n...@voltaire.com; ewg Subject:

Re: [ewg] [PATCH] IB/core: Control number of retries for SA to leave an MCG

2011-02-02 Thread Mike Heinz
.html Basically, the spec permits an SM to reply busy instead of simply tossing packets on the floor, but OFED does not handle this case right now. -Original Message- From: Moni Shoua [mailto:mo...@voltaire.com] Sent: Wednesday, February 02, 2011 10:42 AM To: Mike Heinz Cc: Vlad; n

Re: [ewg] Need help for Infiniband optimisation for our cluster (MTU...)

2010-12-07 Thread Mike Heinz
When you say connected mode, are you referring to ipoib or your MPI configuration? You really don't want to use ipoib for HPC applications. What MPI are you using? For MPI - my personal experience is that OpenMPI is sometimes more reliable but Mvapich-1 offers the best performance. -Original

Re: [ewg] Need help for Infiniband optimisation for our cluster (MTU...)

2010-12-07 Thread Mike Heinz
Heh. I forgot Intel sells an mpi, I thought you were saying you had recompiled one of the OFED mpis with icc. 1) For your small cluster, there's no reason not to use connected mode. The only reason for providing a datagram mode with MPI is to support very large clusters where there simply

Re: [ewg] Need help for Infiniband optimisation for our cluster (MTU...)

2010-12-07 Thread Mike Heinz
. So the MTU value which I'm seeing on the ib0 interface (65520) is not connected to the real infiniband MTU value ? Le 07/12/2010 16:52, Mike Heinz a écrit : Heh. I forgot Intel sells an mpi, I thought you were saying you had recompiled one of the OFED mpis with icc. 1) For your small cluster

Re: [ewg] user SA notifications, redux

2010-10-14 Thread Mike Heinz
...@intel.com] Sent: Wednesday, October 13, 2010 11:59 AM To: Mike Heinz; linux-r...@vger.kernel.org; e...@openfabrics.org Cc: v...@mellanox.co.il; Roland Dreier Subject: RE: user SA notifications, redux As I mentioned earlier, the reason ib_sa acts as a single access point for SA/SM traps and notices

[ewg] user SA notifications, redux

2010-10-13 Thread Mike Heinz
to adding the user-space capability to libibverbs. Now that 1.5.2 is out the door, can we revisit this and try to get this and the matching kernel changes into the next release? === API for Proposal for adding ib_usa to the Linux Infiniband Subsystem Mike Heinz

[ewg] [PATCH] Proposal for MAD Busy handling

2010-10-08 Thread Mike Heinz
Sean, Jason, I backed off on this because the migration to OFED 1.5.2 and other issues were consuming all of my time; I've had this patch for quite a while but I finally had time recently to rework and test it for 1.5.2. The intent of this patch is to try to address the feedback you gave me

Re: [ewg] Binary files in libsdp SRPM in 1.5.2-rc7.

2010-09-21 Thread Mike Heinz
Resending this because I never saw it show up in the list: Looking at the SRPMS, I noticed that libsdp doesn't seem to have been made from clean source. It contains the result of a configure and make operation: Only in libsdp-1.1.103: config.h Only in libsdp-1.1.103: config.log Only in

[ewg] Binary files in libsdp SRPM in 1.5.2-rc7.

2010-09-20 Thread Mike Heinz
Looking at the SRPMS, I noticed that libsdp doesn't seem to have been made from clean source. It contains the result of a configure and make operation: Only in libsdp-1.1.103: config.h Only in libsdp-1.1.103: config.log Only in libsdp-1.1.103: config.status Only in libsdp-1.1.103: libtool Only

[ewg] Problems with mvapich2-1.5.1 - shared library is missing hwloc_* functions.

2010-09-10 Thread Mike Heinz
Hello all, I'm trying to build mvapich2-1.5.1 on an RHEL 5 update 3 system. It builds from the SRPM just fine, but when I try to compile test programs, they don't link. It appears that a set of routines, hwloc_* are missing from the shared library. [r...@homer bandwidth]#

Re: [ewg] [mpich2-dev] Problems with mvapich2-1.5.1 - shared library is missing hwloc_* functions.

2010-09-10 Thread Mike Heinz
BTW - in case it wasn't clear, this is the mvapich2-1.5.1 rpm that comes with OFED 1.5.2-rc6. -Original Message- From: mpich2-dev-boun...@mcs.anl.gov [mailto:mpich2-dev-boun...@mcs.anl.gov] On Behalf Of Mike Heinz Sent: Friday, September 10, 2010 2:43 PM To: mpich2-...@mcs.anl.gov; e

[ewg] Problems building OFED 1.5.2 RC2 on RHEL5, SLES11. libsdp fails to configure.

2010-07-22 Thread Mike Heinz
Hey, all - I'm trying to install the 1.5.2-rc2 tarball with the following command: # ./install.pl --all --prefix /usr/ofed-1.5.2-rc2 but it fails when it gets to libsdp: configure: error: OPENIB: --with-openib must be provided - fail to find standard OpenIB kernel installation error: Bad exit

Re: [ewg] Problems building OFED 1.5.2 RC2 on RHEL5, SLES11. libsdp fails to configure.

2010-07-22 Thread Mike Heinz
'_usr /usr/ofed-1.5.2' /home/mheinz/work/OFED-1.5.2-rc2/SRPMS/libsdp-1.1.101-0.3.gc767eee.src.rpm -Original Message- From: ewg-boun...@openfabrics.org [mailto:ewg-boun...@openfabrics.org] On Behalf Of Mike Heinz Sent: Thursday, July 22, 2010 3:05 PM To: e...@openfabrics.org Subject: [ewg

Re: [ewg] [PATCH] pkey fix for ipoib - resubmission

2010-06-18 Thread Mike Heinz
I never got a response to this patch, so I'm sending it again. - IPoIB is coded to use the 1st PKey in the PKey table as its ib0 interface. Additional ib0.pkey interfaces may be created using the /sys/class/... add_child interface. However, there is a race. During normal

[ewg] [PATCH] ofa_kernel openibd script

2010-06-16 Thread Mike Heinz
This patch builds upon my previously submitted patch for improving the default handling of the node_desc. With this patch, the openibd script will set the description of each HCA in the system to the value @: HCA-## where ## is replaced with a unique id number for that HCA and the @ symbol is

[ewg] [PATCH v2] OFED 1.5.2 ofa_kernel node_description patch

2010-06-15 Thread Mike Heinz
This is the OFED 1.5.2 version of a patch I submitted earlier today to linux-rdma. There are only very small differences between OFED 1.5.2 and matching areas of the IB drivers in Linux 2.6.35, but they were enough to break the patch, making this version necessary. If this patch is accepted

Re: [ewg] [PATCH] ofa_kernel madeye.c

2010-06-14 Thread Mike Heinz
Thanks! From: Vladimir Sokolovsky [v...@dev.mellanox.co.il] Sent: Sunday, June 13, 2010 5:01 AM To: Mike Heinz Cc: e...@openfabrics.org Subject: Re: [ewg] [PATCH] ofa_kernel madeye.c Mike Heinz wrote: This is a simple fix. Several of the snoop filters

[ewg] [PATCH] ofa_kernel madeye.c

2010-06-11 Thread Mike Heinz
and should be included in OFED 1.5.2. -Original Message- From: ewg-boun...@openfabrics.org [mailto:ewg-boun...@openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, June 01, 2010 9:58 AM To: e...@openfabrics.org Subject: [ewg] [PATCH] ofa_kernel madeye.c I'm resending this, because it seems

Re: [ewg] [PATCH] node description patch

2010-06-11 Thread Mike Heinz
add code to force a trailing zero. -Original Message- From: Jack Morgenstein [mailto:ja...@dev.mellanox.co.il] Sent: Thursday, June 03, 2010 6:14 AM To: e...@openfabrics.org Cc: Mike Heinz; e...@openfabrics.org Subject: Re: [ewg] [PATCH] node description patch On Tuesday 01 June 2010 17

Re: [ewg] [PATCH] Handling busy responses from the SA

2010-06-08 Thread Mike Heinz
It's workable, although I really wish there was a way to handle stupid apps that aren't written to handle a busy response. -Original Message- From: Hefty, Sean [mailto:sean.he...@intel.com] Sent: Tuesday, June 08, 2010 12:44 PM To: Jason Gunthorpe Cc: Mike Heinz; linux-r

Re: [ewg] [PATCH] Handling busy responses from the SA

2010-06-07 Thread Mike Heinz
Roland Dreier said: I don't have a strong opinion on this but it seems a bit odd. If we're just going to drop the response anyway, why did the SA send it in the first place? On the other hand, if the SA told us it's busy, it does seem we could do something more sensible than retrying

Re: [ewg] [PATCH] Handling busy responses from the SA

2010-06-07 Thread Mike Heinz
Sean said: I don't object to the concept of treating a busy response as a timeout, but how does this help prevent overwhelming the SA? It continues to retry the queries, even if the SA says that it's too busy to respond without adjusting the timeout specified by the user. I would think

Re: [ewg] [PATCH] Handling busy responses from the SA

2010-06-07 Thread Mike Heinz
, Sean Cc: Mike Heinz; linux-r...@vger.kernel.org; e...@openfabrics.org Subject: Re: [PATCH] Handling busy responses from the SA On Fri, Jun 04, 2010 at 02:05:10PM -0700, Hefty, Sean wrote: Maybe we should re-think that guideline and allow users to simply indicate that the MAD layer should use

[ewg] [PATCH] Handling busy responses from the SA

2010-06-04 Thread Mike Heinz
The purpose of this patch is to cause the ib_mad driver to discard busy responses from the SA, effectively turning busy responses into timeouts. This ensures that naïve IB applications cannot overwhelm the SA with queries, which could happen when a cluster is being rebooted, or when a
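The idea of turning a busy reply into a timeout can be sketched roughly as follows. This is a simplified Python model, not the ib_mad patch itself: the `MAD_STATUS_BUSY` value, `filter_response` and `query_with_retries` are names assumed for illustration, and a real stack would wait out the caller's timeout between attempts.

```python
from typing import Callable, Optional

MAD_STATUS_BUSY = 0x0001  # assumed value of the busy bit in the MAD status field


def filter_response(status: int) -> Optional[int]:
    """Return the status for a normal reply, or None to drop a busy one."""
    if status & MAD_STATUS_BUSY:
        return None  # discarded: the sender's timer expires as if nothing arrived
    return status


def query_with_retries(send: Callable[[], Optional[int]], retries: int) -> str:
    """Send up to retries + 1 times; a dropped (busy) reply counts as a timeout."""
    for _ in range(retries + 1):
        raw = send()  # None models a reply that never arrived
        reply = None if raw is None else filter_response(raw)
        if reply is not None:
            return "ok"
        # A real sender would sleep for its timeout here before resending.
    return "timeout"
```

Because the busy reply is dropped rather than delivered, an application that only understands "reply" or "timeout" paces itself by its own timeout instead of resending immediately.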

[ewg] [PATCH] ofa_kernel madeye.c

2010-06-01 Thread Mike Heinz
Message- From: ewg-boun...@openfabrics.org [mailto:ewg-boun...@openfabrics.org] On Behalf Of Mike Heinz Sent: Wednesday, May 26, 2010 4:01 PM To: e...@openfabrics.org Subject: [ewg] [PATCH] ofa_kernel madeye.c This is a simple fix. Several of the snoop filters in ./drivers/infiniband/util

[ewg] [PATCH] node description patch

2010-06-01 Thread Mike Heinz
This patch fixes a problem with the openibd initialization script. On machines using slower DHCP servers, openibd frequently sets the HCA's node description to HCA-1. This patch modifies openibd to add a @ instead of the hostname and adds a small hook in the core drivers to replace the @ sign
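A rough sketch of the placeholder idea, assuming the hook substitutes the current hostname for the @ sign when the description is read (the snippet is cut off before it says so). `expand_node_desc` and the "@ HCA-1" template are hypothetical names for illustration; the 64-byte limit is the size of the NodeDescription field, and ASCII text is assumed.

```python
NODE_DESC_MAX = 64  # NodeDescription is a 64-byte field


def expand_node_desc(template: str, hostname: str) -> str:
    """Replace the first '@' with the hostname, leaving room for a trailing NUL."""
    expanded = template.replace("@", hostname, 1)
    # Truncate to 63 characters so a terminating zero always fits (ASCII assumed).
    return expanded[: NODE_DESC_MAX - 1]
```

Deferring the substitution this way means the script no longer depends on the hostname being set by the time openibd runs.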

Re: [ewg] Question: When should patches be submitted to EWG and when should they be submitted to linux-rdma?

2010-05-27 Thread Mike Heinz
To: Mike Heinz; openfabrics-...@openib.org Subject: RE: [ewg] Question: When should patches be submitted to EWG and when should they be submitted to linux-rdma? In general, we would like kernel code to be reviewed and accepted (or at least queued for acceptance) upstream first and then submitted

[ewg] [PATCH] ofa_kernel madeye.c

2010-05-26 Thread Mike Heinz
This is a simple fix. Several of the snoop filters in ./drivers/infiniband/util/madeye.c don't switch the attribute id to host byte order before checking it. Signed-off-by: Michael Heinz michael.he...@qlogic.com diff --git a/drivers/infiniband/util/madeye.c b/drivers/infiniband/util/madeye.c
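The class of bug described here can be illustrated with a small Python model rather than the kernel code. It assumes a little-endian host, the 24-byte common MAD header with AttributeID at byte offset 16, and NodeInfo (0x0011) as an example attribute; the helper names are made up for this sketch.

```python
import struct

ATTR_ID_OFFSET = 16      # byte offset of AttributeID in the common MAD header
ATTR_NODE_INFO = 0x0011  # example attribute ID (NodeInfo)


def make_mad(attr_id: int) -> bytes:
    """Build an otherwise empty 24-byte MAD header with AttributeID in network order."""
    header = bytearray(24)
    struct.pack_into(">H", header, ATTR_ID_OFFSET, attr_id)
    return bytes(header)


def attr_id_raw(mad: bytes) -> int:
    """Read the field as a little-endian host sees it without conversion (the bug)."""
    return struct.unpack_from("<H", mad, ATTR_ID_OFFSET)[0]


def attr_id_host(mad: bytes) -> int:
    """Read the field as big-endian, i.e. after converting to host byte order."""
    return struct.unpack_from(">H", mad, ATTR_ID_OFFSET)[0]
```

Comparing `attr_id_raw(mad)` against `ATTR_NODE_INFO` never matches on such a host, which is why a filter that skips the conversion silently misses the packets it was meant to catch.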

[ewg] Question: When should patches be submitted to EWG and when should they be submitted to linux-rdma?

2010-05-26 Thread Mike Heinz
The subject says it all. If I have a patch that can be applied against either the current OFED git repository or against the upstream kernel - where do I post it?

Re: [ewg] Question: When should patches be submitted to EWG and when should they be submitted to linux-rdma?

2010-05-26 Thread Mike Heinz
, or do I need to submit the patch to both groups? -Original Message- From: Roland Dreier [mailto:rdre...@cisco.com] Sent: Wednesday, May 26, 2010 4:50 PM To: Mike Heinz Cc: openfabrics-...@openib.org Subject: Re: [ewg] Question: When should patches be submitted to EWG and when should

Re: [ewg] ibcheckerrors Port All FAILED reported

2010-05-06 Thread Mike Heinz
Ira, I'm pretty sure I already fixed this problem. I submitted a patch to Sasha back in April. -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Ira Weiny Sent: Wednesday, May 05, 2010 9:10 PM To: Woodruff, Robert J;

Re: [ewg] ibcheckerrors Port All FAILED reported

2010-05-06 Thread Mike Heinz
Yup - I've also sent a note to Sasha asking what happened to the patch. -Original Message- From: Ira Weiny [mailto:wei...@llnl.gov] Sent: Thursday, May 06, 2010 11:35 AM To: Mike Heinz; Sasha Khapyorsky Cc: Woodruff, Robert J; linux-r...@vger.kernel.org; EWG; tzipo...@mellanox.co.il Subject: Re

[ewg] [PATCH] management: adding mad_dump_fields to libibmad

2010-05-06 Thread Mike Heinz
Sasha asked that I re-submit the patches for perfquery in a slightly different format. This is the first of 3 patches. This patch adds a function to libibmad that allows the caller to dump a configurable range of MAD attributes. Basically, this provides an external interface to the internal
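What "dump a configurable range" means can be shown with a toy Python model. This is not the libibmad function or its signature: `dump_fields`, the field table and the counter labels below are all hypothetical stand-ins for already-decoded MAD fields.

```python
from typing import List, Tuple

Field = Tuple[str, int]  # (label, value) pair standing in for a decoded MAD field


def dump_fields(fields: List[Field], start: int, end: int) -> str:
    """Format fields[start:end], one 'label: value' per line."""
    if not 0 <= start <= end <= len(fields):
        raise ValueError("field range out of bounds")
    return "\n".join(f"{name}: {value}" for name, value in fields[start:end])
```

A caller such as perfquery could then print only the counters a given port actually supports by choosing the start and end of the range, instead of always dumping the full table.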

Re: [ewg] [PATCH] management: adding mad_dump_fields to libibmad

2010-05-06 Thread Mike Heinz
Khapyorsky [mailto:sashakv...@gmail.com] On Behalf Of Sasha Khapyorsky Sent: Thursday, May 06, 2010 5:03 PM To: Mike Heinz Cc: linux-r...@vger.kernel.org; e...@openfabrics.org Subject: Re: [PATCH] management: adding mad_dump_fields to libibmad On 13:27 Thu 06 May , Mike Heinz wrote: Sasha asked

Re: [ewg] ibcheckerrors Port All FAILED reported

2010-05-05 Thread Mike Heinz
Hi - the problem is that not all switches support the same features, and ibcheckerrors is treating this as an error. I believe this will be fixed in OFED 1.5.2. -Original Message- From: ewg-boun...@openfabrics.org [mailto:ewg-boun...@openfabrics.org] On Behalf Of Woodruff, Robert J

Re: [ewg] Hang in ib_mad when unregistering.

2010-05-03 Thread Mike Heinz
the problem. From: Tziporet Koren [mailto:tzipo...@dev.mellanox.co.il] Sent: Sunday, May 02, 2010 4:05 PM To: Mike Heinz Cc: e...@openfabrics.org Subject: Re: [ewg] Hang in ib_mad when unergistering. On 4/30/2010 4:04 PM, Mike Heinz wrote: Using OFED 1.5.0 and 1.5.1 we've been seeing nodes

[ewg] Hang in ib_mad when unregistering.

2010-04-30 Thread Mike Heinz
Using OFED 1.5.0 and 1.5.1 we've been seeing nodes occasionally hang when a process tries to disconnect from the umad interface. Can anyone suggest what might be causing this? Here's a typical example: Apr 29 10:01:37 st2139 kernel: qlgc_dsc D 80148c54 0 5478 1 5497

[ewg] [PATCH] Patch for libibmad

2010-04-19 Thread Mike Heinz
We had a customer report that perfquery was crashing on their nodes when trying to query ports on a switch. When I examined the core dump, it was clear that libibmad was dereferencing a null pointer from one of the mad_set_ functions: #0 0x in ?? () #1 0x2ae4e13e7536 in

[ewg] [PATCH] Adding new mad_dump_fields function to libibmad so that perfquery can be a bit more selective.

2010-04-19 Thread Mike Heinz
These patches are a modification to a patch I submitted earlier, based on Sasha's feedback. Rather than duplicating functionality between perfquery.c and libibmad/dump.c, this patch exposes the internal function _dump_fields() as a new API call, mad_dump_fields(). This permits perfquery to change