Re: [ofa-general] Bug with SDP on IA64

2008-10-27 Thread Amir Vadai
I asked our IB expert Jack for hints and he told me this: From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1, revision 1.2.1 * Local Length Error - ... Generated for a Work Request posted to the local Receive Queue when the sum of the Data Segment lengths is too small to

[ofa-general][perftest]service level implementation

2008-10-27 Thread Celine Bourde
Hi, We are testing QoS. We have defined service level rules in opensm and implemented qos-policy. Implementation wasn't fully done in perftest tools. So, I've extended the perftest tools by adding a -L option to set service levels. I took the OFED 1.3 git version. The latest commit is: commit

Re: [ofa-general][perftest]service level implementation

2008-10-27 Thread Tziporet Koren
Oren is the perftest maintainer. Tziporet Celine Bourde wrote: Hi, We are testing QoS. We have defined service level rules in opensm and implemented qos-policy. Implementation wasn't fully done in perftest tools. So, I've extended the perftest tools by adding a -L option to set service levels. I

Re: [ofa-general][perftest]service level implementation

2008-10-27 Thread Or Gerlitz
Celine Bourde wrote: We are testing QoS. We have defined service level rules in opensm and implemented qos-policy. Implementation wasn't fully done in perftest tools. So, I've extended the perftest tools by adding a -L option to set service levels. I found qperf to be very useful for QoS

Re: [ofa-general] Bug with SDP on IA64

2008-10-27 Thread Nicolas Morey Chaisemartin
Amir Vadai wrote: I asked our IB expert Jack for hints and he told me this: From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1, revision 1.2.1 * Local Length Error - ... Generated for a Work Request posted to the local Receive Queue when the sum of the Data Segment

Re: [ofa-general] opensm as service - cfg files

2008-10-27 Thread Philippe Gregoire
Al Chu wrote: On Thu, 2008-10-23 at 14:53 +0200, Philippe Gregoire wrote: Hi Yevgeny, Is it possible to write this service so it will be able to manage multiple instances of opensm on the same node, I mean start and stop all instances at the same time or separately. This will be very

[ofa-general] ofa_1_4_kernel 20081027-0200 daily build status

2008-10-27 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with

Re: [ofa-general] Bug with SDP on IA64

2008-10-27 Thread Nicolas Morey Chaisemartin
Amir Vadai wrote: I asked our IB expert Jack for hints and he told me this: From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1, revision 1.2.1 * Local Length Error - ... Generated for a Work Request posted to the local Receive Queue when the sum of the Data Segment

Re: [ofa-general] Bug with SDP on IA64

2008-10-27 Thread Dotan Barak
On Mon, Oct 27, 2008 at 11:09 AM, Nicolas Morey Chaisemartin [EMAIL PROTECTED] wrote: Amir Vadai wrote: I asked our IB expert Jack for hints and he told me this: From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1, revision 1.2.1 * Local Length Error - ... Generated

Re: [ofa-general] Bug with SDP on IA64

2008-10-27 Thread Nicolas Morey Chaisemartin
Dotan Barak wrote: On Mon, Oct 27, 2008 at 11:09 AM, Nicolas Morey Chaisemartin [EMAIL PROTECTED] wrote: Amir Vadai wrote: I asked our IB expert Jack for hints and he told me this: From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1, revision 1.2.1 * Local

Re: [ofa-general] opensm as service - cfg files

2008-10-27 Thread Yevgeny Kliteynik
Philippe Gregoire wrote: Al Chu wrote: On Thu, 2008-10-23 at 14:53 +0200, Philippe Gregoire wrote: Hi Yevgeny, Is it possible to write this service so it will be able to manage multiple instances of opensm on the same node, I mean start and stop all instances at the same time or

Re: [ofa-general] Bug with SDP on IA64

2008-10-27 Thread Amir Vadai
I opened a bug in bugzilla with your research: https://bugs.openfabrics.org/show_bug.cgi?id=1311 Nicolas Morey Chaisemartin wrote: Amir Vadai wrote: I asked our IB expert Jack for hints and he told me this: From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1, revision

Re: [ofa-general] opensm as service - cfg files

2008-10-27 Thread Philippe Gregoire
Yevgeny Kliteynik wrote: Philippe Gregoire wrote: Al Chu wrote: On Thu, 2008-10-23 at 14:53 +0200, Philippe Gregoire wrote: Hi Yevgeny, Is it possible to write this service so it will be able to manage multiple instances of opensm on the same node, I mean start and stop all

[ofa-general] [PATCH] sdp: fixed sparse warning

2008-10-27 Thread Amir Vadai
--- drivers/infiniband/ulp/sdp/sdp_bcopy.c |4 ++-- drivers/infiniband/ulp/sdp/sdp_cma.c |8 drivers/infiniband/ulp/sdp/sdp_main.c | 11 +-- 3 files changed, 11 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/ulp/sdp/sdp_bcopy.c

[ofa-general] OFED meeting agenda for today (Oct 27)

2008-10-27 Thread Tziporet Koren
This is the agenda for the OFED meeting today: 1. Review bugs status and decide on their priority 1262cri [EMAIL PROTECTED] congestion hang with RDS 1298cri [EMAIL PROTECTED] nfsrdma rh5.1 causes kernel panic 1299cri [EMAIL PROTECTED] nfs module

[ofa-general] Re: [PATCH] IB/sysfs: Add port_xmit_wait counter.

2008-10-27 Thread Roland Dreier
Looks OK... probably not worth checking ClassPortInfo:CapabilityMask.PortCountersXmitWaitSupported to make sure this field is defined, although it is unfortunate that the IB spec says that PortXmitWait is undefined rather than 0 when it isn't supported. Anyway, one question: static

Re: [ofa-general] [PATCH] ib_core: Use weak ordering for data registered memory

2008-10-27 Thread Roland Dreier
Some architectures support weak ordering in which case better performance is possible. IB registered memory used for data can be weakly ordered because the completion queues' buffers are registered as strongly ordered. This will result in flushing all data related outstanding DMA

Re: [ofa-general] [PATCH 3/3] IB/ipath - improve UD loopback performance by allocating temp array once

2008-10-27 Thread Roland Dreier
The first two look OK for 2.6.28 I guess, although they don't seem to be regression fixes and appeared at the tail end of the merge window. I'll probably sneak them into -rc3. But this patch is just an optimization, right? So I'll wait for 2.6.29 for this one.

Re: [ofa-general] [PATCH] ib_core: Use weak ordering for data registered memory

2008-10-27 Thread Talpey, Thomas
At 12:19 PM 10/27/2008, Roland Dreier wrote: Some architectures support weak ordering in which case better performance is possible. IB registered memory used for data can be weakly ordered because the completion queues' buffers are registered as strongly ordered. This will result in

Re: [ofa-general] opensm as service - cfg files

2008-10-27 Thread Ira Weiny
On Mon, 27 Oct 2008 10:40:17 +0100 Philippe Gregoire [EMAIL PROTECTED] wrote: Al Chu wrote: On Thu, 2008-10-23 at 14:53 +0200, Philippe Gregoire wrote: Hi Yevgeny, Is it possible to write this service so it will be able to manage multiple instances of opensm on the same node,

Re: [ofa-general] Re: [PATCH] IB/sysfs: Add port_xmit_wait counter.

2008-10-27 Thread Vladimir Sokolovsky
Roland Dreier wrote: Looks OK... probably not worth checking ClassPortInfo:CapabilityMask.PortCountersXmitWaitSupported to make sure this field is defined, although it is unfortunate that the IB spec says that PortXmitWait is undefined rather than 0 when it isn't supported. Anyway, one

Re: [ofa-general] [PATCH] ib_core: Use weak ordering for data registered memory

2008-10-27 Thread Eli Cohen
On Mon, Oct 27, 2008 at 12:32:06PM -0400, Talpey, Thomas wrote: Eli - is there some reason you chose mode 0444 to protect against writing the setting after module loading? It looks like the value is inspected dynamically. I think the value of the parameter should be determined at driver

Re: [ofa-general] opensm as service - cfg files

2008-10-27 Thread Al Chu
Hey Philippe, On Mon, 2008-10-27 at 10:40 +0100, Philippe Gregoire wrote: Al Chu wrote: On Thu, 2008-10-23 at 14:53 +0200, Philippe Gregoire wrote: Hi Yevgeny, Is it possible to write this service so it will be able to manage multiple instances of opensm on the same node, I

[ofa-general] false warnings of multicast join failures

2008-10-27 Thread Yossi Etigin
I'm referring to these: ib0: multicast join failed for ff12:401b::::::, status -11 The patch in http://lists.openfabrics.org/pipermail/general/2008-May/050551.html is causing them. The patch creates a state when there is no sm_ah, so all alloc_mad() calls return -11

Re: [ofa-general] opensm as service - cfg files

2008-10-27 Thread Yevgeny Kliteynik
Al Chu wrote: Hey Philippe, On Mon, 2008-10-27 at 10:40 +0100, Philippe Gregoire wrote: Al Chu wrote: On Thu, 2008-10-23 at 14:53 +0200, Philippe Gregoire wrote: Hi Yevgeny, Is it possible to write this service so it will be able to manage multiple instances of opensm on the same

Re: [ofa-general][PATCH] mlx4: Setting the correct offset for default mac address

2008-10-27 Thread Jeff Garzik
Yevgeny Petrilin wrote: Signed-off-by: Yevgeny Petrilin [EMAIL PROTECTED] --- drivers/net/mlx4/fw.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index be09fdb..cee199c 100644 --- a/drivers/net/mlx4/fw.c +++

[ofa-general] Re: mlx4_en: remove duplicated #include

2008-10-27 Thread Jeff Garzik
Huang Weiyi wrote: Removed duplicated #include linux/cpumask.h in drivers/net/mlx4/en_main.c. Signed-off-by: Huang Weiyi [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c index 1b0eebf..4b9794e 100644 ---

Re: [ofa-general] opensm as service - cfg files

2008-10-27 Thread Al Chu
On Mon, 2008-10-27 at 20:53 +0200, Yevgeny Kliteynik wrote: Al Chu wrote: Hey Philippe, On Mon, 2008-10-27 at 10:40 +0100, Philippe Gregoire wrote: Al Chu wrote: On Thu, 2008-10-23 at 14:53 +0200, Philippe Gregoire wrote: Hi Yevgeny, Is it possible to write this service

Re: [ofa-general] [PATCH 1/3] IB/ipath - fix the length returned in loopback UD completion queue entries

2008-10-27 Thread Roland Dreier
UD packets sent to the local IB port (loopback) have a zero length reported in the send work request completion entry. This fixes it by using a copy of the WQE to copy the data. According to the IB spec (as I read it at least), the bytes transferred field of a completion entry is only

[ofa-general] poll CQ failed -2 with connectX

2008-10-27 Thread Rick Warner
Hi all, I am configuring an Opteron cluster with ConnectX InfiniBand. I have a problem: if I run one of the NAS tests, it works the first and maybe the second time, but after that the jobs instantly fail with messages like this: [Rank 44][cm.c: line 860]poll CQ failed -2 [Rank 51][cm.c: line

Re: [ofa-general] [PATCH 1/3] IB/ipath - fix the length returned in loopback UD completion queue entries

2008-10-27 Thread Ralph Campbell
On Mon, 2008-10-27 at 15:30 -0700, Roland Dreier wrote: UD packets sent to the local IB port (loopback) have a zero length reported in the send work request completion entry. This fixes it by using a copy of the WQE to copy the data. According to the IB spec (as I read it at least), the

Re: [ofa-general] poll CQ failed -2 with connectX

2008-10-27 Thread Rick Warner
On Monday 27 October 2008, Rick Warner wrote: Hi all, I am configuring an Opteron cluster with ConnectX InfiniBand. I have a problem: if I run one of the NAS tests, it works the first and maybe the second time, but after that the jobs instantly fail with messages like this: [Rank 44][cm.c:

[ofa-general] ib_mthca catastrophic error detected

2008-10-27 Thread Scott A. Friedman
Hello, On a several-hundred-node cluster we run here, we have had several large (512+ core) jobs die with the following left in several of the nodes' logs. Below is an example from two different nodes. 22 nodes had this error after the large run died. What is this error and why would

Re: [ofa-general] ib_mthca catastrophic error detected

2008-10-27 Thread Roland Dreier
ib_mthca :02:00.0: Catastrophic error detected: internal error This means your HCA detected an internal error -- overheating, power glitch, cosmic ray, firmware bug, something like that.