[ewg] ofa_1_5_kernel 20100505-0200 daily build status

2010-05-05 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-186.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [PATCH] mlx4_core: request as many MSIX vectors as there are CPU cores

2010-05-05 Thread Eli Cohen
The current code requires num_possible_cpus() + 1 MSIX vectors. However,
num_possible_cpus() is the maximum number of CPUs supported by the kernel.
We should use num_online_cpus(), which is the number of CPUs currently
available to the system.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
 drivers/net/mlx4/main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index e3e0d54..0559df4 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -969,7 +969,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
 
 	if (msi_x) {
 		nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
-			     num_possible_cpus() + 1);
+			     num_online_cpus() + 1);
 		entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
 		if (!entries)
 			goto no_msi;
-- 
1.7.1
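A minimal userspace sketch of the vector count computed by the patched line, for illustration only: `msix_nreq()` is a hypothetical helper and `min_t(int, a, b)` is modeled as a plain integer minimum. The CPU count is passed in, so the same function covers both the num_possible_cpus() and the num_online_cpus() variants.

```c
/* Plain-C stand-in for the kernel's min_t(int, a, b). */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring the patched expression in
 * mlx4_enable_msi_x():
 *     nreq = min_t(int, num_eqs - reserved_eqs, ncpus + 1);
 * i.e. one vector per CPU plus one, capped by the EQs left after the
 * reserved ones.
 */
static int msix_nreq(int num_eqs, int reserved_eqs, int ncpus)
{
	return min_int(num_eqs - reserved_eqs, ncpus + 1);
}
```

For example, with a hypothetical 128 EQs of which 8 are reserved, msix_nreq(128, 8, 64) gives 65 vectors for a kernel reporting 64 possible CPUs, while msix_nreq(128, 8, 8) gives 9 for 8 online CPUs. When the EQ budget is the smaller term, as in msix_nreq(16, 8, 64), the CPU count no longer matters and the result is 8.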



Re: [ewg] [PATCH] libibverbs: add raw ethernet QP type IBV_QPT_RAW_ETH=7

2010-05-05 Thread Tziporet Koren
On 5/5/2010 12:55 AM, Roland Dreier wrote:
 We want to add that patch and iWARP multicast acceleration patches to 
 OFED 1.5.2 content.

 I have nothing to do with OFED development.

We first need to see how this patch gets into the Linux kernel before
we take it into OFED.
BTW - does this patch add-on to the verbs API?
I hope it does not break the ABI.

Tziporet



Re: [ewg] [PATCH] libibverbs: add raw ethernet QP type IBV_QPT_RAW_ETH=7

2010-05-05 Thread Walukiewicz, Miroslaw
 BTW - does this patch add-on to the verbs API?
 I hope it does not break the ABI.

The only change to the API is the new QP type IBV_QPT_RAW_ETH. There are no
changes to any structures.

Regards,

Mirek 
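A minimal sketch of why appending a single enum value is ABI-friendly. The demo_ enums are hypothetical simplified copies: RC = 2, UC = 3 and UD = 4 match the QP types already in libibverbs, and 7 is the IBV_QPT_RAW_ETH value proposed in this thread. The compile-time checks confirm that existing constants keep their numbers and that the enum's storage size does not change, so previously built applications still pass the same values.

```c
/* Hypothetical "old" header: the values an already-built application uses. */
enum demo_qp_type_old {
	DEMO_OLD_QPT_RC = 2,
	DEMO_OLD_QPT_UC,
	DEMO_OLD_QPT_UD
};

/* Hypothetical "new" header: the only change is one appended constant. */
enum demo_qp_type_new {
	DEMO_NEW_QPT_RC = 2,
	DEMO_NEW_QPT_UC,
	DEMO_NEW_QPT_UD,
	DEMO_NEW_QPT_RAW_ETH = 7	/* models the proposed IBV_QPT_RAW_ETH=7 */
};

/* Existing values are unchanged, so old binaries keep passing valid numbers. */
_Static_assert(DEMO_OLD_QPT_RC == DEMO_NEW_QPT_RC, "RC renumbered");
_Static_assert(DEMO_OLD_QPT_UC == DEMO_NEW_QPT_UC, "UC renumbered");
_Static_assert(DEMO_OLD_QPT_UD == DEMO_NEW_QPT_UD, "UD renumbered");

/* A field of either enum type occupies the same storage. */
_Static_assert(sizeof(enum demo_qp_type_old) == sizeof(enum demo_qp_type_new),
	       "enum size changed");
```

The explicit "= 7" matters: letting the compiler number a new constant in the middle of the list would shift every value after it, which is exactly the kind of silent ABI break the question above is worried about.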
-----Original Message-----
From: ewg-boun...@lists.openfabrics.org 
[mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tziporet Koren
Sent: Wednesday, May 05, 2010 3:22 PM
To: Roland Dreier
Cc: Walukiewicz, Miroslaw; e...@openfabrics.org
Subject: Re: [ewg] [PATCH] libibverbs: add raw ethernet QP type 
IBV_QPT_RAW_ETH=7



Re: [ewg] [PATCH] mlx4_core: request as many MSIX vectors as there are CPU cores

2010-05-05 Thread Jason Gunthorpe
On Wed, May 05, 2010 at 07:54:54AM -0700, Roland Dreier wrote:
   The current code requires num_possible_cpus() + 1 MSIX vectors. However,
   num_possible_cpus() stands for the max number of supported CPUs by the 
 kernel.
 
 I believe this is wrong -- num_possible_cpus() is the maximum number of
 CPUs for the running system, including hotplug.

FWIW, I've seen some systems running RH kernels that blow up here
because they run out of interrupt numbers if you have two IB cards in
them.

Mainline kernels have a reworked IRQ number allocator and don't have a
problem, so maybe this is an ofed only patch?

Jason


Re: [ewg] [PATCH] mlx4_core: request as many MSIX vectors as there are CPU cores

2010-05-05 Thread Tziporet Koren
On 5/5/2010 7:32 PM, Jason Gunthorpe wrote:
 On Wed, May 05, 2010 at 07:54:54AM -0700, Roland Dreier wrote:

 The current code requires num_possible_cpus() + 1 MSIX vectors. However,
 num_possible_cpus() stands for the max number of supported CPUs by the 
 kernel.

 I believe this is wrong -- num_possible_cpus() is the maximum number of
 CPUs for the running system, including hotplug.
  
 FWIW, I've seen some systems running RH kernels that blow up here
 because they run out of interrupt numbers if you have two IB cards in
 them.

 Mainline kernels have a reworked IRQ number allocator and don't have a
 problem, so maybe this is an ofed only patch?

This is not related to OFED
We found it in performance work of our EN (10G) driver

Tziporet


Re: [ewg] [PATCH] mlx4_core: request as many MSIX vectors as there are CPU cores

2010-05-05 Thread Roland Dreier
  We found it in performance work of our EN (10G) driver

What does this have to do with performance?

Do you have a system where num_possible_cpus() differs significantly
from num_online_cpus()?  What kernel is that with?  As far as I can
tell, for x86 they should only be different if there genuinely are CPUs
that are available for hotplug.

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
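To check the question above on a particular machine, the kernel's possible and online CPU masks can be compared from userspace. A minimal sketch, assuming the standard Linux sysfs files /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online, which hold CPU lists such as 0-3,8-11; the helper names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Count the CPUs in a cpulist string such as "0-3,8-11" or "0".
 * Returns -1 on a malformed list.
 */
static int count_cpulist(const char *s)
{
	int total = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			return -1;		/* no number where one was expected */
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		total += (int)(hi - lo + 1);
		s = end;
		if (*s == ',')
			s++;			/* next element of the list */
	}
	return total;
}

/* Read one sysfs cpulist file and count its CPUs; -1 on error. */
static int count_cpus_in_file(const char *path)
{
	char buf[4096];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof buf, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return count_cpulist(buf);
}
```

Calling count_cpus_in_file() on the possible and online files returns the two counts (or -1 if a file cannot be read or parsed). A large gap between them is the case in which requesting num_possible_cpus() + 1 vectors asks for noticeably more than the online CPUs can use.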


[ewg] ibcheckerrors Port All FAILED reported

2010-05-05 Thread Woodruff, Robert J

Hi guys,

When I run ibcheckerrors on my Mellanox switch,
it is reporting that Port all FAILED. 

From what I can tell, the switch is working fine and
I think that this is a bogus error from the program.

If this is indeed not a real problem, can the diagnostic
be fixed to not report this as an error?


ibcheckerrors -nocolor -v -t 100

# Checking Switch: nodeguid 0x0002c902004046a0
Node check lid 7: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK

# Checking Ca: nodeguid 0x0002c9030002628a
Node check lid 14: OK
Error check on lid 14 (cstnh-2 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c90300025e0a
Node check lid 12: OK
Error check on lid 12 (cstnh-3 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030002615e
Node check lid 15: OK
Error check on lid 15 (cstnh-4 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e442
Node check lid 11: OK
Error check on lid 11 (cstnh-8 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e44e
Node check lid 8: OK
Error check on lid 8 (cstnh-11 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e3e6
Node check lid 2: OK
Error check on lid 2 (cstnh-13 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e44a
Node check lid 18: OK
Error check on lid 18 (cstnh-9 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c90300044fb4
Node check lid 13: OK
Error check on lid 13 (cstnh-7 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c90300044fbc
Node check lid 10: OK
Error check on lid 10 (cstnh-1 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e3ee
Node check lid 9: OK
Error check on lid 9 (cstnh-10 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e446
Node check lid 4: OK
Error check on lid 4 (cstnh-12 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e22e
Node check lid 1: OK
Error check on lid 1 (cstnh-14 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c9030008e43e
Node check lid 19: OK
Error check on lid 19 (cstnh-15 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0090270002000345
Node check lid 6: OK
Error check on lid 6 (cstnh-5 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0090270002000335
Node check lid 5: OK
Error check on lid 5 (cstnh-6 HCA-1) port 1: OK

# Checking Ca: nodeguid 0x0002c90300028238
Node check lid 3: OK
Error check on lid 3 (cst-linux HCA-1) port 1: OK

## Summary: 17 nodes checked, 0 bad nodes found
##  32 ports checked, 0 ports have errors beyond threshold


Re: [ewg] ibcheckerrors Port All FAILED reported

2010-05-05 Thread Mike Heinz
Hi - the problem is that not all switches support the same features, and 
ibcheckerrors is treating this as an error. I believe this will be fixed in 
OFED 1.5.2.

-----Original Message-----
From: ewg-boun...@openfabrics.org [mailto:ewg-boun...@openfabrics.org] On 
Behalf Of Woodruff, Robert J
Sent: Wednesday, May 05, 2010 4:48 PM
To: EWG; tzipo...@mellanox.co.il
Subject: [ewg] ibcheckerrors Port All FAILED reported




Re: [ewg] [PATCHv8 03/11] IB/umad: Enable support only for IB ports

2010-05-05 Thread Roland Dreier
Why do we not allow umad for IBoE ports?  I understand there's no QP0
but why can't userspace use QP1 just like for IB link layer ports?


Re: [ewg] [PATCHv8 01/11] ib core: Add link layer property to ports

2010-05-05 Thread Roland Dreier
Hi Eli,

I'm hoping to get this IBoE stuff in for 2.6.35.  I started an iboe
branch in my tree (similar to the xrc branch I've been carrying for a
while), and I added this patch in, except I renamed
rdma_port_link_layer() to rdma_port_get_link_layer().  This seems to
match rdma_node_get_transport() better.

In any case as I add patches to my branch, you can stop worrying about
them, which should make keeping this series updated easier.

 - R.


Re: [ewg] [PATCHv8 06/11] ipoib: avoid ipoib over IBoE

2010-05-05 Thread Roland Dreier
  @@ -1383,6 +1385,9 @@ static void ipoib_remove_one(struct ib_device *device)
   dev_list = ib_get_client_data(device, ipoib_client);
   
   list_for_each_entry_safe(priv, tmp, dev_list, list) {
  +if (rdma_port_link_layer(device, priv->port) != IB_LINK_LAYER_INFINIBAND)
  +continue;

Why do we need this chunk here?  How could a netdev get on our list if
we never create IPoIB interfaces for IBoE ports?

 - R.


Re: [ewg] [PATCHv8 02/11] ib_core: IBoE support only QP1

2010-05-05 Thread Roland Dreier
  @@ -795,11 +799,12 @@ static void mcast_add_one(struct ib_device *device)
   struct mcast_device *dev;
   struct mcast_port *port;
   int i;
  +int count = 0;
   
   if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
   return;
   
  -dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
  +dev = kzalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,

  @@ -1007,7 +1010,7 @@ static void ib_sa_add_one(struct ib_device *device)
   e = device->phys_port_cnt;
   }
   
  -sa_dev = kmalloc(sizeof *sa_dev +
  +sa_dev = kzalloc(sizeof *sa_dev +

Do you happen to remember why you needed these kmalloc -> kzalloc conversions?
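The thread does not say why the conversions were needed. What they change can be illustrated in userspace, where calloc() stands in for kzalloc() and malloc() for kmalloc(). A minimal sketch with hypothetical demo_ structures shaped like the header-plus-per-port-array allocations in the hunks above: the zeroing allocator guarantees that every per-port entry starts at 0, which matters if (one plausible scenario, not confirmed here) setup code skips some ports and later code still reads their entries.

```c
#include <stdlib.h>

/* Hypothetical per-port state, loosely standing in for struct mcast_port. */
struct demo_port {
	int port_num;
	int refcount;	/* e.g. a counter that must start at 0 */
};

/* Hypothetical device header followed by a per-port array. */
struct demo_device {
	int start_port;
	int end_port;
	struct demo_port port[];	/* flexible array member */
};

/*
 * Allocate the header plus nports entries in one block, in the style of
 * kzalloc(sizeof *dev + n * sizeof *port, GFP_KERNEL). calloc() returns
 * zero-filled memory, so entries that setup code never touches are still
 * well defined; malloc() would leave them indeterminate.
 */
static struct demo_device *demo_alloc_device(int nports)
{
	return calloc(1, sizeof(struct demo_device) +
			 (size_t)nports * sizeof(struct demo_port));
}
```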


Re: [ewg] ibcheckerrors Port All FAILED reported

2010-05-05 Thread Ira Weiny
Interesting...

I have a switch which does this as well.  Tracing through the scripts shows
that the perfquery command is failing like this.

14:29:03  ./perfquery 40 255
./perfquery: iberror: failed: AllPortSelect not supported

It seems there is an issue with the CapabilityMask value...

14:43:32  ./perfquery 40 255
cap_mask 0x400  === my debug output
./perfquery: iberror: failed: AllPortSelect not supported

14:43:38  ./saquery CPI 40
SA ClassPortInfo:
...
Capability mask..0x2602
...

Those don't match because...  perfquery has a bug...

perfquery is issuing a PMA query when it should be issuing an SA query.  It
just so happens that on some switches the result of that PMA query indicates
AllPortSelect is available.  Patch to follow.

Ira
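The capability test behind the "AllPortSelect not supported" error reduces to a single bit check. A minimal sketch, assuming AllPortSelect is bit 8 (0x100) of the Performance Management ClassPortInfo CapabilityMask in host byte order; that bit position is our reading of the InfiniBand specification rather than something stated in this thread, and the DEMO_/pma_ names are hypothetical. As we understand the specification, the upper CapabilityMask bits are defined per management class, so a PMA mask and an SA ClassPortInfo mask are not directly comparable.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed position of AllPortSelect in the PerfMgt ClassPortInfo
 * CapabilityMask (host byte order). Verify against the IBA
 * specification before relying on it.
 */
#define DEMO_PM_ALL_PORT_SELECT (1u << 8)

/* True if the PMA advertises that port 255 ("all ports") may be queried. */
static bool pma_supports_all_port_select(uint16_t pma_cap_mask)
{
	return (pma_cap_mask & DEMO_PM_ALL_PORT_SELECT) != 0;
}
```

With the PMA value from the debug output above, pma_supports_all_port_select(0x400) returns false, which is consistent with the error perfquery printed for port 255.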


On Wed, 5 May 2010 13:47:54 -0700
Woodruff, Robert J robert.j.woodr...@intel.com wrote:

 
 Hi guys,
 
 When I run ibcheckerrors on my Mellanox switch,
 it is reporting that Port all FAILED. 
 
 From what I can tell, the switch is working fine and
 I think that this is a bogus error from the program.
 
 If this is indeed not a real problem, can the diagnostic
 be fixed to not report this as an error ?
 
 


-- 
Ira Weiny wei...@llnl.gov


Re: [ewg] ibcheckerrors Port All FAILED reported

2010-05-05 Thread Ira Weiny
Never mind, I was wrong about the below.

However, there is an option to emulate the all-ports query when the switch
does not support it.

I believe that is a way to fix this.
Ira
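Emulating the all-ports query amounts to asking each port individually and summing the counters when the switch cannot answer for port 255. A minimal sketch of that fallback, assuming a hypothetical query callback and a two-counter subset of PortCounters; the demo_ names are invented here and are not the perfquery or libibmad API.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ALL_PORTS 255	/* port number meaning "all ports", as in "perfquery 40 255" */

/* Hypothetical subset of PortCounters. */
struct demo_counters {
	uint64_t symbol_errors;
	uint64_t rcv_errors;
};

/* Hypothetical query hook: fills *out for one port, returns 0 on success. */
typedef int (*demo_query_fn)(int port, struct demo_counters *out, void *ctx);

/*
 * Query all ports of a switch. If AllPortSelect is supported, issue one
 * query for port 255; otherwise emulate it by summing ports 1..nports.
 */
static int demo_query_all_ports(bool all_port_select, int nports,
				demo_query_fn query, void *ctx,
				struct demo_counters *sum)
{
	sum->symbol_errors = 0;
	sum->rcv_errors = 0;

	if (all_port_select)
		return query(DEMO_ALL_PORTS, sum, ctx);

	for (int port = 1; port <= nports; port++) {
		struct demo_counters c;
		int rc = query(port, &c, ctx);

		if (rc)
			return rc;	/* stop on the first failing port */
		sum->symbol_errors += c.symbol_errors;
		sum->rcv_errors += c.rcv_errors;
	}
	return 0;
}
```

Keeping the transport behind a callback lets the aggregation logic be exercised without a fabric; a real tool would also have to decide how to treat ports that are down or unused.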

On Wed, 5 May 2010 18:09:43 -0700
Ira Weiny wei...@llnl.gov wrote:

 Interesting...
 
 I have a switch which does this as well.  Tracing through the scripts shows
 that the perfquery command is failing like this.
 
 14:29:03  ./perfquery 40 255
 ./perfquery: iberror: failed: AllPortSelect not supported
 
 It seems there is an issue with the CapabilityMask value...
 
 14:43:32  ./perfquery 40 255
 cap_mask 0x400  === my debug output
 ./perfquery: iberror: failed: AllPortSelect not supported
 
 14:43:38  ./saquery CPI 40
 SA ClassPortInfo:
 ...
 Capability mask..0x2602
 ...
 
 Those don't match because...  perfquery has a bug...
 
 perfquery is issuing a PMA query when it should be issuing a SA query.  It
 just so happens that on some switches the result of that PMA query indicates
 AllPortSelect is available.  Patch to follow.
 
 Ira
 
 