[ewg] ofa_1_5_kernel 20100505-0200 daily build status
This email was generated automatically, please do not reply
git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-186.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] [PATCH] mlx4_core: request MSIX vectors as much as there CPU cores
The current code requires num_possible_cpus() + 1 MSIX vectors. However,
num_possible_cpus() stands for the max number of supported CPUs by the
kernel. We should use num_online_cpus() which is the number of available
CPUs for the system.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
 drivers/net/mlx4/main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index e3e0d54..0559df4 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -969,7 +969,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
 	if (msi_x) {
 		nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
-			     num_possible_cpus() + 1);
+			     num_online_cpus() + 1);
 		entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
 		if (!entries)
 			goto no_msi;
--
1.7.1
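To make the effect of the one-line change concrete, here is a minimal userspace sketch of the vector count the hunk computes: one vector per CPU plus one, capped by the EQs left after the reserved ones. The helper name `msix_nreq()` and the example numbers are hypothetical; only the formula comes from the diff above.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the nreq computation in
 * mlx4_enable_msi_x(): one vector per CPU plus one extra, capped by
 * the EQs that remain after the reserved ones.
 */
static int msix_nreq(int num_eqs, int reserved_eqs, int ncpus)
{
	int avail = num_eqs - reserved_eqs;	/* EQs usable by the driver */
	int want = ncpus + 1;			/* per-CPU vectors plus one */

	assert(num_eqs >= reserved_eqs && ncpus > 0);
	return avail < want ? avail : want;	/* same result as min_t(int, avail, want) */
}
```

With a hypothetical device exposing 64 EQs, 8 of them reserved, a CPU count of 4 gives `msix_nreq(64, 8, 4) == 5`, while a CPU count of 64 is capped at 56. Whether the possible or the online CPU count is passed in is exactly what the patch changes.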
Re: [ewg] [PATCH] libibverbs: add raw ethernet QP type IBV_QPT_RAW_ETH=7
On 5/5/2010 12:55 AM, Roland Dreier wrote:
>> We want to add that patch and iWARP multicast acceleration patches to
>> OFED 1.5.2 content.
> I have nothing to do with OFED development.

We first need to see how this patch gets into the Linux kernel before we take it into OFED.

BTW - does this patch add to the verbs API? I hope it does not break the ABI.

Tziporet
Re: [ewg] [PATCH] libibverbs: add raw ethernet QP type IBV_QPT_RAW_ETH=7
> BTW - does this patch add-on to the verbs API? I hope it does not break the ABI.

The only change to the API is a new QP type, IBV_QPT_RAW_ETH. There are no changes in structures.

Regards,
Mirek

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tziporet Koren
Sent: Wednesday, May 05, 2010 3:22 PM
To: Roland Dreier
Cc: Walukiewicz, Miroslaw; e...@openfabrics.org
Subject: Re: [ewg] [PATCH] libibverbs: add raw ethernet QP type IBV_QPT_RAW_ETH=7
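Mirek's point that only a new enumerator is added can be illustrated with a small C sketch. It uses a hypothetical `enum demo_qp_type` that mirrors, as an assumption, the existing libibverbs values (`IBV_QPT_RC = 2`, `IBV_QPT_UC`, `IBV_QPT_UD`) and appends the value 7 proposed in the patch subject. Because the existing values do not move and no structure changes, code compiled against the old values still sees the same numbers.

```c
#include <assert.h>

/* Hypothetical mirror of the QP type enum; not the real libibverbs header. */
enum demo_qp_type {
	DEMO_QPT_RC = 2,	/* assumed to mirror IBV_QPT_RC */
	DEMO_QPT_UC,		/* 3 */
	DEMO_QPT_UD,		/* 4 */
	DEMO_QPT_RAW_ETH = 7	/* value proposed in the patch subject */
};

/* Appending the new type must not shift any existing value (C11). */
static_assert(DEMO_QPT_UC == 3 && DEMO_QPT_UD == 4,
	      "existing QP type values must not move");

static const char *demo_qp_type_str(enum demo_qp_type t)
{
	switch (t) {
	case DEMO_QPT_RC:	return "RC";
	case DEMO_QPT_UC:	return "UC";
	case DEMO_QPT_UD:	return "UD";
	case DEMO_QPT_RAW_ETH:	return "RAW_ETH";
	}
	return "unknown";	/* values this build does not know about */
}
```

An application built before the change simply never passes 7; a newer one can, and an older library would report it as unknown rather than misinterpret an existing type.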
Re: [ewg] [PATCH] mlx4_core: request MSIX vectors as much as there CPU cores
On Wed, May 05, 2010 at 07:54:54AM -0700, Roland Dreier wrote:
> The current code requires num_possible_cpus() + 1 MSIX vectors. However,
> num_possible_cpus() stands for the max number of supported CPUs by the
> kernel.

I believe this is wrong -- num_possible_cpus() is the maximum number of CPUs for the running system, including hotplug.

FWIW, I've seen some systems running RH kernels that blow up here because they run out of interrupt numbers if you have two IB cards in them. Mainline kernels have a reworked IRQ number allocator and don't have a problem, so maybe this is an ofed only patch?

Jason
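Jason's point can be checked on a running Linux system: the kernel exports the possible and online CPU masks as /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online, in a range-list format such as `0-3,8-11`. A minimal sketch that counts the CPUs in such a list; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Count the CPUs in a kernel CPU-list string such as "0-3,8-11\n".
 * Returns -1 if the string is malformed.
 */
static int count_cpu_list(const char *s)
{
	int total = 0;

	while (*s != '\0' && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;
		if (*end == '-') {		/* a range such as "8-11" */
			const char *start = end + 1;

			hi = strtol(start, &end, 10);
			if (end == start || hi < lo)
				return -1;
		}
		total += (int)(hi - lo + 1);
		s = end;
		if (*s == ',')
			s++;
		else if (*s != '\0' && *s != '\n')
			return -1;		/* unexpected character */
	}
	return total;
}

/* Read one sysfs CPU mask file and count its CPUs; -1 on error. */
static int count_cpus_in_file(const char *path)
{
	char buf[4096];
	FILE *f;
	int n = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof buf, f) != NULL)
		n = count_cpu_list(buf);
	fclose(f);
	return n;
}
```

Comparing `count_cpus_in_file("/sys/devices/system/cpu/possible")` with `count_cpus_in_file("/sys/devices/system/cpu/online")` shows whether a machine advertises hotplug CPU slots; only then would the two kernel counts, and therefore the requested vector counts, differ.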
Re: [ewg] [PATCH] mlx4_core: request MSIX vectors as much as there CPU cores
On 5/5/2010 7:32 PM, Jason Gunthorpe wrote:
> On Wed, May 05, 2010 at 07:54:54AM -0700, Roland Dreier wrote:
>> The current code requires num_possible_cpus() + 1 MSIX vectors. However,
>> num_possible_cpus() stands for the max number of supported CPUs by the
>> kernel.
> I believe this is wrong -- num_possible_cpus() is the maximum number of
> CPUs for the running system, including hotplug.
>
> FWIW, I've seen some systems running RH kernels that blow up here because
> they run out of interrupt numbers if you have two IB cards in them.
> Mainline kernels have a reworked IRQ number allocator and don't have a
> problem, so maybe this is an ofed only patch?

This is not related to OFED. We found it in performance work of our EN (10G) driver.

Tziporet
Re: [ewg] [PATCH] mlx4_core: request MSIX vectors as much as there CPU cores
> We found it in performance work of our EN (10G) driver

What does this have to do with performance? Do you have a system where num_possible_cpus() differs significantly from num_online_cpus()? What kernel is that with?

As far as I can tell, for x86 they should only be different if there genuinely are CPUs that are available for hotplug.

 - R.
--
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
[ewg] ibcheckerrors Port All FAILED reported
Hi guys,

When I run ibcheckerrors on my Mellanox switch, it is reporting that Port all FAILED. From what I can tell, the switch is working fine and I think that this is a bogus error from the program. If this is indeed not a real problem, can the diagnostic be fixed to not report this as an error?

ibcheckerrors -nocolor -v -t 100
# Checking Switch: nodeguid 0x0002c902004046a0
Node check lid 7: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
Checking Ca: nodeguid 0x0002c9030002628a
Node check lid 14: OK
Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c90300025e0a
Node check lid 12: OK
Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030002615e
Node check lid 15: OK
Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e442
Node check lid 11: OK
Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e44e
Node check lid 8: OK
Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e3e6
Node check lid 2: OK
Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e44a
Node check lid 18: OK
Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c90300044fb4
Node check lid 13: OK
Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c90300044fbc
Node check lid 10: OK
Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e3ee
Node check lid 9: OK
Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e446
Node check lid 4: OK
Error check on lid 4 (cstnh-12 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e22e
Node check lid 1: OK
Error check on lid 1 (cstnh-14 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c9030008e43e
Node check lid 19: OK
Error check on lid 19 (cstnh-15 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0090270002000345
Node check lid 6: OK
Error check on lid 6 (cstnh-5 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0090270002000335
Node check lid 5: OK
Error check on lid 5 (cstnh-6 HCA-1) port 1: OK
# Checking Ca: nodeguid 0x0002c90300028238
Node check lid 3: OK
Error check on lid 3 (cst-linux HCA-1) port 1: OK

## Summary: 17 nodes checked, 0 bad nodes found
## 32 ports checked, 0 ports have errors beyond threshold
Re: [ewg] ibcheckerrors Port All FAILED reported
Hi - the problem is that not all switches support the same features, and ibcheckerrors is treating this as an error. I believe this will be fixed in OFED 1.5.2.

-----Original Message-----
From: ewg-boun...@openfabrics.org [mailto:ewg-boun...@openfabrics.org] On Behalf Of Woodruff, Robert J
Sent: Wednesday, May 05, 2010 4:48 PM
To: EWG; tzipo...@mellanox.co.il
Subject: [ewg] ibcheckerrors Port All FAILED reported

Hi guys,
When I run ibcheckerrors on my Mellanox switch, it is reporting that Port all FAILED.
Re: [ewg] [PATCHv8 03/11] IB/umad: Enable support only for IB ports
Why do we not allow umad for IBoE ports? I understand there's no QP0, but why can't userspace use QP1 just like for IB link layer ports?

--
Roland Dreier rola...@cisco.com
Re: [ewg] [PATCHv8 01/11] ib core: Add link layer property to ports
Hi Eli,

I'm hoping to get this IBoE stuff in for 2.6.35. I started an iboe branch in my tree (similar to the xrc branch I've been carrying for a while), and I added this patch in, except I renamed rdma_port_link_layer() to rdma_port_get_link_layer(). This seems to match rdma_node_get_transport() better.

In any case, as I add patches to my branch, you can stop worrying about them, which should make keeping this series updated easier.

 - R.
--
Roland Dreier rola...@cisco.com
Re: [ewg] [PATCHv8 06/11] ipoib: avoid ipoib over IBoE
> @@ -1383,6 +1385,9 @@ static void ipoib_remove_one(struct ib_device *device)
>  	dev_list = ib_get_client_data(device, &ipoib_client);
>  	list_for_each_entry_safe(priv, tmp, dev_list, list) {
> +		if (rdma_port_link_layer(device, priv->port) != IB_LINK_LAYER_INFINIBAND)
> +			continue;

Why do we need this chunk here? How could a netdev get on our list if we never create IPoIB interfaces for IBoE ports?

 - R.
--
Roland Dreier rola...@cisco.com
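Roland's question rests on an invariant: if the add path never creates an IPoIB interface for a non-InfiniBand port, nothing on the per-device list can be a non-IB port at remove time. A small userspace model of that invariant follows; every name in it is hypothetical and none of it is kernel API.

```c
#include <assert.h>

/* Hypothetical stand-ins for a port's link layer; not the kernel's enum. */
enum demo_link_layer { DEMO_LL_INFINIBAND, DEMO_LL_ETHERNET };

#define DEMO_MAX_PORTS 8

struct demo_client {
	int nports;			/* entries used in ports[] */
	int ports[DEMO_MAX_PORTS];	/* port numbers that got an interface */
};

/* Add path: only create an entry for InfiniBand link-layer ports. */
static void demo_add_one(struct demo_client *c,
			 const enum demo_link_layer *ll, int phys_port_cnt)
{
	assert(phys_port_cnt <= DEMO_MAX_PORTS);
	c->nports = 0;
	for (int p = 1; p <= phys_port_cnt; p++)
		if (ll[p - 1] == DEMO_LL_INFINIBAND)
			c->ports[c->nports++] = p;
}

/* Remove path: tear down every entry on the list. */
static int demo_remove_one(struct demo_client *c)
{
	int removed = 0;

	for (int i = 0; i < c->nports; i++) {
		/* No link-layer test here: demo_add_one() already filtered. */
		removed++;
	}
	c->nports = 0;
	return removed;
}
```

Because the filter lives only in the add path, the remove path handles exactly the entries that were created, which is the reasoning behind asking why the extra check is there.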
Re: [ewg] [PATCHv8 02/11] ib_core: IBoE support only QP1
> @@ -795,11 +799,12 @@ static void mcast_add_one(struct ib_device *device)
>  	struct mcast_device *dev;
>  	struct mcast_port *port;
>  	int i;
> +	int count = 0;
>  	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
>  		return;
> -	dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
> +	dev = kzalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
> @@ -1007,7 +1010,7 @@ static void ib_sa_add_one(struct ib_device *device)
>  		e = device->phys_port_cnt;
>  	}
> -	sa_dev = kmalloc(sizeof *sa_dev +
> +	sa_dev = kzalloc(sizeof *sa_dev +

Do you happen to remember why you needed these kmalloc -> kzalloc conversions?

--
Roland Dreier rola...@cisco.com
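The thread does not say why the conversions were needed. One plausible reason, offered here only as an assumption, is that the add path may now skip some ports (note the new `count` variable), leaving their per-port entries untouched; a zeroed allocation then gives those entries a known state. A userspace model with `calloc` standing in for `kzalloc` (all names hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-port state; not the kernel's struct mcast_port. */
struct demo_port {
	void *agent;		/* NULL: port skipped, nothing to clean up */
	int port_num;
};

struct demo_dev {
	int start_port;
	int end_port;
	struct demo_port port[];	/* flexible array: one entry per port */
};

static struct demo_dev *demo_dev_alloc(int phys_port_cnt)
{
	struct demo_dev *dev;

	assert(phys_port_cnt > 0);
	/* calloc zeroes the block, as kzalloc would; malloc would not. */
	dev = calloc(1, sizeof *dev + (size_t)phys_port_cnt * sizeof dev->port[0]);
	if (dev != NULL) {
		dev->start_port = 1;
		dev->end_port = phys_port_cnt;
	}
	return dev;
}

/* Count ports holding state; only safe if skipped entries are zeroed. */
static int demo_active_ports(const struct demo_dev *dev)
{
	int n = 0;

	for (int i = 0; i <= dev->end_port - dev->start_port; i++)
		if (dev->port[i].agent != NULL)
			n++;
	return n;
}
```

With plain `malloc`, the entry for a skipped port would hold indeterminate bytes and a later cleanup loop like `demo_active_ports()` could not tell it apart from a configured one.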
Re: [ewg] ibcheckerrors Port All FAILED reported
Interesting... I have a switch which does this as well. Tracing through the scripts shows that the perfquery command is failing like this.

14:29:03 ./perfquery 40 255
./perfquery: iberror: failed: AllPortSelect not supported

It seems there is an issue with the CapabilityMask value...

14:43:32 ./perfquery 40 255
cap_mask 0x400 === my debug output
./perfquery: iberror: failed: AllPortSelect not supported
14:43:38 ./saquery CPI 40
SA ClassPortInfo:
...
Capability mask..0x2602
...

Those don't match because... perfquery has a bug... perfquery is issuing a PMA query when it should be issuing a SA query. It just so happens that on some switches the result of that PMA query indicates AllPortSelect is available.

Patch to follow.

Ira

On Wed, 5 May 2010 13:47:54 -0700
Woodruff, Robert J robert.j.woodr...@intel.com wrote:
> Hi guys,
> When I run ibcheckerrors on my Mellanox switch, it is reporting that Port all FAILED.

--
Ira Weiny
wei...@llnl.gov
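The PMA value alone accounts for the error message. Assuming that AllPortSelect is bit 8 (0x100) of the PerfMgt ClassPortInfo CapabilityMask (which, as far as we know, is how libibmad's `IB_PM_ALL_PORT_SELECT` is defined), the reported `cap_mask 0x400` does not have that bit set. A minimal check; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position: AllPortSelect = bit 8 of the PMA CapabilityMask. */
#define DEMO_PM_ALL_PORT_SELECT (1u << 8)

/* cap_mask is expected in host byte order (already converted from the MAD). */
static int demo_supports_all_port_select(uint16_t cap_mask)
{
	return (cap_mask & DEMO_PM_ALL_PORT_SELECT) != 0;
}
```

0x400 sets only bit 10, so the check returns false, consistent with perfquery's "AllPortSelect not supported". The SA ClassPortInfo mask (0x2602) belongs to a different management class and does not describe the PMA's capabilities.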
Re: [ewg] ibcheckerrors Port All FAILED reported
Never mind, I am wrong about the below. However, there is an option to emulate the all-ports query when it is not supported. That is a way to fix this, I believe.

Ira

On Wed, 5 May 2010 18:09:43 -0700
Ira Weiny wei...@llnl.gov wrote:
> Those don't match because... perfquery has a bug... perfquery is issuing
> a PMA query when it should be issuing a SA query. It just so happens that
> on some switches the result of that PMA query indicates AllPortSelect is
> available.
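Ira's suggested workaround, emulating the all-ports query when AllPortSelect is unsupported, amounts to querying each port individually and aggregating the results. A minimal sketch of the aggregation step only, with hypothetical counter names and no real MAD I/O. The real PortCounters fields are narrower and saturate; the sketch widens them to 32 bits so the sums cannot wrap for a few dozen ports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of per-port error counters, widened to 32 bits. */
struct demo_port_counters {
	uint32_t symbol_errors;
	uint32_t link_downed;
	uint32_t rcv_errors;
};

/*
 * Emulate an "all ports" result: sum each counter over the individual
 * ports instead of sending a single query for port 255.
 */
static struct demo_port_counters
demo_sum_ports(const struct demo_port_counters *per_port, int nports)
{
	struct demo_port_counters sum = { 0, 0, 0 };

	assert(nports >= 0);
	for (int i = 0; i < nports; i++) {
		sum.symbol_errors += per_port[i].symbol_errors;
		sum.link_downed   += per_port[i].link_downed;
		sum.rcv_errors    += per_port[i].rcv_errors;
	}
	return sum;
}
```

The summed values could then be compared against a threshold in the same way a per-port result would be.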