Re: [OMPI users] MPI_Waitall hangs and querying
Will do. Right now I have asked the user to try rebuilding with the newest Open MPI just to be safe. Interesting behavior: on rank 0 the IB counters (using collectl) never show a packet in, only packets out.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Mar 21, 2012, at 11:37 AM, Jeffrey Squyres wrote:

> On Mar 21, 2012, at 11:34 AM, Brock Palen wrote:
>
>> tcp with this code?
>
> Does it matter enough for debugging runs?
>
>> Can we disable the psm mtl and use the verbs emulation on qlogic? While the
>> qlogic verbs isn't that great, it is still much faster in my tests than tcp.
>>
>> Is there a particular reason to pick tcp?
>
> Not really.  My only thought was that verbs over qlogic devices isn't the
> most stable stack around (they spend all their effort on PSM, not verbs).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
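The zero-inbound-counter observation above can be checked mechanically by diffing two counter snapshots and flagging any port that transmits but never receives. This is only an illustrative sketch: the snapshot format below (dicts keyed by host/port with receive and transmit packet counts) is a hypothetical stand-in, not the actual collectl output format.

```python
# Hedged sketch: flag ports whose transmit counter advances while the
# receive counter stays flat -- the symptom described for rank 0 above.
# Snapshot format is assumed/illustrative, not real collectl output.

def stalled_receivers(before, after):
    """Return (host, port) keys whose xmit counter advanced but recv did not."""
    suspect = []
    for key, (rcv0, xmt0) in before.items():
        rcv1, xmt1 = after[key]
        if xmt1 > xmt0 and rcv1 == rcv0:
            suspect.append(key)
    return sorted(suspect)

# Two hypothetical snapshots: (recv_packets, xmit_packets) per (host, port).
before = {("nyx0817", 1): (100, 200), ("nyx0818", 1): (150, 220)}
after  = {("nyx0817", 1): (100, 450), ("nyx0818", 1): (300, 400)}
print(stalled_receivers(before, after))  # [('nyx0817', 1)]
```

A port showing up here would match the "packets out, never a packet in" symptom and point at a receive-side problem on that host.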
Re: [OMPI users] MPI_Waitall hangs and querying
On Mar 21, 2012, at 11:34 AM, Brock Palen wrote:

> tcp with this code?

Does it matter enough for debugging runs?

> Can we disable the psm mtl and use the verbs emulation on qlogic? While the
> qlogic verbs isn't that great, it is still much faster in my tests than tcp.
>
> Is there a particular reason to pick tcp?

Not really.  My only thought was that verbs over qlogic devices isn't the most stable stack around (they spend all their effort on PSM, not verbs).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] MPI_Waitall hangs and querying
tcp with this code? Can we disable the psm mtl and use the verbs emulation on qlogic? While the qlogic verbs isn't that great, it is still much faster in my tests than tcp.

Is there a particular reason to pick tcp?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Mar 21, 2012, at 11:22 AM, Jeffrey Squyres wrote:

> We unfortunately don't have much visibility into the PSM device (meaning:
> Open MPI is a thin shim on top of the underlying libpsm, which handles all
> the MPI point-to-point semantics itself).  So we can't even ask you to run
> padb to look at the message queues, because we don't have access to them. :-\
>
> Can you try running with TCP and see if that also deadlocks?  If it does, you
> can at least run padb to have a look at the message queues.
>
> On Mar 21, 2012, at 11:15 AM, Brock Palen wrote:
>
>> Forgotten stack as promised; it keeps changing at the lowest level,
>> opal_progress, but never moves above that.
>>
>> [yccho@nyx0817 ~]$ padb -Ormgr=orte --all --stack-trace --tree --all
>> Stack trace(s) for thread: 1
>> -----------------
>> [0-63] (64 processes)
>> -----------------
>> main() at ?:?
>>   Loci::makeQuery(Loci::rule_db const&, Loci::fact_db&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)() at ?:?
>>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     Loci::execute_rule::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>     streamUns::HypreSolveUnit::compute(Loci::sequence const&)() at ?:?
>>     hypre_BoomerAMGSetup() at ?:?
>>     hypre_BoomerAMGBuildInterp() at ?:?
>>       -----------------
>>       [0,2-3,5-16,18-19,21-24,27-34,36-63] (57 processes)
>>       -----------------
>>       hypre_ParCSRMatrixExtractBExt() at ?:?
>>       hypre_ParCSRMatrixExtractBExt_Arrays() at ?:?
>>       hypre_ParCSRCommHandleDestroy() at ?:?
>>       PMPI_Waitall() at ?:?
>>         -----------------
>>         [0,2-3,5,7-16,18-19,21-24,27-34,36-63] (56 processes)
>>         -----------------
>>         ompi_request_default_wait_all() at ?:?
>>         opal_progress() at ?:?
>>         -----------------
>>         [6] (1 processes)
>>         -----------------
>>         ompi_mtl_psm_progress() at ?:?
>>       -----------------
>>       [1,4,17,20,25-26,35] (7 processes)
>>       -----------------
>>       hypre_ParCSRCommHandleDestroy() at ?:?
>>       PMPI_Waitall() at ?:?
>>       ompi_request_default_wait_all() at ?:?
>>       opal_progress() at ?:?
>> Stack trace(s) for thread: 2
>> -----------------
>> [0-63] (64 processes)
>> -----------------
>> start_thread() at ?:?
>>   ips_ptl_pollintr() at ptl_rcvthread.c:324
>>     poll() at ?:?
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>>
>> On Mar 21, 2012, at 11:14 AM, Brock Palen wrote:
>>
>>> I have a user's code that appears to be hanging sometimes on MPI_Waitall();
>>> stack trace from padb is below.  It is on qlogic IB using the psm mtl.
>>> Without knowing which requests go to which rank, how can I check that this
>>> code didn't just get itself into a deadlock?  Is there a way to get a
>>> readable list of every rank's posted sends, and then query a waiting
>>> MPI_Waitall() of a running job to see which sends/recvs it is waiting on?
>>>
>>> Thanks!
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] MPI_Waitall hangs and querying
We unfortunately don't have much visibility into the PSM device (meaning: Open MPI is a thin shim on top of the underlying libpsm, which handles all the MPI point-to-point semantics itself).  So we can't even ask you to run padb to look at the message queues, because we don't have access to them. :-\

Can you try running with TCP and see if that also deadlocks?  If it does, you can at least run padb to have a look at the message queues.

On Mar 21, 2012, at 11:15 AM, Brock Palen wrote:

> Forgotten stack as promised; it keeps changing at the lowest level,
> opal_progress, but never moves above that.
>
> [yccho@nyx0817 ~]$ padb -Ormgr=orte --all --stack-trace --tree --all
> Stack trace(s) for thread: 1
> -----------------
> [0-63] (64 processes)
> -----------------
> main() at ?:?
>   Loci::makeQuery(Loci::rule_db const&, Loci::fact_db&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)() at ?:?
>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     Loci::execute_rule::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>     streamUns::HypreSolveUnit::compute(Loci::sequence const&)() at ?:?
>     hypre_BoomerAMGSetup() at ?:?
>     hypre_BoomerAMGBuildInterp() at ?:?
>       -----------------
>       [0,2-3,5-16,18-19,21-24,27-34,36-63] (57 processes)
>       -----------------
>       hypre_ParCSRMatrixExtractBExt() at ?:?
>       hypre_ParCSRMatrixExtractBExt_Arrays() at ?:?
>       hypre_ParCSRCommHandleDestroy() at ?:?
>       PMPI_Waitall() at ?:?
>         -----------------
>         [0,2-3,5,7-16,18-19,21-24,27-34,36-63] (56 processes)
>         -----------------
>         ompi_request_default_wait_all() at ?:?
>         opal_progress() at ?:?
>         -----------------
>         [6] (1 processes)
>         -----------------
>         ompi_mtl_psm_progress() at ?:?
>       -----------------
>       [1,4,17,20,25-26,35] (7 processes)
>       -----------------
>       hypre_ParCSRCommHandleDestroy() at ?:?
>       PMPI_Waitall() at ?:?
>       ompi_request_default_wait_all() at ?:?
>       opal_progress() at ?:?
> Stack trace(s) for thread: 2
> -----------------
> [0-63] (64 processes)
> -----------------
> start_thread() at ?:?
>   ips_ptl_pollintr() at ptl_rcvthread.c:324
>     poll() at ?:?
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
> On Mar 21, 2012, at 11:14 AM, Brock Palen wrote:
>
>> I have a user's code that appears to be hanging sometimes on MPI_Waitall();
>> stack trace from padb is below.  It is on qlogic IB using the psm mtl.
>> Without knowing which requests go to which rank, how can I check that this
>> code didn't just get itself into a deadlock?  Is there a way to get a
>> readable list of every rank's posted sends, and then query a waiting
>> MPI_Waitall() of a running job to see which sends/recvs it is waiting on?
>>
>> Thanks!
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] MPI_Waitall hangs and querying
Forgotten stack as promised; it keeps changing at the lowest level, opal_progress, but never moves above that.

[yccho@nyx0817 ~]$ padb -Ormgr=orte --all --stack-trace --tree --all
Stack trace(s) for thread: 1
-----------------
[0-63] (64 processes)
-----------------
main() at ?:?
  Loci::makeQuery(Loci::rule_db const&, Loci::fact_db&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)() at ?:?
    Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    Loci::execute_rule::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
    streamUns::HypreSolveUnit::compute(Loci::sequence const&)() at ?:?
    hypre_BoomerAMGSetup() at ?:?
    hypre_BoomerAMGBuildInterp() at ?:?
      -----------------
      [0,2-3,5-16,18-19,21-24,27-34,36-63] (57 processes)
      -----------------
      hypre_ParCSRMatrixExtractBExt() at ?:?
      hypre_ParCSRMatrixExtractBExt_Arrays() at ?:?
      hypre_ParCSRCommHandleDestroy() at ?:?
      PMPI_Waitall() at ?:?
        -----------------
        [0,2-3,5,7-16,18-19,21-24,27-34,36-63] (56 processes)
        -----------------
        ompi_request_default_wait_all() at ?:?
        opal_progress() at ?:?
        -----------------
        [6] (1 processes)
        -----------------
        ompi_mtl_psm_progress() at ?:?
      -----------------
      [1,4,17,20,25-26,35] (7 processes)
      -----------------
      hypre_ParCSRCommHandleDestroy() at ?:?
      PMPI_Waitall() at ?:?
      ompi_request_default_wait_all() at ?:?
      opal_progress() at ?:?
Stack trace(s) for thread: 2
-----------------
[0-63] (64 processes)
-----------------
start_thread() at ?:?
  ips_ptl_pollintr() at ptl_rcvthread.c:324
    poll() at ?:?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Mar 21, 2012, at 11:14 AM, Brock Palen wrote:

> I have a user's code that appears to be hanging sometimes on MPI_Waitall();
> stack trace from padb is below.  It is on qlogic IB using the psm mtl.
> Without knowing which requests go to which rank, how can I check that this
> code didn't just get itself into a deadlock?  Is there a way to get a
> readable list of every rank's posted sends, and then query a waiting
> MPI_Waitall() of a running job to see which sends/recvs it is waiting on?
>
> Thanks!
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
[OMPI users] MPI_Waitall hangs and querying
I have a user's code that appears to be hanging sometimes on MPI_Waitall(); stack trace from padb is below.  It is on qlogic IB using the psm mtl.  Without knowing which requests go to which rank, how can I check that this code didn't just get itself into a deadlock?  Is there a way to get a readable list of every rank's posted sends, and then query a waiting MPI_Waitall() of a running job to see which sends/recvs it is waiting on?

Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
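Open MPI does not expose a per-rank list of posted sends directly (and with the PSM MTL the match queues live inside libpsm, out of reach of tools like padb). But the cross-check asked about above can be done by hand: collect each rank's posted sends and receives, then pair them off and see what is left over. The sketch below is purely illustrative bookkeeping under that assumption, not an Open MPI or padb feature; request tuples are simplified to (source rank, destination rank, tag).

```python
# Hedged sketch: given hand-collected lists of posted sends and recvs
# across all ranks, report the unmatched ones -- the requests an
# MPI_Waitall would be stuck waiting on.  Illustrative only; not a real
# Open MPI introspection API.
from collections import Counter

def unmatched(posted_sends, posted_recvs):
    """Each entry is a (src_rank, dst_rank, tag) tuple."""
    sends = Counter(posted_sends)
    recvs = Counter(posted_recvs)
    stuck_sends = list((sends - recvs).elements())  # sends with no matching recv
    stuck_recvs = list((recvs - sends).elements())  # recvs with no matching send
    return stuck_sends, stuck_recvs

sends = [(0, 1, 7), (1, 0, 7), (2, 3, 7)]
recvs = [(0, 1, 7), (1, 0, 7), (3, 2, 7)]  # rank 3 posted a recv no one sends to
print(unmatched(sends, recvs))  # ([(2, 3, 7)], [(3, 2, 7)])
```

If every send has a matching receive and the job still hangs, the problem is more likely in the transport than in the application's communication pattern.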
Re: [OMPI users] InfiniBand path migration not working
Jeremy,

As far as I understand, the tool that Evgeny recommended showed that the remote port is reachable.  Based on the logs that have been provided I can't find the issue in OMPI; everything seems to be kosher.  Unfortunately, I do not have a platform where I may try to reproduce the issue.  I would ask Evgeny; maybe Mellanox will be able to reproduce and debug the issue.

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Mar 21, 2012, at 9:31 AM, Jeremy wrote:

> Hi Pasha,
>
> I just wanted to check if you had any further suggestions regarding
> the APM issue based on the updated info in my previous email.
>
> Thanks,
>
> -Jeremy
>
> On Mon, Mar 12, 2012 at 12:43 PM, Jeremy wrote:
>> Hi Pasha, Yevgeny,
>>
>>> My educated guess is that for some reason there is no direct connection
>>> path between lid-2 and lid-4.  To prove it we have to look at the OpenSM
>>> routing information.
>>
>>> If you don't get a response, or you get info for a device different from
>>> what you would expect, then the two ports are not part of the same
>>> subnet, and APM is expected to fail.
>>> Otherwise - it's probably a bug.
>>
>> I've tried your suggestions and the details are below.  I am now
>> testing with a trivial MPI application that just does an
>> MPI_Send/MPI_Recv and then sleeps for a while (attached).  There is
>> much less output to weed through now!
>>
>> When I unplug a cable from Port 1, the LID associated with Port 2 is
>> still reachable with smpquery.  So it looks like there should be a
>> valid path to migrate to on the same subnet.
>>
>> I am using 2 hosts in this output:
>> sulu: This is the host where I unplug the cable from Port 1.  The
>> cable on Port 2 is connected all the time.  LIDs 4 and 5.
>> bones: On this host I leave cables connected to both Ports all the
>> time.  LIDs 2 and 3.
>>
>> A) Before I start, sulu shows that both Ports are up and active using
>> LIDs 4 and 5:
>>
>> sulu> ibstatus
>> Infiniband device 'mlx4_0' port 1 status:
>>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
>>         base lid:        0x4
>>         sm lid:          0x6
>>         state:           4: ACTIVE
>>         phys state:      5: LinkUp
>>         rate:            56 Gb/sec (4X FDR)
>>         link_layer:      InfiniBand
>>
>> Infiniband device 'mlx4_0' port 2 status:
>>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
>>         base lid:        0x5
>>         sm lid:          0x6
>>         state:           4: ACTIVE
>>         phys state:      5: LinkUp
>>         rate:            56 Gb/sec (4X FDR)
>>         link_layer:      InfiniBand
>>
>> B) The other host, bones, is able to get to LIDs 4 and 5 OK:
>>
>> bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 4
>> # Node info: Lid 4
>> BaseVers:........................1
>> ClassVers:.......................1
>> NodeType:........................Channel Adapter
>> NumPorts:........................2
>> SystemGuid:......................0x0002c90300336fe3
>> Guid:............................0x0002c90300336fe0
>> PortGuid:........................0x0002c90300336fe1
>> PartCap:.........................128
>> DevId:...........................0x1003
>> Revision:........................0x
>> LocalPort:.......................1
>> VendorId:........................0x0002c9
>>
>> bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 5
>> # Node info: Lid 5
>> BaseVers:........................1
>> ClassVers:.......................1
>> NodeType:........................Channel Adapter
>> NumPorts:........................2
>> SystemGuid:......................0x0002c90300336fe3
>> Guid:............................0x0002c90300336fe0
>> PortGuid:........................0x0002c90300336fe2
>> PartCap:.........................128
>> DevId:...........................0x1003
>> Revision:........................0x
>> LocalPort:.......................2
>> VendorId:........................0x0002c9
>>
>> C) I start the MPI program.  See attached file for output.
>>
>> D) During Iteration 3, I unplugged the cable on Port 1 of sulu.
>> - I get the expected network error event message.
>> - sulu shows that Port 1 is down and Port 2 is active as expected.
>> - bones is still able to get to LID 5 on Port 2 of sulu as expected.
>> - The MPI application hangs and then terminates instead of running via LID 5.
>>
>> sulu> ibstatus
>> Infiniband device 'mlx4_0' port 1 status:
>>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
>>         base lid:        0x4
>>         sm lid:          0x6
>>         state:           1: DOWN
>>         phys state:      2: Polling
>>         rate:            40 Gb/sec (4X QDR)
>>         link_layer:      InfiniBand
>>
>> Infiniband device 'mlx4_0' port 2 status:
>>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
>>         base lid:        0x5
>>         sm lid:          0x6
>>         state:           4: ACTIVE
Re: [OMPI users] InfiniBand path migration not working
Hi Pasha,

I just wanted to check if you had any further suggestions regarding the APM issue based on the updated info in my previous email.

Thanks,

-Jeremy

On Mon, Mar 12, 2012 at 12:43 PM, Jeremy wrote:

> Hi Pasha, Yevgeny,
>
>>> My educated guess is that for some reason there is no direct connection
>>> path between lid-2 and lid-4.  To prove it we have to look at the OpenSM
>>> routing information.
>
>> If you don't get a response, or you get info for a device different from
>> what you would expect, then the two ports are not part of the same
>> subnet, and APM is expected to fail.
>> Otherwise - it's probably a bug.
>
> I've tried your suggestions and the details are below.  I am now
> testing with a trivial MPI application that just does an
> MPI_Send/MPI_Recv and then sleeps for a while (attached).  There is
> much less output to weed through now!
>
> When I unplug a cable from Port 1, the LID associated with Port 2 is
> still reachable with smpquery.  So it looks like there should be a
> valid path to migrate to on the same subnet.
>
> I am using 2 hosts in this output:
> sulu: This is the host where I unplug the cable from Port 1.  The
> cable on Port 2 is connected all the time.  LIDs 4 and 5.
> bones: On this host I leave cables connected to both Ports all the
> time.  LIDs 2 and 3.
>
> A) Before I start, sulu shows that both Ports are up and active using
> LIDs 4 and 5:
>
> sulu> ibstatus
> Infiniband device 'mlx4_0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
>         base lid:        0x4
>         sm lid:          0x6
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            56 Gb/sec (4X FDR)
>         link_layer:      InfiniBand
>
> Infiniband device 'mlx4_0' port 2 status:
>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
>         base lid:        0x5
>         sm lid:          0x6
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            56 Gb/sec (4X FDR)
>         link_layer:      InfiniBand
>
> B) The other host, bones, is able to get to LIDs 4 and 5 OK:
>
> bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 4
> # Node info: Lid 4
> BaseVers:........................1
> ClassVers:.......................1
> NodeType:........................Channel Adapter
> NumPorts:........................2
> SystemGuid:......................0x0002c90300336fe3
> Guid:............................0x0002c90300336fe0
> PortGuid:........................0x0002c90300336fe1
> PartCap:.........................128
> DevId:...........................0x1003
> Revision:........................0x
> LocalPort:.......................1
> VendorId:........................0x0002c9
>
> bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 5
> # Node info: Lid 5
> BaseVers:........................1
> ClassVers:.......................1
> NodeType:........................Channel Adapter
> NumPorts:........................2
> SystemGuid:......................0x0002c90300336fe3
> Guid:............................0x0002c90300336fe0
> PortGuid:........................0x0002c90300336fe2
> PartCap:.........................128
> DevId:...........................0x1003
> Revision:........................0x
> LocalPort:.......................2
> VendorId:........................0x0002c9
>
> C) I start the MPI program.  See attached file for output.
>
> D) During Iteration 3, I unplugged the cable on Port 1 of sulu.
> - I get the expected network error event message.
> - sulu shows that Port 1 is down and Port 2 is active as expected.
> - bones is still able to get to LID 5 on Port 2 of sulu as expected.
> - The MPI application hangs and then terminates instead of running via LID 5.
>
> sulu> ibstatus
> Infiniband device 'mlx4_0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
>         base lid:        0x4
>         sm lid:          0x6
>         state:           1: DOWN
>         phys state:      2: Polling
>         rate:            40 Gb/sec (4X QDR)
>         link_layer:      InfiniBand
>
> Infiniband device 'mlx4_0' port 2 status:
>         default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
>         base lid:        0x5
>         sm lid:          0x6
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            56 Gb/sec (4X FDR)
>         link_layer:      InfiniBand
>
> bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 4
> ibwarn: [11192] mad_rpc: _do_madrpc failed; dport (Lid 4)
> smpquery: iberror: failed: operation NodeInfo: node info query failed
>
> bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 5
> # Node info: Lid 5
> BaseVers:........................1
> ClassVers:.......................1
> NodeType:........................Channel Adapter
> NumPorts:........................2
> SystemGuid:......................0x0002c90300336fe3
> Guid:............................0x0002c90300336fe0
> PortGuid:........................0x0002c90300336fe2
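The manual before/after port check done in this thread (comparing `ibstatus` output around the cable pull) is easy to automate. The sketch below parses the textual `ibstatus` layout shown above and reports which ports are ACTIVE, i.e. which local ports remain candidates for APM failover; it assumes the output format matches what this thread shows.

```python
# Hedged sketch: parse `ibstatus`-style text and map each (device, port)
# to whether its state line reports ACTIVE.  Assumes the textual layout
# quoted in this thread.
import re

def active_ports(ibstatus_text):
    ports, current = {}, None
    for line in ibstatus_text.splitlines():
        m = re.match(r"Infiniband device '(\S+)' port (\d+) status:", line.strip())
        if m:
            current = (m.group(1), int(m.group(2)))
        elif current and line.strip().startswith("state:"):
            ports[current] = "ACTIVE" in line
    return ports

sample = """Infiniband device 'mlx4_0' port 1 status:
        state:           1: DOWN
Infiniband device 'mlx4_0' port 2 status:
        state:           4: ACTIVE
"""
print(active_ports(sample))  # {('mlx4_0', 1): False, ('mlx4_0', 2): True}
```

Run against the post-unplug output above, this would report port 1 down and port 2 still ACTIVE, matching the observation that a valid migration path existed even though the application failed to migrate.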