To use Tim Prins' second suggestion, you would also need to add "-mca pml cm" to the runs that use "-mca mtl mx".
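For example, his MTL test commands would then look something like this (same paths and hosts as in your earlier mails):

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca pml cm -mca mtl mx -host "indus1,indus2" ./hello

and, with the MTL debug output enabled:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca pml cm -mca mtl mx -mca mtl_base_debug 1000 -host "indus1,indus2" ./hello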
On 9/29/07, Tim Prins <tpr...@open-mpi.org> wrote:
> I would recommend trying a few things:
>
> 1. Set some debugging flags and see if that helps. So, I would try something like:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello
>
> This will output information as each btl is loaded, and whether or not the
> load succeeds.
>
> 2. Try running with the mx mtl instead of the btl:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" ./hello
>
> Similarly, for debug output:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" -mca mtl_base_debug 1000 ./hello
>
> Let me know if any of these work.
>
> Thanks,
>
> Tim
>
> On Saturday 29 September 2007 01:53:06 am Hammad Siddiqi wrote:
> > Hi Terry,
> >
> > Thanks for replying. The following command is working fine:
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile machines ./hello
> >
> > The contents of machines are:
> > indus1
> > indus2
> > indus3
> > indus4
> >
> > I have tried using np=2 over pairs of machines, but the problem is the same.
> > The errors that occur are given below with the commands that I am trying.
> >
> > **Test 1**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus2" ./hello
> > --------------------------------------------------------------------------
> > Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> > If you specified the use of a BTL component, you may have
> > forgotten a component (such as "self") in the list of
> > usable components.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> > PML add procs failed
> > --> Returned "Unreachable" (-12) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> > --------------------------------------------------------------------------
> > Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> > If you specified the use of a BTL component, you may have
> > forgotten a component (such as "self") in the list of
> > usable components.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems.
> > This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> > PML add procs failed
> > --> Returned "Unreachable" (-12) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> >
> > **Test 2**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus3" ./hello
> >
> > Both processes printed the same "unable to reach ... for MPI communication" and
> > 'PML add procs failed --> Returned "Unreachable" (-12)' errors as in Test 1.
> >
> > **Test 3**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus4" ./hello
> >
> > Same errors as in Test 1.
> >
> > **Test 4**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus2,indus4" ./hello
> >
> > Same errors as in Test 1.
> >
> > **Test 5**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus2,indus3" ./hello
> >
> > Same errors as in Test 1.
> >
> > **Test 6**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus3,indus4" ./hello
> >
> > Same errors as in Test 1.
> >
> > **END OF TESTS**
> >
> > There is one thing to note: when I run this command including -mca pml cm,
> > it works fine:
> >
> > mpirun -np 4 -mca btl mx,sm,self -mca pml cm -machinefile machines ./hello
> > Hello MPI! Process 4 of 1 on indus2
> > Hello MPI! Process 4 of 2 on indus3
> > Hello MPI! Process 4 of 3 on indus4
> > Hello MPI! Process 4 of 0 on indus1
> >
> > To my knowledge this command is not using shared memory and is only
> > using Myrinet as the interconnect.
> > One more thing: I cannot start more than 4 processes in this case; the
> > mpirun process hangs.
> >
> > Any suggestions?
> >
> > Once again, thanks for your help.
> >
> > Regards,
> > Hammad
> >
> > Terry Dontje wrote:
> > > Hi Hammad,
> > >
> > > It looks to me like none of the btl's could resolve a route between the
> > > node that process rank 0 is on and the other nodes.
> > > I would suggest trying np=2 over a couple pairs of machines to see if
> > > that works and you can truly be sure that only the
> > > first node is having this problem.
> > >
> > > It also might be helpful as a sanity check to use the tcp btl instead of
> > > mx and see if you get more traction with that.
> > >
> > > --td
> > >
> > >> *From:* Hammad Siddiqi (/hammad.siddiqi_at_[hidden]/)
> > >> *Date:* 2007-09-28 07:38:01
> > >>
> > >> Hello,
> > >>
> > >> I am using Sun HPC Toolkit 7.0 to compile and run my C MPI programs.
> > >>
> > >> I have tested the Myrinet installations using Myricom's own test programs.
The Myricom software stack I am using is MX and the vesrion is > > >> mx2g-1.1.7, mx_mapper is also used. > > >> We have 4 nodes having 8 dual core processors each (Sun Fire v890) and > > >> the operating system is > > >> Solaris 10 (SunOS indus1 5.10 Generic_125100-10 sun4u sparc > > >> SUNW,Sun-Fire-V890). > > >> > > >> The contents of machine file are: > > >> indus1 > > >> indus2 > > >> indus3 > > >> indus4 > > >> > > >> The output of *mx_info* on each node is given below > > >> > > >> =====*= > > >> indus1 > > >> *====== > > >> > > >> MX Version: 1.1.7rc3cvs1_1_fixes > > >> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007 > > >> 2 Myrinet boards installed. > > >> The MX driver is configured to support up to 4 instances and 1024 nodes. > > >> =================================================================== > > >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM > > >> Status: Running, P0: Link up > > >> MAC Address: 00:60:dd:47:ad:7c > > >> Product code: M3F-PCIXF-2 > > >> Part number: 09-03392 > > >> Serial number: 297218 > > >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured > > >> Mapped hosts: 10 > > >> > > >> > > >> ROUTE COUNT > > >> INDEX MAC ADDRESS HOST NAME P0 > > >> ----- ----------- > > >> --------- --- > > >> 0) 00:60:dd:47:ad:7c indus1:0 1,1 > > >> 2) 00:60:dd:47:ad:68 indus4:0 8,3 > > >> 3) 00:60:dd:47:b3:e8 indus4:1 7,3 > > >> 4) 00:60:dd:47:b3:ab indus2:0 7,3 > > >> 5) 00:60:dd:47:ad:66 indus3:0 8,3 > > >> 6) 00:60:dd:47:ad:76 indus3:1 8,3 > > >> 7) 00:60:dd:47:ad:77 jhelum1:0 8,3 > > >> 8) 00:60:dd:47:b3:5a ravi2:0 8,3 > > >> 9) 00:60:dd:47:ad:5f ravi2:1 1,1 > > >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3 > > >> =================================================================== > > >> > > >> ====== > > >> *indus2* > > >> ====== > > >> > > >> MX Version: 1.1.7rc3cvs1_1_fixes > > >> MX Build: @indus2:/opt/mx2g-1.1.7rc3 Thu May 31 11:24:03 PKT 2007 > > >> 2 Myrinet boards installed. > > >> The MX driver is configured to support up to 4 instances and 1024 nodes. 
> > >> =================================================================== > > >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM > > >> Status: Running, P0: Link up > > >> MAC Address: 00:60:dd:47:b3:ab > > >> Product code: M3F-PCIXF-2 > > >> Part number: 09-03392 > > >> Serial number: 296636 > > >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured > > >> Mapped hosts: 10 > > >> > > >> ROUTE > > >> COUNT > > >> INDEX MAC ADDRESS HOST NAME P0 > > >> ----- ----------- --------- --- > > >> 0) 00:60:dd:47:b3:ab indus2:0 1,1 > > >> 2) 00:60:dd:47:ad:68 indus4:0 1,1 > > >> 3) 00:60:dd:47:b3:e8 indus4:1 8,3 > > >> 4) 00:60:dd:47:ad:66 indus3:0 1,1 > > >> 5) 00:60:dd:47:ad:76 indus3:1 7,3 > > >> 6) 00:60:dd:47:ad:77 jhelum1:0 7,3 > > >> 8) 00:60:dd:47:ad:7c indus1:0 8,3 > > >> 9) 00:60:dd:47:b3:5a ravi2:0 8,3 > > >> 10) 00:60:dd:47:ad:5f ravi2:1 8,3 > > >> 11) 00:60:dd:47:b3:bf ravi1:0 7,3 > > >> =================================================================== > > >> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM > > >> Status: Running, P0: Link down > > >> MAC Address: 00:60:dd:47:b3:c3 > > >> Product code: M3F-PCIXF-2 > > >> Part number: 09-03392 > > >> Serial number: 296612 > > >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured > > >> Mapped hosts: 10 > > >> > > >> ====== > > >> *indus3* > > >> ====== > > >> MX Version: 1.1.7rc3cvs1_1_fixes > > >> MX Build: @indus3:/opt/mx2g-1.1.7rc3 Thu May 31 11:29:03 PKT 2007 > > >> 2 Myrinet boards installed. > > >> The MX driver is configured to support up to 4 instances and 1024 nodes. > > >> =================================================================== > > >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM > > >> Status: Running, P0: Link up > > >> MAC Address: 00:60:dd:47:ad:66 > > >> Product code: M3F-PCIXF-2 > > >> Part number: 09-03392 > > >> Serial number: 297240 > > >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured > > >> Mapped hosts: 10 > > >> > > >> ROUTE > > >> COUNT > > >> INDEX MAC ADDRESS HOST NAME P0 > > >> ----- ----------- --------- --- > > >> 0) 00:60:dd:47:ad:66 indus3:0 1,1 > > >> 1) 00:60:dd:47:ad:76 indus3:1 8,3 > > >> 2) 00:60:dd:47:ad:68 indus4:0 1,1 > > >> 3) 00:60:dd:47:b3:e8 indus4:1 6,3 > > >> 4) 00:60:dd:47:ad:77 jhelum1:0 8,3 > > >> 5) 00:60:dd:47:b3:ab indus2:0 1,1 > > >> 7) 00:60:dd:47:ad:7c indus1:0 8,3 > > >> 8) 00:60:dd:47:b3:5a ravi2:0 8,3 > > >> 9) 00:60:dd:47:ad:5f ravi2:1 7,3 > > >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3 > > >> =================================================================== > > >> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM > > >> Status: Running, P0: Link up > > >> MAC Address: 00:60:dd:47:ad:76 > > >> Product code: M3F-PCIXF-2 > > >> Part number: 09-03392 > > >> Serial number: 297224 > > >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured > > >> Mapped hosts: 10 > > >> > > >> ROUTE > > >> COUNT > > >> INDEX MAC ADDRESS HOST NAME P0 > > >> ----- ----------- --------- --- > > >> 0) 00:60:dd:47:ad:66 indus3:0 8,3 > > >> 1) 00:60:dd:47:ad:76 indus3:1 1,1 > > >> 2) 00:60:dd:47:ad:68 indus4:0 7,3 > > >> 3) 00:60:dd:47:b3:e8 indus4:1 1,1 > > >> 4) 00:60:dd:47:ad:77 jhelum1:0 1,1 > > >> 5) 00:60:dd:47:b3:ab indus2:0 7,3 > > >> 7) 00:60:dd:47:ad:7c indus1:0 8,3 > > >> 8) 00:60:dd:47:b3:5a ravi2:0 6,3 > > >> 9) 00:60:dd:47:ad:5f ravi2:1 8,3 > > >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3 > > >> > > >> ====== > > >> *indus4* > > >> ====== > > >> > > >> MX Version: 1.1.7rc3cvs1_1_fixes > > >> MX Build: 
@indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007 > > >> 2 Myrinet boards installed. > > >> The MX driver is configured to support up to 4 instances and 1024 nodes. > > >> =================================================================== > > >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM > > >> Status: Running, P0: Link up > > >> MAC Address: 00:60:dd:47:ad:68 > > >> Product code: M3F-PCIXF-2 > > >> Part number: 09-03392 > > >> Serial number: 297238 > > >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured > > >> Mapped hosts: 10 > > >> > > >> ROUTE > > >> COUNT > > >> INDEX MAC ADDRESS HOST NAME P0 > > >> ----- ----------- --------- --- > > >> 0) 00:60:dd:47:ad:68 indus4:0 1,1 > > >> 1) 00:60:dd:47:b3:e8 indus4:1 7,3 > > >> 2) 00:60:dd:47:ad:77 jhelum1:0 7,3 > > >> 3) 00:60:dd:47:ad:66 indus3:0 1,1 > > >> 4) 00:60:dd:47:ad:76 indus3:1 7,3 > > >> 5) 00:60:dd:47:b3:ab indus2:0 1,1 > > >> 7) 00:60:dd:47:ad:7c indus1:0 7,3 > > >> 8) 00:60:dd:47:b3:5a ravi2:0 7,3 > > >> 9) 00:60:dd:47:ad:5f ravi2:1 8,3 > > >> 10) 00:60:dd:47:b3:bf ravi1:0 7,3 > > >> =================================================================== > > >> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM > > >> Status: Running, P0: Link up > > >> MAC Address: 00:60:dd:47:b3:e8 > > >> Product code: M3F-PCIXF-2 > > >> Part number: 09-03392 > > >> Serial number: 296575 > > >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured > > >> Mapped hosts: 10 > > >> > > >> ROUTE > > >> COUNT > > >> INDEX MAC ADDRESS HOST NAME P0 > > >> ----- ----------- --------- --- > > >> 0) 00:60:dd:47:ad:68 indus4:0 6,3 > > >> 1) 00:60:dd:47:b3:e8 indus4:1 1,1 > > >> 2) 00:60:dd:47:ad:77 jhelum1:0 1,1 > > >> 3) 00:60:dd:47:ad:66 indus3:0 8,3 > > >> 4) 00:60:dd:47:ad:76 indus3:1 1,1 > > >> 5) 00:60:dd:47:b3:ab indus2:0 8,3 > > >> 7) 00:60:dd:47:ad:7c indus1:0 7,3 > > >> 8) 00:60:dd:47:b3:5a ravi2:0 6,3 > > >> 9) 00:60:dd:47:ad:5f ravi2:1 8,3 > > >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3 > > >> > > >> The output from *ompi_info* is: > > >> > > >> Open MPI: 1.2.1r14096-ct7b030r1838 > > >> Open MPI SVN revision: 0 > > >> Open RTE: 1.2.1r14096-ct7b030r1838 > > >> Open RTE SVN revision: 0 > > >> OPAL: 1.2.1r14096-ct7b030r1838 > > >> OPAL SVN revision: 0 > > >> Prefix: /opt/SUNWhpc/HPC7.0 > > >> Configured architecture: sparc-sun-solaris2.10 > > >> Configured by: root > > >> Configured on: Fri Mar 30 12:49:36 EDT 2007 > > >> Configure host: burpen-on10-0 > > >> Built by: root > > >> Built on: Fri Mar 30 13:10:46 EDT 2007 > > >> Built host: burpen-on10-0 > > >> C bindings: yes > > >> C++ bindings: yes > > >> Fortran77 bindings: yes (all) > > >> Fortran90 bindings: yes > > >> Fortran90 bindings size: trivial > > >> C compiler: cc > > >> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc > > >> C++ compiler: CC > > >> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC > > >> Fortran77 compiler: f77 > > >> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77 > > >> Fortran90 compiler: f95 > > >> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95 > > >> C profiling: yes > > >> C++ profiling: yes > > >> Fortran77 profiling: yes > > >> Fortran90 profiling: yes > > >> C++ exceptions: yes > > >> Thread support: no > > >> Internal debug support: no > > >> MPI parameter check: runtime > > >> Memory profiling support: no > > >> Memory debugging support: no > > >> libltdl support: yes > > >> Heterogeneous support: yes > > >> mpirun default --prefix: yes > > >> MCA backtrace: printstack (MCA 
v1.0, API v1.0, Component > > >> v1.2.1) > > >> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA maffinity: first_use (MCA v1.0, API v1.0, Component > > >> v1.2.1) > > >> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0) > > >> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0) > > >> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1) > > >> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1) > > >> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1) > > >> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0) > > >> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1) > > >> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1) > > >> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0) > > >> MCA ras: dash_host (MCA v1.0, API v1.3, Component > > >> v1.2.1) > > >> MCA ras: gridengine (MCA v1.0, API v1.3, Component > > >> v1.2.1) > > >> MCA ras: localhost (MCA v1.0, API v1.3, Component > > >> v1.2.1) > > >> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA rds: hostfile (MCA v1.0, API v1.3, Component > > >> v1.2.1) MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1) MCA rds: > > >> resfile (MCA v1.0, API v1.3, Component v1.2.1) MCA rmaps: round_robin > > >> (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1) > > >> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1) > > >> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA pls: gridengine (MCA v1.0, API v1.3, Component > > >> v1.2.1) > > >> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1) > > >> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1) > > >> MCA sds: singleton (MCA v1.0, API v1.0, Component > > >> v1.2.1) > > >> > > >> When I try to run a simple hello world program by issuing 
following command:
> > >>
> > >> mpirun -np 4 -mca btl mx,sm,self -machinefile machines ./hello
> > >>
> > >> The following error appears:
> > >>
> > >> --------------------------------------------------------------------------
> > >> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> > >> If you specified the use of a BTL component, you may have
> > >> forgotten a component (such as "self") in the list of
> > >> usable components.
> > >> --------------------------------------------------------------------------
> > >> --------------------------------------------------------------------------
> > >> It looks like MPI_INIT failed for some reason; your parallel process is
> > >> likely to abort. There are many reasons that a parallel process can
> > >> fail during MPI_INIT; some of which are due to configuration or environment
> > >> problems. This failure appears to be an internal failure; here's some
> > >> additional information (which may only be relevant to an Open MPI
> > >> developer):
> > >>
> > >> PML add procs failed
> > >> --> Returned "Unreachable" (-12) instead of "Success" (0)
> > >> --------------------------------------------------------------------------
> > >> *** An error occurred in MPI_Init
> > >> *** before MPI was initialized
> > >> *** MPI_ERRORS_ARE_FATAL (goodbye)
> > >>
> > >> Processes 0.1.1, 0.1.2, and 0.1.3 each printed the same messages,
> > >> reporting that they are unable to reach 0.1.0.
> > >>
> > >> The output from more /var/run/fms/fma.log:
> > >>
> > >> Sat Sep 22 10:47:50 2007 NIC 0: M3F-PCIXF-2 s/n=297218 1 ports, speed=2G
> > >> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:7c
> > >> Sat Sep 22 10:47:50 2007 NIC 1: M3F-PCIXF-2 s/n=297248 1 ports, speed=2G
> > >> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:5e
> > >> Sat Sep 22 10:47:50 2007 fms-1.2.1 fma starting
> > >> Sat Sep 22 10:47:50 2007 Mapper was 00:00:00:00:00:00, l=0, is now 00:60:dd:47:ad:7c, l=1
> > >> Sat Sep 22 10:47:50 2007 Mapping fabric...
> > >> Sat Sep 22 10:47:54 2007 Mapper was 00:60:dd:47:ad:7c, l=1, is now 00:60:dd:47:b3:e8, l=1
> > >> Sat Sep 22 10:47:54 2007 Cancelling mapping
> > >> Sat Sep 22 10:47:59 2007 5 hosts, 8 nics, 6 xbars, 40 links
> > >> Sat Sep 22 10:47:59 2007 map version is 1987557551
> > >> Sat Sep 22 10:47:59 2007 Found NIC 0 at index 3!
> > >> Sat Sep 22 10:47:59 2007 Found NIC 1 at index 2!
> > >> Sat Sep 22 10:47:59 2007 map seems OK
> > >> Sat Sep 22 10:47:59 2007 Routing took 0 seconds
> > >> Mon Sep 24 14:26:46 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:5a, lev=1, pkt_type=0
> > >> Mon Sep 24 14:26:51 2007 6 hosts, 10 nics, 6 xbars, 42 links
> > >> Mon Sep 24 14:26:51 2007 map version is 1987557552
> > >> Mon Sep 24 14:26:51 2007 Found NIC 0 at index 3!
> > >> Mon Sep 24 14:26:51 2007 Found NIC 1 at index 2!
> > >> Mon Sep 24 14:26:51 2007 map seems OK
> > >> Mon Sep 24 14:26:51 2007 Routing took 0 seconds
> > >> Mon Sep 24 14:35:17 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:bf, lev=1, pkt_type=0
> > >> Mon Sep 24 14:35:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
> > >> Mon Sep 24 14:35:19 2007 map version is 1987557553
> > >> Mon Sep 24 14:35:19 2007 Found NIC 0 at index 5!
> > >> Mon Sep 24 14:35:19 2007 Found NIC 1 at index 4!
> > >> Mon Sep 24 14:35:19 2007 map seems OK
> > >> Mon Sep 24 14:35:19 2007 Routing took 0 seconds
> > >> Tue Sep 25 21:47:52 2007 6 hosts, 9 nics, 6 xbars, 41 links
> > >> Tue Sep 25 21:47:52 2007 map version is 1987557554
> > >> Tue Sep 25 21:47:52 2007 Found NIC 0 at index 3!
> > >> Tue Sep 25 21:47:52 2007 Found NIC 1 at index 2!
> > >> Tue Sep 25 21:47:52 2007 map seems OK
> > >> Tue Sep 25 21:47:52 2007 Routing took 0 seconds
> > >> Tue Sep 25 21:52:02 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): empty port x0p15 is no longer empty
> > >> Tue Sep 25 21:52:07 2007 6 hosts, 10 nics, 6 xbars, 42 links
> > >> Tue Sep 25 21:52:07 2007 map version is 1987557555
> > >> Tue Sep 25 21:52:07 2007 Found NIC 0 at index 4!
> > >> Tue Sep 25 21:52:07 2007 Found NIC 1 at index 3!
> > >> Tue Sep 25 21:52:07 2007 map seems OK
> > >> Tue Sep 25 21:52:07 2007 Routing took 0 seconds
> > >> Tue Sep 25 21:52:23 2007 7 hosts, 11 nics, 6 xbars, 43 links
> > >> Tue Sep 25 21:52:23 2007 map version is 1987557556
> > >> Tue Sep 25 21:52:23 2007 Found NIC 0 at index 6!
> > >> Tue Sep 25 21:52:23 2007 Found NIC 1 at index 5!
> > >> Tue Sep 25 21:52:23 2007 map seems OK
> > >> Tue Sep 25 21:52:23 2007 Routing took 0 seconds
> > >> Wed Sep 26 05:07:01 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x1p2, nic 0, port 0 route=-9 4 10 reply=-10 -4 9 , remote=ravi2 NIC 1, p0 mac=00:60:dd:47:ad:5f
> > >> Wed Sep 26 05:07:06 2007 6 hosts, 9 nics, 6 xbars, 41 links
> > >> Wed Sep 26 05:07:06 2007 map version is 1987557557
> > >> Wed Sep 26 05:07:06 2007 Found NIC 0 at index 3!
> > >> Wed Sep 26 05:07:06 2007 Found NIC 1 at index 2!
> > >> Wed Sep 26 05:07:06 2007 map seems OK
> > >> Wed Sep 26 05:07:06 2007 Routing took 0 seconds
> > >> Wed Sep 26 05:11:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
> > >> Wed Sep 26 05:11:19 2007 map version is 1987557558
> > >> Wed Sep 26 05:11:19 2007 Found NIC 0 at index 3!
> > >> Wed Sep 26 05:11:19 2007 Found NIC 1 at index 2!
> > >> Wed Sep 26 05:11:19 2007 map seems OK
> > >> Wed Sep 26 05:11:19 2007 Routing took 0 seconds
> > >> Thu Sep 27 11:45:37 2007 6 hosts, 9 nics, 6 xbars, 41 links
> > >> Thu Sep 27 11:45:37 2007 map version is 1987557559
> > >> Thu Sep 27 11:45:37 2007 Found NIC 0 at index 6!
> > >> Thu Sep 27 11:45:37 2007 Found NIC 1 at index 5!
> > >> Thu Sep 27 11:45:37 2007 map seems OK
> > >> Thu Sep 27 11:45:37 2007 Routing took 0 seconds
> > >> Thu Sep 27 11:51:02 2007 7 hosts, 11 nics, 6 xbars, 43 links
> > >> Thu Sep 27 11:51:02 2007 map version is 1987557560
> > >> Thu Sep 27 11:51:02 2007 Found NIC 0 at index 6!
> > >> Thu Sep 27 11:51:02 2007 Found NIC 1 at index 5!
> > >> Thu Sep 27 11:51:02 2007 map seems OK
> > >> Thu Sep 27 11:51:02 2007 Routing took 0 seconds
> > >> Fri Sep 28 13:27:10 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x5p0, nic 1, port 0 route=-8 15 6 reply=-6 -15 8 , remote=ravi1 NIC 0, p0 mac=00:60:dd:47:b3:bf
> > >> Fri Sep 28 13:27:24 2007 6 hosts, 8 nics, 6 xbars, 40 links
> > >> Fri Sep 28 13:27:24 2007 map version is 1987557561
> > >> Fri Sep 28 13:27:24 2007 Found NIC 0 at index 5!
> > >> Fri Sep 28 13:27:24 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
> > >> Fri Sep 28 13:27:24 2007 map seems OK
> > >> Fri Sep 28 13:27:24 2007 Routing took 0 seconds
> > >> Fri Sep 28 13:27:44 2007 7 hosts, 10 nics, 6 xbars, 42 links
> > >> Fri Sep 28 13:27:44 2007 map version is 1987557562
> > >> Fri Sep 28 13:27:44 2007 Found NIC 0 at index 7!
> > >> Fri Sep 28 13:27:44 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
> > >> Fri Sep 28 13:27:44 2007 map seems OK
> > >> Fri Sep 28 13:27:44 2007 Routing took 0 seconds
> > >>
> > >> Do you have any suggestions or comments on why this error appears and what
> > >> the solution to this problem is? I have checked the community mailing list
> > >> for this problem and found a few topics related to it, but could not find
> > >> any solution. Any suggestions or comments will be highly appreciated.
> > >>
> > >> The code that I am trying to run is as follows:
> > >>
> > >> #include <stdio.h>
> > >> #include <string.h>  /* for strcpy */
> > >> #include "mpi.h"
> > >> int main(int argc, char **argv)
> > >> {
> > >>     int rank, size, tag, rc, i;
> > >>     MPI_Status status;
> > >>     char message[20];
> > >>     rc = MPI_Init(&argc, &argv);
> > >>     rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
> > >>     rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >>     tag = 100;
> > >>     if (rank == 0) {
> > >>         strcpy(message, "Hello, world");
> > >>         for (i = 1; i < size; i++)
> > >>             rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
> > >>     }
> > >>     else
> > >>         rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
> > >>     printf("node %d : %.13s\n", rank, message);
> > >>     rc = MPI_Finalize();
> > >>     return 0;
> > >> }
> > >>
> > >> Thanks.
> > >> Looking forward.
> > >> Best regards,
> > >> Hammad Siddiqi
> > >> Center for High Performance Scientific Computing
> > >> NUST Institute of Information Technology,
> > >> National University of Sciences and Technology,
> > >> Rawalpindi, Pakistan.
> > >
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/