Hi Peter,

Good to hear that WMI works for you. Was it difficult to configure everything?

The hanging problem looks quite similar to the one in another thread: http://www.open-mpi.org/community/lists/users/2012/01/18128.php

I haven't been able to solve that one yet; it is a complicated one. An easy workaround, though, is to switch the order of the send and recv calls in the root process. Could you please give that a try?
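
To illustrate what switching the order could look like, here is a minimal sketch. It is not the MPIHello source (which is not shown in this thread), and it uses Python with mpi4py-style send/recv calls rather than the C API. The names exchange and RecordingComm and the message strings are hypothetical, chosen only for illustration. The assumption is that the original root sends to each worker before receiving; in the sketch the root receives first and sends afterwards, while the workers send first.

```python
TAG = 0  # arbitrary message tag used for every message in this sketch


class RecordingComm:
    """Hypothetical stand-in for an MPI communicator.

    It only records the order of send/recv calls so the sketch can be
    dry-run without MPI installed. It does no real communication.
    """

    def __init__(self, rank, size):
        self.rank = rank
        self.size = size
        self.calls = []

    def Get_rank(self):
        return self.rank

    def Get_size(self):
        return self.size

    def send(self, obj, dest, tag=0):
        self.calls.append(("send", dest))

    def recv(self, source, tag=0):
        self.calls.append(("recv", source))
        return f"hello from rank {source}"


def exchange(comm, root=0):
    """Root receives from each worker first, then sends; workers do the reverse."""
    rank = comm.Get_rank()
    size = comm.Get_size()
    if rank == root:
        replies = []
        for peer in range(size):
            if peer == root:
                continue
            # Switched order (assumption): recv before send on the root.
            replies.append(comm.recv(source=peer, tag=TAG))
            comm.send("ack from root", dest=peer, tag=TAG)
        return replies
    # Workers send first, so every root recv has a matching send.
    comm.send(f"hello from rank {rank}", dest=root, tag=TAG)
    return [comm.recv(source=root, tag=TAG)]


# Dry run of the root's call order in a 3-process job.
dry_run = RecordingComm(rank=0, size=3)
exchange(dry_run)
print(dry_run.calls)
```

With mpi4py installed, the same function could instead be called as exchange(MPI.COMM_WORLD) in a script started through mpirun. Whether this ordering avoids the hang in your setup is something only a test on the two instances can show.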


Shiqing


On 2012-06-23 8:40 PM, Peter Soukalopoulos wrote:

Hi Shiqing,

No problems executing notepad.exe remotely -- process with id 2416 created on remote node.

From 10.244.166.37

C:\Users\greenbutton>wmic /node:10.243.1.134 process call create notepad.exe

Executing (Win32_Process)->Create()

Method execution successful.

Out Parameters:

instance of __PARAMETERS

{

        ProcessId = 2416;

        ReturnValue = 0;

};

No problems running the MPI command on notepad.exe

From 10.244.166.37

C:\Users\greenbutton>mpirun -np 2 -host 10.244.166.37,10.243.1.134 c:\windows\system32\notepad.exe

connecting to 10.243.1.134

username:greenbutton

password:*********

Save Credential?(Y/N) n

--------------------------------------------------------------------------

mpirun noticed that the job aborted, but has no info as to the process

that caused that situation.

--------------------------------------------------------------------------

(Works; blocked until notepad.exe killed on both nodes)

Running my command MPIHello still does not work across nodes; I believe there is an MPI communication problem between the processes, i.e. in MPI_Send/MPI_Recv. It worked with 2 processes but not 4. How do I go about resolving that? Is there a problem with the build settings of my executable?

C:\mpi\exe>mpirun -np 2 -host 10.244.166.37,10.243.1.134 MPIHello.exe

connecting to 10.243.1.134

username:greenbutton

password:*********

Save Credential?(Y/N) n

WE have 2 processors

Hello 1 Processor 1 at node AMAZONA-BMCKVD6 reporting for duty

(works -- output from rank 1)

C:\mpi\exe>

C:\mpi\exe>mpirun -np 4 -host 10.244.166.37,10.243.1.134 MPIHello.exe

connecting to 10.243.1.134

username:greenbutton

password:*********

Save Credential?(Y/N) n

WE have 4 processors

(hangs -- no output from ranks 1,2 or 3)

Please assist.

/Regards,/

/Peter/

*From:*Shiqing Fan [mailto:f...@hlrs.de]
*Sent:* Friday, 22 June 2012 8:11 p.m.
*To:* Open MPI Users
*Cc:* Peter Soukalopoulos
*Subject:* Re: [OMPI users] Mpiexec hanging when running "hello world" on 2 EC2 windows instances

Hi Peter,

Open MPI may use WMI to launch remote processes, so WMI has to be configured correctly. The README.WINDOWS file gives two links that describe how to set it up:

http://msdn.microsoft.com/en-us/library/aa393266(VS.85).aspx
http://community.spiceworks.com/topic/578

To test whether it works, you can use the following command:
wmic /node:remote_node_ip process call create notepad.exe

Then log on to the other Windows machine and check in Task Manager whether the notepad.exe process was created (don't forget to kill it afterwards).

If that works, this command will also work (the host list is comma-separated):
mpirun -np 2 -host host1,host2 notepad.exe

Please try running the two test commands above. If they both work, your application should also work. Just let me know if you have any questions or trouble with that.
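
If you want to script the two checks, the sketch below builds the argument lists for them. It only constructs the commands and does not run anything. The function names are hypothetical, and it assumes that Open MPI's -host option takes a comma-separated host list, as in the mpirun invocations elsewhere in this thread.

```python
def wmic_create_argv(node, program="notepad.exe"):
    """Argument list for the WMI test: start `program` on the remote node."""
    return ["wmic", f"/node:{node}", "process", "call", "create", program]


def mpirun_argv(hosts, program="notepad.exe", np=None):
    """Argument list for the mpirun test, one process per host by default."""
    count = len(hosts) if np is None else np
    # Assumption: -host expects the hosts joined by commas, without spaces.
    return ["mpirun", "-np", str(count), "-host", ",".join(hosts), program]


print(wmic_create_argv("10.243.1.134"))
print(mpirun_argv(["10.244.166.37", "10.243.1.134"]))
```

On the Windows node, each list could then be passed to subprocess.run to execute the check.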


Shiqing



On 2012-06-22 7:00 AM, Peter Soukalopoulos wrote:

    I am a newcomer to Open MPI.

    I have spent the last day trying to diagnose why a "hello world"
    MPI application compiled with Open MPI v1.6.1 (64-bit) hangs when
    run on two EC2 Windows instances. They are running on different
    subnets, so I'm using the --mca btl_tcp_if_include 10.0.0.0/8
    parameter. My two hosts are 10.242.73.81 and 10.116.114.238. I've
    placed the executable in the same path on both machines.
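
Both host addresses quoted above should fall inside the 10.0.0.0/8 range passed to btl_tcp_if_include. A quick way to confirm that is sketched below in Python with the standard ipaddress module. The helper name hosts_in_network is hypothetical, and the check only looks at the addresses; it says nothing about how Open MPI itself interprets the parameter.

```python
import ipaddress


def hosts_in_network(hosts, cidr):
    """Map each host address to True if it lies inside the CIDR range."""
    network = ipaddress.ip_network(cidr)
    return {host: ipaddress.ip_address(host) in network for host in hosts}


# The two instance addresses and the range from the mpiexec command line.
print(hosts_in_network(["10.242.73.81", "10.116.114.238"], "10.0.0.0/8"))
```

An address that maps to False would not be covered by the include range.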

    Diagnostic info requested is attached along with sample
    application source.

    When I run two processes on one instance -- the command succeeds:

    C:\mpi\exe>mpiexec -n 2 -host 10.242.73.81 --mca btl_tcp_if_include 10.0.0.0/8 MPIHello.exe

    WE have 2 processors

    Hello 1 Processor 1 at node AMAZONA-BMCKVD6 reporting for duty

    When I run across two hosts, the executable is launched on both
    instances but the process hangs:

    C:\mpi\exe>mpiexec -n 4 -host 10.242.73.81,10.116.114.238 --mca btl_tcp_if_include 10.0.0.0/8 MPIHello.exe

    connecting to 10.116.114.238

    username:greenbutton

    password:*********

    Save Credential?(Y/N) n

    WE have 4 processors

    Re-running with debug:

    C:\mpi\exe>mpiexec -n 4 -host 10.242.73.81,10.116.114.238 -d --mca btl_tcp_if_include 10.0.0.0/8 MPIHello.exe

    [AMAZONA-BMCKVD6:01240] procdir: C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0\63746\0\0

    [AMAZONA-BMCKVD6:01240] jobdir: C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0\63746\0

    [AMAZONA-BMCKVD6:01240] top: openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0

    [AMAZONA-BMCKVD6:01240] tmp: C:\Users\GREENB~1\AppData\Local\Temp\2

    [AMAZONA-BMCKVD6:01240] mpiexec: reset PATH: C:\Program Files (x86)\OpenMPI_v1.6-x64\bin;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;

    [AMAZONA-BMCKVD6:01240] mpiexec: reset LD_LIBRARY_PATH: C:\Program Files (x86)\OpenMPI_v1.6-x64\lib

    connecting to 10.116.114.238

    username:greenbutton

    password:*********

    Save Credential?(Y/N) n

    [AMAZONA-BMCKVD6:02728] procdir: C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0\63746\1\0

    [AMAZONA-BMCKVD6:02728] jobdir: C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0\63746\1

    [AMAZONA-BMCKVD6:02728] top: openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0

    [AMAZONA-BMCKVD6:02728] tmp: C:\Users\GREENB~1\AppData\Local\Temp\2

    [AMAZONA-BMCKVD6:02728] [[63746,1],0] node[0].name AMAZONA-BMCKVD6 daemon 0

    [AMAZONA-BMCKVD6:02728] [[63746,1],0] node[1].name 10 daemon 1

    [AMAZONA-BMCKVD6:01500] procdir: C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0\63746\1\2

    [AMAZONA-BMCKVD6:01500] jobdir: C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0\63746\1

    [AMAZONA-BMCKVD6:01500] top: openmpi-sessions-greenbutton@AMAZONA-BMCKVD6_0

    [AMAZONA-BMCKVD6:01500] tmp: C:\Users\GREENB~1\AppData\Local\Temp\2

    [AMAZONA-BMCKVD6:01500] [[63746,1],2] node[0].name AMAZONA-BMCKVD6 daemon 0

    [AMAZONA-BMCKVD6:01500] [[63746,1],2] node[1].name 10 daemon 1

    WE have 4 processors

    I'd appreciate any guidance to getting this example to run on two
    instances on disparate subnets on Windows Server 2008 R2.

    Thanks in advance for your help.

    /Regards, /

    /Peter /

    *Peter Soukalopoulos*
    *Development Team Leader | GreenButton Limited *|
    www.greenbutton.com <http://www.greenbutton.com/>
    Level 13, Simpl House, 40 Mercer Street, Wellington, New Zealand
    Mobile: +64 22 632 5023| peter.soukalopou...@greenbutton.com
    <mailto:peter.soukalopou...@greenbutton.com> | Skype: psoukal |
    HQ: +644 499 0424





--
---------------------------------------------------------------
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234      Nobelstrasse 19
Fax: ++49(0)711-685-65832      70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email:f...@hlrs.de  <mailto:f...@hlrs.de>

