Re: [OMPI users] Persistent Communication using MPI_SEND_INIT, MPI_RECV_INIT etc.

2013-03-26 Thread Jeff Squyres (jsquyres)
On Mar 25, 2013, at 10:21 PM, Timothy Stitt  wrote:

> I've inherited an MPI code that was written ~8-10 years ago

Always a fun situation to be in.  :-)

> and it predominantly uses MPI persistent communication routines for data 
> transfers, e.g. MPI_SEND_INIT, MPI_RECV_INIT, MPI_START, etc.  I was just 
> wondering if using persistent communication calls is still the most 
> efficient/scalable way to perform communication when the communication 
> pattern is known and fixed among neighboring processes? We regularly run 
> the code across an IB network, so would there be a benefit to rewriting the code 
> using another approach (e.g. MPI one-sided communication)?

The answer is: it depends.  :-)

Persistent is not a bad choice.  It separates one-time setup from the actual 
communication, and OMPI does actually optimize that reasonably well.  Hence, 
when you MPI_START a request, it's a pretty short hop until you get down to the 
actual network stack and start sending messages (not that normal MPI_SEND is 
expensive, mind you...).
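
To make that concrete, here is a minimal sketch of the persistent-request idiom 
under discussion (the neighbor ranks, counts, and tag are placeholders, not 
taken from your code):

    #include <mpi.h>

    void exchange(double *sendbuf, double *recvbuf, int count,
                  int left, int right, MPI_Comm comm, int niters)
    {
        MPI_Request reqs[2];

        /* One-time setup: bind buffers, peers, and tags to the requests. */
        MPI_Send_init(sendbuf, count, MPI_DOUBLE, right, 0, comm, &reqs[0]);
        MPI_Recv_init(recvbuf, count, MPI_DOUBLE, left,  0, comm, &reqs[1]);

        for (int i = 0; i < niters; i++) {
            MPI_Startall(2, reqs);                      /* start both transfers */
            /* ...overlap computation here if you can... */
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete the exchange */
        }

        /* Persistent requests must be freed explicitly when no longer needed. */
        MPI_Request_free(&reqs[0]);
        MPI_Request_free(&reqs[1]);
    }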

That being said, there are *many* factors that contribute to overall MPI 
performance.  Persistent vs. non-persistent communication is one, buffer re-use 
(for large messages) is another (especially over OpenFabrics networks, which 
you have/use, IIRC), pre-posting receives is another, etc.

It depends on your communication pattern, how much registered memory you have 
available (*** be sure to see 
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem -- even if 
you're not [yet] seeing those warnings ***), the network distance of each 
process, etc.

I'm not a huge fan of MPI one-sided, but there are definitely applications 
which fit its model quite well.  And I'm told by people whom I trust to 
understand that stuff much more deeply than I do that the new MPI-3 one-sided 
stuff is *good* -- albeit complex.  Before trying to use that stuff, be sure to 
read the MPI-3 one-sided chapter (not MPI-2.2 -- one-sided got revamped in 
MPI-3) and really understand it.  Open MPI's MPI-3 one-sided implementation is 
not yet complete, but Brian is actively working on it.
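
For reference, a bare-bones sketch of the MPI-3 one-sided style (a window plus 
fenced puts) -- illustrative only, with invented buffer names and sizes, and 
subject to the implementation caveat above:

    #include <mpi.h>

    void put_to_right_neighbor(int count, MPI_Comm comm)
    {
        int rank, nprocs;
        double *winbuf;
        MPI_Win win;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        /* Allocate window memory that other ranks are allowed to access. */
        MPI_Win_allocate(count * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, comm, &winbuf, &win);

        MPI_Win_fence(0, win);                   /* open an access epoch */
        MPI_Put(winbuf, count, MPI_DOUBLE,       /* write into the right */
                (rank + 1) % nprocs, 0,          /* neighbor's window    */
                count, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);                   /* close the epoch      */

        MPI_Win_free(&win);
    }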

There are also the neighborhood collectives that were introduced in MPI-3, which 
may or may not help you.  We don't have these ready yet, either.  I believe 
MPICH may have implementations of the neighborhood collectives; I don't know if 
they've had time to optimize them yet (you should ask them) -- i.e., I don't 
know whether they're any faster than a set of Send_init's at the beginning of 
time plus periodic Startall() calls.  YMMV -- you'll probably need to do some 
benchmarking to figure out what's best for your application.
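
And as a rough idea of what the MPI-3 neighborhood-collective interface looks 
like (hypothetical buffers on a 1-D periodic ring; again, not something Open MPI 
supported at the time of writing):

    #include <mpi.h>

    /* Each rank exchanges 'count' doubles with both Cartesian neighbors;
     * sendbuf and recvbuf each hold 2*count doubles. */
    void ring_exchange(const double *sendbuf, double *recvbuf, int count,
                       MPI_Comm comm)
    {
        int nprocs, dims[1] = {0}, periods[1] = {1};
        MPI_Comm cart;

        MPI_Comm_size(comm, &nprocs);
        MPI_Dims_create(nprocs, 1, dims);
        MPI_Cart_create(comm, 1, dims, periods, 0, &cart);

        MPI_Neighbor_alltoall(sendbuf, count, MPI_DOUBLE,
                              recvbuf, count, MPI_DOUBLE, cart);

        MPI_Comm_free(&cart);
    }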

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] seg fault and undefined symbol in openmpi-1.9r28203

2013-03-26 Thread Jeff Squyres (jsquyres)
On Mar 22, 2013, at 8:34 AM, Siegmar Gross 
 wrote:

> openSuSE Linux 12.1, x86_64, SunC 5.12, 32-bit
> --
> 
> linpc1 openmpi-1.9-Linux.x86_64.32_cc 113 tail log.make.Linux.x86_64.32_cc 
> Making all in mpi/fortran/use-mpi-f08
> make[2]: Entering directory `.../ompi/mpi/fortran/use-mpi-f08'
>  PPFC mpi-f08-sizeof.lo
>  PPFC mpi-f08.lo
> "../../../../../openmpi-1.9r28203/ompi/mpi/fortran/use-mpi-f08/mpi-f08.F90",
>  Line = 1, Column = 1: INTERNAL: Interrupt: Segmentation fault
> make[2]: *** [mpi-f08.lo] Error 1
> make[2]: Leaving directory `.../ompi/mpi/fortran/use-mpi-f08'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `.../ompi'
> make: *** [all-recursive] Error 1

Just to follow up on this thread for the web archives: I think we're talking 
about this error in a different thread (because we were too slow to answer this 
one -- sorry!).  It looks to me like a compiler error.

> Solaris 10, x86_64 and sparc, SunC 5.12, 32-bit and 64-bit
> --
> 
> sunpc1 openmpi-1.9-SunOS.x86_64.32_cc 16 tail -15 log.make.SunOS.x86_64.32_cc
> make[2]: Entering directory `.../opal/tools/wrappers'
>  CC   opal_wrapper.o
> "../../../../openmpi-1.9r28203/opal/include/opal/sys/ia32/atomic.h",
>  line 173: warning: parameter in inline asm statement unused: %2
> "../../../../openmpi-1.9r28203/opal/include/opal/sys/ia32/atomic.h",
>  line 193: warning: parameter in inline asm statement unused: %2
>  CCLD opal_wrapper
> Undefined   first referenced
> symbol in file
> opal_hwloc152_hwloc_solaris_get_chip_model ../../../opal/.libs/libopen-pal.so
> opal_hwloc152_hwloc_solaris_get_chip_type ../../../opal/.libs/libopen-pal.so
> ld: fatal: symbol referencing errors. No output written to .libs/opal_wrapper
> make[2]: *** [opal_wrapper] Error 2
> make[2]: Leaving directory `.../opal/tools/wrappers'

This one was fixed today on the OMPI trunk at SVN r28215.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] seg fault and 32-/64-bit mismatch in openmpi-1.7rc8r28201

2013-03-26 Thread Jeff Squyres (jsquyres)
On Mar 22, 2013, at 8:06 AM, Siegmar Gross 
 wrote:

> openSuSE Linux 12.1, x86_64, SunC 5.12, 32-bit
> --
> 
> tyr openmpi-1.7-Linux.x86_64.32_cc 547 tail log.make.Linux.x86_64.32_cc
> Making all in mpi/fortran/use-mpi-f08
> make[2]: Entering directory `.../ompi/mpi/fortran/use-mpi-f08'
>  PPFC mpi-f08-sizeof.lo
>  PPFC mpi-f08.lo
> "../../../../../openmpi-1.7rc8r28201/ompi/mpi/fortran/use-mpi-f08/mpi-f08.F90",
>  Line = 1, Column = 1: INTERNAL: Interrupt: Segmentation fault
> make[2]: *** [mpi-f08.lo] Error 1
> make[2]: Leaving directory `.../ompi/mpi/fortran/use-mpi-f08'

For this one, it looks like your Fortran compiler is borked (i.e., it's seg 
faulting!).  You might want to configure with --disable-mpi-fortran.  This will 
make your build faster, too.  :-)

> Solaris 10, x86_64 and sparc, SunC 5.12, 64-bit
> ---
> 
> tyr openmpi-1.7-SunOS.x86_64.64_cc 565 tail -15 log.make.SunOS.x86_64.64_cc
>  F77  libvt_fmpi_la-vt_fmpiconst_2.lo
>  F77LDlibvt-fmpi.la
> ld: fatal: file .libs/libvt_fmpi_la-vt_fmpiconst_1.o: wrong ELF class: 
> ELFCLASS32
> ld: fatal: file processing errors. No output written to 
> .libs/libvt-fmpi.so.0.0.0
> make[5]: *** [libvt-fmpi.la] Error 2
> make[5]: Leaving directory `.../ompi/contrib/vt/vt/vtlib'

I pinged the VT guys about your other post -- they don't always keep a close 
watch on the OMPI users list, and sometimes rely on an out-of-band nudge to get 
a reply.  But when they're nudged (which they just were), they're pretty good 
about replying.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] mpi_init waits 64 seconds if vpn is connected

2013-03-26 Thread Jeff Squyres (jsquyres)
On Mar 21, 2013, at 9:52 PM, David A. Boger  wrote:

> If I add "-mca oob_tcp_if_exclude cscotun0", then the corresponding address 
> for that vpn interface no longer shows up in contact.txt, but the problem 
> remains. I also added "-mca btl ^cscotun0 -mca btl_tcp_if_exclude cscotun0" 
> with no effect.

I don't know if you have resolved this or not, but I notice you seem to have an 
error in your exclude line.  You should probably do:

mpirun --mca oob_tcp_if_exclude lo,cscotun0 --mca btl_tcp_if_exclude lo,cscotun0 ...

Or, better yet, just *include* the one interface that you want (so that you 
don't have to remember to *also* exclude localhost, and/or any other 
problematic interfaces):

mpirun --mca oob_tcp_if_include en0 --mca btl_tcp_if_include en0 ...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] "Error setting file view" NPB BTIO

2013-03-26 Thread Jeff Squyres (jsquyres)
Sorry for the delay in replying.

I see that there's an NPB 3.3.1 available these days -- did that fix the bug, 
perchance?


On Mar 20, 2013, at 5:08 PM, kme...@cs.uh.edu wrote:

> Hello ,
> 
> I am running NAS parallel benchmark's BTIO benchmark (NPB v 3.3) for class
> D and 1 process.
> 
> `make bt CLASS=D SUBTYPE=full NPROCS=1`
> 
> I have provided gcc's `mcmodel=medium` flag along with -O3 during
> compilation. This is on an x86_64 machine.
> 
> I have tested with Open MPI 1.4.3 and 1.7, but I get "Error setting file view"
> when I run the benchmark. It works fine for 4 and 16 processes. Can someone
> point out what is going wrong? Thanks in advance.
> 
> 
> NAS Parallel Benchmarks 3.3 -- BT Benchmark
> 
> No input file inputbt.data. Using compiled defaults
> Size:  408x 408x 408
> Iterations:  250dt:   0.200
> Number of active processes: 1
> 
> BTIO -- FULL MPI-IO write interval:   5
> 
> Error setting file view
> --
> mpirun has exited due to process rank 0 with PID 6663 on
> node crill-003 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --
> 
> 
> 
> 
> 
> Regards,
> Kshitij Mehta
> PhD student
> University of Houston
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Minor bug: invalid values for opal_signal MCA parameter cause internal error

2013-03-26 Thread Jeff Squyres (jsquyres)
Just curious -- what docs are you referring to?  I don't see opal_signal 
referred to in the README, orterun.1, or the FAQ...


On Mar 20, 2013, at 11:32 AM, Ralph Castain  wrote:

> Simple to do - I added a clearer error message to the trunk and marked it for 
> inclusion in the eventual v1.7.1 release. I'll have to let someone else do 
> the docs as I don't fully grok the rationale behind it.
> 
> Thanks
> 
> On Mar 18, 2013, at 12:56 PM, Jeremiah Willcock  wrote:
> 
>> If a user gives an invalid value for the opal_signal MCA parameter, such as 
>> in the command:
>> 
>> mpirun -mca opal_signal x /bin/ls
>> 
>> the error produced by Open MPI 1.6.3 is:
>> 
>> --
>> It looks like opal_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during opal_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>> 
>> opal_util_register_stackhandlers failed
>> --> Returned value -5 instead of OPAL_SUCCESS
>> --
>> 
>> which claims to be an internal error, not an invalid argument given by a 
>> user.  That parameter also appears to be poorly documented in general 
>> (mentioned in ompi_info -a and on the mailing lists), and seems like it 
>> would be an incredibly useful debugging tool when running a crashing 
>> application under a debugger.
>> 
>> -- Jeremiah Willcock
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Running openmpi jobs on two system-librdmacm: couldn't read ABI version

2013-03-26 Thread Jeff Squyres (jsquyres)
Yes, it looks like you have a heterogeneous system (i.e., a binary compiled on 
one server doesn't necessarily run properly on another server).

In this case, you should see the heterogeneous section of the FAQ.

Fair warning, though -- heterogeneous systems are more difficult to 
manage/maintain/use than homogeneous systems...


On Mar 26, 2013, at 3:54 AM, Syed Ahsan Ali  wrote:

> It may be because the other system is running an upgraded version of Linux, which 
> does not have the InfiniBand drivers. Any solution?
> 
> 
> On Tue, Mar 26, 2013 at 12:42 PM, Syed Ahsan Ali  
> wrote:
> Tried this but mpirun exits with this error
>  
> mpirun -np 40 /home/MET/hrm/bin/hrm
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[33095,1],8]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
>   Host: pmd04.pakmet.com
> Another transport will be used instead, although this may result in
> lower performance.
> --
> --
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>   Process 1 ([[33095,1],28]) is on host: compute-02-00.private02.pakmet.com
>   Process 2 ([[33095,1],0]) is on host: pmd02
>   BTLs attempted: openib self sm
> Your MPI job is now going to abort; sorry.
> --
>  
>  
> Ahsan
> 
> On Fri, Mar 22, 2013 at 7:09 PM, Ralph Castain  wrote:
> 
> On Mar 22, 2013, at 3:42 AM, Syed Ahsan Ali  wrote:
> 
>> Actually, due to some database corruption I am not able to add any new node 
>> to the cluster from the installer node. So I want to run a parallel job on more 
>> nodes without adding them to the existing cluster.
>> You are right, the binaries must be present on the remote node as well.
>> Is this possible through NFS? Just as the compute nodes are NFS-mounted 
>> from the installer node.
> 
> Sure - OMPI doesn't care how the binaries got there. Just so long as they are 
> present on the compute node.
> 
>>  
>> Ahsan
>> 
>> 
>> On Fri, Mar 22, 2013 at 3:33 PM, Reuti  wrote:
>> Am 22.03.2013 um 10:14 schrieb Syed Ahsan Ali:
>> 
>> > I have a very basic question. If we want to run an mpirun job on two systems 
>> > which are not part of the cluster, then how can we make it possible? Can the 
>> > host be specified on mpirun which is not a compute node, rather a stand 
>> > alone system?
>> 
>> Sure, the machines can be specified as argument to `mpiexec`. But do you 
>> want to run applications just between these two machines, or should they 
>> participate on a larger parallel job with machines of the cluster: then a 
>> direct network connection between outside and inside of the cluster is 
>> necessary by some kind of forwarding in case these are separated networks.
>> 
>> Also the paths to the started binaries may be different, in case the two 
>> machines are not sharing the same /home with the cluster and this needs to 
>> be honored.
>> 
>> In case you are using a queuing system and want to route jobs to outside 
>> machines of the set up cluster: it's necessary to negotiate with the admin 
>> to allow jobs being scheduled thereto.
>> 
>> -- Reuti
>> 
>> 
>> > Thanks
>> > Ahsan
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> 
>> -- 
>> Syed Ahsan Ali Bokhari 
>> Electronic 
