iirc, MPI_Comm_spawn should be used to spawn MPI applications only,
and depending on your interconnect, fork() might not be supported from an
MPI app.
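
that would also explain the hang you are seeing with
MPI.COMM_SELF.Spawn(): the spawned program must itself be an MPI program
(it has to call MPI_Init, which mpi4py does at import time), and
MontePython is serial. here is a minimal sketch of how Spawn is meant to
be used -- untested, and worker.py is a made-up file name:

# parent.py: spawn 4 copies of an MPI-enabled worker script
import sys
from mpi4py import MPI

comm = MPI.COMM_SELF.Spawn(sys.executable, args=['worker.py'], maxprocs=4)
comm.Disconnect()

# worker.py: each spawned rank checks in and exits
from mpi4py import MPI   # importing mpi4py initializes MPI

parent = MPI.Comm.Get_parent()
print('spawned worker rank %d' % MPI.COMM_WORLD.Get_rank())
parent.Disconnect()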

that being said, I am not sure MPI is the best way to go here.
you might want to use your batch manager's API to execute tasks on remote
nodes, or a third-party tool such as ClusterShell. it is written in
Python and also provides a Python API.
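
for example, with ClusterShell's Task API, something along these lines
should work (a rough sketch, untested; node[1-4] is a placeholder for
your actual node names):

from ClusterShell.Task import task_self

task = task_self()
# run the command on the given nodes and wait for completion
task.run('echo hello', nodes='node[1-4]')
# print the output gathered from each set of nodes
for buf, nodes in task.iter_buffers():
    print('%s: %s' % (nodes, buf))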

Cheers,

Gilles

On Thursday, October 8, 2015, simona bellavista <afy...@gmail.com> wrote:

>
>
> 2015-10-07 14:59 GMT+02:00 Lisandro Dalcin <dalc...@gmail.com>:
>
>> On 7 October 2015 at 14:54, simona bellavista <afy...@gmail.com> wrote:
>> > I have written a small piece of code in Python 2.7 for launching 4
>> > independent processes on the shell via subprocess, using the library
>> > mpi4py. I am getting ORTE_ERROR_LOG and I would like to understand
>> > where it is happening and why.
>> >
>> > This is my code:
>> >
>> > #!/usr/bin/python
>> > import subprocess
>> > import re
>> > import sys
>> > from mpi4py import MPI
>> >
>> > def main():
>> >     root='base'
>> >     comm = MPI.COMM_WORLD
>> >     if comm.rank == 0:
>> >         job = [root+str(i) for i in range(4)]
>> >     else:
>> >         job = None
>> >
>> >     job = comm.scatter(job, root=0)
>> >     cmd = ("../../montepython/montepython/MontePython.py"
>> >            " -conf ../config/default.conf -p ../config/XXXX.param"
>> >            " -o ../chains/XXXX -N 10000 > XXXX.log")
>> >
>> >     cmd_job = re.sub(r"XXXX", job, cmd)
>> >     subprocess.check_call(cmd_job, shell=True)
>> >     return
>> >
>> > if __name__ == '__main__':
>> >   main()
>> >
>> > I am running with the command:
>> >
>> > mpirun -np 4 ./run.py
>> >
>> > This is the error message that I get:
>> >
>> > [localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file
>> > base/odls_base_default_fns.c at line 1762
>> > [localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file
>> > orted/orted_comm.c at line 916
>> > [localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file
>> > base/odls_base_default_fns.c at line 1762
>> > [localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file
>> > orted/orted_comm.c at line 916
>> >
>> --------------------------------------------------------------------------
>> > A system call failed during shared memory initialization that should
>> > not have.  It is likely that your MPI job will now either abort or
>> > experience performance degradation.
>> >
>> >   Local host:  localhost
>> >   System call: open(2)
>> >   Error:       No such file or directory (errno 2)
>> >
>> --------------------------------------------------------------------------
>> >
>> >
>> > I cannot understand where the error is happening. MontePython by itself
>> > should not use MPI, because it should be serial.
>> >
>>
>> This is likely related to a bad interaction between the way Python's
>> subprocess is implemented and the MPI implementation.
>>
>> Anyway, you should not use mpi4py for such simple, trivial
>> parallelism; I recommend taking a look at Python's
>> multiprocessing module.
>>
>> If for any reason you want to go the MPI way, you should use MPI
>> dynamic process management, e.g. MPI.COMM_SELF.Spawn(...).
>>
>>
>> --
>> Lisandro Dalcin
>> ============
>> Research Scientist
>> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
>> Numerical Porous Media Center (NumPor)
>> King Abdullah University of Science and Technology (KAUST)
>> http://numpor.kaust.edu.sa/
>>
>> 4700 King Abdullah University of Science and Technology
>> al-Khawarizmi Bldg (Bldg 1), Office # 4332
>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>> http://www.kaust.edu.sa
>>
>> Office Phone: +966 12 808-0459
>>
>
>
> I cannot figure out how Spawn would work with a string command. I tried
> MPI.COMM_SELF.Spawn(cmd, args=None, maxprocs=4) and it just hangs.
>
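
fwiw, here is a rough sketch of what Lisandro suggested with the
multiprocessing module -- untested, with the paths simply copied from
your script:

import subprocess
from multiprocessing import Pool

def run_job(job):
    # build the command line for one serial MontePython run
    cmd = ("../../montepython/montepython/MontePython.py"
           " -conf ../config/default.conf -p ../config/%s.param"
           " -o ../chains/%s -N 10000 > %s.log") % (job, job, job)
    subprocess.check_call(cmd, shell=True)

if __name__ == '__main__':
    jobs = ['base' + str(i) for i in range(4)]
    # run the 4 independent chains in parallel, no MPI involved
    pool = Pool(processes=4)
    pool.map(run_job, jobs)
    pool.close()
    pool.join()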
