Ok, thanks for your answers! I was not aware that this was a known issue. I guess I will just try to find a machine with OpenMPI/2.0.2 and try there.
On 16 February 2017 at 00:01, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want
> to give it a try.
>
> On Feb 15, 2017, at 1:14 PM, Jason Maldonis <maldo...@wisc.edu> wrote:
>
> Just to throw this out there -- to me, that doesn't seem to be just a
> problem with SLURM. I'm guessing the exact same error would be thrown
> interactively (unless I didn't read the above messages carefully enough).
> I had a lot of problems running spawned jobs on 2.0.x a few months ago,
> so I switched back to 1.10.2 and everything worked. Just in case that
> helps someone.
>
> Jason
>
> On Wed, Feb 15, 2017 at 1:09 PM, Anastasia Kruchinina
> <nastja.kruchin...@gmail.com> wrote:
>
>> Hi!
>>
>> I am doing this:
>>
>>   sbatch -N 2 -n 5 ./job.sh
>>
>> where job.sh is:
>>
>>   #!/bin/bash -l
>>   module load openmpi/2.0.1-icc
>>   mpirun -np 1 ./manager 4
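As a point of reference, a manager launched this way typically calls MPI_Comm_spawn from a single process. The following is only a minimal sketch of such a manager, not the poster's actual program (which is linked in the original post further down); the ./worker executable name and the integer handshake are assumptions:

    /* manager.c - hypothetical sketch; compile with: mpicc manager.c -o manager */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int provided;
        /* MPI_Init_thread is the call named in the error trace below */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        /* "./manager 4" -> spawn 4 workers */
        int nworkers = (argc > 1) ? atoi(argv[1]) : 1;

        MPI_Comm intercomm;
        int *errcodes = malloc(nworkers * sizeof(int));
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, errcodes);

        /* trivial handshake: receive one integer from each spawned worker */
        for (int i = 0; i < nworkers; i++) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, i, 0, intercomm, MPI_STATUS_IGNORE);
            printf("manager: got %d from worker %d\n", msg, i);
        }

        free(errcodes);
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }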
>> On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>
>>> The cmd line looks fine - when you do your "sbatch" request, what is in
>>> the shell script you give it? Or are you saying you just "sbatch" the
>>> mpirun cmd directly?
>>>
>>> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina
>>> <nastja.kruchin...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am running like this:
>>>
>>>   mpirun -np 1 ./manager
>>>
>>> Should I do it differently?
>>>
>>> I also thought that all sbatch does is create an allocation and then
>>> run my script in it. But it seems it does not, since I am getting these
>>> results...
>>>
>>> I would like to upgrade OpenMPI, but no clusters near me have it yet :(
>>> So I cannot even check if it works with OpenMPI 2.0.2.
>>>
>>> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>>> Hi Anastasia,
>>>>
>>>> Definitely check the mpirun when in the batch environment, but you may
>>>> also want to upgrade to Open MPI 2.0.2.
>>>>
>>>> Howard
>>>>
>>>> On Wed, 15 Feb 2017 at 07:49, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>>>
>>>>> Nothing immediate comes to mind - all sbatch does is create an
>>>>> allocation and then run your script in it. Perhaps your script is
>>>>> using a different "mpirun" command than when you type it interactively?
>>>>>
>>>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina
>>>>> <nastja.kruchin...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to use the MPI_Comm_spawn function in my code. I am having
>>>>> trouble with openmpi 2.0.x + sbatch (batch system Slurm).
>>>>> My test program is located here:
>>>>> http://user.it.uu.se/~anakr367/files/MPI_test/
>>>>>
>>>>> When I run my code I get an error:
>>>>>
>>>>> OPAL ERROR: Timeout in file
>>>>> ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
>>>>> *** An error occurred in MPI_Init_thread
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> *** and potentially your MPI job)
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>>   ompi_dpm_dyn_init() failed
>>>>>   --> Returned "Timeout" (-15) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> The interesting thing is that there is no error when I first allocate
>>>>> nodes with salloc and then run my program. So the program works fine
>>>>> with openmpi 1.x + sbatch/salloc, or with openmpi 2.0.x + salloc, but
>>>>> not with openmpi 2.0.x + sbatch.
>>>>>
>>>>> The error was reproduced on three different computer clusters.
>>>>>
>>>>> Best regards,
>>>>> Anastasia
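For completeness, a matching worker sketch (again hypothetical, not the poster's code) retrieves the parent intercommunicator and reports back:

    /* worker.c - hypothetical counterpart; compile with: mpicc worker.c -o worker */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        MPI_Comm parent;
        MPI_Comm_get_parent(&parent);
        if (parent == MPI_COMM_NULL) {
            fprintf(stderr, "worker: not started via MPI_Comm_spawn\n");
            MPI_Finalize();
            return 1;
        }

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* report back to the manager (rank 0 of the remote group) */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, parent);

        MPI_Comm_disconnect(&parent);
        MPI_Finalize();
        return 0;
    }

With a pair like this, the salloc workaround described above amounts to running "salloc -N 2 -n 5" to get an interactive allocation and then typing "mpirun -np 1 ./manager 4" inside it, instead of wrapping the same mpirun command in a script submitted with sbatch.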