Re: [OMPI users] Problem with MPI_FILE_WRITE_AT

2017-09-15 Thread Edgar Gabriel
Thank you for the report and the code; I will look into this. What file system is this occurring on? Until I find the problem, note that you can switch back to the previous parallel I/O implementation (romio) by passing it as a parameter to your mpirun command, e.g. mpirun --mca io
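The command above is truncated in the archive; as a rough sketch only, assuming the ROMIO-based component in this Open MPI release is exposed under the name romio314 (the component name and the remaining arguments are illustrative, not the exact command from the original mail):

    mpirun --mca io romio314 -np 16 ./my_app

The same selection can also be made through the OMPI_MCA_io environment variable or an MCA parameter file.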

[OMPI users] Problem with MPI_FILE_WRITE_AT

2017-09-15 Thread McGrattan, Kevin B. Dr. (Fed)
I am using MPI_FILE_WRITE_AT to print out the timings of subroutines in a big Fortran code. I have noticed since upgrading to Open MPI 2.1.1 that sometimes the file to be written is corrupted. Each MPI process is supposed to write out a character string that is 159 characters in length, plus a
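The Fortran code itself is not included in the post; below is a minimal C sketch of the access pattern being described, in which every rank writes one fixed-length record at an offset computed from its rank (the 160-byte record length, the file name, and the record contents are assumptions for illustration, not the poster's code):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define REC_LEN 160  /* assumed: 159 characters plus a trailing newline */

    int main(int argc, char **argv)
    {
        int rank;
        char record[REC_LEN];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Build a space-padded, fixed-length record for this rank. */
        memset(record, ' ', REC_LEN);
        int n = snprintf(record, REC_LEN, "rank %6d  subroutine timings ...", rank);
        record[n] = ' ';              /* overwrite snprintf's NUL so the record stays fixed length */
        record[REC_LEN - 1] = '\n';

        /* Each rank writes its own record at a rank-dependent byte offset. */
        MPI_File_open(MPI_COMM_WORLD, "timings.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, (MPI_Offset)rank * REC_LEN, record, REC_LEN,
                          MPI_CHAR, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }

With this layout the output file should be exactly nprocs * 160 bytes; comparing the file produced under ompio with one produced under the romio component may help narrow down where the corruption appears.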

Re: [OMPI users] Honor host_aliases file for tight SGE integration

2017-09-15 Thread r...@open-mpi.org
Hi Reuti, As far as I am concerned, you SGE users “own” the SGE support - so feel free to submit a patch! Ralph > On Sep 13, 2017, at 9:10 AM, Reuti wrote: > > Hi, > > I wonder whether it ever came up in discussion that SGE can have a > behavior similar to

Re: [OMPI users] openmpi 1.10.2 with the IBM lsf system

2017-09-15 Thread Josh Hursey
That line of code is here: https://github.com/open-mpi/ompi/blob/v1.10.2/orte/mca/plm/lsf/plm_lsf_module.c#L346 (Unfortunately we didn't catch the rc from lsb_launch to see why it failed - I'll fix that). So it looks like LSF failed to launch our daemon on one or more remote machines. This could
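For context, a hedged sketch (not the actual plm_lsf_module.c code) of what capturing that return code could look like; the LSF_DJOB_NOWAIT option and the surrounding variable names are assumptions, while lsberrno and lsb_sysmsg() are the LSF batch library's usual error-reporting facilities:

    #include <stdio.h>
    #include <lsf/lsbatch.h>   /* lsb_launch(), lsberrno, lsb_sysmsg() */

    /* Illustrative helper: start the daemon command on the given hosts and
     * report why lsb_launch() failed instead of discarding its return code. */
    static int launch_daemons(char **hosts, char **daemon_argv, char **env)
    {
        int rc = lsb_launch(hosts, daemon_argv, LSF_DJOB_NOWAIT, env);
        if (rc < 0) {
            fprintf(stderr, "lsb_launch failed: rc=%d, lsberrno=%d (%s)\n",
                    rc, lsberrno, lsb_sysmsg());
        }
        return rc;
    }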

[OMPI users] openmpi 1.10.2 with the IBM lsf system

2017-09-15 Thread Jing Gong
Hi, We tried to run an OpenFOAM job on 4480 CPUs under the IBM LSF system but got the following error messages: ... [bs209:16251] [[25529,0],0] ORTE_ERROR_LOG: The specified application failed to start in file

[OMPI users] MPI vendor error

2017-09-15 Thread Ludovic Raess
Hi, we have an issue on our 32-node Linux cluster regarding the use of Open MPI in an InfiniBand dual-rail configuration (2 IB ConnectX FDR single-port HCAs, CentOS 6.6, OFED 3.1, Open MPI 2.0.0, gcc 5.4, CUDA 7). On long runs (over ~10 days) involving more than 1 node (usually 64 MPI