Good suggestion. Measured that way, the overall walltime between MPI_Init and MPI_Finalize reveals little difference between Intel MPI and Open MPI: for example, intelmpi = 3.76 min and openmpi = 3.73 min, while the walltime reported by PBS Pro is intelmpi = 3.82 min and openmpi = 3.80 min.
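For reference, here is a minimal sketch of the kind of bracketing I mean (illustrative only: the helper and variable names are made up for this example and the communication/computation bodies are elided, so treat it as a template rather than my actual code):

#define _POSIX_C_SOURCE 199309L
#include <mpi.h>
#include <stdio.h>
#include <time.h>

/* Wall clock that is also valid outside the MPI_Init..MPI_Finalize region. */
static double wall_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    double t_prog0 = wall_now();          /* program start, before MPI_Init */
    MPI_Init(&argc, &argv);
    double t_init = MPI_Wtime();          /* right after MPI_Init */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double commuT = 0.0, compuT = 0.0;    /* accumulators, as in my table */
    for (int iter = 0; iter < 10000; ++iter) {
        double t0 = MPI_Wtime();
        /* ... communication (halo exchange, migration, print) ... */
        double t1 = MPI_Wtime();
        /* ... local computation ... */
        double t2 = MPI_Wtime();
        commuT += t1 - t0;
        compuT += t2 - t1;
    }

    double t_fin = MPI_Wtime();           /* right before MPI_Finalize */
    MPI_Finalize();
    double t_prog1 = wall_now();          /* program end, after MPI_Finalize */

    if (rank == 0) {
        double totalT = commuT + compuT;
        if (totalT > 0.0)
            printf("commuT=%e compuT=%e overhead%%=%e\n",
                   commuT, compuT, 100.0 * commuT / totalT);
        /* Whatever PBS Pro bills beyond this bracket is startup/shutdown: */
        printf("inside bracket: %.2f s, whole program: %.2f s\n",
               t_fin - t_init, t_prog1 - t_prog0);
    }
    return 0;
}

If the per-iteration numbers and the bracketed span agree between the two libraries but the PBS Pro walltime does not, the difference lies outside the bracket, i.e., in startup/shutdown, which is exactly what you suggest checking.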
Beichuan

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, March 21, 2014 07:06
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI job initializing problem

One thing to check would be the time spent between MPI_Init and MPI_Finalize - i.e., see if the time difference is caused by differences in init and finalize themselves. My guess is that is the source - it would help us target the problem.

On Mar 20, 2014, at 9:00 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:

> Here is an example of my data, measured in seconds:
>
> communication overhead = commuT + migraT + printT; compuT is the computational cost; totalT = compuT + communication overhead; overhead% denotes the percentage of communication overhead.
>
> intelmpi (walltime=00:03:51)
> iter  [commuT        migraT        printT      ]  compuT        totalT        overhead%
> 3999   4.945993e-03  2.689362e-04  1.440048e-04   1.689100e-02  2.224994e-02  2.343795e+01
> 5999   4.938126e-03  1.451969e-04  2.689362e-04   1.663089e-02  2.198315e-02  2.312373e+01
> 7999   4.904985e-03  1.490116e-04  1.451969e-04   1.678491e-02  2.198410e-02  2.298933e+01
> 9999   4.915953e-03  1.380444e-04  1.490116e-04   1.687193e-02  2.207494e-02  2.289473e+01
>
> openmpi (walltime=00:04:32)
> iter  [commuT        migraT        printT      ]  compuT        totalT        overhead%
> 3999   3.574133e-03  1.139641e-04  1.089573e-04   1.598001e-02  1.977706e-02  1.864836e+01
> 5999   3.574848e-03  1.189709e-04  1.139641e-04   1.599526e-02  1.980305e-02  1.865278e+01
> 7999   3.571033e-03  1.168251e-04  1.189709e-04   1.601100e-02  1.981783e-02  1.860879e+01
> 9999   3.587008e-03  1.258850e-04  1.168251e-04   1.596618e-02  1.979589e-02  1.875587e+01
>
> It can be seen that Open MPI is faster in both communication and computation as measured by MPI_Wtime calls, but the walltime reported by PBS Pro is larger.
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
> Sent: Thursday, March 20, 2014 15:08
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI job initializing problem
>
> On 03/20/2014 04:48 PM, Beichuan Yan wrote:
>> Ralph and Noam,
>>
>> Thanks for the clarifications, they are important. I could be wrong in understanding the filesystem.
>>
>> Spirit appears to use a scratch directory for shared-memory backing which is mounted on Lustre, and does not seem to have local directories or does not allow the user to change TMPDIR. Here is the info:
>> [compute node]$ stat -f -L -c %T /tmp
>> tmpfs
>> [compute node]$ stat -f -L -c %T /home/yanb/scratch
>> lustre
>
> So, /tmp is a tmpfs, in memory/RAM.
> Maybe they don't open writing permissions for regular users on /tmp?
>
>> On another university supercomputer, I found the following:
>> node0448[~]$ stat -f -L -c %T /tmp
>> ramfs
>> node0448[~]$ stat -f -L -c %T /home/yanb/scratch/
>> lustre
>> Is this /tmp on a compute node a local directory? I don't know how to tell.
>>
>> Thanks,
>> Beichuan
>>
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Thursday, March 20, 2014 12:13
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>
>> On Mar 20, 2014, at 9:48 AM, Beichuan Yan <beichuan....@colorado.edu> wrote:
>>
>>> Hi,
>>>
>>> Today I tested OMPI v1.7.5rc5 and surprisingly, it works like a charm!
>>>
>>> I found discussions related to this issue:
>>>
>>> 1. http://www.open-mpi.org/community/lists/users/2011/11/17688.php
>>> The correct solution here is get your sys admin to make /tmp local.
>>> Making /tmp NFS mounted across multiple nodes is a major "faux pas" in the Linux world - it should never be done, for the reasons stated by Jeff.
>>>
>>> my comment: for most clusters I have used, /tmp is NOT local. The Open MPI community may not enforce it.
>>
>> We don't enforce anything, but /tmp being network mounted is a VERY unusual situation in the cluster world, and highly unrecommended.
>>
>>> 2. http://www.open-mpi.org/community/lists/users/2011/11/17684.php
>>> In the upcoming OMPI v1.7, we revamped the shared memory setup code such that it'll actually use /dev/shm properly, or use some other mechanism other than a mmap file backed in a real filesystem. So the issue goes away.
>>>
>>> my comment: up to OMPI v1.7.4, this shmem issue is still there. However, it is resolved in OMPI v1.7.5rc5. This is surprising.
>>>
>>> Anyway, OMPI v1.7.5rc5 works well in multi-processes-on-one-node (shmem) mode on Spirit. There is no need to tune TCP or IB parameters to use it. My code just runs well:
>>>
>>> My test data takes 20 minutes to run with OMPI v1.7.4, but needs less than 1 minute with OMPI v1.7.5rc5. I don't know what the magic is. I am wondering when OMPI v1.7.5 final will be released.
>>>
>>> I will update the performance comparison between Intel MPI and Open MPI.
>>>
>>> Thanks,
>>> Beichuan
>>>
>>> -----Original Message-----
>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
>>> Sent: Friday, March 07, 2014 18:41
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>
>>> On 03/06/2014 04:52 PM, Beichuan Yan wrote:
>>>> No, I did all these and none worked.
>>>>
>>>> I just found that, with exactly the same code, data, and job settings, a job can really run one day but cannot the other day. It is NOT repeatable. I don't know what the problem is: hardware? OpenMPI? PBS Pro?
>>>>
>>>> Anyway, I may have to give up using OpenMPI on that system and switch to IntelMPI, which always works.
>>>>
>>>> Thanks,
>>>> Beichuan
>>>
>>> Well, this machine may have been set up to run only Intel MPI (DAPL?) and SGI MPI.
>>> It is a pity that it doesn't seem to work with OpenMPI.
>>>
>>> In any case, good luck with your research project.
>>>
>>> Gus Correa
>>>
>>>> -----Original Message-----
>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
>>>> Sent: Thursday, March 06, 2014 13:51
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>
>>>> On 03/06/2014 03:35 PM, Beichuan Yan wrote:
>>>>> Gus,
>>>>>
>>>>> Yes, 10.148.0.0/16 is the IB subnet.
>>>>>
>>>>> I did try others but none worked:
>>>>> #export
>>>>> TCP="--mca btl sm,openib"
>>>>> No run, no output
>>>>
>>>> If I remember right, and unless this changed in recent OMPI versions, you also need "self":
>>>>
>>>> -mca btl sm,openib,self
>>>>
>>>> Alternatively, you could rule out tcp:
>>>>
>>>> -mca btl ^tcp
>>>>
>>>>> #export
>>>>> TCP="--mca btl sm,openib --mca btl_tcp_if_include 10.148.0.0/16"
>>>>> No run, no output
>>>>>
>>>>> Beichuan
>>>>
>>>> Likewise, "self" is missing here.
>>>>
>>>> Also, I don't know if you can ask for openib and also add --mca btl_tcp_if_include 10.148.0.0/16.
>>>> Note that one turns off tcp (I think), whereas the other requests a tcp interface (or the IB interface with IPoIB functionality).
>>>> That combination sounds weird to me.
>>>> The OMPI developers may clarify whether this is a valid syntax combination.
>>>>
>>>> I would try simply -mca btl sm,openib,self, which is likely to give you the IB transport with verbs, plus shared memory intra-node, plus the (mandatory?) self (loopback interface?).
>>>> In my experience, this will also help identify any malfunctioning IB HCA in the nodes (with a failure/error message).
>>>>
>>>> I hope it helps,
>>>> Gus Correa
>>>>
>>>>> -----Original Message-----
>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
>>>>> Sent: Thursday, March 06, 2014 13:16
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>
>>>>> Hi Beichuan
>>>>>
>>>>> So, it looks like the program now runs, even though with specific settings depending on whether you're using OMPI 1.6.5 or 1.7.4, right?
>>>>>
>>>>> It looks like the problem now is performance, right?
>>>>>
>>>>> System load affects performance, but unless the network is overwhelmed, or perhaps the Lustre file system is hanging or too slow, I would think that a walltime increase from 1 min to 10 min is not related to system load, but something else.
>>>>>
>>>>> Do you remember the setup that gave you 1 min walltime?
>>>>> Was it the same that you sent below?
>>>>> Do you happen to know which nodes?
>>>>> Are you sharing nodes with other jobs, or are you running alone on the nodes?
>>>>> Sharing with other processes may slow down your job.
>>>>> If you request all cores in the node, PBS should give you a full node (unless they tricked PBS into thinking the nodes have more cores than they actually do).
>>>>> How do you request the nodes in your #PBS directives?
>>>>> Do you request nodes and ppn, or do you request procs?
>>>>>
>>>>> I suggest that you do:
>>>>> cat $PBS_NODEFILE
>>>>> in your PBS script, just to document which nodes are actually given to you.
>>>>>
>>>>> Also helpful to document/troubleshoot is to add -v and -tag-output to your mpiexec command line.
>>>>>
>>>>> The difference in walltime could be due to some malfunction of the IB HCAs on the nodes, for instance.
>>>>> Since you are allowing (if I remember right) the use of TCP, OpenMPI will try to use any interfaces that you did not rule out.
>>>>> If your mpiexec command line doesn't make any restriction, it will use anything available, if I remember right.
>>>>> (Jeff will correct me in the next second.) If your mpiexec command line has --mca btl_tcp_if_include 10.148.0.0/16, it will use the 10.148.0.0/16 subnet with TCP transport, I think.
>>>>> (Jeff will cut my list subscription after that one, for spreading misinformation.)
>>>>>
>>>>> In either case my impression is that you may have left a door open to the use of non-IB (and non-IB-verbs) transport.
>>>>>
>>>>> Is 10.148.0.0/16 an InfiniBand subnet or an Ethernet subnet?
>>>>>
>>>>> Did you remember Jeff's suggestion from a while ago to avoid TCP (over Ethernet or over IB), and stick to IB verbs?
>>>>>
>>>>> Is 10.148.0.0/16 the IB or the Ethernet subnet?
>>>>>
>>>>> On 03/02/2014 02:38 PM, Jeff Squyres (jsquyres) wrote:
>>>>>> Both 1.6.x and 1.7.x/1.8.x will need verbs.h to use the native verbs network stack.
>>>>>>
>>>>>> You can use emulated TCP over IB (e.g., using the OMPI TCP BTL), but it's nowhere near as fast/efficient as the native verbs network stack.
>>>>> You could force the use of IB verbs with
>>>>>
>>>>> -mca btl ^tcp
>>>>>
>>>>> or with
>>>>>
>>>>> -mca btl sm,openib,self
>>>>>
>>>>> on the mpiexec command line.
>>>>>
>>>>> In this case, if any of the IB HCAs on the nodes is bad, the job will abort with an error message, instead of running too slowly (if it is using other networks).
>>>>>
>>>>> There are also ways to tell OMPI to produce more verbose output, which may perhaps help diagnose the problem.
>>>>> ompi_info | grep verbose
>>>>> may give some hints (I confess I don't remember them).
>>>>>
>>>>> Believe me, this did happen to me, i.e., running MPI programs on a cluster that had all sorts of non-homogeneous nodes, some with faulty IB HCAs, some with incomplete OFED installations, some that were not mounting shared file systems properly, etc.
>>>>> [I didn't administer that one!]
>>>>> Hopefully that is not the problem you are facing, but verbose output may help anyway.
>>>>>
>>>>> I hope this helps,
>>>>> Gus Correa
>>>>>
>>>>> On 03/06/2014 01:49 PM, Beichuan Yan wrote:
>>>>>> 1. For $TMPDIR and $TCP, there are four combinations by commenting on/off (note the system's default TMPDIR=/work3/yanb):
>>>>>> export TMPDIR=/work1/home/yanb/tmp
>>>>>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>
>>>>>> 2. I tested the 4 combinations for OpenMPI 1.6.5 and OpenMPI 1.7.4, respectively, in the pure-MPI mode (no OpenMP threads; 8 nodes, each node runs 16 processes). The results are weird: of all 8 cases, only TWO of them can run, and they run very slowly:
>>>>>>
>>>>>> OpenMPI 1.6.5:
>>>>>> export TMPDIR=/work1/home/yanb/tmp
>>>>>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>> Warning: shared-memory, /work1/home/yanb/tmp/
>>>>>> Runs, takes 10 minutes, slow
>>>>>>
>>>>>> OpenMPI 1.7.4:
>>>>>> #export TMPDIR=/work1/home/yanb/tmp
>>>>>> #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>> Warning: shared-memory, /work3/yanb/605832.SPIRIT/
>>>>>> Runs, takes 10 minutes, slow
>>>>>>
>>>>>> So you see: a) OpenMPI 1.6.5 and 1.7.4 need different settings to run; b) whether or not I specify TMPDIR, I get the shared-memory warning.
>>>>>>
>>>>>> 3. But a few days ago, OpenMPI 1.6.5 worked great and took only 1 minute (now it takes 10 minutes). I am so confused by the results. Does the system load level, load fluctuation, or PBS Pro affect OpenMPI performance?
>>>>>>
>>>>>> Thanks,
>>>>>> Beichuan
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
>>>>>> Sent: Tuesday, March 04, 2014 08:48
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>
>>>>>> Hi Beichuan
>>>>>>
>>>>>> So, from "df" it looks like /home is /work1, right?
>>>>>>
>>>>>> Also, "mount" shows only /work[1-4], not the other 7 CWFS panfs (Panasas?) file systems, which apparently are not available on the compute nodes/blades.
>>>>>>
>>>>>> I presume you have access to and are using only some of the /work[1-4] (lustre) file systems for all your MPI and other software installation, right? Not the panfs, right?
>>>>>>
>>>>>> Awkward that it doesn't work, because lustre is supposed to be a parallel file system, highly available to all nodes (assuming it is mounted on all nodes).
>>>>>> It also shows a small /tmp with a tmpfs file system, which is volatile, in memory:
>>>>>>
>>>>>> http://en.wikipedia.org/wiki/Tmpfs
>>>>>>
>>>>>> I would guess they don't let you write there, so TMPDIR=/tmp may not be a possible option, but this is just a wild guess.
>>>>>> Or maybe OMPI requires an actual non-volatile file system to write its shared-memory auxiliary files and other stuff that normally goes in /tmp? [Jeff, Ralph, help!!] I kind of remember some old discussion on this list about this, but maybe it was on another list.
>>>>>>
>>>>>> [You could ask the sys admin about this, and perhaps what he recommends to use to replace /tmp.]
>>>>>>
>>>>>> Just in case they have some file system mount point mixup, you could perhaps try TMPDIR=/work1/yanb/tmp (rather than /home). You could also try TMPDIR=/work3/yanb/tmp, as, if I remember right, this is another file system you have access to (not sure anymore, it may have been in the previous emails).
>>>>>> Either way, you may need to create the tmp directory beforehand.
>>>>>>
>>>>>> **
>>>>>>
>>>>>> Any chance that this is an environment mixup?
>>>>>>
>>>>>> Say, that you may be inadvertently using the SGI MPI mpiexec. Using a /full/path/to/mpiexec in your job may clarify this.
>>>>>>
>>>>>> "which mpiexec" will tell, but since the environment on the compute nodes may not be exactly the same as on the login node, it may not be reliable information.
>>>>>>
>>>>>> Or perhaps you may not be pointing to the OMPI libraries?
>>>>>> Are you exporting PATH and LD_LIBRARY_PATH in .bashrc/.tcshrc, with the OMPI items (bin and lib) *PREPENDED* (not appended), so as to take precedence over other possible/SGI/pre-existing MPI items?
>>>>>>
>>>>>> Those are pretty (ugly) common problems.
>>>>>>
>>>>>> **
>>>>>>
>>>>>> I hope this helps,
>>>>>> Gus Correa
>>>>>>
>>>>>> On 03/03/2014 10:13 PM, Beichuan Yan wrote:
>>>>>>> 1. Info from a compute node:
>>>>>>> -bash-4.1$ hostname
>>>>>>> r32i1n1
>>>>>>> -bash-4.1$ df -h /home
>>>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1
>>>>>>>                       1.2P  136T  1.1P  12% /work1
>>>>>>> -bash-4.1$ mount
>>>>>>> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
>>>>>>> tmpfs on /tmp type tmpfs (rw,size=150m)
>>>>>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>>>>>>> cpuset on /dev/cpuset type cpuset (rw)
>>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1 on /work1 type lustre (rw,flock)
>>>>>>> 10.148.18.76@o2ib:10.148.18.164@o2ib:/fs2 on /work2 type lustre (rw,flock)
>>>>>>> 10.148.18.104@o2ib:10.148.18.165@o2ib:/fs3 on /work3 type lustre (rw,flock)
>>>>>>> 10.148.18.132@o2ib:10.148.18.133@o2ib:/fs4 on /work4 type lustre (rw,flock)
>>>>>>>
>>>>>>> 2. For "export TMPDIR=/home/yanb/tmp", I created it beforehand, and I did see MPI-related temporary files there when the job started.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
>>>>>>> Sent: Monday, March 03, 2014 18:23
>>>>>>> To: Open MPI Users
>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>>
>>>>>>> Hi Beichuan
>>>>>>>
>>>>>>> OK, it says "unclassified.html", so I presume it is not a problem.
>>>>>>>
>>>>>>> The web site says the computer is an SGI ICE X.
>>>>>>> I am not familiar with it, so what follows are guesses.
>>>>>>> The SGI site brochure suggests that the nodes/blades have local disks:
>>>>>>> https://www.sgi.com/pdfs/4330.pdf
>>>>>>>
>>>>>>> The file systems prefixed with IP addresses (work[1-4]) and with panfs (cwfs and CWFS[1-6]) and a colon (:) are shared exports (not local), but not necessarily NFS (panfs may be Panasas?).
>>>>>>> From this output it is hard to tell where /home is, but I would guess it is also shared (not local).
>>>>>>> Maybe "df -h /home" will tell. Or perhaps "mount".
>>>>>>>
>>>>>>> You may be logged in to a login/service node, so although it does have a /tmp (your "ls /" shows tmp), this doesn't guarantee that the compute nodes/blades also do.
>>>>>>>
>>>>>>> Since your jobs failed when you specified TMPDIR=/tmp, I would guess /tmp doesn't exist on the nodes/blades, or is not writable.
>>>>>>>
>>>>>>> Did you try to submit a job with, say, "mpiexec -np 16 ls -ld /tmp"?
>>>>>>> This should tell whether /tmp exists on the nodes, and whether it is writable.
>>>>>>>
>>>>>>> A stupid question:
>>>>>>> When you tried your job with this:
>>>>>>>
>>>>>>> export TMPDIR=/home/yanb/tmp
>>>>>>>
>>>>>>> did you create the directory /home/yanb/tmp beforehand?
>>>>>>>
>>>>>>> Anyway, you may need to ask the help of a system administrator of this machine.
>>>>>>>
>>>>>>> Gus Correa
>>>>>>>
>>>>>>> On 03/03/2014 07:43 PM, Beichuan Yan wrote:
>>>>>>>> Gus,
>>>>>>>>
>>>>>>>> I am using this system: http://centers.hpc.mil/systems/unclassified.html#Spirit. I don't know the exact configuration of the file system. Here is the output of "df -h":
>>>>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>>>>> /dev/sda6             919G   16G  857G   2% /
>>>>>>>> tmpfs                  32G     0   32G   0% /dev/shm
>>>>>>>> /dev/sda5             139M   33M  100M  25% /boot
>>>>>>>> adfs3v-s:/adfs3/hafs14
>>>>>>>>                       6.5T  678G  5.5T  11% /scratch
>>>>>>>> adfs3v-s:/adfs3/hafs16
>>>>>>>>                       6.5T  678G  5.5T  11% /var/spool/mail
>>>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1
>>>>>>>>                       1.2P  136T  1.1P  12% /work1
>>>>>>>> 10.148.18.132@o2ib:10.148.18.133@o2ib:/fs4
>>>>>>>>                       1.2P  793T  368T  69% /work4
>>>>>>>> 10.148.18.104@o2ib:10.148.18.165@o2ib:/fs3
>>>>>>>>                       1.2P  509T  652T  44% /work3
>>>>>>>> 10.148.18.76@o2ib:10.148.18.164@o2ib:/fs2
>>>>>>>>                       1.2P  521T  640T  45% /work2
>>>>>>>> panfs://172.16.0.10/CWFS
>>>>>>>>                       728T  286T  443T  40% /p/cwfs
>>>>>>>> panfs://172.16.1.61/CWFS1
>>>>>>>>                       728T  286T  443T  40% /p/CWFS1
>>>>>>>> panfs://172.16.0.210/CWFS2
>>>>>>>>                       728T  286T  443T  40% /p/CWFS2
>>>>>>>> panfs://172.16.1.125/CWFS3
>>>>>>>>                       728T  286T  443T  40% /p/CWFS3
>>>>>>>> panfs://172.16.1.224/CWFS4
>>>>>>>>                       728T  286T  443T  40% /p/CWFS4
>>>>>>>> panfs://172.16.1.224/CWFS5
>>>>>>>>                       728T  286T  443T  40% /p/CWFS5
>>>>>>>> panfs://172.16.1.224/CWFS6
>>>>>>>>                       728T  286T  443T  40% /p/CWFS6
>>>>>>>> panfs://172.16.1.224/CWFS7
>>>>>>>>                       728T  286T  443T  40% /p/CWFS7
>>>>>>>>
>>>>>>>> 1. My home directory is /home/yanb.
>>>>>>>> My simulation files are located at /work3/yanb.
>>>>>>>> The default TMPDIR set by the system is just /work3/yanb.
>>>>>>>>
>>>>>>>> 2. I did try not setting TMPDIR and letting it default, which is just case 1 and case 2.
>>>>>>>> Case 1:
>>>>>>>> #export TMPDIR=/home/yanb/tmp
>>>>>>>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>> It fails with no apparent reason.
>>>>>>>> Case 2:
>>>>>>>> #export TMPDIR=/home/yanb/tmp
>>>>>>>> #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>> It gives the warning about a shared-memory file on a network file system.
With "export TMPDIR=/tmp", the job gives the same, no apparent >>>>>>>> reason. >>>>>>>> >>>>>>>> 4. FYI, "ls /" gives: >>>>>>>> ELT apps cgroup hafs1 hafs12 hafs2 hafs5 hafs8 home >>>>>>>> lost+found mnt p root selinux tftpboot var work3 >>>>>>>> admin bin dev hafs10 hafs13 hafs3 hafs6 hafs9 lib >>>>>>>> media net panfs sbin srv tmp work1 work4 >>>>>>>> app boot etc hafs11 hafs15 hafs4 hafs7 hafs_x86_64 lib64 >>>>>>>> misc opt proc scratch sys usr work2 workspace >>>>>>>> >>>>>>>> Beichuan >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of >>>>>>>> Gus Correa >>>>>>>> Sent: Monday, March 03, 2014 17:24 >>>>>>>> To: Open MPI Users >>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem >>>>>>>> >>>>>>>> Hi Beichuan >>>>>>>> >>>>>>>> If you are using the university cluster, chances are that /home is not >>>>>>>> local, but on an NFS share, or perhaps Lustre (which you may have >>>>>>>> mentioned before, I don't remember). >>>>>>>> >>>>>>>> Maybe "df -h" will show what is local what is not. >>>>>>>> It works for NFS, it prefixes file systems with the server name, but I >>>>>>>> don't know about Lustre. >>>>>>>> >>>>>>>> Did you try just not to set TMPDIR and let it default? >>>>>>>> If the default TMPDIR is on Lustre (did you say this?, anyway I >>>>>>>> don't >>>>>>>> remember) you could perhaps try to force it to /tmp: >>>>>>>> export TMPDIR=/tmp, >>>>>>>> If the cluster nodes are diskfull /tmp is likely to exist and be local >>>>>>>> to the cluster nodes. >>>>>>>> [But the cluster nodes may be diskless ... :( ] >>>>>>>> >>>>>>>> I hope this helps, >>>>>>>> Gus Correa >>>>>>>> >>>>>>>> On 03/03/2014 07:10 PM, Beichuan Yan wrote: >>>>>>>>> How to set TMPDIR to a local filesystem? Is /home/yanb/tmp a local >>>>>>>>> filesystem? I don't know how to tell a directory is local file system >>>>>>>>> or network file system. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of >>>>>>>>> Jeff Squyres (jsquyres) >>>>>>>>> Sent: Monday, March 03, 2014 16:57 >>>>>>>>> To: Open MPI Users >>>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem >>>>>>>>> >>>>>>>>> How about setting TMPDIR to a local filesystem? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mar 3, 2014, at 3:43 PM, Beichuan Yan<beichuan....@colorado.edu> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I agree there are two cases for pure-MPI mode: 1. Job fails with no >>>>>>>>>> apparent reason; 2 job complains shared-memory file on network file >>>>>>>>>> system, which can be resolved by " export TMPDIR=/home/yanb/tmp", >>>>>>>>>> /home/yanb/tmp is my local directory. The default TMPDIR points to a >>>>>>>>>> Lustre directory. >>>>>>>>>> >>>>>>>>>> There is no any other output. I checked my job with "qstat -n" and >>>>>>>>>> found that processes were actually not started on compute nodes even >>>>>>>>>> though PBS Pro has "started" my job. >>>>>>>>>> >>>>>>>>>> Beichuan >>>>>>>>>> >>>>>>>>>>> 3. Then I test pure-MPI mode: OPENMP is turned off, and each >>>>>>>>>>> compute node runs 16 processes (clearly shared-memory of MPI is >>>>>>>>>>> used). 
>>>>>>>>>>> Four combinations of "TMPDIR" and "TCP" are tested:
>>>>>>>>>>> case 1:
>>>>>>>>>>> #export TMPDIR=/home/yanb/tmp
>>>>>>>>>>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>>>>> mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
>>>>>>>>>>> output:
>>>>>>>>>>> Start Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
>>>>>>>>>>> End Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
>>>>>>>>>>> -bash: line 1: 448597 Terminated /var/spool/PBS/mom_priv/jobs/602244.service12.SC
>>>>>>>>>>> Start Epilogue v2.5 Mon Mar 3 15:50:51 EST 2014
>>>>>>>>>>> Statistics cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb,walltime=00:03:24
>>>>>>>>>>> End Epilogue v2.5 Mon Mar 3 15:50:52 EST 2014
>>>>>>>>>>
>>>>>>>>>> It looks like you have two general cases:
>>>>>>>>>>
>>>>>>>>>> 1. The job fails for no apparent reason (like above), or 2. The job complains that your TMPDIR is on a shared filesystem
>>>>>>>>>>
>>>>>>>>>> Right?
>>>>>>>>>>
>>>>>>>>>> I think the real issue, then, is to figure out why your jobs are failing with no output.
>>>>>>>>>>
>>>>>>>>>> Is there anything in the stderr output?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jeff Squyres
>>>>>>>>>> jsquy...@cisco.com
>>>>>>>>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jeff Squyres
>>>>>>>>> jsquy...@cisco.com
>>>>>>>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/