- Is /tmp on that machine on NFS or local? (A quick way to check is sketched just below.)
- Have you looked at the text of the help message that came out before the "9 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs" message? It should contain details about what the problematic NFS directory is.
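For those first two questions, something along these lines should show whether /tmp is NFS-mounted and surface every copy of the help text. This is only a sketch: "./your_app" is a placeholder for your real command line, so substitute your actual application and arguments.

    # Check the filesystem type of /tmp; "nfs" or "nfs4" in the Type column means it is NFS-mounted
    df -T /tmp

    # Turn off help-message aggregation so every rank's full warning is printed
    mpirun --mca orte_base_help_aggregate 0 -n 8 ./your_app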
- Do you know that it's MPI that is causing this low CPU utilization?
- You mentioned other MPI implementations; have you tested with them to see if they get better CPU utilization?
- What happens if you run this application on a single machine, with no network messaging?
- Do you know what specifically in your application is slow? I.e., have you done any instrumentation to see which steps / API calls are running slowly, and then tried to figure out why?
- Do you have blocking message patterns that might operate well in shared memory, but expose inefficiencies in the application's algorithms/design when it moves to higher-latency transports?
- How long does your application run for?

I ask these questions because MPI applications tend to be quite complicated. Sometimes it's the application itself that is the cause of the slowdown / inefficiency.


On Oct 23, 2014, at 9:29 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:

> Later I changed to another machine and set TMPDIR back to the default /tmp, but the problem (low CPU utilization, under 20%) still occurs :<
>
> Vincent
>
> On Thu, Oct 23, 2014 at 10:38 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> If normal users can't write to /tmp (or if /tmp is an NFS-mounted filesystem), that's the underlying problem.
>
> @Vinson -- you should probably try to get that fixed.
>
> On Oct 23, 2014, at 10:35 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
> > It's not coming from OSHMEM but from the OPAL "shmem" framework. You are going to get terrible performance - possibly slowing to a crawl - with all processes opening their backing files for mmap on NFS. I think that's the error that he's getting.
> >
> > Josh
> >
> > On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> > Hi, thanks for your reply :)
> > I really am running an MPI program (compiled with OpenMPI and run with "mpirun -n 8 ......"). My OpenMPI version is 1.8.3 and my program is Gromacs. BTW, what is OSHMEM?
> >
> > Best
> > Vincent
> >
> > On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > From your error message, I gather you are not running an MPI program, but rather an OSHMEM one? Otherwise, I find the message strange, as it would only be emitted from an OSHMEM program.
> >
> > What version of OMPI are you trying to use?
> >
> >> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> >>
> >> Thanks for your reply :)
> >> Following your advice I tried setting TMPDIR to /var/tmp and /dev/shm, and even reset it to /tmp (I got the permission), but the problem still occurs (CPU utilization still lower than 20%). I have no idea why and am ready to give up on OpenMPI and use another MPI library instead.
> >>
> >> --------Old Message-------------
> >>
> >> Date: Tue, 21 Oct 2014 22:21:31 -0400
> >> From: Brock Palen <bro...@umich.edu>
> >> To: Open MPI Users <us...@open-mpi.org>
> >> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
> >> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
> >> Content-Type: text/plain; charset=us-ascii
> >>
> >> Doing special files on NFS can be weird; try the other /tmp/ locations:
> >>
> >> /var/tmp/
> >> /dev/shm (ram disk - careful!)
> >>
> >> Brock Palen
> >> www.umich.edu/~brockp
> >> CAEN Advanced Computing
> >> XSEDE Campus Champion
> >> bro...@umich.edu
> >> (734)936-1985
> >>
> >>
> >> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> >> >
> >> > Because of a permission problem (OpenMPI could not write temporary files to the default /tmp directory), I changed TMPDIR to a local directory (export TMPDIR=/home/user/tmp) and then the MPI program could run. But the CPU utilization is very low, under 20% (8 MPI ranks running on an 8-core Intel Xeon CPU).
> >> >
> >> > And I also got some messages when I ran with OpenMPI:
> >> > [cn3:28072] 9 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
> >> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> >> >
> >> > Any idea?
> >> > Thanks
> >> >
> >> > Vincent

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
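For reference, pointing Open MPI's session directory at a node-local filesystem (as suggested in the thread above) usually comes down to one of the two options below. This is a sketch only: /dev/shm is just an example location, "./your_app" is a placeholder, and "orte_tmpdir_base" is the MCA parameter name used by the 1.8 series, so double-check it against your installation with ompi_info.

    # Option 1: set TMPDIR to a node-local directory for the whole environment
    export TMPDIR=/dev/shm
    mpirun -n 8 ./your_app

    # Option 2: set the session-directory base only for this run
    mpirun --mca orte_tmpdir_base /dev/shm -n 8 ./your_app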