- Is /tmp on that machine on NFS or local?  (A quick way to check is sketched below.)

- Have you looked at the text of the help message that came out before the "9 
more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs" 
message?  It should contain details about which directory is the problematic 
NFS one.  (See the mpirun sketch below for how to un-aggregate those messages.)

- Do you know that it's MPI that is causing this low CPU utilization?

- You mentioned other MPI implementations; have you tested with them to see if 
they get better CPU utilization?

- What happens if you run this application on a single machine, with no network 
messaging?  (See the single-node sketch below.)

- Do you know what specifically in your application is slow?  I.e., have you 
done any instrumentation to see what steps / API calls are running slowly, and 
then tried to figure out why?

- Do you have blocking message patterns that might perform well in shared 
memory, but expose inefficiencies in your application's algorithms/design when 
it moves to higher-latency transports?

- How long does your application run for?

I ask these questions because MPI applications tend to be quite complicated. 
Sometimes it's the application itself that is the cause of slowdown / 
inefficiencies.
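
For the first question, something like this should tell you whether /tmp (and 
the TMPDIR you are actually using) is NFS-mounted; this is just a sketch 
assuming a Linux box with GNU coreutils:

    # Show the filesystem type backing the directory
    # (look for "nfs" vs. ext4/xfs/tmpfs)
    df -T /tmp
    df -T "${TMPDIR:-/tmp}"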
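
To see the full text of the aggregated shmem help messages (rather than just 
the "9 more processes have sent help message" summary), your own output already 
names the knob; for example (with "./your_app" standing in for your actual 
Gromacs command line):

    mpirun --mca orte_base_help_aggregate 0 -n 8 ./your_app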
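
And for the single-machine test, one rough sketch is to log in to one node and 
launch everything there over shared memory only.  This assumes the "sm" BTL is 
built into your 1.8.3 install ("./your_app" is again a placeholder):

    # All 8 ranks on the local node, shared-memory + self transports only
    mpirun -n 8 --mca btl sm,self ./your_app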



On Oct 23, 2014, at 9:29 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:

> Later I changed to another machine and set TMPDIR to the default /tmp, but the 
> problem (low CPU utilization, under 20%) still occurs :<
> 
> Vincent
> 
> On Thu, Oct 23, 2014 at 10:38 PM, Jeff Squyres (jsquyres) 
> <jsquy...@cisco.com> wrote:
> If normal users can't write to /tmp (or if /tmp is an NFS-mounted 
> filesystem), that's the underlying problem.
> 
> @Vinson -- you should probably try to get that fixed.
> 
> 
> 
> On Oct 23, 2014, at 10:35 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> 
> > It's not coming from OSHMEM but from the OPAL "shmem" framework. You are 
> > going to get terrible performance - possibly slowing to a crawl, with all 
> > processes opening their backing files for mmap on NFS. I think that's the 
> > error that he's getting.
> >
> >
> > Josh
> >
> > On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com> 
> > wrote:
> > Hi, thanks for your reply :)
> > I am really running an MPI program (compiled with Open MPI and run with 
> > "mpirun -n 8 ......"). My Open MPI version is 1.8.3 and my program is 
> > Gromacs. BTW, what is OSHMEM?
> >
> > Best
> > Vincent
> >
> > On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > From your error message, I gather you are not running an MPI program, but 
> > rather an OSHMEM one? Otherwise, I find the message strange, as it would 
> > only be emitted from an OSHMEM program.
> >
> > What version of OMPI are you trying to use?
> >
> >> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> >>
> >> Thanks for your reply:)
> >> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and 
> >> even reset it to /tmp (I got the system permission), but the problem still 
> >> occurs (CPU utilization still lower than 20%). I have no idea why, and I am 
> >> ready to give up on Open MPI and use another MPI library instead.
> >>
> >> --------Old Message-------------
> >>
> >> Date: Tue, 21 Oct 2014 22:21:31 -0400
> >> From: Brock Palen <bro...@umich.edu>
> >> To: Open MPI Users <us...@open-mpi.org>
> >> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
> >> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
> >> Content-Type: text/plain; charset=us-ascii
> >>
> >> Creating special files on NFS can be weird; try the other /tmp/ locations:
> >>
> >> /var/tmp/
> >> /dev/shm  (RAM disk - be careful!)
> >>
> >> Brock Palen
> >> www.umich.edu/~brockp
> >> CAEN Advanced Computing
> >> XSEDE Campus Champion
> >> bro...@umich.edu
> >> (734)936-1985
> >>
> >>
> >>
> >> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> 
> >> > wrote:
> >> >
> >> > Because of a permission issue (Open MPI cannot write temporary files to 
> >> > the default /tmp directory), I changed TMPDIR to my local directory 
> >> > (export TMPDIR=/home/user/tmp ) and then the MPI program can run. But 
> >> > the CPU utilization is very low, under 20% (8 MPI ranks running on an 
> >> > 8-core Intel Xeon CPU).
> >> >
> >> > And I also got these messages when running with Open MPI:
> >> > [cn3:28072] 9 more processes have sent help message 
> >> > help-opal-shmem-mmap.txt / mmap on nfs
> >> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> >> > help / error messages
> >> >
> >> > Any idea?
> >> > Thanks
> >> >
> >> > Vincent


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
