Re: [OMPI users] Low CPU utilization
Hi,

On 16.10.2016 at 20:34, Mahmood Naderan wrote:
> I am running two software packages that use OMPI-2.0.1. The problem is
> that CPU utilization on the nodes is low.
> [full post with ps output and wrapper script below in this thread]

Sounds like the two jobs are using the same cores through automatic core binding, since one instance doesn't know anything about the other. As a first test you can start both with "mpiexec --bind-to none ..." and check whether you see different behavior. `man mpiexec` also mentions some hints about threads in applications.

-- Reuti
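Reuti's suggestion, plus a way to confirm what the launcher actually did, as command lines (a sketch only; flags are as documented for Open MPI's mpiexec, and `./siesta.pl A.fdf` mirrors the wrapper from the original post):

```shell
# Show where each rank gets bound; overlapping binding masks for the two
# concurrently started jobs would confirm Reuti's diagnosis:
mpiexec --report-bindings -np 2 ./siesta.pl A.fdf

# First test: disable binding entirely for both jobs and let the kernel
# scheduler spread the ranks across the free cores:
mpiexec --bind-to none -np 2 ./siesta.pl A.fdf
```

With `--report-bindings`, each rank prints its binding mask to stderr at startup, so two jobs both reporting cores 0-1 would explain the ~60% utilization seen in the ps output.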
[OMPI users] Low CPU utilization
Hi,
I am running two software packages that use OMPI-2.0.1. The problem is that CPU utilization on the nodes is low. For example, see the process information below:

[root@compute-0-1 ~]# ps aux | grep siesta
mahmood  14635  0.0  0.0 108156   1300 ?  S  21:58  0:00 /bin/bash /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14636  0.0  0.0 108156   1300 ?  S  21:58  0:00 /bin/bash /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14637 61.6  0.2 372076 158220 ?  Rl 21:58  0:38 /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
mahmood  14639 59.6  0.2 365992 154228 ?  Rl 21:58  0:37 /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta

Note that the CPU utilization is the third column. The "siesta.pl" script is:

#!/bin/bash
BENCH=$1
export OMP_NUM_THREADS=1
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta < $BENCH

I also saw similar behavior from Gromacs, which has been discussed at
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-October/108939.html

It seems that there is something tricky going on with OMPI. Any idea is welcome.

Regards,
Mahmood
Re: [OMPI users] low CPU utilization with OpenMPI
Can you also check that there is no CPU binding issue (several MPI tasks and/or OpenMP threads, if any, bound to the same core and doing time sharing)? A simple way to check is to log into a compute node, run top, and then press 1 (to toggle the per-core view) and f, j (to add the last-used-CPU column). If some cores have higher usage than others, you are likely doing time sharing.

Another option is to disable CPU binding (Open MPI and OpenMP, if any) and see if things get better. (This is suboptimal, but still better than time sharing.)

"Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
> [Jeff's questions and the earlier quoted messages trimmed; see his
> message below in this thread]
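The interactive top check described above can also be scripted. A minimal sketch (not from the original thread; assumes Linux and reads /proc/stat; the helper name is illustrative):

```python
import time

def per_core_busy(interval=1.0):
    """Return {cpu_name: busy_fraction} sampled over `interval` seconds.

    Parses /proc/stat twice; busy = 1 - (idle + iowait) / total, per core.
    Uneven fractions across cores suggest ranks are time-sharing a few cores.
    """
    def snapshot():
        stats = {}
        with open("/proc/stat") as f:
            for line in f:
                # Per-core lines look like "cpu0 123 0 45 ..."; skip the
                # aggregate "cpu" line (its 4th character is a space).
                if line.startswith("cpu") and line[3].isdigit():
                    name, *rest = line.split()
                    fields = [int(x) for x in rest]
                    idle = fields[3] + fields[4]  # idle + iowait jiffies
                    stats[name] = (sum(fields), idle)
        return stats

    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    busy = {}
    for cpu, (tot1, idle1) in after.items():
        tot0, idle0 = before[cpu]
        dt = tot1 - tot0
        busy[cpu] = 1.0 - (idle1 - idle0) / dt if dt else 0.0
    return busy

if __name__ == "__main__":
    for cpu, frac in sorted(per_core_busy().items()):
        print(f"{cpu}: {frac:.0%}")
```

Run it on a compute node while the jobs are active; two cores near 100% with the rest idle matches the time-sharing diagnosis.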
Re: [OMPI users] low CPU utilization with OpenMPI
- Is /tmp on that machine NFS-mounted or local?

- Have you looked at the text of the help message that came out before the "9 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs" message? It should contain details about which NFS directory is the problem.

- Do you know that it is MPI that is causing this low CPU utilization?

- You mentioned other MPI implementations; have you tested with them to see if they get better CPU utilization?

- What happens if you run this application on a single machine, with no network messaging?

- Do you know what specifically in your application is slow? I.e., have you done any instrumentation to see which steps / API calls are running slowly, and then tried to figure out why?

- Do you have blocking message patterns that might operate well in shared memory, but expose the inefficiencies of the application's algorithms/design when it moves to higher-latency transports?

- How long does your application run for?

I ask these questions because MPI applications tend to be quite complicated. Sometimes it's the application itself that is the cause of slowdowns / inefficiencies.

On Oct 23, 2014, at 9:29 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:

> Later I changed to another machine and set TMPDIR to the default /tmp,
> but the problem (low CPU utilization, under 20%) still occurs :<
> [earlier quoted messages trimmed; see below in this thread]

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
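Jeff's first question (is /tmp NFS or local?) can be answered from the shell with `df -T /tmp`. The same check as a small Python sketch that parses /proc/mounts (Linux-only; `fs_type` is an illustrative helper name, not part of any library):

```python
import os

def fs_type(path):
    """Return the filesystem type of the mount containing `path`.

    Scans /proc/mounts and picks the longest mount point that is a
    prefix of the (resolved) path, e.g. 'tmpfs', 'ext4', 'nfs', 'nfs4'.
    """
    path = os.path.realpath(path)
    best, best_type = "", "unknown"
    with open("/proc/mounts") as f:
        for line in f:
            _dev, mnt, fstype = line.split()[:3]
            # Longest matching mount point wins (handles nested mounts).
            if (path == mnt or path.startswith(mnt.rstrip("/") + "/")) \
                    and len(mnt) > len(best):
                best, best_type = mnt, fstype
    return best_type

if __name__ == "__main__":
    t = fs_type("/tmp")
    print("/tmp is on", t)
    if t.startswith("nfs"):
        print("shared-memory backing files here will be very slow")
```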
Re: [OMPI users] low CPU utilization with OpenMPI
Later I changed to another machine and set TMPDIR to the default /tmp, but the problem (low CPU utilization, under 20%) still occurs :<

Vincent

On Thu, Oct 23, 2014 at 10:38 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> If normal users can't write to /tmp (or if /tmp is an NFS-mounted
> filesystem), that's the underlying problem.
>
> @Vinson -- you should probably try to get that fixed.
> [earlier quoted messages and mailing-list footers trimmed]
Re: [OMPI users] low CPU utilization with OpenMPI
How can I fix the error if all processes open their backing files for mmap on NFS, like you said?

Vincent

On Thu, Oct 23, 2014 at 10:35 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> It's not coming from OSHMEM but from the OPAL "shmem" framework. You are
> going to get terrible performance - possibly slowing to a crawl, with all
> processes opening their backing files for mmap on NFS. I think that's the
> error that he's getting.
> [earlier quoted messages trimmed]
Re: [OMPI users] low CPU utilization with OpenMPI
If normal users can't write to /tmp (or if /tmp is an NFS-mounted filesystem), that's the underlying problem.

@Vinson -- you should probably try to get that fixed.

On Oct 23, 2014, at 10:35 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> [quoted text trimmed; see Joshua's message below in this thread]

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] low CPU utilization with OpenMPI
It's not coming from OSHMEM but from the OPAL "shmem" framework. You are going to get terrible performance - possibly slowing to a crawl, with all processes opening their backing files for mmap on NFS. I think that's the error that he's getting.

Josh

On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> [quoted text trimmed; see Vinson's message below in this thread]
Re: [OMPI users] low CPU utilization with OpenMPI
Hi, thanks for your reply :)
I really am running an MPI program (compiled with OpenMPI and run with "mpirun -n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs. BTW, what is OSHMEM?

Best
Vincent

On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
> From your error message, I gather you are not running an MPI program, but
> rather an OSHMEM one? Otherwise, I find the message strange, as it would
> only be emitted from an OSHMEM program.
>
> What version of OMPI are you trying to use?
> [earlier quoted messages trimmed]
Re: [OMPI users] low CPU utilization with OpenMPI
From your error message, I gather you are not running an MPI program, but rather an OSHMEM one? Otherwise, I find the message strange, as it would only be emitted from an OSHMEM program.

What version of OMPI are you trying to use?

> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
>
> Thanks for your reply :)
> Following your advice I tried setting TMPDIR to /var/tmp and /dev/shm, and
> even reset it to /tmp (I got the system permission), but the problem still
> occurs (CPU utilization still lower than 20%). I have no idea why, and am
> ready to give up OpenMPI and use another MPI library instead.
> [earlier quoted messages and mailing-list footers trimmed]
Re: [OMPI users] low CPU utilization with OpenMPI
Thanks for your reply :)

Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and even reset it to /tmp (I have the system permission), but the problem still occurs (CPU utilization is still lower than 20%). I have no idea why, and I am ready to give up on OpenMPI and use another MPI library instead.

----- Old Message -----
Date: Tue, 21 Oct 2014 22:21:31 -0400
From: Brock Palen <bro...@umich.edu>
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] low CPU utilization with OpenMPI

Doing special files on NFS can be weird, try the other /tmp/ locations:

/var/tmp/
/dev/shm (ram disk, careful!)

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

> On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
>
> Because of a permission problem (OpenMPI cannot write its temporary files to the default /tmp directory), I changed TMPDIR to a local directory (export TMPDIR=/home/user/tmp), and then the MPI program could run. But the CPU utilization is very low, under 20% (8 MPI ranks running on an 8-core Intel Xeon CPU).
>
> And I also got some messages when running with OpenMPI:
> [cn3:28072] 9 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
> [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> Any idea?
> Thanks
>
> Vincent
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/10/25548.php
Re: [OMPI users] low CPU utilization with OpenMPI
Doing special files on NFS can be weird, try the other /tmp/ locations:

/var/tmp/
/dev/shm (ram disk, careful!)

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

> On Oct 21, 2014, at 10:18 PM, Vinson Leung wrote:
>
> Because of a permission problem (OpenMPI cannot write its temporary files to the default /tmp directory), I changed TMPDIR to a local directory (export TMPDIR=/home/user/tmp), and then the MPI program could run. But the CPU utilization is very low, under 20% (8 MPI ranks running on an 8-core Intel Xeon CPU).
>
> And I also got some messages when running with OpenMPI:
> [cn3:28072] 9 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
> [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> Any idea?
> Thanks
>
> Vincent
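A quick way to act on this advice is to check which filesystem actually backs each candidate directory before pointing TMPDIR at it. A minimal sketch using GNU coreutils `stat` (the candidate directories are just the ones mentioned above; `%T` prints a human-readable filesystem type):

```shell
#!/bin/sh
# Print the filesystem type backing a directory (GNU coreutils stat).
fstype() { stat -f -c %T "$1"; }

# Check the usual candidates; avoid any that report "nfs".
for d in /tmp /var/tmp /dev/shm; do
  [ -d "$d" ] && echo "$d -> $(fstype "$d")"
done
```

On a typical node /dev/shm reports tmpfs; if the directory you pointed TMPDIR at reports nfs, that would explain the "mmap on nfs" warning.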
[OMPI users] low CPU utilization with OpenMPI
Because of a permission problem (OpenMPI cannot write its temporary files to the default /tmp directory), I changed TMPDIR to a local directory (export TMPDIR=/home/user/tmp), and then the MPI program could run. But the CPU utilization is very low, under 20% (8 MPI ranks running on an 8-core Intel Xeon CPU).

And I also got some messages when running with OpenMPI:

[cn3:28072] 9 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
[cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Any idea?
Thanks

Vincent
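For reference, the two knobs usually involved in this situation are the TMPDIR environment variable and Open MPI's session-directory base, which holds the mmap-backed shared-memory files the warning refers to. A hedged sketch, assuming the MCA parameter `orte_tmpdir_base` (present in Open MPI of this era; verify with `ompi_info --param orte all`), with `./my_app` as a placeholder program:

```shell
#!/bin/sh
# Put Open MPI's session directory (the shared-memory backing files)
# on a node-local filesystem instead of NFS.

# Option 1: via the environment, for the whole job
export TMPDIR=/var/tmp
mpirun -np 8 ./my_app

# Option 2: per-run, via an MCA parameter
mpirun --mca orte_tmpdir_base /var/tmp -np 8 ./my_app
```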
[OMPI users] Low cpu utilization due to high IO operations of openmpi
Dear all,

We have a diskless cluster with these specs:

1) A server which has some disks. Root directories (/usr, /lib, ...) are on /dev/sda while /home is on /dev/sdb; these are two physical hard drives.
2) Some compute nodes. These don't have any disk drives; instead they are connected to the server through a 10/100/1000 switch.
3) Nodes use an NFS directory for booting, which resides on /dev/sda.

In our cluster we use openmpi with openfoam. Both were compiled with default options. The problem is that when the openfoam solver is sent to a compute node with openmpi, openmpi performs a lot of *WRITE* operations, which causes low CPU utilization, and hence the processes are mainly in the 'D' state.

A brief description of our tests (we ssh to the compute node and run the application there):

Test 1) One process of openfoam is launched without openmpi. Everything is fine and the CPU is utilized 100%.
Test 2) Two processes of openfoam are launched (mpirun -np 2 ). The two openfoam processes have about 30% CPU utilization and are in the 'D' state most of the time.

Is there any suggestion on this? That is really poor performance.

Regards,
Mahmood
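To confirm that the ranks really are blocked on I/O rather than computing, you can watch for processes sitting in the uninterruptible 'D' state while the solver runs. A small sketch using standard procps `ps` (run it on the compute node; filter the command column for your solver's name if the list is long):

```shell
#!/bin/sh
# List processes currently in uninterruptible sleep ('D' state),
# i.e. blocked on I/O -- typically NFS writes in this diskless setup.
ps -eo pid,stat,comm --no-headers | awk '$2 ~ /^D/ {print $1, $3}'
```

If the openfoam ranks show up here most of the time, the bottleneck is the write path over the gigabit link to the NFS server, not the CPUs.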