Re: [OMPI users] Low CPU utilization

2016-10-16 Thread Reuti
Hi,

On 16.10.2016 at 20:34, Mahmood Naderan wrote:

> Hi,
> I am running two software packages that use OMPI 2.0.1. The problem is that 
> the CPU utilization is low on the nodes.
> 
> 
> For example, see the process information below
> 
> [root@compute-0-1 ~]# ps aux | grep siesta
> mahmood  14635  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
> mahmood  14636  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
> mahmood  14637 61.6  0.2 372076 158220 ?   Rl   21:58   0:38 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
> mahmood  14639 59.6  0.2 365992 154228 ?   Rl   21:58   0:37 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
> 
> 
> Note that the CPU utilization is the third column. The "siesta.p1" script is
> 
> #!/bin/bash
> BENCH=$1
> export OMP_NUM_THREADS=1
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta < $BENCH
> 
> 
> 
> 
> I also saw similar behavior with Gromacs, which has been discussed at 
> https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-October/108939.html
> 
> It seems there is something tricky going on with OMPI. Any ideas are welcome.

Sounds like the two jobs are using the same cores due to automatic core 
binding, as one instance doesn't know anything about the other. For a first 
test you can start both with "mpiexec --bind-to none ..." and check whether 
you see different behavior.

`man mpiexec` mentions some hints about threads in applications.
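For example (a sketch only; the wrapper path and PID are taken from your ps 
output above, and `taskset` from util-linux is assumed to be available on the 
nodes):

# start each job with automatic core binding disabled
mpiexec --bind-to none /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf

# on the compute node, check which cores a running rank is allowed on;
# an unbound process should report the full core list, not a single core
taskset -cp 14637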

-- Reuti

> 
> 
> Regards,
> Mahmood
> 
> 


[OMPI users] Low CPU utilization

2016-10-16 Thread Mahmood Naderan
Hi,
I am running two software packages that use OMPI 2.0.1. The problem is that
the CPU utilization is low on the nodes.


For example, see the process information below

[root@compute-0-1 ~]# ps aux | grep siesta
mahmood  14635  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14636  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14637 61.6  0.2 372076 158220 ?   Rl   21:58   0:38
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
mahmood  14639 59.6  0.2 365992 154228 ?   Rl   21:58   0:37
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta


Note that the CPU utilization is the third column. The "siesta.p1" script is

#!/bin/bash
BENCH=$1
export OMP_NUM_THREADS=1
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta < $BENCH




I also saw similar behavior with Gromacs, which has been discussed at
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-October/108939.html

It seems there is something tricky going on with OMPI. Any ideas are welcome.
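(As a quick first check - a sketch; the PIDs below are the two siesta ranks 
from the ps output above, and PSR is the core each process last ran on:)

# if the ranks of both jobs keep reporting the same PSR values,
# they are sharing cores and time-slicing
ps -o pid,psr,pcpu,comm -p 14637,14639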


Regards,
Mahmood

Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-24 Thread Gilles Gouaillardet
Can you also check there is no CPU binding issue (several MPI tasks, and/or 
OpenMP threads if any, bound to the same core and doing time sharing)?
A simple way to check that is to log into a compute node, run top, and then 
press 1 f j.
If some cores have higher usage than others, you are likely doing time sharing.
Another option is to disable CPU binding (OMPI, and OpenMP if any) and see if 
things get better
(this is suboptimal, but still better than time sharing).
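A sketch of both checks (./my_app and the rank count are placeholders; the 
--report-bindings and --bind-to options exist in Open MPI 1.8):

# print each rank's core binding at launch
mpirun --report-bindings -np 8 ./my_app

# disable Open MPI's automatic binding entirely
mpirun --bind-to none -np 8 ./my_app

# if the application also uses OpenMP, leave its threads unpinned too
export OMP_PROC_BIND=false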

"Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>- Is /tmp on that machine on NFS or local?
>
>- Have you looked at the text of the help message that came out before the "9 
>more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs" 
>message?  It should contain details about what the problematic NFS directory 
>is.
>
>- Do you know that it's MPI that is causing this low CPU utilization?
>
>- You mentioned other MPI implementations; have you tested with them to see if 
>they get better CPU utilization?
>
>- What happens if you run this application on a single machine, with no 
>network messaging?
>
>- Do you know what specifically in your application is slow?  I.e., have you 
>done any instrumentation to see what steps / API calls are running slowly, and 
>then tried to figure out why?
>
>- Do you have blocking message patterns that might operate well in shared 
>memory, but expose the inefficiencies of its algorithms/design when it moves 
>to higher-latency transports?
>
>- How long does your application run for?
>
>I ask these questions because MPI applications tend to be quite complicated. 
>Sometimes it's the application itself that is the cause of slowdown / 
>inefficiencies.
>
>
>
>On Oct 23, 2014, at 9:29 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
>
>> Later I changed to another machine and set TMPDIR to the default /tmp, but 
>> the problem (low CPU utilization, under 20%) still occurs :<
>> 
>> Vincent
>> 
>> On Thu, Oct 23, 2014 at 10:38 PM, Jeff Squyres (jsquyres) 
>> <jsquy...@cisco.com> wrote:
>> If normal users can't write to /tmp (or if /tmp is an NFS-mounted 
>> filesystem), that's the underlying problem.
>> 
>> @Vinson -- you should probably try to get that fixed.
>> 
>> 
>> 
>> On Oct 23, 2014, at 10:35 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>> 
>> > It's not coming from OSHMEM but from the OPAL "shmem" framework. You are 
>> > going to get terrible performance - possibly slowing to a crawl with all 
>> > processes opening their backing files for mmap on NFS. I think that's the 
>> > error that he's getting.
>> >
>> >
>> > Josh
>> >
>> > On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com> 
>> > wrote:
>> > Hi, thanks for your reply :)
>> > I really am running an MPI program (compiled with OpenMPI and run with 
>> > "mpirun -n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs. 
>> > BTW, what is OSHMEM?
>> >
>> > Best
>> > Vincent
>> >
>> > On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> > From your error message, I gather you are not running an MPI program, but 
>> > rather an OSHMEM one? Otherwise, I find the message strange as it only 
>> > would be emitted from an OSHMEM program.
>> >
>> > What version of OMPI are you trying to use?
>> >
>> >> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
>> >>
>> >> Thanks for your reply :)
>> >> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and 
>> >> even reset it to /tmp (I got the system permission); the problem still 
>> >> occurs (CPU utilization still lower than 20%). I have no idea why, and am 
>> >> ready to give up on OpenMPI and use another MPI library instead.
>> >>
>> >> Old Message-
>> >>
>> >> Date: Tue, 21 Oct 2014 22:21:31 -0400
>> >> From: Brock Palen <bro...@umich.edu>
>> >> To: Open MPI Users <us...@open-mpi.org>
>> >> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
>> >> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
>> >> Content-Type: text/plain; charset=us-ascii
>> >>
>> >> Doing special files on NFS can be weird; try the other /tmp/ locations:
>> >>
>> >> /var/tmp/
>> >> /dev/shm  (ram disk careful!)
>> >>
>> >

Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-24 Thread Jeff Squyres (jsquyres)
- Is /tmp on that machine on NFS or local?

- Have you looked at the text of the help message that came out before the "9 
more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs" 
message?  It should contain details about what the problematic NFS directory is.

- Do you know that it's MPI that is causing this low CPU utilization?

- You mentioned other MPI implementations; have you tested with them to see if 
they get better CPU utilization?

- What happens if you run this application on a single machine, with no network 
messaging?

- Do you know what specifically in your application is slow?  I.e., have you 
done any instrumentation to see what steps / API calls are running slowly, and 
then tried to figure out why?

- Do you have blocking message patterns that might operate well in shared 
memory, but expose the inefficiencies of its algorithms/design when it moves to 
higher-latency transports?

- How long does your application run for?

I ask these questions because MPI applications tend to be quite complicated. 
Sometimes it's the application itself that is the cause of slowdown / 
inefficiencies.



On Oct 23, 2014, at 9:29 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:

> Later I changed to another machine and set TMPDIR to the default /tmp, but 
> the problem (low CPU utilization, under 20%) still occurs :<
> 
> Vincent
> 
> On Thu, Oct 23, 2014 at 10:38 PM, Jeff Squyres (jsquyres) 
> <jsquy...@cisco.com> wrote:
> If normal users can't write to /tmp (or if /tmp is an NFS-mounted 
> filesystem), that's the underlying problem.
> 
> @Vinson -- you should probably try to get that fixed.
> 
> 
> 
> On Oct 23, 2014, at 10:35 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> 
> > It's not coming from OSHMEM but from the OPAL "shmem" framework. You are 
> > going to get terrible performance - possibly slowing to a crawl with all 
> > processes opening their backing files for mmap on NFS. I think that's the 
> > error that he's getting.
> >
> >
> > Josh
> >
> > On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com> 
> > wrote:
> > Hi, thanks for your reply :)
> > I really am running an MPI program (compiled with OpenMPI and run with 
> > "mpirun -n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs. 
> > BTW, what is OSHMEM?
> >
> > Best
> > Vincent
> >
> > On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > From your error message, I gather you are not running an MPI program, but 
> > rather an OSHMEM one? Otherwise, I find the message strange as it only 
> > would be emitted from an OSHMEM program.
> >
> > What version of OMPI are you trying to use?
> >
> >> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> >>
> >> Thanks for your reply :)
> >> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and 
> >> even reset it to /tmp (I got the system permission); the problem still 
> >> occurs (CPU utilization still lower than 20%). I have no idea why, and am 
> >> ready to give up on OpenMPI and use another MPI library instead.
> >>
> >> Old Message-
> >>
> >> Date: Tue, 21 Oct 2014 22:21:31 -0400
> >> From: Brock Palen <bro...@umich.edu>
> >> To: Open MPI Users <us...@open-mpi.org>
> >> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
> >> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
> >> Content-Type: text/plain; charset=us-ascii
> >>
> >> Doing special files on NFS can be weird; try the other /tmp/ locations:
> >>
> >> /var/tmp/
> >> /dev/shm  (ram disk careful!)
> >>
> >> Brock Palen
> >> www.umich.edu/~brockp
> >> CAEN Advanced Computing
> >> XSEDE Campus Champion
> >> bro...@umich.edu
> >> (734)936-1985
> >>
> >>
> >>
> >> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> 
> >> > wrote:
> >> >
> >> > Because of permission reasons (OpenMPI cannot write temporary files to 
> >> > the default /tmp directory), I changed TMPDIR to my local directory 
> >> > (export TMPDIR=/home/user/tmp) and then the MPI program can run. But 
> >> > the CPU utilization is very low, under 20% (8 MPI ranks running on an 
> >> > Intel Xeon 8-core CPU).
> >> >
> >> > And I also got these messages when running with OpenMPI:
> >> > [cn3:28072] 9 more processes have sent help message 
> >> > help-opal-shmem-mmap.txt / mmap on nfs
> >> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> >> > help / error messages
> >> >
> >> > Any idea?
> >> > Thanks
> >> >
> >> > Vincent


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-23 Thread Vinson Leung
Later I changed to another machine and set TMPDIR to the default /tmp, but the
problem (low CPU utilization, under 20%) still occurs :<

Vincent

On Thu, Oct 23, 2014 at 10:38 PM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> If normal users can't write to /tmp (or if /tmp is an NFS-mounted
> filesystem), that's the underlying problem.
>
> @Vinson -- you should probably try to get that fixed.
>
>
>
> On Oct 23, 2014, at 10:35 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
> > It's not coming from OSHMEM but from the OPAL "shmem" framework. You are
> > going to get terrible performance - possibly slowing to a crawl with all
> > processes opening their backing files for mmap on NFS. I think that's the
> > error that he's getting.
> >
> >
> > Josh
> >
> > On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com>
> wrote:
> > Hi, thanks for your reply :)
> > I really am running an MPI program (compiled with OpenMPI and run with
> > "mpirun -n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs.
> > BTW, what is OSHMEM?
> >
> > Best
> > Vincent
> >
> > On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org>
> wrote:
> > From your error message, I gather you are not running an MPI program,
> but rather an OSHMEM one? Otherwise, I find the message strange as it only
> would be emitted from an OSHMEM program.
> >
> > What version of OMPI are you trying to use?
> >
> >> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com>
> wrote:
> >>
> >> Thanks for your reply :)
> >> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and
> >> even reset it to /tmp (I got the system permission); the problem still
> >> occurs (CPU utilization still lower than 20%). I have no idea why, and am
> >> ready to give up on OpenMPI and use another MPI library instead.
> >>
> >> Old Message-
> >>
> >> Date: Tue, 21 Oct 2014 22:21:31 -0400
> >> From: Brock Palen <bro...@umich.edu>
> >> To: Open MPI Users <us...@open-mpi.org>
> >> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
> >> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
> >> Content-Type: text/plain; charset=us-ascii
> >>
> >> Doing special files on NFS can be weird; try the other /tmp/ locations:
> >>
> >> /var/tmp/
> >> /dev/shm  (ram disk careful!)
> >>
> >> Brock Palen
> >> www.umich.edu/~brockp
> >> CAEN Advanced Computing
> >> XSEDE Campus Champion
> >> bro...@umich.edu
> >> (734)936-1985
> >>
> >>
> >>
> >> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com>
> wrote:
> >> >
> >> > Because of permission reasons (OpenMPI cannot write temporary files to
> >> > the default /tmp directory), I changed TMPDIR to my local directory
> >> > (export TMPDIR=/home/user/tmp) and then the MPI program can run. But the
> >> > CPU utilization is very low, under 20% (8 MPI ranks running on an Intel
> >> > Xeon 8-core CPU).
> >> >
> >> > And I also got these messages when running with OpenMPI:
> >> > [cn3:28072] 9 more processes have sent help message
> help-opal-shmem-mmap.txt / mmap on nfs
> >> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
> >> >
> >> > Any idea?
> >> > Thanks
> >> >
> >> > Vincent
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>


Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-23 Thread Vinson Leung
How can I fix the error if all processes open their backing files for mmap
on NFS like you said?

Vincent

On Thu, Oct 23, 2014 at 10:35 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> It's not coming from OSHMEM but from the OPAL "shmem" framework. You are
> going to get terrible performance - possibly slowing to a crawl with all
> processes opening their backing files for mmap on NFS. I think that's the
> error that he's getting.
>
>
> Josh
>
> On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com>
> wrote:
>
>> Hi, thanks for your reply :)
>> I really am running an MPI program (compiled with OpenMPI and run with
>> "mpirun -n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs.
>> BTW, what is OSHMEM?
>>
>> Best
>> Vincent
>>
>> On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> From your error message, I gather you are not running an MPI program,
>>> but rather an OSHMEM one? Otherwise, I find the message strange as it only
>>> would be emitted from an OSHMEM program.
>>>
>>> What version of OMPI are you trying to use?
>>>
>>> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com>
>>> wrote:
>>>
>>> Thanks for your reply :)
>>> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and
>>> even reset it to /tmp (I got the system permission); the problem still
>>> occurs (CPU utilization still lower than 20%). I have no idea why, and am
>>> ready to give up on OpenMPI and use another MPI library instead.
>>>
>>> Old Message-
>>>
>>> Date: Tue, 21 Oct 2014 22:21:31 -0400
>>> From: Brock Palen <bro...@umich.edu>
>>> To: Open MPI Users <us...@open-mpi.org>
>>> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
>>> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
>>> Content-Type: text/plain; charset=us-ascii
>>>
>>> Doing special files on NFS can be weird; try the other /tmp/ locations:
>>>
>>> /var/tmp/
>>> /dev/shm  (ram disk careful!)
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>>
>>>
>>>
>>> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com>
>>> wrote:
>>> >
>>> > Because of permission reasons (OpenMPI cannot write temporary files to
>>> > the default /tmp directory), I changed TMPDIR to my local directory
>>> > (export TMPDIR=/home/user/tmp) and then the MPI program can run. But the
>>> > CPU utilization is very low, under 20% (8 MPI ranks running on an Intel
>>> > Xeon 8-core CPU).
>>> >
>>> > And I also got these messages when running with OpenMPI:
>>> > [cn3:28072] 9 more processes have sent help message
>>> help-opal-shmem-mmap.txt / mmap on nfs
>>> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>>> all help / error messages
>>> >
>>> > Any idea?
>>> > Thanks
>>> >
>>> > Vincent
>


Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-23 Thread Jeff Squyres (jsquyres)
If normal users can't write to /tmp (or if /tmp is an NFS-mounted filesystem), 
that's the underlying problem.

@Vinson -- you should probably try to get that fixed.



On Oct 23, 2014, at 10:35 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> It's not coming from OSHMEM but from the OPAL "shmem" framework. You are 
> going to get terrible performance - possibly slowing to a crawl with all 
> processes opening their backing files for mmap on NFS. I think that's the 
> error that he's getting.
> 
> 
> Josh
> 
> On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> Hi, thanks for your reply :)
> I really am running an MPI program (compiled with OpenMPI and run with 
> "mpirun -n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs. 
> BTW, what is OSHMEM?
> 
> Best
> Vincent
> 
> On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
> From your error message, I gather you are not running an MPI program, but 
> rather an OSHMEM one? Otherwise, I find the message strange as it only would 
> be emitted from an OSHMEM program.
> 
> What version of OMPI are you trying to use?
> 
>> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
>> 
>> Thanks for your reply :)
>> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and 
>> even reset it to /tmp (I got the system permission); the problem still 
>> occurs (CPU utilization still lower than 20%). I have no idea why, and am 
>> ready to give up on OpenMPI and use another MPI library instead.
>> 
>> Old Message---------
>> 
>> Date: Tue, 21 Oct 2014 22:21:31 -0400
>> From: Brock Palen <bro...@umich.edu>
>> To: Open MPI Users <us...@open-mpi.org>
>> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
>> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
>> Content-Type: text/plain; charset=us-ascii
>> 
>> Doing special files on NFS can be weird; try the other /tmp/ locations:
>> 
>> /var/tmp/
>> /dev/shm  (ram disk careful!)
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
>> >
>> > Because of permission reasons (OpenMPI cannot write temporary files to the 
>> > default /tmp directory), I changed TMPDIR to my local directory (export 
>> > TMPDIR=/home/user/tmp) and then the MPI program can run. But the CPU 
>> > utilization is very low, under 20% (8 MPI ranks running on an Intel Xeon 
>> > 8-core CPU).
>> >
>> > And I also got these messages when running with OpenMPI:
>> > [cn3:28072] 9 more processes have sent help message 
>> > help-opal-shmem-mmap.txt / mmap on nfs
>> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
>> > help / error messages
>> >
>> > Any idea?
>> > Thanks
>> >
>> > Vincent


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-23 Thread Joshua Ladd
It's not coming from OSHMEM but from the OPAL "shmem" framework. You are
going to get terrible performance - possibly slowing to a crawl with all
processes opening their backing files for mmap on NFS. I think that's the
error that he's getting.


Josh

On Thu, Oct 23, 2014 at 6:06 AM, Vinson Leung <lwhvinson1...@gmail.com>
wrote:

> Hi, thanks for your reply :)
> I really am running an MPI program (compiled with OpenMPI and run with
> "mpirun -n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs.
> BTW, what is OSHMEM?
>
> Best
> Vincent
>
> On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> From your error message, I gather you are not running an MPI program, but
>> rather an OSHMEM one? Otherwise, I find the message strange as it only
>> would be emitted from an OSHMEM program.
>>
>> What version of OMPI are you trying to use?
>>
>> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com>
>> wrote:
>>
>> Thanks for your reply :)
>> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and
>> even reset it to /tmp (I got the system permission); the problem still
>> occurs (CPU utilization still lower than 20%). I have no idea why, and am
>> ready to give up on OpenMPI and use another MPI library instead.
>>
>> Old Message---------
>>
>> Date: Tue, 21 Oct 2014 22:21:31 -0400
>> From: Brock Palen <bro...@umich.edu>
>> To: Open MPI Users <us...@open-mpi.org>
>> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
>> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
>> Content-Type: text/plain; charset=us-ascii
>>
>> Doing special files on NFS can be weird; try the other /tmp/ locations:
>>
>> /var/tmp/
>> /dev/shm  (ram disk careful!)
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>>
>>
>>
>> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com>
>> wrote:
>> >
>> > Because of permission reasons (OpenMPI cannot write temporary files to
>> > the default /tmp directory), I changed TMPDIR to my local directory
>> > (export TMPDIR=/home/user/tmp) and then the MPI program can run. But the
>> > CPU utilization is very low, under 20% (8 MPI ranks running on an Intel
>> > Xeon 8-core CPU).
>> >
>> > And I also got these messages when running with OpenMPI:
>> > [cn3:28072] 9 more processes have sent help message
>> help-opal-shmem-mmap.txt / mmap on nfs
>> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>> >
>> > Any idea?
>> > Thanks
>> >
>> > Vincent
>


Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-23 Thread Vinson Leung
Hi, thanks for your reply :)
I really am running an MPI program (compiled with OpenMPI and run with "mpirun
-n 8 .."). My OpenMPI version is 1.8.3 and my program is Gromacs. BTW, what
is OSHMEM?

Best
Vincent

On Thu, Oct 23, 2014 at 12:21 PM, Ralph Castain <r...@open-mpi.org> wrote:

> From your error message, I gather you are not running an MPI program, but
> rather an OSHMEM one? Otherwise, I find the message strange as it only
> would be emitted from an OSHMEM program.
>
> What version of OMPI are you trying to use?
>
> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
>
> Thanks for your reply :)
> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and
> even reset it to /tmp (I got the system permission); the problem still
> occurs (CPU utilization still lower than 20%). I have no idea why, and am
> ready to give up on OpenMPI and use another MPI library instead.
>
> Old Message-
>
> Date: Tue, 21 Oct 2014 22:21:31 -0400
> From: Brock Palen <bro...@umich.edu>
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
> Content-Type: text/plain; charset=us-ascii
>
> Doing special files on NFS can be weird; try the other /tmp/ locations:
>
> /var/tmp/
> /dev/shm  (ram disk careful!)
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
>
> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com>
> wrote:
> >
> > Because of permission reasons (OpenMPI cannot write temporary files to
> > the default /tmp directory), I changed TMPDIR to my local directory
> > (export TMPDIR=/home/user/tmp) and then the MPI program can run. But the
> > CPU utilization is very low, under 20% (8 MPI ranks running on an Intel
> > Xeon 8-core CPU).
> >
> > And I also got these messages when running with OpenMPI:
> > [cn3:28072] 9 more processes have sent help message
> help-opal-shmem-mmap.txt / mmap on nfs
> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
> help / error messages
> >
> > Any idea?
> > Thanks
> >
> > Vincent
>


Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-23 Thread Ralph Castain
From your error message, I gather you are not running an MPI program, but 
rather an OSHMEM one? Otherwise, I find the message strange as it only would be 
emitted from an OSHMEM program.

What version of OMPI are you trying to use?

> On Oct 22, 2014, at 7:12 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> 
> Thanks for your reply :)
> Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and 
> even reset it to /tmp (I got the system permission); the problem still 
> occurs (CPU utilization still lower than 20%). I have no idea why, and am 
> ready to give up on OpenMPI and use another MPI library instead.
> 
> Old Message-
> 
> Date: Tue, 21 Oct 2014 22:21:31 -0400
> From: Brock Palen <bro...@umich.edu>
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] low CPU utilization with OpenMPI
> Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
> Content-Type: text/plain; charset=us-ascii
> 
> Doing special files on NFS can be weird; try the other /tmp/ locations:
> 
> /var/tmp/
> /dev/shm  (ram disk careful!)
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> > On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> >
> > Because of permission reasons (OpenMPI cannot write temporary files to the 
> > default /tmp directory), I changed TMPDIR to my local directory (export 
> > TMPDIR=/home/user/tmp) and then the MPI program can run. But the CPU 
> > utilization is very low, under 20% (8 MPI ranks running on an Intel Xeon 
> > 8-core CPU).
> >
> > And I also got these messages when running with OpenMPI:
> > [cn3:28072] 9 more processes have sent help message 
> > help-opal-shmem-mmap.txt / mmap on nfs
> > [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> > help / error messages
> >
> > Any idea?
> > Thanks
> >
> > Vincent



Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-22 Thread Vinson Leung
Thanks for your reply :)
Following your advice, I tried setting TMPDIR to /var/tmp and /dev/shm, and
even reset it to /tmp (I got the system permission); the problem still occurs
(CPU utilization still lower than 20%). I have no idea why, and am ready to
give up on OpenMPI and use another MPI library instead.

Old Message-

Date: Tue, 21 Oct 2014 22:21:31 -0400
From: Brock Palen <bro...@umich.edu>
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] low CPU utilization with OpenMPI
Message-ID: <cc54135d-0cfe-440a-8df2-06b587e17...@umich.edu>
Content-Type: text/plain; charset=us-ascii

Doing special files on NFS can be weird; try the other /tmp/ locations:

/var/tmp/
/dev/shm  (ram disk careful!)

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com>
wrote:
>
> Because of permission reasons (OpenMPI cannot write temporary files to the
> default /tmp directory), I changed TMPDIR to my local directory (export
> TMPDIR=/home/user/tmp) and then the MPI program can run. But the CPU
> utilization is very low, under 20% (8 MPI ranks running on an Intel Xeon
> 8-core CPU).
>
> And I also got these messages when running with OpenMPI:
> [cn3:28072] 9 more processes have sent help message
help-opal-shmem-mmap.txt / mmap on nfs
> [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages
>
> Any idea?
> Thanks
>
> Vincent


Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-21 Thread Brock Palen
Doing special files on NFS can be weird; try the other /tmp/ locations:

/var/tmp/
/dev/shm  (ram disk careful!)
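For instance (a sketch; ./my_app is a placeholder, and /dev/shm is assumed to 
be a mounted tmpfs with enough free RAM to hold the shared-memory backing 
files):

# first check which filesystem the default temporary directory is on
df -T /tmp

# then retry with a node-local or RAM-backed TMPDIR
export TMPDIR=/var/tmp
mpirun -np 8 ./my_app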

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> 
> Because of permission reasons (OpenMPI cannot write temporary files to the 
> default /tmp directory), I changed TMPDIR to my local directory (export 
> TMPDIR=/home/user/tmp) and then the MPI program can run. But the CPU 
> utilization is very low, under 20% (8 MPI ranks running on an Intel Xeon 
> 8-core CPU). 
> 
> And I also got these messages when running with OpenMPI:
> [cn3:28072] 9 more processes have sent help message help-opal-shmem-mmap.txt 
> / mmap on nfs
> [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help 
> / error messages
> 
> Any idea?
> Thanks
> 
> Vincent



[OMPI users] low CPU utilization with OpenMPI

2014-10-21 Thread Vinson Leung
Because of permission reasons (OpenMPI cannot write temporary files to the
default /tmp directory), I changed TMPDIR to my local directory (export
TMPDIR=/home/user/tmp) and then the MPI program can run. But the CPU
utilization is very low, under 20% (8 MPI ranks running on an Intel Xeon
8-core CPU).

And I also got these messages when running with OpenMPI:
[cn3:28072] 9 more processes have sent help message
help-opal-shmem-mmap.txt / mmap on nfs
[cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages

Any idea?
Thanks

Vincent
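(One way to confirm where the backing files are landing, since the help 
message points at NFS - a sketch using GNU coreutils' stat:)

# prints the filesystem type of each directory, e.g. nfs vs ext4/tmpfs
stat -f -c %T /home/user/tmp
stat -f -c %T /tmp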


[OMPI users] Low cpu utilization due to high IO operations of openmpi

2012-10-21 Thread Mahmood Naderan
Dear all,
We have a diskless cluster with these specs:
1) A server which has some disks. Root directories (/usr, /lib, ...) are on 
/dev/sda while /home is on /dev/sdb; these are two physical hard drives.
2) Some compute nodes. These don't have any disk drives; instead they are 
connected through a 10/100/1000 switch to the server.
3) Nodes use an NFS directory for booting which resides on /dev/sda.

In our cluster we use OpenMPI with OpenFOAM. Both were compiled using default 
options. The problem is that when the OpenFOAM solver is sent to a compute 
node with OpenMPI, OpenMPI performs a lot of *WRITE* operations, which causes 
low CPU utilization, and hence the processes are mainly in 'D' state. A brief 
description of our tests:

We ssh to the compute node and run the application there.

Test 1) One process of OpenFOAM is launched without OpenMPI. Everything is 
fine and the CPU is utilized 100%.
Test 2) Two processes of OpenFOAM are launched (mpirun -np 2 ...). The two 
OpenFOAM processes have about 30% CPU utilization and are in 'D' state most 
of the time.

Is there any suggestion on that? That is really poor performance.


Regards,
Mahmood
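(Two checks that may be worth trying on a diskless setup - a sketch; iotop 
must be installed, <solver> is a placeholder, and the MCA parameter name 
should be confirmed with `ompi_info --param orte all` on your installation:)

# see whether the openfoam ranks are blocked on NFS writes ('D' state)
iotop -o

# point Open MPI's session directory (shared-memory backing files)
# at a RAM-backed path instead of the NFS-mounted root
mpirun --mca orte_tmpdir_base /dev/shm -np 2 <solver>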