Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Dave Love
Tru Huynh  writes:

> afaik, 2.6.32-431 series is from RHEL (and clones) version >=6.5

[Right.]

> otoh, it might be related to http://bugs.centos.org/view.php?id=6949

That looks likely.  As we bind to cores, we wouldn't see it for MPI
processes, at least, and will see higher performance generally.  (I read
or replied carelessly in thinking this was about binding, rather than a
possible scheduling issue.)

It really is time to take core binding seriously at least 15 years after
NUMA became significant.


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Dave Love
Bernd Dammann  writes:

> We use Moab/Torque, so we could use cpusets (but that has had some
> other side effects earlier, so we did not implement it in our setup).

I don't remember what Torque does, but core binding and (Linux) cpusets
are somewhat orthogonal.  While a cpuset will obviously restrict the
processes somewhat, it won't provide the necessary binding (at least
unless the resource manager launches the processes and uses a cpuset for
each).
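
As a rough illustration of the difference (a sketch only; taskset stands in
for a real cpuset here, and './app' is a placeholder):

  # restriction only: the whole job is confined to cores 0-7, but the
  # kernel may still stack two ranks on one core or migrate them around
  taskset -c 0-7 mpirun -np 8 ./app

  # per-rank binding: each rank is pinned to its own core
  mpirun --bind-to-core -np 8 ./app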

> Regardless of that, it looks strange to me that this combination of
> kernel and OMPI has such a negative side effect on application
> performance.

I assume you can determine whether or not it's the kernel rather than
ompi/ofed by booting into the old one.


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Bernd Dammann

On 3/2/14 0:44 AM, Tru Huynh wrote:

On Fri, Feb 28, 2014 at 08:49:45AM +0100, Bernd Dammann wrote:

Maybe I should say that we moved from SL 6.1 and OMPI 1.4.x to SL
6.4 with the above kernel, and OMPI 1.6.5 - which means a major
upgrade of our cluster.

After the upgrade, users reported those slowdowns, and a search on
this list showed that other sites had the same (or similar) issues
with this kernel and OMPI version combination.


afaik, 2.6.32-431 series is from RHEL (and clones) version >=6.5


You're right - the kernel is coming from the rolling release of SL.


otoh, it might be related to http://bugs.centos.org/view.php?id=6949

Thanks!!!  That was exactly the problem.  We patched the kernel and 
installed it on a few nodes, and so far testing looks promising.  We had 
the kernel scheduler on our radar, since we could see that there were 
differences compared to the old kernel we'd used before, but didn't have 
time to dig deeper into it yet.  Great work!  Let's hope this patch 
will make it into the official kernel.


Regards,
Bernd





Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-28 Thread Bernd Dammann

On 2/27/14 14:06, Noam Bernstein wrote:

On Feb 27, 2014, at 2:36 AM, Patrick Begou  
wrote:


Bernd Dammann wrote:

Using the workaround '--bind-to-core' only makes sense for those jobs that 
allocate full nodes, but the majority of our jobs don't do that.

Why ?
We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other 
applications to attach each process to its core, because sometimes Linux moves 
processes and two processes can run on the same core, slowing the application, 
even if we do not use full nodes.
'--bind-to-core' is not applicable only if you mix OpenMP and MPI, as all your 
threads will be bound to the same core, but I do not remember that OpenFOAM 
does this yet.


But if your jobs don't allocate full nodes and there are two jobs on the same 
node, they can end up bound to the same subset of cores.


Exactly, that's our problem!


Torque cpusets should in
principle be able to do this (queuing system allocates distinct sets of cores to
distinct jobs), but I've never used them myself.



We started to use them at some point, but they had some side effects 
(leaving dangling jobs/processes), so we stopped using them.  And 
certain ISV applications had issues as well.



Here we've just basically given up on jobs that allocate a non-integer # of
nodes.  In principle they can (and then I turn off bind by core), but hardly 
anyone
does it except for some serial jobs.  Then again, we have a mix of 8 and 16 core
nodes.  If we had only 32 or 64 core nodes we might be less tolerant of this
restriction.



We are running a system with a very inhomogeneous workload, i.e. 
in-house applications, or applications which we compile ourselves, but 
also 3rd-party applications that are not always designed with a 
(multi-user) cluster in mind.


Rgds,
Bernd



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-28 Thread Bernd Dammann

On 2/27/14 16:47, Dave Love wrote:

Bernd Dammann  writes:


Hi,

I found this thread from before Christmas, and I wondered what the
status of this problem is.  We have been experiencing the same problems since our
upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64, and
OpenMPI 1.6.5.

Users have reported severe slowdowns in all kinds of applications,
like VASP, OpenFOAM, etc.


I'm surprised a kernel change should be related to core binding, if
that's the issue, or caused your slowdown.  We were running that kernel
OK until recently with those sorts of applications and that OMPI version.


Maybe I should say that we moved from SL 6.1 and OMPI 1.4.x to SL 6.4 
with the above kernel, and OMPI 1.6.5 - which means a major upgrade of 
our cluster.


After the upgrade, users reported those slowdowns, and a search on this 
list showed that other sites had the same (or similar) issues with this 
kernel and OMPI version combination.



(The change to the default alltoallv collective algorithm in the OMPI
1.6 series, discussed in the archives, might affect you if you upgraded
through it.)



OK, thanks - I'll take a look at it.


Using the workaround '--bind-to-core' only makes sense for those
jobs that allocate full nodes, but the majority of our jobs don't do
that.


I don't consider it a workaround.  Just use a resource manager that
sorts it out for you.  For what it's worth, a recipe for SGE/OMPI is at
.  We're
happy with that (and seem to be at least on a par with Intel using
OMPI+GCC+OpenBLAS) now that users automatically get binding.


We use Moab/Torque, so we could use cpusets (but that has had some other 
side effects earlier, so we did not implement it in our setup).


Regardless of that, it looks strange to me that this combination of 
kernel and OMPI has such a negative side effect on application performance.


Rgds,
Bernd





Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-27 Thread John Hearns
Noam, cpusets are a very good idea.
Not only for CPU binding but for isolating 'badly behaved' applications.
If an application starts using huge amounts of memory - kill it, collapse
the cpuset and it is gone - a nice, clean way to manage jobs.


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-27 Thread Ralph Castain

On Feb 27, 2014, at 5:06 AM, Noam Bernstein  wrote:

> On Feb 27, 2014, at 2:36 AM, Patrick Begou 
>  wrote:
> 
>> Bernd Dammann wrote:
>>> Using the workaround '--bind-to-core' only makes sense for those jobs 
>>> that allocate full nodes, but the majority of our jobs don't do that.
>> Why ?
>> We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other 
>> applications to attach each process to its core, because sometimes Linux moves 
>> processes and two processes can run on the same core, slowing the application, 
>> even if we do not use full nodes.
>> '--bind-to-core' is not applicable only if you mix OpenMP and MPI, as all 
>> your threads will be bound to the same core, but I do not remember that 
>> OpenFOAM does this yet.
> 
> But if your jobs don't allocate full nodes and there are two jobs on the same 
> node, they can end up bound to the same subset of cores.  Torque cpusets should in 
> principle be able to do this (queuing system allocates distinct sets of cores to 
> distinct jobs), but I've never used them myself.
> 
> Here we've just basically given up on jobs that allocate a non-integer # of 
> nodes.  In principle they can (and then I turn off bind by core), but hardly 
> anyone 
> does it except for some serial jobs.  Then again, we have a mix of 8 and 16 
> core
> nodes.  If we had only 32 or 64 core nodes we might be less tolerant of this 
> restriction.

I don't know if the original poster is using a resource manager or not, but we 
can support multi-tenant operations regardless. If you are using a resource 
manager, you can ask the RM to bind your allocation to a specific number of 
cores on each node. OMPI will then respect that restriction, binding your 
processes to cores within it.

If you aren't using a resource manager, or simply want to run multiple jobs on 
your own dedicated nodes, you can impose the restriction yourself by just 
adding the --cpu-set option to your cmd line:

mpirun --cpu-set 0-3 ...

will restrict OMPI to using the first four cores on each node. Any 
comma-delimited set of ranges can be provided.
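
For instance (a sketch only, assuming two four-process jobs sharing a node;
the binary names are placeholders):

  mpirun --cpu-set 0-3 -np 4 ./job_a
  mpirun --cpu-set 4-7 -np 4 ./job_b

Each run is then confined to its own, disjoint set of cores, so the two jobs
cannot end up competing for the same cores.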

Even more mapping and binding options are provided in the 1.7 series, so you 
might want to look at it.

HTH
Ralph

> 
> 
>   
> Noam



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-27 Thread Noam Bernstein
On Feb 27, 2014, at 2:36 AM, Patrick Begou  
wrote:

> Bernd Dammann wrote:
>> Using the workaround '--bind-to-core' only makes sense for those jobs 
>> that allocate full nodes, but the majority of our jobs don't do that.
> Why ?
> We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other 
> applications to attach each process to its core, because sometimes Linux moves 
> processes and two processes can run on the same core, slowing the application, 
> even if we do not use full nodes.
> '--bind-to-core' is not applicable only if you mix OpenMP and MPI, as all your 
> threads will be bound to the same core, but I do not remember that OpenFOAM 
> does this yet.

But if your jobs don't allocate full nodes and there are two jobs on the same 
node, they can end up bound to the same subset of cores.  Torque cpusets should in 
principle be able to do this (queuing system allocates distinct sets of cores to
distinct jobs), but I've never used them myself.

Here we've just basically given up on jobs that allocate a non-integer # of 
nodes.  In principle they can (and then I turn off bind by core), but hardly 
anyone 
does it except for some serial jobs.  Then again, we have a mix of 8 and 16 core
nodes.  If we had only 32 or 64 core nodes we might be less tolerant of this 
restriction.



Noam

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-27 Thread Patrick Begou

Bernd Dammann wrote:
Using the workaround '--bind-to-core' only makes sense for those jobs 
that allocate full nodes, but the majority of our jobs don't do that.

Why ?
We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other 
applications to attach each process to its core, because sometimes Linux moves 
processes and two processes can run on the same core, slowing the application, 
even if we do not use full nodes.
'--bind-to-core' is not applicable only if you mix OpenMP and MPI, as all your 
threads will be bound to the same core, but I do not remember that OpenFOAM does 
this yet.


Patrick

--
===
|  Equipe M.O.S.T. |  |
|  Patrick BEGOU   | mailto:patrick.be...@grenoble-inp.fr |
|  LEGI|  |
|  BP 53 X | Tel 04 76 82 51 35   |
|  38041 GRENOBLE CEDEX| Fax 04 76 82 52 71   |
===



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-26 Thread Bernd Dammann

Hi,

I found this thread from before Christmas, and I wondered what the 
status of this problem is.  We have been experiencing the same problems since our 
upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64, and 
OpenMPI 1.6.5.


Users have reported severe slowdowns in all kinds of applications, like 
VASP, OpenFOAM, etc.


Using the workaround '--bind-to-core' only makes sense for those 
jobs that allocate full nodes, but the majority of our jobs don't do that.


Is there any news on this issue?

Regards,
Bernd

--
DTU Computing Center
Technical University of Denmark


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Noam Bernstein
On Dec 18, 2013, at 5:19 PM, Martin Siegert  wrote:
> 
> Thanks for figuring this out. Does this work for 1.6.x as well?
> The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
> covers versions 1.2.x to 1.5.x. 
> Does 1.6.x support mpi_paffinity_alone = 1 ?
> I set this in openmpi-mca-params.conf but
> 
> # ompi_info | grep affinity
>  MPI extensions: affinity example
>   MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
>   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.4)
>   MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
> 
> does not give any indication that this is actually used.

I never checked actual bindings with hwloc-ps or anything like that,
but as far as I can tell, 1.6.4 had consistently high performance when I
used mpi_paffinity_alone=1, and slowdowns of up to a factor of ~2
when I didn't.  1.7.3 with the old kernel never showed extreme slowdowns,
but we didn't benchmark it carefully, so it's conceivable it had minor
(same factor of 2) slowdowns.  With the new kernel 1.7.3 would
show slowdowns between a factor of 2 and maybe 20 (paffinity definitely
did nothing), and "--bind-to core" restored consistent performance.
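
A quick way to confirm that the binding is actually applied (a sketch,
assuming an OMPI version that has --report-bindings; './app' is a placeholder):

  mpirun --bind-to core --report-bindings -np 8 ./app
  # mpirun then prints each rank's binding mask at launch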


Noam



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Dave Love
Brice Goglin  writes:

> hwloc-ps (and lstopo --top) are better at showing process binding but
> they lack a nice pseudographical interface with dynamic refresh.

That seems like an advantage when you want to check on a cluster!

> htop uses hwloc internally iirc, so there's hope we'll have everything needed 
> in htop one day ;)

Apparently not in RH EPEL, for what it's worth, and I don't understand
how to get bindings out of it.



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Dave Love
Noam Bernstein  writes:

> On Dec 18, 2013, at 10:32 AM, Dave Love  wrote:
>
>> Noam Bernstein  writes:
>> 
>>> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in 
>>> some 
>>> collective communication), but now I'm wondering whether I should just test
>>> 1.6.5.
>> 
>> What bug, exactly?  As you mentioned vasp, is it specifically affecting
>> that?
>
> Yes - I never characterized it fully, but we attached with gdb to every
> single vasp running process, and all were stuck in the same
> call to MPI_allreduce() every time. It's only happening on rather large 
> jobs, so it's not the easiest setup to debug.  

Maybe that's a different problem.  I know they tried multiple versions
of vasp, which had different failures.  Actually, I just remembered that
the version I examined with padb was built with the intel compiler and
run with gcc openmpi (I know...), but builds with gcc failed too.  I
don't know if that was taken up with the developers.

I guess this isn't the place to discuss vasp, unless it's helping to pin
down an ompi problem, but people might benefit from notes of problems in
the archive.

> If I can reproduce the problem with 1.6.5, and I can confirm that it's always 
> locking up in the same call to mpi_allreduce, and all processes are stuck 
> in the same call, is there interest in looking into a possible mpi issue?  

I'd have thought so from the point of view of those of us running 1.6
for compatibility with the RHEL6 openmpi.

Thanks for the info, anyhow.

Incidentally, if vasp is built with ompi's alltoallv -- I understand it
has its own implementation of that or something similar --
 may be
relevant, if you haven't seen it.



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Martin Siegert
Hi,

expanding on Noam's problem a bit ...

On Wed, Dec 18, 2013 at 10:19:25AM -0500, Noam Bernstein wrote:
> Thanks to all who answered my question.  The culprit was an interaction 
> between
> 1.7.3 not supporting mpi_paffinity_alone (which we were using previously) and 
> the new 
> kernel.  Switching to --bind-to core (actually the environment variable 
> OMPI_MCA_hwloc_base_binding_policy=core) fixed the problem.
> 
> Noam

Thanks for figuring this out. Does this work for 1.6.x as well?
The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
covers versions 1.2.x to 1.5.x. 
Does 1.6.x support mpi_paffinity_alone = 1 ?
I set this in openmpi-mca-params.conf but

# ompi_info | grep affinity
  MPI extensions: affinity example
   MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.4)
   MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)

does not give any indication that this is actually used.
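
One way to check whether the parameter is recognised and picked up at all
(a sketch; './a.out' is a placeholder and the exact output differs between
versions):

  ompi_info --all | grep paffinity_alone   # should list mpi_paffinity_alone and its current value
  mpirun --report-bindings -np 2 ./a.out   # reports each rank's binding, if any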

Cheers,
Martin

-- 
Martin Siegert
WestGrid/ComputeCanada
Simon Fraser University


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Ake Sandgren
On Wed, 2013-12-18 at 11:47 -0500, Noam Bernstein wrote: 
> Yes - I never characterized it fully, but we attached with gdb to every
> single vasp running process, and all were stuck in the same
> call to MPI_allreduce() every time. It's only happening on rather large 
> jobs, so it's not the easiest setup to debug.  

That sounds like one of the bugs I found in VASP.
Could you send me the input data that triggers this (with info on how it
was run, i.e. #mpi-tasks etc.) and I can check if our heavily patched
version hits it.

/Åke S.



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Brice Goglin
hwloc-ps (and lstopo --top) are better at showing process binding but they lack 
a nice pseudographical interface with dynamic refresh.
htop uses hwloc internally iirc, so there's hope we'll have everything needed 
in htop one day ;)
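
For example (a sketch; both commands ship with hwloc and would be run on a
compute node while the job is active):

  hwloc-ps          # one line per bound process: PID, cpuset, command
  lstopo --top      # draws the machine topology with the running tasks shown inside it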
Brice



Dave Love  wrote:
>John Hearns  writes:
>
>> 'Htop' is a very good tool for looking at where processes are
>running.
>
>I'd have thought hwloc-ps is the tool for that.


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Dave Love
John Hearns  writes:

> 'Htop' is a very good tool for looking at where processes are running.

I'd have thought hwloc-ps is the tool for that.


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Dave Love
Noam Bernstein  writes:

> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some 
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.

What bug, exactly?  As you mentioned vasp, is it specifically affecting
that?

We have seen apparent deadlocks with vasp -- which users assure me is
due to malfunctioning hardware and/or batch system -- but I don't think
there was any evidence of it being due to openmpi (1.4 and 1.6 on
different systems here).  I didn't have the padb --deadlock mode working
properly at the time I looked at one, but it seemed just to be stuck
with some ranks in broadcast and the rest in barrier.  Someone else put
a parallel debugger on it, but I'm not sure if there was a conclusive
result, and I'm not very interested in debugging proprietary programs.


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-18 Thread Noam Bernstein
Thanks to all who answered my question.  The culprit was an interaction between
1.7.3 not supporting mpi_paffinity_alone (which we were using previously) and 
the new 
kernel.  Switching to --bind-to core (actually the environment variable 
OMPI_MCA_hwloc_base_binding_policy=core) fixed the problem.
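
In case it helps anyone else, a sketch of what that looks like in a job
script (the process count and binary are placeholders):

  export OMPI_MCA_hwloc_base_binding_policy=core   # same effect as 'mpirun --bind-to core'
  mpirun -np 16 ./vasp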


Noam



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Maxime Boissonneault

Hi,
Do you have thread multiple support enabled in your OpenMPI installation?

Maxime Boissonneault

On 2013-12-16 17:40, Noam Bernstein wrote:

Has anyone tried to use openmpi 1.7.3 with the latest CentOS kernel
(well, nearly latest: 2.6.32-431.el6.x86_64), and especially with infiniband?

I'm seeing lots of weird slowdowns, especially when using infiniband,
but even when running with "--mca btl self,sm" (it's much worse with
IB, though), so I was wondering if anyone else has tested this kernel yet?

Once I have some more detailed information I'll follow up.

Noam





Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Ralph Castain
OMPI_MCA_hwloc_base_binding_policy=core


On Dec 17, 2013, at 8:40 AM, Noam Bernstein  wrote:

> On Dec 17, 2013, at 11:04 AM, Ralph Castain  wrote:
> 
>> Are you binding the procs? We don't bind by default (this will change in 
>> 1.7.4), and binding can play a significant role when comparing across 
>> kernels.
>> 
>> add "--bind-to-core" to your cmd line
> 
> Now that it works, is there a way to set it via an environment variable, or 
> do I have to put it
> on the command line each time?
> 
>   
> Noam



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Noam Bernstein
On Dec 17, 2013, at 11:04 AM, Ralph Castain  wrote:

> Are you binding the procs? We don't bind by default (this will change in 
> 1.7.4), and binding can play a significant role when comparing across kernels.
> 
> add "--bind-to-core" to your cmd line

Now that it works, is there a way to set it via an environment variable, or do 
I have to put it
on the command line each time?


Noam



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Noam Bernstein
On Dec 17, 2013, at 11:04 AM, Ralph Castain  wrote:

> Are you binding the procs? We don't bind by default (this will change in 
> 1.7.4), and binding can play a significant role when comparing across kernels.
> 
> add "--bind-to-core" to your cmd line

Yeay - it works.  Thank you very much for the help.  I guess something must have
changed with the default binding for new kernel + 1.7.3.  


Noam



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Nathan Hjelm
On Tue, Dec 17, 2013 at 11:16:48AM -0500, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain  wrote:
> 
> > Are you binding the procs? We don't bind by default (this will change in 
> > 1.7.4), and binding can play a significant role when comparing across 
> > kernels.
> > 
> > add "--bind-to-core" to your cmd line
> 
> I've previously always used mpi_paffinity_alone=1, and the new behavior
> seems to be independent of whether or not I use it.  I'll try bind-to-core.

That would be the problem. That variable no longer exists in 1.7.4 and
has been replaced by hwloc_base_binding_policy. --bind-to core is an
alias of -mca hwloc_base_binding_policy core.
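
In other words, these spellings should be equivalent on the 1.7 series (a
sketch, using the names given above; './app' is a placeholder):

  mpirun --bind-to core -np 8 ./app
  mpirun -mca hwloc_base_binding_policy core -np 8 ./app
  OMPI_MCA_hwloc_base_binding_policy=core mpirun -np 8 ./app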

> One more possible clue.  I haven't done a full test, but for one
> particular setup (newer nodes, single node so presumably using
> sm), there are apparently two ways to fix the problem:
> 1. go back to the previous kernel, but stick with openmpi 1.7.3
> 2. stick with the new kernel, but go back to openmpi 1.6.4
> 
> So it appears to be some interaction between the new kernel and 1.7.3 that
> isn't present with 1.6.4.
> 
> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some 
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.
> 
>   Noam
> 








Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Noam Bernstein
On Dec 17, 2013, at 11:04 AM, Ralph Castain  wrote:

> Are you binding the procs? We don't bind by default (this will change in 
> 1.7.4), and binding can play a significant role when comparing across kernels.
> 
> add "--bind-to-core" to your cmd line

I've previously always used mpi_paffinity_alone=1, and the new behavior
seems to be independent of whether or not I use it.  I'll try bind-to-core.

One more possible clue.  I haven't done a full test, but for one
particular setup (newer nodes, single node so presumably using
sm), there are apparently two ways to fix the problem:
1. go back to the previous kernel, but stick with openmpi 1.7.3
2. stick with the new kernel, but go back to openmpi 1.6.4

So it appears to be some interaction between the new kernel and 1.7.3 that
isn't present with 1.6.4.

We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some 
collective communication), but now I'm wondering whether I should just test
1.6.5.

Noam





Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread John Hearns
'Htop' is a very good tool for looking at where processes are running.


Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Ralph Castain
Are you binding the procs? We don't bind by default (this will change in 
1.7.4), and binding can play a significant role when comparing across kernels.

add "--bind-to-core" to your cmd line


On Dec 17, 2013, at 7:09 AM, Noam Bernstein  wrote:

> On Dec 16, 2013, at 5:40 PM, Noam Bernstein  
> wrote:
> 
>> 
>> Once I have some more detailed information I'll follow up.
> 
> OK - I've tried to characterize the behavior with vasp, which accounts for
> most of our cluster usage, and it's quite odd.  I ran my favorite benchmarking
> job repeated 4 times. As you can see below, in some
> cases using sm it's as fast as before (kernel 2.6.32-358.23.2.el6.x86_64),
> but mostly it's a factor of 2 slower.  With openib and our older nodes it's 
> always a 
> factor of 2-4 slower.  With the newer nodes in a situation where using sm is
> possible it's occasionally as fast as before, but sometimes it's 10-20 times
> slower.  When using ib with the new nodes it's always much slower than before.
> 
> openmpi is 1.7.3, recompiled with the new kernel.  vasp is 5.3.3, which we've
> been using for months.  Everything is compiled with an older stable version
> of the intel compiler, as we've been doing for a long time.
> 
> More perhaps useful information - I don't have actual data from the previous
> setup (perhaps I should roll back some nodes and check), but I generally
> expect to see 100% cpu usage on all the processes, either because they're
> doing numeric stuff, or doing a busy-wait for mpi.  However, now I see a few 
> of the vasp processes at 100%, and the others at 50-70% (say 4-6 on a given
> node at 100%, and the rest lower). 
> 
> If anyone has any ideas on what's going on, or how to debug further, I'd
> really appreciate some suggestions.
> 
>   
> Noam
> 
> 8 core nodes (dual Xeon X5550)
> 
> 8 MPI procs (single node)
> used to be 5.74 s
> now:
> btl: default  or sm only or sm+openib: 5.5-9.3 s, mostly the larger times
> btl: openib: 10.0-12.2 s
> 
> 16 MPI procs (2 nodes)
> used to be 2.88 s
> btl default or openib or sm+openib: 4.8 - 6.23 s
> 
> 32 MPI procs (4 nodes)
> used to be 1.59 s
> btl default or openib or sm+openib: 2.73-4.49 s, but sometimes just fails
> 
> at least once gave the errors (stack trace is incomplete, but probably on 
> mpi_comm_rank, mpi_comm_size, or mpi_barrier)
> [compute-3-24:32566] [[59587,0],0]:route_callback trying to get message from 
> [[59587,1],20] to [[59587,1],28]:102, routing loop
> [0] 
> func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_backtrace_print+0x1f)
>  [0x2b5940c2dd9f]
> [1] 
> func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_rml_oob.so(+0x22b6)
>  [0x2b5941f0f2b6]
> [2] 
> func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_recv_complete+0x27f)
>  [0x2b59441f]
> [3] 
> func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(+0x9d3a)
>  [0x2b5943334d3a]
> [4] 
> func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x8bc)
>  [0x2b5940c3592c]
> [5] func:mpirun(orterun+0xe25) [0x404565]
> [6] func:mpirun(main+0x20) [0x403594]
> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3091c1ed1d]
> [8] func:mpirun() [0x4034b9]
> 
> 
> 16 core nodes (dual Xeon E5-2670)
> 
> 8 MPI procs (single node)
> not sure what it used to be, but 3.3 s is plausible
> btl: default or sm or openib+sm: 3.3-3.4 s
> btl: openib 3.9-4.14 s
> 
> 16 MPI procs (single node)
> used to be 2.07 s
> btl default or openib: 23.0-32.56 s
> btl sm or sm+openib: 1.94 s - 39.27 s (mostly the slower times)
> 
> 32 MPI procs (2 nodes)
> used to be 1.24 s
> btl default or sm or openib or sm+openib: 30 s - 97 s



Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread Noam Bernstein
On Dec 16, 2013, at 5:40 PM, Noam Bernstein  wrote:

> 
> Once I have some more detailed information I'll follow up.

OK - I've tried to characterize the behavior with vasp, which accounts for
most of our cluster usage, and it's quite odd.  I ran my favorite benchmarking
job repeated 4 times. As you can see below, in some
cases using sm it's as fast as before (kernel 2.6.32-358.23.2.el6.x86_64),
but mostly it's a factor of 2 slower.  With openib and our older nodes it's 
always a 
factor of 2-4 slower.  With the newer nodes in a situation where using sm is
possible it's occasionally as fast as before, but sometimes it's 10-20 times
slower.  When using ib with the new nodes it's always much slower than before.

openmpi is 1.7.3, recompiled with the new kernel.  vasp is 5.3.3, which we've
been using for months.  Everything is compiled with an older stable version
of the intel compiler, as we've been doing for a long time.

More perhaps useful information - I don't have actual data from the previous
setup (perhaps I should roll back some nodes and check), but I generally
expect to see 100% cpu usage on all the processes, either because they're
doing numeric stuff, or doing a busy-wait for mpi.  However, now I see a few 
of the vasp processes at 100%, and the others at 50-70% (say 4-6 on a given
node at 100%, and the rest lower). 

If anyone has any ideas on what's going on, or how to debug further, I'd
really appreciate some suggestions.


Noam

8 core nodes (dual Xeon X5550)

8 MPI procs (single node)
used to be 5.74 s
now:
btl: default  or sm only or sm+openib: 5.5-9.3 s, mostly the larger times
btl: openib: 10.0-12.2 s

16 MPI procs (2 nodes)
used to be 2.88 s
btl default or openib or sm+openib: 4.8 - 6.23 s

32 MPI procs (4 nodes)
used to be 1.59 s
btl default or openib or sm+openib: 2.73-4.49 s, but sometimes just fails

at least once gave the errors (stack trace is incomplete, but probably on 
mpi_comm_rank, mpi_comm_size, or mpi_barrier)
[compute-3-24:32566] [[59587,0],0]:route_callback trying to get message from 
[[59587,1],20] to [[59587,1],28]:102, routing loop
[0] 
func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_backtrace_print+0x1f)
 [0x2b5940c2dd9f]
[1] 
func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_rml_oob.so(+0x22b6) 
[0x2b5941f0f2b6]
[2] 
func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_recv_complete+0x27f)
 [0x2b59441f]
[3] 
func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(+0x9d3a) 
[0x2b5943334d3a]
[4] 
func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x8bc)
 [0x2b5940c3592c]
[5] func:mpirun(orterun+0xe25) [0x404565]
[6] func:mpirun(main+0x20) [0x403594]
[7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3091c1ed1d]
[8] func:mpirun() [0x4034b9]


16 core nodes (dual Xeon E5-2670)

8 MPI procs (single node)
not sure what it used to be, but 3.3 s is plausible
btl: default or sm or openib+sm: 3.3-3.4 s
btl: openib 3.9-4.14 s

16 MPI procs (single node)
used to be 2.07 s
btl default or openib: 23.0-32.56 s
btl sm or sm+openib: 1.94 s - 39.27 s (mostly the slower times)

32 MPI procs (2 nodes)
used to be 1.24 s
btl default or sm or openib or sm+openib: 30 s - 97 s
