Tru Huynh writes:
> afaik, the 2.6.32-431 kernel series is from RHEL (and clones) version >= 6.5
[Right.]
> otoh, it might be related to http://bugs.centos.org/view.php?id=6949
That looks likely. As we bind to cores, we wouldn't see it for MPI
processes, at least, and will see higher
Bernd Dammann writes:
> We use Moab/Torque, so we could use cpusets (but that has had some
> other side effects earlier, so we did not implement it in our setup).
I don't remember what Torque does, but core binding and (Linux) cpusets
are somewhat orthogonal. While a cpuset
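Roughly, the distinction is this (the mount point and job name below are
illustrative, not anything Torque mandates):

    # Core binding: the MPI runtime pins each rank to a core at launch time.
    mpirun --bind-to-core -np 8 ./app

    # A cpuset: the kernel confines a whole tree of PIDs to a CPU/memory
    # partition, independent of any binding the processes request themselves.
    mount -t cgroup -o cpuset cpuset /cgroup/cpuset   # EL6-style mount point
    mkdir /cgroup/cpuset/job42
    echo 0-7 > /cgroup/cpuset/job42/cpuset.cpus
    echo 0   > /cgroup/cpuset/job42/cpuset.mems
    echo $$  > /cgroup/cpuset/job42/tasks   # confine this shell and its children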
On 3/2/14 00:44, Tru Huynh wrote:
On Fri, Feb 28, 2014 at 08:49:45AM +0100, Bernd Dammann wrote:
Maybe I should say that we moved from SL 6.1 and OMPI 1.4.x to SL
6.4 with the above kernel, and OMPI 1.6.5 - which means a major
upgrade of our cluster.
After the upgrade, users reported those
On 2/27/14 14:06, Noam Bernstein wrote:
On Feb 27, 2014, at 2:36 AM, Patrick Begou
wrote:
Bernd Dammann wrote:
Using the workaround '--bind-to-core' only makes sense for jobs that
allocate full nodes, but the majority of our jobs don't do that.
On 2/27/14 16:47, Dave Love wrote:
Bernd Dammann writes:
Hi,
I found this thread from before Christmas, and I wondered what the
status of this problem is. We experience the same problems since our
upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64, and
Noam, cpusets are a very good idea.
Not only for CPU binding but for isolating 'badly behaved' applications.
If an application starts using huge amounts of memory, kill it, collapse
the cpuset, and it is gone - a nice, clean way to manage jobs.
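As a rough sketch of that teardown (EL6-style cgroup mount and per-job
directory assumed, as in the sketch earlier in the thread):

    # Kill everything the job put in its cpuset, then remove the empty set.
    JOB=/cgroup/cpuset/job42
    while read pid; do kill -9 "$pid"; done < "$JOB/tasks"
    rmdir "$JOB"   # only succeeds once 'tasks' is empty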
On Feb 27, 2014, at 5:06 AM, Noam Bernstein wrote:
> On Feb 27, 2014, at 2:36 AM, Patrick Begou
> wrote:
>
>> Bernd Dammann wrote:
>> Using the workaround '--bind-to-core' only makes sense for jobs that
>> allocate full nodes, but the majority of our jobs don't do that.
On Feb 27, 2014, at 2:36 AM, Patrick Begou
wrote:
> Bernd Dammann wrote:
>> Using the workaround '--bind-to-core' only makes sense for jobs that
>> allocate full nodes, but the majority of our jobs don't do that.
> Why ?
> We still use this option
Bernd Dammann wrote:
Using the workaround '--bind-to-core' only makes sense for jobs that
allocate full nodes, but the majority of our jobs don't do that.
Why ?
We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other
applications to bind each process to its core.
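For example (solver and rank count are placeholders; note the option
spelling changed between release series):

    # OpenMPI 1.6.x spelling:
    mpirun -np 16 --bind-to-core simpleFoam -parallel
    # OpenMPI 1.7.x and later spelling:
    mpirun -np 16 --bind-to core simpleFoam -parallel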
Hi,
I found this thread from before Christmas, and I wondered what the
status of this problem is. We experience the same problems since our
upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64, and
OpenMPI 1.6.5.
Users have reported severe slowdowns in all kinds of
On Dec 18, 2013, at 5:19 PM, Martin Siegert wrote:
>
> Thanks for figuring this out. Does this work for 1.6.x as well?
> The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
> covers versions 1.2.x to 1.5.x.
> Does 1.6.x support mpi_paffinity_alone = 1?
> I set
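For reference, the usual ways such an MCA parameter gets set are a
parameter file or the command line (whether 1.6.x still honours this
particular one is exactly the question above):

    # Per-user MCA parameter file: ~/.openmpi/mca-params.conf
    mpi_paffinity_alone = 1

    # Or per run:
    mpirun --mca mpi_paffinity_alone 1 -np 8 ./app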
Brice Goglin writes:
> hwloc-ps (and lstopo --top) are better at showing process binding but
> they lack a nice pseudographical interface with dynamic refresh.
That seems like an advantage when you want to check on a cluster!
> htop uses hwloc internally iirc, so there's
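A sketch of what that non-interactive, cluster-wide check might look
like (node names are placeholders):

    # Dump the binding of every bound process on each node; no TUI needed.
    for n in node01 node02 node03; do
        echo "== $n =="
        ssh "$n" hwloc-ps
    done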
Noam Bernstein writes:
> On Dec 18, 2013, at 10:32 AM, Dave Love wrote:
>
>> Noam Bernstein writes:
>>
>>> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in
>>> some
>>> collective
Hi,
expanding on Noam's problem a bit ...
On Wed, Dec 18, 2013 at 10:19:25AM -0500, Noam Bernstein wrote:
> Thanks to all who answered my question. The culprit was an interaction
> between
> 1.7.3 not supporting mpi_paffinity_alone (which we were using previously) and
> the new
> kernel.
On Wed, 2013-12-18 at 11:47 -0500, Noam Bernstein wrote:
> Yes - I never characterized it fully, but we attached with gdb to every
> single running vasp process, and all were stuck in the same
> call to MPI_Allreduce() every time. It's only happening on rather large
> jobs, so it's not the
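For the record, that kind of mass-attach can be scripted; a minimal
sketch (process name illustrative):

    # Attach to each vasp process in turn, print a backtrace, detach.
    for pid in $(pgrep vasp); do
        echo "== PID $pid =="
        gdb -batch -p "$pid" -ex "bt"
    done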
hwloc-ps (and lstopo --top) are better at showing process binding but they lack
a nice pseudographical interface with dynamic refresh.
htop uses hwloc internally iirc, so there's hope we'll have everything needed
in htop one day ;)
Brice
Dave Love wrote:
> John Hearns writes:
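For anyone who hasn't tried the hwloc tools, the basic invocations are
roughly:

    hwloc-ps        # processes that are bound, with their binding
    hwloc-ps -a     # all processes, bound or not
    lstopo --top    # topology diagram with running processes overlaid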
John Hearns writes:
> 'Htop' is a very good tool for looking at where processes are running.
I'd have thought hwloc-ps is the tool for that.
Noam Bernstein writes:
> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.
What bug, exactly? As you mentioned vasp, is it specifically affecting
Thanks to all who answered my question. The culprit was an interaction between
1.7.3 not supporting mpi_paffinity_alone (which we were using previously) and
the new
kernel. Switching to --bind-to core (actually the environment variable
OMPI_MCA_hwloc_base_binding_policy=core) fixed the
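i.e., something along these lines (binary name illustrative):

    # Equivalent to adding '--bind-to core' to every mpirun invocation:
    export OMPI_MCA_hwloc_base_binding_policy=core
    mpirun -np 16 ./vasp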
Hi,
Do you have thread multiple support enabled in your OpenMPI installation?
Maxime Boissonneault
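One quick way to check a build (the grep is just a convenience):

    # The 'Thread support' line shows whether MPI_THREAD_MULTIPLE is enabled.
    ompi_info | grep -i thread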
On 2013-12-16 17:40, Noam Bernstein wrote:
Has anyone tried to use openmpi 1.7.3 with the latest CentOS kernel
(well, nearly latest: 2.6.32-431.el6.x86_64), and especially with infiniband?
I'm
OMPI_MCA_hwloc_base_binding_policy=core
On Dec 17, 2013, at 8:40 AM, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
>
>> Are you binding the procs? We don't bind by default (this will change in
>> 1.7.4), and
On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
> Are you binding the procs? We don't bind by default (this will change in
> 1.7.4), and binding can play a significant role when comparing across kernels.
>
> add "--bind-to-core" to your cmd line
Now that it works, is
On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
> Are you binding the procs? We don't bind by default (this will change in
> 1.7.4), and binding can play a significant role when comparing across kernels.
>
> add "--bind-to-core" to your cmd line
Yeay - it works. Thank
On Tue, Dec 17, 2013 at 11:16:48AM -0500, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
>
> > Are you binding the procs? We don't bind by default (this will change in
> > 1.7.4), and binding can play a significant role when comparing across
> >
On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
> Are you binding the procs? We don't bind by default (this will change in
> 1.7.4), and binding can play a significant role when comparing across kernels.
>
> add "--bind-to-core" to your cmd line
I've previously always
'Htop' is a very good tool for looking at where processes are running.
Are you binding the procs? We don't bind by default (this will change in
1.7.4), and binding can play a significant role when comparing across kernels.
add "--bind-to-core" to your cmd line
On Dec 17, 2013, at 7:09 AM, Noam Bernstein wrote:
> On Dec 16, 2013, at
On Dec 16, 2013, at 5:40 PM, Noam Bernstein wrote:
>
> Once I have some more detailed information I'll follow up.
OK - I've tried to characterize the behavior with vasp, which accounts for
most of our cluster usage, and it's quite odd. I ran my favorite