If you get a chance, you might test this patch:
https://github.com/open-mpi/ompi-release/pull/656
I think it will resolve the problem you mentioned, and is small enough to go
into 1.10.1
Ralph
Sorry, I think I confused one thing:
On 10/08/2015 09:15 PM, marcin.krotkiewski wrote:
For version 1.10.1rc1 and up the situation is a bit different: it
seems that in many cases all cores are present in the cpuset, but the
binding often does not take place. Instead,
I agree that makes sense. I’ve been somewhat limited in my ability to work on
this lately, and I think Gilles has been in a similar situation. I’ll try to
create a 1.10 patch later today. Depending on how minimal I can make it, we may
still be able to put it into 1.10.1, though the window on that
Dear Ralph, Gilles, and Jeff
Thanks a lot for your effort. This problem has been a very interesting
exercise for me, and it let me understand OpenMPI much better (I think :)).
I have given it all a little more thought, and done some more tests on
our production system, and I think
Hi, Gilles,
I have briefly tested your patch with master. So far everything works. I
must say, what I really like about this version is that with
--report-bindings it actually shows what the heterogeneous architecture
looks like, i.e., the varying number of cores/sockets per compute node. This
I’m a little nervous about this one, Gilles. It’s doing a lot more than just
addressing the immediate issue, and I’m concerned about potential
side-effects that we don’t fully uncover prior to release.
I’d suggest a two-pronged approach:
1. use my alternative method for 1.10.1 to solve the
Is this something that needs to go into v1.10.1?
If so, a PR needs to be filed ASAP. We were supposed to make the next 1.10.1
RC yesterday, but it slipped to today due to some last-second patches.
Marcin,
here is a patch for the master, hopefully it fixes all the issues we
discussed
i will make sure it applies cleanly against the latest 1.10 tarball tomorrow
Cheers,
Gilles
Gilles,
Yes, it seemed that all was fine with binding in the patched 1.10.1rc1 -
thank you. Eagerly waiting for the other patches, let me know and I will
test them later this week.
Marcin
Marcin,
my understanding is that in this case, patched v1.10.1rc1 is working just
fine.
am I right?
I prepared two patches:
one to remove the warning when binding to one core if only one core is
available,
and another one to add a warning if the user asks for a binding policy that
makes no sense with
I filed an issue to track this problem here:
https://github.com/open-mpi/ompi/issues/978
Thanks Marcin. I think we have three things we need to address:
1. the warning needs to be emitted regardless of whether or not
--report-bindings was given. Not sure how that warning got “covered” by the
option, but it is clearly a bug
2. improve the warning to include binding info - relatively
Hi, Gilles
you mentioned you had one failure with 1.10.1rc1 and -bind-to core
could you please send the full details (script, allocation and output)
in your slurm script, you can do
srun -N $SLURM_NNODES -n $SLURM_NNODES --cpu_bind=none -l grep
Cpus_allowed_list /proc/self/status
before
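For reference, a minimal C equivalent of that grep (it just scans
/proc/self/status, a Linux-specific file; run it under srun the same way):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) { perror("fopen"); return 1; }
    while (fgets(line, sizeof(line), f) != NULL) {
        /* the line looks like, e.g., "Cpus_allowed_list: 0-1,8-9,16-17,24-25" */
        if (strncmp(line, "Cpus_allowed_list:", 18) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}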
Marcin,
there is no need to pursue 1.10.0 since it is known to be broken for some
scenarios.
it would really help me if you could provide the logs I requested, so I can
reproduce the issue and make sure we both talk about the same scenario.
imho, there is no legitimate reason to -map-by hwthread
I have applied the patch to both 1.10.0 and 1.10.1rc1. For 1.10.0 it did
not help - I am not sure how much (if at all) you want to pursue this.
For 1.10.1rc1 I was so far unable to reproduce any binding problems with
jobs of up to 128 tasks. Some cosmetic suggestions. The warning it all
started with
I think this is okay, in general. I would only make one change: I would only
search for an alternative site if the binding policy wasn’t set by the user. If
the user specifies a mapping/binding pattern, then we should error out as we
cannot meet it.
I did think of one alternative that might be
Ralph and Marcin,
here is a proof of concept for a fix (assert should be replaced with
proper error handling)
for v1.10 branch.
if you have any chance to test it, please let me know the results
Cheers,
Gilles
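To illustrate the "assert should be replaced with proper error handling"
remark above, here is the shape such a change might take - purely
illustrative, with hypothetical names (MY_ERR_NOT_FOUND, bind_downwards_sketch),
not the actual patch; the real code lives in orte/mca/rmaps and uses ORTE
error codes:

#include <stdio.h>

/* Hypothetical stand-ins for the real error codes. */
#define MY_SUCCESS        0
#define MY_ERR_NOT_FOUND (-13)

static int bind_downwards_sketch(void *trg_obj)
{
    /* the proof of concept did roughly: assert(NULL != trg_obj); */
    if (NULL == trg_obj) {
        /* proper error handling: report and propagate instead of aborting */
        fprintf(stderr, "no bindable object found inside the cpuset\n");
        return MY_ERR_NOT_FOUND;
    }
    /* ... perform the actual binding ... */
    return MY_SUCCESS;
}

int main(void)
{
    return bind_downwards_sketch(NULL) == MY_SUCCESS ? 0 : 1;
}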
OK, i'll see what i can do :-)
I would consider that a bug, myself - if there is some resource available, we
should use it
Marcin,
i ran a simple test with v1.10.1rc1 under a cpuset with
- one core (two threads 0,16) on socket 0
- two cores (two threads each 8,9,24,25) on socket 1
$ mpirun -np 3 -bind-to core ./hello_c
--
A request was made to
Hi, all,
I played a bit more and it seems that the problem results from
trg_obj = opal_hwloc_base_find_min_bound_target_under_obj()
called in rmaps_base_binding.c / bind_downwards being wrong. I do not
know the reason, but I think I know when the problem happens (at least
on 1.10.1rc1). It
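For readers following along: conceptually, that helper picks the least-loaded
object of a given type inside the parent's cpuset. A rough sketch of the idea
(not the actual Open MPI source; the per-object usage counter `nbound` here is
hypothetical):

#include <hwloc.h>
#include <limits.h>

/* Sketch only: return the object of `type` inside `parent` with the fewest
 * processes already bound to it. `nbound` is a hypothetical caller-maintained
 * counter indexed by the object's logical index. */
static hwloc_obj_t find_min_bound(hwloc_topology_t topo, hwloc_obj_t parent,
                                  hwloc_obj_type_t type, const unsigned *nbound)
{
    hwloc_obj_t obj = NULL, best = NULL;
    unsigned min = UINT_MAX;

    while ((obj = hwloc_get_next_obj_inside_cpuset_by_type(topo, parent->cpuset,
                                                           type, obj)) != NULL) {
        if (nbound[obj->logical_index] < min) {
            min = nbound[obj->logical_index];
            best = obj;
        }
    }
    return best;   /* NULL if no such object exists inside the cpuset */
}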
Ralph,
I suspect ompi tries to bind to threads outside the cpuset.
this could be pretty similar to a previous issue when ompi tried to bind to
cores outside the cpuset.
/* when a core has more than one thread, would ompi assume all the threads
are available if the core is available ? */
I will
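One way to check that hypothesis from outside Open MPI is to enumerate cores
whose hwthreads are only partially inside the allowed cpuset. A sketch using
the hwloc 1.x API (HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM makes hwloc report PUs
outside our cpuset as well):

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t core = NULL;

    hwloc_topology_init(&topo);
    /* see the full machine, not just the cpuset we were launched under */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);

    hwloc_const_cpuset_t allowed = hwloc_topology_get_allowed_cpuset(topo);

    while ((core = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_CORE, core)) != NULL) {
        int any = hwloc_bitmap_intersects(core->cpuset, allowed);
        int all = hwloc_bitmap_isincluded(core->cpuset, allowed);
        if (any && !all)
            printf("Core P#%u: only some of its hwthreads are in the cpuset\n",
                   core->os_index);
    }
    hwloc_topology_destroy(topo);
    return 0;
}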
Thanks - please go ahead and release that allocation as I’m not going to get to
this immediately. I’ve got several hot irons in the fire right now, and I’m not
sure when I’ll get a chance to track this down.
Gilles or anyone else who might have time - feel free to take a gander and see
if
Done. I have compiled 1.10.0 and 1.10.1rc1 with --enable-debug and executed
mpirun --mca rmaps_base_verbose 10 --hetero-nodes --report-bindings
--bind-to core -np 32 ./affinity
In the case of 1.10.1rc1 I have also added :overload-allowed - output in a
separate file. This option did not make much
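The actual ./affinity program above is Marcin's; a minimal stand-in that
prints each rank's CPU mask might look like this (Linux-specific
sched_getaffinity):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, cpu, off = 0;
    cpu_set_t mask;
    char buf[2048] = "";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_getaffinity");

    /* collect the OS indices of the CPUs this rank may run on */
    for (cpu = 0; cpu < CPU_SETSIZE && off < (int)sizeof(buf) - 16; cpu++)
        if (CPU_ISSET(cpu, &mask))
            off += snprintf(buf + off, sizeof(buf) - off, "%d ", cpu);

    printf("rank %d bound to cpus: %s\n", rank, buf);
    MPI_Finalize();
    return 0;
}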
Rats - just realized I have no way to test this as none of the machines I can
access are set up for cgroup-based multi-tenancy. Is this a debug version of
OMPI? If not, can you rebuild OMPI with --enable-debug?
Then please run it with --mca rmaps_base_verbose 10 and pass along the output.
Thanks
What version of slurm is this? I might try to debug it here. I’m not sure where
the problem lies just yet.
Here is the output of lstopo. In short, (0,16) are core 0, (1,17) - core
1 etc.
Machine (64GB)
  NUMANode L#0 (P#0 32GB)
    Socket L#0 + L3 L#0 (20MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#16)
      L2 L#1 (256KB) + L1d L#1
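The same core-to-hwthread pairing can also be printed programmatically; a
small hwloc sketch:

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t core = NULL;
    unsigned pu;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* for each core, list the OS indices of its hwthreads, i.e. the
       (0,16), (1,17), ... pairing described above */
    while ((core = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_CORE, core)) != NULL) {
        printf("Core L#%u: PUs", core->logical_index);
        hwloc_bitmap_foreach_begin(pu, core->cpuset)
            printf(" P#%u", pu);
        hwloc_bitmap_foreach_end();
        printf("\n");
    }
    hwloc_topology_destroy(topo);
    return 0;
}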
Maybe I’m just misreading your HT map - that slurm nodelist syntax is a new one
to me, but they tend to change things around. Could you run lstopo on one of
those compute nodes and send the output?
I’m just suspicious because I’m not seeing a clear pairing of HT numbers in
your output, but HT
On 10/03/2015 04:38 PM, Ralph Castain wrote:
If mpirun isn’t trying to do any binding, then you will of course get
the right mapping as we’ll just inherit whatever we received.
Yes. I meant that whatever you received (what SLURM gives) is a correct
cpu map and assigns _whole_ CPUs, not a
If mpirun isn’t trying to do any binding, then you will of course get the right
mapping as we’ll just inherit whatever we received. Looking at your output,
it’s pretty clear that you are getting independent HTs assigned and not full
cores. My guess is that something in slurm has changed such
On 10/03/2015 01:06 PM, Ralph Castain wrote:
Thanks Marcin. Looking at this, I’m guessing that Slurm may be treating HTs as
“cores” - i.e., as independent cpus. Any chance that is true?
Not to the best of my knowledge, and at least not intentionally. SLURM
starts as many processes as there
Marcin,
could you give a try at v1.10.1rc1 that was released today ?
it fixes a bug when hwloc was trying to bind outside the cpuset.
Ralph and all,
imho, there are several issues here
- if slurm allocates threads instead of cores, then the --oversubscribe
mpirun option could be mandatory
- with
Thanks Marcin. Looking at this, I’m guessing that Slurm may be treating HTs as
“cores” - i.e., as independent cpus. Any chance that is true?
I’m wondering because bind-to core will attempt to bind your proc to both HTs
on the core. For some reason, we thought that 8,24 were HTs on the same
Hi, Ralph,
I submit my slurm job as follows
salloc --ntasks=64 --mem-per-cpu=2G --time=1:0:0
Effectively, the allocated CPU cores are spread among many cluster
nodes. SLURM uses cgroups to limit the CPU cores available for mpi
processes running on a given cluster node. Compute nodes are
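For what it's worth, the cgroup limit SLURM applies can be inspected directly.
A sketch assuming cgroup v1 with the cpuset controller mounted at
/sys/fs/cgroup/cpuset (typical for SLURM's cgroup plugin at the time; the
actual mount point and hierarchy may differ on other systems):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096], path[4200] = "";
    FILE *f = fopen("/proc/self/cgroup", "r");
    if (f == NULL) { perror("fopen"); return 1; }
    while (fgets(line, sizeof(line), f) != NULL) {
        /* lines look like "N:cpuset:/slurm/uid_NNN/job_NNN/..." */
        char *p = strstr(line, ":cpuset:");
        if (p != NULL) {
            line[strcspn(line, "\n")] = '\0';
            snprintf(path, sizeof(path),
                     "/sys/fs/cgroup/cpuset%s/cpuset.cpus", p + 8);
            break;
        }
    }
    fclose(f);
    if (path[0] == '\0') { fprintf(stderr, "no cpuset cgroup found\n"); return 1; }
    if ((f = fopen(path, "r")) == NULL) { perror(path); return 1; }
    if (fgets(line, sizeof(line), f) != NULL)
        printf("cpuset.cpus: %s", line);   /* the cores this job may use */
    fclose(f);
    return 0;
}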
Can you please send me the allocation request you made (so I can see what you
specified on the cmd line), and the mpirun cmd line?
Thanks
Ralph
Hi All,
I just got the same behaviour with our old Torque (2.5, which uses cpusets)
and OpenMPI 1.10.0: when --bind-to core is set, it occasionally (not always)
fails
Open MPI tried to bind a new process, but something went wrong. The
process was killed without launching the target application.
Hi,
I fail to make OpenMPI bind to cores correctly when running from within
SLURM-allocated CPU resources spread over a range of compute nodes in an
otherwise homogeneous cluster. I have found this thread
http://www.open-mpi.org/community/lists/users/2014/06/24682.php
and did try to use