Thanks guys, you're right.
Here is the lstopo output from our system, which confirms that logical CPU
numbering is used in the report-bindings output:
lstopo -l
Machine (256GB)
NUMANode L#0 (P#0 128GB) + Socket L#0 + L3 L#0 (35MB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1
(256KB) + L1 L#22 (32KB) + Core L#22 + PU L#22 (P#22)
L2 L#23 (256KB) + L1 L#23 (32KB) + Core L#23 + PU L#23 (P#23)
L2 L#24 (256KB) + L1 L#24 (32KB) + Core L#24 + PU L#24 (P#24)
L2 L#25 (256
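To compare the two numberings for every PU at once, the logical and physical indexes can be pulled out of lstopo text like the lines above. This is only a minimal sketch, assuming the "PU L#n (P#m)" line format shown in this output; pu_map is a hypothetical helper name, not part of hwloc.

```shell
# pu_map: read lstopo-style text on stdin and print "logical physical"
# index pairs, one per PU. Hypothetical helper; assumes lines contain
# "PU L#<n> (P#<m>)" as in the output quoted in this thread.
pu_map() {
  sed -n 's/.*PU L#\([0-9][0-9]*\) (P#\([0-9][0-9]*\)).*/\1 \2/p'
}

# Sample line copied from the output above.
printf '%s\n' 'L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)' | pu_map
# prints: 0 0
```

On a real node the same function could be fed with `lstopo -l | pu_map`; any pair whose two numbers differ is a PU where logical and physical numbering disagree.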
Hi guys,
I ran into an issue on our cluster related to mapping and binding policies
in 1.8.5.
The --report-bindings output doesn't correspond to the
locale. It looks like the mistake is in the output itself, because it
just prints a serial core number while that core can be on
at 6:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
> S….are you going to restore the rest of it? Or are we asking Nathan to
> refile it without that one piece?
>
>
> On Apr 15, 2015, at 7:26 AM, Elena Elkina <elena.elk...@itseez.com> wrote:
>
> Hi Ralph.
>
Hi Ralph.
We don't need to revert the whole commit, just fix this small part. I
proposed a quick fix for it in the PR, but we probably need a more
thoughtful fix.
Best regards,
Elena
On Wed, Apr 15, 2015 at 6:08 PM, Ralph Castain wrote:
> I’m really puzzled - I
Hi Ralph,
As I remember, the idea of this code was to create the reply once (and set
the stored flag to true) but send it multiple times (to each process
in the list of requests). The stored flag is set to false earlier in the
code. It means that once (for the first request in the loop
I believe it is a bug - I provided some initial values for the modex scope
>> with the expectation (and request when we committed it) that people would
>> review and modify them as appropriate. I recall setting the openib scope as
>> “remote” only because I wasn’t aware of anyo
Hi,
It looks like there is a problem in trunk that reproduces with the
simple_spawn test (orte/test/mpi/simple_spawn.c). It seems to be an issue
with pmix. It doesn't reproduce with the default set of btls, but it does
reproduce when several btls are specified. For example,
salloc -N5
Hi Artem,
Actually some time ago there was a known issue with coll ml. I used to run
my command lines with -mca coll ^ml to avoid these problems, so I don't
know if it was fixed or not. It looks like you have the same problem.
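The workaround mentioned above can be sketched as a small helper that prints the command line with the ml collective component excluded. In an MCA component list, a leading "^" means "all components except the ones listed". The helper name without_coll_ml, the application ./my_app, and the default process count are hypothetical placeholders, not taken from the thread.

```shell
# without_coll_ml: print an mpirun command line that excludes the "ml"
# collective component via "-mca coll ^ml".
# $1 = application path (hypothetical placeholder below)
# $2 = number of processes (optional, defaults to 4 as an assumption)
without_coll_ml() {
  app=$1
  np=${2:-4}
  printf 'mpirun -np %s -mca coll ^ml %s\n' "$np" "$app"
}

without_coll_ml ./my_app
# prints: mpirun -np 4 -mca coll ^ml ./my_app
```

Printing the command rather than running it keeps the sketch usable on machines without Open MPI installed; the printed line can be pasted into a shell on the cluster.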
Best regards,
Elena
On Fri, Oct 17, 2014 at 7:01 PM, Artem Polyakov
Hi,
My reproducer failed even with one port enabled
(-mca btl_openib_if_include mlx4_0:1).
I tried with trunk as well - the same issue.
Best,
Elena
On Thu, May 8, 2014 at 11:49 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
> Nathan and George,
>
> here are the output files
Yes, this commit is also in the trunk.
Best,
Elena
On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
> Is this also happening on the trunk?
>
>
> Sent from my phone. No type good.
>
> On May 7, 2014, at 9:44 AM, "Elena Elkina"
regards,
Elena
On Wed, May 7, 2014 at 5:43 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
> Can you cite the branch and SVN r number?
>
> Sent from my phone. No type good.
>
> > On May 7, 2014, at 9:24 AM, "Elena Elkina"
Hi,
I've found that
commit b531973419a056696e6f88d813769aa4f1f1aee6 doesn't work
Author: Jeff Squyres
Date: Tue Apr 22 19:48:56 2014 +
caused new failures with derived datatypes. Collectives return incorrect
Hi,
Recently I have often hit hangs and segfaults with different command lines,
and there are "ml" functions in the stack trace.
When I turn "ml" off with -mca coll ^ml, the problems disappear.
For example,
oshrun -np 4 --map-by node --display-map ./ring_oshmem
fails with a segfault, while
oshrun