Looking at your command line, did you remember to set -mca
mpi_paffinity_alone 1? If not, we won't set affinity on the processes.
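(For reference, setting that parameter looks roughly like the line below; the executable name is only a placeholder.)
  mpirun -np 4 -mca mpi_paffinity_alone 1 ./my_app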
On Jul 15, 2009, at 8:11 PM, Chris Samuel wrote:
- "Ralph Castain" wrote:
Could you check this? You can run a trivial job using the -npernode x
option, wh
- "Ralph Castain" wrote:
> Could you check this? You can run a trivial job using the -npernode x
> option, where x matched the #cores you were allocated on the nodes.
> If you do this, do we bind to the correct cores?
Nope, I'm afraid it doesn't - submitted a job asking for 4 cores on one node
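(For reference, the test being suggested above would be run roughly as follows; the program name is just an illustration.)
  mpirun -npernode 4 ./ring_test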
Okay, George - this is fixed in r21690.
Thanks again
Ralph
On Wed, 15 Jul 2009, Lisandro Dalcin wrote:
The MPI 2.1 standard says:
"MPI_PROC_NULL is a valid target rank in the MPI RMA calls
MPI_ACCUMULATE, MPI_GET, and MPI_PUT. The effect is the same as for
MPI_PROC_NULL in MPI point-to-point communication. After any RMA
operation with rank MPI_PROC_NULL, it is still necessary to finish the
RMA epoch with the synchronization method that started the epoch."
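To make that concrete, here is a minimal sketch of my own (not from the thread): the MPI_Put below targets MPI_PROC_NULL and is therefore a no-op, but the fence that closes the epoch is still required.

#include <mpi.h>

int main(int argc, char **argv)
{
    int buf = 42;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                 /* open the access epoch */
    /* Target rank is MPI_PROC_NULL, so this Put transfers nothing */
    MPI_Put(&buf, 1, MPI_INT, MPI_PROC_NULL, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);                 /* the epoch must still be closed */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}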
Found the bug - we indeed failed to update the jdata->num_procs field
when adding the non-rf-mapped procs to the job.
Fix coming shortly.
Ah - interesting scenario!
Definitely a "bug" in the code, then. What it looks like, though, is that
the jdata->num_procs is wrong. There shouldn't be any way that the num_procs
in the node array is different than jdata->num_procs.
My guess is that the rank_file mapper isn't correctly maintaining
I think I found a better solution (in r21688). Here is what I was
trying to do.
I have a more or less homogeneous cluster. In fact all processors are
identical, except that some are quad-core and some dual-core. Of
course I care how my processes are mapped on the quad cores, but not
really on the dual-core ones.
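(For context, a partial rankfile for that kind of setup might look like the sketch below; the host name and slot numbers are made up. It would be passed to mpirun with the -rf option, and the idea under discussion is that ranks not listed would fall back to the default mapping.)
  rank 0=quad01 slot=0
  rank 1=quad01 slot=1
  rank 2=quad01 slot=2
  rank 3=quad01 slot=3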
Actually I don't think this will help. I looked on MTT and there are
no errors related to this (logically all reductions should have
failed) ... and MTT is supposed to run on several platforms. What
happens inside is really strange, but as we make the same mistake when
we look up the op as he
The routed comm system relies on each daemon having complete information as
to where every process is located, so the expectation was that only full
maps would ever be sent. Thus, the nidmap code is set up to always send a
full map.
I don't know how to even generate a "partial" map. I assume you are
I have a question regarding the mapping. How can I declare a partial
mapping? In fact I only care about how some of the processes are
mapped on some specific nodes. Right now if the rmaps doesn't contain
information about all nodes, we give up (before this patch we
segfaulted).
Does it m
Perhaps we should add a requirement for testing on 2-3 different
systems before long-term (or "big change") branches like this come to
the trunk? I say this because it seems like at least some of these
problems were based on bad luck -- i.e., the stuff worked on the
platform that it was being developed on
Done. Hit "reload" on the URL below, check out an SVN repository, or
wait for these changes to be pushed to the live site.
Matthias Jurenz wrote:
Could you also mention the tool 'otfprofile' under section 7,
please?
Hi Jeff,
Ralph and Edgar forwarded an email about this.
We (George and myself) are currently looking into this.
With the changes we have, I can get IBM/spawn to work "sometimes", i.e.
sometimes it segfaults.
Thanks,
Rainer
Thanks George!!
Yes, this appears to be at least partly the problem Edgar
is seeing. We're trying to figure out how most of the tests passed so
far with a wrong mapping. Interestingly enough, while the mapping seems
wrong, the lookup is symmetric, so most of the time we end up with the
correct op by
I [very briefly] read about the DDT spawn issues, so I went to look at
ompi/op/op.c. I notice that there's a new comment above the op
datatype<-->op map construction area that says:
/* XXX TODO */
svn blame says:
21641 rusraink /* XXX TODO */
r21641 is the big merge from the pa
*Quickness competition round 1*
Jeff vs. Josh
1 : 0
;-))
Josh Hursey wrote:
FYI.
Begin forwarded message:
From: DongInn Kim
Date: July 15, 2009 10:39:01 AM EDT
To: all-osl-us...@osl.iu.edu
Subject: Re: [all-osl-users] Upgrading of the OSL SVN server
I am sorry that we cannot upgrade subversion this time because of
the technical issues with the interaction between the new subversion and
On Jul 15, 2009, at 10:37 AM, Matthias Jurenz wrote:
Sure! Our SVN IDs are:
jurenz
and
knuepfer
Done! You should have write access -- let me know if you don't.
I think you guys have seen it before, but here's the wiki page about
adding / editing wiki pages:
https://
On Jul 15, 2009, at 10:24 AM, Jeff Squyres (jsquyres) wrote:
Thanks -- I appreciate it. I know it's a somewhat high-volume list.
I can bounce you the original question so that you can reply to it and
have it threaded properly.
Disregard -- you replied already. Many thanks!
--
Jeff Squyres
On Jul 15, 2009, at 8:57 AM, Matthias Jurenz wrote:
I sent the answer directly to the user, 'cause I didn't subscribe to
the
user-list. I'll do that asap ;-)
Thanks -- I appreciate it. I know it's a somewhat high-volume list.
I can bounce you the original question so that you can reply to it and
have it threaded properly.
Hmmm...I believe I made a misstatement. Shocking to those who know me, I am
sure! :-)
Just to correct my comments: OMPI knows how many "slots" have been allocated
to us, but not which "cores". So I'll assign the correct number of procs to
each node, but they won't know which specific cores we were allocated.
Hi all,
I have a cluster where both HCAs of each blade are active, but
connected to different subnets.
Is there an option in Open MPI to select one HCA out of the available
ones? I know it can be done by making changes in the Open MPI code, but
I need a clean interface, like an option at MPI launch time
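(One approach, offered only as a sketch rather than a confirmed answer: the openib BTL has include/exclude MCA parameters that restrict it to particular HCAs, assuming your Open MPI version supports them. The device name below is just an example.)
  mpirun -np 16 --mca btl openib,sm,self --mca btl_openib_if_include mthca0 ./my_app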
On Tue, 2009-07-14 at 18:54 -0700, Eugene Loh wrote:
> P.S. Until the page goes live, I'll also leave it at
> http://www.osl.iu.edu/~eloh/faq/?category=perftools . Or, check out a
> workspace.
I'm happy with it.
Ashley.
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for
- "Ralph Castain" wrote:
Hi Ralph,
> Interesting. No, we don't take PLPA cpu sets into account when
> retrieving the allocation.
Understood.
> Just to be clear: from an OMPI perspective, I don't think this is an
> issue of binding, but rather an issue of allocation. If we knew we had
>
On Jul 14, 2009, at 1:23 PM, Rainer Keller wrote:
https://svn.open-mpi.org/trac/ompi/wiki/HowtoTesting
That is most helpful -- thanks!
What about the latency issue?
> >> Performance tests on the ompi-ddt branch have proven that there are no
> >> performance penalties associated with this c
On Jul 15, 2009, at 6:17 AM, Matthias Jurenz wrote:
the FAQ page looks very nice.
Ditto -- thanks for doing it, Eugene!
I just sent the following answer to Lin Zou:
Did that go on-list? It would be good to see that stuff in the
publicly-searchable web archives. I mention this because
Interesting. No, we don't take PLPA cpu sets into account when
retrieving the allocation.
Just to be clear: from an OMPI perspective, I don't think this is an
issue of binding, but rather an issue of allocation. If we knew we had
been allocated only a certain number of cores on a node, then
Hi Eugene,
the FAQ page looks very nice.
I just sent the following answer to Lin Zou:
For a quick view of what is inside the trace, you could try 'otfprofile'
to generate a TeX/PS file with some information. This tool is a
component of the latest stand-alone version of the Open Trace Format
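(Purely as an illustration, not something stated in the thread: otfprofile is normally pointed at the trace's .otf anchor file, roughly as below; the file name is made up, and the exact options and output names depend on the installed OTF/VampirTrace version.)
  otfprofile my_trace.otf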
Hi Eugene,
The FAQ page looks very good!
Some links on the left side do not work, but I assume
they will work tomorrow, when the real page goes live.
Thanks,
Nik
Eugene Loh wrote:
Zou, Lin (GE, Research, Consultant) wrote:
Hi all,
I want to trace my program, having used VampirTrace to generate
Hi all,
Not sure if this is an Open MPI query or a PLPA query,
but given that PLPA seems to have some support for it
already I thought I'd start here. :-)
We run a quad-core Opteron cluster with Torque 2.3.x
which uses the kernel's cpuset support to constrain
a job to just the cores it has been allocated.
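(Not from the thread, just a small self-contained sketch of one way to see which cores the Torque cpuset actually leaves available to each MPI process on Linux; compile and run it like any other MPI program.)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, c;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Query the affinity mask the kernel has imposed on this process */
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        printf("rank %d may run on cores:", rank);
        for (c = 0; c < CPU_SETSIZE; c++)
            if (CPU_ISSET(c, &mask))
                printf(" %d", c);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}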