On Aug 17, 2009, at 7:59 PM, Chris Samuel wrote:
Ah, I think I've misunderstood the website then. :-(
It calls 1.3 stable and 1.2 old and I presumed old
meant deprecated. :-(
To clarify...
1.3 *is* stable, meaning "ok for production use." We test all 1.3
releases before they go out, it
- "Eugene Loh" wrote:
> Actually, the current proposed defaults for 1.3.4 are
> not to change the defaults at all.
Thanks, I hadn't picked up on the latest update to the
trac ticket 3 days ago that says that the defaults will
stay the same. Sounds good to me!
All the best and have a good w
- "Chris Samuel" wrote:
> $ mpiexec --mca opal_paffinity_alone 1 -bysocket -bind-to-socket -mca
> odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4
To clarify - does that command line accurately reflect the
proposed defaults for OMPI 1.3.4 ?
cheers,
Chris
--
Christopher Samuel
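For what it's worth, a quick way to see what any given set of defaults actually
does is to have each rank report its own mask. The sketch below is my own (not
OMPI code) and assumes Linux with glibc's sched_getaffinity(2); compile with
mpicc and run it in place of cpi-1.4:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ask the kernel which CPUs this rank is currently allowed to run on. */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        printf("rank %d may run on cpus:", rank);
        for (int c = 0; c < CPU_SETSIZE; c++)
            if (CPU_ISSET(c, &mask))
                printf(" %d", c);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}

An unbound rank should report every CPU on the node; with -bind-to-socket it
should report only the CPUs of a single socket.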
- "Chris Samuel" wrote:
> This is most likely because it's getting an error from the
> kernel when trying to bind to a socket it's not permitted
> to access.
This is what strace reports:
18561 sched_setaffinity(18561, 8, { f0 } <unfinished ...>
18561 <... sched_setaffinity resumed> ) = -1 EINVAL (Invalid argument)
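For reference, that failure is easy to reproduce outside Open MPI. A
stand-alone sketch (mine, assuming Linux and a cpuset that excludes CPUs 4-7,
i.e. the 0xf0 mask strace shows): when the requested mask shares no CPUs with
the enclosing cpuset, sched_setaffinity() returns EINVAL.

#define _GNU_SOURCE
#include <sched.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    cpu_set_t want;
    CPU_ZERO(&want);

    /* Ask for CPUs 4-7, i.e. the 0xf0 mask above.  If the enclosing
     * cpuset contains none of them, the kernel rejects the request. */
    for (int c = 4; c < 8; c++)
        CPU_SET(c, &want);

    if (sched_setaffinity(0, sizeof(want), &want) != 0)
        fprintf(stderr, "sched_setaffinity: %s\n", strerror(errno));
    return 0;
}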
- "Eugene Loh" wrote:
> Ah, you're missing the third secret safety switch that prevents
> hapless mortals from using this stuff accidentally! :^)
Sounds good to me. :-)
> I think you need to add
>
> --mca opal_paffinity_alone 1
Yup, looks like that's it; it fails to launch with that option added.
Chris Samuel wrote:
OK, grabbed that (1.4a1r21825). Configured with:
./configure --prefix=$FOO --with-openib --with-tm=/usr/local/torque/latest \
  --enable-static --enable-shared
It built & installed OK, but when running a trivial example
with it I don't see evidence for that code getting called.
- "Ralph Castain" wrote:
> Hi Chris
Hiya,
> The devel trunk has all of this in it - you can get that tarball from
> the OMPI web site (take the nightly snapshot).
OK, grabbed that (1.4a1r21825). Configured with:
./configure --prefix=$FOO --with-openib --with-tm=/usr/local/torque/latest \
  --enable-static --enable-shared
Hi Chris
The devel trunk has all of this in it - you can get that tarball from
the OMPI web site (take the nightly snapshot).
I plan to work on cpuset support beginning Tues morning.
Ralph
On Aug 17, 2009, at 7:18 PM, Chris Samuel wrote:
- "Eugene Loh" wrote:
Hi Eugene,
[...]
It
- "Eugene Loh" wrote:
Hi Eugene,
[...]
> It would be even better to have binding selections adapt to other
> bindings on the system.
Indeed!
This touches on the earlier thread about making OMPI aware
of its cpuset/cgroup allocation on the node (for those sites
that are using it), it might
- "Jeff Squyres" wrote:
> An important point to raise here: the 1.3 series is *not* the super
> stable series. It is the *feature* series. Specifically: it is not
> out of scope to introduce or change features within the 1.3 series.
Ah, I think I've misunderstood the website then. :-(
On Aug 17 2009, Paul H. Hargrove wrote:
+ I wonder if one can do any "introspection" with the dynamic linker to
detect hybrid OpenMP (no "I") apps and avoid pinning them by default
(examining OMP_NUM_THREADS in the environment is no good, since that
variable may have a site default value other than one).
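A rough illustration of the dynamic-linker idea (my own sketch, not anything
Open MPI does today; the probed symbol name is an assumption, though every
OpenMP runtime I know of exports it): ask the linker whether an OpenMP entry
point is present in the process image instead of trusting OMP_NUM_THREADS.
Link with -ldl.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Probe the process image for an OpenMP runtime entry point.  The
 * symbol name is an assumption; libgomp and other runtimes export it. */
static int looks_like_openmp(void)
{
    return dlsym(RTLD_DEFAULT, "omp_get_max_threads") != NULL;
}

int main(void)
{
    printf("OpenMP runtime %s\n",
           looks_like_openmp() ? "found: maybe skip binding by default"
                               : "not found: binding looks safe");
    return 0;
}

It is only a hint, of course: an app that dlopen()s its OpenMP runtime later,
or that uses threads without OpenMP at all, would still be missed.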
Jeff,
Jeff Squyres wrote:
ignored it whenever presenting competitive data. The 1,000,000th time I
saw this, I gave up arguing that our competitors were not being fair and
simply changed our defaults to always leave memory pinned for
OpenFabrics-based networks.
Instead, you should have tol
Some more thoughts in this thread that I've not seen expressed yet
(perhaps I missed them):
+ Some argue that this change in the middle of a stable series may, to
some users, appear to be a performance regression when they update.
However, I would argue that if the alternative is to delay thi
Some very good points in this thread all round.
On Mon, 2009-08-17 at 09:00 -0400, Jeff Squyres wrote:
>
> This is probably not too surprising (i.e., allowing the OS to move
> jobs around between cores on a socket can probably involve a little
> cache thrashing, resulting in that 5-10% loss)
On Aug 17 2009, Jeff Squyres wrote:
Yes, BUT... We had a similar option to this for a long, long time.
Sorry, perhaps I should have spelled out what I meant by "mandatory".
The system would not build (or run, depending on where it was set)
without such a value being specified. There would
On Aug 17, 2009, at 12:11 PM, N.M. Maclaren wrote:
1) To have a mandatory configuration option setting the default, which
would have a name like 'performance' for the binding option. YOU could
then beat up anyone who benchmarks without it for being biased. This is
a better solution
On Aug 17 2009, Ralph Castain wrote:
At issue for us is that other MPIs -do- bind by default, thus creating an
apparent performance advantage for themselves compared to us on standard
benchmarks run "out-of-the-box". We repeatedly get beat-up in papers and
elsewhere over our performance, when ma
I don't disagree with your statements. However, I was addressing the
specific question of two OpenMPI programs conflicting on process placement,
not the overall question you are raising.
The issue of when/if to bind has been debated for a long time. I agree that
having more options (bind-to-socket
On Aug 17 2009, Ralph Castain wrote:
The problem is that the two mpiruns don't know about each other, and
therefore the second mpirun doesn't know that another mpirun has
already used socket 0.
We hope to change that at some point in the future.
It won't help. The problem is less likely
Jeff Squyres wrote:
On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
UNLESS you have a threaded application, in which case -any- binding
can be highly detrimental to performance.
I'm not quite sure I understand this statement. Binding is not
inherently contrary to multi-threaded applications.
On Aug 17 2009, Jeff Squyres wrote:
On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
I think the problem here, Eugene, is that performance benchmarks are
far from the typical application. We have repeatedly seen this -
optimizing for benchmarks frequently makes applications run less
efficiently.
On Aug 16, 2009, at 8:56 PM, George Bosilca wrote:
I tend to agree with Chris. Changing the behavior of the 1.3 series in the
middle of the stable release cycle will be very confusing for our
users.
An important point to raise here: the 1.3 series is *not* the super
stable series. It is the *feature* series.
On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
I think the problem here, Eugene, is that performance benchmarks are
far from the typical application. We have repeatedly seen this -
optimizing for benchmarks frequently makes applications run less
efficiently. So I concur with Chris on this.
The problem is that the two mpiruns don't know about each other, and
therefore the second mpirun doesn't know that another mpirun has
already used socket 0.
We hope to change that at some point in the future.
Ralph
On Aug 17, 2009, at 4:02 AM, Lenny Verkhovsky wrote:
In the multi job envi
In the multi-job environment, can't we just start binding processes on the
first available and unused socket?
I mean the first job/user will start binding itself from socket 0,
the next job/user will start binding itself from socket 2, for instance.
Lenny.
On Mon, Aug 17, 2009 at 6:02 AM, Ralph Castain wrote:
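To make Lenny's suggestion concrete, a toy sketch (hypothetical, not OMPI
code) of the bookkeeping involved: each new job claims the first socket no
other job has claimed and starts binding from there. How independent mpiruns
would share the "used" table is exactly the problem Ralph describes (the two
mpiruns don't know about each other).

#include <stdio.h>

/* Hypothetical bookkeeping for "each job starts on the first unused
 * socket".  Sharing the used[] table between independent mpiruns is
 * the open problem. */
static int first_free_socket(const int *used, int num_sockets)
{
    for (int s = 0; s < num_sockets; s++)
        if (!used[s])
            return s;      /* start binding this job's ranks here */
    return -1;             /* everything claimed: fall back to not binding */
}

int main(void)
{
    int used[4] = { 1, 1, 0, 0 };   /* sockets 0 and 1 taken by an earlier job */
    printf("next job starts binding at socket %d\n",
           first_free_socket(used, 4));
    return 0;
}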
- "Eugene Loh" wrote:
> This is an important discussion.
Indeed! My big fear is that people won't pick up the significance
of the change and will complain about performance regressions
in the middle of an OMPI stable release cycle.
> Do note:
>
> 1) Bind-to-core is actually the default behavior of many MPIs today.
I tend to agree with Chris. Changing the behavior of the 1.3 series in the
middle of the stable release cycle will be very confusing for our
users. Moreover, as Ralph pointed out, everything in Open MPI is
configurable so if we advertise this feature in the Changelog, the
institutions where the n
- "Ralph Castain" wrote:
> Hi Chris
Hiya Ralph,
> There would be a "-do-not-bind" option that will prevent us from
> binding processes to anything which should cover that situation.
Gotcha.
> My point was only that we would be changing the out-of-the-box
> behavior to the opposite of today's.
This is an important discussion. Do note:
1) Bind-to-core is actually the default behavior of many MPIs today.
2) The proposed OMPI bind-to-socket default is less severe. In the
general case, it would allow multiple jobs to bind in the same way
without oversubscribing any core or socket. (
Hi Chris
There would be a "-do-not-bind" option that will prevent us from binding
processes to anything which should cover that situation.
My point was only that we would be changing the out-of-the-box behavior to
the opposite of today's, so all those such as yourself would now have to add
the -d
- "Terry Dontje" wrote:
> I just wanted to give everyone a heads up if they do not get bugs
> email. I just submitted a CMR to move over some new paffinity options
> from the trunk to the v1.3 branch.
Ralph's comments imply that for those sites that share nodes
between jobs (such as ourselves)
I just wanted to give everyone a heads up if they do not get bugs
email. I just submitted a CMR to move over some new paffinity options
from the trunk to the v1.3 branch. You can read the gory details in
https://svn.open-mpi.org/trac/ompi/ticket/1997
--td