Terry is dropping his account due to change in "day job"
responsibilities. I'm retaining mine. Oracle status is changing from
member to contributor.
On 7/16/2013 12:16 AM, Rainer Keller wrote:
Hi Josh,
thanks for the info. Was about to look at this mail...
Is Oracle / Sun not part of OMPI
in the past year.
Oracle
==
emallove: Ethan Mallove <ethan.mall...@oracle.com> **NO COMMITS IN LAST YEAR**
eugene: Eugene Loh <eugene@oracle.com>
tdd: Terry Dontje <terry.don...@oracle.com>
Please keep eugene, but close emallove and tdd.
On 02/23/13 14:45, Ralph Castain wrote:
This release candidate is the last one we expect to have before release, so
please test it. It can be downloaded from the usual place:
http://www.open-mpi.org/software/ompi/v1.7/
I haven't looked at this very carefully yet. Maybe someone can confirm what
On 02/20/13 07:54, Jeff Squyres (jsquyres) wrote:
All MTT testing looks good for 1.6.4. There seems to be an MPI dynamics
problem when --enable-sparse-groups is used, but this does not look like a
regression to me.
I put out a final rc because there was one more minor change to accommodate
On 10/04/12 07:00, Kawashima, Takahiro wrote:
(1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
This bug is caused by a use of an incorrect variable in
ompi/mpi/c/wait.c (for MPI_Wait) and by an incorrect
initialization of ompi_request_null in
On 10/4/2012 4:00 AM, Kawashima, Takahiro wrote:
> Hi Open MPI developers,
>
> I found some bugs in Open MPI and attach a patch to fix them.
>
> The bugs are:
>
> (1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
>
> (2) MPI_Status for an inactive request must be an empty
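For reference, the rule at issue is the MPI standard's "empty" status; a minimal C check of the required values, using only standard MPI calls, looks like this:

#include <mpi.h>
#include <stdio.h>

/* Waiting on MPI_REQUEST_NULL must return the "empty" status:
 * MPI_SOURCE == MPI_ANY_SOURCE, MPI_TAG == MPI_ANY_TAG, count == 0. */
int main(int argc, char **argv)
{
    MPI_Request req = MPI_REQUEST_NULL;
    MPI_Status status;
    int count;

    MPI_Init(&argc, &argv);
    MPI_Wait(&req, &status);               /* legal no-op on a null request */
    MPI_Get_count(&status, MPI_BYTE, &count);
    printf("source ok: %d  tag ok: %d  count: %d\n",
           status.MPI_SOURCE == MPI_ANY_SOURCE,
           status.MPI_TAG == MPI_ANY_TAG, count);
    MPI_Finalize();
    return 0;
}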
Where do I find the details on how the nightly tarballs are made from
the SVN repos?
On 9/27/2012 11:31 AM, N.M. Maclaren wrote:
On Sep 27 2012, Jeff Squyres (jsquyres) wrote:
..."that obscene hack"...
...configure mechanism...
Good discussion, but as far as my specific issue goes, it looks like
it's some peculiar interaction between different compiler versions. I'm
asking
The ibm tests aren't building for me. One of the issues is
mprobe_usempif08.f90 trying to access status%MPI_SOURCE and
status%MPI_TAG. I assume this is supposed to work, but it doesn't.
E.g., trunk with Oracle Studio compilers:
% cat a.f90
use mpi_f08
type(MPI_Status) status
Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" on the
users mail list, but not quite sure.
Let's pretend my nodes are called local, r1, and r2. That is, I launch
mpirun from "local" and there are two other (remote) nodes available to
me. With the trunk (e.g., v1.9 r27247), I
Trunk broken? Last night, Oracle's MTT trunk runs all came up
empty-handed. E.g.,
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: (nil)
[ 0] [0xe600]
[ 1] /lib/libc.so.6(strlen+0x33) [0x3fa0a3]
[ 2]
r27178 seems to build fine. Thanks.
On 8/29/2012 7:42 AM, Shamis, Pavel wrote:
Eugene,
Can you please confirm that the issue is resolved on your setup?
On Aug 29, 2012, at 10:14 AM, Shamis, Pavel wrote:
The issue #2 was fixed in r27178.
g specific to that one machine?
I'm wondering because if it is just the one machine,
then it might be something strange about how it is setup
- perhaps the version of Solaris, or it is configuring
--enable-static, or...
Just trying to assess
On 08/24/12 09:54, Shamis, Pavel wrote:
Maybe there is a chance to get direct access to this system?
No.
But I'm attaching compressed log files from configure/make.
tarball-of-log-files.tar.bz2
On 8/7/2012 5:45 AM, Jeff Squyres wrote:
So the issue is when (for example) Fortran MPI_Recv says "hey, C ints are the same
as Fortran INTEGERs, so I don't need a temporary MPI_Status buffer; I'll just use the
INTEGER array that I was given, and pass it to the back-end C MPI_Recv() routine."
On 7/31/2012 5:15 AM, Jeff Squyres wrote:
On Jul 31, 2012, at 2:58 AM, Eugene Loh wrote:
The main issue is this. If I go to ompi/mpi/fortran/mpif-h, I see six files (*recv_f and
*probe_f) that take status arguments. Normally, we do some conversion between Fortran
and C status arguments
I have some questions originally motivated by some mpif-h/MPI_Mprobe
failures we've seen in SPARC MTT runs at 64-bit in both v1.7 and v1.9,
but my poking around spread out from there.
The main issue is this. If I go to ompi/mpi/fortran/mpif-h, I see six
files (*recv_f and *probe_f) that take
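As background, the standard way to do that conversion when the layouts differ is MPI_Status_f2c/MPI_Status_c2f. A minimal sketch of such a wrapper (example_recv_f is hypothetical, not the actual OMPI source):

#include <mpi.h>

/* Hypothetical mpif.h-style wrapper: receive into a C status, then
 * copy it into the caller's Fortran status array with MPI_Status_c2f. */
void example_recv_f(char *buf, MPI_Fint *count, MPI_Fint *source,
                    MPI_Fint *tag, MPI_Fint *comm, MPI_Fint *f_status)
{
    MPI_Status c_status;
    MPI_Comm c_comm = MPI_Comm_f2c(*comm);

    MPI_Recv(buf, (int) *count, MPI_BYTE, (int) *source, (int) *tag,
             c_comm, &c_status);
    MPI_Status_c2f(&c_status, f_status);  /* fill the INTEGER array */
}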
at these issues. If not, it might be best to remove the
libnbc code from 1.7, as it's unfortunately clear that it's not as ready
for integration as we believed and I don't have time to fix the code base.
On 7/16/12 2:50 PM, "Eugene Loh"<eugene@oracle.com> wrote:
The NBC functionality d
to fix the code base.
Brian
On 7/16/12 2:50 PM, "Eugene Loh"<eugene@oracle.com> wrote:
The NBC functionality doesn't fare very well on SPARC. One of the
problems is with data alignment. An NBC schedule is a number of
variously sized fields laid out contiguously in lin
The NBC functionality doesn't fare very well on SPARC. One of the
problems is with data alignment. An NBC schedule is a number of
variously sized fields laid out contiguously in linear memory (e.g.,
see nbc_internal.h or nbc.c) and words don't have much natural
alignment. On SPARC, the
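As background on the alignment constraint: SPARC requires naturally aligned loads, so a multi-byte field at an arbitrary offset in a packed schedule buffer has to be read with memcpy. A minimal sketch:

#include <string.h>
#include <stdint.h>

/* Dereferencing a misaligned uint64_t* raises SIGBUS on SPARC.
 * memcpy is the portable way to read a field at any offset. */
static inline uint64_t read_u64_unaligned(const char *buf, size_t offset)
{
    uint64_t v;
    memcpy(&v, buf + offset, sizeof(v));  /* byte copy, any alignment */
    return v;  /* by contrast, *(uint64_t *)(buf + offset) may fault */
}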
thought i would be 100 at the end of that do loop.
$%#@#@$% Fortran. :-(
On Jul 11, 2012, at 12:25 PM,<svn-commit-mai...@open-mpi.org> wrote:
Author: eugene (Eugene Loh)
Date: 2012-07-11 12:25:09 EDT (Wed, 11 Jul 2012)
New Revision: 2002
Log:
Apply the "right value when calling wa
On 07/06/12 14:35, Barrett, Brian W wrote:
On 7/6/12 2:31 PM, "Eugene Loh"<eugene@oracle.com> wrote:
The new reduce_scatter_block test is segfaulting with v1.7 but not with
the trunk. When we drop down into MPI_Reduce_scatter_block and attem
I assume this is an orphaned file that should be removed? (It looks
like a draft version of ibcast_f08.f90.)
The new reduce_scatter_block test is segfaulting with v1.7 but not with
the trunk. When we drop down into MPI_Reduce_scatter_block and attempt
to call
comm->c_coll.coll_reduce_scatter_block()
it's NULL. (So is comm->c_coll.coll_reduce_scatter_block_module.)
Is there some work on the trunk
Either there is a problem with MPI_Ibarrier or I don't understand the
semantics.
The following example is with openmpi-1.9a1r26747. (Thanks for the fix
in 26757. I tried with that as well with same results.) I get similar
results for different OSes, compilers, bitness, etc.
% cat
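For reference, the standard-conforming usage is start-then-complete; no process is held up until the completion call. A minimal sketch:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Ibarrier(MPI_COMM_WORLD, &req);  /* nonblocking: returns at once */
    /* ... unrelated work can overlap with the barrier here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* barrier semantics complete here */
    printf("past the barrier\n");
    MPI_Finalize();
    return 0;
}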
I'll look at this more, but for now I'll just note that the new ibarrier
test is showing lots of failures on MTT (cisco and oracle).
ompi/mca/coll/libnbc/nbc_internal.h
259 /* Schedule cache structures/functions */
260 u_int32_t adler32(u_int32_t adler, int8_t *buf, int len);
261 void NBC_SchedCache_args_delete(void *entry);
262 void NBC_SchedCache_args_delete_key_dummy(void *k);
u_int32_t
->
uint32_t
Thanks. That explains one mystery.
I'm still unclear, though. Or, maybe I'm hitting a different problem.
I configure with "--with-openib" (along with other stuff). I get:
r26639:checking if MCA component btl:openib can compile... yes
r26640:checking if MCA component btl:openib can
In tarball 26642, Fortran compilation no longer succeeds. I suspect the
problem might be 26641. E.g.,
libmpi_usempif08.so:
undefined reference to `ompi_iscan_f'
libmpi_mpifh.so:
undefined reference to `MPI_Reduce_scatter_block'
libmpi_mpifh.so:
undefined reference to
Thanks for r26638. Looks like that file still needs a little attention:
http://www.open-mpi.org/mtt/index.php?do_redir=2073
On 6/22/2012 10:40 AM, Eugene Loh wrote:
Looking good. Just a few more: btl_udapl_endpoint.c has instances of
seg_len and seg_addr. udapl may not have much of a future
Looking good. Just a few more: btl_udapl_endpoint.c has instances of
seg_len and seg_addr. udapl may not have much of a future, but for now
it's still there.
On 6/22/2012 7:22 AM, Hjelm, Nathan T wrote:
Looks like I missed a few places in udapl and osc. Fixed with r26635 and
r26634.
'opal_libevent2019_event_base_loop+0x606
/home/eugene/r26609/lib/libopen-rte.so.0.0.0'orte_daemon+0xd6d
/home/eugene/r26609/bin/orted'0xd4b
[remote1:01409] *** End of error message ***
Segmentation Fault (core dumped)
On Jun 19, 2012, at 8:31 PM, Eugene Loh wrote:
I'm having bad luck with the trunk starting with r26609
I'm having bad luck with the trunk starting with r26609. Basically,
things hang if I run
mpirun -H remote1,remote2 -n 2 hostname
where remote1 and remote2 are remote nodes.
On 6/15/2012 11:59 AM, Nathan Hjelm wrote:
Until we can find the root cause I pushed a change that protects the reset by
checking if size > 0.
Let me know if that works for you.
It does.
which only happens if the above described
test fails.
I had some doubts about r26597, but I don't have time to check into it until
Monday. Maybe you can remove it and see if you continue to have the same
segfault.
george.
On Jun 15, 2012, at 01:24, Eugene Loh wrote:
I see a segfault
I see a segfault show up in trunk testing starting with r26598 when
tests like
ibm collective/struct_gatherv
intel src/MPI_Type_free_[types|pending_msg]_[f|c]
are run over openib. Here is a typical stack trace:
opal_convertor_create_stack_at_begining(convertor = 0x689730,
on
segfaults with a variety of tests. So, I think it's not specific to
loop_spawn.
On Sat, Jun 9, 2012 at 3:35 PM, Eugene Loh <eugene@oracle.com
<mailto:eugene@oracle.com>> wrote:
On 6/9/2012 12:06 PM, Eugene Loh wrote:
With r26565:
Enable orte prog
On 6/9/2012 12:06 PM, Eugene Loh wrote:
With r26565:
Enable orte progress threads and libevent thread support by default
Oracle MTT testing started showing new spawn_multiple failures.
Sorry. I meant loop_spawn.
(And then, starting I think in 26582, the problem is masked behind
another
With r26565:
Enable orte progress threads and libevent thread support by default
Oracle MTT testing started showing new spawn_multiple failures. I've
only seen this in 64-bit. Here are two segfaults, both from Linux/x86
systems running over TCP:
This one with GNU compilers:
[...]
I seem to get unreliable results from MTT queries.
To reproduce:
- go to http://www.open-mpi.org/mtt
- click on "Test run"
- for "Date range:" enter "2012-03-23 00:30:00 - 2012-03-23 23:55:00"
- for "Org:" enter "oracle"
- for "Platform name:" enter "t2k-0"
- for "Suite:" enter "ibm-32"
- click
I'm suspicious of some code, but would like comment from someone who
understands it.
In orte/util/nidmap.c orte_util_decode_pidmap(), one cycles through a
buffer. One cycles through jobs. For each one, one unpacks num_procs.
One also unpacks all sorts of other stuff like bind_idx. In
Here is another trunk hang. I get it if I use at least three remote
nodes. E.g., with r26385:
% mpirun -H remoteA,remoteB,remoteC -n 2 hostname
[remoteA:20508] [[54625,0],1] ORTE_ERROR_LOG: Not found in file
base/ess_base_fns.c at line 135
[remoteA:20508] [[54625,0],1] unable to get
I'm hanging on the trunk, even with something as simple as "mpirun
hostname". r26377 and earlier are fine, but r26381 is not. Quickly
looking at the putback log, r26380 seems to be the likely candidate.
I'll look at this some more, but the hang is here (orterun.c):
935 /* loop the
On 4/23/2012 8:22 AM, Jeffrey Squyres wrote:
On Apr 23, 2012, at 1:40 AM, Eugene Loh wrote:
[rhc@odin001 ~/svn-trunk]$ mpifort --showme
gfortran -I/nfs/rinfs/san/homedirs/rhc/openmpi/include
-I/nfs/rinfs/san/homedirs/rhc/openmpi/lib
-L/nfs/rinfs/san/homedirs/rhc/openmpi/lib -lmpi_usempi
Next Fortran problem.
Oracle MTT managed to build the trunk (r26307) in some cases. No
test-run failures in these cases, but the pass counts are way low.
Turns out, the Fortran tests aren't being built (or run). I try
compiling a Fortran code:
ld: fatal: library -lmpi_f77: not found
ld:
Another probably-Fortran-merge problem. Three issues in this e-mail.
Introduction: The last two nights, Oracle MTT tests have been unable to
build the trunk (r26307) with Oracle Studio compilers. This has been
uncovered since the fix of r26302, allowing us to get further in the
build
I think this is related to the "Fortran merge."
Last night, Oracle MTT tests couldn't build the trunk (r26307) with
Intel compilers. Specifically, configure fails with
checking to see if Fortran compiler likes the C++ exception flags... no
configure: WARNING: C++ exception flags are
the branch.
On Mar 14, 2012, at 11:27 PM, Eugene Loh wrote:
I'm quitting for the day, but happened to notice that all our v1.5 MTT runs are failing
with r26133, though tests ran fine as of r26129. Things run fine on-node, but if you run
even just "hostname" on a remote node, the job fails
I'm quitting for the day, but happened to notice that all our v1.5 MTT
runs are failing with r26133, though tests ran fine as of r26129.
Things run fine on-node, but if you run even just "hostname" on a remote
node, the job fails with
orted: Command not found
I get this problem whether I
Yes, seems to work for me, thanks.
On 3/3/2012 3:14 PM, Ralph Castain wrote:
Should be fixed in r26093
On Mar 3, 2012, at 4:06 PM, Eugene Loh wrote:
I'll look at this some more, but for now I'll note that the trunk has an
apparent regression in r26081.
./configure
I'll look at this some more, but for now I'll note that the trunk has an
apparent regression in r26081.
./configure \
--enable-shared \
--enable-orterun-prefix-by-default \
--disable-peruse \
In the test suite, we have an ibm/dynamic/loop_spawn test that looks
like this:
for (...) {
loop_spawn spawns loop_child
parent and child execute MPI_Intercomm_merge
parent and child execute MPI_Comm_free
parent and child execute MPI_Comm_disconnect
}
If
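A hedged C sketch of the parent side of that loop (the binary name and iteration count are illustrative):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    for (int i = 0; i < 100; i++) {
        MPI_Comm child, merged;
        MPI_Comm_spawn("loop_child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(child, 0 /* parent group ordered first */, &merged);
        MPI_Comm_free(&merged);
        MPI_Comm_disconnect(&child);
    }
    MPI_Finalize();
    return 0;
}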
On 02/22/12 14:54, Ralph Castain wrote:
That doesn't really address the issue, though. What I want to know is:
what happens when you try to bind processes? What about
-bind-to-socket, and -persocket options? Etc. Reason I'm concerned:
I'm not sure what happens if the socket layer isn't
On 2/22/2012 11:08 AM, Ralph Castain wrote:
On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote:
Le 22/02/2012 17:48, Ralph Castain a écrit :
On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote
On 2/21/2012 10:31 PM, Eugene Loh wrote:
... "sockets" is unknown and hwloc returns 0 for n
On 2/21/2012 10:31 PM, Eugene Loh wrote:
... "sockets" is unknown and hwloc returns 0 for num_sockets and OMPI
pukes on divide by zero. OS info was listed in the original message
(below). Might we want to do something else? E.g., assume
num_sockets==1 when num_sockets==0 (if you
On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:
Here are the first of the results of the testing I promised.
I am not 100% sure how to reach the code that Eugene reported as
problematic,
I don't think you're going to see it. Somehow, hwloc on the config in
question thinks there is no socket
the following should be fixed?
*) on this platform, hwloc finds no socket level
*) therefore hwloc returns num_sockets==0 to OMPI
*) OMPI divides by 0 and barfs on basically everything
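A minimal sketch of the suggested num_sockets==1 fallback (the helper names are hypothetical stand-ins for the hwloc queries, not actual OMPI code):

/* hypothetical helpers standing in for the hwloc topology queries */
extern int get_num_sockets_from_hwloc(void);
extern int get_total_cores_from_hwloc(void);

int cores_per_socket(void)
{
    int num_sockets = get_num_sockets_from_hwloc();
    if (num_sockets == 0) {
        num_sockets = 1;  /* hwloc found no socket level: assume one socket */
    }
    return get_total_cores_from_hwloc() / num_sockets;  /* never divides by 0 */
}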
On Feb 21, 2012, at 7:20 PM, Eugene Loh wrote:
We have some amount of MTT testing going on every night and on ONE
We have some amount of MTT testing going on every night and on ONE of
our systems v1.5 has been dead since r25914. The system is
Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007
x86_64 x86_64 x86_64 GNU/Linux
and I'm encountering the problem with Intel
I had a question about our Fortran MPI_Improbe support.
If I look at ompi/mpi/f77/improbe_f.c I see basically (lots of code
removed):
64 void mpi_improbe_f(MPI_Fint *source, MPI_Fint *tag, MPI_Fint *comm,
65                    ompi_fortran_logical_t *flag, MPI_Fint *message,
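For reference, the C-side semantics the wrapper maps onto: MPI_Improbe returns a message handle that MPI_Mrecv then consumes. A minimal sketch:

#include <mpi.h>

int try_matched_recv(void *buf, int count, MPI_Datatype type,
                     int source, int tag, MPI_Comm comm)
{
    int flag;
    MPI_Message msg;
    MPI_Status status;

    MPI_Improbe(source, tag, comm, &flag, &msg, &status);
    if (flag) {
        MPI_Mrecv(buf, count, type, &msg, &status);  /* consumes msg */
    }
    return flag;
}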
losely at
results. Mostly, in any case, things look fine.
but might be something with the submit.php script - just a guess
though at this point.
Unfortunately I have zero time to spend on MTT for a few weeks at
least. :/
-- Josh
On Thu, Jan 5, 2012 at 8:11 PM, Eugene Loh <eugene@o
Oracle has MTT jobs that have been running and, according to the log
files, been successfully reporting results to the IU database, even in
the last few days. If I look at http://www.open-mpi.org/mtt, however, I
can't seem to turn up any results for the new calendar year (2012). Any
On 11/21/11 20:51, Lukas Razik wrote:
Hello everybody!
I've Sun T5120 (SPARC64) Servers with
- Debian: 6.0.3
- linux-2.6.39.4 (from kernel.org)
- OFED-1.5.3.2
- InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB
DDR / 10GigE] (rev a0)
with newest FW (2.9.1)
and
On 11/16/2011 3:32 AM, TERRY DONTJE wrote:
On 11/15/2011 10:16 PM, Jeff Squyres wrote:
On Nov 14, 2011, at 10:17 PM, Eugene Loh wrote:
I tried building v1.5. r25469 builds for me, r25470 does not. This is
Friday's hwloc putback of CMR 2866. I'm on Solaris11/x86. The problem is
basically
I tried building v1.5. r25469 builds for me, r25470 does not. This is
Friday's hwloc putback of CMR 2866. I'm on Solaris11/x86. The problem
is basically:
Making all in tools/ompi_info
CC ompi_info.o
"../../../opal/include/opal/sys/ia32/atomic.h", line 173: warning:
parameter in
On 11/4/2011 5:56 AM, Jeff Squyres wrote:
On Oct 28, 2011, at 1:59 AM, Eugene Loh wrote
In our MTT testing, we see ibm/io/file_status_get_count fail occasionally with:
File locking failed in ADIOI_Set_lock(fd A,cmd F_SETLKW/7,type F_RDLCK/0,whence
0) with return value
and errno 5
In our MTT testing, we see ibm/io/file_status_get_count fail
occasionally with:
File locking failed in ADIOI_Set_lock(fd A,cmd F_SETLKW/7,type F_RDLCK/0,whence
0) with return value
and errno 5.
- If the file system is NFS, you need to use NFS version 3, ensure that the
lockd daemon
In MTT testing, we check OMPI version number to decide whether to test
MPI 2.2 datatypes.
Specifically, in intel_tests/src/mpitest_def.h:
#define MPITEST_2_2_datatype 0
#if defined(OPEN_MPI)
#if (OMPI_MAJOR_VERSION > 1) || (OMPI_MAJOR_VERSION == 1 && OMPI_MINOR_VERSION >= 7)
#
On 8/31/2011 4:48 AM, Ralph Castain wrote:
Perhaps it would help if you had clearly stated your concern.
Yeah. It would have helped had I clearly understood what was going on.
Most of all, that way I wouldn't have had to ask any questions! :^)
From this description, I gather your concern
On 8/30/2011 7:34 PM, Ralph Castain wrote:
On Aug 29, 2011, at 11:18 PM, Eugene Loh wrote:
Maybe someone can help me from having to think too hard.
Let's say I want to max my system limits. I can say this:
% mpirun --mca opal_set_max_sys_limits 1 ...
Cool.
Meanwhile, if I do
Maybe someone can help me from having to think too hard.
Let's say I want to max my system limits. I can say this:
% mpirun --mca opal_set_max_sys_limits 1 ...
Cool.
Meanwhile, if I do this:
% setenv OMPI_MCA_opal_set_max_sys_limits 1
% mpirun ...
remote processes don't see
It seems to me the FAQ item
http://www.open-mpi.org/faq/?category=large-clusters#fd-limits needs
updating. I'm willing to give this a try, but need some help first.
(I'm even more willing to let someone else do all this, but I'm not
holding my breath.)
For example, the text sounds dated --
ntioned above
so that you can spawn on CPUs that aren't spinning tightly on MPI progress,
...etc.
On Aug 15, 2011, at 11:47 AM, Eugene Loh wrote:
This is a question about ompi-tests/ibm/dynamic. Some of these tests (spawn, spawn_multiple,
loop_spawn/child, and no-disconnect) exercise
This is a question about ompi-tests/ibm/dynamic. Some of these tests
(spawn, spawn_multiple, loop_spawn/child, and no-disconnect) exercise
MPI_Comm_spawn* functionality. Specifically, they spawn additional
processes (beyond the initial mpirun launch) and therefore exert a
different load on a
NAS Parallel Benchmarks are self-verifying.
Another option is the MPI Testing Tool
http://www.open-mpi.org/projects/mtt/ but it might be more trouble than
it's worth.
(INCIDENTALLY, THERE ARE TRAC TROUBLES WITH THE THREE LINKS AT THE
BOTTOM OF THAT PAGE! COULD SOMEONE TAKE A LOOK?)
If
Thanks for the clarification. My myopic sense of the issue came out of
stumbling on this behavior due to MPI_Comm_spawn_multiple failing.
I think *multiple* issues caused this problem to escape notice for so
long. One is that if the system thought it was oversubscribed,
num_procs_alive was
On 7/13/2011 4:31 PM, Paul H. Hargrove wrote:
On 7/13/2011 4:20 PM, Yevgeny Kliteynik wrote:
> Finally, are you sure that infiniband/complib/cl_types_osd.h
exists on all platforms? (e.g., Solaris) I know you said you don't
have any Solaris machines to test with, but you should ping Oracle
The function orte_odls_base_default_launch_local() has a variable
num_procs_alive that is basically initialized like this:
if ( oversubscribed ) {
...
} else {
num_procs_alive = ...;
}
Specifically, if the "oversubscribed" test passes, the variable is not
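Distilled, the suspect pattern is an assignment on only one branch followed by an unconditional read (names are illustrative):

int count_alive(int oversubscribed, int launched)
{
    int num_procs_alive;              /* no initializer */
    if (oversubscribed) {
        /* ... bookkeeping, but num_procs_alive is never set here ... */
    } else {
        num_procs_alive = launched;   /* computed only on this branch */
    }
    return num_procs_alive;           /* indeterminate when oversubscribed */
}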
I'm running into a hang that is very easy to reproduce. Basically,
something like this:
% mpirun -H remote_node hostname
remote_node
^C
That is, I run a program (doesn't need to be MPI) on a remote node. The
program runs, but my local orterun doesn't return. The problem seems
conf enable_progress) is minor. Either way,
things are fine. My concern is more around the accumulation of many
such instances.
Ralph Castain wrote:
On Mar 10, 2011, at 5:54 PM, Eugene Loh wrote:
Ralph Castain wrote:
Just stale code that doesn't hurt anything
Okay, so it'd be
-code progress threads to off because the code isn't thread safe
in key areas involving the event library, for one.
On Mar 10, 2011, at 3:43 PM, Eugene Loh wrote:
In the trunk, we hardwire progress threads to be off. E.g.,
% grep progress configure.ac
# Hardwire all progress threads to be off
In the trunk, we hardwire progress threads to be off. E.g.,
% grep progress configure.ac
# Hardwire all progress threads to be off
enable_progress_threads="no"
[Hardcode the ORTE progress thread to be off])
[Hardcode the OMPI progress thread to be off])
So,
I've been assigned CMR 2728, which is to apply some thread-support
changes to 1.5.x. The trac ticket has amusing language about "needs
testing". I'm not sure what that means. We rather consistently say
that we don't promise anything with regards to true thread support. We
specifically say
README consistent with the v1.5 source code (as opposed to talking
about features that will appear in unspecified future releases), either:
*) the comment should be removed from the README, or
*) opal-multi-threads should be CMRed to v1.5
On Feb 14, 2011, at 5:36 PM, Eugene Loh wrote:
In the v1.5 README, I see this:
--enable-opal-multi-threads
Enables thread lock support in the OPAL and ORTE layers. Does
not enable MPI_THREAD_MULTIPLE - see above option for that feature.
This is currently disabled by default.
I don't otherwise find opal-multi-threads at all in this
Jeff Squyres wrote:
Eugene --
This ROMIO fix needs to go upstream.
Makes sense. Whom do I pester about that? Is r24356 (and now CMR 2712)
okay as is? The ROMIO change is an unimportant stylistic change, so I'm
okay cutting it loose from the other changes in the putback.
Jeff Squyres wrote:
On Jan 11, 2011, at 2:05 PM, Eugene Loh wrote:
Do we have configure tests for them, or just #define's?
Configure tests.
Ok, cool. I assume you'll remove the senseless configure tests, too.
Right.
no
reason).
Do we have configure tests for them, or just #define's?
Configure tests.
On Jan 10, 2011, at 7:51 PM, Eugene Loh wrote:
Why do
u_int8_t
u_int16_t
u_int32_t
u_int64_t
get defined in opal_config.h? I don't see them used anywhere in the
OMPI/OPAL/ORTE code base.
Okay, one
Why do
u_int8_t
u_int16_t
u_int32_t
u_int64_t
get defined in opal_config.h? I don't see them used anywhere in the
OMPI/OPAL/ORTE code base.
Okay, one exception, in opal/util/if.c:
#if defined(__DragonFly__)
#define IN_LINKLOCAL(i) (((u_int32_t)(i) & 0xffff0000) == 0xa9fe0000)
than the minimum already computed). Pre-setting to (size_t)-1 should fix the
issue.
On Jan 3, 2011, at 17:17, Eugene Loh wrote:
I can't tell if this is a problem, though I suspect it's a small one even if
it's a problem at all.
In mca_bml_r2_del_proc_btl(), a BTL is removed from the send
ain thread (as this only occurs in MPI_Finalize).
Can you look in the syslog to see if there is any additional info related to
this issue there?
Not much. A one-liner like this:
Dec 27 21:49:36 burl-ct-x4150-11 hermon: [ID 492207 kern.info] hermon1:
EQE local access violation
On Dec 30, 2010, at 20:
I was running a bunch of np=4 test programs over two nodes.
Occasionally, *one* of the codes would see an IBV_EVENT_QP_ACCESS_ERR
during MPI_Finalize(). I traced the code and ran another program that
mimicked the particular MPI calls made by that program. This other
program, too, would
I'm starting to look at the openib BTL for the first time and am
puzzled. In btl_openib_async.c, it looks like an asynchronous thread is
started. During MPI_Init(), the main thread sends the async thread a
file descriptor for each IB interface to be polled. In MPI_Finalize(),
the main
Jeff Squyres (jsquyres) wrote:
Ya, it sounds like we should fix this eager limit help text so that others aren't misled. We did say "attempt", but that's probably a bit too subtle.
Eugene - iirc: this is in the btl base (or some other central location) because it's shared between all btls.
George Bosilca wrote:
Moreover, eager send can improve performance if and only if the matching
receives are already posted on the peer. If not, the data will become
unexpected, and there will be one additional memcpy.
I don't think the first sentence is strictly true. There is a cost
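To make the trade-off concrete: eager data avoids the extra copy only when the matching receive is already posted. A minimal sketch (buffer management is illustrative):

#include <mpi.h>

/* If the receive is posted before the eager data arrives, it can land
 * directly in the user buffer; otherwise it is staged in an
 * unexpected-message buffer and copied out later (the extra memcpy). */
void preposted_exchange(int rank, char *buf, int len)
{
    if (rank == 1) {
        MPI_Request req;
        MPI_Irecv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req); /* pre-post */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 0) {
        MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD); /* eager if small */
    }
}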
Sébastien Boisvert wrote:
On Tuesday, November 23, 2010, at 16:07 -0500, Eugene Loh wrote:
Sébastien Boisvert wrote:
Case 1: 30 MPI ranks, message size is 4096 bytes
File: mpirun-np-30-Program-4096.txt
Outcome: It hangs -- I killed the poor thing after 30 seconds
Sébastien Boisvert wrote:
Now I can describe the cases.
The test cases can all be explained by the test requiring eager messages
(something that test4096.cpp does not require).
Case 1: 30 MPI ranks, message size is 4096 bytes
File: mpirun-np-30-Program-4096.txt
Outcome: It hangs -- I
To add to Jeff's comments:
Sébastien Boisvert wrote:
The reason is that I am developing MPI-based software, and I use
Open-MPI as it is the only implementation I am aware of that sends
messages eagerly (powerful feature, that is).
As wonderful as OMPI is, I am fairly sure other MPI
Eugene Loh wrote:
In mca_btl_sm_get_sync(), I see this:
/* Use the DMA flag if knem supports it *and* the segment length
is greater than the cutoff. Note that if the knem_dma_min
value is 0 (i.e., the MCA param was set to 0), the segment size
will never be larger
In mca_btl_sm_get_sync(), I see this:
/* Use the DMA flag if knem supports it *and* the segment length
is greater than the cutoff. Note that if the knem_dma_min
value is 0 (i.e., the MCA param was set to 0), the segment size
will never be larger than it, so DMA will never
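A sketch of the cutoff logic that comment describes (names are illustrative, not the actual btl/sm source); a knem_dma_min of 0 behaves as "DMA disabled":

#include <stddef.h>
#include <stdint.h>

uint64_t choose_knem_flags(uint32_t knem_dma_min, size_t seg_len,
                           uint64_t dma_flag)
{
    if (knem_dma_min != 0 && seg_len > knem_dma_min) {
        return dma_flag;   /* large segment: offload to the DMA engine */
    }
    return 0;              /* small segment, or DMA disabled: CPU copy */
}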
Jeff and I were talking about trac 2035 and the handling of mpirun
command-line options. While most mpirun options have long,
multi-character names prefixed with a double dash, OMPI had originally
also wanted to support combinations of short names (e.g., "mpirun -hvq",
even if we don't