Re: [OMPI devel] [OMPI svn] svn:open-mpi r18252
This commit causes mpirun to segfault when running the IBM spawn tests on our slurm platforms (it may affect others as well). The failures only happen when mpirun is run in a batch script. The backtrace I get is:

Program terminated with signal 11, Segmentation fault.
#0  0x002a969b9dbe in daemon_leader (jobid=2643591169, num_local_contributors=1, type=1 '\001', data=0x588c40, flag=1 '\001', participants=0x566e80) at grpcomm_basic_module.c:1196
1196        OBJ_RELEASE(collection);
(gdb) bt
#0  0x002a969b9dbe in daemon_leader (jobid=2643591169, num_local_contributors=1, type=1 '\001', data=0x588c40, flag=1 '\001', participants=0x566e80) at grpcomm_basic_module.c:1196
#1  0x002a969ba316 in daemon_collective (jobid=2643591169, num_local_contributors=1, type=1 '\001', data=0x588c40, flag=1 '\001', participants=0x566e80) at grpcomm_basic_module.c:1279
#2  0x002a956a94a9 in orte_odls_base_default_collect_data (proc=0x588eb8, buf=0x588ef0) at base/odls_base_default_fns.c:2183
#3  0x002a95692990 in process_commands (sender=0x588eb8, buffer=0x588ef0, tag=1) at orted/orted_comm.c:485
#4  0x002a956920a0 in orte_daemon_cmd_processor (fd=-1, opal_event=1, data=0x588e90) at orted/orted_comm.c:271
#5  0x002a957fe4ca in event_process_active (base=0x50d940) at event.c:647
#6  0x002a957fea8b in opal_event_base_loop (base=0x50d940, flags=0) at event.c:819
#7  0x002a957fe6c5 in opal_event_loop (flags=0) at event.c:726
#8  0x002a957fe57e in opal_event_dispatch () at event.c:662
#9  0x0040335d in orterun (argc=5, argv=0x7fb008) at orterun.c:551
#10 0x00402bb3 in main (argc=5, argv=0x7fb008) at main.c:13
(gdb)

I ran with:

srun -N 3 -b mpirun -mca mpi_yield_when_idle 1 ~/ompi-tests/ibm/dynamic/spawn_multiple

Thanks,
Tim

r...@osl.iu.edu wrote:
Author: rhc
Date: 2008-04-23 10:52:09 EDT (Wed, 23 Apr 2008)
New Revision: 18252
URL: https://svn.open-mpi.org/trac/ompi/changeset/18252
Log:
Add a loadbalancing feature to the round-robin mapper - more to be sent to devel list. Fix a potential problem with RM-provided
nodenames not matching returns from gethostname - ensure that the HNP's nodename gets DNS-resolved when comparing against RM-provided hostnames. Note that this may be an issue for RM-based clusters that don't have local DNS resolution, but hopefully that is more indicative of a poorly configured system.

Text files modified:
   trunk/orte/mca/ras/base/ras_base_node.c            |  6 +++
   trunk/orte/mca/rmaps/base/base.h                   |  4 ++
   trunk/orte/mca/rmaps/base/rmaps_base_open.c        | 10 +++
   trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c | 55 +++
   trunk/orte/mca/rmaps/round_robin/rmaps_rr.c        | 50
   trunk/orte/tools/orterun/orterun.c                 |  3 ++
   6 files changed, 92 insertions(+), 36 deletions(-)

Modified: trunk/orte/mca/ras/base/ras_base_node.c
==============================================================================
--- trunk/orte/mca/ras/base/ras_base_node.c (original)
+++ trunk/orte/mca/ras/base/ras_base_node.c 2008-04-23 10:52:09 EDT (Wed, 23 Apr 2008)
@@ -23,6 +23,7 @@
 #include "opal/util/output.h"
 #include "opal/util/argv.h"
+#include "opal/util/if.h"
 #include "orte/mca/errmgr/errmgr.h"
 #include "orte/util/name_fns.h"
@@ -111,7 +112,7 @@
      * first position since it is the first one entered.  We need to check to see
      * if this node is the same as the HNP's node so we don't double-enter it */
-    if (0 == strcmp(node->name, hnp_node->name)) {
+    if (0 == strcmp(node->name, hnp_node->name) || opal_ifislocal(node->name)) {
         OPAL_OUTPUT_VERBOSE((5, orte_ras_base.ras_output,
                              "%s ras:base:node_insert updating HNP info to %ld slots",
                              ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
@@ -124,6 +125,9 @@
         hnp_node->slots_alloc = node->slots_alloc;
         hnp_node->slots_max = node->slots_max;
         hnp_node->launch_id = node->launch_id;
+        /* use the RM's name for the node */
+        free(hnp_node->name);
+        hnp_node->name = strdup(node->name);
         /* set the node to available for use */
         hnp_node->allocate = true;
         /* update the total slots in the job */

Modified: trunk/orte/mca/rmaps/base/base.h
==============================================================================
--- trunk/orte/mca/rmaps/base/base.h (original)
+++ trunk/orte/mca/rmaps/base/base.h 2008-04-23 10:52:09 EDT (Wed, 23 Apr 2008)
@@ -57,10 +57,12 @@
     bool pernode;
     /** number of ppn for n_per_node mode
Re: [OMPI devel] Change in btl/tcp
To echo what Josh said, there are no special compile flags being used. If you send me a patch with debug output, I'd be happy to run it for you. Both odin and sif are fairly normal Linux-based clusters, with ethernet and openib IP networks. The ethernet network has both ipv4 & ipv6, and the openib network runs ipv4.

Tim

Adrian Knoth wrote:
On Fri, Apr 18, 2008 at 01:00:40PM -0400, Josh Hursey wrote:
The trick is to force Open MPI to use only tcp,self and nothing else. Did you try adding this (-mca btl tcp,self) to the runtime parameter set?

Sure. Even with 64 processes, I cannot trigger this behaviour. Neither on Linux nor Solaris. Any special compile flags? I guess a little bit more debug output could probably reveal the culprit.
Re: [OMPI devel] Change in btl/tcp
Hi Adrian,

After this change, I am getting a lot of errors of the form:

[sif2][[12854,1],9][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)

See for instance: http://www.open-mpi.org/mtt/index.php?do_redir=615

I have found this especially easy to reproduce if I run 16 processes all with just the tcp and self btls on the same machine, running the 'hello_c' program in the examples directory.

Tim

Adrian Knoth wrote:
Hi!

As of r18169, I've changed the acceptance rules for incoming BTL-TCP connections. The old code would have denied a connection in case of non-matching addresses (a comparison between the actual source address and the expected source address). Unfortunately, you cannot always say which source address an incoming packet will have (it's the sender's kernel that decides), so rejecting a connection due to a "wrong" source address caused a complete hang. I had several cases, mostly multi-cluster setups, where this happened all the time. (Typical scenario: you're expecting the headnode's internal address, but since you're talking to another cluster, the kernel uses the headnode's external address.)

Though I've tested it as much as possible, I don't know if it breaks your setup, especially the multi-rail stuff. George?

Cheerio
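The kernel's source-address choice that Adrian describes is easy to observe directly. The sketch below is illustrative only, not OMPI code; it uses a connected UDP socket because connect() on a datagram socket just binds a route (no handshake, no traffic), then asks the kernel which local address it selected:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect toward a destination without binding a local address, then
 * report the source address the kernel picked.  On a multi-homed host
 * the answer depends on the routing table and the destination, which is
 * exactly why a receiver cannot reliably predict a peer's source
 * address. */
static int kernel_chosen_source(const char *dest_ip, int port,
                                char *buf, size_t buflen)
{
    struct sockaddr_in dst, src;
    socklen_t len = sizeof(src);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    inet_pton(AF_INET, dest_ip, &dst.sin_addr);
    if (connect(fd, (struct sockaddr *) &dst, sizeof(dst)) < 0) {
        close(fd);
        return -1;
    }
    getsockname(fd, (struct sockaddr *) &src, &len);
    inet_ntop(AF_INET, &src.sin_addr, buf, buflen);
    close(fd);
    return 0;
}
```

Connecting toward 127.0.0.1 yields a loopback source, while connecting toward another cluster yields whichever interface the route selects - the "internal vs. external headnode address" case from the thread.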
Re: [OMPI devel] RFC: changes to modex
Hate to bring this up again, but I was thinking that an easy way to reduce the size of the modex would be to reduce the length of the names describing each piece of data. More concretely, for a simple run I get the following names, each of which is sent over the wire for every proc (note that this will change depending on the number of btls one has active):

ompi-proc-info
btl.openib.1.3
btl.tcp.1.3
pml.base.1.0
btl.udapl.1.3

So that's 89 bytes of naming overhead (size of strings + dss packing overhead) per process. A couple of possible solutions to this:

1. Send 32-bit string hashes instead of the strings. This would reduce the per-process size from 89 to 20 bytes, but there is always a (slight) possibility of collisions.

2. Change the way the dss packs strings. Currently, it packs a 32-bit string length, the string, and a null terminator. It may be good enough to just pack the string and the NULL terminator. This would reduce the per-process size from 89 to 69 bytes.

3. Reduce the length of the names. 'ompi-proc-info' could become simply 'pinf', and two of the separators could be removed in the other names (ex: 'btl.openib.1.3' -> 'btlopenib1.3'). This would change the per-process size from 89 to 71 bytes.

4. Do #2 & #3. This would change the per-process size from 89 to 51 bytes.

Anyways, just an idea for consideration.

Tim

WHAT: Changes to MPI layer modex API

WHY: To be mo' betta scalable

WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that calls ompi_modex_send() and/or ompi_modex_recv()

TIMEOUT: COB Fri 4 Apr 2008

DESCRIPTION: Per some of the scalability discussions that have been occurring (some on-list and some off-list), and per the e-mail I sent out last week about ongoing work in the openib BTL, Ralph and I put together a loose proposal this morning to make the modex more scalable. The timeout is fairly short because Ralph wanted to start implementing in the near future, and we didn't anticipate that this would be a contentious proposal.
The theme is to break the modex into two different kinds of data: - Modex data that is specific to a given proc - Modex data that is applicable to all procs on a given node For example, in the openib BTL, the majority of modex data is applicable to all processes on the same node (GIDs and LIDs and whatnot). It is much more efficient to send only one copy of such node-specific data to each process (vs. sending ppn copies to each process). The spreadsheet I included in last week's e-mail clearly shows this. 1. Add new modex API functions. The exact function signatures are TBD, but they will be generally of the form: * int ompi_modex_proc_send(...): send modex data that is specific to this process. It is just about exactly the same as the current API call (ompi_modex_send). * int ompi_modex_proc_recv(...): receive modex data from a specified peer process (indexed on ompi_proc_t*). It is just about exactly the same as the current API call (ompi_modex_recv). * int ompi_modex_node_send(...): send modex data that is relevant for all processes in this job on this node. It is intended that only one process in a job on a node will call this function. If more than one process in a job on a node calls _node_send(), then only one will "win" (meaning that the data sent by the others will be overwritten). * int ompi_modex_node_recv(...): receive modex data that is relevant for a whole peer node; receive the ["winning"] blob sent by _node_send() from the source node. We haven't yet decided what the node index will be; it may be (ompi_proc_t*) (i.e., _node_recv() would figure out what node the (ompi_proc_t*) resides on and then give you the data). 2. Make the existing modex API calls (ompi_modex_send, ompi_modex_recv) be wrappers around the new "proc" send/receive calls. This will provide exactly the same functionality as the current API (but be sub-optimal at scale). It will give BTL authors (etc.) 
time to update to the new API, potentially taking advantage of common data across multiple processes on the same node. We'll likely put in some opal_output()'s in the wrappers to help identify code that is still calling the old APIs. 3. Remove the old API calls (ompi_modex_send, ompi_modex_recv) before v1.3 is released.
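The byte counts Tim quotes follow directly from the packing rule he describes (32-bit length + characters + NUL). A quick back-of-the-envelope check; dss_packed_size here is a hypothetical helper mirroring that rule, not the real dss code:

```c
#include <string.h>

/* Bytes the dss uses for one packed string under the current scheme:
 * a 32-bit length field, the characters themselves, and a NUL. */
static int dss_packed_size(const char *name)
{
    return (int) strlen(name) + 4 + 1;
}

/* Per-process naming overhead: the sum over all modex keys a proc
 * sends along with its data. */
static int naming_overhead(const char **names, int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++) {
        total += dss_packed_size(names[i]);
    }
    return total;
}
```

For the five names in the example this gives 89 bytes. Dropping the length field (option 2) saves 4 bytes per name, giving 69; flat 32-bit hashes (option 1) cost 4 bytes per name, giving 20; the shortened names of option 3 total 46 characters, so options 2+3 together give 46 + 5 = 51 bytes.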
[OMPI devel] mpirun return code problems
Hi all,

I reported this before, but it seems that the report got lost. I have found some situations where mpirun will return '0' when there is an error. An easy way to reproduce this is to edit the file 'orte/mca/plm/base/plm_base_launch_support.c' and on line 154 put in 'return ORTE_ERROR;' (or apply the attached diff). Then recompile and run mpirun. mpirun will indicate there was an error, but will still return 0. The reason this concerns me is that MTT only looks at return codes, so our tests may be failing and we wouldn't know it.

Thanks,
Tim

Index: orte/mca/plm/base/plm_base_launch_support.c
===================================================================
--- orte/mca/plm/base/plm_base_launch_support.c (revision 18092)
+++ orte/mca/plm/base/plm_base_launch_support.c (working copy)
@@ -151,7 +151,7 @@
                              ORTE_JOBID_PRINT(job), ORTE_ERROR_NAME(rc)));
         return rc;
     }
-
+    return ORTE_ERROR;
     /* complete wiring up the iof */
     OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output,
                          "%s plm:base:launch wiring up iof",
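The reason the bogus return code matters is that a return-code-only harness judges a launch exactly like the sketch below (a generic illustration of the pattern, not MTT's actual code):

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Judge a launch the way a return-code-only test harness does: run the
 * command and look at nothing but its exit status.  If mpirun prints an
 * error message yet still exits 0, this check wrongly reports success,
 * and the failure is invisible to the nightly results. */
static int launch_failed(const char *cmd)
{
    int status = system(cmd);
    if (status == -1) {
        return 1;   /* could not even start the shell */
    }
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```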
Re: [OMPI devel] init_thread + spawn error
Thanks for the report. As Ralph indicated, the threading support in Open MPI is not good right now, but we are working to make it better. I have filed a ticket (https://svn.open-mpi.org/trac/ompi/ticket/1267) so we do not lose track of this issue, and attached a potential fix to the ticket.

Thanks,
Tim

Joao Vicente Lima wrote:
Hi, I'm getting an error calling init_thread and comm_spawn with this code:

#include "mpi.h"
#include <stdio.h>

int main (int argc, char *argv[])
{
  int provided;
  MPI_Comm parentcomm, intercomm;

  MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_get_parent (&parentcomm);
  if (parentcomm == MPI_COMM_NULL) {
    printf ("spawning ... \n");
    MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                    0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Comm_disconnect (&intercomm);
  } else {
    printf ("child!\n");
    MPI_Comm_disconnect (&parentcomm);
  }
  MPI_Finalize ();
  return 0;
}

and the error is:

spawning ...
opal_mutex_lock(): Resource deadlock avoided
[localhost:18718] *** Process received signal ***
[localhost:18718] Signal: Aborted (6)
[localhost:18718] Signal code: (-6)
[localhost:18718] [ 0] /lib/libpthread.so.0 [0x2b6e5d9fced0]
[localhost:18718] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b6e5dc3b3c5]
[localhost:18718] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b6e5dc3c73e]
[localhost:18718] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9560ff]
[localhost:18718] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c95601d]
[localhost:18718] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9560ac]
[localhost:18718] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c956a93]
[localhost:18718] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9569dd]
[localhost:18718] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c95797d]
[localhost:18718] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x1ec) [0x2b6e5c957dd9]
[localhost:18718] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b6e607f05cf]
[localhost:18718] [11] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(MPI_Comm_spawn+0x459) [0x2b6e5c98ede9]
[localhost:18718] [12] ./spawn1(main+0x7a) [0x400ae2]
[localhost:18718] [13] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b6e5dc28b74]
[localhost:18718] [14] ./spawn1 [0x4009d9]
[localhost:18718] *** End of error message ***
opal_mutex_lock(): Resource deadlock avoided
[localhost:18719] *** Process received signal ***
[localhost:18719] Signal: Aborted (6)
[localhost:18719] Signal code: (-6)
[localhost:18719] [ 0] /lib/libpthread.so.0 [0x2b9317a17ed0]
[localhost:18719] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b9317c563c5]
[localhost:18719] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b9317c5773e]
[localhost:18719] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169710ff]
[localhost:18719] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b931697101d]
[localhost:18719] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169710ac]
[localhost:18719] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b9316971a93]
[localhost:18719] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169719dd]
[localhost:18719] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b931697297d]
[localhost:18719] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x1ec) [0x2b9316972dd9]
[localhost:18719] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b931a80b5cf]
[localhost:18719] [11] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b931a80dad7]
[localhost:18719] [12] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b9316977207]
[localhost:18719] [13] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Init_thread+0x166) [0x2b93169b8622]
[localhost:18719] [14] ./spawn1(main+0x25) [0x400a8d]
[localhost:18719] [15] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b9317c43b74]
[localhost:18719] [16] ./spawn1 [0x4009d9]
[localhost:18719] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 18719 on node localhost exited on signal 6 (Aborted).
--------------------------------------------------------------------------

If I change MPI_Init_thread to MPI_Init, everything works.
Any suggestions? The attachments contain my ompi_info (r18077) and config.log.

Thanks in advance,
Joao.

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] RFC: changes to modex
Is there a reason to rename ompi_modex_{send,recv} to ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and less work) to leave the names alone and add ompi_modex_node_{send,recv}. Another question: Does the receiving process care that the information received applies to a whole node? I ask because maybe we could get the same effect by simply adding a parameter to ompi_modex_send which specifies if the data applies to just the proc or a whole node. So, if we have ranks 1 & 2 on n1, and rank 3 on n2, then rank 1 would do: ompi_modex_send("arch", arch, ); then rank 3 would do: ompi_modex_recv(rank 1, "arch"); ompi_modex_recv(rank 2, "arch"); I don't really care either way, just wanted to throw out the idea. Tim Jeff Squyres wrote: WHAT: Changes to MPI layer modex API WHY: To be mo' betta scalable WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that calls ompi_modex_send() and/or ompi_modex_recv() TIMEOUT: COB Fri 4 Apr 2008 DESCRIPTION: Per some of the scalability discussions that have been occurring (some on-list and some off-list), and per the e-mail I sent out last week about ongoing work in the openib BTL, Ralph and I put together a loose proposal this morning to make the modex more scalable. The timeout is fairly short because Ralph wanted to start implementing in the near future, and we didn't anticipate that this would be a contentious proposal. The theme is to break the modex into two different kinds of data: - Modex data that is specific to a given proc - Modex data that is applicable to all procs on a given node For example, in the openib BTL, the majority of modex data is applicable to all processes on the same node (GIDs and LIDs and whatnot). It is much more efficient to send only one copy of such node-specific data to each process (vs. sending ppn copies to each process). The spreadsheet I included in last week's e-mail clearly shows this. 1. Add new modex API functions. 
The exact function signatures are TBD, but they will be generally of the form: * int ompi_modex_proc_send(...): send modex data that is specific to this process. It is just about exactly the same as the current API call (ompi_modex_send). * int ompi_modex_proc_recv(...): receive modex data from a specified peer process (indexed on ompi_proc_t*). It is just about exactly the same as the current API call (ompi_modex_recv). * int ompi_modex_node_send(...): send modex data that is relevant for all processes in this job on this node. It is intended that only one process in a job on a node will call this function. If more than one process in a job on a node calls _node_send(), then only one will "win" (meaning that the data sent by the others will be overwritten). * int ompi_modex_node_recv(...): receive modex data that is relevant for a whole peer node; receive the ["winning"] blob sent by _node_send() from the source node. We haven't yet decided what the node index will be; it may be (ompi_proc_t*) (i.e., _node_recv() would figure out what node the (ompi_proc_t*) resides on and then give you the data). 2. Make the existing modex API calls (ompi_modex_send, ompi_modex_recv) be wrappers around the new "proc" send/receive calls. This will provide exactly the same functionality as the current API (but be sub-optimal at scale). It will give BTL authors (etc.) time to update to the new API, potentially taking advantage of common data across multiple processes on the same node. We'll likely put in some opal_output()'s in the wrappers to help identify code that is still calling the old APIs. 3. Remove the old API calls (ompi_modex_send, ompi_modex_recv) before v1.3 is released.
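Tim's single-call alternative (keep one send, add a flag saying whether the data is per-proc or per-node) can be mocked up in a few lines. Everything below is a toy in-memory model with made-up names, intended only to show the lookup rule, not OMPI's actual modex API or data structures:

```c
#include <string.h>

/* Toy model: each rank lives on a node.  Data sent with NODE scope by
 * any rank on a node satisfies a receive directed at ANY rank on that
 * node, so the receiver never needs a separate node-indexed call. */
typedef enum { SCOPE_PROC, SCOPE_NODE } scope_t;

struct entry { int rank; int node; scope_t scope; const char *key; const char *val; };

static struct entry table[16];
static int nentries = 0;
/* rank -> node, mirroring the thread's example: ranks 1,2 on n1; rank 3 on n2 */
static const int node_of[] = { 0, 1, 1, 2 };

static void modex_send(int rank, const char *key, const char *val, scope_t scope)
{
    struct entry e = { rank, node_of[rank], scope, key, val };
    table[nentries++] = e;
}

static const char *modex_recv(int from_rank, const char *key)
{
    int i;
    for (i = 0; i < nentries; i++) {
        if (strcmp(table[i].key, key) != 0) {
            continue;
        }
        if (table[i].scope == SCOPE_PROC && table[i].rank == from_rank) {
            return table[i].val;
        }
        if (table[i].scope == SCOPE_NODE && table[i].node == node_of[from_rank]) {
            return table[i].val;
        }
    }
    return NULL;
}
```

With this rule, rank 1 sending "arch" node-scoped makes both modex_recv(1, "arch") and modex_recv(2, "arch") resolve to the same blob - the effect Tim describes, without a distinct _node_recv entry point.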
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941
Unfortunately, now with r17988 I cannot run any MPI programs; they seem to hang in the modex.

Tim

Ralph H Castain wrote:
Thanks Tim - I found the problem and will commit a fix shortly. Appreciate your testing and reporting!

On 3/27/08 8:24 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
This commit breaks things for me. Running on 3 nodes of odin:

mpirun -mca btl tcp,sm,self examples/ring_c

causes a hang. All of the processes are stuck in orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang, and the ring program does not hang all the time, but fairly often.

Tim

r...@osl.iu.edu wrote:
Author: rhc
Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
New Revision: 17941
URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
Log:
Fix the allgather and allgather_list functions to avoid deadlocks at large node/proc counts. Violated the RML rules here - we received the allgather buffer and then did an xcast, which causes a send to go out, and is then subsequently received by the sender. This fix breaks that pattern by forcing the recv to complete outside of the function itself - thus, the allgather and allgather_list always complete their recvs before returning or sending.

Reorganize the grpcomm code a little to provide support for soon-to-come new grpcomm components. The revised organization puts what will be common code elements in the base to avoid duplication, while allowing components that don't need those functions to ignore them.
Added:
   trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
Text files modified:
   trunk/orte/mca/grpcomm/base/Makefile.am                |   5
   trunk/orte/mca/grpcomm/base/base.h                     |  23 +
   trunk/orte/mca/grpcomm/base/grpcomm_base_close.c       |   4
   trunk/orte/mca/grpcomm/base/grpcomm_base_open.c        |   1
   trunk/orte/mca/grpcomm/base/grpcomm_base_select.c      | 121 ++---
   trunk/orte/mca/grpcomm/basic/grpcomm_basic.h           |  16
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |  30 -
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c    | 845 ++-
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h             |   8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   |   8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c      |  21
   trunk/orte/mca/grpcomm/grpcomm.h                       |  45 +
   trunk/orte/mca/rml/rml_types.h                         |  31
   trunk/orte/orted/orted_comm.c                          |  27 +
   14 files changed, 226 insertions(+), 959 deletions(-)

Diff not shown due to size (92619 bytes). To see the diff, run the following command:

svn diff -r 17940:17941 --no-diff-deleted

_______________________________________________
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941
This commit breaks things for me. Running on 3 nodes of odin:

mpirun -mca btl tcp,sm,self examples/ring_c

causes a hang. All of the processes are stuck in orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang, and the ring program does not hang all the time, but fairly often.

Tim

r...@osl.iu.edu wrote:
Author: rhc
Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
New Revision: 17941
URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
Log:
Fix the allgather and allgather_list functions to avoid deadlocks at large node/proc counts. Violated the RML rules here - we received the allgather buffer and then did an xcast, which causes a send to go out, and is then subsequently received by the sender. This fix breaks that pattern by forcing the recv to complete outside of the function itself - thus, the allgather and allgather_list always complete their recvs before returning or sending.

Reorganize the grpcomm code a little to provide support for soon-to-come new grpcomm components. The revised organization puts what will be common code elements in the base to avoid duplication, while allowing components that don't need those functions to ignore them.
Added:
   trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
Text files modified:
   trunk/orte/mca/grpcomm/base/Makefile.am                |   5
   trunk/orte/mca/grpcomm/base/base.h                     |  23 +
   trunk/orte/mca/grpcomm/base/grpcomm_base_close.c       |   4
   trunk/orte/mca/grpcomm/base/grpcomm_base_open.c        |   1
   trunk/orte/mca/grpcomm/base/grpcomm_base_select.c      | 121 ++---
   trunk/orte/mca/grpcomm/basic/grpcomm_basic.h           |  16
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |  30 -
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c    | 845 ++-
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h             |   8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   |   8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c      |  21
   trunk/orte/mca/grpcomm/grpcomm.h                       |  45 +
   trunk/orte/mca/rml/rml_types.h                         |  31
   trunk/orte/orted/orted_comm.c                          |  27 +
   14 files changed, 226 insertions(+), 959 deletions(-)

Diff not shown due to size (92619 bytes). To see the diff, run the following command:

svn diff -r 17940:17941 --no-diff-deleted
Re: [OMPI devel] iof/libevent failures?
I was able to replicate the failure with a debug build by running mpirun through a batch job. I then added the parameter you gave me, and it worked fine with the parameter.

Thanks,
Tim

Jeff Squyres wrote:
We're chasing down a problem that we're having on OSX w.r.t. libevent, too -- can you try running with:

--mca opal_event_include select

and see if that fixes the problem for you?

On Mar 25, 2008, at 8:49 AM, Tim Prins wrote:
Hi everyone,

For the last couple nights ALL of our mtt runs have been failing (although the failure is masked because mpirun is returning the wrong error code) with:

[odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 161
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error. More information may be available above.
--------------------------------------------------------------------------

This line is where we try to do an IOF push. It looks like it was broken somewhere between r17922 and r17926, which includes the libevent merge. I cannot replicate this with a debug build, so I thought I would throw this out before I look any further.

Thanks,
Tim
[OMPI devel] iof/libevent failures?
Hi everyone,

For the last couple nights ALL of our mtt runs have been failing (although the failure is masked because mpirun is returning the wrong error code) with:

[odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 161
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error. More information may be available above.
--------------------------------------------------------------------------

This line is where we try to do an IOF push. It looks like it was broken somewhere between r17922 and r17926, which includes the libevent merge. I cannot replicate this with a debug build, so I thought I would throw this out before I look any further.

Thanks,
Tim
[OMPI devel] Return code and error message problems
Hi,

Something went wrong last night and all our MTT tests had the following output:

[odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 161
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error. More information may be available above.
--------------------------------------------------------------------------

I have not tracked down what caused this, but the more immediate problem is that after giving this error mpirun returned '0' instead of a more sane error value.

Also, when running the test 'orte/test/mpi/abort' I get the error output:

--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 17822 on node odin013 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

This is wrong; it should say that the process aborted. It looks like somehow the job state is being set to ORTE_JOB_STATE_ABORTED_WO_SYNC instead of ORTE_JOB_STATE_ABORTED.

Thanks,
Tim
[OMPI devel] [RFC] Reduce the number of tests run by make check
WHAT: Reduce the number of tests run by make check

WHY: Some of the tests will not work properly until Open MPI is installed. Also, many of the tests do not really test anything.

WHERE: See below.

TIMEOUT: COB Friday March 14

DESCRIPTION: We have been having many problems with make check over the years. People tend to change things and not update the tests, which leads to tarball generation failures and nightly test run failures. Furthermore, many of the tests test things which have not changed for years. So with this in mind, I propose only running the following tests when 'make check' is run:

asm/atomic_barrier
asm/atomic_barrier_noinline
asm/atomic_spinlock
asm/atomic_spinlock_noinline
asm/atomic_math
asm/atomic_math_noinline
asm/atomic_cmpset
asm/atomic_cmpset_noinline

We would no longer run the following tests:

class/ompi_bitmap_t
class/opal_hash_table_t
class/opal_list_t
class/opal_value_array_t
class/opal_pointer_array
class/ompi_rb_tree_t
memory/opal_memory_basic
memory/opal_memory_speed
memory/opal_memory_cxx
threads/opal_thread
threads/opal_condition
datatype/ddt_test
datatype/checksum
datatype/position
peruse/mpi_peruse

These tests would not be deleted from the repository, just made so they do not run by default.
Re: [OMPI devel] make check failing
Simple: the test that eventually segfaults only runs if ompi is configured with threading. Otherwise it is a no-op.

Tim

Jeff Squyres wrote:
I think another important question is: why is this related to threads? (i.e., why does it work in non-threaded builds)

On Mar 4, 2008, at 9:44 AM, Ralph H Castain wrote:
Carto select failing if it doesn't find any modules was called out in an earlier message (might have been a commit log) when we set an mca-no-build flag on that framework. This should probably be fixed - there are times when someone may not wish to build any carto modules. Is there some reason why carto absolutely must find a module? Can we create a default "none available" module in the base?

On 3/4/08 7:39 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
Hi,

We have been having a problem lately with our MTT runs where make check would fail when mpi threads were enabled. Turns out the problem is that opal_init now calls opal_base_carto_select, which cannot find any carto modules since we have not done an install yet. So it returns a failure. This causes opal_init to abort before initializing the event engine. So when we try to do the threading tests, the event engine is uninitialized and fails.

So this is why it fails, but I do not know how best to fix it. Any suggestions would be appreciated.

Tim
[OMPI devel] make check failing
Hi, We have been having a problem lately with our MTT runs where make check would fail when mpi threads were enabled. Turns out the problem is that opal_init now calls opal_base_carto_select, which cannot find any carto modules since we have not done an install yet. So it returns a failure. This causes opal_init to abort before initializing the event engine. So when we try to do the threading tests, the event engine is uninitialized and fails. So this is why it fails, but I do not know how best to fix it. Any suggestions would be appreciated. Tim
Re: [OMPI devel] [RFC] Default hostfile MCA param
We have used '^' elsewhere to indicate "not", so maybe the syntax could simply be: if you put '^' at the beginning of a line, that node is not used. So we could have:

n0
n1
^headnode
n3

I understand the idea of having a flag to indicate that all nodes below a certain point should be ignored, but I think this might get confusing, and I'm unsure how useful it would be. I just see the usefulness of this for blocking out a couple of nodes by default. Besides, if you do want to block out many nodes, any reasonable text editor allows you to insert '^' in front of any number of lines easily. Alternatively, for the particular situation that Edgar mentions, it may be good enough just to set rmaps_base_no_schedule_local in the mca params default file.

One question though: if I am in a slurm allocation which contains n1, and there is a default hostfile that contains "^n1", will I run on n1? I'm not sure what the answer is; I know we talked about the precedence earlier...

Tim

Ralph H Castain wrote:
I personally have no objection, but I would ask then that the wiki be modified to cover this case. All I require is that someone define the syntax to be used to indicate "this is a node I do -not- want used", or alternatively a flag that indicates "all nodes below are -not- to be used". Implementation isn't too hard once I have that...

On 3/3/08 9:44 AM, "Edgar Gabriel" wrote:
Ralph,

could this mechanism also be used to exclude a node, indicating to never run a job there? Here is the problem that I face quite often: students working on their homework forget to allocate a partition on the cluster, and just type mpirun. Because of that, all jobs end up running on the front-end node. If we now had the ability to specify in a default hostfile to never run a job on a specified node (e.g. the front-end node), users would get an error message when trying to do that. I am aware that that's a little ugly...
Thanks, Edgar. Ralph Castain wrote: I forget all the formatting we are supposed to use, so I hope you'll all just bear with me. George brought up the fact that we used to have an MCA param to specify a hostfile to use for a job. The hostfile behavior described on the wiki, however, doesn't provide for that option. It associates a hostfile with a specific app_context, and provides a detailed hierarchical layout of how mpirun is to interpret that information. What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile" to replace the deprecated capability. If found, the system's behavior will be: 1. in a managed environment, the default hostfile will be used to filter the discovered nodes to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to further filter the node pool to define the specific nodes for use by that app_context. Thus, nodes in the hostfile and dash host options given to an app_context -must- also be in the default hostfile in order to be available for use by that app_context - any nodes in the app_context options that are not in the default hostfile will be ignored. 2. in an unmanaged environment, the default hostfile will be used to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to filter the node pool to define the specific nodes for use by that app_context, subject to the previous caveat. However, add-hostfile and add-host options will add nodes to the node pool for use -only- by the associated app_context. I believe this proposed behavior is consistent with that described on the wiki, and would be relatively easy to implement. If nobody objects, I will do so by end-of-day 3/6. Comments, suggestions, objections - all are welcome! 
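If the proposal lands as described, a default hostfile using the '^' exclusion syntax discussed earlier in this thread might look like the following (hypothetical paths and node names; the '^' syntax was only proposed, not implemented, at the time of this thread):

```
# hypothetical default hostfile, e.g. /opt/ompi/etc/default-hostfile
n0
n1
^headnode    # proposed '^' syntax: never schedule ranks here
n3

# pointing the proposed MCA param at it, via the environment:
#   export OMPI_MCA_default_hostfile=/opt/ompi/etc/default-hostfile
# or in the MCA params default file (openmpi-mca-params.conf):
#   default_hostfile = /opt/ompi/etc/default-hostfile
```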
Ralph ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] [RFC] Removal of orte_proc_table
WHAT: Removal of orte_proc_table WHY: It is the last 'orte' class, its implementation is an abstraction violation since it assumes certain things about how the opal_hash_table is implemented, and it is not much code to remove it. WHERE: This will necessitate minor changes in: btl: tcp, sctp oob: tcp routed: unity, tree grpcomm: basic iof: svc TIMEOUT: COB Wednesday, March 5. DESCRIPTION: After the orte changes, we were left with only one orte 'class' left. So Ralph and I discussed the possibility of removing it, and found that the amount of code change to do so is relatively minor. There are also a couple other good reasons to remove or revamp it: 1. The way the orte_proc_table was used was confusing (since you created an opal_hash_table and then used it as an opal_proc_table). 2. It assumed things about the implementation of the opal_hash_table. So with this in mind, we feel it would be good to remove it. Attached is a patch that removes the usage of orte_proc_table. If there are no objections, I will commit it COB Wednesday (likely with a couple minor cleanups). 
Index: orte/mca/oob/tcp/oob_tcp_peer.c
===================================================================
--- orte/mca/oob/tcp/oob_tcp_peer.c (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp_peer.c (working copy)
@@ -55,7 +55,7 @@
 #include "opal/util/net.h"
 #include "opal/util/error.h"
-#include "orte/class/orte_proc_table.h"
+#include "opal/class/opal_hash_table.h"
 #include "orte/util/name_fns.h"
 #include "orte/runtime/orte_globals.h"
 #include "orte/mca/errmgr/errmgr.h"
@@ -216,14 +216,14 @@
 mca_oob_tcp_peer_t * mca_oob_tcp_peer_lookup(const orte_process_name_t* name)
 {
     int rc;
-    mca_oob_tcp_peer_t * peer, *old;
+    mca_oob_tcp_peer_t * peer = NULL, *old;
     if (NULL == name) { /* can't look this one up */
         return NULL;
     }
     OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock);
-    peer = (mca_oob_tcp_peer_t*)orte_hash_table_get_proc(
-        &mca_oob_tcp_component.tcp_peers, name);
+    opal_hash_table_get_value_uint64(&mca_oob_tcp_component.tcp_peers,
+        orte_util_hash_name(name), (void**)&peer);
     if (NULL != peer && 0 == orte_util_compare_name_fields(ORTE_NS_CMP_ALL, &peer->peer_name, name)) {
         OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
         return peer;
@@ -247,8 +247,8 @@
     peer->peer_retries = 0;
     /* add to lookup table */
-    if(ORTE_SUCCESS != orte_hash_table_set_proc(&mca_oob_tcp_component.tcp_peers,
-        &peer->peer_name, peer)) {
+    if(OPAL_SUCCESS != opal_hash_table_set_value_uint64(&mca_oob_tcp_component.tcp_peers,
+        orte_util_hash_name(&peer->peer_name), peer)) {
         MCA_OOB_TCP_PEER_RETURN(peer);
         OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
         return NULL;
Index: orte/mca/oob/tcp/oob_tcp_peer.h
===================================================================
--- orte/mca/oob/tcp/oob_tcp_peer.h (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp_peer.h (working copy)
@@ -93,7 +93,7 @@
 #define MCA_OOB_TCP_PEER_RETURN(peer) \
 { \
     mca_oob_tcp_peer_shutdown(peer); \
-    orte_hash_table_remove_proc(&mca_oob_tcp_component.tcp_peers, &peer->peer_name); \
+    opal_hash_table_remove_value_uint64(&mca_oob_tcp_component.tcp_peers, orte_util_hash_name(&peer->peer_name)); \
     OPAL_FREE_LIST_RETURN(&mca_oob_tcp_component.tcp_peer_free, \
         &peer->super); \
 }
Index: orte/mca/oob/tcp/oob_tcp.c
===================================================================
--- orte/mca/oob/tcp/oob_tcp.c (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp.c (working copy)
@@ -50,7 +50,6 @@
 #include "opal/util/net.h"
 #include "opal/class/opal_hash_table.h"
-#include "orte/class/orte_proc_table.h"
 #include "orte/mca/errmgr/errmgr.h"
 #include "orte/mca/rml/rml.h"
 #include "orte/util/name_fns.h"
@@ -1125,12 +1124,12 @@
 int mca_oob_tcp_resolve(mca_oob_tcp_peer_t* peer)
 {
-    mca_oob_tcp_addr_t* addr;
+    mca_oob_tcp_addr_t* addr = NULL;
     /* if the address is already cached - simply return it */
     OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock);
-    addr = (mca_oob_tcp_addr_t *)orte_hash_table_get_proc(&mca_oob_tcp_component.tcp_peer_names,
-        &peer->peer_name);
+    opal_hash_table_get_value_uint64(&mca_oob_tcp_component.tcp_peer_names,
+        orte_util_hash_name(&peer->peer_name), (void**)&addr);
     OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
     if(NULL != addr) {
         mca_oob_tcp_peer_resolved(peer, addr);
@@ -1459,23 +1458,26 @@
 int mca_oob_tcp_set_addr(const orte_process_name_t* name, const char* uri)
 {
     struct sockaddr_storage inaddr;
-    mca_oob_tcp_addr_t* addr;
-    mca_oob_tcp_peer_t* peer;
+    mca_oob_tcp_addr_t* addr = NULL;
+    mca_oob_tcp_peer_t*
Re: [OMPI devel] C++ errhandler
Done: https://svn.open-mpi.org/trac/ompi/ticket/1216 Tim Jeff Squyres wrote: Blah; it is not a known issue. I swear I tested this, but I must have goofed. :-( Can you file a bug and assign it to me? I can't look at it this second, but perhaps I can later today. Thanks... On Feb 15, 2008, at 9:19 AM, Tim Prins wrote: Hi, We are running into a problem with the IBM test cxx_call_errhandler since the merge of the c++ bindings changes. Not sure if this is a known problem, but I did not see a bug or any traffic about this one. MTT link: http://www.open-mpi.org/mtt/index.php?do_redir=532 Thanks, Tim
[OMPI devel] C++ errhandler
Hi, We are running into a problem with the IBM test cxx_call_errhandler since the merge of the c++ bindings changes. Not sure if this is a known problem, but I did not see a bug or any traffic about this one. MTT link: http://www.open-mpi.org/mtt/index.php?do_redir=532 Thanks, Tim
Re: [OMPI devel] New address selection for btl-tcp (was Re: [OMPI svn] svn:open-mpi r17307)
Adrian Knoth wrote: On Fri, Feb 01, 2008 at 11:40:20AM -0500, Tim Prins wrote: Adrian, Hi! Sorry for the late reply and thanks for your testing. 1. There are some warnings when compiling: I've fixed these issues. Thanks. 2. If I exclude all my tcp interfaces, the connection fails properly, but I do get a malloc request for 0 bytes: [tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude eth0,ib0,lo -np 2 ./ring_c malloc debug: Request for 0 bytes (btl_tcp_component.c, 844) malloc debug: Request for 0 bytes (btl_tcp_component.c, 844) Not my fault, but I guess we could fix it anyway. Should we? It probably should be fixed. But I've noticed that other BTLs (such as MX) do not properly handle the case where there are no available interfaces either... 3. If the exclude list does not contain 'lo', or the include list contains 'lo', the job hangs when using multiple nodes: That's weird. Loopback interfaces should automatically be excluded right from the beginning. See opal/util/if.c. I neither know nor have checked where things go wrong. Do you want to investigate? As already mentioned, this should not happen. I took a quick glance at this file, and I'd be lying if I said I understood what was going on in it. One thing I did notice is that the parameter btl_tcp_if_exclude defaults to 'lo', but the user can of course overwrite it. It might be worth looking into this further. If the user got an error or the job aborted if they did something wrong with 'lo' I would not worry about it at all. But the fact that it causes a hang is worrisome to me. Can you post the output of "ip a s" or "ifconfig -a"? It is at the end of the email. However, the great news about this patch is that it appears to fix https://svn.open-mpi.org/trac/ompi/ticket/1027 for me. It also fixes my #1206. I'd like to merge tmp-public/btl-tcp into the trunk, especially before the 1.3 code freeze. Any objections? Not from me, especially now that it is already in the trunk :). 
Tim
--
ifconfig -a:
eth0      Link encap:Ethernet  HWaddr 00:E0:81:2D:0B:08
          inet addr:129.79.240.101  Bcast:129.79.240.255  Mask:255.255.255.0
          inet6 addr: 2001:18e8:2:240:2e0:81ff:fe2d:b08/64 Scope:Global
          inet6 addr: fe80::2e0:81ff:fe2d:b08/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:555918407 errors:0 dropped:2122 overruns:0 frame:0
          TX packets:569928551 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:448936694980 (418.1 GiB)  TX bytes:486030858441 (452.6 GiB)
          Interrupt:193

eth1      Link encap:Ethernet  HWaddr 00:E0:81:2D:0B:09
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:201

ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.0.101  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:0:5d71/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:6304819 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6355094 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:26794850321 (24.9 GiB)  TX bytes:35448899645 (33.0 GiB)

ib1       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:182055033 errors:0 dropped:0 overruns:0 frame:0
          TX packets:182055033 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:997605665018 (929.0 GiB)  TX bytes:997605665018 (929.0 GiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

ip a s:
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:e0:81:2d:0b:08 brd ff:ff:ff:
Re: [OMPI devel] C++ build failures
I just talked to Jeff about this. The problem was that on Sif we use --enable-visibility, and apparently the new c++ bindings access ompi_errhandler_create, which was not OMPI_DECLSPEC'd. Jeff will fix this soon. Tim Jeff Squyres wrote: I'm a little concerned about the C++ test build failures from last night: http://www.open-mpi.org/mtt/index.php?do_redir=530 They are likely due to the C++ changes that came in over the weekend, but they *only* showed up at IU, which is somewhat odd. I'm trying to replicate now (doing a fresh build of the trunk and will build the tests that failed for you), but I'm kinda guessing it's going to work fine on my platforms. IU: do you have any idea what caused these failures? Does sif have a newer compiler that is somehow picking up on a latent bug that we missed in the C++ stuff?
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307
Adrian, For the most part this seems to work for me. But there are a few issues. I'm not sure which are introduced by this patch, and whether some may be expected behavior. But for completeness I will point them all out. First, let me explain I am working on a machine with 3 tcp interfaces, lo, eth0, and ib0. Both eth0 and ib0 connect all the compute nodes. 1. There are some warnings when compiling: btl_tcp_proc.c:171: warning: no previous prototype for 'evaluate_assignment' btl_tcp_proc.c:206: warning: no previous prototype for 'visit' btl_tcp_proc.c:224: warning: no previous prototype for 'mca_btl_tcp_initialise_interface' btl_tcp_proc.c: In function `mca_btl_tcp_proc_insert': btl_tcp_proc.c:304: warning: pointer targets in passing arg 2 of `opal_ifindextomask' differ in signedness btl_tcp_proc.c:313: warning: pointer targets in passing arg 2 of `opal_ifindextomask' differ in signedness btl_tcp_proc.c:389: warning: comparison between signed and unsigned btl_tcp_proc.c:400: warning: comparison between signed and unsigned btl_tcp_proc.c:401: warning: comparison between signed and unsigned btl_tcp_proc.c:459: warning: ISO C90 forbids variable-size array `a' btl_tcp_proc.c:459: warning: ISO C90 forbids mixed declarations and code btl_tcp_proc.c:465: warning: ISO C90 forbids mixed declarations and code btl_tcp_proc.c:466: warning: comparison between signed and unsigned btl_tcp_proc.c:480: warning: comparison between signed and unsigned btl_tcp_proc.c:485: warning: comparison between signed and unsigned btl_tcp_proc.c:495: warning: comparison between signed and unsigned 2. If I exclude all my tcp interfaces, the connection fails properly, but I do get a malloc request for 0 bytes: tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude eth0,ib0,lo -np 2 ./ring_c malloc debug: Request for 0 bytes (btl_tcp_component.c, 844) malloc debug: Request for 0 bytes (btl_tcp_component.c, 844) 3. 
If the exclude list does not contain 'lo', or the include list contains 'lo', the job hangs when using multiple nodes: [tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude ib0 -np 2 -bynode ./ring_c Process 0 sending 10 to 1, tag 201 (2 processes in ring) [odin011][1,0][btl_tcp_endpoint.c:619:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection refused (111) [tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_include eth0,lo -np 2 -bynode ./ring_c Process 0 sending 10 to 1, tag 201 (2 processes in ring) [odin011][1,0][btl_tcp_endpoint.c:619:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection refused (111) However, the great news about this patch is that it appears to fix https://svn.open-mpi.org/trac/ompi/ticket/1027 for me. Hope this helps, Tim Adrian Knoth wrote: On Wed, Jan 30, 2008 at 06:48:54PM +0100, Adrian Knoth wrote: What is the real issue behind this whole discussion? Hanging connections. I'll have a look at it tomorrow. To everybody who's interested in BTL-TCP, especially George and (to a minor degree) rhc: I've integrated something that I call "magic address selection code". See the comments in r17348. Can you check https://svn.open-mpi.org/svn/ompi/tmp-public/btl-tcp if it's working for you? Read: multi-rail TCP, FNN, whatever is important to you? The code is proof of concept and could use a little tuning (if it's working at all. Over here, it satisfies all tests). I vaguely remember that at least Ralph doesn't like int a[perm_size * sizeof(int)]; where perm_size is dynamically evaluated (read: array size is runtime dependent) There are also some large arrays, search for MAX_KERNEL_INTERFACE_INDEX. Perhaps it's better to replace them with an appropriate OMPI data structure. I don't know what fits best, you guys know the details... So please give the code a try, and if it's working, feel free to cleanup whatever is necessary to make it the OMPI style or give me some pointers what to change. 
I'd like to point to Thomas' diploma thesis. The PDF explains the theory behind the code; it's like a rationale. Unfortunately, the PDF has some typos, but I guess you'll get the idea. It's a graph matching algorithm, and Chapter 3 covers everything in detail: http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf HTH
Re: [OMPI devel] vt compiler warnings and errors
Hi Matthias, I just noticed something else that seems odd. On a fresh checkout, I did an autogen and configure. Then I typed 'make clean'. Things seem to progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a new configure script gets run. Specifically: [tprins@sif test]$ make clean Making clean in otf make[5]: Entering directory `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf' cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run automake-1.10 --gnu cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run autoconf /bin/sh ./config.status --recheck running CONFIG_SHELL=/bin/sh /bin/sh ./configure --with-zlib-lib=-lz --prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin --libdir=/usr/local/lib --includedir=/usr/local/include --datarootdir=/usr/local/share/vampirtrace --datadir=${prefix}/share/${PACKAGE_TARNAME} --docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null --srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions -pthread LDFLAGS= LIBS=-lnsl -lutil -lm CPPFLAGS= CFLAGS=-g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread FFLAGS= --no-create --no-recursion checking build system type... x86_64-unknown-linux-gnu Not sure if this is expected behavior, but it seems wrong to me. Thanks, Tim Matthias Jurenz wrote: Hello, all three VT related errors which MTT reported should be fixed now. 516: The fix from George Bosilca this morning should work on MacOS PPC. Thanks! 517: The compile error occurred due to a missing header include. Furthermore, the compiler warnings should also be fixed. 518: I have added a check whether MPI I/O is available and added the corresponding VT configure option to enable/disable MPI I/O support. For that I used the variable "define_mpi_io" from 'ompi/mca/io/configure.m4'. 
Is that o.k. or should I use another variable ? Matthias On Di, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote: I got a bunch of compiler warnings and errors with VT on the PGI compiler last night -- my mail client won't paste it in nicely. :-( See these MTT reports for details: - On Absoft systems: http://www.open-mpi.org/mtt/index.php?do_redir=516 - On Cisco systems: With PGI compilers: http://www.open-mpi.org/mtt/index.php?do_redir=517 With GNU compilers: http://www.open-mpi.org/mtt/index.php?do_redir=518 The output may be a bit hard to read -- for MTT builds, we separate the stdout and stderr into 2 streams. So you kinda have to merge them in your head; sorry... -- Matthias Jurenz, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A106, Zellescher Weg 12, 01062 Dresden phone +49-351-463-31945, fax +49-351-463-37773
[OMPI devel] 32 bit udapl warnings
Hi, I am seeing some warnings on the trunk when compiling udapl in 32 bit mode with OFED 1.2.5.1: btl_udapl.c: In function 'udapl_reg_mr': btl_udapl.c:95: warning: cast from pointer to integer of different size btl_udapl.c: In function 'mca_btl_udapl_alloc': btl_udapl.c:852: warning: cast from pointer to integer of different size btl_udapl.c: In function 'mca_btl_udapl_prepare_src': btl_udapl.c:959: warning: cast from pointer to integer of different size btl_udapl.c:1008: warning: cast from pointer to integer of different size btl_udapl_component.c: In function 'mca_btl_udapl_component_progress': btl_udapl_component.c:871: warning: cast from pointer to integer of different size btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager': btl_udapl_endpoint.c:130: warning: cast from pointer to integer of different size btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max': btl_udapl_endpoint.c:775: warning: cast from pointer to integer of different size btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv': btl_udapl_endpoint.c:864: warning: cast from pointer to integer of different size btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_initialize_control_message': btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of different size Thanks, Tim
Re: [OMPI devel] vt compiler warnings and errors
Jeff Squyres wrote: I got a bunch of compiler warnings and errors with VT on the PGI compiler last night -- my mail client won't paste it in nicely. :-( See these MTT reports for details: - On Absoft systems: http://www.open-mpi.org/mtt/index.php?do_redir=516 - On Cisco systems: With PGI compilers: http://www.open-mpi.org/mtt/index.php?do_redir=517 With GNU compilers: http://www.open-mpi.org/mtt/index.php?do_redir=518 Note that this last link points to the IU failures when configuring with '--disable-mpi-io' which I reported earlier this morning. Tim
Re: [OMPI devel] patch for building gm btl
On Wednesday 02 January 2008 08:52:08 am Jeff Squyres wrote: > On Dec 31, 2007, at 11:42 PM, Paul H. Hargrove wrote: > > I tried today to build the OMPI trunk on a system w/ GM libs installed > > (I tried both GM-2.0.16 and GM-1.6.4) and found that the GM BTL won't > > even compile, due to unbalanced parens. The patch below reintroduces > > the parens that were apparently lost in r16633: > > Fixed (https://svn.open-mpi.org/trac/ompi/changeset/17029); thanks for > the patch. > > > The fact that this has gone unfixed for 2 months suggests to me that > > nobody is building the GM BTL. So, how would I go about checking ... > > a) ...if there exists any periodic build of the GM BTL via MTT? > I thought that Indiana was doing GM builds, but perhaps they've > upgraded to MX these days...? This is correct. Our GM system was upgraded, and is now running MX (although we have yet to set up MTT on the upgraded system...). Tim
[OMPI devel] opal_condition_wait
Hi, A couple of questions. First, in opal_condition_wait (condition.h:97) we do not release the passed mutex if opal_using_threads() is not set. Is there a reason for this? I ask since this violates the way condition variables are supposed to work, and it seems like there are situations where this could cause deadlock. Also, when we are using threads, there is a case where we do not decrement the signaled count, in condition.h:84. Gleb put this in in r9451, however the change does not make sense to me. I think that the signal count should always be decremented. Can anyone shine any light on these issues? Thanks, Tim
[OMPI devel] opal_condition
Hi, Last night we had one of our threaded builds on the trunk hang when running make check on the test opal_condition in test/threads/ After running the test about 30-40 times, I was only able to get it to hang once. Looking at it is gdb, we get: (gdb) info threads 3 Thread 1084229984 (LWP 8450) 0x002a95e3bba9 in sched_yield () from /lib64/tls/libc.so.6 2 Thread 1094719840 (LWP 8451) 0xff600012 in ?? () 1 Thread 182904955328 (LWP 8430) 0x002a9567309b in pthread_join () from /lib64/tls/libpthread.so.0 (gdb) thread 2 [Switching to thread 2 (Thread 1094719840 (LWP 8451))]#0 0xff600012 in ?? () (gdb) bt #0 0xff600012 in ?? () #1 0x0001 in ?? () #2 0x in ?? () (gdb) thread 1 [Switching to thread 1 (Thread 182904955328 (LWP 8430))]#0 0x002a9567309b in pthread_join () from /lib64/tls/libpthread.so.0 (gdb) bt #0 0x002a9567309b in pthread_join () from /lib64/tls/libpthread.so.0 #1 0x002a95794a7d in opal_thread_join () from /san/homedirs/mpiteam/mtt-runs/odin/20071204-Nightly/pb_2/installs/Bp80/src/openmpi-1.3a1r16847/opal/.libs/libopen-pal.so.0 #2 0x00401684 in main () (gdb) thread 3 [Switching to thread 3 (Thread 1084229984 (LWP 8450))]#0 0x002a95e3bba9 in sched_yield () from /lib64/tls/libc.so.6 (gdb) bt #0 0x002a95e3bba9 in sched_yield () from /lib64/tls/libc.so.6 #1 0x00401216 in thr1_run () #2 0x002a95672137 in start_thread () from /lib64/tls/libpthread.so.0 #3 0x002a95e53113 in clone () from /lib64/tls/libc.so.6 (gdb) I know, this is not very helpful, but I have no idea what is going on. There have been no changes in this code area for a long time. Has anyone else seen something like this? Any ideas what is going on? Thanks, Tim
Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities
Well, I think it is pretty obvious that I am a fan of a attribute system :) For completeness, I will point out that we also exchange architecture and hostname info in the modex. Do we really need a complete node map? A far as I can tell, it looks like the MPI layer only needs a list of local processes. So maybe it would be better to forget about the node ids at the mpi layer and just return the local procs. So my vote would be to leave the modex alone, but remove the node id, and add a function to get the list of local procs. It doesn't matter to me how the RTE implements that. Alternatively, if we did a process attribute system we could just use predefined attributes, and the runtime can get each process's node id however it wants. Tim Ralph H Castain wrote: IV. RTE/MPI relative modex responsibilities The modex operation conducted during MPI_Init currently involves the exchange of two critical pieces of information: 1. the location (i.e., node) of each process in my job so I can determine who shares a node with me. This is subsequently used by the shared memory subsystem for initialization and message routing; and 2. BTL contact info for each process in my job. During our recent efforts to further abstract the RTE from the MPI layer, we pushed responsibility for both pieces of information into the MPI layer. This wasn't done capriciously - the modex has always included the exchange of both pieces of information, and we chose not to disturb that situation. However, the mixing of these two functional requirements does cause problems when dealing with an environment such as the Cray where BTL information is "exchanged" via an entirely different mechanism. In addition, it has been noted that the RTE (and not the MPI layer) actually "knows" the node location for each process. 
Hence, questions have been raised as to whether: (a) the modex should be built into a framework to allow multiple BTL exchange mechanisms to be supported, or some alternative mechanism be used - one suggestion made was to implement an MPICH-like attribute exchange; and (b) the RTE should absorb responsibility for providing a "node map" of the processes in a job (note: the modex may -use- that info, but would no longer be required to exchange it). This has a number of implications that need to be carefully considered - e.g., the memory required to store the node map in every process is non-zero. On the other hand: (i) every proc already -does- store the node for every proc - it is simply stored in the ompi_proc_t structures as opposed to somewhere in the RTE. We would want to avoid duplicating that storage, but there would be no change in memory footprint if done carefully. (ii) every daemon already knows the node map for the job, so communicating that info to its local procs may not prove a major burden. However, the very environments where this subject may be an issue (e.g., the Cray) do not use our daemons, so some alternative mechanism for obtaining the info would be required. So the questions to be considered here are: (a) do we leave the current modex "as-is", to include exchange of the node map, perhaps including "#if" statements to support different exchange mechanisms? (b) do we separate the two functions currently in the modex and push the requirement to obtain a node map into the RTE? If so, how do we want the MPI layer to retrieve that info so we avoid increasing our memory footprint? (c) do we create a separate modex framework for handling the different exchange mechanisms for BTL info, do we incorporate it into an existing one (if so, which one), the new publish-subscribe framework, implement an alternative approach, or...? (d) other suggestions? Ralph
Re: [OMPI devel] [OMPI svn] svn:open-mpi r16723
Hi, The following files bother me about this commit: trunk/ompi/mca/btl/sctp/sctp_writev.c trunk/ompi/mca/btl/sctp/sctp_writev.h They bother me for 2 reasons: 1. Their naming does not follow the prefix rule 2. They are LGPL licensed. While I personally like the LGPL, I do not believe it is compatible with the BSD license that OMPI is distributed under. I think (though I could be wrong) that these files need to be removed from the repository and the functionality implemented in some other way. Tim pen...@osl.iu.edu wrote: Author: penoff Date: 2007-11-13 18:39:16 EST (Tue, 13 Nov 2007) New Revision: 16723 URL: https://svn.open-mpi.org/trac/ompi/changeset/16723 Log: initial SCTP BTL commit Added: trunk/ompi/mca/btl/sctp/ trunk/ompi/mca/btl/sctp/.ompi_ignore trunk/ompi/mca/btl/sctp/.ompi_unignore trunk/ompi/mca/btl/sctp/Makefile.am trunk/ompi/mca/btl/sctp/btl_sctp.c trunk/ompi/mca/btl/sctp/btl_sctp.h trunk/ompi/mca/btl/sctp/btl_sctp_addr.h trunk/ompi/mca/btl/sctp/btl_sctp_component.c trunk/ompi/mca/btl/sctp/btl_sctp_component.h trunk/ompi/mca/btl/sctp/btl_sctp_endpoint.c trunk/ompi/mca/btl/sctp/btl_sctp_endpoint.h trunk/ompi/mca/btl/sctp/btl_sctp_frag.c trunk/ompi/mca/btl/sctp/btl_sctp_frag.h trunk/ompi/mca/btl/sctp/btl_sctp_hdr.h trunk/ompi/mca/btl/sctp/btl_sctp_proc.c trunk/ompi/mca/btl/sctp/btl_sctp_proc.h trunk/ompi/mca/btl/sctp/btl_sctp_recv_handler.c trunk/ompi/mca/btl/sctp/btl_sctp_recv_handler.h trunk/ompi/mca/btl/sctp/btl_sctp_utils.c trunk/ompi/mca/btl/sctp/btl_sctp_utils.h trunk/ompi/mca/btl/sctp/configure.m4 trunk/ompi/mca/btl/sctp/configure.params trunk/ompi/mca/btl/sctp/sctp_writev.c trunk/ompi/mca/btl/sctp/sctp_writev.h Diff not shown due to size (201438 bytes). To see the diff, run the following command: svn diff -r 16722:16723 --no-diff-deleted ___ svn mailing list s...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn
Re: [OMPI devel] Multiworld MCA parameter values broken
I'm curious what changed to make this a problem. How were we passing mca param from the base to the app before, and why did it change? I think that options 1 & 2 below are no good, since we, in general, allow string mca params to have spaces (as far as I understand it). So a more general approach is needed. Tim On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote: > Sorry for delay - wasn't ignoring the issue. > > There are several fixes to this problem - ranging in order from least to > most work: > > 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param. > It won't affect anything on the backend because the daemon/procs don't use > ssh. > > 2. include "pls_rsh_agent" in the array of mca params not to be passed to > the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the > orte_pls_base_orted_append_basic_args function. This would fix the specific > problem cited here, but I admit that listing every such param by name would > get tedious. > > 3. we could easily detect that a "problem" character was in the mca param > value when we add it to the orted's argv, and then put "" around it. The > problem, however, is that the mca param parser on the far end doesn't > remove those "" from the resulting string. At least, I spent over a day > fighting with a problem only to discover that was happening. Could be an > error in the way I was doing things, or could be a real characteristic of > the parser. Anyway, we would have to ensure that the parser removes any > surrounding "" before passing along the param value or this won't work. > > Ralph > > On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: > > Hi, > > > > Commit 16364 broke things when using multiword mca param values. For > > instance: > > > > mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent > > "ssh -Y" xterm > > > > Will crash and burn, because the value "ssh -Y" is being stored into the > > argv orted_cmd_line in orterun.c:1506. 
This is then added to the launch > > command for the orted: > > > > /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ; > > export PATH ; > > LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ; > > export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug > > --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename > > odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872 > > --nsreplica > > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908" > > --gprreplica > > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908" > > -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca > > mca_base_param_file_path > > /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples > > -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples > > > > Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So > > the quotes have been lost, and we die a horrible death. > > > > So we need to add the quotes back in somehow, or pass these options > > differently. I'm not sure what the best way to fix this is. > > > > Thanks, > > > > Tim
Re: [OMPI devel] Environment forwarding
Thanks for the clarification everyone. Tim On Monday 05 November 2007 05:41:00 pm Torsten Hoefler wrote: > On Mon, Nov 05, 2007 at 05:32:04PM -0500, Brian W. Barrett wrote: > > On Mon, 5 Nov 2007, Torsten Hoefler wrote: > > > On Mon, Nov 05, 2007 at 04:57:19PM -0500, Brian W. Barrett wrote: > > >> This is extremely tricky to do. How do you know which environment > > >> variables to forward (foo in this case) and which not to (hostname)? > > >> SLURM has a better chance, since it's Linux only and generally only > > >> run on tightly controlled clusters. But there's a whole variety of > > >> things that shouldn't be forwarded, and that list differs from OS to > > >> OS. > > >> > > >> I believe we toyed around with the "right thing" in LAM and early on > > >> with Open MPI and decided that it was too hard to meet expected > > >> behavior. > > > > > > Some applications rely on this (I know at least two right away, Gamess > > > and Abinit) and they work without problems with Lam/Mpich{1,2} but not > > > with Open MPI. I am *not* arguing that those applications are correct > > > (I agree that this way of passing arguments is ugly, but it's done). > > > > > > I know it's not defined in the standard but I think it's a nice > > > convenient functionality. E.g., setting the LD_LIBRARY_PATH to find > > > libmpi.so in the .bashrc is also a pain if you have multiple (Open) > > > MPIs installed. > > > > LAM does not automatically propagate environment variables -- its > > behavior is almost *exactly* like Open MPI's. There might be a situation > > where the environment is not quite so scrubbed if a process is started on > > the same node mpirun is executed on, but it's only appearances -- in > > reality, that's the environment that was alive when lamboot was executed. > > ok, I might have executed it on the same node (was a while ago). > > > With both LAM and Open MPI, there is the -x option to propagate a list of > > environment variables, but that's about it.
Neither will push > > LD_LIBRARY_PATH by default (and there are many good reasons for that, > > particularly in heterogeneous situations). > > Ah, heterogeneous! Yes, I agree. > > Torsten
[OMPI devel] Environment forwarding
Hi, After talking with Torsten today I found something weird. When using the SLURM pls we seem to forward a user's environment, but when using the rsh pls we do not. I.e.: [tprins@odin ~]$ mpirun -np 1 printenv |grep foo [tprins@odin ~]$ export foo=bar [tprins@odin ~]$ mpirun -np 1 printenv |grep foo foo=bar [tprins@odin ~]$ mpirun -np 1 -mca pls rsh printenv |grep foo So my question is which is the expected behavior? I don't think we can do anything about SLURM automatically forwarding the environment, but I think there should be a way to make rsh forward the environment. Perhaps add a flag to mpirun to do this? Thanks, Tim
[OMPI devel] Multiworld MCA parameter values broken
Hi, Commit 16364 broke things when using multiword mca param values. For instance: mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent "ssh -Y" xterm Will crash and burn, because the value "ssh -Y" is being stored into the argv orted_cmd_line in orterun.c:1506. This is then added to the launch command for the orted: /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872 --nsreplica "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908" --gprreplica "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908" -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca mca_base_param_file_path /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So the quotes have been lost, as we die a horrible death. So we need to add the quotes back in somehow, or pass these options differently. I'm not sure what the best way to fix this. Thanks, Tim
[OMPI devel] Use of orte_pointer_array in openib and udapl btls
Hi, The openib and udapl btls currently use the orte_pointer_array class. This is a problem for me as I am trying to implement the RSL. So, as far as I can tell, there are 3 options: 1. Move the orte_pointer_array class to opal. This would be quite simple to do and makes sense in that there is nothing in the orte_pointer_array specific to orte. 2. Change the udapl and openib btls to use a simple C array. There is currently a ticket filed (https://svn.open-mpi.org/trac/ompi/ticket/727) to do this in the openib btl. 3. Change the btls to use the ompi_pointer_array. This might not be a good idea since the above ticket says that the ompi_pointer array was intentionally not used. Any of these options are fine with me, although if #2 is picked someone else would probably need to do most of the work. Comments? Thanks, Tim
[OMPI devel] Non-blocking modex
Hi, I am working on implementing the RSL. Part of this is changing the modex to use the process attribute system in the RSL. I had designed this system to include a non-blocking interface. However, I have looked again and noticed that nobody is using the non-blocking modex receive. Because of this I am inclined to not have such an interface in the RSL. Please let me know if there are any objections to this, and if anyone is currently planning on using such functionality. Thanks, Tim
[OMPI devel] RFC: Remove opal message buffer
WHAT: Remove the opal message buffer code WHY: It is not used WHERE: Remove references from opal/mca/base/Makefile.am and opal/mca/base/base.h svn rm opal/mca/base/mca_base_msgbuf* WHEN: After timeout TIMEOUT: COB, Wednesday October 10, 2007 I ran into this code accidentally while looking at other things. It looks like it was originally designed to be our data packing/unpacking system, but we now use the dss for that. A couple of greps through the code do not find anyone who actually uses this functionality. So, to reduce future confusion and excess code, I would like to remove it.
Re: [OMPI devel] Malloc segfaulting?
But I am compiling Open MPI with --without-memory-manager, so it should work? Anyways, I ran the tests and valgrind is reporting 2 different (potentially related) problems: 1. ==12680== Invalid read of size 4 ==12680==at 0x709DE03: ompi_cb_fifo_write_to_head (ompi_circular_buffer_fifo.h:271) ==12680==by 0x709DA77: ompi_fifo_write_to_head (ompi_fifo.h:324) ==12680==by 0x709D964: mca_btl_sm_component_progress (btl_sm_component.c:398) ==12680==by 0x705BF6B: mca_bml_r2_progress (bml_r2.c:110) ==12680==by 0x44F905B: opal_progress (opal_progress.c:187) ==12680==by 0x704F0E5: opal_condition_wait (condition.h:98) ==12680==by 0x704EFD4: mca_pml_ob1_recv (pml_ob1_irecv.c:124) ==12680==by 0x7202A62: ompi_coll_tuned_scatter_intra_binomial (coll_tuned_scatter.c:166) ==12680==by 0x71F2C08: ompi_coll_tuned_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:746) ==12680==by 0x4442494: PMPI_Scatter (pscatter.c:125) ==12680==by 0x8048F6F: main (scatter_in_place.c:73) 2. ==28775== Jump to the invalid address stated on the next line ==28775==at 0x2F305F35: ??? ==28775==by 0x704AF6B: mca_bml_r2_progress (bml_r2.c:110) ==28775==by 0x44F905B: opal_progress (opal_progress.c:187) ==28775==by 0x440BF6B: opal_condition_wait (condition.h:98) ==28775==by 0x440BDF7: ompi_request_wait (req_wait.c:46) ==28775==by 0x71EF396: ompi_coll_tuned_reduce_scatter_intra_basic_recursivehalving (coll_tuned_reduce_scatter.c:319) ==28775==by 0x71E1540: ompi_coll_tuned_reduce_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:471) ==28775==by 0x7202806: ompi_osc_pt2pt_module_fence (osc_pt2pt_sync.c:84) ==28775==by 0x44501B5: PMPI_Win_fence (pwin_fence.c:57) ==28775==by 0x80493D6: test_acc3_1 (test_acc3.c:156) ==28775==by 0x8048FD0: test_acc3 (test_acc3.c:26) ==28775==by 0x8049609: main (test_acc3.c:206) ==28775== Address 0x2F305F35 is not stack'd, malloc'd or (recently) free'd I don't know what to make of these. 
Here is the link to the full results: http://www.open-mpi.org/mtt/index.php?do_redir=386 Thanks, Tim On Friday 21 September 2007 10:40:21 am George Bosilca wrote: > Tim, > > Valgrind will not help ... It can help with double free or things > like this, but not with over-running memory that belong to your > application. However, in Open MPI we have something that might help > you. The option --enable-mem-debug add a unused space at the end of > each memory allocation and make sure we don't write anything there. I > think this is the simplest way to pinpoint this problem. > >Thanks, > george. > > On Sep 21, 2007, at 10:07 AM, Tim Prins wrote: > > Aurelien and Brian. > > > > Thanks for the suggestions. I reran the runs with --without-memory- > > manager and > > got (on 2 of 5000 runs): > > *** glibc detected *** corrupted double-linked list: 0xf704dff8 *** > > on one and > > *** glibc detected *** malloc(): memory corruption: 0xeda00c70 *** > > on the other. > > > > So it looks like somewhere we are over-running our allocated space. > > So now I > > am attempting to redo the run with valgrind. > > > > Tim > > > > On Thursday 20 September 2007 09:59:14 pm Brian Barrett wrote: > >> On Sep 20, 2007, at 7:02 AM, Tim Prins wrote: > >>> In our nightly runs with the trunk I have started seeing cases > >>> where we > >>> appear to be segfaulting within/below malloc. Below is a typical > >>> output. > >>> > >>> Note that this appears to only happen on the trunk, when we use > >>> openib, > >>> and are in 32 bit mode. It seems to happen randomly at a very low > >>> frequency (59 out of about 60,000 32 bit openib runs). > >>> > >>> This could be a problem with our machine, and has showed up since I > >>> started testing 32bit ofed 10 days ago. > >>> > >>> Anyways, just curious if anyone had any ideas. > >> > >> As someone else said, this usually points to a duplicate free or the > >> like in malloc. 
You might want to try compiling with >> --without-memory-manager, as the ptmalloc2 in glibc frequently is more verbose >> about where errors occurred than is the one in Open MPI. > >> > >> Brian > >> ___ > >> devel mailing list > >> de...@open-mpi.org > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] Malloc segfaulting?
Hi folks, In our nightly runs with the trunk I have started seeing cases where we appear to be segfaulting within/below malloc. Below is a typical output. Note that this appears to only happen on the trunk, when we use openib, and are in 32 bit mode. It seems to happen randomly at a very low frequency (59 out of about 60,000 32 bit openib runs). This could be a problem with our machine, and has showed up since I started testing 32bit ofed 10 days ago. Anyways, just curious if anyone had any ideas. Thanks, Tim -- [odin011:04084] *** Process received signal *** [odin011:04084] Signal: Segmentation fault (11) [odin011:04084] Signal code: Invalid permissions (2) [odin011:04084] Failing at address: 0xf7cbea68 [odin011:04084] [ 0] [0xe600] [odin011:04084] [ 1] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/libopen-pal.so.0(malloc+0x82) [0xf7e882d2] [odin011:04084] [ 2] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/libopen-rte.so.0(orte_hash_table_set_proc+0xfa) [0xf7ec57aa] [odin011:04084] [ 3] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_lookup+0x11d) [0xf7cbcebd] [odin011:04084] [ 4] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_send_nb+0x1f) [0xf7cbfccf] [odin011:04084] [ 5] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_rml_oob.so(orte_rml_oob_send_buffer_nb+0x25a) [0xf7cddfda] [odin011:04084] [ 6] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so [0xf7c145f1] [odin011:04084] [ 7] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so [0xf7c146e9] [odin011:04084] [ 8] 
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_endpoint_send+0x345) [0xf7c0e155] [odin011:04084] [ 9] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_send+0x3e) [0xf7c0718e] [odin011:04084] [10] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_copy+0x17b) [0xf7c3c4bb] [odin011:04084] [11] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x27c) [0xf7c35adc] [odin011:04084] [12] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_gather_intra_basic_linear+0x65) [0xf7bc72a5] [odin011:04084] [13] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_gather_intra_dec_fixed+0x16a) [0xf7bba2aa] [odin011:04084] [14] /san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/libmpi.so.0(MPI_Gather+0x18c) [0xf7f62b6c] [odin011:04084] [15] src/MPI_Gather_c(main+0x5fd) [0x804a101] [odin011:04084] [16] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xf7d0fde3] [odin011:04084] [17] src/MPI_Gather_c [0x8049a81] [odin011:04084] *** End of error message ***
Re: [OMPI devel] FreeBSD Support?
This is fixed in r16164. Tim Brian Barrett wrote: On Sep 19, 2007, at 4:11 PM, Tim Prins wrote: Here is where it gets nasty. On FreeBSD, /usr/include/string.h includes strings.h in some cases. But there is a strings.h in the ompi/mpi/f77 directory, so that is getting included instead of the proper /usr/include/strings.h. I suppose we could rename our strings.h to f77_strings.h, or something similar. Does anyone have an opinion on this? I think this is the best path forward. Ugh. Brian
Re: [OMPI devel] Public tmp branches
Jeff Squyres wrote: That's fine, too. I don't really care -- /public already exists. We can simply rename it to /tmp-public. Let's do that. It should (more or less) address all concerns that have been voiced. Tim On Aug 31, 2007, at 8:52 AM, Ralph Castain wrote: Why not make /tmp-public and /tmp-private? Leave /tmp alone. Have all new branches made in one of the two new directories, and as /tmp branches are slowly whacked, we can (eventually) get rid of /tmp. I'm fine with that. If no one else objects, let's bring this up on Tuesday to make sure everyone is aware and then pick a date to rename everything (requires a global sync since it will affect anyone who has a current /tmp checkout). Or, to make life really simple, just leave /tmp alone and private. Just create a tmp-public for branches that are not private. That way, those of us with private tmp branches are unaffected, no global sync's are required, etc. Or perhaps that is -too- simple ;-) Ralph
Re: [OMPI devel] Public tmp branches
Why not make /tmp-public and /tmp-private? Leave /tmp alone. Have all new branches made in one of the two new directories, and as /tmp branches are slowly whacked, we can (eventually) get rid of /tmp. Tim Jeff Squyres (jsquyres) wrote: I thought about both of those (/tmp/private and /tmp/public), but couldn't think of a way to make them work. 1. If we do /tmp/private, we have to svn mv all the existing trees there which will annoy the developers (but is not a deal-breaker) and make /tmp publicly readable. But that makes the history of all the private branches public. 2. If we do /tmp/public, I'm not quite sure how to setup the perms in SH to do that - if we setup /tmp to be 'no read access' for * and /tmp/public to have 'read access' for *, will a non authenticated user be able to reach /tmp/private? -jms -Original Message- From: Brian Barrett [mailto:bbarr...@lanl.gov] Sent: Friday, August 17, 2007 11:51 AM Eastern Standard Time To: Open MPI Developers Subject:Re: [OMPI devel] Public tmp branches ugh, sorry, I've been busy this week and didn't see a timeout, so a response got delayed. I really don't like this format. public doesn't have any meaning to it (tmp suggests, well, it's temporary). I'd rather have /tmp/ and / tmp/private or something like that. Or /tmp/ and /tmp/public/. Either way :/. Brian On Aug 17, 2007, at 6:21 AM, Jeff Squyres wrote: > I didn't really put this in RFC format with a timeout, but no one > objected, so I have created: > > http://svn.open-mpi.org/svn/ompi/public > > Developers should feel free to use this tree for public temporary > branches. Specifically: > > - use /tmp if your branch is intended to be private > - use /public if your branch is intended to be public > > Enjoy. > > > On Aug 10, 2007, at 9:50 AM, Jeff Squyres wrote: > >> Right now all branches under /tmp are private to the OMPI core group >> (e.g., to allow unpublished academic work). 
However, there are >> definitely cases where it would be useful to allow public branches >> when there's development work that is public but not yet ready for >> the trunk. Periodically, we go and assign individual permissions to >> /tmp branches (like I just did to /tmp/vt-integration), but it would >> be easier if we had a separate tree for public "tmp" branches. >> >> Would anyone have an objection if I added /public (or any better name >> that someone can think of) for tmp-style branches, but that are open >> for read-only access to the public? >> >> -- >> Jeff Squyres >> Cisco Systems > > -- > Jeff Squyres > Cisco Systems
Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer
to do these things because we are now moving towards a tight integration of ORTE and OMPI layers - i.e., ORTE can be simplified because we can take advantage of our knowledge of what is happening on the MPI side of the equation. In order to accomplish this, however, we need to change the points-of-contact between the MPI and RTE layers, and redefine what happens at those points. If we require via the RSL that we fix those points and what happens at those points, then making these changes will either prove impossible or at least require considerable RSL code. On the other hand, if we revise the RSL to support the new ORTE/OMPI functionality, then we will have to write considerable code to make old versions of ORTE work with the new system. Again, I am not particularly concerned with supporting older versions of orte, but rather supporting different runtime systems. Also, from what I know of these changes (and perhaps I don't understand them), the proposed changes would fit into the current RSL design. Tim Hence, my concern is that we not let RSL implementation prevent us from moving forward with ORTE. The current work is required to meet scaling demands, and hopefully will resolve much of the Cray issue. I see no value in creating RSL just to support old versions of ORTE, nor for supporting ORTE development. It would be nice if we could re-evaluate this after the next ORTE version becomes solidified to see how the cost/benefit analysis has changed, and whether the RSL remains a desirable option. Ralph On 8/16/07 7:47 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: WHAT: Solicitation of feedback on the possibility of adding a runtime services layer to Open MPI to abstract out the runtime. WHY: To solidify the interface between OMPI and the runtime environment, and to allow the use of different runtime systems, including different versions of ORTE.
WHERE: Addition of a new framework to OMPI, and changes to many of the files in OMPI to funnel all runtime request through this framework. Few changes should be required in OPAL and ORTE. WHEN: Development has started in tmp/rsl, but is still in its infancy. We hope to have a working system in the next month. TIMEOUT: 8/29/07 -- Short version: I am working on creating an interface between OMPI and the runtime system. This would make a RSL framework in OMPI which all runtime services would be accessed from. Attached is a graphic depicting this. This change would be invasive to the OMPI layer. Few (if any) changes will be required of the ORTE and OPAL layers. At this point I am soliciting feedback as to whether people are supportive or not of this change both in general and for v1.3. Long version: The current model used in Open MPI assumes that one runtime system is the best for all environments. However, in many environments it may be beneficial to have specialized runtime systems. With our current system this is not easy to do. With this in mind, the idea of creating a 'runtime services layer' was hatched. This would take the form of a framework within OMPI, through which all runtime functionality would be accessed. This would allow new or different runtime systems to be used with Open MPI. Additionally, with such a system it would be possible to have multiple versions of open rte coexisting, which may facilitate development and testing. Finally, this would solidify the interface between OMPI and the runtime system, as well as provide documentation and side effects of each interface function. However, such a change would be fairly invasive to the OMPI layer, and needs a buy-in from everyone for it to be possible. Here is a summary of the changes required for the RSL (at least how it is currently envisioned): 1. Add a framework to ompi for the rsl, and a component to support orte. 2. Change ompi so that it uses the new interface. This involves: a. 
Moving runtime specific code into the orte rsl component. b. Changing the process names in ompi to an opaque object. c. change all references to orte in ompi to be to the rsl. 3. Change the configuration code so that open-rte is only linked where needed. Of course, all this would happen on a tmp branch. The design of the rsl is not solidified. I have been playing in a tmp branch (located at https://svn.open-mpi.org/svn/ompi/tmp/rsl) which everyone is welcome to look at and comment on, but be advised that things here are subject to change (I don't think it even compiles right now). There are some fairly large open questions on this, including: 1. How to handle mpirun (that is, when a user types 'mpirun', do they always get ORTE, or do they sometimes get a system specific runtime). Most likely mpirun will always use ORTE, and alternative launching programs would be used for other runtimes. 2. Whether there will be any performance implications. My guess is not, but am not quite sure of this yet. Again, I am interested in people's comments on
Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer
George Bosilca wrote: Looks like I'm the only one barely excited about this idea. The system that you describe is well known. It has been around for around 10 years, and it's called PMI. The interface you have in the tmp branch as well as the description you gave in your email are very similar to what they sketch in the following two documents: http://www-unix.mcs.anl.gov/mpi/mpich/developer/design/pmiv2draft.htm http://www-unix.mcs.anl.gov/mpi/mpich/developer/design/pmiv2.htm Yes, I am well acquainted with these documents, and the PMI did provide a lot of inspiration for the RSL. Now, there is something wrong with reinventing the wheel if there are no improvements. And so far I'm unable to notice any major improvement neither compared with PMI nor with what we have today (except maybe being able to use PMI inside Open MPI). This is true. The RSL is designed to handle exactly what we need right now. This does not mean that the interface cannot be extended later. The current RSL is a starting point. Again, my main concern is about fault tolerance. There is nothing in PMI (and nothing in RSL so far) that allows any kind of fault tolerance [And believe me, re-writing the MPICH mpirun to allow checkpoint/restart is a hassle]. I am open to any extensions that are needed. Again, the current version is designed as a starting point. Also, I have been talking a lot with Josh and the current RSL is more than enough to support checkpoint/restart as currently implemented. I would be interested in talking about any additions that are needed. Moreover, your approach seems to open the possibility of having heterogeneous RTEs (in terms of features), which in my view is definitely the wrong approach. Do you mean having different RTEs that support different features? Personally I do not see this as a horrible thing. In fact, we already deal with this problem, since different systems support different things. For instance, we support comm_spawn on most systems, but not all.
I do not understand why a user should have to use a RTE which supports every system ever imagined, and provides every possible fault-tolerant feature, when all they want is a thin RTE. Tim george. On Aug 16, 2007, at 9:47 PM, Tim Prins wrote: WHAT: Solicitation of feedback on the possibility of adding a runtime services layer to Open MPI to abstract out the runtime. WHY: To solidify the interface between OMPI and the runtime environment, and to allow the use of different runtime systems, including different versions of ORTE. WHERE: Addition of a new framework to OMPI, and changes to many of the files in OMPI to funnel all runtime request through this framework. Few changes should be required in OPAL and ORTE. WHEN: Development has started in tmp/rsl, but is still in its infancy. We hope to have a working system in the next month. TIMEOUT: 8/29/07 -- Short version: I am working on creating an interface between OMPI and the runtime system. This would make a RSL framework in OMPI which all runtime services would be accessed from. Attached is a graphic depicting this. This change would be invasive to the OMPI layer. Few (if any) changes will be required of the ORTE and OPAL layers. At this point I am soliciting feedback as to whether people are supportive or not of this change both in general and for v1.3. Long version: The current model used in Open MPI assumes that one runtime system is the best for all environments. However, in many environments it may be beneficial to have specialized runtime systems. With our current system this is not easy to do. With this in mind, the idea of creating a 'runtime services layer' was hatched. This would take the form of a framework within OMPI, through which all runtime functionality would be accessed. This would allow new or different runtime systems to be used with Open MPI. Additionally, with such a system it would be possible to have multiple versions of open rte coexisting, which may facilitate development and testing. 
Finally, this would solidify the interface between OMPI and the runtime system, as well as provide documentation and side effects of each interface function. However, such a change would be fairly invasive to the OMPI layer, and needs a buy-in from everyone for it to be possible. Here is a summary of the changes required for the RSL (at least how it is currently envisioned): 1. Add a framework to ompi for the rsl, and a component to support orte. 2. Change ompi so that it uses the new interface. This involves: a. Moving runtime specific code into the orte rsl component. b. Changing the process names in ompi to an opaque object. c. change all references to orte in ompi to be to the rsl. 3. Change the configuration code so that open-rte is only linked where needed. Of course, all this would happen on a tmp branch. The design of the rsl is not solidified. I
Re: [OMPI devel] [RFC] Runtime Services Layer
Terry, Thanks for the comments. Responses below. Terry D. Dontje wrote: I think the concept is a good idea. A few questions that come to mind: 1. Do you have a set of APIs you plan on supporting? Do you mean the RSL API? Or do you mean the APIs of alternative runtime systems? The rsl API is in https://svn.open-mpi.org/svn/ompi/tmp/rsl/ompi/mca/rsl/rsl.h As far as other runtime systems, I have not looked too much at what others support. However, I am trying to make the APIs in the RSL as generic as possible. 2. Are you planning on adding new APIs (not currently supported by ORTE)? Not in the sense of new functionality, but some of the APIs are quite different from what ORTE currently uses. 3. Do any of the ORTE replacement APIs differ in how they work? Well, every runtime does things differently. For instance, looking at the MPICH PMI interface (which is sort-of their version of the RSL), they make heavy use of a key-value space. For the RSL, I am using process attributes which are similar in concept to this, but do work slightly differently. Another difference is that the RSL exposes an out-of-band communication interface, which is not provided by the PMI. So if we used a runtime that was based on the PMI, then we would have to do our own out-of-band communication within the RSL component. 4. Will RSL change how we access information from the GPR? If not, how does this layer really separate us from ORTE? Yes, although there is already a layer of abstraction here since the GPR usage in OMPI all goes through the modex code. So what would happen with the RSL would be that the modex send/recv would be called, which would then call the process attribute send/recv code. Alternatively, the process attribute system could be called directly. The process attribute system in the RSL would then use whatever implementation-specific system it wants to exchange the data. 5. How will RSL handle OOB functionality (routing of messages)? That is up to the rsl implementation.
An out-of-band interface is provided, and it is the component's job to make sure the message is delivered. 6. How does making the process names opaque differ from how ORTE names processes? Do you still need a global namespace for a "universe"? Again, it is up to the implementation. OMPI assumes that all process names it sees uniquely identify a remote process. In this sense, a global process namespace would be needed. But if the rsl wanted to do some trickery to avoid the need for a global namespace, it probably could. I like the idea but I really wonder if this will even be half-baked in time for 1.3 (same concern as Jeff's). Understood. Tim --td Tim Prins wrote: WHAT: Solicitation of feedback on the possibility of adding a runtime services layer to Open MPI to abstract out the runtime. WHY: To solidify the interface between OMPI and the runtime environment, and to allow the use of different runtime systems, including different versions of ORTE. WHERE: Addition of a new framework to OMPI, and changes to many of the files in OMPI to funnel all runtime requests through this framework. Few changes should be required in OPAL and ORTE. WHEN: Development has started in tmp/rsl, but is still in its infancy. We hope to have a working system in the next month. TIMEOUT: 8/29/07 -- Short version: I am working on creating an interface between OMPI and the runtime system. This would make a RSL framework in OMPI which all runtime services would be accessed from. Attached is a graphic depicting this. This change would be invasive to the OMPI layer. Few (if any) changes will be required of the ORTE and OPAL layers. At this point I am soliciting feedback as to whether people are supportive or not of this change both in general and for v1.3. Long version: The current model used in Open MPI assumes that one runtime system is the best for all environments. However, in many environments it may be beneficial to have specialized runtime systems.
With our current system this is not easy to do. With this in mind, the idea of creating a 'runtime services layer' was hatched. This would take the form of a framework within OMPI, through which all runtime functionality would be accessed. This would allow new or different runtime systems to be used with Open MPI. Additionally, with such a system it would be possible to have multiple versions of open rte coexisting, which may facilitate development and testing. Finally, this would solidify the interface between OMPI and the runtime system, as well as provide documentation and side effects of each interface function. However, such a change would be fairly invasive to the OMPI layer, and needs a buy-in from everyone for it to be possible. Here is a summary of the changes required for the RSL (at least how it is currently envisioned): 1. Add a framework to ompi for the rsl, and a
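As a rough illustration of two of the ideas discussed in this thread — opaque process names and a component-provided out-of-band channel — an RSL-style interface could look something like the C sketch below. Every name here is a hypothetical stand-in for illustration, not the contents of the actual rsl.h draft in the tmp branch.

```c
#include <stddef.h>

/* OMPI would only ever hold a pointer to a name; the layout behind it
 * is the component's business, so no global (jobid, vpid) structure is
 * baked into the MPI layer. */
typedef struct rsl_process_name_t rsl_process_name_t;

typedef struct {
    /* Comparison without exposing the name's structure. */
    int (*name_compare)(const rsl_process_name_t *a,
                        const rsl_process_name_t *b);
    /* Out-of-band send: a PMI-backed component would have to implement
     * this itself, since PMI offers only a key-value space. */
    int (*oob_send)(const rsl_process_name_t *peer, int tag,
                    const void *buf, size_t len);
} rsl_module_t;
```

The point of the shape is that a component backed by a runtime with no native OOB messaging can still satisfy the interface, as long as it provides some delivery mechanism of its own.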
Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer
On Friday 17 August 2007 10:53:41 am Richard Graham wrote:
> Tim,
> This looks like a good idea, and is a good step toward componentizing the
> run-time services that are available from the MPI's perspective.
> A few comments:
> - It is a good idea to play around in a sandbox to see what may or may not
> work - otherwise we are just guessing. However, this is driven by the
> current code structure, and may or may not align with longer term plans.
> What is needed here, I believe, is a deliberate design - i.e. figure out
> where we want to go, and then see if anything in the implementation needs
> to change before it is moved over to the trunk.

This is a good point. I have tried to make the design reflect exactly what we need from a generic runtime system, rather than just copying how we currently use orte. However, there are some things currently in the RSL which are somewhat orte specific (i.e. multiple init 'stages'), but these should not interfere with other runtimes being used (since the runtimes can just use no-ops for unneeded stages).

> - We are where we are, and can't just throw it away (it could even be
> exactly what we want), so even if our "ideal" is different than our current
> state, I believe incremental change is the way to go, to preserve an
> operating code base.

Agreed. I think the implementation of the rsl can be done with minimal risk of breaking things, since it requires little (if any) change in opal and orte, and mostly minor changes in ompi.

> - I think it is way too early to talk about moving things over to the
> trunk in the next month or so, unless sufficient evaluation can be
> done in a month or so. This is not to discourage you at all, but just to
> caution against moving too fast, and then having to redo things.
> I am very supportive of this, I do believe this is the right way to go,
> unless someone else can come up with a better idea, and time to implement.
Thanks for the comments, Tim > > Thanks, > Rich > > On 8/16/07 9:47 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: > > WHAT: Solicitation of feedback on the possibility of adding a runtime > > services layer to Open MPI to abstract out the runtime. > > > > WHY: To solidify the interface between OMPI and the runtime environment, > > and to allow the use of different runtime systems, including different > > versions of ORTE. > > > > WHERE: Addition of a new framework to OMPI, and changes to many of the > > files in OMPI to funnel all runtime request through this framework. Few > > changes should be required in OPAL and ORTE. > > > > WHEN: Development has started in tmp/rsl, but is still in its infancy. We > > hope to have a working system in the next month. > > > > TIMEOUT: 8/29/07 > > > > -- > > Short version: > > > > I am working on creating an interface between OMPI and the runtime > > system. This would make a RSL framework in OMPI which all runtime > > services would be accessed from. Attached is a graphic depicting this. > > > > This change would be invasive to the OMPI layer. Few (if any) changes > > will be required of the ORTE and OPAL layers. > > > > At this point I am soliciting feedback as to whether people are > > supportive or not of this change both in general and for v1.3. > > > > > > Long version: > > > > The current model used in Open MPI assumes that one runtime system is > > the best for all environments. However, in many environments it may be > > beneficial to have specialized runtime systems. With our current system > > this is not easy to do. > > > > With this in mind, the idea of creating a 'runtime services layer' was > > hatched. This would take the form of a framework within OMPI, through > > which all runtime functionality would be accessed. This would allow new > > or different runtime systems to be used with Open MPI. 
Additionally, with > > such a system it would be possible to have multiple versions of open rte > > coexisting, which may facilitate development and testing. Finally, this > > would solidify the interface between OMPI and the runtime system, as well > > as provide documentation and side effects of each interface function. > > > > However, such a change would be fairly invasive to the OMPI layer, and > > needs a buy-in from everyone for it to be possible. > > > > Here is a summary of the changes required for the RSL (at least how it is > > currently envisioned): > > > > 1. Add a framework to ompi for the rsl, and a component to support orte. > > 2. Change ompi so that it uses the new interface. This involves: > > a. Movi
Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer
On Friday 17 August 2007 08:40:01 am Jeff Squyres wrote: > I am definitely interested to see what the RSL turns out to be; I > think it has many potential benefits. There are also some obvious > issues to be worked out (e.g., mpirun and friends). Yeah, thinking through this and talking to others, it seems like the best way to deal with this is to say that mpirun points to our default runtime (orte), and that to use any other rsl component, you have to use that system's specific launcher (could be a 'srun', or a 'mpirun-foobar', whatever the system wants to do). > > As for whether this should go in v1.3, I don't know if it's possible > to say yet -- it will depend on when RSL becomes [at least close to] > ready, what the exact schedule for v1.3 is (which we've been skittish > to define, since we're going for a feature-driven release), etc. I agree that it is impossible to say right now, but wanted to throw it out there for people to consider/think about. Tim > > On Aug 16, 2007, at 9:47 PM, Tim Prins wrote: > > WHAT: Solicitation of feedback on the possibility of adding a runtime > > services layer to Open MPI to abstract out the runtime. > > > > WHY: To solidify the interface between OMPI and the runtime > > environment, > > and to allow the use of different runtime systems, including different > > versions of ORTE. > > > > WHERE: Addition of a new framework to OMPI, and changes to many of the > > files in OMPI to funnel all runtime request through this framework. > > Few > > changes should be required in OPAL and ORTE. > > > > WHEN: Development has started in tmp/rsl, but is still in its > > infancy. We hope > > to have a working system in the next month. > > > > TIMEOUT: 8/29/07 > > > > -- > > Short version: > > > > I am working on creating an interface between OMPI and the runtime > > system. > > This would make a RSL framework in OMPI which all runtime services > > would be > > accessed from. Attached is a graphic depicting this. 
> > > > This change would be invasive to the OMPI layer. Few (if any) changes > > will be required of the ORTE and OPAL layers. > > > > At this point I am soliciting feedback as to whether people are > > supportive or not of this change both in general and for v1.3. > > > > > > Long version: > > > > The current model used in Open MPI assumes that one runtime system is > > the best for all environments. However, in many environments it may be > > beneficial to have specialized runtime systems. With our current > > system this > > is not easy to do. > > > > With this in mind, the idea of creating a 'runtime services layer' was > > hatched. This would take the form of a framework within OMPI, > > through which > > all runtime functionality would be accessed. This would allow new or > > different runtime systems to be used with Open MPI. Additionally, > > with such a > > system it would be possible to have multiple versions of open rte > > coexisting, > > which may facilitate development and testing. Finally, this would > > solidify the > > interface between OMPI and the runtime system, as well as provide > > documentation and side effects of each interface function. > > > > However, such a change would be fairly invasive to the OMPI layer, and > > needs a buy-in from everyone for it to be possible. > > > > Here is a summary of the changes required for the RSL (at least how > > it is > > currently envisioned): > > > > 1. Add a framework to ompi for the rsl, and a component to support > > orte. > > 2. Change ompi so that it uses the new interface. This involves: > > a. Moving runtime specific code into the orte rsl component. > > b. Changing the process names in ompi to an opaque object. > > c. change all references to orte in ompi to be to the rsl. > > 3. Change the configuration code so that open-rte is only linked > > where needed. > > > > Of course, all this would happen on a tmp branch. > > > > The design of the rsl is not solidified. 
I have been playing in a > > tmp branch > > (located at https://svn.open-mpi.org/svn/ompi/tmp/rsl) which > > everyone is > > welcome to look at and comment on, but be advised that things here are > > subject to change (I don't think it even compiles right now). There > > are > > some fairly large open questions on this, including: > > > > 1. How to handle mpirun (that is, when a user types 'mpirun', do they > > always get ORTE, or do they sometimes get a system specific >
Re: [OMPI devel] [OMPI svn] svn:open-mpi r15881
Jeff Squyres wrote: On Aug 16, 2007, at 11:48 AM, Tim Prins wrote:

+#define ORTE_RML_TAG_UDAPL 25
+#define ORTE_RML_TAG_OPENIB 26
+#define ORTE_RML_TAG_MVAPI 27

I think that UDAPL, OPENIB, MVAPI should not appear anywhere in the ORTE layer ...

I tend to agree with you. However, the precedent was set long ago to put all these constants in this file (i.e. there is ORTE_RML_TAG_WIREUP and ORTE_RML_TAG_SM_BACK_FILE_CREATED, which are only used in OMPI), and it makes sense to have all tags defined in one place.

I think George's point is that the names UDAPL, OPENIB, MVAPI are all specific to the OMPI layer and refer to specific components. The generic action WIREUP was probably somewhat forgivable, but SM_BACK_FILE_CREATED is probably the same kind of abstraction break as UDAPL/OPENIB/MVAPI, which is your point. So you're both right. :-) But Tim's falling back on an older (and unfortunately bad) precedent. It would be nice to not extend that bad precedent, IMHO...

I really don't care where the constants are defined, but they do need to be unique. I think it is easiest if all the constants are stored in one file, but if someone else wants to chop them up, that's fine with me. We would just have to be more careful when adding new constants to check both files. If we end up doing the runtime services layer, all the ompi tags would be defined in the RSL, and this will become moot.

True. We will need a robust tag reservation system, though, to guarantee that every process gets the same tag values (e.g., if udapl is available on some nodes but not others, will that cause openib to have different values on different nodes? And so on).

Not really. All that is needed is a list of constants (similar to the one in rml_types.h). If an rsl component doesn't like the particular constant tag values, it can do whatever it wants in its implementation, as long as a message sent on a tag is received on the same tag.

Tim
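As an aside on the "one list of constants" point above: a single shared enum makes duplicate tag values impossible by construction, since the compiler assigns each successive tag a distinct value. The names below are hypothetical stand-ins, not the actual rml_types.h contents.

```c
/* Sketch: a centralized tag list as an enum. With scattered #define
 * constants, two components can silently claim the same value; with a
 * single enum, each member automatically gets the next free value. */
enum example_rml_tag {
    EXAMPLE_RML_TAG_UDAPL = 25,   /* first explicit value */
    EXAMPLE_RML_TAG_OPENIB,       /* 26, assigned by the compiler */
    EXAMPLE_RML_TAG_MVAPI,        /* 27 */
    EXAMPLE_RML_TAG_MAX           /* keep last: the next free tag */
};
```

Adding a new tag anywhere before EXAMPLE_RML_TAG_MAX can never collide with an existing one, which is the uniqueness property Tim asks for without needing two files to cross-check.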
Re: [OMPI devel] Problem with group code
Sorry, I pushed the wrong button and sent this before it was ready.

Tim Prins wrote: Hi folks, I am running into a problem with the ibm test 'group'. I will try to explain what I think is going on, but I do not really understand the group code, so please forgive me if it is wrong...

The test creates a group based on MPI_COMM_WORLD (group1), and a group that has half the procs in group1 (newgroup). Next, all the processes do: MPI_Group_intersection(newgroup,group1,)

ompi_group_intersection figures out what procs are needed for group2, then calls ompi_group_incl, passing 'newgroup' and ''. This then calls (since I am not using sparse groups) ompi_group_incl_plist. However, ompi_group_plist assumes that the current process is a member of the passed group ('newgroup'). Thus when it calls ompi_group_peer_lookup on 'newgroup', half of the processes get garbage back, since they are not in 'newgroup'. In most cases, memory is initialized to \0 and things fall through, but we get intermittent segfaults in optimized builds.

Here is a patch to an error check which highlights the problem:

Index: group/group.h
===================================================================
--- group/group.h	(revision 15869)
+++ group/group.h	(working copy)
@@ -308,7 +308,7 @@
 static inline struct ompi_proc_t* ompi_group_peer_lookup(ompi_group_t *group, int peer_id)
 {
 #if OMPI_ENABLE_DEBUG
-    if (peer_id >= group->grp_proc_count) {
+    if (peer_id >= group->grp_proc_count || peer_id < 0) {
         opal_output(0, "ompi_group_lookup_peer: invalid peer index (%d)", peer_id);

Thanks, Tim

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] Problem with group code
Hi folks, I am running into a problem with the ibm test 'group'. I will try to explain what I think is going on, but I do not really understand the group code, so please forgive me if it is wrong... The test creates a group based on MPI_COMM_WORLD (group1), and a group that has half the procs in group1 (newgroup). Next, all the processes do: MPI_Group_intersection(newgroup,group1,) ompi_group_intersection figures out what procs are needed for group2, then calls ompi_group_incl, passing 'newgroup' and ''. This then calls (since I am not using sparse groups) ompi_group_incl_plist. However, ompi_group_plist assumes that the current process is a member of the passed group ('newgroup'). Thus when it calls ompi_group_peer_lookup on 'newgroup', half of the processes get garbage back, since they are not in 'newgroup'. In most cases, memory is initialized to \0 and things fall through, but we get intermittent segfaults in optimized builds. In r I have put in a correction to an error check which should help show this problem. Thanks, Tim
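The failure mode described above — indexing a group's proc list with a rank that is relative to a different group — can also produce negative indices, which an upper-bound-only check lets through. A standalone sketch of the strengthened check (this is a pattern illustration, not the actual ompi_group_peer_lookup code):

```c
#include <stdio.h>
#include <stddef.h>

/* Sketch: return NULL (instead of reading garbage) for any
 * out-of-range rank. Checking only peer_id >= count misses negative
 * ranks, which can arise when a rank is translated relative to a
 * group the calling process is not a member of. */
static void *peer_lookup(void **procs, int count, int peer_id)
{
    if (peer_id >= count || peer_id < 0) {
        fprintf(stderr, "invalid peer index (%d)\n", peer_id);
        return NULL;
    }
    return procs[peer_id];
}
```

With the two-sided check, a bad lookup fails loudly and deterministically rather than segfaulting only in optimized builds where memory happens not to be zeroed.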
[OMPI devel] RML tags
Hi folks, I was looking at the rml usage in ompi, and noticed that several of the btls (udapl, mvapi, and openib) use the same rml tag for their messages. My guess is that this is a mistake, but just want to ask if there is a reason for this before I correct it. Thanks, Tim
Re: [OMPI devel] [OMPI svn] svn:open-mpi r15848
This might be breaking things on odin. All our 64 bit openib mtt tests have the following output:

[odin003.cs.indiana.edu:30971] Wrong QP specification (QP 0 "P,128,256,128,16:S,1024,256,128,32:S,4096,256,128,32:S,65536,256,128,32"). Point-to-point QP get 1-5 parameters

However, on my debug build I do not get any errors. Is anyone else seeing this? Thanks, Tim

jsquy...@osl.iu.edu wrote:
Author: jsquyres
Date: 2007-08-13 17:51:05 EDT (Mon, 13 Aug 2007)
New Revision: 15848
URL: https://svn.open-mpi.org/trac/ompi/changeset/15848

Log: Change the default receive_queues value per http://www.open-mpi.org/community/lists/devel/2007/08/2100.php.

Text files modified: trunk/ompi/mca/btl/openib/btl_openib_mca.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)

Modified: trunk/ompi/mca/btl/openib/btl_openib_mca.c
==============================================================================
--- trunk/ompi/mca/btl/openib/btl_openib_mca.c	(original)
+++ trunk/ompi/mca/btl/openib/btl_openib_mca.c	2007-08-13 17:51:05 EDT (Mon, 13 Aug 2007)
@@ -477,7 +477,7 @@
     char *str;
     char **queues, **params = NULL;
     int num_pp_qps = 0, num_srq_qps = 0, qp = 0, ret = OMPI_ERROR;
-    char *default_qps = "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32";
+    char *default_qps = "P,128,256,128,16:S,1024,256,128,32:S,4096,256,128,32:S,65536,256,128,32";
     uint32_t max_qp_size, max_size_needed;

     reg_string("receive_queues",

___ svn mailing list s...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn
Re: [OMPI devel] [OMPI users] warning:regcache incompatible with malloc
Scott Atchley wrote: On Jul 10, 2007, at 3:24 PM, Tim Prins wrote: On Tuesday 10 July 2007 03:11:45 pm Scott Atchley wrote: On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote: Tim, starting with the recently released 1.2.1, it is the default. To clarify, MX_RCACHE=1 is the default. It would be good for the default to be something where there is no warning printed (i.e. 0 or 2). I see the warning on the current trunk. Tim After further discussion in-house, the warning can be avoided if -lmyriexpress is included when linking the app (i.e. if it is in mpicc when linking). We cannot do this, since we create network-agnostic executables so that users can select networks at runtime. Adding -lmyriexpress would put an artificial dependency on the Myrinet library, even if the user does not want to use it. Another clarification, the regcache does work with several replacement malloc libraries. If the user application overloads mmap(), munmap() and sbrk(), then it may or may not work. In this case, the user should use MX_RCACHE=0. This sounds to me like a lot to ask the user to do... My opinion is that if MX_RCACHE is not explicitly set by the user, Open MPI should set it to either 0 or 2 automatically. An explicit goal of Open MPI is for it to automatically do the right thing in most cases. Letting a ton of warnings be spit out at the user, in my opinion, is the wrong thing. Tim
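A common way to implement "set it automatically unless the user already chose" is setenv(3) with overwrite set to 0, called before the library that reads the variable initializes. This is only a sketch of Tim's suggestion, not what Open MPI actually does, and the choice of "0" as the default value is taken from the mail above.

```c
#include <stdlib.h>
#include <string.h>

/* Default MX_RCACHE to 0 (registration cache off) unless the user has
 * explicitly set it. With overwrite = 0, setenv is a no-op when the
 * variable already exists, so an explicit user choice always wins. */
static void default_mx_rcache(void)
{
    setenv("MX_RCACHE", "0", 0 /* do not overwrite */);
}
```

Calling this from component initialization, before the MX library reads its environment, would silence the warning for users who express no preference while leaving an explicit MX_RCACHE=1 untouched.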
Re: [OMPI devel] Multi-environment builds
Jeff Squyres wrote: 2. The "--enable-mca-no-build" option takes a comma-delimited list of components that will then not be built. Granted, this option isn't exactly intuitive, but it was the best that we could think of at the time to present a general solution for inhibiting the build of a selected list of components. Hence, "--enable-mca-no-build=pls- slurm,ras-slurm" would inhibit building the SLURM RAS and PLS components (note that the SLURM components currently do not require any additional libraries, so a) there is no corresponding --with[out]- slurm option, and b) they are usually always built). Actually, there are --with-slurm/--without-slurm options. We default to building slurm support automatically on linux and aix, but not on other platforms. Tim
Re: [OMPI devel] Ob1 segfault
Gleb Natapov wrote: On Sun, Jul 08, 2007 at 12:41:58PM -0400, Tim Prins wrote: On Sunday 08 July 2007 08:32:27 am Gleb Natapov wrote: On Fri, Jul 06, 2007 at 06:36:13PM -0400, Tim Prins wrote: While looking into another problem I ran into an issue which made ob1 segfault on me. Using gm, and running the test test_dan1 in the onesided test suite, if I limit the gm freelist by too much, I get a segfault. That is, mpirun -np 2 -mca btl gm,self -mca btl_gm_free_list_max 1024 test_dan1 works fine, but mpirun -np 2 -mca btl gm,self -mca btl_gm_free_list_max 512 test_dan1 segfaults. I cannot, unfortunately, reproduce this with openib BTL. Here is the relevant output from gdb: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 1077541088 (LWP 15600)] 0x404d81c1 in mca_pml_ob1_send_fin (proc=0x9bd9490, bml_btl=0xd323580, hdr_des=0x9e54e78, order=255 '\377', status=1) at pml_ob1.c:267 267 MCA_PML_OB1_DES_ALLOC(bml_btl, fin, order, sizeof(mca_pml_ob1_fin_hdr_t)); can you send me what's inside bml_btl? It turns out that the order of arguments to mca_pml_ob1_send_fin was wrong. I fixed this in r15304. But now we hang instead of segfault, and have both processes just looping through opal_progress. I really don't know what to look for. Any hints? Can you look in gdb at mca_pml_ob1.rdma_pending? Yeah, rank 0 has nothing on the list, and rank 1 has 48 things.
Here is the first item on the list: $7 = { super = { super = { super = { obj_magic_id = 16046253926196952813, obj_class = 0x404f5980, obj_reference_count = 1, cls_init_file_name = 0x404f30f9 "pml_ob1_sendreq.c", cls_init_lineno = 1134 }, opal_list_next = 0x8f5d680, opal_list_prev = 0x404f57c8, opal_list_item_refcount = 1, opal_list_item_belong_to = 0x404f57b0 }, registration = 0x0, ptr = 0x0 }, rdma_bml = 0x8729098, rdma_hdr = { hdr_common = { hdr_type = 8 '\b', hdr_flags = 4 '\004' }, hdr_match = { hdr_common = { hdr_type = 8 '\b', hdr_flags = 4 '\004' }, hdr_ctx = 5, hdr_src = 1, hdr_tag = 142418176, hdr_seq = 0, hdr_padding = "\000" }, hdr_rndv = { hdr_match = { hdr_common = { hdr_type = 8 '\b', hdr_flags = 4 '\004' }, hdr_ctx = 5, hdr_src = 1, hdr_tag = 142418176, hdr_seq = 0, hdr_padding = "\000" }, hdr_msg_length = 236982400, hdr_src_req = { lval = 0, ival = 0, pval = 0x0, sval = { uval = 0, lval = 0 } } }, hdr_rget = { hdr_rndv = { hdr_match = { hdr_common = { hdr_type = 8 '\b', hdr_flags = 4 '\004' }, hdr_ctx = 5, hdr_src = 1, hdr_tag = 142418176, hdr_seq = 0, hdr_padding = "\000" }, hdr_msg_length = 236982400, hdr_src_req = { lval = 0, ival = 0, pval = 0x0, sval = { uval = 0, lval = 0 } } }, hdr_seg_cnt = 1106481152, hdr_padding = "\000\000\000", hdr_des = { lval = 32768, ival = 32768, pval = 0x8000, sval = { uval = 32768, lval = 0 } }, hdr_segs = {{ seg_addr = { lval = 0, ival = 0, pval = 0x0, sval = { uval = 0, lval = 0 } }, seg_len = 0, seg_padding = "\000\000\000", seg_key = { key32 = {0, 0}, key64 = 0, key8 = "\000\000\000\000\000\000\000" } }} }, hdr_frag = { hdr_common = { hdr_type = 8 '\b', hdr_flags = 4 '\004' }, hdr_padding = "\005\000\001\000\000", hdr_frag_offset = 142418176, hdr_src_req = { lval = 236982400, ival = 236982400, pval = 0xe201080, sval = { uval = 236982400, lval = 0 } }, hdr_dst_req = { lval = 0, ival = 0, pval = 0x0, sval = { uval = 0, lval = 0 } } }, hdr_ack = { hdr_common = { hdr_type = 8 '\b', hdr_flags = 4 '\004' }, 
hdr_padding = "\005\000\001\000\000", hdr_src_req = { lval = 142418176, ival = 142418176, pval = 0x87d2100, sval = { uval = 142418176, lval = 0 } }, hdr_dst_req = { lval = 236982400, ival = 236982400, pval = 0xe201080, sval = { uval = 236982400, lval = 0 } },
Re: [OMPI devel] Ob1 segfault
On Sunday 08 July 2007 08:32:27 am Gleb Natapov wrote:
> On Fri, Jul 06, 2007 at 06:36:13PM -0400, Tim Prins wrote:
> > While looking into another problem I ran into an issue which made ob1
> > segfault on me. Using gm, and running the test test_dan1 in the onesided
> > test suite, if I limit the gm freelist by too much, I get a segfault.
> > That is,
> >
> > mpirun -np 2 -mca btl gm,self -mca btl_gm_free_list_max 1024 test_dan1
> >
> > works fine, but
> >
> > mpirun -np 2 -mca btl gm,self -mca btl_gm_free_list_max 512 test_dan1
>
> I cannot, unfortunately, reproduce this with openib BTL.
>
> > segfaults. Here is the relevant output from gdb:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 1077541088 (LWP 15600)]
> > 0x404d81c1 in mca_pml_ob1_send_fin (proc=0x9bd9490, bml_btl=0xd323580,
> > hdr_des=0x9e54e78, order=255 '\377', status=1) at pml_ob1.c:267
> > 267 MCA_PML_OB1_DES_ALLOC(bml_btl, fin, order,
> > sizeof(mca_pml_ob1_fin_hdr_t));
>
> can you send me what's inside bml_btl?

It turns out that the order of arguments to mca_pml_ob1_send_fin was wrong. I fixed this in r15304. But now we hang instead of segfault, and have both processes just looping through opal_progress. I really don't know what to look for. Any hints?
Thanks, Tim > > > (gdb) bt > > #0 0x404d81c1 in mca_pml_ob1_send_fin (proc=0x9bd9490, > > bml_btl=0xd323580, hdr_des=0x9e54e78, order=255 '�', status=1) at > > pml_ob1.c:267 #1 0x404eef7a in mca_pml_ob1_send_request_put_frag > > (frag=0xa711f00) at pml_ob1_sendreq.c:1141 > > #2 0x404d986e in mca_pml_ob1_process_pending_rdma () at pml_ob1.c:387 > > #3 0x404eed57 in mca_pml_ob1_put_completion (btl=0x9c37e38, > > ep=0x9c42c78, des=0xb62ad00, status=0) at pml_ob1_sendreq.c:1108 > > #4 0x404ff520 in mca_btl_gm_put_callback (port=0x9bec5e0, > > context=0xb62ad00, status=GM_SUCCESS) at btl_gm.c:682 > > #5 0x40512c4f in gm_handle_sent_tokens (p=0x9bec5e0, e=0x406189c0) > > at ./libgm/gm_handle_sent_tokens.c:82 > > #6 0x40517c73 in _gm_unknown (p=0x9bec5e0, e=0x406189c0) > > at ./libgm/gm_unknown.c:222 > > #7 0x405180fc in gm_unknown (p=0x9bec5e0, e=0x406189c0) > > at ./libgm/gm_unknown.c:300 > > #8 0x40502708 in mca_btl_gm_component_progress () at > > btl_gm_component.c:649 #9 0x404f6fd6 in mca_bml_r2_progress () at > > bml_r2.c:110 > > #10 0x401a51d3 in opal_progress () at runtime/opal_progress.c:201 > > #11 0x405cf864 in opal_condition_wait (c=0x9e564b8, m=0x9e56478) > > at ../../../../opal/threads/condition.h:98 > > #12 0x405cf68e in ompi_osc_pt2pt_module_fence (assert=0, win=0x9e55ec8) > > at osc_pt2pt_sync.c:142 > > #13 0x400b6ebb in PMPI_Win_fence (assert=0, win=0x9e55ec8) at > > pwin_fence.c:57 #14 0x0804a2f3 in test_bandwidth1 (nbufsize=105, > > min_iterations=10, max_iterations=1000, verbose=0) at test_dan1.c:282 > > #15 0x0804b06f in get_bandwidth (argc=0, argv=0x0) at test_dan1.c:686 > > #16 0x080512f5 in test_dan1 () at test_dan1.c:3555 > > #17 0x08051573 in main (argc=1, argv=0xbfeba9f4) at test_dan1.c:3639 > > (gdb) > > > > This is using the trunk. Any ideas? > > > > Thanks, > > > > Tim > > > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > -- > Gleb. 
> > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] ORTE registry patch
Actually, the tests are quite painful to run, since there are things in there that aren't real tests (such as spin, no-op, loop-child, etc) and I really don't know what the expected output should be. Anyways, I have made my way through these things, and I could not see any failures. This should clear the way for these changesets to be brought in. George: Do you want to bring this over? If you do, remember to also remove test/class/orte_bitmap.c Thanks, Tim Ralph H Castain wrote: Sigh...is it really so much to ask that we at least run the tests in orte/test/system and orte/test/mpi using both mpirun and singleton (where appropriate) instead of just relying on "well I ran hello_world"? That is all I have ever asked, yet it seems to be viewed as a huge impediment. Is it really that much to ask for when modifying a core part of the system? :-/ If you have done those tests, then my apology - but your note only indicates that you ran "hello_world" and are basing your recommendation *solely* on that test. On 6/6/07 7:51 AM, "Tim Prins" <tpr...@open-mpi.org> wrote: I hate to go back to this, but... The original commits also included changes to gpr_replica_dict_fn.c (r14331 and r14336). This change shows some performance improvement for me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness in the gpr. Again, this is an algorithmic change, so as the job scales the performance improvement would be more noticeable. I vote that this be put back in. On a related topic, a small memory leak was fixed in r14328, and then reverted. This change should be put back in. Tim George Bosilca wrote: Commit r14791 applies this patch to the trunk. Let me know if you encounter any kind of troubles. Thanks, george. On May 29, 2007, at 2:28 PM, Ralph Castain wrote: After some work off-list with Tim, it appears that something has been broken again on the OMPI trunk with respect to comm_spawn. It was working two weeks ago, but...sigh.
Anyway, it doesn't appear to have any bearing either way on George's patch(es), so whomever wants to commit them is welcome to do so. Thanks Ralph On 5/29/07 11:44 AM, "Ralph Castain" <r...@lanl.gov> wrote: On 5/29/07 11:02 AM, "Tim Prins" <tpr...@open-mpi.org> wrote: Well, after fixing many of the tests... Interesting - they worked fine for me. Perhaps a difference in environment. It passes all the tests except the spawn tests. However, the spawn tests are seriously broken without this patch as well, and the ibm mpi spawn tests seem to work fine. Then something is seriously wrong. The spawn tests were working as of my last commit - that is a test I religiously run. If the spawn test here doesn't work, then it is hard to understand how the mpi spawn can work since the call is identical. Let me see what's wrong first... As far as I'm concerned, this should assuage any fear of problems with these changes and they should now go in. Tim On May 29, 2007, at 11:34 AM, Ralph Castain wrote: Well, I'll be the voice of caution again... Tim: did you run all of the orte tests in the orte/test/system directory? If so, and they all run correctly, then I have no issue with doing the commit. If not, then I would ask that we not do the commit until that has been done. In running those tests, you need to run them on a multi-node system, both using mpirun and as singletons (you'll have to look at the tests to see which ones make sense in the latter case). This will ensure that we have at least some degree of coverage. Thanks Ralph On 5/29/07 9:23 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote: I'd be happy to commit the patch into the trunk. But after what happened last time, I'm more than cautious. If the community think the patch is worth having it, let me know and I'll push it in the trunk asap. Thanks, george. On May 29, 2007, at 10:56 AM, Tim Prins wrote: I think both patches should be put in immediately. 
I have done some simple testing, and with 128 nodes of odin, with 1024 processes running mpi hello, these decrease our running time from about 14.2 seconds to 10.9 seconds. This is a significant decrease, and as the scale increases there should be increasing benefit. I'd be happy to commit these changes if no one objects. Tim On May 24, 2007, at 8:39 AM, Ralph H Castain wrote: Thanks - I'll take a look at this (and the prior ones!) in the next couple of weeks when time permits and get back to you. Ralph On 5/23/07 1:11 PM, "George Bosilca" <bosi...@cs.utk.edu> wrote: Attached is another patch to the ORTE layer, more specifically the replica. The idea is to decrease the number of strcmp by using a small hash function before doing the strcmp. The has
Re: [OMPI devel] ORTE registry patch
I hate to go back to this, but... The original commits also included changes to gpr_replica_dict_fn.c (r14331 and r14336). This change shows some performance improvement for me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness in the gpr. Again, this is an algorithmic change, so as the job scales the performance improvement would be more noticeable. I vote that this be put back in. On a related topic, a small memory leak was fixed in r14328, and then reverted. This change should be put back in. Tim George Bosilca wrote: Commit r14791 applies this patch to the trunk. Let me know if you encounter any kind of troubles. Thanks, george. On May 29, 2007, at 2:28 PM, Ralph Castain wrote: After some work off-list with Tim, it appears that something has been broken again on the OMPI trunk with respect to comm_spawn. It was working two weeks ago, but...sigh. Anyway, it doesn't appear to have any bearing either way on George's patch(es), so whomever wants to commit them is welcome to do so. Thanks Ralph On 5/29/07 11:44 AM, "Ralph Castain" <r...@lanl.gov> wrote: On 5/29/07 11:02 AM, "Tim Prins" <tpr...@open-mpi.org> wrote: Well, after fixing many of the tests... Interesting - they worked fine for me. Perhaps a difference in environment. It passes all the tests except the spawn tests. However, the spawn tests are seriously broken without this patch as well, and the ibm mpi spawn tests seem to work fine. Then something is seriously wrong. The spawn tests were working as of my last commit - that is a test I religiously run. If the spawn test here doesn't work, then it is hard to understand how the mpi spawn can work since the call is identical. Let me see what's wrong first... As far as I'm concerned, this should assuage any fear of problems with these changes and they should now go in. Tim On May 29, 2007, at 11:34 AM, Ralph Castain wrote: Well, I'll be the voice of caution again... Tim: did you run all of the orte tests in the orte/test/system directory?
If so, and they all run correctly, then I have no issue with doing the commit. If not, then I would ask that we not do the commit until that has been done. In running those tests, you need to run them on a multi-node system, both using mpirun and as singletons (you'll have to look at the tests to see which ones make sense in the latter case). This will ensure that we have at least some degree of coverage. Thanks Ralph

On 5/29/07 9:23 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote: I'd be happy to commit the patch into the trunk. But after what happened last time, I'm more than cautious. If the community thinks the patch is worth having, let me know and I'll push it into the trunk asap. Thanks, george.

On May 29, 2007, at 10:56 AM, Tim Prins wrote: I think both patches should be put in immediately. I have done some simple testing: with 128 nodes of odin and 1024 processes running mpi hello, these decrease our running time from about 14.2 seconds to 10.9 seconds. This is a significant decrease, and as the scale increases there should be increasing benefit. I'd be happy to commit these changes if no one objects. Tim

On May 24, 2007, at 8:39 AM, Ralph H Castain wrote: Thanks - I'll take a look at this (and the prior ones!) in the next couple of weeks when time permits and get back to you. Ralph

On 5/23/07 1:11 PM, "George Bosilca" <bosi...@cs.utk.edu> wrote: Attached is another patch to the ORTE layer, more specifically the replica. The idea is to decrease the number of strcmp calls by using a small hash function before doing the strcmp. The hash key for each registry entry is computed when it is added to the registry. When we're doing a query, instead of comparing the 2 strings we first check if the hash keys match, and if they do match then we compare the 2 strings in order to make sure we eliminate collisions from our answers. There is some benefit in terms of performance.
It's hardly visible for a few processes, but it starts showing up as the number of processes increases. In fact, the number of strcmp calls in the trace file drastically decreases. The main reason it works well is that most of the keys start with basically the same chars (such as orte- blahblah), which turns the strcmp into a loop over a few chars. Ralph, please consider it for inclusion in the ORTE layer. Thanks, george.

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] ORTE registry patch
Well, after fixing many of the tests... It passes all the tests except the spawn tests. However, the spawn tests are seriously broken without this patch as well, and the ibm mpi spawn tests seem to work fine. As far as I'm concerned, this should assuage any fear of problems with these changes, and they should now go in. Tim

On May 29, 2007, at 11:34 AM, Ralph Castain wrote: Well, I'll be the voice of caution again... Tim: did you run all of the orte tests in the orte/test/system directory? If so, and they all run correctly, then I have no issue with doing the commit. If not, then I would ask that we not do the commit until that has been done. In running those tests, you need to run them on a multi-node system, both using mpirun and as singletons (you'll have to look at the tests to see which ones make sense in the latter case). This will ensure that we have at least some degree of coverage. Thanks Ralph

On 5/29/07 9:23 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote: I'd be happy to commit the patch into the trunk. But after what happened last time, I'm more than cautious. If the community thinks the patch is worth having, let me know and I'll push it into the trunk asap. Thanks, george.

On May 29, 2007, at 10:56 AM, Tim Prins wrote: I think both patches should be put in immediately. I have done some simple testing: with 128 nodes of odin and 1024 processes running mpi hello, these decrease our running time from about 14.2 seconds to 10.9 seconds. This is a significant decrease, and as the scale increases there should be increasing benefit. I'd be happy to commit these changes if no one objects. Tim

On May 24, 2007, at 8:39 AM, Ralph H Castain wrote: Thanks - I'll take a look at this (and the prior ones!) in the next couple of weeks when time permits and get back to you. Ralph

On 5/23/07 1:11 PM, "George Bosilca" <bosi...@cs.utk.edu> wrote: Attached is another patch to the ORTE layer, more specifically the replica.
The idea is to decrease the number of strcmp calls by using a small hash function before doing the strcmp. The hash key for each registry entry is computed when it is added to the registry. When we're doing a query, instead of comparing the 2 strings we first check if the hash keys match, and if they do match then we compare the 2 strings in order to make sure we eliminate collisions from our answers. There is some benefit in terms of performance. It's hardly visible for a few processes, but it starts showing up as the number of processes increases. In fact, the number of strcmp calls in the trace file drastically decreases. The main reason it works well is that most of the keys start with basically the same chars (such as orte- blahblah), which turns the strcmp into a loop over a few chars. Ralph, please consider it for inclusion in the ORTE layer. Thanks, george.

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] ORTE registry patch
I think both patches should be put in immediately. I have done some simple testing: with 128 nodes of odin and 1024 processes running mpi hello, these decrease our running time from about 14.2 seconds to 10.9 seconds. This is a significant decrease, and as the scale increases there should be increasing benefit. I'd be happy to commit these changes if no one objects. Tim

On May 24, 2007, at 8:39 AM, Ralph H Castain wrote: Thanks - I'll take a look at this (and the prior ones!) in the next couple of weeks when time permits and get back to you. Ralph

On 5/23/07 1:11 PM, "George Bosilca" wrote: Attached is another patch to the ORTE layer, more specifically the replica. The idea is to decrease the number of strcmp calls by using a small hash function before doing the strcmp. The hash key for each registry entry is computed when it is added to the registry. When we're doing a query, instead of comparing the 2 strings we first check if the hash keys match, and if they do match then we compare the 2 strings in order to make sure we eliminate collisions from our answers. There is some benefit in terms of performance. It's hardly visible for a few processes, but it starts showing up as the number of processes increases. In fact, the number of strcmp calls in the trace file drastically decreases. The main reason it works well is that most of the keys start with basically the same chars (such as orte- blahblah), which turns the strcmp into a loop over a few chars. Ralph, please consider it for inclusion in the ORTE layer. Thanks, george.

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
[O-MPI devel] Intel tests
Hi everyone, I have been playing around with Open MPI, using it as a test bed for another project I am working on, and have found that on the intel test suite, ompi is failing the MPI_Allreduce_user_c, MPI_Reduce_scatter_user_c, and MPI_Reduce_user_c tests (it prints something like MPITEST error (2): i=0, int value=4, expected 1, etc). Are these known errors? BTW, this is on an x86_64 linux box running 4 processes locally, running trunk svn version 8667, with no additional mca parameters set. Thanks, Tim