[OMPI devel] cross-compile changes

2009-07-21 Thread Jeff Squyres
Note that with the DDT changes, there are a few more configure tests
that may have added or changed configure cache value names, such as
the type alignment values.


Just a heads-up for those who are cross-compiling...

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r21707

2009-07-21 Thread Jeff Squyres

Do we really want asserts here, or orte_show_help()'s?

asserts won't fire in production builds, will they?


On Jul 17, 2009, at 10:54 AM,   wrote:


Author: tdd
Date: 2009-07-17 10:54:18 EDT (Fri, 17 Jul 2009)
New Revision: 21707
URL: https://svn.open-mpi.org/trac/ompi/changeset/21707

Log:
Add asserts to catch when btl_eager_limit is smaller than the pml  
headers.


Text files modified:
   trunk/ompi/mca/pml/csum/pml_csum_sendreq.h | 2 ++
   trunk/ompi/mca/pml/dr/pml_dr_sendreq.h | 2 ++
   trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h   | 2 ++
   3 files changed, 6 insertions(+), 0 deletions(-)

Modified: trunk/ompi/mca/pml/csum/pml_csum_sendreq.h
===================================================================

--- trunk/ompi/mca/pml/csum/pml_csum_sendreq.h  (original)
+++ trunk/ompi/mca/pml/csum/pml_csum_sendreq.h  2009-07-17 10:54:18  
EDT (Fri, 17 Jul 2009)

@@ -12,6 +12,7 @@
  * Copyright (c) 2009  IBM Corporation.  All rights reserved.
  * Copyright (c) 2009  Los Alamos National Security, LLC.  All  
rights

  * reserved.
+ * Copyright (c) 2009  Sun Microsystems, Inc.  All rights  
reserved.

  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -349,6 +350,7 @@
 size_t eager_limit = btl->btl_eager_limit - sizeof(mca_pml_csum_hdr_t);

 int rc;

+assert(btl->btl_eager_limit >= sizeof(mca_pml_csum_hdr_t));
 if( OPAL_LIKELY(size <= eager_limit) ) {
 switch(sendreq->req_send.req_send_mode) {
 case MCA_PML_BASE_SEND_SYNCHRONOUS:

Modified: trunk/ompi/mca/pml/dr/pml_dr_sendreq.h
===================================================================

--- trunk/ompi/mca/pml/dr/pml_dr_sendreq.h  (original)
+++ trunk/ompi/mca/pml/dr/pml_dr_sendreq.h  2009-07-17 10:54:18  
EDT (Fri, 17 Jul 2009)

@@ -9,6 +9,7 @@
  * University of Stuttgart.  All rights  
reserved.
  * Copyright (c) 2004-2006 The Regents of the University of  
California.

  * All rights reserved.
+ * Copyright (c) 2009  Sun Microsystems, Inc.  All rights  
reserved.

  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -176,6 +177,7 @@
 sendreq->req_send.req_base.req_sequence =  
OPAL_THREAD_ADD32(&proc->send_sequence,1);  \


   \
 /* select a btl  
*/\
+assert(bml_btl->btl->btl_eager_limit >=  
sizeof(mca_pml_dr_hdr_t));\
 eager_limit = bml_btl->btl->btl_eager_limit -  
sizeof(mca_pml_dr_hdr_t);   \
 if(size <= eager_limit)  
{ \
 switch(sendreq->req_send.req_send_mode)  
{ \


Modified: trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h
===================================================================

--- trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h(original)
+++ trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h2009-07-17 10:54:18  
EDT (Fri, 17 Jul 2009)

@@ -9,6 +9,7 @@
  * University of Stuttgart.  All rights  
reserved.
  * Copyright (c) 2004-2005 The Regents of the University of  
California.

  * All rights reserved.
+ * Copyright (c) 2009  Sun Microsystems, Inc.  All rights  
reserved.

  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -348,6 +349,7 @@
 size_t eager_limit = btl->btl_eager_limit - sizeof(mca_pml_ob1_hdr_t);

 int rc;

+assert(btl->btl_eager_limit >= sizeof(mca_pml_ob1_hdr_t));
 if( OPAL_LIKELY(size <= eager_limit) ) {
 switch(sendreq->req_send.req_send_mode) {
 case MCA_PML_BASE_SEND_SYNCHRONOUS:
___
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full




--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r21707

2009-07-21 Thread Terry Dontje

Jeff Squyres wrote:

Do we really want asserts here, or orte_show_help()'s?

asserts won't fire in production builds, will they?

No, but isn't this a critical path in the code?

--td



On Jul 17, 2009, at 10:54 AM,   wrote:


Author: tdd
Date: 2009-07-17 10:54:18 EDT (Fri, 17 Jul 2009)
New Revision: 21707
URL: https://svn.open-mpi.org/trac/ompi/changeset/21707

Log:
Add asserts to catch when btl_eager_limit is smaller than the pml 
headers.


Text files modified:
   trunk/ompi/mca/pml/csum/pml_csum_sendreq.h | 2 ++
   trunk/ompi/mca/pml/dr/pml_dr_sendreq.h | 2 ++
   trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h   | 2 ++
   3 files changed, 6 insertions(+), 0 deletions(-)

Modified: trunk/ompi/mca/pml/csum/pml_csum_sendreq.h
== 


--- trunk/ompi/mca/pml/csum/pml_csum_sendreq.h  (original)
+++ trunk/ompi/mca/pml/csum/pml_csum_sendreq.h  2009-07-17 10:54:18 
EDT (Fri, 17 Jul 2009)

@@ -12,6 +12,7 @@
  * Copyright (c) 2009  IBM Corporation.  All rights reserved.
  * Copyright (c) 2009  Los Alamos National Security, LLC.  All 
rights

  * reserved.
+ * Copyright (c) 2009  Sun Microsystems, Inc.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -349,6 +350,7 @@
 size_t eager_limit = btl->btl_eager_limit - 
sizeof(mca_pml_csum_hdr_t);

 int rc;

+assert(btl->btl_eager_limit >= sizeof(mca_pml_csum_hdr_t));
 if( OPAL_LIKELY(size <= eager_limit) ) {
 switch(sendreq->req_send.req_send_mode) {
 case MCA_PML_BASE_SEND_SYNCHRONOUS:

Modified: trunk/ompi/mca/pml/dr/pml_dr_sendreq.h
== 


--- trunk/ompi/mca/pml/dr/pml_dr_sendreq.h  (original)
+++ trunk/ompi/mca/pml/dr/pml_dr_sendreq.h  2009-07-17 10:54:18 
EDT (Fri, 17 Jul 2009)

@@ -9,6 +9,7 @@
  * University of Stuttgart.  All rights 
reserved.

  * Copyright (c) 2004-2006 The Regents of the University of California.
  * All rights reserved.
+ * Copyright (c) 2009  Sun Microsystems, Inc.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -176,6 +177,7 @@
 sendreq->req_send.req_base.req_sequence = 
OPAL_THREAD_ADD32(&proc->send_sequence,1);  \
   
\
 /* select a btl 
*/\
+assert(bml_btl->btl->btl_eager_limit >= 
sizeof(mca_pml_dr_hdr_t));\
 eager_limit = bml_btl->btl->btl_eager_limit - 
sizeof(mca_pml_dr_hdr_t);   \
 if(size <= eager_limit) 
{ \
 switch(sendreq->req_send.req_send_mode) 
{ \


Modified: trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h
== 


--- trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h(original)
+++ trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h2009-07-17 10:54:18 
EDT (Fri, 17 Jul 2009)

@@ -9,6 +9,7 @@
  * University of Stuttgart.  All rights 
reserved.

  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
+ * Copyright (c) 2009  Sun Microsystems, Inc.  All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -348,6 +349,7 @@
 size_t eager_limit = btl->btl_eager_limit - 
sizeof(mca_pml_ob1_hdr_t);

 int rc;

+assert(btl->btl_eager_limit >= sizeof(mca_pml_ob1_hdr_t));
 if( OPAL_LIKELY(size <= eager_limit) ) {
 switch(sendreq->req_send.req_send_mode) {
 case MCA_PML_BASE_SEND_SYNCHRONOUS:








[OMPI devel] BTL receive callback

2009-07-21 Thread Sebastian Rinke

Hello,
I am developing a new BTL component (Open MPI v1.3.2) for a new  
3D-torus interconnect. During a simple message transfer of 16362 B  
between two nodes with MPI_Send(), MPI_Recv() I encounter the following:


The sender:
---

1. prepare_src() size: 16304 reserve: 32
   -> alloc() size: 16336
   -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
   -> send cb ()
   -> free()
4. component_progress()
   -> recv cb ()
  -> prepare_src() size: 58 reserve: 32
 -> alloc() size: 90
 -> ompi_convertor_pack(): 58
  -> free() size: 90  Send is missing !!!
5. NO PROGRESS

The receiver:
-

1. component_progress()
   -> recv cb ()
  -> alloc() size: 32
  -> send()
2. component_progress()
   -> send cb ()
   -> free() size: 32
3. component_progress() for ever !!!

The problem is that after prepare_src() for the 2nd fragment, the
sender calls free() instead of send() in its recv cb. Thus, the 2nd
fragment is not being transmitted.
As a consequence, the receiver waits for the 2nd fragment.

I have found that mca_pml_ob1_recv_frag_callback_ack() is the
corresponding recv cb. Before diving into the ob1 code,
could you tell me under which conditions this cb calls free() instead
of send(), so that I can get an idea of where to look for errors in my
BTL component?

Thank you very much in advance.

Sebastian Rinke



Re: [OMPI devel] BTL receive callback

2009-07-21 Thread George Bosilca
Based on your code the only reason I can imagine for the second send  
to never be triggered is that the request is considered completed at  
that point.


I can't imagine how the free is called without a prior send. If I look  
at the code pml_ob1_sendreq.c:1061, the free is only called when the  
send fails, but it is always preceded by a send.


Can you check the return values of the ompi_convertor_pack and  
prepare_src please?


  george.

On Jul 21, 2009, at 11:55 , Sebastian Rinke wrote:


Hello,
I am developing a new BTL component (Open MPI v1.3.2) for a new 3D- 
torus interconnect. During a simple message transfer of 16362 B  
between two nodes with MPI_Send(), MPI_Recv() I encounter the  
following:


The sender:
---

1. prepare_src() size: 16304 reserve: 32
  -> alloc() size: 16336
  -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
  -> send cb ()
  -> free()
4. component_progress()
  -> recv cb ()
 -> prepare_src() size: 58 reserve: 32
-> alloc() size: 90
-> ompi_convertor_pack(): 58
 -> free() size: 90  Send is missing !!!
5. NO PROGRESS

The receiver:
-

1. component_progress()
  -> recv cb ()
 -> alloc() size: 32
 -> send()
2. component_progress()
  -> send cb ()
  -> free() size: 32
3. component_progress() for ever !!!

The problem is that after prepare_src() for the 2nd fragment, the
sender calls free() instead of send() in its recv cb. Thus, the 2nd
fragment is not being transmitted.
As a consequence, the receiver waits for the 2nd fragment.

I have found that mca_pml_ob1_recv_frag_callback_ack() is  the
corresponding recv cb. Before diving into the ob1 code,
could you tell me under which conditions this cb calls free()  
instead of send()
so that I can get an idea of where to look for errors in my BTL  
component.


Thank you very much in advance.

Sebastian Rinke

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] BTL receive callback

2009-07-21 Thread Sebastian Rinke

Thank you for your hint. I found that prepare_src() didn't
return the correct size, i.e. it did

ompi_convertor_pack(...,&max_data);
*size = max_data;

However, after ompi_convertor_pack(), max_data == 0 thus *size == 0
and free() is called without a prior send() in pml_ob1_sendreq.c:1064

I took this order from btl_openib.c's prepare_src().
So it seems that it doesn't cause any problems there but for me it does.

Thanks for your help.
Sebastian.


Quoting George Bosilca :

Based on your code the only reason I can imagine for the second send  
to never be triggered is that the request is considered completed at  
that point.


I can't imagine how the free is called without a prior send. If I  
look at the code pml_ob1_sendreq.c:1061, the free is only called  
when the send fails, but it is always preceded by a send.


Can you check the return values of the ompi_convertor_pack and  
prepare_src please?


  george.

On Jul 21, 2009, at 11:55 , Sebastian Rinke wrote:


Hello,
I am developing a new BTL component (Open MPI v1.3.2) for a new  
3D-torus interconnect. During a simple message transfer of 16362 B  
between two nodes with MPI_Send(), MPI_Recv() I encounter the  
following:


The sender:
---

1. prepare_src() size: 16304 reserve: 32
 -> alloc() size: 16336
 -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
 -> send cb ()
 -> free()
4. component_progress()
 -> recv cb ()
-> prepare_src() size: 58 reserve: 32
   -> alloc() size: 90
   -> ompi_convertor_pack(): 58
-> free() size: 90  Send is missing !!!
5. NO PROGRESS

The receiver:
-

1. component_progress()
 -> recv cb ()
-> alloc() size: 32
-> send()
2. component_progress()
 -> send cb ()
 -> free() size: 32
3. component_progress() for ever !!!

The problem is that after prepare_src() for the 2nd fragment, the
sender calls free() instead of send() in its recv cb. Thus, the 2nd
fragment is not being transmitted.
As a consequence, the receiver waits for the 2nd fragment.

I have found that mca_pml_ob1_recv_frag_callback_ack() is  the
corresponding recv cb. Before diving into the ob1 code,
could you tell me under which conditions this cb calls free()  
instead of send()

so that I can get an idea of where to look for errors in my BTL component.

Thank you very much in advance.

Sebastian Rinke









[OMPI devel] lotsa errors in new autodetect component

2009-07-21 Thread Jeff Squyres

This is on Linux with a very recent kernel (2.6.30), gcc 4.3.3:

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include
-I../../../../orte/include -I../../../../ompi/include
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../..
-I/users/jsquyres -g -Wall -Wundef -Wno-long-long -Wsign-compare
-Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic
-Wno-long-double -Werror-implicit-function-declaration
-finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden
-MT opal_installdirs_autodetect_component.lo -MD -MP
-MF .deps/opal_installdirs_autodetect_component.Tpo
-c opal_installdirs_autodetect_component.c  -fPIC -DPIC
-o .libs/opal_installdirs_autodetect_component.o

opal_installdirs_autodetect_component.c: In function ‘whatis’:
opal_installdirs_autodetect_component.c:73: warning: comparison  
between signed and unsigned

opal_installdirs_autodetect_component.c: At top level:
opal_installdirs_autodetect_component.c:112: error: static declaration  
of ‘opal_installdirs_autodetect_open’ follows non-static declaration
opal_installdirs_autodetect_component.c:29: error: previous  
declaration of ‘opal_installdirs_autodetect_open’ was here
opal_installdirs_autodetect_component.c: In function  
‘opal_installdirs_autodetect_open’:
opal_installdirs_autodetect_component.c:141: warning: passing argument  
1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:146: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:148: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:151: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:153: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:156: warning: passing argument  
1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:157: warning: passing argument  
1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:161: warning: passing argument  
1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:163: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:164: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:165: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:166: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:167: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:168: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:169: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:170: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:171: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:172: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:173: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:174: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:175: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:176: warning: assignment  
discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:177: warning: assignment  
discards qualifiers from pointer target type


--
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723

2009-07-21 Thread Ralph Castain
This commit appears to have broken the build system for Mac OSX. Could  
you please fix it - since it only supports Solaris and Linux, how  
about setting it so it continues to work in other environments??


Thanks
Ralph

On Jul 21, 2009, at 2:19 PM, i...@osl.iu.edu wrote:


Author: igb
Date: 2009-07-21 16:19:38 EDT (Tue, 21 Jul 2009)
New Revision: 21723
URL: https://svn.open-mpi.org/trac/ompi/changeset/21723

Log:
Added autodetect installdirs component.  Currently supports Solaris  
and Linux.


* Installation directories will be inferred from the actual location
 of the shared library that contains the component.

* OPAL_PREFIX and other environment variables allow users to override
 the inferred directories.  They should no longer be necessary in
 most cases, though.

* Any directories that cannot be inferred will fall back to whatever
 is provided by the config installdirs component.


Added:
  trunk/opal/mca/installdirs/autodetect/
  trunk/opal/mca/installdirs/autodetect/Makefile.am
  trunk/opal/mca/installdirs/autodetect/configure.m4
  trunk/opal/mca/installdirs/autodetect/configure.params
  trunk/opal/mca/installdirs/autodetect/opal_installdirs_autodetect_component.c

  trunk/opal/mca/installdirs/autodetect/opal_installdirs_backtrace.c
  trunk/opal/mca/installdirs/autodetect/opal_installdirs_linux.c
  trunk/opal/mca/installdirs/autodetect/opal_installdirs_solaris.c
  trunk/opal/mca/installdirs/autodetect/opal_installdirs_walkcontext.c
Text files modified:
  trunk/AUTHORS                                                 |   1
  trunk/NEWS                                                    |   7
  trunk/opal/mca/installdirs/base/installdirs_base_components.c | 112 +-
  trunk/opal/mca/installdirs/base/installdirs_base_expand.c     | 306 ---

  4 files changed, 328 insertions(+), 98 deletions(-)


Diff not shown due to size (39001 bytes).
To see the diff, run the following command:

svn diff -r 21722:21723 --no-diff-deleted

___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn




[OMPI devel] autodetect broken

2009-07-21 Thread Jeff Squyres
I'm quite confused about what this component did to the base  
functions.  I haven't had a chance to digest it properly, but it  
"feels wrong"...  Iain -- can you please explain the workings of this  
component and its interactions with the base?


Also, it seems broken:

[15:31] svbu-mpi:~/svn/ompi4 % ompi_info | grep installd
--
Sorry!  You were supposed to get help about:
developer warning: field too long
But I couldn't open the help file:
/${datadir}/openmpi/help-ompi_info.txt: No such file or  
directory.  Sorry!

--
 MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4)
 MCA installdirs: autodetect (MCA v2.0, API v2.0, Component  
v1.4)

 MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4)
[15:31] svbu-mpi:~/svn/ompi4 %

The help file should have been found.  This is on Linux RHEL4, but I  
doubt it's a Linux-version-specific issue...


I'm going to .ompi_ignore this component because 3 other people have  
complained to me in the last 15 minutes on IM that it breaks things  
for them.


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723

2009-07-21 Thread Iain Bason


On Jul 21, 2009, at 6:31 PM, Ralph Castain wrote:

This commit appears to have broken the build system for Mac OSX.  
Could you please fix it - since it only supports Solaris and Linux,  
how about setting it so it continues to work in other environments??


That was the intent of the configure.m4 script in that directory.  It  
is supposed to check for the existence of some files in /proc, which  
should not exist on a Mac.  Could you send me the relevant portion of  
the config.log on Mac OSX?


Iain



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Iain Bason


On Jul 21, 2009, at 6:34 PM, Jeff Squyres wrote:

I'm quite confused about what this component did to the base  
functions.  I haven't had a chance to digest it properly, but it  
"feels wrong"...  Iain -- can you please explain the workings of  
this component and its interactions with the base?


The autodetect component gets loaded after the environment component,  
and before the config component.  So environment variables like  
OPAL_PREFIX will override it.


When it loads, it finds the directory containing libopen-pal.so
(assuming that is where the autodetect component actually is) and sets
its install_dirs_data.libdir to that.  The other fields of
install_dirs_data are set to "${infer-libdir}".  So when the base
component loads autodetect, and no environment variables have set any
of the fields, opal_install_dirs.everything_except_libdir is set to
"${infer-libdir}".


(If the autodetect component is statically linked into an application,  
then it will set bindir rather than libdir.)


The base component looks for fields set to "${infer-foo}", and calls
opal_install_dirs_infer to figure out what the field should be.  For
example, if opal_install_dirs.prefix is set to "${infer-libdir}", then
it calls
opal_install_dirs_infer("prefix", "libdir}", 6, &component->install_dirs_data).


Opal_install_dirs_infer expands everything in
component->install_dirs_data.libdir *except* "${prefix}".  Let's say
that ompi was configured so that libdir is "${prefix}/lib", and the
actual path to libopen-pal.so is /usr/local/lib/libopen-pal.so.  The
autodetect component will have set opal_install_dirs.libdir to
"/usr/local/lib".  It matches the tail of "${prefix}/lib" to
"/usr/local/lib", and infers that the remainder must be the prefix, so
it sets opal_install_dirs.prefix to "/usr/local".


Other directories (e.g., pkgdatadir) presumably cannot be inferred  
from libdir, and opal_install_dirs_infer will return NULL.  The config  
component will then load some value into that field, and things will  
work as they did before.


Iain



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723

2009-07-21 Thread Jeff Squyres
FWIW, it seems to compile ok for me on Leopard (i.e., autodetect  
disables itself):


--- MCA component installdirs:autodetect (m4 configuration macro)
checking for MCA component installdirs:autodetect compile mode... static
checking procfs.h usability... no
checking procfs.h presence... no
checking for procfs.h... no
checking for /proc/self/maps... no
checking if MCA component installdirs:autodetect can compile... no

However, I see that autodetect configure.m4 is checking  
$backtrace_installdirs_happy -- which seems like a no-no.  The  
ordering of framework / component configure.m4 scripts is not  
guaranteed, so it's not a good idea to check the output of another  
configure.m4's macro.



On Jul 21, 2009, at 6:45 PM, Iain Bason wrote:



On Jul 21, 2009, at 6:31 PM, Ralph Castain wrote:

> This commit appears to have broken the build system for Mac OSX.
> Could you please fix it - since it only supports Solaris and Linux,
> how about setting it so it continues to work in other environments??

That was the intent of the configure.m4 script in that directory.  It
is supposed to check for the existence of some files in /proc, which
should not exist on a Mac.  Could you send me the relevant portion of
the config.log on Mac OSX?

Iain





--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Iain Bason


On Jul 21, 2009, at 6:34 PM, Jeff Squyres wrote:


Also, it seems broken:

[15:31] svbu-mpi:~/svn/ompi4 % ompi_info | grep installd
--
Sorry!  You were supposed to get help about:
   developer warning: field too long
But I couldn't open the help file:
   /${datadir}/openmpi/help-ompi_info.txt: No such file or  
directory.  Sorry!

--
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4)
MCA installdirs: autodetect (MCA v2.0, API v2.0, Component  
v1.4)

MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4)
[15:31] svbu-mpi:~/svn/ompi4 %

The help file should have been found.  This is on Linux RHEL4, but I  
doubt it's a Linux-version-specific issue...


Could you send me your configure options, and your OPAL_XXX  
environment variables?


Iain



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Jeff Squyres
Arrgh!!  Even with .ompi_ignore, everything is broken on OS X and  
Linux (perhaps this is what Ralph was referring to -- not a compile  
time problem?):


-
$ mpicc -g -Isrc   -c -o libmpitest.o libmpitest.c
Cannot open configuration file ${datadir}/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
-

$#%@#$%@#$%@#$%#$!

Given that it's happening on 2 different OS's, this is enough to  
convince me that r21723 is unfortunately borked.  I'm going to back it  
out.




On Jul 21, 2009, at 7:13 PM, Iain Bason wrote:



On Jul 21, 2009, at 6:34 PM, Jeff Squyres wrote:

> I'm quite confused about what this component did to the base
> functions.  I haven't had a chance to digest it properly, but it
> "feels wrong"...  Iain -- can you please explain the workings of
> this component and its interactions with the base?

The autodetect component gets loaded after the environment component,
and before the config component.  So environment variables like
OPAL_PREFIX will override it.

When it loads, it finds the directory containing libopen-pal.so
(assuming that is where the autodetect component actually is) and sets
its install_dirs_data.libdir to that.  The other fields of
install_dirs_data are set to "${infer-libdir}".  So when the base
component loads autodetect, and no environment variables have set any
of the fields, opal_install_dirs.everything_except_libdir is set to "$
{infer-libdir}".

(If the autodetect component is statically linked into an application,
then it will set bindir rather than libdir.)

The base component looks for fields set to "${infer-foo}", and calls
opal_install_dirs_infer to figure out what the field should be.  For
example, if opal_install_dirs.prefix is set to "${infer-libdir}", then
it calls opal_install_dirs_infer("prefix", "libdir}", 6, &component-
 >install_dirs_data).

Opal_install_dirs_infer expands everything in component-
 >install_dirs_data.libdir *except* "${prefix}".  Let's say that ompi
was configured so that libdir is "${prefix}/lib", and the actual path
to libopen-pal.so is /usr/local/lib/libopen-pal.so.  The autodetect
component will have set opal_install_dirs.libdir to "/usr/local/lib".
It matches the tail of "${prefix}/lib" to "/usr/local/lib", and infers
that the remainder must be the prefix, so it sets
opal_install_dirs.prefix to "/usr/local".

Other directories (e.g., pkgdatadir) presumably cannot be inferred
from libdir, and opal_install_dirs_infer will return NULL.  The config
component will then load some value into that field, and things will
work as they did before.

Iain





--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723

2009-07-21 Thread Iain Bason


On Jul 21, 2009, at 7:35 PM, Jeff Squyres wrote:

However, I see that autodetect configure.m4 is checking  
$backtrace_installdirs_happy -- which seems like a no-no.  The  
ordering of framework / component configure.m4 scripts is not  
guaranteed, so it's not a good idea to check the output of another  
configure.m4's macro.


Grrr, I thought I had changed all those to findpc_happy.  Well, that's  
easy enough to fix.  I don't see how it could result in the component  
being built when it isn't supposed to be, though.


Iain



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Jeff Squyres

On Jul 21, 2009, at 7:46 PM, Iain Bason wrote:


> The help file should have been found.  This is on Linux RHEL4, but I
> doubt it's a Linux-version-specific issue...

Could you send me your configure options, and your OPAL_XXX
environment variables?




  $ ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 --enable-mpirun-prefix-by-default


No OPAL_* env variables set.

Same thing happens on OS X and Linux.

If you have an immediate fix for this, that would be great --  
otherwise, please back this commit out (I said in my previous mail  
that I would back it out, but I had assumed that you were gone for the  
day.  If you're around, please make the call...).


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Iain Bason


On Jul 21, 2009, at 7:50 PM, Jeff Squyres wrote:

If you have an immediate fix for this, that would be great --  
otherwise, please back this commit out (I said in my previous mail  
that I would back it out, but I had assumed that you were gone for  
the day.  If you're around, please make the call...).


I am effectively gone for the day.  (I am managing to send the odd  
email between my kids interrupting me.)  Please do back out.  I'll be  
able to look at fixing it tomorrow.


Iain



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Iain Bason


On Jul 21, 2009, at 7:50 PM, Jeff Squyres wrote:


On Jul 21, 2009, at 7:46 PM, Iain Bason wrote:

> The help file should have been found.  This is on Linux RHEL4,  
but I

> doubt it's a Linux-version-specific issue...

Could you send me your configure options, and your OPAL_XXX
environment variables?




 $ ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 --enable-mpirun-prefix-by-default


No OPAL_* env variables set.

Same thing happens on OS X and Linux.


And does it fail when actually installed in /home/jsquyres/bogus, or  
only when installed elsewhere?


Iain



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Jeff Squyres

On Jul 21, 2009, at 7:55 PM, Iain Bason wrote:


>  $ ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 --enable-mpirun-prefix-by-default
>
> No OPAL_* env variables set.
>
> Same thing happens on OS X and Linux.

And does it fail when actually installed in /home/jsquyres/bogus, or
only when installed elsewhere?




I have done a plain "make -j 4 all; make install" -- so it resides in
/home/jsquyres/bogus.


Per your prior mail, I'll back it out.

I know Brian has some opinions about the changes in the base functions  
-- he'll be replying shortly (he's still driving home).


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Iain Bason


On Jul 21, 2009, at 7:48 PM, Jeff Squyres wrote:

Arrgh!!  Even with .ompi_ignore, everything is broken on OS X and  
Linux (perhaps this is what Ralph was referring to -- not a compile  
time problem?):


-
$ mpicc -g -Isrc   -c -o libmpitest.o libmpitest.c
Cannot open configuration file ${datadir}/openmpi/mpicc-wrapper-data.txt

Error parsing data file mpicc: Not found
-


Is this just mpicc, or is it also ompi_info and mpirun failing like  
this?  I presume the autodetect component is *not* involved, right? So  
this presumably is a problem with opal_install_dirs_expand?


Iain



Re: [OMPI devel] autodetect broken

2009-07-21 Thread Jeff Squyres

On Jul 21, 2009, at 8:01 PM, Iain Bason wrote:


> $ mpicc -g -Isrc   -c -o libmpitest.o libmpitest.c
> Cannot open configuration file ${datadir}/openmpi/mpicc-wrapper-data.txt
> Error parsing data file mpicc: Not found

Is this just mpicc, or is it also ompi_info and mpirun failing like
this?



ompi_info was -- I did not check mpirun.


I presume the autodetect component is *not* involved, right? So
this presumably is a problem with opal_install_dirs_expand?




autodetect was .ompi_ignore'd out, so it was not built/installed.

--
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] fortran MPI_COMPLEX datatype broken

2009-07-21 Thread Jeff Squyres

The extent for MPI_COMPLEX is returning 0.

This is causing many of the Intel Fortran tests to fail, because they
loop over the datatypes under test, and MPI_COMPLEX is one of those types.
Specifically, you get a floating point exception because the Intel
test computes (size / extent), and extent==0, so it's a division by 0.
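
For illustration only, here is a minimal sketch of a guard against that division. The helper name elements_per_buffer is hypothetical and is not taken from the Intel test suite or from Open MPI; it only shows how a zero extent could be reported instead of crashing. On Linux/x86, an integer division by zero is typically delivered as SIGFPE, which the shell prints as "Floating point exception" even though no floating point is involved.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many elements of the given extent fit
 * into size bytes, or -1 when the extent is not positive (i.e., the
 * datatype was evidently not set up), instead of dividing by zero. */
long elements_per_buffer(long size, long extent)
{
    if (extent <= 0) {
        return -1;  /* caller should report a broken datatype */
    }
    return size / extent;
}
```

A test loop could then skip or flag any datatype for which the helper returns -1, which would have pointed straight at MPI_COMPLEX.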


I'm not sure where this happened, but it feels like something in the
new DDT code...?


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] fortran MPI_COMPLEX datatype broken

2009-07-21 Thread Jeff Squyres

On Jul 21, 2009, at 8:44 PM, Jeff Squyres (jsquyres) wrote:


The extent for MPI_COMPLEX is returning 0.




Sorry -- I accidentally hit "send" way before I finished typing.  :-\

You can reproduce the problem with a trivial program:

-
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Aint extent;
    MPI_Init(NULL, NULL);
    MPI_Type_extent(MPI_COMPLEX, &extent);
    /* MPI_Aint may be wider than int, so cast before printing */
    printf("Got extent: %ld\n", (long) extent);
    MPI_Finalize();
    return 0;
}
-

This is an OMPI that was compiled with Fortran support.  If I break at  
MPI_Type_extent in gdb, here's what *type is:


-
(gdb) p *type
$1 = {super = {super = {obj_magic_id = 16046253926196952813,
  obj_class = 0x2a95aa0520, obj_reference_count = 1,
  cls_init_file_name = 0x2a95626ce0 "ompi_datatype_module.c",
  cls_init_lineno = 134}, flags = 63011, id = 0, bdt_used = 0, size = 0,
true_lb = 0, true_ub = 0, lb = 0, ub = 0, align = 0, nbElems = 1,
name = "OPAL_UNAVAILABLE", '\0' , desc = {length = 1,
  used = 1, desc = 0x2a95ac4640}, opt_desc = {length = 1, used = 1,
  desc = 0x2a95ac4640}, btypes = {0 }}, id = 25,
  d_f_to_c_index = 18, d_keyhash = 0x0, args = 0x0, packed_description = 0x0,
  name = "MPI_COMPLEX", '\0' }
-

The OPAL_UNAVAILABLE looks ominous...?  When I do the same thing with  
MPI_INTEGER, it doesn't say OPAL_UNAVAILABLE:


-
(gdb) p *type
$2 = {super = {super = {obj_magic_id = 16046253926196952813,
  obj_class = 0x2a95aa0520, obj_reference_count = 1,
  cls_init_file_name = 0x2a95626ce0 "ompi_datatype_module.c",
  cls_init_lineno = 131}, flags = 55094, id = 6, bdt_used = 64, size = 4,
true_lb = 0, true_ub = 4, lb = 0, ub = 4, align = 4, nbElems = 1,
name = "OPAL_INT4", '\0' , desc = {length = 1,
  used = 1, desc = 0x2a95777920}, opt_desc = {length = 1, used = 1,
  desc = 0x2a95777920}, btypes = {0, 0, 0, 0, 0, 0, 1,
  0 }}, id = 22, d_f_to_c_index = 7, d_keyhash = 0x0,
  args = 0x0, packed_description = 0x0,
  name = "MPI_INTEGER", '\0' }
-

Note that configure was happy with all the COMPLEX datatypes;  
config.out and config.log attached.  This was with gcc 3.4 on RHEL4.


--
Jeff Squyres
jsquy...@cisco.com


complex-borked.tar.bz2
Description: BZip2 compressed data


Re: [OMPI devel] fortran MPI_COMPLEX datatype broken

2009-07-21 Thread Jeff Squyres

A little more data...

ompi_datatype_module.c:442 says

#if 0 /* XXX TODO The following may be deleted, both CXX and F77/F90 complex types are statically set up */

...followed by code to initialize ompi_mpi_cplex (i.e., MPI_COMPLEX).

(another TODO!!)

But ompi_mpi_cplex is set up with:

ompi_predefined_datatype_t ompi_mpi_cplex =
    OMPI_DATATYPE_INIT_DEFER (COMPLEX, OMPI_DATATYPE_FLAG_DATA_FORTRAN |
                              OMPI_DATATYPE_FLAG_DATA_COMPLEX );


and OMPI_DATATYPE_INIT_DEFER has a comment above it saying:

/*
 * Initilization for these types is deferred until runtime.
 *
 * Using this macro implies that at this point not all informations needed
 * to fill up the datatype are known. We fill them with zeros and then later
 * when the datatype engine will be initialized we complete with the
 * correct information. This macro should be used for all composed types.
 */

So the claim in the first comment (that the complex types are statically
set up) is clearly wrong.

Presumably, ompi_mpi_cplex (and friends) *do* need to be set up
dynamically at runtime, and the code must be fixed to do so.
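
To make the failure mode concrete, here is a toy sketch of the deferred-initialization pattern described in that comment. Everything in it (struct toy_datatype, TOY_DATATYPE_INIT_DEFER, toy_datatype_complete, toy_extent) is a hypothetical stand-in and not the Open MPI datatype code; it only shows why skipping the runtime completion step leaves an extent of 0.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified datatype descriptor. */
struct toy_datatype {
    const char *name;
    size_t size;
    long lb;
    long ub;
    int initialized;
};

/* Static setup: only the name is known, everything else is zero-filled. */
#define TOY_DATATYPE_INIT_DEFER(NAME) { NAME, 0, 0, 0, 0 }

struct toy_datatype toy_complex = TOY_DATATYPE_INIT_DEFER("TOY_COMPLEX");

/* Runtime completion: fill in the fields that were left as zeros.
 * If this call is compiled out (e.g., under "#if 0"), the descriptor
 * keeps size 0 and extent 0. */
void toy_datatype_complete(struct toy_datatype *t,
                           size_t elem_size, size_t count)
{
    t->size = elem_size * count;
    t->lb = 0;
    t->ub = (long) t->size;
    t->initialized = 1;
}

/* Extent is the span between lower and upper bound. */
long toy_extent(const struct toy_datatype *t)
{
    return t->ub - t->lb;
}
```

Before toy_datatype_complete() runs, toy_extent(&toy_complex) is 0, matching the symptom seen for MPI_COMPLEX; after completing it as two floats, the extent is 2 * sizeof(float).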










--
Jeff Squyres
jsquy...@cisco.com