[OMPI devel] cross-compile changes
Note that with the DDT changes, there are a few more configure tests that may have added or changed configure cache value names, such as the type alignment values. Just a heads-up for those who are cross-compiling... -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r21707
Do we really want asserts here, or orte_show_help()'s? asserts won't fire in production builds, will they?

On Jul 17, 2009, at 10:54 AM, wrote:

Author: tdd
Date: 2009-07-17 10:54:18 EDT (Fri, 17 Jul 2009)
New Revision: 21707
URL: https://svn.open-mpi.org/trac/ompi/changeset/21707

Log:
Add asserts to catch when btl_eager_limit is smaller than the pml headers.

Text files modified:
   trunk/ompi/mca/pml/csum/pml_csum_sendreq.h | 2 ++
   trunk/ompi/mca/pml/dr/pml_dr_sendreq.h     | 2 ++
   trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h   | 2 ++
   3 files changed, 6 insertions(+), 0 deletions(-)

Modified: trunk/ompi/mca/pml/csum/pml_csum_sendreq.h
==============================================================================
--- trunk/ompi/mca/pml/csum/pml_csum_sendreq.h (original)
+++ trunk/ompi/mca/pml/csum/pml_csum_sendreq.h 2009-07-17 10:54:18 EDT (Fri, 17 Jul 2009)
@@ -12,6 +12,7 @@
  * Copyright (c) 2009 IBM Corporation. All rights reserved.
  * Copyright (c) 2009 Los Alamos National Security, LLC. All rights
  * reserved.
+ * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -349,6 +350,7 @@
 size_t eager_limit = btl->btl_eager_limit - sizeof(mca_pml_csum_hdr_t);
 int rc;
+assert(btl->btl_eager_limit >= sizeof(mca_pml_csum_hdr_t));
 if( OPAL_LIKELY(size <= eager_limit) ) {
 switch(sendreq->req_send.req_send_mode) {
 case MCA_PML_BASE_SEND_SYNCHRONOUS:

Modified: trunk/ompi/mca/pml/dr/pml_dr_sendreq.h
==============================================================================
--- trunk/ompi/mca/pml/dr/pml_dr_sendreq.h (original)
+++ trunk/ompi/mca/pml/dr/pml_dr_sendreq.h 2009-07-17 10:54:18 EDT (Fri, 17 Jul 2009)
@@ -9,6 +9,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2006 The Regents of the University of California.
  * All rights reserved.
+ * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -176,6 +177,7 @@
 sendreq->req_send.req_base.req_sequence = OPAL_THREAD_ADD32(&proc->send_sequence,1); \
 \
 /* select a btl */ \
+assert(bml_btl->btl->btl_eager_limit >= sizeof(mca_pml_dr_hdr_t)); \
 eager_limit = bml_btl->btl->btl_eager_limit - sizeof(mca_pml_dr_hdr_t); \
 if(size <= eager_limit) { \
 switch(sendreq->req_send.req_send_mode) { \

Modified: trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h
==============================================================================
--- trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h (original)
+++ trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.h 2009-07-17 10:54:18 EDT (Fri, 17 Jul 2009)
@@ -9,6 +9,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
+ * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -348,6 +349,7 @@
 size_t eager_limit = btl->btl_eager_limit - sizeof(mca_pml_ob1_hdr_t);
 int rc;
+assert(btl->btl_eager_limit >= sizeof(mca_pml_ob1_hdr_t));
 if( OPAL_LIKELY(size <= eager_limit) ) {
 switch(sendreq->req_send.req_send_mode) {
 case MCA_PML_BASE_SEND_SYNCHRONOUS:

___ svn-full mailing list svn-f...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn-full

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r21707
Jeff Squyres wrote: Do we really want asserts here, or orte_show_help()'s? asserts won't fire in production builds, will they?

No, but isn't this a critical path in the code? --td
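
The trade-off discussed in this thread can be sketched in plain C. assert() compiles to nothing when NDEBUG is defined, so a limit smaller than the header would go unnoticed in such a build, and because the operands are size_t the subtraction wraps around to a huge value instead of going negative. The sketch below is only an illustration under stated assumptions: example_hdr_t, payload_limit() and unchecked_payload_limit() are hypothetical names, the 32-byte header size is arbitrary, and none of this is the Open MPI code or its orte_show_help() mechanism.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a PML header; the 32-byte size is arbitrary. */
typedef struct { unsigned char bytes[32]; } example_hdr_t;

/*
 * Compute the payload room left after the header.
 * Returns 0 on success and -1 if the limit cannot hold the header.
 * Unlike assert(), this check stays in the binary when NDEBUG is defined.
 */
int payload_limit(size_t btl_eager_limit, size_t *payload)
{
    if (btl_eager_limit < sizeof(example_hdr_t)) {
        fprintf(stderr, "eager limit %lu is smaller than header size %lu\n",
                (unsigned long) btl_eager_limit,
                (unsigned long) sizeof(example_hdr_t));
        return -1;
    }
    *payload = btl_eager_limit - sizeof(example_hdr_t);
    return 0;
}

/* The unchecked subtraction: size_t arithmetic wraps around on underflow. */
size_t unchecked_payload_limit(size_t btl_eager_limit)
{
    return btl_eager_limit - sizeof(example_hdr_t);
}
```

A check like this costs one comparison per call, which is the critical-path concern raised above; doing the validation once, when the limit is first set, would avoid paying it on every send.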
[OMPI devel] BTL receive callback
Hello,

I am developing a new BTL component (Open MPI v1.3.2) for a new 3D-torus interconnect. During a simple message transfer of 16362 B between two nodes with MPI_Send(), MPI_Recv() I encounter the following:

The sender:
-----------
1. prepare_src() size: 16304 reserve: 32
   -> alloc() size: 16336
   -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
   -> send cb ()
   -> free()
4. component_progress()
   -> recv cb ()
   -> prepare_src() size: 58 reserve: 32
   -> alloc() size: 90
   -> ompi_convertor_pack(): 58
   -> free() size: 90 Send is missing !!!
5. NO PROGRESS

The receiver:
-------------
1. component_progress()
   -> recv cb ()
   -> alloc() size: 32
   -> send()
2. component_progress()
   -> send cb ()
   -> free() size: 32
3. component_progress() for ever !!!

The problem is that after prepare_src() for the 2nd fragment, the sender calls free() instead of send() in its recv cb. Thus, the 2nd fragment is not being transmitted. As a consequence, the receiver waits for the 2nd fragment.

I have found that mca_pml_ob1_recv_frag_callback_ack() is the corresponding recv cb. Before diving into the ob1 code, could you tell me under which conditions this cb calls free() instead of send() so that I can get an idea of where to look for errors in my BTL component.

Thank you very much in advance.

Sebastian Rinke
Re: [OMPI devel] BTL receive callback
Based on your code the only reason I can imagine for the second send to never be triggered is that the request is considered completed at that point. I can't imagine how the free is called without a prior send. If I look at the code pml_ob1_sendreq.c:1061, the free is only called when the send fails, but it is always preceded by a send.

Can you check the return values of the ompi_convertor_pack and prepare_src please?

george.

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] BTL receive callback
Thank you for your hint.

I found that prepare_src() didn't return the correct size, i.e. it did

ompi_convertor_pack(...,&max_data);
*size = max_data;

However, after ompi_convertor_pack(), max_data == 0, thus *size == 0, and free() is called without a prior send() in pml_ob1_sendreq.c:1064.

I took this order from btl_openib.c's prepare_src(). So it seems that it doesn't cause any problems there, but for me it does.

Thanks for your help.

Sebastian.
[OMPI devel] lotsa errors in new autodetect component
This is on Linux with a very recent kernel (2.6.30), gcc 4.3.3:

libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. -I/users/jsquyres -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Wno-long-double -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -MT opal_installdirs_autodetect_component.lo -MD -MP -MF .deps/opal_installdirs_autodetect_component.Tpo -c opal_installdirs_autodetect_component.c -fPIC -DPIC -o .libs/opal_installdirs_autodetect_component.o
opal_installdirs_autodetect_component.c: In function ‘whatis’:
opal_installdirs_autodetect_component.c:73: warning: comparison between signed and unsigned
opal_installdirs_autodetect_component.c: At top level:
opal_installdirs_autodetect_component.c:112: error: static declaration of ‘opal_installdirs_autodetect_open’ follows non-static declaration
opal_installdirs_autodetect_component.c:29: error: previous declaration of ‘opal_installdirs_autodetect_open’ was here
opal_installdirs_autodetect_component.c: In function ‘opal_installdirs_autodetect_open’:
opal_installdirs_autodetect_component.c:141: warning: passing argument 1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:146: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:148: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:151: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:153: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:156: warning: passing argument 1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:157: warning: passing argument 1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:161: warning: passing argument 1 of ‘opal_free’ discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:163: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:164: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:165: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:166: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:167: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:168: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:169: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:170: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:171: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:172: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:173: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:174: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:175: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:176: warning: assignment discards qualifiers from pointer target type
opal_installdirs_autodetect_component.c:177: warning: assignment discards qualifiers from pointer target type

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723
This commit appears to have broken the build system for Mac OSX. Could you please fix it - since it only supports Solaris and Linux, how about setting it so it continues to work in other environments?

Thanks
Ralph

On Jul 21, 2009, at 2:19 PM, i...@osl.iu.edu wrote:

Author: igb
Date: 2009-07-21 16:19:38 EDT (Tue, 21 Jul 2009)
New Revision: 21723
URL: https://svn.open-mpi.org/trac/ompi/changeset/21723

Log:
Added autodetect installdirs component. Currently supports Solaris and Linux.
* Installation directories will be inferred from the actual location of the shared library that contains the component.
* OPAL_PREFIX and other environment variables allow users to override the inferred directories. They should no longer be necessary in most cases, though.
* Any directories that cannot be inferred will fall back to whatever is provided by the config installdirs component.

Added:
   trunk/opal/mca/installdirs/autodetect/
   trunk/opal/mca/installdirs/autodetect/Makefile.am
   trunk/opal/mca/installdirs/autodetect/configure.m4
   trunk/opal/mca/installdirs/autodetect/configure.params
   trunk/opal/mca/installdirs/autodetect/opal_installdirs_autodetect_component.c
   trunk/opal/mca/installdirs/autodetect/opal_installdirs_backtrace.c
   trunk/opal/mca/installdirs/autodetect/opal_installdirs_linux.c
   trunk/opal/mca/installdirs/autodetect/opal_installdirs_solaris.c
   trunk/opal/mca/installdirs/autodetect/opal_installdirs_walkcontext.c

Text files modified:
   trunk/AUTHORS                                                 | 1
   trunk/NEWS                                                    | 7
   trunk/opal/mca/installdirs/base/installdirs_base_components.c | 112 +-
   trunk/opal/mca/installdirs/base/installdirs_base_expand.c     | 306 ---
   4 files changed, 328 insertions(+), 98 deletions(-)

Diff not shown due to size (39001 bytes). To see the diff, run the following command:

svn diff -r 21722:21723 --no-diff-deleted

___ svn mailing list s...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn
[OMPI devel] autodetect broken
I'm quite confused about what this component did to the base functions. I haven't had a chance to digest it properly, but it "feels wrong"... Iain -- can you please explain the workings of this component and its interactions with the base?

Also, it seems broken:

[15:31] svbu-mpi:~/svn/ompi4 % ompi_info | grep installd
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
    developer warning: field too long
But I couldn't open the help file:
    /${datadir}/openmpi/help-ompi_info.txt: No such file or directory. Sorry!
--------------------------------------------------------------------------
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4)
MCA installdirs: autodetect (MCA v2.0, API v2.0, Component v1.4)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4)
[15:31] svbu-mpi:~/svn/ompi4 %

The help file should have been found. This is on Linux RHEL4, but I doubt it's a Linux-version-specific issue...

I'm going to .ompi_ignore this component because 3 other people have complained to me in the last 15 minutes on IM that it breaks things for them.

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723
On Jul 21, 2009, at 6:31 PM, Ralph Castain wrote: This commit appears to have broken the build system for Mac OSX. Could you please fix it - since it only supports Solaris and Linux, how about setting it so it continues to work in other environments?? That was the intent of the configure.m4 script in that directory. It is supposed to check for the existence of some files in /proc, which should not exist on a Mac. Could you send me the relevant portion of the config.log on Mac OSX? Iain
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 6:34 PM, Jeff Squyres wrote: I'm quite confused about what this component did to the base functions. I haven't had a chance to digest it properly, but it "feels wrong"... Iain -- can you please explain the workings of this component and its interactions with the base?

The autodetect component gets loaded after the environment component, and before the config component. So environment variables like OPAL_PREFIX will override it.

When it loads, it finds the directory containing libopen-pal.so (assuming that is where the autodetect component actually is) and sets its install_dirs_data.libdir to that. The other fields of install_dirs_data are set to "${infer-libdir}". So when the base component loads autodetect, and no environment variables have set any of the fields, opal_install_dirs.everything_except_libdir is set to "${infer-libdir}". (If the autodetect component is statically linked into an application, then it will set bindir rather than libdir.)

The base component looks for fields set to "${infer-foo}", and calls opal_install_dirs_infer to figure out what the field should be. For example, if opal_install_dirs.prefix is set to "${infer-libdir}", then it calls opal_install_dirs_infer("prefix", "libdir}", 6, &component->install_dirs_data).

opal_install_dirs_infer expands everything in component->install_dirs_data.libdir *except* "${prefix}". Let's say that ompi was configured so that libdir is "${prefix}/lib", and the actual path to libopen-pal.so is /usr/local/lib/libopen-pal.so. The autodetect component will have set opal_install_dirs.libdir to "/usr/local/lib". It matches the tail of "${prefix}/lib" to "/usr/local/lib", and infers that the remainder must be the prefix, so it sets opal_install_dirs.prefix to "/usr/local".

Other directories (e.g., pkgdatadir) presumably cannot be inferred from libdir, and opal_install_dirs_infer will return NULL. The config component will then load some value into that field, and things will work as they did before.

Iain
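
The tail-matching step described above ("${prefix}/lib" against "/usr/local/lib" yields "/usr/local") can be sketched as a small standalone function. This is only an illustration of the idea, assuming the configured template begins with the literal "${prefix}"; infer_prefix() is a hypothetical name and is not the opal_install_dirs_infer() implementation, which takes different arguments.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Given a configured template such as "${prefix}/lib" and an actual
 * directory such as "/usr/local/lib", return a malloc'ed copy of the
 * inferred prefix ("/usr/local"), or NULL if nothing can be inferred.
 * The caller frees the result. infer_prefix is a hypothetical helper.
 */
char *infer_prefix(const char *tmpl, const char *actual)
{
    static const char var[] = "${prefix}";
    const size_t var_len = sizeof(var) - 1;
    const char *tail;
    size_t tail_len, actual_len, prefix_len;
    char *result;

    /* Only templates that start with the ${prefix} variable are handled. */
    if (strncmp(tmpl, var, var_len) != 0) {
        return NULL;
    }
    tail = tmpl + var_len;            /* e.g. "/lib" */
    tail_len = strlen(tail);
    actual_len = strlen(actual);

    /* The actual path must end with the template's tail. */
    if (tail_len > actual_len ||
        strcmp(actual + (actual_len - tail_len), tail) != 0) {
        return NULL;
    }

    /* Whatever precedes the matched tail is taken to be the prefix. */
    prefix_len = actual_len - tail_len;
    result = malloc(prefix_len + 1);
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, actual, prefix_len);
    result[prefix_len] = '\0';
    return result;
}
```

Returning NULL when the tail does not match mirrors the fallback described above, where a field that cannot be inferred is left for the config component to fill in.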
Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723
FWIW, it seems to compile ok for me on Leopard (i.e., autodetect disables itself):

--- MCA component installdirs:autodetect (m4 configuration macro)
checking for MCA component installdirs:autodetect compile mode... static
checking procfs.h usability... no
checking procfs.h presence... no
checking for procfs.h... no
checking for /proc/self/maps... no
checking if MCA component installdirs:autodetect can compile... no

However, I see that autodetect configure.m4 is checking $backtrace_installdirs_happy -- which seems like a no-no. The ordering of framework / component configure.m4 scripts is not guaranteed, so it's not a good idea to check the output of another configure.m4's macro.

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 6:34 PM, Jeff Squyres wrote: Also, it seems broken: [15:31] svbu-mpi:~/svn/ompi4 % ompi_info | grep installd -- Sorry! You were supposed to get help about: developer warning: field too long But I couldn't open the help file: /${datadir}/openmpi/help-ompi_info.txt: No such file or directory. Sorry! -- MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4) MCA installdirs: autodetect (MCA v2.0, API v2.0, Component v1.4) MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4) [15:31] svbu-mpi:~/svn/ompi4 % The help file should have been found. This is on Linux RHEL4, but I doubt it's a Linux-version-specific issue... Could you send me your configure options, and your OPAL_XXX environment variables? Iain
Re: [OMPI devel] autodetect broken
Arrgh!! Even with .ompi_ignore, everything is broken on OS X and Linux (perhaps this is what Ralph was referring to -- not a compile time problem?):

$ mpicc -g -Isrc -c -o libmpitest.o libmpitest.c
Cannot open configuration file ${datadir}/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found

$#%@#$%@#$%@#$%#$!

Given that it's happening on 2 different OS's, this is enough to convince me that r21723 is unfortunately borked. I'm going to back it out.

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] [OMPI svn] svn:open-mpi r21723
On Jul 21, 2009, at 7:35 PM, Jeff Squyres wrote: However, I see that autodetect configure.m4 is checking $backtrace_installdirs_happy -- which seems like a no-no. The ordering of framework / component configure.m4 scripts is not guaranteed, so it's not a good idea to check the output of another configure.m4's macro. Grrr, I thought I had changed all those to findpc_happy. Well, that's easy enough to fix. I don't see how it could result in the component being built when it isn't supposed to be, though. Iain
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 7:46 PM, Iain Bason wrote: > The help file should have been found. This is on Linux RHEL4, but I > doubt it's a Linux-version-specific issue... Could you send me your configure options, and your OPAL_XXX environment variables?

$ ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 --enable-mpirun-prefix-by-default

No OPAL_* env variables set. Same thing happens on OS X and Linux.

If you have an immediate fix for this, that would be great -- otherwise, please back this commit out (I said in my previous mail that I would back it out, but I had assumed that you were gone for the day. If you're around, please make the call...).

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 7:50 PM, Jeff Squyres wrote: If you have an immediate fix for this, that would be great -- otherwise, please back this commit out (I said in my previous mail that I would back it out, but I had assumed that you were gone for the day. If you're around, please make the call...). I am effectively gone for the day. (I am managing to send the odd email between my kids interrupting me.) Please do back out. I'll be able to look at fixing it tomorrow. Iain
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 7:50 PM, Jeff Squyres wrote: $ ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 --enable-mpirun-prefix-by-default No OPAL_* env variables set. Same thing happens on OS X and Linux.

And does it fail when actually installed in /home/jsquyres/bogus, or only when installed elsewhere?

Iain
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 7:55 PM, Iain Bason wrote: > $ ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 --enable-mpirun-prefix-by-default > > No OPAL_* env variables set. > > Same thing happens on OS X and Linux. And does it fail when actually installed in /home/jsquyres/bogus, or only when installed elsewhere?

I have done a plain "make -j 4 all; make install" -- so it resides in /home/jsquyres/bogus.

Per your prior mail, I'll back it out. I know Brian has some opinions about the changes in the base functions -- he'll be replying shortly (he's still driving home).

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 7:48 PM, Jeff Squyres wrote: Arrgh!! Even with .ompi_ignore, everything is broken on OS X and Linux (perhaps this is what Ralph was referring to -- not a compile time problem?):

$ mpicc -g -Isrc -c -o libmpitest.o libmpitest.c
Cannot open configuration file ${datadir}/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found

Is this just mpicc, or is it also ompi_info and mpirun failing like this?

I presume the autodetect component is *not* involved, right? So this presumably is a problem with opal_install_dirs_expand?

Iain
Re: [OMPI devel] autodetect broken
On Jul 21, 2009, at 8:01 PM, Iain Bason wrote: > $ mpicc -g -Isrc -c -o libmpitest.o libmpitest.c > Cannot open configuration file ${datadir}/openmpi/mpicc-wrapper-data.txt > Error parsing data file mpicc: Not found Is this just mpicc, or is it also ompi_info and mpirun failing like this?

ompi_info was -- I did not check mpirun.

I presume the autodetect component is *not* involved, right? So this presumably is a problem with opal_install_dirs_expand?

autodetect was .ompi_ignore'd out, so it was not built/installed.

-- Jeff Squyres jsquy...@cisco.com
[OMPI devel] fortran MPI_COMPLEX datatype broken
The extent for MPI_COMPLEX is returning 0. This is causing many of the Intel Fortran tests to fail, because they loop over testing types, and MPI_COMPLEX is one of those types. Specifically, you get a floating point exception because the Intel test computes (size / extent), and extent==0, so it's a division by 0. I'm not sure where this happened, but it feels like something in the new DDT code...? -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] fortran MPI_COMPLEX datatype broken
On Jul 21, 2009, at 8:44 PM, Jeff Squyres (jsquyres) wrote: The extent for MPI_COMPLEX is returning 0.

Sorry -- I accidentally hit "send" way before I finished typing. :-\

You can reproduce the problem with a trivial program:

-
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Aint extent;
    MPI_Init(NULL, NULL);
    MPI_Type_extent(MPI_COMPLEX, &extent);
    /* MPI_Aint may be wider than int, so cast to long for printing */
    printf("Got extent: %ld\n", (long) extent);
    MPI_Finalize();
    return 0;
}
-

This is an OMPI that was compiled with Fortran support. If I break at MPI_Type_extent in gdb, here's what *type is:

-
(gdb) p *type
$1 = {super = {super = {obj_magic_id = 16046253926196952813, obj_class = 0x2a95aa0520, obj_reference_count = 1, cls_init_file_name = 0x2a95626ce0 "ompi_datatype_module.c", cls_init_lineno = 134}, flags = 63011, id = 0, bdt_used = 0, size = 0, true_lb = 0, true_ub = 0, lb = 0, ub = 0, align = 0, nbElems = 1, name = "OPAL_UNAVAILABLE", '\0' , desc = {length = 1, used = 1, desc = 0x2a95ac4640}, opt_desc = {length = 1, used = 1, desc = 0x2a95ac4640}, btypes = {0 }}, id = 25, d_f_to_c_index = 18, d_keyhash = 0x0, args = 0x0, packed_description = 0x0, name = "MPI_COMPLEX", '\0' }
-

The OPAL_UNAVAILABLE looks ominous...? When I do the same thing with MPI_INTEGER, it doesn't say OPAL_UNAVAILABLE:

-
(gdb) p *type
$2 = {super = {super = {obj_magic_id = 16046253926196952813, obj_class = 0x2a95aa0520, obj_reference_count = 1, cls_init_file_name = 0x2a95626ce0 "ompi_datatype_module.c", cls_init_lineno = 131}, flags = 55094, id = 6, bdt_used = 64, size = 4, true_lb = 0, true_ub = 4, lb = 0, ub = 4, align = 4, nbElems = 1, name = "OPAL_INT4", '\0' , desc = {length = 1, used = 1, desc = 0x2a95777920}, opt_desc = {length = 1, used = 1, desc = 0x2a95777920}, btypes = {0, 0, 0, 0, 0, 0, 1, 0 }}, id = 22, d_f_to_c_index = 7, d_keyhash = 0x0, args = 0x0, packed_description = 0x0, name = "MPI_INTEGER", '\0' }
-

Note that configure was happy with all the COMPLEX datatypes; config.out and config.log attached. This was with gcc 3.4 on RHEL4.
-- 
Jeff Squyres
jsquy...@cisco.com

Attachment: complex-borked.tar.bz2 (BZip2 compressed data)
Re: [OMPI devel] fortran MPI_COMPLEX datatype broken
A little more data...

ompi_datatype_module.c:442 says

    #if 0 /* XXX TODO The following may be deleted, both CXX and F77/F90 complex types are statically set up */

...followed by code to initialize ompi_mpi_cplex (i.e., MPI_COMPLEX). (another TODO!!)

But ompi_mpi_cplex is set up with:

    ompi_predefined_datatype_t ompi_mpi_cplex = OMPI_DATATYPE_INIT_DEFER (COMPLEX, OMPI_DATATYPE_FLAG_DATA_FORTRAN | OMPI_DATATYPE_FLAG_DATA_COMPLEX );

and OMPI_DATATYPE_INIT_DEFER has a comment above it saying:

    /*
     * Initilization for these types is deferred until runtime.
     *
     * Using this macro implies that at this point not all informations needed
     * to fill up the datatype are known. We fill them with zeros and then later
     * when the datatype engine will be initialized we complete with the
     * correct information. This macro should be used for all composed types.
     */

So the first comment (the one claiming the complex types are statically set up) is clearly wrong. Presumably, ompi_mpi_cplex (and friends) *do* need to be set up dynamically at runtime, and the code must be fixed to do so.

On Jul 21, 2009, at 8:51 PM, Jeff Squyres (jsquyres) wrote:

> On Jul 21, 2009, at 8:44 PM, Jeff Squyres (jsquyres) wrote:
>
> > The extent for MPI_COMPLEX is returning 0.
>
> Sorry -- I accidentally hit "send" way before I finished typing. :-\
> [...]

-- 
Jeff Squyres
jsquy...@cisco.com