Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS
Hmmm...well, my bad. There does indeed appear to be something funny going on with Leopard. No idea what - it used to work fine. I haven't tested it in a while, though - I've been test building regularly on Leopard, but running on Tiger (I misspoke earlier). For now, I'm afraid you can't run on Leopard. I'll have to figure it out later when I have more time.

Ralph

> -- Forwarded Message
>> From: Aurélien Bouteiller
>> Reply-To: Open MPI Developers
>> Date: Thu, 31 Jan 2008 02:18:27 -0500
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS
>>
>> I tried using a fresh trunk; the same problem occurred. Here is the complete configure line. I am using libtool 1.5.22 from fink. Otherwise everything is standard OS X 10.5.
>>
>> $ ../trunk/configure --prefix=/Users/bouteill/ompi/build --enable-mpirun-prefix-by-default --disable-io-romio --enable-debug --enable-picky --enable-mem-debug --enable-mem-profile --enable-visibility --disable-dlopen --disable-shared --enable-static
>>
>> The error message generated by abort contains garbage (the line numbers do not match anything in the .c files, and according to gdb the failure does not occur during ns initialization). This looks like heap corruption or something just as bad.
>>
>> orterun (argc=4, argv=0xb81c) at ../../../../trunk/orte/tools/orterun/orterun.c:529
>> 529 cb_states = ORTE_PROC_STATE_TERMINATED | ORTE_PROC_STATE_AT_STG1;
>> (gdb) n
>> 530 rc = orte_rmgr.spawn_job(apps, num_apps, , 0, NULL, job_state_callback, cb_states, );
>> (gdb) n
>> 531 while (NULL != (item = opal_list_remove_first())) OBJ_RELEASE(item);
>> (gdb) n
>> ** Stepping over inlined function code. **
>> 532 OBJ_DESTRUCT();
>> (gdb) n
>> 534 if (orterun_globals.do_not_launch) {
>> (gdb) n
>> 539 OPAL_THREAD_LOCK(_globals.lock);
>> (gdb) n
>> 541 if (ORTE_SUCCESS == rc) {
>> (gdb) n
>> 542 while (!orterun_globals.exit) {
>> (gdb) n
>> 543 opal_condition_wait(_globals.cond,
>> (gdb) n
>> [grosse-pomme.local:77335] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/oob_base_init.c at line 74
>>
>> Aurelien
>>
>> Le 30 janv. 08 à 17:18, Ralph Castain a écrit :
>>
>>> Are you running on the trunk, or an earlier release?
>>>
>>> If the trunk, then I suspect you have a stale library hanging around. I build and run statically on Leopard regularly.
>>>
>>> On 1/30/08 2:54 PM, "Aurélien Bouteiller" wrote:
>>> I get a runtime error in a static build on Mac OS 10.5 (automake 1.10, autoconf 2.60, gcc-apple-darwin 4.01, libtool 1.5.22). The error does not occur in dso builds, and everything seems to work fine on Linux. Here is the error log.
>>>
>>> ~/ompi$ mpirun -np 2 NetPIPE_3.6/NPmpi
>>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/oob_base_init.c at line 74
>>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/ns/proxy/ns_proxy_component.c at line 222
>>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Error in file /SourceCache/openmpi/openmpi-5/openmpi/orte/runtime/orte_init_stage1.c at line 230
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
>>>
>>>   orte_ns_base_select failed
>>>   --> Returned value -1 instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
>>>
>>>   ompi_mpi_init: orte_init_stage1 failed
>>>   --> Returned "Error" (-1) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** before MPI was initialized
>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
Re: [OMPI devel] vt compiler warnings and errors
Ah -- I didn't notice this before -- do you have a configure script committed to SVN? If so, this could be the problem.

Whether what Tim sees happens or not will depend on the timestamps that SVN puts on configure and all of the files dependent upon configure (Makefile.in, Makefile, ...etc.) in the VT tree. If some of them have "bad" timestamps, then the dependencies in the Makefiles can end up re-running VT's configure, re-creating configure, etc.

Is there a way to get OMPI's autogen to also autogen the VT software? This would ensure one consistent set of timestamps (not dependent upon whatever timestamps SVN wrote to your filesystem for these sensitive files).

On Jan 31, 2008, at 12:36 PM, Matthias Jurenz wrote:

Hi Tim,

that seems wrong to me, too. I could not reproduce this on my computer. The VT integration comes with its own configure script, which is not created by OMPI's autogen.sh. I don't really have an idea of what's going wrong... I suppose the problem is that you are using a different version of the Autotools than I used to bootstrap VT? The VT configure script was created by the following versions of the Autotools: autoconf 2.61, automake 1.10, libtool 1.5.24. Which versions of the Autotools are you using to bootstrap Open MPI?

Matthias

On Do, 2008-01-31 at 08:09 -0500, Tim Prins wrote:

Hi Matthias,

I just noticed something else that seems odd. On a fresh checkout, I did an autogen and configure. Then I typed 'make clean'. Things seem to progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a new configure script gets run. Specifically:

[tprins@sif test]$ make clean
Making clean in otf
make[5]: Entering directory `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run automake-1.10 --gnu
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run autoconf
/bin/sh ./config.status --recheck
running CONFIG_SHELL=/bin/sh /bin/sh ./configure --with-zlib-lib=-lz --prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin --libdir=/usr/local/lib --includedir=/usr/local/include --datarootdir=/usr/local/share/vampirtrace --datadir=${prefix}/share/${PACKAGE_TARNAME} --docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null --srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions -pthread LDFLAGS= LIBS=-lnsl -lutil -lm CPPFLAGS= CFLAGS=-g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread FFLAGS= --no-create --no-recursion
checking build system type... x86_64-unknown-linux-gnu

Not sure if this is expected behavior, but it seems wrong to me.

Thanks,
Tim

Matthias Jurenz wrote:
> Hello,
>
> all three VT related errors which MTT reported should be fixed now.
>
> 516: The fix from George Bosilca this morning should work on Mac OS PPC. Thanks!
>
> 517: The compile error occurred due to a missing header include. Furthermore, the compiler warnings should also be fixed.
>
> 518: I have added a check for whether MPI I/O is available and added the corresponding VT configure option to enable/disable MPI I/O support. Therefore I used the variable "define_mpi_io" from 'ompi/mca/io/configure.m4'. Is that OK, or should I use another variable?
>
> Matthias
>
> On Di, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote:
>> I got a bunch of compiler warnings and errors with VT on the PGI compiler last night -- my mail client won't paste it in nicely. :-(
>>
>> See these MTT reports for details:
>> - On Absoft systems: http://www.open-mpi.org/mtt/index.php?do_redir=516
>> - On Cisco systems:
>>   With PGI compilers: http://www.open-mpi.org/mtt/index.php?do_redir=517
>>   With GNU compilers: http://www.open-mpi.org/mtt/index.php?do_redir=518
>>
>> The output may be a bit hard to read -- for MTT builds, we separate the stdout and stderr into 2 streams. So you kinda have to merge them in your head; sorry...

--
Matthias Jurenz,
Center for Information Services and High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] vt compiler warnings and errors
Hi Tim,

that seems wrong to me, too. I could not reproduce this on my computer. The VT integration comes with its own configure script, which is not created by OMPI's autogen.sh. I don't really have an idea of what's going wrong... I suppose the problem is that you are using a different version of the Autotools than I used to bootstrap VT? The VT configure script was created by the following versions of the Autotools: autoconf 2.61, automake 1.10, libtool 1.5.24. Which versions of the Autotools are you using to bootstrap Open MPI?

Matthias

On Do, 2008-01-31 at 08:09 -0500, Tim Prins wrote:
> Hi Matthias,
>
> I just noticed something else that seems odd. On a fresh checkout, I did an autogen and configure. Then I typed 'make clean'. Things seem to progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a new configure script gets run.
>
> Specifically:
> [tprins@sif test]$ make clean
> Making clean in otf
> make[5]: Entering directory `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
> cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run automake-1.10 --gnu
> cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run autoconf
> /bin/sh ./config.status --recheck
> running CONFIG_SHELL=/bin/sh /bin/sh ./configure --with-zlib-lib=-lz --prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin --libdir=/usr/local/lib --includedir=/usr/local/include --datarootdir=/usr/local/share/vampirtrace --datadir=${prefix}/share/${PACKAGE_TARNAME} --docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null --srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions -pthread LDFLAGS= LIBS=-lnsl -lutil -lm CPPFLAGS= CFLAGS=-g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread FFLAGS= --no-create --no-recursion
> checking build system type... x86_64-unknown-linux-gnu
>
> Not sure if this is expected behavior, but it seems wrong to me.
>
> Thanks,
> Tim
>
> Matthias Jurenz wrote:
> > Hello,
> >
> > all three VT related errors which MTT reported should be fixed now.
> >
> > 516: The fix from George Bosilca this morning should work on Mac OS PPC. Thanks!
> >
> > 517: The compile error occurred due to a missing header include. Furthermore, the compiler warnings should also be fixed.
> >
> > 518: I have added a check for whether MPI I/O is available and added the corresponding VT configure option to enable/disable MPI I/O support. Therefore I used the variable "define_mpi_io" from 'ompi/mca/io/configure.m4'. Is that OK, or should I use another variable?
> >
> > Matthias
> >
> > On Di, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote:
> >> I got a bunch of compiler warnings and errors with VT on the PGI compiler last night -- my mail client won't paste it in nicely. :-(
> >>
> >> See these MTT reports for details:
> >> - On Absoft systems: http://www.open-mpi.org/mtt/index.php?do_redir=516
> >> - On Cisco systems:
> >>   With PGI compilers: http://www.open-mpi.org/mtt/index.php?do_redir=517
> >>   With GNU compilers: http://www.open-mpi.org/mtt/index.php?do_redir=518
> >>
> >> The output may be a bit hard to read -- for MTT builds, we separate the stdout and stderr into 2 streams. So you kinda have to merge them in your head; sorry...

--
Matthias Jurenz,
Center for Information Services and High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773
Re: [OMPI devel] SnapC
So the ompi-checkpoint command connects to the Global Coordinator in the SnapC 'full' component. The Global Coordinator lives in the HNP (mpirun/orterun), as determined by the 'full' component. As a result, to start a checkpoint, ompi-checkpoint must connect to the HNP.

From a user's standpoint, they are typically running ompi-checkpoint from the same machine where they started mpirun. So it made the most sense to have these two connect to each other, especially if we ask the user to provide the PID of the mpirun process to checkpoint.

That being said, with the proper changes to 'full' (or with a new SnapC component), ompi-checkpoint could issue the checkpoint request to any process in the MPI job [orterun, orted, application processes] and have the correct things happen. I have received one request for this functionality, but have not yet had the time to dig into it.

Does that help?

Cheers,
Josh

On Jan 31, 2008, at 9:51 AM, Leonardo Fialho wrote:

Hi all (and Josh),

Why does ompi-checkpoint have to contact the HNP specifically? If I use another process to start the snapshot coordinator, apparently it works fine, no?

PS: I prefer to send this message to the list... to keep it in the history for further use...

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888 Fax: +34-93-581-2478
Re: [MTT devel] Reporter Slowness
Ok, so the script is done. It took a bit longer than I had expected, but once it finished, things sped back up ('24 hours' of data in 6 sec). There are a few more maintenance operations I want to run which will help out a bit more, but I'll push those to this weekend. Thanks for your patience, and let me know if it feels sluggish again. As of this email, things should be back to normal.

Cheers,
Josh

On Jan 30, 2008, at 5:09 PM, Josh Hursey wrote:

I've started the script running. Below is a short version, and a trilogy of the gory details. I wanted to write up the details so that if this ever happens again to us (or someone else), they can see what we did to fix it.

The Short Version:
--
The Slowness(tm) was caused by the recent shifting of data in the database to resolve the partition table problems seen earlier this month. The bad news is that it will take about 14 hours to finish. The good news is that I confirmed that this will fix the performance problem we are seeing. In the small run, this technique reduced the '24 hour' query execution time from ~40 sec back down to ~8 sec.

This may slow down client submits this evening, but should not prevent them from being able to submit. The 'DELETE' operations do not require an exclusive lock, so the 'INSERT' operations should proceed fine concurrently. The 'INSERT' operations will need to be blocked while the 'VACUUM FULL' operation is in progress, since it *does* require an exclusive lock. The 'INSERT' operations will proceed normally once this lock is released, resulting in a temporary slowdown for clients that submit during these windows of time (about 20 min or so).

The Details: Part 1: What I did earlier this week (more than you wanted to know, for posterity purposes)
--
The original problem was that the master partition tables accidentally started storing data because I forgot to load the 2008 partition tables into the database before the first of the year.
:( So we loaded the partition tables, but we still needed to move the misplaced data.

To move the misplaced data, we have to duplicate each row (so it is stored properly this time), but we also need to take care in assigning row IDs to the duplicate rows. We cannot give the dup'ed rows the same ID, or we will be unable to differentiate the original and the dup'ed row. So I created a dummy table for mpi_install/test_build/test_run to translate between the original row ID and the dup'ed row ID. I used nextval on the sequence to populate the values for the dup'ed rows in the dummy table. Now that I had the translation, I joined the dummy table with its corresponding master table (e.g. "mpi_install join mpi_install_dummy on mpi_install.mpi_install_id = mpi_install_dummy.orig_id"), and instead of selecting the original ID from the dummy table, I selected the new dup'ed ID. I inserted this selection back into the mpi_install table. (A cool little trick that PostgreSQL lets you get away with sometimes.)

Once I had duplicated all of the affected rows, I updated all references to the original IDs and set them to the duplicated IDs in the test_build/test_run tables. This removed all internal references to the original IDs and replaced them with the duplicates, so we retain the integrity of the data. Once I verified that no table references the original rows, I deleted those rows from the mpi_install/test_build/test_run tables.

The Details: Part 2: What I forgot to do:
-
When rows are deleted from PostgreSQL, the disk space used continues to be reserved for the table, and is not reclaimed unless you 'VACUUM FULL' the table. PostgreSQL does this for many good reasons, which are described in its documentation. However, in the case of the master partition tables, we want them to release all of their disk space, since we should never be storing data in these particular tables.
I did a 'VACUUM FULL' on the mpi_install and test_build tables originally, but did not do it on the test_run table, since this operation requires an exclusive lock on the table and can take a long time to finish. Furthermore, I had only completed about 1% of the deletions for test_run before I stopped the operation, choosing to wait for the weekend since it would take a long time to complete.

By only deleting part of the test_run master table (which contained about 1.2 million rows), the queries on this table slowed down considerably. The Query Planner estimated the cost of the '24 hour' query at 322,924, and it completed in about 40 seconds. I ran 'VACUUM FULL test_run' (which only vacuums the master table) and then re-ran the query. This time the Query Planner estimated the cost at 151,430, and it completed in about 8 seconds.

The Details: Part 3: What I am doing now:
-
Currently I am deleting the rest of the old rows from test_run. There are approx. 1.2 million rows, and
[OMPI devel] SnapC
Hi all (and Josh),

Why does ompi-checkpoint have to contact the HNP specifically? If I use another process to start the snapshot coordinator, apparently it works fine, no?

PS: I prefer to send this message to the list... to keep it in the history for further use...

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888 Fax: +34-93-581-2478
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307
On Wed, Jan 30, 2008 at 06:48:54PM +0100, Adrian Knoth wrote:
> > What is the real issue behind this whole discussion?
> Hanging connections.
> I'll have a look at it tomorrow.

To everybody who's interested in BTL-TCP, especially George and (to a minor degree) rhc:

I've integrated something I call "magic address selection code". See the comments in r17348. Can you check https://svn.open-mpi.org/svn/ompi/tmp-public/btl-tcp to see if it works for you? Read: multi-rail TCP, FNN, whatever is important to you?

The code is a proof of concept and could use a little tuning (if it works at all; over here, it satisfies all tests). I vaguely remember that at least Ralph doesn't like

   int a[perm_size * sizeof(int)];

where perm_size is dynamically evaluated (read: the array size is runtime dependent). There are also some large arrays; search for MAX_KERNEL_INTERFACE_INDEX. Perhaps it's better to replace them with an appropriate OMPI data structure. I don't know what fits best; you guys know the details...

So please give the code a try, and if it works, feel free to clean up whatever is necessary to make it OMPI style, or give me some pointers on what to change.

I'd like to point to Thomas' diploma thesis. The PDF explains the theory behind the code; it's like a rationale. Unfortunately, the PDF has some typos, but I guess you'll get the idea. It's a graph matching algorithm; Chapter 3 covers everything in detail: http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf

HTH

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
private: http://adi.thur.de
Re: [OMPI devel] 32 bit udapl warnings
On Thu, Jan 31, 2008 at 08:45:54AM -0500, Don Kerr wrote:
> This was brought to my attention once before, but I didn't see this message, so I just plain forgot about it. :-(
> uDAPL defines its pointers as uint64 ("typedef DAT_UINT64 DAT_VADDR"), and pval is a "void *", which is why the message comes up. If I remove the cast, I believe I get a different warning, and I just haven't stopped to think of a way around this.

dat_pointer = (DAT_VADDR)(uintptr_t)void_pointer;

This is not just a warning; this is a real bug. If the MSB of a void pointer is 1, it will be sign-extended.

> > Tim Prins wrote:
> > Hi,
> >
> > I am seeing some warnings on the trunk when compiling udapl in 32 bit mode with OFED 1.2.5.1:
> >
> > btl_udapl.c: In function 'udapl_reg_mr':
> > btl_udapl.c:95: warning: cast from pointer to integer of different size
> > btl_udapl.c: In function 'mca_btl_udapl_alloc':
> > btl_udapl.c:852: warning: cast from pointer to integer of different size
> > btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
> > btl_udapl.c:959: warning: cast from pointer to integer of different size
> > btl_udapl.c:1008: warning: cast from pointer to integer of different size
> > btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
> > btl_udapl_component.c:871: warning: cast from pointer to integer of different size
> > btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
> > btl_udapl_endpoint.c:130: warning: cast from pointer to integer of different size
> > btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
> > btl_udapl_endpoint.c:775: warning: cast from pointer to integer of different size
> > btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
> > btl_udapl_endpoint.c:864: warning: cast from pointer to integer of different size
> > btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_initialize_control_message':
> > btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of different size
> >
> > Thanks,
> > Tim

--
Gleb.
Re: [OMPI devel] 32 bit udapl warnings
This was brought to my attention once before, but I didn't see this message, so I just plain forgot about it. :-(

uDAPL defines its pointers as uint64 ("typedef DAT_UINT64 DAT_VADDR"), and pval is a "void *", which is why the message comes up. If I remove the cast, I believe I get a different warning, and I just haven't stopped to think of a way around this.

Tim Prins wrote:

Hi,

I am seeing some warnings on the trunk when compiling udapl in 32 bit mode with OFED 1.2.5.1:

btl_udapl.c: In function 'udapl_reg_mr':
btl_udapl.c:95: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_alloc':
btl_udapl.c:852: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
btl_udapl.c:959: warning: cast from pointer to integer of different size
btl_udapl.c:1008: warning: cast from pointer to integer of different size
btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
btl_udapl_component.c:871: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
btl_udapl_endpoint.c:130: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
btl_udapl_endpoint.c:775: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
btl_udapl_endpoint.c:864: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_initialize_control_message':
btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of different size

Thanks,
Tim
Re: [OMPI devel] vt compiler warnings and errors
Hi Matthias,

I just noticed something else that seems odd. On a fresh checkout, I did an autogen and configure. Then I typed 'make clean'. Things seem to progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a new configure script gets run.

Specifically:
[tprins@sif test]$ make clean
Making clean in otf
make[5]: Entering directory `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run automake-1.10 --gnu
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run autoconf
/bin/sh ./config.status --recheck
running CONFIG_SHELL=/bin/sh /bin/sh ./configure --with-zlib-lib=-lz --prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin --libdir=/usr/local/lib --includedir=/usr/local/include --datarootdir=/usr/local/share/vampirtrace --datadir=${prefix}/share/${PACKAGE_TARNAME} --docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null --srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions -pthread LDFLAGS= LIBS=-lnsl -lutil -lm CPPFLAGS= CFLAGS=-g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread FFLAGS= --no-create --no-recursion
checking build system type... x86_64-unknown-linux-gnu

Not sure if this is expected behavior, but it seems wrong to me.

Thanks,
Tim

Matthias Jurenz wrote:
> Hello,
>
> all three VT related errors which MTT reported should be fixed now.
>
> 516: The fix from George Bosilca this morning should work on Mac OS PPC. Thanks!
>
> 517: The compile error occurred due to a missing header include. Furthermore, the compiler warnings should also be fixed.
>
> 518: I have added a check for whether MPI I/O is available and added the corresponding VT configure option to enable/disable MPI I/O support. Therefore I used the variable "define_mpi_io" from 'ompi/mca/io/configure.m4'. Is that OK, or should I use another variable?
>
> Matthias
>
> On Di, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote:
>> I got a bunch of compiler warnings and errors with VT on the PGI compiler last night -- my mail client won't paste it in nicely. :-(
>>
>> See these MTT reports for details:
>> - On Absoft systems: http://www.open-mpi.org/mtt/index.php?do_redir=516
>> - On Cisco systems:
>>   With PGI compilers: http://www.open-mpi.org/mtt/index.php?do_redir=517
>>   With GNU compilers: http://www.open-mpi.org/mtt/index.php?do_redir=518
>>
>> The output may be a bit hard to read -- for MTT builds, we separate the stdout and stderr into 2 streams. So you kinda have to merge them in your head; sorry...

--
Matthias Jurenz,
Center for Information Services and High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773
[OMPI devel] 32 bit udapl warnings
Hi,

I am seeing some warnings on the trunk when compiling udapl in 32 bit mode with OFED 1.2.5.1:

btl_udapl.c: In function 'udapl_reg_mr':
btl_udapl.c:95: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_alloc':
btl_udapl.c:852: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
btl_udapl.c:959: warning: cast from pointer to integer of different size
btl_udapl.c:1008: warning: cast from pointer to integer of different size
btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
btl_udapl_component.c:871: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
btl_udapl_endpoint.c:130: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
btl_udapl_endpoint.c:775: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
btl_udapl_endpoint.c:864: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_initialize_control_message':
btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of different size

Thanks,
Tim
Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS
I tried using a fresh trunk; the same problem occurred. Here is the complete configure line. I am using libtool 1.5.22 from fink. Otherwise everything is standard OS X 10.5.

$ ../trunk/configure --prefix=/Users/bouteill/ompi/build --enable-mpirun-prefix-by-default --disable-io-romio --enable-debug --enable-picky --enable-mem-debug --enable-mem-profile --enable-visibility --disable-dlopen --disable-shared --enable-static

The error message generated by abort contains garbage (the line numbers do not match anything in the .c files, and according to gdb the failure does not occur during ns initialization). This looks like heap corruption or something just as bad.

orterun (argc=4, argv=0xb81c) at ../../../../trunk/orte/tools/orterun/orterun.c:529
529 cb_states = ORTE_PROC_STATE_TERMINATED | ORTE_PROC_STATE_AT_STG1;
(gdb) n
530 rc = orte_rmgr.spawn_job(apps, num_apps, , 0, NULL, job_state_callback, cb_states, );
(gdb) n
531 while (NULL != (item = opal_list_remove_first())) OBJ_RELEASE(item);
(gdb) n
** Stepping over inlined function code. **
532 OBJ_DESTRUCT();
(gdb) n
534 if (orterun_globals.do_not_launch) {
(gdb) n
539 OPAL_THREAD_LOCK(_globals.lock);
(gdb) n
541 if (ORTE_SUCCESS == rc) {
(gdb) n
542 while (!orterun_globals.exit) {
(gdb) n
543 opal_condition_wait(_globals.cond,
(gdb) n
[grosse-pomme.local:77335] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/oob_base_init.c at line 74

Aurelien

Le 30 janv. 08 à 17:18, Ralph Castain a écrit :

> Are you running on the trunk, or an earlier release?
>
> If the trunk, then I suspect you have a stale library hanging around. I build and run statically on Leopard regularly.
>
> On 1/30/08 2:54 PM, "Aurélien Bouteiller" wrote:
>> I get a runtime error in a static build on Mac OS 10.5 (automake 1.10, autoconf 2.60, gcc-apple-darwin 4.01, libtool 1.5.22). The error does not occur in dso builds, and everything seems to work fine on Linux. Here is the error log.
>>
>> ~/ompi$ mpirun -np 2 NetPIPE_3.6/NPmpi
>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/oob_base_init.c at line 74
>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/ns/proxy/ns_proxy_component.c at line 222
>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Error in file /SourceCache/openmpi/openmpi-5/openmpi/orte/runtime/orte_init_stage1.c at line 230
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
>>
>>   orte_ns_base_select failed
>>   --> Returned value -1 instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
>>
>>   ompi_mpi_init: orte_init_stage1 failed
>>   --> Returned "Error" (-1) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)

--
Dr. Aurélien Bouteiller
Sr. Research Associate - Innovative Computing Laboratory
Suite 350, 1122 Volunteer Boulevard
Knoxville, TN 37996
865 974 6321