Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS

2008-01-31 Thread Ralph Castain
Hmmm...well, my bad. There does indeed appear to be something funny going on
with Leopard. No idea what - it used to work fine. I haven't tested it in
awhile though - I've been test building regularly on Leopard, but running on
Tiger (I misspoke earlier).

For now, I'm afraid you can't run on Leopard. Have to figure it out later
when I have more time.

Ralph


> -- Forwarded Message
>> From: Aurélien Bouteiller 
>> Reply-To: Open MPI Developers 
>> Date: Thu, 31 Jan 2008 02:18:27 -0500
>> To: Open MPI Developers 
>> Subject: Re: [OMPI devel] orte_ns_base_select failed: returned value -1
>> instead of ORTE_SUCCESS
>> 
>> I tried using a fresh trunk; the same problem occurred. Here is the
>> complete configure line. I am using libtool 1.5.22 from fink.
>> Otherwise everything is standard OS 10.5.
>> 
>>$ ../trunk/configure --prefix=/Users/bouteill/ompi/build --enable-
>> mpirun-prefix-by-default --disable-io-romio --enable-debug --enable-
>> picky --enable-mem-debug --enable-mem-profile --enable-visibility --
>> disable-dlopen --disable-shared --enable-static
>> 
>> The error message generated by abort contains garbage (line numbers do
>> not match anything in .c files and according to gdb the failure does
>> not occur during ns initialization). This looks like a heap corruption
>> or something as bad.
>> 
>> orterun (argc=4, argv=0xb81c) at ../../../../trunk/orte/tools/
>> orterun/orterun.c:529
>> 529 cb_states = ORTE_PROC_STATE_TERMINATED |
>> ORTE_PROC_STATE_AT_STG1;
>> (gdb) n
>> 530 rc = orte_rmgr.spawn_job(apps, num_apps, , 0, NULL,
>> job_state_callback, cb_states, );
>> (gdb) n
>> 531 while (NULL != (item = opal_list_remove_first()))
>> OBJ_RELEASE(item);
>> (gdb) n
>> ** Stepping over inlined function code. **
>> 532 OBJ_DESTRUCT();
>> (gdb) n
>> 534 if (orterun_globals.do_not_launch) {
>> (gdb) n
>> 539 OPAL_THREAD_LOCK(_globals.lock);
>> (gdb) n
>> 541 if (ORTE_SUCCESS == rc) {
>> (gdb) n
>> 542 while (!orterun_globals.exit) {
>> (gdb) n
>> 543 opal_condition_wait(_globals.cond,
>> (gdb) n
>> [grosse-pomme.local:77335] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in
>> file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/
>> oob_base_init.c at line 74
>> 
>> Aurelien
>> 
>> 
>> On Jan 30, 2008 at 17:18, Ralph Castain wrote:
>> 
>>> Are you running on the trunk, or an earlier release?
>>> 
>>> If the trunk, then I suspect you have a stale library hanging
>>> around. I
>>> build and run statically on Leopard regularly.
>>> 
>>> 
>>> On 1/30/08 2:54 PM, "Aurélien Bouteiller" 
>>> wrote:
>>> 
 I get a runtime error in static build on Mac OS 10.5 (automake 1.10,
 autoconf 2.60, gcc-apple-darwin 4.01, libtool 1.5.22).
 
 The error does not occur in dso builds, and everything seems to work
 fine on Linux.
 
 Here is the error log.
 
 ~/ompi$ mpirun -np 2 NetPIPE_3.6/NPmpi
 [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in
 file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/
 oob_base_init.c at line 74
 [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in
 file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/ns/proxy/
 ns_proxy_component.c at line 222
 [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Error in file /
 SourceCache/openmpi/openmpi-5/openmpi/orte/runtime/orte_init_stage1.c
 at line 230
 --
 It looks like orte_init failed for some reason; your parallel
 process is
 likely to abort.  There are many reasons that a parallel process can
 fail during orte_init; some of which are due to configuration or
 environment problems.  This failure appears to be an internal
 failure;
 here's some additional information (which may only be relevant to an
 Open MPI developer):
 
   orte_ns_base_select failed
   --> Returned value -1 instead of ORTE_SUCCESS
 
 --
 --
 It looks like MPI_INIT failed for some reason; your parallel
 process is
 likely to abort.  There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or
 environment
 problems.  This failure appears to be an internal failure; here's
 some
 additional information (which may only be relevant to an Open MPI
 developer):
 
   ompi_mpi_init: orte_init_stage1 failed
   --> Returned "Error" (-1) instead of "Success" (0)
 --
 *** An error occurred in MPI_Init
 *** before MPI was initialized
 *** MPI_ERRORS_ARE_FATAL (goodbye)

Re: [OMPI devel] vt compiler warnings and errors

2008-01-31 Thread Jeff Squyres
Ah -- I didn't notice this before -- do you have a configure script  
committed to SVN?  If so, this could be the problem.


Whether what Tim sees happens or not will depend on the timestamps  
that SVN puts on configure and all of the files dependent upon  
configure (Makefile.in, Makefile, etc.) in the VT tree. If some of  
them have "bad" timestamps, then the dependencies in the Makefiles can  
end up re-running VT's configure, re-creating configure, and so on.


Is there a way to get OMPI's autogen to also autogen the VT software?   
This would ensure one, consistent set of timestamps (not dependent  
upon what timestamps SVN wrote to your filesystem for these sensitive  
files).




On Jan 31, 2008, at 12:36 PM, Matthias Jurenz wrote:


Hi Tim,

That seems wrong to me, too. I could not reproduce this on my  
computer.
The VT integration comes with its own configure script, which is not  
created by OMPI's autogen.sh.
I don't really have an idea what's going wrong... I suppose the  
problem is that you are using a different version of the Autotools  
than the one I used to bootstrap VT. The VT configure script was  
created with the following versions of the Autotools:

autoconf 2.61, automake 1.10, libtool 1.5.24.

Which versions of the Autotools are you using to bootstrap Open MPI?


Matthias


On Thu, 2008-01-31 at 08:09 -0500, Tim Prins wrote:


Hi Matthias,

I just noticed something else that seems odd. On a fresh checkout,  
I did an autogen and configure. Then I typed 'make clean'. Things  
seem to progress normally, but once it gets to  
ompi/contrib/vt/vt/extlib/otf, a new configure script gets run.

Specifically:
[tprins@sif test]$ make clean

Making clean in otf
make[5]: Entering directory
`/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
  cd . && /bin/sh
/u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run
automake-1.10 --gnu
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/ 
missing

--run autoconf
/bin/sh ./config.status --recheck
running CONFIG_SHELL=/bin/sh /bin/sh ./configure  --with-zlib-lib=-lz
--prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin
--libdir=/usr/local/lib --includedir=/usr/local/include
--datarootdir=/usr/local/share/vampirtrace
--datadir=${prefix}/share/${PACKAGE_TARNAME}
--docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/ 
null
--srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline- 
functions

-pthread LDFLAGS=  LIBS=-lnsl -lutil  -lm  CPPFLAGS=  CFLAGS=-g -Wall
-Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes
-Wstrict-prototypes -Wcomment -pedantic
-Werror-implicit-function-declaration -finline-functions
-fno-strict-aliasing -pthread FFLAGS=  --no-create --no-recursion
checking build system type... x86_64-unknown-linux-gnu



Not sure if this is expected behavior, but it seems wrong to me.

Thanks,

Tim

Matthias Jurenz wrote:
> Hello,
>
> All three VT-related errors which MTT reported should be fixed now.
>
> 516:
> The fix from George Bosilca this morning should work on MacOS PPC.
> Thanks!
>
> 517:
> The compile error occurred due to a missing header include.
> Furthermore, the compiler warnings should also be fixed.
>
> 518:
> I have added a check for whether MPI I/O is available, and added the
> corresponding VT configure option to enable/disable MPI I/O support.
> For this I used the variable "define_mpi_io" from
> 'ompi/mca/io/configure.m4'. Is that o.k., or should I use another
> variable?
>
>
> Matthias
>
>
> On Tue, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote:
>> I got a bunch of compiler warnings and errors with VT on the PGI
>> compiler last night -- my mail client won't paste it in nicely.  :-(

>>
>> See these MTT reports for details:
>>
>> - On Absoft systems:
>>http://www.open-mpi.org/mtt/index.php?do_redir=516
>> - On Cisco systems:
>>With PGI compilers:
>>http://www.open-mpi.org/mtt/index.php?do_redir=517
>>With GNU compilers:
>>http://www.open-mpi.org/mtt/index.php?do_redir=518
>>
>> The output may be a bit hard to read -- for MTT builds, we separate
>> the stdout and stderr into 2 streams.  So you kinda have to merge them
>> in your head; sorry...
>>
> --
> Matthias Jurenz,
> Center for Information Services and
> High Performance Computing (ZIH), TU Dresden,
> Willersbau A106, Zellescher Weg 12, 01062 Dresden
> phone +49-351-463-31945, fax +49-351-463-37773
>
>
>  


>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773

Re: [OMPI devel] vt compiler warnings and errors

2008-01-31 Thread Matthias Jurenz
Hi Tim,

That seems wrong to me, too. I could not reproduce this on my computer.
The VT integration comes with its own configure script, which is not
created by OMPI's autogen.sh.
I don't really have an idea what's going wrong... I suppose the problem
is that you are using a different version of the Autotools than the one
I used to bootstrap VT. The VT configure script was created with the
following versions of the Autotools:

autoconf 2.61, automake 1.10, libtool 1.5.24.

Which versions of the Autotools are you using to bootstrap Open MPI?


Matthias


On Thu, 2008-01-31 at 08:09 -0500, Tim Prins wrote:

> Hi Matthias,
> 
> I just noticed something else that seems odd. On a fresh checkout, I did 
> an autogen and configure. Then I typed 'make clean'. Things seem to 
> progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a 
> new configure script gets run.
> 
> Specifically:
> [tprins@sif test]$ make clean
> 
> Making clean in otf
> make[5]: Entering directory 
> `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
>   cd . && /bin/sh 
> /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run 
> automake-1.10 --gnu
> cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing 
> --run autoconf
> /bin/sh ./config.status --recheck
> running CONFIG_SHELL=/bin/sh /bin/sh ./configure  --with-zlib-lib=-lz 
> --prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin 
> --libdir=/usr/local/lib --includedir=/usr/local/include 
> --datarootdir=/usr/local/share/vampirtrace 
> --datadir=${prefix}/share/${PACKAGE_TARNAME} 
> --docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null 
> --srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions 
> -pthread LDFLAGS=  LIBS=-lnsl -lutil  -lm  CPPFLAGS=  CFLAGS=-g -Wall 
> -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes 
> -Wstrict-prototypes -Wcomment -pedantic 
> -Werror-implicit-function-declaration -finline-functions 
> -fno-strict-aliasing -pthread FFLAGS=  --no-create --no-recursion
> checking build system type... x86_64-unknown-linux-gnu
> 
> 
> 
> Not sure if this is expected behavior, but it seems wrong to me.
> 
> Thanks,
> 
> Tim
> 
> Matthias Jurenz wrote:
> > Hello,
> > 
> > All three VT-related errors which MTT reported should be fixed now.
> > 
> > 516:
> > The fix from George Bosilca this morning should work on MacOS PPC. 
> > Thanks!
> > 
> > 517:
> > The compile error occurred due to a missing header include.
> > Furthermore, the compiler warnings should also be fixed.
> > 
> > 518:
> > I have added a check for whether MPI I/O is available, and added the
> > corresponding VT configure option to enable/disable MPI I/O support.
> > For this I used the variable "define_mpi_io" from
> > 'ompi/mca/io/configure.m4'. Is that o.k., or should I use another
> > variable?
> > 
> > 
> > Matthias
> > 
> > 
> > On Tue, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote:
> >> I got a bunch of compiler warnings and errors with VT on the PGI  
> >> compiler last night -- my mail client won't paste it in nicely.  :-(
> >>
> >> See these MTT reports for details:
> >>
> >> - On Absoft systems:
> >>http://www.open-mpi.org/mtt/index.php?do_redir=516
> >> - On Cisco systems:
> >>With PGI compilers:
> >>http://www.open-mpi.org/mtt/index.php?do_redir=517
> >>With GNU compilers:
> >>http://www.open-mpi.org/mtt/index.php?do_redir=518
> >>
> >> The output may be a bit hard to read -- for MTT builds, we separate  
> >> the stdout and stderr into 2 streams.  So you kinda have to merge them  
> >> in your head; sorry...
> >>
> > --
> > Matthias Jurenz,
> > Center for Information Services and
> > High Performance Computing (ZIH), TU Dresden,
> > Willersbau A106, Zellescher Weg 12, 01062 Dresden
> > phone +49-351-463-31945, fax +49-351-463-37773
> > 
> > 
> > 
> > 
> 
> 

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773




Re: [OMPI devel] SnapC

2008-01-31 Thread Josh Hursey
So the ompi-checkpoint command connects with the Global Coordinator in  
the SnapC 'full' component. The Global Coordinator lives in the HNP  
(mpirun/orterun) as determined by the 'full' component. As a result,  
to start a checkpoint, ompi-checkpoint must connect to the HNP.


From a user's standpoint, ompi-checkpoint is typically run from the  
same machine where mpirun was started. So it made the most  
sense to have these two connect to each other, especially if we ask  
the user to provide the PID of the mpirun process to checkpoint.


That being said, with the proper changes to 'full' (or with a new  
SnapC component), ompi-checkpoint could issue the checkpoint request  
to any process in the MPI job [orterun, orted, application processes]  
and have the correct things happen.


I have received one request for this functionality, but have not had  
the time yet to dig into it.


Does that help?

Cheers,
Josh

On Jan 31, 2008, at 9:51 AM, Leonardo Fialho wrote:


Hi all (and Josh),

Why does ompi-checkpoint have to contact the HNP specifically? If I use
another process to start the snapshot coordinator, it apparently works
fine, no?

PS: I prefer to send this message to the list... to keep it in the
history for future use...

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478






Re: [MTT devel] Reporter Slowness

2008-01-31 Thread Josh Hursey
Ok, so the script is done. It took a bit longer than I had expected,  
but once it finished, things sped back up ('24 hours' of data in 6  
sec). There are a few more maintenance operations I want to run which  
will help out a bit more, but I'll push those to this weekend.


Thanks for your patience, and let me know if it feels sluggish again.  
So as of this email things should be back to normal.


Cheers,
Josh

On Jan 30, 2008, at 5:09 PM, Josh Hursey wrote:


I've started the script running.

Below is a short version, and a trilogy of the gory details. I wanted
to write up the details so that if it ever happens again to us (or
someone else), they can see what we did to fix it.


The Short Version:
--
The Slowness(tm) was caused by the recent shifting of data in the
database to resolve the partition table problems seen earlier this
month.

The bad news is that it will take about 14 hours to finish.

The good news is that I confirmed that this will fix the performance
problem that we are seeing. In the small run this technique reduced the
'24 hour' query execution time from ~40 sec back down to ~8 sec.

This may slow down client submits this evening, but should not prevent
them from being able to submit. The 'DELETE' operations do not require
an exclusive lock, so the 'INSERT' operations should proceed fine
concurrently. The 'INSERT' operations will need to be blocked while
the 'VACUUM FULL' operation is in progress, since it *does* require an
exclusive lock. The 'INSERT' operations will proceed normally once
this lock is released, resulting in a temporary slowdown for clients
that submit during these windows of time (about 20 min or so).



The Details: Part 1: What I did earlier this week:
(more than you wanted to know, for posterity purposes)
--
The original problem was that the master partition tables accidentally
started storing data because I forgot to load the 2008 partition
tables into the database before the first of the year. :( So we loaded
the partition tables, but we still needed to move the misplaced data.

To move the misplaced data we have to duplicate the row (so it is
stored properly this time), but we also need to take care in assigning
row IDs to the duplicate rows. We cannot give the dup'ed rows the same
ID or we will be unable to differentiate the original and the dup'ed
row. So I created a dummy table for mpi_install/test_build/test_run to
translate between the orig row ID and the dup'ed row ID. I used the
nextval on the sequence to populate the values for the dup'ed rows in
the dummy table.

Now that I had the translation, I joined the dummy table with its
corresponding master table (e.g. "mpi_install join mpi_install_dummy
on mpi_install.mpi_install_id = mpi_install_dummy.orig_id"), and
instead of selecting the original ID from the dummy table I selected
the new dup'ed ID. I inserted this selection back into the
mpi_install table. (Cool little trick that PostgreSQL lets you get
away with sometimes.)

Once I had duplicated all of the affected rows, I updated all
references to the original IDs and set them to the duplicated IDs in
the test_build/test_run tables. This removed all internal references
to the original IDs and replaced them with the duplicates, so we
retain the integrity of the data.

Once I had verified that no table references the original rows, I
deleted those rows from the mpi_install/test_build/test_run tables.



The Details: Part 2: What I forgot to do:
-
When rows are deleted from PostgreSQL the disk space used continues to
be reserved for this table, and is not reclaimed unless you 'VACUUM
FULL' this table. PostgreSQL does this for many good reasons which are
described in their documentation. However in the case of the master
partition tables we want them to release all of their disk space since
we should never be storing data in this particular table.

I did a 'VACUUM FULL' on the mpi_install and test_build tables
originally, but did not do it on the test_run table since this
operation requires an exclusive lock on the table and can take a long
time to finish. Further, I only completed about 1% of the deletions for
test_run before I stopped this operation, choosing to wait for the
weekend since it will take a long time to complete.

Deleting only part of the test_run master table (which contained
about 1.2 million rows) caused the queries on this table to slow
down considerably. The Query Planner estimated the execution of the
'24 hour' query at 322,924 and it completed in about 40 seconds. I ran
'VACUUM FULL test_run' which only Vacuums the master table, and then
re-ran the query. This time the Query Planner estimated the execution
at 151,430 and it completed in about 8 seconds.



The Details: Part 3: What I am doing now:
-
Currently I am deleting the rest of the old rows from test_run. There
are approx. 1.2 million rows, and 

[OMPI devel] SnapC

2008-01-31 Thread Leonardo Fialho
Hi all (and Josh),

Why does ompi-checkpoint have to contact the HNP specifically? If I use
another process to start the snapshot coordinator, it apparently works
fine, no?

PS: I prefer to send this message to the list... to keep it in the
history for future use...

-- 
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478



Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307

2008-01-31 Thread Adrian Knoth
On Wed, Jan 30, 2008 at 06:48:54PM +0100, Adrian Knoth wrote:

> > What is the real issue behind this whole discussion?
> Hanging connections.
> I'll have a look at it tomorrow.

To everybody who's interested in BTL-TCP, especially George and (to a
minor degree) rhc:

I've integrated something I call "magic address selection code".
See the comments in r17348.

Can you check

   https://svn.open-mpi.org/svn/ompi/tmp-public/btl-tcp

if it's working for you? Read: multi-rail TCP, FNN, whatever is
important to you?


The code is a proof of concept and could use a little tuning (if it's
working at all; over here, it satisfies all tests).

I vaguely remember that at least Ralph doesn't like

   int a[perm_size * sizeof(int)];

where perm_size is dynamically evaluated (read: array size is runtime
dependent)

There are also some large arrays, search for MAX_KERNEL_INTERFACE_INDEX.
Perhaps it's better to replace them with an appropriate OMPI data
structure. I don't know what fits best, you guys know the details...
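For what it's worth, a minimal sketch of the heap-based alternative to the runtime-sized stack array (the function and variable names here are hypothetical, not taken from the actual branch):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for a runtime-sized stack array such as
 *     int a[perm_size * sizeof(int)];
 * Note that the original also sizes the array in perm_size * sizeof(int)
 * *elements*, not bytes.  A heap allocation sized in elements avoids both
 * the runtime-dependent stack frame and the over-allocation, and lets the
 * caller detect allocation failure. */
static int *alloc_perm(size_t perm_size)
{
    int *a = malloc(perm_size * sizeof(*a));
    if (NULL != a) {
        memset(a, 0, perm_size * sizeof(*a));  /* zero-initialize */
    }
    return a;  /* caller must free() */
}
```

An OMPI-native container would work equally well; the point is only that the storage size is decided at runtime without growing the stack.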


So please give the code a try, and if it's working, feel free to cleanup
whatever is necessary to make it the OMPI style or give me some pointers
what to change.


I'd like to point to Thomas' diploma thesis. The PDF explains the theory
behind the code; it's like a rationale. Unfortunately, the PDF has some
typos, but I guess you'll get the idea. It's a graph matching algorithm,
Chapter 3 covers everything in detail:

 http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf


HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] 32 bit udapl warnings

2008-01-31 Thread Gleb Natapov
On Thu, Jan 31, 2008 at 08:45:54AM -0500, Don Kerr wrote:
> This was brought to my attention once before but I don't see this 
> message so I just plain forgot about it. :-(
> uDAPL defines its pointers as uint64, "typedef DAT_UINT64 DAT_VADDR", 
> and pval is a "void *" which is why the message comes up.  If I remove 
> the cast I believe I get a different warning and I just haven't stopped 
> to think of a way around this.
dat_pointer = (DAT_VADDR)(uintptr_t)void_pointer;

This is not just a warning. This is a real bug. If the MSB of a void
pointer is 1, it will be sign-extended.
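A small C sketch of the hazard, with a 32-bit integer standing in for a 32-bit pointer (the DAT_VADDR typedef here is a local stand-in for illustration, not the real uDAPL header):

```c
#include <stdint.h>

/* Local stand-in for uDAPL's "typedef DAT_UINT64 DAT_VADDR". */
typedef uint64_t DAT_VADDR;

/* Widening through a *signed* 32-bit intermediate sign-extends whenever
 * the MSB is 1 -- the bug described above for an ill-chosen cast chain
 * on a 32-bit platform. */
static DAT_VADDR widen_via_signed(uint32_t addr)
{
    return (DAT_VADDR)(int32_t)addr;   /* buggy: sign-extends */
}

/* Widening through an unsigned intermediate (uintptr_t for a real void
 * pointer, i.e. (DAT_VADDR)(uintptr_t)pval) zero-extends, which is what
 * an address needs. */
static DAT_VADDR widen_via_unsigned(uint32_t addr)
{
    return (DAT_VADDR)addr;            /* correct: zero-extends */
}
```

With the MSB set (e.g. 0x80000000), the signed path yields 0xFFFFFFFF80000000 while the unsigned path yields 0x0000000080000000; for addresses below 2 GB the two agree, which is why the bug can hide for a long time.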

> 
> Tim Prins wrote:
> > Hi,
> >
> > I am seeing some warnings on the trunk when compiling udapl in 32 bit 
> > mode with OFED 1.2.5.1:
> >
> > btl_udapl.c: In function 'udapl_reg_mr':
> > btl_udapl.c:95: warning: cast from pointer to integer of different size
> > btl_udapl.c: In function 'mca_btl_udapl_alloc':
> > btl_udapl.c:852: warning: cast from pointer to integer of different size
> > btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
> > btl_udapl.c:959: warning: cast from pointer to integer of different size
> > btl_udapl.c:1008: warning: cast from pointer to integer of different size
> > btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
> > btl_udapl_component.c:871: warning: cast from pointer to integer of 
> > different size
> > btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
> > btl_udapl_endpoint.c:130: warning: cast from pointer to integer of 
> > different size
> > btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
> > btl_udapl_endpoint.c:775: warning: cast from pointer to integer of 
> > different size
> > btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
> > btl_udapl_endpoint.c:864: warning: cast from pointer to integer of 
> > different size
> > btl_udapl_endpoint.c: In function 
> > 'mca_btl_udapl_endpoint_initialize_control_message':
> > btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of 
> > different size
> >
> >
> > Thanks,
> >
> > Tim
> >   

--
Gleb.


Re: [OMPI devel] 32 bit udapl warnings

2008-01-31 Thread Don Kerr
This was brought to my attention once before but I don't see this 
message so I just plain forgot about it. :-(
uDAPL defines its pointers as uint64, "typedef DAT_UINT64 DAT_VADDR", 
and pval is a "void *" which is why the message comes up.  If I remove 
the cast I believe I get a different warning and I just haven't stopped 
to think of a way around this.


Tim Prins wrote:

Hi,

I am seeing some warnings on the trunk when compiling udapl in 32 bit 
mode with OFED 1.2.5.1:


btl_udapl.c: In function 'udapl_reg_mr':
btl_udapl.c:95: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_alloc':
btl_udapl.c:852: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
btl_udapl.c:959: warning: cast from pointer to integer of different size
btl_udapl.c:1008: warning: cast from pointer to integer of different size
btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
btl_udapl_component.c:871: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
btl_udapl_endpoint.c:130: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
btl_udapl_endpoint.c:775: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
btl_udapl_endpoint.c:864: warning: cast from pointer to integer of 
different size
btl_udapl_endpoint.c: In function 
'mca_btl_udapl_endpoint_initialize_control_message':
btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of 
different size



Thanks,

Tim
  


Re: [OMPI devel] vt compiler warnings and errors

2008-01-31 Thread Tim Prins

Hi Matthias,

I just noticed something else that seems odd. On a fresh checkout, I did 
an autogen and configure. Then I typed 'make clean'. Things seem to 
progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a 
new configure script gets run.


Specifically:
[tprins@sif test]$ make clean

Making clean in otf
make[5]: Entering directory 
`/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
 cd . && /bin/sh 
/u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run 
automake-1.10 --gnu
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing 
--run autoconf

/bin/sh ./config.status --recheck
running CONFIG_SHELL=/bin/sh /bin/sh ./configure  --with-zlib-lib=-lz 
--prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin 
--libdir=/usr/local/lib --includedir=/usr/local/include 
--datarootdir=/usr/local/share/vampirtrace 
--datadir=${prefix}/share/${PACKAGE_TARNAME} 
--docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null 
--srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions 
-pthread LDFLAGS=  LIBS=-lnsl -lutil  -lm  CPPFLAGS=  CFLAGS=-g -Wall 
-Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes 
-Wstrict-prototypes -Wcomment -pedantic 
-Werror-implicit-function-declaration -finline-functions 
-fno-strict-aliasing -pthread FFLAGS=  --no-create --no-recursion

checking build system type... x86_64-unknown-linux-gnu



Not sure if this is expected behavior, but it seems wrong to me.

Thanks,

Tim

Matthias Jurenz wrote:

Hello,

All three VT-related errors which MTT reported should be fixed now.

516:
The fix from George Bosilca this morning should work on MacOS PPC. 
Thanks!


517:
The compile error occurred due to a missing header include.
Furthermore, the compiler warnings should also be fixed.

518:
I have added a check for whether MPI I/O is available, and added the
corresponding VT configure option to enable/disable MPI I/O support.
For this I used the variable "define_mpi_io" from
'ompi/mca/io/configure.m4'. Is that o.k., or should I use another
variable?


Matthias


On Tue, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote:
I got a bunch of compiler warnings and errors with VT on the PGI  
compiler last night -- my mail client won't paste it in nicely.  :-(


See these MTT reports for details:

- On Absoft systems:
   http://www.open-mpi.org/mtt/index.php?do_redir=516
- On Cisco systems:
   With PGI compilers:
   http://www.open-mpi.org/mtt/index.php?do_redir=517
   With GNU compilers:
   http://www.open-mpi.org/mtt/index.php?do_redir=518

The output may be a bit hard to read -- for MTT builds, we separate  
the stdout and stderr into 2 streams.  So you kinda have to merge them  
in your head; sorry...



--
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773








[OMPI devel] 32 bit udapl warnings

2008-01-31 Thread Tim Prins

Hi,

I am seeing some warnings on the trunk when compiling udapl in 32 bit 
mode with OFED 1.2.5.1:


btl_udapl.c: In function 'udapl_reg_mr':
btl_udapl.c:95: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_alloc':
btl_udapl.c:852: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
btl_udapl.c:959: warning: cast from pointer to integer of different size
btl_udapl.c:1008: warning: cast from pointer to integer of different size
btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
btl_udapl_component.c:871: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
btl_udapl_endpoint.c:130: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
btl_udapl_endpoint.c:775: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
btl_udapl_endpoint.c:864: warning: cast from pointer to integer of 
different size
btl_udapl_endpoint.c: In function 
'mca_btl_udapl_endpoint_initialize_control_message':
btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of 
different size



Thanks,

Tim


Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS

2008-01-31 Thread Aurélien Bouteiller
I tried using a fresh trunk; the same problem occurred. Here is the  
complete configure line. I am using libtool 1.5.22 from fink.  
Otherwise everything is standard OS 10.5.


  $ ../trunk/configure --prefix=/Users/bouteill/ompi/build --enable- 
mpirun-prefix-by-default --disable-io-romio --enable-debug --enable- 
picky --enable-mem-debug --enable-mem-profile --enable-visibility -- 
disable-dlopen --disable-shared --enable-static


The error message generated by abort contains garbage (line numbers do  
not match anything in .c files and according to gdb the failure does  
not occur during ns initialization). This looks like a heap corruption  
or something as bad.


orterun (argc=4, argv=0xb81c) at ../../../../trunk/orte/tools/ 
orterun/orterun.c:529
529	cb_states = ORTE_PROC_STATE_TERMINATED |  
ORTE_PROC_STATE_AT_STG1;

(gdb) n
530	rc = orte_rmgr.spawn_job(apps, num_apps, , 0, NULL,  
job_state_callback, cb_states, );

(gdb) n
531	while (NULL != (item = opal_list_remove_first()))  
OBJ_RELEASE(item);

(gdb) n
** Stepping over inlined function code. **
532 OBJ_DESTRUCT();
(gdb) n
534 if (orterun_globals.do_not_launch) {
(gdb) n
539 OPAL_THREAD_LOCK(_globals.lock);
(gdb) n
541 if (ORTE_SUCCESS == rc) {
(gdb) n
542 while (!orterun_globals.exit) {
(gdb) n
543 opal_condition_wait(_globals.cond,
(gdb) n
[grosse-pomme.local:77335] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in  
file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/ 
oob_base_init.c at line 74


Aurelien


On Jan 30, 2008 at 17:18, Ralph Castain wrote:


Are you running on the trunk, or an earlier release?

If the trunk, then I suspect you have a stale library hanging  
around. I

build and run statically on Leopard regularly.


On 1/30/08 2:54 PM, "Aurélien Bouteiller"   
wrote:



I get a runtime error in static build on Mac OS 10.5 (automake 1.10,
autoconf 2.60, gcc-apple-darwin 4.01, libtool 1.5.22).

The error does not occur in dso builds, and everything seems to work
fine on Linux.

Here is the error log.

~/ompi$ mpirun -np 2 NetPIPE_3.6/NPmpi
[grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in
file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/
oob_base_init.c at line 74
[grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in
file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/ns/proxy/
ns_proxy_component.c at line 222
[grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Error in file /
SourceCache/openmpi/openmpi-5/openmpi/orte/runtime/orte_init_stage1.c
at line 230
--
It looks like orte_init failed for some reason; your parallel  
process is

likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal  
failure;

here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ns_base_select failed
  --> Returned value -1 instead of ORTE_SUCCESS

--
--
It looks like MPI_INIT failed for some reason; your parallel  
process is

likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment
problems.  This failure appears to be an internal failure; here's  
some

additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init_stage1 failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)



--
Dr. Aurélien Bouteiller
Sr. Research Associate - Innovative Computing Laboratory
Suite 350, 1122 Volunteer Boulevard
Knoxville, TN 37996
865 974 6321








