Re: [OMPI devel] More VT warnings
* Tim Prins wrote on Fri, Feb 01, 2008 at 04:09:31PM CET:
>
> Note that this indicates that the file vt_metric_papi.c is being
> compiled *3* times. I am not using a parallel make here. Any ideas why
> it is compiling 3 times?

The file is listed as a source file for four different libraries, and per-target CFLAGS are used for these. Between one and four of these libraries are actually built, depending on decisions made at configure time.

Cheers,
Ralf
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307
Adrian,

For the most part this seems to work for me. But there are a few issues. I'm not sure which are introduced by this patch, and whether some may be expected behavior. But for completeness I will point them all out.

First, let me explain that I am working on a machine with 3 tcp interfaces: lo, eth0, and ib0. Both eth0 and ib0 connect all the compute nodes.

1. There are some warnings when compiling:

   btl_tcp_proc.c:171: warning: no previous prototype for 'evaluate_assignment'
   btl_tcp_proc.c:206: warning: no previous prototype for 'visit'
   btl_tcp_proc.c:224: warning: no previous prototype for 'mca_btl_tcp_initialise_interface'
   btl_tcp_proc.c: In function `mca_btl_tcp_proc_insert':
   btl_tcp_proc.c:304: warning: pointer targets in passing arg 2 of `opal_ifindextomask' differ in signedness
   btl_tcp_proc.c:313: warning: pointer targets in passing arg 2 of `opal_ifindextomask' differ in signedness
   btl_tcp_proc.c:389: warning: comparison between signed and unsigned
   btl_tcp_proc.c:400: warning: comparison between signed and unsigned
   btl_tcp_proc.c:401: warning: comparison between signed and unsigned
   btl_tcp_proc.c:459: warning: ISO C90 forbids variable-size array `a'
   btl_tcp_proc.c:459: warning: ISO C90 forbids mixed declarations and code
   btl_tcp_proc.c:465: warning: ISO C90 forbids mixed declarations and code
   btl_tcp_proc.c:466: warning: comparison between signed and unsigned
   btl_tcp_proc.c:480: warning: comparison between signed and unsigned
   btl_tcp_proc.c:485: warning: comparison between signed and unsigned
   btl_tcp_proc.c:495: warning: comparison between signed and unsigned

2. If I exclude all my tcp interfaces, the connection fails properly, but I do get a malloc request for 0 bytes:

   [tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude eth0,ib0,lo -np 2 ./ring_c
   malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
   malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)

3. If the exclude list does not contain 'lo', or the include list contains 'lo', the job hangs when using multiple nodes:

   [tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude ib0 -np 2 -bynode ./ring_c
   Process 0 sending 10 to 1, tag 201 (2 processes in ring)
   [odin011][1,0][btl_tcp_endpoint.c:619:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection refused (111)

   [tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_include eth0,lo -np 2 -bynode ./ring_c
   Process 0 sending 10 to 1, tag 201 (2 processes in ring)
   [odin011][1,0][btl_tcp_endpoint.c:619:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection refused (111)

However, the great news about this patch is that it appears to fix https://svn.open-mpi.org/trac/ompi/ticket/1027 for me.

Hope this helps,

Tim

Adrian Knoth wrote:
> On Wed, Jan 30, 2008 at 06:48:54PM +0100, Adrian Knoth wrote:
> > What is the real issue behind this whole discussion?
> > Hanging connections.
> > I'll have a look at it tomorrow.
>
> To everybody who's interested in BTL-TCP, especially George and (to a minor degree) rhc:
>
> I've integrated something I call "magic address selection code". See the comments in r17348.
>
> Can you check https://svn.open-mpi.org/svn/ompi/tmp-public/btl-tcp if it's working for you? Read: multi-rail TCP, FNN, whatever is important to you?
>
> The code is proof of concept and could use a little tuning (if it's working at all. Over here, it satisfies all tests).
>
> I vaguely remember that at least Ralph doesn't like
>
>    int a[perm_size * sizeof(int)];
>
> where perm_size is dynamically evaluated (read: array size is runtime dependent).
>
> There are also some large arrays, search for MAX_KERNEL_INTERFACE_INDEX. Perhaps it's better to replace them with an appropriate OMPI data structure. I don't know what fits best, you guys know the details...
>
> So please give the code a try, and if it's working, feel free to clean up whatever is necessary to make it the OMPI style or give me some pointers on what to change.
>
> I'd like to point to Thomas' diploma thesis. The PDF explains the theory behind the code; it's like a rationale. Unfortunately, the PDF has some typos, but I guess you'll get the idea. It's a graph matching algorithm, Chapter 3 covers everything in detail:
>
>    http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf
>
> HTH
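For readers following the warnings listed in this message, here is a minimal sketch in C of how such diagnostics are commonly avoided. It is not the code from the btl-tcp branch. The names `pair_weight_fn`, `evaluate`, `visit_perms`, `best_assignment`, and `demo_weight` are hypothetical, and the brute-force permutation search merely stands in for the graph matching described in the thesis. The sketch assumes the intent of the quoted declaration was an array of `perm_size` entries; as written, `int a[perm_size * sizeof(int)]` reserves `perm_size * sizeof(int)` ints. It uses static helpers (no missing-prototype warning), `size_t` indices throughout (no signed/unsigned comparisons), declarations at the top of each block, a heap allocation instead of a variable-size array, and a guard against zero-byte requests.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical weight callback: score for pairing local interface
 * `local` with remote interface `remote`. Not an Open MPI type. */
typedef int (*pair_weight_fn)(size_t local, size_t remote);

/* Prototypes for the externally visible functions; these would
 * normally live in a header. */
int demo_weight(size_t local, size_t remote);
int best_assignment(size_t n, pair_weight_fn weight, size_t *best_perm);

/* Helpers used only in this file are static, so no separate
 * prototype is required. */
static int evaluate(const size_t *perm, size_t n, pair_weight_fn weight)
{
    size_t i;
    int total = 0;

    for (i = 0; i < n; i++) {
        total += weight(i, perm[i]);
    }
    return total;
}

/* Enumerate all permutations of perm[k..n-1] by swapping in place. */
static void visit_perms(size_t *perm, size_t k, size_t n,
                        pair_weight_fn weight, int *best, size_t *best_perm)
{
    size_t i, tmp;

    if (k == n) {
        int total = evaluate(perm, n, weight);
        if (total > *best) {
            *best = total;
            memcpy(best_perm, perm, n * sizeof(*perm));
        }
        return;
    }
    for (i = k; i < n; i++) {
        tmp = perm[k]; perm[k] = perm[i]; perm[i] = tmp;
        visit_perms(perm, k + 1, n, weight, best, best_perm);
        tmp = perm[k]; perm[k] = perm[i]; perm[i] = tmp;
    }
}

/* Returns the best total weight and stores the winning permutation in
 * best_perm (room for n entries). Returns INT_MIN if n is 0 or the
 * allocation fails. */
int best_assignment(size_t n, pair_weight_fn weight, size_t *best_perm)
{
    size_t *perm;
    size_t i;
    int best = INT_MIN;

    assert(weight != NULL && best_perm != NULL);
    if (n == 0) {
        return INT_MIN; /* never ask malloc for 0 bytes */
    }
    perm = malloc(n * sizeof(*perm)); /* n elements, sized by element type */
    if (perm == NULL) {
        return INT_MIN;
    }
    for (i = 0; i < n; i++) {
        perm[i] = i;
    }
    visit_perms(perm, 0, n, weight, &best, best_perm);
    free(perm);
    return best;
}

/* Toy weights for demonstration only: pairing i with (i + 1) % 3
 * scores 10, anything else scores 1. */
int demo_weight(size_t local, size_t remote)
{
    return ((local + 1) % 3 == remote) ? 10 : 1;
}
```

Calling `best_assignment(3, demo_weight, p)` with a three-element `size_t` array `p` tries all six pairings. An exhaustive search like this grows factorially with the number of interfaces, so it is only reasonable for the handful of interfaces a typical node has.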
Re: [OMPI devel] VT in trunk + how to disable
I think my position is about the same as Terry's. I also think we have a precedent for building everything that is possible and letting the user choose at run-time what they want to do.

My $0.02 is that it's easier to tell random users (and customers!) "yes, OMPI should have built that for you by default; you use it like this..." vs. "No, sorry, you need to go re-install OMPI to have feature X."

We developers are probably a bit more sensitive to this issue since it makes for longer builds (and we re-build all the time). But remember that most people install OMPI only a small number of times -- so build time is less of an issue for them. (I'm assuming that at least one of your motivations for asking was the longer build time...?)

On Feb 1, 2008, at 10:17 AM, Terry Dontje wrote:

> Josh Hursey wrote:
> > Should the default be to *disable* vampirtrace?
> >
> > I mention this since, I assume, most people do not depend on this tool for every Open MPI install. Meaning that Open MPI does not require this integration for correct MPI functionality unlike something like ROMIO [example of opt-out functionality which is 3rd party]. So I would suggest to the group that vampirtrace be an opt-in functionality.
> >
> > What do others think?
>
> I am not completely against disabling it as a default. However, once it builds consistently, having it enabled by default shouldn't really cause any problems for those not directly using it (well, outside of more time to compile). I imagine changing the default probably would help ORTE move forward, but then I wonder if we will run into issues of the vampire stuff not being able to resolve their issues because of ORTE problems put back to the trunk.
>
> --td
>
> > -- Josh
> >
> > On Jan 28, 2008, at 9:59 AM, Andreas Knüpfer wrote:
> > > Hi everybody,
> > >
> > > the vampirtrace integration arrived at the trunk today. There seems to be one issue already, but we'll fix this asap.
> > >
> > > As a general hint, this is how to completely disable anything we integrated:
> > >
> > >    configure --enable-contrib-no-build=vt ...
> > >
> > > Then again, we'd like to see all the issues you may encounter and fix them.
> > >
> > > Best regards, Andreas
> > >
> > > --
> > > Dipl. Math. Andreas Knuepfer,
> > > Center for Information Services and High Performance Computing (ZIH), TU Dresden,
> > > Willersbau A114, Zellescher Weg 12, 01062 Dresden
> > > phone +49-351-463-38323, fax +49-351-463-37773

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] vt compiler warnings and errors
On Feb 1, 2008, at 5:35 AM, Ralf Wildenhues wrote:

> These files do not belong in SVN, they are generated by aclocal:
>
>    ompi/contrib/vt/vt/extlib/otf/aclocal.m4
>    ompi/contrib/vt/vt/aclocal.m4

I think both of these have their own configure scripts, meaning that they were autoconfed/automaked/whatever before they were put into OMPI. And in hindsight, this fits in with exactly what our original goal was: take a VT tarball and dump it into OMPI's SVN. Doh!

So I think the question still remains: can we hook VT's autoconf (et al.) requirements into the top-level autogen.sh so that the trunk copy of vt doesn't have configure/aclocal.m4/etc. and OMPI's top-level autogen.sh will create them?

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] VT in trunk + how to disable
Should the default be to *disable* vampirtrace?

I mention this since, I assume, most people do not depend on this tool for every Open MPI install. Meaning that Open MPI does not require this integration for correct MPI functionality unlike something like ROMIO [example of opt-out functionality which is 3rd party]. So I would suggest to the group that vampirtrace be an opt-in functionality.

What do others think?

-- Josh

On Jan 28, 2008, at 9:59 AM, Andreas Knüpfer wrote:

> Hi everybody,
>
> the vampirtrace integration arrived at the trunk today. There seems to be one issue already, but we'll fix this asap.
>
> As a general hint, this is how to completely disable anything we integrated:
>
>    configure --enable-contrib-no-build=vt ...
>
> Then again, we'd like to see all the issues you may encounter and fix them.
>
> Best regards, Andreas
>
> --
> Dipl. Math. Andreas Knuepfer,
> Center for Information Services and High Performance Computing (ZIH), TU Dresden,
> Willersbau A114, Zellescher Weg 12, 01062 Dresden
> phone +49-351-463-38323, fax +49-351-463-37773
Re: [OMPI devel] vt compiler warnings and errors
* Jeff Squyres wrote on Thu, Jan 31, 2008 at 07:10:36PM CET:
> Ah -- I didn't notice this before -- do you have a configure script
> committed to SVN? If so, this could be the problem.
>
> On Thu, 2008-01-31 at 08:09 -0500, Tim Prins wrote:
[...]
> >> [tprins@sif test]$ make clean
> >>
> >> Making clean in otf
> >> make[5]: Entering directory `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
> >> cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run automake-1.10 --gnu
> >> cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run autoconf
[...]

These files do not belong in SVN, they are generated by aclocal:

   ompi/contrib/vt/vt/extlib/otf/aclocal.m4
   ompi/contrib/vt/vt/aclocal.m4

Cheers,
Ralf
Re: [OMPI devel] vt compiler warnings and errors
Hi everybody,

now this is an interesting effect. After a fresh checkout, all files carry the current time as their timestamp, don't they? Is the timestamp explicitly saved somewhere? Could it be that this is newer than Tim's local time yesterday? Maybe the system time is not set to UTC or something like this? If so, then it should be possible to reproduce this today. Could you give it a try, Tim?

Another cause could be slight differences in the files' times because one is checked out earlier than the other. However, OTF's configure had already run during the first global configure. Therefore, all files' timestamps should be correct after this. So I don't believe in this explanation.

What do you think?

--
Dipl. Math. Andreas Knuepfer,
Center for Information Services and High Performance Computing (ZIH), TU Dresden,
Willersbau A114, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-38323, fax +49-351-463-37773