Re: [OMPI users] Memory manager
To tell you all what no one wanted to tell me: yes, it does seem to be the memory manager. Compiling everything with --with-memory-manager=none returns the vmem use to the more reasonable ~100MB per process (down from >8GB).

I take it this may affect my peak bandwidth over InfiniBand. What's the general feeling about how bad this is?

On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:
> Hi folks
>
> I'm trying to run an MPI app on an InfiniBand cluster with OpenMPI
> 1.2.6.
>
> When run on a single node, this app is grabbing large chunks of memory
> (total per process ~8.5GB, including strace showing a single 4GB grab)
> but not using it. The resident memory use is ~40MB per process. When
> this app is compiled in serial mode (with conditionals to remove the MPI
> calls) the memory use is more like what you'd expect, 40MB res and
> ~100MB vmem.
>
> Now I didn't write it so I'm not sure what extra stuff the MPI version
> does, and we haven't tracked down the large memory grabs.
>
> Could it be that this vmem is being grabbed by the OpenMPI memory
> manager rather than directly by the app?
>
> Ciao
> Terry
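The gap described in this thread, between virtual size (~8.5GB) and resident memory (~40MB), can be watched on Linux by reading the VmSize and VmRSS fields of /proc/<pid>/status. The following is only a minimal sketch, assuming a Linux /proc filesystem; the names parse_vm_status and vm_status are made up for illustration and are not part of Open MPI.

```python
import os


def parse_vm_status(status_text):
    """Extract VmSize and VmRSS (both in kB) from the text of a /proc/<pid>/status file."""
    sizes = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("VmSize", "VmRSS"):
            # The value looks like "   8912896 kB"; keep the integer part.
            sizes[key] = int(rest.split()[0])
    return sizes


def vm_status(pid="self"):
    """Read /proc/<pid>/status (Linux only) and return its VmSize/VmRSS values in kB."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vm_status(fh.read())


if __name__ == "__main__":
    # Only meaningful where /proc exists, i.e. on Linux.
    if os.path.exists("/proc/self/status"):
        print(vm_status())
```

Running it against the PID of each MPI rank before and after rebuilding with --with-memory-manager=none would show whether only VmSize changes while VmRSS stays put, which is what the numbers quoted above suggest.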
Re: [OMPI users] openmpi 32-bit g++ compilation issue
Arif,

It looks like your system is 64-bit by default, so it doesn't pick up the 32-bit libraries automatically at the link step (note the -L/.../x86_64-suse-linux/lib entries prior to the corresponding entries pointing to the 32-bit library versions). I don't use SUSE Linux, so I don't know whether this is something you can control in the configure step for Open MPI.

Doug Reeder

On May 19, 2008, at 2:48 PM, Arif Ali wrote:

Hi,

OS: SLES10 SP1
OFED: 1.3
openmpi: 1.2 1.2.5 1.2.6
compilers: gcc g++ gfortran

I am creating a 32-bit build of openmpi on an InfiniBand cluster, and the compilation gets stuck. If I use the /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/libstdc++.so library manually, it compiles that piece of code.

I was wondering if anyone else has had this problem, or whether there is any other way of getting this to work. I feel that there may be something very silly here that I have missed, but I can't seem to spot it.

I have also tried this on a fresh install of OFED 1.3 with openmpi 1.2.6.

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT file.lo -MD -MP -MF .deps/file.Tpo -c file.cc -fPIC -DPIC -o .libs/file.o
depbase=`echo win.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo -MD -MP -MF $depbase.Tpo -c -o win.lo win.cc &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo -MD -MP -MF .deps/win.Tpo -c win.cc -fPIC -DPIC -o .libs/win.o
/bin/sh ../../../libtool --tag=CXX --mode=link g++ -O3 -DNDEBUG -m32 -finline-functions -pthread -export-dynamic -m32 -o libmpi_cxx.la -rpath /opt/openmpi/1.2.6/gnu_4.1.2/32/lib mpicxx.lo intercepts.lo comm.lo datatype.lo file.lo win.lo -lnsl -lutil -lm
libtool: link: g++ -shared -nostdlib /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib/crti.o /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/crtbeginS.o .libs/mpicxx.o .libs/intercepts.o .libs/comm.o .libs/datatype.o .libs/file.o .libs/win.o -Wl,-rpath -Wl,/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -Wl,-rpath -Wl,/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -lnsl -lutil -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/32 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../.. /usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so -lm -lpthread -lc -lgcc_s /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/crtendS.o /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib/crtn.o -m32 -pthread -m32 -pthread -Wl,-soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0
/usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status

--
Arif Ali
Software Engineer
OCF plc

Mobile: +44 (0)7970 148 122
DDI: +44 (0)114 257 2240
Office: +44 (0)114 257 2200
Fax: +44 (0)114 257 0022
Email: a...@ocf.co.uk
Web: http://www.ocf.co.uk

Support Phone: +44 (0)845 702 3829
Support E-mail: supp...@ocf.co.uk

Skype: arif_ali80
MSN: a...@ocf.co.uk
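Doug's diagnosis is that the link step picks up the 64-bit libstdc++.so while building with -m32, which would match the "File in wrong format" error in the log. One way to check which flavour a given library is, without relying on any particular tool being installed, is to read the class byte of its ELF header (offset 4: 1 means 32-bit, 2 means 64-bit). The following is a minimal sketch only; elf_class is a hypothetical helper name, and the two paths are simply the ones that appear in the log.

```python
import os


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not a recognisable ELF object."""
    with open(path, "rb") as fh:
        header = fh.read(5)
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 (EI_CLASS): 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])


if __name__ == "__main__":
    # Paths taken from the build log; they only exist on a system like the poster's.
    for lib in (
        "/usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so",
        "/usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/libstdc++.so",
    ):
        if os.path.exists(lib):
            print(lib, elf_class(lib))
```

If the first path reports 64 and the second 32, that would be consistent with the observation that pointing the link at the 4.1.2/32 copy by hand makes that step succeed. A result of None would suggest the file is something other than an ELF object, for example a linker script.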
[OMPI users] openmpi 32-bit g++ compilation issue
Hi,

OS: SLES10 SP1
OFED: 1.3
openmpi: 1.2 1.2.5 1.2.6
compilers: gcc g++ gfortran

I am creating a 32-bit build of openmpi on an InfiniBand cluster, and the compilation gets stuck. If I use the /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/libstdc++.so library manually, it compiles that piece of code.

I was wondering if anyone else has had this problem, or whether there is any other way of getting this to work. I feel that there may be something very silly here that I have missed, but I can't seem to spot it.

I have also tried this on a fresh install of OFED 1.3 with openmpi 1.2.6.

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT file.lo -MD -MP -MF .deps/file.Tpo -c file.cc -fPIC -DPIC -o .libs/file.o
depbase=`echo win.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo -MD -MP -MF $depbase.Tpo -c -o win.lo win.cc &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo -MD -MP -MF .deps/win.Tpo -c win.cc -fPIC -DPIC -o .libs/win.o
/bin/sh ../../../libtool --tag=CXX --mode=link g++ -O3 -DNDEBUG -m32 -finline-functions -pthread -export-dynamic -m32 -o libmpi_cxx.la -rpath /opt/openmpi/1.2.6/gnu_4.1.2/32/lib mpicxx.lo intercepts.lo comm.lo datatype.lo file.lo win.lo -lnsl -lutil -lm
libtool: link: g++ -shared -nostdlib /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib/crti.o /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/crtbeginS.o .libs/mpicxx.o .libs/intercepts.o .libs/comm.o .libs/datatype.o .libs/file.o .libs/win.o -Wl,-rpath -Wl,/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -Wl,-rpath -Wl,/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -lnsl -lutil -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/32 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../.. /usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so -lm -lpthread -lc -lgcc_s /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/crtendS.o /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib/crtn.o -m32 -pthread -m32 -pthread -Wl,-soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0
/usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status

--
Arif Ali
Software Engineer
OCF plc

Mobile: +44 (0)7970 148 122
DDI: +44 (0)114 257 2240
Office: +44 (0)114 257 2200
Fax: +44 (0)114 257 0022
Email: a...@ocf.co.uk
Web: http://www.ocf.co.uk

Support Phone: +44 (0)845 702 3829
Support E-mail: supp...@ocf.co.uk

Skype: arif_ali80
MSN: a...@ocf.co.uk
Re: [OMPI users] "Sorry! You were supposed to get help about..."
It feels like OMPI is somehow looking for the help files in the wrong place. Were they moved after OMPI was installed? How did you install OMPI?

On May 16, 2008, at 10:30 AM, Alex L. wrote:

Hello everybody,

I have a slightly annoying situation with OMPI error messages on a RHEL 4-something box. Every time I should see an error message, I receive something like:

-
Sorry! You were supposed to get help about:
orterun:init-failure
from the file:
help-orterun.txt
But I couldn't find any file matching that name. Sorry!
-

I know that the help files (and only the help files, not the whole installation) are located in:

/usr/share/openmpi/1.2.3-gcc/help64/openmpi/*

Is it possible to tell OMPI to look for the help files in this directory? Some ENV variable or a --option?

Thank you in advance,
Alex

--
Jeff Squyres
Cisco Systems
Re: [MTT users] MTT server side problem
Hello,

Did you have a chance to review this patch?

Regards,
Pasha

Josh Hursey wrote:

Sorry for the delay on this. I probably will not have a chance to look at it until later this week or early next. Thank you for the work on the patch.

Cheers,
Josh

On May 12, 2008, at 8:08 AM, Pavel Shamis (Pasha) wrote:

Hi Josh,

I ported the error handling mechanism from submit/index.php to database.inc. Please review.

Thanks,
Pasha

Josh Hursey wrote:

Pasha,

I'm looking at the patch a bit closer, and even though at a high level the do_pg_connect, do_pg_query, simple_select, and select functions do the same thing, the versions in submit/index.php have some additional error handling mechanisms that the ones in database.inc do not have. Specifically, they send email when the functions fail, with messages indicating what failed so corrections can be made.

So though I agree that we should unify the functionality, I cannot recommend this patch since it will result in losing useful error handling functionality. Maybe there is another way to clean this up to preserve the error reporting.

-- Josh

On May 7, 2008, at 11:56 AM, Pavel Shamis (Pasha) wrote:

Hi Josh,

I had the original problem with some old revision from trunk. Today I updated the server to the latest revision from trunk + the patch and everything looks good. Can I commit the patch?

Pasha

Ethan Mallove wrote:

On Wed, May/07/2008 06:04:07PM, Pavel Shamis (Pasha) wrote:

Hi Josh.

Looking at the patch I'm a little bit concerned. The "get_table_fields()" is, as you mentioned, no longer used so should be removed. However, the other functions are critical to the submission script, particularly 'do_pg_connect', which opens the connection to the backend database.

All the functions are implemented in the $topdir/database.inc file. And the "database.inc" implementation is better because it uses the password and username from config.ini. The original implementation from submit/index.php uses hardcoded values defined in the file.
Are you using the current development trunk (mtt/trunk) or the stable release branch (mtt/branches/ompi-core-testers)?

trunk

Can you send us the error messages that you were receiving?

1. On the client side I see "*** WARNING: MTTDatabase client did not get a serial". As a result of the error, some of the MTT results are not visible via the web reporter.

2. On the server side I found the following error message:

[client 10.4.3.214] PHP Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 23592960 bytes) in /.autodirect/swgwork/MTT/mtt/submit/index.php(79) : eval()'d code on line 77515
[Mon May 05 19:26:05 2008] [notice] caught SIGTERM, shutting down
[Mon May 05 19:30:54 2008] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon May 05 19:30:54 2008] [notice] Digest: generating secret for digest authentication ...
[Mon May 05 19:30:54 2008] [notice] Digest: done
[Mon May 05 19:30:54 2008] [notice] LDAP: Built with OpenLDAP LDAP SDK
[Mon May 05 19:30:54 2008] [notice] LDAP: SSL support unavailable

My memory limit in the php.ini file was set to 256MB!

Looks like PHP is actually using a 32MB limit ("Allowed memory size of 33554432 ..."). Does an (Apache?) daemon need to be restarted for the php.ini file to take effect? To check your settings, this little PHP script will print an HTML page of all the active system settings (search on "memory_limit").

-Ethan

Regards,
Pasha

Cheers,
Josh

On May 7, 2008, at 4:49 AM, Pavel Shamis (Pasha) wrote:

Hi,

I upgraded the server side (the MTT is still running, so I don't know if the problem was resolved). During the upgrade I had some problems with the submit/index.php script: it had some duplicated functions and some of them were broken. Please review the attached patch.

Pasha

Ethan Mallove wrote:

On Tue, May/06/2008 06:29:33PM, Pavel Shamis (Pasha) wrote:

I'm not sure which cron jobs you're referring to. Do you mean these?
https://svn.open-mpi.org/trac/mtt/browser/trunk/server/php/cron

I talked about this one: https://svn.open-mpi.org/trac/mtt/wiki/ServerMaintenance

I'm guessing you would only be concerned with the below periodic-maintenance.pl script, which just runs ANALYZE/VACUUM queries. I think you can start that up whenever you want (and it should optimize the Reporter).

https://svn.open-mpi.org/trac/mtt/browser/trunk/server/sql/cron/periodic-maintenance.pl

-Ethan

The only things there are the regular mtt-resu...@open-mpi.org email alerts and some out-of-date DB monitoring junk. You can ignore that stuff.

Josh, are there some nightly (DB pruning/cleaning/vacuuming?) cron jobs that Pasha should be running?

-Ethan

Thanks.

Ethan Mallove wrote:

Hi Pasha,

I thought this issue was solved in r1119 (see below). Do you have the latest mtt/server scripts?

https://svn.open-mpi.org/trac/mtt/changeset/1119/trunk/server/php/submit

-Ethan

On Tue, May/06/2008 03:26:43PM, Pavel Shamis (Pasha) wrote:

About the