Re: [OMPI users] How to keep multiple installations at same time
I checked with my colleague, who is one of the module developers. His response:

> That's a surprise to me?!
> I will admit that I'm a little slow on releases,
> but it's still quite active.

On 8/5/14 11:39 AM, Fabricio Cannini wrote:

On 05-08-2014 13:54, Ralph Castain wrote:

Check the repo - hasn't been touched in a very long time

Yes, the cvs repo hasn't been touched in a long, long time, but they have apparently migrated to git.

cvs: http://modules.cvs.sourceforge.net/viewvc/modules/
git: http://sourceforge.net/p/modules/git/ci/master/tree/

There is still activity on git, including patches for the newest Tcl version. It may not be bursting, but I wouldn't call it "dead". Yet. ;)

[ ]'s

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] current release?
Hi,

On Feb 21, Jeff Squyres announced Open MPI 1.6.4. However, on the Open MPI home page, 1.6.3 is still indicated as the current release. Going to the download page shows 1.6.4 as the current release, so I think the problem is isolated to the home page. Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] limiting tasks/ranks
Hi,

Is there a way to limit the number of tasks started by mpirun? For example, on our 48-core SMP, I'd like to limit MPI jobs to a maximum of 12 tasks. That is, "mpirun -np 16 ..." would return an error. Note that this is a strictly interactive system; no batch environment is available. I've just quickly scanned the MCA parameters:

ompi_info --param all all

and couldn't find the answer to my question. Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
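A possible approach, not from this thread and untested here: advertise a capped slot count in the default hostfile and refuse oversubscription, so mpirun errors out beyond 12 ranks. The path and host name below are illustrative:

# $PREFIX/etc/openmpi-default-hostfile
# advertise only 12 slots and cap hard at 12
localhost slots=12 max_slots=12

% mpirun --nooversubscribe -np 16 ./a.out   # should now be refused for lack of slots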
Re: [OMPI users] problems with 1.6
Hi Ralph,

Sorry for the false alarm, and thanks for the tip:

... version confusion where the mpirun being used doesn't match the backend daemons.

Yes, my test environment was wonky. All is well now.

On May 14, 2012, at 3:41 PM, David Turner wrote:
...
[c0667:24962] [[39579,1],11] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 118

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
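For anyone who finds this thread with the same symptom: a quick way to check for this kind of version skew (a generic sketch; the node name is illustrative) is to compare what the front end and the compute nodes resolve to:

% which mpirun
% ompi_info | grep "Open MPI:"
% ssh c0667 'which orted; ompi_info | grep "Open MPI:"'
# All of these should point into the same installation; a front-end mpirun
# paired with older backend daemons is what produces the "Data unpack had
# inadequate space" errors above.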
[OMPI users] problems with 1.6
Hi all,

I am having trouble with the newly-available 1.6 release (tar.gz). I built it with my "normal" configure options, with no obvious configure or make errors. I used both PGI 12.4 and GCC 4.7.0, under Scientific Linux 5.5. I then compiled my "normal" matrix-multiply test case. Upon execution, I get (with either compiler):

[c0667:24962] [[39579,1],11] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 118
[c0667:24962] [[39579,1],11] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
[c0667:24966] [[39579,1],15] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/nidmap.c at line 118
[c0667:24966] [[39579,1],15] ORTE_ERROR_LOG: Data unpack had inadequate space in file ess_env_module.c at line 174
...
It looks like orte_init failed for some reason; ...
...
It looks like MPI_INIT failed for some reason; ...

I can provide additional details if needed, but again: I did nothing different than what I have done with previous OMPI and compiler releases. Thoughts? Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] open-mpi.org site
Hi all,

Currently getting "You don't have permission to access / on this server" on the www.open-mpi.org website.

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
Re: [OMPI users] UC EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
Indeed, my terminology is inexact. I believe you are correct; our diskless nodes use tmpfs, not ramdisk. Thanks for the clarification!

On 11/4/11 11:00 AM, Rushton Martin wrote:

There appears to be some confusion about ramdisks and tmpfs. A ramdisk sets aside a fixed amount of memory for its exclusive use, so that a file being written to ramdisk goes first to the cache, then to ramdisk, and may exist in both for some time. tmpfs, however, opens up the cache to programs, so that a file being written goes to cache and stays there. The "size" of a tmpfs pseudo-disk is the maximum it can grow to (which, according to the mount man page, defaults to 50% of memory). Hence only enough memory to hold the data is actually used, which ties up with David Turner's figures.

You can easily tell which method is in use from df. A traditional ramdisk will appear as /dev/ramN (N = 0, 1 ...) whereas a tmpfs device will be a simple name, often tmpfs. I would guess that the single "-" in David's df command is precisely this. On our diskless nodes root shows as device compute_x86_64, whilst /tmp, /dev/shm and /var/tmp show as "none".

HTH,
Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrush...@qinetiq.com
www.QinetiQ.com
QinetiQ - Delivering customer-focused solutions
Please consider the environment before printing this email.

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L
Sent: 04 November 2011 16:19
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

OK, I wouldn't have guessed that the space for /tmp isn't actually in RAM until it's needed. That's the key piece of knowledge I was missing; I really appreciate it. So you can allow /tmp to be reasonably sized, but if you aren't actually using it, then it doesn't take up 11 GB of RAM. And you prevent users from crashing the node by setting the memory limit to 4 GB less than the available memory. Got it.

I agree with your earlier comment: these are fairly common systems now. We have program- and owner-specific disks where I work, and after the program ends, the disks are archived or destroyed. Before the stateless configuration option, the entire computer, nodes and switches as well as disks, were archived or destroyed after each program. Not too cost-effective.

Is this a reasonable final summary? OpenMPI uses temporary files in such a way that it is performance-critical that these so-called session files, used for shared-memory communications, be "local". For state-less clusters, this means the node image must include a /tmp or /wrk partition, intelligently sized so as not to enable an application to exhaust the physical memory of the node, and care must be taken not to mask this in-memory /tmp with an NFS-mounted filesystem. It is not uncommon for cluster enablers to exclude /tmp from a typical base Linux filesystem image, or to mount it over NFS, as a means of providing users with a larger /tmp that is not limited to a fraction of the node's physical memory, or to avoid garbage accumulation in /tmp taking up physical RAM. But not having /tmp, or mounting it over NFS, is not a viable stateless-node configuration option if you intend to run OpenMPI. Instead you could have a /bigtmp which is NFS-mounted and a /tmp which is local, for example.

Starting in OpenMPI 1.7.x, shared-memory communication will no longer go through memory-mapped files, and vendors/users will no longer need to be vigilant concerning this OpenMPI performance requirement on stateless node configuration.

Is that a reasonable summary? If so, would it be helpful to include this as an FAQ entry under the General category? Or the "shared memory" category? Or the "troubleshooting" category?

Thanks

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of David Turner
Sent: Friday, November 04, 2011 1:38 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
...
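For concreteness, a node image can size its in-memory /tmp explicitly with a tmpfs entry; a minimal sketch (the 16g cap is illustrative, not a figure from this thread):

# /etc/fstab on the diskless node image
# tmpfs consumes RAM only as files are actually written; "size=" is merely a ceiling
tmpfs  /tmp  tmpfs  size=16g,mode=1777  0 0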
Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
I should have been more careful. When we first started using OpenMPI, version 1.4.1, there was a bug that caused session directories to be left behind. This was fixed in subsequent releases (and via a patch for 1.4.1). Our batch epilogue still removes everything in /tmp that belongs to the owner of the batch job. It is invoked after the user's application has terminated, so the session directories are already gone by that time. Sorry for the confusion!

On 11/4/11 3:43 AM, TERRY DONTJE wrote:

David, are you saying your jobs consistently leave behind session files after the job exits? It really shouldn't, even in the case when a job aborts; I thought mpirun took great pains to clean up after itself. Can you tell us what version of OMPI you are running with? I think I could see a kill -9 of mpirun and the processes below it causing turds to be left behind.

--td

On 11/4/2011 2:37 AM, David Turner wrote:

% df /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
-                     12330084    822848  11507236   7% /
% df /
Filesystem           1K-blocks      Used Available Use% Mounted on
-                     12330084    822848  11507236   7% /

That works out to 11GB. But... The compute nodes have 24GB. Freshly booted, about 3.2GB is consumed by the kernel, various services, and the root file system. At this time, usage of /tmp is essentially nil. We set user memory limits to 20GB.

I would imagine that the size of the session directories depends on a number of factors; perhaps the developers can comment on that. I have only seen total sizes in the 10s of MBs on our 8-node, 24GB nodes. As long as they're removed after each job, they don't really compete with the application for available memory.

On 11/3/11 8:40 PM, Ed Blosch wrote:

Thanks very much, exactly what I wanted to hear. How big is /tmp?

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of David Turner
Sent: Thursday, November 03, 2011 6:36 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

I'm not a systems guy, but I'll pitch in anyway. On our cluster, all the compute nodes are completely diskless. The root file system, including /tmp, resides in memory (ramdisk). OpenMPI puts these session directories therein. All our jobs run through a batch system (torque). At the conclusion of each batch job, an epilogue process runs that removes all files belonging to the owner of the current batch job from /tmp (and also looks for and kills orphan processes belonging to the user). This epilogue had to be written by our systems staff. I believe this is a fairly common configuration for diskless clusters.

On 11/3/11 4:09 PM, Blosch, Edwin L wrote:
...
Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
% df /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
-                     12330084    822848  11507236   7% /
% df /
Filesystem           1K-blocks      Used Available Use% Mounted on
-                     12330084    822848  11507236   7% /

That works out to 11GB. But... The compute nodes have 24GB. Freshly booted, about 3.2GB is consumed by the kernel, various services, and the root file system. At this time, usage of /tmp is essentially nil. We set user memory limits to 20GB.

I would imagine that the size of the session directories depends on a number of factors; perhaps the developers can comment on that. I have only seen total sizes in the 10s of MBs on our 8-node, 24GB nodes. As long as they're removed after each job, they don't really compete with the application for available memory.

On 11/3/11 8:40 PM, Ed Blosch wrote:

Thanks very much, exactly what I wanted to hear. How big is /tmp?

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of David Turner
Sent: Thursday, November 03, 2011 6:36 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
...
Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
I'm not a systems guy, but I'll pitch in anyway. On our cluster, all the compute nodes are completely diskless. The root file system, including /tmp, resides in memory (ramdisk). OpenMPI puts these session directories therein. All our jobs run through a batch system (torque). At the conclusion of each batch job, an epilogue process runs that removes all files belonging to the owner of the current batch job from /tmp (and also looks for and kills orphan processes belonging to the user). This epilogue had to be written by our systems staff. I believe this is a fairly common configuration for diskless clusters.

On 11/3/11 4:09 PM, Blosch, Edwin L wrote:

Thanks for the help. A couple of follow-up questions; maybe this starts to go outside OpenMPI:

What's wrong with using /dev/shm? I think you said earlier in this thread that this was not a safe place.

If the NFS-mount point is moved from /tmp to /work, would a /tmp magically appear in the filesystem for a stateless node? How big would it be, given that there is no local disk, right? That may be something I have to ask the vendor, which I've tried, but they don't quite seem to get the question.

Thanks

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 03, 2011 5:22 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:

I might be missing something here. Is there a side-effect or performance loss if you don't use the sm btl? Why would it exist if there is a wholly equivalent alternative? What happens to traffic that is intended for another process on the same node?

There is a definite performance impact, and we wouldn't recommend doing what Eugene suggested if you care about performance. The correct solution here is to get your sys admin to make /tmp local. Making /tmp NFS-mounted across multiple nodes is a major "faux pas" in the Linux world - it should never be done, for the reasons stated by Jeff.

Thanks

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Eugene Loh
Sent: Thursday, November 03, 2011 1:23 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

Right. Actually "--mca btl ^sm". (Was missing "btl".)

On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:

I don't tell OpenMPI what BTLs to use. The default uses sm and puts a session file on /tmp, which is NFS-mounted and thus not a good choice. Are you suggesting something like --mca ^sm?

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Eugene Loh
Sent: Thursday, November 03, 2011 12:54 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

I've not been following closely. Why must one use shared-memory communications? How about using other BTLs in a "loopback" fashion?

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
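A minimal sketch of the sort of torque epilogue described above (illustrative only; production scripts should be reviewed carefully, and the positional arguments should be checked against your torque version's prologue/epilogue documentation, where $2 is conventionally the job owner's user name):

#!/bin/sh
# torque epilogue: runs as root on the node after each job completes
user="$2"
# remove anything the job owner left in /tmp, session directories included
find /tmp -user "$user" -exec rm -rf {} + 2>/dev/null
# kill any orphan processes still owned by the user
# (safe only on nodes dedicated to a single job at a time)
pkill -9 -u "$user"
exit 0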
[OMPI users] Displaying MAIN in Totalview
Hi,

About a month ago, this topic was discussed with no real resolution:

http://www.open-mpi.org/community/lists/users/2011/02/15538.php

We noticed the same problem (TV does not display the user's MAIN routine upon initial startup), and contacted the TV developers. They suggested a simple OMPI code modification, which we implemented and tested; it seems to work fine. Hopefully, this capability can be restored in future releases. Here is the body of our communication with the TV developers:

--
Interestingly enough, someone else asked this very same question recently, and I finally dug into it last week and figured out what was going on. TotalView publishes a public interface which allows any MPI implementor to set things up so that it should work fairly seamlessly with TotalView. I found that one of the defines in the interface is MPIR_force_to_main, and when we find this symbol defined in mpirun (or orterun in Open MPI's case), we spend a bit more effort to focus the source pane on the main routine.

As you may guess, this is NOT being defined in OpenMPI 1.4.2. It was being defined in the 1.2.x builds though, in a routine called totalview.c. OpenMPI has been re-worked significantly since then, and totalview.c has been replaced by debuggers.c in orte/tools/orterun. About line 130 to 140 (depending on any changes since my look at the 1.4.1 sources) you should find a number of MPIR_ symbols being defined:

struct MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
int MPIR_being_debugged = 0;
volatile int MPIR_debug_state = 0;
volatile int MPIR_i_am_starter = 0;
volatile int MPIR_partial_attach_ok = 1;

I believe you should be able to insert the line:

int MPIR_force_to_main = 0;

into this section, and then the behavior you are looking for should work after you rebuild OpenMPI. I haven't yet had the time to do that myself, but that was all that existed in the 1.2.x sources, and I know those achieved the desired effect. It's quite possible that someone realized the symbol was initialized but wasn't being used anyplace, so they just removed it, without realizing we were looking for it in the debugger.

When I pointed this out to the other user, he said he would try it out and pass it on to the Open MPI group. I just checked on that thread, and didn't see any update, so I passed on the info myself.
--

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
Re: [OMPI users] memory limits on remote nodes
Hi,

Various people contributed:

Isn't it possible to set this up in torque/moab directly? In SGE I would simply define h_vmem and it's per slot then; and with a tight integration all Open MPI processes will be children of sge_execd and the limit will be enforced.

I could be wrong, but I -think- the issue here is that the soft limits need to be set on a per-job basis.

This I also thought, and `qsub -l h_vmem=4G ...` should do it. It can be requested on a per-job basis (with further limits on a queue level if necessary).

Well, this sent me in the right direction. I believe h_vmem is SGE-specific, but our torque environment provides the "pmem" (physical memory) and "pvmem" (virtual memory) resources on a per-job basis. These seem to provide exactly the functionality we need. Sorry to bother you with an issue that ended up being independent of Open MPI!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
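For the archives, a sketch of this kind of torque request (the limits and node counts are illustrative; check your qsub man page for the exact resource names supported by your version):

% qsub -l nodes=2:ppn=4 -l pvmem=5gb job.sh
# pvmem: per-process virtual memory limit, applied to each task on every node
# pmem:  the corresponding per-process physical memory limit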
Re: [OMPI users] memory limits on remote nodes
Hi Ralph,

There is an MCA param that tells the orted to set its usage limits to the hard limit:

MCA opal: parameter "opal_set_max_sys_limits" (current value: <0>, data source: default value)
          Set to non-zero to automatically set any system-imposed limits to the maximum allowed

The orted could be used to set the soft limit down from that value on a per-job basis, but we didn't provide a mechanism for specifying it. Would be relatively easy to do, though. What version are you using? If I create a patch, would you be willing to test it?

1.4.2, with 1.4.1 available, and 1.4.3 waiting in the wings. I would love to test any patch you could come up with. The ability to set any valid limit to any valid value, applied equally to all processes, would go a long way in making our environment more stable. Thanks!
...

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
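For later readers: the parameter Ralph names above can be tested directly on the mpirun command line (a sketch; the rank count is illustrative):

% mpirun --mca opal_set_max_sys_limits 1 -np 8 ./a.out
# tells each orted to raise its soft limits to the system hard limits,
# which the MPI processes then inherit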
[OMPI users] memory limits on remote nodes
Hi,

We would like to set process memory limits (vmemoryuse, in csh terms) on remote processes. Our batch system is torque/moab. The nodes of our cluster each have 24GB of physical memory, of which 4GB is taken up by the kernel and the root file system. Note that these are diskless nodes, so no swap either. We can globally set the per-process limit to 2.5GB. This works fine if applications run "packed": 8 MPI tasks running on each 8-core node, for an aggregate limit of 20GB. However, if a job only wants to run 4 tasks, the soft limit can safely be raised to 5GB. 2 tasks, 10GB. 1 task, the full 20GB.

Upping the soft limit in the batch script itself only affects the "head node" of the job. Since limits are not part of the "environment", I can find no way to propagate them to remote nodes. If I understand how this all works, the remote processes are started by orted, and therefore inherit its limits. Is there any sort of orted configuration that can help here? Any other thoughts about how to approach this? Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
Re: [OMPI users] location of ompi libraries
Hi Jeff,

Thanks for the response. Reviewing my builds, I realized that for 1.4.2, I had configured using contrib/platform/lanl/tlcc/optimized-nopanasas, per Ralph Castain's suggestion. That file includes both:

enable_dlopen=no
enable_shared=yes
enable_static=yes

Here is my *real* issue. I am trying to test Voltaire's Fabric Collective Accelerator, which extends mca_component_path and adds a few additional .so files. It appears I must have enable_dlopen=yes for this to work, which makes sense. I assume that the shared/static settings above result in *both* .a and .so versions of the ompi libraries getting built. I'm not sure if this will affect my ability to use Voltaire's mca plugins, but I have determined that simply removing the enable_dlopen=no is not sufficient to restore all the ompi .so files. I assume (I haven't tried it yet) that removing the enable_static=yes will result in the ompi .so files getting created.

I guess I'm just looking for some guidance in the use of the above options. I have read many warnings on the ompi website about trying to link statically. Thanks!

On 10/5/10 7:17 AM, Jeff Squyres wrote:

It is more than likely that you compiled Open MPI with --enable-static and/or --disable-dlopen. In this case, all of Open MPI's plugins are slurped up into the libraries themselves (e.g., libmpi.so or libmpi.a). That's why everything continues to work properly.

On Oct 4, 2010, at 6:58 PM, David Turner wrote:
...

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
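For comparison, a plain configure along these lines should restore the plugin layout (a sketch; flag spellings can be confirmed with ./configure --help for your version):

./configure --enable-shared --disable-static CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 ...
# dlopen support is on by default, so components are built as individual .so
# plugins under lib/openmpi; per Jeff's note above, enable_static=yes and/or
# enable_dlopen=no are what pull them into the monolithic libraries instead.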
[OMPI users] location of ompi libraries
Hi,

In Open MPI 1.4.1, the directory lib/openmpi contains about 130 entries, including such things as mca_btl_openib.so. In my build of Open MPI 1.4.2, lib/openmpi contains exactly three items:

libompi_dbg_msgq.a
libompi_dbg_msgq.la
libompi_dbg_msgq.so

I have searched my 1.4.2 installation for mca_btl_openib.so, to no avail. And yet, 1.4.2 seems to work "fine". Is my installation broken, or is the organization significantly different between the two versions? A quick scan of the release notes didn't help. Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] compiler upgrades require openmpi rebuild?
Hi,

We have recently upgraded our default compiler suite from PGI 10.5 to PGI 10.8. We use the "module" system to manage third-party software. The module for PGI sets PATH and LD_LIBRARY_PATH. Using Open MPI 1.4.2, built with PGI 10.5, I have verified that changing PATH is sufficient for the Open MPI compiler wrappers to pick up version 10.8 of the PGI compilers. However, it appears that the 10.5 PGI libraries are "wired" into the wrappers somehow. So I get an executable that has been compiled with PGI 10.8 but linked against PGI 10.5 libraries.

Short of rebuilding Open MPI with PGI 10.8, is there any (safe, reliable) way to get the compiler wrappers to link against the PGI 10.8 libraries? Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
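Two standard wrapper mechanisms that may help diagnose or adjust this without a full rebuild (whether editing suffices depends on how the PGI paths were baked in at build time):

% mpicc --showme           # print the full command line the wrapper would run
% mpif90 --showme:link     # print just the link flags
# The wrappers read editable text files such as
#   $PREFIX/share/openmpi/mpif90-wrapper-data.txt
# whose linker_flags entries are where any hard-wired PGI 10.5 -rpath/-R
# settings would appear.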
Re: [OMPI users] problem with -npernode
Hi,

On 06/17/2010 03:34 PM, Ralph Castain wrote:

No more info required - it's a bug. Fixed and awaiting release of 1.4.3.

I downloaded openmpi-1.4.3a1r23261.tar.gz, dated June 9. It behaves the same as 1.4.2. Is there a newer version available for testing?

On Jun 17, 2010, at 3:50 PM, David Turner wrote:
...
% mpirun -np 16 -npernode 8 ./a.out
[c1146:15313] *** Process received signal ***
[c1146:15313] Signal: Segmentation fault (11)
[c1146:15313] Signal code: Address not mapped (1)
[c1146:15313] Failing at address: 0x50
[c1146:15313] *** End of error message ***
Segmentation fault
[c1138:26571] [[62315,0],1] routed:binomial: Connection to lifeline [[62315,0],0] lost
...

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] problem with -npernode
Hi,

Recently, Christopher Maestas reported a problem with -npernode in Open MPI 1.4.2 ("running a ompi 1.4.2 job with -np versus -npernode"). I have also encountered this problem, with a simple "hello, world" program:

% mpirun -np 16 ./a.out
myrank, icount =  0 16
myrank, icount =  2 16
myrank, icount =  5 16
myrank, icount =  7 16
myrank, icount =  1 16
myrank, icount =  4 16
myrank, icount =  6 16
myrank, icount =  3 16
myrank, icount =  8 16
myrank, icount =  9 16
myrank, icount = 10 16
myrank, icount = 12 16
myrank, icount = 13 16
myrank, icount = 15 16
myrank, icount = 11 16
myrank, icount = 14 16
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP

% mpirun -np 16 -npernode 8 ./a.out
[c1146:15313] *** Process received signal ***
[c1146:15313] Signal: Segmentation fault (11)
[c1146:15313] Signal code: Address not mapped (1)
[c1146:15313] Failing at address: 0x50
[c1146:15313] *** End of error message ***
Segmentation fault
[c1138:26571] [[62315,0],1] routed:binomial: Connection to lifeline [[62315,0],0] lost

% module swap openmpi openmpi/1.4.1
% mpirun -np 16 -npernode 8 ./a.out
myrank, icount =  8 16
myrank, icount = 13 16
myrank, icount = 10 16
myrank, icount = 11 16
myrank, icount = 15 16
myrank, icount = 14 16
myrank, icount = 12 16
myrank, icount =  5 16
myrank, icount =  2 16
myrank, icount =  3 16
myrank, icount =  1 16
myrank, icount =  0 16
myrank, icount =  9 16
myrank, icount =  6 16
myrank, icount =  7 16
myrank, icount =  4 16
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP

Compilers are PGI/10.5, OS is Scientific Linux 5.4, resource manager is torque 2.4.5. Please let me know if you need more information. Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] Threading models with openib
Hi all,

Please verify: if using the openib BTL, is the only supported threading model MPI_THREAD_SINGLE? Is there a timeline for full support of MPI_THREAD_MULTIPLE in Open MPI's openib BTL? Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
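A quick way to see what threading level a given build was configured for (output format as of the 1.4 series; the configure flag is a hedged suggestion, so check ./configure --help for your version):

% ompi_info | grep -i thread
  Thread support: posix (mpi: no, progress: no)
# "mpi: no" means the library was built for MPI_THREAD_SINGLE only;
# a build configured with --enable-mpi-threads reports "mpi: yes".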
Re: [OMPI users] excluding hosts
Hi Ralph,

Are you using a scheduler of some kind? If so, you can add this to your default mca param file:

Yes, we are running torque/moab.

orte_allocation_required = 1

This will prevent anyone running without having an allocation. You can also set

Ah. An "allocation". Not much info on this on the open-mpi website. I believe this is what we will want, to prevent mpirun on login nodes.

rmaps_base_no_schedule_local = 1

which tells mpirun not to schedule any MPI procs on the local node.

In our batch environment, mpirun will be executing on one of the compute nodes. That is, we don't have dedicated MOM nodes. Therefore, I think we will want to schedule (at least) one MPI task on the same node. Actually, when somebody wants to run (for example) 256 tasks packed on 32 8-core nodes, I think we'll need mpirun to share a *core* with one of the MPI tasks. The above option would prevent that, correct?

Does that solve the problem?

I'll give it a try and let you know. Thanks!

Ralph

On Apr 6, 2010, at 3:28 PM, David Turner wrote:
...

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
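For reference, both settings go in the system-wide default parameter file, so a sketch for the login nodes might look like this (path depends on the install prefix):

# $PREFIX/etc/openmpi-mca-params.conf
orte_allocation_required = 1        # refuse to start without a scheduler allocation
# optional; see the caveat above about mpirun sharing a node with its MPI tasks:
# rmaps_base_no_schedule_local = 1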
[OMPI users] excluding hosts
Hi,

Our cluster has a handful of login nodes, and then a bunch of compute nodes. OpenMPI is installed in a global file system visible from both sets of nodes. This means users can type "mpirun" from an interactive prompt and quickly oversubscribe the login node. So, is there a way to explicitly exclude hosts from consideration for mpirun? To prevent (what is usually accidental) running of MPI apps on our login nodes? Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] IPoIB
Hi all,

Please, some clarification. I have built Open MPI 1.4.1 against our IB verbs layer, and all seems well. But a question has come up about IPoIB. While all communications are using the "native" IB interface (verbs), will mpirun use IPoIB during job launch and teardown? If it matters, resource allocation is via torque. Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
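One knob that may help answer this empirically (the interface name is illustrative, and this governs Open MPI's out-of-band TCP control channel rather than MPI traffic itself):

% mpirun --mca oob_tcp_if_include ib0 ...   # force control traffic onto IPoIB
% mpirun --mca oob_tcp_if_exclude ib0 ...   # or keep it off IPoIB entirely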
Re: [OMPI users] Questions on /tmp/openmpi-sessions-userid directory
Hi Ralph,

> ... that is fixed in the upcoming 1.4.2 release.

Can you say when this release will be generally available? Proliferating session directories are a problem for us too.

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
Re: [OMPI users] sm btl choices
Hi Ralph,

Which version of OMPI are you using? We know that the 1.2 series was unreliable about removing the session directories, but 1.3 and above appear to be quite good about it. If you are having problems with the 1.3 or 1.4 series, I would definitely like to know about it.

When I was at LANL, I ran a number of tests in exactly this configuration. While the sm btl did provide some performance advantage, it wasn't very much (the bandwidth was only about 10% greater, and the latency wasn't all that different either). I set the default configuration for users to include sm, as 10% isn't something to sneer at, but you could disable it without an enormous impact.

I realize I have another question about this. When you say "exactly" this configuration, do you mean the mmap files were backed to /tmp via ramdisk, or to a remote file system over the communications fabric? We have historically redefined TMPDIR to point somewhere other than /tmp, and have told our users *never* to use /tmp (if possible). I suppose that if OMPI cleans up after itself, and we use a prologue/epilogue, and regular scrubbing, we can keep /tmp under control.

Another option would be to run an epilog that hammers the session directory. That's what LANL does, even though we didn't see much trouble with cleanup starting with the 1.3 series (still have a bunch of users stuck on 1.2). Depending on what environment you are running, you might contact folks there and get a copy of their epilog script.

On Mar 1, 2010, at 1:42 AM, David Turner wrote:
...

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
Re: [OMPI users] sm btl choices
On 3/1/10 1:51 AM, Ralph Castain wrote:

Which version of OMPI are you using? We know that the 1.2 series was unreliable about removing the session directories, but 1.3 and above appear to be quite good about it. If you are having problems with the 1.3 or 1.4 series, I would definitely like to know about it.

Oops; sorry! OMPI 1.4.1, compiled with PGI 10.0 compilers, running on Scientific Linux 5.4, OFED 1.4.2. The session directories are *frequently* left behind. I have not really tried to characterize under what circumstances they are removed. But please confirm: they *should* be removed by OMPI.

When I was at LANL, I ran a number of tests in exactly this configuration. While the sm btl did provide some performance advantage, it wasn't very much (the bandwidth was only about 10% greater, and the latency wasn't all that different either). I set the default configuration for users to include sm, as 10% isn't something to sneer at, but you could disable it without an enormous impact.

I'd prefer to provide as much performance as possible, also.

Another option would be to run an epilog that hammers the session directory. That's what LANL does, even though we didn't see much trouble with cleanup starting with the 1.3 series (still have a bunch of users stuck on 1.2). Depending on what environment you are running, you might contact folks there and get a copy of their epilog script.

Yes, we are already planning our prologues and epilogues, just haven't implemented them yet. Even if I can find and fix a reason why OMPI is currently not doing this, we will probably do it in an epilogue anyway. Thanks for your help!

On Mar 1, 2010, at 1:42 AM, David Turner wrote:
...

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] sm btl choices
Hi all,

Running on a large cluster of 8-core nodes. I understand that the SM BTL is a "good thing". But I'm curious about its use of memory-mapped files. I believe these files will be in $TMPDIR, which defaults to /tmp. In our cluster, the compute nodes are stateless, so /tmp is actually in RAM. Keeping memory-mapped "files" in memory seems kind of circular, although I know little about these things. A bigger problem is that it appears OMPI does not remove the files upon completion.

Another option is to redefine $TMPDIR to point to a "real" file system. In our cluster, all the available file systems are accessed over the IB fabric. So it seems that there will be IB traffic, even though the point of the SM BTL is to avoid this traffic.

Given the above two constraints, might it just be better to disable the SM BTL entirely, and use the IB BTL even within a node? Of course, the "self" BTL should still be used if appropriate. Any thoughts clarifying these issues would be greatly appreciated. Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
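For anyone wanting to experiment along these lines, the relevant knobs look roughly like this (a sketch; parameter names can be confirmed with ompi_info --param all all):

% mpirun --mca btl ^sm -np 16 ./a.out          # disable the sm BTL; self and openib remain
% mpirun --mca orte_tmpdir_base /scratch ...   # relocate the session directory tree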
[OMPI users] which ofed rpms for openmpi
--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
Re: [OMPI users] Problem building OpenMPI with PGI compilers
Jeff,

Subject: Re: [OMPI users] Problem building OpenMPI with PGI compilers
From: Jeff Squyres <jsquy...@cisco.com>
Date: Thu, 10 Dec 2009 10:20:32 -0500
To: Open MPI Users <us...@open-mpi.org>

... Actually, I was wrong. You *can't* just take the SVN trunk's autogen.sh and use it with a v1.4 tarball (for various uninteresting reasons). Given that we haven't moved this patch to the v1.4 branch yet (i.e., it's not yet in a nightly v1.4 tarball), probably the easiest thing to do is to apply the attached patch to a v1.4 tarball. I tried it with my PGI 10.0 install and it seems to work. So -- forget everything about autogen.sh and just apply the attached patch.

Thanks; I was able to complete the make process using the provided patch.

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
[OMPI users] Problem building OpenMPI with PGI compilers
Hi all,

My first ever attempt to build OpenMPI. Platform is Sun Sunfire x4600 M2 servers, running Scientific Linux version 5.3. Trying to build OpenMPI 1.4 (as of today; same problems yesterday with 1.3.4). Trying to use PGI version 10.0. As a first attempt, I set CC, CXX, F77, and FC, then did "configure" and "make". Make ends with:

libtool: link: pgCC --prelink_objects --instantiation_dir Template.dir .libs/mpicxx.o .libs/intercepts.o .libs/comm.o .libs/datatype.o .libs/win.o .libs/file.o -Wl,--rpath -Wl,/project/projectdirs/mpccc/usg/software/tnt/openmpi/openmpi-1.4/ompi/.libs -Wl,--rpath -Wl,/project/projectdirs/mpccc/usg/software/tnt/openmpi/openmpi-1.4/orte/.libs -Wl,--rpath -Wl,/project/projectdirs/mpccc/usg/software/tnt/openmpi/openmpi-1.4/opal/.libs -Wl,--rpath -Wl,/global/common/tesla/usg/openmpi/1.4/lib -L/project/projectdirs/mpccc/usg/software/tnt/openmpi/openmpi-1.4/orte/.libs -L/project/projectdirs/mpccc/usg/software/tnt/openmpi/openmpi-1.4/opal/.libs ../../../ompi/.libs/libmpi.so /project/projectdirs/mpccc/usg/software/tnt/openmpi/openmpi-1.4/orte/.libs/libopen-rte.so /project/projectdirs/mpccc/usg/software/tnt/openmpi/openmpi-1.4/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lpthread
pgCC-Error-Unknown switch: --instantiation_dir
make[2]: *** [libmpi_cxx.la] Error 1

So I Googled "instantiation_dir openmpi", which led me to:

http://cia.vc/stats/project/OMPI?s_message=3

where I see:

There's still something wrong with the C++ support, however; I get errors about a template directory switch when compiling the C++ MPI bindings (doesn't happen with PGI 9.0). Still working on this... it feels like it's still a Libtool issue, because OMPI is not putting in this compiler flag as far as I can tell:

{{{
/bin/sh ../../../libtool --tag=CXX --mode=link pgCC -g -version-info 0:0:0 -export-dynamic -o libmpi_cxx.la -rpath /home/jsquyres/bogus/lib mpicxx.lo intercepts.lo comm.lo datatype.lo win.lo file.lo ../../../ompi/libmpi.la -lnsl -lutil -lpthread
libtool: link: tpldir=Template.dir
libtool: link: rm -rf Template.dir
libtool: link: pgCC --prelink_objects --instantiation_dir Template.dir .libs/mpicxx.o .libs/intercepts.o .libs/comm.o .libs/datatype.o .libs/win.o .libs/file.o -Wl,--rpath -Wl,/users/jsquyres/svn/ompi-1.3/ompi/.libs -Wl,--rpath -Wl,/users/jsquyres/svn/ompi-1.3/orte/.libs -Wl,--rpath -Wl,/users/jsquyres/svn/ompi-1.3/opal/.libs -Wl,--rpath -Wl,/home/jsquyres/bogus/lib -L/users/jsquyres/svn/ompi-1.3/orte/.libs -L/users/jsquyres/svn/ompi-1.3/opal/.libs ../../../ompi/.libs/libmpi.so /users/jsquyres/svn/ompi-1.3/orte/.libs/libopen-rte.so /users/jsquyres/svn/ompi-1.3/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lpthread
pgCC-Error-Unknown switch: --instantiation_dir
make: *** [libmpi_cxx.la] Error 1
}}}

I noticed the comment "doesn't happen with PGI 9.0", so I re-did the entire process with PGI 9.0 instead of 10.0, but I get the same error! Any suggestions? Let me know if I should provide full copies of the configure and make output. Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
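In case it helps anyone who hits this before applying the patch discussed elsewhere in this thread: since the failure occurs only while linking the C++ bindings, building without them is a plausible stopgap. This is an untested assumption, though the flag does exist in this era's configure (see ./configure --help):

% ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --disable-mpi-cxx ...
# skips libmpi_cxx entirely; only viable if no application needs the MPI C++ bindings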