Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Ralph H Castain
Hi Sean Could you please clarify something? I'm a little confused by your comments about where things are running. I'm assuming that you mean everything works fine if you type the mpirun command on the head node and just let it launch on your compute nodes - that the problems only occur when you

Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-12 Thread Ralph H Castain
Hi Sean > [Sean] I'm working through the strace output to follow the progression on the > head node. It looks like mpirun consults '/bpfs/self' and determines that the > request is to be run on the local machine so it fork/execs 'orted' which then > runs 'hostname'. 'mpirun' didn't consult

Re: [OMPI users] Recursive use of "orterun"

2007-07-11 Thread Ralph H Castain
I'm unaware of any issues that would cause it to fail just because it is being run via that interface. The error message is telling us that the procs got launched, but then orterun went away unexpectedly. Are you seeing your procs complete? We do sometimes see that message due to a race condition

Re: [OMPI users] Recursive use of "orterun" (Ralph H Castain)

2007-07-11 Thread Ralph H Castain
put is from loading that module; the next thing in > the code is the os.system call to start orterun with 2 processors.) > > Also, there is absolutely no output from the second orterun-launched > program (even the first line does not execute.) > > Cheers, > > Lev > > > >

Re: [OMPI users] orte_pls_base_select fails

2007-07-18 Thread Ralph H Castain
On 7/18/07 9:49 AM, "Adam C Powell IV" wrote: > As mentioned, I'm running in a chroot environment, so rsh and ssh won't > work: "rsh localhost" will rsh into the primary local host environment, > not the chroot, which will fail. > > [The purpose is to be able to build

Re: [OMPI users] orte_pls_base_select fails

2007-07-18 Thread Ralph H Castain
Tim has proposed a clever fix that I had not thought of - just be aware that it could cause unexpected behavior at some point. Still, for what you are trying to do, that might meet your needs. Ralph On 7/18/07 11:44 AM, "Tim Prins" wrote: > Adam C Powell IV wrote: >> As

Re: [OMPI users] mpirun hanging followup

2007-07-18 Thread Ralph H Castain
On 7/18/07 11:46 AM, "Bill Johnstone" wrote: > --- Ralph Castain wrote: > >> No, the session directory is created in the tmpdir - we don't create >> anything anywhere else, nor do we write any executables anywhere. > > In the case where the TMPDIR env

Re: [OMPI users] mpirun hanging followup

2007-07-18 Thread Ralph H Castain
Hooray! Glad we could help track this down - sorry it was so hard to do so. To answer your questions: 1. Yes - ORTE should bail out gracefully. It definitely should not hang. I will log the problem and investigate. I believe I know where the problem lies, and it may already be fixed on our

Re: [OMPI users] OpenMPI start up problems

2007-07-19 Thread Ralph H Castain
I gather you are running under TM since you have a PBS_NODEFILE? If so, in 1.2 we setup to read that file directly - you cannot specify it on the command line. We will fix this in 1.3 so you can do both, but for now - under TM - you have to leave that "-machinefile $PBS_NODEFILE" off of the
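For illustration, a minimal sketch of the two invocations under a PBS/TM allocation (application name and process count are hypothetical):

    # Open MPI 1.2 under TM reads $PBS_NODEFILE directly:
    mpirun -np 8 ./my_app

    # This fails under TM in 1.2 (works again in 1.3):
    # mpirun -np 8 -machinefile $PBS_NODEFILE ./my_app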

Re: [OMPI users] orterun --bynode/--byslot problem

2007-07-23 Thread Ralph H Castain
No, byslot appears to be working just fine on our bproc clusters (it is the default mode). As you probably know, bproc is a little strange in how we launch - we have to launch the procs in "waves" that correspond to the number of procs on a node. In other words, the first "wave" launches a proc

Re: [OMPI users] orterun --bynode/--byslot problem

2007-07-23 Thread Ralph H Castain
Yes...it would indeed. On 7/23/07 9:03 AM, "Kelley, Sean" <sean.kel...@solers.com> wrote: > Would this logic be in the bproc pls component? > Sean > > > From: users-boun...@open-mpi.org on behalf of Ralph H Castain > Sent: Mon 7/23/2007 9:18 AM > T

Re: [OMPI users] mpi daemon

2007-08-02 Thread Ralph H Castain
The daemon's name is "orted" - one will be launched on each remote node as the application is started, but they only live for as long as the application is executing. Then they go away. On 8/2/07 12:47 PM, "Reuti" wrote: > Am 02.08.2007 um 18:32 schrieb Francesco

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Ralph H Castain
On 8/5/07 6:35 PM, "Glenn Carver" wrote: > I'd appreciate some advice and help on this one. We're having > serious problems running parallel applications on our cluster. After > each batch job finishes, we lose a certain amount of available > memory.

Re: [OMPI users] memory leaks on solaris

2007-08-06 Thread Ralph H Castain
> I would be curious if this helps. > > -DON > p.s. orte-clean does not exist in the ompi v1.2 branch, it is in the > trunk but I think there is an issue with it currently > > Ralph H Castain wrote: > >> >> On 8/5/07 6:35 PM, "Glenn Carver" <gl

Re: [OMPI users] Circumvent --host or dynamically read host info?

2007-08-30 Thread Ralph H Castain
I take it you are running in an rsh/ssh environment (as opposed to a managed environment like SLURM)? I'm afraid that you have to tell us -all- of the nodes that will be utilized in your job at the beginning (i.e., to mpirun). This requirement is planned to be relaxed in a later version, but that

Re: [OMPI users] Job does not quit even when the simulation dies

2007-11-07 Thread Ralph H Castain
As Jeff indicated, the degree of capability has improved over time - I'm not sure which version this represents. The type of failure also plays a major role in our ability to respond. If a process actually segfaults or dies, we usually pick that up pretty well and abort the rest of the job

Re: [OMPI users] Q: Problems launching MPMD applications? ('mca_oob_tcp_peer_try_connect' error 103)

2007-12-06 Thread Ralph H Castain
On 12/5/07 8:47 AM, "Brian Dobbins" wrote: > Hi Josh, > >> I believe the problem is that you are only applying the MCA >> parameters to the first app context instead of all of them: > > Thank you very much.. applying the parameters with -gmca works fine with the > test

Re: [OMPI users] ORTE_ERROR_LOG: Data unpack had inadequate space in file gpr_replica_cmd_processor.c at line 361

2007-12-14 Thread Ralph H Castain
ter > Moffett Field, CA 94035-1000 > > Fax: 415-604-3957 > > > If I try to use multiple nodes, I got the error messages: > ORTE_ERROR_LOG: Data unpack had inadequate space in file dss/dss_unpack.c at > line 90 > ORTE_ERROR_LOG: Data unpack had inadequate space in fil

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2007-12-17 Thread Ralph H Castain
On 12/12/07 5:46 AM, "Elena Zhebel" wrote: > > > Hello, > > > > I'm working on a MPI application where I'm using OpenMPI instead of MPICH. > > In my "master" program I call the function MPI::Intracomm::Spawn which spawns > "slave" processes. It is not

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2007-12-17 Thread Ralph H Castain
above). This may become available in a future release - TBD. Hope that helps Ralph > > Thanks and regards, > Elena > > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Ralph H Castain > Sent: Monday, December

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2007-12-18 Thread Ralph H Castain
eside - everything should work the same. Just as an FYI: the name of that environmental variable is going to change in the 1.3 release, but everything will still work the same. Hope that helps Ralph > > Thanks and regards, > Elena > > > -Original Message- > From:

Re: [OMPI users] Torque and OpenMPI 1.2

2007-12-18 Thread Ralph H Castain
Hate to be a party-pooper, but the answer is "no" in OpenMPI 1.2. We don't allow the use of a hostfile in a Torque environment in that version. We have changed this for v1.3, but you'll have to wait for that release. Sorry Ralph On 12/18/07 11:12 AM, "pat.o'bry...@exxonmobil.com"

Re: [OMPI users] Torque and OpenMPI 1.2

2007-12-19 Thread Ralph H Castain
>>>>> Terry, >>>>> Your suggestion worked. So long as I specifically state >>>>> "--without-tm", >>>>> the OpenMPI 1.2.4 build allows the use of "-hostfile". >>>> Apparently, by >>>>> default, OpenMPI 1.2.4 will incorporate To
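For reference, a hedged sketch of the workaround quoted above - rebuilding without TM support so that -hostfile is accepted (prefix and hostfile name are illustrative):

    ./configure --prefix=$HOME/openmpi-1.2.4 --without-tm
    make all install
    # Now an explicit hostfile is honored, even inside a Torque job:
    mpirun -np 8 -hostfile my_hosts ./my_app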

Re: [OMPI users] Torque and OpenMPI 1.2

2007-12-19 Thread Ralph H Castain
ra parms. Will 1.3 also carry the same > restrictions you list below? > Pat > > J.W. (Pat) O'Bryant,Jr. > Business Line Infrastructure > Technical Systems, HPC > Office: 713-431-7022 > > > > >

Re: [OMPI users] mpirun: specify multiple install prefixes

2007-12-20 Thread Ralph H Castain
I'm afraid not - nor is it in the plans for 1.3 either. I'm afraid it fell through the cracks as the needs inside the developer community moved into other channels. I'll raise the question internally and see if people feel we should do this. It wouldn't be hard to put it into 1.3 at this point,

Re: [OMPI users] Torque and OpenMPI 1.2

2007-12-20 Thread Ralph H Castain
>> Optimization and Uncertainty Estimation >> Sandia National Laboratories >> P.O. Box 5800, Mail Stop 1318 >> Albuquerque, NM 87185-1318 >> Voice: 505-284-8845, FAX: 505-284-2518 >> >>> -Original Message- >>> From: users-boun..

Re: [OMPI users] orte in persistent mode

2008-01-02 Thread Ralph H Castain
Hi Neeraj No, we still don't support having a persistent set of daemons acting as some kind of "virtual machine" like LAM/MPI did. We at one time had talked about adding it. However, our most recent efforts have actually taken us away from supporting that mode of operation. As a result, I very

Re: [OMPI users] Need explanation for the following ORTE error message

2008-01-23 Thread Ralph H Castain
On 1/23/08 8:26 AM, "David Gunter" wrote: > A user of one of our OMPI 1.2.3 builds encountered the following error > message during an MPI job run: > > ORTE_ERROR_LOG: File read failure in file > util/universe_setup_file_io.c at line 123 It means that at some point in the

[OMPI users] FW: problems with hostfile when doing MPMD

2008-04-14 Thread Ralph H Castain
Hi Jody I believe this was intended for the Users mailing list, so I'm sending the reply there. We do plan to provide more explanation on these in the 1.3 release - believe me, you are not alone in puzzling over all the configuration params! Many of us in the developer community also sometimes

Re: [OMPI users] Proper use of sigaction in Open MPI?

2008-04-24 Thread Ralph H Castain
I have never tested this before, so I could be wrong. However, my best guess is that the following is happening: 1. you trap the signal and do your cleanup. However, when your proc now exits, it does not exit with a status of "terminated-by-signal". Instead, it exits normally. 2. the local

Re: [OMPI users] specifying hosts in mpi_spawn()

2008-05-30 Thread Ralph H Castain
I'm afraid I cannot answer that question without first knowing what version of Open MPI you are using. Could you provide that info? Thanks Ralph On 5/29/08 6:41 PM, "Bruno Coutinho" wrote: > How mpi handles the host string passed in the info argument to >

Re: [OMPI users] specifying hosts in mpi_spawn()

2008-06-02 Thread Ralph H Castain
o another > version if necessary. > > > 2008/5/30 Ralph H Castain <r...@lanl.gov>: >> I'm afraid I cannot answer that question without first knowing what version >> of Open MPI you are using. Could you provide that info? >> >> Thanks >> Ralph >> >

Re: [OMPI users] Application Context and OpenMPI 1.2.4

2008-06-17 Thread Ralph H Castain
Hi Pat A friendly elf forwarded this to me, so please be sure to explicitly include me on any reply. Was that the only error message you received? I would have expected a trail of "error_log" outputs that would help me understand where this came from. If not, I can give you some debug flags to

Re: [OMPI users] SLURM and OpenMPI

2008-06-17 Thread Ralph H Castain
I can believe 1.2.x has problems in that regard. Some of that has nothing to do with slurm and reflects internal issues with 1.2. We have made it much more resistant to those problems in the upcoming 1.3 release, but there is no plan to retrofit those changes to 1.2. Part of the problem was that

Re: [OMPI users] SLURM and OpenMPI

2008-06-19 Thread Ralph H Castain
such failures, and in fact did it more effectively. For example if slurm > detects a node has failed, it will stop the job, allocate an additional > free node to make up the deficit, then relaunch. It is more difficult (to > put it mildly) for a job launcher to do that. > > Thanks again,

Re: [OMPI users] null characters in output

2008-06-19 Thread Ralph H Castain
is before? I am working on a simple test case, but > unfortunately have not found one that is deterministic so far. > > Thanks, > Federico > > -Original Message- > From: Ralph H Castain [mailto:r...@lanl.gov] > Sent: Tuesday, June 17, 2008 1:09 PM > To: Sacerdoti,

Re: [OMPI users] Displaying Selected MCA Modules

2008-06-23 Thread Ralph H Castain
I can guarantee bproc support isn't broken in 1.2 - we use it on several production machines every day, and it works fine. I heard of only one potential problem having to do with specifying multiple app_contexts on a cmd line, but we are still trying to confirm that it wasn't operator error. In

Re: [OMPI users] mca parameters: meaning and use

2008-06-26 Thread Ralph H Castain
Actually, I suspect the requestor was hoping for an explanation somewhat more illuminating than the terse comments output by ompi_info. ;-) Bottom line is "no". We have talked numerous times about the need to do this, but unfortunately there has been little accomplished. I doubt it will happen

Re: [OMPI users] Need some help regarding Linpack execution

2008-07-02 Thread Ralph H Castain
You also might want to resend this to the MPICH mailing list - this is the Open MPI mailing list ;-) On 7/2/08 8:03 AM, "Swamy Kandadai" wrote: > Hi: > Maybe you do not have 12 entries in your machine.list file. You need to have > at least np lines in your machine.list > >

Re: [OMPI users] mpirun w/ enable-mpi-threads spinning up cputime when app path is invalid

2008-07-02 Thread Ralph H Castain
Out of curiosity - what version of OMPI are you using? On 7/2/08 10:46 AM, "Steve Johnson" wrote: > If mpirun is given an application that isn't in the PATH, then instead of > exiting it prints the error that it failed to find the executable and then > proceeds to spin up cpu

Re: [OMPI users] mpirun w/ enable-mpi-threads spinning up cputime when app path is invalid

2008-07-02 Thread Ralph H Castain
Sorry - went to one of your links to get that info. We know OMPI 1.2.x isn't thread safe. This is unfortunately another example of it. Hopefully, 1.3 will be better. Ralph On 7/2/08 11:01 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Out of curiosity - what version of

Re: [OMPI users] ORTE_ERROR_LOG timeout

2008-07-08 Thread Ralph H Castain
Several things are going on here. First, this error message: > mpirun noticed that job rank 1 with PID 9658 on node mac1 exited on signal > 6 (Aborted). > 2 additional processes aborted (not shown) indicates that your application procs are aborting for some reason. The system is then attempting

Re: [OMPI users] Query regarding OMPI_MCA_ns_nds_vpid env variable

2008-07-11 Thread Ralph H Castain
This variable is only for internal use and has no applicability to a user. Basically, it is used by the local daemon to tell an application process its rank when launched. Note that it disappears in v1.3...so I wouldn't recommend looking for it. Is there something you are trying to do with it?

Re: [OMPI users] Query regarding OMPI_MCA_ns_nds_vpid env variable

2008-07-11 Thread Ralph H Castain
On 7/11/08 7:32 AM, "Ashley Pittman" <apitt...@concurrent-thinking.com> wrote: > On Fri, 2008-07-11 at 07:20 -0600, Ralph H Castain wrote: >> This variable is only for internal use and has no applicability to a user. >> Basically, it is used by the local daemon

Re: [OMPI users] Outputting rank and size for all outputs.

2008-07-11 Thread Ralph H Castain
reat, and are probably a little > nicer than my current setup. > > -Mark > > > On Jul 11, 2008, at 9:46 AM, Ralph H Castain wrote: > >> Adding the ability to tag stdout/err with the process rank is fairly >> simple. >> We are going to talk about this next week

Re: [OMPI users] Query regarding OMPI_MCA_ns_nds_vpid env variable

2008-07-11 Thread Ralph H Castain
On 7/11/08 7:50 AM, "Ashley Pittman" <apitt...@concurrent-thinking.com> wrote: > On Fri, 2008-07-11 at 07:42 -0600, Ralph H Castain wrote: >> >> >> On 7/11/08 7:32 AM, "Ashley Pittman" <apitt...@concurrent-thinking.com> >> wrote:

Re: [OMPI users] Query regarding OMPI_MCA_ns_nds_vpid env variable

2008-07-11 Thread Ralph H Castain
On 7/11/08 8:33 AM, "Ashley Pittman" <apitt...@concurrent-thinking.com> wrote: > On Fri, 2008-07-11 at 08:01 -0600, Ralph H Castain wrote: >>>> I believe this is partly what motivated the creation of the MPI envars - to >>>> create a vehicle

Re: [OMPI users] MPI_Comm_spawn error messages

2006-07-06 Thread Ralph H Castain
Hi Saadat Could you tell us something more about the system you are using? What type of processors, operating system, any resource manager (e.g., SLURM, PBS), etc? Thanks Ralph On 7/6/06 10:49 AM, "s anwar" wrote: > Good Day: > > I am getting the following error messages

Re: [OMPI users] MPI_Info for MPI_Open_port

2006-07-11 Thread Ralph H Castain
On 7/11/06 11:59 AM, "Edgar Gabriel" wrote: > Abhishek Agarwal wrote: >> Hello, >> >> Is there a way of providing a specific port number in MPI_Info when using a >> MPI_Open_port command so that clients know which port number to connect. > > the MPI port-name in Open MPI

Re: [OMPI users] OpenMPI / PBS / TM interaction

2006-08-03 Thread Ralph H Castain
Depending upon what version you are using, this could be resolved fairly simply. Check to see if your version supports the "nooversubscribe" command line option. If it does, then setting that option may (I believe) resolve the problem - at the least, it will only allow you to run one application
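A sketch of the suggestion, assuming the installed version supports the option (check first, as the reply itself hedges):

    # See whether this build knows the option:
    mpirun --help | grep oversubscribe
    # If so, refuse to place more processes than allocated slots:
    mpirun --nooversubscribe -np 4 ./my_app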

Re: [OMPI users] MPI_Comm_spawn_multiple and BProc

2006-09-27 Thread Ralph H Castain
Could you please clarify - what "Bproc kernel patch" are you referring to? Thanks Ralph On 9/27/06 2:37 AM, "laurent.po...@fr.thalesgroup.com" wrote: > Hi, > > I'm using MPI_Comm_spawn_multiple with Open MPI 1.1.1. > It used to work well, until I used the

Re: [OMPI users] job fails to terminate

2006-10-18 Thread Ralph H Castain
Hi Lydia Could you confirm the version you are using? I think there is a typo there. Also, could you tell us how you configured the code (the configure command line would be nice). Thanks Ralph On 10/18/06 11:03 AM, "Lydia Heck" wrote: > > I have recently installed

Re: [OMPI users] job fails to terminate

2006-10-20 Thread Ralph H Castain
Hi Lydia Thanks - that does help! Could you try this without threads? We have tried to make the system work with threads, but our testing has been limited. First thing I would try is to make sure that we aren't hitting a thread-lock. Thanks Ralph On 10/20/06 2:11 AM, "Lydia Heck"

Re: [OMPI users] users Digest, Vol 411, Issue 2

2006-10-20 Thread Ralph H Castain
--enable-mpi-threads \ >>> --enable-progress-threads \ >>> --with-threads=solaris > > all of them? > > Lydia > >> >> ---------- >> >> Message: 1 >> Date: Fri, 20

Re: [OMPI users] how do i link to .la library files?

2006-10-26 Thread Ralph H Castain
Easiest method is just to use the "mpicc" command to compile your code. It will automatically link you to the right libraries, include directories, etc. You can check the $prefix/bin directory to see all the compiler wrappers we provide. Ralph On 10/26/06 7:12 AM, "shane kennedy"
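For example (source file and output name are illustrative):

    # Compile and link in one step with the wrapper:
    mpicc -o my_app my_app.c
    # Inspect the underlying compiler line, include paths, and libraries:
    mpicc -showme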

Re: [OMPI users] mpirun crashes when compiled in 64-bit mode on Apple Mac Pro

2006-10-26 Thread Ralph H Castain
If you wouldn't mind, could you try it again after applying the attached patch? This looks like a problem we encountered on another release where something in the runtime didn't get initialized early enough. It only shows up in certain circumstances, but this seems to fix it. You can apply the

Re: [OMPI users] MPI_Comm_spawn multiple bproc support

2006-10-31 Thread Ralph H Castain
nor do I have one for comm_spawn_multiple that uses the "host" field. I can try to concoct something over the next few days, though, and verify that our code is working correctly. > > Regards. > > Herve > > Date: Mon, 30 Oct 2006 09:00:47 -0700 > From: Ralph H C

Re: [OMPI users] MPI_Comm_spawn multiple bproc support

2006-11-07 Thread Ralph H Castain
tion phase or if there is really a > incompatibility problem in open mpi. > > Thank you so much for all you support, I wish it is not succesful yet. > > Regards. > > Herve > > Date: Fri, 03 Nov 2006 14:10:20 -0700 > From: Ralph H Castain <r...@lanl.gov> > Subj

Re: [OMPI users] Pernode request

2006-12-13 Thread Ralph H Castain
On 12/12/06 9:18 AM, "Maestas, Christopher Daniel" wrote: > Ralph, > > I figured I should have run an mpi program ...here's what it does (seems > to be by-X-slot style): > --- > $ /apps/x86_64/system/mpiexec-0.82/bin/mpiexec -npernode 2 mpi_hello > Hello, I am node an41

Re: [OMPI users] Pernode request

2006-12-13 Thread Ralph H Castain
ping > byslot now. Dang those smp systems. :-) Ja, ist definitely confusing... > > -cdm > >> -Original Message- >> From: users-boun...@open-mpi.org >> [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain >> Sent: Wednesday, December 13, 2006 6:5

Re: [OMPI users] orted: command not found

2007-01-03 Thread Ralph H Castain
Hi Jose Sorry for entering the discussion late. From tracing the email thread, I somewhat gather the following: 1. you have installed Open MPI 1.1.2 on two 686 boxes 2. you created a hostfile on one of the nodes and execute mpirun from that node. You gave us a prefix indicating where we should

Re: [OMPI users] openmpi / mpirun problem on aix: poll failed with errno=25, opal_event_loop: ompi_evesel->dispatch() failed.

2007-01-09 Thread Ralph H Castain
Hi Michael I would suggest using the nightly snapshot off of the trunk - the poe module compiles correctly there. I suspect we need an update to bring that fix over to the 1.2 branch. Ralph On 1/9/07 7:55 AM, "Michael Marti" wrote: > Thanks Jeff for the hint. > >

Re: [OMPI users] Can't start more than one process in a node as normal user

2007-01-17 Thread Ralph H Castain
Hi Eddie Open MPI needs to create a temporary file system - what we call our "session directory" - where it stores things like the shared memory file. From this output, it appears that your /tmp directory is "locked" to root access only. You have three options for resolving this problem: (a)
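A sketch of one likely option - pointing the session directory at a user-writable location via TMPDIR (the directory is illustrative; the rest of the option list is cut off above):

    mkdir -p $HOME/tmp
    export TMPDIR=$HOME/tmp
    mpirun -np 2 ./my_app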

Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Ralph H Castain
sted in any > suggestion, semi-fixes, etc. which might help get to the bottom of this. Right > now: whether the daemons are indeed up and running, or if there are some that > are not (causing MPI_Init to hang). Thanks, Todd -Original > Message- From: users-boun...@open-mpi.org

Re: [OMPI users] Does Open MPI "Realy" support AIX?

2007-02-08 Thread Ralph H Castain
Hi Ali After conferring with my colleagues, it appears we don't have the cycles right now to really support AIX. As you have noted, the problem is with the io forwarding subsystem - a considerable issue. We will revise the web site to indicate this situation. We will provide an announcement of

Re: [OMPI users] Does Open MPI "Really" support AIX?

2007-02-13 Thread Ralph H Castain
itment (announced plan) from Open MPI group to make OMPI support > available for major UNIX and RTOS, > will make the Open MPI the leader in the market, and could open new doors for > R&D grants. > > Ali, > > > Ralph H Castain <r...@lanl.gov> > Sent by: use

Re: [OMPI users] Open MPI and PBS Pro 8

2007-02-13 Thread Ralph H Castain
On 2/13/07 11:30 AM, "Brock Palen" wrote: > On Feb 13, 2007, at 12:55 PM, Troy Telford wrote: > >> First, the good news: >> I've recently tried PBS Pro 8 with Open MPI 1.1.4. >> >> At least with PBS Pro version 8, you can (finally) do a dynamic/shared >> object for the TM

Re: [OMPI users] Open MPI and PBS Pro 8

2007-02-13 Thread Ralph H Castain
Oh, I should have made something clear - I believe those command line options aren't available in the 1.1 series. You'll have to upgrade to 1.2 (available in beta at the moment). On 2/13/07 12:20 PM, "Ralph H Castain" <r...@lanl.gov> wrote: > > > > On 2/13/07

Re: [OMPI users] MPI_Comm_Spawn

2007-02-27 Thread Ralph H Castain
Now that's interesting! There shouldn't be a limit, but to be honest, I've never tested that mode of operation - let me look into it and see. It sounds like there is some counter that is overflowing, but I'll look. Thanks Ralph On 2/27/07 8:15 AM, "rozzen.vinc...@fr.thalesgroup.com"

Re: [OMPI users] MPI_Comm_Spawn

2007-03-13 Thread Ralph H Castain
dd6 in main (argc=6, argv=0xb854) at main.c:13 >> (gdb) >> -Original Message- >> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On >> behalf of Tim Prins >> Sent: Monday, March 5, 2007 22:34 >> To: Open MPI Users >> Subject

Re: [OMPI users] Fun with threading

2007-03-15 Thread Ralph H Castain
I can't speak to the MPI problems mentioned in here as my area of focus is solely on the RTE. With that caveat, I can say that - despite the fact there is little thread safety testing in the system - I haven't heard of any trouble launching non-MPI apps. We do it regularly, in both threaded and

Re: [OMPI users] Orted freezes on launch of application

2007-03-15 Thread Ralph H Castain
the resulting output so we can figure out what is going on. Ralph On 3/13/07 9:09 AM, "David Minor" <davi...@orbotech.com> wrote: > with tar > > > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf > Of Ralph H Castain > Sent: Tuesd

Re: [OMPI users] Signal 13

2007-03-15 Thread Ralph H Castain
It isn't a /dev issue. The problem is likely that the system lacks sufficient permissions to either: 1. create the Open MPI session directory tree. We create a hierarchy of subdirectories for temporary storage used for things like your shared memory file - the location of the head of that tree

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-04 Thread Ralph H Castain
Hi Prakash I can't really test this solution as the Torque dynamic host allocator appears to be something you are adding to that system (so it isn't part of the released code). However, the attached code should cleanly add any nodes to any existing allocation known to OpenRTE. I hope to resume

Re: [OMPI users] MPI_Comm_Spawn

2007-04-04 Thread Ralph H Castain
sible. Threading support is VERY lightly tested, but I >> doubt it is the problem since it always fails after 31 spawns. >> >> Again, I have tried with these configure options and the same version >> of Open MPI and have still have been able to replicate this (after >> letti

Re: [OMPI users] Comm_connect: Data unpack would read past end of buffer

2018-08-03 Thread Ralph H Castain
The buffer being overrun isn’t anything to do with you - it’s an internal buffer used as part of creating the connections. It indicates a problem in OMPI. The 1.10 series is out of the support window, but if you want to stick with it you should at least update to the last release in that series

Re: [OMPI users] Settings oversubscribe as default?

2018-08-03 Thread Ralph H Castain
The equivalent MCA param is rmaps_base_oversubscribe=1. You can add OMPI_MCA_rmaps_base_oversubscribe to your environ, or set rmaps_base_oversubscribe in your default MCA param file. > On Aug 3, 2018, at 1:24 AM, Florian Lindner wrote: > > Hello, > > I can use --oversubscribe to enable
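Concretely, a sketch of both approaches (the per-user default param file path is shown as an assumption):

    # One-off, via the environment:
    export OMPI_MCA_rmaps_base_oversubscribe=1
    mpirun -np 8 ./my_app

    # Persistently, via the default MCA param file:
    echo "rmaps_base_oversubscribe = 1" >> $HOME/.openmpi/mca-params.conf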

Re: [OMPI users] local communicator and crash of the code

2018-08-03 Thread Ralph H Castain
Those two command lines look exactly the same to me - what am I missing? > On Aug 3, 2018, at 10:23 AM, Diego Avesani wrote: > > Dear all, > > I am experiencing a strange error. > > In my code I use three group communications: > MPI_COMM_WORLD > MPI_MASTERS_COMM > LOCAL_COMM > > which have

Re: [OMPI users] cannot run openmpi 2.1

2018-08-11 Thread Ralph H Castain
Put "oob=^usock” in your default mca param file, or add OMPI_MCA_oob=^usock to your environment > On Aug 11, 2018, at 5:54 AM, Kapetanakis Giannis > wrote: > > Hi, > > I'm struggling to get 2.1.x to work with our HPC. > > Version 1.8.8 and 3.x works fine. > > In 2.1.3 and 2.1.4 I get

[MTT users] Python client requires MTT_HOME

2018-08-14 Thread Ralph H Castain
Hello all During the telecon today, we decided to enforce a requirement in the Python client that MTT_HOME be set in the environment to point at the top of the MTT directory tree. This significantly simplified some code and seemed a reasonable minimum requirement for operation. The commit for

Re: [OMPI users] What happened to orte-submit resp. DVM?

2018-08-28 Thread Ralph H Castain
You must have some stale code because those tools no longer exist. Note that we are (gradually) replacing orte-dvm with PRRTE: https://github.com/pmix/prrte See the “how-to” guides for PRRTE towards the bottom of this page: https://pmix.org/support/how-to/

Re: [OMPI users] What happened to orte-submit resp. DVM?

2018-08-29 Thread Ralph H Castain
> On Aug 29, 2018, at 1:59 AM, Reuti wrote: > >> >> On Aug 29, 2018, at 4:46 AM, Ralph H Castain <r...@open-mpi.org> wrote: >> You must have some stale code because those tools no longer exist. > > Aha. This code is then by accident still

Re: [OMPI users] stdout/stderr question

2018-09-10 Thread Ralph H Castain
I’m not sure why this would be happening. These error outputs go through the “show_help” functionality, and we specifically target it at stderr: /* create an output stream for us */ OBJ_CONSTRUCT(&lds, opal_output_stream_t); lds.lds_want_stderr = true; orte_help_output =

Re: [OMPI users] stdout/stderr question

2018-09-10 Thread Ralph H Castain
job to be terminated. The first process to do so was: >>> >>> Process name: [[22380,1],0] >>> Exit code: 255 >>> -- >>> $ cat stdout >>> hello from 0 >

Re: [OMPI users] hwloc, OpenMPI and unsupported OSes and toolchains

2018-03-21 Thread Ralph H Castain
I don’t see how Open MPI can operate without pthreads > On Mar 19, 2018, at 3:23 PM, Gregory (tim) Kelly wrote: > > Hello Everyone, > I'm inquiring to find someone that can answer some multi-part questions about > hwloc, OpenMPI and an alternative OS and toolchain. I have a

[OMPI users] SC'18 PMIx BoF meeting

2018-10-15 Thread Ralph H Castain
Hello all [I’m sharing this on the OMPI mailing lists (as well as the PMIx one) as PMIx has become tightly integrated to the OMPI code since v2.0 was released] The PMIx Community will once again be hosting a Birds-of-a-Feather meeting at SuperComputing. This year, however, will be a little

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-12 Thread Ralph H Castain
ithub.com/open-mpi/ompi/pull/5846 >> Both of these should be in tonight's nightly snapshot. >> Thank you! >>> On Oct 5, 2018, at 5:45 AM, Ralph H Castain wrote: >>> >>> Please send Jeff and I the opal/mca/pmix/pmix4x/pmix/config.log again - >>>

Re: [OMPI users] Bug with Open-MPI Processor Count

2018-11-01 Thread Ralph H Castain
Set rmaps_base_verbose=10 for debugging output Sent from my iPhone > On Nov 1, 2018, at 9:31 AM, Adam LeBlanc wrote: > > The version by the way for Open-MPI is 3.1.2. > > -Adam LeBlanc > >> On Thu, Nov 1, 2018 at 12:05 PM Adam LeBlanc wrote: >> Hello, >> >> I am an employee of the UNH
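For example, a hedged sketch (application and process count are illustrative):

    mpirun --mca rmaps_base_verbose 10 -np 8 ./my_app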

Re: [OMPI users] Bug with Open-MPI Processor Count

2018-11-01 Thread Ralph H Castain
> = > > > Yes the hostfile is available on all nodes through an NFS mount for all of > our home directories. > >> On Thu, Nov 1, 2018 at 2:44 PM Adam LeBlanc wrote: >> >> >> -- Fo

Re: [OMPI users] Bug with Open-MPI Processor Count

2018-11-01 Thread Ralph H Castain
> -- Forwarded message -- > From: Ralph H Castain <r...@open-mpi.org> > Date: Thu, Nov 1, 2018 at 1:07 PM > Subject: Re: [OMPI users] Bug with Open-MPI Processor Count > To: Open MPI Users <users@lists.open-mpi.org> > > > Set r

Re: [OMPI users] OMPI 3.1.x, PMIx, SLURM, and mpiexec/mpirun

2018-11-12 Thread Ralph H Castain
mpirun should definitely still work in parallel with srun - they aren’t mutually exclusive. OMPI 3.1.2 contains PMIx v2.1.3. The problem here is that you built Slurm against PMIx v2.0.2, which is not cross-version capable. You can see the cross-version situation here:

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-03 Thread Ralph H Castain
looks like Siegmar passed --with-hwloc=internal. > > Open MPI's configure understood this and did the appropriate things. > PMIX's configure didn't. > > I think we need to add an adjustment into the PMIx configure.m4 in OMPI... > > >> On Oct 2, 2018, at 5:25 PM, Ralph

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-03 Thread Ralph H Castain
Did you configure OMPI --with-tm=? It looks like we didn’t build PBS support and so we only see one node with a single slot allocated to it. > On Oct 3, 2018, at 12:02 PM, Castellana Michele > wrote: > > Dear all, > I am having trouble running an MPI code across multiple cores on a new >

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-03 Thread Ralph H Castain
Actually, I see that you do have the tm components built, but they cannot be loaded because you are missing libcrypto from your LD_LIBRARY_PATH > On Oct 3, 2018, at 12:33 PM, Ralph H Castain wrote: > > Did you configure OMPI --with-tm=? It looks like we didn’t > build PBS suppo
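A sketch of how one might confirm and fix this - the component and library paths are illustrative and depend on the installation:

    # Check which shared libraries the TM component cannot resolve:
    ldd /opt/openmpi/lib/openmpi/mca_plm_tm.so | grep 'not found'
    # Make libcrypto visible before launching:
    export LD_LIBRARY_PATH=/usr/local/ssl/lib:$LD_LIBRARY_PATH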

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-05 Thread Ralph H Castain
> > On 10/3/18 8:14 PM, Ralph H Castain wrote: >> Jeff and I talked and believe the patch in >> https://github.com/open-mpi/ompi/pull/5836 should fix the problem. > > > Today I've installed openmpi-master-

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-02 Thread Ralph H Castain
Looks like PMIx failed to build - can you send the config.log? > On Oct 2, 2018, at 12:00 AM, Siegmar Gross > wrote: > > Hi, > > yesterday I've installed openmpi-v4.0.x-201809290241-a7e275c and > openmpi-master-201805080348-b39bbfb on my "SUSE Linux Enterprise Server > 12.3 (x86_64)" with Sun

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-02 Thread Ralph H Castain
it, but perhaps something is different about this environment. > On Oct 2, 2018, at 6:36 AM, Ralph H Castain wrote: > > Looks like PMIx failed to build - can you send the config.log? > >> On Oct 2, 2018, at 12:00 AM, Siegmar Gross >> wrote: >> >> Hi, >

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-02 Thread Ralph H Castain
> /lib64/libc.so.6 > libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6 > libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6 > libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6 > /lib64/libresolv.so.2: > libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6 >

Re: [OMPI users] issue compiling openmpi 3.2.1 with pmi and slurm

2018-10-10 Thread Ralph H Castain
It appears that the CPPFLAGS isn’t getting set correctly as the component didn’t find the Slurm PMI-1 header file. Perhaps it would help if we saw the config.log output so we can see where OMPI thought the file was located. > On Oct 10, 2018, at 6:44 AM, Ross, Daniel B. via users > wrote: >
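As a hedged sketch, one can point configure at the Slurm PMI installation explicitly (paths are illustrative):

    ./configure --with-slurm --with-pmi=/opt/slurm
    # Afterwards, confirm the header was located:
    grep -i 'pmi.h' config.log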

Re: [OMPI users] issue compiling openmpi 3.2.1 with pmi and slurm

2018-10-10 Thread Ralph H Castain
='' > opal_pmi1_LDFLAGS='' > opal_pmi1_LIBS='-lpmi' > opal_pmi1_rpath='' > opal_pmi2_CPPFLAGS='' > opal_pmi2_LDFLAGS='' > opal_pmi2_LIBS='-lpmi2' > opal_pmi2_rpath='' > opal_pmix_ext1x_CPPFLAGS='' > opal_pmix_ext1x_LDFLAGS='' > opal_pmix_ext1x_LIBS='' > opal_p
