[hwloc-devel] Create success (hwloc git 1.11.3-43-g8f0e3cd)

2016-08-10 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.11.3-43-g8f0e3cd Start time: Wed Aug 10 08:50:41 PDT 2016 End time: Wed Aug 10 08:53:56 PDT 2016 Your friendly daemon, Cyrador ___ hwloc-devel mailing list

[hwloc-devel] Create success (hwloc git dev-1222-gdbe0cfd)

2016-08-10 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-1222-gdbe0cfd Start time: Wed Aug 10 18:01:05 PDT 2016 End time: Wed Aug 10 18:04:43 PDT 2016 Your friendly daemon, Cyrador ___ hwloc-devel mailing list

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-04 Thread Ralph H. Castain
Looks okay to me Brian - I went ahead and filed the CMR and sent it on to Brad for approval. Ralph > On Tue, 3 Mar 2009, Brian W. Barrett wrote: > >> On Tue, 3 Mar 2009, Jeff Squyres wrote: >> >>> 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only >>> difference between rc3

Re: [O-MPI devel] New Bproc Components

2005-07-28 Thread Ralph H. Castain
Very interesting! Appreciate the info. My numbers are slightly better - as I've indicated, there is an NxN message exchange currently in the system that needs to be removed. With that commented out, the system scales roughly linearly with the number of processes. At 04:31 PM 7/28/2005, you wrote:
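A rough message count shows why removing the NxN exchange leaves roughly linear scaling. This is an illustrative estimate, not a measurement. It assumes that the exchange means each of N processes sends one message to every other process, and it models the remaining startup traffic as a hypothetical fixed number c of messages per process:

```latex
M_{\text{all-to-all}}(N) = N(N-1) = O(N^2), \qquad
M_{\text{per-process}}(N) = cN = O(N)
```

At N = 1000, the exchange alone accounts for 1000 x 999 = 999,000 messages, compared with 1000c for the per-process traffic. The quadratic term therefore dominates as N grows.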

[O-MPI devel] New simplified registry API's

2005-08-02 Thread Ralph H. Castain
Yo all Per last week's discussions, I have created a set of new simplified APIs for the registry. These include: 1. orte_gpr.put_1 and orte_gpr.put_N: these allow you to put data on the registry without having to define your own value structures. They take a segment name, a NULL-terminated

Re: [O-MPI devel] broken rmgr?

2005-08-03 Thread Ralph H. Castain
Hmmm...it was running for me last night and (I thought) this morning, but I'll test it again and see if I can reproduce the problem. Could be something crept in there. At 06:28 PM 8/3/2005, you wrote: I just noticed that mpirun hangs forever inside the orte_rmgr.finalize() routine. AFAIK

Re: [O-MPI devel] compile error

2005-08-08 Thread Ralph H. Castain
Very interesting - it built fine for me (building static). However, the ns_base_nds.c file is "stale", so I just committed a "delete" of that file. It shouldn't have been building anyway as it isn't in the Makefile. My guess, therefore, is that you are building dynamically and are encountering

[O-MPI devel] RHC development plans

2005-09-01 Thread Ralph H. Castain
Yo folks Several people have asked lately what I am planning to do next on ORTE. Just to help maintain coordination, here is my current list of planned activities (in priority order). Any requests/suggestions are welcomed - this isn't in concrete by any means. 1. Add George's architecture

Re: [O-MPI devel] RHC development plans

2005-09-01 Thread Ralph H. Castain
Yo folks I have now completed the first three of these items. I believe this brings ORTE to a stage that is - at the least - very close to release quality. There are a few memory leaks left (oob and iof subsystems), but I'm not as familiar with those and have asked for help. Barring any

[O-MPI devel] Startup/shutdown performance

2005-09-13 Thread Ralph H. Castain
Yo folks Josh ran some tests for me on Odin earlier today - the results show a major improvement in our startup/shutdown performance. As you may recall, our times grew roughly exponentially before - as the attached graph shows, they now grow roughly linearly. The data also shows that the

[O-MPI devel] New build methodology

2005-11-15 Thread Ralph H. Castain
Yo folks While I generally find the new build methodology (i.e., reducing the number of makefiles) has little impact on me, I have now encountered one problem that causes a significant difficulty. In trying to work on a revised data packing system for the orte part of the branch, I now find

Re: [O-MPI devel] New build methodology

2005-11-15 Thread Ralph H. Castain
Your proposed change would help a great deal - thanks! Can you steer me through the change? At 07:33 AM 11/15/2005, you wrote: Hi Ralph, * Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:12:38PM CET: > > While I generally find the new build methodology (i.e., reducing the

Re: [O-MPI devel] New build methodology

2005-11-20 Thread Ralph H. Castain
ut we have definitely made it harder to develop a subsystem. Is that really a good trade? I wonder. Ralph At 08:08 AM 11/15/2005, you wrote: * Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:45:26PM CET: > At 07:33 AM 11/15/2005, you wrote: > > > >Would it help if onl

Re: [O-MPI devel] New build methodology

2005-11-21 Thread Ralph H. Castain
un "make" in a framework directory, it just builds the stuff in base without recursing. Of course, you can't run make in the base/ directory, but since running make in the framework directory is essentially equivalent, it doesn't exactly matter. Brian On Nov 20, 2005, at 10:04 PM, Ralph H. Casta

Re: [O-MPI devel] New build methodology

2005-11-21 Thread Ralph H. Castain
Hi Ralf Appreciate the offer, but I think at this stage it isn't worth the hassle. We either implement a long-term fix, or just pay the price. Thanks though Ralph At 01:37 AM 11/21/2005, you wrote: Hi Ralph, * Ralph H. Castain wrote on Mon, Nov 21, 2005 at 04:04:34AM CET: > Just as an

[O-MPI devel] Process for modifying APIs

2005-11-21 Thread Ralph H. Castain
Yo all As you may have seen from earlier emails, I encountered some difficulty in modifying existing APIs within the streamlined build system. After some effort, I think I have defined a method for modifying the API-level of a subsystem that gets around some of the problems. I thought I

Re: [O-MPI devel] rsh and fork pls components

2005-12-13 Thread Ralph H. Castain
No problem with me - seems straightforward and resolves some confusion. On the orted check for the fork pls, you will find that there is a flag in the process info structure that indicates "I am a daemon". You may just need to check that flag - gets set very early and so should be available

Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-02 Thread Ralph H. Castain
I've just finished some stuff - will check it into the system (hopefully) tomorrow. I'll be able to take a look at this next week. My guess is that the launcher isn't setting that proc state at this time since it isn't being used by the system internally and we didn't know anyone else was

[O-MPI devel] New data support subsystem for ORTE

2006-02-06 Thread Ralph H. Castain
Hello all After several months of development, I have merged the new data support subsystem for ORTE into the trunk. I must provide one caveat of warning: I have made every effort to test the revised system, but cannot guarantee its operation in every condition and under every system. For

Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-08 Thread Ralph H. Castain
Nathan This should now be fixed on the trunk. Once it is checked out more thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you might want to check out the trunk and verify it meets your needs. Ralph At 03:05 PM 2/1/2006, you wrote: This was happening on Alpha 1 as well but

[O-MPI devel] Modification to triggers

2006-02-08 Thread Ralph H. Castain
Hi all As you'll see in my latest commit, I have made a slight modification to the standard triggers that ensures we define them for ALL of the process and job states. This will now allow users to subscribe to triggers (for example) on all processes achieving INIT, LAUNCH, and RUNNING

Re: [O-MPI devel] Modification to triggers

2006-02-09 Thread Ralph H. Castain
Hmmm...yuck! I'll take a look - will set it back to what it was before in the interim. Thanks Ralph At 07:05 AM 2/9/2006, you wrote: On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote: > In addition, I took advantage of the change to fix something Brian > had flagged in the orte/mc

Re: [OMPI devel] [O-MPI devel] Alpha 4 and job state transitions

2006-02-13 Thread Ralph H. Castain
>> Nathan DeBardeleben, Ph.D. >> Los Alamos National Laboratory >> Parallel Tools Team >> High Performance Computing Environments >> phone: 505-667-3428 >> email: ndeb...@lanl.gov >>

Re: [OMPI devel] help - urgent

2006-06-30 Thread Ralph H Castain
Hi Amrita I'm not entirely sure I understand your questions, but will try to answer them below. If you can share what you are doing, we'd be happy to provide advice. Ralph On 6/30/06 5:45 AM, "amrita mathuria" wrote: > hi... > > I am working with open mpi

Re: [OMPI devel] orted problem

2006-07-05 Thread Ralph H Castain
This has been around for a very long time (at least a year, if memory serves correctly). The problem is that the system "hangs" while trying to flush the io buffers through the RML because it loses connection to the head node process (for 1.x, that's basically mpirun) - but the "flush" procedure

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Hi Nathan Could you tell us which version of the code you are using, and print out the rc value that was returned by the "get" call? I see nothing obviously wrong with the code, but much depends on what happened prior to this call too. BTW: you might want to release the memory stored in the

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
h Performance Computing Environments > phone: 505-667-3428 > email: ndeb...@lanl.gov > ----- > > > > Ralph H Castain wrote: >> Hi Nathan >> >> Could you tell us which version of the code you a

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 1:14 AM, "Ralf Wildenhues" wrote: > >> Perhaps we should use int64_t instead. > > No, that would not help: int64_t is C99, so it should not be declared > either in C89 mode. Also, the int64_t is required to have 64 bits, and > could thus theoretically be

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 6:58 AM, "Ralf Wildenhues" <ralf.wildenh...@gmx.de> wrote: > * Ralph H Castain wrote on Mon, Aug 21, 2006 at 02:39:51PM CEST: >> >> It sounds, therefore, like we are now C99 compliant and no longer C90 >> compliant at all? > > Well

[OMPI devel] Upcoming: Major ORTE changes

2006-08-23 Thread Ralph H Castain
Yo all There has been a bit of discussion about this on the core developers list and on telecons, but I felt that perhaps I should provide a more detailed warning to the broader developer community. In the next few weeks, there will be some major revisions submitted to the Open MPI trunk on the

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
Actually, I was a part of that thread - see my comments beginning with http://www.open-mpi.org/community/lists/devel/2006/03/0797.php. Perhaps I communicated poorly here. The issue in the prior thread was that few systems nowadays don't offer at least some level of IPv6 compatibility, even if

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
On 9/6/06 9:44 AM, "Christian Kauhaus" wrote: > Bogdan Costescu : >> I don't know why you think that this (talking to different nodes via >> different channels) is unusual - I think that it's quite probable, >> especially in a

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-07 Thread Ralph H Castain
. > > I even volunteer for that. Next week I will be away, so I will come > back with a design for the phone conference on ... well beginning of > october. > >george. > > > On Sep 7, 2006, at 12:22 PM, Ralph H Castain wrote: > >> Jeff and I talked about t

[OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
Yo folks I need to do a little planning and it would help a bunch to have a preliminary head count. Could you please let me know (a) if you plan to participate in the tutorial, and (b) indicate if in-person or remote? For an agenda, my thought is that we will start at 7am Mountain time (that's

Re: [OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
round 10.30 pm on wednesday, and by the time I pick up > the rental car and drive to White Rocks, it can become quite late) > Could we maybe start a little later that day, e.g. 8am or 9am? > > Thanks > Edgar > > Ralph H Castain wrote: >> Yo folks >> >> I

[OMPI devel] Tentative OpenRTE tutorial agenda

2006-09-25 Thread Ralph H Castain
Hello all I have attached a tentative agenda for this week's tutorial, based on inputs received so far from planned participants. I have adjusted things to try and accommodate the needs of a geographically distributed audience, and the fact that - as sole speaker - I cannot possibly talk for

[OMPI devel] ORTE Tutorial Materials

2006-09-27 Thread Ralph H Castain
Hello all The materials for Thursday's session of the ORTE tutorial are now complete and stable. I have posted them on the OpenRTE web site at: http://www.open-rte.org/papers/tutorial-sept-2006/index.php Both PowerPoint and PDF (printed two slides/page) formats are available. I should have the

[OMPI devel] ORTE Timing

2006-09-29 Thread Ralph H Castain
Hello all There was some discussion at yesterday's tutorial about ORTE scalability and where bottlenecks might be occurring. I spent some time last night identifying key information required to answer those questions. I'll be presenting a slide today showing the key timing points that we would

Re: [OMPI devel] socket usage

2006-10-25 Thread Ralph H Castain
I can't speak to the MPI layer, but for OpenRTE, each process holds one socket open to the HNP. Each process *has* all the socket connection info for all of the processes in its job, but I don't believe we actually open those sockets until we attempt to communicate with that process (needs to be
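The connection policy described in this message is a lazy connection cache. Each process keeps one socket open to the HNP and knows the contact information for every peer in its job, but it opens a peer connection only on first use. The following is a minimal Python sketch of that pattern, not ORTE's oob/tcp code. `LazyConnectionTable`, `send`, `open_peers` and the `connect` callback are hypothetical names. Real sockets are replaced by an injected factory so that the logic runs standalone.

```python
from typing import Callable, Dict, List, Tuple

Address = Tuple[str, int]


class LazyConnectionTable:
    """Hypothetical sketch: contact info for every peer is known up front,
    but a connection is created only the first time that peer is messaged."""

    def __init__(self, contacts: Dict[str, Address],
                 connect: Callable[[Address], List[bytes]],
                 hnp: str) -> None:
        self._contacts = contacts              # peer name -> (host, port)
        self._connect = connect                # stand-in for opening a socket
        self._open: Dict[str, List[bytes]] = {}
        self._get(hnp)                         # the HNP link is opened eagerly

    def _get(self, peer: str) -> List[bytes]:
        if peer not in self._open:
            # First contact with this peer: open the connection now.
            self._open[peer] = self._connect(self._contacts[peer])
        return self._open[peer]

    def send(self, peer: str, payload: bytes) -> None:
        # The "channel" here is just a list that collects payloads.
        self._get(peer).append(payload)

    def open_peers(self) -> List[str]:
        return sorted(self._open)


if __name__ == "__main__":
    contacts = {"hnp": ("head", 5000), "rank1": ("n1", 5001), "rank2": ("n2", 5002)}
    table = LazyConnectionTable(contacts, connect=lambda addr: [], hnp="hnp")
    print(table.open_peers())        # ['hnp']
    table.send("rank2", b"hello")
    print(table.open_peers())        # ['hnp', 'rank2']
```

With this design, a job in which processes talk mostly to the HNP never pays the cost of opening N peer sockets per process.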

Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Ralph H Castain
I don't see any new component, Adrian. There have been a few updates to the existing component, some of which might cause conflicts with the merge, but those shouldn't be too hard to resolve. As far as I know, the oob/tcp component is relatively stable. Brian is doing some work on it to enable us

Re: [OMPI devel] Getting process PID

2006-11-09 Thread Ralph H Castain
Hi Greg All of the schema keys are listed in orte/mca/schema/schema_types.h. The key you are looking for is the ORTE_PROC_LOCAL_PID_KEY. You will also see an ORTE_PROC_PID_KEY. This one refers to the pid assigned by the launcher - the other refers to the pid reported by the process from its

Re: [OMPI devel] Build system changes

2006-11-30 Thread Ralph H Castain
Thanks Ralf! Much appreciated. On 11/30/06 8:33 AM, "Ralf Wildenhues" wrote: > * Ralph Castain wrote on Thu, Nov 30, 2006 at 04:12:16PM CET: >> That could be the problem. I had to update automake, and unfortunately >> Darwin Ports hasn't reached that level yet. So I had

Re: [OMPI devel] Major revision to the RML/OOB

2006-12-06 Thread Ralph H Castain
We aren't ignoring your situation, Adrian - Jeff and I are talking about how best to deal with the situation and your offer to help. This revision will indeed see some significant change in the oob/tcp component, mostly in the init and connect procedures. The concern is that we want to leave open

Re: [OMPI devel] Major revision to the RML/OOB

2006-12-06 Thread Ralph H Castain
The changes we are planning to do will in no way preclude the use of multicast for the xcast procedure. The changes in the OOB subsystem deal specifically with how those connections are initialized, which is something we would need to do for multicast anyway. The routing method for the xcast is

[OMPI devel] OpenRTE telecon?

2007-01-04 Thread Ralph H Castain
Hi everyone Several of us were on a telecon yesterday and the topic of better coordinating the activities on OpenRTE came up. While things have percolated along reasonably well, the general feeling was that better, wider knowledge of current OpenRTE development activities and directions would

Re: [OMPI devel] OpenRTE telecon?

2007-01-11 Thread Ralph H Castain
on adding functionality to the code - I will note those on the site as I am fixing them. Again, I would like to note that people are always welcome to drop me a note or call me on the phone if they have a question about what I'm doing or planning to do. Thanks Ralph On 1/4/07 7:41 AM, "

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/27/07 9:37 AM, "Greg Watson" wrote: > There are two more interfaces that have changed: > > 1. orte_rds.query() now takes a job id, whereas in 1.2b1 it didn't > take any arguments. I seem to remember that I call this to kick orted > into action, but I'm not sure of the

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/29/07 10:20 AM, "Greg Watson" <gwat...@lanl.gov> wrote: > > On Jan 29, 2007, at 6:47 AM, Ralph H Castain wrote: > >> >> >> >> On 1/27/07 9:37 AM, "Greg Watson" <gwat...@lanl.gov> wrote: >> >>> There a

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-30 Thread Ralph H Castain
anything >> other than a hostfile, we really don't have a way to do that right >> now. The >> ORTE 2.0 design allows for it, but we haven't implemented that yet - >> probably a few months away. >> >> Hope that helps >> Ralph >> >> >> O

Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level

2007-04-03 Thread Ralph H Castain
On 4/3/07 9:32 AM, "Li-Ta Lo" wrote: > On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote: > >> >> 2. I'm not sure what you mean by mapping MPI processes to "physical" >> processes, but I assume you mean how do we assign MPI ranks to processes on >> specific nodes. You

[OMPI devel] ORTE scalability issues

2007-04-16 Thread Ralph H Castain
Hello all I understand that several people are interested in the OpenRTE scalability issues - this is great! However, it appears we haven't done a very good job of circulating information about the identified causes of the current issues. In the hope of helping people to be productive in their

Re: [OMPI devel] ORTE scalability issues

2007-04-17 Thread Ralph H Castain
Thanks Christian. Actually, I was aware of that and should have clarified that these tests did *not* involve the IPv6 code. Ralph On 4/17/07 1:31 AM, "Christian Kauhaus" <ckauh...@minet.uni-jena.de> wrote: > Ralph H Castain <r...@lanl.gov>: >> even tho

[OMPI devel] Change to default xcast mode [RFC]

2007-05-18 Thread Ralph H Castain
For the last several months, we have supported three modes of sending the xcast messages used to release MPI processes from their various stage gates: 1. Direct - message sent directly to each process in a serial fashion 2. Linear - message sent serially to the daemon on each node, which then
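Counting serial send rounds illustrates the trade-off between the relay modes discussed in this thread. The following is a hypothetical Python sketch, not ORTE's xcast code. It numbers the n nodes 0..n-1 with the message originating at node 0. The linear count follows the description of mode 2. The binomial tree layout is an assumption, chosen because a reply in this thread refers to a binomial tree method. In this layout, a node's parent is its own rank with the highest set bit cleared.

```python
from typing import List


def binomial_children(rank: int, n: int) -> List[int]:
    """Ranks that `rank` forwards to in a binomial tree rooted at 0.

    A node's parent is itself with the highest set bit cleared, so
    rank r relays to r + 2**k for every power of two 2**k > r.
    """
    mask = 1
    while mask <= rank:        # skip bits already on r's path from the root
        mask <<= 1
    children = []
    while rank + mask < n:
        children.append(rank + mask)
        mask <<= 1
    return children


def linear_rounds(n: int) -> int:
    """Root sends to each of the other n - 1 nodes one after another."""
    return n - 1


def binomial_rounds(n: int) -> int:
    """Rounds until all n nodes hold the message: ceil(log2(n))."""
    rounds, reached = 0, 1
    while reached < n:
        reached *= 2           # each holder forwards once per round
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(binomial_children(0, 8))                       # [1, 2, 4]
    print(binomial_children(1, 8))                       # [3, 5]
    print(linear_rounds(1024), binomial_rounds(1024))    # 1023 10
```

Both layouts deliver exactly n - 1 messages in total. The tree wins because sends proceed in parallel, so the latency grows with log2(n) rather than n.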

Re: [OMPI devel] [devel-core] Change to default xcast mode [RFC]

2007-05-18 Thread Ralph H Castain
e messages to each orted independently (instead of via a binomial tree method). > > Andrew > > Ralph H Castain wrote: >> For the last several months, we have supported three modes of sending the >> xcast messages used to release MPI processes from their various stag

[OMPI devel] Dumping process status etc.

2007-05-22 Thread Ralph H Castain
This came up in today's telecon and I promised to send this to George - however, it occurred to me that others may also want to know. If you want to dump info for debugging purposes, and if you can get into orterun/mpirun (e.g., via gdb), you can dump info on anything with the following (NOTE:

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
Just a quick glance (running out door) - it looks like Josh commented out a critical piece of code in the rds hostfile component at line 442. It loads the cell info into the name service so it can correctly respond to the query you cite below. You might try restoring that code - if you do, check

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
> wrote: > I haven't looked at this at all, but that line changed in r6813 which > was Aug. 2005 so I would guess the problem is elsewhere. However with > the recent ORTE changes maybe this is a side effect. > > -- Josh > > > On May 23, 2007, at 11:11 AM, Ralph H

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
Okay, this is now fixed as of r14732. Thanks (and apologies) to George for spotting it. Ralph On 5/23/07 9:57 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Actually, I think that is true (got back earlier than expected). The problem > really is that we had multip

Re: [OMPI devel] ORTE registry patch

2007-05-24 Thread Ralph H Castain
Thanks - I'll take a look at this (and the prior ones!) in the next couple of weeks when time permits and get back to you. Ralph On 5/23/07 1:11 PM, "George Bosilca" wrote: > Attached is another patch to the ORTE layer, more specifically the > replica. The idea is to

[OMPI devel] Why the HNP gets so big...

2007-05-31 Thread Ralph H Castain
Scaling tests over the last few months have all shown a behavior that has elicited significant comment: namely, that the HNP is observed to grow to multiple gigabytes in size for runs involving several thousand processes. This represents a peak size that declines to a much smaller footprint once

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
s (you'll have to look at the tests to >>>>>> see >>>>>> which ones make sense in the latter case). This will ensure that we >>>>>> have at >>>>>> least some degree of coverage. >>>>>> >>>>>>

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
ou do, remember to also > remove test/class/orte_bitmap.c > > Thanks, > > Tim > > > Ralph H Castain wrote: >> Sigh...is it really so much to ask that we at least run the tests in >> orte/test/system and orte/test/mpi using both mpirun and singleton (where >> a

[OMPI devel] Major commit to trunk

2007-06-12 Thread Ralph H Castain
Yo all I made a major commit to the trunk this morning (r15007) that merits general notification and some explanation. *** IMPORTANT NOTE *** One major impact of the commit you *may* notice is that support for several environments will be broken. This commit is known to break

Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
Actually, I was talking specifically about configuration at build time. I realize there are trade-offs here, and suspect we can find a common ground. The problem with using the options Jeff described is that they require knowledge on the part of the builder as to what environments have had their

Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
As I understood our original discussions, this would move responsibility for mapping rank to processor back into the orted - is that still true? Reason I ask is to again clarify for people if we are doing so as it (a) impacts those systems that don't use our orteds (e.g., will affinity still work

[OMPI devel] Bproc support

2007-07-10 Thread Ralph H Castain
Yo all I have upgraded the support for Bproc on the Open MPI trunk as of r15328. We now support Bproc environments that do not utilize resource managers - in these cases, we will allow the user to launch on all nodes upon which they have execution authorities. Please note that, if you login to

Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
al processes. Currently this component is the ODLS. Most of my > work is in the ODLS component so if you decide to eliminate the orteds > you must, somehow, preserve the ODLS functionality. > > Sharon. > > > > -Original Message- > From: devel-boun...@open-mpi.org

Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
as multiple, related > frameworks (e.g., RAS and PLS). E.g., "orte_base_launcher=tm", or > somesuch. > > > On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote: > >> Actually, I was talking specifically about configuration at build >> time. I >> realize the

Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Ralph H Castain
Interesting point - no reason why we couldn't use that functionality for this purpose. Good idea! On 7/11/07 5:38 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: > On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote: > >>> 2. It may be useful to have some h

[OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
Yo all I have a fairly significant change coming to the orte part of the code base that will require an autogen (sorry). I'll check it in late this afternoon (can't do it at night as it is on my office desktop). The commit will fix the singleton operations, including singleton comm_spawn. It

Re: [OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
the update. Thanks Ralph On 7/12/07 7:53 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Yo all > > I have a fairly significant change coming to the orte part of the code base > that will require an autogen (sorry). I'll check it in late this afternoon > (can't do it

Re: [OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
let me know of any problems. Ralph On 7/12/07 1:45 PM, "Ralph H Castain" <r...@lanl.gov> wrote: > Yo folks > > Several of us are stuck waiting for this commit to hit. Rather than wasting > the next several hours, I'm going to make the commit now. > > So plea

Re: [OMPI devel] [devel-core] Orte update

2007-07-12 Thread Ralph H Castain
ns ? This > will solve the Windows problem, and will give us a more consistent > environment. > >george. > > On Jul 12, 2007, at 4:02 PM, Ralph H Castain wrote: > >> The commit has been made - it is r15390. >> >> This commit restored the ability to e

Re: [OMPI devel] [devel-core] Orte update

2007-07-12 Thread Ralph H Castain
y separate code paths. That's why we wound up where we are. Remember, the ODLS fork/exec's application procs, so it includes all kinds of stuff for that purpose. In this case, we are fork/exec'ing an orted - totally different informational requirements. On 7/12/07 2:17 PM, "Ralph H Castain"

[OMPI devel] Major reduction in ORTE

2007-07-12 Thread Ralph H Castain
Yo all As we are discussing functional requirements for the upcoming 1.3 release, I was asked to provide a little info about what is going to be happening to the ORTE part of the code base over the remainder of this year. Short answer: there will be a major code revision to reduce ORTE to the

Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-13 Thread Ralph H Castain
> On Thu, Jul 12, 2007 at 03:04:01PM -0600, Ralph H Castain wrote: >> As always, any thoughts/suggestions are welcomed. >> > I hope Sharon's work on process affinity will be merged into the trunk > before this works begins and functionality will be preser

Re: [OMPI devel] Orte update

2007-07-13 Thread Ralph H Castain
On 7/13/07 7:22 AM, "Sven Stork" <st...@hlrs.de> wrote: > Hi Ralph, > > On Thursday 12 July 2007 15:53, Ralph H Castain wrote: >> Yo all >> >> I have a fairly significant change coming to the orte part of the code base >> that will

Re: [OMPI devel] Orte update

2007-07-16 Thread Ralph H Castain
Sigh - somehow, the fix slid out of that commit. I have now fixed it in r15437. Thanks Ralph On 7/16/07 6:11 AM, "Sven Stork" <st...@hlrs.de> wrote: > On Friday 13 July 2007 15:35, Ralph H Castain wrote: >> >> On 7/13/07 7:22 AM, "Sven Stork"

Re: [OMPI devel] iof / oob issues

2007-07-18 Thread Ralph H Castain
Just to further clarify the clarification... ;-) This condition has existed for the last several months. The root problem dates at least back into the 1.1 series. We chased the problem down to the iof_flush call in the odls when a process terminates in something like Jan or Feb this year, at

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Ralph H Castain
I believe that was fixed in r15405 - are you at that rev level? On 7/18/07 7:27 AM, "Gleb Natapov" wrote: > Hi, > > With current trunk LD_LIBRARY_PATH is not set for ranks that are > launched on the head node. This worked previously. > > -- > Gleb. >

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Ralph H Castain
It works for me in both cases, provided I give the fully qualified host name for your first example. In other words, these work: pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host localhost printenv | grep LD [pn1180961.lanl.gov:22021] [0.0] test of print_name OLDPWD=/Users/rhc/openmpi

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
it out. So the question is: why do you not have LD_LIBRARY_PATH set in your environment when you provide a different hostname? On 7/19/07 7:45 AM, "Gleb Natapov" <gl...@voltaire.com> wrote: > On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote: >>

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
he one > that works fine. The failing one is the first one, where > LD_LIBRARY_PATH is not provided. As Gleb indicates, using localhost > makes the problem vanish. > >george. > > On Jul 19, 2007, at 10:57 AM, Ralph H Castain wrote: > >> But it *does* provide an LD

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
Talked with Brian and we have identified the problem and a fix - will come in later today. Thanks Ralph On 7/19/07 9:24 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > You are correct - I misread the note. My bad. > > I'll look at how we might ensure the LD_LIBRAR

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
problem that will take some discussion - to occur separately from this chain. So some of the behavior you cited continues for the moment. Thanks Ralph On 7/19/07 9:39 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Talked with Brian and we have identified the problem and a fix - w

Re: [OMPI devel] Removal of cellid

2007-07-19 Thread Ralph H Castain
This change has finally been merged into the trunk as r15517. It will unfortunately require an autogen (sorry). Please let me know if you encounter any problems. As noted in the commit, I tried to catch all the places that required change, but cannot guarantee that I got all of them. Thanks

[OMPI devel] Hostfiles - yet again

2007-07-24 Thread Ralph H Castain
Yo all As you know, I am working on revamping the hostfile functionality to make it work better with managed environments (at the moment, the two are exclusive). The issue that we need to review is how we want the interaction to work, both for the initial launch and for comm_spawn. In talking

Re: [OMPI devel] Hostfiles - yet again

2007-07-26 Thread Ralph H Castain
Hi Aurelien Perhaps some bad news on this subject - see below. On 7/26/07 7:53 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > > > > On 7/26/07 7:33 AM, "rolf.vandeva...@sun.com" <rolf.vandeva...@sun.com> > wrote: > >> Aurelien Bout

Re: [OMPI devel] Hostfiles - yet again

2007-07-26 Thread Ralph H Castain
On 7/26/07 2:24 PM, "Aurelien Bouteiller" <boute...@cs.utk.edu> wrote: > Ralph H Castain wrote: >> After some investigation, I'm afraid that I have to report that this - as >> far as I understand what you are doing - may no longer work in Open MPI in >>

[OMPI devel] Trunk borked?

2007-08-06 Thread Ralph H Castain
Yo all I've been playing with the trunk today and found it appears to be broken for comm_spawn. I'm getting two types of errors, perhaps related: 1. if everything is being done on localhost, I do not see any of the IO from the child process. Mpirun executes and completes cleanly, however.

Re: [OMPI devel] [devel-core] Trunk borked?

2007-08-06 Thread Ralph H Castain
On 8/6/07 1:51 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote: > On Aug 6, 2007, at 11:49 AM, Ralph H Castain wrote: > >> 1. if everything is being done on localhost, I do not see any of >> the IO from >> the child process. Mpirun executes an

[OMPI devel] Status update

2007-08-10 Thread Ralph H Castain
Hello all This is just to let you know of a change in my status. I will be on vacation all of next week (Aug 13-17), and possibly part of the following week as well. I will not have my computer with me, so I will not be reading or responding to email for up to two weeks. When I return, I will

[OMPI devel] Trunk issue?

2007-08-27 Thread Ralph H Castain
Yo folks Just checked out a fresh copy of the trunk and tried to build it using my usual configure: ./configure --prefix=/Users/rhc/openmpi --with-devel-headers --disable-shared --enable-static --disable-mpi-f77 --disable-mpi-f90 --enable-mem-debug --without-memory-manager --enable-debug

Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer

2007-08-28 Thread Ralph H Castain
On 8/27/07 7:30 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: > Ralph, > > Ralph H Castain wrote: >> Just returned from vacation...sorry for delayed response > No Problem. Hope you had a good vacation :) And sorry for my super > delayed respo

[OMPI devel] [RFC] Exit without finalize

2007-09-06 Thread Ralph H Castain
WHAT: Decide upon how to handle MPI applications where one or more processes exit without calling MPI_Finalize. WHY: Some applications can abort via an exit call instead of calling MPI_Abort when a library (or something else) calls exit. This situation is outside a

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Ralph H Castain
Sorry for delay - wasn't ignoring the issue. There are several fixes to this problem - ranging in order from least to most work: 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param. It won't affect anything on the backend because the daemon/procs don't use ssh. 2. include

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-08 Thread Ralph H Castain
wrote: >> >>> I'm curious what changed to make this a problem. How were we passing mca >>> param >>> from the base to the app before, and why did it change? >>> >>> I think that options 1 & 2 below are no good, since we, in general, allow >>

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
-- I'm just joining this conversation late: what's the problem > with opal_cmd_line_parse? > > It should obey all quoting from shells, etc. I.e., it shouldn't care > about tokens with special characters (to include spaces) because the > shell divides all of that stuff up --

Re: [OMPI devel] ORTE process name,, nodeid..

2007-11-19 Thread Ralph H Castain
Yo Galen I'm not aware of any continuing discussion to totally remove the process name from ORTE - I believe we coalesced to redefining how the jobid was established to a procedure that doesn't require a name server. This hasn't come over to the trunk yet, but will in the next couple of months.

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
odels are used. (i.e., you exec locally > but it turns into a system-like invocation on the remote side). In > this case, I think you'll need to quote extended strings (e.g., those > containing spaces) for the non-local invocations and not quote them for > local invocations. >
