[hwloc-devel] Create success (hwloc git 1.11.3-43-g8f0e3cd)

2016-08-10 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.11.3-43-g8f0e3cd Start time: Wed Aug 10 08:50:41 PDT 2016 End time: Wed Aug 10 08:53:56 PDT 2016 Your friendly daemon, Cyrador

[hwloc-devel] Create success (hwloc git dev-1222-gdbe0cfd)

2016-08-10 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-1222-gdbe0cfd Start time: Wed Aug 10 18:01:05 PDT 2016 End time: Wed Aug 10 18:04:43 PDT 2016 Your friendly daemon, Cyrador

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-04 Thread Ralph H. Castain
Looks okay to me Brian - I went ahead and filed the CMR and sent it on to Brad for approval. Ralph > On Tue, 3 Mar 2009, Brian W. Barrett wrote: > >> On Tue, 3 Mar 2009, Jeff Squyres wrote: >> >>> 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only >>> difference between rc3

Re: [O-MPI devel] New Bproc Components

2005-07-28 Thread Ralph H. Castain
Very interesting! Appreciate the info. My numbers are slightly better - as I've indicated, there is a NxN message exchange currently in the system that needs to be removed. With that commented out, the system scales roughly linearly with number of processes. At 04:31 PM 7/28/2005, you wrote:
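To see why removing an NxN message exchange changes the scaling, a rough message count helps. The sketch below is an illustrative model only, assuming that in the NxN case every process sends one message to every other process, and that otherwise each process reports once to a single coordinator. It is not ORTE's actual startup protocol, and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: every process sends one message to every other one. */
unsigned long long all_to_all_msgs(unsigned long long n)
{
    assert(n >= 1);
    return n * (n - 1);   /* grows with N squared */
}

/* Hypothetical model: every process except the coordinator reports once. */
unsigned long long gather_msgs(unsigned long long n)
{
    assert(n >= 1);
    return n - 1;         /* grows linearly with N */
}
```

With 1000 processes the all-to-all model needs 999,000 messages versus 999 for the gather model, which is why a single leftover NxN exchange can dominate startup time.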

[O-MPI devel] New simplified registry API's

2005-08-02 Thread Ralph H. Castain
Yo all Per last week's discussions, I have created a set of new simplified API's for the registry. These include: 1. orte_gpr.put_1 and orte_gpr.put_N: these allow you to put data on the registry without having to define your own value structures. They take a segment name, a NULL-terminated

Re: [O-MPI devel] compile error

2005-08-08 Thread Ralph H. Castain
Very interesting - it built fine for me (building static). However, the ns_base_nds.c file is "stale", so I just committed a "delete" of that file. It shouldn't have been building anyway as it isn't in the Makefile. My guess, therefore, is that you are building dynamically and are encountering

[O-MPI devel] RHC development plans

2005-09-01 Thread Ralph H. Castain
Yo folks Several people have asked lately what I am planning to do next on ORTE. Just to help maintain coordination, here is my current list of planned activities (in priority order). Any requests/suggestions are welcomed - this isn't in concrete by any means. 1. Add George's architecture

Re: [O-MPI devel] RHC development plans

2005-09-01 Thread Ralph H. Castain
Yo folks I have now completed the first three of these items. I believe this brings ORTE to a stage that is - at the least - very close to release quality. There are a few memory leaks left (oob and iof subsystems), but I'm not as familiar with those and have asked for help. Barring any

[O-MPI devel] Startup/shutdown performance

2005-09-13 Thread Ralph H. Castain
Yo folks Josh ran some tests for me on Odin earlier today - the results show a major improvement in our startup/shutdown performance. As you may recall, our times grew roughly exponentially before - as the attached graph shows, they now grow roughly linearly. The data also shows that the

[O-MPI devel] New build methodology

2005-11-15 Thread Ralph H. Castain
Yo folks While I generally find the new build methodology (i.e., reducing the number of makefiles) has little impact on me, I have now encountered one problem that causes a significant difficulty. In trying to work on a revised data packing system for the orte part of the branch, I now find

Re: [O-MPI devel] New build methodology

2005-11-15 Thread Ralph H. Castain
Your proposed change would help a great deal - thanks! Can you steer me through the change? At 07:33 AM 11/15/2005, you wrote: Hi Ralph, * Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:12:38PM CET: > > While I generally find the new build methodology (i.e., reducing the

Re: [O-MPI devel] New build methodology

2005-11-20 Thread Ralph H. Castain
ut we have definitely made it harder to develop a subsystem. Is that really a good trade? I wonder. Ralph At 08:08 AM 11/15/2005, you wrote: * Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:45:26PM CET: > At 07:33 AM 11/15/2005, you wrote: > > > >Would it help if onl

Re: [O-MPI devel] New build methodology

2005-11-21 Thread Ralph H. Castain
un "make" in a framework directory, it just builds the stuff in base without recursing. Of course, you can't run make in the base/ directory, but since running make in the framework directory is essentially equivalent, it doesn't exactly matter. Brian On Nov 20, 2005, at 10:04 PM, Ralph H. Casta

Re: [O-MPI devel] New build methodology

2005-11-21 Thread Ralph H. Castain
Hi Ralf Appreciate the offer, but I think at this stage it isn't worth the hassle. We either implement a long-term fix, or just pay the price. Thanks though Ralph At 01:37 AM 11/21/2005, you wrote: Hi Ralph, * Ralph H. Castain wrote on Mon, Nov 21, 2005 at 04:04:34AM CET: > Just as an

[O-MPI devel] Process for modifying APIs

2005-11-21 Thread Ralph H. Castain
Yo all As you may have seen from earlier emails, I encountered some difficulty in modifying existing APIs within the streamlined build system. After some effort, I think I have defined a method for modifying the API-level of a subsystem that gets around some of the problems. I thought I

Re: [O-MPI devel] rsh and fork pls components

2005-12-13 Thread Ralph H. Castain
No problem with me - seems straightforward and resolves some confusion. On the orted check for the fork pls, you will find that there is a flag in the process info structure that indicates "I am a daemon". You may just need to check that flag - gets set very early and so should be available

Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-02 Thread Ralph H. Castain
I've just finished some stuff - will check it into the system (hopefully) tomorrow. I'll be able to take a look at this next week. My guess is that the launcher isn't setting that proc state at this time since it isn't being used by the system internally and we didn't know anyone else was

[O-MPI devel] New data support subsystem for ORTE

2006-02-06 Thread Ralph H. Castain
Hello all After several months of development, I have merged the new data support subsystem for ORTE into the trunk. I must provide one caveat of warning: I have made every effort to test the revised system, but cannot guarantee its operation in every condition and under every system. For

Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-08 Thread Ralph H. Castain
Nathan This should now be fixed on the trunk. Once it is checked out more thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you might want to check out the trunk and verify it meets your needs. Ralph At 03:05 PM 2/1/2006, you wrote: This was happening on Alpha 1 as well but

Re: [O-MPI devel] Modification to triggers

2006-02-09 Thread Ralph H. Castain
Hmmm...yuck! I'll take a look - will set it back to what it was before in the interim. Thanks Ralph At 07:05 AM 2/9/2006, you wrote: On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote: > In addition, I took advantage of the change to fix something Brian > had flagged in the orte/mc

Re: [OMPI devel] [O-MPI devel] Alpha 4 and job state transitions

2006-02-13 Thread Ralph H. Castain
>> Nathan DeBardeleben, Ph.D. >> Los Alamos National Laboratory >> Parallel Tools Team >> High Performance Computing Environments >> phone: 505-667-3428 >> email: ndeb...@lanl.gov >>

Re: [OMPI devel] help - urgent

2006-06-30 Thread Ralph H Castain
Hi Amrita I'm not entirely sure I understand your questions, but will try to answer them below. If you can share what you are doing, we'd be happy to provide advice. Ralph On 6/30/06 5:45 AM, "amrita mathuria" wrote: > hi... > > I am working with open mpi

Re: [OMPI devel] orted problem

2006-07-05 Thread Ralph H Castain
This has been around for a very long time (at least a year, if memory serves correctly). The problem is that the system "hangs" while trying to flush the io buffers through the RML because it loses connection to the head node process (for 1.x, that's basically mpirun) - but the "flush" procedure

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Hi Nathan Could you tell us which version of the code you are using, and print out the rc value that was returned by the "get" call? I see nothing obviously wrong with the code, but much depends on what happened prior to this call too. BTW: you might want to release the memory stored in the

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
h Performance Computing Environments > phone: 505-667-3428 > email: ndeb...@lanl.gov > ----- > > > > Ralph H Castain wrote: >> Hi Nathan >> >> Could you tell us which version of the code you a

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 1:14 AM, "Ralf Wildenhues" wrote: > >> Perhaps we should use int64_t instead. > > No, that would not help: int64_t is C99, so it should not be declared > either in C89 mode. Also, the int64_t is required to have 64 bits, and > could thus theoretically be
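The point about `int64_t` can be made concrete. `<stdint.h>` and `int64_t` only arrive with C99, and even then the exact-width type is optional: it exists only where the platform has a suitable 64-bit type, which C99 signals by defining `INT64_MAX`. A minimal sketch of a compile-time guard, assuming a fallback to `int_least64_t` under C99 and to `long long` under C89 (where `long long` is only available as a compiler extension). The `my_int64` name is hypothetical.

```c
#include <assert.h>
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
#ifdef INT64_MAX
typedef int64_t my_int64;          /* exact-width type is provided */
#else
typedef int_least64_t my_int64;    /* always present in C99 */
#endif
#else
typedef long long my_int64;        /* C89 fallback: relies on a compiler extension */
#endif

/* Nonzero if the chosen type has at least 64 bits of storage. */
int my_int64_is_wide_enough(void)
{
    return sizeof(my_int64) * CHAR_BIT >= 64;
}
```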

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 6:58 AM, "Ralf Wildenhues" <ralf.wildenh...@gmx.de> wrote: > * Ralph H Castain wrote on Mon, Aug 21, 2006 at 02:39:51PM CEST: >> >> It sounds, therefore, like we are now C99 compliant and no longer C90 >> compliant at all? > > Well

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
Actually, I was a part of that thread - see my comments beginning with http://www.open-mpi.org/community/lists/devel/2006/03/0797.php. Perhaps I communicated poorly here. The issue in the prior thread was that few systems nowadays don't offer at least some level of IPv6 compatibility, even if

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
On 9/6/06 9:44 AM, "Christian Kauhaus" wrote: > Bogdan Costescu : >> I don't know why you think that this (talking to different nodes via >> different channels) is unusual - I think that it's quite probable, >> especially in a

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-07 Thread Ralph H Castain
. > > I even volunteer for that. Next week I will be away, so I will come > back with a design for the phone conference on ... well beginning of > october. > >george. > > > On Sep 7, 2006, at 12:22 PM, Ralph H Castain wrote: > >> Jeff and I talked about t

[OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
Yo folks I need to do a little planning and it would help a bunch to have a preliminary head count. Could you please let me know (a) if you plan to participate in the tutorial, and (b) indicate if in-person or remote? For an agenda, my thought is that we will start at 7am Mountain time (that's

Re: [OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
round 10.30 pm on wednesday, and by the time I pick up > the rental car and drive to White Rocks, it can become quite late) > Could we maybe start a little later that day, e.g. 8am or 9am? > > Thanks > Edgar > > Ralph H Castain wrote: >> Yo folks >> >> I

[OMPI devel] ORTE Tutorial Materials

2006-09-27 Thread Ralph H Castain
Hello all The materials for Thursday's session of the ORTE tutorial are now complete and stable. I have posted them on the OpenRTE web site at: http://www.open-rte.org/papers/tutorial-sept-2006/index.php Both Powerpoint and PDF (printed two slides/page) formats are available. I should have the

[OMPI devel] ORTE Timing

2006-09-29 Thread Ralph H Castain
Hello all There was some discussion at yesterday's tutorial about ORTE scalability and where bottlenecks might be occurring. I spent some time last night identifying key information required to answer those questions. I'll be presenting a slide today showing the key timing points that we would

Re: [OMPI devel] socket usage

2006-10-25 Thread Ralph H Castain
I can't speak to the MPI layer, but for OpenRTE, each process holds one socket open to the HNP. Each process *has* all the socket connection info for all of the processes in its job, but I don't believe we actually open those sockets until we attempt to communicate with that process (needs to be
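The lazy-connection behaviour described here (connection info for every peer is known, but a socket is only opened on first contact) can be sketched as a per-peer table. This is a minimal illustration, assuming a hypothetical `peer_t` table and a stubbed `open_connection()` that only records the connection. Real sockets and the HNP link are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int rank;        /* peer's rank within the job */
    int connected;   /* 0 until the first message is sent */
} peer_t;

static int connections_opened = 0;

/* Stand-in for a real connect(): just counts connections that would be made. */
static void open_connection(peer_t *p)
{
    p->connected = 1;
    connections_opened++;
}

/* Send to a peer, opening its connection only on first use. */
void send_to_peer(peer_t *peers, size_t npeers, int rank)
{
    assert(rank >= 0 && (size_t)rank < npeers);
    peer_t *p = &peers[rank];
    if (!p->connected)
        open_connection(p);
    /* ... the message would be written on the now-open connection ... */
}

int connection_count(void)
{
    return connections_opened;
}
```

Sending twice to the same peer opens only one connection, so a process that talks to a handful of peers holds a handful of sockets rather than one per process in the job.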

Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Ralph H Castain
I don't see any new component, Adrian. There have been a few updates to the existing component, some of which might cause conflicts with the merge, but those shouldn't be too hard to resolve. As far as I know, the oob/tcp component is relatively stable. Brian is doing some work on it to enable us

Re: [OMPI devel] Build system changes

2006-11-30 Thread Ralph H Castain
Thanks Ralf! Much appreciated. On 11/30/06 8:33 AM, "Ralf Wildenhues" wrote: > * Ralph Castain wrote on Thu, Nov 30, 2006 at 04:12:16PM CET: >> That could be the problem. I had to update automake, and unfortunately >> Darwin Ports hasn't reached that level yet. So I had

[OMPI devel] OpenRTE telecon?

2007-01-04 Thread Ralph H Castain
Hi everyone Several of us were on a telecon yesterday and the topic of better coordinating the activities on OpenRTE came up. While things have percolated along reasonably well, the general feeling was that better, wider knowledge of current OpenRTE development activities and directions would

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/27/07 9:37 AM, "Greg Watson" wrote: > There are two more interfaces that have changed: > > 1. orte_rds.query() now takes a job id, whereas in 1.2b1 it didn't > take any arguments. I seem to remember that I call this to kick orted > into action, but I'm not sure of the

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/29/07 10:20 AM, "Greg Watson" <gwat...@lanl.gov> wrote: > > On Jan 29, 2007, at 6:47 AM, Ralph H Castain wrote: > >> >> >> >> On 1/27/07 9:37 AM, "Greg Watson" <gwat...@lanl.gov> wrote: >> >>> There a

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-30 Thread Ralph H Castain
anything >> other than a hostfile, we really don't have a way to do that right >> now. The >> ORTE 2.0 design allows for it, but we haven't implemented that yet - >> probably a few months away. >> >> Hope that helps >> Ralph >> >> >> O

Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level

2007-04-03 Thread Ralph H Castain
On 4/3/07 9:32 AM, "Li-Ta Lo" wrote: > On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote: > >> >> 2. I'm not sure what you mean by mapping MPI processes to "physical" >> processes, but I assume you mean how do we assign MPI ranks to processes on >> specific nodes. You
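Assigning MPI ranks to nodes can be illustrated with two common policies: fill each node's slots before moving to the next ("by slot"), or deal ranks out one node at a time ("by node"). A minimal sketch, assuming every node has the same number of slots. The function names are hypothetical and this is not the actual ORTE mapper.

```c
#include <assert.h>

/* By slot: ranks 0..slots-1 go to node 0, the next block to node 1, and so on. */
int node_for_rank_byslot(int rank, int nodes, int slots_per_node)
{
    assert(nodes > 0 && slots_per_node > 0);
    assert(rank >= 0 && rank < nodes * slots_per_node);
    return rank / slots_per_node;
}

/* By node: rank 0 to node 0, rank 1 to node 1, ..., wrapping around. */
int node_for_rank_bynode(int rank, int nodes, int slots_per_node)
{
    assert(nodes > 0 && slots_per_node > 0);
    assert(rank >= 0 && rank < nodes * slots_per_node);
    return rank % nodes;
}
```

With 2 nodes of 4 slots each, rank 3 lands on node 0 by slot but on node 1 by node.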

[OMPI devel] ORTE scalability issues

2007-04-16 Thread Ralph H Castain
Hello all I understand that several people are interested in the OpenRTE scalability issues - this is great! However, it appears we haven't done a very good job of circulating information about the identified causes of the current issues. In the hope of helping people to be productive in their

Re: [OMPI devel] ORTE scalability issues

2007-04-17 Thread Ralph H Castain
Thanks Christian. Actually, I was aware of that and should have clarified that these tests did *not* involve the IPv6 code. Ralph On 4/17/07 1:31 AM, "Christian Kauhaus" <ckauh...@minet.uni-jena.de> wrote: > Ralph H Castain <r...@lanl.gov>: >> even tho

[OMPI devel] Dumping process status etc.

2007-05-22 Thread Ralph H Castain
This came up in today's telecon and I promised to send this to George - however, it occurred to me that others may also want to know. If you want to dump info for debugging purposes, and if you can get into orterun/mpirun (e.g., via gdb), you can dump info on anything with the following (NOTE:

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
Just a quick glance (running out door) - it looks like Josh commented out a critical piece of code in the rds hostfile component at line 442. It loads the cell info into the name service so it can correctly respond to the query you cite below. You might try restoring that code - if you do, check

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
> wrote: > I haven't looked at this at all, but that line changed in r6813 which > was Aug. 2005 so I would guess the problem is elsewhere. However with > the recent ORTE changes maybe this is a side effect. > > -- Josh > > > On May 23, 2007, at 11:11 AM, Ralph H

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
Okay, this is now fixed as of r14732. Thanks (and apologies) to George for spotting it. Ralph On 5/23/07 9:57 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Actually, I think that is true (got back earlier than expected). The problem > really is that we had multip

Re: [OMPI devel] ORTE registry patch

2007-05-24 Thread Ralph H Castain
Thanks - I'll take a look at this (and the prior ones!) in the next couple of weeks when time permits and get back to you. Ralph On 5/23/07 1:11 PM, "George Bosilca" wrote: > Attached is another patch to the ORTE layer, more specifically the > replica. The idea is to

[OMPI devel] Why the HNP gets so big...

2007-05-31 Thread Ralph H Castain
Scaling tests over the last few months have all shown a behavior that has elicited significant comment: namely, that the HNP is observed to grow to multiple gigabytes in size for runs involving several thousand processes. This represents a peak size that declines to a much smaller footprint once

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
s (you'll have to look at the tests to >>>>>> see >>>>>> which ones make sense in the latter case). This will ensure that we >>>>>> have at >>>>>> least some degree of coverage. >>>>>> >>>>>>

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
ou do, remember to also > remove test/class/orte_bitmap.c > > Thanks, > > Tim > > > Ralph H Castain wrote: >> Sigh...is it really so much to ask that we at least run the tests in >> orte/test/system and orte/test/mpi using both mpirun and singleton (where >> a

[OMPI devel] Major commit to trunk

2007-06-12 Thread Ralph H Castain
Yo all I made a major commit to the trunk this morning (r15007) that merits general notification and some explanation. *** IMPORTANT NOTE *** One major impact of the commit you *may* notice is that support for several environments will be broken. This commit is known to break

Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
Actually, I was talking specifically about configuration at build time. I realize there are trade-offs here, and suspect we can find a common ground. The problem with using the options Jeff described is that they require knowledge on the part of the builder as to what environments have had their

Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
As I understood our original discussions, this would move responsibility for mapping rank to processor back into the orted - is that still true? Reason I ask is to again clarify for people if we are doing so as it (a) impacts those systems that don't use our orteds (e.g., will affinity still work

Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
al processes. Currently this component is the ODLS. Most of my > work is in the ODLS component so if you decide to eliminate the orteds > you mast, somehow, preserve the ODLS functionality. > > Sharon. > > > > -Original Message- > From: devel-boun...@open-mpi.org

Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Ralph H Castain
Interesting point - no reason why we couldn't use that functionality for this purpose. Good idea! On 7/11/07 5:38 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: > On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote: > >>> 2. It may be useful to have some h

[OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
Yo all I have a fairly significant change coming to the orte part of the code base that will require an autogen (sorry). I'll check it in late this afternoon (can't do it at night as it is on my office desktop). The commit will fix the singleton operations, including singleton comm_spawn. It

Re: [OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
the update. Thanks Ralph On 7/12/07 7:53 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Yo all > > I have a fairly significant change coming to the orte part of the code base > that will require an autogen (sorry). I'll check it in late this afternoon > (can't do it

Re: [OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
let me know of any problems. Ralph On 7/12/07 1:45 PM, "Ralph H Castain" <r...@lanl.gov> wrote: > Yo folks > > Several of us are stuck waiting for this commit to hit. Rather than wasting > the next several hours, I'm going to make the commit now. > > So plea

Re: [OMPI devel] [devel-core] Orte update

2007-07-12 Thread Ralph H Castain
y separate code paths. That's why we wound up where we are. Remember, the ODLS fork/exec's application procs, so it includes all kinds of stuff for that purpose. In this case, we are fork/exec'ing an orted - totally different informational requirements. On 7/12/07 2:17 PM, "Ralph H Castain

[OMPI devel] Major reduction in ORTE

2007-07-12 Thread Ralph H Castain
Yo all As we are discussing functional requirements for the upcoming 1.3 release, I was asked to provide a little info about what is going to be happening to the ORTE part of the code base over the remainder of this year. Short answer: there will be a major code revision to reduce ORTE to the

Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-13 Thread Ralph H Castain
> On Thu, Jul 12, 2007 at 03:04:01PM -0600, Ralph H Castain wrote: >> As always, any thoughts/suggestions are welcomed. >> > I hope Sharon's work on process affinity will be merged into the trunk > before this works begins and functionality will be preser

Re: [OMPI devel] Orte update

2007-07-13 Thread Ralph H Castain
On 7/13/07 7:22 AM, "Sven Stork" <st...@hlrs.de> wrote: > Hi Ralph, > > On Thursday 12 July 2007 15:53, Ralph H Castain wrote: >> Yo all >> >> I have a fairly significant change coming to the orte part of the code base >> that will

Re: [OMPI devel] Orte update

2007-07-16 Thread Ralph H Castain
Sigh - somehow, the fix slid out of that commit. I have now fixed it in r15437. Thanks Ralph On 7/16/07 6:11 AM, "Sven Stork" <st...@hlrs.de> wrote: > On Friday 13 July 2007 15:35, Ralph H Castain wrote: >> >> On 7/13/07 7:22 AM, "Sven Stork"

Re: [OMPI devel] iof / oob issues

2007-07-18 Thread Ralph H Castain
Just to further clarify the clarification... ;-) This condition has existed for the last several months. The root problem dates at least back into the 1.1 series. We chased the problem down to the iof_flush call in the odls when a process terminates in something like Jan or Feb this year, at

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Ralph H Castain
I believe that was fixed in r15405 - are you at that rev level? On 7/18/07 7:27 AM, "Gleb Natapov" wrote: > Hi, > > With current trunk LD_LIBRARY_PATH is not set for ranks that are > launched on the head node. This worked previously. > > -- > Gleb. >

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Ralph H Castain
It works for me in both cases, provided I give the fully qualified host name for your first example. In other words, these work: pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host localhost printenv | grep LD [pn1180961.lanl.gov:22021] [0.0] test of print_name OLDPWD=/Users/rhc/openmpi

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
it out. So the question is: why do you not have LD_LIBRARY_PATH set in your environment when you provide a different hostname? On 7/19/07 7:45 AM, "Gleb Natapov" <gl...@voltaire.com> wrote: > On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote: >>

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
he one > that works fine. The failing one is the first one, where > LD_LIBRARY_PATH is not provided. As Gleb indicate using localhost > make the problem vanish. > >george. > > On Jul 19, 2007, at 10:57 AM, Ralph H Castain wrote: > >> But it *does* provide an LD

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
Talked with Brian and we have identified the problem and a fix - will come in later today. Thanks Ralph On 7/19/07 9:24 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > You are correct - I misread the note. My bad. > > I'll look at how we might ensure the LD_LIBRAR

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Ralph H Castain
problem that will take some discussion - to occur separately from this chain. So some of the behavior you cited continues for the moment. Thanks Ralph On 7/19/07 9:39 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Talked with Brian and we have identified the problem and a fix - w

Re: [OMPI devel] Removal of cellid

2007-07-19 Thread Ralph H Castain
This change has finally been merged into the trunk as r15517. It will unfortunately require an autogen (sorry). Please let me know if you encounter any problems. As noted in the commit, I tried to catch all the places that required change, but cannot guarantee that I got all of them. Thanks

[OMPI devel] Hostfiles - yet again

2007-07-24 Thread Ralph H Castain
Yo all As you know, I am working on revamping the hostfile functionality to make it work better with managed environments (at the moment, the two are exclusive). The issue that we need to review is how we want the interaction to work, both for the initial launch and for comm_spawn. In talking

Re: [OMPI devel] Hostfiles - yet again

2007-07-26 Thread Ralph H Castain
Hi Aurelien Perhaps some bad news on this subject - see below. On 7/26/07 7:53 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > > > > On 7/26/07 7:33 AM, "rolf.vandeva...@sun.com" <rolf.vandeva...@sun.com> > wrote: > >> Aurelien Bout

Re: [OMPI devel] Hostfiles - yet again

2007-07-26 Thread Ralph H Castain
On 7/26/07 2:24 PM, "Aurelien Bouteiller" <boute...@cs.utk.edu> wrote: > Ralph H Castain wrote: >> After some investigation, I'm afraid that I have to report that this - as >> far as I understand what you are doing - may no longer work in Open MPI in >>

[OMPI devel] Status update

2007-08-10 Thread Ralph H Castain
Hello all This is just to let you know of a change in my status. I will be on vacation all of next week (Aug 13-17), and possibly part of the following week as well. I will not have my computer with me, so I will not be reading or responding to email for up to two weeks. When I return, I will

[OMPI devel] Trunk issue?

2007-08-27 Thread Ralph H Castain
Yo folks Just checked out a fresh copy of the trunk and tried to build it using my usual configure: ./configure --prefix=/Users/rhc/openmpi --with-devel-headers --disable-shared --enable-static --disable-mpi-f77 --disable-mpi-f90 --enable-mem-debug --without-memory-manager --enable-debug

Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer

2007-08-28 Thread Ralph H Castain
On 8/27/07 7:30 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: > Ralph, > > Ralph H Castain wrote: >> Just returned from vacation...sorry for delayed response > No Problem. Hope you had a good vacation :) And sorry for my super > delayed respo

[OMPI devel] [RFC] Exit without finalize

2007-09-06 Thread Ralph H Castain
WHAT: Decide upon how to handle MPI applications where one or more processes exit without calling MPI_Finalize WHY:Some applications can abort via an exit call instead of calling MPI_Abort when a library (or something else) calls exit. This situation is outside a
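One way a library can notice that a process is leaving without finalizing is an `atexit()` handler that checks a flag set by the finalize routine. A minimal sketch of that detection idea only, assuming hypothetical `lib_init()`/`lib_finalize()` functions. It does not reflect whatever policy the RFC settled on, and note that `_exit()` and fatal signals bypass `atexit()` handlers entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int lib_initialized = 0;
static int lib_finalized = 0;

/* Runs at normal process exit (return from main or a call to exit()). */
static void check_finalized_at_exit(void)
{
    if (lib_initialized && !lib_finalized)
        fprintf(stderr, "warning: process exited without calling lib_finalize()\n");
}

/* Hypothetical init: registers the exit check; returns 0 on success. */
int lib_init(void)
{
    assert(!lib_initialized);
    lib_initialized = 1;
    return atexit(check_finalized_at_exit);
}

void lib_finalize(void)
{
    lib_finalized = 1;
}

/* Nonzero if exiting now would be considered clean. */
int exited_cleanly(void)
{
    return !lib_initialized || lib_finalized;
}
```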

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Ralph H Castain
Sorry for delay - wasn't ignoring the issue. There are several fixes to this problem - ranging in order from least to most work: 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param. It won't affect anything on the backend because the daemon/procs don't use ssh. 2. include

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-08 Thread Ralph H Castain
wrote: >> >>> I'm curious what changed to make this a problem. How were we passing mca >>> param >>> from the base to the app before, and why did it change? >>> >>> I think that options 1 & 2 below are no good, since we, in general, allow >>

Re: [OMPI devel] ORTE process name,, nodeid..

2007-11-19 Thread Ralph H Castain
Yo Galen I'm not aware of any continuing discussion to totally remove the process name from ORTE - I believe we coalesced to redefining how the jobid was established to a procedure that doesn't require a name server. This hasn't come over to the trunk yet, but will in the next couple of months.

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
odels are used. (i.e., you exec locally > but it turns into a system-like invocation on the remote side). In > this case, I think you'll need to quote extended strings (e.g., those > containing spaces) for the non-local invocations and not quote it for > local invocations. >
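For the remote case, where the command line is re-parsed by a shell, a common way to protect strings containing spaces is to wrap them in single quotes and rewrite each embedded single quote as `'\''`. A minimal sketch, assuming a POSIX `sh` on the remote side. `shell_quote` is a hypothetical helper, not an Open MPI function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of s wrapped in single quotes for a POSIX shell.
 * Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
 * Returns NULL if allocation fails; the caller frees the result. */
char *shell_quote(const char *s)
{
    assert(s != NULL);
    size_t extra = 0;
    for (const char *p = s; *p; p++)
        if (*p == '\'')
            extra += 3;                       /* ' grows to 4 characters */
    char *out = malloc(strlen(s) + extra + 3); /* two quotes plus NUL */
    if (out == NULL)
        return NULL;
    char *o = out;
    *o++ = '\'';
    for (const char *p = s; *p; p++) {
        if (*p == '\'') {
            memcpy(o, "'\\''", 4);
            o += 4;
        } else {
            *o++ = *p;
        }
    }
    *o++ = '\'';
    *o = '\0';
    return out;
}
```

Local invocations that go straight to `exec` would pass the raw string instead, since no shell is there to strip the quotes.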

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: > On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote: > >> Unfortunately, it -is- a significant problem when passing the params >> on to >> the orteds, as Tim has eloquently pointed out in his original >>

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
On 11/19/07 8:19 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: > On Nov 19, 2007, at 10:15 AM, Ralph H Castain wrote: > >> Hmmm...okay, I get the message: if it is going to be fixed, then I >> have to >> fix it on my end. ;-) Which means it a

[OMPI devel] RTE issues: scalability & complexity

2007-12-04 Thread Ralph H Castain
Yo all As (I hope) many of you know, we are in a final phase of revamping ORTE to simplify the code, enhance scalability, and improve reliability. In working on this effort, we recently uncovered four issues that merit broader discussion (apologies in advance for verbosity). Although these

[OMPI devel] RTE Issue II: Interaction between the ROUTED and GRPCOMM frameworks

2007-12-04 Thread Ralph H Castain
II. Interaction between the ROUTED and GRPCOMM frameworks When we initially developed these two frameworks within the RTE, we envisioned them to operate totally independently of each other. Thus, the grpcomm collectives provide algorithms such as a binomial "xcast" that uses the daemons to
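A binomial broadcast such as the `xcast` mentioned here reaches all N daemons in about log2(N) rounds, because every daemon that already has the message forwards it in each subsequent round. A minimal sketch of how one rank could compute its parent and children in a binomial tree rooted at rank 0. This is the textbook construction, assumed for illustration, not the grpcomm source.

```c
#include <assert.h>

/* Parent of rank (> 0): clear the highest set bit of the rank. */
int binomial_parent(int rank)
{
    assert(rank > 0);
    int mask = 1;
    while (mask <= rank)
        mask <<= 1;               /* smallest power of two greater than rank */
    return rank - (mask >> 1);
}

/* Children of rank among n ranks: rank + 2^k for each power of two above
 * rank's highest set bit, as long as the result is < n.
 * Writes them to children[] and returns how many there are. */
int binomial_children(int rank, int n, int *children)
{
    assert(rank >= 0 && rank < n);
    int count = 0;
    int mask = 1;
    while (mask <= rank)
        mask <<= 1;
    for (; rank + mask < n; mask <<= 1)
        children[count++] = rank + mask;
    return count;
}
```

For 8 ranks, rank 0 sends to 1, 2 and 4; rank 1 forwards to 3 and 5; rank 2 to 6; rank 3 to 7.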

[OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities

2007-12-04 Thread Ralph H Castain
IV. RTE/MPI relative modex responsibilities The modex operation conducted during MPI_Init currently involves the exchange of two critical pieces of information: 1. the location (i.e., node) of each process in my job so I can determine who shares a node with me. This is subsequently used by the
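Once every process knows the node of every rank, determining who shares a node with me is a simple scan. A minimal sketch, assuming the exchanged location data has already been reduced to an integer node id per rank. The `node_of` array and `find_local_peers` name are hypothetical.

```c
#include <assert.h>

/* Write the ranks on the same node as my_rank into local[] (excluding
 * my_rank itself) and return how many were found. */
int find_local_peers(const int *node_of, int nprocs, int my_rank, int *local)
{
    assert(my_rank >= 0 && my_rank < nprocs);
    int count = 0;
    for (int r = 0; r < nprocs; r++)
        if (r != my_rank && node_of[r] == node_of[my_rank])
            local[count++] = r;
    return count;
}
```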

Re: [OMPI devel] RTE issue I. Support for non-MPI jobs

2007-12-05 Thread Ralph H Castain
On 12/5/07 7:58 AM, "rolf.vandeva...@sun.com" <rolf.vandeva...@sun.com> wrote: > Ralph H Castain wrote: > >> I. Support for non-MPI jobs >> Considerable complexity currently exists in ORTE because of the stipulation >> in our first requirements docum

Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities

2007-12-05 Thread Ralph H Castain
t it seems to me that we simply exchange a modex framework for an attribute framework (since each environment would have to get the attribute values in a different manner) - don't we? I have no problem with using attributes instead of the modex, but the issue appears to be the same either way - you sti

Re: [OMPI devel] RTE Issue II: Interaction between the ROUTED and GRPCOMM frameworks

2007-12-05 Thread Ralph H Castain
ot >> self contained. There is logic for routing algorithms outside of the >> components (for example, in orte/orted/orted_comm.c). So, if there are >> any overhauls planned I definitely think this needs to be cleaned up. >> >> Thanks, >> >> Tim >> >> Ra

Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities

2007-12-06 Thread Ralph H Castain
robably ignorance on my part, but it seems >> to me that we simply exchange a modex framework for an attribute framework >> (since each environment would have to get the attribute values in a >> different manner) - don't we? >> >> I have no problem with using attributes ins

Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities

2007-12-06 Thread Ralph H Castain
sn't it? Probably ignorance on my part, but it seems >> to me that we simply exchange a modex framework for an attribute framework >> (since each environment would have to get the attribute values in a >> different manner) - don't we? >> >> I have no problem with using attribute

Re: [OMPI devel] pointer_array

2007-12-17 Thread Ralph H Castain
It would require extensive modification as use of the pointer array has spread over a wide range of the code base. I would really appreciate it if we didn't do this right now. The differences are historic in nature - several years ago, the folks working on the OMPI layer needed to insert some
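For readers unfamiliar with the structure, a pointer array in this sense is a growable table of `void *` slots in which an insert returns the index it used, so callers can refer to objects by small integers. A minimal sketch of that general idea with hypothetical names. It does not reproduce either the OMPI or the ORTE implementation whose differences are discussed here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void **addr;   /* slot storage; NULL marks a free slot */
    int size;      /* number of allocated slots */
} ptr_array_t;

/* Store item in the lowest free slot, doubling the table when full.
 * Returns the slot index, or -1 on allocation failure. */
int ptr_array_add(ptr_array_t *a, void *item)
{
    assert(item != NULL);
    for (int i = 0; i < a->size; i++) {
        if (a->addr[i] == NULL) {
            a->addr[i] = item;
            return i;
        }
    }
    int new_size = a->size ? a->size * 2 : 4;
    void **grown = realloc(a->addr, (size_t)new_size * sizeof(void *));
    if (grown == NULL)
        return -1;
    for (int i = a->size; i < new_size; i++)
        grown[i] = NULL;
    int idx = a->size;
    a->addr = grown;
    a->size = new_size;
    a->addr[idx] = item;
    return idx;
}

void *ptr_array_get(const ptr_array_t *a, int idx)
{
    return (idx >= 0 && idx < a->size) ? a->addr[idx] : NULL;
}

void ptr_array_remove(ptr_array_t *a, int idx)
{
    if (idx >= 0 && idx < a->size)
        a->addr[idx] = NULL;   /* slot becomes reusable by the next add */
}
```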

Re: [OMPI devel] pointer_array

2007-12-17 Thread Ralph H Castain
e you leave orte alone, > and just move the ompi pointer array implementation down into opal. > That way, any new code can make use of it from opal, and only orte > would need to be adjusted later, after Ralph is done with his changes. > > On Dec 17, 2007 9:18 AM, Ralph H Casta

Re: [OMPI devel] RTE issues: scalability & complexity

2007-12-19 Thread Ralph H Castain
Hi all There was very little response to these notes, so I'm moving forward as per the initial mailings. Here is what was concluded - holler if you have a comment. Ralph On 12/4/07 8:09 AM, "Ralph H Castain" <r...@lanl.gov> wrote: > Yo all > > As (I hope) many of yo

Re: [OMPI devel] dropping a pls module into an Open MPI build

2008-01-23 Thread Ralph H Castain
Hi Dean Had to ponder this for a while. I'm not entirely sure of the source of the problem, but one suspicion has to do with the name of the module. Open MPI ships with a module named "rsh" in the PLS framework. The MCA uses the module name in its loading process. If you insert another module

Re: [OMPI devel] trunk breakage

2008-01-23 Thread Ralph H Castain
Still no joy... pn1180961:~/openmpi/trunk rhc$ make install > /dev/null vprotocol_pessimist_sender_based.c: In function 'sb_mmap_file_open': vprotocol_pessimist_sender_based.c:43: error: 'errno' undeclared (first use in this function) vprotocol_pessimist_sender_based.c:43: error: (Each undeclared
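An `'errno' undeclared` error like the one above is the usual symptom of a source file that uses `errno` without including `<errno.h>`. A minimal illustration of the fix, using a hypothetical `open_or_report()` helper rather than the actual `vprotocol_pessimist_sender_based.c` code.

```c
#include <assert.h>
#include <errno.h>    /* declares errno; omitting it yields "'errno' undeclared" */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Open path read-only. On failure, print strerror(errno) and return the
 * errno value; on success store the descriptor in *fd_out and return 0. */
int open_or_report(const char *path, int *fd_out)
{
    assert(path != NULL && fd_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int err = errno;   /* save it before other calls can overwrite it */
        fprintf(stderr, "open %s: %s\n", path, strerror(err));
        return err;
    }
    *fd_out = fd;
    return 0;
}
```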

Re: [OMPI devel] dropping a pls module into an Open MPI build

2008-01-24 Thread Ralph H Castain
Appreciate the clarification. I am unaware of anyone attempting that procedure in the past, but I'm not terribly surprised to hear it would encounter problems and/or fail. Given the myriad of configuration options in the code base, it would seem almost miraculous that you could either (a) hit the
