Re: [OMPI users] independent startup of orted and orterun

2015-02-04 Thread Ralph Castain
We're going to take this off-list so we quit peppering you all with the development...will report back when we have something more concrete should anyone else be interested. On Wed, Feb 4, 2015 at 2:22 AM, Mark Santcroos wrote: > Ok great, sounds like a plan! > > >

Re: [OMPI users] independent startup of orted and orterun

2015-02-04 Thread Mark Santcroos
Ok great, sounds like a plan! > On 04 Feb 2015, at 2:53 , Ralph Castain wrote: > > Appreciate your patience! I'm somewhat limited this week by being on travel > to our HQ, so I don't have access to my usual test cluster. I'll be better > situated to complete the

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Ralph Castain
Appreciate your patience! I'm somewhat limited this week by being on travel to our HQ, so I don't have access to my usual test cluster. I'll be better situated to complete the implementation once I get home. For now, some quick thoughts: 1. stdout/stderr: yes, I just need to "register"

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Ralph Castain
Hmmm, no, I wasn't seeing those warnings/errors, but I only ran one submit job. I'll investigate. On Tue, Feb 3, 2015 at 11:38 AM, Mark Santcroos wrote: > Hi Ralph, > > > On 03 Feb 2015, at 16:28 , Ralph Castain wrote: > > I think I fixed some

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Mark Santcroos
Hi Ralph, Besides the items in the other mail, I have three more items that would need resolving at some point. 1. STDOUT/STDERR currently go to the orte-dvm console. I'm sure this is not a fundamental limitation. Even if getting the information to the orte-submit instance would be

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Mark Santcroos
Hi Ralph, > On 03 Feb 2015, at 16:28 , Ralph Castain wrote: > I think I fixed some of the handshake issues - please give it another try. > You should see orte-submit properly shut down upon completion, Indeed, it works on my laptop now! Great! It feels quite fast too, for sort

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Ralph Castain
I think I fixed some of the handshake issues - please give it another try. You should see orte-submit properly shut down upon completion, and orte-dvm properly shut down when sent the terminate cmd. I was able to cleanly run MPI jobs on my laptop. On Mon, Feb 2, 2015 at 10:44 PM, Mark Santcroos

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Mark Santcroos
On 03 Feb 2015, at 0:20 , Ralph Castain wrote: > Okay, thanks - I'll get on it tonight. Looks like a fairly simple bug, so > hopefully I'll have it ironed out tonight. Sorry, I was not completely accurate. Let me be more specific: * The orte-submit does not return though, so

Re: [OMPI users] independent startup of orted and orterun

2015-02-02 Thread Ralph Castain
Okay, thanks - I'll get on it tonight. Looks like a fairly simple bug, so hopefully I'll have it ironed out tonight. On Mon, Feb 2, 2015 at 1:40 PM, Mark Santcroos wrote: > FWIW: I see similar behaviour on my laptop (OS X Yosemite 10.10.2). > > > On 02 Feb 2015, at

Re: [OMPI users] independent startup of orted and orterun

2015-02-02 Thread Mark Santcroos
FWIW: I see similar behaviour on my laptop (OS X Yosemite 10.10.2). > On 02 Feb 2015, at 21:26 , Mark Santcroos wrote: > > Ok, let me check on some other systems too though, it might be Cray specific. > > >> On 02 Feb 2015, at 19:07 , Ralph Castain

Re: [OMPI users] independent startup of orted and orterun

2015-02-02 Thread Mark Santcroos
Ok, let me check on some other systems too though, it might be Cray specific. > On 02 Feb 2015, at 19:07 , Ralph Castain wrote: > > Yikes - looks like a bug crept into there at the last minute. I actually had > it working just fine - not sure what happened here. I'm on

Re: [OMPI users] independent startup of orted and orterun

2015-02-02 Thread Ralph Castain
Yikes - looks like a bug crept into there at the last minute. I actually had it working just fine - not sure what happened here. I'm on travel this week, but I'll try to dig into this a bit and spot the issue. Thanks! Ralph On Mon, Feb 2, 2015 at 3:50 AM, Mark Santcroos

Re: [OMPI users] independent startup of orted and orterun

2015-02-02 Thread Mark Santcroos
Hi Ralph, Great, the semantics look exactly like what I need! (To aid in debugging I added "--debug-devel" to orte-dvm.c, which was useful to detect and get past some initial bumps) The current status: * I can submit applications and see their output on the orte-dvm console * The following

Re: [OMPI users] independent startup of orted and orterun

2015-02-01 Thread Ralph Castain
I have pushed the changes to the OMPI master. It took a little bit more than I had hoped due to the changes to the ORTE infrastructure, but hopefully this will meet your needs. It consists of two new tools: (a) orte-dvm - starts the virtual machine by launching a daemon on every node of the

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Mark Santcroos
Hi Ralph, All makes sense! Thanks a lot! Looking forward to your modifications. Please don't hesitate to throw things with rough edges at me! Cheers, Mark > On 21 Jan 2015, at 23:21 , Ralph Castain wrote: > > Let me address your questions up here so you don’t have to

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
Let me address your questions up here so you don’t have to scan thru the entire note. PMIx rationale: PMI has been around for a long time, primarily used inside the MPI library implementations to perform wireup. It provided a link from the MPI library to the local resource manager. However, as

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Mark Santcroos
Hi Ralph, > On 21 Jan 2015, at 21:20 , Ralph Castain wrote: > > Hi Mark > >> On Jan 21, 2015, at 11:21 AM, Mark Santcroos >> wrote: >> >> Hi Ralph, all, >> >> To give some background, I'm part of the RADICAL-Pilot [1] development team. >>

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
Hi Mark > On Jan 21, 2015, at 11:21 AM, Mark Santcroos > wrote: > > Hi Ralph, all, > > To give some background, I'm part of the RADICAL-Pilot [1] development team. > RADICAL-Pilot is a Pilot System, an implementation of the Pilot (job) > concept, which in its

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Mark Santcroos
Hi Ralph, all, To give some background, I'm part of the RADICAL-Pilot [1] development team. RADICAL-Pilot is a Pilot System, an implementation of the Pilot (job) concept, which in its most minimal form takes care of the decoupling of resource acquisition and workload management. So instead

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
Ah, I see - actually, MR+ is supported under ORTE. You just need to add the --staged option to the cmd line. You then run your job like this: mpirun --staged -n X my-mapper : -n Y my-reducer We’ll wire together the output of the mappers to the input of the reducers and run them in an

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Mark Santcroos
Hi Ralph, I figured that MR+ had to require something similar too (assuming it actually does use ORCM). Let me study that wiki a bit and I will come back with a description of my use-case. Thanks! Mark > On 21 Jan 2015, at 17:18 , Ralph Castain wrote: > > Sorry -

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
Sorry - should have included the link to the ORCM project: https://github.com/open-mpi/orcm/wiki > On Jan 21, 2015, at 8:16 AM, Ralph Castain wrote: > > Theoretically, yes - see the ORCM project, which basically does what you ask. >

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
Theoretically, yes - see the ORCM project, which basically does what you ask. The launch system in there isn’t fully implemented yet, but the fundamental idea is valid and supports some range of capability. We used to have a cmd line option in ORTE for what you propose - it wouldn’t be too