Appreciate your patience! I'm somewhat limited this week by being on travel to our HQ, so I don't have access to my usual test cluster. I'll be better situated to complete the implementation once I get home.
For now, some quick thoughts:

1. stdout/stderr: yes, I just need to "register" orte-submit as the one to receive those from the submitted job.

2. That one is going to be a tad trickier, but is resolvable. May take me a little longer to fix.

3. dang - I thought I had it doing so. I'll look to find the issue. I suspect it's just a case of correctly setting the return code of orte-submit.

I'd welcome the help! Let me ponder the best way to point you to the areas needing work, and we can kick around off-list about who does what.

Great to hear this is working with your tool so quickly!!

Ralph

On Tue, Feb 3, 2015 at 3:49 PM, Mark Santcroos <mark.santcr...@rutgers.edu> wrote:

> Hi Ralph,
>
> Besides the items in the other mail, I have three more items that would
> need resolving at some point.
>
> 1. STDOUT/STDERR currently go to the orte-dvm console.
>    I'm sure this is not a fundamental limitation.
>    Even if getting the information to the orte-submit instance would be
>    problematic, the orte-dvm writing this to a file per session would be good
>    enough too.
>
> 2. Failing applications currently tear down the dvm.
>    Ideally that would not be the case, and this would be handled in
>    relation to item (3).
>    Possibly this needs to be configurable, if others would like to see
>    different behaviour.
>
> 3. orte-submit doesn't return the exit code of the application.
>
> To be clear, I realise the current implementation is a proof of concept,
> so these are no complaints, just wishes of where I hope to see this going!
>
> FWIW: these items might require less intricate knowledge of OMPI in
> general, so with some pointers/guidance I can probably work on these myself
> if needed.
>
> Cheers,
>
> Mark
>
> ps. I did a quick-and-dirty integration with our own tool and the ORTE
> abstraction maps like a charm!
> (
> https://github.com/radical-cybertools/radical.pilot/commit/2d36e886081bf8531097edfc95ada1826257e460
> )
>
> > On 03 Feb 2015, at 20:38 , Mark Santcroos <mark.santcr...@rutgers.edu> wrote:
> >
> > Hi Ralph,
> >
> >> On 03 Feb 2015, at 16:28 , Ralph Castain <r...@open-mpi.org> wrote:
> >> I think I fixed some of the handshake issues - please give it another try.
> >> You should see orte-submit properly shutdown upon completion,
> >
> > Indeed, it works on my laptop now! Great!
> > It feels quite fast too, for short tasks :-)
> >
> >> and orte-dvm properly shutdown when sent the terminate cmd.
> >
> > ACK. This also works as expected.
> >
> >> I was able to cleanly run MPI jobs on my laptop.
> >
> > Do you also see the following errors/warnings on the dvm side?
> >
> > [netbook:28324] [[20896,0],0] Releasing job data for [INVALID]
> > Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI mark@netbook Distribution, ident: 1.9.0a1, repo rev: dev-811-g7299cc3, Unreleased developer copy, 132)
> > [netbook:28324] sess_dir_finalize: proc session dir does not exist
> > [netbook:28324] [[20896,0],0] dvm: job [20896,20] has completed
> > [netbook:28324] [[20896,0],0] Releasing job data for [20896,20]
> >
> > The "INVALID" message is there for every "submit", the sess_dir_finalize
> > exists per instance/core.
> > Is that something to worry about, that needs fixing or is that a
> > configuration issue?
> >
> > I haven't been able to test on Edison because of maintenance
> > (today+tomorrow), so I will report on that later.
> >
> > Thanks again!
> >
> > Mark
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/02/26282.php
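
On item 3 in the thread above (orte-submit not returning the exit code of the application): the behaviour being asked for is the usual launcher convention, where the submitting tool exits with whatever status the launched application exited with. The following is only a minimal Python sketch of that convention using a local child process. It is not ORTE code and says nothing about how orte-submit is implemented; `run_and_propagate` is a hypothetical helper name, and the mapping of a signal death to 128 + signal number is the common shell convention, assumed here for illustration.

```python
import subprocess
import sys


def run_and_propagate(cmd):
    """Run cmd, wait for it to finish, and return the exit code that a
    wrapper/launcher should pass on to its own caller.

    Hypothetical helper for illustration only; not part of ORTE or Open MPI.
    """
    if not cmd:
        # Nothing to launch: report a usage error instead of raising.
        return 2

    result = subprocess.run(cmd)

    # On POSIX, a negative returncode -N means the child was killed by
    # signal N. Shells conventionally report that as 128 + N, so do the same.
    if result.returncode < 0:
        return 128 - result.returncode

    # Normal termination: hand the application's own exit code back unchanged.
    return result.returncode
```

A wrapper script would then end with `sys.exit(run_and_propagate(sys.argv[1:]))`, so that a caller checking the wrapper's status sees the status of the application it launched.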