Appreciate your patience! I'm somewhat limited this week because I'm
traveling to our HQ, so I don't have access to my usual test cluster. I'll be
better placed to complete the implementation once I get home.

For now, some quick thoughts:

1. stdout/stderr: yes, I just need to "register" orte-submit as the one to
receive those from the submitted job.

2. That one is going to be a tad trickier, but is resolvable. May take me a
little longer to fix.

3. dang - I thought I had it doing so. I'll look to find the issue. I
suspect it's just a case of correctly setting the return code of
orte-submit.

I'd welcome the help! Let me ponder the best way to point you to the areas
needing work, and we can kick around off-list about who does what.

Great to hear this is working with your tool so quickly!!
Ralph


On Tue, Feb 3, 2015 at 3:49 PM, Mark Santcroos <mark.santcr...@rutgers.edu>
wrote:

> Hi Ralph,
>
> Besides the items in the other mail, I have three more items that would
> need resolving at some point.
>
> 1. STDOUT/STDERR currently go to the orte-dvm console.
>    I'm sure this is not a fundamental limitation.
>    Even if getting the information to the orte-submit instance would be
>    problematic, the orte-dvm writing this to a file per session would be
>    good enough too.
>
> 2. Failing applications currently tear down the dvm.
>    Ideally that would not be the case, and this would be handled in
>    relation to item (3).
>    Possibly this needs to be configurable, if others would like to see
>    different behaviour.
>
> 3. orte-submit doesn't return the exit code of the application.
>
> To be clear, I realise the current implementation is a proof of concept,
> so these are no complaints, just wishes of where I hope to see this going!
>
> FWIW: these items might require less intricate knowledge of OMPI in
> general, so with some pointers/guidance I can probably work on these myself
> if needed.
>
> Cheers,
>
> Mark
>
> ps. I did a quick-and-dirty integration with our own tool and the ORTE
> abstraction maps like a charm!
>     (https://github.com/radical-cybertools/radical.pilot/commit/2d36e886081bf8531097edfc95ada1826257e460)
>
> > On 03 Feb 2015, at 20:38 , Mark Santcroos <mark.santcr...@rutgers.edu> wrote:
> >
> > Hi Ralph,
> >
> >> On 03 Feb 2015, at 16:28 , Ralph Castain <r...@open-mpi.org> wrote:
> >> I think I fixed some of the handshake issues - please give it another try.
> >> You should see orte-submit properly shut down upon completion,
> >
> > Indeed, it works on my laptop now! Great!
> > It feels quite fast too, for short tasks :-)
> >
> >> and orte-dvm properly shut down when sent the terminate cmd.
> >
> > ACK. This also works as expected.
> >
> >> I was able to cleanly run MPI jobs on my laptop.
> >
> > Do you also see the following errors/warnings on the dvm side?
> >
> > [netbook:28324] [[20896,0],0] Releasing job data for [INVALID]
> > Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI mark@netbook Distribution, ident: 1.9.0a1, repo rev: dev-811-g7299cc3, Unreleased developer copy, 132)
> > [netbook:28324] sess_dir_finalize: proc session dir does not exist
> > [netbook:28324] [[20896,0],0] dvm: job [20896,20] has completed
> > [netbook:28324] [[20896,0],0] Releasing job data for [20896,20]
> >
> > The "INVALID" message is there for every "submit"; the sess_dir_finalize
> > message appears once per instance/core.
> > Is that something to worry about that needs fixing, or is it a
> > configuration issue?
> >
> > I haven't been able to test on Edison because of maintenance
> > (today+tomorrow), so I will report on that later.
> >
> > Thanks again!
> >
> > Mark
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/02/26282.php
>
