The launchers are spawned in a binary tree structure.  Launcher 0 spawns
launchers 1 and 2, as well as runtime 0.  Launcher 1 spawns 3 and 4, and so
on.  The launchers are linked together in the same way.  So each launcher
has one parent (the launcher that spawned it, except for the initial
launcher, which has no parent), a local runtime child, and potentially
other launcher children, depending on the number of places.
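
To make the numbering concrete, here is a minimal sketch of that tree.  It
assumes the "and so on" continues with the usual binary-heap indexing
(launcher i spawns 2i+1 and 2i+2), which matches the 0 -> 1,2 and 1 -> 3,4
cases above.  The function names are made up for illustration and are not
taken from launcher.cc.

```python
def parent(i):
    """Parent launcher of launcher i, or None for the initial launcher.

    Assumes heap-style numbering: launcher i spawns 2*i+1 and 2*i+2.
    """
    return None if i == 0 else (i - 1) // 2


def children(i, nplaces):
    """Launcher children of launcher i when there are nplaces places."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < nplaces]


if __name__ == "__main__":
    nplaces = 5
    for i in range(nplaces):
        print("launcher", i, "parent", parent(i), "children", children(i, nplaces))
```

With five places, launcher 1 has launcher children 3 and 4, launcher 2 has
none, and every launcher additionally has its own local runtime.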

The normal shutdown sequence of an X10 program begins when place 0 tells
all other places that it's time to shut down.  This is a standard message
that gets processed in the Java code, not in the launchers, and it causes
the runtime to exit.  Meanwhile, the launchers are responsible for
forwarding stderr and stdout from all of the runtimes out to the console of
the initial launcher process.  When a runtime exits, its launcher detects
this, but it may have other launcher children that have not yet completed
and are still sending output along.  So the launcher does not exit yet.

A launcher will exit when all children have exited, because then it knows
that there is nothing left to forward.  This causes the tree structure to
collapse from the leaves, down to the initial launcher process.  In the
event that a launcher detects that a parent has exited (an error
condition), it will kill the local runtime and also exit.  Child launchers
see this and do the same, collapsing the tree structure from the point of
failure out to the leaves.
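
As a rough illustration of the dead-parent rule, the sketch below lists
which launchers go down when one launcher is killed.  It is a toy model,
not the launcher.cc logic: it assumes heap-style numbering (launcher i
spawns 2i+1 and 2i+2), and the names are hypothetical.  It also assumes the
parent of the killed launcher stays up, since it still has a live runtime.

```python
from collections import deque


def launcher_children(i, nplaces):
    """Launcher children of launcher i (assumed heap-style numbering)."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < nplaces]


def launchers_lost(killed, nplaces):
    """Launchers that exit when launcher `killed` dies.

    Each child sees its parent disappear, kills its local runtime and
    exits, so the failure spreads from `killed` out to the leaves.
    Returned in the order the failure reaches them.
    """
    order = []
    queue = deque([killed])
    while queue:
        i = queue.popleft()
        order.append(i)
        queue.extend(launcher_children(i, nplaces))
    return order


if __name__ == "__main__":
    print(launchers_lost(1, 7))  # subtree rooted at launcher 1
    print(launchers_lost(0, 7))  # the whole tree
```

In this model, killing launcher 1 of seven takes down 1, 3 and 4 and
leaves 0, 2, 5 and 6 running, while killing launcher 0 takes down
everything.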

Currently, the launcher has no way of distinguishing a runtime that has
exited because it's shutting down from one that has been killed
unnaturally.  This is why the rest of the processes are still around after
you kill one of the runtimes.
If you were to kill the initial launcher process instead of some child
somewhere, then the entire tree would come down cleanly.

This could be improved, if needed.

- Ben



                                                                                
                                                     
From:    David E Hudak <dhu...@osc.edu>
To:      Mailing list for users of the X10 programming language <x10-users@lists.sourceforge.net>
Date:    06/28/2011 10:05
Subject: Re: [X10-users] Java native interface and runtime?






On Jun 24, 2011, at 6:32 PM, Igor Peshansky wrote:

> Hi, Dave,
>
> Since 2.1.2, X10 comes with what we call a multi-vm Java backend
> implementation.  It runs with the sockets transport.  For sockets, the
> Java runner, "x10", uses the same launcher as the C++ backend
> ("X10Launcher"), so one can run it on, e.g., a Linux cluster by
> setting X10_NPLACES and X10_HOSTFILE, just like you would for a C++
> launch.

I am using X10 2.2.0.  I have figured out how to launch multi-vm across
multiple nodes of the OSC cluster using the x10 command with X10_NPLACES
and X10_HOSTFILE.

However, for our cluster we are always concerned about process cleanup.
So, as a test, I ran a four-node job running MontyPi (good old MontyPi!),
running one Java instance per node.  I was logged into the nodes and ran
top on each.  Sure enough, one 'java' process per node.

Then, I ran it again, but this time I manually killed one of the JVMs -
one of the launcher "children".  A sibling on one node exited, a sibling on
another node did not, and the parent did not exit.  I was hoping that an
exit of any process would cause the entire set of processes to shut down.
Looking at the source code for launcher.cc, it looks like there are hooks
in there (Launcher::handleDeadChild and Launcher::handleDeadParent), but I
don't know the expected behavior.

Typically, we rely on the Torque resource manager to start processes.
There is a torque daemon on each node of the job.  The daemons fork child
processes to do the work.  If one child exits, the daemon is notified and
in turn notifies the other nodes' daemons.  If the X10 launcher is the
supported process management mechanism, it would be a good idea to have it
work with resource managers.  Open MPI used to have a standalone project
called Open Run Time Environment (Open RTE or ORTE) which may be an
interesting fit.  However, at this point, I think I am going to do
something in my job scripts to manually shut down processes after an exit
as a workaround.

>
> As far as I know, you cannot use the MPI transport with multi-vm.

Understood.  This is one advantage of the MPI transport:  I use mpiexec to
launch processes via Torque and the cleanup works correctly.

Thanks,
Dave

>                Igor
>
> On Fri, Jun 24, 2011 at 4:47 PM, David E Hudak <dhu...@osc.edu> wrote:
>> Hi All,
>>
>> I have a colleague with a Java implementation of a genetic algorithm.
>> He is interested in parallelizing the application for both multicore and
>> multinode execution.
>>
>> In the initial implementation, there is a set of classes for specifying
>> fitness functions, expressing genes and implementing gene manipulations.
>> There is a top-level simulation object that runs the given number of
>> generations.  My plan was to try using the Java native interface to use
>> the existing Java classes for organisms and fitness, and rewrite the top
>> level simulation in X10.
>>
>> I have been evaluating X10 for purely numeric applications on our
>> cluster (C++ back end, MPI runtime and mpiexec as a process launcher).  I
>> believe I read somewhere that the Java native interface requires the Java
>> back end.  In that case, I'd need to make sure we could run the sockets
>> runtime and whatever process launcher we have for Java (x10run?).
>>
>> Anyone have any advice?
>>
>> Thanks,
>> Dave
>
>
------------------------------------------------------------------------------

> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense..
> http://p.sf.net/sfu/splunk-d2d-c1
> _______________________________________________
> X10-users mailing list
> X10-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/x10-users

---
David E. Hudak, Ph.D.          dhu...@osc.edu
Program Director, HPC Engineering
Ohio Supercomputer Center
http://www.osc.edu













