The launchers are spawned in a binary tree structure. Launcher 0 spawns launchers 1 and 2, as well as runtime 0. Launcher 1 spawns 3 and 4, and so on. The launchers are linked together in the same way. So each launcher has one parent (the launcher that spawned it, except for the initial launcher, which has no parent), a local runtime child, and potentially other launcher children, depending on the number of places.
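As a rough illustration of the numbering above, here is a minimal sketch of the launcher tree. It assumes the pattern continues as "launcher i spawns launchers 2i+1 and 2i+2" (extrapolated from 0 -> 1, 2 and 1 -> 3, 4), that there is one launcher per place, and that launcher i owns runtime i. The function name launcher_tree is made up for this example and is not taken from the X10 sources.

```python
def launcher_tree(nplaces):
    """Map each launcher id to (parent id or None, list of child launcher ids).

    Assumption: launcher i spawns launchers 2*i+1 and 2*i+2 when those ids
    are below the number of places. This only extrapolates the pattern
    described in the message; it is not copied from the real launcher.
    """
    tree = {}
    for i in range(nplaces):
        # The initial launcher (0) has no parent.
        parent = None if i == 0 else (i - 1) // 2
        # Launcher children exist only if enough places were requested.
        children = [c for c in (2 * i + 1, 2 * i + 2) if c < nplaces]
        tree[i] = (parent, children)
    return tree


if __name__ == "__main__":
    for launcher, (parent, children) in launcher_tree(5).items():
        # Each launcher also has exactly one local runtime child (assumed id = launcher id).
        print(f"launcher {launcher}: parent={parent}, "
              f"launcher children={children}, runtime={launcher}")
```

With 5 places, launchers 2, 3 and 4 are leaves: they have a parent and a local runtime, but no launcher children.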
The normal shutdown sequence of an X10 program begins when place 0 tells all other places that it's time to shut down. This is a standard message that gets processed in the Java code, not in the launchers, and it causes the runtime to exit.

Now the launchers are responsible for forwarding the stderr and stdout from all of the runtimes out to the console for the initial launcher process. When a runtime exits, the launcher detects this, but it may have other launcher children that have not yet completed and are sending messages along. So the launcher does not exit. A launcher will exit when all children have exited, because then it knows that there is nothing left to forward. This causes the tree structure to collapse from the leaves, down to the initial launcher process.

In the event that a launcher detects that a parent has exited (an error condition), it will kill the local runtime and also exit. Child launchers see this and do the same, collapsing the tree structure from the point of failure out to the leaves.

Currently, the launcher has no way of identifying a runtime that has exited because it's shutting down, vs. one that has been killed unnaturally. This is why the rest of the processes are still around when you do this kill. If you were to kill the initial launcher process instead of some child somewhere, then the entire tree would come down cleanly.

This could be improved, if needed.

- Ben

From: David E Hudak <dhu...@osc.edu>
To: Mailing list for users of the X10 programming language <x10-users@lists.sourceforge.net>
Date: 06/28/2011 10:05
Subject: Re: [X10-users] Java native interface and runtime?

On Jun 24, 2011, at 6:32 PM, Igor Peshansky wrote:
> Hi, Dave,
>
> Since 2.1.2, X10 comes with what we call a multi-vm Java backend
> implementation. It runs with the sockets transport.
> For sockets, the Java runner, "x10", uses the same launcher as the
> C++ backend ("X10Launcher"), so one can run it on, e.g., a Linux
> cluster by setting X10_NPLACES and X10_HOSTFILE, just like you would
> for a C++ launch.

I am using X10 2.2.0. I have figured out how to launch multi-vm across multiple nodes of the OSC cluster using the x10 command with X10_NPLACES and X10_HOSTFILE.

However, for our cluster we are always concerned about process cleanup. So, as a test, I ran a four-node job running MontyPi (good old MontyPi!), running one java instance per node. I was logged into the nodes and ran top on each. Sure enough, one 'java' process per node.

Then, I ran it again, but this time I manually killed one of the JVMs - one of the launcher "children". A sibling on one node exited, a sibling on another node did not and the parent did not exit. I was hoping that an exit of any process would cause the entire set of processes to shut down. Looking at the source code for launcher.cc, it looks like there are hooks in there (Launcher::handleDeadChild and Launcher::handleDeadParent), but I don't know the expected behavior.

Typically, we rely on the Torque resource manager to start processes. There is a torque daemon on each node of the job. The daemons fork child processes to do the work. If one child exits, the daemon is notified and in turn notifies the other nodes' daemons.

If the X10 launcher is the supported process management mechanism, it would be a good idea to have it work with resource managers. Open MPI used to have a standalone project called Open Run Time Environment (Open RTE or ORTE) which may be an interesting fit. However, at this point, I think I am going to do something in my job scripts to manually shut down processes after an exit as a workaround.

>
> As far as I know, you cannot use the MPI transport with multi-vm.

Understood.
This is one advantage of the MPI transport: I use mpiexec to launch processes via Torque and the cleanup works correctly.

Thanks,
Dave

> Igor
>
> On Fri, Jun 24, 2011 at 4:47 PM, David E Hudak <dhu...@osc.edu> wrote:
>> Hi All,
>>
>> I have a colleague with a Java implementation of a genetic algorithm. He is interested in parallelizing the application for both multicore and multinode execution.
>>
>> In the initial implementation, there is a set of classes for specifying fitness functions, expressing genes and implementing gene manipulations. There is a top-level simulation object that runs the various number of generations. My plan was to try using the java native interface to use the existing Java classes for organisms and fitness, and rewrite the top level simulation in X10.
>>
>> I have been evaluating X10 for purely numeric applications on our cluster (C++ back end, MPI runtime and mpiexec as a process launcher). I believe I read somewhere that the Java native interface requires the Java back end. In that case, I'd need to make sure we could run the sockets runtime and whatever process launcher we have for java (x10run?).
>>
>> Anyone have any advice?
>>
>> Thanks,
>> Dave
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2d-c1
> _______________________________________________
> X10-users mailing list
> X10-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/x10-users

---
David E. Hudak, Ph.D.
dhu...@osc.edu
Program Director, HPC Engineering
Ohio Supercomputer Center
http://www.osc.edu

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users