On Jun 28, 2011, at 10:04 AM, David E Hudak wrote:
>
> On Jun 24, 2011, at 6:32 PM, Igor Peshansky wrote:
>
>> Hi, Dave,
>>
>> Since 2.1.2, X10 comes with what we call a multi-vm Java backend
>> implementation. It runs with the sockets transport. For sockets, the
>> Java runner, "x10", uses the same launcher as the C++ backend
>> ("X10Launcher"), so one can run it on, e.g., a Linux cluster by
>> setting X10_NPLACES and X10_HOSTFILE, just like you would for a C++
>> launch.
>
> I am using X10 2.2.0. I have figured out how to launch multi-vm across
> multiple nodes of the OSC cluster using the x10 command with X10_NPLACES and
> X10_HOSTFILE.
>
> However, for our cluster we are always concerned about process cleanup. So,
> as a test, I ran a four-node job running MontyPi (good old MontyPi!), running
> one java instance per node. I was logged into the nodes and ran top on each.
> Sure enough, one 'java' process per node.
>
> Then, I ran it again, but this time I manually killed one of the JVMs, one
> of the launcher "children". A sibling on one node exited, a sibling on
> another node did not and the parent did not exit. I was hoping that an exit
> of any process would cause the entire set of processes to shut down. Looking
> at the source code for launcher.cc, it looks like there are hooks in there
> (Launcher::handleDeadChild and Launcher::handleDeadParent), but I don't know
> the expected behavior.
>
> Typically, we rely on the Torque resource manager to start processes. There
> is a torque daemon on each node of the job. The daemons fork child processes
> to do the work. If one child exits, the daemon is notified and in turn
> notifies the other nodes' daemons. If the X10 launcher is the supported
> process management mechanism, it would be a good idea to have it work with
> resource managers. Open MPI used to have a standalone project called Open
> Run Time Environment (Open RTE or ORTE) which may be an interesting fit.
> However, at this point, I think I am going to do something in my job scripts
> to manually shut down processes after an exit as a workaround.
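
For reference, a minimal sketch of the kind of job-script workaround described above: start a set of processes and, as soon as any one of them exits, shut the rest down. This is only an illustration under assumptions, not anything the X10 launcher provides. The name run_group and its parameters are hypothetical, and it assumes the job script itself starts each per-node command (for example as a local ssh client). Whether terminating a local ssh client also stops the remote JVM depends on how ssh is invoked, so this covers only the local half of the cleanup.

```python
import subprocess
import sys
import time


def run_group(commands, poll_interval=0.1, grace=2.0):
    """Start every command; when any one exits, terminate the rest.

    Hypothetical helper for illustration only. Returns the exit codes
    in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        # Poll until at least one process has exited.
        while all(p.poll() is None for p in procs):
            time.sleep(poll_interval)
    finally:
        # Ask the survivors to stop.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        # Give them a shared grace period, then force-kill stragglers.
        deadline = time.monotonic() + grace
        for p in procs:
            try:
                p.wait(timeout=max(0.0, deadline - time.monotonic()))
            except subprocess.TimeoutExpired:
                p.kill()
                p.wait()
    return [p.returncode for p in procs]


if __name__ == "__main__":
    # Stand-ins for the per-node launch commands: one exits at once,
    # the other would otherwise keep running for 30 seconds.
    quick = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_group([quick, slow]))
```

The cleanup sits in a finally block so that an interrupt of the wrapper itself (for example when the batch system cancels the job) still tears down whatever was started.
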
Oops! Hang on, just found X10_LAUNCHER_SSH. I'll let you know if I can get that to work...

Dave

>
>>
>> As far as I know, you cannot use the MPI transport with multi-vm.
>
> Understood. This is one advantage of the MPI transport: I use mpiexec to
> launch processes via Torque and the cleanup works correctly.
>
> Thanks,
> Dave
>
>> Igor
>>
>> On Fri, Jun 24, 2011 at 4:47 PM, David E Hudak <dhu...@osc.edu> wrote:
>>> Hi All,
>>>
>>> I have a colleague with a Java implementation of a genetic algorithm. He
>>> is interested in parallelizing the application for both multicore and
>>> multinode execution.
>>>
>>> In the initial implementation, there is a set of classes for specifying
>>> fitness functions, expressing genes and implementing gene manipulations.
>>> There is a top-level simulation object that runs the various number of
>>> generations. My plan was to try using the Java Native Interface to use the
>>> existing Java classes for organisms and fitness, and rewrite the top-level
>>> simulation in X10.
>>>
>>> I have been evaluating X10 for purely numeric applications on our cluster
>>> (C++ back end, MPI runtime and mpiexec as a process launcher). I believe I
>>> read somewhere that the Java Native Interface requires the Java back end.
>>> In that case, I'd need to make sure we could run the sockets runtime and
>>> whatever process launcher we have for Java (x10run?).
>>>
>>> Anyone have any advice?
>>>
>>> Thanks,
>>> Dave
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense..
>> http://p.sf.net/sfu/splunk-d2d-c1
>> _______________________________________________
>> X10-users mailing list
>> X10-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/x10-users
>
> ---
> David E. Hudak, Ph.D. dhu...@osc.edu
> Program Director, HPC Engineering
> Ohio Supercomputer Center
> http://www.osc.edu

---
David E. Hudak, Ph.D. dhu...@osc.edu
Program Director, HPC Engineering
Ohio Supercomputer Center
http://www.osc.edu