On Jun 28, 2011, at 10:04 AM, David E Hudak wrote:

> 
> On Jun 24, 2011, at 6:32 PM, Igor Peshansky wrote:
> 
>> Hi, Dave,
>> 
>> Since 2.1.2, X10 comes with what we call a multi-vm Java backend
>> implementation.  It runs with the sockets transport.  For sockets, the
>> Java runner, "x10", uses the same launcher as the C++ backend
>> ("X10Launcher"), so one can run it on, e.g., a Linux cluster by
>> setting X10_NPLACES and X10_HOSTFILE, just like you would for a C++
>> launch.
> 
> I am using X10 2.2.0.  I have figured out how to launch multi-vm across 
> multiple nodes of the OSC cluster using the x10 command with X10_NPLACES and 
> X10_HOSTFILE.
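For anyone following along, here is a minimal sketch of that launch written as a small Python wrapper. The environment variable names X10_NPLACES and X10_HOSTFILE and the "x10" command are the ones mentioned above; the helper name build_launch, the hostfile path "hosts.txt" and the idea of wrapping the call in Python are my own assumptions for illustration, not part of the X10 distribution.

```python
import os


def build_launch(nplaces, hostfile, main_class, base_env=None):
    """Return (argv, env) for starting an X10 program with the 'x10' runner.

    build_launch is a hypothetical helper; only the two X10_* variable
    names and the 'x10' command come from the thread.
    """
    # Copy the environment so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["X10_NPLACES"] = str(nplaces)  # number of places to start
    env["X10_HOSTFILE"] = hostfile     # file naming the hosts to use
    return ["x10", main_class], env


# Example use (not executed here, since it needs an X10 installation):
#   import subprocess
#   argv, env = build_launch(4, "hosts.txt", "MontyPi")
#   subprocess.run(argv, env=env, check=True)
```

Keeping the command and environment construction separate from the actual launch makes it easy to print or log exactly what would be run before submitting a job.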
> 
> However, for our cluster we are always concerned about process cleanup.  So, 
> as a test, I ran a four-node job running MontyPi (good old MontyPi!), with 
> one Java instance per node.  I logged into the nodes and ran top on each.  
> Sure enough, one 'java' process per node.  
> 
> Then, I ran it again, but this time I manually killed one of the JVMs (one 
> of the launcher "children").  A sibling on one node exited, but a sibling on 
> another node did not, and neither did the parent.  I was hoping that an exit 
> of any process would cause the entire set of processes to shut down.  Looking 
> at the source code for launcher.cc, it looks like there are hooks in there 
> (Launcher::handleDeadChild and Launcher::handleDeadParent), but I don't know 
> the expected behavior.  
> 
> Typically, we rely on the Torque resource manager to start processes.  There 
> is a torque daemon on each node of the job.  The daemons fork child processes 
> to do the work.  If one child exits, the daemon is notified and in turn 
> notifies the other nodes' daemons.  If the X10 launcher is the supported 
> process management mechanism, it would be a good idea to have it work with 
> resource managers.  Open MPI used to have a standalone project called Open 
> Run Time Environment (Open RTE or ORTE) which may be an interesting fit.  
> However, at this point, I think I am going to do something in my job scripts 
> to manually shut down processes after an exit as a workaround.  
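A rough sketch of the kind of workaround I have in mind, in Python: start a set of processes, and as soon as any one of them exits, stop the rest. This is only an illustration under my own assumptions. The function name run_all_or_nothing and its parameters are made up, it is not how the X10 launcher or Torque behaves, and it only controls processes started locally. If the commands are remote shells, terminating the local side may not reach the remote JVMs.

```python
import subprocess
import time


def run_all_or_nothing(commands, poll_interval=0.2, grace=5.0):
    """Start every command; when any one exits, stop all the others.

    Returns the exit codes in the same order as `commands`.
    (Hypothetical helper, not part of X10 or Torque.)
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        # Poll until at least one process has exited.
        while all(p.poll() is None for p in procs):
            time.sleep(poll_interval)
    finally:
        # Ask the survivors to exit politely first.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        # Give them a shared grace period, then force-kill stragglers.
        deadline = time.monotonic() + grace
        for p in procs:
            try:
                p.wait(timeout=max(0.0, deadline - time.monotonic()))
            except subprocess.TimeoutExpired:
                p.kill()
                p.wait()
    return [p.returncode for p in procs]
```

The cleanup sits in a finally block so that the remaining processes are also stopped if the wrapper itself is interrupted, which is the case a resource manager cares about most.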

Oops!  Hang on, just found X10_LAUNCHER_SSH.  I'll let you know if I can get 
that to work...

Dave

> 
>> 
>> As far as I know, you cannot use the MPI transport with multi-vm.
> 
> Understood.  This is one advantage of the MPI transport:  I use mpiexec to 
> launch processes via Torque and the cleanup works correctly.  
> 
> Thanks,
> Dave
> 
>>      Igor
>> 
>> On Fri, Jun 24, 2011 at 4:47 PM, David E Hudak <dhu...@osc.edu> wrote:
>>> Hi All,
>>> 
>>> I have a colleague with a Java implementation of a genetic algorithm.  He 
>>> is interested in parallelizing the application for both multicore and 
>>> multinode execution.
>>> 
>>> In the initial implementation, there is a set of classes for specifying 
>>> fitness functions, expressing genes and implementing gene manipulations.  
>>> There is a top-level simulation object that runs the generations.  My plan 
>>> was to try using the Java native interface to use the existing Java classes 
>>> for organisms and fitness, and rewrite the top-level simulation in X10.
>>> 
>>> I have been evaluating X10 for purely numeric applications on our cluster 
>>> (C++ back end, MPI runtime and mpiexec as a process launcher).  I believe I 
>>> read somewhere that the Java native interface requires the Java back end.  
>>> In that case, I'd need to make sure we could run the sockets runtime and 
>>> whatever process launcher we have for Java (x10run?).
>>> 
>>> Anyone have any advice?
>>> 
>>> Thanks,
>>> Dave
>> 
>> _______________________________________________
>> X10-users mailing list
>> X10-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/x10-users
> 
> ---
> David E. Hudak, Ph.D.          dhu...@osc.edu
> Program Director, HPC Engineering
> Ohio Supercomputer Center
> http://www.osc.edu

---
David E. Hudak, Ph.D.          dhu...@osc.edu
Program Director, HPC Engineering
Ohio Supercomputer Center
http://www.osc.edu