Hi Marko - unfortunately, attachments are also removed on this mailing list. You can send an email directly to me if you would like me to look at it.
You can specify which host each place runs on most easily by using X10_HOSTFILE instead of X10_HOSTLIST. If you are running with 48 places, you can create a text file with 48 lines, with a hostname on each line. Place 0 will run on the host specified on the first line, place 1 on the second line, and so on. This can also be done with a hostlist, but it will be a very long command line.

Both the hostfile and hostlist wrap, so when you specify only 4 nodes as per your email below, you should be getting 12 places on each (node001 should have places 0, 4, 8, 12, etc.).

Depending on your program, you may get better performance by running with 4 places instead of 48 and using async to increase the parallelism within each place. You may also want to explicitly set the X10_NTHREADS environment variable to 1 if you're using 48 places, or 12 if using 4 places.

Others on this mailing list may have additional comments on this.

- Ben

From: Marko Kobal <marko.ko...@arctur.si>
To: "x10-users@lists.sourceforge.net" <x10-users@lists.sourceforge.net>
Date: 08/09/2011 12:35
Subject: Re: [X10-users] runing on multiple places (cluster) with sockets RT implementation

Hi,

Me again ;)

I have an example (N-body) written to execute in parallel. It works just fine and scales well on more cores. However, when I try to run it on more than one machine, more exactly on 4 nodes, I can see that X10 does not properly distribute the load to the nodes. I have nodes with 2 Intel processors, 6 cores each; that makes 12 cores per node, 48 cores per 4 nodes:

    export X10_HOSTLIST=node001,node002,node003,node004
    export X10_NPLACES=48

I compiled for the sockets RT implementation:

    # x10c++ -x10rt sockets -o nbody.parallel.sockets nbody.parallel.x10

When I execute the program, I can see that processes are spawned across the 4 nodes; however, the load is not distributed evenly. On some nodes there are more than 12 processes running, on some fewer than 12.
This is obviously not good, as some nodes are overloaded (so processing is not optimal) and some are underloaded (which is not what one would wish for). See the screenshot from my monitoring software: (sorry, the picture was embedded, which is obviously not supported by the mailing list; I've put it into an attachment now)

The usage for X10Launcher says:

    X10Launcher [-np NUM_OF_PLACES] [-hostlist HOST1,HOST2,ETC] [-hostfile FILENAME] COMMAND_TO_LAUNCH [ARG1 ARG2 ...]

so there is no parameter to set "processes per node". I would expect something similar to the "-perhost" parameter in the MPI world. Is there any way to achieve this with the X10 sockets RT implementation?

Thanks for help!

Kind regards,
Marko Kobal

_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users
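
To make the hostfile suggestion in the reply concrete, here is a minimal Python sketch of the two layouts discussed in this thread. It only models the mapping as described above (place i on line i, with wrapping when there are fewer lines than places); it does not call X10 itself. The function names and the choice to group 12 consecutive places per node are illustrative assumptions, not part of X10.

```python
from collections import Counter

def wrapped_assignment(hosts, nplaces):
    """Model the wrap behaviour described in the reply:
    place i runs on hosts[i % len(hosts)]."""
    return [hosts[i % len(hosts)] for i in range(nplaces)]

def blocked_hostfile_lines(hosts, places_per_host):
    """Build one hostfile line per place, grouping consecutive
    places on the same host (an illustrative layout choice)."""
    return [host for host in hosts for _ in range(places_per_host)]

hosts = ["node001", "node002", "node003", "node004"]

# A 4-entry hostlist with 48 places: places 0, 4, 8, ... share node001.
wrapped = wrapped_assignment(hosts, 48)

# A 48-line hostfile: places 0-11 on node001, 12-23 on node002, and so on.
blocked = blocked_hostfile_lines(hosts, 12)
hostfile_text = "\n".join(blocked) + "\n"

# Under this model, both layouts give every node exactly 12 places.
print(Counter(wrapped))
print(Counter(blocked))
```

Saving hostfile_text to a file (the name is up to you) and pointing X10_HOSTFILE at it, with X10_NPLACES set to 48, would give the explicit one-line-per-place layout described in the reply.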