Hi Russel --

(Make sure you set the environment variable X10_NTHREADS to specify the number of initial worker threads in the X10 computation. See http://dist.codehaus.org/x10/documentation/languagespec/x10-env.pdf, which I posted earlier.)
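For example, on an eight-core machine (illustrative only; POSIX shell assumed, using the executable name from Russel's timings below):

|> export X10_NTHREADS=8
|> runx10 Pi_X10_Parallel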
There does appear to be a problem in the C++ backend with the MPI runtime and setting X10_NTHREADS > 1. The team will look at that .. more later today (after they wake up :-))

Best,
Vijay

On 11/5/2010 6:09 AM, Russel Winder wrote:
> On Wed, 2010-11-03 at 17:36 -0400, Igor Peshansky wrote:
> [ . . . ]
>> You cannot run multi-place code with runx10 -- that script only
>> supports launching one place at a time.
>
> OK, thanks -- I hadn't realized that. I should have read INSTALL.txt;
> it is most clear on the topic!
>
>> With 2.1.0 using the pgas_sockets transport, you can either use
>> mpiexec from an existing MPI installation to launch multiple
>> processes, or use the manager/launcher combination (the process is
>> described in the INSTALL.txt file in the distribution).
>
> Anecdotal evidence of personal experience only, but it has to be said
> that Chapel makes it far simpler than X10 to get small embarrassingly
> parallel codes working in parallel: compile an executable and it
> executes in parallel if it can. In my case that is a twin-Xeon
> workstation, so eight cores. Also, Chapel (like C++ with pthreads, C++
> with MPI, and C++0x with futures or asynchronous function calls) only
> uses the cores it is told to, not all the ones available.
>
> Having said that, I need to be constructive. Even though this might
> seem adverse criticism -- which I guess it is -- be assured it is
> presented in an attempt to be helpful and/or to find out what I am
> doing wrong.
>
> I tried X10 both with the manager/launcher combination and as an MPI
> executable. Whilst launcher is happy to be invoked as launcher when it
> is on the path, manager is not: manager insists on being called with
> an absolute path, which is a serious irritant. I also got some weird
> results, so I switched to treating my executable as an MPI executable
> -- which gives a fairer comparison to the other versions I have
> anyway. However, that gave somewhat surprising results, along with all
> cores being used at 100% even when not required.
>
> I appreciate that my "application" (calculating Pi by quadrature) is
> trivial and a microbenchmark, but it is generally a good example for
> presentations as it is small in code, easy to compile and run, and --
> most importantly -- gives something of a handle on scaling. Knowing
> that the X10 team has focused on the C++ back end rather than the Java
> back end, I am ignoring the JVM-based execution for now. (JIT warm-up
> is a serious issue for JVM-based microbenchmarks at the best of times,
> so it is difficult to get a fair comparison for this code on that
> platform anyway -- though Scala performs very well.)
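(Aside: the kernel being timed computes pi = 4 * integral from 0 to 1 of dx/(1+x^2), approximated by the midpoint rule. Below is a minimal single-place sketch in X10 2.1-era syntax -- an illustration under assumptions, not Russel's actual code; the class name PiQuadSketch, the fixed task count, and the interval count are invented for the example.)

    import x10.io.Console;

    public class PiQuadSketch {
        public static def main(args:Array[String](1)) {
            val n = 100000000;                      // quadrature intervals
            val tasks = 8;                          // asyncs within one place
            val delta = 1.0 / n;                    // interval width
            val partial = new Array[Double](tasks); // per-task partial sums
            finish for (var t:Int = 0; t < tasks; t++) {
                val id = t;                         // capture loop index as a val
                async {
                    var s:Double = 0.0;
                    // each task strides over its share of the intervals
                    for (var i:Int = id; i < n; i += tasks) {
                        val x = (i + 0.5) * delta;  // midpoint of interval i
                        s += 1.0 / (1.0 + x * x);
                    }
                    partial(id) = s;                // distinct slot per task: no races
                }
            }
            var pi:Double = 0.0;
            for (var t:Int = 0; t < tasks; t++) pi += partial(t);
            Console.OUT.println("pi = " + (4.0 * delta * pi));
        }
    }

Giving each async its own slot in the partial array avoids an atomic accumulation; the sum is reduced sequentially after the finish. Russel's timings for a kernel of this kind follow.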
>
> |> runx10 Pi_X10_Parallel
> ==== X10 Parallel pi = 3.141592651589971
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.597293312
> ==== X10 Parallel task count = 1
>
> ==== X10 Parallel pi = 3.141592648389901
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.597158734000001
> ==== X10 Parallel task count = 2
>
> ==== X10 Parallel pi = 3.141592629477861
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.599376457
> ==== X10 Parallel task count = 8
>
> ==== X10 Parallel pi = 3.141592554064001
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.600300205
> ==== X10 Parallel task count = 32
>
> |> mpirun -n 8 Pi_X10_Parallel
> ==== X10 Parallel pi = 3.141592651589971
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 18.120695501
> ==== X10 Parallel task count = 1
>
> ==== X10 Parallel pi = 3.141592648389901
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.659647902
> ==== X10 Parallel task count = 2
>
> ==== X10 Parallel pi = 3.141592629477861
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 4.364804941
> ==== X10 Parallel task count = 8
>
> ==== X10 Parallel pi = 3.141592554064
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 2.225576907
> ==== X10 Parallel task count = 32
>
> The runx10 version runs at a speed comparable to all the other C++ and
> Chapel sequential and single-core parallel versions. However, the X10
> MPI-executed version seems to be half the speed of all the other C++
> and Chapel parallel versions using either threads or MPI. So I guess
> my question is: why is X10 half the speed of C++/MPI? Also: does it
> really need to use all the cycles for the infrastructure when C++/MPI
> does not?
>
> If the answers to these questions are of the RTFM type, feel free to
> just point me at the FM :-)
>
> Thanks.
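(Aside: combining the two knobs discussed in this thread -- X10_NTHREADS for the worker threads within each place, and the MPI launcher's process count for the number of places -- one might try something like the following on an eight-core machine. Illustrative only: whether X10_NTHREADS > 1 behaves correctly under the MPI runtime is exactly the problem noted at the top of this message, and some MPI launchers must be told to forward environment variables to the ranks, e.g. via Open MPI's -x flag.)

|> X10_NTHREADS=1 mpirun -n 8 Pi_X10_Parallel
|> X10_NTHREADS=4 mpirun -n 2 Pi_X10_Parallel

The first line runs eight places with one worker thread each; the second runs two places with four worker threads each.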