Hi Russel --

(Make sure you set the environment variable X10_NTHREADS to specify the 
number of initial workers in the X10 computation. See the link 
http://dist.codehaus.org/x10/documentation/languagespec/x10-env.pdf I 
posted earlier.)
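
For example, to start four worker threads in the single place launched
by runx10 (bash-like shell assumed; binary name taken from your runs
below):

|>  X10_NTHREADS=4 runx10 Pi_X10_Parallel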

There does appear to be a problem in the C++ backend with MPI runtime 
and setting X10_NTHREADS > 1.
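
(In the meantime, keeping X10_NTHREADS=1 and scaling out with MPI
processes instead, e.g. something like

|>  X10_NTHREADS=1 mpirun -n 8 Pi_X10_Parallel

should sidestep that particular problem -- a suggested workaround, not
yet verified.)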

The team will look at that ... more later today (after they wake up :-))

Best,
Vijay
On 11/5/2010 6:09 AM, Russel Winder wrote:
> On Wed, 2010-11-03 at 17:36 -0400, Igor Peshansky wrote:
> [ . . . ]
>> You cannot run multi-place code with runx10 -- that script only supports
>> launching one place at a time.
> OK, thanks -- I hadn't realized that.  I should have read INSTALL.txt;
> it is most clear on the topic!
>
>> With 2.1.0 using the pgas_sockets transport, you can either use mpiexec
>> from an existing MPI installation to launch multiple processes, or use
>> the manager/launcher combination (the process is described in the
>> INSTALL.txt file in the distribution).
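> (For reference, the first of those is just a normal MPI launch, e.g.
> -- with a process count and the binary name from my runs below:
>
> |>  mpiexec -n 4 Pi_X10_Parallel
>
> -- which is effectively what I do further down via mpirun.)
>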
> Anecdotal evidence from personal experience only, but it has to be said
> that Chapel is far simpler than X10 for getting small embarrassingly
> parallel codes working in parallel -- compile an executable and it
> executes in parallel if it can.  In my case that is a twin-Xeon
> workstation, so eight cores.  Also, Chapel (as with C++ with pthreads,
> C++ with MPI, and C++0x with futures or asynchronous function calls)
> only uses the cores it is told to, not all the ones available.
>
> Having said that, I want to be constructive, even though this might
> seem adverse criticism -- which I guess it is, but be assured it is
> presented in an attempt to be helpful and/or to find out what I am
> doing wrong.
>
> I tried X10 both with the manager/launcher combination and as an MPI
> executable.  Whilst launcher is happy to be invoked simply as launcher
> with it on the path, manager is not: manager insists on being called
> via an absolute path, which is a serious irritant (illustrated after
> this paragraph).  I also got some weird results, so I switched to
> treating my executable as an MPI executable -- which gives a fairer
> comparison to the other versions I have anyway.  However, that gave
> somewhat surprising results, along with all cores being used at 100%
> even when not required.
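>
> To illustrate the manager irritation (arguments elided; the absolute
> path is purely illustrative):
>
> |>  manager ...                        <- fails, even with manager on the path
> |>  /opt/x10/bin/manager ...           <- works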
>
> I appreciate that my "application" (calculating Pi by quadrature) is
> trivial and a microbenchmark, but it is generally a good example for
> presentations: it is small in code size, easy to compile and run, and
> -- most importantly -- gives something of a handle on scaling.  Knowing
> that the X10 team has focused on the C++ back end rather than the Java
> back end, I am ignoring the JVM-based execution for now.  (Also, JIT
> warm-up is a serious issue for JVM-based microbenchmarks at the best of
> times, so it is difficult to get a fair comparison for this code on
> that platform anyway -- though Scala performs very well.)
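>
> For concreteness, the quadrature is the usual midpoint-rule sum of
> 4/(1 + x^2) over [0, 1].  A minimal sequential C++ sketch of its shape
> (a sketch only, not my actual code) is:
>
>   #include <cstdio>
>
>   int main() {
>       const long n = 1000000000L;      // the iteration count in the runs below
>       const double delta = 1.0 / n;
>       double sum = 0.0;
>       for (long i = 1; i <= n; ++i) {  // midpoint of each subinterval
>           const double x = (i - 0.5) * delta;
>           sum += 1.0 / (1.0 + x * x);
>       }
>       std::printf("pi = %.15f\n", 4.0 * delta * sum);  // integral of 4/(1+x^2) is pi
>       return 0;
>   }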
>
> |>  runx10 Pi_X10_Parallel
> ==== X10 Parallel pi = 3.141592651589971
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.597293312
> ==== X10 Parallel task count = 1
>
> ==== X10 Parallel pi = 3.141592648389901
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.597158734000001
> ==== X10 Parallel task count = 2
>
> ==== X10 Parallel pi = 3.141592629477861
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.599376457
> ==== X10 Parallel task count = 8
>
> ==== X10 Parallel pi = 3.141592554064001
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.600300205
> ==== X10 Parallel task count = 32
>
> |>  mpirun -n 8 Pi_X10_Parallel
> ==== X10 Parallel pi = 3.141592651589971
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 18.120695501
> ==== X10 Parallel task count = 1
>
> ==== X10 Parallel pi = 3.141592648389901
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 8.659647902
> ==== X10 Parallel task count = 2
>
> ==== X10 Parallel pi = 3.141592629477861
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 4.364804941
> ==== X10 Parallel task count = 8
>
> ==== X10 Parallel pi = 3.141592554064
> ==== X10 Parallel iteration count = 1000000000
> ==== X10 Parallel elapse = 2.225576907
> ==== X10 Parallel task count = 32
>
> The runx10 version runs at a speed comparable to all the other C++ and
> Chapel sequential and single-core parallel versions.  However, the X10
> MPI-executed version seems to be half the speed of all the other C++
> and Chapel parallel versions using either threads or MPI.  So I guess
> my question is: why is X10 half the speed of C++/MPI?  Also: does it
> really need to use all the cycles for the infrastructure when C++/MPI
> does not?
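>
> (For comparison, the C++/MPI version I am measuring against is
> essentially the textbook local-sum-plus-reduce -- along these lines, as
> a sketch rather than my exact code:
>
>   #include <mpi.h>
>   #include <cstdio>
>
>   int main(int argc, char** argv) {
>       MPI_Init(&argc, &argv);
>       int rank, size;
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>       const long n = 1000000000L;
>       const double delta = 1.0 / n;
>       double localSum = 0.0;
>       for (long i = rank + 1; i <= n; i += size) {  // cyclic split of iterations
>           const double x = (i - 0.5) * delta;
>           localSum += 1.0 / (1.0 + x * x);
>       }
>       double sum = 0.0;
>       MPI_Reduce(&localSum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
>       if (rank == 0) std::printf("pi = %.15f\n", 4.0 * delta * sum);
>       MPI_Finalize();
>       return 0;
>   }
>
> Nothing there keeps idle cores busy, which is why the sustained 100%
> usage on all cores in the X10 runs stands out.)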
>
> If the answers to these questions are of the RTFM type, feel free just
> to point me at the FM :-)
>
> Thanks.
>


