On Wed, 2010-11-03 at 17:36 -0400, Igor Peshansky wrote:
[ . . . ]
> You cannot run multi-place code with runx10 -- that script only supports
> launching one place at a time.

OK, thanks -- I hadn't realized that.  I should have read INSTALL.txt;
it is most clear on the topic!

> With 2.1.0 using the pgas_sockets transport, you can either use mpiexec
> from an existing MPI installation to launch multiple processes, or use
> the manager/launcher combination (the process is described in the
> INSTALL.txt file in the distribution).

Anecdotal evidence from personal experience only, but it has to be
said: Chapel is far simpler than X10 for getting small embarrassingly
parallel codes working in parallel -- you compile an executable and it
executes in parallel if it can.  In my case that is a twin-Xeon
workstation, so eight cores.  Also, Chapel (like C++ with pthreads, C++
with MPI, and C++0x with futures or asynchronous function calls) only
uses the cores it is told to, not all the ones available.

Having said that, I need to be constructive, even though this might
seem like adverse criticism -- which I guess it is, but be assured it
is presented in an attempt to be helpful and/or to find out what I am
doing wrong.

I tried X10 both with the manager/launcher combination and as an MPI
executable.  Whilst launcher is happy to be called as launcher with it
on the path, manager is not: manager insists on being called with an
absolute path, which is a serious irritant.  Also I got some weird
results, so I switched to treating my executable as an MPI executable
-- which gives a fairer comparison to the other versions I have anyway.
However, that gave somewhat surprising results, along with all cores
being used at 100% even when not required.

I appreciate that my "application" (calculating Pi by quadrature) is
trivial and a microbenchmark, but it is generally a good example for
presentations as it is small in code, easy to compile and run, and --
most importantly -- gives something of a handle on scaling.  Knowing
that the X10 team has focused on the C++ back end rather than the Java
back end, I am ignoring the JVM-based execution for now.  (Also, JIT
warm-up is a serious issue for JVM-based microbenchmarks at the best of
times, so it is difficult to get a fair comparison for this code on
that platform anyway -- though Scala performs very well.)
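
For the record, the computation at issue is just the midpoint rule
applied to 4/(1+x^2) over [0,1].  Here is a minimal C++/MPI sketch of
the sort of thing each version does -- illustrative only, not the exact
source of any of the timed versions below:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        const long n = 1000000000L;  // iteration count matching the runs below
        const double delta = 1.0 / n;
        double localSum = 0.0;
        // Each rank takes a strided share of the n midpoint evaluations.
        for (long i = rank + 1; i <= n; i += size) {
            const double x = (i - 0.5) * delta;
            localSum += 4.0 / (1.0 + x * x);
        }
        // Combine the partial sums on rank 0 and scale by the strip width.
        double sum = 0.0;
        MPI_Reduce(&localSum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) {
            std::printf("pi = %.15f\n", delta * sum);
        }
        MPI_Finalize();
        return 0;
    }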

|> runx10 Pi_X10_Parallel
==== X10 Parallel pi = 3.141592651589971
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 8.597293312
==== X10 Parallel task count = 1

==== X10 Parallel pi = 3.141592648389901
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 8.597158734000001
==== X10 Parallel task count = 2

==== X10 Parallel pi = 3.141592629477861
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 8.599376457
==== X10 Parallel task count = 8

==== X10 Parallel pi = 3.141592554064001
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 8.600300205
==== X10 Parallel task count = 32

|> mpirun -n 8 Pi_X10_Parallel
==== X10 Parallel pi = 3.141592651589971
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 18.120695501
==== X10 Parallel task count = 1

==== X10 Parallel pi = 3.141592648389901
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 8.659647902
==== X10 Parallel task count = 2

==== X10 Parallel pi = 3.141592629477861
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 4.364804941
==== X10 Parallel task count = 8

==== X10 Parallel pi = 3.141592554064
==== X10 Parallel iteration count = 1000000000
==== X10 Parallel elapse = 2.225576907
==== X10 Parallel task count = 32

The runx10 version runs at a speed comparable to all the other C++ and
Chapel sequential and single-core parallel versions.  However, the
MPI-executed X10 version seems to be half the speed of all the other
parallel C++ and Chapel versions using either threads or MPI.  So I
guess my questions are: why is X10 half the speed of C++/MPI?  And does
it really need to use all the cycles for the infrastructure when
C++/MPI does not?

If the answers to these questions are of the RTFM type, feel free to
just point me at the FM :-)

Thanks. 

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@russel.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


