Hi all,
  I've been working on a small thread ring benchmark and have versions
written in MPI, UPC, and X10 thus far. Unfortunately, my X10 code is
much slower than the others (two orders of magnitude slower) and I'm
not entirely sure why. Essentially, each process just sends a message to
the next process, and the final process sends the message back to the
first process (forming a ring).
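
To make the pattern concrete, here is a rough sketch of the ring in Python. It is only an illustration of the communication pattern, not the X10 code linked below: threads and queues stand in for processes and messages, and the name ring_pass and its parameters are made up for this example.

```python
import queue
import threading


def ring_pass(num_workers, num_laps):
    """Pass a single token around a ring of threads and return its hop count."""
    # One inbox per worker; worker i always sends to worker (i + 1) % num_workers,
    # so the last worker sends back to worker 0 and closes the ring.
    inboxes = [queue.Queue() for _ in range(num_workers)]

    def worker(rank):
        next_rank = (rank + 1) % num_workers
        for _ in range(num_laps):
            hops = inboxes[rank].get()          # block until the token arrives
            inboxes[next_rank].put(hops + 1)    # forward it to the next worker

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()

    inboxes[0].put(0)  # inject the token at worker 0 with zero hops so far

    for t in threads:
        t.join()

    # After the final lap the last worker has put the token back in inbox 0.
    return inboxes[0].get()


if __name__ == "__main__":
    print(ring_pass(num_workers=4, num_laps=3))
```

Each worker blocks on its own inbox, so only one message is in flight at a time, which is the same serialized behaviour the benchmark measures.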

I'd like to make the X10 code comparable to the MPI and UPC versions and
would love a second pair of eyes to look it over. It's less than 100
lines of code. I've posted it here for those who are interested:

http://pastebin.com/dYPCwh4G

I know everyone is busy, but any help is much appreciated! I know X10 can
pass around 100 messages between 64 processors on two nodes with the MPI
backend faster than 9700 seconds (the MPI version is doing it in 4
seconds), but I'm just not sure what I'm doing wrong.

Thanks!


_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users
