Hi all, I've been working on a small thread ring benchmark and have versions written in MPI, UPC, and X10 so far. Unfortunately, my X10 code is much slower than the others (two orders of magnitude slower) and I'm not entirely sure why. Essentially, each process just sends a message to the next process, and the final process sends the message back to the first process (forming a ring).
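To make the communication pattern concrete, here is a minimal sketch of the ring in Python, with threads and queues standing in for processes and messages. It is not the X10 benchmark itself, and `run_ring`, `inboxes`, and the hop counter carried by the token are hypothetical names used only for illustration.

```python
import queue
import threading


def run_ring(num_workers, num_laps):
    """Pass a token around a ring of workers and return the total hop count."""
    # One inbox per worker; worker i forwards to worker (i + 1) % num_workers.
    inboxes = [queue.Queue() for _ in range(num_workers)]

    def worker(rank):
        next_rank = (rank + 1) % num_workers
        for _ in range(num_laps):
            hops = inboxes[rank].get()         # block until the token arrives
            inboxes[next_rank].put(hops + 1)   # forward it to the next worker

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_workers)]
    for t in threads:
        t.start()

    inboxes[0].put(0)  # inject the token at worker 0 to start the first lap
    for t in threads:
        t.join()

    # The last worker's final send lands back in worker 0's inbox.
    return inboxes[0].get()


if __name__ == "__main__":
    print(run_ring(num_workers=4, num_laps=3))
```

Each worker only ever blocks on its own inbox, so at most one message is in flight at a time; that is the property that makes the ring a latency benchmark rather than a bandwidth one.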
I'd like to make the code comparable to MPI and UPC and would love a second pair of eyes to look it over. It's less than 100 lines of code, so it's not that long. I've posted the code here for those who are interested: http://pastebin.com/dYPCwh4G I know everyone is busy, but any help is much appreciated! I know X10 can pass around 100 messages between 64 processors on two nodes with the MPI backend faster than 9700 seconds (the MPI version does it in 4 seconds), but I'm just not sure what I'm doing wrong. Thanks!