Certainly! Here's a link: http://pastebin.com/3cHa5M2R
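The gist of it is the ring pass I described in my first message: rank 0 kicks off each message, every other rank receives from its predecessor and forwards to its successor, and the last rank wraps back around to rank 0. A stripped-down sketch of that pattern (illustrative only - NUM_MESSAGES and the details are placeholders, the pastebin has the actual code) looks like this:

---
/* Illustrative sketch of the ring pattern only (placeholder names,
 * not the exact pastebin contents): each rank forwards an int to the
 * next rank; the last rank wraps around to rank 0, so every message
 * makes one full lap of the ring. */
#include <mpi.h>
#include <stdio.h>

#define NUM_MESSAGES 100   /* placeholder message count */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;          /* successor in the ring */
    int prev = (rank + size - 1) % size;   /* predecessor in the ring */

    double start = MPI_Wtime();
    for (int i = 0; i < NUM_MESSAGES; i++) {
        int value = i;
        if (rank == 0) {
            /* rank 0 starts each lap, then waits for the message to return */
            MPI_Send(&value, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
            MPI_Recv(&value, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            /* everyone else receives from the previous rank and forwards it */
            MPI_Recv(&value, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&value, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0)
        printf("It took %f seconds\n", elapsed);

    MPI_Finalize();
    return 0;
}
---

That's the behaviour I'm trying to reproduce in the X10 version.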
Thanks for the help! I appreciate it a lot!

On Fri, 2011-02-11 at 14:05 -0500, Dave Cunningham wrote:
> Can you show us the MPI code so we can verify it's doing the same thing as
> the X10 code?
>
> On Fri, Feb 11, 2011 at 1:07 PM, Chris Bunch <c...@cs.ucsb.edu> wrote:
>
> > Hi Josh,
> > Thanks for the quick response! This code is definitely quite an
> > improvement over the last, and improves my running time from 10000
> > seconds to 1500 seconds. Unfortunately, it's still much slower than my
> > UPC and MPI codes, which are coming in at 5 seconds. I'm compiling my
> > X10 code with -O -NO_CHECKS. Any other ideas?
> >
> > Thanks again!
> >
> > On Fri, 2011-02-11 at 12:25 +1100, Josh Milthorpe wrote:
> > > Sorry Chris! I just realised my "simplification" means it's no longer a
> > > ring :-)
> > >
> > > The correct code for the main method is:
> > >
> > > for (var index : Int = 0; index < NUM_MESSAGES; index++) {
> > >     val i = index;
> > >     finish for (p in Place.places()) async at (p) {
> > >         if (p.id == 0) {
> > >             Ring.send(here.next(), i);
> > >             Ring.recv(i);
> > >         } else {
> > >             Ring.recv(i);
> > >             Ring.send(here.next(), i);
> > >         }
> > >     }
> > > }
> > >
> > > Josh Milthorpe wrote:
> > > > Hi Chris,
> > > >
> > > > this is a nice test of X10 primitives and communications.
> > > >
> > > > When I profile your code on multiple places on a single computer, I see
> > > > almost all the runtime is spent in "busy waiting" - presumably, threads
> > > > at the receiving node waiting for the sending node to complete. There
> > > > is more information on the busy waiting problem in
> > > > http://jira.codehaus.org/browse/XTENLANG-1012
> > > >
> > > > I'm guessing that the "sleep" is not an essential part of your
> > > > benchmark. If that's right, I would say that this is a perfect test for
> > > > conditional atomic blocks (section 14.7.2 of the language
> > > > specification). You can replace the body of recv(...) with
> > > >
> > > >     when (A(here.id) == value);
> > > >
> > > > This simply waits for the value to be set, and then returns.
> > > >
> > > > Sadly, this won't work by itself. With the current version of X10, the
> > > > blocked thread is never woken up again to check the condition. Thus we
> > > > have a deadlock - see http://jira.codehaus.org/browse/XTENLANG-1660 for
> > > > more information.
> > > >
> > > > There is an easy way to avoid this deadlock using an (unconditional)
> > > > atomic block in send(...) as follows:
> > > >
> > > >     at (target) {
> > > >         atomic A(here.id) = value;
> > > >     }
> > > >
> > > > On exit of this atomic block at the receiving place, the runtime checks
> > > > whether there are other threads waiting, and if so wakes them up. So
> > > > the blocked thread will see that the condition is now true, and continue.
> > > >
> > > > These changes improved the performance of your code by over three orders
> > > > of magnitude on my platform. Please let me know whether they work for you.
> > > >
> > > > As an aside, you can use the Place.next() method to simplify the code
> > > > dramatically. A full version is below.
> > > >
> > > > Cheers
> > > >
> > > > Josh
> > > >
> > > > ---
> > > > public static def send(target:Place, value:Int) {
> > > >     at (target) {
> > > >         atomic A(here.id) = value;
> > > >     }
> > > > }
> > > >
> > > > public static def recv(value:Int) {
> > > >     when (A(here.id) == value);
> > > > }
> > > >
> > > > public static def main(args:Array[String](1)) {
> > > >     val startTime = Timer.milliTime();
> > > >
> > > >     for (var index : Int = 0; index < NUM_MESSAGES; index++) {
> > > >         val i = index;
> > > >         finish for (p in Place.places()) async at (p) {
> > > >             Ring.send(here.next(), i);
> > > >             Ring.recv(i);
> > > >         }
> > > >     }
> > > >
> > > >     val endTime = Timer.milliTime();
> > > >     val totalTime = (endTime - startTime) / 1000.0;
> > > >
> > > >     Console.OUT.printf("It took %f seconds\n", totalTime);
> > > > }
> > > > ---
> > > >
> > > > Chris Bunch wrote:
> > > >
> > > >> Hi all,
> > > >> I've been working on a small thread ring benchmark in X10 and have
> > > >> codes written in MPI, UPC, and X10 thus far. Unfortunately, my X10 code
> > > >> is quite a bit slower than the others (two orders of magnitude slower)
> > > >> and I'm not entirely sure why. Essentially each process just sends a
> > > >> message to the next process, and the final process sends the message to
> > > >> the first process (forming a ring).
> > > >>
> > > >> I'd like to make the code comparable to MPI and UPC and would love a
> > > >> separate pair of eyes to look it over - it's less than 100 lines of
> > > >> code, so it's not that long. I've posted the code here for those who
> > > >> are interested:
> > > >>
> > > >> http://pastebin.com/dYPCwh4G
> > > >>
> > > >> I know everyone is busy, but any help is much appreciated! I know X10
> > > >> can pass around 100 messages between 64 processors on two nodes with
> > > >> the MPI backend faster than 9700 seconds (the MPI code is doing it in
> > > >> 4 seconds), but I'm just not sure what I'm doing wrong.
> > > >>
> > > >> Thanks!