Hi. You are creating a very large number of activities, but you still have only two compute threads. That is, you have specified your ideal parallelism (1000x1000 activities), but the useful parallelism is just two, because you have only two hardware cores or threads. So there is significant overhead in activity creation, scheduling, and termination.
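One way to reduce that overhead is to create only about as many activities as there are cores, and let each one handle a contiguous block of rows. The following is an untested sketch in the same X10 2.0-style syntax as your code, not a definitive implementation. The names `numChunks`, `chunkSize`, `lo` and `hi` are made up for illustration, and I have assumed that `h.computeMult(a,b,i,j)` simply returns `a(i,j) * b(j)`, so the multiply is written inline.

```x10
// Untested sketch: one activity per block of rows instead of one per element.
// numChunks, chunkSize, lo and hi are illustrative names, not X10 library API.
val numChunks = 2;                                   // assumption: one activity per core
val rows = n + 1;                                    // the region 0..n holds n+1 rows
val chunkSize = (rows + numChunks - 1) / numChunks;  // rows per activity, rounded up

finish for ((c) in 0..numChunks-1) {
    async {
        val lo = c * chunkSize;                      // first row of this block
        val hi = Math.min(lo + chunkSize - 1, n);    // last row, clamped to the region
        for ((i) in lo..hi) {
            var sum:Int = 0;
            for ((j) in 0..n)
                sum += a(i,j) * b(j);                // assumed body of h.computeMult
            result(i) = sum;                         // only this activity writes row i
        }
    }
}
```

Because each row of `result` is written by exactly one activity, the sketch needs neither the `atomic` update nor the inner `finish`. With `numChunks` set to 2 it creates two activities instead of roughly a million.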
A simple but effective way to handle this is loop chunking (see http://portal.acm.org/citation.cfm?id=1542275.1542304). I don't think it is implemented in the X10 compiler yet. So for the time being you may want to do it manually: replace the parallel loop that creates a large number of activities with one that creates fewer activities.

Warm regards,
Krishna.

From: mohammed elsaeedy <mohammed.elsae...@kaust.edu.sa>
To: Mailing list for users of the X10 programming language <x10-users@lists.sourceforge.net>
Date: 07/05/2010 03:49 PM
Subject: Re: [X10-users] Async Parallelism????

I'm sorry, but the previous mail was sent incomplete by mistake, so here is my message:

Dear List,

I have this very weird behavior going on. I'm implementing a simple matrix-vector multiplication program and measuring the time it consumes. Here is my parallel code with asyncs:

    val n:int=1000;
    val a= new Array[Int]([0..n,0..n],((i,j):Point)=>j);  //Matrix a 2 dim
    val b= new Array[Int]([0..n],((i):Point)=>i);         //Vector b 1 dim
    val result = new Array[Int]([0..n],((i):Point)=>0);   //Vector result 1 dim result= ab

    val h = new Hello();
    val begin:Long = Timer.nanoTime();

    finish
    {
        for((i,j):Point in a.region)
        {
            async
            {
                val value:int;
                finish
                    value= h.computeMult(a,b,i,j); //this is just a method that calculates the multiplication of each matrix row element with the
                                                   // corresponding vector column element
                atomic result(i)+=value;           // atomic sum up of the corresponding result element
            }
        }
    }

    val end:Long = Timer.nanoTime();

So, as you can see, the size of the matrix and the vector is 1000. The execution time with "async, finish, atomic" (running in one place but with multiple activities) is *11.872344* sec, whereas running sequentially (with all "async", "finish", "atomic" removed) gives a better result, *2.39275* sec, and of course it gets worse as I increase the dimensions. I'm working on a dual-core machine. How is this possible? Please advise.
On Mon, Jul 5, 2010 at 5:15 AM, mohammed elsaeedy <mohammed.elsae...@kaust.edu.sa> wrote:

> Dear List,
>
> I have this very weird behavior going on, I'm just implementing this
> simple Matrix-Vector Multiplication program
> and I'm measuring the time consumed by my program, so heres my parallel
> code with asyncs :
>
>     val n:int=1000;
>     // 2
>     val a= new Array[Int]([0..n,0..n],((i,j):Point)=>j);
>     val b= new Array[Int]([0..n],((i):Point)=>i);
>     val result = new Array[Int]([0..n],((i):Point)=>0);
>
>     val h = new Hello();
>     val begin:Long = Timer.nanoTime();
>
>     finish
>     {
>         for((i,j):Point in a.region)
>         {
>             async
>             {
>                 val value:int;
>                 finish
>                     value= h.computeMult(a,b,i,j); //this is just a method that calculates
>                 atomic result(i)+=value;
>             }
>         }
>     }
>
>     val end:Long = Timer.nanoTime();
>
> --
> Thank you for your concern.
> Regards,
> Mohammed El Sayed

--
Thank you for your concern.
Regards,
Mohammed El Sayed

_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users