Hi,
We used the following code to test the performance of foreach. add1() is
sequential code. In add2(), we use foreach and let X10 partition the
workload. In add3(), we partition the workload ourselves.

We use the C++ backend and run the code as:

  # env X10_NTHREADS=2 runx10 ./Test_foreach

The performance:

  time of add1() = 32.3 ms
  time of add2() = 3277.98 ms
  time of add3() = 18.33 ms

It is surprising that add2() is about 100 times slower than add1(). Does
anyone know the reason?

Thanks.

// Test_foreach.x10

// Sequential baseline: a single loop over all elements.
def add1() {
  for ((i) in 0..size-1) {
    data(i) += 5;
  }
}

// foreach over every index; X10 decides how the work is partitioned.
def add2() {
  finish foreach ((i) in 0..size-1) {
    data(i) += 5;
  }
}

// Manual partitioning: one foreach iteration per thread, each running a
// sequential loop over its own contiguous chunk.
// Note: the chunks cover every element only if size is divisible by numThreads.
def add3() {
  var numThreads: Int = 2;
  val mySize = size/numThreads;
  finish foreach ((p) in 0..numThreads-1) {
    for ((i) in p*mySize..(p+1)*mySize-1) {
      data(i) += 5;
    }
  }
}

_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users