02-Feb-2013 00:39, FG wrote:
On 2013-02-01 20:33, Dmitry Olshansky wrote:
My reiteration of it, with a bit of help from std.parallelism.
std.parallelism uses a thread pool, so it's somewhat faster than
creating threads anew.
Interestingly, threads+barrier here wasn't much slower than tasks:
14% slower for dmd32, only 5% f
Thanks. Yes, you are right. I have increased the dimension.
Excellent. Thank you so much for your suggestion and code. It now
produces near-linear speedup.
01-Feb-2013 20:08, Sparsh Mittal wrote:
Here is the code:
My reiteration of it, with a bit of help from std.parallelism.
std.parallelism uses a thread pool, so it's somewhat faster than creating
threads anew.
Still, it's instantaneous for me, in the range of 30-40 ms, even with a grid
size of 1024
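A minimal sketch of the thread-pool approach mentioned above, using `taskPool.parallel` from std.parallelism. The code actually posted in the thread is not shown here, so the four-neighbour averaging stencil, the boundary values, and the names `makeGrid` and `sweep` are assumptions made for illustration only.

```d
import std.parallelism : taskPool;
import std.range : iota;

/// Builds an n x n grid of zeros with the top row set to 1.0.
/// The boundary condition is a made-up example, not from the thread.
double[][] makeGrid(size_t n)
{
    auto g = new double[][](n, n);
    foreach (row; g)
        row[] = 0.0;   // double.init is NaN, so zero the cells explicitly
    g[0][] = 1.0;
    return g;
}

/// One relaxation sweep: every interior cell of `dst` becomes the
/// average of its four neighbours in `src`. Boundary cells are left alone.
void sweep(const double[][] src, double[][] dst)
{
    immutable n = src.length;
    if (n < 3)
        return;        // no interior cells to update

    // Each row is written by exactly one task, so no locking is needed.
    // The pool's worker threads are reused across calls to sweep().
    foreach (i; taskPool.parallel(iota(1, n - 1)))
    {
        foreach (j; 1 .. n - 1)
            dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                              + src[i][j - 1] + src[i][j + 1]);
    }
}

void main()
{
    import std.algorithm.mutation : swap;
    import std.stdio : writeln;

    auto cur = makeGrid(8), next = makeGrid(8);
    foreach (step; 0 .. 5)
    {
        sweep(cur, next);
        swap(cur, next);   // the freshly computed grid becomes the input
    }
    writeln(cur[1][1]);
}
```

Because the parallel foreach returns only after every row is done, each call to `sweep` acts as an implicit barrier between iterations, so no explicit `Barrier` object is needed in this variant.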
On 2013-02-01 16:42, Sparsh Mittal wrote:
When I run it and compare this parallel version with its serial version, I only
get a speedup of under 1.3 for 2 threads. When I write the same program in Go,
the scaling is nearly 2.
Also, in D, when running "top", I see the usage as only 130% CPU and not nearly 2
Here is the code:
#!/usr/bin/env rdmd
import std.stdio;
import std.concurrency;
import core.thread;
import std.datetime;
import std.conv;
import core.sync.barrier;
immutable int gridSize = 256;
immutable int MAXSTEPS = 5; /* Maximum number of iterations */
immutable d
Can't tell much without the whole source or at least a compilable
standalone piece.
Give me a moment. I will post.
01-Feb-2013 19:42, Sparsh Mittal wrote:
It got posted before I completed it! Sorry.
I am parallelizing a program which follows this structure:
immutable int numberOfThreads= 2
for iter = 1 to MAX_ITERATION
{
myLocalBarrier = new Barrier(numberOfThreads+1);
for i= 1 to numberOfThre
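
The excerpt above is cut off, so the rest of the loop is unknown. As a hedged sketch of the threads+barrier alternative discussed in this thread, the following creates the worker threads and a single `Barrier` once and reuses both for every iteration, instead of allocating a new barrier per iteration. The function name `runWorkers` and the per-worker counter that stands in for the real grid work are hypothetical. Unlike the `numberOfThreads+1` in the outline, the main thread here does not take part in the barrier and only joins the workers at the end.

```d
import core.sync.barrier : Barrier;
import core.thread : Thread;

/// Runs `steps` iterations on `numberOfThreads` long-lived workers and
/// returns how many iterations each worker completed.
/// Assumes numberOfThreads >= 1.
int[] runWorkers(uint numberOfThreads, int steps)
{
    auto done = new int[numberOfThreads];
    // One barrier, reused for every iteration.
    auto barrier = new Barrier(numberOfThreads);

    // A helper function gives each delegate its own copy of `id`.
    Thread makeWorker(size_t id)
    {
        return new Thread({
            foreach (step; 0 .. steps)
            {
                done[id]++;      // placeholder for this worker's share of the work
                barrier.wait();  // no worker starts the next step early
            }
        });
    }

    Thread[] workers;
    foreach (id; 0 .. numberOfThreads)
        workers ~= makeWorker(id);
    foreach (t; workers)
        t.start();
    foreach (t; workers)
        t.join();
    return done;
}

void main()
{
    import std.stdio : writeln;

    writeln(runWorkers(2, 5));
}
```

Each worker writes only its own slot of `done`, and the results are read only after `join`, so the counter needs no extra synchronization.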