Re: Simple parallel foreach and summation/reduction

2018-09-24 Thread Russel Winder via Digitalmars-d-learn
Hi,

Apologies for coming late to this thread.

I started with:

   import std.random: uniform;
   import std.range: iota;
   import std.stdio: writeln;

   void main() {
       ulong sum;
       foreach (i; iota(1_000_000_000)) {
           if (uniform(0F, 12F) > 6F) sum++;
       }
       writeln("The sum is ", sum);
   }

and then transformed it to:

   import std.algorithm: map, reduce;
   import std.random: uniform;
   import std.range: iota;
   import std.stdio: writeln;

   void main() {
       ulong sum = iota(1_000_000_000)
           .map!((_) => uniform(0F, 12F) > 6F ? 1 : 0)
           .reduce!"a + b";
       writeln("The sum is ", sum);
   }

and then made use of std.parallelism:

   import std.algorithm: map;
   import std.parallelism: taskPool;
   import std.random: uniform;
   import std.range: iota;
   import std.stdio: writeln;

   void main() {
       ulong sum = taskPool().reduce!"a + b"(
           iota(1_000_000_000).map!((_) => uniform(0F, 12F) > 6F ? 1 : 0));
       writeln("The sum is ", sum);
   }

I am not entirely sure how to capture the memory used but roughly (since this
is a one off measure and not a statistically significant experiment):

first takes 30s
second takes 30s
third takes 4s

on an ancient twin Xeon workstation, so 8 cores but all ancient and slow.

The issue here is that std.parallelism.reduce, std.parallelism.map, and
std.parallelism.amap are all "top level" work-scattering functions: each
assumes total control of the resources. So the above is a parallel reduce
over a sequential map, which works fine. Trying to mix parallel reduce with
parallel map or amap ends up with two different attempts to use the same
resources to create tasks.

std.parallelism isn't really a fork/join framework in the Java sense; if you
want tree-structured parallelism, you have to do things with futures.
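A future-based, tree-structured sum might be sketched as follows. This is an
illustration, not tested code from the thread; the function name and cutoff
are my own:

```d
import std.parallelism : task, taskPool;

// Hypothetical sketch of fork/join-style summation with futures.
ulong treeSum(const(ulong)[] data, size_t cutoff = 100_000)
{
    if (data.length <= cutoff)
    {
        ulong s = 0;
        foreach (x; data) s += x;
        return s;
    }
    immutable mid = data.length / 2;
    // Fork: push the left half onto the pool as a future...
    auto left = task!treeSum(data[0 .. mid], cutoff);
    taskPool.put(left);
    // ...recurse into the right half on this thread...
    immutable right = treeSum(data[mid .. $], cutoff);
    // ...then join: yieldForce waits for (or helps finish) the future.
    return left.yieldForce + right;
}
```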

-- 
Russel.
===
Dr Russel Winder  t: +44 20 7585 2200
41 Buckmaster Road        m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk





Re: Simple parallel foreach and summation/reduction

2018-09-24 Thread Chris Katko via Digitalmars-d-learn

On Monday, 24 September 2018 at 07:13:24 UTC, Chris Katko wrote:

On Monday, 24 September 2018 at 05:59:20 UTC, Chris Katko wrote:

[...]



Actually, I just realized/remembered that the error occurs 
inside parallelism itself, and MANY times at that:


[...]


This JUST occurred to me. When I use an outer taskPool.reduce, am I 
NOT supposed to use the taskPool version of [a]map - but instead the 
std.algorithm one?


Because this is running with both/all cores, and only using 2.7MB 
of RAM:


sum = taskPool.reduce!(test)(
    map!(monte)(range)   // std.algorithm.map, not taskPool.map
);

If that's the correct case, the docs did NOT make that obvious!

FYI, I went from ~5200 samples/ms to ~7490 samples/ms - roughly a 44% 
improvement from the second "real" core. Better than nothing, I guess. 
I'll have to try it on my main machine with a proper CPU.
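A self-contained version of that pattern - parallel reduce over a lazy
std.algorithm.map - might look like the following. The monte sampler is my
sketch of the Monte Carlo pi estimator discussed elsewhere in the thread;
the names are assumptions:

```d
import std.algorithm : map;
import std.math : sqrt;
import std.parallelism : taskPool;
import std.random : uniform;
import std.range : iota;
import std.stdio : writeln;

double monte(T)(T x)
{
    // Sample a point in [-1, 1)^2; count it if it falls in the unit circle.
    immutable v = uniform(-1.0, 1.0);
    immutable u = uniform(-1.0, 1.0);
    return sqrt(v * v + u * u) < 1.0 ? 1.0 : 0.0;
}

void main()
{
    enum num = 10_000_000;
    // Lazy sequential map feeding the parallel reduce: constant memory,
    // with the work split across the pool by reduce itself.
    auto sum = taskPool.reduce!"a + b"(iota(num).map!monte);
    writeln("pi is approximately ", 4.0 * sum / num);
}
```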


Re: Simple parallel foreach and summation/reduction

2018-09-24 Thread Chris Katko via Digitalmars-d-learn

On Monday, 24 September 2018 at 05:59:20 UTC, Chris Katko wrote:

[...]



Actually, I just realized/remembered that the error occurs inside 
parallelism itself, and MANY times at that:


/usr/include/dmd/phobos/std/parallelism.d(2590): Error: no [] 
operator overload for type 
std.parallelism.TaskPool.map!(monte).map!(Result).map.Map
/usr/include/dmd/phobos/std/parallelism.d(2596): Error: [same error]
/usr/include/dmd/phobos/std/parallelism.d(2616): Error: [same error, six times]
/usr/include/dmd/phobos/std/parallelism.d(2626): Error: [same error, six times]
/usr/include/dmd/phobos/std/parallelism.d(2634): Error: [same error]
monte.d(64): Error: template instance 
std.parallelism.TaskPool.reduce!(test).reduce!(Map) error 
instantiating


Though I tried looking up the git version of parallelism.d, and the 
line numbers don't quite match:


https://github.com/dlang/phobos/blob/master/std/parallelism.d
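Those "no [] operator overload" errors are consistent with TaskPool.reduce
wanting a random-access range, which TaskPool.map's lazy pipeline does not
provide, while std.algorithm.map over iota does. A minimal way to see the
difference (my sketch, if I read the range primitives right; doubleIt is a
made-up free-standing function to dodge the local-lambda issue):

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota, isRandomAccessRange;

int doubleIt(int x) { return 2 * x; }

void main()
{
    auto seq = iota(100).map!doubleIt;            // std.algorithm.map
    auto par = taskPool.map!doubleIt(iota(100));  // TaskPool.map
    static assert(isRandomAccessRange!(typeof(seq)));  // supports []
    static assert(!isRandomAccessRange!(typeof(par))); // input range only
}
```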


Re: Simple parallel foreach and summation/reduction

2018-09-24 Thread Chris Katko via Digitalmars-d-learn

On Saturday, 22 September 2018 at 02:26:41 UTC, Chris Katko wrote:

[...]


So I looked into it. It's amap that explodes in RAM.

Per the docs, amap has "less overhead but more memory usage." 
While map has more overhead but less memory usage and "avoids the 
need to keep all results in memory."


But, if I make a call to map... it doesn't compile! I get:

Error: no [] operator overload for type 
std.parallelism.TaskPool.map!(monte).map!(Result).map.Map


The errors come from simply changing amap to map here:

sum = taskPool.reduce!(test)(
    taskPool.map!(monte)(range)
);


Re: Simple parallel foreach and summation/reduction

2018-09-21 Thread Chris Katko via Digitalmars-d-learn

On Saturday, 22 September 2018 at 02:13:58 UTC, Chris Katko wrote:

[...]


Also, when I don't call .finish(true) at the end, it just sits there 
forever (after running), as if one of the threads won't terminate, 
requiring a Ctrl-C. But the docs and examples don't seem to indicate I 
should need that...
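For what it's worth, the hang is expected with a manually created TaskPool:
its worker threads are non-daemon, so the process cannot exit until the pool
is shut down. One pattern that ties the shutdown to scope exit (my sketch,
with an assumed free-standing add function):

```d
import std.parallelism : TaskPool;
import std.range : iota;
import std.stdio : writeln;

long add(long a, long b) { return a + b; }

void main()
{
    auto pool = new TaskPool();
    // finish(true) tells the workers to drain and terminate, then joins
    // them; without it, the non-daemon workers keep the process alive.
    scope (exit) pool.finish(true);
    pool.reduce!add(iota(1_000L)).writeln;
}
```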


Re: Simple parallel foreach and summation/reduction

2018-09-21 Thread Chris Katko via Digitalmars-d-learn

On Friday, 21 September 2018 at 12:15:59 UTC, Ali Çehreli wrote:

[...]


Okay... so I've got it running. The problem is, it uses tons of 
RAM. In fact, proportional to the working set.


T test(T)(T x, T y)
{
    return x + y;
}

double monte(T)(T x)
{
    double v = uniform(-1F, 1F);
    double u = uniform(-1F, 1F);
    if (sqrt(v*v + u*u) < 1.0)
    {
        return 1;
    } else {
        return 0;
    }
}

auto taskpool = new TaskPool();
sum = taskpool.reduce!(test)(
    taskpool.amap!monte(
        iota(num)
    ));
taskpool.finish(true);

100 becomes ~8MB
1000 becomes 80MB
1, I can't even run because it says "Exception: Memory 
Allocation failed"
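The growth is consistent with amap being eager: it materialises every result
in an array before reduce ever starts. When the intermediate array is really
wanted, amap also accepts an explicitly preallocated output buffer, which
bounds the cost to one up-front allocation (a sketch with assumed names):

```d
import std.parallelism : taskPool;
import std.range : iota;

double halve(int x) { return x / 2.0; }

void main()
{
    enum num = 1_000_000;
    auto results = new double[num];          // single ~8 MB allocation
    taskPool.amap!halve(iota(num), results); // filled in parallel
}
```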


Re: Simple parallel foreach and summation/reduction

2018-09-21 Thread Ali Çehreli via Digitalmars-d-learn

On 09/21/2018 12:25 AM, Chris Katko wrote:

[...]


You can use a free-standing function as a workaround, which is included 
in the following chapter that explains most of std.parallelism:


  http://ddili.org/ders/d.en/parallelism.html

That chapter is missing e.g. the newly-added fold():

  https://dlang.org/phobos/std_parallelism.html#.TaskPool.fold

Ali
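The workaround in practice might look like this (my sketch, not code from
the chapter):

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// A module-level (free-standing) function can be passed to
// TaskPool.reduce, unlike a local lambda (issue 5710).
long add(long a, long b) { return a + b; }

void main()
{
    taskPool.reduce!add(iota(1_000_000L)).writeln;
}
```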


Re: Simple parallel foreach and summation/reduction

2018-09-21 Thread Dennis via Digitalmars-d-learn

On Friday, 21 September 2018 at 07:25:17 UTC, Chris Katko wrote:
I get "Error: template instance `reduce!((a, b) => a + b)` 
cannot use local __lambda1 as parameter to non-global template 
reduce(functions...)" when trying to compile that using the 
online D editor with DMD and LDC.


Any ideas?


That's a long-standing issue: 
https://issues.dlang.org/show_bug.cgi?id=5710


Using a string for the expression does work though:
```
import std.stdio, std.parallelism, std.range;

void main() {
taskPool.reduce!"a + b"(iota(1_000L)).writeln;
}
```


Re: Simple parallel foreach and summation/reduction

2018-09-21 Thread Chris Katko via Digitalmars-d-learn
On Thursday, 20 September 2018 at 05:51:17 UTC, Neia Neutuladh 
wrote:
[...]

auto taskpool = new TaskPool();
taskpool.reduce!((a, b) => a + b)(iota(1_000_000_000_000L));


I get "Error: template instance `reduce!((a, b) => a + b)` cannot 
use local __lambda1 as parameter to non-global template 
reduce(functions...)" when trying to compile that using the 
online D editor with DMD and LDC.


Any ideas?


Re: Simple parallel foreach and summation/reduction

2018-09-19 Thread Neia Neutuladh via Digitalmars-d-learn

On Thursday, 20 September 2018 at 05:34:42 UTC, Chris Katko wrote:
All I want to do is loop from 0 to [constant] with a for or 
foreach, and have it split up across however many cores I have.


You're looking at std.parallelism.TaskPool, especially the amap 
and reduce functions. Should do pretty much exactly what you're 
asking.


auto taskpool = new TaskPool();
taskpool.reduce!((a, b) => a + b)(iota(1_000_000_000_000L));