Re: sliced().array compatibility with parallel?

2016-01-10 Thread Marc Schütz via Digitalmars-d-learn

On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:

On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
I'm playing around with win32, v2.069.2 dmd and 
"dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
.array(), should that first dimension then be compatible with 
parallel foreach?


[...]


Oh... there is no bug.
means must be shared =) :

shared double[1000] means;



I'd say, if `shared` is required, but it compiles without, then 
it's still a bug.
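
For context, a minimal sketch of the fix Ilya suggests above, applied to a
trimmed-down version of the original loop (the 10 x 100 sizes are placeholders,
not the thread's 1000 x 100_000 data). Module-level variables in D are
thread-local unless marked `shared`, so without it each worker thread of
parallel() writes its own copy of means.

import std.algorithm : sum;
import std.parallelism : parallel;
import std.stdio : writeln;

shared double[10] means;   // one process-global array, visible to all threads
double[] data;

void main() {
    import std.experimental.ndslice : sliced;
    data = new double[1000];
    foreach (i, ref e; data) e = i;
    auto sl = data.sliced(10, 100);      // 2D view: 10 rows of 100 values
    foreach (i, vec; parallel(sl)) {     // rows distributed over the task pool
        means[i] = vec.sum / 100;        // disjoint indices, so plain writes
    }
    // cast away shared only for printing, after the parallel loop has finished
    writeln(cast(const(double)[]) means[]);
}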


Re: sliced().array compatibility with parallel?

2016-01-10 Thread Russel Winder via Digitalmars-d-learn
On Sun, 2016-01-10 at 01:46 +, Jay Norwood via Digitalmars-d-learn
wrote:
> 
[…]
>  // processed non-parallel works ok
>  foreach( dv; dv2){
>  if(dv != dv){ // test for NaN
>  return 1;
>  }
>  }
> 
>  // calculated parallel leaves out processing of many values
>  foreach( dv; dvp){
>  if(dv != dv){ // test for NaN
>  return 1;
>  }
>  }
>  return(0);
> }

I am not convinced these "Tests for NaN" actually test for NaN. I
believe you have to use isNaN(dv).
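
For reference, a hypothetical helper (not from the thread) showing the
explicit check with std.math.isNaN. (For IEEE doubles, dv != dv is also true
exactly when dv is NaN, so both forms detect uninitialized entries, since
double.init is NaN in D.)

import std.math : isNaN;

// Returns true if any element of the array is NaN.
bool anyNaN(in double[] values) {
    foreach (dv; values)
        if (isNaN(dv))
            return true;
    return false;
}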

-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder





Re: sliced().array compatibility with parallel?

2016-01-10 Thread Jay Norwood via Digitalmars-d-learn

On Sunday, 10 January 2016 at 11:21:53 UTC, Marc Schütz wrote:


I'd say, if `shared` is required, but it compiles without, then 
it's still a bug.


Yeah, probably so.  Interestingly, without 'shared', using a simple 
assignment from a constant (means[i] = 1.0;) instead of the assignment 
from the sum() evaluation results in all the values being initialized, 
so not marking it shared doesn't protect it from being written by the 
other threads.  Anyway, the shared declaration doesn't seem to slow the 
execution, and it does make sense to me that it should be marked shared.
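
A small assumed demonstration (not from the thread) of why marking means as
shared matters: a plain module-level variable is thread-local (TLS) in D, so
a write made from another thread lands in that thread's own copy, while a
shared variable is a single instance visible to all threads.

import core.thread : Thread;
import std.stdio : writeln;

int tlsValue;           // one copy per thread (TLS)
shared int sharedValue; // one copy for the whole process

void main() {
    auto t = new Thread({
        tlsValue = 42;     // updates the new thread's own copy only
        sharedValue = 42;  // updates the single shared copy
    });
    t.start();
    t.join();
    writeln(tlsValue);          // prints 0: the main thread's copy is untouched
    int observed = sharedValue; // copy the shared value into a local
    writeln(observed);          // prints 42
}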





Re: sliced().array compatibility with parallel?

2016-01-10 Thread Jay Norwood via Digitalmars-d-learn

On Sunday, 10 January 2016 at 03:23:14 UTC, Ilya wrote:
I will add significantly faster pairwise summation based on 
SIMD instructions into the future std.las. --Ilya


Wow! A lot of overhead in the debug build.  I checked that the 
computed values are the same.  This is on my corei5 laptop.


dub -b release-nobounds --force
parallel time msec:448
non_parallel msec:767

dub -b debug --force
parallel time msec:2465
non_parallel msec:4962

On my corei7 desktop, the release-nobounds build:
parallel time msec:161
non_parallel msec:571






Re: sliced().array compatibility with parallel?

2016-01-10 Thread Jay Norwood via Digitalmars-d-learn

On Sunday, 10 January 2016 at 12:11:39 UTC, Russel Winder wrote:

 foreach( dv; dvp){
 if(dv != dv){ // test for NaN
 return 1;
 }
 }
 return(0);
}


I am not convinced these "tests for NaN" actually test for NaN. I 
believe you have to use isNaN(dv).


I saw it mentioned in another post, and tried it.  Works.



Re: sliced().array compatibility with parallel?

2016-01-09 Thread Jay Norwood via Digitalmars-d-learn

On Sunday, 10 January 2016 at 00:41:35 UTC, Ilya Yaroshenko wrote:

It is a bug (Slice or Parallel?). Please file this issue.
Slice should work with parallel, and array of slices should 
work with parallel.


Ok, thanks, I'll submit it.




Re: sliced().array compatibility with parallel?

2016-01-09 Thread Ilya Yaroshenko via Digitalmars-d-learn

On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
I'm playing around with win32, v2.069.2 dmd and 
"dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
.array(), should that first dimension then be compatible with 
parallel foreach?


I find that without using parallel, all the means get computed, 
but with parallel, only about half of them are computed in 
this example.  The others remain NaN when examined in the debugger 
in Visual D.


import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
 import std.parallelism;
 auto sl = data.sliced(1000,100_000);
 auto sla = sl.array();
 foreach(i,vec; parallel(sla)){
  double v=vec.sum(0.0);
  means[i] = v / 100_000;
 }
}

void main() {
 data = new double[100_000_000];
 for(int i=0;i<100_000_000;i++){ data[i] = i/100_000_000.0;}
 auto r = benchmark!(f1)(testCount);
 auto f0Result = to!Duration(r[0] / testCount);
 f0Result.writeln;
 writeln(means[0]);
}


This is a bug in std.parallelism :-)

Proof:

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import mir.ndslice;

import std.parallelism;

enum testCount = 1;

double[1000] means;
double[] data;

void f1() {
    //auto sl = data.sliced(1000, 100_000);
    //auto sla = sl.array();
    auto sla = new double[][1000];
    foreach(i, ref e; sla)
    {
        e = data[i * 100_000 .. (i+1) * 100_000];
    }
    foreach(i, vec; parallel(sla))
    {
        double v = vec.sum;
        means[i] = v / vec.length;
    }
}

void main() {
    data = new double[100_000_000];
    foreach(i, ref e; data){
        e = i / 100_000_000.0;
    }
    auto r = benchmark!(f1)(testCount);
    auto f0Result = to!Duration(r[0] / testCount);
    f0Result.writeln;
    writeln(means);
}

Prints:
[0.00045, 0.0015, 0.0025, 0.0035, 0.0044, 0.0054, 
0.0064, 0.0074, 0.0084, 0.0094, 0.0105, 0.0115, 
0.0125, 0.0135, 0.0145, 0.0155, 0.0165, 0.0175, 0.0185, 0.0195, 
0.0205, 0.0215, 0.0225, 0.0235, 0.0245, 0.0255, 0.0265, 0.0275, 
0.0285, 0.0295, 0.0305, 0.0315, 0.0325, 0.0335, 0.0345, 0.0355, 
0.0365, 0.0375, 0.0385, 0.0395, 0.0405, 0.0415, 0.0425, 0.0435, 
0.0445, 0.0455, 0.0465, 0.0475, 0.0485, 0.0495, 0.0505, 0.0515, 
0.0525, 0.0535, 0.0545, 0.0555, 0.0565, 0.0575, 0.0585, 0.0595, 
0.0605, 0.0615, 0.0625, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan 


sliced().array compatibility with parallel?

2016-01-09 Thread Jay Norwood via Digitalmars-d-learn
I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": 
"~>0.8.8".  If I convert the 2D slice with .array(), should that 
first dimension then be compatible with parallel foreach?


I find that without using parallel, all the means get computed, 
but with parallel, only about half of them are computed in this 
example.  The others remain NaN when examined in the debugger in 
Visual D.


import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
 import std.parallelism;
 auto sl = data.sliced(1000,100_000);
 auto sla = sl.array();
 foreach(i,vec; parallel(sla)){
  double v=vec.sum(0.0);
  means[i] = v / 100_000;
 }
}

void main() {
 data = new double[100_000_000];
 for(int i=0;i<100_000_000;i++){ data[i] = i/100_000_000.0;}
 auto r = benchmark!(f1)(testCount);
 auto f0Result = to!Duration(r[0] / testCount);
 f0Result.writeln;
 writeln(means[0]);
}


Re: sliced().array compatibility with parallel?

2016-01-09 Thread Jay Norwood via Digitalmars-d-learn

For example, means[63] through means[251] are consistently all NaN 
when using parallel in this test, but are all computed double values 
when parallel is not used.




Re: sliced().array compatibility with parallel?

2016-01-09 Thread Ilya Yaroshenko via Digitalmars-d-learn

On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
I'm playing around with win32, v2.069.2 dmd and 
"dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
.array(), should that first dimension then be compatible with 
parallel foreach?


[...]


It is a bug (Slice or Parallel?). Please file this issue.
Slice should work with parallel, and array of slices should work 
with parallel.


Re: sliced().array compatibility with parallel?

2016-01-09 Thread Ilya Yaroshenko via Digitalmars-d-learn

On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
I'm playing around with win32, v2.069.2 dmd and 
"dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
.array(), should that first dimension then be compatible with 
parallel foreach?


[...]


Oh... there is no bug.
means must be shared =) :

shared double[1000] means;



Re: sliced().array compatibility with parallel?

2016-01-09 Thread Jay Norwood via Digitalmars-d-learn

On Sunday, 10 January 2016 at 01:54:18 UTC, Jay Norwood wrote:

ok, thanks.  That works. I'll go back to trying ndslice now.


The parallel time for this case is about a 2x speed-up on my 
corei5 laptop, debug build in windows32, dmd.


D:\ec_mars_ddt\workspace\nd8>nd8.exe
parallel time msec:2495
non_parallel msec:5093

===
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

shared double[1000] means;
double[] data;

void f1() {
    import std.parallelism;
    auto sl = data.sliced(1000, 100_000);
    foreach(i, vec; parallel(sl)){
        means[i] = vec.sum / 100_000;
    }
}

void f2() {
    auto sl = data.sliced(1000, 100_000);
    foreach(i, vec; sl.array){
        means[i] = vec.sum / 100_000;
    }
}

void main() {
    data = new double[100_000_000];
    for(int i = 0; i < 100_000_000; i++){ data[i] = i / 100_000_000.0; }
    StopWatch sw1, sw2;
    sw1.start();
    f1();
    auto r1 = sw1.peek().msecs;
    sw2.start();
    f2();
    auto r2 = sw2.peek().msecs;

    writeln("parallel time msec:", r1);
    writeln("non_parallel msec:", r2);
}




Re: sliced().array compatibility with parallel?

2016-01-09 Thread Ilya via Digitalmars-d-learn

On Sunday, 10 January 2016 at 02:43:05 UTC, Jay Norwood wrote:

On Sunday, 10 January 2016 at 01:54:18 UTC, Jay Norwood wrote:

[...]


The parallel time for this case is about a 2x speed-up on my 
corei5 laptop, debug build in windows32, dmd.


[...]


I will add significantly faster pairwise summation based on SIMD 
instructions into the future std.las. --Ilya
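
For readers unfamiliar with the algorithm Ilya mentions, here is a scalar
sketch of pairwise summation (an illustration only, not his SIMD
implementation): recursively splitting the input keeps the rounding-error
growth around O(log n) instead of the O(n) of a naive left-to-right loop.

double pairwiseSum(const(double)[] r) {
    if (r.length <= 16) {        // small blocks: plain accumulation
        double s = 0.0;
        foreach (x; r)
            s += x;
        return s;
    }
    immutable mid = r.length / 2;
    return pairwiseSum(r[0 .. mid]) + pairwiseSum(r[mid .. $]);
}

unittest {
    double[] xs = [1.0, 2.0, 3.0, 4.0];
    assert(pairwiseSum(xs) == 10.0);
}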


Re: sliced().array compatibility with parallel?

2016-01-09 Thread Jay Norwood via Digitalmars-d-learn

On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:

On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
I'm playing around with win32, v2.069.2 dmd and 
"dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
.array(), should that first dimension then be compatible with 
parallel foreach?


[...]


Oh... there is no bug.
means must be shared =) :

shared double[1000] means;



ok, thanks.  That works. I'll go back to trying ndslice now.