Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-06 Thread Paul via Digitalmars-d-learn

On Thursday, 6 April 2023 at 01:44:15 UTC, H. S. Teoh wrote:



D ranges are conceptually sequential, but the actual underlying 
memory access patterns depends on the concrete type at runtime. 
An array's elements are stored sequentially in memory, and 
arrays are ranges.  But a linked-list can also have a range 
interface, yet its elements may be stored in non-consecutive 
memory locations.  So the concrete type matters here; the range 
API only gives you conceptual sequentiality, it does not 
guarantee physically sequential memory access.


Very helpful Teoh.  Thanks again.


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-06 Thread Salih Dincer via Digitalmars-d-learn
On Tuesday, 4 April 2023 at 16:22:29 UTC, Steven Schveighoffer 
wrote:

On 4/4/23 11:34 AM, Salih Dincer wrote:
On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer 
wrote:
parallel is a shortcut to `TaskPool.parallel`, which is 
indeed a foreach-only construct, it does not return a range.


I think what you want is `TaskPool.map`:

```d
// untested, just looking at the
taskPool.map!(/* your map function here */)
   (s.iota(len)).writeln;
```



I tried, thanks but it goes into infinite loop. For example, 
the first 50 of the sequence should have been printed to the 
screen immediately without waiting.


```d
long[50] arr;
RowlandSequence_v2 range;

auto next(long a)
{
   range.popFront();
   return arr[a] = range.front();
}

void main()
{
     range = RowlandSequence_v2(7, 2);
     taskPool.map!next(iota(50))/*
     s.iota(50)
  .map!next
  .parallel//*/
  .writeln;
}
```


Keep in mind that `arr` and `range` are thread-local, and so 
will be different states for different tasks.


I guess the reason it goes into an infinite loop is because gcd() 
a recursive function (gcd). This is the only solution I can think 
of about this:


```d
import std.range, std.algorithm : map;
import std.stdio, std.parallelism;
//import std.numeric : gcd;
/*
struct Vector {
  long x, y, result;
  alias result this;
}

Vector gcd(long a, long b) {
  if(b == 0) return Vector(1, 0, a);

  auto pre = gcd(b, a % b);
  auto tmp = (a / b) * pre.y;

  return Vector(pre.y, pre.x - tmp, pre.result);
}//*/

struct RowlandSequence_v3 {
  long b, r, n, a = 3, limit;

  bool empty() { return n == limit; }

  auto front() { return a; }

  void popFront() {
long result = 1;
while(result == 1) {
  result = gcd(r++, b);
  b += result;
}
a = result;
  }

  long gcd(long a, long b) {
long c;
while(b != 0) {
  c = a % b;
  a = b;
  b = c;
}
return a;
  }
}

auto next(ref RowlandSequence_v3 range) {
  with(range) {
if(empty) return [n, front];
popFront();
return [n++, front];
  }
}

long[179] arr;

void main()
{
  // initialization:
  RowlandSequence_v3[4] r = [
  RowlandSequence_v3(7 , 2, 0, 3, 112),
  RowlandSequence_v3(186837678, 62279227, 112, 3, 145),
  RowlandSequence_v3(747404910, 249134971, 145, 6257, 160),
  RowlandSequence_v3(1494812421, 498270808, 160, 11, 177)
  ];

  auto tasks = [ task(, r[0]),
 task(, r[1]),
 task(, r[2]),
 task(, r[3])
   ];

  // quad parallel operation:
  foreach(_; 0..r[0].limit)
  {
foreach(p, ref task; tasks)
{
task.executeInNewThread;
auto t = task.workForce;
arr[t[0]] = t[1];
}
  }

  // prints...
  foreach(x, n; arr) {
switch(x + 1) {
  case 112, 145, 160:
n.writeln; break;
  default:
n.write(", ");
}
  }
} /* PRINTS:
user@debian:~/Documents$ dmd -O rowlandSequence.d -release
user@debian:~/Documents$ time ./rowlandSequence
5, 3, 11, 3, 23, 3, 47, 3, 5, 3, 101, 3, 7, 11, 3, 13, 233, 3, 
467, 3, 5, 3, 941, 3, 7, 1889, 3, 3779, 3, 7559, 3, 13, 15131, 3, 
53, 3, 7, 30323, 3, 60647, 3, 5, 3, 101, 3, 121403, 3, 242807, 3, 
5, 3, 19, 7, 5, 3, 47, 3, 37, 5, 3, 17, 3, 199, 53, 3, 29, 3, 
486041, 3, 7, 421, 23, 3, 972533, 3, 577, 7, 1945649, 3, 163, 7, 
3891467, 3, 5, 3, 127, 443, 3, 31, 7783541, 3, 7, 15567089, 3, 
19, 29, 3, 5323, 7, 5, 3, 31139561, 3, 41, 3, 5, 3, 62279171, 3, 
7, 83, 3
29, 3, 1103, 3, 5, 3, 13, 7, 124559609, 3, 107, 3, 911, 3, 
249120239, 3, 11, 3, 7, 61, 37, 179, 3, 31, 19051, 7, 3793, 23, 
3, 5, 3, 6257, 3

3, 11, 3, 13, 5, 3, 739, 37, 5, 3, 498270791, 3, 19, 11, 3
3, 3, 5, 3, 996541661, 3, 7, 37, 5, 3, 67, 1993083437, 3, 5, 3, 
83, 3, 3, 0,

real7m54.093s
user7m54.062s
sys 0m0.024s
*/
```

However, parallel processing for 4-digit sequence elements is not 
promising at least for the Rowland Sequence.


SDB@79


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-05 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Apr 06, 2023 at 01:20:28AM +, Paul via Digitalmars-d-learn wrote:
[...]
> Yes I understand, basically, what's going on in hardware.  I just
> wasn't sure if the access type was linked to the container type.  It
> seems obvious now, since you've both made it clear, that it also
> depends on how I'm accessing my container.
> 
> Having said all of this, isn't a D 'range' fundamentally a sequential
> access container (i.e popFront) ?

D ranges are conceptually sequential, but the actual underlying memory
access patterns depends on the concrete type at runtime. An array's
elements are stored sequentially in memory, and arrays are ranges.  But
a linked-list can also have a range interface, yet its elements may be
stored in non-consecutive memory locations.  So the concrete type
matters here; the range API only gives you conceptual sequentiality, it
does not guarantee physically sequential memory access.


T

-- 
Many open minds should be closed for repairs. -- K5 user


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-05 Thread Paul via Digitalmars-d-learn

On Wednesday, 5 April 2023 at 23:06:54 UTC, H. S. Teoh wrote:

So your data structures and algorithms should be designed in a 
way that takes advantage of linear access where possible.



T


Yes I understand, basically, what's going on in hardware.  I just 
wasn't sure if the access type was linked to the container type.  
It seems obvious now, since you've both made it clear, that it 
also depends on how I'm accessing my container.


Having said all of this, isn't a D 'range' fundamentally a 
sequential access container (i.e popFront) ?


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-05 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Apr 05, 2023 at 10:34:22PM +, Paul via Digitalmars-d-learn wrote:
> On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:
> 
> > Best practices for arrays in hot loops:
[...]
> > - Where possible, prefer sequential access over random access (take
> >   advantage of the CPU cache hierarchy).
> 
> Thanks for sharing Teoh!  Very helpful.
> 
> would this be random access? for(size_t i; i indices?  ...and this be sequential foreach(a;arr) ?
> 
> or would they have to be completely different kinds of containers?  a
> dlang 'range' vs arr[]?
[...]

The exact syntactic construct you use is not important; under the hood,
for(i; i

Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-05 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/5/23 6:34 PM, Paul wrote:

On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:


Best practices for arrays in hot loops:
- Avoid appending if possible; instead, pre-allocate outside the loop.
- Where possible, reuse existing arrays instead of discarding old ones
  and allocating new ones.
- Use slices where possible instead of making copies of subarrays (this
  esp. applies to strings).
- Where possible, prefer sequential access over random access (take
  advantage of the CPU cache hierarchy).


Thanks for sharing Teoh!  Very helpful.

would this be random access? for(size_t i; iindices?

...and this be sequential foreach(a;arr) ?


No, random access is access out of sequence. Those two lines are pretty 
much equivalent, and even a naive compiler is going to produce exactly 
the same generated code from both of them.


A classic example is processing a 2d array:

```d
for(int i = 0; i < arr[0].length; ++i)
   for(int j = 0; j < arr.length; ++j)
 arr[j][i]++;

// vs
for(int j = 0; j < arr.length; ++j)
   for(int i = 0; i < arr[0].length; ++i)
 arr[j][i]++;
```

The first accesses elements *by column*, which means that the array data 
is accessed non-linearly in memory.


To be fair, both are "linear" in terms of algorithm, but one is going to 
be faster because of cache coherency (you are accessing sequential 
*hardware addresses*).


-Steve


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-05 Thread Paul via Digitalmars-d-learn

On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:


Best practices for arrays in hot loops:
- Avoid appending if possible; instead, pre-allocate outside 
the loop.
- Where possible, reuse existing arrays instead of discarding 
old ones

  and allocating new ones.
- Use slices where possible instead of making copies of 
subarrays (this

  esp. applies to strings).
- Where possible, prefer sequential access over random access 
(take

  advantage of the CPU cache hierarchy).


Thanks for sharing Teoh!  Very helpful.

would this be random access? for(size_t i; iusing indices?

...and this be sequential foreach(a;arr) ?

or would they have to be completely different kinds of containers?
a dlang 'range' vs arr[]?




Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Apr 04, 2023 at 09:35:29PM +, Paul via Digitalmars-d-learn wrote:
[...]
> Well Steven just making the change you said reduced the execution time
> from ~6-7 secs to ~3 secs.  Then, including the 'parallel' in the
> foreach statement took it down to ~1 sec.
> 
> Boy lesson learned in appending-to and zeroing dynamic arrays in a hot
> loop!

Best practices for arrays in hot loops:
- Avoid appending if possible; instead, pre-allocate outside the loop.
- Where possible, reuse existing arrays instead of discarding old ones
  and allocating new ones.
- Use slices where possible instead of making copies of subarrays (this
  esp. applies to strings).
- Where possible, prefer sequential access over random access (take
  advantage of the CPU cache hierarchy).


T

-- 
Famous last words: I *think* this will work...


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread Paul via Digitalmars-d-learn
On Monday, 3 April 2023 at 23:50:48 UTC, Steven Schveighoffer 
wrote:


So what you need is inside `createSpansOfNoBeacons`, take as a 
reference a `ref Span[MAX_SPANS]`, and have it return a 
`Span[]` that is a slice of that which was "alocated".


See if this helps.


Well Steven just making the change you said reduced the execution 
time from ~6-7 secs to ~3 secs.  Then, including the 'parallel' 
in the foreach statement took it down to ~1 sec.


Boy lesson learned in appending-to and zeroing dynamic arrays in 
a hot loop!




Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/4/23 11:34 AM, Salih Dincer wrote:

On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote:
parallel is a shortcut to `TaskPool.parallel`, which is indeed a 
foreach-only construct, it does not return a range.


I think what you want is `TaskPool.map`:

```d
// untested, just looking at the
taskPool.map!(/* your map function here */)
   (s.iota(len)).writeln;
```



I tried, thanks but it goes into infinite loop. For example, the first 
50 of the sequence should have been printed to the screen immediately 
without waiting.


```d
long[50] arr;
RowlandSequence_v2 range;

auto next(long a)
{
   range.popFront();
   return arr[a] = range.front();
}

void main()
{
     range = RowlandSequence_v2(7, 2);
     taskPool.map!next(iota(50))/*
     s.iota(50)
  .map!next
  .parallel//*/
  .writeln;
}
```


Keep in mind that `arr` and `range` are thread-local, and so will be 
different states for different tasks.


Though I don't really know what you are doing there.

-Steve


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread Salih Dincer via Digitalmars-d-learn
On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer 
wrote:
parallel is a shortcut to `TaskPool.parallel`, which is indeed 
a foreach-only construct, it does not return a range.


I think what you want is `TaskPool.map`:

```d
// untested, just looking at the
taskPool.map!(/* your map function here */)
   (s.iota(len)).writeln;
```



I tried, thanks but it goes into infinite loop. For example, the 
first 50 of the sequence should have been printed to the screen 
immediately without waiting.


```d
long[50] arr;
RowlandSequence_v2 range;

auto next(long a)
{
  range.popFront();
  return arr[a] = range.front();
}

void main()
{
range = RowlandSequence_v2(7, 2);
taskPool.map!next(iota(50))/*
s.iota(50)
 .map!next
 .parallel//*/
 .writeln;
}
```

On Tuesday, 4 April 2023 at 13:18:01 UTC, Ali Çehreli wrote:
I don't have time to experiment more at this time but I have 
the following chapters, which includes some of those other 
algorithms as well:


  http://ddili.org/ders/d/kosut_islemler.html


I read it, thanks...

SDB@79


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/4/23 5:24 AM, Salih Dincer wrote:

Is it necessary to enclose the code in `foreach()`? I invite Ali to tell 
me! Please explain why parallel isn't running.


parallel is a shortcut to `TaskPool.parallel`, which is indeed a 
foreach-only construct, it does not return a range.


I think what you want is `TaskPool.map`:

```d
// untested, just looking at the
taskPool.map!(/* your map function here */)
   (s.iota(len)).writeln;
```

Can't use pipelining with it, because it is a member function.

https://dlang.org/phobos/std_parallelism.html#.TaskPool.map

-Steve


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread Ali Çehreli via Digitalmars-d-learn

On 4/4/23 02:24, Salih Dincer wrote:

> I don't understand what `foreach()` does :)

Hm. I forgot whether 'parallel' works only with 'foreach'. But there are 
various other algorithms in std.parallelism that may be more useful with 
range algorithm chains:


  https://dlang.org/phobos/std_parallelism.html

> in Turkish

I don't have time to experiment more at this time but I have the 
following chapters, which includes some of those other algorithms as well:


  http://ddili.org/ders/d/kosut_islemler.html

  http://ddili.org/ders/d.en/parallelism.html

Ali



Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread Salih Dincer via Digitalmars-d-learn
On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer 
wrote:

So for example, if you have:

```d
foreach(i; iota(0, 2_000_000).parallel)
{
   runExpensiveTask(i);
}
```

The foreach is run on the main thread, gets a `0`, then hands 
off to a task thread `runExpensiveTask(0)`. Then it gets a `1`, 
and hands off to a task thread `runExpensiveTask(1)`, etc. The 
iteration is not expensive, and is not done in parallel.


On the other hand, what you *shouldn't* do is:

```d
foreach(i; iota(0, 2_000_000).map!(x => 
runExpensiveTask(x)).parallel)

{
}
```

as this will run the expensive task *before* running any tasks.


I don't understand what `foreach()` does :)
```d
import std.range, std.algorithm : map;
import std.stdio, std.parallelism;
//import sdb.sequences : RowlandSequence_v2;/*
struct RowlandSequence_v2 {
  import std.numeric : gcd;

  long b, r, a = 3;
  enum empty = false;
  auto front() => a;
  void popFront() {
long result = 1;
while(result == 1) {
  result = gcd(r++, b);
  b += result;
}
a = result;
  }
}//*/

enum BP : long {
   // s, f, r, b = 7, /* <- beginning
   s = 178, r = 1993083484, b =  5979250449,//*/
   len = 190
}

void main()
{
  with(BP) {
long[len] arr;
auto range = RowlandSequence_v2(b, r);

s.iota(len)
 .map!((a){
range.popFront();
return arr[a] = range.front();
  }
  )
 .parallel
 .writeln;
  }
} /* PRINTS:

ParallelForeach!(MapResult!(__lambda3, 
Result))(std.parallelism.TaskPool, [5, 3, 73, 157, 7, 5, 3, 13, 
3986167223, 3, 7, 73], 1)


*/
```

Is it necessary to enclose the code in `foreach()`? I invite Ali 
to tell me! Please explain why parallel isn't running.


"Ben anlamıyor, foreach ne yapıyor  Kodu `foreach()` içine almak 
şart mı? Ali'yi davet ediyorum, bana anlatması için! Paralel() 
niye çalışmıyor, lütfen açıklayın hocam. Mümkünse Türkçe!" in 
Turkish.


SDB@79


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-03 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/3/23 7:22 PM, Paul wrote:

```d
// Timed main() 
vvv

void main(string[] args) {
auto progStartTime = MonoTime.currTime;
//-
 string filename = args.length > 1 ? args[1] : "aoc2215a.data";
 CommPair[] commPair;
 ulong x,y;

 // read file that has data sets in the form of x,y coordinate pairs
 // for each sensor-beacon pair.  Create on array of structs to hold
 // this information.
 loadFileDataIntoArrayOfStructs(commPair, filename);

 foreach(int lineOfInterest;parallel(iota(0,4_000_001))) {
     Span[] span; // A section of line-of-interest coordinates where 
no other beacons are present.

     const spanReserve = span.reserve(50);
     createSpansOfNoBeacons(lineOfInterest,commPair,span);

     // if spans overlap, combine them into a single span and mark
     // the other spans !inUse.
     combineOverlappingSpans(span);

     // look for a line that doesn't have 4,000,001 locations 
accounted for

     if(beaconFreeLocations(span) < 4_000_001) {

     // find the location that is not accounted for
     foreach(ulong i;0..4_000_000) {
     bool found = false;
     foreach(sp;span) {
     if(i >= sp.xLow && i <= sp.xHigh) {
     found = true;
     break;
     }
     }
     if(!found) {
     x = i; y = lineOfInterest;
     break;
     }
     }
     }
 }
writeln(x," ",y);

```


So I just quoted your main loop.

I am assuming that this O(n^2) algorithm doesn't actually run for all 
iterations, because that wouldn't be feasible (16 trillion iterations is 
a lot).


This means that I'm assuming a lot of cases do not run the second loop. 
Everything you do besides prune the second loop is mostly allocating an 
array of `Span` types. This means most of the parallel loops are 
allocating, and doing nothing else. As I said earlier, allocations need 
a global lock of the GC.


What you need to do probably, is to avoid these allocations per loop.

The easiest thing I can think of is to store the Span array as a static 
array of the largest array you need (i.e. the length of `commPair`), and 
then slice it instead of appending.


So what you need is inside `createSpansOfNoBeacons`, take as a reference 
a `ref Span[MAX_SPANS]`, and have it return a `Span[]` that is a slice 
of that which was "alocated".


See if this helps.

FWIW, I did the AoC 2022 as well, and I never needed parallel execution. 
Looking at my solution comment in reddit, this one I sort of punted by 
knowing I could exit as soon as the answer is found (my solution runs in 
2.5s on my input). But I recommend (once you are done), reading this 
post, it is a really cool way to look at it:


https://www.reddit.com/r/adventofcode/comments/zmcn64/2022_day_15_solutions/j0hl19a/?context=8=9

-Steve


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-03 Thread Paul via Digitalmars-d-learn
On Monday, 3 April 2023 at 23:13:58 UTC, Steven Schveighoffer 
wrote:


Yeah, please post.

```d
module aoc2215b2;

import std.stdio;
import std.file: readText;
import std.conv: to;
import std.math: abs;
import std.traits;
import std.parallelism;
import std.range;
import core.time: MonoTime;

// Timed main() 
vvv

void main(string[] args) {
auto progStartTime = MonoTime.currTime;
//-
string filename = args.length > 1 ? args[1] : "aoc2215a.data";
CommPair[] commPair;
ulong x,y;

	// read file that has data sets in the form of x,y coordinate 
pairs
	// for each sensor-beacon pair.  Create on array of structs to 
hold

// this information.
loadFileDataIntoArrayOfStructs(commPair, filename);

foreach(int lineOfInterest;parallel(iota(0,4_000_001))) {
		Span[] span; // A section of line-of-interest coordinates where 
no other beacons are present.

const spanReserve = span.reserve(50);
createSpansOfNoBeacons(lineOfInterest,commPair,span);

// if spans overlap, combine them into a single span and mark
// the other spans !inUse.
combineOverlappingSpans(span);

		// look for a line that doesn't have 4,000,001 locations 
accounted for

if(beaconFreeLocations(span) < 4_000_001) {

// find the location that is not accounted for
foreach(ulong i;0..4_000_000) {
bool found = false;
foreach(sp;span) {
if(i >= sp.xLow && i <= sp.xHigh) {
found = true;
break;
}
}
if(!found) {
x = i; y = lineOfInterest;
break;
}
}
}
}
writeln(x," ",y);

//-
auto progEndTime = MonoTime.currTime;
writeln(progEndTime - progStartTime);
}
// Timed main() 
^^^




struct CommPair {
int sx,sy,bx,by;
int manhattanDistance;
}

void loadFileDataIntoArrayOfStructs(ref CommPair[] commPair, 
string filename) {

import std.regex;
auto s = readText(filename);
	auto ctr = ctRegex!(`x=(-*\d+), y=(-*\d+):.*x=(-*\d+), 
y=(-*\d+)`);

CommPair cptemp;
foreach (c; matchAll(s, ctr)) {
cptemp.sx = to!int(c[1]);
cptemp.sy = to!int(c[2]);
cptemp.bx = to!int(c[3]);
cptemp.by = to!int(c[4]);
		cptemp.manhattanDistance = abs(cptemp.sx-cptemp.bx) + 
abs(cptemp.sy-cptemp.by);

commPair ~= cptemp;
}
}

struct Span {
int xLow, xHigh;
bool inUse = true;
}

void createSpansOfNoBeacons(int lineOfInterest, CommPair[] 
commPair,ref Span[] span) {

foreach(size_t i,cp;commPair) {
int distanceToLineOfInterest = abs(cp.sy - lineOfInterest);
if(cp.manhattanDistance >= distanceToLineOfInterest) {
			int xLow  = cp.sx - (cp.manhattanDistance - 
distanceToLineOfInterest);
			int xHigh = cp.sx + (cp.manhattanDistance - 
distanceToLineOfInterest);

span ~= Span(xLow,xHigh);
}
}
}

void combineOverlappingSpans(ref Span[] span) {
bool combinedSpansThisCycle = true;
while(combinedSpansThisCycle) {
combinedSpansThisCycle = false;
for(size_t i=0; i < span.length-1; i++) {
if(!span[i].inUse) continue;

for(size_t j=i+1; j < span.length; j++) {
if(!span[j].inUse) continue;

// if one span overlaps with the other, combine them into one 
span
if(spanIContainedInSpanJ(span[i],span[j]) || 
spanJContainedInSpanI(span[i],span[j])) {
	span[i].xLow  = span[i].xLow  < span[j].xLow  ? span[i].xLow 
 : span[j].xLow;
	span[i].xHigh = span[i].xHigh > span[j].xHigh ? 
span[i].xHigh : span[j].xHigh;

span[j].inUse = false;
combinedSpansThisCycle = true;

// after combining two spans, perform 
bounds checking
// 15 part b limits the search between 
0 and 4,000,000
	span[i].xLow  = span[i].xLow  < 0 ? 0 : 
span[i].xLow;
	span[i].xHigh = span[i].xHigh > 4_000_000 ? 4_000_000 : 
span[i].xHigh;

 

Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-03 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/3/23 6:56 PM, Paul wrote:

On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote:



If your `foreach` body takes a global lock (like `writeln(i);`), 
then it's not going to run any faster (probably slower actually).

**Ok I did have some debug writelns I commented out.**


And did it help?

**No**

My program is about 140 lines Steven.  Its just one of the Advent of 
Code challenges.  Could I past the whole program here and see what you 
think?


Yeah, please post.

-Steve


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-03 Thread Paul via Digitalmars-d-learn
On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer 
wrote:




If your `foreach` body takes a global lock (like 
`writeln(i);`), then it's not going to run any faster 
(probably slower actually).

**Ok I did have some debug writelns I commented out.**


And did it help?

**No**

My program is about 140 lines Steven.  Its just one of the Advent 
of Code challenges.  Could I past the whole program here and see 
what you think?


Thanks for your assistance...much appreciated.





Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-03 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/3/23 6:02 PM, Paul wrote:

On Sunday, 2 April 2023 at 15:32:05 UTC, Steven Schveighoffer wrote:



It's important to note that parallel doesn't iterate the range in 
parallel, it just runs the body in parallel limited by your CPU count.

**?!?**


So for example, if you have:

```d
foreach(i; iota(0, 2_000_000).parallel)
{
   runExpensiveTask(i);
}
```

The foreach is run on the main thread, gets a `0`, then hands off to a 
task thread `runExpensiveTask(0)`. Then it gets a `1`, and hands off to 
a task thread `runExpensiveTask(1)`, etc. The iteration is not 
expensive, and is not done in parallel.


On the other hand, what you *shouldn't* do is:

```d
foreach(i; iota(0, 2_000_000).map!(x => runExpensiveTask(x)).parallel)
{
}
```

as this will run the expensive task *before* running any tasks.



If your `foreach` body takes a global lock (like `writeln(i);`), then 
it's not going to run any faster (probably slower actually).

**Ok I did have some debug writelns I commented out.**


And did it help? Another thing that takes a global lock is memory 
allocation.



Also make sure you have more than one logical CPU.

**I have 8.**


It's dependent on the work being done, but you should see a roughly 8x 
speedup as long as the overhead of distributing tasks is not significant 
compared to the work being done.


-Steve


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-03 Thread Paul via Digitalmars-d-learn
On Sunday, 2 April 2023 at 15:32:05 UTC, Steven Schveighoffer 
wrote:




It's important to note that parallel doesn't iterate the range 
in parallel, it just runs the body in parallel limited by your 
CPU count.

**?!?**

If your `foreach` body takes a global lock (like 
`writeln(i);`), then it's not going to run any faster (probably 
slower actually).

**Ok I did have some debug writelns I commented out.**

If you can disclose more about what you are trying to do, it 
would be helpful.
**This seems like it would be a lot of code and explaining but 
let me think about how to summarize.**



Also make sure you have more than one logical CPU.

**I have 8.**




Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-02 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/1/23 6:32 PM, Paul wrote:

On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer wrote:

On 4/1/23 2:25 PM, Paul wrote:

```d
import std.range;

foreach(i; iota(0, 2_000_000).parallel)
```



Is there a way to tell if the parallelism actually divided up the work?  
Both versions of my program run in the same time ~6 secs.


It's important to note that parallel doesn't iterate the range in 
parallel, it just runs the body in parallel limited by your CPU count.


If your `foreach` body takes a global lock (like `writeln(i);`), then 
it's not going to run any faster (probably slower actually).


If you can disclose more about what you are trying to do, it would be 
helpful.


Also make sure you have more than one logical CPU.

-Steve


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-02 Thread Salih Dincer via Digitalmars-d-learn

On Sunday, 2 April 2023 at 04:34:40 UTC, Salih Dincer wrote:

I haven't seen rsFirst256 until now...


**Edit:** I saw, I saw :)

I am struck with consternation! I've never seen these results 
before. Interesting, there is such a thing as parallel threading 
:)


Here are my skipPoints:

```d
enum BP : long {
  //f, a, r, b = 7, /* <- beginning
   f = 113, r =   62279227, b =   186837678,
  // f = 146, r =  249134971, b =   747404910,
  // f = 161, r =  498270808, b =  1494812421,
  // f = 178, r = 1993083484, b =  5979250449,
  // f = 210, r = 3986167363, b = 11958502086,
  //*/
  s = 5
} /* PRINTS:
eLab@pico:~/Projeler$ ./RownlandSequence_v2
122: ["124559610, 373678827"]
128: ["249120240, 747360717"]
*/
```

It looks like there are 5 total skipPoints until 256 where it 
loops for a long time. (while passing 1's).


SDB@79


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Salih Dincer via Digitalmars-d-learn

On Saturday, 1 April 2023 at 22:48:46 UTC, Ali Çehreli wrote:

On 4/1/23 15:30, Paul wrote:

> Is there a way to verify that it split up the work in to
tasks/threads
> ...?

It is hard to see the difference unless there is actual work in 
the loop that takes time.


I always use the Rowland Sequence for such experiments.  At least 
it's better than the Fibonacci Range:


```d
struct RowlandSequence {
  import std.numeric : gcd;
  import std.format : format;
  import std.conv : text;

  long b, r, a = 3;
  enum empty = false;

  string[] front() {
string result = format("%s, %s", b, r);
return [text(a), result];
  }

  void popFront() {
long result = 1;
while(result == 1) {
  result = gcd(r++, b);
  b += result;
}
a = result;
  }
}

enum BP {
  f = 1, b = 7, r = 2, a = 1, /*
  f = 109, b = 186837516, r = 62279173, //*/
  s = 5
}

void main()
{
  RowlandSequence rs;
  long start, skip;

  with(BP) {
rs = RowlandSequence(b, r);
start = f;
skip = s;
  }
  rs.popFront();

  import std.stdio, std.parallelism;
  import std.range : take;

  auto rsFirst128 = rs.take(128);
  foreach(r; rsFirst128.parallel)
  {
if(r[0].length > skip)
{
  start.writeln(": ", r);
}
start++;
  }
} /* PRINTS:

46: ["121403", "364209, 121404"]
48: ["242807", "728421, 242808"]
68: ["486041", "1458123, 486042"]
74: ["972533", "2917599, 972534"]
78: ["1945649", "5836947, 1945650"]
82: ["3891467", "11674401, 3891468"]
90: ["7783541", "23350623, 7783542"]
93: ["15567089", "46701267, 15567090"]
102: ["31139561", "93418683, 31139562"]
108: ["62279171", "186837513, 62279172"]

*/
```

The operation is simple, again multiplication, addition, 
subtraction and module, i.e. So four operations but enough to 
overrun the CPU! I haven't seen rsFirst256 until now because I 
don't have a fast enough processor. Maybe you'll see it, but the 
first 108 is fast anyway.


**PS:** Decrease value of the `skip` to see the entire sequence. 
In cases where your processor power is not enough, you can create 
skip points.  Check out BP...


SDB@79


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Ali Çehreli via Digitalmars-d-learn

On 4/1/23 15:30, Paul wrote:

> Is there a way to verify that it split up the work in to tasks/threads
> ...?

It is hard to see the difference unless there is actual work in the loop 
that takes time. You can add a Thread.sleep call. (Commented-out in the 
following program.)


Another option is to monitor a task manager like 'top' on unix based 
systems. It should multiple threads for the same program.


However, I will do something unspeakably wrong and take advantage of 
undefined behavior below. :) Since iteration count is an even number, 
the 'sum' variable should come out as 0 in the end. With .parallel it 
doesn't because multiple threads are stepping on each other's toes (values):


import std;

void main() {
long sum;

foreach(i; iota(0, 2_000_000).parallel) {
// import core.thread;
// Thread.sleep(1.msecs);

if (i % 2) {
++sum;

} else {
--sum;
}
}

if (sum == 0) {
writeln("We highly likely worked serially.");

} else {
writefln!"We highly likely worked in parallel because %s != 
0."(sum);

}
}

If you remove .parallel, 'sum' will always be 0.

Ali



Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn
On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer 
wrote:

On 4/1/23 2:25 PM, Paul wrote:

```d
import std.range;

foreach(; iota(0, 2_000_000).parallel)
```

-Steve


Is there a way to tell if the parallelism actually divided up the 
work?  Both versions of my program run in the same time ~6 secs.


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn

```d
import std.range;

foreach(; iota(0, 2_000_000).parallel)
```

-Steve


Is there a way to verify that it split up the work in to 
tasks/threads ...?  The example you gave me works...compiles w/o 
errors but the execution time is the same as the non-parallel 
version.  They both take about 6 secs to execute.  totalCPUs 
tells me I have 8 CPUs available.




Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn

Thanks Steve.




Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/1/23 2:25 PM, Paul wrote:

Thanks in advance for any assistance.

As the subject line suggests can I do something like? :
```d
foreach (i; taskPool.parallel(0..2_000_000))
```
Obviously this exact syntax doesn't work but I think it expresses the 
gist of my challenge.



```d
import std.range;

foreach(; iota(0, 2_000_000).parallel)
```

-Steve


foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn

Thanks in advance for any assistance.

As the subject line suggests can I do something like? :
```d
foreach (i; taskPool.parallel(0..2_000_000))
```
Obviously this exact syntax doesn't work but I think it expresses 
the gist of my challenge.