Re: foreach (i; taskPool.parallel(0..2_000_000)
On Thursday, 6 April 2023 at 01:44:15 UTC, H. S. Teoh wrote: D ranges are conceptually sequential, but the actual underlying memory access patterns depends on the concrete type at runtime. An array's elements are stored sequentially in memory, and arrays are ranges. But a linked-list can also have a range interface, yet its elements may be stored in non-consecutive memory locations. So the concrete type matters here; the range API only gives you conceptual sequentiality, it does not guarantee physically sequential memory access. Very helpful Teoh. Thanks again.
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Tuesday, 4 April 2023 at 16:22:29 UTC, Steven Schveighoffer wrote: On 4/4/23 11:34 AM, Salih Dincer wrote: On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote: parallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */) (s.iota(len)).writeln; ``` I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting. ```d long[50] arr; RowlandSequence_v2 range; auto next(long a) { range.popFront(); return arr[a] = range.front(); } void main() { range = RowlandSequence_v2(7, 2); taskPool.map!next(iota(50))/* s.iota(50) .map!next .parallel//*/ .writeln; } ``` Keep in mind that `arr` and `range` are thread-local, and so will be different states for different tasks. I guess the reason it goes into an infinite loop is because gcd() a recursive function (gcd). This is the only solution I can think of about this: ```d import std.range, std.algorithm : map; import std.stdio, std.parallelism; //import std.numeric : gcd; /* struct Vector { long x, y, result; alias result this; } Vector gcd(long a, long b) { if(b == 0) return Vector(1, 0, a); auto pre = gcd(b, a % b); auto tmp = (a / b) * pre.y; return Vector(pre.y, pre.x - tmp, pre.result); }//*/ struct RowlandSequence_v3 { long b, r, n, a = 3, limit; bool empty() { return n == limit; } auto front() { return a; } void popFront() { long result = 1; while(result == 1) { result = gcd(r++, b); b += result; } a = result; } long gcd(long a, long b) { long c; while(b != 0) { c = a % b; a = b; b = c; } return a; } } auto next(ref RowlandSequence_v3 range) { with(range) { if(empty) return [n, front]; popFront(); return [n++, front]; } } long[179] arr; void main() { // initialization: RowlandSequence_v3[4] r = [ RowlandSequence_v3(7 , 2, 0, 3, 112), RowlandSequence_v3(186837678, 62279227, 112, 3, 145), RowlandSequence_v3(747404910, 249134971, 145, 6257, 160), RowlandSequence_v3(1494812421, 498270808, 160, 11, 177) ]; auto tasks = [ task(, r[0]), task(, r[1]), task(, r[2]), task(, r[3]) ]; // quad parallel operation: foreach(_; 0..r[0].limit) { foreach(p, ref task; tasks) { task.executeInNewThread; auto t = task.workForce; arr[t[0]] = t[1]; } } // prints... foreach(x, n; arr) { switch(x + 1) { case 112, 145, 160: n.writeln; break; default: n.write(", "); } } } /* PRINTS: user@debian:~/Documents$ dmd -O rowlandSequence.d -release user@debian:~/Documents$ time ./rowlandSequence 5, 3, 11, 3, 23, 3, 47, 3, 5, 3, 101, 3, 7, 11, 3, 13, 233, 3, 467, 3, 5, 3, 941, 3, 7, 1889, 3, 3779, 3, 7559, 3, 13, 15131, 3, 53, 3, 7, 30323, 3, 60647, 3, 5, 3, 101, 3, 121403, 3, 242807, 3, 5, 3, 19, 7, 5, 3, 47, 3, 37, 5, 3, 17, 3, 199, 53, 3, 29, 3, 486041, 3, 7, 421, 23, 3, 972533, 3, 577, 7, 1945649, 3, 163, 7, 3891467, 3, 5, 3, 127, 443, 3, 31, 7783541, 3, 7, 15567089, 3, 19, 29, 3, 5323, 7, 5, 3, 31139561, 3, 41, 3, 5, 3, 62279171, 3, 7, 83, 3 29, 3, 1103, 3, 5, 3, 13, 7, 124559609, 3, 107, 3, 911, 3, 249120239, 3, 11, 3, 7, 61, 37, 179, 3, 31, 19051, 7, 3793, 23, 3, 5, 3, 6257, 3 3, 11, 3, 13, 5, 3, 739, 37, 5, 3, 498270791, 3, 19, 11, 3 3, 3, 5, 3, 996541661, 3, 7, 37, 5, 3, 67, 1993083437, 3, 5, 3, 83, 3, 3, 0, real7m54.093s user7m54.062s sys 0m0.024s */ ``` However, parallel processing for 4-digit sequence elements is not promising at least for the Rowland Sequence. SDB@79
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Thu, Apr 06, 2023 at 01:20:28AM +, Paul via Digitalmars-d-learn wrote: [...] > Yes I understand, basically, what's going on in hardware. I just > wasn't sure if the access type was linked to the container type. It > seems obvious now, since you've both made it clear, that it also > depends on how I'm accessing my container. > > Having said all of this, isn't a D 'range' fundamentally a sequential > access container (i.e popFront) ? D ranges are conceptually sequential, but the actual underlying memory access patterns depends on the concrete type at runtime. An array's elements are stored sequentially in memory, and arrays are ranges. But a linked-list can also have a range interface, yet its elements may be stored in non-consecutive memory locations. So the concrete type matters here; the range API only gives you conceptual sequentiality, it does not guarantee physically sequential memory access. T -- Many open minds should be closed for repairs. -- K5 user
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Wednesday, 5 April 2023 at 23:06:54 UTC, H. S. Teoh wrote: So your data structures and algorithms should be designed in a way that takes advantage of linear access where possible. T Yes I understand, basically, what's going on in hardware. I just wasn't sure if the access type was linked to the container type. It seems obvious now, since you've both made it clear, that it also depends on how I'm accessing my container. Having said all of this, isn't a D 'range' fundamentally a sequential access container (i.e popFront) ?
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Wed, Apr 05, 2023 at 10:34:22PM +, Paul via Digitalmars-d-learn wrote: > On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote: > > > Best practices for arrays in hot loops: [...] > > - Where possible, prefer sequential access over random access (take > > advantage of the CPU cache hierarchy). > > Thanks for sharing Teoh! Very helpful. > > would this be random access? for(size_t i; i indices? ...and this be sequential foreach(a;arr) ? > > or would they have to be completely different kinds of containers? a > dlang 'range' vs arr[]? [...] The exact syntactic construct you use is not important; under the hood, for(i; i
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/5/23 6:34 PM, Paul wrote: On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote: Best practices for arrays in hot loops: - Avoid appending if possible; instead, pre-allocate outside the loop. - Where possible, reuse existing arrays instead of discarding old ones and allocating new ones. - Use slices where possible instead of making copies of subarrays (this esp. applies to strings). - Where possible, prefer sequential access over random access (take advantage of the CPU cache hierarchy). Thanks for sharing Teoh! Very helpful. would this be random access? for(size_t i; iindices? ...and this be sequential foreach(a;arr) ? No, random access is access out of sequence. Those two lines are pretty much equivalent, and even a naive compiler is going to produce exactly the same generated code from both of them. A classic example is processing a 2d array: ```d for(int i = 0; i < arr[0].length; ++i) for(int j = 0; j < arr.length; ++j) arr[j][i]++; // vs for(int j = 0; j < arr.length; ++j) for(int i = 0; i < arr[0].length; ++i) arr[j][i]++; ``` The first accesses elements *by column*, which means that the array data is accessed non-linearly in memory. To be fair, both are "linear" in terms of algorithm, but one is going to be faster because of cache coherency (you are accessing sequential *hardware addresses*). -Steve
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote: Best practices for arrays in hot loops: - Avoid appending if possible; instead, pre-allocate outside the loop. - Where possible, reuse existing arrays instead of discarding old ones and allocating new ones. - Use slices where possible instead of making copies of subarrays (this esp. applies to strings). - Where possible, prefer sequential access over random access (take advantage of the CPU cache hierarchy). Thanks for sharing Teoh! Very helpful. would this be random access? for(size_t i; iusing indices? ...and this be sequential foreach(a;arr) ? or would they have to be completely different kinds of containers? a dlang 'range' vs arr[]?
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Tue, Apr 04, 2023 at 09:35:29PM +, Paul via Digitalmars-d-learn wrote: [...] > Well Steven just making the change you said reduced the execution time > from ~6-7 secs to ~3 secs. Then, including the 'parallel' in the > foreach statement took it down to ~1 sec. > > Boy lesson learned in appending-to and zeroing dynamic arrays in a hot > loop! Best practices for arrays in hot loops: - Avoid appending if possible; instead, pre-allocate outside the loop. - Where possible, reuse existing arrays instead of discarding old ones and allocating new ones. - Use slices where possible instead of making copies of subarrays (this esp. applies to strings). - Where possible, prefer sequential access over random access (take advantage of the CPU cache hierarchy). T -- Famous last words: I *think* this will work...
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Monday, 3 April 2023 at 23:50:48 UTC, Steven Schveighoffer wrote: So what you need is inside `createSpansOfNoBeacons`, take as a reference a `ref Span[MAX_SPANS]`, and have it return a `Span[]` that is a slice of that which was "alocated". See if this helps. Well Steven just making the change you said reduced the execution time from ~6-7 secs to ~3 secs. Then, including the 'parallel' in the foreach statement took it down to ~1 sec. Boy lesson learned in appending-to and zeroing dynamic arrays in a hot loop!
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/4/23 11:34 AM, Salih Dincer wrote: On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote: parallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */) (s.iota(len)).writeln; ``` I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting. ```d long[50] arr; RowlandSequence_v2 range; auto next(long a) { range.popFront(); return arr[a] = range.front(); } void main() { range = RowlandSequence_v2(7, 2); taskPool.map!next(iota(50))/* s.iota(50) .map!next .parallel//*/ .writeln; } ``` Keep in mind that `arr` and `range` are thread-local, and so will be different states for different tasks. Though I don't really know what you are doing there. -Steve
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote: parallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */) (s.iota(len)).writeln; ``` I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting. ```d long[50] arr; RowlandSequence_v2 range; auto next(long a) { range.popFront(); return arr[a] = range.front(); } void main() { range = RowlandSequence_v2(7, 2); taskPool.map!next(iota(50))/* s.iota(50) .map!next .parallel//*/ .writeln; } ``` On Tuesday, 4 April 2023 at 13:18:01 UTC, Ali Çehreli wrote: I don't have time to experiment more at this time but I have the following chapters, which includes some of those other algorithms as well: http://ddili.org/ders/d/kosut_islemler.html I read it, thanks... SDB@79
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/4/23 5:24 AM, Salih Dincer wrote: Is it necessary to enclose the code in `foreach()`? I invite Ali to tell me! Please explain why parallel isn't running. parallel is a shortcut to `TaskPool.parallel`, which is indeed a foreach-only construct, it does not return a range. I think what you want is `TaskPool.map`: ```d // untested, just looking at the taskPool.map!(/* your map function here */) (s.iota(len)).writeln; ``` Can't use pipelining with it, because it is a member function. https://dlang.org/phobos/std_parallelism.html#.TaskPool.map -Steve
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/4/23 02:24, Salih Dincer wrote: > I don't understand what `foreach()` does :) Hm. I forgot whether 'parallel' works only with 'foreach'. But there are various other algorithms in std.parallelism that may be more useful with range algorithm chains: https://dlang.org/phobos/std_parallelism.html > in Turkish I don't have time to experiment more at this time but I have the following chapters, which includes some of those other algorithms as well: http://ddili.org/ders/d/kosut_islemler.html http://ddili.org/ders/d.en/parallelism.html Ali
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote: So for example, if you have: ```d foreach(i; iota(0, 2_000_000).parallel) { runExpensiveTask(i); } ``` The foreach is run on the main thread, gets a `0`, then hands off to a task thread `runExpensiveTask(0)`. Then it gets a `1`, and hands off to a task thread `runExpensiveTask(1)`, etc. The iteration is not expensive, and is not done in parallel. On the other hand, what you *shouldn't* do is: ```d foreach(i; iota(0, 2_000_000).map!(x => runExpensiveTask(x)).parallel) { } ``` as this will run the expensive task *before* running any tasks. I don't understand what `foreach()` does :) ```d import std.range, std.algorithm : map; import std.stdio, std.parallelism; //import sdb.sequences : RowlandSequence_v2;/* struct RowlandSequence_v2 { import std.numeric : gcd; long b, r, a = 3; enum empty = false; auto front() => a; void popFront() { long result = 1; while(result == 1) { result = gcd(r++, b); b += result; } a = result; } }//*/ enum BP : long { // s, f, r, b = 7, /* <- beginning s = 178, r = 1993083484, b = 5979250449,//*/ len = 190 } void main() { with(BP) { long[len] arr; auto range = RowlandSequence_v2(b, r); s.iota(len) .map!((a){ range.popFront(); return arr[a] = range.front(); } ) .parallel .writeln; } } /* PRINTS: ParallelForeach!(MapResult!(__lambda3, Result))(std.parallelism.TaskPool, [5, 3, 73, 157, 7, 5, 3, 13, 3986167223, 3, 7, 73], 1) */ ``` Is it necessary to enclose the code in `foreach()`? I invite Ali to tell me! Please explain why parallel isn't running. "Ben anlamıyor, foreach ne yapıyor Kodu `foreach()` içine almak şart mı? Ali'yi davet ediyorum, bana anlatması için! Paralel() niye çalışmıyor, lütfen açıklayın hocam. Mümkünse Türkçe!" in Turkish. SDB@79
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/3/23 7:22 PM, Paul wrote: ```d // Timed main() vvv void main(string[] args) { auto progStartTime = MonoTime.currTime; //- string filename = args.length > 1 ? args[1] : "aoc2215a.data"; CommPair[] commPair; ulong x,y; // read file that has data sets in the form of x,y coordinate pairs // for each sensor-beacon pair. Create on array of structs to hold // this information. loadFileDataIntoArrayOfStructs(commPair, filename); foreach(int lineOfInterest;parallel(iota(0,4_000_001))) { Span[] span; // A section of line-of-interest coordinates where no other beacons are present. const spanReserve = span.reserve(50); createSpansOfNoBeacons(lineOfInterest,commPair,span); // if spans overlap, combine them into a single span and mark // the other spans !inUse. combineOverlappingSpans(span); // look for a line that doesn't have 4,000,001 locations accounted for if(beaconFreeLocations(span) < 4_000_001) { // find the location that is not accounted for foreach(ulong i;0..4_000_000) { bool found = false; foreach(sp;span) { if(i >= sp.xLow && i <= sp.xHigh) { found = true; break; } } if(!found) { x = i; y = lineOfInterest; break; } } } } writeln(x," ",y); ``` So I just quoted your main loop. I am assuming that this O(n^2) algorithm doesn't actually run for all iterations, because that wouldn't be feasible (16 trillion iterations is a lot). This means that I'm assuming a lot of cases do not run the second loop. Everything you do besides prune the second loop is mostly allocating an array of `Span` types. This means most of the parallel loops are allocating, and doing nothing else. As I said earlier, allocations need a global lock of the GC. What you need to do probably, is to avoid these allocations per loop. The easiest thing I can think of is to store the Span array as a static array of the largest array you need (i.e. the length of `commPair`), and then slice it instead of appending. So what you need is inside `createSpansOfNoBeacons`, take as a reference a `ref Span[MAX_SPANS]`, and have it return a `Span[]` that is a slice of that which was "alocated". See if this helps. FWIW, I did the AoC 2022 as well, and I never needed parallel execution. Looking at my solution comment in reddit, this one I sort of punted by knowing I could exit as soon as the answer is found (my solution runs in 2.5s on my input). But I recommend (once you are done), reading this post, it is a really cool way to look at it: https://www.reddit.com/r/adventofcode/comments/zmcn64/2022_day_15_solutions/j0hl19a/?context=8=9 -Steve
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Monday, 3 April 2023 at 23:13:58 UTC, Steven Schveighoffer wrote: Yeah, please post. ```d module aoc2215b2; import std.stdio; import std.file: readText; import std.conv: to; import std.math: abs; import std.traits; import std.parallelism; import std.range; import core.time: MonoTime; // Timed main() vvv void main(string[] args) { auto progStartTime = MonoTime.currTime; //- string filename = args.length > 1 ? args[1] : "aoc2215a.data"; CommPair[] commPair; ulong x,y; // read file that has data sets in the form of x,y coordinate pairs // for each sensor-beacon pair. Create on array of structs to hold // this information. loadFileDataIntoArrayOfStructs(commPair, filename); foreach(int lineOfInterest;parallel(iota(0,4_000_001))) { Span[] span; // A section of line-of-interest coordinates where no other beacons are present. const spanReserve = span.reserve(50); createSpansOfNoBeacons(lineOfInterest,commPair,span); // if spans overlap, combine them into a single span and mark // the other spans !inUse. combineOverlappingSpans(span); // look for a line that doesn't have 4,000,001 locations accounted for if(beaconFreeLocations(span) < 4_000_001) { // find the location that is not accounted for foreach(ulong i;0..4_000_000) { bool found = false; foreach(sp;span) { if(i >= sp.xLow && i <= sp.xHigh) { found = true; break; } } if(!found) { x = i; y = lineOfInterest; break; } } } } writeln(x," ",y); //- auto progEndTime = MonoTime.currTime; writeln(progEndTime - progStartTime); } // Timed main() ^^^ struct CommPair { int sx,sy,bx,by; int manhattanDistance; } void loadFileDataIntoArrayOfStructs(ref CommPair[] commPair, string filename) { import std.regex; auto s = readText(filename); auto ctr = ctRegex!(`x=(-*\d+), y=(-*\d+):.*x=(-*\d+), y=(-*\d+)`); CommPair cptemp; foreach (c; matchAll(s, ctr)) { cptemp.sx = to!int(c[1]); cptemp.sy = to!int(c[2]); cptemp.bx = to!int(c[3]); cptemp.by = to!int(c[4]); cptemp.manhattanDistance = abs(cptemp.sx-cptemp.bx) + abs(cptemp.sy-cptemp.by); commPair ~= cptemp; } } struct Span { int xLow, xHigh; bool inUse = true; } void createSpansOfNoBeacons(int lineOfInterest, CommPair[] commPair,ref Span[] span) { foreach(size_t i,cp;commPair) { int distanceToLineOfInterest = abs(cp.sy - lineOfInterest); if(cp.manhattanDistance >= distanceToLineOfInterest) { int xLow = cp.sx - (cp.manhattanDistance - distanceToLineOfInterest); int xHigh = cp.sx + (cp.manhattanDistance - distanceToLineOfInterest); span ~= Span(xLow,xHigh); } } } void combineOverlappingSpans(ref Span[] span) { bool combinedSpansThisCycle = true; while(combinedSpansThisCycle) { combinedSpansThisCycle = false; for(size_t i=0; i < span.length-1; i++) { if(!span[i].inUse) continue; for(size_t j=i+1; j < span.length; j++) { if(!span[j].inUse) continue; // if one span overlaps with the other, combine them into one span if(spanIContainedInSpanJ(span[i],span[j]) || spanJContainedInSpanI(span[i],span[j])) { span[i].xLow = span[i].xLow < span[j].xLow ? span[i].xLow : span[j].xLow; span[i].xHigh = span[i].xHigh > span[j].xHigh ? span[i].xHigh : span[j].xHigh; span[j].inUse = false; combinedSpansThisCycle = true; // after combining two spans, perform bounds checking // 15 part b limits the search between 0 and 4,000,000 span[i].xLow = span[i].xLow < 0 ? 0 : span[i].xLow; span[i].xHigh = span[i].xHigh > 4_000_000 ? 4_000_000 : span[i].xHigh;
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/3/23 6:56 PM, Paul wrote: On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote: If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually). **Ok I did have some debug writelns I commented out.** And did it help? **No** My program is about 140 lines Steven. Its just one of the Advent of Code challenges. Could I past the whole program here and see what you think? Yeah, please post. -Steve
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote: If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually). **Ok I did have some debug writelns I commented out.** And did it help? **No** My program is about 140 lines Steven. Its just one of the Advent of Code challenges. Could I past the whole program here and see what you think? Thanks for your assistance...much appreciated.
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/3/23 6:02 PM, Paul wrote: On Sunday, 2 April 2023 at 15:32:05 UTC, Steven Schveighoffer wrote: It's important to note that parallel doesn't iterate the range in parallel, it just runs the body in parallel limited by your CPU count. **?!?** So for example, if you have: ```d foreach(i; iota(0, 2_000_000).parallel) { runExpensiveTask(i); } ``` The foreach is run on the main thread, gets a `0`, then hands off to a task thread `runExpensiveTask(0)`. Then it gets a `1`, and hands off to a task thread `runExpensiveTask(1)`, etc. The iteration is not expensive, and is not done in parallel. On the other hand, what you *shouldn't* do is: ```d foreach(i; iota(0, 2_000_000).map!(x => runExpensiveTask(x)).parallel) { } ``` as this will run the expensive task *before* running any tasks. If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually). **Ok I did have some debug writelns I commented out.** And did it help? Another thing that takes a global lock is memory allocation. Also make sure you have more than one logical CPU. **I have 8.** It's dependent on the work being done, but you should see a roughly 8x speedup as long as the overhead of distributing tasks is not significant compared to the work being done. -Steve
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Sunday, 2 April 2023 at 15:32:05 UTC, Steven Schveighoffer wrote: It's important to note that parallel doesn't iterate the range in parallel, it just runs the body in parallel limited by your CPU count. **?!?** If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually). **Ok I did have some debug writelns I commented out.** If you can disclose more about what you are trying to do, it would be helpful. **This seems like it would be a lot of code and explaining but let me think about how to summarize.** Also make sure you have more than one logical CPU. **I have 8.**
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/1/23 6:32 PM, Paul wrote: On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer wrote: On 4/1/23 2:25 PM, Paul wrote: ```d import std.range; foreach(i; iota(0, 2_000_000).parallel) ``` Is there a way to tell if the parallelism actually divided up the work? Both versions of my program run in the same time ~6 secs. It's important to note that parallel doesn't iterate the range in parallel, it just runs the body in parallel limited by your CPU count. If your `foreach` body takes a global lock (like `writeln(i);`), then it's not going to run any faster (probably slower actually). If you can disclose more about what you are trying to do, it would be helpful. Also make sure you have more than one logical CPU. -Steve
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Sunday, 2 April 2023 at 04:34:40 UTC, Salih Dincer wrote: I haven't seen rsFirst256 until now... **Edit:** I saw, I saw :) I am struck with consternation! I've never seen these results before. Interesting, there is such a thing as parallel threading :) Here are my skipPoints: ```d enum BP : long { //f, a, r, b = 7, /* <- beginning f = 113, r = 62279227, b = 186837678, // f = 146, r = 249134971, b = 747404910, // f = 161, r = 498270808, b = 1494812421, // f = 178, r = 1993083484, b = 5979250449, // f = 210, r = 3986167363, b = 11958502086, //*/ s = 5 } /* PRINTS: eLab@pico:~/Projeler$ ./RownlandSequence_v2 122: ["124559610, 373678827"] 128: ["249120240, 747360717"] */ ``` It looks like there are 5 total skipPoints until 256 where it loops for a long time. (while passing 1's). SDB@79
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Saturday, 1 April 2023 at 22:48:46 UTC, Ali Çehreli wrote: On 4/1/23 15:30, Paul wrote: > Is there a way to verify that it split up the work in to tasks/threads > ...? It is hard to see the difference unless there is actual work in the loop that takes time. I always use the Rowland Sequence for such experiments. At least it's better than the Fibonacci Range: ```d struct RowlandSequence { import std.numeric : gcd; import std.format : format; import std.conv : text; long b, r, a = 3; enum empty = false; string[] front() { string result = format("%s, %s", b, r); return [text(a), result]; } void popFront() { long result = 1; while(result == 1) { result = gcd(r++, b); b += result; } a = result; } } enum BP { f = 1, b = 7, r = 2, a = 1, /* f = 109, b = 186837516, r = 62279173, //*/ s = 5 } void main() { RowlandSequence rs; long start, skip; with(BP) { rs = RowlandSequence(b, r); start = f; skip = s; } rs.popFront(); import std.stdio, std.parallelism; import std.range : take; auto rsFirst128 = rs.take(128); foreach(r; rsFirst128.parallel) { if(r[0].length > skip) { start.writeln(": ", r); } start++; } } /* PRINTS: 46: ["121403", "364209, 121404"] 48: ["242807", "728421, 242808"] 68: ["486041", "1458123, 486042"] 74: ["972533", "2917599, 972534"] 78: ["1945649", "5836947, 1945650"] 82: ["3891467", "11674401, 3891468"] 90: ["7783541", "23350623, 7783542"] 93: ["15567089", "46701267, 15567090"] 102: ["31139561", "93418683, 31139562"] 108: ["62279171", "186837513, 62279172"] */ ``` The operation is simple, again multiplication, addition, subtraction and module, i.e. So four operations but enough to overrun the CPU! I haven't seen rsFirst256 until now because I don't have a fast enough processor. Maybe you'll see it, but the first 108 is fast anyway. **PS:** Decrease value of the `skip` to see the entire sequence. In cases where your processor power is not enough, you can create skip points. Check out BP... SDB@79
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/1/23 15:30, Paul wrote: > Is there a way to verify that it split up the work in to tasks/threads > ...? It is hard to see the difference unless there is actual work in the loop that takes time. You can add a Thread.sleep call. (Commented-out in the following program.) Another option is to monitor a task manager like 'top' on unix based systems. It should multiple threads for the same program. However, I will do something unspeakably wrong and take advantage of undefined behavior below. :) Since iteration count is an even number, the 'sum' variable should come out as 0 in the end. With .parallel it doesn't because multiple threads are stepping on each other's toes (values): import std; void main() { long sum; foreach(i; iota(0, 2_000_000).parallel) { // import core.thread; // Thread.sleep(1.msecs); if (i % 2) { ++sum; } else { --sum; } } if (sum == 0) { writeln("We highly likely worked serially."); } else { writefln!"We highly likely worked in parallel because %s != 0."(sum); } } If you remove .parallel, 'sum' will always be 0. Ali
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer wrote: On 4/1/23 2:25 PM, Paul wrote: ```d import std.range; foreach(; iota(0, 2_000_000).parallel) ``` -Steve Is there a way to tell if the parallelism actually divided up the work? Both versions of my program run in the same time ~6 secs.
Re: foreach (i; taskPool.parallel(0..2_000_000)
```d import std.range; foreach(; iota(0, 2_000_000).parallel) ``` -Steve Is there a way to verify that it split up the work in to tasks/threads ...? The example you gave me works...compiles w/o errors but the execution time is the same as the non-parallel version. They both take about 6 secs to execute. totalCPUs tells me I have 8 CPUs available.
Re: foreach (i; taskPool.parallel(0..2_000_000)
Thanks Steve.
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/1/23 2:25 PM, Paul wrote: Thanks in advance for any assistance. As the subject line suggests can I do something like? : ```d foreach (i; taskPool.parallel(0..2_000_000)) ``` Obviously this exact syntax doesn't work but I think it expresses the gist of my challenge. ```d import std.range; foreach(; iota(0, 2_000_000).parallel) ``` -Steve
foreach (i; taskPool.parallel(0..2_000_000)
Thanks in advance for any assistance. As the subject line suggests can I do something like? : ```d foreach (i; taskPool.parallel(0..2_000_000)) ``` Obviously this exact syntax doesn't work but I think it expresses the gist of my challenge.