Re: foreach (i; taskPool.parallel(0..2_000_000)
On Saturday, 1 April 2023 at 22:48:46 UTC, Ali Çehreli wrote:
> On 4/1/23 15:30, Paul wrote:
> > Is there a way to verify that it split up the work into tasks/threads ...?
>
> It is hard to see the difference unless there is actual work in the loop that takes time.

I always use the Rowland Sequence for such experiments. At least it's better than the Fibonacci Range:

```d
struct RowlandSequence {
  import std.numeric : gcd;
  import std.format : format;
  import std.conv : text;

  long b, r, a = 3;
  enum empty = false; // infinite range

  string[] front() {
    string result = format("%s, %s", b, r);
    return [text(a), result];
  }

  void popFront() {
    long result = 1;
    // advance past the steps where the gcd is 1
    while(result == 1) {
      result = gcd(r++, b);
      b += result;
    }
    a = result;
  }
}

// break points: starting values, plus the minimum digit count `s` to print
enum BP {
  f = 1, b = 7, r = 2, a = 1, /*
  f = 109, b = 186837516, r = 62279173, //*/
  s = 5
}

void main()
{
  RowlandSequence rs;
  long start, skip;

  with(BP) {
    rs = RowlandSequence(b, r);
    start = f;
    skip = s;
  }
  rs.popFront();

  import std.stdio, std.parallelism;
  import std.range : take;

  auto rsFirst128 = rs.take(128);
  foreach(r; rsFirst128.parallel)
  {
    if(r[0].length > skip)
    {
      start.writeln(": ", r);
    }
    start++; // note: `start` is shared by the worker threads without synchronization
  }
} /* PRINTS:

46: ["121403", "364209, 121404"]
48: ["242807", "728421, 242808"]
68: ["486041", "1458123, 486042"]
74: ["972533", "2917599, 972534"]
78: ["1945649", "5836947, 1945650"]
82: ["3891467", "11674401, 3891468"]
90: ["7783541", "23350623, 7783542"]
93: ["15567089", "46701267, 15567090"]
102: ["31139561", "93418683, 31139562"]
108: ["62279171", "186837513, 62279172"]
*/
```

The operation is simple: again just multiplication, addition, subtraction and modulo. So four operations, but enough to overrun the CPU! I haven't seen rsFirst256 until now because I don't have a fast enough processor. Maybe you'll see it, but the first 108 is fast anyway.

**PS:** Decrease the value of `skip` to see the entire sequence. In cases where your processor power is not enough, you can create skip points. Check out BP...

SDB@79
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/1/23 15:30, Paul wrote:
> Is there a way to verify that it split up the work into tasks/threads ...?

It is hard to see the difference unless there is actual work in the loop that takes time. You can add a Thread.sleep call. (Commented out in the following program.)

Another option is to monitor a task manager like 'top' on Unix-based systems. It should show multiple threads for the same program.

However, I will do something unspeakably wrong and take advantage of undefined behavior below. :) Since the iteration count is an even number, the 'sum' variable should come out as 0 in the end. With .parallel it doesn't, because multiple threads are stepping on each other's toes (values):

    import std;

    void main() {
        long sum;

        foreach(i; iota(0, 2_000_000).parallel) {
            // import core.thread;
            // Thread.sleep(1.msecs);

            if (i % 2) {
                ++sum;

            } else {
                --sum;
            }
        }

        if (sum == 0) {
            writeln("We highly likely worked serially.");

        } else {
            writefln!"We highly likely worked in parallel because %s != 0."(sum);
        }
    }

If you remove .parallel, 'sum' will always be 0.

Ali
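A way to check the split without relying on a data race is to count iterations per thread. The following is only a sketch, assuming the default `taskPool`; `countPerWorker` is a made-up helper name. It relies on `taskPool.workerIndex`, which is 0 for a thread outside the pool (here, the one that started the loop) and 1 through `taskPool.size` for the pool's workers:

```d
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writefln;

/// Counts how many iterations each thread executed.
/// Slot 0 belongs to the thread that started the loop; slots
/// 1 .. taskPool.size belong to the pool's worker threads.
/// Every thread writes only its own slot, so no locking is needed.
size_t[] countPerWorker(int n)
{
    auto counts = new size_t[](taskPool.size + 1);
    foreach (i; iota(0, n).parallel)
    {
        ++counts[taskPool.workerIndex];
    }
    return counts;
}

void main()
{
    auto counts = countPerWorker(2_000_000);
    foreach (idx, c; counts)
    {
        writefln!"thread slot %s handled %s iterations"(idx, c);
    }
}
```

If more than one slot ends up non-zero, the loop body really ran on several threads. The exact distribution varies from run to run.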
Re: foreach (i; taskPool.parallel(0..2_000_000)
On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer wrote:
> On 4/1/23 2:25 PM, Paul wrote:
>
> ```d
> import std.parallelism;
> import std.range;
>
> foreach (i; iota(0, 2_000_000).parallel)
> ```
> -Steve

Is there a way to tell if the parallelism actually divided up the work? Both versions of my program run in the same time, ~6 secs.
Re: foreach (i; taskPool.parallel(0..2_000_000)
> ```d
> import std.parallelism;
> import std.range;
>
> foreach (i; iota(0, 2_000_000).parallel)
> ```
> -Steve

Is there a way to verify that it split up the work into tasks/threads ...? The example you gave me works (it compiles without errors), but the execution time is the same as the non-parallel version. They both take about 6 secs to execute. totalCPUs tells me I have 8 CPUs available.
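The loop body is not shown in the post, so the cause of the identical timings can only be guessed at; one common reason is a body that is mostly serial (I/O, a shared lock) or too cheap per iteration. A rough sketch for comparing the two variants on a CPU-bound body follows. `work` and `fill` are hypothetical stand-ins, not anything from the original program:

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : sqrt;
import std.parallelism : parallel, totalCPUs;
import std.range : iota;
import std.stdio : writefln;

/// Hypothetical CPU-bound stand-in for the real loop body.
double work(int i)
{
    double x = i;
    foreach (_; 0 .. 200)
        x = sqrt(x + 1.0);
    return x;
}

/// Fills `results` serially or in parallel and returns the elapsed time.
/// Each iteration writes only its own element, so there is no data race.
Duration fill(double[] results, bool inParallel)
{
    auto sw = StopWatch(AutoStart.yes);
    if (inParallel)
    {
        foreach (i; iota(0, cast(int) results.length).parallel)
            results[i] = work(i);
    }
    else
    {
        foreach (i; 0 .. cast(int) results.length)
            results[i] = work(i);
    }
    return sw.peek;
}

void main()
{
    auto results = new double[](2_000_000);
    const serial = fill(results, false);
    const par = fill(results, true);
    writefln!"CPUs: %s, serial: %s, parallel: %s"(totalCPUs, serial, par);
}
```

If this sketch speeds up on the same machine while the real program does not, the real loop body is the place to look.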
Re: Why are globals set to tls by default? and why is fast code ugly by default?
On Saturday, 1 April 2023 at 15:02:12 UTC, Ali Çehreli wrote:
> Does anyone have documentation on why Rust and Zig do not do thread local by default? I wonder what experience it was based on.

I think it would be hard to get documentation on the rationale for that decision. Maybe you can get an answer in their forums, but I doubt it. For Rust, I think they based it on the idea that globals should have some kind of synchronization which is enforced at compile time. Therefore TLS becomes a second-class citizen.

> Speaking of experience, I used to be a C++ programmer. We made use of thread-local storage precisely zero times. I think it's because the luminaries of the time did not even talk about it.

Yes, that's "normal" programming, where you more or less never use TLS.

> With D, I take good advantage of thread-local storage. Interestingly, I do that *only* for fast code.
>
>     void foo(int arg) {
>         static int[] workArea;
>
>         if (workArea.length < neededFor(arg)) {
>             // increase length
>         }
>
>         // Use workArea
>     }
>
> Now I can use any number of threads using foo and they will have their independent work areas. Work area grows in amortized fashion for each thread. I find the code above to be clean and beautiful. It is very fast because there are no synchronization primitives needed because no work area is shared between threads.

There is nothing beautiful about it other than the clean syntax. Why not just use a stack variable, which is thread local as well? TLS is often allocated on the stack in many systems anyway. Accessing TLS variables can be slower compared to stack variables. The usefulness of TLS doesn't pay for its complexity.

> > It's common knowledge that accessing tls global is slow
> > http://david-grs.github.io/tls_performance_overhead_cost_linux/
>
> "TLS global is slow" would be misleading because even the article you linked explains right at the top, in the TL;DR, that "TLS may be slow".

This depends on how it is implemented. TLS is really a forest: it can be implemented in many ways, and it also depends on where it is being accessed from (shared libraries, the executable etc.). In general, TLS on x86 is accessed by fs:[-offset_to_variable]. This isn't that slow, but the complexity to get there is high. Keep in mind that the TLS area must be initialized for every thread creation, which isn't ideal. fs:[] isn't always possible, and then a function call is required, similar to a DLL symbol lookup.

TLS is a turd which shouldn't have been created. They should have stopped at key/value pairs, which languages could then build on if they wanted. Now TLS is in the executable standards and it is a mess. x86 now has two ways of doing TLS (normal and TLSDESC) just to make things even more complicated. A programmer never sees this mess, but as a systems programmer I see it and it is horrible.
Re: foreach (i; taskPool.parallel(0..2_000_000)
Thanks Steve.
Re: foreach (i; taskPool.parallel(0..2_000_000)
On 4/1/23 2:25 PM, Paul wrote:
> Thanks in advance for any assistance. As the subject line suggests can I do something like? :
>
> ```d
> foreach (i; taskPool.parallel(0..2_000_000))
> ```
>
> Obviously this exact syntax doesn't work but I think it expresses the gist of my challenge.

```d
import std.parallelism;
import std.range;

foreach (i; iota(0, 2_000_000).parallel)
```

-Steve
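For context, `0..2_000_000` is only valid directly inside `foreach` or a slice expression; it is not a value that can be passed to a function, which is why a range such as `iota` is needed. A small sketch of the two call forms follows; the function names `squaresDefault` and `squaresChunked` are invented for illustration, and the work unit size of 10_000 is an arbitrary choice:

```d
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writeln;

/// Free-function form: uses the global taskPool and its default work unit size.
long[] squaresDefault(int n)
{
    auto result = new long[](n);
    foreach (i; iota(n).parallel)
        result[i] = cast(long) i * i; // each iteration writes its own slot
    return result;
}

/// Method form: same pool, but with an explicit number of
/// iterations handed to a worker at a time.
long[] squaresChunked(int n, size_t workUnitSize)
{
    auto result = new long[](n);
    foreach (i; taskPool.parallel(iota(n), workUnitSize))
        result[i] = cast(long) i * i;
    return result;
}

void main()
{
    auto a = squaresDefault(2_000_000);
    auto b = squaresChunked(2_000_000, 10_000);
    writeln(a == b); // both forms compute the same values
}
```

A larger work unit size reduces scheduling overhead for cheap loop bodies; a smaller one balances the load better when iterations vary in cost.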
foreach (i; taskPool.parallel(0..2_000_000)
Thanks in advance for any assistance. As the subject line suggests, can I do something like this?

```d
foreach (i; taskPool.parallel(0..2_000_000))
```

Obviously this exact syntax doesn't work, but I think it expresses the gist of my challenge.
Re: Why are globals set to tls by default? and why is fast code ugly by default?
On 4/1/23 17:02, Ali Çehreli wrote:
> Does anyone have documentation on why Rust and Zig do not do thread local by default?

Rust just does not do mutable globals except in unsafe code.
Re: Is this code correct?
On Friday, 31 March 2023 at 13:11:58 UTC, z wrote:
> I've tried to search before but was only able to find articles for 3D triangles, and documentation for OpenGL, which I don't use.

The first function you posted takes a 3D triangle as input, so I assumed you're working in 3D. What are you working on?

> Determines if a triangle is visible.

You haven't defined what 'visible' means for a geometric triangle.
Re: Why are globals set to tls by default? and why is fast code ugly by default?
On 3/26/23 13:41, ryuukk_ wrote:
> C, C++, Rust, Zig, Go doesn't do TLS by default for example

C doesn't do it because there was no such concept when it was conceived. C++ doesn't because it was built on top of C. (D does because it has always been innovative.) Go doesn't because it had no innovations anyway.

Does anyone have documentation on why Rust and Zig do not do thread local by default? I wonder what experience it was based on.

Speaking of experience, I used to be a C++ programmer. We made use of thread-local storage precisely zero times. I think it's because the luminaries of the time did not even talk about it.

With D, I take good advantage of thread-local storage. Interestingly, I do that *only* for fast code.

    void foo(int arg) {
        static int[] workArea;

        if (workArea.length < neededFor(arg)) {
            // increase length
        }

        // Use workArea
    }

Now I can use any number of threads using foo and they will have their independent work areas. Work area grows in amortized fashion for each thread.

I find the code above to be clean and beautiful. It is very fast because there are no synchronization primitives needed because no work area is shared between threads.

Finding one example to the contrary does not make TLS a bad idea. Engineering is full of compromises. I agree with D's TLS-by-default idea.

Since I am here, I want to touch on something that may give the wrong idea to newer D programmers: D does not have globals. Every symbol belongs to a module.

And copying an earlier comment of yours:

> It's common knowledge that accessing tls global is slow
> http://david-grs.github.io/tls_performance_overhead_cost_linux/

"TLS global is slow" would be misleading because even the article you linked explains right at the top, in the TL;DR, that "TLS may be slow".

Ali
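To make the work-area idea concrete, here is a runnable sketch of it. The sizing helper is left undefined in the post, so the `neededFor` rule below (four slots per unit of `arg`, assuming `arg >= 0`) is invented purely for illustration, as is the return value of `foo`:

```d
import core.thread : Thread;
import std.stdio : writeln;

/// Hypothetical sizing rule; assumes arg >= 0.
size_t neededFor(int arg)
{
    return cast(size_t) arg * 4;
}

/// Uses a per-thread scratch buffer and returns its current length.
size_t foo(int arg)
{
    static int[] workArea; // thread-local by default: one array per thread

    const needed = neededFor(arg);
    if (workArea.length < needed)
        workArea.length = needed; // grows only when a bigger area is required

    workArea[0 .. needed] = arg; // use the work area
    return workArea.length;
}

void main()
{
    foo(10); // the main thread's work area grows to 40 elements

    size_t seenByWorker;
    auto t = new Thread({ seenByWorker = foo(1); });
    t.start();
    t.join();

    // The new thread started with its own, empty work area.
    writeln("main: ", foo(1), ", worker: ", seenByWorker); // main: 40, worker: 4
}
```

No locks or atomics appear anywhere, because the two threads never see each other's `workArea`.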
Re: Why are globals set to tls by default? and why is fast code ugly by default?
On Saturday, 1 April 2023 at 13:11:46 UTC, Guillaume Piolat wrote:
> TLS could be explicit and we wouldn't need a -vtls flag.

Yeah, I think what we should do is make each thing be explicitly marked. When I want TLS, I tend to comment that it was intentional anyway, to make it clear I didn't just forget to put a shared note on the static.
Re: Why are globals set to tls by default? and why is fast code ugly by default?
On Saturday, 1 April 2023 at 08:47:54 UTC, IGotD- wrote:
> TLS by default is a mistake in my opinion and it doesn't really help. TLS should be discouraged as much as possible as it is complicated and slows down thread creation.

It looks like a mistake if we consider that none of the D-inspired languages have stolen TLS-by-default.
Re: Why are globals set to tls by default? and why is fast code ugly by default?
On Friday, 31 March 2023 at 19:43:42 UTC, bachmeier wrote:
> Those of us that have been scarred by reading FORTRAN 77 code would disagree. I use global mutables myself (and even the occasional goto), but if anything, it should be `__GLOBAL_MUTABLE_VARIABLE` to increase the pain of using them.

But you kind of get into the same things with "accidental TLS". It doesn't race, but now the variable is different for every thread, which is a different kind of race.

TLS could be explicit and we wouldn't need a -vtls flag. There is no flag to warn for every use of @trusted, so in the grand scheme of things TLS is more dangerous than @trusted.
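The "accidental TLS" effect can be reproduced in a few lines. This is only an illustrative sketch; the variable names and the `readFromNewThread` helper are made up:

```d
import core.thread : Thread;
import std.stdio : writeln;

int perThread;            // module-level, but thread-local by default
__gshared int perProcess; // one instance for the whole process, no compiler checks

/// Reads both variables from a freshly started thread.
int[2] readFromNewThread()
{
    int[2] seen;
    auto t = new Thread({ seen = [perThread, perProcess]; });
    t.start();
    t.join();
    return seen;
}

void main()
{
    perThread = 42;
    perProcess = 42;

    // The new thread gets a fresh perThread (int.init),
    // while perProcess is the same memory for everyone.
    writeln(readFromNewThread()); // [0, 42]
}
```

Nothing races here, yet the value assigned in `main` silently never reaches the other thread, which is the surprise described above.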
Re: Why are globals set to tls by default? and why is fast code ugly by default?
On Sunday, 26 March 2023 at 18:25:54 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Having TLS by default is actually quite desirable if you like your code to be safe without having to do anything extra.
>
> As soon as you go into global to the process memory, you are responsible for synchronization. Ensuring that the state is what you want it to be.
>
> Keep in mind that threads didn't exist when C was created. They could not change their approach without breaking everyone's code. So what they do is totally irrelevant unless it's 1980.
>
> I think it's the correct way around. You can't accidentally cause memory safety issues. You must explicitly opt into the ability to mess up your program's state.

I think "safe" BS is going too far. Normally you don't use global variables at all, but if you do, the most usual approach is to use normal global variables with perhaps some kind of synchronization primitive. TLS is quite unusual, and having TLS by default might even introduce bugs, as the programmer believes that the value can be set by all threads while in fact each thread has its own independent copy.

Regardless, __gshared in front of the variable isn't a huge deal, but it shows that the memory model in D is a leaking bucket. Some compilers enforce synchronization primitives for global variables and are "safe" that way. However, sometimes you don't need them, like in small systems that only have one thread, and then they just get in the way.

TLS by default is a mistake in my opinion and it doesn't really help. TLS should be discouraged as much as possible as it is complicated and slows down thread creation.
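As a concrete instance of being "responsible for synchronization" once data is shared between threads, here is a sketch of the odd/even counter experiment from the parallel `foreach` thread, rewritten with a `shared` variable and atomic updates from `core.atomic`. The function name `balancedSum` is made up for illustration:

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

/// Adds 1 for every odd i and subtracts 1 for every even i in 0 .. n,
/// updating one counter from all worker threads atomically.
long balancedSum(int n)
{
    shared long sum; // visible to all threads; plain ++/-- would be rejected

    foreach (i; iota(0, n).parallel)
    {
        if (i % 2)
            atomicOp!"+="(sum, 1);
        else
            atomicOp!"-="(sum, 1);
    }
    return atomicLoad(sum);
}

void main()
{
    writeln(balancedSum(2_000_000)); // 0, regardless of how the work was split
}
```

The result is deterministic here, at the cost of every iteration contending for the same memory location; per-thread accumulation followed by a final merge would usually be faster.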