Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Salih Dincer via Digitalmars-d-learn

On Saturday, 1 April 2023 at 22:48:46 UTC, Ali Çehreli wrote:

On 4/1/23 15:30, Paul wrote:

> Is there a way to verify that it split up the work into tasks/threads
> ...?

It is hard to see the difference unless there is actual work in 
the loop that takes time.


I always use the Rowland Sequence for such experiments.  At least 
it's better than the Fibonacci Range:


```d
struct RowlandSequence {
  import std.numeric : gcd;
  import std.format : format;
  import std.conv : text;

  long b, r, a = 3;
  enum empty = false;

  string[] front() {
    string result = format("%s, %s", b, r);
    return [text(a), result];
  }

  void popFront() {
    long result = 1;
    while(result == 1) {
      result = gcd(r++, b);
      b += result;
    }
    a = result;
  }
}

enum BP {
  f = 1, b = 7, r = 2, a = 1, /*
  f = 109, b = 186837516, r = 62279173, //*/
  s = 5
}

void main()
{
  RowlandSequence rs;
  long start, skip;

  with(BP) {
    rs = RowlandSequence(b, r);
    start = f;
    skip = s;
  }
  rs.popFront();

  import std.stdio, std.parallelism;
  import std.range : take;

  auto rsFirst128 = rs.take(128);
  foreach(r; rsFirst128.parallel)
  {
    if(r[0].length > skip)
    {
      start.writeln(": ", r);
    }
    // NOTE: `start` is shared by all worker threads and is
    // incremented without synchronization.
    start++;
  }
} /* PRINTS:

46: ["121403", "364209, 121404"]
48: ["242807", "728421, 242808"]
68: ["486041", "1458123, 486042"]
74: ["972533", "2917599, 972534"]
78: ["1945649", "5836947, 1945650"]
82: ["3891467", "11674401, 3891468"]
90: ["7783541", "23350623, 7783542"]
93: ["15567089", "46701267, 15567090"]
102: ["31139561", "93418683, 31139562"]
108: ["62279171", "186837513, 62279172"]

*/
```

The operation is simple: again multiplication, addition, 
subtraction and modulo, i.e. just four operations, but enough to 
saturate the CPU! I haven't seen rsFirst256 finish yet because I 
don't have a fast enough processor. Maybe you will, but the 
first 108 are fast anyway.


**PS:** Decrease the value of `skip` to see the entire sequence. 
If your processor is not powerful enough, you can create 
skip points.  Check out BP...
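
For readers who have not met it, here is a minimal serial sketch of the recurrence that `popFront` steps through, assuming the usual definition of Rowland's sequence: a(1) = 7 and a(n) = a(n-1) + gcd(n, a(n-1)). The helper name `rowlandSteps` is made up for this illustration and does not appear in the post.

```d
import std.numeric : gcd;
import std.stdio : writeln;

/// Returns the first `count` gcd steps that are not 1 in the recurrence
/// a(n) = a(n-1) + gcd(n, a(n-1)), starting from a(1) = 7.
/// (Hypothetical helper, not part of the original program.)
long[] rowlandSteps(size_t count)
{
    long[] steps;
    long a = 7;   // a(1)
    long n = 2;   // index of the next term to compute
    while (steps.length < count)
    {
        immutable g = gcd(n, a);
        a += g;
        ++n;
        if (g != 1)
            steps ~= g;   // keep only the non-trivial steps
    }
    return steps;
}

void main()
{
    writeln(rowlandSteps(6)); // prints [5, 3, 11, 3, 23, 3]
}
```

The long runs of gcd calls that return 1 between two kept values are what make the later terms expensive.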


SDB@79


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Ali Çehreli via Digitalmars-d-learn

On 4/1/23 15:30, Paul wrote:

> Is there a way to verify that it split up the work into tasks/threads
> ...?

It is hard to see the difference unless there is actual work in the loop 
that takes time. You can add a Thread.sleep call. (Commented-out in the 
following program.)


Another option is to monitor a task manager like 'top' on Unix-based 
systems. It should show multiple threads for the same program.


However, I will do something unspeakably wrong and take advantage of 
undefined behavior below. :) Since iteration count is an even number, 
the 'sum' variable should come out as 0 in the end. With .parallel it 
doesn't because multiple threads are stepping on each other's toes (values):


import std;

void main() {
    long sum;

    foreach(i; iota(0, 2_000_000).parallel) {
        // import core.thread;
        // Thread.sleep(1.msecs);

        if (i % 2) {
            ++sum;

        } else {
            --sum;
        }
    }

    if (sum == 0) {
        writeln("We highly likely worked serially.");

    } else {
        writefln!"We highly likely worked in parallel because %s != 0."(sum);
    }
}

If you remove .parallel, 'sum' will always be 0.
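
For contrast, a minimal sketch of the same loop without the data race, assuming `core.atomic` is acceptable for the counter. The function name `parallelSum` is invented for this example. With atomic updates the result is 0 even though the iterations are still spread over several threads, so this variant can no longer serve as a parallelism detector.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

/// Same +1/-1 loop as above, but every update is an atomic
/// read-modify-write, so no increments or decrements are lost.
long parallelSum()
{
    shared long sum;

    foreach (i; iota(0, 2_000_000).parallel)
    {
        if (i % 2)
            atomicOp!"+="(sum, 1);
        else
            atomicOp!"-="(sum, 1);
    }
    return atomicLoad(sum);
}

void main()
{
    writeln(parallelSum()); // prints 0
}
```

All threads contend on one variable here, so this is likely to be slower than the serial loop; it only illustrates correctness.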

Ali



Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn
On Saturday, 1 April 2023 at 18:30:32 UTC, Steven Schveighoffer 
wrote:

On 4/1/23 2:25 PM, Paul wrote:

```d
import std.parallelism;
import std.range;

foreach(i; iota(0, 2_000_000).parallel)
```

-Steve


Is there a way to tell if the parallelism actually divided up the 
work?  Both versions of my program run in the same time ~6 secs.
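
One way to check is to time a serial and a parallel run of the same loop. The following is a rough sketch, assuming a CPU-bound loop body; `work` is a made-up stand-in for the real per-iteration computation, which the post does not show.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : sqrt;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writefln;

/// Hypothetical CPU-bound stand-in for the real loop body.
double work(int i)
{
    double acc = 0;
    foreach (k; 1 .. 200)
        acc += sqrt(cast(double)(i + k));
    return acc;
}

void main()
{
    enum n = 2_000_000;
    auto results = new double[n];

    auto sw = StopWatch(AutoStart.yes);
    foreach (i; iota(0, n))
        results[i] = work(i);
    immutable serial = sw.peek;

    sw.reset();   // restarts the measurement; the stopwatch keeps running
    foreach (i; iota(0, n).parallel)
        results[i] = work(i);   // each iteration writes only its own slot
    immutable par = sw.peek;

    writefln!"serial: %s, parallel: %s"(serial, par);
}
```

If both timings come out about the same, the loop body is probably dominated by something that does not parallelize, such as I/O or a shared lock.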


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn

```d
import std.parallelism;
import std.range;

foreach(i; iota(0, 2_000_000).parallel)
```

-Steve


Is there a way to verify that it split up the work into 
tasks/threads ...?  The example you gave me works (it compiles w/o 
errors), but the execution time is the same as for the non-parallel 
version.  They both take about 6 secs to execute.  totalCPUs 
tells me I have 8 CPUs available.
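
A possible way to see the split directly is to count iterations per thread. This is a sketch that assumes the default `taskPool` and uses its `workerIndex` property, which is 0 for a thread outside the pool (here, the calling thread) and 1 through `taskPool.size` for the workers. The helper name `iterationsPerThread` is made up for the example.

```d
import std.algorithm : sum;
import std.parallelism : parallel, taskPool, totalCPUs;
import std.range : iota;
import std.stdio : writefln;

/// Counts how many iterations each thread executed.
/// Slot 0 is the calling thread; slots 1 .. taskPool.size are pool workers.
size_t[] iterationsPerThread(int n)
{
    auto counts = new size_t[taskPool.size + 1];
    foreach (i; iota(0, n).parallel)
        ++counts[taskPool.workerIndex];   // each thread touches only its own slot
    return counts;
}

void main()
{
    auto counts = iterationsPerThread(2_000_000);
    writefln!"totalCPUs = %s, pool size = %s"(totalCPUs, taskPool.size);
    foreach (idx, c; counts)
        writefln!"thread %s ran %s iterations"(idx, c);
    writefln!"total = %s"(counts.sum);
}
```

Several non-zero slots mean the work really was divided; the exact distribution varies from run to run.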




Re: Why are globals set to tls by default? and why is fast code ugly by default?

2023-04-01 Thread IGotD- via Digitalmars-d-learn

On Saturday, 1 April 2023 at 15:02:12 UTC, Ali Çehreli wrote:


Does anyone have documentation on why Rust and Zig do not do 
thread local by default? I wonder what experience it was based 
on.




I think it would be hard to get documentation on the rationale for 
that decision. Maybe you can get an answer in their forums but I 
doubt it. For Rust, I think they based it on the idea that globals 
should have some kind of synchronization which is enforced at 
compile time. Therefore TLS becomes a second-class citizen.


Speaking of experience, I used to be a C++ programmer. We made 
use of thread-local storage precisely zero times. I think it's 
because the luminaries of the time did not even talk about it.




Yes, that's "normal" programming, where you more or less never use 
TLS.


With D, I take good advantage of thread-local storage. 
Interestingly, I do that *only* for fast code.


void foo(int arg) {
    static int[] workArea;

    if (workArea.length < neededFor(arg)) {
        // increase length
    }

    // Use workArea
}

Now I can use any number of threads using foo and they will 
have their independent work areas. Work area grows in amortized 
fashion for each thread.


I find the code above to be clean and beautiful. It is very 
fast because there are no synchronization primitives needed 
because no work area is shared between threads.




There is nothing beautiful about it other than the clean syntax. 
Why not just use a stack variable, which is thread local as well? 
TLS is often allocated on the stack in many systems anyway. 
Accessing TLS variables can be slower compared to stack variables. 
The complexity of TLS doesn't pay for its usefulness.




> It's common knowledge that accessing tls global is slow
> http://david-grs.github.io/tls_performance_overhead_cost_linux/


"TLS global is slow" would be misleading because even the 
article you linked explains right at the top, in the TL;DR, 
that "TLS may be slow".


This depends on how it is implemented. TLS is really a forest and 
can be implemented in many ways, and it also depends on where it is 
being accessed (shared libraries, executable etc.). In general, 
TLS on x86 is accessed by fs:[-offset_to_variable]; this isn't 
that slow, but the complexity to get there is high. Keep in mind 
the TLS area must be initialized for every thread creation, which 
isn't ideal. fs:[] isn't always possible, and then a function call 
is required, similar to a DLL symbol lookup. TLS is a turd which 
shouldn't have been created. They should have stopped with 
key/value pairs, which languages then could build on if they 
wanted. Now TLS is in the executable standards and it is a mess. 
x86 now has two ways of doing TLS (normal and TLS_DESC) just to 
make things even more complicated. An application programmer never 
sees this mess, but as a systems programmer I see it and it is 
horrible.





Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn

Thanks Steve.




Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Steven Schveighoffer via Digitalmars-d-learn

On 4/1/23 2:25 PM, Paul wrote:

Thanks in advance for any assistance.

As the subject line suggests can I do something like? :
```d
foreach (i; taskPool.parallel(0..2_000_000))
```
Obviously this exact syntax doesn't work but I think it expresses the 
gist of my challenge.



```d
import std.parallelism;
import std.range;

foreach(i; iota(0, 2_000_000).parallel)
```

-Steve


foreach (i; taskPool.parallel(0..2_000_000)

2023-04-01 Thread Paul via Digitalmars-d-learn

Thanks in advance for any assistance.

As the subject line suggests can I do something like? :
```d
foreach (i; taskPool.parallel(0..2_000_000))
```
Obviously this exact syntax doesn't work but I think it expresses 
the gist of my challenge.


Re: Why are globals set to tls by default? and why is fast code ugly by default?

2023-04-01 Thread Timon Gehr via Digitalmars-d-learn

On 4/1/23 17:02, Ali Çehreli wrote:


Does anyone have documentation on why Rust and Zig do not do thread 
local by default?


Rust just does not do mutable globals except in unsafe code.


Re: Is this code correct?

2023-04-01 Thread Dennis via Digitalmars-d-learn

On Friday, 31 March 2023 at 13:11:58 UTC, z wrote:
I've tried to search before but was only able to find articles 
for 3D triangles, and documentation for OpenGL, which i don't 
use.


The first function you posted takes a 3D triangle as input, so I 
assumed you're working in 3D. What are you working on?



Determines if a triangle is visible.


You haven't defined what 'visible' means for a geometric triangle.



Re: Why are globals set to tls by default? and why is fast code ugly by default?

2023-04-01 Thread Ali Çehreli via Digitalmars-d-learn

On 3/26/23 13:41, ryuukk_ wrote:

> C, C++, Rust, Zig, Go doesn't do TLS by default for example

C doesn't do because there was no such concept when it was conceived.

C++ doesn't do because they built on top of C.

(D does because it has always been innovative.)

Go doesn't do because it had no innovations anyway.

Does anyone have documentation on why Rust and Zig do not do thread 
local by default? I wonder what experience it was based on.


Speaking of experience, I used to be a C++ programmer. We made use of 
thread-local storage precisely zero times. I think it's because the 
luminaries of the time did not even talk about it.


With D, I take good advantage of thread-local storage. Interestingly, I 
do that *only* for fast code.


void foo(int arg) {
    static int[] workArea;

    if (workArea.length < neededFor(arg)) {
        // increase length
    }

    // Use workArea
}

Now I can use any number of threads using foo and they will have their 
independent work areas. Work area grows in amortized fashion for each 
thread.


I find the code above to be clean and beautiful. It is very fast because 
there are no synchronization primitives needed because no work area is 
shared between threads.
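
A runnable sketch of that pattern, under stated assumptions: `neededFor` is not defined in the post, so the sizing rule below is a hypothetical stand-in, and `foo` returns the buffer length only so that the per-thread behavior can be observed.

```d
import core.thread : Thread;
import std.stdio : writeln;

/// Hypothetical sizing rule; the original post does not define it.
size_t neededFor(int arg)
{
    return cast(size_t) arg * 4;
}

/// Returns the length of the calling thread's work area after the call.
size_t foo(int arg)
{
    static int[] workArea;   // thread-local: one independent buffer per thread

    if (workArea.length < neededFor(arg))
        workArea.length = neededFor(arg);   // grows when needed, never shrinks

    workArea[] = arg;        // stand-in for "use workArea"
    return workArea.length;
}

void main()
{
    assert(foo(10) == 40);

    auto t = new Thread({
        // A new thread starts with its own, empty work area.
        assert(foo(1) == 4);
    });
    t.start();
    t.join();

    assert(foo(1) == 40);    // the main thread's buffer kept its size
    writeln("ok");
}
```

No lock or atomic operation is involved because no two threads ever see the same `workArea`.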


Finding one example to the contrary does not make TLS a bad idea. 
Engineering is full of compromises. I agree with D's TLS by-default idea.



Since I am here, I want to touch on something that may give the wrong 
idea to newer D programmers: D does not have globals. Every symbol 
belongs to a module.


And copying an earlier comment of yours:

> It's common knowledge that accessing tls global is slow
> http://david-grs.github.io/tls_performance_overhead_cost_linux/

"TLS global is slow" would be misleading because even the article you 
linked explains right at the top, in the TL;DR, that "TLS may be slow".


Ali



Re: Why are globals set to tls by default? and why is fast code ugly by default?

2023-04-01 Thread Adam D Ruppe via Digitalmars-d-learn

On Saturday, 1 April 2023 at 13:11:46 UTC, Guillaume Piolat wrote:

TLS could be explicit and we wouldn't need a -vtls flag.


Yeah, I think what we should do is make each thing be explicitly 
marked.


When I want tls, I tend to comment that it was intentional anyway 
to make it clear I didn't just forget to put a shared note on the 
static.


Re: Why are globals set to tls by default? and why is fast code ugly by default?

2023-04-01 Thread Guillaume Piolat via Digitalmars-d-learn

On Saturday, 1 April 2023 at 08:47:54 UTC, IGotD- wrote:


TLS by default is a mistake in my opinion and it doesn't really 
help. TLS should be discouraged as much as possible as it is 
complicated and slows down thread creation.


It looks like a mistake if we consider that none of the D-inspired 
languages have stolen TLS-by-default.




Re: Why are globals set to tls by default? and why is fast code ugly by default?

2023-04-01 Thread Guillaume Piolat via Digitalmars-d-learn

On Friday, 31 March 2023 at 19:43:42 UTC, bachmeier wrote:


Those of us that have been scarred by reading FORTRAN 77 code 
would disagree. I use global mutables myself (and even the 
occasional goto), but if anything, it should be 
`__GLOBAL_MUTABLE_VARIABLE` to increase the pain of using them.


But you kind of get into the same things with "accidental TLS". 
It doesn't race, but now the variable is different for every 
thread, which is a different kind of race.


TLS could be explicit and we wouldn't need a -vtls flag. There is 
no flag to warn for every use of @trusted, so in the grand scheme 
of things TLS is more dangerous than @trusted.
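
To make "accidental TLS" concrete, here is a small sketch (variable and function names are invented for the illustration). A plain module-level variable is thread-local, so a write from another thread is invisible to the caller, while a `__gshared` variable is a single per-process instance with no synchronization of its own.

```d
import core.thread : Thread;
import std.stdio : writeln;

int perThread;            // module-level: thread-local by default
__gshared int perProcess; // one instance for the whole process, unsynchronized

/// Writes both variables from a second thread, then reports what the
/// calling thread sees.
int[2] writeFromOtherThread()
{
    auto t = new Thread({
        perThread = 42;   // writes the new thread's own copy
        perProcess = 42;  // writes the single shared instance
    });
    t.start();
    t.join();   // the join orders the writes above before the reads below
    return [perThread, perProcess];
}

void main()
{
    writeln(writeFromOtherThread()); // prints [0, 42]
}
```

Compiling with dmd's `-vtls` switch lists variables like `perThread` that end up in thread-local storage.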


Re: Why are globals set to tls by default? and why is fast code ugly by default?

2023-04-01 Thread IGotD- via Digitalmars-d-learn
On Sunday, 26 March 2023 at 18:25:54 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
Having TLS by default is actually quite desirable if you like 
your code to be safe without having to do anything extra.


As soon as you go into global to the process memory, you are 
responsible for synchronization. Ensuring that the state is 
what you want it to be.


Keep in mind that threads didn't exist when C was created. They 
could not change their approach without breaking everyone's 
code. So what they do is totally irrelevant unless it's 1980.


I think it's the correct way around. You can't accidentally 
cause memory safety issues. You must explicitly opt into the 
ability to mess up your program's state.


I think "safe" BS is going too far. Normally you don't use global 
variables at all, but if you do, the most usual approach is to use 
normal global variables with perhaps some kind of synchronization 
primitive. TLS is quite unusual, and having TLS by default might 
even introduce bugs, as the programmer believes that the value can 
be set by all threads while the copies are in fact independent.


Regardless, __gshared in front of the variable isn't a huge deal, 
but it shows that the memory model in D is a leaky bucket. Some 
compilers enforce synchronization primitives for global variables 
and are "safe" that way. However, sometimes you don't need them, 
like in small systems that only have one thread, where they just 
get in the way.


TLS by default is a mistake in my opinion and it doesn't really 
help. TLS should be discouraged as much as possible as it is 
complicated and slows down thread creation.

