Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-28 Thread monkyyy via Digitalmars-d-announce

On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:


## The Next Meetings
We had our October monthly meeting one week after this meeting. 
The next quarterly should happen on January 5, 2024. We had no 
regular planning sessions in October, but two workgroup 
meetings took place regarding DMD-as-a-library. The monthly 
meeting summary is coming next, then I'll publish an update 
about the workgroup meetings.


https://monkyyyscience.substack.com/p/d-data-structures

please add this to the agenda


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-12 Thread Dukc via Digitalmars-d-announce

On Monday, 11 December 2023 at 19:55:38 UTC, Timon Gehr wrote:
There is the following trick. Not ideal since the length cannot 
be inferred, but this successfully injects alloca into the 
caller's scope.


Wow, what a great hack - I'd have never come up with that!


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-12 Thread Bastiaan Veelo via Digitalmars-d-announce

On Monday, 11 December 2023 at 19:55:38 UTC, Timon Gehr wrote:

... this successfully injects alloca into the caller's scope.

```d
import core.stdc.stdlib: alloca;
import std.range: ElementType;
import core.lifetime: moveEmplace;

struct VLA(T, alias len){
    T[] storage;
    this(R)(R initializer,
            return void[] storage = alloca(len*T.sizeof)[0 .. len*T.sizeof]){
        this.storage = cast(T[])storage;
        foreach(ref element; this.storage){
            assert(!initializer.empty);
            auto init = initializer.front;
            moveEmplace!T(init, element);
            initializer.popFront();
        }
    }
    ref T opIndex(size_t i) return { return storage[i]; }
    T[] opSlice() return { return storage; }
}

auto vla(alias len, R)(R initializer,
        void[] storage = alloca(len*ElementType!R.sizeof)[0 .. len*ElementType!R.sizeof]){
    return VLA!(ElementType!R, len)(initializer, storage);
}

void main(){
    import std.stdio, std.string, std.conv, std.range;
    int x = readln.strip.to!int;
    writeln(vla!x(2.repeat(x))[]);
}
```


You guys are great!


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-12 Thread Nicholas Wilson via Digitalmars-d-announce

On Monday, 11 December 2023 at 08:24:55 UTC, Bastiaan Veelo wrote:
On Sunday, 10 December 2023 at 22:59:06 UTC, Nicholas Wilson 
wrote:
Or you could grep the IR emitted with `--output-ll`, as noted by Johan in 
https://github.com/ldc-developers/ldc/issues/4265#issuecomment-1376424944, although this will be with that `workaroundIssue1356` applied.


Thanks for highlighting this, as I must have forgotten. I 
should be able to create a CI job that checks this as part of 
the release. This will give us the confidence that we need.


I should note that the regex will need some updating for the most 
recent LLVM versions, which have opaque pointers enabled:


`ptr byval\(%[a-zA-Z_][a-zA-Z0-9_\.]*\) align`
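For anyone wanting to script that check, here is a minimal D sketch (the scanning approach and the file handling are my assumptions, not part of LDC's tooling) that searches an IR dump produced with `--output-ll` for that pattern:

```d
// Hypothetical sketch: scan an LLVM IR dump (e.g. produced with
// `ldc2 -c --output-ll app.d`) for byval+align parameters, which is the
// pattern associated with the 32-bit Windows stack-corruption issue.
import std.regex : matchFirst, regex;
import std.stdio : File, writeln;

void main(string[] args)
{
    assert(args.length > 1, "usage: scanbyval <file.ll>");
    // Pattern for recent LLVM versions with opaque pointers enabled.
    auto pat = regex(`ptr byval\(%[a-zA-Z_][a-zA-Z0-9_\.]*\) align`);
    foreach (line; File(args[1]).byLine)
    {
        if (!matchFirst(line, pat).empty)
            writeln("suspect parameter: ", line);
    }
}
```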



Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-11 Thread Sergey via Digitalmars-d-announce
On Monday, 11 December 2023 at 22:04:34 UTC, Nicholas Wilson 
wrote:
And please do get in touch with Bruce Carneal if you want some 
tips and insight into the practical and applied side of 
dcompute (also with auto-vectorisation), as he has used it a lot 
more than I have.


dcompute needs some love: 
https://github.com/libmir/dcompute/pull/74



Cheers, I look forward to some large speed increase reports.


It will be amazing to see such reports.


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-11 Thread Nicholas Wilson via Digitalmars-d-announce

On Monday, 11 December 2023 at 08:24:55 UTC, Bastiaan Veelo wrote:
On Sunday, 10 December 2023 at 22:59:06 UTC, Nicholas Wilson 
wrote:
Always happy to help if you're interested in looking into 
using dcompute.


Thank you, I'll let you know!


And please do get in touch with Bruce Carneal if you want some 
tips and insight into the practical and applied side of dcompute 
(also with auto-vectorisation), as he has used it a lot more than 
I have.


Or you could grep the IR emitted with `--output-ll`, as noted by Johan in 
https://github.com/ldc-developers/ldc/issues/4265#issuecomment-1376424944, although this will be with that `workaroundIssue1356` applied.


Thanks for highlighting this, as I must have forgotten. I 
should be able to create a CI job that checks this as part of 
the release. This will give us the confidence that we need.


-- Bastiaan.


Cheers, I look forward to some large speed increase reports.



Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-11 Thread Timon Gehr via Digitalmars-d-announce

On 12/11/23 20:55, Timon Gehr wrote:


There is the following trick. Not ideal since the length cannot be 
inferred, but this successfully injects alloca into the caller's scope.




I see Nick already brought it up.



Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-11 Thread Timon Gehr via Digitalmars-d-announce

On 12/6/23 17:28, Mike Parker wrote:



One way to do that in D is to use `alloca`, but that's an issue because 
the memory it allocates has to be used in the same function that calls 
the `alloca`. So you can't, e.g., use `alloca` to alloc memory in a 
constructor, and that prevents using it in a custom array 
implementation. He couldn't think of a way to translate it.


There is the following trick. Not ideal since the length cannot be 
inferred, but this successfully injects alloca into the caller's scope.


```d
import core.stdc.stdlib: alloca;
import std.range: ElementType;
import core.lifetime: moveEmplace;

struct VLA(T, alias len){
    T[] storage;
    // The default argument is evaluated at the call site, so alloca
    // reserves the memory in the caller's stack frame, not in this one.
    this(R)(R initializer,
            return void[] storage = alloca(len*T.sizeof)[0 .. len*T.sizeof]){
        this.storage = cast(T[])storage;
        foreach(ref element; this.storage){
            assert(!initializer.empty);
            auto init = initializer.front;
            moveEmplace!T(init, element);
            initializer.popFront();
        }
    }
    ref T opIndex(size_t i) return { return storage[i]; }
    T[] opSlice() return { return storage; }
}

// Convenience wrapper: same trick, with the element type inferred from
// the initializer range.
auto vla(alias len, R)(R initializer,
        void[] storage = alloca(len*ElementType!R.sizeof)[0 .. len*ElementType!R.sizeof]){
    return VLA!(ElementType!R, len)(initializer, storage);
}

void main(){
    import std.stdio, std.string, std.conv, std.range;
    int x = readln.strip.to!int;
    writeln(vla!x(2.repeat(x))[]); // e.g. input "3" prints [2, 2, 2]
}
```



Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-11 Thread Guillaume Piolat via Digitalmars-d-announce

On Sunday, 10 December 2023 at 15:08:05 UTC, Bastiaan Veelo wrote:


We are looking forward to being able to safely use LDC, because 
tests show that it has the potential to at least double the 
performance.




Yes, and that's before you use its excellent SIMD capabilities :)




Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-11 Thread Bastiaan Veelo via Digitalmars-d-announce
On Sunday, 10 December 2023 at 22:59:06 UTC, Nicholas Wilson 
wrote:
Always happy to help if you're interested in looking into using 
dcompute.


Thank you, I'll let you know!

Or you could grep the IR emitted with `--output-ll`, as noted by Johan in 
https://github.com/ldc-developers/ldc/issues/4265#issuecomment-1376424944, although this will be with that `workaroundIssue1356` applied.


Thanks for highlighting this, as I must have forgotten. I should 
be able to create a CI job that checks this as part of the 
release. This will give us the confidence that we need.


-- Bastiaan.


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Bastiaan Veelo via Digitalmars-d-announce

On Sunday, 10 December 2023 at 18:16:05 UTC, Nick Treleaven wrote:
You can call `alloca` as a default argument to a function. The 
memory will be allocated on the caller's stack before calling 
the function:

https://github.com/ntrel/stuff/blob/master/util.d#L113C1-L131C2

I've just tested and it seems it works as a constructor default 
argument too.


Clever!


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Nicholas Wilson via Digitalmars-d-announce

On Sunday, 10 December 2023 at 16:08:45 UTC, Bastiaan Veelo wrote:
On Sunday, 10 December 2023 at 15:31:55 UTC, Richard (Rikki) 
Andrew Cattermole wrote:


It will be interesting to hear how dcompute will fare in your 
situation; due to it being D code, it should be an incremental 
improvement once you're ready to move to D fully.


Yes, dcompute could mean another leap forward. There are so 
many great things to look forward to.


-- Bastiaan.


Always happy to help if you're interested in looking into using 
dcompute. I can't remember if we've talked about it before, but 
if you wanted to use it you'd need OpenCL 2.x (explicitly the 
2.x version series, or a 3.x implementation that supports SPIRV) 
running on that 20-logical-core box or, if it has GPUs attached 
to it, CUDA (any version should do) for NVidia GPUs or OpenCL 
2.x (as above) on any other GPUs.


With regard to the stack corruption, there is 
https://github.com/ldc-developers/ldc/blob/master/gen/abi/x86.cpp#L260, which has been there for some time. It would be fairly simple to issue a diagnostic there (although getting a source location from that point might be a bit tricky) whenever both `byval` and an alignment are specified.


Or you could grep the IR emitted with `--output-ll`, as noted by Johan in 
https://github.com/ldc-developers/ldc/issues/4265#issuecomment-1376424944, although this will be with that `workaroundIssue1356` applied.




Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Nick Treleaven via Digitalmars-d-announce

On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:
One way to do that in D is to use `alloca`, but that's an issue 
because the memory it allocates has to be used in the same 
function that calls the `alloca`. So you can't, e.g., use 
`alloca` to alloc memory in a constructor, and that prevents 
using it in a custom array implementation.


You can call `alloca` as a default argument to a function. The 
memory will be allocated on the caller's stack before calling the 
function:

https://github.com/ntrel/stuff/blob/master/util.d#L113C1-L131C2

I've just tested and it seems it works as a constructor default 
argument too.
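
A minimal sketch of that default-argument trick, with a fixed length for simplicity (an assumed form, not the code behind the link above):

```d
// Minimal sketch, assuming a fixed element count. The default argument
// is evaluated at the call site, so alloca reserves memory in the
// caller's stack frame rather than in scratch()'s own frame.
import core.stdc.stdlib : alloca;

enum N = 8;

double[] scratch(void[] buf = alloca(N * double.sizeof)[0 .. N * double.sizeof])
{
    auto s = cast(double[]) buf;
    s[] = 0.0;  // the slice is backed by the caller's stack
    return s;   // valid only while the caller's frame is alive
}

void main()
{
    import std.stdio : writeln;
    auto s = scratch();
    s[0] = 3.14;
    writeln(s.length, " ", s[0]);  // 8 3.14
}
```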


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Bastiaan Veelo via Digitalmars-d-announce
On Sunday, 10 December 2023 at 17:11:04 UTC, Siarhei Siamashka 
wrote:
On Sunday, 10 December 2023 at 15:08:05 UTC, Bastiaan Veelo 
wrote:
The compiler can check if `scope` delegates escape a function, 
but it only does this in `@safe` code --- and our code is far 
from being `@safe`. So it was a bit of a puzzle to find out 
which arguments needed to be `scope` and which arguments 
couldn't be `scope`.


This reminded me of 
https://forum.dlang.org/thread/myiqlzkghnnyykbyk...@forum.dlang.org
LDC has a special GC2Stack IR optimization pass, which is a 
lifesaver in many cases like this.


Interesting.

Are there any known blocker bugs that prevent safe usage of 
LDC in production?


This one: https://github.com/ldc-developers/ldc/issues/4265

Mike has summarized it:
LDC unfortunately had an issue that caused stack corruption on 
32-bit Windows. They'd hit it in one case and were able to work 
around it, but he couldn't be sure they wouldn't hit it 
somewhere else. He wasn't willing to risk unreliable 
computations.


He said that LDC could do the right thing, but his 
understanding from talking to Martin was that implementing it 
would have a large time cost. Since Win32 is going to 
eventually go away, he wasn't very keen on paying that cost. 
They'd spoken at DConf about the possibility of LDC raising 
compilation errors when stack corruption could occur so that 
they could then work around those cases, but he hadn't followed 
up with Martin about it.


-- Bastiaan.


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Siarhei Siamashka via Digitalmars-d-announce

On Sunday, 10 December 2023 at 15:08:05 UTC, Bastiaan Veelo wrote:
1) Missing `scope` storage class specifiers on `delegate` 
function arguments. This can be chalked up to a beginner 
error, but also one that is easy to miss. If you didn't know: 
without `scope` the compiler cannot be sure that the delegate 
is not stored in some variable that has a longer lifetime than 
the stack frame of the (nested) function pointed to by the 
delegate. Therefore, a dynamic closure is created, which means 
that the stack is copied to new GC-allocated memory. In the 
majority of our cases, delegate arguments are simple callbacks 
that are only stored on the stack, but a select number of 
delegates in the GUI are stored for longer. The compiler can 
check if `scope` delegates escape a function, but it only does 
this in `@safe` code --- and our code is far from being 
`@safe`. So it was a bit of a puzzle to find out which 
arguments needed to be `scope` and which arguments couldn't be 
`scope`.


This reminded me of 
https://forum.dlang.org/thread/myiqlzkghnnyykbyk...@forum.dlang.org
LDC has a special GC2Stack IR optimization pass, which is a 
lifesaver in many cases like this.


So now all cores are finally under full load, which is a 
magnificent sight! The speed of a DMD `release-nobounds` build 
is on par with our Pascal version, if not slightly faster. We 
are looking forward to being able to safely use LDC, because 
tests show that it has the potential to at least double the 
performance.


Are there any known blocker bugs that prevent safe usage of LDC 
in production?


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Bastiaan Veelo via Digitalmars-d-announce
On Sunday, 10 December 2023 at 15:31:55 UTC, Richard (Rikki) 
Andrew Cattermole wrote:


It will be interesting to hear how dcompute will fare in your 
situation; due to it being D code, it should be an incremental 
improvement once you're ready to move to D fully.


Yes, dcompute could mean another leap forward. There are so many 
great things to look forward to.


-- Bastiaan.


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-announce

That is awesome to hear!

If the move towards LDC has the potential to halve your run time, that is 
quite a significant improvement for your customers.


It will be interesting to hear how dcompute will fare in your situation; 
due to it being D code, it should be an incremental improvement once 
you're ready to move to D fully.


Based upon the estimates here already, it seems like acquiring an LDC 
developer in-house might be well worth it.


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-10 Thread Bastiaan Veelo via Digitalmars-d-announce

On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:

Bastiaan reported that SARC had been testing their D codebase 
(transpiled from Pascal---[see Bastiaan's DConf 2019 
talk](https://youtu.be/HvunD0ZJqiA)). They'd found the 
multithreaded performance worse than the Pascal version. He 
said that execution time increased with more threads and that 
it didn't matter how many threads you throw at it. It's the 
latter problem he was focused on at the moment.


I have an update on this issue. But first let me clarify how 
grave this situation is (was!) for us. There are certain tasks 
that we, and our customers, need to perform that involve a 
20-logical-core computer crunching numbers for a week. This is 
painful, but it also means that a doubling of that time is 
completely unacceptable, let alone a 20-fold increase. It is the 
difference between being in business and being out of business.


Aside from the allocation issue, there are several other 
properties that our array implementation needs to replicate from 
Extended Pascal: being able to have non-zero starting indices, 
having value semantics, having array limits that can be either 
compile-time or run-time, and function arguments that must work 
on arrays with any limits, also for multi-dimensional arrays. So 
while trying to solve one aspect, care had to be taken not to 
break any of the other aspects.


It turned out that the thread contention had more than one 
cause, which made this an extra frustrating problem: just as we 
thought we had found the culprit, fixing it did not have the 
effect that we expected.


These were the three major reasons we were seeing large thread 
contention, in no particular order:


1) Missing `scope` storage class specifiers on `delegate` 
function arguments (a minimal sketch follows after this list). 
This can be chalked up to a beginner error, but also one that is 
easy to miss. If you didn't know: without `scope` the compiler 
cannot be sure that the delegate is not stored in some variable 
that has a longer lifetime than the stack frame of the (nested) 
function pointed to by the delegate. Therefore, a dynamic 
closure is created, which means that the stack is copied to new 
GC-allocated memory. In the majority of our cases, delegate 
arguments are simple callbacks that are only stored on the 
stack, but a select number of delegates in the GUI are stored 
for longer. The compiler can check if `scope` delegates escape a 
function, but it only does this in `@safe` code --- and our code 
is far from being `@safe`. So it was a bit of a puzzle to find 
out which arguments needed to be `scope` and which arguments 
couldn't be `scope`.
2) Allocating heap memory in the array implementation, as 
discussed in the meeting. We followed Walter's advice and now use 
`alloca`. Not directly, but through string mixins and static 
member functions that generate the appropriate code.
3) Stale calls to `GC.addRange` and `GC.removeRange`. These were 
left over from an experiment where we tried to circumvent the 
garbage collector. Without knowing these were still in there, we 
were puzzled because we even saw contention in code that was 
marked `@nogc`. It makes sense now, because even though 
`addRange` doesn't allocate, it does need the global GC lock to 
register the range safely. Because the stack is already scanned 
by default, these calls were now superfluous and could be removed.
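
For readers unfamiliar with point 1, here is a minimal sketch (not SARC's code) of what the `scope` storage class changes for a delegate parameter:

```d
import std.stdio : writeln;

// Without `scope`, the compiler must assume the delegate may outlive the
// call, so a delegate literal that captures locals gets a GC-allocated
// closure at the call site.
void eachNonScope(void delegate(int) cb)
{
    foreach (i; 0 .. 3) cb(i);
}

// With `scope`, the parameter promises the delegate is not stored beyond
// the call, so the captured context can stay on the caller's stack.
void eachScope(scope void delegate(int) cb)
{
    foreach (i; 0 .. 3) cb(i);
}

void main()
{
    int sum;
    eachNonScope((int i) { sum += i; });  // may allocate a closure for `sum`
    eachScope((int i) { sum += i; });     // no closure needed
    writeln(sum);  // 6
}
```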


So now all cores are finally under full load, which is a 
magnificent sight! The speed of a DMD `release-nobounds` build is 
on par with our Pascal version, if not slightly faster. We are 
looking forward to being able to safely use LDC, because tests 
show that it has the potential to at least double the performance.


A big sigh of relief from us, as we have (hopefully!) cleared the 
biggest hurdle on our way to full adoption of D.


-- Bastiaan.


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-06 Thread Sergey via Digitalmars-d-announce

On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:

### Bastiaan
They'd found the multithreaded performance worse than the 
Pascal version. He said that execution time increased with more 
threads and that it didn't matter how many threads you throw at 
it. It's the latter problem he was focused on at the moment.


At first, they'd suspected the GC, but it turned out to be 
contention resulting from heap allocation. In Pascal, they'd 
heavily used variable-length arrays. For those, the length is 
determined at run time, but it's fixed. Since they can't grow, 
they're put on the stack. This makes them quite fast and avoids 
the global lock of the heap.


I kindly invite Bastiaan and his team to participate in this 
competition :) https://github.com/jinyus/related_post_gen


Fixed-size arrays will suit the task perfectly, and it also has 
a multithreading comparison! Pascal should do well over there.


Re: D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-06 Thread ryuukk_ via Digitalmars-d-announce
This needs to be taken out of DRuntime because DRuntime is 
distributed pre-compiled, and that ties it to a specific 
compiler API, which isn't good. Instead, we should distribute 
it as a package. It's something he'd brought up before.


Why not directly distribute DRuntime as source? Or rather, 
simplify how it can be used as source.


`dmd -i` does the magic already; it's able to pick up whatever 
modules it needs on the fly.


That's how I use my custom runtime: as source. It makes things 
much smoother to use. However, in the case of DRuntime, it might 
highlight some compilation speed issues.


What was the rationale behind distributing the runtime as a 
compiled library?


D Language Foundation October 2023 Quarterly Meeting Summary

2023-12-06 Thread Mike Parker via Digitalmars-d-announce
The D Language Foundation's quarterly meeting for October 2023 
took place on Friday the 6th at 15:00 UTC. This was quite a short 
one as far as quarterlies go, clocking in at around 35 minutes.


## The Attendees

The following people attended the meeting:

* Mathis Beer (Funkwerk)
* Walter Bright (DLF)
* Dennis Korpel (DLF)
* Mario Kröplin (Funkwerk)
* Mathias Lang (DLF/Symmetry)
* Átila Neves (DLF/Symmetry)
* Mike Parker (DLF)
* Igor Pikovets (Ahrefs)
* Carsten Rasmussen (Decard)
* Robert Schadek (DLF/Symmetry)
* Bastiaan Veelo (SARC)

## The Summary

### Bastiaan
Bastiaan reported that SARC had been testing their D codebase 
(transpiled from Pascal---[see Bastiaan's DConf 2019 
talk](https://youtu.be/HvunD0ZJqiA)). They'd found the 
multithreaded performance worse than the Pascal version. He said 
that execution time increased with more threads and that it 
didn't matter how many threads you throw at it. It's the latter 
problem he was focused on at the moment.


At first, they'd suspected the GC, but it turned out to be 
contention resulting from heap allocation. In Pascal, they'd 
heavily used variable-length arrays. For those, the length is 
determined at run time, but it's fixed. Since they can't grow, 
they're put on the stack. This makes them quite fast and avoids 
the global lock of the heap.


One way to do that in D is to use `alloca`, but that's an issue 
because the memory it allocates has to be used in the same 
function that calls the `alloca`. So you can't, e.g., use 
`alloca` to alloc memory in a constructor, and that prevents 
using it in a custom array implementation. He couldn't think of a 
way to translate it. He was able to work around it by using 
allocators in the array implementation with a thread-local free 
list. He found that promising. His current problem was that it 
took a lot of time to understand the experimental allocators 
package. Once he got this sorted, he would have to see if it 
helped solve the problem they were seeing with more threads 
resulting in worse performance.
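
To make the limitation concrete, here is a small hypothetical illustration (not code from the meeting) of why a constructor cannot simply call `alloca` itself:

```d
// Hypothetical illustration only: alloca memory lives in the frame of
// the function that calls it, so it cannot back a struct built in a
// constructor.
import core.stdc.stdlib : alloca;

struct Buf
{
    int[] data;

    this(size_t n)
    {
        // This stack block belongs to the constructor's own frame...
        data = (cast(int*) alloca(n * int.sizeof))[0 .. n];
    }   // ...and is released here, leaving `data` dangling for the caller.
}

void main()
{
    auto b = Buf(4);
    // Touching b.data here would be undefined behaviour, which is why the
    // workarounds elsewhere in this thread evaluate alloca as a default
    // argument in the *caller's* frame instead.
}
```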


There was also a problem with DMD underperforming Pascal. DMD's 
output was about five times slower than Pascal's. His tests with 
LDC showed it was two times faster than Pascal. Unfortunately, 
they are currently limited to 32-bit Windows, and it will be a 
few years before they can migrate to 64-bit. LDC unfortunately 
had an issue that [caused stack corruption on 32-bit 
Windows](https://github.com/ldc-developers/ldc/issues/4265). 
They'd hit it in one case and were able to work around it, but he 
couldn't be sure they wouldn't hit it somewhere else. He wasn't 
willing to risk unreliable computations.


He said that LDC could do the right thing, but his understanding 
from talking to Martin was that implementing it would have a 
large time cost. Since Win32 is going to eventually go away, he 
wasn't very keen on paying that cost. They'd spoken at DConf 
about the possibility of LDC raising compilation errors when 
stack corruption could occur so that they could then work around 
those cases, but he hadn't followed up with Martin about it.


They'd spent seven years getting the transcompilation complete, 
so this was a critical issue they needed to resolve. He was 
hopeful that the experimental allocator package would help solve 
it.


Robert asked if he'd looked into doing something like the small 
string optimization, where you set a default size that you use 
for static arrays and then only resort to heap allocation when 
you need something larger. Had they analyzed their code to 
determine the array sizes they were using? Bastiaan said yes, a 
consequence of this issue was that they were linking with a 
rather large stack size.


Walter suggested he just use `alloca`. Just have the 
transcompiler emit calls to `alloca` in the first lines of the 
function body for any VLAs and they should be okay. Bastiaan said 
they'd thought of allocating large chunks of memory up front and 
just picking off pieces of that in a custom allocator. That works 
very much like a free list; then he discovered the std allocator 
package has a free list. His experiments with that worked, but it 
had been challenging to implement it more generally. He said he 
would have to take another look at `alloca`.


Walter said `alloca` wasn't used very much in D, but it's there. 
If he were to implement C VLAs, that's what he'd use to do it. 
Robert stressed they should analyze their code to see what a 
magic maximum number of elements is and just use that for static 
arrays, allocating on the heap when they need more. Static arrays 
and `alloca` were comparable to some degree. Maybe they could get 
away with that. It should result in cleaner code.
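
A minimal sketch of the kind of small-buffer fallback Robert describes (an assumed design, not SARC's implementation; the capacity of 64 is an arbitrary placeholder):

```d
// Sketch of a small-buffer array: elements live in a fixed static array
// up to `smallCap`, and only larger sizes pay for a GC allocation.
struct SmallArray(T, size_t smallCap = 64)
{
    private T[smallCap] small;  // in-place storage for the common case
    private T[] heap;           // used only when the length exceeds smallCap
    private size_t len;

    this(size_t n)
    {
        len = n;
        if (n > smallCap)
            heap = new T[](n);  // the rare case pays for a heap allocation
    }

    ref T opIndex(size_t i)
    {
        assert(i < len);
        if (len <= smallCap)
            return small[i];
        return heap[i];
    }

    size_t length() const { return len; }
}

void main()
{
    auto a = SmallArray!int(10);    // fits in the static buffer
    a[3] = 42;
    assert(a[3] == 42);

    auto b = SmallArray!int(1000);  // exceeds smallCap, heap-backed
    b[999] = 7;
    assert(b.length == 1000);
}
```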


Robert also suggested that since this project has been going on 
for so long and was a good showcase for D in general, Bastiaan 
should come back and ask for help even on more than a quarterly 
basis. We then had a bit of discussion about what it would take 
to fix the LDC