Re: vectorization of a simple loop -- not in DMD?

2022-07-11 Thread Bruce Carneal via Digitalmars-d-learn
On Monday, 11 July 2022 at 18:15:16 UTC, Ivan Kazmenko wrote: Hi. I'm looking at the compiler output of DMD (-O -release), LDC (-O -release), and GDC (-O3) for a simple array operation:
```
void add1 (int [] a)
{
    foreach (i; 0..a.length)
        a[i] += 1;
}
```
Here are the outputs:
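For readers comparing code generation for the loop quoted above, here is a minimal sketch of two equivalent ways to write it in D. The names `add1Loop` and `add1ArrayOp` are made up for illustration; whether either form is vectorized depends on the compiler and flags, and nothing here claims a particular result.

```d
/// The element-by-element loop from the quoted post.
void add1Loop(int[] a) pure nothrow @nogc @safe
{
    foreach (i; 0 .. a.length)
        a[i] += 1;
}

/// The same operation written as a D array operation.
/// (Hypothetical alternative, not taken from the thread.)
void add1ArrayOp(int[] a) pure nothrow @nogc @safe
{
    a[] += 1;
}
```

Both functions modify the slice in place, so compiling each one and diffing the assembly is a simple way to see what a given compiler does with them.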

Re: abs and minimum values

2021-10-29 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 29 October 2021 at 14:23:49 UTC, Kagamin wrote: Unsigned integers aren't numbers. assert(-abs(1)<0); Unsigneds approximate whole numbers of course (truncated on one side). Likewise signeds approximate integers (across a restricted interval). As always, we need to be careful with
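The subject line refers to the fact that a signed type's minimum value has no positive counterpart in the same type. A small sketch of that edge case follows; `absUnsigned` is a hypothetical helper written for this illustration, not something proposed in the thread.

```d
import std.math : abs;

/// Hypothetical helper: absolute value returned in the unsigned
/// counterpart type, where the magnitude of int.min is representable.
uint absUnsigned(int x) pure nothrow @nogc @safe
{
    // Unsigned arithmetic wraps modulo 2^32, so 0u - cast(uint) x
    // yields the magnitude of a negative x, including int.min.
    return x < 0 ? 0u - cast(uint) x : cast(uint) x;
}

/// Demonstrates the edge case: negating int.min wraps back to int.min.
bool absOfMinIsNegative() pure nothrow @nogc @safe
{
    return abs(int.min) < 0;
}
```

Returning the unsigned type trades one surprise for another (mixed signed/unsigned comparisons), which is the kind of care the post alludes to.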

invariants and compiler flags, best practice?

2021-08-06 Thread Bruce Carneal via Digitalmars-d-learn
I'm nervous enough about future compilations/builds of the code that I'm responsible for that I employ the following idiom quite a bit, mostly in @trusted code: (some boolean expression denoting invariants) || assert(0, "what went wrong"); How might the above cause problems and how do you
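A minimal sketch of the idiom described above, assuming a made-up function `firstElement`. The point of writing `cond || assert(0, ...)` rather than `assert(cond, ...)` is that `assert(0)` is not removed by `-release`, so the check survives in release builds (the message itself may not be printed there).

```d
/// Hypothetical example of the idiom in @trusted code.
int firstElement(const(int)[] a) @trusted
{
    // Invariant check that stays in place even when ordinary asserts
    // are compiled out.
    (a.length > 0) || assert(0, "firstElement: empty slice");

    // Unchecked access, justified by the invariant above.
    return a.ptr[0];
}
```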

Re: Can I get the time "Duration" in "nsecs" acurracy?

2021-07-10 Thread Bruce Carneal via Digitalmars-d-learn
On Saturday, 10 July 2021 at 01:11:28 UTC, Steven Schveighoffer wrote: You can get better than hnsecs resolution with `core.time.MonoTime`, which can support whatever the OS supports. However, `Duration` and `SysTime` are stored in hnsecs for a very specific reason -- range. Simply put,
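As a sketch of the `core.time.MonoTime` suggestion quoted above: the raw tick counts can be converted to nanoseconds with `convClockFreq`. The function name `elapsedNsecs` is an assumption for this example, and the actual resolution is whatever the OS clock provides.

```d
import core.time : MonoTime, convClockFreq;

/// Hypothetical helper: elapsed time between two MonoTime readings,
/// in nanoseconds, computed from raw ticks rather than from Duration
/// (which is stored in hnsecs).
long elapsedNsecs(MonoTime start, MonoTime end) pure nothrow @nogc @safe
{
    return convClockFreq(end.ticks - start.ticks,
                         MonoTime.ticksPerSecond,
                         1_000_000_000);
}
```

Typical use is to read `MonoTime.currTime` before and after the code being timed and pass both readings to the helper.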

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 7 March 2021 at 14:15:58 UTC, z wrote: On Thursday, 25 February 2021 at 14:28:40 UTC, Guillaume Piolat wrote: On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-02-25 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that "packed" versions of SIMD instructions will be used?(e.g. vmulps, vsqrtps, etc...) To give some context, this is a sample of one

Re: D meets GPU: recommendations?

2021-01-29 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 29 January 2021 at 20:01:17 UTC, Bruce Carneal wrote: On Friday, 29 January 2021 at 17:46:05 UTC, Guillaume Piolat wrote: On Friday, 29 January 2021 at 16:34:25 UTC, Bruce Carneal wrote: The project I've been working on for the last few months has a compute backend that is currently

Re: D meets GPU: recommendations?

2021-01-29 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 29 January 2021 at 17:46:05 UTC, Guillaume Piolat wrote: On Friday, 29 January 2021 at 16:34:25 UTC, Bruce Carneal wrote: The project I've been working on for the last few months has a compute backend that is currently written MT+SIMD. I would like to bring up a GPU variant. What

Re: D meets GPU: recommendations?

2021-01-29 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 29 January 2021 at 18:23:40 UTC, mw wrote: On Friday, 29 January 2021 at 16:34:25 UTC, Bruce Carneal wrote: Guidance from experience regarding any of the above, or other, GPU possibilities would be most welcome. https://dlang.org/blog/2017/10/30/d-compute-running-d-on-the-gpu/

D meets GPU: recommendations?

2021-01-29 Thread Bruce Carneal via Digitalmars-d-learn
The project I've been working on for the last few months has a compute backend that is currently written MT+SIMD. I would like to bring up a GPU variant. If you have experience with this sort of thing, I'd love to hear from you, either within this forum or at beerconf. In a past life I was

Re: low-latency GC

2020-12-06 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 6 December 2020 at 16:42:00 UTC, Ola Fosheim Grostad wrote: On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote: And while on the subject of low level programming in JVM or .NET. https://www.infoq.com/news/2020/12/net-5-runtime-improvements/ Didn't say anything about low

Re: low-latency GC

2020-12-06 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 6 December 2020 at 08:59:49 UTC, Ola Fosheim Grostad wrote: On Sunday, 6 December 2020 at 08:36:49 UTC, Bruce Carneal wrote: Yes, but they don't allow low level programming. Go also freeze to sync threads this has a rather profound impact on code generation. They have spent a lot of

Re: low-latency GC

2020-12-06 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 6 December 2020 at 08:12:58 UTC, Ola Fosheim Grostad wrote: On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote: GCs scan memory, sure. Lots of variations. Not germane. Not a rationale. We need to freeze the threads when collecting stacks/globals. OK. Low latency

Re: low-latency GC

2020-12-05 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 6 December 2020 at 06:52:41 UTC, Ola Fosheim Grostad wrote: On Sunday, 6 December 2020 at 05:41:05 UTC, Bruce Carneal wrote: OK. Some rationale? Do you, for example, believe that no-probable-dlanger could benefit from a low-latency GC? That it is too hard to implement? That the

Re: low-latency GC

2020-12-05 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad wrote: On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote: How difficult would it be to add a, selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have

low-latency GC

2020-12-05 Thread Bruce Carneal via Digitalmars-d-learn
How difficult would it be to add a, selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"? I've heard Walter mention performance issues (write barriers IIRC). I'm also interested in the GC-flavor

Re: is type checking in D undecidable?

2020-10-23 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 23 October 2020 at 16:56:46 UTC, Kagamin wrote: On Thursday, 22 October 2020 at 18:24:47 UTC, Bruce Carneal wrote: Per the wiki on termination analysis some languages with dependent types (Agda, Coq) have built-in termination checkers. What they do with code that does, say, a hash

Re: is type checking in D undecidable?

2020-10-22 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 23 October 2020 at 04:24:09 UTC, Paul Backus wrote: On Friday, 23 October 2020 at 00:53:19 UTC, Bruce Carneal wrote: When you write functions, the compiler helps you out with fully automated constraint checking. When you write templates you can write them so that they look like

Re: is type checking in D undecidable?

2020-10-22 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 22 October 2020 at 20:37:22 UTC, Paul Backus wrote: On Thursday, 22 October 2020 at 19:24:53 UTC, Bruce Carneal wrote: On a related topic, I believe that type functions enable a large amount of code in the "may be hard to prove decidable" category (templates) to be (re)written as

Re: is type checking in D undecidable?

2020-10-22 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 22 October 2020 at 18:46:07 UTC, Ola Fosheim Grøstad wrote: On Thursday, 22 October 2020 at 18:38:12 UTC, Stefan Koch wrote: On Thursday, 22 October 2020 at 18:33:52 UTC, Ola Fosheim Grøstad wrote: In general, it is hard to tell if a computation is long-running or unsolvable.

Re: is type checking in D undecidable?

2020-10-22 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 22 October 2020 at 18:04:32 UTC, Ola Fosheim Grøstad wrote: On Thursday, 22 October 2020 at 17:25:44 UTC, Bruce Carneal wrote: Is type checking in D undecidable? Per the wiki on dependent types it sure looks like it is. Even if it is, you can still write something that is

is type checking in D undecidable?

2020-10-22 Thread Bruce Carneal via Digitalmars-d-learn
Is type checking in D undecidable? Per the wiki on dependent types it sure looks like it is. I assume that it's well known to the compiler contributors that D type checking is undecidable which, among other reasons, is why we have things like template recursion limits. Confirmation of the
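To illustrate why the question arises, here is a small sketch (not from the thread) in which a type depends on the result of a recursive template computation. The template `Collatz` and the function `collatzBufferLength` are made up for this example; the compiler has to finish the recursion before it can even know the array's type.

```d
/// Number of Collatz steps needed to reach 1, computed by recursive
/// template instantiation at compile time.
template Collatz(ulong n)
{
    static if (n == 1)
        enum Collatz = 0;
    else static if (n % 2 == 0)
        enum Collatz = 1 + Collatz!(n / 2);
    else
        enum Collatz = 1 + Collatz!(3 * n + 1);
}

/// The static array's length, and hence its type, is the result of
/// the computation above.
size_t collatzBufferLength() pure nothrow @nogc @safe
{
    int[Collatz!6] buffer;
    return buffer.length;
}
```

For inputs where such a recursion runs very deep, the compiler's instantiation limits are what stop it, which is the practical safeguard the post refers to.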

Re: __vector(ubyte[32]) misalignment

2020-08-10 Thread Bruce Carneal via Digitalmars-d-learn
On Monday, 10 August 2020 at 13:52:46 UTC, Steven Schveighoffer wrote: On 8/9/20 8:46 AM, Steven Schveighoffer wrote: On 8/9/20 8:37 AM, Steven Schveighoffer wrote: I think this has come up before, there may even be a bug report on it. Found one, I'll see if I can fix the array runtime:

Re: __vector(ubyte[32]) misalignment

2020-08-10 Thread Bruce Carneal via Digitalmars-d-learn
On Monday, 10 August 2020 at 13:52:46 UTC, Steven Schveighoffer wrote: On 8/9/20 8:46 AM, Steven Schveighoffer wrote: On 8/9/20 8:37 AM, Steven Schveighoffer wrote: I think this has come up before, there may even be a bug report on it. Found one, I'll see if I can fix the array runtime:

Re: __vector(ubyte[32]) misalignment

2020-08-09 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 9 August 2020 at 12:37:06 UTC, Steven Schveighoffer wrote: On 8/9/20 8:09 AM, Bruce Carneal wrote: [...] All blocks in the GC that are more than 16 bytes are aligned by 32 bytes. You shouldn't have any 16 byte blocks here, because each element is 32 bytes long. However, if your

Re: __vector(ubyte[32]) misalignment

2020-08-09 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 9 August 2020 at 10:02:32 UTC, kinke wrote: On Sunday, 9 August 2020 at 01:03:51 UTC, Bruce Carneal wrote: Is sub .alignof alignment expected here? IOW, do I have to manually manage memory if I want alignments above 16? IIRC, yes when using the GC, as that only guarantees 16-bytes

Re: __vector(ubyte[32]) misalignment

2020-08-09 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 9 August 2020 at 09:58:18 UTC, Johan wrote: On Sunday, 9 August 2020 at 01:03:51 UTC, Bruce Carneal wrote: The .alignof attribute of __vector(ubyte[32]) is 32 but initializing an array of such vectors via an assignment to .length has given me 16 byte alignment (and subsequent seg

Re: __vector(ubyte[32]) misalignment

2020-08-09 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 9 August 2020 at 05:49:23 UTC, user1234 wrote: On Sunday, 9 August 2020 at 01:56:54 UTC, Bruce Carneal wrote: On Sunday, 9 August 2020 at 01:03:51 UTC, Bruce Carneal wrote: Manually managing the alignment eliminated the seg faulting. Additionally, I found that

Re: __vector(ubyte[32]) misalignment

2020-08-08 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 9 August 2020 at 01:03:51 UTC, Bruce Carneal wrote: The .alignof attribute of __vector(ubyte[32]) is 32 but initializing an array of such vectors via an assignment to .length has given me 16 byte alignment (and subsequent seg faults which I suspect are related). Is sub .alignof

__vector(ubyte[32]) misalignment

2020-08-08 Thread Bruce Carneal via Digitalmars-d-learn
The .alignof attribute of __vector(ubyte[32]) is 32 but initializing an array of such vectors via an assignment to .length has given me 16 byte alignment (and subsequent seg faults which I suspect are related). Is sub .alignof alignment expected here? IOW, do I have to manually manage
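One way to manage alignment by hand is to over-allocate and round the start address up. The sketch below assumes a hypothetical helper named `alignedArray`; it is not the code used in the thread, and it is only suitable for element types without pointers, because the backing store is allocated as `ubyte[]` and is therefore not scanned by the GC.

```d
/// Hypothetical helper: returns `n` zero-initialized elements of T whose
/// first element is aligned to `alignment` bytes (a power of two).
/// Only for pointer-free T, since the backing block is not GC-scanned.
T[] alignedArray(T)(size_t n, size_t alignment = T.alignof) @trusted
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0,
           "alignment must be a power of two");

    // Over-allocate so that an aligned start address always fits.
    auto raw = new ubyte[](n * T.sizeof + alignment - 1);

    // Distance from the raw start to the next aligned address.
    const addr = cast(size_t) raw.ptr;
    const offset = (alignment - (addr & (alignment - 1))) & (alignment - 1);

    // The interior pointer keeps the underlying GC block alive.
    return (cast(T*)(raw.ptr + offset))[0 .. n];
}
```

For the case in the post, a call such as `alignedArray!(__vector(ubyte[32]))(count, 32)` would be the intended use on targets that support that vector type. Appending to the returned slice may reallocate and lose the alignment.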

Re: safety and auto vectorization

2020-08-03 Thread Bruce Carneal via Digitalmars-d-learn
On Monday, 3 August 2020 at 18:55:36 UTC, Steven Schveighoffer wrote: On 8/2/20 1:31 PM, Bruce Carneal wrote: import std; void f0(int[] a, int[] b, int[] dst) @safe { dst[] = a[] + b[]; } [snip of auto-vectorization example] I was surprised that f0 ran just fine with a.length and

safety and auto vectorization

2020-08-02 Thread Bruce Carneal via Digitalmars-d-learn
import std; void f0(int[] a, int[] b, int[] dst) @safe { dst[] = a[] + b[]; } void f1(int[] a, int[] b, int[] dst) @trusted { const minLen = min(a.length, b.length, dst.length); dst[0..minLen] = a[0..minLen] + b[0..minLen]; assert(dst.length == minLen); } I was surprised that
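If the concern is mismatched lengths reaching an array operation, one option is to check them explicitly before the operation. The sketch below uses a made-up name, `addChecked`, and makes no claim about what checks the compiler or runtime perform on their own for `f0` or `f1`.

```d
/// Hypothetical variant of f0 that refuses mismatched lengths up front.
void addChecked(const(int)[] a, const(int)[] b, int[] dst) @safe
{
    // Kept even in -release builds because assert(0) is never removed.
    (a.length == dst.length && b.length == dst.length)
        || assert(0, "addChecked: length mismatch");

    dst[] = a[] + b[];
}
```

With the lengths known to be equal, the array operation itself is the only work left in the function body.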

Re: idiomatic output given -preview=nosharedaccess ,

2020-07-01 Thread Bruce Carneal via Digitalmars-d-learn
On Tuesday, 30 June 2020 at 20:43:00 UTC, Bruce Carneal wrote: On Tuesday, 30 June 2020 at 20:12:59 UTC, Stanislav Blinov wrote: On Tuesday, 30 June 2020 at 20:04:33 UTC, Steven Schveighoffer wrote: The answer is -- update Phobos so it works with -nosharedaccess :) Yeah... and dip1000. And

Re: idiomatic output given -preview=nosharedaccess ,

2020-06-30 Thread Bruce Carneal via Digitalmars-d-learn
On Tuesday, 30 June 2020 at 20:12:59 UTC, Stanislav Blinov wrote: On Tuesday, 30 June 2020 at 20:04:33 UTC, Steven Schveighoffer wrote: The answer is -- update Phobos so it works with -nosharedaccess :) Yeah... and dip1000. And dip1008. And dip... :) Didn't want to be snippity but, yeah,

idiomatic output given -preview=nosharedaccess ,

2020-06-30 Thread Bruce Carneal via Digitalmars-d-learn
Given -preview=nosharedaccess on the command line, "hello world" fails to compile (you are referred to core.atomic ...). What is the idiomatic way to get writeln style output from a nosharedaccess program? Is separate compilation the way to go?

Re: linker aliases to carry dlang attributes for externs

2020-04-12 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 12 April 2020 at 23:14:42 UTC, Bruce Carneal wrote: Could dlang compilers emit aliases for extern(C) and extern(C++) routines that would carry dlang specific information? (@safe, @nogc, nothrow, ...) I'm thinking two symbols. The first as per normal C/C++, and the second as per

linker aliases to carry dlang attributes for externs

2020-04-12 Thread Bruce Carneal via Digitalmars-d-learn
Could dlang compilers emit aliases for extern(C) and extern(C++) routines that would carry dlang specific information? (@safe, @nogc, nothrow, ...) I'm thinking two symbols. The first as per normal C/C++, and the second as per normal dlang with a "use API {C, C++, ...}" suffix.

Re: auto vectorization notes

2020-03-28 Thread Bruce Carneal via Digitalmars-d-learn
On Saturday, 28 March 2020 at 18:01:37 UTC, Crayo List wrote: On Saturday, 28 March 2020 at 06:56:14 UTC, Bruce Carneal wrote: On Saturday, 28 March 2020 at 05:21:14 UTC, Crayo List wrote: On Monday, 23 March 2020 at 18:52:16 UTC, Bruce Carneal wrote: [snip] Explicit SIMD code, ispc or other,

Re: auto vectorization notes

2020-03-28 Thread Bruce Carneal via Digitalmars-d-learn
On Saturday, 28 March 2020 at 05:21:14 UTC, Crayo List wrote: On Monday, 23 March 2020 at 18:52:16 UTC, Bruce Carneal wrote: [snip] (on the downside you have to guard against compiler code-gen performance regressions) auto vectorization is bad because you never know if your code will get

auto vectorization notes

2020-03-23 Thread Bruce Carneal via Digitalmars-d-learn
When speeds are equivalent, or very close, I usually prefer auto vectorized code to explicit SIMD/__vector code as it's easier to read. (on the downside you have to guard against compiler code-gen performance regressions) One oddity I've noticed is that I sometimes need to use

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-28 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 28 February 2020 at 10:11:23 UTC, Bruce Carneal wrote: On Friday, 28 February 2020 at 06:50:55 UTC, 9il wrote: On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote: So after reading the translation of RYU I was interested to see if the decimalLength() function can be

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-28 Thread Bruce Carneal via Digitalmars-d-learn
On Friday, 28 February 2020 at 06:50:55 UTC, 9il wrote: On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote: So after reading the translation of RYU I was interested to see if the decimalLength() function can be written to be faster, as it cascades up to 8 CMP. [...] bsr can
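For context, here is a sketch of the kind of comparison cascade the quoted post describes, next to one simple branch-free formulation that sums comparison results. Both function names are made up, the second is not the implementation discussed in the thread, and whether a compiler emits branch-free machine code for it is not guaranteed.

```d
/// Cascade of up to 8 comparisons, in the style of the RYU helper.
/// Intended for values below 10^9.
uint decimalLength9Cascade(const uint v) pure nothrow @nogc @safe
{
    if (v >= 100_000_000) return 9;
    if (v >= 10_000_000) return 8;
    if (v >= 1_000_000) return 7;
    if (v >= 100_000) return 6;
    if (v >= 10_000) return 5;
    if (v >= 1_000) return 4;
    if (v >= 100) return 3;
    if (v >= 10) return 2;
    return 1;
}

/// Hypothetical alternative: each comparison contributes 0 or 1,
/// so the digit count is one plus the number of thresholds reached.
uint decimalLength9Branchless(const uint v) pure nothrow @nogc @safe
{
    return 1 + (v >= 10) + (v >= 100) + (v >= 1_000) + (v >= 10_000)
             + (v >= 100_000) + (v >= 1_000_000) + (v >= 10_000_000)
             + (v >= 100_000_000);
}
```

The cascade's cost depends on how predictable the branches are for a given input stream, while the summed form does the same amount of work for every input.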

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-27 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 27 February 2020 at 19:46:23 UTC, Basile B. wrote: Yes please, post the benchmark method. You see the benchmarks I run with your version are always slowest. I'm aware that rndGen (and generally any uniform rnd func) is subject to a bias but I don't think this bias matters much in the

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-27 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 27 February 2020 at 17:11:48 UTC, Basile B. wrote: On Thursday, 27 February 2020 at 15:29:02 UTC, Bruce Carneal wrote: On Thursday, 27 February 2020 at 08:52:09 UTC, Basile B. wrote: I will post my code if there is any meaningful difference in your subsequent results. give me

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-27 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 27 February 2020 at 15:29:02 UTC, Bruce Carneal wrote: big snip TL;DR for the snipped: Unsurprisingly, different inputs will lead to different timing results. The equi-probable values supplied by a standard PRNG differ significantly from an equi-probable digit input. In
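The distinction drawn above can be made concrete. Uniformly random values below 10^9 are overwhelmingly 9 digits long (90% of them), whereas an "equi-probable digit" input picks the digit count uniformly first. The sketch below generates the latter; `equiDigitInputs` is a hypothetical name and this is not the benchmark code from the thread.

```d
import std.random : Random, uniform;

/// Hypothetical generator: `count` values below 10^9 whose decimal
/// lengths (1 through 9) are each equally likely.
uint[] equiDigitInputs(size_t count, ref Random rng) @safe
{
    static immutable uint[10] pow10 = [
        1, 10, 100, 1_000, 10_000, 100_000,
        1_000_000, 10_000_000, 100_000_000, 1_000_000_000
    ];

    auto result = new uint[](count);
    foreach (ref x; result)
    {
        // Pick the digit count first, uniformly from 1 to 9.
        const int digits = uniform(1, 10, rng);

        // Then pick a value with exactly that many digits.
        // Zero is counted as a one-digit value.
        uint lo = digits == 1 ? 0 : pow10[digits - 1];
        uint hi = pow10[digits];
        x = uniform(lo, hi, rng);
    }
    return result;
}
```

Seeding the generator explicitly, for example `auto rng = Random(42);`, keeps runs repeatable when comparing implementations on the same input.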

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-27 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 27 February 2020 at 08:52:09 UTC, Basile B. wrote: On Thursday, 27 February 2020 at 04:44:56 UTC, Basile B. wrote: On Thursday, 27 February 2020 at 03:58:15 UTC, Bruce Carneal wrote: Maybe you talked about another implementation of decimalLength9 ? Yes. It's one I wrote after

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 27 February 2020 at 03:58:15 UTC, Bruce Carneal wrote: On Wednesday, 26 February 2020 at 23:09:34 UTC, Basile B. wrote: On Wednesday, 26 February 2020 at 20:44:31 UTC, Bruce Carneal wrote: After shuffling the input, branchless wins by 2.4X (240%). snip Let me know if the

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn
On Wednesday, 26 February 2020 at 23:09:34 UTC, Basile B. wrote: On Wednesday, 26 February 2020 at 20:44:31 UTC, Bruce Carneal wrote: After shuffling the input, branchless wins by 2.4X (240%). I've replaced the input by the front of a rndGen (that pops for count times and starting with a

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn
On Wednesday, 26 February 2020 at 19:44:05 UTC, Bruce Carneal wrote: On Wednesday, 26 February 2020 at 13:50:11 UTC, Basile B. wrote: On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote: ... foreach (i; 0 .. count) sum += funcs[func](i); The input stream is highly

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn
On Wednesday, 26 February 2020 at 13:50:11 UTC, Basile B. wrote: On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote: ... foreach (i; 0 .. count) sum += funcs[func](i); The input stream is highly predictable and strongly skewed towards higher digits. The winning