[Issue 15873] In order to implement std.simd, compile time info about CPU specifics is needed
https://issues.dlang.org/show_bug.cgi?id=15873

Walter Bright changed:
           What      |Removed |Added
           Status    |NEW     |RESOLVED
           Resolution|---     |INVALID

--- Comment #8 from Walter Bright ---
DMD predefines some version identifiers based on SIMD level:

    version (D_SIMD) - for the SSE2 instruction set
    version (D_AVX)  - for SSE2..AVX instruction sets
    version (D_AVX2) - for SSE2..AVX2 instruction sets

which should do the job.
--
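As a sketch of how these identifiers would be used to select code at compile time (the kernel functions here are hypothetical stand-ins; only the version identifiers themselves come from the comment above):

```d
// Compile-time dispatch on the SIMD level DMD targets.
// mulKernelAVX2 / mulKernelAVX / mulKernelSSE2 / mulKernelScalar
// are hypothetical helpers, one per instruction-set level.
void mulKernel(float[] a, float[] b)
{
    version (D_AVX2)
        mulKernelAVX2(a, b);   // SSE2..AVX2 may be emitted
    else version (D_AVX)
        mulKernelAVX(a, b);    // SSE2..AVX may be emitted
    else version (D_SIMD)
        mulKernelSSE2(a, b);   // SSE2 may be emitted
    else
        mulKernelScalar(a, b); // no SIMD guaranteed
}
```

Because `version` blocks are resolved at compile time, the selected branch compiles with zero runtime dispatch cost.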
https://issues.dlang.org/show_bug.cgi?id=15873

Marco Leise changed:
           What|Removed |Added
           CC  ||marco.le...@gmx.de

--- Comment #7 from Marco Leise ---
My concern is with "fast.json", where the call site reads

    auto json = parseJSON(...);

and I feel that

    import core.cpuid;
    if (sse42) handleJson!true();
    else       handleJson!false();

    void handleJson(bool sse42)()
    {
        auto json = parseJSON!sse42(...);
    }

is just not palatable. ('handleJson' is needed because the return value would be an RAII struct with compile-time specialization.) Importing core.cpuid, figuring out which flag to use and set as a template argument, and writing a switch-case or if-else is not economically reasonable, so to speak, when you could enable SSE4 globally and often implicitly (-march=native).

Also, in my case DMD won't profit, because its inline assembly doesn't inline (making it too slow), and GDC won't profit because it is not supported by core.cpuid, leaving only LDC - but that's another story.

My argument here is that the one writing SIMD code is not necessarily the one calling it. Compile-time information about the (implied) target lets us reduce the cognitive load for library users and still make use of the latest CPU features. This works to great benefit with intrinsics in other compilers (for popcnt, memcpy, etc.), but we can't imitate it. So we ended up with a runtime check against a global variable in popcnt for what should be a single instruction on recent CPUs, and an additional "SSE4 only" _popcnt in http://dlang.org/phobos/core_bitop.html#.popcnt
--
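Marco's objection is easier to see written out in full. A minimal compilable sketch of the pattern he criticizes, assuming a hypothetical `parseJSON`/`JsonParser` pair (only `core.cpuid.sse42` is a real druntime symbol):

```d
// Runtime dispatch into compile-time specialized code: the caller
// must branch on the CPU flag and duplicate the call site.
import core.cpuid : sse42;

struct JsonParser(bool useSse42)
{
    // hypothetical RAII parser, specialized at compile time
}

JsonParser!useSse42 parseJSON(bool useSse42)(string text)
{
    return JsonParser!useSse42();
}

void handleJson(bool useSse42)(string text)
{
    auto json = parseJSON!useSse42(text);
    // ... use json before it goes out of scope ...
}

void main()
{
    // The burden Marco objects to: every user of the library
    // writes this dispatch themselves.
    if (sse42) handleJson!true("{}");
    else       handleJson!false("{}");
}
```

With a compile-time target flag (as with -march=native in GCC/LDC), the `if`/`else` and the wrapper function would disappear and the call site would again read `auto json = parseJSON(...);`.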
https://issues.dlang.org/show_bug.cgi?id=15873

Manu changed:
           What|Removed |Added
           CC  ||turkey...@gmail.com

--- Comment #6 from Manu ---
DMD really needs some way to select the SIMD level to target from the command line. Runtime selection is appropriate at the outer loop, but it is not practical for small occurrences of SIMD littered around the code, or where the selection would have to be made in the inner loop.
--
https://issues.dlang.org/show_bug.cgi?id=15873

ponce changed:
           What|Removed |Added
           CC  ||alil...@gmail.com

--- Comment #5 from ponce ---
Could DMD also generate SSE code for 32-bit targets (easily)? SSE2 is very common. I see two main advantages:

- It avoids some divergence in results between 32-bit and 64-bit builds caused by the unexpectedly higher precision of x87 FPU operations. Using the FPU, you might think that floats are sufficient for a task when they aren't, because they were promoted to 80-bit floats internally.

- It avoids denormals, which are a recurring concern in audio code, though not that bad.

MSVC generates SSE2 in 32-bit by default, I think.
--
https://issues.dlang.org/show_bug.cgi?id=15873

--- Comment #4 from Walter Bright ---
https://github.com/D-Programming-Language/dlang.org/pull/1260
--
https://issues.dlang.org/show_bug.cgi?id=15873

--- Comment #3 from Walter Bright ---
DMD predefines "D_SIMD" for:

1. all 64-bit code generation
2. OSX 32-bit code generation

and does generate SIMD instructions for those platforms. DMD does not have compiler switches to select SIMD levels.
--
https://issues.dlang.org/show_bug.cgi?id=15873

--- Comment #2 from Walter Bright ---
newsgroup thread:
http://www.digitalmars.com/d/archives/digitalmars/D/Any_usable_SIMD_implementation_282806.html
github thread:
https://github.com/D-Programming-Language/phobos/pull/2862
--
https://issues.dlang.org/show_bug.cgi?id=15873

Walter Bright changed:
           What|Removed |Added
           CC  ||bugzi...@digitalmars.com

--- Comment #1 from Walter Bright ---
For DMD, the minimum SIMD level can be ascertained from:

1. the operating system - for example, OSX is only sold on certain CPUs and above, and Linux assumes SIMD in the default behavior of gcc;
2. whether 32-bit or 64-bit code is being generated.

The DMD compiler assumes the existence of that minimum SIMD level and generates SIMD code accordingly.

The SIMD capabilities can be tested at runtime: http://dlang.org/phobos/core_cpuid.html

This is used, for example, here:
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/arraydouble.d#L33

The idea is to use a template to statically generate code for each supported SIMD level, test the capabilities once at a high level, and select the right branch there. Each level's implementation then runs at full speed with custom code for that level.
--
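The template-per-level idea from Comment #1 can be sketched as follows. The `SimdLevel` enum, `sum` template, and `sumImpl` pointer are illustrative names, not druntime API; the level bodies are left as plain loops where a real implementation would use core.simd vectors (only the `core.cpuid` flags are real symbols):

```d
// One template body, instantiated once per supported SIMD level.
// The level is probed once at startup and stored in a function pointer,
// so every later call dispatches once and runs level-specific code.
import core.cpuid : avx2, sse2;

enum SimdLevel { scalar, sse2, avx2 }

void sum(SimdLevel level)(const float[] a, float[] dst)
{
    static if (level == SimdLevel.avx2)
    {
        // AVX2-specialized body would go here (core.simd / intrinsics)
        foreach (i, x; a) dst[i] += x;
    }
    else static if (level == SimdLevel.sse2)
    {
        // SSE2-specialized body would go here
        foreach (i, x; a) dst[i] += x;
    }
    else
    {
        foreach (i, x; a) dst[i] += x; // portable fallback
    }
}

// Selected at a high level, once, per Comment #1's suggestion.
void function(const float[], float[]) sumImpl;

shared static this()
{
    if (avx2)      sumImpl = &sum!(SimdLevel.avx2);
    else if (sse2) sumImpl = &sum!(SimdLevel.sse2);
    else           sumImpl = &sum!(SimdLevel.scalar);
}
```

This is the same pattern as the rt/arraydouble.d link above: the runtime check happens outside the hot loop, and each instantiation compiles to straight-line code for its level.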
https://issues.dlang.org/show_bug.cgi?id=15873

Jack Stouffer changed:
           What    |Removed |Added
           Keywords||CTFE, SIMD
--