[Issue 15873] In order to implement std.simd, compile time info about CPU specifics is needed

2020-12-21 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

Walter Bright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #8 from Walter Bright  ---
DMD predefines some version identifiers based on SIMD level:

version (D_SIMD) - for SSE2 instruction sets
version (D_AVX)  - for SSE2..AVX instruction sets
version (D_AVX2) - for SSE2..AVX2 instruction sets

which should do the job.
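
As a minimal sketch of how these identifiers might be consumed (the name simdLevel is made up for illustration, and the chain assumes a higher level may be defined alongside the lower ones, so the highest is tested first):

```d
// Hypothetical constant naming the highest SIMD level the build targets.
// The most capable level is tested first, because a build that defines
// D_AVX2 may also define D_AVX and D_SIMD.
version (D_AVX2)
    enum simdLevel = "AVX2";
else version (D_AVX)
    enum simdLevel = "AVX";
else version (D_SIMD)
    enum simdLevel = "SSE2";
else
    enum simdLevel = "none";

void main()
{
    import std.stdio : writeln;

    // The choice was made at compile time; nothing is checked at runtime.
    writeln("compiled for SIMD level: ", simdLevel);
}
```

The same chain can guard whole function bodies instead of a string constant, so a library can pick an implementation without the caller being involved.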

--



2016-04-11 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

Marco Leise  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marco.le...@gmx.de

--- Comment #7 from Marco Leise  ---
My concern is with "fast.json" where the call site reads

  auto json = parseJSON(...);

and I feel that

  import core.cpuid;
  if (sse42)
      handleJson!true();
  else
      handleJson!false();

  void handleJson(bool sse42)()
  {
      auto json = parseJSON!sse42(...);
  }

is just not palatable. ('handleJson' is needed because the return value would
be a RAII struct with compile-time specialization.) Importing core.cpuid,
figuring out which flag to use, passing it as a template argument, and writing
a switch-case or if-else is not economically reasonable, so to speak, when you
could enable SSE4 globally and often implicitly (-march=native). Also, in my
case DMD won't profit, because its inline assembly doesn't inline (making it
too slow), and GDC won't profit because it is not supported by core.cpuid,
leaving only LDC, but that's another story.

My argument here is that the one writing SIMD code is not necessarily the one
calling it. Compile-time information about the (implied) target enables us to
reduce the cognitive load for library users, and still make use of the latest
CPU features. This is working to great benefit with intrinsics in other
compilers (for popcnt, memcpy, etc.), but we can't imitate that. So we ended up
with runtime checks against a global variable in popcnt for what should be a
single instruction on recent CPUs and an additional "SSE4 only" _popcnt in
http://dlang.org/phobos/core_bitop.html#.popcnt
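
A rough sketch of the call site argued for here, with the feature choice hidden inside the library. Everything in it is hypothetical: JsonParser, targetHasSSE42 and this parseJSON are illustrative names, and it assumes that a build targeting AVX (version D_AVX) also implies SSE4.2.

```d
// Assumption: a build that targets AVX can also use SSE4.2.
version (D_AVX)
    enum bool targetHasSSE42 = true;
else
    enum bool targetHasSSE42 = false;

// Hypothetical parser type, specialized at compile time.
struct JsonParser(bool sse42)
{
    string text;

    // Reports which code path this instantiation was compiled for.
    string path() const
    {
        static if (sse42)
            return "SSE4.2 path";
        else
            return "scalar path";
    }
}

// Library entry point: the caller never names a CPU feature.
auto parseJSON(string text)
{
    return JsonParser!targetHasSSE42(text);
}

void main()
{
    import std.stdio : writeln;

    auto json = parseJSON(`{"a": 1}`);
    writeln(json.path);
}
```

The user writes only parseJSON(...); the specialization follows from the compiler's target settings rather than from a runtime branch at every call site.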

--



2016-04-06 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

Manu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |turkey...@gmail.com

--- Comment #6 from Manu  ---
DMD really needs some way to select the SIMD level to target from the command
line. Runtime selection is appropriate at the outer loop, but it is not
practical for small occurrences of SIMD littered around the code, or where the
selection would have to be made in the inner loop.

--



2016-04-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

ponce  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alil...@gmail.com

--- Comment #5 from ponce  ---
Could DMD also generate SSE code for 32-bit targets (easily)? SSE2 is very
common.

I see two main advantages:

- it can avoid some divergence in results between 32-bit and 64-bit builds
caused by the unexpectedly higher precision of FPU operations. Using the FPU
you might think that floats are sufficient for a task when they aren't, because
they were promoted to 80-bit floats internally.

- it avoids denormals, a recurring concern in audio code, though not a severe
one.

MSVC generates SSE2 in 32-bit by default I think.

--



2016-04-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

--- Comment #4 from Walter Bright  ---
https://github.com/D-Programming-Language/dlang.org/pull/1260

--



2016-04-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

--- Comment #3 from Walter Bright  ---
DMD predefines "D_SIMD" for:

1. all 64 bit code generation
2. OSX 32 bit code generation

and does generate SIMD instructions for those platforms. DMD does not have
compiler switches to select SIMD levels.

--



2016-04-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

--- Comment #2 from Walter Bright  ---
newsgroup thread:

http://www.digitalmars.com/d/archives/digitalmars/D/Any_usable_SIMD_implementation_282806.html

github thread:

https://github.com/D-Programming-Language/phobos/pull/2862

--



2016-04-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

Walter Bright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzi...@digitalmars.com

--- Comment #1 from Walter Bright  ---
For DMD, the minimum SIMD level can be ascertained by:

1. the operating system - for example, OSX is only sold on certain CPUs and
above. Also, Linux assumes SIMD in the default behavior of gcc.
2. 32 or 64 bit code being generated

The DMD compiler assumes the existence of that minimum SIMD level, and
generates SIMD code accordingly.


The SIMD capabilities can be tested at runtime:

  http://dlang.org/phobos/core_cpuid.html

This is used, for example, here:

 
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/arraydouble.d#L33

The idea is to use a template to statically generate code for each supported
SIMD level. Then, test the capabilities at a high level, and select the right
branch at the high level. Then each level's implementation runs at full speed
with custom code for that level.
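
The pattern described above might look roughly like the following. This is only a sketch: Level, sum and sumDispatch are made-up names, and every instantiation shares one scalar loop as a stand-in for the level-specific code a real implementation would contain. The capability tests are the avx2 and sse42 properties from core.cpuid, imported under different names to keep them apart from the enum members.

```d
import core.cpuid : hasAvx2 = avx2, hasSse42 = sse42;

// Hypothetical set of supported SIMD levels.
enum Level { baseline, sse42, avx2 }

// One instantiation per level. Real code would specialize the body with
// static if (level == ...); this placeholder uses a plain scalar loop.
float sum(Level level)(const(float)[] data)
{
    float total = 0;
    foreach (x; data)
        total += x;
    return total;
}

// The runtime test happens once, outside the hot loop.
float sumDispatch(const(float)[] data)
{
    if (hasAvx2)
        return sum!(Level.avx2)(data);
    if (hasSse42)
        return sum!(Level.sse42)(data);
    return sum!(Level.baseline)(data);
}

void main()
{
    import std.stdio : writeln;

    writeln(sumDispatch([1.0f, 2.0f, 3.0f]));
}
```

The cost of the check is paid once per call to sumDispatch, which is why this approach suits coarse-grained entry points rather than small SIMD helpers called from an inner loop.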

--



2016-04-04 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15873

Jack Stouffer  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |CTFE, SIMD

--