https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627

            Bug ID: 100627
           Summary: missing optimization
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Hello gcc team,
I think I reported something like this a long time ago, but I'm not sure. I think
the standard uint64_t -> float/double conversion is inefficient when AVX512 is
not available, at least on x86; with SVE or on other CPUs this may not be the
case. Problems:
- many conditional jumps are generated, so the code is not branch-free and
  not friendly to the branch predictor
- larger code size
I briefly implemented a few conversions for SSE/SSE2
(https://godbolt.org/z/n63WedKT9). Advantages:
- branch-free
- mostly smaller code size
- faster
Wouldn't it make sense to implement the standard conversion this way
(including for AVX/AVX2)?
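
To illustrate what a branch-free conversion can look like, here is a minimal
sketch of the well-known "magic constant" technique for uint64_t -> double.
It is not the code from the Godbolt link above; the function name is
hypothetical. It assumes IEEE 754 binary64 doubles, round-to-nearest, and
arithmetic evaluated in double precision (e.g. SSE2, no x87 excess precision,
no -ffast-math).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only. */
static double u64_to_double_branchfree(uint64_t x)
{
    /* Exponent pattern of 2^52 with the low 32 bits of x in the mantissa:
       the resulting double is exactly 2^52 + lo. */
    uint64_t lo_bits = (x & 0xFFFFFFFFULL) | 0x4330000000000000ULL;

    /* Exponent pattern of 2^84 with the high 32 bits of x in the mantissa:
       one mantissa step is 2^32 here, so the double is exactly
       2^84 + hi * 2^32. */
    uint64_t hi_bits = (x >> 32) | 0x4530000000000000ULL;

    double lo_d, hi_d;
    memcpy(&lo_d, &lo_bits, sizeof lo_d);   /* bit cast without aliasing UB */
    memcpy(&hi_d, &hi_bits, sizeof hi_d);

    /* The subtraction is exact and leaves hi * 2^32 - 2^52. Adding
       2^52 + lo then yields hi * 2^32 + lo with a single rounding. */
    return (hi_d - (0x1p84 + 0x1p52)) + lo_d;
}
```

Under the stated assumptions the only rounding happens in the final addition,
so the result should match the value of a plain (double)x cast, and the
function contains no conditional jumps.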

thx
Gero
