https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013

            Bug ID: 110013
           Summary: [i386] vector_size(8) on 32-bit ABI
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: husseydevin at gmail dot com
  Target Milestone: ---

Closely related to bug 86541, which was fixed on x64 only.

On 32-bit, GCC passes any vector_size(8) vector to external functions in MMX
registers, similar to how it passes 16-byte vectors in SSE registers.

This appears to be the only time that GCC will ever naturally generate an MMX
instruction.

This is good if and only if you are using MMX intrinsics and are manually
handling _mm_empty().
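
For reference, a rough sketch of what that manual handling looks like, assuming
an x86 target with <mmintrin.h> available. mmx_sum_scaled is a made-up name for
illustration only; the point is just that _mm_empty() has to run after the last
MMX operation and before any floating-point math that may end up on x87.

```c
#include <mmintrin.h>

/* Hypothetical helper: adds 1 to both lanes with MMX intrinsics, then
 * does floating-point math on the result. */
double mmx_sum_scaled(double scale, int x, int y)
{
    __m64 a = _mm_set_pi32(y, x);   /* high lane = y, low lane = x */
    __m64 b = _mm_set_pi32(1, 1);
    __m64 r = _mm_add_pi32(a, b);   /* lane-wise 32-bit add */

    /* Pull both lanes back out into plain integers. */
    int lo = _mm_cvtsi64_si32(r);
    int hi = _mm_cvtsi64_si32(_mm_srli_si64(r, 32));

    /* Leave MMX state before touching floating point. MMX registers
     * alias the x87 stack, so this is the caller's responsibility. */
    _mm_empty();

    return scale * (double)(lo + hi);
}
```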

Otherwise, if, say, you are porting over NEON code (where I found this issue)
using the vector_size extension, this can cause some sneaky issues if your
function fails to inline:
1. Things will likely break because GCC doesn't handle mixing MMX and x87
   properly.
   - Example of broken code (works with -mno-mmx):
     https://godbolt.org/z/xafWPohKb
2. You will pay a nasty performance toll, more than just a cdecl call, as GCC
   doesn't actually know what to do with an MMX register and just spills it
   to memory.
   - This can especially be seen when v2sf is used and it places the floats
     into MMX registers.
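
A minimal sketch of the pattern in question. This is not the Godbolt test case
linked above; the names v2si, add_v2si and scaled_sum are made up for
illustration, and noinline stands in for a call that fails to inline. Nothing
here uses an MMX intrinsic, so there is no obvious place where the author would
think to call _mm_empty().

```c
#include <stdint.h>

/* 8-byte vector of two 32-bit ints, via the GCC vector_size extension. */
typedef int32_t v2si __attribute__((vector_size(8)));

/* Hypothetical helper. noinline forces a real call, so the 8-byte
 * vector arguments and return value go through the calling convention
 * (MMX registers on 32-bit x86 with MMX enabled). */
__attribute__((noinline))
v2si add_v2si(v2si a, v2si b)
{
    return a + b;
}

/* Hypothetical caller: mixes the vector call with floating-point math,
 * which may be done on x87 when targeting 32-bit x86. */
double scaled_sum(double scale, int32_t x, int32_t y)
{
    v2si a = { x, y };
    v2si b = { 1, 1 };
    v2si r = add_v2si(a, b);
    return scale * (double)(r[0] + r[1]);
}
```

On a target where 8-byte vectors do not go through MMX (x86-64, or 32-bit with
-mno-mmx), scaled_sum(1.5, 2, 3) should simply return 10.5.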

There are two options. The first is to use the weird ABI that Clang seems to
use:

| Type             | SIMD | Params | Return  |
|------------------|------|--------|---------|
| float            | base | stack  | ST0:ST1 |
| float            | SSE  | XMM0-2 | XMM0    |
| double           | all  | stack  | ST0     |
| long long/__m64  | all  | stack  | EAX:EDX |
| int, short, char | base | stack  | stack   |
| int, short, char | SSE2 | stack  | XMM0    |

However, since the current ABIs aren't 100% compatible anyway, I think that a
much simpler solution is to just pass these vectors in SSE registers like x64
does, falling back to the stack if SSE is not available.

Changing the ABI to this also allows us to port the MMX-with-SSE emulation
(bug 86541) to 32-bit mode. If you REALLY need MMX intrinsics, you can't
inline, and you don't have SSE2, you can cope with a stack spill.
