https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013
Bug ID: 110013
Summary: [i386] vector_size(8) on 32-bit ABI
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---

Closely related to bug 86541, which was fixed on x64 only.

On 32-bit, GCC passes any vector_size(8) vectors to external functions in MMX registers, similar to how it passes 16 byte vectors in SSE registers.

This appears to be the only time that GCC will ever naturally generate an MMX instruction.

This is good only if you are using MMX intrinsics and are manually handling _mm_empty().

Otherwise, if, say, you are porting over NEON code (where I found this issue) using the vector_size extension, this can cause some sneaky issues if your function fails to inline:

1. Things will likely break because GCC doesn't handle MMX and x87 properly
   - Example of broken code (works with -mno-mmx): https://godbolt.org/z/xafWPohKb
2. You will have a nasty performance toll, more than just a cdecl call, as GCC doesn't actually know what to do with an MMX register and just spills it into memory.
   - This can especially be seen when v2sf is used and it places the floats into MMX registers.

There are two options. The first is to use the weird ABI that Clang seems to use:

| Type             | SIMD | Params | Return  |
| float            | base | stack  | ST0:ST1 |
| float            | SSE  | XMM0-2 | XMM0    |
| double           | all  | stack  | ST0     |
| long long/__m64  | all  | stack  | EAX:EDX |
| int, short, char | base | stack  | stack   |
| int, short, char | SSE2 | stack  | XMM0    |

However, since the current ABIs aren't 100% compatible anyway, I think that a much simpler solution is to just convert to SSE like x64 does, falling back to the stack if SSE is not available.

Changing the ABI to this also allows us to port MMX with SSE (bug 86541) to 32-bit mode. If you REALLY need MMX intrinsics, you can't inline, and you don't have SSE2, you can cope with a stack spill.
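For readers who want to look at the calling convention themselves, here is a minimal sketch of the kind of code affected. It is not the reporter's test case from the godbolt link. The names v2sf_t, add_v2sf and lane_sum_after_call are made up for illustration. The noinline attribute stands in for "your function fails to inline", so the 8-byte vector has to cross a real call boundary.

```c
#include <assert.h>

/* Hypothetical 8-byte vector of two floats, built with the GCC
   vector_size extension (similar in shape to NEON's float32x2_t). */
typedef float v2sf_t __attribute__((vector_size(8)));

static_assert(sizeof(v2sf_t) == 8, "v2sf_t must be an 8-byte vector");

/* noinline forces an actual call, so both arguments and the return
   value are passed according to the target's calling convention. */
__attribute__((noinline)) v2sf_t add_v2sf(v2sf_t a, v2sf_t b)
{
    return a + b; /* lane-wise addition */
}

/* Scalar float math directly after the call. On a 32-bit target this
   is the spot where x87 code would run after the vectors were passed. */
float lane_sum_after_call(void)
{
    v2sf_t a = {1.0f, 2.0f};
    v2sf_t b = {3.0f, 4.0f};
    v2sf_t c = add_v2sf(a, b);
    return c[0] + c[1];
}
```

Compiling this with something like `gcc -m32 -O2 -S`, with and without `-mno-mmx`, and reading the generated assembly should show which registers carry the arguments and the return value. What you see will depend on the GCC version and the SIMD flags in effect.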