Hi,
I did some test runs and you are right: the difference is negligible.
At least in my test with the -O3 flag, the compiler produces code that
handles the unaligned beginning first, then processes the large middle
part with aligned reads and SIMD instructions, and finally handles the rest.
Even in a
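
For illustration, the three phases described above (unaligned start, aligned
middle, remainder) can be written out by hand. This is only a sketch, not the
compiler's actual output and not libFLAC code: the function name add_offset
and the choice of operation (adding a constant to every sample) are
assumptions made for the example, and it assumes an x86 target with SSE2,
falling back to plain scalar code elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: adds `offset` to each of the n samples in p,
 * wrapping on overflow. p may start at any address. */
void add_offset(int32_t *p, size_t n, int32_t offset)
{
    size_t i = 0;

    /* 1. Scalar prologue: step forward until p + i is 16-byte aligned
     *    (or until the samples run out). */
    while (i < n && ((uintptr_t)(p + i) & 15u) != 0) {
        p[i] = (int32_t)((uint32_t)p[i] + (uint32_t)offset);
        i++;
    }

#if defined(__SSE2__)
    /* 2. Aligned middle: four samples per iteration with aligned
     *    loads and stores. */
    const __m128i voff = _mm_set1_epi32(offset);
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_load_si128((const __m128i *)(p + i));
        _mm_store_si128((__m128i *)(p + i), _mm_add_epi32(v, voff));
    }
#endif

    /* 3. Scalar tail: whatever is left (fewer than four samples when
     *    the SIMD loop ran). */
    for (; i < n; i++) {
        p[i] = (int32_t)((uint32_t)p[i] + (uint32_t)offset);
    }
}
```

Because the prologue absorbs the misalignment, the caller does not need an
aligned buffer; the cost is at most three scalar iterations at the start.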
On Mon, 17 Mar 2025 at 02:58, Stefan Oltmanns wrote:
>
> Hi,
>
> yes, SSE requires aligned buffers for operations that directly read an
> operand from memory
libFLAC does unaligned SIMD all the time, both SSE and AVX, so I don't
think that is true. See
https://c9x.me/x86/html/file_module_x86_id_1
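
As a small illustration of the distinction: SSE2 has an explicit unaligned
load (MOVDQU, exposed as the _mm_loadu_si128 intrinsic) that accepts any
address, whereas the aligned form _mm_load_si128 expects a 16-byte boundary.
The sketch below is not taken from libFLAC; the helper name
load4_any_alignment is made up for the example, and on targets without SSE2
it falls back to memcpy.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: reads four 32-bit samples from src, which may
 * sit at any byte address, and writes them to out. */
void load4_any_alignment(const void *src, int32_t out[4])
{
#if defined(__SSE2__)
    /* Unaligned load: valid for any address. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    /* Unaligned store, so out needs no special alignment either. */
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Portable fallback for non-SSE2 targets. */
    memcpy(out, src, 4 * sizeof(int32_t));
#endif
}
```

Calling it with a pointer that is deliberately off by one byte (for example
raw + 1 into a byte array) works, which is the behaviour that unaligned SIMD
code relies on.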
Hi,
yes, SSE requires aligned buffers for operations that directly read an
operand from memory. AVX is more relaxed, but is supposed to be faster
with aligned data.
I'm not sure whether macOS automatically aligns buffers returned by malloc,
but since C11 there is aligned_alloc, which works fine on Linux/macO
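
A minimal sketch of the aligned_alloc approach mentioned above, assuming a
C11 library that provides it. The helper name alloc_samples_aligned and the
32-byte alignment (enough for AVX) are choices made for this example. C11
as originally published requires the size to be a multiple of the alignment,
so the sketch rounds the byte count up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for n 32-bit samples on a
 * 32-byte boundary. Returns NULL on failure; release with free(). */
int32_t *alloc_samples_aligned(size_t n)
{
    const size_t alignment = 32;

    /* Reject sizes whose byte count (plus rounding) would overflow. */
    if (n > (SIZE_MAX - alignment) / sizeof(int32_t)) {
        return NULL;
    }

    /* Round the byte count up to a multiple of the alignment, and
     * never request zero bytes. */
    size_t bytes = n * sizeof(int32_t);
    size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    if (rounded == 0) {
        rounded = alignment;
    }

    return aligned_alloc(alignment, rounded);
}
```

The returned pointer can be checked with ((uintptr_t)p % 32) == 0, and the
same check applied to a plain malloc() result shows what alignment a given
platform's allocator happens to provide.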
I believe that the SSE/AVX hardware engine only works with aligned buffers.
That said, I also believe that macOS already aligns buffers, even with simple
malloc(), although I might be wrong. At the very least, there is surely a
CoreAudio memory allocation function that aligns buffers for audio,
Hi,
Please explain why you need aligned buffers.
Kind regards, Martijn van Beurden
On Sun, 16 Mar 2025 at 01:36, Stefan Oltmanns wrote:
>
> Hello,
>
> I want to process the output from libflac with SSE/AVX. Unfortunately it
> seems that libflac always allocates the output buffer itself and there