I created an example that is a little bit closer to the actual code and
changed the compiler from C++ to C.

The optimizations the compiler chose for version 1 versus version 2 are
interesting: one calls memcpy and the other doesn't.  There is a good chance
that the inlined memcpy (SSE plus scalar moves per iteration) will be faster
for syscache scans, which I believe are usually small (1-4 keys?).
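
For readers who don't want to open the godbolt link, here is a minimal
sketch of the two shapes being compared.  It is not the PostgreSQL source:
DemoScanKey, MAX_KEYS, attmap and the function names are all made up for
illustration, and the struct is a trimmed-down stand-in for ScanKeyData.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 4              /* hypothetical cap on the number of scan keys */

/* Hypothetical, trimmed-down stand-in for ScanKeyData. */
typedef struct DemoScanKey
{
    int         sk_flags;
    short       sk_attno;
    long        sk_argument;
} DemoScanKey;

/* Remap a heap attribute number to a 1-based index column number. */
static void
remap_attno(DemoScanKey *dst, short heap_attno, const short *attmap, int natts)
{
    for (int j = 0; j < natts; j++)
    {
        if (heap_attno == attmap[j])
        {
            dst->sk_attno = (short) (j + 1);
            break;
        }
    }
}

/* Shape A: one fixed-size memcpy per key, inside the loop. */
static void
build_keys_per_iteration(DemoScanKey *idxkey, const DemoScanKey *key,
                         int nkeys, const short *attmap, int natts)
{
    for (int i = 0; i < nkeys; i++)
    {
        memcpy(&idxkey[i], &key[i], sizeof(DemoScanKey));
        remap_attno(&idxkey[i], key[i].sk_attno, attmap, natts);
    }
}

/* Shape B: a single variable-size memcpy up front, then the remap loop. */
static void
build_keys_bulk(DemoScanKey *idxkey, const DemoScanKey *key,
                int nkeys, const short *attmap, int natts)
{
    memcpy(idxkey, key, (size_t) nkeys * sizeof(DemoScanKey));
    for (int i = 0; i < nkeys; i++)
        remap_attno(&idxkey[i], key[i].sk_attno, attmap, natts);
}

/* Returns 1 if both shapes produce the same keys for nkeys inputs. */
static int
variants_agree(int nkeys)
{
    const short attmap[3] = {7, 2, 5};  /* index column -> heap attno */
    DemoScanKey key[MAX_KEYS], a[MAX_KEYS], b[MAX_KEYS];

    assert(nkeys >= 0 && nkeys <= MAX_KEYS);
    memset(key, 0, sizeof(key));
    for (int i = 0; i < nkeys; i++)
    {
        key[i].sk_flags = i;
        key[i].sk_attno = attmap[i % 3];
        key[i].sk_argument = 100 + i;
    }

    build_keys_per_iteration(a, key, nkeys, attmap, 3);
    build_keys_bulk(b, key, nkeys, attmap, 3);

    /* Compare field by field; padding bytes are not meaningful. */
    for (int i = 0; i < nkeys; i++)
    {
        if (a[i].sk_flags != b[i].sk_flags ||
            a[i].sk_attno != b[i].sk_attno ||
            a[i].sk_argument != b[i].sk_argument)
            return 0;
    }
    return 1;
}
```

The two shapes give the same result.  The difference is that shape A has a
copy size known at compile time, which a compiler can expand inline, while
shape B has a size that depends on nkeys, which is more likely to become an
actual call to memcpy.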

Probably the only reason to apply this patch would be if N is normally large,
or if it is considered an improvement in code clarity with no detrimental
impact on small-N syscache scans.  I realize you only said "possible small
optimization".  It might be worthwhile to benchmark the code for different
values of N to determine whether there is a tipping point either way.
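
A rough sketch of what such a benchmark could look like is below.  This is
only an assumption about how one might measure it, not something I have run
against PostgreSQL: DemoKey, MAX_KEYS and the function names are
hypothetical, clock() is coarse, and a real test should exercise the actual
systable_beginscan path rather than an isolated copy loop.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define MAX_KEYS 64             /* hypothetical upper bound for the sweep */

/* Hypothetical stand-in for a scan key. */
typedef struct DemoKey
{
    int         flags;
    short       attno;
    long        arg;
} DemoKey;

typedef void (*copy_fn) (DemoKey *dst, const DemoKey *src, int n);

/* One fixed-size memcpy per key. */
static void
copy_per_key(DemoKey *dst, const DemoKey *src, int n)
{
    for (int i = 0; i < n; i++)
        memcpy(&dst[i], &src[i], sizeof(DemoKey));
}

/* One variable-size memcpy for the whole array. */
static void
copy_bulk(DemoKey *dst, const DemoKey *src, int n)
{
    memcpy(dst, src, (size_t) n * sizeof(DemoKey));
}

/* Returns the CPU seconds spent in `iters` calls of fn with n keys. */
static double
time_copy(copy_fn fn, int n, long iters)
{
    static DemoKey src[MAX_KEYS], dst[MAX_KEYS];
    volatile long sink = 0;     /* keeps the copies from being optimized away */
    clock_t     start;

    assert(n >= 1 && n <= MAX_KEYS);

    start = clock();
    for (long it = 0; it < iters; it++)
    {
        src[0].arg = it;        /* change the input on every iteration */
        fn(dst, src, n);
        sink += dst[0].arg;
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(copy_per_key, n, iters) and time_copy(copy_bulk, n, iters)
for n = 1, 2, 4, 8, ... with a large iters and comparing the two numbers
would show where, if anywhere, the crossover is.  Going through a function
pointer may keep the compiler from inlining the per-key copy, so the result
should be treated as indicative only.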

https://godbolt.org/z/dM18cGfE6

-- bg

On Mon, Mar 9, 2026 at 8:05 AM Ranier Vilela <[email protected]> wrote:

>
> On Mon, Mar 9, 2026 at 10:16, Ranier Vilela <[email protected]>
> wrote:
>
>> Hi.
>>
>> In the functions *systable_beginscan* and *systable_beginscan_ordered*,
>> a small optimization is possible.
>> The array *idxkey* can be constructed in one go with a single call to
>> memcpy.
>> The excess might not make much of a difference, but I think it's worth
>> the effort.
>>
>> patch attached.
>>
> Someone asked me whether -O2 already does this on its own.
> Apparently not.
>
> https://godbolt.org/z/h5dndz33x
>
> best regards,
> Ranier Vilela
>
