Em ter., 5 de nov. de 2024 às 00:23, David Rowley <dgrowle...@gmail.com>
escreveu:

> On Tue, 5 Nov 2024 at 06:39, Ranier Vilela <ranier...@gmail.com> wrote:
> > I think we can add a small optimization to this last patch [1].
> > The variable *aligned_end* is only needed in the second loop (for).
> > So, only before the for loop do we actually declare it.
> >
> > Result before this change:
> > check zeros using BERTRAND 1 0.000031s
> >
> > Result after this change:
> > check zeros using BERTRAND 1 0.000018s
> >
> > + const unsigned char *aligned_end;
> >
> > + /* Multiple bytes comparison(s) at once */
> > + aligned_end = (const unsigned char *) ((uintptr_t) end &
> (~(sizeof(size_t) - 1)));
> > + for (; p < aligned_end; p += sizeof(size_t))
>
> I think we all need to stop using Godbolt's servers to run benchmarks
> on. These servers are likely to be running various other workloads in
> highly virtualised environments and are not going to be stable servers
> that would give consistent benchmark results.
>
> I tried your optimisation in the attached allzeros.c and here are my
> results:
>
> # My version
> $ gcc allzeros.c -O2 -o allzeros && for i in {1..3}; do ./allzeros; done
> char: done in 1566400 nanoseconds
> size_t: done in 195400 nanoseconds (8.01638 times faster than char)
> char: done in 1537500 nanoseconds
> size_t: done in 196300 nanoseconds (7.8324 times faster than char)
> char: done in 1543600 nanoseconds
> size_t: done in 196300 nanoseconds (7.86347 times faster than char)
>
> # Ranier's optimization
> $ gcc allzeros.c -O2 -D RANIERS_OPTIMIZATION -o allzeros && for i in
> {1..3}; do ./allzeros; done
> char: done in 1943100 nanoseconds
> size_t: done in 531700 nanoseconds (3.6545 times faster than char)
> char: done in 1957200 nanoseconds
> size_t: done in 458400 nanoseconds (4.26963 times faster than char)
> char: done in 1949500 nanoseconds
> size_t: done in 469000 nanoseconds (4.15672 times faster than char)
>
> Seems to be about half as fast with gcc on -O2
>
Thanks for test coding.

I've tried with msvc 2022 32bits
Here the results:

C:\usr\src\tests\allzeros>allzeros
char: done in 71431900 nanoseconds
size_t: done in 18010900 nanoseconds (3.96604 times faster than char)

C:\usr\src\tests\allzeros>allzeros
char: done in 71070100 nanoseconds
size_t: done in 19654300 nanoseconds (3.61601 times faster than char)

C:\usr\src\tests\allzeros>allzeros
char: done in 68682400 nanoseconds
size_t: done in 19841100 nanoseconds (3.46162 times faster than char)

C:\usr\src\tests\allzeros>allzeros
char: done in 63215100 nanoseconds
size_t: done in 17920200 nanoseconds (3.52759 times faster than char)


C:\usr\src\tests\allzeros>c /DRANIERS_OPTIMIZATION

Microsoft (R) Program Maintenance Utility Versão 14.40.33813.0
Direitos autorais da Microsoft Corporation. Todos os direitos reservados.

C:\usr\src\tests\allzeros>allzeros
char: done in 67213800 nanoseconds
size_t: done in 15049200 nanoseconds (4.46627 times faster than char)

C:\usr\src\tests\allzeros>allzeros
char: done in 51505900 nanoseconds
size_t: done in 13645700 nanoseconds (3.77452 times faster than char)

C:\usr\src\tests\allzeros>allzeros
char: done in 62852600 nanoseconds
size_t: done in 17863800 nanoseconds (3.51843 times faster than char)

C:\usr\src\tests\allzeros>allzeros
char: done in 51877200 nanoseconds
size_t: done in 13759900 nanoseconds (3.77017 times faster than char)

The function used to replace clock_getime is:
timespec_get(ts, TIME_UTC)

best regards,
Ranier Vilela

Reply via email to