It's why I like going for optimisations that try to reduce code size
without sacrificing speed, since that reduces the number of 16-byte or
32-byte sections. Anyhow, back to work with optimising!
Gareth aka. Kit
On 04/01/2022 19:33, Martin Frb via fpc-devel wrote:
On 04/01/2022 18:43, Jonas
On 04/01/2022 18:43, Jonas Maebe via fpc-devel wrote:
On 03/01/2022 12:54, Martin Frb via fpc-devel wrote:
not sure if this is of interest to you, but I see you do a lot on the
optimizer
It's very likely unrelated to anything the optimiser does or does not
do, and more regarding random di
On 03/01/2022 12:54, Martin Frb via fpc-devel wrote:
not sure if this is of interest to you, but I see you do a lot on the
optimizer
It's very likely unrelated to anything the optimiser does or does not
do, and more regarding random differences in code layout. Charlie posted
the following
On 4-1-2022 17:15, J. Gareth Moreton via fpc-devel wrote:
I neglected to include -Cpcoreavx, that was my bad. I'll try again.
According to Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Vol 2B, Page 4-391. The zero flag is set if the source is
zero, and cleared otherwise. R
I neglected to include -Cpcoreavx, that was my bad. I'll try again.
According to Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Vol 2B, Page 4-391. The zero flag is set if the source is zero,
and cleared otherwise. Regarding an undefined result, I got confused
with the BSF a
On 4-1-2022 16:31, Martin Frb via fpc-devel wrote:
Weird as mine is inlined with -Cpcoreavx -O4, with no special
handling for 0. But that does put some things on shaky ground. Maybe
zero the result beforehand?
Same here.
I looked up popcnt and found nothing about not setting if zero. (E.g
@Marco: haven't played with popcnt => it could benefit from the "const to
var" too.
So I played around a bit...
Of course, all this is Intel only
1)
var
Mask8, Mask1: qword;
Mask8 := EIGHTYMASK;
Mask1 := ONEMASK;
And the constant is no longer assigned inside the loop.
Also makes
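A minimal sketch of the "const to var" change from 1), for anyone who wants to try it outside the test program. The mask values, the function name CountContinuationBytes and the loop body are assumptions for illustration; the preview above only shows the two local variables and their assignments, not the real UTF8 length code.

```pascal
program HoistMasks;
{$mode objfpc}

const
  // Assumed values: the preview names these constants but does not show them.
  EIGHTYMASK = qword($8080808080808080);
  ONEMASK    = qword($0101010101010101);

// Hypothetical helper: counts UTF-8 continuation bytes (10xxxxxx)
// in Count consecutive qwords starting at P.
function CountContinuationBytes(P: PQWord; Count: SizeInt): SizeInt;
var
  Mask8, Mask1, X: qword;
  I: SizeInt;
begin
  // Copy the constants into locals once, before the loop,
  // instead of referring to the constants inside the loop body.
  Mask8 := EIGHTYMASK;
  Mask1 := ONEMASK;
  Result := 0;
  for I := 0 to Count - 1 do
  begin
    X := P[I];
    // Keep bit 0 of each byte where bit 7 is set and bit 6 is clear.
    X := ((X and Mask8) shr 7) and ((not X) shr 6) and Mask1;
    Result := Result + SizeInt(PopCnt(X));
  end;
end;

var
  Data: array[0..1] of qword;
begin
  // Bytes $80, $BF, $A9, $82 and $AC match 10xxxxxx; $41, $C3 and $E2 do not.
  Data[0] := qword($80BF41C3A9E282AC);
  Data[1] := 0;
  WriteLn(CountContinuationBytes(@Data[0], 2));
end.
```

Whether this actually helps depends on the compiler version and the -Cp / -O settings, so it is worth comparing the assembler output of both variants.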
On 04/01/2022 10:31, Marco van de Voort via fpc-devel wrote:
Weird as mine is inlined with -Cpcoreavx -O4, with no special handling
for 0. But that does put some things on shaky ground. Maybe zero the
result beforehand?
Same here.
About UTF8LengthF
On 4-1-2022 01:06, J. Gareth Moreton via fpc-devel wrote:
Prepare for a lot of technical rambling!
This is just an analysis of the compilation of utf8lentest.lpr, not
any of the System units. Notably, POPCNT isn't called directly, but
instead goes through the System unit via "call fpc_popcnt
Prepare for a lot of technical rambling!
This is just an analysis of the compilation of utf8lentest.lpr, not any
of the System units. Notably, POPCNT isn't called directly, but instead
goes through the System unit via "call fpc_popcnt_qword" on both 3.2.x
and 3.3.1. A future study of "fpc_po
Interesting - thank you. Will be interesting to study the assembler
output to see what's going on.
I'm honoured that I've become the go-to person when optimisation is
concerned!
Gareth aka. Kit
On 03/01/2022 11:54, Martin Frb via fpc-devel wrote:
Hi Gareth,
not sure if this is of interest
On 3-1-2022 12:54, Martin Frb via fpc-devel wrote:
fpc 3.2.3    fpc 3.3.1
fst 594      fst 688
fst 578      fst 703
fst 578      fst 687
fst 562      fst 688
Fyi, the latest asm version (+fst/pop/add/naieve) is at
http://www.stack.nl/~marcov/utf8lentest.lpr
Hi Gareth,
not sure if this is of interest to you, but I see you do a lot on the
optimizer
While testing the attached, I found that one of the functions was
notably slower when compiled with 3.3.1 (compared to 3.2.3).
So maybe something you are interested in looking at?
The Code in "Utf