Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread J. Gareth Moreton via fpc-devel
It's why I like going for optimisations that try to reduce code size without sacrificing speed, because of reducing the number of 16-byte or 32-byte sections.  Anyhow, back to work with optimising! Gareth aka. Kit On 04/01/2022 19:33, Martin Frb via fpc-devel wrote: On 04/01/2022 18:43,

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread Martin Frb via fpc-devel
On 04/01/2022 18:43, Jonas Maebe via fpc-devel wrote: On 03/01/2022 12:54, Martin Frb via fpc-devel wrote: not sure if this is of interest to you, but I see you do a lot on the optimizer It's very likely unrelated to anything the optimiser does or does not do, and more regarding random

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread Jonas Maebe via fpc-devel
On 03/01/2022 12:54, Martin Frb via fpc-devel wrote: not sure if this is of interest to you, but I see you do a lot on the optimizer It's very likely unrelated to anything the optimiser does or does not do, and more regarding random differences in code layout. Charlie posted the

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread Marco van de Voort via fpc-devel
On 4-1-2022 17:15, J. Gareth Moreton via fpc-devel wrote: I neglected to include -Cpcoreavx, that was my bad. I'll try again. According to Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol 2B, Page 4-391, the zero flag is set if the source is zero, and cleared otherwise.

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread J. Gareth Moreton via fpc-devel
I neglected to include -Cpcoreavx, that was my bad. I'll try again. According to Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol 2B, Page 4-391, the zero flag is set if the source is zero, and cleared otherwise. Regarding an undefined result, I got confused with the BSF

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread Marco van de Voort via fpc-devel
On 4-1-2022 16:31, Martin Frb via fpc-devel wrote: Weird as mine is inlined with -Cpcoreavx -O4, with no special handling for 0. But that does put some things on shaky ground. Maybe zero the result beforehand? Same here. I looked up popcnt and found nothing about not setting if zero.

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread Martin Frb via fpc-devel
@Marco: haven't played with popcnt => it could benefit from the "const to var" too. So I played around a bit... Of course, all this is Intel only 1) var   Mask8, Mask1: qword;   Mask8 := EIGHTYMASK;   Mask1 := ONEMASK; And the constant is no longer assigned inside the loop. Also

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread Martin Frb via fpc-devel
On 04/01/2022 10:31, Marco van de Voort via fpc-devel wrote: Weird as mine is inlined with -Cpcoreavx -O4, with no special handling for 0. But that does put some things on shaky ground. Maybe zero the result beforehand? Same here. About

Re: [fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

2022-01-04 Thread Marco van de Voort via fpc-devel
On 4-1-2022 01:06, J. Gareth Moreton via fpc-devel wrote: Prepare for a lot of technical rambling! This is just an analysis of the compilation of utf8lentest.lpr, not any of the System units.  Notably, POPCNT isn't called directly, but instead goes through the System unit via "call