On 30/12/2021 14:43, Marco van de Voort via lazarus wrote:
Compile with -O4 -Cpcoreavx2; the others (non-asm) will become
faster. My guess is that "add" will take about double the time of asm.
Core I7 8700K
3.3.1 from Dec 10th
3.2.3 from Dec 9th
With fpc 3.3.1:
- fst is worse?
- add gets better
-O4
On 30-12-2021 14:17, John Landmesser via lazarus wrote:
Perhaps useful test information from my PC: 77
Compile with -O4 -Cpcoreavx2; the others (non-asm) will become faster.
My guess is that "add" will take about double the time of asm.
Also, on Windows, select "High performance" as the power scheme. On non-Windows
Perhaps useful test information from my PC:
[john1@manjaro sdb2]$ ./utf8lentest
234526968
fst:128406168
pop:128406168
add:128406168
asm:128406168
29315871
fst 1365
fst 1367
fst 1366
fst 1366
pop 9990
pop 9990
pop 9997
pop 9981
add 1386
add 1382
add
On 30-12-2021 10:15, Florian Klämpfl via lazarus wrote:
Linux uses different calling conventions, please check with the patch
below.
Linux is quite generous with the volatile registers, so luckily it
matches quite closely.
I first tried the approach of your patch, but [s] has problems on
On 30.12.21 at 08:23, Alexey Tor. via lazarus wrote:
New unit test, with Martin's integrated. If I play with godbolt, Ryzen
zen3 (ryzen 5x00X) is nearly twice as fast in cycles as my Ivy Bridge,
so I would like to see some benchmarks from various processors. Also
from very old ones (P4 and
New unit test, with Martin's integrated. If I play with godbolt, Ryzen
zen3 (ryzen 5x00X) is nearly twice as fast in cycles as my Ivy Bridge,
so I would like to see some benchmarks from various processors. Also
from very old ones (P4 and Clawhammers) to test instruction sets.
Project
On 30-12-2021 01:29, Marco van de Voort via lazarus wrote:
(P4 and Clawhammers) to test instruction sets.
64-bit supporting Pentium 4's of course, since this is a 64-bit only test.
Claw/Sledgehammer is the first iteration of Athlon64 and of the x86_64
architecture as a whole and misses
On 29-12-2021 00:00, Bart via lazarus wrote:
On Tue, Dec 28, 2021 at 11:35 PM Martin Frb via lazarus
wrote:
I have a core I7-8600
The diff between the old code and popcnt is less significant.
old: 715
pop: 695
But there is a 3rd way, that is faster.
add: 610
Not surprising that you should
On 29-12-2021 16:30, Martin Frb via lazarus wrote:
Could you post full source if you haven't already? For a bit of
benchmarking. I just wrote it from the top of my head, and I assumed
5 instructions for 16-byte would win any time, but haven't verified
anything yet.
I had it attached on
On 29/12/2021 13:42, Marco van de Voort via lazarus wrote:
On 29-12-2021 10:16, Martin Frb via lazarus wrote:
// Martin's routine that should be replaced by some punpkl magic,
but it is too late now.
Why too late?
See datetime stamp. 02:10 AM. I don't know how it is with you Lazarus
On 29.12.2021 at 13:42, Marco van de Voort via lazarus wrote:
p.s. is there a workaround for git worktree to work on the same branch? E.g.
trunk for 32-bit and trunk for 64-bit ? :-)
No. You cannot check out the same branch in two worktrees. But you can do the following: create a new branch
On 29-12-2021 10:16, Martin Frb via lazarus wrote:
// Martin's routine that should be replaced by some punpkl magic, but
it is too late now.
Why too late?
See datetime stamp. 02:10 AM. I don't know how it is with you Lazarus
devels, but we FPC devels need our beauty sleep from time to
On 29/12/2021 02:10, Marco van de Voort via lazarus wrote:
On 28-12-2021 23:35, Martin Frb via lazarus wrote:
"nx" has a single "1" in each of the 8 bytes in a Qword (based on
64bit).
If we regard each of these bytes as an entity of its own, then we can
keep adding those "1".
I also was
On 28-12-2021 23:35, Martin Frb via lazarus wrote:
"nx" has a single "1" in each of the 8 bytes in a Qword (based on 64bit).
If we regard each of these bytes as an entity of its own, then we can
keep adding those "1".
I also was thinking in that direction, but more about how to optimize
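For readers following the "add" idea quoted above, here is a minimal sketch in C. It is not Martin's actual Pascal routine, only an illustration under stated assumptions: the names continuation_mask, horizontal_sum and utf8_length_add are hypothetical, and the batch limit of 255 words is an assumption chosen so that no 8-bit lane can overflow. Each qword is reduced to a value ("nx") that holds a single 1 in every byte that is a UTF-8 continuation byte (10xxxxxx); those values are added lane by lane, and the lanes are summed only once per batch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "nx": bit 0 of each byte lane is 1 if that byte is 10xxxxxx, else 0. */
static uint64_t continuation_mask(uint64_t x)
{
    return (x >> 7) & (~x >> 6) & 0x0101010101010101ULL;
}

/* Sum the eight 8-bit lanes of acc (each lane holds at most 255). */
static size_t horizontal_sum(uint64_t acc)
{
    const uint64_t even = 0x00FF00FF00FF00FFULL;
    /* Fold into four 16-bit lanes, each at most 510. */
    acc = (acc & even) + ((acc >> 8) & even);
    /* The multiply gathers all four lanes into the top 16 bits. */
    return (size_t)((acc * 0x0001000100010001ULL) >> 48);
}

/* Number of code points = bytes that are not continuation bytes. */
size_t utf8_length_add(const char *s, size_t n)
{
    size_t continuation = 0;
    size_t i = 0;

    while (n - i >= 8) {
        /* Assumption: at most 255 words per batch, so lanes cannot overflow. */
        size_t words = (n - i) / 8;
        if (words > 255)
            words = 255;

        uint64_t acc = 0;
        for (size_t k = 0; k < words; k++, i += 8) {
            uint64_t x;
            memcpy(&x, s + i, sizeof x);   /* safe unaligned load */
            acc += continuation_mask(x);   /* keep adding those "1"s */
        }
        continuation += horizontal_sum(acc);
    }

    /* Remaining 0..7 bytes, one at a time. */
    for (; i < n; i++)
        continuation += (((unsigned char)s[i] & 0xC0) == 0x80);

    return n - continuation;
}
```

Calling utf8_length_add(buf, len) on a byte buffer returns the code-point count, assuming the buffer holds well-formed UTF-8. The point of the design is that the inner loop needs only a shift, a not, two ands and an add per qword; the more expensive lane summation runs once per batch instead of once per qword.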
On Tue, Dec 28, 2021 at 11:35 PM Martin Frb via lazarus
wrote:
> I have a core I7-8600
> The diff between the old code and popcnt is less significant.
>
> old: 715
> pop: 695
>
> But there is a 3rd way, that is faster.
> add: 610
Not surprising that you should come up with a faster solution.
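As a rough illustration of the "pop" variant compared in the quoted numbers, the sketch below counts continuation bytes per qword with a population count. It is an assumption about the general shape of such a routine, not the code that was benchmarked: the name utf8_length_popcnt is hypothetical, and __builtin_popcountll is a GCC/Clang builtin that only compiles to a hardware popcnt instruction when the target allows it (for example with -mpopcnt).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count code points by subtracting continuation bytes (10xxxxxx). */
size_t utf8_length_popcnt(const char *s, size_t n)
{
    const uint64_t ones = 0x0101010101010101ULL;
    size_t continuation = 0;
    size_t i = 0;

    for (; n - i >= 8; i += 8) {
        uint64_t x;
        memcpy(&x, s + i, sizeof x);                 /* safe unaligned load */
        /* One bit per byte lane: set if bit 7 is 1 and bit 6 is 0. */
        uint64_t nx = (x >> 7) & (~x >> 6) & ones;
        continuation += (size_t)__builtin_popcountll(nx);
    }

    /* Remaining 0..7 bytes, one at a time. */
    for (; i < n; i++)
        continuation += (((unsigned char)s[i] & 0xC0) == 0x80);

    return n - continuation;
}
```

Here every qword pays for a population count, whereas a lane-wise add defers the summation, which is one plausible reason for the gap between the "pop" and "add" timings quoted above.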
On 28/12/2021 15:50, Bart via lazarus wrote:
On Tue, Dec 28, 2021 at 3:39 PM Marco van de Voort via lazarus
wrote:
On what machine did you test? The setting is for the generated code,
but the actual processor determines the effective speed.
I have an Intel i5 7th generation on my Win10-64