Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Alexey Tor. via lazarus
New unit test, with Martin's integrated. If I play with godbolt, Ryzen Zen 3 (Ryzen 5x00X) is nearly twice as fast in cycles as my Ivy Bridge, so I would like to see some benchmarks from various processors, including very old ones (P4 and Clawhammers) to test instruction sets. Project utf8len

Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Marco van de Voort via lazarus
On 30-12-2021 01:29, Marco van de Voort via lazarus wrote: (P4 and Clawhammers) to test instruction sets. Pentium 4s with 64-bit support, of course, since this is a 64-bit-only test. Claw/Sledgehammer is the first iteration of Athlon64 and of the x86_64 architecture as a whole and misses so

Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Marco van de Voort via lazarus
On 29-12-2021 00:00, Bart via lazarus wrote: On Tue, Dec 28, 2021 at 11:35 PM Martin Frb via lazarus wrote: I have a core I7-8600. The diff between the old code and popcnt is less significant (old: 715, pop: 695). But there is a 3rd way that is faster (add: 610). Not surprising that you should

Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Marco van de Voort via lazarus
On 29-12-2021 16:30, Martin Frb via lazarus wrote: Could you post full source if you haven't already? For a bit of benchmarking. I just wrote it from the top of my head, and I assumed 5 instructions for 16-byte would win any time, but haven't verified anything yet. I had it attached on m

Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Martin Frb via lazarus
On 29/12/2021 13:42, Marco van de Voort via lazarus wrote: On 29-12-2021 10:16, Martin Frb via lazarus wrote: // Martin's routine that should be replaced by some punpkl magic, but it is too late now. Why too late? See datetime stamp.  02:10 AM. I don't know how it is with you Lazarus dev

Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Florian Klämpfl via lazarus
On 29.12.2021 at 13:42, Marco van de Voort via lazarus wrote: p.s. is there a workaround for git worktree to work on the same branch? E.g. trunk for 32-bit and trunk for 64-bit? :-) No. You cannot check out the same branch in two worktrees. But you can do the following: create a new branch

Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Marco van de Voort via lazarus
On 29-12-2021 10:16, Martin Frb via lazarus wrote: // Martin's routine that should be replaced by some punpkl magic, but it is too late now. Why too late? See datetime stamp.  02:10 AM. I don't know how it is with you Lazarus devels, but we FPC devels need our beauty sleep from time to ti

Re: [Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

2021-12-29 Thread Martin Frb via lazarus
On 29/12/2021 02:10, Marco van de Voort via lazarus wrote: On 28-12-2021 23:35, Martin Frb via lazarus wrote: "nx" has a single "1" in each of the 8 bytes in a Qword (based on 64bit). If we regard each of these bytes as an entity of its own, then we can keep adding those "1". I also was th
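
For readers following the "keep adding those 1" idea quoted above, here is a minimal sketch in C of how such per-byte accumulation could look. It is not the LazUTF8 code from the thread: the function name utf8_length_swar and the constant names are hypothetical, and it assumes that "nx" marks UTF-8 continuation bytes (10xxxxxx) with a 1 in the low bit of each byte lane. Because every lane gains at most 1 per Qword, up to 255 Qwords can be summed before the eight lanes have to be folded into a normal counter, which is what replaces a popcnt per Qword.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the routine from the thread: counts UTF-8
   code points as (byte count) - (number of continuation bytes). */
size_t utf8_length_swar(const char *s, size_t n)
{
    const uint64_t ONES = 0x0101010101010101ULL;
    size_t cont = 0;   /* continuation bytes seen so far */
    size_t i = 0;

    while (n - i >= 8) {
        uint64_t acc = 0;      /* eight independent 8-bit counters */
        unsigned rounds = 0;

        /* Each lane grows by at most 1 per round, so 255 rounds
           cannot overflow a lane into its neighbour. */
        while (n - i >= 8 && rounds < 255) {
            uint64_t w;
            memcpy(&w, s + i, 8);   /* safe unaligned load */
            /* nx: 0x01 in every lane whose byte has bit 7 set and
               bit 6 clear, i.e. a continuation byte. */
            uint64_t nx = ((w >> 7) & (~w >> 6)) & ONES;
            acc += nx;
            i += 8;
            rounds++;
        }

        /* Fold the eight byte lanes into four 16-bit lanes, then sum
           those with one multiply; the total ends up in the top 16 bits. */
        uint64_t sum16 = (acc & 0x00FF00FF00FF00FFULL)
                       + ((acc >> 8) & 0x00FF00FF00FF00FFULL);
        cont += (size_t)((sum16 * 0x0001000100010001ULL) >> 48);
    }

    /* Remaining 0..7 bytes, one at a time. */
    for (; i < n; i++)
        cont += (((unsigned char)s[i] & 0xC0) == 0x80);

    return n - cont;
}
```

Whether this actually beats popcnt depends on the CPU, as the timings quoted elsewhere in the thread suggest, so treat it as something to benchmark rather than a guaranteed win.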