Re: considering L1 cache

2024-05-13 Thread Jim Meyering
You might be interested in this implementation: https://engineering.fb.com/2019/04/25/developer-tools/f14/ On Mon, May 13, 2024 at 4:24 PM Bruno Haible wrote: > > Paul Eggert wrote: > > I installed the > > attached. This probably a win (over de Bruijn too), at least for some > > apps and platform

Re: considering L1 cache

2024-05-13 Thread Bruno Haible
Paul Eggert wrote: > I installed the > attached. This probably a win (over de Bruijn too), at least for some > apps and platforms, though I haven't benchmarked. Thanks! Replacing a table access with ca. 7 arithmetic instructions is definitely a win. I also love how this code makes use of condition
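
To illustrate what "replacing a table access with arithmetic" can look like, here is a minimal sketch of a table-free count-leading-zeros for 32-bit values. It is not the code that was installed; the name clz32_arith is hypothetical, and the approach (a binary search in which each comparison result, 0 or 1, is turned into a shift amount) is only one of several possible ways to do this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the installed gnulib code.
   Returns the number of leading zero bits in X (32 when X == 0).
   Each step turns a comparison result (0 or 1) into a shift amount,
   so no lookup table and no data-dependent branch is required. */
static int clz32_arith(uint32_t x)
{
    int n = 32;
    int s;
    s = (x > 0xFFFF) << 4; x >>= s; n -= s;  /* top bit in upper 16 bits? */
    s = (x > 0xFF) << 3;   x >>= s; n -= s;  /* ... in upper 8 of the rest? */
    s = (x > 0xF) << 2;    x >>= s; n -= s;
    s = (x > 0x3) << 1;    x >>= s; n -= s;
    s = (x > 0x1);         x >>= s; n -= s;
    return n - (int)x;                       /* x is now 0 or 1 */
}
```

Whether a sequence like this beats a table lookup depends on the compiler and the platform, so it should be benchmarked rather than assumed.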

Re: considering L1 cache

2024-05-13 Thread Paul Eggert
On 5/13/24 09:17, Bruno Haible wrote: The reason is that such a 256-bytes table will tend to occupy 256 bytes in the CPU's L1 cache, and thus reduce the ability of other code to use the L1 cache. Yes, it partly depends on whether the function is called a lot (so the 256-byte table is in the ca

Re: considering L1 cache

2024-05-13 Thread Bruno Haible
Hi Paul, > substituting something > more straightforward than a de Bruijn hash (possibly faster?). > ... > +#if !defined _GL_STDBIT_HAS_BUILTIN_CLZ && !_MSC_VER > +/* __gl_stdbit_clztab[B] is the number of leading zeros in > + the 8-bit byte with value B. */ > +char const __gl_stdbit_clztab[256
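
For readers who have not seen the table-based approach under discussion, the following is a rough sketch of how a 256-entry byte table can be used to count leading zeros in a 32-bit value. The names clz8_table, clz8_table_init, and clz32_table are hypothetical stand-ins; the table quoted above is a constant array, whereas this sketch fills its table at run time to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 256-entry table: clz8_table[b] is the
   number of leading zero bits in the 8-bit byte b (8 for b == 0).
   The table occupies 256 bytes, which is the L1 cache cost at issue. */
static unsigned char clz8_table[256];

/* Fill the table once; must be called before clz32_table. */
static void clz8_table_init(void)
{
    clz8_table[0] = 8;
    for (int b = 1; b < 256; b++) {
        int n = 0;
        for (int mask = 0x80; (b & mask) == 0; mask >>= 1)
            n++;
        clz8_table[b] = (unsigned char)n;
    }
}

/* Leading zero bits of a 32-bit value (32 when x == 0): move the
   highest nonzero byte to the top, then finish with one table lookup. */
static int clz32_table(uint32_t x)
{
    int n = 0;
    if (x <= 0x0000FFFF) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFF) { n += 8;  x <<= 8;  }
    return n + clz8_table[x >> 24];
}
```

The lookup itself is a single load, but it only stays cheap while the table remains in cache, which is the trade-off this thread is about.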