On Fri, Feb 6, 2026 at 11:13 PM Nathan Bossart <[email protected]> wrote:
>
> On Thu, Feb 05, 2026 at 02:48:44PM +0700, John Naylor wrote:
> > On Thu, Feb 5, 2026 at 4:43 AM Nathan Bossart <[email protected]>
> > wrote:
> >> Sure. I'm tempted to suggest that we only use the plain C version here,
> >> too. The SSE4.2 bms_num_members() test I did yesterday used it and showed
> >> improvement at one word. If we do that, we can rip out even more code
> >> since we no longer need the popcount built-ins.
> >
> > Unlike the 32-bit case, people do run production on 64-bit platforms
> > that are not Arm/x86, so that would require effort to see if the
> > builtins are worth it for them. That seems like a separate effort. I
> > can help with that, but let's get the tested stuff in first.
>
> Alright. I moved that to a new 0004 patch that we can consider separately
> once 0001-0003 have been committed.

Okay, this is looking good. I have just one more suggestion: For 0002,
just copy the word-wise functions verbatim. That way, it's a pure
refactoring commit and the exception doesn't need explaining. With
that, I'd say go ahead and commit 0001/2. Then after a bit more
research, the final form of the inline functions can be visible in a
single commit. I've tested S390X already and hope to test one other
platform.

--
John Naylor
Amazon Web Services
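
For readers following the thread without the patches at hand, here is a rough sketch of the trade-off being discussed: a "plain C" word-wise popcount next to a variant that defers to a compiler built-in. This is not the code from 0001-0004. The names popcount64_plain() and popcount64_maybe_builtin() are hypothetical, and the bit-twiddling shown is the well-known SWAR technique, which may or may not match what the patches use.

```c
#include <stdint.h>

/*
 * Hypothetical illustration only, not taken from the patch set.
 * Portable SWAR popcount for a single 64-bit word.
 */
static inline int
popcount64_plain(uint64_t word)
{
	/* Count bits in each 2-bit field. */
	word = word - ((word >> 1) & UINT64_C(0x5555555555555555));
	/* Sum adjacent 2-bit counts into 4-bit fields. */
	word = (word & UINT64_C(0x3333333333333333)) +
		((word >> 2) & UINT64_C(0x3333333333333333));
	/* Sum adjacent 4-bit counts into bytes. */
	word = (word + (word >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
	/* Add all bytes together; the total lands in the top byte. */
	return (int) ((word * UINT64_C(0x0101010101010101)) >> 56);
}

/*
 * Hypothetical wrapper: use the GCC/Clang built-in where available,
 * otherwise fall back to the plain C version above.
 */
static inline int
popcount64_maybe_builtin(uint64_t word)
{
#if defined(__GNUC__) || defined(__clang__)
	return __builtin_popcountll(word);
#else
	return popcount64_plain(word);
#endif
}
```

Whether the built-in is worth keeping depends on what the compiler emits for it on a given 64-bit platform, which is why it needs testing on platforms other than Arm and x86 before it is removed.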
