On Wed, Mar 13, 2025 at 12:02:07AM +0000, nathandboss...@gmail.com wrote:
> Those are nice results.  I'm a little worried about the Neon implementation
> for smaller inputs since it uses a per-byte loop for the remaining bytes,
> though.  If we can ensure there's no regression there, I think this patch
> will be in decent shape.

True, the Neon implementation in patch v6 did perform worse for smaller inputs.
This is fixed in v7: we now use pg_popcount64 to speed up the processing of
smaller inputs and the remaining bytes. Also, as with SVE, the neon-2reg version
performed better than neon-1reg, but neon-4reg brought no further improvement.
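
To illustrate the tail handling described above, here is a minimal sketch. It
is not the code from the v7 patch. The names sketch_popcount and
sketch_popcount64 are hypothetical, and __builtin_popcountll (GCC/Clang) is
assumed as a stand-in for pg_popcount64. The two-register Neon loop is only
compiled on AArch64; elsewhere the word loop handles the whole input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical stand-in for pg_popcount64; assumes GCC or Clang. */
static inline int
sketch_popcount64(uint64_t word)
{
	return __builtin_popcountll(word);
}

/* Count the set bits in buf[0 .. len-1]. */
uint64_t
sketch_popcount(const char *buf, size_t len)
{
	const uint8_t *p = (const uint8_t *) buf;
	uint64_t	total = 0;

	assert(buf != NULL || len == 0);

#if defined(__aarch64__) && defined(__ARM_NEON)
	/* Main loop: two 16-byte registers (32 bytes) per iteration. */
	while (len >= 32)
	{
		uint8x16_t	c0 = vcntq_u8(vld1q_u8(p));
		uint8x16_t	c1 = vcntq_u8(vld1q_u8(p + 16));

		/* Each lane holds at most 8, so the widening sum cannot overflow. */
		total += vaddlvq_u8(c0);
		total += vaddlvq_u8(c1);
		p += 32;
		len -= 32;
	}
#endif

	/* Tail: one 8-byte word at a time instead of a per-byte loop. */
	while (len >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, p, sizeof(word));	/* avoids unaligned access */
		total += sketch_popcount64(word);
		p += sizeof(word);
		len -= sizeof(word);
	}

	/* At most 7 bytes are left for the per-byte loop. */
	while (len > 0)
	{
		total += sketch_popcount64(*p);
		p++;
		len--;
	}

	return total;
}
```

With this shape, an input of three 8-byte words never reaches the per-byte
loop, which is the case where v6 was slowest in the table below.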

The table below compares patches v6 and v7 on m7g.4xlarge.
Query: SELECT drive_popcount(1000000, 8-byte words);
 8-byte words |  master  | v6-neon-2reg| v7-neon-2reg|  v7-sve
--------------+----------+-------------+-------------+--------
        1     |   4.051  |     6.239   |     3.431   |   3.343
        2     |   4.429  |    10.773   |     3.899   |   3.335
        3     |   4.844  |    14.066   |     4.398   |   3.348
        4     |   5.324  |     3.342   |     3.663   |   3.365
        5     |   5.900  |     7.108   |     4.349   |   4.441
        6     |   6.478  |    11.720   |     4.851   |   4.441
        7     |   7.192  |    15.686   |     5.551   |   4.447
        8     |   8.016  |     4.288   |     4.367   |   4.013


We modified [0] to get the numbers for pg_popcount_masked:
 8-byte words |  master  | v7-neon-2reg|  v7-sve
--------------+----------+-------------+--------
        1     |   4.289  |     4.202   |   3.827
        2     |   4.993  |     4.662   |   3.823
        3     |   5.981  |     5.459   |   3.834
        4     |   6.438  |     4.230   |   3.846
        5     |   7.169  |     5.236   |   5.072
        6     |   7.949  |     5.922   |   5.106
        7     |   9.130  |     6.535   |   5.060
        8     |   9.796  |     5.328   |   4.718
      512     | 387.543  |   182.801   |  77.077
     1024     | 760.644  |   360.660   | 150.519
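
For context on what the masked variant measures: pg_popcount_masked counts
the set bits of each byte after ANDing it with a one-byte mask. A scalar
sketch of that idea, using the same word-at-a-time tail handling, could look
like the following. The name sketch_popcount_masked is hypothetical,
__builtin_popcountll (GCC/Clang) is again an assumed stand-in, and this is
not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the set bits in buf[0 .. len-1] after ANDing each byte with mask. */
uint64_t
sketch_popcount_masked(const char *buf, size_t len, uint8_t mask)
{
	const uint8_t *p = (const uint8_t *) buf;
	/* Replicate the byte mask into all eight byte lanes of a word. */
	uint64_t	mask64 = (uint64_t) mask * UINT64_C(0x0101010101010101);
	uint64_t	total = 0;

	assert(buf != NULL || len == 0);

	/* One 8-byte word at a time. */
	while (len >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, p, sizeof(word));	/* avoids unaligned access */
		total += __builtin_popcountll(word & mask64);
		p += sizeof(word);
		len -= sizeof(word);
	}

	/* Remaining 0 to 7 bytes. */
	while (len > 0)
	{
		total += __builtin_popcountll((uint64_t) (*p & mask));
		p++;
		len--;
	}

	return total;
}
```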

[0] https://postgr.es/m/CAFBsxsE7otwnfA36Ly44zZO+b7AEWHRFANxR1h1kxveEV=g...@mail.gmail.com

-Chiranmoy

Attachment: v7-0001-SVE-and-NEON-support-for-pg_popcount.patch
