On 22/11/2024 01:27, Giacchino, Luca wrote:
The existing list_sort takes a comparator function to compare pairs of
ListCell. On the other hand, x86-simd-sort requires an array of numeric
values to sort, and it returns an array of sorted indices. To enable
x86-simd-sort, we add new list_sort_simd functions that take an
extractor function. The function extracts a value (float or uint32) from
a ListCell. Those values are then collected into an array for x86-simd-
sort to work on. A comparator function can still be passed to be used as
tie-breaker.
typedef float (*list_sort_extractor_float)(const ListCell *a);
typedef uint32 (*list_sort_extractor_uint32)(const ListCell *a);
void list_sort_simd_float(List *list, list_sort_extractor_float extract,
list_sort_comparator cmp);
void list_sort_simd_uint32(List *list, list_sort_extractor_uint32
extract, list_sort_comparator cmp);
These functions will exist alongside the current list_sort. Existing
list_sort use cases in Postgres or extensions will not be affected by
default, and they can be converted to list_sort_simd functions where it
makes sense in terms of performance.
I'd suggest targeting pg_qsort() directly, instead of list_sort().
list_sort() is not used in very performance critical parts.
We identified a first use case for list_sort_simd_float in pgvector. As
part of HNSW index construction, pgvector uses list_sort to sort
candidate vectors by distance. Using list_sort_simd_float, we observed
reduction in index build time in some scenarios. For example, we
observed 7% reduction in index build time with the gist-960 dataset and
10% with the dbpedia-openai-1000k dataset (ANN-Benchmarks, HNSW index
with halfvec, m=80). We are also looking into microbenchmarks to measure
list_sort performance independently.
That's interesting. I'd suggest proposing this to the pgvector project
directly, since pgvector would immediately benefit.
We’d appreciate feedback on this approach. In the meantime, we will
complete the patch to share. We also plan to extend SIMD-based sort to
tuple sort in the future.
If you could use this to speed up tuple sorting, that would be much more
interesting for PostgreSQL itself.
--
Heikki Linnakangas
Neon (https://neon.tech)