On 22/11/2024 01:27, Giacchino, Luca wrote:
The existing list_sort takes a comparator function to compare pairs of ListCell. On the other hand, x86-simd-sort requires an array of numeric values to sort, and it returns an array of sorted indices. To enable x86-simd-sort, we add new list_sort_simd functions that take an extractor function. The function extracts a value (float or uint32) from a ListCell. Those values are then collected into an array for x86-simd- sort to work on. A comparator function can still be passed to be used as tie-breaker.

typedef float (*list_sort_extractor_float)(const ListCell *a);

typedef uint32 (*list_sort_extractor_uint32)(const ListCell *a);

void list_sort_simd_float(List *list, list_sort_extractor_float extract, list_sort_comparator cmp);

void list_sort_simd_uint32(List *list, list_sort_extractor_uint32 extract, list_sort_comparator cmp);

These functions will exist alongside the current list_sort. Existing list_sort use cases in Postgres or extensions will not be affected by default, and they can be converted to list_sort_simd functions where it makes sense in terms of performance.

I'd suggest targeting pg_qsort() directly, instead of list_sort(). list_sort() is not used in very performance critical parts.

We identified a first use case for list_sort_simd_float in pgvector. As part of HNSW index construction, pgvector uses list_sort to sort candidate vectors by distance. Using list_sort_simd_float, we observed reduction in index build time in some scenarios. For example, we observed 7% reduction in index build time with the gist-960 dataset and 10% with the dbpedia-openai-1000k dataset (ANN-Benchmarks, HNSW index with halfvec, m=80). We are also looking into microbenchmarks to measure list_sort performance independently.

That's interesting. I'd suggest proposing this to the pgvector project directly, since pgvector would immediately benefit.

We’d appreciate feedback on this approach. In the meantime, we will complete the patch to share. We also plan to extend SIMD-based sort to tuple sort in the future.

If you could use this to speed up tuple sorting, that would be much more interesting for PostgreSQL itself.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to