Re: Proposal to introduce a shuffle function to intarray extension

Thomas Munro Sat, 16 Jul 2022 23:01:02 -0700

On Sun, Jul 17, 2022 at 3:37 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
> > On the whole, I'd vote for calling it shuffle(), and expecting that
> > we'd also use that name for any future generic version.
>
> Actually ... is there a reason to bother with an intarray version
> at all, rather than going straight for an in-core anyarray function?
> It's not obvious to me that an int4-only version would have
> major performance advantages.


Yeah, that seems like a good direction.  If there is a performance
advantage to specialising, then perhaps we only have to specialise on
size, not type.  Perhaps there could be a general function that
internally looks out for typbyval && typlen == 4, and dispatches to a
specialised 4-byte, and likewise for 8, if it can, and that'd already
be enough to cover int, bigint, float etc, without needing
specialisations for each type.

I went to see what Professor Lemire would have to say about all this,
expecting to find a SIMD rabbit hole to fall down for some Sunday
evening reading, but the main thing that jumped out was this article
about the modulo operation required by textbook Fisher-Yates to get a
bounded random number, the random() % n that appears in the patch.  He
talks about shuffling twice as fast by using a no-division trick to
get bounded random numbers[1].  I guess you might need to use our
pg_prng_uint32() for that trick because random()'s 0..RAND_MAX might
introduce bias.  Anyway, file that under go-faster ideas for later.

[1] https://lemire.me/blog/2016/06/30/fast-random-shuffling/

Re: Proposal to introduce a shuffle function to intarray extension

Reply via email to