Am 21.07.22 um 10:41 schrieb Dean Rasheed:
A couple of quick comments on the current patch:
Thank you for your feedback!
It's important to mark these new functions as VOLATILE, not IMMUTABLE, otherwise they won't work as expected in queries. See https://www.postgresql.org/docs/current/xfunc-volatility.html
CREATE FUNCTION marks functions as VOLATILE by default if not explicitly marked otherwise. I assumed function definitions in pg_proc.dat have the same behavior. I will fix that.
https://www.postgresql.org/docs/current/sql-createfunction.html
It would be better to use pg_prng_uint64_range() rather than rand() to pick elements. Partly, that's because it uses a higher quality PRNG, with a larger internal state, and it ensures that the results are unbiased across the range. But more importantly, it interoperates with setseed(), allowing predictable sequences of "random" numbers to be generated -- something that's useful in writing repeatable regression tests.
I agree that we should use pg_prng_uint64_range(). However, in order to achieve interoperability with setseed() we would have to use drandom_seed (rather than pg_global_prng_state) as rng state, which is declared statically in float.c and exclusively used by random(). Do we want to expose drandom_seed to other functions?
Assuming these new functions are made to interoperate with setseed(), which I think they should be, then they also need to be marked as PARALLEL RESTRICTED, rather than PARALLEL SAFE. See https://www.postgresql.org/docs/current/parallel-safety.html, which explains why setseed() and random() are parallel restricted.
As mentioned above, i assumed the default here is PARALLEL UNSAFE. I'll fix that.
In my experience, the requirement for sampling with replacement is about as common as the requirement for sampling without replacement, so it seems odd to provide one but not the other. Of course, we can always add a with-replacement function later, and give it a different name. But maybe array_sample() could be used for both, with a "with_replacement" boolean parameter?
We can also add a "with_replacement" boolean parameter which is false by default to array_sample() later. I do not have a strong opinion about that and will implement it, if desired. Any opinions about default/no-default?
Martin