Re: Should we optimize the `ORDER BY random() LIMIT x` case?

Aleksander Alekseev Fri, 16 May 2025 05:01:32 -0700

Hi,

> A custom SRF seems great to me. You may propose such an aggregate in the
> core - it seems it doesn't even need any syntax changes. For example:
> SELECT * FROM (SELECT sample(q, 10, <type>) FROM (SELECT ...) AS q);
> or something like that.


I experimented with this idea a bit and it looks like this is not going to work.

In our case sample() should behave both as an aggregate function and
as SRF. An aggregate function computes a single result [1]. We
probably could bypass this by returning an array but it would be
inconvenient for the user. The problem with SRFs is that we never know
when SRF is called the last time e.g. there is no SRF_IS_LASTCALL().
So there is no way to aggregate the tuples until we receive the last
one and then return the sample. I don't think SRF can know this in a
general case (JOINs, SELECT .. LIMIT .., etc). Unless I'm missing
something of course. Perhaps somebody smarter than me could show us
the exact way to implement sample() as an SRF, but at the moment
personally I don't see it.

On top of that it's worth noting that the query you showed above
wouldn't be such a great UI in my opinion.

If I'm right about the limitations of aggregate functions and SRFs
this leaves us the following options:

1. Changing the constraints of aggregate functions or SRFs. However I
don't think we want to do it for such a single niche scenario.
2. Custom syntax and a custom node.
3. To give up

Thoughts?

[1]: https://www.postgresql.org/docs/current/functions-aggregate.html
-- 
Best regards,
Aleksander Alekseev

Re: Should we optimize the `ORDER BY random() LIMIT x` case?

Reply via email to