Hi Aleksander, if we can optimize the query, that would be great. Then we won't need to pull a lot of data to the application side and randomly pick the needed rows there.
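For reference, the client-side workaround I mean looks roughly like the sketch below (Python, with the driver/cursor details omitted and a plain iterable standing in for the query result). Even with reservoir sampling keeping memory bounded, the server still has to produce and ship every row to the client, which is exactly the cost that server-side sampling support would avoid:

```python
import random

def reservoir_sample(rows, k, rng=random):
    """Keep a uniform random sample of k rows from an iterable,
    without materializing the whole result set in memory."""
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1),
            # which keeps every row equally likely to be in the sample.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# In practice `rows` would be a server-side cursor streaming the
# "AS OF" join result; here a generator stands in for it.
rows = ({"ts": i, "city": "City A"} for i in range(100_000))
picked = reservoir_sample(rows, 5)
```

Memory stays at O(k), but the full result set still crosses the wire, so this only mitigates the client side of the problem.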
On Thu, 15 May 2025 at 07:41, Aleksander Alekseev <aleksan...@timescale.com> wrote:
> Hi,
>
> If I didn't miss anything, currently we don't seem to support sampling
> the result of an arbitrary SELECT query efficiently.
>
> To give one specific example:
>
> ```
> CREATE TABLE temperature(
>     ts TIMESTAMP NOT NULL,
>     city TEXT NOT NULL,
>     temperature INT NOT NULL);
>
> CREATE TABLE humidity(
>     ts TIMESTAMP NOT NULL,
>     city TEXT NOT NULL,
>     humidity INT NOT NULL);
>
> -- imagine having much more data ...
> INSERT INTO temperature (ts, city, temperature)
> SELECT ts + (INTERVAL '60 minutes' * random()), city, 30*random()
> FROM generate_series('2022-01-01' :: TIMESTAMP,
>                      '2022-01-31', '1 day') AS ts,
>      unnest(array['City A', 'City B']) AS city;
>
> INSERT INTO humidity (ts, city, humidity)
> SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
> FROM generate_series('2022-01-01' :: TIMESTAMP,
>                      '2022-01-31', '1 day') AS ts,
>      unnest(array['City A', 'City B']) AS city;
>
> -- "AS OF" join:
> SELECT t.ts, t.city, t.temperature, h.humidity
> FROM temperature AS t
> LEFT JOIN LATERAL
>   ( SELECT * FROM humidity
>     WHERE city = t.city AND ts <= t.ts
>     ORDER BY ts DESC LIMIT 1
>   ) AS h ON TRUE
> WHERE t.ts < '2022-01-05';
> ```
>
> One can do `SELECT (the query above) ORDER BY random() LIMIT x` but
> this produces an inefficient plan. Alternatively one could create
> temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl
> TABLESAMPLE BERNOULLI(20)` but this is inconvenient and would be
> suboptimal even if we supported global temporary tables.
>
> 1. Do you think there might be value in addressing this issue?
> 2. If yes, how would you suggest addressing it from the UI point of
> view - by adding a special syntax, some sort of aggregate function, or
> ...?
>
> --
> Best regards,
> Aleksander Alekseev