Hi Aleksander, if we can optimize the query, that would be great. Then we won't need to pull a lot of data to the application side and randomly pick the needed rows there.
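For reference, the client-side workaround I mean looks roughly like the sketch below (Python, with the driver/cursor details omitted and a plain iterable standing in for the query result). Even with reservoir sampling keeping memory bounded, the server still has to produce and ship every row to the client, which is exactly the cost that server-side sampling support would avoid:

```python
import random

def reservoir_sample(rows, k, rng=random):
    """Keep a uniform random sample of k rows from an iterable,
    without materializing the whole result set in memory."""
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1),
            # which keeps every row equally likely to be in the sample.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# In practice `rows` would be a server-side cursor streaming the
# "AS OF" join result; here a generator stands in for it.
rows = ({"ts": i, "city": "City A"} for i in range(100_000))
picked = reservoir_sample(rows, 5)
```

Memory stays at O(k), but the full result set still crosses the wire, so this only mitigates the client side of the problem.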
On Thu, 15 May 2025 at 07:41, Aleksander Alekseev <aleksan...@timescale.com> wrote:
> Hi,
>
> If I didn't miss anything, currently we don't seem to support sampling
> the result of an arbitrary SELECT query efficiently.
>
> To give one specific example:
>
> ```
> CREATE TABLE temperature(
>     ts TIMESTAMP NOT NULL,
>     city TEXT NOT NULL,
>     temperature INT NOT NULL);
>
> CREATE TABLE humidity(
>     ts TIMESTAMP NOT NULL,
>     city TEXT NOT NULL,
>     humidity INT NOT NULL);
>
> -- imagine having much more data ...
> INSERT INTO temperature (ts, city, temperature)
> SELECT ts + (INTERVAL '60 minutes' * random()), city, 30*random()
> FROM generate_series('2022-01-01' :: TIMESTAMP,
>                      '2022-01-31', '1 day') AS ts,
>      unnest(array['City A', 'City B']) AS city;
>
> INSERT INTO humidity (ts, city, humidity)
> SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
> FROM generate_series('2022-01-01' :: TIMESTAMP,
>                      '2022-01-31', '1 day') AS ts,
>      unnest(array['City A', 'City B']) AS city;
>
> -- "AS OF" join:
> SELECT t.ts, t.city, t.temperature, h.humidity
> FROM temperature AS t
> LEFT JOIN LATERAL
>   ( SELECT * FROM humidity
>     WHERE city = t.city AND ts <= t.ts
>     ORDER BY ts DESC LIMIT 1
>   ) AS h ON TRUE
> WHERE t.ts < '2022-01-05';
> ```
>
> One can do `SELECT (the query above) ORDER BY random() LIMIT x` but
> this produces an inefficient plan. Alternatively one could create
> temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl
> TABLESAMPLE BERNOULLI(20)` but this is inconvenient and would be
> suboptimal even if we supported global temporary tables.
>
> 1. Do you think there might be value in addressing this issue?
> 2. If yes, how would you suggest addressing it from the UI point of
> view - by adding a special syntax, some sort of aggregate function, or
> ...?
>
> --
> Best regards,
> Aleksander Alekseev