Thank you all who replied!! It looks like Sugawara's recursive solution does the trick. Unfortunately performance is quite poor for the sample dataset I'm working with which is a table of about 50000 records. Indeed, there are indexes applied to the table. I believe the recursive select is being executed a great number of times causing the first part of the query to take a long time.
The fastest solution I've come up with is a plpgsql procedure the loops over a select where the result is ordered by (id,tstamp) and examines the tstamp values and only returns rows that meet the interval criteria. This technique takes roughly 2 seconds to filter out records over my 50000 record sample set.... which is acceptable but not nearly as elegant as a single SQL statement. Again, thank you for all the replies. Wayne On Sun, Jun 05, 2011 at 08:52:30PM +0900, Masaru Sugawara wrote: > On Fri, 3 Jun 2011 15:52:53 -0400 > lists-pg...@useunix.net wrote: > > > I also think you might want to use WITH RECURSIVE clause. > This SQL searches the case of an interval of 5 minutes or more, > and sets a relationship between a parent to its child. > > > CREATE TABLE tbl(id integer, ts time) ; > INSERT INTO tbl VALUES > (0, '20:00'), > (0, '20:05'), > (0, '20:08'), > (0, '20:10'), > (0, '20:11'), > (1, '20:03'), > (1, '20:04'), > (1, '20:05'), > (1, '20:09'), > (1, '20:16'); > > SELECT * FROM tbl; > > > -- > WITH RECURSIVE rec(id , ts_p, ts_c) AS ( > SELECT a1.id, min(a1.ts), min(b1.ts) > FROM tbl AS a1, tbl AS b1 > WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <= b1.ts > GROUP BY a1.id > UNION ALL > SELECT t2.id, t2.ts_p, t2.ts_c > FROM rec AS t1 INNER JOIN > (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c > FROM tbl AS a2, tbl AS b2 > WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts > GROUP BY a2.id, a2.ts > UNION ALL > SELECT a3.id, a3.ts, null > FROM tbl AS a3 > ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p > ) > SELECT DISTINCT id, ts_p AS ts FROM rec > ORDER BY 1,2; > > > > > > > > I have a table that, at a minimum, has ID and timestamp columns. Records > > are inserted into with random IDs and timestamps. Duplicate IDs are > > allowed. > > > > I want to select records grouped by ID, ordered by timestamp that are X > > minutes > > apart. In this case X is 5. > > > > Note, the intervals are not X minute wall clock intervals, they are X minute > > intervals from the last accepted record, per-id. > > > > For instance here is some sample input data: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 1 20:03 > > 1 20:04 > > 0 20:05 > > 1 20:05 > > 0 20:08 > > 1 20:09 > > 0 20:10 > > > > I'd want the select to return: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 0 20:05 > > 0 20:10 > > 1 20:03 > > 1 20:09 > > > > > > Does my question make sense? > > > > Thanks in advance, > > Wayne > > > > -- > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-sql > > > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql