Re: [SQL] selecting records X minutes apart

lists-pgsql Sun, 05 Jun 2011 12:43:38 -0700

Thank you all who replied!!  It looks like Sugawara's recursive solution does
the trick.  Unfortunately performance is quite poor for the sample dataset I'm
working with which is a table of about 50000 records.  Indeed, there are
indexes applied to the table.  I believe the recursive select is being
executed a great number of times causing the first part of the query to take a
long time.


The fastest solution I've come up with is a plpgsql procedure the loops over a
select where the result is ordered by (id,tstamp) and examines the tstamp
values and only returns rows that meet the interval criteria. This technique
takes roughly 2 seconds to filter out records over my 50000 record sample
set.... which is acceptable but not nearly as elegant as a single SQL
statement.

Again, thank you for all the replies.

Wayne

On Sun, Jun 05, 2011 at 08:52:30PM +0900, Masaru Sugawara wrote:
> On Fri, 3 Jun 2011 15:52:53 -0400
> [email protected] wrote:
> 
> 
> I also think you might want to use WITH RECURSIVE clause. 
> This SQL searches the case of an interval of 5 minutes or more, 
> and sets a relationship between a parent to its child.
> 
> 
> CREATE TABLE tbl(id integer, ts time) ;
> INSERT INTO tbl VALUES
> (0, '20:00'),
> (0, '20:05'),
> (0, '20:08'),
> (0, '20:10'),
> (0, '20:11'),
> (1, '20:03'),
> (1, '20:04'),
> (1, '20:05'),
> (1, '20:09'),
> (1, '20:16');
> 
> SELECT * FROM tbl;
> 
> 
> --
> WITH RECURSIVE rec(id , ts_p, ts_c) AS (
> SELECT a1.id, min(a1.ts), min(b1.ts)
>    FROM tbl AS a1, tbl AS b1 
>    WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <= b1.ts 
>    GROUP BY a1.id
> UNION ALL
> SELECT t2.id, t2.ts_p, t2.ts_c
>    FROM rec AS t1 INNER JOIN 
>    (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c
>        FROM tbl AS a2, tbl AS b2 
>        WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts 
>        GROUP BY a2.id, a2.ts
>     UNION ALL
>     SELECT a3.id, a3.ts, null
>        FROM tbl AS a3      
>     ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p
> )
> SELECT DISTINCT id, ts_p AS ts FROM rec
> ORDER BY 1,2;
> 
> 
> 
> 
> 
> 
> > I have a table that, at a minimum, has ID and timestamp columns.  Records
> > are inserted into with random IDs and timestamps.  Duplicate IDs are 
> > allowed. 
> > 
> > I want to select records grouped by ID, ordered by timestamp that are X 
> > minutes
> > apart. In this case X is 5.
> > 
> > Note, the intervals are not X minute wall clock intervals, they are X minute
> > intervals from the last accepted record, per-id.
> > 
> > For instance here is some sample input data:
> > 
> > ID  TS (HH:MM)
> > -------------------
> > 0   20:00
> > 1   20:03
> > 1   20:04
> > 0   20:05
> > 1   20:05
> > 0   20:08
> > 1   20:09
> > 0   20:10
> > 
> > I'd want the select to return:
> > 
> > ID  TS (HH:MM)
> > -------------------
> > 0   20:00
> > 0   20:05
> > 0   20:10
> > 1   20:03
> > 1   20:09
> > 
> > 
> > Does my question make sense?
> > 
> > Thanks in advance,
> > Wayne
> > 
> > -- 
> > Sent via pgsql-sql mailing list ([email protected])
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-sql
> 
> 
> 
> 
> -- 
> Sent via pgsql-sql mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] selecting records X minutes apart

Reply via email to