Re: [GENERAL] Summing activity intervals without any obvious column to group by

David Johnston Mon, 13 Aug 2012 18:04:41 -0700

On Aug 13, 2012, at 20:28, Carey Tilden <[email protected]> wrote:


> Apologies for the awkward title.  I haven't quite thought of the right way to 
> describe my problem, which may be why I've had a hard time figuring out how 
> to solve it.  I have a list of program start/stop times, and I want to know 
> how long each run takes to complete.  The thing that's really tripping me up 
> is there are gaps in the sequence.  I've figured out how to collapse the 
> results down to a single row per attempt, but I can't quite figure out how to 
> further collapse down each full run to its own row.  It'd be easy if I had a 
> session_id or something to group on, but I don't.  All I have are the 
> start/stop times.
> 
> Here's some sample data.  Hopefully this clarifies what I'm talking about:
> 
>     drop table if exists program_runs;
>     
>     create temporary table program_runs (
>         id serial,
>         time_stamp timestamptz,
>         action text
>     );
>     
>     insert into program_runs (time_stamp, action) values
>         ('2012-01-01 10:00:00 PST', 'started'),
>         ('2012-01-01 10:10:00 PST', 'stopped early'),
>         ('2012-01-01 10:20:00 PST', 'started'),
>         ('2012-01-01 10:30:00 PST', 'stopped early'),
>         ('2012-01-01 10:40:00 PST', 'started'),
>         ('2012-01-01 10:47:00 PST', 'completed'),
>         ('2012-01-01 10:50:00 PST', 'started'),
>         ('2012-01-01 11:00:00 PST', 'stopped early'),
>         ('2012-01-01 11:10:00 PST', 'started'),
>         ('2012-01-01 11:13:00 PST', 'completed'),
>         ('2012-01-01 11:20:00 PST', 'started'),
>         ('2012-01-01 11:30:00 PST', 'stopped early'),
>         ('2012-01-01 11:40:00 PST', 'started'),
>         ('2012-01-01 11:50:00 PST', 'stopped early'),
>         ('2012-01-01 12:00:00 PST', 'started'),
>         ('2012-01-01 12:10:00 PST', 'stopped early'),
>         ('2012-01-01 12:20:00 PST', 'started'),
>         ('2012-01-01 12:29:00 PST', 'completed');
>     
>     select
>         this_time_stamp as starting_time_stamp,
>         next_time_stamp - this_time_stamp as time_elapsed,
>         next_action as closing_action
>     from (
>         select
>             time_stamp as this_time_stamp,
>             lead(time_stamp) over (order by id) as next_time_stamp,
>             action as this_action,
>             lead(action) over (order by id) as next_action,
>             id as this_id, lead(id) over (order by id) as next_id
>         from program_runs
>     ) q
>     where this_action = 'started';
> 
> Note that each run has a pair of entries in the table.  The first is always 
> "started", but the second may be either "stopped early" or "completed".  The 
> final results I'd like to see are:
> 
>       starting_time_stamp   | total_time_elapsed 
>     ------------------------+--------------------
>      2012-01-01 10:00:00-08 | 00:27:00
>      2012-01-01 10:50:00-08 | 00:13:00
>      2012-01-01 11:20:00-08 | 00:39:00
> 
> Hope that's enough detail.  Any ideas or suggestions gladly accepted!
> 
> Regards,
> Carey

First, artificially generate row (pair) identifiers by integer-dividing the 
zero-based ordered row number by 2, i.e. (row_number - 1) / 2, so that each 
start/stop pair shares one identifier.

Using window functions or sub-queries, identify the bookends for each group 
(i.e., the pair identifier of each "completed" row and of the prior 
"completed" row).  Give these groups artificial session identifiers/row 
numbers.

Assign the artificial session id to each transaction row by using the bookends.

Now you have identifiers with which to group.
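
The steps above can be sketched roughly as follows.  This is only one way to 
write it, not tested beyond the sample data in the original post.  The CTE 
names (numbered, attempts, sessions) and the columns pair_id, completed and 
session_id are made up for illustration.  Instead of looking up the bookends 
explicitly, it takes a shortcut: the session id of a pair is simply the number 
of "completed" pairs that precede it.

```sql
with numbered as (
    -- Step 1: rows 1,2 -> pair 0; rows 3,4 -> pair 1; and so on.
    select
        time_stamp,
        action,
        (row_number() over (order by id) - 1) / 2 as pair_id
    from program_runs
),
attempts as (
    -- Collapse each start/stop pair into a single row.
    select
        pair_id,
        min(time_stamp) as started_at,
        max(time_stamp) - min(time_stamp) as time_elapsed,
        bool_or(action = 'completed') as completed
    from numbered
    group by pair_id
),
sessions as (
    -- Steps 2-3: session id = count of completed pairs strictly before
    -- this one (the frame is empty for the first pair, hence coalesce).
    select
        started_at,
        time_elapsed,
        coalesce(
            sum(completed::int) over (
                order by pair_id
                rows between unbounded preceding and 1 preceding
            ), 0) as session_id
    from attempts
)
-- Step 4: group on the generated session id.
select
    min(started_at) as starting_time_stamp,
    sum(time_elapsed) as total_time_elapsed
from sessions
group by session_id
order by starting_time_stamp;
```

On the sample data this should give the three rows asked for (00:27:00, 
00:13:00 and 00:39:00).  A trailing run that never reaches "completed" would 
still show up as its own row, which may or may not be what you want.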

This approach makes a number of assumptions regarding the form of the input 
data.  It works for your example data but may not generalize.  In particular, 
it assumes non-overlapping sessions.

HTH

David J.

-- 
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
