On Aug 13, 2012, at 20:28, Carey Tilden <[email protected]> wrote:
> Apologies for the awkward title. I haven't quite thought of the right way to
> describe my problem, which may be why I've had a hard time figuring out how
> to solve it. I have a list of program start/stop times, and I want to know
> how long each run takes to complete. The thing that's really tripping me up
> is there are gaps in the sequence. I've figured out how to collapse the
> results down to a single row per attempt, but I can't quite figure out how to
> further collapse down each full run to its own row. It'd be easy if I had a
> session_id or something to group on, but I don't. All I have are the
> start/stop times.
>
> Here's some sample data. Hopefully this clarifies what I'm talking about:
>
> drop table if exists program_runs;
>
> create temporary table program_runs (
> id serial,
> time_stamp timestamptz,
> action text
> );
>
> insert into program_runs (time_stamp, action) values
> ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00 PST',
> 'stopped early'),
> ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00 PST',
> 'stopped early'),
> ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00 PST',
> 'completed'),
> ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00 PST',
> 'stopped early'),
> ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00 PST',
> 'completed'),
> ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00 PST',
> 'stopped early'),
> ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00 PST',
> 'stopped early'),
> ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00 PST',
> 'stopped early'),
> ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00 PST',
> 'completed');
>
> select
> this_time_stamp as starting_time_stamp,
> next_time_stamp - this_time_stamp as time_elapsed,
> next_action as closing_action
> from (
> select
> time_stamp as this_time_stamp, lead(time_stamp) over (order by
> id) as next_time_stamp,
> action as this_action, lead(action) over (order by id) as
> next_action,
> id as this_id, lead(id) over (order by id) as next_id
> from program_runs
> ) q
> where this_action = 'started';
>
> Note that each run has a pair of entries in the table. The first is always
> "started", but the second may be either "stopped early" or "completed". The
> final results I'd like to see are:
>
> starting_time_stamp | total_time_elapsed
> ------------------------+--------------------
> 2012-01-01 10:00:00-08 | 00:27:00
> 2012-01-01 10:50:00-08 | 00:13:00
> 2012-01-01 11:20:00-08 | 00:39:00
>
> Hope that's enough detail. Any ideas or suggestions gladly accepted!
>
> Regards,
> Carey
First artificially generate row (pair) identifiers by integer dividing the
ordered row number by 2.
Using window or sub-queries identify the bookends for each group (i.e., the
identifier for each completed and the prior completed). Give these groups
artificial session identifiers/row numbers.
Assign the artificial session id to each transaction row by using the bookends.
Now you have identifiers with which to group.
This makes a number of assumptions regarding the form of the input data. It
will solve for your example data but it may not generalize. In particular it
assumes non-overlapping sessions.
HTH
David J.
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general