Re: [GENERAL] Summing activity intervals without any obvious column to group by

Carey Tilden Mon, 13 Aug 2012 21:06:32 -0700

On Mon, Aug 13, 2012 at 7:05 PM, David Johnston <pol...@yahoo.com> wrote:


> On Aug 13, 2012, at 21:22, Carey Tilden <carey.til...@gmail.com> wrote:
>
> On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <pol...@yahoo.com> wrote:
>
>> On Aug 13, 2012, at 20:28, Carey Tilden <carey.til...@gmail.com> wrote:
>>
>> > Apologies for the awkward title.  I haven't quite thought of the right
>> way to describe my problem, which may be why I've had a hard time figuring
>> out how to solve it.  I have a list of program start/stop times, and I want
>> to know how long each run takes to complete.  The thing that's really
>> tripping me up is there are gaps in the sequence.  I've figured out how to
>> collapse the results down to a single row per attempt, but I can't quite
>> figure out how to further collapse down each full run to its own row.  It'd
>> be easy if I had a session_id or something to group on, but I don't.  All I
>> have are the start/stop times.
>> >
>> > Here's some sample data.  Hopefully this clarifies what I'm talking
>> about:
>> >
>> >     drop table if exists program_runs;
>> >
>> >     create temporary table program_runs (
>> >         id serial,
>> >         time_stamp timestamptz,
>> >         action text
>> >     );
>> >
>> >     insert into program_runs (time_stamp, action) values
>> >         ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00
>> PST', 'stopped early'),
>> >         ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00
>> PST', 'stopped early'),
>> >         ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00
>> PST', 'completed'),
>> >         ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00
>> PST', 'stopped early'),
>> >         ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00
>> PST', 'completed'),
>> >         ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00
>> PST', 'stopped early'),
>> >         ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00
>> PST', 'stopped early'),
>> >         ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00
>> PST', 'stopped early'),
>> >         ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00
>> PST', 'completed');
>> >
>> >     select
>> >         this_time_stamp as starting_time_stamp,
>> >         next_time_stamp - this_time_stamp as time_elapsed,
>> >         next_action as closing_action
>> >     from (
>> >         select
>> >             time_stamp as this_time_stamp, lead(time_stamp) over (order
>> by id) as next_time_stamp,
>> >             action as this_action, lead(action) over (order by id) as
>> next_action,
>> >             id as this_id, lead(id) over (order by id) as next_id
>> >         from program_runs
>> >     ) q
>> >     where this_action = 'started';
>> >
>> > Note that each run has a pair of entries in the table.  The first is
>> always "started", but the second may be either "stopped early" or
>> "completed".  The final results I'd like to see are:
>> >
>> >       starting_time_stamp   | total_time_elapsed
>> >     ------------------------+--------------------
>> >      2012-01-01 10:00:00-08 | 00:27:00
>> >      2012-01-01 10:50:00-08 | 00:13:00
>> >      2012-01-01 11:20:00-08 | 00:39:00
>> >
>> > Hope that's enough detail.  Any ideas or suggestions gladly accepted!
>> >
>> > Regards,
>> > Carey
>>
>> First artificially generate row (pair) identifiers by integer dividing
>> the ordered row number by 2.
>>
>> Using window or sub-queries identify the bookends for each group (i.e.,
>> the identifier for each completed and the prior completed).  Give these
>> groups artificial session identifiers/row numbers.
>>
>> Assign the artificial session id to each transaction row by using the
>> bookends.
>>
>
> This is the part where I draw a blank.  How would I do that?  Seems like
> it should be easy with window functions, but I just can't think of the way
> to do it.
>
>
> With detail as ()
> , bookmarks as ()
> Select detail.id, bookmarks.id, ...
> From detail
> Join bookmarks on ( detail.id between bookmarks.startid and
> bookmarks.endid )
>
> Bookmarks as (
> Select row_number() over () as id,
> detail.id, coalesce(min(detail.id) over (rows 1 preceeding),0) --coalesce
> for the first row
> From detail where 'completed'
> )
>
> Probably need to play with row ordering but this should get you started.
>

Thanks much!  That finally clicked and I now have the results I was after.
I included my latest code as an attachment, since pasting it inline seemed
a bit much.  I'm all ears if anyone wants to take a look and suggest any
further refinements.

Cheers,
Carey


>
>> Now you have identifiers with which to group.
>>
>> This makes a number of assumptions regarding the form of the input data.
>> It will solve for your example data but it may not generalize.  In
>> particular it assumes non-overlapping sessions.
>>
>
> The assumptions hold fairly well.  Sessions do not overlap, thankfully.
> There are different program runs to untangle, but that's simple enough
> (order by program_name, time_stamp).
>
> Thanks for the suggestions so far!
>
> Carey
>
>

program_runs_testbed.sql
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Summing activity intervals without any obvious column to group by

Reply via email to