On Mon, Aug 13, 2012 at 7:05 PM, David Johnston <pol...@yahoo.com> wrote:
> On Aug 13, 2012, at 21:22, Carey Tilden <carey.til...@gmail.com> wrote: > > On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <pol...@yahoo.com> wrote: > >> On Aug 13, 2012, at 20:28, Carey Tilden <carey.til...@gmail.com> wrote: >> >> > Apologies for the awkward title. I haven't quite thought of the right >> way to describe my problem, which may be why I've had a hard time figuring >> out how to solve it. I have a list of program start/stop times, and I want >> to know how long each run takes to complete. The thing that's really >> tripping me up is there are gaps in the sequence. I've figured out how to >> collapse the results down to a single row per attempt, but I can't quite >> figure out how to further collapse down each full run to its own row. It'd >> be easy if I had a session_id or something to group on, but I don't. All I >> have are the start/stop times. >> > >> > Here's some sample data. Hopefully this clarifies what I'm talking >> about: >> > >> > drop table if exists program_runs; >> > >> > create temporary table program_runs ( >> > id serial, >> > time_stamp timestamptz, >> > action text >> > ); >> > >> > insert into program_runs (time_stamp, action) values >> > ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00 >> PST', 'stopped early'), >> > ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00 >> PST', 'stopped early'), >> > ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00 >> PST', 'completed'), >> > ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00 >> PST', 'stopped early'), >> > ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00 >> PST', 'completed'), >> > ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00 >> PST', 'stopped early'), >> > ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00 >> PST', 'stopped early'), >> > ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00 >> PST', 'stopped early'), >> > ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00 >> PST', 'completed'); >> > >> > select >> > this_time_stamp as starting_time_stamp, >> > next_time_stamp - this_time_stamp as time_elapsed, >> > next_action as closing_action >> > from ( >> > select >> > time_stamp as this_time_stamp, lead(time_stamp) over (order >> by id) as next_time_stamp, >> > action as this_action, lead(action) over (order by id) as >> next_action, >> > id as this_id, lead(id) over (order by id) as next_id >> > from program_runs >> > ) q >> > where this_action = 'started'; >> > >> > Note that each run has a pair of entries in the table. The first is >> always "started", but the second may be either "stopped early" or >> "completed". The final results I'd like to see are: >> > >> > starting_time_stamp | total_time_elapsed >> > ------------------------+-------------------- >> > 2012-01-01 10:00:00-08 | 00:27:00 >> > 2012-01-01 10:50:00-08 | 00:13:00 >> > 2012-01-01 11:20:00-08 | 00:39:00 >> > >> > Hope that's enough detail. Any ideas or suggestions gladly accepted! >> > >> > Regards, >> > Carey >> >> First artificially generate row (pair) identifiers by integer dividing >> the ordered row number by 2. >> >> Using window or sub-queries identify the bookends for each group (i.e., >> the identifier for each completed and the prior completed). Give these >> groups artificial session identifiers/row numbers. >> >> Assign the artificial session id to each transaction row by using the >> bookends. >> > > This is the part where I draw a blank. How would I do that? Seems like > it should be easy with window functions, but I just can't think of the way > to do it. > > > With detail as () > , bookmarks as () > Select detail.id, bookmarks.id, ... > From detail > Join bookmarks on ( detail.id between bookmarks.startid and > bookmarks.endid ) > > Bookmarks as ( > Select row_number() over () as id, > detail.id, coalesce(min(detail.id) over (rows 1 preceeding),0) --coalesce > for the first row > From detail where 'completed' > ) > > Probably need to play with row ordering but this should get you started. > Thanks much! That finally clicked and I now have the results I was after. I included my latest code as an attachment, since pasting it inline seemed a bit much. I'm all ears if anyone wants to take a look and suggest any further refinements. Cheers, Carey > >> Now you have identifiers with which to group. >> >> This makes a number of assumptions regarding the form of the input data. >> It will solve for your example data but it may not generalize. In >> particular it assumes non-overlapping sessions. >> > > The assumptions hold fairly well. Sessions do not overlap, thankfully. > There are different program runs to untangle, but that's simple enough > (order by program_name, time_stamp). > > Thanks for the suggestions so far! > > Carey > >
program_runs_testbed.sql
Description: Binary data
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general