On Fri, Feb 23, 2018 at 6:42 AM, Viktor Fougstedt <vik...@chalmers.se> wrote:
> > Hi. > > This answer is perhaps useful if I understand your problem correctly. But > I might have interpreted it wrongly. :-) > > > I would probably start with merging intervals so that overlapping and > adjacent intervals become single continuous intervals, then select from > those merged intervals. > > We have an application with a lot of interval handling in PostgreSQL, and > we use many variants of algorithms based on one by Itzik Ben Gan which he > calls “packing intervals”. The post we started with was an old reader’s > challenge from SQL Magazine. It has since been updated when MS-SQL started > supporting window functions better (Itzik Ben Gan is a MS-SQL-guy). > > Basically, it is a few CTE:s which convert the intervals into “start” (+1) > and “stop” (-1) events, then keeps a running sum of these, and finally > creates new intervals by noting that the merged (or “packed”) intervals > starts with events that had sum=0 before them and stops with events which > have sum=0 after them. > > It involves both CTE:s and window functions and is quite a beautiful > example of SQL, IMHO. > > I think it’s best to google it, he does a good job of explaining how it > works. > > Hope that helps a bit at least. > > /Viktor > > Hi. Thanks for the many helpful responses! Although I didn't end up with exactly what I was looking for, I think I got to something that works good enough for now. I did it with CTEs, in a way I think similar to what you were suggesting (my printer is out of ink, so I didn't actually get to look at that book yet!) I ended up having to do 4 passes: 1) Identify the starts and ends of continuous ranges 2) Eliminate the middle-point records (non stops/starts) 3) Merge the stop date in with the starts 4) Eliminate the stops I couldn't see how to do it in less steps, but if there's a way... I posted the query along with some sample data below. Thanks again! Ken BEGIN; CREATE TEMP TABLE sample_data ( client_id INTEGER, start_date DATE, end_date DATE, episode INTEGER -- I won't actually have this info; just a label I added for clarity and testing ); INSERT INTO sample_data VALUES (1,'1990-01-01','1990-12-31',0), (1,'1991-01-01','1991-12-31',0), (1,'1995-01-01','1995-06-30',1), (1,'2000-01-01','2000-12-31',2), (1,'2001-01-01','2001-12-31',2), (1,'2002-01-01','2002-12-31',2), (1,'2003-01-01','2003-12-31',2), (1,'2004-01-01','2004-12-31',2), (1,'2005-01-01','2005-12-31',2), (1,'2006-01-01','2006-12-31',2), (1,'2014-01-01','2014-12-31',3), (1,'2015-01-01','2015-12-31',3), (1,'2017-06-30','2017-12-31',4), (1,'2018-01-01',NULL,4), (2,'2014-02-01','2015-01-31',0), (2,'2015-02-01','2015-12-31',0), (2,'2017-09-30','2018-01-31',1), (2,'2018-02-01','2018-02-14',1) ; WITH peek3 AS ( WITH peek2 AS ( WITH peek AS ( SELECT client_id, episode, daterange(start_date,end_date,'[]') AS range, COALESCE(daterange(start_date,end_date+1,'[]') && lead(daterange(start_date,end_date,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS continues, COALESCE(daterange(start_date,end_date,'[]') && lag(daterange(start_date,end_date+1,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS is_continued FROM sample_data ) SELECT * FROM peek WHERE NOT (is_continued AND continues) ) SELECT client_id,episode,range, daterange(lower(range), CASE WHEN lead(is_continued) OVER (PARTITION BY client_id ORDER BY range) THEN lead(upper(range)) OVER (PARTITION BY client_id ORDER BY range) ELSE upper(range) END) AS full_range ,continues,is_continued FROM peek2 ) SELECT * FROM peek3 WHERE NOT is_continued ; -- AGENCY Software A Free Software data system By and for non-profits *http://agency-software.org/ <http://agency-software.org/>* *https://demo.agency-software.org/client <https://demo.agency-software.org/client>* ken.tan...@agency-software.org (253) 245-3801 Subscribe to the mailing list <agency-general-requ...@lists.sourceforge.net?body=subscribe> to learn more about AGENCY or follow the discussion.