Re: Given a set of daterange, finding the continuous range that includes a particular date

mariusz Fri, 23 Feb 2018 06:10:45 -0800

On Thu, 2018-02-22 at 17:23 -0800, Ken Tanzer wrote:
> 
> 
> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver
> <adrian.kla...@aklaver.com> wrote:
>         On 02/22/2018 04:58 PM, Ken Tanzer wrote:
>         
>                 
>                 
>                 On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver
>                 <adrian.kla...@aklaver.com
>                 <mailto:adrian.kla...@aklaver.com>> wrote:
>                 
>                     On 02/22/2018 04:44 PM, Ken Tanzer wrote:
>                 
>                         Hi, hoping to get some help with this.  I'm
>                 needing to take a
>                         specific date, a series of dateranges  and,
>                 given a specific
>                         date, return a single conitinuous daterange
>                 that includes that date.
>                 
>                         To elaborate a bit, I've got lots of tables
>                 that include start
>                         and end dates.  For example:
>                 
>                         CREATE TABLE tbl_staff_assign (
>                               staff_assign_id         SERIAL PRIMARY
>                 KEY,
>                               client_id               INTEGER NOT NULL
>                 REFERENCES
>                         tbl_client (client_id),
>                               staff_id                INTEGER
>                 REFERENCES
>                         tbl_staff(staff_id),
>                               staff_assign_type_code      VARCHAR(10)
>                 NOT NULL
>                         REFERENCES tbl_l_staff_assign_type
>                 (staff_assign_type_code),
>                               staff_assign_date           DATE NOT
>                 NULL,
>                               staff_assign_date_end       DATE,
>                         ...
>                 
>                         So a client might leave a progrma and then
>                 return later, or they
>                         might simply switch to another staff_id.  (In
>                 which case one
>                         record will have and end date, and the next
>                 record will start on
>                         the next day.)  In this case I need to know
>                 "what period were
>                         they continuously in the program that includes
>                 X date?"  So I'd
>                         like to be able to do something like:
>                 
>                         "SELECT
>                 staff_assign_date,continuous_daterange(
>                         staff_assign_date, (SELECT
>                 
>                 
> array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
>                         ) FROM staff_assign sa2 WHERE
>                 sa2.client_id=sa.client_id) FROM
>                         staff_assign sa
>                 
>                         I've done this before with procedures specific
>                 to a particular
>                         table, and working with the start and end
>                 dates.  I'm now
>                         wanting to try to do this once generically
>                 that will work for
>                         all my cases.  So I'm hoping to do this in a
>                 way that
>                         performance isn't horrible.  And it's a little
>                 unclear to me how
>                         much and how I might be able to use the
>                 daterange operators to
>                         accomplish this efficiently.
>                 
>                 
>                     The operator I use to solve similar problems:
>                 
>                 
>                 https://www.postgresql.org/docs/10/static/functions-range.html
>                 
>                 
> <https://www.postgresql.org/docs/10/static/functions-range.html>
>                 
>                     @>      contains element
>                 '[2011-01-01,2011-03-01)'::tsrange
>                     @> '2011-01-10'::timestamp   t
>                 
>                 
>                 Thanks Adrian.  But how would you apply that to this
>                 situation, where I have a series of (quite possibly
>                 discontinuous) dateranges?
>         
>         
>         This is going to depend on a more formal definition of the
>         problem with some sample data. Right now I am trying to
>         reconcile "what period were they continuously in the program
>         that includes X date?" with "where I have a series of (quite
>         possibly discontinuous) dateranges? " Maybe its just me,  I
>         can't see how discontinuous can also be continuously.
>         
>                 
> 
> 
> Start   End
> -- Episode 1
> 1/1/16  3/30/16
> 4/1/16  4/30/16
> -- Episode 2
> 
> 1/1/18  1/31/18
> 2/1/18  NULL
> 
> 
> Given 1/15/18, should return 1/1/18-->NULL
> Given 1/15/16, should return 1/1/16-->4/30/16
>



i guess, you can easily get max continuous range for each row with
something like this:

CREATE OR REPLACE FUNCTION
        append_daterange(d1 daterange, d2 daterange)
        RETURNS daterange
        LANGUAGE sql
        AS
$$
        SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
        sfunc = append_daterange,
        stype = daterange
);

SELECT dr,
       lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
       upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
  FROM ...

above example is simplified to selecting only daterange column "dr" for
readability, which in your case should be something like

daterange(staff_assign_date,staff_assign_date_end,'[)')

please note that daterange would be normalized to [) format so upper()
above returns exactly your max "staff_assign_date_end" for each
continuous range when dateranges are created with '[)' format.

the key point is ... ELSE d2 in append_daterange() which starts with new
value each time that new value is discontinuous with agg's state value
and order in which rows are processed (ASC for lower of daterange, DESC
for upper of daterange).

unfortunately this involves reading all rows for "client_id" and
additional sorting for each window.
i recall reading that you already pay the price of reading all rows for
client_id anyway, so the only question is the cost of two additional
sorts (maybe extracting dateranges to subset on which to do windows and
rejoining result of continuous ranges to original set would help to
lower the cost).

another way would be recursive cte, easy to append continuous dateranges
one by one from a set of dateranges staring with some date of choice,
but i doubt it would be cheaper, not to mention additional
postprocessing to get what you really want. and definitely would be less
readable.


i guess, from that point where for each row you have start and end date
of max continuous range including given row, you can easily format or
filter whatever you want


> Thinking about this a bit more, I'm wondering if a window function
> could be used for this?  I've used them a bit, but definitely never
> wrote one.   Something like 
> 
> 
> continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]')
>  OVER (PARTION BY client_id)
> 
> 
> But then a window function can only see the rows included in the
> query, yes?  In which case this would only work if you queried for all
> the rows for a particular client_id?  I guess in the example I gave I
> was doing that anyway, so maybe this would be no worse.
> 
> 
> Ken
> -- 
> 
> AGENCY Software  
> A Free Software data system
> By and for non-profits
> http://agency-software.org/
> https://demo.agency-software.org/client
> 
> ken.tan...@agency-software.org
> (253) 245-3801
> 
> 
> Subscribe to the mailing list to
> learn more about AGENCY or
> follow the discussion.

Re: Given a set of daterange, finding the continuous range that includes a particular date

Reply via email to