On Mar 31, 4:07 pm, Arnaud Delobelle <arno...@googlemail.com> wrote: > prueba...@latinmail.com writes: > > [...] > > > > > Well since I attracted a couple people's attention I will describe the > > problem in more detail. Describing the problem properly is probably as > > hard as solving it, so excuse me if I struggle a bit. > > > The problem is for a health insurance company and involves the period > > of time a person is covered. Most insurance companies allow not only > > for the main member to be insured but his family: the spouse and the > > dependents (children). This additional coverage costs extra but less > > than a full new insurance. So for example if Alice buys an insurance > > worth at 100 dollars a month, she can insure her husband Bob for an > > additional 50 dollars. Under certain circumstances Alice may go off > > the insurance and only Bob stays. In that case the price goes back to > > 100 dollars or maybe there is a deal for 80 or something like that. In > > other words the cost of the insurance is dependent on the combination > > of family members that participate in it. Additionally not only do we > > have different family compositions but also different insurance > > products. So you can get medical, dental and vision insurance. > > > All that data is stored in a database that is not very tidy and looks > > something like this: > > > First Day of Coverage, Last Day of Coverage, Relationship, Product > > 5/3/2005, 5/3/2005, D, M > > 9/10/2005, 10/10/2005, S, V > > 3/15/2005, 7/15/2005, M, M > > 3/1/2005, 6/1/2005, S, M > > 5/15/2005, 7/20/2005, D, D > > 9/10/2005, 1/1/2140, M, V > > 2/1/2005, 5/3/2005, M, M > > > Where > > Relationship: M=Main Member, S=Spouse, D=Dependent > > Product: M=Medical, D=Dental, V=Vision > > > As far as I know at the present time there are no deals based on > > combination of products purchased so we will handle each product > > independently. What I want is a simple algorithm that allows me to > > calculate something like this out of it (hopefully I didn’t make a > > mistake): > > > Medical: > > 2/1/2005, 2/28/2005, M > > 3/1/2005, 5/2/2005, M&S > > 5/3/2005, 5/3/2005, M&S&D > > 5/4/2005, 6/1/2005, M&S > > 6/2/2005, 7/15/2005, M > > > Dental: > > 5/15/2005, 7/20/2005, D > > > Vision: > > 9/10/2005, 10/10/2005, M&S > > 10/11/2005, 1/1/2140, M > > OK the approach I describe in my previous email works fine for this > particular problem. I implement it below - the function walk_ivals is at > the heart of it, I've made it as simple as possible and it's pretty > clear that it is O(nlogn). > > The function that actually sorts the data is union(), and it's just a > call to walk_ivals with callback the function acc() which is constant > time, therefore union() itself has the same complexity as walk_ivals. > > There are no comments - I don't have the time to add any, sorry! > > -------------------------------------------------- > import datetime > from collections import defaultdict > > def walk_ivals(ivals, callback, endvals=(-1, 1)): > endpoints = [(x, data) for ival, data in ivals for x in zip(ival, > endvals)] > endpoints.sort() > for (endpoint, endval), data in endpoints: > callback(endpoint, endval, data) > > def union(ivals): > timelines = defaultdict(list) > mult = defaultdict(lambda: defaultdict(int)) > def acc(date, step, data): > rel, prod = data > old_mult = mult[prod][rel] > mult[prod][rel] -= step > if not(old_mult and mult[prod][rel]): > rels = [rel for rel, m in mult[prod].iteritems() if m] > if timelines[prod] and timelines[prod][-1][0] == date: > timelines[prod][-1][1] = rels > else: > timelines[prod].append([date, rels]) > walk_ivals(ivals, acc) > return timelines > > test_data = """5/3/2005, 5/3/2005, D, M > 9/10/2005, 10/10/2005, S, V > 3/15/2005, 7/15/2005, M, M > 3/1/2005, 6/1/2005, S, M > 5/15/2005, 7/20/2005, D, D > 9/10/2005, 1/1/2140, M, V > 2/1/2005, 5/3/2005, M, M""" > > def parse_date(date_string): > month, day, year = map(int, date_string.split('/')) > return datetime.date(year, month, day) > > def parse_data(data_string): > data = [] > for line in data_string.split("\n"): > start, end, rel, prod = line.split(",") > start, end = map(parse_date, (start + datetime.timedelta(1), end)) > rel, prod = rel.strip(), prod.strip() > data.append([(start, end), (rel, prod)]) > return data > > def test(): > ivals = parse_data(test_data) > timelines = union(ivals) > for prod, timeline in timelines.iteritems(): > print "-"*20 > print "Product", prod > for date, covers in timeline: > print date, ' & '.join(covers) > > if __name__ == '__main__': > test() > -------------------------------------------------- > > Here is what it outputs: > > marigold:junk arno$ python intervals2.py > -------------------- > Product M > 2005-02-01 M > 2005-03-01 S & M > 2005-05-03 S & M & D > 2005-05-04 S & M > 2005-06-02 M > 2005-07-16 > -------------------- > Product D > 2005-05-15 D > 2005-07-21 > -------------------- > Product V > 2005-09-10 S & M > 2005-10-11 M > 2140-01-02 > > -------------------------------------------------- > > The date output is slightly different from yours - I didn't realise you > had time intervals and now I don't have the time to change it, although > it's only a cosmetic change (the end-date of each interval is the > start-date of the next one minus 1 day). > > HTH > > PS: warning - I have done some minor editing of the code after pasting > it, so it might break (although it should be fine). > > -- > Arnaud
Thanks Arnaud, indeed I got this error when running the program: Traceback (most recent call last): File "C:/Python26/timeint2.py", line 57, in <module> test() File "C:/Python26/timeint2.py", line 48, in test ivals = parse_data(test_data) File "C:/Python26/timeint2.py", line 42, in parse_data start, end = map(parse_date, (start + datetime.timedelta(1), end)) TypeError: cannot concatenate 'str' and 'datetime.timedelta' objects easy fix: start, end = map(parse_date, (start , end)) end+=datetime.timedelta(1) I am assuming you want close-open intervals. I also added another line to the test: test_data = """5/3/2005, 5/3/2005, D, M 9/10/2005, 10/10/2005, S, V 3/15/2005, 7/15/2005, M, M 3/1/2005, 6/1/2005, S, M 5/15/2005, 7/20/2005, D, D 9/10/2005, 1/1/2140, M, V 2/1/2005, 5/3/2005, M, M 3/1/2004, 7/1/2004, M, M""" result: -------------------- Product M 2004-03-01 M 2004-07-02 2005-02-01 M 2005-03-01 S & M 2005-05-03 S & M & D 2005-05-04 S & M 2005-06-02 M 2005-07-16 -------------------- Product D 2005-05-15 D 2005-07-21 -------------------- Product V 2005-09-10 S & M 2005-10-11 M 2140-01-02 The output will have to be massaged to get the begin and end of each interval, but otherwise this is more or less what I was looking for. The walk_ivals and union function are 20 lines of pretty dense code. I will have to study it a bit to understand how it really works. Thanks again -- http://mail.python.org/mailman/listinfo/python-list