On 03/21/2013 08:39 PM, Matthew Johnson wrote:
> Dear list,
>
> I have been trying to understand how to use iterators and in
> particular groupby statements. I am, however, quite lost.
>
> I wish to subset the below list, selecting the observations that have
> an ID ('realtime_start') value that is no later than some date (I've
> used the variable name maxDate), and in the case that there is more
> than one such record, returning only the one that has the largest ID
> ('realtime_start').
>
> The code below does the job, however I have the impression that it
> might be done in a more Pythonic way using iterators and groupby
> statements.
>
> Could someone please help me understand how to go from this code to
> the Pythonic idiom?
>
> Thanks in advance,
>
> Matt Johnson
>
> _________________
>
> ## Code example
>
> import pprint
>
> obs = [{'date': '2012-09-01',
>         'realtime_end': '2013-02-18',
>         'realtime_start': '2012-10-15',
>         'value': '231.951'},
>        {'date': '2012-09-01',
>         'realtime_end': '2013-02-18',
>         'realtime_start': '2012-11-15',
>         'value': '231.881'},
>        {'date': '2012-10-01',
>         'realtime_end': '2013-02-18',
>         'realtime_start': '2012-11-15',
>         'value': '231.751'},
>        {'date': '2012-10-01',
>         'realtime_end': '9999-12-31',
>         'realtime_start': '2012-12-19',
>         'value': '231.623'},
>        {'date': '2013-02-01',
>         'realtime_end': '9999-12-31',
>         'realtime_start': '2013-03-21',
>         'value': '231.157'},
>        {'date': '2012-11-01',
>         'realtime_end': '2013-02-18',
>         'realtime_start': '2012-12-14',
>         'value': '231.025'},
>        {'date': '2012-11-01',
>         'realtime_end': '9999-12-31',
>         'realtime_start': '2013-01-19',
>         'value': '231.071'},
>        {'date': '2012-12-01',
>         'realtime_end': '2013-02-18',
>         'realtime_start': '2013-01-16',
>         'value': '230.979'},
>        {'date': '2012-12-01',
>         'realtime_end': '9999-12-31',
>         'realtime_start': '2013-02-19',
>         'value': '231.137'},
>        {'date': '2012-12-01',
>         'realtime_end': '9999-12-31',
>         'realtime_start': '2013-03-19',
>         'value': '231.197'},
>        {'date': '2013-01-01',
>         'realtime_end': '9999-12-31',
>         'realtime_start': '2013-02-21',
>         'value': '231.198'},
>        {'date': '2013-01-01',
>         'realtime_end': '9999-12-31',
>         'realtime_start': '2013-03-21',
>         'value': '231.222'}]
>
> maxDate = "2013-03-21"
>
> dobs = dict([(d, []) for d in set([e['date'] for e in obs])])
>
> for o in obs:
>     dobs[o['date']].append(o)
>
> dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
>                     for k, v in dobs.items()])
>
> rts = lambda x: x['realtime_start']
>
> mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]
>
> mmax.sort(key = lambda x: x['date'])
>
> pprint.pprint(mmax)
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>

You can do it with groupby like so:

from itertools import groupby
from operator import itemgetter

maxDate = "2013-03-21"
mmax = list()

# groupby only groups adjacent items, so sort by the grouping key first
obs.sort(key=itemgetter('date'))

for k, group in groupby(obs, key=itemgetter('date')):
    # keep only the records whose realtime_start is no later than maxDate
    group = [dob for dob in group if dob['realtime_start'] <= maxDate]
    if group:
        # after sorting, the record with the largest realtime_start is last
        group.sort(key=itemgetter('realtime_start'))
        mmax.append(group[-1])

pprint.pprint(mmax)

Note that writing multiply-nested comprehensions like you did results in
very unreadable code. Do you find this code more readable?

 -m

-- 
Lark's Tongue Guide to Python: http://lightbird.net/larks/

Many a man fails as an original thinker simply because his memory is too good.
Friedrich Nietzsche
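
P.S. In case it helps, here is a self-contained variation on the same
groupby idea, as a rough sketch rather than the one right way to do it.
It uses sorted() so the caller's list is not reordered in place, and
max() in place of sort() followed by [-1]. The function name
latest_per_date and the trimmed sample list are made up for
illustration; they are not part of your code. The comparisons work on
plain strings only because the dates are in ISO YYYY-MM-DD form, which
sorts the same way alphabetically as chronologically.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_date(obs, max_date):
    """Hypothetical helper: for each 'date', return the record with the
    largest 'realtime_start' that is no later than max_date."""
    by_date = itemgetter('date')
    result = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    # sorted() returns a new list and leaves the caller's list untouched.
    for _, group in groupby(sorted(obs, key=by_date), key=by_date):
        candidates = [o for o in group if o['realtime_start'] <= max_date]
        if candidates:
            # max() picks the latest record without sorting the whole group.
            result.append(max(candidates, key=itemgetter('realtime_start')))
    return result


# Trimmed sample data, for illustration only.
sample = [
    {'date': '2012-09-01', 'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-09-01', 'realtime_start': '2012-11-15', 'value': '231.881'},
    {'date': '2013-02-01', 'realtime_start': '2013-03-21', 'value': '231.157'},
]

latest = latest_per_date(sample, '2012-12-31')
print(latest)
```

One difference to be aware of: if two records in a group share the same
realtime_start, max() returns the first of them while sort() plus [-1]
returns the last, so the two versions can disagree on ties.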