PyToolz, Pandas, Dask .groupby()

toolz.itertoolz.groupby does this succinctly without any new/magical/surprising syntax.
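For readers who would rather not add a dependency, here is a minimal stdlib-only sketch of the same idea. `group_by` is a hypothetical helper written for this example, not toolz's code. It mimics the behaviour described in the toolz docstring (a non-callable key is looked up on each item, here via `operator.itemgetter`) but uses a plain `defaultdict(list)`. The toolz source quoted in this message instead stores each list's bound `append` method and recovers the list through `__self__`, presumably to save the `.append` attribute lookup on every item.

```python
from collections import defaultdict
from operator import itemgetter


def group_by(key, seq):
    """Group the items of seq into a dict of lists keyed by key(item).

    Hypothetical stand-in for toolz.itertoolz.groupby: a non-callable
    key is treated as an index or field name and fetched with itemgetter.
    """
    if not callable(key):
        key = itemgetter(key)  # e.g. 1 -> item[1], 'gender' -> item['gender']
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)  # hand back a plain dict rather than a defaultdict


student_school_list = [('Fred', 'SchoolA'), ('Bob', 'SchoolB'),
                       ('Mary', 'SchoolA'), ('Jane', 'SchoolB'),
                       ('Nancy', 'SchoolC')]

# Group the (student, school) tuples by their second element.
by_school = group_by(1, student_school_list)

# Keep only the student names, as in the original question.
student_by_school = {school: [name for name, _ in pairs]
                     for school, pairs in by_school.items()}
print(student_by_school)
# {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
```

With toolz installed, `toolz.groupby(1, student_school_list)` should give the same grouping of tuples as `group_by(1, student_school_list)` above.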
https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.groupby

From https://github.com/pytoolz/toolz/blob/master/toolz/itertoolz.py :

    def groupby(key, seq):
        """ Group a collection by a key function

        >>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
        >>> groupby(len, names)  # doctest: +SKIP
        {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}

        >>> iseven = lambda x: x % 2 == 0
        >>> groupby(iseven, [1, 2, 3, 4, 5, 6, 7, 8])  # doctest: +SKIP
        {False: [1, 3, 5, 7], True: [2, 4, 6, 8]}

        Non-callable keys imply grouping on a member.

        >>> groupby('gender', [{'name': 'Alice', 'gender': 'F'},
        ...                    {'name': 'Bob', 'gender': 'M'},
        ...                    {'name': 'Charlie', 'gender': 'M'}]) # doctest:+SKIP
        {'F': [{'gender': 'F', 'name': 'Alice'}],
         'M': [{'gender': 'M', 'name': 'Bob'},
               {'gender': 'M', 'name': 'Charlie'}]}

        See Also:
            countby
        """
        if not callable(key):
            key = getter(key)
        d = collections.defaultdict(lambda: [].append)
        for item in seq:
            d[key(item)](item)
        rv = {}
        for k, v in iteritems(d):
            rv[k] = v.__self__
        return rv

If you're willing to install Pandas (and NumPy, and ...), there's pandas.DataFrame.groupby:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/core/generic.py#L6586-L6659

Dask has a different groupby implementation:

https://gist.github.com/darribas/41940dfe7bf4f987eeaa#file-pandas_dask_test-ipynb
https://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.groupby

On Thursday, June 28, 2018, Chris Barker via Python-ideas <python-ideas@python.org> wrote:

> On Thu, Jun 28, 2018 at 8:25 AM, Nicolas Rolin <nicolas.ro...@tiime.fr>
> wrote:
>>
>> I use list and dict comprehension a lot, and a problem I often have is to
>> do the equivalent of a group_by operation (to use sql terminology).
>>
>
> I don't know from SQL, so "group by" doesn't mean anything to me, but this:
>
>> For example if I have a list of tuples (student, school) and I want to
>> have the list of students by school the only option I'm left with is to
>> write
>>
>> student_by_school = defaultdict(list)
>> for student, school in student_school_list:
>>     student_by_school[school].append(student)
>>
>
> seems to me that the issue here is that there is not way to have a
> "defaultdict comprehension"
>
> I can't think of syntactically clean way to make that possible, though.
>
> Could itertools.groupby help here? It seems to work, but boy! it's ugly:
>
> In [45]: student_school_list
> Out[45]:
> [('Fred', 'SchoolA'),
>  ('Bob', 'SchoolB'),
>  ('Mary', 'SchoolA'),
>  ('Jane', 'SchoolB'),
>  ('Nancy', 'SchoolC')]
>
> In [46]: {a:[t[0] for t in b] for a,b in groupby(sorted(student_school_list, key=lambda t: t[1]), key=lambda t: t[1])}
> Out[46]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
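The two approaches from the quoted thread can be compared side by side. This is a sketch using only the standard library; `school_of` is a name introduced here for readability. The key point is that itertools.groupby only merges consecutive items with equal keys, so it needs a sort on the same key first (O(n log n)), whereas the defaultdict loop makes a single pass.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

student_school_list = [('Fred', 'SchoolA'), ('Bob', 'SchoolB'),
                       ('Mary', 'SchoolA'), ('Jane', 'SchoolB'),
                       ('Nancy', 'SchoolC')]

# 1. The explicit defaultdict loop from the original question: one pass, no sort.
student_by_school = defaultdict(list)
for student, school in student_school_list:
    student_by_school[school].append(student)

# 2. itertools.groupby: sort by the grouping key first, because groupby
#    starts a new group every time the key changes between neighbours.
school_of = itemgetter(1)
grouped = {
    school: [student for student, _ in pairs]
    for school, pairs in groupby(sorted(student_school_list, key=school_of),
                                 key=school_of)
}

print(grouped)
# {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}

# Without the sort, the same school shows up in several separate runs:
print([school for school, _ in groupby(student_school_list, key=school_of)])
# ['SchoolA', 'SchoolB', 'SchoolA', 'SchoolB', 'SchoolC']
```

The last line shows why the sort cannot simply be dropped from the one-liner; the defaultdict version has no such requirement.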
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/