On Thu, Jun 28, 2018 at 1:34 PM, David Mertz <me...@gnosis.cx> wrote:
> I'd add one more option. You want something that behaves like SQL. Right
> in the standard library is sqlite3, and you can create an in-memory DB to
> hold the data you expect to group.
>

There are also packages designed to make DB-style queries easier.

Here's one I found with a quick google.

-CHB

> On Thu, Jun 28, 2018, 3:48 PM Wes Turner <wes.tur...@gmail.com> wrote:
>
>> PyToolz, Pandas, Dask .groupby()
>>
>> toolz.itertoolz.groupby does this succinctly without any
>> new/magical/surprising syntax.
>>
>> https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.groupby
>>
>> From https://github.com/pytoolz/toolz/blob/master/toolz/itertoolz.py :
>>
>> """
>> def groupby(key, seq):
>>     """ Group a collection by a key function
>>
>>     >>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
>>     >>> groupby(len, names)  # doctest: +SKIP
>>     {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}
>>
>>     >>> iseven = lambda x: x % 2 == 0
>>     >>> groupby(iseven, [1, 2, 3, 4, 5, 6, 7, 8])  # doctest: +SKIP
>>     {False: [1, 3, 5, 7], True: [2, 4, 6, 8]}
>>
>>     Non-callable keys imply grouping on a member.
>>
>>     >>> groupby('gender', [{'name': 'Alice', 'gender': 'F'},
>>     ...                    {'name': 'Bob', 'gender': 'M'},
>>     ...                    {'name': 'Charlie', 'gender': 'M'}])  # doctest:+SKIP
>>     {'F': [{'gender': 'F', 'name': 'Alice'}],
>>      'M': [{'gender': 'M', 'name': 'Bob'},
>>            {'gender': 'M', 'name': 'Charlie'}]}
>>
>>     See Also:
>>         countby
>>     """
>>     if not callable(key):
>>         key = getter(key)
>>     d = collections.defaultdict(lambda: [].append)
>>     for item in seq:
>>         d[key(item)](item)
>>     rv = {}
>>     for k, v in iteritems(d):
>>         rv[k] = v.__self__
>>     return rv
>> """
>>
>> If you're willing to install Pandas (and NumPy, and ...), there's
>> pandas.DataFrame.groupby:
>>
>> https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
>>
>> https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/core/generic.py#L6586-L6659
>>
>>
>> Dask has a different groupby implementation:
>> https://gist.github.com/darribas/41940dfe7bf4f987eeaa#file-pandas_dask_test-ipynb
>>
>> https://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.groupby
>>
>>
>> On Thursday, June 28, 2018, Chris Barker via Python-ideas <
>> python-ideas@python.org> wrote:
>>
>>> On Thu, Jun 28, 2018 at 8:25 AM, Nicolas Rolin <nicolas.ro...@tiime.fr>
>>> wrote:
>>>>
>>>> I use list and dict comprehension a lot, and a problem I often have is
>>>> to do the equivalent of a group_by operation (to use sql terminology).
>>>>
>>>
>>> I don't know from SQL, so "group by" doesn't mean anything to me, but
>>> this:
>>>
>>>
>>>> For example if I have a list of tuples (student, school) and I want to
>>>> have the list of students by school the only option I'm left with is to
>>>> write
>>>>
>>>> student_by_school = defaultdict(list)
>>>> for student, school in student_school_list:
>>>>     student_by_school[school].append(student)
>>>>
>>>
>>> seems to me that the issue here is that there is no way to have a
>>> "defaultdict comprehension"
>>>
>>> I can't think of a syntactically clean way to make that possible, though.
>>>
>>> Could itertools.groupby help here? It seems to work, but boy! it's ugly:
>>>
>>> In [45]: student_school_list
>>> Out[45]:
>>> [('Fred', 'SchoolA'),
>>>  ('Bob', 'SchoolB'),
>>>  ('Mary', 'SchoolA'),
>>>  ('Jane', 'SchoolB'),
>>>  ('Nancy', 'SchoolC')]
>>>
>>> In [46]: {a: [t[0] for t in b] for a, b in groupby(
>>>     ...:      sorted(student_school_list, key=lambda t: t[1]),
>>>     ...:      key=lambda t: t[1])}
>>> Out[46]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'],
>>> 'SchoolC': ['Nancy']}
>>>
>>>
>>> -CHB
>>>
>>

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
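
For reference, a minimal sketch of the sqlite3 suggestion quoted at the top of this message, using the same student/school data as the itertools.groupby example. The table name "enrollment" and its column names are made up for illustration and do not come from anyone in the thread. It is only one way the idea might look, not a recommended implementation.

```python
import sqlite3

student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# ":memory:" creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, chosen for this example only.
conn.execute("CREATE TABLE enrollment (student TEXT, school TEXT)")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", student_school_list)

# An actual SQL GROUP BY: one row per school with an aggregate.
counts = conn.execute(
    "SELECT school, COUNT(*) FROM enrollment GROUP BY school ORDER BY school"
).fetchall()

# To get the list of students per school, let SQL do the ordering
# (rowid preserves insertion order here) and collect the rows in Python.
student_by_school = {}
rows = conn.execute(
    "SELECT school, student FROM enrollment ORDER BY school, rowid"
)
for school, student in rows:
    student_by_school.setdefault(school, []).append(student)

conn.close()

print(counts)
print(student_by_school)
```

Note that the second query still needs a small loop on the Python side, so for this particular task it is not obviously shorter than the defaultdict version quoted above. It may pay off more when several different groupings or aggregates are needed over the same data.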
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/