Re: [Python-ideas] Fwd: grouping / dict of lists
On Wed, Jul 4, 2018 at 6:34 AM, David Mertz wrote:

> You've misunderstood part of the discussion. There are two different
> signatures being discussed/proposed for a grouping() function.
>
> The one you show we might call grouping_michael(). The alternate API we
> might call grouping_chris(). These two calls will produce the same result
> (the first output you show)
>
>     grouping_michael(words, keyfunc=len)
>     grouping_chris((len(word), word) for word in words)
>
> I happen to prefer grouping_michael(), but recognize they each make
> slightly different things obvious.

I started thinking grouping_chris was the obvious and natural thing to do, but this discussion has made it clear that grouping_michael is more natural for some kinds of data.

And in some cases, it really comes down to taste. After all, who's to say which of these is "better":

    map(func, iterable)

or

    (expression for item in iterable)

Given that map existed in Python when comprehensions were added, I tend to see the latter as more "Pythonic", but that's just me.

So I'm currently lobbying for both :-)

The default is an iterable of (key, value) pairs, but the user can specify a key function if they want to do it that way. While a bit of a schizophrenic API, it makes sense (to me), because grouping_michael isn't useful with a default key function anyway.

The other enhancement I suggest is that an (optional) value function be added, as there are use cases where that would be really helpful.

-CHB

NOAA/NOS/OR (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
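A minimal sketch of the combined API described in the message above, assuming a single hypothetical `grouping()` whose default input is (key, value) pairs, with optional `key` and `value` functions. The parameter names, and the choice to apply `value` to whatever would otherwise be stored, are illustrative assumptions rather than an agreed design.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Optional


def grouping(
    iterable: Iterable[Any],
    key: Optional[Callable[[Any], Hashable]] = None,
    value: Optional[Callable[[Any], Any]] = None,
) -> Dict[Hashable, List[Any]]:
    """Hypothetical combined grouping() API (sketch only).

    - key is None: each item must be a (key, value) pair ("grouping_chris" style).
    - key given:   each whole item is grouped under key(item) ("grouping_michael" style).
    - value given: applied to the value before it is appended to its group.
    """
    groups: Dict[Hashable, List[Any]] = {}
    for item in iterable:
        if key is None:
            k, v = item          # unpack a (key, value) pair
        else:
            k, v = key(item), item
        if value is not None:
            v = value(v)
        groups.setdefault(k, []).append(v)
    return groups


words = "a bb ccc d ee fff".split()
print(grouping((len(w), w) for w in words))
# {1: ['a', 'd'], 2: ['bb', 'ee'], 3: ['ccc', 'fff']}
print(grouping(words, key=len, value=str.upper))
# {1: ['A', 'D'], 2: ['BB', 'EE'], 3: ['CCC', 'FFF']}
```

Note that with `key=None` every item must unpack into exactly two values.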
Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)
On Tue, Jul 3, 2018 at 6:23 AM, David Mertz wrote:

> Guido said he has mooted this discussion
> ...
> But before putting it on auto-archive, the BDFL said (1) NO GO on getting
> a new builtin; (2) NO OBJECTION to putting it in itertools.

I don't recall him offering an opinion on a class in collections, did he?

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

chris.bar...@noaa.gov
Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)
On Wed, Jul 4, 2018 at 3:53 AM, INADA Naoki wrote:

> But if it happens, I'm -1 on functools and collections.
> They are used very much. Every Python tool import them regardless how
> much of their contents are used.

Really? collections? What for? I'm guessing namedtuple and maybe deque.

But collections already has 9 classes (well, things) in it, so we'd be adding roughly 10% more to it. What is the concern? Import time, memory? In either case, it seems like the wrong driver for deciding where to put new things.

> If you really want to add it in collections, I suggests from
> collections.groupdict import GroupDict.

Perhaps the stdlib should have deeper namespaces in general -- if that is established as a policy, then this could be the first thing to follow that policy. But I thought "flat is better than nested" -- sigh.

So maybe we need to bite the bullet and solve the problem at another level:

1) If, say, namedtuple has gotten very popular, maybe it should move to builtins.

2) Whatever happened to the proposals to make it easier to lazy-load stuff in modules? If that gets implemented, then we can speed up startup in general, and not have to be too worried about adding "too much" to a module because one thing in it is in common use.

-CHB
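Python 3.7 (PEP 562) added one mechanism in this direction: a module may define a module-level `__getattr__` that is called only when ordinary attribute lookup fails, so rarely used contents can be imported on first access. The thread does not say whether this is the proposal being asked about. The sketch below builds a hypothetical module `lazy_demo` at runtime so it stays self-contained; the `_LAZY_ATTRS` table and the `_lazy_getattr` name are illustrative assumptions.

```python
import importlib
import sys
import types

# Hypothetical module built at runtime to keep the sketch self-contained.
# In a real package, _lazy_getattr would be a top-level function named
# __getattr__ in the module's source file (PEP 562, Python 3.7+).
lazy_demo = types.ModuleType("lazy_demo")

# Public name -> (module that provides it, attribute name in that module).
_LAZY_ATTRS = {
    "deque": ("collections", "deque"),
    "dumps": ("json", "dumps"),
}


def _lazy_getattr(name):
    """Called only when ordinary attribute lookup on lazy_demo fails."""
    try:
        module_name, attr = _LAZY_ATTRS[name]
    except KeyError:
        raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), attr)
    setattr(lazy_demo, name, value)  # cache it so later lookups skip this hook
    return value


lazy_demo.__getattr__ = _lazy_getattr
sys.modules["lazy_demo"] = lazy_demo  # lets `import lazy_demo` find it

print("dumps" in vars(lazy_demo))  # False: nothing resolved yet
print(lazy_demo.dumps({"a": 1}))   # {"a": 1}
print("dumps" in vars(lazy_demo))  # True: cached after first access
```

The standard library also offers `importlib.util.LazyLoader`, which defers executing a whole module until first attribute access; the per-attribute hook above is a finer-grained option.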
Re: [Python-ideas] Add a __cite__ method for scientific packages
typeshed, dotted lookup, ScholarlyArticle semantic graphs with classes, properties, and URIs

Would external metadata (similar to how typeshed is defined in a 'shadow naming scheme' (?)) be advantageous for dotted-name lookup of citation metadata?

> Typeshed contains external type annotations for the Python standard library and Python builtins, as well as third party packages.
>
> This data can e.g. be used for static analysis, type checking or type inference.

https://github.com/python/typeshed

    stdlib/{2, 2and3, 3, 3.5, 3.6, 3.7}
    third_party/{2, 2and3, 3}/{jinja2,}

Ideally, a ScholarlyArticle can also be published as HTML with RDFa and/or JSONLD (in addition to two-column LaTeX/PDF, which is lossy with regard to structured data / linked data), with its own document-level metadata simply as part of a graph of resources (such as schema:citation and schema:Dataset) described using a search-indexed vocabulary such as the Schema.org RDFS vocabulary.

An aside: https://schema.org/unitCode has a range of {Text, URL}, where the Text should be a 3-character UN/CEFACT Common Code; but there's also QUDT for unit URIs. Fortunately, RDF allows repeated property values, so we can just add both.

On Wednesday, July 4, 2018, Wes Turner wrote:

> ... a schema:Dataset may be part of a Creative work.
>
> https://schema.org/Dataset
> https://schema.org/isPartOf
> https://schema.org/ScholarlyArticle
>
> #LinkedReproducibility #nbmeta
Re: [Python-ideas] Add a __cite__ method for scientific packages
... a schema:Dataset may be part of a Creative work.

https://schema.org/Dataset
https://schema.org/isPartOf
https://schema.org/ScholarlyArticle

#LinkedReproducibility #nbmeta

On Wednesday, July 4, 2018, Wes Turner wrote:

> https://schema.org/CreativeWork
> https://schema.org/Code
> https://schema.org/SoftwareApplication
Re: [Python-ideas] Add a __cite__ method for scientific packages
https://schema.org/CreativeWork
https://schema.org/Code
https://schema.org/SoftwareApplication

CreativeWork has a https://schema.org/citation field with a range of {CreativeWork, Text}.

There's also a https://schema.org/funder attribute with a domain of CreativeWork and a range of {Organization, Person}.

- BibTeX is actually somewhat ill-specified, TBH.
- There is a repository of CSL styles at https://citationstyles.org .
- CSL is sponsored by both Zotero and Mendeley.
- A number of search engines support schema.org (and JSONLD).
- The schema.org RDFS vocabulary is designed to describe a graph of resources (CreativeWork, Code, SoftwareApplication, ScholarlyArticle, MedicalScholarlyArticle).

    # A module could declare a list of citation records...
    __citation__ = [{}, ]
    # ...or a single JSON-LD-style record:
    __citation__ = {
        '@type': ['schema:ScholarlyArticle'],
        'schema:name': '',
        'schema:author': [{
            '@type': 'schema:Person',
            '...': '...'}]
    }

JSONLD is ideal for describing a graph of resources with varied types.

If the overhead of __citation__ for every import is unjustified, a lookup of methods with dotted names that finds entries for root modules as well would be great:

    >>> citations('json.loads')
    >>> citations('list.sort')

A tracing debugger could look up each and every package, module, function, and method each ScholarlyArticle SoftwareApplication executes (from a registry in e.g. a _citations_.py or a _citations_.jsonld.json).

It'd be a shame to need to manually format citations for a particular journal's CSL bibliographic metadata template preference.

sphinxcontrib-bibtex is a Sphinx extension for BibTeX support (with a bibliography directive and a cite role)
- Src: https://github.com/mcmtroffaes/sphinxcontrib-bibtex

Jupyter notebooks support document-level metadata (in JSON that's currently only similar to schema.org JSONLD).

https://schema.org/ScholarlyArticle is search engine indexable.

On Wednesday, July 4, 2018, Alexander Belopolsky <alexander.belopol...@gmail.com> wrote:

> On Sun, Jul 1, 2018 at 9:45 AM David Mertz wrote:
>
> > ..
> > There's absolutely nothing in the idea that requires a change in Python,
> > and Python developers or users are not, as such, the relevant experts.
>
> This is not entirely true. If some variant of __citation__ is endorsed by
> the community, I would expect that pydoc would extract this information to
> fill an appropriate section in the documentation page. Note that pydoc
> already treats a number of dunder variables specially: '__author__',
> '__credits__', and '__version__' are a few that come to mind, so I don't
> think the threshold for adding one more should be too high. On the other
> hand, maybe '__author__', '__credits__', and '__citation__' should be
> merged in one structured variable (a dict?) with format designed with some
> extendability in mind.
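The dotted-name lookup suggested above could work roughly as follows. This is a minimal sketch assuming a hypothetical in-memory registry `_CITATIONS` (standing in for something like the _citations_.jsonld.json file mentioned above) and a hypothetical `citations()` helper; the records are placeholders, not real citation data. Looking up a name returns its own entries followed by those of each enclosing prefix, so citing a function also cites its root module.

```python
from typing import Dict, List

# Hypothetical registry keyed by dotted names; values are lists of
# JSON-LD-style records. Placeholder data only.
_CITATIONS: Dict[str, List[dict]] = {
    "json": [{"@type": "schema:SoftwareApplication", "schema:name": "json (placeholder)"}],
    "json.loads": [{"@type": "schema:ScholarlyArticle", "schema:name": "placeholder article"}],
}


def citations(dotted_name: str, registry: Dict[str, List[dict]] = _CITATIONS) -> List[dict]:
    """Collect records for dotted_name and for every enclosing prefix.

    'json.loads' yields the entries for 'json.loads' followed by those for 'json'.
    """
    parts = dotted_name.split(".")
    found: List[dict] = []
    # Walk from the most specific name back to the root module.
    for i in range(len(parts), 0, -1):
        found.extend(registry.get(".".join(parts[:i]), []))
    return found


print([r["schema:name"] for r in citations("json.loads")])
# ['placeholder article', 'json (placeholder)']
print(citations("list.sort"))  # [] (nothing registered)
```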
Re: [Python-ideas] Add a __cite__ method for scientific packages
On Sun, Jul 1, 2018 at 9:45 AM David Mertz wrote:

> ..
> There's absolutely nothing in the idea that requires a change in Python,
> and Python developers or users are not, as such, the relevant experts.

This is not entirely true. If some variant of __citation__ is endorsed by the community, I would expect that pydoc would extract this information to fill an appropriate section in the documentation page. Note that pydoc already treats a number of dunder variables specially: '__author__', '__credits__', and '__version__' are a few that come to mind, so I don't think the threshold for adding one more should be too high. On the other hand, maybe '__author__', '__credits__', and '__citation__' should be merged into one structured variable (a dict?) with a format designed with some extensibility in mind.
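One way to picture the merged structured variable floated above is a small helper that folds the dunders pydoc already knows about, plus the proposed `__citation__`, into a single dict. The helper name `merged_metadata` and the key naming (surrounding underscores stripped) are assumptions for illustration only, not an existing pydoc feature.

```python
import types
from typing import Any, Dict

# Dunders pydoc already renders, plus the proposed __citation__.
_FIELDS = ("__author__", "__credits__", "__version__", "__citation__")


def merged_metadata(module: types.ModuleType) -> Dict[str, Any]:
    """Fold separate module dunders into one dict (hypothetical helper).

    Keys drop the surrounding underscores, e.g. '__author__' -> 'author';
    fields the module does not define are omitted.
    """
    namespace = vars(module)
    return {name.strip("_"): namespace[name] for name in _FIELDS if name in namespace}


demo = types.ModuleType("demo")
demo.__author__ = "A. Author"
demo.__citation__ = {"@type": "schema:ScholarlyArticle", "schema:name": "Example"}
print(merged_metadata(demo))
```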
Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)
On Wed, Jul 4, 2018, 3:11 AM Ivan Levkivskyi wrote:

> Replying to the question in subject, I think it would be better in
> collections as a class.
> Having it just as a function doesn't buy much, because one can do the
> same with three lines and a defaultdict.

Four lines. For some use cases, you'll need to convert from defaultdict back to a basic dict to avoid mistaken inserts.

> However, if this is a class it can support adding new elements, merge the
> groupeddicts, etc.
>
> --
> Ivan
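The "four lines" point above can be sketched as follows: three lines with a defaultdict, plus a final conversion back to a plain dict. The function name `grouping` is illustrative. The second half shows the mistaken insert the conversion guards against: looking up an absent key in a defaultdict silently creates an empty group.

```python
from collections import defaultdict


def grouping(iterable, key):
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    return dict(groups)  # plain dict: missing keys raise KeyError again


words = "a bb ccc d ee fff".split()
groups = grouping(words, key=len)
print(groups)  # {1: ['a', 'd'], 2: ['bb', 'ee'], 3: ['ccc', 'fff']}

# Why the conversion matters: a defaultdict inserts on lookup.
raw = defaultdict(list)
for w in words:
    raw[len(w)].append(w)
_ = raw[4]          # a lookup of an absent key...
print(4 in raw)     # True: ...created an empty group as a side effect
print(4 in groups)  # False, and groups[4] would raise KeyError
```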
Re: [Python-ideas] Fwd: grouping / dict of lists
There are some cases when that's the correct behavior. It mimics pandas.DataFrame.groupby. For example, what if you have a sequence of (key, v1, v2) triples? Grouping by key and keeping the triples intact is sometimes the right choice.

On Wed, Jul 4, 2018, 6:39 AM David Mertz wrote:

> Steven:
>
> You've misunderstood part of the discussion. There are two different
> signatures being discussed/proposed for a grouping() function.
>
> The one you show we might call grouping_michael(). The alternate API we
> might call grouping_chris(). These two calls will produce the same result
> (the first output you show)
>
>     grouping_michael(words, keyfunc=len)
>     grouping_chris((len(word), word) for word in words)
>
> I happen to prefer grouping_michael(), but recognize they each make
> slightly different things obvious. Absolutely no one wants the behavior in
> your second output.
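The (key, v1, v2) case above can be sketched with a key-function style grouping that keeps each triple intact, loosely analogous to how a pandas group keeps whole rows. The `grouping` helper and the sample `rows` are illustrative assumptions. For contrast, the pairs style makes the caller choose what to store as the value.

```python
from operator import itemgetter


def grouping(iterable, key):
    """Key-function style: group whole items under key(item)."""
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
    return groups


rows = [("a", 1, 10), ("b", 2, 20), ("a", 3, 30)]

# Keep each triple intact, key included.
by_key = grouping(rows, key=itemgetter(0))
print(by_key)  # {'a': [('a', 1, 10), ('a', 3, 30)], 'b': [('b', 2, 20)]}

# Pairs style: the caller splits each row into (key, value) up front.
pairs = {}
for k, *rest in rows:
    pairs.setdefault(k, []).append(tuple(rest))
print(pairs)   # {'a': [(1, 10), (3, 30)], 'b': [(2, 20)]}
```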
[Python-ideas] Fwd: grouping / dict of lists
Steven:

You've misunderstood part of the discussion. There are two different signatures being discussed/proposed for a grouping() function.

The one you show we might call grouping_michael(). The alternate API we might call grouping_chris(). These two calls will produce the same result (the first output you show):

    grouping_michael(words, keyfunc=len)
    grouping_chris((len(word), word) for word in words)

I happen to prefer grouping_michael(), but recognize they each make slightly different things obvious. Absolutely no one wants the behavior in your second output.

On Tue, Jul 3, 2018, 9:32 PM Steven D'Aprano wrote:

> Of course you can prepare the sequence any way you like, but these are
> not equivalent:
>
>     grouping(words, keyfunc=len)
>     grouping((len(word), word) for word in words)
>
> The first groups words by their length; the second groups pairs of
> (length, word) tuples by equality.
>
> py> grouping("a bb ccc d ee fff".split(), keyfunc=len)
> {1: ['a', 'd'], 2: ['bb', 'ee'], 3: ['ccc', 'fff']}
>
> py> grouping((len(w), w) for w in "a bb ccc d ee fff".split())
> {(3, 'ccc'): [(3, 'ccc')], (1, 'd'): [(1, 'd')], (2, 'ee'): [(2, 'ee')],
> (3, 'fff'): [(3, 'fff')], (1, 'a'): [(1, 'a')], (2, 'bb'): [(2, '
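A minimal sketch of the two signatures named above, showing that the two calls give the same result. The names come from the thread's informal labels; the implementations are assumptions, not the proposed code.

```python
def grouping_michael(iterable, keyfunc):
    """Group each item under keyfunc(item)."""
    groups = {}
    for item in iterable:
        groups.setdefault(keyfunc(item), []).append(item)
    return groups


def grouping_chris(pairs):
    """Group an iterable of (key, value) pairs; values sharing a key share a list."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


words = "a bb ccc d ee fff".split()
by_keyfunc = grouping_michael(words, keyfunc=len)
by_pairs = grouping_chris((len(word), word) for word in words)
print(by_keyfunc)              # {1: ['a', 'd'], 2: ['bb', 'ee'], 3: ['ccc', 'fff']}
print(by_keyfunc == by_pairs)  # True
```

The second output in Steven's message corresponds to using the whole (length, word) tuple as the key, which neither signature does here.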
Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)
I'm -1 on adding it to the stdlib.

But if it happens, I'm -1 on functools and collections. They are used very much. Every Python tool imports them, regardless of how much of their contents is used.

On the other hand, itertools contains random stuff that is very rarely used.

If you really want to add it in collections, I suggest `from collections.groupdict import GroupDict`.

Regards,

On Tue, Jul 3, 2018 at 10:23 PM David Mertz wrote:

> Guido said he has mooted this discussion, so it's probably not reaching
> him. It took one thousand fewer messages for him to stop following this
> than with PEP 572, for some reason :-).
>
> But before putting it on auto-archive, the BDFL said (1) NO GO on getting
> a new builtin; (2) NO OBJECTION to putting it in itertools.
>
> My problem with the second idea is that *I* find it very wrong to have
> something in itertools that does not return an iterator. It wrecks the
> combinatorial algebra of the module.
>
> That said, it's easy to fix... and I believe independently useful. Just
> make grouping() a generator function rather than a plain function. This
> lets us get an incremental grouping of an iterable. This can be useful if
> the iterable is slow or infinite, but the partial groupings are useful in
> themselves.
>
>     Python 3.7.0 (default, Jun 28 2018, 07:39:16)
>     [Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> from grouping import grouping
>     >>> grouped = grouping('AbBa', key=str.casefold)
>     >>> for dct in grouped: print(dct)
>     ...
>     {'a': ['A']}
>     {'a': ['A'], 'b': ['b']}
>     {'a': ['A'], 'b': ['b', 'B']}
>     {'a': ['A', 'a'], 'b': ['b', 'B']}
>
> This isn't so useful for the concrete sequence, but for this it would be
> great:
>
>     for grouped in grouping(data_over_wire()):
>         process_partial_groups(grouped)
>
> The implementation need not and should not rely on "pre-grouping" with
> itertools.groupby:
>
>     def grouping(iterable, key=None):
>         groups = {}
>         key = key or (lambda x: x)
>         for item in iterable:
>             groups.setdefault(key(item), []).append(item)
>             yield groups

--
INADA Naoki
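One property of the generator version quoted above is worth spelling out: it yields the same dict object after every item, so the printed output is only progressive because each dict is printed before the next item is consumed. Collecting the results with `list()` gives repeated references to the final state. A minimal sketch, assuming a hypothetical `grouping_snapshots` wrapper that yields independent copies instead (at the cost of copying all groups at every step, so quadratic work overall):

```python
def grouping(iterable, key=None):
    # As in the quoted message: yields the *same* dict after each item.
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups


def grouping_snapshots(iterable, key=None):
    """Hypothetical variant: yield an independent copy after each item."""
    for groups in grouping(iterable, key):
        yield {k: list(v) for k, v in groups.items()}


live = list(grouping('AbBa', key=str.casefold))
snaps = list(grouping_snapshots('AbBa', key=str.casefold))

print(live[0] is live[-1])  # True: four references to one dict
print(live[0])              # {'a': ['A', 'a'], 'b': ['b', 'B']}
print(snaps[0])             # {'a': ['A']}
```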
Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)
On 4 July 2018 at 11:25, Steven D'Aprano wrote:

> On Wed, Jul 04, 2018 at 11:08:05AM +0100, Ivan Levkivskyi wrote:
> > Replying to the question in subject, I think it would be better in
> > collections as a class.
> > Having it just as a function doesn't buy much, because one can do the same
> > with three lines and a defaultdict.
> > However, if this is a class it can support adding new elements, merge the
> > groupeddicts, etc.
>
> defaultdicts support adding new elements, and they have an update method
> same as regular dicts :-)

Except that updating will not do what I want. Merging two groupeddicts is not just `one.update(other)`. Moreover, using just an update with regular dicts will do something bug-prone: it will add every group from `other` as an element to the corresponding group in `one`.

--
Ivan
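A minimal sketch of merge semantics for groupings, assuming a hypothetical `merge_groups` helper: for shared keys the lists are concatenated. A plain `dict.update` instead replaces the existing list for each shared key, so the items already in that group are lost.

```python
def merge_groups(one, other):
    """Return a new grouping whose lists are concatenated for shared keys."""
    merged = {k: list(v) for k, v in one.items()}  # copy so `one` is untouched
    for k, v in other.items():
        merged.setdefault(k, []).extend(v)
    return merged


one = {1: ['a'], 2: ['bb']}
other = {1: ['d'], 3: ['ccc']}

print(merge_groups(one, other))  # {1: ['a', 'd'], 2: ['bb'], 3: ['ccc']}

naive = dict(one)
naive.update(other)
print(naive)                     # {1: ['d'], 2: ['bb'], 3: ['ccc']}: group 1 lost 'a'
```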
Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)
On Wed, Jul 04, 2018 at 11:08:05AM +0100, Ivan Levkivskyi wrote:

> Replying to the question in subject, I think it would be better in
> collections as a class.
> Having it just as a function doesn't buy much, because one can do the same
> with three lines and a defaultdict.
> However, if this is a class it can support adding new elements, merge the
> groupeddicts, etc.

defaultdicts support adding new elements, and they have an update method same as regular dicts :-)

--
Steve
Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)
Replying to the question in the subject, I think it would be better in collections as a class.

Having it just as a function doesn't buy much, because one can do the same with three lines and a defaultdict. However, if this is a class, it can support adding new elements, merging groupeddicts, etc.

--
Ivan
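A minimal sketch of what such a class could look like, assuming a hypothetical `GroupedDict` dict subclass with `add()` for new elements and `merge()` for combining groupings. The names and the choice to merge in place are illustrative; the sketch also assumes both groupings were built with compatible key functions, since `merge()` only sees the resulting keys.

```python
class GroupedDict(dict):
    """Hypothetical dict subclass mapping each key to a list of grouped items."""

    def __init__(self, iterable=(), key=None):
        super().__init__()
        self.key = key if key is not None else (lambda x: x)
        self.add_all(iterable)

    def add(self, item):
        """Add one new element to the group for key(item)."""
        self.setdefault(self.key(item), []).append(item)

    def add_all(self, iterable):
        for item in iterable:
            self.add(item)

    def merge(self, other):
        """Extend this grouping in place with the groups of another mapping."""
        for k, items in other.items():
            self.setdefault(k, []).extend(items)
        return self


g = GroupedDict("a bb ccc d".split(), key=len)
g.add("ee")
g.merge({3: ["fff"], 4: ["gggg"]})
print(g)  # {1: ['a', 'd'], 2: ['bb', 'ee'], 3: ['ccc', 'fff'], 4: ['gggg']}
```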