Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-04 Thread Wes Turner
typeshed, dotted lookup, ScholarlyArticle semantic graphs with classes,
properties, and URIs

Would external metadata (similar to how typeshed is defined in a 'shadow
naming scheme' (?)) be advantageous for dotted name lookup of citation
metadata?

> Typeshed contains external type annotations for the Python standard
> library and Python builtins, as well as third party packages.
>
> This data can e.g. be used for static analysis, type checking or type
> inference.

https://github.com/python/typeshed
stdlib/{2, 2and3, 3, 3.5, 3.6, 3.7}
third_party/{2, 2and3, 3}/{jinja2,}

Ideally, a ScholarlyArticle can also be published as HTML with RDFa and/or
JSONLD (in addition to two-column LaTeX/PDF, which is lossy with regard to
structured data / linked data), with its own document-level metadata simply
as part of a graph of resources (such as schema:citation and
schema:Dataset) described using a search-indexed vocabulary such as the
Schema.org RDFS vocabulary.

An aside:
https://schema.org/unitCode has a range of {Text, URL} where the Text
should be a 3 character UN/CEFACT Common Code; but there's also QUDT for
unit URIs; fortunately, RDF allows repeated property values, so we can just
add both.
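
A minimal sketch of the "just add both" idea, written as a Python dict that
serializes to JSONLD. The measurement itself is a placeholder, and the QUDT
URI shown for metre is an assumption about how that vocabulary names the
unit; only the repeated unitCode value is the point here.

```python
import json

# Hypothetical measurement; the value is a placeholder.
length = {
    "@context": "https://schema.org/",
    "@type": "QuantitativeValue",
    "value": 12.5,
    # A JSON-LD property may carry several values, so both forms can coexist.
    "unitCode": [
        "MTR",                           # UN/CEFACT Common Code for metre
        "http://qudt.org/vocab/unit/M",  # QUDT unit URI for metre (assumed)
    ],
}

print(json.dumps(length, indent=2))
```

A consumer that only understands one of the two forms can pick the value it
recognizes and ignore the other.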

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-04 Thread Wes Turner
... a schema:Dataset may be part of a CreativeWork.

https://schema.org/Dataset
https://schema.org/isPartOf
https://schema.org/ScholarlyArticle

#LinkedReproducibility #nbmeta



Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-04 Thread Wes Turner
https://schema.org/CreativeWork
  https://schema.org/Code
  https://schema.org/SoftwareApplication

CreativeWork has a https://schema.org/citation field with a range of
{CreativeWork, Text}

There's also a https://schema.org/funder attribute with a domain of
CreativeWork and a range of {Organization, Person}

- BibTeX is actually somewhat ill-specified, TBH.
- There is a repository of CSL styles at https://citationstyles.org .
- CSL is sponsored by both Zotero and Mendeley.
- A number of search engines support schema.org (and JSONLD)
- The schema.org RDFS vocabulary is designed to describe a graph of
resources (CreativeWork, Code, SoftwareApplication, ScholarlyArticle,
MedicalScholarlyArticle).

__citation__ = [{}, ]
__citation__ = {
  '@type': ['schema:ScholarlyArticle'],
  'schema:name': '',
  'schema:author': [{
    '@type': 'schema:Person',
    '...': '...'}]
}

JSONLD is ideal for describing a graph of resources with varied types.

If the overhead of __citation__ for every import is unjustified,
a lookup of methods with dotted names that finds entries for root modules
as well would be great:

>>> citations('json.loads')
>>> citations('list.sort')

A tracing debugger could look up each and every package, module, function,
and method that each ScholarlyArticle SoftwareApplication executes (from a
registry in e.g. a _citations_.py or a _citations_.jsonld.json).
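
A minimal sketch of such a dotted-name lookup, assuming the registry is a
plain dict keyed by dotted names. The registry contents, the name
CITATION_REGISTRY and the exact behaviour of citations() are hypothetical;
the thread does not define them.

```python
# Hypothetical registry; in the terms used above it could live in a
# _citations_.py file or be loaded from a _citations_.jsonld.json file.
CITATION_REGISTRY = {
    "json": [
        {"@type": "schema:SoftwareApplication", "schema:name": "json (placeholder)"},
    ],
    "json.loads": [
        {"@type": "schema:ScholarlyArticle", "schema:name": "placeholder article"},
    ],
}


def citations(dotted_name, registry=CITATION_REGISTRY):
    """Return entries for dotted_name and each of its parents, most specific first."""
    parts = dotted_name.split(".")
    found = []
    # Walk from the full name ("json.loads") up to the root module ("json").
    for end in range(len(parts), 0, -1):
        key = ".".join(parts[:end])
        found.extend(registry.get(key, []))
    return found


print(citations("json.loads"))
print(citations("list.sort"))
```

Because the lookup falls back to parent names, an entry for a root module is
found even when the function itself has no entry of its own.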

It'd be a shame to need to manually format citations for a particular
Journal's CSL bibliographic metadata template preference.

sphinxcontrib-bibtex is a Sphinx extension for BibTeX support (with a
bibliography directive and a cite role)
- Src: https://github.com/mcmtroffaes/sphinxcontrib-bibtex

Jupyter notebooks support document-level metadata (in JSON that's currently
only similar to schema.org JSONLD).

https://schema.org/ScholarlyArticle is search engine indexable.




Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-04 Thread Alexander Belopolsky
On Sun, Jul 1, 2018 at 9:45 AM David Mertz  wrote:

> ..
> There's absolutely nothing in the idea that requires a change in Python,
> and Python developers or users are not, as such, the relevant experts.
>

This is not entirely true.  If some variant of __citation__ is endorsed by
the community, I would expect that pydoc would extract this information to
fill an appropriate section in the documentation page.  Note that pydoc
already treats a number of dunder variables
specially: '__author__', '__credits__', and '__version__' are a few that
come to mind, so I don't think the threshold for adding one more should be
too high.  On the other hand, maybe '__author__', '__credits__', and
'__citation__' should be merged into one structured variable (a dict?) with
a format designed with some extensibility in mind.
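
A minimal sketch of the merged-variable idea, assuming a dict with one key
per former dunder. The name PACKAGE_INFO, its keys and render_sections() are
all hypothetical, and the output only imitates the upper-case section style
of a pydoc page; it does not call pydoc.

```python
# Hypothetical structured variable; every value is a placeholder.
PACKAGE_INFO = {
    "author": "Jane Doe (placeholder)",
    "credits": ["Placeholder Contributor"],
    "citation": [
        {"@type": "schema:ScholarlyArticle", "schema:name": "Placeholder title"},
    ],
}


def render_sections(meta):
    """Format each known key as an upper-case section with indented entries."""
    lines = []
    for key in ("author", "credits", "citation"):
        if key not in meta:
            continue
        value = meta[key]
        items = value if isinstance(value, list) else [value]
        lines.append(key.upper())
        for item in items:
            if isinstance(item, dict):
                text = item.get("schema:name", str(item))
            else:
                text = str(item)
            lines.append("    " + text)
        lines.append("")
    return "\n".join(lines)


print(render_sections(PACKAGE_INFO))
```

Unknown keys are simply ignored here, which is one way to leave room for
later extension.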


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-01 Thread Alex Walters
> -----Original Message-----
> From: Python-ideas On Behalf Of Nick Timkovich
> Sent: Sunday, July 1, 2018 12:02 PM
> To: Matt Arcidy 
> Cc: python-ideas 
> Subject: Re: [Python-ideas] Add a __cite__ method for scientific packages

> 
> From an abstract level, however, citing code requires that it's published in
> some form, which is very strongly related to packaging, so I think such
> discussions would best revolve around there. Maybe rolling something into
> pkg_resources that could pull out a short citation string from some package
> metadata (a hypothetical `pkg_resources.get_distribution("numpy").citation`
> that could be wrapped by some helper function if desired)? The actual
> mechanism to convert metadata into something in the repo (a dunder cite
> string in the root module, a separate metadata file, etc.) into the package
> metadata isn't as important as rolling said metadata into something part of
> the distribution package like the version or long_description fields. Once the
> schema of the citation data is defined, you could add it to the metadata spec
> (outgrowth of PEP-566)
> https://packaging.python.org/specifications/core-metadata/

Putting citation information into pyproject.toml makes a lot more sense than 
putting it in the modules themselves, where they would have to be introspected 
to be extracted.

* It puts zero burden on the core developers
* It puts near zero burden on the distutils special interest group
* It doesn't consume names from the package namespace
* It's just a TOML file - you can add sections to it willy-nilly
* It's just a TOML file - there are libraries in almost all ecosystems to 
handle it.

Nothing has to go into the core metadata specification unless part of your 
suggestion is that PyPI show the citations.  I don't think that is a good idea 
for the scope of PyPI and the workload of the Warehouse developers.  I don't 
think it's too much to ask for the scientific community to figure out the 
solution that works for most people before bringing it back here.  I also don't 
think it's out of scope to suggest taking this to SciPy - yes, not everything 
depends on SciPy, but you don't need everything, you just need momentum.



Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-01 Thread Nick Timkovich
On Fri, Jun 29, 2018 at 8:58 PM, Matt Arcidy  wrote:

> It seems like the very small percentage of academic users whose careers
> depend on this cannot resolve the political issue of forming a standards
> body.
>
> I don't see how externalizing the standard development will help.  Kudos
> for shortcutting the process in a practical way to just get it done,  but
> this just puts core devs in the middle of silly academic spats.  A language
> endorsed citation method isn't a 'correct' method, and without the broad
> consensus that currently doesn't exist, this becomes _your_ method, a
> picked winner but ultimately a lightning rod for bored tenured professors
> with personal axes to grind.  If this were about implementing an existing
> correct method I'm sure a grad student would be tasked with it for an
> afternoon.
>
>
> [...]  Just create a jstor style git server where obeying the citation
> protocol is mandatory.
>

I don't know if it constitutes a standards body, but there are a couple of
journals out there that are meant to serve as mechanisms for turning a repo
into a published/citable thing. They might be good to look at for prior art
as well as for what metadata should be included:

* https://joss.theoj.org/about (sponsored by NumFOCUS)
* https://www.journals.elsevier.com/softwarex/

From an abstract level, however, citing code requires that it's published
in some form, which is very strongly related to packaging, so I think such
discussions would best revolve around there. Maybe rolling something into
pkg_resources that could pull out a short citation string from some package
metadata (a hypothetical `pkg_resources.get_distribution("numpy").citation`
that could be wrapped by some helper function if desired)? The actual
mechanism to convert something in the repo (a dunder cite string in the
root module, a separate metadata file, etc.) into the package metadata
isn't as important as rolling said metadata into something that is part of
the distribution package, like the version or long_description fields. Once
the schema of the citation data is defined, you could add it to the
metadata spec (outgrowth of PEP-566)
https://packaging.python.org/specifications/core-metadata/
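
A minimal sketch of such a helper, using importlib.metadata rather than
pkg_resources and assuming a hypothetical "Citation" field in the
distribution's METADATA file. No such field exists in the core metadata
specification; the field name and both function names are made up for
illustration.

```python
from email.parser import Parser
from importlib import metadata

# Hypothetical field name; not part of the core metadata specification.
CITATION_FIELD = "Citation"


def citation_from_metadata_text(text):
    """Parse METADATA-style text and return the citation field, or None."""
    return Parser().parsestr(text).get(CITATION_FIELD)


def get_citation(dist_name):
    """Return the citation field of an installed distribution, or None."""
    try:
        md = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return md.get(CITATION_FIELD)


# Placeholder METADATA content for a distribution that does not exist.
SAMPLE = (
    "Metadata-Version: 2.1\n"
    "Name: example-package\n"
    "Version: 1.0\n"
    "Citation: Doe, J. (2018). Example Package (placeholder).\n"
)

print(citation_from_metadata_text(SAMPLE))
```

For any distribution installed today, get_citation() would be expected to
return None, since nothing writes this field.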


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-01 Thread David Mertz
I think a __citation__ *method* is a bad idea. This yells out "attribute"
to me. A function or two that parses those attributes in some manner is a
better idea... And there's no reason that function or two need to be
dunders. There's also no reason they need to be in the standard library...
There might be many citation/writing applications that process the data to
their own needs.
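
A minimal sketch of what one such non-dunder function might look like,
assuming packages expose a plain __citation__ attribute on their top-level
module. The function name and the choice to return the raw attribute values
unparsed are assumptions; what the attribute should contain is exactly the
open question.

```python
import sys
import types


def collect_citations(modules=None):
    """Map module name to its __citation__ attribute, for modules that define one."""
    if modules is None:
        modules = dict(sys.modules)  # copy, since imports can mutate sys.modules
    found = {}
    for name, module in modules.items():
        citation = getattr(module, "__citation__", None)
        if citation is not None:
            found[name] = citation
    return found


# Usage with stand-in modules; the names and the citation text are placeholders.
fake = types.ModuleType("fake_pkg")
fake.__citation__ = "Doe, J. (2018). Fake Package (placeholder)."
plain = types.ModuleType("plain_pkg")

print(collect_citations({"fake_pkg": fake, "plain_pkg": plain}))
```

Nothing here needs to live in the standard library; a citation or writing
tool could ship its own variant and post-process the values as it sees fit.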

But assuming there is an attribute, WHAT goes inside it? Is it a string?
And if so, in what markup format? Is it a dictionary? A list? A custom
class? Does some wrapper function deal with different formats? Does the
wrapper also scan for __author__, __copyright__, and friends?

We also need to decide what __citation__ is an attribute OF. Only modules?
Classes? Methods? Functions? All of the above? If multiple, how are the
attributes at different places synthesized or processed? Can one object
have multiple citations? (E.g. what if a class or method implements multiple
algorithms depending on a switch, or depending on the shape of the data
being processed? The different algorithms might need different citations.)

These are all questions that could have good answers. But I don't know what
the answers are. I've worked in scientific computing for a good while, but
not as an academic. And when I was an academic it wasn't in scientific
computing. This list is not mostly composed of the relevant experts. Those
are the authors and users of SciPy and statsmodels, and scikit-learn, and
xarray, and TensorFlow, and astropy, and so on.

There's absolutely nothing in the idea that requires a change in Python,
and Python developers or users are not, as such, the relevant experts. In
the future, AFTER there is widespread acceptance of what goes on a
__citation__ attribute, it would be easy and obvious to add minimal support
in Python itself for displaying citation content. But this is the wrong
group to mandate what the actual academic needs are here.



Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-07-01 Thread Ivan Levkivskyi
On 28 June 2018 at 01:19, Nathaniel Smith  wrote:

> On Wed, Jun 27, 2018 at 2:20 PM, Andrei Kucharavy
>  wrote:
> > To remediate to that situation, I suggest a __citation__ method
> > associated to each package installation and import. Called from the
> > __main__, __citation__() would scan __citation__ of all imported
> > packages and return the list of all relevant top-level citations
> > associated to the packages.
> >
> > As a scientific package developer working in academia, the problem is
> > quite serious, and the solution seems relatively straightforward.
> >
> > What does Python core team think about addition and long-term
> > maintenance of such a feature to the import and setup mechanisms? What
> > do other users and scientific package developers think of such a
> > mechanism for citations retrieval?
>
> This is indeed a serious problem. I suspect python-ideas isn't the
> best venue for addressing it though – there's nothing here that needs
> changes to the Python interpreter itself (I think), and the people who
> understand this problem the best and who are most affected by it,
> mostly aren't here.
>

I actually think the opposite. If this is not fixed in a PEP it will stay
in the current state.
Writing a PEP (and officially accepting it) for this purpose will give a
signal that it is a standard practice.


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-30 Thread Nick Coghlan
On 29 June 2018 at 12:14, Nathaniel Smith  wrote:
> On Thu, Jun 28, 2018 at 2:25 PM, Andrei Kucharavy
>  wrote:
>> As for the list, reserving a __citation__/__cite__ for packages at the same
>> level as __version__ is now reserved and adding a citation()/cite() function
>> to the standard library seemed large enough modifications to warrant
>> searching a buy-in from the maintainers and the community at large.
>
> There isn't actually any formal method for registering special names
> like __version__, and they aren't treated specially by the language.
> They're just variables that happen to have a funny name. You shouldn't
> start using them willy-nilly, but you don't actually have to ask
> permission or anything.

The one caveat on dunder names is that we expressly exempt them from
our usual backwards compatibility guarantees, so it's worth getting
some level of "No, we're not going to do anything that would conflict
with your proposed convention" at the language design level.

> And it's not very likely that someone else
> will come along and propose using the name __citation__ for something
> that *isn't* a citation :-).

Aye, in this case I think you can comfortably assume that we'll
happily leave the "__citation__" and "__cite__" dunder names alone
unless/until there's a clear consensus in the scientific Python
community to use them a particular way.

And even then, it would likely be Python package installers like pip,
Python environment managers like pipenv, and data analysis environment
managers like conda that would handle the task of actually consuming
that metadata (in whatever form it may appear). Having your citation
management support depend on which version of Python you were using
seems like it would be mostly a source of pain rather than beneficial.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-29 Thread Matt Arcidy
> On Fri, Jun 29, 2018 at 10:51 AM Nathan Goldbaum 
> wrote:
>
>>
>>
>> On Thu, Jun 28, 2018 at 11:26 PM, Alex Walters 
>> wrote:
>>
>>> But don't all the users who care about citing modules already use the
>>> scientific python packages, with scipy itself at its center?  Wouldn't
>>> those engaging in science or in academia be better stewards of this than
>>> systems programmers?  Since you're not asking for anything that can't be
>>> done in a third party module, and there is a third party module that most
>>> of the target audience of this standard would already have, there is zero
>>> reason to take up four names in the python runtime to serve those users.
>>>
>>
>>
>> Not all scientific software in Python depends on scipy or even numpy.
>> However, it does all depend on Python.
>>
>> Although perhaps that argues for a cross-language solution :)
>>
>> I still think it would be very nice to have an official standard for
>> citation information in Python packages as codified in a PEP. That would
>> reduce ambiguity and make it much easier for tool-writers who want to parse
>> citation information.
>>
>>> > -----Original Message-----
>>> > From: Adrian Price-Whelan
>>> > Sent: Friday, June 29, 2018 12:16 AM
>>> > To: Alex Walters
>>> > Cc: Steven D'Aprano; python-ideas@python.org
>>> > Subject: Re: [Python-ideas] Add a __cite__ method for scientific
>>> > packages
>>> >
>>> > For me, it's about setting a standard that is endorsed by the
>>> > language, and setting expectations for users. There currently is no
>>> > standard, which is why packages use __citation__, __cite__,
>>> > __bibtex__, etc., and as a user I don't immediately know where to look
>>> > for citation information (without going to the source). My feeling is
>>> > that adopting __citation__ or some dunder name could be implemented on
>>> > classes, functions, etc. with less of a chance of naming conflicts,
>>> > but am open to discussion.
>>> >
>>> > I have some notes here about various ideas for more advanced
>>> > functionality that would support automatically keeping track of
>>> > citation information for imported packages, classes, functions:
>>> > https://github.com/adrn/CitationPEP/blob/master/NOTES.md
>>> >
>>> > On Thu, Jun 28, 2018 at 10:57 PM, Alex Walters <tritium-l...@sdamon.com>
>>> > wrote:
>>> > > Why not scipy.cite() or scipy.citation()?  I don't see any reason
>>> > > for these functions to ship with standard python at all.
>>> > >
>>> > >> -----Original Message-----
>>> > >> From: Python-ideas On Behalf Of Steven D'Aprano
>>> > >> Sent: Thursday, June 28, 2018 8:17 PM
>>> > >> To: python-ideas@python.o

Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-29 Thread David Mertz
On Fri, Jun 29, 2018, 8:14 PM Andrei Kucharavy wrote:

> Not all packages are within the numpy/scipy universe - Pandas and Seaborn
> are notable examples.

Huh?! Pandas is a thin wrapper around NumPy. To be fair, it is a wrapper
that adds a huge number of wrapping methods and classes. Seaborn in turn
has at least a soft dependency on Pandas (some of the charts really need a
DataFrame to work from).

I like the idea of standardizing citation information. But it has little to
do with Python itself. Getting the authors of scientific packages to agree
on conventions is what is needed, and doing that requires accurately
determining their needs, not some mandate from Python itself. Nothing in
the language needs to change to agree on some certain collection of names
(perhaps dunders, perhaps not), and some certain formats for the data that
might live inside them.

Down the road, if there gets to be widespread acceptance of these
conventions, the Python standard library might include a function or two to
work with them. But the horse should go before the cart.


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-29 Thread Andrei Kucharavy
nt libraries that get wrapped and used in most modern
scientific computing languages, but very rarely directly.

In addition to that, while I see how granular citations could be
implemented in Python, I have a bit more trouble understanding how calls to
R, Python, Perl, C, C++ or Fortran from command line scripts can be
analyzed on the fly to get metadata about citations. I have even more
trouble imagining how it would be possible to bring developers across all
the separate language communities to agree on a single standard.

> > CSL-JSON represented as a dict to be supplied to the setup file is
> > definitely one way of doing it. I was, however, thinking more about the
> > BibTeX format, given that CSL-JSON is more closely affiliated with
> > Mendeley
>
> Huh, is it? I only know it from Zotero.
>

Hm - was not aware Zotero uses it as well - it's definitely a good sign and
I will have to look into CSL-JSON in more depth.
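
A minimal sketch of how the two formats relate, assuming a CSL-JSON item
held as a Python dict and converted to a BibTeX entry. The record is a
placeholder, csl_to_bibtex() is a hypothetical helper, and only a small
subset of fields is handled; real converters deal with many more types and
with escaping.

```python
# Placeholder record in CSL-JSON shape; none of it refers to a real publication.
csl_item = {
    "id": "doe2018example",
    "type": "article-journal",
    "title": "A placeholder title",
    "author": [
        {"family": "Doe", "given": "Jane"},
        {"family": "Roe", "given": "Richard"},
    ],
    "container-title": "Journal of Placeholders",
    "issued": {"date-parts": [[2018]]},
}


def csl_to_bibtex(item):
    """Render a small subset of a CSL-JSON item as a BibTeX @article entry."""
    authors = " and ".join(
        f"{a['family']}, {a['given']}" for a in item.get("author", [])
    )
    year = item["issued"]["date-parts"][0][0]
    fields = [
        ("author", authors),
        ("title", item["title"]),
        ("journal", item.get("container-title", "")),
        ("year", str(year)),
    ]
    # Skip empty fields so the entry only lists what the item provides.
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields if value)
    return f"@article{{{item['id']},\n{body}\n}}"


print(csl_to_bibtex(csl_item))
```

Keeping the structured dict as the source and generating BibTeX from it
loses less information than going the other way.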

> Why not scipy.cite() or scipy.citation()?  I don't see any reason for these
> functions to ship with standard python at all.


There are packages that do not depend on scipy and even for those that do -
most users writing analysis pipelines for scientific packages are unaware
that they are using scipy/numpy underneath the packages that do what they
want at the highest level.

> I don't think that this is a very useful idea, because most people that
> I've encountered who don't cite software don't because they think that
> it's not important, not because they don't know what the right citation is.
> The problem is social and not technological. I don't want to spend time
> on a technical solution to it.
>

Thanks for your opinion Gael - as a maintainer of scikit-learn you have more
experience with this issue than most of us.

In my field (computational biology in molecular biology labs) the situation
is somewhat different: most of the custom scripts are implemented by
people who have often learned Python, or programming at all, only in the
last couple of years. Most of the time the corresponding author asks them
to provide 1-5 citations for their analytical pipeline and to describe what
they did in the supplementary material. Several junior developers in my
labs have come to me asking what they were supposed to cite and where to
find the citations.

We aren't likely to convince everyone to cite code overnight, but making
citing as easy as possible does seem like a step in the right direction to
me.

> I still think it would be very nice to have an official standard for
> citation information in Python packages as codified in a PEP. That would
> reduce ambiguity and make it much easier for tool-writers who want to parse
> citation information.
>

That's my opinion as well.

To summarize the conversation so far, a __citation__ data field and a cite()
script seem to be the preferred option. If the proposal gets traction and is
accepted, the citation for Python, as well as the instructions for getting
the citation for a package, can be exposed as a top-level command, similar
to credits, copyright or license.
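
A minimal sketch of what that pairing could look like, purely as an
illustration. The attribute name __citation__ is the one discussed in this
thread; the cite() helper, its signature and the demo_pkg module are
assumptions made up for the example, not an existing API.

```python
import sys
import types


def cite(*modules):
    """Collect __citation__ strings from the given modules.

    With no arguments, scan every module currently loaded in sys.modules.
    """
    if not modules:
        modules = [m for m in list(sys.modules.values()) if m is not None]
    found = {}
    for mod in modules:
        try:
            citation = getattr(mod, "__citation__", None)
        except Exception:
            # Some lazily loaded modules raise on attribute access; skip them.
            continue
        if isinstance(citation, str):
            found[getattr(mod, "__name__", repr(mod))] = citation
    return found


# Hypothetical package that opts in simply by defining the attribute.
demo = types.ModuleType("demo_pkg")
demo.__citation__ = "Doe, J. (2018). demo_pkg: an example package."

print(cite(demo))
```

Nothing here needs interpreter support: the attribute is ordinary module
data and the helper is ordinary Python.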

As of now, it seems like the next steps would be to:

- draft a PEP (or complete the existing one) and implement the cite()
script as well as a show-case package using __citation__
- talk to major package maintainers to see if they have any objections to
the method or suggestions with regards to pep/implementation
- talk to the distutils-sig list to see if we could add the __citation__
metadata to setup.py
- submit a proper PEP (Would a pull request to
https://github.com/python/peps be an acceptable way of doing it?)

Is there something I might be missing so far?

Best,

*Andrei Kucharavy*

Post-Doc @ *Joel S. Bader Lab*

Johns Hopkins University, Baltimore, USA.



Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-29 Thread Nathan Goldbaum
On Thu, Jun 28, 2018 at 11:26 PM, Alex Walters 
wrote:

> But don't all the users who care about citing modules already use the
> scientific python packages, with scipy itself at its center?  Wouldn't
> those engaging in science or in academia be better stewards of this than
> systems programmers?  Since you're not asking for anything that can't be
> done in a third party module, and there is a third party module that most
> of the target audience of this standard would already have, there is zero
> reason to take up four names in the python runtime to serve those users.
>


Not all scientific software in Python depends on scipy or even numpy.
However, it does all depend on Python.

Although perhaps that argues for a cross-language solution :)

I still think it would be very nice to have an official standard for
citation information in Python packages as codified in a PEP. That would
reduce ambiguity and make it much easier for tool-writers who want to parse
citation information.

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Alex Walters
But don't all the users who care about citing modules already use the 
scientific python packages, with scipy itself at its center?  Wouldn't those
engaging in science or in academia be better stewards of this than systems 
programmers?  Since you're not asking for anything that can't be done in a 
third party module, and there is a third party module that most of the target 
audience of this standard would already have, there is zero reason to take up 
four names in the python runtime to serve those users.

> -Original Message-
> From: Adrian Price-Whelan 
> Sent: Friday, June 29, 2018 12:16 AM
> To: Alex Walters 
> Cc: Steven D'Aprano ; python-ideas@python.org
> Subject: Re: [Python-ideas] Add a __cite__ method for scientific packages
> 
> For me, it's about setting a standard that is endorsed by the
> language, and setting expectations for users. There currently is no
> standard, which is why packages use __citation__, __cite__,
> __bibtex__, etc., and as a user I don't immediately know where to look
> for citation information (without going to the source). My feeling is
> that adopting __citation__ or some dunder name could be implemented on
> classes, functions, etc. with less of a chance of naming conflicts,
> but am open to discussion.
> 
> I have some notes here about various ideas for more advanced
> functionality that would support automatically keeping track of
> citation information for imported packages, classes, functions:
> https://github.com/adrn/CitationPEP/blob/master/NOTES.md



Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Adrian Price-Whelan
For me, it's about setting a standard that is endorsed by the
language, and setting expectations for users. There currently is no
standard, which is why packages use __citation__, __cite__,
__bibtex__, etc., and as a user I don't immediately know where to look
for citation information (without going to the source). My feeling is
that adopting __citation__ or some dunder name could be implemented on
classes, functions, etc. with less of a chance of naming conflicts,
but am open to discussion.

I have some notes here about various ideas for more advanced
functionality that would support automatically keeping track of
citation information for imported packages, classes, functions:
https://github.com/adrn/CitationPEP/blob/master/NOTES.md

On Thu, Jun 28, 2018 at 10:57 PM, Alex Walters  wrote:
> Why not scipy.cite() or scipy.citation()?  I don't see any reason for these
> functions to ship with standard python at all.



-- 
Adrian M. Price-Whelan
Lyman Spitzer, Jr. Postdoctoral Fellow
Princeton University
http://adrn.github.io


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Alex Walters
Why not scipy.cite() or scipy.citation()?  I don't see any reason for these
functions to ship with standard python at all.

> -Original Message-
> From: Python-ideas On Behalf Of Steven D'Aprano
> Sent: Thursday, June 28, 2018 8:17 PM
> To: python-ideas@python.org
> Subject: Re: [Python-ideas] Add a __cite__ method for scientific packages
> 
> On Thu, Jun 28, 2018 at 05:25:00PM -0400, Andrei Kucharavy wrote:
> 
> > As for the list, reserving a __citation__/__cite__ for packages at the
> > same level as __version__ is now reserved and adding a citation()/cite()
> > function to the standard library seemed large enough modifications to
> > warrant searching a buy-in from the maintainers and the community at
> > large.
> 
> I think that an approach similar to help/quit/exit is warranted. The
> cite()/citation() function need not be *literally* built into the
> language, it could be an external function written in Python and added
> to builtins by the site.py module.
> 
> 
> 
> 
> --
> Steve



Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Nathaniel Smith
On Thu, Jun 28, 2018 at 2:25 PM, Andrei Kucharavy
 wrote:
>> This is indeed a serious problem. I suspect python-ideas isn't the
>> best venue for addressing it though – there's nothing here that needs
>> changes to the Python interpreter itself (I think), and the people who
>> understand this problem the best and who are most affected by it,
>> mostly aren't here.
>
> There has been localized discussion popping up among the large scientific
> package maintainers and some attempts to solve the problem at the local
> level. Until now they seemed to be winding down due to a lack of a
> large-scale citation mechanism and a discussion about what is concretely
> doable at the scale of the language is likely to finalize

Those are the people with the most motivation and expertise to solve
this, and whose buy-in you'll need on any solution. If they haven't
solved it yet themselves, then there are basically two reasons why
that happens: either because they're busy and no-one's had enough time
to work on it, or else because they're uncertain about the best path
forward. Neither of these is a problem that python-ideas can help
with. If you want to be effective here, you need to talk to them to
figure out how you can help them move forward.

If I were you, I'd try organizing a birds-of-a-feather at the next
SciPy conference, or start getting in touch with others working on
this (duecredit devs, the folks listed on that citationPEP thing,
etc.), and go from there. (Feel free to CC me if you do start up some
effort like this.)

> As for the list, reserving a __citation__/__cite__ for packages at the same
> level as __version__ is now reserved and adding a citation()/cite() function
> to the standard library seemed large enough modifications to warrant
> searching a buy-in from the maintainers and the community at large.

There isn't actually any formal method for registering special names
like __version__, and they aren't treated specially by the language.
They're just variables that happen to have a funny name. You shouldn't
start using them willy-nilly, but you don't actually have to ask
permission or anything. And it's not very likely that someone else
will come along and propose using the name __citation__ for something
that *isn't* a citation :-).

>> You'll want to check out the duecredit project:
>> https://github.com/duecredit/duecredit
>> One of the things they've thought about is the ability to track
>> citation information at a more fine-grained way than per-package – for
>> example, there might be a paper that should be cited by anyone who
>> calls a particular method (or even passes a specific argument to some
>> specific method, when that turns on some fancy algorithm).
>
>
> Due credit looks amazing - I will definitely check it out. The idea was,
> however, to bring the barrier for adoption and usage as low as possible. In
> my experience, the vast majority of Python users in academic environment who
> aren't citing the packages properly are beginners. As such they are unlikely
> to search for third-party libraries beyond those they've found and used to
> solve their specific problem.
>
>  who just assembled a pipeline based on widely-used libraries and would need
> to generate a citation list for it to pass on to their colleagues
> responsible for the paper assembly and submission.

The way to do this is to first get your solution implemented as a
third-party library and adopted by the scientific packages, and then
start thinking about whether it would make sense to move the library
into the standard library. It's relatively easy to move things into
the standard library. The hard part is making sure that you
implemented the right thing in the first place, and that's MUCH more
likely if you start out as a third-party package.

>> I'd actually like to see a more general solution that isn't restricted
>> to any one language, because multi-language analysis pipelines are
>> very common. For example, we could standardize a convention where if a
>> certain environment variable is set, then the software writes out
>> citation information to a certain location, and then implement
>> libraries that do this in multiple languages. Of course, that's a
>> "dynamic" solution that requires running the software -- which is
>> probably necessary if you want to do fine-grained citations, but it
>> might be useful to also have static metadata, e.g. as part of the
>> package metadata that goes into sdists, wheels, and on PyPI. That
>> would be a discussion for the distutils-sig mailing list, which
>> manages that metadata.
>
>
> Thanks for the reference to the distutils-sig list. I will talk to them if
> the idea gets traction here

I think you misunderstand how these lists work :-). (Which is fine --
it's actually pretty opaque and confusing if you don't already know!)
Generally, distutils-sig operates totally independently from
python-{ideas,dev} -- if you have a packaging proposal, it goes there
and not here; if you 

Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Steven D'Aprano
On Thu, Jun 28, 2018 at 05:25:00PM -0400, Andrei Kucharavy wrote:

> As for the list, reserving a __citation__/__cite__ for packages at the same
> level as __version__ is now reserved and adding a citation()/cite()
> function to the standard library seemed large enough modifications to
> warrant searching a buy-in from the maintainers and the community at large.

I think that an approach similar to help/quit/exit is warranted. The 
cite()/citation() function need not be *literally* built into the 
language, it could be an external function written in Python and added 
to builtins by the site.py module.
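
A rough sketch of that idea, assuming a small callable object in the spirit
of the ones site.py installs for credits and license. The class name _Citer,
the placeholder text and the __citation__ lookup are all hypothetical; only
the mechanism of assigning a name on the builtins module is standard.

```python
import builtins


class _Citer:
    """Interactive helper, modelled loosely on the site.py printers."""

    def __init__(self, text):
        self._text = text

    def __repr__(self):
        # Typing the bare name at the REPL shows the text.
        return self._text

    def __call__(self, module=None):
        # cite() returns the interpreter's own text; cite(mod) looks up the
        # hypothetical __citation__ attribute on that module.
        if module is None:
            return self._text
        return getattr(
            module,
            "__citation__",
            "No citation available for %s." % module.__name__,
        )


# site.py (or a sitecustomize module) could run this once at startup.
builtins.cite = _Citer("Placeholder text: how to cite Python itself.")
```

After this runs, cite is reachable from any module or from the REPL without
an import, which is exactly how help, quit and exit behave today.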




-- 
Steve


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Guido van Rossum
One more thing. There's precedent for this: when you start an interactive
Python interpreter it tells you how to get help, but also how to get
copyright, credits and license information:

$ python3
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 26 2018, 19:50:54)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> credits
Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development.  See www.python.org for more information.
>>>

It makes total sense to add citations/references to this list (and those
should probably print a reference for Python followed by instructions on
how to get references for other packages and how to properly add a
reference to your own code).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Andrei Kucharavy
That's a lot of responses, thanks for the interest and the suggestions!


> Are there other languages or software communities that do something like
> this? It would be nice not to have to invent this wheel. Eventually a PEP
> and an implementation should be presented, but first the idea needs to be
> explored more.


To my knowledge, R is the only language that implements such a feature.
Package developers add a CITATION text file containing the citation for
their package in whatever text format they choose. A specialized citation()
built-in function can be called from the REPL and returns a citation for R
itself, including a BibTeX entry for LaTeX users. When citation() is called
on a package instead (e.g. citation("ggplot2")), it returns the contents of
CITATION for that package specifically or, if there is none, uses the
package metadata to build a sane citation. Given that most work with R is
done within a REPL and packages are installed and loaded with commands such
as install.packages("ggplot2") and library("ggplot2"), this approach makes
sense in that context. This, however, didn't feel terribly Pythonic to me.
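
For comparison, the metadata fallback could be approximated in Python along
these lines. This is only a sketch: importlib.metadata is in the standard
library from 3.8, but the function names and the layout of the BibTeX entry
are my own assumptions, and real package metadata is often too sparse to
give a good citation.

```python
from importlib import metadata


def bibtex_from_fields(name, version, author, url):
    """Format a minimal @Manual-style BibTeX entry (layout is an assumption)."""
    lines = [
        "@Manual{%s," % name,
        "  title = {%s}," % name,
        "  author = {%s}," % author,
        "  note = {version %s}," % version,
        "  url = {%s}," % url,
        "}",
    ]
    return "\n".join(lines)


def fallback_citation(dist_name):
    """Build a rough citation from installed-package metadata, or None."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return bibtex_from_fields(
        meta.get("Name", dist_name),
        meta.get("Version", "unknown"),
        meta.get("Author") or "unknown",
        meta.get("Home-page") or "",
    )
```

A package-provided citation would take precedence; this path would only be
used when the package ships nothing of its own.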

As for PEP and a reference implementation, I will gladly take care of them
if the idea gets enough traction, but there seems to be already a PEP draft
as well as an attempt at implementation by one of the AstroPy/AstroML
maintainers, using the __citation__ field and citation() function to unpack
it:

https://github.com/adrn/CitationPEP

There also seem to be some packages in the community using __bibtex__ rather
than __citation__ to store BibTeX entries, but I haven't yet found any large
project implementing it or any PEP drafts associated with it.


> The software sustainability institute in the UK have written several blog
> posts advocating the use of CITATION files containing this sort of metadata:
> https://software.ac.uk/blog/2017-12-12-standard-format-citation-files


Yes, that's the R approach I presented above. It is viable, especially if
hooked to something accessible from the REPL directly, such as a __cite__ or
__citation__ attribute/method for modules. I would, however, advocate for a
more structured approach - perhaps JSON or BibTeX that would get parsed and
converted to a suitable citation format by __cite__, if it were
implemented as a method.
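
To make the structured option concrete, here is a small sketch that stores
one record as JSON and renders it two ways. The field names (author with
family/given, issued with date-parts, container-title) follow CSL-JSON; the
record itself and the to_bibtex() and to_text() helpers are invented for the
example and cover only this one entry type.

```python
import json

# Invented example record using CSL-JSON style field names.
RECORD = json.loads("""
{
  "id": "demo2018",
  "type": "article-journal",
  "title": "An example method",
  "author": [{"family": "Doe", "given": "Jane"},
             {"family": "Roe", "given": "Richard"}],
  "issued": {"date-parts": [[2018]]},
  "container-title": "Journal of Examples"
}
""")


def to_bibtex(rec):
    """Render the record as a BibTeX @article entry."""
    authors = " and ".join(
        "%s, %s" % (a["family"], a["given"]) for a in rec["author"]
    )
    year = rec["issued"]["date-parts"][0][0]
    return (
        "@article{%s,\n"
        "  author = {%s},\n"
        "  title = {%s},\n"
        "  journal = {%s},\n"
        "  year = {%d},\n"
        "}" % (rec["id"], authors, rec["title"], rec["container-title"], year)
    )


def to_text(rec):
    """Render the record as a one-line plain-text reference."""
    authors = ", ".join(
        "%s %s" % (a["given"], a["family"]) for a in rec["author"]
    )
    year = rec["issued"]["date-parts"][0][0]
    return "%s (%d). %s. %s." % (
        authors, year, rec["title"], rec["container-title"]
    )


print(to_text(RECORD))
print(to_bibtex(RECORD))
```

The point of the structured form is that the package stores the data once
and the output style is chosen by whoever asks for the citation.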

> A github code search for __citation__ also gets 127 hits that mostly seem
> to be research software that are using this attribute more or less as
> suggested here:
> https://github.com/search?q=__citation__&type=Code


Most of them are from the AstroPy universe or from the CitationPEP draft
I've referenced above.

> This is indeed a serious problem. I suspect python-ideas isn't the
> best venue for addressing it though – there's nothing here that needs
> changes to the Python interpreter itself (I think), and the people who
> understand this problem the best and who are most affected by it,
> mostly aren't here.


There has been localized discussion popping up among the large scientific
package maintainers and some attempts to solve the problem at the local
level. Until now they seemed to be winding down due to a lack of a
large-scale citation mechanism and a discussion about what is concretely
doable at the scale of the language is likely to finalize

As for the list, reserving a __citation__/__cite__ for packages at the same
level as __version__ is now reserved and adding a citation()/cite()
function to the standard library seemed large enough modifications to
warrant searching a buy-in from the maintainers and the community at large.

> You'll want to check out the duecredit project:
> https://github.com/duecredit/duecredit
> One of the things they've thought about is the ability to track
> citation information at a more fine-grained way than per-package – for
> example, there might be a paper that should be cited by anyone who
> calls a particular method (or even passes a specific argument to some
> specific method, when that turns on some fancy algorithm).


duecredit looks amazing - I will definitely check it out. The idea was,
however, to bring the barrier for adoption and usage as low as possible. In
my experience, the vast majority of Python users in academic environments
who aren't citing the packages properly are beginners. As such they are
unlikely to search for third-party libraries beyond those they've found and
used to solve their specific problem.

 who just assembled a pipeline based on widely-used libraries and would
need to generate a citation list for it to pass on to their colleagues
responsible for the paper assembly and submission.

> I'd actually like to see a more general solution that isn't restricted
> to any one language, because multi-language analysis pipelines are
> very common. For example, we could standardize a convention where if a
> certain environment variable is set, then the software writes out
> citation information to a certain location, and then implement
> libraries that do this in multiple languages. Of course, that's a
> "dynamic" 

Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread pylang
​​
> Are there other languages or software communities that do something like
> this? It would be nice not to have to invent this wheel.

While I do not use R regularly, I understand their community is largely
academic-driven, and citations are strongly encouraged as seen in their
documentation:

https://stat.ethz.ch/R-manual/R-devel/library/utils/html/citation.html

Here is an example use of their `citation()` function:

http://www.blopig.com/blog/2013/07/citing-r-packages-in-your-thesispaperassignments/

> citation()

To cite R in publications use:

  R Core Team (2013). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  URL http://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2013},
    url = {http://www.R-project.org/},
  }

Calling the `citation()` function also generates BibTeX output
(http://www.bibtex.org/), which is one of the most common citation formats.

For reference, I believe this is the source code:

https://github.com/wch/r-source/blob/c3f7d32c842ca61fa23a25d4240d6caf980fe2ee/src/library/tools/R/citation.R


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Chris Barker - NOAA Federal via Python-ideas
I think this is a fine idea, but could be achieved by convention, like
__version__, rather than by fiat.

And it’s certainly not a language feature.

So Nathaniel’s right — the thing to do now is work out the convention,
and then advocate for it.

-CHB


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Steve Barnes



On 28/06/2018 00:00, Nathan Goldbaum wrote:
> This is an interesting proposal. Speaking as a developer of scientific 
> software packages it would be really cool to have support for something 
> like this in the language itself.
> 
> The software sustainability institute in the UK have written several 
> blog posts advocating the use of CITATION files containing this sort of 
> metadata:
> 
> https://software.ac.uk/blog/2017-12-12-standard-format-citation-files
> 
> A github code search for __citation__ also gets 127 hits that mostly 
> seem to be research software that are using this attribute more or less 
> as suggested here:
> 
> https://github.com/search?q=__citation__&type=Code
> 
> It's also worth pointing out http://citeas.org/ which is sort of a 
> citation search engine for software projects. It uses a number of 
> heuristics to figure out what the appropriate citation for a piece of 
> software is.
> 
I just thought that it might be worth pointing out that this should 
actually work both ways i.e. if a specific package, module or function 
is inspired by or directly implements the methods included in a specific 
publication then any __citation__ entries within it should also cite 
that/those or allow references to them to be recovered.

The general principle is if you are expecting to be cited you also have 
to cite.
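
One way this could look at function level, sketched with a decorator. The
names cites() and collect() are hypothetical, as is the choice to store the
references in a __citation__ tuple on the function object; the reference
text is a placeholder.

```python
def cites(*references):
    """Attach references to a function as a hypothetical __citation__ tuple."""
    def decorator(func):
        func.__citation__ = tuple(references)
        return func
    return decorator


@cites("Doe, J. (2018). A fast solver. Journal of Examples.")
def solve(x):
    # Stand-in for an implementation of the published method.
    return x * 2


def collect(*objects):
    """Gather de-duplicated references from functions, classes or modules."""
    seen = []
    for obj in objects:
        refs = getattr(obj, "__citation__", ())
        if isinstance(refs, str):
            refs = (refs,)
        for ref in refs:
            if ref not in seen:
                seen.append(ref)
    return seen


print(collect(solve))
```

The same attribute then serves both directions: it records what the code is
built on and tells a user of the function what to cite.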

-- 
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect 
those of my employer.



Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Steven D'Aprano
On Wed, Jun 27, 2018 at 05:20:01PM -0400, Andrei Kucharavy wrote:
[...]

> To remediate to that situation, I suggest a __citation__ method associated
> to each package installation and import. Called from the __main__,
> __citation__() would scan __citation__ of all imported packages and return
> the list of all relevant top-level citations associated to the packages.

Why does this have to be a dunder method? In general, application code 
shouldn't be calling dunders directly, they're reserved for Python.

I think your description of what this method should do is not 
really coherent. On the one hand, you have __citation__() be a method 
that you call (how?) but on the other hand you have it being a data 
field __citation__ that you scan.

Which is it?

I do think you have identified an important feature, but I think this is 
a *tool*, not a *language feature*. My spur of the moment thought is:

- we could have a script (a third party script? or in the std lib?) 
  which the user calls, giving the name of their module or package as
  argument

  e.g. "python -m cite myapplication.py"

- this script knows how to analyse myapplication.py for a list of
  dependencies, perhaps filtering out standard library packages;

- it interrogates myapplication, and each dependency, for a citation;

- this might involve reserving a standard __citation__ data field
  in each module, or a __citation__.xml file in the package, or
  some other protocol;

- or perhaps the cite script knows how to generate the appropriate
  citation itself, from any of the standard formatted data fields
  found in many common modules, like __author__, __version__ etc.

- either way, the script would generate a list of packages and
  modules used by myapplication, plus citations for them.

Presumably you would need to be able to specify which citation style to 
use.

The point is, the *grunt work* of generating the citations is just a 
script. It isn't a language feature. It might not even be in the std lib 
(although perhaps we could ship it as a standard Python script, like the 
compileall module and a few other tools, starting in version 3.8).

The protocol of how the script works out the citations can be 
developed. Perhaps we could reserve a __citation__ dunder as a de facto 
standard data field, like people already use __author__ and __version__ 
and similar. Or it could look for a separate XML or TXT file in the 
package directory.
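
One way such a lookup protocol could work is sketched below: try a
__citation__ data field first, then a text file in the package directory,
then fall back to __author__ and __version__. The file name CITATION.txt,
the function name and the fallback wording are all assumptions made for
illustration only.

```python
import types
from pathlib import Path
from typing import Optional


def find_citation(module: types.ModuleType) -> Optional[str]:
    """Return a citation string for `module`, or None if nothing is found."""
    # 1. An explicit data field, if the package defines one.
    citation = getattr(module, "__citation__", None)
    if citation:
        return str(citation)

    # 2. A separate file next to the module (the file name is an assumption).
    module_file = getattr(module, "__file__", None)
    if module_file:
        sidecar = Path(module_file).parent / "CITATION.txt"
        if sidecar.is_file():
            return sidecar.read_text(encoding="utf-8").strip()

    # 3. Fall back to the conventional metadata fields.
    author = getattr(module, "__author__", None)
    version = getattr(module, "__version__", None)
    if author:
        suffix = f", version {version}" if version else ""
        return f"{author}. {module.__name__}{suffix}."
    return None


# Usage with a throwaway module object standing in for a real package.
demo = types.ModuleType("demo_pkg")
demo.__author__ = "Jane Doe"
demo.__version__ = "1.2"
print(find_citation(demo))
```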



> As a scientific package developer working in academia, the problem is quite
> serious, and the solution seems relatively straightforward.
> 
> What does Python core team think about addition and long-term maintenance
> of such a feature to the import and setup mechanisms?

What does this have to do with either import or setup?


> What do other users
> and scientific package developers think of such a mechanism for citations
> retrieval?

A long time ago, I added a feature request for a page in the 
documentation to show how to cite Python in various formats:

https://bugs.python.org/issue26597

I don't believe there has been any progress on this. (I certainly don't 
know the right way to cite software.) Perhaps this can be merged with 
your idea.

Should Python have a standard sys.__citation__ field that provides the 
relevant detail in some format-independent, machine-readable object like 
a named tuple? Then this hypothetical cite.py tool could read the tuple 
and format it according to any citation style.
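
To make the named-tuple idea concrete, here is a small sketch of a
format-independent record plus two formatters. The field names, the
formatter functions and the sample values are hypothetical placeholders;
no sys.__citation__ field exists today.

```python
from typing import NamedTuple


class Citation(NamedTuple):
    """Format-independent citation record (field names are assumptions)."""
    authors: tuple[str, ...]
    title: str
    year: int
    version: str
    url: str


def format_plain(c: Citation) -> str:
    """Render a simple author-year style line."""
    return f"{'; '.join(c.authors)} ({c.year}). {c.title} (version {c.version}). {c.url}"


def format_bibtex(c: Citation, key: str) -> str:
    """Render a BibTeX @misc entry."""
    fields = {
        "author": " and ".join(c.authors),
        "title": c.title,
        "year": str(c.year),
        "version": c.version,
        "url": c.url,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


# Placeholder data, not a real citation.
example = Citation(
    authors=("Doe, Jane", "Roe, Richard"),
    title="examplepkg",
    year=2018,
    version="1.0",
    url="https://example.org/examplepkg",
)
print(format_plain(example))
print(format_bibtex(example, "examplepkg2018"))
```

Keeping the data separate from the rendering is what would let a tool
support several citation styles without each package knowing about them.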



-- 
Steve


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Antoine Pitrou
On Wed, 27 Jun 2018 17:19:35 -0700
Nathaniel Smith wrote:
> On Wed, Jun 27, 2018 at 2:20 PM, Andrei Kucharavy wrote:
> > To remediate to that situation, I suggest a __citation__ method associated
> > to each package installation and import. Called from the __main__,
> > __citation__() would scan __citation__ of all imported packages and return
> > the list of all relevant top-level citations associated to the packages.
> >
> > As a scientific package developer working in academia, the problem is quite
> > serious, and the solution seems relatively straightforward.
> >
> > What does Python core team think about addition and long-term maintenance of
> > such a feature to the import and setup mechanisms? What do other users and
> > scientific package developers think of such a mechanism for citations
> > retrieval?  
> 
> This is indeed a serious problem. I suspect python-ideas isn't the
> best venue for addressing it though – there's nothing here that needs
> changes to the Python interpreter itself (I think), and the people who
> understand this problem the best and who are most affected by it,
> mostly aren't here.
> 
> You'll want to check out the duecredit project:
> https://github.com/duecredit/duecredit
> One of the things they've thought about is the ability to track
> citation information at a more fine-grained way than per-package – for
> example, there might be a paper that should be cited by anyone who
> calls a particular method (or even passes a specific argument to some
> specific method, when that turns on some fancy algorithm).
> 
> The R world also has some prior art -- in particular I know they have
> citations as part of the standard metadata in every package.
> 
> I'd actually like to see a more general solution that isn't restricted
> to any one language, because multi-language analysis pipelines are
> very common.

Perhaps a dedicated CPU instruction?

Regards

Antoine.




Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-27 Thread Nathaniel Smith
On Wed, Jun 27, 2018 at 2:20 PM, Andrei Kucharavy wrote:
> To remediate to that situation, I suggest a __citation__ method associated
> to each package installation and import. Called from the __main__,
> __citation__() would scan __citation__ of all imported packages and return
> the list of all relevant top-level citations associated to the packages.
>
> As a scientific package developer working in academia, the problem is quite
> serious, and the solution seems relatively straightforward.
>
> What does Python core team think about addition and long-term maintenance of
> such a feature to the import and setup mechanisms? What do other users and
> scientific package developers think of such a mechanism for citations
> retrieval?

This is indeed a serious problem. I suspect python-ideas isn't the
best venue for addressing it though – there's nothing here that needs
changes to the Python interpreter itself (I think), and the people who
understand this problem the best and who are most affected by it,
mostly aren't here.

You'll want to check out the duecredit project:
https://github.com/duecredit/duecredit
One of the things they've thought about is the ability to track
citation information in a more fine-grained way than per-package – for
example, there might be a paper that should be cited by anyone who
calls a particular method (or even passes a specific argument to some
specific method, when that turns on some fancy algorithm).

The R world also has some prior art -- in particular I know they have
citations as part of the standard metadata in every package.

I'd actually like to see a more general solution that isn't restricted
to any one language, because multi-language analysis pipelines are
very common. For example, we could standardize a convention where if a
certain environment variable is set, then the software writes out
citation information to a certain location, and then implement
libraries that do this in multiple languages. Of course, that's a
"dynamic" solution that requires running the software -- which is
probably necessary if you want to do fine-grained citations, but it
might be useful to also have static metadata, e.g. as part of the
package metadata that goes into sdists, wheels, and on PyPI. That
would be a discussion for the distutils-sig mailing list, which
manages that metadata.

One challenge in standardizing this kind of thing is choosing a
standard way to represent citation information. Maybe CSL-JSON?
There's a lot of complexity as you dig into this, though of course one
shouldn't let the perfect be the enemy of the good...
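
For readers unfamiliar with CSL-JSON, here is a small sketch that builds
one item for a paper associated with a package. The helper name and the
sample values are placeholders; only a handful of the fields CSL-JSON
defines are shown.

```python
import json


def to_csl_json(item_id, title, authors, year, url):
    """Build a single CSL-JSON item from basic bibliographic fields.

    `authors` is a list of (family, given) name pairs.
    """
    return {
        "id": item_id,
        "type": "article-journal",
        "title": title,
        "author": [{"family": family, "given": given} for family, given in authors],
        "issued": {"date-parts": [[year]]},
        "URL": url,
    }


# Placeholder data, not a real publication.
item = to_csl_json(
    "examplepkg2018",
    "examplepkg: a hypothetical analysis package",
    [("Doe", "Jane"), ("Roe", "Richard")],
    2018,
    "https://example.org/examplepkg",
)
print(json.dumps([item], indent=2))
```

Part of the complexity mentioned above is that software rarely maps onto a
single item type this neatly, for example when a package has several
associated papers.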

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-27 Thread Nathan Goldbaum
This is an interesting proposal. Speaking as a developer of scientific
software packages, I think it would be really cool to have support for
something like this in the language itself.

The Software Sustainability Institute in the UK has written several blog
posts advocating the use of CITATION files containing this sort of metadata:

https://software.ac.uk/blog/2017-12-12-standard-format-citation-files

A GitHub code search for __citation__ also gets 127 hits, most of which
seem to be research software using this attribute more or less as
suggested here:

https://github.com/search?q=__citation__&type=Code

It's also worth pointing out http://citeas.org/ which is sort of a citation
search engine for software projects. It uses a number of heuristics to
figure out what the appropriate citation for a piece of software is.

On Wed, Jun 27, 2018 at 5:49 PM, Guido van Rossum wrote:

> While I'm not personally in need of citations (and never felt I was) I can
> easily understand the point -- sometimes citations can make or break a
> career and having written a popular software package should be acknowledged.
>
> Are there other languages or software communities that do something like
> this? It would be nice not to have to invent this wheel. Eventually a PEP
> and an implementation should be presented, but first the idea needs to be
> explored more.
>
> --Guido
>
> On Wed, Jun 27, 2018 at 3:30 PM Andrei Kucharavy <
> andrei.kuchar...@gmail.com> wrote:
>
>> Over the last 10 years, Python has slowly inched towards becoming the
>> most popular scientific computing language, beating or seriously
>> challenging Matlab, R, Mathematica and many specialized languages (S, SAS,
>> ...) in numerous applications.
>>
>> A large part of this growth is driven by amazing community packages, such
>> as numpy, scipy, scikits-learn, scikits-image, seaborn or pandas, just to
>> name a few. Development of such packages represents a significant time
>> investment by people working in academic environments. To be able to
>> justify the investment of time into such package development and support,
>> the developers usually associated them with a scientific article. The
>> number of citations of those articles are considered as measures of the
>> usefulness of articles and are required to justify the time spent on them.
>>
>> Unfortunately, as of now, a significant issue is that such packages are
>> not cited despite being extensively used. Part of this is due to the
>> difficulties with compiling the list of proper citations for each module
>> (and, for libraries associated with multiple update publications, selecting
>> the relevant citation). Part of this is due to users not realizing which of
>> the modules they are using have associated publications and should be cited.
>>
>> To remediate to that situation, I suggest a __citation__ method
>> associated to each package installation and import. Called from the
>> __main__, __citation__() would scan __citation__ of all imported packages
>> and return the list of all relevant top-level citations associated to the
>> packages.
>>
>> As a scientific package developer working in academia, the problem is
>> quite serious, and the solution seems relatively straightforward.
>>
>> What does Python core team think about addition and long-term maintenance
>> of such a feature to the import and setup mechanisms? What do other users
>> and scientific package developers think of such a mechanism for citations
>> retrieval?
>>
>> Best,
>>
>>
>> Andrei Kucharavy
>> Post-Doc @ Joel S. Bader Lab
>> Johns Hopkins University, Baltimore, USA.
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-27 Thread Guido van Rossum
While I'm not personally in need of citations (and never felt I was) I can
easily understand the point -- sometimes citations can make or break a
career and having written a popular software package should be acknowledged.

Are there other languages or software communities that do something like
this? It would be nice not to have to invent this wheel. Eventually a PEP
and an implementation should be presented, but first the idea needs to be
explored more.

--Guido

On Wed, Jun 27, 2018 at 3:30 PM Andrei Kucharavy wrote:

> Over the last 10 years, Python has slowly inched towards becoming the most
> popular scientific computing language, beating or seriously challenging
> Matlab, R, Mathematica and many specialized languages (S, SAS, ...) in
> numerous applications.
>
> A large part of this growth is driven by amazing community packages, such
> as numpy, scipy, scikits-learn, scikits-image, seaborn or pandas, just to
> name a few. Development of such packages represents a significant time
> investment by people working in academic environments. To be able to
> justify the investment of time into such package development and support,
> the developers usually associated them with a scientific article. The
> number of citations of those articles are considered as measures of the
> usefulness of articles and are required to justify the time spent on them.
>
> Unfortunately, as of now, a significant issue is that such packages are
> not cited despite being extensively used. Part of this is due to the
> difficulties with compiling the list of proper citations for each module
> (and, for libraries associated with multiple update publications, selecting
> the relevant citation). Part of this is due to users not realizing which of
> the modules they are using have associated publications and should be cited.
>
> To remediate to that situation, I suggest a __citation__ method associated
> to each package installation and import. Called from the __main__,
> __citation__() would scan __citation__ of all imported packages and return
> the list of all relevant top-level citations associated to the packages.
>
> As a scientific package developer working in academia, the problem is
> quite serious, and the solution seems relatively straightforward.
>
> What does Python core team think about addition and long-term maintenance
> of such a feature to the import and setup mechanisms? What do other users
> and scientific package developers think of such a mechanism for citations
> retrieval?
>
> Best,
>
>
> Andrei Kucharavy
> Post-Doc @ Joel S. Bader Lab
> Johns Hopkins University, Baltimore, USA.
>


-- 
--Guido van Rossum (python.org/~guido)


[Python-ideas] Add a __cite__ method for scientific packages

2018-06-27 Thread Andrei Kucharavy
Over the last 10 years, Python has slowly inched towards becoming the most
popular scientific computing language, beating or seriously challenging
Matlab, R, Mathematica and many specialized languages (S, SAS, ...) in
numerous applications.

A large part of this growth is driven by amazing community packages, such
as numpy, scipy, scikit-learn, scikit-image, seaborn or pandas, just to
name a few. Development of such packages represents a significant time
investment by people working in academic environments. To be able to
justify the investment of time into such package development and support,
the developers usually associate them with a scientific article. The
number of citations of those articles is considered a measure of their
usefulness and is required to justify the time spent on them.

Unfortunately, as of now, a significant issue is that such packages are not
cited despite being extensively used. Part of this is due to the
difficulties with compiling the list of proper citations for each module
(and, for libraries associated with multiple update publications, selecting
the relevant citation). Part of this is due to users not realizing which of
the modules they are using have associated publications and should be cited.

To remedy that situation, I suggest a __citation__ method associated
with each package installation and import. Called from __main__,
__citation__() would scan the __citation__ of all imported packages and
return the list of all relevant top-level citations associated with them.
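
A minimal sketch of the scan described above, assuming packages expose
their citation as a module-level __citation__ attribute (a convention, not
an existing language feature). The function name collect_citations and the
sample module are hypothetical.

```python
import sys
import types


def collect_citations(modules=None):
    """Map top-level module names to their __citation__, where defined."""
    if modules is None:
        # Copy first: sys.modules can change while we iterate.
        modules = dict(sys.modules)
    found = {}
    for name, module in sorted(modules.items()):
        if "." in name:
            continue  # keep only top-level packages, skip submodules
        citation = getattr(module, "__citation__", None)
        if citation is not None:
            found[name] = citation
    return found


# Usage with throwaway module objects standing in for imported packages.
fake = types.ModuleType("fakepkg")
fake.__citation__ = "Doe (2018), placeholder reference"
sub = types.ModuleType("fakepkg.sub")
sub.__citation__ = "ignored: not a top-level package"
print(collect_citations({"fakepkg": fake, "fakepkg.sub": sub}))
```

Called with no argument, the function inspects everything imported so far,
which includes packages pulled in indirectly by dependencies.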

As a scientific package developer working in academia, the problem is quite
serious, and the solution seems relatively straightforward.

What does the Python core team think about adding such a feature to the
import and setup mechanisms and maintaining it long-term? What do other
users and scientific package developers think of such a mechanism for
citation retrieval?

Best,


Andrei Kucharavy
Post-Doc @ Joel S. Bader Lab
Johns Hopkins University, Baltimore, USA.