...and YES, cases where a reference has changed but the adjacent text has not will be detectable with the dataset we aim to produce. That's a great idea!
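A rough sketch of what such a check might look like, for illustration only. It is not the project's pipeline, and it covers the case Kerry describes below (the citation stays the same while the text in front of it changes). The function names, the regular expression, and the choice of "text since the previous citation" as the supported text are all assumptions.

```python
import re

# Matches <ref ...>...</ref> pairs. Self-closing reuse tags such as
# <ref name="x" /> are not matched and are left in the surrounding text.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def refs_with_context(wikitext, window=200):
    """Map each citation body to the text between it and the previous citation.

    Assumption: the text a citation supports is whatever sits between the
    previous <ref> and this one. Identical citation bodies overwrite each other.
    """
    context = {}
    previous_end = 0
    for match in REF_RE.finditer(wikitext):
        preceding = wikitext[previous_end:match.start()]
        # Normalise whitespace and keep only the last `window` characters.
        context[match.group(1).strip()] = " ".join(preceding.split())[-window:]
        previous_end = match.end()
    return context


def stale_citation_candidates(old_wikitext, new_wikitext, window=200):
    """List citations whose body is unchanged but whose preceding text differs."""
    old = refs_with_context(old_wikitext, window)
    new = refs_with_context(new_wikitext, window)
    return [
        (ref, old[ref], new[ref])
        for ref in new
        if ref in old and old[ref] != new[ref]
    ]
```

Calling stale_citation_candidates() on two revisions of an article returns (citation, old text, new text) triples that a person could then review. The converse case (citation changed, text unchanged) would need citations to be matched by position rather than by body.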
On Tue, May 2, 2017 at 7:59 AM, Kerry Raymond <[email protected]> wrote:
> Just a couple of thoughts that cross my mind ...
>
> If people use the {{cite book}} etc templates, it will be relatively easy
> to work out what the components of the citation are. However if people roll
> their own, e.g.
>
> <ref>[http://someurl This And That], Blah Blah 2000</ref>
>
> you may have some difficulty working out what is what. I've just been
> through a tedious exercise of updating a set of URLs using AWB over some
> thousands of articles and some of the ways people roll their own citations
> were quite remarkable (and often quite unhelpful). It may be that you can't
> extract much from such citations. However, the good news is that if they
> have a URL in them, it will probably be in plain sight.
>
> Whereas there are a number of templates that I regularly use for citation
> like {{cite QHR}} (currently 1234 transclusions) and {{cite QPN}}
> (currently 2738 transclusions) and {{Census 2011 AUS}} (4400
> transclusions) all of which generate their URLs. I'm not sure how you will
> deal with these in terms of extracting URLs.
>
> But whatever the limitations, it will be a useful dataset to answer some
> interesting questions.
>
> One phenomenon I often see is new users updating information (e.g. changing
> the population of a town) while leaving behind the old citation for the
> previous value. So it superficially looks like the new information is cited
> to a reliable source when in fact it isn't. I've often wished we could
> automatically detect and raise a "warning" when the "text being supported"
> by the citation changes yet the citation does not. The problem, of course,
> is that we only know where the citation appears in the text and that we
> presume it is in support of "some earlier" text (without being clear
> exactly where it is). And if an article is reorganised, it may well result
> in the citation "drifting away" from the text it supports or even that it
> is in support of text that has been deleted. So I think it is important to
> know what text preceded the citation at the time the citation first appears
> in the article history as it may be useful to compare it against the text
> that *now* appears before it. It is a great pity that (in these digital
> times) we have not developed a citation model where you select chunks of
> text and link your citation to them, so that the relationship between the
> text and the citation is more apparent.
>
> Kerry
>
> -----Original Message-----
> From: Wiki-research-l [mailto:[email protected]] On Behalf Of Andrea Forte
> Sent: Tuesday, 2 May 2017 5:18 AM
> To: Research into Wikimedia content and communities <[email protected]>
> Subject: [Wiki-research-l] Citation Project - Comments Welcome!
>
> Hi all,
>
> One of my PhD students, Meen Chul Kim, is a data scientist with experience
> in bibliometrics and we will be working on some citation-related research
> together with Aaron and Dario in the coming months. Our main goal in the
> short term is to develop an enhanced citation dataset that will allow for
> future analyses of citation data associated with article quality,
> lifecycle, editing trends, etc.
>
> The project page is here:
> https://meta.wikimedia.org/wiki/Research:Understanding_the_context_of_citations_in_Wikipedia
>
> The project is just getting started so this is a great time to offer
> feedback and suggestions, especially for features of citations that we
> should mine as a first step, since this will affect what the dataset can be
> used for in the future.
>
> Looking forward to seeing some of you at WikiCite!!
>
> Andrea
>
> --
> :: Andrea Forte
> :: Associate Professor
> :: College of Computing and Informatics, Drexel University
> :: http://www.andreaforte.net
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
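To make the distinction Kerry draws above concrete, here is a small sketch, under stated assumptions, of splitting a citation body into a template name, named parameters, and any URLs that sit in plain sight. The function name, both regular expressions, and the returned dictionary layout are hypothetical, and nested templates are not handled.

```python
import re

# First {{template|...}} in the citation body; nested templates are not handled.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|(.*?))?\}\}", re.DOTALL)
# URLs written out literally, stopping at whitespace or wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s\]|}<]+")


def describe_citation(ref_body):
    """Return the template name (if any), its named parameters, and visible URLs."""
    template = TEMPLATE_RE.search(ref_body)
    name, params = None, {}
    if template:
        name = template.group(1).strip()
        for part in (template.group(2) or "").split("|"):
            key, sep, value = part.partition("=")
            if sep:  # positional arguments (no "=") are skipped in this sketch
                params[key.strip()] = value.strip()
    return {"template": name, "params": params, "urls": URL_RE.findall(ref_body)}
```

With this sketch, a hand-rolled citation such as "[http://someurl This And That], Blah Blah 2000" yields no template but one URL, while a call like "{{cite QHR|12345}}" (the argument is a made-up placeholder) yields a template name and no URL, because the URL only exists once the template is expanded. Recovering those URLs would require expanding the template, for example through the MediaWiki API.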
