...and YES, a reference that has changed while the adjacent text has not
is something we will be able to detect with the dataset we aim to
produce. That's a great idea!
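
To make the idea concrete, here is a rough sketch of how such a check
might look over two revisions of an article's wikitext. It is only an
illustration under stated assumptions, not the project's actual method:
the function names, the 80-character context window, and the "findings"
labels are all hypothetical, and it only handles plain <ref>...</ref>
pairs (no named or self-closing refs).

```python
import re

# Matches <ref ...>body</ref>; skips self-closing <ref ... /> and <references>.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def refs_with_context(wikitext, window=80):
    """Return (preceding_text, ref_body) pairs for each <ref>...</ref>."""
    pairs = []
    for m in REF_RE.finditer(wikitext):
        # Strip earlier refs so the context holds only the prose before this one.
        before = REF_RE.sub("", wikitext[:m.start()])
        pairs.append((before[-window:].strip(), m.group(1).strip()))
    return pairs


def compare_revisions(old, new, window=80):
    """Flag refs whose context changed, or contexts whose ref changed."""
    old_pairs = refs_with_context(old, window)
    old_by_ref = {ref: ctx for ctx, ref in old_pairs}
    old_by_ctx = {ctx: ref for ctx, ref in old_pairs}
    findings = []
    for ctx, ref in refs_with_context(new, window):
        if ref in old_by_ref and old_by_ref[ref] != ctx:
            # Same citation, different preceding text (the stale-citation case).
            findings.append(("text_changed_ref_same", ref))
        elif ctx in old_by_ctx and old_by_ctx[ctx] != ref:
            # Same preceding text, different citation.
            findings.append(("ref_changed_text_same", ref))
    return findings


old = "The town has a population of 1,200.<ref>Census 2011</ref>"
new_text = "The town has a population of 1,950.<ref>Census 2011</ref>"
new_ref = "The town has a population of 1,200.<ref>Census 2016</ref>"

print(compare_revisions(old, new_text))
print(compare_revisions(old, new_ref))
```

A fixed character window is a crude stand-in for "the text being
supported": an unrelated edit a few words earlier would also shift the
context, so a real implementation would probably need sentence-level
alignment between revisions.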

On Tue, May 2, 2017 at 7:59 AM, Kerry Raymond <[email protected]>
wrote:

> Just a couple of thoughts that cross my mind ...
>
> If people use the {{cite book}} etc templates, it will be relatively easy
> to work out what the components of the citation are. However if people roll
> their own, e.g.
>
> <ref>[http://someurl This And That], Blah Blah 2000</ref>
>
> you may have some difficulty working out what is what. I've just been
> through a tedious exercise of updating a set of URLs using AWB over some
> thousands of articles and some of the ways people roll their own citations
> were quite remarkable (and often quite unhelpful). It may be that you can't
> extract much from such citations. However, the good news is that if they
> have a URL in them, it will probably be in plain sight.
>
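
As a rough illustration of the difference between the two cases, the
sketch below pulls URLs out of a citation body. It assumes the body has
already been extracted from its <ref> tags; the function name, the
"template"/"freeform" labels, and the regular expressions are
hypothetical simplifications, and the {{cite QHR}} argument shown in the
usage lines is a placeholder rather than a real parameter value.

```python
import re

# A |url= parameter inside a citation template (does not match |archive-url=).
TEMPLATE_URL_RE = re.compile(r"\|\s*url\s*=\s*([^|}\s]+)", re.IGNORECASE)
# Any http(s) URL in plain sight, stopping at whitespace or wiki delimiters.
BARE_URL_RE = re.compile(r"https?://[^\s\]|}<]+")


def extract_urls(ref_body):
    """Return (kind, urls) for the text found inside a <ref>...</ref>."""
    body = ref_body.strip()
    if body.startswith("{{"):
        # Templated citation: only an explicit |url= parameter is visible here.
        return "template", TEMPLATE_URL_RE.findall(body)
    # Hand-rolled citation: fall back to any URL written out in the text.
    return "freeform", BARE_URL_RE.findall(body)


print(extract_urls("[http://someurl This And That], Blah Blah 2000"))
print(extract_urls("{{cite book |title=X |url=http://example.org/a |year=2000}}"))
print(extract_urls("{{cite QHR|placeholder}}"))
```

The last call returns an empty list: a template that generates its URL
internally exposes nothing to a text-level scan, so those citations
would need the template expanded (or its logic reimplemented) before a
URL could be recovered.
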
> Whereas there are a number of templates that I regularly use for citation
> like {{cite QHR}} (currently 1234 transclusions) and {{cite QPN}}
> (currently 2738 transclusions) and {{Census 2011 AUS}} (4400
> transclusions) all of which generate their URLs. I'm not sure how you will
> deal with these in terms of extracting URLs.
>
> But whatever the limitations, it will be a useful dataset to answer some
> interesting questions.
>
> One phenomenon I often see is new users updating information (e.g. changing
> the population of a town) while leaving behind the old citation for the
> previous value. So it superficially looks like the new information is cited
> to a reliable source when in fact it isn't. I've often wished we could
> automatically detect and raise a "warning" when the "text being supported"
> by the citation changes yet the citation does not. The problem, of course,
> is that we only know where the citation appears in the text and that we
> presume it is in support of "some earlier" text (without being clear
> exactly where it is). And if an article is reorganised, it may well result
> in the citation "drifting away" from the text it supports or even that it
> is in support of text that has been deleted. So I think it is important to
> know what text preceded the citation at the time the citation first appears
> in the article history as it may be useful to compare it against the text
> that *now* appears before it. It is a great pity that (in these digital
> times) we have not developed a citation model where you select chunks of
> text and link your citation to them, so that the relationship between the
> text and the citation is more apparent.
>
> Kerry
>
> -----Original Message-----
> From: Wiki-research-l [mailto:[email protected]]
> On Behalf Of Andrea Forte
> Sent: Tuesday, 2 May 2017 5:18 AM
> To: Research into Wikimedia content and communities <
> [email protected]>
> Subject: [Wiki-research-l] Citation Project - Comments Welcome!
>
> Hi all,
>
>
> One of my PhD students, Meen Chul Kim, is a data scientist with experience
> in bibliometrics and we will be working on some citation-related research
> together with Aaron and Dario in the coming months. Our main goal in the
> short term is to develop an enhanced citation dataset that will allow for
> future analyses of citation data associated with article quality,
> lifecycle, editing trends, etc.
>
>
> The project page is here:
> https://meta.wikimedia.org/wiki/Research:Understanding_the_context_of_citations_in_Wikipedia
>
>
> The project is just getting started so this is a great time to offer
> feedback and suggestions, especially for features of citations that we
> should mine as a first step, since this will affect what the dataset can be
> used for in the future.
>
>
> Looking forward to seeing some of you at WikiCite!!
>
> Andrea
>
>
>
>
> --
>  :: Andrea Forte
>  :: Associate Professor
>  :: College of Computing and Informatics, Drexel University
>  :: http://www.andreaforte.net
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>


-- 
 :: Andrea Forte
 :: Associate Professor
 :: College of Computing and Informatics, Drexel University
 :: http://www.andreaforte.net
