Shilad,

Very cool! Thanks for sharing. I do have a couple of questions...

On Sun, Apr 22, 2012 at 7:15 PM, Shilad Sen <[email protected]> wrote:
> Greetings!
>
> I'm a CS Professor at Macalester College in St. Paul and I'm on research
> sabbatical at GroupLens this year. I've been working with Heather Ford and
> Dave Musicant to explore several research questions related to citation use
> on Wikipedia.
>
> We're still in the middle of analyzing data, and working through parsing
> lots of messy forms of citation references. However, I'll summarize our
> findings as they stand.
>
> As of Jan 1, 2011 there are 6384425 total citations in the main namespace
> for English Wikipedia.

Does this count both templated and non-templated citations? Do you
count citations appearing in any area of the article (e.g. inline
footnotes, "references" section, "further reading" or "bibliography"
section, and "external links")? Or is anything left out?

> Our top-line research questions focus on citations containing URLs, so we
> broke down our results into citations with a URL (78%) and those without
> (22%).
>
> The top 5 domains in citations with a URL are:
> 1. books.google.com (73777 - 1.48%)
> 2. news.bbc.co.uk (52347 - 1.05%)
> 3. www.stat.gov.pl (51598 - 1.03%)
> 4. www.nytimes.com (39454 - 0.79%)
> 5. www.imdb.com (24993 - 0.50%)

This will probably be part of your published results, but it would be
very interesting to see a long-tail list of these domains, and maybe
try to break them out into types -- that would start to get at
questions like how many paywalled journals are cited, etc.
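For the long-tail list, here is a minimal sketch of the tally. It assumes the cited URLs have already been pulled out of the wikitext. The sample URLs and the DOMAIN_TYPES mapping are hypothetical placeholders, not part of the study's data.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain-to-type mapping; a real study would need a curated list
# (news, books, journals, paywalled databases, ...).
DOMAIN_TYPES = {
    "books.google.com": "books",
    "news.bbc.co.uk": "news",
    "www.nytimes.com": "news",
}


def domain_of(url):
    # .hostname is lowercased and drops any port; None if the URL has no host.
    return urlparse(url).hostname or ""


def domain_shares(urls):
    """Return (domain, count, percent) for every domain, most cited first."""
    counts = Counter(domain_of(u) for u in urls)
    total = sum(counts.values())
    return [(d, c, 100.0 * c / total) for d, c in counts.most_common()]


def type_counts(urls, domain_types):
    """Bucket citations by domain type; unmapped domains land in "other"."""
    return Counter(domain_types.get(domain_of(u), "other") for u in urls)


# Hypothetical sample URLs standing in for the extracted citation URLs.
sample = [
    "http://books.google.com/books?id=abc",
    "http://news.bbc.co.uk/2/hi/1234.stm",
    "http://books.google.com/books?id=xyz",
    "http://www.nytimes.com/2010/01/01/story.html",
    "http://www.example.org/page",
]
for domain, count, pct in domain_shares(sample):
    print(f"{domain}\t{count}\t{pct:.2f}%")
print(type_counts(sample, DOMAIN_TYPES))
```

Printing every row of domain_shares() rather than the top 5 gives the full long tail. The size of the "other" bucket shows how much a hand-curated type list still misses.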

> The top 5 types of citations without a URL are:
> 1. cite book (190090 - 13.65%)
> 2. citation needed (148339 - 10.65%)
> 3. cite journal (63722 - 4.58%)
> 4. cite news (25052 - 1.80%)
> 5. citation (22773 - 1.64%)

"Citation needed" is really the absence of a citation, not an actual
citation, right? :) The others look like the standard reference
templates, so my question above about templates applies.
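The following sketch shows one way to separate templated from non-templated footnotes while keeping maintenance tags like {{citation needed}} out of the template counts. It assumes raw article wikitext as input, and the regular expressions are simplifications. They skip reused named refs such as <ref name="x" />, and the footnote count ignores anything outside <ref> tags, such as bibliography or external-links sections.

```python
import re
from collections import Counter

# Body of a <ref>...</ref> footnote. Self-closing reuse tags such as
# <ref name="x" /> never match, because "/" is excluded before the ">".
REF_RE = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)

# Name of a template call: "{{cite book |...", "{{Citation needed}}", etc.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\||\}\})")


def ref_kinds(wikitext):
    """Count <ref> footnotes that use a template vs. plain-text ones."""
    kinds = Counter()
    for body in REF_RE.findall(wikitext):
        kinds["templated" if "{{" in body else "plain"] += 1
    return kinds


def citation_templates(wikitext):
    """Count citation templates by name, anywhere in the article.

    Only "cite ..." and "citation" count; {{citation needed}} flags a
    missing source, so it is deliberately excluded.
    """
    counts = Counter()
    for m in TEMPLATE_RE.finditer(wikitext):
        # MediaWiki treats "_" and " " alike in titles; lowercase for grouping.
        name = m.group(1).replace("_", " ").strip().lower()
        if name.startswith("cite ") or name == "citation":
            counts[name] += 1
    return counts


# Hypothetical wikitext snippet for illustration.
sample = (
    "Foo.<ref>{{cite book |title=A}}</ref> Bar.{{citation needed|date=April 2012}}"
    " Baz.<ref>{{Cite_journal|title=B}}</ref><ref>{{citation|title=C}}</ref>"
    "<ref>Smith, J. (2001). Some Book.</ref>"
)
print(ref_kinds(sample))
print(citation_templates(sample))
```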

> We have also looked at the *inequality* in citation domains. In other words,
> what share of citations do the most popular domains receive? Citation
> inequality has been steadily growing; the Gini coefficient grew from 0.63 in
> Jan 2007 to 0.81 in Nov 2011.
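One common way to compute a Gini coefficient over per-domain citation counts is sketched below. The message doesn't say which estimator or inputs were used, so treating each domain's citation count as one observation is an assumption here. For counts sorted in ascending order x_1 <= ... <= x_n, it uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending counts, ranks starting at 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical per-domain citation counts for one snapshot.
print(gini([100, 10, 5, 1, 1]))
```

With equal counts the result is 0. When a single domain receives every citation it reaches (n-1)/n for n domains, approaching 1. A rise from 0.63 to 0.81 therefore means citations are concentrating in fewer domains.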

Interesting! Thanks so much for sharing!

-- phoebe

_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
