Yes, that was my thought. It would be difficult to know the sex (or the gender) 
of an author name on a paper. There would inevitably be a lot that you could 
not determine. And certainly in the sciences multi-author papers are the norm 
and even where you did know the sex/gender of all the authors, do you assign 
some part-score? E.g. 0 for all male, 1 for all female, 0.6 for 3 women and 2 men.
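
To make the part-score idea concrete, here is a rough sketch in Python. The function name, the "F"/"M" labels and the choice to leave unknown authors out of the denominator are all my own assumptions for illustration, and a two-label scheme is of course a simplification; it is one possible scoring rule, not an established method.

```python
from typing import Optional, Sequence

def part_score(author_genders: Sequence[Optional[str]]) -> Optional[float]:
    """Share of women among the authors whose gender is known.

    Each entry is "F", "M", or None (unknown). Unknown authors are
    ignored; if nothing is known, return None rather than guess.
    """
    known = [g for g in author_genders if g in ("F", "M")]
    if not known:
        return None
    return sum(1 for g in known if g == "F") / len(known)

print(part_score(["M", "M"]))                 # 0.0 (all male)
print(part_score(["F", "F"]))                 # 1.0 (all female)
print(part_score(["F", "F", "F", "M", "M"]))  # 0.6 (3 women, 2 men)
print(part_score([None, None]))               # None (cannot determine)
```

Whether unknowns should be dropped, or the whole paper excluded, is a design decision that would need to be reported alongside any result.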

But I am curious why you are asking the question. Is your hypothesis that the 
writing/research of women is under-represented in Wikipedia citations? If so, 
without conducting any research, I'd say "yes, it is under-represented". But my 
reason would be that women are under-represented as writers/researchers in the 
first place. And certainly the older the source, the more likely it is to have 
been written by a man. So to investigate gender bias in citations in Wikipedia, 
you would have to estimate the proportion of men/women (or at least their 
outputs) over time in a given discipline and then ask the question, "taking into 
account the time of publication of a citation and the proportion of men/women 
active in this discipline at that time, do Wikipedia citations show a sex/gender 
bias?". Hmm ... very tricky.

I'd be inclined to suggest starting with a much simpler task. Pick a discipline 
(preferably one with a professional society that can tell you its estimate of 
the male/female ratio over (say) the past 5 years), limit the Wikipedia 
articles to topics in that discipline, and limit the citations to those 
published within the last 5 years. Indeed, you might also limit it to 
publications that are principally from the same country(s) as the professional 
society from which you get the data (as clearly men/women's participation in any 
discipline can vary between countries for cultural reasons). Then you have some 
way to gauge whether Wikipedia is showing more or less gender bias in its 
citations than the discipline itself exhibits through publication. Quite a 
challenge!
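
As a sketch of what "gauging" might look like once you have a baseline, an exact binomial test compares the observed share of women-authored citations with the discipline's share. This is only one possible test, written in plain Python; the function names are mine, each citation is assumed to be classified simply as woman-authored or not, and the numbers in the example are placeholders, not real data.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_binom_p(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed count k under baseline rate p."""
    observed = binom_pmf(k, n, p)
    total = sum(
        prob
        for prob in (binom_pmf(i, n, p) for i in range(n + 1))
        if prob <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, total)

# Placeholder inputs for illustration only (not real measurements):
n_citations = 200        # citations sampled from the chosen discipline
n_women_authored = 50    # of those, classified as woman-authored
baseline_share = 0.30    # share reported by the professional society

p_value = two_sided_binom_p(n_women_authored, n_citations, baseline_share)
print(f"observed share = {n_women_authored / n_citations:.2f}, p = {p_value:.4f}")
```

A small p-value would only say that the citation share differs from the baseline; it would not say why, which is where the qualitative work comes in.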

And of course, it is not Wikipedia that adds citations. It is individual 
contributors who add citations. Does the sex/gender of the contributor have any 
correlation with any observed bias? Again, the task is made more difficult 
because a lot of Wikipedians don't identify their sex/gender.

The other thing to be alert to is the difference in how (I believe) Wikipedians 
cite compared to researchers. As a researcher, I will of course be reading 
papers in my field all the time and what I read will influence my subsequent 
work. Therefore when I write about my research, my citations refer to papers 
that I have already read and whose authors may be familiar to me from their 
other work, from meeting them at conferences, from private correspondence, etc. 
However as a Wikipedian, I am only partially operating that way (mostly when I 
write new articles or significantly expand them, that is, when I am doing the 
research). A lot of the time I am adding citations relating to content other 
people (often new users) have added/changed without citation. These come up on 
my watchlist all the time. What do I do? Of course I could revert saying "no 
citation provided", but that's not the way to encourage new contributors nor to 
grow the encyclopedia, so if the information seems plausible (not obviously 
vandalism), I will attempt to find a citation for it (using tools like Google 
and other topic-specialised search tools). This is what I call the "lucky dip" 
mode of citing, as obviously I have no idea what the source was for the original 
contributor. The sources I find from my search may not already be known to me 
(frequently they are not).

Or to summarise, IMHO, researchers (or Wikipedians in "new content mode") cite a 
source already known to them, whose authors may also be known to them, and so 
could consciously or unconsciously discriminate in citation based on sex/gender 
or other criteria. Wikipedians in "updating mode", by contrast, are likely to be 
citing a source not previously known to them, may be happy just to have found a 
source, and are unlikely to spend much time researching the authors of that 
source to the extent that they could then consciously or unconsciously 
discriminate on sex/gender. If I invest any extra effort in such situations, 
it's probably because the wording of the source is a close match to the 
Wikipedia article, which raises the question of copyright violation (which needs 
to be dealt with by deletion or rewriting) or of the source being a Wikipedia 
mirror (which is obviously not an acceptable citation).

So I suspect whether a citation was added by the same contributor as the 
content it supports or a subsequent contributor probably makes a difference to 
the likelihood of conscious/unconscious discrimination.
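
If you could extract, for each citation, who added the content and who added the citation, that split might be sketched as below. The record layout (the keys "content_editor", "citation_editor" and "woman_authored") is entirely hypothetical; I don't know of a ready-made dataset in this shape, and deriving it from revision histories would be a project in itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

def share_by_mode(citations: Iterable[Mapping]) -> Dict[str, float]:
    """Share of woman-authored citations, split by citing mode.

    "new content": the same editor added the content and the citation.
    "updating":    a different (later) editor supplied the citation.
    """
    counts = defaultdict(lambda: [0, 0])  # mode -> [woman_authored, total]
    for c in citations:
        same_editor = c["content_editor"] == c["citation_editor"]
        mode = "new content" if same_editor else "updating"
        counts[mode][0] += 1 if c["woman_authored"] else 0
        counts[mode][1] += 1
    return {mode: women / total for mode, (women, total) in counts.items()}

# Tiny made-up sample, purely to show the shape of the input:
sample = [
    {"content_editor": "A", "citation_editor": "A", "woman_authored": True},
    {"content_editor": "A", "citation_editor": "B", "woman_authored": False},
    {"content_editor": "C", "citation_editor": "D", "woman_authored": True},
]
print(share_by_mode(sample))
```

Comparing the two shares (each against the discipline baseline) would be one way to test whether the mode of citing makes the difference I suspect.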

Also, finally, often Wikipedia cites web pages and other sources that do not 
have any individual authorship, e.g. government websites. Remember that 
Wikipedia prefers open citations over paywalled citations and a lot of the 
publications behind paywalls are individually authored.

Your proposed research has a lot of interesting challenges and a number of 
limitations. I'm not saying don't do it, but I am saying start very small and 
see if you can find any evidence to support your hypothesis before embarking on 
a larger study. Because contributor behaviour is what you are trying to study, 
you probably need to do both quantitative and qualitative experiments. E.g. I 
have described the two modes of citation I do, but I cannot say how typical my 
behaviour is.

Kerry

-----Original Message-----
From: Wiki-research-l [mailto:[email protected]] On 
Behalf Of Leila Zia
Sent: Friday, 23 August 2019 3:44 AM
To: Research into Wikimedia content and communities 
<[email protected]>
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations

Hi Greg,

A few comments if you're going to go with "proportion of male vs female authors 
of the source material used as citations in arbitrary
articles":

* Please differentiate between sex (female, male, ...) and gender (woman, man, 
...). My understanding from your initial email is that you want to stay focused 
on gender, not sex.

* Unless you have reliable sources about the gender of an author, I would not 
recommend trying to predict what the gender is. (As you may know, it is not 
uncommon in social media studies, for example, to predict the gender of an 
author from their image or their name. These approaches introduce biases and 
social challenges.)

* Re your question about whether WMF has resources to look into this question 
in-house: I can't speak for the whole of WMF, however, I can share more about 
the Research team's direction. As part of our future work, we would like to 
"help contributors monitor violations of core content policies and assess 
information reliability and bias both granularly and at scale". [1] The 
question you proposed can fall under assessing bias in content (considering 
citations as part of the content). I expect us to focus first on the piece 
about violations of core content policies and information reliability and come 
back to the bias question later. As a result, we won't have bandwidth to do 
your proposal in-house at the moment. Sorry about that.

I hope this helps.

Best,
Leila

[1] Section 2 of our Knowledge Integrity whitepaper:
https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_Wikimedia_Research_2030.pdf


On Thu, Aug 22, 2019 at 9:57 AM Greg <[email protected]> wrote:
>
> Hi Kerry,
> Those are all very interesting ways to look at this. I was thinking 
> mostly along the lines of your first bullet point, but I'd be 
> interested in research in any of those areas.
>
> Thanks,
> Greg
>
> On Thu, Aug 22, 2019 at 5:00 AM 
> <[email protected]>
> wrote:
>
> > Today's Topics:
> >
> >    1. gender balance of wikipedia citations (Greg)
> >    2. Re: gender balance of wikipedia citations (Kerry Raymond)
> >
> >
> > --------------------------------------------------------------------
> > --
> >
> > Message: 1
> > Date: Wed, 21 Aug 2019 20:19:18 -0700
> > From: Greg <[email protected]>
> > To: [email protected]
> > Subject: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID:
> >         <
> > caoo9dnty+odo5oqrmzeg1nze-kynylwntd6acheytbyegk8...@mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Greetings!
> >
> > I was looking for information about the gender balance of Wikipedia 
> > citations and no one I've asked knows of any work on this topic. Do you?
> >
> > I think this is an important question.
> >
> > Here's what I've learned so far:
> >
> > Wikipedia citations are currently in the form of text strings. There 
> > is also an initiative to place citations in an annotated structured 
> > repository (wikicite). I do not know the current status of wikicite 
> > or if/when this could be used for this inquiry--either to examine 
> > all, or a sensible subset of the citations.
> >
> > My perspective is that understanding the gender balance is  
> > necessary and urgent. The balance could be better, the same, or 
> > worse than the citation balances we already know, and the scale of the 
> > effect is quite large.
> >
> > Is this a line of inquiry that the wikimedia/wikicite community is 
> > interested in pursuing? If so, what is the best way to get started? 
> > Does the WMF have the resources and interest to look into this matter 
> > inhouse?
> >
> > Thanks for your thoughts.
> >
> > Greg
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Thu, 22 Aug 2019 13:53:45 +1000
> > From: "Kerry Raymond" <[email protected]>
> > To: "'Research into Wikimedia content and communities'"
> >         <[email protected]>
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID: <[email protected]>
> > Content-Type: text/plain;       charset="UTF-8"
> >
> > Could you elaborate a bit more on what you mean by the gender 
> > balance of citations?
> >
> > Are you talking about:
> >
> > * proportion of male vs female authors of the source material used 
> > as citations in arbitrary articles?
> > * the quality/quantity of citations in biography articles of men vs women?
> > * the quality/quantity of citations in articles that are gendered by 
> > some other criteria (e.g. reader interest, romantic comedy vs action film)?
> >
> > Kerry
> >
> > _______________________________________________
> > Wiki-research-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > ------------------------------
> >
> > End of Wiki-research-l Digest, Vol 168, Issue 11
> > ************************************************
> >