Thanks for sharing your experience and thoughts, Jane. I did not know this
was happening--I'm hardly an expert, so that's not surprising, and yet it's
still very troubling to hear. I'm not sure what you mean by setting up a
Wikiproject. Do you mean one about how to study this gap--i.e., the ideas
that have been floated in this thread so far? Or are you thinking of
something else?

Greg

On Mon, Aug 26, 2019 at 5:00 AM <[email protected]>
wrote:

> Today's Topics:
>
>    1. Re: gender balance of Wikipedia citations (WereSpielChequers)
>    2. Re: gender balance of Wikipedia citations (Greg)
>    3. Re: sockpuppets and how to find them sooner (Federico Leva (Nemo))
>    4. Re: gender balance of Wikipedia citations (Jane Darnell)
>    5. Re: gender balance of wikipedia citations (Federico Leva (Nemo))
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 25 Aug 2019 14:28:25 +0100
> From: WereSpielChequers <[email protected]>
> To: Research into Wikimedia content and communities
>         <[email protected]>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
>         <CAAanWP3qJnMpLB4tr9Eqt4EJLg2kCihkb50UY-d8=
> [email protected]>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Greg,
>
> One of the major step changes in the early growth of the English Wikipedia
> was when a bot called RamBot created stub articles on US places. I think
> they were cited to the census. Others have created articles on rivers in
> countries and various other topics by similar programmatic means. Nowadays
> such article creation is unlikely to get consensus on the English
> Wikipedia, but there are some languages which are very open to such
> creations and have them by the million.
>
> I'm not sure if the fastest updating of existing articles is automated or
> just semiautomated. But looking at the bot requests page, it certainly
> looks like some people are running such maintenance bots: "updating GDP by
> country" is a current bot request.
> https://en.wikipedia.org/wiki/Wikipedia:Bot_requests.
>
> I'm not sure how "the ease of a source for purposes of converting into a
> table and generating a separate article for each row" relates to gender.
> But I suspect "number of times cited in wikipedia" deserves less kudos than
> "number of times cited in academia".
>
> WSC
>
> On Sun, 25 Aug 2019 at 05:22, Greg <[email protected]> wrote:
>
> > Thanks again, Kerry. I am hoping that someone with access to more
> > resources (knowledge, support, etc) than I have will look into this.
> >
> > A few more thoughts/questions:
> >
> > 1. The link to the citation dataset from the Medium article ("What are
> > the ten most cited sources on Wikipedia? Let’s ask the data.") is broken.
> > 2. As far as I can tell, every named author in the top ten most cited
> > sources on Wikipedia is male. One piece is by a working group.
> > 3. This line from the Medium piece struck me: "Many of these publications
> > have been cited by Wikipedians across large series of articles using
> > powerful bots and automated tools."
> >
> > Are citations being added by bots? I'm not sure that I understand that
> > line correctly.
> >
> > Greg
> >
> >
> >
> >
> >
>
>
> ------------------------------
>
> Message: 2
> Date: Sun, 25 Aug 2019 21:16:25 -0700
> From: Greg <[email protected]>
> To: [email protected]
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
>         <CAOO9DNvGyfvJkzyRq60cSQi-T80mAkUa=
> [email protected]>
> Content-Type: text/plain; charset="UTF-8"
>
> Thanks, WSC. All very interesting.
>
> I've been thinking about Wikipedia citations less in terms of kudos and
> more in terms of a feedback loop. The cited sources get a significant
> amount of attention (1 click per 200 pageviews is the number I saw
> recently). When I imagine total Wikipedia traffic, that's huge. How many
> students are finding sources this way? How many academics? And how many of
> these citations are finding their way back into academic publications via
> this mechanism?
>
> Assuming this is happening to some degree, the gender imbalance of the
> citations is also reflected. If the Wikipedia imbalance is the same as the
> one in academia, that's one thing; if it is better on Wikipedia than it is
> in academia, that's reason to celebrate; if the balance is worse, that's
> concerning. In fact, if the gender imbalance conforms to my fears instead
> of my hopes, and is magnified by the massive website traffic, I imagine it
> could even explain the growth in the citation disparity researchers note in
> their study of political science texts. (I link to that study in a previous
> post; it was mentioned in the Washington Post recently)
>
> There is a very real possibility that Wikipedia is making the citation
> gender gap worse. I think we need to understand what is happening and take
> immediate action if the news is not good.
>
> Greg
>
> >
> >
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 26 Aug 2019 10:59:07 +0300
> From: "Federico Leva (Nemo)" <[email protected]>
> To: Research into Wikimedia content and communities
>         <[email protected]>, Aaron Halfaker
>         <[email protected]>, Kerry Raymond <[email protected]>
> Subject: Re: [Wiki-research-l] sockpuppets and how to find them sooner
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Please everyone avoid using jargon specific to the English Wikipedia on
> this cross-language and cross-wiki mailing list.
>
> Aaron Halfaker, 23/08/19 17:36:
> > I think embeddings[1] would be a nice way to create a signature.
>
> There is some discussion of acceptable user fingerprinting (presumably
> to be available to CheckUsers only), other than the usual over-reliance
> on IP addresses, in particular at
> <https://meta.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation>.
>
> Federico
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 26 Aug 2019 10:17:46 +0200
> From: Jane Darnell <[email protected]>
> To: Research into Wikimedia content and communities
>         <[email protected]>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
>         <CAFVcA-G87k26nBMr=-e-+C8o6eG0KQvVihH=
> [email protected]>
> Content-Type: text/plain; charset="UTF-8"
>
> Greg,
> Thanks for worrying. This is a known problem and yes, Wikipedia contributes
> to the gender gap in citations and no, it's not an easy fix, since it is the
> fault of systemic bias in academia. Fewer women are lead author on
> scientific publications, and it is generally only the lead author who gets
> cited on Wikipedia. This is not just a problem with written works in the
> field of politics. I spend most of my time working on paintings and their
> documented catalogs, so generally I only notice and fix this problem in art
> catalogs. Women are rarely mentioned as lead author. I will always add
> them in to descriptions when I add items for their works on Wikidata, but I
> cannot always find them! Sometimes I can't even create items for them
> because all I have is a name and a work and nothing else available online
> anywhere. You see this most often with women who spent entire careers
> working at a single institution and the institution doesn't bother to
> promote their work or even list them in exhibition catalogs. With luck
> there might be a local obituary, but not always. If you have suggestions on
> how to set up a Wikiproject to tackle this, it would be a good idea. In my
> onwiki experience the Women-in-Red community can be very positive in their
> response to gender-gap-related issues for women writers.
> Jane
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 26 Aug 2019 11:45:09 +0300
> From: "Federico Leva (Nemo)" <[email protected]>
> To: Research into Wikimedia content and communities
>         <[email protected]>, Greg
>         <[email protected]>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Greg, 22/08/19 06:19:
> > I do not know the current status of wikicite or if/when this
> > could be used for this inquiry--either to examine all, or a sensible
> > subset of the citations.
>
> If I see correctly, you still have not received an answer about the
> available data.
>
> It's true that the Figshare item for
> <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia>
> was deleted (I've asked about it on the talk page), but it's trivial to
> run https://pypi.org/project/mwcites/ and extract the data yourself, at
> least for citations which use an identifier.
>
> Some example datasets produced this way:
> https://zenodo.org/record/15871
> https://zenodo.org/record/55004
> https://zenodo.org/record/54799
>
> Once you extract the list of works, the fun begins. You'll need to
> intersect with other data sources (Wikidata, ORCID, other?) and account
> for a number of factors until you manage to find a subset of the data
> which has a sufficiently high signal:noise ratio. For instance you might
> need to filter or normalise by
> * year of publication (some year recent enough to have good data but old
> enough to allow the work to be cited elsewhere, be archived after
> embargos);
> * country or institution (some probably have better ORCID coverage);
> * field/discipline and language;
> * open access status (per Unpaywall);
> * number of expected pageviews and clicks (for instance using
> <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
> a link from 10k articles on asteroids or proteins is not the same as
> being the lone link from a popular article which is not the same as a
> link buried among a thousand others on a big article);
> * time or duration of the addition (with one of the various diff
> extraction libraries, content persistence data or possibly historical
> eventstream if such a thing is available).
>
> To avoid having to invent everything yourself, maybe you can reuse the
> method of some similar study, for instance the one on the open access
> citation advantage or one of the many which studied the gender imbalance
> of citations and peer review in journals.
>
> However, it's very possible that the noise is just too much for a
> general computational method. You might consider a more manual approach
> on a sample of relevant events, for instance the *removal* of citations,
> which is in my opinion more significant than the addition.* You might
> extract all the diffs which removed a citation from an article in the
> last N years (probably they'll be in the order of 10^5 rather than
> 10^6), remove some massive events or outliers, sample 500-1000 of them
> randomly and verify the required data manually.
>
> As usual it will be impossible to have an objective assessment of
> whether that citation was really (in)appropriate in that context
> according to the (English or whatever) Wikipedia guidelines. To test
> that too, you should replicate one of the various studies of the gender
> imbalance of peer review, perhaps one of those which tried to assess the
> impact of a double blind peer review system on the gender imbalance.
> However, because the sources are already published, you'd need to
> provide the agendered information yourself and make sure the
> participants perform their assessment in some controlled environment
> where they don't have access to any gendered information (i.e. where you
> cut them off from the internet).
>
> How many years do you have to work on this project? :-)
>
> Federico
>
> (*) I might add a citation just because it's the first result a popular
> search engine gives me, after glancing at the abstract and maybe the
> journal home page; but if I remove an existing citation, hopefully I've
> at least assessed its content and made a judgement about it, apart from
> cases of mass removals for specific problems with certain articles or
> publication venues.
>
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 168, Issue 20
> ************************************************
>