Thanks for sharing your experience and thoughts, Jane. I did not know this was happening. I'm hardly an expert, so that's not surprising, but it's still very troubling to hear. I'm not sure what you mean by setting up a Wikiproject. Do you mean a project on ways to study this gap, i.e., the ideas that have been floated in this thread so far? Or are you thinking of something else?

Greg

On Mon, Aug 26, 2019 at 5:00 AM <[email protected]> wrote:

> Send Wiki-research-l mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
> Today's Topics:
>
> 1. Re: gender balance of Wikipedia citations (WereSpielChequers)
> 2. Re: gender balance of Wikipedia citations (Greg)
> 3. Re: sockpuppets and how to find them sooner (Federico Leva (Nemo))
> 4. Re: gender balance of Wikipedia citations (Jane Darnell)
> 5. Re: gender balance of wikipedia citations (Federico Leva (Nemo))
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 25 Aug 2019 14:28:25 +0100
> From: WereSpielChequers <[email protected]>
> To: Research into Wikimedia content and communities
> <[email protected]>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
> <CAAanWP3qJnMpLB4tr9Eqt4EJLg2kCihkb50UY-d8=
> [email protected]>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Greg,
>
> One of the major step changes in the early growth of the English Wikipedia
> was when a bot called RamBot created stub articles on US places. I think
> they were cited to the census. Others have created articles on rivers in
> countries and various other topics by similar programmatic means. Nowadays
> such article creation is unlikely to get consensus on the English
> Wikipedia, but there are some languages which are very open to such
> creations and have them by the million.
>
> I'm not sure if the fastest updating of existing articles is automated or
> just semiautomated.
> But looking at the bot requests page, it certainly
> looks like some people are running such maintenance bots: "updating GDP by
> country" is a current bot request.
> https://en.wikipedia.org/wiki/Wikipedia:Bot_requests.
>
> I'm not sure how "the ease of a source for purposes of converting into a
> table and generating a separate article for each row" relates to gender.
> But I suspect "number of times cited in wikipedia" deserves less kudos than
> "number of times cited in academia".
>
> WSC
>
> On Sun, 25 Aug 2019 at 05:22, Greg <[email protected]> wrote:
>
> > Thanks again, Kerry. I am hoping that someone with access to more resources
> > (knowledge, support, etc) than I have will look into this.
> >
> > A few more thoughts/questions:
> >
> > 1. The link to the citation dataset from the Medium article ("What are the
> > ten most cited sources on Wikipedia? Let’s ask the data.") is broken.
> > 2. As far as I can tell, every named author in the top ten most cited
> > sources on Wikipedia is male. One piece is by a working group.
> > 3. This line from the Medium piece struck me: "Many of these publications
> > have been cited by Wikipedians across large series of articles using
> > powerful bots and automated tools."
> >
> > Are citations being added by bots? I'm not sure that I understand that line
> > correctly.
> >
> > Greg
>
> ------------------------------
>
> Message: 2
> Date: Sun, 25 Aug 2019 21:16:25 -0700
> From: Greg <[email protected]>
> To: [email protected]
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
> <CAOO9DNvGyfvJkzyRq60cSQi-T80mAkUa=
> [email protected]>
> Content-Type: text/plain; charset="UTF-8"
>
> Thanks, WSC. All very interesting.
>
> I've been thinking about Wikipedia citations less in terms of kudos and
> more in terms of a feedback loop. The cited sources get a significant
> amount of attention (1 click per 200 pageviews is the number I saw
> recently).
> When I imagine total Wikipedia traffic, that's huge. How many
> students are finding sources this way? How many academics? And how many of
> these citations are finding their way back into academic publications via
> this mechanism?
>
> Assuming this is happening to some degree, the gender imbalance of the
> citations is also reflected. If the Wikipedia imbalance is the same as the
> one in academia, that's one thing; if it is better on Wikipedia than it is
> in academia, that's reason to celebrate; if the balance is worse, that's
> concerning. In fact, if the gender imbalance conforms to my fears instead
> of my hopes, and is magnified by the massive website traffic, I imagine it
> could even explain the growth in the citation disparity researchers note in
> their study of political science texts. (I link to that study in a previous
> post; it was mentioned in the Washington Post recently)
>
> There is a very real possibility that Wikipedia is making the citation
> gender gap worse. I think we need to understand what is happening and take
> immediate action if the news is not good.
>
> Greg
>
> ------------------------------
>
> Message: 3
> Date: Mon, 26 Aug 2019 10:59:07 +0300
> From: "Federico Leva (Nemo)" <[email protected]>
> To: Research into Wikimedia content and communities
> <[email protected]>, Aaron Halfaker
> <[email protected]>, Kerry Raymond <[email protected]>
> Subject: Re: [Wiki-research-l] sockpuppets and how to find them sooner
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Please everyone avoid using jargon specific to the English Wikipedia on
> this cross-language and cross-wiki mailing list.
>
> Aaron Halfaker, 23/08/19 17:36:
> > I think embeddings[1] would be a nice way to create a signature.
>
> There is some discussion of acceptable user fingerprinting (presumably
> to be available to CheckUsers only), other than the usual over-reliance
> on IP addresses, in particular at
> <https://meta.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation>.
>
> Federico
>
> ------------------------------
>
> Message: 4
> Date: Mon, 26 Aug 2019 10:17:46 +0200
> From: Jane Darnell <[email protected]>
> To: Research into Wikimedia content and communities
> <[email protected]>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
> <CAFVcA-G87k26nBMr=-e-+C8o6eG0KQvVihH=
> [email protected]>
> Content-Type: text/plain; charset="UTF-8"
>
> Greg,
> Thanks for worrying. This is a known problem and yes, Wikipedia contributes
> to the Gendergap in citations and no, it's not an easy fix, since it is the
> fault of systemic bias in academia. So fewer women are head author on
> scientific publications, and it is generally only the head author that gets
> cited on Wikipedia. This is not just a problem with written works in the
> field of politics. I spend most of my time working on paintings and their
> documented catalogs, so generally I only notice and fix this problem in art
> catalogs. Women rarely appear as lead author mentioned. I will always add
> them in to descriptions when I add items for their works on Wikidata, but I
> can not always find them! Sometimes I can't even create items for them
> because all I have is a name and a work and nothing else available online
> anywhere. You see this most often with women who spent entire careers
> working at a single institution and the institution doesn't bother to
> promote their work or even list them in exhibition catalogs. With luck
> there might be a local obituary, but not always. If you have suggestions
> how to set up a Wikiproject to tackle this it would be a good idea.
> In my
> onwiki experience the Women-in-Red community can be very positive in their
> response to gendergap-related issues for women writers.
> Jane
>
> ------------------------------
>
> Message: 5
> Date: Mon, 26 Aug 2019 11:45:09 +0300
> From: "Federico Leva (Nemo)" <[email protected]>
> To: Research into Wikimedia content and communities
> <[email protected]>, Greg
> <[email protected]>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Greg, 22/08/19 06:19:
> > I do not know the current status of wikicite or if/when this
> > could be used for this inquiry--either to examine all, or a sensible subset
> > of the citations.
>
> If I see correctly, you still did not receive an answer on the data
> available.
>
> It's true that the Figshare item for
> <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia>
> was deleted (I've asked about it on the talk page), but it's trivial to
> run https://pypi.org/project/mwcites/ and extract the data yourself, at
> least for citations which use an identifier.
>
> Some example datasets produced this way:
> https://zenodo.org/record/15871
> https://zenodo.org/record/55004
> https://zenodo.org/record/54799
>
> Once you extract the list of works, the fun begins. You'll need to
> intersect with other data sources (Wikidata, ORCID, other?) and account
> for a number of factors until you manage to find a subset of the data
> which has a sufficiently high signal:noise ratio.
> For instance you might
> need to filter or normalise by
> * year of publication (some year recent enough to have good data but old
>   enough to allow the work to be cited elsewhere, be archived after
>   embargos);
> * country or institution (some probably have better ORCID coverage);
> * field/discipline and language;
> * open access status (per Unpaywall);
> * number of expected pageviews and clicks (for instance using
>   <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
>   <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
>   a link from 10k articles on asteroids or proteins is not the same as
>   being the lone link from a popular article which is not the same as a
>   link buried among a thousand others on a big article);
> * time or duration of the addition (with one of the various diff
>   extraction libraries, content persistence data or possibly historical
>   eventstream if such a thing is available).
>
> To avoid having to invent everything yourself, maybe you can reuse the
> method of some similar study, for instance the one on the open access
> citation advantage or one of the many which studied the gender imbalance
> of citations and peer review in journals.
>
> However, it's very possible that the noise is just too much for a
> general computational method. You might consider a more manual approach
> on a sample of relevant events, for instance the *removal* of citations,
> which is in my opinion more significant than the addition.* You might
> extract all the diffs which removed a citation from an article in the
> last N years (probably they'll be in the order of 10^5 rather than
> 10^6), remove some massive events or outliers, sample 500-1000 of them
> randomly and verify the required data manually.
>
> As usual it will be impossible to have an objective assessment of
> whether that citation was really (in)appropriate in that context
> according to the (English or whatever) Wikipedia guidelines.
> To test that too, you should replicate one of the various studies of the gender
> imbalance of peer review, perhaps one of those which tried to assess the
> impact of a double blind peer review system on the gender imbalance.
> However, because the sources are already published, you'd need to
> provide the agendered information yourself and make sure the
> participants perform their assessment in some controlled environment
> where they don't have access to any gendered information (i.e. where you
> cut them off the internet).
>
> How many years do you have to work on this project? :-)
>
> Federico
>
> (*) I might add a citation just because it's the first result a popular
> search engine gives me, after glancing at the abstract and maybe the
> journal home page; but if I remove an existing citation, hopefully I've
> at least assessed its content and made a judgement about it, apart from
> cases of mass removals for specific problems with certain articles or
> publication venues.
>
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 168, Issue 20
> ************************************************

_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
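
For anyone who wants to try the manual approach Federico outlines in Message 5 (collect the diffs that removed a citation, drop the massive events, then sample 500-1000 at random for hand checking), here is a minimal sketch of just the filtering and sampling step. It assumes the removal events have already been extracted by some other means. The `RemovalEvent` record, its field names, the `sample_removals` helper and the threshold of 5 citations per edit are all hypothetical choices for illustration; none of them come from mwcites or any Wikimedia tool.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RemovalEvent:
    """One edit that removed citations (hypothetical record layout)."""
    rev_id: int
    page_title: str
    citations_removed: int


def sample_removals(events, max_removed_per_edit=5, sample_size=500, seed=0):
    """Drop mass-removal edits, then draw a reproducible random sample.

    max_removed_per_edit is an assumed cut-off for "massive events";
    a real study would need to justify its own threshold.
    """
    kept = [e for e in events if e.citations_removed <= max_removed_per_edit]
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn
    k = min(sample_size, len(kept))  # never ask for more than is available
    return rng.sample(kept, k)


if __name__ == "__main__":
    # Synthetic events only: every 50th edit is a mass removal of 100 citations.
    events = [
        RemovalEvent(i, f"Page {i}", 100 if i % 50 == 0 else 1)
        for i in range(2000)
    ]
    sample = sample_removals(events)
    print(len(sample), "events selected for manual review")
```

Fixing the seed means the same sample can be regenerated later, which matters if several people share the manual verification.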
