[Wiki-research-l] Re: Generation gap widens between admins and other editors on the English Wikipedia.

2023-08-17 Thread Stuart A. Yeates
> Given that the total size of the community is stable or slowly growing, I
> don't see why so few candidates are coming forward for RFA.

I've written thousands of en.wiki biographies and noticed that the hardest
people to find sources for are lawyers (and by extension judges) because
these people scrupulously keep their private life private to reduce the
chances that they can be challenged as not impartial or having a conflict
of interest. It doesn't stop them from having interests or being partial,
of course; they're only human, and after the fact these come out, often in
an obituary.

A case in point is Patsy Reddy, who when appointed Dame had so few online
sources that she didn't qualify as notable on en.wiki (I spent a whole day
looking when the honours were announced). Two years later she was made
governor general. There are now > 50 sources in her article, but only the
tiny primary-source print-only "New Zealand Who's Who Aotearoa 2001" is
prior to her being appointed Dame. She did a metric buttload of stuff, but
in a way to keep out of the public / press eye.

We have built a system that does exactly the same in adminship ---
ruthlessly select for the kinds of people and kinds of lived experience
that keep themselves off the internet except in the most innocuous of ways.
The system literally selects for wiki-lawyers who keep their wiki-lawyering
quiet.

THIS is why so few candidates are coming forward for RFA / why so many are
scared to put themselves forward. It may be an inherent property of all
quasi-legal systems, I'm not sure.

[Disclaimer: I'm currently T-BANNED from BLPs on en.wiki]
[Disclaimer: I'm from the cohort recruited from then-competitor everything2
where I'd been editing since prior to the founding of wikipedia; I'm more
part of the 'system' than most.]

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, 17 Aug 2023 at 00:31, WereSpielChequers wrote:
>
> Probably the biggest change to the process came with the unbundling of
> rollback in 2008, at least that was when the biggest drop came in RFAs, and
> "good vandalfighter" ceased to be sufficient to pass RFA. You also had to
> show some contribution to building the pedia. We now have over six thousand
> rollbackers and fewer than 900 admins, so I think that unbundling did make
> it easier to get Rollback, though arguably Rollback itself is now a
> redundant userright as anyone can just opt in to tools like twinkle.
>
>
>
> I wasn't around in the early years; I started editing in 2007 towards the
> end of the exponential growth era and only started to pay attention to RFA
> in 2008, though I have looked at quite a few earlier RFAs. I think that
> the criteria haven't changed much in a decade - maybe there has been an
> increase in the requirements for tenure and/or edits, or rather someone
> with 3,000 to 4,000 unautomated edits can expect a few opposes as would
> someone with between one and two years active editing. What I can't explain
> is why we appointed 121 new admins in 2009 but averaged less than 20 new
> admins a year for the last ten years. I really don't think that the de
> facto criteria for adminship are very different now compared to 2009:
>
> There are people who care about the deletion button and don't want someone
> who will be too soft or harsh with it.
>
> There are people who care about the block button, including those who don't
> want someone blocking the regulars who hasn't gone through the process of
> building content.
>
> There are people who think that all admins should be legally adult.
>
> And there are those who want to stop certain long term problems returning
> in a new guise. One assumption made here is that the mask will slip if one
> of those editors tries to make nice for an entire year in order to make
> admin.
>
>
> Given that the total size of the community is stable or slowly growing, I
> don't see why so few candidates are coming forward for RFA.
>
> WSC
>
> On Wed, 16 Aug 2023 at 03:24, Samuel Klein  wrote:
>
> > The iron law of gaps...
> >
> > On Tue, Aug 15, 2023 at 5:44 PM The Cunctator wrote:
> >
> > > IMHO: The amount of jargon and legalistic booby traps to navigate now to
> > > become an admin is gargantuan, and there isn't a strong investment in a
> > > development ladder.
> >
> >
> > Yes.  More generally, a shift towards a Nupedia model (elaborate seven-step
> > processes, focus on quality, focus on knowing lots of precedent and not
> > making mistakes, spending more time justifying actions than making them) is
> > making sweeping, mopping, and bureaucracy generally more work, less fun,
> > and more exclusionary.
> >
> > Perhaps asking everyone to adopt someone new, or sticking "provisional"
> > tags on a family of palette-swap roles that are Really Truly NBD
> > We Mean It This Time, would help stave off the iron law in a
> > repeatable way.
> >
> > SJ

[Wiki-research-l] Re: Can I have a Wikipedia article written about me?

2022-07-27 Thread Stuart A. Yeates
If you want a Wikipedia article about you, you'll need independent
secondary sources with in-depth coverage of you and at least a couple of
editors who can read the language(s) they're in. If you're looking at the
English language Wikipedia, I suggest starting at the Teahouse. Other
Wikipedias have other systems.

Cheers
Stuart

On Wed, 27 Jul 2022, 16:30 Turritopsis Dohrnii Teo En Ming, <
tdtem.opensou...@gmail.com> wrote:

> Subject: Can I have a Wikipedia article written about me?
>
> Good day from Singapore,
>
> Can I have a Wikipedia article written about me?
>
> I am looking forward to your reply.
>
> Thank you.
>
> Regards,
>
> Mr. Turritopsis Dohrnii Teo En Ming
> Targeted Individual in Singapore
> 27 July 2022 Wed
> Blogs:
> https://tdtemcerts.blogspot.com
> https://tdtemcerts.wordpress.com
> ___
> Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
> To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org
>


[Wiki-research-l] Re: Edit Summary Stats / Research?

2021-08-03 Thread Stuart A. Yeates
There is a long-standing tool to search them at

https://sigma.toolforge.org/summary.py?name=Stuartyeates=re-review=500=enwiki=Wikipedia

In case you're looking for code to reuse.

cheers
stuart
--
...let us be heard from red core to black sky

On Wed, 4 Aug 2021 at 05:38, WereSpielChequers wrote:
>
> Dear Isaac,
>
> I'm not aware of any research on this. But there are a couple of common
> assumptions that you could check as part of any research.
>
>
> 1. One of the reasons why any suggestion that we make edit summaries
>    compulsory is resisted is that as long as they are optional, blank
>    edit summaries are a great way to identify vandals.
> 2. There is also a certain amount of "sneaky vandalism" denoted by edits
>    that get reverted, or reverted and the perpetrators get warned for
>    vandalism or blocked as a "vandalism only account".
> 3. Though we admins have the technology to blank people's edit summaries,
>    it is very rarely used.
>
> Regards
> Jonathan
>
> On Tue, 3 Aug 2021 at 16:20, Isaac Johnson  wrote:
>
> > Does anyone know of any research or statistics around edit summary
> > usage on Wikipedia? All I could find in a quick scan was some
> > statistics from 2010 (
> > https://meta.wikimedia.org/wiki/Usage_of_edit_summary_on_Wikipedia). I'm
> > curious if anyone has more updated statistics, or, even better: a more
> > thorough analysis of how edit summaries are used by editors -- i.e. how
> > complete they are, to what degree they represent the "what" vs. the "why",
> > how often they are misleading, etc.
> >
> > Best,
> > Isaac
> >
> > --
> > Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation


Re: [Wiki-research-l] How to quantifying "effort" or "time spent" put into articles?

2020-10-20 Thread Stuart A. Yeates
I suggest that you talk to editors who have got an article to
featured-article status about their estimates of the work involved
(commonly cited as 3-6 months full time work). If you then count their
edits on the article and divide one by the other...
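
As a rough illustration of that division only: the sketch below turns a self-reported work estimate and an edit count into hours per edit. The function name, the 160-hour working month and the 1,200-edit count are assumptions made for the example, not figures from this thread; only the 3-6 month range comes from the message above.

```python
def hours_per_edit(months_full_time: float, edit_count: int,
                   hours_per_month: float = 160.0) -> float:
    """Average hours of work represented by one edit.

    hours_per_month = 160 assumes a 40-hour week and four weeks per month;
    adjust it to whatever "full time" means for the editors interviewed.
    """
    if edit_count <= 0:
        raise ValueError("edit_count must be positive")
    return months_full_time * hours_per_month / edit_count


# Hypothetical editor: estimates 3-6 months of full-time work on one
# featured article and made 1,200 edits to it (made-up count).
low = hours_per_edit(3, 1200)
high = hours_per_edit(6, 1200)
print(f"{low:.2f} to {high:.2f} hours per edit")
```

The result is only as good as the editor's recollection, and it averages over very different kinds of edits, so it is best read as an order-of-magnitude figure.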

cheers
stuart


--
...let us be heard from red core to black sky

On Wed, 21 Oct 2020 at 12:45, Nate TeBlunthuis  wrote:
>
> Thanks everyone for sharing your ideas. I really appreciate you taking
> the time! This discussion raised a lot of useful points.
>
> I think that as a first approximation, an edit-sessions approach seems
> okay. I think it is reasonable for the kind of purposes I have in mind
> to ignore time spent incidental to an edit (but not directly spent
> editing) like walking to a building or reading an article before
> beginning to edit. Cases like adding references or images from commons
> seem potentially important. Something like what Isaac or Aaron suggested,
> using a model to better estimate the amount of time that different
> kinds of edit take (potentially using task categories, links, images,
> text diff metrics, references as features), would be a good but
> higher-effort measurement approach. Ignoring obvious vandalism is
> clearly an important step.
>
> --
>
> Nate
>
>
> On 10/20/20 3:28 PM, Isaac Johnson wrote:
> > Thanks for raising this question Nate! Really interested in this
> > discussion. Another option to throw into the mix though it would require a
> > fair bit of work:
> >
> > The Growth team put together a taxonomy of tasks that editors do and their
> > perceived difficulty level:
> > https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Newcomer_tasks#/media/File:Newcomer_tasks_-_Difficulty_filters.png
> > I'm not sure how complete the taxonomy is, but you could:
> >
> > - come up with a complete-ish taxonomy of edit types (another option to
> > consider: https://github.com/diyiy/Wiki_Semantic_Intention)
> > - assign each edit type a general difficulty level and time estimate
> > (hopefully backed up with some empirical data from edit session data or
> > which user groups engage in a given type of edit though for all the
> > reasons mentioned by you and others, that can be really hard to calculate)
> > - build detectors for each type of edit (unfortunately this is going to
> > require parsing a lot of wikitext but hopefully you can simply do things
> > like just compare the count # of links, images, templates, etc. in the
> > previous revision and current revision with mwparserfromhell)
> > - classify each edit based on what changes it had to the difficulty
> > level and therefore estimated time/expertise involved.
> >
> > Alternatively, you could just count up # of links etc. for the current
> > version of the page and multiply each link etc. by estimated time to add.
> > This would be highly conservative though because it would miss all the
> > collaboration / updating / adjusting / etc. so it would be more of an
> > estimate of minimum time to build a page.
> >
> > On Tue, Oct 20, 2020 at 5:43 PM Ziko van Dijk  wrote:
> >
> >> Hello Nate,
> >>
> >> Thank you for your interesting question, and thank you for your paper
> >> with Shaw and Mako Hill 2018 on the rise and decline of populations.
> >>
> >> Your endeavour seems to be most difficult and hardly possible. My
> >> thinking would be the following: there are certain patterns behind an
> >> edit, or: editing activity. For example, imagine someone who reads an
> >> article and corrects some minor typos and linguistic issues on the
> >> going. How long is the article, how long may it take to read it? How
> >> long may it take to make those edits (or, one big edit)?
> >>
> >> On the one hand, you may ask editors or observe them to find out how
> >> much time they need for this kind of activity. On the other hand, you
> >> may try to find this pattern back in certain characteristics of the
> >> edit (edit of the whole page; small changes of letters at several
> >> locations of the text).
> >>
> >> It would be a philosophical question what is exactly part of the
> >> editing activity. If I read a whole article for my own purposes, as a
> >> reader, without intention to edit, and then I find a small error and
> >> quickly correct it - does that make my whole reading of the article a
> >> part of my editing activity? I would have read the article anyway.
> >>
> >> There would be many other patterns. E.g., someone adds a picture. How
> >> much time this takes, that depends on whether the editor has searched
> >> for it on Commons, or took the same one he found in a different
> >> language version. So, if the picture appears in other language
> >> versions, you assume that the editor needed 10 minutes to find it, and
> >> otherwise, that he needed only two minutes to find the picture on a
> >> different language version?
> >>
> >> A last example: On a meeting of administrators I remember an admin
> >> 

Re: [Wiki-research-l] Editor surveys on race/ethnicity/religion

2020-09-21 Thread Stuart A. Yeates
Everyone from China and Saudi Arabia (two countries which
systematically block wikipedia) is likely to be taking technical
measures to disguise their country.

That's a lot of people, but I'm not sure how many editors that is.

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, 22 Sep 2020 at 07:01, L.Gelauff  wrote:
>
> Just thinking out loud.. are we looking for actual race/ethnicity/etc data,
> or is it rather that we're looking for whether someone belongs to an under
> represented group in their specific situation? If it is the latter, there
> may be ways to phrase the question without asking for actual demographics.
>
> Stuart; do you have any indication for how large a portion that group is? I
> am aware of public pages being potentially disguised as such, but wasn't
> familiar with stories about this happening in a survey context (although it
> does not sound implausible).
>
> Best,
>
> Lodewijk
>
> On Mon, Sep 21, 2020 at 11:39 AM Stuart A. Yeates  wrote:
>
> > Another point not touched on by other commenters is that even if ideal
> > race / ethnicity question(s) were developed for every country in the
> > world, users from some countries commonly disguise their country due
> > to censorship in that country, so there would be a whole class of
> > systematic errors where we asked users the wrong country's
> > question(s).
> >
> > cheers
> > stuart
> > --
> > ...let us be heard from red core to black sky
> >
> > On Tue, 22 Sep 2020 at 05:00, Isaac Johnson  wrote:
> > >
> > > Adding another point from Rebecca Maung who helps run the annual Community
> > > Insights surveys <https://meta.wikimedia.org/wiki/Community_Insights> but
> > > isn't currently on this listserv so couldn't respond directly:
> > >
> > > This year's Community Insights survey (reporting scheduled for early 2021)
> > > is the first that will ask Wikimedia contributors about race and
> > > ethnicity-- but only in certain geographies. Due to all the excellent
> > > points made in this thread, we have never asked a race or ethnicity
> > > question, but this year we decided to start asking locally relevant
> > > questions where we could. This year only editors in the US and Britain will
> > > see a question about race or ethnicity, tailored to their local contexts.
> > > In the coming years, we will expand the countries and geographies that see
> > > a question like this, prioritizing places where there is a larger editor
> > > presence and local laws and norms allow such questions. We have not yet
> > > discussed asking about religion in the Community Insights survey.
> > >
> > > On Mon, Sep 21, 2020 at 9:20 AM Isaac Johnson wrote:
> > >
> > > > As pointed out by others, the highly contextualized nature of religion,
> > > > race, and ethnicity between countries makes it very difficult to impossible
> > > > to craft questions that are not overly reductive but still somewhat
> > > > universal. Despite this challenge, understanding diversity in a way that
> > > > captures these aspects is obviously quite important as they often figure
> > > > very strongly into power and representation within history, media, etc.
> > > >
> > > > In general, if you're looking for large-scale surveys of editors, the Meta
> > > > category (Category:Editor surveys
> > > > <https://meta.wikimedia.org/wiki/Category:Editor_surveys>) is actually
> > > > quite complete (same for readers
> > > > <https://meta.wikimedia.org/wiki/Category:Reader_surveys>). In
> > > > particular, I wrote what little I could find about these topics into this
> > > > section of our recently published knowledge gaps taxonomy:
> > > > https://arxiv.org/pdf/2008.12314.pdf#subsubsection.3.1.7
> > > >
> > > > The April 2011 editor survey took the approach of just asking people how
> > > > they felt they were different from others in the community -- this specific
> > > > question is not one that I would advocate today (asking people to identify
> > > > all the ways in which they may be "outsiders" is not particularly
> > > > welcoming) but this is also probably the style of approach (asking people
> > > > how well they feel represented within Wikipedia content or editor
> > > > community) that you'd have to take to get

Re: [Wiki-research-l] Editor surveys on race/ethnicity/religion

2020-09-21 Thread Stuart A. Yeates
Another point not touched on by other commenters is that even if ideal
race / ethnicity question(s) were developed for every country in the
world, users from some countries commonly disguise their country due
to censorship in that country, so there would be a whole class of
systematic errors where we asked users the wrong country's
question(s).

cheers
stuart
--
...let us be heard from red core to black sky

On Tue, 22 Sep 2020 at 05:00, Isaac Johnson  wrote:
>
> Adding another point from Rebecca Maung who helps run the annual Community
> Insights surveys <https://meta.wikimedia.org/wiki/Community_Insights> but
> isn't currently on this listserv so couldn't respond directly:
>
> This year's Community Insights survey (reporting scheduled for early 2021)
> is the first that will ask Wikimedia contributors about race and
> ethnicity-- but only in certain geographies. Due to all the excellent
> points made in this thread, we have never asked a race or ethnicity
> question, but this year we decided to start asking locally relevant
> questions where we could. This year only editors in the US and Britain will
> see a question about race or ethnicity, tailored to their local contexts.
> In the coming years, we will expand the countries and geographies that see
> a question like this, prioritizing places where there is a larger editor
> presence and local laws and norms allow such questions. We have not yet
> discussed asking about religion in the Community Insights survey.
>
> On Mon, Sep 21, 2020 at 9:20 AM Isaac Johnson  wrote:
>
> > As pointed out by others, the highly contextualized nature of religion,
> > race, and ethnicity between countries makes it very difficult to impossible
> > to craft questions that are not overly reductive but still somewhat
> > universal. Despite this challenge, understanding diversity in a way that
> > captures these aspects is obviously quite important as they often figure
> > very strongly into power and representation within history, media, etc.
> >
> > In general, if you're looking for large-scale surveys of editors, the Meta
> > category (Category:Editor surveys
> > <https://meta.wikimedia.org/wiki/Category:Editor_surveys>) is actually
> > quite complete (same for readers
> > <https://meta.wikimedia.org/wiki/Category:Reader_surveys>). In
> > particular, I wrote what little I could find about these topics into this
> > section of our recently published knowledge gaps taxonomy:
> > https://arxiv.org/pdf/2008.12314.pdf#subsubsection.3.1.7
> >
> > The April 2011 editor survey took the approach of just asking people how
> > they felt they were different from others in the community -- this specific
> > question is not one that I would advocate today (asking people to identify
> > all the ways in which they may be "outsiders" is not particularly
> > welcoming) but this is also probably the style of approach (asking people
> > how well they feel represented within Wikipedia content or editor
> > community) that you'd have to take to get information on ethnicity / race /
> > religion without writing country-specific questions:
> > https://upload.wikimedia.org/wikipedia/commons/7/76/Editor_Survey_Report_-_April_2011.pdf#page=65
> >
> > On Mon, Sep 21, 2020 at 6:12 AM Stuart A. Yeates wrote:
> >
> >> The ethnicity / race question is an incredibly hard question to
> >> compose in an internationalised way.
> >>
> >> Pretty much every country in the world uses different terms and there
> >> are some very confusing cases where the same term is used in different
> >> countries to mean very different things (e.g., "Asian" in UK English vs
> >> New Zealand English). This is derived from varying legal definitions
> >> (for example blood quantum vs one-drop laws); the history of
> >> colonisation and waves of immigration to the country; along with
> >> cultural differences.
> >>
> >> cheers
> >> stuart
> >> --
> >> ...let us be heard from red core to black sky
> >>
> >> On Mon, 21 Sep 2020 at 21:55, Federico Leva (Nemo) wrote:
> >> >
> >> > Su-Laine Brodsky, 21/09/20 08:19:
> >> > > I’m wondering if any large-scale surveys have been done that ask
> >> > > Wikipedia editors about their race, ethnicity, or religion?
> >> >
> >> > What international standards exist to phrase such questions?
> >> > Denominations commonly used in surveys in one country may be considered
> >> > horrific or even illegal in others.
> >> >
> >> > I see OECD considers it

Re: [Wiki-research-l] Editor surveys on race/ethnicity/religion

2020-09-21 Thread Stuart A. Yeates
The ethnicity / race question is an incredibly hard question to
compose in an internationalised way.

Pretty much every country in the world uses different terms and there
are some very confusing cases where the same term is used in different
countries to mean very different things (e.g., "Asian" in UK English vs
New Zealand English). This is derived from varying legal definitions
(for example blood quantum vs one-drop laws); the history of
colonisation and waves of immigration to the country; along with
cultural differences.

cheers
stuart
--
...let us be heard from red core to black sky

On Mon, 21 Sep 2020 at 21:55, Federico Leva (Nemo)  wrote:
>
> Su-Laine Brodsky, 21/09/20 08:19:
> > I’m wondering if any large-scale surveys have been done that ask Wikipedia 
> > editors about their race, ethnicity, or religion?
>
> What international standards exist to phrase such questions?
> Denominations commonly used in surveys in one country may be considered
> horrific or even illegal in others.
>
> I see OECD considers it a difficult problem too:
>
> 
>
> 76.  Current NSOs collection practices cluster around three broad
> categories: 1) all OECD countries collect information on some diversity
> proxies such as country of birth (36 OECD members); 2) a small majority,
> mostly Eastern European countries, the United Kingdom and Ireland,
> gather additional information on race and ethnicity (16 OECD members);
> and 3) only a handful of countries in the Americas and Oceania collect
> data on indigenous identity (6 OECD members). Diversity statistics are
> collected from the perspective of either enumerating the size of the
> relevant populations (typically in the census) or of comparing
> well-being outcomes across different population groups.
>
> 77.  While privacy and human rights legislation sometimes prevents or
> discourages the routine collection of diversity data, the need to
> improve data availability and quality is being recognised in most
> countries. Many countries are piloting the addition of new ethnic
> response options to more accurately reflect the make-up of their
> societies (e.g. Ireland, the United States), while Belgium is
> considering allowing collection of race and ethnicity data within the
> restrictions imposed by the national legal framework. Within the
> European Statistical System, the inclusion of more detailed migration
> information is also being considered: The Framework Regulation for
> Production of European Statistics on Persons and Households European
> foresees the incorporation of questions on the country of birth of the
> respondent’s parents in the Labour Force Surveys (from 2020), the
> European Health Interview Survey, the European Union Statistics on
> Income and Living Conditions, the Household Budget Surveys and the
> Community surveys on ICT usage in households and by individuals. The
> European Union Agency for Fundamental Rights is pursuing its Roma and
> Travellers Survey to collect comparable data in six selected Member
> States in 2018 (FRA, 2018[77]).
>
> 
>
> https://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=SDD/DOC(2018)9=En
>
> Federico
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] Requesting access to deleted pages for research purposes

2020-07-09 Thread Stuart A. Yeates
I recently completed a project writing en.wiki articles for all female
and indigenous professors in my country, .nz.

I now write pronounless biographies, because there were a significant
number whose gender wasn't apparent from their public persona. My
guess is that women and LGBTIA+ minorities are incentivised to remove
markers of their gender from their online presence to keep a lower
profile to avoid the trolls and bigots.

There were also a number who clearly appeared to be a certain
ethnicity based on their staff photo, but where there were no reliable
sources as to that ethnicity.

I also had one person ask for their article to be deleted. [If this
is of interest I can send details to you directly, but I will not post
their details to a public forum and ask that you refrain from this also.]

I look forward to reading your experimental design taking these
factors into account.

cheers
stuart
--
...let us be heard from red core to black sky

On Fri, 10 Jul 2020 at 06:43, Mackenzie Lemieux wrote:
>
> Dear Wiki Community,
>
> My name is Mackenzie Lemieux and I am a neuroscience researcher at the Salk
> Institute for Biological Studies and I am interested in exploring biases on
> Wikipedia.
>
> My research hypothesis is that gender or ethnicity mediate the rate of
> flagging and deletion of pages for women in STEM.  I hope to
> retrospectively analyze Wikipedia's deletion history, harvest the
> biographical articles about scientists that have been created over the past
> n years and then confirm the gender and ethnicity of a large sample.
>
> It appears that we can identify deleted pages with Wikipedia's deletion log,
> but to actually see the page that was deleted we need to be members of one
> of these Wikipedia user groups: Administrators, Oversighters, Researchers,
> Checkusers.
>
> Does anyone have advice on how to obtain researcher status or is there
> anyone willing to collaborate who has access to the data we need?
>
> Warmly,
> Mackenzie Lemieux
>
>


Re: [Wiki-research-l] NEEDS COMMENT ASAP: Wikimedia Project Grant Proposal to address BIAS!

2020-03-28 Thread Stuart A. Yeates
Hi Chris

This, for me, is a diversity issue.

Over the years I only seem to give feedback on grants when prompted by
someone (it's been my only involvement in grants to date) and every
time a potential grantee has prompted me to take a look at their
grant, some other grant seems even better than the one I've been
prompted to take a look at.

I suggest that rather than encouraging potential grantees to solicit
feedback on their own grant, you encourage them to solicit feedback on all
the grants up for funding. This would tend to minimize the advantage of
well-connected, established people / groups getting more / better
feedback and thus being more likely to be grant recipients.

cheers
stuart
--
...let us be heard from red core to black sky

On Sun, 15 Mar 2020 at 13:25, Chris "Jethro" Schilling wrote:
>
> Hi Stuart,
>
> We ask Project Grant applicants to invite feedback from relevant
> communities on their proposals, such as on this mailing list. While I
> understand it is helpful for folks to be aware of all research-related
> Project Grant proposals, this is something you should ask a program officer
> for. It's not something an individual applicant should be expected to
> provide. I am supporting the Project Grants round, so here is that list of
> research-related proposals:
>
> - Modelling and Populating Performing Arts Data in Wikidata
>   <https://meta.wikimedia.org/wiki/Grants:Project/Fjjulien/Modelling_and_Populating_Performing_Arts_Data_in_Wikidata>
> - Misinformation And Its Discontents: Narrative Recommendations on
>   Wikipedia's Vulnerabilities and Resilience
>   <https://meta.wikimedia.org/wiki/Grants:Project/Ocaasi/Misinformation_And_Its_Discontents:_Narrative_Recommendations_on_Wikipedia%27s_Vulnerabilities_and_Resilience>
> - Translatathon@Uniba: Developing Transversal Competences
>   <https://meta.wikimedia.org/wiki/Grants:Project/Uniba_-_Dipartimento_LELIA/Translatathon@Uniba:_Developing_Transversal_Competences>
> - Community Health Metrics: Understanding Editor Drop-off
>   <https://meta.wikimedia.org/wiki/Grants:Project/Community_Health_Metrics:_Understanding_Editor_Drop-off>
> - Geogap in South Asia and Dutch Caribbean on Wikimedia projects
>   <https://meta.wikimedia.org/wiki/Grants:Project/Michelle_Boon_%26_Layka100/Geogap_in_South_Asia_and_Dutch_Caribbean_on_Wikimedia_projects>
> - Addressing Implicit Bias on Wikipedia
>   <https://meta.wikimedia.org/wiki/Grants:Project/JackieKoerner/Addressing_Implicit_Bias_on_Wikipedia>
>
> The community review period is ending soon, but we will still be reviewing
> feedback through the committee review period starting on 17 March 2020.
>
> With thanks,
>
> Chris
>
>
> *Chris Schilling* (him/his/they/their)
> User:I JethroBT (WMF)
> <https://meta.wikimedia.org/wiki/User:I_JethroBT_(WMF)>
> Program Officer, Wikimedia Foundation Grants
> Wikimedia Foundation <https://wikimediafoundation.org/>
>
>
> On Thu, Mar 12, 2020 at 3:24 PM Stuart A. Yeates  wrote:
>
> > If there's a deadline coming up for supporting research proposals, how
> > about doing the right thing and posting a link to all the research
> > proposals?
> >
> > cheers
> > stuart
> > --
> > ...let us be heard from red core to black sky
> >
> > On Fri, 13 Mar 2020 at 02:26, Jackie  wrote:
> > >
> > > Hi Friends,
> > >
> > > I submitted a Wikimedia project grant proposal for the 2020 round. I would
> > > really appreciate it if you could check it out and endorse it if you
> > > support the proposal.
> > >
> > > https://meta.wikimedia.org/wiki/Grants:Project/JackieKoerner/Addressing_Implicit_Bias_on_Wikipedia
> > >
> > > The last day to share support is just days away. If addressing bias on
> > > Wikipedia is important to you now is the time to speak up!
> > >
> > > Thank you!
> > >
> > > Best,
> > >
> > > Jackie
> > >
> > > --
> > > Jackie Koerner, Ph.D.
> > > jackiekoerner.com
> > > ___
> > > Wiki-research-l mailing list
> > > Wiki-research-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] [Announcement] Daily Social Media Traffic Report for English Wikipedia articles

2020-03-23 Thread Stuart A. Yeates
My immediate thought is how to connect this to the wiki projects for each
article, because wiki projects are the primary sources of expert knowledge
and have the resources to deal with many issues.

Cheers
Stuart

On Tue, 24 Mar 2020, 8:24 AM Jonathan Morgan,  wrote:

> The WMF Research team has published a new pageview report of inbound
> traffic coming from Facebook, Twitter, YouTube, and Reddit.[1]
>
> The report contains a list of all articles that received at least 500 views
> from one or more of these platforms (i.e. someone clicked a link on Twitter
> that sent them directly to a Wikipedia article). The report is available
> on-wiki and will be updated daily at around 14:00 UTC with traffic counts
> from the previous calendar day.
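The inclusion rule described above can be sketched in a few lines. This is only an illustration, not the code behind the report: the sample rows and the function name are hypothetical, and it assumes the 500-view threshold applies to each platform separately rather than to the sum across platforms.

```python
from collections import defaultdict

THRESHOLD = 500  # minimum views from a single platform, per the announcement

# Hypothetical sample rows: (article, platform, views on the previous day)
rows = [
    ("Article A", "Facebook", 620),
    ("Article A", "Twitter", 40),
    ("Article B", "Reddit", 499),
    ("Article C", "YouTube", 1500),
    ("Article C", "Reddit", 700),
]


def articles_in_report(rows, threshold=THRESHOLD):
    """Return {article: {platform: views}} keeping only counts at or above the threshold."""
    report = defaultdict(dict)
    for article, platform, views in rows:
        if views >= threshold:
            report[article][platform] = views
    return dict(report)


report = articles_in_report(rows)
for article, platforms in sorted(report.items()):
    print(article, platforms)
```

Under this reading an article sitting just under the threshold on every platform never appears, however large its combined social media traffic.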
>
> We believe this report provides editors with a valuable new information
> source. Daily inbound social media traffic stats can help editors monitor
> edits to articles that are going viral on social media sites and/or are
> being linked to by the social media platform itself in order to fact-check
> disinformation and other controversial content[2][3].
>
> The social media traffic report also contains additional public article
> metadata that may be useful in the context of monitoring articles that are
> receiving unexpected attention from social media sites, such as...
>
> - the total number of pageviews (from all sources) that article received
>   in the same period of time
> - the number of pageviews the article received from the same platform
>   (e.g. Facebook) the previous day (two days ago)
> - the number of editors who have the page on their watchlist
> - the number of editors who have watchlisted the page AND recently
>   visited it
>
> We want your feedback! We have some ideas of our own for how to improve the
> report, but we want to hear yours! If you have feature suggestions, please
> add them here.[4] We intend to maintain this daily report for at least the
> next two months. If we receive feedback that the report is useful, we are
> considering making it available indefinitely.
>
> If you have other questions about the report, please first check out our
> (still growing) FAQ [5]. All questions, comments, concerns, ideas, etc. are
> welcome on the project talkpage on Meta.[4]
>
> 1. https://en.wikipedia.org/wiki/User:HostBot/Social_media_traffic_report
> 2.
>
> https://www.engadget.com/2018/03/15/wikipedia-unaware-would-be-youtube-fact-checker/
> 3.
>
> https://mashable.com/2017/10/05/facebook-wikipedia-context-articles-news-feed/
> 4.
>
> https://meta.wikimedia.org/wiki/Research_talk:Social_media_traffic_report_pilot
> 5.
>
> https://meta.wikimedia.org/wiki/Research:Social_media_traffic_report_pilot/About
>
> Cheers,
> Jonathan
>
> --
> Jonathan T. Morgan
> Senior Design Researcher
> Wikimedia Foundation
> User:Jmorgan (WMF) 
> (Uses He/Him)
>
> *Please note that I do not expect a response from you on evenings or
> weekends*


Re: [Wiki-research-l] NEEDS COMMENT ASAP: Wikimedia Project Grant Proposal to address BIAS!

2020-03-12 Thread Stuart A. Yeates
If there's a deadline coming up for supporting research proposals, how
about doing the right thing and posting a link to all the research
proposals?

cheers
stuart
--
...let us be heard from red core to black sky

On Fri, 13 Mar 2020 at 02:26, Jackie  wrote:
>
> Hi Friends,
>
> I submitted a Wikimedia project grant proposal for the 2020 round. I would
> really appreciate it if you could check it out and endorse it if you
> support the proposal.
>
> https://meta.wikimedia.org/wiki/Grants:Project/JackieKoerner/Addressing_Implicit_Bias_on_Wikipedia
>
> The last day to share support is just days away. If addressing bias on
> Wikipedia is important to you now is the time to speak up!
>
> Thank you!
>
> Best,
>
> Jackie
>
> --
> Jackie Koerner, Ph.D.
> jackiekoerner.com


Re: [Wiki-research-l] Modelling user behaviour on Wikipedia

2020-02-24 Thread Stuart A. Yeates
Hi Kiril

Let's just say that history has taught us to be risk-averse to
drive-by researchers.

Can you point us to other research output using this methodology? Do
you (or any of your team) have significant editing experience? Are you
familiar with the firestorm that is paid editing and sock puppetry??

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, 25 Feb 2020 at 10:43, Kiril Simeonovski
 wrote:
>
> Hi Pine,
>
> The findings from the research will be articulated to draw clear
> conclusions about what causes utility and disutility from participation,
> and how this is perceived by different editors. For instance, it is natural
> to assume that editors come to contribute by adding content that will
> remain visible, while blocks and reverted edits are risk factors that drive
> them away, although different editors have different levels of risk
> aversion. Similarly to any other research, the benefit for the community
> and individual editors is going to be indirect but yet not insignificant to
> be accepted in the future process of decision-making (if the research
> demonstrates the existence of high level of risk aversion towards
> something, then it automatically signals that doing that thing is harmful
> for the environment).
>
> I know that it's impossible to predict the extent to which this research
> would make impact because the body of literature is very poor on
> volunteer-driven environments in a dynamic setting but it's definitely
> worth to start off something that might attract the attention of
> researchers in this direction. At the end, the research is not meant to
> carve rules in stone that any single editor should respect but rather to
> suggest something that individuals and communities might find useful (the
> means of doing this will definitely not turn Wikipedia into a laboratory or
> put someone's privacy in danger).
>
> Best,
> Kiril
>
> On Mon, Feb 24, 2020 at 9:43 PM Pine W  wrote:
>
> > Hi Kiril,
> >
> > Thank you for sharing your proposal.
> >
> > I am concerned about the possibility of Wikipedia being used as a
> > laboratory for experiments that consume volunteers' time and/or
> > personal data, and don't benefit Wikipedia or its participants. Does
> > your research benefit the community, and if so, how? It sounds like
> > your research intends to develop a model of decision trees for
> > individual Wikipedians, and at first read I don't understand how the
> > individual research subjects or the community would benefit.
> >
> > Sorry if this sounds defensive, but I hope that you understand why I'm
> > asking.
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )
> >
> > On Mon, Feb 24, 2020 at 6:00 PM Kiril Simeonovski
> >  wrote:
> > >
> > > Hi all,
> > >
> > > I am currently working on a research concerned with modelling user
> > > behaviour on Wikipedia. The idea is to design a field experiment over a
> > > random sample of Wikipedians in order to examine their risk preferences
> > and
> > > define (dis)utilities that will be used in a utility-maximisation model.
> > >
> > > I have already submitted an abstract that got accepted for the
> > > biennial Foundations
> > > of Utility and Risk Conference 2020 <https://www.furconference.org/>
> > and my
> > > future plans include presentation of the concept at other research
> > > conferences (including Wikimania 2020).
> > >
> > > You can visit the project page
> > > <
> > https://meta.wikimedia.org/wiki/Research:Modelling_Behaviour_in_a_Peer_Production_Economy_upon_Evidence_from_Wikipedia
> > >
> > > of this research on Meta. Your questions and comments are welcome at any
> > > time. Thank you!
> > >
> > > Best regards,
> > > Kiril


Re: [Wiki-research-l] Modelling user behaviour on Wikipedia

2020-02-24 Thread Stuart A. Yeates
Removing accounts with low numbers of edits will, of course, blind
your analysis to users who use throw-away accounts, even when they can
clearly be attributed to the same individual.
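
A toy illustration of that blind spot, offered as a sketch only: the threshold value, the account names and the "person" column are all hypothetical, and the person column is exactly the information an account-level study does not have.

```python
from collections import Counter

MIN_EDITS = 10  # hypothetical threshold; the thread does not give a number

# Hypothetical edit log of (account, person) pairs.
edits = (
    [("LongTermEditor", "person_1")] * 25
    + [("Throwaway1", "person_2")] * 6
    + [("Throwaway2", "person_2")] * 6
    + [("Throwaway3", "person_2")] * 6
)


def sampled(edits, key_index, min_edits=MIN_EDITS):
    """Return the keys (accounts or people) with at least min_edits edits."""
    counts = Counter(row[key_index] for row in edits)
    return {key for key, n in counts.items() if n >= min_edits}


by_account = sampled(edits, 0)  # what an account-level threshold keeps
by_person = sampled(edits, 1)   # what it would keep if people were observable

print(sorted(by_account))
print(sorted(by_person))
```

Here person_2 makes 18 edits in total, but no single throw-away account reaches the threshold, so the account-level sample drops that person entirely.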

cheers
stuart
--
...let us be heard from red core to black sky

On Tue, 25 Feb 2020 at 08:42, Kiril Simeonovski
 wrote:
>
> Hi Stuart,
>
> Yes, all those terms refer to a Wikipedia account. My plan is to avoid
> sampling accounts with very low activity (probably less than a minimum
> threshold of edits) because of the impracticality to draw any conclusion
> from them.
>
> Best,
> Kiril
>
> On Mon, Feb 24, 2020 at 8:36 PM Stuart A. Yeates  wrote:
>
> > When you say "participant", "user" and "editor" do you actually mean
> > account?
> >
> > I routinely notice what appear to be people attending real-life events
> > using one account but then editing afterwards with a different
> > account.
> >
> > cheers
> > stuart
> > --
> > ...let us be heard from red core to black sky
> >
> > On Tue, 25 Feb 2020 at 07:00, Kiril Simeonovski
> >  wrote:
> > >
> > > Hi all,
> > >
> > > I am currently working on a research concerned with modelling user
> > > behaviour on Wikipedia. The idea is to design a field experiment over a
> > > random sample of Wikipedians in order to examine their risk preferences
> > and
> > > define (dis)utilities that will be used in a utility-maximisation model.
> > >
> > > I have already submitted an abstract that got accepted for the
> > > biennial Foundations
> > > of Utility and Risk Conference 2020 <https://www.furconference.org/>
> > and my
> > > future plans include presentation of the concept at other research
> > > conferences (including Wikimania 2020).
> > >
> > > You can visit the project page
> > > <
> > https://meta.wikimedia.org/wiki/Research:Modelling_Behaviour_in_a_Peer_Production_Economy_upon_Evidence_from_Wikipedia
> > >
> > > of this research on Meta. Your questions and comments are welcome at any
> > > time. Thank you!
> > >
> > > Best regards,
> > > Kiril


Re: [Wiki-research-l] Modelling user behaviour on Wikipedia

2020-02-24 Thread Stuart A. Yeates
When you say "participant", "user" and "editor" do you actually mean account?

I routinely notice what appear to be people attending real-life events
using one account but then editing afterwards with a different
account.

cheers
stuart
--
...let us be heard from red core to black sky

On Tue, 25 Feb 2020 at 07:00, Kiril Simeonovski
 wrote:
>
> Hi all,
>
> I am currently working on a research concerned with modelling user
> behaviour on Wikipedia. The idea is to design a field experiment over a
> random sample of Wikipedians in order to examine their risk preferences and
> define (dis)utilities that will be used in a utility-maximisation model.
>
> I have already submitted an abstract that got accepted for the
> biennial Foundations
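> of Utility and Risk Conference 2020 <https://www.furconference.org/> and my
> future plans include presentation of the concept at other research
> conferences (including Wikimania 2020).
>
> You can visit the project page
> <https://meta.wikimedia.org/wiki/Research:Modelling_Behaviour_in_a_Peer_Production_Economy_upon_Evidence_from_Wikipedia>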
> of this research on Meta. Your questions and comments are welcome at any
> time. Thank you!
>
> Best regards,
> Kiril


Re: [Wiki-research-l] How many papers / books about wikis?

2020-01-05 Thread Stuart A. Yeates
This largely depends on what you mean by "about"

Many tens of thousands of articles use a wikipedia or other WMF
service as a source in some way (either a source for a definition, a
selection or as a traditional datasource in some way) or speculate
about the future of wikis.

At the other end of the spectrum, a vanishingly small number tell
experienced wikipedia editors anything they didn't already know about
their wikipedia or other WMF service or quantify things we know in
ways we don't see as deeply flawed.

cheers
stuart
--
...let us be heard from red core to black sky


On Mon, 6 Jan 2020 at 02:36, Ziko van Dijk  wrote:
>
> Hello,
> to everyone to whom it concerns, my best wishes for the year 2020!
> I am interested in the number of scientific papers or monographs,
> articles etc. about wikis. Do you know about a paper that has come up with
> a relatively recent number?
> In my understanding, there are several problems that make it unwise to
> simply search for "wiki" in a general catalogue:
> * the word wiki can appear in words such as "Wikinger" (German for:
> viking), or it is used as a metaphor (e.g., for a reform of democracy)
> * some entities such as Wikileaks have "wiki" in their name, but are no
> wikis, and some entities such as Open Street Map are wikis, but don't have
> the word in their name
> * wiki relevant topics may appear under terms such as "collaborative
> writing" or "open content creation".
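
The first and third problems in the list above can be shown with a small sketch. The titles are invented for illustration and the whole-word pattern is just one possible refinement, not a recommended search strategy.

```python
import re

# Hypothetical catalogue titles, made up for this example.
titles = [
    "Wikinger und ihre Schiffe",
    "Wikileaks and the press",
    "Collaborative writing in open content communities",
    "Governance in a corporate wiki",
    "Wiki-based learning environments",
]

# Naive substring search: also catches "Wikinger" and "Wikileaks".
naive = [t for t in titles if "wiki" in t.lower()]

# Whole-word search: "wiki" or "wikis" bounded by non-word characters.
whole_word = re.compile(r"\bwikis?\b", re.IGNORECASE)
strict = [t for t in titles if whole_word.search(t)]

print(len(naive), naive)
print(len(strict), strict)
```

The whole-word pattern removes the false positives, but neither search finds the title about collaborative writing, which is the harder half of the problem.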
>
> Kind regards
> Ziko


[Wiki-research-l] Series of blog posts

2019-12-19 Thread Stuart A. Yeates
Over the last couple of days I've posted a series of
professionally edited blog posts related to my involvement in
en.wiki:

https://sciblogs.co.nz/guestwork/2019/12/18/how-i-came-to-be-writing-wikipedia-biographies-for-female-new-zealand-professors/
https://sciblogs.co.nz/guestwork/2019/12/19/what-to-put-in-a-wikipedia-biography-and-what-gets-left-out/
https://sciblogs.co.nz/guestwork/2019/12/20/15-years-of-editing-wikipedia/

Spoiler: today is my 15th anniversary of editing wikipedia under my own name.

cheers
stuart
--
...let us be heard from red core to black sky



Re: [Wiki-research-l] Does wikipedians feel like commoners ?

2019-12-07 Thread Stuart A. Yeates
It's worth noting that while Richard Stallman and Eric S. Raymond
played important roles historically and published
widely-read-at-the-time analyses, both have had significant falls from
grace since then and basing current analyses of the commons and other
systems on their work should be done _very_ carefully.

cheers
stuart

--
...let us be heard from red core to black sky

On Sun, 8 Dec 2019 at 11:39, Todd Allen  wrote:
>
> If you're looking for general history on the digital commons movement,
> check out Richard Stallman and the Free Software Foundation, and Eric S.
> Raymond's *The Cathedral and the Bazaar*. A lot of the initial Wikipedians
> were very much in favor of open source and open content, and were quite
> familiar with those. I don't, to be quite honest, know about "E. Ostrom",
> and have never heard them discussed on-wiki, but of course other editors
> might be.
>
> But if you really want to see the influence of the "commons" idea on
> Wikipedia, the open source software movement is going to be very relevant
> to what you want to look at. Mediawiki, the software that Wikipedia and
> other Wikimedia sites run on, is open source, and the technology stack
> underlying it is as well.
>
> On Sat, Dec 7, 2019 at 9:05 AM Sebastien Shulz 
> wrote:
>
> > Hi everyone,
> >
> > I'm currently doing a Ph.d on digital commons. I'm tracing the history of
> > the "digital common" movement (if there is one). And I wanted to know if
> > there are some studies about Wikipedians and their relation with the
> > conceptual framework of the commons (do they feel like commoners ? Do they
> > know E. Ostrom, etc.)
> > Thanks a lot for your help !
> > Best regards,
> >
> > *Sébastien Shulz*
> > *Doctorant en sociologie *
> > *Laboratoire Interdisciplinaire Sciences Innovations Sociétés*
> > *06.68.86.68.46 // Linkedin *


Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-05 Thread Stuart A. Yeates
Technically these are primary sources when they are first recorded /
written down. The first recorded / written theorizing about them is
secondary. Reporting on a consensus reached by that theorizing is tertiary,
assuming that the singing, theorizing and reporting are done by different
parties.

The model very much assumes a subject and an observer and written /
recorded communication.

The model (for better or worse) is very effective at suppressing things
like arguing from the Bible or from direct religious experience. Presumably
new models would either accept those or deal with them in other ways.

Cheers
Stuart

On Sat, 6 Jul 2019 2:11 pm Ocean Power  wrote:

> What about Australian indigenous songs that trace the path of songlines
> that both document collective history and folk knowledge and also
> rhythmically document land contours and other landmarks as a
> map/timeline/travel guide and often compile folkloric and secondary and
> primary knowledge over generations? I'm curious if you think these function
> in some ways as tertiary sources which, at least according to the wiki,
> include "travel guides, field guides, and almanacs." I'm out of my depth
> but enjoying the back and forth here.
>
> On Fri, Jul 5, 2019 at 5:20 PM, Stuart A. Yeates 
> wrote:
>
> > Hi Samuel
> >
> > Can you provide examples of tertiary sources from pure oral cultures?
> I've
> > never heard of any.
> >
> > Cheers
> > Stuart
> >
> > On Sat, 6 Jul 2019 1:19 am Samuel Klein  wrote:
> >
> >> I think we have all the mechanics needed for this.
> >>
> >> - Individual revisions aren't editable, once posted, and stay around
> >> forever (unless revdeleted).
> >> - Each wiki can have its own guidelines for how accounts can be shared.
> >> - Rather than limiting who can edit, you could have a whitelist of
> >> contributors considered by the local community to represent their
> >> knowledge; and have a lens that only looks at those contributions. (like
> >> flagged revs)
> >>
> >> (@stuart - tertiary sourcing can apply to any source; it does not
> privilege
> >> print culture. only particular standards of notability and verifiability
> >> start to limit which sources are preferred.)
> >>
> >> On Thu, Jul 4, 2019 at 7:39 PM Kerry Raymond 
> >> wrote:
> >>
> >> > On en.WP we prohibit shared accounts and accounts that appear to
> >> represent
> >> > an organisation so that's a barrier. But assuming there was some
> special
> >> > case to allow a username to represent a community of knowledge, we
> would
> >> > still have a practical problem of whether the individual creating
> such an
> >> > account or doing the edit was authorised to do so by that community,
> >> which
> >> > would require some kind of real-world validation. But, let's say local
> >> > chapters or local users could undertake that process using local
> >> knowledge
> >> > of how such communities identify and operate.
> >> >
> >> > The problem it still doesn't solve is that whatever information is
> added
> >> > by that account could then be changed by anyone. We would have to
> have a
> >> > way to prevent that happening, which would be a technical problem.
> Also
> >> > could that information ever be deleted by anyone (even for purely
> >> innocent
> >> > purposes, e.g. splitting a large article might delete the content from
> >> one
> >> > article to re-insert into other article). Or is the positioning of the
> >> > content within a particular article a decision only that group might
> be
> >> > allowed to take?
> >> >
> >> > A possible technical/social solution is to have traditional knowledge
> of
> >> > this nature in a sister project, where rules on user names would be
> >> > entirely different and obviously oral sourced material allowed. The
> >> group
> >> > could then produce named units of information as a single unit
> (similar
> >> to
> >> > a File on Commons). These units could then be added to en.WP or others
> >> > (obviously the language the units are written would have be
> identified,
> >> as
> >> > Commons does with descriptions already) so only English content is
> added
> >> to
> >> > en.WP and so on. The content would be presented in en.WP in a way (in
> a
> >> > "traditional language" box with a link to something explaining that
> what

Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-05 Thread Stuart A. Yeates
Hi Samuel

Can you provide examples of tertiary sources from pure oral cultures? I've
never heard of any.

Cheers
Stuart

On Sat, 6 Jul 2019 1:19 am Samuel Klein  wrote:

> I think we have all the mechanics needed for this.
>
> - Individual revisions aren't editable, once posted, and stay around
> forever (unless revdeleted).
> - Each wiki can have its own guidelines for how accounts can be shared.
> - Rather than limiting who can edit, you could have a whitelist of
> contributors considered by the local community to represent their
> knowledge; and have a lens that only looks at those contributions.  (like
> flagged revs)
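
A rough sketch of the "lens" idea in the last bullet, under stated assumptions: the Revision class, the function name and the sample history are hypothetical and do not correspond to any MediaWiki API. The page history stays open to everyone, and the lens simply shows the newest revision by a contributor on the community's list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    rev_id: int  # increases with each saved edit
    author: str
    text: str


def trusted_view(history, trusted):
    """Return the newest revision written by a trusted contributor, or None."""
    for rev in sorted(history, key=lambda r: r.rev_id, reverse=True):
        if rev.author in trusted:
            return rev
    return None


# Hypothetical page history; anyone can still edit.
history = [
    Revision(1, "CommunityAccount", "Account as told by the community."),
    Revision(2, "PassingEditor", "Reworded by someone else."),
    Revision(3, "CommunityAccount", "Community's own correction."),
    Revision(4, "PassingEditor", "Another outside change."),
]

view = trusted_view(history, trusted={"CommunityAccount"})
print(view.rev_id, view.text)
```

This selects whole revisions only; it does not attempt to merge later outside edits back into the trusted text.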
>
> (@stuart - tertiary sourcing can apply to any source; it does not privilege
> print culture.  only particular standards of notability and verifiability
> start to limit which sources are preferred.)
>
> On Thu, Jul 4, 2019 at 7:39 PM Kerry Raymond 
> wrote:
>
> > On en.WP we prohibit shared accounts and accounts that appear to
> represent
> > an organisation so that's a barrier. But assuming there was some special
> > case to allow a username to represent a community of knowledge, we would
> > still have a practical problem of whether the individual creating such an
> > account or doing the edit was authorised to do so by that community,
> which
> > would require some kind of real-world validation. But, let's say local
> > chapters or local users could undertake that process using local
> knowledge
> > of how such communities identify and operate.
> >
> > The problem it still doesn't solve is that whatever information is added
> > by that account could then be changed by anyone. We would have to have a
> > way to prevent that happening, which would be a technical problem. Also
> > could that information ever be deleted by anyone (even for purely
> innocent
> > purposes, e.g. splitting a large article might delete the content from
> one
> > article to re-insert into other article). Or is the positioning of the
> > content within a particular article a decision only that group might be
> > allowed to take?
> >
> > A possible technical/social solution is to have traditional knowledge of
> > this nature in a sister project, where rules on user names would be
> > entirely different and obviously oral sourced material allowed.  The
> group
> > could then produce named units of information as a single unit (similar
> to
> > a File on Commons). These units could then be added to en.WP or others
> > (obviously the language the units are written would have be identified,
> as
> > Commons does with descriptions already) so only English content is added
> to
> > en.WP and so on. The content would be presented in en.WP in a way (in a
> > "traditional language" box with a link to something explaining that what
> > means) so the reader understands what this info is and is free to trust
> it
> > or not. The information itself cannot be modified on en.WP only on the
> > sister project (requests on talk pages of the sister project would need
> to
> > be allowed for anyone to make requests eg report misspelling). En.WP
> would
> > remain in control of whether the content was included but could not
> change
> > the content themselves.
> >
> > It seems to be a sister project similar to the current Commons would be
> > what we need to make this work.
> >
> > Sent from my iPad
> >
> > On 4 Jul 2019, at 6:03 pm, Jan Dittrich 
> wrote:
> >
> > >> Maybe not "signed" in the sense of a signature of a Talk page, but
> each
> > > contribution is attributed automatically to its user as seen in the
> > > history. As someone who edits under my real name, I absolutely put my
> > name
> > > to my contributions.
> > >
> > > That is what I assumed, too, since it was coherent with some of the
> > > problems described in:
> > >
> >
> https://upload.wikimedia.org/wikipedia/commons/6/6c/PG-Slides-Wikimania18.pdf
> > > in this interpretation, Mediawiki (and lots of other software) code-ify
> > > knowledge production as done by single people  [1]– a person can edit,
> > but
> > > not a group (which was one of the challenges in the project described
> in
> > > the slides, if I remember correctly)
> > >
> > > I would be much interested in more research on what values are "build
> in"
> > > our software (Some Research by Heather Ford and Stuart Geiger goes in
> > this
> > > direction).
> > >
> > > Best,
> > > Jan
> > >
> > > [1] An interesting read on the concept of "transmitting knowledge"
> (e.g.
> > in
> > > articles and via the web) and knowledge as inherently social would be
> > > Ingold’s "From the Transmission of Representation to the Education of
> > > Attention" (http://lchc.ucsd.edu/MCA/Paper/ingold/ingold1.htm).
> > >
> > > Am Do., 4. Juli 2019 um 02:20 Uhr schrieb Kerry Raymond <
> > > kerry.raym...@gmail.com>:
> > >
> > >> Maybe not "signed" in the sense of a signature of a Talk page, but
> each
> > >> contribution is attributed automatically to its user as seen in the
> > >> history. As someone who edits under my 

Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-04 Thread Stuart A. Yeates
At the end of the day, wikipedia is by definition a tertiary source
and built on concepts of Western print culture. Traditional
knowledge is immiscible with this model.

This is exactly why I stopped promoting mi.wiki locally here --- as I
understand the needs of mi speakers and activists, wikipedias are
incapable of meeting them.

cheers
stuart
--
...let us be heard from red core to black sky

On Fri, 5 Jul 2019 at 11:39, Kerry Raymond  wrote:
>
> On en.WP we prohibit shared accounts and accounts that appear to represent an 
> organisation so that's a barrier. But assuming there was some special case to 
> allow a username to represent a community of knowledge, we would still have a 
> practical problem of whether the individual creating such an account or doing 
> the edit was authorised to do so by that community, which would require some 
> kind of real-world validation. But, let's say local chapters or local users 
> could undertake that process using local knowledge of how such communities 
> identify and operate.
>
> The problem it still doesn't solve is that whatever information is added by 
> that account could then be changed by anyone. We would have to have a way to 
> prevent that happening, which would be a technical problem. Also could that 
> information ever be deleted by anyone (even for purely innocent purposes, 
> e.g. splitting a large article might delete the content from one article to 
> re-insert into other article). Or is the positioning of the content within a 
> particular article a decision only that group might be allowed to take?
>
> A possible technical/social solution is to have traditional knowledge of this 
> nature in a sister project, where rules on user names would be entirely 
> different and obviously oral sourced material allowed.  The group could then 
> produce named units of information as a single unit (similar to a File on 
> Commons). These units could then be added to en.WP or others (obviously the 
> language the units are written would have be identified, as Commons does with 
> descriptions already) so only English content is added to en.WP and so on. 
> The content would be presented in en.WP in a way (in a "traditional language" 
> box with a link to something explaining that what means) so the reader 
> understands what this info is and is free to trust it or not. The information 
> itself cannot be modified on en.WP only on the sister project (requests on 
> talk pages of the sister project would need to be allowed for anyone to make 
> requests eg report misspelling). En.WP would remain in control of whether the 
> content was included but could not change the content themselves.
>
> It seems to be a sister project similar to the current Commons would be what 
> we need to make this work.
>
> Sent from my iPad
>
> On 4 Jul 2019, at 6:03 pm, Jan Dittrich  wrote:
>
> >> Maybe not "signed" in the sense of a signature of a Talk page, but each
> > contribution is attributed automatically to its user as seen in the
> > history. As someone who edits under my real name, I absolutely put my name
> > to my contributions.
> >
> > That is what I assumed, too, since it was coherent with some of the
> > problems described in:
> > https://upload.wikimedia.org/wikipedia/commons/6/6c/PG-Slides-Wikimania18.pdf
> > In this interpretation, MediaWiki (and lots of other software) codifies
> > knowledge production as something done by single people [1]: a person can
> > edit, but not a group (which was one of the challenges in the project
> > described in the slides, if I remember correctly).
> >
> > I would be very interested in more research on what values are "built into"
> > our software (some research by Heather Ford and Stuart Geiger goes in this
> > direction).
> >
> > Best,
> > Jan
> >
> > [1] An interesting read on the concept of "transmitting knowledge" (e.g. in
> > articles and via the web) and knowledge as inherently social would be
> > Ingold’s "From the Transmission of Representation to the Education of
> > Attention" (http://lchc.ucsd.edu/MCA/Paper/ingold/ingold1.htm).
> >
> > Am Do., 4. Juli 2019 um 02:20 Uhr schrieb Kerry Raymond <
> > kerry.raym...@gmail.com>:
> >
> >> Maybe not "signed" in the sense of a signature of a Talk page, but each
> >> contribution is attributed automatically to its user as seen in the
> >> history. As someone who edits under my real name, I absolutely put my name
> >> to my contributions.
> >>
> >> Or the other possible interpretation of "signed" here may be referring to
> >> the citations which are usually sources with one or small number of
> >> individual authors, as opposed to a community of shared knowledge
> >> custodians which is the case with Aboriginal Australians.
> >>
> >> Kerry
> >>
> >> Sent from my iPad
> >>
> >>> On 4 Jul 2019, at 10:28 am, Todd Allen  wrote:
> >>>
> >>> I found one error:
> >>>
> >>> "Even the idea that contributions to the wiki should be signed by
> >>> individuals is at odds with many traditional 

Re: [Wiki-research-l] Questions about SuggestBot

2019-06-24 Thread Stuart A. Yeates
By "administrative groups" I meant category tree starting at
https://en.wikipedia.org/wiki/Category:Wikipedia_maintenance

cheers
stuart
--
...let us be heard from red core to black sky

On Tue, 25 Jun 2019 at 13:21, Haifeng Zhang  wrote:
>
> Thanks so much for answering my questions, Stuart.
>
> It seems redlinks are related to article creation only.
>
> Could you give me some detail about how "administrative groups" work in terms 
> of task routing?
>
> I also found the following TASK CENTER page 
> (https://en.wikipedia.org/wiki/Wikipedia:Task_Center).
>
> Are the links/lists (under "Do it!") used frequently by editors as routing 
> tools?
>
>
> Thanks,
>
> Haifeng Zhang
> ____
> From: Wiki-research-l  on behalf 
> of Stuart A. Yeates 
> Sent: Sunday, June 23, 2019 11:37:38 PM
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] Questions about SuggestBot
>
> (a) SuggestBot visited me in the last week.
> https://en.wikipedia.org/w/index.php?title=User_talk%3AStuartyeates&type=revision&diff=902456290&oldid=901462765
>
> (b) There are lots of different task routing approaches: lists of
> redlinks, administrative groups, etc.
>
> (c) Sentences containing the words 'bot' and 'documented' appear to
> mainly exist for comedic value. Bots are typically even less
> documented than usual.
>
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
>
> On Mon, 24 Jun 2019 at 15:24, Haifeng Zhang  wrote:
> >
> > Hi all,
> >
> > Is the SuggestBot still in use in Wikipedia?
> >
> > Are there similar task routing tools that have been deployed in Wikipedia?
> >
> > Where in Wikipedia is the use of such tools or bots documented?
> >
> >
> > Thanks,
> >
> > Haifeng Zhang
> >
> > ___
> > Wiki-research-l mailing list
> > Wiki-research-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


Re: [Wiki-research-l] Questions about SuggestBot

2019-06-23 Thread Stuart A. Yeates
(a) SuggestBot visited me in the last week.
https://en.wikipedia.org/w/index.php?title=User_talk%3AStuartyeates&type=revision&diff=902456290&oldid=901462765

(b) There are lots of different task routing approaches: lists of
redlinks, administrative groups, etc.

(c) Sentences containing the words 'bot' and 'documented' appear to
mainly exist for comedic value. Bots are typically even less
documented than usual.

cheers
stuart
--
...let us be heard from red core to black sky

On Mon, 24 Jun 2019 at 15:24, Haifeng Zhang  wrote:
>
> Hi all,
>
> Is the SuggestBot still in use in Wikipedia?
>
> Are there similar task routing tools that have been deployed in Wikipedia?
>
> Where in Wikipedia is the use of such tools or bots documented?
>
>
> Thanks,
>
> Haifeng Zhang
>


Re: [Wiki-research-l] Analysis into active user stats

2019-04-30 Thread Stuart A. Yeates
> I "have attached a .pdf of the results to this email."

This appears not to have come through.

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, 1 May 2019 at 08:50, RhinosF1 Wikipedia  wrote:
>
> Hi all,
> As you may be aware, over the last 3 weeks I've been looking into the
> accuracy of active user statistics on English Wikipedia.
>
> I haven't had a chance to upload the final results to
> https://en.wikipedia.org/wiki/User:RhinosF1/activeuser but I have completed
> the gathering of statistics and have attached a .pdf of the results to this
> email.
>
> I've found it interesting that there is a sudden drop in the number of
> active users. I half expected this and intended to find it, but I want
> to look deeper.
>
> I'd like to see whether this is down to blocks or just users not continuing,
> and assess whether time requirements or edit requirements have a bigger impact.
>
> I look forward to any feedback and help in the research.
>
> The plan for the next stages is as follows:
> 1. About 10-14 days for people getting this email to respond.
> 2. Run the new list of queries for about 2-3 weeks to gather some data to
> show
> 3.  Show the data to enwiki users and ask for feedback / help collecting
> data
> 4. Present results in 2-3 months time.
> 5. Gather wide feedback on results
> 6. Maybe take action to improve it if we can see what action needs doing
>
>
> As you will see most of the data is from around 9pm UTC so in future stages
> I would appreciate data collection from a larger range of times.
>
>
> Thanks in advance,
> RhinosF1


Re: [Wiki-research-l] Ways thru which articles could attract editors

2019-04-29 Thread Stuart A. Yeates
There are, broadly, two kinds of ways to attract editors: those that
attract existing Wikipedia editors to specific articles (such as
https://en.wikipedia.org/wiki/User:SuggestBot ,
https://en.wikipedia.org/wiki/Wikipedia:Article_Rescue_Squadron and
https://en.wikipedia.org/wiki/Wikipedia:Today%27s_articles_for_improvement
) and those which attract new editors to the project (such as
https://en.wikipedia.org/wiki/Wikipedia:How_to_run_an_edit-a-thon and
other outreach projects).

Both of these are worthwhile efforts, but the latter might be
preferred because expanding the size of the overall community
allows more editing to be achieved in the long run.

cheers
stuart




--
...let us be heard from red core to black sky

On Sun, 28 Apr 2019 at 12:39, Haifeng Zhang  wrote:
>
> Thanks for your reply, Kerry. I meant any kind of quality improvement.
>
> Some mechanisms may target specific types of editors, and others might be 
> quite general.
>
>
> Best,
>
> Haifeng Zhang
>
> Postdoctoral Research Fellow
> Human-Computer Interaction Institute
> Carnegie Mellon University
> 
> From: Wiki-research-l  on behalf 
> of Kerry Raymond 
> Sent: Saturday, April 27, 2019 6:16:12 PM
> To: 'Research into Wikimedia content and communities'
> Subject: Re: [Wiki-research-l] Ways thru which articles could attract editors
>
> "Article quality" is quite a wide topic. I would imagine most good faith 
> contributors believe they are improving the quality of an article with every 
> edit. Do you have some specific type of quality improvement in mind? E.g. 
> more citations, more content, fewer spelling errors?
>
> Kerry
>
> -Original Message-
> From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
> Behalf Of Haifeng Zhang
> Sent: Sunday, 28 April 2019 7:53 AM
> To: Research into Wikimedia content and communities 
> 
> Subject: [Wiki-research-l] Ways thru which articles could attract editors
>
> Dear folks,
>
> I wonder what mechanisms/events (in Wikipedia or WikiProjects) might 
> attract editors to improve article quality.
>
> One example is Today's articles for improvement. Within WikiProjects, GA/FA 
> nominations seem useful too.
>
>
> Thanks,
>
> Haifeng Zhang


Re: [Wiki-research-l] Transferring CC-BY scientific literature into WP

2019-04-17 Thread Stuart A. Yeates
Given that there are organisations already organised, funded and
operating to preserve and promote open-access research, we might want
to think about focusing on getting deep interoperability with them,
rather than sucking all the content into Wikisource, where we can't
provide half the functionality that they can.

cheers
stuart
--
...let us be heard from red core to black sky

On Thu, 18 Apr 2019 at 09:47, Timothy Wood  wrote:
>
> This looks like a job for Wikisource. If nothing else, so long as we can
> verify their CC licensing is compatible, we can archive and preserve them
> in perpetuity on WS. Unfortunately I've scarcely contributed to WS
> personally. I've reached out to a WS admin that I know from Commons. When
> they reply I'll cc them on this thread.
>
> V/r
> TJW/GMG
>
> On Wed, Apr 17, 2019 at 4:47 PM Alexandre Hocquet <
> alexandre.hocq...@univ-lorraine.fr> wrote:
>
> > On 17/04/2019 22:36, Stuart A. Yeates wrote:
> > > On Thu, 18 Apr 2019 at 08:29, Alexandre Hocquet wrote:
> > >> what? then a lot of wikipedia
> > >> articles should be labelled as {{secondary sources needed}})
> > > Exactly. Sourcing as a whole across wikipedia already relies too
> > > heavily on primary sources. I regularly tag articles as such.
> >
> > Well, fair enough then. Good luck for your crusade, and thanks for your
> > interesting views about what constitutes primary, secondary and
> > tertiary. I guess I now have an answer about how much sympathy my
> > suggestion would bring.
> >
> >
> > --
> > ***
> > Alexandre Hocquet
> > Archives Henri Poincaré & Science History Institute
> > alexandre.hocq...@univ-lorraine.fr
> > https://www.sciencehistory.org/profile/alexandre-hocquet
> > https://poincare.univ-lorraine.fr/fr/membre-titulaire/alexandre-hocquet
> > ***
> >


Re: [Wiki-research-l] Transferring CC-BY scientific literature into WP

2019-04-17 Thread Stuart A. Yeates
On Thu, 18 Apr 2019 at 08:29, Alexandre Hocquet
 wrote:

> what? then a lot of wikipedia
> articles should be labelled as {{secondary sources needed}})

Exactly. Sourcing as a whole across wikipedia already relies too
heavily on primary sources. I regularly tag articles as such.

cheers
stuart
--
...let us be heard from red core to black sky



Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-18 Thread Stuart A. Yeates
In addition to Kerry's excellent examples there are users editing
Wikipedia through Tor, the anonymity and censorship circumvention
network. These users face extra scrutiny.

cheers
stuart


--
...let us be heard from red core to black sky

On Tue, 19 Mar 2019 at 13:04, Kerry Raymond  wrote:
>
> Apart from the legitimate alternate accounts and the illegitimate sockpuppet 
> accounts, there are other ways that alternate accounts exist.
>
> Occasional contributors often forget their username and/or password. Password 
> recovery isn't possible unless you provide an email address at sign-up (it's 
> optional, but you can add it later). So what such people then  do is just 
> create a new user account (I'm not sure there is anything else they can do). 
> I see this sort of behaviour a lot at events. The other variation of the 
> problem is that they did provide an email address but it is one not easily 
> accessible to them at the event (i.e. a librarian who signed up with a work 
> email address that cannot be accessed outside of the organisation).
>
> The other group of people with multiple accounts are those who edit 
> anonymously as serial IPs. The same person can use a number of IP numbers 
> over time. Often you don't realise it is the same person unless you see a lot 
> of their work and can see a pattern in it. For example, at the moment, there 
> is a person with a series of IP accounts that is  changing a common section 
> of a Queensland place article to be a subsection of another, who I notice on 
> my watchlist . This person appears to acquire a new IP address every week or 
> so, but the pattern of editing makes it obvious it's the same person behind 
> it. Whether or not an IP address can be considered "an account" depends on 
> your purposes. The one IP address can also be used by multiple people (e.g. 
> coming through a shared organisational network in a library or school). It is 
> claimed by some people that many new users do their first edits anonymously, 
> so if you are serious about studying "new contributors", then maybe you have 
> to look at anonymous editing. And also even regular contributors may 
> sometimes choose to edit anonymously, e.g. being in an unsecure IT 
> environment and reluctant to use their username/password in that situation 
> (particularly people with administrator or other significant access rights).
>
> Because I do outreach, I look for new accounts that turn up on my watchlist 
> and send them welcome messages etc. Because I also do training, I see a lot 
> of genuinely new people in action where I can observe their edits. So when I 
> see new accounts or IPs doing far more "sophisticated" edits than I see new 
> users do, I tend to suspect they are not genuinely new contributors.
>
> I think the best you can do is look for new accounts and be prepared to omit 
> any that show signs of sophisticated editing (either in terms of what they are 
> doing technically or what they say on Talk pages or in edit summaries). For 
> example, no genuine new user will mention a policy (they don't know policies 
> exist). Also genuine new users don't tend to edit that quickly, so any rapid 
> fire series of successful edits is unlikely to be a genuine new user.  I 
> think this inability to know if a new account represents a genuinely new user 
> is an inherent limitation for your research and should be documented as such 
> explaining the many circumstances in which new accounts might belong to 
> non-new users.
>
> Kerry
>
> -Original Message-
> From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
> Behalf Of Pine W
> Sent: Tuesday, 19 March 2019 5:27 AM
> To: Research into Wikimedia content and communities 
> 
> Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia
>
> Hi Haifeng,
>
> Some users will state on user pages that an account is an alternate account. 
> However, this practice is not followed by everyone, and those who do follow 
> this practice aren't required to do so in a uniform way.
>
> Alternate accounts which are not labeled as such, and which are used for 
> illegitimate purposes such as double voting, are an ongoing problem. You 
> might be interested in the English Wikipedia page 
> https://en.wikipedia.org/wiki/Wikipedia:Sock_puppetry.
>
> Alternate accounts can also be used for legitimate purposes, such as people 
> who have one account for their professional or academic activities and 
> another account for their personal use.
>
> Good luck with your project.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
>
> On Thu, Mar 14, 2019 at 1:30 PM Haifeng Zhang 
> wrote:
>
> > Stuart,
> >
> > I'm building an agent-based simulation of Wikipedia collaboration.
> >
> > I would like my model to be empirically grounded, so I need to collect
> > data for new editors.
> >
> > Alternative accounts can be an issue, but I wonder whether there is a way
> > to identify editors who have multiple accounts?
> >
> >
> > Thanks,

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-13 Thread Stuart A. Yeates
On Thu, 14 Mar 2019 at 09:16, Haifeng Zhang  wrote:
>
> Thanks a lot for the help, Finn. Now my query can draw a sample of newly 
> registered editors.

To repeat a point I made earlier in the thread: this query deals with
accounts not editors. Many at the coalface consider this to be a very
important difference. You appear not to have shared enough of your
research project for us to tell whether it's going to matter for you.
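
A minimal Python sketch of what such a monthly draw amounts to, to make the
unit of sampling explicit. Everything here is an assumption for illustration:
the function name sample_new_accounts, the input mapping of account name to
first-edit date, and the default of 100 per month (taken from the original
question). It samples accounts, so one person with several accounts can
appear more than once.

```python
import random
from collections import defaultdict
from datetime import datetime


def sample_new_accounts(first_edits, per_month=100, seed=0):
    """Return {"YYYY-MM": [account, ...]} with up to per_month accounts per month.

    first_edits maps an account name (not a person) to the datetime of that
    account's first edit. Hypothetical input; build it from whatever query
    or dump processing you already have.
    """
    by_month = defaultdict(list)
    for account, when in first_edits.items():
        by_month[when.strftime("%Y-%m")].append(account)

    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for month, accounts in sorted(by_month.items()):
        accounts.sort()  # stable order before sampling
        k = min(per_month, len(accounts))
        sample[month] = rng.sample(accounts, k)
    return sample
```

Any filtering for alternate accounts would have to happen before or after
this step; nothing in the draw itself can tell accounts and editors apart.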

cheers
stuart



Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Stuart A. Yeates
There are thousands and thousands of editors with multiple accounts.
Those who have been bothered to add a category are listed at
https://en.wikipedia.org/wiki/Category:Wikipedians_with_alternative_accounts

Many editors who engage in outreach are advised to create new accounts
for themselves regularly, simply because the experience of new account
creation changes over time and helping users streamline that
(especially in situations such as editathons) requires thorough
knowledge of account creation and the things that can make it go
wrong. This is pretty much a prerequisite for the old accountcreator
userright https://en.wikipedia.org/wiki/Wikipedia:Account_creator
(which I've had on several occasions) and the new eventcoordinator
userright https://en.wikipedia.org/wiki/Wikipedia:Event_coordinator
(which is too new for me to have had yet).

cheers
stuart
--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 10:40, Isaac Johnson  wrote:
>
> Yes, thanks for the clarification Stuart. I don't know of any statistics to
> suggest how widespread this is, but it might be worth checking, especially
> if you are focusing on editors with higher edit counts (who I suspect are
> more likely to have multiple accounts for licit reasons).
>
> On Tue, Mar 12, 2019 at 4:34 PM Stuart A. Yeates  wrote:
>
> > Note that this code deals with accounts, not editors, which is what
> > Haifeng asked for.
> >
> > There are many reasons, both licit and illicit for editors to have
> > more than one account. I know I have more than ten for
> > policy-compliant reasons.
> >
> > cheers
> > stuart
> >
> >
> > --
> > ...let us be heard from red core to black sky
> >
> > On Wed, 13 Mar 2019 at 10:21, Isaac Johnson  wrote:
> > >
> > > Hey Haifeng,
> > > If you decide to process the dumps, you should be able to easily repurpose
> > > some quick code that I wrote for a similar project:
> > > https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover
> > >
> > > Notably, I'd suggest using the stub history dumps as they are much smaller
> > > because they do not include the actual content. For instance, for March 1st
> > > and English Wikipedia (https://dumps.wikimedia.org/enwiki/20190301/), this
> > > file would be enwiki-20190301-stub-meta-history.xml.gz and is 57.9 GB.
> > >
> > > Best,
> > > Isaac
> > >
> > > On Tue, Mar 12, 2019 at 3:56 PM Pine W  wrote:
> > >
> > > > Hi Haifeng, thanks for the information. I think that your idea of looking
> > > > in the dumps makes sense. Am I understanding correctly that you would like
> > > > advice regarding how to do that in the most efficient way?
> > > >
> > > > Hi Leila, I believe that I asked for more information regarding Haifeng's
> > > > work. There has been discussion on English Wikipedia regarding volunteers
> > > > being unhappy with the interventions or proposed interventions of
> > > > researchers. I think that asking about the nature of Haifeng's research is
> > > > legitimate, and I tried to provide some examples of possible types of
> > > > research. I'm trying to protect the community from problematic
> > > > interventions, while also welcoming research that is accepted by the
> > > > community.
> > > >
> > > > Pine
> > > > ( https://meta.wikimedia.org/wiki/User:Pine )
> > > >
> > > >
> > > > On Tue, Mar 12, 2019 at 8:00 PM Haifeng Zhang
> > > > wrote:
> > > >
> > > > > Pine and Stuart,
> > > > >
> > > > > I meant extracting a random sample of new editors (month by month) from
> > > > > Wikipedia edit history.
> > > > >
> > > > > It is not about a survey of new editors, but still thanks for your
> > > > > suggestions.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Haifeng Zhang
> > > > >
> > > > > Postdoctoral Research Fellow
> > > > > Human-Computer Interaction Institute
> > > > > Carnegie Mellon University
> > > > > 
> > > > > From: Wiki-research-l 
> > on
> > > > > behalf of Stuart A. Yeates 
> > > > > Sent: Tuesday, March 12, 2019 3:46:19 PM
> > > > > To: Research into Wikimedia content and communities
> &

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Stuart A. Yeates
Note that this code deals with accounts, not editors, which is what
Haifeng asked for.

There are many reasons, both licit and illicit for editors to have
more than one account. I know I have more than ten for
policy-compliant reasons.
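
To show why dump processing naturally yields accounts rather than editors,
here is a minimal Python sketch (not the code from the repository quoted
below). It assumes the standard MediaWiki XML export layout, where each
revision carries a timestamp and a contributor with either a username or an
ip element. The helper names local and first_edit_per_account are
hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def first_edit_per_account(stream):
    """Map each registered account name to the timestamp of its earliest revision.

    stream is a binary file-like object over a stub-meta-history XML dump.
    Revisions by IP addresses (no username element) are skipped.
    """
    first = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "page":
            elem.clear()  # drop the finished page subtree to limit memory use
            continue
        if name != "revision":
            continue
        timestamp = username = None
        for child in elem:
            if local(child.tag) == "timestamp":
                timestamp = child.text
            elif local(child.tag) == "contributor":
                for sub in child:
                    if local(sub.tag) == "username":
                        username = sub.text
        # ISO 8601 UTC timestamps of equal length compare correctly as strings
        if username and timestamp and (
            username not in first or timestamp < first[username]
        ):
            first[username] = timestamp
        elem.clear()  # the revision has been read; release its children
    return first
```

For a real dump, the stream would come from something like
gzip.open(path, "rb"). The only key available is the account name, so two
accounts run by one person stay two separate rows.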

cheers
stuart


--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 10:21, Isaac Johnson  wrote:
>
> Hey Haifeng,
> If you decide to process the dumps, you should be able to easily repurpose
> some quick code that I wrote for a similar project:
> https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover
>
> Notably, I'd suggest using the stub history dumps as they are much smaller
> because they do not include the actual content. For instance, for March 1st
> and English Wikipedia (https://dumps.wikimedia.org/enwiki/20190301/), this
> file would be enwiki-20190301-stub-meta-history.xml.gz and is 57.9 GB.
>
> Best,
> Isaac
>
> On Tue, Mar 12, 2019 at 3:56 PM Pine W  wrote:
>
> > Hi Haifeng, thanks for the information. I think that your idea of looking
> > in the dumps makes sense. Am I understanding correctly that you would like
> > advice regarding how to do that in the most efficient way?
> >
> > Hi Leila, I believe that I asked for more information regarding Haifeng's
> > work. There has been discussion on English Wikipedia regarding volunteers
> > being unhappy with the interventions or proposed interventions of
> > researchers. I think that asking about the nature of Haifeng's research is
> > legitimate, and I tried to provide some examples of possible types of
> > research. I'm trying to protect the community from problematic
> > interventions, while also welcoming research that is accepted by the
> > community.
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )
> >
> >
> > On Tue, Mar 12, 2019 at 8:00 PM Haifeng Zhang 
> > wrote:
> >
> > > Pine and Stuart,
> > >
> > > I meant extracting a random sample of new editors (month by month) from
> > > Wikipedia edit history.
> > >
> > > It is not about survey of new editors, but still thanks for your
> > > suggestions.
> > >
> > >
> > > Thanks,
> > > Haifeng Zhang
> > >
> > > Postdoctoral Research Fellow
> > > Human-Computer Interaction Institute
> > > Carnegie Mellon University
> > > 
> > > From: Wiki-research-l  on
> > > behalf of Stuart A. Yeates 
> > > Sent: Tuesday, March 12, 2019 3:46:19 PM
> > > To: Research into Wikimedia content and communities
> > > Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia
> > >
> > > There are a number of new-editor-heavy noticeboards. I would suggest
> > > posting an invite there to your survey (or whatever). If you ask for
> > > editors' usernames you can filter out those who don't meet your
> > > definition of 'new'.
> > >
> > > I'm thinking of places like:
> > > https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
> > > https://en.wikipedia.org/wiki/Wikipedia:Help_desk
> > >
> > > cheers
> > > stuart
> > >
> > >
> > > --
> > > ...let us be heard from red core to black sky
> > >
> > > On Wed, 13 Mar 2019 at 08:37, Leila Zia  wrote:
> > > >
> > > > Hi Pine,
> > > >
> > > > Haifeng has a simple question about how to sample editors other than
> > > > via dumps. It would be great if someone who knows the answer could help
> > > > them to move forward.
> > > >
> > > > If you are interested to learn more about their research, instead of
> > > > answering their question, my recommendation would be to start the
> > > > conversation with: "can you tell us more about your research?" kind of
> > > > question. I find the current way of communication very speculative,
> > > > and that is not good for making a vibrant research community that can
> > > > help us address some of our big questions.
> > > >
> > > > Best,
> > > > Leila
> > > >
> > > > On Tue, Mar 12, 2019 at 12:08 PM Pine W  wrote:
> > > > >
> > > > > Hi, can you expand on what you mean by "sample"? If you're referring to
> > > > > analyzing users' edit histories then that should be fine. However, if
> > > > > you're planning to send surveys or messages to them, sending them
> > > > > barnstars, or otherwis

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Stuart A. Yeates
There are a number of new-editor-heavy noticeboards. I would suggest
posting an invite there to your survey (or whatever). If you ask for
editors' usernames you can filter out those who don't meet your
definition of 'new'.

I'm thinking of places like:
https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
https://en.wikipedia.org/wiki/Wikipedia:Help_desk

cheers
stuart


--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 08:37, Leila Zia  wrote:
>
> Hi Pine,
>
> Haifeng has a simple question about how to sample editors other than
> via dumps. It would be great if someone who knows the answer could help
> them to move forward.
>
> If you are interested to learn more about their research, instead of
> answering their question, my recommendation would be to start the
> conversation with: "can you tell us more about your research?" kind of
> question. I find the current way of communication very speculative,
> and that is not good for making a vibrant research community that can
> help us address some of our big questions.
>
> Best,
> Leila
>
> On Tue, Mar 12, 2019 at 12:08 PM Pine W  wrote:
> >
> > Hi, can you expand on what you mean by "sample"? If you're referring to
> > analyzing users' edit histories then that should be fine. However, if
> > you're planning to send surveys or messages to them, sending them
> > barnstars, or otherwise manipulating their on-wiki experience, that would
> > be problematic.
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )
> >
> >
> > On Tue, Mar 12, 2019 at 6:19 PM Haifeng Zhang 
> > wrote:
> >
> > > Hi folks,
> > >
> > > My work needs to randomly sample new editors in each month, e.g., 100
> > > editors per month.
> > >
> > > Do any of you have good suggestions for how to do this efficiently?
> > >
> > > I could think of using the dump files, but wonder whether there are other options.
> > >
> > >
> > > Thanks,
> > >
> > > Haifeng Zhang


Re: [Wiki-research-l] User type context sensitivity to introduction sections.

2019-02-09 Thread Stuart A. Yeates
I believe that the English language term you are looking for is
https://en.wikipedia.org/wiki/Plain_English and the problem is that
en.wiki policies already require plain English. The core of the issue
is that writing in plain English is hard and currently there are few
tools to help editors produce it.

A decent reading-level test, applied by section and calculated by a
JavaScript tool that fits into the standard wiki framework for tools,
would be a very useful addition. The tool could annotate the article
and, for new articles, notify the article creator. Of course, we'd need
supporting materials to help editors learn plain English and so forth,
but we have to start somewhere.
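
As a rough illustration of a per-section reading-level test, here is a
minimal sketch in Python rather than the JavaScript gadget suggested above.
It uses the Flesch Reading Ease formula as one possible test; the choice of
formula, the crude vowel-run syllable counter, and the function names are all
assumptions, and real wikitext would need templates and markup stripped
first.

```python
import re

# Matches wikitext headings such as "== History ==" or "=== Early life ==="
HEADING = re.compile(r"^==+\s*(.*?)\s*==+\s*$", re.MULTILINE)


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores mean easier text. None if no text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * len(words) / len(sentences)
        - 84.6 * syllables / len(words)
    )


def score_sections(wikitext):
    """Return {section title: score}; text before the first heading is 'Lead'."""
    parts = HEADING.split(wikitext)
    # parts = [lead, title1, body1, title2, body2, ...]
    scores = {"Lead": flesch_reading_ease(parts[0])}
    for title, body in zip(parts[1::2], parts[2::2]):
        scores[title] = flesch_reading_ease(body)
    return scores
```

A tool built along these lines could flag the sections whose score falls
below some agreed threshold, which is the annotation step described above.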

cheers
stuart

--
...let us be heard from red core to black sky

On Sun, 10 Feb 2019 at 11:22, Ziko van Dijk  wrote:
>
> Allow me to propose something different: Wikipedia needs better writing,
> not technical solutions. And for different target groups, we need different
> encyclopedias:
> * for children
> * for people with disabilities, such as
> https://en.wikipedia.org/wiki/Leichte_Sprache
> * for scholars, e.g. "Wikipedia scholar".
> A different wiki for every target group can be arranged in the best
> possible way for the target group.
>
> Kind regards
> Ziko
>
>
>
>
> Am Sa., 9. Feb. 2019 um 21:55 Uhr schrieb Aaron Gray <
> aaronngray.li...@gmail.com>:
>
> > I am thinking maybe we could use subdomains for layperson, and for schools,
> > and maybe universities to have specialized [approved] content also ? Just
> > an idea given this possible mechanism.
> >
> > On Sat, 9 Feb 2019 at 20:15, Aaron Gray 
> > wrote:
> >
> > > Thank you, please keep suggestions and pragmatics coming in!
> > >
> > > I looked at this problem some time ago and the extra programming for what
> > > I am proposing is quite minimal utilizing existing MediaWiki libraries and
> > > adding extra code to support the tag structure with defaulting to make it
> > > seamless to existing articles.
> > >
> > > I really think this would increase the usability and audience of
> > > Wikipedia and also might possibly allow us to integrate content from other
> > > Wikipedia projects.
> > >
> > > Regards,
> > >
> > > Aaron
> > >
> > >
> > > On Sat, 9 Feb 2019 at 07:57, Amir E. Aharoni <
> > amir.ahar...@mail.huji.ac.il>
> > > wrote:
> > >
> > >> The suggestions that bring up the Simple English Wikipedia miss the fact
> > >> that it only covers the English language, which most people don't know, and
> > >> does almost nothing for the many other languages of the world. (I'm
> > >> saying "almost nothing" because I know that there are people who prefer to
> > >> translate articles from the Simple English Wikipedia, and this indirectly
> > >> benefits other languages.)
> > >>
> > >> One thing about how Wikipedia works that practically no-one ever
> > >> challenges
> > >> is that every page title is associated with a page, and the page is
> > always
> > >> a single big blob of sections, section headings, templates and magic
> > >> words.
> > >>
> > >> What if it was not a single blob?
> > >>
> > >> What if all the magic words, such as NOTOC, DISPLAYTITLE, and
> > DEFAULTSORT
> > >> moved to a separate metadata storage?
> > >>
> > >> More closely to this thread's topic, what if at least some sections that
> > >> all or most pages have were stored separately, so that it would be
> > >> possible
> > >> to parse and render them semantically? The References section, for
> > >> example,
> > >> is something that many pages have. What if it could be separated from
> > the
> > >> prose blob and stored separately, so that it would be parsed
> > semantically
> > >> for different screens and contexts, such as Wikicite? Currently its
> > >> rendering and storage is heavily biased for desktop and wiki syntax
> > >> editing, and suboptimal for mobile display and editing, as well as for
> > >> translation.
> > >>
> > >> And most closely to the thread's original topic, what if one page could
> > >> have several lead sections? Sure, this can be done now with hacks such
> > as
> > >> templates and namespaces, but these are still hacks: they are not
> > >> semantic,
> > >> not portable across languages, and not easily machine-readable.
> > >>
> > >> Of course, doing all these things would require major, major changes in
> > >> how
> > >> Wikipedia's software works. Developers would have to write a lot of code
> > >> and editors would have to get used to new things. But sometimes it's
> > worth
> > >> thinking out of the box instead of saying "that's not how Wikipedia
> > >> works".
> > >>
> > >> On Sat, 9 Feb 2019 at 02:16, Aaron Gray <
> > >> aaronngray.li...@gmail.com> wrote:
> > >>
> > >> > I am suggesting WikiPedia has context-sensitive articles so if you
> > are a
> > >> > kid or a layperson or an expert in a field you get a different
> > >> > introduction.
> > >> >
> > >> > Often the reason people don't read or use WikiPedia is articles 

Re: [Wiki-research-l] User type context sensitivity to introduction sections.

2019-02-08 Thread Stuart A. Yeates
On the English-language Wikipedia the guidelines about ledes are
pretty clear and such articles are in breach of them. Please tag them
with {{lead rewrite}} when you find them. Twinkle (TW) lets you do this via
JavaScript magic.

cheers
stuart


--
...let us be heard from red core to black sky

On Sat, 9 Feb 2019 at 13:15, Aaron Gray  wrote:
>
> I am suggesting Wikipedia has context-sensitive articles so if you are a
> kid or a layperson or an expert in a field you get a different
> introduction.
>
> Often the reason people don't read or use Wikipedia is that articles are too
> complex at the start.
>
> Having an adaptive setting that users can choose as their default needs
> facilitating by Wikimedia technology.
>
> Thoughts and possible implementation ideas on this are
> welcomed.
>
> Regards,
>
> Aaron
>
>
> --
> Aaron Gray
>
> Independent Open Source Software Engineer, Computer Language Researcher,
> Information Theorist, and amateur computer scientist.
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Invitations to participate in a research about Wikidata as a learning platform

2018-12-17 Thread Stuart A. Yeates
sitive impact in Academia (which I have done for years with
> Wikipedia and now with Wikidata).
>
> While I do not expect anyone to show support blindly, I do find your
> message a bit puzzling -- whether you meant it or not, your mail suggests
> that you are unsupportive of this research and your tone was dismissive,
> without portraying the situation accurately, nor by checking the details
> properly.
> That is your prerogative, as is not filling out the questionnaire. But I
> would urge you to reconsider.
> Take a closer look. Assume good faith. And you'll find that once you pass
> the personal info part, the questions are not at all intrusive, but rather
> focused, genuine, inquisitive and most importantly -- focused on the
> Wikidata experience you've had.
>
> In short, I hope you reconsider. If you don't, that's fine as well.
>
> Shani.
>
> ---
> *Shani Evenstein Sigalov*
> EdTech Innovation Strategist, NY/American Medical Program, Sackler School
> of Medicine, Tel Aviv University.
> PhD Candidate, School of Education, Tel Aviv University.
> Lecturer, Tel Aviv University.
> Chairperson, WikiProject Medicine Foundation
> <https://meta.wikimedia.org/wiki/Wiki_Project_Med>.
> Chairperson, Wikipedia & Education User Group
> <https://meta.wikimedia.org/wiki/Wikipedia_%26_Education_User_Group>.
> Chairperson, The Hebrew Literature Digitization Society
> <http://www.israelgives.org/amuta/580428621>.
> Chief Editor, Project Ben-Yehuda <http://bybe.benyehuda.org>.
> *+972-525640648*
>
>
> On Mon, Dec 17, 2018 at 11:52 PM Stuart A. Yeates  wrote:
>
> > I find the personal information requested particularly intrusive
> > compared to other similar surveys (and the overview suggests that
> > further pages are going to ask for more personal information),
> > especially since the cover page does not mention any ethics board
> > approval (not sure if this is required in Israel).
> >
> > Do you really have a plan for evaluating the responses that requires
> > all this information? If not, why are you collecting it?
> >
> > As it stands I'll not be answering this survey.
> >
> > cheers
> > stuart
> >
> > --
> > ...let us be heard from red core to black sky
> >
> > On Tue, 18 Dec 2018 at 10:14, Shani Evenstein 
> > wrote:
> > >
> > > Thanks, Leila!
> > > Not sure why this happened with the link, but happy you found one that
> > > works.  :)
> > >
> > > Shani.
> > >
> > > ---
> > > *Shani Evenstein Sigalov*
> > > EdTech Innovation Strategist, NY/American Medical Program, Sackler School
> > > of Medicine, Tel Aviv University.
> > > PhD Candidate, School of Education, Tel Aviv University.
> > > Lecturer, Tel Aviv University.
> > > Chairperson, WikiProject Medicine Foundation
> > > <https://meta.wikimedia.org/wiki/Wiki_Project_Med>.
> > > Chairperson, Wikipedia & Education User Group
> > > <https://meta.wikimedia.org/wiki/Wikipedia_%26_Education_User_Group>.
> > > Chairperson, The Hebrew Literature Digitization Society
> > > <http://www.israelgives.org/amuta/580428621>.
> > > Chief Editor, Project Ben-Yehuda <http://bybe.benyehuda.org>.
> > > *+972-525640648*
> > >
> > >
> > > On Mon, Dec 17, 2018 at 11:00 PM Leila Zia  wrote:
> > >
> > > > Hi Shani and all,
> > > >
> > > > On Sun, Dec 16, 2018 at 5:27 AM Shani Evenstein 
> > > > wrote:
> > > > >
> > > > > Dear Wiki-researchers,
> > > > >
> > > > > I have a huge favor to ask of everyone in this mailing list --
> > > > >
> > > > > TLDR: *please fill out* *this questionnaire
> > > > > <
> > > >
> > https://mail.google.com/mail/u/0/%E2%80%8Bhttps://goo.gl/forms/WMFb6j2mpG2HwFTx2
> > !
> > > > !>!*
> > > >
> > > > There is something about the above link that doesn't work. I /think/
> > > > you meant to share:
> > > >
> > > >
> > https://docs.google.com/forms/d/e/1FAIpQLSc_h3LPcPgM2V3W5tNdRdWjw3ayRUq73nD0HyhVz07SKwE0Hw/viewform
> > > >
> > > > Good luck with your research! :)
> > > >
> > > > Best,
> > > > Leila
> > > >


Re: [Wiki-research-l] Invitations to participate in a research about Wikidata as a learning platform

2018-12-17 Thread Stuart A. Yeates
I find the personal information requested particularly intrusive
compared to other similar surveys (and the overview suggests that
further pages are going to ask for more personal information),
especially since the cover page does not mention any ethics board
approval (not sure if this is required in Israel).

Do you really have a plan for evaluating the responses that requires
all this information? If not, why are you collecting it?

As it stands I'll not be answering this survey.

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, 18 Dec 2018 at 10:14, Shani Evenstein  wrote:
>
> Thanks, Leila!
> Not sure why this happened with the link, but happy you found one that
> works.  :)
>
> Shani.
>
> ---
> *Shani Evenstein Sigalov*
> EdTech Innovation Strategist, NY/American Medical Program, Sackler School
> of Medicine, Tel Aviv University.
> PhD Candidate, School of Education, Tel Aviv University.
> Lecturer, Tel Aviv University.
> Chairperson, WikiProject Medicine Foundation.
> Chairperson, Wikipedia & Education User Group.
> Chairperson, The Hebrew Literature Digitization Society.
> Chief Editor, Project Ben-Yehuda.
> *+972-525640648*
>
>
> On Mon, Dec 17, 2018 at 11:00 PM Leila Zia  wrote:
>
> > Hi Shani and all,
> >
> > On Sun, Dec 16, 2018 at 5:27 AM Shani Evenstein 
> > wrote:
> > >
> > > Dear Wiki-researchers,
> > >
> > > I have a huge favor to ask of everyone in this mailing list --
> > >
> > > TLDR: *please fill out* *this questionnaire
> > > <
> > https://mail.google.com/mail/u/0/%E2%80%8Bhttps://goo.gl/forms/WMFb6j2mpG2HwFTx2!
> > !>!*
> >
> > There is something about the above link that doesn't work. I /think/
> > you meant to share:
> >
> > https://docs.google.com/forms/d/e/1FAIpQLSc_h3LPcPgM2V3W5tNdRdWjw3ayRUq73nD0HyhVz07SKwE0Hw/viewform
> >
> > Good luck with your research! :)
> >
> > Best,
> > Leila
> >


Re: [Wiki-research-l] Empowering Researchers

2018-12-07 Thread Stuart A. Yeates
Multiple competing definitions of terms. Seems a little similar to
everything2, whose editorship Wikipedia cannibalised more than a decade
ago...

For those at their first rodeo, feel free to Google it.

Cheers
Stuart

On Sat, 8 Dec 2018 12:11 am Gabriele - Qeios wrote:

> Dear list members,
> I’m Gabriele Marinello, co-founder along with Giorgio Bedogni and Alberto
> Bedogni of Qeios, a new Open Access integrated system, created by
> researchers, for researchers.
> Qeios is the first tool designed to improve the quality and
> comparability/reproducibility of the research by acting at the production
> level. A new piece of knowledge, the Definition, and the rating system
> built on it allow researchers to produce and publish research of increased
> quality and comparability/reproducibility.
> If you are curious, you can find a video and more information here:
> https://www.qeios.com/about
> If then you are interested, you can sign up using an invitation link, here
> is Giorgio’s: https://www.qeios.com/invitation-to-join/researcher/314
> If you have any questions/doubts or feedback, feel free to drop me an
> email at g...@qeios.com or call me at +39 380 8912791.
> Many thanks and all the best,
> Gabriele
> —
> Gabriele Marinello
> Co-founder, Qeios Ltd
> 34, Old Barrack Yard, SW1X 7NP, London, UK
> UK +44 (0) 7426 853828
> IT +39 380 891279...@qeios.com
> www.qeios.com


Re: [Wiki-research-l] Wiki Studies issue 1

2017-10-03 Thread Stuart A. Yeates
OJS, the software in use, includes support for all three of these, but only
the last is usually free.

cheers
stuart

--
...let us be heard from red core to black sky

On 2 October 2017 at 23:59, Andy Mabbett <a...@pigsonthewing.org.uk> wrote:

> On 2 October 2017 at 10:34, Stuart A. Yeates <syea...@gmail.com> wrote:
>
> >  The first issue of *Wiki Studies*, an open-access, peer-reviewed journal
> > addressing the intersection of Wikipedia and higher education has been
> > published:
> >
> > http://wikistudies.org/index.php?journal=wikistudies=
> issue=current
>
> Congratulations to all concerned for this milestone achievement, but:
>
> * No ISSN?
> * No DOI for papers?
> * No ORCID iDs for contributors?
>
> --
> Andy Mabbett
> @pigsonthewing
> http://pigsonthewing.org.uk
>


Re: [Wiki-research-l] category extraction question

2017-07-24 Thread Stuart A. Yeates
Sorry it's taken me so long to get back to this.

https://pdfs.semanticscholar.org/dea9/142b39bdc2c3738e0f9cb7c6d117750ef2f7.pdf
and https://meta.wikimedia.org/wiki/Beyond_categories are good places to
start on the issues with cats on en.wiki.

cheers
stuart

--
...let us be heard from red core to black sky

On 12 July 2017 at 02:53, Leila Zia <le...@wikimedia.org> wrote:

> Hi Stuart,
>
> On Mon, Jul 10, 2017 at 6:45 PM, Stuart A. Yeates <syea...@gmail.com>
> wrote:
> > The category system on en.wiki is not an IS-A system and there have been
> > several discussions about making it based on mathematical principles
> > which have come to nothing because the consensus of editors is against it.
> > The best way to think about categories is as a locally-faceted related
> > links system.
>
> It would be great if you can share a link to one or more of those
> conversations, if it's not too hard to find them. This is a
> conversation that comes up often and I'd like to educate myself with
> this background. (and to confirm: on our end the goal is not to change
> the category system on enwiki, but to make it machine understandable
> for specific applications.)
>
> > Having said that, Category:Wikipedia maintenance is an important root
> > probably useful for separating  the wheat from the chaff. Most of these
> are
> > also hidden categories. I'm not sure whether this flag appears in the
> SQL,
> > but see
> > https://en.wikipedia.org/wiki/Wikipedia:Categorization#Hiding_categories
>
> Looking into these. thanks!
>
> Best,
> Leila
>
> > cheers
> > stuart
> >
> > --
> > ...let us be heard from red core to black sky
> >
> > On 11 July 2017 at 13:20, Leila Zia <le...@wikimedia.org> wrote:
> >
> >> Hi all,
> >>
> >> [If you are not interested in discussions related to the category system
> >> (on English Wikipedia), you can stop here. :)]
> >>
> >> We have run into a problem that some of you may have thought about or
> >> addressed before. We are trying to clean up the category system on
> English
> >> Wikipedia by turning the category structure to an IS-A hierarchy. (The
> >> output of this work can be useful for the research on template
> >> recommendation [1], for example, but the use-cases won't stop there).
> One
> >> issue that we are facing is the following:
> >>
> >> We are currently using SQL dumps to extract categories associated with
> >> every article on English Wikipedia (main namespace). [2]
> >> Using this approach, we get 5 categories associated with the Flow
> >> cytometry bioinformatics article [3]:
> >> Flow_cytometry
> >> Bioinformatics
> >>
> >> Wikipedia_articles_published_in_peer-reviewed_literature
> >> Wikipedia_articles_published_in_PLOS_Computational_Biology
> >> CS1_maint:_Multiple_names:_authors_list
> >>
> >> The problem is that only the first two categories are the ones we are
> >> interested in. We have one cleaning step through which we only keep
> >> categories that belong to category Article and that step removes the
> last
> >> category above, but the other two Wikipedia_... remain there. We need to
> >> somehow prune the data and clean it from those two categories.
> >>
> >> One way we could do the above would be to parse wikitext instead of the
> >> SQL dumps and focus on extracting categories marked by the pattern
> >> [[Category:XX]], but in that case, we would lose a good category such as
> >> Guided_missiles_of_Norway because that's generated by a template.
> >>
> >> Any ideas on how we can start with a "cleaner" dataset of categories
> >> related to the topic of the articles as opposed to maintenance related
> or
> >> other types of categories?
> >>
> >> Thanks,
> >> Leila
> >>
> >> [1] https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia
> >> _stubs_across_languages
> >>
> >> [2] The exact code we use is
> >>
> >> SELECT p.page_id id, p.page_title title, cl.cl_to category
> >> FROM categorylinks cl
> >> JOIN page p
> >> on cl.cl_from = p.page_id
> >> where cl_type = 'page'
> >> and page_namespace = 0
> >> and page_is_redirect = 0
> >>
> >> and the edges of the category graph are extracted with
> >>
> >> *SELECT p.page_title category, cl.cl_to p

Re: [Wiki-research-l] category extraction question

2017-07-10 Thread Stuart A. Yeates
The category system on en.wiki is not an IS-A system and there have been
several discussions about making it based on mathematical principles
which have come to nothing because the consensus of editors is against it.
The best way to think about categories is as a locally-faceted related
links system.

Having said that, Category:Wikipedia maintenance is an important root,
probably useful for separating the wheat from the chaff. Most of these are
also hidden categories. I'm not sure whether this flag appears in the SQL,
but see
https://en.wikipedia.org/wiki/Wikipedia:Categorization#Hiding_categories
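
A minimal sketch of one way to check this, assuming the hidden flag is
available in the dumps as a page_props row with pp_propname = 'hiddencat'
(which, as far as I know, is how MediaWiki records __HIDDENCAT__). The toy
rows below are made up for illustration, which categories are really hidden
on en.wiki is not verified here, and Python's sqlite3 stands in for the real
database:

```python
import sqlite3

# The article-category query from this thread, extended with an anti-join
# that drops categories flagged as hidden.
# Assumption: hidden categories have a page_props row with
# pp_propname = 'hiddencat'.
VISIBLE_CATEGORIES_SQL = """
SELECT p.page_id AS id, p.page_title AS title, cl.cl_to AS category
FROM categorylinks cl
JOIN page p ON cl.cl_from = p.page_id
LEFT JOIN page cat
       ON cat.page_namespace = 14 AND cat.page_title = cl.cl_to
LEFT JOIN page_props pp
       ON pp.pp_page = cat.page_id AND pp.pp_propname = 'hiddencat'
WHERE cl.cl_type = 'page'
  AND p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND pp.pp_page IS NULL
ORDER BY category
"""


def build_toy_db():
    """Create an in-memory database holding a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                           page_title TEXT, page_is_redirect INTEGER);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
        CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT,
                                 pp_value TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?, ?)",
        [
            (1, 0, "Flow_cytometry_bioinformatics", 0),  # the article
            (10, 14, "Flow_cytometry", 0),               # category pages
            (11, 14, "Bioinformatics", 0),
            (12, 14, "CS1_maint:_Multiple_names:_authors_list", 0),
        ],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?, ?)",
        [
            (1, "Flow_cytometry", "page"),
            (1, "Bioinformatics", "page"),
            (1, "CS1_maint:_Multiple_names:_authors_list", "page"),
        ],
    )
    # Hypothetical: only the CS1 maintenance category is marked hidden here.
    conn.execute("INSERT INTO page_props VALUES (12, 'hiddencat', '')")
    return conn


def visible_categories(conn):
    """Return (id, title, category) rows for non-hidden categories only."""
    return conn.execute(VISIBLE_CATEGORIES_SQL).fetchall()


if __name__ == "__main__":
    for row in visible_categories(build_toy_db()):
        print(row)
```

Categories that are visible but still maintenance-related would survive this
filter, so it would probably need combining with a walk down from
Category:Wikipedia maintenance.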

cheers
stuart

--
...let us be heard from red core to black sky

On 11 July 2017 at 13:20, Leila Zia  wrote:

> Hi all,
>
> [If you are not interested in discussions related to the category system
> (on English Wikipedia), you can stop here. :)]
>
> We have run into a problem that some of you may have thought about or
> addressed before. We are trying to clean up the category system on English
> Wikipedia by turning the category structure to an IS-A hierarchy. (The
> output of this work can be useful for the research on template
> recommendation [1], for example, but the use-cases won't stop there). One
> issue that we are facing is the following:
>
> We are currently using SQL dumps to extract categories associated with
> every article on English Wikipedia (main namespace). [2]
> Using this approach, we get 5 categories associated with the Flow cytometry
> bioinformatics article [3]:
>
> Flow_cytometry
> Bioinformatics
>
> Wikipedia_articles_published_in_peer-reviewed_literature
> Wikipedia_articles_published_in_PLOS_Computational_Biology
> CS1_maint:_Multiple_names:_authors_list
>
> The problem is that only the first two categories are the ones we are
> interested in. We have one cleaning step through which we only keep
> categories that belong to category Article and that step removes the last
> category above, but the other two Wikipedia_... remain there. We need to
> somehow prune the data and clean it from those two categories.
>
> One way we could do the above would be to parse wikitext instead of the SQL
> dumps and focus on extracting categories marked by the pattern
> [[Category:XX]], but in that case, we would lose a good category such as
> Guided_missiles_of_Norway because that's generated by a template.
>
> Any ideas on how we can start with a "cleaner" dataset of categories
> related to the topic of the articles as opposed to maintenance related or
> other types of categories?
>
> Thanks,
> Leila
>
> [1] https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia
> _stubs_across_languages
>
> [2] The exact code we use is
>
> SELECT p.page_id id, p.page_title title, cl.cl_to category
> FROM categorylinks cl
> JOIN page p
> on cl.cl_from = p.page_id
> where cl_type = 'page'
> and page_namespace = 0
> and page_is_redirect = 0
>
> and the edges of the category graph are extracted with
>
> SELECT p.page_title category, cl.cl_to parent
> FROM categorylinks cl
> JOIN page p
> ON p.page_id = cl.cl_from
> WHERE p.page_namespace = 14
>
>
> [3] https://en.wikipedia.org/wiki/Flow_cytometry_bioinformatics


Re: [Wiki-research-l] Recognizing domain experts contribution to Wikipedia

2017-07-08 Thread Stuart A. Yeates
It is worth remembering that via the ORCID identifier in the authority
control template, there is now a standard linked-data mechanism for
researchers to identify themselves. I have no idea whether anyone is
looking at that though.

Cheers
Stuart

On Saturday, July 8, 2017, Pine W  wrote:

> Hi Alex,
>
> I believe that this is a subject of interest to the community. It would
> indeed be helpful to know the percentage of people with graduate-level
> academic qualifications who regularly make contributions on English
> Wikipedia and other language editions of Wikipedia.
>
> I'd suggest thinking about the following:
>
> 1. In general, academics don't receive benefits to their C.V. from
> contributing to Wikipedia. My guess is that this is a major reason why
> relatively few academics contribute to Wikipedia on a regular basis. It
> might be interesting if you can produce data that confirms a hypothesis
> like this.
>
> 2. I would encourage changing the term that you use from "recognized domain
> experts" to "people with graduate-level academic qualifications". In the
> U.S., in many domains, there are multiple ways for people to gain
> reputations as experts in a domain; an academic qualification is often
> not required, although it may be helpful.
>
> Pine
>
>
> On Fri, Jul 7, 2017 at 8:20 AM, Alex Yarovoy wrote:
>
> > Hi All,
> >
> > I'm a Master's student working under the supervision of Drs. Arazy and
> Minkov
> > (Haifa U)
> > My research explores the extent to which  "recognized domain experts"
> > contribute to Wikipedia.
> > (I use a narrow definition for "recognized domain experts" to include
> those
> > with academic qualifications in the relevant topic).
> > I manually tracked these experts using a variety of sources, and then use
> > machine learning methods for automatically identifying domain experts
> > within Wikipedia editors.
> >
> > I'm writing to explore whether this research is of interest to the
> > community and to learn if other people have already tackled this research
> > question.
> >
> > Thank you in advance for pointing me to relevant research projects
> > Alex


-- 
--
...let us be heard from red core to black sky


Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-27 Thread Stuart A. Yeates
Following up on Kerry's comments: far more useful to our encyclopedia-building
project than a global importance assessor would be an assessor of which
wikiprojects a page is likely to be of interest to. There are hundreds of
thousands of en.wiki pages which are not tagged properly to their
wikiprojects and are thus effectively invisible to the community of editors
who care about them.

This is a classic example of statistical classification, so it shouldn't be
too technically difficult...
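
As a rough illustration of what such a classifier might look like, here is a
minimal naive Bayes sketch in Python. It assumes the category names of
already-tagged pages are usable as features; the training rows, the project
labels and the helper names (tokens, train, predict) are all invented for the
example, and a real system would need far more data and proper evaluation:

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training data: category names of pages already tagged
# with a wikiproject banner.
TAGGED_PAGES = [
    (["Rivers_of_New_Zealand", "Canterbury,_New_Zealand"], "New Zealand"),
    (["New_Zealand_cuisine", "Desserts"], "New Zealand"),
    (["Rivers_of_Australia", "Queensland"], "Australia"),
    (["Australian_cuisine", "Desserts"], "Australia"),
]


def tokens(categories):
    """Split category names into lowercase word tokens."""
    return [t for name in categories for t in re.findall(r"[a-z]+", name.lower())]


def train(examples):
    """Count tokens per project for a multinomial naive Bayes model."""
    project_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for categories, project in examples:
        words = tokens(categories)
        project_counts[project] += 1
        token_counts[project].update(words)
        vocabulary.update(words)
    return project_counts, token_counts, vocabulary


def predict(model, categories):
    """Return the most probable project, using add-one smoothing."""
    project_counts, token_counts, vocabulary = model
    total_pages = sum(project_counts.values())
    best_project, best_score = None, -math.inf
    for project, pages in project_counts.items():
        score = math.log(pages / total_pages)  # log prior
        denominator = sum(token_counts[project].values()) + len(vocabulary)
        for word in tokens(categories):
            if word in vocabulary:  # tokens never seen in training are ignored
                score += math.log((token_counts[project][word] + 1) / denominator)
        if score > best_score:
            best_project, best_score = project, score
    return best_project


if __name__ == "__main__":
    model = train(TAGGED_PAGES)
    print(predict(model, ["Lakes_of_New_Zealand"]))
    print(predict(model, ["Lakes_of_Australia"]))
```

Since a page can belong to several wikiprojects, a real version would more
likely rank projects by score and suggest the top few than pick a single
winner.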

cheers
stuart

--
...let us be heard from red core to black sky

On 28 April 2017 at 12:28, Kerry Raymond <kerry.raym...@gmail.com> wrote:

> I observe (and am unsurprised) that WikiProject Australia also rates the
> Pavlova article as High importance, which ties into Stuart's
> comments about graphs and subgraphs. If there are relationships between
> WikiProjects, there is probably some correlation about importance of
> articles as seen by those projects. As it happens, WikiProject Australia
> and WikiProject New Zealand are related on Wikipedia only by both being
> within the category "WikiProject Countries projects" (along with every
> other national WikiProject), so this is an example where you cannot see the
> connection between these projects "on-wiki" but anyone who knows anything
> about the geography, history, and culture of the two countries will
> understand the close connection (e.g. ANZAC, sheep, pavlova, rugby union)
> but, as the project tagging will show, we do have our differences, e.g.
> Whitebait is a High Importance article for NZ but Oz doesn't even tag it
> (we don't share the NZ passion for these small fish). And perhaps more
> seriously, our two countries have different indigenous peoples so our
> project tagging around Maori (NZ) and Aboriginal and Torres Strait Islander
> (Oz) articles would usually be quite disjoint.
>
> So if there are correlations between project tagging, it may be something
> exploitable in machine assessment of importance.
>
> Kerry
>
> -Original Message-
> From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org]
> On Behalf Of Stuart A. Yeates
> Sent: Friday, 28 April 2017 6:18 AM
> To: Research into Wikimedia content and communities <
> wiki-research-l@lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] Project exploring automated classification
> of article importance
>
> On en.wiki article importance is relative to some wikiproject. This is
> encoded in https://en.wikipedia.org/wiki/Template:WPBannerMeta which
> appears on 16% of all wikipedia pages via specialisations such as
> https://en.wikipedia.org/wiki/Template:WikiProject_New_Zealand
>
> Within Wikiproject New Zealand, there are articles which we think are very
> important to us, which we would never argue are even marginally important
> on a global scale. Take for example
> https://en.wikipedia.org/wiki/Pavlova_(food)
>
> For the mathematically inclined, this is a classic case of graph and many
> subgraphs.
>
> cheers
> stuart
>
>
> --
> ...let us be heard from red core to black sky
>
> On 27 April 2017 at 21:44, Gerard Meijssen <gerard.meijs...@gmail.com>
> wrote:
>
> > Hoi,
> > I have read the proposal and it leaves me wondering. Also the notion
> > of importance is indeed neither easy nor obvious. I think the question
> > what is most important is irrelevant depending on how you look at it.
> > Subject can be irrelevant when you look at it from a personal
> > perspective, looking at it from a particular perspective and indeed
> > what seems relevant may become irrelevant or relevant over time. When
> > you use metrics there will always be one way or another why it will be
> found to be problematic.
> >
> > When you consider Wikipedia, the difference it makes with similar
> > resources is that its long tail is so much longer and still it is easy
> > and obvious to show how the English Wikipedia's long tail is not long
> > enough [1]. When you are looking for links and relevance, Wikidata
> > includes data on all Wikipedias and thereby more avenues to establish
> relevance.
> >
> > Research has been done that shows that when people are suggested to
> > write articles or amend articles, it works best when it is about
> > subjects they care about. What people are interested in was based in
> > the research on past behaviour. What we could do is flip this and ask
> > people. Based on categories, on projects, whatever people do to
> > categorise what is their interest. This will work on a micro level. On
> > a meta level, it may drive cooperation when we enable people to share
> > their interest (at that moment in time). On a macr

Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-27 Thread Stuart A. Yeates
On en.wiki article importance is relative to some wikiproject. This is
encoded in https://en.wikipedia.org/wiki/Template:WPBannerMeta which
appears on 16% of all wikipedia pages via specialisations such as
https://en.wikipedia.org/wiki/Template:WikiProject_New_Zealand

Within Wikiproject New Zealand, there are articles which we think are very
important to us, which we would never argue are even marginally important
on a global scale. Take for example
https://en.wikipedia.org/wiki/Pavlova_(food)

For the mathematically inclined, this is a classic case of graph and many
subgraphs.
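
To make the point concrete, a small Python sketch of importance stored per
(wikiproject, article) pair instead of per article. The three ratings are the
ones mentioned in this thread; the dictionary layout and function names are
only an assumed illustration, not how WPBannerMeta actually stores anything:

```python
# (project, article) -> importance rating. Each project's slice of this
# mapping is one "subgraph" of the overall article graph.
IMPORTANCE = {
    ("New Zealand", "Pavlova_(food)"): "High",
    ("Australia", "Pavlova_(food)"): "High",
    ("New Zealand", "Whitebait"): "High",
}


def importance(project, article):
    """Return the rating a project gives an article, or None if untagged."""
    return IMPORTANCE.get((project, article))


def projects_for(article):
    """List the projects that have tagged an article, alphabetically."""
    return sorted(p for (p, a) in IMPORTANCE if a == article)


if __name__ == "__main__":
    print(importance("New Zealand", "Whitebait"))
    print(importance("Australia", "Whitebait"))
    print(projects_for("Pavlova_(food)"))
```

There is no single "importance of Whitebait" to look up here; the question
only makes sense once a project is named.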

cheers
stuart


--
...let us be heard from red core to black sky

On 27 April 2017 at 21:44, Gerard Meijssen 
wrote:

> Hoi,
> I have read the proposal and it leaves me wondering. Also the notion of
> importance is indeed neither easy nor obvious. I think the question what is
> most important is irrelevant depending on how you look at it. Subject can
> be irrelevant when you look at it from a personal perspective, looking at
> it from a particular perspective and indeed what seems relevant may become
> irrelevant or relevant over time. When you use metrics there will always be
> one way or another why it will be found to be problematic.
>
> When you consider Wikipedia, the difference it makes with similar resources
> is that its long tail is so much longer and still it is easy and obvious to
> show how the English Wikipedia's long tail is not long enough [1]. When you
> are looking for links and relevance, Wikidata includes data on all
> Wikipedias and thereby more avenues to establish relevance.
>
> Research has been done that shows that when people are suggested to write
> articles or amend articles, it works best when it is about subjects they
> care about. What people are interested in was based in the research on past
> behaviour. What we could do is flip this and ask people. Based on
> categories, on projects, whatever people do to categorise what is their
> interest. This will work on a micro level. On a meta level, it may drive
> cooperation when we enable people to share their interest (at that moment
> in time). On a macro level data may arrive at Wikidata and this will allow
> us to seek what articles include specific data (think date of death for
> instance). On a meta and macro level, we could ask readers what subjects
> they are missing. This would provide an additional incentive for people to
> write. For this last suggestion we could measure what people are missing.
>
> Anyway, relevance and importance depend on a point of view. When our
> community is enabled to make a difference, it will help us with our
> content. As a movement we know that there is enough that we do not properly
> cover. Advocating these issues and targeting and educating potential
> communities is where the WMF could play more of a role.
> Thanks,
>GerardM
>
>
>
> [1]
> http://ultimategerardm.blogspot.nl/2017/04/wikidata-user-stories-sum-of-all.html
>
> On 26 April 2017 at 13:48, Jonathan Cardy 
> wrote:
>
> > I like to think that in time importance will win out over popularity. If
> > Wikipedia still exists in fifty or five hundred years' time and we are still
> > using pasteurisation and indeed still eating hydrocarbon based foods, then
> > I suspect the pop group you mention will be less frequently read about than
> > the pasteurisation process.
> >
> > In the meantime if we try to work it out at all it has to be something of
> > a judgement call, and one we will occasionally get wrong. Any guesses as to
> > which current branches of science will be as forgotten in a century as
> > phrenology is today?
> >
> > At an extreme the weekly top ten most viewed articles are a good guide to
> > what is trending in the popular cultures of India and the USA. I'm assuming
> > that most modern pop culture is inherently ephemeral. Of course digital
> > historians of future centuries may be rolling on the floor laughing at this
> > email, and the TV dramas currently being filmed may still be widely studied
> > and universally known classics while our leading edge science lies buried
> > in the foundations of their science.
> >
> > Regards
> >
> > Jonathan
> >
> >
> > > On 26 Apr 2017, at 08:50, Jane Darnell  wrote:
> > >
> > > Yes I totally agree that "importance is a relative metric rather than
> > > absolute." I also agree that incoming links and pageviews are not accurate
> > > measurements of "importance" for all of the reasons you mention. However,
> > > we are still a project that is actively exploring the universe of
> > > knowledge, and leaning heavily on academia and other established sources we
> > > must "boldly go where no man has gone before" (and please feel free to
> > > insert "white, euro-centric" before the man part). So do you have any
> > > suggestions what we could measure going forward that would cough up some
> > > 

Re: [Wiki-research-l] Student Learning with Wikipedia

2017-04-05 Thread Stuart A. Yeates
Do you have a link to the original spec for the 'Fall 2016 study' so we can
judge whether you've done what you've said you'd do so far?

cheers
stuart

--
...let us be heard from red core to black sky

On 6 April 2017 at 09:59, Zach McDowell  wrote:

> Hi everyone,
> I've been working with Wiki Education for the last nine months researching
> student learning using Wikipedia based assignments. We have a ton of
> amazing data that I'll be posting about shortly and releasing under an open
> license (as well as a summary of some preliminary findings).
>
> I've written a grant proposal for WMF to continue this research as we've
> really only touched the tip of the iceberg here. Please check it out and if
> you're interested I'd love your feedback (and your support).
>
> https://meta.wikimedia.org/wiki/Grants:Project/Learning_with_Wikipedia_Based_Assignments
>
>
> best
>
> Zach
>
> 
> Zachary J. McDowell, PhD
> University of Illinois at Chicago
>
> Postdoctoral Fellow, National Center for Digital Government
> Research Fellow, Wiki Education Foundation
> Managing Editor, communication +1
> www.zachmcdowell.com
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


Re: [Wiki-research-l] Editors: research on transitions, learning over time, leaving

2017-03-22 Thread Stuart A. Yeates
I know that I was recruited to Wikipedia from then-competitor everything2,
it would be interesting to find active users who joined during E2's
precipitous decline, match their accounts and compare editing styles.

cheers
stuart

On Tuesday, March 21, 2017, WereSpielChequers 
wrote:

> Dear Jan,
>
> It's a fascinating topic and one that interests me as well.
>
> But you have to be careful with your assumptions, our data is almost always
> based on user accounts, but we'd like to think we are looking at people.
> Some of whom will have different accounts over time. Some of the
> involvement will switch between projects - apparently half the founding
> Wikidata community were previously active in the movement. Some will spend
> periods of their volunteer time off wiki - many very active volunteers put
> time in as Arbcom members, OTRS volunteers or chapter trustees.
>
>
> Volunteers are very very different to staff or even subscribers, barely 16
> years into the project we simply don't have the data to work out long-term
> patterns of retention and reactivation, but the signs so far are that
> Wikipedia is beginning to look like other volunteer organisations that
> people have a multi decade relationship with.
>
> A few years ago the WMF did a survey of former editors, partly to learn why
> they'd left. One of the most common responses was "I haven't left yet".
>
> WSC
>
> On 20 March 2017 at 09:34, Jan Dittrich  wrote:
>
> > Hello,
> >
> > I am looking for research on how editors transition through various levels
> > of involvement in their time as editors. The questions I ask myself are:
> >
> > - How many people come each month?
> > - How many editors leave?
> >
> > …those are not too difficult to answer but…
> >
> > - How many people become more involved over time? E.g. how many each month
> > come to a level where they are interested in handling many pages on the
> > watchlist, learn the less obvious aspects of wiki culture etc.
> >
> > In my work as designer I am often involved in features for intermediate
> > and/or very involved users and I’m wondering if there are any ballpark
> > estimates of how many people learn these features each month.
> >
> > Jan
> >
> > --
> > Jan Dittrich
> > UX Design/ User Research
> >
> > Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> > Phone: +49 (0)30 219 158 26-0
> > http://wikimedia.de
> >
> > Imagine a world, in which every single human being can freely share in the
> > sum of all knowledge. That's our commitment.
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> > Entered in the register of associations of the Amtsgericht
> > Berlin-Charlottenburg under number 23855 B. Recognised as a charitable
> > organisation by the Finanzamt für Körperschaften I Berlin, tax number
> > 27/029/42207.


Re: [Wiki-research-l] tool / framework for article lifecycle stats ?

2017-03-16 Thread Stuart A. Yeates
That's the closest I've seen to what I think I want, but I need it in
machine-readable form. I'll probably hack something using the API.

Cheers
Stuart

On Wednesday, March 15, 2017, Àlex Hinojo <alexhin...@gmail.com> wrote:

> Hi Stuart. Is this what you are looking for?
>
> https://tools.wmflabs.org/xtools-articleinfo/
>
>
>
> On Wed, 15 March 2017 at 9:21, Stuart A. Yeates (<syea...@gmail.com>) wrote:
>
>> Is there a tool or framework for getting article lifecycle stats in an
>> automated fashion? Is anyone aware of something like that? Things like
>> creator (+ their basic stats), total # of edits, who's edited the article
>> (+ their basic stats), article age, article flags, etc.
>>
>> I'm reasonably platform / language agnostic. I'll only need stats on
>> dozens of articles an hour, so no need for a weaponised platform.
>>
>>
>> cheers
>> stuart
>> --
>> ...let us be heard from red core to black sky
> --
> Àlex Hinojo
>


-- 
--
...let us be heard from red core to black sky


[Wiki-research-l] tool / framework for article lifecycle stats ?

2017-03-15 Thread Stuart A. Yeates
Is there a tool or framework for getting article lifecycle stats in an
automated fashion? Is anyone aware of something like that? Things like
creator (+ their basic stats), total # of edits, who's edited the article
(+ their basic stats), article age, article flags, etc.

I'm reasonably platform / language agnostic. I'll only need stats on dozens
of articles an hour, so no need for a weaponised platform.
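
For what it's worth, the kind of per-article summary described above could be
sketched against the MediaWiki Action API roughly as follows. This is only a
sketch under assumptions: build_revisions_url and summarise_revisions are
hypothetical helper names (not an existing tool), the sample revisions are
invented, and fetching and paging through results is left out. The query
parameters themselves (action=query, prop=revisions, rvprop, rvlimit, rvdir)
are standard Action API parameters.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed target wiki


def build_revisions_url(title, limit=500):
    """Build an Action API URL listing an article's revisions, oldest first."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp",
        "rvlimit": limit,
        "rvdir": "newer",  # oldest revision first, so item 0 is the creation
    }
    return API + "?" + urlencode(params)


def summarise_revisions(revisions, now):
    """Summarise a list of {'user', 'timestamp'} dicts sorted oldest first."""
    first = datetime.strptime(revisions[0]["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    first = first.replace(tzinfo=timezone.utc)
    editors = Counter(rev["user"] for rev in revisions)
    return {
        "creator": revisions[0]["user"],
        "total_edits": len(revisions),
        "distinct_editors": len(editors),
        "age_days": (now - first).days,
    }


# Invented sample data standing in for a parsed API response.
sample = [
    {"user": "Alice", "timestamp": "2017-01-01T00:00:00Z"},
    {"user": "Bob", "timestamp": "2017-02-01T12:00:00Z"},
    {"user": "Alice", "timestamp": "2017-03-01T08:30:00Z"},
]
stats = summarise_revisions(sample, datetime(2017, 3, 15, tzinfo=timezone.utc))
```

At dozens of articles an hour, one request per article (plus one per editor
for their basic stats) should be well within polite API usage.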


cheers
stuart
--
...let us be heard from red core to black sky


Re: [Wiki-research-l] Quick way to figure out how many Wikipedia edits a given editor has made?

2017-03-08 Thread Stuart A. Yeates
Be aware that the https://meta.wikimedia.org/wiki/Special:CentralAuth
method is unreliable / misleading for many editors whose accounts predate
Central Authentication.
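
One way to cross-check the CentralAuth figure might be to ask individual wikis
for their local counts and compare. The sketch below only builds the request
URLs and sums counts that have already been fetched. The helper names are
hypothetical, and the parameter names (list=users with usprop, and
meta=globaluserinfo with guiuser / guiprop from the CentralAuth extension) are
given as I remember them, so please check them against the API documentation
before relying on this.

```python
from urllib.parse import urlencode


def local_editcount_url(username, api="https://en.wikipedia.org/w/api.php"):
    """URL asking one wiki for a user's local edit count and registration."""
    return api + "?" + urlencode({
        "action": "query",
        "format": "json",
        "list": "users",
        "ususers": username,
        "usprop": "editcount|registration",
    })


def global_editcount_url(username, api="https://meta.wikimedia.org/w/api.php"):
    """URL asking CentralAuth for a global edit count and merged accounts."""
    return api + "?" + urlencode({
        "action": "query",
        "format": "json",
        "meta": "globaluserinfo",
        "guiuser": username,
        "guiprop": "editcount|merged",
    })


def total_edits(per_wiki_counts):
    """Sum edit counts from a {wiki: count} mapping, skipping missing values."""
    return sum(c for c in per_wiki_counts.values() if c is not None)
```

A large gap between the summed local counts and the global figure would be a
hint that the account is one of the older ones where the global number
misleads. Neither figure says anything about people with several accounts.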

cheers
stuart

--
...let us be heard from red core to black sky

On 9 March 2017 at 09:47, Jonathan Cardy 
wrote:

> Interesting, I wonder how they miscalculate their account registered date.
> I suspect it is confused by renames.
>
> It also has the drawback of assuming that people just have one account.
>
> But it does seem to give an accurate figure of the number of edits that an
> account has made.
>
>
> Regards
>
> Jonathan / WereSpielChequers
>
>
> On 8 Mar 2017, at 19:25, Jaqen  wrote:
>
>
>
>
> On Wed, Mar 8, 2017 at 6:36 PM, Misha Teplitskiy <
> mishateplits...@gmail.com> wrote:
>
>> Dear Wiki researchers,
>>
>> Is there a quick/easy way to figure out how many edits a particular
>> editor has made (or, better yet, get a list of all those edits)?
>>
>
> Hi Misha,
>
> you can get the total edit count here:
>
> https://meta.wikimedia.org/wiki/Special:CentralAuth
>
> Jaqen
>


Re: [Wiki-research-l] Retention of Wikimedians for the long term

2017-02-22 Thread Stuart A. Yeates
On 22 February 2017 at 16:40, David Goodman  wrote:

> what mattered to me was personal appreciation of my work--just as it did
> in my primary career. Not form notices, but individual public comments
> from people who showed that they understood. There is no way of
> automating that. The virtue of wikiprojects (and local meetups) is that of
> extending that appreciation more broadly and more intensely.
>

Automate, no. Encourage, yes.

I can imagine a tool that located editors working mainly in the area of a
wikiproject (i.e. 3/5ths of their last 50 edits over three or more weeks,
maybe) who had not had much recent obvious attention from other editors (no
third-party edits to their talk page in that time) and once a week sent
each person signed up to the wikiproject a notification with a link to
encourage the wikiproject participant to give that editor feedback on their
work.

In short, a private prompt to send public feedback. 95% of the feedback
would probably be positive, but it might also find one or two of the more
subtle types of vandal.
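
The selection rule above could be sketched roughly like this. It is one
possible reading, not a finished tool: needs_feedback is a hypothetical
function, the thresholds are just the ones floated above (3/5ths, 50 edits,
three weeks), and it assumes the edit and talk page histories have already
been fetched and tagged as in or out of the wikiproject's area.

```python
from datetime import datetime, timedelta


def needs_feedback(edits, talk_edits, username,
                   min_share=0.6, window=50, min_span=timedelta(weeks=3)):
    """Return True if an editor looks project-focused but unnoticed.

    edits: list of (timestamp, in_project) tuples, newest first.
    talk_edits: list of (timestamp, author) tuples for their talk page.
    """
    recent = edits[:window]
    if len(recent) < window:
        return False  # not enough history to judge
    newest, oldest = recent[0][0], recent[-1][0]
    if newest - oldest < min_span:
        return False  # burst of edits, not sustained work
    share = sum(1 for _, in_project in recent if in_project) / len(recent)
    if share < min_share:
        return False  # not mainly working in this project's area
    # Any talk page edit by someone else inside the window counts as attention.
    noticed = any(author != username and ts >= oldest
                  for ts, author in talk_edits)
    return not noticed
```

The weekly notification to wikiproject members would then just be a list of
the usernames for which this returns True.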

cheers
stuart


--
...let us be heard from red core to black sky


[Wiki-research-l] ORCID

2017-02-21 Thread Stuart A. Yeates
Users of this list may be interested in an IdeaLab proposal I've just
created:

https://meta.wikimedia.org/wiki/Grants:IdeaLab/drive_contributions_from_the_academic_world_through_better_ORCID_integration

cheers
stuart
--
...let us be heard from red core to black sky


Re: [Wiki-research-l] Retention of Wikimedians for the long term

2017-02-20 Thread Stuart A. Yeates
I have thought about writing a bot that congratulated active users on
account creation anniversaries and suggested directions for growth.
"Grats X you've been editing for 2 years, here's a picture of a kitten.
Have you thought about doing New Page Patrol?"

"Grats Y you've been editing for a decade, here's a virtual beer, you've
earned it! Have you thought about applying for adminship?"

Of course, you'd want to check account behaviour pretty carefully
first.
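
A minimal sketch of the message-picking part of such a bot, assuming the
registration date is already known. MILESTONES and anniversary_message are
hypothetical names, the two milestones are just the examples above, and the
behaviour check and the actual posting to a talk page are left out.

```python
from datetime import date

# Hypothetical milestone table: years of editing -> (gift, suggestion).
MILESTONES = {
    2: ("a picture of a kitten", "doing New Page Patrol"),
    10: ("a virtual beer", "applying for adminship"),
}


def anniversary_message(username, registered, today):
    """Return a greeting if today is a milestone anniversary, else None."""
    # Note: accounts registered on 29 February only match in leap years.
    if (today.month, today.day) != (registered.month, registered.day):
        return None
    years = today.year - registered.year
    if years not in MILESTONES:
        return None
    gift, suggestion = MILESTONES[years]
    return (f"Grats {username}, you've been editing for {years} years, "
            f"here's {gift}. Have you thought about {suggestion}?")
```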

cheers
stuart

--
...let us be heard from red core to black sky

On 21 February 2017 at 14:33, Pine W  wrote:

> Hi Kerry,
>
> Thanks for the ideas. Jonathan Morgan, Aaron Halfaker, and I have had more
> than one conversation about wikiprojects as a way to engage with new
> editors. Unfortunately, there are a lot of derelict wikiprojects.
>
> I have some ideas about how to improve the training system for ENWP and
> Commons in particular. But that's different from the motivation issue,
> which I think is more challenging. With enough money and time, the training
> system can be upgraded. I'm not sure if the same is true for motivation. I
> have the impression that student Wikimedians are mostly motivated by grades
> (hence the precipitous decline in their participation after their Wikipedia
> Education Program class ends), and many other people are motivated by money
> or PR (hence we get a lot of people engaging in promotionalism or PR
> management.) It's not clear to me how someone goes from being wiki-curious
> to feeling motivated enough to contribute for years. There are many other
> hobbies that are lower stress, healthier, offer more opportunities for
> socializing, and offer a friendlier environment. I think that some
> Wikimedians are motivated by desire to promote or share their interest in a
> particular topic, which might keep content creators interested and engaged
> for years, particularly if they meet people with similar interests. But
> it's a phase change to go from being a content creator or curator, to
> taking on roles that benefit other individual Wikimedians, or broad
> cross-sections of the Wikimedia community. We could use all of those kinds
> of good-faith long-term contributors.
>
> Perhaps we should include information in our training about "career paths"
> for Wikimedians who would like to develop their skills and/or move into new
> roles?
>
> I'm not sure what else to suggest. I find it challenging to figure out how
> to motivate people to want to contribute productively for years, and there
> are some roles for which lengthy experience is an informal but significant
> prerequisite for acceptance and/or success. I'd like to see more people
> make that journey.
>
> Pine
>
>
> On Mon, Feb 20, 2017 at 2:10 PM, Kerry Raymond 
> wrote:
>
>> Pine,
>>
>> It sounds to me that there are two separate parts to your question.
>>
>> One relates to the survival of such editors to being ongoing active
>> editors. The second seems to relate to recruiting them and perhaps
>> upskilling them for specific purposes, eg administration, guild of copy
>> editors, and whatever initiatives you have in mind.
>>
>> The first question probably relates to being able to get them better
>> informed about the policies of Wikipedia at least in relation to the area
>> of their contributions and how to engage with the community because it is
>> the abrasive interaction with the community that seems to drive people away.
>>
>> The second probably relates to raising awareness of WikiProjects and
>> other collaborative initiatives. (Obviously all of WP is collaborative, but
>> some things require higher levels of coordination and I think this might be
>> what you are referring to). I think this probably needs some analysis of the
>> nature of their contributions and/or their topics of interest in order to
>> introduce them to targeted WikiProjects etc that seem logical trajectories
>> for them. The mistake we make constantly in onboarding newbies is
>> overwhelming them with information (think of the standard Twinkle welcome
>> templates) because "THEY NEED TO KNOW THIS" instead of what they want to
>> know "how do I do this current thing I am trying to do". For similar
>> reasons I think any attempts to draw them into particular
>> projects/initiatives should be highly targeted, not too frequent, and based
>> on what their interests seem to be rather than where someone else would like
>> them to work. (I think we should avoid the mindset of "I need to recruit
>> some cannon fodder"). Having got their attention, someone probably has to
>> hold their hand through whatever upskilling is needed to get them
>> productive. Just pointing people at a Project page isn't helpful, there
>> needs to be some human outreach and shepherding.
>>
>> In some idealised universe, we should see Wikipedians as being on a
>> learning journey, where (through analysis of past contributions and
>> interactions) we are tracking them against 

Re: [Wiki-research-l] regional KPIs

2017-01-23 Thread Stuart A. Yeates
"closure of the [[Category:Australia]]" is not going to work. In en.wiki
subcategories are not subsets in any mathematical sense and the category
tree has many, many loops and no roots.
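
To illustrate the problem, here is a toy sketch of a category "closure"
computed as a breadth-first walk. The function name and the tiny category
graph are invented for illustration and do not reflect the real en.wiki tree.
The visited check keeps the walk from spinning on loops, but it does nothing
to stop the closure drifting into unrelated territory, which is why a depth
cap is usually needed as well.

```python
from collections import deque


def category_closure(root, subcats, max_depth=None):
    """Breadth-first walk of subcategories, tolerating loops.

    subcats maps a category name to a list of its direct subcategories.
    Returns {category: depth at which it was first reached}.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        if max_depth is not None and depth[cat] >= max_depth:
            continue  # do not expand beyond the depth cap
        for child in subcats.get(cat, []):
            if child not in depth:  # visited check stops loops
                depth[child] = depth[cat] + 1
                queue.append(child)
    return depth


# Invented toy graph with a loop and a path out of the topic area.
toy = {
    "Australia": ["Australian people", "Oceania"],
    "Australian people": ["Australian expatriates"],
    "Australian expatriates": ["Expatriates in New Zealand"],
    "Oceania": ["Australia", "New Zealand"],  # loops back to the start
    "New Zealand": ["Expatriates in New Zealand"],
}
closure = category_closure("Australia", toy)
shallow = category_closure("Australia", toy, max_depth=1)
```

In the toy graph the uncapped closure of "Australia" ends up containing
"New Zealand", which is the false-positive problem in miniature.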

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, Jan 24, 2017 at 2:12 PM, Kerry Raymond 
wrote:

> As previously came up in discussion about chapters, it would be very
> useful to have national data about Wikipedia activities, which can be
> determined (generally) from IP addresses. Now I understand the privacy
> argument in relation to logged-in users (not saying I agree with it though
> in relation to aggregate data). However, can we find a proxy that does not
> have the privacy considerations.
>
>
>
> My hypothesis is that national content is predominantly written by users
> resident in that nation. And that therefore activity on national content
> can be used as a proxy for national user editing activity.
>
>
>
> In the case of Australia, we could describe Australian national content in
> either of two ways: articles within the closure of the
> [[Category:Australia]] and/or those tagged as  {{WikiProject Australia}}.
> There are arguments for/against either (neither is perfect, in my
> experience the category closure will tend to have false positives and the
> project will tend to have false negatives).
>
>
>
> I would like to know what correlation exists between national editor
> activity (as determined from IP addresses mapped to location) and national
> content edits and if/how it changes over time for various nations. This is
> research that only WMF can do because WMF has the IP addresses and the rest
> of us can’t have them for privacy reasons.
>
>
>
> If we could establish that a strong-enough correlation existed between
> them, we could use national content activity (for which there is no privacy
> consideration) as a proxy for national editing activity. And we might even
> be able to come up with a multiplier for each nation to provide comparable
> data for national editing activity.
>
>
>
> Now, it may be that we need to restrict the edits themselves in some way
> to maximise the correlations between national content and same-nation
> editor activity.
>
>
>
> My second hypothesis is “semantic” edits (e.g. edits that add large
> amounts of content or citation) to national content will be more highly
> correlated with same-nation editors than “syntactic” edits (e.g. fix
> spelling, punctuation or Manual of Style issues) will be. I suspect most
> bots and other automated/semi-automated edits are doing syntactic edits.
>
>
>
> Now, some of you will probably be aware of
> [https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2017-01-17/Recent_research Female
> Wikipedians aren't more likely to edit women biographies]. So it may well
> be that my patriotic-editing hypothesis is also untrue. But it would be
> nice to know one way or the other.
>
>
>
> Kerry
>
>
>


Re: [Wiki-research-l] Chapters

2017-01-09 Thread Stuart A. Yeates
On Tue, Jan 10, 2017 at 1:50 PM, Kerry Raymond 
wrote:

> My personal 10c on this having been a chapter member for several  years
> and a chapter committee member for some of those years  is that there are
> the chapters who get annual funding and those who don’t. If you don’t get
> annual funding, then you have no staff member who can do the day-to-day
> administrative work (every organisation has to submit forms to their
> government, organise auditing, keep the web site updated, do the
> bookkeeping, etc) so this work has to be farmed out to the members, which
> means that sometimes you have nobody with the right skills
> (responsibilities of treasurers make it a particularly difficult role to
> fill) and that you use up all of people’s time and goodwill in doing the
> day-to-day stuff instead of doing the exciting projects you hoped you’d be
> doing as a chapter member. Contrary to what WMF thinks, there is a lot of
> work involved in writing grant applications and, when you are doing it with
> lots of volunteers each with random skills and only a certain amount of
> spare time, generally some people let you down (family issues, busy at
> work, or maybe just don’t know how to write the section allocated to them)
> and it doesn’t get finished to meet the deadline, which is then a waste of
> the time of the people who did their share of the work. The net result is a
> somewhat demoralising downward spiral with fewer members, burned-out
> committee people, and fewer achievements. I’ve pretty much abandoned trying
> to work chapter-wide and just try to do what I can in my own local area.
>
>
>
> WMF strongly pushes you to use volunteer time in a chapter, but overlooks
> practical realities. Engagement with GLAMs almost always involves weekday
> meetings; most volunteers are not available on weekdays due to their own
> employment. I have 7 upcoming GLAM sessions in the next 3 weeks (all for
> 1Lib1Ref) all on weekdays and despite my call for help to both chapter
> members and the Australian noticeboard, nobody is volunteering; I guess I
> am doing them all myself (assuming I don’t have conflicting commitments).
> Even committee meetings are very hard to schedule across 4 time zones with
> everyone with different working hours, different commitments to family
> events etc on the weekends, and technology problems with phones/computers
> often waste a lot of the meeting time (some people can’t get Hangouts to
> work for them, other people’s microphones cut out randomly, etc). Our
> chapter has never met face to face.
>

I've been approached several times to start/spearhead a national chapter
and have declined for exactly these reasons.

cheers
stuart

--
...let us be heard from red core to black sky


Re: [Wiki-research-l] Identifying Wikipedia stubs in various languages

2016-09-20 Thread Stuart A. Yeates
You _really_ need to exclude markup and include only body text when
measuring stubs. It's not uncommon for mass-produced articles with only
one or two sentences of text to approach 1K characters, once you include
maintenance templates, content templates, categories, infobox, references,
etc, etc
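
As a rough illustration of the difference, the sketch below strips the most
common markup from a chunk of wikitext before counting characters. It is a
crude regex approximation and an assumption on my part, not how DYKcheck or
any real parser works: tables, HTML comments and many other constructs are
ignored, and the sample article is invented.

```python
import re


def strip_templates(text):
    """Remove {{...}} templates, innermost first, so nesting is handled."""
    pattern = re.compile(r"\{\{[^{}]*\}\}")
    previous = None
    while previous != text:
        previous = text
        text = pattern.sub("", text)
    return text


def prose_length(wikitext):
    """Rough count of readable characters in a chunk of wikitext."""
    text = re.sub(r"<ref[^>]*/>", "", wikitext)                  # <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.S)  # <ref>..</ref>
    text = strip_templates(text)
    text = re.sub(r"\[\[(?:Category|File|Image):[^\]]*\]\]", "", text)
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # link labels
    text = re.sub(r"'{2,}", "", text)                            # bold/italics
    return len(" ".join(text.split()))


# Invented one-sentence stub padded with typical markup.
sample = (
    "{{Infobox person|name=Jane {{small|Doe}}}}\n"
    "'''Jane Doe''' is a [[New Zealand]] [[lawyer|barrister]]."
    "<ref>{{cite web|url=x}}</ref>\n"
    "{{NewZealand-bio-stub}}\n"
    "[[Category:Living people]]"
)
```

Here len(sample) counts every template and category link, while
prose_length(sample) counts only the one sentence a reader would see.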

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, Sep 21, 2016 at 5:01 AM, Morten Wang <nett...@gmail.com> wrote:

> I don't know of a clean, language-independent way of grabbing all stubs.
> Stuart's suggestion is quite sensible, at least for English Wikipedia. When
> I last checked a few years ago, the mean length of an English language stub
> (on a log-scale) was around 1kB (including all markup), and they're
> much smaller than any other class.
>
> I'd also see if the category system allows for some straightforward
> retrieval. English has
> https://en.wikipedia.org/wiki/Category:Stub_categories and
> https://en.wikipedia.org/wiki/Category:Stubs with quite a lot of links to
> other languages, which could be a good starting point. For some of the
> research we've done on
> quality, exploiting regularities in the category system using database
> access (in other words, LIKE-queries), is a quick way to grab most articles.
>
> A combination of both approaches might be a good way. If you're looking
> for even more thorough classification, grabbing a set and training a
> classifier might be the way to go.
>
>
> Cheers,
> Morten
>
>
> On 20 September 2016 at 02:40, Stuart A. Yeates <syea...@gmail.com> wrote:
>
>> en:WP:DYK has a measure of 1,500+ characters of prose, which is a useful
>> cutoff. There is weaponised javascript to measure that at en:WP:Did you
>> know/DYKcheck
>>
>> Probably doesn't translate to CJK languages which have radically
>> different information content per character.
>>
>> cheers
>> stuart
>>
>> --
>> ...let us be heard from red core to black sky
>>
>> On Tue, Sep 20, 2016 at 9:26 PM, Robert West <w...@cs.stanford.edu>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Does anyone know if there's a straightforward (ideally
>>> language-independent) way of identifying stub articles in Wikipedia?
>>>
>>> Whatever works is ok, whether it's publicly available data or data
>>> accessible only on the WMF cluster.
>>>
>>> I've found lists for various languages (e.g., Italian
>>> <https://it.wikipedia.org/wiki/Categoria:Stub> or English
>>> <https://en.wikipedia.org/wiki/Category:All_stub_articles>), but the
>>> lists are in different formats, so separate code is required for each
>>> language, which doesn't scale.
>>>
>>> I guess in the worst case, I'll have to grep for the respective stub
>>> templates in the respective wikitext dumps, but even this requires to know
>>> for each language what the respective template is. So if anyone could point
>>> me to a list of stub templates in different languages, that would also be
>>> appreciated.
>>>
>>> Thanks!
>>> Bob
>>>
>>> --
>>> Up for a little language game? -- http://www.unfun.me
>>>


Re: [Wiki-research-l] Identifying Wikipedia stubs in various languages

2016-09-20 Thread Stuart A. Yeates
en:WP:DYK has a measure of 1,500+ characters of prose, which is a useful
cutoff. There is weaponised javascript to measure that at en:WP:Did you
know/DYKcheck

Probably doesn't translate to CJK languages which have radically different
information content per character.

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, Sep 20, 2016 at 9:26 PM, Robert West  wrote:

> Hi everyone,
>
> Does anyone know if there's a straightforward (ideally
> language-independent) way of identifying stub articles in Wikipedia?
>
> Whatever works is ok, whether it's publicly available data or data
> accessible only on the WMF cluster.
>
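> I've found lists for various languages (e.g., Italian
> <https://it.wikipedia.org/wiki/Categoria:Stub> or English
> <https://en.wikipedia.org/wiki/Category:All_stub_articles>), but the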
> lists are in different formats, so separate code is required for each
> language, which doesn't scale.
>
> I guess in the worst case, I'll have to grep for the respective stub
> templates in the respective wikitext dumps, but even this requires to know
> for each language what the respective template is. So if anyone could point
> me to a list of stub templates in different languages, that would also be
> appreciated.
>
> Thanks!
> Bob
>
> --
> Up for a little language game? -- http://www.unfun.me
>


Re: [Wiki-research-l] Thinking big: scaling up Wikimedia's contributor population by two orders of magnitude

2016-08-28 Thread Stuart A. Yeates
I completely disagree with this criticism of the WMF.

It seems to me that the main barriers to getting gamification happening in
relation to en.wiki are cultural / organisational issues not marketing ones.

If the editing communities genuinely wanted huge influxes of complete
newbie editors, I have no doubt that the commercial partners who benefit
from wikipedia could send them our way pretty trivially. What the editing
communities want / need is new minimally-competent editors, and crafting
them from complete newbies (typically called on-boarding) is very costly.

See https://en.wikipedia.org/wiki/Onboarding for an overview of the
complexities.

cheers
stuart

--
...let us be heard from red core to black sky

On Sun, Aug 28, 2016 at 5:37 PM, Gerard Meijssen 
wrote:

> Hoi,
> You are absolutely right. Both approaches have promise. It is however a
> marketing job, not a research job to realise their potential. Marketing is
> where the WMF sucks.
> Thanks,
>   GerardM
>
> On 27 August 2016 at 22:49, Dario Taraborelli 
> wrote:
>
>> Nice, thought-provoking post, Pine.
>>
>> Here's my take on two ways to attract a population of good-faith
>> contributors 1 or 2 orders of magnitude larger than the current one, based
>> on what I've seen over the last couple of years:
>>
>> *Gamified interfaces for microcontributions à la Wikidata game*.
>> (per GerardM) there's absolutely no doubt this model is effective at
>> creating a large volume of high-quality edits, and value to the project and
>> communities. So far these tools have been primarily targeted at an existing
>> (and relatively small) population of core contributors and the only attempt
>> at expanding this to a much broader contributor base (WikiGrok) were too
>> premature. I do expect we will see more and more of lightweight distributed
>> curation in the next 5-10 years. In my opinion Wikidata is ready to
>> experiment with a much larger number of single-purpose contributory
>> interfaces (around missing images, translations, label evaluation,
>> referencing etc)
>>
>> *Ubiquitous outreach, supported by dedicated technology*.
>> I called out in my Wikimania 2014 talk
>> the fact that the single, most effective initiative ever run to attract new
>> contributors has been WLM (I am intentionally not including initiatives
>> like WP in the classroom as they target a pre-defined population such as
>> students, but they are probably the most advanced example in this
>> category). Creating tools such as recommender systems and todo lists
>> *tailored to the interests of particular, intrinsically motivated
>> contributors* as well as the analytics dashboards to
>> measure the relative impact and best design of these programs, is the most
>> promising venue to expand the Wikimedia contributor population.
>>
>> My 2 cents. How making the edit button 10x larger is not a solution to
>> this problem is a topic I'll reserve to a separate thread.
>>
>> Thanks for starting this thread.
>>
>> Dario
>>
>> On Sat, Aug 27, 2016 at 5:32 AM, rupert THURNER wrote:
>>
>>> On Sat, Aug 27, 2016 at 11:08 AM, Amir E. Aharoni <
>>> amir.ahar...@mail.huji.ac.il> wrote:
>>>
>>>> The English Wikipedia alone has hundreds of thousands of items to fix -
>>>> missing references, misspellings, etc. The problems are nicely sorted at
>>>> https://en.m.wikipedia.org/wiki/Category:Wikipedia_backlog . There are
>>>> millions of other things to fix in other projects. So quality is getting
>>>> higher in many ways, but the amount of stuff to fix is still enormous.
>>>>
>>>> What we don't have is an easy way for new people to start eliminating
>>>> items from the backlogs. The Wikidata games are a nice step in the right
>>>> direction, but their appeal to new participants is non-existent.
>>>>
>>>
>>> there is a backlog? after 15 years contributing you tell that on the
>>> research mailing list :) i used wikidata games for a couple of minutes and
>>> great pleasure when i see the link flying by in an email. but i am never
>>> able to find that link again in my life. maybe that is the problem? rename
>>> the "donate" link to "contribute" and then have "money" and "time" which
>>> links to code and content. just my 2c ...
>>>
>>> rupert
>>>
>>>
>>> ___
>>> Wiki-research-l mailing list
>>> Wiki-research-l@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>
>>>
>>
>>
>> --
>>
>> *Dario Taraborelli  *Head of Research, Wikimedia Foundation
>> wikimediafoundation.org • nitens.org • @readermeter
>> 
>>
>>

Re: [Wiki-research-l] Research on automatically created articles

2016-08-23 Thread Stuart A. Yeates
For the sake of completeness, the archival URL for the thread at ANI is

https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/IncidentArchive931#Moving_discussion_from_wikimedia_research_mailing_list

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, Aug 16, 2016 at 7:04 AM, Samuel Klein  wrote:

> Thanks Sidd for responding actively in this thread.
>
> The biggest problem here: the algorithms used in this research were bad.
> They produced nonsense that wasn't remotely grammatical.  You should have
> caught most of these problems.  (The early version of the bot (for just
> plays) had a poor success rate as well, but it seemed plausible that a
> template for tiny play articles could be effectively filled out with
> automation.)
>
> Two interesting results IMO:
>  + A nonsensical article with a decent first sentence & sections, and refs
> (however random), can serve as encouragement to write a real article.
> Possibly more of an encouragement than just the first sentence alone.  I
> believe there's some related research into how people respond to cold
> emails that include mistakes & nonsense.  (Surely there's a more effective
> / non-offensive way to produce similar results)
>  + We could use even a naive measure of the coverage & consistency of new
> article review.  (If it drops below a certain threshold, we could do
> something like change the background color & search-engine metadata for
> pages that haven't been properly reviewed yet)
>
> For future researchers:
> If we encourage people to spend more time making tools work – rather than
> doing something simple (even counterproductive) and writing a paper about
> it – everyone will benefit.  The main namespace is full of bots, both fully
> automatic and requiring a human to run them. Anyone considering or
> implementing wiki automation should look at them and talk to the community
> of bot maintainers.
>
> Sam
>
> On Mon, Aug 15, 2016 at 1:28 PM, siddhartha banerjee 
> wrote:
>
>> Ziko,
>>
>> Thanks for your detailed email. Agree on all the comments.
>>
>> Some earlier comments might have been harsh, but I understand that there
>> is a valid reason behind it and also the dedication of so many people
>> involved to help reach Wikipedia where it is today.
>>
>> We should have been more diligent in finding out policies and rules
>> (including IRB) before entering content on Wikipedia. We promise not to
>> repeat anything of this sort in the future and also I am trying to
>> summarize all that has been discussed here to prevent such unpleasant
>> experiences from other researchers in this area.
>>
>> -- Sidd
>>
>>
>>
>
>
> --
> Samuel Klein  @metasj   w:user:sj  +1 617 529 4266
>


[Wiki-research-l] Tips for doing wikipedia research

2016-08-18 Thread Stuart A. Yeates
In a parallel thread to this, things to avoid when doing wikipedia research
are being discussed. That's turning quite negative, so I thought I'd start
a contra-thread. This is all my personal experience and advice based on a
decade in en.wiki and twice that in academia.


Which account to use?
=====================

There are rules about use of multiple accounts, these are detailed at
[[Wikipedia:Sock puppetry]]. Sockpuppetry is probably the most harshly
punished of errors on en.wiki. The main points to note are:

(a) using multiple accounts to hide edits, the extent of editing or to
create a false impression is strictly forbidden. [If your experimental
design depends on true anonymity, take it to
[[WP:Administrators%27_noticeboard]] for advice on how to proceed, but be
prepared to justify yourself.]

(b) you can separate your general account from accounts used for testing,
experiments, etc. In the case of automated editing, all automated editing
must be done in a separate account with a name ending in the letters 'bot'.
If you use separate accounts, on-wiki stats can be used.

(c) if you use separate accounts, link them using the templates
at [[Wikipedia:Userboxes/Wikipedia/Related accounts]] or clear statements
on the user pages. See for example my user pages at [[User:Stuartyeates]]
and [[User:Stuartyeates (code test)]]

(d) different accounts can have different privileges and flags,
particularly the bot flag which permits fully automated editing and
autopatrolled which controls the level of scrutiny of new pages. You can
request funky permissions at [[Wikipedia:Requests for permissions]];
any admin can remove permissions on request.


What kinds of topics to use
===========================

As a tertiary source, Wikipedia has universal scope, biased by the coverage
of subjects in secondary sources. Some parts of that scope are more
contested than other parts and unless you wish to explicitly deal with that
contestation, you probably want to avoid some things.

Hotly contested topics: current and ongoing political and military clashes
(Crimea, the South China Sea, etc.); current political/social hot-potatoes
in countries with significant English speaking populations (abortion
rights, gamergate, climate change, etc); peak ideological figures (Jesus,
Hitler, etc); current political candidates (Trump, Clinton, etc).

Average contested topics: Biographies of living people, etc

Safer topics: Biographies of the dead (give them 12 months for obits,
probate etc), migrating data from authoritative third-party sources
(official gazetteers, national biographies, etc), recent (1-5 years ago)
academic advances, etc.


Problems we don't have
======================

1) We have enough short stub articles which need expanding. We don't need
more unless they clearly serve a secondary purpose, such as combatting our
acknowledged systematic biases.

2) We are already aware of hundreds of ways to disrupt wikipedia, we don't
need demonstrations.

3) We know the internet and internet search engines are full of unreliable
websites, content farms, paid shills and armchair politicians, we don't
need any more of them added to en.wiki as sources.

4) We know that certain kinds of hoaxes are pretty trivial to pull off, we
don't need any more, thanks.


Systematic problems we have
===========================

(1) Attracting, retaining and motivating a large, diverse pool of editors

(2) Preventing self-interested and promotional editing

(3) Countering our well-known systematic biases (gender, culture,
geography, time, etc)

(4) How should we resolve conflict among well-meaning editors when it arises?

(5) How do we improve our processes so that as much as possible we're
dealing with the deep issues rather than the shallow issues?

(6) Approximately 1/2000 of our articles meet our highest quality
requirements. How do we improve that?

(7) Disambiguation of names (aka authority control). There are lots of
institutions, people, places and events with similar names. Sorting them
out is hard, very hard.


Incidental problems we have that might make good research topics
================================================================

For a list of incidental issues that arise see [[Wikipedia:Bot requests]].


Other points
============

Create new pages in the Draft namespace rather than the Article namespace.

Putting changes or additional sources on the talk page of an article rather
than in the body is inherently safer.

Wait for the natural conclusion of events before writing about them. For a
crime, this is sentencing or end of appeals period; for an academic
discovery this is independent verification; for a medical breakthrough this
is gaining FDA approval (or giving up); etc.

Newly created articles have high mortality and high visibility. Techniques
that improve their handling are likely to be welcomed. Possibilities
include: adding a 'links' section with automatically discovered
likely-to-be-reliable sources; comparing the new article to existing
article and adding a 

Re: [Wiki-research-l] Research on automatically created articles

2016-08-14 Thread Stuart A. Yeates
I disagree.

You continue to treat the problems as information systems issues; they are
people / ethnographic issues.

This is typified in your complete failure to separate WMF, admins and
editors. WMF host this mailing list and the web servers involved and are
irrelevant unless we're dealing with libel, slander or denial of service
attacks (which we don't appear to be) or applying for grants. en.wiki
editors are the rank and file group of people who you've been making work
for (fixing pages, suggesting pages for deletion, etc). en.wiki admins enforce
the consensus of editors (block users, delete pages based on an established
consensus, etc).

cheers
stuart

--
...let us be heard from red core to black sky

On Mon, Aug 15, 2016 at 10:16 AM, siddhartha banerjee 
wrote:

> Hello,
>
> Based on the discussion and suggestion in the Admin incidents page:
> https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents#their_results,
> I have gone to each of the articles (that still
> existed) and made corrections and changes necessary -- both in terms of the
> content written as well as unreliable sources. I have requested
> administrators to check if my edits still have issues, and I would go back
> and change anything else required. I guess my advisor would be posting to
> this thread only later this week, so before that I wanted to summarize all
> that I learnt during the discussion here and on the incidents page.
>
> 1. Multiple accounts policy: Do not use multiple user accounts to post
> content.
> 2. Research ethics:  There was a serious issue in assumptions made (even
> by other researchers as can be seen from the multiple papers mentioned who
> work in this area). Furthermore, when our previous work (
> https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-01-28/Recent_research)
> was mentioned on Wikimedia
> newsletter, it did not provide any indication to us about the issues with
> legitimacy about this kind of research. But, based on that, the assumptions
> were inappropriate. It is better to involve the WMF community by letting
> them know about any project prior to its start and engaging them such that
> best decisions could be taken and such similar situations do not arise.
> As an administrator mentioned in the discussion and I think is very
> important to note: 'you not only denied the community the opportunity to
> decide whether we wish to allow/participate in this research, you precluded
> any efforts we might have made to minimize the disruption and affect a
> quick clean-up'.
> Based on the last few emails, it seems that IRB is waived, however, that
> waiver should be stamped (but this should be after the community has been
> informed of a task -- if a research might cause some disruption, it should
> not be done at any cost). Also, it would be better to create articles in a
> different namespace. The problem here was that clicking on red-links
> directly went to the article creation markup page -- which should have been
> put into draft space. But still, even creating drafts imply that other
> editors are looking at it, which should not be done without prior consent.
> Testing of any content should be done offline, and not on Wikipedia -- as
> it can potentially disrupt. Even with moderate quality content, it implies
> wastage of time for editors. I plan to bring all of these to the notice of
> the research committee who had approved this work such that similar issues
> do not happen in the future. Also, I plan to write on this and share this
> to the wider community who have worked or are working on similar problems
> [I am not sure if they have already been contacted by someone from WMF]. If
> they could also be roped into the discussion, that would be better, I
> think.
> One thing I would quote from the discussion in the incident page: "Because
> researchers and institutions need to realize that this project is not a
> laboratory for their work, not unless they make an effort to work with the
> community" and this is also very important.
> My apologies for the extra work that had to be done by the numerous
> editors to edit the content and clean them -- that cannot be reverted now
> but can definitely be stopped in future. We did not add any content after
> Feb earlier this year and have promised in that discussion not to create
> anything more. If we want to do some analysis, we plan to use other
> crowdsourcing techniques (such as Amazon mech turk) and find out quality of
> the generated content.
>
> Please add anything you think that I have missed and also regarding the
> clean-up as I have tried to remove the irrelevant material from all the
> articles edited using the usernames.
>
>
> Thanks,
> Sidd
>
>
>
>
>
> On Fri, Aug 12, 2016 at 10:02 AM, siddhartha banerjee 
> wrote:
>
>> Hi,
>>
>> My advisor, Prof. Mitra is busy in travels this week. He said he will be
>> posting to this thread about his thoughts later 

Re: [Wiki-research-l] Research on automatically created articles

2016-08-11 Thread Stuart A. Yeates
You interacted with living people through wikipedia and you've recorded
information about whether they deleted your references. Many people (like
myself) are directly traceable from their wikipedia accounts to their
real-life identities.

How is a record of someone doing something not about them?

cheers
stuart


--
...let us be heard from red core to black sky

On Fri, Aug 12, 2016 at 2:04 PM, siddhartha banerjee 
wrote:

> As I mentioned earlier, I was not sure about the multiple account policy.
> I got the notification about the incident being raised, and I will be happy
> with whatever decisions Wiki administrators make.
>
> As Denny mentioned, we did not plan anything large-scale but only for a
> small group of edits. Furthermore, we mentioned the results only being
> valid until a particular date before the submission of that conference
> paper and things may have already changed a lot (articles removed, edited
> further, etc). We have not made any additions since Feb, nor do we plan to
> do anything further. Whatever we do, would be offline.
>
> To Denny's point about other researchers trying to do the same kind of
> research, I do see research in this area coming up and it might make sense
> to have certain rules (although I do not have much idea on how these things
> work abt rules on Wiki in general.) I know this because some researchers
> have contacted me previously on this work, and they are also looking into
> similar areas. One example in this area of work is the following -- this is
> very recent:
> http://snap.stanford.edu/wikiworkshop2016/papers/wikiworkshop_icwsm2016_pochampally.pdf
>
> Regarding human subjects, no reviewers in the conferences as well as any
> other person from Wikimedia mentioned anything on that earlier. Our
> previous works were featured earlier on Wikimedia newsletters (links in
> earlier emails) and still nothing on it was mentioned nor we found any
> information on Wikipedia in general about it. As per the requirements,
> approval would be necessary for: *Data about living individuals through
> intervention or interaction* or *Identifiable private information about
> living individuals*. As mentioned, the "about" part is very important,
> because no data about editors was used or collected in the research.
>
> If rules do change, I will keep following the thread and also please let
> me know -- I will try to inform to all researchers who work in this area if
> they get in touch with me.
>
> -- Siddhartha
>
>


Re: [Wiki-research-l] Research on automatically created articles

2016-08-11 Thread Stuart A. Yeates
I think you misunderstand the nature of en.wiki.

en.wiki is not a rule-based automaton; en.wiki is an autonomous community
that works by consensus.

I cannot imagine a set of research rules constructed outside en.wiki that
lets you 'safely' interact with it. Observe it, maybe, but not interact
with it. I can also imagine certain kinds of observation (or certain
results coming out of observation) making further observation difficult.

The best advice I can provide is to team up with an experienced editor or
two.

[For editing for educational rather than research purposes see
https://en.wikipedia.org/wiki/Wikipedia:Education_program ]

cheers
stuart

--
...let us be heard from red core to black sky

On Fri, Aug 12, 2016 at 11:04 AM, Ziko van Dijk <zvand...@gmail.com> wrote:

> Hello,
>
> Do we have a collection of already existing and relevant policies and
> statements, at least for English Wikipedia? On Meta I found this page
> https://meta.wikimedia.org/wiki/Research:Wikipedia_Research_Management
> whose main statement is that research is too varied and complex to give
> just a few recommendations.
>
> At first sight, I find it difficult to read something relevant from
> https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not
>
> I imagine that guidelines could be helpful with regard to a) research that
> includes editing wiki pages, b) the editing of students or pupils for
> educational purposes.
>
> Research and educational activity should not disturb the efforts of the
> Wikipedia community to create and improve encyclopedic content.
> Disturbance can occur from creating substandard content and engaging in
> activities that disrupt workflows. ...
>
> These guidelines could be only a recommendation, as long as the Wikipedia
> communities don't change their rules. But it'd be great, anyway, if the
> guidelines can be based somehow on existing Wikipedia rules.
>
> Kind regards
> Ziko
>
>
>
>
>
> 2016-08-12 0:41 GMT+02:00 Denny Vrandečić <vrande...@gmail.com>:
>
>> So here's the list of accounts that were used in order to create the
>> articles:
>>
>> https://en.wikipedia.org/wiki/Special:Contributions/Brownweepy
>> https://en.wikipedia.org/wiki/Special:Contributions/Theatremania
>> https://en.wikipedia.org/wiki/Special:Contributions/Bhopebhai
>> https://en.wikipedia.org/wiki/Special:Contributions/Dicdac123
>> https://en.wikipedia.org/wiki/Special:Contributions/MightyPepper
>>
>> Also some edits may have been done through IPs.
>>
>> In discussion with Sidd it was clear that they did not plan to ever
>> mass-create a large number of articles, and it is only these 50 articles or
>> so we can clean up now. I am not terribly worried about this particular
>> work (according to the paper there were 47 surviving articles at the time
>> of writing, i.e. in Spring).
>>
>> What I am concerned about is the fact that there will be more such
>> experiments from other groups. It would be great to set up a few rules for
>> this kind of behavior, so that we can at least point to them. If the only
>> rule that was broken here was the "don't use multiple accounts" rule, I am
>> not sure whether that would be sufficient.
>>
>> Cheers,
>> Denny
>>
>>
>>
>> On Wed, Aug 10, 2016 at 1:47 AM Stuart A. Yeates <syea...@gmail.com>
>> wrote:
>>
>>> * The previous work you cite appears to have created articles in the
>>> draft namespace rather than the article namespace. This is a very important
>>> and very relevant detail, meaning your situation is in no way comparable to
>>> the previous work from my point of view
>>> * You appear to be solving a problem that the community of wikipedia
>>> editors does not have. We have enough low-quality stub articles that need
>>> human effort to improve and we're not really interested in more unless
>>> either (a) they demonstrably combat some of the systematic biases we're
>>> struggling with or (b) they demonstrably attract new cohorts of users to do
>>> that improvement. Note that the examples discussed in the research
>>> newsletter are a non-English writer and a woman writer. These are important
>>> details.
>>> * Your paper appears not to make any attempt to measure the
>>> statistical significance of your results; this isn't science.
>>> * Most of your sources are _really_ _really_ bad.
>>> https://en.wikipedia.org/wiki/Talonid contains 8 unique refs, one of
>>> which is good, one of which is passable and the others should be removed
>>> immediately (but I won't because it'll make it harder for t

Re: [Wiki-research-l] Research on automatically created articles

2016-08-10 Thread Stuart A. Yeates
* The previous work you cite appears to have created articles in the draft
namespace rather than the article namespace. This is a very important and
very relevant detail, meaning your situation is in no way comparable to the
previous work from my point of view
* You appear to be solving a problem that the community of wikipedia
editors does not have. We have enough low-quality stub articles that need
human effort to improve and we're not really interested in more unless
either (a) they demonstrably combat some of the systematic biases we're
struggling with or (b) they demonstrably attract new cohorts of users to do
that improvement. Note that the examples discussed in the research
newsletter are a non-English writer and a woman writer. These are important
details.
* Your paper appears not to make any attempt to measure the
statistical significance of your results; this isn't science.
* Most of your sources are _really_ _really_ bad.
https://en.wikipedia.org/wiki/Talonid contains 8 unique refs, one of which is
good, one of which is passable and the others should be removed immediately
(but I won't because it'll make it harder for third parties reading this
conversation to follow it).
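
The point about statistical significance can be made concrete with a small
sketch. This is only an illustration under stated assumptions, not the
method of the paper under discussion: it assumes each article is recorded
as 1 (survived) or 0 (deleted), and it uses a simple permutation test on
the difference in survival rates. The function name and the outcome
encoding are hypothetical.

```python
import random


def permutation_p_value(control, treated, trials=10_000, seed=0):
    """Two-sided permutation test on the difference in survival rates.

    `control` and `treated` are lists of 0/1 outcomes (1 = article survived).
    This encoding is an assumption made for the sketch.
    """
    rng = random.Random(seed)
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    hits = 0
    for _ in range(trials):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        t = pooled[:n_treated]
        c = pooled[n_treated:]
        diff = abs(sum(t) / len(t) - sum(c) / len(c))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)
```

A small p-value suggests the difference between the groups is unlikely to
be down to chance alone; with only a few dozen articles the test will
often be inconclusive, which is itself worth reporting.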

If you want to properly evaluate your technique, try this: Randomly pick N
articles from https://en.wikipedia.org/wiki/Category:Articles_lacking_sources
subcats, splitting them into control and subjects randomly. Parse
each subject article for sentences that your system appears to understand.
For each sentence your system understands, look for reliable sources to
support that sentence. Add a single ref to a single statement in each
article. Add all the refs using a single account with a message on the user
page about the nature of the edits. If you're not able to add any refs,
mark it as a failure. Measure article lifespan for each group.
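
The assignment and measurement steps above could be sketched roughly as
follows. This is a minimal illustration, not a finished tool: the function
names, the fixed seed, and the choice to count still-live articles up to
the end of the observation window are all assumptions, and fetching the
titles from the category is left out entirely.

```python
import random
from datetime import date


def split_control_subject(titles, seed=2016):
    """Shuffle article titles and split them into (control, subjects)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(titles)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def lifespan_days(created, deleted, observed_until):
    """Days an article existed; live articles count up to `observed_until`."""
    end = deleted if deleted is not None else observed_until
    return (end - created).days


def mean_lifespan(records, observed_until):
    """Mean lifespan over (created, deleted_or_None) pairs for one group."""
    spans = [lifespan_days(c, d, observed_until) for c, d in records]
    return sum(spans) / len(spans)
```

Counting live articles up to the end of the window understates their true
lifespan, so a longer observation period (or a proper survival analysis)
would give a fairer comparison between the two groups.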

If you're in a hurry and want fast results, work with articles less than a
week old (hint: article IDs are a numerically increasing sequence) or the
intersection of https://en.wikipedia.org/wiki/Category:Articles_lacking_sources
subcats and Category:Articles_for_deletion. Both of these groups of
articles are actively being considered for deletion.

cheers
stuart


--
...let us be heard from red core to black sky

On Wed, Aug 10, 2016 at 9:30 AM, siddhartha banerjee 
wrote:

> Hello Everyone,
>
> I am the first author of the paper that Denny has referred to. Firstly, I
> want to thank Denny for asking me to join this list and know more about
> this discussion.
>
> 1. Regarding quality, we know that there are issues, and even in the
> conference, I have repeatedly told the audience that I am not satisfied
> with the quality of the content generated. However, the percentage of
> articles that were not removed when the paper was submitted was minimal. I
> have sent Denny a list of accounts that were used and it might have been
> possible that several articles created have been removed from those
> accounts within the last couple of months. I was not aware of the multiple
> account policy.
>
> 2. The area of Wikipedia article generation has been explored by others
> in the past. [http://www.aclweb.org/anthology/P09-1024,
> http://wwwconference.org/proceedings/www2011/companion/p161.pdf] We were not
> aware of any rules regarding this sort of experiment. However, we do
> understand that such experiments can harm the general quality of this great
> encyclopedic resource, hence we did our analysis on bare minimum articles.
> In fact, we did our initial work on it back in 2014, and Wikimedia research
> even covered details about our paper here --
> https://blog.wikimedia.org/2015/02/02/wikimedia-research-newsletter-january-2015/#Bot_detects_theatre_play_scripts_on_the_web_and_writes_Wikipedia_articles_about_them
>
> If questions were raised at that point, we would surely not have done
> anything further on this, or rather do things offline without creating or
> adding any content on Wikipedia.
>
> I understand your point about imposing rules and I think it makes sense.
> However, during this research, we were not aware of any rules, hence
> continued our work.
> As I have told Denny, our purpose was to check whether we could create
> bare minimal articles which could be eventually improved by authors on
> Wikipedia, and also to see if they are totally removed. But, it was done
> with a few articles and we did not create anything beyond that point. Also,
> we did not do any manual modifications to the articles although we saw
> quality issues because it would void our analysis and claims.
>
> Thanks everyone for your time and the great work you are doing for the
> Wikipedia community.
>
> Regards,
> Sidd
>
>
>
>
>

Re: [Wiki-research-l] WMF Open Access Policy and Independent Researchers

2016-06-28 Thread Stuart A. Yeates
There are many open access journals which do not charge fees of any
description.  See http://www.opendoar.org/ or talk to a friendly librarian
to find a journal that meets your needs.

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, Jun 29, 2016 at 1:49 PM, Maximilian Klein  wrote:

> Hello All,
>
> As you might know WMF has an Open Access Policy that requires all work
> that they fund to be Open Access[1]. A strange consequence of this policy,
> that I recently ran into, is that it requires researchers funded by grants
> to publish OA -- but without providing any funding to do so. That is, I
> recently completed an Individual Engagement Grant (IEG), part of whose
> scope was explicitly to write a paper about the work[2], and when I wrote
> to WMF to acquire funds for OA publishing, they confirmed that the paper
> was under the OA mandate but indicated that funds were not available to pay
> for OA publishing.
>
> Has anyone else used WMF's Open Access Policy?  What was your experience?
>
> [1] https://wikimediafoundation.org/wiki/Open_access_policy
> [2]
> https://meta.wikimedia.org/wiki/Grants:IEG/WIGI:_Wikipedia_Gender_Index#Activities
>
> Make a great day,
> Max Klein ‽ http://notconfusing.com/
>


Re: [Wiki-research-l] citing female academics

2016-02-28 Thread Stuart A. Yeates
Data has been sucked from GND to wikidata via a number of routes,
principally VIAF.
See Wikidata:Bot_requests#Import_GND_identifiers_from_VIAF_dump for example
for a discussion of an instance of this.

cheers
stuart

--
...let us be heard from red core to black sky

On Mon, Feb 29, 2016 at 7:50 AM, Gerard Meijssen <gerard.meijs...@gmail.com>
wrote:

> Hoi,
> The blog states that a lot of data was sucked into Wikidata from GND. As
> far as I am aware that never happened. So its assertion is wrong.
> Thanks,
>   GerardM
>
> On 28 February 2016 at 19:43, Stuart A. Yeates <syea...@gmail.com> wrote:
>
>>
>>
>> --
>> ...let us be heard from red core to black sky
>>
>> On Mon, Feb 29, 2016 at 7:14 AM, Gerard Meijssen <
>> gerard.meijs...@gmail.com> wrote:
>>
>>> Hoi,
>>> It is trivial when you only consider Wikidata.
>>>
>>
>> I've previously blogged about the issues with sex / gender in wikidata at
>> http://opensourceexile.blogspot.co.nz/2014/07/adrian-pohl-wrote-some-excellent.html
>> Has the situation moved on?
>>
>> cheers
>> stuart
>>
>>
>>
>


Re: [Wiki-research-l] citing female academics

2016-02-28 Thread Stuart A. Yeates
--
...let us be heard from red core to black sky

On Mon, Feb 29, 2016 at 7:14 AM, Gerard Meijssen 
wrote:

> Hoi,
> It is trivial when you only consider Wikidata.
>

I've previously blogged about the issues with sex / gender in wikidata at
http://opensourceexile.blogspot.co.nz/2014/07/adrian-pohl-wrote-some-excellent.html
Has the situation moved on?

cheers
stuart


Re: [Wiki-research-l] citing female academics

2016-02-28 Thread Stuart A. Yeates
On Monday, 29 February 2016, Joe Corneli  wrote:

> On Sun, Feb 28, 2016 at 8:24 AM, Jane Darnell wrote:
>
> > Oddly, there appears to be no solidarity among female Wikipedians that take
> > this into account, because I assume we have lots of female academic
> > Wikipedians who could easily write about other female academics in academic
> > articles (or on Wikipedia) if they wanted to and don't.
>
> I have a very basic question, to do with navigating Wikipedia's
> categories.  Is there a sensible way to query the category system (or
> extracts, e.g. to DBPedia) to produce a side-by-side comparison of how
> many pages on ♀ vs ♂ [might as well add: vs ⚧, i.e. nonbinary] academics
> there are in existence on Wikipedia?


I have written biographies of third gender academics as well as those who
appear not to have published gender info.  Finding reliable sources on this
facet of private people is very, very hard and likely to be a stumbling
block to actually writing articles.

Cheers
Stuart



-- 
--
...let us be heard from red core to black sky


Re: [Wiki-research-l] citing female academics

2016-02-28 Thread Stuart A. Yeates
I've done a lot of [[WP:NPP]], and I already have a prejudice about an
article before I've read the first word based on the layout of the article
(bold name? cats? infobox? reference section? reference section in
columns?).

I recently did a push to increase the diversity of coverage of local
academics
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles/New_Zealand_academic_biographies
and I can't stress enough how useful the 'page furniture' (everything other
than the body text) is for the palatability of stubs to the editors at new
page patrol and others dealing with new articles. If you're doing a
lot of articles, investment in a good template is time well spent. The one
I used for this is at
https://en.wikipedia.org/wiki/User:Stuartyeates/sandbox/academicbio

cheers
stuart


--
...let us be heard from red core to black sky

On Sun, Feb 28, 2016 at 9:24 PM, Jane Darnell  wrote:

> Well I think it is even more basic than that. People (and myself as
> Wikipedian included) tend to google search for info and rarely pick up the
> pay-walled stuff if their searches are set to free knowledge. We all know
> how google favors Wikipedia, but this particular female academic has no
> Wikipedia page, while the guy who wrote the offending blog post does. I
> mean the one who wrote the book (which would probably have been on such a
> page) but the woman who wrote the blog about the blog doesn't have one
> either. So if the guy just googled the stuff there is a very good chance
> that he really didn't pick up the info that the blog is objecting to. In
> other words, the problem with systemic bias is even worse than she knows.
>
> Oddly, there appears to be no solidarity among female Wikipedians that
> take this into account, because I assume we have lots of female academic
> Wikipedians who could easily write about other female academics in academic
> articles (or on Wikipedia) if they wanted to and don't. In fact, on
> Wikipedia they just hold them to the same biased standards and are probably
> (being detail oriented) even more careful with "the rules" than men are,
> which Yaroslav discovered to his distaste this week when I asked him (as
> academic) to take a look at an AfC for Nitasha Kaul which he successfully
> created after crossing swords with a (self-proclaimed female academic) AfC
> volunteer LaMona:
>
> https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:WikiProject_Articles_for_creation=707318059
> (scroll down to Draft:Nitasha Kaul)
>
> Even though I was annoyed enough to post about this on facebook (which is
> where Yaroslav responded) I don't even fault LaMona for her behavior, since
> she is "just following AfC rules" and has probably never even realized that
> what she did was not only not taking the wider academic community's female
> bias into account, but also the "Global South bias" and the "people of
> color bias". This is exactly why we organize things like Art and
> Women's History month, if only to try and get the conversation started. You
> only start to understand the problem when you do something like what
> Yaroslav did (which I myself was unwilling to do, to my shame).
>
> For the record, as Yaroslav is a common figure at AfD, his comment that it
> would be kept there is what allowed the article in main namespace:
> https://en.wikipedia.org/wiki/Talk:Nitasha_Kaul
>
>
> On Sun, Feb 28, 2016 at 6:13 AM, Mark J. Nelson  wrote:
>
>> One exacerbating factor maybe worth adding in, which is also relevant
>> for what Wikipedia cites imo, is that more popular or journalistic
>> writing tends not to cite academic writing, even when very relevant,
>> sometimes even when the journalist/author in question actually did read
>> something by the academic in question during the course of their
>> research. Partly this is because journalistic/popular writing has much
>> less emphasis on citations as currency to begin with, and stylistically
>> prefers to avoid citations and footnotes. And partly because they seem
>> to only consider other things on a similar level of popularity worth
>> acknowledging--- other best-sellers, well-known pundits, even
>> high-traffic blogs, but not as much the lowly academic monograph or
>> journal article.
>>
>> -Mark
>>
>> Heather Ford  writes:
>>
>> > There's an interesting discussion going on right now on the Association
>> > of Internet Researchers mailing list about the citing of women (and women
>> > of colour) in academia that I thought might be interesting. The comments
>> > are also really (as Gabriella Coleman noted) 'lively' so they're worth a
>> > read too. I'd be curious to learn more about how we as a Wikipedia research
>> > community fare here too...
>> >
>> > https://merylalper.com/2016/02/22/please-read-the-article-please-cite-women-academics/
>> >
>> > Best,
>> > Heather.
>> >
>> > Dr Heather Ford
>> > University Academic Fellow
>> > School of 

Re: [Wiki-research-l] Looking for stats of registered Wikipedians

2015-08-02 Thread Stuart A. Yeates
Due to the pseudonymous nature of Wikipedia, there is no way to collect the
number of Wikipedians, only the number of accounts they create and the
actions of those accounts.

cheers
stuart

--
...let us be heard from red core to black sky

On Mon, Aug 3, 2015 at 8:25 AM, Srijan Kumar srijanke...@gmail.com wrote:

 Hi everyone,

 I am looking for the average number of Wikipedians who register on English
 Wikipedia per month, and make at least one edit. Is there any such
 information?
 http://reportcard.wmflabs.org/graphs/new_editors has similar information,
 but not quite what I am looking for.

 Thanks. Any pointers are appreciated!
 Srijan



Re: [Wiki-research-l] Community health (retitled thread)

2015-06-04 Thread Stuart A. Yeates
> Here's a list of possible metrics that we could use for measuring
> community health.

That's a great list, with some great metrics. I'd be inclined to add some
silo-breaking metrics which measure activity across projects or across
silos within projects:

* Number of editors with actions/edits on more than N wikis (N=2, N=3, etc)
* Number of editors with actions/edits on more than N namespaces on the
same wiki (N=2, N=3, etc)
...
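
The two metrics above could be computed along these lines. This is only a rough sketch: it assumes the edit history has already been flattened into (editor, silo) pairs, where a silo is either a wiki or a namespace, and the function name and sample data are made up for illustration.

```python
from collections import defaultdict


def editors_active_on_more_than(edits, n):
    """Count editors whose edits span more than n distinct silos.

    `edits` is an iterable of (editor, silo) pairs; a silo can be a wiki
    name or a namespace, depending on which metric is wanted.
    """
    silos_by_editor = defaultdict(set)
    for editor, silo in edits:
        silos_by_editor[editor].add(silo)
    return sum(1 for silos in silos_by_editor.values() if len(silos) > n)


# Hypothetical sample data, not real editing activity.
sample_edits = [
    ("Alice", "enwiki"), ("Alice", "commonswiki"), ("Alice", "wikidatawiki"),
    ("Bob", "enwiki"), ("Bob", "enwiki"),
    ("Carol", "miwiki"), ("Carol", "enwiki"),
]

for n in (1, 2, 3):
    print(n, editors_active_on_more_than(sample_edits, n))
```

Feeding the same function (editor, namespace) pairs from a single wiki gives the second metric.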

cheers
stuart


--
...let us be heard from red core to black sky


Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Stuart A. Yeates
Accept-language is systematically broken for minority languages within
dominant language communities. In New Zealand, a country with three
official languages and a textbook case of language revivalism, I've never
met anyone without a degree in computer science who sets accept-language,
and I've never seen a computer system which ships with all three official
languages selectable. Most computer systems ship with en or en-us as the
default.

If there were silver bullets in this area, the solution would be obvious
and we wouldn't even be thinking about having this conversation.
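
To make the limitation concrete, here is a minimal sketch of reading the header. It assumes the raw Accept-Language value is available as a string; the function name is hypothetical and the parsing is simplified (only a "q=" weight is handled).

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language value, best first.

    Simplified sketch: each comma-separated item is "tag" or "tag;q=weight".
    A missing weight counts as 1.0; items with weight 0 or an unreadable
    weight are dropped. Ties keep their order in the header.
    """
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


# A machine left on its shipped default only ever advertises English.
print(parse_accept_language("en-US,en;q=0.9"))
# A user who has configured Māori ahead of English.
print(parse_accept_language("mi,en-NZ;q=0.8,en;q=0.6"))
```

The parser can only report what the browser was configured to send, so a reader on an untouched default looks like an English-only reader whatever they would actually prefer.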

cheers
stuart

On Thursday, May 7, 2015, Oliver Keyes oke...@wikimedia.org wrote:

 As I've now said...4 times, I don't think we'd be using geolocation.
 We'd be using the accept-language header. See
 https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language

 On 7 May 2015 at 00:52, WereSpielChequers werespielchequ...@gmail.com
 wrote:
  When a reader comes to Wikipedia from the web we can detect their IP
 address and that usually geolocates them to a country. More often than not
 that then tells you the dominant language of that country.
 
  If we were to default to official or dominant languages then I predict
 endless arguments as to which language(s) should be the default in which
 countries. The large expat community in some parts of the Arab world might
 prefer English over Arabic. India would want to do things by state, and a
 whole new front would emerge in the Israeli Palestine debate.
 
  Regards
 
  Jonathan Cardy
 
 
  On 7 May 2015, at 05:06, Sam Katz smk...@gmail.com wrote:
 
  hey guys, you can't guess geolocation, because occasionally you'd be
  wrong. this happens to me all the time. I want to read a site in
  spanish... and then it thinks I'm in Latin America, when I'm not.
 
  --Sam
 
  On Wed, May 6, 2015 at 10:07 PM, Oliver Keyes oke...@wikimedia.org
 wrote:
  Possibly. But that sounds potentially wooly and sometimes inaccurate.
 
  When a browser makes a web request, it sends a header called the
  Accept-Language header
  (https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
  which indicates what languages the browser finds ideal - i.e., what
  languages the user and system are using.
 
  If we're going to make modifications here (I hope we will. But again;
  early days) I don't see a good argument for using geolocation, which
  is, as you've noted, flawed without substantial time and energy being
  applied to map those countries to probable languages. The data the
  browser already sends to the server contains the /certain/ languages.
  We can just use that.
 
  On 6 May 2015 at 22:50, Stuart A. Yeates syea...@gmail.com wrote:
  This seems like a great place to use analytics data, for each division
  in the geo-location classification, rank each of the languages by
  usage and present the top N as likely candidates (+ browser settings)
  when we need the user to pick a language.
 
  cheers
  stuart
  --
  ...let us be heard from red core to black sky
 
 
  On Thu, May 7, 2015 at 2:24 PM, Mark J. Nelson m...@anadrome.org
 wrote:
 
  Stuart A. Yeates syea...@gmail.com writes:
 
  Reading that excellent presentation, the thought that struck me was:
 
  If I wanted to subvert the assumption that Wikipedia == en.wiki,
  linking to http://www.wikipedia.org/ is what I'd do.
 
  A smarter http://www.wikipedia.org/ might guess geo-location and thus
  local languages.
 
  I'd also like to see something smarter done at the main page, but the
  "and thus" bit here is notoriously tricky.
 
  For example most geolocation-based things, like Wikidata by default,
  tend to produce funny results in Denmark. A Copenhagener is offered
  something like this choice, in order:
 
  * Danish, Greenlandic, Faroese, Swedish, German, ...
 
  The reasoning here is that Danish, Greenlandic, and Faroese are official
  languages of the Danish Realm, which includes both Denmark proper and
  two autonomous territories, Greenland and the Faroe Islands. And then
  Sweden and Germany are the two neighboring countries.
 
  But for the average Copenhagener, the following order is far more
  likely:
 
  * Danish, English, Norwegian Bokmål, ...
 
  The reason here is that Norwegian Bokmål is very close to Danish in
  written form (more than Swedish is, and especially more than Faroese is)
  while English is a widely used semi-official language in business,
  government, and education (for example about half of university theses
  are now written in English, and several major companies use it as their
  official workplace language).
 
  I think it's possible to come up with something that better aligns with
  readers' actual preferences, but it's not easy!
 
  -Mark
 
  --
  Mark J. Nelson
  Anadrome Research
  http://www.kmjn.org
 

Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Stuart A. Yeates
Probably also an excellent time to consider whether we can do anything
for those languages which don't have wikis yet.

For example, I'm in .nz, which has en, mi and nzs as official
languages, but we're a long way from an nzs.wiki, given that ase.wiki
is still in incubator. With the release of Unicode 8 with Sutton
SignWriting in June, these may or may not kick off in a big way.

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, May 7, 2015 at 12:34 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Agreed! That's one of the changes I'd really like to push ahead with,
 although we're going to do some more in-depth data collection before
 any redesign :).

 On 6 May 2015 at 20:27, Stuart A. Yeates syea...@gmail.com wrote:
 Reading that excellent presentation, the thought that struck me was:

 If I wanted to subvert the assumption that Wikipedia == en.wiki,
 linking to http://www.wikipedia.org/ is what I'd do.

 A smarter http://www.wikipedia.org/ might guess geo-location and thus
 local languages.

 cheers
 stuart

 --
 ...let us be heard from red core to black sky


 On Thu, May 7, 2015 at 6:40 AM, Oliver Keyes oke...@wikimedia.org wrote:
 Cross-posting to research and analytics, too!


 -- Forwarded message --
 From: Oliver Keyes oke...@wikimedia.org
 Date: 6 May 2015 at 13:11
 Subject: Traffic to the portal from Zero providers
 To: wikimedia-sea...@lists.wikimedia.org


 Hey all,

 (Throwing this to the public list, because transparency is Good)

 I recently did a presentation on a traffic analysis to the Wikipedia
 home page - www.wikipedia.org.[1]

 One of the biggest visualisations, in impact terms, showed that a lot
 of portal traffic - far more, proportionately, than traffic to
 Wikipedia overall - is coming from India and Brazil.[2] One of the
 hypotheses was that this could be Zero traffic.

 I've done a basic analysis of the traffic, looking specifically at the
 zero headers,[3] and this hypothesis turns out to be incorrect -
 almost no zero traffic is hitting the portal. The traffic we're seeing
 from Brazil and India is not zero-based.

 This makes a lot of sense (the reason mobile traffic redirects to the
 enwiki home page from the portal is the Zero extension, so presumably
 this happens specifically to Zero traffic) but it does mean that our
 null hypothesis - that this traffic is down to ISP-level or
 device-level design choices and links - is more likely to be correct.

 [1] http://ironholds.org/misc/homepage_presentation.html
 [2] http://ironholds.org/misc/homepage_presentation.html#/11
 [3] https://phabricator.wikimedia.org/T98076

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation




 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Stuart A. Yeates
Reading that excellent presentation, the thought that struck me was:

If I wanted to subvert the assumption that Wikipedia == en.wiki,
linking to http://www.wikipedia.org/ is what I'd do.

A smarter http://www.wikipedia.org/ might guess geo-location and thus
local languages.

cheers
stuart

--
...let us be heard from red core to black sky


On Thu, May 7, 2015 at 6:40 AM, Oliver Keyes oke...@wikimedia.org wrote:
 Cross-posting to research and analytics, too!


 -- Forwarded message --
 From: Oliver Keyes oke...@wikimedia.org
 Date: 6 May 2015 at 13:11
 Subject: Traffic to the portal from Zero providers
 To: wikimedia-sea...@lists.wikimedia.org


 Hey all,

 (Throwing this to the public list, because transparency is Good)

 I recently did a presentation on a traffic analysis to the Wikipedia
 home page - www.wikipedia.org.[1]

 One of the biggest visualisations, in impact terms, showed that a lot
 of portal traffic - far more, proportionately, than traffic to
 Wikipedia overall - is coming from India and Brazil.[2] One of the
 hypotheses was that this could be Zero traffic.

 I've done a basic analysis of the traffic, looking specifically at the
 zero headers,[3] and this hypothesis turns out to be incorrect -
 almost no zero traffic is hitting the portal. The traffic we're seeing
 from Brazil and India is not zero-based.

 This makes a lot of sense (the reason mobile traffic redirects to the
 enwiki home page from the portal is the Zero extension, so presumably
 this happens specifically to Zero traffic) but it does mean that our
 null hypothesis - that this traffic is down to ISP-level or
 device-level design choices and links - is more likely to be correct.

 [1] http://ironholds.org/misc/homepage_presentation.html
 [2] http://ironholds.org/misc/homepage_presentation.html#/11
 [3] https://phabricator.wikimedia.org/T98076

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread Stuart A. Yeates
This seems like a great place to use analytics data, for each division
in the geo-location classification, rank each of the languages by
usage and present the top N as likely candidates (+ browser settings)
when we need the user to pick a language.
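
A rough sketch of that ranking, assuming the analytics data can be reduced to (region, language) pairs, one per request. The function names and the sample counts are invented for illustration; they are not real traffic figures.

```python
from collections import Counter, defaultdict


def top_languages_by_region(requests, n):
    """Rank languages by request count within each region; keep the top n."""
    counts = defaultdict(Counter)
    for region, language in requests:
        counts[region][language] += 1
    return {
        region: [language for language, _ in counter.most_common(n)]
        for region, counter in counts.items()
    }


def candidate_languages(region, browser_languages, ranking):
    """Browser settings first, then the region's top languages, no repeats."""
    candidates = list(browser_languages)
    for language in ranking.get(region, []):
        if language not in candidates:
            candidates.append(language)
    return candidates


# Hypothetical request log.
sample_requests = (
    [("DK", "da")] * 5 + [("DK", "en")] * 3 + [("DK", "nb")] * 1
    + [("NZ", "en")] * 4 + [("NZ", "mi")] * 2
)

ranking = top_languages_by_region(sample_requests, 2)
print(ranking)
print(candidate_languages("DK", ["en"], ranking))
```

Ranking by observed usage rather than by official status is what would let English and Norwegian Bokmål surface for a Copenhagen reader, if that is what the data shows.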

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, May 7, 2015 at 2:24 PM, Mark J. Nelson m...@anadrome.org wrote:

 Stuart A. Yeates syea...@gmail.com writes:

 Reading that excellent presentation, the thought that struck me was:

 If I wanted to subvert the assumption that Wikipedia == en.wiki,
 linking to http://www.wikipedia.org/ is what I'd do.

 A smarter http://www.wikipedia.org/ might guess geo-location and thus
 local languages.

 I'd also like to see something smarter done at the main page, but the
 "and thus" bit here is notoriously tricky.

 For example most geolocation-based things, like Wikidata by default,
 tend to produce funny results in Denmark. A Copenhagener is offered
 something like this choice, in order:

 * Danish, Greenlandic, Faroese, Swedish, German, ...

 The reasoning here is that Danish, Greenlandic, and Faroese are official
 languages of the Danish Realm, which includes both Denmark proper and
 two autonomous territories, Greenland and the Faroe Islands. And then
 Sweden and Germany are the two neighboring countries.

 But for the average Copenhagener, the following order is far more
 likely:

 * Danish, English, Norwegian Bokmål, ...

 The reason here is that Norwegian Bokmål is very close to Danish in
 written form (more than Swedish is, and especially more than Faroese is)
 while English is a widely used semi-official language in business,
 government, and education (for example about half of university theses
 are now written in English, and several major companies use it as their
 official workplace language).

 I think it's possible to come up with something that better aligns with
 readers' actual preferences, but it's not easy!

 -Mark

 --
 Mark J. Nelson
 Anadrome Research
 http://www.kmjn.org



Re: [Wiki-research-l] Anyone have access to this article?

2015-04-01 Thread Stuart A. Yeates
I have to say that a WMF staffer using their official WMF account to
ask community members to commit copyright infringement is not a good
look.

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, Apr 2, 2015 at 10:48 AM, Jonathan Morgan jmor...@wikimedia.org wrote:
 http://onlinelibrary.wiley.com/doi/10./jcom.12123/abstract

 What Creates Interactivity in Online News Discussions? An Exploratory
 Analysis of Discussion Factors in User Comments on News Items

 If you have access, and can send me a PDF offline, I would be very grateful
 :)

 Cheers,
 Jonathan


 --
 Jonathan T. Morgan
 Community Research Lead
 Wikimedia Foundation
 User:Jmorgan (WMF)
 jmor...@wikimedia.org




Re: [Wiki-research-l] Anyone have access to this article?

2015-04-01 Thread Stuart A. Yeates
I think you mean might have been permissible, if the original request
had included the intended use.

cheers
stuart

--
...let us be heard from red core to black sky


On Thu, Apr 2, 2015 at 10:57 AM, Nicole Askin nask...@alumni.uwo.ca wrote:
 Stuart, this is permissible per Wiley's terms of use - Authorized Users may
 also transmit such material to a third-party colleague in hard copy or
 electronically for personal use or scholarly, educational, or scientific
 research or professional use.
 Nicole

 On Wed, Apr 1, 2015 at 2:52 PM, Stuart A. Yeates syea...@gmail.com wrote:

 I have to say that a WMF staffer using their official WMF account to
 ask community members to commit copyright infringement is not a good
 look.

 cheers
 stuart
 --
 ...let us be heard from red core to black sky


 On Thu, Apr 2, 2015 at 10:48 AM, Jonathan Morgan jmor...@wikimedia.org
 wrote:
  http://onlinelibrary.wiley.com/doi/10./jcom.12123/abstract
 
  What Creates Interactivity in Online News Discussions? An Exploratory
  Analysis of Discussion Factors in User Comments on News Items
 
  If you have access, and can send me a PDF offline, I would be very
  grateful
  :)
 
  Cheers,
  Jonathan
 
 
  --
  Jonathan T. Morgan
  Community Research Lead
  Wikimedia Foundation
  User:Jmorgan (WMF)
  jmor...@wikimedia.org
 
 





Re: [Wiki-research-l] preelminary results from the Wikipedia Gender Inequality Index project - comments welcome

2015-01-13 Thread Stuart A. Yeates
I have a question about the P21.

Has any of the GND author sex information leaked into P21? Because
that's known-bad data.

It's bad because the GND in all its wisdom decided to assign sex to
authors based on the apparent gender of the name published under, even
for periods when many women were known to publish using male
pseudonyms. I believe that VIAF takes this data at face value,
effectively poisoning pretty much every conceivable use of library
authority data for gender analysis for the foreseeable future.

cheers
stuart

--
...let us be heard from red core to black sky


On Wed, Jan 14, 2015 at 10:07 AM, Magnus Manske
magnusman...@googlemail.com wrote:
 To spam this list as well as Twitter :-)

 http://magnusmanske.de/wordpress/?p=250

 On Tue, Jan 13, 2015 at 5:41 PM, Maximilian Klein isa...@gmail.com wrote:

 Thank you all for the feedback. I will have taken away quite a few good
 ideas for further investigation, to summarize:

 Gerard - look at the ratios of those bios of a language, which exist only
 in that language.
 Han Teng - male gaze hypothesis, create a by-profession crosstabular
 analysis.
 Jane - look at the ratios of leading actors by language, and fictional
 humans more closely.
 Jonathan - perform a filter step, or perhaps a weighting by page-views.

 Thanks so much for the advice, what a great list.

 Make a great day,
 Max Klein ‽ http://notconfusing.com/

 On Mon, Jan 12, 2015 at 5:57 AM, WereSpielChequers
 werespielchequ...@gmail.com wrote:

 I have spent quite a bit of time at new page patrol over the years. My
 suspicion is that many if not most of the people who create articles on
 newly signed pop stars and actors are from their management agency rather
 than fans, especially if they seem too early in their career to have fans.
 Sportspeople I suggest are more likely to be written about by fans,
 especially if they have been signed by a major team, or more importantly for
 Wikipedia a team with an actively editing fan.

 On this theory the quality of articles, the number of edits, and when we
 had the Article Feedback Tool the number of "is hot" type comments would be
 a good indication of interest from the volunteer editing community. But
 article creation is in part a matter of the policy of the relevant talent
 agencies.

 Sorry if that sounds overly cynical, perhaps if it were possible one
 would filter out the articles that get scarcely any views and then look at
 the gender balance of articles that are of interest to our audience as well
 as our editors.

 Regards

 Jonathan Cardy


 On 11 Jan 2015, at 22:23, h hant...@gmail.com wrote:

 Hello Piotr and Gerard,

 I think a competing hypothesis would be male gaze. That is to say,
 the more female representation is not about a culture (defined as national,
 ethnic, linguistic or regional, not macho/feminine), but rather a
 gender-interest bias. Thus the more female representation could mean more
 male dominant culture, which is against the theoretical assumption of
 Piotr's research.

 Note that East Asian Wikipedians that I know, especially those who
 edit Chinese Wikipedia, are predominantly very young. Some of them can be
 highly interested in opposite sex.

 Check the following category pages as examples:
 (1a) Female actors of every country in the world

 http://zh.wikipedia.org/wiki/Category:%E5%90%84%E5%9C%8B%E5%A5%B3%E6%BC%94%E5%93%A1
 (1b) Male actors of every country in the world

 http://zh.wikipedia.org/wiki/Category:%E5%90%84%E5%9B%BD%E7%94%B7%E6%BC%94%E5%91%98

 (2a) Female Japanese AV (i.e. porn) actresses

 http://zh.wikipedia.org/w/index.php?title=Category:%E6%97%A5%E6%9C%ACAV%E5%A5%B3%E5%84%AA
 (2b) Male Japanese AV (i.e. porn) actors

 http://zh.wikipedia.org/w/index.php?title=Category:%E6%97%A5%E6%9C%ACAV%E7%94%B7%E5%84%AA

 It is quite clear that the male gaze hypothesis seems to apply here.
 There is more female representation simply because they are there to be
 consumed by men or boys.

 So one of my suggestions for research is to select a few professional
 categories that are of interest (say, politicians, poets, entertainers,
 etc.) to do some cross-tab analysis.

 Thus, I will be extremely cautious against using the current
 metrics/methods as viable gender inequality index.

 As a proponent of data normalization and geographic normalization
 method myself, I would distinguish two sets of comparisons: one is
 cross-country or cross-language version absolute value comparison, another
 is cross-country or cross-language version normalized value comparison. By
 geographic normalization, I mean that researchers must gather another set of
 cross-country or cross-language datasets that captures some aspects of
 realities external to Wikipedia. In this case, I would say the Wikipedia
 represented politicians' gender ratio against the offline gender ratio of
 politicians. In other words, data normalization allows researchers to
 compare which language version are more or 

Re: [Wiki-research-l] Research discussion: Visions for Wikipedia

2014-10-28 Thread Stuart A. Yeates
On Wed, Oct 29, 2014 at 9:09 AM, Aaron Halfaker ahalfa...@wikimedia.org wrote:
 When that kind of roadblock gets put in the path of innovation, we're
 already ossified.

 That's an interesting opinion.  It seems that you are suggesting that the
 problem is not recoverable.  How do you know that is true?

The CC license gives us the assurance that the problem is recoverable.
The question is WMF's role in the recovery.

HHVM is promising evidence that the WMF is open to technical
innovation within a single layer of the infrastructure, but note that
that has been driven from within the WMF, resourced by the WMF and I
believe peopled by the WMF.

What is needed is a framework (technical and organisational) that
allows for similar innovation to be done by non-WMF people in areas
that the WMF agrees with in principle but considers not a resourced
priority. I certainly see no evidence of that.

cheers
stuart



Re: [Wiki-research-l] Research discussion: Visions for Wikipedia

2014-10-27 Thread Stuart A. Yeates
Sometimes I find the history of gcc ([[GNU Compiler Collection]]) enlightening.

gcc was one of the first pieces of open source software to be embedded
ubiquitously, globally, in lots of very important things. By 1997 its
development had ossified and those pushing for new features forked the
code to take it in new directions. Within two years the vigorous
development in the fork had led to it being the blessed official
version.

cheers
stuart

On Tue, Oct 28, 2014 at 12:15 PM, Oliver Keyes oke...@wikimedia.org wrote:
 If it's that trivial to implement, implement it.

 That's a very compressed way of saying; I think it's fine for us to disagree
 on this list. But, really? Pine's email made you despair? It, by
 inference, made you conclude he doesn't accept new things? You find the
 absence of a feature actively irrational?

 It's okay for Pine's vision to be different from yours, or mine, or Aaron's,
 or anyone else's. Wikimedia's ethos is not built on any one person's vision:
 it is built on the sum of all of our hopes (in an ideal universe). It's not
 a one-in, one-out system where ideas must be harshly and actively countered
 so that yours can take primacy.

 So let's try and stay non-hyperbolic and civil on this list, please. As a
 heuristic; if even /you/ feel a need to write an apology for your email into
 an email, don't hit send.

 On 27 October 2014 17:14, Gerard Meijssen gerard.meijs...@gmail.com wrote:

 Hoi,
 I read your mail again. It makes me despair.

 Wikimedia research is NOT about Wikipedia, not exclusively. When I read
 what is an inspiration to you I find all the reasons why Wikipedians do not
 accept anything new. Why we still do not have a search that also returns
 information on what is NOT in that particular Wikipedia. It is only one
 example out of many. It is however so easy to implement, it defies logic
 that it has not happened on all Wikipedias. It is just one example that
 demonstrates that we do not even share the sum of all information that is
 available to us.

 ...

 Sorry,
   GerardM

 On 20 October 2014 08:23, Pine W wiki.p...@gmail.com wrote:

 Both of the presentations at the October Wikimedia Research Showcase were
 fascinating and I encourage everyone to watch them [1]. I would like to
 continue to discuss the themes from the showcase about Wikipedia's
 adaptability, viability, and diversity.

 Aaron's discussion about Wikipedia's ongoing internal adaptations, and
 the slowing of those adaptations, reminded me of this statement from MIT
 Technology Review in 2013 (and I recommend reading the whole article [2]):

 The main source of those problems (with Wikipedia) is not mysterious.
 The loose collective running the site today, estimated to be 90 percent
 male, operates a crushing bureaucracy with an often abrasive atmosphere that
 deters newcomers who might increase participation in Wikipedia and broaden
 its coverage.

 I would like to contrast that vision of Wikipedia with the vision
 presented by User:CatherineMunro (formatting tweaks by me), which I re-read
 when I need encouragement:

 THIS IS AN ENCYCLOPEDIA
 One gateway
 to the wide garden of knowledge,
 where lies
 The deep rock of our past,
 in which we must delve
 The well of our future,
 The clear water
 we must leave untainted
 for those who come after us,
 The fertile earth,
 in which truth may grow
 in bright places,
 tended by many hands,
 And the broad fall of sunshine,
 warming our first steps
 toward knowing
 how much we do not know.

 How can we align ourselves less with the former vision and more with the
 latter? [3]

 I hope that we can continue to discuss these themes on the Research
 mailing list. Please contribute your thoughts and questions there.

 Regards,

 Pine

 [1] youtube.com/watch?v=-We4GZbH3Iw

 [2]
 http://www.technologyreview.com/featuredstory/520446/the-decline-of-wikipedia/

 [3] Lest this at first seem to be impossible, I will borrow and tweak a
 quote from George Bernard Shaw and later used by John F. Kennedy: Some
 people see things as they are and say, 'Why?' Let us dream things that never
 were and say, 'Why not?'


 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation



Re: [Wiki-research-l] What works for increasing editor engagement?

2014-09-14 Thread Stuart A. Yeates
My personal hypothesis is that much wikipedia incivility is part of
the broader internet-troll phenomenon (google "Don't Read The
Comments" if you're unfamiliar with the effects of trolling). I'd be
very interested to see a linguistic comparison between classes of
edits/comments tagged as 'bad' across a range of sites which allow
unmoderated comments.

Being able to confirm that a large part of the problem was actually part
of an internet-wide problem rather than a local problem would be a big
step forward.

It worries me that the WMF may, by making the wikipedia interface more
similar to other discussion systems, reduce the differences between us
and the troll-infested platforms and make it psychologically easier
for those who troll on other platforms to troll on wikipedia.

cheers
stuart



Re: [Wiki-research-l] What works for increasing editor engagement?

2014-09-14 Thread Stuart A. Yeates
On Mon, Sep 15, 2014 at 12:29 PM, Kerry Raymond kerry.raym...@gmail.com wrote:
 I have email notification for my watch list

How many items on your watchlist? I appear to have accumulated 14,871 items
on mine since I last zero'd it. Right now there are 159 changes in the
last 24 hours.

I'm not sure I could cope with that volume.

Part of the problem is probably my participation in WP:BLP/N, which
means that at least once a week I edit an article that's getting lots
of edits and likely to for some time.

cheers
stuart



Re: [Wiki-research-l] Joining derp?

2014-09-03 Thread Stuart A. Yeates
What I think we really need is better standardisation of description
of datasets, so that they can be shared in machine-readable ways. Then we
can have as many different groups working with different sets of
datasets as we like and still search, find and publish globally.

cheers
stuart

On Thu, Sep 4, 2014 at 11:29 AM, Jonathan Morgan jmor...@wikimedia.org wrote:
 I don't think there's cause for you to be concerned, Stu. FWIW, we've talked
 to Tim since launch, and after we expressed our concerns he assured us that
 the model of DERP is still just facilitating connections in a non-exclusive
 way, rather than playing a role as a reviewing body or a data broker of any
 kind.

 There were other reasons we decided to be a little more cautious about
 committing to this kind of initiative. As Toby Negrin pointed out recently:
 There is one major difference between the companies involved in DERP and
 ourselves -- they all use data collected from their users to make money and
 we explicitly do not. This is frankly a point of pride for many members of
 the foundation and certainly the community.

 More pragmatically, the last week of organizing for the DERP launch just
 happened too fast for us (and happened during Wikimania, to boot!). Those of
 us in research-y roles hadn't had a chance to discuss all the evolving
 details as a team, and on the eve of the launch we didn't all feel we had a
 100% clear idea of what commitments we would be making by joining.

 But we're still on the DERP mailing list, and (if the review gods are
 merciful) we plan to co-organize a CSCW workshop with Tim Hwang and Max
 Goodman at CSCW 2015.

 We like DERP! Don't stop DERPing!

 - Jonathan




 On Wed, Sep 3, 2014 at 3:31 PM, R.Stuart Geiger sgei...@gmail.com wrote:

 Hi all, thanks for all the info. I'm a DERP fellow, which means I was
 planning on participating in this as a researcher (I'm doing some work on
 reddit, too) as well as serving as an advisory board. I apparently haven't
 been involved in the same threads/calls with the DERP organizers that Aaron,
 Jonathan, and Dario have been on, and I'm kind of shocked at what I'm
 hearing. I completely believe you guys, it just runs so opposite to what
 I've been told that I'm dreading the e-mail I think I'm going to have to
 write to the DERP folks.

 This is the first time I've heard anything about DERP being much more than
 an informal communication broker between organizations and academic
 researchers. DERP was pitched to me as a big signaling mechanism to
 researchers, platforms, and the public that there are spaces outside of
 Facebook and Twitter to do research. Wikimedia obviously doesn't need DERP
 as much as some of the smaller platforms do, but I thought it would be great
 for Wikimedia's presence (yes, the logo) to be there, standing in solidarity
 with the lesser-researched platforms. As it was explained to me, all that
 was supposed to be involved in a platform joining DERP is 1) a public
 declaration that they are open to receiving requests from researchers via
 DERP and 2) a commitment to review and respond to proposals that were
 e-mailed from researchers to DERP. In one of the fellows calls, I actually
 think someone asked whether DERP would be like an Institutional Review Board
 that would independently approve/reject studies, and we all thought that it
 would be better for these to be done on a case-by-case basis between the
 researcher and the platform(s).

 Early on, I actually suggested adding some language about ethics. I
 suggested that as we started these projects, it would be great to develop an
 ongoing, informal set of best practices for doing computational social
 science in an academic/industry partnership -- particularly in the wake of
 the Facebook emotion contagion study. Something like a series of blog posts
 about the various ethical issues we encountered in the course of doing this
 kind of research across a bunch of different platforms, and ways that they
 were resolved. Perhaps that might synthesize into a mini workshop
 culminating in a whitepaper, but it wouldn't ever be binding. As I was told
 about it, DERP's direct role ends once the researcher has made successful
 contact with the platform, aside from very high-level community organizing
 things like discussions about best practices. Same thing with data standards
 -- it is a fool's errand to mandate those, but I was told that DERP might
 one day be a hub where people could talk about how to integrate data from
 different platforms.

 I did see the language that "All research supported by DERP will be
 released openly and made publicly available", but I interpreted this as
 something even weaker than Green OA -- that even if you publish in a closed
 access journal, you have to write something up about the research. Kind of
 like what Aaron did with our ABS paper. [1] The idea was that you shouldn't
 be able to do studies in the dark without anybody ever knowing about them.
 The fellows were 

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-08 Thread Stuart A. Yeates
Web browser language settings are an obvious place to start this. This
will give you an approximation of the user's preferred language (more
likely the preferred language of those who configured their software).
See http://www.w3.org/International/questions/qa-lang-priorities.en.php
for the gory details.
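
As a rough illustration of what reading those settings involves: browsers send the configured preferences in the Accept-Language request header, as a comma-separated list of language tags with optional q weights. The sketch below is only a minimal, assumption-laden example and not anything taken from this thread; the function name parse_accept_language and the sample header are hypothetical, and it ignores wildcard handling and other corners of the header grammar.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first.

    Hypothetical helper for illustration only; not a full parser.
    """
    ranked = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag with no q parameter gets the default weight of 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending weight, then by original position in the header.
            ranked.append((-quality, position, tag.strip().lower()))
    ranked.sort()
    return [tag for _, _, tag in ranked]


# Made-up sample header, not real traffic data.
example = "mi, en-NZ;q=0.8, en;q=0.5, fr;q=0"
print(parse_accept_language(example))
```

Counting the first tag per request across a sample of traffic would then give a crude baseline, with the caveat above that it reflects whoever configured the software.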

cheers
stuart



Re: [Wiki-research-l] this month's research newsletter

2014-07-07 Thread Stuart A. Yeates
On Mon, Jul 7, 2014 at 7:16 PM, Joe Corneli holtzerman...@gmail.com wrote:
 On Mon, Jul 7, 2014 at 4:33 AM, Kim Osman kim.os...@qut.edu.au wrote:

 The newsletter is an important and unique space that has the potential to 
 foster this interaction through gathering current research and also 
 considering via effective and importantly *attributed* peer review, future 
 research directions. And maybe even collaborations...

 At the risk of drifting away from the initial theme in this thread,
 Kim's comment reminds me of:
 http://meta.wikimedia.org/wiki/Wiki_Research_Ideas/Research_Hub

That's a good idea, but as I understand it, the demand is for stuff
closer to the academic end of the spectrum than the wikipedia end of
the spectrum; because there's a huge demand from academics for things
that count towards tenure (etc).

Academic means peer review, basically. Maybe what we need is a peer
review journal with a pair of review panels, one of academics (is
this sound science, competently carried out?) and one of experienced
wikipedians (is this conducted in an open and transparent fashion and
showing awareness of the wiki way of doing things?). To be accepted,
papers would have to pass both panels.

cheers
stuart



Re: [Wiki-research-l] this month's research newsletter

2014-07-06 Thread Stuart A. Yeates
I've been avoiding jumping into this thread, to let people closer to
the issue have the first say but it seems to me that there are a
couple of things that bear saying:

* We're a cross-discipline group, academia and Wikipedia

* While the portion of the review in question may not have been an
appropriate academic criticism, it was certainly an appropriate
Wikipedia criticism (and a criticism I agree with).

* It's up to those who write it to collectively decide what the
newsletter is to be. Deference to the standards of academia will benefit
the careers of those in academia. Deference to the standards of
Wikipedia will increase the chances of some of this research actually
leading to better outcomes in live wiki. Maybe a better articulation
of this to reviewers and reviewed might help, as might two-part
reviews addressing the concerns of each audience separately.

* I can't believe that there's a shortage of people to write reviews.
I can believe that there's a shortage of people motivated to write
reviews. Maybe we could look at a DYK-like quid pro quo system? Note
that this could be done independently from the editing of the
newsletter, all it would take is a quorum of (potential) editors to
set up a wiki page to coordinate and set standards.

cheers
stuart



Re: [Wiki-research-l] Women on Wikidata

2014-04-20 Thread Stuart A. Yeates
On Sun, Apr 20, 2014 at 7:11 PM, Gerard Meijssen
gerard.meijs...@gmail.com wrote:

 To be blunt, Wikidata gains the quantitative quality I am looking for when 
 only male and female
 is added where applicable. Transgender issues with respect are edge cases.

Transgender issues are primarily raised because they're vitally
important for people today, but they're not the only issues.

Far more numerous are the issues of people writing under
other-gendered pseudonyms; that's a systemic problem, in the GND data
for example. "Lord Charles Albert Florian Wellesley" and "Currer
Bell" were only outed as pseudonyms of Charlotte Brontë once she
achieved a certain level of fame. Modern analysis suggests that there
are probably thousands if not tens of thousands of other writers who
never achieved that level of fame and never had their pseudonyms
revealed. GND and similar library data commonly base their gender data
on nothing more than the apparent gender of the name on the cover page
(librarianship practice, unlike archival practice, takes such things
at face value). To take that librarianship practice out of context and
assert that those thousands or tens of thousands of authors were
men (rather than just publishing under male or ambiguous names) isn't
going to get you sued, but that doesn't mean it's not the
white-washing of generations of women writers.

cheers
stuart



Re: [Wiki-research-l] Women on Wikidata

2014-04-20 Thread Stuart A. Yeates
On Mon, Apr 21, 2014 at 7:10 AM, Magnus Manske
magnusman...@googlemail.com wrote:

 The success of Wikidata is tightly coupled to the re-use of its wealth of
 data, both in Wikimedia projects and by third parties. Completeness of data
 is very much a factor here; for some research purposes, completeness may
 even be more important than 100% accuracy. As we have seen on Wikipedia,
 accuracy will improve over time, if a critical mass of contributors can be
 achieved.

I'm surprised that the WMF lawyers signed off on this. Deliberately
getting the sex of living people wrong seems like the kind of thing
litigation is made of. But then again, I'm not a lawyer.

cheers
stuart



Re: [Wiki-research-l] Women on Wikidata

2014-04-19 Thread Stuart A. Yeates
I have huge issues with using wikidata in this fashion. The BLP and gender
guidelines on en.wiki have evolved over a very long time for some very good
reasons.

I invite you to explain for example how culturally appropriate your
approach is for non western cultures with more than two genders.

Or the usefulness of describing the gender of prominent transgendered
people using a website with no policy against attack pages.

cheers
stuart

On 19/04/2014 11:32 PM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

 Hoi,
 Many of you have done research on the gender gap in Wikipedia articles.
As a result you must have associated articles with people and those people
with their gender.

 It would be awesome if you would do the following:
 provide us with files that include at least that information.
 better, add pertinent information to Wikidata ... at least the fact that
they are human and, their sex
 It would be stellar when you can identify differences between what you
know and what is known in Wikidata
 The point is very much that a lot of information is added to Wikidata all
the time and when your base line information is known to Wikidata, it will
cover Wikipedia that much better.

 In your research you may want to look into the current difference in sex
between men and women... You can find it all the time, near real time..
Currently there are 150.801 females for 755.747 males known to Wikidata.
Yes, you can change the queries to find only female painters or females
with India as their nationality.. or males obviously

 * male
http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM[31%3A5]%20AND%20CLAIM[21%3A6581097]

 *female
http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM[31%3A5]%20AND%20CLAIM[21%3A6581072]

 When you would like your own database, you can.
 Thanks,
  GerardM



Re: [Wiki-research-l] Women on Wikidata

2014-04-19 Thread Stuart A. Yeates
On 20/04/2014 11:05 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

 What I do know is that at Wikidata we harvest information from all
Wikipedias. It does include en.wp but it is not exclusively so. It does
include the Russian, the Chinese, the Arabic ... all Wikipedias. As you
know, the first operational task for Wikidata is to replace the old inter
language links. A next objective is to include all the information that is
currently held in info boxes.

What process does wikidata have when different wikis have different
policies about what should appear by default in infoboxes, in particular
when a policy calls for discretion or human judgement?

cheers
stuart


Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-23 Thread Stuart A. Yeates
Not discounting the excellent points made above, I can't help but feel that
there are groups that have been fighting discrimination in institutions for
decades and that maybe we need to work with them rather than reinvent a
non-straight-white-male-wheel ourselves. People like
http://womenintheartsfoundation.org/ , http://www.guerrillagirls.com/ ,
http://www.ifuw.org/ , etc. It seems to me like the kind of activity that
WMF might have funds for.

cheers
stuart


Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-20 Thread Stuart A. Yeates
On Fri, Feb 21, 2014 at 7:55 AM, David Monniaux david.monni...@free.fr wrote:


 I do not find such books on female sports. In fact, if I look for a book
 on the French women's soccer team on Amazon, I find something...
 extracted from Wikipedia! (Recall that football is the most popular
 sport in France...)

 In short, for certain topics (e.g. male sports), there is a gazillion
 books, biographies, and other source material readily available, while
 for others (e.g. female sports) such sources are more difficult to find.

 What I would like to understand is how much the bias is caused by such
 imbalances in sources. A possible evaluation method would be to consider
 female and male personalities (e.g. writers) equal in notoriety (e.g.
 according to scholars from that field), and to compare the length and
 quality of the biographies. What do you think?


A couple of confounding factors:
(a) Historically many talented women writers have written as men (or using
a house pseudonym).

(b) Historically serials have been consumed disproportionately by women and
books by men. Historically libraries index book content but not serial
content by subject. Thus material written for a female audience has lower
visibility, even to writers in the field, because it's so much harder to
find.

cheers
stuart


Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-17 Thread Stuart A. Yeates
On Tue, Feb 18, 2014 at 9:34 AM, Samuel Klein meta...@gmail.com wrote:

 Why do you want categories in the first place?  Why not extract
 whatever semantic meaning you need (e.g., about genderbread) by
 parsing the sentences in each article?


Because for most people gender is a private matter which never makes it
into their article because being a private matter there are no reliable
sources about it?

 Coming from a Western, English-language point of view it's very easy to
 create structures that declare groups of people such as fa'afafine
 incapable of existing.

 ... so many assumptions you just made there :-)


Yes, but I happen to know they're all true; because I was speaking of
myself.


 Why is this a problem?
 The attribute gender according to DNB is a) useful historical data,
 b) verifiable, and c) easy to add to wikidata. I believe you can have
 DNB-gender as one of the variations on the global gender
 attribute.  Most articles (unless they are talking about the DNB
 specifically) would likely refer to the global attribute.  But this
 way you can have both datasets globally accessible.  Then after the
 import is done, people can write bulk data-cleaning scripts to help
 humans review those articles where the two differ.  And in cases where
 there is a years-long edit war about what the global attribute should
 be, you can keep track of what the input source-data is from various
 sources.


I'm primarily an en.wiki editor and frankly don't care about wikidata,
except as it affects en.wiki.

What I am sure of is that 'gender' on en.wiki defaulting to DNB-gender
unless the individual has spoken about their gender in reliable sources is
inappropriate. Not only does it breach WP:BLP, but by white-washing
minorities it is a travesty of [[Wikipedia:Systemic bias]].

cheers
stuart


Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-17 Thread Stuart A. Yeates
On Tue, Feb 18, 2014 at 9:44 AM, Federico Leva (Nemo) nemow...@gmail.com wrote:


 Even disregarding the impact, to assess the bias of the contributors
 themselves a more precise research, comparative in nature, would be needed.
 For instance, if one writes articles on parliament members in country X,
 and 70 % of articles are about males, that’s only biased if the actual
 percentage of male MP is less than 70 %. The same should be done with all
 the sources for each topic.


I disagree with this. If one is using biased sources (such as a list which
is 70% male) it is one's responsibility to match that, where possible, with
other sources to counteract that bias. IMHO.


 And it’s nothing compared to the systemic bias towards the western and
 anglo-saxon point of view which writing in English and using (mostly
 online) English sources encourages, let alone languages less global in
 nature.


I completely agree with you.

cheers
stuart


Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-17 Thread Stuart A. Yeates


 I agree it is difficult in Wikipedia to:
 * measure bias in number, size, quality of articles
 * correlate bias in articles to skewed demographics of editors contributing
 to those articles (probably adjusted by frequency, size, or nature of edits)
 * determine if editors creating observable bias in the articles are doing so
 deliberately or unconsciously
 * postulate ways to address the bias

 But since we are here to discuss research, then let's discuss what would be
 a set of experiments that would help to answer these questions?



What would be great would be a set survey of the top 5000 (i.e. the group
that Laura is already working with) where they were asked basic questions
about the fields they edited in and their perception of gender bias, then
half way through they were presented with their rating by Laura's work,
then another set of questions relating to gender bias.

Suitably phrased questions could be used to discover:
(a) whether they're a priori aware of the apparent bias
(b) whether they are surprised at the gender balance in their articles
discovered by Laura's work
(c) whether they see the gender balance as an issue
(d) whether they are aware of any untapped or under-utilised resources for
women's articles in their fields
(e) whether they are interested in working to combat apparent bias


cheers
stuart