Just as a suggestion, you can turn these kinds of numbers into a probability distribution using the beta distribution. If you use (1,1) as a prior, you get something like beta(251,1) as the distribution over the probability that somebody named "Aaron" is male.
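
To make that concrete, here is a rough Python sketch of the update. The counts (250 male and 0 female people named "Aaron") are an assumption, chosen only because they would turn a (1,1) prior into beta(251,1); the function names are made up for illustration, and the interval is a Monte Carlo estimate from the standard library rather than an exact quantile.

```python
import random

def beta_posterior(males, females, prior=(1.0, 1.0)):
    """Return (a, b) of the beta posterior for P(male | name).

    A beta(a0, b0) prior updated with binomial counts gives
    beta(a0 + males, b0 + females).
    """
    a0, b0 = prior
    return a0 + males, b0 + females

def posterior_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)

def credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Approximate central credible interval by sampling beta(a, b)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi

# Assumed counts for "Aaron": 250 male, 0 female (hypothetical).
a, b = beta_posterior(250, 0)
mean = posterior_mean(a, b)
lo, hi = credible_interval(a, b)
print(f"posterior: beta({a:g},{b:g}), mean={mean:.4f}, "
      f"95% interval=({lo:.4f}, {hi:.4f})")
```

The point of doing it this way is that a name seen only a handful of times gets a much wider interval than one seen hundreds of times, even if both were 100% male in the data.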

-----Original Message-----
From: Markus Krötzsch
Sent: Sunday, October 13, 2013 6:16 PM
To: Discussion list for the Wikidata project.
Subject: [Wikidata-l] Application: sexing people by name/research gender bias

Hi all,

I'd like to share a little Wikidata application: I just used Wikidata to
guess the sex of people based on their (first) name [1]. My goal was to
determine gender bias among the authors in several research areas. This
is how some people spend their free time on weekends ;-)

In the process, I also created a long list of first names with
associated sex information from Wikidata [2]. It is not super clean but
it served its purpose. If you are a researcher, then maybe the gender
bias of journals/conferences is interesting to you as well. Details and
some discussion of the results are online [1].

Cheers,

Markus

[1] http://korrekt.org/page/Note:Sex_Distributions_in_Research
[2] https://docs.google.com/spreadsheet/ccc?key=0AstQ5xfO-xXGdE9UVkxNc0JMVWJzNmJqNmhPRjc0cnc&usp=sharing

_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
