A copy of this post can be found at

My dissertation topic involves a demographic and geographic study of
Australian sport fandom online. There are several sites and social networks
where you can get publicly available demographic data to begin to build
a picture of the user population, and then segment that population by
interest in a league, sport or athlete. I’ve spent a lot of time looking at
Twitter, Facebook and LiveJournal. Recently, partly because of a trip to the
… and discussions with a few people at
UCNISS <http://www.ucniss.net/>, my interest in who was contributing to
Australian sport articles on Wikipedia increased.

Finding out who edited a Wikipedia article using publicly available
information is a bit of a challenge. The most reliable information about who
edited comes from IP addresses, which give a rough idea of a contributor’s
geographic location. With the help of a friend, it is easy enough to build a
tool that pulls the history of a Wikipedia article, extracts the IP addresses
that edited it, and feeds each address into another tool that looks up the
contributor’s general location. (One of my favorite visualizations of this
type of information is WikipediaVision <http://www.lkozma.net/wpv/index.html>.)
The data isn’t always accurate, and if I were looking primarily at New
Zealand, a country without its own dedicated IP address range, it would be
even less reliable. Still, for my purposes, this data works pretty well.

This data is still pretty limited, because many articles are edited by
registered users, whose IP addresses are not shown in the history.
Sometimes it is possible to get demographic and geographic information about
these contributors by viewing their user pages. Doing this manually is time
consuming when an article has a large number of contributors, because you
need to view a lot of user pages, and that becomes a deterrent to collecting
geographic information about article contributors.
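If the usernames are already known, their user pages can be checked in bulk rather than one at a time. This is a sketch assuming the MediaWiki API’s `prop=categories` query and its limit of 50 titles per request for ordinary accounts; `user_page_batches` and `categories_query` are hypothetical names. Each dictionary produced is the set of query-string parameters for one request; users with long category lists may still need the API’s continuation handling.

```python
def user_page_batches(usernames, batch_size=50):
    """Yield 'User:A|User:B|...' strings holding at most batch_size titles each."""
    titles = [f"User:{name}" for name in usernames]
    for start in range(0, len(titles), batch_size):
        yield "|".join(titles[start:start + batch_size])


def categories_query(titles):
    """Query-string parameters asking for the categories of the given pages."""
    return {
        "action": "query",
        "prop": "categories",
        "titles": titles,
        "cllimit": "max",
        "format": "json",
        "formatversion": "2",
    }


# Usage: one request per batch of up to 50 user pages.
for titles in user_page_batches(["Example", "Another example"]):
    print(categories_query(titles))
```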

I was looking for a more time-efficient and accurate method of collecting
the geographic and demographic information that contributors make publicly
available on their user pages. The easiest and quickest way to get this
information on a mass scale is to use userbox information. Many userboxes,
when included on a user page, put the user into a category. These
categories are often then linked through the Wikipedian category.
Beyond that, userboxes are templates, and it is easy to get a list of the
pages (user pages) that a template is included on.
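Both lists can also be pulled without copying and pasting. This sketch assumes the MediaWiki API modules `list=categorymembers` and `list=embeddedin`, restricted to the User namespace (namespace 2); the helper names are hypothetical. `query_list` follows the API’s `continue` tokens so that categories larger than one page of results are fetched in full, and it takes the HTTP call as a parameter so it can be tried without a network connection. Subcategories (namespace 14) would need a separate, recursive pass.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://en.wikipedia.org/w/api.php"


def http_fetch(params: dict) -> dict:
    """Send one GET request to the API and decode the JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "userbox-study-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def query_list(params: dict, list_key: str,
               fetch: Callable[[dict], dict] = http_fetch) -> Iterator[dict]:
    """Yield every item of a list=... query, following 'continue' tokens."""
    request = dict(params, action="query", format="json", formatversion="2")
    while True:
        data = fetch(request)
        yield from data.get("query", {}).get(list_key, [])
        if "continue" not in data:
            break
        # Merge the continuation parameters into the next request.
        request = dict(request, **data["continue"])


def category_members(category: str, fetch=http_fetch) -> Iterator[dict]:
    """User pages (namespace 2) in a category."""
    params = {"list": "categorymembers", "cmtitle": category,
              "cmnamespace": "2", "cmlimit": "max"}
    return query_list(params, "categorymembers", fetch)


def template_transclusions(template: str, fetch=http_fetch) -> Iterator[dict]:
    """User pages (namespace 2) that include a template, such as a userbox."""
    params = {"list": "embeddedin", "eititle": template,
              "einamespace": "2", "eilimit": "max"}
    return query_list(params, "embeddedin", fetch)
```

For example, `[m["title"] for m in category_members("Category:Wikipedians in Australia")]` returns titles such as `User:Example`, ready for the clean-up steps below.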

The methodology that I selected from this point is rather straightforward.
It involves the following steps:

1. Select a category.
2. Copy and paste the list of articles (user pages) in the selected category
into an Excel spreadsheet. Sort the list alphabetically. Copy and paste only
the user pages into Notepad. Replace "* User" with a blank. Copy and paste
this list back into Excel.
3. Create a filter for cells that contain "/". Select those cells, copy them
to Notepad and replace "/" with [tab], which splits user subpages off so they
can be removed from the list. Copy this back into Excel and select only the
column with usernames.
4. Run an Advanced Filter to remove all duplicate rows.
5. Copy this list back into the dedicated spreadsheet. Label all of those
users, in a unique column, with the category from which they were pulled.
6. Repeat steps 1 to 5 until all the categories you want are included.
7. Merge/group all the rows by username.
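Steps 2 to 6 can be sketched in Python too, assuming each category’s members arrive as a list of page titles such as `User:Example/Sandbox` (for instance from the API, or pasted from the category page). The function names are hypothetical. They mirror the spreadsheet steps: keep only `User:` pages, cut off anything after the first `/` to fold subpages into the main user page, remove duplicates and tag each username with its category.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def username_from_title(title: str) -> Optional[str]:
    """'User:Example/Sandbox' -> 'Example'; anything that is not a user page -> None."""
    title = title.lstrip("* ").strip()  # tolerate a pasted '* ' bullet (step 2)
    if not title.startswith("User:"):
        return None
    return title[len("User:"):].split("/", 1)[0]  # drop subpages (step 3)


def label_category(titles: Iterable[str], category: str) -> List[Tuple[str, str]]:
    """Steps 2-5 for one category: user pages only, de-duplicated, tagged."""
    seen = set()
    rows = []
    for title in titles:
        name = username_from_title(title)
        if name and name not in seen:  # step 4: remove duplicates
            seen.add(name)
            rows.append((name, category))  # step 5: label with the category
    return sorted(rows)


def collect(categories: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Step 6: repeat for every category and stack the labelled rows."""
    rows = []
    for category, titles in categories.items():
        rows.extend(label_category(titles, category))
    return rows
```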

This method may not be the most efficient way of going about this, and it
could probably be improved by automating some of these steps. In my case, I
could not complete step 7 in Excel. I had to e-mail the file to
@woganmay <http://twitter.com/woganmay>, who I believe converted the file to
a MySQL database, grouped the rows by username, converted the results back
to CSV and e-mailed the file to me.
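The grouping in step 7 can also be done in plain Python instead of MySQL. In this sketch (hypothetical function names; input assumed to be `(username, category)` pairs like those produced by steps 1 to 6), each user ends up on one row with one 0/1 column per category, which Excel can open as an ordinary CSV file.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_user(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Step 7: username -> sorted list of categories, like SQL GROUP BY."""
    groups = defaultdict(set)
    for username, category in rows:
        groups[username].add(category)
    return {user: sorted(cats) for user, cats in sorted(groups.items())}


def to_wide_csv(groups: Dict[str, List[str]]) -> str:
    """One row per user and one 0/1 column per category, as CSV text."""
    categories = sorted({c for cats in groups.values() for c in cats})
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["username", *categories])
    for user, cats in groups.items():
        writer.writerow([user, *(1 if c in cats else 0 for c in categories)])
    return buf.getvalue()
```

Writing the returned string to a `.csv` file gives the merged spreadsheet in one step.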

I did not complete this for every category. Some categories did not seem
worth the time because they had too few user pages. Others were simply too
big to process; these included User de, User en, User es, User fr, User it
and User jp. Because of time constraints, only selected categories were
included. Data gathering focused on the categories that I expected to have
the greatest number of Australians and other likely contributors to
Australia-related articles. Once these categories were largely exhausted,
categories with
between 1,00 and 5,000 articles were selected.

There are all sorts of limitations to this data. First, not everyone includes
userboxes on their user pages. This means that there could be a lot more
Australians on Wikipedia than userbox inclusion indicates. The resulting
data assumes that the categories are proportionally representative: if there
are X Christians and Y Atheists, the assumption is that the ratio of X to Y
matches the ratio in the actual Wikipedia population. Whatever data is
available thus has to be treated as good enough, or supplemented by visiting
individual user pages to look for other information whenever a user with no
userbox data turns up when running the list against an article’s history.
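To make the proportionality assumption concrete, here is a small worked example using two rows of the Religion table in this post (Atheist 97, Christian 47). It ignores the other Christian denominations listed there and assumes no user carries both boxes. Under the assumption, the counts say nothing about how many Australian Wikipedians are atheists overall, only about the ratio between the two groups among users who declare either one. `shares` is a hypothetical helper.

```python
def shares(counts):
    """Each category's share of all users counted across these categories."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Two rows from the Religion table for Australian Wikipedians.
religion = {"Atheist": 97, "Christian": 47}

for name, share in shares(religion).items():
    print(f"{name}: {share:.3f}")  # Atheist: 0.674, then Christian: 0.326

ratio = religion["Atheist"] / religion["Christian"]
print(f"Atheist-to-Christian ratio: {ratio:.2f}")  # 2.06
```

Read this way, the data suggests roughly two Atheist userboxes for every Christian one among Australian Wikipedians who use either.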

Second, even where userboxes exist, useful pieces of information are often
missing. For example, in an Australian context, there is a userbox for Rugby
League fans. There is not, however, a userbox for Australian rules footy
fans. Nor are there userboxes and categories for fans of NRL or AFL teams.
(This type of userbox and category does exist for National Hockey League
teams.)

About halfway through this process, I realized that this data could be
useful for analysis beyond who is editing Wikipedia. So far, I’ve only
totaled the data I have for Australians. It is pretty fascinating and would
be neat to go further with: How does the proportional size of the Australian
Wikipedian population compare with the actual population? Does the size of
the Australian Atheist versus Christian community reflect the proportions in
Australian society? Or is the Australian Wikipedian community
demographically distinct from the greater population?
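One way to frame the first two questions is a representation index: a group’s share among Australian Wikipedians divided by its share of the Australian population, so that values above 1 mean over-representation. The sketch assumes you supply the population shares yourself (for example from census figures, which are not part of this post); `representation_index` is a hypothetical helper, and the counts are the Australian state and territory rows of the State table.

```python
def representation_index(counts, population_shares):
    """Share among Wikipedians divided by share of the population, per group."""
    total = sum(counts.values())
    return {group: (n / total) / population_shares[group]
            for group, n in counts.items()}


# Australian states and territories from the State table (NZ regions left out).
state_counts = {
    "Australian Capital Territory": 89, "New South Wales": 345,
    "Northern Territory": 5, "Queensland": 208, "South Australia": 144,
    "Tasmania": 54, "Victoria": 370, "Western Australia": 145,
}

# population_shares must map the same keys to fractions that sum to 1,
# e.g. taken from census data; no such figures are included here.
# index = representation_index(state_counts, population_shares)
```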

The following tables are based on people who were included in Wikipedians
in Australia <http://en.wikipedia.org/wiki/Category:Wikipedians_in_Australia>
and its subcategories, and in Australian
Wikipedians <http://en.wikipedia.org/wiki/Category:Australian_Wikipedians>.
A copy of the raw data can be found at October 9 – Wikipedia English Data –
The data is provided without comment, though any attempts to explain the
patterns are very much appreciated.
Country (count)
Bangladesh: 3
Canada: 2
Egypt: 2
India: 1
Indonesia: 2
Ireland: 3
Jamaica: 2
Japan: 5
New Zealand: 17
Papua New Guinea: 1
Republic of Ireland: 5
Singapore: 5
South Africa: 2
South Korea: 1
Sri Lanka: 2
Tanzania: 2
Turkey: 2
United States: 16

State (count)
Australian Capital Territory: 89
Canterbury: 1
New South Wales: 345
Northern Territory: 5
Otago: 1
Queensland: 208
South Australia: 144
Southland: 1
Tasmania: 54
Victoria: 370
Wellington: 2
Western Australia: 145

Degree (count)
BA degrees: 21
BCom degrees: 2
BCS degrees: 3
BE degrees: 18
BMus degrees: 1
BS degrees: 41
MS degrees: 5
PhD degrees: 18

University/Alma Mater (count)
Australian National University: 14
Avondale College: 1
Charles Sturt University: 1
Curtin University of Technology: 7
Deakin University: 6
Flinders University: 7
… University: 1
James Cook University: 2
La Trobe University: 2
Macquarie University: 5
Massey University: 1
Monash University: 19
Royal Melbourne Institute of Technology: 10
University of Adelaide: 4
University of Alberta: 1
University of Canberra: 3
University of Melbourne: 21
University of New England: 4
University of New South Wales: 24
University of Newcastle: 8
University of Sydney: 16
University of Tasmania: 3
University of Technology, Sydney: 4
University of Western Australia: 11
University of Wollongong: 4
Victorian College of the Arts: 1

Student type (count)
Business students: 3
College students: 26
Law students: 9
Medical students: 8
University students: 59

Website (count)
Open Directory Project: 1
OpenStreetMap: 2
Wookieepedia: 1

Religion (count)
Anglican and Episcopalian: 8
Antitheist: 3
Atheist: 97
Buddhist: 13
Catholic: 7
Christian: 47
Eastern Orthodox: 2
Hindu: 1
Jewish: 4
Lutheran: 1
Methodist: 2
Muslim: 4
Non-denominational Christian: 2
Objectivist: 2
Pastafarian: 17
Presbyterian: 3
Protestant: 11
Roman Catholic: 10

Ethnicity and nationality (count)
Argentine: 2
Bangladeshi: 2
British: 3
English: 10
Latino/Hispanic: 1

Skill (count)
Aircraft pilots: 5
Artists: 3
Engineers: 17
Filmmakers: 17
Homebrewers: 10
Mechanical engineers: 1
Professional writers: 1
Surfers: 2

Profession (count)
Accountants: 2
Actor: 5
Actuaries: 2
Aircraft pilots: 5
Biologist: 9
Broadcasters: 5
Chemist: 6
Composers: 28
Computer scientists: 7
Engineers: 17
Filmmakers: 17
Geoscientists: 2
Mechanical engineers: 1
Scientists: 7
Teacher: 18
University teacher: 4
Web designers: 2
Web developers: 1

Interest (count)
Chemistry: 27
Cooking: 1
Physics: 34
Strings (physics): 6

Sports (count)
Cavers: 2
Cross-country runners: 4
Dancers: 3
Detroit Red Wings fans: 2
Equestrians: 2
Fencers: 2
Geocachers: 8
Hikers: 2
Hunters: 7
Outdoor pursuits: 2
Rugby league fans: 50
Runners: 2
Sailing: 1
Scuba divers: 8
Snowboarders: 2
Swimmers: 16
Swing dancers: 1
Toronto Maple Leafs fans: 1
Ultimate Fighting Championship fans: 2
Vancouver Canucks fans: 3
WikiProject Tennis members: 4

Wikipedia Status (count)
Administrator hopefuls: 41
Administrators: 45
Administrators who will provide copies of deleted articles: 11
Bureaucrats: 1
Contribute to Wikimedia Commons: 1
Create userboxes: 3
Opted out of automatic signing: 4
Reviewers: 10
Rollbackers: 27
Service Award Level 01: 12
Service Award Level 02: 14
Service Award Level 03: 10
Service Award Level 04: 5
Service Award Level 05: 6
Service Award Level 06: 9
Service Award Level 07: 11
Service Award Level 08: 3
Service Award Level 09: 2
Commons administrators: 2

Philosophy (count)
Hindu: 1
Humanist: 6
Materialist: 9
Pastafarian: 16
Theist: 9

twitter: purplepopple
blog: ozziesport.com
Wikimediaau-l mailing list
