Daniel Schwen wrote:

> I find it a little frustrating that this wheel gets reinvented 
> so often. My tool was used a couple of times after I posted it, 
> and now has maybe one user per day (from a quick glance at the

Users of the Swedish Wikipedia are increasingly using 
Duesentrieb's CatScan tool.  It is really useful, but could use 
some further improvement, especially in its handling of large 
categories.

> So we have shown multiple times now that cat intersection is 
> technically feasible. What we need now is massive lobbying for 
> atomic categorisation. THAT is the hurdle right now IMO. Not 
> some SQL queries.

After a discussion lasting many years about category:tennis 
players and category:female tennis players in the Swedish 
Wikipedia, in late August 2008 I created category:men and 
category:women, so that profession categories would no longer 
carry the burden of also documenting gender.  The Swedish 
Wikipedia still has a category:Danish tennis players (combining 
profession and nationality), just like the English Wikipedia, but 
gender is now documented separately, as in the German Wikipedia.

All three languages have a category:1942 births.  I think no 
language edition of Wikipedia has a combined category for tennis 
players born in 1942.  So atomic categorization is not an 
all-or-nothing question; to some degree it is already implemented 
everywhere.  To find tennis players born in 1942, even the English 
Wikipedia needs to intersect categories.
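To make the intersection point concrete, here is a minimal sketch of a category intersection done in SQL. It runs against a tiny in-memory SQLite table modeled on the classic MediaWiki `categorylinks` table (`cl_from` = page id, `cl_to` = category name); the helper name `intersect_categories`, the page ids and the memberships are invented for illustration, and a query against a real wiki database would need adapting.

```python
import sqlite3


def intersect_categories(conn, categories):
    """Return sorted page ids that belong to every category in `categories`."""
    wanted = sorted(set(categories))  # ignore duplicate category names
    placeholders = ",".join("?" for _ in wanted)
    query = (
        "SELECT cl_from FROM categorylinks "
        f"WHERE cl_to IN ({placeholders}) "
        "GROUP BY cl_from "
        # a page qualifies only if it matched all of the wanted categories
        "HAVING COUNT(DISTINCT cl_to) = ?"
    )
    rows = conn.execute(query, (*wanted, len(wanted)))
    return sorted(row[0] for row in rows)


# Hypothetical mock data: (page id, category name) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        (1, "Tennis_players"), (1, "1942_births"),
        (2, "Tennis_players"), (2, "1950_births"),
        (3, "Chess_players"), (3, "1942_births"),
        (4, "Tennis_players"), (4, "1942_births"), (4, "Women"),
    ],
)

print(intersect_categories(conn, ["Tennis_players", "1942_births"]))  # [1, 4]
print(intersect_categories(conn, ["Tennis_players", "1942_births", "Women"]))  # [4]
```

The GROUP BY/HAVING form generalizes to any number of atomic categories, which is why separate gender, profession and birth-year categories can be combined on demand instead of pre-combined.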

Radically changing the categorization system is not realistic.  
Introducing men/women in the Swedish Wikipedia was already a huge 
effort, even though it only added categories (without removing 
any), and even though the Swedish Wikipedia is not among the 10 
largest Wikipedias.  Within 3 months (September to November), some 
75,000 articles were categorized: 15,000 women and 60,000 men.  
This 1:4 ratio (1 woman for every 4 men) is considerably more 
balanced than the 1:6 ratio of the German Wikipedia.

What I discovered then was that of these 75,000 biographies, only 
60,000 were categorized by year of birth.  So we now need to add 
birth-year categories to the remaining 15,000 articles before we 
can compile reliable statistics on how the gender imbalance shifts 
over time.  Early estimates indicate a 1:10 gender ratio for people 
born in the 18th century and a 1:3 ratio for those born in the 1970s.
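A minimal sketch of the two steps described above: listing gender-categorized biographies that still lack a birth-year category, and then counting women and men per birth period. It assumes each article is represented as a set of category names, and it uses English-style names such as "Men", "Women" and "1942 births" purely for illustration; the real category names of the wiki would have to be substituted, and the article data is invented.

```python
import re
from collections import defaultdict

# Assumed English-style category names; substitute the wiki's real ones.
BIRTH_CATEGORY = re.compile(r"^(\d{1,4}) births$")
GENDER_CATEGORIES = {"Men", "Women"}


def birth_year(categories):
    """Return the year from a 'NNNN births' category, or None if there is none."""
    for cat in categories:
        match = BIRTH_CATEGORY.match(cat)
        if match:
            return int(match.group(1))
    return None


def missing_birth_year(articles):
    """Titles of gender-categorized biographies that lack a birth-year category."""
    return sorted(
        title
        for title, cats in articles.items()
        if cats & GENDER_CATEGORIES and birth_year(cats) is None
    )


def counts_by_period(articles, width=100):
    """Map period start year -> (women, men); undated articles are skipped."""
    counts = defaultdict(lambda: [0, 0])
    for cats in articles.values():
        year = birth_year(cats)
        if year is None:
            continue
        period = (year // width) * width  # width=100: 1742 -> 1700
        if "Women" in cats:
            counts[period][0] += 1
        elif "Men" in cats:
            counts[period][1] += 1
    return {period: tuple(c) for period, c in sorted(counts.items())}


# Hypothetical sample data: title -> set of category names.
articles = {
    "A": {"Men", "1742 births"},
    "B": {"Women", "1755 births"},
    "C": {"Men", "1761 births"},
    "D": {"Women", "1972 births"},
    "E": {"Men", "1975 births"},
    "F": {"Men"},
    "G": {"Women", "Tennis players"},
}

print(missing_birth_year(articles))  # ['F', 'G']
print(counts_by_period(articles))    # {1700: (1, 2), 1900: (1, 1)}
```

Passing `width=10` instead groups by decade, which matches the "born in the 1970s" style of comparison.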

So the larger imbalance (1:6) in the German Wikipedia might be 
explained by its having a larger share of 18th-century biographies.
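The composition argument can be illustrated with a small worked example. The sketch below assumes only two birth cohorts with the per-cohort ratios from the early estimates (1:10 and 1:3); the cohort sizes are entirely hypothetical. The overall ratio is total men divided by total women, so it drifts toward 1:10 as the older cohort grows, even when the per-cohort ratios stay the same.

```python
def men_per_woman(cohorts):
    """Overall men-per-woman ratio for a list of (women, men) counts per cohort."""
    women = sum(w for w, _ in cohorts)
    men = sum(m for _, m in cohorts)
    return men / women


# Hypothetical counts. In both wikis the 18th-century cohort is 1:10
# and the 1970s cohort is 1:3; only the size of the older cohort differs.
wiki_x = [(100, 1000), (1000, 3000)]
wiki_y = [(1000, 10000), (1000, 3000)]

print(round(men_per_woman(wiki_x), 2))  # 3.64
print(round(men_per_woman(wiki_y), 2))  # 6.5
```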



-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
