On Tue, Dec 2, 2008 at 2:01 PM, Magnus Manske
<[EMAIL PROTECTED]>wrote:
> (feel free to bash me if we had this variant already, I couldn't find
> it in the list archives)
>
> Task: On German Wikipedia (yay atomic categories!), find women who
> were born in 1901 and died in 1986.
> Runtime : Toolserver, <2 sec
> Query:
> SELECT * FROM ( SELECT page_title,count(cl_to) AS cnt FROM
> page,categorylinks WHERE page_id=cl_from AND cl_to in ( "Frau" ,
> "Geboren_1901" , "Gestorben_1986" ) GROUP BY cl_from ) AS tbl1 WHERE
> tbl1.cnt = 3 ;
>
> Trying to "poison" the query by also looking in all GFDL images
> ("GFDL-Bild", ~60K entries in category) increases runtime to 3 sec.,
> so not that bad.
>
>
> I've implemented this as a tool now:
> http://toolserver.org/~magnus/category_intersection.php<http://toolserver.org/%7Emagnus/category_intersection.php>
>
> Queries seem to take a little longer there (2-4 sec) compared to the
> command line.
>
> Articles on en.wikipedia with "1905 births" and "1967 deaths" took <0.4
> sec.
> OTOH, looking for images on Commons in "GFDL" and "Buildings in
> Berlin" took ~2min. Might be the giant GFDL category, or the
> toolserver, or both. I'll try to fiddle with it some more utilising
> cat_pages/cat_files.
>
> Magnus
>
Very nice. Danke.
You should mention that categories are entered each at a separate line (or
an example), as it took me some trials to figure it out.
--
--alnokta
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l