https://bugzilla.wikimedia.org/show_bug.cgi?id=6220

--- Comment #22 from Bawolff <[email protected]> 2012-01-01 10:13:59 UTC ---
So I have some ideas how to fix this.

Basically, GlobalUsage stores what images that don't exist locally are in use.
So I was thinking a query something like:

select '6' as namespace, gil_to as title, count(*) as value from
globalimagelinks LEFT JOIN image on gil_to = img_name where img_name is null
and gil_wiki = 'jawikinews' group by gil_to order by count(*) DESC;

(Using jawikinews as an example, since it's a smallish size wiki (5480 entries
in global usage) thus I can easily test these queries on toolserver). 6 ==
NS_FILE.

This seemed to work, however with one problem. Image redirects were still
included. I'm not sure if that's a globalusage issue (should the links be to
the target image) or if its intentional behaviour. Filtering those out in the
sql gives:

select '6' as namespace, gil_to as title, count(*) as value from
globalimagelinks LEFT JOIN image on gil_to = img_name LEFT JOIN page on (gil_to
= page_title and page_namespace=6) where img_name is null and gil_wiki =
'jawikinews' and (page_is_redirect is null or page_is_redirect = 0) group by
gil_to order by count(*) DESC;

However, that seems to slow down the query by quite a bit (10 seconds went to 2
minutes). OTOH, the query is slow regardless, and its going to be cached (I'm
not sure how slow is too slow). This still would mess up on some edge cases
though, such as if the page is a redirect to a non-existant file (or even to
something not in NS_FILE). [And of course it doesn't address the more general
problem of files from Foreign repos in general. I'm not sure if the general
problem is addressable without a schema change]


So possible way forward - Add to GlobalUsage extension a new special page that
overrides the built in special:wantedfiles with the new query. Even with the
first query i mentioned, it would cut down on false positives significantly.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

Reply via email to