A month ago, the PageImages extension[1], which is intended to
automatically associate images with articles, was black-deployed. It
populates its data when LinksUpdate is run, i.e. when a page or a
template it transcludes is edited or purged. Since then, most pages
have been re-parsed; however, slightly less than a million English WP
articles remain:

select count(*), avg(page_len) from page where page_namespace=0 and 
page_is_redirect=0 and page_touched < '20121229000000';
+----------+---------------+
| count(*) | avg(page_len) |
+----------+---------------+
|   977568 |     3172.0948 |
+----------+---------------+
1 row in set (5 min 59.55 sec)

Waiting for these pages to be updated naturally could take forever:

select min(page_touched) from page where page_namespace=0 and 
page_is_redirect=0;
+-------------------+
| min(page_touched) |
+-------------------+
| 20090714142954    |
+-------------------+
1 row in set (2 min 15.13 sec)

That page was [2] before I purged it: an obscure topic, no templates.

Thus, I would like to populate this data with a script[3]. To allay
concerns, let me remark that these pages have almost no templates and
are significantly smaller than average (3172 bytes vs. 5673), so most
of them should be fast to parse.
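
For comparison, the average over all articles could be obtained with
a query of the same shape as the first one, minus the page_touched
cutoff. This is only a sketch: the 5673 figure is quoted as-is and was
not necessarily produced this way.

```sql
-- Average length of all non-redirect pages in the main namespace,
-- using the same filters as the count query above but without the
-- page_touched cutoff.
select count(*), avg(page_len) from page where page_namespace=0 and
page_is_redirect=0;
```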

Is running it a good idea?

-----
[1] https://www.mediawiki.org/wiki/Extension:PageImages
[2] https://en.wikipedia.org/wiki/City_of_Melbourne_election,_2008
[3] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/PageImages.git;a=blob;f=initImageData.php;hb=HEAD

-- 
Best regards,
  Max Semenik ([[User:MaxSem]])


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
