Bryan Tong Minh wrote:
> On Sun, Sep 9, 2012 at 8:34 PM, Roberto Flores <[email protected]>
> wrote:
>> Could you please provide HTML dumps (I mean, with the templates
>> pre-processed into HTML, everything else the same as now) every 3 or 4
>> months?
>> 
> How a template is rendered into HTML depends very much on the context
> (i.e. page title, last modification date, etc.) and the arguments that
> it is called with. So an HTML render of all template pages is unlikely
> to be very useful for you.

This reply and others in this thread don't make any sense to me. It seems
like the opening poster is looking at
<http://dumps.wikimedia.org/other/static_html_dumps/> and noticing that the
HTML dumps haven't been updated in years. This is a problem and it's tracked
by <https://bugzilla.wikimedia.org/show_bug.cgi?id=15017>.

For reference, the English Wikipedia has over 4 million content pages. At a
rate of a page per second (which is completely unrealistic, but let's just
assume for a moment), you could get the HTML of every content page on the
English Wikipedia in about 46.3 days(!). The suggestion in any discussion
regarding HTML dumps that people should just use the API themselves
(presumably in combination with an XML dump containing the wikitext of each
page) is just absurd. There's enormous value in the HTML dumps. This subject
came up in December 2011 and from the comments in that thread, it seemed as
though the only reason the HTML dumps haven't been updated is that nobody has
run the relevant script.
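
For anyone who wants to check the arithmetic behind the 46.3-day figure, here is a quick sketch. It assumes exactly 4 million pages and a strictly sequential fetch at a constant rate; the function name and its parameters are made up purely for illustration and don't correspond to any existing tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def crawl_days(pages: int, pages_per_second: float = 1.0) -> float:
    """Days needed to fetch `pages` pages one after another at a fixed rate.

    Hypothetical helper: ignores retries, throttling and downtime, so the
    result is an optimistic lower bound.
    """
    return pages / pages_per_second / SECONDS_PER_DAY


# Assumed round figure; the real page count is somewhat higher.
print(f"{crawl_days(4_000_000):.1f} days")  # prints "46.3 days"
```

Since the real page count is above 4 million and a sustained rate of one page per second is optimistic, the actual figure would only be worse.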

MZMcBride



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l