Tim Starling wrote:
> On 04/12/11 12:32, MZMcBride wrote:
>> This may be a stupid question as I don't understand the mechanics
>> particularly well, but... as far as I understand it, there's a Squid cache
>> layer that contains the HTML output of parsed and rendered wikitext pages.
>> This stored HTML is what most anonymous viewers receive when they access the
>> site. Why can't that be dumped into an output file rather than running
>> expensive and time-consuming HTML dump generation scripts?
>> 
>> In other words, it's not as though the HTML doesn't exist already. It's
>> served millions and millions of times each day. Why is it so painful to make
>> it available as a dump?
> 
> Most of the code would be the same; it's just a bit more flexible to
> do the parsing in the extension: it makes it easier to change some
> details of the generated HTML, and lets you avoid polluting the caches
> with rarely-viewed pages. It's not especially painful either way.
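(For illustration only, not the maintenance script under discussion: the point that "the HTML exists already" can be seen from MediaWiki's public action=parse API, which returns the same rendered HTML that gets cached for anonymous readers. A minimal sketch of building the request URL, with an illustrative wiki endpoint and page title:)

```python
from urllib.parse import urlencode

def parse_api_url(api_base, title):
    """Build the action=parse URL that returns the rendered HTML for a page.

    api_base and title are caller-supplied; the defaults shown in the
    example below are illustrative, not prescriptive.
    """
    query = urlencode({
        "action": "parse",       # ask MediaWiki to return parser output
        "page": title,           # page to render
        "prop": "text",          # only the rendered HTML, no metadata
        "format": "json",
        "formatversion": "2",
    })
    return f"{api_base}?{query}"

# Example: fetch this URL to get the rendered HTML for [[MediaWiki]]
# url = parse_api_url("https://en.wikipedia.org/w/api.php", "MediaWiki")
```

(A real dump job would of course iterate over all page titles and rate-limit itself rather than hit the API one page at a time.)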

So the reason that there hasn't been an HTML dump of Wikimedia wikis in
years is that no Wikimedia sysadmin can be bothered to run a maintenance
script?

MZMcBride



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
