2011/4/6 Daniel Kinzler <dan...@brightbyte.de>:

> On 06.04.2011 09:15, Alex Brollo wrote:
> > I saved the HTML source of a typical Page: page from it.source, the
> > resulting txt file being ~ 28 kBy; then I saved the "core html" only,
> > i.e. the content of <div class="pagetext">, and this file is 2.1 kBy; so
> > there's a more than tenfold ratio between "container" and "real content".
>
> wow, really? that seems a lot...
>
> > Is there a trick to download the "core html" only?
>
> there are two ways:
>
> a) the old style "render" action, like this:
> <http://en.wikipedia.org/wiki/Foo?action=render>
>
> b) the api "parse" action, like this:
> <http://en.wikipedia.org/w/api.php?action=parse&page=Foo&redirects=1&format=xml>
>
> To learn more about the web API, have a look at
> <http://www.mediawiki.org/wiki/API>

Thanks Daniel, API stuff is a little hard for me: the more I study, the less I edit. :-)
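For what it's worth, a minimal sketch of option (b) in Python, using only the standard library. It only builds the api.php request URL; the `prop=text` parameter (a real `action=parse` option) asks the API to return just the rendered HTML and skip language links and the rest, which should trim the response further. The page title "Foo" is just the placeholder from Daniel's example URLs.

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def parse_url(page):
    """Build an api.php 'parse' request URL that returns only the
    rendered page body as JSON (no skin, no site chrome)."""
    params = {
        "action": "parse",
        "page": page,
        "redirects": 1,
        "prop": "text",    # only the rendered HTML, not langlinks etc.
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

print(parse_url("Foo"))
```

To actually fetch it, pass the URL to `urllib.request.urlopen()` and read the HTML out of `json`'s `["parse"]["text"]["*"]` field of the decoded response.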
Just to try it out, I called the same page: the "render" action gives a file of ~ 3.4 kBy, the "api" action a file of ~ 5.6 kBy. Obviously I'm thinking of bot downloads. Are you suggesting that it would be a good idea to use an *unlogged* bot to avoid page parsing, and to fetch the page code from some cache? I know that a few thousand calls are nothing for the wiki servers, but... I always try to get good performance, even from the most banal template.

Alex
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l