2011/4/6 Daniel Kinzler <dan...@brightbyte.de>:

> On 06.04.2011 09:15, Alex Brollo wrote:
> > I saved the HTML source of a typical Page: page from it.source, the
> > resulting txt file being ~ 28 kBy; then I saved the "core html" only,
> > i.e. the content of <div class="pagetext">, and this file is 2.1 kBy; so
> > there's a more than tenfold ratio between "container" and "real content".
>
> wow, really? that seems a lot...
>
> > Is there a trick to download the "core html" only?
>
> there are two ways:
>
> a) the old style "render" action, like this:
> <http://en.wikipedia.org/wiki/Foo?action=render>
>
> b) the api "parse" action, like this:
> <http://en.wikipedia.org/w/api.php?action=parse&page=Foo&redirects=1&format=xml>
>
> To learn more about the web API, have a look at
> <http://www.mediawiki.org/wiki/API>

Thanks Daniel, API stuff is a little hard for me: the more I study, the less I edit. :-)
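For what it's worth, a minimal sketch of option (b) in Python, using only the standard library. It only builds the api.php request URL; the `prop=text` parameter (a real `action=parse` option) asks the API to return just the rendered HTML and skip language links and the rest, which should trim the response further. The page title "Foo" is just the placeholder from Daniel's example URLs.

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def parse_url(page):
    """Build an api.php 'parse' request URL that returns only the
    rendered page body as JSON (no skin, no site chrome)."""
    params = {
        "action": "parse",
        "page": page,
        "redirects": 1,
        "prop": "text",    # only the rendered HTML, not langlinks etc.
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

print(parse_url("Foo"))
```

To actually fetch it, pass the URL to `urllib.request.urlopen()` and read the HTML out of `json`'s `["parse"]["text"]["*"]` field of the decoded response.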
Just to try it out, I called the same page: the "render" action gives a file of ~ 3.4 kBy, the "api" action a file of ~ 5.6 kBy. Obviously I'm thinking of bot downloads. Are you suggesting that it would be a good idea to use an *unlogged* bot to avoid page parsing, and to fetch the page code from some cache? I know that a few thousand calls are nothing for the wiki servers, but... I always try to get good performance, even from the most banal template.

Alex
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l