seth wrote:
> Hi!
>
> I wrote a Perl script which works on the HTML content of some
> Wikipedia pages. Some of those pages are >300 kB, and Perl's LWP
> mirror hangs.
>
> Two questions:
>
> 1. Is there a better/faster way to get the HTML content of e.g.
>    http://meta.wikimedia.org/wiki/Spam_blacklist/Log
>    than
>
>        my $ua = LWP::UserAgent->new;
>        $ua->mirror($url, $filename);
>
>    ?
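If mirror() appears to hang on large pages, one low-risk change is to set an explicit timeout and User-Agent string, so a stalled transfer at least becomes a visible error. A minimal sketch along those lines (the 60-second timeout, the agent string, and the file name are illustrative assumptions, not from the thread; the URL is the one seth asked about):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Hypothetical target file; the URL is the one from seth's question.
    my $url      = 'http://meta.wikimedia.org/wiki/Spam_blacklist/Log';
    my $filename = 'spam_blacklist_log.html';

    my $ua = LWP::UserAgent->new(
        timeout => 60,                         # fail after 60 s instead of hanging silently
        agent   => 'seth-blacklist-check/0.1', # hypothetical UA; some servers reject LWP's default
    );

    # mirror() saves to $filename and re-downloads only when the server copy is newer.
    my $res = $ua->mirror($url, $filename);

    die 'mirror failed: ' . $res->status_line . "\n"
        unless $res->is_success || $res->code == 304;   # 304 = local copy already current

Since mirror() sends If-Modified-Since based on the local file's timestamp, repeated runs against an unchanged page cost only a 304 round trip rather than the full >300 kB transfer.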
To get the content of Wikipedia pages you should be using WikiProxy:
http://meta.wikimedia.org/wiki/User:Duesentrieb/WikiProxy

If you still need to fetch the pages yourself, you can launch an
external tool (wget, curl, ...) to download them and then read the
result as a normal file; a sketch of that approach follows below.

> 2. If I have questions about such stuff, am I right here? Otherwise,
>    sorry for bothering you. :-)
>
> Cheers
> seth

Yes, this is a good place :)

Platonides
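A minimal sketch of the external-tool fallback Platonides describes, assuming wget is on the PATH (the timeout value and file name are illustrative assumptions):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical target file; the URL is the one from seth's question.
    my $url      = 'http://meta.wikimedia.org/wiki/Spam_blacklist/Log';
    my $filename = 'spam_blacklist_log.html';

    # Hand the download to wget (the list form of system() bypasses the shell),
    # then read the result like any other local file.
    system('wget', '--quiet', '--timeout=60', '-O', $filename, $url) == 0
        or die "wget exited with status $?\n";

    open my $fh, '<', $filename or die "cannot open $filename: $!\n";
    my $html = do { local $/; <$fh> };   # undef $/ slurps the whole file at once
    close $fh;

    print length($html), " bytes downloaded\n";

The same pattern works with curl (e.g. curl -s -o $filename $url); the point is simply that the external tool handles the retries and timeouts, and the Perl script only ever sees a finished file.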
