On 13.10.01, 19:36:23, Tim Kynerd wrote:
> Hi,
>
> I have been running a plucker-build script that plucks news from a couple of
> sites. However, this script generally takes 35-40 minutes (!) to run, and
> since I live in Sweden and pay for even local phone calls by the minute, I'd
> like to shorten this.
>
> Just for the heck of it, I've just installed wwwoffle, which caches HTML
> documents from the Web, and played with it a little bit. I can easily get
> it to download and cache the documents I'd like to pluck (which should take
> less time than downloading *and* parsing them, right?) -- but it stores them
> in a hashed form in special directories, and they're only accessible through
> a proxy server on my local machine.
>
> Is there any way to make plucker use this proxy server? I checked the docs
> and tried setting up a .pluckerrc file with the "http_proxy=" option in the
> [DEFAULT] section, but when I try to pluck a document that's in the wwwoffle
> cache, the system still brings up the Internet connection, indicating that
> plucker isn't trying to use the proxy server to access that document.

It seems that (on Unix, at least) the Plucker Python scripts honour the
"http_proxy" and "ftp_proxy" environment variables. Just set them to
"http://name.of.your.proxy:port/" (in my case it's "http://wwwproxy:3128/",
using a private squid proxy on the local network). Set the environment
variables in the shell from which you start the Plucker scripts. On my
system it definitely queries the proxy, as I can see from the logs.

Also note that the http_proxy variable's content needs to be in the form of
a URL, preferably with a port number.

>
> Or can anyone suggest some other intelligent way to do what I need to do?
> Any help is welcome.
>
> Regards,
> Tim Kynerd
>
> Sunrise in Stockholm today: 7:20
> Sunset in Stockholm today: 17:47
> My rail transit photos at http://www.kynerd.nu
>

-- 
Bernd Sieker
NetBSD - the cathedral versus the bizarre. -- Julian Assange
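As a rough illustration of the environment-variable approach (a sketch only, not Plucker's own code, which isn't shown here): Python 3's standard `urllib.request` module reads `http_proxy`/`ftp_proxy` from the environment, which is the same mechanism the Plucker scripts appear to rely on. The proxy address `http://wwwproxy:3128/` is just the example given above, and the helper names (`configure_proxy`, `looks_like_proxy_url`, etc.) are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlsplit

# Example proxy from the message above; substitute your own wwwoffle/squid address.
PROXY_URL = "http://wwwproxy:3128/"


def looks_like_proxy_url(value):
    """Return True if value is a URL with a scheme, a host and an explicit port."""
    parts = urlsplit(value)
    return bool(parts.scheme and parts.hostname and parts.port)


def configure_proxy(proxy_url):
    """Set the lowercase proxy variables that urllib consults on Unix."""
    if not looks_like_proxy_url(proxy_url):
        raise ValueError("proxy must look like http://host:port/, got %r" % proxy_url)
    os.environ["http_proxy"] = proxy_url
    os.environ["ftp_proxy"] = proxy_url


def proxies_from_environment():
    """Return the scheme -> proxy mapping urllib derives from *_proxy variables."""
    return urllib.request.getproxies_environment()


def build_proxied_opener():
    """Build an opener whose ProxyHandler routes requests via the configured proxy."""
    handler = urllib.request.ProxyHandler(proxies_from_environment())
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    configure_proxy(PROXY_URL)
    print(proxies_from_environment())
```

From a Unix shell the equivalent is simply exporting `http_proxy` (and `ftp_proxy`) before starting the Plucker build script. Note that a bare `wwwproxy:3128` without the `http://` scheme is rejected by `looks_like_proxy_url`, matching the remark that the value must be a URL.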

