Hi, I'm trying to create a little program which parses e-mails from Google News Alerts, downloads the referenced stories, and creates an index of the stories to make a sort of "virtual scrapbook" of news stories on a particular subject.
I'm using wget --page-requisites (and other options) to download the pages. The problem is that wget doesn't always save the top-level HTML file under a predictable filename when it gets a 302 response from the server (and possibly in other situations as well; I haven't checked). For example:

    wget --page-requisites http://www.theherald.co.uk/6123.shtml

downloads the page to www.theherald.co.uk/news/6123.html rather than www.theherald.co.uk/6123.shtml. This makes it hard to create the index file: because I can't predict where the top-level HTML file has been saved, I can't automatically generate the list of URLs I want the index to link to.

Is there a neat way to fix this? --output-document has potential, but I'd only want to force the filename of the top-level HTML file; I don't want all the page requisites dumped into a single file. Alternatively, I could download once with --page-requisites and then again with --output-document, but then I can't use --convert-links to make the page suitable for local viewing.

Any suggestions?

Thanks in advance,
Jim

--
Jim Farrand
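
P.S. In case it helps to see it concretely, here is roughly what the two-pass workaround I mention above would look like. This is only a sketch: the directory layout and the index.html filename are placeholders of my own choosing, and the drawback stands that the copy fetched in the second pass has no links converted for local viewing.

    URL=http://www.theherald.co.uk/6123.shtml
    DIR=stories/6123

    # Pass 1: the page plus its requisites, with links rewritten for local viewing
    wget --page-requisites --convert-links --directory-prefix="$DIR" "$URL"

    # Pass 2: fetch the top-level HTML again, into a filename I can predict
    wget --output-document="$DIR/index.html" "$URL"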