Re: simple wget question
On Thu, 10 May 2007 16:04:41 -0500 (CDT) Steven M. Schweda wrote:

> > From: R Kimber
> > Yes there's a web page. I usually know what I want.
>
>    There's a difference between knowing what you want and being able
> to describe what you want so that it makes sense to someone who does
> not know what you want.
>
> > Well I was wondering if wget had a way of allowing me to specify it.
> > But won't a recursive get get more than just those files? Indeed,
> > won't it get everything at that level? The accept/reject options
> > seem to assume you know what's there and can list them to exclude
> > them. I only know what I want.
> [...]
>    Are you trying to say that you have a list of URLs, and would like
> to use one wget command for all instead of one wget command per URL?
> Around here:
>
>   ALP $ wget -h
>   GNU Wget 1.10.2c, a non-interactive network retriever.
>   Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
>   [...]
>
>    That [URL]... was supposed to suggest that you can supply more than
> one URL on the command line. Subject to possible command-line length
> limitations, this should allow any number of URLs to be specified at
> once.
>
>    There's also -i (--input-file=FILE). No bets, but it looks as if
> you can specify - for FILE, and it'll read the URLs from stdin, so you
> could pipe them in from anything.

Thanks, but my point is that I don't know the full URLs, just the
pattern. What I'm trying to download is what I might express as:

  http://www.stirling.gov.uk/*.pdf

but I guess that's not possible. I just wondered whether wget could
filter out everything except *.pdf - i.e., wget would look at a site,
or a directory on a site, and accept only those files that match a
pattern.

- Richard
--
Richard Kimber
http://www.psr.keele.ac.uk/
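Schweda's two suggestions - several URLs on one command line, or a URL list fed to -i - can be sketched as follows. This is only a sketch with made-up file names; the actual wget invocations are shown as comments since the URLs are hypothetical.

```shell
# Build a list of (hypothetical) PDF URLs in a file.
printf '%s\n' \
  'http://www.stirling.gov.uk/report1.pdf' \
  'http://www.stirling.gov.uk/report2.pdf' > urls.txt
cat urls.txt

# Any of these would then fetch them in one go:
#   wget http://www.stirling.gov.uk/report1.pdf http://www.stirling.gov.uk/report2.pdf
#   wget -i urls.txt
#   cat urls.txt | wget -i -     # -i - reads the URL list from stdin
```

This covers the case where the URLs are known individually; it does not help with a wildcard pattern, which is the point Kimber goes on to make.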
--page-requisites and --post-data options
Hi!

First of all, thank you for this beautiful tool that Wget is.

I simply want to point out that if, in a script, you want to perform an
action on a web server (by POSTing data) and then completely mirror the
resulting page for offline viewing, you would use the --page-requisites
and --post-data options together. The problem is that Wget then
continues to POST the data for all the images and CSS required to
'completely' download the page, and the server answers "405: Method Not
Allowed" - which is normal.

Is there a way to do this with wget options?

Thanks in advance!

Laurent STELLA
Re: --page-requisites and --post-data options
From aulaulau:

> [...] you will use --page-requisites and --post-data options together.

   Probably not something anyone considered.

> Is there a way to do it with wget options ?

   Perhaps use --post-data to get the primary page, and then use
-i primary_page (perhaps with -F, perhaps with --page-requisites) to
get the other pieces?

Steven M. Schweda               [EMAIL PROTECTED]
382 South Warwick Street        (+1) 651-699-9818
Saint Paul  MN  55105-2547
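The two-step workaround suggested above might look like this. It is only a sketch: the server URL, form fields, and file names are all hypothetical, so the commands are built as strings and echoed rather than executed.

```shell
# Step 1: POST the form data once and save the resulting page.
# (URL and form fields are made up for illustration.)
step1="wget --post-data='user=me&action=view' -O primary_page.html http://example.com/form.cgi"

# Step 2: re-read the saved file as HTML (-F), fetch its page
# requisites (-p) with ordinary GETs, and resolve its relative
# links against the original location (--base).
step2="wget -F -i primary_page.html -p --base=http://example.com/"

# Shown rather than run, since the server is hypothetical:
echo "$step1"
echo "$step2"
```

The key point is that only the first command POSTs; the second re-parses the saved page and retrieves the images and CSS with GET, which is what the server expects.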
Re: simple wget question
From: R Kimber

> What I'm trying to download is what I might express as:
>   http://www.stirling.gov.uk/*.pdf

   At last.

> but I guess that's not possible.

   In general, it's not. FTP servers often support wildcards; HTTP
servers do not. Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked
(so long ago): "If there's a Web page which has links to all of them,
[...]".

> I just wondered if it was possible for wget to filter out everything
> except *.pdf - i.e. wget would look at a site, or a directory on a
> site, and just accept those files that match a pattern.

   Wget has options for this, as suggested before (wget -h):

  [...]
  Recursive accept/reject:
    -A,  --accept=LIST      comma-separated list of accepted extensions.
    -R,  --reject=LIST      comma-separated list of rejected extensions.
  [...]

but, like many of us, it's not psychic. It needs explicit URLs, or else
instructions (-r) to follow the links which it sees in the pages it
sucks down. If you don't have a list of the URLs you want, and you
don't have URLs for one or more Web pages which contain links to the
items you want, then you're probably out of luck.

Steven M. Schweda               [EMAIL PROTECTED]
382 South Warwick Street        (+1) 651-699-9818
Saint Paul  MN  55105-2547
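Putting the -r and -A pieces together: if some index page does link to the PDFs, a crawl restricted to one directory and one pattern can be sketched as below. The /publications/ path is hypothetical, so the command is echoed rather than run.

```shell
# -r      follow links recursively
# -l 1    ...but only one level deep from the starting page
# -np     never ascend into the parent directory
# -A pdf  keep only files ending in .pdf (HTML pages are still fetched
#         so their links can be parsed, then deleted afterwards)
cmd="wget -r -l 1 -np -A pdf http://www.stirling.gov.uk/publications/"
echo "$cmd"
```

Note that -A does not make wget psychic either: it only filters what the crawl discovers, so a starting page with links to the PDFs is still required.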