RE: simple wget question
This is something that is not supported by the HTTP protocol. If you access the site via ftp://..., then you can use wildcards like *.pdf.

-----Original Message-----
From: R Kimber [mailto:[EMAIL PROTECTED]]
Sent: Saturday, May 12, 2007 06:43
To: wget@sunsite.dk
Subject: Re: simple wget question

On Thu, 10 May 2007 16:04:41 -0500 (CDT), Steven M. Schweda wrote:

> From: R Kimber
> > Yes there's a web page. I usually know what I want.
>
> There's a difference between knowing what you want and being able to
> describe what you want so that it makes sense to someone who does not
> know what you want.

Well, I was wondering if wget had a way of allowing me to specify it.

> > But won't a recursive get get more than just those files? Indeed,
> > won't it get everything at that level? The accept/reject options
> > seem to assume you know what's there and can list them to exclude
> > them. I only know what I want. [...]
>
> Are you trying to say that you have a list of URLs, and would like to
> use one wget command for all instead of one wget command per URL?
> Around here:
>
>    ALP $ wget -h
>    GNU Wget 1.10.2c, a non-interactive network retriever.
>    Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
>    [...]
>
> That "[URL]..." was supposed to suggest that you can supply more than
> one URL on the command line. Subject to possible command-line length
> limitations, this should allow any number of URLs to be specified at
> once.
>
> There's also -i (--input-file=FILE). No bets, but it looks as if you
> can specify "-" for FILE, and it'll read the URLs from stdin, so you
> could pipe them in from anything.

Thanks, but my point is that I don't know the full URL, just the
pattern. What I'm trying to download is what I might express as:

  http://www.stirling.gov.uk/*.pdf

but I guess that's not possible. I just wondered if it was possible for
wget to filter out everything except *.pdf - i.e. wget would look at a
site, or a directory on a site, and just accept those files that match
a pattern.

- Richard
--
Richard Kimber
http://www.psr.keele.ac.uk/
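To make the "more than one URL on the command line" point concrete, a minimal sketch using the elections07 file names quoted earlier in the thread (these are just the examples from the thread, not a complete list):

    wget http://www.stirling.gov.uk/elections07abcd.pdf \
         http://www.stirling.gov.uk/elections07efg.pdf \
         http://www.stirling.gov.uk/elections07gfead.pdf

A single wget process fetches each URL in turn, subject only to the shell's command-line length limits.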
RE: simple wget question
Sorry, I didn't see that Steven has already answered the question.

-----Original Message-----
From: Steven M. Schweda [mailto:[EMAIL PROTECTED]]
Sent: Saturday, May 12, 2007 10:05
To: WGET@sunsite.dk
Cc: [EMAIL PROTECTED]
Subject: Re: simple wget question

From: R Kimber

> What I'm trying to download is what I might express as:
>   http://www.stirling.gov.uk/*.pdf

   At last.

> but I guess that's not possible.

   In general, it's not. FTP servers often support wildcards. HTTP
servers do not. Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked
(so long ago) "If there's a Web page which has links to all of them,
[...]".

> I just wondered if it was possible for wget to filter out everything
> except *.pdf - i.e. wget would look at a site, or a directory on a
> site, and just accept those files that match a pattern.

   Wget has options for this, as suggested before ("wget -h"):

      [...]
      Recursive accept/reject:
        -A,  --accept=LIST    comma-separated list of accepted extensions.
        -R,  --reject=LIST    comma-separated list of rejected extensions.
      [...]

but, like many of us, it's not psychic. It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down. If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.

Steven M. Schweda               [EMAIL PROTECTED]
382 South Warwick Street        (+1) 651-699-9818
Saint Paul  MN  55105-2547
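A minimal sketch of the recursive accept/reject approach Steven describes, assuming (hypothetically) that a page such as http://www.stirling.gov.uk/elections.html carries links to the PDFs; substitute whatever page actually holds the links:

    wget -r -l1 -nd -A pdf http://www.stirling.gov.uk/elections.html

-r turns on recursion, -l1 limits it to links one level below the starting page, -nd stops wget from recreating the site's directory tree locally, and -A pdf keeps only files ending in .pdf. The starting HTML page is still retrieved so its links can be followed; with -A in effect wget normally discards it after parsing.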
Re: simple wget question
On Thu, 10 May 2007 16:04:41 -0500 (CDT), Steven M. Schweda wrote:

> From: R Kimber
> > Yes there's a web page. I usually know what I want.
>
> There's a difference between knowing what you want and being able to
> describe what you want so that it makes sense to someone who does not
> know what you want.

Well, I was wondering if wget had a way of allowing me to specify it.

> > But won't a recursive get get more than just those files? Indeed,
> > won't it get everything at that level? The accept/reject options
> > seem to assume you know what's there and can list them to exclude
> > them. I only know what I want. [...]
>
> Are you trying to say that you have a list of URLs, and would like to
> use one wget command for all instead of one wget command per URL?
> Around here:
>
>    ALP $ wget -h
>    GNU Wget 1.10.2c, a non-interactive network retriever.
>    Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
>    [...]
>
> That "[URL]..." was supposed to suggest that you can supply more than
> one URL on the command line. Subject to possible command-line length
> limitations, this should allow any number of URLs to be specified at
> once.
>
> There's also -i (--input-file=FILE). No bets, but it looks as if you
> can specify "-" for FILE, and it'll read the URLs from stdin, so you
> could pipe them in from anything.

Thanks, but my point is that I don't know the full URL, just the
pattern. What I'm trying to download is what I might express as:

  http://www.stirling.gov.uk/*.pdf

but I guess that's not possible. I just wondered if it was possible for
wget to filter out everything except *.pdf - i.e. wget would look at a
site, or a directory on a site, and just accept those files that match
a pattern.

- Richard
--
Richard Kimber
http://www.psr.keele.ac.uk/
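As a rough illustration of the --input-file idea mentioned above: the URLs can sit in a file, or be generated by another command and piped straight in (urls.txt and the grep pattern below are only placeholders):

    wget -i urls.txt
    grep -o 'http://[^"]*\.pdf' index.html | wget -i -

In the second form, "-i -" tells wget to read its URL list from standard input, one URL per line, so anything that can print URLs can feed it.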
Re: simple wget question
From: R Kimber

> What I'm trying to download is what I might express as:
>   http://www.stirling.gov.uk/*.pdf

   At last.

> but I guess that's not possible.

   In general, it's not. FTP servers often support wildcards. HTTP
servers do not. Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked
(so long ago) "If there's a Web page which has links to all of them,
[...]".

> I just wondered if it was possible for wget to filter out everything
> except *.pdf - i.e. wget would look at a site, or a directory on a
> site, and just accept those files that match a pattern.

   Wget has options for this, as suggested before ("wget -h"):

      [...]
      Recursive accept/reject:
        -A,  --accept=LIST    comma-separated list of accepted extensions.
        -R,  --reject=LIST    comma-separated list of rejected extensions.
      [...]

but, like many of us, it's not psychic. It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down. If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.

Steven M. Schweda               [EMAIL PROTECTED]
382 South Warwick Street        (+1) 651-699-9818
Saint Paul  MN  55105-2547
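Since the accept list may contain wildcard patterns as well as bare suffixes, the crawl can be narrowed to just the files that match. A hedged sketch, again assuming a hypothetical starting page that links to the documents:

    wget -r -l1 -np -nd -A 'elections07*.pdf' http://www.stirling.gov.uk/somepage.html

-np (--no-parent) keeps the recursion from climbing above the starting directory, and quoting the -A pattern keeps the shell from expanding the * before wget sees it.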
Re: simple wget question
On Sun, 6 May 2007 21:44:16 -0500 (CDT), Steven M. Schweda wrote:

> > From: R Kimber
> > If I have a series of files such as
> >   http://www.stirling.gov.uk/elections07abcd.pdf
> >   http://www.stirling.gov.uk/elections07efg.pdf
> >   http://www.stirling.gov.uk/elections07gfead.pdf
> > etc., is there a single wget command that would download them all,
> > or would I need to do each one separately?
>
> It depends. As usual, it might help to know your wget version and
> operating system, but in this case, a more immediate mystery would be
> what you mean by "them all", and how one would know which such files
> exist.

GNU Wget 1.10.2, Ubuntu 7.04.

> If there's a Web page which has links to all of them, then you could
> use a recursive download starting with that page. Look through the
> output from "wget -h", paying particular attention to the sections
> "Recursive download" and "Recursive accept/reject". If there's no
> such Web page, then how would wget be able to divine the existence of
> these files?

Yes there's a web page. I usually know what I want.

But won't a recursive get get more than just those files? Indeed,
won't it get everything at that level? The accept/reject options seem
to assume you know what's there and can list them to exclude them. I
only know what I want. Not necessarily what I don't want.

I did look at the man page, and came to the tentative conclusion that
there wasn't a way (or at least an efficient way) of doing it, which is
why I asked the question.

- Richard
--
Richard Kimber
http://www.psr.keele.ac.uk/
Re: simple wget question
From: R Kimber

> If I have a series of files such as
>   http://www.stirling.gov.uk/elections07abcd.pdf
>   http://www.stirling.gov.uk/elections07efg.pdf
>   http://www.stirling.gov.uk/elections07gfead.pdf
> etc., is there a single wget command that would download them all, or
> would I need to do each one separately?

   It depends. As usual, it might help to know your wget version and
operating system, but in this case, a more immediate mystery would be
what you mean by "them all", and how one would know which such files
exist.

   If there's a Web page which has links to all of them, then you could
use a recursive download starting with that page. Look through the
output from "wget -h", paying particular attention to the sections
"Recursive download" and "Recursive accept/reject". If there's no such
Web page, then how would wget be able to divine the existence of these
files?

   If you're running something older than version 1.10.2, you might try
getting the current released version first.

Steven M. Schweda               [EMAIL PROTECTED]
382 South Warwick Street        (+1) 651-699-9818
Saint Paul  MN  55105-2547