RE: simple wget question

2007-05-13  Willener, Pat
Wildcards are not supported by the HTTP protocol.
If you access the site via ftp://..., then you can use wildcards like *.pdf.
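For example, something like this should work (the host and path are made up here; quoting the URL keeps the shell from expanding the * before wget sees it):

   wget "ftp://ftp.example.com/pub/reports/*.pdf"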

-----Original Message-----
From: R Kimber [mailto:[EMAIL PROTECTED] 
Sent: Saturday, May 12, 2007 06:43
To: wget@sunsite.dk
Subject: Re: simple wget question

On Thu, 10 May 2007 16:04:41 -0500 (CDT)
Steven M. Schweda wrote:

 From: R Kimber
 
  Yes there's a web page.  I usually know what I want.
 
There's a difference between knowing what you want and being able
 to describe what you want so that it makes sense to someone who does
 not know what you want.

Well I was wondering if wget had a way of allowing me to specify it.

  But won't a recursive get get more than just those files? Indeed,
  won't it get everything at that level? The accept/reject options
  seem to assume you know what's there and can list them to exclude
  them.  I only know what I want. [...]
 
Are you trying to say that you have a list of URLs, and would like
 to use one wget command for all instead of one wget command per URL? 
 Around here:
 
 ALP $ wget -h
 GNU Wget 1.10.2c, a non-interactive network retriever.
 Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
 [...]
 
 That [URL]... was supposed to suggest that you can supply more than
 one URL on the command line.  Subject to possible command-line length
 limitations, this should allow any number of URLs to be specified at
 once.
 
There's also -i (--input-file=FILE).  No bets, but it looks as
 if you can specify "-" for FILE, and it'll read the URLs from stdin,
 so you could pipe them in from anything.
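For instance, roughly (the list file name below is made up; "-" tells wget to read URLs from standard input):

   wget http://www.stirling.gov.uk/elections07abcd.pdf http://www.stirling.gov.uk/elections07efg.pdf
   cat url-list.txt | wget -i -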

Thanks, but my point is I don't know the full URL, just the pattern.

What I'm trying to download is what I might express as:

http://www.stirling.gov.uk/*.pdf

but I guess that's not possible.  I just wondered if it was possible
for wget to filter out everything except *.pdf - i.e. wget would look
at a site, or a directory on a site, and just accept those files that
match a pattern.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


RE: simple wget question

2007-05-13  Willener, Pat
Sorry, I didn't see that Steven had already answered the question. 

-----Original Message-----
From: Steven M. Schweda [mailto:[EMAIL PROTECTED] 
Sent: Saturday, May 12, 2007 10:05
To: WGET@sunsite.dk
Cc: [EMAIL PROTECTED]
Subject: Re: simple wget question

From: R Kimber

 What I'm trying to download is what I might express as:
 
 http://www.stirling.gov.uk/*.pdf

   At last.

 but I guess that's not possible.

   In general, it's not.  FTP servers often support wildcards.  HTTP
servers do not.  Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked (so
long ago) "If there's a Web page which has links to all of them, [...]".

   I just wondered if it was possible
 for wget to filter out everything except *.pdf - i.e. wget would look
 at a site, or a directory on a site, and just accept those files that
 match a pattern.

   Wget has options for this, as suggested before (wget -h):

[...]
Recursive accept/reject:
  -A,  --accept=LIST   comma-separated list of accepted extensions.
  -R,  --reject=LIST   comma-separated list of rejected extensions.
[...]

but, like many of us, it's not psychic.  It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down.  If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: simple wget question

2007-05-11  R Kimber
On Thu, 10 May 2007 16:04:41 -0500 (CDT)
Steven M. Schweda wrote:

 From: R Kimber
 
  Yes there's a web page.  I usually know what I want.
 
There's a difference between knowing what you want and being able
 to describe what you want so that it makes sense to someone who does
 not know what you want.

Well I was wondering if wget had a way of allowing me to specify it.

  But won't a recursive get get more than just those files? Indeed,
  won't it get everything at that level? The accept/reject options
  seem to assume you know what's there and can list them to exclude
  them.  I only know what I want. [...]
 
Are you trying to say that you have a list of URLs, and would like
 to use one wget command for all instead of one wget command per URL? 
 Around here:
 
 ALP $ wget -h
 GNU Wget 1.10.2c, a non-interactive network retriever.
 Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
 [...]
 
 That [URL]... was supposed to suggest that you can supply more than
 one URL on the command line.  Subject to possible command-line length
 limitations, this should allow any number of URLs to be specified at
 once.
 
There's also -i (--input-file=FILE).  No bets, but it looks as
 if you can specify "-" for FILE, and it'll read the URLs from stdin,
 so you could pipe them in from anything.

Thanks, but my point is I don't know the full URL, just the pattern.

What I'm trying to download is what I might express as:

http://www.stirling.gov.uk/*.pdf

but I guess that's not possible.  I just wondered if it was possible
for wget to filter out everything except *.pdf - i.e. wget would look
at a site, or a directory on a site, and just accept those files that
match a pattern.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


Re: simple wget question

2007-05-11  Steven M. Schweda
From: R Kimber

 What I'm trying to download is what I might express as:
 
 http://www.stirling.gov.uk/*.pdf

   At last.

 but I guess that's not possible.

   In general, it's not.  FTP servers often support wildcards.  HTTP
servers do not.  Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked (so
long ago) "If there's a Web page which has links to all of them, [...]".

   I just wondered if it was possible
 for wget to filter out everything except *.pdf - i.e. wget would look
 at a site, or a directory on a site, and just accept those files that
 match a pattern.

   Wget has options for this, as suggested before (wget -h):

[...]
Recursive accept/reject:
  -A,  --accept=LIST   comma-separated list of accepted extensions.
  -R,  --reject=LIST   comma-separated list of rejected extensions.
[...]

but, like many of us, it's not psychic.  It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down.  If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.
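
Something along these lines might do it, assuming the PDFs are linked
directly from the starting page (the depth here is a guess; adjust -l as
needed):

   wget -r -l 1 -A pdf http://www.stirling.gov.uk/

With -A pdf, wget still downloads the HTML pages it needs in order to find
the links, but anything that doesn't end in .pdf is deleted after it has
been parsed.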



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: simple wget question

2007-05-10  R Kimber
On Sun, 6 May 2007 21:44:16 -0500 (CDT)
Steven M. Schweda wrote:

 From: R Kimber
 
  If I have a series of files such as
  
  http://www.stirling.gov.uk/elections07abcd.pdf
  http://www.stirling.gov.uk/elections07efg.pdf
  http://www.stirling.gov.uk/elections07gfead.pdf
   
  etc
  
  is there a single wget command that would download them all, or
  would I need to do each one separately?
 
It depends.  As usual, it might help to know your wget version and
 operating system, but in this case, a more immediate mystery would be
 what you mean by "them all", and how one would know which such files
 exist.

GNU Wget 1.10.2, Ubuntu 7.04

If there's a Web page which has links to all of them, then you
 could use a recursive download starting with that page.  Look through
 the output from "wget -h", paying particular attention to the sections
 "Recursive download" and "Recursive accept/reject".  If there's no
 such Web page, then how would wget be able to divine the existence of
 these files?

Yes there's a web page.  I usually know what I want.

But won't a recursive get get more than just those files? Indeed, won't
it get everything at that level? The accept/reject options seem to
assume you know what's there and can list them to exclude them.  I only
know what I want.  Not necessarily what I don't want. I did look at the
man page, and came to the tentative conclusion that there wasn't a
way (or at least an efficient way) of doing it, which is why I asked
the question.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


Re: simple wget question

2007-05-06  Steven M. Schweda
From: R Kimber

 If I have a series of files such as
 
 http://www.stirling.gov.uk/elections07abcd.pdf
 http://www.stirling.gov.uk/elections07efg.pdf
 http://www.stirling.gov.uk/elections07gfead.pdf
  
 etc
 
 is there a single wget command that would download them all, or would I
 need to do each one separately?

   It depends.  As usual, it might help to know your wget version and
operating system, but in this case, a more immediate mystery would be
what you mean by "them all", and how one would know which such files
exist.

   If there's a Web page which has links to all of them, then you could
use a recursive download starting with that page.  Look through the
output from "wget -h", paying particular attention to the sections
"Recursive download" and "Recursive accept/reject".  If there's no such
Web page, then how would wget be able to divine the existence of these
files?

   If you're running something older than version 1.10.2, you might try
getting the current released version first.
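
As a rough sketch of that approach (the index page path below is invented;
--no-parent keeps the crawl from climbing above it):

   wget --version
   wget -r -l 1 --no-parent http://www.stirling.gov.uk/elections/index.htm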



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547