Re: simple wget question

2007-05-11 Thread R Kimber
On Thu, 10 May 2007 16:04:41 -0500 (CDT)
Steven M. Schweda wrote:

 From: R Kimber
 
  Yes there's a web page.  I usually know what I want.
 
There's a difference between knowing what you want and being able
 to describe what you want so that it makes sense to someone who does
 not know what you want.

Well I was wondering if wget had a way of allowing me to specify it.

  But won't a recursive get get more than just those files? Indeed,
  won't it get everything at that level? The accept/reject options
  seem to assume you know what's there and can list them to exclude
  them.  I only know what I want. [...]
 
Are you trying to say that you have a list of URLs, and would like
 to use one wget command for all instead of one wget command per URL? 
 Around here:
 
 ALP $ wget -h
 GNU Wget 1.10.2c, a non-interactive network retriever.
 Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
 [...]
 
 That [URL]... was supposed to suggest that you can supply more than
 one URL on the command line.  Subject to possible command-line length
 limitations, this should allow any number of URLs to be specified at
 once.
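
 For example (URLs invented), several documents can go in one command:

       wget http://www.example.com/a.pdf http://www.example.com/b.pdf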
 
There's also -i (--input-file=FILE).  No bets, but it looks as
 if you can specify - for FILE, and it'll read the URLs from stdin,
 so you could pipe them in from anything.
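
 For example (untested here; the file name is invented):

       cat url_list.txt | wget --input-file=-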

Thanks, but my point is I don't know the full URL, just the pattern.

What I'm trying to download is what I might express as:

http://www.stirling.gov.uk/*.pdf

but I guess that's not possible.  I just wondered if it was possible
for wget to filter out everything except *.pdf - i.e. wget would look
at a site, or a directory on a site, and just accept those files that
match a pattern.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


--page-requisites and --post-data options

2007-05-11 Thread aulaulau

Hi!

First of all, thank you for this beautiful tool that Wget is.

Simply to tell you that if you want, in a script, to perform an action on a
web server (by posting data) and completely mirror the resulting page for
offline viewing, you would use the --page-requisites and --post-data options
together.

The fact is, Wget continues to POST the data for all the images and CSS
required to 'completely' download the page, and the server replies with
'405: Method Not Allowed', which is normal.

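For reference, the kind of command I use looks roughly like this (the server
and the form data here are invented):

      wget --post-data='name=value' --page-requisites \
        http://www.example.com/page.php
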
Is there a way to do it with wget options?

Thanks in advance!

Laurent STELLA


Re: --page-requisites and --post-data options

2007-05-11 Thread Steven M. Schweda
From aulaulau:

 [...] you will use --page-requisites and --post-data options together.

   Probably not something anyone considered.

 Is there a way to do it with wget options ?

   Perhaps use --post-data to get the primary page, and then use -i
primary_page (perhaps with -F, perhaps with --page-requisites) to get
the other pieces?
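
   Untested, but something along these lines (the URL, form data, and
file name are invented):

      wget --post-data='name=value' -O primary_page.html \
        http://www.example.com/form
      wget --force-html --base=http://www.example.com/ \
        --page-requisites --input-file=primary_page.html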



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: simple wget question

2007-05-11 Thread Steven M. Schweda
From: R Kimber

 What I'm trying to download is what I might express as:
 
 http://www.stirling.gov.uk/*.pdf

   At last.

 but I guess that's not possible.

   In general, it's not.  FTP servers often support wildcards.  HTTP
servers do not.  Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked (so
long ago) "If there's a Web page which has links to all of them, [...]".

   I just wondered if it was possible
 for wget to filter out everything except *.pdf - i.e. wget would look
 at a site, or a directory on a site, and just accept those files that
 match a pattern.

   Wget has options for this, as suggested before (wget -h):

[...]
Recursive accept/reject:
  -A,  --accept=LIST   comma-separated list of accepted extensions.
  -R,  --reject=LIST   comma-separated list of rejected extensions.
[...]

but, like many of us, it's not psychic.  It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down.  If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.
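
   If the site's pages do link to the PDF files, a recursive fetch with
an accept list might do it (no promises; the depth limit here is a
guess):

      wget -r -l 2 -np -A '*.pdf' http://www.stirling.gov.uk/

Note that Wget still has to download the HTML pages to find the links;
with -A it should delete the ones which don't match afterward.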



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547