Recursive downloading and post

2007-10-22 Thread Stuart Moore
Hi

I've been using wget to recursively download the output of a CGI
script on a server, together with any documents linked to by the
output of that CGI script - and then to use -k to create a locally
linked version.

Due to the length of data sent to the CGI script, wget needs to be
invoked with --post-file.

It seems that wget sends this POST data to every URL it recursively
downloads, not just the base URL. Unfortunately this stings me
somewhat, as one of the web servers I need to download the linked files
from refuses POST requests. (Even if I can get this changed, it seems
wrong to be sending POST data designed for one page to another
one.)

Is there any way to get wget to use the POST data only for the first
file downloaded? I couldn't find one in the documentation; in fact,
there seems to be nothing in the documentation about how recursive
downloading interacts with POST data. It would be great to see the
current behaviour documented somewhere.

Alternatively, if anyone can suggest any workarounds, that'd be much
appreciated. I need to convert the links, so just downloading the
first file using post, and then using that as wget's input (using -i)
won't work.

Stuart Moore


Re: Recursive downloading and post

2007-10-22 Thread Micah Cowan

Stuart Moore wrote:
> Hi
>
> I've been using wget to recursively download the output of a CGI
> script on a server, together with any documents linked to by the
> output of that CGI script - and then to use -k to create a locally
> linked version.
>
> Due to the length of data sent to the CGI script, wget needs to be
> invoked with --post-file
>
> It seems that wget sends this post data to all the URLs it recursively
> downloads - not just the base URL. Unfortunately this stings me
> somewhat, as one of the webservers I need to download the linked files
> from refuses post requests. (Even if I can get this changed, it seems
> wrong to be sending post requests designed for one page to another
> one)
>
> Is there any way to get wget to only use the post data for the first
> file downloaded? I couldn't find any in the documentation - in fact
> there seems to be nothing in the documentation regarding the
> interaction of recursive downloading with post data. It would be great
> to see the current behaviour documented somewhere.
>
> Alternatively, if anyone can suggest any workarounds, that'd be much
> appreciated. I need to convert the links, so just downloading the
> first file using post, and then using that as wget's input (using -i)
> won't work.

Hi Stuart,

Unfortunately, I'm not sure I can offer much help. AFAICT, --post-file
and --post-data weren't really designed for use with recursive
downloading. At some point, it's planned to introduce a mechanism to
apply certain configurations only for specified paths
(http://wget.addictivecode.org/FeatureSpecifications/PathSpecificConfig);
it will be a while before we can get to that, however.

You might be able to wing it with the -i option: fetch the first (POST)
URL on its own, feed the saved file to wget with -i to get the rest, and
then convert the links in that first file by hand, using a simple script
in sed or Perl. I'm afraid I don't know what else to suggest. :\

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
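
The "convert the links by hand" step suggested above could be sketched
roughly as follows. This is only an illustration, in Python rather than
sed or Perl, and it is not what wget's -k does internally. The function
name convert_links and the `downloaded` mapping (absolute URL to local
file path) are hypothetical; the caller would have to build that mapping
from whatever the separate wget run actually saved. Links that are not
in the mapping are rewritten as absolute URLs, which mirrors the
documented behaviour of -k for files that were not downloaded.

```python
import re
from urllib.parse import urljoin

# Regex-based matching of href/src attributes. This is a rough sketch,
# not a full HTML parser; unusual markup may be missed or mismatched.
LINK_RE = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<quote>["'])(?P<url>.*?)(?P=quote)""",
    re.IGNORECASE,
)


def convert_links(html, base_url, downloaded):
    """Rewrite links in `html` (fetched from `base_url`).

    `downloaded` is a hypothetical dict mapping absolute URLs to the
    local paths they were saved under. Links found in it are pointed at
    the local copy; all other links are made absolute.
    """
    def fix(match):
        url = match.group("url")
        if url.startswith("#"):
            # Leave same-page fragment links untouched.
            return match.group(0)
        absolute = urljoin(base_url, url)
        target = downloaded.get(absolute, absolute)
        quote = match.group("quote")
        return match.group("attr") + quote + target + quote

    return LINK_RE.sub(fix, html)


if __name__ == "__main__":
    page = '<a href="/docs/a.html">A</a> <a href="b.html">B</a>'
    saved = {"http://www.example.com/docs/a.html": "www.example.com/docs/a.html"}
    print(convert_links(page, "http://www.example.com/cgi-bin/report", saved))
```

The local paths in the mapping are taken as given, so they need to be
relative to wherever the first (POST) page is stored.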


RE: Recursive downloading and post

2007-10-22 Thread Tony Lewis
Micah Cowan wrote:

> Stuart Moore wrote:
> > Is there any way to get wget to only use the post data for the first
> > file downloaded?
>
> Unfortunately, I'm not sure I can offer much help. AFAICT, --post-file
> and --post-data weren't really designed for use with recursive
> downloading.

Perhaps not, but I can't imagine that there is any scenario where the POST
data should legitimately be sent for anything other than the URL(s) on the
command line.

I'd vote for this being flagged as a bug.

Tony
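
The behaviour proposed in this thread (send the POST data only to the
URL(s) given on the command line, and use plain GET for everything
reached by recursion) can be sketched as below. This is not wget's
code and says nothing about how wget is implemented; plan_requests and
the `links` dict, which stands in for the links found on each fetched
page, are hypothetical names used purely for illustration.

```python
from collections import deque


def plan_requests(start_urls, links, post_data):
    """Breadth-first walk of a link graph, returning (method, url, body).

    `links` is a hypothetical dict mapping a URL to the URLs found on
    that page; a real crawler would discover these by parsing responses.
    Only the start URLs receive the POST body; every URL reached through
    recursion is requested with GET and no body.
    """
    start = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    start_set = set(start)
    seen = set(start)
    queue = deque(start)
    plan = []
    while queue:
        url = queue.popleft()
        if url in start_set:
            plan.append(("POST", url, post_data))
        else:
            plan.append(("GET", url, None))
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return plan


if __name__ == "__main__":
    graph = {
        "http://a.example/cgi": ["http://b.example/doc1", "http://b.example/doc2"],
        "http://b.example/doc1": ["http://a.example/cgi"],
    }
    for step in plan_requests(["http://a.example/cgi"], graph, b"q=1"):
        print(step)
```

Note that a link back to the start URL is not fetched a second time
here, so the POST body is sent exactly once per command-line URL.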