Good day!
I use wget 1.9.1.
By default, wget converts every link to a site root (/ or somedomain.com/)
to /index.html or somedomain.com/index.html.
But some sites don't use index.html as the default page, and if I use
timestamping and continue downloading the site over more than one session:
1. wget first downloads index.html
Hrvoje Niksic wrote:
The whole matter of conversion of / to /index.html on the file
system is a hack. But I really don't know how to better represent
empty trailing file name on the file system.
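To make the conversion under discussion concrete, here is a minimal sketch in Python rather than wget's own C code. The function name local_path and the constant DEFAULT_NAME are hypothetical, and the mapping only illustrates the behaviour described above: a URL whose path ends in an empty file name gets index.html appended.

```python
from urllib.parse import urlsplit

# Assumed placeholder name, mirroring the behaviour described in the thread.
DEFAULT_NAME = "index.html"


def local_path(url: str) -> str:
    """Map a URL to a relative local file path (illustrative only)."""
    parts = urlsplit(url)
    # A bare host such as http://somedomain.com has an empty path.
    path = parts.path or "/"
    # An empty trailing file name cannot be stored on disk, so a
    # placeholder file name is appended.
    if path.endswith("/"):
        path += DEFAULT_NAME
    return parts.netloc + path
```

The sketch also shows why this is a hack: the server may really serve / from some other default page, so the local name index.html is only a guess.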
Another, for now rather limited, hack: on file systems which support some
sort of file attributes
On some sites, when I try to download a file with a browser, the
file is saved with the correct name, but with Wget I get a different
name.
It seems that Wget ignores the Content-Disposition HTTP header.
If you try to download, for example, the Sun JavaServer Faces
reference implementation
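As an illustration of what honouring that header could look like, here is a small Python sketch. It is not Wget code; the function name is hypothetical, and it leans on the header parser from Python's standard email package, which handles quoted and RFC 2231 encoded parameters.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, if any."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when no filename parameter is present.
    return msg.get_filename()
```

A real client would also have to strip any directory components from the returned name before saving, since the value is supplied by the server.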
I really like wget but I'd like to make a suggestion for an improvement.
Occasionally, I would like more nuanced control over the URLs that 'wget'
downloads recursively.
Would it be relatively simple to allow wget to take a filter argument
that names some other executable?
wget -r --filter
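One possible shape for such a filter, sketched in Python. This is only the proposal above made concrete, not an existing wget feature. The protocol is an assumption: candidate URLs are written to the filter's standard input, one per line, and the filter prints the URLs it wants followed. The function name filter_urls is hypothetical.

```python
import subprocess
from typing import List, Sequence


def filter_urls(urls: Sequence[str], command: Sequence[str]) -> List[str]:
    """Pass candidate URLs through an external filter program.

    Assumed protocol: one URL per line on stdin, accepted URLs
    echoed back one per line on stdout.
    """
    result = subprocess.run(
        list(command),
        input="\n".join(urls) + "\n",
        capture_output=True,
        text=True,
        check=True,  # a failing filter raises CalledProcessError
    )
    # Ignore blank lines in the filter's output.
    return [line for line in result.stdout.splitlines() if line.strip()]
```

With this protocol an ordinary tool such as grep could serve as the filter, for example filter_urls(urls, ["grep", "/docs/"]). Note that grep exits with a non-zero status when nothing matches, so check=True would need relaxing in that case.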
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote:
Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but
not afterwards.
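A minimal Python sketch of the rule suggested here, assuming the input is only the path-and-query part of a URL (no scheme or host, so the // after http: is not involved). The function name collapse_slashes is hypothetical and this is not the wget patch itself.

```python
import re


def collapse_slashes(path_and_query: str) -> str:
    """Collapse runs of '/' before the first '?', leaving the query as-is."""
    # partition() splits at the first '?' only; sep is '' when there is none.
    path, sep, query = path_and_query.partition("?")
    return re.sub(r"/{2,}", "/", path) + sep + query
```

For example, /a//b///c?next=http://example.com//x becomes /a/b/c?next=http://example.com//x, so a URL embedded in the query string survives unchanged.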