Hi list!
Sadly, I couldn't find Allan's e-mail address (maybe because I'm attached
via the news gateway), so this is a list-only post.
Micah Cowan wrote:
Hi Allan,
You'll generally get better results if you post to the mailing list
([email protected]). I've added it to the recipients list.
Coombe, Allan David (DPS) wrote:
Hi Micah,
First some context...
We are using wget 1.11.3 to mirror a web site so we can do some offline
processing on it. The mirror is on a Solaris 10 x86 server.
The problem we are getting appears to be because the URLs in the HTML
pages that are harvested by wget for downloading have mixed case (the
site we are mirroring is running on a Windows 2000 server using IIS) and
the directory structure created on the mirror has 'duplicate'
directories because of the mixed case.
For example, the URLs in HTML pages /Senate/committees/index.htm and
/senate/committees/index.htm refer to the same file but wget creates 2
different directory structures on the mirror site for these URLs.
OK... at this point I need to ask whether you are trying to mirror the
site or just back it up.
The main problem is easy: the moment you want a working mirror, you either
need to keep those mixed-case files or rewrite the URLs to a single,
consistent casing.
At this point it seems most practical to introduce a hook, similar to
--restrict-file-names, that modifies the name of the local copy and the
links inside the downloaded files in the same way.
Another option is to create symlinks for the different directory cases.
That would save half the overhead, I guess.
To create such a symlink structure you could use the output of

    find /mirror/basedir -type d | sort -f

Hope that helps.
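A rough sketch of how the symlink idea could look in practice, not tested
against a real mirror. The function names list_case_dups and
merge_case_dirs are made up for illustration. It assumes bash 4 or newer,
a case-sensitive file system on the mirror host, and that files with the
same name in both directory variants are identical copies, so overwriting
one with the other is harmless. Hidden directories are not handled.

```bash
#!/usr/bin/env bash

# Dry run: print every directory path that differs from a previously
# seen path only by case. (Hypothetical helper, not part of wget.)
list_case_dups() {
    find "$1" -type d | awk '
        { key = tolower($0) }
        (key in first) { print first[key] " <-> " $0; next }
        { first[key] = $0 }'
}

# Merge sibling directories whose names differ only by case into one
# real directory and replace the others with relative symlinks.
# (Hypothetical helper; destructive, it removes the duplicates.)
merge_case_dirs() {
    local base=$1 dir name key canon
    local -A seen=()            # lower-cased name -> name that is kept

    # The glob expands in sorted order; the first variant seen is kept.
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue       # glob matched nothing
        dir=${dir%/}
        [ -L "$dir" ] && continue       # already a symlink, skip it
        name=${dir##*/}
        key=${name,,}                   # lower-case the name (bash 4+)
        if [ -z "${seen[$key]+x}" ]; then
            seen[$key]=$name
        else
            canon=${seen[$key]}
            # Copy the duplicate's contents over, then swap it for a link.
            cp -R "$dir"/. "$base/$canon"/
            rm -rf "$dir"
            ln -s "$canon" "$dir"
        fi
    done

    # Recurse into the remaining real directories.
    for dir in "$base"/*/; do
        dir=${dir%/}
        if [ -d "$dir" ] && [ ! -L "$dir" ]; then
            merge_case_dirs "$dir"
        fi
    done
    return 0
}
```

Running list_case_dups /mirror/basedir first shows what would be merged.
Since merge_case_dirs deletes the duplicate directories, it is safer to
try it on a copy of the mirror before touching the real one.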
Matthias