Hi,

  I'm mirroring a number of WWW sites from the Internet onto my server,
www.mirror.edu.cn. I don't want to define an HTTPD virtual host for each
one, since I have many sites to mirror and, moreover, I can't tell in
advance which sites will be selected for mirroring in the future. So I
defined the access pattern of my server as
http://www.mirror.edu.cn/sites/<site_name>/, for example
http://www.mirror.edu.cn/sites/www.isoc.org/.

  I use wget 1.5.3 to carry out these mirroring tasks. The command line to
mirror http://www.isoc.org/ is as follows:
----------------
PATH=/usr/bin:/usr/local/bin
export PATH
cd /space/large/www/mirror/sites/www.isoc.org
LOG=$HOME/.wget/sites/www.isoc.org/log
if [ ! -s "$LOG" ]; then
    wget --background --output-file="$LOG" \
    --non-verbose --tries=5 --wait=0 --force-directories --no-host-directories \
    --ignore-length --convert-links --mirror --no-host-lookup --no-parent \
    --exclude-directories=/cgi-bin http://www.isoc.org/
fi
----------------
/space/large/www/mirror/ is the document root for http://www.mirror.edu.cn/,
and /space/large/www/mirror/sites/www.isoc.org/ (that is,
http://www.mirror.edu.cn/sites/www.isoc.org/) should act as the virtual
document root for the mirrored copy of http://www.isoc.org/.

  The problem is that wget doesn't provide an HTTP option such as
--document-root=DIR (for the example above,
--document-root=/sites/www.isoc.org) to make this virtual document root
work. As a result, the mirrored link for http://www.isoc.org/oti/ resolves
correctly to http://www.mirror.edu.cn/sites/www.isoc.org/oti/ when the
original page references it relatively (oti), but resolves to
http://www.mirror.edu.cn/oti/, which falls outside the mirrored tree, when
the page references it absolutely (/oti).
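
  To make the difference concrete, here is a small shell sketch of how a
browser resolves the two kinds of link against a page in the mirror. The
function name resolve_link is made up for this illustration, and it only
handles plain http:// directory URLs ending in "/"; it is not a full URL
resolver.

```shell
# Hypothetical helper (illustration only): print the URL a browser would
# request for $link found on a page served from the directory URL $base.
resolve_link() {
    base=$1
    link=$2
    case $link in
        /*) # Root-relative link: only the scheme and host of $base survive.
            host=$(printf '%s\n' "$base" | sed 's|^\(http://[^/]*\).*|\1|')
            printf '%s%s\n' "$host" "$link" ;;
        *)  # Document-relative link: appended to the base directory.
            printf '%s%s\n' "$base" "$link" ;;
    esac
}

resolve_link http://www.mirror.edu.cn/sites/www.isoc.org/ oti/
resolve_link http://www.mirror.edu.cn/sites/www.isoc.org/ /oti/
```

The first call stays inside /sites/www.isoc.org/, while the second one
drops the /sites/www.isoc.org prefix entirely.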

  My suggestion, therefore, is that wget provide a new HTTP option named
--document-root or something similar, working independently of related
options such as --convert-links. This option would convert every root
reference in a mirrored document into a reference under the specified
virtual document root. For example, /oti or http://www.isoc.org/oti would
be converted into /sites/www.isoc.org/oti.
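
  As a rough illustration of the conversion I have in mind, the following
shell sketch does the rewrite as a post-processing filter with sed. It is
not a wget feature: the function name rewrite_root_links and its two
arguments are my own invention, it only looks at double-quoted, lower-case
href= and src= attributes, and running it twice on the same file would add
the prefix twice.

```shell
# Hypothetical filter (not part of wget): read HTML on stdin and write it
# to stdout with root references moved under a virtual document root.
#   $1 = original site name, e.g. www.isoc.org
#   $2 = virtual document root, e.g. /sites/www.isoc.org
rewrite_root_links() {
    site=$1
    vroot=$2
    # Root-relative links are rewritten first, so that the prefix added
    # to same-site absolute URLs afterwards is not prefixed a second time.
    # [^/] skips links that begin with "//".
    sed -e "s|href=\"/\([^/]\)|href=\"$vroot/\1|g" \
        -e "s|src=\"/\([^/]\)|src=\"$vroot/\1|g" \
        -e "s|href=\"http://$site/|href=\"$vroot/|g" \
        -e "s|src=\"http://$site/|src=\"$vroot/|g"
}

printf '%s\n' '<a href="/oti/">OTI</a>' |
    rewrite_root_links www.isoc.org /sites/www.isoc.org
```

Document-relative links such as href="oti/" pass through unchanged, which
matches the behaviour described above for relative references.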

  Is there any other solution for my situation? Can anybody point me to a
patched version of wget, or tell me how to modify it myself? Thanks.

Regards,
Yang Jizhang

Network Research Center
Tsinghua University
Beijing 100084, P.R.China
