Mirroring redirected web sites

2007-08-21 Thread Theo Wollenleben
I want to mirror data from a server that redirects to different mirrors;
the server decides which mirror to pick. I want a local copy under the
original address. This doesn't work with `wget -x -N' because Wget uses
the redirected address both for checking the timestamp and for saving
the local copy. For a single file I also tried `wget -N -O
local_copy_of_file'. Apparently Wget doesn't check the timestamp of
`local_copy_of_file', so it doesn't work either.

Is there a way to achieve what I want with Wget? If not, then this is a
request for an enhancement: there could be an option that tells Wget to
use the original address for all local operations, such as checking
timestamps and saving data.

To illustrate the situation I provide the following example. The server
redirects with status code 302. After a while the server might choose
another mirror.

$ wget -x -N http://download.suse.com/update/10.1/repodata/repomd.xml
--09:30:41--  http://download.suse.com/update/10.1/repodata/repomd.xml
   => `download.suse.com/update/10.1/repodata/repomd.xml'
Resolving download.suse.com... 195.135.221.130
Connecting to download.suse.com|195.135.221.130|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://ftp5.gwdg.de/pub/suse/update/10.1/repodata/repomd.xml
[following]
--09:30:41--  http://ftp5.gwdg.de/pub/suse/update/10.1/repodata/repomd.xml
   => `ftp5.gwdg.de/pub/suse/update/10.1/repodata/repomd.xml'
Resolving ftp5.gwdg.de... 134.76.12.5
Connecting to ftp5.gwdg.de|134.76.12.5|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,231 (1.2K) [text/xml]

100%[==>] 1,231         --.--K/s

09:30:41 (31.73 MB/s) - `ftp5.gwdg.de/pub/suse/update/10.1/repodata/repomd.xml' saved [1231/1231]
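
The mismatch in the transcript above can be made explicit with a small
sketch. The helper name `url_to_local_path' is hypothetical, and the
sketch only approximates what `wget -x' does for plain URLs: it strips
the protocol prefix and keeps host plus path. Ports, query strings and
trailing slashes are not handled.

```shell
# Hypothetical helper: print the relative path that `wget -x' would
# use for a plain http/ftp URL (protocol prefix removed).
url_to_local_path () {
 printf '%s\n' "${1#*://}"
}

ORIGINAL=http://download.suse.com/update/10.1/repodata/repomd.xml
REDIRECT=http://ftp5.gwdg.de/pub/suse/update/10.1/repodata/repomd.xml

url_to_local_path "$ORIGINAL"   # desired local path
url_to_local_path "$REDIRECT"   # path Wget uses after the redirect
```

The two printed paths differ in both host and directory, which is why a
later run that gets redirected to another mirror cannot find the
earlier copy.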


Additional remark: At the moment I'm using the following shell code for
mirroring a single file. It could be much simpler if Wget used the
file name provided with `-O' to check the timestamp...

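export LANG=en_US.UTF-8
function wget_mirror () {
 local URL=$1
 # remove protocol identifier:
 local FILE=$(echo "$URL" | sed 's/^[a-z]*:\/\/\(.*\)/\1/')
 local DIR=$(dirname "$FILE")
 local TIME=0 TIME_LOCAL=0
 if [ -e "$FILE" ]; then
  # modification time of the local copy, in seconds since the epoch
  TIME_LOCAL=$(stat -c %Y "$FILE")
  # wget -S prints the server headers on stderr, hence 2>&1;
  # after a redirect, use the Last-Modified of the final response
  TIME=$(date -d "$(wget --spider -S "$URL" 2>&1 | grep Last-Modified: |
   sed 's/^  Last-Modified: \(.*\)/\1/' | tail -n 1)" +%s)
 fi
 # download if there is no local copy or the remote file is newer
 if [ ! -e "$FILE" ] || [ "$TIME" -gt "$TIME_LOCAL" ]; then
  if [ ! -d "$DIR" ]; then mkdir -p "$DIR"; fi
  wget -O "$FILE" "$URL"
 fi
}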
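
The header-parsing step of the function above can be separated out and
checked without a network connection. The following is only a sketch:
the helper name `last_modified_epoch' is hypothetical, it assumes GNU
date (for `date -d', as the function above does), and it assumes that
the last Last-Modified header in the output belongs to the final
response of a redirect chain.

```shell
# Hypothetical helper: read `wget -S' output on stdin and print the
# Last-Modified time of the last response as seconds since the epoch.
# Returns 1 if no Last-Modified header is present.
last_modified_epoch () {
 local stamp
 # Strip the header name, drop a trailing CR, keep the last match.
 stamp=$(sed -n 's/^ *Last-Modified: *//p' | tr -d '\r' | tail -n 1)
 [ -n "$stamp" ] || return 1
 date -d "$stamp" +%s
}
```

It would be used as `wget --spider -S "$URL" 2>&1 | last_modified_epoch'.
Returning an error when the header is missing lets the caller decide
whether to download anyway.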



Re: Mirroring redirected web sites

2007-08-21 Thread Steven M. Schweda
From: Theo Wollenleben

 [...]  For a single file I also tried `wget -N -O
 local_copy_of_file'. Apparently Wget doesn't check the timestamp of
 `local_copy_of_file', so it doesn't work either.  [...]

   The implementation of -O defeats -N (among other options).  Look
around at http://www.mail-archive.com/wget@sunsite.dk/ for details.
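
   Since -N cannot be relied on together with -O, one possible
workaround is to do the freshness check outside Wget and call
`wget -O' only when needed.  The following is a sketch, not a tested
recipe: the names `needs_update' and `fetch_to' are hypothetical, and
GNU stat and mktemp are assumed.

```shell
# Hypothetical helper: succeed (exit 0) if FILE is missing or older
# than REMOTE_EPOCH (seconds since the epoch, e.g. taken from the
# Last-Modified header). Assumes GNU stat for `stat -c %Y'.
needs_update () {
 local file=$1 remote_epoch=$2
 [ -e "$file" ] || return 0
 [ "$remote_epoch" -gt "$(stat -c %Y "$file")" ]
}

# Hypothetical wrapper: download URL to a temporary file next to FILE
# and rename it into place only if the download succeeded, so a failed
# transfer does not clobber the existing copy.
fetch_to () {
 local url=$1 file=$2 tmp
 mkdir -p "$(dirname "$file")"
 tmp=$(mktemp "$file.XXXXXX") || return 1
 if wget -q -O "$tmp" "$url"; then
  mv "$tmp" "$file"
 else
  rm -f "$tmp"
  return 1
 fi
}
```

   A caller would run `needs_update "$FILE" "$TIME" && fetch_to "$URL"
"$FILE"', with "$TIME" obtained from the server's Last-Modified header.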



   Steven M. Schweda