-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Ed wrote:
> Seen this twice now but unable to track down how it happens.
> 
> I am crawling a list of websites which are being kept in a cache area.

<snip>

> A small number of files end up in the wrong location, evidence from
> the logs indicates that these
> 
> - are page requisite downloads, e.g. jpegs generally
> - have a saved line of the form 'file.jpg' saved - i.e. no directory prefix
> - are a small part of the overall crawl activity, most things get put
> away properly
> 
> the html for these pages shows references to the offending item as
> '../../../../../../ ..... file.jpg' (in the one case where I counted
> ../ is repeated 17 times)
> 
> the wget logs shows that after a spell of correctly saving requisites
> for a site we get a run of these errors until the current download
> finishes processing, each file erroneously saved is associated with a
> log line like
> Server file no newer than local file `filename.jpg' -- not retrieving.
> However this is the *first occurrence* of filename.jpg in the log
> 
> I am using CentOS and the CentOS build of wget, I have looked through
> bug trackers in vain, is this a known problem with wget? with centos?
> Repeating this particular event did not produce the same problem and
> as my wget code has not changed I am assuming it is intermittent in
> some fashion.

It's hard to track down what's wrong unless you can give us a specific
invocation that we can use to test with. It'd also be helpful if you
could provide evidence that these paths are indeed wrong. Comparisons
between the actual URL (both of the requisite and the referring page),
and the location would be helpful, as would a snippet from the debug
log. But even just a page we could try it against, or if it only happens
for a whole site, the main URL you're doing this against, would help.
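
As a rough illustration of the kind of comparison meant here, the
following Python sketch resolves a requisite's relative reference
against the referring page's URL using the standard RFC 3986 rules. It
is not part of Wget and says nothing about how Wget resolves links
internally. The URLs are made-up placeholders, and the host/path layout
returned by expected_local_path() is only an assumption about how a
recursive crawl is typically laid out on disk.

```python
from urllib.parse import urljoin, urlsplit


def resolve_requisite(page_url, reference):
    """Resolve a reference found in page_url to an absolute URL (RFC 3986)."""
    return urljoin(page_url, reference)


def expected_local_path(absolute_url):
    """Hypothetical host/path layout; adjust to match your own invocation."""
    parts = urlsplit(absolute_url)
    return parts.netloc + parts.path


# Placeholder values, not taken from the original report.
page_url = "http://www.example.com/gallery/2007/index.html"
reference = "../" * 17 + "file.jpg"

resolved = resolve_requisite(page_url, reference)
print(resolved)                       # http://www.example.com/file.jpg
print(expected_local_path(resolved))  # www.example.com/file.jpg
```

Surplus '../' segments cannot climb above the root of the site, so the
reference collapses to a file at the top level of the host. Setting the
resolved URL next to the location where the file was actually saved
would show whether the path is really wrong.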

But, before that, we need you to try to reproduce it with a canonical
version of Wget, please, which you can obtain from
ftp://ftp.gnu.org/gnu/wget/wget-1.10.2.tar.gz. CentOS is a Red Hat
derivative, and it is known that Red Hat has made some heavy
modifications to Wget, so that their version is not our version.

You might also see how our current development version holds up. You can
get it via Subversion; see
http://www.gnu.org/software/wget/wgetdev.html#development. To build it,
you'll need Subversion, GNU Autoconf and GNU Gettext.

Thanks very much for your help!

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG3Y/17M8hyUobTrERCJENAJ45jNxYDxXFijr/4HOnXJXQnccivQCeN/+o
gv08oGm8kZuT+xh2LWdcHig=
=1sTU
-----END PGP SIGNATURE-----
