Noel Koethe <[EMAIL PROTECTED]> writes:

> "If the http content-length header differs from actual data length,
> wget disregards the http specification as follows:

It doesn't disregard the HTTP specification.  As far as I'm aware,
HTTP simply specifies that the information provided by Content-Length
must be correct.  When it is not correct, the protocol has been broken
by the server and the best Wget can do is try to make sense of the
situation.  In both cases you report, Wget's behavior is by design.

> 1) if content-length is greater than actual data, wget keeps
> retrying to receive the whole file indefinitely.

Not indefinitely, but until `--tries' attempts (20 by default) have
been exhausted.

> Using the command-line parameter --ignore-length fixes this but
> should it not be on by default?

No.  When you're downloading files over a slow or unstable network,
you will often get EOF while reading data.  Retrying in spite of that
EOF has been one of Wget's primary features since the very beginning.
So Wget is not disregarding the spec, it is *honoring* it by assuming
that the provided Content-Length is correct, as it should be.  This
feature has made many a download possible.

In the cases where the Content-Length header truly is broken, use
`--ignore-length'.

> 2) If content-length is smaller than actual data sent by server,
> wget happily downloads it all instead of stopping at what ever
> content-length specified.

Again, this is a feature.  Broken CGI scripts often report broken
values for `Content-Length'.  When more data arrives, it becomes
apparent that the reported value is *broken* (unlike in the case when
less data arrives).  Wget can either dismiss the rest of the data or
dismiss the header.  I judged the data actually transmitted over the
wire to be more important than one obviously broken header.

The exception is when persistent connections are used.  In that case,
Content-Length is honored to the letter, and the remote server had
*better* provide the correct value, or else.

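To make the policy described above concrete, here is a minimal Python
sketch.  It illustrates the behavior as described in this message and
is not Wget's actual code (which is written in C).  The function names
`classify_transfer', `attempts_needed' and `body_bytes_kept' are made
up for this example, each simulated attempt is reduced to the number
of body bytes in hand when EOF arrives, and the persistent-connection
rule is one assumed reading of "honored to the letter".

```python
from typing import Optional, Sequence


def classify_transfer(declared: Optional[int], received: int,
                      ignore_length: bool = False) -> str:
    """Decide what to do when EOF arrives (hypothetical model)."""
    # With --ignore-length, or with no header at all, the declared
    # length is not consulted and whatever arrived is accepted.
    if ignore_length or declared is None:
        return "accept"
    # Fewer bytes than promised: trust the header and treat the EOF
    # as a dropped connection that is worth retrying.
    if received < declared:
        return "retry"
    # Exactly as many bytes as promised, or more: in the "more" case
    # the header is evidently wrong, so the data on the wire wins.
    return "accept"


def attempts_needed(received_per_attempt: Sequence[int],
                    declared: Optional[int],
                    tries: int = 20,
                    ignore_length: bool = False) -> Optional[int]:
    """Return the attempt number that succeeds, or None if the
    `tries` limit is exhausted first."""
    limited = received_per_attempt[:tries]
    for attempt_no, received in enumerate(limited, start=1):
        if classify_transfer(declared, received, ignore_length) == "accept":
            return attempt_no
    return None


def body_bytes_kept(declared: Optional[int], sent: int,
                    persistent: bool = False) -> int:
    """How many of the bytes sent are treated as the body."""
    # Assumed interpretation: on a persistent connection the client
    # reads at most the declared number of bytes as this body.
    if persistent and declared is not None:
        return min(sent, declared)
    return sent
```

In this model a server that declares 1000 bytes but always closes
after 400 yields `None' from `attempts_needed' once the 20 tries are
used up, while passing `ignore_length=True' accepts the first attempt.
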
> This is contrary to the spec which strictly states that
> content-length must be obeyed and that the user must be notified
> that something strange happened.

Which spec says that?
