Feature suggestion: change detection for wget -c
Wget has no way of verifying that the local file is really a valid prefix of the remote file. Couldn't wget redownload the last 4 bytes (or so) of the file and compare them with the local copy before resuming? For a few bytes per file we could detect changes to almost all compressed files and the majority of uncompressed files.

--
John C. McCabe-Dansted
PhD Student
University of Western Australia
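The proposed check could look roughly like the following. This is only a sketch of the idea, not wget code: `tail_matches` and `http_range_fetcher` are hypothetical names, and it assumes the server honours HTTP `Range` requests (answering with status 206).

```python
import os
import urllib.request


def tail_matches(local_path, fetch_range, tail_len=4):
    """Compare the last tail_len bytes of the local file with the same
    byte range of the remote file.

    fetch_range(start, end) is a caller-supplied function (hypothetical)
    that returns the remote bytes start..end, inclusive.
    """
    size = os.path.getsize(local_path)
    if size == 0:
        return True  # an empty file is a prefix of anything
    n = min(tail_len, size)
    start, end = size - n, size - 1
    with open(local_path, "rb") as f:
        f.seek(start)
        local_tail = f.read(n)
    return fetch_range(start, end) == local_tail


def http_range_fetcher(url):
    """Build a fetch_range function that issues an HTTP Range request."""
    def fetch(start, end):
        req = urllib.request.Request(
            url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                # The server sent the whole file instead of the range.
                raise RuntimeError("server ignored the Range request")
            return resp.read()
    return fetch
```

A caller would run `tail_matches(path, http_range_fetcher(url))` before resuming and fall back to a full download (or refuse) when it returns False. A matching tail does not prove the whole prefix is unchanged; it only catches the common cases described above.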
wget silently overwrites a file when using -c and the server does not support resuming
Using 1.10.2

To reproduce:
1) Download a video from Google Video:
   $ wget -O Test.resume_me.avi http://vp05.video.l.google.com/videodownload?version=0secureurl=twAAAKKXmJe_gUGC30JVHiQCrmBhoU7JEoYkn1zkPRI9Vm4nYjXB_Lconoy-Fwa2rg40mCn-w3frP3K4KTW7vxmD2bubcJainv-i4vxBqUS_k2VtLtsJI04UFSYcVQVESuIqHZfGuToqj3r3HkfzbKYgoRSzAEI6xUl3-jQKsKAgpQzwoaRbExjhOU2kup9A0VxOlC_KdqG2QWMejRjLZZEfCDb4ETaWEBT0qIGq3W_GS6sKcx6dKXYGMuiGbd4Wf9v3Mgsigh=ongRDut1aAA_QP6pwGRnwIWO2k0begin=0len=1221999docid=9076288729387457440rdc=1;
2) Cancel the download after a few seconds.
3) Re-download, using the -c flag.

Result: The old file will be silently overwritten. Wget should refuse downloading the file.

The docs specifically state:

   Beginning with Wget 1.7, if you use -c on a non-empty file, and it turns out that the server does not support continued downloading, Wget will refuse to start the download from scratch, which would effectively ruin existing contents. If you really want the download to start from scratch, remove the file.
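The behaviour the documentation describes can be summarised as a small decision rule. The sketch below is an illustration only, not wget's actual implementation: `resume_action` is a hypothetical name, and it assumes a server that supports resuming answers a `Range` request with status 206, while one that does not answers 200 with the full body.

```python
def resume_action(local_size, status_code):
    """Decide what a -c style download should do with the local file.

    local_size:  size in bytes of the existing local file (0 if absent)
    status_code: HTTP status of the response to the Range request
    """
    if local_size == 0:
        return "download"   # nothing to lose, start from scratch
    if status_code == 206:
        return "append"     # server sent only the missing part
    # Non-empty file and no partial content: starting over would
    # destroy the existing contents, so refuse.
    return "refuse"
```

For the report above, this rule gives `resume_action(n, 200) == "refuse"` for any partial file of size `n > 0`, which is the outcome the reporter expected.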
Re: wget silently overwrites a file when using -c and the server does not support resuming
Ori Avtalion wrote:
> Using 1.10.2
>
> To reproduce:
> 1) Download a video from Google Video:
>    $ wget -O Test.resume_me.avi http://vp05.video.l.google.com/videodownload?version=0secureurl=twAAAKKXmJe_gUGC30JVHiQCrmBhoU7JEoYkn1zkPRI9Vm4nYjXB_Lconoy-Fwa2rg40mCn-w3frP3K4KTW7vxmD2bubcJainv-i4vxBqUS_k2VtLtsJI04UFSYcVQVESuIqHZfGuToqj3r3HkfzbKYgoRSzAEI6xUl3-jQKsKAgpQzwoaRbExjhOU2kup9A0VxOlC_KdqG2QWMejRjLZZEfCDb4ETaWEBT0qIGq3W_GS6sKcx6dKXYGMuiGbd4Wf9v3Mgsigh=ongRDut1aAA_QP6pwGRnwIWO2k0begin=0len=1221999docid=9076288729387457440rdc=1;
> 2) Cancel the download after a few seconds.
> 3) Re-download, using the -c flag.
>
> Result: The old file will be silently overwritten. Wget should refuse downloading the file.
>
> The docs specifically state:
>
>    Beginning with Wget 1.7, if you use -c on a non-empty file, and it turns out that the server does not support continued downloading, Wget will refuse to start the download from scratch, which would effectively ruin existing contents. If you really want the download to start from scratch, remove the file.

Did you actually confirm that a partially downloaded file existed? I have canceled downloads and no trace of the partially downloaded file was to be found.

--
Gerard Seibert
[EMAIL PROTECTED]
Feature request : save the charset of the pages
Hi,

I think that wget should add a charset declaration to a downloaded HTML page if the page doesn't already contain one. The charset of a web page can be declared in two ways:
- In the HTTP header (example: Content-Type: text/html; charset=ISO-8859-1)
- In the HTML header (example: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">)

For browsing, it's enough to have the charset only in the HTTP header: the browser is informed. But after a download with wget, the charset is no longer recorded anywhere if it wasn't also in the HTML header.

Example:

$ wget -SEk http://www.la-croix.com/
--00:08:33-- http://www.la-croix.com/
= `index.html.2'
Resolving www.la-croix.com... 160.92.103.70
Connecting to www.la-croix.com|160.92.103.70|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Thu, 31 Aug 2006 22:06:18 GMT
  Server: Apache
  Set-Cookie: JSESSIONID=41649A198F5523A8E970C25FDFB02A9E.C5067890C9167DD999; Path=/
  Last-Modified: Thu, 31 Aug 2006 22:02:49 GMT
  Connection: close
  Content-Type: text/html; charset=ISO-8859-15
Length: unspecified [text/html]
[ = ] 51,974 280.97K/s
00:08:34 (280.84 KB/s) - `index.html.2.4.html' saved [51974]
Converting index.html.2.4.html... 3-246
Converted 1 files in 0.006 seconds.

The charset of this page is ISO-8859-15, but this information is now lost because the saved file doesn't contain any indication of it. If I parse this file later, the parser won't know the charset. If I now submit the file to the HTML validator at http://validator.w3.org, it prints:

Result: Failed validation
File: index.html.2.4.html
Encoding: utf-8
Doctype:

Sorry, I am unable to validate this document because on line 19, 182-183, 211, 215, 220, 225, 232, 236, 246, 286, 328, 403, 448, 455, 483, 519, 539, 547, 606, 643, 657-658, 660, 675, 679, 690, 701, 711, 720, 724, 732-733, 764 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

I think that if the HTML header doesn't declare a charset, wget should insert one.
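As a rough illustration of the request, a post-processing step could copy the charset from the HTTP `Content-Type` header into the saved page. This is a sketch under stated assumptions, not a wget feature: `add_charset_declaration` and `charset_from_header` are hypothetical names, the page is handled as raw bytes, and a simple regular expression stands in for real HTML parsing.

```python
import re
from email.message import Message

# Any <meta ...> tag that already mentions a charset.
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=", re.IGNORECASE)
# The opening <head> tag, with or without attributes.
_HEAD_OPEN = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased by the email package


def add_charset_declaration(html, content_type):
    """Insert a meta charset declaration right after <head>, but only if
    the header names a charset and the page does not declare one."""
    charset = charset_from_header(content_type)
    if charset is None or _META_CHARSET.search(html):
        return html
    head = _HEAD_OPEN.search(html)
    if head is None:
        return html  # no <head> to attach the declaration to
    meta = ('<meta http-equiv="Content-Type" '
            'content="text/html; charset=%s">' % charset).encode("ascii")
    return html[:head.end()] + b"\n" + meta + html[head.end():]
```

For the page in the example, calling `add_charset_declaration(data, "text/html; charset=ISO-8859-15")` on the saved bytes would add a declaration naming `iso-8859-15`, so a later parser or the validator would no longer have to guess the encoding.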
Re: wget silently overwrites a file when using -c and the server does not support resuming
From: Ori Avtalion

> wget -O Test.resume_me.avi [...]
> [...]
> Result: The old file will be silently overwritten.
> [...]

You're working too hard. Using -O will overwrite the output file no matter what happens, whether the download works or not. That's what -O does. If you don't like it, don't use -O.

If you look through the archive, you can find many other cases where -O caused various effects which various users did not like. It's a characteristic of -O.

If you can see the same problem when you don't specify -O, feel free to re-complain.

Steven M. Schweda               [EMAIL PROTECTED]
382 South Warwick Street        (+1) 651-699-9818
Saint Paul MN 55105-2547