Re: problem with downloading when HREF has ../
On Mon, 03 Apr 2006 17:15:52 +0200 Mauro Tortonesi [EMAIL PROTECTED] wrote:

The fix will appear in the next release, 1.11. Mauro's paragraph you quoted (beginning with i am going to test and apply your patch later this week) referred to applying the patch to the version control repository, not to the timeframe of releasing 1.11. It is my understanding that 1.11 will be released within the next couple of months; Mauro might give a more precise date.

wget 1.11 will definitely be released in the next couple of months, but i can't be more precise in this moment. at the beginning, i was thinking about adding support for regex, gnunet and fix gnutls support in that release. now i am reconsidering whether to delay these new features for 1.12 and focus on fixing the incredible number of recently reported bugs instead.

it's already 10 months since your promise to release wget 1.11, and almost a year since i've reported this problem, but wget 1.11 still hasn't been released. what are your plans for the new release?

Best, v.
Re: problem with downloading when HREF has ../
Vladimir Volovich wrote: MT == Mauro Tortonesi writes: I addressed this bug in wget few months ago. See the fix here: http://www.mail-archive.com/wget@sunsite.dk/msg08516.html MT hi frank, MT i am going to test and apply your patch later this week, as well MT as many other pending patches. unfortunately i am still working MT on my ph.d. thesis at the moment, so i don't have much time to MT work on wget. however, since i believe my thesis should be ready MT tomorrow or wednesday at most, i am planning to spend the rest of MT the week to catch up with wget. are there any news on the wget update? hrvoje fixed this problem more than one month ago. from the ChangeLog: 2006-02-27 Hrvoje Niksic [EMAIL PROTECTED] * url.c (path_simplify): Don't preserve .. at beginning of path. Suggested by Frank McCown. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: problem with downloading when HREF has ../
MT == Mauro Tortonesi writes: are there any news on the wget update? MT hrvoje fixed this problem more than one month ago. from the MT ChangeLog: i don't see the official source at ftp.gnu.org/gnu/wget/ that's what i'm asking about. Best, v.
Re: problem with downloading when HREF has ../
Vladimir Volovich [EMAIL PROTECTED] writes: MT == Mauro Tortonesi writes: are there any news on the wget update? MT hrvoje fixed this problem more than one month ago. from the MT ChangeLog: i don't see the official source at ftp.gnu.org/gnu/wget/ that's what i'm asking about. The fix will appear in the next release, 1.11. Mauro's paragraph you quoted (beginning with i am going to test and apply your patch later this week) referred to applying the patch to the version control repository, not to the timeframe of releasing 1.11. It is my understanding that 1.11 will be released within the next couple of months; Mauro might give a more precise date.
Re: problem with downloading when HREF has ../
Hrvoje Niksic wrote: Vladimir Volovich [EMAIL PROTECTED] writes: MT == Mauro Tortonesi writes: are there any news on the wget update? MT hrvoje fixed this problem more than one month ago. from the MT ChangeLog: i don't see the official source at ftp.gnu.org/gnu/wget/ that's what i'm asking about. The fix will appear in the next release, 1.11. Mauro's paragraph you quoted (beginning with i am going to test and apply your patch later this week) referred to applying the patch to the version control repository, not to the timeframe of releasing 1.11. It is my understanding that 1.11 will be released within the next couple of months; Mauro might give a more precise date. wget 1.11 will definitely be released in the next couple of months, but i can't be more precise in this moment. at the beginning, i was thinking about adding support for regex, gnunet and fix gnutls support in that release. now i am reconsidering whether to delay these new features for 1.12 and focus on fixing the incredible number of recently reported bugs instead. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: problem with downloading when HREF has ../
MT == Mauro Tortonesi writes: I addressed this bug in wget few months ago. See the fix here: http://www.mail-archive.com/wget@sunsite.dk/msg08516.html MT hi frank, MT i am going to test and apply your patch later this week, as well MT as many other pending patches. unfortunately i am still working MT on my ph.d. thesis at the moment, so i don't have much time to MT work on wget. however, since i believe my thesis should be ready MT tomorrow or wednesday at most, i am planning to spend the rest of MT the week to catch up with wget. are there any news on the wget update? Best, v.
Re: problem with downloading when HREF has ../
Frank McCown wrote: Vladimir Volovich wrote: DV == Dmitry Vereschaka writes: suppose that i run wget -r -l 1 http://some-host.com/index.html and index.html contains a link like this: <A HREF="../directory/file.html">file</A> DV URL ../directory/file.html placed in DV http://some-host.com/index.html is illegal because no parent DV directory for /index.html exists. it is legal. it works everywhere else. that's why i ask to normalize the URL properly. The URL is legal if the web server doesn't complain (I'm betting it's IIS?) and returns /directory/file.html properly, but it's still not technically proper to try to access a file above the root web directory. I addressed this bug in wget few months ago. See the fix here: http://www.mail-archive.com/wget@sunsite.dk/msg08516.html hi frank, i am going to test and apply your patch later this week, as well as many other pending patches. unfortunately i am still working on my ph.d. thesis at the moment, so i don't have much time to work on wget. however, since i believe my thesis should be ready tomorrow or wednesday at most, i am planning to spend the rest of the week to catch up with wget. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: problem with downloading when HREF has ../
DV == Dmitry Vereschaka writes: suppose that i run wget -r -l 1 http://some-host.com/index.html and index.html contains a link like this: <A HREF="../directory/file.html">file</A> DV URL ../directory/file.html placed in DV http://some-host.com/index.html is illegal because no parent DV directory for /index.html exists. it is legal. it works everywhere else. that's why i ask to normalize the URL properly. Best, v.
Re: problem with downloading when HREF has ../
On Sun, 26 Feb 2006, Vladimir Volovich wrote: suppose that i run wget -r -l 1 http://some-host.com/index.html and index.html contains a link like this: <A HREF="../directory/file.html">file</A> URL ../directory/file.html placed in http://some-host.com/index.html is illegal because no parent directory for /index.html exists.
Re: problem with downloading when HREF has ../
Vladimir Volovich wrote: DV == Dmitry Vereschaka writes: suppose that i run wget -r -l 1 http://some-host.com/index.html and index.html contains a link like this: <A HREF="../directory/file.html">file</A> DV URL ../directory/file.html placed in DV http://some-host.com/index.html is illegal because no parent DV directory for /index.html exists. it is legal. it works everywhere else. that's why i ask to normalize the URL properly. The URL is legal if the web server doesn't complain (I'm betting it's IIS?) and returns /directory/file.html properly, but it's still not technically proper to try to access a file above the root web directory. I addressed this bug in wget few months ago. See the fix here: http://www.mail-archive.com/wget@sunsite.dk/msg08516.html Regards, Frank
Re: problem with downloading when HREF has ../
FM == Frank McCown writes:

DV URL ../directory/file.html placed in http://some-host.com/index.html is illegal because no parent directory for /index.html exists.

it is legal. it works everywhere else. that's why i ask to normalize the URL properly.

FM The URL is legal if the web server doesn't complain (I'm betting it's IIS?) and returns /directory/file.html properly,

E.g., Apache 2.0 does complain on requests like

GET /../dir/file.html HTTP/1.0

with

HTTP/1.1 400 Bad Request

so wget will not work properly at all.

FM but it's still not technically proper to try to access a file above the root web directory.

Indeed. Browsers do not even try to go beyond root. Wget should be fixed, too.

FM I addressed this bug in wget few months ago. See the fix here: http://www.mail-archive.com/wget@sunsite.dk/msg08516.html

Thanks. I hope that wget developers will issue a new release soon, with this fix.

Best, v.
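To illustrate the behaviour described above (clients not going beyond the root), here is a minimal sketch using Python's standard urllib.parse.urljoin, which in current Python 3 releases follows RFC 3986 semantics. It is not wget code; the host and link are simply the ones used as examples in this thread.

```python
from urllib.parse import urljoin, urlsplit

# Example page and link taken from the thread.
base = "http://some-host.com/index.html"
href = "../directory/file.html"

# urljoin drops the ".." that would climb above the root
# instead of keeping it in the resolved path.
resolved = urljoin(base, href)
request_path = urlsplit(resolved).path

print(resolved)
print(f"GET {request_path} HTTP/1.0")
```

With this kind of resolution the request path carries no leading /.. segment, so a server that rejects GET /../dir/file.html never sees such a request.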
Re: problem with downloading when HREF has ../
Vladimir Volovich [EMAIL PROTECTED] writes:

E.g., Apache 2.0 does complain on requests like GET /../dir/file.html HTTP/1.0 with HTTP/1.1 400 Bad Request so wget will not work properly at all.

Wget's implementation reflects rfc1808, which explicitly requires all extraneous .. path elements to be retained. In other words, that Wget does so is no accident; it had to be separately coded into path_simplify, as shown by this ChangeLog entry:

2003-11-14 Hrvoje Niksic [EMAIL PROTECTED]
(path_simplify): Don't swallow ..'s at the beginning of string. E.g. simplify foo/../../bar as ../bar, not as bar.

However, even rfc2396, released in 1998, relaxed this, stating in section 5.2:

g) If the resulting buffer string still begins with one or more complete path segments of .., then the reference is considered to be in error. Implementations may handle this error by retaining these components in the resolved path (i.e., treating them as part of the final URI), by removing them from the resolved path (i.e., discarding relative levels above the root), or by avoiding traversal of the reference.

rfc3986 (released in 2005) goes further and, as far as I can tell, simply specifies extraneous .. to resolve to /. The editors apparently recognized the reality of virtually all current implementations, and Wget should do the same. I believe Frank's proposed modification is a correct fix for this. (Except the entire else block should be deleted, rather than just commenting out the two offending lines.)
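The two policies discussed above can be sketched as follows. This is a simplified Python illustration, not wget's actual path_simplify (which is C code in url.c); the function name simplify_path and the keep_leading_dotdot flag are hypothetical, and trailing-slash handling is deliberately ignored.

```python
def simplify_path(path: str, keep_leading_dotdot: bool) -> str:
    """Collapse "." and ".." segments in a path given without a leading "/".

    keep_leading_dotdot=True  retains ".." that would climb above the root
                              (the rfc1808-style behaviour described above).
    keep_leading_dotdot=False discards such ".." segments
                              (the behaviour the thread asks for).
    """
    out = []
    for seg in path.split("/"):
        if seg == ".":
            continue  # "." never changes the directory
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()          # step back over the previous segment
            elif keep_leading_dotdot:
                out.append("..")   # nothing left to pop: retain it
            # otherwise nothing left to pop: silently drop it
            continue
        out.append(seg)
    return "/".join(out)


# The example from the 2003 ChangeLog entry, under both policies.
print(simplify_path("foo/../../bar", keep_leading_dotdot=True))
print(simplify_path("foo/../../bar", keep_leading_dotdot=False))
```

Under the retaining policy the link from this thread, ../directory/file.html, stays unchanged and ends up in the request as /../directory/file.html; under the discarding policy it becomes directory/file.html.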