Re: problem with downloading when HREF has ../

2007-02-01 Thread Vladimir Volovich
On Mon, 03 Apr 2006 17:15:52 +0200
 Mauro Tortonesi [EMAIL PROTECTED] wrote:

  The fix will appear in the next release, 1.11.  Mauro's paragraph you
  quoted (beginning with "i am going to test and apply your patch later
  this week") referred to applying the patch to the version control
  repository, not to the timeframe of releasing 1.11.
  
  It is my understanding that 1.11 will be released within the next
  couple of months; Mauro might give a more precise date.
 
 wget 1.11 will definitely be released in the next couple of months, but i
 can't be more precise at the moment. at the beginning, i was thinking
 about adding support for regex and gnunet and fixing gnutls support in
 that release. now i am reconsidering whether to delay these new features
 for 1.12 and focus on fixing the incredible number of recently reported
 bugs instead.

it's already been 10 months since your promise to release wget 1.11,
and almost a year since i reported this problem,
but wget 1.11 still hasn't been released. what are your plans for the new
release?

Best,
v.


Re: problem with downloading when HREF has ../

2006-04-03 Thread Mauro Tortonesi

Vladimir Volovich wrote:

MT == Mauro Tortonesi writes:

  I addressed this bug in wget a few months ago.  See the fix here:
  
  http://www.mail-archive.com/wget@sunsite.dk/msg08516.html


 MT hi frank,

 MT i am going to test and apply your patch later this week, as well
 MT as many other pending patches. unfortunately i am still working
 MT on my ph.d.  thesis at the moment, so i don't have much time to
 MT work on wget.  however, since i believe my thesis should be ready
 MT tomorrow or wednesday at most, i am planning to spend the rest of
 MT the week to catch up with wget.

is there any news on the wget update?


hrvoje fixed this problem more than one month ago. from the ChangeLog:


2006-02-27  Hrvoje Niksic  [EMAIL PROTECTED]

* url.c (path_simplify): Don't preserve .. at beginning of path.
Suggested by Frank McCown.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi                          http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: problem with downloading when HREF has ../

2006-04-03 Thread Vladimir Volovich
MT == Mauro Tortonesi writes:

  is there any news on the wget update?

 MT hrvoje fixed this problem more than one month ago. from the
 MT ChangeLog:

i don't see the official source at ftp.gnu.org/gnu/wget/

that's what i'm asking about.

Best,
v.



Re: problem with downloading when HREF has ../

2006-04-03 Thread Hrvoje Niksic
Vladimir Volovich [EMAIL PROTECTED] writes:

 MT == Mauro Tortonesi writes:

   is there any news on the wget update?

  MT hrvoje fixed this problem more than one month ago. from the
  MT ChangeLog:

 i don't see the official source at ftp.gnu.org/gnu/wget/

 that's what i'm asking about.

The fix will appear in the next release, 1.11.  Mauro's paragraph you
quoted (beginning with "i am going to test and apply your patch later
this week") referred to applying the patch to the version control
repository, not to the timeframe of releasing 1.11.

It is my understanding that 1.11 will be released within the next
couple of months; Mauro might give a more precise date.


Re: problem with downloading when HREF has ../

2006-04-03 Thread Mauro Tortonesi

Hrvoje Niksic wrote:

Vladimir Volovich [EMAIL PROTECTED] writes:


MT == Mauro Tortonesi writes:

 is there any news on the wget update?

MT hrvoje fixed this problem more than one month ago. from the
MT ChangeLog:

i don't see the official source at ftp.gnu.org/gnu/wget/

that's what i'm asking about.


The fix will appear in the next release, 1.11.  Mauro's paragraph you
quoted (beginning with "i am going to test and apply your patch later
this week") referred to applying the patch to the version control
repository, not to the timeframe of releasing 1.11.

It is my understanding that 1.11 will be released within the next
couple of months; Mauro might give a more precise date.


wget 1.11 will definitely be released in the next couple of months, but
i can't be more precise at the moment. at the beginning, i was thinking
about adding support for regex and gnunet and fixing gnutls support in
that release. now i am reconsidering whether to delay these new features
for 1.12 and focus on fixing the incredible number of recently reported
bugs instead.




Re: problem with downloading when HREF has ../

2006-03-27 Thread Vladimir Volovich
MT == Mauro Tortonesi writes:

  I addressed this bug in wget a few months ago.  See the fix here:
  
  http://www.mail-archive.com/wget@sunsite.dk/msg08516.html

 MT hi frank,

 MT i am going to test and apply your patch later this week, as well
 MT as many other pending patches. unfortunately i am still working
 MT on my ph.d.  thesis at the moment, so i don't have much time to
 MT work on wget.  however, since i believe my thesis should be ready
 MT tomorrow or wednesday at most, i am planning to spend the rest of
 MT the week to catch up with wget.

is there any news on the wget update?

Best,
v.



Re: problem with downloading when HREF has ../

2006-02-27 Thread Mauro Tortonesi

Frank McCown wrote:


Vladimir Volovich wrote:


DV == Dmitry Vereschaka writes:

  suppose that i run
  wget -r -l 1 http://some-host.com/index.html
  and index.html contains a link like this:
  <A HREF="../directory/file.html">file</A>
 
 DV URL ../directory/file.html placed in
 DV http://some-host.com/index.html is illegal because no parent
 DV directory for /index.html exists.

it is legal. it works everywhere else. that's why i ask to normalize
the URL properly.




The URL is legal if the web server doesn't complain (I'm betting it's 
IIS?) and returns /directory/file.html properly, but it's still not 
technically proper to try to access a file above the root web directory.


I addressed this bug in wget a few months ago.  See the fix here:

http://www.mail-archive.com/wget@sunsite.dk/msg08516.html


hi frank,

i am going to test and apply your patch later this week, as well as many 
other pending patches. unfortunately i am still working on my ph.d. 
thesis at the moment, so i don't have much time to work on wget. 
however, since i believe my thesis should be ready tomorrow or wednesday 
at most, i am planning to spend the rest of the week to catch up with wget.





Re: problem with downloading when HREF has ../

2006-02-26 Thread Vladimir Volovich
DV == Dmitry Vereschaka writes:

  suppose that i run
  
  wget -r -l 1 http://some-host.com/index.html
  
  and index.html contains a link like this:
  
  <A HREF="../directory/file.html">file</A>
  

 DV URL ../directory/file.html placed in
 DV http://some-host.com/index.html is illegal because no parent
 DV directory for /index.html exists.

it is legal. it works everywhere else. that's why i ask to normalize
the URL properly.

Best,
v.



Re: problem with downloading when HREF has ../

2006-02-26 Thread Dmitry Vereschaka

On Sun, 26 Feb 2006, Vladimir Volovich wrote:

 suppose that i run

   wget -r -l 1 http://some-host.com/index.html

 and index.html contains a link like this:

   <A HREF="../directory/file.html">file</A>


URL ../directory/file.html placed in http://some-host.com/index.html is
illegal because no parent directory for /index.html exists.


Re: problem with downloading when HREF has ../

2006-02-26 Thread Frank McCown

Vladimir Volovich wrote:

DV == Dmitry Vereschaka writes:

  suppose that i run
  
  wget -r -l 1 http://some-host.com/index.html
  
  and index.html contains a link like this:
  
  <A HREF="../directory/file.html">file</A>
  


 DV URL ../directory/file.html placed in
 DV http://some-host.com/index.html is illegal because no parent
 DV directory for /index.html exists.

it is legal. it works everywhere else. that's why i ask to normalize
the URL properly.



The URL is legal if the web server doesn't complain (I'm betting it's 
IIS?) and returns /directory/file.html properly, but it's still not 
technically proper to try to access a file above the root web directory.


I addressed this bug in wget a few months ago.  See the fix here:

http://www.mail-archive.com/wget@sunsite.dk/msg08516.html

Regards,
Frank



Re: problem with downloading when HREF has ../

2006-02-26 Thread Vladimir Volovich
FM == Frank McCown writes:

 DV URL ../directory/file.html placed in
 DV http://some-host.com/index.html is illegal because no parent
 DV directory for /index.html exists.

  it is legal. it works everywhere else. that's why i ask to
  normalize the URL properly.

 FM The URL is legal if the web server doesn't complain (I'm
 FM betting it's IIS?) and returns /directory/file.html properly,

E.g., Apache 2.0 does complain about requests like
"GET /../dir/file.html HTTP/1.0", answering with "HTTP/1.1 400 Bad Request",
so wget will not work properly at all.

 FM but it's still not technically proper to try to access a file
 FM above the root web directory.

Indeed. No browser even tries to go beyond root.
Wget should be fixed, too.

 FM I addressed this bug in wget a few months ago.  See the fix here:

 FM http://www.mail-archive.com/wget@sunsite.dk/msg08516.html

Thanks. I hope that wget developers will issue a new release soon,
with this fix.

Best,
v.
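
For comparison, the resolution behaviour described above can be reproduced
with Python's standard library, whose urljoin has followed the RFC 3986
rules since Python 3.5. This is only an illustrative sketch, not wget code;
the host name is the placeholder used earlier in this thread.

```python
from urllib.parse import urljoin

# Base document and relative link from the example in this thread.
base = "http://some-host.com/index.html"
link = "../directory/file.html"

# The ".." has no parent directory to climb into, so it is dropped
# instead of being kept in the request path.
resolved = urljoin(base, link)
print(resolved)  # http://some-host.com/directory/file.html
```

A client resolving the link this way requests /directory/file.html rather
than /../directory/file.html, which is the request Apache rejects.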



Re: problem with downloading when HREF has ../

2006-02-26 Thread Hrvoje Niksic
Vladimir Volovich [EMAIL PROTECTED] writes:

 E.g., Apache 2.0 does complain about requests like "GET
 /../dir/file.html HTTP/1.0", answering with "HTTP/1.1 400 Bad
 Request", so wget will not work properly at all.

Wget's implementation reflects rfc1808, which explicitly requires
all extraneous ".." path elements to be retained.  In other words,
that Wget does so is no accident; it had to be separately coded into
path_simplify, as shown by this ChangeLog entry:

2003-11-14  Hrvoje Niksic  [EMAIL PROTECTED]

(path_simplify): Don't swallow ..'s at the beginning of string.
E.g. simplify foo/../../bar as ../bar, not as bar.

However, even rfc2396, released in 1998, relaxed this, stating in
section 5.2:

  g) If the resulting buffer string still begins with one or more
     complete path segments of "..", then the reference is
     considered to be in error.  Implementations may handle this
     error by retaining these components in the resolved path
     (i.e., treating them as part of the final URI), by removing
     them from the resolved path (i.e., discarding relative levels
     above the root), or by avoiding traversal of the reference.

rfc3986 (released in 2005) goes further and, as far as I can tell,
simply specifies extraneous ".." to resolve to "/".  The editors
apparently recognized the reality of virtually all current
implementations, and Wget should do the same.

I believe Frank's proposed modification is a correct fix for this.
(Except the entire else block should be deleted, rather than just
commenting out the two offending lines.)
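
The difference between the two behaviours can be sketched in a few lines of
Python. This is not wget's C code and not Frank's patch: simplify_path and
its keep_leading_dotdot flag are hypothetical names made up for this
illustration, and trailing "." or ".." segments are not treated exactly as
rfc3986 prescribes.

```python
def simplify_path(path, keep_leading_dotdot=False):
    """Collapse "." and ".." segments of a path given without a leading "/".

    keep_leading_dotdot=True mimics the rfc1808-style behaviour (extraneous
    ".." segments are retained); False drops them, as rfc3986 does.
    Hypothetical helper for illustration only.
    """
    out = []
    for seg in path.split("/"):
        if seg == ".":
            continue                 # "." never changes the path
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()            # cancel the previous real segment
            elif keep_leading_dotdot:
                out.append("..")     # nothing left to cancel: retain it
            # otherwise: nothing left to cancel, silently discard it
            continue
        out.append(seg)
    return "/".join(out)


print(simplify_path("foo/../../bar", keep_leading_dotdot=True))  # ../bar
print(simplify_path("foo/../../bar"))                            # bar
```

With the flag set, the result matches the 2003 ChangeLog example above;
without it, the leading ".." is dropped, which is the behaviour argued for
in this message.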