Re: wget ftp url syntax is wrong

2001-02-28 Thread Jan Prikryl

> > By the way, neither "//" nor "/%2F" works in 1.7-dev.  Perhaps we
> > broke that when we fixed the problem where recursive FTP 'wget's
> > assumed that logging in always put you in '/'?
> 
> I believe some of Jan's changes broke it.  Also, the standard idiom:
> 
> wget -r ftp://username:password@host//path/to/home/something
> 
> no longer works.

Aargh. I will have a look at it.

-- jan 




Re: wget ftp url syntax is wrong

2001-02-28 Thread Hrvoje Niksic

"Dan Harkless" <[EMAIL PROTECTED]> writes:

> By the way, neither "//" nor "/%2F" works in 1.7-dev.  Perhaps we
> broke that when we fixed the problem where recursive FTP 'wget's
> assumed that logging in always put you in '/'?

I believe some of Jan's changes broke it.  Also, the standard idiom:

wget -r ftp://username:password@host//path/to/home/something

no longer works.



Re: wget ftp url syntax is wrong

2001-02-27 Thread Jamie Zawinski

Dan Harkless wrote:
> 
> It's my experience that very few anonymous FTP servers put you in a
> directory other than '/' (it certainly may be a chroot()ed '/'),

ftp.redhat.com puts you in /pub by default (as user "anonymous".)

I haven't checked, but I'd say it's a safe bet that this is what the
ftpd that comes with Red Hat Linux does by default.

-- 
Jamie Zawinski
[EMAIL PROTECTED] http://www.jwz.org/
[EMAIL PROTECTED]   http://www.dnalounge.com/



Re: wget ftp url syntax is wrong

2001-02-27 Thread Dan Harkless


Hrvoje Niksic <[EMAIL PROTECTED]> writes:
> Jamie Zawinski <[EMAIL PROTECTED]> writes:
> > Netscape can retrieve this URL:
> > 
> >   ftp://ftp.redhat.com/pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> > 
> > wget cannot.   wget wants it to be:
> > 
> >   ftp://ftp.redhat.com//pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> > 
> > I believe the Netscape behavior is right and the wget behavior is
> > wrong.
> 
> Wget behavior is based on what was specified in rfc1738 at the time I
> was writing the code.  rfc1738 does require %2F to be used instead of
> the slash immediately preceding "pub", but I considered the
> distinction to be purely academic and made Wget accept both.  (I have
> yet to see a purpose for CWD-ing into an empty directory.)

By the way, neither "//" nor "/%2F" works in 1.7-dev.  Perhaps we broke that
when we fixed the problem where recursive FTP 'wget's assumed that logging
in always put you in '/'?

---
Dan Harkless            | To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.



Re: wget ftp url syntax is wrong

2001-02-27 Thread Dan Harkless


Jamie Zawinski <[EMAIL PROTECTED]> writes:
> Dan Harkless wrote:
> > Well, silly or not, the concept is already there, so I don't think it makes
> > sense to remove the ability to access RFC-valid URLs in order to imitate
> > Netscape or Internet Explorer.
> 
> I guess that depends on whether you think it's more important to
> do the most useful thing, and what people expect; or do what the
> RFC says, despite the fact that nobody else has actually implemented
> that.

Well, I don't think we've shown that _nobody_ else has implemented that.  I
guess it mightn't be a terrible idea to make the common, non-compliant
behavior the default, though, as long as the RFC-correct behavior is
optionally available.

It's my experience that very few anonymous FTP servers put you in a
directory other than '/' (it certainly may be a chroot()ed '/'), though, and
FTP files that require a login and password to get at tend not to be
published as URLs, so in reality I don't think we're talking about that
large a body of common practice.

> (But if you're going to slavishly follow the RFC, you have to do one CWD
> for each directory component, or it won't work on, e.g., VMS and TWENEX
> file servers.)

I certainly wouldn't be opposed to putting in such behavior, if people using
VMS or TWENEX servers complained.




Re: wget ftp url syntax is wrong

2001-02-27 Thread Hrvoje Niksic

Jamie Zawinski <[EMAIL PROTECTED]> writes:

> Netscape can retrieve this URL:
> 
>   ftp://ftp.redhat.com/pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> 
> wget cannot.   wget wants it to be:
> 
>   ftp://ftp.redhat.com//pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> 
> I believe the Netscape behavior is right and the wget behavior is
> wrong.

Wget behavior is based on what was specified in rfc1738 at the time I
was writing the code.  rfc1738 does require %2F to be used instead of
the slash immediately preceding "pub", but I considered the
distinction to be purely academic and made Wget accept both.  (I have
yet to see a purpose for CWD-ing into an empty directory.)
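Below is a minimal Python sketch of the lenient parsing Hrvoje describes. It accepts both the RFC 1738 spelling ("/%2F") and the "//" shorthand as requests for an absolute path. The helper name is hypothetical, and this is not Wget's actual C code.

```python
from typing import Optional
from urllib.parse import unquote


def lenient_absolute_path(url_path: str) -> Optional[str]:
    """Return the server-absolute path a URL path asks for, or None.

    Hypothetical helper: accepts both the RFC 1738 spelling ("/%2F...")
    and the "//" shorthand. Any other path is left relative to the
    login directory, signalled by returning None.
    """
    if url_path[:4].upper() == "/%2F":
        # RFC 1738 form: the encoded slash marks the path as absolute.
        return "/" + unquote(url_path[4:])
    if url_path.startswith("//"):
        # Shorthand form: an empty first segment is read as "absolute".
        return "/" + unquote(url_path[2:])
    # Single leading slash: relative to the directory the login lands in.
    return None
```

With this reading, "/%2Fpub/redhat/x.rpm" and "//pub/redhat/x.rpm" both map to "/pub/redhat/x.rpm", while "/pub/redhat/x.rpm" stays relative to the login directory.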



Re: wget ftp url syntax is wrong

2001-02-27 Thread Jamie Zawinski

Dan Harkless wrote:
> 
> Well, silly or not, the concept is already there, so I don't think it makes
> sense to remove the ability to access RFC-valid URLs in order to imitate
> Netscape or Internet Explorer.

I guess that depends on whether you think it's more important to
do the most useful thing, and what people expect; or do what the
RFC says, despite the fact that nobody else has actually implemented
that.

I guess you know what my opinion is: de facto standards are the
only ones that matter.

> > The correct approach would be to try "CWD url/dir/path/" (the correct
> > meaning) and if this does not work, try "CWD /url/dir/path/".
> 
> I agree this would seem to be the best approach.  I'll add this to the TODO.

That works too, I suppose.

(But if you're going to slavishly follow the RFC, you have to do one CWD
for each directory component, or it won't work on, e.g., VMS and TWENEX
file servers.)




Re: wget ftp url syntax is wrong

2001-02-27 Thread Dan Harkless


Jan Prikryl <[EMAIL PROTECTED]> writes:
> Quoting Jamie Zawinski ([EMAIL PROTECTED]):
> > However, that said, I still think wget should do what Netscape does,
> > because that's what everyone expects.  The concept of a "default 
> > directory" in a URL is silly.

Well, silly or not, the concept is already there, so I don't think it makes
sense to remove the ability to access RFC-valid URLs in order to imitate
Netscape or Internet Explorer.

> The correct approach would be to try "CWD url/dir/path/" (the correct
> meaning) and if this does not work, try "CWD /url/dir/path/".

I agree this would seem to be the best approach.  I'll add this to the TODO.




Re: wget ftp url syntax is wrong

2001-02-26 Thread Jan Prikryl

Quoting Jamie Zawinski ([EMAIL PROTECTED]):

> However, that said, I still think wget should do what Netscape does,
> because that's what everyone expects.  The concept of a "default 
> directory" in a URL is silly.

The correct approach would be to try "CWD url/dir/path/" (the correct
meaning) and if this does not work, try "CWD /url/dir/path/".
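Jan's fallback can be sketched with Python's ftplib, assuming a server that answers a missing directory with a 5xx reply, which ftplib raises as `ftplib.error_perm`. The helper name `cwd_with_fallback` is hypothetical. It accepts any object with an `ftplib.FTP`-style `cwd()` method.

```python
import ftplib


def cwd_with_fallback(ftp, directory: str) -> str:
    """Change into `directory`, trying the RFC 1738 reading first.

    `ftp` is an ftplib.FTP (or anything with a compatible cwd() method).
    Returns the argument the server finally accepted.
    """
    relative = directory.strip("/")
    try:
        # RFC reading: the path is relative to the login directory.
        ftp.cwd(relative)
        return relative
    except ftplib.error_perm:
        # A failed CWD should leave the working directory unchanged, so
        # retry the same components as a path from the server root.
        absolute = "/" + relative
        ftp.cwd(absolute)
        return absolute
```

On a server that logs you into /pub, "/redhat/updates" succeeds on the first try, while "/pub/redhat" fails as "pub/redhat" and then succeeds as "/pub/redhat". The cost is one extra round trip whenever the RFC reading misses.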

-- jan

+--
 Jan Prikryl         | vr|vis center for virtual reality and visualisation
 <[EMAIL PROTECTED]> | http://www.vrvis.at
+--



Re: wget ftp url syntax is wrong

2001-02-26 Thread Jamie Zawinski

Hanno Foest wrote:
> 
>>   ftp://ftp.redhat.com/pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
>>   ftp://ftp.redhat.com//pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
...
> I don't think so. The double slash in front of the path part of the URL
> starts the path in the ftp server's root, while the single slash starts
> it in the default directory you log into when doing anonymous ftp. The
> default directory isn't the server's root in this case, but "pub".

Ok, I read the RFC, and we're both wrong:

   http://www.faqs.org/rfcs/rfc1738.html

   For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
   interpreted by FTP-ing to "host.dom", logging in as "myname"
   (prompting for a password if it is asked for), and then executing
   "CWD /etc" and then "RETR motd". This has a different meaning from
   <URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
   "RETR motd"; the initial "CWD" might be executed relative to the
   default directory for "myname". On the other hand,
   <URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
   argument, then "CWD etc", and then "RETR motd".

So according to the RFC, to use an absolute path, you have to begin
the path component with "/%2F", not with "//" -- the latter means
"cd to the current directory first", thus, it's a no-op.  (Actually
it's not clear whether "CWD " means "home directory" or "current 
directory": it's unspecified by RFC 765.)
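A short Python sketch of the RFC 1738 rule quoted above: drop the "/" that separates host from path, split the rest on "/", percent-decode each segment on its own, and issue one CWD per directory segment followed by RETR. This is also the one-CWD-per-component behaviour needed for VMS and TWENEX servers. The function name is hypothetical; ";type=" suffixes and directory URLs ending in "/" are not handled.

```python
from urllib.parse import unquote, urlsplit


def rfc1738_ftp_commands(url: str) -> list[str]:
    """Translate an ftp:// URL into the CWD/RETR sequence of RFC 1738.

    Sketch only: ";type=" suffixes and directory URLs are not handled.
    """
    path = urlsplit(url).path
    # The "/" right after the host is a separator, not part of url-path.
    url_path = path[1:] if path.startswith("/") else path
    *dirs, name = url_path.split("/")
    # Each segment is decoded separately: "%2F" becomes a literal "/"
    # inside one CWD argument, and an empty segment gives a null CWD.
    commands = [f"CWD {unquote(d)}" for d in dirs]
    commands.append(f"RETR {unquote(name)}")
    return commands
```

Applied to the three URLs in the quote, this yields ["CWD /etc", "RETR motd"], ["CWD etc", "RETR motd"] and ["CWD ", "CWD etc", "RETR motd"] respectively. For the Red Hat URL at the start of this thread it issues CWD pub, CWD redhat, and so on, each relative to the login directory.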

However, that said, I still think wget should do what Netscape does,
because that's what everyone expects.  The concept of a "default 
directory" in a URL is silly.

I'll bet MSIE does the same thing as Netscape.  That makes it
the standard.




Re: wget ftp url syntax is wrong

2001-02-26 Thread Jan Prikryl

Quoting Hanno Foest ([EMAIL PROTECTED]):

> On Mon, Feb 26, 2001 at 12:46:51AM -0800, Jamie Zawinski wrote:
> 
> > Netscape can retrieve this URL: 
> >
> >   ftp://ftp.redhat.com/pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> > 
> > wget cannot.   wget wants it to be:
> > 
> >   ftp://ftp.redhat.com//pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> > 
> > I believe the Netscape behavior is right and the wget behavior is wrong.
> 
> I don't think so. The double slash in front of the path part of the URL
> starts the path in the ftp server's root, while the single slash starts
> it in the default directory you log into when doing anonymous ftp. The
> default directory isn't the server's root in this case, but "pub".

Right. On the other hand, wget should probably be able to handle the
missing slash at the beginning (as Netscape does).

-- jan




Re: wget ftp url syntax is wrong

2001-02-26 Thread Hanno Foest

On Mon, Feb 26, 2001 at 12:46:51AM -0800, Jamie Zawinski wrote:

> Netscape can retrieve this URL: 
>
>   ftp://ftp.redhat.com/pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> 
> wget cannot.   wget wants it to be:
> 
>   ftp://ftp.redhat.com//pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm
> 
> I believe the Netscape behavior is right and the wget behavior is wrong.

I don't think so. The double slash in front of the path part of the URL
starts the path in the ftp server's root, while the single slash starts
it in the default directory you log into when doing anonymous ftp. The
default directory isn't the server's root in this case, but "pub".

So

wget ftp://ftp.redhat.com/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm

works as intended, starting the path relative to the default directory.
Netscape can't retrieve this URL, though... which I believe is wrong.
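The two readings can be compared with a small Python sketch. It assumes the login directory is /pub, as reported for ftp.redhat.com elsewhere in this thread. Both function names are hypothetical. `resolve_login_relative` follows Hanno's description, and `resolve_root_relative` follows the behaviour Jamie reports for Netscape.

```python
import posixpath


def resolve_login_relative(url_path: str, login_dir: str) -> str:
    """Hanno's reading: "//" starts at the server root, "/" at the login dir."""
    if url_path.startswith("//"):
        # Collapse the leading slashes to one; normpath keeps a leading "//".
        return posixpath.normpath("/" + url_path.lstrip("/"))
    return posixpath.normpath(posixpath.join(login_dir, url_path.lstrip("/")))


def resolve_root_relative(url_path: str) -> str:
    """Browser-style reading: every path starts at the server root."""
    return posixpath.normpath("/" + url_path.lstrip("/"))
```

With a login directory of /pub, the single-slash "/pub/redhat/..." resolves to /pub/pub/redhat/... under Hanno's reading, which is why wget fails on it. The "//pub/..." and "/redhat/..." forms both land on /pub/redhat/.... Under the root-relative reading, "/redhat/..." points at a directory that does not exist, matching Hanno's observation that Netscape cannot fetch it.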

Hanno



wget ftp url syntax is wrong

2001-02-26 Thread Jamie Zawinski

Netscape can retrieve this URL:

  ftp://ftp.redhat.com/pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm

wget cannot.   wget wants it to be:

  ftp://ftp.redhat.com//pub/redhat/updates/7.0/i386/apache-devel-1.3.14-3.i386.rpm

I believe the Netscape behavior is right and the wget behavior is wrong.
