Re: wget ipv6 patch

2003-10-10 Thread Mauro Tortonesi
On Fri, 10 Oct 2003, Hrvoje Niksic wrote:

> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> >> An IPv4 address is nothing more than a 32-bit quantity.  I don't
> >> see anything "incorrect" about using unsigned char[4] for that, and
> >> that works perfectly fine on 64-bit architectures.
> >
> > ok, but in this way you have to call memcmp each time you want to compare
> > two ip addresses and memcpy each time you want to copy an ip
> > address.
>
> Well, you can copy addresses with the assignment operator as well, as
> long as they're in a `struct', as they are in the current code.  You
> do need `memcmp' to compare them, but that's fine with me.
>
> > i prefer the in_addr approach (and i don't understand why we
> > shouldn't use structures like in_addr and in_addr6 which have been
> > created just to do what we want: storing ip addresses)
>
> Because they're complexly defined and hard to read if all you want is
> to store 4 and 16 bytes of binary data, respectively.

i guess that depends from the development style you are used to.

> >> Besides, you seem to be willing to cache the string representation
> >> of an IP address.  Why is it acceptable to work with a char *, but
> >> unacceptable to work with unsigned char[4]?  I simply don't see
> >> that in_addr is helping anything in host.c's code base.
> >
> > i would prefer to cache string representation of ip addresses
> > because the ipv6 code would be much simpler and more elegant.
>
> I agree.  My point was merely to point out that even you yourself
> believe that struct in_addr* is not the only legitimate way to store
> an IP address.
>
> >> >> I don't understand the new PASSIVE flag to lookup_host.
> >> >
> >> > well, that's a problem. to get a socket address suitable for
> >> > bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.
> >>
> >> Why?  The current code seems to get by without it.
> >
> > the problem is when you call lookup_host to get a struct to pass to
> > bind(2). if you use --bind-address=localhost and you don't set the
> > AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which
> > is incorrect.
> >
> >> There must be a way to get at the socket address without calling
> >> getaddrinfo.
> >
> > not if you want to to use --bind-address=ipv6only.domain.com.
>
> I see.  I guess we'll have to live with it, one way or the other.
> Instead of accumulating boolean arguments, lookup_host should probably
> accept a FLAGS argument, so you can call it with, e.g.:
>
> lst = lookup_host (addr, LH_PASSIVE | LH_SILENT);
> ...
>

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 on Linuxhttp://www.deepspace6.net
Ferrara Linux Users Group   http://www.ferrara.linux.it


Re: wget ipv6 patch

2003-10-10 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> and i'm saying that for this task the ideal structure is
> sockaddr_storage. notice that my code uses sockaddr_storage
> (typedef'd as wget_sockaddr) only when dealing with socket
> addresses, not for ip address caching.

Now I see.  Thanks for clearing it up.

>> An IPv4 address is nothing more than a 32-bit quantity.  I don't
>> see anything "incorrect" about using unsigned char[4] for that, and
>> that works perfectly fine on 64-bit architectures.
>
> ok, but in this way you have to call memcmp each time you want to compare
> two ip addresses and memcpy each time you want to copy an ip
> address.

Well, you can copy addresses with the assignment operator as well, as
long as they're in a `struct', as they are in the current code.  You
do need `memcmp' to compare them, but that's fine with me.

> i prefer the in_addr approach (and i don't understand why we
> shouldn't use structures like in_addr and in_addr6 which have been
> created just to do what we want: storing ip addresses)

Because they're complexly defined and hard to read if all you want is
to store 4 and 16 bytes of binary data, respectively.

> however, notice that using unsigned char[4] and unsigned char[16] is
> a less portable solution and is potentially prone to problems with
> the alignement of the sockaddr_in and sockaddr_in6 structs.

Note that I only propose using unsigned char[N] for internal storing
of addresses, such as in Wget's own `struct ip_address'.  For talking
to system API's, we can and should copy the address to the appropriate
sockaddr_* structure.  That's how the current code works, and it's
quite portable.

>> Besides, you seem to be willing to cache the string representation
>> of an IP address.  Why is it acceptable to work with a char *, but
>> unacceptable to work with unsigned char[4]?  I simply don't see
>> that in_addr is helping anything in host.c's code base.
>
> i would prefer to cache string representation of ip addresses
> because the ipv6 code would be much simpler and more elegant.

I agree.  My point was merely to point out that even you yourself
believe that struct in_addr* is not the only legitimate way to store
an IP address.

>> >> I don't understand the new PASSIVE flag to lookup_host.
>> >
>> > well, that's a problem. to get a socket address suitable for
>> > bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.
>>
>> Why?  The current code seems to get by without it.
>
> the problem is when you call lookup_host to get a struct to pass to
> bind(2). if you use --bind-address=localhost and you don't set the
> AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which
> is incorrect.
>
>> There must be a way to get at the socket address without calling
>> getaddrinfo.
>
> not if you want to to use --bind-address=ipv6only.domain.com.

I see.  I guess we'll have to live with it, one way or the other.
Instead of accumulating boolean arguments, lookup_host should probably
accept a FLAGS argument, so you can call it with, e.g.:

lst = lookup_host (addr, LH_PASSIVE | LH_SILENT);
...


Re: wget ipv6 patch

2003-10-10 Thread Mauro Tortonesi
On Wed, 8 Oct 2003, Hrvoje Niksic wrote:

> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> >> I still don't understand the choice to use sockaddr and
> >> sockaddr_storage in a application code.
> >> They result in needless casts and (to me) uncomprehensible code.
> >
> > well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
> > code ;-)
>
> Not when the only thing you need is storing the result of a DNS
> lookup.

i agree, in fact for that case we use the ip_address struct.


> I've seen the RFC, but I don't agree with it in the case of Wget.  In
> fact, even the RFC states that the data structure is merely a help for
> writing portable code across "multiple address families and
> platforms".  Wget doesn't aim for AF independence, and the
> alternatives are at least as good for platform independence.

i think we are talking about two different things. i am talking about
storing generic socket addresses (IPv4 or IPv6) and i'm saying that for
this task the ideal structure is sockaddr_storage. notice that my code
uses sockaddr_storage (typedef'd as wget_sockaddr) only when dealing with
socket addresses, not for ip address caching. and i am not trying to write
af-indipendent code, anyway.


> >> For example, this cast: (unsigned char *)(&addr->addr_v4.s_addr)
> >> would not be necessary if the address were defined as unsigned
> >> char[4].
> >
> > in_addr is the correct structure to store ipv4 addresses. using
> > in_addr instead of unsigned char[4] makes much easier to copy or
> > compare ipv4 addresses. moreover, you don't have to care about the
> > integer size in 64-bits architectures.
>
> An IPv4 address is nothing more than a 32-bit quantity.  I don't see
> anything "incorrect" about using unsigned char[4] for that, and that
> works perfectly fine on 64-bit architectures.

ok, but in this way you have to call memcmp each time you want to compare
two ip addresses and memcpy each time you want to copy an ip address.
i prefer the in_addr approach (and i don't understand why we shouldn't
use structures like in_addr and in_addr6 which have been created just to
do what we want: storing ip addresses) as i think it is a more elegant
solution, but if you still wish to use unsigned char[4] and unsigned
char[16] i cannot force you to change your mind.

however, notice that using unsigned char[4] and unsigned char[16] is a
less portable solution and is potentially prone to problems with the
alignement of the sockaddr_in and sockaddr_in6 structs.


> Besides, you seem to be willing to cache the string representation of
> an IP address.  Why is it acceptable to work with a char *, but
> unacceptable to work with unsigned char[4]?  I simply don't see that
> in_addr is helping anything in host.c's code base.

i would prefer to cache string representation of ip addresses because
the ipv6 code would be much simpler and more elegant.


> >> I don't understand the new PASSIVE flag to lookup_host.
> >
> > well, that's a problem. to get a socket address suitable for
> > bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.
>
> Why?  The current code seems to get by without it.

the problem is when you call lookup_host to get a struct to pass to
bind(2). if you use --bind-address=localhost and you don't set the
AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which is
incorrect.


> There must be a way to get at the socket address without calling
> getaddrinfo.

not if you want to to use --bind-address=ipv6only.domain.com.


> >> If it would make your life easier to add TYPE in !ENABLE_IPV6 case,
> >> so you can write it more compactly, by all means do it.  By "more
> >> compactly" I mean something code like this:
> >>
> [...]
> > that's a question i was going to ask you. i supposed you were
> > against adding the type member to ip_address in the IPv4-only case,
>
> Maintainability is more important than saving a few bytes per cached
> IP address, especially since I don't expect the number of cache
> entries to ever be large enough to make a difference.  (If someone
> downloads from so many addresses that the hash table sizes become a
> problem, the TYPE member will be the least of his problems.)

i agree.


> > P.S. please notice that by caching the string representation of IP
> >  addresses instead of their network representation, the code
> >  could become much more elegant and simple.
>
> You said that before, but I don't quite understand why that's the
> case.  It's certainly not the case for IPv4.

sure, but that would be of great help, especially for IPv6. the code (and
in particular the translation from ip addresses to socket addresses and
viceversa) would be much simpler and elegant.


-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Gro

Re: wget ipv6 patch

2003-10-08 Thread Dražen Kačar
Mauro Tortonesi wrote:

> are there __REALLY__ systems which do not support inet_aton? their ISVs
> should be ashamed of themselves...

Solaris, for example. IIRC inet_aton isn't in any document which claims
to be a standard.

> however, yours seemed to me an ugly hack, so i have temporarily removed
> it. as you say, it would be probably better to provide a fallback
> implementation of inet_aton in cmpt.c.

But standards define inet_pton, which can do what inet_aton does, so that
should be checked for before using the fallback implementation.

-- 
 .-.   .-.Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
 |
 |[EMAIL PROTECTED]


Re: wget ipv6 patch

2003-10-08 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

>> I still don't understand the choice to use sockaddr and
>> sockaddr_storage in a application code.
>> They result in needless casts and (to me) uncomprehensible code.
>
> well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
> code ;-)

Not when the only thing you need is storing the result of a DNS
lookup.

I've seen the RFC, but I don't agree with it in the case of Wget.  In
fact, even the RFC states that the data structure is merely a help for
writing portable code across "multiple address families and
platforms".  Wget doesn't aim for AF independence, and the
alternatives are at least as good for platform independence.

>> For example, this cast: (unsigned char *)(&addr->addr_v4.s_addr)
>> would not be necessary if the address were defined as unsigned
>> char[4].
>
> in_addr is the correct structure to store ipv4 addresses. using
> in_addr instead of unsigned char[4] makes much easier to copy or
> compare ipv4 addresses. moreover, you don't have to care about the
> integer size in 64-bits architectures.

An IPv4 address is nothing more than a 32-bit quantity.  I don't see
anything "incorrect" about using unsigned char[4] for that, and that
works perfectly fine on 64-bit architectures.

Besides, you seem to be willing to cache the string representation of
an IP address.  Why is it acceptable to work with a char *, but
unacceptable to work with unsigned char[4]?  I simply don't see that
in_addr is helping anything in host.c's code base.

>> I don't understand the new PASSIVE flag to lookup_host.
>
> well, that's a problem. to get a socket address suitable for
> bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.

Why?  The current code seems to get by without it.

There must be a way to get at the socket address without calling
getaddrinfo.

> are there __REALLY__ systems which do not support inet_aton? their
> ISVs should be ashamed of themselves...

Those systems are very old, possibly predating the very invention of
inet_aton.

>> If it would make your life easier to add TYPE in !ENABLE_IPV6 case,
>> so you can write it more compactly, by all means do it.  By "more
>> compactly" I mean something code like this:
>>
[...]
> that's a question i was going to ask you. i supposed you were
> against adding the type member to ip_address in the IPv4-only case,

Maintainability is more important than saving a few bytes per cached
IP address, especially since I don't expect the number of cache
entries to ever be large enough to make a difference.  (If someone
downloads from so many addresses that the hash table sizes become a
problem, the TYPE member will be the least of his problems.)

> P.S. please notice that by caching the string representation of IP
>  addresses instead of their network representation, the code
>  could become much more elegant and simple.

You said that before, but I don't quite understand why that's the
case.  It's certainly not the case for IPv4.



Re: wget ipv6 patch

2003-10-08 Thread Mauro Tortonesi
On Wed, 8 Oct 2003, Hrvoje Niksic wrote:

> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>
> > so, i am asking you: what do you think of these changes?
>
> Overall they look very good!  Judging from the patch, a large piece of
> the work part seems to be in an unexpected place: the FTP code.

yes, i have added support for LPRT and LPSV, and refactored existing code.
i still have to work on the code, but the main problem remains probably
the duplication of ftp_port and ftp_pasv, which have two different
versions (one for the IPv6-enabled case and the other for IPv4-only
case).


> Here are some remarks I got looking at the patch.
>
> It inadvertently undoes the latest fnmatch move.

sorry. i am working on an old wget cvs release. i will get up-to-date with
the latest cvs changes ASAP.


> I still don't understand the choice to use sockaddr and
> sockaddr_storage in a application code.
> They result in needless casts and (to me) uncomprehensible code.

well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
code ;-)

quoting RFC3493 section 3.10:


   One simple addition to the sockets API that can help application
   writers is the "struct sockaddr_storage".  This data structure can
   simplify writing code that is portable across multiple address
   families and platforms.  This data structure is designed with the
   following goals.

   - Large enough to accommodate all supported protocol-specific address
  structures.

   - Aligned at an appropriate boundary so that pointers to it can be
  cast as pointers to protocol specific address structures and used
  to access the fields of those structures without alignment
  problems.

   The sockaddr_storage structure contains field ss_family which is of
   type sa_family_t.  When a sockaddr_storage structure is cast to a
   sockaddr structure, the ss_family field of the sockaddr_storage
   structure maps onto the sa_family field of the sockaddr structure.
   When a sockaddr_storage structure is cast as a protocol specific
   address structure, the ss_family field maps onto a field of that
   structure that is of type sa_family_t and that identifies the
   protocol's address family.


using a union like:

struct wget_sockaddr {
struct sockaddr;
struct sockaddr_in;
struct sockaddr_in6;
};

is not an elegant solution, and is probably not safe because of compiler
alignments. see the chapter about struct sockaddr_storage in:

http://www.kame.net/newsletter/19980604


> For example, this cast: (unsigned char *)(&addr->addr_v4.s_addr) would
> not be necessary if the address were defined as unsigned char[4].

in_addr is the correct structure to store ipv4 addresses. using in_addr
instead of unsigned char[4] makes much easier to copy or compare ipv4
addresses. moreover, you don't have to care about the integer size in
64-bits architectures.


> I don't understand the new PASSIVE flag to lookup_host.

well, that's a problem. to get a socket address suitable for bind(2), you
must call getaddrinfo with the AI_PASSIVE flag set. for instance, if you
call:

getaddrinfo(NULL, "ftp", &hints, &res)

with the AI_PASSIVE flag, you get the :: port 21 and 0.0.0.0 port 21
socket addresses, while calling getaddrinfo without the AI_PASSIVE flag
returns the ::1 port 21 and 127.0.0.1 port 21 addresses.

the passive flag for lookup_host is a very unelegant hack, but i haven't
found a way to get rid of it, yet. any suggestion?


> In lookup_host, the comment says that you don't need to call
> getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout.
> An oversight?
>
> You removed this code:
>
> -  /* ADDR is defined to be in network byte order, which is what
> -  this returns, so we can just copy it to STORE_IP.  However,
> -  on big endian 64-bit architectures the value will be stored
> -  in the *last*, not first four bytes.  OFFSET makes sure that
> -  we copy the correct four bytes.  */
> -  int offset = 0;
> -#ifdef WORDS_BIGENDIAN
> -  offset = sizeof (unsigned long) - sizeof (ip4_address);
> -#endif
>
> But the reason the code is there is that inet_aton is not present on
> all architectures, whereas inet_addr is.  So I used only inet_addr in
> the IPv4 case, and inet_addr stupidly returned `long', which requires
> some contortions to copy into a uchar[4] on 64-bit machines.  (I see
> that inet_addr returns `in_addr_t' these days.)
>
> If you intend to use inet_aton without checking, there should be a
> fallback implementation in cmpt.c.

are there __REALLY__ systems which do not support inet_aton? their ISVs
should be ashamed of themselves...

however, yours seemed to me an ugly hack, so i have temporarily removed
it. as you say, it would be probably better to provide a fallback
implementation of inet_aton in cmpt.c.


> I note that you elided TYPE from ip_address if ENABLE_IPV6 is not
> defined.  That (I think) results in code duplication in some places,
> because the code effectively h

Re: wget ipv6 patch

2003-10-08 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> so, i am asking you: what do you think of these changes?

Overall they look very good!  Judging from the patch, a large piece of
the work part seems to be in an unexpected place: the FTP code.

Here are some remarks I got looking at the patch.

It inadvertently undoes the latest fnmatch move.

I still don't understand the choice to use sockaddr and
sockaddr_storage in a application code.  They result in needless casts
and (to me) uncomprehensible code.  For example, this cast:
(unsigned char *)(&addr->addr_v4.s_addr) would not be necessary if the
address were defined as unsigned char[4].

I don't understand the new PASSIVE flag to lookup_host.

In lookup_host, the comment says that you don't need to call
getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout.
An oversight?

You removed this code:

-  /* ADDR is defined to be in network byte order, which is what
-this returns, so we can just copy it to STORE_IP.  However,
-on big endian 64-bit architectures the value will be stored
-in the *last*, not first four bytes.  OFFSET makes sure that
-we copy the correct four bytes.  */
-  int offset = 0;
-#ifdef WORDS_BIGENDIAN
-  offset = sizeof (unsigned long) - sizeof (ip4_address);
-#endif

But the reason the code is there is that inet_aton is not present on
all architectures, whereas inet_addr is.  So I used only inet_addr in
the IPv4 case, and inet_addr stupidly returned `long', which requires
some contortions to copy into a uchar[4] on 64-bit machines.  (I see
that inet_addr returns `in_addr_t' these days.)

If you intend to use inet_aton without checking, there should be a
fallback implementation in cmpt.c.

I note that you elided TYPE from ip_address if ENABLE_IPV6 is not
defined.  That (I think) results in code duplication in some places,
because the code effectively has to handle the IPv4 case twice:

#ifdef ENABLE_IPV6
switch (addr->type)
  {
case IPv6:
... IPv6 handling ...
break;
case IPv4:
... IPv4 handling ...
break;
  }
#else
  ... IPv4 handling because TYPE is not present without ENABLE_IPV6 ...
#endif

If it would make your life easier to add TYPE in !ENABLE_IPV6 case, so
you can write it more compactly, by all means do it.  By "more
compactly" I mean something code like this:

switch (addr->type)
  {
#ifdef ENABLE_IPV6
case IPv6:
... IPv6 handling ...
break;
#endif
case IPv4:
... IPv4 handling ...
break;
  }



wget ipv6 patch

2003-10-08 Thread Mauro Tortonesi

here is my first patch to improve ipv6 support of wget. please, notice
that the code compiles, but is still buggy and will probably not work.

i am sending this preliminary patch only to gather feedback from wget
developers and to coordinate with other developers who are working on
ipv6 support for wget.

so, i am asking you: what do you think of these changes?

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Grouphttp://www.ferrara.linux.it

wget-ipv6.diff.bz2
Description: Binary data