Re: wget ipv6 patch

2003-10-10 Thread Mauro Tortonesi
On Wed, 8 Oct 2003, Hrvoje Niksic wrote:

 Mauro Tortonesi [EMAIL PROTECTED] writes:

  I still don't understand the choice to use sockaddr and
  sockaddr_storage in a application code.
  They result in needless casts and (to me) uncomprehensible code.
 
  well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
  code ;-)

 Not when the only thing you need is storing the result of a DNS
 lookup.

i agree, in fact for that case we use the ip_address struct.


 I've seen the RFC, but I don't agree with it in the case of Wget.  In
 fact, even the RFC states that the data structure is merely a help for
 writing portable code across multiple address families and
 platforms.  Wget doesn't aim for AF independence, and the
 alternatives are at least as good for platform independence.

i think we are talking about two different things. i am talking about
storing generic socket addresses (IPv4 or IPv6) and i'm saying that for
this task the ideal structure is sockaddr_storage. notice that my code
uses sockaddr_storage (typedef'd as wget_sockaddr) only when dealing with
socket addresses, not for ip address caching. and i am not trying to write
af-indipendent code, anyway.


  For example, this cast: (unsigned char *)(addr-addr_v4.s_addr)
  would not be necessary if the address were defined as unsigned
  char[4].
 
  in_addr is the correct structure to store ipv4 addresses. using
  in_addr instead of unsigned char[4] makes much easier to copy or
  compare ipv4 addresses. moreover, you don't have to care about the
  integer size in 64-bits architectures.

 An IPv4 address is nothing more than a 32-bit quantity.  I don't see
 anything incorrect about using unsigned char[4] for that, and that
 works perfectly fine on 64-bit architectures.

ok, but in this way you have to call memcmp each time you want to compare
two ip addresses and memcpy each time you want to copy an ip address.
i prefer the in_addr approach (and i don't understand why we shouldn't
use structures like in_addr and in_addr6 which have been created just to
do what we want: storing ip addresses) as i think it is a more elegant
solution, but if you still wish to use unsigned char[4] and unsigned
char[16] i cannot force you to change your mind.

however, notice that using unsigned char[4] and unsigned char[16] is a
less portable solution and is potentially prone to problems with the
alignement of the sockaddr_in and sockaddr_in6 structs.


 Besides, you seem to be willing to cache the string representation of
 an IP address.  Why is it acceptable to work with a char *, but
 unacceptable to work with unsigned char[4]?  I simply don't see that
 in_addr is helping anything in host.c's code base.

i would prefer to cache string representation of ip addresses because
the ipv6 code would be much simpler and more elegant.


  I don't understand the new PASSIVE flag to lookup_host.
 
  well, that's a problem. to get a socket address suitable for
  bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.

 Why?  The current code seems to get by without it.

the problem is when you call lookup_host to get a struct to pass to
bind(2). if you use --bind-address=localhost and you don't set the
AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which is
incorrect.


 There must be a way to get at the socket address without calling
 getaddrinfo.

not if you want to to use --bind-address=ipv6only.domain.com.


  If it would make your life easier to add TYPE in !ENABLE_IPV6 case,
  so you can write it more compactly, by all means do it.  By more
  compactly I mean something code like this:
 
 [...]
  that's a question i was going to ask you. i supposed you were
  against adding the type member to ip_address in the IPv4-only case,

 Maintainability is more important than saving a few bytes per cached
 IP address, especially since I don't expect the number of cache
 entries to ever be large enough to make a difference.  (If someone
 downloads from so many addresses that the hash table sizes become a
 problem, the TYPE member will be the least of his problems.)

i agree.


  P.S. please notice that by caching the string representation of IP
   addresses instead of their network representation, the code
   could become much more elegant and simple.

 You said that before, but I don't quite understand why that's the
 case.  It's certainly not the case for IPv4.

sure, but that would be of great help, especially for IPv6. the code (and
in particular the translation from ip addresses to socket addresses and
viceversa) would be much simpler and elegant.


-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Grouphttp://www.ferrara.linux.it




Re: wget ipv6 patch

2003-10-10 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 and i'm saying that for this task the ideal structure is
 sockaddr_storage. notice that my code uses sockaddr_storage
 (typedef'd as wget_sockaddr) only when dealing with socket
 addresses, not for ip address caching.

Now I see.  Thanks for clearing it up.

 An IPv4 address is nothing more than a 32-bit quantity.  I don't
 see anything incorrect about using unsigned char[4] for that, and
 that works perfectly fine on 64-bit architectures.

 ok, but in this way you have to call memcmp each time you want to compare
 two ip addresses and memcpy each time you want to copy an ip
 address.

Well, you can copy addresses with the assignment operator as well, as
long as they're in a `struct', as they are in the current code.  You
do need `memcmp' to compare them, but that's fine with me.

 i prefer the in_addr approach (and i don't understand why we
 shouldn't use structures like in_addr and in_addr6 which have been
 created just to do what we want: storing ip addresses)

Because they're complexly defined and hard to read if all you want is
to store 4 and 16 bytes of binary data, respectively.

 however, notice that using unsigned char[4] and unsigned char[16] is
 a less portable solution and is potentially prone to problems with
 the alignement of the sockaddr_in and sockaddr_in6 structs.

Note that I only propose using unsigned char[N] for internal storing
of addresses, such as in Wget's own `struct ip_address'.  For talking
to system API's, we can and should copy the address to the appropriate
sockaddr_* structure.  That's how the current code works, and it's
quite portable.

 Besides, you seem to be willing to cache the string representation
 of an IP address.  Why is it acceptable to work with a char *, but
 unacceptable to work with unsigned char[4]?  I simply don't see
 that in_addr is helping anything in host.c's code base.

 i would prefer to cache string representation of ip addresses
 because the ipv6 code would be much simpler and more elegant.

I agree.  My point was merely to point out that even you yourself
believe that struct in_addr* is not the only legitimate way to store
an IP address.

  I don't understand the new PASSIVE flag to lookup_host.
 
  well, that's a problem. to get a socket address suitable for
  bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.

 Why?  The current code seems to get by without it.

 the problem is when you call lookup_host to get a struct to pass to
 bind(2). if you use --bind-address=localhost and you don't set the
 AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which
 is incorrect.

 There must be a way to get at the socket address without calling
 getaddrinfo.

 not if you want to to use --bind-address=ipv6only.domain.com.

I see.  I guess we'll have to live with it, one way or the other.
Instead of accumulating boolean arguments, lookup_host should probably
accept a FLAGS argument, so you can call it with, e.g.:

lst = lookup_host (addr, LH_PASSIVE | LH_SILENT);
...


Re: wget ipv6 patch

2003-10-10 Thread Mauro Tortonesi
On Fri, 10 Oct 2003, Hrvoje Niksic wrote:

 Mauro Tortonesi [EMAIL PROTECTED] writes:

  An IPv4 address is nothing more than a 32-bit quantity.  I don't
  see anything incorrect about using unsigned char[4] for that, and
  that works perfectly fine on 64-bit architectures.
 
  ok, but in this way you have to call memcmp each time you want to compare
  two ip addresses and memcpy each time you want to copy an ip
  address.

 Well, you can copy addresses with the assignment operator as well, as
 long as they're in a `struct', as they are in the current code.  You
 do need `memcmp' to compare them, but that's fine with me.

  i prefer the in_addr approach (and i don't understand why we
  shouldn't use structures like in_addr and in_addr6 which have been
  created just to do what we want: storing ip addresses)

 Because they're complexly defined and hard to read if all you want is
 to store 4 and 16 bytes of binary data, respectively.

i guess that depends from the development style you are used to.

  Besides, you seem to be willing to cache the string representation
  of an IP address.  Why is it acceptable to work with a char *, but
  unacceptable to work with unsigned char[4]?  I simply don't see
  that in_addr is helping anything in host.c's code base.
 
  i would prefer to cache string representation of ip addresses
  because the ipv6 code would be much simpler and more elegant.

 I agree.  My point was merely to point out that even you yourself
 believe that struct in_addr* is not the only legitimate way to store
 an IP address.

   I don't understand the new PASSIVE flag to lookup_host.
  
   well, that's a problem. to get a socket address suitable for
   bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.
 
  Why?  The current code seems to get by without it.
 
  the problem is when you call lookup_host to get a struct to pass to
  bind(2). if you use --bind-address=localhost and you don't set the
  AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which
  is incorrect.
 
  There must be a way to get at the socket address without calling
  getaddrinfo.
 
  not if you want to to use --bind-address=ipv6only.domain.com.

 I see.  I guess we'll have to live with it, one way or the other.
 Instead of accumulating boolean arguments, lookup_host should probably
 accept a FLAGS argument, so you can call it with, e.g.:

 lst = lookup_host (addr, LH_PASSIVE | LH_SILENT);
 ...


-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 on Linuxhttp://www.deepspace6.net
Ferrara Linux Users Group   http://www.ferrara.linux.it


wget ipv6 patch

2003-10-08 Thread Mauro Tortonesi

here is my first patch to improve ipv6 support of wget. please, notice
that the code compiles, but is still buggy and will probably not work.

i am sending this preliminary patch only to gather feedback from wget
developers and to coordinate with other developers who are working on
ipv6 support for wget.

so, i am asking you: what do you think of these changes?

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Grouphttp://www.ferrara.linux.it

wget-ipv6.diff.bz2
Description: Binary data


Re: wget ipv6 patch

2003-10-08 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 so, i am asking you: what do you think of these changes?

Overall they look very good!  Judging from the patch, a large piece of
the work part seems to be in an unexpected place: the FTP code.

Here are some remarks I got looking at the patch.

It inadvertently undoes the latest fnmatch move.

I still don't understand the choice to use sockaddr and
sockaddr_storage in a application code.  They result in needless casts
and (to me) uncomprehensible code.  For example, this cast:
(unsigned char *)(addr-addr_v4.s_addr) would not be necessary if the
address were defined as unsigned char[4].

I don't understand the new PASSIVE flag to lookup_host.

In lookup_host, the comment says that you don't need to call
getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout.
An oversight?

You removed this code:

-  /* ADDR is defined to be in network byte order, which is what
-this returns, so we can just copy it to STORE_IP.  However,
-on big endian 64-bit architectures the value will be stored
-in the *last*, not first four bytes.  OFFSET makes sure that
-we copy the correct four bytes.  */
-  int offset = 0;
-#ifdef WORDS_BIGENDIAN
-  offset = sizeof (unsigned long) - sizeof (ip4_address);
-#endif

But the reason the code is there is that inet_aton is not present on
all architectures, whereas inet_addr is.  So I used only inet_addr in
the IPv4 case, and inet_addr stupidly returned `long', which requires
some contortions to copy into a uchar[4] on 64-bit machines.  (I see
that inet_addr returns `in_addr_t' these days.)

If you intend to use inet_aton without checking, there should be a
fallback implementation in cmpt.c.

I note that you elided TYPE from ip_address if ENABLE_IPV6 is not
defined.  That (I think) results in code duplication in some places,
because the code effectively has to handle the IPv4 case twice:

#ifdef ENABLE_IPV6
switch (addr-type)
  {
case IPv6:
... IPv6 handling ...
break;
case IPv4:
... IPv4 handling ...
break;
  }
#else
  ... IPv4 handling because TYPE is not present without ENABLE_IPV6 ...
#endif

If it would make your life easier to add TYPE in !ENABLE_IPV6 case, so
you can write it more compactly, by all means do it.  By more
compactly I mean something code like this:

switch (addr-type)
  {
#ifdef ENABLE_IPV6
case IPv6:
... IPv6 handling ...
break;
#endif
case IPv4:
... IPv4 handling ...
break;
  }



Re: wget ipv6 patch

2003-10-08 Thread Mauro Tortonesi
On Wed, 8 Oct 2003, Hrvoje Niksic wrote:

 Mauro Tortonesi [EMAIL PROTECTED] writes:

  so, i am asking you: what do you think of these changes?

 Overall they look very good!  Judging from the patch, a large piece of
 the work part seems to be in an unexpected place: the FTP code.

yes, i have added support for LPRT and LPSV, and refactored existing code.
i still have to work on the code, but the main problem remains probably
the duplication of ftp_port and ftp_pasv, which have two different
versions (one for the IPv6-enabled case and the other for IPv4-only
case).


 Here are some remarks I got looking at the patch.

 It inadvertently undoes the latest fnmatch move.

sorry. i am working on an old wget cvs release. i will get up-to-date with
the latest cvs changes ASAP.


 I still don't understand the choice to use sockaddr and
 sockaddr_storage in a application code.
 They result in needless casts and (to me) uncomprehensible code.

well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
code ;-)

quoting RFC3493 section 3.10:


   One simple addition to the sockets API that can help application
   writers is the struct sockaddr_storage.  This data structure can
   simplify writing code that is portable across multiple address
   families and platforms.  This data structure is designed with the
   following goals.

   - Large enough to accommodate all supported protocol-specific address
  structures.

   - Aligned at an appropriate boundary so that pointers to it can be
  cast as pointers to protocol specific address structures and used
  to access the fields of those structures without alignment
  problems.

   The sockaddr_storage structure contains field ss_family which is of
   type sa_family_t.  When a sockaddr_storage structure is cast to a
   sockaddr structure, the ss_family field of the sockaddr_storage
   structure maps onto the sa_family field of the sockaddr structure.
   When a sockaddr_storage structure is cast as a protocol specific
   address structure, the ss_family field maps onto a field of that
   structure that is of type sa_family_t and that identifies the
   protocol's address family.


using a union like:

struct wget_sockaddr {
struct sockaddr;
struct sockaddr_in;
struct sockaddr_in6;
};

is not an elegant solution, and is probably not safe because of compiler
alignments. see the chapter about struct sockaddr_storage in:

http://www.kame.net/newsletter/19980604


 For example, this cast: (unsigned char *)(addr-addr_v4.s_addr) would
 not be necessary if the address were defined as unsigned char[4].

in_addr is the correct structure to store ipv4 addresses. using in_addr
instead of unsigned char[4] makes much easier to copy or compare ipv4
addresses. moreover, you don't have to care about the integer size in
64-bits architectures.


 I don't understand the new PASSIVE flag to lookup_host.

well, that's a problem. to get a socket address suitable for bind(2), you
must call getaddrinfo with the AI_PASSIVE flag set. for instance, if you
call:

getaddrinfo(NULL, ftp, hints, res)

with the AI_PASSIVE flag, you get the :: port 21 and 0.0.0.0 port 21
socket addresses, while calling getaddrinfo without the AI_PASSIVE flag
returns the ::1 port 21 and 127.0.0.1 port 21 addresses.

the passive flag for lookup_host is a very unelegant hack, but i haven't
found a way to get rid of it, yet. any suggestion?


 In lookup_host, the comment says that you don't need to call
 getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout.
 An oversight?

 You removed this code:

 -  /* ADDR is defined to be in network byte order, which is what
 -  this returns, so we can just copy it to STORE_IP.  However,
 -  on big endian 64-bit architectures the value will be stored
 -  in the *last*, not first four bytes.  OFFSET makes sure that
 -  we copy the correct four bytes.  */
 -  int offset = 0;
 -#ifdef WORDS_BIGENDIAN
 -  offset = sizeof (unsigned long) - sizeof (ip4_address);
 -#endif

 But the reason the code is there is that inet_aton is not present on
 all architectures, whereas inet_addr is.  So I used only inet_addr in
 the IPv4 case, and inet_addr stupidly returned `long', which requires
 some contortions to copy into a uchar[4] on 64-bit machines.  (I see
 that inet_addr returns `in_addr_t' these days.)

 If you intend to use inet_aton without checking, there should be a
 fallback implementation in cmpt.c.

are there __REALLY__ systems which do not support inet_aton? their ISVs
should be ashamed of themselves...

however, yours seemed to me an ugly hack, so i have temporarily removed
it. as you say, it would be probably better to provide a fallback
implementation of inet_aton in cmpt.c.


 I note that you elided TYPE from ip_address if ENABLE_IPV6 is not
 defined.  That (I think) results in code duplication in some places,
 because the code effectively has to handle the IPv4 case twice:

 #ifdef 

Re: wget ipv6 patch

2003-10-08 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 I still don't understand the choice to use sockaddr and
 sockaddr_storage in a application code.
 They result in needless casts and (to me) uncomprehensible code.

 well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
 code ;-)

Not when the only thing you need is storing the result of a DNS
lookup.

I've seen the RFC, but I don't agree with it in the case of Wget.  In
fact, even the RFC states that the data structure is merely a help for
writing portable code across multiple address families and
platforms.  Wget doesn't aim for AF independence, and the
alternatives are at least as good for platform independence.

 For example, this cast: (unsigned char *)(addr-addr_v4.s_addr)
 would not be necessary if the address were defined as unsigned
 char[4].

 in_addr is the correct structure to store ipv4 addresses. using
 in_addr instead of unsigned char[4] makes much easier to copy or
 compare ipv4 addresses. moreover, you don't have to care about the
 integer size in 64-bits architectures.

An IPv4 address is nothing more than a 32-bit quantity.  I don't see
anything incorrect about using unsigned char[4] for that, and that
works perfectly fine on 64-bit architectures.

Besides, you seem to be willing to cache the string representation of
an IP address.  Why is it acceptable to work with a char *, but
unacceptable to work with unsigned char[4]?  I simply don't see that
in_addr is helping anything in host.c's code base.

 I don't understand the new PASSIVE flag to lookup_host.

 well, that's a problem. to get a socket address suitable for
 bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.

Why?  The current code seems to get by without it.

There must be a way to get at the socket address without calling
getaddrinfo.

 are there __REALLY__ systems which do not support inet_aton? their
 ISVs should be ashamed of themselves...

Those systems are very old, possibly predating the very invention of
inet_aton.

 If it would make your life easier to add TYPE in !ENABLE_IPV6 case,
 so you can write it more compactly, by all means do it.  By more
 compactly I mean something code like this:

[...]
 that's a question i was going to ask you. i supposed you were
 against adding the type member to ip_address in the IPv4-only case,

Maintainability is more important than saving a few bytes per cached
IP address, especially since I don't expect the number of cache
entries to ever be large enough to make a difference.  (If someone
downloads from so many addresses that the hash table sizes become a
problem, the TYPE member will be the least of his problems.)

 P.S. please notice that by caching the string representation of IP
  addresses instead of their network representation, the code
  could become much more elegant and simple.

You said that before, but I don't quite understand why that's the
case.  It's certainly not the case for IPv4.



Re: wget ipv6 patch

2003-10-08 Thread Draen Kaar
Mauro Tortonesi wrote:

 are there __REALLY__ systems which do not support inet_aton? their ISVs
 should be ashamed of themselves...

Solaris, for example. IIRC inet_aton isn't in any document which claims
to be a standard.

 however, yours seemed to me an ugly hack, so i have temporarily removed
 it. as you say, it would be probably better to provide a fallback
 implementation of inet_aton in cmpt.c.

But standards define inet_pton, which can do what inet_aton does, so that
should be checked for before using the fallback implementation.

-- 
 .-.   .-.Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
 |
 |[EMAIL PROTECTED]