Re: wget ipv6 patch
On Fri, 10 Oct 2003, Hrvoje Niksic wrote: > Mauro Tortonesi <[EMAIL PROTECTED]> writes: > > >> An IPv4 address is nothing more than a 32-bit quantity. I don't > >> see anything "incorrect" about using unsigned char[4] for that, and > >> that works perfectly fine on 64-bit architectures. > > > > ok, but in this way you have to call memcmp each time you want to compare > > two ip addresses and memcpy each time you want to copy an ip > > address. > > Well, you can copy addresses with the assignment operator as well, as > long as they're in a `struct', as they are in the current code. You > do need `memcmp' to compare them, but that's fine with me. > > > i prefer the in_addr approach (and i don't understand why we > > shouldn't use structures like in_addr and in_addr6 which have been > > created just to do what we want: storing ip addresses) > > Because they're complexly defined and hard to read if all you want is > to store 4 and 16 bytes of binary data, respectively. i guess that depends from the development style you are used to. > >> Besides, you seem to be willing to cache the string representation > >> of an IP address. Why is it acceptable to work with a char *, but > >> unacceptable to work with unsigned char[4]? I simply don't see > >> that in_addr is helping anything in host.c's code base. > > > > i would prefer to cache string representation of ip addresses > > because the ipv6 code would be much simpler and more elegant. > > I agree. My point was merely to point out that even you yourself > believe that struct in_addr* is not the only legitimate way to store > an IP address. > > >> >> I don't understand the new PASSIVE flag to lookup_host. > >> > > >> > well, that's a problem. to get a socket address suitable for > >> > bind(2), you must call getaddrinfo with the AI_PASSIVE flag set. > >> > >> Why? The current code seems to get by without it. > > > > the problem is when you call lookup_host to get a struct to pass to > > bind(2). if you use --bind-address=localhost and you don't set the > > AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which > > is incorrect. > > > >> There must be a way to get at the socket address without calling > >> getaddrinfo. > > > > not if you want to to use --bind-address=ipv6only.domain.com. > > I see. I guess we'll have to live with it, one way or the other. > Instead of accumulating boolean arguments, lookup_host should probably > accept a FLAGS argument, so you can call it with, e.g.: > > lst = lookup_host (addr, LH_PASSIVE | LH_SILENT); > ... > -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi [EMAIL PROTECTED] [EMAIL PROTECTED] Deep Space 6 - IPv6 on Linuxhttp://www.deepspace6.net Ferrara Linux Users Group http://www.ferrara.linux.it
Re: wget ipv6 patch
Mauro Tortonesi <[EMAIL PROTECTED]> writes: > and i'm saying that for this task the ideal structure is > sockaddr_storage. notice that my code uses sockaddr_storage > (typedef'd as wget_sockaddr) only when dealing with socket > addresses, not for ip address caching. Now I see. Thanks for clearing it up. >> An IPv4 address is nothing more than a 32-bit quantity. I don't >> see anything "incorrect" about using unsigned char[4] for that, and >> that works perfectly fine on 64-bit architectures. > > ok, but in this way you have to call memcmp each time you want to compare > two ip addresses and memcpy each time you want to copy an ip > address. Well, you can copy addresses with the assignment operator as well, as long as they're in a `struct', as they are in the current code. You do need `memcmp' to compare them, but that's fine with me. > i prefer the in_addr approach (and i don't understand why we > shouldn't use structures like in_addr and in_addr6 which have been > created just to do what we want: storing ip addresses) Because they're complexly defined and hard to read if all you want is to store 4 and 16 bytes of binary data, respectively. > however, notice that using unsigned char[4] and unsigned char[16] is > a less portable solution and is potentially prone to problems with > the alignement of the sockaddr_in and sockaddr_in6 structs. Note that I only propose using unsigned char[N] for internal storing of addresses, such as in Wget's own `struct ip_address'. For talking to system API's, we can and should copy the address to the appropriate sockaddr_* structure. That's how the current code works, and it's quite portable. >> Besides, you seem to be willing to cache the string representation >> of an IP address. Why is it acceptable to work with a char *, but >> unacceptable to work with unsigned char[4]? I simply don't see >> that in_addr is helping anything in host.c's code base. > > i would prefer to cache string representation of ip addresses > because the ipv6 code would be much simpler and more elegant. I agree. My point was merely to point out that even you yourself believe that struct in_addr* is not the only legitimate way to store an IP address. >> >> I don't understand the new PASSIVE flag to lookup_host. >> > >> > well, that's a problem. to get a socket address suitable for >> > bind(2), you must call getaddrinfo with the AI_PASSIVE flag set. >> >> Why? The current code seems to get by without it. > > the problem is when you call lookup_host to get a struct to pass to > bind(2). if you use --bind-address=localhost and you don't set the > AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which > is incorrect. > >> There must be a way to get at the socket address without calling >> getaddrinfo. > > not if you want to to use --bind-address=ipv6only.domain.com. I see. I guess we'll have to live with it, one way or the other. Instead of accumulating boolean arguments, lookup_host should probably accept a FLAGS argument, so you can call it with, e.g.: lst = lookup_host (addr, LH_PASSIVE | LH_SILENT); ...
Re: wget ipv6 patch
On Wed, 8 Oct 2003, Hrvoje Niksic wrote: > Mauro Tortonesi <[EMAIL PROTECTED]> writes: > > >> I still don't understand the choice to use sockaddr and > >> sockaddr_storage in a application code. > >> They result in needless casts and (to me) uncomprehensible code. > > > > well, using sockaddr_storage is the right way (TM) to write IPv6 enabled > > code ;-) > > Not when the only thing you need is storing the result of a DNS > lookup. i agree, in fact for that case we use the ip_address struct. > I've seen the RFC, but I don't agree with it in the case of Wget. In > fact, even the RFC states that the data structure is merely a help for > writing portable code across "multiple address families and > platforms". Wget doesn't aim for AF independence, and the > alternatives are at least as good for platform independence. i think we are talking about two different things. i am talking about storing generic socket addresses (IPv4 or IPv6) and i'm saying that for this task the ideal structure is sockaddr_storage. notice that my code uses sockaddr_storage (typedef'd as wget_sockaddr) only when dealing with socket addresses, not for ip address caching. and i am not trying to write af-indipendent code, anyway. > >> For example, this cast: (unsigned char *)(&addr->addr_v4.s_addr) > >> would not be necessary if the address were defined as unsigned > >> char[4]. > > > > in_addr is the correct structure to store ipv4 addresses. using > > in_addr instead of unsigned char[4] makes much easier to copy or > > compare ipv4 addresses. moreover, you don't have to care about the > > integer size in 64-bits architectures. > > An IPv4 address is nothing more than a 32-bit quantity. I don't see > anything "incorrect" about using unsigned char[4] for that, and that > works perfectly fine on 64-bit architectures. ok, but in this way you have to call memcmp each time you want to compare two ip addresses and memcpy each time you want to copy an ip address. i prefer the in_addr approach (and i don't understand why we shouldn't use structures like in_addr and in_addr6 which have been created just to do what we want: storing ip addresses) as i think it is a more elegant solution, but if you still wish to use unsigned char[4] and unsigned char[16] i cannot force you to change your mind. however, notice that using unsigned char[4] and unsigned char[16] is a less portable solution and is potentially prone to problems with the alignement of the sockaddr_in and sockaddr_in6 structs. > Besides, you seem to be willing to cache the string representation of > an IP address. Why is it acceptable to work with a char *, but > unacceptable to work with unsigned char[4]? I simply don't see that > in_addr is helping anything in host.c's code base. i would prefer to cache string representation of ip addresses because the ipv6 code would be much simpler and more elegant. > >> I don't understand the new PASSIVE flag to lookup_host. > > > > well, that's a problem. to get a socket address suitable for > > bind(2), you must call getaddrinfo with the AI_PASSIVE flag set. > > Why? The current code seems to get by without it. the problem is when you call lookup_host to get a struct to pass to bind(2). if you use --bind-address=localhost and you don't set the AI_PASSIVE flag, getaddinfo will return the 127.0.0.1 address, which is incorrect. > There must be a way to get at the socket address without calling > getaddrinfo. not if you want to to use --bind-address=ipv6only.domain.com. > >> If it would make your life easier to add TYPE in !ENABLE_IPV6 case, > >> so you can write it more compactly, by all means do it. By "more > >> compactly" I mean something code like this: > >> > [...] > > that's a question i was going to ask you. i supposed you were > > against adding the type member to ip_address in the IPv4-only case, > > Maintainability is more important than saving a few bytes per cached > IP address, especially since I don't expect the number of cache > entries to ever be large enough to make a difference. (If someone > downloads from so many addresses that the hash table sizes become a > problem, the TYPE member will be the least of his problems.) i agree. > > P.S. please notice that by caching the string representation of IP > > addresses instead of their network representation, the code > > could become much more elegant and simple. > > You said that before, but I don't quite understand why that's the > case. It's certainly not the case for IPv4. sure, but that would be of great help, especially for IPv6. the code (and in particular the translation from ip addresses to socket addresses and viceversa) would be much simpler and elegant. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] Deep Space 6 - IPv6 with Linux http://www.deepspace6.net Ferrara Linux User Gro
Re: wget ipv6 patch
Mauro Tortonesi wrote: > are there __REALLY__ systems which do not support inet_aton? their ISVs > should be ashamed of themselves... Solaris, for example. IIRC inet_aton isn't in any document which claims to be a standard. > however, yours seemed to me an ugly hack, so i have temporarily removed > it. as you say, it would be probably better to provide a fallback > implementation of inet_aton in cmpt.c. But standards define inet_pton, which can do what inet_aton does, so that should be checked for before using the fallback implementation. -- .-. .-.Yes, I am an agent of Satan, but my duties are largely (_ \ / _) ceremonial. | |[EMAIL PROTECTED]
Re: wget ipv6 patch
Mauro Tortonesi <[EMAIL PROTECTED]> writes: >> I still don't understand the choice to use sockaddr and >> sockaddr_storage in a application code. >> They result in needless casts and (to me) uncomprehensible code. > > well, using sockaddr_storage is the right way (TM) to write IPv6 enabled > code ;-) Not when the only thing you need is storing the result of a DNS lookup. I've seen the RFC, but I don't agree with it in the case of Wget. In fact, even the RFC states that the data structure is merely a help for writing portable code across "multiple address families and platforms". Wget doesn't aim for AF independence, and the alternatives are at least as good for platform independence. >> For example, this cast: (unsigned char *)(&addr->addr_v4.s_addr) >> would not be necessary if the address were defined as unsigned >> char[4]. > > in_addr is the correct structure to store ipv4 addresses. using > in_addr instead of unsigned char[4] makes much easier to copy or > compare ipv4 addresses. moreover, you don't have to care about the > integer size in 64-bits architectures. An IPv4 address is nothing more than a 32-bit quantity. I don't see anything "incorrect" about using unsigned char[4] for that, and that works perfectly fine on 64-bit architectures. Besides, you seem to be willing to cache the string representation of an IP address. Why is it acceptable to work with a char *, but unacceptable to work with unsigned char[4]? I simply don't see that in_addr is helping anything in host.c's code base. >> I don't understand the new PASSIVE flag to lookup_host. > > well, that's a problem. to get a socket address suitable for > bind(2), you must call getaddrinfo with the AI_PASSIVE flag set. Why? The current code seems to get by without it. There must be a way to get at the socket address without calling getaddrinfo. > are there __REALLY__ systems which do not support inet_aton? their > ISVs should be ashamed of themselves... Those systems are very old, possibly predating the very invention of inet_aton. >> If it would make your life easier to add TYPE in !ENABLE_IPV6 case, >> so you can write it more compactly, by all means do it. By "more >> compactly" I mean something code like this: >> [...] > that's a question i was going to ask you. i supposed you were > against adding the type member to ip_address in the IPv4-only case, Maintainability is more important than saving a few bytes per cached IP address, especially since I don't expect the number of cache entries to ever be large enough to make a difference. (If someone downloads from so many addresses that the hash table sizes become a problem, the TYPE member will be the least of his problems.) > P.S. please notice that by caching the string representation of IP > addresses instead of their network representation, the code > could become much more elegant and simple. You said that before, but I don't quite understand why that's the case. It's certainly not the case for IPv4.
Re: wget ipv6 patch
On Wed, 8 Oct 2003, Hrvoje Niksic wrote: > Mauro Tortonesi <[EMAIL PROTECTED]> writes: > > > so, i am asking you: what do you think of these changes? > > Overall they look very good! Judging from the patch, a large piece of > the work part seems to be in an unexpected place: the FTP code. yes, i have added support for LPRT and LPSV, and refactored existing code. i still have to work on the code, but the main problem remains probably the duplication of ftp_port and ftp_pasv, which have two different versions (one for the IPv6-enabled case and the other for IPv4-only case). > Here are some remarks I got looking at the patch. > > It inadvertently undoes the latest fnmatch move. sorry. i am working on an old wget cvs release. i will get up-to-date with the latest cvs changes ASAP. > I still don't understand the choice to use sockaddr and > sockaddr_storage in a application code. > They result in needless casts and (to me) uncomprehensible code. well, using sockaddr_storage is the right way (TM) to write IPv6 enabled code ;-) quoting RFC3493 section 3.10: One simple addition to the sockets API that can help application writers is the "struct sockaddr_storage". This data structure can simplify writing code that is portable across multiple address families and platforms. This data structure is designed with the following goals. - Large enough to accommodate all supported protocol-specific address structures. - Aligned at an appropriate boundary so that pointers to it can be cast as pointers to protocol specific address structures and used to access the fields of those structures without alignment problems. The sockaddr_storage structure contains field ss_family which is of type sa_family_t. When a sockaddr_storage structure is cast to a sockaddr structure, the ss_family field of the sockaddr_storage structure maps onto the sa_family field of the sockaddr structure. When a sockaddr_storage structure is cast as a protocol specific address structure, the ss_family field maps onto a field of that structure that is of type sa_family_t and that identifies the protocol's address family. using a union like: struct wget_sockaddr { struct sockaddr; struct sockaddr_in; struct sockaddr_in6; }; is not an elegant solution, and is probably not safe because of compiler alignments. see the chapter about struct sockaddr_storage in: http://www.kame.net/newsletter/19980604 > For example, this cast: (unsigned char *)(&addr->addr_v4.s_addr) would > not be necessary if the address were defined as unsigned char[4]. in_addr is the correct structure to store ipv4 addresses. using in_addr instead of unsigned char[4] makes much easier to copy or compare ipv4 addresses. moreover, you don't have to care about the integer size in 64-bits architectures. > I don't understand the new PASSIVE flag to lookup_host. well, that's a problem. to get a socket address suitable for bind(2), you must call getaddrinfo with the AI_PASSIVE flag set. for instance, if you call: getaddrinfo(NULL, "ftp", &hints, &res) with the AI_PASSIVE flag, you get the :: port 21 and 0.0.0.0 port 21 socket addresses, while calling getaddrinfo without the AI_PASSIVE flag returns the ::1 port 21 and 127.0.0.1 port 21 addresses. the passive flag for lookup_host is a very unelegant hack, but i haven't found a way to get rid of it, yet. any suggestion? > In lookup_host, the comment says that you don't need to call > getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout. > An oversight? > > You removed this code: > > - /* ADDR is defined to be in network byte order, which is what > - this returns, so we can just copy it to STORE_IP. However, > - on big endian 64-bit architectures the value will be stored > - in the *last*, not first four bytes. OFFSET makes sure that > - we copy the correct four bytes. */ > - int offset = 0; > -#ifdef WORDS_BIGENDIAN > - offset = sizeof (unsigned long) - sizeof (ip4_address); > -#endif > > But the reason the code is there is that inet_aton is not present on > all architectures, whereas inet_addr is. So I used only inet_addr in > the IPv4 case, and inet_addr stupidly returned `long', which requires > some contortions to copy into a uchar[4] on 64-bit machines. (I see > that inet_addr returns `in_addr_t' these days.) > > If you intend to use inet_aton without checking, there should be a > fallback implementation in cmpt.c. are there __REALLY__ systems which do not support inet_aton? their ISVs should be ashamed of themselves... however, yours seemed to me an ugly hack, so i have temporarily removed it. as you say, it would be probably better to provide a fallback implementation of inet_aton in cmpt.c. > I note that you elided TYPE from ip_address if ENABLE_IPV6 is not > defined. That (I think) results in code duplication in some places, > because the code effectively h
Re: wget ipv6 patch
Mauro Tortonesi <[EMAIL PROTECTED]> writes: > so, i am asking you: what do you think of these changes? Overall they look very good! Judging from the patch, a large piece of the work part seems to be in an unexpected place: the FTP code. Here are some remarks I got looking at the patch. It inadvertently undoes the latest fnmatch move. I still don't understand the choice to use sockaddr and sockaddr_storage in a application code. They result in needless casts and (to me) uncomprehensible code. For example, this cast: (unsigned char *)(&addr->addr_v4.s_addr) would not be necessary if the address were defined as unsigned char[4]. I don't understand the new PASSIVE flag to lookup_host. In lookup_host, the comment says that you don't need to call getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout. An oversight? You removed this code: - /* ADDR is defined to be in network byte order, which is what -this returns, so we can just copy it to STORE_IP. However, -on big endian 64-bit architectures the value will be stored -in the *last*, not first four bytes. OFFSET makes sure that -we copy the correct four bytes. */ - int offset = 0; -#ifdef WORDS_BIGENDIAN - offset = sizeof (unsigned long) - sizeof (ip4_address); -#endif But the reason the code is there is that inet_aton is not present on all architectures, whereas inet_addr is. So I used only inet_addr in the IPv4 case, and inet_addr stupidly returned `long', which requires some contortions to copy into a uchar[4] on 64-bit machines. (I see that inet_addr returns `in_addr_t' these days.) If you intend to use inet_aton without checking, there should be a fallback implementation in cmpt.c. I note that you elided TYPE from ip_address if ENABLE_IPV6 is not defined. That (I think) results in code duplication in some places, because the code effectively has to handle the IPv4 case twice: #ifdef ENABLE_IPV6 switch (addr->type) { case IPv6: ... IPv6 handling ... break; case IPv4: ... IPv4 handling ... break; } #else ... IPv4 handling because TYPE is not present without ENABLE_IPV6 ... #endif If it would make your life easier to add TYPE in !ENABLE_IPV6 case, so you can write it more compactly, by all means do it. By "more compactly" I mean something code like this: switch (addr->type) { #ifdef ENABLE_IPV6 case IPv6: ... IPv6 handling ... break; #endif case IPv4: ... IPv4 handling ... break; }
wget ipv6 patch
here is my first patch to improve ipv6 support of wget. please, notice that the code compiles, but is still buggy and will probably not work. i am sending this preliminary patch only to gather feedback from wget developers and to coordinate with other developers who are working on ipv6 support for wget. so, i am asking you: what do you think of these changes? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] Deep Space 6 - IPv6 with Linux http://www.deepspace6.net Ferrara Linux User Grouphttp://www.ferrara.linux.it wget-ipv6.diff.bz2 Description: Binary data