wget ipv6 patch

2003-10-08 Thread Mauro Tortonesi

here is my first patch to improve ipv6 support in wget. please notice
that the code compiles, but is still buggy and will probably not work.

i am sending this preliminary patch only to gather feedback from wget
developers and to coordinate with other developers who are working on
ipv6 support for wget.

so, i am asking you: what do you think of these changes?

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Deep Space 6 - IPv6 with Linux  http://www.deepspace6.net
Ferrara Linux User Group        http://www.ferrara.linux.it

wget-ipv6.diff.bz2
Description: Binary data


Re: some wget patches against beta3

2003-10-08 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Martin v. Löwis) writes:

 Why do you think the scheme is narrow-minded?

Because 1.9-beta3 seems to be a problem.

 VERSION = ('[.0-9]+-?b[0-9]+'
'|[.0-9]+-?dev[0-9]+'
'|[.0-9]+-?pre[0-9]+'
'|[.0-9]+-?rel[0-9]+'
'|[.0-9]+[a-z]?'
'|[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')

But that's narrow.  Why support 1.9-b3, but not 1.9-beta3 or
1.9-alpha3, or 1.9-rc10?  Those and similar version schemes are in
wide use.
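To make the difference concrete, here is a sketch that re-expresses the quoted VERSION alternatives as a single POSIX extended regular expression and checks a few version strings against it. The ^...$ anchoring, the function name version_ok and the widened pattern are assumptions for illustration; the Robot's actual matching code is not shown in this thread.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The VERSION alternatives quoted above, joined into one POSIX ERE.
   Anchoring with ^...$ is an assumption about how the Robot applies it. */
static const char *robot_pattern =
  "^([.0-9]+-?b[0-9]+"
  "|[.0-9]+-?dev[0-9]+"
  "|[.0-9]+-?pre[0-9]+"
  "|[.0-9]+-?rel[0-9]+"
  "|[.0-9]+[a-z]?"
  "|[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])$";

/* Hypothetical widened pattern that also accepts alpha, beta and rc. */
static const char *wider_pattern =
  "^([.0-9]+-?(b|beta|alpha|rc|dev|pre|rel)[0-9]+"
  "|[.0-9]+[a-z]?"
  "|[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])$";

/* Return true if VERSION matches PATTERN as a whole. */
bool version_ok (const char *pattern, const char *version)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, version, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

With the quoted alternatives, "1.9-b3" is accepted while "1.9-beta3" and "1.9-rc10" are rejected; the widened pattern accepts all three.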

 That's really bad.  But what's even worse is that something or
 someone silently changed beta3 to b3 in the POT, and then failed
 to perform the same change for my translation, which caused it to
 get dropped without notice.

 Nothing should get dropped without a notice. [...]

I now understand that this could have been an exception due to the
outage.  But that's how it happened.  I sent the translation -- twice
-- and it got dropped.  Karl told me to resend the translation with a
1.9-b3 version (which I'd never heard of before), so I naturally
assumed that the submission had been dropped because of version.

 Now, since UMontreal has changed the translation@ alias, it might be
 that some messages were lost during the outage; this is unfortunate,
 but difficult to correct, as we cannot find out which messages might
 have been lost. Fortunately, most translators know to get a message back
 from the robot for all submissions, so if they don't get one, they
 resend.

Note that I did resend, but to no avail.  My first attempt contained a
MIME attachment, which I then found out the robot didn't understand.
My second attempt was from po-mode, which should have produced a valid
message, except for the version.



Re: wget ipv6 patch

2003-10-08 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 so, i am asking you: what do you think of these changes?

Overall they look very good!  Judging from the patch, a large part of
the work seems to be in an unexpected place: the FTP code.

Here are some remarks I got looking at the patch.

It inadvertently undoes the latest fnmatch move.

I still don't understand the choice to use sockaddr and
sockaddr_storage in application code.  They result in needless casts
and (to me) incomprehensible code.  For example, this cast:
(unsigned char *)&addr->addr_v4.s_addr would not be necessary if the
address were defined as unsigned char[4].
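A minimal sketch of the unsigned char[4] representation suggested here, assuming a POSIX inet_pton. The type ip4_bytes and both functions are hypothetical names, not wget's. Once the address is stored as an array of bytes, no cast is needed to reach the octets, and copying or comparing is a struct assignment or a memcmp.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical type: an IPv4 address as four octets in network order. */
typedef struct { unsigned char octets[4]; } ip4_bytes;

/* Parse dotted-quad TEXT into *OUT; return false if it is not valid. */
bool ip4_parse (const char *text, ip4_bytes *out)
{
  struct in_addr tmp;
  if (inet_pton (AF_INET, text, &tmp) != 1)
    return false;
  /* s_addr is a 32-bit in_addr_t already in network byte order, so
     copying its four bytes yields the octets in order on any host. */
  memcpy (out->octets, &tmp.s_addr, sizeof out->octets);
  return true;
}

/* Copying is plain struct assignment; comparison is a memcmp. */
bool ip4_equal (const ip4_bytes *a, const ip4_bytes *b)
{
  return memcmp (a->octets, b->octets, sizeof a->octets) == 0;
}
```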

I don't understand the new PASSIVE flag to lookup_host.

In lookup_host, the comment says that you don't need to call
getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout.
An oversight?

You removed this code:

-  /* ADDR is defined to be in network byte order, which is what
-     this returns, so we can just copy it to STORE_IP.  However,
-     on big endian 64-bit architectures the value will be stored
-     in the *last*, not first four bytes.  OFFSET makes sure that
-     we copy the correct four bytes.  */
-  int offset = 0;
-#ifdef WORDS_BIGENDIAN
-  offset = sizeof (unsigned long) - sizeof (ip4_address);
-#endif

But the reason the code is there is that inet_aton is not present on
all architectures, whereas inet_addr is.  So I used only inet_addr in
the IPv4 case, and inet_addr stupidly returned `long', which requires
some contortions to copy into a uchar[4] on 64-bit machines.  (I see
that inet_addr returns `in_addr_t' these days.)
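The following sketch illustrates why the removed OFFSET code existed. ip4_from_inet_addr assumes a modern inet_addr that returns in_addr_t; ip4_from_ulong mimics the old situation in which the value arrived in an unsigned long. Both names are hypothetical, and a runtime endianness probe stands in for the WORDS_BIGENDIAN configure macro so the sketch runs without configure.

```c
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Modern inet_addr returns in_addr_t (32 bits, network byte order),
   so the result can be copied into four bytes directly. */
bool ip4_from_inet_addr (const char *text, unsigned char out[4])
{
  in_addr_t a = inet_addr (text);
  if (a == INADDR_NONE)   /* error, but also the value of "255.255.255.255" */
    return false;
  memcpy (out, &a, 4);
  return true;
}

/* The case the removed code handled: the same 32-bit value held in a
   wider unsigned long.  Its significant bytes are the first four on
   little-endian hosts but the *last* four on big-endian ones. */
void ip4_from_ulong (unsigned long value, unsigned char out[4])
{
  const unsigned long probe = 1;
  bool big_endian = *(const unsigned char *) &probe == 0;
  size_t offset = big_endian ? sizeof (unsigned long) - 4 : 0;
  memcpy (out, (const unsigned char *) &value + offset, 4);
}
```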

If you intend to use inet_aton without checking, there should be a
fallback implementation in cmpt.c.

I note that you elided TYPE from ip_address if ENABLE_IPV6 is not
defined.  That (I think) results in code duplication in some places,
because the code effectively has to handle the IPv4 case twice:

#ifdef ENABLE_IPV6
  switch (addr->type)
    {
    case IPv6:
      ... IPv6 handling ...
      break;
    case IPv4:
      ... IPv4 handling ...
      break;
    }
#else
  ... IPv4 handling because TYPE is not present without ENABLE_IPV6 ...
#endif

If it would make your life easier to add TYPE in the !ENABLE_IPV6 case,
so that you can write it more compactly, by all means do it.  By more
compactly I mean code like this:

  switch (addr->type)
    {
#ifdef ENABLE_IPV6
    case IPv6:
      ... IPv6 handling ...
      break;
#endif
    case IPv4:
      ... IPv4 handling ...
      break;
    }
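A sketch of what keeping TYPE unconditionally might look like. The names ip_address_sketch and ip_to_string are hypothetical, and the IPv4/IPv6 enumerators are taken from the fragment above. Only the IPv6 union member and its case label depend on ENABLE_IPV6, so the IPv4 branch is written once.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: TYPE exists in every build. */
enum ip_type { IPv4, IPv6 };

typedef struct {
  enum ip_type type;
  union {
    unsigned char v4[4];        /* network byte order */
#ifdef ENABLE_IPV6
    unsigned char v6[16];
#endif
  } u;
} ip_address_sketch;

/* Format ADDR as text into BUF; returns BUF, or NULL on failure. */
const char *ip_to_string (const ip_address_sketch *addr, char *buf,
                          socklen_t len)
{
  switch (addr->type)
    {
#ifdef ENABLE_IPV6
    case IPv6:
      return inet_ntop (AF_INET6, addr->u.v6, buf, len);
#endif
    case IPv4:
      return inet_ntop (AF_INET, addr->u.v4, buf, len);
    default:
      return NULL;
    }
}
```

Without ENABLE_IPV6 the cost is one enum per cached address.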



Re: [PATCH] wget-1.8.2: Portability, plus EBCDIC patch

2003-10-08 Thread Martin Kraemer
On Tue, Oct 07, 2003 at 06:06:59PM +0200, Hrvoje Niksic wrote:
 Martin, thanks for the patch and the detailed report.  Note that it
 might have made more sense to apply the patch to the latest CVS
 version, which is somewhat different from 1.8.2.

What must I set CVSROOT to?

 I'm really not sure whether to add this patch.  On the one hand, it's
 nice to support as many architectures as possible.  But on the other
 hand, most systems are ASCII.  All the systems I've ever seen or
 worked on have been ASCII.

Right; that is exactly what makes it so hard for those who must
work on EBCDIC systems: nobody supports them, and most available
software is proprietary. So, getting a patch (even if only distributed
as-is, e.g., in contrib/ebcdic.patch) is a valuable help for those
who don't have it (yet).

  I am fairly certain that I would not be
 able to support EBCDIC in the long run and that, unless someone were
 to continually support EBCDIC, the existing support would bitrot away.
 
 Is anyone on the Wget list using an EBCDIC system?

How can they if they don't have the patch? It only works if the socket
talks ASCII on the network, and that is what the patch solves ;-)
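To illustrate what "the socket talks ASCII" means, here is a sketch that translates outgoing bytes from EBCDIC (code page 037 assumed) to ASCII before they are written to the network. It covers only letters, digits, space, '.' and '/', and it uses numeric ASCII codes rather than character literals, because on an EBCDIC host a literal such as 'A' is itself EBCDIC. The function names are hypothetical; this is not the code in Martin's patch.

```c
#include <stddef.h>

/* Translate one EBCDIC (CP037) byte to ASCII; unknown bytes become '?'. */
static unsigned char e2a (unsigned char c)
{
  if (c == 0x40) return 0x20;                            /* space */
  if (c == 0x4B) return 0x2E;                            /* '.'   */
  if (c == 0x61) return 0x2F;                            /* '/'   */
  if (c >= 0xF0 && c <= 0xF9) return 0x30 + (c - 0xF0);  /* 0-9   */
  if (c >= 0xC1 && c <= 0xC9) return 0x41 + (c - 0xC1);  /* A-I   */
  if (c >= 0xD1 && c <= 0xD9) return 0x4A + (c - 0xD1);  /* J-R   */
  if (c >= 0xE2 && c <= 0xE9) return 0x53 + (c - 0xE2);  /* S-Z   */
  if (c >= 0x81 && c <= 0x89) return 0x61 + (c - 0x81);  /* a-i   */
  if (c >= 0x91 && c <= 0x99) return 0x6A + (c - 0x91);  /* j-r   */
  if (c >= 0xA2 && c <= 0xA9) return 0x73 + (c - 0xA2);  /* s-z   */
  return 0x3F;                                           /* '?'   */
}

/* Translate N bytes from IN into OUT, ready to be sent on a socket. */
void ebcdic_to_ascii (const unsigned char *in, size_t n, unsigned char *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = e2a (in[i]);
}
```

A real port would use a complete 256-entry table for the host's code page.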

   Martin
-- 
[EMAIL PROTECTED] | Fujitsu Siemens
Fon: +49-89-636-46021, FAX: +49-89-636-47655 | 81730  Munich,  Germany


problem with 302 server response parsing

2003-10-08 Thread Sergey Vasilevsky
I use Wget 1.8.2.
When I try to retrieve a page with the '-nc' option and the server returns
a 302 with a new URL, wget does not check that URL against the '-nc' rules;
it downloads the file and overwrites the existing one.

I think wget does not apply the command-line option rules when it parses
the server response header.
Is this a bug?



Re: wget ipv6 patch

2003-10-08 Thread Mauro Tortonesi
On Wed, 8 Oct 2003, Hrvoje Niksic wrote:

 Mauro Tortonesi [EMAIL PROTECTED] writes:

  so, i am asking you: what do you think of these changes?

 Overall they look very good!  Judging from the patch, a large piece of
 the work part seems to be in an unexpected place: the FTP code.

yes, i have added support for LPRT and LPSV, and refactored existing code.
i still have to work on the code, but the main problem remains probably
the duplication of ftp_port and ftp_pasv, which have two different
versions (one for the IPv6-enabled case and the other for IPv4-only
case).
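For readers unfamiliar with LPRT: RFC 1639 encodes the address family, the address length, the address bytes, the port length and the port bytes as one comma-separated list. The sketch below builds that argument for IPv4 (family 4, 4 bytes) and IPv6 (family 6, 16 bytes). format_lprt_arg is a hypothetical helper, not the code in Mauro's patch.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Build the argument of an RFC 1639 LPRT command into BUF.  AF is 4 for
   IPv4 or 6 for IPv6; the port goes out as two bytes, high byte first.
   Returns the number of characters written, or -1 if BUF is too small. */
int format_lprt_arg (int af, const unsigned char *addr, unsigned port,
                     char *buf, size_t len)
{
  int hal = (af == 6) ? 16 : 4;   /* host address length in bytes */
  size_t used;
  int n = snprintf (buf, len, "%d,%d", af, hal);
  if (n < 0 || (size_t) n >= len)
    return -1;
  used = (size_t) n;
  for (int i = 0; i < hal; i++)
    {
      n = snprintf (buf + used, len - used, ",%d", addr[i]);
      if (n < 0 || (size_t) n >= len - used)
        return -1;
      used += (size_t) n;
    }
  n = snprintf (buf + used, len - used, ",2,%u,%u",
                (port >> 8) & 0xffu, port & 0xffu);
  if (n < 0 || (size_t) n >= len - used)
    return -1;
  return (int) (used + (size_t) n);
}
```

For 127.0.0.1 and port 21 this yields "4,4,127,0,0,1,2,0,21", which would be sent as "LPRT 4,4,127,0,0,1,2,0,21".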


 Here are some remarks I got looking at the patch.

 It inadvertently undoes the latest fnmatch move.

sorry. i am working on an old wget cvs release. i will get up-to-date with
the latest cvs changes ASAP.


 I still don't understand the choice to use sockaddr and
 sockaddr_storage in application code.
 They result in needless casts and (to me) incomprehensible code.

well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
code ;-)

quoting RFC3493 section 3.10:


   One simple addition to the sockets API that can help application
   writers is the struct sockaddr_storage.  This data structure can
   simplify writing code that is portable across multiple address
   families and platforms.  This data structure is designed with the
   following goals.

   - Large enough to accommodate all supported protocol-specific address
  structures.

   - Aligned at an appropriate boundary so that pointers to it can be
  cast as pointers to protocol specific address structures and used
  to access the fields of those structures without alignment
  problems.

   The sockaddr_storage structure contains field ss_family which is of
   type sa_family_t.  When a sockaddr_storage structure is cast to a
   sockaddr structure, the ss_family field of the sockaddr_storage
   structure maps onto the sa_family field of the sockaddr structure.
   When a sockaddr_storage structure is cast as a protocol specific
   address structure, the ss_family field maps onto a field of that
   structure that is of type sa_family_t and that identifies the
   protocol's address family.
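A short sketch of the idiom the RFC describes: an address of either family is kept in a struct sockaddr_storage, and ss_family decides which protocol-specific structure to cast to. sockaddr_to_string is a hypothetical name.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <string.h>

/* Format the address in SS as text; returns BUF, or NULL for an
   unsupported family. */
const char *sockaddr_to_string (const struct sockaddr_storage *ss,
                                char *buf, socklen_t len)
{
  switch (ss->ss_family)
    {
    case AF_INET:
      {
        const struct sockaddr_in *sin = (const struct sockaddr_in *) ss;
        return inet_ntop (AF_INET, &sin->sin_addr, buf, len);
      }
    case AF_INET6:
      {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *) ss;
        return inet_ntop (AF_INET6, &sin6->sin6_addr, buf, len);
      }
    default:
      return NULL;
    }
}
```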


using a union like:

union wget_sockaddr {
    struct sockaddr     sa;     /* member names are illustrative */
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
};

is not an elegant solution, and is probably not safe because of compiler
alignment issues. see the chapter about struct sockaddr_storage in:

http://www.kame.net/newsletter/19980604


 For example, this cast: (unsigned char *)&addr->addr_v4.s_addr would
 not be necessary if the address were defined as unsigned char[4].

in_addr is the correct structure to store ipv4 addresses. using in_addr
instead of unsigned char[4] makes it much easier to copy or compare ipv4
addresses. moreover, you don't have to care about the integer size on
64-bit architectures.


 I don't understand the new PASSIVE flag to lookup_host.

well, that's a problem. to get a socket address suitable for bind(2), you
must call getaddrinfo with the AI_PASSIVE flag set. for instance, if you
call:

getaddrinfo(NULL, "ftp", &hints, &res)

with the AI_PASSIVE flag, you get the :: port 21 and 0.0.0.0 port 21
socket addresses, while calling getaddrinfo without the AI_PASSIVE flag
returns the ::1 port 21 and 127.0.0.1 port 21 addresses.
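The AI_PASSIVE difference can be checked directly. This sketch asks getaddrinfo for a NULL host with and without the flag and reports the first address returned. It uses the numeric service "21" instead of "ftp" so that it does not depend on /etc/services; first_address is a hypothetical helper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo in strict ISO C modes */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Resolve a NULL host for FAMILY, with AI_PASSIVE if PASSIVE is nonzero,
   and write the first returned address into BUF.  Returns 0 or -1. */
int first_address (int family, int passive, char *buf, socklen_t len)
{
  struct addrinfo hints, *res;
  memset (&hints, 0, sizeof hints);
  hints.ai_family = family;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = passive ? AI_PASSIVE : 0;

  if (getaddrinfo (NULL, "21", &hints, &res) != 0)
    return -1;

  const void *addr;
  if (res->ai_family == AF_INET)
    addr = &((const struct sockaddr_in *) res->ai_addr)->sin_addr;
  else
    addr = &((const struct sockaddr_in6 *) res->ai_addr)->sin6_addr;

  int ok = inet_ntop (res->ai_family, addr, buf, len) != NULL;
  freeaddrinfo (res);
  return ok ? 0 : -1;
}
```

POSIX specifies this behaviour for a null host name: with AI_PASSIVE the result is the wildcard address (0.0.0.0 or ::), without it the loopback address (127.0.0.1 or ::1).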

the passive flag for lookup_host is a very inelegant hack, but i haven't
found a way to get rid of it yet. any suggestions?


 In lookup_host, the comment says that you don't need to call
 getaddrinfo_with_timeout, but then you call getaddrinfo_with_timeout.
 An oversight?

 You removed this code:

 -  /* ADDR is defined to be in network byte order, which is what
 -  this returns, so we can just copy it to STORE_IP.  However,
 -  on big endian 64-bit architectures the value will be stored
 -  in the *last*, not first four bytes.  OFFSET makes sure that
 -  we copy the correct four bytes.  */
 -  int offset = 0;
 -#ifdef WORDS_BIGENDIAN
 -  offset = sizeof (unsigned long) - sizeof (ip4_address);
 -#endif

 But the reason the code is there is that inet_aton is not present on
 all architectures, whereas inet_addr is.  So I used only inet_addr in
 the IPv4 case, and inet_addr stupidly returned `long', which requires
 some contortions to copy into a uchar[4] on 64-bit machines.  (I see
 that inet_addr returns `in_addr_t' these days.)

 If you intend to use inet_aton without checking, there should be a
 fallback implementation in cmpt.c.

are there __REALLY__ systems which do not support inet_aton? their ISVs
should be ashamed of themselves...

however, yours seemed to me an ugly hack, so i have temporarily removed
it. as you say, it would probably be better to provide a fallback
implementation of inet_aton in cmpt.c.


 I note that you elided TYPE from ip_address if ENABLE_IPV6 is not
 defined.  That (I think) results in code duplication in some places,
 because the code effectively has to handle the IPv4 case twice:

 #ifdef 

Re: wget ipv6 patch

2003-10-08 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 I still don't understand the choice to use sockaddr and
 sockaddr_storage in application code.
 They result in needless casts and (to me) incomprehensible code.

 well, using sockaddr_storage is the right way (TM) to write IPv6 enabled
 code ;-)

Not when the only thing you need is storing the result of a DNS
lookup.

I've seen the RFC, but I don't agree with it in the case of Wget.  In
fact, even the RFC states that the data structure is merely a help for
writing portable code across multiple address families and
platforms.  Wget doesn't aim for AF independence, and the
alternatives are at least as good for platform independence.

 For example, this cast: (unsigned char *)&addr->addr_v4.s_addr
 would not be necessary if the address were defined as unsigned
 char[4].

 in_addr is the correct structure to store ipv4 addresses. using
 in_addr instead of unsigned char[4] makes it much easier to copy or
 compare ipv4 addresses. moreover, you don't have to care about the
 integer size on 64-bit architectures.

An IPv4 address is nothing more than a 32-bit quantity.  I don't see
anything incorrect about using unsigned char[4] for that, and that
works perfectly fine on 64-bit architectures.

Besides, you seem to be willing to cache the string representation of
an IP address.  Why is it acceptable to work with a char *, but
unacceptable to work with unsigned char[4]?  I simply don't see that
in_addr is helping anything in host.c's code base.

 I don't understand the new PASSIVE flag to lookup_host.

 well, that's a problem. to get a socket address suitable for
 bind(2), you must call getaddrinfo with the AI_PASSIVE flag set.

Why?  The current code seems to get by without it.

There must be a way to get at the socket address without calling
getaddrinfo.
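One way to do what is suggested here is to fill in the wildcard address by hand: for bind(2), getaddrinfo(NULL, ..., AI_PASSIVE) returns INADDR_ANY or in6addr_any, and both are available as constants. The helper wildcard_address is hypothetical and assumes the caller already knows which family it needs (presumably the family of the existing control connection).

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Fill *SS with the wildcard address for FAMILY and PORT.  Returns the
   length to pass to bind(2), or 0 for an unsupported family. */
socklen_t wildcard_address (int family, unsigned short port,
                            struct sockaddr_storage *ss)
{
  memset (ss, 0, sizeof *ss);
  if (family == AF_INET)
    {
      struct sockaddr_in *sin = (struct sockaddr_in *) ss;
      sin->sin_family = AF_INET;
      sin->sin_addr.s_addr = htonl (INADDR_ANY);
      sin->sin_port = htons (port);
      return sizeof *sin;
    }
  if (family == AF_INET6)
    {
      struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) ss;
      sin6->sin6_family = AF_INET6;
      sin6->sin6_addr = in6addr_any;
      sin6->sin6_port = htons (port);
      return sizeof *sin6;
    }
  return 0;
}
```

Passing port 0 would let the kernel choose a free port, which is the usual arrangement before sending PORT or LPRT.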

 are there __REALLY__ systems which do not support inet_aton? their
 ISVs should be ashamed of themselves...

Those systems are very old, possibly predating the very invention of
inet_aton.

 If it would make your life easier to add TYPE in the !ENABLE_IPV6 case,
 so that you can write it more compactly, by all means do it.  By more
 compactly I mean code like this:

[...]
 that's a question i was going to ask you. i supposed you were
 against adding the type member to ip_address in the IPv4-only case,

Maintainability is more important than saving a few bytes per cached
IP address, especially since I don't expect the number of cache
entries to ever be large enough to make a difference.  (If someone
downloads from so many addresses that the hash table sizes become a
problem, the TYPE member will be the least of his problems.)

 P.S. please notice that by caching the string representation of IP
  addresses instead of their network representation, the code
  could become much more elegant and simple.

You said that before, but I don't quite understand why that's the
case.  It's certainly not the case for IPv4.



Re: wget ipv6 patch

2003-10-08 Thread Dražen Kačar
Mauro Tortonesi wrote:

 are there __REALLY__ systems which do not support inet_aton? their ISVs
 should be ashamed of themselves...

Solaris, for example. IIRC inet_aton isn't in any document which claims
to be a standard.

 however, yours seemed to me an ugly hack, so i have temporarily removed
 it. as you say, it would probably be better to provide a fallback
 implementation of inet_aton in cmpt.c.

But standards define inet_pton, which can do what inet_aton does, so that
should be checked for before using the fallback implementation.
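A sketch of the fallback suggested above, assuming configure has detected inet_pton but not inet_aton. fallback_inet_aton is a hypothetical name. Note that inet_pton accepts only the strict four-part dotted-decimal form, whereas traditional inet_aton implementations also accept shorthand such as "127.1", so the two are not exact equivalents.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* inet_aton-compatible signature built on the standard inet_pton.
   Returns nonzero on success and zero on failure, like inet_aton. */
int fallback_inet_aton (const char *cp, struct in_addr *inp)
{
  return inet_pton (AF_INET, cp, inp) == 1;
}
```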

-- 
 .-.   .-.Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
 |
 |[EMAIL PROTECTED]


Error: wget for Windows.

2003-10-08 Thread Suhas Tembe
I am trying to use wget for Windows and get this message: The ordinal 508 could not be
located in the dynamic link library LIBEAY32.dll.

This is the command I am using:
wget http://www.website.com --http-user=username --http-passwd=password

I have the LIBEAY32.dll file in the same folder as the wget. What could be wrong?

Thanks in advance.
Suhas



Re: Error: wget for Windows.

2003-10-08 Thread Jens Rösner
Hi Suhas!

 I am trying to use wget for Windows and get this message: The ordinal 508
 could not be located in the dynamic link library LIBEAY32.dll.

You are very probably using the wrong version of the SSL files.
Take a look at 
http://xoomer.virgilio.it/hherold/
Herold has nicely rearranged the links to 
wget binaries and the SSL binaries.
As you can see, different wget versions need
different SSL versions.
Just download the matching SSL libraries;
everything else should then be easy :)

Jens



 




Re: some wget patches against beta3

2003-10-08 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Martin v. Löwis) writes:

 Hrvoje Niksic [EMAIL PROTECTED] writes:

  VERSION = ('[.0-9]+-?b[0-9]+'
 '|[.0-9]+-?dev[0-9]+'
 '|[.0-9]+-?pre[0-9]+'
 '|[.0-9]+-?rel[0-9]+'
 '|[.0-9]+[a-z]?'
 '|[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')
 
 But that's narrow.  Why support 1.9-b3, but not 1.9-beta3 or
 1.9-alpha3, or 1.9-rc10?  Those and similar version schemes are in
 wide use.

 Are you requesting the addition of these three formats?

Yes, please.

To be clear: it would be ideal if the Robot didn't care about
versioning at all.  But if it really has to, then it should support
versioning schemes in wide use.