Juho Vähä-Herttua <[EMAIL PROTECTED]> writes:

>> It's not that hard, either -- you can always transform UTF-16 into
>> UTF-8 and work with that.
>
> No you can't. Then the filenames in URLs that should remain percent-escaped
> UTF-16 will be transformed into percent-escaped UTF-8.

Can you elaborate on this?  What I had in mind was:

1. start with a stream of UTF-16 sequences
2. convert that into a string of UCS code points
3. encode that into UTF-8
4. work with UTF-8 consistently from then on
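The steps above can be sketched as follows (illustrative Python, not Wget
code; Wget itself is written in C):

```python
# Sketch of the three conversion steps above.

# 1. Start with a stream of UTF-16 code units (big-endian here),
#    including a surrogate pair for U+10348.
utf16_bytes = "a\u00e9\U00010348".encode("utf-16-be")

# 2. Decode into a string of UCS code points.  The surrogate pair
#    collapses into the single code point U+10348.
code_points = utf16_bytes.decode("utf-16-be")
assert [hex(ord(c)) for c in code_points] == ["0x61", "0xe9", "0x10348"]

# 3. Re-encode as UTF-8 and work with that consistently.
utf8_bytes = code_points.encode("utf-8")
assert utf8_bytes == b"a\xc3\xa9\xf0\x90\x8d\x88"
```

The point of step 2 is that UTF-16 and UTF-8 are just two encodings of the
same code points, so the round trip is lossless.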

What do you mean by file names as "escaped UTF-16"?

>> silently assumes (or so I believe; you never confirmed this) that the
>> charset of u->host is the charset of the user's locale.  That breaks
>> with any page that specifies a different charset and attempts to link
>> to a non-ASCII domain.
>
> I confirmed this quite clearly in my earlier mail:

I must have missed that part of the mail; sorry about that.

> It assumes this with the function I used, but it also supports  
> conversions from unicode strings where conversions are made
> manually.

So Wget would have to call not only libidn, but also an unspecified
library that converts the charsets encountered in HTML (potentially a
large set) to Unicode?
