Juho Vähä-Herttua <[EMAIL PROTECTED]> writes:

>> It's not that hard, either -- you can always transform UTF-16 into
>> UTF-8 and work with that.
>
> No you can't. Then the filenames in URLs that should be as escaped
> UTF-16 will be transformed into escaped UTF-8.
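To make the distinction concrete, here is a small sketch (plain Python, not Wget code; the filename "café" is an invented example) of how percent-escaping the same name comes out differently depending on whether the underlying bytes are UTF-8 or raw UTF-16:

```python
from urllib.parse import quote

name = "café"

# Percent-escape the UTF-8 bytes of the name (what the proposed
# UTF-16 -> UTF-8 transformation would produce).
utf8_escaped = quote(name.encode("utf-8"))       # 'caf%C3%A9'

# Percent-escape the raw UTF-16-BE bytes instead (one possible
# reading of "escaped UTF-16").
utf16_escaped = quote(name.encode("utf-16-be"))  # '%00c%00a%00f%00%E9'

print(utf8_escaped)
print(utf16_escaped)
```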
Can you elaborate on this?  What I had in mind was:

1. start with a stream of UTF-16 sequences
2. convert that into a string of UCS code points
3. encode that into UTF-8
4. now work with UTF-8 consistently

What do you mean by file names as "escaped UTF-16"?

>> silently assumes (or so I believe; you never confirmed this) that the
>> charset of u->host is the charset of the user's locale.  That breaks
>> with any page that specifies a different charset and attempts to link
>> to a non-ASCII domain.
>
> I confirmed this quite clearly in my earlier mail:

I must have missed that part of the mail; sorry about that.

> It assumes this with the function I used, but it also supports
> conversions from unicode strings where conversions are made
> manually.

So Wget has not only to call libidn, but also to call an unspecified
library that converts charsets encountered in HTML (potentially a
large set) to Unicode?
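Steps 1-3 above can be sketched like so (a minimal illustration in Python, not Wget code; surrogate-pair combination is the only non-trivial part, and error handling for unpaired surrogates is omitted):

```python
def utf16_units_to_utf8(units):
    """Decode a sequence of UTF-16 code units into UCS code points,
    then encode those code points as UTF-8 (steps 1-3)."""
    codepoints = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: combine it with the following low
            # surrogate into one supplementary-plane code point.
            low = units[i + 1]
            codepoints.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code unit: the code point is the unit itself.
            codepoints.append(u)
            i += 1
    return "".join(chr(cp) for cp in codepoints).encode("utf-8")
```

For example, `utf16_units_to_utf8([0x0048, 0xD83D, 0xDE00])` yields the UTF-8 encoding of "H" followed by U+1F600, and from that point on the string can be handled as UTF-8 consistently.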

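For what it's worth, the conversion chain in question (declared page charset -> Unicode -> ACE hostname) can be sketched as follows; the hostname and charset here are invented for illustration, and Python's built-in "idna" codec stands in for libidn:

```python
# A hostname as raw bytes extracted from a page whose declared
# charset is ISO-8859-1 (hypothetical example).
raw_host = b"m\xfcnchen.example"

# Charset -> Unicode: this is the step that needs a general
# charset-conversion library when pages declare arbitrary charsets.
unicode_host = raw_host.decode("iso-8859-1")   # "münchen.example"

# Unicode -> ACE: the part libidn would handle.
ace_host = unicode_host.encode("idna").decode("ascii")

print(ace_host)  # xn--mnchen-3ya.example
```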