Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> but i suspect we will probably have to add foreign charset support
> to wget one of these days. for example, suppose we are doing a
> recursive HTTP retrieval and the HTML pages we retrieve are not
> encoded in ASCII but in UTF16 (an encoding in which it is perfectly
> fine to have null bytes in the stream). what do we do in that
> situation?
I've never seen a UTF-16 HTML page (which doesn't mean they don't exist), nor have I seen requests to add support for UTF-16. If/when UTF-16 becomes an issue, it's not that hard to add rudimentary support for converting the (ASCII subset of) UTF-16 to ASCII, so that we can find the links. In fact, we could be even smarter -- Wget could mechanically convert UTF-16 to UTF-8 and parse the UTF-8 contents as if they were ASCII, without ever being aware of the charset intricacies. The nice thing about UTF-8 is that it can be handled with normal C string functions without corrupting the international characters.
