On Monday 21 February 2005 12:35 pm, Leonid wrote:
> Mauro,
>
> I tend to agree with Hrvoje. If you decide to open the
> Pandora's box and implement iconv support, please, please,
> provide an option, preferably default one, to configure or
> use wget without iconv.
of course.

> FYI, there are languages which actively use more than one coding. For
> example, I know 14 different codings for Russian language.

i know, i know. i18n is a great pain in the neck.

> It is rather common that either the charset at the remote host or the
> charset at the local host are set incorrectly.

this is not a problem. actually (apart from the case of a document returned
as an HTTP response) we cannot be sure that the charset used by the server
matches our locale charset. the only two reasonable things we can do are:

- assume all data is ASCII
- assume all data is in our locale charset

the second assumption allows us to avoid problems like this one:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=271931

> iconv will choke in an attempt to recode strings to UTF-8 and back in such a
> situation.

actually, the interpretation of data as a sequence of multibyte characters
encoded in the local charset is done using mbrtowc(3), which allows us to
get an array of wide chars (see current CVS code in string_t.c). we would
need to use iconv(3) only to translate the obtained wide char string into a
UTF-8 encoded (normal) char string and possibly for UTF-8 {de,en}coding.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it
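
for illustration only, here is a minimal sketch of the mbrtowc(3) step described above: walking a byte string and decoding it, in the charset of the current locale, into an array of wide chars. this is not the code from string_t.c. the function name mb_to_wide and its error handling (returning NULL on any invalid or truncated sequence) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not taken from wget's string_t.c).
   Decode the NUL-terminated multibyte string SRC, interpreted in the
   charset of the current LC_CTYPE locale, into a newly allocated,
   L'\0'-terminated wide string.  The number of wide chars, excluding
   the terminator, is stored in *OUT_LEN.  Returns NULL on an invalid
   or truncated sequence, or if allocation fails.  The caller frees
   the result with free().  */
wchar_t *
mb_to_wide (const char *src, size_t *out_len)
{
  size_t remaining;
  size_t n = 0;
  wchar_t *dst;
  mbstate_t state;

  assert (src != NULL && out_len != NULL);

  remaining = strlen (src);
  /* Each mbrtowc call consumes at least one byte per wide char, so
     one slot per input byte plus the terminator is always enough.  */
  dst = malloc ((remaining + 1) * sizeof *dst);
  if (dst == NULL)
    return NULL;

  memset (&state, 0, sizeof state);   /* initial conversion state */

  while (remaining > 0)
    {
      size_t used = mbrtowc (&dst[n], src, remaining, &state);

      if (used == (size_t) -1 || used == (size_t) -2)
        {
          /* -1: invalid sequence; -2: sequence cut off at end of input.  */
          free (dst);
          return NULL;
        }
      if (used == 0)
        break;                        /* decoded a null wide char */

      src += used;
      remaining -= used;
      n++;
    }

  dst[n] = L'\0';
  *out_len = n;
  return dst;
}
```

note that a program has to call setlocale (LC_CTYPE, "") once at startup for mbrtowc(3) to use the user's locale charset; without that call the "C" locale is in effect and only ASCII input is guaranteed to decode. the iconv(3) step from wide chars to UTF-8 is left out of this sketch.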