On 17/12/2004 10:13, Arcane Jill wrote:

...

One last question - why /can't/ locale conversion be automated? I don't really get this one, but it's the root of this whole topic. Surely, if we make the following assumptions:
(1) No user has a locale of UTF-8, and
(2) Some users will have created UTF-8 filenames and UTF-8 text files, and
(3) Some of those text files may have been concatenated, leading to mixed-encoding text files
then we can surely automate everything. (Requirement (1) can be met simply by asking all users who have changed their locale to UTF-8 to change it back again, temporarily). ...


This locale change is not exactly simple for (future?) users who speak and use only a language that is supported only by UTF-8, which would include most Indians and SE Asians for a start.

Assuming these requirements, all you have to do is:

...
# if (the file can be positively identified as a text file)
# {
# re-encode all non-UTF-8 substrings (assuming them to be in the user's locale) to UTF-8


This assumption is invalid. I have on my system a number of files which are text files but are encoded neither in UTF-8 nor in my own locale. I read them either with programs which can display them according to their own locale (which is not recorded within the file) or by using substitution fonts (which is justified because many of them were written for such obsolescent setups). This kind of automated conversion would cause disastrous damage.
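To make the failure concrete, here is a minimal sketch of the proposed step. It is not Arcane Jill's actual algorithm: the function name reencode_non_utf8 is hypothetical, and the Latin-1 locale and the KOI8-R sample file are assumptions chosen purely for illustration.

```python
def reencode_non_utf8(data: bytes, locale_encoding: str) -> bytes:
    """Hypothetical sketch of the proposed step: keep valid UTF-8 runs and
    re-encode every other byte run as if it were in the user's locale."""
    out = []
    i = 0
    while i < len(data):
        try:
            data[i:].decode("utf-8")
            out.append(data[i:])          # the rest is valid UTF-8, keep it
            break
        except UnicodeDecodeError as err:
            good_end = i + err.start      # valid UTF-8 up to here
            bad_end = i + err.end         # end of the invalid byte run
            out.append(data[i:good_end])
            # The assumption under dispute: invalid bytes are in the locale.
            out.append(data[good_end:bad_end].decode(locale_encoding).encode("utf-8"))
            i = bad_end
    return b"".join(out)


# Assumed scenario: a Russian text file stored in KOI8-R on a system
# whose locale encoding is Latin-1.
original = "Привет".encode("koi8_r")
converted = reencode_non_utf8(original, "latin-1")

# The converted file is valid UTF-8, but it no longer says what the author wrote.
damaged_text = converted.decode("utf-8")
```

After the conversion, a program that used to display the file correctly as KOI8-R will show garbage, because the bytes it relied on have been replaced by the UTF-8 encoding of the wrong characters. Nothing in the file records that the conversion happened or which encoding was wrongly assumed.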


-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/




