Peter Kirk <peterkirk at qaya dot org> wrote:

>> If a certain Unicode plain text file uses ASCII punctuation OR spaces
>> OR end-of-line characters, AND the file is not too short or has a
>> very odd formatting, then the algorithm should work.
>
> True. But there may be certain languages (perhaps Thai?) for which all
> of these circumstances regularly occur together. It would be very
> inconvenient for users of these languages if programs regularly
> attribute the wrong encoding to their text.

Whether this is specifically true for Thai or not -- and I doubt that
the "short file or odd formatting" condition could ever be considered
language-dependent -- I would say an otherwise-good heuristic that
performs badly for Thai ought to have special cases built in for Thai,
rather than being discarded.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

