2012/10/18 Doug Ewell <[email protected]>:
> Philippe Verdy wrote:
>> ASCII,
>
> A strict subset of UTF-8, so no need to support this separately.
Not really. Suppose the file being saved needs no character from any 8-bit extended character set (there are many of them). Saving it as ASCII, i.e. recording that charset in the metadata, keeps the encoded text compatible with all of those extended charsets (notably all the ISO 8859-* codepages, as well as UTF-8).

This does not mean the encoder will be different. The only difference is the metadata you emit for the encoded file. If you always declare UTF-8, the file may be rejected by every application that knows it cannot handle Unicode correctly: such applications will refuse the file without even trying to decode it. This matches the principle "be lenient when reading (in other applications), but strict when writing (declare only the real minimum requirements for decoding the file)".

However, if the file already declared the "UTF-8" encoding, that label should not be changed blindly and automatically to "ASCII". Later editors might then restrict the usable character set, or might try to store approximations if you ever insert a non-ASCII character into a text that was intended to be directly compatible with UTF-8.

This applies, for example, to emails. Each email is independent of the others, even when it replies to a previous one that is partially or fully included in the response; the link between emails is not part of their text, but of their tracking MIME headers and of the metadata used for local processing in mail agents or proxies. So minimize the decoding requirements of each email when sending it.
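A minimal sketch of the writing rule described above, assuming a saver that always produces UTF-8 bytes and only varies the charset label it records. The function names `choose_charset_label` and `encode_for_saving` are hypothetical illustrations, not part of any existing tool; "US-ASCII" and "UTF-8" are the standard IANA charset names.

```python
from typing import Optional, Tuple


def choose_charset_label(text: str, existing_label: Optional[str] = None) -> str:
    """Return the charset label to record in metadata when saving `text`.

    Hypothetical helper: the bytes are the same either way, only the label differs.
    """
    # Never silently downgrade a document that was already declared as UTF-8,
    # otherwise later editors may restrict its repertoire to ASCII.
    if existing_label is not None and existing_label.strip().lower() in ("utf-8", "utf8"):
        return "UTF-8"
    # Declare only the minimum requirement: pure 7-bit text is labeled US-ASCII,
    # which stays readable by ISO 8859-* and UTF-8 decoders alike.
    if text.isascii():
        return "US-ASCII"
    return "UTF-8"


def encode_for_saving(text: str, existing_label: Optional[str] = None) -> Tuple[bytes, str]:
    """Hypothetical saver: encode once, and pick the strictest accurate label."""
    label = choose_charset_label(text, existing_label)
    # ASCII is a strict subset of UTF-8, so one encoder serves both labels.
    return text.encode("utf-8"), label


if __name__ == "__main__":
    print(encode_for_saving("Hello"))
    print(encode_for_saving("café"))
    print(encode_for_saving("Hello", existing_label="UTF-8"))
```

The design choice mirrors the argument: the encoder never changes, so the decision is purely about metadata, and an explicit prior "UTF-8" declaration takes precedence over the "minimal label" rule.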

