2012/10/19 Doug Ewell <[email protected]>:
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
>>>> ASCII,
>>>
>>> A strict subset of UTF-8, so no need to support this separately.
>>
>> Not really. If the file to save does not need any character which is
>> found in an 8-bit extended character set (there are many of them),
>> saving them as ASCII (i.e. saving this charset information in the
>> metadata) still preserves the compatibility of the encoded text with
>> all these other extended charsets (notably all ISO 8859-* codepages as
>> well as UTF-8).
>
> Which metadata is that? I was sure we were talking about editors for
> plain-text files, which don't have any sort of metadata declaring the
> character encoding or anything else.

There is always some metadata: either it comes from the filesystem itself (file-naming conventions or explicit storage of this metadata, including HTTP, which acts as a filesystem that supports it, or MIME for emails), or it comes from information the user gives the editor to instruct it how to decode the file, or it is implicit in the editor itself, which offers no choice for it in its GUI or command line.

The metadata I am referring to is of course not what is stored in the plain-text body of the file itself (including the decoded body part of a MIME email, the body part of an HTTP request or reply, or the content read with I/O after opening a file or a continuous stream). So it is not what you may find in HTML or XML processing syntaxes as part of the file content itself (something that is not really recommended if those files are handled blindly as if they were just "plain text", ignoring the syntax required to decode them). The information about the syntax needed to process them is, however, metadata: you first have to know that the file type is XML or HTML, and that is not really stored in the file content but just "guessed" from some leading signatures.

As soon as a user needs to specify the file type or file encoding somewhere, because the filesystem does not provide it as separately stored metadata, the user provides additional metadata. This is also true when he chooses a specific editor that handles a specific syntax or encoding: the metadata provided by the user consists of this choice of tool, even if the choice was inappropriate because of a wrong guess or assumption.
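To illustrate the compatibility point from the quoted exchange (a file containing only ASCII characters decodes the same way under the ISO 8859-* codepages and UTF-8), here is a minimal Python sketch. The function names and the list of charsets are my own assumptions, chosen only for illustration; the list is a small sample, not an exhaustive set of ASCII-compatible encodings.

```python
# Illustrative sample of ASCII-compatible charsets (an assumption, not a complete list).
CANDIDATE_CHARSETS = ["ascii", "utf-8", "iso-8859-1", "iso-8859-2", "iso-8859-15", "cp1252"]


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def decodes_identically(data: bytes, charsets=CANDIDATE_CHARSETS) -> bool:
    """Return True if every listed charset decodes `data` to the same string."""
    results = set()
    for name in charsets:
        try:
            results.add(data.decode(name))
        except UnicodeDecodeError:
            # At least one charset cannot decode these bytes at all.
            return False
    return len(results) == 1


if __name__ == "__main__":
    for sample in (b"plain text", "caf\u00e9".encode("utf-8")):
        print(sample, is_pure_ascii(sample), decodes_identically(sample))
```

For a pure-ASCII body, the charset label attached as metadata can be any of these without changing the decoded text; as soon as one byte is 0x80 or above, the label matters.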

