On 10/21/2012 4:09 AM, Philippe Verdy wrote:
>> Unless there's a way to rebuild the metadata unambiguously or to enforce
>> that it is complete and correct, it's very hard to rely on it for any
>> particular purpose.
> Enforcing that the metadata is correct is perfectly possible, at least
> to ensure that it matches the requirements. (For example, an incorrect
> encoding, given in metadata, should be signaled each time it violates
> one of its rules: this is possible for many standardized text
> encodings, including the UTFs.)

It may be possible to do some verification of well-formedness for well-designed encoding schemes like the UTFs, but, pray, how do you tell 8859-1 apart from 8859-15?
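
For what it's worth, such a check is easy to write for the UTFs. A
minimal sketch in Python (the function name is my own, purely for
illustration):

    # Check whether a byte string is actually well-formed under the
    # encoding its metadata declares. This catches mismatches for
    # self-checking schemes like UTF-8, where many byte sequences
    # are ill-formed.
    def matches_declared_encoding(data: bytes, declared: str) -> bool:
        try:
            data.decode(declared)
            return True
        except UnicodeDecodeError:
            return False

    matches_declared_encoding(b'\xc3\xa9', 'utf-8')  # True: well-formed 'é'
    matches_declared_encoding(b'\xe9', 'utf-8')      # False: truncated sequence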

These are not rarely occurring character sets, and enforcement for them, as for any other member of the 8859 series, would only be possible if you were to do the very same character-set sniffing that you so dislike.
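
To make that concrete: both 8859-1 and 8859-15 assign a character to
every byte value, so a well-formedness check of the kind sketched
above can never flag either of them (Python again):

    # Decoding arbitrary bytes under either charset always succeeds;
    # the two interpretations differ only at eight byte values, and
    # nothing in the data itself says which one was meant.
    import os
    data = os.urandom(1024)
    data.decode('iso8859-1')    # never raises
    data.decode('iso8859-15')   # never raises either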

If you run a variation of a language detector, it's possible to detect, for example, that the text is in Icelandic and therefore requires 8859-1 instead of 8859-15. That is because the few code points that are mapped to different characters in these two sets would (statistically) appear in the wrong context.
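
Here is a rough sketch (Python; the scoring and the names are my own
invention, not any real detector) of what such context-based
disambiguation might look like:

    # The eight byte values where ISO 8859-1 and ISO 8859-15 disagree:
    # 8859-1 maps them to symbols (currency sign, fractions, spacing
    # diacritics); 8859-15 maps them to the euro sign and the letters
    # S-caron, s-caron, Z-caron, z-caron, OE, oe, Y-diaeresis.
    AMBIGUOUS = {0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE}

    def guess_8859_1_vs_15(data: bytes) -> str:
        """Crude heuristic: a letter next to an ambiguous byte favours
        the 8859-15 letter reading; anything else favours 8859-1."""
        score_15 = 0
        for i, b in enumerate(data):
            if b not in AMBIGUOUS:
                continue
            prev = chr(data[i - 1]) if i > 0 else ' '
            nxt = chr(data[i + 1]) if i + 1 < len(data) else ' '
            score_15 += 1 if (prev.isalpha() or nxt.isalpha()) else -1
        return 'iso8859-15' if score_15 > 0 else 'iso8859-1'

    sample = b'c\xbdur'                 # 0xBD is 'œ' in 8859-15, '½' in 8859-1
    guess = guess_8859_1_vs_15(sample)  # 'iso8859-15'
    print(sample.decode(guess))        # prints 'cœur'

A real detector would use full language statistics, of course; this
only shows the principle.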

This is something a clever text editing (or HTML editing) tool could do, but not something that you can build into an OS.

Anyway, to cut the discussion short, I'd love to see a working example of any system where metadata are 100% reliable.

A./
