Philippe Verdy wrote:
> But the most basic converters between encodings (not syntax
> transformers such as converting characters into escape sequences for
> specific computer languages) should be integrated (this includes
> standard UTF's, notably UTF-8 and probably UTF-16,
So far so good.
> ASCII,
A strict subset of UTF-8, so no need to support this separately.
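The subset claim can be checked directly. This is a minimal Python sketch, assuming only the standard "ascii" and "utf-8" codecs; the helper name is_ascii is made up for illustration:

```python
# Every possible ASCII byte value (0x00 through 0x7F).
ascii_bytes = bytes(range(0x80))

# Decoding the same bytes as ASCII and as UTF-8 gives the same string.
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Encoding ASCII-only text as UTF-8 reproduces the original bytes.
text = ascii_bytes.decode("ascii")
assert text.encode("utf-8") == ascii_bytes


def is_ascii(data: bytes) -> bool:
    """Return True if data is valid ASCII (hypothetical helper)."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False
```

The reverse direction does not hold: UTF-8 data containing any byte at or above 0x80 is not ASCII, so a UTF-8 decoder covers ASCII input but an ASCII decoder does not cover UTF-8 input.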
> and most probably ISO-8859 1,
People outside of the Americas and Western Europe might disagree with
this "obvious" default SBCS choice.
> and its Windows 1252 extension which replaces the deprecated C1
> controls from ISO 8859, as agreed now in HTML5 and most common
> practices ;
C1 controls are deprecated in HTML5, and probably in other versions
of HTML, and in XML. Even in 2012, other types of text files are
rumored to exist. Until C1 controls are formally deprecated in ISO
6429 and/or ECMA 48, it is incorrect to declare them "deprecated" in
general.
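To make the difference concrete, here is a small Python sketch of how the same byte decodes under the two encodings. It assumes Python's built-in "latin-1" and "cp1252" codecs; the function name describe is hypothetical, and Python's cp1252 table leaves five byte values undefined, which other implementations may handle differently:

```python
import unicodedata
from typing import Tuple


def describe(byte_value: int) -> Tuple[str, str]:
    """Decode one byte as ISO 8859-1 and as Windows-1252 (hypothetical helper)."""
    raw = bytes([byte_value])
    as_latin1 = raw.decode("latin-1")  # defined for all 256 byte values
    try:
        as_cp1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        as_cp1252 = ""  # byte is undefined in Python's cp1252 table
    return as_latin1, as_cp1252


latin1_char, cp1252_char = describe(0x80)

# ISO 8859-1 maps byte 0x80 to the C1 control U+0080 (category Cc).
assert unicodedata.category(latin1_char) == "Cc"

# Windows-1252 maps the same byte to a printable character.
assert unicodedata.name(cp1252_char) == "EURO SIGN"

# From 0xA0 upward the two encodings agree.
assert all(describe(b)[0] == describe(b)[1] for b in range(0xA0, 0x100))
```

A converter that silently treats ISO 8859-1 input as Windows-1252 therefore changes the meaning of any file that really does contain C1 controls.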
> this should also include the integrated support for local encodings
> that are already natively integrated in the OS for its legacy 8-bit
> encoding, which should be supported by using local OS API's,
Step by step, this started with "the most basic converters" and has
evolved into something much more extensive. The .NET framework supports
dozens of non-Unicode encodings. Once you go down this path, users will
reasonably expect your app to provide all kinds of character processing,
like CRLF conversion and \Uxxxx conversion and trailing-space stripping
and tab/space conversion and maybe normalization. This is the situation
we are in today.
--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell