Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

Doug Ewell Thu, 11 Nov 2010 09:50:37 -0800

Mark Davis 😎 wrote:

There are superset relations among some of the CJK character sets, and also -- practically speaking -- between some of the Windows and ISO-8859 sets. I say practically speaking because in general environments, the C1 controls are really unused, so where a non-ISO-8859 set is the same except for 80..9F you can treat it pragmatically as a superset.

There was a time, about 10 years ago, when Frank da Cruz would have replied almost immediately about the importance of C1 controls in terminal environments, and the arguments about incompatibility between 8859-1 and Windows-1252 would have been off and running.

That was about the same time that people like Roman Czyborra were complaining that their terminals were scrambled by text encoded in UTF-8, because of its use of bytes in the 80..9F range, and people like Jörg Knappen were creating alternative UTFs to get around this perceived problem.

Regarding the subset/superset terminology, we need to distinguish between "encoding subsets" and "repertoire subsets":

* ASCII is both an encoding subset and a repertoire subset of 8859-1 and Windows-1252 and UTF-8.

* 8859-1 is an encoding subset of Windows-1252, except for the 80..9F range.

* 8859-1 and Windows-1252 are repertoire subsets, but not encoding subsets, of UTF-8.


* 8859-15 is neither type of subset of 8859-1.

* Etc.
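
The two kinds of subset above can be checked mechanically. What follows is only a rough sketch in Python, not a definitive test: the function names and the `skip` parameter are made up for illustration, and the results depend on Python's own codec tables (its "latin-1" maps 80..9F to the C1 controls, and its "cp1252" leaves five bytes undefined). It treats set A as a repertoire subset of B if every character of A can be encoded in B at all, and as an encoding subset if every character also gets the same byte(s) in B.

```python
def charmap(codec, skip=()):
    """Map each character of a single-byte codec to its byte value.

    Bytes listed in `skip` are ignored; bytes the codec leaves
    undefined are left out.
    """
    table = {}
    for b in range(256):
        if b in skip:
            continue
        try:
            table[bytes([b]).decode(codec)] = bytes([b])
        except UnicodeDecodeError:
            pass  # byte is not assigned in this codec
    return table


def is_repertoire_subset(a, b, skip=()):
    """True if every character of single-byte codec `a` exists in `b`."""
    for ch in charmap(a, skip):
        try:
            ch.encode(b)
        except UnicodeEncodeError:
            return False
    return True


def is_encoding_subset(a, b, skip=()):
    """True if every character of `a` has the same byte(s) in `b`."""
    for ch, raw in charmap(a, skip).items():
        try:
            if ch.encode(b) != raw:
                return False
        except UnicodeEncodeError:
            return False
    return True


C1 = range(0x80, 0xA0)  # the disputed 80..9F range

for a, b in [("ascii", "utf-8"), ("latin-1", "cp1252"),
             ("cp1252", "utf-8"), ("iso8859_15", "latin-1")]:
    print(a, "->", b,
          "repertoire:", is_repertoire_subset(a, b),
          "encoding:", is_encoding_subset(a, b))

# The "except for the 80..9F range" case:
print(is_encoding_subset("latin-1", "cp1252", skip=C1))
```

Taken strictly, with the C1 controls counted, 8859-1 comes out as neither kind of subset of Windows-1252 under these tables; passing `skip=C1` reproduces the pragmatic view in which it is one.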

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
