Hello Martin,

On 2010-11-11 04:54, "Martin J. Dürst" wrote:
> Yes, except that the terms superset/subset (and set in general)
> shouldn't be used unless you really strictly speak about the repertoire
> of characters, and not the encoding itself. So e.g. the repertoire of
> iso-8859-1 is a subset of the repertoire of UTF-8. However, iso-8859-1
> is not a subset of UTF-8, not because you can't label some text encoded
> as iso-8859-1, but because subset relationships among the encodings
> themselves don't make sense).

If you model encodings as functions, thereby making ASCII something like

    ASCII ≔ { 0 ↦ '\0', ..., 32 ↦ ' ', 33 ↦ '!', 34 ↦ '"', ..., 126 ↦ '~', 127 ↦ '\x7F' }

you can definitely use the words subset and superset. Since such a function is just a set of tuples, and those tuples may be contained identically in other encodings (such as UTF-8), it is appropriate to say that ASCII is a subset of UTF-8. Of course, restricting this to the range of the function, i.e.

    ran ASCII = {'\0', ..., ' ', '!', ..., '~', '\x7F' }

(sorry, borrowing some syntax from Z) allows you to make repertoire comparisons in a sub/superset manner, making ran Latin9 a subset of ran Unicode, even though the respective functions don't share this relationship.
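
For what it's worth, here is a rough Python sketch of the same idea. It rests on a few assumptions of mine: I key each tuple on the byte sequence rather than on a single integer so that multi-byte UTF-8 fits the same model, I only sample UTF-8 up to U+20FF (enough to cover the euro sign), and the names single_byte_encoding, UTF8_SAMPLE and ran are made up for illustration.

```python
# Model an encoding as a set of (byte sequence, character) tuples.
# Assumption: keys are byte sequences, not single integers, so that
# multi-byte UTF-8 sequences fit the same shape as ASCII and Latin-9.

def single_byte_encoding(codec, size):
    """Build the tuple set for a single-byte codec covering bytes 0..size-1."""
    return {(bytes([b]), bytes([b]).decode(codec)) for b in range(size)}

ASCII = single_byte_encoding("ascii", 128)
LATIN9 = single_byte_encoding("iso8859_15", 256)

# Only a finite sample of UTF-8: code points U+0000..U+20FF.
# This range contains no surrogates and includes U+20AC (euro sign).
UTF8_SAMPLE = {(chr(cp).encode("utf-8"), chr(cp)) for cp in range(0x2100)}

def ran(encoding):
    """Range of the function: the repertoire of characters it can produce."""
    return {char for _, char in encoding}

print(ASCII <= UTF8_SAMPLE)             # True: every ASCII tuple is also a UTF-8 tuple
print(LATIN9 <= UTF8_SAMPLE)            # False: e.g. (b'\xa4', '€') is not a UTF-8 tuple
print(ran(LATIN9) <= ran(UTF8_SAMPLE))  # True: the repertoire is still contained
```

The second check fails because Latin-9 maps the byte 0xA4 to the euro sign, whereas UTF-8 encodes that character as three bytes, so the tuples differ even though the character is in both repertoires.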

Just a thought :-)

Regards,
Johannes
