We had thought of something similar, but one that would provide more information in interfaces.
Reserve a space of 256 code points, with names:

UNCONVERTIBLE BYTE-00
UNCONVERTIBLE BYTE-01
...
UNCONVERTIBLE BYTE-FF

During a conversion process, if some bytes (say, from corrupt UTF-8) cannot be correctly converted into code points, then a sequence of the above is generated, one for each such byte. This is not meant to preserve the original text: you would never convert back from these code points to anything. It is really only useful ephemerally, while doing a conversion where something goes wrong. It is essentially a slightly more verbose form of U+FFFD REPLACEMENT CHARACTER, but it would be handy in certain conversion APIs, especially single-code-point-at-a-time APIs like getChar().

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Dominikus Scherkl" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, October 30, 2002 03:49
Subject: New Charakter Proposal

> Hello.
>
> I would like to have a "source failure indicator symbol" (SFIS)
> character in Unicode, which a charset conversion unit may
> insert into a text (suggested position: U+FFF8).
>
> Reason:
> several charsets have undefined code points which were
> defined in a former or later version (e.g. overlong
> UTF-8 encodings or the $ symbol (0x24) in the INVARIANT
> charset).
>
> A converter can replace such symbols with U+FFFD (which is
> correct but loses the information), or simply use the
> character that was most likely intended (which hides the error).
> Neither is very good.
>
> The SFIS would allow the reader to see that an error occurred
> and that the following character may therefore be incorrect, while
> maintaining readability if the right conversion is made anyway
> (or at least giving a hint as to which character was intended;
> e.g. the $ sign could have been any other currency symbol
> if a national 7-bit charset was changed to INVARIANT by
> previous conversions).
>
> Of course a converter can still use U+FFFD if it has no
> idea which character is intended or if Unicode doesn't contain
> the character.
>
>
> The whole "character identities" discussion gave me another
> reason to introduce such an SFIS character:
> A font renderer may show the SFIS before a character which
> is replaced by another one because the correct one is not
> contained in the font (e.g. it may render an "a with
> superscript e above" as SFIS + "a umlaut" to indicate the
> error and show a probably fitting replacement, which is
> much better than showing an empty square).
> In short:
> The SFIS may indicate a kind of compatibility decomposition
> of the following character
> (not necessarily the standard compatibility decomposition).
>
> I'd like to hear whether my suggestion is completely weird or
> whether anybody else thinks it might be useful.
>
> Best Regards.
> --
> Dominikus Scherkl
> [EMAIL PROTECTED]
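The UNCONVERTIBLE BYTE scheme proposed at the top of this message can be sketched as a UTF-8 decoder that emits one marker per byte it cannot convert. This is a minimal illustration under stated assumptions, not an actual Unicode mechanism: the proposal assigns no location for the 256 code points, so UNCONVERTIBLE_BASE (U+F780, in the Private Use Area) is purely hypothetical, as are the helper names unconvertible(), decode_one(), code_points() and describe(). code_points() stands in for a getChar()-style API that hands back one code point per step. The decoder rejects overlong forms, encoded surrogates, and values above U+10FFFF.

```python
from collections.abc import Iterator

# HYPOTHETICAL: the proposal reserves 256 code points but assigns no location.
# U+F780..U+F87F (Private Use Area) is used here purely for illustration.
UNCONVERTIBLE_BASE = 0xF780


def unconvertible(byte: int) -> int:
    """Map a raw byte (0x00..0xFF) to its hypothetical UNCONVERTIBLE BYTE-xx code point."""
    return UNCONVERTIBLE_BASE + byte


def decode_one(data: bytes, i: int) -> tuple[int, int]:
    """Decode one UTF-8 code point starting at data[i]; return (code_point, next_index).

    On malformed input, return UNCONVERTIBLE BYTE-xx for the offending lead byte
    and advance by one byte, so each bad byte yields its own marker.
    """
    b0 = data[i]
    if b0 < 0x80:
        return b0, i + 1
    if 0xC2 <= b0 <= 0xDF:
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        length, cp, minimum = 4, b0 & 0x07, 0x10000
    else:
        # Stray continuation byte, C0/C1 (always overlong), or F5..FF (never valid).
        return unconvertible(b0), i + 1
    if i + length > len(data):
        return unconvertible(b0), i + 1  # sequence truncated by end of input
    for j in range(i + 1, i + length):
        if (data[j] & 0xC0) != 0x80:
            return unconvertible(b0), i + 1  # expected a continuation byte
        cp = (cp << 6) | (data[j] & 0x3F)
    if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        # Overlong form, encoded surrogate, or value beyond U+10FFFF.
        return unconvertible(b0), i + 1
    return cp, i + length


def code_points(data: bytes) -> Iterator[int]:
    """Yield one code point at a time, in the spirit of a getChar()-style API."""
    i = 0
    while i < len(data):
        cp, i = decode_one(data, i)
        yield cp


def describe(cp: int) -> str:
    """Render a code point, naming the hypothetical UNCONVERTIBLE BYTE range."""
    if UNCONVERTIBLE_BASE <= cp <= UNCONVERTIBLE_BASE + 0xFF:
        return f"UNCONVERTIBLE BYTE-{cp - UNCONVERTIBLE_BASE:02X}"
    return f"U+{cp:04X}"


# Overlong encoding of "/" (C0 AF) between two valid ASCII letters.
print([describe(cp) for cp in code_points(b"a\xC0\xAFb")])
# ['U+0061', 'UNCONVERTIBLE BYTE-C0', 'UNCONVERTIBLE BYTE-AF', 'U+0062']
```

Where a plain U+FFFD converter would report only that something went wrong, the caller here also learns which bytes were at fault. Because this sketch borrows Private Use code points, a genuine U+F780..U+F87F in the input would be indistinguishable from a marker; that is tolerable only because these values are meant to be used ephemerally and never converted back.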

