Re: Validity and properties of U+FFFD (was RE: Roundtripping in Unicode)

Kenneth Whistler Tue, 14 Dec 2004 13:21:23 -0800

Lars asked:

> BTW, what are the properties of U+FFFD? In English please, do not point me
> to the standard.


?!

It has the general category of "Symbol Other" [gc=So].
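
For anyone who would rather query this than read the data files, here is
a minimal sketch in Python. It assumes the standard-library unicodedata
module, whose answers reflect whatever Unicode version ships with the
interpreter; the variable names are only illustrative.

```python
import unicodedata

REPLACEMENT = "\uFFFD"

# General category as recorded in the Unicode Character Database.
category = unicodedata.category(REPLACEMENT)

# The formal character name, for confirmation that this is the right code point.
name = unicodedata.name(REPLACEMENT)

print(category, name)
```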

> Like, can it be a part of an identifier, 

It does not have the ID_Start or the ID_Continue property, which
you could determine for yourself by referring to the standard:

http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

That doesn't prevent a formal syntax definition for a language
from including it within the BNF for defining an identifier,
but in general, no, it would not appear in identifiers, just as
most other symbols would not.
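
As one illustration only: Python defines its identifiers in terms of the
closely related XID_Start and XID_Continue properties, so its built-in
str.isidentifier() can serve as a rough stand-in for "what a typical
language accepts". This is a sketch of one language's rule, not a
statement about every language.

```python
REPLACEMENT = "\uFFFD"

# Python's own identifier rule, used here purely as an example language.
alone_ok = REPLACEMENT.isidentifier()             # U+FFFD as the whole name
trailing_ok = ("x" + REPLACEMENT).isidentifier()  # U+FFFD in a continue position
plain_ok = "x1".isidentifier()                    # ordinary identifier, for contrast

print(alone_ok, trailing_ok, plain_ok)
```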

> is it an 'alphanumeric'? 

No.

> Let me speculate. It should be a letter 

No.

> (it probably more
> often originally was than wasn't). 

You are referring here to speculation regarding what uninterpretable
sequence in some other character encoding was *converted* to U+FFFD
on conversion to Unicode. But that is irrelevant to the properties
of U+FFFD itself.

That is tantamount, for example, to claiming that the C0 control
code 0x1A SUBSTITUTE should be defined as a "letter", simply because
it is often used in signalling a conversion substitution in
8-bit tables.
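
The point can be sketched in Python, assuming its built-in UTF-8 decoder
with errors="replace" as a representative converter. The two byte strings
are made-up examples: one ends in a byte that was a letter in Latin-1,
the other in a byte that is never valid in UTF-8. Both end up as the same
U+FFFD, with the same properties.

```python
import unicodedata

# 0xE9 is e-acute in Latin-1; 0xFF is not valid anywhere in UTF-8.
from_letter = b"caf\xe9".decode("utf-8", errors="replace")
from_garbage = b"caf\xff".decode("utf-8", errors="replace")

# The replacement carries no memory of what it replaced.
same_result = from_letter == from_garbage
category_after = unicodedata.category(from_letter[-1])

# The C0 control SUBSTITUTE (0x1A) is likewise a control code, not a letter.
sub_category = unicodedata.category("\x1a")

print(same_result, category_after, sub_category)
```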

> I would accept it for identifiers (variables, filenames). 

If you are defining your own language, that would be your
prerogative, of course. But if you are using standard languages
like C, C++, Java, C#, SQL, etc., it is unlikely that you would
be correct in that approach.

> It has no case properties. And it is obviously not a
> space.

True.
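
These answers can be spot-checked with Python's string predicates. Note
the assumption: these methods are Python's approximations built on the
Unicode Character Database, not the Unicode properties themselves, so
treat this as a quick sanity check rather than a definition.

```python
ch = "\uFFFD"

# No case mappings: upper- and lowercasing leave the character unchanged.
has_case_mapping = ch.upper() != ch or ch.lower() != ch

# Not white space, and not alphanumeric by Python's definition.
is_space = ch.isspace()
is_alnum = ch.isalnum()

print(has_case_mapping, is_space, is_alnum)
```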

There is much, much more to know about Unicode character properties
than just what can be inferred from an attempt to apply the
POSIX model to UTF-8. A good place to start would be Unicode
Technical Report #23, The Unicode Character Property Model:

http://www.unicode.org/reports/tr23/

And after that, yes, I would point you to the standard.

--Ken
