RE: Clones (was RE: Hexadecimal)

Asmus Freytag Tue, 19 Aug 2003 06:34:26 -0700

Compatibility characters:

The recommendations for compatibility characters are necessarily vague, since their use in legacy data (and legacy environments) is strongly dependent on what is (or was) customary in a given environment.

If a process merely warehouses text data (or parses only a very small subset of characters for a special purpose, as an HTML parser does), then simply preserving legacy characters is often the best strategy. Take the opposite case, however: a process that actually scans the text for roman numerals. There, ignoring the compatibility characters would be a mistake, since legacy data of the kind for which these compatibility characters were added would contain roman numerals *only* in this form. Such data would *not* use the ASCII characters.
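As a rough illustration of the roman numeral case, here is a minimal Python sketch. It assumes that folding a scratch copy of the text with NFKC normalization is acceptable for scanning purposes; the names ROMAN_RE and find_roman_numerals are invented for this example, and the pattern only covers uppercase numerals up to 3999. The original text is left untouched, so the legacy characters are still preserved in the data itself.

```python
import re
import unicodedata

# Well-formed uppercase Roman numerals (1..3999) written with ASCII letters.
# The lookahead plus the trailing (?!\w) rule out empty matches and partial words.
ROMAN_RE = re.compile(
    r"(?<!\w)(?=[MDCLXVI])"
    r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
    r"(?!\w)"
)

def find_roman_numerals(text):
    """Return the Roman numerals found in text, as ASCII strings.

    Assumption: NFKC folding of a scratch copy is acceptable here. It maps
    compatibility characters such as U+2167 ROMAN NUMERAL EIGHT to "VIII",
    but it also folds many unrelated compatibility characters, so the
    folded copy should only be used for scanning, never written back.
    """
    folded = unicodedata.normalize("NFKC", text)
    return [m.group(0) for m in ROMAN_RE.finditer(folded)]

if __name__ == "__main__":
    sample = "Chapter \u2167 follows chapter XII."
    print(find_roman_numerals(sample))
```

A scan that applies ROMAN_RE directly to the unfolded text would find only the ASCII form and miss U+2167 entirely, which is the failure described above. The lowercase compatibility numerals (U+2170 to U+217F) fold to lowercase ASCII letters and would need a case-insensitive pattern, at the price of more false matches on ordinary words.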

Processes that modify legacy data for re-export to a legacy system obviously need to be intimately familiar with the legacy conventions, in a way that could not possibly be documented in the Unicode Standard in all details for every character and every legacy system.

Documentation in the code charts:

I agree with several of the comments that "hiding" the information about special characters in running text makes it unnecessarily difficult to work with the information. On the other hand, not everything can be succinctly expressed in machine-readable tables (some characters have complicated usages), and even annotations in the name list have limits. Annotations are definitely not the place for lengthier discussions.

For Unicode 4.0 we have attempted to improve the situation by systematically extracting the line-breaking related information into UAX#14, which at least allows task-focused access. Information about mathematical usage of characters is now collected in one place in UTR#25, partially duplicating, and partially extending the information in the text of the standard, but providing a single place of access. Further improvements are possible. Personally I'd be in favor of some icon in the character names list that simply indicates that a character is more fully discussed elsewhere - that would make the code charts more useful as an index into the description of the characters.

Mathematical operators:

Future extensions of programming languages should allow not only the MINUS SIGN as an operator, but many other characters, for example LOGICAL AND and LOGICAL OR, and as many other operators as are appropriate for the language.

Input of the operators does not necessarily have to be done via a special-purpose keyboard. The use of input macros, editor substitution or similar input technologies (e.g. turning && into LOGICAL AND) would be more flexible. Some editors already support the display of highly formatted program source code even though the underlying text backbone uses the standard ASCII conventions of current programming languages. Just one example is Source Insight from www.sourceinsight.com, which not only represents >= etc. by single symbols, but can also correctly increase the size of outer parentheses for nested expressions.
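The editor-substitution idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operator table, to_display and to_ascii are hypothetical names made up here, not the behavior of Source Insight or any other editor, and the substitution is deliberately naive (it does not skip string literals or comments, and it would mangle compound operators such as >>=).

```python
import re

# Hypothetical display table: ASCII operator spelling -> single Unicode symbol.
DISPLAY_FORMS = {
    "&&": "\u2227",  # LOGICAL AND
    "||": "\u2228",  # LOGICAL OR
    ">=": "\u2265",  # GREATER-THAN OR EQUAL TO
    "<=": "\u2264",  # LESS-THAN OR EQUAL TO
    "!=": "\u2260",  # NOT EQUAL TO
}
# Reverse table, used to recover the ASCII backbone from the displayed text.
ASCII_FORMS = {symbol: ascii_op for ascii_op, symbol in DISPLAY_FORMS.items()}

_ASCII_RE = re.compile("|".join(re.escape(op) for op in DISPLAY_FORMS))
_SYMBOL_RE = re.compile("|".join(re.escape(sym) for sym in ASCII_FORMS))

def to_display(source):
    """Replace ASCII operator spellings with their single-symbol forms."""
    return _ASCII_RE.sub(lambda m: DISPLAY_FORMS[m.group(0)], source)

def to_ascii(displayed):
    """Map the symbols back to the ASCII spellings stored on disk."""
    return _SYMBOL_RE.sub(lambda m: ASCII_FORMS[m.group(0)], displayed)

if __name__ == "__main__":
    line = "if (a >= b && c != d)"
    shown = to_display(line)
    print(shown)
    print(to_ascii(shown) == line)
```

Keeping the two directions as separate functions mirrors the split described above: the stored text keeps the ASCII conventions that existing compilers expect, while the symbols exist only in the input and display layer.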

A./
