Re: Default bidi ranges

Ken Whistler Wed, 09 Nov 2011 11:49:44 -0800

On 11/9/2011 9:30 AM, Asmus Freytag wrote:

On 11/9/2011 1:18 AM, "Martin J. Dürst" wrote:
I tried to find something like a normative description of the default bidi class of unassigned code points.
In UTR #9, it says (http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types):
Unassigned characters are given strong types in the algorithm. This is an explicit exception to the general Unicode conformance requirements with respect to unassigned characters. As characters become assigned in the future, these bidirectional types may change. For assignments to character types, see DerivedBidiClass.txt [DerivedBIDI] in the [UCD].

That *is* the normative description of the default Bidi_Class for unassigned code points.

The DerivedBidiClass.txt file, as far as I understand, is mainly a condensation of bidi classes into character ranges (rather than giving them for each codepoint independently as in UnicodeData.txt). I.e. it can at any moment be derived automatically from UnicodeData.txt, and is as such not normative.

Because the default values for Bidi_Class are complicated, and cannot be derived simply by parsing the values for *assigned* characters in UnicodeData.txt, the listing of the default values for Bidi_Class in DerivedBidiClass.txt has to be taken as normative for those values.

Why is it then that the default class assignments are only given in this file (unless I have overlooked something)? And why is it that they are only given in comments?
Because the UnicodeData.txt file has no header (for historical compatibility).
Because, like the practice of putting <style> in HTML inside comments, these things (@missing) are in comments to protect older parsers.


And to go beyond what Asmus said there, the "@missing" hack was created as a syntax for specifying *the* default values for properties where it makes sense that they have a *single* default value. It doesn't work for specifying multiple default values differing by code point range. Hence adding the @missing comment in DerivedBidiClass.txt (or potentially adding it to PropertyValueAliases.txt) doesn't suffice for the entire definition.
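
To illustrate why a single default cannot cover Bidi_Class, here is a minimal Python sketch of a range-aware default lookup. The table is a hypothetical, deliberately incomplete excerpt (only the Hebrew and Arabic blocks), and the names DEFAULT_RANGES, FALLBACK and default_bidi_class are invented for this example; the full list of ranges should be taken from the DerivedBidiClass.txt of the Unicode version being targeted.

```python
from bisect import bisect_right

# Hypothetical, incomplete excerpt of range-specific defaults for
# unassigned code points. Must stay sorted by start and non-overlapping.
DEFAULT_RANGES = [
    (0x0590, 0x05FF, "R"),   # Hebrew block
    (0x0600, 0x06FF, "AL"),  # Arabic block
]

# The single value a lone @missing line can express.
FALLBACK = "L"

_STARTS = [lo for lo, _, _ in DEFAULT_RANGES]


def default_bidi_class(cp):
    """Return the default Bidi_Class for a code point with no explicit entry."""
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        lo, hi, value = DEFAULT_RANGES[i]
        if cp <= hi:
            return value
    return FALLBACK
```

A real implementation would consult this table only for code points that have no explicit Bidi_Class entry in the data files.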

I'm trying to create a program that takes all the bidi assignments (including default ones) and creates the data part of a bidi algorithm implementation, but I don't feel confident to code against stuff that's in comments. Any advice?


Use the values in the comments.

Remember that this is not *code* with comments that get stripped out before compiling. These are text data files for parsing. The fact that people are already parsing the @missing statements indicates that those are being treated normatively now. You could say the same thing for the titles, dates, and copyright notices on these data files: they aren't "optional" content to be ignored.
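
As a rough sketch of what "use the values in the comments" could look like in practice, the Python below reads both the @missing line and the ordinary data lines from text in the DerivedBidiClass.txt line format. SAMPLE is an illustrative stand-in rather than real file contents, and parse_range and parse_bidi_file are names made up for this example.

```python
import re

# Illustrative sample in the DerivedBidiClass.txt line format;
# NOT the real file contents.
SAMPLE = """\
# @missing: 0000..10FFFF; Left_To_Right
0041..005A    ; L # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
05D0..05EA    ; R # Lo  [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
0020          ; WS # Zs       SPACE
"""

# Matches "# @missing: START..END; Value" at the start of a comment line.
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+)\s*;\s*(\S+)"
)


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    start, _, end = field.strip().partition("..")
    return int(start, 16), int(end or start, 16)


def parse_bidi_file(text):
    """Return (defaults, assigned) as lists of (start, end, value) tuples."""
    defaults, assigned = [], []
    for line in text.splitlines():
        m = MISSING_RE.match(line)
        if m:
            # Treat the @missing comment as data, not as a throwaway comment.
            defaults.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
            continue
        data = line.split("#", 1)[0].strip()  # drop ordinary trailing comments
        if not data:
            continue
        rng, value = data.split(";", 1)
        lo, hi = parse_range(rng)
        assigned.append((lo, hi, value.strip()))
    return defaults, assigned
```

Note that the @missing line in this sample carries the long value name (Left_To_Right) while the data lines carry short names (L, R, WS), so a consumer would need to map between the two, for example via PropertyValueAliases.txt.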

Is it possible that this could be fixed (making it more normative, and putting it in a form that's easier to process automatically)?

This is part of a very large problem for creating a more complete and machine-parseable means of accessing *all* of the Unicode character property data, including data about the *status* of properties and their default values. It won't, IMO, be fixed by individual file fixes one at a time, although incremental improvement can be helpful.

Note that the UCD in XML was created to address this problem in part, but it still cannot answer many questions about the status of properties, their full derivations, their interactions, and their functions.

--Ken
