2011/8/22 Peter Constable <peter...@microsoft.com>:
> From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy
>
>>> As I explained in an earlier message, the layout engine doesn't use
>>> the "default" property value but the resolved bidi level.
>>
>> Once again, you refuse to understand my arguments.
>
> I don't think I'm refusing to understand anything. I'm merely taking your
> assertions _as stated_ and evaluating whether I think they are accurate or
> not. Perhaps what you intend to convey assumes things not clear in what
> you've stated, since you think I'm not understanding you.
>
>
>> What I'm saying is that OpenType CANNOT resolve the bidi level of
>> PUAs (with the exception where we use additional BiDi controls,
>
> Of course _OpenType_ cannot, but any rendering engine that uses OpenType
> _must_ resolve the bidi level of _all_ characters in a sequence that it is
> given to render. Given our current situation, a default rendering
> implementation would resolve PUA characters to an even (LTR) level unless, of
> course, bidi control characters -- particularly RLO -- are used to override
> the directionality of the character, as you mention.
>
>> which remains a hack, because it adds unnecessary unvisible markup
>> around the encoded texts, and complexifies the use of strings and
>> substrings).
>
> We'll, depending on how you define "hack", some might reasonably suggest that
> any usage of PUA is "a hack". (Of course, some who may not use the term in
> the same way might argue that it is certainly not "a hack".)
>
> You can turn the problem as you want, but PUAs (as well as unknown
> characters) still have default properties that, in fine, will get used in
> absence of a more precise definition (i.e. an explicit override) of the
> actual BiDi property needed for the character.
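The default property discussed above can be checked against any copy of the Unicode Character Database. As a minimal sketch, assuming Python's standard `unicodedata` module as the UCD source (the sample labels and the helper name `default_bidi_classes` are only illustrative), the following prints the default Bidi_Class of one code point from each Private Use range, next to an Arabic and a Latin letter for comparison:

```python
import unicodedata

# One sample code point from each Private Use range, plus two
# ordinary letters for comparison. The labels are illustrative only.
SAMPLES = {
    "BMP PUA U+E000": "\ue000",
    "Plane 15 PUA U+F0000": "\U000f0000",
    "Plane 16 PUA U+100000": "\U00100000",
    "ARABIC LETTER BEH": "\u0628",
    "LATIN SMALL LETTER A": "a",
}


def default_bidi_classes(chars):
    """Map each label to the default Bidi_Class recorded in the UCD."""
    return {label: unicodedata.bidirectional(ch) for label, ch in chars.items()}


if __name__ == "__main__":
    for label, bidi_class in default_bidi_classes(SAMPLES).items():
        print(f"{label}: {bidi_class}")
```

The three Private Use samples should all report the strong class L, which is the default that a renderer falls back on when nothing overrides it.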
So now I see your position:

- You don't want the solution proposed by Michael Everson (simply adding a range of RTL PUA), which I also think is not necessary, but which is clearly a possible solution.

- You propose to use BiDi overrides. Like Michael Everson, I think this is an impractical hack. Michael has to work on old scripts, and on many new unencoded characters to be added to existing scripts (notably Arabic): he tries to encode them, finds various ways to represent them, and *tests* his solutions. He will certainly think that embedding each occurrence of a PUA substring in BiDi controls, including in the middle of Arabic words, is a very bad hack.

He must certainly think (and I think so too) that PUA characters are NOT hacks. They are architectural to the well-being of the UCS, and essential in various situations to preserve software conformance with the standard.

In fact, for old and rare scripts, using PUAs will remain essential for a long time, because those scripts will now need more and more time to get encoded. Encoding them requires more extensive research and more collaboration with less technically aware people, who cannot understand why they have to test the proposed solutions using test fonts and test input methods that require them to enter BiDi controls around all those PUA characters.

The only problem here is the strong LTR property of all existing PUAs, as if they were only needed for rare Han sinograms or for symbols.

Note that, when using PUAs for rare letters found in Arabic, it is impossible to embed the whole Arabic text in BiDi overrides: this would completely break the normal behavior of the non-PUA characters found in the text, notably sequences of Arabic digits, because the BiDi overrides effectively disable the BiDi algorithm, which then returns a single RTL run for all the text enclosed in these controls. If BiDi controls are used, they have to be inserted ONLY around the subranges containing the PUAs, and only those.
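To make the last point concrete, here is a minimal sketch, assuming Python and a hypothetical helper named `wrap_pua_runs`, of what "only around the PUA subranges" means: each maximal run of Private Use characters gets its own RLO ... PDF pair, and everything else (Arabic letters, digits, spaces) is left to the normal BiDi algorithm. It illustrates the workaround being criticized; it is not a recommendation.

```python
import re

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Maximal runs of Private Use characters: the BMP PUA and planes 15 and 16.
PUA_RUN = re.compile(
    "[\ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]+"
)


def wrap_pua_runs(text):
    """Enclose each run of PUA characters, and nothing else, in RLO ... PDF."""
    return PUA_RUN.sub(lambda match: RLO + match.group(0) + PDF, text)


if __name__ == "__main__":
    # Arabic BEH, two PUA characters, Arabic BEH, a space, then digits.
    sample = "\u0628\ue000\ue001\u0628 123"
    wrapped = wrap_pua_runs(sample)
    # Show the result as code points so the invisible controls are visible.
    print(" ".join(f"U+{ord(ch):04X}" for ch in wrapped))
```

Even in this small case the stored string grows by two invisible characters per PUA run, and any substring operation has to keep the pairs balanced.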
The solution proposed by Michael (a new block of RTL PUAs, probably in plane 14) still has an advantage: no BiDi controls are needed at all. The BiDi algorithm does not have to be disabled. All other aspects of RTL scripts (or mixed RTL/LTR scripts) are preserved, including mirroring behaviors for "auto-LTR" characters (at the beginning of paragraphs) and for characters whose directionality depends on the resolved direction of the preceding text.

I don't think this is necessary, though: I see no reason why implementations *have to* keep the strong LTR property of existing PUAs. This strong LTR property is only the *default* value for those PUAs, and applications should not be restricted from changing this property as they want, especially for PUAs.

But to change this property value, we need an explicit PUA agreement about their usage, expressed in such a way that it can be understood by a computer. This means an external source of character properties. My opinion is that such a source is most often sufficient if it solves just the problem of correct display order.

Given that the encoded texts (using those existing strong LTR PUAs to which we want to give an RTL behavior) do not explicitly encode the PUA agreement, the source of the PUA agreement cannot be the encoded text (BiDi controls are definitely not a demonstration of such a PUA agreement).

For me, it would be simple to embed this PUA agreement, for computer use, in a font suitable for displaying those PUAs (let's remember that those PUAs will still need such a specific font, which *must* match the same PUA agreement). All that is needed then, in such a PUA font, is an indication of which PUA characters (those that are "cmap'ed" in it) are RTL and which are not. This just requires a new, very small table in the PUA font, to help the text layout engine correctly resolve the direction of text runs when it implements the BiDi algorithm.
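As a rough sketch of how a layout engine could consume such a table, assuming Python, its standard `unicodedata` module as the source of default properties, and a purely hypothetical table layout (a list of inclusive code point ranges that the font declares as RTL; no such OpenType table exists today), the character class fed into the BiDi algorithm could be resolved like this:

```python
import unicodedata

# Hypothetical content of the small per-font table: inclusive ranges of
# PUA code points that this particular font declares to be right-to-left.
FONT_RTL_PUA_RANGES = [(0xE000, 0xE07F)]


def effective_bidi_class(ch, rtl_ranges=FONT_RTL_PUA_RANGES):
    """Return the Bidi_Class to feed into the BiDi algorithm for `ch`.

    Only Private Use characters (General_Category Co) can be overridden
    by the font; every other character keeps its standard UCD value.
    """
    default = unicodedata.bidirectional(ch)
    if unicodedata.category(ch) != "Co":
        return default
    code_point = ord(ch)
    if any(first <= code_point <= last for first, last in rtl_ranges):
        return "R"
    return default


if __name__ == "__main__":
    for ch in ("\ue000", "\ue100", "\u0628", "a"):
        print(f"U+{ord(ch):04X}: {effective_bidi_class(ch)}")
```

The rest of the BiDi algorithm runs unchanged on these classes, so digits, mirroring and neutrals around the PUA characters behave as they do in any other RTL text, and no controls have to be stored in the document.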
Then, if needed, the standard "rtlm" and "ltrm" features (for glyph-level mirroring, in the case of OpenType layout) can be used reliably. The "rtla" and "ltra" features for typographic variants may also be used, but they are probably much less essential for PUA characters, which are likely to be represented in the PUA font with just a single typographic variant per cmapped glyph.

If the PUA font does not have such information about which of its cmapped PUAs are RTL, all of them will resolve as LTR, unless a third data source is used and associated with the document and its PUA font to specify their effective BiDi properties. In practice this is more complex to manage, notably for plain-text documents: plain-text editors would have to load a separate properties file in addition to loading the plain-text document and selecting the appropriate PUA font.

When editing or viewing a rich-text document (e.g. HTML in an HTML editor, a word-processor document, or an online Wiki document) in WYSIWYG mode, that rich-text document will need to supply the reference to that source of information in some metadata field, just as it can also store the font to select, in order to render the text in the expected order. This can be done automatically, without user interaction each time the document is loaded, but it won't be as easy for plain-text documents (including when editing the plain-text HTML or Wiki source code).

So there are only two options:

- (1) the solution advocated by Michael Everson (a new RTL PUA block, say in plane 14); it does not require a change in the BiDi algorithm itself, but renderers must implement the new version of the UCD.

- (2) an external source to override the strong LTR property of existing PUA blocks (my opinion is that the PUA font is the perfect place for this PUA information), with renderers using this information.
(I advocate placing this information directly in PUA fonts, something that some font formats already do, but not OpenType for now.)

In both cases, the text renderers have to be modified (including renderers for OpenType, whose specification will need to be updated to change the BiDi algorithm implementation): this requires approval either by the UTC & WG2 (solution 1) or by the OpenType working group (solution 2).

-- Philippe.