> From: Philippe Verdy <[email protected]>
> Date: Thu, 24 Apr 2014 17:11:23 +0200
> Cc: Asmus Freytag <[email protected]>, Ilya Zakharevich
>  <[email protected]>, [email protected],
>  James Clark <[email protected]>, unicode Unicode Discussion
>  <[email protected]>
>
> > In addition, assuming that by "guillemets" Philippe means U+00AB and
> > U+00BB,
>
> "guillemet" is THE correct name, even in English. "guillemot" comes from an
> old typo error.

I didn't mean to say "guillemet" was a typo; I just wasn't sure which
Unicode codepoint you had in mind, since you didn't show its full
official name or its codepoint.  And at least your original message
used "<<" and ">>" transliterations, not the actual characters.

> > they cannot possibly form a bracketed pair, because their
> > General Category is not Ps and Pe. For that reason, you will never
> > find them in BidiBrackets.txt.
>
> Forget the general category, we know that it does not solve any
> internationalization issue correctly. All past versions of Unicode
> algorthms that initially attempted to use them now use them only as
> informative rules (which are not stabilized) to help generate new "derived"
> properties (which should be used verbatim from the content of the UCD,
> because rapidly new exceptions are added to the rules).
>
> The guillemet evidently form a pair even if their use depends on languages
> which may swap their role (and this is the main reason why they are not
> assigned Ps and Pe because Ps and Pe will be swapped. They are still a pair
> which works even better than """ that can be paired in 3 different ways and
> not just two (meaning that you don't know which one to look for.

They are not a pair for the purposes of the PBA, which is the subject
of this discussion.  Your message, viz.:

> - later the closing guillemet matches the opening guillemet remaining on
> the stack, even if the second opening bracket was pushed on top of it :
> pair of guillemets is matched, the opening guillement is dropped from the
> stack but the second bracket on top of it remains there and can also match
> now the following closing bracket.

indicated that you thought the guillemets could form a bracket pair,
which they cannot, according to the UBA.

> So nothing (at least not the reason of the GC which is just an intermediate
> but incomplete helper) forbids the guillemets to be listed in
> BidiBrackets.txt.

They don't satisfy the conditions for that.

From BidiBrackets.txt:

  # This file lists the set of code points with Bidi_Paired_Bracket_Type
  # property values Open and Close. The set is derived from the character
  # properties General_Category (gc), Bidi_Class (bc), Bidi_Mirrored (Bidi_M),
  # and Bidi_Mirroring_Glyph (bmg), as follows: two characters, A and B,
  # form a bracket pair if A has gc=Ps and B has gc=Pe, both have bc=ON and
  # Bidi_M=Y, and bmg of A is B. Bidi_Paired_Bracket (bpb) maps A to B and
  # vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
  # Open (o) and Close (c), respectively.

As you see, Ps and Pe are explicitly required.
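
For what it's worth, the gc, bc and Bidi_M conditions from the BidiBrackets.txt header can be checked roughly with Python's standard unicodedata module. This is only a sketch: the helper name gc_bc_mirrored_ok is made up for illustration, and it does not check the bmg condition, because unicodedata does not expose Bidi_Mirroring_Glyph. The authoritative data remains BidiBrackets.txt itself.

```python
import unicodedata


def gc_bc_mirrored_ok(opener: str, closer: str) -> bool:
    """Check the gc, bc and Bidi_M conditions for a candidate bracket pair.

    Hypothetical helper, not part of any Unicode tooling.
    Assumption: the bmg condition (Bidi_Mirroring_Glyph of opener is closer)
    is NOT checked here, since unicodedata has no accessor for it.
    """
    return (
        unicodedata.category(opener) == "Ps"             # A has gc=Ps
        and unicodedata.category(closer) == "Pe"         # B has gc=Pe
        and unicodedata.bidirectional(opener) == "ON"    # both have bc=ON
        and unicodedata.bidirectional(closer) == "ON"
        and unicodedata.mirrored(opener) == 1            # both have Bidi_M=Y
        and unicodedata.mirrored(closer) == 1
    )


# Compare the ASCII parentheses with the guillemets U+00AB / U+00BB.
for opener, closer in [("(", ")"), ("\u00ab", "\u00bb")]:
    print(
        f"U+{ord(opener):04X} gc={unicodedata.category(opener)}, "
        f"U+{ord(closer):04X} gc={unicodedata.category(closer)}: "
        f"{gc_bc_mirrored_ok(opener, closer)}"
    )
```

The parentheses pass all three checks, while U+00AB and U+00BB fail on the first one: their General Category is Pi and Pf, not Ps and Pe, even though both are bc=ON and Bidi_M=Y.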

