After some investigation, I'm taking my question/proposition back )

There are two major problems I found while implementing BPA-like (relaxed) rules for the quotation marks:

1) Paired quotation marks can also be used alone as apostrophes. With the information currently provided by the UCD, it's not possible to tell whether a quotation mark indicates the start/end of a quoted block or an apostrophe. Example: he said: ‘I don’t use apostrophes’ (here the left/right single quotation marks (U+2018/U+2019) are used both to quote what he said and as an apostrophe). A heuristic could use word boundaries to skip mid-word apostrophes; the problem is that a language-tailored word breaking implementation requires the text to be itemized into script runs first -- too complicated.

2) In some languages, the paired quotation marks can be swapped or used unpaired (example: »…» or »…«). Without additional information, it is easy to misinterpret the boundaries of the quoted block (example: »Danish«and»Polish»).
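To make both problems concrete, here is a minimal Python sketch of the kind of naive pairing I mean. It is only an illustration under my own assumptions, not the BPA and not anything proposed in PRI #231: the helper name `pair_quotes` is hypothetical, and it relies solely on the General_Category values (gc=Pi opens, gc=Pf closes) exposed by Python's `unicodedata` module.

```python
import unicodedata


def pair_quotes(text):
    """Naively pair quotation marks using only General_Category.

    A Pi character (initial quote) opens a pair; a Pf character (final
    quote) closes the most recently opened one. Returns a tuple
    (pairs, unmatched): pairs is a list of (open_index, close_index),
    unmatched is a list of indices that could not be paired.
    """
    stack = []      # indices of Pi characters still waiting for a closer
    pairs = []      # matched (open_index, close_index) tuples
    unmatched = []  # Pf characters that had nothing to close
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat == "Pi":
            stack.append(i)
        elif cat == "Pf":
            if stack:
                pairs.append((stack.pop(), i))
            else:
                unmatched.append(i)
    # Anything left on the stack was opened but never closed.
    return pairs, unmatched + stack


# Problem 1: U+2019 used as an apostrophe closes the quote too early.
# Problem 2: the swapped style has Pf first and Pi last, so nothing pairs.
for sample in ("he said: \u2018I don\u2019t use apostrophes\u2019",
               "\u00bbDanish\u00ab"):
    print(repr(sample), pair_quotes(sample))
```

In the first sample the apostrophe in "don’t" is taken as the closing mark and the real closing mark is left unmatched; in the second sample neither mark is paired at all. Telling these cases apart needs information the character properties alone do not carry.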
Perhaps, this should be somehow clarified in respective sections of UAX#24, "General Punctuation", etc.

Konstantin

2012/6/7 CE Whitehead <[email protected]>:
> Hi.
>
> From: Konstantin Ritt <ritt.ks_at_gmail.com>
> Date: Thu, 7 Jun 2012 13:06:04 +0300
>> Yep, forgot to mention that the difference is in that that some paired
>> quotation characters might be used alone in place of apostrophe, etc.
>> so that the BPA rules could be relaxed for the quotation marks.
>> Dunno about their mirroring in all languages. I thought the
>> BidiMirroring.txt is supposed to list a (language-independent)
>> characters and their respective mirrored brothers.
>
>> UAX#24 section 2.2 "Handling Characters with the Common Script Property"
>> states:
>>> In determining the boundaries of a run of text in a given script,
>>> programs must resolve any of the special script property values, such as
>>> Common, based on the context of the surrounding characters. A simple
>>> heuristic uses the script of the preceding character, which works well in
>>> many cases. However, this may not always produce optimal results. For
>>> example, in the text "... gamma (γ) is ...", this heuristic would cause
>>> matching parentheses to be in different scripts.
>>>
>>> Generally, paired punctuation, such as brackets or quotation marks,
>>> belongs to the enclosing or outer level of the text and should
>>> therefore match the script of the enclosing text. In addition, opening
>>> and closing elements of a pair resolve to the same script property
>>> values, where possible. The use of quotation marks is language dependent;
>>> therefore it is not possible to tell from the character code alone
>>> whether a particular quotation mark is used as an opening or closing
>>> punctuation. For more information, see Section 6.2,
>>> General Punctuation, of [Unicode].
>>>
>>> Some characters that are normally used as paired punctuation may also be
>>> used singly. An example is U+2019 right single quotation mark, which is
>>> also used as apostrophe, in which case it no longer acts as an enclosing
>>> punctuation. An example from physics would be <ψ| or |ψ>, where the
>>> enclosing punctuation characters may not form consistent pairs.
>
>> IIUC, this is the same problem like the one PRI #231 is intended to solve.
>
>> For the cases like "a«b»" one would expect similar results provided by
>> the UBA and the script itemization.
>
>> Konstantin
>
> 2012/6/7 Philippe Verdy <verdy_p_at_wanadoo.fr>:
>>> Their pairing and mirroring is not appropriate for all languages using
>>> them.
>>>
>>> 2012/6/7 Konstantin Ritt <ritt.ks_at_gmail.com>:
>>>> Actually, they have a respective entries in the BidiMirroring.txt:
>>>> 00AB; 00BB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
>>>> 00BB; 00AB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
>>>> and mapped into gc=Pi and gc=Pf.
>>>> Even without the per-language tailoring, it seems like a good basic
>>>> approximation, no?
>
> Phillipe is correct; Wikipedia gives some examples of language-specific
> variation in opening and closing quotation marks:
> http://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks
>
> (also of course as Konstantin notes the single quotation marks are used in
> some languages as apostrophes to indicate possession)
>
> I have not used say French-style quotations in facebook where parentheses
> get displayed at the wrong places if used in mixed right-to-left and
> left-to-right text. So I dunno what happens to quotation marks in
> mixed-directionality text yet.
>
> Best,
>
> --C. E. Whitehead
> [email protected]
>

