RE: General Categories Pe, Pf, Pi, Ps

Kent Karlsson via Unicode Sat, 24 Jan 2026 05:37:54 -0800
From: Unicode <[email protected]> On Behalf Of Jukka K. Korpela 
via Unicode
Sent: Friday, January 16, 2026 8:46 PM
…
> Generally, whether a character is closing, final, initial, or opening 
> punctuation should be based on language-specific
> information, such as CLDR.
I would advise against that, since 1) language information is not always 
available, 2) even when available, it is not reliable,
3) even when available and correct, people often use their primary language’s 
quotation convention, even for their second/third/… language…
For quotation marks, it is an unfortunate historical accident that different 
typographic traditions (not languages really) have
different conventions.
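
To make the category question concrete, here is a minimal sketch that looks up the General_Category values the Unicode Character Database assigns to some common quotation marks. It uses Python's standard unicodedata module; the selection of characters and the helper name quote_categories are my own choices for illustration, not something taken from the proposal.

```python
import unicodedata

# Quotation marks picked for illustration only (an assumption of this sketch).
QUOTES = [
    "\u0022",  # QUOTATION MARK
    "\u0027",  # APOSTROPHE
    "\u00AB",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u00BB",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u2018",  # LEFT SINGLE QUOTATION MARK
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u201A",  # SINGLE LOW-9 QUOTATION MARK
    "\u201C",  # LEFT DOUBLE QUOTATION MARK
    "\u201D",  # RIGHT DOUBLE QUOTATION MARK
    "\u201E",  # DOUBLE LOW-9 QUOTATION MARK
]


def quote_categories(chars=QUOTES):
    """Map each character to its General_Category (Ps, Pe, Pi, Pf, Po, ...)."""
    return {ch: unicodedata.category(ch) for ch in chars}


if __name__ == "__main__":
    for ch, cat in quote_categories().items():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cat}")
```

Note that U+201C is Pi (initial) in the data even though traditions that write „…“ use it as the closing mark, which is the kind of mismatch under discussion.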
For “ambiguous” quote marks (and for that matter apostrophes also when not used 
as quotation marks) and line breaking
I have proposed an update to the Unicode line breaking rules (not 
language/typographic tradition dependent) in
https://www.unicode.org/L2/L2025/25261r-line-breaking.pdf.
That should take care of the line breaking issue (very annoying at present) for 
“ambiguous” quote marks.
When it comes to the bidi issue with these marks, I note that other brackets 
now seem to be treated specially (I
haven’t yet checked the latest issue of the bidi algorithm), at least there is 
a new data file: https://www.unicode.org/Public/UNIDATA/BidiBrackets.txt. But 
“ambiguous” quote marks are not handled. One would
still need some bidi control characters (like RLM, LRM) to fix the issue. But 
people will generally not be knowledgeable and meticulous enough
to input them. So I would suggest adding to bidi 
processing that