My guess is that Pe, Pf, Pi and Ps were based on the usage of punctuation in English and some other languages. If this subclassification is taken too seriously, problems arise. For example, software that treats U+201D strictly as Pf handles text like xxx ”xxx” xxx badly: since U+201D is Pf, a line break is not permitted before it, even when a space intervenes. This is what MS Word does, irrespective of language settings, even for a language for which it knows that U+201D is both “start quotation” and “end quotation”.
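
To make the mismatch concrete, here is a minimal Python sketch. It reads the General Category of the three quotation marks discussed in this thread from the standard `unicodedata` module and compares it with a small per-language table. The table `QUOTE_PAIRS` and the helper `roles` are hypothetical names invented for this illustration. The English, German and Polish pairs are taken from the quoted message below. The Finnish pair is an added assumption, chosen as an example of a language in which U+201D both opens and closes a quotation. Real software would presumably read such pairs from CLDR rather than hard-code them.

```python
import unicodedata

# Quotation marks discussed in this thread.
QUOTES = ["\u201C", "\u201D", "\u201E"]


def general_categories(chars):
    """Map 'U+XXXX' to the Unicode General Category of each character."""
    return {f"U+{ord(c):04X}": unicodedata.category(c) for c in chars}


# Hypothetical per-language table of (opening, closing) quotation marks.
# en/de/pl follow the examples in the quoted message; "fi" is an assumed
# entry for a language that uses U+201D on both sides.
QUOTE_PAIRS = {
    "en": ("\u201C", "\u201D"),
    "de": ("\u201E", "\u201C"),
    "pl": ("\u201E", "\u201D"),
    "fi": ("\u201D", "\u201D"),
}


def roles(char, lang):
    """Return the roles ('opening', 'closing') that char plays in lang."""
    opening, closing = QUOTE_PAIRS[lang]
    result = set()
    if char == opening:
        result.add("opening")
    if char == closing:
        result.add("closing")
    return result


if __name__ == "__main__":
    print(general_categories(QUOTES))
    for lang in QUOTE_PAIRS:
        for char in QUOTES:
            print(lang, f"U+{ord(char):04X}", sorted(roles(char, lang)))
```

In this sketch U+201C is always Pi by General Category, yet it is the closing mark in the German row, and U+201D is always Pf, yet it also opens a quotation in the assumed Finnish row. The General Category alone cannot express either case.
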
Generally, whether a character is closing, final, initial, or opening punctuation should be determined from language-specific information, such as CLDR.

Yucca

On Fri, 16 Jan 2026 at 18:09, Marius Spix via Unicode ([email protected]) wrote:
> I wonder what is the point of the General Categories Pe, Pf, Pi and Ps?
>
> Different languages use different quotation marks, for example:
>
> English: “ (U+201C, Pi) + ” (U+201D, Pf)
> German: „ (U+201E, Ps) + “ (U+201C, Pi)
> Polish: „ (U+201E, Ps) + ” (U+201D, Pf)
>
> How does a character classify as closing, final, initial, or opening
> punctuation? Are there any general criteria?
>
> Best regards,
>
> Marius
