My guess is that Pe, Pf, Pi and Ps were based on the usage of punctuation
in English and some other languages. If this subclassification is taken too
seriously, problems will arise. For example, software that takes the Pf
category of U+201D too literally handles text like xxx ”xxx” xxx badly:
since U+201D is Pf, it forbids a line break before it, even when a space
intervenes.
This is what MS Word does, irrespective of language settings, even for a
language for which it knows that U+201D is both “start quotation” and “end
quotation”.
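To make the effect concrete, here is a minimal Python sketch. It uses the standard unicodedata module to look up the General Category, plus a deliberately naive, hypothetical rule (naive_break_allowed) that forbids a break at a space whenever the next non-space character is Pf. This is not how MS Word or UAX #14 actually decide line breaks; it only shows what the "Pf means closing" assumption does to text where U+201D also opens a quotation.

```python
import unicodedata

def naive_break_allowed(text: str, i: int) -> bool:
    """Hypothetical rule: a line break is allowed at the space at index i,
    unless the next non-space character has General Category Pf."""
    if text[i] != " ":
        return False
    j = i
    while j < len(text) and text[j] == " ":
        j += 1
    if j < len(text) and unicodedata.category(text[j]) == "Pf":
        return False  # Pf treated as "closing": never start a line with it
    return True

# U+201D used both as the opening and the closing quotation mark.
text = "xxx \u201dxxx\u201d xxx"
print(unicodedata.category("\u201d"))  # Pf
breaks = [i for i in range(len(text)) if naive_break_allowed(text, i)]
print(breaks)  # [9]: the break before the opening ” (index 3) is lost
```

Only the space after the closing quotation mark remains a break opportunity, so a line can never end just before the quoted word.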

Generally, whether a character acts as closing, final, initial, or opening
punctuation should be decided from language-specific information, such as
the data in CLDR.
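A minimal sketch of what a language-aware lookup could look like, in Python. The QUOTES table is modeled on CLDR's quotationStart/quotationEnd delimiter fields, but the values are hard-coded here for illustration only (English, German and Polish as in the quoted message, plus Finnish, where ” both opens and closes); real software should read them from CLDR data. The function quote_role and its return labels are hypothetical.

```python
import unicodedata

# Primary quotation delimiters per language, modeled on CLDR's
# quotationStart/quotationEnd. Hard-coded for illustration only.
QUOTES = {
    "en": ("\u201c", "\u201d"),  # “ ”
    "de": ("\u201e", "\u201c"),  # „ “
    "pl": ("\u201e", "\u201d"),  # „ ”
    "fi": ("\u201d", "\u201d"),  # ” ”
}

def quote_role(ch: str, lang: str) -> str:
    """Return 'opening', 'closing', 'ambiguous' or 'other' for ch in lang."""
    start, end = QUOTES[lang]
    if ch == start and ch == end:
        return "ambiguous"  # only the position in the text can decide
    if ch == start:
        return "opening"
    if ch == end:
        return "closing"
    return "other"

# Compare the fixed General Category with the language-dependent role.
for ch in ("\u201c", "\u201d", "\u201e"):
    for lang in QUOTES:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), lang, quote_role(ch, lang))
```

U+201C is Pi everywhere, yet it is an opening mark in English and a closing mark in German; U+201D is Pf, yet in Finnish it is both.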

Yucca


On Fri 16.1.2026 at 18.09, Marius Spix via Unicode ([email protected])
wrote:

> I wonder what the point of the General Categories Pe, Pf, Pi and Ps is.
>
> Different languages use different quotation marks, for example:
>
> English:  “ (U+201C, Pi) + ” (U+201D, Pf)
> German: „ (U+201E, Ps) + “ (U+201C, Pi)
> Polish: „ (U+201E, Ps) + ” (U+201D, Pf)
>
> How is a character classified as closing, final, initial, or opening
> punctuation? Are there any general criteria?
>
> Best regards,
>
> Marius
