2011/8/25 Peter Constable <[email protected]>:
> From: [email protected] [mailto:[email protected]] On Behalf Of Philippe Verdy
>
>>2011/8/22 Joó Ádám <[email protected]>:
>>> Speaking of actual implementation, I’m convinced that this format
>>> should be the same as it is for encoded characters ...
>
>> As well, the small properties files can be embedded, in a very compact
>> form, in the PUA font.
>
> In one sense having data regarding PUA character properties embedded within a
> font could make sense since the interpretation of instances of those PUA
> characters will be tied to particular fonts.
>
> However, I don't see this as really being workable: rendering implementations
> will typically do certain types of processes without access to any font data.

Remove the future "will" in your sentence: you're assuming how future implementations will work. And the "certain types of processes" element is extremely fuzzy. Those who want to use PUA characters as RTL characters will never be satisfied with that; they want access to property data beyond what the UCD provides.

But you're right about one thing: the font is not expected to contain all those properties. I am still convinced that it is the best place for Bidi_Class (BC) property values, which are tied to the font for rendering purposes. Only the properties of PUA characters that have absolutely no use in rendering should stay out of fonts (for example collation weights, case mappings, or custom character name aliases if one wants them).

Some other properties may be needed for rendering, notably text segmentation data for handling line breaks. Many PUA characters are currently used for custom sinograms in the Han script, which allows a line break to occur before and after each of them, but this behavior would not be perceived as correct for most scripts.

I don't think that line breaking property data fits very well in fonts, though, because such segmentation is not needed only for rendering. On the other hand, for most of those non-rendering purposes (e.g. plain-text search), we generally don't want the search result to depend on soft line breaks. Soft line breaks are only meant for rendering, and so this breakability could also come under the control of the font.

By contrast, hard line breaks are controlled by existing non-PUA control characters, so they are not a problem and don't need to be overridden. Hard line breaks are very often expected to be searchable, unlike soft line breaks, which should remain invisible in plain-text searches as they are only the result of some rendering process.
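
To make the Bidi_Class suggestion concrete, here is a minimal sketch of what such a lookup could look like. It assumes a renderer that consults a font-supplied override table for PUA code points only and falls back to the UCD values otherwise. The table FONT_PUA_OVERRIDES, its layout, and the function names are hypothetical, invented for illustration; no existing font format or API is implied.

```python
import unicodedata

# Hypothetical override table that a PUA font might carry.
# Only rendering-related properties would live here; collation weights,
# case mappings and name aliases are deliberately absent.
FONT_PUA_OVERRIDES = {
    0xE000: {"bidi_class": "R"},   # assumed: a right-to-left PUA letter
    0xE001: {"bidi_class": "AL"},  # assumed: an Arabic-like PUA letter
}


def is_private_use(ch):
    """Return True if ch has General_Category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def bidi_class(ch, font_overrides=None):
    """Return the Bidi_Class to use when rendering ch.

    Font overrides are honoured for PUA code points only, so a font
    cannot change the behaviour of encoded characters. Everything
    else falls back to the UCD value.
    """
    if font_overrides and is_private_use(ch):
        override = font_overrides.get(ord(ch))
        if override and "bidi_class" in override:
            return override["bidi_class"]
    return unicodedata.bidirectional(ch)
```

Calling bidi_class("\ue000", FONT_PUA_OVERRIDES) would return the font's value, while the same call without a table, or for any non-PUA character, returns whatever the UCD says. Restricting overrides to PUA code points is a design choice of this sketch, not something required by the proposal above.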

