On 7/8/2013 1:35 PM, Whistler, Ken wrote:
A much more productive approach, it seems to me, would be instead to try to establish information about various, identifiable typographical traditions for use of punctuation around the world, and then associate "exemplar sets" of punctuation used with those traditions.
I would recommend that an approach like that be used "behind the scenes" to manage the update of the data file.
We are stuck with a format that seemingly assumes that all characters are treated individually. However, I agree with you that this is not really the case; instead, there are sets of punctuation marks associated with certain "typographical traditions".
In addition, there are cases like the dandas, where specific marks have been unified across a range of related scripts.
A flexible way to pull this information together would be a UTN that collects it in human-readable, not machine-readable, form, with commentary and background.
If the information in the UTN is considered solid, it could then be reflected, in a separate pass, in the existing property file. Because you would be working on the basis of typographical sets (or explicit encoding decisions), there would be less temptation to jiggle individual characters' property values.
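To make the "behind the scenes" idea concrete, here is a minimal sketch of how tradition-based exemplar sets could be maintained as the source of truth and then inverted into the per-character entries the existing file format expects. The tradition names and set contents below are purely illustrative, not actual CLDR or UCD data.

```python
# Hypothetical master data: punctuation exemplar sets keyed by
# typographical tradition (names and contents are made up for
# illustration; the real sets would come from the proposed UTN).
TRADITION_PUNCTUATION = {
    "european": {".", ",", ";", ":", "!", "?", "\u201C", "\u201D"},
    "east_asian": {"\u3001", "\u3002", "\u300C", "\u300D"},
    "devanagari": {"\u0964", "\u0965"},  # danda and double danda,
                                         # shared across related scripts
}

def per_character_entries(traditions):
    """Invert tradition -> set into character -> list of traditions,
    i.e. derive the individual-character view the property file needs."""
    entries = {}
    for tradition, chars in sorted(traditions.items()):
        for ch in sorted(chars):
            entries.setdefault(ch, []).append(tradition)
    return entries

entries = per_character_entries(TRADITION_PUNCTUATION)
# The danda appears once, via its tradition, instead of being
# edited character-by-character in the generated file.
print(entries["\u0964"])  # ['devanagari']
```

The point of the inversion step is exactly the one made above: edits happen at the level of whole sets (or explicit encoding decisions), and the per-character file is regenerated rather than hand-tweaked.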
A./

