Richard Wordingham asked:

> How many examples do I need to collect to add Tai Tham to the script
> extensions property for ... ?

IMO, a couple of well-documented examples ought to suffice.

But this query raises a couple of further questions for me regarding the scalability and maintenance of ScriptExtensions.txt.

The first is that reports coming in of "Script X character Y is also used with Script Z" are proving to be a rather haphazard and ad hoc way of maintaining that data file and the related property. It seems as if additions to the data file are motivated more by who is paying attention to what this month than by any overall measure of objective validity or implementation usefulness of the property. I'm not sure what alternative there is now, but I find it very distasteful that the UTC has been forced into the mode of property maintenance for such a subjective and haphazard collection of observations about common usage.

The second question is this: what likelihood is there that a full implementation of Tai Tham will not also be expected to be capable of handling all of Thai? In such a case, isn't a series of ad hoc observations about common use of punctuation between the scripts somewhat superfluous?

I ask that because the situation echoes the rather more extensive situation of East Asian punctuation usage for ideographic or syllabic scripts typeset together with Chinese. Trying to track all of those instances down and get them all enshrined in ScriptExtensions.txt strikes me as a losing proposition already -- and the situation is likely to just get worse as more historic scripts from East Asia eventually end up in Unicode.

A much more productive approach, it seems to me, would be to try instead to establish information about the various identifiable typographical traditions for use of punctuation around the world, and then associate "exemplar sets" of punctuation with those traditions.
Such an approach, I assert, would tend to be much more robust (as well as more comprehensible) than maintaining very fragile set definitions that associate lists of scripts one-by-one with various characters.

--Ken
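
To make the contrast between the two models concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tradition name `east_asian`, the two-character exemplar set, the script assignments, and the helper names `punctuation_for` and `scripts_for` are hypothetical and are not taken from ScriptExtensions.txt or from any UTC document.

```python
# Illustrative only: the tradition name, exemplar set and script
# assignments below are hypothetical placeholders, not real UCD data.

# Per-character model: every character carries its own list of scripts,
# so each newly observed script means touching every affected character.
per_character = {
    "\u3001": {"Hani", "Hira", "Kana"},  # IDEOGRAPHIC COMMA
    "\u3002": {"Hani", "Hira", "Kana"},  # IDEOGRAPHIC FULL STOP
}

# Tradition model: punctuation is listed once per typographical
# tradition, and each script simply names the traditions it follows.
traditions = {
    "east_asian": {"\u3001", "\u3002"},
}
script_traditions = {
    "Hani": {"east_asian"},
    "Hira": {"east_asian"},
    "Kana": {"east_asian"},
}


def punctuation_for(script):
    """Return the exemplar punctuation for a script via its traditions."""
    result = set()
    for name in script_traditions.get(script, ()):
        result |= traditions[name]
    return result


def scripts_for(char):
    """Derive the per-character view (scripts using char) on demand."""
    return {
        script
        for script, names in script_traditions.items()
        if any(char in traditions[name] for name in names)
    }
```

Under these assumptions, associating one more script with the tradition is a single entry in `script_traditions`, and the per-character lists fall out of `scripts_for` rather than being edited character by character.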

