> The problem is as follows:
>
> You see an interpolation "25a to 25c". How do you know that this means
> "25a, 25b, 25c"? You know by removing the number and then, starting with
> the "a", going through code points adding one until you reach "c". Easy.
> This will work for all alphabets that are laid out in alphabetical
> order in Unicode, and they probably all are. (But that's an assumption on
> my part :-)

Ouch. Unicode order has no meaning in the real world, and it only really
works for English (and not even properly then in subtle cases such as
ligatures, not that these would ever be used in this kind of address).
You need to know the lexical ordering, which means you need to know the
language. Sometimes you can guess the language from the character, and two
characters make that easier than one, but the problem doesn't go away with
two, so the "null" variant isn't central to this problem.

There's also a cultural assumption about how you might do this in other
countries. I've no idea how Chinese addresses are normally formulated
(whether they even use digits, and if so whether those digits are the
Arabic numerals), let alone what these exceptional cases might be. But IF
you know it is Chinese and IF the scheme fits, with digits + Chinese
character, then the null case still works. (Chinese characters still have
a lexical ordering. I believe it has to do with the number of strokes, but
any relationship to Unicode order is purely coincidental.)

So I'm coming round to the view that "alphabetic" should explicitly mean
only

    n nA nB ... nZ

where you can start and end at any point in the sequence, and should not
even try to deal with characters from other alphabets (not even other
Latin ones). Any sequence from another culture needs its own interpolation
style, or an additional qualifying tag to identify it, just as we'd tag an
email with its encoding.

David

_______________________________________________
talk mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/talk
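
To make the restricted "alphabetic" scheme above concrete, here is a minimal sketch in Python. The function name `expand_alphabetic` and the "N to Nx" input format are assumptions made for illustration only, not an agreed tagging syntax. It accepts only ASCII digits and the letters A to Z (or a to z), treats the bare number as the first item of the sequence, and rejects everything else rather than guessing from code points.

```python
import re
import string

# Hypothetical input format: "<number><letter?> to <number><letter>".
# [0-9] and [A-Za-z] are spelled out on purpose: in Python 3, \d and \w
# also match non-ASCII digits and letters, which is what we want to avoid.
_RANGE = re.compile(r"([0-9]+)([A-Za-z]?)\s+to\s+([0-9]+)([A-Za-z])")


def expand_alphabetic(interpolation):
    """Expand e.g. "25a to 25c" into ["25a", "25b", "25c"].

    The sequence is n, nA, nB, ... nZ, and a range may start and end
    at any point in it. Anything outside plain ASCII raises ValueError.
    """
    match = _RANGE.fullmatch(interpolation.strip())
    if match is None:
        raise ValueError(f"not a plain ASCII alphabetic range: {interpolation!r}")
    start_num, start_sfx, end_num, end_sfx = match.groups()
    if start_num != end_num:
        raise ValueError("the number must be the same at both ends")

    # Both suffixes must use the same case; the empty start suffix is neutral.
    suffixes = start_sfx + end_sfx
    if suffixes.islower():
        letters = string.ascii_lowercase
    elif suffixes.isupper():
        letters = string.ascii_uppercase
    else:
        raise ValueError("suffixes must not mix upper and lower case")

    # Explicit sequence: the "null" variant first, then the 26 letters.
    sequence = [""] + list(letters)
    first = sequence.index(start_sfx)
    last = sequence.index(end_sfx)
    if first > last:
        raise ValueError("range runs backwards")
    return [start_num + suffix for suffix in sequence[first:last + 1]]


print(expand_alphabetic("25a to 25c"))  # ['25a', '25b', '25c']
print(expand_alphabetic("25 to 25B"))   # ['25', '25A', '25B']
```

The ordering comes from an explicit list rather than from adding one to a code point, so a range such as "25ä to 25ö" is refused outright and would need its own interpolation style.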

