Jon,
Thanks for your reply.
On Oct 13, 2004, at 3:15 AM, you wrote:
imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get remapped internally to [U+1EBD] LATIN SMALL LETTER E WITH TILDE.
Is this kind of behavior what one would expect?
That's conformant. If it causes problems with any other process (including
other processes that are part of the system in question)
Like, for example, a rendering process?
then that other process isn't complying with conformance clause C9.
At a guess I'd say it's probably normalising to NFC, which is advantageous in
a lot of ways (for example, you should do this with data that has to conform
with the web's [draft] character model).
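Right; NFC would produce exactly the remapping above. Just as an illustration (Python's unicodedata here, nothing FileMaker-specific):

    import unicodedata

    decomposed = "\u0065\u0303"   # 'e' followed by COMBINING TILDE
    composed = unicodedata.normalize("NFC", decomposed)

    print(hex(ord(composed)))                                    # 0x1ebd
    print(unicodedata.normalize("NFD", composed) == decomposed)  # True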
One of the clearest advantages is that it makes searching a lot more
efficient, as only one of the potentially very many canonically equivalent
sequences will have to be searched for
Yes.
(though case-insensitive and/or diacritical-insensitive searches will still have many possible matching strings).
Yup.
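To illustrate the point about searching: a needle in one normalization form won't match a haystack in another unless both are brought to the same form first. A quick sketch, again just using Python's unicodedata as a stand-in:

    import unicodedata

    # haystack stored precomposed (NFC), needle typed in decomposed form
    haystack = unicodedata.normalize("NFC", "transcri\u0063\u0327a\u0303o")  # "transcrição"
    needle = "c\u0327a\u0303"   # <c, combining cedilla, a, combining tilde>

    print(needle in haystack)                                  # False
    print(unicodedata.normalize("NFC", needle) in haystack)    # True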
On the other hand there are potential security risks with such normalisation, and perhaps therefore it is something that should be configurable.
It's problematic (and buglike) for at least one reason: one needs to put all these precomposed things in one's font, or FileMaker doesn't display them properly.
That's where the problem lies, not in the normalisation.
Maybe they ought to render glyphs according to the characters actually present in the font, with a fallback via decomposition. If they normalize and then simply throw up the missing-character box, that's not very helpful.
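Something along these lines is what I have in mind. A rough sketch (font_has_glyph is a hypothetical predicate standing in for a cmap lookup, not any real API):

    import unicodedata

    def glyphs_for(ch, font_has_glyph):
        # Use the precomposed glyph if the font has one ...
        if font_has_glyph(ch):
            return ch
        # ... otherwise fall back to the canonical decomposition ...
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed != ch and all(font_has_glyph(c) for c in decomposed):
            return decomposed
        # ... and only then give up and show the replacement character.
        return "\ufffd"

    coverage = {"e", "\u0303"}   # a font with 'e' and COMBINING TILDE only
    print(glyphs_for("\u1ebd", coverage.__contains__))   # 'e' + U+0303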
I built a tidy IPA transcription font, lacking many precomposed things. Importing and exporting a data subset in FM7 reveals a total of 113 characters not displaying properly. This is annoying, to say the least.
One reason I wanted a *small* font is that in PDF generation big fonts may not always be subsetted properly, and even a single-page PDF can end up embedding the whole font.
Also, there is extra overhead with a big font that seems to slow things down a bit, even on a fast machine.
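For what it's worth, one way to count the characters in an export that a given font has no glyphs for is something like the following, using fontTools (the file names are just placeholders):

    from fontTools.ttLib import TTFont

    cmap = TTFont("ipa_font.ttf").getBestCmap()   # code point -> glyph name

    with open("export_utf16.txt", encoding="utf-16") as f:
        text = f.read()

    missing = sorted({ch for ch in text if ord(ch) not in cmap and not ch.isspace()})
    print(len(missing), [f"U+{ord(c):04X}" for c in missing])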
I'm assuming it will export the data in decomposed form ... but haven't actually tried that yet ...
I wouldn't assume anything of the sort. Normalising to NFD would be quite
unusual.
Yes, I realize that now. And my test confirms that the internal normalization is also what you get on export. And hence those 113 empty boxes ...
BTW, this application supports import of UTF-8, but will not export UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal storage form).
Odd indeed.
Well, maybe they're saving UTF-8 export for a future release ... though I can't imagine why.
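In the meantime, re-encoding the UTF-16 export to UTF-8 after the fact is trivial, e.g. (placeholder file names, and assuming the exported file starts with a BOM):

    with open("export_utf16.txt", encoding="utf-16") as src, \
         open("export_utf8.txt", "w", encoding="utf-8") as dst:
        dst.write(src.read())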
-Richard

