Jon,
Thanks for your reply.
On Oct 13, 2004, at 3:15 AM, you wrote:
imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get remapped internally to [U+1EBD] LATIN SMALL LETTER E WITH TILDE.
Is this kind of behavior what one would expect?
That's conformant. If it causes problems with any other process (including
other processes that are part of the system in question)
Like, for example, a rendering process?
then that other process isn't complying with conformance clause C9.
At a guess I'd say it's probably normalising to NFC, which is advantageous in
a lot of ways (for example, you should do this with data that has to conform
with the web's [draft] character model).
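Right; NFC would produce exactly the remapping above. Just as an illustration (Python's unicodedata here, nothing FileMaker-specific):

    import unicodedata

    decomposed = "\u0065\u0303"   # 'e' followed by COMBINING TILDE
    composed = unicodedata.normalize("NFC", decomposed)

    print(hex(ord(composed)))                                    # 0x1ebd
    print(unicodedata.normalize("NFD", composed) == decomposed)  # True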
One of the clearest advantages is that it makes searching a lot more
efficient, as only one of the potentially very many canonically equivalent
sequences will have to be searched for
Yes.
(though case-insensitive and/or diacritical-insensitive searches will still have many possible matching strings).
Yup.
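To illustrate the point about searching: a needle in one normalization form won't match a haystack in another unless both are brought to the same form first. A quick sketch, again just using Python's unicodedata as a stand-in:

    import unicodedata

    # haystack stored precomposed (NFC), needle typed in decomposed form
    haystack = unicodedata.normalize("NFC", "transcri\u0063\u0327a\u0303o")  # "transcrição"
    needle = "c\u0327a\u0303"   # <c, combining cedilla, a, combining tilde>

    print(needle in haystack)                                  # False
    print(unicodedata.normalize("NFC", needle) in haystack)    # True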
On the other hand there are potential security risks with such normalisation, and perhaps therefore it is something that should be configurable.
It's problematic (and buglike) for at least one reason: one needs to put all these precomposed things in one's font, or FileMaker doesn't display them properly.
That's where the problem lies, not in the normalisation.
Maybe they ought to render glyphs according to the characters actually present in the font, with a fallback via decomposition. If they normalize and then simply throw up the missing-character box, that's not very helpful.
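Something along these lines is what I have in mind. A rough sketch (font_has_glyph is a hypothetical predicate standing in for a cmap lookup, not any real API):

    import unicodedata

    def glyphs_for(ch, font_has_glyph):
        # Use the precomposed glyph if the font has one ...
        if font_has_glyph(ch):
            return ch
        # ... otherwise fall back to the canonical decomposition ...
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed != ch and all(font_has_glyph(c) for c in decomposed):
            return decomposed
        # ... and only then give up and show the replacement character.
        return "\ufffd"

    coverage = {"e", "\u0303"}   # a font with 'e' and COMBINING TILDE only
    print(glyphs_for("\u1ebd", coverage.__contains__))   # 'e' + U+0303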
I built a tidy IPA transcription font, lacking many precomposed things. Importing and exporting a data subset in FM7 reveals a total of 113 characters not displaying properly. This is annoying, to say the least.
One reason I wanted a *small* font is that in PDF generation big fonts may not always be subsetted properly, and even a single-page PDF can end up embedding the whole font.
Also, there is extra overhead with a big font that seems to slow things down a bit, even on a fast machine.
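For what it's worth, one way to count the characters in an export that a given font has no glyphs for is something like the following, using fontTools (the file names are just placeholders):

    from fontTools.ttLib import TTFont

    cmap = TTFont("ipa_font.ttf").getBestCmap()   # code point -> glyph name

    with open("export_utf16.txt", encoding="utf-16") as f:
        text = f.read()

    missing = sorted({ch for ch in text if ord(ch) not in cmap and not ch.isspace()})
    print(len(missing), [f"U+{ord(c):04X}" for c in missing])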
I'm assuming it will export the data in decomposed form ... but haven't actually tried that yet ...
I wouldn't assume anything of the sort. Normalising to NFD would be quite
unusual.
Yes, I realize that now. And my test confirms that the internal normalization is also what you get on export. And hence those 113 empty boxes ...
BTW, this application supports import of UTF-8, but will not export UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal storage form).
Odd indeed.
Well, maybe they're saving UTF-8 export for a future release ... though I can't imagine why.
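In the meantime, re-encoding the UTF-16 export to UTF-8 after the fact is trivial, e.g. (placeholder file names, and assuming the exported file starts with a BOM):

    with open("export_utf16.txt", encoding="utf-16") as src, \
         open("export_utf8.txt", "w", encoding="utf-8") as dst:
        dst.write(src.read())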
-Richard

