Hi there,

I have recently spent some time fixing the LaTeX import and export of special characters. It took me quite a while to understand the intricacies of TeXmacs' handling of encodings. Now that I have fought my way through it (some patches are already in CVS, others are in the queue), I want to share a few thoughts about it.
The TeXmacs documentation talks about "special symbols" and "universal symbols". The latter are a reasonable concept: similar to LaTeX macros or HTML entities, each character gets a unique identifier that represents its meaning independently of its graphical representation. The renderer then has all the information it needs to pick the correct glyph for the environment in which the character is rendered. In math mode, universal symbols work perfectly.

"Special symbols", however, are a nightmare. The core idea is actually not bad: special symbols are characters in some specific encoding. The horror comes from the fact that this encoding is
a) difficult to retrieve from the environment,
b) seemingly fixed to 1-byte encodings (blocking the path towards UTF-8), and
c) silently assumed to be Cork T1 in many places in the code.
There are some provisions for T2A to accommodate Cyrillic characters, but otherwise conversions have at best been done ad hoc and incompletely, as you can easily observe when you try to use special characters in alternative fonts.

I remember someone talking about a long-term vision of moving to Unicode internally. This is probably an extremely ambitious goal with no chance of being tackled in the near future (other design issues are much more pressing). So what can be done to clean up the current situation?

I believe the most important thing is to clearly define the current situation. If the internal encoding is de facto Cork T1, the documentation should say so instead of talking about the abstract concept of "special symbols".

One straightforward solution that I see would be to move towards universal symbols for anything that is not ASCII. The support for such symbols in text mode should already be there. It should be possible to replace any reference to non-ASCII codes in the sources by the respective universal symbol.
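To make the proposed replacement a bit more concrete, here is a minimal Python sketch under clearly stated assumptions. The byte-to-symbol table is a tiny hypothetical excerpt: the byte positions follow the Cork T1 layout, but the symbol names (eacute, oe, ...) are illustrative only and not taken from the TeXmacs sources, and escaping "<" and ">" as <less> and <gtr> is likewise an assumption. Printable ASCII passes through untouched, every other byte becomes a universal symbol, and a second function maps each symbol to exactly one Unicode character before encoding as UTF-8.

```python
# Hypothetical excerpt of a Cork T1 -> universal symbol table.
# Byte positions follow the T1 layout; the symbol names are assumptions.
CORK_TO_SYMBOL = {
    0xC9: "Eacute",
    0xD7: "OE",        # T1 0xD7 is the OE ligature (Latin-1: multiplication sign)
    0xE9: "eacute",
    0xF7: "oe",        # T1 0xF7 is the oe ligature (Latin-1: division sign)
    0xFF: "ssharp",    # T1 0xFF is the German sharp s
}

# Each universal symbol stands for exactly one Unicode character.
SYMBOL_TO_UNICODE = {
    "Eacute": "\u00c9",
    "OE": "\u0152",
    "eacute": "\u00e9",
    "oe": "\u0153",
    "ssharp": "\u00df",
}

# Assumed escapes so that literal angle brackets stay unambiguous.
ESCAPES = {"<": "less", ">": "gtr"}
UNESCAPES = {name: ch for ch, name in ESCAPES.items()}


def cork_to_universal(data: bytes) -> str:
    """Rewrite Cork T1 bytes as printable ASCII plus <symbol> tokens."""
    out = []
    for b in data:
        ch = chr(b)
        if ch in ESCAPES:
            out.append("<" + ESCAPES[ch] + ">")
        elif 0x20 <= b <= 0x7E:
            out.append(ch)  # printable ASCII is kept as-is
        elif b in CORK_TO_SYMBOL:
            out.append("<" + CORK_TO_SYMBOL[b] + ">")
        else:
            raise ValueError(f"no universal symbol for Cork byte 0x{b:02X}")
    return "".join(out)


def universal_to_utf8(text: str) -> bytes:
    """Map ASCII plus <symbol> tokens to UTF-8, one symbol per code point."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "<":
            end = text.index(">", i)  # raises ValueError if unterminated
            name = text[i + 1:end]
            if name in UNESCAPES:
                out.append(UNESCAPES[name])
            else:
                out.append(SYMBOL_TO_UNICODE[name])
            i = end + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out).encode("utf-8")


if __name__ == "__main__":
    universal = cork_to_universal(b"caf\xe9 \xf7uvre")
    print(universal)  # prints: caf<eacute> <oe>uvre
    print(universal_to_utf8(universal).decode("utf-8"))
```

Note that byte 0xF7 is the "oe" ligature in Cork T1 but the division sign in Latin-1, so decoding the same bytes as Latin-1 would silently produce wrong text. An ASCII-plus-symbols representation removes exactly this kind of hidden assumption, and the second function shows why the step to UTF-8 would then be almost mechanical.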
As ASCII is the overlap of the most relevant encodings (namely Cork T1, Latin-N and UTF-8), moving to ASCII plus universal symbols would be a simple step towards real encoding independence. Once we have eradicated Cork T1 that way, it should also be much easier to introduce UTF-8 internally, since UTF-8 should map onto universal symbols almost 1-to-1.

What do you think?

Greetings,
Norbert

_______________________________________________
Texmacs-dev mailing list
http://lists.gnu.org/mailman/listinfo/texmacs-dev
