On 13/01/2020 03:41, Doug McKenna wrote:
Phil Taylor wrote:

| So because JSBox is required/designed to incorporate all of XeTeX's
| features, it must (by definition) implement/provide \Umathcode.

Just to be clear, JSBox can eventually incorporate all of XeTeX's features 
(primitives), but does not do so now. It doesn't even incorporate pdfTeX's 
features, but it is set up to. I'm merely adding XeTeX features as necessary to 
get the LaTeX macro library installed and then typeset a LaTeX document 
containing no Unicode at all. The problem is that somewhere in the LaTeX format 
initialization the ability to recognize a Unicode character (as opposed to a 
UTF-8 byte sequence) is equated with the assumption that it's being run under 
XeTeX, and that therefore at least some of XeTeX's features are there and can 
be relied upon at format initialization time.

At present, there are two engines that implement \Umathcode, etc., 'in the wild', XeTeX and LuaTeX, and they have (over time) come to an agreed position on what core features are available at the macro level. (For example, XeTeX originally called its new primitives \XeTeX..., but they were renamed to \U... to match LuaTeX.)

They have quite a lot of differences too, but a core subset of features is available in both, and its presence is signalled by the fact that they offer \Umathcode. Almost all of the tests in LaTeX look for the relevant primitive directly: for example, when we want \Uchar, we look for \Uchar. However, as you note, there are a few places where finding \Umathcode is by far the easiest marker.
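In plain-TeX terms, the tests in question have roughly this shape (a sketch of the idiom only, not the literal latex.ltx code, which uses \@undefined rather than \undefined):

  \ifx\Uchar\undefined
    % \Uchar is not a primitive here: fall back to 8-bit behaviour
  \else
    % \Uchar exists: safe to use Unicode code points directly
  \fi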

It's quite possible to add additional tests to the core code, provided there is a spec, or at least some notes, on what's available. (For example, (u)pTeX for a long time had no documentation in English, so things were tricky; there is now a basic manual, which allows those of us who do not know Japanese to offer at least some basic support.)
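Such a test again just probes for an engine-specific primitive. For the pTeX family, for instance, one candidate marker is the \kanjiskip glue parameter (a sketch; whether that is the right marker for a given engine is exactly the kind of thing the documentation has to settle):

  \ifx\kanjiskip\undefined
    % not a pTeX-family engine
  \else
    % pTeX/upTeX: Japanese-specific parameters are available
  \fi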

| But could not JSbox perform (or simulate) the following :

| \let \Umathschar = \Umathchar % use British spelling as synonym
| \let \Umathchar = \undefined % inhibit "load-unicode-data.tex"'s special treatment of engines that implement \Umathchar
| \input load-unicode-data % since it would seem that you cannot simply skip this step
| \let \Umathchar = \Umathschar % restore canonical meaning of \Umathchar

It could, but it's not my code that's issuing "\input load-unicode-data". The reading of "load-unicode-data.tex" is embedded within my version of LaTeX's own initialization code, and there's no guarantee that elsewhere in that code there isn't some dependence on \Umathchar that such a re-definition might interfere with. LaTeX's code has several tests that rely on whether \Umathchar is defined or not; indeed, the latest official comments, as David Carlisle brought to my attention in this thread, declare that testing for the existence of \Umathchar is the current way to go in all sorts of places.

I think you mean \Umathcode :)

Each place that uses Unicode features does test for this primitive; if it exists, we have to date been able to assume that a few additional primitives are also available (e-TeX, \Uchar, \Umathchardef), but mainly its presence tells us that we can set \lccode and \uccode values beyond 255.
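Concretely, the pattern is roughly as follows (a sketch only, assuming an e-TeX-based engine so that \ifdefined is available; the real code is spread across latex.ltx and load-unicode-data.tex):

  \ifdefined\Umathcode
    % Unicode engine: code points above 255 are valid table indices
    \lccode"0100="0101 % U+0100 lowercases to U+0101
    \uccode"0101="0100 % and U+0101 uppercases back to U+0100
  \fi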

Here is perhaps a slightly better hack:

If it's acceptable, as the very first executable line in latex.ltx (or other format source files), to test the catcode of `{ to determine whether a format has already been loaded, then it should be acceptable within "load-unicode-data.tex" (or the like) to include a similar test that determines whether to proceed with the TeX parse of the Unicode data, or to bail out because the tables are presumably already initialized. For example, the first non-8-bit Unicode character is:

0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN CAPITAL LETTER A MACRON;;;0101;

It is safe, I think, to assume that this Unicode character will forever be 
classified as an uppercase letter (with a lowercase mapping value of U+0101).
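In code, the guard might look something like the following sketch (using U+0100 as the probe; it can only run on an engine where \lccode accepts code points above 255, which is the only situation in which this file is read anyway):

  \ifnum\lccode"0100="0101
    % tables already initialized (by the engine, or a previous load):
    % stop reading this file here
    \expandafter\endinput
  \fi
  % otherwise fall through and parse UnicodeData.txt as now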

The test at the start of latex.ltx is about making sure we are in IniTeX mode: I'm not sure I'd choose to do that today, but the test is long-standing. For load-unicode-data, the idea was partly that there was really no issue about checking: unlike formats, which might have hidden stuff, here all we are trying to do is get to a known position.

That links to the second reason I'm slightly wary of a test. As written, load-unicode-data ensures that the \lccode, \uccode and \catcode tables are in a state *known to the macro layer*. I know it's slightly strange to you, but as a macro programmer I can't 'know' what different engine developers might do or change, and I certainly don't know exactly what version of UnicodeData.txt you are working from. By doing the initialisation without checking, I can be sure that we are on a known Unicode version.
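(For reference, that long-standing test has roughly this shape; I'm paraphrasing from memory rather than quoting latex.ltx. In virgin IniTeX, `{ still has catcode 12, so seeing catcode 1 means a format got there first:

  \ifnum\catcode`\{=1
    \errmessage{A format is already loaded: this file needs a virgin IniTeX}
  \fi
)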

To be honest, that's all a minor concern: it's much more that there was simply no need to worry about a test. It would be trivial to add one, not least since the Unicode Consortium have a clear position on stability.

I'm trying to avoid initializing these character mapping tables twice, 
especially when the second time (reading this file) rather inefficiently takes 
30 times longer than the first, and accomplishes nothing new.

Like I said, from a macro programmer's POV it accomplishes 'the codes are in a known state that I control', though practically that's not a major thing. (If you were using a Unicode version different from the one XeTeX/LuaTeX use, it would presumably affect only a rather limited subset of characters.)

Joseph
