2012/2/9 d fulano <[email protected]>:
> > It seems this is the way the command prompt behaves with an
> > invalid (incomplete) utf8 sequence. Even other command prompt
> > programs, e.g. ftp, seem to behave strangely with the 65001 codepage
> > if random accented characters are typed which correspond to
> > invalid utf8.
> >
> > In utf-8, to typeset é (e accent) / unicode E9, you need to type 2 bytes:
> > 195 169 in decimal
> > C3 A9 in hex
> >
> > In contrast, when you type é (e accent) as the single byte E9, utf-8
> > takes it as the first of 3 bytes, which actually encode Chinese
> > characters at Unicode 9000+.
> > So something fails in interactive TeX as there are no other
> > valid characters following é as required in utf-8.
> >
> > You can test the above as:
> >
> > -a- create a text file with the line:
> >
> > \font\arial="Arial Unicode MS" at 12pt\arial é\bye
>

This must be some Windows misfeature. My bash shell in Linux works
correctly: locales are set to UTF-8, and I can type Czech accented
characters as well as Devanagari directly on the keyboard. The
following test works as expected:

This is XeTeX, Version 3.1415926-2.3-0.9997.5 (TeX Live 2011)
**\relax
entering extended mode
*\font\a="Nakula" \a ěšč करना \bye

> >
> > The two characters are created by character map or
> > on the numeric keyboard: alt-195 alt-169.
> > Running this through xetex produces just one character é - the e with acute.
> > Xetex expects utf8 input.
> > (need to make sure though that the editor you use doesn't
> > try to be 'helpful' by reencoding the two strange characters as utf8,
> > resulting in 4 bytes, so don't choose 'save as utf8 format')
> >
> > -b- if you create \é (backslash e-acute) in a file and run
> > it, the program stops with undefined control sequence \é
> > the two characters displayed after the backslash are the
> > utf8 encoding of é e-acute. So Xetex outputs utf8 text.
> >
> > At least this is what I get on my pc.
> >
> > utf8 is not the same as unicode, it's an encoding for unicode, which
> > takes good unicode characters and translates into multi-byte 'garbage'.
> > Only the 128 ASCII characters (codes 0-127) stay the same under UTF8, and
> > the rest convert into multi-bytes.
> >
> > But, why do you want to change your *keyboard* input to utf8 anyway?
> > It's not that you can do the utf conversion in your head and type the
> > converted characters in.
> >
> > Xetex expects utf-8 input by default, so you could simply 'type' in utf-encoded
> > characters eg é and it would work. (Easier to use a utf8 enabled text editor
> > though...). So there is no need for special translation.
> >
> > Xetex also produces utf-8 output on the screen by default, at least this is what
> > I see when there is a problem with accented characters. (their utf encodings)
> > It's just that the command window doesn't translate these utf8 characters into
> > nice glyphs. And that is the case regardless of the cp 65001 setting.
> > Also, the switch for a Unicode-enabled command prompt "cmd /u" doesn't help
> > with this either.
> >
> > In all the above cases however (whatever the chcp and whatever /u switch is used)
> > if I open the tex .log file with a text (utf8) editor, I see the correct
> > symbols anyway.
> >
>>
>> F:\>xetex
>> This is XeTeX, Version 3.1415926-2.2-0.9997.4 (Web2C 2010)
>> restricted \write18 enabled.
>> **\relax
>> entering extended mode
>>
>> *é
>>
>> ! Emergency stop.
>> <*> \relax
>>
>> No pages of output.
>> Transcript written on texput.log.
> >
> > As you can see, the é, when entered in code page 65001,
> > is interpreted as a Ctrl-z.
> >
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
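
The byte values discussed in the quoted message can be checked with a short Python sketch. It is only an illustration using Python's built-in codecs, not anything XeTeX or the Windows console does internally. The variable names are made up for this example, and the idea that a 'helpful' editor reads the two bytes as Latin-1 before saving them as UTF-8 is an assumption, not something stated in the thread.

```python
# Sketch only: checks the byte arithmetic from the thread with Python's codecs.
e_acute = "\u00e9"  # é, Unicode code point U+00E9

# UTF-8 encodes é as two bytes: 195 169 decimal, C3 A9 hex.
utf8_bytes = e_acute.encode("utf-8")
print(list(utf8_bytes))     # [195, 169]
print(utf8_bytes.hex(" "))  # c3 a9

# A lone byte E9 is an incomplete sequence: the bit pattern 1110xxxx
# announces a 3-byte character, so two continuation bytes must follow.
lone = bytes([0xE9])
try:
    lone.decode("utf-8")
    lone_is_valid = True
except UnicodeDecodeError:
    lone_is_valid = False
print(lone_is_valid)        # False

# With two continuation bytes, lead byte E9 lands in the U+9000 range.
three = bytes([0xE9, 0x80, 0x80]).decode("utf-8")
print(hex(ord(three)))      # 0x9000

# Assumption: an editor reads C3 A9 as two Latin-1 characters and then
# saves them as UTF-8, which doubles the length to 4 bytes.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")
print(len(double_encoded))  # 4
```

The last step is one way to arrive at the 4 bytes mentioned in the quoted message; other legacy code pages that map C3 and A9 to printable characters would give the same length.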
