Re: [NTG-context] Basic question on Unicode and ConTeXt
Mojca Miklavec wrote:
> I'm still slightly confused by the encoding files (texnansi, ec, ...; in one case iso-8859-7 is used). Does it mean that it is impossible (or at least very complex or slow) to access more than 256 characters from a single font at once?

indeed, and since it's related to hyphenation ... but some day pdftex will be 32 bit and open type so ...

Hans

-
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-
___
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context
Re: [NTG-context] Basic question on Unicode and ConTeXt
On 2005-07-23 at 00:20, Mojca Miklavec wrote:
> I'm still slightly confused by the encoding files (texnansi, ec, ...; in one case iso-8859-7 is used). Does it mean that it is impossible (or at least very complex or slow) to access more than 256 characters from a single font at once?

TeX, as an old 8-bit system, can't handle more than 256 characters per font. Only its more modern siblings (like Omega/Aleph) can handle "Unicode size" fonts by themselves.

Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net
Re: [NTG-context] Basic question on Unicode and ConTeXt
Christopher Creutzig wrote:
> We already have Iconv in ruby and can, if we know that ISO-8859-2 is a single byte coding system, simply say
>
> conv = Iconv.new("UTF-16", "ISO-8859-2")
> 255.times { |i| puts lookup[conv.iconv("%c" % i)] }
>
> to get the whole list, assuming we've filled the lookup hash first.

Great! Sorry for all my philosophising! I don't know ruby (yet) and I didn't even think about this possibility. My last idea was to parse and combine the data on http://www.unicode.org/Public/MAPPINGS/VENDORS/, http://www.unicode.org/Public/UNIDATA/UnicodeData.txt and http://partners.adobe.com/public/developer/en/opentype/aglfn13.txt, but your idea is a hundred times faster and better! Thanks a lot!

> As you've said, I'd combine steps A2 and A3, to make ConTeXt run faster.

That's OK for me. If there's a simple internal ruby tool (called every time the unicode->tex mapping changes or support for another encoding is added) instead of a one-time script, there should be no problem doing that directly.

> If you want, for whatever reason, to use \textellipsis for an ellipsis (it just looks horribly wrong to me) instead of \dots, you'd need to invoke the ruby script which generates the regi-* files.

I just wanted to give an example showing that changes are sometimes needed and that it is difficult to trace all the places where they should be made. Sorry, this example wasn't very illustrative; I don't even know what \textellipsis stands for, I just saw some comments about changes made in regi-* files or some discrepancies.

> The whole thing should not require any change at all to ConTeXt itself, since the regi-* files could look exactly as they do now, just being generated automatically. (For the multibyte encodings, the whole thing gets much more tricky.)

I noticed (perhaps I'm wrong) that TeX community support for Cyrillic may be better than that in unicode and in the available old 8bit encodings. ConTeXt also already supports those strange regimes (ctt, dbk, mls, mnk, mos, ncc, ...) that I was unable to find anywhere else. In this case one should also be careful not to spoil this already available feature.

I'm still slightly confused by the encoding files (texnansi, ec, ...; in one case iso-8859-7 is used). Does it mean that it is impossible (or at least very complex or slow) to access more than 256 characters from a single font at once?

Mojca
Re: [NTG-context] Basic question on Unicode and ConTeXt
Christopher Creutzig wrote:
> conv = Iconv.new("UTF-16", "ISO-8859-2")
> 255.times { |i| puts lookup[conv.iconv("%c" % i)] }
>
> to get the whole list, assuming we've filled the lookup hash first.

an alternative is to use the tcx files but that is kind of messy, so we need a utf-8 hash (can be loaded from unic-* files)

Hans
Re: [NTG-context] Basic question on Unicode and ConTeXt
Mojca Miklavec wrote:
> A1.) prepare the files to be used as a source of transformation from "any" character set to utf and prepare a list of synonyms for encodings

In my point of view, that should only be a fallback. We already have Iconv in ruby and can, if we know that ISO-8859-2 is a single byte coding system, simply say

conv = Iconv.new("UTF-16", "ISO-8859-2")
255.times { |i| puts lookup[conv.iconv("%c" % i)] }

to get the whole list, assuming we've filled the lookup hash first.

As you've said, I'd combine steps A2 and A3, to make ConTeXt run faster. If you want, for whatever reason, to use \textellipsis for an ellipsis (it just looks horribly wrong to me) instead of \dots, you'd need to invoke the ruby script which generates the regi-* files.

The whole thing should not require any change at all to ConTeXt itself, since the regi-* files could look exactly as they do now, just being generated automatically. (For the multibyte encodings, the whole thing gets much more tricky.)

Christopher
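Editorial sketch, not part of the original message: the Iconv library used above is no longer bundled with Ruby (it was removed in Ruby 2.0), so the same enumeration is shown here with `String#encode`. The helper name `byte_to_codepoint_table` is an assumption made for illustration; the `lookup` hash from the message (code point to TeX command) would be applied to its result. Note that `255.times` stops at byte 254, so the sketch iterates over `0..255` to include byte 0xFF.

```ruby
# Sketch: map every byte of a single-byte encoding to its Unicode code point.
# byte_to_codepoint_table is a hypothetical helper name, not an existing API.

# Returns a Hash { byte => code point } covering all 256 byte values.
# Bytes that the encoding leaves undefined are mapped to nil.
def byte_to_codepoint_table(encoding_name)
  (0..255).each_with_object({}) do |byte, table|
    char = byte.chr.force_encoding(encoding_name)
    table[byte] = begin
      char.encode("UTF-8").ord
    rescue EncodingError
      nil # no Unicode equivalent defined for this byte
    end
  end
end

table = byte_to_codepoint_table("ISO-8859-2")
printf("0x%02X -> U+%04X\n", 0xA3, table[0xA3])
```

For ISO-8859-2 the byte 0xA3 comes out as U+0141, which matches the example used elsewhere in this thread.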
Re: [NTG-context] Basic question on Unicode and ConTeXt
Christopher Creutzig wrote:
> Hans Hagen wrote:
>>> So why not mapping the characters to unicode first and defining the mapping from unicode to \TeXcommand only once? regi-* files (at least in the meaning they have now) could be prepared automatically by a script, less error-prone and without the need to say "Some more definitions will be added later."
>>
>> you mean ...
>>
>> \defineactivetoken 123 {\uchar{...}{...}}
>>
>> it is an option but it's much slower and take much more memory
>
> I may be wrong, of course, but I think Mojca proposed something different (and something that should be really easy to implement): Have the unicode vectors stored in a format easily parsed by an external ruby script and create the regi-* files from that, using the conversion tables provided by your operating system or iconv or wherever ruby gets them from.

Yes, I had something different in mind.

A1.) prepare the files to be used as a source of transformation from "any" character set to utf and prepare a list of synonyms for encodings (example: a file that says that in ISO-8859-2, character 0xA3 represents the unicode character 0x0141 (Lstroke): for every character, for every Mac/Windows/iso/[...] encoding that we want to support)

A2.) write a script which automatically generates regi-* files from those files, but regi-* files would contain only the mapping to the unicode number (example:
\startregime[iso-8859-2]
...
\somecommandtomapacharactertounicode {163}{1}{65} % Lstroke
...
\stopregime)

A3.) prepare a huge file with the mapping from unicode numbers to ConTeXt commands (example:
...
\somecommandtomapfromunicodetocontext {1}{65}{\Lstroke}
...)

A4.) ... I don't mind what ConTeXt does with this \Lstroke afterwards, but it seems it is already clever enough to produce the (proper) glyph at the end

What should ConTeXt do with that?

B1.) The file under A3 should be processed at the beginning. As it may become really huge, exotic definitions should only be preloaded if asked for (\usemodule[korean]), while there is probably no harm if (accented) latin, greek, cyrillic and punctuation (TM, copyright, ...) are preloaded by default

B2.) Once \enableregime[iso-8859-2] or any other regime is requested, the file with the corresponding regime definitions is processed. However, as \somecommandtomapacharactertounicode {163}{1}{65} is processed, the character '163' is not stored as \uchar{1}{65}, but as \Lstroke. '\somecommandtomapacharactertounicode' would first look up which ConTeXt command is saved under \uchar{1}{65} and call \defineactivetoken 163 {\Lstroke} as a result.

I don't know the details of the ConTeXt internal stuff, but I think (hope) that it should be possible to do it this way. B1 (preloading the mapping from unicode to tex commands) is probably the only "hungry" step in the whole story.

I think that it doesn't make any sense to ask the user to "\input regi-whatever". \enableregime and some additional definitions should be clever enough to find out which file to process in order to enable the proper regime.

%

Christopher's idea is actually yet another alternative, which combines the steps A2 and A3. If the mapping unicode->ConTeXt is in some easy-to-parse format, there's actually no additional effort if the script writes the ConTeXt commands directly into the regi-* files instead of unicode numbers, so that B2 has less work to do. As long as it is guaranteed that nobody will change these files manually, this is OK. The only drawback is that if someone notices that "\textellipsis" is more suitable than "\dots", the script has to be changed and the files have to be generated once more. If the character is mapped to (0x2026 HORIZONTAL ELLIPSIS) instead, only one line in the file with the unicode->ConTeXt mapping (A3) has to be changed.

If B2 cannot work as described, Christopher's proposal would be the only proper way to go.
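Editorial sketch, not part of the original message: the combined variant of steps A2 and A3 could look roughly like the following Ruby script. The table `UNICODE_TO_COMMAND` is a tiny hypothetical excerpt, and the helper names `uchar_bytes` and `regime_lines` are assumptions made for illustration; only `\defineactivetoken` and the `\uchar{high}{low}` byte split are taken from the thread. U+0141 is the capital letter, so it is mapped to `\Lstroke` here.

```ruby
# Hypothetical excerpt of the single Unicode -> ConTeXt command table (step A3).
UNICODE_TO_COMMAND = {
  0x0141 => "\\Lstroke",
  0x0142 => "\\lstroke",
  0x2026 => "\\textellipsis",
}.freeze

# Split a code point into the two decimal bytes used in \uchar{high}{low}.
def uchar_bytes(codepoint)
  codepoint.divmod(256)
end

# Build the \defineactivetoken lines of a regi-* file for a single-byte
# encoding (steps A2 and A3 combined). Bytes without a known command are skipped.
def regime_lines(encoding_name, commands = UNICODE_TO_COMMAND)
  lines = []
  (128..255).each do |byte|
    begin
      codepoint = byte.chr.force_encoding(encoding_name).encode("UTF-8").ord
    rescue EncodingError
      next # byte is undefined in this encoding
    end
    command = commands[codepoint]
    lines << format("\\defineactivetoken %d {%s} %% U+%04X", byte, command, codepoint) if command
  end
  lines
end

puts regime_lines("ISO-8859-2")
```

With this layout, switching from `\dots` to `\textellipsis` means editing one entry in the table and regenerating every regime file, which is the trade-off discussed above.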
%

I wanted to test \showcharacters on live.contextgarden.net (as Hans suggested that my map files are probably not OK), but it didn't compile there. (I hope it's not because of my buggy contributions in the last few days.)

Is there any tool or macro to visualize all the glyphs available in a font? \showcharacters (if it works) shows only the glyphs that ConTeXt is aware of. What about the rest?

Mojca
Re: [NTG-context] Basic question on Unicode and ConTeXt
Hans Hagen wrote:
>> So why not mapping the characters to unicode first and defining the mapping from unicode to \TeXcommand only once? regi-* files (at least in the meaning they have now) could be prepared automatically by a script, less error-prone and without the need to say "Some more definitions will be added later."
>
> you mean ...
>
> \defineactivetoken 123 {\uchar{...}{...}}
>
> it is an option but it's much slower and take much more memory

I may be wrong, of course, but I think Mojca proposed something different (and something that should be really easy to implement): Have the unicode vectors stored in a format easily parsed by an external ruby script and create the regi-* files from that, using the conversion tables provided by your operating system or iconv or wherever ruby gets them from.

regards,
Christopher
Re: [NTG-context] Basic question on Unicode and ConTeXt
Mojca Miklavec wrote:
> Hans Hagen wrote:
>> Mojca Miklavec wrote:
>>>> (concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)
>>>
>>> What do you mean by writing file synonyms? Where would it be used?
>>
>> \definefilesynonym [mojka] [mojca]
>> \definefilesynonym [moika] [mojca]
>> \definefilesynonym [moica] [mojca]
>
> OK, if you are being provocative, I'll strike back: none of the definitions above are allowed because they don't warn the user if he's using the wrong name. They should throw an error instead. The only proper way would be to define something like
>
> \setuplabeltext[\s!en][\v!pronouncemyname=moitsa]
> \setuplabeltext[\s!de][\v!pronouncemyname=mojza]
> \setuplabeltext[\s!ru][\v!pronouncemyname=мойца]
> ...

so how about using:

\translate[en=moitsa,de=mojza,ru=мойца]

then -)

> OK. I'll prepare \defineregimesynonym-s proposals, but I still don't know what the file synonyms should be used for in this context. The user probably doesn't need to care about file names?

depends on if you want to preload all those vectors (takes quite some memory, although i may find a way around that [maybe delayed loading])

> So why not mapping the characters to unicode first and defining the mapping from unicode to \TeXcommand only once? regi-* files (at least in the meaning they have now) could be prepared automatically by a script, less error-prone and without the need to say "Some more definitions will be added later."

you mean ...

\defineactivetoken 123 {\uchar{...}{...}}

it is an option but it's much slower and take much more memory

\uchar{2}{33} takes 1 hash pointer and 7 char slots (so probably 8 mem locations) while \eacute takes one mem location

> Is it possible to switch the regimes in the middle of the document (like it is possible to switch the languages)? An example usage would be if some input documents (plain text, some older TeX files or database entries) are written in some other encoding than the main stream. (Possibly switching in such a way that no leftovers remain after the old encoding is replaced by a new one.)

switching is possible but in that case you probably want to set toc/index/etc expansion to yes

Hans
Re: [NTG-context] Basic question on Unicode and ConTeXt
Hans Hagen wrote:
> Mojca Miklavec wrote:
>>> (concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)
>>
>> What do you mean by writing file synonyms? Where would it be used?
>
> \definefilesynonym [mojka] [mojca]
> \definefilesynonym [moika] [mojca]
> \definefilesynonym [moica] [mojca]

OK, if you are being provocative, I'll strike back: none of the definitions above are allowed because they don't warn the user if he's using the wrong name. They should throw an error instead. The only proper way would be to define something like

\setuplabeltext[\s!en][\v!pronouncemyname=moitsa]
\setuplabeltext[\s!de][\v!pronouncemyname=mojza]
\setuplabeltext[\s!ru][\v!pronouncemyname=мойца]
...

>> For unicode regimes, this is probably an useful (more or less complete) set.
>>
>> \defineregimesynonym[utf8][utf]
>> \defineregimesynonym[utf 8][utf]
>
> the spacy one does not make much sense
>
>> \defineregimesynonym[utf-8][utf]
>> \defineregimesynonym[unicode][utf]
>
> not sure about this one

Me neither, but "utf" alone is just as doubtful as this one. However, leaving utf-8 and utf8 only is OK.

>> (Btw, I tried all the four before I got the answer on the mailing list that I should use 'utf' instead.)
>>
>> For the rest of the regimes I have to take a look first, so that I don't say anything wrong. There has to be only one clear scheme.
>
> indeed, i'll wait patiently for your complete list of synonyms

OK. I'll prepare \defineregimesynonym-s proposals, but I still don't know what the file synonyms should be used for in this context. The user probably doesn't need to care about file names?

>> What's the proper name for nonbreaking space, '~', to be put in regi-* file?
>
> how about \nonbreakablespace

Thanks. There was no such glyph in \showcharacters -)

(PS: I'm sorry for accusing the innocent commands \showcharacters and \showaccents of malfunctioning. I accidentally placed them after an \obeylines command while I was debugging some files. They couldn't have worked there anyway.)

%%%

I wanted to post this in another thread, but it probably still fits here:

The regi-* files currently map characters from individual encodings directly into \TeXcommands. But unicode is already supported in ConTeXt, and the mappings from single-byte encodings into unicode are pretty well defined (perhaps there are some exceptions?) and can be obtained elsewhere on the internet. On the other hand, the mapping from unicode to \TeXcommands is much less straightforward and sometimes subjective. I noticed some comments in regi-* files like

% \texttrademark changed to \trademark

or

% \dots changed to \textellipsis

Whoever makes changes like that probably makes them in one file only; the rest remains as is (and probably becomes deprecated, if not nonfunctional, one day). On the other hand, there are around ten different Cyrillic encodings (mostly they are already supported by ConTeXt, but anyway) and many other encodings in other languages as well. This means that the same Cyrillic letter has to be assigned its name in ten files (regimes), possibly manually.

So why not mapping the characters to unicode first and defining the mapping from unicode to \TeXcommand only once? regi-* files (at least in the meaning they have now) could be prepared automatically by a script, less error-prone and without the need to say "Some more definitions will be added later."

Is it possible to switch the regimes in the middle of the document (like it is possible to switch the languages)? An example usage would be if some input documents (plain text, some older TeX files or database entries) are written in some other encoding than the main stream. (Possibly switching in such a way that no leftovers remain after the old encoding is replaced by a new one.)

Mojca
Re: [NTG-context] Basic question on Unicode and ConTeXt
Mojca Miklavec wrote:
>> maybe a better name is regi-ce or just regi-1250
>
> regi-ce is a bad name as there are 4 central european encodings (IBM-853, ISO-8859-2, MacCE and Windows-1250) plus Croatian. 1250 alone is probably OK, but there's no hint in the file name about which encoding is meant (windows/ibm/iso/mac ...). I tested the code for regime synonyms and it looks OK. Thanks for investigating my request :)

ok, i'll add it to enco-ini then

>> (concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)
>
> What do you mean by writing file synonyms? Where would it be used?

\definefilesynonym [mojka] [mojca]
\definefilesynonym [moika] [mojca]
\definefilesynonym [moica] [mojca]

> For unicode regimes, this is probably an useful (more or less complete) set.
>
> \defineregimesynonym[utf8][utf]
> \defineregimesynonym[utf 8][utf]

the spacy one does not make much sense

> \defineregimesynonym[utf-8][utf]
> \defineregimesynonym[unicode][utf]

not sure about this one

> (Btw, I tried all the four before I got the answer on the mailing list that I should use 'utf' instead.)
>
> For the rest of the regimes I have to take a look first, so that I don't say anything wrong. There has to be only one clear scheme.

indeed, i'll wait patiently for your complete list of synonyms

>> there are
>>
>> \showcharacters
>> \showaccents
>
> Thank you. The commands were only kind-of-working here. They produced the table that I wanted (and quite some trash as well), but they were complaining a lot. Thanks for the contribution to Visual debugging, Hraban!
>
> What's the proper name for nonbreaking space, '~', to be put in regi-* file?

how about \nonbreakablespace

Hans
Re: [NTG-context] Basic question on Unicode and ConTeXt
Mojca Miklavec wrote:
> \Dstroke has some "problems" anyway, at least in cmr (lmr?). The stroke should be on the left, but it is on the right. I thought it was just because \tt doesn't have that glyph, but the roman version is also rendered extremely badly.

in case of doubt, you can discuss this with Boguslaw Jackowski (jacko) who is in charge of latin modern; it should be ok in latin roman

> So what is the proper way of writing 'ä' (a umlaut) then?

in german mode, "u will produce it (tricky since there is no hyphenation then)

latin modern did have them and there is a special encoding vector in the context distribution (awaiting for those umlauts to show up again)

Hans
Re: [NTG-context] Basic question on Unicode and ConTeXt
Hans Hagen wrote:
> Mojca Miklavec wrote:
>> regi-lat.tex is interesting, made just for typesetting Croatian :) Perhaps I can add some stuff there too.
>>
>> \defineactivetoken đ {\pseudoencodeddj}
>> \defineactivetoken Ð {\pseudoencodedDJ}
>>
>> This should be \dstroke and \Dstroke.
>
> ok, changed

Thank you.

\Dstroke has some "problems" anyway, at least in cmr (lmr?). The stroke should be on the left, but it is on the right. I thought it was just because \tt doesn't have that glyph, but the roman version is also rendered extremely badly.

>> Where did the "hungarumlaut" characters get the name from?
>
> the names probably come from postscript

Thanks, I looked into some .afm files and they were actually there.

> btw, there is a difference between umlaut and diaeresis (height)

So what is the proper way of writing 'ä' (a umlaut) then?

> can't you make it into a
>
> \defineactivetoken 128 {\texteuro} % € 20AC EURO SIGN
>
> kind of table?

Good idea indeed, it looks much nicer this way.

> maybe a better name is regi-ce or just regi-1250

regi-ce is a bad name as there are 4 central european encodings (IBM-853, ISO-8859-2, MacCE and Windows-1250) plus Croatian. 1250 alone is probably OK, but there's no hint in the file name about which encoding is meant (windows/ibm/iso/mac ...). I tested the code for regime synonyms and it looks OK. Thanks for investigating my request :)

> (concerning eregi-* files: you can define filesynonyms so we need a list of filesynonyms and regimesynonyms)

What do you mean by writing file synonyms? Where would it be used?

For unicode regimes, this is probably an useful (more or less complete) set.

\defineregimesynonym[utf8][utf]
\defineregimesynonym[utf 8][utf]
\defineregimesynonym[utf-8][utf]
\defineregimesynonym[unicode][utf]

(Btw, I tried all the four before I got the answer on the mailing list that I should use 'utf' instead.)

For the rest of the regimes I have to take a look first, so that I don't say anything wrong. There has to be only one clear scheme.

> there are
>
> \showcharacters
> \showaccents

Thank you. The commands were only kind-of-working here. They produced the table that I wanted (and quite some trash as well), but they were complaining a lot. Thanks for the contribution to Visual debugging, Hraban!

What's the proper name for nonbreaking space, '~', to be put in regi-* file?

Mojca
Re: [NTG-context] Basic question on Unicode and ConTeXt
On 7/17/05, Hans Hagen <[EMAIL PROTECTED]> wrote:
> Mojca Miklavec wrote:
>> regi-lat.tex is interesting, made just for typesetting Croatian :) Perhaps I can add some stuff there too.
>>
>> \defineactivetoken đ {\pseudoencodeddj}
>> \defineactivetoken Ð {\pseudoencodedDJ}
>>
>> This should be \dstroke and \Dstroke.
>
> ok, changed

yes, there are also exactly two glyphs \dstroke and \Dstroke in Vietnamese :)

Cheers,
--
http://vnoss.org
Vietnamese Open Source Software Community
Re: [NTG-context] Basic question on Unicode and ConTeXt
Henning Hraban Ramm wrote:
> On 2005-07-17 at 22:37, Hans Hagen wrote:
>> there are
>>
>> \showcharacters
>> \showaccents
>
> BTW I finally created the wiki page "Visual Debugging" for all the \show... commands; I guess there are even more than I listed there, and some descriptions are still missing (had no time to try them all).

thanks (\trace... is also handy)

Hans
Re: [NTG-context] Basic question on Unicode and ConTeXt
On 2005-07-17 at 22:37, Hans Hagen wrote:
> there are
>
> \showcharacters
> \showaccents

BTW I finally created the wiki page "Visual Debugging" for all the \show... commands; I guess there are even more than I listed there, and some descriptions are still missing (had no time to try them all).

Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net
Re: [NTG-context] Basic question on Unicode and ConTeXt
Mojca Miklavec wrote:
> I'm now attaching a file for support for windows-1250-encoded files. One character is missing (I don't know what to write for non-breaking space) and it's not extensively tested or proofread for typos. So if someone can cast an eye over it, I'll be glad.

maybe a better name is regi-ce or just regi-1250

> Does anyone have any script to test the encoding (which would produce a matrix of (almost) 256 characters)?

there are

\showcharacters
\showaccents

it all depends on the combination of input regime and font encoding

Hans
Re: [NTG-context] Basic question on Unicode and ConTeXt
Mojca Miklavec wrote:
> regi-lat.tex is interesting, made just for typesetting Croatian :) Perhaps I can add some stuff there too.
>
> \defineactivetoken đ {\pseudoencodeddj}
> \defineactivetoken Ð {\pseudoencodedDJ}
>
> This should be \dstroke and \Dstroke.

ok, changed

> Where did the "hungarumlaut" characters get the name from? Wouldn't it be better to have "doubleacute" (as in the UNICODE standard)? We also don't name the characters "germanumlaut" but "diaeresis" instead.

the names probably come from postscript

btw, there is a difference between umlaut and diaeresis (height)

Hans
Re: [NTG-context] Basic question on Unicode and ConTeXt
Henning Hraban Ramm wrote:
> You did read http://contextgarden.net/Encodings_and_Regimes and linked pages, did you?
> If you learn anything new, please add it to the wiki!

Thank you! It was probably me who copy-pasted some of the material there from some thread, but when I looked at it once again, I learnt something new.

A while ago I was asking how to typeset things in windows-1250 encoding (\usepackage[cp1250]{inputenc} in LaTeX). I got some answer (just a temporary solution with csr fonts), but it was not a satisfying one.

I'm now attaching a file for support for windows-1250-encoded files. One character is missing (I don't know what to write for non-breaking space) and it's not extensively tested or proofread for typos. So if someone can cast an eye over it, I'll be glad.

Does anyone have any script to test the encoding (which would produce a matrix of (almost) 256 characters)?

regi-lat.tex is interesting, made just for typesetting Croatian :) Perhaps I can add some stuff there too.

\defineactivetoken đ {\pseudoencodeddj}
\defineactivetoken Ð {\pseudoencodedDJ}

This should be \dstroke and \Dstroke.

Where did the "hungarumlaut" characters get the name from? Wouldn't it be better to have "doubleacute" (as in the UNICODE standard)? We also don't name the characters "germanumlaut" but "diaeresis" instead.

Mojca

Attachment: regi-cp1250.tex (TeX document)
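Editorial sketch, not part of the original message: outside ConTeXt, the matrix asked for above can be approximated with a few lines of Ruby. This only shows what the input encoding assigns to each byte; it says nothing about which glyphs a font provides. The helper names `decode_byte` and `encoding_matrix` are assumptions made for illustration.

```ruby
# Sketch: print a 16x16 matrix of the characters that a single-byte encoding
# assigns to the bytes 0x00..0xFF. Control characters and bytes without a
# definition in the encoding are shown as ".".

def decode_byte(byte, encoding_name)
  char = byte.chr.force_encoding(encoding_name).encode("UTF-8")
  char.match?(/[[:cntrl:]]/) ? "." : char
rescue EncodingError
  "." # byte has no Unicode equivalent in this encoding
end

# Returns 16 strings, one per row, each prefixed with the row's first byte in hex.
def encoding_matrix(encoding_name)
  (0...16).map do |row|
    cells = (0...16).map { |col| decode_byte(row * 16 + col, encoding_name) }
    format("%02X  %s", row * 16, cells.join(" "))
  end
end

puts encoding_matrix("Windows-1250")
```

Comparing the printed rows against the `\defineactivetoken` entries of a regi-* file is one way to spot missing or mistyped slots.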
Re: [NTG-context] Basic question on Unicode and ConTeXt
On 2005-07-14 at 21:13, Steffen Wolfrum wrote:
> But, why is the Vietnamese example with
> \enableregime[utf]
> linked under
> vis = viscii | VISCII | Vietnamese
> and not accessible with
> utf | UTF-8 | Unicode ? (Same for cyrillic)
>
> Is this just a wrong link, or does this show that I haven't understood the relationship between regimes and encoding? Shouldn't all UTF relevant examples be listed under UTF?

All examples are (could be) relevant for UTF-8, because you can set (nearly) everything in Unicode.

VISCII is one possible encoding for Vietnamese (and only for Vietnamese), so I found it rather logical to link from there to Vietnamese, even if the Vietnamese example uses UTF-8, which is probably more modern, as a lot of other encodings are probably obsolete/deprecated.

So, even if the Vietnamese example could be considered a general UTF-8 example, it shows how one can (and perhaps should) typeset Vietnamese.

So I guess the only error or missing link is the link from UTF-8 to Vietnamese (and Cyrillic). Do it yourself as you please.

Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net
Re: [NTG-context] Basic question on Unicode and ConTeXt
Steffen Wolfrum wrote:
>>> I know there is \enableregime[utf] but what else do I need so that the output equals my utf-8 input?
>>>
>>> Could someone maybe give a short and usable How-To on common examples: Greek, Russian, an East European language and an Asian language?
>>
>> You did read http://contextgarden.net/Encodings_and_Regimes and linked pages, did you?
>> If you learn anything new, please add it to the wiki!
>
> Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF. But as you wrote "linked pages" I became more curious and looked up also those pages. Indeed, there is more:
>
> But, why is the Vietnamese example with
> \enableregime[utf]
> \setupencoding[default=t5]
> linked under
> vis = viscii | VISCII | Vietnamese
> and not accessible with
> utf | UTF-8 | Unicode ? (Same for cyrillic)
>
> Is this just a wrong link, or does this show that I haven't understood the relationship between regimes and encoding? Shouldn't all UTF relevant examples be listed under UTF?

\enableregime is not enough. You need to set up the font encoding and an appropriate bodyfont. For these see type-enc, type-pre and such.

Example for cyrillic:

\enableregime [utf]
\setupencoding [default=t2a]
\usetypescript [modern-base] [\defaultencoding]
\setupbodyfont [modern]

\starttext
Тест.
\stoptext

--
Radhelorn <[EMAIL PROTECTED]>
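Editorial sketch, not part of the original message: the regime only tells ConTeXt how to read the input bytes, and that half of the picture can be illustrated in Ruby. The same word "Тест" is stored as different byte sequences under three input encodings, yet decodes to the same Unicode code points each time. Which glyph is finally printed depends on the font encoding (t2a in the example above), which this sketch does not model. The helper name `decode` is an assumption made for illustration.

```ruby
# Sketch: one text, three input encodings ("regimes"), identical code points.
TEXT = "\u0422\u0435\u0441\u0442" # the Cyrillic word from the example above

# Decode raw bytes that were written under the given input encoding.
def decode(bytes, regime)
  bytes.pack("C*").force_encoding(regime).encode("UTF-8").codepoints
end

%w[UTF-8 KOI8-R Windows-1251].each do |regime|
  bytes = TEXT.encode(regime).bytes
  hex = bytes.map { |b| format("%02X", b) }.join(" ")
  points = decode(bytes, regime).map { |cp| format("U+%04X", cp) }.join(" ")
  puts format("%-12s bytes: %-24s code points: %s", regime, hex, points)
end
```

Reading the bytes under the wrong regime yields different code points, which is why the regime has to match the file and the font encoding has to cover the resulting characters.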
Re: [NTG-context] Basic question on Unicode and ConTeXt
On 7/14/05, Steffen Wolfrum <[EMAIL PROTECTED]> wrote:
> Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF. But as you wrote "linked pages" I became more curious and looked up also those pages. Indeed, there is more:
>
> But, why is the Vietnamese example with
> \enableregime[utf]
> \setupencoding[default=t5]
> linked under
> vis = viscii | VISCII | Vietnamese
> and not accessible with
> utf | UTF-8 | Unicode ? (Same for cyrillic)

Sorry, I can't understand your question. Vietnamese can be used in TeX/LaTeX and ConTeXt with different input encodings: TCVN, VISCII, VPS or UTF-8. I'm currently using UTF-8 input with ConTeXt without problems. I haven't tested other input encodings yet, but they cause no problems with TeX/LaTeX, so they should be OK with ConTeXt too, or am I wrong?

--
http://vnoss.org
Vietnamese Open Source Software Community
Re: [NTG-context] Basic question on Unicode and ConTeXt
Hi Henning,

Quoting Henning Hraban Ramm <[EMAIL PROTECTED]>:
> On 2005-07-14 at 11:30, Steffen Wolfrum wrote:
>> I know there is \enableregime[utf] but what else do I need so that the output equals my utf-8 input?
>>
>> Could someone maybe give a short and usable How-To on common examples: Greek, Russian, an East European language and an Asian language?
>
> You did read http://contextgarden.net/Encodings_and_Regimes and linked pages, did you?
> If you learn anything new, please add it to the wiki!

Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF. But as you wrote "linked pages" I became more curious and looked up also those pages. Indeed, there is more:

But, why is the Vietnamese example with
\enableregime[utf]
\setupencoding[default=t5]
linked under
vis = viscii | VISCII | Vietnamese
and not accessible with
utf | UTF-8 | Unicode ? (Same for cyrillic)

Is this just a wrong link, or does this show that I haven't understood the relationship between regimes and encoding? Shouldn't all UTF relevant examples be listed under UTF?

So, sorry for starting this irrelevant thread,
Steffen
Re: [NTG-context] Basic question on Unicode and ConTeXt
On 2005-07-14 at 11:30, Steffen Wolfrum wrote:
> I know there is \enableregime[utf] but what else do I need so that the output equals my utf-8 input?
>
> Could someone maybe give a short and usable How-To on common examples: Greek, Russian, an East European language and an Asian language?

You did read http://contextgarden.net/Encodings_and_Regimes and linked pages, did you?
If you learn anything new, please add it to the wiki!

Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net