Re: [NTG-context] Regimes to be supported; Comments?

2005-07-29 Thread Taco Hoekwater


Hi Mojca,


Re: encodings, this page may be of help:

  http://www.kostis.net/charsets/

I personally prefer to use iconv as a preprocessor (any to utf-8), so
I don't really care all that much about supported encodings. I have
some remarks anyway, of course ;-)

I think that some of the encodings on your list are more like "keyboard
mappings" than like actual input encodings (some MacXXX ones, for
instance).

The 'original' MICROSOFT/PC ones and EBCDIC have probably all fallen
into disuse by now in 'normal operations'. I would not bother with them.
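Taco's alternative, converting every input file to UTF-8 before TeX sees it, needs no regime support inside TeX at all. A minimal sketch of that preprocessing step in Python follows. It stands in for iconv and is not what Taco actually runs; the function names are hypothetical, and it assumes the source encoding is known and spelled the way Python's codec registry expects (e.g. `iso8859_2`, `cp1250`, `mac_roman`).

```python
def recode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a legacy 8-bit encoding as UTF-8.

    Raises UnicodeDecodeError if data contains a byte that is undefined
    in source_encoding, and LookupError if the encoding name is unknown.
    """
    text = data.decode(source_encoding)  # legacy bytes -> Unicode string
    return text.encode("utf-8")          # Unicode string -> UTF-8 bytes


def recode_file_to_utf8(src_path: str, dst_path: str, source_encoding: str) -> None:
    """File-level wrapper, roughly what `iconv -f ENC -t UTF-8 src > dst` does."""
    with open(src_path, "rb") as src:
        converted = recode_to_utf8(src.read(), source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

The same byte means different things in different encodings: `recode_to_utf8(b"\xe8", "iso8859_2")` yields the UTF-8 bytes for "č", while `recode_to_utf8(b"\xe8", "latin-1")` yields "è". That is why the source encoding has to be known (or declared) before the conversion.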


Cheers, Taco

Mojca Miklavec wrote:


a) Good luck (I don't want to be in your place)!
b) Take a good (commercial) program
c) If you're ready to invest the rest of your time (forget about 
hobbies!), it's probably doable in LaTeX or ConTeXt by then
č) Forget about TeX: it will be possible to solve this problem one day 
with Unicode and one of the new TeX engines. But until then, it's not 
worth the effort, because any effort you invest will become obsolete 
in a couple of years.


I'm missing an option:

d) You need some editorial and TeX skills, but otherwise this is quite
   doable with current TeX/ConTeXt.

___
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context


[NTG-context] Regimes to be supported; Comments?

2005-07-29 Thread Mojca Miklavec

Hello,

Some time ago there was a discussion about extending support for 
different regimes in ConTeXt. The list of (to-be-)supported regimes 
probably depends strongly on the implementation (ruby+iconv?). I 
collected a preliminary list of candidate regimes and possible synonyms 
(some synonyms are listed for backward compatibility and have to 
remain), leaving out most Eastern encodings (not because they 
shouldn't be on the list, but because I'm completely ignorant about them).


Hans suggested posting this to the mailing list first to get some useful 
comments and suggestions.


#

The following question should probably go in a separate thread, but the 
topic is very similar. In July 2006 Ljubljana will host people from 
around 85 countries of the world. One of the very ambitious organizers 
has been dreaming for a couple of years now of printing the participants' 
names (on honourable mentions, for example, ...) both in Latin 
transcription and as they are written in the original script (assuming 
that the names are properly entered in a UTF-8 database). This is 
probably not possible for every single obscure language, but in general, 
does it sound like:

a) Good luck (I don't want to be in your place)!
b) Take a good (commercial) program
c) If you're ready to invest the rest of your time (forget about 
hobbies!), it's probably doable in LaTeX or ConTeXt by then
č) Forget about TeX: it will be possible to solve this problem one day 
with Unicode and one of the new TeX engines. But until then, it's not 
worth the effort, because any effort you invest will become obsolete 
in a couple of years.


To be honest, even some of the people who will translate the materials 
into their native languages will probably do so with paper, pencil and a scanner.


#


Mojca

And here are the encodings:

# ISO
ISO-8859-1  Western
ISO-8859-2  Central European
ISO-8859-3  South European
ISO-8859-4  Baltic
ISO-8859-5  Cyrillic
ISO-8859-6  Arabic
ISO-8859-7  Greek
ISO-8859-8  Hebrew Visual
ISO-8859-8-I Hebrew (???) What is that?
ISO-8859-9  Turkish
ISO-8859-10 Nordic
ISO-8859-11 Thai
ISO-8859-13 Baltic
ISO-8859-14 Celtic
ISO-8859-15 Western
ISO-8859-16 Romanian

\defineregimesynonym[il*][iso-8859-*], *=1-16\12
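\defineregimesynonym[latin*][iso-8859-*], *=1-4
\defineregimesynonym[latin5][iso-8859-9]
\defineregimesynonym[latin6][iso-8859-10]
\defineregimesynonym[latin7][iso-8859-13]
\defineregimesynonym[latin8][iso-8859-14]
\defineregimesynonym[latin9][iso-8859-15]
\defineregimesynonym[latin10][iso-8859-16]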
\defineregimesynonym[cp819][iso-8859-1]

% I'm not sure that anyone needs these:
\defineregimesynonym[iso-ir-100][iso-8859-1]
\defineregimesynonym[iso-ir-101][iso-8859-2]
\defineregimesynonym[iso-ir-109][iso-8859-3]
\defineregimesynonym[iso-ir-110][iso-8859-4]
\defineregimesynonym[iso-ir-144][iso-8859-5]
\defineregimesynonym[iso-ir-127][iso-8859-6]
\defineregimesynonym[iso-ir-126][iso-8859-7]
\defineregimesynonym[iso-ir-138][iso-8859-8]
\defineregimesynonym[iso-ir-148][iso-8859-9]
\defineregimesynonym[iso-ir-157][iso-8859-10]
\defineregimesynonym[iso-ir-179][iso-8859-13]
\defineregimesynonym[iso-ir-199][iso-8859-14]
\defineregimesynonym[iso-ir-203][iso-8859-15]
\defineregimesynonym[iso-ir-226][iso-8859-16]

% backward compatibility
\defineregimesynonym[iso88595][iso-8859-5]
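The `*=1-16\12` shorthand stands for a list of explicit definitions, and a small generator makes that expansion mechanical. This Python sketch (function names hypothetical) writes out the `\defineregimesynonym` lines. One caution explains why the Latin names get their own table: the Latin-N numbering matches the ISO 8859 part number only for Latin-1 to Latin-4. Latin-5 is ISO-8859-9, Latin-6 is ISO-8859-10, Latin-7 is ISO-8859-13, Latin-8 is ISO-8859-14, Latin-9 is ISO-8859-15 and Latin-10 is ISO-8859-16, while parts 5 to 8 and 11 (Cyrillic, Arabic, Greek, Hebrew, Thai) have no Latin name at all. The `il*` names are expanded exactly as written in the list (parts 1 to 16 without 12).

```python
import codecs

# ISO 8859 part number behind each Latin-N alphabet name.
# Only Latin-1..4 share their number with the ISO part.
LATIN_TO_ISO_PART = {1: 1, 2: 2, 3: 3, 4: 4, 5: 9, 6: 10, 7: 13, 8: 14, 9: 15, 10: 16}

# The "*=1-16\12" range from the list: parts 1 to 16, skipping 12
# (ISO 8859-12 was never published).
ISO_PARTS = [n for n in range(1, 17) if n != 12]


def synonym_line(alias: str, regime: str) -> str:
    """Format one ConTeXt synonym definition."""
    return f"\\defineregimesynonym[{alias}][{regime}]"


def il_synonyms() -> list:
    """il1 .. il16 (without il12), each mapped to the same ISO part number."""
    return [synonym_line(f"il{n}", f"iso-8859-{n}") for n in ISO_PARTS]


def latin_synonyms() -> list:
    """latin1 .. latin10, mapped through the Latin-N -> ISO part table."""
    return [synonym_line(f"latin{n}", f"iso-8859-{part}")
            for n, part in sorted(LATIN_TO_ISO_PART.items())]


def check_against_python_codecs() -> bool:
    """Cross-check the table against Python's own latinN codec aliases."""
    return all(codecs.lookup(f"latin{n}").name == f"iso8859-{part}"
               for n, part in LATIN_TO_ISO_PART.items())
```

`print("\n".join(latin_synonyms()))` prints ten definitions, the fifth being `\defineregimesynonym[latin5][iso-8859-9]`. Keeping the mapping in a table rather than deriving it from the number avoids the off-by-part mistake for Latin-5 and later.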

(recode also recognises "arabic", "greek", "cyrillic" and "hebrew" as 
aliases for those encodings: I don't know if this is a good idea, as there 
are other charsets covering the same language groups as well)
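Python's codec registry accepts the same generic names as recode, which makes the concern concrete: each name is silently bound to one ISO charset, although other charsets serve the same language. The sketch below (helper names hypothetical) resolves the generic aliases and decodes a single byte, 0xC0, under three Cyrillic encodings, where it comes out as Р (ISO-8859-5, i.e. "cyrillic"), А (windows-1251) and ю (KOI8-R).

```python
import codecs

# Generic language names that Python also accepts as codec aliases.
GENERIC_NAMES = ["arabic", "greek", "cyrillic", "hebrew"]


def resolve_aliases(names):
    """Map each alias to the canonical codec name Python resolves it to."""
    return {name: codecs.lookup(name).name for name in names}


def decode_under(data: bytes, encodings):
    """Decode the same bytes under several encodings, for comparison."""
    return {enc: data.decode(enc) for enc in encodings}
```

`resolve_aliases(GENERIC_NAMES)` maps the four names to `iso8859-6`, `iso8859-7`, `iso8859-5` and `iso8859-8`; nothing in the name "cyrillic" tells the user that a windows-1251 or KOI8-R file would be decoded wrongly.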


# APPLE
MacArabic
MacCeltic
MacCentralEuropean
% CentEur, CentralEurope or CentralEuropean? or all of them?
MacChineseSimplified
MacChineseTraditional
MacCroatian
MacCyrillic
MacDevanagari
MacDingbats
MacFarsi
MacGaelic
MacGreek
MacGujarati
MacGurmukhi
MacHebrew
MacIcelandic
MacInuit
MacJapanese
MacKeyboard
MacKorean
MacRoman
MacRomanian
MacSymbol
MacThai
MacTurkish
MacUkrainian

\defineregimesynonym[MacCE][MacCentralEuropean]
\defineregimesynonym[mac][MacRoman]
\defineregimesynonym[maccyr][MacCyrillic]
\defineregimesynonym[macukr][MacUkrainian]

(I also need some help here: sometimes Mac encodings are named with 
adjectives, sometimes with nouns, like Ukraine/Ukrainian. Should only 
one of them be used (and which?), or both? On the Unicode page, the Mac 
encodings appear twice, the second time under Microsoft/Apple, 
containing MacCyrillic, MacGreek, MacIceland, MacLatin2, MacRoman and 
MacTurkish. I didn't really get the point of that.)
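One way to sidestep the adjective/noun question is to accept every spelling as input and normalize it to a single canonical name before the regime is loaded. A minimal Python sketch of such a lookup follows; the spellings not in the list above (e.g. `MacUkraine`, `MacCentEur`, `MacCentralEurope`) and the function name are hypothetical, and it does not claim to mirror how `\defineregimesynonym` resolves names.

```python
# Hypothetical synonym table: lower-cased spelling -> canonical regime name.
# Several spellings may point to the same canonical name.
MAC_SYNONYMS = {
    "maccentraleuropean": "MacCentralEuropean",
    "maccentraleurope": "MacCentralEuropean",
    "maccenteur": "MacCentralEuropean",
    "macce": "MacCentralEuropean",
    "macukrainian": "MacUkrainian",
    "macukraine": "MacUkrainian",
    "macukr": "MacUkrainian",
    "maccyrillic": "MacCyrillic",
    "maccyr": "MacCyrillic",
    "macroman": "MacRoman",
    "mac": "MacRoman",
}


def canonical_regime(name: str) -> str:
    """Return the canonical regime name for any accepted spelling (case-insensitive)."""
    try:
        return MAC_SYNONYMS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown regime name: {name!r}") from None
```

With this design, `canonical_regime("MacUkraine")` and `canonical_regime("macukr")` both return `"MacUkrainian"`: the documentation lists one spelling per encoding, while older or alternative spellings keep working.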


# IBM
% essentially the same as under Microsoft, with some minor changes
% (to be processed manually, if these are to be supported)
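Finding the "minor changes" between two closely related code pages can be automated by decoding all 256 byte values under both and listing where they disagree. The Python sketch below does that (function name hypothetical). Python ships one codec per code page rather than separate IBM and Microsoft variants, so the illustration compares cp437 and cp850 from the list instead.

```python
def codepage_differences(enc_a: str, enc_b: str) -> dict:
    """Return {byte_value: (char_a, char_b)} for every byte that decodes differently.

    A byte left undefined by a codec is reported as None for that codec.
    """
    def decode_byte(value: int, encoding: str):
        try:
            return bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            return None

    differences = {}
    for value in range(256):
        char_a = decode_byte(value, enc_a)
        char_b = decode_byte(value, enc_b)
        if char_a != char_b:
            differences[value] = (char_a, char_b)
    return differences
```

For example, `codepage_differences("cp437", "cp850")[0x9B]` is `('¢', 'ø')`, and all differences lie above 0x7F; comparing `latin-1` with `cp1252` shows the 0x80 to 0x9F block, including bytes that cp1252 leaves undefined.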

# MICROSOFT
EBCDIC % plenty of them are missing on the web
cp037
cp500
cp875
cp1026
PC
cp437 LatinUS
cp737 Greek
cp775 BaltRim
cp850 Latin1
cp852 Latin2
cp855 Cyrillic
cp857 Turkish
cp860 Portuguese
cp861 Icelandic
cp862 Heb