Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-25 Thread Hans Hagen

Mojca Miklavec wrote:

> I'm still slightly confused by the encoding files (texnansi, ec, ...,
> in one case iso-8859-7 is used). Does it mean that it is impossible
> (or at least very complex or slow) to access more than 256 characters
> from a single font at once?

indeed, and since it's related to hyphenation ...

but some day pdftex will be 32 bit and open type so ...

Hans




-
 Hans Hagen | PRAGMA ADE
 Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-

___
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-25 Thread Henning Hraban Ramm


On 2005-07-23 at 00:20, Mojca Miklavec wrote:

> I'm still slightly confused by the encoding files (texnansi, ec, ...,
> in one case iso-8859-7 is used). Does it mean that it is impossible
> (or at least very complex or slow) to access more than 256 characters
> from a single font at once?

TeX, as an old 8-bit system, isn't able to handle more than 256 characters
per font.
Only more modern siblings (like Omega/Aleph) are able to handle
"Unicode size" fonts by themselves.

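A minimal Ruby sketch of the 256-slot limit, assuming that a single-byte
input encoding is a fair stand-in for an 8-bit font encoding. The helper
name fits_in? is made up for this illustration and is not part of ConTeXt:
text that mixes scripts cannot be squeezed into one 256-entry table.

```ruby
# Returns true when every character of +text+ has a slot in the given
# single-byte encoding, i.e. when one 256-entry table is enough.
def fits_in?(text, encoding)
  text.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

polish  = "\u0141\u00F3d\u017A"        # "Łódź"
russian = "\u0422\u0435\u0441\u0442"   # "Тест"

puts fits_in?(polish, "ISO-8859-2")                  # Latin-2 has all four letters
puts fits_in?(russian, "ISO-8859-5")                 # Latin/Cyrillic has all four letters
puts fits_in?(polish + " " + russian, "ISO-8859-2")  # no single table holds both
```

This is why a document in several scripts needs several font encodings
(or a Unicode-aware engine), even when the input regime is UTF-8.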


Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net



Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-22 Thread Mojca Miklavec
Christopher Creutzig wrote:
> We already have
> Iconv in ruby and can, if we know that ISO-8859-2 is a single byte
> coding system, simply say
> 
> conv = Iconv.new("UTF-16", "ISO-8859-2")
> 255.times { |i| puts lookup[conv.iconv("%c" % i)] }
> 
> to get the whole list, assuming we've filled the lookup hash first.

Great!

Sorry for all my philosophising! I don't know ruby (yet) and I didn't
even think about this possibility. My last idea was to parse and
combine the data on http://www.unicode.org/Public/MAPPINGS/VENDORS/,
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt and
http://partners.adobe.com/public/developer/en/opentype/aglfn13.txt,
but your idea is a hundred times faster and better! Thanks a lot!

> As you've said, I'd combine steps A2 and A3, to make ConTeXt run faster.

That's OK for me. If there's a simple internal ruby tool (called every
time the unicode->tex mapping changes or support for some more encodings
is added) instead of a one-time script, there should be no problem doing
that directly.

> If you want, for whatever reason, to use \textellipsis for an
> ellipsis (it just looks horribly wrong to me) instead of \dots, you'd
> need to invoke the ruby script which generates the regi-* files.

I just wanted to give an example that changes are sometimes needed and
that it is difficult to trace all the places where they should have
been made. Sorry, this example wasn't very illustrative, I don't even
know what \textellipsis stands for, I just saw some comments about
changes made in regi-* files or some discrepancies.

>   The whole thing should not require any change at all to ConTeXt
> itself, since the regi-* files could look exactly as they do now, just
> being generated automatically.  (For the multibyte encodings, the whole
> thing gets much more tricky.)

I noticed (perhaps I'm wrong) that the TeX community's support for cyrillic
may be better than what is available in unicode and in the old 8-bit
encodings. ConTeXt also already supports those strange regimes
(ctt, dbk, mls, mnk, mos, ncc, ...) that I was unable to find anywhere
else. In this case one should also be careful not to spoil
this already available feature.

I'm still slightly confused by the encoding files (texnansi, ec, ...,
in one case iso-8859-7 is used). Does it mean that it is impossible
(or at least very complex or slow) to access more than 256 characters
from a single font at once?

Mojca


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-22 Thread Hans Hagen

Christopher Creutzig wrote:

> conv = Iconv.new("UTF-16", "ISO-8859-2")
> 255.times { |i| puts lookup[conv.iconv("%c" % i)] }
>
> to get the whole list, assuming we've filled the lookup hash first.

an alternative is to use the tcx files but that is kind of messy

so we need a utf-8 hash (can be loaded from unic-* files)

Hans





Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-22 Thread Christopher Creutzig

Mojca Miklavec wrote:

> A1.) prepare the files to be used as a source of transformation from
> "any" character set to utf and prepare a list of synonyms for
> encodings


 In my point of view, that should only be a fallback.  We already have 
Iconv in ruby and can, if we know that ISO-8859-2 is a single byte 
coding system, simply say


conv = Iconv.new("UTF-16", "ISO-8859-2")
255.times { |i| puts lookup[conv.iconv("%c" % i)] }

to get the whole list, assuming we've filled the lookup hash first.
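
A hedged sketch of the same idea for a current Ruby, where Iconv is no
longer bundled and String#encode does the transcoding. Note that
255.times stops at byte 254, so the loop here runs over an explicit range
instead. The LOOKUP table and the helper name upper_half are assumptions
made for this illustration; a real table would have to be filled from the
ConTeXt unicode data.

```ruby
# Hypothetical lookup table: Unicode code point => ConTeXt command name.
LOOKUP = {
  0x0141 => "\\Lstroke",
  0x0142 => "\\lstroke",
  0x0160 => "\\Scaron",
}.freeze

# Returns [byte, code point] pairs for every byte 128..255 that the
# given single-byte encoding assigns to a character.
def upper_half(encoding)
  pairs = []
  (128..255).each do |byte|
    begin
      char = [byte].pack("C").force_encoding(encoding)
      pairs << [byte, char.encode("UTF-8").ord]
    rescue EncodingError
      next # byte is unassigned in this encoding
    end
  end
  pairs
end

# Print one regi-* style line for every byte that has a known command.
upper_half("ISO-8859-2").each do |byte, codepoint|
  command = LOOKUP[codepoint]
  puts format("\\defineactivetoken %d {%s} %% U+%04X", byte, command, codepoint) if command
end
```

Bytes without an entry in the table are silently skipped here; a real
generator would probably want to report them.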


 As you've said, I'd combine steps A2 and A3, to make ConTeXt run 
faster.  If you want, for whatever reason, to use \textellipsis for an 
ellipsis (it just looks horribly wrong to me) instead of \dots, you'd 
need to invoke the ruby script which generates the regi-* files.


 The whole thing should not require any change at all to ConTeXt 
itself, since the regi-* files could look exactly as they do now, just 
being generated automatically.  (For the multibyte encodings, the whole 
thing gets much more tricky.)



Christopher


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-20 Thread Mojca Miklavec
Christopher Creutzig wrote:
> Hans Hagen wrote:
> >> So why not map the characters to unicode first and define the
> >> mapping from unicode to \TeXcommand only once? regi-* files (at least
> >> in the meaning they have now) could be prepared automatically by a
> >> script, less error-prone and without the need to say "Some more
> >> definitions will be added later."
> >>
> > you mean ...
> >
> > \defineactivetoken 123 {\uchar{...}{...}}
> >
> > it is an option but it's much slower and takes much more memory
> 
>   I may be wrong, of course, but I think Mojca proposed something
> different (and something that should be really easy to implement):  Have
> the unicode vectors stored in a format easily parsed by an external ruby
> script and create the regi-* files from that, using the conversion
> tables provided by your operating system or iconv or wherever ruby gets
> them from.

Yes, I had something different in mind.

A1.) prepare the files to be used as a source of transformation from
"any" character set to utf and prepare a list of synonyms for
encodings

(example: a file that says that in ISO-8859-2, character 0xA3
represents the Unicode character 0x0141 (Lstroke): for every character,
for every Mac/Windows/iso/[...] encoding that we want to support)

A2.) write a script which automatically generates regi-* files from
those files, but regi-* files would contain only the mapping to
unicode number

(example:
\startregime[iso-8859-2]
...
\somecommandtomapacharactertounicode {163}{1}{65} % Lstroke
...
\stopregime)

A3.) prepare a huge file with mapping from unicode numbers to ConTeXt commands

(example:
...
\somecommandtomapfromunicodetocontext {1}{65}{\Lstroke}
...)

A4.) ... I don't mind what ConTeXt does with this \Lstroke afterwards,
but it seems it is already clever enough to produce the (proper) glyph
at the end

What should ConTeXt do with that?
B1.) The file under A3 should be processed at the beginning. As it may
become really huge, exotic definitions should only be preloaded if
asked for (\usemodule[korean]), while there is probably no harm if
(accented) latin, greek, cyrillic and punctuation (TM, copyright, ..)
are preloaded by default

B2.) Once \enableregime[iso-8859-2] or any other regime is
requested, the file with the corresponding regime definitions is
processed. However, as \somecommandtomapacharactertounicode
{163}{1}{65} is processed, the character '163' is not stored as
\uchar{1}{65}, but as \Lstroke. '\somecommandtomapacharactertounicode'
would first look up which ConTeXt command is saved under
\uchar{1}{65} and call
\defineactivetoken 163 {\Lstroke} as a result.

I don't know the details of the ConTeXt internal stuff, but I think
(hope) that it should be possible to do it this way. B1 (preloading
mapping from unicode to tex commands) is probably the only "hungry"
step in the whole story.
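
Step A2 can be sketched in a few lines of Ruby. This is only an
illustration under stated assumptions: the macro name is the placeholder
from the proposal above (no such macro exists in ConTeXt), the helper
names uchar_pair and regime_lines are made up, and the byte-to-Unicode
table comes from Ruby's own transcoder rather than from the files of A1.
Byte 163 (0xA3) maps to U+0141, which splits into the pair {1}{65}.

```ruby
# Placeholder macro name from the proposal; not an existing ConTeXt command.
REGIME_MACRO = "\\somecommandtomapacharactertounicode"

# Split a code point below 0x10000 into the {high}{low} byte pair.
def uchar_pair(codepoint)
  codepoint.divmod(256)
end

# Build regi-* lines that map input bytes to Unicode numbers only.
def regime_lines(encoding, bytes)
  bytes.map do |byte|
    codepoint = [byte].pack("C").force_encoding(encoding).encode("UTF-8").ord
    high, low = uchar_pair(codepoint)
    format("%s {%d}{%d}{%d} %% U+%04X", REGIME_MACRO, byte, high, low, codepoint)
  end
end

puts "\\startregime[iso-8859-2]"
puts regime_lines("ISO-8859-2", [0xA3, 0xB3])
puts "\\stopregime"
```

The generated file knows nothing about ConTeXt command names, so a change
in the unicode->ConTeXt table (A3) never requires regenerating it.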

I think that it doesn't make any sense to ask the user to "\input
regi-whatever". \enableregime and some additional definitions should
be clever enough to find out which file to process in order to enable
the proper regime.

%

Christopher's idea is actually yet another alternative, which combines
steps A2 and A3. If the unicode->ConTeXt mapping is in some
easy-to-parse format, there's actually no additional effort if the
script writes the ConTeXt commands directly into the regi-* files
instead of unicode numbers, so that B2 has less work to do. As long as it
is guaranteed that nobody will change these files manually, this is
OK. The only drawback is that if someone notices that "\textellipsis"
is more suitable than "\dots", the script has to be changed and the
files have to be generated once more. If the character is mapped to
(0x2026 HORIZONTAL ELLIPSIS) instead, only one line in the file with
the unicode->ConTeXt mapping (A3) has to be changed.

If B2 cannot work as described, Christopher's proposal would be
the only proper way to go.
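
A rough Ruby sketch of the combined A2+A3 variant, assuming the
unicode->ConTeXt table is available as a plain hash. The table contents
and the helper name regime_body are illustrative assumptions, not the
real ConTeXt data. It shows the trade-off described above: the regimes
are generated from one table, so swapping \dots for \textellipsis is a
one-line change, but every regi-* file has to be regenerated afterwards.

```ruby
# Illustrative Unicode => ConTeXt command table; the entries are assumptions.
UNICODE_TO_CONTEXT = {
  0x2026 => "\\dots",       # HORIZONTAL ELLIPSIS
  0x2122 => "\\trademark",  # TRADE MARK SIGN
}.freeze

# Build \defineactivetoken lines for the upper half of a single-byte encoding.
def regime_body(encoding, table)
  (128..255).each_with_object([]) do |byte, lines|
    begin
      codepoint = [byte].pack("C").force_encoding(encoding).encode("UTF-8").ord
    rescue EncodingError
      next # this byte is unassigned in the encoding
    end
    command = table[codepoint]
    lines << "\\defineactivetoken #{byte} {#{command}}" if command
  end
end

# The same table serves every regime.
%w[Windows-1250 Windows-1252].each do |encoding|
  puts "% #{encoding}", regime_body(encoding, UNICODE_TO_CONTEXT)
end

# Changing one table entry changes all regimes on the next run.
updated = UNICODE_TO_CONTEXT.merge(0x2026 => "\\textellipsis")
puts "% Windows-1250, regenerated", regime_body("Windows-1250", updated)
```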

%

I wanted to test \showcharacters on live.contextgarden.net (as
Hans suggested that my map files are probably not OK), but it didn't
compile there. (I hope it's not because of my buggy contributions in
the last few days.)

Is there any tool or macro to visualize all the glyphs available in a
font? \showcharacters (if it works) shows only the glyphs that ConTeXt
is aware of. What about the rest?

Mojca


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-20 Thread Christopher Creutzig

Hans Hagen wrote:

>> So why not map the characters to unicode first and define the
>> mapping from unicode to \TeXcommand only once? regi-* files (at least
>> in the meaning they have now) could be prepared automatically by a
>> script, less error-prone and without the need to say "Some more
>> definitions will be added later."
>
> you mean ...
>
> \defineactivetoken 123 {\uchar{...}{...}}
>
> it is an option but it's much slower and takes much more memory


 I may be wrong, of course, but I think Mojca proposed something 
different (and something that should be really easy to implement):  Have 
the unicode vectors stored in a format easily parsed by an external ruby 
script and create the regi-* files from that, using the conversion 
tables provided by your operating system or iconv or wherever ruby gets 
them from.



regards,
Christopher


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-19 Thread Hans Hagen

Mojca Miklavec wrote:

> Hans Hagen wrote:
>
>> Mojca Miklavec wrote:
>>
>>>> (concerning eregi-* files: you can define filesynonyms so we need a list of
>>>> filesynonyms and regimesynonyms)
>>>
>>> What do you mean by writing file synonyms? Where would it be used?
>>
>> \definefilesynonym  [mojka]  [mojca]
>> \definefilesynonym  [moika]  [mojca]
>> \definefilesynonym  [moica]  [mojca]
>
> Ok, if you are provoking me, I'll strike back:
> None of the definitions above are allowed because they don't warn the
> user if he's using the wrong name. They should throw an error instead.
> The only proper way would be to define something like
>
> \setuplabeltext[\s!en][\v!pronouncemyname=moitsa]
> \setuplabeltext[\s!de][\v!pronouncemyname=mojza]
> \setuplabeltext[\s!ru][\v!pronouncemyname=мойца]
> ...

so how about using:

\translate[en=moitsa,de=mojza,ru=мойца]

then -)

> OK. I'll prepare \defineregimesynonym-s proposals, but I still don't
> know what the file synonyms should be used for in this context. The
> user probably doesn't need to care about file names?

depends on if you want to preload all those vectors (takes quite some
memory, although i may find a way around that [maybe delayed loading])

> So why not map the characters to unicode first and define the
> mapping from unicode to \TeXcommand only once? regi-* files (at least
> in the meaning they have now) could be prepared automatically by a
> script, less error-prone and without the need to say "Some more
> definitions will be added later."

you mean ...

\defineactivetoken 123 {\uchar{...}{...}}

it is an option but it's much slower and takes much more memory

\uchar{2}{33} takes 1 hash pointer and 7 char slots (so probably 8 mem
locations) while \eacute takes one mem location
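
The count above can be reproduced with a rough Ruby sketch. It assumes
one memory word per token and a simplified tokenizer (one token per
control sequence, one per remaining non-space character); TeX's real
memory accounting is more involved, and the helper name token_count is
made up for this illustration.

```ruby
# Rough token count for a TeX replacement text: one token per control
# sequence (\name or \<single char>), one per other non-space character.
def token_count(replacement)
  replacement.scan(/\\[A-Za-z]+|\\.|\S/).size
end

via_uchar = token_count("\\uchar{2}{33}")  # 1 control sequence + 7 characters
via_name  = token_count("\\eacute")        # 1 control sequence

puts "per character: #{via_uchar} vs #{via_name}"
puts "for 128 upper-half characters: #{128 * via_uchar} vs #{128 * via_name}"
```

Under these assumptions a regime that stores \uchar pairs for all 128
upper-half bytes needs about eight times as many token slots as one that
stores named commands.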



> Is it possible to switch the regimes in the middle of the document
> (like it is possible to switch the languages)? An example usage would
> be if some input documents (plain text, some older TeX files or
> database entries) are written in some other encoding than the main
> stream.
> (Possibly switching in such a way that no leftovers remain after the
> old encoding is replaced by a new one.)

switching is possible but in that case you probably want to set toc/index/etc expansion to yes

Hans 





Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-18 Thread Mojca Miklavec
Hans Hagen wrote:
> Mojca Miklavec wrote:
> 
> >>(concerning eregi-* files: you can define filesynonyms so we need a list of 
> >>filesynonyms and regimesynonyms)
> >>
> >
> >What do you mean by writing file synonyms? Where would it be used?
> 
> \definefilesynonym  [mojka]  [mojca]
> \definefilesynonym  [moika]  [mojca]
> \definefilesynonym  [moica]  [mojca]

Ok, if you are provoking me, I'll strike back:
None of the definitions above are allowed because they don't warn the
user if he's using the wrong name. They should throw an error instead.
The only proper way would be to define something like

\setuplabeltext[\s!en][\v!pronouncemyname=moitsa]
\setuplabeltext[\s!de][\v!pronouncemyname=mojza]
\setuplabeltext[\s!ru][\v!pronouncemyname=мойца]
...

> >For unicode regimes, this is probably a useful (more or less complete) set.
> >
> >\defineregimesynonym[utf8][utf]
> >\defineregimesynonym[utf 8][utf]
> >
> >
> the spacy one does not make much sense
> 
> >\defineregimesynonym[utf-8][utf]
> >\defineregimesynonym[unicode][utf]
> >
> >
> not sure about this one

Me neither, but "utf" alone is just as doubtful as this one. However,
leaving utf-8 and utf8 only is OK.

> >(Btw, I tried all the four before I got the answer on the mailing list
> >that I should use 'utf' instead.)
> >
> >For the rest of the regimes I have to take a look first, so that I
> >don't say anything wrong. There has to be only one clear scheme.
> >
> indeed, i'll wait patiently for your complete list of synonyms

OK. I'll prepare \defineregimesynonym-s proposals, but I still don't
know what the file synonyms should be used for in this context. The
user probably doesn't need to care about file names?

> >What's the proper name for nonbreaking space, '~', to be put in regi-* file?
> >
> how about \nonbreakablespace

Thanks. There was no such glyph in \showcharacters -)

(PS: I'm sorry for blaming the innocent commands \showcharacters
and \showaccents for the malfunction. I accidentally placed them
after an \obeylines command as I was debugging some files. They
couldn't have worked there anyway.)

%%%

I wanted to post this in another thread, but it probably still fits on
this place:

The regi-* files currently map characters from individual encodings
directly into \TeXcommands. But unicode is already supported in
ConTeXt and the mappings from the individual encodings into unicode are
pretty well defined (perhaps there are some exceptions?) and can be
obtained elsewhere on the internet. On the other hand, mapping from
unicode to \TeXcommands is much less straightforward and sometimes
subjective.

I noticed some comments in regi-* files like
  % \texttrademark changed to \trademark
or
  % \dots changed to \textellipsis

Whoever makes changes like that probably makes them in only one
file; the rest remains as is (and probably becomes deprecated, if not
nonfunctional, one day).

On the other hand, there are around ten different cyrillic encodings
(mostly they are already supported by ConTeXt, but anyway) and many
other encodings in other languages as well. This means that the same
cyrillic letter has to be assigned its name in ten files (regimes),
possibly manually.

So why not map the characters to unicode first and define the
mapping from unicode to \TeXcommand only once? regi-* files (at least
in the meaning they have now) could be prepared automatically by a
script, less error-prone and without the need to say "Some more
definitions will be added later."


Is it possible to switch the regimes in the middle of the document
(like it is possible to switch the languages)? An example usage would
be if some input documents (plain text, some older TeX files or
database entries) are written in some other encoding than the main
stream.
(Possibly switching in such a way that no leftovers remain after the
old encoding is replaced by a new one.)

Mojca


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-18 Thread Hans Hagen

Mojca Miklavec wrote:

>> maybe a better name is regi-ce or just regi-1250
>
> regi-ce is a bad name as there are 4 central european encodings
> (IBM-852, ISO-8859-2, MacCE and Windows-1250) plus Croatian. 1250
> alone is probably OK, but there's no hint in the file name about which
> encoding is meant (windows/ibm/iso/mac ...).
>
> I tested the code for regime synonyms and it looks OK. Thanks for
> investigating my request :)

ok, i'll add it to enco-ini then

>> (concerning eregi-* files: you can define filesynonyms so we need a list of
>> filesynonyms and regimesynonyms)
>
> What do you mean by writing file synonyms? Where would it be used?

\definefilesynonym  [mojka]  [mojca]
\definefilesynonym  [moika]  [mojca]
\definefilesynonym  [moica]  [mojca]

> For unicode regimes, this is probably a useful (more or less complete) set.
>
> \defineregimesynonym[utf8][utf]
> \defineregimesynonym[utf 8][utf]

the spacy one does not make much sense

> \defineregimesynonym[utf-8][utf]
> \defineregimesynonym[unicode][utf]

not sure about this one

> (Btw, I tried all the four before I got the answer on the mailing list
> that I should use 'utf' instead.)
>
> For the rest of the regimes I have to take a look first, so that I
> don't say anything wrong. There has to be only one clear scheme.

indeed, i'll wait patiently for your complete list of synonyms

>> there are
>>
>> \showcharacters
>> \showaccents
>
> Thank you. The commands were only kind-of-working here. They produced
> the table that I wanted (and quite some trash as well), but they were
> complaining a lot.
>
> Thanks for the contribution into Visual debugging, Hraban!
>
> What's the proper name for nonbreaking space, '~', to be put in regi-* file?

how about \nonbreakablespace

Hans



Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-18 Thread Hans Hagen

Mojca Miklavec wrote:

> \Dstroke has some "problems" anyway, at least in cmr (lmr?). The
> stroke should be on the left, but it is on the right. I thought it was
> just because \tt doesn't have that glyph, but the roman version is
> also rendered extremely badly.

in case of doubt, you can discuss this with Boguslaw Jackowski (jacko),
who is in charge of latin modern; it should be ok in latin roman

> So what is the proper way of writing 'ä' (a umlaut) then?

in german mode, "u will produce it (tricky since there is no hyphenation
then)

latin modern did have them and there is a special encoding vector in the
context distribution (waiting for those umlauts to show up again)

Hans



Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-18 Thread Mojca Miklavec
Hans Hagen wrote:
> Mojca Miklavec wrote:
> 
> >regi-lat.tex is interesting, made just for typesetting Croatian :)
> >Perhaps I can add some stuff there too.
> >
> >\defineactivetoken đ {\pseudoencodeddj}
> >\defineactivetoken Ð {\pseudoencodedDJ}
> >
> >This should be \dstroke and \Dstroke.
> >
> ok, changed

Thank you.

\Dstroke has some "problems" anyway, at least in cmr (lmr?). The
stroke should be on the left, but it is on the right. I thought it was
just because \tt doesn't have that glyph, but the roman version is
also rendered extremely badly.

> >Where did the "hungarumlaut" characters get the name from?
> >
> the names probably come from postscript

Thanks, I looked into some .afm files and they were actually there.

> btw, there is a difference between umlaut and diaeresis (height)

So what is the proper way of writing 'ä' (a umlaut) then?

> can't you make it into a
> 
> \defineactivetoken 128 {\texteuro} % € 20AC EURO SIGN
> 
> kind of table?

Good idea indeed, it looks much nicer this way.

> maybe a better name is regi-ce or just regi-1250

regi-ce is a bad name as there are 4 central european encodings
(IBM-852, ISO-8859-2, MacCE and Windows-1250) plus Croatian. 1250
alone is probably OK, but there's no hint in the file name about which
encoding is meant (windows/ibm/iso/mac ...).
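
A small Ruby sketch of why the encoding family matters in the name: the
same byte means different characters in two of the central european
encodings. Only the two encodings below are checked here, and the helper
name decode_byte is made up for this illustration.

```ruby
# Decode a single byte under the given 8-bit encoding and return its
# Unicode code point.
def decode_byte(byte, encoding)
  [byte].pack("C").force_encoding(encoding).encode("UTF-8").ord
end

# Byte 0xA9 is S with caron in ISO-8859-2 but the copyright sign in
# Windows-1250.
%w[ISO-8859-2 Windows-1250].each do |encoding|
  puts format("%-12s 0xA9 -> U+%04X", encoding, decode_byte(0xA9, encoding))
end
```

A regime file named only "ce" would therefore give no hint which of
these tables it implements.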


I tested the code for regime synonyms and it looks OK. Thanks for
investigating my request :)

> (concerning eregi-* files: you can define filesynonyms so we need a list of 
> filesynonyms and regimesynonyms)

What do you mean by writing file synonyms? Where would it be used?

For unicode regimes, this is probably a useful (more or less complete) set.

\defineregimesynonym[utf8][utf]
\defineregimesynonym[utf 8][utf]
\defineregimesynonym[utf-8][utf]
\defineregimesynonym[unicode][utf]

(Btw, I tried all the four before I got the answer on the mailing list
that I should use 'utf' instead.)

For the rest of the regimes I have to take a look first, so that I
don't say anything wrong. There has to be only one clear scheme.

> there are
> 
> \showcharacters
> \showaccents

Thank you. The commands were only kind-of-working here. They produced
the table that I wanted (and quite some trash as well), but they were
complaining a lot.

Thanks for the contribution into Visual debugging, Hraban!


What's the proper name for nonbreaking space, '~', to be put in regi-* file?

Mojca


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-17 Thread VnPenguin
On 7/17/05, Hans Hagen <[EMAIL PROTECTED]> wrote:
> Mojca Miklavec wrote:
> 
> >regi-lat.tex is interesting, made just for typesetting Croatian :)
> >Perhaps I can add some stuff there too.
> >
> >\defineactivetoken đ {\pseudoencodeddj}
> >\defineactivetoken Ð {\pseudoencodedDJ}
> >
> >This should be \dstroke and \Dstroke.
> >
> >
> ok, changed
> 

yes, there are also exactly two glyphs \dstroke and \Dstroke in Vietnamese :)

Cheers,
-- 
http://vnoss.org
Vietnamese Open Source Software Community


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-17 Thread Hans Hagen

Henning Hraban Ramm wrote:

> On 2005-07-17 at 22:37, Hans Hagen wrote:
>
>> there are
>>
>> \showcharacters
>> \showaccents
>
> BTW I finally created the wiki page "Visual Debugging" for all the
> \show... commands; I guess there are even more than I listed there,
> and some descriptions are still missing (had no time to try them all).

thanks

(\trace... is also handy)

Hans 





Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-17 Thread Henning Hraban Ramm

On 2005-07-17 at 22:37, Hans Hagen wrote:

> there are
>
> \showcharacters
> \showaccents

BTW I finally created the wiki page "Visual Debugging" for all the
\show... commands; I guess there are even more than I listed there,
and some descriptions are still missing (had no time to try them all).



Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net



Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-17 Thread Hans Hagen

Mojca Miklavec wrote:

> I'm now attaching a file for support for windows-1250-encoded files.
> One character is missing (I don't know what to write for non-breaking
> space) and it's not extensively tested or proofread for typos. So if
> someone can cast an eye over it, I'll be glad.

maybe a better name is regi-ce or just regi-1250

> Does anyone have any script to test the encoding (which would produce
> a matrix of (almost) 256 characters)?

there are

\showcharacters
\showaccents

it all depends on the combination of input regime and font encoding

Hans 





Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-17 Thread Hans Hagen

Mojca Miklavec wrote:

> regi-lat.tex is interesting, made just for typesetting Croatian :)
> Perhaps I can add some stuff there too.
>
> \defineactivetoken đ {\pseudoencodeddj}
> \defineactivetoken Ð {\pseudoencodedDJ}
>
> This should be \dstroke and \Dstroke.

ok, changed

> Where did the "hungarumlaut" characters get the name from? Wouldn't it
> be better to have "doubleacute" (as in the Unicode standard)? We also
> don't name the characters "germanumlaut" but "diaeresis" instead.

the names probably come from postscript

btw, there is a difference between umlaut and diaeresis (height)

Hans 





Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-15 Thread Mojca Miklavec
Henning Hraban Ramm wrote:
> You did read http://contextgarden.net/Encodings_and_Regimes and
> linked pages, did you?
> If you learn anything new, please add it to the wiki!

Thank you! It was probably me who copy-pasted some of the material
there from some thread, but when I looked at it once again, I learnt
something new. A while ago I was asking how to typeset things in
windows-1250 encoding (\usepackage[cp1250]{inputenc} in LaTeX).  I got
some answer (just a temporary solution with csr fonts), but it was not
a satisfying one.

I'm now attaching a file for support for windows-1250-encoded files.
One character is missing (I don't know what to write for non-breaking
space) and it's not extensively tested or proofread for typos. So if
someone can cast an eye over it, I'll be glad.

Does anyone have any script to test the encoding (which would produce
a matrix of (almost) 256 characters)?

regi-lat.tex is interesting, made just for typesetting Croatian :)
Perhaps I can add some stuff there too.

\defineactivetoken đ {\pseudoencodeddj}
\defineactivetoken Ð {\pseudoencodedDJ}

This should be \dstroke and \Dstroke.

Where did the "hungarumlaut" characters get the name from? Wouldn't it
be better to have "doubleacute" (as in the Unicode standard)? We also
don't name the characters "germanumlaut" but "diaeresis" instead.

Mojca


regi-cp1250.tex
Description: TeX document


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-15 Thread Henning Hraban Ramm

On 2005-07-14 at 21:13, Steffen Wolfrum wrote:

> But, why is the Vietnamese example with
> \enableregime[utf]
> linked under
> vis = viscii   VISCII   Vietnamese
> and not accessible with
> utf   UTF-8   Unicode ? (Same for cyrillic)
>
> Is this just a wrong link, or does this show that I haven't
> understood the relationship between regimes and encoding?
> Shouldn't all UTF relevant examples be listed under UTF?

All examples are (could be) relevant for UTF-8, because you can set
(nearly) everything in Unicode.

VISCII is one possible encoding for Vietnamese (and only for
Vietnamese), so I found it rather logical to link from there to
Vietnamese, even if the Vietnamese example uses UTF-8, which is probably
more modern - as probably a lot of other encodings are
obsolete/deprecated.

So, even if the Vietnamese example could be considered a general UTF-8
example, it shows how one can (and perhaps should) typeset Vietnamese.

So I guess the only error or missing link is the link from UTF-8 to
Vietnamese (and Cyrillic). Do it yourself as you please.



Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net



Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-14 Thread Radhelorn

Steffen Wolfrum wrote:

>>> I know there is \enableregime[utf]
>>> but what else do I need so that the output equals my utf-8 input?
>>>
>>> Could someone maybe give a short and usable How-To on common examples:
>>> Greek
>>> Russian
>>> an East European language
>>> and an Asian language?
>>
>> You did read http://contextgarden.net/Encodings_and_Regimes and
>> linked pages, did you?
>> If you learn anything new, please add it to the wiki!
>
> Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF.
> But as you wrote "linked pages" I became more curious and also looked up those
> pages. Indeed, there is more:
>
> But, why is the Vietnamese example with
> \enableregime[utf]
> \setupencoding[default=t5]
> linked under
> vis = viscii   VISCII   Vietnamese
> and not accessible with
> utf   UTF-8   Unicode ? (Same for cyrillic)
>
> Is this just a wrong link, or does this show that I haven't understood the
> relationship between regimes and encoding?
>
> Shouldn't all UTF relevant examples be listed under UTF?

\enableregime is not enough. You need to set up the font encoding and an
appropriate bodyfont. For these see type-enc, type-pre and such.


Example for cyrillic:

\enableregime [utf]
\setupencoding [default=t2a]
\usetypescript [modern-base] [\defaultencoding]
\setupbodyfont [modern]

\starttext
Тест.
\stoptext


--
Radhelorn <[EMAIL PROTECTED]>


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-14 Thread VnPenguin
On 7/14/05, Steffen Wolfrum <[EMAIL PROTECTED]> wrote:
> 
> 
> Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF.
> But as you wrote "linked pages" I became more curious and looked up also those
> pages. Indeed, there is more:
> 
> But, why is the Vietnamese example with
> \enableregime[utf]
> \setupencoding[default=t5]
> linked under
> vis = viscii   VISCII   Vietnamese
> and not accessible with
> utf   UTF-8   Unicode ? (Same for cyrillic)

Sorry, I cannot understand your question.

You can typeset Vietnamese in TeX/LaTeX and ConTeXt with different input
encodings: TCVN, VISCII, VPS or UTF-8.

I'm currently using UTF-8 input with ConTeXt without problems. I have not
yet tested other input encodings, but they cause no problems with
TeX/LaTeX, so they should be ok with ConTeXt, or am I wrong?

-- 
http://vnoss.org
Vietnamese Open Source Software Community


Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-14 Thread Steffen Wolfrum
Hi Henning,


Quoting Henning Hraban Ramm <[EMAIL PROTECTED]>:

> On 2005-07-14 at 11:30, Steffen Wolfrum wrote:
>
> > I know there is \enableregime[utf]
> > but what else do I need so that the output equals my utf-8 input?
> >
> > Could someone maybe give a short and usable How-To on common examples:
> > Greek
> > Russian
> > an East European language
> > and an Asian language?
>
> You did read http://contextgarden.net/Encodings_and_Regimes and
> linked pages, did you?
> If you learn anything new, please add it to the wiki!


Well, yes, I wasn't interested in e.g. VISCII, but I read the info for UTF.
But as you wrote "linked pages" I became more curious and also looked up those
pages. Indeed, there is more:

But, why is the Vietnamese example with
\enableregime[utf]
\setupencoding[default=t5]
linked under
vis = viscii   VISCII   Vietnamese
and not accessible with
utf   UTF-8   Unicode ? (Same for cyrillic)

Is this just a wrong link, or does this show that I haven't understood the
relationship between regimes and encoding?

Shouldn't all UTF relevant examples be listed under UTF?


So, sorry for starting this irrelevant thread,

Steffen



Re: [NTG-context] Basic question on Unicode and ConTeXt

2005-07-14 Thread Henning Hraban Ramm

On 2005-07-14 at 11:30, Steffen Wolfrum wrote:

> I know there is \enableregime[utf]
> but what else do I need so that the output equals my utf-8 input?
>
> Could someone maybe give a short and usable How-To on common examples:
> Greek
> Russian
> an East European language
> and an Asian language?

You did read http://contextgarden.net/Encodings_and_Regimes and
linked pages, did you?

If you learn anything new, please add it to the wiki!


Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net
