Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Peter Dyballa

On 13.11.2011 at 23:14, Ross Moore wrote:

> Is there a EUR 0,01 coin?   :-)

Yes, 1 ¢ and 2 ¢ coins exist.

--
With peaceful greetings

  Pete

When Richard Stallman goes to the loo, he core dumps.




--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Ross Moore
Hi all,

On 14/11/2011, at 7:55 AM, Zdenek Wagner wrote:

> Before typing a document one should think what will be the purpose of
> it. If the only purpose is to have it typeset by (La)TeX, I would just
> use well known macros and control symbols (~, $, &, %, ^, _). If the
> text should be stored in a generic database, I cannot use ~ because I
> do not know whether it will be processed by TeX. I cannot use &nbsp;
> because I do not know whether it will be processed by HTML-aware
> tools. I cannot even use &#xa0; because the tool used for processing
> the exported data may not understand entities at all. In such a case I
> must use U+00a0 and make sure that the tool used for processing the
> data knows how to handle it, or I should plug in a preprocessor. 

This is exactly correct.
Text will be entered into whatever tools are used for storing data.
Such text may well contain characters (rightly or wrongly) that
have not traditionally been used in (La)TeX typesetting.

Thus the problem is: "what should be the default (Xe)TeX behaviour
when encountering such characters in the input stream?"


Currently there is no part of building the XeTeX formats
that handles these, apart from

   "00A0  (= u00a0)  being set to have \catcode 12;
see the coding of  xetex.ini .
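
(As a sketch, the relevant setting is essentially just

   \catcode"00A0=12   % U+00A0 NO-BREAK SPACE scans as an "other" character

in XeTeX's hex notation; the exact spelling in  xetex.ini  may differ.)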

Nothing sets any properties of characters in the range:

   U+2000 --> U+200F ,  U+2028 --> U+202F

apart from perhaps  bidi.sty , which needs the RTL and
LTR marks, ZWNJ and maybe some others.
But  bidi.sty  is optionally loaded by the user, so does not 
count here as the *default* behaviour for XeTeX-based formats.

The result is that these characters just pass through to the 
output, as part of a character string within the PDF,
*provided* the font supports them.


However, the traditional .tfm-based TeX fonts just treat these 
as missing characters, contributing zero to the metric width.
There'll be a message in the .log file:

>>> Missing character: There is no   in font cmr10!
>>> Missing character: There is no   in font cmr10!
>>> Missing character: There is no   in font cmr10!
>>> Missing character: There is no   in font cmr10!


This seems like a reasonable default behaviour, especially
in light of the lack of consensus to do anything else.


One slight problem is that those "Missing character" messages
do not go to the "console" window, but only to the .log  file.
Many users will not notice this.

Although this is just following TeX's design, which was quite sensible 
when TeX used only its own CMR fonts, I think that XeTeX 
should also have directed such warning messages to the console. 
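
(A sketch of a partial remedy: the parameter controlling these messages
can be raised, e.g.

   \tracinglostchars=2   % values above 1 echo "Missing character" to the
                         % terminal as well; 3 makes it an error, in
                         % engines recent enough to support values above 1

Classic TeX82 behaviour only distinguishes zero from non-zero here, so
whether this helps depends on the engine version in use.)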

XeTeX has stepped out of the tightly controlled environment 
of traditional TeX jobs, so it should also have re-thought 
what counts as an "error", a "warning", or extra technical information, 
and how relevant each of these is to users/authors.


The point here is that users might simply not notice that 
some of the characters in their inputs may not have been 
processed in the best possible way.
This would be particularly the case for characters that have
no visible rendering, but just insert extra space, as are
being discussed in this thread.


> 
>>> Where would such a default take place:
>>> - XeTeX engine
>>> - XeLaTeX format
>>> - some package (xunicode, fontspec, some new package)

xunicode  doesn't handle the meaning of non-ASCII input.
It is designed primarily for mapping legacy ASCII-style input
(via macro names) to the best possible Unicode code point(s).

fontspec  isn't right either, as we are talking about spacing,
not actual printed characters from a font.

>>> - my own package/preamble template
>> 
>> None of these ?  In a stand-alone file that can be \input
>> by Plain XeTeX users, by XeLaTeX users, and by XeLaTeX
>> package authors.

I think that this counts as a "package", just using a .tex
(or other) suffix, rather than necessarily .sty.


A TECkit mapping file is another place where these characters
can be processed; e.g. removed, if there is no need for them
to be part of the final PDF output.

However, this makes it harder to apply earlier logic that
tests the context in which these special characters occur
and acts accordingly.
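
For illustration, a fragment of such a mapping source might look like
this (a sketch only, following the TECkit mapping-language conventions;
the rules shown are assumptions, not tested here):

   ; drop or normalise some invisible space characters
   pass(Unicode)
   U+00A0  >  U+0020     ; no-break space -> ordinary space
   U+200B  >             ; zero-width space -> deleted

Such a file would be compiled with  teckit_compile  and attached to a
font via the  mapping=  keyword of XeTeX's extended \font syntax (or
fontspec's Mapping option).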


>> 
>> In a future XeTeX variant (if such a thing comes to exist),
>> the functionality could be built into the engine.

Certainly some default behaviour could be included.
But what is best?

Assigning a \catcode of 10 would be appropriate in
some situations, for some characters.
Making some characters active, then giving an expansion,
would be appropriate in other situations.
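
For concreteness, the two approaches might look like this in plain
XeTeX syntax (a sketch; the ^^^^ notation needs lowercase hex digits,
and the expansions shown are only one plausible choice):

   % approach 1: let U+00A0 collapse into an ordinary space token
   % (cheap, but the no-break semantics are lost)
   \catcode"00A0=10

   % approach 2: make U+202F active and give it an explicit meaning
   \catcode"202F=\active
   \def^^^^202f{\kern .16667em }  % a kern: thin space, no break allowed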

Packages could be written for these situations.

But then, as always, it is up to the users to recognise
the issues, for their own particular data and their
own output requirements, then choose packages accordingly.


>> 
>> My EUR 0,02 (while we still have one).
>> ** Phil.

Is there a EUR 0,01 coin?   :-)
We lost our AUD 0.01 and 0.02 coins long ago.
There is even talk now of dropping the 0.05 one.


Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Zdenek Wagner
2011/11/13 Philip TAYLOR :
>
>
> Tobias Schoel wrote:
>>
>> Now that the practicability is settled, let's come back to the
>> philosophical part:
>
> Actually, I think this is the practical/pragmatic part,
> but let's carry on none the less ...
>>
>> Should &nbsp; = u00a0 be active and treated as ~ by default? Just like
>> u202f and u2009 should be active and treated as \, and \,\hspace{0pt}?
>
> Well : a macro-based solution is certainly the best place
> to start (and to experiment) but the particular expansions
> that you have chosen are not entirely generic : \hspace,
> for example, is unknown in Plain TeX, and is therefore
> better replaced with \hskip.  Whether \hskip would then
> work happily with LaTeX, I have no idea, but it is by
> no means unreasonable to think that there might be format-
> specific definitions for each of these characters.
>>
In LaTeX \hskip does exactly the same as in plain, but the question is
when this replacement should occur. It may seem that a TECkit map can
be used, but that is applied after all macros have been expanded and
the horizontal list is being created. If you replace U+00a0 with
\hskip at that time, the text "\hskip" will be printed in the current
font. In order to insert \hskip as a token, the replacement has to
occur in TeX's mouth. The size of the skip, its stretchability and
shrinkability are taken from the \fontdimen registers of the current
font, but the mouth does not know what font will be current when the
replacement for U+00a0 is processed by TeX's stomach. The mouth cannot
simply replace it with ~ because it does not know what ~ will mean
when it is processed in the stomach.
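
To make that concrete, a mouth-level solution defers the font-dependent
part until the stomach sees it; a sketch in plain XeTeX syntax (one
plausible choice of expansion, not a definitive implementation):

   \catcode"00A0=\active
   \def^^^^00a0{\penalty10000       % forbid the break, then insert
     \hskip\fontdimen2\font         % the *current* font's interword space
       plus\fontdimen3\font         % with its stretch
       minus\fontdimen4\font\relax} % and shrink

The \fontdimen values are looked up only when the active character is
actually typeset, so the glue always matches whatever font is current
at that point.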

Before typing a document one should think about what its purpose will
be. If the only purpose is to have it typeset by (La)TeX, I would just
use well known macros and control symbols (~, $, &, %, ^, _). If the
text should be stored in a generic database, I cannot use ~ because I
do not know whether it will be processed by TeX. I cannot use &nbsp;
because I do not know whether it will be processed by HTML-aware
tools. I cannot even use &#xa0; because the tool used for processing
the exported data may not understand entities at all. In such a case I
must use U+00a0 and make sure that the tool used for processing the
data knows how to handle it, or I should plug in a preprocessor. And I
must provide a suitable input method so that users can enter U+00a0.
I have it on my keyboard but I am not sure whether such a key is a
common feature. If a user has to enter it using a weird combination,
he or she will not do it. Remember that a user may work remotely via
ssh or telnet with no graphics. (Even then my keyboard contains
U+00a0.)
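
(One escape hatch, as a sketch: XeTeX's extended caret notation lets a
user type the character without a dedicated key, even over a plain ssh
session:

   100^^^^00a0km   % scanned exactly like  100<U+00A0>km

with four carets and four lowercase hex digits.)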

>> Where would such a default take place:
>> - XeTeX engine
>> - XeLaTeX format
>> - some package (xunicode, fontspec, some new package)
>> - my own package/preamble template
>
> None of these ?  In a stand-alone file that can be \input
> by Plain XeTeX users, by XeLaTeX users, and by XeLaTeX
> package authors.
>
> In a future XeTeX variant (if such a thing comes to exist),
> the functionality could be built into the engine.
>
> My EUR 0,02 (while we still have one).
> ** Phil.



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Philip TAYLOR



Tobias Schoel wrote:

Now that the practicability is settled, let's come back to the
philosophical part:


Actually, I think this is the practical/pragmatic part,
but let's carry on none the less ...


Should &nbsp; = u00a0 be active and treated as ~ by default? Just like
u202f and u2009 should be active and treated as \, and \,\hspace{0pt}?


Well : a macro-based solution is certainly the best place
to start (and to experiment) but the particular expansions
that you have chosen are not entirely generic : \hspace,
for example, is unknown in Plain TeX, and is therefore
better replaced with \hskip.  Whether \hskip would then
work happily with LaTeX, I have no idea, but it is by
no means unreasonable to think that there might be format-
specific definitions for each of these characters.


Where would such a default take place:
- XeTeX engine
- XeLaTeX format
- some package (xunicode, fontspec, some new package)
- my own package/preamble template


None of these ?  In a stand-alone file that can be \input
by Plain XeTeX users, by XeLaTeX users, and by XeLaTeX
package authors.

In a future XeTeX variant (if such a thing comes to exist),
the functionality could be built into the engine.

My EUR 0,02 (while we still have one).
** Phil.




Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Tobias Schoel
Now that the practicability is settled, let's come back to the 
philosophical part:


Should &nbsp; = u00a0 be active and treated as ~ by default? Just like 
u202f and u2009 should be active and treated as \, and \,\hspace{0pt}?


Where would such a default take place:
- XeTeX engine
- XeLaTeX format
- some package (xunicode, fontspec, some new package)
- my own package/preamble template

As was discussed in the thread "Space characters and whitespace", using 
these characters without any treatment contradicts TeX's spacing 
algorithms. So it seems one should either not use these characters (and 
blame Unicode), or treat them specially.
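
As a rough illustration, the special treatment proposed above could be
prototyped in XeLaTeX along these lines (a sketch only; it assumes the
^^^^ notation with lowercase hex digits, and that ~ and \, keep their
usual LaTeX meanings):

   \catcode"00A0=\active \def^^^^00a0{~}              % U+00A0 -> tie
   \catcode"202F=\active \def^^^^202f{\,}             % U+202F -> thin, unbreakable
   \catcode"2009=\active \def^^^^2009{\,\hspace{0pt}} % U+2009 -> thin, breakable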


bye

Toscho

On 13.11.2011 at 21:36, Mike Maxwell wrote:

On 11/13/2011 11:09 AM, Tobias Schoel wrote:

How much text flow control should be done by non-ASCII
characters? Unicode has different codepoints for signs with the same
meaning but different text flow behaviour (space vs. non-break space). So
text flow could be controlled via Unicode codepoints. But should it? Or
should text flow be controlled via commands and active characters?

One opinion says that using (La)TeX is programming. Consequently, each
character used should be visually well distinguishable. This is not the
case with all the Unicode white space characters.

One opinion says that using (La)TeX is transforming plain text (like
.txt) into well-formatted text. Consequently, the plain text may contain
as much (meta-)information as possible, and this information should be
used when transforming it into well-formatted text. So Unicode white space
characters are allowed and should be valued according to their specific
meaning.


And on the third hand, XeTeX could allow both.

 > How would you visually differentiate between all
 > the white space characters (space vs. non-break space, thin space
 > (u2009) vs. narrow no-break space (u202f), … ) such that the text
 > remains readable?

Of course, there's precedent for this kind of problem: tab characters.
For that matter, many text editors display Unicode combining diacritics
over or under the base character that they go with, which is already
getting away from a straightforward display of the underlying characters.

At any rate, there are lots of ways non-ASCII space characters could be
distinguished; Philip Taylor mentions color coding, which is certainly
possible. Another would be to display some kind of code for non-ASCII
spaces. There's one font which displays all characters as nothing but
their Unicode code points (in hex) inside some kind of box. A tex(t)
editor could certainly be programmed to display control characters
(which these space characters essentially are) differently from the
"regular" characters (which would continue to be displayed with an
ordinary font).

The editor I use, jEdit, provides yet another option: a command
(bindable to a keystroke) that tells me the Unicode code point of any
character, on the editor's status line.





Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Mike Maxwell

On 11/13/2011 11:09 AM, Tobias Schoel wrote:

How much text flow control should be done by non-ASCII
characters? Unicode has different codepoints for signs with the same
meaning but different text flow behaviour (space vs. non-break space). So
text flow could be controlled via Unicode codepoints. But should it? Or
should text flow be controlled via commands and active characters?

One opinion says that using (La)TeX is programming. Consequently, each
character used should be visually well distinguishable. This is not the
case with all the Unicode white space characters.

One opinion says that using (La)TeX is transforming plain text (like
.txt) into well-formatted text. Consequently, the plain text may contain
as much (meta-)information as possible, and this information should be
used when transforming it into well-formatted text. So Unicode white space
characters are allowed and should be valued according to their specific
meaning.


And on the third hand, XeTeX could allow both.

> How would you visually differentiate between all
> the white space characters (space vs. non-break space, thin space
> (u2009) vs. narrow no-break space (u202f), … ) such that the text
> remains readable?

Of course, there's precedent for this kind of problem: tab characters. 
For that matter, many text editors display Unicode combining diacritics 
over or under the base character that they go with, which is already 
getting away from a straightforward display of the underlying characters.


At any rate, there are lots of ways non-ASCII space characters could be 
distinguished; Philip Taylor mentions color coding, which is certainly 
possible.  Another would be to display some kind of code for non-ASCII 
spaces.  There's one font which displays all characters as nothing but 
their Unicode code points (in hex) inside some kind of box.  A tex(t) 
editor could certainly be programmed to display control characters 
(which these space characters essentially are) differently from the 
"regular" characters (which would continue to be displayed with an 
ordinary font).


The editor I use, jEdit, provides yet another option: a command 
(bindable to a keystroke) that tells me the Unicode code point of any 
character, on the editor's status line.

--
Mike Maxwell
maxw...@umiacs.umd.edu
"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond




Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Tobias Schoel



On 13.11.2011 at 20:25, Zdenek Wagner wrote:

2011/11/13 Tobias Schoel:



On 13.11.2011 at 12:35, Zdenek Wagner wrote:


2011/11/13:


On Sun, 13 Nov 2011, Petr Tomasek wrote:


make ~ not active when writing my own macros because it contradicts
the Unicode standard...)


Isn't it just as much a "contradiction" of the "standard" for \ to do
what \ does?  I don't think that is a good way to decide what TeX's
input format should be.
--


And how about math and tables in TeX? And I would like to know a good
text editor that visually displays U+00a0 in such a way that I can
easily distinguish it from U+0020. If I cannot see the difference, I
can never be sure. And I definitely do not want to use hexedit for my
TeX files.


That is a good question. It's close to a question I asked earlier on this
list:

How much text flow control should be done by non-ASCII
characters? Unicode has different codepoints for signs with the same meaning
but different text flow behaviour (space vs. non-break space). So text flow
could be controlled via Unicode codepoints. But should it? Or should text
flow be controlled via commands and active characters?

One opinion says that using (La)TeX is programming. Consequently, each
character used should be visually well distinguishable. This is not the case
with all the Unicode white space characters.

One opinion says that using (La)TeX is transforming plain text (like .txt)
into well-formatted text. Consequently, the plain text may contain as much
(meta-)information as possible, and this information should be used when
transforming it into well-formatted text. So Unicode white space characters
are allowed and should be valued according to their specific meaning.


A (La)TeX source file is not plain text. Every LaTeX document nowadays
starts with \documentclass, but that text is not present in the output.


Of course, the preamble isn't plain text, but mostly macros. I was 
thinking of the body of the document. I think it's common practice for 
larger documents to have a main LaTeX file, which reads \documentclass … 
\begin{document}\input{first_chapter}\input{second_chapter}…\end{document}
In these cases, the input documents are more or less plain text 
(depending on the subject).



Even XML is not plain text; you can use entities such as &nbsp;, &apos; and
many more. Of course, if (La)TeX is used for automatic processing of
data extracted from a database that can contain a wide variety of
Unicode characters, it is a valid question how to handle such input.
Or if the content is copy-pasted from, let's say, HTML. But who would do
that …







Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Zdenek Wagner
2011/11/13 Tobias Schoel :
>
>
> On 13.11.2011 at 12:35, Zdenek Wagner wrote:
>>
>> 2011/11/13:
>>>
>>> On Sun, 13 Nov 2011, Petr Tomasek wrote:

 make ~ not active when writing my own macros because it contradicts
 the Unicode standard...)
>>>
>>> Isn't it just as much a "contradiction" of the "standard" for \ to do
>>> what \ does?  I don't think that is a good way to decide what TeX's
>>> input format should be.
>>> --
>>
>> And how about math and tables in TeX? And I would like to know a good
>> text editor that visually displays U+00a0 in such a way that I can
>> easily distinguish it from U+0020. If I cannot see the difference, I
>> can never be sure. And I definitely do not want to use hexedit for my
>> TeX files.
>
> That is a good question. It's close to a question I asked earlier on this
> list:
>
> How much text flow control should be done by non-ASCII
> characters? Unicode has different codepoints for signs with the same meaning
> but different text flow behaviour (space vs. non-break space). So text flow
> could be controlled via Unicode codepoints. But should it? Or should text
> flow be controlled via commands and active characters?
>
> One opinion says that using (La)TeX is programming. Consequently, each
> character used should be visually well distinguishable. This is not the case
> with all the Unicode white space characters.
>
> One opinion says that using (La)TeX is transforming plain text (like .txt)
> into well-formatted text. Consequently, the plain text may contain as much
> (meta-)information as possible, and this information should be used when
> transforming it into well-formatted text. So Unicode white space characters
> are allowed and should be valued according to their specific meaning.
>
A (La)TeX source file is not plain text. Every LaTeX document nowadays
starts with \documentclass, but that text is not present in the output.
Even XML is not plain text; you can use entities such as &nbsp;, &apos; and
many more. Of course, if (La)TeX is used for automatic processing of
data extracted from a database that can contain a wide variety of
Unicode characters, it is a valid question how to handle such input.



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Philip TAYLOR

One option would be to colour-code them, but I was
more interested in the philosophy than the implementation.

** Phil.



Not in every case. How would you visually differentiate between all the
white space characters (space vs. non-break space, thin space (u2009)
vs. narrow no-break space (u202f), … ) such that the text remains readable?





Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Tobias Schoel



On 13.11.2011 at 18:16, Philip TAYLOR wrote:



Tobias Schoel wrote:


One opinion says that using (La)TeX is programming. Consequently, each
character used should be visually well distinguishable. This is not the
case with all the Unicode white space characters.


Is that not a function of the editor used ? Is it not valid
for an editor to display different Unicode spaces differently,
such that the user can visually differentiate between them ?

Philip Taylor


Not in every case. How would you visually differentiate between all the 
white space characters (space vs. non-break space, thin space (u2009) 
vs. narrow no-break space (u202f), … ) such that the text remains readable?


Toscho




Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Philip TAYLOR



Tobias Schoel wrote:


One opinion says that using (La)TeX is programming. Consequently, each
character used should be visually well distinguishable. This is not the
case with all the Unicode white space characters.


Is that not a function of the editor used ?  Is it not valid
for an editor to display different Unicode spaces differently,
such that the user can visually differentiate between them ?

Philip Taylor




Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Tobias Schoel



On 13.11.2011 at 12:35, Zdenek Wagner wrote:

2011/11/13:

On Sun, 13 Nov 2011, Petr Tomasek wrote:

make ~ not active when writing my own macros because it contradicts
the Unicode standard...)


Isn't it just as much a "contradiction" of the "standard" for \ to do
what \ does?  I don't think that is a good way to decide what TeX's
input format should be.
--

And how about math and tables in TeX? And I would like to know a good
text editor that visually displays U+00a0 in such a way that I can
easily distinguish it from U+0020. If I cannot see the difference, I
can never be sure. And I definitely do not want to use hexedit for my
TeX files.


That is a good question. It's close to a question I asked earlier on 
this list:


How much text flow control should be done by non-ASCII
characters? Unicode has different codepoints for signs with the same
meaning but different text flow behaviour (space vs. non-break space). So
text flow could be controlled via Unicode codepoints. But should it? Or
should text flow be controlled via commands and active characters?


One opinion says that using (La)TeX is programming. Consequently, each
character used should be visually well distinguishable. This is not the
case with all the Unicode white space characters.


One opinion says that using (La)TeX is transforming plain text (like
.txt) into well-formatted text. Consequently, the plain text may contain
as much (meta-)information as possible, and this information should be
used when transforming it into well-formatted text. So Unicode white space
characters are allowed and should be valued according to their specific
meaning.















Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread Zdenek Wagner
2011/11/13  :
> On Sun, 13 Nov 2011, Petr Tomasek wrote:
>> make ~ not active when writing my own macros because it contradicts
>> the Unicode standard...)
>
> Isn't it just as much a "contradiction" of the "standard" for \ to do
> what \ does?  I don't think that is a good way to decide what TeX's
> input format should be.
> --
And how about math and tables in TeX? And I would like to know a good
text editor that visually displays U+00a0 in such a way that I can
easily distinguish it from U+0020. If I cannot see the difference, I
can never be sure. And I definitely do not want to use hexedit for my
TeX files.




-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] &nbsp; in XeTeX

2011-11-13 Thread mskala
On Sun, 13 Nov 2011, Petr Tomasek wrote:
> make ~ not active when writing my own macros because it contradicts
> the Unicode standard...)

Isn't it just as much a "contradiction" of the "standard" for \ to do
what \ does?  I don't think that is a good way to decide what TeX's
input format should be.
-- 
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/

