Re: [XeTeX] Polyglossia: Automatic script detection

Tobias Schoel Sun, 06 Mar 2011 13:18:06 -0800

It seems to me that the approach is flawed. The language a text is written in is an important structural aspect of the text and thus should not be left to a machine to guess. The machine should do the formalistic work that follows from the structural decisions made by the user. Thus, a user should say "this text is in that language", just as polyglossia already allows through its \text<lang> commands.


The formalistic things the computer should then do:
- choose the font (a parameter the user should be able to change)
- change language-specific behaviour
- some internal stuff.

A sensible default for the language shouldn't come from the package but from some system locale (not necessarily the OS locale; perhaps a xe(la)tex-specific locale, or one passed as a parameter to xe(la)tex).
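The division of labour proposed above can be sketched in a few lines of Python. This is a hypothetical model, not polyglossia's implementation: the class names, the bracketed output format and the font and hyphenation labels are all illustrative assumptions. The user declares the language; the machine only looks up the settings for it, and the default language is passed in from outside rather than fixed by the package.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageSettings:
    font: str         # a parameter the user should be able to change
    hyphenation: str  # language-specific behaviour (hypothetical pattern-set label)


@dataclass
class Typesetter:
    # Default language supplied from outside (e.g. a locale or a parameter
    # to the engine), not hard-coded in the package.
    default_language: str
    settings: dict = field(default_factory=dict)

    def set_font(self, language: str, font: str) -> None:
        """The user overrides the font used for a declared language."""
        self.settings[language].font = font

    def text(self, language: str, content: str) -> str:
        r"""Rough analogue of \text<lang>: the user states the language and
        the machine only applies the formal settings. Nothing is guessed:
        an undeclared language raises KeyError."""
        s = self.settings[language]
        return f"[{language}|{s.font}|{s.hyphenation}] {content}"

    def body(self, content: str) -> str:
        """Unmarked text falls back to the default language."""
        return self.text(self.default_language, content)
```

Raising `KeyError` for an undeclared language, instead of guessing, is the design choice that mirrors the argument: which language a passage is in remains the user's decision.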


bye

Toscho

On 06.03.2011 19:33, [email protected] wrote:

On Sun, 6 Mar 2011, Gerrit wrote:

the text. If the document has main language English and second language

What is the "main language written in the text"?

Sorry, my English... I meant the main language the text is written in, e.g.
an English article with some Russian text in it.


What you wrote was acceptable English, but I don't understand how
Polyglossia is supposed to detect the main language.  When there are
several candidate languages that may be in use in the document, which one
is the "main" one?  Is it the first one used?  Or the one that has the
largest number of characters over the entire document (which necessitates
an extra pass through the document)?

It seems like you are saying, in order to detect the language, first we
must detect the main language.  That's just changing the name of the
problem.
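The two candidate heuristics can disagree, and only the first can be decided without reading the whole document. A small Python sketch (an illustration, not anything Polyglossia does) makes this concrete. Note that it can only detect *scripts*, using the first word of the Unicode character name as a crude and assumed stand-in for a script property, which is exactly the gap between script and language discussed here.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Crude script guess: first word of the Unicode character name
    (e.g. 'LATIN', 'CYRILLIC'). Returns None for non-letters."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def first_used(text):
    """Heuristic 1: the script of the first letter; needs only a prefix."""
    for ch in text:
        s = script_of(ch)
        if s:
            return s
    return None


def most_frequent(text):
    """Heuristic 2: the script with the most letters; needs the whole
    text, i.e. an extra pass before typesetting can start."""
    counts = Counter(s for s in map(script_of, text) if s)
    return counts.most_common(1)[0][0] if counts else None
```

For `"Спасибо, Mr. Smith wrote a long English letter."` the first heuristic answers `CYRILLIC` and the second `LATIN`, and neither says anything about whether the Latin text is English, French or Polish.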

Similarly, for Chinese words in a Japanese text: the Chinese words will
then be written in a Japanese font (and in Japanese simplified characters,
where necessary). E.g. 广东 (Guangdong) will become 広東 in Japanese, even
though it would be 廣東 in traditional Chinese.
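Why code points alone cannot settle this can be checked with a few lines of Python (an illustration only, not part of Polyglossia), using the usual spellings of Guangdong: 广东 (simplified Chinese), 広東 (Japanese) and 廣東 (traditional Chinese). The first character differs in each form, while 東 is one and the same code point in Japanese and traditional Chinese, and to Unicode every one of these characters is just a "CJK UNIFIED IDEOGRAPH".

```python
import unicodedata

# "Guangdong" in three writing conventions (forms as given above)
forms = {
    "simplified Chinese": "广东",
    "Japanese": "広東",
    "traditional Chinese": "廣東",
}

for label, word in forms.items():
    codes = " ".join(f"U+{ord(c):04X}" for c in word)
    # Strip the hex suffix from each name: all characters share the same
    # generic name, so the name carries no language information.
    names = {unicodedata.name(c).rsplit("-", 1)[0] for c in word}
    print(f"{label:<20} {word}  {codes}  {names}")
```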


I haven't read enough Chinese/Japanese mixed documents to know whether
that's commonly done in practice, but I wonder whether users would really
consider it satisfactory.

I did not mean this method for determining the overall language of the
document; that is indeed much more complicated. But if we define German as
the main language (\setdefaultlanguage{german}) of the document, "table of contents"


Okay - so "main" language is declared by the user?

Mixing text in the same script always poses a problem, but more in the field
of hyphenation than in that of font changing. I guess we do not want to
select a different Latin font for French written inside an English
document? That would not look good.


You might want to select a different Latin font for Polish.

Instead of basing it on language, I'd rather allow the user to specify an
automatic font switch for Unicode ranges: "use this font most of the time,
but use this other font for U+3000 to U+30FF..."  Then if they are mixing
languages that use different code points (such as English/Russian), they
can get behaviour such as you describe; and if they are mixing languages
that share character codes (such as Chinese/Japanese), then they have to
use some other mechanism to mark up which is which, but they would have to
anyway, so nothing has been lost.  Calling it "character code range"
rather than "language" or "script" avoids making the false promise that we
can correctly distinguish between languages or scripts.  I think some
feature similar to this already exists, and expansions of it have
certainly been proposed.
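A minimal Python sketch of the code-point-range switch described above, to show how little it needs to know. It is not how XeTeX or any existing package implements such a feature; the font names are placeholders, and only U+3000 to U+30FF comes from the message, while the Cyrillic range is an added illustrative assumption. Each character is assigned a font purely by its code point, and consecutive characters with the same font are merged into runs.

```python
from itertools import groupby

# Placeholder font names; a real setup would name installed fonts.
DEFAULT_FONT = "MainFont"

# Inclusive code-point ranges mapped to fonts. U+3000..U+30FF (CJK symbols
# and punctuation, Hiragana, Katakana) is the range from the message;
# the Cyrillic block is an extra illustrative assumption.
RANGES = [
    (0x3000, 0x30FF, "JapaneseFont"),
    (0x0400, 0x04FF, "CyrillicFont"),
]


def font_for(ch, ranges=RANGES, default=DEFAULT_FONT):
    """Return the font whose range contains ch, else the default font."""
    cp = ord(ch)
    for first, last, font in ranges:
        if first <= cp <= last:
            return font
    return default


def font_runs(text, ranges=RANGES, default=DEFAULT_FONT):
    """Split text into maximal (font, substring) runs by code-point range.
    No language or script is inferred: only the character code decides."""
    return [
        (font, "".join(chars))
        for font, chars in groupby(text, key=lambda c: font_for(c, ranges, default))
    ]
```

For example, `font_runs("Hi, привет!")` yields `[("MainFont", "Hi, "), ("CyrillicFont", "привет"), ("MainFont", "!")]`. With only the range from the message, `font_runs("東京です")` leaves the kanji 東京 in `MainFont`, since CJK unified ideographs start at U+4E00; adding that range fixes it, but then Chinese and Japanese text fall into the same range, which is the case where other markup is needed.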



--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex
