On Sun, 6 Mar 2011, Gerrit wrote:
> In this case, Polyglossia could give priority to the main language written in
> the text. If the document has main language English and second language

What is the "main language written in the text"?

> the document, that the character is ambiguous. For other languages, like
> Chinese or Japanese, punctuation marks are different to latin, so not that
> many rules would be necessary.

Chinese and Japanese are easy to distinguish from English, but not so easy to distinguish from each other. A Chinese text will usually contain characters that couldn't be Japanese, and a Japanese text will almost always contain characters that couldn't be Chinese, but it's possible to construct nontrivial text fragments in either of those languages using only characters common to the two.

Similar issues exist between all pairs of languages that are written in very similar scripts - such as English/French, Russian/Ukrainian, and so on. I don't know if Russian and Ukrainian might be similar enough that we could get away with lumping them together, but it may be necessary to distinguish Japanese from Chinese because of different character forms, English from French to produce the right punctuation spacing, Romanian from others because of the cedilla/comma accent issue, Czech and Polish from others because of hacek and kreska, and almost every language from almost every other to choose the right translations for words like "Section" and "Figure."

It seems to me that this kind of auto-detection based on character usage can only ever *sometimes* work. Smarter ways of guessing language based on bigger units than individual characters (for instance, looking for the presence or absence of common words) do exist, but those will break on some texts too.

Perhaps you only intended to distinguish general script families (like Latin from Cyrillic), not languages (like English from Russian), but I think Polyglossia needs to distinguish languages, even when it's limited to font selection only.
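
To make the point about character-based guessing concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not anything Polyglossia or XeTeX actually does: the function names and the short candidate lists are made up for the example, and the script test leans on Unicode character names from the standard `unicodedata` module.

```python
import unicodedata

def scripts_used(text):
    """Return coarse script labels for the letters in text (illustrative only)."""
    labels = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            labels.add("Han")
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            labels.add("Kana")
        elif name.startswith("CYRILLIC"):
            labels.add("Cyrillic")
        elif name.startswith("LATIN"):
            labels.add("Latin")
    return labels

def candidates(text):
    """Languages a character-level guess cannot rule out.

    The candidate lists are hypothetical and deliberately short; real
    lists would be far longer, which only makes the ambiguity worse.
    """
    scripts = scripts_used(text)
    if "Kana" in scripts:
        return {"japanese"}              # kana settles it
    if "Han" in scripts:
        return {"chinese", "japanese"}   # ideographs alone do not
    if "Cyrillic" in scripts:
        return {"russian", "ukrainian"}
    if "Latin" in scripts:
        return {"english", "french"}
    return set()

print(candidates("東京の大学"))  # contains kana
print(candidates("東京大学"))    # ideographs only
print(candidates("table"))       # a word in both English and French
```

The first call narrows to a single language; the other two come back with more than one candidate, which is exactly the case where a guess has to be made and can be wrong.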

I don't really object to autodetection as long as it's only a deprecated default to make things easier for users who don't know any better, but anybody who actually cares what language they are writing *must* specify it manually or the system will, inevitably, make a wrong guess eventually.

-- 
Matthew Skala
[email protected]                 People before principles.
http://ansuz.sooke.bc.ca/

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
