2013/12/9 <msk...@ansuz.sooke.bc.ca>:
> On Mon, 9 Dec 2013, Khaled Hosny wrote:
>> > U+E0001 U+E0065 U+E006E U+0073 U+0061 U+006E U+0067
>>
>> And it is a kind of tagging, so beyond the scope of identifying the
>> language of *untagged* text (which is the claim that spurred all this
>> discussion).
>
> The claim was "A properly encoded utf-8 string should contain everything
> you need!". If you forbid using Unicode tag characters, then you're
> saying "It is impossible to encode language in Unicode when you're not
> allowed to use the features designed for that purpose," which is not
> an interesting statement.
>
> Yes, of course some kind of tagging is needed. Keith seems to think that
> the tagging will magically come from "proper" UTF-8, and of course he's
> wrong. I think language tagging would be possible in pure Unicode, as the
> string above demonstrates, but that's not a good way to do it. The really
> original question had to do with RTL versus LTR detection, not language
> detection, and that's a different issue.
>
> Unicode specifies a way to detect RTL versus LTR, such that in many cases
> it doesn't require tagging. Unicode's way of doing it may or may not be a
> good one, but we cannot reasonably pretend that it doesn't exist. The
> Unicode bidi algorithm does exist. XeTeX does not implement the Unicode
> bidi algorithm. The interesting remaining question is whether XeTeX
> should implement it. I tend to think not - because if we implement it,
> people will blame us for its failings. It'd also be a lot of work, break
> compatibility with the rest of the TeX world, STILL require tagging in
> many cases, and so on.
>
A bit off topic: do you know a good Linux text editor with a properly
implemented bidi algorithm, so that I could type multilingual texts?
Even the combination of Urdu and TeX macros is a pain. It is not easy to
type

\textbf{میں نے \today\ کو سب کچھ کیا۔}

I am not able to type it on a single line; gedit, kate, and even Gmail
and Facebook get confused and produce garbage if I mix LTR and RTL
scripts. I can only use a commercial XML editor that allows me to combine
text in Latin script with text in Hindi and Urdu.
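
To make the "detect RTL versus LTR without tagging" point from the quoted text concrete, here is a minimal Python sketch of the first-strong-character heuristic that the Unicode bidi algorithm uses to pick a paragraph's base direction (rules P2 and P3 of UAX #9). The function name `first_strong_direction` is made up for illustration, and the sketch deliberately ignores directional isolates and explicit embeddings, so it is only an approximation and not a full bidi implementation.

```python
import unicodedata


def first_strong_direction(text):
    """Guess the base direction from the first strong bidi character.

    Simplified version of UAX #9 rules P2/P3: characters inside
    directional isolates are NOT skipped here (an assumption made to
    keep the sketch short).
    Returns "RTL", "LTR", or None if no strong character is found.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-like or Arabic-script letter
            return "RTL"
        if bidi_class == "L":  # Latin and other left-to-right letters
            return "LTR"
    return None  # only neutrals, digits, punctuation, and so on


# The line from the message above: the backslash is neutral, so the
# first strong character is the Latin "t" of \textbf.
tex_line = r"\textbf{میں نے \today\ کو سب کچھ کیا۔}"
urdu_only = "میں نے سب کچھ کیا۔"

print(first_strong_direction(tex_line))
print(first_strong_direction(urdu_only))
```

Under this heuristic the macro-wrapped line is classified as LTR even though most of its content is Urdu, while the bare Urdu sentence is classified as RTL. That may be part of why a line starting with a TeX macro behaves so differently from plain Urdu text in an editor that guesses the direction automatically, although which rule a given editor actually applies is an assumption here.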
> --
> Matthew Skala
> msk...@ansuz.sooke.bc.ca                 People before principles.
> http://ansuz.sooke.bc.ca/

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex