On Wed, 30 Sep 2009 15:05:50 +0900, "Susan F." <soofalk at gmail.com> wrote:
> ...is language-specific processing possible? not
> language-specific text generation, rather language-specific text alignment.
> such as, if lang=en then alignment=left, if lang=jp then alignment=justify.
> the logic is very simple and it seems possible but all my customization
> efforts were unsuccessful.
Apropos of this (and another message today on multilingual work in XXE): for a couple of years now, we have been using XXE to process grammars of languages with complex scripts, specifically Bangla (= Bengali), which uses a more or less typical Indic script, and Urdu, which uses a Perso-Arabic script. Nothing we have in these languages is longer than a couple of lines of text (interlinear text, if you're familiar with that, as well as example words and phrases inline).

For output, we decided the normal FO -> PDF route would not work, because (as far as we could tell) it didn't handle non-Roman scripts well. We briefly experimented with converting the DocBook XML to Microsoft Word (XXE has scripts to do that). It worked passably for Bengali, but did not look like a feasible route for Arabic script.

Instead, we've been converting the DocBook XML to input for XeTeX (a Unicode-aware TeX engine that typesets LaTeX documents) using the dblatex program (http://dblatex.sourceforge.net/). We had to make a few enhancements to handle interlinear text (thanks to work by Andy Black and his collaborators; Andy appears on this list from time to time), and do a few things with a specialized LaTeX style sheet.

The other work needed to accommodate right-to-left text is to bracket that text with a LaTeX command that tells XeTeX to process that stretch of text right-to-left (and to use a specific font). We do the bracketing automatically during the conversion process (so it is not visible in XXE), using a Perl script that finds stretches of characters from the Unicode Arabic block (plus spaces, etc.). The only tricky part was handling non-Arabic punctuation intermixed with Arabic script.

Using dblatex to produce XeTeX input, and then running XeTeX to produce the PDF, has given us excellent results for Urdu, which is very difficult to typeset (it's not your typical Arabic script). We went back and re-ran the Bangla grammar through this process, and it comes out well too.
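
The automatic bracketing step described above could look roughly like the following. This is a minimal sketch in Python, not the Perl script we actually use. It assumes a hypothetical LaTeX command named \RTLtext (the real command and font setup are not shown here) and treats only the basic Arabic block (U+0600 to U+06FF) as Arabic script. Spaces between Arabic words are absorbed into a single run, so a phrase is bracketed as one unit. Anything outside that block (Latin punctuation, digits, or a zero-width non-joiner, U+200C) ends a run; that is exactly where the intermixed-punctuation problem shows up.

```python
import re

# Hypothetical command name: the real LaTeX command (and font choice)
# used in our conversion is not given here.
RTL_COMMAND = r"\RTLtext"

# One run of Arabic-block characters (U+0600-U+06FF), optionally joined
# by ordinary or no-break spaces. Runs start and end on an Arabic
# character, so surrounding spaces stay outside the braces.
ARABIC_RUN = re.compile(r"[\u0600-\u06FF]+(?:[ \u00A0]+[\u0600-\u06FF]+)*")


def bracket_rtl(text: str, command: str = RTL_COMMAND) -> str:
    """Wrap each run of Arabic-script text in `command{...}`."""
    return ARABIC_RUN.sub(lambda m: f"{command}{{{m.group(0)}}}", text)


if __name__ == "__main__":
    sample = "The sentence یہ کتاب ہے means 'this is a book'."
    print(bracket_rtl(sample))
```

Running a whole .tex file through something like this before calling XeTeX would leave everything outside the Arabic runs untouched. Punctuation that should sit inside a right-to-left stretch would still need special handling.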
(Disclaimer: we're running this process under Linux. I believe it would work in Windows as well, but we haven't tried that.)

There is one shortcoming to using XMLmind to edit Arabic script: cursor movement does not work correctly in right-to-left script. (More specifically, I believe the insertion point behaves correctly, but the visible cursor does not.) The XXE developers have declined to fix this, understandably, as I'm sure the Arabic-script market for XXE is rather small.

Also, persuading XXE to use a specific font for other languages is a bit tricky, because it involves editing a file in the Java lib directory. I succeeded at this for Bangla script, but I could not override the default font choice for Arabic script. We have not tried any of this with CJK languages.

Mike Maxwell
CASL/ U MD

