Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Dear Anant,

Bellamkonda Ramaraya kavi (1875-1914 AD, Andhra Pradesh) was a respected Vedānta philosopher who wrote many works, including a sub-commentary called *Gītābhāṣyārkaprakāśikā* (गीताभाष्यार्कप्रकाशिका), on Śaṅkara Bhagavatpāda's commentary *Gītābhāṣya* (गीताभाष्य) on the *Bhagavadgītā* (भगवद्गीता). So you need to be clear first what you are typing.

The *Bhagavadgītā* has already been typed into machine-readable form many times. You can find free copies here: http://gretil.sub.uni-goettingen.de/gret_utf.htm (search for bhagavadgita). Several commentaries on the Bhagavadgītā have also been typed into the computer, including those of Śaṅkara, Yāmuna, Rāmānuja and Jñānadeva. So you don't need to type Śaṅkara's *Gītābhāṣya* itself. Here is a shortcut to the *Gītābhāṣya*: http://gretil.sub.uni-goettingen.de/gret_utf.htm.

But I don't think Bellamkonda Ramaraya's sub-commentary has been typed yet. So that would be a good project for you.

When you say all languages, I think you probably mean all alphabets. There are two important rules for you:

1. Type using Unicode encoding and a Unicode font (see here: http://salrc.uchicago.edu/resources/fonts/available/). This can be in Roman script or in Devanāgarī, Telugu script (e.g., Akshar Unicode http://salrc.uchicago.edu/resources/fonts/available/telugu/), or any other. All the scripts are supported by Unicode, so it doesn't matter which you choose. Take the one *you* are most accurate in and familiar with, and in which Bellamkonda's work was published, i.e. probably Telugu. Once your text is typed, *if you have used Unicode*, conversion to other alphabets can be done automatically.

2. Keep your typing simple and concentrate on accuracy. Your efforts will be of no use if you do not type Bellamkonda Ramaraya's words with utmost care and accuracy. It would be an insult to his memory and his philosophy to introduce errors. So type carefully, and then proof-read (check) what you have typed, and then have a friend check your typing too.
Every time you check, you will find new errors to correct - don't be discouraged! When you are finished, share the result with the world through GRETIL and SARIT http://sarit.indology.info, as well as your own website.

Thank you for your efforts, and good luck!

Dominik Wujastyk

On 2 October 2011 07:21, A u akupadhyay...@gmail.com wrote:
> Hello Mr. Shirisha Rao, I am trying to type shankara bhashyam on Bhagwadgeeta by Shri Bellamkonda Rama raya kavi my goal is to make it available on all indian languages. I was wondering if you can help me in this regard. I saw your example, it produces output in many languages. my latex knowledge is very little, if you can give me some guidance to start I would greatly appreciate it.
> regards Anant
>
> On Mon, Sep 12, 2011 at 5:57 AM, Shrisha Rao sh...@nyx.net wrote:
>> On Sep 12, 2011, at 12:25 a.m., Neal Delmonico wrote:
>>> Also Zdenek raises an interesting possibility. If I were to want to typeset Sanskrit, say this very Sanskrit, in Bengali or Telugu script. How would I go about that?
>> I am able to get outputs in multiple scripts (Devanagari, Kannada, Roman, Telugu), but not Bengali, using the xetex-itrans package. The source file has to be in ITRANS rather than accented-Roman (IAST) format. See the attached for an example of a source and output. This should be extensible relatively easily to Bengali, Gujarati, Oriya, etc., though perhaps not to Tamil.
>> Regards, Shrisha Rao
>>> Thanks again. Neal

-- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Dominik --
> Several commentaries on the Bhagavadgītā have also been typed into the computer, including those of Śaṅkara, Yāmuna, Rāmānuja and Jñānadeva.

What is the significance (if any) of the extra-high ṅ in Śaṅkara ?

** Phil.
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
On 2 Oct. 2011, at 22:40, Philip TAYLOR (Webmaster, Ret'd) wrote:
> Dominik --
>> Several commentaries on the Bhagavadgītā have also been typed into the computer, including those of Śaṅkara, Yāmuna, Rāmānuja and Jñānadeva.
> What is the significance (if any) of the extra-high ṅ in Śaṅkara ?

Because that's how his name is spelled. You have guttural, palatal, retroflex and dental n in Devanāgarī, respectively ङ ṅa ; ञ ña; ण ṇa and न na. The guttural na is transcribed using a superscript dot, but maybe you do not have it in a standard font, and your MUA used whatever font was available, therefore this extra height you're talking about. I'm not sure if I've correctly understood you, to be honest.

Cyril
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Cyril Niklaus wrote:
> Because that's how his name is spelled. You have guttural, palatal, retroflex and dental n in Devanāgarī, respectively ङ ṅa ; ञ ña; ण ṇa and न na.

Yes, but all n variants are normally the same size, modulo the diacritics.

> The guttural na is transcribed using a superscript dot, but maybe you do not have it in a standard font, and your MUA used whatever font was available, therefore this extra height you're talking about. I'm not sure if I've correctly understood you, to be honest.

Agreed : I have changed my font preferences for Other languages (odd way of having to tell it which font to use for UTF-8 !), and now all four n variants are the same height.

Philip Taylor
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
2011/10/2 Philip TAYLOR (Webmaster, Ret'd) p.tay...@rhul.ac.uk:
> Cyril Niklaus wrote:
>> Because that's how his name is spelled. You have guttural, palatal, retroflex and dental n in Devanāgarī, respectively ङ ṅa ; ञ ña; ण ṇa and न na.
> Yes, but all n variants are normally the same size, modulo the diacritics.

It's not so uncommon that two fonts with the same design size have different x-heights. If your computer has to select one character from a different font because it does not exist in your main font, such discrepancies can be expected. On my computer ṅ appears lower. I do not know where fontconfig takes it from, probably from John Smith's fonts.

>> The guttural na is transcribed using a superscript dot, but maybe you do not have it in a standard font, and your MUA used whatever font was available, therefore this extra height you're talking about. I'm not sure if I've correctly understood you, to be honest.
> Agreed : I have changed my font preferences for Other languages (odd way of having to tell it which font to use for UTF-8 !), and now all four n variants are the same height.
> Philip Taylor

-- Zdeněk Wagner http://hroch486.icpf.cas.cz/wagner/ http://icebearsoft.euweb.cz
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Oh, I completely misunderstood your question, Phil. The answer is: none. It's a rendering artefact.

Dominik
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
On Oct 2, 2011, at 7:21 a.m., A u wrote:
> Hello Mr. Shirisha Rao, I am trying to type shankara bhashyam on Bhagwadgeeta by Shri Bellamkonda Rama raya kavi my goal is to make it available on all indian languages. I was wondering if you can help me in this regard. I saw your example, it produces output in many languages. my latex knowledge is very little, if you can give me some guidance to start I would greatly appreciate it.

Get more familiar with LaTeX, in particular XeLaTeX; that's an obvious place to start.

Regards, Shrisha Rao

> regards Anant
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Hello.

A question to specialists, Arthur and Mojca maybe :) Is it necessary to have two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation patterns are written in NFC, for instance, will they be applied correctly to a document written in NFD?

Regards, Yves
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
On Mon, Sep 12, 2011 at 09:36, Yves Codet wrote:
> Hello. A question to specialists, Arthur and Mojca maybe :) Is it necessary to have two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation patterns are written in NFC, for instance, will they be applied correctly to a document written in NFD?

That depends on the engine. From what I understand, XeTeX does normalize the input, so NFD should work fine. But I'm only speaking from memory based on Jonathan's talk at BachoTeX. I might be wrong.

I'm not sure what LuaTeX does. If one doesn't write the code, it might be that no normalization will ever take place. I can also easily imagine that our patterns don't work with NFD input with Hyphenator.js. I'm not sure how patterns in Firefox or OpenOffice deal with normalization. I never tested that.

But in my opinion the engine *should* be capable of doing normalization. Else you can easily end up with an exponential problem. A pattern with 3 accented letters can easily result in 8 or even more duplicated patterns to cover all possible combinations of composed-or-decomposed characters.

Arthur had some plans to cover normalization in hyph-utf8, but I already hate the idea of duplicated apostrophe, let alone all duplications just for the sake of stupid engines that don't understand Unicode :).

Mojca
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
On 12 Sep 2011, at 08:59, Mojca Miklavec wrote:
> On Mon, Sep 12, 2011 at 09:36, Yves Codet wrote:
>> Hello. A question to specialists, Arthur and Mojca maybe :) Is it necessary to have two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation patterns are written in NFC, for instance, will they be applied correctly to a document written in NFD?
> That depends on engine. From what I understand, XeTeX does normalize the input, so NFD should work fine. But I'm only speaking from memory based on Jonathan's talk at BachoTeX.

xetex will normalize text as it is being read from an input file IF the parameter \XeTeXinputnormalization is set to 1 (NFC) or 2 (NFD), but will leave it untouched if it's zero (which is the initial default). Note that this would not affect character sequences that might be created in other ways than reading text files - e.g. you could still create unnormalized text within xetex via macros, etc.

Forcing universal normalization is hazardous because there are fonts that do not render the different normalization forms equally well, so users may have a specific reason for wanting to use a certain form. (This is, of course, a shortcoming of such fonts, but because this is the real world situation, I'm reluctant to switch on normalization by default in the engine.)

In principle, it seems desirable that the engine should deal with normalization automatically when using hyphenation patterns, but this is not currently implemented.

Personally, I'd recommend the use of NFC as a standard in almost all situations, and suggest that pattern authors should operate on this assumption; support for non-NFC text may then be less-than-perfect, but I'd consider that a feature request for the engine(s) more than for the patterns.

> I might be wrong. I'm not sure what LuaTeX does. If one doesn't write the code, it might be that no normalization will ever take place. I can also easily imagine that our patterns don't work with NFD input with Hyphenator.js. I'm not sure how patterns in Firefox or OpenOffice deal with normalization. I never tested that. But in my opinion engine *should* be capable of doing normalization. Else you can easily end up with exponential problem. A patterns with 3 accented letters can easily result in 8 or even more duplicated patterns to cover all possible combinations of composed-or-decomposed characters. Arthur had some plans to cover normalization in hyph-utf8, but I already hate the idea of duplicated apostrophe,

That's a bit different, and hard to see how we could avoid it except via special-case code somewhere that knows to treat U+0027 and U+2019 as equivalent for certain purposes, even though they are NOT canonically equivalent characters and would not be touched by normalization.

IMO, the duplicated apostrophe case is something we have to live with because there are, in effect, two different orthographic conventions in use, and we want both to be supported. They're alternate spellings of the word, and so require separate patterns - just like we'd require for colour and color, if we were trying to support both British and American conventions in a single set of patterns.

> let alone all duplications just for the sake of stupid engines that don't understand unicode :).

Yes, the engine should handle that. But it doesn't (unless you enable input normalization that matches your patterns).

JK
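As an illustration of the parameter described in this message, here is a minimal XeLaTeX sketch that switches on NFC input normalization for the lines read after the assignment. The font choice and the sample words are assumptions made for the example, and whether NFC is the right form for a given font is something to verify, as the message itself cautions.

```latex
% !TEX program = xelatex
% A sketch only: the font and sample text are assumptions.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{TeX Gyre Pagella}

% 0 = leave input untouched (the initial default)
% 1 = normalize input to NFC as it is read from the file
% 2 = normalize input to NFD as it is read from the file
\XeTeXinputnormalization=1

\begin{document}
% Whether these words were saved in composed or decomposed form,
% XeTeX now sees them in NFC, matching patterns written in NFC.
Śaṅkara, Rāmānuja, Jñānadeva
\end{document}
```

Only text read from input files after the assignment is affected; characters assembled by macros are not normalized, as noted above.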
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Dear Phil,

You should know better. :-) In 1993 you invited me to give a talk about hyphenation at RHBNC. I started out my lecture by demolishing the old chestnut that British is hyphenated etymologically while American isn't. Reality is much more blurry. Hugh Williamson got it right, as so often:

The customs of word-division derive partly from etymology, partly from meaning, partly from pronunciation, and partly from tradition. Effective communication depends upon conventions, in word-division as elsewhere, and the best conventions are those the reader is likely to expect. The first part of a divided word should not mislead the reader about the pronunciation or meaning of the second part. Word-division for the benefit of the reader, however, is best determined by a reader’s perceptions; different customs apply to different words, and a few simple rules are not enough to find the right place.
-- Methods of Book Design, pp. 48, 89.

You are perfectly right, though, that a single set of patterns couldn't support British and American hyphenation at once. Their hyphenation points differ in approximately 30% of cases, that is for words that are spelt the same.

Dominik

On 12 September 2011 12:09, Philip TAYLOR (Webmaster, Ret'd) p.tay...@rhul.ac.uk wrote:
> Jonathan Kew wrote:
>> On 12 Sep 2011, at 08:59, Mojca Miklavec wrote:
>>> Arthur had some plans to cover normalization in hyph-utf8, but I already hate the idea of duplicated apostrophe,
>> That's a bit different, and hard to see how we could avoid it except via special-case code somewhere that knows to treat U+0027 and U+2019 as equivalent for certain purposes, even though they are NOT canonically equivalent characters and would not be touched by normalization. IMO, the duplicated apostrophe case is something we have to live with because there are, in effect, two different orthographic conventions in use, and we want both to be supported. They're alternate spellings of the word, and so require separate patterns - just like we'd require for colour and color, if we were trying to support both British and American conventions in a single set of patterns.
> It may be that you are intentionally putting up a straw-man argument here, but if you are not, may I comment that trying to support both British and American conventions in a single set of patterns would (IMHO) be impossible, since British English hyphenation is based primarily on etymology whilst American is based on syllable boundaries.
> I wish I understood more about the duplicate apostrophe problem, in order to be able to offer a more directly relevant (and constructive) comment : Google throws up nothing relevant.
> Philip Taylor
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
On Mon, Sep 12, 2011 at 12:09, Philip TAYLOR (Webmaster, Ret'd) p.tay...@rhul.ac.uk wrote:
> I wish I understood more about the duplicate apostrophe problem, in order to be able to offer a more directly relevant (and constructive) comment : Google throws up nothing relevant.

Users type ' (U+0027) and expect the proper apostrophe (U+2019) to show up in the final PDF. Knuth just replaced the character (you cannot get U+0027 in pdfTeX, except in typewriter font). In XeTeX mapping=tex-text does that, but not all users use that one, so we need to support both variants.

Mojca
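To make the two variants concrete, here is a small XeLaTeX sketch, offered as an illustration rather than as anyone's recommended setup. It loads the same font twice, once with the tex-text mapping and once without; the font name and the \rawfont command name are assumptions introduced for the example.

```latex
% !TEX program = xelatex
% A sketch only: the font and the \rawfont name are assumptions.
\documentclass{article}
\usepackage{fontspec}

% With the mapping, a typed ' (U+0027) is printed as ’ (U+2019).
\setmainfont[Mapping=tex-text]{TeX Gyre Pagella}

% Without the mapping, U+0027 stays a straight apostrophe.
\newfontfamily\rawfont{TeX Gyre Pagella}

\begin{document}
Mapped: they're, don't.

{\rawfont Unmapped: they're, don't.}
\end{document}
```

Since a source file may contain either U+0027 or U+2019 in such words, patterns that cover only one of the two would miss the other, which is the duplication discussed in this part of the thread.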
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
I've just had a stimulating conversation about this with my friend and fellow Sanskritist, Alessandro Graheli (who also reads this XeTeX list, and is doing critical editions of Sanskrit texts with XeTeX). Alessandro was concerned that I overstated the case. He has used the existing Codet/Kew hyph-sa.tex patterns, and prefers them even for romanised Sanskrit. Word-division after a vowel fits with the forms of recitation and caesura that Alessandro learned when he was a student in India working extensively with traditional Sanskrit pandits. He also said that Italian typesetting of Sanskrit in romanisation hyphenates this way, rather than in the etymological manner that I was asserting.

We need more study to sort out some of these issues, but it looks prima facie as if both styles of hyphenating romanised Sanskrit should be preserved, since there are different usage-groups out there. While the hyphenation style for romanised Sanskrit that I describe below reflects widespread usage in good printing over the last century or more, mainly in British texts and journals, and may be required in future too, there are also people who are comfortable with Devanagari-style hyphenation in Romanised text.

Best, Dominik

On 11 September 2011 20:40, Dominik Wujastyk wujas...@gmail.com wrote:
> Sanskrit is hyphenated differently in Devanagari and in Roman script. If you use the hyph-sa.tex patterns, you get Roman hyphenated *as if it were Devanagari,* which is not acceptable in scholarly circles. The last 150 years of European writing on Sanskrit, using Romanisation, has developed hyphenation rules based on Sanskrit etymology, paying attention to compound words, internal sandhi, etc. (i.e., like German in some respects). The Devanagari hyphenation uses a much simpler idea, basically hyphenate after almost any vowel.
>
> To get appropriate hyphenation in Romanisation, we need to go down the Patgen path. So we need to develop a large lexicon of appropriately-hyphenated romanised Sanskrit words in UTF8 encoding, and when that list is reasonably long, process it through Patgen to make patterns. I am slowly developing such a list, but it would be great to collaborate. While the list is in the making, it can still be used, by using \hyphenation. Thus:
>
> \documentclass{article}
> \usepackage{polyglossia,xltxtra} % whatnot ...
> \setotherlanguage{sanskrit} % for transliterated Sanskrit
> \newfontfamily\sanskritfont{TeX Gyre Pagella}
> % Define \sansk{} which is the same as \emph{}, except that it causes appropriate hyphenation
> % for Sanskrit words. Use \sansk{} for Sanskrit and \emph{} for English.
> \newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}
> % ...
> \begin{document}
> \input{sanskrit-hyphenations.tex} % see attached file.
> Blah English blah. \sansk{āyurveda, avicchinnasampradāyatvād}.
> \end{document}
>
> Best, Dominik
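The sanskrit-hyphenations.tex file mentioned in the quoted message was an attachment and is not reproduced in this archive. As a rough sketch of what such an exceptions file might look like, assuming the compound-boundary style of division described above, something along these lines could be used. The entries and their break points are illustrative guesses, not Dominik's actual list.

```latex
% sanskrit-hyphenations.tex
% Hypothetical contents, for illustration only: the words and
% break points below are assumptions, not the original attachment.
%
% Each entry lists one word with its permitted break points marked
% by hyphens. TeX stores these exceptions for whichever language is
% active when this file is read, so it should be read while the
% Sanskrit hyphenation rules are selected.
\hyphenation{
  āyur-veda
  dharma-cakra
  avicchinna-sampradāya-tvād
}
```

Exceptions added this way take precedence over the loaded patterns for exactly the listed word forms, so inflected variants would each need their own entry until Patgen-generated patterns exist.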
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Mojca Miklavec wrote:
> Why do you type Ret'd they're helico-pter instead of Ret’d they’re “helico-pter” ? You are unicode-aware, aren't you?
> Mojca

Unicode-aware, but not Unicode-typing. This (like my earlier reply) is typed on an IBM Model M keyboard (the real thing, clicky, dating from circa 1985 : see Exhibit `A' https://picasaweb.google.com/110725905659537251822/IBMModelMKeyboard?authkey=Gv1sRgCMbhqKypi57lNw#5651442526952322114), and is used to compose strictly ASCII text. If I want Unicode, I copy and paste it from the web.

** Phil.
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Gasp! A CRT!
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Dominik Wujastyk wrote:
> Gasp! A CRT!

Sir. You have the honour to be communicating with (in the words of my former manager, David Sweeney) a DINOSAUR. What else would you expect a dinosaur to use but an IBM Model M clicky keyboard and a 19" CRT monitor ?!

** Phil, still wondering what changes the 20th century will bring :-)
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Thanks to Dominik for presenting my needs for hyphenating romanised Sanskrit according to the syllabic division of Sanskrit traditional phonetics. For a number of reasons, in my philologically-oriented work I prefer to typeset Sanskrit words as faithfully as possible to the sources, and hyph-sa.tex fulfils this need.

Yet, I think I understand Dominik on the need for a reader-friendly hyphenation of Sanskrit, particularly in texts with less strict philological needs, and in English essays with occasional Sanskrit terms. In this regard, Dominik's suggestion of adopting the customs of the academic tradition makes sense. But how consistently are such customs applied? And, how many of them are the informed choice of scholars, and not the product of typographers' tastes, dictionaries of modern languages, or software-specific algorithms?

In any case, I think that readability judgements on hyphenation of Sanskrit are largely influenced by one's own habits in hyphenating English, Italian, or any other language, so it is difficult to set a universal standard other than the Devanagari-conforming one.

As for Italian typesetting, hyphenation of Sanskrit words is probably as irregularly applied as in English literature. It is just that, compared with English, some consonant clusters commonly found also in Sanskrit (pr, pl, st etc.) are not broken in Italian hyphenation (e.g. ca-sti-tà vs. chas-ti-ty); thus, by adopting Italian hyphenation patterns, one probably gets slightly better results as far as the traditional syllabic division of Sanskrit is concerned.

Best, Alessandro Graheli
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Alessandro and I agree to disagree about the issue of philological correctness. I think that hyphenating following etymology, lexicon and morphemic boundaries is *more* philological than breaking after a vowel. I think what Alessandro means by philology in this case is that he is influenced by the usage of manuscript scribes and recitation. But these are both traditions in which hyphenation was not theorized at all.

However, in the end, I don't really think it's about philology at all, but mere precedent. The fact is, there's a huge body of printed work out there, books and journals, that has accumulated since about 1850, in which Sanskrit is commonly presented in roman transliteration and is routinely hyphenated according to compound-breaks (dharma-cakra) and morphemic boundaries (bhav-a-ti). A lot of people have got used to this kind of hyphenation, often subliminally, and want it in their own printed work. Normally, they don't get it. Authors of indological journal articles frequently have to re-hyphenate Sanskrit words manually in their page proofs. There is a continuing demand for this kind of hyphenation. (Surely you can hear the thunder of Sanskritists clamouring for etymological hyphenation? :-)

Since it's not that hard for XeTeX, we can eventually provide it as a service for those who wish to use it. I'm not being prescriptive about this. Others can use the existing patterns. Let a thousand flowers bloom.

I'm quite taken by the concept that Alessandro has raised about different hyphenation traditions for the same language and script in different countries. I.e., English (or Sanskrit) might be differently hyphenated in Italy. Very interesting.

Best, Dominik (coffee later, Alessandro?)
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Thanks to both Yves and Zdenek for your suggestions and examples. The hyphenation is working now in both Devanagari and Roman Translit. I'd have never figured it out on my own. If I were to want to read more on this where would I look? Also Zdenek raises an interesting possibility. If I were to want to typeset Sanskrit, say this very Sanskrit, in Bengali or Telugu script, how would I go about that? Thanks again. Neal On Sun, 11 Sep 2011 04:32:59 -0500, Zdenek Wagner zdenek.wag...@gmail.com wrote: 2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net: Thanks! How would one set it up so that the English portions are hyphenated according to English rules and the transliteration is hyphenated according to Sanskrit rules? I am sending an example. You can see another nice feature of the TECkit mapping. The mapping is applied when the text is typeset. You can thus store the transliterated text in a temporary macro and typeset it twice. There is one problem (this is the reason why I am sending a copy to François). It is required that Sanskrit text be typeset in a font with Devanagari characters. However, Sanskrit is also written in other scripts so that people in other parts of India, who do not know Devanagari, could read it. Even the Tibetan script contains retroflex consonants that are not used in the Tibetan language but serve for writing Sanskrit (and recently for writing words of English origin). Polyglossia should not be that demanding. And just to François: I found two bugs in the documentation. Section 5.2 mentions selection between Western and Devanagari numerals, but it should be Bengali numerals (I am not sure which option is really implemented). At the introduction, Vafa Khaligi's name is wrong.
AFAIK in Urdu and Farsi, the isolated and final forms of YEH are dotless (it is not a big bug), but in fact the name is written as Khaliqi, there is ق instead of غ Best Neal On Sat, 10 Sep 2011 19:40:51 -0500, Zdenek Wagner zdenek.wag...@gmail.com wrote: 2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net: Here are the source files for the pdf. Sorry to take so long to send them. Your default language for polyglossia is defined as English. You switch to Sanskrit only inside the \skt macro. The text in Devanagari is therefore hyphenated according to Sanskrit rules but the transliterated text is hyphenated according to the English rules. You have to switch the language to Sanskrit also for the transliterated text. Best Neal On Sat, 10 Sep 2011 17:53:42 -0500, Mojca Miklavec mojca.miklavec.li...@gmail.com wrote: On Sun, Sep 11, 2011 at 00:39, Neal Delmonico wrote: Here is an example of what I mean in the pdf attached. Do I get it right that hyphenation is working, it is just that it misses a lot of valid hyphenation points? You should talk to Yves Codet, the author of the Sanskrit patterns. But PLEASE: do post an example of your code when you ask for help. If you don't send the source, it is not clear whether you are in fact using Sanskrit patterns or if you are falling back to English when you try to switch fonts. You could just as well send us a PDF with French hyphenation enabled and claim that TeX is buggy since it doesn't hyphenate right.
Mojca -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
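The diagnosis quoted above (English is the default language, so the transliteration is hyphenated with English patterns unless it is also marked as Sanskrit) can be sketched roughly as follows. This is not the file that was actually exchanged on the list. The font names and the \sktrom helper are assumptions made for illustration, and the sketch has not been checked against the 2011 version of polyglossia.

```latex
% Compile with xelatex.
% Assumptions: the two fonts named below are installed; replace them with
% any Latin font that has the IAST diacritics and any Devanagari font.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{english}   % English patterns for the running text
\setotherlanguage{sanskrit}    % loads the Sanskrit patterns (hyph-sa.tex)
\setmainfont{Charis SIL}
\newfontfamily\devanagarifont[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\translitfont{Charis SIL}

% Hypothetical helper: Sanskrit hyphenation rules, Roman font.
% \textsanskrit switches the language (and the font); the font is then
% switched back to a Roman one inside the same group.
\newcommand{\sktrom}[1]{\textsanskrit{\translitfont #1}}

\begin{document}
English words here are broken according to the English patterns, whereas
\sktrom{dharmakṣetre kurukṣetre samavetā yuyutsavaḥ} is broken according to
the Sanskrit ones, and so is \textsanskrit{धर्मक्षेत्रे कुरुक्षेत्रे}.
\end{document}
```

The point of the design is only that the language switch and the font switch are separate things: the transliteration needs the former but not the Devanagari font.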
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net: Thanks to both Yves and Zdenek for your suggestions and examples. The hyphenation is working now in both Devanagari and Roman Translit. I'd have never figured it out on my own. If I were to want to read more on this where would I look? Frankly I do not know. I often read the source code of the packages in order to understand the internals. In fact I even studied the whole source code of LaTeX. Also Zdenek raises an interesting possibility. If I were to want to typeset Sanskrit, say this very Sanskrit, in Bengali or Telugu script, how would I go about that? Probably you can mechanically rewrite RomDev.map to convert the transliteration to another script and compile it with teckit_compile. I do not know Sanskrit and do not know other scripts, my knowledge in this area is almost zero, so I am not sure whether such a mechanical approach would work. Thanks again. Neal On Sun, 11 Sep 2011 04:32:59 -0500, Zdenek Wagner zdenek.wag...@gmail.com wrote: 2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net: Thanks! How would one set it up so that the English portions are hyphenated according to English rules and the transliteration is hyphenated according to Sanskrit rules? I am sending an example. You can see another nice feature of the TECkit mapping. The mapping is applied when the text is typeset. You can thus store the transliterated text in a temporary macro and typeset it twice. There is one problem (this is the reason why I am sending a copy to François). It is required that Sanskrit text be typeset in a font with Devanagari characters. However, Sanskrit is also written in other scripts so that people in other parts of India, who do not know Devanagari, could read it. Even the Tibetan script contains retroflex consonants that are not used in the Tibetan language but serve for writing Sanskrit (and recently for writing words of English origin). Polyglossia should not be that demanding.
And just to François: I found two bugs in the documentation. Section 5.2 mentions selection between Western and Devanagari numerals, but it should be Bengali numerals (I am not sure which option is really implemented). At the introduction, Vafa Khaligi's name is wrong. AFAIK in Urdu and Farsi, the isolated and final forms of YEH are dotless (it is not a big bug), but in fact the name is written as Khaliqi, there is ق instead of غ Best Neal On Sat, 10 Sep 2011 19:40:51 -0500, Zdenek Wagner zdenek.wag...@gmail.com wrote: 2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net: Here are the source files for the pdf. Sorry to take so long to send them. Your default language for polyglossia is defined as English. You switch to Sanskrit only inside the \skt macro. The text in Devanagari is therefore hyphenated according to Sanskrit rules but the transliterated text is hyphenated according to the English rules. You have to switch the language to Sanskrit also for the transliterated text. Best Neal On Sat, 10 Sep 2011 17:53:42 -0500, Mojca Miklavec mojca.miklavec.li...@gmail.com wrote: On Sun, Sep 11, 2011 at 00:39, Neal Delmonico wrote: Here is an example of what I mean in the pdf attached. Do I get it right that hyphenation is working, it is just that it misses a lot of valid hyphenation points? You should talk to Yves Codet, the author of the Sanskrit patterns. But PLEASE: do post an example of your code when you ask for help. If you don't send the source, it is not clear whether you are in fact using Sanskrit patterns or if you are falling back to English when you try to switch fonts. You could just as well send us a PDF with French hyphenation enabled and claim that TeX is buggy since it doesn't hyphenate right.
Mojca -- Zdeněk Wagner http://hroch486.icpf.cas.cz/wagner/ http://icebearsoft.euweb.cz -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
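The idea quoted above of storing the transliterated text in a temporary macro and typesetting it twice might look something like the sketch below. It is a guess at the general shape, not the example that was attached to the original message. It assumes that RomDev.map has been compiled to RomDev.tec with teckit_compile and that XeTeX can find the result, that the input uses whatever transliteration scheme RomDev.map expects, and that the placeholder fonts are installed. The macro names \sktboth and \skttemp are hypothetical.

```latex
% Compile with xelatex.
% Assumptions: RomDev.tec is on the search path; font names are placeholders.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{sanskrit}
% The TECkit mapping is attached to the font, so it is applied only when
% the text is typeset in this font, not when it is read.
\newfontfamily\devanagarifont[Script=Devanagari,Mapping=RomDev]{Sanskrit 2003}
\newfontfamily\translitfont{Charis SIL}

% Hypothetical macro: keep the transliteration once, typeset it twice.
\newcommand{\sktboth}[1]{%
  \def\skttemp{#1}%
  \textsanskrit{\skttemp}% first pass: Devanagari through the mapping
  \ (\textsanskrit{\translitfont\skttemp})% second pass: Roman, Sanskrit rules
}

\begin{document}
\sktboth{dharmakṣetre kurukṣetre}
\end{document}
```

For another script, the same structure would presumably apply with a different mapping file and font, subject to the caveat in the message that a mechanical rewrite of RomDev.map may or may not be adequate.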
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
Hello, Neal. I still don't receive your messages :( On 11 Sep 2011, at 22:21, Zdenek Wagner wrote: Also Zdenek raises an interesting possibility. If I were to want to typeset Sanskrit, say this very Sanskrit, in Bengali or Telugu script, how would I go about that? Probably you can mechanically rewrite RomDev.map to convert the transliteration to another script and compile it with teckit_compile. I do not know Sanskrit and do not know other scripts, my knowledge in this area is almost zero, so I am not sure whether such a mechanical approach would work. I presume you can do as Zdeněk says (I don't know much about TECkit). Otherwise you can write Sanskrit directly in Bengali or Telugu script. If you tell Polyglossia what is in Sanskrit, it should be hyphenated correctly in those scripts as well. Best wishes, Yves -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
On Sat, Sep 10, 2011 at 22:37, Neal Delmonico wrote: Greetings, I have a question. How does one get the hyphenation to work for transliterated Sanskrit as well as it does for Sanskrit in Devanagari? I use the same text in Devanagari and Roman transliteration and yet in the Devanagari the hyphenation works fine and in the transliteration it does not. Is there some trick to setting up the transliteration so that the hyphenation works? Please send a minimal example that fails to work. In theory the transliterated Sanskrit should work - at least the patterns are present. (On the other hand I don't know anything about Sanskrit, but if there is some technical issue ...) Mojca -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
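A minimal example of the kind requested here could be as small as the sketch below. It is illustrative only and is not the file that was later posted. The font names are placeholders, and the 3cm boxes are simply an assumed way of forcing line breaks so that hyphenation (or an overfull box in the log) becomes visible.

```latex
% Compile with xelatex and look at both the PDF and the .log file.
% Assumptions: the fonts named below are installed; substitute your own.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{sanskrit}
\setmainfont{Charis SIL}
\newfontfamily\devanagarifont[Script=Devanagari]{Sanskrit 2003}

\begin{document}
% Transliteration left in the default language: English patterns apply.
\parbox{3cm}{dharmakṣetre kurukṣetre samavetā yuyutsavaḥ}

% Devanagari marked as Sanskrit: Sanskrit patterns apply.
\parbox{3cm}{\textsanskrit{धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः}}
\end{document}
```

Posting a file like this together with its log lets readers see which language is active for each passage instead of guessing from a PDF.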
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
How does one do that? Where are the patterns kept and what format needs to be rebuilt? Sorry for being so clueless about this. Best Neal On Sat, 10 Sep 2011 15:47:38 -0500, Zdenek Wagner zdenek.wag...@gmail.com wrote: 2011/9/10 Neal Delmonico ndelmon...@sbcglobal.net: Greetings, I have a question. How does one get the hyphenation to work for transliterated Sanskrit as well as it does for Sanskrit in Devanagari? I use the same text in Devanagari and Roman transliteration and yet in the Devanagari the hyphenation works fine and in the transliteration it does not. Is there some trick to setting up the transliteration so that the hyphenation works? It is necessary to modify the hyphenation patterns and then rebuild the format. Thanks. Neal -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net: How does one do that? Where are the patterns kept and what format needs to be rebuilt? Sorry for being so clueless about this. Sorry for the noise, I located the patterns and, as Mojca wrote, the patterns for the transliteration are present. It should work out of the box unless you use a different transliteration. An example including the log file will help to find the source of the problem. Best Neal On Sat, 10 Sep 2011 15:47:38 -0500, Zdenek Wagner zdenek.wag...@gmail.com wrote: 2011/9/10 Neal Delmonico ndelmon...@sbcglobal.net: Greetings, I have a question. How does one get the hyphenation to work for transliterated Sanskrit as well as it does for Sanskrit in Devanagari? I use the same text in Devanagari and Roman transliteration and yet in the Devanagari the hyphenation works fine and in the transliteration it does not. Is there some trick to setting up the transliteration so that the hyphenation works? It is necessary to modify the hyphenation patterns and then rebuild the format. Thanks. Neal -- Zdeněk Wagner http://hroch486.icpf.cas.cz/wagner/ http://icebearsoft.euweb.cz -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenation in Transliterated Sanskrit
On Sun, Sep 11, 2011 at 00:08, Neal Delmonico wrote: How does one do that? Where are the patterns kept and what format needs to be rebuilt. Sorry for being so clueless about this. The patterns are in hyph-sa.tex (kpsewhich hyph-sa.tex). Mojca -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex