Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-10-02 Thread Dominik Wujastyk
Dear Anant,

Bellamkonda Ramaraya kavi (1875-1914 AD, Andhra Pradesh) was a respected
Vedānta philosopher who wrote many works, including a sub-commentary called
*Gītābhāṣyārkaprakāśikā* (गीताभाष्यार्कप्रकाशिका), on Śaṅkara Bhagavatpāda's
commentary Gītābhāṣya (गीताभाष्य) on the Bhagavadgītā (भगवद्गीता).

So you need to be clear first what you are typing.  The *Bhagavadgītā* has
already been typed into machine-readable form many times.  You can find
free copies here:  http://gretil.sub.uni-goettingen.de/gret_utf.htm (search
for bhagavadgita).  Several commentaries on the Bhagavadgītā have also
been typed into the computer, including those of Śaṅkara, Yāmuna, Rāmānuja
and Jñānadeva.  So you don't need to type Śaṅkara's *Gītābhāṣya* itself.
Here is a shortcut to the *Gītābhāṣya*:
http://gretil.sub.uni-goettingen.de/gret_utf.htm.

But I don't think Bellamkonda Ramaraya's subcommentary has been typed yet.
So that would be a good project for you.

When you say all languages, I think you probably mean all alphabets.
There are two important rules for you:

1. Type using a Unicode encoding and font (see
here: http://salrc.uchicago.edu/resources/fonts/available/).
This can be in Roman script or in Devanāgarī, Telugu script (e.g., Akshar
Unicode http://salrc.uchicago.edu/resources/fonts/available/telugu/), or
any other.  All the scripts are supported by Unicode, so it doesn't matter
which you choose.  Take the one *you* are most accurate in and familiar
with, and in which Bellamkonda's work was published, i.e. probably Telugu.

Once your text is typed, *if you have used Unicode*, conversion to other
alphabets can be done automatically.

2.  Keep your typing simple and concentrate on accuracy.  Your efforts will
be of no use if you do not type Bellamkonda Ramaraya's words with utmost
care and accuracy.  It would be an insult to his memory and his philosophy
to introduce errors.  So type carefully, and then proof-read (check) what
you have typed, and then have a friend check your typing too.  Every time
you check, you will find new errors to correct - don't be discouraged!

When you are finished, share the result with the world through GRETIL and
SARIT http://sarit.indology.info, as well as your own website.

Thank you for your efforts, and good luck!

Dominik Wujastyk


On 2 October 2011 07:21, A u akupadhyay...@gmail.com wrote:

 Hello Mr. Shirisha Rao,
 I am trying to type shankara bhashyam on Bhagwadgeeta by Shri Bellamkonda
 Rama raya kavi my goal is to make it available on all indian languages.
 I was wondering if you can help me in this regard.
 I saw your example, it produces output in many languages. my latex
 knowledge is very little, if you can give me some guidance to start I would
 greatly appreciate it.
 regards
 Anant

  On Mon, Sep 12, 2011 at 5:57 AM, Shrisha Rao sh...@nyx.net wrote:

  On Sep 12, 2011, at 12:25 a.m., Neal Delmonico wrote:

  Also Zdenek raises an interesting possibility.  If I wanted to
 typeset Sanskrit, say this very Sanskrit, in Bengali or Telugu script, how
 would I go about that?

 I am able to get outputs in multiple scripts (Devanagari, Kannada, Roman,
 Telugu), but not Bengali, using the xetex-itrans package.  The source file
 has to be in ITRANS rather than accented-Roman (IAST) format.  See the
 attached for an example of a source and output.

 This should be extensible relatively easily to Bengali, Gujarati, Oriya,
 etc., though perhaps not to Tamil.

 Regards,

 Shrisha Rao

  Thanks again.
 
  Neal






 --
 Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex







Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-10-02 Thread Philip TAYLOR (Webmaster, Ret'd)

Dominik --


Several commentaries on the Bhagavadgītā have also
been typed into the computer, including those of
Śaṅkara, Yāmuna, Rāmānuja and Jñānadeva.


What is the significance (if any) of the extra-high ṅ
in Śaṅkara ?

** Phil.




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-10-02 Thread Cyril Niklaus

On 2 Oct 2011, at 22:40, Philip TAYLOR (Webmaster, Ret'd) wrote:

 Dominik --
 
 Several commentaries on the Bhagavadgītā have also
 been typed into the computer, including those of
 Śaṅkara, Yāmuna, Rāmānuja and Jñānadeva.
 
 What is the significance (if any) of the extra-high ṅ
 in Śaṅkara ?

Because that's how his name is spelled.  You have guttural, palatal, retroflex 
and dental n in Devanāgarī, respectively ङ ṅa; ञ ña; ण ṇa and न na.
The guttural na is transcribed using a superscript dot, but maybe you do not 
have it in a standard font, and your MUA used whatever font was available, 
therefore this extra height you're talking about.  I'm not sure if I've 
correctly understood you, to be honest.

Cyril




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-10-02 Thread Philip TAYLOR (Webmaster, Ret'd)



Cyril Niklaus wrote:


Because that's how his name is spelled.  You have guttural, palatal, retroflex 
and dental n in Devanāgarī, respectively ङ ṅa; ञ ña; ण ṇa and न na.


Yes, but all n variants are normally the same size, modulo the diacritics.


The guttural na is transcribed using a superscript dot, but maybe you do not 
have it in a standard font, and your MUA used whatever font was available, 
therefore this extra height you're talking about.  I'm not sure if I've 
correctly understood you, to be honest.


Agreed : I have changed my font preferences for Other languages
(odd way of having to tell it which font to use for UTF-8 !),
and now all four n variants are the same height.

Philip Taylor




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-10-02 Thread Zdenek Wagner
2011/10/2 Philip TAYLOR (Webmaster, Ret'd) p.tay...@rhul.ac.uk:


 Cyril Niklaus wrote:

 Because that's how his name is spelled.  You have guttural, palatal,
 retroflex and dental n in Devanāgarī, respectively ङ ṅa; ञ ña; ण ṇa and न na.

 Yes, but all n variants are normally the same size, modulo the diacritics.

It's not so uncommon for two fonts with the same design size to have
different x-heights. If your computer has to select one character from
a different font because it does not exist in your main font, such
discrepancies can be expected. On my computer ṅ appears lower. I do
not know where fontconfig takes it from, probably from John Smith's
fonts.

 The guttural na is transcribed using a superscript dot, but maybe you do
 not have it in a standard font, and your MUA used whatever font was
 available, therefore this extra height you're talking about.  I'm not sure
 if I've correctly understood you, to be honest.

 Agreed : I have changed my font preferences for Other languages
 (odd way of having to tell it which font to use for UTF-8 !),
 and now all four n variants are the same height.

 Philip Taylor






-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-10-02 Thread Dominik Wujastyk
Oh, I completely misunderstood your question, Phil.

The answer is: none.  It's a rendering artefact.

Dominik



Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-10-01 Thread Shrisha Rao
On Oct 2, 2011, at 7:21 a.m., A u wrote:

 Hello Mr. Shirisha Rao,
 I am trying to type shankara bhashyam on Bhagwadgeeta by Shri Bellamkonda 
 Rama raya kavi my goal is to make it available on all indian languages. 
 I was wondering if you can help me in this regard. 
 I saw your example, it produces output in many languages. my latex knowledge 
 is very little, if you can give me some guidance to start I would greatly 
 appreciate it. 

Get more familiar with LaTeX, in particular XeLaTeX; that's an obvious place to 
start.

Regards,

Shrisha Rao

 regards
 Anant






Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Yves Codet
Hello.

A question to specialists, Arthur and Mojca maybe :) Is it necessary to have 
two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation 
patterns are written in NFC, for instance, will they be applied correctly to a 
document written in NFD?

Regards,

Yves








Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Mojca Miklavec
On Mon, Sep 12, 2011 at 09:36, Yves Codet wrote:
 Hello.

 A question to specialists, Arthur and Mojca maybe :) Is it necessary to have 
 two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation 
 patterns are written in NFC, for instance, will they be applied correctly to 
 a document written in NFD?

That depends on the engine.

From what I understand, XeTeX does normalize the input, so NFD should
work fine. But I'm only speaking from memory based on Jonathan's talk
at BachoTeX. I might be wrong. I'm not sure what LuaTeX does. If one
doesn't write the code, it might be that no normalization will ever
take place.

I can also easily imagine that our patterns don't work with NFD input
with Hyphenator.js. I'm not sure how patterns in Firefox or OpenOffice
deal with normalization. I never tested that.

But in my opinion the engine *should* be capable of doing normalization.
Otherwise you can easily end up with an exponential problem. A pattern with 3
accented letters can easily result in 8 or even more duplicated
patterns to cover all possible combinations of composed-or-decomposed
characters.
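
To make the count concrete, here is a minimal sketch. The three letters ā, ī and ū are an assumption chosen purely for illustration; they are not taken from any pattern file. Each letter can be written precomposed (NFC) or as a base letter followed by U+0304 COMBINING MACRON (NFD), so a pattern containing all three has 2 × 2 × 2 = 8 possible encodings.

```latex
\documentclass{article}
\usepackage{fontspec}
\begin{document}
% XeTeX's ^^^^xxxx notation names a character by its code point.
% NFC (precomposed) on the left, NFD (base + U+0304) on the right.
^^^^0101 \quad a^^^^0304 \par % ā
^^^^012b \quad i^^^^0304 \par % ī
^^^^016b \quad u^^^^0304 \par % ū
% Two choices for each of three letters: 2^3 = 8 spellings of one pattern.
\end{document}
```

The two columns should look the same on the page, but the engine sees different character sequences unless the input is normalized.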

Arthur had some plans to cover normalization in hyph-utf8, but I
already hate the idea of duplicated apostrophe, let alone all
duplications just for the sake of stupid engines that don't
understand unicode :).

Mojca





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Jonathan Kew

On 12 Sep 2011, at 08:59, Mojca Miklavec wrote:

 On Mon, Sep 12, 2011 at 09:36, Yves Codet wrote:
 Hello.
 
 A question to specialists, Arthur and Mojca maybe :) Is it necessary to have 
 two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation 
 patterns are written in NFC, for instance, will they be applied correctly to 
 a document written in NFD?
 
 That depends on the engine.
 
 From what I understand, XeTeX does normalize the input, so NFD should
 work fine. But I'm only speaking from memory based on Jonathan's talk
 at BachoTeX.

xetex will normalize text as it is being read from an input file IF the 
parameter \XeTeXinputnormalization is set to 1 (NFC) or 2 (NFD), but will leave 
it untouched if it's zero (which is the initial default).

Note that this would not affect character sequences that might be created in 
other ways than reading text files - e.g. you could still create unnormalized 
text within xetex via macros, etc.

Forcing universal normalization is hazardous because there are fonts that do 
not render the different normalization forms equally well, so users may have a 
specific reason for wanting to use a certain form. (This is, of course, a 
shortcoming of such fonts, but because this is the real world situation, I'm 
reluctant to switch on normalization by default in the engine.)

In principle, it seems desirable that the engine should deal with normalization 
automatically when using hyphenation patterns, but this is not currently 
implemented.

Personally, I'd recommend the use of NFC as a standard in almost all 
situations, and suggest that pattern authors should operate on this assumption; 
support for non-NFC text may then be less-than-perfect, but I'd consider that a 
feature request for the engine(s) more than for the patterns.
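
A minimal sketch of the opt-in described above, assuming the hyphenation patterns in use are written in NFC. The document body is a placeholder; only \XeTeXinputnormalization and its values 0, 1 and 2 come from this message.

```latex
\documentclass{article}
\usepackage{fontspec}
% 0 = leave input untouched (the initial default), 1 = NFC, 2 = NFD.
% Only text subsequently read from input files is normalized; character
% sequences built by macros are not touched.
\XeTeXinputnormalization=1
\begin{document}
% Whether this word was typed with U+0101 or with a + U+0304,
% it should reach the hyphenation patterns in NFC.
āyurveda
\end{document}
```

If a font renders NFD better than NFC, the setting can be left at 0 for that document, at the cost of patterns possibly not matching.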

 I might be wrong. I'm not sure what LuaTeX does. If one
 doesn't write the code, it might be that no normalization will ever
 take place.
 
 I can also easily imagine that our patterns don't work with NFD input
 with Hyphenator.js. I'm not sure how patterns in Firefox or OpenOffice
 deal with normalization. I never tested that.
 
 But in my opinion the engine *should* be capable of doing normalization.
 Otherwise you can easily end up with an exponential problem. A pattern with 3
 accented letters can easily result in 8 or even more duplicated
 patterns to cover all possible combinations of composed-or-decomposed
 characters.
 
 Arthur had some plans to cover normalization in hyph-utf8, but I
 already hate the idea of duplicated apostrophe,

That's a bit different, and hard to see how we could avoid it except via 
special-case code somewhere that knows to treat U+0027 and U+2019 as 
equivalent for certain purposes, even though they are NOT canonically 
equivalent characters and would not be touched by normalization.

IMO, the duplicated apostrophe case is something we have to live with because 
there are, in effect, two different orthographic conventions in use, and we 
want both to be supported. They're alternate spellings of the word, and so 
require separate patterns - just like we'd require for colour and color, if 
we were trying to support both British and American conventions in a single set 
of patterns.

 let alone all
 duplications just for the sake of stupid engines that don't
 understand unicode :).

Yes, the engine should handle that. But it doesn't (unless you enable input 
normalization that matches your patterns).

JK






Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Dominik Wujastyk
Dear Phil,

You should know better. :-)

In 1993 you invited me to give a talk about hyphenation at RHBNC.  I started
out my lecture by demolishing the old chestnut that British is hyphenated
etymologically while American isn't.  Reality is much more blurry.

Hugh Williamson got it right, as so often:

The customs of word-division derive partly from etymology,
partly from meaning, partly from pronunciation, and partly from
tradition. Effective communication depends upon conventions, in
word-division as elsewhere, and the best conventions are those the
reader is likely to expect. The first part of a divided word should
not mislead the reader about the pronunciation or meaning of the
second part.
Word-division for the benefit of the reader, however, is best
determined by a reader’s perceptions; different customs apply to
different words, and a few simple rules are not enough to find the
right place.
-- Methods of Book Design, pp. 48, 89.


You are perfectly right, though, that a single set of patterns couldn't
support British and American hyphenation at once.  Their hyphenation points
differ in approximately 30% of cases, that is for words that are spelt the
same.

Dominik


On 12 September 2011 12:09, Philip TAYLOR (Webmaster, Ret'd) 
p.tay...@rhul.ac.uk wrote:


 Jonathan Kew wrote:
  On 12 Sep 2011, at 08:59, Mojca Miklavec wrote:
 
  Arthur had some plans to cover normalization in hyph-utf8, but I
  already hate the idea of duplicated apostrophe,
 
  That's a bit different, and hard to see how we could avoid it except via
 special-case code somewhere that knows to treat U+0027 and U+2019 as
 equivalent for certain purposes, even though they are NOT canonically
 equivalent characters and would not be touched by normalization.
 
  IMO, the duplicated apostrophe case is something we have to live with
 because there are, in effect, two different orthographic conventions in use,
 and we want both to be supported. They're alternate spellings of the word,
 and so require separate patterns - just like we'd require for colour and
 color, if we were trying to support both British and American conventions
 in a single set of patterns.

 It may be that you are intentionally putting up a straw-man argument here,
 but if you are not, may I comment that trying to support both British and
 American conventions in a single set of patterns would (IMHO) be
 impossible, since British English hyphenation is based primarily on
 etymology whilst American is based on syllable boundaries.  I wish
 I understood more about the duplicate apostrophe problem, in order
 to be able to offer a more directly relevant (and constructive) comment :
 Google throws up nothing relevant.

 Philip Taylor




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Mojca Miklavec
On Mon, Sep 12, 2011 at 12:09, Philip TAYLOR (Webmaster, Ret'd)
p.tay...@rhul.ac.uk wrote:

 I wish
 I understood more about the duplicate apostrophe problem, in order
 to be able to offer a more directly relevant (and constructive) comment :
 Google throws up nothing relevant.

Users type ' (U+0027) and expect the proper apostrophe (U+2019) to
show up in the final PDF. Knuth just replaced the character (you cannot
get U+0027 in pdfTeX, except in a typewriter font). In XeTeX,
mapping=tex-text does that, but not all users use it, so we need
to support both variants.
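
For readers who have not met the mapping, a minimal sketch of how it is typically switched on through fontspec. The font and the sample word are assumptions for illustration.

```latex
\documentclass{article}
\usepackage{fontspec}
% Mapping=tex-text applies the TECkit mapping that turns ' (U+0027)
% into the typographic apostrophe (U+2019), among other TeX conventions.
\setmainfont[Mapping=tex-text]{TeX Gyre Pagella}
\begin{document}
l'homme % typed with U+0027, typeset with U+2019
\end{document}
```

A user who does not load the mapping and types U+2019 directly produces the same printed result from different input, which is why the patterns have to cover both characters.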

Mojca





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Dominik Wujastyk
I've just had a stimulating conversation about this with my friend and
fellow Sanskritist, Alessandro Graheli (who also reads this XeTeX list, and
is doing critical editions of Sanskrit texts with XeTeX).

Alessandro was concerned that I overstated the case.  He has used the
existing Codet/Kew hyph-sa.tex patterns, and prefers them even for romanised
Sanskrit.  Word-division after a vowel fits with the forms of recitation and
caesura that Alessandro learned when he was a student in India working
extensively with traditional Sanskrit pandits.  He also said that Italian
typesetting of Sanskrit in romanisation hyphenates this way, rather than in
the etymological manner that I was asserting.

We need more study to sort out some of these issues, but it looks prima
facie as if both styles of hyphenating romanised Sanskrit should be
preserved, since there are different usage-groups out there.  While the
hyphenation style for romanised Sanskrit that I describe below reflects
widespread usage in good printing over the last century or more, mainly in
British texts and journals, and may be required in future too, there are
also people who are comfortable with Devanagari-style hyphenation in
Romanised text too.

Best,
Dominik

On 11 September 2011 20:40, Dominik Wujastyk wujas...@gmail.com wrote:

 Sanskrit is hyphenated differently in Devanagari and in Roman script.  If
 you use the hyph-sa.tex patterns, you get Roman hyphenated *as if it were
 Devanagari,* which is not acceptable in scholarly circles.  The last 150
 years of European writing on Sanskrit, using Romanisation, has developed
 hyphenation rules based on Sanskrit etymology, paying attention to compound
 words, internal sandhi, etc. (i.e., like German in some respects).  The
 Devanagari hyphenation uses a much simpler idea, basically hyphenate after
 almost any vowel.

 To get appropriate hyphenation in Romanisation, we need to go down the
 Patgen path.  So we need to develop a large lexicon of
 appropriately-hyphenated romanised Sanskrit words in UTF8 encoding, and when
 that list is reasonably long, process it through Patgen to make patterns.

 I am slowly developing such a list, but it would be great to collaborate.

 While the list is in the making, it can still be used, by using
 \hyphenation.

 Thus:

 \documentclass{article}

 \usepackage{polyglossia} % language switching and hyphenation patterns
 \usepackage{xltxtra}     % XeLaTeX extras
 % ... whatnot
 \setotherlanguage{sanskrit}  % for transliterated Sanskrit
 \newfontfamily\sanskritfont{TeX Gyre Pagella}

 % Define \sansk{}, which is the same as \emph{} except that it causes
 % appropriate hyphenation for Sanskrit words.
 % Use \sansk{} for Sanskrit and \emph{} for English.
 \newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}
 % ...
 \begin{document}

 \input{sanskrit-hyphenations.tex} % see attached file.

 Blah English blah.  \sansk{āyurveda, avicchinnasampradāyatvād}.

 \end{document}


 Best,
 Dominik
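
The attached sanskrit-hyphenations.tex is not reproduced in the archive. As a rough sketch only, such a file might look like the following. Its contents here are hypothetical; the two words are the examples of compound and morphemic breaks given elsewhere in this thread. It assumes polyglossia's hyphenrules environment is available, so that the exceptions are stored for Sanskrit and not for the main language.

```latex
% sanskrit-hyphenations.tex: hypothetical contents, for illustration only.
% \hyphenation exceptions belong to the language that is active when they
% are read, so Sanskrit hyphenation rules are selected first.
\begin{hyphenrules}{sanskrit}
\hyphenation{
  dharma-cakra
  bhav-a-ti
}
\end{hyphenrules}
```

Words listed this way break only at the marked points, so the list can be used long before it is large enough to feed to Patgen.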





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Philip TAYLOR (Webmaster, Ret'd)


Mojca Miklavec wrote:

 Why do you type Ret'd they're helico-pter instead of Ret’d they’re
 “helico-pter” ? You are unicode-aware, aren't you?

 Mojca

Unicode-aware, but not Unicode-typing.  This (like my earlier
reply) is typed on an IBM Model M keyboard (the real thing, clicky,
dating from circa 1985 : see Exhibit `A'
https://picasaweb.google.com/110725905659537251822/IBMModelMKeyboard?authkey=Gv1sRgCMbhqKypi57lNw#5651442526952322114),
which is used to compose strictly
ASCII text.  If I want Unicode, I copy and paste it from the web.

** Phil.




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Dominik Wujastyk
Gasp! A CRT!




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Philip TAYLOR (Webmaster, Ret'd)


Dominik Wujastyk wrote:
 Gasp! A CRT!  

Sir.  You have the honour to be communicating with
(in the words of my former manager, David Sweeney)
a DINOSAUR.  What else would you expect a dinosaur
to use but an IBM Model M clicky keyboard and a 19"
CRT monitor ?!

** Phil, still wondering what changes the 20th century will bring :-)




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread alessandro graheli
Thanks to Dominik for presenting my needs for hyphenating romanised  
Sanskrit according to the syllabic division of Sanskrit traditional  
phonetics. For a number of reasons, in my philologically-oriented  
work I prefer to typeset Sanskrit words as faithfully as possible to  
the sources, and the hyph-sa.tex fulfils this need.


Yet, I think I understand Dominik on the need for a reader-friendly  
hyphenation of Sanskrit, particularly in texts with less strict  
philological needs, and in English essays with occasional Sanskrit  
terms. In this regard, Dominik's suggestion of adopting the customs  
of the academic tradition makes sense. But how consistently are such  
customs applied? And, how many of them are the informed choice of  
scholars, and not the product of typographers' tastes, dictionaries  
of modern languages, or software-specific algorithms? In any case, I  
think that readability judgements on hyphenation of Sanskrit are  
largely influenced by one's own habits in hyphenating English,  
Italian, or any other language, so it is difficult to set a universal  
standard other than the Devanagari-conforming one.


As for Italian typesetting, hyphenation of Sanskrit words is
probably as irregularly applied as in English literature. It is just
that, compared with English, some consonant clusters commonly found
also in Sanskrit (pr, pl, st etc.) are not broken in Italian
hyphenation (e.g. ca-sti-tà vs. chas-ti-ty); thus, by adopting
Italian hyphenation patterns, one probably gets slightly better
results as far as the traditional syllabic division of Sanskrit is concerned.
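
A minimal sketch of that idea, assuming polyglossia and an installed set of Italian patterns. The command name \sanskit is hypothetical, and how well the Italian patterns cope with letters such as ā or ṣ has not been tested here.

```latex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{italian}
% \sanskit (hypothetical name): emphasised text hyphenated with the
% Italian patterns instead of hyph-sa.tex.
\newcommand{\sanskit}[1]{\emph{\textitalian{#1}}}
\begin{document}
Blah English blah.  \sanskit{avicchinnasampradāyatvād}.
\end{document}
```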


Best,
Alessandro Graheli





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-12 Thread Dominik Wujastyk
Alessandro and I agree to disagree about the issue of philological
correctness. I think that hyphenating following etymology, lexicon and
morphemic boundaries is *more* philological than breaking after a vowel.  I
think what Alessandro means by philology in this case is that he is
influenced by the usage of manuscript scribes and recitation.  But these are
both traditions in which hyphenation was not theorized at all.   However, in
the end, I don't really think it's about philology at all, but mere
precedent.

The fact is, there's a huge body of printed work out there, books and
journals, that has accumulated since about 1850, in which Sanskrit is
commonly presented in roman transliteration and is routinely hyphenated
according to compound-breaks (dharma-cakra) and morphemic boundaries
(bhav-a-ti).  A lot of people have got used to this kind of hyphenation,
often subliminally, and want it in their own printed work.  Normally, they
don't get it.  Authors of indological journal articles frequently have to
re-hyphenate Sanskrit words manually in their page proofs.  There is a
continuing demand for this kind of hyphenation.  (Surely you can hear the
thunder of Sanskritists clamouring for etymological hyphenation?
:-)   Since it's not that hard for XeTeX, we can eventually provide it as a
service for those who wish to use it.  I'm not being prescriptive about
this.  Others can use the existing patterns.  Let a thousand flowers bloom.

I'm quite taken by the concept that Alessandro has raised about different
hyphenation traditions for the same language and script in different
countries.  I.e., English (or Sanskrit) might be differently hyphenated in
Italy.  Very interesting.

Best,
Dominik

(coffee later, Alessandro?)



On 12 September 2011 14:55, alessandro graheli a.grah...@gmail.com wrote:

  Thanks to Dominik for presenting my needs for hyphenating romanised
 Sanskrit according to the syllabic division of Sanskrit traditional
 phonetics. For a number of reasons, in my philologically-oriented work I
 prefer to typeset Sanskrit words as faithfully as possible to the sources,
 and the hyph-sa.tex fulfils this need.

 Yet, I think I understand Dominik on the need for a reader-friendly
 hyphenation of Sanskrit, particularly in texts with less strict philological
 needs, and in English essays with occasional Sanskrit terms. In this regard,
 Dominik's suggestion of adopting the customs of the academic tradition makes
 sense. But how consistently are such customs applied? And, how many of them
 are the informed choice of scholars, and not the product of typographers'
 tastes, dictionaries of modern languages, or software-specific algorithms?
 In any case, I think that readability judgements on hyphenation of Sanskrit
 are largely influenced by one's own habits in hyphenating English, Italian,
 or any other language, so it is difficult to set a universal standard other
 than the Devanagari-conforming one.

 As for Italian typesetting, hyphenation of Sanskrit words is probably as
 irregularly applied as in English literature. It is just that, compared with
 English, some consonant clusters commonly found also in Sanskrit (pr, pl,
 st etc.) are not broken in Italian hyphenation (e.g. ca-sti-tà vs.
 chas-ti-ty); thus, by adopting Italian hyphenating patterns, one probably
 gets slightly better results as far as traditional syllabic division of
 Sanskrit.

 Best,
 Alessandro Graheli



 On 12 Sep 2011, at 12:58, Dominik Wujastyk wrote:

 I've just had a stimulating conversation about this with my friend and
 fellow Sanskritist, Alessandro Graheli (who also reads this XeTeX list, and
 is doing critical editions of Sanskrit texts with XeTeX).

 Alessandro was concerned that I overstated the case.  He has used the
 existing Codet/Kew hyph-sa.tex patterns, and prefers them even for romanised
 Sanskrit.  Word-division after a vowel fits with the forms of recitation and
 caesura that Alessandro learned when he was a student in India working
 extensively with traditional Sanskrit pandits.  He also said that Italian
 typesetting of Sanskrit in romanisation hyphenates this way, rather than in
 the etymological manner that I was asserting.

 We need more study to sort out some of these issues, but it looks prima
 facie as if both styles of hyphenating romanised Sanskrit should be
 preserved, since there are different usage-groups out there.  While the
 hyphenation style for romanised Sanskrit that I describe below reflects
 widespread usage in good printing over the last century or more, mainly in
 British texts and journals, and may be required in future too, there are
 also people who are comfortable with Devanagari-style hyphenation in
 Romanised text too.

 Best,
 Dominik

 On 11 September 2011 20:40, Dominik Wujastyk wujas...@gmail.com wrote:

 Sanskrit is hyphenated differently in Devanagari and in Roman script.  If
 you use the hyph-sa.tex patterns, you get Roman hyphenated *as if it were
 

Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-11 Thread Neal Delmonico
Thanks to both Yves and Zdenek for your suggestions and examples.  The  
hyphenation is working now in both Devanagari and Roman Translit.  I'd  
have never figured it out on my own.  If I were to want to read more on  
this where would I look?


Also Zdenek raises an interesting possibility.  If I were to want to  
typeset Sanskrit, say this very Sanskrit, in Bengali or Telugu script.   
How would I go about that?


Thanks again.

Neal

On Sun, 11 Sep 2011 04:32:59 -0500, Zdenek Wagner  
zdenek.wag...@gmail.com wrote:



2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net:
Thanks!  How would one set it up so that the English portions are
hyphenated according to English rules and the transliteration is
hyphenated according to Sanskrit rules?


I am sending an example. You can see another nice feature of the
TECkit mapping. The mapping is applied when the text is typeset. You
can thus store the transliterated text in a temporary macro and
typeset it twice.
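
The example attached to the original message is not preserved in the archive. What follows is only a minimal sketch of the idea, not Zdenek's file. It assumes that RomDev.tec (compiled from RomDev.map) is somewhere XeTeX can find it, that the mapping accepts the transliteration used here, and that the two fonts named are installed. The macro names \devmapped, \translitfont, \skttemp and \sktboth are hypothetical.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{sanskrit}

% Assumed fonts; substitute whatever is installed locally.
% Mapping=RomDev asks fontspec to load the TECkit mapping RomDev.tec.
\newfontfamily\devmapped[Script=Devanagari,Mapping=RomDev]{Sanskrit 2003}
\newfontfamily\translitfont{Charis SIL}

% Hypothetical helper: store the transliterated text once, typeset it twice.
\newcommand\sktboth[1]{%
  \def\skttemp{#1}%
  % first pass: the mapping turns the roman input into Devanagari
  \textsanskrit{\devmapped\skttemp}\par
  % second pass: same input, roman font, Sanskrit hyphenation still active
  \textsanskrit{\translitfont\skttemp}\par}

\begin{document}
\sktboth{dharmakṣetre kurukṣetre samavetā yuyutsavaḥ}
\end{document}
```

Because the mapping is applied only when the text is typeset, the source stays in one transliterated form and the choice of script is left to the font setup.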

There is one problem (this is the reason why I am sending a copy to
François). Polyglossia requires Sanskrit text to be typeset in a font
with Devanagari characters. However, Sanskrit is also written in other
scripts so that people in other parts of India, who do not know
Devanagari, can read it. Even the Tibetan script contains retroflex
consonants that are not used in the Tibetan language but serve for
writing Sanskrit (and, recently, words of English origin).
Polyglossia should not be that demanding.

And just to François: I found two bugs in documentation. Section 5.2
mentions selection between Western and Devanagari numerals, but it
should be Bengali numerals (I am not sure which option is really
implemented). In the introduction, Vafa Khaligi's name is wrong. AFAIK
in Urdu and Farsi, the isolated and final forms of YEH are dotless (it
is not a big bug), but in fact the name is written as Khaliqi: there
is ق instead of غ.


Best

Neal

On Sat, 10 Sep 2011 19:40:51 -0500, Zdenek Wagner  
zdenek.wag...@gmail.com

wrote:


2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net:


Here are the source files for the PDF.  Sorry to take so long to send
them.


Your default language for polyglossia is defined as English. You
switch to Sanskrit only inside the \skt macro. The text in Devanagari
is therefore hyphenated according to Sanskrit rules but the
transliterated text is hyphenated according to the English rules. You
have to switch the language to Sanskrit also for the transliterated
text.
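
Neal's source files are not in the archive, so the following is a sketch of the fix under assumptions, not his actual setup. The font names are placeholders, and \skt and \translit are illustrative definitions (the real \skt macro may have looked different). The point is only that the transliteration is wrapped in \textsanskrit as well, with the roman font selected inside it.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{english}   % English text uses English patterns
\setotherlanguage{sanskrit}    % loads the Sanskrit patterns (hyph-sa.tex)

% Assumed fonts; substitute whatever is installed locally.
\newfontfamily\sanskritfont[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\translitfont{Charis SIL}

% Devanagari: language switched to Sanskrit, Devanagari font.
\newcommand\skt[1]{\textsanskrit{#1}}
% Transliteration: language ALSO switched to Sanskrit, roman font.
\newcommand\translit[1]{\textsanskrit{\translitfont #1}}

\begin{document}
English text here is hyphenated by the English rules.

\skt{धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः}

\translit{dharmakṣetre kurukṣetre samavetā yuyutsavaḥ}
\end{document}
```

Without the \textsanskrit wrapper in \translit, the roman text would be set while English is the active language and would be broken by the English patterns.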


Best

Neal

On Sat, 10 Sep 2011 17:53:42 -0500, Mojca Miklavec
mojca.miklavec.li...@gmail.com wrote:


On Sun, Sep 11, 2011 at 00:39, Neal Delmonico wrote:


Here is an example of what I mean in the pdf attached.


Do I get it right that hyphenation is working, it is just that it
misses a lot of valid hyphenation points?

You should talk to Yves Codet, the author of Sanskrit patterns.

But PLEASE: do post an example of your code when you ask for help. If
you don't send the source, it is not clear whether you are in fact using
Sanskrit patterns or if you are falling back to English when you try
to switch fonts. You could just as well send us a PDF with French
hyphenation enabled and claim that TeX is buggy since it doesn't
hyphenate right.

Mojca


--
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex



--
Using Opera's revolutionary email client: http://www.opera.com/mail/




Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-11 Thread Zdenek Wagner
2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net:
 Thanks to both Yves and Zdenek for your suggestions and examples.  The
 hyphenation is working now in both Devanagari and Roman Translit.  I'd have
 never figured it out on my own.  If I were to want to read more on this
 where would I look?

Frankly I do not know. I often read the source code of the packages in
order to understand the internals. In fact I even studied the whole
source code of LaTeX.

 Also Zdenek raises an interesting possibility.  If I were to want to typeset
 Sanskrit, say this very Sanskrit, in Bengali or Telugu script.  How would I
 go about that?

Probably you can mechanically rewrite RomDev.map to convert the
transliteration to another script and compile it with teckit_compile.
I do not know Sanskrit and do not know other scripts, my knowledge in
this area is almost zero, so I am not sure whether such mechanical
approach would work.





-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-11 Thread Yves Codet
Hello, Neal.

I still don't receive your messages :(

On 11 Sept. 2011, at 22:21, Zdenek Wagner wrote:

 Also Zdenek raises an interesting possibility.  If I were to want to typeset
 Sanskrit, say this very Sanskrit, in Bengali or Telugu script.  How would I
 go about that?
 
 Probably you can mechanically rewrite RomDev.map to convert the
 transliteration to another script and compile it with teckit_compile.
 I do not know Sanskrit and do not know other scripts, my knowledge in
 this area is almost zero, so I am not sure whether such mechanical
 approach would work.

I presume you can do as Zdeněk says (I don't know much about TECkit).
Otherwise you can write Sanskrit directly in Bengali or Telugu script. If you
tell Polyglossia what is in Sanskrit, it should be hyphenated correctly in
those scripts as well.
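
A minimal sketch of the second route, with the text typed directly in Bengali script. The font name is an assumption, and it is also an assumption that the installed Polyglossia accepts a non-Devanagari font for Sanskrit without complaint, which Zdenek's remark elsewhere in this thread suggests may not always be so. Script=Bengali is fontspec's OpenType script selector.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{sanskrit}

% Assumed font; any OpenType font with Bengali shaping should serve.
\newfontfamily\sanskritfont[Script=Bengali]{Noto Serif Bengali}

\begin{document}
% "dharmakṣetre kurukṣetre" typed in Bengali script and marked as Sanskrit,
% so that the Sanskrit patterns rather than the English ones apply.
\textsanskrit{ধর্মক্ষেত্রে কুরুক্ষেত্রে}
\end{document}
```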

Best wishes,

Yves







Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-10 Thread Mojca Miklavec
On Sat, Sep 10, 2011 at 22:37, Neal Delmonico wrote:
 Greetings,

 I have a question.  How does one get the hyphenation to work for
 transliterated Sanskrit as well as it does for Sanskrit in Devanagari?  I
 use the same text in Devanagari and Roman transliteration and yet in the
 Devanagari the hyphenation works fine and in the transliteration it does
 not.  Is there some trick to setting up the transliteration so that the
 hyphenation works?

Please send a minimal example that fails to work. In theory the
transliterated Sanskrit should work - at least the patterns are
present.

(On the other hand I don't know anything about Sanskrit, but if there
is some technical issue ...)

Mojca





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-10 Thread Neal Delmonico
How does one do that?  Where are the patterns kept, and what format needs
to be rebuilt?  Sorry for being so clueless about this.


Best

Neal

On Sat, 10 Sep 2011 15:47:38 -0500, Zdenek Wagner  
zdenek.wag...@gmail.com wrote:



2011/9/10 Neal Delmonico ndelmon...@sbcglobal.net:

Greetings,

I have a question.  How does one get the hyphenation to work for
transliterated Sanskrit as well as it does for Sanskrit in Devanagari?  I
use the same text in Devanagari and Roman transliteration and yet in the
Devanagari the hyphenation works fine and in the transliteration it does
not.  Is there some trick to setting up the transliteration so that the
hyphenation works?

It is necessary to modify the hyphenation patterns and then rebuild the  
format.



Thanks.

Neal



Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-10 Thread Zdenek Wagner
2011/9/11 Neal Delmonico ndelmon...@sbcglobal.net:
 How does one do that?  Where are the patterns kept and what format needs to
 be rebuilt.  Sorry for being so clueless about this.

Sorry for the noise, I located the patterns and as Mojca wrote, the
patterns for the transliteration are present. It should work out of
the box unless you use a different transliteration. An example
including the log file will help to find the source of the problem.
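
One possible shape for such a test file, offered as a sketch only: the font is an assumed placeholder, and the zero-width \parbox is a general TeX trick (not something from this thread) that forces a line break at every hyphenation point TeX considers valid, so the breaks show up directly in the PDF.

```latex
% Compile with xelatex and keep the .log file to send along.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{sanskrit}

% Assumed font with the diacritics needed for transliteration.
\newfontfamily\sanskritfont{Charis SIL}

\begin{document}
% Sanskrit patterns active: one fragment per line at each allowed break.
% \hspace{0pt} lets TeX hyphenate the first word of the box.
\parbox{0pt}{\hspace{0pt}\textsanskrit{bhagavadgītā}}

\bigskip
% Same word under the default (English) patterns, for comparison.
\parbox{0pt}{\hspace{0pt}bhagavadgītā}
\end{document}
```

If the two boxes break identically, the Sanskrit patterns are probably not being applied to the transliterated text.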





-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Hyphenation in Transliterated Sanskrit

2011-09-10 Thread Mojca Miklavec
On Sun, Sep 11, 2011 at 00:08, Neal Delmonico wrote:
 How does one do that?  Where are the patterns kept and what format needs to
 be rebuilt.  Sorry for being so clueless about this.

The patterns are in hyph-sa.tex (kpsewhich hyph-sa.tex).

Mojca


