Re: [tex-hyphen] Unicode patterns for Unicode in Pdflatex/Babel?

Arthur Reutenauer Sun, 12 Jun 2011 12:56:26 -0700

    Hi Daniel,

> I am trying to make myself clear: I would like to know if it wouldn't be 
> possible to employ a custom
> Unicode hyphenation rules/pattern file also for Pdflatex/Babel when the text 
> there is Unicode, too


  That shouldn't be necessary: the patterns that are presented to pdfTeX are 
encoded in some *font* encoding, distinct from the input encoding that you use 
in your document.  The inputenc and fontenc packages take care of mapping the 
code positions from the input encoding, be it UTF-8 or an 8-bit encoding, to 
the font encoding.  As far as the patterns are concerned, they're always encoded 
in UTF-8 in the different hyph-<lang>.tex files, and are converted on the fly when 
input by pdfTeX, at format generation time, to whatever font encoding is 
appropriate for the language at hand.  The inputenc / fontenc packages then do 
the job for you, and you can use any encoding you wish in your document. 
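
  As a minimal sketch of that setup (French is only an assumed example 
language here, and T1 an assumed font encoding; the sketch also assumes the 
French patterns were loaded into your pdfLaTeX format), a document could 
look like this:

```latex
% Minimal pdfLaTeX sketch: UTF-8 input, T1 font encoding, French patterns.
% The language and the sample words are illustrative assumptions.
\documentclass{article}
\usepackage[utf8]{inputenc}  % input encoding: how the bytes of this file are read
\usepackage[T1]{fontenc}     % font encoding: the one the loaded patterns are in
\usepackage[french]{babel}   % switches to the French hyphenation patterns
\begin{document}
% \showhyphens writes the hyphenation points it finds to the log file
\showhyphens{généralement différent}
Un texte généralement différent.
\end{document}
```

  Changing the option given to inputenc (for instance to latin1, with the file 
saved accordingly) should leave the hyphenation unchanged, since the patterns 
only ever see the font encoding.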

  However, I'm going to venture a wild guess and assume that the language 
you're interested in is Sanskrit, a language which actually has patterns 
disabled for pdfTeX, because we couldn't determine what font encoding was 
appropriate when the patterns were submitted.  For the vast majority of 
languages that had patterns when Mojca and I took over work on hyphenation 
files three years ago, there was one single 8-bit encoding that was used by 
both the pattern file and the Babel support files.  Several languages, though, 
have been added in the meantime, including Sanskrit, that had no dedicated 
8-bit encoding that we could use(*).  We thus decided to make them available 
for Unicode-aware TeX engines only; hence, you don't have access to them from 
pdfTeX.  But if you have a reason to want to use them, we'll gladly make them 
available as well.  That won't be a problem at all; we simply never considered 
the issue because we didn't think it would come up -- Mojca, what do you think?

    Arthur

(*) Note that packages to typeset Devanagari in TeX, as well as several other 
Indic scripts, have existed for a long time, but they didn't have any 
hyphenation patterns attached.  These have only been added recently, from 
different contributors and after Mojca found out that OpenOffice shipped many 
pattern files for modern Indian languages.  All the files were encoded using 
UTF-8.
