Re: [NTG-context] Hyphenation patterns

2021-04-09 Thread Arthur Rosendahl
  Denis’ latest question reminded me of an earlier query he had about
hyphenation, asking why “applicable” and “obligated” were hyphenated by
ConTeXt as ap-plic-a-ble and ob-lig-at-ed, and not ap-pli-ca-ble and
ob-li-ga-te(d) like in Merriam-Webster (the discussion started at
https://mailman.ntg.nl/pipermail/ntg-context/2020/099695.html).

  First of all, I note that while Webster’s dictionary is a useful
guide, and indeed a major reference for any American typographer,
there’s no absolute rule that we have to follow it either.  The break
applic-able, for example, does look acceptable to me; oblig-ated, less
so.

  Taco reminded us that when producing a set of hyphenation patterns from a
list of hyphenated words, we’re essentially compressing information, and
that some minor deviations are to be expected.  However, in my
experience, unexpected breakpoints are almost never due to chance, but
to a deliberate decision.

  Then Hraban said that:

On Fri, Oct 09, 2020 at 10:15:17AM +0200, Henning Hraban Ramm wrote:
> Usually Arthur’s (hail the emperor of hyphenation and protector of the 
> patterns) patterns are flawless, so I guess it’s not a bug but an exception 
> of the rules.

  I see that my self-appointed title is catching on, nice :-)
Unfortunately the patterns are just as likely to contain errors as
anything else, and in this particular case we’ll probably never know for
sure, because the original hyphenated word list was never published (all
the word lists from which patterns were produced in the 80s and 90s have
been lost, for all languages).  We’re thus reduced to guessing the
intent of those who compiled the lists.

  We can get hints from looking at the patterns involved in the
debatable breaks.  Hans has a useful script:

$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate applicable
hyphenator  |
hyphenator  | . a p p l i c a b l e .   . a p p l i c a b l e .  
hyphenator  |4p1p0   0 4 1 0 0 0 0 0 0 0 0  
hyphenator  |  1p2l2 0 4 1 2 2 0 0 0 0 0 0  
hyphenator  |  0p0l0i2c1a0b0 0 4 1 2 2 2 1 0 0 0 0  
hyphenator  |1c0a0   0 4 1 2 2 2 1 0 0 0 0  
hyphenator  |0c0a1b0l0   0 4 1 2 2 2 1 1 0 0 0  
hyphenator  |0b2l2   0 4 1 2 2 2 1 1 2 2 0  
hyphenator  |0b4l0e0.0   0 4 1 2 2 2 1 1 4 2 0  
hyphenator  | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .  
hyphenator  |
mtx-patterns| us 2 2 : applicable : ap-plic-a-ble

  That tells us that there are seven patterns involved in hyphenating
the word applicable: 4p1p, 1p2l2, pli2c1ab, 1ca, ca1bl, b2l2, and b4le.
(the final dot is part of that last pattern).  The pattern responsible
for the break applic-able is pli2c1ab.  If we now refer to the source
repository for hyphenation patterns (since comments are stripped in the
ConTeXt sources): 
https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-en-us.tex
-- we can see line 4508

hyphen.tex patterns end here, and additional patterns begin:

which means that the pattern pli2c1ab, line 4817, is an “additional
pattern”.  The background story is that hyphen.tex, the original
hyphenation pattern file for American English, produced in 1982-1983
from a list of hyphenated words (following mostly Webster’s), was later
augmented with more patterns that were supposed to improve hyphenation
for many words.  The person who added these new patterns apparently had
a list of words hyphenated incorrectly (according to him) by hyphen.tex,
but both that list and the one used to produce hyphen.tex are as
mentioned above now lost, probably forever.
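
  To make the bookkeeping in the trace concrete, here is a minimal Python
sketch of how the pattern digits are combined: each pattern writes its
digits into the gaps between letters, the highest digit in each gap wins,
and an odd result allows a break.  It is only an illustration, not what
mtxrun or the engine actually runs.  The pattern list is read off the trace
(its first line, 4p1p0, corresponds to the pattern 4p1p); the function names
and the left/right parameters are my own.

```python
import re

# Patterns read off the mtxrun trace for "applicable" (digits of 0 dropped).
PATTERNS = ["4p1p", "1p2l2", "pli2c1ab", "1ca", "ca1bl", "b2l2", "b4le."]


def parse_pattern(pattern):
    """Split 'pli2c1ab' into its letters and one digit per gap (0 if absent)."""
    letters = re.sub(r"\d", "", pattern)
    digits = [0] * (len(letters) + 1)
    gap = 0
    for ch in pattern:
        if ch.isdigit():
            digits[gap] = int(ch)
        else:
            gap += 1
    return letters, digits


def hyphenate(word, patterns, left=2, right=2):
    """Combine pattern digits over '.word.' and break where the maximum is odd."""
    dotted = "." + word + "."
    levels = [0] * (len(dotted) + 1)  # levels[i] is the gap just before dotted[i]
    for pattern in patterns:
        letters, digits = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for offset, digit in enumerate(digits):
                levels[start + offset] = max(levels[start + offset], digit)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(1, len(word)):
        # The gap before word[i] is the gap before dotted[i + 1].
        if levels[i + 1] % 2 == 1 and left <= i <= len(word) - right:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("applicable", PATTERNS))  # ap-plic-a-ble
without_extra = [p for p in PATTERNS if p != "pli2c1ab"]
print(hyphenate("applicable", without_extra))  # ap-pli-ca-ble
```

  With the seven patterns the sketch reproduces ap-plic-a-ble; with
pli2c1ab taken out of the list it yields ap-pli-ca-ble, the division
Merriam-Webster gives.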

  In any case, the pattern that causes the break applic-able was clearly
added intentionally; and as I said that break seems quite reasonable to
me.  Not so for the one in oblig-ated, so let’s have a look at that:

$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate obligated
hyphenator  |
hyphenator  | . o b l i g a t e d .   . o b l i g a t e d .  
hyphenator  |  0o0b0l0i2g1 0 0 0 0 2 1 0 0 0 0  
hyphenator  |0b2l2 0 0 2 2 2 1 0 0 0 0  
hyphenator  |  5l0i0g0a0t0e0   0 0 5 2 2 1 0 0 0 0  
hyphenator  |2i0g0 0 0 5 2 2 1 0 0 0 0  
hyphenator  |  1g0a0   0 0 5 2 2 1 0 0 0 0  
hyphenator  |  2t1e0d0 0 0 5 2 2 1 2 1 0 0  
hyphenator  | .0o0b5l2i2g1a2t1e0d0.   . o b-l i g-a t-e d .  
hyphenator  |
mtx-patterns| us 2 2 : obligated : ob-lig-at-ed

  Here we see that the dubious break is caused by the pattern obli2g1,
also an “additional pattern” (line 4783), and here it’s not hard to
guess where 

Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Denis Maier

Am 09.10.2020 um 14:48 schrieb Hans Hagen:

On 10/9/2020 9:01 AM, Denis Maier wrote:

[...]
I see. I've noticed lang-us.lua has a list of exceptions in it:
  ["exceptions"]={
   ["characters"]="abcdefghijlmnoprstuyz",
   ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory 
phil-an-thropic present presents project projects reci-procity 
re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",

   ["length"]=168,
   ["n"]=14,
  },

Would it be possible to add more exceptions to that list as they come 
up? Or is that inappropriate?
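
A note on the format of that entry: the data string is a space-separated
list of words with their permitted breaks marked by hyphens, and words
without any hyphen (present, project) are never broken.  A minimal Python
sketch of how such a list could be consulted before falling back to the
patterns is given here.  It is not ConTeXt's own loader; the function names
and the fallback argument are hypothetical, and the only thing assumed is
the usual TeX behaviour that an exception overrides the patterns.

```python
# Copied from the lang-us.lua excerpt above, with the mail line breaks removed.
EXCEPTIONS_DATA = (
    "as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory "
    "phil-an-thropic present presents project projects reci-procity "
    "re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble"
)


def build_exceptions(data):
    """Map each plain word to its explicitly hyphenated form."""
    return {entry.replace("-", ""): entry for entry in data.split()}


def lookup_hyphenation(word, exceptions, fallback):
    """Return the exception entry if there is one, else ask the fallback."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return fallback(word)


exceptions = build_exceptions(EXCEPTIONS_DATA)
print(len(exceptions), len(EXCEPTIONS_DATA))  # 14 168

# User additions, in the spirit of \hyphenation{...} (illustrative only).
exceptions.update(build_exceptions("ap-pli-ca-ble ob-li-gate"))

# Stand-in for pattern-based hyphenation: return the word unbroken.
no_patterns = lambda w: w
print(lookup_hyphenation("applicable", exceptions, no_patterns))  # ap-pli-ca-ble
print(lookup_hyphenation("present", exceptions, no_patterns))     # present
```

The two numbers printed first agree with the "n" and "length" fields of the
excerpt, which suggests that those fields simply record the number of
entries and the length of the string.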

you can add your own runtime in a style:

\hyphenation {fo-ob-ar} \hsize 1mm foobar


Sure. I use \startexceptions[en] for that. I just thought everyone might 
benefit...


Denis
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___


Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Hans Hagen

On 10/9/2020 9:01 AM, Denis Maier wrote:

Am 09.10.2020 um 08:57 schrieb Taco Hoekwater:



On 9 Oct 2020, at 08:52, Denis Maier  wrote:

Am 08.10.2020 um 19:05 schrieb Henning Hraban Ramm:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext

Wow, that's super helpful. The English pattern seems to be 
"ap-plic-a-ble"

According to Merriam-Webster it should just be "ap·pli·ca·ble".

{EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate"
According to Merriam-Webster it should be "ob·li·gate".

I've had a look at the files mentioned by Tomáš, but as these are not 
just wordlists I can not really tell what is happening.


So, is that a bug?
Not really. Hyphenation patterns are a bit like applying JPEG compression to
a dictionary. It makes the data size smaller by recognising patterns while
ignoring outliers.

Occasional errors are to be expected, which is why \hyphenation exists.



I see. I've noticed lang-us.lua has a list of exceptions in it:
  ["exceptions"]={
   ["characters"]="abcdefghijlmnoprstuyz",
   ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory 
phil-an-thropic present presents project projects reci-procity 
re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",

   ["length"]=168,
   ["n"]=14,
  },

Would it be possible to add more exceptions to that list as they come 
up? Or is that inappropriate?

you can add your own runtime in a style:

\hyphenation {fo-ob-ar} \hsize 1mm foobar

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Denis Maier

Am 08.10.2020 um 19:05 schrieb Henning Hraban Ramm:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext

Wow, that's super helpful. The English pattern seems to be "ap-plic-a-ble"
According to Merriam-Webster it should just be "ap·pli·ca·ble".

{EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate"
According to Merriam-Webster it should be "ob·li·gate".

I've had a look at the files mentioned by Tomáš, but as these are not 
just wordlists I can not really tell what is happening.


So, is that a bug?

Best,
Denis


Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Hans Hagen

On 10/9/2020 10:15 AM, Henning Hraban Ramm wrote:




Am 09.10.2020 um 08:52 schrieb Denis Maier :

Am 08.10.2020 um 19:05 schrieb Henning Hraban Ramm:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext


Wow, that's super helpful.


BTW \hyphenatedword works the same. I didn’t see anything colored.
There are some more commands like this, even \hyphenatedfile, see
https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=hyphenated

Usually Arthur’s (hail the emperor of hyphenation and protector of the 
patterns) patterns are flawless, so I guess it’s not a bug but an exception of 
the rules.

ancient secret features:

>mtxrun --script patterns --hyphenate applicable --language=gb
hyphenator  |
hyphenator  | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator  |  2a0p0 2 0 0 0 0 0 0 0 0 0 0
hyphenator  |4p1p2   2 4 1 2 0 0 0 0 0 0 0
hyphenator  |  0p2l2 2 4 1 2 2 0 0 0 0 0 0
hyphenator  |  1a0b0 2 4 1 2 2 0 1 0 0 0 0
hyphenator  |2b0l2   2 4 1 2 2 0 1 2 0 2 0
hyphenator  |  4l0e0.0   2 4 1 2 2 0 1 2 4 2 0
hyphenator  | .2a4p1p2l2i0c1a2b4l2e0.   . a p-p l i c-a b l e .
hyphenator  |
mtx-patterns| gb 3 3 : applicable : applic-able

>mtxrun --script patterns --hyphenate applicable --language=us
hyphenator  |
hyphenator  | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator  |4p1p0   0 4 1 0 0 0 0 0 0 0 0
hyphenator  |  1p2l2 0 4 1 2 2 0 0 0 0 0 0
hyphenator  |  0p0l0i2c1a0b0 0 4 1 2 2 2 1 0 0 0 0
hyphenator  |1c0a0   0 4 1 2 2 2 1 0 0 0 0
hyphenator  |0c0a1b0l0   0 4 1 2 2 2 1 1 0 0 0
hyphenator  |0b2l2   0 4 1 2 2 2 1 1 2 2 0
hyphenator  |0b4l0e0.0   0 4 1 2 2 2 1 1 4 2 0
hyphenator  | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .
hyphenator  |
mtx-patterns| us 3 3 : applicable : applic-a-ble

not the kind of stuff one wants to expose a new user to
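
The last line of each trace shows the language followed by two numbers
(us 2 2 in the run with --left=2 --right=2, us 3 3 and gb 3 3 here), and the
reported result changes with them: the same US break levels give
ap-plic-a-ble with 2 2 but applic-a-ble with 3 3.  A small Python sketch of
that filtering step follows, assuming the two numbers are the minimum number
of letters before and after a break.  The function name is made up for
illustration and this is not the mtxrun implementation.

```python
def apply_hyphenmin(hyphenated, left, right):
    """Drop breaks that leave fewer than `left` letters before them
    or fewer than `right` letters after them."""
    parts = hyphenated.split("-")
    word = "".join(parts)
    kept, last, pos = [], 0, 0
    for part in parts[:-1]:
        pos += len(part)  # position of the candidate break in the plain word
        if left <= pos <= len(word) - right:
            kept.append(word[last:pos])
            last = pos
    kept.append(word[last:])
    return "-".join(kept)


print(apply_hyphenmin("ap-plic-a-ble", 2, 2))  # ap-plic-a-ble
print(apply_hyphenmin("ap-plic-a-ble", 3, 3))  # applic-a-ble
print(apply_hyphenmin("ap-plic-able", 3, 3))   # applic-able
```

The third call uses the breaks marked in the gb trace (a p-p l i c-a b l e)
and gives the applic-able reported there.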

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Hans Hagen

On 10/8/2020 7:05 PM, Henning Hraban Ramm wrote:



Am 08.10.2020 um 17:41 schrieb Denis Maier :

where can I find the hyphenation patterns used by ConTeXt? I have two wrongly 
hyphenated words, and I want to check whether this is due to incorrect 
patterns. (I tried the source browser... not much luck so far.) The words are:
1. applicable => hyphenated as applic-able
2. obligated => hyphenated as oblig-ated

I know I can use \hyphenation to correct that, but I wanted to check the 
patterns nevertheless.


I guess it’s just a valid option.
You can check possible hyphenations like this:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext

americans and brits hyphenate differently

\starttext
{\language[usenglish]  {\tt US \number\normallanguage}: 
\hyphenatedcoloredword{applicable}}\par
{\language[ukenglish]  {\tt UK \number\normallanguage}: 
\hyphenatedcoloredword{applicable}}\par

\stoptext

syllable vs stem (but I bet Arthur can explain better)

hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Henning Hraban Ramm


> Am 09.10.2020 um 08:52 schrieb Denis Maier :
> 
> Am 08.10.2020 um 19:05 schrieb Henning Hraban Ramm:
>> \starttext
>> 
>> {EN: \en\hyphenatedcoloredword{applicable}}
>> 
>> {DE: \de\hyphenatedcoloredword{applicable}}
>> 
>> \stoptext
>> 
> Wow, that's super helpful.

BTW \hyphenatedword works the same. I didn’t see anything colored.
There are some more commands like this, even \hyphenatedfile, see
https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=hyphenated

Usually Arthur’s (hail the emperor of hyphenation and protector of the 
patterns) patterns are flawless, so I guess it’s not a bug but an exception of 
the rules.

Hraban


Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Denis Maier

Am 09.10.2020 um 08:57 schrieb Taco Hoekwater:



On 9 Oct 2020, at 08:52, Denis Maier  wrote:

Am 08.10.2020 um 19:05 schrieb Henning Hraban Ramm:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext


Wow, that's super helpful. The English pattern seems to be "ap-plic-a-ble"
According to Merriam-Webster it should just be "ap·pli·ca·ble".

{EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate"
According to Merriam-Webster it should be "ob·li·gate".

I've had a look at the files mentioned by Tomáš, but as these are not just 
wordlists I can not really tell what is happening.

So, is that a bug?

Not really. Hyphenation patterns are a bit like applying JPEG compression to
a dictionary. It makes the data size smaller by recognising patterns while
ignoring outliers.

Occasional errors are to be expected, which is why \hyphenation exists.



I see. I've noticed lang-us.lua has a list of exceptions in it:
 ["exceptions"]={
  ["characters"]="abcdefghijlmnoprstuyz",
  ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory 
phil-an-thropic present presents project projects reci-procity 
re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",

  ["length"]=168,
  ["n"]=14,
 },

Would it be possible to add more exceptions to that list as they come 
up? Or is that inappropriate?


Denis


Re: [NTG-context] Hyphenation patterns

2020-10-09 Thread Taco Hoekwater


> On 9 Oct 2020, at 08:52, Denis Maier  wrote:
> 
> Am 08.10.2020 um 19:05 schrieb Henning Hraban Ramm:
>> \starttext
>> 
>> {EN: \en\hyphenatedcoloredword{applicable}}
>> 
>> {DE: \de\hyphenatedcoloredword{applicable}}
>> 
>> \stoptext
>> 
> Wow, that's super helpful. The English pattern seems to be "ap-plic-a-ble"
> According to Merriam-Webster it should just be "ap·pli·ca·ble".
> 
> {EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate"
> According to Merriam-Webster it should be "ob·li·gate".
> 
> I've had a look at the files mentioned by Tomáš, but as these are not just 
> wordlists I can not really tell what is happening.
> 
> So, is that a bug? 

Not really. Hyphenation patterns are a bit like applying JPEG compression to
a dictionary. It makes the data size smaller by recognising patterns while
ignoring outliers. 

Occasional errors are to be expected, which is why \hyphenation exists.

Best wishes,
Taco



Re: [NTG-context] Hyphenation patterns

2020-10-08 Thread Henning Hraban Ramm

> Am 08.10.2020 um 17:41 schrieb Denis Maier :
> 
> where can I find the hyphenation patterns used by ConTeXt? I have two wrongly 
> hyphenated words, and I want to check whether this is due to incorrect 
> patterns. (I tried the source browser... not much luck so far.) The words are:
> 1. applicable => hyphenated as applic-able
> 2. obligated => hyphenated as oblig-ated
> 
> I know I can use \hyphenation to correct that, but I wanted to check the 
> patterns nevertheless.

I guess it’s just a valid option.
You can check possible hyphenations like this:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext


Hraban


Re: [NTG-context] Hyphenation patterns

2020-10-08 Thread Tomas Hala
Hi,

you can find patterns on this directory:

texlive/2020/texmf-dist/tex/context/patterns/mkiv/

Best wishes,

Tomáš 

Thu, Oct 08, 2020 ve 05:41:09PM +0200 Denis Maier napsal(a):
# Hi,
# 
# where can I find the hyphenation patterns used by ConTeXt? I have
# two wrongly hyphenated words, and I want to check whether this is
# due to incorrect patterns. (I tried the source browser... not much
# luck so far.) The words are:
# 1. applicable => hyphenated as applic-able
# 2. obligated => hyphenated as oblig-ated
# 
# I know I can use \hyphenation to correct that, but I wanted to check
# the patterns nevertheless.
# 
# Best,
# Denis
# 

 Tomáš Hála

Mendelova univerzita, Provozně ekonomická fakulta, ústav informatiky
Zemědělská 1, CZ-613 00 Brno,  tel. +420 545 13 22 28

http://akela.mendelu.cz/~thala


Re: [NTG-context] Hyphenation patterns and adjusted kerning: ConTeXt vs. LuaTeX

2011-02-25 Thread Hans Hagen

On 25-2-2011 1:18, Heilmann, Till A. wrote:

Maybe the ConTeXt community can be of assistance to the LuaTeX bunch ...

As a new LuaTeX user, I came across the following problem: Using Lua(La)TeX, 
customized kerning of letter pairs (via the FeatureFile capability of fontspec) 
is ignored when it coincides with a possible hyphenation of a word (e.g. 
between 'f' and 'h' in German words like 'aufhalten'; see first minimal example 
below).

Ulrike Fischer was so kind to point out two things 
(http://tug.org/pipermail/luatex/2011-February/002569.html): First, the problem 
seems to be the break points between the adjusted kerning pairs. Second, 
ConTeXt seems to handle this case correctly (see second minimal example below; 
feature file bonum.fea from first example required).

I am no expert in either LuaTeX or ConTeXt, but Ulrike suggested I post here 
and ask if the (typographically correct) ConTeXt behavior or solution can be 
reproduced with Lua(La)TeX.

Thanks,
- Till

1. Lua(La)TeX

\begin{filecontents*}{bonum.fea}
languagesystem DFLT dflt;
languagesystem latn dflt;
feature kern {
pos f h 100;
} kern;
\end{filecontents*}
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{fontspec}
\setmainfont[FeatureFile=bonum.fea]{TeX Gyre Bonum}
\begin{document}
fh aufhalten
\end{document}


I cannot test that (I only have the context minimals installed) but I 
don't know anything about latex internals so it would be a wild guess. 
Maybe babel is interfering?  In base mode kerning and hyphenation 
happen in the traditional tex way, so there is not much extra trickery 
taking place.



2. ConTeXt

\mainlanguage   [de]
\definefontfeature[test][featurefile=bonum,kern=yes]
\definefont[test][name:texgyrebonum*test]
\starttext
\test fh aufhalten
\stoptext


Indeed I see a kern.

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___


Re: [NTG-context] Hyphenation patterns and adjusted kerning: ConTeXt vs. LuaTeX

2011-02-25 Thread Ulrike Fischer
Am Fri, 25 Feb 2011 14:35:10 +0100 schrieb Hans Hagen:

 As a new LuaTeX user, I came across the following problem: Using
 Lua(La)TeX, customized kerning of letter pairs (via the
 FeatureFile capability of fontspec) is ignored when it coincides
 with a possible hyphenation of a word (e.g. between 'f' and 'h'
 in German words like 'aufhalten'; see first minimal example
 below).

 1. Lua(La)TeX

 I cannot test that (I only have the context minimals installed) but I 
 don't know anything about latex internals so it would be a wild guess. 
 Maybe babel is interfering? 

No, the problem exists also if you only load the german patterns.

 In base mode kerning and hyphenation 
 happen in the traditional tex way, so there is not much extra trickery 
 taking place.

Well, as you mention base mode: This reminded me that I had to
force base mode to get my reencoding to work in latex. So I tried in
context + latex/luaotfload (with german hyphenation patterns):

\font\test=name:TeX Gyre Bonum:mode=base:featurefile=bonum.fea;+kern

and

\font\test=name:TeX Gyre Bonum:mode=node:featurefile=bonum.fea;+kern

And bingo: with mode=base it works in both formats, with mode=node
the kern disappears. Without mode declaration the kern disappears in
latex. 

So I think the luaotfload manual is wrong when it claims "By
default mode=base is used".



-- 
Ulrike Fischer 



Re: [NTG-context] Hyphenation patterns and adjusted kerning: ConTeXt vs. LuaTeX

2011-02-25 Thread Ulrike Fischer
Am Fri, 25 Feb 2011 16:37:26 +0100 schrieb Heilmann, Till A.:

 Am Fri, 25 Feb 2011 14:41:10 +0100 schrieb Ulrike Fischer:
 
 In base mode kerning and hyphenation 
 happen in the traditional tex way, so there is not much extra trickery 
 taking place.
 
 Well, as you mention base mode: This reminded me that I had to
 force base mode to get my reencoding to work in latex. So I tried in
 context + latex/luaotfload (with german hyphenation patterns):
 
 [...]
 
 And bingo: with mode=base it works in both formats, with mode=node
 the kern disappears. Without mode declaration the kern disappears in
 latex.
 
 Ah, yes, the transcript of my first example clearly shows fontspec operating 
 in node mode.
 
 Please excuse my naive question: Is there any way to continue using
 fontspec's setmainfont command (it is convenient for someone
 inexperienced like me) and at the same time force luaotfload into
 using base mode?

The following seems to work:

\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{fontspec}
\setmainfont[RawFeature={mode=base},FeatureFile=bonum.fea]{TeX Gyre Bonum}
\begin{document}
fh aufhalten
\end{document}



-- 
Ulrike Fischer 



Re: [NTG-context] Hyphenation patterns and adjusted kerning: ConTeXt vs. LuaTeX

2011-02-25 Thread Ulrike Fischer
Am Fri, 25 Feb 2011 16:45:31 +0100 schrieb Ulrike Fischer:

 Ah, yes, the transcript of my first example clearly shows fontspec operating 
 in node mode.

Yes, but I could also reproduce the problem without fontspec (only
with luaotfload).
  
 Please excuse my naive question: Is there any way to continue using
 fontspec's setmainfont command (it is convenient for someone
 inexperienced like me) and at the same time force luaotfload into
 using base mode?
 
 The following seems to work:
 
 \documentclass{article}
 \usepackage[ngerman]{babel}
 \usepackage{fontspec}
 \setmainfont[RawFeature={mode=base},FeatureFile=bonum.fea]{TeX Gyre Bonum}
 \begin{document}
 fh aufhalten
 \end{document}

And after a look in the fontspec code:

\setmainfont[Renderer=Basic,FeatureFile=bonum.fea]{TeX Gyre Bonum}
-- 
Ulrike Fischer 



Re: [NTG-context] Hyphenation patterns and adjusted kerning: ConTeXt vs. LuaTeX

2011-02-25 Thread Khaled Hosny
On Fri, Feb 25, 2011 at 03:41:10PM +0100, Ulrike Fischer wrote:
 So I think the luaotfload manual is wrong when it claims "By
 default mode=base is used".

It used to be like that but we changed it a while ago, looks like I
didn't update the manual.

Regards,
 Khaled

-- 
 Khaled Hosny
 Egyptian
 Arab


Re: [NTG-context] Hyphenation patterns and adjusted kerning: ConTeXt vs. LuaTeX

2011-02-25 Thread Khaled Hosny
On Fri, Feb 25, 2011 at 04:45:31PM +0100, Ulrike Fischer wrote:
  Ah, yes, the transcript of my first example clearly shows fontspec 
  operating in node mode.
  
  Please excuse my naive question: Is there any way to continue using
  fontspec's setmainfont command (it is convenient for someone
  inexperienced like me) and at the same time force luaotfload into
  using base mode?
 
 The following seems to work:
 
 \documentclass{article}
 \usepackage[ngerman]{babel}
 \usepackage{fontspec}
 \setmainfont[RawFeature={mode=base},FeatureFile=bonum.fea]{TeX Gyre Bonum}

Better Renderer=Basic.

-- 
 Khaled Hosny
 Egyptian
 Arab


Re: [NTG-context] hyphenation patterns

2010-05-24 Thread Hans Hagen

On 24-5-2010 2:16, Mojca Miklavec wrote:


There's no need to apologize. First, there's an infinite number of
foreign names, so that one simply cannot get all of them right. I
guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na


why not just use hyphenmin values of 3 to prevent such cases

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] hyphenation patterns

2010-05-24 Thread luigi scarso
On Sun, May 23, 2010 at 11:38 PM, Mojca Miklavec
mojca.miklavec.li...@gmail.com wrote:
 hyphenate properly in Italian. Italian is a
 what-you-see-is-what-you-pronounce language (in contrast to English)
Apart from some traps like

glicine vs tagliare
where the syllable 'gli' is pronounced in a completely different way

or
anno (year) vs hanno (have, as in 'they have')
where the sound is the same

or àncora (anchor) vs ancóra (again)
and we usually write ancora vs ancora (yes, no difference: only the
sound is different)


or péro (pear tree) vs però (but)

and so on.


-- 
luigi


Re: [NTG-context] hyphenation patterns

2010-05-24 Thread rogutes
Mojca Miklavec (2010-05-24 02:16):
 Dear Claudio,
 
 Thanks a lot for your prompt reply.
 
 On Mon, May 24, 2010 at 00:39, Claudio Beccari wrote:
  Dear Mojca,
  no proper Italian word ends in ch (this digraph in normal Italian words is
  pronounced as k, not as č or ć).
  Nevertheless there are a number of surnames dating back to the old times
  (150 years ago) when North East Italy was under Austro-Hungarian rule,
  when Istrian names, mainly Croatian and Slovenian, were transliterated in
  such a way that the typical patronymic ending  -ič or -ić (I don't know the
  exact spelling in Latin letters of the Croatian/Slovenian names) was
  transliterated for the Empire bureaucracy with -ich.
 
 Thanks a lot for some more insight. I admit that I didn't know the
 details (I should be ashamed) and in my area they were more radical
 with surname changes (mine was Michelazzi and I think that most
 surnames here were properly Romanized, for example Filipčič -
 Filippi, so again no problems with hyphenation :) :) :).
 
  This spelling remained
  when North East Italy and Istria were annexed to the Kingdom of Italy at the
  end of WW1. After WW2 most of Istria returned mainly to Croatia and a small
  part to Slovenia, but the Slovenians and Croatians that had moved to NE
  Italy and had become Italian citizens maintained their surnames with the
  Austro-Hungarian spelling.
 
  When I prepared the hyphen patterns for Italian and Latin I did think of
  this particular spelling, but I concluded that it was not so important; I
  was wrong, and I apologize.
 
 There's no need to apologize. First, there's an infinite number of
 foreign names, so that one simply cannot get all of them right. I
 guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na
 is ok), but in my opinion it's a valid argument that one should change
 the language when writing foreign names if they are to be hyphenated
 properly. I can also easily imagine Slovenian patterns that would
 hyphenate:
 Fis-cher, Aac-hen, Go-ethe
 when not knowing that those letters represent a single letter/sound
 in foreign words.
 
 Second, I have no idea, but I think it was a pure coincidence that the
 problem reported by Rogutės Sparnuotos is the same as that for
 surnames of a group of people in the North-East (I think that the name in
 question comes from Russia, with the transliteration done via English). On
 the other hand if it's just a tiny pattern that solves them all ...

Thank you Mojca and Claudio for your replies.

Mojca has guessed correctly: I merely noticed that the surname Manovich is
hyphenated wrongly in the three languages I've tested. And I don't mind
using \hyphenation{} or switching language for foreign names.

I don't know how hyphenation patterns are made, so I was surprised to see
the main rule of at least Latin/Italian/Lithuanian hyphenation broken (a
syllable must contain a vowel). From your explanations it seems that
hyphenation patterns are kind of case-by-case rules, so this problem is
not surprising, since no common words end with '-ch' in these languages.
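
The rule mentioned above (every syllable must contain a vowel) can be
checked mechanically on a hyphenated word. The following is only a minimal
Python sketch: the function name is made up for this illustration, and the
vowel set is an assumption that covers plain Latin letters only (accented
Lithuanian vowels such as ą or ė are deliberately left out).

```python
# Assumption: plain Latin vowels only; extend this set for a real language.
VOWELS = set("aeiouy")


def fragments_without_vowel(hyphenated):
    """Return the fragments of a hyphenated word that contain no vowel."""
    return [
        part
        for part in hyphenated.split("-")
        if not any(ch in VOWELS for ch in part.lower())
    ]


print(fragments_without_vowel("Ma-no-vi-ch"))  # ['ch']
print(fragments_without_vowel("Ma-no-vich"))   # []
```

A non-empty result flags a break that violates the rule, which is exactly
what happens with the trailing "ch" of Ma-no-vi-ch.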

Wonder if I'll find a maintainer of the Lithuanian patterns...

-- 
--  Rogutės Sparnuotos
___


Re: [NTG-context] hyphenation patterns

2010-05-23 Thread Mojca Miklavec
On Mon, May 24, 2010 at 01:22, Rogutės Sparnuotos wrote:

 \setuplayout[textwidth=0.2cm]
 \starttext
 \language[la] Manovich.
 \stoptext

 hyphenates 'Manovich' into Ma-no-vi-ch, while it should be Ma-no-vich. The
 same applies for Italian and Lithuanian languages (in LaTeX as well).

 Could there be such an omission in the hyphenation patterns? Or am I
 missing something?

Both Italian and Latin have the pattern 1c, meaning "break in front
of any letter c unless another pattern prohibits it". The Lithuanian
patterns contain i1c, which means "break between i and c".
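
To make the mechanics concrete, here is a minimal Python sketch of how
Liang-style patterns are applied. It is an illustration only, not the code
that TeX or ConTeXt actually runs. The names parse_pattern and hyphenate are
made up for this example. Of the toy patterns, only 1c and i1c come from the
explanation above; 1n and 1v are stand-ins assumed here merely to reproduce
the breaks reported in this thread, and the minimum of two letters on each
side is an assumption as well. The real pattern files contain many more
patterns, so other words will behave differently.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'i1c' into its letters 'ic' and weights [0, 1, 0]."""
    letters = []
    weights = [0]  # weights[j] is the digit standing before letter j
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert '-' wherever the highest weight for a gap is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[k] is the gap before padded[k]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is None:
                continue
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # word[i] sits at padded[i + 1], so its gap is scores[i + 1]
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Toy pattern sets (assumptions, see the text above)
LATIN_TOY = ["1c", "1n", "1v"]
LITHUANIAN_TOY = ["i1c", "1n", "1v"]

print(hyphenate("Manovich", LATIN_TOY))       # Ma-no-vi-ch
print(hyphenate("Manovich", LITHUANIAN_TOY))  # Ma-no-vi-ch
print(hyphenate("Manovič", LATIN_TOY))        # Ma-no-vič
```

The digits sit in the gaps between letters. Where several patterns cover
the same gap, the highest digit wins, and a break is allowed only where the
winner is odd. Since č is a different letter from c, the 1c pattern simply
does not fire on Manovič in this toy setup.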

Nothing in ConTeXt can or will be fixed, but here's a short answer
with four options of what you can do:
1. Use \hyphenation{Ma-no-vich} at the top of your document
2. Use Manovič instead of Manovich (it then hyphenates properly in
Latin at least, I didn't try the others); or Манович :)
3. Use \mainlanguage[la] bla bla bla {\language[en] Manovich}
4. Complain to the authors of Italian/Latin/Lithuanian patterns and
ask them for a fix.

Some explanation:
I assume that this is not a native Latin, Italian or Lithuanian word.
If you are talking about the artist's name (Lev Manovich), then you are
using the English transliteration of a Russian word and expecting it to
hyphenate properly in Italian. Italian is a
what-you-see-is-what-you-pronounce language (in contrast to English)
and you cannot expect it to hyphenate properly all the foreign
names that are not even transliterated properly. An Italian word
would most probably never end with ch, so there's currently no
pattern present that would prohibit that behaviour. I don't know
Russian well enough, but I would blindly guess that the right
transliteration would be Manovič anyway (of course everyone would have
a problem with getting the right accent and with proper pronunciation
then), and the German Wikipedia more or less confirms that:
Lev Manowitsch (russ. Лев Манович, wiss. Transliteration Lev Manovič;
* 1960 in Moskau)
Note that Germans transliterate the name differently and Italians
could transliterate it in a different way as well. Since Lithuanian
contains the letter č, I would assume that they would transliterate
the name with č anyway (disclaimer: my knowledge about Lithuanian is
zero, so I'm not even sure how they pronounce that letter). For
a particular example: Serbian will never have a problem with the
hyphenation of foreign names:
http://sr.wikipedia.org/sr-el/Алберт_Ајнштајн
Albert Ajnštajn (nem. Albert Einstein) je bio teorijski fizičar ...

The question is always: how many different foreign names do you want
to hyphenate properly in any given language?

On the other hand, even with Italian pronunciation, I guess that ch is
considered to be a single consonant (I may be wrong in that, but
it's not too relevant either), so adding an additional pattern 2ch.
(or 4ch., not sure which one is needed) cannot hurt.
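
Whether 2ch. or 4ch. is needed depends on the highest odd weight that the
other patterns put into the gap in front of the final ch. The minimal Python
sketch below prints the winning weight per gap for "Manovich". The function
name gap_weights is made up for this illustration, and the only pattern
taken from the real files is 1c as described earlier in this message. If the
real Italian or Latin files contain a pattern that puts a 3 into that gap,
2ch. would not be enough; this sketch cannot tell.

```python
def gap_weights(word, patterns):
    """Return the winning weight for each gap between two letters of `word`."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[k] is the gap before padded[k]
    for pattern in patterns:
        letters = "".join(ch for ch in pattern if not ch.isdigit())
        weights = [0]  # weights[j] is the digit standing before letter j
        for ch in pattern:
            if ch.isdigit():
                weights[-1] = int(ch)
            else:
                weights.append(0)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    # Keep only the gaps that lie between two letters of the word
    return scores[2:len(word) + 1]


base = ["1c"]  # the pattern named in this message
for extra in ([], ["2ch."], ["4ch."]):
    weights = gap_weights("Manovich", base + extra)
    # Index 5 is the gap between i and c: M|a|n|o|v|i|c|h
    verdict = "break" if weights[5] % 2 == 1 else "no break"
    print(extra, weights, "->", verdict, "between i and c")
```

Against a lone 1c, both candidates win the gap with an even number and
suppress the break, so in this toy setup 2ch. is already sufficient.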

Mojca
___


Re: [NTG-context] hyphenation patterns

2010-05-23 Thread Mojca Miklavec
Dear Claudio,

Thanks a lot for your prompt reply.

On Mon, May 24, 2010 at 00:39, Claudio Beccari wrote:
 Dear Mojca,
 no proper Italian word ends in ch (this digraph in normal Italian words is
 pronounced as k, not as č or ć).
 Nevertheless there are a number of surnames dating back to the old times
 (150 years ago) when North East Italy was under Austro-Hungarian rule,
 when Istrian names, mainly Croatian and Slovenian, were transliterated in
 such a way that the typical patronymic ending -ič or -ić (I don't know the
 exact spelling in Latin letters of the Croatian/Slovenian names) was
 rendered for the Empire bureaucracy as -ich.

Thanks a lot for some more insight. I admit that I didn't know the
details (I should be ashamed) and in my area they were more radical
with surname changes (mine was Michelazzi, and I think that most
surnames here were properly Romanized, for example Filipčič -
Filippi), so again no problems with hyphenation :) :) :).

 This spelling remained
 when North East Italy and Istria were annexed to the Kingdom of Italy at the
 end of WW1. After WW2 most of Istria returned mainly to Croatia and a small
 part to Slovenia, but the Slovenians and Croatians who had moved to NE
 Italy and had become Italian citizens kept their surnames with the
 Austro-Hungarian spelling.

 When I prepared the hyphen patterns for Italian and Latin I did think of
 this particular spelling, but I concluded that it was not so important; I
 was wrong, and I apologize.

There's no need to apologize. First, there's an infinite number of
foreign names, so that one simply cannot get all of them right. I
guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na
is ok), but in my opinion it's a valid argument that one should change
the language when writing foreign names if they are to be hyphenated
properly. I can also easily imagine Slovenian patterns that would
hyphenate:
Fis-cher, Aac-hen, Go-ethe
when not knowing that those letters represent a single letter/sound
in foreign words.

Second, I have no idea, but I think it was a pure coincidence that the
problem reported by Rogutės Sparnuotos is the same as that for
surnames of a group of people in the North-East (I think that the name in
question comes from Russia, with the transliteration done via English). On
the other hand if it's just a tiny pattern that solves them all ...

 I will submit, at least for Italian, a revised
 pattern file. I doubt I should do it also for Latin, although it does not
 cost anything...

In case you do submit any updates, I would be extremely grateful if you
submitted the update to
   http://www.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/hyph-it.tex
instead of (or at least in addition to) the original file (you may
remove the initial comments).

Also, if you happen to have the original of
   http://www.tug.org/TUGboat/Articles/tb13-1/tb34becc.pdf
it would be nice to include it in the repository as documentation about
Italian hyphenation (but that's all too off-topic for the ConTeXt
mailing list).

Thanks again,
Mojca
___