On Thu, 27 Oct 2011 20:19:07 +0000 (UTC) Andreas Vox <avox at arcor.de> wrote:
> John Jason Jordan <johnxj at ...> writes:
>
> > ...
> >
> > I agree with Gregory that hyphenation is not perfect in many
> > programs. And Gregory has an excellent point about short syllable
> > breaks (e.g. "re-ceived") being sometimes ambiguous, leading the
> > reader to have to pause or re-read a line to connect the first and
> > last parts of the hyphenated word. I'm not sure how a layout
> > program can fix this, however.
> ...
> >
> > I don't know how Scribus does its hyphenation. And if it does use an
> > algorithm, switching to dictionary-based hyphenation just for
> > English may be impractical. Nevertheless, I wanted to point out that
> > hyphenation is more problematic than just deciding whether to base
> > it on the entire paragraph or one line at a time.
>
> Scribus uses the same algorithm as TeX and OO.o. But that congenial
> method is really also a dictionary approach: to create the
> hyphenation rules, a large corpus of text is fed into the generation
> program. The program then tries to condense that information into a
> ruleset, which contains rules like "if you see this 'xyz' pattern,
> assume good break pos at 1 and bad break pos at 2, unless it also
> matches 'pxyzq', in which case the best break position is 4,
> unless...." This results in a file with hundreds of short patterns
> which indicate good and bad break positions (prioritized 1-5 iirc).
> Then the whole corpus is tested with this algorithm and the remaining
> words which aren't hyphenated correctly (usually just a dozen or so)
> are put into an exception list.
>
> I don't know of any program that handles problems like "re-ceive"
> properly. With a paragraph layouter it should be possible to include
> extra penalties for such cases, so the layouter would automatically
> try to avoid those.
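To make the pattern idea above concrete, here is a rough Python sketch
of Liang-style pattern matching. It is not Scribus or TeX source code.
The nine patterns are only the ones needed for the single word
"hyphenation" (a real pattern file is far larger), the exception entry
is just an illustration, and the names parse_pattern, hyphenate,
left_min and right_min are my own. Digits in a pattern are priorities:
the highest number at each position wins, odd means a break is allowed
and even means it is forbidden.

```python
# Illustrative sketch of Liang-style pattern hyphenation.
# Assumption: only the patterns needed for one demo word are listed.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at",
            "1tio", "2io", "o2n"]

# Exception list, consulted before the patterns (entry is illustrative).
EXCEPTIONS = {"table": "ta-ble"}


def parse_pattern(pattern):
    """Split 'hy3ph' into its letters 'hyph' and levels [0, 0, 3, 0, 0]."""
    letters = []
    levels = [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)   # priority for the gap before the next letter
        else:
            letters.append(ch)
            levels.append(0)       # open a new gap after this letter
    return "".join(letters), levels


PARSED = dict(parse_pattern(p) for p in PATTERNS)


def hyphenate(word, left_min=2, right_min=3):
    """Return the word with '-' at every permitted break position."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]

    dotted = "." + word + "."            # dots mark the word boundaries
    points = [0] * (len(dotted) + 1)     # points[i] is the gap before dotted[i]

    # Try every substring; where a pattern matches, keep the highest level.
    for start in range(len(dotted)):
        for end in range(start + 1, len(dotted) + 1):
            levels = PARSED.get(dotted[start:end])
            if levels:
                for offset, level in enumerate(levels):
                    idx = start + offset
                    points[idx] = max(points[idx], level)

    # Odd level = break allowed, subject to the minimum fragment lengths.
    pieces, last = [], 0
    for pos in range(left_min, len(word) - right_min + 1):
        if points[pos + 1] % 2 == 1:
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("hyphenation"))   # hy-phen-ation
print(hyphenate("table"))         # ta-ble (taken from the exception list)
```

A paragraph layouter could attach an extra penalty to individual break
points found this way, which is the kind of hook needed to discourage
breaks like "re-ceive" without forbidding them outright.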
> /Andreas

TeX works first from a hyphenation dictionary; if the word is not
found there, it falls back on an algorithm. The algorithm is based on
the work of Frank M. Liang. By default the first fragment of a
hyphenated word must be at least 2 characters long (\lefthyphenmin=2)
and the last fragment at least 3 characters (\righthyphenmin=3).
There is a parameter for discouraging hyphens (\hyphenpenalty=50); as
you increase it, hyphens become less likely. A discretionary hyphen
can be added to a word by inserting \- where a hyphen might occur.
There is a settable parameter that discourages two consecutive lines
ending in hyphens (\doublehyphendemerits=10000). If you don't like a
very short word or word fragment ending a paragraph, there is some
TeX trickery to prevent that from happening. The accepted custom is
to make end-of-paragraph words or fragments at least as wide as the
indent of the following paragraph. TeX takes it from there. And so on.

All this complexity can be ignored by most users. The defaults are
sensible. My point is that when anyone says "TeX can't handle this
typesetting situation" they are probably wrong. You set the rules or
accept the defaults. TeX follows those rules. I have set entire books
with zero manual kerning.

Now how does this impact Scribus? I suggest some optional
behind-the-scenes magic. If a paragraph is selected for TeX
typesetting, the text is passed to LuaTeX or XeTeX (TeX variants)
along with the font name and size, the maximum measure (width of the
print line), etc. LuaTeX sets the paragraph and returns it to
Scribus, with hidden kerns for word spacing.

Is this easy? No.
Is it close to optimum? Yes.

-- 
John Culleton
Free list of books for self-publishers:
http://wexfordpress.net/shortlist.html
"Create Book Covers with Scribus"
http://www.booklocker.com/books/4055.html
