http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16870

           Summary: Hyphenation bug including bugfix: sporadic mutilation of hyphenated word
           Product: Fop
           Version: 0.20.4
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: general
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

Explanation of bug:
-------------------
Under some circumstances (see below) hyphenated words are mutilated. For example, the German word "Altersvorsorge" was SOMETIMES (but not very often) hyphenated as "rsvor-Altesorge".

Reason:
-------
Xerces uses the characters() calls to hand FOP a character buffer that is a 'view window' onto the current document. One word (such as "Altersvorsorge") can therefore be split across two calls of characters(); in the given example, into "Alte" and "rsvorsorge".

FOP adds the first part of the word to the "pending areas". This happens in org\apache\fop\layout\LineArea.java, in the method addText(). Xerces delivers the rest of the word in its second characters() call, which results in a second call to addText(). In this second call (if hyphenation is set to true) the method doHyphenation() (also in class LineArea) is called, and it completely ignores the pending areas! So the word fragment "rsvorsorge" is handed to the hyphenation engine, which does a correct job on this fragment: the Hyphenator determines that "rsvor-" is to be added to the current line area. The next call to addText() checks whether there are any pending areas ("Alte" in our example), prints them on the next line, and continues with the rest of the current buffer ("sorge [...]" in the example).
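To illustrate the fragmentation described above, here is a minimal sketch in Java. It is not FOP code: WordJoiningHandler and ChunkedCharactersDemo are hypothetical names, and the two characters() calls are made by hand to imitate a parser that splits "Altersvorsorge" into "Alte" and "rsvorsorge". It only shows that a consumer of characters() has to keep a word fragment around until the rest of the word arrives.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: joins words that arrive split over several characters() calls. */
class WordJoiningHandler extends DefaultHandler {
    // Fragment of a word that has started but is not known to be complete yet.
    private final StringBuilder pending = new StringBuilder();
    private final List<String> words = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (Character.isWhitespace(ch[i])) {
                flush();               // whitespace ends the current word
            } else {
                pending.append(ch[i]); // may continue a fragment from an earlier call
            }
        }
        // No flush here: the buffer may end in the middle of a word.
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // the end of the element also ends the current word
    }

    private void flush() {
        if (pending.length() > 0) {
            words.add(pending.toString());
            pending.setLength(0);
        }
    }

    List<String> words() {
        return words;
    }
}

class ChunkedCharactersDemo {
    public static void main(String[] args) {
        WordJoiningHandler handler = new WordJoiningHandler();
        // Imitate a parser whose buffer boundary falls inside "Altersvorsorge".
        char[] first = "die Alte".toCharArray();
        char[] second = "rsvorsorge ist".toCharArray();
        handler.characters(first, 0, first.length);
        handler.characters(second, 0, second.length);
        handler.endElement("", "block", "block");
        System.out.println(handler.words()); // prints [die, Altersvorsorge, ist]
    }
}
```

Where the buffer boundary falls with a real parser depends on the parser and on the document, which is why the split cannot be predicted from the word alone.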
So the reason this bug occurs only in very few situations is that it depends on:

1) how often, and with which buffer size, the XML parser calls the characters() method, so I think it definitely depends on the version of the XML parser used
2) what the XML document looks like; an additional character or newline somewhere BEFORE the mutilated word can change the calls to the characters() method.

MY CHANGES
----------
I changed the internals of the method doHyphenation(). It now takes into account any pending areas which may contain word fragments.

New approach in doHyphenation():

1) Scan the pending areas vector for pending text fragments, and remove them from the pending areas vector
2) Concatenate the result from 1) with the current word to be hyphenated in the current char buffer
3) Call the Hyphenator
4) Use addWord to add the pre-hyphen word fragment to the current line area
5) Decision: is the final hyphenation point somewhere in the pending area or in the current char buffer?
5a) The hyphenation point is somewhere in the pending area:
    --> add the remaining characters of the pending text fragments to the pending areas vector (they will be printed on a new line (by addText()) together with the rest of the word, which is in the current buffer). For this task I used the existing addSpacedWord() method with the pending parameter set to true.
5b) The hyphenation point is somewhere in the current char buffer:
    --> just return the new position in the current char buffer

I also changed the signature of doHyphenation(): a TextState parameter was added, because the addSpacedWord() method (used in 5a) needs the current textState.

The call to doHyphenation() in LineArea.addText() is modified: the remaining-width parameter is no longer reduced by pendingWidth, because doHyphenation() now looks at the pending areas itself:

    ret = this.doHyphenation(dataCopy, i, wordStart,
                             this.getContentWidth()
                             - (finalWidth + spaceWidth /*+ pendingWidth*/),
                             textState);

I don't think it makes sense to include our XSL-FO documents to reproduce the error, because we use custom fonts, which will likely lead to a different layout on your system, so the error will probably not occur.

Chris Wewerka
[EMAIL PROTECTED]
Munich, Germany
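For readers who want to follow steps 1) to 5b) of the new approach in doHyphenation() more concretely, here is a simplified sketch in Java. It is not the actual patch: HyphenationSketch and Result are hypothetical names, pending areas are modelled as plain strings, the remaining width is counted in characters rather than measured with font metrics, and the hyphenation points for "Altersvorsorge" (Al-ters-vor-sor-ge) are written by hand instead of coming from the Hyphenator.

```java
import java.util.ArrayList;
import java.util.List;

class HyphenationSketch {
    /** Hypothetical result type: what goes on the current line and where to continue. */
    static final class Result {
        final String lineText; // pre-hyphen fragment including "-", or "" if nothing fits
        final int newPos;      // position in the current char buffer to continue from
        Result(String lineText, int newPos) {
            this.lineText = lineText;
            this.newPos = newPos;
        }
    }

    /**
     * pendingFragments: text fragments waiting in the pending areas (modified in place).
     * data[wordStart..wordEnd): the part of the word found in the current char buffer.
     * hyphPoints: hyphenation offsets into the whole word (stand-in for the Hyphenator).
     * remainingChars: room left on the line, in characters (an assumption of this sketch).
     */
    static Result doHyphenation(List<String> pendingFragments, char[] data,
                                int wordStart, int wordEnd,
                                int[] hyphPoints, int remainingChars) {
        // 1) Collect the pending text fragments and remove them from the pending list.
        StringBuilder prefix = new StringBuilder();
        for (String fragment : pendingFragments) {
            prefix.append(fragment);
        }
        pendingFragments.clear();

        // 2) Concatenate them with the word fragment in the current buffer.
        String word = prefix + new String(data, wordStart, wordEnd - wordStart);

        // 3) Pick the last hyphenation point that still fits, counting the hyphen.
        int cut = -1;
        for (int point : hyphPoints) {
            if (point + 1 <= remainingChars) {
                cut = Math.max(cut, point);
            }
        }
        if (cut < 0) {
            // Nothing fits: put the pending text back and consume nothing.
            if (prefix.length() > 0) {
                pendingFragments.add(prefix.toString());
            }
            return new Result("", wordStart);
        }

        // 4) The pre-hyphen fragment goes onto the current line.
        String lineText = word.substring(0, cut) + "-";

        // 5) Does the hyphenation point lie in the pending text or in the buffer?
        if (cut < prefix.length()) {
            // 5a) In the pending text: the rest of it becomes pending again.
            pendingFragments.add(prefix.substring(cut));
            return new Result(lineText, wordStart);
        }
        // 5b) In the current buffer: just return the new buffer position.
        return new Result(lineText, wordStart + (cut - prefix.length()));
    }

    public static void main(String[] args) {
        List<String> pending = new ArrayList<>(List.of("Alte"));
        char[] data = "rsvorsorge ist".toCharArray();
        int[] points = {2, 6, 9, 12}; // Al-ters-vor-sor-ge, hand-written for illustration
        Result r = doHyphenation(pending, data, 0, 10, points, 8);
        System.out.println(r.lineText + " " + r.newPos + " " + pending); // Alters- 2 []
    }
}
```

With eight characters of room the cut falls in the current buffer (case 5b), so "Alters-" goes on the line and processing continues at "vorsorge". With only four characters of room the cut falls inside the pending text (case 5a): "Al-" goes on the line and "te" is pending again, to be printed on the next line ahead of "rsvorsorge".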