Re: [Bug 27773] - [PATCH] Hyphenation
> I think it would be better to report this item in a patch of its own. It
> really is a new issue.

Ok, sorry. I'm going to do as you suggest.

Luca
DO NOT REPLY [Bug 28431] New: - Hyphenation of words with punctuation marks
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=28431. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks

           Summary: Hyphenation of words with punctuation marks
           Product: Fop
           Version: 1.0dev
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: page-master/layout
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

I have found a small bug concerning hyphenation in the HyphenationTree.hyphenate() method.

Before checking the exception list or using the algorithm, the function normalizes the word: during this phase, if a non-letter character is found, null is returned.

    // normalize word
    char[] c = new char[2];
    for (i = 1; i <= len; i++) {
        c[0] = w[offset + i - 1];
        int nc = classmap.find(c, 0);
        if (nc < 0) {    // found a non-letter character, abort
            return null;
        }
        word[i] = (char) nc;
    }

I think the condition (nc < 0) is too strong: at the moment, words followed by punctuation marks, or in parentheses, are not hyphenated. So, for example, the word "suggestion" can be hyphenated, but "suggestion." and "(suggestion)" cannot.

This is how I tried to fix this problem:
- non-letter characters at the beginning are not copied into word[]
- if a non-letter character is found which is not at the beginning, it is not copied into word[] and a boolean variable becomes true
- if a letter character is found when the variable is true, null is returned; otherwise, word[] is used to find hyphenation points

I have also added a little optimization: if, after the normalization and the removal of non-letter characters, the word length is less than (remainCharCount + pushCharCount), null is returned without checking the exception list or running the algorithm.
I'm going to attach the proposed patch and a test fo file which shows a few examples.

Regards,
Luca
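The three rules and the length check described in the report can be sketched roughly as follows. This is a stand-alone illustration, not the attached patch: the class name NormalizeSketch, the method names, and the use of Character.isLetter() in place of classmap.find() are all assumptions made so that the example compiles on its own.

```java
/** Hypothetical stand-alone sketch; not the actual FOP HyphenationTree code. */
final class NormalizeSketch {

    /**
     * Stand-in for classmap.find(): returns the lower-cased letter, or -1
     * for a non-letter. ASSUMPTION: the real class map comes from the
     * hyphenation pattern file and may classify characters differently.
     */
    static int classOf(char ch) {
        return Character.isLetter(ch) ? (int) Character.toLowerCase(ch) : -1;
    }

    /**
     * Returns the normalized letters of the word, or null if the word
     * should not be hyphenated at all.
     */
    static String normalize(char[] w, int offset, int len,
                            int remainCharCount, int pushCharCount) {
        StringBuilder word = new StringBuilder(len);
        boolean endOfLetters = false; // a non-letter was seen after a letter
        for (int i = 0; i < len; i++) {
            int nc = classOf(w[offset + i]);
            if (nc < 0) {
                // Leading non-letters are skipped; any later non-letter
                // marks the end of the run of letters.
                if (word.length() > 0) {
                    endOfLetters = true;
                }
            } else if (endOfLetters) {
                return null; // a letter after punctuation: give up
            } else {
                word.append((char) nc);
            }
        }
        // Too short to leave remainCharCount letters before the hyphen
        // and pushCharCount letters after it.
        if (word.length() < remainCharCount + pushCharCount) {
            return null;
        }
        return word.toString();
    }
}
```

With remainCharCount and pushCharCount both set to 2, "suggestion." and "(suggestion)," both normalize to "suggestion", while "sug.gestion" and "(a)" yield null. The sketch leaves out one thing a real fix would need: hyphenation points found in the stripped word have to be mapped back to positions in the original word, shifted by the number of skipped leading characters.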
DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks
http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

--- Additional Comments From [EMAIL PROTECTED] 2004-04-16 13:29 ---
Created an attachment (id=11258)
proposed patch to HyphenationTree
DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks
http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

--- Additional Comments From [EMAIL PROTECTED] 2004-04-16 13:30 ---
Created an attachment (id=11259)
test fo file: words with punctuation marks and parenthesis
DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks
http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

--- Additional Comments From [EMAIL PROTECTED] 2004-04-16 19:44 ---
Luca,

The patch works well.

I do not find the name bAfterLetter very clear. It really is bNonLetterAfterLetters, but that is too long. I find bEndOfLetters a reasonable choice.

The 'else if (!bAfterLetter)' might as well be just 'else'.

The venom is in the tail. I do not know the details of this part of hyphenation. Your addition of 'iIgnoreAtBeginning' seems OK. I think you should also add 'iIgnoreAtBeginning' in the if branch (hyphenation exceptions), but the results of a test fo are not quite in favour. Perhaps you can have a look into this.

I added a long comment explaining various features, perhaps most to myself.

I added cases to the test fo showing a word that is too short (when one adds debug logging, one sees the effect), and 4 cases with a hyphenation exception word.

Regards,
Simon
DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks
http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

--- Additional Comments From [EMAIL PROTECTED] 2004-04-16 19:45 ---
Created an attachment (id=11264)
An expanded test fo file
DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks
http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

--- Additional Comments From [EMAIL PROTECTED] 2004-04-16 19:46 ---
Created an attachment (id=11265)
A slightly modified patch
DO NOT REPLY [Bug 27199] - [PATCH] FOP breadcrumb problem
http://issues.apache.org/bugzilla/show_bug.cgi?id=27199

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Additional Comments From [EMAIL PROTECTED] 2004-04-17 01:21 ---
I think we've done what we can do on this issue. We just need to keep reverting the breadcrumbs.js file in xml-site/targets/fop/skin every time we do a publish, until the forrestbot version is updated:

http://issues.cocoondev.org/jira//secure/ViewIssue.jspa?key=FOR-129
DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks
http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

--- Additional Comments From [EMAIL PROTECTED] 2004-04-17 05:17 ---
Your assumptions appear correct. I checked the Washington Post newspaper and saw that hyphenation does indeed occur in words that have a period or comma at the end of them.

Glen