Re: [Bug 27773] - [PATCH] Hyphenation

2004-04-16 Thread Luca Furini

> I think it would be better to report this item in a patch of
> its own. It really is a new issue.

Ok, sorry.
I'm going to do as you suggest.

Luca




DO NOT REPLY [Bug 28431] New: - Hyphenation of words with punctuation marks

2004-04-16 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=28431.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks

   Summary: Hyphenation of words with punctuation marks
   Product: Fop
   Version: 1.0dev
  Platform: PC
OS/Version: Windows XP
Status: NEW
  Severity: Normal
  Priority: Other
 Component: page-master/layout
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


I have found a small bug concerning hyphenation in the 
HyphenationTree.hyphenate() method.
Before checking the exception list or using the algorithm, the 
function normalizes the word. During this phase, if a non-letter character 
is found, null is returned:
    // normalize word
    char[] c = new char[2];
    for (i = 1; i <= len; i++) {
        c[0] = w[offset + i - 1];
        int nc = classmap.find(c, 0);
        if (nc < 0) {    // found a non-letter character, abort
            return null;
        }
        word[i] = (char)nc;
    }
I think the condition (nc < 0) is too strong: at the moment, words followed by 
punctuation marks, or in parentheses, are not hyphenated.
So, for example, the word "suggestion" can be hyphenated, but "suggestion." 
and "(suggestion)," cannot.

This is how I tried to fix this problem:
- non-letter characters at the beginning are not copied into word[]
- if a non-letter character is found which is not at the beginning, it is not 
copied into word[] and a boolean variable becomes true
- if a letter-character is found when the variable is true, null is returned; 
otherwise, word[] is used to find hyphenation points
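
The three rules above can be sketched in isolation as follows. This is only a
minimal illustration, not the attached patch: the class and method names are
hypothetical, and Character.isLetter() / Character.toLowerCase() stand in for
the classmap lookup, which in the real code decides what counts as a letter.

```java
// Hypothetical stand-alone sketch; not the actual FOP patch.
final class NormalizeSketch {

    /**
     * Returns the lower-cased letters of w with leading and trailing
     * non-letters dropped, or null if a letter follows a non-letter that
     * itself came after at least one letter (e.g. "sug.gestion").
     */
    static String normalize(String w) {
        StringBuilder word = new StringBuilder();
        boolean endOfLetters = false; // a non-letter was seen after some letters
        for (int i = 0; i < w.length(); i++) {
            char ch = w.charAt(i);
            // Assumption: isLetter() replaces the test classmap.find(c, 0) >= 0
            if (Character.isLetter(ch)) {
                if (endOfLetters) {
                    return null; // letters resume after a non-letter: give up
                }
                word.append(Character.toLowerCase(ch));
            } else if (word.length() > 0) {
                endOfLetters = true; // non-letter after letters: skip and remember
            }
            // otherwise: non-letter at the beginning, simply not copied
        }
        return word.toString();
    }
}
```

With this sketch, "suggestion", "suggestion." and "(suggestion)," all
normalize to "suggestion", while "sug.gestion" yields null.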

I have also added a small optimization: if, after the normalization and the 
removal of non-letter characters, the word is shorter than (remainCharCount + 
pushCharCount), null is returned without checking the exception list or 
running the algorithm.
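
The early exit can be pictured roughly like this. It is a sketch under the
assumption that remainCharCount and pushCharCount are the minimum numbers of
characters that must stay before and after a break; the class and method
names are hypothetical.

```java
// Hypothetical helper illustrating the early-exit test; not the actual patch.
final class MinLengthSketch {

    /**
     * True when a word of letterCount letters cannot contain any legal
     * break point, because every split would leave fewer than
     * remainCharCount letters before it or fewer than pushCharCount after it.
     */
    static boolean tooShortToHyphenate(int letterCount,
                                       int remainCharCount,
                                       int pushCharCount) {
        return letterCount < remainCharCount + pushCharCount;
    }
}
```

For example, with remainCharCount = 2 and pushCharCount = 2, a three-letter
word is rejected immediately, while a four-letter word still goes through the
exception list and the pattern algorithm.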

I'm going to attach the proposed patch and a test fo file which shows a few 
examples.

Regards

Luca


DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks

2004-04-16 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks





--- Additional Comments From [EMAIL PROTECTED]  2004-04-16 13:29 ---
Created an attachment (id=11258)
proposed patch to HyphenationTree


DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks

2004-04-16 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks





--- Additional Comments From [EMAIL PROTECTED]  2004-04-16 13:30 ---
Created an attachment (id=11259)
test fo file: words with punctuation marks and parenthesis


DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks

2004-04-16 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks





--- Additional Comments From [EMAIL PROTECTED]  2004-04-16 19:44 ---
Luca,

The patch works well.

I do not find the name bAfterLetter very clear. It really is
bNonLetterAfterLetters, but that is too long. I find bEndOfLetters a
reasonable choice.

The 'else if (!bAfterLetter)' might as well be just 'else'.

The venom is in the tail. I do not know the details of this part of
hyphenation. Your addition of 'iIgnoreAtBeginning' seems OK. I think
you should also add 'iIgnoreAtBeginning' in the if branch (hyphenation
exceptions), but the results of a test fo are not quite in
favour. Perhaps you can have a look into this.

I added a long comment explaining various features, perhaps most to
myself.

I added cases to the test fo showing a word that is too short (when
one adds debug logging, one sees the effect), and 4 cases with a
hyphenation exception word.

Regards, Simon


DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks

2004-04-16 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks





--- Additional Comments From [EMAIL PROTECTED]  2004-04-16 19:45 ---
Created an attachment (id=11264)
An expanded test fo file


DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks

2004-04-16 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks





--- Additional Comments From [EMAIL PROTECTED]  2004-04-16 19:46 ---
Created an attachment (id=11265)
A slightly modified patch


DO NOT REPLY [Bug 27199] - [PATCH] FOP breadcrumb problem

2004-04-16 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=27199

[PATCH] FOP breadcrumb problem

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED



--- Additional Comments From [EMAIL PROTECTED]  2004-04-17 01:21 ---
I think we've done what we can do on this issue. 

We just need to keep reverting the breadcrumbs.js file in
xml-site/targets/fop/skin every time we publish, until the forrestbot
version is updated:

http://issues.cocoondev.org/jira//secure/ViewIssue.jspa?key=FOR-129


DO NOT REPLY [Bug 28431] - Hyphenation of words with punctuation marks

2004-04-16 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=28431

Hyphenation of words with punctuation marks





--- Additional Comments From [EMAIL PROTECTED]  2004-04-17 05:17 ---
Your assumptions appear correct: I checked the Washington Post newspaper and saw
that hyphenation does indeed occur in words that are followed by a period or
comma.

Glen