Re: Initial soft hyphen support

2007-01-16 Thread Andreas L Delmelle

On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:


<snip />



Not really sure what would be most efficient:
- a void method appending to a parameter StringBuffer
- a method returning a copy of the char[] from index to index...

Given that every String ultimately has a backing char[](*) anyway,
I'd say that we can safely return the copy, and remove the overhead of


StringBuffer.append(new String(char[])).toString().toCharArray()


Looked a bit deeper, and there is apparently a good reason to use a  
StringBuffer: the char[] from one FOText might need to be appended to  
that of a previous one (see TextLM.findHyphenationPoints()).


I guess it would be a bad idea to replace this with arrays, since  
they're not so straightforward to concatenate (requires copying into  
a new array).


Too bad we're still targeting 1.3, else we might consider switching  
to a java.nio.CharBuffer...
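
To make the trade-off above concrete, here is a minimal sketch, not FOP code. The class and method names are hypothetical, and it uses current Java syntax rather than 1.3. It shows that joining two arrays means allocating a new one, and how a java.nio.CharBuffer (available from Java 1.4) could collect the pieces instead.

```java
import java.nio.CharBuffer;

// Hypothetical helper, for illustration only; nothing here is taken from FOP.
final class WordCharsConcat {

    /** Joins two arrays: a new array must be allocated and both inputs copied. */
    static char[] concatArrays(char[] first, char[] second) {
        char[] joined = new char[first.length + second.length];
        System.arraycopy(first, 0, joined, 0, first.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    /** Collects several pieces into one CharBuffer of a known total capacity. */
    static CharBuffer collect(int capacity, char[][] parts) {
        CharBuffer buf = CharBuffer.allocate(capacity);
        for (char[] part : parts) {
            buf.put(part);          // relative bulk put, advances the position
        }
        buf.flip();                 // switch from writing to reading
        return buf;
    }

    /** Wraps a range of an existing array without copying it. */
    static CharBuffer viewOf(char[] text, int start, int end) {
        return CharBuffer.wrap(text, start, end - start);
    }
}
```

Note that a CharBuffer has a fixed capacity, so unlike a StringBuffer it does not grow on demand; the caller has to know the total length up front.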



Cheers,

Andreas



Re: Initial soft hyphen support

2007-01-16 Thread Vincent Hennebert
Andreas L Delmelle wrote:
 On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:
 
 <snip />
 
 Not really sure what would be most efficient:
 - a void method appending to a parameter StringBuffer
 - a method returning a copy of the char[] from index to index...

 Given that every String ultimately has a backing char[](*) anyway, I'd
 say that we can safely return the copy, and remove the overhead of

 StringBuffer.append(new String(char[])).toString().toCharArray()
 
 Looked a bit deeper, and there is apparently a good reason to use a
 StringBuffer: the char[] from one FOText might need to be appended to
 that of a previous one (see TextLM.findHyphenationPoints()).
 
 I guess it would be a bad idea to replace this with arrays, since
 they're not so straightforward to concatenate (requires copying into a
 new array).
 
 Too bad we're still targeting 1.3, else we might consider switching to a
 java.nio.CharBuffer...

Hadn't we agreed on raising the minimum Java version to 1.4? Or at
least on running a poll on fop-user to see if that would create any
problems? If it depended only on me, we would already be using all the
nice Java 1.5 features ;-)

Vincent


Re: Initial soft hyphen support

2007-01-16 Thread Clay Leeds

On Jan 16, 2007, at 12:25 PM, Vincent Hennebert wrote:

Hadn't we agreed on raising the minimum Java version to 1.4? Or at
least on running a poll on fop-user to see if that would create any
problems? If it depended only on me, we would already be using all the
nice Java 1.5 features ;-)

Vincent


As I recall, we're targeting JDK 1.4 for 0.93+ and leaving JDK 1.3
for 0.20.5 (since it's not changing anyway). I think the thought was
that anyone who needs to stay on JDK 1.3 (AIX 4.x and others locked
into IBM Java 1.3, etc.) can continue using fop-0.20.5.


Web Maestro Clay


Re: Initial soft hyphen support

2007-01-15 Thread Andreas L Delmelle

On Jan 14, 2007, at 23:11, J.Pietschmann wrote:


Andreas L Delmelle wrote:
The SHY character will be presented to the hyphenator simply as a  
character of the word it appears in. The hyphenator should then be  
smart enough to recognize this as a special character, and do  
something like: create a hyphenation point for the SHY, ...


Unfortunately, the hyphenator currently isn't nearly as smart,
and it's a major job to push it in this direction. E.g. it means
major API changes.


Unfortunate indeed :(

BTW: I took a very quick look, and does anyone know if there is a  
good reason why Hyphenation.word is a String? I mean, everything that  
comes from FOText and passes through TextLM is already char[]. The  
Hyphenation constructor takes a String parameter, so I guess  
somewhere --haven't looked yet-- a String is constructed from the  
portion of char[] that is to be hyphenated. If you then look at  
HyphenationTree, it says word.toCharArray()...



Cheers,

Andreas



Re: Initial soft hyphen support

2007-01-15 Thread J.Pietschmann

Andreas L Delmelle wrote:
BTW: I took a very quick look, and does anyone know if there is a good 
reason why Hyphenation.word is a String?


The hyphenator interface goes through several wrapping layers,
probably due to the usual "take working code and wrap it to fit
the caller" method.
This has always seemed overly complicated to me. I tried
to come up with a comprehensive API for hyphenation (which would
also be applicable to spelling and other similar tasks). Unfortunately,
there doesn't seem to be any usable standard, all APIs I've seen
are very specific or simply horrible. Any simplification is certainly
welcome.

J.Pietschmann


Re: Initial soft hyphen support

2007-01-15 Thread Andreas L Delmelle

On Jan 15, 2007, at 21:25, J.Pietschmann wrote:


Andreas L Delmelle wrote:
BTW: I took a very quick look, and does anyone know if there is a  
good reason why Hyphenation.word is a String?


The hyphenator interface goes through several wrapping layers,
probably due to the usual "take working code and wrap it to fit
the caller" method.


Looks that way...
Traced it down, and in TextLM.getWordChars() we get

  sbChars.append(new String(textArray, ai.iStartIndex,
  ai.iBreakIndex - ai.iStartIndex));


Not really sure what would be most efficient:
- a void method appending to a parameter StringBuffer
- a method returning a copy of the char[] from index to index...

Given that every String ultimately has a backing char[](*) anyway, I'd
say that we can safely return the copy, and remove the overhead of


StringBuffer.append(new String(char[])).toString().toCharArray()

Hmmm... Put it like that, and this would almost be one for the Daily  
WTF! 8-)
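
The two options listed above can be sketched as follows. This is an illustration under assumptions, not the actual TextLM code: the class and method names are hypothetical, and only the parameter names echo the snippet from getWordChars(). Both variants avoid the intermediate String by using StringBuffer.append(char[], int, int), System.arraycopy and StringBuffer.getChars.

```java
// Hypothetical helper, for illustration only; not the actual TextLM code.
final class WordChars {

    /** Option 1: append the range directly to a caller-supplied buffer. */
    static void appendWordChars(StringBuffer sbChars, char[] textArray,
                                int startIndex, int breakIndex) {
        // append(char[], offset, length) copies the range without a new String
        sbChars.append(textArray, startIndex, breakIndex - startIndex);
    }

    /** Option 2: return a copy of the range as a new char[]. */
    static char[] copyWordChars(char[] textArray, int startIndex, int breakIndex) {
        char[] copy = new char[breakIndex - startIndex];
        System.arraycopy(textArray, startIndex, copy, 0, copy.length);
        return copy;
    }

    /** Reads the buffer contents into a char[] without going through toString(). */
    static char[] toCharArray(StringBuffer sb) {
        char[] chars = new char[sb.length()];
        sb.getChars(0, sb.length(), chars, 0);
        return chars;
    }
}
```

Option 1 keeps the ability to accumulate characters from several text nodes in one buffer; option 2 is simpler when a word never spans more than one node.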


(*) Which, BTW, answers the question about the number of char[]
instances being twice that of the text-nodes in the document in the
snapshot posted by Richard earlier on in the thread about memory
issues. Sure, there are some 39K text-nodes in the document, but there
are most likely at least as many non-internalized property values
(cf. the number of String instances)...



This has always seemed overly complicated to me. I tried
to come up with a comprehensive API for hyphenation (which would
also be applicable to spelling and other similar tasks). Unfortunately,
there doesn't seem to be any usable standard, all APIs I've seen
are very specific or simply horrible. Any simplification is certainly
welcome.


A quick-and-dirty hack to make the Hyphenator return a Hyphenation as  
I described earlier on --hyph-point for the SHY and the rest as two  
separate hyphenated words-- doesn't seem too hard to pull off, but it  
would be an exception for the SHY only. For a more comprehensive  
approach, I currently don't know enough about hyphenation basics, I'm  
afraid...



Cheers,

Andreas


Re: Initial soft hyphen support

2007-01-14 Thread J.Pietschmann

Andreas L Delmelle wrote:
The SHY character will be presented to the 
hyphenator simply as a character of the word it appears in. The 
hyphenator should then be smart enough to recognize this as a special 
character, and do something like: create a hyphenation point for the 
SHY, ...


Unfortunately, the hyphenator currently isn't nearly as smart,
and it's a major job to push it in this direction. E.g. it means
major API changes.

J.Pietschmann


Initial soft hyphen support

2007-01-13 Thread Manuel Mall
Just committed the initial support for the soft hyphen.

As we had two in favour of having the SHY always produce a break
opportunity and only one against, that's the route I took.

I had no luck with giving the SHY a reduced penalty so that the Knuth
algorithm favours it over normal hyphenation breaks. Even with a
penalty value of 1, FOP still chooses the hyphenation break with a
penalty of 50. Either I am doing something wrong or I misunderstand how
the Knuth breaking calculation is supposed to work. Maybe one of the
Knuth experts can have a look at this, PLEASE.
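
For reference, a small sketch of how the Knuth-Plass paper scores a single break with a non-negative penalty: demerits = (1 + badness + penalty)^2, with badness = 100 * |r|^3 for adjustment ratio r. The class name and the sample ratios are made up for illustration. Whether this is what actually happens in FOP for the case above has not been checked.

```java
// Illustration of the formula from the Knuth-Plass paper; not FOP code.
final class KnuthDemerits {

    /** Badness of a line with adjustment ratio r: 100 * |r|^3. */
    static double badness(double ratio) {
        double a = Math.abs(ratio);
        return 100.0 * a * a * a;
    }

    /** Demerits of a break with a non-negative penalty: (1 + badness + penalty)^2. */
    static double demerits(double ratio, double penalty) {
        double f = 1.0 + badness(ratio) + penalty;
        return f * f;
    }
}
```

With these made-up numbers, a break with penalty 1 on a line stretched to r = 0.9 scores roughly 5610 demerits, while a break with penalty 50 on a line at r = 0.2 scores roughly 2683. So a low penalty alone does not guarantee that a break is preferred. The algorithm also minimises the total over the whole paragraph, not per line.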

Also not correctly working (yet) is the ipd calculation when kerning and
a SHY break are involved. But maybe that's a more general issue.

For those looking closer at the commit: the area handling within the text
layout manager has changed a bit. Before this patch, the assumption was
that the sequence of characters given to the LM would be output in full
to the area tree. Now we have, for the first time, the case that
characters (the SHY) can be dropped. This led to changes to certain
indexing loops.

Manuel


Re: Initial soft hyphen support

2007-01-13 Thread Andreas L Delmelle

On Jan 13, 2007, at 10:31, Manuel Mall wrote:

Hi Manuel,


Just committed the initial support for the soft hyphen.


Nice job, thanks!


As we had two in favour of having the SHY always produce a break
opportunity and only one against, that's the route I took.

I had no luck with giving the SHY a reduced penalty so that the Knuth
algorithm favours it over normal hyphenation breaks. Even with a
penalty value of 1, FOP still chooses the hyphenation break with a
penalty of 50. Either I am doing something wrong or I misunderstand how
the Knuth breaking calculation is supposed to work. Maybe one of the
Knuth experts can have a look at this, PLEASE.


Well, I'm still not really an expert, but as I'm beginning to  
understand more and more, what you altered was the base Knuth element  
generation, right?


IIUC, a possible solution may be to treat SHY as special *only* if  
hyphenation is turned off.
The reasoning being that, if hyphenate is true, then handling the SHY  
becomes the hyphenator's job. The SHY character will be presented to  
the hyphenator simply as a character of the word it appears in. The  
hyphenator should then be smart enough to recognize this as a special  
character, and do something like: create a hyphenation point for the  
SHY, and try to hyphenate the parts before and after the SHY as  
separate words...
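
The idea above can be sketched roughly as follows. This is only an illustration of the splitting step, under the assumption that the hyphenator sees the word as a String. The class and method names are hypothetical and do not correspond to the existing Hyphenator API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of SHY pre-processing; not the existing Hyphenator API.
final class SoftHyphenSplit {

    /** The soft hyphen character, U+00AD. */
    static final char SHY = '\u00AD';

    /** Positions in the SHY-free word where a SHY stood; each is a hyphenation point. */
    static List<Integer> shyBreakPoints(String word) {
        List<Integer> points = new ArrayList<Integer>();
        int removed = 0;
        for (int i = 0; i < word.length(); i++) {
            if (word.charAt(i) == SHY) {
                points.add(i - removed);   // index once earlier SHYs are dropped
                removed++;
            }
        }
        return points;
    }

    /** The parts before and after each SHY, to be hyphenated as separate words. */
    static List<String> parts(String word) {
        List<String> parts = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i < word.length(); i++) {
            if (word.charAt(i) == SHY) {
                parts.add(word.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(word.substring(start));
        return parts;
    }
}
```

Each part would then go through the normal pattern-based hyphenation, with the resulting points shifted by the length of the parts before it and merged with the SHY positions.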



HTH!

Andreas