Re: Initial soft hyphen support
On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:

<snip />
> Not really sure what would be most efficient:
> - a void method appending to a parameter StringBuffer
> - a method returning a copy of the char[] from index to index...
>
> Seen that every String ultimately has a backing char[](*) anyway, I'd say that we can safely return the copy, and remove the overhead of StringBuffer.append(new String(char[])).toString().toCharArray()

Looked a bit deeper, and there is apparently a good reason to use a StringBuffer: the char[] from one FOText might need to be appended to that of a previous one (see TextLM.findHyphenationPoints()). I guess it would be a bad idea to replace this with arrays, since they're not so straightforward to concatenate (requires copying into a new array).

Too bad we're still targeting 1.3, else we might consider switching to a java.nio.CharBuffer...

Cheers,

Andreas
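To make the concatenation trade-off concrete, here is a minimal sketch, not FOP code. The class and method names (`ConcatSketch`, `concat`, `viaBuffer`) are made up for illustration; only `System.arraycopy`, `StringBuffer.append(char[])` and `StringBuffer.getChars` are real JDK calls. Joining raw arrays allocates and copies a new array per join, while a StringBuffer grows in place and is copied out once at the end.

```java
/** Illustrative only: two ways to join the text of consecutive nodes. */
final class ConcatSketch {

    /** Joining raw arrays: every join allocates a new array and copies both inputs. */
    static char[] concat(char[] first, char[] second) {
        char[] joined = new char[first.length + second.length];
        System.arraycopy(first, 0, joined, 0, first.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    /** Accumulating in a StringBuffer: append each part, copy out once at the end. */
    static char[] viaBuffer(char[][] parts) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            sb.append(parts[i]);
        }
        char[] out = new char[sb.length()];
        sb.getChars(0, sb.length(), out, 0);
        return out;
    }

    public static void main(String[] args) {
        char[] a = "hyphen".toCharArray();
        char[] b = "ation".toCharArray();
        System.out.println(new String(concat(a, b)));                    // hyphenation
        System.out.println(new String(viaBuffer(new char[][] { a, b }))); // hyphenation
    }
}
```

With only two parts the difference is negligible; the buffer pays off when a word spans several text nodes and would otherwise be re-copied at each join.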
Re: Initial soft hyphen support
Andreas L Delmelle wrote:
> On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:
>
> <snip />
>> Not really sure what would be most efficient:
>> - a void method appending to a parameter StringBuffer
>> - a method returning a copy of the char[] from index to index...
>>
>> Seen that every String ultimately has a backing char[](*) anyway, I'd say that we can safely return the copy, and remove the overhead of StringBuffer.append(new String(char[])).toString().toCharArray()
>
> Looked a bit deeper, and there is apparently a good reason to use a StringBuffer: the char[] from one FOText might need to be appended to that of a previous one (see TextLM.findHyphenationPoints()). I guess it would be a bad idea to replace this with arrays, since they're not so straightforward to concatenate (requires copying into a new array).
>
> Too bad we're still targeting 1.3, else we might consider switching to a java.nio.CharBuffer...

Hadn't we agreed upon raising the minimum Java version to 1.4? Or at least to run a poll on fop-user to see if that would create any problem.

If it depended only on me, we would already be using all the nice Java 1.5 features ;-)

Vincent
Re: Initial soft hyphen support
On Jan 16, 2007, at 12:25 PM, Vincent Hennebert wrote:

> Hadn't we agreed upon raising the minimum Java version to 1.4? Or at least to run a poll on fop-user to see if that would create any problem.
>
> If it depended only on me, we would already be using all the nice Java 1.5 features ;-)
>
> Vincent

As I recall, we're targeting JDK 1.4 for 0.93+, and leaving JDK 1.3 for 0.20.5 (since it's not changing anyway). I think the thought was that anyone who needs to stay on JDK 1.3 (AIX 4.x, others locked into IBM Java 1.3, etc.) can continue using fop-0.20.5.

Web Maestro Clay
Re: Initial soft hyphen support
On Jan 14, 2007, at 23:11, J.Pietschmann wrote:

> Andreas L Delmelle wrote:
>> The SHY character will be presented to the hyphenator simply as a character of the word it appears in. The hyphenator should then be smart enough to recognize this as a special character, and do something like: create a hyphenation point for the SHY, ...
>
> Unfortunately, the hyphenator currently isn't nearly as smart, and it's a major job to push it in this direction. E.g. it means major API changes.

Unfortunate indeed :(

BTW: I took a very quick look, and does anyone know if there is a good reason why Hyphenation.word is a String? I mean, everything that comes from FOText and passes through TextLM is already char[]. The Hyphenation constructor takes a String parameter, so I guess somewhere --haven't looked yet-- a String is constructed from the portion of char[] that is to be hyphenated. If you then look at HyphenationTree, it says word.toCharArray()...

Cheers,

Andreas
Re: Initial soft hyphen support
Andreas L Delmelle wrote:
> BTW: I took a very quick look, and does anyone know if there is a good reason why Hyphenation.word is a String?

The hyphenator interface goes through several wrapping layers, probably due to the usual "take working code and wrap it to fit the caller" method. This always seemed overly complicated to me. I tried to come up with a comprehensive API for hyphenation (which would also be applicable to spelling and other similar tasks). Unfortunately, there doesn't seem to be any usable standard; all the APIs I've seen are very specific or simply horrible. Any simplification is certainly welcome.

J.Pietschmann
Re: Initial soft hyphen support
On Jan 15, 2007, at 21:25, J.Pietschmann wrote:

> Andreas L Delmelle wrote:
>> BTW: I took a very quick look, and does anyone know if there is a good reason why Hyphenation.word is a String?
>
> The hyphenator interface goes through several wrapping layers, probably due to the usual "take working code and wrap it to fit the caller" method.

Looks that way... Traced it down, and in TextLM.getWordChars() we get

    sbChars.append(new String(textArray, ai.iStartIndex, ai.iBreakIndex - ai.iStartIndex));

Not really sure what would be most efficient:
- a void method appending to a parameter StringBuffer
- a method returning a copy of the char[] from index to index...

Seen that every String ultimately has a backing char[](*) anyway, I'd say that we can safely return the copy, and remove the overhead of StringBuffer.append(new String(char[])).toString().toCharArray()

Hmmm... Put it like that, and this would almost be one for the Daily WTF! 8-)

(*) which BTW, answers the question about the char[] instances being twice as many as the text-nodes in the document in the snapshot posted by Richard earlier on in the thread about memory issues. Sure, there are some 39K text-nodes in the document, but there are most likely at least as many non-interned property values (cf. the number of String instances)...

> This always seemed overly complicated to me. I tried to come up with a comprehensive API for hyphenation (which would also be applicable to spelling and other similar tasks). Unfortunately, there doesn't seem to be any usable standard; all the APIs I've seen are very specific or simply horrible. Any simplification is certainly welcome.

A quick-and-dirty hack to make the Hyphenator return a Hyphenation as I described earlier on --hyph-point for the SHY and the rest as two separate hyphenated words-- doesn't seem too hard to pull off, but it would be an exception for the SHY only. For a more comprehensive approach, I currently don't know enough about hyphenation basics, I'm afraid...

Cheers,

Andreas
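The two options weighed above could look roughly like this. It is a sketch only: `WordCharsSketch`, `appendWordChars` and `copyWordChars` are hypothetical names, not the actual TextLM methods. The JDK calls are real: `StringBuffer.append(char[], int, int)` copies a range straight into the buffer, so the intermediate `new String(...)` is not needed.

```java
/** Illustrative only: hypothetical helpers, not the real TextLM.getWordChars(). */
final class WordCharsSketch {

    /** Option 1: a void method appending the slice [start, end) to a caller-supplied buffer. */
    static void appendWordChars(StringBuffer sb, char[] text, int start, int end) {
        // append(char[], offset, len) copies the range directly; no temporary String
        sb.append(text, start, end - start);
    }

    /** Option 2: a method returning a fresh copy of the slice [start, end). */
    static char[] copyWordChars(char[] text, int start, int end) {
        char[] copy = new char[end - start];
        System.arraycopy(text, start, copy, 0, copy.length);
        return copy;
    }

    public static void main(String[] args) {
        char[] text = "soft hyphen".toCharArray();
        StringBuffer sb = new StringBuffer();
        appendWordChars(sb, text, 5, 11);
        System.out.println(sb);                                    // hyphen
        System.out.println(new String(copyWordChars(text, 0, 4))); // soft
    }
}
```

Option 1 suits callers that accumulate a word across several text nodes; option 2 is simpler when the word lives in a single array.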
Re: Initial soft hyphen support
Andreas L Delmelle wrote:
> The SHY character will be presented to the hyphenator simply as a character of the word it appears in. The hyphenator should then be smart enough to recognize this as a special character, and do something like: create a hyphenation point for the SHY, ...

Unfortunately, the hyphenator currently isn't nearly as smart, and it's a major job to push it in this direction. E.g. it means major API changes.

J.Pietschmann
Initial soft hyphen support
Just committed the initial support for the soft hyphen.

As we had two in favour of having the SHY always produce a break opportunity and only one against, that's the route I took. I had no luck with giving the SHY a reduced penalty and having the Knuth algorithm favour them before normal hyphenation breaks. Even with a penalty value of 1, FOP still chooses the hyphenation break with a penalty of 50. Either I am doing something wrong or I misunderstand how the Knuth breaking calculation is supposed to work. Maybe one of the Knuth experts can have a look at this PLEASE.

Also not correctly working (yet) is ipd calculation when kerning and a SHY break are involved. But maybe that's a more general issue.

For those looking closer at the commit: the area handling within the text layout manager has changed a bit. Before this patch, the assumption was made that the sequence of characters given to the LM will be fully output to the area tree. Now we have for the first time the case that characters (the SHY) can be dropped. This led to changes with respect to certain indexing loops.

Manuel
Re: Initial soft hyphen support
On Jan 13, 2007, at 10:31, Manuel Mall wrote:

Hi Manuel,

> Just committed the initial support for the soft hyphen.

Nice job, thanks!

> As we had two in favour of having the SHY always produce a break opportunity and only one against, that's the route I took. I had no luck with giving the SHY a reduced penalty and having the Knuth algorithm favour them before normal hyphenation breaks. Even with a penalty value of 1, FOP still chooses the hyphenation break with a penalty of 50. Either I am doing something wrong or I misunderstand how the Knuth breaking calculation is supposed to work. Maybe one of the Knuth experts can have a look at this PLEASE.

Well, I'm still not really an expert, but as I'm beginning to understand more and more, what you altered was the base Knuth element generation, right?

IIUC, a possible solution may be to treat SHY as special *only* if hyphenation is turned off. The reasoning being that, if hyphenate is true, then handling the SHY becomes the hyphenator's job. The SHY character will be presented to the hyphenator simply as a character of the word it appears in. The hyphenator should then be smart enough to recognize this as a special character, and do something like: create a hyphenation point for the SHY, and try to hyphenate the parts before and after the SHY as separate words...

HTH!

Andreas
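The idea in the last paragraph (a break at each SHY, with the parts around it hyphenated as separate words) might be sketched as below. Everything here is an assumption for illustration: `SoftHyphenSketch`, `WordHyphenator` and `breakPoints` are invented names and do not reflect FOP's Hyphenator or Hyphenation API. The sketch also uses generics and a lambda for brevity, which a 1.3/1.4 target would not allow.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: SHY-aware wrapper around some ordinary word hyphenator. */
final class SoftHyphenSketch {

    /** U+00AD SOFT HYPHEN. */
    static final char SHY = '\u00AD';

    /** Hypothetical stand-in for a pattern-based hyphenator: returns break offsets within a word. */
    interface WordHyphenator {
        int[] hyphenate(String word);
    }

    /**
     * Returns break positions as indices into the original word.
     * The index of a SHY marks a break at that SHY; any other index k means
     * "a break may occur before character k", as reported by the delegate.
     */
    static List<Integer> breakPoints(String word, WordHyphenator delegate) {
        List<Integer> points = new ArrayList<Integer>();
        int segStart = 0;
        for (int i = 0; i <= word.length(); i++) {
            boolean atEnd = (i == word.length());
            if (atEnd || word.charAt(i) == SHY) {
                if (i > segStart) {
                    // hyphenate the SHY-free part as a separate word, then shift its offsets
                    int[] inner = delegate.hyphenate(word.substring(segStart, i));
                    for (int k = 0; k < inner.length; k++) {
                        points.add(segStart + inner[k]);
                    }
                }
                if (!atEnd) {
                    points.add(i); // the SHY itself is a hyphenation point
                }
                segStart = i + 1;
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Toy delegate: one break in the middle of any word of four or more characters.
        WordHyphenator mid = w -> w.length() >= 4 ? new int[] { w.length() / 2 } : new int[0];
        System.out.println(breakPoints("hyphen" + SHY + "ation", mid)); // [3, 6, 9]
    }
}
```

Whether the SHY break should be weighted differently from the delegate's breaks is left open here; the sketch only shows where the break positions come from.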