Re: Why is / is valid line breaking char in FOP?
IMO, start it in XML Graphics Commons. We can always move it somewhere else if the need arises.

On 26.10.2005 22:30:48 J.Pietschmann wrote:
> I'd rather have the code in a reusable library outside of the FOP
> project (in particular the infrastructure dealing with Unicode files
> and the table generator). Unfortunately, none of the jakarta commons
> modules showed much enthusiasm for integrating it, and I don't think
> I have enough time to maintain a new module for this.

Jeremias Maerki
Re: Why is / is valid line breaking char in FOP?
Manuel Mall wrote:
> I like the idea of having a UNICODE conformant/compliant/based line
> breaking algorithm in FOP. Note this has nothing to do with the Knuth
> algorithm used in FOP. I am talking about using the UNICODE algorithm
> to determine line break opportunities.

That's exactly the purpose of both BreakIterator and my implementation.

> Shall we use your work in FOP

That was the basic idea.

> and if so how can we best integrate it?

Well, I'd rather have the code in a reusable library outside of the FOP project (in particular the infrastructure dealing with Unicode files and the table generator). Unfortunately, none of the jakarta commons modules showed much enthusiasm for integrating it, and I don't think I have enough time to maintain a new module for this.

> BTW, looking at http://www.unicode.org/reports/tr14/ with respect to
> the SOLIDUS, that is line breaking property SY, it is actually quite
> complex as it does not allow a break within a sequence of digits,
> e.g. 26/10/2005 and discourages breaking things like "w/o" or "A/S".

Oops! I should have read further.

J.Pietschmann
Re: Why is / is valid line breaking char in FOP?
On Wed, 26 Oct 2005 03:15 am, J.Pietschmann wrote:
> Manuel Mall wrote:
> > While investigating if we could use the standard
> > java.text.BreakIterator to determine line break points I noticed
> > that FOP uses in addition to space, zero width space, hyphen also
> > the forward slash as a valid line breaking character. The Java
> > BreakIterator does not recognize slash as a line breaking char (nor
> > FWIW does MS Word).
> >
> > What is the background to FOP allowing this? Is this consistent
> > with normal user expectations or is this specific to type setting
> > environments / Tex / Knuth?
>
> The BreakIterator class is supposed to implement the Unicode TR14
> standard annex
> http://www.unicode.org/reports/tr14/
> The slash U+002F aka SOLIDUS is assigned a line breaking property
> value SY (Symbols Allowing Breaks)
> http://www.unicode.org/Public/UNIDATA/LineBreak.txt
> which means "prevent a break before, and allow a break after". I
> suspect this is a recent change in Unicode, not implemented yet by
> your JDK release.
> BTW first breaking the text using whitespace, then applying the
> BreakIterator is unwise, because white space is significant for TR14
> line breaking. Unfortunately, combining whitespace normalization,
> line break detection and word parsing (for hyphenation) in a single
> pass is unwieldy if BreakIterator is used, that's why I tried to
> implement it differently some time ago
> http://people.apache.org/~pietsch/linebreak.tar.gz
>

Joerg, great stuff.

I like the idea of having a UNICODE conformant/compliant/based line breaking algorithm in FOP. Note this has nothing to do with the Knuth algorithm used in FOP. I am talking about using the UNICODE algorithm to determine line break opportunities. It is then up to the Knuth algorithm to convert the Knuth element lists generated from the line break opportunities into an optimal set of line breaks.

But how can we move forward? The current FOP code to determine line break opportunities looks a bit like a quick solution that works well for simple texts using only space, nbsp and zero width space, but not for anything that uses more sophisticated UNICODE break characters. You have some code which does a better job at it, but it's not in FOP. Shall we use your work in FOP, and if so how can we best integrate it?

BTW, looking at http://www.unicode.org/reports/tr14/ with respect to the SOLIDUS, that is line breaking property SY, it is actually quite complex as it does not allow a break within a sequence of digits, e.g. 26/10/2005, and discourages breaking things like "w/o" or "A/S".

> J.Pietschmann

Manuel
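The SOLIDUS behaviour discussed above (no break before the slash, a break allowed after it, but not inside a digit sequence such as 26/10/2005) can be illustrated with a small sketch. This is only a simplified approximation under stated assumptions: the class `SolidusBreaks` and the method `breaksAfterSolidus` are hypothetical names, it is neither FOP's code nor the linebreak.tar.gz implementation, and it ignores the rest of the TR14 pair table (including any discouragement of breaks in "w/o" or "A/S").

```java
import java.util.ArrayList;
import java.util.List;

public class SolidusBreaks {

    /**
     * Returns the offsets at which a line break after '/' is permitted
     * under a simplified reading of the SY rule: never break before a
     * slash, allow a break after it unless a digit (or another slash,
     * which would mean breaking before a slash) follows.
     * Hypothetical helper, not part of FOP or of TR14 itself.
     */
    static List<Integer> breaksAfterSolidus(String text) {
        List<Integer> offsets = new ArrayList<>();
        // Stop one short of the end: a trailing slash has nothing after it.
        for (int i = 0; i < text.length() - 1; i++) {
            if (text.charAt(i) != '/') {
                continue;
            }
            char next = text.charAt(i + 1);
            boolean digitFollows = Character.isDigit(next);
            boolean slashFollows = next == '/';
            if (!digitFollows && !slashFollows) {
                offsets.add(i + 1); // break opportunity right after the slash
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(breaksAfterSolidus("26/10/2005")); // prints []
        System.out.println(breaksAfterSolidus("a/b/c"));      // prints [2, 4]
        System.out.println(breaksAfterSolidus("w/o"));        // prints [2]
    }
}
```

Note that "w/o" still gets a break opportunity here, which shows why a character-by-character shortcut like this falls short of the full algorithm.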
Re: Why is / is valid line breaking char in FOP?
Manuel Mall wrote:
> While investigating if we could use the standard
> java.text.BreakIterator to determine line break points I noticed
> that FOP uses in addition to space, zero width space, hyphen also
> the forward slash as a valid line breaking character. The Java
> BreakIterator does not recognize slash as a line breaking char (nor
> FWIW does MS Word).
>
> What is the background to FOP allowing this? Is this consistent
> with normal user expectations or is this specific to type setting
> environments / Tex / Knuth?

The BreakIterator class is supposed to implement the Unicode TR14 standard annex
http://www.unicode.org/reports/tr14/
The slash U+002F aka SOLIDUS is assigned a line breaking property value SY (Symbols Allowing Breaks)
http://www.unicode.org/Public/UNIDATA/LineBreak.txt
which means "prevent a break before, and allow a break after". I suspect this is a recent change in Unicode, not implemented yet by your JDK release.

BTW first breaking the text using whitespace, then applying the BreakIterator is unwise, because white space is significant for TR14 line breaking. Unfortunately, combining whitespace normalization, line break detection and word parsing (for hyphenation) in a single pass is unwieldy if BreakIterator is used, that's why I tried to implement it differently some time ago
http://people.apache.org/~pietsch/linebreak.tar.gz

J.Pietschmann
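Whether a particular JDK treats the slash as a break opportunity can be checked directly. The following is a minimal sketch, assuming only the standard `java.text.BreakIterator` API; the class name `LineBreakProbe`, the helper `lineBreakOffsets` and the sample strings are made up for illustration. The reported offsets depend on the JDK release, so no expected output is given.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreakProbe {

    /**
     * Collects every boundary reported by the JDK's line-break iterator
     * for the whole string, including the start (0) and the end
     * (text.length()). Hypothetical helper for experimentation.
     */
    static List<Integer> lineBreakOffsets(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        // The iterator sees the complete text, whitespace included,
        // rather than pre-split words.
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String[] samples = { "see docs/user/guide", "26/10/2005", "w/o" };
        for (String s : samples) {
            // An offset directly after a '/' means this JDK allows a break there.
            System.out.println(s + " -> " + lineBreakOffsets(s));
        }
    }
}
```

Passing the unsplit text to `setText` follows the point made above that white space is significant for TR14 line breaking.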
Re: Why is / is valid line breaking char in FOP?
Manuel Mall wrote:
> While investigating if we could use the standard
> java.text.BreakIterator to determine line break points I noticed
> that FOP uses in addition to space, zero width space, hyphen also
> the forward slash as a valid line breaking character. The Java
> BreakIterator does not recognize slash as a line breaking char (nor
> FWIW does MS Word).
>
> What is the background to FOP allowing this? Is this consistent
> with normal user expectations or is this specific to type setting
> environments / Tex / Knuth?

I don't remember whether it was already there or whether I added the slash to the other allowed characters, but my thinking is that this is useful when the text of a block contains a long URL, which would otherwise have no feasible breaks (apart maybe from some "-", which could be misleading).

Regards
Luca
Why is / is valid line breaking char in FOP?
My apologies if that has been discussed before.

While investigating if we could use the standard java.text.BreakIterator to determine line break points I noticed that FOP uses in addition to space, zero width space, hyphen also the forward slash as a valid line breaking character. The Java BreakIterator does not recognize slash as a line breaking char (nor FWIW does MS Word).

What is the background to FOP allowing this? Is this consistent with normal user expectations or is this specific to type setting environments / Tex / Knuth?

Regards
Manuel