Re: White space handling Wiki page
On Oct 31, 2005, at 22:18, Andreas L Delmelle wrote:
> On Oct 27, 2005, at 06:29, Manuel Mall wrote:
>> Actually something like:
>> <fo:block background-color="yellow">word1<fo:character character="&#10;"/><fo:character character=" "/>word2<fo:character character=" "/>word3<fo:character character="&#10;"/></fo:block>
>> currently causes an exception!
>
> The problem can be solved by a slight modification to OneCharIterator:
> * add a constructor with a Character parameter (and member)
> * add a remove() implementation which makes the Character's parent remove it from its list of child nodes
> Tested locally (very quickly), and it seems to work nicely. If I get the chance to commit it in the next few days, I'll do so myself, but if you want to have a go, it's a pretty easy fix (it adds up to about 10-15 LOC incl. javadocs :-))

Oops, I was too quick. From an UnsupportedOperationException to a ConcurrentModificationException...

The trick seems to be to introduce a small boolean 'discard' switch on the Character object, flip it when OCIter.remove() is called, and have the Block/Inline later remove any of its characters marked as discardable, but (of course) only after the RecursiveCharIterator has finished, so that the childNodes list is not altered while it is being iterated over...

Other option: store a list of the discardable space fo:characters at Block or Inline level, instead of marking the Character itself as such...

A bit more than 15 LOC, but still quite doable.

Cheers,
Andreas
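The 'discard' switch Andreas describes amounts to a two-phase mark-then-sweep over the child list. A minimal sketch, assuming hypothetical stand-in classes (`FoNode`, `BlockSketch`) rather than FOP's real FONode/Character/Block, and a deliberately simplified rule for what counts as discardable (a space child directly after a linefeed child):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an FOText/fo:character child; not FOP's API.
class FoNode {
    final String text;
    boolean discard;               // the 'discard' switch: mark now, remove later
    FoNode(String text) { this.text = text; }
}

// Hypothetical stand-in for a Block (or Inline) holding its child nodes.
class BlockSketch {
    final List<FoNode> childNodes = new ArrayList<>();

    // Phase 1: iterate and only mark. Calling childNodes.remove(..) inside
    // this for-each loop can trigger a ConcurrentModificationException.
    // Simplified, assumed rule: a " " child directly after a "\n" child is discardable.
    void markDiscardable() {
        FoNode prev = null;
        for (FoNode child : childNodes) {
            if (prev != null && prev.text.equals("\n") && child.text.equals(" ")) {
                child.discard = true;
            }
            prev = child;
        }
    }

    // Phase 2: once iteration is over, drop every marked child in one pass.
    void removeDiscarded() {
        childNodes.removeIf(n -> n.discard);
    }
}
```

The "other option" from the message would replace the flag with a separate `List<FoNode>` filled during phase 1 and handed to `childNodes.removeAll(...)` in phase 2; either way, the list is only modified after iteration has finished.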
DO NOT REPLY [Bug 37318] - fop.bat: NoClassDefFoundError: org/apache/fop/cli/Main
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=37318. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=37318

--- Additional Comments From [EMAIL PROTECTED] 2005-11-01 15:33 ---
Did you build fop using ant before trying to run it?

--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
zero width space
Currently if one puts a zero-width-space (U+200B) into an XSL-FO file (or specifies linefeed-treatment=treat-as-zero-width-space) it is rendered as a missing character in PDF. Is that correct, i.e. does this character have to exist in the font used or should the formatter or renderer simply remove this character? It is the second approach that both AntennaHouse and RenderX appear to have chosen. Manuel
DO NOT REPLY [Bug 37318] - fop.bat: NoClassDefFoundError: org/apache/fop/cli/Main
--- Additional Comments From [EMAIL PROTECTED] 2005-11-01 17:29 ---
Build fop? Do I have to? There are already JAR files in the lib/ directory... but yes, I can't see a fop.jar file! Why isn't it offered as a jar file? I also can't find a source bundle zip file, so do I really need to download all source files? Then I would have to install Subversion, because there are so many source files that it takes hours to download each of them via the Subversion link. :-(
Re: zero width space
Manuel Mall wrote:
> Currently if one puts a zero-width-space (U+200B) into an XSL-FO file (or specifies linefeed-treatment=treat-as-zero-width-space) it is rendered as a missing character in PDF. Is that correct, i.e. does this character have to exist in the font used or should the formatter or renderer simply remove this character? It is the second approach that both AntennaHouse and RenderX appear to have chosen.

I recommend that no character is output for a ZWS. The whole purpose of placing a ZWS into the input XSL-FO is to give layout an extra break opportunity, without changing the appearance of the generated document.

Chris
Re: zero width space
On Nov 1, 2005, at 15:52, Manuel Mall wrote:
> Currently if one puts a zero-width-space (U+200B) into an XSL-FO file (or specifies linefeed-treatment=treat-as-zero-width-space) it is rendered as a missing character in PDF. Is that correct, i.e. does this character have to exist in the font used or should the formatter or renderer simply remove this character? It is the second approach that both AntennaHouse and RenderX appear to have chosen.

It's certainly not correct to render a missing-glyph character, but it would also be wrong to remove it too early. The character doesn't take part in white-space treatment/collapsing, since it's not XML whitespace. Somewhere in layout the decision has to be made not to allocate space for this character, but it could play a part in line building...

My two cents.

Cheers,
Andreas
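Andreas's point that U+200B is not XML whitespace can be made concrete: XML 1.0 defines whitespace (the S production) as only #x20, #x9, #xD and #xA, so a collapse step keyed on that set passes a ZWS through untouched. A minimal sketch; the class and method names are hypothetical, and the collapse rule is simplified (every XML whitespace character is folded into one space, roughly what linefeed-treatment="treat-as-space" plus white-space-collapse="true" would give).

```java
// Hypothetical helper; XML 1.0 whitespace (production S) is #x20 | #x9 | #xD | #xA.
class XmlWhitespace {
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Simplified collapse: each run of XML whitespace becomes one space.
    // U+200B is not XML whitespace, so it is copied through unchanged and
    // even splits a whitespace run into two.
    static String collapse(String s) {
        StringBuilder out = new StringBuilder();
        boolean prevWasSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isXmlWhitespace(c)) {
                if (!prevWasSpace) {
                    out.append(' ');
                }
                prevWasSpace = true;
            } else {
                out.append(c);
                prevWasSpace = false;
            }
        }
        return out.toString();
    }
}
```

Because the ZWS survives this stage, the decision not to allocate space for it has to be taken later, in layout.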
Re: White space handling Wiki page
On Nov 1, 2005, at 10:04, Manuel Mall wrote:
> I am sure it is doable, but is it worth it at this stage? Possibly, given a better understanding of the white-space handling issues, the whole current system needs revision? One problem with the current char iterator is that it iterates over inline boundaries, which causes white space to be collapsed across them; according to the WG's clarification, that is incorrect.
> IMO, to implement the refinement step of the white space handling (which currently happens in the flow.Block object) we need an iterator which goes through all characters but indicates fo boundaries (not including fo:characters) so we can do:
> a) linefeed treatment across all characters;
> b) white space collapse across each consecutive section of implicit/explicit fo:characters, i.e. delimited by the start/end of fo's;
> c1) white-space-treatment from the start of the fo:block to the first non white-space character;
> The iterator must also be able to either operate backwards or be reset to a particular position (last non white space character) so we can do:
> c2) white-space-treatment from the end of the fo:block backwards to the first non white-space character.
> It must also support character deletions and character substitutions. Does that make sense?

Very much. Precisely with that in mind, I've also been contemplating moving part of the whitespace handling to inline level. This would keep the nested inlines separated from the Block's own direct FOText descendants (and at the same time, in combination with the modification I already described, this would give us an opportunity to remove fo:characters from within the nested inlines, which would become quite a pain if this removal were deferred to block level).

So the RecursiveCharIterator should only create Iterators over regular FOText or fo:characters that are direct descendants of the Block/Inline.
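Iterating only over direct FOText/fo:character children, as proposed just above, can be sketched with a hypothetical node model (`Node` is not FOP's FONode): adjacent text children merge into one run, and every nested FObj closes the current run instead of being descended into.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node model: a node either carries text (FOText or
// fo:character) or is a nested FObj such as fo:inline with its own children.
class Node {
    final String text;                        // null for nested FObjs
    final List<Node> children = new ArrayList<>();

    Node(String text) { this.text = text; }

    Node add(Node child) {
        children.add(child);
        return this;
    }

    // Text runs made of *direct* text children only. Adjacent FOText and
    // fo:character children join one run; a nested FObj ends the run and is
    // skipped, on the assumption that it does its own whitespace handling.
    List<String> directRuns() {
        List<String> runs = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (Node c : children) {
            if (c.text != null) {
                cur.append(c.text);
            } else {
                runs.add(cur.toString());
                cur.setLength(0);
            }
        }
        runs.add(cur.toString());
        return runs;
    }
}
```

Keeping the runs separate means the space before a nested inline and the space after it are never collapsed into one at this level; the inline handles its own text.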
FOText of nested FObjs should be left alone, since its whitespace will already have been collapsed. IOW, it should stop being recursive?

Currently, whitespace handling is triggered as soon as a Block encounters a child node that is neither FOText nor something that generates inline areas. Basically this seems OK; the only difference I'd propose is that inlines do their own whitespace handling, so that *if* whitespace needs to be collapsed across fo boundaries (maybe there are cases?), the block level only needs to look at the first and last characters of an inline's text.

Cheers,
Andreas
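The refinement steps a) to c2) in Manuel's message can be sketched over a list of per-FO character sections, with the list boundaries standing for the fo boundaries his iterator would report. A minimal sketch under stated assumptions: only U+0020 and linefeeds are considered, linefeed-treatment is taken to be treat-as-space, and white-space-treatment at the block edges is assumed to simply drop spaces; `RefinementSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the white-space refinement steps a) to c2).
// Each input string is one section of text; list boundaries = fo boundaries.
class RefinementSketch {
    static List<String> refine(List<String> fos) {
        List<StringBuilder> segs = new ArrayList<>();
        for (String s : fos) {
            segs.add(new StringBuilder(s));
        }

        // a) linefeed treatment across all characters (assumed: treat-as-space)
        for (StringBuilder sb : segs) {
            for (int i = 0; i < sb.length(); i++) {
                if (sb.charAt(i) == '\n') {
                    sb.setCharAt(i, ' ');      // character substitution
                }
            }
        }

        // b) collapse runs of spaces, but never across a segment (fo) boundary
        for (StringBuilder sb : segs) {
            for (int i = sb.length() - 1; i > 0; i--) {
                if (sb.charAt(i) == ' ' && sb.charAt(i - 1) == ' ') {
                    sb.deleteCharAt(i);        // character deletion
                }
            }
        }

        // c1) forward from the block start up to the first non-space
        forward:
        for (StringBuilder sb : segs) {
            while (sb.length() > 0) {
                if (sb.charAt(0) != ' ') {
                    break forward;
                }
                sb.deleteCharAt(0);
            }
        }

        // c2) backward from the block end up to the last non-space
        backward:
        for (int k = segs.size() - 1; k >= 0; k--) {
            StringBuilder sb = segs.get(k);
            while (sb.length() > 0) {
                if (sb.charAt(sb.length() - 1) != ' ') {
                    break backward;
                }
                sb.deleteCharAt(sb.length() - 1);
            }
        }

        List<String> out = new ArrayList<>();
        for (StringBuilder sb : segs) {
            out.add(sb.toString());
        }
        return out;
    }
}
```

For the input sections "\n word1  ", " word2" and "  \n", the result is "word1 ", " word2" and an empty string: the two spaces meeting at the first boundary both survive, because step b) never looks across a section boundary, while steps c1) and c2) walk across sections as needed.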
Re: Leading/trailing space removal in LineLM
On Tue, Nov 01, 2005 at 11:40:42PM +0800, Manuel Mall wrote:
> This is probably a question for Luca or Simon. In LineLM we have this code:
>
>     // ignore KnuthGlue and KnuthPenalty objects
>     // at the beginning of the line
>     seqIterator = seq.listIterator(iStartElement);
>     tempElement = (KnuthElement) seqIterator.next();
>     while (!tempElement.isBox() && seqIterator.hasNext()) {
>         tempElement = (KnuthElement) seqIterator.next();
>         iStartElement++;
>     }
>
> What is the background to this? This seems to interfere with certain combinations of white-space-collapse=false and white-space-treatment=preserve/ignore-if-before-linefeed. I think there is similar code to remove trailing stuff, with similar interference.

Glue and penalty items are removed at the start of a line. This is part of the Knuth algorithm. It does not touch the matter of white-space-collapse. If there is whitespace that may not be removed/collapsed at the start of the line, it must be protected by a preceding zero-width box. I.o.w., the value of white-space-collapse needs to be taken into account in the getNextKnuthElements phase.

Simon
--
Simon Pepping
home page: http://www.leverkruid.nl
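Simon's point can be checked against the loop quoted in Manuel's message: it advances past glue and penalties until the first box, so whitespace glue survives at a line start only if a box precedes it. A minimal sketch; `KElem` and `LineStart` are hypothetical simplifications, not FOP's KnuthElement classes.

```java
import java.util.List;
import java.util.ListIterator;

// Hypothetical, stripped-down Knuth element (FOP's real classes carry
// widths, stretch/shrink, penalty values, positions, etc.).
class KElem {
    final char kind;    // 'b' = box, 'g' = glue, 'p' = penalty
    final int width;

    KElem(char kind, int width) {
        this.kind = kind;
        this.width = width;
    }

    boolean isBox() { return kind == 'b'; }
}

class LineStart {
    // Same logic as the quoted LineLM loop: skip glue and penalties until
    // the first box and return the index where the line content starts.
    static int skipLeading(List<KElem> seq, int iStartElement) {
        ListIterator<KElem> it = seq.listIterator(iStartElement);
        KElem e = it.next();
        while (!e.isBox() && it.hasNext()) {
            e = it.next();
            iStartElement++;
        }
        return iStartElement;
    }
}
```

With [glue, penalty, box] the line starts at index 2 and the leading glue is dropped; with [zero-width box, glue, box] it starts at index 0, so the preserved space (the glue) stays inside the line. Emitting that zero-width box when white-space-collapse=false is the job Simon assigns to getNextKnuthElements.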
Re: Unicode compliant Line Breaking
On Mon, Oct 31, 2005 at 03:25:12PM +0800, Manuel Mall wrote:
> In a previous post Joerg pointed to the Unicode Standard Annex #14 on Line Breaking (http://www.unicode.org/reports/tr14/) and his initial implementation: http://people.apache.org/~pietsch/linebreak.tar.gz. I have since had a closer look at both UAX#14 and Joerg's code. Because I liked what I saw, I went about adapting Joerg's code to Unicode 4.1 and added fairly extensive JUnit test cases to it, mainly because they really help to go through the various cases mentioned in the spec in a structured fashion.

Is our current hyphenation method a subset of Unicode's method?

> Assuming now that this will be agreed as well, the next step would be the more detailed design of the integration. But this is well beyond the scope of this e-mail, as there are some tricky issues involved and they probably need to be tackled in conjunction with the white space handling issues. Many of the problems are related to our LayoutManager structures, which create barriers when character sequences need to be processed across those boundaries, as is the case for both line breaking and white space handling. Add to that the design of the

I seem to recall that the hyphenation code collects words across LM boundaries.

It seems a useful goal to implement Unicode hyphenation. But since it is a major effort, it does not fit into the work towards a release. In any case it would have to be in a separate branch until it proves to work and to implement a substantial part of hyphenation. Then it does not immediately matter whether it is a separate project or a part of FOP.

Simon
--
Simon Pepping
home page: http://www.leverkruid.nl
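For experimenting with line-break opportunities before any FOP integration, the JDK already ships `java.text.BreakIterator`. Its line instance is not Joerg's code and is not documented as conforming to UAX #14 (let alone Unicode 4.1), so this is only a point of comparison; `LineBreaks` is a hypothetical helper.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LineBreaks {
    // Collects every boundary offset the JDK line BreakIterator reports,
    // including the start (0) and the end of the text.
    static List<Integer> opportunities(String text) {
        BreakIterator bi = BreakIterator.getLineInstance(Locale.ENGLISH);
        bi.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            out.add(b);
        }
        return out;
    }
}
```

For "word1 word2 word3" this yields 0, 6, 12 and 17: breaks fall after each space, with the space staying on the preceding line. A UAX #14 implementation such as Joerg's would be compared against the same kind of offset list in JUnit tests.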
Re: Unicode compliant Line Breaking
Simon Pepping wrote:
> Is our current hyphenation method a subset of Unicode's method?

Umm. What's the relation between hyphenation and TR14 (except for handling soft hyphens)? I guess you are confusing finding line breaks in general with line breaking due to hyphenation.

> I seem to recall that the hyphenation code collects words across LM boundaries.

As it should. Word boundaries and FO boundaries are different things:

  <block>A w<wrapper text-decoration="underline">o</wrapper>rd</block>

J.Pietschmann
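Joerg's example splits the single word "word" over three FO fragments ("A w", "o", "rd"). The sketch below shows the general idea of finding words in the joined text and mapping them back to the fragments they touch. It uses the JDK's `BreakIterator` word instance rather than FOP's hyphenation code, and `WordsAcrossFos` is a hypothetical helper.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordsAcrossFos {
    // For each word in the joined text, list the indices of the fragments
    // (i.e. FOs) that contribute characters to it.
    static List<List<Integer>> fragmentsPerWord(List<String> fragments) {
        StringBuilder joined = new StringBuilder();
        List<Integer> owner = new ArrayList<>();   // owner.get(i) = fragment of char i
        for (int f = 0; f < fragments.size(); f++) {
            String s = fragments.get(f);
            joined.append(s);
            for (int i = 0; i < s.length(); i++) {
                owner.add(f);
            }
        }
        String text = joined.toString();
        BreakIterator wb = BreakIterator.getWordInstance(Locale.ENGLISH);
        wb.setText(text);

        List<List<Integer>> result = new ArrayList<>();
        int start = wb.first();
        for (int end = wb.next(); end != BreakIterator.DONE; start = end, end = wb.next()) {
            if (!Character.isLetterOrDigit(text.charAt(start))) {
                continue;                          // skip spaces and punctuation
            }
            List<Integer> frags = new ArrayList<>();
            for (int i = start; i < end; i++) {
                int f = owner.get(i);
                if (frags.isEmpty() || frags.get(frags.size() - 1) != f) {
                    frags.add(f);
                }
            }
            result.add(frags);
        }
        return result;
    }
}
```

For the fragments "A w", "o" and "rd" this finds two words: "A" from fragment 0, and "word" spanning fragments 0, 1 and 2. A hyphenator working per fragment would only ever see "w", "o" and "rd".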
Re: zero width space
On Wed, 2 Nov 2005 01:03 am, Chris Bowditch wrote:
> Manuel Mall wrote:
>> Currently if one puts a zero-width-space (U+200B) into an XSL-FO file (or specifies linefeed-treatment=treat-as-zero-width-space) it is rendered as a missing character in PDF. Is that correct, i.e. does this character have to exist in the font used or should the formatter or renderer simply remove this character? It is the second approach that both AntennaHouse and RenderX appear to have chosen.
>
> I recommend that no character is output for a ZWS. The whole purpose of placing a ZWS into the input XSL-FO is to give layout an extra break opportunity, without changing the appearance of the generated document.
>
> Chris

That seems to be the consensus, that is, consider ZWS for line breaking but then discard it and don't pass it to the renderers. Are there any other (unusual Unicode) characters which fall in the same category, that is, they influence layout decisions but should not be seen by the renderers?

Manuel
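The consensus (use the ZWS for line breaking, then keep it away from the renderers) can be sketched as two separate views of the same text. A minimal sketch; `RendererText` and `LAYOUT_ONLY` are hypothetical names, and the set holds only U+200B because whether other characters belong there is exactly the open question above.

```java
import java.util.ArrayList;
import java.util.List;

class RendererText {
    // Characters assumed to matter for line breaking only. U+200B is the case
    // the thread agreed on; other candidates are still an open question.
    static final String LAYOUT_ONLY = "\u200B";

    // Line breaking view: the offset just after each such character is an
    // extra break opportunity.
    static List<Integer> zwsBreaks(String text) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (LAYOUT_ONLY.indexOf(text.charAt(i)) >= 0) {
                out.add(i + 1);
            }
        }
        return out;
    }

    // Renderer view: the same characters are dropped, so no glyph is
    // requested from the font and no missing-character box appears.
    static String forRenderer(String lineText) {
        StringBuilder sb = new StringBuilder(lineText.length());
        for (int i = 0; i < lineText.length(); i++) {
            char c = lineText.charAt(i);
            if (LAYOUT_ONLY.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For "long" + U+200B + "word", the break view reports offset 5 and the renderer receives "longword".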