Re: Page breaking [was: Markers added to the wrong page]
[The Web Maestro] I don't know how the spec deals with this, but I doubt the spec cares which algorithm is used. That said, would it be a good idea to determine which algorithm to use based on something in userconfig.xml or something? If the Knuth system is more 'expensive' in terms of resources, we could make non-Knuth the default, and enable Knuth via a config file.

Yes indeed. But without measurements I would guess that Knuth page breaking would be far less expensive than the Knuth line breaking. The line breaking has to deal with far more elements and because of that a far larger set of active nodes. But clearly a choice would be good, both for page and for line breaking.

[Victor] The good news is that this is an excellent idea. The bad news is that FOP's determination to not even tolerate more than one algorithm is what drove FOP and FOray apart. So you have stumbled into a somewhat sensitive area, which might explain the absence of responses (so far). BTW, since it is relevant here, thanks for your kind comments last week in another thread. You are ever the peacemaker, a very useful role. As long as the differences remain as great as they are, we *need* more than one FO implementation. In the meantime, I am much more useful and productive outside of FOP than I am in it, and the FOP developers probably are also.

To my perhaps still naive mind, layout consists of exactly two tasks: 1) finding line-breaks within logical blocks (nested blocks constituting a new logical block), and 2) finding page-breaks between and within logical blocks. All other tasks can be conveniently handled within the two data structures, not being dependent on any strategy. The Knuth-style algorithm is by definition a total-fit algorithm. The scope of such an algorithm for the first task is the logical block itself. The scope of the second task is the page-sequence.
So, to implement a Knuth-style algorithm (itself a good idea) for item 2 requires the ability to see an entire page-sequence in the AreaTree at one time.

Not at all. It is a rather trivial change to Knuth to pick a page break when there are N pages of lookahead. If we assume that finished page knuth elements arrive one at a time to the KnuthPage algorithm, the main loop becomes:

    addKnuthElement(element):
        if element.isBox():
            totalWidth += element.width
        else if element.isGlue():
            if previous element is box:
                considerLegalBreak()
            totalWidth += element.width
            totalStretch += element.stretch
            totalShrink += element.shrink
        else if element.isPenalty():
            if element.penalty < INFINITE:
                considerLegalBreak()
        if activeList.startLine > currentPage + lookahead:
            // In the activeList, find the node with the best demerit.
            bestNode = findBestDemerit()
            // Remove all nodes in activeList which do not have bestNode
            // in their parentage.
            clearActiveList(bestNode)
            makePageBreak(bestNode)

Here it is only clearActiveList() which does not have a fast implementation when using the usual implementation of ActiveNode. It will require a scan over all the active nodes, and for each node a scan back thru the previous links to the node at currentPage+1:

    clearActiveList(bestNode):
        for node in activeList:
            // Jump back thru previous (node.page - currentPage) times.
            previous = node
            i = 0
            while i++ < node.page - currentPage:
                previous = previous.previous
            if previous != bestNode:
                activeList.remove(node)

The rest of the methods called are similar to the methods in my knuth line breaking patch (which is quite similar to the current implementation). My own insecurity comes from figuring out which penalty values and demerit algorithm to use to get keeps and orphans right.

regards, finn
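The pruning step sketched in the message above can also be written out in Java. The following is an illustrative sketch only, not FOP code: PageNode, clearActiveList() and all other names here are invented for the example; it just shows the "keep only descendants of the committed break" idea.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not FOP code) of the clearActiveList() pruning
// described above: once a break is committed, only the active nodes
// that have the chosen bestNode in their parentage are kept.
class PageBreakPruning {

    static class PageNode {
        final int page;          // page index at which this node breaks
        final PageNode previous; // chain back through earlier breaks
        PageNode(int page, PageNode previous) {
            this.page = page;
            this.previous = previous;
        }
    }

    /** Walk back through the previous links until we reach keepPage. */
    static PageNode ancestorAtPage(PageNode node, int keepPage) {
        PageNode p = node;
        while (p != null && p.page > keepPage) {
            p = p.previous;
        }
        return p;
    }

    /** Keep only the active nodes that descend from bestNode. */
    static List<PageNode> clearActiveList(List<PageNode> activeList,
                                          PageNode bestNode) {
        List<PageNode> kept = new ArrayList<>();
        for (PageNode node : activeList) {
            if (ancestorAtPage(node, bestNode.page) == bestNode) {
                kept.add(node);
            }
        }
        return kept;
    }
}
```

As the message notes, this scan costs O(active nodes x lookahead) with the usual ActiveNode representation; a child index per node would make it faster, at the price of diverging from that representation.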
Re: Page breaking [was: Markers added to the wrong page]
[Luca] I'm not sure it is always possible to do this: sometimes the representation of a block depends on the properties of a higher level block. For example:

    outer block
     |
     |- inner block 1
     |- inner block 2

In order to decide whether there can be a page break between the lines generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM receives the elements created by its children, looks at them and adds / removes / corrects; could this be done if the inner BlockLMs directly give their elements to the top-level LM?

I would pass the elements on to the immediate parent, which recursively passes them on to the top-level LM. For inline content the top-level LM would be the LineLM and for blocks it would be the PageLM.

BTW, what about your great refactoring of all the knuth code?

I do not have the time atm to do the followup and bugfixes, so it will have to wait. If anyone else wants to commit it, feel free.

regards, finn
Re: Layout dimension mechanism
[Jeremias] I think there is something fundamentally wrong with the layout dimension mechanism. I've found two problems, one minor, one very bad:

1. While implementing percentages for background-position-horizontal and -vertical, I realized that it wasn't so simple to base the calculation on a simple value. The percentage base is actually the padding-rect size minus the intrinsic size of the background image. Of course, this value could be set by each layout manager but it seems awkward to have to do that everywhere. Instead, I can easily calculate this in TraitSetter.addBackground(), provided the BPD and IPD of each Area are properly set before calling TraitSetter.addBackground(). That's currently not the case, however. But it's probably a good idea to do that anyway.

Yes, IPD and BPD traits should be assigned to all areas which have an ipd and bpd. But at the time where base lengths are needed, no areas have yet been created. So I don't see any way to avoid making the base lengths available during getNextBreakPoss, either as values in a HashMap or on demand via a context object.

[Jeremias] 2. The above let me calculate the background image position easily for block-containers. When I wanted to check the same for the body region I found some bugs that I'm not yet testing for. While trying to fix them I found a problem when I specify width=height=100% on a block-container as the only child of a flow. The BLOCK_IPD and BLOCK_BPD layout dimensions were fetched from the Root, which returned the page size instead of the content-rect size of the current region body area (which is wrong for documents with different page sizes throughout the document). PageSequenceLayoutManager also sets the page size as BLOCK_IPD/BPD on each Region (which is wrong, too). In principle, the BLOCK_IPD/BPD should come from the currently applicable Region if there's no parent block-level object.

The current setting of dimensions is neither complete nor correct.
[Jeremias] I'm beginning to think that the layout dimensions should be held and provided by the layout managers instead of the FObjs. To evaluate a percentage-enabled value the layout manager would have to call getLength()

Just to be accurate, I would say it is the getValue() call that must accept a context parameter.

[Jeremias] with some kind of context object that allows it to fetch the right base values for percentage calculations. I haven't fully made up my mind about that and I'd like to hear what you guys think about this problem.

I agree completely. The LayoutDimension in FObj was a hack to show off the relative length expression mechanism. At the time a huge number of Length.getValue() calls were located in PropertyManager. They are gone now, which I hope will make the change to using a context easier.

regards, finn
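The percentage base Jeremias describes in point 1 can be shown with a few lines of arithmetic. This is a hedged sketch, not the TraitSetter API: the method and parameter names are invented for illustration.

```java
// Sketch of the percentage calculation discussed above: the base for
// background-position-horizontal/-vertical is the padding-rect size
// minus the intrinsic size of the background image. Invented helper,
// not FOP code.
class BackgroundPosition {

    /**
     * @param paddingRectSize    padding-rect extent (e.g. millipoints)
     * @param imageIntrinsicSize intrinsic image extent in the same unit
     * @param percentage         e.g. 0.5 for "50%"
     * @return offset of the image within the padding rect
     */
    static int offset(int paddingRectSize, int imageIntrinsicSize,
                      double percentage) {
        int base = paddingRectSize - imageIntrinsicSize;
        return (int) Math.round(percentage * base);
    }
}
```

With 0% the image sits flush at the start edge and with 100% flush at the end edge, which is why the intrinsic size has to be subtracted from the base rather than using the padding-rect size directly.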
Re: cvs commit: xml-fop/src/java/org/apache/fop/layoutmgr TraitSetter.java BlockLayoutManager.java
[Simon] There does not seem to be a need to add the inherited value later; the property maker already has done so. See IndentPropertyMaker.compute(PropertyList). It uses propertyList.getInherited(baseMaker.propId).getNumeric() to get the inherited value. Earlier FOP developers understood this part well.

[Jeremias] I understand, but I think you're talking exclusively about the property resolution phase (right?) while I found that I need the computed value of the margin property (not only the explicit one, as is currently the case) and the inherited start-indent for the layout manager code and to set traits correctly.

[Simon] PropertyList.get(Constants.PR_START_INDENT) gets the computed value, that is, the value after property refinement. It is not the raw value stated in the FO file; FOP does not store that at all. FOP tries to do property refinement immediately. If that is not possible because the computed value depends on a trait of an area, FOP stores an expression, which can be computed at layout time. I see no mention in section 5 of the spec that the trait value for start-indent is different from the computed property value.

The problem that Jeremias is trying to address is computing the space-[start,end] traits. Since only start-indent was computed correctly, he tried to compute the space-[start,end] traits based on the [start,end]-indent, and to do that the inherited indent values (which have just been added in IndentPropertyMaker) must be subtracted. Which may indicate that calculating space-[start,end] traits based on indents is probably not the right approach. I think the solution would be to:
- Store the [start,end]-indent traits.
- Let LayoutContext.refIPD be the reference IPD rather than the content ipd of the parent.
- Let the renderer keep track of the reference rect and use the start-indent trait to locate the position of the content rectangle.

regards, finn
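The renderer-side computation proposed in the last bullet list reduces to simple geometry. A minimal sketch, with invented names (not FOP's LayoutContext or trait API):

```java
// Sketch of the proposal above: carry the reference-area IPD and let
// the renderer derive the content rectangle from the
// [start,end]-indent traits. Invented helper, not FOP code.
class ContentRect {

    /** IPD of the content rectangle inside the reference area. */
    static int contentIPD(int refIPD, int startIndent, int endIndent) {
        return refIPD - startIndent - endIndent;
    }

    /** Offset of the content rectangle from the reference-area edge. */
    static int contentStart(int startIndent) {
        return startIndent;
    }
}
```

The point of the proposal is that with the indents stored as traits, the renderer needs no space-[start,end] traits at all to place the content rectangle.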
Re: cvs commit: xml-fop/src/java/org/apache/fop/layoutmgr TraitSetter.java BlockLayoutManager.java
[Jeremias] Would you please check if it is acceptable to put the inherited values directly into the CommonMarginBlock? It might have been cleaner to always get the value via the parent FO, but I think in this case it helps simplify the code in TraitSetter and BlockLayoutManager.

It looks wrong, it feels wrong, but I'm not at all sure if it is wrong. But I would be tempted to drop the use of the space-[start,end] traits and instead let the renderers use the [start,end]-indent traits and the reference rectangle to calculate the start position of the content rectangle. Alternatively, calculate space-[start,end] based on the [start,end]-indent traits on the parent area rather than the parent fo.

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/properties BoxPropShorthandParser.java
[EMAIL PROTECTED] wrote: Log: convertValueForProperty didn't have the right signature and therefore didn't override the superclass implementation.

That was sloppy of me. Thanks for finding and fixing it.

    +ListProperty listProperty = (ListProperty)property;

That is a no-no. Properties should not be cast, only coerced using the getXXX methods. And the getList() method is already defined on Property, so no cast is needed. Using casts will prevent us from adding property proxies, which I suspect will be needed.

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/datatypes LengthBase.java
Simon Pepping wrote: On Tue, Dec 28, 2004 at 06:03:13PM -, [EMAIL PROTECTED] wrote:

    Index: LengthBase.java
    ===================================================================
    RCS file: /home/cvs/xml-fop/src/java/org/apache/fop/datatypes/LengthBase.java,v
    retrieving revision 1.8
    retrieving revision 1.9
    diff -u -r1.8 -r1.9
    --- LengthBase.java  28 Oct 2004 10:00:19 -  1.8
    +++ LengthBase.java  28 Dec 2004 18:03:12 -  1.9
    @@ -20,6 +20,7 @@
     import org.apache.fop.fo.Constants;
     import org.apache.fop.fo.FObj;
    +import org.apache.fop.fo.IntrinsicSizeAccess;
     import org.apache.fop.fo.PropertyList;
     import org.apache.fop.fo.expr.PropertyException;
    @@ -122,13 +128,15 @@
         case BLOCK_HEIGHT:
             return parentFO.getLayoutDimension(PercentBase.BLOCK_BPD).intValue();
         case CONTAINING_REFAREA:    // example: start-indent, end-indent
    -        {
             //FONode fo;
             //for (fo = parentFO; fo != null && !fo.generatesReferenceAreas();
             //     fo = fo.getParent());
             //return (((fo != null) && (fo instanceof FObj)) ? ((FObj)fo).getContentWidth() : 0);
             return 0;
    -        }
    +    case IMAGE_INTRINSIC_WIDTH:
    +        return ((IntrinsicSizeAccess)propertyList.getFObj()).getIntrinsicWidth();
    +    case IMAGE_INTRINSIC_HEIGHT:
    +        return ((IntrinsicSizeAccess)propertyList.getFObj()).getIntrinsicHeight();
         case CUSTOM_BASE:
             //log.debug("!!! LengthBase.getBaseLength() called on CUSTOM_BASE type !!!");
             return 0;

I am surprised to find the property list here. If I understand Finn's restructuring of the usage of property lists, it was the intention that all references to the property lists are released, so that they can be GC'ed. The exceptions are the subtree under Marker and the line of ancestors of RetrieveMarker. This patch reveals that LengthBase maintains a reference to the property list. Perhaps that is an oversight in the restructuring? Indeed. Perhaps the references to the property list and to the parent FO might be replaced by a reference to the FO to which the property using this LengthBase belongs. But I would not know how to replace the method PropertyList.getInherited(). Perhaps Finn wants to take a look at this?
Unfortunately not at the moment, where real work takes its toll. The idea was that all base values were assigned on the FObj in layoutDimension, including font sizes. And that also goes for image sizes. In the longer term the layoutDimension should be available in a context object that is passed into all Length.getValue(context) calls.

regards, finn
Re: Large files.
The loop can be stopped when we temporarily run out of FO tree nodes and restarted again when new nodes have been added. I suppose that the FO tree can then be viewed as a stream of FO nodes.

[Victor] This model probably works fine if you never need to look ahead, but there are numerous examples (well discussed in the archives) where one does need to do that, the most obvious being automatic table layout. Peter's solution to that is pull parsing, which probably works, but forces an intermingling of interests that I need to avoid. My solution is to serialize the FOTree as necessary.

Did you notice that if a FOTree (or a fragment of it) is serialized to a preorder sequential representation with end markers, the preorder, postorder and child events can be fired directly from the input stream? IOW the event-based layout can work both off a normal parent/children linked tree and off a sequential tree.

The bottom line is that, if you conceivably need to see both the beginning and end of the input simultaneously (which we do), I can see a need for several passes over the same tree fragment, but do we really need to see the beginning and the end at the same time? The auto table example appears to need 2 or 3 sequential passes over the table subtree (one to find the minimum/maximum column widths and one to do the actual layout), but the layout process is still sequential.

regards, finn
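The observation above, that a preorder serialization with end markers can drive the preorder/postorder/child events directly, can be sketched in a few lines. The token format and event strings here are invented for the example; only a stack of open nodes is needed, never a linked tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: fire preorder/postorder/child events straight from a
// preorder token stream with "END" markers. Invented token format;
// assumes the stream is well-formed (every start has an END).
class StreamEvents {

    static List<String> fire(List<String> tokens) {
        List<String> events = new ArrayList<>();
        Deque<String> openNodes = new ArrayDeque<>();
        for (String t : tokens) {
            if (t.equals("END")) {
                String name = openNodes.pop();
                events.add("postorder(" + name + ")");
                if (!openNodes.isEmpty()) {
                    // The node on top of the stack is the parent.
                    events.add("child(" + openNodes.peek() + "," + name + ")");
                }
            } else {
                events.add("preorder(" + t + ")");
                openNodes.push(t);
            }
        }
        return events;
    }
}
```

Since the stack holds only the current line of ancestors, memory use is bounded by tree depth, not tree size, which is the point of treating the FO tree as a stream.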
Re: Large files.
The problem with Keiron's layout code (with respect to large input files) is that it works top-down on the LM tree and thus requires the creation of the complete LM tree for the page sequence. To better fit within the SAX event model the layout process should also be event driven, triggered f.ex. by the endBlock() and character() events. That means that the traversing of the FOTree cannot easily be done using recursive descent. Instead the main loop over the FO tree could use a non-recursive tree traversal like this:

    public Element traverse(Node q, Node end, int mode) {
        while (q != end) {
            switch (mode) {
            case FIRST:
                // First time in the node.
                preorder(q);
                if (q.hasChildNodes()) {
                    q = q.getFirstChild();
                    break;
                }
                mode = NEXT;
                // fallthrough
            case NEXT:
                // All children are handled.
                postorder(q);
                // Add the node to its parent.
                if (q.getParentNode() != null) {
                    child(q.getParentNode(), q);
                }
                // fallthrough
            case RESTART:
                // Attempt to move vertically to next sibling. [horizontally?]
                if (q.getNextSibling() != null) {
                    q = q.getNextSibling();
                    mode = FIRST;
                    break;
                }
                // No sibling found, move to the parent to
                // do postorder.
                q = q.getParentNode();
                mode = NEXT;
            }
        }
        return null;
    }

...

The loop can be stopped when we temporarily run out of FO tree nodes and restarted again when new nodes have been added. I suppose that the FO tree can then be viewed as a stream of FO nodes.

[Peter] I suppose so. And I suppose that making layout event-driven would better fit in with the SAX event model. It just doesn't make sense from the point of view of layout, which is basically a top-down process.

By top-down, do you mean the visiting order of the nodes in the tree? Or the structure of the code that does the layout? Like this:

    process(Node n) {
        preorder(n);
        for (each child in n) {
            process(child);
            child(n, child);
        }
        postorder(n);
    }

The traverse() code does the exact same as the process() code. The differences are:
- All people immediately recognize process().
- process() can use local variables to store state.
A major drawback of process() is that it is darn hard to suspend processing and then restart it. Our current getNextBreakPoss() code is a nice example of just how hard it is.

regards, finn
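The suspendability point above can be made concrete: because the explicit loop's entire state is the pair (q, mode), the traversal can return at any step and be resumed later. A minimal sketch with an invented Node type (not the FOP LM classes); the pump() method processes a bounded number of steps per call.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why the explicit-state loop is easy to suspend: all
// traversal state lives in the fields q and mode, so pump() can
// return mid-traversal and be called again later. Invented minimal
// Node type; not FOP code.
class Resumable {

    static class Node {
        String name;
        Node parent, firstChild, nextSibling;
        Node(String n) { name = n; }
    }

    static final int FIRST = 0, NEXT = 1;

    Node q;                 // current node
    int mode = FIRST;       // traversal state
    final List<String> events = new ArrayList<>();

    Resumable(Node root) { q = root; }

    /** Run at most maxSteps iterations; return true when finished. */
    boolean pump(int maxSteps) {
        int steps = 0;
        while (q != null && steps++ < maxSteps) {
            switch (mode) {
            case FIRST:
                events.add("pre:" + q.name);
                if (q.firstChild != null) { q = q.firstChild; break; }
                mode = NEXT;
                // fallthrough
            case NEXT:
                events.add("post:" + q.name);
                if (q.nextSibling != null) { q = q.nextSibling; mode = FIRST; break; }
                q = q.parent; // stays in NEXT mode for the parent
            }
        }
        return q == null;
    }
}
```

Suspending the recursive process() equivalently would require capturing the Java call stack, which is exactly the contortion the message attributes to getNextBreakPoss().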
Refactoring of knuth line breaking code.
Hi, I've been playing around with the knuth line breaking code and made a slight refactoring of it. The result is 38% faster for START align, 78% faster for CENTER align and 12% for JUSTIFY align. Measured using the 'time' command, jdk1.4.2_03, -Xmx2 and a rough edition of Sherlock Holmes from the Gutenberg project:

http://apache.org/~bckfnn/advsh12s.fo
http://apache.org/~bckfnn/advsh12c.fo
http://apache.org/~bckfnn/advsh12j.fo

(s=start, c=center, j=justify alignment)

The patch is here: http://issues.apache.org/bugzilla/show_bug.cgi?id=32612

The performance improvement comes from:
- Using -INFINITE_RATIO when there is too little shrink in a line for an element to fit.
- Reusing the BestRecords instance instead of creating an instance for each line * for each element.
- Using ArrayList instead of LinkedList.
- Avoiding the indexOf calls on Paragraph.
- Using '*' instead of Math.pow() for calculating demerits.
- Splitting activeList into an array indexed by line, with the nodes within a line linked together.

These improvements could also be applied to the existing code, so I think the more interesting point is the quality of the refactoring job. I guess that I have moved away from a faithful implementation of Knuth's algorithm, but I think the result is more in line with common java code. Opinions?

regards, finn
Large files.
[Peter] As a completely unintentional side-effect, it gave me the tools to solve the really critical FOP performance problem on large files. This isn't going to be solved by micro-efficiencies on FO tree storage.

I'm going to use this as an excuse to outline my thinking for handling large input files. I'm almost sure that it is a bit different from your view. This is a rant, and I don't have a working patch to back it up.

The problem with Keiron's layout code (with respect to large input files) is that it works top-down on the LM tree and thus requires the creation of the complete LM tree for the page sequence. To better fit within the SAX event model the layout process should also be event driven, triggered f.ex. by the endBlock() and character() events. That means that the traversing of the FOTree cannot easily be done using recursive descent. Instead the main loop over the FO tree could use a non-recursive tree traversal like this:

    public Element traverse(Node q, Node end, int mode) {
        while (q != end) {
            switch (mode) {
            case FIRST:
                // First time in the node.
                preorder(q);
                if (q.hasChildNodes()) {
                    q = q.getFirstChild();
                    break;
                }
                mode = NEXT;
                // fallthrough
            case NEXT:
                // All children are handled.
                postorder(q);
                // Add the node to its parent.
                if (q.getParentNode() != null) {
                    child(q.getParentNode(), q);
                }
                // fallthrough
            case RESTART:
                // Attempt to move vertically to next sibling.
                if (q.getNextSibling() != null) {
                    q = q.getNextSibling();
                    mode = FIRST;
                    break;
                }
                // No sibling found, move to the parent to
                // do postorder.
                q = q.getParentNode();
                mode = NEXT;
            }
        }
        return null;
    }

This is the algorithm from Knuth TAoCP, vol I, page 323, program S. It traverses any DOM tree fragment from 'q' to 'end' without using the java stack. The tree must support getFirstChild() and getNextSibling(). During the traversal it fires 'preorder', 'postorder' and 'child' events.
When applied to FOP, the nodes are LayoutManagers, and getFirstChild() and getNextSibling() will create the LayoutManager for the first child / next sibling. The loop can be stopped when we temporarily run out of FO tree nodes and restarted again when new nodes have been added. I suppose that the FO tree can then be viewed as a stream of FO nodes.

regards, finn
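The claim in the follow-up discussion that traverse() fires the exact same events as the recursive process() can be checked mechanically. Below is a self-contained sketch (invented minimal Node type, not DOM or FOP classes) that runs both on a small tree and records the event sequences.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch comparing the two traversal styles from the thread: the
// recursive process() and the explicit-state loop produce identical
// preorder/postorder/child event sequences.
class TraversalCompare {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node parent;
        Node(String name) { this.name = name; }
        void add(Node c) { c.parent = this; children.add(c); }
        Node firstChild() { return children.isEmpty() ? null : children.get(0); }
        Node nextSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }
    }

    /** The recursive formulation. */
    static void process(Node n, List<String> ev) {
        ev.add("pre:" + n.name);
        for (Node c : n.children) {
            process(c, ev);
            ev.add("child:" + n.name + ":" + c.name);
        }
        ev.add("post:" + n.name);
    }

    static final int FIRST = 0, NEXT = 1;

    /** The explicit-state loop (RESTART folded into NEXT for brevity). */
    static void traverse(Node q, Node end, List<String> ev) {
        int mode = FIRST;
        while (q != end) {
            switch (mode) {
            case FIRST:
                ev.add("pre:" + q.name);
                if (q.firstChild() != null) { q = q.firstChild(); break; }
                mode = NEXT;
                // fallthrough
            case NEXT:
                ev.add("post:" + q.name);
                if (q.parent != null) ev.add("child:" + q.parent.name + ":" + q.name);
                if (q.nextSibling() != null) { q = q.nextSibling(); mode = FIRST; break; }
                q = q.parent; // no sibling: back up, stay in NEXT mode
            }
        }
    }
}
```

Running both over the same tree and comparing the event lists is a cheap regression test for any rewrite of the traversal.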
Re: Marker.rebind()
[Simon] I have just committed a patch which fixes bug 32253. One problem remains: the text in the marker does not always have the right properties, e.g. the right size. This is probably due to the fact that Marker.rebind() does not rebind the whole subtree below a marker.

You're right! It seems that FOText nodes aren't bind()'ed by Marker.rebind(). I'm certain that I had that working at some point before I submitted the patch. I guess it was the change in FOText's inheritance, which happened just before I submitted, that caused the problem.

The solution is to move the FObj.createPropertyList() method to FONode and to call ft.createPropertyList() from FObjMixed.addCharacter(). That will register the FOText with the Marker.children map. I'll do it later this week.

regards, finn
Knuth maximum demerit value.
Hi, What is the logic behind INFINITE_DEMERITS = 1E11? Having a threshold of 20 can result in a demerit value of (1 + 100 * 20 ** 3) ** 2 == 6.400016E11 for the first KnuthNode, which is then ignored since it is worse than the initial demerit value. I would guess that when we use a ratio of 20, the initial demerit value should somehow match.

regards, finn
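The arithmetic in the question is easy to verify: with an adjustment ratio of 20, badness is 100 * 20^3 = 800000, so the demerits come out as (1 + 800000)^2, which indeed exceeds 1E11. A small invented helper (not FOP code) using the Knuth-style demerit formula quoted in the message:

```java
// Checking the arithmetic above: demerits = (1 + badness)^2 with
// badness = 100 * |ratio|^3. Invented helper, not FOP code; the "1"
// stands in for the line penalty, as in the formula quoted above.
class DemeritCheck {

    static double demerits(double ratio) {
        double badness = 100 * Math.abs(ratio * ratio * ratio);
        double d = 1 + badness;
        return d * d;
    }
}
```

So any INFINITE_DEMERITS constant used as an initial best value has to be larger than the worst demerits reachable at the chosen threshold, which supports the point that the two values should be derived from each other rather than fixed independently.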
Re: More questions on line breaking.
1) What is inactiveList doing? Nodes are added but never used.

[Simon] It contains all feasible breakpoints except those that are still active, i.e., are still in scope as the start of a line ending at the currently considered breakpoint. At the end of the loop the active list only contains nodes pertaining to the end of the paragraph. From the best one of them the breakpoints of the paragraph are calculated by tracing back to the beginning through the inactive list. See the code in LineLM after the comment "// use the chosen node to determine the optimum breakpoints". In the line "bestActiveNode = bestActiveNode.previous;" the previous node is held in the inactive list.

Ok, so it isn't really needed when the algorithm is implemented in java. Just having the previous node linked from within bestActiveNode is enough to keep the inactive nodes alive. So inactiveList can be removed.

regards, finn
Re: Knuth linebreaking questions
And why not adjust the spacing within the user specified min/max for START and END alignment? [Luca] Should the user desire adjusted spaces, wouldn't it be better for him to specify justified alignment? :-) Seriously, the recommendation (at 7.16.2 letter-spacing and 7.16.8 word-spacing) states that these spaces may also be influenced by justification, but says nothing about start and end alignments. I tend to read that to mean that word spacing may be pushed beyond the specified range by justification. And I would think that unjustified alignment still has the option of using the word-spacing range but ofcourse has to stay within the range. I'm still not sure why it would be ok to ignore any user specified min and max values of 'word-spacing' during START and END alignment. If a user specifies a length range, what would the reason be for not using it? Perhaps with additional DEFAULT_SPACE_WIDTH. When alignment is start or end, each space has always its .optimum width, so there is no need to look at the .minimum and .maximum: the user most preferred value is already used. Is there anything that prevents using a non .optimum value within the range if the result is judged to be better (with a lower demerit). Ok, performance is indeed a fine reason, but IMHO such quality vs. speed tradeoffs should eventually be made by the user rather than us. Simon told the same: # Note that in TeX such thresholds are user-adjustable parameters. I # think they should eventually be so in FOP too, for those of us who # have the most exquisite taste of line layout. and I think it's a good idea; the algorithm should: 1 find breaking points without hyphenation 2 hyphenate 3 find breaking points with hyphenation 4 decide which ones are better and point #4 uses the user-definable threshold; where should this constant be stored? Inside the code of LineLM or in a configuration file? An extension attribute? fo:block fox:knuth-threshold=5 ... 
/fo:block I suspect that the other knuth parameters should be specified the same way. But it is not a high priority IMO. regards, finn
More questions on line breaking.
Hi, Some more questions.

1) What is inactiveList doing? Nodes are added but never used.

2) If there is no shrink in a line (the case in START alignment), then nodes are never removed from activeList until a forced break element is found. Is that really the intention of the algorithm? It seems suspect that a ratio of INFINITE_RATIO is also created when the break is too wide to fit within a node.

regards, finn
Re: Knuth linebreaking questions
1) What is the purpose of 2 glues for a normal space in END and START alignment:

    new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
    new KnuthPenalty(0, 0, false, , true));
    new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

[Luca Furini] The purpose is to give each line (but the last one) the same stretchability, regardless of the number of spaces in it. If the penalty is not used (there is no line ending there) the overall effect of the 2 glues is a 0 stretchability and does not modify the line total; if the penalty is used (a line ends there) then the stretchability of the previous glue is added to the line total, which becomes 3 * wordSpaceIPD.opt because the previous space, as said before, added 0 (the following glue is suppressed). In justified text, a line with many spaces can be adjusted in order to be much shorter, or much longer. If left-aligned text used the same elements, the algorithm would find the same breaking points; but this time adjustment ratios are not used, so a line with many spaces would be much longer, or much shorter, than the other lines. Using these elements, the algorithm creates lines whose unadjusted widths are roughly the same.

Ok, thank you for the explanation. I'm still not sure why it would be ok to ignore any user specified min and max values of 'word-spacing' during START and END alignment. If a user specifies a length range, what would the reason be for not using it? Perhaps with an additional DEFAULT_SPACE_WIDTH. And why not adjust the spacing within the user specified min/max for START and END alignment?

3) What is the reasoning for doing hyphenation only after threshold=1 fails? Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.
Finding hyphenation points is time-expensive (all words must be hyphenated, not only the ones near a line's end), the sequence of elements becomes longer, there are more feasible breaking points, and a line ending with a "-" is less beautiful; so I thought it would be better if a set of breaking points could be found without hyphenation. I just took the hyphenate property as a suggestion instead of an order! :-) Note that the same algorithm with the same threshold could find a different set of breaking points with and without hyphenation, because the elements are different. Without hyphenation, spaces could need a little higher adjustment, for example.

Ok, performance is indeed a fine reason, but IMHO such quality vs. speed tradeoffs should eventually be made by the user rather than us. Thank you for taking the time to explain it all in such great detail.

regards, finn
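Luca's explanation of the glue/penalty/glue triple reduces to simple bookkeeping, which can be checked with a few lines of arithmetic. An invented helper, not FOP code; it encodes only the three elements quoted in the thread: glue(width 0, stretch +3*opt), penalty(0), glue(width opt, stretch -3*opt).

```java
// Arithmetic check of the space-element triple discussed above.
// If no line ends at the penalty, both glues count and the net
// stretch is zero; if a line ends there, the second glue is
// suppressed and the line gains 3 * opt of stretch.
class SpaceElements {

    static int netStretchNoBreak(int opt) {
        return (3 * opt) + (-3 * opt); // both glues counted
    }

    static int stretchContributedAtBreak(int opt) {
        return 3 * opt; // second glue suppressed after the break
    }

    static int netWidthNoBreak(int opt) {
        return 0 + opt; // the space still occupies its optimum width
    }
}
```

This is how every line (except the last) ends up with the same stretchability of 3 * opt regardless of its space count, which is the effect Luca describes.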
Knuth linebreaking questions
Hi Luca (and others), I've been trying to get my head around the line breaking code and during that process some questions have come up. I urge you *not* to take anything I ask as a sign of criticism or as a request for changes. I don't have the Knuth paper where the algorithm is described, so perhaps the answers would be obvious if I read it.

1) What is the purpose of 2 glues for a normal space in END and START alignment:

    new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
    new KnuthPenalty(0, 0, false, , true));
    new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

and why isn't the min and max of wordSpaceIPD used?

2) What does the threshold parameter to findBreakingPoints control? It seems to be a performance parameter which controls the number of active nodes, rather than a quality parameter. Or to frame my question differently: if threshold=1 finds a set of breaks, will threshold=5 always pick the same set of breaks? Or can threshold=5 find a better set of breaks?

3) What is the reasoning for doing hyphenation only after threshold=1 fails? Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.

4) I've compared your code to tex_wrap http://oedipus.sourceforge.net/texlib/ and the main difference is in the way new KnuthNodes are added to the active list. Is the BestRecords part of Knuth or is it your own invention? Why is it only fitness classes in BestRecords that are higher than minDemerits + incompatibleFitnessDemerit that are added to activeList? Why not all fitness classes in BestRecords?

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/traits LayoutProps.java SpaceVal.java
[Glen] 2.) Appended EN_ to enumeration constants to

[J.Pietschmann] Yuk. Having a large number of identifiers in the same scope with an identical prefix isn't very good for autocompletion, both in Emacs and Eclipse.

[Glen] We've been doing the same with PR_ (properties) and FO_ (FO's) for quite some time, to avoid a name conflict somewhere. After hitting the EN_, you're at the same place you would be without the prefix. Furthermore, you can now hunt away for your enumeration constants without them being intermixed with the PR_'s and FO_'s. It was also a commenting issue: TRUE and FALSE, for example, without a prefix, just weren't self-documenting enough to emphasize that we're working with enumeration constants here. (Remember, we removed the old interfaces--per your desire as well as mine--such as WritingMode.LR_TB or whatever that previously provided that emphasis.)

How about having 3 interfaces: 'Properties', 'Elements' and 'Enums', which contain the constants without any prefix? And then decide that these interfaces are never implemented, but the constants are always accessed using the interface name: Enums.TRUE. That would keep the searchability and perhaps even help us when (if) we move to typesafe enums.

regards, finn
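The interface-per-namespace suggestion above can be sketched in a few lines. The constant names and values here are invented for the example; they are not the real FOP constants.

```java
// Minimal sketch of the suggestion above: unprefixed constants kept
// in interfaces that are never implemented and always accessed
// through the interface name. Invented names and values.
interface Enums {
    int TRUE = 1;
    int FALSE = 2;
}

interface Properties {
    int TEXT_ALIGN = 100;
}

class PrefixDemo {
    static boolean isTrue(int enumValue) {
        return enumValue == Enums.TRUE;
    }
}
```

Writing Enums.TRUE instead of a bare EN_TRUE keeps the qualifier searchable and self-documenting, and the same access pattern would survive a later migration to typesafe enums (Enums.TRUE stays valid if Enums becomes an enum type).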
Re: Do we need an inherit enumeration constant?
[Finn] It [an INHERIT constant] isn't needed as an enum value because the 'inherit' keyword takes precedence over the other enumeration values. See 5.9.11: "The Keyword values take precedence over EnumerationToken." where Keyword is 'inherit'. The code to handle 'inherit' is at line 397: http://cvs.apache.org/viewcvs.cgi/xml-fop/src/java/org/apache/fop/fo/properties/PropertyMaker.java?annotate=1.11

[Glen] Finn, I've read what you've written above, as well as what the spec is saying -- but I am confused here. Why does there need to be a precedence-determining rule in the spec that says "The Keyword values take precedence over EnumerationToken."? Does anyone know?

Perhaps because 'Keyword' and 'EnumerationToken' are produced by the same lexical rule: http://www.w3.org/TR/REC-xml-names/#NT-NCName

[Glen] My first thought is that we're only going to have one attribute value between the quotes (i.e., inherit, left, right, justify, etc.) so I don't see the need for order-of-evaluation rules or declarations of precedence. But are there actually cases where you would have more than one enumeration constant (e.g., left | justify)? I must be missing something here.

'azimuth' and 'text-decoration' can have multiple enum values (e.g. "overline line-through"). But no property can have inherit as well as some other value.

regards, finn
Re: Do we need an inherit enumeration constant?
[Glen] This is possibly a question for Finn, but if anyone knows: In our Constants.java [1], we don't have a property enumeration constant for inherit. (Go here [2], and search on inherit, you will see it listed in the Value: section for multiple properties.) Is there any reason why we don't need it, or did we just forget to add it to our Constants.java? It isn't needed as an enum value because the 'inherit' keyword takes precedence over the other enumeration values. See 5.9.11: The Keyword values take precedence over EnumerationToken. where Keyword is 'inherit'. The code to handle 'inherit' is at line 397 http://cvs.apache.org/viewcvs.cgi/xml-fop/src/java/org/apache/fop/fo/properties/PropertyMaker.java?annotate=1.11 OTOH the code to deal with 'inherit' and the other enum values needs some changes to deal with optional white-space, so perhaps the parsing of enums should be passed to the PropertyParser and the 'inherit' test should be placed after the call to PropertyParser.parse(). In that case an INHERIT constant would be needed. regards, finn
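The precedence rule described above boils down to a simple ordering in the property maker: test for the 'inherit' keyword before trying to parse enum tokens. A hypothetical sketch (method and value names are illustrative, not the actual PropertyMaker code):

```java
// Hypothetical sketch of why no INHERIT enum constant is needed:
// the 'inherit' keyword is checked first, so it takes precedence
// (XSL-FO 5.9.11) and never reaches the enumeration lookup.
public class InheritSketch {
    static String make(String specified, String inheritedValue) {
        String value = specified.trim();
        if ("inherit".equals(value)) {
            return inheritedValue;     // keyword handled first
        }
        return "enum:" + value;        // only now treated as an enum token
    }

    public static void main(String[] args) {
        System.out.println(make("inherit", "justify"));
        System.out.println(make("left", "justify"));
    }
}
```

If the 'inherit' test were moved after the enum parsing, as Finn suggests for white-space handling, an INHERIT constant would then be required to represent the keyword in the parsed result.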
Re: remove LayoutManager.initialize()?
[Glen] Does anyone have a problem if I worked towards removing the initialize() method from our LayoutManager interface? There are two ways of looking at it. The code in initProperties() which can be moved to the ctor should be moved. That is no loss at all and will reduce the complexity of the LMs a tiny bit. But removing the initialize() method will reduce the flexibility that LMs currently have to retrieve information from the parent LM. Perhaps that flexibility is not used at the moment, but I suspect that it will be needed to implement the irregular inheritance of properties like text-decoration. regards, finn
Re: AreaFactory patch
[Glen] Finn, keep in mind--both you and Simon wanted pluggable LM's, and you even supplied a patch for it a few months ago. But you have been mostly silent on the matter ever since (i.e., it looks like you don't have a need for it ATM.) Or perhaps I've been working on other things, like property optimizations and exceptions. And just maybe I didn't feel that I had to be the one who implemented pluggable LMs since I wasn't the one who removed the existing pluggability. So sometimes it is good to have things sit in Bugzilla for a couple of months, see if others want it, or what modifications they want, or see if even the original requestor still needs it. Also, Tibor seemed to be fine with using pluggable LM's instead of pluggable Area's--i.e., not even needing them *now*--which would make his desires in sync with yours and Simon's (and mine), as well as keep our layout code in our LM's instead of moving it to the Area objects. Do you still see a need for *both* pluggable LM's and pluggable Areas in our code then? I didn't find that realistic, as I am uncertain of the additional power that it offers, but am asking your opinion here. I only see a need for pluggable LMs, but the AreaFactory patch is so small that I see no problem with throwing a bone to Tibor. regards, finn
Exceptions. (Was: AreaFactory patch)
[Peter] On the topic of exceptions, and now that it's all over... I was puzzled by this discussion. Would anyone expect that Defoe would subclass SAXException for document validation errors? That is your choice. Surely exceptions that occurred during SAX event handling (e.g. startElement) should be handled, either by wrapping the exception in a SAXException, or by passing SAXException subclasses along, or by throwing some kind of RuntimeException (did I miss another option?). I don't think exception handling by e.printStackTrace(); is a long term solution. If not (it doesn't), why not? AFAIK, there is no well defined API to XSL:FO parsing, so everybody will do it differently. And if someone was inclined to write an FO processor using a DOM front-end, would you expect validation errors to throw subclasses of SAXException? Of course not. The same fo file, with the same errors, could be processed by three different processors, using three different parsing methodologies, and the same validation errors would be encountered. Do you mean that the 3 different processors should ideally report the same validation errors in the same manner? That can only happen after someone standardizes a SAFO API (Simple API for FO parsing). Until then all implementations will throw different exceptions, which is fine by me. (OK, you can classify at least two categories of validation error, but I think the argument still applies.) SAX != XML parsing, let alone document validation. True, but FOP heavily depends on SAX. regards, finn
Re: AreaFactory patch
[Chris] I'm definitely in agreement with you on this one Glen. Let's keep Layout simple whilst it's still unfinished. Pluggable LMs can be added once we have an initial release. [Andreas] Well... (sigh)... well ('nutha sigh) What *does* Finn think, in that case? So far, I've yet to hear a single *solid* argument pleading against the proposed change. Of course, something like LM Makers can be added later on --the proposed AreaFactory shouldn't hinder that. I have some minor suggestions for the patch: - It should be strictly typed: createBlock(..), createInline(..) - It should be complete, so that all area creation is done through the factory, not just the 3 areas that Tibor needs. All we heard up to here is a few vague concerns about so-called increased complexity. What?!? It's a plain, simple, basic-as-can-be Factory pattern for chrissake! It doesn't bite... or does it? Anyone? I've never bought the increased complexity argument. Not in this case and not in any of the previous cases where it was raised. regards, finn
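Finn's two suggestions amount to a strictly typed, complete factory interface. A hypothetical sketch (class names mimic FOP's area classes but this is illustrative, not FOP code):

```java
// Hypothetical sketch of a strictly typed AreaFactory: one create
// method per area type, and all area creation going through it.
public class FactorySketch {
    static class Block { }
    static class Inline { }

    interface AreaFactory {
        Block createBlock();
        Inline createInline();
        // ...one method per area type, covering all area creation
    }

    static class DefaultAreaFactory implements AreaFactory {
        public Block createBlock() { return new Block(); }
        public Inline createInline() { return new Inline(); }
    }

    public static void main(String[] args) {
        AreaFactory f = new DefaultAreaFactory();
        System.out.println(f.createBlock().getClass().getSimpleName());
        System.out.println(f.createInline().getClass().getSimpleName());
    }
}
```

The typed methods avoid casts in the layout managers, which is presumably why an untyped `Area createArea(...)` variant was not preferred.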
Re: Connection between FObj and Area
[Tibor Vyletel] Hello Fopsters, I use FOP (dev 1.0) in a bit different scenario than you. My renderer works in the GEF framework. Till now, I have hacked a simple change into the relevant layout managers: BlockLayoutManager.getParentArea(Area childArea) { ... // this creates a connection between fobj and curBlockArea: curBlockArea.addTrait(Trait.ID_AREA, this.fobj); ... } This solution is just a hack and a maintenance nightmare, as the design is constantly changing ;) I am aware that the described construction is breaking memory optimization goals, however I need it to achieve my goal: enable editing after the rendering process and make this editing as responsive as possible. Lately (in fact, after the removal of the getID() method from FObj), I've been thinking about a more elegant solution which would allow me to track a mapping between FObjs and Areas in cases where I need it. I know that any technique which would force this connection in standard processing would not be possible, because its effects on memory consumption would be relevant. My alternative idea is: - create an AreaFactory interface (or abstract class) which would be responsible for the creation of areas. Provide a default implementing class with today's functionality scattered among LMs. There are two choices how to design this interface: a) with one strong method: Area createArea(LayoutManager lm, FObj fobj, Object constraints) b) with a set of specialized methods for creation of different area types: Inline createInline(...) {} Block createBlock(...) {} ... Would you need to return subclasses of Inline and Block etc.? Or would you just set various additional traits such as this.fobj? - refactor instantiation of area objects in layout managers: remove all direct instantiations and replace them with calls to DefaultAreaFactory - solve how to configure the AreaFactory before the layout process.
Layout managers don't have access to UserAgent so far, so the best method would probably be a pair of static methods: Actually, LMs normally have access to the fobj and through that the FONode.getUserAgent() method. AreaFactory AreaFactory.getFactory() AreaFactory.setFactory(AreaFactory factory) and this construction in LMs: currArea = AreaFactory.getFactory().create... When this pattern is applied, my problem would be simply solved by the creation of descendants for some Area classes, which would hold a reference to the FObj data. The default implementation would bring no extra memory consumption and/or relevant speed degradation. The reason why I have written this mail is to obtain some feedback about the suggested refactoring. In fact I am forced to make some changes to the FOP code because of my project's special requirements. Would you still need to change FOP sources, even after your proposal was implemented? The real payback for me would be the case when all the maintenance nightmares would be gone, and this can happen only when the creation of Areas is more modular. I think that modularization of this could bring its two cents to the quality of FOP's source code. From your description I sort of get the impression that a smaller change, such as having all LMs call a new method, say AbstractLayoutManager.setGeneralTraits(FObj), with a default empty implementation, might suffice. Would it be good enough for you to change AbstractLayoutManager to add your special requirements? regards, finn
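The static-accessor pattern Tibor describes, combined with a custom factory whose areas remember their generating FObj, could look roughly like this (a hypothetical sketch; all names are illustrative):

```java
// Hypothetical sketch of AreaFactory.getFactory()/setFactory() plus
// a custom subclass whose areas hold a reference back to the FObj,
// as Tibor's editing scenario requires.
public class PluggableSketch {
    static class FObj { final String name; FObj(String n) { name = n; } }
    static class Block { }
    static class TrackedBlock extends Block {
        final FObj source;                     // back-reference for editing
        TrackedBlock(FObj fobj) { source = fobj; }
    }

    static abstract class AreaFactory {
        private static AreaFactory factory = new AreaFactory() {
            Block createBlock(FObj fobj) { return new Block(); } // default: no extra memory
        };
        static AreaFactory getFactory() { return factory; }
        static void setFactory(AreaFactory f) { factory = f; }
        abstract Block createBlock(FObj fobj);
    }

    public static void main(String[] args) {
        // An embedding application plugs in its tracking factory
        // before layout starts.
        AreaFactory.setFactory(new AreaFactory() {
            Block createBlock(FObj fobj) { return new TrackedBlock(fobj); }
        });
        Block b = AreaFactory.getFactory().createBlock(new FObj("fo:block"));
        System.out.println(((TrackedBlock) b).source.name);
    }
}
```

With the default factory installed, areas carry no FObj reference, so standard processing keeps its memory characteristics.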
Re: Exception hierarchy.
[Glen] I'm not clear why you didn't derive ValidationException from SAXParseException. I know the locator is already present in FOPException, but absent the subclass from SAXParseException, it ends up being a different Locator object, i.e., user code that would handle a SAXParseException can't handle this FOPException. Perhaps just a nitpick, but I see a long-term advantage in having our ValidationExceptions subclass SAXParseException if possible. Sometimes when a PropertyException is thrown (f.ex. during evaluation of a relative numeric) the name of the property and the location are not available. In the patch I've added a catch in PropertyParser which adds the property name and location information to the PropertyException and rethrows the augmented exception. SAXParseException keeps the location information private, so it is not possible to change the location info after a SAXParseException is thrown. Other than that, the hierarchy could also have been rooted in SAXParseException. (Namely, if an embedded user of FOP wished to report both parsing and validation errors to the user in the same manner--a plausible scenario--it would be convenient to have ValidationException subclass SAXParseException.) ValidationException and SAXParseException both have SAXException as a parent, so a user can catch SAXException to deal with the different exceptions in a uniform manner. regards, finn
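The catch/augment/rethrow pattern Finn describes can be sketched as follows (a hypothetical, self-contained sketch; the class is not FOP's actual PropertyException):

```java
// Hypothetical sketch of augmenting and rethrowing an exception:
// the PropertyException is caught where the property name and
// location become known, augmented, and rethrown. SAXParseException
// could not be used this way because its location is immutable
// after construction.
public class AugmentSketch {
    static class PropertyException extends Exception {
        private String property = "?", location = "?";
        PropertyException(String msg) { super(msg); }
        void setContext(String prop, String loc) { property = prop; location = loc; }
        public String getMessage() {
            return location + ": " + property + ": " + super.getMessage();
        }
    }

    static void evalExpression() throws PropertyException {
        // At the throw site, the property name/location are unknown.
        throw new PropertyException("division by zero");
    }

    static void parseProperty(String name, String loc) throws PropertyException {
        try {
            evalExpression();
        } catch (PropertyException e) {
            e.setContext(name, loc);   // add info unavailable at the throw site
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            parseProperty("margin-left", "line 12");
        } catch (PropertyException e) {
            System.out.println(e.getMessage());
        }
    }
}
```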
Re: Exception hierarchy.
[Andreas] It seems more about not throwing SAXParseExceptions, but catching them and throwing a FOPException instead. Isn't that the issue? That a 'missingChildElementError' is actually not a SAXParseException... because it has nothing to do with SAX in itself [Glen] SAXParseExceptions would appear to be for errors that occur during the SAX parsing of an XML document (as a result of a user supplying an invalid XML document). It is not for errors that occur within the implementation of the SAX processor (e.g., Xalan) itself. You mean Xerces, right? Xerces throws SAXParseException for all kinds of invalid XML: <root><block></root> org.xml.sax.SAXParseException: The element type "block" must be terminated by the matching end-tag "</block>". SPE might be considered the XML parsing equivalent of a syntax error that would occur for a code compilation error. Right, errors at the XML level. As the definition of its parent SAXException[1] states, This class can contain basic error or warning information from either the XML parser *or the application* So it can be used within FOP. Indeed, and that is exactly what my proposal does: If the application needs to pass through other types of exceptions, it must wrap those exceptions in a SAXException or *an exception derived from a SAXException*. [1] http://xml.apache.org/xalan-j/apidocs/org/xml/sax/SAXException.html --it's a violation of a definition as formulated in the XSL-FO Rec. The error is treated and reported as being a SAXParseException only because the latter offers the convenience of a locator, but it should actually be something like No, because the SAXParseException is for such validation errors, regardless of what you're parsing. Thoughtful people can disagree about that. Learn once, use everywhere: Batik, FOP, Cocoon, Axis, etc., etc. (Also, as a minor additional benefit, it helps pluggability if all XSL processors adopted SAXParseException instead of each of us coming up with our own name for this.) Heh.
It appears that Batik has chosen to root their hierarchy in RuntimeException and Cocoon in Avalon. I don't know about Axis. I'm sure both projects have valid reasons for their decisions, but it is a pipedream to think all projects would ever use the same. the suggested 'ValidateException', or maybe something like 'FOParseException'. Using that logic, a SAXParseException would never be used anywhere, because SAX is usually used to validate the XML document. It is thrown by Xerces. And caught by all users of all SAX parsers. If it's no good for FOP, then it would be no good for Batik, and Cocoon, and Axis, and then we're back to rarely using Java exceptions directly in one's coding. Invalid argument. It is incorrect to draw conclusions based on this single case. And the other projects have already chosen differently. SAXParseException is there for a purpose: to be used, it's already coded and documented! Throwing a SAXParseException actually seems more confusing to the end-user (--he/she who, on average, reads little more than the first two lines of the stack trace :-) ) Maybe not, because a SAXParseException has the benefit of the user knowing it already (and if they don't know it, they will learn it *now*, and get to apply this knowledge for future parsing of other languages.) If that is your argument, then we *certainly* shouldn't throw SAXParseException, since users know that a SAXParseException means that the XML input can't even be parsed, much less transformed. When they see a SAXParseException they would begin by looking at the input XML, not at the output XML from the transformation. Also, our SAXParseExceptions each have detailed error messages telling them what the problem is. (The stack trace they usually won't see anyway, unless they choose to run in debug mode.) Good, so does ValidateException. snip/ What's wrong with an app having its own set of particular exceptions reflecting its overall design, according to 'what can go wrong and where'?
Excellent question. The reason is that exceptions are for the benefit of the user using the software, and he/she has usually no internal knowledge of the software product itself. The user doesn't know about FO trees, area trees, layout, renderers--this is all our language that is collectively a black box to the user. Furthermore, users don't care *where* the error occurred inside the black box (advanced users have the stack trace anyway), they do want to know *what* went wrong though, so they can group those *what's* together and report on them in a common manner. For example, I see two main errors that can occur with FOP: 1.) syntax error: elements not following (a) proper ordering or (b) invalid values for properties. These should raise a SAXParseException, with IMHO not *necessarily* a need to differentiate between the two because they both imply the same problem: badly written stylesheet, as well as the same solution: fix/redo your stylesheet. 2.) file not found
Re: meaning of Area.TreeExt interface
[Glen] Still, I wonder if it may be better to do away with this interface and just hardcode the layout/rendering of these objects (i.e. bookmarks) directly, just like we do all the other Area objects that are created during the layout process. This would simplify area.extensions.BookmarkData and area.AreaTreeModel processing code. Thoughts (anyone)? [me] That would make the life of extension writers harder. Thus I would prefer that TreeExt stay. [Glen] Finn, ***be very careful***, the definition of tree extension was #2, *not* #1 (see previous email). I don't like its name, it should be OutOfAreaObject or something like that. And why shouldn't extensions be allowed to create OutOfAreaObjects? Some extension writers will need that just like the author of fox:bookmarks needed it. regards, finn
Re: meaning of Area.TreeExt interface
[Glen] Actually, to clarify what I wrote: why do we need an extension interface (TreeExt) for objects placed outside the document (pdf bookmarks/metadata), while *not* needing one for extension objects that end up placed *on* the document? You subclass Area for elements on the document and TreeExt for elements outside the document. For the latter types, area creation is hardcoded within our layout managers (to be replaced by extension writers when they have new FO's), so I don't logically see why we can't do the same with the former types. How can an extension pass information onto the renderer if TreeExt (or whatever we want to call it) is removed? regards, finn
Re: meaning of Area.TreeExt interface
[Glen] OK, I'd like to 1) rename TreeExt to OffDocumentItem (to clarify what it's really for) 2) change it from an interface to an abstract base class. 3.) Have this base class hold the when parameter for when this ODI is to be generated, so the when does not need to be hardcoded into AreaTreeHandler as it is currently for PDF Bookmarks. (Helps with extensibility.) 4.) Rename Renderer.renderTreeExtension() to Renderer.processOffDocumentItem(). Any problems with this? No. regards, finn
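The four-point rename plan could be sketched roughly as follows (a hypothetical sketch of the proposed abstract base class; the constant names and `getWhenToProcess()` accessor are illustrative assumptions, not the actual patch):

```java
// Hypothetical sketch of the proposed OffDocumentItem abstract base
// class (formerly the TreeExt interface), holding the 'when'
// parameter so AreaTreeHandler need not hardcode the processing
// time for each item type.
public class OffDocSketch {
    static final int IMMEDIATELY = 0;
    static final int END_OF_DOC = 1;

    static abstract class OffDocumentItem {
        private final int whenToProcess;
        OffDocumentItem(int when) { whenToProcess = when; }
        int getWhenToProcess() { return whenToProcess; }
        abstract String getName();
    }

    // PDF bookmarks are only complete once the whole document is laid out.
    static class BookmarkData extends OffDocumentItem {
        BookmarkData() { super(END_OF_DOC); }
        String getName() { return "bookmarks"; }
    }

    public static void main(String[] args) {
        OffDocumentItem odi = new BookmarkData();
        System.out.println(odi.getName() + " " + odi.getWhenToProcess());
    }
}
```

A renderer's processOffDocumentItem() could then dispatch on the item itself rather than on hardcoded knowledge in the area tree handler, which is the extensibility gain point 3 is after.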
Exception hierarchy.
Hi Team, I would like to make some changes to the exceptions used in tree building. First, I dislike throwing SAXParseException from within FOP. In my opinion SAXParseExceptions are for reporting problems during the parsing of XML, not for reporting subsequent problems with the FO tree. Second, the exceptions used in FOP should form a structure where concrete subclass exceptions are thrown and a common superclass can be used for catching all the different exceptions. My proposal for the exception hierarchy looks like this:
- SAXException
  - FOPException
    - ValidateException
    - PropertyException
Putting all our exceptions into one hierarchy that inherits from SAXException helps reduce the number of wrapped exceptions. I've uploaded a patch which changes the FO tree building and property code. The benefit of the patch is the reduced number of blind catches: } catch (XXXException e) { // log.error(e); } Any objections? regards, finn
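The proposed hierarchy can be sketched in a few lines (a minimal sketch; constructors and members are illustrative, not the actual patch):

```java
import org.xml.sax.SAXException;

// Hypothetical sketch of the proposed hierarchy rooted in
// SAXException: concrete subclasses are thrown, and one catch of a
// common superclass handles them all, with no wrapping needed when
// they cross SAX event-handler boundaries.
public class HierarchySketch {
    static class FOPException extends SAXException {
        FOPException(String msg) { super(msg); }
    }
    static class ValidateException extends FOPException {
        ValidateException(String msg) { super(msg); }
    }
    static class PropertyException extends FOPException {
        PropertyException(String msg) { super(msg); }
    }

    public static void main(String[] args) {
        try {
            throw new ValidateException("missing child element");
        } catch (SAXException e) {     // common superclass catch
            System.out.println(e.getMessage());
        }
    }
}
```

Because SAX's ContentHandler methods declare `throws SAXException`, these exceptions propagate out of startElement()/endElement() without the blind catch-and-log blocks the patch removes.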
Re: Exception hierarchy.
[Glen] I'm confused--why is it OK to throw SAXExceptions but not its child SAXParseExceptions? With the latter, it just holds locator information necessary to pinpoint the problem for the user. Please elaborate. The way I see it, SAXException is the exception used to communicate across the ContentHandler event methods. And SAXParseException is used to communicate an actual error during parsing of XML. That is what the exception type specifies. And BTW, the patch does not throw any SAXExceptions. Java gives us a large family of predefined exceptions for use in our coding--I'm leery of proclamations of them being No Good and that each and every Java application must re-invent their own exceptions instead. That just seems sloppy. But the Java exceptions are Very Good. There just isn't one of them that specifies an error found during XSL:FO validation or during property evaluation of XSL:FO expressions. Glen (who wanted to get rid of FOPException and just rely on the Java-defined exceptions) IMHO reusing SAXParseException is exactly as wrong as reusing f.ex. FileNotFoundException, which is another fine exception, just not suited for use from within FO tree building. regards, finn
Re: meaning of Area.TreeExt interface
[Glen] Still, I wonder if it may be better to do away with this interface and just hardcode the layout/rendering of these objects (i.e. bookmarks) directly, just like we do all the other Area objects that are created during the layout process. This would simplify area.extensions.BookmarkData and area.AreaTreeModel processing code. Thoughts (anyone)? That would make the life of extension writers harder. Thus I would prefer that TreeExt stay. regards, finn
Re: FObj.bind() SAXParseException
[Glen] The FObj.bind() method throws a SAXParseException if this method hasn't been overridden in the FObj subclass. I'm unsure why a subclass override is required (as the javadoc comment for this method indicates)--wouldn't it be OK if we just have an empty method here without the exception (e.g. if you have no properties of concern for a particular fo)? That would seem to be preferable. I'm a bit unsure. Generally I wanted two ways of handling attributes/properties. Either - override processNode and optionally createPropertyList if the element wants to deal with the attributes directly, - or override bind if the element wants to take part in the whole property system. Outline.java is an example of an element that wants to handle the attributes directly. And I think that the other two elements, Label and Bookmarks, should do the same. So the preferred solution IMO would be to add a null createPropertyList and an empty processNode method to ExtensionObj. I mention this because I'm currently trying to run our pdfoutline.fo example, and this exception is being called because no bind() method has been implemented yet in the fox:bookmarks element. and +fobj = node; Apologies for the regressions. regards, finn
Handling XML parse errors when using identity transform.
Hi Team, I'm no expert on Java's XML transformer APIs so I need a little help. It appears to me that when using the identity Transformer from Xalan we no longer get notifications of XML parse errors. Using an obviously corrupt input fo:
<?xml version="1.0" encoding="UTF-8"?>
<xfo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
</xfo:root>
we get a message on stderr from Xerces' DefaultErrorHandler: [Fatal Error] t.fo:2:56: The prefix "xfo" for element "xfo:root" is not bound. and there is no way AFAICT to set a different error handler on the XMLReader that Xalan's transformer creates and uses to parse the input file. Am I missing something? regards, finn
Re: Handling XML parse errors when using identity transform.
[Glen] I'm not sure of the problem here, your input fo is not an XSL FO, because the xfo prefix wasn't bound to a namespace. Hence the Xerces error message--what is it that you're expecting instead? I would expect that the ErrorHandler methods in FOTreeBuilder (warning(), error() and fatalError()) were called. And that the message was written to our common-logging. regards, finn
Re: Handling XML parse errors when using identity transform.
[Andreas] Hi Finn, Not sure if this is what you're looking for, but the javax.xml.transform.Transformer class does offer a setErrorListener() method to use a custom Error Handler... IIUC, all we have yet to do is provide an implementation for a javax.xml.transform.ErrorListener and use it to route the error messages to our common-logging (instead of the org.apache.xml.utils.DefaultErrorHandler routing them to stderr or a supplied PrintStream or PrintWriter). I couldn't get that working. As I see it, the SAXException from the XML parsing is not passed through to the ErrorListener. The source code from Xalan confirms that. No ErrorHandler is passed to the XMLReader that is created in TransformerIdentityImpl.java. regards, finn
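One possible workaround (a sketch under the assumption that the transformer reuses an XMLReader supplied via SAXSource rather than creating its own, which is how TrAX SAXSource handling is commonly implemented): build the XMLReader yourself, set your ErrorHandler on it, and hand the transformer a SAXSource wrapping that reader.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Sketch: pre-configure the XMLReader's ErrorHandler and pass it in
// a SAXSource, so the transformer cannot substitute its own reader
// (and therefore its own error handling).
public class ErrorHandlerSketch {
    public static void main(String[] args) throws Exception {
        final boolean[] called = { false };
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXParseException {
                called[0] = true;   // here we would route to common-logging
                throw e;            // a fatal error still aborts the transform
            }
        });
        // Unbound prefix, as in the corrupt input above.
        String fo = "<xfo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'/>";
        SAXSource src = new SAXSource(reader, new InputSource(new StringReader(fo)));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        try {
            t.transform(src, new StreamResult(new StringWriter()));
        } catch (Exception expected) {
            // the parse error surfaces wrapped in a TransformerException
        }
        System.out.println(called[0] ? "handler called" : "handler not called");
    }
}
```

Whether this behaves the same in the 2004-era Xalan TransformerIdentityImpl that Finn inspected is an open question; it only demonstrates the SAXSource route around the problem.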
Defoe
Hi Peter, Did I miss the announcement? http://defoe.sourceforge.net/ regards, finn
Re: [Fwd: Re: Performance improvement in property consumption.]
[Peter] Alt-design just uses a sparse array, constructed at END_ELEMENT. Space savings are progressively realized as the depth of the FO Tree reduces. Maximum consumption occurs at the points of greatest depth of the tree, minima at the end of each page-sequence. [me] IIRC your sparse array does not just contain the relevant properties for an element (those listed in the spec under each fo:element), but the joined set of all properties relevant for all the allowed children. If this is correct, the sparse arrays store more properties and use more memory than my proposal does. [Peter] ... no, the sparse arrays contain only properties relevant to the node itself. If not, it's a bug. I misremembered; my apologies for giving the impression that FAD uses more memory. Thank you for setting me straight. regards, finn
Re: [Fwd: Re: Performance improvement in property consumption.]
Why is it more efficient (I know it is, given your metrics, but want to know why)--aren't you just moving the values already stored in the PropertyList into separate fields in the FO objects? Yes, you're releasing the PropertyList's memory, but the elements that the PropertyList previously stored are now stored in the FObj. So if PropertyList can be thought of as a C-like struct holding the values of its FObj's properties, what you're doing appears to be just taking that struct's member variables and moving them to the FObj. But, obviously, given the performance/memory boost you're noting, PropertyList *can't* be regarded as a C-like struct. Why? Could PropertyList be made more efficient instead of this change--make it more like a C-like struct? [Peter] It's a mixed bag, by the look of it. From the patch, applying to FOText:
+// The value of properties relevant for character.
+private CommonFont commonFont;
+private CommonHyphenation commonHyphenation;
+private ColorType color;
+private Property letterSpacing;
+private SpaceProperty lineHeight;
+private int whiteSpaceCollapse;
+private int textTransform;
+private Property wordSpacing;
+private int wrapOption;
+
+// End of property values
Note the combination of simple fields for whiteSpaceCollapse and more complex structures like CommonFont. I really like the combination of CommonXXX structures and simple fields, since it literally matches the spec. Except that the spec doesn't have a FOText element, so it is a bad example for me to use. The fields in FOText were taken from TextInfo and I suspect that eventually more fields will be needed in FOText. In the rest of the elements, the set of fields matches the spec. The only exception is a bug where some of the inline LayoutManagers use vertical-align, which is a shorthand. The layout managers should instead use the properties that the shorthand sets: alignment-baseline, alignment-adjust, baseline-shift and dominant-baseline.
But since only one of these properties has been implemented yet, I chose to keep the use of vertical-align for now. Alt-design just uses a sparse array, constructed at END_ELEMENT. Space savings are progressively realized as the depth of the FO Tree reduces. Maximum consumption occurs at the points of greatest depth of the tree, minima at the end of each page-sequence. IIRC your sparse array does not just contain the relevant properties for an element (those listed in the spec under each fo:element), but the joined set of all properties relevant for all the allowed children. If this is correct, the sparse arrays store more properties and use more memory than my proposal does. Finn has gone a step further, and collapsed the property structures into local variables, which is good for both memory consumption and speed, at the cost of some more code. IIUC. Correct. In addition I also get a certain level of typesafeness for the properties. regards, finn
Re: DO NOT REPLY [Bug 31206] - [PATCH] Improvements over the new line breaking algorithm
[Luca] IMPORTANT: I had to revert Finn Bock's changes to the PDFRenderer (dated 2004/09/22 13:22:16), otherwise leaders with svg use-content produce errors in the pdf output. There isn't any run-time error, but when I try to open the pdf file, I get these warnings: - There was an error processing a page. Wrong operand type. - Illegal operation 'q' inside a text object. - Wrong operand type. and the page with the svg leaders is left empty. I think it could be something involving the saveGraphicsState() method. I've just now fixed this issue so that your patch works with the PDFRenderer. regards, finn
Re: [Fwd: Re: Performance improvement in property consumption.]
[Clay] Which of the alignment-* properties is the one you're referring to that has been implemented? I was just looking at them from the point of view of the property subsystem, where only baseline-shift has been implemented. I didn't mean to imply that it actually works all the way through layout. regards, finn
Re: Performance improvement in property consumption.
[Glen] So if we did this at the FO level, in effect, we'd have to (1) store an instance variable of every valid property for each FO, given that we wouldn't know whether the FOEventHandler needs it beforehand, and [me] Yes. Which is massively more efficient than storing the exact same properties in a PropertyList. [Glen] Why is it more efficient (I know it is, given your metrics, but want to know why)--aren't you just moving the values already stored in the PropertyList into separate fields in the FO objects? Yes, you're releasing the PropertyList's memory, but the elements that the PropertyList previously stored are now stored in the FObj. Keep in mind that there are two different sets of properties: - The set of specified properties. - The relevant properties (as listed in the spec under each element). The existing PropertyList stores the specified properties in the super HashMap and has an additional cache which stores all retrieved properties. In my proposal the specified and the cached properties are still stored in the property list but only the relevant properties are retained in the fo object. So if PropertyList can be thought of as a C-like struct holding the values of its FObj's properties, what you're doing appears to be just taking that struct's member variables and moving them to the FObj. No, see above. But, obviously, given the performance/memory boost you're noting, PropertyList *can't* be regarded as a C-like struct. Why? Could PropertyList be made more efficient instead of this change--make it more like a C-like struct? Speed can be improved, but at the cost of additional memory. The beauty of my proposal is that we can pick the fastest implementation of property assignment and property lookup without worrying about the memory, because the property list is released. regards, finn
Re: Performance improvement in property consumption.
Keep in mind that there is 2 different sets of properties: - The set of specified properties. - The relevant properties (as listed in the spec under each element). The existing PropertyList stores the specified properties in the super HashMap and has an additional cache which stores all retrieved properties. [Glen] Ummm...just to be very careful so I can understand what you're saying--instead of retrieved properties above, did you mean relevant properties? No, the current PropertyList caches nearly all accessed properties. This was added (IIUC) to improve speed of repeated retrieval of the same property, as typically seen for inherited properties. You can find the caching in PropertyList.findProperty(). regards, finn
Re: Performance improvement in property consumption.
In my proposal the specified and the cached properties are still stored in the property list but only the relevant properties are retained in the fo object. [Andreas] Yes, and IIUC, at the same time, you're eliminating the need for downstream property queries having to be performed through the PropertyList, so the FObj's can communicate directly (--fewer underlying HashMap fetches...) So roughly: a. FObj1 asks FObj2 b. FObj2 probes its property instance variable c. FObj2 responds to FObj1 instead of a. FObj1 asks its PropertyList b. PropertyList asks FObj2's PropertyList c. FObj2's PropertyList queries its HashMap d. FObj2's PropertyList responds to PropertyList e. PropertyList responds to FObj1 No, at the startElement() event the property list exists for all the parent elements and they are used to answer all property queries, including the property functions and inheritance. So the process you outline above is unchanged. PS. I'm ignoring the handling of markers in my descriptions. regards, finn
Performance improvement in property consumption.
Hi Team, I've put together another largish patch that changes the way properties are used by the FO objects, the layout managers and the FOEventHandler subclasses. http://issues.apache.org/bugzilla/show_bug.cgi?id=31699 The performance increase is 42% while at the same time using 27% less memory. The numbers come from rendering the initial chapters of DocBook: The Definitive Guide [1] to a 131 page pdf file using jdk1.4.2_03. Performance is measured in wall-clock time and memory by finding the minimum -Xmx value where the document can be rendered (in steps of 100). The improvement in performance and in memory is due to PropertyList now being a temporary object that is only available to a FObj during the call to FObj.bind(). All fo elements are required to extract all the properties that they need and to store the values as instance fields in the fo element. When the PropertyList is a temporary object that isn't stored in the FObj, the size of the PropertyList does not matter and a much faster version can be used (implemented by StaticPropertyList). The patch also adds correct property resolution for fo:marker and fo:retrieve-marker. Children of fo:marker have a memory efficient version of their PropertyList stored within the Marker object and they are re-bound by fo:retrieve-marker each time a marker is retrieved. Any objections? [1] http://www.apache.org/~bckfnn/DocBookTDG.zip regards, finn
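The core idea of the patch, as described above, can be sketched like this. The names (FObj.bind(), the property names, the helper classes) are modeled on the discussion but simplified; this is an illustrative sketch of the lifecycle, not FOP's real code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the property list only exists for the duration of bind(); the FO
// object copies the values it needs (the "relevant" set) into plain
// instance fields, after which the list becomes garbage collectable.
public class BindSketch {

    static class TempPropertyList {
        private final Map<String, String> values = new HashMap<>();
        void put(String name, String value) { values.put(name, value); }
        String get(String name) { return values.get(name); }
    }

    // Stand-in for an FO element such as fo:block.
    static class BlockFO {
        // Relevant properties retained as fields; the list itself is not kept.
        String spaceBefore;
        String spaceAfter;

        void bind(TempPropertyList pList) {
            spaceBefore = pList.get("space-before");
            spaceAfter = pList.get("space-after");
            // pList is not stored; once bind() returns it is unreachable.
        }
    }

    public static BlockFO buildBlock() {
        TempPropertyList pList = new TempPropertyList();
        pList.put("space-before", "12pt");
        pList.put("space-after", "6pt");
        BlockFO block = new BlockFO();
        block.bind(pList);
        return block; // only the extracted fields survive
    }

    public static void main(String[] args) {
        BlockFO b = buildBlock();
        System.out.println(b.spaceBefore + " / " + b.spaceAfter);
    }
}
```

Because the list is short-lived, its internal implementation can favour lookup speed over compactness, which is the point made at the end of the earlier thread.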
Re: Performance improvement in property consumption.
[Glen] All fo elements are required to extract all the properties that they need and to store the values as instance fields in the fo element. I love the performance boost you're recording but have a philosophical issue of an fo object needing a property--it is really an issue of the *FOEventHandler subclass* that needs the properties--and some handlers need/implement more properties than others. Does any event handler need more or different properties than what is defined in the spec under each fo element? Because that is the set of properties that I store in the fo objects. If further properties are needed by any property consumer, they can just be added to the fo element. An example could be text-decoration on fo:block. So if we did this at the FO level, in effect, we'd have to (1) store an instance variable of every valid property for each FO, given that we wouldn't know whether the FOEventHandler needs it beforehand, and Yes. Which is massively more efficient than storing the exact same properties in a PropertyList. (2) instead of the programmatic convenience of getPropString(PR_OFFICIAL_PROPERTY_NAME), we'd have to remember what each of the properties are named in the FO's (We can probably standardize these though.) I've consistently picked getOfficialPropertyName() as the name of the getter method. Another clarification needed: when would the property list's memory get released--at the PageSequence level, or as soon as the FO is finished? At the endElement() event. Because the values of some FO's properties further downstream depend on an earlier FO's properties, we may not be able to release the memory when the earlier FO is finished. If a property value is needed by a following sibling, then it can be found as an instance field on the fo object that defined the property. Or, (to support inheritance) we may need to store instance variables for what may be a very unlikely event of a child needing that value. I don't understand what you mean. 
All property inheritance is resolved at the time of the child startElement() event. The resolution is done using the parent PropertyList while it still exists. Also, are there any other options for releasing this memory that you know of? Is there some way this .bind() or other releasing of memory can be done within the FOEventHandler subclasses, or at least within LayoutManager (forgetting about RTF) instead? I don't see any benefit at all from such approaches. PS. The description of releasing the property list only applies to normal elements. All elements under fo:marker are treated differently. regards, finn
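The lifecycle described here, one property list per open element, inheritance resolved at startElement() against the parent's still-live list, release at endElement(), can be sketched with a simple stack. All names and the map-based representation are illustrative assumptions, not FOP's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch: a property list exists for each open element only between its
// startElement() and endElement() events. Inherited values are resolved
// eagerly while the parent list still exists; popping at endElement()
// makes the list collectable.
public class PropertyLifecycle {
    private final Deque<Map<String, String>> openLists = new ArrayDeque<>();

    public void startElement(Map<String, String> specified) {
        Map<String, String> list = new HashMap<>();
        Map<String, String> parent = openLists.peek();
        if (parent != null) {
            list.putAll(parent); // inheritance resolved now, while the
        }                        // parent list is still alive
        list.putAll(specified);  // specified values override inherited ones
        openLists.push(list);
    }

    public Map<String, String> current() {
        return openLists.peek();
    }

    public void endElement() {
        openLists.pop(); // the element's property list is released here
    }

    public int openCount() {
        return openLists.size();
    }
}
```

At any point during parsing only the lists of the currently open ancestors are alive, which is what keeps the memory footprint bounded.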
Re: [GUMP@brutus]: Project xml-fop (in module xml-fop) failed
[Jeremias] I've switched the junit tasks to write to System.out instead of to a file. Hopefully, we will get more usable output this way. I don't know if we can get Gump's full output somewhere so we could check the test reports. BTW, I have no errors locally.

I get an error:

junit:
    [echo] Running basic functionality tests for fop-transcoder.jar
    [junit] Testsuite: org.apache.fop.BasicTranscoderTestSuite
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3,968 sec
    [junit] - Standard Output ---
    [junit] drawImage x=7 y=277 width=97 height=14 [EMAIL PROTECTED]: type = 2 DirectColorModel: rmask=ff gmask=ff00 bmask=ff amask=ff00 IntegerInterleavedRaster: width = 97 height = 14 #Bands = 4 xOff = 0 yOff = 0 dataOffset[0] 0
    [junit] - ---
    [echo] Running basic functionality tests for fop-transcoder-allinone.jar
    [junit] Testsuite: org.apache.fop.BasicTranscoderTestSuite
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0,281 sec
    [junit] - Standard Error -
    [junit] java.io.IOException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
    [junit] at org.apache.batik.dom.util.SAXDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.dom.util.SAXDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.dom.svg.SAXSVGDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.dom.svg.SAXSVGDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.fop.AbstractBasicTranscoderTestCase.testGenericPDFTranscoder(AbstractBasicTranscoderTestCase.java:70)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit] at java.lang.reflect.Method.invoke(Method.java:324)
    [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit] at junit.framework.TestResult.run(TestResult.java:109)
    [junit] at junit.framework.TestCase.run(TestCase.java:118)
    [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:325)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeInVM(JUnitTask.java:804)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:551)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:527)
    [junit] at org.apache.tools.ant.Task.perform(Task.java:319)
    [junit] at org.apache.tools.ant.Target.execute(Target.java:309)
    [junit] at org.apache.tools.ant.Target.performTasks(Target.java:336)
    [junit] at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
    [junit] at org.apache.tools.ant.Project.executeTargets(Project.java:1250)
    [junit] at org.apache.tools.ant.Main.runBuild(Main.java:610)
    [junit] at org.apache.tools.ant.Main.start(Main.java:196)
    [junit] at org.apache.tools.ant.Main.main(Main.java:235)
    [junit] - ---
    [junit] Testcase: testGenericPDFTranscoder(org.apache.fop.BasicPDFTranscoderTestCase): Caused an ERROR
    [junit] null
    [junit] Enclosed Exception:
    [junit] SAX2 driver class org.apache.xerces.parsers.SAXParser not found
    [junit] org.apache.batik.transcoder.TranscoderException: null
    [junit] Enclosed Exception:
    [junit] SAX2 driver class org.apache.xerces.parsers.SAXParser not found
    [junit] at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.fop.AbstractBasicTranscoderTestCase.testGenericPDFTranscoder(AbstractBasicTranscoderTestCase.java:70)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/render/rtf TableAttributesConverter.java
[Jeremias Maerki] Finn, it seems to me that you probably forgot to check in all your changes. FOP doesn't compile ATM. The method makeBorder is missing. Yes, I forgot. I'm sorry for the inconvenience. Glen, thank you for temporarily fixing my mistake. regards, finn
Re: Storing the Locator instance in FONode.
[Glen] I see. I liked Locator because I can get away with passing one parameter into vCN() instead of three. (But I don't think we'll need to change that portion of the code because whenever vCN() is being called, the Locator *is* accurate--would you agree?) Yes. regards, finn
Storing the Locator instance in FONode.
Hi Glen, I think that your recent change that stores the Locator instance in each FONode is an incorrect use of the Locator interface. The docs say: http://java.sun.com/j2ee/sdk_1.3/techdocs/api/org/xml/sax/Locator.html "Note that the results returned by the object will be valid only during the scope of each content handler method: the application will receive unpredictable results if it attempts to use the locator at any other time." And in fact, it is the same Locator instance that is stored in each FONode. I think it would be better to revert to using separate line/column/systemId fields if we want to know the location of each FONode. regards, finn
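The suggested fix, copying the locator's values into plain fields while the content-handler callback is still running, could look like this. FONode here is a minimal stand-in for FOP's class; only the SAX types (org.xml.sax.Locator, LocatorImpl) are real.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

// Sketch: instead of holding the Locator (whose values are only valid
// during each content-handler method), copy line/column/systemId into
// separate fields at startElement() time, as suggested above.
public class LocatorCopy {
    static class FONode {
        final int line;
        final int column;
        final String systemId;

        // Meant to be called during startElement(), while the Locator
        // still reports the position of this element.
        FONode(Locator locator) {
            if (locator != null) {
                this.line = locator.getLineNumber();
                this.column = locator.getColumnNumber();
                this.systemId = locator.getSystemId();
            } else {
                this.line = -1;
                this.column = -1;
                this.systemId = null;
            }
        }
    }

    public static void main(String[] args) {
        LocatorImpl loc = new LocatorImpl();
        loc.setLineNumber(42);
        loc.setColumnNumber(7);
        loc.setSystemId("doc.fo");
        FONode node = new FONode(loc);
        // The locator may now change (as the parser moves on) without
        // invalidating the node's recorded position.
        loc.setLineNumber(99);
        System.out.println(node.line + ":" + node.column);
    }
}
```

The main() demonstrates the point of the thread: once the values are copied, later mutation of the shared Locator no longer matters.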
Loading of extension element mappings.
Hi Team, I think the API for adding additional element mappings to the FOUserAgent is bound to cause problems when fop is deployed in a multi-classloader environment. Using Tomcat as an example, an extension mapping deployed in a webapp cannot be found if a well-meaning administrator has placed fop.jar in Tomcat's common/lib directory. This is because FOTreeBuilder.addElementMapping(String) uses Class.forName() to load the element mapping class. I see several possible solutions:

1) Let FOTreeBuilder.addElementMapping() also check Thread.getContextClassLoader() for the extension element mapping. This will work in environments that actually set the context classloader. Tomcat does, but other environments don't.
2) Pass in an explicit classloader to use when loading extension mappings.
3) Let the client pass the actual element mapping object, instead of just passing in the name of the element mapping class.

I'm strongly in favor of 3) because it Just Works (tm), and when a ClassNotFoundException does occur, the programmer immediately knows why and how to fix it. So I propose to change the API signature from FOUserAgent.addElementMapping(String) to FOUserAgent.addElementMapping(ElementMapping). regards, finn
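Option 3) can be sketched as follows. ElementMapping and FOUserAgent are stand-ins for the real FOP types; the point is only that the caller, whose classloader can see the extension, constructs the object, so FOP never needs Class.forName() at all.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of passing the mapping object instead of its class name.
public class MappingByInstance {
    interface ElementMapping {
        String getNamespace();
    }

    static class FOUserAgent {
        private final List<ElementMapping> mappings = new ArrayList<>();

        // Proposed signature: takes the instance, not a class name,
        // so no classloader lookup happens inside FOP.
        public void addElementMapping(ElementMapping mapping) {
            mappings.add(mapping);
        }

        public boolean supports(String namespace) {
            for (ElementMapping m : mappings) {
                if (m.getNamespace().equals(namespace)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        FOUserAgent agent = new FOUserAgent();
        // The webapp's own classloader loaded this code, so a
        // ClassNotFoundException on FOP's side is impossible.
        agent.addElementMapping(() -> "http://example.org/myext");
        System.out.println(agent.supports("http://example.org/myext"));
    }
}
```

If loading fails it now fails in the client's code, where the programmer can see which classloader was involved, which is the "Just Works" property argued for above.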
Re: [VOTE] Luca Furini for Committer
[Simon Pepping] I propose that we make Luca Furini a member of the FOP team. Here is my +1 vote. +1 regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/pagination Root.java
A (farfetched) argument against ThreadLocals would be that they prevent one FOP processing run from occurring recursively during another independent processing run. Like an extension element that itself will attempt to process another fo document. [Glen] I think that restriction is implicit in the XSL Recommendation, because fo:instream-foreign-object [1] has the requirement that its child be from a *non-XSL* namespace. If the rec intended FO documents to be processed recursively, they wouldn't have had a non-XSL namespace requirement (i.e., why bother to require users to have a finn:fodocument that would just wrap fo:root, if you can just use the latter tag directly?) Indeed, this restriction could be an argument *for* using ThreadLocals. No, not really. ThreadLocals should be used for storing data that is local to each thread (thread singletons). They are not a way to introduce global variables which just so happen to work in Tomcat. regards, finn
Re: [proposal] How to do logging from the command line
[Jeremias] Hi Finn, I took a look at it and I must say that I'm a bit confused, too. Anyway, I have a proposal to make. I would expect a command-line application to work like any other C-program such as grep, svn, ls or whatever. That means you don't get any [INFO] before each line, but you can define the log level (normally quiet, normal and verbose) through command line switches. That'll work for most CLI-users. Since the Fop class is the one to control the whole application it (or rather the CommandLineOptions class) can also set up C-L to behave exactly as explained above. That would probably mean not to use SimpleLog but to provide a special implementation for command-line use. At any rate, I don't agree with the way that SimpleLog is implemented. Informational messages should go to System.out, errors to System.err. Logging prefixes should be disabled. I've had to do the same for Avalon Logging in Barcode4J [1] and I'm very pleased with the result. Using this I was able to implement the Barcode4J CLI in a way that the generated barcode could be output to System.out which may also be desirable for certain people. You could even modify the whole thing in a way that you could implement FOP as a filter application getting the input through System.in and sending the output to System.out, giving error messages through System.err. For those who know about C-L and want to do some special logging we could have a command line switch that disables our special logger setup. They can fully control C-L from outside. For the cases where FOP is used embedded in a bigger software, FOP shouldn't manipulate anything in C-L; it's simply the developer's job to set up C-L from outside. WDYT? I like it. It will get consistent results for command line users without them (practically) ever having to do any logging configuration. I have added a patch with an implementation of your proposal. Feel free to point out any mistakes that I have made. 
Glen, I'm a bit confused about your objections. I don't think they apply to Jeremias' proposal at all.

- SimpleLog plays *no* part in this.
- There is no wrapping of C-L in any way; we just supply our own very simple specialized logging implementation.
- Users can still use other C-L loggers (jdk14, log4j, etc.) by setting LOGCHOICE in fop.bat.

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo FOTreeBuilder.java
[Glen] I think fox: should also be validated. It's, by definition, part of FOP core, and can't be added/adjusted to by anyone but us. Yes. I agree. regards, finn
Logging of exception.
Hi, I didn't follow the discussion in the spring about the command line -d option and commons-logging, so I'm likely missing some important pieces, but I'm a bit confused about the result. If I attempt to render a fatally corrupt input fo file like <fo:block/>, I get the expected SAXParseException message on the console and the "Turn on debugging for more information" line. When I turn on debugging, then the full exception is printed directly on the console (without using C-L!) and no information about the exception is sent to C-L. So the C-L debug level directly controls output to System.err. That makes very little sense to me. At the very least the exception should be logged to C-L, right? I would also guess that any additional System.err output should be controlled separately from C-L, with a -d option. regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/pagination Root.java
[Jeremias Maerki] Question to everyone: We currently don't have a multi-threaded design like Peter West's approach. Can anyone think of a reason that all the FO-building and layouting process for one processing run may run within more than one thread? I don't think there is one if the SAX event delivery is always within one thread (big if). If there isn't I believe we could make use of a ThreadLocal to put some shared data that will be easily accessible from all involved components. Initialize the ThreadLocal in startDocument() and clear it in endDocument(). I realize there's a certain risk involved but it could really shorten the access ways for shared data especially for the FO tree, if that's really a problem (I'd also like to know if it really is. Anyone?). A (farfetched) argument against ThreadLocals would be that they prevent one FOP processing run from occurring recursively during another independent processing run. Like an extension element that itself will attempt to process another fo document. My preference would be to explicitly pass the shared data (in this case the FOEventHandler) to the classes that need it. If the recursion Glen discovered is deemed too slow, then store the FOEventHandler in an instance field for each FONode. regards, finn
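The ThreadLocal pattern Jeremias describes, install in startDocument(), clear in endDocument(), would look roughly like this. RunState is a hypothetical class for illustration; the sketch also shows why a recursive run on the same thread would clobber the outer run's state, which is the objection raised above.

```java
// Sketch: per-run shared state kept in a ThreadLocal for the duration of
// one processing run. A second, recursive run on the same thread would
// overwrite CURRENT and then remove it, breaking the outer run.
public class RunState {
    private static final ThreadLocal<RunState> CURRENT = new ThreadLocal<>();

    private final String documentName;

    private RunState(String documentName) {
        this.documentName = documentName;
    }

    // Called from ContentHandler.startDocument().
    public static void startDocument(String name) {
        CURRENT.set(new RunState(name));
    }

    // Called from ContentHandler.endDocument().
    public static void endDocument() {
        CURRENT.remove();
    }

    // Any component on the same thread can reach the run's shared state
    // without it being passed explicitly.
    public static String currentDocument() {
        RunState s = CURRENT.get();
        return s == null ? null : s.documentName;
    }

    public static void main(String[] args) {
        RunState.startDocument("a.fo");
        System.out.println(RunState.currentDocument());
        RunState.endDocument();
        System.out.println(RunState.currentDocument());
    }
}
```

Explicitly passing the FOEventHandler, as Finn prefers, avoids both the hidden-global flavour of this and the recursion problem.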
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/pagination Root.java
My preference would be to explicitly pass the shared data (in this case the FOEventHandler) to the classes that need it. If the recursion Glen discovered is deemed too slow, then store the FOEventHandler in an instance field for each FONode. [Jeremias] My preference, too. But having too much in FONode will generate a lot of memory consumption. Then maybe we can select some specific node types that will carry an actual reference to the FOEventHandler. Like fo:root (obviously), fo:flow and fo:block, and all other node types will reference their parent. We can then pick the right tradeoff between memory and CPU. It's just a thought. regards, finn
Re: validateChildNode prevents extensions.
[Glen] Oh, when I meant alter the system I also meant adding classes, using different classes, etc., i.e., some things need to be done beforehand to accommodate a new element. Come on! That leaves 'alter the system' without any meaning. Recompiling modified sources into a new fop.jar would IMO mean altering the system. Here is how extensions should work: http://xml.apache.org/fop/dev/extensions.html and that is effectively what I have done. BTW, without divulging too much that may hurt your interests, would you mind explaining your reluctance to just modify FOP source (replace classes, etc.) for what you are trying to market? Is it licensing issues--or is it more for programmatic style/user convenience? I want to better understand your reluctance on this matter. Legally, the separation between FOP and the extension has placed a solid wall between my opensource fop-dev work and my commercial extension. My client can't claim ownership of any of the fop-dev work since the extension didn't require *any* changes to fop. Commercially, trying to sell a fork of FOP in order to sell an extension to FOP will never fly. regards, finn
Re: validateChildNode prevents extensions.
Extension writers have to create a subclass of AbstractLayoutManager (I just use a LeafNodeLayoutManager) and a subclass of Area. But those are normal operations since there are subclasses of those for each element type. I must also add code to the renderer to handle my area, but that is because the renderer code is not flexible enough to handle unknown area classes. I get around that limitation by plugging in a completely new PDF renderer. [Glen] OK, in other words, you have to change FOP source code to add a new XSL-based extension element. You have no problem philosophically with this. I was unclear. When I write an extension, I make *no* change to any existing FOP source files. And I do not replace any existing FOP class files with new versions. For an extension element I have to write new versions of the FO classes, a layout manager, an area and also a new PDF renderer, but these are all new files with new class names. Also, I can't replace the Block class with my subclass in the FOElementMapping without changing FOP sources. OK, you're saying again that you have to change the FOP source code in order to add a new XSL-based extension element, but now you're having a problem with adding *this* code. I don't want to change any FOP code! And if I have to change any FOP sources to add an extension, then FOP doesn't support extensions in the sense that I understand the word. I'm still not sure why this is causing you so much more consternation--isn't this trivial (even if annoying) compared to what you will need to do in the other classes? An extension mechanism where I can put an unmodified fop.jar and myextension.jar on the CLASSPATH and have it work is a defining issue to me. It is the mechanism that is currently paying my rent. Between changing code in 3 places with no validation for the user community (and tons of NPE/CCE complaints due to badly written FO's for the committers to deal with on the various lists), vs. 
changing the code in 4 or 5 places *with* validation for the user community (and few if any NPE/CCE complaints), you do have a bit of a tough case to make that the former would win a cost-benefit analysis. Having to subclass the parents will prevent two different unrelated extensions from both having fo:block's as parents. Not true--fo:block already allows 20 or so different types of children--all distinct from each other. (But to further simplify your coding, all you have to do is check that the namespace URI is www.finn.com or whatever and you can choose to allow everything from that namespace.) I meant that if I have to either subclass or change Block.java then two independently developed extensions cannot both be used in the same document. Remember, FOP is basically a compiler--like javac, you have to define the syntax it will accept, and that will require coding. Also, the only difference between an XSL FO and an extension FO is that the latter hasn't been blessed by the W3C yet. Both need validation and coding. I don't mind having to write validation code, but I seriously mind having to write it in existing FOP sources. It has worked quite well before. Without changing any of FOP's layout or area sources. True. But in the old version, FOP would happily accept, say, an fo:root as a child of an fo:block, or an fo:layout-master-set as a child of an fo:list-item, and subsequently NPE while processing. That model simply had to change. Nobody would respect/use a product that operates that way, or a product based on such a product. Fine, but that can never be an argument for removing support for independent extensions. I mean exactly that the validation should be loose. If I need a finnbock:foobar tag as a child of fo:block, fo:block should not prevent me from doing so. Yes it should, if fo:block is to be XSL compliant. The Recommendation defines the content model (i.e., the children) that fo:block is to follow and their ordering. 
We modify that for the extension elements predefined within FOP--as we're allowed to. Then I, as an extension writer, also want to change the validation, but without having to modify existing FOP sources. So the validation has to somehow be made pluggable. Perhaps the extension elements can implement a tag interface to indicate that the extension shouldn't be validated by the parent. Or perhaps the checking could be disabled entirely. Actually, we could do that--fairly simple--let's see what the other committers have to say. We can add a boolean to FOUserAgent called disableValidation, have FOTreeBuilder read it every time it sees an FO in the stream, and on the basis of that decide whether or not to call vCN(). I see two problems though: (1) performance issue--this boolean will need to be read for *every* node in the FO document, I would guess that a check on a boolean for each node is a bit faster than calling vCN for each node. But I have not measured it. (2) FOP will be raising NPE and CCE errors for
validateChildNode prevents extensions.
Glen, I think that the new validateChildNode() methods are too strict with respect to extension elements. My guess is that the validation should only occur when one fo namespace element is added to another fo element. For instance, Block.validateChildNode() doesn't allow any of my extension elements as children. regards, finn
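The looser validation Finn is asking for, strict checking for fo-namespace children, pass-through for everything else, could be sketched like this. The method name mirrors FOP's validateChildNode(), but the child list is a truncated illustration, not fo:block's full content model.

```java
// Sketch: validate real FO children against the content model, but let
// elements from unknown (extension) namespaces through so independent
// extensions can add children without modifying FOP sources.
public class LooseValidation {
    static final String FO_URI = "http://www.w3.org/1999/XSL/Format";

    // Stand-in for Block.validateChildNode(); the allowed-child list
    // here is deliberately incomplete, for illustration only.
    public static boolean validateChildNode(String namespaceURI,
                                            String localName) {
        if (FO_URI.equals(namespaceURI)) {
            // Strict content-model check for genuine FO children.
            return localName.equals("block")
                || localName.equals("inline")
                || localName.equals("external-graphic");
        }
        // Unknown namespace: assume it is an extension element and leave
        // validation to the extension's own element mapping.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(validateChildNode(FO_URI, "block"));
        System.out.println(validateChildNode(FO_URI, "root"));
        System.out.println(
            validateChildNode("http://example.org/myext", "foobar"));
    }
}
```

This keeps fo:root-under-fo:block style errors caught for XSL content while leaving finnbock:foobar-style children to the extension author.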
Re: [Bug 29124] - New line breaking algorithm
Simon Pepping wrote: TextInfo derives that value from the value of the word-spacing property. That must be an error in the property handling by FOP. Yes. I'll commit a fix for it tomorrow. In FOPropertyMapping 0pt is used as the default value. Apparently normal should be the default value. I am not sure how this can be done for a space property. Something similar happens for length properties, which can have the default value auto. I think normal should be a keyword. Keywords are for mapping one value into another value before the values are parsed. In this case 'normal' should be implemented as an enum. Apparently, the actual value can only be calculated at layout time, when the font is known. Yes, so the consumers of the 'word-spacing' value must check if the value as an enum == NORMAL before using the value as a Space:

Property wordSpacing = propertyList.get(PR_WORD_SPACING);
if (wordSpacing.getEnum() == NORMAL) {
    textInfo.wordSpacing = new SpaceVal(
        new MinOptMax(min, opt, max), true, true, 0);
} else {
    textInfo.wordSpacing = new SpaceVal(wordSpacing.getSpace());
}

regards, finn
Re: Switch from AddLMVisitor to FObj.addLayoutManager()
Glen Mazza wrote: I still prefer my solution. Ok, but please consider that it will then become somewhat more difficult to add an alternative layout subsystem. Right now I just have to replace AddLMVisitor (and the XXXLayoutManager classes). As far as I understand your proposal, I would have to replace FOElementMapping and subclass the FO tree classes in order to plug in a new set of XXXLayoutManager classes. I'm all for simpler code, as long as it doesn't eliminate too many of the features that I use wink. BTW, thanks again for dropping the HEAD code base from 900 to 600 classes and getting rid of all that autogenerated code. You have definitely made FOP more accessible to the programming masses! Curiously, that was never my goal. I just wanted to reduce the amount of mailing list discussion on the topic of generated code. regards, finn
Re: Switch from AddLMVisitor to FObj.addLayoutManager()
[Glen Mazza] I still prefer my solution. Ok, but please consider that it will then become somewhat more difficult to add an alternative layout subsystem. [Glen Mazza] Layout subsystems are much rarer than people think, so not that much of a concern IMO. Keyword here is somewhat more difficult--it's not insurmountable--and compared to the vast complexity of creating a layout subsystem--almost noise level. It will be a few more headaches, but you'll have a better LS in HEAD to base your code off of as a result. IMHO the more closely integrated FO tree and LM tree classes are a worse starting point. Victor had the right intentions on this. [...]. I'm seeing the increase in patches to FOP that will result from this change as a greater benefit*, because (1) only a small minority of FOP users create their own layout strategies while the majority benefits from a faster-created 1.0, and (2) more patches give alternative LS people more and better code for them to base their work on, and in some cases may even remove the need to have an alternative LS at all. You seem to assume that the one default layout system in FOP can satisfy all requirements. I doubt that. The default layout system should be good, but it shouldn't try to solve everything perfectly IMO. Perhaps Jörg disagrees wink. *You are a good example, because if it's more difficult for you to create an alternative, you'll be more likely to come back to FOP and start making improvements to HEAD's layout strategy! GOOD! (Please do!) I would rather you work on FOP than on a competitor for it. My alternative layout subsystem isn't an alternative to FOP but an alternative way of implementing the getNextBreakPoss() code. I do not yet know if anything will come of it, but it looks quite promising. If it works, I'll post it as a patch for discussion. regards, finn
Re: Switch from AddLMVisitor to FObj.addLayoutManager()
Glen Mazza wrote: Team, while I'm implementing the validateChildNode() methods, I would also like to work on removing the AddLMVisitor visitor pattern [1], and revert to our previous method of having the FO's themselves set up the LayoutManagers via addLayoutManager(). (See here [2][3][4] for examples of the previous way of a year ago.) I would consider that a small step backwards because I liked the separation between the FO tree and the LM tree. But I have no great love for the AddLMVisitor, so I wouldn't mind at all if it disappeared. But instead of putting all the LM strategy code into the FO tree classes, I would suggest that the creation of the LM tree is done with a LayoutManagerMapping and a series of Maker classes, one for each FO class. I'll upload a patch to demonstrate the idea. regards, finn
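The LayoutManagerMapping-plus-Makers idea proposed above could be sketched like this. All interfaces and names here are illustrative stand-ins modeled on the proposal, not the patch itself.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: instead of each FO class creating its own layout manager, a
// mapping holds one Maker per FO class. An alternative layout subsystem
// then only has to supply a different mapping with different makers,
// leaving the FO tree classes untouched.
public class LayoutManagerMapping {
    interface LayoutManager {
        String name();
    }

    interface Maker {
        LayoutManager make();
    }

    private final Map<String, Maker> makers = new HashMap<>();

    public void register(String foName, Maker maker) {
        makers.put(foName, maker);
    }

    public LayoutManager makeLayoutManager(String foName) {
        Maker maker = makers.get(foName);
        if (maker == null) {
            throw new IllegalArgumentException("No maker for " + foName);
        }
        return maker.make();
    }

    public static void main(String[] args) {
        LayoutManagerMapping mapping = new LayoutManagerMapping();
        mapping.register("fo:block", () -> () -> "BlockLayoutManager");
        // Swapping the layout subsystem means registering different
        // makers; the FO tree never references a concrete LM class.
        System.out.println(mapping.makeLayoutManager("fo:block").name());
    }
}
```

This preserves the FO-tree/LM-tree separation that Finn wants while still removing the visitor pattern.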
Re: [VOTE] Simon Pepping for Committer
Glen Mazza wrote: I'd like us to go ahead and make Simon Pepping our newest FOP team member. He has provided steady ML help and numerous patch contributions for the past few months, and with the many layout patches that have been coming in recently, we could certainly use his continued help on our team. +1 regards, finn
Re: [PATCH] Support for percentages and table-units
[Finn Bock] I don't understand how you propose to solve any of this, but I hope it would be Ok to commit the straightforward solution I propose. [J.Pietschmann] Whatever works. I just want to note that given the almost one-to-one correspondence between FOs and LMs both in classes and instances (with the exceptions of page, column and line LM), the only advantages of having LMs are - code reuse by inheritance - no layout related data in the FO, for better sharing/reuse. Keeping area dimensions in the FO kills the latter. That is not correct. Temporarily storing the area dimension in the FO tree just long enough for getNextBreakPoss() to return does *not* in any way prevent reusing the FO tree or the LM tree for another rendering run. There are also more good reasons for having an LM tree than just code reuse, the lineLM and a separate place for the layout logic just to name two. For storing reference measurements for resolving in the layout context, you have only to keep track of inheritable properties, which are basically font-size, ipd and bpd. References to specified values (in contrast to computed values) can be handled by splicing in the parsed property expression for the referenced property as replacement for the referencing function. I don't know what splicing means, but the issue that I don't understand your solution to is when a child fo makes a reference to a computed value that is an expression (like 10% of the IPD of 'a') in a parent fo:

<fo:block id="a" border-start-width="10%">
  <fo:block id="b" border-start-width="inherit">
  </fo:block>
</fo:block>

This way the FO tree holds properties (parsed property expressions), while the layout context and the area tree hold the refined traits. I propose to store the specified expression (the 10% of the IPD of 'a') in the parent fo. But if that design is chosen, any reference to that property in the parent fo must ensure that the base value that is in effect for the parent fo is available. 
The base values for the parent fo (IPD of 'a') is not currently available in the layout context for 'b'. Only the IPD of 'b' value exists in the LayoutContext for 'b'. And there is not even a single area created yet. Using the LayoutContext would be a clean design, but first the LayoutContexts must be linked together with the LayoutContext from the parent fo and the LayoutContext must have a reference to the fo element so that the LengthBase can decide how far up the LayoutContext chain it should go in order to find the base value that is valid for the parent fo. I'd be +0 to volunteer wink somebody to implement that. Should we delay my proposed patch until somebody has come up with an implementation that pass the LayoutContext to all Length.getValue(lc) calls? regards, finn
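The linked-LayoutContext idea discussed above, each context referencing its parent so a LengthBase can walk up to the base value that was in effect for the fo the percentage was specified on, might look like this. All names and the resolution method are hypothetical, illustrating the design rather than implementing FOP's actual API.

```java
// Sketch: a chain of layout contexts, one per fo element, so a percentage
// inherited by a child can still be resolved against the IPD that was in
// effect for the parent fo that specified it.
public class LayoutContextChain {
    static class LayoutContext {
        final LayoutContext parent;
        final String foId; // id of the fo element this context belongs to
        final int ipd;     // inline-progression-dimension, in millipoints

        LayoutContext(LayoutContext parent, String foId, int ipd) {
            this.parent = parent;
            this.foId = foId;
            this.ipd = ipd;
        }

        // Walk up until we reach the context of the fo element that the
        // percentage was specified on, then resolve against its IPD.
        int resolvePercent(double percent, String specifiedOnFoId) {
            LayoutContext lc = this;
            while (lc != null && !lc.foId.equals(specifiedOnFoId)) {
                lc = lc.parent;
            }
            if (lc == null) {
                throw new IllegalStateException(
                        "No context for fo '" + specifiedOnFoId + "'");
            }
            return (int) Math.round(lc.ipd * percent / 100.0);
        }
    }

    public static void main(String[] args) {
        // border-start-width="10%" specified on 'a', inherited by 'b':
        // it must resolve against the IPD of 'a', not the IPD of 'b'.
        LayoutContext a = new LayoutContext(null, "a", 400000);
        LayoutContext b = new LayoutContext(a, "b", 380000);
        System.out.println(b.resolvePercent(10.0, "a"));
    }
}
```

This is the missing piece identified above: the chain plus a reference to the defining fo element lets the resolver pick the right base value.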
Re: [PATCH] Support for percentages and table-units
If an expression references another expression in a parent fo, the parent fo expression must be evaluated against the LayoutContext that was in effect for the parent fo and *not* against the child fo LayoutContext:

  <fo:block id="a" border-start-width="10%">
    <fo:block id="b" border-start-width="inherit"/>
  </fo:block>

It must be the LayoutContext for 'a' that is used when we evaluate the 10%, even when we call propertyList.get(PR_BORDER_START_WIDTH).getValue(lc) with the layout context for 'b'. [Simon Pepping] [leader example snipped] The 30% must be calculated w.r.t. the page width for both leaders, and both leaders must be equally wide. This is not the current result. When I tried to remedy this, I got confused by the sets of symbols on PercentBase and on LengthBase. Why are there 2 sets of symbols? The idea was that PercentBase.XXX names the stored values and LengthBase names the algorithm for looking up a base value. Most of the time these map one-to-one to each other, but for some I imagined that they would be different. margin-top f.ex. should really use an algorithm like BLOCKIPD_OR_PAGEHEIGHT which would look for either PercentBase.BLOCK_IPD or PercentBase.PAGE_HEIGHT depending on the fo:element. Clearly all of this isn't completely implemented in my patch yet and perhaps it can be done in a simpler way. Better names for PercentBase and LengthBase could help. They are mapped onto each other in LengthBase.getBaseLength(), but not all symbols are mapped, e.g. LengthBase.CONTAINING_BOX, which is used by leader-length, has no corresponding action. Some further integration of old and new code must be done here. Correct. Question: What to do if I insert a fo:wrapper between fo:flow and fo:block? That makes it the parent of fo:block, but it has no layout manager corresponding to it. Override getLayoutDimension on it? I don't think so, but perhaps I just don't yet understand the situation that you are describing.
A percentage on the wrapper should be resolved against the parent of the wrapper, right? regards, finn
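The PercentBase/LengthBase split described above — stored values on one side, a lookup algorithm on the other — can be sketched in miniature. This is a speculative illustration, not FOP code: the names (LengthBaseDemo, BLOCK_IPD_OR_PAGE_HEIGHT as a concrete object, resolvePercent) and the lambda-based shape are invented for this sketch.

```java
// Hypothetical sketch of the split described above: PercentBase names a
// stored value, LengthBase names the algorithm that picks which stored
// value to use. A LengthBase like BLOCKIPD_OR_PAGEHEIGHT chooses between
// two PercentBase entries depending on the element's context.
import java.util.EnumMap;
import java.util.Map;

public class LengthBaseDemo {
    enum PercentBase { BLOCK_IPD, PAGE_HEIGHT }

    interface LengthBase {
        int baseLength(Map<PercentBase, Integer> stored, boolean isPageContext);
    }

    // margin-top style lookup: containing block IPD normally, page height
    // when the property is used in a page context (as the message suggests).
    static final LengthBase BLOCK_IPD_OR_PAGE_HEIGHT =
        (stored, isPageContext) ->
            stored.get(isPageContext ? PercentBase.PAGE_HEIGHT : PercentBase.BLOCK_IPD);

    public static int resolvePercent(double pct, LengthBase base,
            Map<PercentBase, Integer> stored, boolean isPageContext) {
        return (int) (base.baseLength(stored, isPageContext) * pct / 100.0);
    }

    public static void main(String[] args) {
        Map<PercentBase, Integer> stored = new EnumMap<>(PercentBase.class);
        stored.put(PercentBase.BLOCK_IPD, 300000);   // millipoints
        stored.put(PercentBase.PAGE_HEIGHT, 800000);
        System.out.println(resolvePercent(10, BLOCK_IPD_OR_PAGE_HEIGHT, stored, false)); // 30000
        System.out.println(resolvePercent(10, BLOCK_IPD_OR_PAGE_HEIGHT, stored, true));  // 80000
    }
}
```

The point of the two-sided design is visible here: adding a new lookup algorithm does not require adding a new stored value, and vice versa.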
Re: [PATCH] Support for percentages and table-units
[Peter B. West] Finn, When I apply your most recent patch (10366) against a cvs updated HEAD tree and attempt to compile, I get the following: [javac] /usr/local/src/fop-HEAD-finn/src/java/org/apache/fop/fo/properties/LinearCombinationLength.java:60: After applying the patch (10366), the following files should be removed: src/java/org/apache/fop/fo/properties/LinearCombinationLength.java src/java/org/apache/fop/fo/properties/MixedLength.java Does anyone know how to include a file-remove in a patch? regards, finn
Re: [PATCH] Support for percentages and table-units
[me] If an expression references another expression in a parent fo, the parent fo expression must be evaluated against the LayoutContext that was in effect for the parent fo and *not* against the child fo LayoutContext:

  <fo:block id="a" border-start-width="10%">
    <fo:block id="b" border-start-width="inherit"/>
  </fo:block>

It must be the LayoutContext for 'a' that is used when we evaluate the 10%, even when we call propertyList.get(PR_BORDER_START_WIDTH).getValue(lc) with the layout context for 'b'. [J.Pietschmann] Well, I used to believe the 10% has been evaluated already, and the inherited property can grab the absolute value immediately. If it is evaluated already, where would the evaluated value be stored? In the propertyList (aka the FO tree)? And then the value should be reverted to the expression when the base value changes due to breaks. Storing the resolved value would IMO remove all the benefits of passing a context parameter to getValue(). Perhaps it can be done that way too, but it is very different from my proposal. regards, finn
Re: [PATCH] Support for percentages and table-units
If it is evaluated already, where would the evaluated value be stored? [J.Pietschmann] The layout context for the child LM could be an appropriate place. The resolved property values of the parents should be stored in the layout context? I must be missing something here! And then the value should be reverted to the expression when the base value changes due to breaks. No problem, this is known at the place where a new layout context is created for getting BP from the child LM. I don't understand how you propose to solve any of this, but I hope it would be OK to commit the straightforward solution I propose. Then you can change it at a later time to pass in the LayoutContext as a parameter to Length.getValue(). regards, finn
Re: DO NOT REPLY [Bug 26778] - [PATCH] Support for percentages and table-units
[Simon Pepping] I like the patch and the way RelativeNumericProperty holds and evaluates an expression tree (except my different preference for storing layout information, as discussed). This is really nice and works well: v = (((0mpt +(4000mpt +20.0%)) +0mpt) +0mpt) I found a few things that may need modification: In LengthBase.java: /** array of valid percent-based length types */ public static final int[] PERCENT_BASED_LENGTH_TYPES = { CUSTOM_BASE, FONTSIZE, INH_FONTSIZE, CONTAINING_BOX, CONTAINING_REFAREA } ; add BLOCK_WIDTH and BLOCK_HEIGHT (Is this variable used anywhere?) I don't think so. Block, BlockContainer, Table, Region, PageLayoutManager: getters and setters should be removed again? Yes. In LineLayoutManager.java: private Length iTextIndent; remove the i from the name; it stands for int. Yes. regards, finn
Re: [PATCH] Support for percentages and table-units
[J.Pietschmann] The layout context has the actual IPD MinOptMax. There is no inherent reason it should have a link to a parent context or the property subsystem; it's only necessary to have a method to resolve a property expression given a set of MinOptMax for the various traits which can be used as references for percentages. Like textIndent=propertyManager.get(TEXT_INDENT).resolve(layoutContext); Right, but it doesn't have all the base lengths. For some base lengths it is one of the parent layout contexts that has the trait, like the ipd of the containing reference area. How could we get hold of that trait if the LayoutContexts aren't linked together? Whatever object is passed to the resolve() method must AFAICT also have a reference to the FO so that resolve() can find the right LayoutContext in the LayoutContext tree to use as base. Perhaps I'm missing something fundamental in your suggestion, but I just can't see how there is enough information in a single LayoutContext to resolve relative lengths that are inherited from a parent fo:element. [Simon Pepping] In the LayoutContext design, one does not climb a tree to find the relevant information; instead, the information is propagated downward for ready reference. Each getNextBreakPoss call gets a LayoutContext from its caller, and constructs one for its own calls to getNextBreakPoss. It should make sure that the latter LayoutContext contains all the relevant information, such as the width of the containing block and of the containing reference area. Either it copies it from the LayoutContext it received from its caller, or it inserts new values, as appropriate. In principle it should contain the dimensions that correspond to the percent based length types listed in LengthBase. If an expression references another expression in a parent fo, the parent fo expression must be evaluated against the LayoutContext that was in effect for the parent fo and *not* against the child fo LayoutContext.
  <fo:block id="a" border-start-width="10%">
    <fo:block id="b" border-start-width="inherit"/>
  </fo:block>

It must be the LayoutContext for 'a' that is used when we evaluate the 10%, even when we call propertyList.get(PR_BORDER_START_WIDTH).getValue(lc) with the layout context for 'b'. I don't really care how that is done, but I would link the LayoutContexts to their parent. Or rather, I would use the LMs as the objects for storing the actual dimensions since they are already linked together with their parent. But for now I think we should just use the FO tree and later change the signature of getValue() to take some kind of context parameter when a real need arises. regards, finn
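Simon's "propagate downward" alternative — no tree climbing, each LM copies the reference dimensions its children will need into the context it hands down — can also be sketched. Again a hypothetical toy with invented names (DownwardContextDemo, the string keys), not FOP's real classes:

```java
// Sketch of the downward-propagation model: a child LayoutContext starts
// from a copy of the parent's reference dimensions, and the caller then
// overrides whatever it redefines. No upward links are needed because
// everything relevant was copied down. All names are invented.
import java.util.HashMap;
import java.util.Map;

public class DownwardContextDemo {
    static class LayoutContext {
        final Map<String, Integer> baseLengths = new HashMap<>();
        LayoutContext() {}
        // Copy the parent's reference values so they stay available below.
        LayoutContext(LayoutContext parent) {
            baseLengths.putAll(parent.baseLengths);
        }
        void set(String key, int mpt) { baseLengths.put(key, mpt); }
        int get(String key) { return baseLengths.get(key); }
    }

    public static void main(String[] args) {
        LayoutContext pageLC = new LayoutContext();
        pageLC.set("containing-refarea-ipd", 400000);
        pageLC.set("containing-block-width", 400000);

        // A block LM narrows the containing block for its children, but the
        // containing reference-area IPD is carried down unchanged.
        LayoutContext blockLC = new LayoutContext(pageLC);
        blockLC.set("containing-block-width", 360000);

        System.out.println(blockLC.get("containing-refarea-ipd"));  // 400000
        System.out.println(blockLC.get("containing-block-width"));  // 360000
    }
}
```

The copy-down approach trades the FO reference and chain walk for a per-context copy of "the dimensions that correspond to the percent based length types listed in LengthBase", as the message puts it.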
Re: [PATCH] Support for percentages and table-units
Somehow, in our current design, the information must be stored in an object that exists: [J.Pietschmann] IIRC that's what the layout context was meant for. Perhaps, but I doubt it. If they were changed to always get a reference to the parent layout context when they are created, and if they had a reference to the FObj, and if they were made available to the property subsystem, then they could probably be used for it. I still think it is easier to use either the FOs or the LMs. regards, finn
Re: [PATCH] Support for percentages and table-units
[Simon Pepping] If in the re-use the layout would not change, the area tree could be reused. OTOH, if the layout would change, e.g. because another renderer would use a font with a different font metric, the layout information in the FO tree cannot be reused. The layout dimension that is stored in the FO tree is constantly updated during discovery of BreakPoss'es and is never reused, not even when a block is split over a break where new values are assigned. The solution I propose makes it impossible to run two different renderers concurrently, but it does not in any way prevent the FO tree from being reused with another renderer after the first rendering is finished. This is another argument to separate FO tree and layout information. No, not really IMO. regards, finn
Re: [PATCH] Support for percentages and table-units
Perhaps, but I doubt it. If they were changed to always get a reference to the parent layout context when they are created, and if they had a reference to the FObj, and if they were made available to the property subsystem, then they could probably be used for it. [J.Pietschmann] The layout context has the actual IPD MinOptMax. There is no inherent reason it should have a link to a parent context or the property subsystem; it's only necessary to have a method to resolve a property expression given a set of MinOptMax for the various traits which can be used as references for percentages. Like textIndent=propertyManager.get(TEXT_INDENT).resolve(layoutContext); Right, but it doesn't have all the base lengths. For some base lengths it is one of the parent layout contexts that has the trait, like the ipd of the containing reference area. How could we get hold of that trait if the LayoutContexts aren't linked together? Whatever object is passed to the resolve() method must AFAICT also have a reference to the FO so that resolve() can find the right LayoutContext in the LayoutContext tree to use as base. Perhaps I'm missing something fundamental in your suggestion, but I just can't see how there is enough information in a single LayoutContext to resolve relative lengths that are inherited from a parent fo:element. I still think it is easier to use either the FOs or the LMs. Maybe. Well, the approach using FOs is here already. It can't get easier than that. regards, finn
[PATCH] Support for percentages and table-units
[Simon Pepping] Finn, I see you also solved another problem, viz. that fo:layout-master-set did not return a proper IPD. Correct. There are, I'm certain, many more cases where the layout managers do not yet set all the dimensions needed to resolve all the different percentages. However, I am not happy with your solution. During the layout process, you feed the page dimensions back into the FO tree, in PageLayoutManager.createPageAreas. Yes, and in BlockLayoutManager and in all the other LMs that define base lengths for percentages. I think it would be a better design if, in order to resolve the percent-based properties, you would not climb the FO tree but the Area tree. That avoids feeding back results from the Area tree into the FO tree. The XSL spec says (XSL-PR/slice3.html#section-N1208-Conceptual-Procedure): 'The formatting object supplies parameters to its children based on the traits of areas already in the area tree, possibly including areas generated by the formatting object or its ancestors.' Holding those parameters in the FO tree is one solution. Extracting them from the area tree laid out up to that point is another solution, But there are *no* areas laid out or even created at the point where we need resolved lengths. When getNextBreakPoss is called, we only have a series of BreakPoss's stored in different layout managers. The Area objects are created much later, when we have BreakPoss's for an entire page. PageLM then calls addAreas which recursively creates all the areas for the page. which has the advantage that it does not load the FO tree with information that it does not need to have. I initially had a separate PropertyContext object where the lengths were stored. The FO element then had a reference to the PropertyContext and there was a PropertyContext for every FO. But since there was a one-to-one correspondence between the FOs and the PropertyContexts, I decided to put the information directly in the FOs instead.
Somehow, in our current design, the information must be stored in an object that exists: - when the properties are parsed. - when the break possibilities are created. The FO nodes fulfill both requirements. Perhaps the dimensions should be stored in the layout manager tree, but the LM tree is not available when the properties are parsed and there is no way to get from an FO to the LMs that the FO creates. regards, finn
Re: JUnit test failure
[J.Pietschmann] Hi all, I get a nice JUnit failure: Testcase: testFO2PDFWithDOM took 0.23 sec Caused an ERROR loader constraints violated when linking org/w3c/dom/Node class java.lang.LinkageError: loader constraints violated when linking This seems to have something to do with mixing jars from the JDK and fop/lib. Does anybody have an idea how this can be avoided? Any new information on how to avoid this error? Or perhaps a description of an environment where it doesn't occur. I can't make the hint in the JUnit FAQ work for me. I also think junit3.8.1 already has the lines in its excluded.properties. regards, finn
Re: cvs commit: fop.bat
[EMAIL PROTECTED] wrote:

Index: fop.bat
===
-java -cp %LOCALCLASSPATH% org.apache.fop.apps.Fop %1 %2 %3 %4 %5 %6 %7 %8
+rem 'shift' removes %0 (i.e., the fop.bat filename)
+shift
+java -cp %LOCALCLASSPATH% org.apache.fop.apps.Fop %*

Last time I had to work on cmdline args in Windows, the %* trick didn't work on win9x: http://sourceforge.net/tracker/?group_id=12867&atid=112867&func=detail&aid=484181 Unfortunately I can't test on win9x, so I can't create our own solution, and I don't know if we can copy the solution from jython (which I know works on all real versions of Windows) due to license differences. Perhaps we should just copy the loop from ant and hope that ant works on win9x. regards, finn
Re: Percentages
[Simon Pepping] Quite a piece of work. I will try to understand it. A small correction: "margin-[top,bottom]: width of containing block, except for page context where it is page box height" -- that 'width' should be 'height', I suppose. Well, not according to the spec: http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#margin-top (my patch does not deal with the block-width / page-height issue correctly, btw). regards, finn
Percentages
Hi, I just uploaded a patch which adds support for percentages and calculations on percentages to bugzilla. I don't think that the patch is quite right in its attempt to find the correct base length to combine with the percentage. Some of my confusion comes from the fact that I can't tell the different boxes/areas/blocks apart. How f.ex. are these specifications different: - ipd of closest area generated by a block-level fo - width of containing block - ipd of closest ancestor block-area. Perhaps somebody who understands the spec better than me can describe what boxes and sizes are to be used. I went through the spec and came up with these descriptions of the bases for the properties that support percentages:

top, bottom, height, max-height: height of containing block if explicit. Auto otherwise.
left, right, padding-*, margin-[left,right], max-width, min-width, width, text-indent: width of containing block.
background-position-horizontal, background-position-vertical: size of padding-rectangle. Relative values.
font-size: inherited font-size.
margin-[top,bottom]: width of containing block, except for page context where it is page box height.
[start,end]-indent: ipd of containing reference-area.
space-[end,start], last-line-end-indent: ipd of closest ancestor block-area.
alignment-adjust: 'height' if fo:external-graphic or fo:instream-foreign-object, 'font-size' if fo:character and 'line-height' otherwise.
baseline-shift: 'line-height' of the parent-area.
block-progression-dimension: corresponding dimension of closest area generated by a block-level fo. Auto if not specified explicitly.
content-[height,width]: intrinsic height and width.
inline-progression-dimension: corresponding dimension of closest area generated by a block-level fo. Auto if not specified explicitly.
min-height: height of containing block.
line-height: 'font-size'.
leader-pattern-width, leader-length: ipd of content-rect of parent area.
column-gap: width of region being divided into columns.
extent: corresponding height or width of the page.
column-width: width of table.
text-altitude, text-depth: font's em-height.
provisional-label-separation, provisional-distance-between-starts: ipd of closest block-area that is not a line-area.
vertical-align: 'line-height'.

regards, finn
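One entry in the list above that is easy to misread is the "Relative values" note on background-position-*. If XSL follows the CSS model here (an assumption on my part), N% aligns the N% point of the image with the N% point of the padding-rectangle, so the offset is computed from the difference of the two sizes, not from the box size alone. A tiny sketch of that computation (invented class name, assumed semantics, not FOP code):

```java
// Assumed CSS-style semantics for background-position percentages:
// offset = (boxSize - imageSize) * N/100. With this rule 0% left/top-aligns,
// 50% centers, and 100% right/bottom-aligns the image in the box.
public class BackgroundPositionDemo {
    public static int offset(int boxSize, int imageSize, double pct) {
        return (int) ((boxSize - imageSize) * pct / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(offset(200, 50, 0));    // prints 0
        System.out.println(offset(200, 50, 50));   // prints 75  (centered)
        System.out.println(offset(200, 50, 100));  // prints 150 (far edge)
    }
}
```

This is why the list flags these two properties as "Relative values": unlike the other entries, the percentage is not simply multiplied by a single base length.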
Re: [PATCH] unnesting Property.Maker and rolling datatypes into their properties.
[Glen Mazza] Another option, Finn, is to move all the Property subclasses to fo.properties (even if they're alongside the makers, nested or unnested), after thinking about it, I think that will be a little bit clearer than having them in the datatype package. Comments? I like it. How does this sound: - Unnest Property.Maker to PropertyMaker and put it in fo.properties. - Roll the datatypes classes into their property class and move the property class to fo.properties (but without unnesting their Maker class because the remaining nested Maker classes are really trivial). - Move the handcoded maker classes to fo.properties. regards, finn
Re: Unnesting properties and makers.
--- Finn Bock [EMAIL PROTECTED] wrote: Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface: public interface Property { public Length getLength(); public Space getSpace(); ... } [Glen Mazza] Finn, just so I understand more here--what is the set of methods that this interface would have? (You don't have to give me a full enumeration if it's huge--but let me know once you determine them.) How many of them are there--10 of them or 20 or 30 or ??? This is the full set, exactly the same as now exists in Property as null methods:

public Length getLength();
public ColorType getColorType();
public CondLength getCondLength();
public LengthRange getLengthRange();
public LengthPair getLengthPair();
public Space getSpace();
public Keep getKeep();
public int getEnum();
public char getCharacter();
public Vector getList();
public Number getNumber();
public Numeric getNumeric();
public String getNCname();
public Object getObject();
public String getString();

The names of the returned compound property values would change according to the new naming rule that we decide. regards, finn
Re: Unnesting properties and makers.
[Glen Mazza] I now understand what you're saying, and like the simplification you're suggesting. The current naming however, is probably preferable--the word Property figures quite highly in the spec! Do you have a problem remaining with it? Not at all, it is just that I thought it would be confusing to rename the 'datatypes' classes to XXXProperty as they would conflict with the old XXXProperty classes, but it is only a problem when you compare before vs. after. If the change is done, the resulting XXXProperty classes will be completely consistent. For those (*)'ed datatypes, can't we get rid of the datatype instead by rolling that datatype into the equivalently named Property? In turn, have *those* Properties extend AbstractProperty as you suggest. Actually, I guess I'm just saying the same thing you're suggesting, except to use --Property instead of --Type for everything. Indeed. Which package should the resulting rolled datatype/property be placed in? My feeling says fop.datatypes (and the nested makers should be unnested and placed in fop.fo.properties). But that is a separate suggestion which does not have to be dealt with initially. Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list. That would still be the case. Everything stored in the PropertyList implements the Property interface. But only a few of them would extend AbstractProperty, correct--or would you plan on having all do so? All of the properties would extend AbstractProperty. That way the properties get the default 'null' implementation of all the gettype methods. The only hard requirement is that all the properties implement the Property interface. Except that the code above should IMHO use if (prop.getLength() != null) to test for a length type instead of using instanceof. Well, instanceof is slower I believe, but better self-commenting.
If you switch to this type of conditional for speed, just add a short comment of its purpose--here, to determine if we are working with an EnumProperty or a LengthProperty. (Another option, BTW, if you think it will cut down on buggy programming, is to have the classes implementing this Property interface supply unsupported interface methods a la Peter's read-only BitSet[1], i.e., throw exceptions. We can revisit this topic later if code errors are becoming a problem.) In most cases an NPE is thrown immediately after the call to gettype, but an exception thrown from within the gettype could indeed carry more information about the cause of the failure. I still like the null return and null test better than the alternatives though. regards, finn
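The two dispatch styles being weighed here — the getLength()-returns-null convention versus instanceof — can be put side by side in a minimal sketch. The classes below are invented stand-ins (Integer where FOP has Length, etc.), not the real property hierarchy:

```java
// Sketch of the null-return convention discussed above: every property
// exposes typed getters that default to null, and callers test for null
// instead of writing "prop instanceof LengthProperty". Invented classes.
public class NullTestDemo {
    static class Property {
        Integer getLength() { return null; }  // default 'null' implementations
        Integer getEnum()   { return null; }
    }
    static class LengthProperty extends Property {
        final int mpt;
        LengthProperty(int mpt) { this.mpt = mpt; }
        @Override Integer getLength() { return mpt; }
    }
    static class EnumProperty extends Property {
        final int e;
        EnumProperty(int e) { this.e = e; }
        @Override Integer getEnum() { return e; }
    }

    public static String describe(Property prop) {
        // Null-test style: ask for the type you want, check for null.
        if (prop.getLength() != null) return "length:" + prop.getLength();
        if (prop.getEnum() != null)   return "enum:" + prop.getEnum();
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(new LengthProperty(12000))); // length:12000
        System.out.println(describe(new EnumProperty(3)));       // enum:3
    }
}
```

The trade-off in the thread is visible: a null return gives a silent "wrong type" signal (an NPE only if the caller dereferences it), whereas instanceof or a throwing getter fails louder but is either less self-commenting or more intrusive.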
Re: Unnesting properties and makers.
Each of the typeType classes also implements the gettype methods from Property, so the layout must do exactly the same as it does now to extract the right value: propertyList.get(PR_INLINE_PROGRESSION_DIMENSION). getLengthRange().getOptimum().getLength(); [Andreas L. Delmelle] Hmmm... coming back to my recent question about the use of/access to the background-color property: I somehow would feel much for further extending the way the Common*Properties are handled. IIC, the calls like the above should only happen in the background via the propMgr of the FObj, and not become part of the public API. I dunno. The spec clearly lists which properties apply to an element: file:///d:/java/REC-xsl/slice6.html#fo_external-graphic so it makes sense to find the same list of assignments in the layout managers. As a concrete example, in Layout, I would rather see something like: private AreaDimensionProps adimProps; ... protected void initProperties(PropertyManager propMgr) { adimProps = propMgr.getAreaDimensionProps(); ... } Yeah, if it makes sense to add more groups of properties together (and it seems that such an ipd,bpd pair makes sense) I don't see a problem adding that. ... Length ipd = aProps.ipd; Yes, except that it is a LengthRange property. (maybe the latter can become more abstract: PropertyValue ipd = aProps.ipd; ) My gut feeling says no. Unless the property in question can take non-LengthRange values (which ipd can not). The layout managers should resolve the property value as far as they can, as early as they can, IMHO. regards, finn
Re: Unnesting properties and makers.
Glen Mazza wrote: Well, instanceof is slower I believe, but better self-commenting. [J.Pietschmann] Instanceof is exactly as fast as a simple function call after warm-up. That is not what I remembered, so I made a small test program and ran it with 3 different versions of the JDK:

[/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
false method call 160
true method call 170
false instanceof 581
true instanceof 581

[/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
false method call 16614
true method call 881
false instanceof 1162
true instanceof 2083

[/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
false method call 581
true method call 661
false instanceof 2153
true instanceof 2734

I really don't know what to conclude from this test, but at least I'm glad I didn't mention performance as the reason why I prefer the gettype way of testing for subclasses. I'm surprised by the slow performance of calling non-overridden methods in jdk1.3.1. I don't have any explanation for that. regards, finn

import java.io.*;
import java.net.*;

public class x {
    public static final int ITERS = 1;

    public static void main(String[] args) throws Exception {
        Prop prop = new Prop();
        Prop stringprop = new StringProp();
        // Warm up the JIT.
        testCall(prop);
        testInstanceOf(prop);
        long now;
        now = System.currentTimeMillis();
        testCall(prop);
        System.out.println("false method call " + (System.currentTimeMillis() - now));
        now = System.currentTimeMillis();
        testCall(stringprop);
        System.out.println("true method call " + (System.currentTimeMillis() - now));
        now = System.currentTimeMillis();
        testInstanceOf(prop);
        System.out.println("false instanceof " + (System.currentTimeMillis() - now));
        now = System.currentTimeMillis();
        testInstanceOf(stringprop);
        System.out.println("true instanceof " + (System.currentTimeMillis() - now));
    }

    public static void testInstanceOf(Prop prop) {
        for (int i = ITERS; i >= 0; i--) {
            boolean x = prop.getString() != null;
        }
    }

    public static void testCall(Prop prop) {
        for (int i = ITERS; i >= 0; i--) {
            boolean x = prop instanceof StringProp;
        }
    }

    public static class Prop {
        public String getString() { return null; }
    }

    public static class StringProp extends Prop {
        String value = "a string";
        public String getString() { return value; }
    }
}
Re: Unnesting properties and makers.
[Andreas L. Delmelle] snip /

public static void testInstanceOf(Prop prop) {
    for (int i = ITERS; i >= 0; i--) {
        boolean x = prop.getString() != null;
    }
}

public static void testCall(Prop prop) {
    for (int i = ITERS; i >= 0; i--) {
        boolean x = prop instanceof StringProp;
    }
}

I'd swap either the method names or the contained expressions to get dependable results (typo?) Yeah, an embarrassing copy-paste bug. Thanks for catching it. The result is then:

[/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
false method call 581
true method call 581
false instanceof 160
true instanceof 170

[/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
false method call 1272
true method call 2304
false instanceof 17945
true instanceof 912

[/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
false method call 2154
true method call 2754
false instanceof 590
true instanceof 651

regards, finn
Re: Unnesting properties and makers.
file:///d:/java/REC-xsl/slice6.html#fo_external-graphic [Andreas L. Delmelle] (Off-topic: Finn, I don't think I have access to your d:-drive ;) ) I hope not :-0 . Sorry about that. Yeah, if it makes sense to add more groups of properties together (and it seems that such an ipd,bpd pair makes sense) I don't see a problem adding that. I just think this will lead to an API that's a bit clearer, cleaner and so, in the long run, easier to manage and maintain. I don't really know whether the Common*Properties were separated out because they are, well, common, and it's more efficient for them to be treated as a bundle. Maybe it was originally the intention to create property groups along the groups in which they are divided in the spec (see http://xml.apache.org/fop/compliance.html)? I don't know what the original intention was either, but from the no-longer-used setup() methods in the flow objects like fo.flow.Block it looks like somebody once wanted the list of properties from the spec to be represented in the code. But that should not prevent us from doing it differently. regards, finn
Re: Comments on the changes to the property subsystem
[Simon Pepping] I have just caught up with the massive changes to the property system. Allow me to share a few observations: Thanks for your comments. How do you otherwise think it compares to the previous generated property makers? 1. If I see correctly, PropertySets is not yet used. Correct. Its intended use is when we, at some point in the future, want to store more than just the specified properties in the FObjs. PropertyList is still a HashMap keyed on property name. Is this waiting for some other changes to be made? Yes. Either PropertyList should have a Property[] array indexed directly by the propId, or the HashMap should be keyed by an Integer object with the same value as the propId: super.get(integerArray[propId]); where integerArray is initialized as: integerArray[1] = new Integer(1); integerArray[2] = new Integer(2); ... Which of these it will be depends on how much information we decide to store in the FObj. 2. In FOPropertyMapping the word 'generic' is used in two different senses: in s_generics and getGenericMappings() on the one hand, and in genericBoolean etc., createGenerics() and useGeneric() on the other hand. This may be confusing. One might e.g. be tempted to think that s_generics contains the objects genericXxx only, which is not the case. You are absolutely correct. I took the names from the code which existed at the time. I'm also terrible at naming things, so other suggestions will be welcome. I suggest that we rename s_generics and getGenericMappings to s_fomapping and getFoMappings because they deal with the properties from the xsl:fo spec. 3. getGenericMappings rebuilds s_generics every time it is called. In the current code it seems to be called only once, by FObj when the class is loaded. Would it not be better if getGenericMappings itself would ensure that the array is built only once? Yes, but I would like to take the question a bit further and ask where such information should be cached?
Storing it in static variables caches it in the classloader which makes it difficult to control the release of the memory. Perhaps it would be better to store the information in the Driver or FOUserAgent, IOW to put the caching under user control. If caching at the classloader level is the best way to do it, then it makes perfect sense to do what you suggest. regards, finn
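The integerArray idea sketched in point 1 above can be shown as a tiny working class. This is an illustrative toy, not FOP's PropertyList: the class name, the String value type, and the MAX_PROP_ID bound are all invented here, and modern Integer.valueOf() is used in place of the new Integer(i) calls from the message:

```java
// Sketch of the Integer-key idea: pre-build one Integer per propId so the
// HashMap can be probed with int property IDs without allocating a boxing
// object on every get(). Invented names; not FOP's real PropertyList.
import java.util.HashMap;

public class PropertyListDemo extends HashMap<Integer, String> {
    private static final int MAX_PROP_ID = 256;  // assumed upper bound
    private static final Integer[] integerArray = new Integer[MAX_PROP_ID];
    static {
        for (int i = 0; i < MAX_PROP_ID; i++) {
            integerArray[i] = Integer.valueOf(i);
        }
    }

    // int-typed overloads reuse the cached Integer keys.
    public String get(int propId) {
        return super.get(integerArray[propId]);
    }

    public void put(int propId, String prop) {
        super.put(integerArray[propId], prop);
    }

    public static void main(String[] args) {
        PropertyListDemo plist = new PropertyListDemo();
        plist.put(42, "10%");
        System.out.println(plist.get(42)); // prints 10%
    }
}
```

The alternative mentioned in the same paragraph — a Property[] array indexed directly by propId — skips hashing entirely, at the cost of fixing the array size per FObj; which one wins depends, as the message says, on how much is stored in the FObj.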
Suspicious override in RtfListStyleNumber.
Hi, While I was trying to fix some warnings from 'javadoc' about missing @see method references I discovered a suspicious construct in org.apache.fop.render.rtf.rtflib.rtfdoc.RtfListStyleNumber where the signature of the writeListPrefix(RtfList) method is different from the signature of the superclass RtfListStyle where the signature is writeListPrefix(RtfListItem). I'm guessing that RtfListStyleNumber is incorrect and that it should at least override writeListPrefix(RtfListItem). I hope that someone familiar with the rtf code could take a look at this. regards, finn
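The suspicion reported above is worth illustrating, because a mismatched parameter type silently produces an overload rather than an override. The sketch below uses the class names from the report but entirely invented method bodies; in Java 5 and later, an @Override annotation on the subclass method would turn this mistake into a compile error:

```java
// Illustration of the suspected RtfListStyleNumber bug: a subclass method
// taking RtfList does NOT override the superclass method taking
// RtfListItem -- it merely overloads it, so calls made with an RtfListItem
// argument always reach the superclass version. Method bodies are invented.
public class OverloadTrapDemo {
    static class RtfListItem { }
    static class RtfList { }

    static class RtfListStyle {
        String writeListPrefix(RtfListItem item) { return "base"; }
    }

    static class RtfListStyleNumber extends RtfListStyle {
        // Different parameter type: this is an overload, not an override.
        String writeListPrefix(RtfList list) { return "number"; }
    }

    public static void main(String[] args) {
        RtfListStyle style = new RtfListStyleNumber();
        // Overload resolution is by argument type, so the superclass
        // implementation runs even on an RtfListStyleNumber instance.
        System.out.println(style.writeListPrefix(new RtfListItem())); // prints base
    }
}
```

If RtfListStyleNumber is meant to customize the prefix for list items, it should declare writeListPrefix(RtfListItem), exactly as the message suggests.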
Re: Compiling HEAD
[Peter B. West] Is HEAD supposed to be compiling? Yes. I can compile it just fine from a fresh checkout. I'm getting errors starting at datatypes/ColorType.java. Which errors? Have you tried to do an ant clean first? regards, finn
Re: Unnesting properties and makers.
Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface: [Glen Mazza] Could you explain why we have the datatypes instances to begin with--what they're for? I'm not sure what their precise purpose is. The datatypes are the slightly more complex property values. The property classes wrap the datatypes in order to give the datatypes a common interface. This list shows the concrete Property subclasses and the datatypes that each of them wraps:

CharacterProperty        char
ColorTypeProperty        ColorType (*)
CondLengthProperty       CondLength (*)
EnumProperty             int
KeepProperty             Keep (*)
LengthPairProperty       LengthPair (*)
LengthProperty           Length, AutoLength, FixedLength, PercentLength (*)
LengthRangeProperty      LengthRange (*)
ListProperty             Vector
NCNameProperty           String
NumberProperty           Number
NumericProperty          Numeric (*)
SpaceProperty            Space
StringProperty           String
ToBeImplementedProperty

Some of the concrete property subclasses wrap standard java types such as int, char, String, Number and Vector, and for these properties we still need a wrapper. But some of them, marked with (*), wrap a datatype which is under our own control, and for those properties the datatype class could also function as the property wrapper. Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list. That would still be the case. Everything stored in the PropertyList implements the Property interface. In the list below of the new property classes, all the typeType classes implement Property and are stored in the PropertyList.
  CharacterType       char
  ColorTypeType       itself
  CondLengthType      itself
  EnumType            int
  KeepType            itself
  LengthPairType      itself
  LengthType, AutoLengthType, FixedLengthType, PercentLengthType   itself
  LengthRangeType     itself
  ListType            Vector
  NCNameType          String
  NumberType          Number
  NumericType         itself
  SpaceType           itself
  StringType          String
  ToBeImplementedType

Each of the typeType classes also implements the gettype methods from Property, so the layout must do exactly the same as it does now to extract the right value:

  propertyList.get(PR_INLINE_PROGRESSION_DIMENSION).
      getLengthRange().getOptimum().getLength();

For the classes which are both property and datatype, the gettype method becomes:

  public type gettype() {
      return this;
  }

Furthermore, those are the objects requested by layout. What would be your alternative storage technique otherwise--I believe we do (frequently?) have more than one datatype per property, correct? I remember two cases, but I can only find one at the moment. In Title.setup():

  prop = this.propertyList.get(PR_BASELINE_SHIFT);
  if (prop instanceof LengthProperty) {
      Length bShift = prop.getLength();
  } else if (prop instanceof EnumProperty) {
      int bShift = prop.getEnum();
  }

This would stay the same, except LengthProperty would be called LengthType and EnumProperty would be called EnumType. Except that the code above should IMHO use if (prop.getLength() != null) to test for a length type instead of using instanceof. I'm not sure what to propose as the naming convention for the new combined property/value, but Alt-Design calls them typeType so I used that in the list above. regards, finn
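For contrast, here is a minimal sketch of the proposed combined property/datatype classes, including the prop.getLength() != null style of type test suggested above. The names (LengthType, NumberType, AbstractProperty) follow the mail's examples but are simplified and hypothetical, not the actual FOP code:

```java
// Hypothetical sketch of the proposed design: the datatype class itself
// implements the Property interface and returns itself from its getter,
// so no separate wrapper object is needed.

interface Property {
    LengthType getLength();     // null unless the stored value is a length
}

abstract class AbstractProperty implements Property {
    public LengthType getLength() { return null; }
}

// Datatype and property wrapper in one class.
class LengthType extends AbstractProperty {
    private final double points;
    LengthType(double points) { this.points = points; }
    double getValue() { return points; }
    @Override public LengthType getLength() { return this; }
}

// A value that is not a length; inherits the null-returning getter.
class NumberType extends AbstractProperty {
    private final double value;
    NumberType(double value) { this.value = value; }
    double getNumber() { return value; }
}

public class TypeTypeDemo {
    public static void main(String[] args) {
        Property p = new LengthType(12.0);
        Property q = new NumberType(3.0);
        // Type test via the getter instead of instanceof:
        System.out.println(p.getLength() != null);  // true
        System.out.println(q.getLength() == null);  // true
    }
}
```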
Unnesting properties and makers.
Hi, After updating from CVS, it is most likely necessary to do an ant clean to get rid of the old generated maker classes before building. I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore. I've found an argument for unnesting the maker classes from their property classes: it lets us put the makers in their own package, and I think it would be a little cleaner to do that. Using the fo.properties package seems natural. Does anyone know why we wrap the datatype instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface:

  public interface Property {
      public Length getLength();
      public Space getSpace();
      ...
  }

  public class AbstractProperty implements Property {
      public Length getLength() { return null; }
      public Space getSpace() { return null; }
      ...
  }

  public class Length extends AbstractProperty {
      // Rest of datatypes.Length class.
      ...
      public Length getLength() { return this; }
  }

With such a change, one of the remaining differences between HEAD and Alt-Design would be in the naming of the classes: Property = PropertyValue, Property.Maker = Property. Comments? regards, finn
Re: Unnesting properties and makers.
I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore. [J.Pietschmann] I think you could leave the file there for now. Ok. It should be sufficient to inactivate the related task in the buildfile (for example putting it in an XML comment). Too late for that, but I'll reactivate the lines as comments tomorrow. Does anyone know why we wrap the datatype instances in a property instance? No. Actually we should strive to use a proper parse tree for property expressions:

1. Create a few classes for the symbols in the property expression grammar (section 5.9 of the spec). I think we need as terminals:
   - AbsoluteNumeric
   - RelativeNumeric
   - Color (the #N thingy)
   - String (aka Literal)
   - NCName (everything else, basically, including enum tokens and inherit)
   and for the nonterminals:
   - PropertyFunction
   - Some classes for the operators
2. Write a proper parser (maybe using ANTLR, at least for bootstrap) which produces a proper parse tree.
3. Add methods to the objects for resolving relative numeric values (percentages, em) and for evaluation.
4. Perhaps add constant folding to the parser.

Interesting. What issues do we have in property parsing that are solved by this? I'm only aware of arithmetic on relative numerics, which doesn't work. regards, finn
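As a toy illustration of steps 1 through 3 (hand-written rather than ANTLR, and covering only a tiny numeric fragment of the section 5.9 grammar; all class names here are made up for the sketch):

```java
// Minimal parse tree + recursive-descent parser for a toy fragment of a
// property expression grammar: numbers, '+', '*' and parentheses.
// Illustrative only -- the real grammar has many more terminals
// (colors, strings, NCNames, property functions) and needs error handling.

abstract class Expr {
    abstract double eval();            // step 3: evaluation on the tree
}

class Num extends Expr {               // terminal: an absolute numeric
    final double v;
    Num(double v) { this.v = v; }
    double eval() { return v; }
}

class BinOp extends Expr {             // nonterminal: an operator node
    final char op; final Expr l, r;
    BinOp(char op, Expr l, Expr r) { this.op = op; this.l = l; this.r = r; }
    double eval() { return op == '+' ? l.eval() + r.eval() : l.eval() * r.eval(); }
}

class Parser {
    private final String s;
    private int pos;
    Parser(String s) { this.s = s.replace(" ", ""); }
    Expr parse() { return expr(); }
    private Expr expr() {              // expr := term ('+' term)*
        Expr e = term();
        while (pos < s.length() && s.charAt(pos) == '+') { pos++; e = new BinOp('+', e, term()); }
        return e;
    }
    private Expr term() {              // term := atom ('*' atom)*
        Expr e = atom();
        while (pos < s.length() && s.charAt(pos) == '*') { pos++; e = new BinOp('*', e, atom()); }
        return e;
    }
    private Expr atom() {              // atom := number | '(' expr ')'
        if (s.charAt(pos) == '(') { pos++; Expr e = expr(); pos++; return e; }
        int start = pos;
        while (pos < s.length() && (Character.isDigit(s.charAt(pos)) || s.charAt(pos) == '.')) pos++;
        return new Num(Double.parseDouble(s.substring(start, pos)));
    }
}

public class ExprDemo {
    public static void main(String[] args) {
        Expr tree = new Parser("2 + 3 * (4 + 1)").parse();
        System.out.println(tree.eval()); // prints 17.0
    }
}
```

Constant folding (step 4) would then amount to calling eval() at parse time on any subtree whose leaves are all absolute numerics.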
Re: Newbie committer questions.
but instead I used the :ext: protocol and set CVS_RSH=ssh with cygwin, just like I do for sourceforge. Would I get any benefit from using ssh tunneling? [Jeremias Maerki] You don't have to establish the SSH connection every time you do a CVS operation. It's speedier. Thanks, perhaps I'll give it a try later. For now the CVS_RSH method works fine and fast enough. [Glen Mazza] ... Finn, would you mind taking over the rest of your last patch? The issues I found can be discussed and changed, if necessary, after you apply it. I'll get to it tomorrow, Friday. I can't see any way of doing it incrementally, so I'm going to commit the rest of it in one go. regards, finn
Newbie committer questions.
Hi, I've received my account information and everything appears to work fine. In the guides for setting up CVS there are several ways to set up tunneling (http://jakarta.apache.org/site/cvsonwin32.html), but instead I used the :ext: protocol and set CVS_RSH=ssh with cygwin, just like I do for sourceforge. Would I get any benefit from using ssh tunneling? When I committed my first modification I got the CVS/Template file shown, but from browsing around a bit in the archives of commit mails, I couldn't find an example of anyone using that Template. What is the policy on this? I also guess that we are not yet updating any CHANGES file with the modifications we make? regards, finn
Re: Comments on new property maker implementation
Finn Bock wrote: I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. [J.Pietschmann] I meant setup effort, although a binary tree will most likely do additional memory management. You are right about the lookup. Just out of curiosity, where do you get the 1.2 int comparisons? A perfect hash should not have collisions. I was comparing a standard HashMap with your binary tree. A perfect hash would likely have a more complicated hash function and of course zero int compares. It might also be interesting how a trie or ternary tree (as used for hyphenation patterns) would compare to hash maps for keywords (in terms of setup costs, lookup costs and memory). I have a study of various Java implementations on my todo list but didn't quite get around to doing it. Very interesting indeed. regards, finn
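The two lookup structures being compared can be sketched with the standard collections (TreeMap standing in for the binary tree; a tiny made-up keyword set standing in for the 148 color keywords):

```java
// Comparing the two keyword-lookup structures discussed: a hash map
// (one hashCode plus, barring collisions, one equals) versus a sorted
// binary tree (about log2(n) compareTo calls per lookup).
// The keyword set here is a tiny stand-in for illustration.

import java.util.*;

public class KeywordLookup {
    static final String[] COLORS = {
        "aliceblue", "azure", "beige", "black",
        "blue", "coral", "crimson", "cyan"
    };

    // Fill any Map implementation with keyword -> index entries.
    static Map<String, Integer> index(Map<String, Integer> m) {
        for (int i = 0; i < COLORS.length; i++) {
            m.put(COLORS[i], i);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = index(new HashMap<>());  // hashing
        Map<String, Integer> tree = index(new TreeMap<>());  // red-black tree
        // Both give the same answer; they differ in work per lookup
        // and in setup cost (the tree rebalances while inserting).
        System.out.println(hash.get("crimson")); // 6
        System.out.println(tree.get("crimson")); // 6
    }
}
```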
Re: Comments on new property maker implementation
[Finn Bock] You should perhaps also be aware that the values in a static array get assigned to the array one element at a time. So

  static int[] a = { 101,102,103,104,105,106,107,108 };

becomes in bytecodes:

  Method static {}
     0 bipush 8
     2 newarray int
     4 dup
     5 iconst_0
     ...

[Glen Mazza] Hmmm... Are you saying that declaring a static array isn't much (any?) faster than manually creating one? I would guess that it is a bit faster than the typical bytecode for manually created arrays, since the above uses 'dup' instead of 'getstatic' or 'aload' to push the array on the stack. I didn't realize that there is code being run for static arrays--I would have thought the compiled bytecode just includes the array internally, and not the code to create it. (i.e., if you opened the bytecode you would see an array 101 102 103 104... sitting someplace.) Arrays can't be stored in the constant pool, so there is no place to put the data except as bytecode. http://java.sun.com/docs/books/vmspec/html/ClassFile.doc.html#20080 It should perhaps be noted that constant instances of String are not really stored in the constant pool either. The pool just stores the utf-8 representation of the string constant, and each literal string is initialized as a new pool String object based on that (as UTF-16ish). Isn't that how C works, at least? I think so, but C has hardware support for code and data segments so C can make better use of it. Sigh... I guess I *didn't* know bytecode by heart after all! ;-) I didn't bring it up to discourage the use of statically initialized arrays. If it makes sense to put something in a static array we should do so without concern for compile time vs. runtime. After all, the initialization is only performed once per classloader. regards, finn
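In source terms, the element-by-element initialization described above means the array literal is equivalent to an explicit static initializer block; a small sketch:

```java
// A static array initializer compiles to per-element stores in the
// class initializer (<clinit>). These two classes therefore end up
// doing essentially the same work when the class is loaded.

class Literal {
    static int[] a = { 101, 102, 103, 104, 105, 106, 107, 108 };
}

class Explicit {
    static int[] a = new int[8];
    static {                       // roughly what <clinit> does above
        a[0] = 101; a[1] = 102; a[2] = 103; a[3] = 104;
        a[4] = 105; a[5] = 106; a[6] = 107; a[7] = 108;
    }
}

public class StaticArrayDemo {
    public static void main(String[] args) {
        System.out.println(java.util.Arrays.equals(Literal.a, Explicit.a)); // true
    }
}
```

As noted in the mail, this runs once per classloader, so it is rarely a reason to avoid static arrays.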