Re: Page breaking [was: Markers added to the wrong page]
[The Web Maestro] I don't know how the spec deals with this, but I doubt the spec cares which algorithm is used. That said, would it be a good idea to determine which algorithm to use based on something in userconfig.xml or something? If the Knuth system is more 'expensive' in terms of resources, we could make non-Knuth the default, and enable Knuth via a config file.

Yes indeed. But without measurements I would guess that Knuth page breaking would be far less expensive than the Knuth line breaking. The line breaking has to deal with far more elements and because of that a far larger set of active nodes. But clearly a choice would be good, both for page and for line breaking.

[Victor] The good news is that this is an excellent idea. The bad news is that FOP's determination to not even tolerate more than one algorithm is what drove FOP and FOray apart. So you have stumbled into a somewhat sensitive area, which might explain the absence of responses (so far). BTW, since it is relevant here, thanks for your kind comments last week in another thread. You are ever the peacemaker, a very useful role. As long as the differences remain as great as they are, we *need* more than one FO implementation. In the meantime, I am much more useful and productive outside of FOP than I am in it, and the FOP developers probably are also.

To my perhaps still naive mind, layout consists of exactly two tasks: 1) finding line-breaks within logical blocks (nested blocks constituting a new logical block), and 2) finding page-breaks between and within logical blocks. All other tasks can be conveniently handled within the two data structures, not being dependent on any strategy. The Knuth-style algorithm is by definition a total-fit algorithm. The scope of such an algorithm for the first task is the logical block itself. The scope of the second task is the page-sequence.
So, to implement a Knuth-style algorithm (itself a good idea) for item 2 requires the ability to see an entire page-sequence in the AreaTree at one time.

Not at all. It is a rather trivial change to Knuth to pick a page break when there are N pages of lookahead. If we assume that finished page knuth elements arrive one at a time to the KnuthPage algorithm, the main loop becomes:

    addKnuthElement(element):
        if element.isBox():
            totalWidth += element.width
        else if element.isGlue():
            if previous element is box:
                considerLegalBreak()
            totalWidth += element.width
            totalStretch += element.stretch
            totalShrink += element.shrink
        else if element.isPenalty():
            if element.penalty < INFINITE:
                considerLegalBreak()
        if activeList.startLine > currentPage + lookahead:
            // In the activeList, find the node with the best demerit.
            bestNode = findBestDemerit()
            // Remove all nodes in activeList which do not have bestNode
            // in their parentage.
            clearActiveList(bestNode)
            makePageBreak(bestNode)

Here it is only clearActiveList() which does not have a fast implementation when using the usual implementation of ActiveNode. It will require a scan over all the active nodes, and for each node a scan back thru the previous links to the node at currentPage+1:

    clearActiveList(bestNode):
        for node in activeList:
            // Jump back thru previous (node.page - currentPage) times.
            previous = node
            i = 0
            while i++ < node.page - currentPage:
                previous = previous.previous
            if previous != bestNode:
                activeList.remove(node)

The rest of the methods called are similar to the methods in my knuth line breaking patch (which is quite similar to the current implementation). My own insecurity comes from figuring out which penalty values and demerit algorithm to use to get keeps and orphans right.

regards, finn
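The pruning step sketched in the message above can also be written out in Java. The following is an illustrative sketch only, not FOP code: PageNode, clearActiveList() and all other names here are invented for the example; it just shows the "keep only descendants of the committed break" idea.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not FOP code) of the clearActiveList() pruning
// described above: once a break is committed, only the active nodes
// that have the chosen bestNode in their parentage are kept.
class PageBreakPruning {

    static class PageNode {
        final int page;          // page index at which this node breaks
        final PageNode previous; // chain back through earlier breaks
        PageNode(int page, PageNode previous) {
            this.page = page;
            this.previous = previous;
        }
    }

    /** Walk back through the previous links until we reach keepPage. */
    static PageNode ancestorAtPage(PageNode node, int keepPage) {
        PageNode p = node;
        while (p != null && p.page > keepPage) {
            p = p.previous;
        }
        return p;
    }

    /** Keep only the active nodes that descend from bestNode. */
    static List<PageNode> clearActiveList(List<PageNode> activeList,
                                          PageNode bestNode) {
        List<PageNode> kept = new ArrayList<>();
        for (PageNode node : activeList) {
            if (ancestorAtPage(node, bestNode.page) == bestNode) {
                kept.add(node);
            }
        }
        return kept;
    }
}
```

As the message notes, this scan costs O(active nodes x lookahead) with the usual ActiveNode representation; a child index per node would make it faster, at the price of diverging from that representation.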
Re: Page breaking [was: Markers added to the wrong page]
[Luca] I'm not sure it is always possible to do this: sometimes the representation of a block depends on the properties of a higher level block. For example:

    outer block
     |
     |- inner block 1
     |- inner block 2

In order to decide whether there can be a page break between the lines generated by innerBlock1 and innerBlock2, we must know:
- if inner block 1 has keep-with-next
- if inner block 2 has keep-with-previous
- if outer block has keep-together
This can be done if the outer BlockLM receives the elements created by its children, looks at them and adds / removes / corrects; could this be done if the inner BlockLMs directly give their elements to the top-level LM?

I would pass the elements on to the immediate parent, which recursively passes them on to the top-level LM. For inline content the top-level LM would be the LineLM and for blocks it would be the PageLM.

BTW, what about your great refactoring of all the knuth code?

I do not have the time atm to do the followup and bugfixes, so it will have to wait. If anyone else wants to commit it, feel free.

regards, finn
Re: Layout dimension mechanism
[Jeremias] I think there is something fundamentally wrong with the layout dimension mechanism. I've found two problems, one minor, one very bad:

1. While implementing percentages for background-position-horizontal and -vertical, I realized that it wasn't so simple to base the calculation on a simple value. The percentage base is actually the padding-rect size minus the intrinsic size of the background image. Of course, this value could be set by each layout manager but it seems awkward to have to do that everywhere. Instead, I can easily calculate this in TraitSetter.addBackground(), provided the BPD and IPD of each Area are properly set before calling TraitSetter.addBackground(). That's currently not the case, however. But it's probably a good idea to do that anyway.

Yes, IPD and BPD traits should be assigned to all areas which have an ipd and bpd. But at the time where base lengths are needed, no areas have yet been created. So I don't see any way to avoid making the base lengths available during getNextBreakPoss, either as values in a HashMap or on demand via a context object.

[Jeremias] 2. The above let me calculate the background image position easily for block-containers. When I wanted to check the same for the body region I found some bugs that I'm not yet testing for. While trying to fix them I found a problem when I specify width=height=100% on a block-container as the only child of a flow. The BLOCK_IPD and BLOCK_BPD layout dimensions were fetched from the Root, which returned the page size instead of the content-rect size of the current region body area (which is wrong for documents with different page sizes throughout the document). PageSequenceLayoutManager also sets the page size as BLOCK_IPD/BPD on each Region (which is wrong, too). In principle, the BLOCK_IPD/BPD should come from the currently applicable Region if there's no parent block-level object.

The current setting of dimensions is neither complete nor correct.
[Jeremias] I'm beginning to think that the layout dimensions should be held and provided by the layout managers instead of the FObjs. To evaluate a percentage-enabled value the layout manager would have to call getLength()

Just to be accurate, I would say it is the getValue() call that must accept a context parameter.

[Jeremias] with some kind of context object that allows it to fetch the right base values for percentage calculations. I haven't fully made up my mind about that and I'd like to hear what you guys think about this problem.

I agree completely. The LayoutDimension in FObj was a hack to show off the relative length expression mechanism. At the time a huge number of Length.getValue() calls were located in PropertyManager. They are gone now, which I hope will make the change to using a context easier.

regards, finn
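The percentage base Jeremias describes in point 1 can be shown with a few lines of arithmetic. This is a hedged sketch, not the TraitSetter API: the method and parameter names are invented for illustration.

```java
// Sketch of the percentage calculation discussed above: the base for
// background-position-horizontal/-vertical is the padding-rect size
// minus the intrinsic size of the background image. Invented helper,
// not FOP code.
class BackgroundPosition {

    /**
     * @param paddingRectSize    padding-rect extent (e.g. millipoints)
     * @param imageIntrinsicSize intrinsic image extent in the same unit
     * @param percentage         e.g. 0.5 for "50%"
     * @return offset of the image within the padding rect
     */
    static int offset(int paddingRectSize, int imageIntrinsicSize,
                      double percentage) {
        int base = paddingRectSize - imageIntrinsicSize;
        return (int) Math.round(percentage * base);
    }
}
```

With 0% the image sits flush at the start edge and with 100% flush at the end edge, which is why the intrinsic size has to be subtracted from the base rather than using the padding-rect size directly.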
Re: cvs commit: xml-fop/src/java/org/apache/fop/layoutmgr TraitSetter.java BlockLayoutManager.java
[Simon] There does not seem to be a need to add the inherited value later; the property maker already has done so. See IndentPropertyMaker.compute(PropertyList). It uses propertyList.getInherited(baseMaker.propId).getNumeric() to get the inherited value. Earlier FOP developers understood this part well.

[Jeremias] I understand, but I think you're talking exclusively about the property resolution phase (right?) while I found that I need the computed value of the margin property (not only the explicit one, as is currently the case) and the inherited start-indent for the layout manager code and to set traits correctly.

[Simon] PropertyList.get(Constants.PR_START_INDENT) gets the computed value, that is, the value after property refinement. It is not the raw value stated in the FO file; FOP does not store that at all. FOP tries to do property refinement immediately. If that is not possible because the computed value depends on a trait of an area, FOP stores an expression, which can be computed at layout time. I see no mention in section 5 of the spec that the trait value for start-indent is different from the computed property value.

The problem that Jeremias is trying to address is computing the space-[start,end] traits. Since only start-indent was computed correctly, he tried to compute the space-[start,end] traits based on the [start,end]-indent, and to do that the inherited indent values (which have just been added in IndentPropertyMaker) must be subtracted. Which may indicate that calculating space-[start,end] traits based on indents is probably not the right approach. I think the solution would be to:
- Store the [start,end]-indent traits.
- Let LayoutContext.refIPD be the reference IPD rather than the content ipd of the parent.
- Let the renderer keep track of the reference rect and use the start-indent trait to locate the position of the content rectangle.

regards, finn
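The renderer-side computation proposed in the last bullet list reduces to simple geometry. A minimal sketch, with invented names (not FOP's LayoutContext or trait API):

```java
// Sketch of the proposal above: carry the reference-area IPD and let
// the renderer derive the content rectangle from the
// [start,end]-indent traits. Invented helper, not FOP code.
class ContentRect {

    /** IPD of the content rectangle inside the reference area. */
    static int contentIPD(int refIPD, int startIndent, int endIndent) {
        return refIPD - startIndent - endIndent;
    }

    /** Offset of the content rectangle from the reference-area edge. */
    static int contentStart(int startIndent) {
        return startIndent;
    }
}
```

The point of the proposal is that with the indents stored as traits, the renderer needs no space-[start,end] traits at all to place the content rectangle.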
Re: cvs commit: xml-fop/src/java/org/apache/fop/layoutmgr TraitSetter.java BlockLayoutManager.java
[Jeremias] Would you please check if it is acceptable to put the inherited values directly into the CommonMarginBlock? It might have been cleaner to always get the value via the parent FO, but I think in this case it helps simplify the code in TraitSetter and BlockLayoutManager.

It looks wrong, it feels wrong, but I'm not at all sure if it is wrong. But I would be tempted to drop the use of the space-[start,end] traits and instead let the renderers use the [start,end]-indent traits and the reference rectangle to calculate the start position of the content rectangle. Alternatively, calculate space-[start,end] based on the [start,end]-indent traits on the parent area rather than the parent fo.

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/properties BoxPropShorthandParser.java
[EMAIL PROTECTED] wrote: Log: convertValueForProperty didn't have the right signature and therefore didn't override the superclass implementation.

That was sloppy of me. Thanks for finding and fixing it.

    +ListProperty listProperty = (ListProperty)property;

That is a no-no. Properties should not be cast, only coerced using the getXXX methods. And the getList() method is already defined on Property, so no cast is needed. Using casts will prevent us from adding property proxies, which I suspect will be needed.

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/datatypes LengthBase.java
Simon Pepping wrote: On Tue, Dec 28, 2004 at 06:03:13PM -, [EMAIL PROTECTED] wrote:

    Index: LengthBase.java
    ===================================================================
    RCS file: /home/cvs/xml-fop/src/java/org/apache/fop/datatypes/LengthBase.java,v
    retrieving revision 1.8
    retrieving revision 1.9
    diff -u -r1.8 -r1.9
    --- LengthBase.java  28 Oct 2004 10:00:19 -  1.8
    +++ LengthBase.java  28 Dec 2004 18:03:12 -  1.9
    @@ -20,6 +20,7 @@
     import org.apache.fop.fo.Constants;
     import org.apache.fop.fo.FObj;
    +import org.apache.fop.fo.IntrinsicSizeAccess;
     import org.apache.fop.fo.PropertyList;
     import org.apache.fop.fo.expr.PropertyException;
    @@ -122,13 +128,15 @@
         case BLOCK_HEIGHT:
             return parentFO.getLayoutDimension(PercentBase.BLOCK_BPD).intValue();
         case CONTAINING_REFAREA:    // example: start-indent, end-indent
    -        {
             //FONode fo;
             //for (fo = parentFO; fo != null && !fo.generatesReferenceAreas();
             //     fo = fo.getParent());
             //return (((fo != null) && (fo instanceof FObj)) ? ((FObj)fo).getContentWidth() : 0);
             return 0;
    -        }
    +    case IMAGE_INTRINSIC_WIDTH:
    +        return ((IntrinsicSizeAccess)propertyList.getFObj()).getIntrinsicWidth();
    +    case IMAGE_INTRINSIC_HEIGHT:
    +        return ((IntrinsicSizeAccess)propertyList.getFObj()).getIntrinsicHeight();
         case CUSTOM_BASE:
             //log.debug("!!! LengthBase.getBaseLength() called on CUSTOM_BASE type !!!");
             return 0;

I am surprised to find the property list here. If I understand Finn's restructuring of the usage of property lists, it was the intention that all references to the property lists are released, so that they can be GC'ed. The exceptions are the subtree under Marker and the line of ancestors of RetrieveMarker. This patch reveals that LengthBase maintains a reference to the property list. Perhaps that is an oversight in the restructuring? Indeed. Perhaps the references to the property list and to the parent FO might be replaced by a reference to the FO to which the property using this LengthBase belongs. But I would not know how to replace the method PropertyList.getInherited(). Perhaps Finn wants to take a look at this?
Unfortunately not at the moment, where real work takes its toll. The idea was that all base values were assigned on the FObj in layoutDimension, including font sizes. And that also goes for image sizes. In the longer term the layoutDimension should be available in a context object that is passed into all Length.getValue(context) calls.

regards, finn
Re: Large files.
The loop can be stopped when we temporarily run out of FO tree nodes and restarted again when new nodes have been added. I suppose that the FO tree can then be viewed as a stream of FO nodes.

[Victor] This model probably works fine if you never need to look ahead, but there are numerous examples (well discussed in the archives) where one does need to do that, the most obvious being automatic table layout. Peter's solution to that is pull parsing, which probably works, but forces an intermingling of interests that I need to avoid. My solution is to serialize the FOTree as necessary.

Did you notice that if a FOTree (or a fragment of it) is serialized to a preorder sequential representation with end markers, the preorder, postorder and child events can be fired directly from the input stream? IOW the event-based layout can work both off a normal parent/children linked tree and off a sequential tree.

The bottom line is that, if you conceivably need to see both the beginning and end of the input simultaneously (which we do), I can see a need for several passes over the same tree fragment, but do we really need to see the beginning and the end at the same time? The auto table example appears to need 2 or 3 sequential passes over the table subtree (one to find the minimum/maximum column widths and one to do the actual layout), but the layout process is still sequential.

regards, finn
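The observation above, that a preorder serialization with end markers can drive the preorder/postorder/child events directly, can be sketched in a few lines. The token format and event strings here are invented for the example; only a stack of open nodes is needed, never a linked tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: fire preorder/postorder/child events straight from a
// preorder token stream with "END" markers. Invented token format;
// assumes the stream is well-formed (every start has an END).
class StreamEvents {

    static List<String> fire(List<String> tokens) {
        List<String> events = new ArrayList<>();
        Deque<String> openNodes = new ArrayDeque<>();
        for (String t : tokens) {
            if (t.equals("END")) {
                String name = openNodes.pop();
                events.add("postorder(" + name + ")");
                if (!openNodes.isEmpty()) {
                    // The node on top of the stack is the parent.
                    events.add("child(" + openNodes.peek() + "," + name + ")");
                }
            } else {
                events.add("preorder(" + t + ")");
                openNodes.push(t);
            }
        }
        return events;
    }
}
```

Since the stack holds only the current line of ancestors, memory use is bounded by tree depth, not tree size, which is the point of treating the FO tree as a stream.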
Re: Large files.
The problem with Keiron's layout code (with respect to large input files) is that it works top-down on the LM tree and thus requires the creation of the complete LM tree for the page sequence. To better fit within the SAX event model the layout process should also be event driven, triggered f.ex. by the endBlock() and character() events. That means that the traversing of the FOTree cannot easily be done using recursive descent. Instead the main loop over the FO tree could use a non-recursive tree traversal like this:

    public Element traverse(Node q, Node end, int mode) {
        while (q != end) {
            switch (mode) {
            case FIRST:
                // First time in the node.
                preorder(q);
                if (q.hasChildNodes()) {
                    q = q.getFirstChild();
                    break;
                }
                mode = NEXT;
                // fallthrough
            case NEXT:
                // All children are handled.
                postorder(q);
                // Add the node to its parent.
                if (q.getParentNode() != null) {
                    child(q.getParentNode(), q);
                }
                // fallthrough
            case RESTART:
                // Attempt to move vertically to next sibling. [horizontally?]
                if (q.getNextSibling() != null) {
                    q = q.getNextSibling();
                    mode = FIRST;
                    break;
                }
                // No sibling found, move to the parent to
                // do postorder.
                q = q.getParentNode();
                mode = NEXT;
            }
        }
        return null;
    }

...

The loop can be stopped when we temporarily run out of FO tree nodes and restarted again when new nodes have been added. I suppose that the FO tree can then be viewed as a stream of FO nodes.

[Peter] I suppose so. And I suppose that making layout event-driven would better fit in with the SAX event model. It just doesn't make sense from the point of view of layout, which is basically a top-down process.

By top-down, do you mean the visiting order of the nodes in the tree? Or the structure of the code that does the layout? Like this:

    process(Node n) {
        preorder(n);
        for (each child in n) {
            process(child);
            child(n, child);
        }
        postorder(n);
    }

The traverse() code does the exact same as the process() code. The differences are:
- All people immediately recognize process().
- process() can use local variables to store state.
A major drawback of process() is that it is darn hard to suspend processing and then restart it. Our current getNextBreakPoss() code is a nice example of just how hard it is.

regards, finn
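The suspendability point above can be made concrete: because the explicit loop's entire state is the pair (q, mode), the traversal can return at any step and be resumed later. A minimal sketch with an invented Node type (not the FOP LM classes); the pump() method processes a bounded number of steps per call.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why the explicit-state loop is easy to suspend: all
// traversal state lives in the fields q and mode, so pump() can
// return mid-traversal and be called again later. Invented minimal
// Node type; not FOP code.
class Resumable {

    static class Node {
        String name;
        Node parent, firstChild, nextSibling;
        Node(String n) { name = n; }
    }

    static final int FIRST = 0, NEXT = 1;

    Node q;                 // current node
    int mode = FIRST;       // traversal state
    final List<String> events = new ArrayList<>();

    Resumable(Node root) { q = root; }

    /** Run at most maxSteps iterations; return true when finished. */
    boolean pump(int maxSteps) {
        int steps = 0;
        while (q != null && steps++ < maxSteps) {
            switch (mode) {
            case FIRST:
                events.add("pre:" + q.name);
                if (q.firstChild != null) { q = q.firstChild; break; }
                mode = NEXT;
                // fallthrough
            case NEXT:
                events.add("post:" + q.name);
                if (q.nextSibling != null) { q = q.nextSibling; mode = FIRST; break; }
                q = q.parent; // stays in NEXT mode for the parent
            }
        }
        return q == null;
    }
}
```

Suspending the recursive process() equivalently would require capturing the Java call stack, which is exactly the contortion the message attributes to getNextBreakPoss().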
Refactoring of knuth line breaking code.
Hi, I've been playing around with the knuth line breaking code and made a slight refactoring of it. The result is 38% faster for START align, 78% faster for CENTER align and 12% for JUSTIFY align. Measured using the 'time' command, jdk1.4.2_03, -Xmx2 and a rough edition of Sherlock Holmes from the Gutenberg project:

http://apache.org/~bckfnn/advsh12s.fo
http://apache.org/~bckfnn/advsh12c.fo
http://apache.org/~bckfnn/advsh12j.fo

(s=start, c=center, j=justify alignment)

The patch is here: http://issues.apache.org/bugzilla/show_bug.cgi?id=32612

The performance improvement comes from:
- Using -INFINITE_RATIO when there is too little shrink in a line for an element to fit.
- Reusing the BestRecords instance instead of creating an instance for each line * for each element.
- Using ArrayList instead of LinkedList.
- Avoiding the indexOf calls on Paragraph.
- Using '*' instead of Math.pow() for calculating demerits.
- Splitting activeList into an array indexed by line, with the nodes within a line linked together.

These improvements could also be applied to the existing code, so I think the more interesting point is the quality of the refactoring job. I guess that I have moved away from a faithful implementation of Knuth's algorithm, but I think the result is more in line with common java code. Opinions?

regards, finn
Large files.
[Peter] As a completely unintentional side-effect, it gave me the tools to solve the really critical FOP performance problem on large files. This isn't going to be solved by micro-efficiencies on FO tree storage.

I'm going to use this as an excuse to outline my thinking for handling large input files. I'm almost sure that it is a bit different from your view. This is a rant, and I don't have a working patch to back it up.

The problem with Keiron's layout code (with respect to large input files) is that it works top-down on the LM tree and thus requires the creation of the complete LM tree for the page sequence. To better fit within the SAX event model the layout process should also be event driven, triggered f.ex. by the endBlock() and character() events. That means that the traversing of the FOTree cannot easily be done using recursive descent. Instead the main loop over the FO tree could use a non-recursive tree traversal like this:

    public Element traverse(Node q, Node end, int mode) {
        while (q != end) {
            switch (mode) {
            case FIRST:
                // First time in the node.
                preorder(q);
                if (q.hasChildNodes()) {
                    q = q.getFirstChild();
                    break;
                }
                mode = NEXT;
                // fallthrough
            case NEXT:
                // All children are handled.
                postorder(q);
                // Add the node to its parent.
                if (q.getParentNode() != null) {
                    child(q.getParentNode(), q);
                }
                // fallthrough
            case RESTART:
                // Attempt to move vertically to next sibling.
                if (q.getNextSibling() != null) {
                    q = q.getNextSibling();
                    mode = FIRST;
                    break;
                }
                // No sibling found, move to the parent to
                // do postorder.
                q = q.getParentNode();
                mode = NEXT;
            }
        }
        return null;
    }

This is the algorithm from Knuth TAoCP, vol I, page 323, program S. It traverses any DOM tree fragment from 'q' to 'end' without using the java stack. The tree must support getFirstChild() and getNextSibling(). During the traversal it fires 'preorder', 'postorder' and 'child' events.
When applied to FOP, the nodes are LayoutManagers, and getFirstChild() and getNextSibling() will create the LayoutManager for the first child / next sibling. The loop can be stopped when we temporarily run out of FO tree nodes and restarted again when new nodes have been added. I suppose that the FO tree can then be viewed as a stream of FO nodes.

regards, finn
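The claim in the follow-up discussion that traverse() fires the exact same events as the recursive process() can be checked mechanically. Below is a self-contained sketch (invented minimal Node type, not DOM or FOP classes) that runs both on a small tree and records the event sequences.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch comparing the two traversal styles from the thread: the
// recursive process() and the explicit-state loop produce identical
// preorder/postorder/child event sequences.
class TraversalCompare {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node parent;
        Node(String name) { this.name = name; }
        void add(Node c) { c.parent = this; children.add(c); }
        Node firstChild() { return children.isEmpty() ? null : children.get(0); }
        Node nextSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }
    }

    /** The recursive formulation. */
    static void process(Node n, List<String> ev) {
        ev.add("pre:" + n.name);
        for (Node c : n.children) {
            process(c, ev);
            ev.add("child:" + n.name + ":" + c.name);
        }
        ev.add("post:" + n.name);
    }

    static final int FIRST = 0, NEXT = 1;

    /** The explicit-state loop (RESTART folded into NEXT for brevity). */
    static void traverse(Node q, Node end, List<String> ev) {
        int mode = FIRST;
        while (q != end) {
            switch (mode) {
            case FIRST:
                ev.add("pre:" + q.name);
                if (q.firstChild() != null) { q = q.firstChild(); break; }
                mode = NEXT;
                // fallthrough
            case NEXT:
                ev.add("post:" + q.name);
                if (q.parent != null) ev.add("child:" + q.parent.name + ":" + q.name);
                if (q.nextSibling() != null) { q = q.nextSibling(); mode = FIRST; break; }
                q = q.parent; // no sibling: back up, stay in NEXT mode
            }
        }
    }
}
```

Running both over the same tree and comparing the event lists is a cheap regression test for any rewrite of the traversal.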
Re: Marker.rebind()
[Simon] I have just committed a patch which fixes bug 32253. One problem remains: the text in the marker does not always have the right properties, e.g. the right size. This is probably due to the fact that Marker.rebind() does not rebind the whole subtree below a marker.

You're right! It seems that FOText nodes aren't bind()'ed by Marker.rebind(). I'm certain that I had that working at some point before I submitted the patch. I guess it was the change in FOText's inheritance, which happened just before I submitted, that caused the problem.

The solution is to move the FObj.createPropertyList() method to FONode and to call ft.createPropertyList() from FObjMixed.addCharacter(). That will register the FOText with the Marker.children map. I'll do it later this week.

regards, finn
Knuth maximum demerit value.
Hi, What is the logic behind INFINITE_DEMERITS = 1E11? Having a threshold of 20 can result in a demerit value of (1 + 100 * 20 ** 3) ** 2 == 6.400016E11 for the first KnuthNode, which is then ignored since it is worse than the initial demerit value. I would guess that when we use a ratio of 20, the initial demerit value should somehow match.

regards, finn
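The arithmetic in the question is easy to verify: with an adjustment ratio of 20, badness is 100 * 20^3 = 800000, so the demerits come out as (1 + 800000)^2, which indeed exceeds 1E11. A small invented helper (not FOP code) using the Knuth-style demerit formula quoted in the message:

```java
// Checking the arithmetic above: demerits = (1 + badness)^2 with
// badness = 100 * |ratio|^3. Invented helper, not FOP code; the "1"
// stands in for the line penalty, as in the formula quoted above.
class DemeritCheck {

    static double demerits(double ratio) {
        double badness = 100 * Math.abs(ratio * ratio * ratio);
        double d = 1 + badness;
        return d * d;
    }
}
```

So any INFINITE_DEMERITS constant used as an initial best value has to be larger than the worst demerits reachable at the chosen threshold, which supports the point that the two values should be derived from each other rather than fixed independently.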
Re: More questions on line breaking.
1) What is inactiveList doing? Nodes are added but never used.

[Simon] It contains all feasible breakpoints except those that are still active, i.e., are still in scope as the start of a line ending at the currently considered breakpoint. At the end of the loop the active list only contains nodes pertaining to the end of the paragraph. From the best one of them the breakpoints of the paragraph are calculated by tracing back to the beginning through the inactive list. See the code in LineLM after the comment "// use the chosen node to determine the optimum breakpoints". In the line "bestActiveNode = bestActiveNode.previous;" the previous node is held in the inactive list.

Ok, so it isn't really needed when the algorithm is implemented in java. Just having the previous node linked from within bestActiveNode is enough to keep the inactive nodes alive. So inactiveList can be removed.

regards, finn
Re: Knuth linebreaking questions
And why not adjust the spacing within the user specified min/max for START and END alignment? [Luca] Should the user desire adjusted spaces, wouldn't it be better for him to specify justified alignment? :-) Seriously, the recommendation (at 7.16.2 letter-spacing and 7.16.8 word-spacing) states that these spaces may also be influenced by justification, but says nothing about start and end alignments. I tend to read that to mean that word spacing may be pushed beyond the specified range by justification. And I would think that unjustified alignment still has the option of using the word-spacing range but ofcourse has to stay within the range. I'm still not sure why it would be ok to ignore any user specified min and max values of 'word-spacing' during START and END alignment. If a user specifies a length range, what would the reason be for not using it? Perhaps with additional DEFAULT_SPACE_WIDTH. When alignment is start or end, each space has always its .optimum width, so there is no need to look at the .minimum and .maximum: the user most preferred value is already used. Is there anything that prevents using a non .optimum value within the range if the result is judged to be better (with a lower demerit). Ok, performance is indeed a fine reason, but IMHO such quality vs. speed tradeoffs should eventually be made by the user rather than us. Simon told the same: # Note that in TeX such thresholds are user-adjustable parameters. I # think they should eventually be so in FOP too, for those of us who # have the most exquisite taste of line layout. and I think it's a good idea; the algorithm should: 1 find breaking points without hyphenation 2 hyphenate 3 find breaking points with hyphenation 4 decide which ones are better and point #4 uses the user-definable threshold; where should this constant be stored? Inside the code of LineLM or in a configuration file? An extension attribute? fo:block fox:knuth-threshold=5 ... 
/fo:block I suspect that the other knuth parameters should be specified the same way. But it is not a high priority IMO. regards, finn
More questions on line breaking.
Hi, Some more questions.

1) What is inactiveList doing? Nodes are added but never used.

2) If there is no shrink in a line (the case in START alignment), then nodes are never removed from activeList until a forced break element is found. Is that really the intention of the algorithm? It seems suspect that a ratio of INFINITE_RATIO is also created when the break is too wide to fit within a node.

regards, finn
Re: Knuth linebreaking questions
1) What is the purpose of 2 glues for a normal space in END and START alignment:

    new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
    new KnuthPenalty(0, 0, false, , true));
    new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

[Luca Furini] The purpose is to give each line (but the last one) the same stretchability, regardless of the number of spaces in it. If the penalty is not used (there is no line ending there) the overall effect of the 2 glues is a 0 stretchability and does not modify the line total; if the penalty is used (a line ends there) then the stretchability of the previous glue is added to the line total, which becomes 3 * wordSpaceIPD.opt because the previous space, as said before, added 0 (the following glue is suppressed). In justified text, a line with many spaces can be adjusted in order to be much shorter, or much longer. If left-aligned text used the same elements, the algorithm would find the same breaking points; but this time adjustment ratios are not used, so a line with many spaces would be much longer, or much shorter, than the other lines. Using these elements, the algorithm creates lines whose unadjusted widths are roughly the same.

Ok, thank you for the explanation. I'm still not sure why it would be ok to ignore any user specified min and max values of 'word-spacing' during START and END alignment. If a user specifies a length range, what would the reason be for not using it? Perhaps with an additional DEFAULT_SPACE_WIDTH. And why not adjust the spacing within the user specified min/max for START and END alignment?

3) What is the reasoning for doing hyphenation only after threshold=1 fails? Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.
Finding hyphenation points is time-expensive (all words must be hyphenated, not only the ones near a line's end), the sequence of elements becomes longer, there are more feasible breaking points, and a line ending with a "-" is less beautiful; so I thought it would be better if a set of breaking points could be found without hyphenation. I just took the hyphenate property as a suggestion instead of an order! :-) Note that the same algorithm with the same threshold could find a different set of breaking points with and without hyphenation, because the elements are different. Without hyphenation, spaces could need a little higher adjustment, for example.

Ok, performance is indeed a fine reason, but IMHO such quality vs. speed tradeoffs should eventually be made by the user rather than us. Thank you for taking the time to explain it all in such great detail.

regards, finn
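Luca's explanation of the glue/penalty/glue triple reduces to simple bookkeeping, which can be checked with a few lines of arithmetic. An invented helper, not FOP code; it encodes only the three elements quoted in the thread: glue(width 0, stretch +3*opt), penalty(0), glue(width opt, stretch -3*opt).

```java
// Arithmetic check of the space-element triple discussed above.
// If no line ends at the penalty, both glues count and the net
// stretch is zero; if a line ends there, the second glue is
// suppressed and the line gains 3 * opt of stretch.
class SpaceElements {

    static int netStretchNoBreak(int opt) {
        return (3 * opt) + (-3 * opt); // both glues counted
    }

    static int stretchContributedAtBreak(int opt) {
        return 3 * opt; // second glue suppressed after the break
    }

    static int netWidthNoBreak(int opt) {
        return 0 + opt; // the space still occupies its optimum width
    }
}
```

This is how every line (except the last) ends up with the same stretchability of 3 * opt regardless of its space count, which is the effect Luca describes.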
Knuth linebreaking questions
Hi Luca (and others), I've been trying to get my head around the line breaking code and during that process some questions have come up. I urge you *not* to take anything I ask as a sign of criticism or as a request for changes. I don't have the Knuth paper where the algorithm is described, so perhaps the answers would be obvious if I read it.

1) What is the purpose of 2 glues for a normal space in END and START alignment:

    new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
    new KnuthPenalty(0, 0, false, , true));
    new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

and why isn't the min and max of wordSpaceIPD used?

2) What does the threshold parameter to findBreakingPoints control? It seems to be a performance parameter which controls the number of active nodes, rather than a quality parameter. Or to frame my question differently: if threshold=1 finds a set of breaks, will threshold=5 always pick the same set of breaks? Or can threshold=5 find a better set of breaks?

3) What is the reasoning for doing hyphenation only after threshold=1 fails? Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.

4) I've compared your code to tex_wrap http://oedipus.sourceforge.net/texlib/ and the main difference is in the way new KnuthNodes are added to the active list. Is the BestRecords part of Knuth or is it your own invention? Why is it only fitness classes in BestRecords that are higher than minDemerits + incompatibleFitnessDemerit that are added to activeList? Why not all fitness classes in BestRecords?

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/traits LayoutProps.java SpaceVal.java
[Glen] 2.) Appended EN_ to enumeration constants to

[J.Pietschmann] Yuk. Having a large number of identifiers in the same scope with an identical prefix isn't very good for autocompletion, both in Emacs and Eclipse.

[Glen] We've been doing the same with PR_ (properties) and FO_ (FO's) for quite some time, to avoid a name conflict somewhere. After hitting the EN_, you're at the same place you would be without the prefix. Furthermore, you can now hunt away for your enumeration constants without them being intermixed with the PR_'s and FO_'s. It was also a commenting issue: TRUE and FALSE, for example, without a prefix, just weren't self-documenting enough to emphasize that we're working with enumeration constants here. (Remember, we removed the old interfaces--per your desire as well as mine--such as WritingMode.LR_TB or whatever that previously provided that emphasis.)

How about having 3 interfaces: 'Properties', 'Elements' and 'Enums', which contain the constants without any prefix? And then decide that these interfaces are never implemented, but the constants are always accessed using the interface name: Enums.TRUE. That would keep the searchability and perhaps even help us when (if) we move to typesafe enums.

regards, finn
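The interface-per-namespace suggestion above can be sketched in a few lines. The constant names and values here are invented for the example; they are not the real FOP constants.

```java
// Minimal sketch of the suggestion above: unprefixed constants kept
// in interfaces that are never implemented and always accessed
// through the interface name. Invented names and values.
interface Enums {
    int TRUE = 1;
    int FALSE = 2;
}

interface Properties {
    int TEXT_ALIGN = 100;
}

class PrefixDemo {
    static boolean isTrue(int enumValue) {
        return enumValue == Enums.TRUE;
    }
}
```

Writing Enums.TRUE instead of a bare EN_TRUE keeps the qualifier searchable and self-documenting, and the same access pattern would survive a later migration to typesafe enums (Enums.TRUE stays valid if Enums becomes an enum type).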
Re: Do we need an inherit enumeration constant?
[Finn] It [an INHERIT constant] isn't needed as an enum value because the 'inherit' keyword takes precedence over the other enumeration values. See 5.9.11: "The Keyword values take precedence over EnumerationToken." where Keyword is 'inherit'. The code to handle 'inherit' is at line 397: http://cvs.apache.org/viewcvs.cgi/xml-fop/src/java/org/apache/fop/fo/properties/PropertyMaker.java?annotate=1.11

[Glen] Finn, I've read what you've written above, as well as what the spec is saying -- but I am confused here. Why does there need to be a precedence-determining rule in the spec that says "The Keyword values take precedence over EnumerationToken."? Does anyone know?

Perhaps because 'Keyword' and 'EnumerationToken' are produced by the same lexical rule: http://www.w3.org/TR/REC-xml-names/#NT-NCName

[Glen] My first thought is that we're only going to have one attribute value between the quotes (i.e., inherit, left, right, justify, etc.) so I don't see the need for order-of-evaluation rules or declarations of precedence. But are there actually cases where you would have more than one enumeration constant (e.g., left | justify)? I must be missing something here.

'azimuth' and 'text-decoration' can have multiple enum values (e.g. "overline line-through"). But no property can have inherit as well as some other value.

regards, finn
Re: Do we need an inherit enumeration constant?
[Glen] This is possibly a question for Finn, but if anyone knows: In our Constants.java [1], we don't have a property enumeration constant for inherit. (Go here [2], and search on inherit, you will see it listed in the Value: section for multiple properties.) Is there any reason why we don't need it, or did we just forget to add it to our Constants.java? It isn't needed as an enum value because the 'inherit' keyword takes precedence over the other enumeration values. See 5.9.11: The Keyword values take precedence over EnumerationToken. where Keyword is 'inherit'. The code to handle 'inherit' is at line 397 http://cvs.apache.org/viewcvs.cgi/xml-fop/src/java/org/apache/fop/fo/properties/PropertyMaker.java?annotate=1.11 OTOH the code to deal with 'inherit' and the other enum values needs some changes to deal with optional white-space, so perhaps the parsing of enums should be passed to the PropertyParser and the 'inherit' test should be placed after the call to PropertyParser.parse(). In that case an INHERIT constant would be needed. regards, finn
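The precedence rule described above boils down to a simple ordering in the property maker: test for the 'inherit' keyword before trying to parse enum tokens. A hypothetical sketch (method and value names are illustrative, not the actual PropertyMaker code):

```java
// Hypothetical sketch of why no INHERIT enum constant is needed:
// the 'inherit' keyword is checked first, so it takes precedence
// (XSL-FO 5.9.11) and never reaches the enumeration lookup.
public class InheritSketch {
    static String make(String specified, String inheritedValue) {
        String value = specified.trim();
        if ("inherit".equals(value)) {
            return inheritedValue;     // keyword handled first
        }
        return "enum:" + value;        // only now treated as an enum token
    }

    public static void main(String[] args) {
        System.out.println(make("inherit", "justify"));
        System.out.println(make("left", "justify"));
    }
}
```

If the 'inherit' test were moved after the enum parsing, as Finn suggests for white-space handling, an INHERIT constant would then be required to represent the keyword in the parsed result.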
Re: remove LayoutManager.initialize()?
[Glen] Does anyone have a problem if I worked towards removing the initialize() method from our LayoutManager interface? There are two ways of looking at it. The code in initProperties() which can be moved to the ctor should be moved. That is no loss at all and will reduce the complexity of the LMs a tiny bit. But removing the initialize() method will reduce the flexibility that LMs currently have to retrieve information from the parent LM. Perhaps that flexibility is not used at the moment, but I suspect that it will be needed to implement the irregular inheritance of properties like text-decoration. regards, finn
Re: AreaFactory patch
[Glen] Finn, keep in mind--both you and Simon wanted pluggable LM's, and you even supplied a patch for it a few months ago. But you have been mostly silent on the matter ever since (i.e., it looks like you don't have a need for it ATM.) Or perhaps I've been working on other things, like property optimizations and exceptions. And just maybe I didn't feel that I had to be the one who implemented pluggable LMs since I wasn't the one who removed the existing pluggability. So sometimes it is good to have things sit in Bugzilla for a couple of months, see if others want it, or what modifications they want, or see if even the original requestor still needs it. Also, Tibor seemed to be fine with using pluggable LM's instead of pluggable Area's--i.e., not even needing them *now*--which would make his desires in sync with yours and Simon's (and mine), as well as keep our layout code in our LM's instead of moving it to the Area objects. Do you still see a need for *both* pluggable LM's and pluggable Areas in our code then? I didn't find that realistic, as I am uncertain of the additional power that it offers, but am asking your opinion here. I only see a need for pluggable LMs, but the AreaFactory patch is so small that I see no problem with throwing a bone to Tibor. regards, finn
Exceptions. (Was: AreaFactory patch)
[Peter] On the topic of exceptions, and now that it's all over... I was puzzled by this discussion. Would anyone expect that Defoe would subclass SAXException for document validation errors? That is your choice. Surely exceptions that occurred during SAX event handling (e.g. startElement) should be handled, either by wrapping the exception in a SAXException, or by passing SAXException subclasses along, or by throwing some kind of RuntimeException (did I miss another option?). I don't think exception handling by e.printStackTrace(); is a long term solution. If not (it doesn't), why not? AFAIK, there is no well defined API to XSL:FO parsing, so everybody will do it differently. And if someone was inclined to write an FO processor using a DOM front-end, would you expect validation errors to throw subclasses of SAXException? Of course not. The same fo file, with the same errors, could be processed by three different processors, using three different parsing methodologies, and the same validation errors would be encountered. Do you mean that the 3 different processors should ideally report the same validation errors in the same manner? That can only happen after someone standardizes a SAFO API (Simple API for FO parsing). Until then all implementations will throw different exceptions, which is fine by me. (OK, you can classify at least two categories of validation error, but I think the argument still applies.) SAX != XML parsing, let alone document validation. True, but FOP heavily depends on SAX. regards, finn
Re: AreaFactory patch
[Chris] I'm definitely in agreement with you on this one Glen. Let's keep Layout simple whilst it's still unfinished. Pluggable LMs can be added once we have an initial release. [Andreas] Well... (sigh)... well ('nutha sigh) What *does* Finn think, in that case? So far, I've yet to hear a single *solid* argument pleading against the proposed change. Of course, something like LM Makers can be added later on --the proposed AreaFactory shouldn't hinder that. I have some minor suggestions for the patch: - It should be strictly typed: createBlock(..), createInline(..) - It should be complete, so that all area creation is done through the factory, not just the 3 areas that Tibor needs. All we heard up to here is a few vague concerns about so-called increased complexity. What?!? It's a plain, simple, basic-as-can-be Factory pattern for chrissake! It doesn't bite... or does it? Anyone? I've never bought the increased complexity argument. Not in this case and not in any of the previous cases where it was raised. regards, finn
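Finn's two suggestions amount to a strictly typed, complete factory interface. A hypothetical sketch (class names mimic FOP's area classes but this is illustrative, not FOP code):

```java
// Hypothetical sketch of a strictly typed AreaFactory: one create
// method per area type, and all area creation going through it.
public class FactorySketch {
    static class Block { }
    static class Inline { }

    interface AreaFactory {
        Block createBlock();
        Inline createInline();
        // ...one method per area type, covering all area creation
    }

    static class DefaultAreaFactory implements AreaFactory {
        public Block createBlock() { return new Block(); }
        public Inline createInline() { return new Inline(); }
    }

    public static void main(String[] args) {
        AreaFactory f = new DefaultAreaFactory();
        System.out.println(f.createBlock().getClass().getSimpleName());
        System.out.println(f.createInline().getClass().getSimpleName());
    }
}
```

The typed methods avoid casts in the layout managers, which is presumably why an untyped `Area createArea(...)` variant was not preferred.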
Re: Connection between FObj and Area
[Tibor Vyletel] Hello Fopsters, I use FOP (dev 1.0) in a bit different scenario than you. My renderer works in the GEF framework. Till now, I have hacked a simple change into the relevant layout managers: BlockLayoutManager.getParentArea(Area childArea) { ... // this creates a connection between fobj and curBlockArea: curBlockArea.addTrait(Trait.ID_AREA, this.fobj); ... } This solution is just a hack and a maintenance nightmare, as the design is constantly changing ;) I am aware that the described construction is breaking memory optimization goals, however I need it to achieve my goal: enable editing after the rendering process and make this editing as responsive as possible. Lately (in fact, after the removal of the getID() method from FObj), I've been thinking about a more elegant solution which would allow me to track a mapping between FObjs and Areas in cases where I need it. I know that any technique which would force this connection in standard processing would not be possible, because its effects on memory consumption would be relevant. My alternative idea is: - create an AreaFactory interface (or abstract class) which would be responsible for the creation of areas. Provide a default implementing class with today's functionality scattered among LMs. There are two choices how to design this interface: a) with one strong method: Area createArea(LayoutManager lm, FObj fobj, Object constraints) b) with a set of specialized methods for creation of different area types: Inline createInline(...) {} Block createBlock(...) {} ... Would you need to return subclasses of Inline and Block etc.? Or would you just set various additional traits such as this.fobj? - refactor instantiation of area objects in layout managers: remove all direct instantiations and replace them with calls to DefaultAreaFactory - solve how to configure the AreaFactory before the layout process.
Layout managers don't have access to UserAgent so far, so the best method would probably be a pair of static methods: Actually, LMs normally have access to the fobj and through that the FONode.getUserAgent() method. AreaFactory AreaFactory.getFactory() AreaFactory.setFactory(AreaFactory factory) and this construction in LMs: currArea = AreaFactory.getFactory().create... When this pattern is applied, my problem would be simply solved by the creation of descendants for some Area classes, which would hold a reference to the FObj data. The default implementation would bring no extra memory consumption and/or relevant speed degradation. The reason why I have written this mail is to obtain some feedback about the suggested refactoring. In fact I am forced to make some changes to the FOP code because of my project's special requirements. Would you still need to change FOP sources, even after your proposal was implemented? The real payback for me would be the case when all the maintenance nightmares would be gone, and this can happen only when the creation of Areas is more modular. I think that modularization of this could bring its two cents to the quality of FOP's source code. From your description I sort of get the impression that a smaller change, such as having all LMs call a new method, say AbstractLayoutManager.setGeneralTraits(FObj), with a default empty implementation, might suffice. Would it be good enough for you to change AbstractLayoutManager to add your special requirements? regards, finn
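The static-accessor pattern Tibor describes, combined with a custom factory whose areas remember their generating FObj, could look roughly like this (a hypothetical sketch; all names are illustrative):

```java
// Hypothetical sketch of AreaFactory.getFactory()/setFactory() plus
// a custom subclass whose areas hold a reference back to the FObj,
// as Tibor's editing scenario requires.
public class PluggableSketch {
    static class FObj { final String name; FObj(String n) { name = n; } }
    static class Block { }
    static class TrackedBlock extends Block {
        final FObj source;                     // back-reference for editing
        TrackedBlock(FObj fobj) { source = fobj; }
    }

    static abstract class AreaFactory {
        private static AreaFactory factory = new AreaFactory() {
            Block createBlock(FObj fobj) { return new Block(); } // default: no extra memory
        };
        static AreaFactory getFactory() { return factory; }
        static void setFactory(AreaFactory f) { factory = f; }
        abstract Block createBlock(FObj fobj);
    }

    public static void main(String[] args) {
        // An embedding application plugs in its tracking factory
        // before layout starts.
        AreaFactory.setFactory(new AreaFactory() {
            Block createBlock(FObj fobj) { return new TrackedBlock(fobj); }
        });
        Block b = AreaFactory.getFactory().createBlock(new FObj("fo:block"));
        System.out.println(((TrackedBlock) b).source.name);
    }
}
```

With the default factory installed, areas carry no FObj reference, so standard processing keeps its memory characteristics.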
Re: Exception hierarchy.
[Glen] I'm not clear why you didn't derive ValidationException from SAXParseException. I know the locator is already present in FOPException, but absent the subclass from SAXParseException, it ends up being a different Locator object, i.e., user code that would handle a SAXParseException can't handle this FOPException. Perhaps just a nitpick, but I see a long-term advantage in having our ValidationExceptions subclass SAXParseException if possible. Sometimes when a PropertyException is thrown (f.ex. during evaluation of a relative numeric) the name of the property and the location are not available. In the patch I've added a catch in PropertyParser which adds the property name and location information to the PropertyException and rethrows the augmented exception. SAXParseException keeps the location information private, so it is not possible to change the location info after a SAXParseException is thrown. Other than that, the hierarchy could also have been rooted in SAXParseException. (Namely, if an embedded user of FOP wished to report both parsing and validation errors to the user in the same manner--a plausible scenario--it would be convenient to have ValidationException subclass SAXParseException.) ValidationException and SAXParseException both have SAXException as a parent, so a user can catch SAXException to deal with the different exceptions in a uniform manner. regards, finn
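The catch/augment/rethrow pattern Finn describes can be sketched as follows (a hypothetical, self-contained sketch; the class is not FOP's actual PropertyException):

```java
// Hypothetical sketch of augmenting and rethrowing an exception:
// the PropertyException is caught where the property name and
// location become known, augmented, and rethrown. SAXParseException
// could not be used this way because its location is immutable
// after construction.
public class AugmentSketch {
    static class PropertyException extends Exception {
        private String property = "?", location = "?";
        PropertyException(String msg) { super(msg); }
        void setContext(String prop, String loc) { property = prop; location = loc; }
        public String getMessage() {
            return location + ": " + property + ": " + super.getMessage();
        }
    }

    static void evalExpression() throws PropertyException {
        // At the throw site, the property name/location are unknown.
        throw new PropertyException("division by zero");
    }

    static void parseProperty(String name, String loc) throws PropertyException {
        try {
            evalExpression();
        } catch (PropertyException e) {
            e.setContext(name, loc);   // add info unavailable at the throw site
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            parseProperty("margin-left", "line 12");
        } catch (PropertyException e) {
            System.out.println(e.getMessage());
        }
    }
}
```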
Re: Exception hierarchy.
[Andreas] It seems more about not throwing SAXParseExceptions, but catching them and throwing a FOPException instead. Isn't that the issue? That a 'missingChildElementError' is actually not a SAXParseException... because it has nothing to do with SAX in itself [Glen] SAXParseExceptions would appear to be for errors that occur during the SAX parsing of an XML document (as a result of a user supplying an invalid XML document). It is not for errors that occur within the implementation of the SAX processor (e.g., Xalan) itself. You mean Xerces, right? Xerces throws SAXParseException for all kinds of invalid XML: <root><block></root> org.xml.sax.SAXParseException: The element type "block" must be terminated by the matching end-tag "</block>". SPE might be considered the XML parsing equivalent of a syntax error that would occur for a code compilation error. Right, errors at the XML level. As the definition of its parent SAXException[1] states, This class can contain basic error or warning information from either the XML parser *or the application* So it can be used within FOP. Indeed, and that is exactly what my proposal does: If the application needs to pass through other types of exceptions, it must wrap those exceptions in a SAXException or *an exception derived from a SAXException*. [1] http://xml.apache.org/xalan-j/apidocs/org/xml/sax/SAXException.html --it's a violation of a definition as formulated in the XSL-FO Rec. The error is treated and reported as being a SAXParseException only because the latter offers the convenience of a locator, but it should actually be something like No, because the SAXParseException is for such validation errors, regardless of what you're parsing. Thoughtful people can disagree about that. Learn once, use everywhere: Batik, FOP, Cocoon, Axis, etc., etc. (Also, as a minor additional benefit, it helps pluggability if all XSL processors adopted SAXParseException instead of each of us coming up with our own name for this.) Heh.
It appears that Batik has chosen to root their hierarchy in RuntimeException and Cocoon in Avalon. I don't know about Axis. I'm sure both projects have valid reasons for their decisions, but it is a pipedream to think all projects would ever use the same. the suggested 'ValidateException', or maybe something like 'FOParseException'. Using that logic, a SAXParseException would never be used anywhere, because SAX is usually used to validate the XML document. It is thrown by Xerces. And caught by all users of all SAX parsers. If it's no good for FOP, then it would be no good for Batik, and Cocoon, and Axis, and then we're back to rarely using Java exceptions directly in one's coding. Invalid argument. It is incorrect to draw conclusions based on this single case. And the other projects have already chosen differently. SAXParseException is there for a purpose: to be used, it's already coded and documented! Throwing a SAXParseException actually seems more confusing to the end-user (--he/she who, on average, reads little more than the first two lines of the stack trace :-) ) Maybe not, because a SAXParseException has the benefit of the user knowing it already (and if they don't know it, they will learn it *now*, and get to apply this knowledge for future parsing of other languages.) If that is your argument, then we *certainly* shouldn't throw SAXParseException, since users know that a SAXParseException means that the XML input can't even be parsed, much less transformed. When they see a SAXParseException they would begin by looking at the input XML, not at the output XML from the transformation. Also, our SAXParseExceptions each have detailed error messages telling them what the problem is. (The stack trace they usually won't see anyway, unless they choose to run in debug mode.) Good, so does ValidateException. snip/ What's wrong with an app having its own set of particular exceptions reflecting its overall design, according to 'what can go wrong and where'?
Excellent question. The reason is that exceptions are for the benefit of the user using the software, and he/she has usually no internal knowledge of the software product itself. The user doesn't know about FO trees, area trees, layout, renderers--this is all our language that is collectively a black box to the user. Furthermore, users don't care *where* the error occurred inside the black box (advanced users have the stack trace anyway), they do want to know *what* went wrong though, so they can group those *what's* together and report on them in a common manner. For example, I see two main errors that can occur with FOP: 1.) syntax error: elements not following (a) proper ordering or (b) invalid values for properties. These should raise a SAXParseException, with IMHO not *necessarily* a need to differentiate between the two because they both imply the same problem: badly written stylesheet, as well as the same solution: fix/redo your stylesheet. 2.) file not found
Re: meaning of Area.TreeExt interface
[Glen] Still, I wonder if it may be better to do away with this interface and just hardcode the layout/rendering of these objects (i.e. bookmarks) directly, just like we do all the other Area objects that are created during the layout process. This would simplify area.extensions.BookmarkData and area.AreaTreeModel processing code. Thoughts (anyone)? [me] That would make the life of extension writers harder. Thus I would prefer that TreeExt stay. [Glen] Finn, ***be very careful***, the definition of tree extension was #2, *not* #1 (see previous email). I don't like its name, it should be OutOfAreaObject or something like that. And why shouldn't extensions be allowed to create OutOfAreaObjects? Some extension writers will need that just like the author of fox:bookmarks needed it. regards, finn
Re: meaning of Area.TreeExt interface
[Glen] Actually, to clarify what I wrote: why do we need an extension interface (TreeExt) for objects placed outside the document (pdf bookmarks/metadata), while *not* needing one for extension objects that end up placed *on* the document? You subclass Area for elements on the document and TreeExt for elements outside the document. For the latter types, area creation is hardcoded within our layout managers (to be replaced by extension writers when they have new FO's), so I don't logically see why we can't do the same with the former types. How can an extension pass information onto the renderer if TreeExt (or whatever we want to call it) is removed? regards, finn
Re: meaning of Area.TreeExt interface
[Glen] OK, I'd like to 1) rename TreeExt to OffDocumentItem (to clarify what it's really for) 2) change it from an interface to an abstract base class. 3.) Have this base class hold the when parameter for when this ODI is to be generated, so the when does not need to be hardcoded into AreaTreeHandler as it is currently for PDF Bookmarks. (Helps with extensibility.) 4.) Rename Renderer.renderTreeExtension() to Renderer.processOffDocumentItem(). Any problems with this? No. regards, finn
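The four-point rename plan could be sketched roughly as follows (a hypothetical sketch of the proposed abstract base class; the constant names and `getWhenToProcess()` accessor are illustrative assumptions, not the actual patch):

```java
// Hypothetical sketch of the proposed OffDocumentItem abstract base
// class (formerly the TreeExt interface), holding the 'when'
// parameter so AreaTreeHandler need not hardcode the processing
// time for each item type.
public class OffDocSketch {
    static final int IMMEDIATELY = 0;
    static final int END_OF_DOC = 1;

    static abstract class OffDocumentItem {
        private final int whenToProcess;
        OffDocumentItem(int when) { whenToProcess = when; }
        int getWhenToProcess() { return whenToProcess; }
        abstract String getName();
    }

    // PDF bookmarks are only complete once the whole document is laid out.
    static class BookmarkData extends OffDocumentItem {
        BookmarkData() { super(END_OF_DOC); }
        String getName() { return "bookmarks"; }
    }

    public static void main(String[] args) {
        OffDocumentItem odi = new BookmarkData();
        System.out.println(odi.getName() + " " + odi.getWhenToProcess());
    }
}
```

A renderer's processOffDocumentItem() could then dispatch on the item itself rather than on hardcoded knowledge in the area tree handler, which is the extensibility gain point 3 is after.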
Exception hierarchy.
Hi Team, I would like to make some changes to the exceptions used in tree building. First, I dislike throwing SAXParseException from within FOP. In my opinion SAXParseExceptions are for reporting problems during the parsing of XML, not for reporting subsequent problems with the FO tree. Second, the exceptions used in FOP should form a structure where concrete subclass exceptions are thrown and a common superclass can be used for catching all the different exceptions. My proposal for the exception hierarchy looks like this:
- SAXException
  - FOPException
    - ValidateException
    - PropertyException
Putting all our exceptions into one hierarchy that inherits from SAXException helps reduce the number of wrapped exceptions. I've uploaded a patch which changes the FO tree building and property code. The benefit of the patch is the reduced number of blind catches: } catch (XXXException e) { // log.error(e); } Any objections? regards, finn
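The proposed hierarchy can be sketched in a few lines (a minimal sketch; constructors and members are illustrative, not the actual patch):

```java
import org.xml.sax.SAXException;

// Hypothetical sketch of the proposed hierarchy rooted in
// SAXException: concrete subclasses are thrown, and one catch of a
// common superclass handles them all, with no wrapping needed when
// they cross SAX event-handler boundaries.
public class HierarchySketch {
    static class FOPException extends SAXException {
        FOPException(String msg) { super(msg); }
    }
    static class ValidateException extends FOPException {
        ValidateException(String msg) { super(msg); }
    }
    static class PropertyException extends FOPException {
        PropertyException(String msg) { super(msg); }
    }

    public static void main(String[] args) {
        try {
            throw new ValidateException("missing child element");
        } catch (SAXException e) {     // common superclass catch
            System.out.println(e.getMessage());
        }
    }
}
```

Because SAX's ContentHandler methods declare `throws SAXException`, these exceptions propagate out of startElement()/endElement() without the blind catch-and-log blocks the patch removes.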
Re: Exception hierarchy.
[Glen] I'm confused--why is it OK to throw SAXExceptions but not its child SAXParseExceptions? With the latter, it just holds locator information necessary to pinpoint the problem for the user. Please elaborate. The way I see it, SAXException is the exception used to communicate across the ContentHandler event methods. And SAXParseException is used to communicate an actual error during parsing of XML. That is what the exception type specifies. And BTW, the patch does not throw any SAXExceptions. Java gives us a large family of predefined exceptions for use in our coding--I'm leery of proclamations of them being No Good and that each and every Java application must re-invent their own exceptions instead. That just seems sloppy. But the Java exceptions are Very Good. There just isn't one of them that specifies an error found during XSL:FO validation or during property evaluation of XSL:FO expressions. Glen (who wanted to get rid of FOPException and just rely on the Java-defined exceptions) IMHO reusing SAXParseException is exactly as wrong as reusing f.ex. FileNotFoundException, which is another fine exception, just not suited for use from within FO tree building. regards, finn
Re: meaning of Area.TreeExt interface
[Glen] Still, I wonder if it may be better to do away with this interface and just hardcode the layout/rendering of these objects (i.e. bookmarks) directly, just like we do all the other Area objects that are created during the layout process. This would simplify area.extensions.BookmarkData and area.AreaTreeModel processing code. Thoughts (anyone)? That would make the life of extension writers harder. Thus I would prefer that TreeExt stay. regards, finn
Re: FObj.bind() SAXParseException
[Glen] The FObj.bind() method throws a SAXParseException if this method hasn't been overridden in the FObj subclass. I'm unsure why a subclass override is required (as the javadoc comment for this method indicates)--wouldn't it be OK if we just have an empty method here without the exception (e.g. if you have no properties of concern for a particular fo)? That would seem to be preferable. I'm a bit unsure. Generally I wanted two ways of handling attributes/properties. Either - override processNode and optionally createPropertyList if the element wants to deal with the attributes directly, - or override bind if the element wants to take part in the whole property system. Outline.java is an example of an element that wants to handle the attributes directly. And I think that the other two elements, Label and Bookmarks, should do the same. So the preferred solution IMO would be to add a null createPropertyList and an empty processNode method to ExtensionObj. I mention this because I'm currently trying to run our pdfoutline.fo example, and this exception is being called because no bind() method has been implemented yet in the fox:bookmarks element. and +fobj = node; Apologies for the regressions. regards, finn
Handling XML parse errors when using identity transform.
Hi Team, I'm no expert on Java's XML transformer APIs so I need a little help. It appears to me that when using the identity Transformer from Xalan we no longer get notifications of XML parse errors. Using an obviously corrupt input fo:
<?xml version="1.0" encoding="UTF-8"?>
<xfo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
</xfo:root>
we get a message on stderr from Xerces' DefaultErrorHandler: [Fatal Error] t.fo:2:56: The prefix "xfo" for element "xfo:root" is not bound. and there is no way AFAICT to set a different error handler on the XMLReader that Xalan's transformer creates and uses to parse the input file. Am I missing something? regards, finn
Re: Handling XML parse errors when using identity transform.
[Glen] I'm not sure of the problem here, your input fo is not an XSL FO, because the xfo prefix wasn't bound to a namespace. Hence the Xerces error message--what is it that you're expecting instead? I would expect that the ErrorHandler methods in FOTreeBuilder (warning(), error() and fatalError()) were called. And that the message was written to our common-logging. regards, finn
Re: Handling XML parse errors when using identity transform.
[Andreas] Hi Finn, Not sure if this is what you're looking for, but the javax.xml.transform.Transformer class does offer a setErrorListener() method to use a custom Error Handler... IIUC, all we have yet to do is provide an implementation for a javax.xml.transform.ErrorListener and use it to route the error messages to our common-logging (instead of the org.apache.xml.utils.DefaultErrorHandler routing them to stderr or a supplied PrintStream or PrintWriter). I couldn't get that working. As I see it, the SAXException from the XML parsing is not passed through to the ErrorListener. The source code from Xalan confirms that. No ErrorHandler is passed to the XMLReader that is created in TransformerIdentityImpl.java. regards, finn
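One possible workaround (a sketch under the assumption that the transformer reuses an XMLReader supplied via SAXSource rather than creating its own, which is how TrAX SAXSource handling is commonly implemented): build the XMLReader yourself, set your ErrorHandler on it, and hand the transformer a SAXSource wrapping that reader.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Sketch: pre-configure the XMLReader's ErrorHandler and pass it in
// a SAXSource, so the transformer cannot substitute its own reader
// (and therefore its own error handling).
public class ErrorHandlerSketch {
    public static void main(String[] args) throws Exception {
        final boolean[] called = { false };
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXParseException {
                called[0] = true;   // here we would route to common-logging
                throw e;            // a fatal error still aborts the transform
            }
        });
        // Unbound prefix, as in the corrupt input above.
        String fo = "<xfo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'/>";
        SAXSource src = new SAXSource(reader, new InputSource(new StringReader(fo)));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        try {
            t.transform(src, new StreamResult(new StringWriter()));
        } catch (Exception expected) {
            // the parse error surfaces wrapped in a TransformerException
        }
        System.out.println(called[0] ? "handler called" : "handler not called");
    }
}
```

Whether this behaves the same in the 2004-era Xalan TransformerIdentityImpl that Finn inspected is an open question; it only demonstrates the SAXSource route around the problem.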
Defoe
Hi Peter, Did I miss the announcement? http://defoe.sourceforge.net/ regards, finn
Re: [Fwd: Re: Performance improvement in property consumption.]
[Peter] Alt-design just uses a sparse array, constructed at END_ELEMENT. Space savings are progressively realized as the depth of the FO Tree reduces. Maximum consumption occurs at the points of greatest depth of the tree, minima at the end of each page-sequence. [me] IIRC your sparse array does not just contain the relevant properties for an element (those listed in the spec under each fo:element), but the joined set of all properties relevant for all the allowed children. If this is correct, the sparse arrays store more properties and use more memory than my proposal does. [Peter] ... no, the sparse arrays contain only properties relevant to the node itself. If not, it's a bug. I misremembered; my apologies for giving the impression that FAD uses more memory. Thank you for setting me straight. regards, finn
Re: [Fwd: Re: Performance improvement in property consumption.]
Why is it more efficient (I know it is, given your metrics, but want to know why)--aren't you just moving the values already stored in the PropertyList into separate fields in the FO objects? Yes, you're releasing the PropertyList's memory, but the elements that the PropertyList previously stored are now stored in the FObj. So if PropertyList can be thought of as a C-like struct holding the values of its FObj's properties, what you're doing appears to be just taking that struct's member variables and moving them to the FObj. But, obviously, given the performance/memory boost you're noting, PropertyList *can't* be regarded as a C-like struct. Why? Could PropertyList be made more efficient instead of this change--make it more like a C-like struct? [Peter] It's a mixed bag, by the look of it. From the patch, applying to FOText:
+// The value of properties relevant for character.
+private CommonFont commonFont;
+private CommonHyphenation commonHyphenation;
+private ColorType color;
+private Property letterSpacing;
+private SpaceProperty lineHeight;
+private int whiteSpaceCollapse;
+private int textTransform;
+private Property wordSpacing;
+private int wrapOption;
+
+// End of property values
Note the combination of simple fields for whiteSpaceCollapse and more complex structures like CommonFont. I really like the combination of CommonXXX structures and simple fields, since it literally matches the spec. Except that the spec doesn't have a FOText element, so it is a bad example for me to use. The fields in FOText were taken from TextInfo and I suspect that eventually more fields will be needed in FOText. In the rest of the elements, the set of fields matches the spec. The only exception is a bug where some of the inline LayoutManagers use vertical-align, which is a shorthand. The layout managers should instead use the properties that the shorthand sets: alignment-baseline, alignment-adjust, baseline-shift and dominant-baseline.
But since only one of these properties has been implemented yet, I chose to keep the use of vertical-align for now. Alt-design just uses a sparse array, constructed at END_ELEMENT. Space savings are progressively realized as the depth of the FO Tree reduces. Maximum consumption occurs at the points of greatest depth of the tree, minima at the end of each page-sequence. IIRC your sparse array does not just contain the relevant properties for an element (those listed in the spec under each fo:element), but the joined set of all properties relevant for all the allowed children. If this is correct, the sparse arrays store more properties and use more memory than my proposal does. Finn has gone a step further, and collapsed the property structures into local variables, which is good for both memory consumption and speed, at the cost of some more code. IIUC. Correct. In addition I also get a certain level of typesafeness for the properties. regards, finn
Re: DO NOT REPLY [Bug 31206] - [PATCH] Improvements over the new line breaking algorithm
[Luca] IMPORTANT: I had to revert Finn Bock's changes to the PDFRenderer (dated 2004/09/22 13:22:16), otherwise leaders with svg use-content produce errors in the pdf output. There isn't any run-time error, but when I try to open the pdf file, I get these warnings: - There was an error processing a page. Wrong operand type. - Illegal operation 'q' inside a text object. - Wrong operand type. and the page with the svg leaders is left empty. I think it could be something involving the saveGraphicsState() method. I've just now fixed this issue so that your patch works with the PDFRenderer. regards, finn
Re: [Fwd: Re: Performance improvement in property consumption.]
[Clay] Which of the alignment-* properties is the one you're referring to that has been implemented? I was just looking at them from the point of view of the property subsystem, where only baseline-shift has been implemented. I didn't mean to imply that it actually works all the way through layout. regards, finn
Re: Performance improvement in property consumption.
[Glen] So if we did this at the FO level, in effect, we'd have to (1) store an instance variable of every valid property for each FO, given that we wouldn't know whether the FOEventHandler needs it beforehand, and [me] Yes. Which is massively more efficient than storing the exact same properties in a PropertyList. [Glen] Why is it more efficient (I know it is, given your metrics, but want to know why)--aren't you just moving the values already stored in the PropertyList into separate fields in the FO objects? Yes, you're releasing the PropertyList's memory, but the elements that the PropertyList previously stored are now stored in the FObj. Keep in mind that there are two different sets of properties: - The set of specified properties. - The relevant properties (as listed in the spec under each element). The existing PropertyList stores the specified properties in the super HashMap and has an additional cache which stores all retrieved properties. In my proposal the specified and the cached properties are still stored in the property list but only the relevant properties are retained in the fo object. So if PropertyList can be thought of as a C-like struct holding the values of its FObj's properties, what you're doing appears to be just taking that struct's member variables and moving them to the FObj. No, see above. But, obviously, given the performance/memory boost you're noting, PropertyList *can't* be regarded as a C-like struct. Why? Could PropertyList be made more efficient instead of this change--make it more like a C-like struct? Speed can be improved, but at the cost of additional memory. The beauty of my proposal is that we can pick the fastest implementation of property assignment and property lookup without worrying about the memory, because the property list is released. regards, finn
Re: Performance improvement in property consumption.
Keep in mind that there is 2 different sets of properties: - The set of specified properties. - The relevant properties (as listed in the spec under each element). The existing PropertyList stores the specified properties in the super HashMap and has an additional cache which stores all retrieved properties. [Glen] Ummm...just to be very careful so I can understand what you're saying--instead of retrieved properties above, did you mean relevant properties? No, the current PropertyList caches nearly all accessed properties. This was added (IIUC) to improve speed of repeated retrieval of the same property, as typically seen for inherited properties. You can find the caching in PropertyList.findProperty(). regards, finn
Re: Performance improvement in property consumption.
In my proposal the specified and the cached properties are still stored in the property list but only the relevant properties are retained in the fo object. [Andreas] Yes, and IIUC, at the same time, you're eliminating the need for downstream property queries having to be performed through the PropertyList, so the FObj's can communicate directly (--fewer underlying HashMap fetches...) So roughly: a. FObj1 asks FObj2 b. FObj2 probes its property instance variable c. FObj2 responds to FObj1 instead of a. FObj1 asks its PropertyList b. PropertyList asks FObj2's PropertyList c. FObj2's PropertyList queries its HashMap d. FObj2's PropertyList responds to PropertyList e. PropertyList responds to FObj1 No, at the startElement() event the property list exists for all the parent elements and they are used to answer all property queries, including the property functions and inheritance. So the process you outline above is unchanged. PS. I'm ignoring the handling of markers in my descriptions. regards, finn
Performance improvement in property consumption.
Hi Team, I've put together another largish patch that changes the way properties are used by the FO objects, the layout managers and the FOEventHandler subclasses. http://issues.apache.org/bugzilla/show_bug.cgi?id=31699 The performance increase is 42% while at the same time using 27% less memory. The numbers come from rendering the initial chapters of DocBook: The Definitive Guide [1] to a 131 page pdf file using jdk1.4.2_03. Performance is measured in wall-clock time and memory by finding the minimum -Xmx value where the document can be rendered (in steps of 100). The improvement in performance and in memory is due to PropertyList now being a temporary object that is only available to a FObj during the call to FObj.bind(). All fo elements are required to extract all the properties that they need and to store the values as instance fields in the fo element. When the PropertyList is a temporary object that isn't stored in the FObj, the size of the PropertyList does not matter and a much faster version can be used (implemented by StaticPropertyList). The patch also adds correct property resolution for fo:marker and fo:retrieve-marker. Children of fo:marker have a memory efficient version of their PropertyList stored within the Marker object and they are re-bound by fo:retrieve-marker each time a marker is retrieved. Any objections? [1] http://www.apache.org/~bckfnn/DocBookTDG.zip regards, finn
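The core idea of the patch, as described above, can be sketched like this. The names (FObj.bind(), the property names, the helper classes) are modeled on the discussion but simplified; this is an illustrative sketch of the lifecycle, not FOP's real code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the property list only exists for the duration of bind(); the FO
// object copies the values it needs (the "relevant" set) into plain
// instance fields, after which the list becomes garbage collectable.
public class BindSketch {

    static class TempPropertyList {
        private final Map<String, String> values = new HashMap<>();
        void put(String name, String value) { values.put(name, value); }
        String get(String name) { return values.get(name); }
    }

    // Stand-in for an FO element such as fo:block.
    static class BlockFO {
        // Relevant properties retained as fields; the list itself is not kept.
        String spaceBefore;
        String spaceAfter;

        void bind(TempPropertyList pList) {
            spaceBefore = pList.get("space-before");
            spaceAfter = pList.get("space-after");
            // pList is not stored; once bind() returns it is unreachable.
        }
    }

    public static BlockFO buildBlock() {
        TempPropertyList pList = new TempPropertyList();
        pList.put("space-before", "12pt");
        pList.put("space-after", "6pt");
        BlockFO block = new BlockFO();
        block.bind(pList);
        return block; // only the extracted fields survive
    }

    public static void main(String[] args) {
        BlockFO b = buildBlock();
        System.out.println(b.spaceBefore + " / " + b.spaceAfter);
    }
}
```

Because the list is short-lived, its internal implementation can favour lookup speed over compactness, which is the point made at the end of the earlier thread.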
Re: Performance improvement in property consumption.
[Glen] All fo elements are required to extract all the properties that they need and to store the values as instance fields in the fo element. I love the performance boost you're recording but have a philosophical issue of an fo object needing a property--it is really an issue of the *FOEventHandler subclass* that needs the properties--and some handlers need/implement more properties than others. Does any event handler need more or different properties than what is defined in the spec under each fo element? Because that is the set of properties that I store in the fo objects. If further properties are needed by any property consumer, they can just be added to the fo element. An example could be text-decoration on fo:block. So if we did this at the FO level, in effect, we'd have to (1) store an instance variable of every valid property for each FO, given that we wouldn't know whether the FOEventHandler needs it beforehand, and Yes. Which is massively more efficient than storing the exact same properties in a PropertyList. (2) instead of the programmatic convenience of getPropString(PR_OFFICIAL_PROPERTY_NAME), we'd have to remember what each of the properties are named in the FO's (We can probably standardize these though.) I've consistently picked getOfficialPropertyName() as the name of the getter method. Another clarification needed: when would the property list's memory get released--at the PageSequence level, or as soon as the FO is finished? At the endElement() event. Because the values of some FO's properties further downstream depend on an earlier FO's properties, we may not be able to release the memory when the earlier FO is finished. If a property value is needed by a following sibling, then it can be found as an instance field on the fo object that defined the property. Or, (to support inheritance) we may need to store instance variables for what may be a very unlikely event of a child needing that value. I don't understand what you mean. 
All property inheritance is resolved at the time of the child startElement() event. The resolution is done using the parent PropertyList while it still exists. Also, are there any other options for releasing this memory that you know of? Is there some way this .bind() or other releasing of memory can be done within the FOEventHandler subclasses, or at least within LayoutManager (forgetting about RTF) instead? I don't see any benefit at all from such approaches. PS. The description of releasing the property list only applies to normal elements. All elements under fo:marker are treated differently. regards, finn
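The lifecycle described here, one property list per open element, inheritance resolved at startElement() against the parent's still-live list, release at endElement(), can be sketched with a simple stack. All names and the map-based representation are illustrative assumptions, not FOP's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch: a property list exists for each open element only between its
// startElement() and endElement() events. Inherited values are resolved
// eagerly while the parent list still exists; popping at endElement()
// makes the list collectable.
public class PropertyLifecycle {
    private final Deque<Map<String, String>> openLists = new ArrayDeque<>();

    public void startElement(Map<String, String> specified) {
        Map<String, String> list = new HashMap<>();
        Map<String, String> parent = openLists.peek();
        if (parent != null) {
            list.putAll(parent); // inheritance resolved now, while the
        }                        // parent list is still alive
        list.putAll(specified);  // specified values override inherited ones
        openLists.push(list);
    }

    public Map<String, String> current() {
        return openLists.peek();
    }

    public void endElement() {
        openLists.pop(); // the element's property list is released here
    }

    public int openCount() {
        return openLists.size();
    }
}
```

At any point during parsing only the lists of the currently open ancestors are alive, which is what keeps the memory footprint bounded.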
Re: [GUMP@brutus]: Project xml-fop (in module xml-fop) failed
[Jeremias] I've switched the junit tasks to write to System.out instead of to a file. Hopefully, we will get more usable output this way. I don't know if we can get Gump's full output somewhere so we could check the test reports. BTW, I have no errors locally.

I get an error:

junit:
    [echo] Running basic functionality tests for fop-transcoder.jar
    [junit] Testsuite: org.apache.fop.BasicTranscoderTestSuite
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3,968 sec
    [junit] - Standard Output ---
    [junit] drawImage x=7 y=277 width=97 height=14 [EMAIL PROTECTED]: type = 2 DirectColorModel: rmask=ff gmask=ff00 bmask=ff amask=ff00 IntegerInterleavedRaster: width = 97 height = 14 #Bands = 4 xOff = 0 yOff = 0 dataOffset[0] 0
    [junit] - ---
    [echo] Running basic functionality tests for fop-transcoder-allinone.jar
    [junit] Testsuite: org.apache.fop.BasicTranscoderTestSuite
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0,281 sec
    [junit] - Standard Error -
    [junit] java.io.IOException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
    [junit] at org.apache.batik.dom.util.SAXDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.dom.util.SAXDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.dom.svg.SAXSVGDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.dom.svg.SAXSVGDocumentFactory.createDocument(Unknown Source)
    [junit] at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.fop.AbstractBasicTranscoderTestCase.testGenericPDFTranscoder(AbstractBasicTranscoderTestCase.java:70)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit] at java.lang.reflect.Method.invoke(Method.java:324)
    [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit] at junit.framework.TestResult.run(TestResult.java:109)
    [junit] at junit.framework.TestCase.run(TestCase.java:118)
    [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:325)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeInVM(JUnitTask.java:804)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:551)
    [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:527)
    [junit] at org.apache.tools.ant.Task.perform(Task.java:319)
    [junit] at org.apache.tools.ant.Target.execute(Target.java:309)
    [junit] at org.apache.tools.ant.Target.performTasks(Target.java:336)
    [junit] at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
    [junit] at org.apache.tools.ant.Project.executeTargets(Project.java:1250)
    [junit] at org.apache.tools.ant.Main.runBuild(Main.java:610)
    [junit] at org.apache.tools.ant.Main.start(Main.java:196)
    [junit] at org.apache.tools.ant.Main.main(Main.java:235)
    [junit] - ---
    [junit] Testcase: testGenericPDFTranscoder(org.apache.fop.BasicPDFTranscoderTestCase): Caused an ERROR
    [junit] null
    [junit] Enclosed Exception:
    [junit] SAX2 driver class org.apache.xerces.parsers.SAXParser not found
    [junit] org.apache.batik.transcoder.TranscoderException: null
    [junit] Enclosed Exception:
    [junit] SAX2 driver class org.apache.xerces.parsers.SAXParser not found
    [junit] at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(Unknown Source)
    [junit] at org.apache.fop.AbstractBasicTranscoderTestCase.testGenericPDFTranscoder(AbstractBasicTranscoderTestCase.java:70)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/render/rtf TableAttributesConverter.java
[Jeremias Maerki] Finn, it seems to me that you probably forgot to check in all your changes. FOP doesn't compile ATM. The method makeBorder is missing. Yes, I forgot. I'm sorry for the inconvenience. Glen, thank you for temporarily fixing my mistake. regards, finn
Re: Storing the Locator instance in FONode.
[Glen] I see. I liked Locator because I can get away with passing one parameter into vCN() instead of three. (But I don't think we'll need to change that portion of the code because whenever vCN() is being called, the Locator *is* accurate--would you agree?) Yes. regards, finn
Storing the Locator instance in FONode.
Hi Glen, I think that your recent change that stores the Locator instance in each FONode is an incorrect use of the Locator interface. The docs say: http://java.sun.com/j2ee/sdk_1.3/techdocs/api/org/xml/sax/Locator.html "Note that the results returned by the object will be valid only during the scope of each content handler method: the application will receive unpredictable results if it attempts to use the locator at any other time." And in fact, it is the same Locator instance that is stored in each FONode. I think it would be better to revert to using separate line/column/systemId fields if we want to know the location of each FONode. regards, finn
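The suggested fix, copying the locator's values into plain fields while the content-handler callback is still running, could look like this. FONode here is a minimal stand-in for FOP's class; only the SAX types (org.xml.sax.Locator, LocatorImpl) are real.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

// Sketch: instead of holding the Locator (whose values are only valid
// during each content-handler method), copy line/column/systemId into
// separate fields at startElement() time, as suggested above.
public class LocatorCopy {
    static class FONode {
        final int line;
        final int column;
        final String systemId;

        // Meant to be called during startElement(), while the Locator
        // still reports the position of this element.
        FONode(Locator locator) {
            if (locator != null) {
                this.line = locator.getLineNumber();
                this.column = locator.getColumnNumber();
                this.systemId = locator.getSystemId();
            } else {
                this.line = -1;
                this.column = -1;
                this.systemId = null;
            }
        }
    }

    public static void main(String[] args) {
        LocatorImpl loc = new LocatorImpl();
        loc.setLineNumber(42);
        loc.setColumnNumber(7);
        loc.setSystemId("doc.fo");
        FONode node = new FONode(loc);
        // The locator may now change (as the parser moves on) without
        // invalidating the node's recorded position.
        loc.setLineNumber(99);
        System.out.println(node.line + ":" + node.column);
    }
}
```

The main() demonstrates the point of the thread: once the values are copied, later mutation of the shared Locator no longer matters.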
Loading of extension element mappings.
Hi Team, I think the API for adding additional element mappings to the FOUserAgent is bound to cause problems when fop is deployed in a multi-classloader environment. Using Tomcat as an example, an extension mapping deployed in a webapp cannot be found if a well-meaning administrator has placed fop.jar in Tomcat's common/lib directory. This is because FOTreeBuilder.addElementMapping(String) uses Class.forName() to load the element mapping class. I see several possible solutions:

1) Let FOTreeBuilder.addElementMapping() also check Thread.getContextClassLoader() for the extension element mapping. This will work in environments that actually set the context classloader. Tomcat does, but other environments don't.
2) Pass in an explicit classloader to use when loading extension mappings.
3) Let the client pass the actual element mapping object, instead of just passing in the name of the element mapping class.

I'm strongly in favor of 3) because it Just Works (tm), and when a ClassNotFoundException does occur, the programmer immediately knows why and how to fix it. So I propose to change the API signature from FOUserAgent.addElementMapping(String) to FOUserAgent.addElementMapping(ElementMapping). regards, finn
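Option 3) can be sketched as follows. ElementMapping and FOUserAgent are stand-ins for the real FOP types; the point is only that the caller, whose classloader can see the extension, constructs the object, so FOP never needs Class.forName() at all.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of passing the mapping object instead of its class name.
public class MappingByInstance {
    interface ElementMapping {
        String getNamespace();
    }

    static class FOUserAgent {
        private final List<ElementMapping> mappings = new ArrayList<>();

        // Proposed signature: takes the instance, not a class name,
        // so no classloader lookup happens inside FOP.
        public void addElementMapping(ElementMapping mapping) {
            mappings.add(mapping);
        }

        public boolean supports(String namespace) {
            for (ElementMapping m : mappings) {
                if (m.getNamespace().equals(namespace)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        FOUserAgent agent = new FOUserAgent();
        // The webapp's own classloader loaded this code, so a
        // ClassNotFoundException on FOP's side is impossible.
        agent.addElementMapping(() -> "http://example.org/myext");
        System.out.println(agent.supports("http://example.org/myext"));
    }
}
```

If loading fails it now fails in the client's code, where the programmer can see which classloader was involved, which is the "Just Works" property argued for above.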
Re: [VOTE] Luca Furini for Committer
[Simon Pepping] I propose that we make Luca Furini a member of the FOP team. Here is my +1 vote. +1 regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/pagination Root.java
A (farfetched) argument against ThreadLocals would be that they prevent one FOP processing run from occurring recursively during another independent processing run. Like an extension element that itself will attempt to process another fo document. [Glen] I think that restriction is implicit in the XSL Recommendation, because fo:instream-foreign-object [1] has the requirement that its child be from a *non-XSL* namespace. If the rec intended FO documents to be processed recursively, they wouldn't have had a non-XSL namespace requirement (i.e., why bother to require users to have a finn:fodocument that would just wrap fo:root, if you can just use the latter tag directly?) Indeed, this restriction could be an argument *for* using ThreadLocals. No, not really. ThreadLocals should be used for storing data that is local to each thread (thread singletons). They are not a way to introduce global variables which just so happen to work in Tomcat. regards, finn
Re: [proposal] How to do logging from the command line
[Jeremias] Hi Finn, I took a look at it and I must say that I'm a bit confused, too. Anyway, I have a proposal to make. I would expect a command-line application to work like any other C-program such as grep, svn, ls or whatever. That means you don't get any [INFO] before each line, but you can define the log level (normally quiet, normal and verbose) through command line switches. That'll work for most CLI-users. Since the Fop class is the one to control the whole application it (or rather the CommandLineOptions class) can also set up C-L to behave exactly as explained above. That would probably mean not to use SimpleLog but to provide a special implementation for command-line use. At any rate, I don't agree with the way that SimpleLog is implemented. Informational messages should go to System.out, errors to System.err. Logging prefixes should be disabled. I've had to do the same for Avalon Logging in Barcode4J [1] and I'm very pleased with the result. Using this I was able to implement the Barcode4J CLI in a way that the generated barcode could be output to System.out which may also be desirable for certain people. You could even modify the whole thing in a way that you could implement FOP as a filter application getting the input through System.in and sending the output to System.out, giving error messages through System.err. For those who know about C-L and want to do some special logging we could have a command line switch that disables our special logger setup. They can fully control C-L from outside. For the cases where FOP is used embedded in a bigger software, FOP shouldn't manipulate anything in C-L; it's simply the developer's job to set up C-L from outside. WDYT? I like it. It will get consistent results for command line users without them (practically) ever having to do any logging configuration. I have added a patch with an implementation of your proposal. Feel free to point out any mistakes that I have made. 
Glen, I'm a bit confused about your objections. I don't think they apply to Jeremias' proposal at all.

- SimpleLog plays *no* part in this.
- There is no wrapping of C-L in any way; we just supply our own very simple specialized logging implementation.
- Users can still use other C-L loggers (jdk14, log4j, etc.) by setting LOGCHOICE in fop.bat.

regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo FOTreeBuilder.java
[Glen] I think fox: should also be validated. It's, by definition, part of FOP core, and can't be added/adjusted to by anyone but us. Yes. I agree. regards, finn
Logging of exception.
Hi, I didn't follow the discussion in the spring about the command line -d option and commons-logging, so I'm likely missing some important pieces, but I'm a bit confused about the result. If I attempt to render a fatally corrupt input fo file like <fo:block/>, I get the expected SAXParseException message on the console and the "Turn on debugging for more information" line. When I turn on debugging, then the full exception is printed directly on the console (without using C-L!) and no information about the exception is sent to C-L. So the C-L debug level directly controls output to System.err. That makes very little sense to me. At the very least the exception should be logged to C-L, right? I would also guess that any additional System.err output should be controlled separately from C-L, with a -d option. regards, finn
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/pagination Root.java
[Jeremias Maerki] Question to everyone: We currently don't have a multi-threaded design like Peter West's approach. Can anyone think of a reason that all the FO-building and layouting process for one processing run may run within more than one thread? I don't think there is one if the SAX event delivery is always within one thread (big if). If there isn't I believe we could make use of a ThreadLocal to put some shared data that will be easily accessible from all involved components. Initialize the ThreadLocal in startDocument() and clear it in endDocument(). I realize there's a certain risk involved but it could really shorten the access ways for shared data especially for the FO tree, if that's really a problem (I'd also like to know if it really is. Anyone?). A (farfetched) argument against ThreadLocals would be that they prevent one FOP processing run from occurring recursively during another independent processing run. Like an extension element that itself will attempt to process another fo document. My preference would be to explicitly pass the shared data (in this case the FOEventHandler) to the classes that need it. If the recursion Glen discovered is deemed too slow, then store the FOEventHandler in an instance field for each FONode. regards, finn
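The ThreadLocal pattern Jeremias describes, install in startDocument(), clear in endDocument(), would look roughly like this. RunState is a hypothetical class for illustration; the sketch also shows why a recursive run on the same thread would clobber the outer run's state, which is the objection raised above.

```java
// Sketch: per-run shared state kept in a ThreadLocal for the duration of
// one processing run. A second, recursive run on the same thread would
// overwrite CURRENT and then remove it, breaking the outer run.
public class RunState {
    private static final ThreadLocal<RunState> CURRENT = new ThreadLocal<>();

    private final String documentName;

    private RunState(String documentName) {
        this.documentName = documentName;
    }

    // Called from ContentHandler.startDocument().
    public static void startDocument(String name) {
        CURRENT.set(new RunState(name));
    }

    // Called from ContentHandler.endDocument().
    public static void endDocument() {
        CURRENT.remove();
    }

    // Any component on the same thread can reach the run's shared state
    // without it being passed explicitly.
    public static String currentDocument() {
        RunState s = CURRENT.get();
        return s == null ? null : s.documentName;
    }

    public static void main(String[] args) {
        RunState.startDocument("a.fo");
        System.out.println(RunState.currentDocument());
        RunState.endDocument();
        System.out.println(RunState.currentDocument());
    }
}
```

Explicitly passing the FOEventHandler, as Finn prefers, avoids both the hidden-global flavour of this and the recursion problem.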
Re: cvs commit: xml-fop/src/java/org/apache/fop/fo/pagination Root.java
My preference would be to explicitly pass the shared data (in this case the FOEventHandler) to the classes that need it. If the recursion Glen discovered is deemed too slow, then store the FOEventHandler in an instance field for each FONode. [Jeremias] My preference, too. But having too much in FONode will generate a lot of memory consumption. Then maybe we can select some specific node types that will carry an actual reference to the FOEventHandler. Like fo:root (obviously), fo:flow and fo:block, and all other node types will reference their parent. We can then pick the right tradeoff between memory and CPU. It's just a thought. regards, finn
Re: validateChildNode prevents extensions.
[Glen] Oh, when I meant alter the system I also meant adding classes, using different classes, etc., i.e., some things need to be done beforehand to accommodate a new element. Come on! That leaves 'alter the system' without any meaning. Recompiling modified sources into a new fop.jar would IMO mean altering the system. Here is how extensions should work: http://xml.apache.org/fop/dev/extensions.html and that is effectively what I have done. BTW, without divulging too much that may hurt your interests, would you mind explaining your reluctance to just modify FOP source (replace classes, etc.) for what you are trying to market? Is it licensing issues--or is it more for programmatic style/user convenience? I want to better understand your reluctance on this matter. Legally, the separation between FOP and the extension has placed a solid wall between my opensource fop-dev work and my commercial extension. My client can't claim ownership of any of the fop-dev work since the extension didn't require *any* changes to fop. Commercially, trying to sell a fork of FOP in order to sell an extension to FOP will never fly. regards, finn
Re: validateChildNode prevents extensions.
Extension writers have to create a subclass of AbstractLayoutManager (I just use a LeafNodeLayoutManager) and a subclass of Area. But those are normal operations since there are subclasses of those for each element type. I must also add code to the renderer to handle my area, but that is because the renderer code is not flexible enough to handle unknown area classes. I get around that limitation by plugging in a completely new PDF renderer. [Glen] OK, in other words, you have to change FOP source code to add a new XSL-based extension element. You have no problem philosophically with this. I was unclear. When I write an extension, I make *no* change to any existing FOP source files. And I do not replace any existing FOP class files with new versions. For an extension element I have to write new versions of the FO classes, a layout manager, an area and also a new PDF renderer, but these are all new files with new class names. Also, I can't replace the Block class with my subclass in the FOElementMapping without changing FOP sources. OK, you're saying again that you have to change the FOP source code in order to add a new XSL-based extension element, but now you're having a problem with adding *this* code. I don't want to change any FOP code! And if I have to change any FOP sources to add an extension, then FOP doesn't support extensions in the sense that I understand the word. I'm still not sure why this is causing you so much more consternation--isn't this trivial (even if annoying) compared to what you will need to do in the other classes? An extension mechanism where I can put an unmodified fop.jar and myextension.jar on the CLASSPATH and have it work is a defining issue to me. It is the mechanism that is currently paying my rent. Between changing code in 3 places with no validation for the user community (and tons of NPE/CCE complaints due to badly written FO's for the committers to deal with on the various lists), vs. 
changing the code in 4 or 5 places *with* validation for the user community (and few if any NPE/CCE complaints), you do have a bit of a tough case to make that the former would win a cost-benefit analysis. Having to subclass the parents will prevent two different unrelated extensions from both having fo:block's as parents. Not true--fo:block already allows 20 or so different types of children--all distinct from each other. (But to further simplify your coding, all you have to do is check that the namespace URI is www.finn.com or whatever and you can choose to allow everything from that namespace.) I meant that if I have to either subclass or change Block.java then two independently developed extensions cannot both be used in the same document. Remember, FOP is basically a compiler--like javac, you have to define the syntax it will accept, and that will require coding. Also, the only difference between an XSL FO and an extension FO is that the latter hasn't been blessed by the W3C yet. Both need validation and coding. I don't mind having to write validation code, but I seriously mind having to write it in existing FOP sources. It has worked quite well before. Without changing any of FOP's layout or area sources. True. But in the old version, FOP would happily accept, say, an fo:root as a child of an fo:block, or an fo:layout-master-set as a child of an fo:list-item, and subsequently NPE while processing. That model simply had to change. Nobody would respect/use a product that operates that way, or a product based on such a product. Fine, but that can never be an argument for removing support for independent extensions. I mean exactly that the validation should be loose. If I need a finnbock:foobar tag as a child of fo:block, fo:block should not prevent me from doing so. Yes it should, if fo:block is to be XSL compliant. The Recommendation defines the content model (i.e., the children) that fo:block is to follow and their ordering. 
We modify that for the extension elements predefined within FOP--as we're allowed to. Then I, as an extension writer, also want to change the validation, but without having to modify existing FOP sources. So the validation has to somehow be made pluggable. Perhaps the extension elements can implement a tag interface to indicate that the extension shouldn't be validated by the parent. Or perhaps the checking could be disabled entirely. Actually, we could do that--fairly simple--let's see what the other committers have to say. We can add a boolean to FOUserAgent called disableValidation, have FOTreeBuilder read it every time it sees an FO in the stream, and on the basis of that decide whether or not to call vCN(). I see two problems though: (1) performance issue--this boolean will need to be read for *every* node in the FO document, I would guess that a check on a boolean for each node is a bit faster than calling vCN for each node. But I have not measured it. (2) FOP will be raising NPE and CCE errors for
validateChildNode prevents extensions.
Glen, I think that the new validateChildNode() methods are too strict with respect to extension elements. My guess is that the validation should only occur when one fo namespace element is added to another fo element. For instance, Block.validateChildNode() doesn't allow any of my extension elements as children. regards, finn
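The looser validation Finn is asking for, strict checking for fo-namespace children, pass-through for everything else, could be sketched like this. The method name mirrors FOP's validateChildNode(), but the child list is a truncated illustration, not fo:block's full content model.

```java
// Sketch: validate real FO children against the content model, but let
// elements from unknown (extension) namespaces through so independent
// extensions can add children without modifying FOP sources.
public class LooseValidation {
    static final String FO_URI = "http://www.w3.org/1999/XSL/Format";

    // Stand-in for Block.validateChildNode(); the allowed-child list
    // here is deliberately incomplete, for illustration only.
    public static boolean validateChildNode(String namespaceURI,
                                            String localName) {
        if (FO_URI.equals(namespaceURI)) {
            // Strict content-model check for genuine FO children.
            return localName.equals("block")
                || localName.equals("inline")
                || localName.equals("external-graphic");
        }
        // Unknown namespace: assume it is an extension element and leave
        // validation to the extension's own element mapping.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(validateChildNode(FO_URI, "block"));
        System.out.println(validateChildNode(FO_URI, "root"));
        System.out.println(
            validateChildNode("http://example.org/myext", "foobar"));
    }
}
```

This keeps fo:root-under-fo:block style errors caught for XSL content while leaving finnbock:foobar-style children to the extension author.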
Re: [Bug 29124] - New line breaking algorithm
Simon Pepping wrote: TextInfo derives that value from the value of the word-spacing property. That must be an error in the property handling by FOP. Yes. I'll commit a fix for it tomorrow. In FOPropertyMapping 0pt is used as the default value. Apparently normal should be the default value. I am not sure how this can be done for a space property. Something similar happens for length properties, which can have the default value auto. I think normal should be a keyword. Keywords are for mapping one value into another value before the values are parsed. In this case 'normal' should be implemented as an enum. Apparently, the actual value can only be calculated at layout time, when the font is known. Yes, so the consumers of the 'word-spacing' value must check if the value as an enum == NORMAL before using the value as a Space:

Property wordSpacing = propertyList.get(PR_WORD_SPACING);
if (wordSpacing.getEnum() == NORMAL) {
    textInfo.wordSpacing = new SpaceVal(
        new MinOptMax(min, opt, max), true, true, 0);
} else {
    textInfo.wordSpacing = new SpaceVal(wordSpacing.getSpace());
}

regards, finn
Re: Switch from AddLMVisitor to FObj.addLayoutManager()
Glen Mazza wrote: I still prefer my solution. Ok, but please consider that it will then become somewhat more difficult to add an alternative layout subsystem. Right now I just have to replace AddLMVisitor (and the XXXLayoutManager classes). As far as I understand your proposal, I would have to replace FOElementMapping and subclass the FO tree classes in order to plug in a new set of XXXLayoutManager classes. I'm all for simpler code, as long as it doesn't eliminate too many of the features that I use wink. BTW, thanks again for dropping the HEAD code base from 900 to 600 classes and getting rid of all that autogenerated code. You have definitely made FOP more accessible to the programming masses! Curiously, that was never my goal. I just wanted to reduce the amount of mailing list discussion on the topic of generated code. regards, finn
Re: Switch from AddLMVisitor to FObj.addLayoutManager()
[Glen Mazza] I still prefer my solution. Ok, but please consider that it will then become somewhat more difficult to add an alternative layout subsystem. [Glen Mazza] Layout subsystems are much rarer than people think, so not that much of a concern IMO. Keyword here is somewhat more difficult--it's not insurmountable--and compared to the vast complexity of creating a layout subsystem--almost noise level. It will be a few more headaches, but you'll have a better LS in HEAD to base your code off of as a result. IMHO the more closely integrated FO tree and LM tree classes are a worse starting point. Victor had the right intentions on this. [...]. I'm seeing the increase in patches to FOP that will result from this change as a greater benefit*, because (1) only a small minority of FOP users create their own layout strategies while the majority benefits from a faster-created 1.0, and (2) more patches give alternative LS people more and better code for them to base their work on, and in some cases may even remove the need to have an alternative LS at all. You seem to assume that the one default layout system in FOP can satisfy all requirements. I doubt that. The default layout system should be good, but it shouldn't try to solve everything perfectly IMO. Perhaps Jörg disagrees wink. *You are a good example, because if it's more difficult for you to create an alternative, you'll be more likely to come back to FOP and start making improvements to HEAD's layout strategy! GOOD! (Please do!) I would rather you work on FOP than on a competitor for it. My alternative layout subsystem isn't an alternative to FOP but an alternative way of implementing the getNextBreakPoss() code. I do not yet know if anything will come of it, but it looks quite promising. If it works, I'll post it as a patch for discussion. regards, finn
Re: Switch from AddLMVisitor to FObj.addLayoutManager()
Glen Mazza wrote: Team, while I'm implementing the validateChildNode() methods, I would also like to work on removing the AddLMVisitor visitor pattern [1], and revert to our previous method of having the FO's themselves set up the LayoutManagers via addLayoutManager(). (See here [2][3][4] for examples of the previous way of a year ago.) I would consider that a small step backwards because I liked the separation between the FO tree and the LM tree. But I have no great love for the AddLMVisitor, so I wouldn't mind at all if it disappeared. But instead of putting all the LM strategy code into the FO tree classes, I would suggest that the creation of the LM tree is done with a LayoutManagerMapping and a series of Maker classes, one for each FO class. I'll upload a patch to demonstrate the idea. regards, finn
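The LayoutManagerMapping-plus-Makers idea proposed above could be sketched like this. All interfaces and names here are illustrative stand-ins modeled on the proposal, not the patch itself.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: instead of each FO class creating its own layout manager, a
// mapping holds one Maker per FO class. An alternative layout subsystem
// then only has to supply a different mapping with different makers,
// leaving the FO tree classes untouched.
public class LayoutManagerMapping {
    interface LayoutManager {
        String name();
    }

    interface Maker {
        LayoutManager make();
    }

    private final Map<String, Maker> makers = new HashMap<>();

    public void register(String foName, Maker maker) {
        makers.put(foName, maker);
    }

    public LayoutManager makeLayoutManager(String foName) {
        Maker maker = makers.get(foName);
        if (maker == null) {
            throw new IllegalArgumentException("No maker for " + foName);
        }
        return maker.make();
    }

    public static void main(String[] args) {
        LayoutManagerMapping mapping = new LayoutManagerMapping();
        mapping.register("fo:block", () -> () -> "BlockLayoutManager");
        // Swapping the layout subsystem means registering different
        // makers; the FO tree never references a concrete LM class.
        System.out.println(mapping.makeLayoutManager("fo:block").name());
    }
}
```

This preserves the FO-tree/LM-tree separation that Finn wants while still removing the visitor pattern.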
Re: [VOTE] Simon Pepping for Committer
Glen Mazza wrote: I'd like us to go ahead and make Simon Pepping our newest FOP team member. He has provided steady ML help and numerous patch contributions for the past few months, and with the many layout patches that have been coming in recently, we could certainly use his continued help on our team. +1 regards, finn
Re: [PATCH] Support for percentages and table-units
[Finn Bock] I don't understand how you propose to solve any of this, but I hope it would be Ok to commit the straightforward solution I propose. [J.Pietschmann] Whatever works. I just want to note that given the almost one-to-one correspondence between FOs and LMs both in classes and instances (with the exceptions of page, column and line LM), the only advantages of having LMs are - code reuse by inheritance - no layout related data in the FO, for better sharing/reuse. Keeping area dimensions in the FO kills the latter. That is not correct. Temporarily storing the area dimension in the FO tree just long enough for getNextBreakPoss() to return does *not* in any way prevent reusing the FO tree or the LM tree for another rendering run. There are also more good reasons for having an LM tree than just code reuse, the lineLM and a separate place for the layout logic just to name two. For storing reference measurements for resolving in the layout context, you have only to keep track of inheritable properties, which are basically font-size, ipd and bpd. References to specified values (in contrast to computed values) can be handled by splicing in the parsed property expression for the referenced property as replacement for the referencing function. I don't know what splicing means, but the issue that I don't understand your solution to is when a child fo makes a reference to a computed value that is an expression (like 10% of the IPD of 'a') in a parent fo:

<fo:block id="a" border-start-width="10%">
  <fo:block id="b" border-start-width="inherit">
  </fo:block>
</fo:block>

This way the FO tree holds properties (parsed property expressions), while the layout context and the area tree hold the refined traits. I propose to store the specified expression (the 10% of the IPD of 'a') in the parent fo. But if that design is chosen, any reference to that property in the parent fo must ensure that the base value that is in effect for the parent fo is available. 
The base values for the parent fo (IPD of 'a') is not currently available in the layout context for 'b'. Only the IPD of 'b' value exists in the LayoutContext for 'b'. And there is not even a single area created yet. Using the LayoutContext would be a clean design, but first the LayoutContexts must be linked together with the LayoutContext from the parent fo and the LayoutContext must have a reference to the fo element so that the LengthBase can decide how far up the LayoutContext chain it should go in order to find the base value that is valid for the parent fo. I'd be +0 to volunteer wink somebody to implement that. Should we delay my proposed patch until somebody has come up with an implementation that pass the LayoutContext to all Length.getValue(lc) calls? regards, finn
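The linked-LayoutContext idea discussed above, each context referencing its parent so a LengthBase can walk up to the base value that was in effect for the fo the percentage was specified on, might look like this. All names and the resolution method are hypothetical, illustrating the design rather than implementing FOP's actual API.

```java
// Sketch: a chain of layout contexts, one per fo element, so a percentage
// inherited by a child can still be resolved against the IPD that was in
// effect for the parent fo that specified it.
public class LayoutContextChain {
    static class LayoutContext {
        final LayoutContext parent;
        final String foId; // id of the fo element this context belongs to
        final int ipd;     // inline-progression-dimension, in millipoints

        LayoutContext(LayoutContext parent, String foId, int ipd) {
            this.parent = parent;
            this.foId = foId;
            this.ipd = ipd;
        }

        // Walk up until we reach the context of the fo element that the
        // percentage was specified on, then resolve against its IPD.
        int resolvePercent(double percent, String specifiedOnFoId) {
            LayoutContext lc = this;
            while (lc != null && !lc.foId.equals(specifiedOnFoId)) {
                lc = lc.parent;
            }
            if (lc == null) {
                throw new IllegalStateException(
                        "No context for fo '" + specifiedOnFoId + "'");
            }
            return (int) Math.round(lc.ipd * percent / 100.0);
        }
    }

    public static void main(String[] args) {
        // border-start-width="10%" specified on 'a', inherited by 'b':
        // it must resolve against the IPD of 'a', not the IPD of 'b'.
        LayoutContext a = new LayoutContext(null, "a", 400000);
        LayoutContext b = new LayoutContext(a, "b", 380000);
        System.out.println(b.resolvePercent(10.0, "a"));
    }
}
```

This is the missing piece identified above: the chain plus a reference to the defining fo element lets the resolver pick the right base value.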
Re: [PATCH] Support for percentages and table-units
If an expression references another expression in a parent fo, the parent fo expression must be evaluated against the LayoutContext that was in effect for the parent fo and *not* against the child fo LayoutContext:

  <fo:block id="a" border-start-width="10%">
    <fo:block id="b" border-start-width="inherit"/>
  </fo:block>

It must be the LayoutContext for 'a' that is used when we evaluate the 10%, even when we call propertyList.get(PR_BORDER_START_WIDTH).getValue(lc) with the layout context for 'b'. [Simon Pepping] [leader example snipped] The 30% must be calculated w.r.t. the page width for both leaders, and both leaders must be equally wide. This is not the current result. When I tried to remedy this, I got confused by the sets of symbols on PercentBase and on LengthBase. Why are there 2 sets of symbols? The idea was that PercentBase.XXX names the stored values and LengthBase names the algorithm for looking up a base value. Most of the time these map one-to-one to each other, but for some I imagined that they would be different. margin-top f.ex. should really use an algorithm like BLOCKIPD_OR_PAGEHEIGHT which would look for either PercentBase.BLOCK_IPD or PercentBase.PAGE_HEIGHT depending on the fo:element. Clearly all of this isn't completely implemented in my patch yet and perhaps it can be done in a simpler way. Better names for PercentBase and LengthBase could help. They are mapped onto each other in LengthBase.getBaseLength(), but not all symbols are mapped, e.g. LengthBase.CONTAINING_BOX, which is used by leader-length, has no corresponding action. Some further integration of old and new code must be done here. Correct. Question: What to do if I insert a fo:wrapper between fo:flow and fo:block? That makes it the parent of fo:block, but it has no layout manager corresponding to it. Override getLayoutDimension on it? I don't think so, but perhaps I just don't yet understand the situation that you are describing.
A percentage on the wrapper should be resolved against the parent of the wrapper, right? regards, finn
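The PercentBase/LengthBase split described above — stored values on one side, a lookup algorithm on the other — can be sketched in miniature. This is a speculative illustration, not FOP code: the names (LengthBaseDemo, BLOCK_IPD_OR_PAGE_HEIGHT as a concrete object, resolvePercent) and the lambda-based shape are invented for this sketch.

```java
// Hypothetical sketch of the split described above: PercentBase names a
// stored value, LengthBase names the algorithm that picks which stored
// value to use. A LengthBase like BLOCKIPD_OR_PAGEHEIGHT chooses between
// two PercentBase entries depending on the element's context.
import java.util.EnumMap;
import java.util.Map;

public class LengthBaseDemo {
    enum PercentBase { BLOCK_IPD, PAGE_HEIGHT }

    interface LengthBase {
        int baseLength(Map<PercentBase, Integer> stored, boolean isPageContext);
    }

    // margin-top style lookup: containing block IPD normally, page height
    // when the property is used in a page context (as the message suggests).
    static final LengthBase BLOCK_IPD_OR_PAGE_HEIGHT =
        (stored, isPageContext) ->
            stored.get(isPageContext ? PercentBase.PAGE_HEIGHT : PercentBase.BLOCK_IPD);

    public static int resolvePercent(double pct, LengthBase base,
            Map<PercentBase, Integer> stored, boolean isPageContext) {
        return (int) (base.baseLength(stored, isPageContext) * pct / 100.0);
    }

    public static void main(String[] args) {
        Map<PercentBase, Integer> stored = new EnumMap<>(PercentBase.class);
        stored.put(PercentBase.BLOCK_IPD, 300000);   // millipoints
        stored.put(PercentBase.PAGE_HEIGHT, 800000);
        System.out.println(resolvePercent(10, BLOCK_IPD_OR_PAGE_HEIGHT, stored, false)); // 30000
        System.out.println(resolvePercent(10, BLOCK_IPD_OR_PAGE_HEIGHT, stored, true));  // 80000
    }
}
```

The point of the two-sided design is visible here: adding a new lookup algorithm does not require adding a new stored value, and vice versa.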
Re: [PATCH] Support for percentages and table-units
[Peter B. West] Finn, When I apply your most recent patch (10366) against a cvs updated HEAD tree and attempt to compile, I get the following: [javac] /usr/local/src/fop-HEAD-finn/src/java/org/apache/fop/fo/properties/LinearCombinationLength.java:60: After applying the patch (10366), the following files should be removed: src/java/org/apache/fop/fo/properties/LinearCombinationLength.java src/java/org/apache/fop/fo/properties/MixedLength.java Does anyone know how to include a file-remove in a patch? regards, finn
Re: [PATCH] Support for percentages and table-units
[me] If an expression references another expression in a parent fo, the parent fo expression must be evaluated against the LayoutContext that was in effect for the parent fo and *not* against the child fo LayoutContext:

  <fo:block id="a" border-start-width="10%">
    <fo:block id="b" border-start-width="inherit"/>
  </fo:block>

It must be the LayoutContext for 'a' that is used when we evaluate the 10%, even when we call propertyList.get(PR_BORDER_START_WIDTH).getValue(lc) with the layout context for 'b'. [J.Pietschmann] Well, I used to believe the 10% has been evaluated already, and the inherited property can grab the absolute value immediately. If it is evaluated already, where would the evaluated value be stored? In the propertyList (aka the FO tree)? And then the value should be reverted to the expression when the base value changes due to breaks. Storing the resolved value would IMO remove all the benefits of passing a context parameter to getValue(). Perhaps it can be done that way too, but it is very different from my proposal. regards, finn
Re: [PATCH] Support for percentages and table-units
If it is evaluated already, where would the evaluated value be stored? [J.Pietschmann] The layout context for the child LM could be an appropriate place. The resolved property values of the parents should be stored in the layout context? I must be missing something here! And then the value should be reverted to the expression when the base value changes due to breaks. No problem, this is known at the place where a new layout context is created for getting BP from the child LM. I don't understand how you propose to solve any of this, but I hope it would be OK to commit the straightforward solution I propose. Then you can change it at a later time to pass in the LayoutContext as a parameter to Length.getValue(). regards, finn
Re: DO NOT REPLY [Bug 26778] - [PATCH] Support for percentages and table-units
[Simon Pepping] I like the patch and the way RelativeNumericProperty holds and evaluates an expression tree (except my different preference for storing layout information, as discussed). This is really nice and works well: v = (((0mpt +(4000mpt +20.0%)) +0mpt) +0mpt) I found a few things that may need modification: In LengthBase.java: /** array of valid percent-based length types */ public static final int[] PERCENT_BASED_LENGTH_TYPES = { CUSTOM_BASE, FONTSIZE, INH_FONTSIZE, CONTAINING_BOX, CONTAINING_REFAREA } ; add BLOCK_WIDTH and BLOCK_HEIGHT (Is this variable used anywhere?) I don't think so. Block, BlockContainer, Table, Region, PageLayoutManager: getters and setters should be removed again? Yes. In LineLayoutManager.java: private Length iTextIndent; remove the i from the name; it stands for int. Yes. regards, finn
Re: [PATCH] Support for percentages and table-units
[J.Pietschmann] The layout context has the actual IPD MinOptMax. There is no inherent reason it should have a link to a parent context or the property subsystem; it's only necessary to have a method to resolve a property expression given a set of MinOptMax for the various traits which can be used as references for percentages. Like textIndent=propertyManager.get(TEXT_INDENT).resolve(layoutContext); Right, but it doesn't have all the base lengths. For some base lengths it is one of the parent layout contexts that has the trait, like the ipd of the containing reference area. How could we get hold of that trait if the LayoutContexts aren't linked together? Whatever object is passed to the resolve() method must AFAICT also have a reference to the FO so that resolve() can find the right LayoutContext in the LayoutContext tree to use as base. Perhaps I'm missing something fundamental in your suggestion, but I just can't see how there is enough information in a single LayoutContext to resolve relative lengths that are inherited from a parent fo:element. [Simon Pepping] In the LayoutContext design, one does not climb a tree to find the relevant information; instead, the information is propagated downward for ready reference. Each getNextBreakPoss call gets a LayoutContext from its caller, and constructs one for its own calls to getNextBreakPoss. It should make sure that the latter LayoutContext contains all the relevant information, such as the width of the containing block and of the containing reference area. Either it copies it from the LayoutContext it received from its caller, or it inserts new values, as appropriate. In principle it should contain the dimensions that correspond to the percent based length types listed in LengthBase. If an expression references another expression in a parent fo, the parent fo expression must be evaluated against the LayoutContext that was in effect for the parent fo and *not* against the child fo LayoutContext.
  <fo:block id="a" border-start-width="10%">
    <fo:block id="b" border-start-width="inherit"/>
  </fo:block>

It must be the LayoutContext for 'a' that is used when we evaluate the 10%, even when we call propertyList.get(PR_BORDER_START_WIDTH).getValue(lc) with the layout context for 'b'. I don't really care how that is done, but I would link the LayoutContexts to their parent. Or rather, I would use the LMs as the objects for storing the actual dimensions since they are already linked together with their parent. But for now I think we should just use the FO tree and later change the signature of getValue() to take some kind of context parameter when a real need arises. regards, finn
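Simon's "propagate downward" alternative — no tree climbing, each LM copies the reference dimensions its children will need into the context it hands down — can also be sketched. Again a hypothetical toy with invented names (DownwardContextDemo, the string keys), not FOP's real classes:

```java
// Sketch of the downward-propagation model: a child LayoutContext starts
// from a copy of the parent's reference dimensions, and the caller then
// overrides whatever it redefines. No upward links are needed because
// everything relevant was copied down. All names are invented.
import java.util.HashMap;
import java.util.Map;

public class DownwardContextDemo {
    static class LayoutContext {
        final Map<String, Integer> baseLengths = new HashMap<>();
        LayoutContext() {}
        // Copy the parent's reference values so they stay available below.
        LayoutContext(LayoutContext parent) {
            baseLengths.putAll(parent.baseLengths);
        }
        void set(String key, int mpt) { baseLengths.put(key, mpt); }
        int get(String key) { return baseLengths.get(key); }
    }

    public static void main(String[] args) {
        LayoutContext pageLC = new LayoutContext();
        pageLC.set("containing-refarea-ipd", 400000);
        pageLC.set("containing-block-width", 400000);

        // A block LM narrows the containing block for its children, but the
        // containing reference-area IPD is carried down unchanged.
        LayoutContext blockLC = new LayoutContext(pageLC);
        blockLC.set("containing-block-width", 360000);

        System.out.println(blockLC.get("containing-refarea-ipd"));  // 400000
        System.out.println(blockLC.get("containing-block-width"));  // 360000
    }
}
```

The copy-down approach trades the FO reference and chain walk for a per-context copy of "the dimensions that correspond to the percent based length types listed in LengthBase", as the message puts it.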
Re: [PATCH] Support for percentages and table-units
Somehow, in our current design, the information must be stored in an object that exists: [J.Pietschmann] IIRC that's what the layout context was meant for. Perhaps, but I doubt it. If they were changed to always get a reference to the parent layout context when they are created, and if they had a reference to the FObj, and if they were made available to the property subsystem, then they could probably be used for it. I still think it is easier to use either the FOs or the LMs. regards, finn
Re: [PATCH] Support for percentages and table-units
[Simon Pepping] If in the re-use the layout would not change, the area tree could be reused. OTOH, if the layout would change, e.g. because another renderer would use a font with a different font metric, the layout information in the FO tree cannot be reused. The layout dimension that is stored in the FO tree is constantly updated during discovery of BreakPoss'es and is never reused, not even when a block is split over a break where new values are assigned. The solution I propose makes it impossible to run two different renderers concurrently, but it does not in any way prevent the FO tree from being reused with another renderer after the first rendering is finished. This is another argument to separate FO tree and layout information. No, not really IMO. regards, finn
Re: [PATCH] Support for percentages and table-units
Perhaps, but I doubt it. If they were changed to always get a reference to the parent layout context when they are created, and if they had a reference to the FObj, and if they were made available to the property subsystem, then they could probably be used for it. [J.Pietschmann] The layout context has the actual IPD MinOptMax. There is no inherent reason it should have a link to a parent context or the property subsystem; it's only necessary to have a method to resolve a property expression given a set of MinOptMax for the various traits which can be used as references for percentages. Like textIndent=propertyManager.get(TEXT_INDENT).resolve(layoutContext); Right, but it doesn't have all the base lengths. For some base lengths it is one of the parent layout contexts that has the trait, like the ipd of the containing reference area. How could we get hold of that trait if the LayoutContexts aren't linked together? Whatever object is passed to the resolve() method must AFAICT also have a reference to the FO so that resolve() can find the right LayoutContext in the LayoutContext tree to use as base. Perhaps I'm missing something fundamental in your suggestion, but I just can't see how there is enough information in a single LayoutContext to resolve relative lengths that are inherited from a parent fo:element. I still think it is easier to use either the FOs or the LMs. Maybe. Well, the approach using FOs is here already. It can't get easier than that. regards, finn
[PATCH] Support for percentages and table-units
[Simon Pepping] Finn, I see you also solved another problem, viz. that fo:layout-master-set did not return a proper IPD. Correct. There are, I'm certain, many more cases where the layout managers do not yet set all the dimensions needed to resolve all the different percentages. However, I am not happy with your solution. During the layout process, you feed the page dimensions back into the FO tree, in PageLayoutManager.createPageAreas. Yes, and in BlockLayoutManager and in all the other LMs that define base lengths for percentages. I think it would be a better design if, in order to resolve the percent-based properties, you would not climb the FO tree but the Area tree. That avoids feeding back results from the Area tree into the FO tree. The XSL spec says (XSL-PR/slice3.html#section-N1208-Conceptual-Procedure): 'The formatting object supplies parameters to its children based on the traits of areas already in the area tree, possibly including areas generated by the formatting object or its ancestors.' Holding those parameters in the FO tree is one solution. Extracting them from the area tree laid out up to that point is another solution, But there are *no* areas laid out or even created at the point where we need resolved lengths. When getNextBreakPoss is called, we only have a series of BreakPoss's stored in different layout managers. The Area objects are created much later, when we have BreakPoss's for an entire page. PageLM then calls addAreas which recursively creates all the areas for the page. which has the advantage that it does not load the FO tree with information that it does not need to have. I initially had a separate PropertyContext object where the lengths were stored. The FO element then had a reference to the PropertyContext and there was a PropertyContext for every FO. But since there was a one-to-one correspondence between the FOs and the PropertyContexts, I decided to put the information directly in the FOs instead.
Somehow, in our current design, the information must be stored in an object that exists: - when the properties are parsed. - when the break possibilities are created. The FO nodes fulfill both requirements. Perhaps the dimensions should be stored in the layout manager tree, but the LM tree is not available when the properties are parsed and there is no way to get from an FO to the LMs that the FO creates. regards, finn
Re: JUnit test failure
[J.Pietschmann] Hi all, I get a nice JUnit failure: Testcase: testFO2PDFWithDOM took 0.23 sec Caused an ERROR loader constraints violated when linking org/w3c/dom/Node class java.lang.LinkageError: loader constraints violated when linking This seems to have something to do with mixing jars from the JDK and fop/lib. Does anybody have an idea how this can be avoided? Any new information on how to avoid this error? Or perhaps a description of an environment where it doesn't occur. I can't make the hint in the JUnit FAQ work for me. I also think junit3.8.1 already has the lines in its excluded.properties. regards, finn
Re: cvs commit: fop.bat
[EMAIL PROTECTED] wrote:

Index: fop.bat
===
-java -cp %LOCALCLASSPATH% org.apache.fop.apps.Fop %1 %2 %3 %4 %5 %6 %7 %8
+rem 'shift' removes %0 (i.e., the fop.bat filename)
+shift
+java -cp %LOCALCLASSPATH% org.apache.fop.apps.Fop %*

Last time I had to work on cmdline args in Windows, the %* trick didn't work on win9x: http://sourceforge.net/tracker/?group_id=12867&atid=112867&func=detail&aid=484181 Unfortunately I can't test on win9x, so I can't create our own solution, and I don't know if we can copy the solution from jython (which I know works on all real versions of Windows) due to license differences. Perhaps we should just copy the loop from ant and hope that ant works on win9x. regards, finn
Re: Percentages
[Simon Pepping] Quite a piece of work. I will try to understand it. A small correction: "margin-[top,bottom]: width of containing block, except for page context where it is page box height" -- that 'width' should be 'height', I suppose. Well, not according to the spec: http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#margin-top (my patch does not deal with the block-width / page-height issue correctly, btw). regards, finn
Percentages
Hi, I just uploaded a patch which adds support for percentages and calculations on percentages to bugzilla. I don't think that the patch is quite right in its attempt to find the correct base length to combine with the percentage. Some of my confusion comes from the fact that I can't tell the different boxes/areas/blocks apart. How f.ex. are these specifications different: - ipd of closest area generated by a block-level fo - width of containing block - ipd of closest ancestor block-area. Perhaps somebody who understands the spec better than me can describe what boxes and sizes are to be used. I went through the spec and came up with these descriptions of the bases for the properties that support percentages:

top, bottom, height, max-height: height of containing block if explicit. Auto otherwise.
left, right, padding-*, margin-[left,right], max-width, min-width, width, text-indent: width of containing block.
background-position-horizontal, background-position-vertical: size of padding-rectangle. Relative values.
font-size: inherited font-size.
margin-[top,bottom]: width of containing block, except for page context where it is page box height.
[start,end]-indent: ipd of containing reference-area.
space-[end,start], last-line-end-indent: ipd of closest ancestor block-area.
alignment-adjust: 'height' if fo:external-graphic or fo:instream-foreign-object, 'font-size' if fo:character and 'line-height' otherwise.
baseline-shift: 'line-height' of the parent-area.
block-progression-dimension: corresponding dimension of closest area generated by a block-level fo. Auto if not specified explicitly.
content-[height,width]: intrinsic height and width.
inline-progression-dimension: corresponding dimension of closest area generated by a block-level fo. Auto if not specified explicitly.
min-height: height of containing block.
line-height: 'font-size'.
leader-pattern-width, leader-length: ipd of content-rect of parent area.
column-gap: width of region being divided into columns.
extent: corresponding height or width of the page.
column-width: width of table.
text-altitude, text-depth: font's em-height.
provisional-label-separation, provisional-distance-between-starts: ipd of closest block-area that is not a line-area.
vertical-align: 'line-height'.

regards, finn
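One entry in the list above that is easy to misread is the "Relative values" note on background-position-*. If XSL follows the CSS model here (an assumption on my part), N% aligns the N% point of the image with the N% point of the padding-rectangle, so the offset is computed from the difference of the two sizes, not from the box size alone. A tiny sketch of that computation (invented class name, assumed semantics, not FOP code):

```java
// Assumed CSS-style semantics for background-position percentages:
// offset = (boxSize - imageSize) * N/100. With this rule 0% left/top-aligns,
// 50% centers, and 100% right/bottom-aligns the image in the box.
public class BackgroundPositionDemo {
    public static int offset(int boxSize, int imageSize, double pct) {
        return (int) ((boxSize - imageSize) * pct / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(offset(200, 50, 0));    // prints 0
        System.out.println(offset(200, 50, 50));   // prints 75  (centered)
        System.out.println(offset(200, 50, 100));  // prints 150 (far edge)
    }
}
```

This is why the list flags these two properties as "Relative values": unlike the other entries, the percentage is not simply multiplied by a single base length.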
Re: [PATCH] unnesting Property.Maker and rolling datatypes into their properties.
[Glen Mazza] Another option, Finn, is to move all the Property subclasses to fo.properties (even if they're alongside the makers, nested or unnested), after thinking about it, I think that will be a little bit clearer than having them in the datatype package. Comments? I like it. How does this sound: - Unnest Property.Maker to PropertyMaker and put it in fo.properties. - Roll the datatypes classes into their property class and move the property class to fo.properties (but without unnesting their Maker class because the remaining nested Maker classes are really trivial). - Move the handcoded maker classes to fo.properties. regards, finn
Re: Unnesting properties and makers.
--- Finn Bock [EMAIL PROTECTED] wrote: Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface: public interface Property { public Length getLength(); public Space getSpace(); ... } [Glen Mazza] Finn, just so I understand more here--what is the set of methods that this interface would have? (You don't have to give me a full enumeration if it's huge--but let me know once you determine them.) How many of them are there--10 of them or 20 or 30 or ??? This is the full set, exactly the same as now exists in Property as null methods:

public Length getLength();
public ColorType getColorType();
public CondLength getCondLength();
public LengthRange getLengthRange();
public LengthPair getLengthPair();
public Space getSpace();
public Keep getKeep();
public int getEnum();
public char getCharacter();
public Vector getList();
public Number getNumber();
public Numeric getNumeric();
public String getNCname();
public Object getObject();
public String getString();

The names of the returned compound property values would change according to the new naming rule that we decide. regards, finn
Re: Unnesting properties and makers.
[Glen Mazza] I now understand what you're saying, and like the simplification you're suggesting. The current naming however, is probably preferable--the word Property figures quite highly in the spec! Do you have a problem remaining with it? Not at all, it is just that I thought it would be confusing to rename the 'datatypes' classes to XXXProperty as they would conflict with the old XXXProperty classes, but it is only a problem when you compare before vs. after. If the change is done, the resulting XXXProperty classes will be completely consistent. For those (*)'ed datatypes, can't we get rid of the datatype instead by rolling that datatype into the equivalently named Property? In turn, have *those* Properties extend AbstractProperty as you suggest. Actually, I guess I'm just saying the same thing you're suggesting, except to use --Property instead of --Type for everything. Indeed. Which package should the resulting rolled datatype/property be placed in? My feeling says fop.datatypes (and the nested makers should be unnested and placed in fop.fo.properties). But that is a separate suggestion which does not have to be dealt with initially. Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list. That would still be the case. Everything stored in the PropertyList implements the Property interface. But only a few of them would extend AbstractProperty, correct--or would you plan on having all do so? All of the properties would extend AbstractProperty. That way the properties get the default 'null' implementation of all the gettype methods. The only hard requirement is that all the properties implement the Property interface. Except that the code above should IMHO use if (prop.getLength() != null) to test for a length type instead of using instanceof. Well, instanceof is slower I believe, but better self-commenting.
If you switch to this type of conditional for speed, just add a short comment of its purpose--here, to determine if we are working with an EnumProperty or a LengthProperty. (Another option, BTW, if you think it will cut down on buggy programming, is to have the classes implementing this Property interface supply unsupported interface methods a la Peter's read-only BitSet[1], i.e., throw exceptions. We can revisit this topic later if code errors are becoming a problem.) In most cases an NPE is thrown immediately after the call to gettype, but an exception thrown from within the gettype could indeed carry more information about the cause of the failure. I still like the null return and null test better than the alternatives though. regards, finn
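The two dispatch styles being weighed here — the getLength()-returns-null convention versus instanceof — can be put side by side in a minimal sketch. The classes below are invented stand-ins (Integer where FOP has Length, etc.), not the real property hierarchy:

```java
// Sketch of the null-return convention discussed above: every property
// exposes typed getters that default to null, and callers test for null
// instead of writing "prop instanceof LengthProperty". Invented classes.
public class NullTestDemo {
    static class Property {
        Integer getLength() { return null; }  // default 'null' implementations
        Integer getEnum()   { return null; }
    }
    static class LengthProperty extends Property {
        final int mpt;
        LengthProperty(int mpt) { this.mpt = mpt; }
        @Override Integer getLength() { return mpt; }
    }
    static class EnumProperty extends Property {
        final int e;
        EnumProperty(int e) { this.e = e; }
        @Override Integer getEnum() { return e; }
    }

    public static String describe(Property prop) {
        // Null-test style: ask for the type you want, check for null.
        if (prop.getLength() != null) return "length:" + prop.getLength();
        if (prop.getEnum() != null)   return "enum:" + prop.getEnum();
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(new LengthProperty(12000))); // length:12000
        System.out.println(describe(new EnumProperty(3)));       // enum:3
    }
}
```

The trade-off in the thread is visible: a null return gives a silent "wrong type" signal (an NPE only if the caller dereferences it), whereas instanceof or a throwing getter fails louder but is either less self-commenting or more intrusive.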
Re: Unnesting properties and makers.
Each of the typeType classes also implements the gettype methods from Property, so the layout must do exactly the same as it does now to extract the right value: propertyList.get(PR_INLINE_PROGRESSION_DIMENSION). getLengthRange().getOptimum().getLength(); [Andreas L. Delmelle] Hmmm... coming back to my recent question about the use of/access to the background-color property: I somehow would feel much for further extending the way the Common*Properties are handled. IIC, the calls like the above should only happen in the background via the propMgr of the FObj, and not become part of the public API. I dunno. The spec clearly lists which properties apply to an element: file:///d:/java/REC-xsl/slice6.html#fo_external-graphic so it makes sense to find the same list of assignments in the layout managers. As a concrete example, in Layout, I would rather see something like: private AreaDimensionProps adimProps; ... protected void initProperties(PropertyManager propMgr) { adimProps = propMgr.getAreaDimensionProps(); ... } Yeah, if it makes sense to add more groups of properties together (and it seems that such an ipd,bpd pair makes sense) I don't see a problem adding that. ... Length ipd = aProps.ipd; Yes, except that it is a LengthRange property. (maybe the latter can become more abstract: PropertyValue ipd = aProps.ipd; ) My gut feeling says no. Unless the property in question can take non-LengthRange values (which ipd can not). The layout managers should resolve the property value as far as they can, as early as they can, IMHO. regards, finn
Re: Unnesting properties and makers.
Glen Mazza wrote: Well, instanceof is slower I believe, but better self-commenting. [J.Pietschmann] Instanceof is exactly as fast as a simple function call after warm-up. That is not what I remembered, so I made a small test program and ran it with 3 different versions of the JDK:

[/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
false method call 160
true method call 170
false instanceof 581
true instanceof 581

[/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
false method call 16614
true method call 881
false instanceof 1162
true instanceof 2083

[/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
false method call 581
true method call 661
false instanceof 2153
true instanceof 2734

I really don't know what to conclude from this test, but at least I'm glad I didn't mention performance as the reason why I prefer the gettype way of testing for subclasses. I'm surprised by the slow performance of calling non-overridden methods in jdk1.3.1. I don't have any explanation for that. regards, finn

import java.io.*;
import java.net.*;

public class x {
    public static final int ITERS = 1;

    public static void main(String[] args) throws Exception {
        Prop prop = new Prop();
        Prop stringprop = new StringProp();
        // Warm up the JIT.
        testCall(prop);
        testInstanceOf(prop);
        long now;
        now = System.currentTimeMillis();
        testCall(prop);
        System.out.println("false method call " + (System.currentTimeMillis() - now));
        now = System.currentTimeMillis();
        testCall(stringprop);
        System.out.println("true method call " + (System.currentTimeMillis() - now));
        now = System.currentTimeMillis();
        testInstanceOf(prop);
        System.out.println("false instanceof " + (System.currentTimeMillis() - now));
        now = System.currentTimeMillis();
        testInstanceOf(stringprop);
        System.out.println("true instanceof " + (System.currentTimeMillis() - now));
    }

    public static void testInstanceOf(Prop prop) {
        for (int i = ITERS; i >= 0; i--) {
            boolean x = prop.getString() != null;
        }
    }

    public static void testCall(Prop prop) {
        for (int i = ITERS; i >= 0; i--) {
            boolean x = prop instanceof StringProp;
        }
    }

    public static class Prop {
        public String getString() { return null; }
    }

    public static class StringProp extends Prop {
        String value = "a string";
        public String getString() { return value; }
    }
}
Re: Unnesting properties and makers.
[Andreas L. Delmelle] snip /

public static void testInstanceOf(Prop prop) {
    for (int i = ITERS; i >= 0; i--) {
        boolean x = prop.getString() != null;
    }
}

public static void testCall(Prop prop) {
    for (int i = ITERS; i >= 0; i--) {
        boolean x = prop instanceof StringProp;
    }
}

I'd swap either the method names or the contained expressions to get dependable results (typo?) Yeah, an embarrassing copy-paste bug. Thanks for catching it. The result is then:

[/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
false method call 581
true method call 581
false instanceof 160
true instanceof 170

[/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
false method call 1272
true method call 2304
false instanceof 17945
true instanceof 912

[/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
false method call 2154
true method call 2754
false instanceof 590
true instanceof 651

regards, finn
Re: Unnesting properties and makers.
file:///d:/java/REC-xsl/slice6.html#fo_external-graphic [Andreas L. Delmelle] (Off-topic: Finn, I don't think I have access to your d:-drive ;) ) I hope not :-0 . Sorry about that. Yeah, if it makes sense to add more groups of properties together (and it seems that such an ipd,bpd pair makes sense) I don't see a problem adding that. I just think this will lead to an API that's a bit clearer, cleaner and so, in the long run, easier to manage and maintain. I don't really know whether the Common*Properties were separated out because they are, well, common, and it's more efficient for them to be treated as a bundle. Maybe it was originally the intention to create property groups along the groups in which they are divided in the spec (see http://xml.apache.org/fop/compliance.html)? I don't know what the original intention was either, but from the no-longer-used setup() methods in the flow objects like fo.flow.Block it looks like somebody once wanted the list of properties from the spec to be represented in the code. But that should not prevent us from doing it differently. regards, finn
Re: Comments on the changes to the property subsystem
[Simon Pepping] I have just caught up with the massive changes to the property system. Allow me to share a few observations: Thanks for your comments. How do you otherwise think it compares to the previous generated property makers? 1. If I see correctly, PropertySets is not yet used. Correct. Its intended use is when we, at some point in the future, want to store more than just the specified properties in the FObjs. PropertyList is still a HashMap keyed on property name. Is this waiting for some other changes to be made? Yes. Either PropertyList should have a Property[] array indexed directly by the propId, or the HashMap should be keyed by an Integer object with the same value as the propId: super.get(integerArray[propId]); where integerArray is initialized as: integerArray[1] = new Integer(1); integerArray[2] = new Integer(2); ... Which of these it will be depends on how much information we decide to store in the FObj. 2. In FOPropertyMapping the word 'generic' is used in two different senses: in s_generics and getGenericMappings() on the one hand, and in genericBoolean etc., createGenerics() and useGeneric() on the other hand. This may be confusing. One might e.g. be tempted to think that s_generics contains the objects genericXxx only, which is not the case. You are absolutely correct. I took the names from the code which existed at the time. I'm also terrible at naming things, so other suggestions will be welcome. I suggest that we rename s_generics and getGenericMappings to s_fomapping and getFoMappings because they deal with the properties from the xsl:fo spec. 3. getGenericMappings rebuilds s_generics every time it is called. In the current code it seems to be called only once, by FObj when the class is loaded. Would it not be better if getGenericMappings itself would ensure that the array is built only once? Yes, but I would like to take the question a bit further and ask where such information should be cached?
Storing it in static variables caches it in the classloader which makes it difficult to control the release of the memory. Perhaps it would be better to store the information in the Driver or FOUserAgent, IOW to put the caching under user control. If caching at the classloader level is the best way to do it, then it makes perfect sense to do what you suggest. regards, finn
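The integerArray idea sketched in point 1 above can be shown as a tiny working class. This is an illustrative toy, not FOP's PropertyList: the class name, the String value type, and the MAX_PROP_ID bound are all invented here, and modern Integer.valueOf() is used in place of the new Integer(i) calls from the message:

```java
// Sketch of the Integer-key idea: pre-build one Integer per propId so the
// HashMap can be probed with int property IDs without allocating a boxing
// object on every get(). Invented names; not FOP's real PropertyList.
import java.util.HashMap;

public class PropertyListDemo extends HashMap<Integer, String> {
    private static final int MAX_PROP_ID = 256;  // assumed upper bound
    private static final Integer[] integerArray = new Integer[MAX_PROP_ID];
    static {
        for (int i = 0; i < MAX_PROP_ID; i++) {
            integerArray[i] = Integer.valueOf(i);
        }
    }

    // int-typed overloads reuse the cached Integer keys.
    public String get(int propId) {
        return super.get(integerArray[propId]);
    }

    public void put(int propId, String prop) {
        super.put(integerArray[propId], prop);
    }

    public static void main(String[] args) {
        PropertyListDemo plist = new PropertyListDemo();
        plist.put(42, "10%");
        System.out.println(plist.get(42)); // prints 10%
    }
}
```

The alternative mentioned in the same paragraph — a Property[] array indexed directly by propId — skips hashing entirely, at the cost of fixing the array size per FObj; which one wins depends, as the message says, on how much is stored in the FObj.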
Suspicious override in RtfListStyleNumber.
Hi, While I was trying to fix some warnings from 'javadoc' about missing @see method references I discovered a suspicious construct in org.apache.fop.render.rtf.rtflib.rtfdoc.RtfListStyleNumber where the signature of the writeListPrefix(RtfList) method is different from the signature of the superclass RtfListStyle where the signature is writeListPrefix(RtfListItem). I'm guessing that RtfListStyleNumber is incorrect and that it should at least override writeListPrefix(RtfListItem). I hope that someone familiar with the rtf code could take a look at this. regards, finn
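The suspicion reported above is worth illustrating, because a mismatched parameter type silently produces an overload rather than an override. The sketch below uses the class names from the report but entirely invented method bodies; in Java 5 and later, an @Override annotation on the subclass method would turn this mistake into a compile error:

```java
// Illustration of the suspected RtfListStyleNumber bug: a subclass method
// taking RtfList does NOT override the superclass method taking
// RtfListItem -- it merely overloads it, so calls made with an RtfListItem
// argument always reach the superclass version. Method bodies are invented.
public class OverloadTrapDemo {
    static class RtfListItem { }
    static class RtfList { }

    static class RtfListStyle {
        String writeListPrefix(RtfListItem item) { return "base"; }
    }

    static class RtfListStyleNumber extends RtfListStyle {
        // Different parameter type: this is an overload, not an override.
        String writeListPrefix(RtfList list) { return "number"; }
    }

    public static void main(String[] args) {
        RtfListStyle style = new RtfListStyleNumber();
        // Overload resolution is by argument type, so the superclass
        // implementation runs even on an RtfListStyleNumber instance.
        System.out.println(style.writeListPrefix(new RtfListItem())); // prints base
    }
}
```

If RtfListStyleNumber is meant to customize the prefix for list items, it should declare writeListPrefix(RtfListItem), exactly as the message suggests.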
Re: Compiling HEAD
[Peter B. West] Is HEAD supposed to be compiling? Yes. I can compile it just fine from a fresh checkout. I'm getting errors starting at datatypes/ColorType.java. Which errors? Have you tried to do an ant clean first? regards, finn
Re: Unnesting properties and makers.
Does anyone know why we wrap the datatypes instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface: [Glen Mazza] Could you explain why we have the datatypes instances to begin with--what they're for? I'm not sure what their precise purpose is. The datatypes are the slightly more complex property values. The property classes wrap the datatypes in order to give the datatypes a common interface. This list shows the concrete Property subclasses and the datatypes that each of them wraps:

CharacterProperty        char
ColorTypeProperty        ColorType (*)
CondLengthProperty       CondLength (*)
EnumProperty             int
KeepProperty             Keep (*)
LengthPairProperty       LengthPair (*)
LengthProperty           Length, AutoLength, FixedLength, PercentLength (*)
LengthRangeProperty      LengthRange (*)
ListProperty             Vector
NCNameProperty           String
NumberProperty           Number
NumericProperty          Numeric (*)
SpaceProperty            Space
StringProperty           String
ToBeImplementedProperty

Some of the concrete property subclasses wrap standard java types such as int, char, String, Number and Vector, and for these properties we still need a wrapper. But some of them, marked with (*), wrap a datatype which is under our own control, and for those properties the datatype class could also function as the property wrapper. Offhand, it doesn't seem natural to go without Property objects--they are kept in the PropertyList and indexed by the property ID in that list. That would still be the case. Everything stored in the PropertyList implements the Property interface. In the list below of the new property classes, all the typeType classes implement Property and are stored in the PropertyList.
  CharacterType       char
  ColorTypeType       itself
  CondLengthType      itself
  EnumType            int
  KeepType            itself
  LengthPairType      itself
  LengthType, AutoLengthType, FixedLengthType, PercentLengthType   itself
  LengthRangeType     itself
  ListType            Vector
  NCNameType          String
  NumberType          Number
  NumericType         itself
  SpaceType           itself
  StringType          String
  ToBeImplementedType

Each of the typeType classes also implements the gettype methods from Property, so the layout must do exactly the same as it does now to extract the right value:

  propertyList.get(PR_INLINE_PROGRESSION_DIMENSION).
      getLengthRange().getOptimum().getLength();

For the classes which are both property and datatype, the gettype method becomes:

  public type gettype() {
      return this;
  }

Furthermore, those are the objects requested by layout. What would be your alternative storage technique otherwise--I believe we do (frequently?) have more than one datatype per property, correct? I remember two cases, but I can only find one at the moment. In Title.setup():

  prop = this.propertyList.get(PR_BASELINE_SHIFT);
  if (prop instanceof LengthProperty) {
      Length bShift = prop.getLength();
  } else if (prop instanceof EnumProperty) {
      int bShift = prop.getEnum();
  }

This would stay the same, except LengthProperty would be called LengthType and EnumProperty would be called EnumType. Except that the code above should IMHO use if (prop.getLength() != null) to test for a length type instead of using instanceof. I'm not sure what to propose as the naming convention for the new combined property/value, but Alt-Design calls them typeType so I used that in the list above. regards, finn
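For contrast, here is a minimal sketch of the proposed combined property/datatype classes, including the prop.getLength() != null style of type test suggested above. The names (LengthType, NumberType, AbstractProperty) follow the mail's examples but are simplified and hypothetical, not the actual FOP code:

```java
// Hypothetical sketch of the proposed design: the datatype class itself
// implements the Property interface and returns itself from its getter,
// so no separate wrapper object is needed.

interface Property {
    LengthType getLength();     // null unless the stored value is a length
}

abstract class AbstractProperty implements Property {
    public LengthType getLength() { return null; }
}

// Datatype and property wrapper in one class.
class LengthType extends AbstractProperty {
    private final double points;
    LengthType(double points) { this.points = points; }
    double getValue() { return points; }
    @Override public LengthType getLength() { return this; }
}

// A value that is not a length; inherits the null-returning getter.
class NumberType extends AbstractProperty {
    private final double value;
    NumberType(double value) { this.value = value; }
    double getNumber() { return value; }
}

public class TypeTypeDemo {
    public static void main(String[] args) {
        Property p = new LengthType(12.0);
        Property q = new NumberType(3.0);
        // Type test via the getter instead of instanceof:
        System.out.println(p.getLength() != null);  // true
        System.out.println(q.getLength() == null);  // true
    }
}
```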
Unnesting properties and makers.
Hi, After updating from CVS, it is most likely necessary to do an ant clean to get rid of the old generated maker classes before building. I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore. I've found an argument for unnesting the maker classes from their property classes: it lets us put the makers in their own package, and I think it would be a little cleaner to do that. Using the fo.properties package seems natural. Does anyone know why we wrap the datatype instances in a property instance? I think we could avoid the property instance by having the datatypes extend an AbstractProperty class which implements a Property interface:

  public interface Property {
      public Length getLength();
      public Space getSpace();
      ...
  }

  public class AbstractProperty implements Property {
      public Length getLength() { return null; }
      public Space getSpace() { return null; }
      ...
  }

  public class Length extends AbstractProperty {
      // Rest of datatypes.Length class.
      ...
      public Length getLength() { return this; }
  }

With such a change, one of the remaining differences between HEAD and Alt-Design would be in the naming of the classes: Property = PropertyValue, Property.Maker = Property. Comments? regards, finn
Re: Unnesting properties and makers.
I have not yet removed the properties.xsl file from CVS. I guess it should be removed since it isn't used anymore. [J.Pietschmann] I think you could leave the file there for now. Ok. It should be sufficient to inactivate the related task in the buildfile (for example putting it in an XML comment). Too late for that, but I'll reactivate the lines as comments tomorrow. Does anyone know why we wrap the datatype instances in a property instance? No. Actually we should strive to use a proper parse tree for property expressions:

1. Create a few classes for the symbols in the property expression grammar (section 5.9 of the spec). I think we need as terminals:
   - AbsoluteNumeric
   - RelativeNumeric
   - Color (the #N thingy)
   - String (aka Literal)
   - NCName (everything else, basically, including enum tokens and inherit)
   and for the nonterminals:
   - PropertyFunction
   - Some classes for the operators
2. Write a proper parser (maybe using ANTLR, at least for bootstrap) which produces a proper parse tree.
3. Add methods to the objects for resolving relative numeric values (percentages, em) and for evaluation.
4. Perhaps add constant folding to the parser.

Interesting. What issues do we have in property parsing that are solved by this? I'm only aware of arithmetic on relative numerics, which doesn't work. regards, finn
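As a toy illustration of steps 1 through 3 (hand-written rather than ANTLR, and covering only a tiny numeric fragment of the section 5.9 grammar; all class names here are made up for the sketch):

```java
// Minimal parse tree + recursive-descent parser for a toy fragment of a
// property expression grammar: numbers, '+', '*' and parentheses.
// Illustrative only -- the real grammar has many more terminals
// (colors, strings, NCNames, property functions) and needs error handling.

abstract class Expr {
    abstract double eval();            // step 3: evaluation on the tree
}

class Num extends Expr {               // terminal: an absolute numeric
    final double v;
    Num(double v) { this.v = v; }
    double eval() { return v; }
}

class BinOp extends Expr {             // nonterminal: an operator node
    final char op; final Expr l, r;
    BinOp(char op, Expr l, Expr r) { this.op = op; this.l = l; this.r = r; }
    double eval() { return op == '+' ? l.eval() + r.eval() : l.eval() * r.eval(); }
}

class Parser {
    private final String s;
    private int pos;
    Parser(String s) { this.s = s.replace(" ", ""); }
    Expr parse() { return expr(); }
    private Expr expr() {              // expr := term ('+' term)*
        Expr e = term();
        while (pos < s.length() && s.charAt(pos) == '+') { pos++; e = new BinOp('+', e, term()); }
        return e;
    }
    private Expr term() {              // term := atom ('*' atom)*
        Expr e = atom();
        while (pos < s.length() && s.charAt(pos) == '*') { pos++; e = new BinOp('*', e, atom()); }
        return e;
    }
    private Expr atom() {              // atom := number | '(' expr ')'
        if (s.charAt(pos) == '(') { pos++; Expr e = expr(); pos++; return e; }
        int start = pos;
        while (pos < s.length() && (Character.isDigit(s.charAt(pos)) || s.charAt(pos) == '.')) pos++;
        return new Num(Double.parseDouble(s.substring(start, pos)));
    }
}

public class ExprDemo {
    public static void main(String[] args) {
        Expr tree = new Parser("2 + 3 * (4 + 1)").parse();
        System.out.println(tree.eval()); // prints 17.0
    }
}
```

Constant folding (step 4) would then amount to calling eval() at parse time on any subtree whose leaves are all absolute numerics.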
Re: Newbie committer questions.
but instead I used the :ext: protocol and set CVS_RSH=ssh with cygwin, just like I do for sourceforge. Would I get any benefit from using ssh tunneling? [Jeremias Maerki] You don't have to establish the SSH connection every time you do a CVS operation. It's speedier. Thanks, perhaps I'll give it a try later. For now the CVS_RSH method works fine and fast enough. [Glen Mazza] ... Finn, would you mind taking over the rest of your last patch? The issues I found can be discussed and changed, if necessary, after you apply it. I'll get to it tomorrow, Friday. I can't see any way of doing it incrementally, so I'm going to commit the rest of it in one go. regards, finn
Newbie committer questions.
Hi, I've received my account information and everything appears to work fine. In the guides for setting up CVS there are several ways to set up tunneling (http://jakarta.apache.org/site/cvsonwin32.html), but instead I used the :ext: protocol and set CVS_RSH=ssh with cygwin, just like I do for sourceforge. Would I get any benefit from using ssh tunneling? When I committed my first modification I got the CVS/Template file shown, but from browsing around a bit in the archives of commit mails, I couldn't find an example of anyone using that Template. What is the policy on this? I also guess that we are not yet updating any CHANGES file with the modifications we make? regards, finn
Re: Comments on new property maker implementation
Finn Bock wrote: I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. [J.Pietschmann] I meant setup effort, although a binary tree will most likely do additional memory management. You are right about the lookup. Just out of curiosity, where do you get the 1.2 int comparisons? A perfect hash should not have collisions. I was comparing a standard HashMap with your binary tree. A perfect hash would likely have a more complicated hash function and of course zero int compares. It might also be interesting how a trie or ternary tree (as used for hyphenation patterns) would compare to hash maps for keywords (in terms of setup costs, lookup costs and memory). I have a study of various Java implementations on my todo list but didn't quite get around to doing it. Very interesting indeed. regards, finn
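The two lookup structures being compared can be sketched with the standard collections (TreeMap standing in for the binary tree; a tiny made-up keyword set standing in for the 148 color keywords):

```java
// Comparing the two keyword-lookup structures discussed: a hash map
// (one hashCode plus, barring collisions, one equals) versus a sorted
// binary tree (about log2(n) compareTo calls per lookup).
// The keyword set here is a tiny stand-in for illustration.

import java.util.*;

public class KeywordLookup {
    static final String[] COLORS = {
        "aliceblue", "azure", "beige", "black",
        "blue", "coral", "crimson", "cyan"
    };

    // Fill any Map implementation with keyword -> index entries.
    static Map<String, Integer> index(Map<String, Integer> m) {
        for (int i = 0; i < COLORS.length; i++) {
            m.put(COLORS[i], i);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = index(new HashMap<>());  // hashing
        Map<String, Integer> tree = index(new TreeMap<>());  // red-black tree
        // Both give the same answer; they differ in work per lookup
        // and in setup cost (the tree rebalances while inserting).
        System.out.println(hash.get("crimson")); // 6
        System.out.println(tree.get("crimson")); // 6
    }
}
```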
Re: Comments on new property maker implementation
[Finn Bock] You should perhaps also be aware that the values in a static array get assigned to the array one element at a time. So

  static int[] a = { 101,102,103,104,105,106,107,108 };

becomes in bytecodes:

  Method static {}
     0 bipush 8
     2 newarray int
     4 dup
     5 iconst_0
     ...

[Glen Mazza] Hmmm... Are you saying that declaring a static array isn't much (any?) faster than manually creating one? I would guess that it is a bit faster than the typical bytecode for manually created arrays, since the above uses 'dup' instead of 'getstatic' or 'aload' to push the array on the stack. I didn't realize that there is code being run for static arrays--I would have thought the compiled bytecode just includes the array internally, and not the code to create it. (i.e., if you opened the bytecode you would see an array 101 102 103 104... sitting someplace.) Arrays can't be stored in the constant pool, so there is no place to put the data except as bytecode. http://java.sun.com/docs/books/vmspec/html/ClassFile.doc.html#20080 It should perhaps be noted that constant instances of String are not really stored in the constant pool either. The pool just stores the utf-8 representation of the string constant, and each literal string is initialized as a new pool String object based on that (as UTF-16ish). Isn't that how C works, at least? I think so, but C has hardware support for code and data segments so C can make better use of it. Sigh... I guess I *didn't* know bytecode by heart after all! ;-) I didn't bring it up to discourage the use of statically initialized arrays. If it makes sense to put something in a static array we should do so without concern for compile time vs. runtime. After all, the initialization is only performed once per classloader. regards, finn
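In source terms, the element-by-element initialization described above means the array literal is equivalent to an explicit static initializer block; a small sketch:

```java
// A static array initializer compiles to per-element stores in the
// class initializer (<clinit>). These two classes therefore end up
// doing essentially the same work when the class is loaded.

class Literal {
    static int[] a = { 101, 102, 103, 104, 105, 106, 107, 108 };
}

class Explicit {
    static int[] a = new int[8];
    static {                       // roughly what <clinit> does above
        a[0] = 101; a[1] = 102; a[2] = 103; a[3] = 104;
        a[4] = 105; a[5] = 106; a[6] = 107; a[7] = 108;
    }
}

public class StaticArrayDemo {
    public static void main(String[] args) {
        System.out.println(java.util.Arrays.equals(Literal.a, Explicit.a)); // true
    }
}
```

As noted in the mail, this runs once per classloader, so it is rarely a reason to avoid static arrays.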